Catalog Ingestion API

This is for: Developer

Important

The Catalog Schema and Ingestion APIs are currently in closed beta. Contact your Coveo representative to learn about these APIs and how to get involved.

The Catalog Schema and Ingestion APIs offer a streamlined approach to managing catalog data indexing and updates, simplifying data integration and maintenance within a Coveo organization.

The Catalog Ingestion API provides an improved alternative to the existing Full and Partial catalog data update operations of the Coveo Stream API. It validates incoming catalog data against the schemas you’ve defined, which ensures data integrity and consistency from the moment the data enters the Coveo index.

Tip
Leading practice

Always create a schema using the Schema API before attempting to ingest data.

Limitations

This section outlines the current limitations of using the Catalog Ingestion and Schema APIs to manage your catalog data:

  • The APIs currently only support the ingestion of Product catalog object items. This means that if your catalog data contains items of the Variant or Availability types, you can’t use these APIs yet.

  • The APIs don’t currently support the ingestion of dictionary fields. However, the team is actively working on an approach to handle them.

Ingest product data

To ingest product data, use the following endpoint:

PUT /rest/organizations/<ORGANIZATION_ID>/ingest/unstable/sources/<SOURCE_ID>/object-types/<OBJECT_TYPE>/objects HTTP/1.1

Where:

  • <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.

  • <SOURCE_ID> is the identifier of the source where you want to ingest the product data. This source is the one created automatically when you created the catalog schema. To learn how to find the source ID for a given source, see Copy a source name or ID.

    Tip

    To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.

  • <OBJECT_TYPE> is the type of catalog object that the items you’re ingesting pertain to. Currently, only PRODUCT is supported.

Note

To use this endpoint, you must have the Allow access level for the Push items to sources domain.

In the request body, you must provide a JSON object that contains an array of product objects to be ingested.

Example request body:

{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345", 1
      "ec_name": "Gulp! Kayak",
      "ec_description": "Perfect for exploring the great outdoors.",
      "ec_shortdesc": "A kayak for the adventurous.",
      "ec_brand": "Gulp!",
      "ec_thumbnails": [
        "https://example.com/images/gulp-kayak.jpg",
        "https://example.com/images/gulp-kayak-2.jpg"
      ],
      "ec_images": [
        "https://example.com/images/gulp-kayak-large.jpg",
        "https://example.com/images/gulp-kayak-large-2.jpg"
      ],
      "ec_price": 299.99,
      "ec_cogs": 150.00,
      "ec_item_group_id": "gulp-kayak-group",
      "product_size": 42, 2
      "available_colors": ["red", "blue", "green"],
      "geolocation": 45.4215,
      "added_date": "2025-03-15",
      "timestamp": 1672531199
    },
    {
      "ec_product_id": "gulp-kayak-67890",
      "ec_name": "Gulp! Kayak Pro",
      "ec_description": "A professional-grade kayak for serious adventurers.",
      "ec_shortdesc": "Professional kayak for the serious adventurer.",
      "ec_brand": "Gulp!",
      "ec_thumbnails": [
        "https://example.com/images/gulp-kayak-pro.jpg"
      ],
      "ec_images": [
        "https://example.com/images/gulp-kayak-pro-large.jpg"
      ],
      "ec_price": 499.99,
      "ec_cogs": 300.00,
      "ec_item_group_id": "gulp-kayak-group",
      "product_size": 44,
      "available_colors": ["black", "yellow"],
      "geolocation": 45.4215,
      "added_date": "2025-04-01",
      "timestamp": 1672617599
    },
    [...]
  ]
}
1 Metadata keys with an ec_ prefix contain the catalog data that fills the standard commerce fields. These fields are essential for the proper functioning of Coveo’s commerce features, such as Coveo ML models and event enrichment. However, only ec_product_id is strictly required for ingestion, and omitting it results in an error.
2 Custom metadata keys defined in the catalog schema can also be included in the product data. These keys must match the names defined in the schema, and their values must adhere to the specified data types and formats. For example, if you defined a custom field named product_size of type INTEGER_32, you can include it in the product data as shown above.

Including metadata keys that aren’t defined in the schema results in ingestion errors. For example, if product_size wasn’t defined in the schema, including it in the product data would cause an error.
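As a rough illustration of the rules above, the following Python sketch checks a product object locally before it is sent. The function name find_schema_violations and the allowed_keys set are hypothetical: in practice the accepted keys would come from your own catalog schema, and the API still performs its own validation regardless of any client-side check.

```python
REQUIRED_KEY = "ec_product_id"


def find_schema_violations(product, allowed_keys):
    """Return a list of problems found in one product object.

    allowed_keys is a hypothetical stand-in for the key names defined
    in your catalog schema (standard ec_ keys and custom keys alike).
    """
    problems = []
    # ec_product_id is the only key that is strictly required.
    if not product.get(REQUIRED_KEY):
        problems.append(f"missing required key: {REQUIRED_KEY}")
    # Any key that the schema does not define would be rejected.
    for key in product:
        if key not in allowed_keys:
            problems.append(f"key not defined in schema: {key}")
    return problems


if __name__ == "__main__":
    schema_keys = {"ec_product_id", "ec_name", "ec_price"}
    candidate = {"ec_product_id": "gulp-kayak-12345", "product_size": 42}
    for problem in find_schema_violations(candidate, schema_keys):
        print(problem)
```

This sketch only checks key names. Data types and formats (for example, INTEGER_32 for product_size) are left to the API.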

If the request succeeds, you get an HTTP 200 OK response.
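The following is a minimal Python sketch of how such a request could be assembled with the standard library. The base URL, the bearer-token Authorization header, and the function name build_ingest_request are assumptions made for illustration, not details confirmed by this page. Substitute the host and credentials that apply to your organization.

```python
import json
import urllib.request


def build_ingest_request(base_url, org_id, source_id, token, products,
                         object_type="PRODUCT"):
    """Build (but do not send) a PUT request for the ingestion endpoint.

    base_url and the Bearer authorization scheme are assumptions.
    """
    url = (
        f"{base_url}/rest/organizations/{org_id}/ingest/unstable"
        f"/sources/{source_id}/object-types/{object_type}/objects"
    )
    # The body is a JSON object holding an array of product objects.
    body = json.dumps({"objects": products}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_ingest_request(
        "https://example.invalid",  # placeholder host, not a real endpoint
        "<ORGANIZATION_ID>",
        "<SOURCE_ID>",
        "<ACCESS_TOKEN>",
        [{"ec_product_id": "gulp-kayak-12345", "ec_name": "Gulp! Kayak"}],
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request), then check for status 200.
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the index.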

Delete product data

The Ingestion API supports two methods to delete product data: bulk deletion of specific products, and deletion of all products older than a specified timestamp.

Bulk deletion of multiple products

To delete multiple products, use the following endpoint:

POST /rest/organizations/<ORGANIZATION_ID>/ingest/unstable/sources/<SOURCE_ID>/object-types/<OBJECT_TYPE>/objects/bulk-delete HTTP/1.1

Where:

  • <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.

  • <SOURCE_ID> is the identifier of the source from which you want to delete the product data.

    Tip

    To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.

  • <OBJECT_TYPE> is the type of catalog object that the items you’re deleting pertain to. Currently, only PRODUCT is supported.

Note

To use this endpoint, you must have the Allow access level for the Push items to sources domain.

In the request body, provide a JSON object that contains an array of product identifiers to delete.

Example request body:

{
  "objects": [ 1
    {
      "ec_product_id": "gulp-kayak-12345"
    },
    {
      "ec_product_id": "gulp-kayak-67890"
    },
    {
      "ec_product_id": "surf-co-repair-kit-54321"
    }
  ]
}
1 Each object in the objects array must contain the ec_product_id field, which specifies the unique identifier of the product to be deleted. In this example, three products are being deleted.

Important

Only the ec_product_id field is accepted in the deletion request body.

If the request succeeds, you get an HTTP 204 No Content response.
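A short Python sketch of the bulk deletion request follows. It builds the body from a plain list of product IDs so that each object carries only ec_product_id. The helper names, the base URL, and the bearer-token header are assumptions for illustration.

```python
import json
import urllib.request


def build_bulk_delete_body(product_ids):
    """Wrap product IDs in the {"objects": [...]} structure shown above."""
    objects = []
    for product_id in product_ids:
        if not product_id:
            raise ValueError("every entry needs a non-empty ec_product_id")
        # Only ec_product_id is accepted in each deletion object.
        objects.append({"ec_product_id": product_id})
    return {"objects": objects}


def build_bulk_delete_request(base_url, org_id, source_id, token, product_ids,
                              object_type="PRODUCT"):
    """Build (but do not send) the POST request; base_url is an assumption."""
    url = (
        f"{base_url}/rest/organizations/{org_id}/ingest/unstable"
        f"/sources/{source_id}/object-types/{object_type}/objects/bulk-delete"
    )
    body = json.dumps(build_bulk_delete_body(product_ids)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    ids = ["gulp-kayak-12345", "gulp-kayak-67890"]
    print(json.dumps(build_bulk_delete_body(ids), indent=2))
```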

Deletion of products older than a specified timestamp

You can delete products that are older than a specified timestamp. This is useful for removing outdated products from the index after performing a full catalog data update, for example.

To delete products older than a specified timestamp, use the following endpoint:

DELETE /rest/organizations/<ORGANIZATION_ID>/ingest/unstable/sources/<SOURCE_ID>/objects/older-than/<ORDERING_ID> HTTP/1.1

Where:

  • <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.

  • <SOURCE_ID> is the identifier of the source from which you want to delete the product data.

  • <ORDERING_ID> is a Unix timestamp (in milliseconds) that specifies the cutoff date for deletion. Products that were indexed before this timestamp will be deleted.

Note

To use this endpoint, you must have the Allow access level for the Push items to sources domain.

If the request succeeds, you get an HTTP 204 No Content response. A 15-minute delay may occur before the deletions are reflected in the index.
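Because <ORDERING_ID> is expressed in milliseconds, it is easy to pass a value in seconds by mistake. The Python sketch below converts a timezone-aware datetime to a millisecond timestamp and builds the endpoint path. The function names and the base URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ordering_id(moment):
    """Convert a timezone-aware datetime to a Unix timestamp in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def build_delete_older_than_url(base_url, org_id, source_id, ordering_id):
    """Return the URL for the DELETE request; base_url is an assumption."""
    return (
        f"{base_url}/rest/organizations/{org_id}/ingest/unstable"
        f"/sources/{source_id}/objects/older-than/{ordering_id}"
    )


if __name__ == "__main__":
    # One plausible pattern: note the time just before a full update starts,
    # then use it as the cutoff once the update has finished.
    cutoff = to_ordering_id(datetime.now(timezone.utc))
    print(build_delete_older_than_url(
        "https://example.invalid", "<ORGANIZATION_ID>", "<SOURCE_ID>", cutoff))
```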

Ingestion API limits

The Ingestion API enforces specific limits to ensure optimal performance and resource usage [1]:

  • The maximum request batch size is 10,000 items, and the maximum request size is 20 MB. If you experience timeout errors, consider reducing the batch size or request size.

    Tip

    While the Stream API supports larger batch files (up to 256 MB), the Ingestion API intentionally uses smaller request sizes (20 MB) to reduce the risk of transmission timeouts and ensure faster, incremental processing. Smaller batches spread load more evenly, resulting in smoother and quicker indexing.

  • STRING field limits:

    • STRING field names (keys) must not exceed 255 characters.

    • STRING field values must not exceed 1,000 characters.

    • Multivalue STRING fields can contain up to 100 values. Each of these values must not exceed 50 characters.

  • DATE field values must not exceed 50 characters. Dates must use the ISO 8601 format.

  • The API allows a maximum of 1 request per second per source.

  • The maximum size for a product document is 10 KB.
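To stay within the batch, size, and rate limits listed above, a client can split its catalog into several requests. The Python sketch below is one way to do this, under stated assumptions: "20 MB" and "10 KB" are read as decimal units, sizes are measured on compact UTF-8 JSON, and the names batch_products and send_batches are hypothetical. The send callable stands in for whatever function actually performs the HTTP request.

```python
import json
import time

MAX_ITEMS_PER_REQUEST = 10_000
MAX_REQUEST_BYTES = 20 * 1000 * 1000  # assumption: "20 MB" as decimal megabytes
MAX_PRODUCT_BYTES = 10 * 1000         # assumption: "10 KB" as decimal kilobytes


def encoded_size(value):
    """Size in bytes of the compact UTF-8 JSON encoding of value."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def batch_products(products, max_items=MAX_ITEMS_PER_REQUEST,
                   max_bytes=MAX_REQUEST_BYTES,
                   max_product_bytes=MAX_PRODUCT_BYTES):
    """Yield lists of products that respect the item-count and size limits."""
    envelope = encoded_size({"objects": []})
    batch, batch_bytes = [], envelope
    for product in products:
        size = encoded_size(product)
        if size > max_product_bytes:
            raise ValueError("product exceeds the per-document size limit")
        # The extra byte accounts for the comma between array elements.
        too_many = len(batch) >= max_items
        too_big = batch_bytes + size + 1 > max_bytes
        if batch and (too_many or too_big):
            yield batch
            batch, batch_bytes = [], envelope
        batch.append(product)
        batch_bytes += size + 1
    if batch:
        yield batch


def send_batches(batches, send, sleep=time.sleep, min_interval=1.0):
    """Call send(batch) for each batch, pausing between calls so that a
    single source receives at most one request per interval."""
    for index, batch in enumerate(batches):
        if index:
            sleep(min_interval)
        send(batch)


if __name__ == "__main__":
    catalog = [{"ec_product_id": f"product-{n}"} for n in range(25_000)]
    send_batches(batch_products(catalog),
                 send=lambda batch: print(len(batch), "items"),
                 sleep=lambda seconds: None)  # skip real waiting in this demo
```

The size estimate only holds if the request body is serialized the same compact way. With a different serializer, leave some headroom below the limit.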

Error handling and troubleshooting

Errors during ingestion are surfaced asynchronously on the Log Browser (platform-ca | platform-eu | platform-au) page. The Log Browser provides structured, actionable messages to facilitate quick resolution. Because errors aren’t returned in the API response, implementers must monitor these logs to detect and respond to issues promptly.

Items that don’t conform to the schema or contain invalid data will be rejected during ingestion. The error messages will indicate the specific issues with the data, such as missing required fields or incorrect data types.


[1] To modify these limits, contact your Coveo representative.