Catalog Ingestion API

This is for: Developer
Important

The Catalog Schema and Ingestion APIs are currently in closed beta. Contact your Coveo representative to learn about these APIs and how to get involved.

The Catalog Schema and Ingestion APIs offer a streamlined approach to managing catalog data indexing and updates, simplifying data integration and maintenance within a Coveo organization.

The Catalog Ingestion API provides an improved alternative to the existing catalog data operations of the Coveo Stream API. It supports full ingestion, partial updates, and data deletion. The Catalog Ingestion API validates incoming catalog data against the schemas you’ve defined, which ensures the integrity and consistency of the data from the moment it enters the Coveo index.

Tip
Leading practices
  • Always create a schema using the Schema API before attempting to ingest data.

  • To perform all operations described in this guide, ensure your API key has the following privileges:

    Access level   Domain                  Action
    Edit           Catalog setup           View and modify catalog schemas.
    Edit           Field                   Add, delete, or modify custom fields in schemas.
    Allow          Push items to sources   Ingest data into your catalog.

Limitations

The APIs currently only support the ingestion of Product catalog object items. This means that if your catalog data contains items of the Variant or Availability types, you can’t use these APIs yet.

Usage

The Catalog Ingestion API supports several operations depending on what you need to do with your product data.

What you want to do                                          Operation
Send product data for the first time                         Ingest or replace product data
Add new products                                             Ingest or replace product data
Fully replace specific products                              Ingest or replace product data
Update specific fields on existing products                  Partial update of product data
Delete specific products by ID                               Delete product data
Replace your entire product data and remove stale products   Rebuild your product data

Ingest or replace product data

To perform the initial ingestion, add new products, or fully replace existing ones, use the following endpoint:

Note

When working with existing products, the PUT endpoint performs a full replacement of the product data. This means that if you include a product with an ec_product_id that already exists in the source, all of its fields will be overwritten with the data provided in the request. Any fields that you omit from the request body for that product will be removed from the existing product in the source.

To update only specific fields of an existing product, use the partial update operation instead. It leaves any fields that you don’t specify unchanged.

PUT /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects HTTP/1.1

Where:

  • <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.

  • <SOURCE_ID> is the identifier of the source in which you want to ingest the product data. This source is the one created automatically when you created the catalog schema. To learn how to find the source ID for a given source, see Copy a source name or ID.

    Tip

    To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.

Note

To use this endpoint, you must have the Allow access level for the Push items to sources domain.

In the request body, you must provide a JSON object that contains an array of product objects to be ingested.

Example request body:

{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345", // 1
      "ec_name": "Gulp! Kayak",
      "ec_category": [ // 2
          "Canoes & Kayaks|Kayaks|Sea Kayaks",
          "Promotions|Kayaks"
      ],
      "ec_description": "Perfect for exploring the great outdoors.",
      "ec_shortdesc": "A kayak for the adventurous.",
      "ec_brand": "Gulp!",
      "ec_thumbnails": [
          "https://example.com/images/gulp-kayak.jpg",
          "https://example.com/images/gulp-kayak-2.jpg"
      ],
      "ec_images": [
          "https://example.com/images/gulp-kayak-large.jpg",
          "https://example.com/images/gulp-kayak-large-2.jpg"
      ],
      "ec_price": 299.99,
      "ec_cogs": 150.00,
      "ec_item_group_id": "gulp-kayak-group",
      "product_size": 42, // 3
      "price_list": { // 4
        "Default": 299.99,
        "Vip": 249.99,
        "Premium": 274.99
      },
      "available_colors": ["red", "blue", "green"],
      "geolocation": 45.4215,
      "added_date": "2025-03-15",
      "timestamp": 1672531199
    },
    {
      "ec_product_id": "gulp-kayak-67890", // 5
      "ec_name": "Gulp! Kayak Pro",
      "ec_category": [
          "Canoes & Kayaks|Kayaks|Professional Kayaks",
          "Promotions|Kayaks"
      ],
      "ec_description": "A professional-grade kayak for serious adventurers.",
      "ec_shortdesc": "Professional kayak for the serious adventurer.",
      "ec_brand": "Gulp!",
      "ec_thumbnails": [
          "https://example.com/images/gulp-kayak-pro.jpg"
      ],
      "ec_images": [
          "https://example.com/images/gulp-kayak-pro-large.jpg"
      ],
      "ec_price": 499.99,
      "ec_cogs": 300.00,
      "ec_item_group_id": "gulp-kayak-group",
      "product_size": 44,
      "price_list": {
        "Default": 499.99,
        "Vip": 449.99,
        "Premium": 474.99
      },
      "available_colors": ["black", "yellow"],
      "geolocation": 45.4215,
      "added_date": "2025-04-01",
      "timestamp": 1672617599
    },
    [...]
  ]
}
1 Metadata keys with an ec_ prefix contain the catalog data that fills the standard commerce fields. These fields are essential for the proper functioning of Coveo’s commerce features, such as Coveo ML models and event enrichment. However, only ec_product_id is strictly required for ingestion, and omitting it results in an error.
2 The ec_category metadata is used to define the category hierarchy of your product. The value must be an array of strings, in which each string represents a complete hierarchical category path using the pipe (|) delimiter.

Each category path should represent the full hierarchy, from the broadest to the most specific level. For example:

  • Canoes & Kayaks|Kayaks|Sea Kayaks represents the hierarchy: Canoes & Kayaks → Kayaks → Sea Kayaks

  • Promotions|Kayaks represents the hierarchy: Promotions → Kayaks

    Important

    The Catalog Ingestion API enforces strict validation for the ec_category field format. Unlike other ingestion methods that may accept various formats, the Catalog Ingestion API requires the array of hierarchical strings format shown in the example above.

3 Custom metadata keys defined in the catalog schema can also be included in the product data. These keys must match the names defined in the schema, and their values must adhere to the specified data types and formats. For example, if you defined a custom field named product_size of type INTEGER_32, you can include it in the product data as shown above.

Including metadata keys that haven’t been defined in the schema results in ingestion errors. For example, if product_size wasn’t defined in the schema, including it in the product data would cause an error.

4 Fields defined as dictionary fields in the schema (that is, with keyValue set to true) must be provided as JSON objects containing key-value pairs. In this example, price_list stores pricing that varies by customer group.
5 If a product with ec_product_id gulp-kayak-67890 already exists in the source, this request fully replaces it. All fields are overwritten with the values provided here, and any fields present in the existing product but omitted from this request are removed.
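
As a rough client-side pre-check for the ec_category format described in callout 2, the following Python sketch flags values that aren’t arrays of pipe-delimited strings. The function name category_errors and the rule that every level must be non-empty are assumptions made for illustration. The validation that the Catalog Ingestion API actually applies may be stricter or different.

```python
def category_errors(value):
    """Return a list of problems found in an ec_category value (empty if none)."""
    # The documented format is an array of strings, not a single string.
    if not isinstance(value, list):
        return ["ec_category must be an array of strings"]
    errors = []
    for index, path in enumerate(value):
        if not isinstance(path, str):
            errors.append(f"entry {index} is not a string")
            continue
        # Assumption: each pipe-delimited level must contain some text.
        if any(not level.strip() for level in path.split("|")):
            errors.append(f"entry {index} has an empty level: {path!r}")
    return errors


valid = ["Canoes & Kayaks|Kayaks|Sea Kayaks", "Promotions|Kayaks"]
not_an_array = "Canoes & Kayaks|Kayaks|Sea Kayaks"
print(category_errors(valid))
print(category_errors(not_an_array))
```

Running a check like this before sending a batch can catch format mistakes early, since ingestion errors are otherwise reported asynchronously.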

If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted.
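
The request described above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library. The helper name build_objects_request, the placeholder base URL, and the use of a Bearer token in the Authorization header are assumptions that this guide doesn’t confirm, so adapt them to your organization’s endpoint and authentication setup.

```python
import json
import urllib.request


def build_objects_request(base_url, organization_id, source_id, api_key,
                          products, method="PUT"):
    """Build (but don't send) a request to the documented /objects endpoint."""
    url = (f"{base_url}/rest/organizations/{organization_id}"
           f"/ingest/v1/sources/{source_id}/objects")
    # The body is a JSON object containing an array of product objects.
    body = json.dumps({"objects": products}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the API key is passed as a Bearer token.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method=method)


request = build_objects_request(
    "https://platform.example.invalid",  # placeholder host, not a real endpoint
    "myorg", "mysource", "xx-placeholder-key",
    [{"ec_product_id": "gulp-kayak-12345", "ec_name": "Gulp! Kayak"}],
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     print(response.status)  # 202 is expected when the request is accepted
```

Passing method="PATCH" to the same helper would target the partial update operation, since both operations share the same path.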

Partial update of product data

The Catalog Ingestion API PATCH endpoint performs partial updates using JSON merge patch semantics, meaning that:

  • If the product doesn’t exist, a new product is created with the specified fields.

  • If the product already exists, it updates only the specified fields, leaving other fields unchanged.

  • A field can be removed from an existing product by setting its value to null.

  • For dictionary fields, the endpoint performs a deep merge. Existing key-value pairs that aren’t included in the request payload are preserved. You can add new entries, update existing entry values, and remove individual entries by setting them to null.
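
To reason about what a PATCH request will do before sending it, the rules above can be modeled locally. The following Python sketch implements generic JSON merge patch behavior (as described in RFC 7396). The function name merge_patch is hypothetical, and the sketch only models the documented outcome; it isn’t the service’s actual implementation.

```python
def merge_patch(target, patch):
    """Apply JSON merge patch semantics to a local copy of a product."""
    # Non-object values (numbers, strings, arrays) replace the target outright,
    # which is why arrays can only be replaced in full.
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # A null value removes the field or dictionary entry.
            result.pop(key, None)
        else:
            # Nested objects (dictionary fields) are merged recursively.
            result[key] = merge_patch(result.get(key), value)
    return result


existing = {
    "ec_product_id": "gulp-kayak-12345",
    "ec_name": "Gulp! Kayak",
    "price_list": {"Default": 299.99, "Vip": 249.99, "Premium": 274.99},
}
patch = {
    "ec_product_id": "gulp-kayak-12345",
    "price_list": {"Vip": 239.99, "Premium": None},
}
print(merge_patch(existing, patch))
```

Because the function returns a new top-level object, the original product copy is left untouched and can be compared with the result.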

To update your product data, use the following endpoint:

PATCH /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects HTTP/1.1

Where:

  • <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.

  • <SOURCE_ID> is the identifier of the source in which you want to update the product data. This source is the one that was automatically generated when you created the catalog schema. To learn how to find the source ID for a given source, see Copy a source name or ID.

    Tip

    To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.

Note

To use this endpoint, you must have the Allow access level for the Push items to sources domain.

In the request body, provide a JSON object that contains an array of product objects to be updated or created.

Example request body:

{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345", // 1
      "ec_price": 349.99, // 2
      "available_colors": ["red", "blue"],
      "product_size": null // 3
    },
    {
      "ec_product_id": "new-product-98765", // 4
      "ec_name": "New Kayak Model",
      "ec_price": 199.99,
      "ec_brand": "Gulp!",
      "available_colors": ["red", "blue"]
    }
  ]
}
1 The ec_product_id field is required and must be a string.
2 If product gulp-kayak-12345 exists, the fields provided in the request body are updated, and all other existing fields remain unchanged.
Important

Arrays can only be replaced in full. To add or remove a single value, send the complete updated array, because granular array manipulation isn’t supported.

3 Setting product_size to null removes this field from the existing product gulp-kayak-12345.
4 If product new-product-98765 doesn’t exist, it will be created with the specified fields.

If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted.

Update scenarios and examples

This section illustrates how the PATCH endpoint behaves in the following scenarios:

Scenario 1: Update existing product fields

Initial state:

{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",
  "ec_price": 299.99,
  "ec_brand": "Gulp!",
  "available_colors": ["red", "blue", "green"]
}

PATCH request:

{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "ec_price": 349.99,
      "available_colors": ["red", "yellow"]
    }
  ]
}

Final state:

{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak", // unchanged
  "ec_price": 349.99, // updated
  "ec_brand": "Gulp!", // unchanged
  "available_colors": ["red", "yellow"] // updated
}

Scenario 2: Remove fields from existing products

Initial state:

{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",
  "ec_price": 299.99,
  "ec_brand": "Gulp!",
  "available_colors": ["red", "blue", "green"]
}

PATCH request:

{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "ec_price": null,
      "available_colors": null
    }
  ]
}

Final state:

{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",
  "ec_brand": "Gulp!"
}

Scenario 3: Create a new product

PATCH request:

{
  "objects": [
    {
      "ec_product_id": "minimal-product-001",
      "ec_name": "Basic Product",
      "ec_price": 29.99
    }
  ]
}

This creates a new product with only the three specified fields.

Scenario 4: Add or update dictionary field entries

For dictionary fields, the PATCH endpoint performs a deep merge. Existing entries that aren’t included in the request are preserved, while specified entries are added or updated.

Initial state:

{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",
  "price_list": {
    "Default": 299.99,
    "Vip": 249.99,
    "Premium": 274.99
  }
}

PATCH request:

{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "price_list": {
        "Vip": 239.99,
        "Wholesale": 199.99
      }
    }
  ]
}

Final state:

{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak", // unchanged
  "price_list": {
    "Default": 299.99, // unchanged
    "Vip": 239.99, // updated
    "Premium": 274.99, // unchanged
    "Wholesale": 199.99 // added
  }
}

Scenario 5: Remove a single entry from a dictionary field

You can remove individual entries from a dictionary field by setting their value to null. Other entries in the dictionary field are preserved.

Initial state:

{
  "ec_product_id": "gulp-kayak-12345",
  "price_list": {
    "Default": 299.99,
    "Vip": 249.99,
    "Premium": 274.99
  }
}

PATCH request:

{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "price_list": {
        "Premium": null
      }
    }
  ]
}

Final state:

{
  "ec_product_id": "gulp-kayak-12345",
  "price_list": {
    "Default": 299.99, // unchanged
    "Vip": 249.99 // unchanged
  }
}

Scenario 6: Remove an entire dictionary field

Setting the dictionary field itself to null removes the entire field and all its entries from the product. As a result, the field will no longer exist for that product.

Initial state:

{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",
  "price_list": {
    "Default": 299.99,
    "Vip": 249.99,
    "Premium": 274.99
  }
}

PATCH request:

{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "price_list": null
    }
  ]
}

Final state:

{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak"
}
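
One way to produce PATCH bodies like those in the scenarios above is to diff the product state you last sent against the state you want. The following Python sketch assumes you keep a local copy of the previously sent product. The helpers build_patch and _diff are hypothetical, and the sketch relies only on the behavior documented in this section (null removes a field or entry, dictionary fields are deep merged, arrays are replaced in full).

```python
def _diff(old, new):
    """Return the changes that turn `old` into `new` under merge semantics."""
    patch = {}
    # Keys that disappeared are removed by sending null.
    for key in old.keys() - new.keys():
        patch[key] = None
    for key, value in new.items():
        if key not in old:
            patch[key] = value
        elif isinstance(value, dict) and isinstance(old[key], dict):
            # Dictionary fields: only send the entries that changed.
            nested = _diff(old[key], value)
            if nested:
                patch[key] = nested
        elif old[key] != value:
            # Scalars and arrays are sent in full when they differ.
            patch[key] = value
    return patch


def build_patch(old, new, id_key="ec_product_id"):
    """Build one entry of the `objects` array for a PATCH request."""
    patch = _diff(old, new)
    patch[id_key] = new[id_key]  # the product ID is always required
    return patch


before = {
    "ec_product_id": "gulp-kayak-12345",
    "ec_name": "Gulp! Kayak",
    "price_list": {"Default": 299.99, "Vip": 249.99, "Premium": 274.99},
}
after = {
    "ec_product_id": "gulp-kayak-12345",
    "ec_name": "Gulp! Kayak",
    "price_list": {"Default": 299.99, "Vip": 239.99,
                   "Premium": 274.99, "Wholesale": 199.99},
}
body = {"objects": [build_patch(before, after)]}
print(body)
```

With the Scenario 4 states as input, the resulting body contains only the product ID and the two changed price_list entries, which keeps requests small.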

Delete product data

To delete specific products by ID, use the following endpoint:

POST /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects/bulk-delete HTTP/1.1

Where:

  • <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.

  • <SOURCE_ID> is the identifier of the source from which you want to delete the product data.

    Tip

    To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.

Note

To use this endpoint, you must have the Allow access level for the Push items to sources domain.

In the request body, provide a JSON object that contains an array of product identifiers to delete.

Example request body:

{
  "objects": [ // 1
    {
      "ec_product_id": "gulp-kayak-12345"
    },
    {
      "ec_product_id": "gulp-kayak-67890"
    },
    {
      "ec_product_id": "surf-co-repair-kit-54321"
    }
  ]
}
1 Each object in the objects array must contain the ec_product_id field, which specifies the unique identifier of the product to be deleted. In this example, three products are being deleted.
Important

Only the ec_product_id field is accepted in the deletion request body.

If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted.
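
Because only the ec_product_id field is accepted in the deletion body, it can help to build that body from a plain list of IDs rather than from full product objects. The following Python sketch shows one way to do so. The helper name build_delete_body is hypothetical, and dropping duplicate IDs is a convenience that this guide doesn’t require.

```python
import json


def build_delete_body(product_ids):
    """Build the bulk-delete request body from a list of product IDs."""
    # dict.fromkeys drops duplicates while preserving the original order.
    unique_ids = list(dict.fromkeys(product_ids))
    # Each entry carries ec_product_id and nothing else.
    return {"objects": [{"ec_product_id": pid} for pid in unique_ids]}


delete_body = build_delete_body(
    ["gulp-kayak-12345", "gulp-kayak-67890", "gulp-kayak-12345"]
)
print(json.dumps(delete_body, indent=2))
```

The resulting object would be sent with POST to the bulk-delete path shown above.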

Rebuild your product data

A rebuild replaces all product data in the source and removes products that no longer exist. The PUT endpoint only adds or replaces the products you include in the request. It doesn’t remove products that you omit, so a rebuild requires both a PUT and a deleteOlderThan request.

To perform a full catalog rebuild:

  1. Record a startTime timestamp (Unix timestamp in milliseconds, for example 1754590978409) before you begin.

  2. Send all your product data using the PUT endpoint (see Ingest or replace product data).

  3. Send a deleteOlderThan request using your startTime value as the orderingId path parameter.

    Because you recorded startTime before the PUT calls, any product that wasn’t included in step 2 still has an orderingId older than startTime and will be deleted. Products that were sent (or re-sent) in step 2 received a newer orderingId and are therefore kept.
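
The three steps above can be sketched as an ordered plan of requests. In the following Python sketch, the function name rebuild_plan and the tuple layout are hypothetical, and nothing is sent over the network. The sketch only illustrates that the startTime value is captured before any PUT request and then reused as the orderingId of the final request.

```python
import time


def rebuild_plan(organization_id, source_id, batches, start_time_ms=None):
    """Return the ordered (method, path, body) steps of a full rebuild."""
    # Step 1: record startTime (Unix time in milliseconds) before any PUT.
    if start_time_ms is None:
        start_time_ms = time.time_ns() // 1_000_000
    base = (f"/rest/organizations/{organization_id}"
            f"/ingest/v1/sources/{source_id}/objects")
    # Step 2: one PUT per batch of product objects.
    plan = [("PUT", base, {"objects": batch}) for batch in batches]
    # Step 3: delete everything whose last update predates startTime.
    plan.append(("DELETE", f"{base}/older-than/{start_time_ms}", None))
    return plan


batches = [
    [{"ec_product_id": "gulp-kayak-12345"}],
    [{"ec_product_id": "gulp-kayak-67890"}],
]
for method, path, _body in rebuild_plan("myorg", "mysource", batches,
                                        start_time_ms=1754590978409):
    print(method, path)
```

Passing start_time_ms explicitly, as in the usage above, keeps the plan reproducible. Leaving it out records the current time when the plan is built.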

deleteOlderThan endpoint details

The deleteOlderThan endpoint deletes products whose most recent update has an orderingId lower than the specified cutoff value.

DELETE /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects/older-than/<ORDERING_ID> HTTP/1.1

Where:

  • <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.

  • <SOURCE_ID> is the identifier of the source from which you want to delete the product data.

  • <ORDERING_ID> is a Unix timestamp (in milliseconds, for example 1754590978409) that specifies the cutoff time for deletion. Products whose last update occurred before this timestamp will be deleted.

Note

To use this endpoint, you must have the Allow access level for the Push items to sources domain.

If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted. A 15-minute delay may occur before the deletions are reflected in the index.
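
Since the cutoff is expressed in milliseconds rather than seconds, converting between datetimes and orderingId values is an easy place to make mistakes. The following Python sketch shows one way to do the conversion with integer arithmetic. The helper names to_ordering_id and from_ordering_id are hypothetical, and requiring timezone-aware datetimes is a design choice of this sketch rather than a requirement of the API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ordering_id(moment):
    """Convert a timezone-aware datetime to Unix time in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_ordering_id(ordering_id):
    """Convert Unix time in milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ordering_id)


cutoff = from_ordering_id(1754590978409)
print(cutoff.isoformat())
print(to_ordering_id(cutoff))
```

Converting a cutoff back to a readable datetime before sending the request is a cheap way to confirm that you aren’t about to delete more products than intended.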

Ingestion API limits

The Ingestion API enforces specific limits to ensure optimal performance and resource usage [1]:

  • The maximum request batch size is 10,000 items, and the maximum request size is 20 MB. If you experience timeout errors, consider reducing the batch size or request size.

    Tip

    While the Stream API supports larger batch files (up to 256 MB), the Ingestion API intentionally uses smaller request sizes (20 MB) to reduce the risk of transmission timeouts and ensure faster, incremental processing. Smaller batches spread load more evenly, resulting in smoother and quicker indexing.

  • STRING field limits:

    • STRING field names (keys) must not exceed 255 characters.

    • STRING field values must not exceed 1,000 characters.

    • Multivalue STRING fields can contain up to 100 values. Each of these values must not exceed 50 characters.

  • DATE field values must not exceed 50 characters. Dates must use the ISO 8601 format.

  • The API allows a maximum of 1 request per second per source.

  • The maximum size for a product document is 10 KB.

  • Dictionary field limits:

    • STRING dictionary fields can contain a maximum of 100 key-value pairs.

    • Numeric (DECIMAL, INTEGER_32, INTEGER_64) dictionary fields can contain a maximum of 1,000 key-value pairs.
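
The batch size and request size limits above can be respected by splitting product data before sending it. The following Python sketch is one possible approach. The names encoded_size and batch_products are hypothetical, and interpreting 20 MB as 20,000,000 bytes is a conservative assumption, because this guide doesn’t say whether the limit is decimal or binary.

```python
import json

MAX_ITEMS = 10_000
MAX_REQUEST_BYTES = 20 * 1000 * 1000  # assumption: decimal megabytes


def encoded_size(obj):
    """Size in bytes of the JSON encoding of `obj` as UTF-8."""
    return len(json.dumps(obj, ensure_ascii=False).encode("utf-8"))


def batch_products(products, max_items=MAX_ITEMS, max_bytes=MAX_REQUEST_BYTES):
    """Yield lists of products that fit both the item and byte limits."""
    envelope = encoded_size({"objects": []})
    batch, size = [], envelope
    for product in products:
        # +2 accounts for the ", " separator json.dumps puts between items.
        item_size = encoded_size(product) + 2
        if batch and (len(batch) >= max_items or size + item_size > max_bytes):
            yield batch
            batch, size = [], envelope
        # A single product larger than max_bytes still gets its own batch.
        batch.append(product)
        size += item_size
    if batch:
        yield batch


products = [{"ec_product_id": f"product-{i}"} for i in range(25)]
for batch in batch_products(products, max_items=10):
    print(len(batch), encoded_size({"objects": batch}))
```

Given the limit of 1 request per second per source, a sender would also pause for at least one second between batches, for example with time.sleep(1).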

Error handling and troubleshooting

Errors during ingestion are surfaced asynchronously on the Log Browser (platform-ca | platform-eu | platform-au) page. The Log Browser provides structured, actionable messages to help you resolve issues quickly. You must monitor these logs to detect and respond to issues promptly.

Items that don’t conform to the schema or contain invalid data will be rejected during ingestion. The error messages will indicate the specific issues with the data, such as missing required fields or incorrect data types.


1. To modify these limits, contact your Coveo representative.