Catalog Ingestion API
The Catalog Schema and Ingestion APIs are currently in closed beta. Contact your Coveo representative to learn about these APIs and how to get involved.
The Catalog Schema and Ingestion APIs offer a streamlined approach to managing catalog data indexing and updates, simplifying data integration and maintenance within a Coveo organization.
The Catalog Ingestion API provides an improved alternative to the existing catalog data operations of the Coveo Stream API. It supports full ingestion, partial updates, and data deletion. The Catalog Ingestion API ingests catalog data by validating it against the schemas you’ve defined, ensuring the integrity and consistency of data from the moment it enters the Coveo index.
Leading practices
Limitations
The APIs currently support only the ingestion of Product catalog objects.
This means that if your catalog data contains Variant or Availability objects, you can't use these APIs yet.
Usage
The Catalog Ingestion API supports several operations depending on what you need to do with your product data.
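| What you want to do | Operation |
|---|---|
| Send product data for the first time | Ingest or replace product data (PUT) |
| Add new products | Ingest or replace product data (PUT) |
| Fully replace specific products | Ingest or replace product data (PUT) |
| Update specific fields on existing products | Partial update of product data (PATCH) |
| Delete specific products by ID | Delete product data (bulk-delete) |
| Replace your entire product data and remove stale products | Rebuild your product data (PUT, then deleteOlderThan) |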
Ingest or replace product data
To perform the initial ingestion, add new products, or fully replace existing ones, use the following endpoint:
Note
When working with existing products, the PUT endpoint performs a full replacement of the product data.
This means that if you include a product whose ec_product_id already exists in the source, the existing product is replaced entirely, and any of its fields that you omit from the request are removed. To update only specific fields of an existing product without affecting the other fields, use the partial update operation instead, which doesn't remove unspecified fields.
PUT /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects HTTP/1.1
Where:
- <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
- <SOURCE_ID> is the identifier of the source in which you want to ingest the product data. This is the source that was created automatically when you created the catalog schema. To learn how to find the source ID for a given source, see Copy a source name or ID. To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
In the request body, you must provide a JSON object that contains an array of product objects to be ingested.
Example request body:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak",
"ec_category": [
"Canoes & Kayaks|Kayaks|Sea Kayaks",
"Promotions|Kayaks"
],
"ec_description": "Perfect for exploring the great outdoors.",
"ec_shortdesc": "A kayak for the adventurous.",
"ec_brand": "Gulp!",
"ec_thumbnails": [
"https://example.com/images/gulp-kayak.jpg",
"https://example.com/images/gulp-kayak-2.jpg"
],
"ec_images": [
"https://example.com/images/gulp-kayak-large.jpg",
"https://example.com/images/gulp-kayak-large-2.jpg"
],
"ec_price": 299.99,
"ec_cogs": 150.00,
"ec_item_group_id": "gulp-kayak-group",
"product_size": 42,
"price_list": {
"Default": 299.99,
"Vip": 249.99,
"Premium": 274.99
},
"available_colors": ["red", "blue", "green"],
"geolocation": 45.4215,
"added_date": "2025-03-15",
"timestamp": 1672531199
},
{
"ec_product_id": "gulp-kayak-67890",
"ec_name": "Gulp! Kayak Pro",
"ec_category": [
"Canoes & Kayaks|Kayaks|Professional Kayaks",
"Promotions|Kayaks"
],
"ec_description": "A professional-grade kayak for serious adventurers.",
"ec_shortdesc": "Professional kayak for the serious adventurer.",
"ec_brand": "Gulp!",
"ec_thumbnails": [
"https://example.com/images/gulp-kayak-pro.jpg"
],
"ec_images": [
"https://example.com/images/gulp-kayak-pro-large.jpg"
],
"ec_price": 499.99,
"ec_cogs": 300.00,
"ec_item_group_id": "gulp-kayak-group",
"product_size": 44,
"price_list": {
"Default": 499.99,
"Vip": 449.99,
"Premium": 474.99
},
"available_colors": ["black", "yellow"],
"geolocation": 45.4215,
"added_date": "2025-04-01",
"timestamp": 1672617599
},
[...]
]
}
Metadata keys with an ec_ prefix contain the catalog data that fills the standard commerce fields.
These fields are essential for Coveo's commerce features, such as Coveo ML models and event enrichment, to work properly.
However, only ec_product_id is strictly required for ingestion; omitting it results in an error.
The ec_category metadata is used to define the category hierarchy of your product.
The value must be an array of strings, in which each string represents a complete hierarchical category path using the pipe (|) delimiter.
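Each category path should represent the full hierarchy, from the broadest to the most specific level. For example, in the request above, Canoes & Kayaks|Kayaks|Sea Kayaks goes from the broadest category (Canoes & Kayaks) to the most specific one (Sea Kayaks).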
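If your category hierarchies are stored in your own system as lists of levels, a small helper can build the ec_category array. The following Python sketch is illustrative only: the function names are hypothetical, and rejecting a level that contains a pipe is an assumption based on the pipe being the delimiter, not a documented rule.

```python
def category_path(levels: list[str]) -> str:
    """Join one hierarchy, broadest level first, with the pipe (|) delimiter."""
    for level in levels:
        # Assumption: a level must not contain the delimiter itself.
        if "|" in level:
            raise ValueError(f"Category level contains '|': {level!r}")
    return "|".join(levels)


def ec_category(hierarchies: list[list[str]]) -> list[str]:
    """Build the ec_category array: one complete path per hierarchy."""
    return [category_path(levels) for levels in hierarchies]


if __name__ == "__main__":
    print(ec_category([
        ["Canoes & Kayaks", "Kayaks", "Sea Kayaks"],
        ["Promotions", "Kayaks"],
    ]))
    # ['Canoes & Kayaks|Kayaks|Sea Kayaks', 'Promotions|Kayaks']
```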
Custom metadata keys defined in the catalog schema can also be included in the product data.
These keys must match the names defined in the schema, and their values must adhere to the specified data types and formats.
For example, if you defined a custom field named product_size of type INTEGER_32, you can include it in the product data as shown above.
Providing values that haven’t been defined in the schema will result in ingestion errors.
For example, if |
Fields defined as dictionary fields in the schema (that is, with keyValue set to true) must be provided as JSON objects containing key-value pairs.
In this example, price_list stores pricing that varies by customer group.
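Because items that don't match the schema are rejected asynchronously, you may want to catch obvious problems before sending a request. The following Python sketch checks a product against a hypothetical, simplified view of the schema: a dict mapping each field name to its data type, with keyValue for dictionary fields and a made-up multiValue flag for list values. The real Catalog Schema API representation may differ, the INTEGER_32 and INTEGER_64 range checks assume signed integers, and DATE fields aren't covered. It targets PUT payloads; PATCH payloads can legitimately contain null values.

```python
from typing import Any

# Hypothetical, simplified schema view (field name -> settings). The real
# Catalog Schema API format may differ; "multiValue" is an invented flag.
SCHEMA: dict[str, dict[str, Any]] = {
    "ec_product_id": {"type": "STRING"},
    "ec_name": {"type": "STRING"},
    "ec_price": {"type": "DECIMAL"},
    "product_size": {"type": "INTEGER_32"},
    "available_colors": {"type": "STRING", "multiValue": True},
    "price_list": {"type": "DECIMAL", "keyValue": True},
}

# Assumption: integer types are signed.
INT_BOUNDS = {"INTEGER_32": 2**31, "INTEGER_64": 2**63}


def value_ok(data_type: str, value: Any) -> bool:
    """Check one scalar value against a data type."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return False
    if data_type == "STRING":
        return isinstance(value, str)
    if data_type in INT_BOUNDS:
        bound = INT_BOUNDS[data_type]
        return isinstance(value, int) and -bound <= value < bound
    if data_type == "DECIMAL":
        return isinstance(value, (int, float))
    return False


def schema_errors(product: dict[str, Any],
                  schema: dict[str, dict[str, Any]]) -> list[str]:
    """Return readable problems; an empty list means nothing was detected."""
    errors = []
    if not isinstance(product.get("ec_product_id"), str):
        errors.append("ec_product_id is missing or not a string")
    for name, value in product.items():
        field = schema.get(name)
        if field is None:
            errors.append(f"{name}: not defined in the schema")
            continue
        data_type = field["type"]
        if field.get("keyValue"):
            ok = isinstance(value, dict) and all(
                isinstance(k, str) and value_ok(data_type, v)
                for k, v in value.items())
        elif field.get("multiValue"):
            ok = isinstance(value, list) and all(
                value_ok(data_type, v) for v in value)
        else:
            ok = value_ok(data_type, value)
        if not ok:
            errors.append(f"{name}: value does not match {data_type}")
    return errors


if __name__ == "__main__":
    print(schema_errors(
        {"ec_product_id": "gulp-kayak-12345", "product_size": "42", "color": "red"},
        SCHEMA))
    # ['product_size: value does not match INTEGER_32', 'color: not defined in the schema']
```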
If a product with ec_product_id gulp-kayak-67890 already exists in the source, this request fully replaces it.
All fields are overwritten with the values provided here, and any fields present in the existing product but omitted from this request are removed.
If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted.
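The following Python sketch prepares this PUT request with the standard library. The base URL and the Bearer-token Authorization header are assumptions (organizations hosted in other regions may use a different host), and the function names are hypothetical. The request is only built, not sent, so you can inspect it first; the send helper treats anything other than 202 Accepted as unexpected, keeping in mind that 202 only means the payload was accepted for processing.

```python
import json
import urllib.request

# Assumption: Coveo Platform base URL; confirm the host for your organization.
BASE_URL = "https://platform.cloud.coveo.com"


def objects_url(org_id: str, source_id: str) -> str:
    """Build the .../ingest/v1/sources/<SOURCE_ID>/objects URL."""
    return (f"{BASE_URL}/rest/organizations/{org_id}"
            f"/ingest/v1/sources/{source_id}/objects")


def build_put_request(org_id: str, source_id: str, api_key: str,
                      products: list[dict]) -> urllib.request.Request:
    """Prepare (but don't send) a PUT that ingests or fully replaces products."""
    body = json.dumps({"objects": products}).encode("utf-8")
    return urllib.request.Request(
        objects_url(org_id, source_id),
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumption: bearer token auth
        },
    )


def send(request: urllib.request.Request) -> None:
    """Send a prepared request; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(request) as response:
        if response.status != 202:
            raise RuntimeError(f"Unexpected status {response.status}")


if __name__ == "__main__":
    request = build_put_request(
        "myorganization", "mysource", "my-api-key",
        [{"ec_product_id": "gulp-kayak-12345", "ec_name": "Gulp! Kayak"}])
    print(request.get_method(), request.full_url)
```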
Partial update of product data
The Catalog Ingestion API PATCH endpoint performs partial updates using JSON merge patch semantics, meaning that:
- If the product doesn't exist, a new product is created with the specified fields.
- If the product already exists, it updates only the specified fields, leaving other fields unchanged.
- A field can be removed from an existing product by setting its value to null.
- For dictionary fields, the endpoint performs a deep merge. Existing key-value pairs that aren't included in the request payload are preserved. You can add new entries, update existing entry values, and remove individual entries by setting them to null.
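To reason about the outcome of a PATCH request locally, you can model the JSON merge patch semantics listed above, as defined in RFC 7386, in a few lines of Python. This is a local simulation under that assumption, not Coveo's implementation; apply_patch_request and its in-memory {ec_product_id: product} map are hypothetical. Note that lists, such as available_colors, are replaced as a whole rather than merged, while dictionaries are merged key by key.

```python
import copy
from typing import Any


def merge_patch(target: Any, patch: Any) -> Any:
    """Return target with patch applied, following RFC 7386 (JSON merge patch).

    Dicts are merged recursively, None (JSON null) deletes a key, and any
    other value, including a list, replaces the existing value as a whole.
    The inputs aren't modified.
    """
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def apply_patch_request(products: dict[str, dict], body: dict) -> dict[str, dict]:
    """Simulate a PATCH body on an in-memory {ec_product_id: product} map."""
    updated = dict(products)
    for obj in body["objects"]:
        product_id = obj["ec_product_id"]
        # A missing product starts empty, so it's created with the given fields.
        updated[product_id] = merge_patch(updated.get(product_id, {}), obj)
    return updated


if __name__ == "__main__":
    before = {"gulp-kayak-12345": {
        "ec_product_id": "gulp-kayak-12345",
        "price_list": {"Default": 299.99, "Vip": 249.99, "Premium": 274.99}}}
    after = apply_patch_request(before, {"objects": [{
        "ec_product_id": "gulp-kayak-12345",
        "price_list": {"Vip": 239.99, "Premium": None, "Wholesale": 199.99}}]})
    print(after["gulp-kayak-12345"]["price_list"])
    # {'Default': 299.99, 'Vip': 239.99, 'Wholesale': 199.99}
```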
To update your product data, use the following endpoint:
PATCH /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects HTTP/1.1
Where:
- <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
- <SOURCE_ID> is the identifier of the source in which you want to update the product data. This is the source that was automatically generated when you created the catalog schema. To learn how to find the source ID for a given source, see Copy a source name or ID. To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
In the request body, provide a JSON object that contains an array of product objects to be updated or created.
Example request body:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345",
"ec_price": 349.99,
"available_colors": ["red", "blue"],
"product_size": null
},
{
"ec_product_id": "new-product-98765",
"ec_name": "New Kayak Model",
"ec_price": 199.99,
"ec_brand": "Gulp!",
"available_colors": ["red", "blue"]
}
]
}
The ec_product_id field is required and must be of the string type.
If product gulp-kayak-12345 exists, the fields provided in the request body are updated, while all other existing fields remain unchanged.
Setting product_size to null removes this field from the existing product gulp-kayak-12345.
If product new-product-98765 doesn't exist, it's created with the specified fields.
If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted.
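If you generate PATCH payloads programmatically, a small builder can keep the two rules above in one place: every object carries a string ec_product_id, and fields to remove are sent as null. The following Python sketch is illustrative; patch_object and patch_body are hypothetical names. The usage below rebuilds the example request body from this section.

```python
import json
from typing import Any, Iterable, Optional


def patch_object(product_id: str,
                 updates: Optional[dict[str, Any]] = None,
                 remove: Iterable[str] = ()) -> dict[str, Any]:
    """Build one entry of the PATCH "objects" array.

    updates: fields to add or overwrite.
    remove: field names to delete; each is sent as JSON null.
    """
    if not isinstance(product_id, str):
        raise TypeError("ec_product_id is required and must be a string")
    obj: dict[str, Any] = {"ec_product_id": product_id}
    for name, value in (updates or {}).items():
        if name == "ec_product_id":
            raise ValueError("Pass the product ID as product_id, not in updates")
        obj[name] = value
    for name in remove:
        if name == "ec_product_id":
            raise ValueError("ec_product_id can't be removed")
        obj[name] = None  # serialized as null, which removes the field
    return obj


def patch_body(objects: list[dict[str, Any]]) -> str:
    """Serialize the request body for the PATCH endpoint."""
    return json.dumps({"objects": objects})


if __name__ == "__main__":
    print(patch_body([
        patch_object("gulp-kayak-12345",
                     {"ec_price": 349.99, "available_colors": ["red", "blue"]},
                     remove=["product_size"]),
        patch_object("new-product-98765",
                     {"ec_name": "New Kayak Model", "ec_price": 199.99,
                      "ec_brand": "Gulp!", "available_colors": ["red", "blue"]}),
    ]))
```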
Update scenarios and examples
This section illustrates how the PATCH endpoint behaves in the following scenarios:
Scenario 1: Update existing product fields
Initial state:
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak",
"ec_price": 299.99,
"ec_brand": "Gulp!",
"available_colors": ["red", "blue", "green"]
}
PATCH request:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345",
"ec_price": 349.99,
"available_colors": ["red", "yellow"]
}
]
}
Final state:
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak", // unchanged
"ec_price": 349.99, // updated
"ec_brand": "Gulp!", // unchanged
"available_colors": ["red", "yellow"] // updated
}
Scenario 2: Remove fields from existing products
Initial state:
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak",
"ec_price": 299.99,
"ec_brand": "Gulp!",
"available_colors": ["red", "blue", "green"]
}
PATCH request:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345",
"ec_price": null,
"available_colors": null
}
]
}
Final state:
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak",
"ec_brand": "Gulp!"
}
Scenario 3: Create a new product
PATCH request:
{
"objects": [
{
"ec_product_id": "minimal-product-001",
"ec_name": "Basic Product",
"ec_price": 29.99
}
]
}
If no product with the ID minimal-product-001 exists, this request creates a new product with only the three specified fields.
Scenario 4: Add or update dictionary field entries
For dictionary fields, the PATCH endpoint performs a deep merge. Existing entries that aren’t included in the request are preserved, while specified entries are added or updated.
Initial state:
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak",
"price_list": {
"Default": 299.99,
"Vip": 249.99,
"Premium": 274.99
}
}
PATCH request:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345",
"price_list": {
"Vip": 239.99,
"Wholesale": 199.99
}
}
]
}
Final state:
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak", // unchanged
"price_list": {
"Default": 299.99, // unchanged
"Vip": 239.99, // updated
"Premium": 274.99, // unchanged
"Wholesale": 199.99 // added
}
}
Scenario 5: Remove a single entry from a dictionary field
You can remove individual entries from a dictionary field by setting their value to null.
Other entries in the dictionary field are preserved.
Initial state:
{
"ec_product_id": "gulp-kayak-12345",
"price_list": {
"Default": 299.99,
"Vip": 249.99,
"Premium": 274.99
}
}
PATCH request:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345",
"price_list": {
"Premium": null
}
}
]
}
Final state:
{
"ec_product_id": "gulp-kayak-12345",
"price_list": {
"Default": 299.99, // unchanged
"Vip": 249.99 // unchanged
}
}
Scenario 6: Remove an entire dictionary field
Setting the dictionary field itself to null removes the entire field and all its entries from the product.
As a result, the field will no longer exist for that product.
Initial state:
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak",
"price_list": {
"Default": 299.99,
"Vip": 249.99,
"Premium": 274.99
}
}
PATCH request:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345",
"price_list": null
}
]
}
Final state:
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak"
}
Delete product data
To delete specific products by ID, use the following endpoint:
POST /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects/bulk-delete HTTP/1.1
Where:
- <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
- <SOURCE_ID> is the identifier of the source from which you want to delete the product data. To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
In the request body, provide a JSON object that contains an array of product identifiers to delete.
Example request body:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345"
},
{
"ec_product_id": "gulp-kayak-67890"
},
{
"ec_product_id": "surf-co-repair-kit-54321"
}
]
}
Each object in the objects array must contain the ec_product_id field, which specifies the unique identifier of the product to be deleted.
In this example, three products are being deleted.
If the request is accepted for processing, you get an HTTP 202 Accepted response.
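A minimal Python sketch that prepares the bulk-delete request from a list of product IDs. As with the other request examples, the base URL and the Authorization header format are assumptions, and the function name is hypothetical. Dropping duplicate IDs before sending is a design choice of this sketch, not a documented requirement.

```python
import json
import urllib.request
from typing import Iterable

# Assumption: Coveo Platform base URL; confirm the host for your organization.
BASE_URL = "https://platform.cloud.coveo.com"


def build_bulk_delete_request(org_id: str, source_id: str, api_key: str,
                              product_ids: Iterable[str]) -> urllib.request.Request:
    """Prepare (but don't send) a bulk-delete request for the given product IDs."""
    unique_ids = list(dict.fromkeys(product_ids))  # drop duplicates, keep order
    if not unique_ids:
        raise ValueError("Provide at least one ec_product_id")
    body = {"objects": [{"ec_product_id": pid} for pid in unique_ids]}
    url = (f"{BASE_URL}/rest/organizations/{org_id}"
           f"/ingest/v1/sources/{source_id}/objects/bulk-delete")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumption: bearer token auth
        },
    )


if __name__ == "__main__":
    request = build_bulk_delete_request(
        "myorganization", "mysource", "my-api-key",
        ["gulp-kayak-12345", "gulp-kayak-67890", "surf-co-repair-kit-54321"])
    print(request.get_method(), request.full_url)
```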
Rebuild your product data
A rebuild replaces all product data in the source and removes products that no longer exist.
The PUT endpoint only adds or replaces the products you include in the request.
It doesn’t remove products that you omit, so a rebuild requires both a PUT and a deleteOlderThan request.
To perform a full catalog rebuild:
1. Record a startTime timestamp (Unix timestamp in milliseconds, for example 1754590978409) before you begin.
2. Send all your product data using the PUT endpoint (see Ingest or replace product data).
3. Send a deleteOlderThan request using your startTime value as the orderingId path parameter.
Because you recorded startTime before the PUT calls, any product that wasn't included in step 2 still has an orderingId older than startTime and will be deleted. Products that were sent (or re-sent) in step 2 received a newer orderingId and are therefore kept.
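The three rebuild steps can be sketched as a request plan in Python. The sketch only builds (method, path, body) tuples; sending them, respecting the limit of 1 request per second per source, and handling errors are left to your HTTP client. The function names are hypothetical, and batches are split by item count only, so the 20 MB request size limit described under Ingestion API limits still applies. The guard against an empty product list follows from step 3: with no PUT requests, every existing product would be older than startTime.

```python
import json
import time
from typing import Optional


def current_time_ms() -> int:
    """Unix timestamp in milliseconds, used as startTime (step 1)."""
    return time.time_ns() // 1_000_000


def rebuild_plan(org_id: str, source_id: str, products: list[dict],
                 start_time_ms: int,
                 batch_size: int = 10_000) -> list[tuple[str, str, Optional[str]]]:
    """Return (method, path, body) steps 2 and 3 of a full catalog rebuild."""
    if not products:
        raise ValueError("An empty rebuild would delete every product")
    base = f"/rest/organizations/{org_id}/ingest/v1/sources/{source_id}/objects"
    plan: list[tuple[str, str, Optional[str]]] = []
    # Step 2: PUT every product, in batches.
    for start in range(0, len(products), batch_size):
        batch = products[start:start + batch_size]
        plan.append(("PUT", base, json.dumps({"objects": batch})))
    # Step 3: delete products whose orderingId is older than startTime.
    plan.append(("DELETE", f"{base}/older-than/{start_time_ms}", None))
    return plan


if __name__ == "__main__":
    start_time = current_time_ms()  # record before any PUT request
    steps = rebuild_plan("myorganization", "mysource",
                         [{"ec_product_id": "gulp-kayak-12345"}], start_time)
    for method, path, _ in steps:
        print(method, path)
```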
deleteOlderThan endpoint details
The deleteOlderThan endpoint deletes products whose last update operation orderingId is lower than a specified cutoff value.
DELETE /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects/older-than/<ORDERING_ID> HTTP/1.1
Where:
- <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
- <SOURCE_ID> is the identifier of the source from which you want to delete the product data.
- <ORDERING_ID> is a Unix timestamp (in milliseconds, for example 1754590978409) that specifies the cutoff date for deletion. Products whose last update occurred before this timestamp will be deleted.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted. A 15-minute delay may occur before the deletions are reflected in the index.
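The <ORDERING_ID> must be in milliseconds. A Unix timestamp in seconds, read as milliseconds, points to January 1970 and would likely delete nothing. The following Python sketch converts a timezone-aware datetime to the expected value; ordering_id and older_than_path are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ordering_id(moment: datetime) -> int:
    """Convert a timezone-aware datetime to a Unix timestamp in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("Use a timezone-aware datetime to avoid local-time ambiguity")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - UNIX_EPOCH) // timedelta(milliseconds=1)


def older_than_path(org_id: str, source_id: str, cutoff: datetime) -> str:
    """Path of the deleteOlderThan request (sent with the DELETE method)."""
    return (f"/rest/organizations/{org_id}/ingest/v1/sources/{source_id}"
            f"/objects/older-than/{ordering_id(cutoff)}")


if __name__ == "__main__":
    cutoff = datetime(2025, 8, 7, 18, 22, 58, 409000, tzinfo=timezone.utc)
    print(ordering_id(cutoff))  # 1754590978409
```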
Ingestion API limits
The Ingestion API enforces specific limits to ensure optimal performance and resource usage [1]:
- The maximum request batch size is 10,000 items, and the maximum request size is 20 MB. If you experience timeout errors, consider reducing the batch size or request size.
  While the Stream API supports larger batch files (up to 256 MB), the Ingestion API intentionally uses smaller request sizes (20 MB) to reduce the risk of transmission timeouts and ensure faster, incremental processing. Smaller batches spread load more evenly, resulting in smoother and quicker indexing.
- STRING field limits:
  - STRING field names (keys) must not exceed 255 characters.
  - STRING field values must not exceed 1,000 characters.
  - Multivalue STRING fields can contain up to 100 values. Each of these values must not exceed 50 characters.
- DATE field values must not exceed 50 characters. Dates must use the ISO 8601 format.
- The API allows a maximum of 1 request per second per source.
- The maximum size for a product document is 10 KB.
- Dictionary field limits:
  - STRING dictionary fields can contain a maximum of 100 key-value pairs.
  - Numeric (DECIMAL, INTEGER_32, INTEGER_64) dictionary fields can contain a maximum of 1,000 key-value pairs.
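The following Python sketch splits products into request batches that respect the item count, request size, and product size limits above, and spaces requests at least one second apart. It assumes the size limits apply to the UTF-8 JSON body as serialized by json.dumps, and reads "20 MB" and "10 KB" conservatively as 20,000,000 and 10,000 bytes; adjust if Coveo measures them differently. It doesn't check the STRING, DATE, or dictionary limits, and the names are hypothetical.

```python
import json
import time
from typing import Any, Iterable, Iterator

MAX_ITEMS = 10_000
# Assumption: "20 MB" and "10 KB" read conservatively as decimal units.
MAX_REQUEST_BYTES = 20_000_000
MAX_PRODUCT_BYTES = 10_000
ENVELOPE_BYTES = len(b'{"objects": []}')  # body wrapper around the array
SEPARATOR_BYTES = len(b", ")  # json.dumps puts ", " between array items


def encoded_size(value: Any) -> int:
    """Size of value as UTF-8 JSON, serialized like the request body."""
    return len(json.dumps(value).encode("utf-8"))


def batches(products: Iterable[dict]) -> Iterator[list[dict]]:
    """Yield product lists whose {"objects": [...]} body stays within the limits."""
    batch: list[dict] = []
    size = ENVELOPE_BYTES
    for product in products:
        product_bytes = encoded_size(product)
        if product_bytes > MAX_PRODUCT_BYTES:
            raise ValueError(
                f"{product.get('ec_product_id')}: {product_bytes} bytes exceeds "
                f"the {MAX_PRODUCT_BYTES}-byte product limit")
        extra = product_bytes + (SEPARATOR_BYTES if batch else 0)
        if batch and (len(batch) >= MAX_ITEMS or size + extra > MAX_REQUEST_BYTES):
            yield batch
            batch, size = [], ENVELOPE_BYTES
            extra = product_bytes  # first item of a new batch has no separator
        batch.append(product)
        size += extra
    if batch:
        yield batch


class Throttle:
    """Space calls at least `interval` seconds apart (1 request/second/source)."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self._last = float("-inf")

    def wait(self) -> None:
        delay = self._last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


if __name__ == "__main__":
    demo = [{"ec_product_id": f"p{i}"} for i in range(25_000)]
    throttle = Throttle()
    for batch in batches(demo):
        throttle.wait()  # then send the PUT request for this batch
        print(len(batch))
```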
Error handling and troubleshooting
Errors during ingestion are surfaced asynchronously in the Log Browser (platform-ca | platform-eu | platform-au) page. The Log Browser provides structured, actionable messages to facilitate quick resolution. Implementers must monitor these logs to detect and respond to issues promptly.
Items that don’t conform to the schema or contain invalid data will be rejected during ingestion. The error messages will indicate the specific issues with the data, such as missing required fields or incorrect data types.