Catalog Ingestion API
The Catalog Schema and Ingestion APIs are currently in closed beta. Contact your Coveo representative to learn about these APIs and how to get involved.
The Catalog Schema and Ingestion APIs offer a streamlined approach to managing catalog data indexing and updates, simplifying data integration and maintenance within a Coveo organization.
The Catalog Ingestion API provides an improved alternative to the existing catalog data operations of the Coveo Stream API. It supports full ingestion, partial updates, and data deletion. The Catalog Ingestion API ingests catalog data by validating it against the schemas you’ve defined, ensuring the integrity and consistency of data from the moment it enters the Coveo index.
Leading practices
Limitations
This section outlines the current limitations of using the Catalog Ingestion and Schema APIs to manage your catalog data:
- The APIs currently only support the ingestion of Product catalog object items. This means that if your catalog data contains items of the Variant or Availability types, you can’t use these APIs yet.
- The APIs don’t currently support the ingestion of dictionary fields. However, the team is actively working on an approach to handle them.
Full rebuild of product data
A full rebuild replaces all existing products in the source with the new data provided in the request, meaning that any products which aren’t included in the request will be removed. To perform the initial ingestion of product data or to completely replace existing product data, use the following endpoint:
PUT /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/object-types/<OBJECT_TYPE>/objects HTTP/1.1
Where:
- <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
- <SOURCE_ID> is the identifier of the source in which you want to ingest the product data. This source is the one created automatically when you created the catalog schema. To learn how to find the source ID for a given source, see Copy a source name or ID. To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.
- <OBJECT_TYPE> is the type of catalog object that the items you’re ingesting pertain to. Currently, only PRODUCT is supported.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
In the request body, you must provide a JSON object that contains an array of product objects to be ingested.
Example request body:
{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "ec_name": "Gulp! Kayak",
      "ec_category": [
        "Canoes & Kayaks|Kayaks|Sea Kayaks",
        "Promotions|Kayaks"
      ],
      "ec_description": "Perfect for exploring the great outdoors.",
      "ec_shortdesc": "A kayak for the adventurous.",
      "ec_brand": "Gulp!",
      "ec_thumbnails": [
        "https://example.com/images/gulp-kayak.jpg",
        "https://example.com/images/gulp-kayak-2.jpg"
      ],
      "ec_images": [
        "https://example.com/images/gulp-kayak-large.jpg",
        "https://example.com/images/gulp-kayak-large-2.jpg"
      ],
      "ec_price": 299.99,
      "ec_cogs": 150.00,
      "ec_item_group_id": "gulp-kayak-group",
      "product_size": 42,
      "available_colors": ["red", "blue", "green"],
      "geolocation": 45.4215,
      "added_date": "2025-03-15",
      "timestamp": 1672531199
    },
    {
      "ec_product_id": "gulp-kayak-67890",
      "ec_name": "Gulp! Kayak Pro",
      "ec_category": [
        "Canoes & Kayaks|Kayaks|Professional Kayaks",
        "Promotions|Kayaks"
      ],
      "ec_description": "A professional-grade kayak for serious adventurers.",
      "ec_shortdesc": "Professional kayak for the serious adventurer.",
      "ec_brand": "Gulp!",
      "ec_thumbnails": [
        "https://example.com/images/gulp-kayak-pro.jpg"
      ],
      "ec_images": [
        "https://example.com/images/gulp-kayak-pro-large.jpg"
      ],
      "ec_price": 499.99,
      "ec_cogs": 300.00,
      "ec_item_group_id": "gulp-kayak-group",
      "product_size": 44,
      "available_colors": ["black", "yellow"],
      "geolocation": 45.4215,
      "added_date": "2025-04-01",
      "timestamp": 1672617599
    },
    [...]
  ]
}
Metadata keys with an ec_ prefix contain the catalog data that fills the standard commerce fields.
These fields are essential for the proper functioning of Coveo’s commerce features, such as Coveo ML models and event enrichment.
However, only ec_product_id is strictly required for ingestion. Omitting it results in an error.
The ec_category metadata defines the category hierarchy of your product.
The value must be an array of strings, in which each string represents a complete hierarchical category path using the pipe (|) delimiter.
Each category path should represent the full hierarchy, from the broadest to the most specific level. For example: Canoes & Kayaks|Kayaks|Sea Kayaks
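The ec_category format described above can be sketched in a few lines of Python. This is only an illustration: build_category_path is a hypothetical helper, not part of any Coveo library, and it assumes you hold each hierarchy as a list of level names ordered from broadest to most specific.

```python
def build_category_path(levels):
    """Join category levels, broadest first, into one pipe-delimited path."""
    cleaned = [level.strip() for level in levels]
    if not cleaned or any(not level for level in cleaned):
        raise ValueError("every category level must be a non-empty string")
    if any("|" in level for level in cleaned):
        raise ValueError("a category level can't contain the pipe delimiter")
    return "|".join(cleaned)


# Each entry of ec_category is one complete path, as in the example request body.
ec_category = [
    build_category_path(["Canoes & Kayaks", "Kayaks", "Sea Kayaks"]),
    build_category_path(["Promotions", "Kayaks"]),
]
print(ec_category)
```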
Custom metadata keys defined in the catalog schema can also be included in the product data.
These keys must match the names defined in the schema, and their values must adhere to the specified data types and formats.
For example, if you defined a custom field named product_size of type INTEGER_32, you can include it in the product data as shown above.
Providing values that haven’t been defined in the schema will result in ingestion errors.
If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted.
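As a minimal sketch, the full rebuild call could be prepared in Python as follows. The request is built but not sent. The host in BASE_URL, the bearer-token Authorization header, and the build_full_rebuild_request helper are assumptions made for illustration. Only the path, the PUT method, and the objects envelope come from this page.

```python
import json
import urllib.request

# Assumption: platform host; use the endpoint that matches your region.
BASE_URL = "https://api.cloud.coveo.com"


def build_full_rebuild_request(org_id, source_id, products, token, object_type="PRODUCT"):
    """Return an unsent PUT request that replaces all products in the source."""
    for product in products:
        if not product.get("ec_product_id"):
            raise ValueError("every product needs an ec_product_id")
    url = (
        f"{BASE_URL}/rest/organizations/{org_id}/ingest/v1/sources/{source_id}"
        f"/object-types/{object_type}/objects"
    )
    body = json.dumps({"objects": products}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # assumption: bearer-token auth
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


request = build_full_rebuild_request(
    "myorg",
    "myorg-source-id",
    [{"ec_product_id": "gulp-kayak-12345", "ec_name": "Gulp! Kayak", "ec_price": 299.99}],
    token="<ACCESS_TOKEN>",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. Because a full rebuild removes every product that isn’t in the request, the products list should hold the complete catalog for the source.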
Update product data
The Catalog Ingestion API PATCH endpoint performs partial updates using JSON merge patch semantics, meaning that:
- If the product doesn’t exist, a new product is created with the specified fields.
- If the product already exists, it updates only the specified fields, leaving other fields unchanged.
- A field can be removed from an existing product by setting its value to null.
To update your product data, use the following endpoint:
PATCH /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/object-types/<OBJECT_TYPE>/objects HTTP/1.1
Where:
- <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
- <SOURCE_ID> is the identifier of the source in which you want to update the product data. This source is the one that was automatically generated when you created the catalog schema. To learn how to find the source ID for a given source, see Copy a source name or ID. To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.
- <OBJECT_TYPE> is the type of catalog object that the items you’re updating pertain to. Currently, only PRODUCT is supported.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
In the request body, provide a JSON object that contains an array of product objects to be updated or created.
Example request body:
{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "ec_price": 349.99,
      "available_colors": ["red", "blue"],
      "product_size": null
    },
    {
      "ec_product_id": "new-product-98765",
      "ec_name": "New Kayak Model",
      "ec_price": 199.99,
      "ec_brand": "Gulp!",
      "available_colors": ["red", "blue"]
    }
  ]
}
The ec_product_id field is required and must be of the string type.
If product gulp-kayak-12345 exists, the fields that are provided in the request body will be updated, while all other existing fields will remain unchanged.
Setting product_size to null removes this field from the existing product gulp-kayak-12345.
If product new-product-98765 doesn’t exist, it will be created with the specified fields.
If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted.
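If you keep a copy of the last state you sent, a PATCH object like the first one in the example can be derived rather than written by hand. The following is a minimal sketch under that assumption. build_patch is a hypothetical helper that compares top-level fields only: changed or new fields are included, and fields that disappeared are set to None, which json.dumps serializes as null.

```python
import json


def build_patch(current, desired):
    """Return the PATCH object that turns `current` into `desired`."""
    if current["ec_product_id"] != desired["ec_product_id"]:
        raise ValueError("both states must describe the same product")
    patch = {"ec_product_id": desired["ec_product_id"]}
    for key, value in desired.items():
        if current.get(key) != value:
            patch[key] = value  # new or changed field
    for key in current:
        if key not in desired:
            patch[key] = None  # serialized as JSON null, which removes the field
    return patch


current = {
    "ec_product_id": "gulp-kayak-12345",
    "ec_name": "Gulp! Kayak",
    "ec_price": 299.99,
    "available_colors": ["red", "blue", "green"],
    "product_size": 42,
}
desired = {
    "ec_product_id": "gulp-kayak-12345",
    "ec_name": "Gulp! Kayak",
    "ec_price": 349.99,
    "available_colors": ["red", "blue"],
}
patch = build_patch(current, desired)
print(json.dumps({"objects": [patch]}, indent=2))
```

Unchanged fields such as ec_name are left out of the patch, which keeps the request small.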
Update scenarios and examples
This section illustrates how the PATCH endpoint behaves in the following scenarios:
Scenario 1: Update existing product fields
Initial state:
{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",
  "ec_price": 299.99,
  "ec_brand": "Gulp!",
  "available_colors": ["red", "blue", "green"]
}
PATCH request:
{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "ec_price": 349.99,
      "available_colors": ["red", "yellow"]
    }
  ]
}
Final state:
{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",             // unchanged
  "ec_price": 349.99,                   // updated
  "ec_brand": "Gulp!",                  // unchanged
  "available_colors": ["red", "yellow"] // updated
}
Scenario 2: Remove fields from existing products
Initial state:
{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",
  "ec_price": 299.99,
  "ec_brand": "Gulp!",
  "available_colors": ["red", "blue", "green"]
}
PATCH request:
{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345",
      "ec_price": null,
      "available_colors": null
    }
  ]
}
Final state:
{
  "ec_product_id": "gulp-kayak-12345",
  "ec_name": "Gulp! Kayak",
  "ec_brand": "Gulp!"
}
Scenario 3: Create a new product
PATCH request:
{
  "objects": [
    {
      "ec_product_id": "minimal-product-001",
      "ec_name": "Basic Product",
      "ec_price": 29.99
    }
  ]
}
This creates a new product with only the three specified fields.
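The three scenarios can be reproduced locally with a small model of the merge behavior. This is a sketch for reasoning about the expected final state, not the service’s implementation. apply_patch is a hypothetical function that handles top-level fields only, and it assumes arrays are replaced as a whole, as Scenario 1 suggests.

```python
def apply_patch(existing, patch):
    """Apply one PATCH object to a stored product (None if it doesn't exist)."""
    result = dict(existing or {})
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the field
        else:
            result[key] = value  # create or overwrite; arrays are replaced whole
    return result


initial = {
    "ec_product_id": "gulp-kayak-12345",
    "ec_name": "Gulp! Kayak",
    "ec_price": 299.99,
    "ec_brand": "Gulp!",
    "available_colors": ["red", "blue", "green"],
}

# Scenario 1: update existing fields.
scenario_1 = apply_patch(initial, {
    "ec_product_id": "gulp-kayak-12345",
    "ec_price": 349.99,
    "available_colors": ["red", "yellow"],
})

# Scenario 2: remove fields with null (None in Python).
scenario_2 = apply_patch(initial, {
    "ec_product_id": "gulp-kayak-12345",
    "ec_price": None,
    "available_colors": None,
})

# Scenario 3: the product doesn't exist yet, so it's created.
scenario_3 = apply_patch(None, {
    "ec_product_id": "minimal-product-001",
    "ec_name": "Basic Product",
    "ec_price": 29.99,
})
print(scenario_1, scenario_2, scenario_3, sep="\n")
```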
Delete product data
The Ingestion API supports two methods to delete product data:
- Bulk deletion of multiple products: This allows you to specify a list of product identifiers to delete in a single request.
- Deletion of products older than a specified timestamp: This allows you to delete products that were added before a specified timestamp.
Bulk deletion of multiple products
To delete multiple products, use the following endpoint:
POST /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/object-types/<OBJECT_TYPE>/objects/bulk-delete HTTP/1.1
Where:
- <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
- <SOURCE_ID> is the identifier of the source from which you want to delete the product data. To find the source IDs tied to existing schemas, use the View all schemas endpoint of the Catalog Schema API.
- <OBJECT_TYPE> is the type of catalog object that the items you’re deleting pertain to. Currently, only PRODUCT is supported.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
In the request body, provide a JSON object that contains an array of product identifiers to delete.
Example request body:
{
  "objects": [
    {
      "ec_product_id": "gulp-kayak-12345"
    },
    {
      "ec_product_id": "gulp-kayak-67890"
    },
    {
      "ec_product_id": "surf-co-repair-kit-54321"
    }
  ]
}
Each object in the objects array must contain the ec_product_id field, which specifies the unique identifier of the product to be deleted.
In this example, three products are being deleted.
If the request is accepted for processing, you get an HTTP 202 Accepted response.
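The bulk deletion body is simple to generate from a list of identifiers. Here is a minimal sketch in which build_bulk_delete_body is a hypothetical helper. Removing duplicate identifiers is a convenience added here, not a documented requirement.

```python
import json


def build_bulk_delete_body(product_ids):
    """Wrap product identifiers in the objects envelope used by bulk-delete."""
    unique_ids = list(dict.fromkeys(product_ids))  # drop duplicates, keep order
    if any(not isinstance(pid, str) or not pid for pid in unique_ids):
        raise ValueError("every ec_product_id must be a non-empty string")
    return {"objects": [{"ec_product_id": pid} for pid in unique_ids]}


body = build_bulk_delete_body([
    "gulp-kayak-12345",
    "gulp-kayak-67890",
    "surf-co-repair-kit-54321",
    "gulp-kayak-12345",  # duplicate, ignored
])
print(json.dumps(body, indent=2))
```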
Deletion of products older than a specified timestamp
You can delete products that are older than a specified timestamp. This is useful for removing outdated products from the index after performing a full catalog data update, for example.
To delete products older than a specified timestamp, use the following endpoint:
DELETE /rest/organizations/<ORGANIZATION_ID>/ingest/v1/sources/<SOURCE_ID>/objects/older-than/<ORDERING_ID> HTTP/1.1
Where:
- <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
- <SOURCE_ID> is the identifier of the source from which you want to delete the product data.
- <ORDERING_ID> is a Unix timestamp (in milliseconds) that specifies the cutoff date for deletion. Products that were indexed before this timestamp will be deleted.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
If the request is accepted for processing, you’ll get the HTTP response code 202 Accepted. A 15-minute delay may occur before the deletions are reflected in the index.
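Since the cutoff is expressed in milliseconds, it’s easy to pass a value in seconds by mistake. The sketch below converts a timezone-aware datetime to a millisecond timestamp with integer arithmetic and builds the request path. Both to_ordering_id and build_older_than_path are hypothetical helpers, and the organization and source identifiers are placeholders.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ordering_id(moment):
    """Convert a timezone-aware datetime to a Unix timestamp in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def build_older_than_path(org_id, source_id, moment):
    """Build the DELETE path for products indexed before `moment`."""
    return (
        f"/rest/organizations/{org_id}/ingest/v1/sources/{source_id}"
        f"/objects/older-than/{to_ordering_id(moment)}"
    )


cutoff = datetime(2025, 3, 15, tzinfo=timezone.utc)
path = build_older_than_path("myorg", "myorg-source-id", cutoff)
print(path)
```

One possible pattern is to record the time just before a full catalog update starts and use it as the cutoff once the update has been processed.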
Ingestion API limits
The Ingestion API enforces specific limits to ensure optimal performance and resource usage [1]:
- The maximum request batch size is 10,000 items, and the maximum request size is 20 MB. If you experience timeout errors, consider reducing the batch size or request size.
  While the Stream API supports larger batch files (up to 256 MB), the Ingestion API intentionally uses smaller request sizes (20 MB) to reduce the risk of transmission timeouts and ensure faster, incremental processing. Smaller batches spread load more evenly, resulting in smoother and quicker indexing.
- STRING field limits:
  - STRING field names (keys) must not exceed 255 characters.
  - STRING field values must not exceed 1,000 characters.
  - Multivalue STRING fields can contain up to 100 values. Each of these values must not exceed 50 characters.
- DATE field values must not exceed 50 characters. Dates must use the ISO 8601 format.
- The API allows a maximum of 1 request per second per source.
- The maximum size for a product document is 10 KB.
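To stay within the item-count and request-size limits, a list of PATCH objects can be split into batches before sending. The following is a sketch under stated assumptions: 20 MB and 10 KB are read as binary units, sizes are measured on the output of json.dumps, and split_into_batches is a hypothetical helper. This page doesn’t say how a full rebuild larger than one request should be handled, so the sketch targets partial updates.

```python
import json

MAX_ITEMS_PER_REQUEST = 10_000
MAX_REQUEST_BYTES = 20 * 1024 * 1024  # assumption: 20 MB read as 20 MiB
MAX_PRODUCT_BYTES = 10 * 1024  # assumption: 10 KB read as 10 KiB


def encoded_size(obj):
    """Size in bytes of the object as serialized by json.dumps."""
    return len(json.dumps(obj).encode("utf-8"))


def split_into_batches(products, max_items=MAX_ITEMS_PER_REQUEST, max_bytes=MAX_REQUEST_BYTES):
    """Group products so each batch respects the item and size limits."""
    envelope = encoded_size({"objects": []})
    batches, current, current_bytes = [], [], 0
    for product in products:
        size = encoded_size(product)
        if size > MAX_PRODUCT_BYTES:
            raise ValueError(f"{product.get('ec_product_id')} exceeds the product size limit")
        extra = size + 2  # the product plus a ", " separator (slight overestimate)
        too_many = len(current) >= max_items
        too_big = envelope + current_bytes + extra > max_bytes
        if current and (too_many or too_big):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(product)
        current_bytes += extra
    if current:
        batches.append(current)
    return batches


products = [{"ec_product_id": f"p-{i}"} for i in range(25_000)]
batches = split_into_batches(products)
print([len(batch) for batch in batches])
```

Given the limit of 1 request per second per source, a sender would also need to wait at least one second between batches sent to the same source.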
Error handling and troubleshooting
Errors during ingestion are surfaced asynchronously in the Log Browser (platform-ca | platform-eu | platform-au) page. The Log Browser provides structured, actionable messages to facilitate quick resolution. Implementers must monitor these logs to detect and respond to issues promptly.
Items that don’t conform to the schema or contain invalid data will be rejected during ingestion. The error messages will indicate the specific issues with the data, such as missing required fields or incorrect data types.
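Because errors surface asynchronously, a local check before sending may catch some rejections early. The sketch below covers only the ec_product_id requirement and the STRING limits listed on this page. find_problems is a hypothetical helper, and it makes a simplifying assumption: any Python str value is treated as a STRING field and any list of str as a multivalue STRING field, whereas the actual type comes from your catalog schema.

```python
MAX_KEY_CHARS = 255
MAX_STRING_CHARS = 1_000
MAX_MULTIVALUE_COUNT = 100
MAX_MULTIVALUE_ITEM_CHARS = 50


def find_problems(product):
    """Return a list of human-readable problems found in one product object."""
    problems = []
    product_id = product.get("ec_product_id")
    if not isinstance(product_id, str) or not product_id:
        problems.append("ec_product_id is missing or not a string")
    for key, value in product.items():
        if len(key) > MAX_KEY_CHARS:
            problems.append(f"key {key[:20]}... exceeds {MAX_KEY_CHARS} characters")
        if isinstance(value, str) and len(value) > MAX_STRING_CHARS:
            problems.append(f"{key} exceeds {MAX_STRING_CHARS} characters")
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            if len(value) > MAX_MULTIVALUE_COUNT:
                problems.append(f"{key} has more than {MAX_MULTIVALUE_COUNT} values")
            if any(len(item) > MAX_MULTIVALUE_ITEM_CHARS for item in value):
                problems.append(f"{key} has a value over {MAX_MULTIVALUE_ITEM_CHARS} characters")
    return problems


good = {"ec_product_id": "gulp-kayak-12345", "available_colors": ["red", "blue"]}
bad = {"ec_name": "No identifier", "ec_description": "x" * 1_001}
print(find_problems(good))
print(find_problems(bad))
```

A check like this doesn’t replace monitoring the Log Browser, since schema validation happens on the service side.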