Catalog Schema and Ingestion APIs
The Catalog Schema and Ingestion APIs are currently in closed beta. Contact your Coveo representative to learn about these APIs and how to get involved.
The Catalog Schema and Ingestion APIs offer a streamlined approach to managing catalog data indexing and updates, simplifying data integration and maintenance within a Coveo organization.
The Catalog Schema API serves as your starting point by letting you define the metadata keys you’ll want to index in your Coveo organization. This schema is used to validate your data as it’s being indexed. Additionally, the Catalog Schema API automates resource creation tasks previously done manually, such as setting up your Catalog source, catalog entity, and catalog configuration.
The Catalog Ingestion API provides an improved alternative to the existing Full and Partial catalog data update operations of the Coveo Stream API. It ingests catalog data by validating it against the schemas you’ve defined, ensuring data integrity and consistency from the moment it enters the Coveo index.
The following table summarizes the key differences between using the new Catalog Schema and Ingestion APIs versus the Stream API:
Capability | Catalog Schema and Ingestion APIs | Stream API |
---|---|---|
Data validation | Built-in validation against defined schemas. | Manual validation of data structure. |
Organization management | Automatic creation of the Catalog source, catalog entity, catalog configuration, and fields. | Manual resource creation required. |
Limitations
This section outlines the current limitations of using the Catalog Ingestion and Schema APIs to manage your catalog data:
-
The APIs currently only support the ingestion of Product catalog object items. This means that if your catalog data contains items of the Variant or Availability types, you can’t use these APIs yet.
-
The APIs don’t currently support the ingestion of dictionary fields. However, the team is actively working on an approach to handle them.
Leading practices
When using the Catalog Schema and Ingestion APIs, consider the following leading practices:
-
Always create a schema using the Schema API before attempting to ingest data.
-
Avoid manually modifying the following resources, as these APIs manage their configurations automatically:
-
Automatically created fields.
Don’t modify the field Type and Multi-value facet options for automatically created fields. You can modify other field options as needed.
-
Automatically created Catalog sources.
-
Automatically created Catalog entities.
-
Automatically created Catalog configurations.
-
Don’t apply indexing pipeline extensions (IPEs) or modify the source mappings for the sources created by the Catalog Schema API.
Working with the APIs
To work with the Catalog Schema and Ingestion APIs:
-
Use the Catalog Schema API to define a schema that specifies the structure of your product data.
-
Use the Catalog Ingestion API to submit product data for indexing.
Step 1: Define a catalog schema
The Catalog Schema API lets you define the structure of your product data. This schema-driven approach ensures that all incoming data is validated against the defined structure, reducing errors and improving data quality.
Defining a schema is only relevant for custom data that you want to index which isn’t already covered by the standard fields provided by Coveo. Coveo already enforces a schema to ensure that the required standard fields are present in the product data you ingest and that they comply with the specified data types and formats. These standard fields are crucial for the proper functioning of Coveo’s commerce features, such as Coveo Machine Learning (Coveo ML) models and event enrichment.
The following table lists the standard fields that are enforced by the Coveo schema, and for which you don’t need to define a schema.
Standard fields enforced by the Coveo schema
Field | Description |
---|---|
ec_product_id | The product’s unique identifier within a single source. |
ec_name | The product’s name. |
ec_description | The product’s description. |
ec_shortdesc | A short description of the product. |
ec_price | The product’s base price. |
ec_item_group_id | An identifier used for product grouping. |
ec_thumbnails | A collection of lower-resolution product images used for faster page load time (URL format). |
ec_images | A collection of high-resolution product images used to view product details (URL format). |
ec_brand | The product’s brand. |
ec_cogs | The product’s cost of goods sold (COGS). Used to calculate the product margin. |
For all the other catalog data that you want to index, you must create a schema that defines the custom fields and their types.
For example, if you want to index product attributes like product_size, available_colors, or gender, you would define a schema that includes these fields and their data types.
You must create a schema for each locale you’re supporting.
The company Barca sells products in Canada and the US. In Canada, they support French and English. In the US, they support English and Spanish.
They have the following storefronts:
-
Barca Canada (www.barca.com/ca/). This storefront supports French (www.barca.com/ca/fr/) and English (www.barca.com/ca/en/).
-
Barca US (www.barca.com/us/). This storefront supports English (www.barca.com/us/en/) and Spanish (www.barca.com/us/es/).
This setup requires the creation of four schemas:
-
One for the English variation of the Canadian storefront.
-
One for the French variation of the Canadian storefront.
-
One for the English variation of the US storefront.
-
One for the Spanish variation of the US storefront.
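As a rough sketch, the storefront-to-schema mapping above could be enumerated as follows. The `STOREFRONTS` structure and the `schema_names` helper are hypothetical, the `USD` currency code for the US storefront is an assumption, and the naming pattern simply mirrors the `Barca en-CA-CAD` example name used in the request body later on this page.

```python
# Hypothetical description of Barca's storefronts: country -> (currency, languages).
# The USD currency code is an assumption; only en-CA-CAD appears in this article.
STOREFRONTS = {
    "CA": ("CAD", ["en", "fr"]),
    "US": ("USD", ["en", "es"]),
}


def schema_names(company, storefronts):
    """Return one schema name per (language, country) pair."""
    names = []
    for country, (currency, languages) in storefronts.items():
        for language in languages:
            names.append(f"{company} {language}-{country}-{currency}")
    return names


names = schema_names("Barca", STOREFRONTS)
for name in names:
    print(name)
```

Each generated name would then be used as the top-level name of one schema creation request.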
Resource creation
The Catalog Schema API is designed to simplify the process of creating and managing a commerce-focused Coveo organization. It removes manual configuration tasks and streamlines the onboarding process.
When a schema is created, the Catalog Schema API automatically sets up the underlying resources needed within the Coveo organization. For each schema, the following related resources are automatically created:
-
A Catalog source that will store your catalog data.
-
All of the custom fields defined in the schema, if they don’t already exist.
Notes
-
If a custom field already exists in your organization, but its definition differs from what’s provided in your schema, the schema creation process won’t update the existing field. Any attempt to ingest data into a field whose definition differs from your schema will result in an ingestion error.
-
Don’t modify the following field options for automatically created fields:
-
Field type
-
Multi-value facet
-
All Coveo commerce standard fields and their mappings are automatically created in the Coveo organization.
Create a catalog schema
To create a catalog schema, you must use the Catalog Schema API. This section provides the details on how to use this API.
You must use the following endpoint:
POST /rest/organizations/<ORGANIZATION_ID>/catalogs/unstable/schemas HTTP/1.1
Where <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
Note
To use this endpoint, you must have the Edit access level for the Catalog setup domain.
In the request body, you must provide a JSON object that defines the schema. The schema represents the structure of your product data. It includes the fields and their types.
Example request body:
{
"name": "Barca en-CA-CAD",
"objectTypes": [
{
"name": "PRODUCT",
"customFields": [
{
"name": "product_size",
"type": {
"dataType": "INTEGER_32",
"multiValue": false
}
},
{
"name": "available_colors",
"type": {
"dataType": "STRING",
"multiValue": true
}
},
{
"name": "geolocation",
"type": {
"dataType": "DECIMAL",
"multiValue": false
}
},
{
"name": "added_date",
"type": {
"dataType": "DATE",
"multiValue": false
}
},
{
"name": "timestamp",
"type": {
"dataType": "INTEGER_64",
"multiValue": false
}
}
]
}
]
}
The top-level name key is a human-readable identifier for the schema.
You must create a schema for each locale you’re supporting, so you should include the locale in the name.
This name is also used for the resources that are automatically created when the schema is created.
The name in the objectTypes property specifies the type of catalog object that you’re defining for this array.
Currently, only PRODUCT is supported.
Each name in the customFields property specifies a custom metadata key to index from your catalog data.
For example, if your catalog data has product_size metadata that you want to index, set name to product_size.
This creates a corresponding Coveo field named product_size.
Custom field names must comply with specific naming rules. Not complying with these rules results in an error when you try to create the schema.
Custom field names are also case-sensitive.
dataType specifies the type of data for the possible values the metadata key can have.
The example above uses the INTEGER_32, STRING, DECIMAL, DATE, and INTEGER_64 values.
multiValue indicates whether the key can have multiple values.
For example, if available_colors can have multiple colors, such as red, blue, or green, you would set multiValue to true.
Allowed values are true and false.
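The following is a minimal sketch of how the schema creation request could be assembled with the Python standard library. The base URL, the bearer-token `Authorization` header, and the `build_create_schema_request` helper are assumptions that aren’t stated in this article; only the endpoint path and the body structure come from the example above. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: the platform host and bearer-token authentication are not stated
# in this article; adjust both to match your deployment.
BASE_URL = "https://platform.cloud.coveo.com"


def build_create_schema_request(organization_id, token, schema):
    """Build (but don't send) the POST request that creates a catalog schema."""
    url = f"{BASE_URL}/rest/organizations/{organization_id}/catalogs/unstable/schemas"
    return urllib.request.Request(
        url,
        data=json.dumps(schema).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# A shortened version of the example request body.
schema = {
    "name": "Barca en-CA-CAD",
    "objectTypes": [
        {
            "name": "PRODUCT",
            "customFields": [
                {"name": "product_size", "type": {"dataType": "INTEGER_32", "multiValue": False}},
                {"name": "available_colors", "type": {"dataType": "STRING", "multiValue": True}},
            ],
        }
    ],
}

request = build_create_schema_request("myorg", "my-token", schema)
# To actually send the request: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything reaches the API.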
Response
The response to a successful schema creation request includes the details of the created schema, including its unique identifier and the fields defined.
Example response:
{
"id": "XXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"name": "Barca en-CA-CAD",
"objectTypes": [
{
"name": "PRODUCT",
"customFields": [
{
"name": "product_size",
"type": {
"dataType": "INTEGER_32",
"multiValue": false
}
},
{
"name": "available_colors",
"type": {
"dataType": "STRING",
"multiValue": true
}
},
{
"name": "geolocation",
"type": {
"dataType": "DECIMAL",
"multiValue": false
}
},
{
"name": "added_date",
"type": {
"dataType": "DATE",
"multiValue": false
}
},
{
"name": "timestamp",
"type": {
"dataType": "INTEGER_64",
"multiValue": false
}
}
]
}
]
}
The id is the unique identifier of the created schema.
The customFields array contains the fields defined in the schema.
It includes their names, their types, and whether they can have multiple values.
Update a catalog schema
To update a catalog schema, you must use the following endpoint of the Catalog Schema API to submit the updated schema definition:
PUT /rest/organizations/<ORGANIZATION_ID>/catalogs/unstable/schemas/<SCHEMA_ID> HTTP/1.1
Where:
-
<ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
-
<SCHEMA_ID> is the unique identifier of the schema that you want to update.
Note
To use this endpoint, you must have the Edit access level for the Catalog setup domain.
In the request body, you must provide a JSON object that defines the updated schema. For a complete example, refer to the response in the Create a catalog schema section.
Partially updating a schema isn’t currently supported. You must provide the full schema definition, including all fields and their types, even if you’re only updating a single field. If you omit a field from the updated schema, it will be removed from the schema.
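Because the whole definition must be resubmitted, one possible approach is to start from the current schema and derive the complete updated definition from it. The sketch below assumes you already hold the current schema as a Python dictionary; the `with_custom_field` helper is hypothetical and not part of the API.

```python
import copy


def with_custom_field(schema, object_type, name, data_type, multi_value=False):
    """Return a full copy of `schema` with one custom field added or replaced."""
    updated = copy.deepcopy(schema)  # leave the caller's copy untouched
    for entry in updated["objectTypes"]:
        if entry["name"] == object_type:
            # Keep every other field so that nothing is dropped by the update.
            fields = [f for f in entry["customFields"] if f["name"] != name]
            fields.append(
                {"name": name, "type": {"dataType": data_type, "multiValue": multi_value}}
            )
            entry["customFields"] = fields
            return updated
    raise ValueError(f"Object type {object_type!r} not found in schema")


current = {
    "name": "Barca en-CA-CAD",
    "objectTypes": [
        {
            "name": "PRODUCT",
            "customFields": [
                {"name": "product_size", "type": {"dataType": "INTEGER_32", "multiValue": False}},
            ],
        }
    ],
}

updated = with_custom_field(current, "PRODUCT", "gender", "STRING")
```

The resulting `updated` dictionary contains both the existing and the new field, so it can serve as the full request body.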
Response (200 OK)
The response to a successful schema update request includes the details of the updated schema. For a complete example, refer to the response in the Create a catalog schema section.
View catalog schemas
To view the existing catalog schemas in your Coveo organization, you must use the following endpoint of the Catalog Schema API:
GET /rest/organizations/<ORGANIZATION_ID>/catalogs/unstable/schemas HTTP/1.1
Where <ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
Note
To use this endpoint, you must have the View access level for the Catalog setup domain.
Response (200 OK)
The response to a successful schema retrieval request includes the details of the existing schemas in your Coveo organization. For a complete example, refer to the response section of the Create a catalog schema request.
Delete a catalog schema
To delete a catalog schema, you must use the following endpoint of the Catalog Schema API:
DELETE /rest/organizations/<ORGANIZATION_ID>/catalogs/unstable/schemas/<SCHEMA_ID> HTTP/1.1
Where:
-
<ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
-
<SCHEMA_ID> is the unique identifier of the schema you want to delete.
Note
To use this endpoint, you must have the Edit access level for the Catalog setup domain.
A successful response (200 OK) indicates that the schema has been deleted.
Deleting a schema will also delete the following resources, which were automatically generated when the schema was created:
-
The Catalog source associated with the schema.
-
The catalog entity and catalog configuration associated with the schema.
Note
Fields created in the Coveo organization when the schema was created won’t be deleted.
Step 2: Ingest product data
Once you’ve created a catalog schema, you must use the Catalog Ingestion API to submit product data for indexing.
The Catalog Ingestion API allows you to send product data in a structured format that adheres to the schema you defined in step 1. The product data that you submit must match the schema structure, including field names and types.
Note
The Catalog Ingestion API doesn’t currently support partial item updates. As a result, you must provide the full product data for each item you want to add or update. When you update an item, its existing data is replaced with the new data you provide. If certain fields are omitted in the update, those fields will no longer be used by that item. All operation types are expected to be supported by the end of 2025.
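Since an update replaces the whole item, one way to avoid accidentally dropping fields is to overlay your changes on the last full version of the item before sending it. This sketch assumes you keep that full version on your side (for example, in your product information system); the `merge_for_full_update` helper is hypothetical.

```python
def merge_for_full_update(existing_item, changes):
    """Return the complete item to send: existing data overlaid with the changes."""
    merged = dict(existing_item)  # shallow copy so the stored item isn't mutated
    merged.update(changes)
    return merged


existing = {
    "ec_product_id": "gulp-kayak-12345",
    "ec_name": "Gulp! Kayak",
    "ec_brand": "Gulp!",
    "ec_price": 299.99,
}

# Only the price changes, but the full item is what gets submitted.
payload_item = merge_for_full_update(existing, {"ec_price": 279.99})
```

Here `payload_item` still carries `ec_name` and `ec_brand`, so those values survive the update.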
About standard fields
Coveo provides a set of standard fields that are automatically created when you create a catalog schema. These fields are used to store common product attributes and metadata. They’re essential for the proper functioning of Coveo’s commerce features, such as Coveo ML models, the Coveo Merchandising Hub (CMH), and event enrichment.
The following table lists the standard commerce fields, the type of data that must be provided, and the Coveo features that they serve:
Field | Type | Description | multiValue value | Required for |
---|---|---|---|---|
ec_product_id | | The product’s unique identifier within a single source. | | Mandatory. Items sent without this field will be rejected. |
ec_name | | The product’s name. | | |
ec_description | | The product’s description. | | ML models |
ec_shortdesc | | A short description of the product. | | ML models |
ec_price | | The product’s base price. | | |
ec_item_group_id | | An identifier used to group similar products together. | | |
ec_thumbnails | | A collection of lower-resolution product images used for faster page load time (URI format). | | CMH |
ec_images | | A collection of high-resolution product images used to view product details (URI format). | | CMH |
ec_brand | | The product’s brand. | | |
ec_cogs | | The product’s cost of goods sold (COGS). Used to calculate the product margin. | | ML models |
Ingest product data
To ingest product data, you must use the following endpoint:
PUT /rest/organizations/<ORGANIZATION_ID>/ingest/unstable/sources/<SOURCE_ID>/object-types/<OBJECT_TYPE> HTTP/1.1
Where:
-
<ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
-
<SOURCE_ID> is the identifier of the source where you want to ingest the product data. This source is the one created automatically when you created the catalog schema. To learn how to find the source ID for a given source, see Copy a source name or ID.
-
<OBJECT_TYPE> is the type of catalog object that the items you’re ingesting pertain to. Currently, only PRODUCT is supported.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
In the request body, you must provide a JSON object that contains an array of product objects to be ingested.
Example request body:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345",
"ec_name": "Gulp! Kayak",
"ec_description": "Perfect for exploring the great outdoors.",
"ec_shortdesc": "A kayak for the adventurous.",
"ec_brand": "Gulp!",
"ec_thumbnails": [
"https://example.com/images/gulp-kayak.jpg",
"https://example.com/images/gulp-kayak-2.jpg"
],
"ec_images": [
"https://example.com/images/gulp-kayak-large.jpg",
"https://example.com/images/gulp-kayak-large-2.jpg"
],
"ec_price": 299.99,
"ec_cogs": 150.00,
"ec_item_group_id": "gulp-kayak-group",
"product_size": 42,
"available_colors": ["red", "blue", "green"],
"geolocation": 45.4215,
"added_date": "2025-03-15",
"timestamp": 1672531199
},
{
"ec_product_id": "gulp-kayak-67890",
"ec_name": "Gulp! Kayak Pro",
"ec_description": "A professional-grade kayak for serious adventurers.",
"ec_shortdesc": "Professional kayak for the serious adventurer.",
"ec_brand": "Gulp!",
"ec_thumbnails": [
"https://example.com/images/gulp-kayak-pro.jpg"
],
"ec_images": [
"https://example.com/images/gulp-kayak-pro-large.jpg"
],
"ec_price": 499.99,
"ec_cogs": 300.00,
"ec_item_group_id": "gulp-kayak-group",
"product_size": 44,
"available_colors": ["black", "yellow"],
"geolocation": 45.4215,
"added_date": "2025-04-01",
"timestamp": 1672617599
},
[...]
]
}
Metadata keys with an ec_ prefix contain the catalog data that will fill the standard fields defined in the previous section.
These fields are essential for the proper functioning of Coveo’s commerce features, such as Coveo ML models and event enrichment.
Custom metadata keys defined in the catalog schema can also be included in the product data.
These keys must match the names defined in the schema, and their values must adhere to the specified data types and formats.
For example, if you defined a custom field named product_size of type INTEGER_32, you can include it in the product data as shown above.
If the request is successful, you’ll get the HTTP response code 200 OK.
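Because rejected items only surface later in the logs, it can help to run a quick local check before submitting a batch. The following sketch compares product objects against the custom fields of a schema. The mapping from each data type to a Python check is an assumption made for illustration and not the API’s actual validation logic, and the `find_problems` helper is hypothetical.

```python
def _is_int(value, bits):
    """True if `value` is an integer (not a bool) that fits in a signed `bits`-bit range."""
    limit = 2 ** (bits - 1)
    return isinstance(value, int) and not isinstance(value, bool) and -limit <= value < limit


# Assumed local checks per schema data type (not the API's own rules).
CHECKS = {
    "STRING": lambda v: isinstance(v, str),
    "DATE": lambda v: isinstance(v, str),
    "DECIMAL": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "INTEGER_32": lambda v: _is_int(v, 32),
    "INTEGER_64": lambda v: _is_int(v, 64),
}


def find_problems(item, custom_fields):
    """Return human-readable problems found in one product object."""
    problems = []
    if "ec_product_id" not in item:
        problems.append("missing ec_product_id")
    for field in custom_fields:
        name, field_type = field["name"], field["type"]
        if name not in item:
            continue  # assumption: custom fields are optional
        value = item[name]
        # Multi-value fields must be lists; single values are wrapped for a uniform check.
        values = value if field_type["multiValue"] else [value]
        check = CHECKS[field_type["dataType"]]
        if not isinstance(values, list) or not all(check(v) for v in values):
            problems.append(f"{name}: expected {field_type['dataType']}")
    return problems


fields = [
    {"name": "product_size", "type": {"dataType": "INTEGER_32", "multiValue": False}},
    {"name": "available_colors", "type": {"dataType": "STRING", "multiValue": True}},
]
good = {"ec_product_id": "gulp-kayak-12345", "product_size": 42, "available_colors": ["red"]}
bad = {"product_size": "42", "available_colors": "red"}

print(find_problems(good, fields))
print(find_problems(bad, fields))
```

A check like this doesn’t replace the server-side validation; it only catches obvious mismatches earlier.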
Delete product data
The Ingestion API supports two methods to delete product data:
-
Bulk deletion of multiple products: This allows you to specify a list of product identifiers to delete in a single request.
-
Deletion of products older than a specified timestamp: This allows you to delete products that were added before a specified timestamp.
Bulk deletion of multiple products
To delete multiple products, use the following endpoint:
POST /rest/organizations/<ORGANIZATION_ID>/ingest/unstable/sources/<SOURCE_ID>/object-types/<OBJECT_TYPE>/objects/bulk-delete HTTP/1.1
Where:
-
<ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
-
<SOURCE_ID> is the identifier of the source from which you want to delete the product data.
-
<OBJECT_TYPE> is the type of catalog object that the items you’re deleting pertain to. Currently, only PRODUCT is supported.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
In the request body, provide a JSON object that contains an array of product identifiers to delete.
Example request body:
{
"objects": [
{
"ec_product_id": "gulp-kayak-12345"
},
{
"ec_product_id": "gulp-kayak-67890"
},
{
"ec_product_id": "surf-co-repair-kit-54321"
}
]
}
Each object in the objects array must contain the ec_product_id field, which specifies the unique identifier of the product to be deleted.
In this example, three products are being deleted.
If the request succeeds, you get an HTTP 204 No Content response.
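As an illustration, the bulk deletion body shown above could be generated from a plain list of product identifiers. The `build_bulk_delete_body` helper is hypothetical, and removing duplicate identifiers is a convenience choice of this sketch, not a documented requirement.

```python
import json


def build_bulk_delete_body(product_ids):
    """Build the JSON body for a bulk-delete request from product identifiers."""
    unique_ids = list(dict.fromkeys(product_ids))  # drop duplicates, keep order
    return json.dumps({"objects": [{"ec_product_id": pid} for pid in unique_ids]})


body = build_bulk_delete_body(
    ["gulp-kayak-12345", "gulp-kayak-67890", "gulp-kayak-12345"]
)
print(body)
```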
Deletion of products older than a specified timestamp
You can delete products that are older than a specified timestamp. This is useful for removing outdated products from the index after performing a full catalog data update, for example.
To delete products older than a specified timestamp, use the following endpoint:
DELETE /rest/organizations/<ORGANIZATION_ID>/ingest/unstable/sources/<SOURCE_ID>/objects/older-than/<ORDERING_ID> HTTP/1.1
Where:
-
<ORGANIZATION_ID> is the unique identifier of your Coveo organization. To learn how to find the organization ID, see Find your organization ID.
-
<SOURCE_ID> is the identifier of the source from which you want to delete the product data.
-
<ORDERING_ID> is a Unix timestamp (in milliseconds) that specifies the cutoff date for deletion. Products that were indexed before this timestamp will be deleted.
Note
To use this endpoint, you must have the Allow access level for the Push items to sources domain.
If the request is successful, you’ll get the HTTP response code 204 No Content.
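Since the ordering ID is expressed in milliseconds, a common pitfall is passing a timestamp in seconds. The sketch below converts a timezone-aware datetime to milliseconds and builds the request path. The helper names are hypothetical, and `myorg` and `mysource` are placeholder identifiers.

```python
from datetime import datetime, timezone


def ordering_id_ms(moment):
    """Convert a timezone-aware datetime to a Unix timestamp in milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return int(moment.timestamp() * 1000)


def delete_older_than_path(organization_id, source_id, moment):
    """Build the path of the 'older than' deletion endpoint for a cutoff datetime."""
    return (
        f"/rest/organizations/{organization_id}/ingest/unstable/sources/"
        f"{source_id}/objects/older-than/{ordering_id_ms(moment)}"
    )


cutoff = datetime(2025, 1, 1, tzinfo=timezone.utc)
path = delete_older_than_path("myorg", "mysource", cutoff)
print(path)
```

Capturing the cutoff just before a full catalog update starts means that every item not refreshed by that update falls before the cutoff.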
Ingestion API limits
The Ingestion API enforces specific limits to ensure optimal performance and resource usage:
-
The maximum request batch size is 10,000 items, and the maximum request size is 20 MB. If you experience timeout errors, consider reducing the batch size or request size.
While the Stream API supports larger batch files (up to 256 MB), the Ingestion API intentionally uses smaller request sizes (20 MB) to reduce the risk of transmission timeouts and ensure faster, incremental processing. Smaller batches spread load more evenly, resulting in smoother and quicker indexing.
-
STRING field values must not exceed 1,000 characters.
-
Multi-value STRING fields can contain up to 100 values. Each of these values must not exceed 50 characters.
-
DATE field values must not exceed 50 characters. Dates must use the ISO 8601 format.
-
The API allows a maximum of 1 request per second per source.
-
The maximum size for a product document is 10 KB.
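To stay within the batch and size limits above, a client could split its catalog into batches before sending. This is a minimal sketch: the `batch_items` helper is hypothetical, sizes are measured on the serialized JSON, and reading 20 MB and 10 KB as decimal byte counts is a conservative assumption.

```python
import json

MAX_ITEMS = 10_000
MAX_REQUEST_BYTES = 20 * 1000 * 1000  # assumption: 20 MB read as decimal megabytes
MAX_PRODUCT_BYTES = 10 * 1000  # assumption: 10 KB read as decimal kilobytes
ENVELOPE_BYTES = len('{"objects": []}')


def batch_items(items, max_items=MAX_ITEMS, max_bytes=MAX_REQUEST_BYTES):
    """Yield lists of items that stay within the item-count and size limits."""
    batch, batch_bytes = [], ENVELOPE_BYTES
    for item in items:
        size = len(json.dumps(item).encode("utf-8"))
        if size > MAX_PRODUCT_BYTES:
            raise ValueError(f"{item.get('ec_product_id')}: product exceeds {MAX_PRODUCT_BYTES} bytes")
        # +2 accounts for the ", " separator between array elements.
        if batch and (len(batch) >= max_items or batch_bytes + size + 2 > max_bytes):
            yield batch
            batch, batch_bytes = [], ENVELOPE_BYTES
        batch.append(item)
        batch_bytes += size + 2
    if batch:
        yield batch


items = [{"ec_product_id": str(i)} for i in range(25)]
batches = list(batch_items(items, max_items=10))
print([len(b) for b in batches])
```

To respect the limit of 1 request per second per source, a sender could pause for about a second (for example, with `time.sleep(1)`) between batches that target the same source.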
Error handling and troubleshooting
Errors during ingestion are surfaced asynchronously on the Log Browser page. The Log Browser provides structured, actionable messages to facilitate quick resolution. Implementers must monitor these logs to detect and respond to issues promptly.
Items that don’t conform to the schema or contain invalid data will be rejected during ingestion. The error messages will indicate the specific issues with the data, such as missing required fields or incorrect data types.