Full catalog data updates

This is for:

Developer

To fully update your catalog data in your source (typically a Catalog source), you have to interact with the Coveo Stream API. It supports two types of operations to perform full updates on your catalog data, each suited to specific use cases:

  • Update operations: These operations are typically used to update individual items in your source, but they can also update the entire catalog data. An update performs a full document replacement: any field not included in the update payload for an item is removed from that item in the index. However, items not included in the payload are left unchanged and remain in the source.

  • Load operations: Also known as open and close stream, this type of operation overwrites the entire catalog data in your source with the provided data. This means that if you don’t include in the payload an item that was previously indexed, it’s automatically removed from the source.

Prerequisites

To perform the operations listed in this article, you must have:

Leading practices

  • Update operations should be favored over load operations to push or update your catalog data in your source. They provide the same benefits as load operations but are more efficient and have fewer limitations.

    Load operations require uploading and processing all file containers at once, which is resource-intensive and delays data availability until the entire load completes. In contrast, update operations process each container as soon as it’s ready, allowing for faster indexing and more up-to-date catalog data throughout the update process.

  • Any update to your catalog data should be done using either update operations or partial item update operations.

Catalog data structure

To perform full catalog data updates, you need to prepare a JSON file containing your catalog data. This structure can vary in many ways depending on your use case. This is typically a combination of the catalog objects: products, variants, and availabilities. This structure is then used to create a catalog configuration in the Coveo Administration Console.

Important
  • The objecttype metadata is crucial when defining your product data structure, as it categorizes items into the product, variant, or availability catalog objects.

  • If the file size exceeds 256 MB, you must split the content into multiple files. See Uploading large catalog data files for instructions.

  • Availability data can be sent to a separate source, meaning that your setup may not require availability data in the same source as your product and variant data.

The JSON file must contain an object for each item (product, variant, or availability) that you want to index in your source. For instructions on how to configure items for the different catalog object types, see:

Example

The following catalog data (structured in JSON) contains objects that represent products, variants, and availabilities:

{
  "AddOrUpdate": [
    {
     "documentId": "product://001-red",
     "FileExtension": ".html",
     "ec_name": "Coveo Soccer Shoes - Red",
     "model": "Authentic",
     "ec_brand": ["Coveo"],
     "ec_description": "<p>The astonishing, the original, and always relevant Coveo style.</p>",
     "color": ["Red"],
     "ec_item_group_id": "001",
     "ec_product_id": "001-red",
     "ec_images": ["https://myimagegallery?productid"],
     "gender": "Men",
     "ec_price": 28.00,
     "ec_category": "Soccer Shoes",
     "objecttype": "Product"
   },
   {
     "documentId": "variant://001-red-8_wide",
     "FileExtension": ".html",
     "ec_name": "Coveo Soccer Shoes - Red / Size 8 - Wide",
     "ec_variant_id": "001-red-8_wide",
     "productsize": "8",
     "width": "wide",
     "ec_product_id": "001-red",
     "objecttype": "Variant"
   },
   {
      "documentId": "store://s000002",
      "title": "Montreal Store",
      "lat": 45.4975,
      "long": -73.5687,
      "ec_available_items": ["001-red-8_wide","001-red-9_wide","001-red-10_wide","001-red-11_wide", "001-blue-8_wide"],
      "ec_availability_id": "s000002",
      "objecttype": "Availability"
     }
  ]
}
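Before uploading, it can help to validate a payload like the one above programmatically. The following Python sketch is a hypothetical helper (not part of the Stream API) that checks that every item in the addOrUpdate array declares a URI-style documentId and one of the three catalog object types named in this article:

```python
# The three catalog object types described in this article.
VALID_OBJECT_TYPES = {"Product", "Variant", "Availability"}

def validate_catalog(payload):
    """Return a list of validation errors for a catalog payload."""
    errors = []
    for i, item in enumerate(payload.get("addOrUpdate", [])):
        doc_id = item.get("documentId", "")
        if "://" not in doc_id:
            errors.append(f"item {i}: documentId must be a URI, got {doc_id!r}")
        if item.get("objecttype") not in VALID_OBJECT_TYPES:
            errors.append(f"item {i}: unknown objecttype {item.get('objecttype')!r}")
    return errors

# Abbreviated version of the example payload above.
catalog = {
    "addOrUpdate": [
        {"documentId": "product://001-red", "objecttype": "Product"},
        {"documentId": "variant://001-red-8_wide", "objecttype": "Variant"},
        {"documentId": "store://s000002", "objecttype": "Availability"},
    ]
}
assert validate_catalog(catalog) == []
```

Running such a check before each upload catches malformed documentId values early, before they cause indexing errors.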

Update operations

Update operations let you build and update your entire catalog data. They don’t overwrite the entire catalog data in your source: if you don’t include in the payload an item that was previously indexed, that item remains in the source.

If certain metadata exists in the source but is missing from the payload, it’s removed, meaning the item is fully replaced by the new version. This behavior is ideal when you need to update all fields of an item, rather than just a subset, without affecting other items in the source.

Tip
Leading practice

If you only need to update certain metadata in an item (for example, updating a product price), you should use one of the partial catalog data update mechanisms instead.

To perform a full item update, you must interact with the Coveo Stream API. This section guides you through the different actions that must be taken to update your catalog data.

Refer to the Stream API reference for a comprehensive list of required parameters.

Step 1: Create a file container (Update operation)

Important

Make sure that you meet the prerequisites before performing this operation.

To perform a full document update, you must first create an Amazon S3 file container. Use the Create a file container operation to create an Amazon S3 file container for a specific Coveo organization:

Request template

POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/files?useVirtualHostedStyleUrl=<true|false> HTTP/1.1

Accept: application/json
Content-Type: application/json
Authorization: Bearer <MyAccessToken>

In the request path:

In the query string:

  • Optionally, set useVirtualHostedStyleUrl to true if you want the service to return a virtual hosted-style URL, such as coveo-nprod-customerdata.s3.amazonaws.com/.... The default value is currently false, which means that the service returns path-style URLs, such as s3.amazonaws.com/coveo-nprod-customerdata/....

    Important

    The useVirtualHostedStyleUrl query string parameter will soon be deprecated as part of the path-style URL deprecation. From this point onwards, the service will only return virtual hosted-style URLs.

In the Authorization HTTP header:

  • Replace <MyAccessToken> with an access token, such as an API key that has the required privileges to push content to the source.

Payload

{}

The body of a successful response contains important information about the temporary, private, and encrypted Amazon S3 file container that you just created:

{
    "uploadUri": "<UPLOAD-URI>", 1
    "fileId": "<FILE_ID>", 2
    "requiredHeaders": { 3
        "x-amz-server-side-encryption": "AES256",
        "Content-Type": "application/octet-stream"
    }
}
1 The uploadUri property contains a pre-signed URI to use in the PUT request of step 2.
Notes
  • The Amazon S3 file container applies AES-256 server-side encryption to your data.

  • The file container is automatically deleted as soon as its content has been successfully forwarded to the service.

  • The uploadUri automatically expires after 60 minutes.

Therefore, it’s safe to upload sensitive information into the Amazon S3 file container.

2 The fileId property contains the unique identifier of your file container. You must use this value to send the file container to the source in step 3.
3 The requiredHeaders property contains the required HTTP headers for sending in the PUT request of step 2.
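As a rough illustration, step 1 could be scripted as follows with the Python standard library. The helper name, organization ID, and API key below are placeholders, not part of the Stream API:

```python
import urllib.request

PUSH_API = "https://api.cloud.coveo.com/push/v1"

def build_create_container_request(org_id, token, virtual_hosted=True):
    """Build the POST request that creates an Amazon S3 file container."""
    url = (f"{PUSH_API}/organizations/{org_id}/files"
           f"?useVirtualHostedStyleUrl={str(virtual_hosted).lower()}")
    return urllib.request.Request(
        url,
        data=b"{}",  # empty JSON payload, as shown above
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

req = build_create_container_request("myorgid", "my-api-key")
# Sending it with urllib.request.urlopen(req) would return a JSON body
# containing uploadUri, fileId, and requiredHeaders.
```

The request is only built here, not sent; in a real script you would send it and parse the response body to retrieve the values used in steps 2 and 3.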

Step 2: Upload the full item content into the file container

To upload your update content into the Amazon S3 file container created in step 1, perform the following PUT request:

Request template

PUT <MyUploadURI> HTTP/1.1

<HTTPHeaders>

Where you replace:

  • <MyUploadURI> with the value of the uploadUri property you received in the response when you created your file container in step 1.

  • <HTTPHeaders> with the key-value pairs of the requiredHeaders object property you received in the response when you created your file container in step 1.

You can now upload your update data (JSON file) in the body of the request. For example, the following update data is structured in JSON and has items that must be updated and an item that must be deleted:

Payload example

{
  "addOrUpdate": [ 1
    {
      "objecttype": "Product",
      "documentId": "product://010",
      "ec_name": "Sneaker 010",
      "ec_product_id": "010",
      "ec_category": "Sneakers",
      "gender": "Unisex",
      "departement": "Shoes"
    },
    {
      "objecttype": "Product",
      "documentId": "product://011",
      "ec_name": "Sneaker 011",
      "ec_product_id": "011",
      "ec_category": "Sneakers",
      "gender": "Unisex",
      "departement": "Shoes"
    },
    {
      "objecttype": "Variant",
      "documentId": "variant://010-blue",
      "ec_name": "Sneaker 010 Royal Blue",
      "ec_product_id": "010",
      "ec_variant_id": "010-blue",
      "width": "wide",
      "productSize": "9"
    }
  ],
  "delete": [ 2
    {
      "documentId": "store://s000001"
    }
  ]
}

In the request body:

1 Each item in the addOrUpdate array must specify a unique documentId value: a URI that uniquely identifies the item. This value must be a valid URL with a proper URI prefix, such as product://, or any other scheme that fits your catalog data.
2 Each item in the delete array must also specify a unique documentId value, following the same URI format.

A successful response has an empty body, indicating that the content update was successfully uploaded to the Amazon S3 file container, as in the following example:

200 OK

{}
Important

When the payload exceeds 256 MB, it must be chunked into 256 MB parts. See Uploading large catalog data files for instructions.
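One way to stay under that cap is to split the addOrUpdate array into several payloads before uploading, measuring the serialized size of each candidate payload. A sketch follows; the 2 KB limit is only for illustration, and the function assumes no single item exceeds the limit:

```python
import json

def split_payload(items, max_bytes):
    """Split items into addOrUpdate payloads whose serialized JSON
    stays under max_bytes. Assumes no single item exceeds the limit."""
    payloads, current = [], []
    for item in items:
        candidate = {"addOrUpdate": current + [item]}
        if current and len(json.dumps(candidate).encode()) > max_bytes:
            # Flush the current batch and start a new one with this item.
            payloads.append({"addOrUpdate": current})
            current = [item]
        else:
            current.append(item)
    if current:
        payloads.append({"addOrUpdate": current})
    return payloads

items = [{"documentId": f"product://{i:03d}", "objecttype": "Product"}
         for i in range(100)]
chunks = split_payload(items, max_bytes=2048)
assert sum(len(c["addOrUpdate"]) for c in chunks) == 100
```

Each resulting payload can then be uploaded to its own file container, or to the chunk endpoints described in Uploading large catalog data files.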

Step 3: Send the file container to update your source (Update operation)

To push the Amazon S3 file container into your source, use the Update a catalog stream source operation as follows:

Request template

PUT https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/sources/<MySourceId>/stream/update?fileId=<MyFileId> HTTP/1.1

Content-Type: application/json
Authorization: Bearer <MyAccessToken>

Payload

{}

Where you replace:

  • <MyOrganizationId> with the ID of the target Coveo organization (see Retrieve the organization ID).

  • <MySourceId> with the ID of the source which contains the catalog data that you want to update.

  • <MyFileId> with the fileId you got from step 1.

  • <MyAccessToken> with an access token, such as an API key that has the required privileges to push content to the source.

A successful response (202) indicates that the operation was successfully forwarded to the service and that the batch of items is now enqueued to be processed by the Coveo indexing pipeline. For example:

202 Accepted

{
  "orderingId": 1716387965000, 1
  "requestId": "498ef728-1dc2-4b01-be5f-e8f8f1154a99" 2
}

Where:

1 orderingId indicates the time your request was received. You must use this value if you want to delete items that were present in the source before the update.
2 requestId is the unique identifier for your request.
Tip

The contents of a file container can be pushed to multiple sources in the same Coveo organization. Just update the target sourceId and Authorization HTTP header access token in your other Stream API update or merge requests.

The file container remains available for 4 days.
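Step 3 could be sketched as follows in Python; the helper name, IDs, and token are placeholders, and only the request is built here:

```python
import urllib.request

PUSH_API = "https://api.cloud.coveo.com/push/v1"

def stream_update_request(org_id, source_id, file_id, token):
    """PUT request that sends a file container to a Catalog source."""
    url = (f"{PUSH_API}/organizations/{org_id}/sources/{source_id}"
           f"/stream/update?fileId={file_id}")
    return urllib.request.Request(
        url,
        data=b"{}",  # empty JSON payload
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

req = stream_update_request("myorgid", "mysourceid",
                            "b5e8767e-8f0d-4a89-9095-1127915c89c7",
                            "my-api-key")
# Sending this request would return 202 Accepted with an orderingId
# and a requestId, as shown in the example above.
```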

Step 4: Delete old items

When performing a full item update, you’re either adding catalog data to your source for the first time, or replacing your whole catalog data with newer data. To make sure old items that were previously indexed are removed from your source, you must delete them.

The Delete old documents operation of the Stream API deletes items that are older than a specified date.

To delete old items, you must perform the following POST request:

Request template

POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/sources/<MySourceId>/stream/deleteolderthan/<MyOrderingId> HTTP/1.1

Content-Type: application/json
Authorization: Bearer <MY_ACCESS_TOKEN>

Where you replace:

  • <MyOrganizationId> with the unique identifier of your organization (see Find your organization ID).

  • <MySourceId> with the unique identifier of the source to which you want to push content.

  • <MyOrderingId> with the value of the orderingId you received when you sent the file container to update your source in step 3. If you have to push multiple file containers, you must use the orderingId of the first file container you sent to update your source.

  • <MY_ACCESS_TOKEN> with an access token, such as an API key that has the required privileges to push content to the source.

A successful response will produce the HTTP response code 201 Created without any content.
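Step 4 might look like the following sketch; the helper name is hypothetical, and the orderingId value is the one from the step 3 example:

```python
import urllib.request

PUSH_API = "https://api.cloud.coveo.com/push/v1"

def delete_older_than_request(org_id, source_id, ordering_id, token):
    """POST request that removes source items older than ordering_id."""
    url = (f"{PUSH_API}/organizations/{org_id}/sources/{source_id}"
           f"/stream/deleteolderthan/{ordering_id}")
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

req = delete_older_than_request("myorgid", "mysourceid",
                                1716387965000, "my-api-key")
```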

Load operations

The load operation (also known as stream) overwrites the entire catalog data in your source. A load operation uses the catalog data you send in the request to completely replace the existing data in the source.

Important
  • Load operations should be used with caution due to their limitations.

  • Load operations aren’t recommended for sources that contain more than 1,000,000 items.

Performing a load operation via the Stream API involves the following steps:

Limitations

Update operations should be favored over the load operation for both pushing and performing full updates on your catalog data. The load operation has limitations that can affect the performance and reliability of your catalog data updates:

  • Content deletion: When using load operations, indexed items that aren’t sent in the request will be automatically removed from your source.

    To prevent the accidental deletion of a substantial number of items from a source, the delete operation is skipped during the process if all of the existing items were to be deleted. Perform an update operation to intentionally delete indexed items.

    Important

    When your source isn’t used with a catalog configuration, and you open and close a stream with an empty JSON file, all of the content from your source will be deleted.

  • Delayed data ingestion: When using the load operation to push or fully update your catalog data, the index waits until the entire catalog data is uploaded before starting the ingestion process. This means that there’s a delay in the availability of the updated data, causing a mismatch between the data in your system and the data in the Coveo index.

  • Lack of batch processing: When using the load operation to update your catalog data, you must push your entire catalog data every time you want to update it.

To avoid these limitations, consider using update operations instead.

Step 1: Open a stream

Important

Make sure that you meet the prerequisites before performing this operation.

The first step is to open a stream using the Stream API.

To achieve this, you must perform the following POST request:

POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/open HTTP/1.1

Content-Type: application/json
Accept: application/json
Authorization: Bearer <MY_ACCESS_TOKEN>

Where you replace:

If your request is successful, you’ll get the HTTP response code 201 along with a response body similar to the following:

{
  "streamId": "1234-5678-9101-1121",
  "uploadUri": "link:https://coveo-nprod-customerdata.s3.amazonaws.com/[...]",
  "fileId": "b5e8767e-8f0d-4a89-9095-1127915c89c7",
  "requiredHeaders": {
    "x-amz-server-side-encryption": "AES256",
    "Content-Type": "application/octet-stream"
  }
}
Important
  • Take note of the generated streamId and uploadUri values, as you’ll need them in the next steps.

  • The uploadUri is valid for one hour.

Step 2: Upload your catalog data into the stream

To upload your catalog data into the stream, you must attach the JSON file containing all of your items to the following Stream API PUT request:

PUT {uploadUri} HTTP/1.1

x-amz-server-side-encryption: AES256
Content-Type: application/octet-stream
  • Where you replace {uploadUri} with the uploadUri you received when you opened the stream in step 1.

  • The x-amz-server-side-encryption and Content-Type values are required HTTP headers. Include them in the headers section of the request, not in its body.

You can now upload your catalog data (JSON file).

Example

The following catalog data (structured in JSON) contains objects that represent products, variants, and availabilities:

{
  "AddOrUpdate": [
    {
     "documentId": "product://001-red",
     "FileExtension": ".html",
     "ec_name": "Coveo Soccer Shoes - Red",
     "model": "Authentic",
     "ec_brand": ["Coveo"],
     "ec_description": "<p>The astonishing, the original, and always relevant Coveo style.</p>",
     "color": ["Red"],
     "ec_item_group_id": "001",
     "ec_product_id": "001-red",
     "ec_images": ["https://myimagegallery?productid"],
     "gender": "Men",
     "ec_price": 28.00,
     "ec_category": "Soccer Shoes",
     "objecttype": "Product"
   },
   {
     "documentId": "variant://001-red-8_wide",
     "FileExtension": ".html",
     "ec_name": "Coveo Soccer Shoes - Red / Size 8 - Wide",
     "ec_variant_id": "001-red-8_wide",
     "productsize": "8",
     "width": "wide",
     "ec_product_id": "001-red",
     "objecttype": "Variant"
   },
   {
      "documentId": "store://s000002",
      "title": "Montreal Store",
      "lat": 45.4975,
      "long": -73.5687,
      "ec_available_items": ["001-red-8_wide","001-red-9_wide","001-red-10_wide","001-red-11_wide", "001-blue-8_wide"],
      "ec_availability_id": "s000002",
      "objecttype": "Availability"
     }
  ]
}
Important

When the payload exceeds 256 MB, it must be chunked into 256 MB parts. See Uploading large catalog data files for instructions.

Tip
Leading practice
  • Make sure that your catalog data (JSON file) contains information to fill the commerce standard fields.

  • To validate that the parsing of the file is successful, test a subset of your catalog data before uploading all of it.

Step 3: Close the stream

Once you’ve uploaded all your catalog data, you must close the stream.

To achieve this, you must perform the following POST request:

POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/{streamId}/close HTTP/1.1

Authorization: Bearer <MY_ACCESS_TOKEN>

Where you replace:

If the request to close the stream is successful, you’ll get the HTTP response code 200 OK. The response body contains an orderingId that indicates the time your request was received, as well as a requestId, which is the unique identifier of your request.

Example

200 OK

{
  "orderingId": 1716387965000,
  "requestId": "498ef728-1dc2-4b01-be5f-e8f8f1154a99"
}

If your request is successful, the uploaded catalog data completely replaces the previous content of the source. Expect a 15-minute delay for the removal of the old items from the index.

After you’ve uploaded all your items, check the Log Browser (platform-ca | platform-eu | platform-au) to ensure that the streaming of products was successful. For more information, see Use the Log Browser to review indexing logs.
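Putting the three load steps together, a rough Python sketch follows. The helper only builds the requests; in practice you’d send the open request first and feed its response (streamId, uploadUri, requiredHeaders) into the next two calls. All IDs and the token are placeholders:

```python
import json
import urllib.request

PUSH_API = "https://api.cloud.coveo.com/push/v1"

def load_operation_requests(org_id, source_id, token, stream, catalog):
    """Build the open -> upload -> close request sequence for a load operation."""
    auth = {"Authorization": f"Bearer {token}"}
    base = f"{PUSH_API}/organizations/{org_id}/sources/{source_id}/stream"
    open_req = urllib.request.Request(
        f"{base}/open", data=b"", method="POST",
        headers={**auth, "Content-Type": "application/json",
                 "Accept": "application/json"})
    upload_req = urllib.request.Request(
        stream["uploadUri"], data=json.dumps(catalog).encode(),
        method="PUT", headers=stream["requiredHeaders"])
    close_req = urllib.request.Request(
        f"{base}/{stream['streamId']}/close", data=b"", method="POST",
        headers=auth)
    return [open_req, upload_req, close_req]

# Placeholder values standing in for the open-stream response.
stream = {"streamId": "1234-5678-9101-1121",
          "uploadUri": "https://coveo-nprod-customerdata.s3.amazonaws.com/example",
          "requiredHeaders": {"x-amz-server-side-encryption": "AES256",
                              "Content-Type": "application/octet-stream"}}
reqs = load_operation_requests("myorgid", "mysourceid", "my-api-key",
                               stream, {"addOrUpdate": []})
```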

Stream API limits

The Stream API enforces certain limits on request size and frequency.

These limits differ depending on whether the organization to which data is pushed is a production or non-production organization.

The following table indicates the Stream API limits depending on your organization type:

| Organization type | Maximum API requests per day | Burst limit (requests per 5 minutes) | Maximum upload requests per day | Maximum file size | Maximum item size[1] | Maximum items per source[2] |
|---|---|---|---|---|---|---|
| Production | 15,000 | 250 | 96 | 256 MB | 3 MB | 1,000,000 |
| Non-production | 10,000 | 150 | 96 | 256 MB | 3 MB | 1,000,000 |

1. This limit will be applied starting May 6, 2024.

2. This limit will be applied starting May 20, 2024.

Important

These limits could change at any time without prior notice. To modify these limits, contact your Coveo representative.

Stream API error codes

If a request to the Stream API fails because one of the limits has been exceeded, the API will trigger one of the following response status codes:

| Status code | Triggered when |
|---|---|
| 413 | The total Stream API request size exceeds 256 MB when pushing a large file container. See Uploading large catalog data files. |
| 429 | The number of Stream API (upload and update) requests exceeds 15,000 per day (10,000 for non-production organizations). The quota is reset at midnight UTC. |
| 429 | The number of Stream API upload requests exceeds 96 per day (4 per hour). The quota is reset at midnight UTC. |
| 429 | The number of Stream API requests exceeds 250 (150 for non-production organizations) within a 5-minute period. The Retry-After header indicates how long the user agent should wait before making another request. |
| 429 | Coveo declined your request due to reduced indexing capacity. |
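A client can react to 429 responses by honoring the Retry-After header before retrying. A minimal sketch follows; `send` is any callable that returns an object with `status` and `headers` attributes, and the stub below simulates one throttled response followed by a success:

```python
import time

def send_with_retry(send, request, max_attempts=5):
    """Call send(request), backing off on 429 using Retry-After (seconds)."""
    for attempt in range(max_attempts):
        response = send(request)
        if response.status != 429:
            return response
        # Fall back to exponential backoff when Retry-After is absent.
        delay = int(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("Stream API request kept returning 429")

# Stub transport simulating one throttled response, then success.
class _Resp:
    def __init__(self, status, headers=None):
        self.status = status
        self.headers = headers or {}

calls = []
def fake_send(request):
    calls.append(request)
    return _Resp(429, {"Retry-After": "0"}) if len(calls) == 1 else _Resp(202)

resp = send_with_retry(fake_send, "request placeholder")
assert resp.status == 202
```

In a real integration, `send` would wrap your HTTP client of choice; the retry logic itself stays the same.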

Uploading large catalog data files

The Stream API limits the size of your catalog data JSON file to 256 MB. If your catalog data file exceeds the limit, you must upload multiple JSON files.

To upload multiple JSON files:

When you initially open the stream, you receive an uploadUri. Use this URI to upload your first catalog data file (JSON).

  1. After uploading the first file, make a POST request to the following endpoint to get a new uploadUri:

    POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/{streamId}/chunk HTTP/1.1
    
    Content-Type: application/json
    Accept: application/json
    Authorization: Bearer <MY_ACCESS_TOKEN>

    This request returns a new uploadUri that you can use for the next step.

  2. Make a PUT request using the uploadUri you received in the previous step. The body of the request must contain the catalog data chunk (maximum 256 MB) that you want to upload.

    PUT {uploadUri} HTTP/1.1
    
    x-amz-server-side-encryption: AES256
    Content-Type: application/octet-stream

    If your request to upload the catalog data is successful, you’ll receive a 200 HTTP response code.

  3. If you have more catalog data files to upload, repeat this process until all of your catalog data has been uploaded. For each file, first obtain a new uploadUri, then upload the file.
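The loop above can be sketched as follows. The helper name is hypothetical, and it only pairs each extra file with a /chunk request and the subsequent PUT upload; in practice each /chunk call must be sent and its response read to obtain the fresh uploadUri used by the PUT (the upload_uris list stands in for those responses):

```python
import json
import urllib.request

PUSH_API = "https://api.cloud.coveo.com/push/v1"

def chunk_requests(org_id, source_id, stream_id, token, chunks, upload_uris):
    """Pair each extra catalog chunk with a /chunk request and a PUT upload."""
    requests = []
    chunk_url = (f"{PUSH_API}/organizations/{org_id}/sources/{source_id}"
                 f"/stream/{stream_id}/chunk")
    for chunk, uri in zip(chunks, upload_uris):
        # Ask the Stream API for a fresh uploadUri.
        requests.append(urllib.request.Request(
            chunk_url, data=b"", method="POST",
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json",
                     "Accept": "application/json"}))
        # Upload the chunk to that URI with the required headers.
        requests.append(urllib.request.Request(
            uri, data=json.dumps(chunk).encode(), method="PUT",
            headers={"x-amz-server-side-encryption": "AES256",
                     "Content-Type": "application/octet-stream"}))
    return requests

reqs = chunk_requests("myorgid", "mysourceid", "stream-id", "my-api-key",
                      chunks=[{"addOrUpdate": []}, {"addOrUpdate": []}],
                      upload_uris=["https://example.com/u1",
                                   "https://example.com/u2"])
```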

Required privileges

The following table indicates the privileges required for your organization's groups to view or edit elements of the Catalogs (platform-ca | platform-eu | platform-au) page and its associated panels (see Manage privileges and Privilege reference). The Commerce domain is only available to organizations in which Coveo for Commerce features are enabled.

| Action | Service - Domain | Required access level |
|---|---|---|
| View catalogs | Commerce - Catalogs, Content - Sources, Content - Fields, Organization - Organization | View |
| Edit catalogs | Content - Fields, Content - Sources, Organization - Organization | View |
| Edit catalogs | Commerce - Catalogs | Edit |
| Edit catalogs | Search - Execute Query | Allowed |