Stream your catalog data to your source

This is for:

Developer

To send your catalog data to your Catalog source, you must use the Stream API.

You’ll likely use the Stream API in two different stages of your Coveo for Commerce implementation:

  1. To push a sample of your catalog data to your source for the first time. This allows you to test your catalog data structure, handle field mappings, and inspect content and properties to ensure products are indexed as expected.

  2. To push your entire catalog data to your source. This is done once you’ve created and configured your commerce catalog, making its data available for Coveo Machine Learning models.

Any other update to your catalog data should be done using a full item update or a partial item update. See How to update your catalog data for more information.

Warning

When your Catalog source is used in a catalog configuration, currently indexed items that aren’t contained in the catalog’s data (JSON file) are automatically removed from the Catalog source. To prevent accidental deletion of a substantial number of items from a Catalog source, the delete operation is skipped during the stream (rebuild) process if it would delete all existing items. If you wish to delete indexed items, you should carry out a full item update instead.

When your Catalog source isn’t used with a catalog configuration, and you open and close a stream with an empty JSON file, all content from your source will be deleted.

This process consists of three steps:

  1. Open a stream.

  2. Upload your catalog data into the stream.

  3. Close the stream.

When your catalog requires an update to a subset of products, see How to update your catalog data.

Tip

If you use Java in your project, it’s recommended to interact with the Stream API via the Coveo Push API client library for Java, as it can greatly simplify your implementation.

Stream API operations are also available in a C# Platform SDK.

Stream prerequisites

This section outlines the setup required before you can start uploading data to your Catalog source using the Stream API.

Create your Catalog source

The first step is to create a Catalog source that will hold all the products that you want to index.

Once you’ve created your Catalog source, you’ll be able to push your products to the source.

Catalog data structure and configuration setup

A specific catalog data structure is required to optimize the search experience with Coveo. Your catalog data structure can vary in many ways depending on your use case, but it’s often a combination of three types of objects: products, variants, and availabilities. This structure is then used to create a catalog configuration in the Coveo Administration Console.

Important

The objecttype field is one of the most important item fields when defining your product data structure. It identifies whether an item is a product, variant, or availability catalog object.

A catalog data structure consists of a JSON file that contains information about your products, variants, and availabilities. For instructions on how to configure items for the different catalog object types, see the documentation for each catalog object type.

The JSON file must contain an object for each item (product, variant, or availability) that you want to index.

For example, the following catalog data is structured in JSON and has different objects to identify products, variants, and availabilities:

{
  "AddOrUpdate": [
    {
      "documentId": "product://001-red",
      "FileExtension": ".html",
      "ec_name": "Coveo Soccer Shoes - Red",
      "model": "Authentic",
      "ec_brand": ["Coveo"],
      "ec_description": "<p>The astonishing, the original, and always relevant Coveo style.</p>",
      "color": ["Red"],
      "ec_item_group_id": "001",
      "productid": "001-red",
      "ec_images": ["https://myimagegallery?productid"],
      "gender": "Men",
      "ec_price": 28.00,
      "ec_category": "Soccer Shoes",
      "objecttype": "Product"
    },
    {
      "documentId": "variant://001-red-8_wide",
      "FileExtension": ".html",
      "ec_name": "Coveo Soccer Shoes - Red / Size 8 - Wide",
      "sku": "001-red-8_wide",
      "productsize": "8",
      "width": "wide",
      "productid": "001-red",
      "objecttype": "Variant"
    },
    {
      "documentId": "store://s000002",
      "title": "Montreal Store",
      "lat": 45.4975,
      "long": -73.5687,
      "availableskus": ["001-red-8_wide", "001-red-9_wide", "001-red-10_wide", "001-red-11_wide", "001-blue-8_wide"],
      "availabilityid": "s000002",
      "objecttype": "Availability"
    },
    // ...
  ]
}
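Before streaming a file like the one above, a quick local sanity check can catch malformed items early. The following is a minimal sketch, not Coveo's own validation: the function name `summarize_catalog` is hypothetical, and rejecting any objecttype other than the three named in this article is an assumption made for illustration.

```python
import json
from collections import Counter

# Object types named in this article. Treating any other value as an error is
# an assumption of this sketch, not a rule enforced by the Stream API.
KNOWN_OBJECT_TYPES = {"Product", "Variant", "Availability"}


def summarize_catalog(payload: dict) -> Counter:
    """Count items per objecttype, raising ValueError on malformed items."""
    counts = Counter()
    for index, item in enumerate(payload.get("AddOrUpdate", [])):
        if not item.get("documentId"):
            raise ValueError(f"item {index} has no documentId")
        object_type = item.get("objecttype")
        if object_type not in KNOWN_OBJECT_TYPES:
            raise ValueError(f"item {index} has unexpected objecttype {object_type!r}")
        counts[object_type] += 1
    return counts


# A trimmed-down version of the example catalog data.
sample = json.loads("""
{
  "AddOrUpdate": [
    {"documentId": "product://001-red", "objecttype": "Product"},
    {"documentId": "variant://001-red-8_wide", "objecttype": "Variant"},
    {"documentId": "store://s000002", "objecttype": "Availability"}
  ]
}
""")
print(dict(summarize_catalog(sample)))
```

Running a check like this on a sample file is cheap and gives you a per-type item count to compare against what you later see in the index.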

Limits

The Stream API enforces certain limits on request size and frequency.

These limits differ depending on whether the organization to which data is pushed is a production or non-production organization.

The following table indicates the Stream API limits depending on your organization type:

| Organization type | Maximum API requests per day | Burst limit (requests per 5 minutes) | Maximum upload requests per day | Maximum file size | Maximum item size [1] | Maximum items per source [2] |
| --- | --- | --- | --- | --- | --- | --- |
| Production | 15,000 | 250 | 96 | 256 MB | 3 MB | 1,000,000 |
| Non-production | 10,000 | 150 | 96 | 256 MB | 3 MB | 1,000,000 |

[1] This limit will be applied starting May 6, 2024.

[2] This limit will be applied starting May 20, 2024.

Important

These limits could change at any time without prior notice. If you need to modify these limits, contact your Coveo representative.

Catalog data file exceeds 256 MB

The Stream API limits the size of each catalog data JSON file to 256 MB.

When a single catalog data file (JSON file) exceeds 256 MB, you must divide it into smaller JSON files, none of which exceeds 256 MB.
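One way to divide a large catalog is to pack items greedily into files until the next item would exceed the limit. The sketch below assumes that each smaller file uses the same `{"AddOrUpdate": [...]}` wrapper as the example catalog data; `split_catalog` is a hypothetical helper name, not part of any Coveo library.

```python
import json

MAX_FILE_BYTES = 256 * 1024 * 1024  # file size limit stated in this article


def split_catalog(items: list, max_bytes: int = MAX_FILE_BYTES) -> list:
    """Greedily pack items into serialized files that each stay under max_bytes."""
    # Size of the empty wrapper; assumes each file repeats the AddOrUpdate wrapper.
    wrapper_overhead = len(json.dumps({"AddOrUpdate": []}).encode("utf-8"))
    files, current, current_size = [], [], wrapper_overhead
    for item in items:
        # +2 accounts for the ", " separator; this slightly overestimates, which is safe.
        item_size = len(json.dumps(item).encode("utf-8")) + 2
        if wrapper_overhead + item_size > max_bytes:
            raise ValueError("a single item does not fit in one file")
        if current and current_size + item_size > max_bytes:
            files.append(json.dumps({"AddOrUpdate": current}).encode("utf-8"))
            current, current_size = [], wrapper_overhead
        current.append(item)
        current_size += item_size
    if current:
        files.append(json.dumps({"AddOrUpdate": current}).encode("utf-8"))
    return files


# Small demonstration with an artificially low limit of 200 bytes.
items = [{"documentId": f"product://{i}", "objecttype": "Product"} for i in range(10)]
for data in split_catalog(items, max_bytes=200):
    print(len(data), "bytes")
```

The per-item size limit listed in the table above is not checked here; a real implementation would probably want to validate that separately.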

Stream API error codes

If a request to the Stream API fails because one of the limits has been exceeded, the API returns one of the following response status codes:

| Status code | Triggered when |
| --- | --- |
| 413 | The total Stream API request size exceeds 256 MB when pushing a large file container. See Catalog data file exceeds 256 MB. |
| 429 | The total number of Stream API (upload and update) requests exceeds 15,000 per day (10,000 for non-production organizations). The quota is reset at midnight UTC. |
| 429 | The total number of Stream API upload requests exceeds 96 per day (4 per hour). The quota is reset at midnight UTC. |
| 429 | The total number of Stream API requests exceeds 250 (150 for non-production organizations) in an interval of 5 minutes. The Retry-After header indicates how long the user agent should wait before making another request. |
| 429 | Coveo declined your request due to a reduced indexing capacity. |
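A client can react to a 429 response by waiting for the duration given in the Retry-After header before trying again. The following is a minimal sketch under some assumptions: the header value is read as a number of seconds (an HTTP-date value falls back to a default), the 60-second default and the 5-attempt cap are arbitrary, and `send` is a hypothetical callable that performs one request and returns a `(status, headers, body)` tuple.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_after_seconds(headers: Mapping[str, str], default: float = 60.0) -> float:
    """Read Retry-After as a number of seconds; fall back to default otherwise."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return default  # e.g. an HTTP-date, which this sketch does not parse
    return default


def send_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), retrying on 429 responses up to max_attempts calls in total."""
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        sleep(retry_after_seconds(response[1]))
        response = send()
    return response


# Demo with a fake transport: the first call is throttled, the second succeeds.
fake_responses = iter([(429, {"Retry-After": "2"}, b""), (200, {}, b"ok")])
waits = []
status, _, _ = send_with_retry(lambda: next(fake_responses), sleep=waits.append)
print(status, waits)
```

Waiting only helps with the burst limit. Once a daily quota is exhausted, retrying before the reset at midnight UTC is unlikely to succeed, so a real client may want to stop instead.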

Step 1: Open a stream

The first step to index your catalog data is to open a stream using the Stream API.

To achieve this, you must perform the following POST request:

POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/open

Content-Type: application/json
Accept: application/json
Authorization: Bearer <MY_ACCESS_TOKEN>

Where you replace:

  • {organizationId} with the unique identifier of your organization (see Find your organization ID).

  • {sourceId} with the unique identifier of the source to which you want to push content (see Copy the ID of a source for information on how to copy your source’s unique identifier).

  • <MY_ACCESS_TOKEN> with an access token, such as an API key that has the required privileges to push content to the source.

If your request is successful, you’ll get the HTTP response code 200 and a response body that looks like this:

{
  "streamId": "1234-5678-9101-1121",
  "uploadUri": "https://coveo-nprod-customerdata.s3.amazonaws.com/[...]",
  "fileId": "b5e8767e-8f0d-4a89-9095-1127915c89c7",
  "requiredHeaders": {
    "x-amz-server-side-encryption": "AES256",
    "Content-Type": "application/octet-stream"
  }
}
Important
  • Take note of the generated streamId and uploadUri values, as you’ll need them in the next steps.

  • The uploadUri is valid for one hour.
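The request and response described in this step could be handled along these lines. This is a sketch using only the Python standard library; the helper names `build_open_stream_request` and `parse_open_stream_response` are hypothetical, and sending an empty request body is an assumption, since the article doesn't describe one.

```python
import json
import urllib.request

API_BASE = "https://api.cloud.coveo.com/push/v1"


def build_open_stream_request(organization_id: str, source_id: str,
                              access_token: str) -> urllib.request.Request:
    """Build the POST request that opens a stream (empty body is an assumption)."""
    url = f"{API_BASE}/organizations/{organization_id}/sources/{source_id}/stream/open"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {access_token}",
    }
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def parse_open_stream_response(body: bytes) -> dict:
    """Extract the values needed in the next steps from the response body."""
    payload = json.loads(body)
    return {
        "stream_id": payload["streamId"],
        "upload_uri": payload["uploadUri"],
        "required_headers": payload.get("requiredHeaders", {}),
    }


request = build_open_stream_request("my-organization-id", "my-source-id", "my-access-token")
print(request.get_method(), request.full_url)
# With real credentials, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     stream = parse_open_stream_response(response.read())
```

Because the uploadUri is only valid for one hour, it makes sense to open the stream right before you're ready to upload rather than ahead of time.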

Step 2: Upload your catalog data into the stream

To upload your catalog data into the stream, you must attach your JSON file to the following Stream API PUT request:

PUT {uploadUri}

x-amz-server-side-encryption: AES256
Content-Type: application/octet-stream

Where you replace {uploadUri} with the uploadUri you received when you opened the stream in step 1.

Important
  • The x-amz-server-side-encryption and Content-Type values are the required headers returned under requiredHeaders in step 1, so they must be sent as request headers rather than in the body of the request.

  • Your catalog data must be formatted as a JSON file that contains all your items.

You can now upload your catalog data (JSON file). See Catalog data structure for an example of a catalog data file.
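As an illustration, the upload request could be assembled as follows, reusing the uploadUri and requiredHeaders values returned in step 1. The helper name `build_upload_request` and the example URI are hypothetical; the catalog is serialized with the standard `json` module.

```python
import json
import urllib.request


def build_upload_request(upload_uri: str, catalog: dict,
                         required_headers: dict) -> urllib.request.Request:
    """Build the PUT request that sends the catalog JSON to the upload URI."""
    body = json.dumps(catalog).encode("utf-8")
    # The required headers go in the request headers, never in the body.
    return urllib.request.Request(upload_uri, data=body,
                                  headers=dict(required_headers), method="PUT")


catalog = {"AddOrUpdate": [{"documentId": "product://001-red", "objecttype": "Product"}]}
required_headers = {
    "x-amz-server-side-encryption": "AES256",
    "Content-Type": "application/octet-stream",
}
# Placeholder URI for illustration; use the uploadUri returned in step 1.
request = build_upload_request("https://example.invalid/upload", catalog, required_headers)
print(request.get_method(), len(request.data), "bytes")
# With a real uploadUri: urllib.request.urlopen(request)
```

Note that no Authorization header is set in this sketch, which mirrors the PUT request shown above.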

Tip
Leading practice
  • Make sure that your catalog data (JSON file) contains information to fill the commerce standard fields.

  • To validate that the file is parsed successfully, we recommend that you test a subset of your catalog data before uploading the entire catalog.

If your catalog data file (JSON file) exceeds 256 MB, you’ll have to upload multiple JSON files. When you initially open the stream, you get an uploadUri and a streamId. After you’ve sent the first catalog data file (JSON file), you must send the next data file(s) in the body of the following POST request:

POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/{streamId}/chunk

Content-Type: application/json
Accept: application/json
Authorization: Bearer <MY_ACCESS_TOKEN>

Where you replace {streamId} with the streamId you received when you opened the stream in step 1.

If your request to upload your JSON data is successful, you’ll get the HTTP response code 200. The response will contain another uploadUri to use in subsequent requests until you’ve uploaded all the catalog data to your Catalog source.
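A sketch of the chunk request and of reading the next uploadUri from its response is shown below. The helper names are hypothetical. The `data` parameter defaults to an empty body so that the caller decides what to send; how exactly the data file and the returned uploadUri are combined is worth confirming against the Stream API reference before relying on this.

```python
import json
import urllib.request

API_BASE = "https://api.cloud.coveo.com/push/v1"


def build_chunk_request(organization_id: str, source_id: str, stream_id: str,
                        access_token: str, data: bytes = b"") -> urllib.request.Request:
    """Build the POST request for the next chunk of an open stream."""
    url = (f"{API_BASE}/organizations/{organization_id}"
           f"/sources/{source_id}/stream/{stream_id}/chunk")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {access_token}",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def next_upload_uri(body: bytes) -> str:
    """Read the uploadUri that the response provides for subsequent requests."""
    return json.loads(body)["uploadUri"]


request = build_chunk_request("my-organization-id", "my-source-id",
                              "1234-5678-9101-1121", "my-access-token")
print(request.get_method(), request.full_url)
```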

Step 3: Close the stream

Once you’ve uploaded all your catalog data, you must close the stream.

To achieve this, you must perform the following POST request:

POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/{streamId}/close

Authorization: Bearer <MY_ACCESS_TOKEN>

Where you replace:

  • {organizationId} with the ID of your organization (see Find your organization ID).

  • {sourceId} with the unique identifier of the source to which you want to push content (see Copy the ID of a source for information on how to copy your source’s unique identifier).

  • {streamId} with the ID of your stream (see step 1).

  • <MY_ACCESS_TOKEN> with an access token, such as an API key that has the required privileges to push content to the source.

If the request to close the stream is successful, you’ll get the HTTP response code 200. The catalog you uploaded then completely replaces the previous content of the source. Expect a 15-minute delay for the removal of the old items from the index.
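The close request can be built in the same way as the open request. The following sketch uses the Python standard library; `build_close_stream_request` is a hypothetical helper name, and the empty request body is an assumption.

```python
import urllib.request

API_BASE = "https://api.cloud.coveo.com/push/v1"


def build_close_stream_request(organization_id: str, source_id: str,
                               stream_id: str, access_token: str) -> urllib.request.Request:
    """Build the POST request that closes a stream (empty body is an assumption)."""
    url = (f"{API_BASE}/organizations/{organization_id}"
           f"/sources/{source_id}/stream/{stream_id}/close")
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


request = build_close_stream_request("my-organization-id", "my-source-id",
                                     "1234-5678-9101-1121", "my-access-token")
print(request.get_method(), request.full_url)
# With real credentials, a successful call returns status 200:
# with urllib.request.urlopen(request) as response:
#     assert response.status == 200
```

Since closing the stream is what triggers the replacement of the source content, it is safest to send this request only after every upload has returned successfully.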

After you’ve uploaded all your items, check the Log Browser (platform-ca | platform-eu | platform-au) to ensure that the streaming of products has been successful. For more information, see Use the Log Browser to review indexing logs.

Required privileges

The following table indicates the privileges required for your organization’s groups to view or edit elements of the Catalogs (platform-ca | platform-eu | platform-au) page and associated panels (see Manage privileges and Privilege reference). The Commerce domain is, however, only available in Coveo commerce organizations.

| Action | Service - Domain | Required access level |
| --- | --- | --- |
| View catalogs | Commerce - Catalogs; Content - Sources; Content - Fields; Organization - Organization | View |
| Edit catalogs | Content - Fields; Content - Sources; Organization - Organization | View |
| Edit catalogs | Commerce - Catalogs | Edit |
| Edit catalogs | Search - Execute Query | Allowed |

What’s next?

  • Once you’re done streaming your catalog data, we strongly recommend that you inspect your content and properties to ensure that your content was indexed correctly.

  • Once your initial catalog data upload is complete, you can make updates to the catalog content by performing a full item update or by making smaller adjustments to information on single products with a partial item update.