Indexing Commerce Catalog Content With the Stream API

To index your commerce content, you need a Coveo organization. Access to the Coveo Platform lets you create a source, which is the bridge to your commerce data.

If you have never used the Cloud Platform before, log in now.

  • Contact your sales representative to enable Coveo for Commerce features in your organization.

  • You can review your organization license limits in the Coveo Platform.

  • Ensure that you set the Required Privileges for your commerce organization.

Step 1: Create Your Catalog Source

The recommended way to index a commerce catalog is to stream its data to a catalog source created within the Coveo Administration Console.

Step 2: Prepare Your Catalog Data

A specific catalog data structure is recommended to optimize the search experience with Coveo. There is often a combination of three types of objects: Products, Variants, and Availabilities.

Every Coveo item is represented by a JSON configuration, which can be inspected in the Administration Console (see Review Item Properties - Item JSON Tab).

Products

Products are searchable items. In a catalog without variants, a product is also a purchasable item. In a catalog with variants, users search for products, and then select a variant to purchase.

Here’s an example of a JSON representation of a product:

 {
   "DocumentId": "product://001-red",
   "FileExtension": ".html",
   "title": "Coveo Soccer Shoes - Red",
   "model": "Authentic",
   "brand": ["Coveo"],
   "description": "<p>The astonishing, the original, and always relevant Coveo style.</p>",
   "color": ["Red"],
   "groupid": "001",
   "productid": "001-red",
   "imagesurl": ["https://myimagegallery?productid"],
   "gender": "Men",
   "price_dict": {
        "": "28.00",
        "store1": "28.00",
        "store2": "30.00"
   },
   "category": "Soccer Shoes",
   "objecttype": "Product"
 }

The above JSON contains generic information about the Coveo Soccer Shoes - Red product, such as its description, image, and price.

The objecttype metadata is important, as it will be used to identify the item as a product in the index.

The productid metadata will be used to establish relationships with variant and availability objects. In your catalog, this metadata may have a different label.

The price_dict metadata is a dictionary field. Use it when a single item needs different prices (e.g., prices based on location, or seasonal pricing) instead of sending a single price value (see Dictionary Fields).
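As a rough illustration of how a consumer of this data could read such a dictionary, here is a minimal Python sketch. The price_for_store helper is hypothetical, and treating the empty-string key as the default price is an assumption suggested by the example above, not something this article states.

```python
from decimal import Decimal


def price_for_store(price_dict: dict[str, str], store_id: str) -> Decimal:
    """Return the price listed for store_id.

    Assumption: the "" key holds the default price used when a store
    has no entry of its own.
    """
    raw = price_dict.get(store_id, price_dict.get(""))
    if raw is None:
        raise KeyError(f"no price for {store_id!r} and no default entry")
    # Prices are strings in the example item, so parse them as decimals.
    return Decimal(raw)


prices = {"": "28.00", "store1": "28.00", "store2": "30.00"}
print(price_for_store(prices, "store2"))  # 30.00
print(price_for_store(prices, "store9"))  # 28.00 (falls back to the "" entry)
```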

If your catalog doesn’t have variants or availability restrictions, proceed to Step 3: Create Your Fields.

Variants

Variants are never returned as search results. A variant instead provides additional metadata on a parent product. In a catalog with variants, users search for products, and then select a variant to purchase.

Here’s an example of a possible JSON representation of a variant:

 {
   "DocumentId": "variant://001-red-8_wide",
   "FileExtension": ".html",
   "title": "Coveo Soccer Shoes - Red / Size 8 - Wide",
   "sku": "001-red-8_wide",
   "productsize": "8",
   "width": "wide",
   "productid": "001-red",
   "objecttype": "Variant"
 } 

The above JSON contains information specific to a product for sale (or SKU).

In this example, the Coveo Soccer Shoes product varies in size and width, so a distinct variant would be needed for every possible combination of those.

Observe that the product picture isn’t included in the variant, since in this case, the actual Coveo Soccer Shoes - Red product looks the same regardless of its size and width.

The objecttype metadata is important, as it will be used to identify the item as a variant in the index.

The productid metadata is used to establish a relationship with the parent product. In your catalog this metadata may have a different label.

The sku metadata is the unique identifier used to create a relationship with availability objects. In your catalog this metadata may have a different label. Use values that are standardized throughout your index.

We recommend using a simple method to differentiate the metadata. You can use dashes (-) as separators between the groupid, product descriptor(s), and variant descriptor(s), and underscores (_) as substitutes for spaces in descriptors. For example:

  • groupid: 001, productid: 001-red, sku: 001-red-8_wide

  • groupid: 026, productid: 026-blue_demo, sku: 026-blue_demo-10_slim
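The naming convention above can be sketched in Python as follows. The build_productid and build_sku helpers are hypothetical names introduced for illustration only; they simply join the parts with dashes and replace spaces inside a descriptor with underscores.

```python
def _slug(descriptor: str) -> str:
    # Spaces inside a descriptor become underscores ("blue demo" -> "blue_demo").
    return "_".join(descriptor.split())


def build_productid(groupid: str, product_descriptors: list[str]) -> str:
    # groupid and product descriptors are separated by dashes.
    return "-".join([groupid, *map(_slug, product_descriptors)])


def build_sku(groupid: str, product_descriptors: list[str],
              variant_descriptors: list[str]) -> str:
    # The sku extends the productid with the variant descriptors.
    productid = build_productid(groupid, product_descriptors)
    return "-".join([productid, *map(_slug, variant_descriptors)])


print(build_productid("026", ["blue demo"]))         # 026-blue_demo
print(build_sku("026", ["blue demo"], ["10 slim"]))  # 026-blue_demo-10_slim
```

Building every identifier through the same helpers is one way to keep the values standardized throughout the index.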

Availabilities

Availabilities determine whether a given user can purchase a given product or variant. An availability can be a store inventory, a price list, or anything that controls which user has access to certain products or variants.

Here’s an example of a possible JSON representation of an availability for a common business-to-consumer (B2C) scenario where a local store contains a finite number of products:

 {
    "DocumentId": "store://s000002",
    "title": "Montreal Store",
    "lat": 45.4975,
     "long": -73.5687,
    "availableskus": ["001-red-8_wide","001-red-9_wide","001-red-10_wide","001-red-11_wide", "001-blue-8_wide"],
    "availabilityid": "s000002",
    "objecttype": "Availability"
 }

And here’s another example for a common business-to-business (B2B) scenario where a price list determines who has access to what products:

 {
   "DocumentId": "store://42",
   "title": "Group ID 42",
   "subscription_level": "Gold",
   "availableskus": ["001-red-8_wide","001-red-9_wide","001-red-10_wide","001-red-11_wide", "001-blue-8_wide"],
   "availabilityid": "42",
   "objecttype": "Availability"
 }

The objecttype metadata is important, as it’ll be used to identify the item as an availability in the index.

In both scenarios, the availabilityid metadata uniquely identifies each availability channel, while the availableskus metadata defines which variants / products are available through a given channel. In your original catalog, these may have different labels.
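To make the relationships between the three object types concrete, the following Python sketch resolves them by hand on a small in-memory list. It only illustrates how productid, sku, and availableskus tie the objects together; it is not how the Coveo index evaluates them, and available_variants is a hypothetical helper.

```python
def available_variants(items: list[dict], productid: str, availabilityid: str) -> list[dict]:
    """Return the variants of productid whose sku is listed in the given availability."""
    availability = next(
        (i for i in items
         if i.get("objecttype") == "Availability"
         and i.get("availabilityid") == availabilityid),
        None,
    )
    if availability is None:
        raise LookupError(f"unknown availability: {availabilityid!r}")
    skus = set(availability["availableskus"])
    # A variant points to its parent through productid and to availabilities through sku.
    return [
        i for i in items
        if i.get("objecttype") == "Variant"
        and i.get("productid") == productid
        and i.get("sku") in skus
    ]


catalog = [
    {"objecttype": "Product", "productid": "001-red"},
    {"objecttype": "Variant", "productid": "001-red", "sku": "001-red-8_wide"},
    {"objecttype": "Variant", "productid": "001-red", "sku": "001-red-12_wide"},
    {"objecttype": "Availability", "availabilityid": "s000002",
     "availableskus": ["001-red-8_wide", "001-red-9_wide"]},
]
print([v["sku"] for v in available_variants(catalog, "001-red", "s000002")])
# ['001-red-8_wide']
```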

When an availability channel contains over 1000 items and you want to improve the performance of your index, it is recommended to use the same field name (e.g., availableskus) on both the availability channel and the variant. In both items, the field value must be an array.

  • Variant
     {
      "sku": "001-red-8_wide",
      "availableskus": ["001-red-8_wide"]
     }
    
  • Availability channel
     {
      "availableskus": ["001-red-8_wide","001-red-9_wide",...]
     }
    

Step 3: Create Your Fields

Coveo organization sources come with a set of default system fields. However, adding your own fields allows the end user to get additional information in search results and to better target desired content (see Field Uses).

Default fields won’t be available in the field picker of the Admin UI (see Field Origins).

The mapping of your metadata has to be done through the source mappings. You will want to explore your metadata before you create your fields.

Ensure your metadata has the exact same name as the Coveo field.

For example, if you want to map your metadata my_price, then the field my_price needs to be created in Coveo before creating the source mapping.
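Before streaming, a quick local check can list the metadata keys that have no field of the same name yet. The Python sketch below is one hedged way to do it: unmapped_metadata is a hypothetical helper, the list of existing field names is something you would supply yourself, and treating DocumentId and FileExtension as item properties rather than metadata is an assumption.

```python
# Assumption: these keys are item properties, not metadata to map to fields.
RESERVED_KEYS = {"DocumentId", "FileExtension"}


def unmapped_metadata(items: list[dict], field_names: list[str]) -> list[str]:
    """Return the metadata keys that have no field with the exact same name."""
    fields = set(field_names)
    missing = set()
    for item in items:
        missing.update(
            key for key in item
            if key not in RESERVED_KEYS and key not in fields
        )
    return sorted(missing)


items = [{
    "DocumentId": "product://001-red",
    "title": "Coveo Soccer Shoes - Red",
    "my_price": 28.0,
    "productid": "001-red",
}]
print(unmapped_metadata(items, ["title", "productid"]))  # ['my_price']
```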

You can create your fields manually through the Administration Console, or programmatically through the Fields API.

Avoid reusing the field names that you intend to use as facets on different types of items. For example, if you define the color at the product level, there’s no need to define it at the variant level. If you need to include a field at both levels, prefix it with product and variant (e.g., productcolor, variantcolor).

In addition to the fields you will want to create to leverage product metadata such as price, color, and description within your commerce interfaces (search and listing pages, recommendation interfaces, etc.), you must create a set of string type fields that you will use to configure your Coveo commerce catalog (see Add or Edit a Field):

Suggested field name   Field intent                                          Field settings to enable
"productid"            Uniquely identifies each product                      Facet; Use cache for nested queries
"sku"                  Uniquely identifies each variant
"availabilityid"       Uniquely identifies each availability channel
"availableskus"        Identifies the list of available product/variants     Multi-value Facet; Use cache for nested queries
                       in a given availability channel

When your catalog only contains products (i.e., if products don’t have variants), or if the products in your catalog are offered through a single availability channel (e.g., a single store or product list), you won’t need to configure all of the above fields. Minimally, however, you will always have to configure a field that can uniquely identify products in your catalog.

Step 4: Stream Your Catalog Data to Your Source

To send your catalog data to your catalog source, you must use the Stream API. This process consists of three steps:

  1. Open a stream.
  2. Upload your catalog data into the stream.
  3. Close the stream.

When your catalog only requires an update to a subset of products, see Update Your Catalog source.

Here are examples of the three API calls to use:

Open a Stream

POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/open
Content-Type: application/json
Accept: application/json
Authorization: Bearer <MY_ACCESS_TOKEN>

You will get a response like this one:

{
    "streamId": "1234-5678-9101-1121",
    "uploadUri": "https://coveo-nprod-customerdata.s3.amazonaws.com/[...]",
    "fileId": "b5e8767e-8f0d-4a89-9095-1127915c89c7",
    "requiredHeaders": {
        "x-amz-server-side-encryption": "AES256",
        "Content-Type": "application/octet-stream"
    }
}

Upload Your Catalog Data Into the Stream

Using the uploadUri you received:

PUT {uploadUri}
x-amz-server-side-encryption: AES256
Content-Type: application/octet-stream

You can now upload your catalog data (JSON file). The following is an example of content payload in the body of the request:

{
    "AddOrUpdate": [
        {
            "DocumentId": "product://001-red",
            "FileExtension": ".html",
            "title": "Coveo Soccer Shoes - Red",
            "model": "Authentic",
            "brand": ["Coveo"],
            "description": "<p>The astonishing, the original, and always relevant Coveo style.</p>",
            "color": ["Red"],
            "groupid": "001",
            "productid": "001-red",
            "imagesurl": ["https://myimagegallery?productid"],
            "gender": "Men",
            "price": 28.00,
            "category": "Soccer Shoes",
            "objecttype": "Product"
        },
        {
            "DocumentId": "variant://001-red-8_wide",
            "FileExtension": ".html",
            "title": "Coveo Soccer Shoes - Red / Size 8 - Wide",
            "sku": "001-red-8_wide",
            "productsize": "8",
            "width": "wide",
            "productid": "001-red",
            "objecttype": "Variant"
        },
        {
            "DocumentId": "store://s000002",
            "title": "Montreal Store",
            "lat": 45.4975,
            "long": -73.5687,
            "availableskus": ["001-red-8_wide","001-red-9_wide","001-red-10_wide","001-red-11_wide", "001-blue-8_wide"],
            "availabilityid": "s000002",
            "objecttype": "Availability"
        },
        // ...
    ]
}

Catalog Payload Exceeds 256 MB

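When you initially open the stream, you get an uploadUri and a streamId. If your catalog data exceeds 256 MB, split it into several payloads. After you’ve uploaded the first payload to the initial uploadUri, request an upload URI for the next payload by sending the following request with the streamId you received: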

POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/{streamId}/chunk
Content-Type: application/json
Accept: application/json
Authorization: Bearer <MY_ACCESS_TOKEN>

Payload

{}

You then receive a second uploadUri, to which you upload the next payload. Repeat this process until you have uploaded all of your catalog data. The last step is to close the stream.
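Splitting a large catalog into payloads that each stay under the limit could look like the following Python sketch. The chunk_payloads helper is hypothetical, and counting 256 MB as 256 * 1024 * 1024 bytes is an assumption; you may want to pass a lower max_bytes to keep a safety margin.

```python
import json
from typing import Iterable, Iterator

MAX_PAYLOAD_BYTES = 256 * 1024 * 1024  # assumption: how the 256 MB limit is counted

_PREFIX, _SUFFIX, _SEP = '{"AddOrUpdate": [', "]}", ", "


def chunk_payloads(items: Iterable[dict], max_bytes: int = MAX_PAYLOAD_BYTES) -> Iterator[str]:
    """Yield JSON payloads of the form {"AddOrUpdate": [...]} no larger than max_bytes.

    json.dumps escapes non-ASCII characters by default, so the string length
    equals the encoded byte size. An item that is larger than max_bytes on its
    own is still emitted, alone, in an oversized payload.
    """
    batch: list[str] = []
    size = len(_PREFIX) + len(_SUFFIX)
    for item in items:
        encoded = json.dumps(item)
        extra = len(encoded) + (len(_SEP) if batch else 0)
        if batch and size + extra > max_bytes:
            # The current batch is full: emit it and start a new one.
            yield _PREFIX + _SEP.join(batch) + _SUFFIX
            batch, size = [], len(_PREFIX) + len(_SUFFIX)
            extra = len(encoded)
        batch.append(encoded)
        size += extra
    if batch:
        yield _PREFIX + _SEP.join(batch) + _SUFFIX


demo_items = [{"DocumentId": f"product://{n}"} for n in range(5)]
demo_payloads = list(chunk_payloads(demo_items, max_bytes=80))
print([len(json.loads(p)["AddOrUpdate"]) for p in demo_payloads])  # [2, 2, 1]
```

Each yielded string can be encoded with .encode("ascii") and uploaded to its own uploadUri.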

Close the Stream

POST https://api.cloud.coveo.com/push/v1/organizations/{organizationId}/sources/{sourceId}/stream/{streamId}/close
Authorization: Bearer <MY_ACCESS_TOKEN>

When you upload a catalog into a source, it completely replaces the previous content of the source. Expect a 15-minute delay before the old items are removed from the index.

To update information on single products, see Update Your Catalog source.
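The open, upload, and close calls described in this step can be chained as in the Python sketch below, which uses only the standard library. It is a sketch under stated assumptions, not a reference client: stream_catalog and _send are hypothetical names, the chunk response is assumed to have the same shape as the open response (uploadUri and requiredHeaders), and error handling and retries are left out.

```python
import json
import urllib.request
from typing import Callable, Iterable, Optional

BASE_URL = "https://api.cloud.coveo.com/push/v1"  # taken from the calls above


def _send(method: str, url: str, headers: dict, body: Optional[bytes] = None) -> dict:
    """Send one HTTP request and return the parsed JSON body ({} when empty)."""
    request = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


def stream_catalog(org_id: str, source_id: str, token: str,
                   payloads: Iterable[bytes],
                   send: Callable[..., dict] = _send) -> str:
    """Open a stream, upload every payload, close the stream, and return the streamId."""
    api_headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    stream_root = f"{BASE_URL}/organizations/{org_id}/sources/{source_id}/stream"

    # 1. Open the stream: the response carries the streamId and the first uploadUri.
    target = send("POST", f"{stream_root}/open", api_headers)
    stream_id = target["streamId"]

    # 2. Upload each payload; every payload after the first needs a new uploadUri.
    for index, payload in enumerate(payloads):
        if index > 0:
            # Assumption: the chunk response also contains uploadUri and requiredHeaders.
            target = send("POST", f"{stream_root}/{stream_id}/chunk", api_headers, b"{}")
        send("PUT", target["uploadUri"], target["requiredHeaders"], payload)

    # 3. Close the stream so that the uploaded content replaces the source content.
    send("POST", f"{stream_root}/{stream_id}/close", api_headers)
    return stream_id
```

A call could look like stream_catalog("myorg", "mysource", "<MY_ACCESS_TOKEN>", [payload_bytes]), where the organization and source identifiers are placeholders. The send parameter exists so that the HTTP layer can be swapped out, for example for a stub in tests. If an upload fails, the exception propagates before the close call, so a partial catalog does not replace the source content.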

Step 5: Review and Inspect Your Indexed Items

The Content Browser is a basic Coveo Platform demo search interface that helps you navigate and inspect the content of your organization’s sources.

For instructions on accessing the Content Browser and making use of its many features, see Inspect Items With the Content Browser.

Step 6: Define Your Coveo Commerce Catalog

See Creating a Coveo Commerce Catalog.

Step 7: (Optional) Create a Demo Search Page

You have successfully used the Content Browser to filter and view your indexed content. Now create a real, customizable Coveo demo search page in the Cloud Platform (see Manage Hosted Search Pages).

With a demo search page, you will get an idea of what you can accomplish using the Coveo JavaScript Search Framework in the next step of the solution implementation.

Indexing Alternatives

Coveo provides many out-of-the-box connectors designed to access and index commerce catalog content. Connectors may be system-specific or generic.

The following list summarizes other connection options for commerce content. Click a given connector name for more details regarding features, content security type support, and instructions on how to create a source.

  • The Push API is another solution for commerce indexing, since it gives you full flexibility on what content to index and when. A new or updated product is searchable in a few minutes, without having to wait for a refresh schedule. You can push content from any system, including, but not limited to, a commerce platform, a product information management (PIM) system, or a static database.

  • Use the Database Connector if you prefer to directly index the underlying database of your commerce system or PIM system. The Database Connector allows for incremental refreshes, which can run every few minutes. It also uses the Coveo On-Premises Crawling Module, which can be installed behind your firewall to avoid having to create firewall rules for Coveo Cloud.

  • Use the Generic REST API to get content from a remote repository exposing its data through a REST API. The Generic REST API source runs on a schedule, so expect some delay between the time content is added or updated and the time it becomes available in search.

  • Use the Sitemap Connector for simple catalogs where all product data is available online and properly discoverable through a Sitemap file or index file. The Sitemap source runs on a schedule, so expect some delay between the time content is added or updated and the time it becomes available in search.

  • Use the Website Connector for simple catalogs where all product data is available online. The Web source runs on a schedule, so expect some delay between the time content is added or updated and the time it becomes available in search.

Required Privileges

The following table indicates the privileges required for your organization’s groups to view or edit elements of the Catalogs page and associated panels (see Manage Privileges and Privilege Reference). However, the Commerce domain is only available in Coveo Cloud commerce organizations.

Action          Service - Domain          Required access level
View catalogs   Commerce - Catalogs       View
                Content - Sources         View
                Content - Fields          View
Edit catalogs   Commerce - Catalogs       Edit
                Content - Sources         View
                Content - Fields          View
                Search - Execute Query    Allowed

What’s Next?

Proceed to Integrating a Search Interface into Your Commerce Solution or Website.
