Product embeddings and vectors


A product embedding is a machine learning procedure where products are assigned positions in a space. Similar products are close to each other, while products that are different are far from each other.

Each product’s position in the space is called its product vector. This embedding procedure can be accomplished in many ways by Coveo Personalization-as-you-go (PAYG) models, and the resulting relationships between products can be used to extract subtle information about products and provide personalization in search results.
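As an illustration of "close" and "far" in an embedding space, here is a minimal sketch that compares product vectors with cosine similarity. The product IDs, the 3-dimensional vectors, and the `most_similar` helper are hypothetical. Cosine similarity is one common choice of measure and is assumed here, not taken from the PAYG models themselves.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean 'similar'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional product vectors (real embeddings use many more dimensions).
product_vectors = {
    "running-shoe-a": [0.9, 0.1, 0.0],
    "running-shoe-b": [0.8, 0.2, 0.1],
    "dress-shoe": [0.1, 0.9, 0.2],
}


def most_similar(product_id, vectors):
    """Return (product_id, similarity) pairs for all other products, most similar first."""
    target = vectors[product_id]
    others = [
        (pid, cosine_similarity(target, vec))
        for pid, vec in vectors.items()
        if pid != product_id
    ]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


ranked = most_similar("running-shoe-a", product_vectors)
```

In this toy data, the two running shoes point in nearly the same direction, so they rank as each other's nearest neighbor, while the dress shoe ranks last.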

Traditional versus new approach to vectors

The traditional approach to grouping similar products relies on manually assigning or tagging products with category or descriptive data, such as brand, size, or color, and then recommending similar products along these predefined dimensions. This method, however, can only capture similarity along the dimensions that were tagged in advance, and this is where product embeddings are useful.

The main reasoning behind embeddings is that products which are “related” tend to appear in the same browsing sessions. For example, when a user is training for a marathon, they might browse for running shoes and other related products in the same session.


A digital commerce experience is made up of sessions, which are made up of different products. Browsing can be a sequence of events, going from product A to B.
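The idea that related products appear in the same sessions can be made concrete by counting co-occurrences. The sketch below is only an illustration of that raw signal, under the assumption that a session is a list of product IDs. It is not how PAYG models compute embeddings, and the session data and the `co_occurrence_counts` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical browsing sessions, each a sequence of viewed product IDs.
sessions = [
    ["running-shoes", "running-socks", "energy-gel"],
    ["running-shoes", "running-socks"],
    ["dress-shoes", "leather-belt"],
]


def co_occurrence_counts(sessions):
    """Count how many sessions contain each unordered pair of products."""
    counts = Counter()
    for session in sessions:
        # Deduplicate and sort so each pair is counted once per session
        # and always stored in the same (alphabetical) order.
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


counts = co_occurrence_counts(sessions)
```

Pairs with high counts, such as running shoes and running socks in this toy data, are the ones an embedding procedure would be expected to place close together.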

A session vector is a representation of the products a user has recently viewed. By comparing the user’s session vector to the positions of products in the precomputed product embedding space, we can predict which products are most similar to the user’s interests in the short term and then recommend or boost them accordingly in search results.
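The comparison described above can be sketched as follows. This minimal example assumes the session vector is the plain average of the vectors of recently viewed products, which is one simple option and not necessarily what Coveo uses. The product IDs, the 2-dimensional vectors, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def session_vector(viewed_ids, vectors):
    """Assumption: represent the session as the component-wise mean of viewed products."""
    viewed = [vectors[pid] for pid in viewed_ids]
    return [sum(component) / len(viewed) for component in zip(*viewed)]


def rank_by_session(viewed_ids, vectors):
    """Rank products not yet viewed by similarity to the session vector."""
    sv = session_vector(viewed_ids, vectors)
    candidates = [pid for pid in vectors if pid not in viewed_ids]
    return sorted(
        candidates,
        key=lambda pid: cosine_similarity(sv, vectors[pid]),
        reverse=True,
    )


# Hypothetical precomputed product embedding (2 dimensions for readability).
product_vectors = {
    "running-shoe": [0.9, 0.1],
    "running-sock": [0.8, 0.3],
    "running-short": [0.85, 0.2],
    "dress-shoe": [0.1, 0.9],
}

ranking = rank_by_session(["running-shoe", "running-sock"], product_vectors)
```

Because the product embedding is precomputed, only the small session vector has to be updated as the user browses, which is what makes this kind of short-term prediction cheap to compute.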

Representation of a high dimensional product embedding

The above plot is a 3-dimensional representation of a high-dimensional product embedding. When the plot image stabilizes, distinct clusters of products become easily identifiable. Machine learning product embedding algorithms are a form of deep learning model with the ability to automatically capture subtle aspects of sport, gender and style, based purely on user behavior and customer data, which can power many use cases in commerce at scale. In addition, Coveo uses catalog data to augment those vectors even further.

The offerings of Coveo Personalization-as-you-go use product vectors as building blocks across all machine learning models. Once the product embedding is computed, we can take a user’s product preferences into account to provide personalized search results along the customer journey.

Returning to the running shoe example: when the user queries for shoes in the vanilla, non-personalized scenario, the returned results are good, as seen below, but completely removed from the “running shoe” theme mentioned earlier.

Vanilla shoe search results example

By introducing the session vector of a user specifically interested in sports into the ranking mix, results become much more relevant to running shoes.

Personalized shoe search results example
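One way to picture "introducing the session vector into the ranking mix" is to add a similarity bonus to each result's base relevance score. The sketch below assumes a simple linear blend. The `weight` parameter, the base scores, the vectors, and the `personalized_ranking` function are all hypothetical and do not describe Coveo's actual ranking formula.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def personalized_ranking(results, vectors, session_vec, weight=0.5):
    """Re-rank (product_id, base_score) pairs by base score plus a session bonus.

    Assumption: final score = base_score + weight * cosine(session, product).
    With weight=0 the vanilla order is preserved.
    """
    scored = [
        (pid, base + weight * cosine_similarity(session_vec, vectors[pid]))
        for pid, base in results
    ]
    return [pid for pid, _ in sorted(scored, key=lambda pair: pair[1], reverse=True)]


# Hypothetical vanilla results for the query "shoes", best first.
vanilla_results = [("dress-shoe", 0.80), ("sneaker", 0.78), ("running-shoe", 0.75)]

# Hypothetical product vectors and a session vector leaning toward running products.
product_vectors = {
    "dress-shoe": [0.1, 0.9],
    "sneaker": [0.6, 0.5],
    "running-shoe": [0.9, 0.1],
}
running_session = [0.85, 0.2]

personalized = personalized_ranking(vanilla_results, product_vectors, running_session)
```

In this toy data, the running shoe moves from last to first place once the session bonus is applied, while setting `weight` to 0 leaves the vanilla order unchanged.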

Embeddings can also capture the intent of the user through queries not directly related to the products seen in the session.

Continuing with the above example, suppose a customer searches for pants after having browsed for running shoes. A vanilla approach returns a variety of pant styles, whereas a personalized one returns pants related specifically to running. The difference shows how well the search engine captures the “running” theme.

Vanilla versus personalized search results for pants in a running shoe context

Personalized search is a key component of the customer journey and of product deployments that work reliably at scale. Two sources of data are required to leverage personalized search:

About the Cold Start feature

Product vectors are generated based on customer interactions with the different products in your catalog. This means that the more interactions a product has, the more accurate its vector representation will be.

But what if a product has no interactions, or very few? PAYG models integrate a Cold Start feature to address this issue. This feature allows PAYG models to leverage the product’s metadata to build its vector representation and place it accurately within the vector space. For more information, see About the Cold Start feature.
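To give an intuition for how metadata can stand in for missing interactions, the sketch below places a new product at the average position of existing products that share at least one metadata value with it. This is an assumed, simplified heuristic for illustration only, not the actual Cold Start implementation. The metadata fields, the vectors, and the `cold_start_vector` function are hypothetical.

```python
def cold_start_vector(new_metadata, catalog_metadata, vectors):
    """Estimate a vector for a product without interactions.

    Assumption: average the vectors of known products that share at least
    one (field, value) metadata pair with the new product.
    Returns None when no known product shares any metadata value.
    """
    new_pairs = set(new_metadata.items())
    neighbours = [
        vectors[pid]
        for pid, metadata in catalog_metadata.items()
        if pid in vectors and new_pairs & set(metadata.items())
    ]
    if not neighbours:
        return None
    return [sum(component) / len(neighbours) for component in zip(*neighbours)]


# Hypothetical catalog metadata and interaction-based vectors.
catalog_metadata = {
    "running-shoe-a": {"category": "running", "brand": "acme"},
    "running-shoe-b": {"category": "running", "brand": "zen"},
    "dress-shoe": {"category": "formal", "brand": "lux"},
}
product_vectors = {
    "running-shoe-a": [0.9, 0.1],
    "running-shoe-b": [0.7, 0.3],
    "dress-shoe": [0.1, 0.9],
}

# A newly added running shoe with no interactions yet.
new_product = {"category": "running", "brand": "newco"}
estimated = cold_start_vector(new_product, catalog_metadata, product_vectors)
```

Here the new product lands between the two existing running shoes, so it can be recommended alongside them before it has collected any interactions of its own.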