Product Embeddings and Vectors

Product embedding is a machine learning procedure that assigns each product a position in a vector space. Similar products are close to each other, while dissimilar products are far apart.

Each product’s position in the space is called its product vector. This embedding procedure can be accomplished in many ways by Coveo Machine Learning models, and the resulting relationships between products can be used to extract subtle information about products and provide personalization in search results.

Traditional Versus New Approach to Vectors

The traditional approach to grouping similar products relies on manually assigning or tagging products with category or descriptive data, such as brand, size, or color, and then recommending similar products along these predefined dimensions. This method, however, is limited in its ability to determine which products are similar to one another, and this is where product embeddings are useful.

The main reasoning behind embeddings is that products which are “related” tend to appear in the same browsing sessions. For example, when a user is training for a marathon, they might browse for running shoes and other related products in the same session.
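To make the co-occurrence idea concrete, here is a minimal sketch that counts how often two products appear in the same session. The session data, product IDs, and the `co_occurrence_counts` helper are hypothetical and for illustration only; this is not the procedure Coveo Machine Learning models use, only the kind of signal an embedding algorithm could learn from.

```python
from collections import Counter
from itertools import combinations

# Hypothetical browsing sessions: each one lists the product IDs a user viewed.
sessions = [
    ["running_shoes", "running_socks", "gps_watch"],
    ["running_shoes", "running_socks"],
    ["dress_shoes", "leather_belt"],
]


def co_occurrence_counts(sessions):
    """Count how many sessions contain each unordered pair of products."""
    counts = Counter()
    for session in sessions:
        # Deduplicate and sort so each pair is counted once per session
        # and always stored in the same (alphabetical) order.
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


counts = co_occurrence_counts(sessions)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts, such as running shoes and running socks here, are the ones an embedding would be expected to place close together, while pairs that never share a session carry no evidence of similarity.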


A digital commerce experience is made up of sessions, and each session is made up of the products a user views. Browsing can be treated as a sequence of events, such as going from product A to product B.

A session vector is a representation of the products a user has recently viewed. By comparing the user’s session vector to the positions of products in the precomputed product embedding space, we can predict which products are most similar to the user’s interests in the short term and then recommend or boost them accordingly in search results.
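The comparison described above can be sketched as follows. This example assumes the session vector is the plain average of the viewed products' vectors and that similarity is measured with cosine similarity; both are illustrative assumptions, as are the three-dimensional vectors, the product IDs, and the helper names. Real embeddings have many more dimensions.

```python
import math

# Hypothetical precomputed product vectors (toy 3-dimensional values).
product_vectors = {
    "running_shoes": [0.9, 0.1, 0.0],
    "running_shorts": [0.8, 0.2, 0.1],
    "dress_shoes": [0.1, 0.9, 0.0],
    "trail_shoes": [0.85, 0.05, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def session_vector(viewed, vectors):
    """Assumption: the session vector is the mean of the viewed products' vectors."""
    if not viewed:
        raise ValueError("session must contain at least one viewed product")
    dims = len(vectors[viewed[0]])
    return [sum(vectors[p][i] for p in viewed) / len(viewed) for i in range(dims)]


def most_similar(viewed, vectors, k=2):
    """Rank products not yet viewed by similarity to the session vector."""
    s = session_vector(viewed, vectors)
    candidates = [p for p in vectors if p not in viewed]
    return sorted(candidates, key=lambda p: cosine(s, vectors[p]), reverse=True)[:k]


recommended = most_similar(["running_shoes", "running_shorts"], product_vectors)
print(recommended)
```

With these toy values, the trail shoes rank ahead of the dress shoes because their vector points in nearly the same direction as the session vector.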

Representation of a high dimensional product embedding

The above plot is a 3-dimensional representation of a high-dimensional product embedding. When the plot image stabilizes, distinct clusters of products become easily identifiable. Machine learning product embedding algorithms can automatically capture subtle aspects of sport, gender, and style, based purely on user behavior and customer data, which can power many commerce use cases at scale. In addition, Coveo uses catalog data to augment those vectors even further.

The offerings of Coveo Personalization-as-you-go use product vectors as building blocks across all machine learning models. Once the product embedding is computed, we can take a user’s product preferences into account to provide personalized search results along the customer journey.

Returning to the running shoe example, when we query the index for shoes in the vanilla, non-personalized scenario, the results returned are good, as seen below, but completely removed from the “running shoe” theme we mentioned before.

Vanilla shoe search results example

By introducing the session vector of a user specifically interested in sports into the ranking mix, results become much more relevant to running shoes.

Personalized shoe search results example
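One way to picture "introducing the session vector into the ranking mix" is a weighted blend of the base relevance score and the similarity between each result and the session vector. The sketch below assumes such a linear blend; the `weight` parameter, the scores, the two-dimensional vectors, and the function names are all hypothetical, and the actual ranking formula is not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def personalized_rank(results, session_vec, weight=0.5):
    """Order results by a blend of base score and session similarity.

    results: list of (product_id, base_score, product_vector) tuples.
    weight:  0.0 keeps the vanilla order; higher values favor session similarity.
    """
    def blended(item):
        _, base_score, vector = item
        return (1 - weight) * base_score + weight * cosine(session_vec, vector)

    return [pid for pid, _, _ in sorted(results, key=blended, reverse=True)]


# Hypothetical results for the query "shoes"; dimension 0 loosely means "sporty".
results = [
    ("leather_oxford", 0.95, [0.1, 0.9]),
    ("canvas_sneaker", 0.90, [0.5, 0.5]),
    ("road_running_shoe", 0.85, [0.9, 0.1]),
]
sporty_session = [1.0, 0.0]

vanilla = personalized_rank(results, sporty_session, weight=0.0)
personalized = personalized_rank(results, sporty_session, weight=0.5)
print(vanilla)
print(personalized)
```

With `weight=0.0` the order follows the base scores alone; with `weight=0.5` the running shoe moves to the top because its vector is closest to the sporty session vector.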

Embeddings can also capture the intent of the user through queries not directly related to the products seen in the session.

Continuing with the above example, suppose a customer searches for pants after having browsed for running shoes. A vanilla approach returns a variety of pant styles, whereas a personalized one returns pants related specifically to running. The difference is impressive, as it shows how well the “running” theme is captured by the search engine.

Vanilla versus personalized search results for pants in a running shoe context

Personalized search is a key component of the customer journey and of product deployments that work reliably at scale. Two sources of data are required to leverage personalized search: