About Catalog Semantic Encoder (CSE)

This is for:

System Administrator
Important

Contact your Coveo representative to enable Catalog Semantic Encoder (CSE) in your Coveo organization.

Coveo Machine Learning Catalog Semantic Encoder (CSE) enhances product discovery by interpreting the meaning behind user queries. When integrated within a Coveo-powered commerce search interface, CSE uses vector search and natural language processing (NLP) to retrieve products based on their semantic similarity to the query, rather than on keyword matches alone.

In digital commerce environments, product descriptions and attributes often differ from the language shoppers use in queries. CSE bridges this gap by aligning customer queries more closely with the content included in your catalog data.

By using CSE in a Coveo-powered commerce search interface, you:

  • Enhance relevance: Improve the quality of search results by interpreting the intent behind user queries, complementing exact keyword matches.

  • Handle complex queries: Manage verbose, vague, or conversational queries more effectively by understanding the query’s underlying meaning.

  • Reduce manual tuning efforts: Automatically identify relationships between products and queries, minimizing the need for manual synonyms or boosting rules.

Prerequisites

Before contacting your Coveo representative to enable CSE in your Coveo organization, make sure that you:

How CSE works

CSE uses multilingual semantic encoders to create a vector representation of the textual information found in your catalog data and places products into a high-dimensional vector space. At query time, CSE converts the query into a vector in the same high-dimensional space as the product vectors. It then computes the similarity between the query vector and the product vectors to retrieve the most relevant products.

Here’s how the process works:

  1. When the model is trained, it uses the information contained in your catalog data to place the products into a high-dimensional vector space where similar products are close together, and dissimilar products are far apart.

  2. When a user enters a query, CSE encodes the query into a vector in the same high-dimensional space as the product vectors. This encoding captures the semantic meaning of the query.

  3. The CSE model then computes the similarity between the query vector and the product vectors to retrieve the products that are closest to the query in the vector space. This allows CSE to retrieve products that are semantically similar to the query, even if the query doesn’t contain the exact words used in the product data.

  4. Finally, CSE works with Coveo ranking algorithms to optimize the ranking of the results based on both semantic and keyword relevance. Note that CSE is designed to work seamlessly with Coveo AI ranking models, such as Automatic Relevance Tuning (ART) and Intent-Aware Product Ranking (IAPR).
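The retrieval idea in steps 2 and 3 can be illustrated with a small sketch. This isn't the CSE implementation: the similarity measure (cosine similarity), the three-dimensional toy vectors, and the names `cosine_similarity`, `top_k`, and `PRODUCT_VECTORS` are all assumptions made for illustration. Real encoders produce vectors with many more dimensions, and the documentation above doesn't state which similarity measure CSE uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def top_k(query_vec, product_vecs, k=2):
    """Return the k (product_id, score) pairs closest to the query vector."""
    scored = [
        (product_id, cosine_similarity(query_vec, vec))
        for product_id, vec in product_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical, hand-written vectors standing in for encoder output.
PRODUCT_VECTORS = {
    "4k-monitor": [0.9, 0.1, 0.0],
    "running-shoes": [0.0, 0.2, 0.9],
    "hd-television": [0.8, 0.3, 0.1],
}

# Hypothetical vector for a query such as "high-definition display".
query_vector = [1.0, 0.2, 0.0]

results = top_k(query_vector, PRODUCT_VECTORS, k=2)
for product_id, score in results:
    print(f"{product_id}: {score:.3f}")
```

In this toy space, the two display products rank above the shoes even though no words are compared, which is the behavior the steps above describe.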

Important

While CSE significantly enhances search capabilities through semantic understanding, it achieves the best results when used alongside other Coveo AI and search features. Semantic matching alone may not always capture the full context of a query, so it benefits from complementary keyword-based search capabilities.

For example, semantic matching can be less precise with very technical and specialized terminology because such terms may lack sufficient representation in general semantic training data. Short or ambiguous queries might also have limited semantic context, necessitating additional keyword-based support or tuning for accurate results.

How CSE processes queries

CSE uses a semantic query, which is a clean version of the user’s search query that represents the core meaning or intent. The semantic query sits between the user’s raw input (the original query) and the fully processed query sent to the index (index query).

The semantic query is extracted from the basic query expression and includes only keywords and phrases. It excludes elements that don’t contribute to the core meaning, such as:

  • Query syntax operators (such as AND, OR, and NOT)

  • Field expressions (such as @field=value)

  • Thesaurus rule expansions

  • Stop word removals

Example

Suppose you have the following rules:

  • Thesaurus rule expanding tv to television

  • Stop word rule removing stand from queries

The raw query tv stand for the living room AND @brand=Acme yields a semantic query like tv stand living room Acme.

  • Field expressions and query syntax operators are removed, leaving only the keywords that contribute to the core meaning.

  • Thesaurus expansions aren’t applied, so tv remains tv in the semantic query.

  • Stop words aren’t removed, so stand remains stand in the semantic query.

This keyword-focused approach ensures that CSE can effectively interpret the semantic meaning of the query without being affected by complex search syntax or query transformations applied later in the query pipeline.

Query correction interaction

When the query correction feature is applied to a user’s query:

  • The index returns items based on the corrected query.

  • CSE boosts items based on the semantic query, which is extracted before query correction is applied and therefore still contains any typos from the original query.

This means that CSE can help surface relevant products even when the original query contains typos, potentially complementing the query correction mechanism by retrieving semantically similar products based on the user’s intent.

Redirect and trigger interaction

While CSE doesn’t directly modify the redirect trigger rules, it can affect their execution.

For example, if a redirect trigger is set up to redirect users to a product detail page (PDP) when they query for a SKU or part number, CSE could block the redirection from occurring if it interprets the query semantically and retrieves other products instead.

To prevent this from happening, you can configure query pipeline conditions on the CSE model association so that the model isn’t triggered for specific query patterns.

For example, you could set a condition that prevents CSE from being applied to all-numeric queries (such as SKUs or product codes).

This ensures that CSE enhances semantic search for natural language queries while allowing business-critical redirects to function as intended.
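The logic behind such a condition can be sketched as a simple predicate. This is only an illustration of the pattern check, not Coveo query pipeline condition syntax: the function name `should_apply_cse` and the regular expression are assumptions, and the actual condition would be configured on the model association in your query pipeline.

```python
import re

# Matches queries made up entirely of ASCII digits, such as "12345".
ALL_NUMERIC = re.compile(r"[0-9]+")

def should_apply_cse(query: str) -> bool:
    """Hypothetical check: skip CSE when the whole query is numeric."""
    return ALL_NUMERIC.fullmatch(query.strip()) is None

print(should_apply_cse("comfortable running shoes"))
print(should_apply_cse("12345"))
```

Note that this pattern only covers all-numeric queries. A code with letters, such as SKU12345, would still be sent to CSE under this check, so a different pattern would be needed for alphanumeric part numbers.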

Use case examples

Here are some examples of how CSE can enhance product discovery in a Coveo-powered commerce search interface:

  • A visitor searches for high-definition display. Traditionally, this might not match products described in catalog data as 4K monitor. With CSE, the system identifies these phrases as semantically similar, successfully returning relevant products.

  • A query like comfortable running shoes will effectively match products described as ergonomic athletic sneakers in catalog data.

Here are some additional examples where CSE could be less effective when used alone:

  • Short queries such as XS shirt might return less precise matches due to their limited semantic context and the ambiguity inherent in short or highly abbreviated terms.

  • Highly technical terms, brand-specific jargon, or product codes like SKU12345 may not be represented well in general semantic vector spaces.

In these scenarios, combining keyword-based search with intent-aware ranking, popularity-based models, and targeted manual tuning helps such queries return precise and relevant results.