Create and manage Semantic Encoder (SE) models

A Semantic Encoder (SE) model is only supported for use in a Coveo-powered search interface with Relevance Generative Answering (RGA) or in an LLM-powered application with Passage Retrieval (CPR).
The SE model is available as a paid product extension. Contact Coveo Sales or your Account Manager to add SE to your organization license.

A Coveo Machine Learning (Coveo ML) SE model retrieves items from your index based on semantic similarity with the query.

What does an SE model do?

When an SE model builds, it creates embeddings for the indexed items specified in the model settings and stores the embeddings in the index.

Note

The model is preconfigured to rebuild and update the embeddings weekly based on when the model is created. Contact your Coveo Account Manager if a different build interval is required.

Figure: Semantic Encoder model embeddings

An SE model uses a pre-trained sentence transformer language model to create the embeddings. The language model does this by capturing relationships between words, phrases, and sentences in the dataset.

An SE model creates embeddings only for the content in an item’s title and body. That is, it uses only the content that’s mapped to the title and body fields in the Coveo index. For more information, see How SE uses your content.

As shown in the following diagram, the model uses a chunking strategy to create the embeddings. This means that instead of creating a vector for each individual word, a vector is created for a segment of text (chunk) to increase relevance.

When a user enters a query in a Coveo-powered search interface or an LLM-powered application that uses an SE model, the query passes through a query pipeline where pipeline rules and machine learning are applied to optimize relevance. In addition, the SE model adds vector search capabilities to the search engine. As shown in the following diagram, the SE model embeds the query in the embedding vector space in the index to find items with high semantic similarity with the query. The search results therefore include items retrieved through both semantic and lexical similarity.

In the context of generating an answer using Relevance Generative Answering (RGA), or retrieving passages using Passage Retrieval (CPR), the SE model ensures that the RGA or CPR model retrieves segments of text only from the most relevant items in the index. For more information on how an SE model works with RGA or CPR during content retrieval, see RGA overview or CPR overview.

Note

The embeddings created by the SE model aren’t impacted by usage analytics events.
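
The query-time lookup described in this section can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses made-up three-dimensional vectors; the source doesn't state which measure the index uses, and real embeddings have many more dimensions. All names (`chunk_embeddings`, `query_embedding`, `rank_chunks`) are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical chunk embeddings stored in the index (toy values).
chunk_embeddings = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.8, 0.3],
    "release-notes": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the user's query (toy values).
query_embedding = [0.8, 0.2, 0.1]

def rank_chunks(query, chunks):
    """Return (chunk name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_chunks(query_embedding, chunk_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data, the chunk whose vector points in nearly the same direction as the query vector ranks first, which is the sense in which vector search finds semantically similar content even when the wording differs.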

Prerequisites

You have the required privileges to create an SE model.

The content that you want to use for the model respects the item requirements and is optimized.

Note

An optimal Relevance Generative Answering (RGA) or Passage Retrieval (CPR) implementation includes both an RGA or CPR model and an SE model. The same SE model can be used with multiple RGA or CPR models. For best results, both models should be configured to use the same content.

See RGA overview or CPR overview for information on how SE works with RGA or CPR in the context of a search session.

Keep the model embedding limits in mind when choosing the content for your model.

Create an SE model

Depending on whether models have already been created in your Coveo organization:
- If your Coveo organization doesn’t contain any models, on the Models (platform-ca | platform-eu | platform-au) page, click the Semantic Encoder card.
- If your Coveo organization already contains models, on the Models (platform-ca | platform-eu | platform-au) page, click Add model, and then click the Semantic Encoder card.
Click Next.

In the Learn from section, select the content that the model will use. You can select the sources and apply additional filters using the Standard configuration, or use Advanced mode to define a custom filter expression.

You’ll lose the current mode settings when you switch between Standard and Advanced mode.

Note

See RGA overview or CPR overview for information on how SE works with RGA or CPR in the context of a search session.

The Data volume preview section shows the impact of your settings on the data that’s available to the model.

In the Standard tab:
1. In the Sources dropdown menu, select the sources that contain the items from which you want the model to learn.
  
  Note
  
  If your Coveo organization includes multiple indexes, the model can learn only from sources that are linked to the default index.
2. (Optional) In the Apply filters on dataset section, you can specify a condition to segment the content on which the model should base its training.
  
  Example
  
  You want the model to base its training only on items for which the collection field has the FAQ value.
  
  Therefore, you add a collection is equal to FAQ condition.
  1. Click Add filter(s).
  2. In the Field name input, enter the name of the field that you want to use to segment the dataset.
  3. In the Select an operator dropdown menu, select the desired operator.
  4. In the Value input, enter the value of the field on which you want to segment the dataset.
  5. Click Apply.
In the Advanced tab:
1. Enter a custom filter expression using Coveo query syntax.
2. Click Apply.

Click Next.
In the Name your model input, enter a meaningful display name for the model.

Use the Project selector to associate your SE model with one or more projects.
Click Start building.
You can then associate the model with a pipeline to take advantage of the model in a search interface.
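
As an illustration of the Advanced tab, the FAQ condition from the Standard tab example could be written as a filter expression. The sketch below assumes that "is equal to" corresponds to the exact-match `==` operator of Coveo query syntax; check the query syntax reference before relying on it. The helper `exact_match_filter` is hypothetical and only builds the string that you would paste into the Advanced tab.

```python
def exact_match_filter(field_name: str, value: str) -> str:
    """Return a filter expression of the form @field=="value"."""
    if '"' in value:
        # Quote-escaping rules aren't covered in this sketch, so reject such values.
        raise ValueError("value must not contain double quotes")
    return f'@{field_name}=="{value}"'

# Assumed Advanced-mode equivalent of the "collection is equal to FAQ" condition.
faq_filter = exact_match_filter("collection", "FAQ")
print(faq_filter)  # @collection=="FAQ"
```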

Edit an SE model

On the Models (platform-ca | platform-eu | platform-au) page, click the model you want to edit, and then click Edit in the Action bar.
On the subpage that opens, select the Configuration tab.
In the upper-right corner, click Edit.
Under Name, edit the model’s display name.

In the Learn from section, select the content that the model will use. You can select the source(s) and apply additional filters using the Standard configuration, or use Advanced mode to define a custom filter expression.

You’ll lose the current mode settings when you switch between Standard and Advanced mode.

Note

See RGA overview or CPR overview for information on how SE works with RGA or CPR in the context of a search session.

The Data volume preview section shows the impact of your settings on the data that’s available to the model.

In the Standard tab:
1. In the Sources dropdown menu, select the sources that contain the items from which you want the model to learn.
  
  Note
  
  If your Coveo organization includes multiple indexes, the model can learn only from sources that are linked to the default index.
2. (Optional) In the Apply filters on dataset section, you can specify a condition to segment the content on which the model should base its training.
  
  Example
  
  You want the model to base its training only on items for which the collection field has the FAQ value.
  
  Therefore, you add a collection is equal to FAQ condition.
  1. Click Add filter(s).
  2. In the Field name input, enter the name of the field that you want to use to segment the dataset.
  3. In the Select an operator dropdown menu, select the desired operator.
  4. In the Value input, enter the value of the field on which you want to segment the dataset.
  5. Click Apply.
In the Advanced tab:
1. Enter a custom filter expression using Coveo query syntax.
2. Click Apply.

Click Save.

Delete an SE model

Note

Models are automatically dissociated from all their associated query pipelines once they’re deleted.

On the Models (platform-ca | platform-eu | platform-au) page, click the ML model that you want to delete, and then click More > Delete in the Action bar.
In the Delete a model panel that appears, click Delete Model.

Review active model information

On the Models (platform-ca | platform-eu | platform-au) page, click the desired model (must be Active), and then click Open in the Action bar (see Reviewing model information).

Reference

Model embedding limits

The SE model converts your content’s title and body text into numerical representations (vectors) in a process called embedding. It does this by breaking the text up into smaller segments called chunks, and each chunk is mapped as a distinct vector. For more information, see Embeddings.

Due to the amount of processing required for embeddings, the model is subject to the following embedding limits:

Note

The same chunking strategy is used for all sources and item types.

Up to 5 million items or 50 million chunks

Note

The maximum number of items depends on the item allocation of your product plan.

11 chunks per item

This means that a given item can have a maximum of 11 chunks. This limit is sufficient for the SE model to capture an item’s main concepts through embeddings. However, if an item contains a lot of text, such as more than 4000 words or 5 pages, the model embeds the item’s text only until the 11-chunk limit is reached. The remaining text isn’t embedded and therefore isn’t used by the model. Use shorter, more focused items to make sure that an item’s entire text is embedded.

500 words per chunk

Note

There can be an overlap of up to 20% between chunks. In other words, the last 20% of the previous chunk can be the first 20% of the next chunk.
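
To get a feel for how these limits interact, the following rough estimate computes how much of an item's text fits within the per-item limit. It assumes fixed-size chunks of exactly 500 words and a constant overlap; the actual chunking logic isn't described in this article and may differ (the overlap is stated only as an upper bound). The function names are hypothetical.

```python
import math

WORDS_PER_CHUNK = 500      # limit stated above
MAX_CHUNKS_PER_ITEM = 11   # limit stated above
OVERLAP_RATIO = 0.20       # upper bound stated above; actual overlap may be smaller

def _stride(overlap_ratio):
    """New words contributed by each chunk after the first."""
    return WORDS_PER_CHUNK - round(WORDS_PER_CHUNK * overlap_ratio)

def chunk_count(item_words, overlap_ratio=OVERLAP_RATIO):
    """Estimate the number of chunks for an item, capped at the per-item limit."""
    if item_words <= 0:
        return 0
    if item_words <= WORDS_PER_CHUNK:
        return 1
    extra = math.ceil((item_words - WORDS_PER_CHUNK) / _stride(overlap_ratio))
    return min(1 + extra, MAX_CHUNKS_PER_ITEM)

def embedded_word_count(item_words, overlap_ratio=OVERLAP_RATIO):
    """Estimate how many of an item's words fall inside the chunk limit."""
    capacity = WORDS_PER_CHUNK + (MAX_CHUNKS_PER_ITEM - 1) * _stride(overlap_ratio)
    return min(item_words, capacity)

print(chunk_count(1200), embedded_word_count(1200))  # 3 1200
print(chunk_count(6000), embedded_word_count(6000))  # 11 4500
```

Under these assumptions, an item is fully embedded up to about 4500 words at the maximum 20% overlap (5500 words with no overlap), which is in the same range as the 4000-word guideline above.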

"Status" column

On the Models (platform-ca | platform-eu | platform-au) page of the Administration Console, the Status column indicates the current state of your Coveo ML models.

The following table lists the possible model statuses and their definitions:

Status	Definition	Status icon
Active	The model is active and available.	Active
Build in progress	The model is currently building.	Building
Inactive	The model isn’t ready to be queried, such as when a model was recently created or the organization is offline. Click See more details for additional information (see Review model information).	Inactive
Limited	Build issues exist that may affect model performance. Click See more details for additional information (see Review model information).	Limited
Soon to be archived	The model will soon be archived because it hasn’t been queried for an extended period of time. Click Delete to remove the model. Learn more about archived models.	Archive pending
Error	An error prevented the model from being built successfully. If it’s a temporary system error, check back soon. Otherwise, click See more details for additional information (see Review model information).	Error
Archived	The model was archived because it hasn’t been queried for at least 30 days. Click Delete to remove the model. Learn more about archived models.	Archived

Required privileges

By default, members with the required privileges can view and edit elements of the Models (platform-ca | platform-eu | platform-au) page.

The following table indicates the privileges required for members to view or manage SE models (see Manage privileges and Privilege reference).

Action	Service - Domain	Required access level
View models	Machine Learning - Models	View
View models	Organization - Organization	View
View models	Search - Query pipelines	View
Manage models	Organization - Organization	View
Manage models	Search - Query pipelines	View
Manage models	Machine Learning - Models	Edit
Manage models	Machine Learning - Allow content preview	Enable
Manage models	Content - Sources	View All
Manage models	Content - Fields	View

What’s next?

Associate the SE model with a query pipeline.