Associate a Passage Retrieval (CPR) model with a query pipeline

Associating the CPR model with a query pipeline is only required if you’re using the CPR model with the Passage Retrieval API to provide your custom RAG system or LLM application with relevant passages. It’s not required when using the CPR model with the Coveo Search Agent in a Coveo-powered search interface.

When a Coveo Machine Learning (Coveo ML) model has been created, it must be associated with a query pipeline to be effective in a search interface.

Organization members with the required privileges can access the Machine learning tab of a query pipeline configuration page to manage Coveo ML model associations for that query pipeline.

When a Passage Retrieval (CPR) model is associated with a query pipeline, the model retrieves the passages that a large language model (LLM) will use to generate an output for a query submitted in the associated application.

Notes

Coveo recommends that you associate only one CPR model per query pipeline.
When using Passage Retrieval (CPR), the query pipeline must be configured to use both a CPR model and a Semantic Encoder (SE) model.
Query pipeline stop word rules aren’t applied to the query that’s used by the CPR model.
By default, query pipeline thesaurus rules aren’t applied to the query that’s used by the CPR model. Enable the Thesaurus rules model association option to apply thesaurus rules to the CPR model.

Associate a CPR model

On the Query Pipelines (platform-ca | platform-eu | platform-au) page, click the query pipeline for which you want to associate the model, and then click Edit components in the Action bar.
On the subpage that opens, select the Machine learning tab, and then in the upper-right corner, click Associate model.
In the Model dropdown menu, select the desired model.

On the right side, under Condition, you can select a query pipeline condition in the dropdown menu or create a new one.

A model association condition has a maximum size limit of 1000 characters. If a condition exceeds this limit, you’ll encounter an error when editing the model. When this happens, you can’t remove the condition from the Edit a model association panel.

To resolve the error when editing a model association, do one of the following:

To remove the condition from the model association:
1. If you’re not already in the Edit a model association panel, on the Query Pipelines (platform-ca | platform-eu | platform-au) page, double-click the query pipeline, and then double-click the model association on the Machine learning tab.
2. In the upper-right corner, open the menu, and then click Switch to JSON view.
3. Remove the condition and conditionDefinition properties from the configuration.
4. Click Save.
To modify the condition:
1. On the Conditions page, select the condition that exceeds 1000 characters.
2. Click Edit in the Action bar, and then modify the condition configuration so it’s under 1000 characters.
3. Click Save.

Click Associate model.
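
The JSON-view fix described above (removing the condition and conditionDefinition properties) can be sketched in Python. This is a minimal illustration only: it assumes both properties sit at the top level of the configuration, and the modelId key and all values in the sample are hypothetical placeholders rather than the actual model association schema.

```python
import json


def strip_condition(association_json: str) -> str:
    """Return the configuration without its condition-related properties."""
    config = json.loads(association_json)
    # Property names come from the procedure above; their top-level
    # placement is an assumption.
    for key in ("condition", "conditionDefinition"):
        config.pop(key, None)
    return json.dumps(config, indent=2)


# Hypothetical sample; a real configuration contains other properties.
example = json.dumps({
    "modelId": "hypothetical-model-id",
    "condition": "hypothetical-condition-id",
    "conditionDefinition": "hypothetical condition expression",
})

print(strip_condition(example))
```

In the Administration Console you make the same change by hand in the JSON editor. The sketch only shows which properties go away and that everything else is left untouched.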

Edit a CPR model association

On the Query Pipelines (platform-ca | platform-eu | platform-au) page, click the query pipeline for which you want to edit a model association, and then click Edit components in the Action bar.
On the subpage that opens, select the Machine learning tab, click the desired model, and then click Edit in the Action bar.

On the right side, under Condition, you can select a query pipeline condition in the dropdown menu or create a new one.

A model association condition has a maximum size limit of 1000 characters. To resolve the error that occurs when a condition exceeds this limit, do one of the following:

To remove the condition from the model association:
1. If you’re not already in the Edit a model association panel, on the Query Pipelines (platform-ca | platform-eu | platform-au) page, double-click the query pipeline, and then double-click the model association on the Machine learning tab.
2. In the upper-right corner, open the menu, and then click Switch to JSON view.
3. Remove the condition and conditionDefinition properties from the configuration.
4. Click Save.
To modify the condition:
1. On the Conditions page, select the condition that exceeds 1000 characters.
2. Click Edit in the Action bar, and then modify the condition configuration so it’s under 1000 characters.
3. Click Save.

Click Save.

Dissociate a CPR model

On the Query Pipelines (platform-ca | platform-eu | platform-au) page, click the query pipeline from which you want to dissociate a model, and then click Edit components in the Action bar.
On the subpage that opens, select the Machine learning tab.
Click the model you want to dissociate from the pipeline, and then click Dissociate in the Action bar.

CPR model association advanced configuration

You can use the CPR model association advanced configuration to:

Modify the maximum number of items that the CPR model considers when retrieving the relevant passages.
Enable the query pipeline thesaurus rules for the CPR model.

Modifying the default values for the model association advanced parameters may result in unintended model behavior following a model version upgrade.

Maximum number of items to consider

Passage Retrieval (CPR) uses two stages of content retrieval. First-stage content retrieval identifies the most relevant items in the index, and second-stage content retrieval identifies the most relevant segments of text (passages) from those items. The most relevant passages will then be used by your LLM application.

If you find that your LLM application generates an output using text from low-relevance items, you can use the numberOfDocumentsToConsider model association parameter to set a custom value for the maximum number of items considered during second-stage content retrieval. For example, if you set the maximum number of items to 20, the CPR model retrieves the most relevant passages only from the 20 most relevant items identified during first-stage content retrieval.
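
The effect of capping the number of items can be illustrated with a toy model of the two retrieval stages. This is a conceptual sketch, not the CPR implementation: the Item and Passage classes, the relevance scores, and the max_passages parameter are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    text: str
    score: float  # hypothetical passage relevance score


@dataclass
class Item:
    item_id: str
    score: float  # hypothetical first-stage relevance score
    passages: list = field(default_factory=list)


def retrieve_passages(items, number_of_documents_to_consider=40, max_passages=5):
    """Keep the top-scoring items, then rank only their passages."""
    # First stage: keep the most relevant items, up to the configured cap.
    top_items = sorted(items, key=lambda item: item.score, reverse=True)
    top_items = top_items[:number_of_documents_to_consider]
    # Second stage: rank passages drawn from those items only.
    candidates = [p for item in top_items for p in item.passages]
    return sorted(candidates, key=lambda p: p.score, reverse=True)[:max_passages]


items = [
    Item("relevant-item", 0.9, [Passage("on-topic passage", 0.5)]),
    Item("weak-item", 0.2, [Passage("off-topic but high-scoring passage", 0.95)]),
]

# With a cap of 1, passages from the low-relevance item are never considered.
print([p.text for p in retrieve_passages(items, number_of_documents_to_consider=1)])
```

Lowering the cap excludes passages from lower-ranked items entirely, which is why a value that's too low can leave too little text for the LLM.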

This is an advanced model association configuration that should be used by experienced Coveo administrators only.

The default setting of 40 provides good results in most use cases. However, you can specify a custom value if you have a finely tuned query pipeline and you have a good understanding of the relevance hierarchy of the search results returned by your query pipeline configuration. If you set the value too low, there may not be enough relevant text (passages) for your LLM to generate an output.

To set a custom value for the number of items to consider for CPR

On the Query Pipelines (platform-ca | platform-eu | platform-au) page, click the query pipeline associated with the CPR model, and then click Edit components in the Action bar.
Select the Machine learning tab.
Double-click the CPR model.
If the Edit a model association subpage opens in JSON view, proceed to the next step. Otherwise, in the upper-right corner, open the menu, click Switch to JSON view, and then click Switch to JSON view in the confirmation window.
In the JSON editor, add "numberOfDocumentsToConsider": <VALUE> under customQueryParameters, where <VALUE> is the maximum number of items, written as an unquoted integer.

The value must be an integer between 1 and 100 (default is 40).
Example
To set the custom value to 50, the JSON would be:
{ "passageRetrieval":{ "numberOfDocumentsToConsider": 50 } }
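
If you generate this fragment from a script, a small check can catch out-of-range values before you paste the JSON into the editor. The following is a minimal sketch: the function name is hypothetical, and the passageRetrieval wrapper is copied from the example above, so confirm where the fragment belongs in your own configuration.

```python
import json

# Bounds and default as stated in this article.
MIN_ITEMS, MAX_ITEMS, DEFAULT_ITEMS = 1, 100, 40


def build_cpr_parameters(number_of_documents=DEFAULT_ITEMS):
    """Build the JSON fragment for numberOfDocumentsToConsider."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(number_of_documents, bool) or not isinstance(number_of_documents, int):
        raise TypeError("numberOfDocumentsToConsider must be an integer")
    if not MIN_ITEMS <= number_of_documents <= MAX_ITEMS:
        raise ValueError(
            f"numberOfDocumentsToConsider must be between {MIN_ITEMS} and {MAX_ITEMS}"
        )
    return {"passageRetrieval": {"numberOfDocumentsToConsider": number_of_documents}}


print(json.dumps(build_cpr_parameters(50), indent=2))
```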

Thesaurus rules

Enable the Thesaurus rules model association option to use the query pipeline's thesaurus rules in the associated CPR model.

Extending your existing thesaurus rules to the CPR model can improve the relevance of retrieved passages, especially in organizations that feature domain-specific terminology or jargon. It allows the CPR model to more effectively handle user queries that include a synonym or alias already defined in your thesaurus based on your specific business rules.

CPR supports the following types of thesaurus rules: Synonym, One-way synonym, and Replace.

When a thesaurus rule is applied to a CPR model, the query that’s used by the CPR model for passage retrieval is modified to include the synonym or replacement term. This helps the CPR model retrieve relevant passages that use terminology that differs from the user’s query but that you’ve defined as an acceptable synonym or replacement.

For a Synonym or One-way synonym thesaurus rule, the synonyms are added to the query using an OR syntax. For example, if a Synonym thesaurus rule is configured as Include cat, feline, kitten when any are present, the query organic cat treats becomes organic (cat OR feline OR kitten) treats. If a One-way synonym thesaurus rule is configured as Include feline, kitten when cat is present, the query organic cat treats becomes organic (cat OR feline OR kitten) treats, but the query organic feline treats remains unchanged.
For a Replace thesaurus rule, the query includes the replacement term instead of the original term. If the rule includes more than one replacement term, all replacement terms are added using an OR syntax. For example, if the Replace thesaurus rule is configured as Replace cat with feline, kitten, the query organic cat treats becomes organic (feline OR kitten) treats.
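
The Synonym and Replace rewrites described above can be approximated with a short Python sketch. It's an illustration of the documented behavior, not the CPR model's actual implementation: the function name, the case-insensitive whole-word matching, and the order of terms inside the OR group are assumptions.

```python
import re


def expand_query(query, triggers, alternatives):
    """Rewrite the first occurrence of any trigger term as an OR group."""
    earliest = None
    for term in triggers:
        # Whole-word, case-insensitive matching is an assumption of this sketch.
        match = re.search(r"\b" + re.escape(term) + r"\b", query, flags=re.IGNORECASE)
        if match and (earliest is None or match.start() < earliest.start()):
            earliest = match
    if earliest is None:
        return query  # no trigger term present, so the query is unchanged
    group = "(" + " OR ".join(alternatives) + ")"
    # Only the first matching term is rewritten.
    return query[:earliest.start()] + group + query[earliest.end():]


# Synonym rule: any listed term triggers, and all listed terms are included.
synonyms = ["cat", "feline", "kitten"]
print(expand_query("organic cat treats", synonyms, synonyms))
# organic (cat OR feline OR kitten) treats

# Replace rule: the original term is dropped in favor of the replacements.
print(expand_query("organic cat treats", ["cat"], ["feline", "kitten"]))
# organic (feline OR kitten) treats
```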

Example

Your indexed items use the term content recommendation instead of event recommendation. For the query What’s an event recommendation, when using a Synonym thesaurus rule that includes content recommendation and event recommendation when either term is present in the query, the modified query What’s an (event recommendation OR content recommendation) allows the model to retrieve relevant passages from your indexed content.

When the Thesaurus rules option is disabled, the CPR model uses the raw basic query expression (q) entered by the user without any transformations or modifications.

Notes

CPR doesn’t support the Match terms exactly thesaurus rule type. All thesaurus rules of that type are ignored by the CPR model.
If more than one rule exists in the query pipeline’s thesaurus, the CPR model uses only the first matching rule from top to bottom. You can reorder thesaurus rules to prioritize certain rules over others.
A CPR model applies a thesaurus rule only to the first matching term in the query. All other matching terms in the query are ignored. For example, if a Replace thesaurus rule is configured as Replace laptop, notebook with computer, device, the query Does the AD456 notebook laptop support docking stations becomes Does the AD456 (computer OR device) laptop support docking stations. The query isn’t modified to Does the AD456 (computer OR device) (computer OR device) support docking stations because only the first matching term notebook triggers the thesaurus rule.
CPR doesn’t support a thesaurus rule that includes a condition.
CPR doesn’t support a thesaurus rule that contains Coveo query syntax (for example, field expressions such as @field=value or @field=(value1, value2)), or a regular expression (regex), in its configuration. Such thesaurus rules can produce unexpected behavior, as the CPR model only extracts keywords from thesaurus expansions and can’t interpret query syntax or regex.
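
Before enabling the option, you may want to review your thesaurus for rules that fall under these limitations. The following is a rough sketch of such a check. The rule dictionary shape (type, terms, condition) is hypothetical and doesn't reflect any Coveo export format, and the field-expression pattern is a simple heuristic. Regular expressions aren't detected.

```python
import re

# Heuristic for field expressions such as @field=value (an assumption).
FIELD_EXPRESSION = re.compile(r"@\w+\s*[=<>]")


def unsupported_reasons(rule):
    """List the reasons a rule wouldn't work as intended with a CPR model."""
    reasons = []
    if rule.get("type") == "match_terms_exactly":
        reasons.append("Match terms exactly rules are ignored by the CPR model")
    if rule.get("condition"):
        reasons.append("rules that include a condition aren't supported")
    if any(FIELD_EXPRESSION.search(term) for term in rule.get("terms", [])):
        reasons.append("rules that contain query syntax aren't supported")
    return reasons


# Hypothetical rules for illustration.
rules = [
    {"type": "synonym", "terms": ["cat", "feline", "kitten"]},
    {"type": "replace", "terms": ["cat", "@category=(pets, animals)"]},
    {"type": "match_terms_exactly", "terms": ["AD456"], "condition": "some-condition"},
]

for rule in rules:
    print(rule["terms"], unsupported_reasons(rule))
```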

Coveo doesn’t validate customer-defined thesaurus rules. Customers are responsible for configuring, testing (for example, using A/B tests), monitoring, and maintaining these rules. Inefficient or ill-defined rules may negatively affect the relevance of retrieved passages. Customers must implement their own validation and QA processes.

To enable thesaurus rules for an associated CPR model

On the Query Pipelines (platform-ca | platform-eu | platform-au) page, click the query pipeline to which the CPR model is associated, and then click Edit components in the Action bar.
Select the Machine learning tab.
Double-click the CPR model.
Enable Thesaurus rules.
Click Save.

Reorder model associations

The order in which models appear in the query pipeline Machine learning tab is only relevant when multiple models of the same type are present. If there are no duplicate model types in the list, the model order has no effect and each model will either execute or not based on its individual condition.

However, when multiple models of the same type are present, they’re evaluated sequentially from top to bottom. The first model of a given type executes only if its condition is satisfied, and evaluation then continues with each subsequent model of the same type following the same rule. This can result in multiple models of the same type being executed for the same query. They execute in list order, with each model potentially overriding the effect of the previous model of the same type.
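
The evaluation order described above can be pictured with a small Python sketch. The association dictionaries, model names, and condition functions are hypothetical, and treating a missing condition as always satisfied is an assumption of this example.

```python
def executed_models(associations, query_context):
    """Group the models that execute by type, preserving list order."""
    executed = {}
    for association in associations:  # evaluated from top to bottom
        condition = association.get("condition")
        # Assumption: an association without a condition always executes.
        if condition is None or condition(query_context):
            executed.setdefault(association["type"], []).append(association["name"])
    return executed


# Hypothetical associations, listed as they would appear in the tab.
associations = [
    {"name": "cpr-general", "type": "CPR", "condition": None},
    {"name": "cpr-support", "type": "CPR",
     "condition": lambda ctx: ctx.get("searchHub") == "support"},
    {"name": "se-main", "type": "SE", "condition": None},
]

print(executed_models(associations, {"searchHub": "support"}))
print(executed_models(associations, {"searchHub": "docs"}))
```

When two models of the same type both execute, as in the first call, the one lower in the list runs last and is the one that can override the other.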

To reorder model associations in a query pipeline

On the Query Pipelines (platform-ca | platform-eu | platform-au) page, click the query pipeline in which you want to reorder model associations, and then click Edit components in the Action bar.
On the subpage that opens, select the Machine learning tab.
Click the model whose position you want to change, and then use the Move up or Move down arrows in the Action bar to change the position of the model.

Required privileges

By default, members with the required privileges can view and edit elements of the Models (platform-ca | platform-eu | platform-au) page.

The following table indicates the privileges required to use elements of the Models page and associated panels (see Manage privileges and Privilege reference).

Action	Service	Domain	Required access level
View model associations	Machine Learning	Models	View
View model associations	Organization	Organization	View
View model associations	Search	Query pipelines	View
Edit model associations	Machine Learning	Models	View
Edit model associations	Organization	Organization	View
Edit model associations	Search	Query pipelines	Edit

What’s next?

Use the Passage Retrieval API to extract the passages that were retrieved by the CPR model to use in your LLM-powered application.