Machine Learning Advanced Configuration

When creating or updating a Coveo Machine Learning (Coveo ML) model, you can specify various advanced parameters to tailor the model to specific use cases. This article provides reference information on the available advanced model parameters.

If your Coveo organization manages models’ advanced configurations using the deprecated commandLineParameters method, see Machine Learning Advanced Configuration - Deprecated Methods.

In addition to the custom model parameters described in this article, you can also use the mlParameters query parameter to adjust the way your Coveo ML models are used at query time.

Specify an Advanced Configuration

  1. On the Models page, click the model for which you want to add an advanced configuration, and then, in the Action bar, click Edit.

  2. On the subpage that opens, select the Configuration tab.

  3. At the top-left corner, select the Advanced tab.

  4. In the JSON editor, enter the desired configuration. Available parameters differ depending on the type of model for which you want to specify an advanced configuration (see the Reference section).
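The JSON editor expects strict JSON, and curly quotes pasted from a rich-text editor are a common source of invalid configurations. The following Python sketch checks a configuration locally before you paste it. It's a convenience check under that assumption, not part of the Coveo platform, and check_advanced_config is a hypothetical helper name.

```python
import json

# Curly quotes (U+201C, U+201D, U+2018, U+2019) often sneak in when JSON is copied from rich text.
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"


def check_advanced_config(text: str) -> dict:
    """Return the parsed configuration, or raise ValueError if it isn't a strict JSON object."""
    if any(ch in text for ch in CURLY_QUOTES):
        raise ValueError('curly quotes found; use straight double quotes (")')
    config = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(config, dict):
        raise ValueError("the advanced configuration must be a JSON object")
    return config


print(check_advanced_config('{"filterFields": []}'))  # {'filterFields': []}

try:
    check_advanced_config('{"featureSelect\u201d: {\u201cwhitelist": []}}')
except ValueError as err:
    print("rejected:", err)
```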

Reference

ART (topClicks) Advanced Model Parameters

filterFields (list of strings)

This parameter lets you select the Coveo Usage Analytics (Coveo UA) dimensions to use as filters for potential suggestions. The model suggests an item only if it has been clicked with the specified filter values.

Default value is the list ["originLevel1", "originLevel2"].

With the default filterFields value (i.e., ["originLevel1", "originLevel2"]), if there are two possible originLevel1 values (e.g., partnerHub and techSupportHub) and four possible originLevel2 values (e.g., all, documentation, training, and community), a total of eight possible filters are created (partnerHub/all, techSupportHub/all, partnerHub/documentation, etc.). This means that if partnerHub/all is received at query time, the model returns only the items clicked in partnerHub/all.
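To make the arithmetic concrete, here is a simplified Python illustration of how filterFields partitions clicks: each distinct combination of filter values forms its own filter, and a query only sees the items clicked under its own combination. The click-event shape (originLevel1, originLevel2, and item keys) and the helper names are hypothetical; this isn't Coveo’s implementation.

```python
from collections import defaultdict
from itertools import product

FILTER_FIELDS = ["originLevel1", "originLevel2"]  # default filterFields value


def filter_key(click: dict, fields: list) -> tuple:
    """Return the click's filter values, in filterFields order."""
    return tuple(click.get(field) for field in fields)


def items_by_filter(clicks: list, fields: list) -> dict:
    """Group clicked items under their filter-value combination."""
    groups = defaultdict(set)
    for click in clicks:
        groups[filter_key(click, fields)].add(click["item"])
    return groups


hubs = ["partnerHub", "techSupportHub"]
tabs = ["all", "documentation", "training", "community"]
print(len(list(product(hubs, tabs))))  # 8 possible filters

clicks = [
    {"originLevel1": "partnerHub", "originLevel2": "all", "item": "doc-A"},
    {"originLevel1": "techSupportHub", "originLevel2": "all", "item": "doc-B"},
]
groups = items_by_filter(clicks, FILTER_FIELDS)
# A query received with partnerHub/all only sees items clicked in partnerHub/all.
print(sorted(groups[("partnerHub", "all")]))  # ['doc-A']
```

Passing an empty list as fields puts every click under the same empty key, which mirrors the empty filterFields configuration described further in this section.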

If you set fields other than the two defaults (i.e., originLevel1 and originLevel2), you must also pass their values at query time using the filters object of the mlParameters query parameter.

EXAMPLE

You want your ML model to consider the possible value combinations of the originContext and originLevel2 dimensions when filtering results, because some results aren’t available in every combination.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "filterFields": [
      "originContext",
      "originLevel2"
    ]
}

This requires sending the dimension values at query time in the filters object of the mlParameters query parameter, as follows:

"mlParameters": {
    "filters": {
        "originContext": "<MY-CONTEXT-VALUE>",
        "originLevel2": "<TAB-VALUE>"
    }
}
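As a sketch of where these values go in a full request, the following Python snippet builds a Search API request body that carries both the query and the mlParameters filters. It only prints the JSON body; the endpoint and authentication are left out, the example query is hypothetical, and the placeholder values should be replaced with the values of the current page.

```python
import json


def build_search_body(query: str, origin_context: str, tab: str) -> dict:
    """Build a request body that passes custom filterFields values via mlParameters."""
    return {
        "q": query,
        "mlParameters": {
            "filters": {
                "originContext": origin_context,
                "originLevel2": tab,
            }
        },
    }


body = build_search_body("reset password", "<MY-CONTEXT-VALUE>", "<TAB-VALUE>")
print(json.dumps(body, indent=2))
```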

You may also want to build a model that doesn’t use filters at all, for example because all items are accessible everywhere. To do so, set the filterFields parameter to an empty list in the model configuration. The model then provides the same relevance across all search hubs that use it.

For example:

{
    "filterFields": []
}

userContextFields (list of strings)

The usage analytics dimensions whose values should be used as the user context by the model to influence the ranking scores of items.

When configuring the userContextFields advanced parameter, make sure that the related dimension values are sent at query time in the context query parameter.

EXAMPLE

You want to build an ML model that uses the originLevel3 and userGroups usage analytics dimensions as the user context to influence the ranking scores of items.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "userContextFields": [
      "originLevel3",
      "userGroups"
    ]
}
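At query time, the values of these dimensions go in the context query parameter. The following Python sketch builds such a request body and fails early if one of the configured userContextFields is missing. The helper name and the example values are hypothetical, and the request itself isn’t sent.

```python
import json

# Must mirror the model's userContextFields configuration.
USER_CONTEXT_FIELDS = ["originLevel3", "userGroups"]


def build_context_body(query: str, context: dict) -> dict:
    """Build a request body whose context carries every userContextFields value."""
    missing = [field for field in USER_CONTEXT_FIELDS if field not in context]
    if missing:
        raise ValueError(f"missing context values for: {', '.join(missing)}")
    return {"q": query, "context": context}


# Hypothetical dimension values for the current user.
body = build_context_body("billing", {"originLevel3": "<REFERRER-VALUE>", "userGroups": "<GROUP-VALUE>"})
print(json.dumps(body, indent=2))
```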

automaticContextDiscovery (boolean)

Whether the model should evaluate custom usage analytics dimensions prefixed with context_ to provide predictions or recommendations.

Default: true

When set to false, the model doesn’t automatically consider user context found in the data. However, it still uses the user context fields defined in the userContextFields parameter.

EXAMPLE

You want to build an ML model that doesn’t evaluate custom usage analytics dimensions prefixed with context_. Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "automaticContextDiscovery": false
}
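The interaction between this flag and userContextFields can be summarized with a small Python illustration: the userContextFields dimensions are always used, and context_-prefixed dimensions are added only when automaticContextDiscovery is true. The helper name and the dimension names are hypothetical, and the sketch assumes that stored custom dimension names may carry an extra c_ prefix (as in the c_context_brand keys used elsewhere in this article). It models the documented behavior, not Coveo’s code.

```python
def context_dimensions(dimension_names, user_context_fields, automatic_context_discovery=True):
    """Return the dimensions used as user context (simplified model of the documented behavior)."""
    selected = set(user_context_fields)  # always used, whatever the flag value
    if automatic_context_discovery:
        for name in dimension_names:
            # Assumption: custom dimensions may be stored with a "c_" prefix, e.g., c_context_brand.
            bare = name[2:] if name.startswith("c_") else name
            if bare.startswith("context_"):
                selected.add(name)
    return selected


dims = ["originLevel1", "userGroups", "context_brand", "c_context_region"]
print(sorted(context_dimensions(dims, ["userGroups"])))
# ['c_context_region', 'context_brand', 'userGroups']
print(sorted(context_dimensions(dims, ["userGroups"], automatic_context_discovery=False)))
# ['userGroups']
```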

testConfiguration (boolean)

Whether to activate the test configuration mode for this model. Use this parameter in sandbox environments, when very little analytics data is available to train a model.

Default: false

When set to true, the parameter reduces the amount of analytics data required to build the model. It also reduces other frequency thresholds that discard queries or clicks that were not performed frequently enough.

EXAMPLE

In a sandbox environment, you want to build an ML model that takes infrequent analytics data into account in its learning process.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "testConfiguration": true
}

filterOutEmptyQueries (boolean)

Whether the ART model ignores clicks that follow empty queries when analyzing usage analytics events.

Default value is true, meaning that ART models learn from the most clicked documents following non-empty queries only. When set to false, ART models also learn from the most clicked documents following empty queries.

Regardless of this value, ART models still provide predictions (popular documents) for empty queries. These predictions take the filterFields parameter into account, whose default behavior is to build a different submodel for each search hub and tab. In other words, ART models output popular documents clicked in the current search hub and tab.

EXAMPLE

You want your ART model to consider clicks that followed empty queries.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "filterOutEmptyQueries": false
}
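The effect of this parameter on the training data can be sketched in Python as a simple filter over click events. The event shape (query and item keys) and the helper name are hypothetical, and an empty query is assumed to be an empty string. The sketch only illustrates which clicks are kept for learning, not how predictions are computed.

```python
def clicks_to_learn_from(clicks, filter_out_empty_queries=True):
    """Return the clicks an ART model would learn from under the given setting (illustration only)."""
    if not filter_out_empty_queries:
        return list(clicks)
    return [click for click in clicks if click["query"] != ""]


clicks = [
    {"query": "vpn setup", "item": "doc-1"},
    {"query": "", "item": "doc-2"},  # click that followed an empty query
]
print([c["item"] for c in clicks_to_learn_from(clicks)])  # ['doc-1']
print([c["item"] for c in clicks_to_learn_from(clicks, filter_out_empty_queries=False)])  # ['doc-1', 'doc-2']
```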

whitelist (list of strings)

The dimension key names (e.g., context keys) to include in the machine learning model by overriding the Feature Selection algorithm. The algorithm keeps all specified dimensions, meaning that the end-user experience is personalized according to these dimensions. Specify this list inside the featureSelect object.

Default value is the list [].

If the same context key is used in both the blacklist and whitelist parameters, the whitelist takes precedence.

EXAMPLE

You want an ML model to always keep the c_context_brand and c_context_contact_primary_role dimension keys, overriding the Feature Selection algorithm.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
  "featureSelect": {
      "whitelist": [
          "c_context_brand",
          "c_context_contact_primary_role"
      ]
  }
}

blacklist (list of strings)

The dimension key names (e.g., context keys) to exclude from ML models by overriding the Feature Selection algorithm. This algorithm ignores all specified dimensions, meaning that the end-user experience isn’t personalized according to these dimensions. Specify this list inside the featureSelect object.

Default value is an empty list ([]).

If the same context key is used in both the blacklist and whitelist parameters, the whitelist takes precedence.

EXAMPLE

You want an ART model to exclude the c_context_brand and c_context_contact_primary_role dimension keys from its learning process.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
  "featureSelect": {
      "blacklist": [
          "c_context_brand",
          "c_context_contact_primary_role"
      ]
  }
}
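The precedence rule between the two lists can be expressed as set operations. In this Python sketch, selected_by_algorithm stands for whatever the Feature Selection algorithm would keep on its own; that input, the c_context_region key, and the helper name are hypothetical, since the algorithm itself isn’t documented here.

```python
def apply_feature_select(selected_by_algorithm, whitelist=(), blacklist=()):
    """Apply featureSelect overrides to the keys chosen by the Feature Selection algorithm."""
    white = set(whitelist)
    black = set(blacklist) - white  # a key in both lists is kept: whitelist takes precedence
    return (set(selected_by_algorithm) | white) - black


kept = apply_feature_select(
    {"c_context_brand", "c_context_region"},
    whitelist=["c_context_contact_primary_role"],
    blacklist=["c_context_brand", "c_context_contact_primary_role"],
)
print(sorted(kept))  # ['c_context_contact_primary_role', 'c_context_region']
```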

QS (querySuggest) Advanced Model Parameters

filterFields (list of strings)

This parameter lets you select the Coveo Usage Analytics (Coveo UA) dimensions to use as filters for potential suggestions. The model suggests an item only if it has been clicked with the specified filter values.

Default value is the list ["originLevel1", "originLevel2"].

With the default filterFields value (i.e., ["originLevel1", "originLevel2"]), if there are two possible originLevel1 values (e.g., partnerHub and techSupportHub) and four possible originLevel2 values (e.g., all, documentation, training, and community), a total of eight possible filters are created (partnerHub/all, techSupportHub/all, partnerHub/documentation, etc.). This means that if partnerHub/all is received at query time, the model returns only the items clicked in partnerHub/all.

If you set fields other than the two defaults (i.e., originLevel1 and originLevel2), you must also pass their values at query time using the filters object of the mlParameters query parameter.

EXAMPLE

You want your ML model to consider the possible value combinations of the originContext and originLevel2 dimensions when filtering results, because some results aren’t available in every combination.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "filterFields": [
      "originContext",
      "originLevel2"
    ]
}

This requires sending the dimension values at query time in the filters object of the mlParameters query parameter, as follows:

"mlParameters": {
    "filters": {
        "originContext": "<MY-CONTEXT-VALUE>",
        "originLevel2": "<TAB-VALUE>"
    }
}

You may also want to build a model that doesn’t use filters at all, for example because all items are accessible everywhere. To do so, set the filterFields parameter to an empty list in the model configuration. The model then provides the same relevance across all search hubs that use it.

For example:

{
    "filterFields": []
}

userContextFields (list of strings)

The usage analytics dimensions whose values should be used as the user context by the model to influence the ranking scores of items.

When configuring the userContextFields advanced parameter, make sure that the related dimension values are sent at query time in the context query parameter.

EXAMPLE

You want to build an ML model that uses the originLevel3 and userGroups usage analytics dimensions as the user context to influence the ranking scores of items.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "userContextFields": [
      "originLevel3",
      "userGroups"
    ]
}

automaticContextDiscovery (boolean)

Whether the model should evaluate custom usage analytics dimensions prefixed with context_ to provide predictions or recommendations.

Default: true

When set to false, the model doesn’t automatically consider user context found in the data. However, it still uses the user context fields defined in the userContextFields parameter.

EXAMPLE

You want to build an ML model that doesn’t evaluate custom usage analytics dimensions prefixed with context_.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "automaticContextDiscovery": false
}

testConfiguration (boolean)

Whether to activate the test configuration mode for this model. Use this parameter in sandbox environments, when very little analytics data is available to train a model.

Default: false

When set to true, the parameter reduces the amount of analytics data required to build the model. It also reduces other frequency thresholds that discard queries or clicks that were not performed frequently enough.

EXAMPLE

In a sandbox environment, you want to build an ML model that takes infrequent analytics data into account in its learning process.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "testConfiguration": true
}

whitelist (list of strings)

The dimension key names (e.g., context keys) to include in the machine learning model by overriding the Feature Selection algorithm. The algorithm keeps all specified dimensions, meaning that the end-user experience is personalized according to these dimensions. Specify this list inside the featureSelect object.

Default value is the list [].

If the same context key is used in both the blacklist and whitelist parameters, the whitelist takes precedence.

EXAMPLE

You want an ML model to always keep the c_context_brand and c_context_contact_primary_role dimension keys, overriding the Feature Selection algorithm.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
  "featureSelect": {
      "whitelist": [
          "c_context_brand",
          "c_context_contact_primary_role"
      ]
  }
}

blacklist (list of strings)

The dimension key names (e.g., context keys) to exclude from ML models by overriding the Feature Selection algorithm. This algorithm ignores all specified dimensions, meaning that the end-user experience isn’t personalized according to these dimensions. Specify this list inside the featureSelect object.

Default value is an empty list ([]).

If the same context key is used in both the blacklist and whitelist parameters, the whitelist takes precedence.

EXAMPLE

You want a QS model to exclude the c_context_brand and c_context_contact_primary_role dimension keys from its learning process.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
  "featureSelect": {
      "blacklist": [
          "c_context_brand",
          "c_context_contact_primary_role"
      ]
  }
}

queryReplacePatterns (list of tuples)

A set of patterns to find and reformat in query suggestions.

The first value of each tuple (i.e., pattern) must be a regular expression to test against each original query suggestion.

The second value of each tuple (i.e., ordering) is the replacement pattern to apply when a query suggestion matching the pattern is found. Captured pattern groups can be referenced in the ordering pattern using $1, $2, etc.

  • The following characters aren’t supported in replacement patterns: (, ), [, ], {, and }.

  • Query suggestions are automatically lower-cased.

EXAMPLE

You want your QS model to reformat the following query suggestions:

  • 5551234567 to become 555-123-4567

  • abc123 to become 1a2b3c

Therefore, you enter the following JSON in your model’s advanced configuration:

{
  "queryReplacePatterns": [
    {
      "pattern": "(\\d{3})(\\d{3})(\\d{4})",
      "ordering": "$1-$2-$3"
    },
    {
      "pattern": "(a)(b)(c)(1)(2)(3)",
      "ordering": "$4$1$5$2$6$3"
    }
  ]
}
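To preview what a set of patterns does before saving it, you can approximate the replacement in Python. This sketch lowercases each suggestion, converts the $1-style references into Python’s \g<1> syntax, and applies every pattern in order with re.sub. Python’s regular expression flavor, the in-order application, and replacing every match are assumptions; the Coveo implementation may differ on those points. Note that inside a JSON string, the regex backslash must itself be escaped (write \\d, not \d).

```python
import json
import re

CONFIG = r'''
{
  "queryReplacePatterns": [
    {"pattern": "(\\d{3})(\\d{3})(\\d{4})", "ordering": "$1-$2-$3"},
    {"pattern": "(a)(b)(c)(1)(2)(3)", "ordering": "$4$1$5$2$6$3"}
  ]
}
'''


def to_python_template(ordering: str) -> str:
    """Convert $1-style group references into Python's \\g<1> replacement syntax."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", ordering)


def reformat_suggestion(suggestion: str, patterns: list) -> str:
    """Lowercase the suggestion, then apply each pattern/ordering pair in order."""
    text = suggestion.lower()
    for entry in patterns:
        text = re.sub(entry["pattern"], to_python_template(entry["ordering"]), text)
    return text


patterns = json.loads(CONFIG)["queryReplacePatterns"]
print(reformat_suggestion("5551234567", patterns))  # 555-123-4567
print(reformat_suggestion("ABC123", patterns))  # 1a2b3c
```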

ER (eventRecommendation) Advanced Model Parameters

userContextFields (list of strings)

The usage analytics dimensions whose values should be used as the user context by the model to influence the ranking scores of items.

When configuring the userContextFields advanced parameter, make sure that the related dimension values are sent at query time in the context query parameter.

EXAMPLE

You want to build an ML model that uses the originLevel3 and userGroups usage analytics dimensions as the user context to influence the ranking scores of items.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "userContextFields": [
      "originLevel3",
      "userGroups"
    ]
}

automaticContextDiscovery (boolean)

Whether the model should evaluate custom usage analytics dimensions prefixed with context_ to provide predictions or recommendations.

Default: true

When set to false, the model doesn’t automatically consider user context found in the data. However, it still uses the user context fields defined in the userContextFields parameter.

EXAMPLE

You want to build an ML model that doesn’t evaluate custom usage analytics dimensions prefixed with context_.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "automaticContextDiscovery": false
}

testConfiguration (boolean)

Whether to activate the test configuration mode for this model. Use this parameter in sandbox environments, when very little analytics data is available to train a model.

Default: false

When set to true, the parameter reduces the amount of analytics data required to build the model. It also reduces other frequency thresholds that discard browsing patterns that were not performed frequently enough.

EXAMPLE

In a sandbox environment, you want to build an ML model that takes infrequent analytics data into account in its learning process.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "testConfiguration": true
}

recommendationUseCase (string)

Sets the usage analytics configuration needed for an ER model that’s attached to a Coveo In-Product Experience (IPX) interface.

You can configure the model to recommend items in an IPX interface based on the current URL (ipx_referrer), or on the current URL and visitor ID (ipx_visitor_referrer). Depending on the page a user is currently viewing, the ER model recommends items in the IPX interface that pertain to the current website URL. This gives users access to the most relevant items based on their location on your website. Combining the URL with the visitor ID further personalizes recommendations based on a user’s previous searches and browsing history.

Your website must generate unique visitor IDs to take advantage of the personalization benefits of ipx_visitor_referrer.

EXAMPLE

You want your ML model to recommend items in an IPX interface that are relevant to the page the user is currently viewing.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "recommendationUseCase": "ipx_referrer"
}
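Since the choice between the two values depends on whether your website generates unique visitor IDs, the following Python sketch picks the value accordingly and lists the signals each use case relies on, as described above. The helper name and the signal labels (page_url, visitor_id) are hypothetical and don’t correspond to actual usage analytics field names.

```python
# Signals each use case relies on, per the description above (labels are hypothetical).
SIGNALS = {
    "ipx_referrer": ("page_url",),
    "ipx_visitor_referrer": ("page_url", "visitor_id"),
}


def recommendation_config(has_unique_visitor_ids: bool) -> dict:
    """Pick the recommendationUseCase value; ipx_visitor_referrer needs unique visitor IDs."""
    use_case = "ipx_visitor_referrer" if has_unique_visitor_ids else "ipx_referrer"
    return {"recommendationUseCase": use_case}


config = recommendation_config(has_unique_visitor_ids=False)
print(config)  # {'recommendationUseCase': 'ipx_referrer'}
print(SIGNALS[config["recommendationUseCase"]])  # ('page_url',)
```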

urlReplacePatterns (list of tuples)

A set of patterns to find and reformat in URLs.

The first value of each tuple (i.e., pattern) must be a regular expression to test against each URL.

The second value of each tuple (i.e., replace) is the replacement pattern to apply when a URL matching the pattern is found.

EXAMPLE

You want your ER model to remove URL fragments (the # character and everything after it) from URLs.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
  "urlReplacePatterns": [
    {
      "pattern": "#.*",
      "replace": ""
    }
  ]
}
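You can preview the effect of a URL pattern with a short Python sketch. It assumes Python regular expression semantics, applies the patterns in order, and passes each replace value straight to re.sub, so it only mirrors literal replacements such as the empty string used here. normalize_url is a hypothetical helper.

```python
import re

URL_REPLACE_PATTERNS = [{"pattern": "#.*", "replace": ""}]


def normalize_url(url: str, patterns: list) -> str:
    """Apply each pattern/replace pair to the URL, in order."""
    for entry in patterns:
        url = re.sub(entry["pattern"], entry["replace"], url)
    return url


# Both URLs reduce to https://example.com/docs/setup once the fragment is removed.
print(normalize_url("https://example.com/docs/setup#step-2", URL_REPLACE_PATTERNS))
print(normalize_url("https://example.com/docs/setup#faq", URL_REPLACE_PATTERNS))
```

Because both variants reduce to the same string, views of either one would presumably be attributed to a single page URL.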

DNE (facetSense) Advanced Model Parameters

filterFields (list of strings)

This parameter lets you select the Coveo Usage Analytics (Coveo UA) dimensions to use as filters for potential suggestions. The model suggests an item only if it has been clicked with the specified filter values.

Default value is the list ["originLevel1", "originLevel2"].

With the default filterFields value (i.e., ["originLevel1", "originLevel2"]), if there are two possible originLevel1 values (e.g., partnerHub and techSupportHub) and four possible originLevel2 values (e.g., all, documentation, training, and community), a total of eight possible filters are created (partnerHub/all, techSupportHub/all, partnerHub/documentation, etc.). This means that if partnerHub/all is received at query time, the model returns only the items clicked in partnerHub/all.

If you set fields other than the two defaults (i.e., originLevel1 and originLevel2), you must also pass their values at query time using the filters object of the mlParameters query parameter.

EXAMPLE

You want your ML model to consider the possible value combinations of the originContext and originLevel2 dimensions when filtering results, because some results aren’t available in every combination.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "filterFields": [
      "originContext",
      "originLevel2"
    ]
}

This requires sending the dimension values at query time in the filters object of the mlParameters query parameter, as follows:

"mlParameters": {
    "filters": {
        "originContext": "<MY-CONTEXT-VALUE>",
        "originLevel2": "<TAB-VALUE>"
    }
}

You may also want to build a model that doesn’t use filters at all, for example because all items are accessible everywhere. To do so, set the filterFields parameter to an empty list in the model configuration. The model then provides the same relevance across all search hubs that use it.

For example:

{
    "filterFields": []
}

testConfiguration (boolean)

Whether to activate the test configuration mode for this model. Use this parameter in sandbox environments, when very little analytics data is available to train a model.

Default: false

When set to true, the parameter reduces the amount of analytics data required to build the model. It also reduces other frequency thresholds that discard queries or clicks that were not performed frequently enough.

EXAMPLE

In a sandbox environment, you want to build an ML model that takes infrequent analytics data into account in its learning process.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "testConfiguration": true
}

PR (ecommerce) Advanced Model Parameters

testConfiguration (boolean)

Whether to activate the test configuration mode for this model. Use this parameter in sandbox environments, when very little analytics data is available to train a model.

Default: false

When set to true, the parameter reduces the amount of analytics data required to build the model. It also reduces other frequency thresholds that discard browsing patterns that were not performed frequently enough.

EXAMPLE

In a sandbox environment, you want to build an ML model that takes infrequent analytics data into account in its learning process.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
    "testConfiguration": true
}

itemsToIgnore (list of strings)

A list of items for the model to ignore during training.

The items are identified by their unique identifiers (SKUs), which are filtered out of the model’s dataset.

EXAMPLE

You want your PR model to ignore the items with SKUs 12345678 and 87654321 during its learning process.

Therefore, you enter the following JSON in your model’s advanced configuration:

{
  "itemsToIgnore": [
    "12345678",
    "87654321"
  ]
}
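Conceptually, itemsToIgnore removes matching events from the training data before the model learns from it. The following Python sketch illustrates that filtering under a hypothetical event shape (a sku key) and a hypothetical helper name. Because the configuration lists SKUs as strings, the sketch compares them as strings too, so an SKU stored as the number 12345678 wouldn’t match.

```python
ITEMS_TO_IGNORE = ["12345678", "87654321"]


def training_events(events, items_to_ignore):
    """Drop events whose SKU is listed in itemsToIgnore (illustration only)."""
    ignored = set(items_to_ignore)
    return [event for event in events if event["sku"] not in ignored]


events = [
    {"sku": "12345678", "action": "view"},
    {"sku": "11111111", "action": "purchase"},
    {"sku": "87654321", "action": "addToCart"},
]
print([e["sku"] for e in training_events(events, ITEMS_TO_IGNORE)])  # ['11111111']
```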