Coveo Machine Learning Models

Coveo™ Machine Learning (Coveo ML) features leverage usage analytics data, creating and training algorithmic models that predict and recommend the content most likely to help users. Coveo ML is a service managed by Coveo; you can think of it as a 'scientist in a box' that handles the model complexity for you.

Your Coveo Cloud organization administrator can activate and configure a Coveo ML feature in minutes through the administration console. Behind the scenes, a predictive model is automatically built and is typically ready to make recommendations within 30 to 60 minutes.

The time required to build a model depends mainly on the system load (i.e., the number of model requests in the queue) and the size of the training set. Therefore, even models with small training sets can take several minutes to build when the request is in the queue waiting for resources.

A Coveo ML feature starts making recommendations as soon as sufficient data is available to the model; some features have minimum threshold values. Consequently, if you have only just started collecting usage analytics data when you enable a Coveo ML feature, it may take some time, depending on your search traffic, before the operational model actually starts making recommendations. Recommendations then improve as more data becomes available.

Training and Retraining

A model is trained with usage analytics data from a given recent period and regularly retrained.

By default, a Coveo ML model is built from the usage analytics data of the 3 months preceding the build, to ensure that sufficient data is available, and is retrained every week to keep the model fresh.
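For illustration, these defaults can also be expressed as a model configuration sent to the Coveo Platform. The following is a minimal sketch only, in Python with the requests library; the endpoint path, the engineId value, and the field names (modelDisplayName, exportPeriod, intervalTime, intervalUnit) are assumptions about the Coveo Machine Learning API rather than confirmed details, and a real call requires an API key with the appropriate privileges.

    import requests  # third-party HTTP library

    ORG_ID = "your-organization-id"   # placeholder
    API_KEY = "your-api-key"          # placeholder; needs Machine Learning model privileges

    # Assumed endpoint of the Coveo Machine Learning model API.
    url = f"https://platform.cloud.coveo.com/rest/organizations/{ORG_ID}/machinelearning/models"

    # Assumed field names; the values mirror the documented defaults:
    # a 3-month Data Period (ISO 8601 "P3M") retrained every week.
    model_config = {
        "engineId": "topclicks",               # assumed id for Automatic Relevance Tuning
        "modelDisplayName": "Community ART model",
        "exportPeriod": "P3M",                 # Data Period: last 3 months
        "intervalTime": 1,
        "intervalUnit": "WEEK",                # retraining Frequency: weekly
    }

    response = requests.post(
        url,
        json=model_config,
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    response.raise_for_status()
    print(response.json())  # the created model, including its generated id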

The more data the model has to learn from, the better its recommendations. As a general guide, a usage analytics data set of 10,000 queries or more typically allows a Coveo ML feature model to provide very relevant recommendations. Review your Coveo usage analytics data to evaluate the volume of queries on your search hub, and ensure that your Coveo ML features are configured with a training Data Period that corresponds to at least 10,000 queries. When your search hub serves a very high volume of queries, consider reducing the Data Period so that the model learns only from recent user behavior and responds more quickly to trends.
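As a quick way to size the Data Period against the 10,000-query guideline, you can work out how many days of data are needed from your average daily query volume. A minimal sketch; the function name and the sample volume of 250 queries per day are illustrative only.

    import math

    MIN_QUERIES = 10_000  # general guideline from this article


    def minimum_data_period_days(avg_queries_per_day: float) -> int:
        """Smallest whole number of days expected to reach MIN_QUERIES."""
        if avg_queries_per_day <= 0:
            raise ValueError("Average daily query volume must be positive.")
        return math.ceil(MIN_QUERIES / avg_queries_per_day)


    # A search hub averaging 250 queries per day reaches 10,000 queries in
    # about 40 days, so a 3-month Data Period comfortably exceeds the guideline.
    print(minimum_data_period_days(250))  # -> 40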

A Coveo ML feature model is regularly retrained on a more recent Coveo usage analytics data set to ensure that recent user behavior is learned and model freshness is maintained.

Set your Coveo ML model training Frequency parameter in relation to the Data Period value. Select a longer time interval for a larger Data Period and a shorter time interval for a smaller Data Period, as recommended in the following table.

Data Period    Frequency
               Daily    Weekly    Monthly
1 week
1 month
3 months
6 months
1 year

(A thumbs-up icon marks the recommended Frequency for each Data Period; other marked combinations are available.)

  • Some Data Period and training Frequency parameter value combinations are not allowed, because retraining a model very frequently on a long Data Period would have very little effect while consuming significant Coveo ML service resources.

  • If your Coveo Cloud organization has not yet collected enough data to meet the requirements, but your search interface gets more than 55 visits per day in which a manual query is followed by a click for a specific language, you can use the following configuration, depending on how long you have been collecting data (see the sketch after this list):

    Data collected for    Data Period    Training Frequency (Daily / Weekly / Monthly)
    1 to 29 days          1 month
    1 month               3 months
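The bootstrap configuration above can be expressed as a small decision helper. A minimal sketch, assuming you know how many days of usage analytics data have been collected and the daily number of qualifying visits; the 55-visit threshold and the Data Period values come from the list above, while the function and parameter names are hypothetical.

    def suggested_data_period(days_collected: int, qualifying_visits_per_day: float) -> str:
        """Pick a bootstrap Data Period per the table above.

        A qualifying visit is one in which a manual query is followed by a
        click, counted for a specific language.
        """
        if qualifying_visits_per_day <= 55:
            return "keep collecting data until the minimum requirements are met"
        if days_collected < 1:
            raise ValueError("days_collected must be at least 1")
        if days_collected <= 29:
            return "1 month"
        return "3 months"  # roughly one month of collected data or more


    print(suggested_data_period(days_collected=21, qualifying_visits_per_day=80))  # -> 1 month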

Sub-Models

A model automatically includes sub-models, for example per language, hub, and interface, to uniquely optimize recommendations for each specific search experience on your community.

A model learns separately from search visits made in search interfaces offered in different languages, since keywords often differ from one language to another.

  • The number of sub-models does not matter. However, the quality of a sub-model depends on the number of events used to build it. You can review the number of recommended items for each sub-model of ART models in the administration console (see Reviewing Coveo Machine Learning Model Information).

  • Sub-models are not grouped, meaning that sub-models built on very different user behaviors do not negatively impact the quality of the parent model.

  • The variation in the data set sizes used to build sub-models has no negative impact on the parent model quality.

Suppose your Community search page and your content are available in several languages, but 75% of the queries are made in the English version of the search page and only 4% in the Greek version.

In the Greek search page, a user searches for DFT-400, a product name that is the same in all languages. Because a sub-model learned the user behavior for the Greek search page only, Automatic Relevance Tuning (ART) can recommend relevant Greek items for the DFT-400 product. Without language sub-models, ART would most likely recommend English DFT-400 items, which would not appear in the search results because they are not part of the Greek search interface scope.

Different search hubs or interfaces typically serve different purposes, and users expect different results for the same query in each. Sub-models filter out recommendations that do not match the current hub and interface combination, preventing items from being recommended outside of the expected scope.
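The following is a conceptual sketch of this partitioning and filtering behavior, not Coveo's actual implementation: usage analytics events are grouped by (language, hub, interface), a separate sub-model learns each group, and recommendations are served from the sub-model matching the current search context only. The event records, item names, and function names are illustrative assumptions.

    # Conceptual illustration only (not Coveo's implementation): partition usage
    # analytics events by (language, hub, interface) so each sub-model learns a
    # single search experience, then serve recommendations from the matching
    # sub-model only.
    from collections import Counter, defaultdict

    # Simplified click events: (query, clicked_item, language, hub, interface).
    events = [
        ("DFT-400", "dft-400-manual-el", "el", "CommunityHub", "CommunitySearch"),
        ("DFT-400", "dft-400-manual-en", "en", "CommunityHub", "CommunitySearch"),
        ("DFT-400", "dft-400-manual-en", "en", "CommunityHub", "CommunitySearch"),
    ]

    # One sub-model per (language, hub, interface) key: here, simply count how
    # often each item was clicked for a given query within that context.
    sub_models: dict = defaultdict(lambda: defaultdict(Counter))
    for query, item, language, hub, interface in events:
        sub_models[(language, hub, interface)][query][item] += 1


    def recommend(query: str, language: str, hub: str, interface: str, top_n: int = 3):
        """Return recommendations from the sub-model matching the current context only."""
        context_model = sub_models.get((language, hub, interface))
        if not context_model:
            return []  # no sub-model exists yet for this context
        return [item for item, _ in context_model[query].most_common(top_n)]


    # A Greek visitor searching for "DFT-400" gets Greek items, even though the
    # English context has more traffic for the same query.
    print(recommend("DFT-400", language="el", hub="CommunityHub", interface="CommunitySearch"))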