Coveo Machine Learning FAQ

Coveo Machine Learning (Coveo ML) is a service that leverages usage analytics data to deliver relevant search results and proactive recommendations (see Coveo Machine Learning). Coveo ML offers several features that improve content relevance by using predictive models to make recommendations.

This article answers common questions about Coveo ML, its features, and their models.

General

How Does Intelligent Term Detection (ITD) Work?

A typical Natural Language Processing (NLP) application extracts terms based on linguistic rules and word frequency.

In addition to traditional NLP methods, Coveo Machine Learning (Coveo ML) leverages the vocabulary that end users previously employed to search for content, thus providing a more accurate picture of what’s contextually relevant to the current end user.

A typical basic query expression (q) (i.e., what the end user types in a search box) contains an average of four keywords. However, some use cases require much larger chunks of text to be used as input for queries, such as an entire support case description. The default query processing algorithm isn’t designed to deal with that many keywords. Therefore, Coveo ML provides an Intelligent Term Detection (ITD) algorithm to extract only the most relevant keywords from such large query expressions (lq).

To do so, ITD uses a set of usage analytics events (search and click) recorded for a search interface:

  1. It selects up to 2,500 queries that generated a positive outcome (i.e., a query result was opened) and were performed at least five times. The selected queries are called top user queries.

    If there are more than 2,500 queries that were made five or more times, then ITD selects the 2,500 most popular ones.

  2. It establishes a correlation between the top user queries and the keywords contained in the lq (e.g., support case description).

  3. It finds the five most relevant terms in the lq based on the average importance of each term (see TFIDF), and on the longest substring in the lq that’s contained in the top user queries. These five terms are called the refined keywords.1

  4. It overrides the original lq with the refined keywords before the query is executed against the index.

1: To review refined keywords in usage analytics reports, you must create a custom usage analytics dimension with the following configuration:

  • API name: refinedkeywords

  • Type: Text

  • Related events: Search, Click, Custom event
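The steps above can be sketched in simplified form. The scoring below (plain TF-IDF averaged over a small reference corpus, with a flat boost for terms that also appear in top user queries) is an illustrative stand-in for Coveo's actual ITD algorithm; every name, threshold, and data structure in it is a hypothetical assumption:

```python
from collections import Counter
import math
import re

def refined_keywords(lq, top_user_queries, corpus, k=5):
    """Illustrative sketch: pick the k most relevant terms from a large
    query expression (lq). `corpus` is a list of term sets standing in
    for indexed documents; not Coveo's actual implementation."""
    terms = re.findall(r"[a-z0-9']+", lq.lower())
    tf = Counter(terms)
    n_docs = len(corpus)

    def idf(t):
        # Smoothed inverse document frequency over the reference corpus
        df = sum(1 for doc in corpus if t in doc)
        return math.log((1 + n_docs) / (1 + df)) + 1

    # Vocabulary drawn from the "top user queries" (step 1 in the article)
    query_vocab = set()
    for q in top_user_queries:
        query_vocab.update(re.findall(r"[a-z0-9']+", q.lower()))

    def score(t):
        tfidf = (tf[t] / len(terms)) * idf(t)
        # Reward terms that real users actually search for (step 2)
        boost = 2.0 if t in query_vocab else 1.0
        return tfidf * boost

    # Keep only the k best terms: the "refined keywords" (steps 3-4)
    return sorted(tf, key=score, reverse=True)[:k]
```

In this toy setup, a long support-case description is reduced to the handful of terms that both carry weight in the corpus and match what past users searched for.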

What Exactly Are the Coveo ML Capabilities for Salesforce Communities?

Specifically, Coveo ML offers the following features for Coveo for Salesforce - Experience Cloud Edition:

Does Coveo ML Support Coveo for Sitecore?

Yes, Coveo for Sitecore supports Coveo ML when using a Cloud edition. However, Coveo ML isn’t available in on-premises installations.

Models

What’s the Optimal Model Data Period?

The default model data period value of 3 months is optimal for most implementations, typically taking into account a large data set that covers user behavior trends over the equivalent of a season.

Consider increasing the data period to get better-trained recommendations when your search hubs serve fewer than about 10,000 user queries per month.

Consider reducing the data period when your search hubs serve significantly more than 10,000 queries per month and you want to get fresher or more trending recommendations.
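These two guidelines can be expressed as a simple decision helper. The function name and the upper threshold are assumptions for illustration, not part of any Coveo API:

```python
def suggested_data_period(monthly_queries, default_months=3):
    """Illustrative heuristic only: pick a model data period (in months)
    from monthly query volume. Thresholds are assumptions, not Coveo's."""
    if monthly_queries < 10_000:
        return 6   # sparse data: more history compensates
    if monthly_queries > 100_000:
        return 1   # plenty of data: favor fresher, trending behavior
    return default_months  # the 3-month default suits most implementations
```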

For more information on model training sets, see Training and Retraining.

When Should I Change a Model Learning Interval/Training Frequency?

The recommended training frequencies for each Data Period value shown in the following table provide optimal results for most implementations. When training a model using a longer Data Period, you can’t retrain the model as frequently.

Data Period              Recommended Training Frequency
1 month                  Daily
3 months (Recommended)   Weekly
6 months                 Monthly

Consider increasing the learning interval/training frequency when your search hubs serve at least 10,000 user queries per data period and your relevant content or user behavior patterns change frequently, so that recommendations adapt more rapidly.

Consider reducing the learning interval/training frequency when your relevant content or your user behavior patterns are stable over time.

For more information on model training sets, see Training and Retraining.

When Exactly Is a Model Retrained?

The Coveo ML service automatically manages the exact date and time at which models are retrained according to the Learning Interval/Training Frequency set for the model. You can’t set a precise retrain schedule date and time and can’t find out when the model was last retrained or the next time it will be retrained.

For more information on model training sets, see Training and Retraining.

Features

How Are Coveo ML Features Deployed?

A Coveo organization administrator can enable and configure Coveo ML features in minutes via the Administration Console. Once configured, the machine learning model builds itself automatically and typically becomes ready to operate within 30-60 minutes (see Manage Machine Learning Models).

Can Coveo ML Features Be Tested Before They Are Activated?

Yes. You can test Coveo ML features on a test query pipeline before deploying them in your production environment. You can start using the test query pipeline offline and then perform A/B tests on real queries (see Test Query Pipeline Changes).

With usage analytics reports, you can then evaluate the impact of Coveo ML features by comparing key metrics for the A and B pipelines (see How Do You Measure the Coveo ML Features Impact? and Analyze the Performance of Pipeline A Vs Pipeline B).

You can also compare ML (ART and QS) models together and compare an ART model with the default index ranking (see Testing Coveo Machine Learning Models).

What Languages Do Coveo ML Features Support?

Coveo ML supports many languages.

A model contains a submodel for each language used by the users whose events were included in the model creation (see Language). Therefore, Coveo ML models support many languages simultaneously, as well as multilingual search (see Coveo Machine Learning Models).

Do Coveo ML Features Work on a Secured Salesforce Community?

Yes. Coveo ML ART, QS, ER, and DNE work on communities where users must log in as long as most authenticated users have access to a significant shared body of content so Coveo ML can learn from the crowd.

How Long Do Coveo ML Features Take to Start Improving Relevance?

Coveo ML features learn from user interactions on your website. The more events a Coveo ML model has to learn from, the better it will be at providing relevant results. If well implemented, Coveo ML generally reaches its best optimization learning from 25,000 to 100,000 queries (see Coveo Cloud Project Guide).

The time to improve relevance therefore depends on the level of search activity on your community and when you started gathering usage analytics data.

How Do You Measure the Coveo ML Features Impact?

You can use the following two traditional marketing metrics to evaluate how successfully your community search connects users with the information they need to solve their specific issues:

  • Click-Through Rate (CTR) – The percentage of users clicking on any link on the search results page. Higher values are better.

  • Average Click Rank (ACR) – Similar in concept to page rank, this metric measures the average position of clicked items in a given set of search results. Lower values are better, as a value of 1 represents the first result in a list.
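As a minimal sketch, both metrics can be computed from raw click data. The event representation here (each search event as a list of 1-based clicked ranks) is an assumption for illustration:

```python
def ctr_and_acr(searches):
    """Compute Click-Through Rate and Average Click Rank.
    `searches` is a list of search events; each event is the list of
    clicked result ranks (1-based), empty when nothing was clicked.
    Illustrative only, not a Coveo UA computation."""
    with_click = [s for s in searches if s]
    # CTR: fraction of searches with at least one click (higher is better)
    ctr = len(with_click) / len(searches)
    # ACR: mean rank over all clicks (lower is better; 1 = first result)
    all_ranks = [r for s in with_click for r in s]
    acr = sum(all_ranks) / len(all_ranks)
    return ctr, acr
```

For example, four searches where one produced no click and the clicked ranks were 1, 3, 5, and 2 yield a CTR of 0.75 and an ACR of 2.75.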

Coveo ML optimizes search results and query suggestions, and will therefore improve CTR and ACR metrics and contribute to increased self-service. You can test the addition of Coveo ML features like any other query pipeline changes (see Test Query Pipeline Changes or Testing Coveo Machine Learning Models).

Evaluate Automatic Relevance Tuning (ART) Impact

You can use usage analytics reports to obtain the percentage of users who clicked on search results that were promoted by ART by using the Click Ranking Modifier dimension.

You want to determine the percentage of users who clicked on ART-promoted results in a given month. Therefore, you create a usage analytics pie chart card using the Click Ranking Modifier dimension and the Click Event Count metric.

The newly created pie chart card now shows the number and the percentage of clicks made on results promoted by ART.

Evaluate Query Suggestions (QS) Impact

You can use usage analytics reports to obtain the number of search events originating from query suggestions by using the Search Cause dimension.

You want to know the number of search events that originated from query suggestions in a given month. Therefore, you create a usage analytics pie chart card using the Search Cause dimension and the Search Event Count metric.

The newly created pie chart card now shows the total number of search events that occurred for each available search cause. You can inspect the number of search events that originated from query suggestions by looking at the omniboxAnalytics value.

How Exactly Does Automatic Relevance Tuning (ART) Work?

Coveo ML uses machine learning techniques to analyze the search activity data captured by Coveo Usage Analytics (Coveo UA). Coveo ART tracks what users search for, if and how they reformulate their queries, what results they click on, and whether they created a new support case. With this data, ART trains an algorithmic model for predicting which content will be most helpful to future users based upon their specific query. ART regularly retrains its model so that over time it gets better and better and adapts to new trends (see Use Automatic Relevance Tuning).

How Does ART Ensure That the First Results Are Not Falsely Promoted by Users Clicking Them Solely Because They Are the First Results?

ART evaluates user visits as a whole and allocates additional weight (i.e., gives a higher score) to the last clicked item rather than the previously clicked ones.

A user performs a query and inspects the returned results:

  • The user clicks the top two items, but these don’t answer their inquiry.

  • The user clicks the next three results, and then closes the search page.

  • ART assumes the fifth result is the best one (i.e., it answers the user’s inquiry), since the click on this result is the last user action associated with the search event.

  • As the ART model is rebuilt over time, the fifth result climbs the result list based on the additional weight acquired from being the last clicked item.

  • ART boosts a maximum of five results, ensuring that the remaining results stay based on standard index ranking and can therefore move up in the result list.

  • To boost results, ART doesn’t simply use the result popularity, but also takes several other factors into account to prevent positional bias (also known as high ranking result bias).
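The last-click weighting described above can be sketched as follows. The boost value, function name, and data shapes are assumptions for illustration, not Coveo's actual parameters:

```python
from collections import defaultdict

def last_click_weights(visits, last_click_boost=3.0):
    """Illustrative sketch of last-click weighting: every clicked result
    earns a base weight, and the last clicked result of each visit earns
    an extra boost, on the premise that it answered the user's inquiry."""
    scores = defaultdict(float)
    for clicked in visits:          # each visit: ordered list of clicked result ids
        for result in clicked:
            scores[result] += 1.0   # base credit for any click
        if clicked:
            # The final click of the visit likely solved the issue,
            # so it is weighted more heavily than earlier clicks.
            scores[clicked[-1]] += last_click_boost
    return dict(scores)
```

With this weighting, a result that users click last (after abandoning the top-ranked items) accumulates more weight than the first results they tried, counteracting positional bias.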

What Happens When Enabling Usage Analytics and Coveo ML Features at the Same Time?

Coveo ML doesn’t recommend search results or query suggestions until sufficient usage analytics data is available. A Coveo ML model starts making recommendations once it has been trained, and improves each time it’s retrained with more data.

Can Coveo ML Features Work with Coveo On-Premises?

No. Coveo ML features don’t work with a full on-premises Coveo deployment (on-premises CES index, usage analytics module, REST Search API). Coveo ML features require Coveo Cloud REST Search API and usage analytics.
