Understanding Search Result Ranking

Result ranking is the process during which the index evaluates a distinct ranking score for each item that matches a query, and then sorts the results from most to least relevant (i.e., in descending score order). The Coveo Platform ranks search results by calculating a relevance score based on a series of ranking factors. The score can range from negative infinity to positive infinity; the higher the score, the higher the result appears in the result list.

Relevance Score

The relevance score combines the score computed by the index ranking algorithm during the index ranking phases with other relevance modifiers, such as query ranking expressions (QRE) and query ranking functions.

Members of the Administrators and Relevance Managers built-in groups can modify the relative weight of some index ranking factors by adding query pipeline ranking weight rules.
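This article doesn’t specify how ranking weight rules combine factor scores. As a rough mental model only, the following Python sketch assumes that the index score is a sum of per-factor scores and that a rule scales one factor’s contribution with a multiplier. The `weighted_index_score` function, the multipliers, and the factor scores are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical multiplier used when no ranking weight rule targets a factor.
DEFAULT_MULTIPLIER = 1.0


def weighted_index_score(factor_scores: Dict[str, float],
                         multipliers: Optional[Dict[str, float]] = None) -> float:
    """Sum per-factor scores, scaling each one by its (hypothetical) rule multiplier."""
    multipliers = multipliers or {}
    return sum(score * multipliers.get(factor, DEFAULT_MULTIPLIER)
               for factor, score in factor_scores.items())


# Illustrative factor scores keyed by Debug panel labels (values are made up).
scores = {"Title": 400.0, "Date": 250.0, "Adjacency": 150.0}
print(weighted_index_score(scores))                 # 800.0
print(weighted_index_score(scores, {"Date": 0.5}))  # 675.0
```

Lowering one factor’s multiplier only reduces that factor’s share of the total, which is why such changes affect every query routed through the pipeline.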

You can inspect the score of items using the Debug panel (see Use the JavaScript Search Debug Panel).

Because of their nature, featured results should always appear at the top of the result list, and Coveo Machine Learning (Coveo ML) ART-recommended results should appear among the first ten results. The relevance score boost is higher for featured results than for ART-recommended results. However, other ranking factors can affect both types of results (e.g., other query pipeline component rules such as query ranking expressions).

Index Ranking Phases

The mechanism behind the ranking process can be compared to a funnel. Starting with all items, the index receives a query from a user, isolates items in which the user identity can be found in the permission groups (see Group and Granted Security Identities, Permission Sets, and Permission Levels), and then only keeps the items that match the query.
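The first step of the funnel can be pictured with the following Python sketch. It deliberately simplifies the Coveo security model (which also involves denied identities, permission sets, and permission levels) to a single set of allowed identities per item, and it treats “matching the query” as containing every query term. The `Item` class and the `candidate_items` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Item:
    doc_id: int
    text: str
    # Hypothetical simplification: identities allowed to see the item.
    allowed_identities: Set[str] = field(default_factory=set)


def words(item: Item) -> Set[str]:
    """Lowercased whitespace-separated tokens of the item text."""
    return set(item.text.lower().split())


def candidate_items(items: Iterable[Item], user_identities: Set[str],
                    query_terms: List[str]) -> List[Item]:
    """Keep items the user can see, then keep those containing every query term."""
    visible = [i for i in items if i.allowed_identities & user_identities]
    terms = [t.lower() for t in query_terms]
    return [i for i in visible if all(t in words(i) for t in terms)]


catalog = [
    Item(1, "Washing machine manual", {"everyone"}),
    Item(2, "Washing machine pricing", {"sales"}),
    Item(3, "Dryer manual", {"everyone"}),
]
print([i.doc_id for i in candidate_items(catalog, {"everyone"}, ["washing", "machine"])])  # [1]
```

Security filtering happens before query matching here, so an item the user can’t see is never scored, no matter how well it matches.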

The ranking process is separated into four phases, each of them working on the items sorted by the preceding phase.

The Coveo Platform natively uses 17 pre-tuned ranking weight factors during these phases. Among them, the criteria with the biggest relevance impact are term proximity, item modified date (most recent), and term frequency. Each of these 17 criteria has been optimized over years of experience with a wide variety of indexed content to produce satisfying out-of-the-box relevance scores in most cases. You can still carefully tune these parameters when needed (see Manage Query Pipeline Ranking Weights). When a factor score seems too high or too low, you can also troubleshoot ranking using the JavaScript Search Debug Panel.

While you can use several parameters to tune the index ranking engine, you must make changes carefully to avoid negative side effects on performance or ranking. We recommend contacting Coveo Support for guidance on your index ranking issues.

Phase 1: Term Weighting

The first phase attributes a score to items based on each term of the user query. Seven factors are used to rank the indexed items that the user has permission to access and that match the query. These factors cover areas such as the position of the query terms in those items (in the title, in the summary, in the concepts, etc.) and the item language (whether or not it’s the same as the language of the user query). Once the ranking is done, the 50,000 highest-scored items are kept.

On top of these ranking factors, query ranking expressions (QRE), which are custom expressions used to modify the ranking score by a specified amount when items match certain conditions, are taken into account during this phase (see Manage Ranking Expression Rules).
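A QRE behaves like a conditional score adjustment. The Python sketch below assumes a QRE can be modeled as a predicate over item fields plus a positive or negative modifier that is added when the predicate holds. The field names (`source`, `filetype`), their values, and the helper names are hypothetical; real QREs are written in the Coveo query syntax, not as code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

ItemFields = Dict[str, Any]


@dataclass
class RankingExpression:
    # Hypothetical stand-in for a QRE: a condition on item fields and a score modifier.
    condition: Callable[[ItemFields], bool]
    modifier: float


def apply_ranking_expressions(item: ItemFields, score: float,
                              expressions: List[RankingExpression]) -> float:
    """Add the modifier of every expression whose condition the item matches."""
    return score + sum(e.modifier for e in expressions if e.condition(item))


expressions = [
    RankingExpression(lambda f: f.get("source") == "Knowledge Base", 500.0),
    RankingExpression(lambda f: f.get("filetype") == "pdf", -200.0),
]
print(apply_ranking_expressions({"source": "Knowledge Base", "filetype": "html"}, 1000.0, expressions))  # 1500.0
print(apply_ranking_expressions({"source": "Forum", "filetype": "pdf"}, 1000.0, expressions))  # 800.0
```

Because the modifiers are additive in this model, an item matching several expressions accumulates all of their adjustments.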

Phase 2: Item Weighting

The second phase attributes a score to items based on their freshness (last modification date) and quality. This phase, which is performed on the first 50,000 items with the highest ranking scores returned by phase one, uses six ranking factors that cover areas such as the source rating (reputation from lowest to highest) to further adjust the relevance score of these 50,000 items.

Once the ranking is done, the 100 highest-scored items are kept, and the next two index ranking phases are performed on these items.

  • A Coveo organization member with the required privileges can fine-tune the importance of each of the factors, but this should be done with care because it affects all results in all search interfaces (see Manage Ranking Weight Rules).

  • This phase involves loading item-specific information, such as whether the items were modified recently.

  • For each item, the score attributed for each factor is shown under Document Weights (see Use the JavaScript Search Debug Panel).
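The cutoffs described for phases 1 and 2 can be summarized with the following Python sketch. It models each phase as a hypothetical scoring callback whose result is added to the running score. The callbacks, the additive combination, and keeping items past the phase 2 cutoff in their phase 2 order are assumptions, not the Coveo implementation; only the 50,000 and 100 limits come from this article.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

Scorer = Callable[[int], float]


def run_phases(doc_ids: Iterable[int], term_score: Scorer, item_score: Scorer,
               tfidf_score: Scorer, adjacency_score: Scorer,
               phase1_limit: int = 50_000, phase2_limit: int = 100) -> List[Tuple[int, float]]:
    """Hypothetical funnel: each phase adds to the score, then the list is cut down."""
    scores: Dict[int, float] = {d: term_score(d) for d in doc_ids}
    # Phase 1 cut: keep only the highest-scored items.
    kept = heapq.nlargest(phase1_limit, scores, key=scores.__getitem__)
    # Phase 2 adds item-level scores, then re-sorts the survivors.
    for d in kept:
        scores[d] += item_score(d)
    kept.sort(key=scores.__getitem__, reverse=True)
    head, tail = kept[:phase2_limit], kept[phase2_limit:]
    # Phases 3 and 4 only rescore the top items (combined here for brevity).
    for d in head:
        scores[d] += tfidf_score(d) + adjacency_score(d)
    head.sort(key=scores.__getitem__, reverse=True)
    # Items past the phase 2 cutoff keep their phase 2 order.
    return [(d, scores[d]) for d in head + tail]


ranked = run_phases(range(10), term_score=float, item_score=lambda d: 0.0,
                    tfidf_score=lambda d: 10.0 if d == 7 else 0.0,
                    adjacency_score=lambda d: 0.0,
                    phase1_limit=5, phase2_limit=3)
print(ranked)  # [(7, 17.0), (9, 9.0), (8, 8.0), (6, 6.0), (5, 5.0)]
```

Small limits are used in the usage example so the effect of each cutoff is visible: item 7 only overtakes items 9 and 8 because it survives into the last phases.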

Phase 3: Term Frequency–Inverse Document Frequency (TF-IDF)

The purpose of the third phase is to weight queried terms while taking their number of occurrences into account. The ranking engine evaluates the importance of a query term for an item based on the number of occurrences of this term in the item, and inversely on the number of items in the index that contain the term (TF-IDF). The more frequent a term is across the index, the less informative it becomes, since its significance is diluted to some extent.

A common term such as product is worth less than a rare one such as iPhone.
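The following Python sketch uses the common textbook form of TF-IDF: the raw term count multiplied by the logarithm of the total number of items divided by the number of items containing the term. The exact Coveo weighting formula isn’t documented here, so treat this as an illustration of the principle only; the tiny corpus is made up.

```python
import math
from typing import List


def tf_idf(term: str, item_tokens: List[str], corpus: List[List[str]]) -> float:
    """Textbook TF-IDF: raw term count times log(N / number of items containing the term)."""
    tf = item_tokens.count(term)
    df = sum(1 for tokens in corpus if term in tokens)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


corpus = [
    ["product", "iphone", "product"],
    ["product", "case"],
    ["product", "charger"],
    ["product", "iphone", "case"],
]
print(tf_idf("product", corpus[0], corpus))  # 0.0, since product appears in every item
print(tf_idf("iphone", corpus[0], corpus))   # about 0.693 (log 2)
```

Even though product occurs twice in the first item, it contributes nothing because every item contains it, while the rarer iphone keeps a positive weight.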

Based on this methodology, each of the 100 items returned from phase 2 receives an additional score, and their ranks are then adjusted accordingly.

  • For each item, the score attributed for Frequency, Correlation, and TF-IDF for each queried term is shown under Term weights (see Use the JavaScript Search Debug Panel).

  • The index minimizes possible stemming errors by calculating a Correlation factor between the searched term and every possible expansion. In search results, items containing highly correlated expansions are ranked higher than ones containing poorly correlated expansions.

    For example, when you search for universe, because of the way the stemming algorithm works, the index expands your query using terms from the univer stem classes that can include university. When the terms universe and university rarely co-occur in your indexed items, items containing university are ranked lower.
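This article doesn’t describe how the Correlation factor is computed. The Python sketch below assumes a simple co-occurrence ratio (the share of items containing an expansion that also contain the searched term), which is enough to show why university scores low for a universe query when the two rarely appear together. The `correlation` function and the corpus are hypothetical.

```python
from typing import List, Set


def correlation(term: str, expansion: str, corpus: List[Set[str]]) -> float:
    """Hypothetical measure: share of items containing `expansion` that also contain `term`."""
    with_expansion = [words for words in corpus if expansion in words]
    if not with_expansion:
        return 0.0
    both = sum(1 for words in with_expansion if term in words)
    return both / len(with_expansion)


corpus = [
    {"universe", "galaxy", "universes"},
    {"universes", "universe", "stars"},
    {"university", "campus"},
    {"university", "tuition"},
]
print(correlation("universe", "universes", corpus))   # 1.0
print(correlation("universe", "university", corpus))  # 0.0
```

An item matching only through a poorly correlated expansion would then receive a smaller share of the term’s weight than one matching through a well-correlated expansion.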

Phase 4: Adjacency Ranking

The last phase computes the proximity of query terms, giving more weight to items having the terms close together in the text. This step fine-tunes the order of the items received from phase 3 and, once the reordering is done, items are returned in the search interface to the user as a response to the submitted query.
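One common way to measure term proximity is the length of the shortest passage that contains every query term. The Python sketch below computes it with a sliding window and turns it into a score; both the window-based measure and the `adjacency_score` formula are assumptions for illustration, not the documented Coveo Adjacency calculation.

```python
from collections import Counter
from typing import List, Optional


def min_window_span(tokens: List[str], query_terms: List[str]) -> Optional[int]:
    """Length of the shortest token window containing every distinct query term."""
    needed = set(query_terms)
    if len(needed) < 2:
        return None  # Proximity doesn't apply to single-term queries.
    counts: Counter = Counter()
    covered = 0
    best: Optional[int] = None
    left = 0
    for right, token in enumerate(tokens):
        if token in needed:
            counts[token] += 1
            if counts[token] == 1:
                covered += 1
        # Shrink the window from the left while it still covers every term.
        while covered == len(needed):
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = tokens[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best


def adjacency_score(tokens: List[str], query_terms: List[str]) -> float:
    """Hypothetical score: 1.0 when terms are contiguous, smaller as they spread apart."""
    span = min_window_span(tokens, query_terms)
    return 0.0 if span is None else len(set(query_terms)) / span


print(adjacency_score("the washing machine is quiet".split(), ["washing", "machine"]))  # 1.0
print(adjacency_score("washing clothes needs a good machine".split(), ["washing", "machine"]))
```

The second call returns a lower score because the two terms are five tokens apart.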

  • Term proximity doesn’t apply to single-term queries and, by default, is calculated on a maximum of 100 items. To change this value to 150, 200, 300, or 400 in your index configuration, contact Coveo Support.

  • For each item, when ranking information is enabled, the score attributed for Adjacency is shown under Document Weights (see Use the JavaScript Search Debug Panel).

  • The docID value is used to break ties (if any) and ensure that the same query returns results in the same order in the future. Items with the same ranking score are sorted by descending docID value.

  • By default, ten results are shown per page in your search interface. Since only the 100 highest-scored items reach phases 3 and 4, results past the tenth page weren’t processed by the last two phases.
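The docID tie-break and the page arithmetic in the notes above can be checked with a short Python sketch. The `(doc_id, score)` pairs and the helper names are hypothetical; the descending docID tie-break, the 100-item limit, and the ten-results-per-page default come from this article.

```python
from typing import List, Tuple

PROCESSED_LIMIT = 100  # items rescored by phases 3 and 4
PAGE_SIZE = 10         # default number of results per page


def final_order(results: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Sort (doc_id, score) pairs by descending score, breaking ties by descending docID."""
    return sorted(results, key=lambda r: (r[1], r[0]), reverse=True)


def page_fully_reranked(page: int, page_size: int = PAGE_SIZE) -> bool:
    """True if every result on this 1-based page is within the top PROCESSED_LIMIT items."""
    return page * page_size <= PROCESSED_LIMIT


print(final_order([(1, 50.0), (7, 50.0), (3, 80.0)]))  # [(3, 80.0), (7, 50.0), (1, 50.0)]
print(page_fully_reranked(10), page_fully_reranked(11))  # True False
```

Sorting on the `(score, doc_id)` pair in reverse order applies both rules at once, so the result order is deterministic for repeated queries.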

This is how ranking fits within relevance. However, the ranking process isn’t limited to these phases. The Coveo Platform comes with many other features that you can use to personalize or customize the way your items are ranked. Coveo Machine Learning models and query pipelines are among the features influencing the relevance of search results (see Coveo Machine Learning and What’s a Query Pipeline?).

Pre-Tuned Ranking Weight Factors

The following list indicates all the ranking factors that the Coveo ranking engine takes into account out of the box at each phase of the ranking process. Each factor is followed by its label in the Debug panel, in parentheses.

Phase 1 (Term Weighting)

  • Term in title (Title)¹: The presence of queried keywords in the title of the item.

  • Term in concepts (Concept)¹: The presence of queried keywords in the automatically populated @concepts field of the item.

  • Term in summary (Summary)¹: The presence of queried keywords in the summary of the item.

  • Terms in address (URI)²: The presence of queried keywords in the URI of the item.

  • Term has formatting (Formatted)²: Whether queried keywords are formatted in the item (e.g., heading level, bold, large, etc.).

  • Term casing (Casing)²: Whether queried keywords have a special casing in the item.

  • Term correlation within stemming classes (Relation)²: The presence of words with the same root as the queried keywords in the item.

    For example, if a user searches for programmer, Coveo performs a stemming expansion and searches the index for items matching programmer, programmers, program, programming, etc. Since programmers is closely related to the original query, items matching programmers obtain a higher score than those matching programming for this ranking factor.

  • Item in user language (QRE)²: Whether the item is in the language of the search interface from which the query originates.

Phase 2 (Item Weighting)

  • Item modified recently (Date)¹: The item’s last modification date. Items with the most recent modification date obtain a higher ranking.

  • Item quality evaluation (Quality)²: The proximity of the item to the root of the indexed system.

  • Source rating (Source)²: The rating of the source the item resides in.

  • Custom ranking weight (Custom)²: The custom weight assigned to the item through an indexing pipeline extension (IPE).

Phase 3 (TF-IDF)

  • Term Frequency–Inverse Document Frequency (TF-IDF)¹: The number of times a queried keyword appears in a given item, offset by the number of items in the index containing that keyword (see TF-IDF).

Phase 4 (Adjacency Ranking)

  • Term proximity (Adjacency)¹: The proximity of queried keywords to each other in the item.

Note 1: Configurable in ranking weight rules (see Manage Query Pipeline Ranking Weights).

Note 2: Default value that’s configurable with the help of Coveo Support.

The relative importance of each ranking criterion is difficult to establish, since each criterion’s score depends on many factors, such as the number of terms in the query, the type of sources that are indexed, the individual terms in the query, and the number of items in the index.

Ranking Example

You perform the Washing Machine query on your appliance website, and two results are returned. To learn why the results are in that specific order, you inspect their relevance score in the Debug panel.

You first take a look at the index ranking. The first result (KleanKlothes Washing Machine) has Washing and Machine in its title and contains several occurrences of washing machine in its content. Therefore, the index sets the result score at 5,000. The second result (EZLaundry Machine) has only Machine in its title, so the index gives the result a score of 3,000.

You then analyze how the Coveo ML ART feature impacted the ranking. Since EZLaundry Machine is clicked more often than KleanKlothes Washing Machine, and users usually don’t return to the search page to perform another query after consulting the product page, the ART model adds 2,500 to the score of EZLaundry Machine.

So far, KleanKlothes Washing Machine has a score of 5,000, and EZLaundry Machine has a score of 5,500.

Finally, you remember that your marketing team had an incentive to promote KleanKlothes Washing Machine. The team created a query ranking expression that adds 1,000 points, pushing the KleanKlothes Washing Machine score to 6,000, which is higher than the EZLaundry Machine score of 5,500. This is why KleanKlothes Washing Machine is the first returned result.
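The arithmetic in this example can be reproduced in a few lines of Python. The sketch assumes, as the example does, that the ART boost and the query ranking expression simply add to the index score; the dictionary names are illustrative.

```python
# Scores taken from the example; modifiers are assumed to be purely additive.
index_scores = {"KleanKlothes Washing Machine": 5000, "EZLaundry Machine": 3000}
art_boosts = {"EZLaundry Machine": 2500}
qre_boosts = {"KleanKlothes Washing Machine": 1000}

final_scores = {
    name: score + art_boosts.get(name, 0) + qre_boosts.get(name, 0)
    for name, score in index_scores.items()
}
ranking = sorted(final_scores, key=final_scores.get, reverse=True)

print(final_scores)  # {'KleanKlothes Washing Machine': 6000, 'EZLaundry Machine': 5500}
print(ranking)       # ['KleanKlothes Washing Machine', 'EZLaundry Machine']
```

Without the query ranking expression, the same computation would put EZLaundry Machine (5,500) ahead of KleanKlothes Washing Machine (5,000).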
