About search result ranking
Result ranking is the process during which the index computes a distinct ranking score for each item that matches a query, and then sorts the results from most to least relevant (that is, in descending score order). Coveo ranks search results by calculating a relevance score based on a series of ranking factors. The score can range from minus infinity to infinity; the higher the score, the higher the result appears in the result list.
Relevance score
The relevance score is a combination of the index ranking algorithm in action during the index ranking phases, and other relevance modifiers such as query ranking expressions (QREs) and query ranking functions.
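Conceptually, the relevance score behaves like an additive model: the contributions of the index ranking factors are summed, and modifiers such as QREs adjust the total. The following Python sketch illustrates that idea only; the factor names, weights, and boost values are hypothetical, not actual Coveo internals.

```python
# Illustrative sketch of an additive relevance score.
# Factor names and values are hypothetical, not Coveo's actual weights.

def relevance_score(factor_scores, modifiers):
    """Sum the index ranking factor contributions, then apply modifiers."""
    base = sum(factor_scores.values())
    return base + sum(modifiers)

# Example: an item matching in the title and summary, modified recently,
# and boosted by a query ranking expression (QRE).
factors = {"title": 800, "summary": 300, "date": 400}
qre_boosts = [1000]

print(relevance_score(factors, qre_boosts))  # 2500
```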
Members with the required privileges can modify the relative weight of some index ranking factors by adding query pipeline ranking weight rules.
Note
You can inspect the score of items using the Debug panel (see Use the JavaScript Search Debug Panel).
Featured and ART-recommended results
Because of their nature, featured results should always appear at the top of the result list, and Coveo Machine Learning (Coveo ML) ART-recommended results in the first ten results. The relevance score boost is higher for featured results than for ART-recommended results. However, both types of results can be affected by other ranking factors (for example, other query pipeline component rules such as query ranking expressions).
Index ranking phases
The mechanism behind the ranking process can be compared to a funnel. Starting with all items, the index receives a query from a user, isolates items in which the user identity can be found in the permission groups (see Group and Granted Security Identities, Permission Sets, and Permission Levels), and then only keeps the items that match the query.
The ranking process is separated into four phases, each of them working on the items sorted by the preceding phase.
Coveo natively uses 17 pre-tuned ranking weight factors during these phases. The criteria with the biggest relevance impact are term proximity, item modified date (most recent), and term frequency. Each of these 17 criteria has been optimized over years of experience with a wide variety of indexed content to provide highly satisfying out-of-the-box relevance scores in most cases. You can still carefully tune these parameters when needed (see Manage Query Pipeline Ranking Weights). You can also troubleshoot ranking when a factor score seems too high or too low by using the JavaScript Search Debug Panel.
While you can use several parameters to tune the index ranking engine, you must make changes carefully to prevent collateral effects on performance or ranking. We recommend that you contact Coveo Support for recommendations to address your index ranking issues.
Phase 1: Term weighting
The first phase attributes a score to items based on each term of the user query. Seven factors are used to rank the indexed items that the user has permission to access and that match the query.
These factors cover areas such as the location of the query terms in those items (in the title, in the summary, in the concepts, etc.) and the item language (same language as the user query or not). Once the ranking is done, the 50,000 highest scored items are kept.
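Keeping only the highest-scored items between phases is a top-k selection problem. The cutoff of 50,000 comes from the text above; everything else in this sketch is illustrative.

```python
import heapq

def keep_top_items(scored_items, k=50_000):
    """Keep the k highest-scored (score, item_id) pairs for the next phase."""
    return heapq.nlargest(k, scored_items)

# Small illustrative run with a cutoff of 3 instead of 50,000.
scored = [(120, "a"), (450, "b"), (90, "c"), (300, "d")]
print(keep_top_items(scored, k=3))  # [(450, 'b'), (300, 'd'), (120, 'a')]
```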
On top of these ranking factors, query ranking expressions (QRE), which are custom expressions used to modify the ranking score by a specified amount when items match certain conditions, are taken into account during this phase.
Phase 2: Item weighting
The second phase attributes a score to items based on their freshness (last modification date) and quality. This phase, which is performed on the first 50,000 items with the highest ranking scores returned by phase one, uses six ranking factors that cover areas such as the source rating (reputation from lowest to highest) to further adjust the relevance score of these 50,000 items.
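Freshness scoring of this kind is often implemented as a decay on the item's age, so that recently modified items score higher. The shape of the decay and all numbers below are illustrative assumptions, not Coveo's actual formula.

```python
from datetime import datetime, timezone

def freshness_score(modified, now, max_points=600, half_life_days=180):
    """Illustrative exponential decay: newer items score closer to max_points."""
    age_days = (now - modified).total_seconds() / 86_400
    return max_points * 0.5 ** (age_days / half_life_days)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent = datetime(2024, 5, 31, tzinfo=timezone.utc)
old = datetime(2022, 6, 1, tzinfo=timezone.utc)
print(freshness_score(recent, now) > freshness_score(old, now))  # True
```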
Once the ranking is done, the 200 highest scored items are kept, and the two remaining index ranking phases are performed on these items.
Phase 3: Term frequency-inverse item frequency (TF-IDF)
The purpose of the third phase is to weight queried terms while taking their number of occurrences in items into account. The ranking engine evaluates the importance of a query term for an item based on the number of occurrences of this term in the item, but also inversely on the number of occurrences of the term in the index (TF-IDF). The more frequent a term is in the index, the less informative the term becomes since the significance and meaning are to a certain extent diluted.
A common term such as product is worth less than a rare one such as iPhone.
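A minimal TF-IDF computation shows why a common term like product contributes less than a rare one like iPhone. The variant below (raw term frequency times the log of the inverse document frequency) is one common formulation; Coveo's exact formula isn't documented here.

```python
import math

def tf_idf(term_count_in_item, items_in_index, items_containing_term):
    """Raw term frequency weighted by log inverse document frequency."""
    idf = math.log(items_in_index / items_containing_term)
    return term_count_in_item * idf

# 10,000 indexed items: "product" appears in 8,000 of them, "iPhone" in 50.
common = tf_idf(3, 10_000, 8_000)   # frequent term, low weight
rare = tf_idf(3, 10_000, 50)        # rare term, high weight
print(rare > common)  # True
```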
Based on this methodology, each of the 200 items returned from phase two receives an additional score, and their ranks are adjusted accordingly.
Phase 4: Adjacency ranking
The last phase computes the proximity of query terms, giving more weight to items having the terms close together in the text. This step fine-tunes the order of the items received from phase 3 and, once the reordering is done, items are returned in the search interface to the user as a response to the submitted query.
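Term proximity can be measured, for example, as the smallest distance between occurrences of the query terms in the item's token stream. The sketch below illustrates that idea; Coveo's actual adjacency computation is an assumption here, not documented.

```python
def min_term_distance(tokens, term_a, term_b):
    """Smallest absolute distance between any occurrences of the two terms."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return None  # one of the terms is absent from the item
    return min(abs(a - b) for a in pos_a for b in pos_b)

doc = "the washing machine cleans clothes while the machine runs".split()
print(min_term_distance(doc, "washing", "machine"))  # 1
```

Items where the query terms appear adjacent (distance 1) would receive more weight than items where the terms are far apart.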
This is how ranking contributes to relevance. However, the ranking process isn't limited to these phases. Coveo provides many features that you can use to personalize or customize the way your items are ranked. Coveo ML models and query pipelines are among the features influencing the relevance of search results (see Coveo Machine Learning and What's a query pipeline?).
Pre-tuned ranking weight factors
The following table indicates all the ranking factors taken into account out-of-the-box by the Coveo ranking engine at each phase of the ranking process:
| Phase | Ranking factor (label in Debug panel) | Description |
|---|---|---|
| 1 | Term in title (Title) | The presence of queried keywords in the title of the item. |
| 1 | Term in concepts (Concept) | The presence of queried keywords in the automatically populated concepts of the item. |
| 1 | Term in summary (Summary) | The presence of queried keywords in the summary of the item. |
| 1 | Terms in address (URI) | The presence of queried keywords in the URI of the item. |
| 1 | Term has formatting (Formatted) | Whether queried keywords are formatted in the item (for example, heading level, bold, or large). |
| 1 | Term casing (Casing) | Whether queried keywords have a special casing in the item. |
| 1 | Term correlation within stemming classes (Relation) | The presence of words with the same root as the queried keywords in the item. |
| 1 | Item in user language (QRE) | Whether the item is in the language of the search interface from which the query originates. |
| 2 | Item modified recently (Date) | The item's last modification date. Items with the most recent modification date obtain a higher ranking. |
| 2 | Item quality evaluation (Quality) | The proximity of the item to the root of the indexed system. |
| 2 | Source rating (Source) | The rating of the source the item resides in. |
| 2 | Custom ranking weight (Custom) | The custom weight assigned to the item through an indexing pipeline extension (IPE). |
| 3 | Term Frequency–Inverse Document Frequency (TF-IDF) | The number of times a queried keyword appears in a given item, offset by the number of items in the index containing that keyword (see TF-IDF). |
| 4 | Term proximity (Adjacency) | The proximity of queried keywords to each other in the item. |
Ranking example
You perform the Washing Machine query on your appliance website, and two results are returned. To learn why the results are in that specific order, you inspect their relevance score in the Debug panel.

You first take a look at the index ranking. The first result (KleanKlothes Washing Machine) has Washing and Machine in its title and contains several occurrences of washing machine in its content. Therefore, the index sets the result score at 5,000. The second result (EZLaundry Machine) has only Machine in its title, so the index gives the result a score of 3,000.

You then analyze how the Coveo ML ART feature impacted the ranking. Since EZLaundry Machine is clicked more often than KleanKlothes Washing Machine, and since users usually don't return to the search page to perform another query after consulting the product page, the ART model adds 2,500 to the score of EZLaundry Machine. So far, the score for KleanKlothes Washing Machine is 5,000, and 5,500 for EZLaundry Machine.

Finally, you remember that your marketing team had an incentive to promote KleanKlothes Washing Machine. The team created a query ranking expression that adds 1,000 points, pushing the KleanKlothes Washing Machine score to 6,000, which is higher than the EZLaundry Machine score of 5,500. That's why KleanKlothes Washing Machine is the first returned result.
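The arithmetic of this scenario can be replayed in a few lines; the scores and boosts are the illustrative values from the example above.

```python
# Replaying the example's score arithmetic (values from the scenario above).
scores = {"KleanKlothes Washing Machine": 5000, "EZLaundry Machine": 3000}

scores["EZLaundry Machine"] += 2500             # ART boost from click data
scores["KleanKlothes Washing Machine"] += 1000  # marketing QRE boost

ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['KleanKlothes Washing Machine', 'EZLaundry Machine']
```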