Query Correction Feature

The Coveo Cloud index includes the automatic query correction feature, also known as Did You Mean, which detects misspelled keywords and automatically suggests or applies corrections (see Misspelled Words). This article describes in more detail how this feature works so that you can better understand what it can and cannot do.

Query correction feature facts:

  • The query correction relies on a word corrector lexicon (WCL) that stores frequent words and their number of occurrences, gathered as items are indexed. Spelling suggestions and corrections are therefore based on the index content, not on predefined or custom dictionaries.
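The lexicon idea can be illustrated with a minimal Python sketch. It assumes a simple lowercase tokenizer and a hypothetical `min_occurrences` cutoff for what counts as a "frequent" word; neither reflects documented index internals, and `build_lexicon` is an illustrative name, not a Coveo API.

```python
import re
from collections import Counter

# Assumption: a simple tokenizer that keeps lowercase runs of letters and digits.
# The actual index tokenization is not described in this article.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def build_lexicon(items, min_occurrences=2):
    """Count word occurrences across indexed items and keep the frequent ones.

    `min_occurrences` is a hypothetical cutoff for "frequent"; the real
    criterion used by the index is not documented here.
    """
    counts = Counter()
    for text in items:
        counts.update(TOKEN_RE.findall(text.lower()))
    return {word: n for word, n in counts.items() if n >= min_occurrences}


items = [
    "Enterprise search for the enterprise",
    "Search relevance in enterprise portals",
    "Portal configuration",
]
lexicon = build_lexicon(items)
print(lexicon)  # {'enterprise': 3, 'search': 2}
```

Because the counts come only from indexed text, a word that never appears in the index can never be suggested, whatever a general-purpose dictionary says.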

  • Query correction suggestions and corrections improve as the index grows.

  • The index must contain at least 2 000 items before it starts providing query correction suggestions.

  • The query correction algorithm is triggered when the query returns a low number of results relative to the size of the index.

    Query correction suggestions are provided when the index:

    • Contains 2 000 to 10 000 items and a query returns fewer than 1 000 results.

    • Contains 10 000 to 50 000 items and a query returns fewer than 1 250 results.

    • Contains more than 50 000 items and a query returns fewer results than 0.75% of the number of indexed items.

      For example, in an index of 1 million items, a query must return fewer than 7 500 results for suggestions to be provided.
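As an illustration only, the trigger thresholds can be written as a small Python function; `correction_triggered` is a hypothetical name. The listed ranges share their boundaries, so the sketch assumes that an index of exactly 10 000 or exactly 50 000 items falls in the 10 000 to 50 000 tier. The 0.75% rule is computed with integers to avoid floating-point rounding right at the threshold.

```python
def correction_triggered(index_size: int, result_count: int) -> bool:
    """Return True if a query's result count is low enough to trigger correction.

    Assumption: boundary values 10 000 and 50 000 belong to the middle tier.
    """
    if index_size < 2_000:
        return False  # below the minimum index size: no suggestions at all
    if index_size < 10_000:
        return result_count < 1_000
    if index_size <= 50_000:
        return result_count < 1_250
    # More than 50 000 items: fewer results than 0.75% (75/10 000) of the index size.
    return result_count * 10_000 < 75 * index_size


print(correction_triggered(1_000_000, 7_499))  # True
print(correction_triggered(1_000_000, 7_500))  # False
```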

  • Suggestions are not provided if the query has been expanded by the thesaurus.

    The query correction and the thesaurus are completely independent features (see Thesaurus Leading Practices).

  • The algorithm is not applied to search terms that meet one or more of the following conditions:

    • Contain 3 characters or fewer

    • Contain a wildcard character (* or ?)

    • Begin with a number
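The three exclusion conditions translate directly into a predicate. This is a minimal sketch; `is_correctable` is a hypothetical name, and "begin with a number" is assumed to mean that the first character is a digit.

```python
def is_correctable(term: str) -> bool:
    """Return False if a search term is excluded from query correction."""
    if len(term) <= 3:
        return False  # 3 characters or fewer
    if "*" in term or "?" in term:
        return False  # contains a wildcard character
    if term[0].isdigit():
        return False  # begins with a number
    return True


print(is_correctable("enterpirse"))  # True
print(is_correctable("4kids"))  # False
```

The length check runs first, so an empty term is rejected before `term[0]` is read.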

  • The word corrector lexicon does not suggest an indexed word that meets one or more of the following rejection rules:

    • Contains more than 4 digits

    • Contains 7 or more consecutive consonants

    • Contains 6 or more consecutive vowels

    • Contains an invalid number of consecutive vowels for the item language

      This rule applies only to English, French, Spanish, and German.

    These word rejection rules are all active by default, but each one can be turned off independently to fine-tune the query correction behavior. To do so, contact Coveo Support.
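The language-independent rejection rules can be sketched with regular expressions. This is an approximation under stated assumptions: vowels are taken to be a, e, i, o, u (so y counts as a consonant), digits are counted anywhere in the word, and the language-specific vowel rule is not modeled. The keyword flags are hypothetical and only mirror the fact that each rule can be turned off.

```python
import re

# Assumption: vowels are a, e, i, o, u; every other ASCII letter is a consonant.
VOWEL_RUN = re.compile(r"[aeiou]{6,}")
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-z]{7,}")


def rejected_by_lexicon(word: str, check_digits: bool = True,
                        check_consonants: bool = True,
                        check_vowels: bool = True) -> bool:
    """Return True if an indexed word must not be suggested.

    The check_* flags are hypothetical toggles, one per rule.
    The language-specific vowel rule is not modeled.
    """
    word = word.lower()
    if check_digits and sum(c.isdigit() for c in word) > 4:
        return True  # more than 4 digits
    if check_consonants and CONSONANT_RUN.search(word):
        return True  # 7 or more consecutive consonants
    if check_vowels and VOWEL_RUN.search(word):
        return True  # 6 or more consecutive vowels
    return False


print(rejected_by_lexicon("abc12345"))  # True
print(rejected_by_lexicon("abc12345", check_digits=False))  # False
```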

  • The query correction works on a word-by-word basis, so the correction of a word is not influenced by the other words in the query.
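Word-by-word correction means each query word goes through the same per-word function on its own. In this minimal sketch, `correct_query` and the `CORRECTIONS` lookup table are hypothetical stand-ins for the lexicon-based corrector.

```python
def correct_query(query, correct_word):
    """Correct each word independently; neighbouring words have no influence."""
    return " ".join(correct_word(word) for word in query.split())


# Hypothetical per-word corrector backed by a fixed lookup table.
CORRECTIONS = {"enterpirse": "enterprise", "serch": "search"}


def lookup(word):
    return CORRECTIONS.get(word, word)


print(correct_query("enterpirse serch portal", lookup))  # enterprise search portal
```

Since `correct_word` only ever sees one word, the same misspelling is always corrected the same way, whatever the surrounding query.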

  • A suggested word must be highly similar to the searched word, that is, it must have a small edit distance: few character permutations separate it from the original word. A missing or added character counts as a permutation. A compatible permutation (such as k replaced by q) adds less to the edit distance than an incompatible one (such as x replaced by r).
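A weighted Levenshtein distance is one common way to give compatible substitutions a lower cost. The sketch below is illustrative only: the `COMPATIBLE` pairs and the 0.5 cost are assumptions, not the index's actual weights. Insertions and deletions cost 1, as do incompatible substitutions.

```python
# Hypothetical compatible substitution pairs and cost; the real values are not documented.
COMPATIBLE = {frozenset(p) for p in [("k", "q"), ("c", "k"), ("s", "z"), ("i", "y")]}
COMPATIBLE_COST = 0.5


def substitution_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return COMPATIBLE_COST if frozenset((a, b)) in COMPATIBLE else 1.0


def weighted_edit_distance(source: str, target: str) -> float:
    """Levenshtein distance with cheaper substitutions for compatible pairs."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [float(i)]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,        # delete s
                curr[j - 1] + 1.0,    # insert t
                prev[j - 1] + substitution_cost(s, t),  # substitute s with t
            ))
        prev = curr
    return prev[-1]


print(weighted_edit_distance("kat", "qat"))  # 0.5
print(weighted_edit_distance("xat", "rat"))  # 1.0
```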

  • For a word to be suggested, it must occur in the index at least one order of magnitude (10 times) more often than the original word for each permutation that separates them.

    For example, a user inverts two characters in a keyword, typing enterpirse rather than enterprise. Because two permutations separate the misspelling from the correct spelling, enterprise must have at least two orders of magnitude (100 times) more occurrences in the index than enterpirse to be suggested as a correction.
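The frequency requirement amounts to a factor of 10 raised to the number of permutations. In this sketch, `passes_frequency_threshold` is a hypothetical name and the occurrence counts in the usage line are invented for illustration.

```python
def passes_frequency_threshold(candidate_count: int, original_count: int,
                               permutations: int) -> bool:
    """Require one order of magnitude more occurrences per permutation."""
    return candidate_count >= original_count * 10 ** permutations


# Hypothetical counts: enterpirse occurs 30 times, so enterprise needs at least 3 000.
print(passes_frequency_threshold(3_000, 30, 2))  # True
print(passes_frequency_threshold(2_999, 30, 2))  # False
```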

  • The suggested spelling of a query word depends on both the frequency of the alternative words in the lexicon (the higher, the better) and their similarity to the original word (the closer, the better). Thus, when two alternative spellings have the same edit distance, the one that is more frequent in the index is suggested.
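One simple way to combine the two signals is to compare edit distance first and use frequency only to break ties. This ordering is an assumption for illustration; the index may weigh the two signals differently. `Candidate` and `best_suggestion` are hypothetical names, and the counts in the example are invented.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    word: str
    distance: float  # edit distance to the original word
    frequency: int   # occurrences in the word corrector lexicon


def best_suggestion(candidates: Sequence[Candidate]) -> Optional[str]:
    """Prefer the closest candidate; among equally close ones, the most frequent."""
    if not candidates:
        return None
    best = min(candidates, key=lambda c: (c.distance, -c.frequency))
    return best.word


# Hypothetical candidates for the misspelling "serch".
candidates = [
    Candidate("search", 1, 500),
    Candidate("perch", 1, 40),
    Candidate("sketch", 2, 900),
]
print(best_suggestion(candidates))  # search
```

Here "search" and "perch" tie on distance, so the more frequent "search" wins, while "sketch" loses despite its higher frequency because it is further away.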

  • As an administrator, you can configure how search interfaces use query corrections. For example, with the Coveo JavaScript Search, you can choose whether a search interface displays suggestions (see DidYouMean Component).