About stemming
Stemming is a process that reduces words to their stem, base, or root form. The index uses the stem of each queried term to expand the query, searching for the original term as well as indexed terms that share the same root. This automatic query expansion often returns relevant results that wouldn’t appear otherwise.
Note
Stemming only applies to words of three characters or more.
- Searching for a term in its singular form returns items that include both its singular and plural forms.
- The words search, searching, and searched share the same root or stem: search. When you query searching, the index returns items containing the words searching, search, searches, and searched.
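The expansion described above can be sketched in a few lines of Python. This is a toy illustration only: the suffix list and the `toy_stem` and `expand_query` helpers are hypothetical and are not Coveo's stemming algorithm, which is language-specific and considerably more sophisticated.

```python
# Toy illustration only: a crude suffix stripper, not Coveo's stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical English suffix list
MIN_LENGTH = 3  # stemming only applies to words of three characters or more


def toy_stem(word: str) -> str:
    """Return a crude stem by stripping at most one known suffix."""
    word = word.lower()
    if len(word) < MIN_LENGTH:
        return word
    for suffix in SUFFIXES:
        # Keep at least MIN_LENGTH characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_LENGTH:
            return word[: -len(suffix)]
    return word


def expand_query(term: str, indexed_terms: set[str]) -> list[str]:
    """Return the queried term followed by indexed terms sharing its stem."""
    stem = toy_stem(term)
    expansions = sorted(
        t for t in indexed_terms if t != term and toy_stem(t) == stem
    )
    return [term] + expansions


indexed = {"search", "searches", "searched", "searching", "seat", "research"}
print(expand_query("searching", indexed))
# ['searching', 'search', 'searched', 'searches']
```

Only terms that actually appear in the indexed set are used as expansions, which mirrors the requirement that a stem expansion must exist in at least one indexed item.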
Stem expansions
For your index to use a certain term as a stem expansion, that term must appear in at least one indexed item, where:
- The search interface language matches one of the languages of that item.
- The term appears in the body of that item.
Languages
Besides restricting stem expansions to terms from indexed items in the same language as the search interface, Coveo uses language-specific stemming algorithms to improve the relevance of stem expansions. For more information, see the list of languages supported by Coveo.
The term attention can stem to attentio in English and attenti in French.
The stem expansion attentif is only relevant in French.
By default, the index assumes the search interface language to be the main language detected in indexed items. However, your index may have multiple main languages.
Main languages are those that make up 90% of the indexed items, either individually or in combination with other languages. The remaining 10% of indexed items are ignored during stemming.
Example
Your index contains 75% English documents, 15% French documents, and 10% Spanish documents. Both English and French are recognized as main languages.
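One way to read the 90% rule is sketched below. It assumes the index picks the most common languages first until their combined share reaches 90% of indexed items. The `main_languages` function and its selection order are assumptions made for illustration, not a description of how the Coveo index computes this internally.

```python
def main_languages(item_counts: dict[str, int], coverage_percent: int = 90) -> list[str]:
    """Pick the most common languages until they cover `coverage_percent` of items.

    Hypothetical helper: the selection order (most common first) is an assumption.
    """
    total = sum(item_counts.values())
    if total == 0:
        return []
    selected: list[str] = []
    covered = 0
    # Most common languages first; ties are broken alphabetically for determinism.
    for language, count in sorted(item_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Integer arithmetic avoids floating-point rounding at the 90% boundary.
        if covered * 100 >= coverage_percent * total:
            break
        selected.append(language)
        covered += count
    return selected


counts = {"English": 75, "French": 15, "Spanish": 10}
print(main_languages(counts))
# ['English', 'French']
```

With the figures from the example, English and French together reach exactly 90%, so Spanish items fall into the remaining 10% that stemming ignores.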
However, you can specify your preferred language in the Search API locale query parameter and set the forwardLanguageToCoveoIndex query parameter to true.
If forwardLanguageToCoveoIndex isn’t set to true, the index ignores the locale value and disregards your preferred language.
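As a minimal sketch, the two parameters could be assembled like this before being sent with a query. The `build_query_parameters` helper is hypothetical, the `q` parameter name for the query text is an assumption, and the endpoint and authentication are deliberately left out. Check the Search API reference for the exact request shape.

```python
from urllib.parse import urlencode


def build_query_parameters(query: str, locale: str) -> dict[str, str]:
    """Build query parameters that ask the index to honor the given locale.

    Hypothetical helper for illustration; only `locale` and
    `forwardLanguageToCoveoIndex` come from the text above.
    """
    return {
        "q": query,  # assumed name of the query text parameter
        "locale": locale,  # preferred language
        "forwardLanguageToCoveoIndex": "true",  # without this, locale is ignored
    }


params = build_query_parameters("attentif", "fr")
print(urlencode(params))
# q=attentif&locale=fr&forwardLanguageToCoveoIndex=true
```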
Stemming in field queries
You can also leverage stemming in field queries by enabling the stemming option of the target fields.
Keep in mind, however, that doing so can impede performance.
Ranking
The index ranks items containing the original form of queried terms higher. It also calculates a correlation factor between the searched term and every possible expansion found in your index. In search results, highly correlated expansions rank higher than poorly correlated ones. This reduces the risk of stemming confusion, which can occur when unrelated words share the same stem.
In English, the terms university and universe stem to the same root, although they’re not semantically related.
When you search for universe, the Coveo index expands your query using terms from the univer stem class, which can include university. However, since the terms universe and university rarely co-occur in your indexed items, items containing university rank lower.
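The effect of co-occurrence on ranking can be illustrated with a small sketch. The source doesn't specify how the correlation factor is computed, so the Jaccard overlap used here is a stand-in, and the `cooccurrence` and `rank_expansions` helpers and the sample items are hypothetical.

```python
def cooccurrence(term_a: str, term_b: str, items: list[set[str]]) -> float:
    """Jaccard overlap of the items containing each term (assumed stand-in metric)."""
    both = sum(1 for terms in items if term_a in terms and term_b in terms)
    either = sum(1 for terms in items if term_a in terms or term_b in terms)
    return both / either if either else 0.0


def rank_expansions(query_term: str, expansions: list[str], items: list[set[str]]) -> list[str]:
    """Order expansions from most to least correlated with the queried term."""
    return sorted(
        expansions,
        key=lambda expansion: cooccurrence(query_term, expansion, items),
        reverse=True,
    )


# Each set holds the terms found in the body of one hypothetical indexed item.
items = [
    {"universe", "universes", "galaxy"},
    {"universe", "universes"},
    {"universe", "cosmos"},
    {"university", "campus"},
    {"university", "universe"},
]
print(rank_expansions("universe", ["university", "universes"], items))
# ['universes', 'university']
```

In this sample, universes appears alongside universe in two of the four items containing either term, while university shares only one of five, so university ranks lower.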
Disabling stemming
While expanded queries are generally useful, you can disable the stemming expansion when you want to search for an exact term or an exact phrase.