Understanding Stemming

Stemming is a process that reduces words to their stem, base, or root form. The index uses the stem of each queried term to expand the query, searching for the original term as well as for indexed terms that share the same root. This automatic query expansion often returns relevant results that would not appear otherwise.

  • Searching for a term in its singular form returns items containing both its singular and plural forms, and vice versa.

  • The words search, searching, and searched share the same root or stem: search-. When you query for searching, the index returns items containing the words searching, search, searches, and searched.
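To make the expansion idea concrete, here is a minimal sketch under stated assumptions. It is not Coveo's stemming algorithm. The suffix list and the names `toy_stem` and `expand_query_term` are hypothetical and exist only for illustration. The sketch reduces each term to a crude stem and expands a queried term to every indexed term that shares that stem.

```python
# Hypothetical toy suffix list, far cruder than a real language-specific stemmer.
SUFFIXES = ("ing", "es", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query_term(term: str, indexed_terms: set[str]) -> set[str]:
    """Return the queried term plus every indexed term that shares its toy stem."""
    stem = toy_stem(term)
    return {term} | {t for t in indexed_terms if toy_stem(t) == stem}


indexed = {"search", "searches", "searched", "searching", "research", "index"}
print(sorted(expand_query_term("searching", indexed)))
```

With this toy vocabulary, querying searching expands to search, searched, searches, and searching. The word research is not included because it does not reduce to the same stem.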

Stem Expansions

For your index to use a term as a stem expansion, that term must appear in at least one indexed item for which:

  • The search interface language matches one of the languages of that item.
  • The term appears in the body of that item.
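The two conditions above act as a filter on candidate expansions. The following sketch illustrates that filter, assuming each item exposes a set of detected language codes and its body text. The `IndexedItem` class, the `eligible_expansions` function, and the whitespace tokenization are all hypothetical simplifications, not the index's actual data model.

```python
from dataclasses import dataclass


@dataclass
class IndexedItem:
    """Hypothetical item model: detected languages plus body text."""
    languages: set[str]  # e.g. {"en"} or {"en", "fr"}
    body: str            # only the body is considered; titles and fields are ignored


def eligible_expansions(candidates: set[str], items: list[IndexedItem],
                        ui_language: str) -> set[str]:
    """Keep candidates found in the body of at least one item in the interface language."""
    eligible: set[str] = set()
    for item in items:
        if ui_language not in item.languages:
            continue  # condition 1: the item must be in the search interface language
        # Condition 2: the term must appear in the body (naive whitespace tokenization).
        body_words = set(item.body.lower().split())
        eligible |= candidates & body_words
    return eligible


items = [
    IndexedItem({"en"}, "Searching the index"),
    IndexedItem({"fr"}, "searches dans l'index"),
]
candidates = {"searching", "searches", "searched"}
print(eligible_expansions(candidates, items, "en"))
```

In this example, searched never qualifies because no item body contains it. searches qualifies only when the interface language is French.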

Languages

In addition to using only terms from indexed items in the same language as the search interface, Coveo employs language-specific stemming algorithms to improve the relevance of stem expansions (see Supported Languages - Coveo Cloud V2).

The term attention can stem to attentio in English and attenti in French.

The stem expansion attentif is only relevant in French.

By default, the index assumes that the search interface language corresponds to the main languages it detects in indexed items. However, you can set the forwardLanguageToCoveoIndex Search API query parameter to true to force the index to use the language passed in the locale Search API query parameter.
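The following sketch builds a Search API request that sets these two parameters. The request is only constructed and is not sent. The locale and forwardLanguageToCoveoIndex parameters come from the text above. The endpoint URL, the bearer-token header, the `q` query parameter, and the helper name `build_search_request` are assumptions, so verify them against the Search API reference for your organization.

```python
import json
import urllib.request

# Assumed Search API v2 endpoint; verify it for your organization.
SEARCH_ENDPOINT = "https://platform.cloud.coveo.com/rest/search/v2"


def build_search_request(query: str, locale: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that forwards the locale to the index."""
    payload = {
        "q": query,  # assumed basic query expression parameter
        "locale": locale,
        # Forces the index to use `locale` instead of the languages it detects in items.
        "forwardLanguageToCoveoIndex": True,
    }
    return urllib.request.Request(
        SEARCH_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed authentication scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request("attention", "fr", "YOUR_TOKEN")
# urllib.request.urlopen(request) would send it; omitted here.
```

With forwardLanguageToCoveoIndex set to true and locale set to fr, a query for attention would be stemmed with the French algorithm, which makes expansions such as attentif relevant.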

Stemming in Field Queries

You can also leverage stemming in field queries by enabling the stemming option on the target fields (see Add or Edit a Field - Stemming). Keep in mind, however, that doing so can degrade performance.

Ranking

The index ranks items containing the original form of queried terms higher. It also calculates a correlation factor between the searched term and each possible expansion, and ranks items containing highly correlated expansions higher than items containing poorly correlated ones. This reduces the risk of stemming confusion when semantically unrelated words share the same stem.

In English, the terms university and universe stem to the same root, although they are not semantically related.

When you search for universe, the Coveo index expands your query using terms from the univer stem class, which can include university. However, since universe and university rarely co-occur in your indexed items, items containing university rank lower.
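The ranking behavior described above can be illustrated with a toy model. The documentation does not specify how the index computes its correlation factor, so this sketch makes several assumptions. It substitutes Jaccard co-occurrence between document sets for the correlation factor, and it assigns an arbitrary weight of 1.0 to the original term and at most 0.5 to an expansion. The function names are hypothetical.

```python
def build_postings(docs: list[set[str]]) -> dict[str, set[int]]:
    """Map each term to the ids of the documents that contain it."""
    postings: dict[str, set[int]] = {}
    for doc_id, terms in enumerate(docs):
        for term in terms:
            postings.setdefault(term, set()).add(doc_id)
    return postings


def correlation(a: str, b: str, postings: dict[str, set[int]]) -> float:
    """Jaccard co-occurrence: an assumed stand-in for the index's correlation factor."""
    da, db = postings.get(a, set()), postings.get(b, set())
    union = da | db
    return len(da & db) / len(union) if union else 0.0


def rank(query: str, expansions: set[str], docs: list[set[str]]) -> list[tuple[int, float]]:
    """Score matching docs: 1.0 for the original term plus correlation-weighted expansions."""
    postings = build_postings(docs)
    weights = {e: 0.5 * correlation(query, e, postings) for e in expansions if e != query}
    results = []
    for doc_id, terms in enumerate(docs):
        has_original = query in terms
        matched = [e for e in weights if e in terms]
        if not has_original and not matched:
            continue  # the item does not match the expanded query at all
        score = (1.0 if has_original else 0.0) + sum(weights[e] for e in matched)
        results.append((doc_id, score))
    return sorted(results, key=lambda r: r[1], reverse=True)  # stable for ties


docs = [
    {"universe", "galaxy", "universes"},  # 0
    {"universes", "cosmos"},              # 1
    {"university", "campus"},             # 2
    {"universe", "universes", "stars"},   # 3
]
ranking = rank("universe", {"universes", "university"}, docs)
```

In this corpus, universe and universes co-occur often, so document 1 still scores above zero. universe and university never co-occur, so document 2 matches the expanded query but ranks last.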

Disabling Stemming

While expanded queries are generally useful, you can disable the stemming expansion when you want to search for a specific term or phrase (see Searching for an Exact Term and Searching for a Phrase).