About decompounding
Decompounding, also called word decomposition, is the process of splitting compound words into individual components to improve search recall.
What are compound words?
A compound word is formed by combining two or more words to create a new term. In languages like English and French, most compounds are lexicalized, meaning they are fixed terms found in a standard dictionary. They are also typically written as spaced (school bus) or hyphenated (chauffe-eau) words, which allows search engines to naturally match the individual terms without special processing.
However, languages like German, Dutch, and Swedish allow productive compounding. This means speakers can spontaneously invent new words by joining nouns together, much like forming a sentence (for example, combining Sand, Stein, and Türme to create Sandsteintürme). Because these custom compounds often don’t exist in a static dictionary, the index uses decompounding to identify and split them into their meaningful components.
The index uses decompounding to expand compound words in both queries and index items, enabling searches to match the original compound and its individual components. This improves search recall by surfacing relevant results that wouldn’t otherwise be found.
For example, in German, Handtuch (hand towel) is a compound word made of Hand (hand) and Tuch (cloth).
Without decompounding:
- Searching for Hand or Tuch wouldn’t match items containing only the compound Handtuch.
- Searching for Handtuch wouldn’t match items containing only the separate words Hand and Tuch.
With decompounding enabled, the index splits the compound and matches both the full compound (Handtuch) and its individual components (Hand and Tuch), ensuring more relevant results are retrieved.
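The splitting step can be illustrated with a minimal Python sketch. It assumes a tiny hand-made word list (`GERMAN_WORDS`) and a hypothetical helper, `split_compound`; it is not the index's actual algorithm, and it ignores linking elements that real German compounds can contain.

```python
# Toy word list (an assumption for illustration); a real decompounder
# relies on a full language-specific dictionary.
GERMAN_WORDS = {"hand", "tuch", "sand", "stein", "türme"}


def split_compound(word, dictionary):
    """Return the components of `word` if it can be covered entirely by
    dictionary entries; return an empty list if no split is found."""

    def helper(rest):
        if not rest:
            return []
        # Try the longest matching prefix first, then backtrack if needed.
        for end in range(len(rest), 0, -1):
            prefix = rest[:end]
            if prefix in dictionary:
                tail = helper(rest[end:])
                if tail is not None:
                    return [prefix] + tail
        return None

    parts = helper(word.lower())
    # A single part means the word is a plain dictionary entry, not a compound.
    return parts if parts and len(parts) > 1 else []


print(split_compound("Handtuch", GERMAN_WORDS))
print(split_compound("Sandsteintürme", GERMAN_WORDS))
```

In this sketch, a word such as Handschuh stays unsplit because Schuh is missing from the toy word list.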
How decompounding works
Decompounding operates in two stages:
During document indexing: When documents are indexed, compound words are automatically split into their components. These components are added to the index alongside the original compound, enriching the searchable content.
At query time: When a query contains a compound word, the index decompounds it and searches for both the compound word and all of its components. Items containing the full compound or all of its individual parts are returned.
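The two stages can be sketched in Python as a toy inverted index. Everything here is hypothetical: `SPLITS` stands in for a dictionary-based decompounder, and `build_index` and `search` only illustrate the matching rule described above for a single-term query.

```python
from collections import defaultdict

# Hypothetical precomputed splits standing in for a real decompounder.
SPLITS = {"handtuch": ["hand", "tuch"]}


def build_index(documents):
    """Indexing stage: store each token plus the components of any compound."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            for term in [token] + SPLITS.get(token, []):
                index[term].add(doc_id)
    return index


def search(index, term):
    """Query stage: match items containing the compound or all of its parts."""
    term = term.lower()
    hits = set(index.get(term, set()))
    parts = SPLITS.get(term, [])
    if parts:
        hits |= set.intersection(*(index.get(p, set()) for p in parts))
    return hits


docs = {
    "a": "Weiches Handtuch",
    "b": "Tuch für die Hand",
    "c": "Hand",
}
index = build_index(docs)
print(sorted(search(index, "Handtuch")))
print(sorted(search(index, "Hand")))
```

Item c contains Hand but not Tuch, so it matches the query Hand but not the query Handtuch.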
Supported languages
Decompounding uses language-specific dictionaries to accurately split compounds. For more information, see the list of languages for which Coveo supports decompounding.
During document indexing, the language detected in each document determines which dictionary the index uses.
At query time, the locale query parameter determines which dictionary is used for decompounding.
Note
Since decompounding relies on language-specific dictionaries, compounds containing words that aren’t in the dictionary won’t be split. This can occur with highly specialized terminology, proper nouns, or newly coined terms.
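As a rough illustration of how dictionary selection limits splitting, the following Python sketch picks a word list from the language part of a locale code. The `DICTIONARIES` table, both function names, and the rule of using only the primary language subtag are assumptions made for this example, not documented index behavior.

```python
# Hypothetical per-language word lists; real dictionaries are far larger.
DICTIONARIES = {
    "de": {"hand", "tuch"},
    "nl": {"hand", "doek"},
}


def dictionary_for_locale(locale):
    """Return the word list for a locale such as "de-DE", or None if the
    language has no dictionary in this toy table."""
    language = locale.split("-")[0].lower()
    return DICTIONARIES.get(language)


def can_split(parts, locale):
    """A compound is only split when every component is a known word."""
    dictionary = dictionary_for_locale(locale)
    if dictionary is None:
        return False
    return all(part.lower() in dictionary for part in parts)


print(can_split(["Hand", "Tuch"], "de-DE"))
print(can_split(["Hand", "Schuh"], "de-DE"))
print(can_split(["Hand", "Tuch"], "fr-FR"))
```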
Ranking
The index ranks decompounded results as follows:
- Items containing the original compound form rank higher than items containing only the individual components.
- Items containing both the compound and its individual components rank highest.
When searching for the German word Handtuch (hand towel):
- Items containing Handtuch and also Hand and Tuch as separate words rank highest.
- Items containing only Handtuch rank lower.
- Items containing only Hand and Tuch (but not Handtuch) rank lowest among matching results.
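The ordering above can be mimicked with a short Python sketch. The numeric tiers and the `rank_tier` function are invented for illustration; the index's real ranking combines many factors, and only the relative order is taken from the description above.

```python
def rank_tier(text, compound, parts):
    """Return a hypothetical tier: higher means the item ranks higher."""
    tokens = {token.lower() for token in text.split()}
    has_compound = compound.lower() in tokens
    has_parts = all(part.lower() in tokens for part in parts)
    if has_compound and has_parts:
        return 3  # compound and separate components
    if has_compound:
        return 2  # compound only
    if has_parts:
        return 1  # separate components only
    return 0  # no match


docs = {
    "only_parts": "Tuch für die Hand",
    "both": "Handtuch oder Tuch für die Hand",
    "only_compound": "Weiches Handtuch",
}
ranked = sorted(
    docs,
    key=lambda doc_id: rank_tier(docs[doc_id], "Handtuch", ["Hand", "Tuch"]),
    reverse=True,
)
print(ranked)
```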
Indexing-time decompounding requirements
For decompounding to be applied during indexing, the item’s language field must be set to a supported language. This can happen automatically through language detection during indexing, or you can manually set the item language.
Query-time decompounding requirements
For decompounding to be applied at query time, you must:
- Set the locale query parameter to a valid IETF BCP 47 locale code corresponding to one of the supported languages.
- Set the forwardLanguageToCoveoIndex query parameter to true to force the index to use the language specified in the locale parameter.
To enable German decompounding in a Search API query:
{
"q": "Handtuch",
"locale": "de-DE",
"forwardLanguageToCoveoIndex": true
}
Without both the locale and forwardLanguageToCoveoIndex parameters properly set, decompounding won’t be applied to your queries.
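If you build request bodies in code, a small helper can make sure both parameters are always set together. The following Python sketch only assembles the JSON body shown above; the function name `build_search_body` is hypothetical, and the endpoint, authentication, and the HTTP call itself are deliberately left out.

```python
import json


def build_search_body(q, locale):
    """Assemble a query body that sets both decompounding prerequisites."""
    if not locale:
        raise ValueError("locale is required for query-time decompounding")
    return {
        "q": q,
        "locale": locale,
        # Forces the index to use the language given in `locale`.
        "forwardLanguageToCoveoIndex": True,
    }


body = json.dumps(build_search_body("Handtuch", "de-DE"))
print(body)
```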