Adding and Managing Query Pipeline Thesaurus Rules

The thesaurus of a Coveo Cloud organization is a list of equivalent words used to transparently add keywords to the query entered by a user before it is sent to the index.

The list of thesaurus rules for the index of an organization is empty by default, but members of the Administrators and Relevance Managers built-in groups can define query pipeline thesaurus rules in their organization. Thesaurus rules are defined independently for each pipeline.

Your index contains several items pertaining to the unfortunately named ACME CTRLR game controller (user manual, troubleshooting articles, etc.).

Usage analytics reports indicate that a sizable portion of end users who are obviously looking for information on this product in your Coveo-powered community portal are actually searching for acme pad, and not getting any relevant results.

To address the issue, you create a thesaurus rule that expands acme pad to acme ctrlr.

Common Use Cases

Among other things, you can use thesaurus rules when:

  • You use different terminology to designate the same reality (see “Expand Any” Thesaurus Rule Type).

    You send new employees a document that exists in two versions: one is named New Employee Guide and the other New Employee Manual.

  • Your users search for acronyms (see “Expand Any” Thesaurus Rule Type).

    You notice a high query count for b2b. You thus set a thesaurus rule so items that only contain business to business are also returned as search results.

  • Your users search for a product name that has recently been changed, and some items still refer to the old name (see “Expand” Thesaurus Rule Type).

    One of your products, Nice Product, was renamed Awesome Product. You thus set a thesaurus rule so users who search for Nice Product will also obtain items related to Awesome Product as search results.

  • Your service agents search for Salesforce case numbers with leading zeros, and you want them to automatically also search for the case number without the leading zeros (see “Expand” Thesaurus Rule Type and Using Java-Style Regular Expressions).

    • When someone searches for 00001008, you want the system to automatically search for 00001008 OR 1008.

      The matching regular expression could be:

      /[0]*(?<num>[1-9]{1}[0-9]*)/

      where num is a captured group name. Each captured group name must be inside parentheses (()).

      The replacement expression would be:

      _num_

    • Inversely, when someone searches for 1008, you want the system to automatically search for 1008 OR 00001008.

      The matching regular expression can be:

      /(?<num>[0-9]{4})/

      where num is a captured group name. Each captured group name must be inside parentheses (()).

      The replacement expression would be:

      0000_num_
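
The two case-number rules above can be tried locally before you create them. The following Java sketch only exercises the regular expressions; it is not the Coveo implementation. The class name, the expand helper, and the choice to match the whole keyword are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: emulates the two case-number Expand rules locally.
class CaseNumberRules {
    // Rule 1: strip leading zeros (replacement expression: _num_).
    static final Pattern STRIP_ZEROS = Pattern.compile("[0]*(?<num>[1-9]{1}[0-9]*)");
    // Rule 2: pad a four-digit number (replacement expression: 0000_num_).
    static final Pattern FOUR_DIGITS = Pattern.compile("(?<num>[0-9]{4})");

    // Assumption: a rule fires only when the whole keyword matches.
    // Returns "keyword OR expansion", or the keyword unchanged when the rule
    // does not match or would add nothing new.
    static String expand(Pattern rule, String prefix, String keyword) {
        Matcher m = rule.matcher(keyword);
        if (!m.matches()) {
            return keyword;
        }
        String expansion = prefix + m.group("num");
        return expansion.equals(keyword) ? keyword : keyword + " OR " + expansion;
    }

    public static void main(String[] args) {
        System.out.println(expand(STRIP_ZEROS, "", "00001008")); // 00001008 OR 1008
        System.out.println(expand(FOUR_DIGITS, "0000", "1008")); // 1008 OR 00001008
    }
}
```

Under this whole-keyword assumption, the second pattern leaves an eight-digit keyword such as 00001008 untouched, since [0-9]{4} matches exactly four digits.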

Leading Practices

Consider the following leading practices when creating thesaurus rules:

Use thesaurus rules for legitimate reasons.

  • Thesaurus rules are case insensitive, so don’t bother entering casing variants.

  • Identify searched keywords that don’t return optimal results because users are not entering the indexed synonym keywords, and then create a thesaurus entry that expands the query to the appropriate synonyms.

  • Be careful to enter only legitimate synonyms to prevent excessive search result broadening that can negatively affect search results ranking and confuse users.

  • Avoid using the thesaurus to expand a typo to its correct form. Based on the relative occurrences of a typo and its correct form in the index, the index Did You Mean feature will automatically correct or suggest the better spelling.

Use thesaurus rules sparingly.

  • When a query pipeline contains Coveo Machine Learning (Coveo ML) models, avoid or minimize the use of thesaurus rules. Thesaurus rules are static and can thus negatively impact Coveo ML models, which follow trends.

  • For Expand any rules, the expansion is reciprocal: querying any keyword/expression in the thesaurus entry adds all the others. Avoid entering many synonyms in a single entry, as this can drastically increase the length of the query.

  • Consider that a specific keyword/expression can only appear once across thesaurus rules. You must group equivalent keywords/expressions into one thesaurus entry.

  • Thesaurus rules apply before the stemming expansion made by the index, meaning that thesaurus entries are only expanded for exact matches (see Understanding Stemming). While you can consider entering a separate thesaurus rule for each stem variant (e.g., singular/plural, conjugation, one vs two-word, and other synonym variants), the leading practice is to create a single thesaurus rule that covers the term and all its variants using a regular expression.

    When a user searches for kitty or kitten, you want the system to also automatically search for cat. Instead of creating a distinct thesaurus rule for each variant, you create a single Expand rule whose original keyword is a regular expression matching both variants (e.g., /kitt(y|en)/) and whose keyword to add is cat.
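
Before relying on a single regular expression to cover several variants, it can help to check which keywords it actually matches. The sketch below uses the kitt(y|en) pattern that also appears in the QPL Syntax table of this article. The class and method names are hypothetical, and the case-insensitive, whole-keyword matching is an assumption based on the "case insensitive" and "exact matches" statements above.

```java
import java.util.regex.Pattern;

// Hypothetical local check of a variant-covering regular expression.
class VariantRegexCheck {
    // Assumption: matching is case insensitive and applies to the whole keyword.
    static final Pattern KITT = Pattern.compile("kitt(y|en)", Pattern.CASE_INSENSITIVE);

    // Returns true when the keyword would trigger the rule under these assumptions.
    static boolean triggers(String keyword) {
        return KITT.matcher(keyword).matches();
    }

    public static void main(String[] args) {
        for (String keyword : new String[] {"kitty", "Kitten", "kitties", "cat"}) {
            System.out.println(keyword + " -> " + triggers(keyword));
        }
        // kitty -> true, Kitten -> true, kitties -> false, cat -> false
    }
}
```

Here the plural kitties is not covered, so the pattern would need to be widened if that variant matters.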

Apply thesaurus rules conditionally.

Test your thesaurus rules.

  • Immediately test your thesaurus entry creation or modification in the search interface. You can use the Content Browser search interface to ensure that the rule works as expected.

  • Run A/B tests to monitor the effectiveness of your thesaurus entry on your search results relevance.

Adding a Rule

To add a thesaurus rule in a query pipeline

Image: Add a thesaurus rule

  1. Access the “Thesaurus” tab of the desired query pipeline.
  2. In the Thesaurus tab, click Add Rule, and then select Thesaurus rule*.
  3. In the Add a Thesaurus Rule panel that appears:

    1. Select the type of thesaurus rule you want to add. Options are Expand any, Expand, and Replace.

      • If you selected Expand any, in the Keywords inputs, enter the desired keywords.

      • If you selected Expand:

        1. In the Original keywords inputs, enter the desired keywords.
        2. In the Keywords to add inputs, enter the desired keywords.
      • If you selected Replace:

        1. In the Original keywords inputs, enter the desired keywords.
        2. In the Substitute keywords inputs, enter the desired keywords.
    2. Click Add Rule. The new thesaurus rule is effective immediately.

*: (Advanced) You can instead select Thesaurus with code to define the rule using the appropriate QPL syntax.

Managing Existing Rules

See Managing Query Pipeline Rules From Tabs.

Reference

When creating thesaurus rules, consider that they apply to:

And don’t apply to:

  • Field queries.

  • Keywords entered next to the NOT and NEAR operators.

Using Java-Style Regular Expressions

When creating a thesaurus rule, you can use Java-style regular expressions (see java.util.regex Class Pattern) to match and even replace values in thesaurus entries. You must include the / / delimiters for the matching keyword. If you use named capturing groups, the syntax to include a named-capturing group in the replacement keyword is _groupName_.

You want to separate two product name parts that are concatenated (e.g., replacing iphone6 with iphone 6).

The matching expression can be: /iphone(?<ver>[0-9])/ where ver is a captured group name.

The replacement expression would be: iphone _ver_
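
To preview what a replacement expression produces, you can translate the _groupName_ placeholders into the ${groupName} references that java.util.regex understands. This is a minimal sketch under stated assumptions: the GroupTemplate class and its methods are hypothetical, group names are assumed to be alphanumeric, and the template is assumed to contain no literal $ or \ characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper: previews a thesaurus-style replacement with java.util.regex.
class GroupTemplate {
    // Matches _name_ placeholders (a letter followed by letters or digits).
    private static final Pattern PLACEHOLDER = Pattern.compile("_([A-Za-z][A-Za-z0-9]*)_");

    // Converts "_name_" placeholders into Java's "${name}" group references.
    static String toJavaReplacement(String template) {
        // "\\$" emits a literal dollar sign; "$1" is the captured placeholder name.
        return PLACEHOLDER.matcher(template).replaceAll("\\${$1}");
    }

    // Applies the matching expression (without the / / delimiters) to a keyword.
    static String apply(String regex, String template, String keyword) {
        return Pattern.compile(regex).matcher(keyword).replaceAll(toJavaReplacement(template));
    }

    public static void main(String[] args) {
        System.out.println(apply("iphone(?<ver>[0-9])", "iphone _ver_", "iphone6")); // iphone 6
    }
}
```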

Thesaurus Rule Types

When creating or editing a thesaurus rule from the Query Pipelines page of the administration console, you can choose one of the following thesaurus sub-types:

Expand Any

Searches the index for all thesaurus keywords as soon as one term is part of the user query.

Expand

Searches the index for all original thesaurus keywords as soon as one term is part of the user query. However, the “Expand” sub-type doesn’t expand original keywords when target keywords are queried.

You can enter keywords between double-quotes to expand an exact phrase. This is useful to expand acronyms or initialisms.

Replace

Overwrites specific end-user keywords when queried.

Only create a Replace rule when you’re certain that your index doesn’t, and never will, contain the keywords to substitute. Consider the “Expand” rule type first.

QPL Syntax

When creating a thesaurus rule with code or editing the code of an existing thesaurus rule, use the following query pipeline language (QPL) syntax:

The following table summarizes how statements using each of the different thesaurus sub-features would process the basic part (q) of the combined query expression, assuming its current value is kitty cat:

Statement definition                                      Processed q expression
alias /kitt(y|en)/, "cat", "mouse hunter", "feline"       (kitty OR cat OR (mouse hunter) OR feline) (cat OR (mouse hunter) OR feline)
expand /kitt(y|en)/, "cat" to "mouse hunter", "feline"    (kitty OR (mouse hunter) OR feline) (cat OR (mouse hunter) OR feline)
replace /kitt(y|en)/, "cat" to "mouse hunter", "feline"   ((mouse hunter) OR feline) ((mouse hunter) OR feline)
quote "kitty cat"                                         "kitty cat"
quote /kitt(y|en)/, "cat" to "mouse hunter"               "mouse hunter" "mouse hunter"
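
The alias, expand, and replace rows of the table can be reproduced with a small simulation. The sketch below is a simplified model written for illustration, not the Coveo query pipeline: the ThesaurusSketch class and its methods are hypothetical, the query is split on whitespace (so multi-word original terms are not matched), and the quote rows are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Pattern;

// Hypothetical, simplified model of the alias/expand/replace rows in the table.
class ThesaurusSketch {
    enum Kind { ALIAS, EXPAND, REPLACE }

    // Terms written as /.../ are treated as regular expressions.
    static boolean isRegex(String term) {
        return term.length() > 1 && term.startsWith("/") && term.endsWith("/");
    }

    // Whole-keyword match, either by regular expression or by literal (case-insensitive).
    static boolean matches(String term, String keyword) {
        if (isRegex(term)) {
            return Pattern.matches(term.substring(1, term.length() - 1), keyword);
        }
        return term.equalsIgnoreCase(keyword);
    }

    // Multi-word expressions are wrapped in parentheses, as in the table.
    static String group(String term) {
        return term.contains(" ") ? "(" + term + ")" : term;
    }

    static String process(Kind kind, List<String> terms, List<String> otherTerms, String q) {
        StringJoiner out = new StringJoiner(" ");
        for (String keyword : q.split("\\s+")) {
            boolean hit = terms.stream().anyMatch(t -> matches(t, keyword));
            if (!hit) {
                out.add(keyword); // keyword is not covered by the rule
                continue;
            }
            List<String> alternatives = new ArrayList<>();
            if (kind != Kind.REPLACE) {
                alternatives.add(keyword); // alias and expand keep the original keyword
            }
            if (kind == Kind.ALIAS) {
                // Add every other quoted term; regular expressions cannot be emitted.
                for (String t : terms) {
                    if (!isRegex(t) && !t.equalsIgnoreCase(keyword)) {
                        alternatives.add(group(t));
                    }
                }
            } else {
                for (String t : otherTerms) {
                    alternatives.add(group(t));
                }
            }
            out.add("(" + String.join(" OR ", alternatives) + ")");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> other = List.of("mouse hunter", "feline");
        System.out.println(process(Kind.ALIAS,
                List.of("/kitt(y|en)/", "cat", "mouse hunter", "feline"), List.of(), "kitty cat"));
        System.out.println(process(Kind.EXPAND, List.of("/kitt(y|en)/", "cat"), other, "kitty cat"));
        System.out.println(process(Kind.REPLACE, List.of("/kitt(y|en)/", "cat"), other, "kitty cat"));
    }
}
```

With q set to kitty cat, the three printed lines correspond to the processed q expressions listed for the alias, expand, and replace statements in the table.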

Parameters

terms

A comma-separated list of quoted strings and/or regular expressions where each quoted string must contain one or more basic query terms (e.g., "foo bar", "baz", /^meo+w$/).

When using the alias feature, <terms> must contain at least one quoted string (i.e., it can’t contain only regular expressions).

otherTerms

A comma-separated list of quoted strings where each quoted string must contain one or more basic query terms (e.g., "hello world", "biz").

Order of Execution

Thesaurus rules apply before the stemming expansion made by the index, meaning that thesaurus entries are only expanded for exact matches (see Understanding Stemming).

The following diagram highlights the position of thesaurus rules in the overall order of execution of query pipeline features.

Image: Apply thesaurus rules

Required Privileges

The following table indicates the required privileges to view or edit elements of the Query Pipelines page and associated panels (see Privilege Management and Privilege Reference).

Action                  Service - Domain            Required access level
View thesaurus rules    Search - Query pipelines    View
Edit thesaurus rules    Search - Query pipelines    Edit