How Does Intelligent Term Detection (ITD) Work?

A typical Natural Language Processing (NLP) application extracts terms based on linguistic rules and word frequency.

In addition to traditional NLP methods, Coveo™ Machine Learning (Coveo ML) leverages the vocabulary that end users previously employed to search for content, which provides a more accurate picture of what is contextually relevant to the current end user.

A typical basic query expression (q) (i.e., what the end user types in a search box) contains an average of four keywords. However, some use cases require much larger chunks of text to be used as input for queries, such as an entire support case description. The default query processing algorithm is not designed to deal with that many keywords. Therefore, Coveo ML provides an Intelligent Term Detection (ITD) algorithm to extract only the most relevant keywords from such large query expressions (lq).

To do so, ITD uses a set of usage analytics events (search and click) recorded for a search interface:

  1. It selects up to 2,500 queries that generated a positive outcome (i.e., a query result was opened) and were performed at least five times.

    • If more than 2,500 queries were performed at least five times, ITD selects the 2,500 most popular ones.

    • The selected queries are called top user queries.

  2. It establishes a correlation between the top user queries and the keywords contained in the large query expression (e.g., support case description).

  3. It finds the five most relevant terms based on two criteria: the average importance of each term (see TF-IDF), and the longest substring of the large query expression that is also contained in the top user queries. These five terms are called the refined keywords.

  4. It adds the refined keywords to the basic query expression, and converts the resulting expression into a partial match expression before the query is executed against the index.
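
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Coveo ML's actual implementation: every function name, the shape of the event data (one `(query, was_clicked)` pair per search event), the smoothed IDF formula, and the `PHRASE_BOOST` weight are hypothetical. Only the limits (2,500 queries, five occurrences, five refined keywords) come from the description above. The conversion to a partial match expression in step 4 is not modeled.

```python
import math
import re
from collections import Counter

MAX_TOP_QUERIES = 2500      # limits quoted in the steps above
MIN_QUERY_COUNT = 5
MAX_REFINED_KEYWORDS = 5
PHRASE_BOOST = 2.0          # assumption: illustrative weight, not a documented value


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_top_user_queries(events):
    """Step 1: keep queries performed at least five times with at least one click."""
    counts, clicked = Counter(), set()
    for query, was_clicked in events:
        key = " ".join(tokenize(query))
        if not key:
            continue
        counts[key] += 1
        if was_clicked:
            clicked.add(key)
    # most_common() orders by popularity, so the slice keeps the most popular ones.
    eligible = [q for q, n in counts.most_common()
                if n >= MIN_QUERY_COUNT and q in clicked]
    return eligible[:MAX_TOP_QUERIES]


def longest_shared_phrase(lq_tokens, top_query_tokens):
    """Longest run of large-query tokens that appears contiguously in a top query."""
    phrases = set()
    for tokens in top_query_tokens:
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                phrases.add(tuple(tokens[i:j]))
    best = ()
    for i in range(len(lq_tokens)):
        for j in range(i + 1, len(lq_tokens) + 1):
            candidate = tuple(lq_tokens[i:j])
            if candidate not in phrases:
                break  # a longer run starting at i cannot match either
            if len(candidate) > len(best):
                best = candidate
    return list(best)


def refine_keywords(large_query, top_queries):
    """Steps 2 and 3: score large-query terms against top queries, keep the best five."""
    top_query_tokens = [tokenize(q) for q in top_queries]
    # Document frequency: number of top user queries that contain each term.
    doc_freq = Counter(t for tokens in top_query_tokens for t in set(tokens))
    lq_tokens = tokenize(large_query)
    phrase = set(longest_shared_phrase(lq_tokens, top_query_tokens))
    scores = {}
    for term, tf in Counter(lq_tokens).items():
        df = doc_freq[term]
        if df == 0:
            continue  # terms never seen in top user queries are ignored
        # Assumption: smoothed IDF, one common TF-IDF variant among several.
        idf = math.log((1 + len(top_query_tokens)) / (1 + df)) + 1.0
        scores[term] = tf * idf * (PHRASE_BOOST if term in phrase else 1.0)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:MAX_REFINED_KEYWORDS]


def build_expression(basic_query, refined_keywords):
    """Step 4 (partially): append refined keywords not already in the basic query."""
    present = set(tokenize(basic_query))
    extra = [k for k in refined_keywords if k not in present]
    return " ".join([basic_query.strip()] + extra).strip()
```

For example, if the top user queries are "reset password" and "vpn error", a support case description that mentions a VPN error twice and a password reset once would rank "vpn" and "error" first in this sketch, because they are frequent in the description and form the longest phrase shared with a top user query.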