How Does Intelligent Term Detection (ITD) Work?
A typical Natural Language Processing (NLP) application extracts terms based on linguistic rules and word frequency.
In addition to traditional NLP methods, Coveo Machine Learning (Coveo ML) leverages vocabulary previously employed by end users to search for content, thus providing a more accurate picture of what’s contextually relevant to the current end user.
A typical basic query expression (q) (i.e., what the end user types in a search box) contains an average of four keywords. However, some use cases require much larger chunks of text to be used as input for queries, such as an entire support case description. The default query processing algorithm isn’t designed to deal with that many keywords. Therefore, Coveo ML provides an Intelligent Term Detection (ITD) algorithm to extract only the most relevant keywords from such large query expressions (lq).
To do so, ITD uses a set of usage analytics events (search and click) recorded for a search interface:
It selects up to 2,500 queries that generated a positive outcome (i.e., a query result was opened) and were performed at least five times. The selected queries are called top user queries.
If there are more than 2,500 queries that were made five or more times, then ITD selects the 2,500 most popular ones.
It establishes a correlation between the top user queries and the keywords contained in the lq (e.g., support case description).
It finds the five most relevant terms in the lq based on the average importance of each term (see TF-IDF), and on the longest substring in the lq that’s contained in the top user queries. These five terms are called the refined keywords.
It overrides the original lq with the refined keywords before the query is executed against the index.