About Relevance Generative Answering (RGA)
Coveo Relevance Generative Answering (RGA) is currently available as a beta only to early access members. To be part of the RGA early access program, contact your Coveo Customer Success Manager.
As RGA is in beta, the information detailed in this article is subject to change.
Coveo uses generative AI technology to generate an answer based solely on your enterprise’s content that resides in a secure index in your Coveo organization. Semantic search capabilities ensure that the most contextually relevant content is used to generate answers.
RGA leverages large language models (LLMs) along with all of Coveo’s existing indexing, AI, personalization, recommendation, machine learning, relevance, and security features. The result is a powerful, enterprise-ready solution that generates answers that are relevant, personalized, and secure, all while respecting your enterprise’s privacy and security.
RGA is designed to integrate seamlessly in the existing user experience in a Coveo-powered search interface. A single search box supports both lexical (keyword) and semantic search. This hybrid approach means that the search interface is able to handle both simple and complex user queries, and provides both traditional search results and a generated answer.
The answer is generated in real time on the search results page. If the user then applies filters to narrow down the search results, the answer regenerates on the fly based on the selected filters. The generated answer also includes citations that reference the indexed content that was used to generate the answer.
RGA combines semantic search capabilities and generative AI technology to generate answers to user queries. Let’s begin by taking a high-level look at how RGA works in the context of a search session in a Coveo-powered search interface. We’ll take a closer look at the main RGA processes later in this article.
The configuration and implementation of RGA for early access members are performed by Coveo Professional Services.
The following steps describe the RGA workflow as shown in the above diagram:
A user enters a query in a Coveo-powered search interface that has RGA enabled.
As it normally does, the query passes through a query pipeline where pipeline rules and machine learning are applied to optimize relevance. However, in addition to the traditional lexical (keyword) search, RGA introduces a semantic search layer. Semantic search improves search results by understanding the contextual meaning (semantics) of words and phrases in a long-form user query to find the most relevant content. It does this by using document-level word embeddings to retrieve the items in the index with high semantic similarity to the query. For more information, see Document-level content retrieval.
The search engine identifies the most relevant items in the index, and sends a list of the items to an RGA model. As in any Coveo search workflow, content security is enforced to make sure that only authorized search results are displayed. The list that’s sent to the RGA model also contains only the items the user who performed the query is allowed to access.
An RGA model further refines the search results by using passage-level word embeddings to identify and retrieve the passages that are the most semantically relevant from the items that were identified by the search engine. For more information, see Passage-level content retrieval.
The RGA model uses prompt engineering and grounding to create a prompt that’s sent to the GPT LLM to generate the text. The prompt includes instructions, the query, and the relevant passages.
The GPT LLM generates the answer based only on the passages that are in the prompt, and then streams the generated text back to the search interface, where it appears along with the traditional search results. The generated answer is presented along with citations that reference the content that was used to generate the answer. Clicking a citation opens the corresponding item. If the user applies filters to the search results, the answer regenerates on the fly based on the selected filters.
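The workflow above can be sketched end to end in code. This is a minimal, self-contained illustration only: the embedding function, document structure, and helper names are all invented for the sketch and don’t correspond to Coveo’s actual implementation or APIs.

```python
# Toy sketch of the RGA workflow: secured retrieval at the document level,
# then at the passage level, then a grounded prompt. All names are
# illustrative assumptions, not Coveo API calls.

def embed(text):
    # Toy "embedding": a bag-of-words frequency vector keyed by word.
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def similarity(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def vector_search(query_vec, items, key, k):
    # Return the k items whose vector is most similar to the query vector.
    return sorted(items, key=lambda it: -similarity(query_vec, it[key]))[:k]

def answer_query(query, index, user):
    # 1. Enforce content security: keep only items the user may access.
    accessible = [d for d in index if user in d["allowed_users"]]
    q_vec = embed(query)
    # 2. Document-level retrieval (the top 100 items in the real system).
    top_docs = vector_search(q_vec, accessible, "doc_vec", k=100)
    # 3. Passage-level retrieval from those documents.
    passages = [p for d in top_docs for p in d["passages"]]
    top = vector_search(q_vec, passages, "vec", k=3)
    # 4. Grounded prompt: instructions + query + relevant passages only.
    return ("Answer using only these passages.\n"
            f"Question: {query}\n" +
            "\n".join(p["text"] for p in top))
```

Because the permission filter runs before retrieval, a passage from an item the user can’t access never reaches the prompt, which mirrors the security behavior described above.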
The search results page includes:

- The answer that’s generated by the RGA feature.
- Citations that highlight the items containing the raw data that was used to generate the answer. Click a citation to open the corresponding item.
- The most relevant search results that were returned for the user query.
Relevance Generative Answering processes
Let’s look at the RGA feature in more detail by examining the three main processes that are involved in generating answers:
Embedding is the process of converting text data to mathematical representations (vectors) on a vector space. Vectors, which are also referred to as word embeddings, are mapped based on meaning. Therefore words with similar meaning occupy relatively close positions within the vector space. Embeddings are at the core of vector-based search, such as semantic search, that’s used to find similarities based on meaning and context.
The following is a graphical representation of a vector space with each dot representing a vector (word embedding). Each vector is mapped to a specific position in the multi-dimensional space. Word embeddings with similar meaning occupy relatively close positions.
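The "close positions for similar meanings" idea can be shown with toy two-dimensional vectors. The coordinates below are invented purely for illustration; real embeddings have hundreds of dimensions and are produced by a trained model.

```python
# Toy 2-D "word embeddings": words with related meanings sit close
# together in the vector space, so their cosine similarity is high.
import math

embeddings = {
    "laptop":   (0.9, 0.8),
    "computer": (0.85, 0.82),
    "banana":   (-0.7, 0.1),
}

def cosine(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine(embeddings["laptop"], embeddings["computer"]))  # close to 1.0
print(cosine(embeddings["laptop"], embeddings["banana"]))    # much lower
```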
The RGA feature uses two different embedding vector spaces: document-level embeddings and passage-level embeddings.
Document-level embedding is applied to your indexed items to map their key concepts. When a user enters a query in a search interface, the semantic search component uses the document-level embeddings to find the most relevant results for a given query.
Passage-level embedding is generated by the RGA model, and is used to identify the individual passages from which the GPT LLM will generate the answer.
During the indexing process, items from selected sources go through an indexing pipeline for processing by the document processing manager (DPM). The DPM uses a semantic encoder (pre-trained sentence-transformer language model) to create word embeddings for the indexed items. Document-level embeddings are updated based on your source’s refresh frequency.
For a given item, the body text is broken up into smaller chunks, and then the semantic encoder converts the text within those chunks to mathematical representations on a vector space. You can think of each chunk as representing a concept or idea that’s present in an item or document. The vectors are automatically mapped to the item’s vector fields.
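One common way to chunk body text is fixed-size windows of words with a small overlap, so that an idea straddling a boundary still appears whole in at least one chunk. This is a hypothetical sketch; Coveo’s actual chunking strategy isn’t described here.

```python
# Hypothetical chunker: split text into overlapping word windows before
# each window is encoded into a vector.

def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of `chunk_size` words, overlapping by `overlap`."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```

With the defaults, a 120-word document yields three chunks (words 0-49, 40-89, and 80-119), and the 10-word overlap means each chunk shares context with its neighbor.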
The indexed items and their vectors are stored in your Coveo organization’s unified index.
When a user enters a query in a search interface, RGA’s semantic search component uses the document-level embeddings to find the items in the index that are the most contextually relevant to the user query. For more information, see Document-level content retrieval.
The RGA model uses a pre-trained sentence-transformer language model to create passage-level embeddings for the items in the index. These are the same items that were used for document-level embeddings. Passage-level embeddings are updated based on the RGA model’s build schedule.
For a given indexed item, the body text is broken into passages based on concepts. The transformer language model then encodes the passage text as mathematical representations on a vector space.
The passage-level embeddings and vector space are stored in the RGA model’s memory.
The purpose of passage-level embedding is to allow the RGA model to find the passages that are the most contextually relevant to the user query. For more information, see Passage-level content retrieval.
Relevant content retrieval
When using generative AI to generate text from raw data, it’s essential to identify and control the content that will be used as the raw data.
RGA applies two layers of content retrieval to make sure the answer that’s generated is based on the most contextually relevant content.
Document-level content retrieval
The initial content retrieval is done at the item (document) level where RGA uses semantic search to identify the most contextually relevant items from the index. Content security is enforced to make sure that the user sees only the content that they’re allowed to access.
When a user performs a query in a search interface, the query passes through a query pipeline where pipeline rules and machine learning are applied to optimize relevance. However, in addition to the traditional lexical (keyword) search, RGA introduces a semantic search layer to find the most relevant items based on meaning and context.
The semantic search component compares the semantic elements of the query with the document-level embeddings. Using vector search, semantic search finds the items in the index with high semantic similarity to the query.
A list of the 100 most relevant items is then sent to the RGA model. The RGA model then performs the second layer of content retrieval, which is a passage-level content retrieval.
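The document-level retrieval step amounts to embedding the query, scoring every accessible item by vector similarity, and keeping the top k (100 here). The sketch below uses a brute-force scan over plain tuples for clarity; a production system would use an approximate nearest-neighbor index.

```python
# Hypothetical document-level retrieval: rank items by cosine similarity
# to the query vector and keep the top k.

def top_k_items(query_vec, items, k=100):
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)
    return sorted(items, key=lambda item: -cosine(query_vec, item["vec"]))[:k]
```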
Semantic search is only available for a Coveo-powered search interface as part of the RGA feature.
Why do we need semantic search in addition to lexical search?
Traditional lexical search relies on matching keywords or phrases that appear in a query with the words and phrases in items. Due to the exact-match nature of lexical search, user queries that yield the best results tend to be concise and precise. The user needs to know what they’re searching for and use the proper keywords.
However, what if a user doesn’t know exactly what they’re looking for, or what if the query is more complex? With the emergence of generative AI, customer expectations are evolving, and search interfaces must be able to understand and answer longer, more complex search queries. A complex query is one where the user provides context and asks a question.
If a user enters the query “What is Coveo machine learning and what are its benefits”, lexical search results will include items that discuss “Coveo machine learning”. These results may also contain items with high occurrences of the word “benefit”, which may not be relevant to machine learning. Lexical search is fast, cost-efficient, and has a proven track record in many enterprises. However, lexical search doesn’t consider the meaning of words or the context. So even if a search interface has generative AI technology, generating the answer based on search results that were obtained using lexical search technology alone wouldn’t necessarily provide a contextually relevant answer. While lexical search isn’t suited to finding similarities in items based on context and word meaning, semantic search is designed for just that purpose.
Semantic search allows RGA to understand the question by capturing the context and user intent in a natural-language query. For a generative AI solution to generate relevant answers, the content retrieval process must therefore include semantic search capabilities.
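The keyword-matching pitfall described above can be demonstrated with a toy lexical scorer. Here a document stuffed with the word "benefits" outranks a genuinely relevant one, because lexical scoring counts word occurrences without understanding meaning. The scorer and example texts are invented for illustration.

```python
# Toy lexical scorer: count how often each query word appears in a document.
# No meaning or context is considered, only exact word matches.

def lexical_score(query, doc):
    q_words = set(query.lower().split())
    doc_words = doc.lower().split()
    return sum(doc_words.count(w) for w in q_words)

query = "coveo machine learning benefits"
relevant = "Coveo machine learning ranks results automatically"
keyword_stuffed = "Employee benefits health benefits dental benefits retirement benefits"

print(lexical_score(query, relevant))          # matches coveo, machine, learning
print(lexical_score(query, keyword_stuffed))   # higher, despite being off-topic
```

A semantic scorer comparing embedding vectors would instead rank the machine learning document first, because its meaning is closer to the query’s intent.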
Passage-level content retrieval
With document-level content retrieval complete, RGA now knows the most contextually relevant items to which the user has access. The next step is to identify the most relevant passages from those items.
The RGA model uses the passage-level word embeddings vector space in its memory to find the most relevant passages from the top 100 items identified in the document-level retrieval step. The RGA model does this by embedding the user query into the vector space and performing a vector search to find the passages with the highest semantic similarity to the query.
The result is a curated collection of the most contextually relevant passages from your enterprise’s most relevant items. The RGA model is now ready to create the prompt that will be sent to the GPT LLM to generate the answer.
RGA uses an OpenAI GPT-3.5 LLM to generate the answer. While RGA leverages GPT’s linguistic capabilities to generate the answer, Coveo controls the content that serves as the raw data from which the text is generated.
The answer generation process includes the following steps:
Create the prompt
A prompt is essentially a task that the GPT LLM interprets to generate text. The RGA model uses prompt engineering and grounding to construct a prompt that includes detailed instructions, the query, and the most relevant passages.
By grounding the prompt and confining the GPT LLM to just the most relevant passages from your enterprise’s secured content, RGA ensures that the answer that’s generated is relevant and respects your enterprise’s content permissions.
The RGA model sends the prompt to the GPT LLM that will generate the answer.
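The prompt-assembly step above could look something like the sketch below. The instruction wording and function name are invented for illustration; Coveo’s actual prompt template is not public. The key point is the grounding: the instructions explicitly confine the model to the supplied passages.

```python
# Hypothetical grounded prompt builder: instructions + numbered passages
# + the user's question, with the model told to use only what's provided.

def build_prompt(query, passages):
    instructions = (
        "Answer the question using ONLY the passages below. "
        "If the passages do not contain the answer, say you don't know."
    )
    passage_block = "\n\n".join(
        f"[{i + 1}] {p}" for i, p in enumerate(passages)
    )
    return f"{instructions}\n\nPassages:\n{passage_block}\n\nQuestion: {query}"
```

Numbering the passages also gives the LLM stable identifiers it can reference, which is one way the citations shown alongside the generated answer could be produced.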
Generate the answer
The GPT LLM receives the prompt that was created by the RGA model and generates the text based only on the passages that are in the prompt.
By controlling both the prompt and the answer, RGA greatly reduces the chances of AI hallucinations, which is when an LLM generates text that’s nonsensical or inaccurate.
The generated answer is then streamed back to the search interface where it appears on the search results page.