Search Agent data security

Important
Beta feature

Coveo Search Agent is currently available as a beta offering. Contact your Customer Success Manager for access to this feature. Your use of this feature is subject to the beta and pre-release terms of your agreement with Coveo, including any applicable beta or pre-release provisions therein. To the extent your agreement does not contain specific beta or pre-release terms, Section 8 (Beta Features) of the Coveo Customer Agreement shall apply. This feature is provided "as-is," without warranty or SLA coverage, and may be modified, suspended, or discontinued at any time. You should not use this feature to process sensitive or regulated data.

Data is a critical asset that fuels enterprise growth and innovation, making its protection essential, especially when working with a generative AI system, such as the Coveo Search Agent.

A generative large language model (LLM) is trained on a large corpus of data and uses what it has learned to generate new content. However, using such a model also raises concerns about data privacy and security, such as:

  • "What data is the model using, and is it retaining that data?"

  • "Is my data being shared with others?"

  • "What data is the model using to generate the new content?"

  • "Will the generated content leak sensitive information to unintended audiences?"

For any enterprise to use generative AI technology ethically and safely, these concerns must be addressed. An enterprise must be able to control the content that’s used by the model. The content that’s generated must be highly relevant but also take system permissions and access rights into account to prevent sensitive information from being leaked.

This article describes how Coveo handles your enterprise content safely when generating answers, and how the answers are secure and always based solely on your most relevant and up-to-date content.

Data security can be broken down into these main features as shown in the following diagram:

[Diagram: Coveo Search Agent security]

Secure content retrieval

Secure content retrieval is a feature of the Coveo Platform that enables efficient and secure searching of your enterprise content, and secure answer generation from that content, with Coveo’s security cache at its core.

Note

Existing Coveo security protocols and data protection measures remain applicable throughout the answer-generation process and ensure that your enterprise items and data remain secure. See Data and security for details on how Coveo protects your data.

Your enterprise data is stored in a secure Coveo unified index. This data is only accessible to you and the people you authorize within your Coveo organization. Secure handling of your enterprise data applies not only at ingestion (indexing) but also at query time.

  • At indexing, the Coveo source retrieves content from your enterprise applications. The content is indexed with the item and user permissions from your repository’s permission system.

  • At query time, Coveo’s security cache is used to handle the permissions for each authenticated user in your Coveo-powered search interface.

By indexing your restricted enterprise content with item and user permissions, and then applying those user permissions at query time, Coveo ensures that sensitive information isn’t inadvertently exposed through search results or generated answers. Through a Coveo-powered search interface, authenticated users only see the items that they’re allowed to access within the indexed repository.
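The two steps above (store permissions at indexing, apply them at query time) can be illustrated with a minimal sketch. This is not Coveo’s implementation or API: the `IndexedItem` class, the `index_item` and `search` functions, and the sample users and items are all hypothetical and exist only to show the idea of permission-aware retrieval.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexedItem:
    """Hypothetical indexed item that carries the permissions captured at indexing."""
    item_id: str
    text: str
    allowed_users: frozenset = field(default_factory=frozenset)


def index_item(item_id, text, allowed_users):
    # Indexing step: store the content together with its repository permissions.
    return IndexedItem(item_id, text, frozenset(allowed_users))


def search(index, user, term):
    # Query step: match on the term, then keep only items the user may access.
    return [
        item
        for item in index
        if term.lower() in item.text.lower() and user in item.allowed_users
    ]


# Hypothetical sample data: bob is not allowed to see doc-2.
index = [
    index_item("doc-1", "Public holiday schedule", {"alice", "bob"}),
    index_item("doc-2", "Executive salary schedule", {"alice"}),
]

alice_results = [item.item_id for item in search(index, "alice", "schedule")]
bob_results = [item.item_id for item in search(index, "bob", "schedule")]
```

In this sketch both users run the same query, but `bob_results` omits the restricted item because the permission check is applied before any result is returned.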

Grounded content

In the context of generative AI, grounding refers to the process of providing a generative LLM with specific and relevant information that’s not available to the model based on its own training.

While generative LLMs come with a vast amount of knowledge, it isn’t specific to a given use case or industry. To obtain a relevant output to a query, the generative LLM must be provided with relevant content specific to your enterprise. In other words, the LLM must be "grounded" in the context of your enterprise content. Grounding is an important aspect of generative answering, as it helps to ensure that the generated output is relevant and secure. Grounding holds the model to factual data and relevant user context when generating an answer.

Coveo’s secure content retrieval makes grounding possible. The Search Agent uses content retrieved from your Coveo index to ground the generative LLM. Two layers of content retrieval ensure that Coveo controls the data used to generate the answer. The prompt that’s provided to the generative LLM includes a detailed instruction, the query, and the most relevant segments of text from the retrieved content. Confining the generative LLM to just the most relevant text from your secured content ensures that the generated answer is relevant and respects your enterprise content permissions.
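As a rough illustration of the prompt structure described above (instruction, query, and the most relevant text segments), the following sketch assembles a grounded prompt. The function name `build_grounded_prompt`, the `max_passages` parameter, the scoring format, and the prompt layout are assumptions made for this example; the actual prompt used by the Search Agent isn’t described in this article.

```python
def build_grounded_prompt(instruction, query, passages, max_passages=3):
    """Assemble a prompt from an instruction, the query, and the top passages.

    `passages` is a list of (relevance_score, text) tuples that are assumed to
    have already been filtered by the user's permissions at retrieval time.
    """
    # Keep only the highest-scoring passages so the LLM sees a confined context.
    top = sorted(passages, key=lambda p: p[0], reverse=True)[:max_passages]
    context = "\n".join(
        f"[{i}] {text}" for i, (_, text) in enumerate(top, start=1)
    )
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical retrieved passages with relevance scores.
passages = [
    (0.91, "Reset your password from the account settings page."),
    (0.15, "Our office is closed on public holidays."),
    (0.78, "Password resets require access to your registered email."),
]

prompt = build_grounded_prompt(
    instruction="Answer using only the context below.",
    query="How do I reset my password?",
    passages=passages,
    max_passages=2,
)
```

Because only the selected passages are placed in the prompt, the low-scoring passage never reaches the model in this sketch.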

The Coveo Search Agent orchestrates multiple rounds of content retrieval and answer generation based on follow-up questions. The Search Agent relies on an internal conversation ID to ensure that generated answers are confined to the current conversation, keeping responses grounded in its existing context.
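The idea of confining follow-up answers to a single conversation can be sketched as follows. The `ConversationStore` class and its methods are hypothetical and only illustrate keying context by a conversation ID; the internal mechanism used by the Search Agent isn’t documented here.

```python
import uuid
from collections import defaultdict


class ConversationStore:
    """Hypothetical store that keeps each conversation's turns separate."""

    def __init__(self):
        self._turns = defaultdict(list)

    def start(self):
        # Generate a new, random conversation ID.
        return str(uuid.uuid4())

    def add_turn(self, conversation_id, question, answer):
        self._turns[conversation_id].append((question, answer))

    def context_for(self, conversation_id):
        # Return a copy of the turns for this conversation ID only.
        return list(self._turns.get(conversation_id, []))


store = ConversationStore()
conv_a = store.start()
conv_b = store.start()

store.add_turn(conv_a, "What is our refund policy?", "Refunds within 30 days.")
store.add_turn(conv_b, "How do I reset my password?", "Use account settings.")

# A follow-up in conversation A only sees conversation A's earlier turns.
context_a = store.context_for(conv_a)
```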

Secure content retrieval and grounded content are essential parts of retrieval-augmented generation (RAG), which enhances the security, relevance, and reliability of content generated by an LLM.

HTTPS and TLS endpoints

To generate the answer, Coveo uses a third-party generative LLM that’s hosted on an external foundation model service server. The grounded prompt is sent to the foundation model service, the LLM generates the answer, and the answer is then sent back to Coveo.

HTTPS endpoints ensure that communication between Coveo and the foundation model service server is encrypted and secure, preventing attacks such as eavesdropping, tampering, or data theft.

TLS endpoints use cryptographic protocols to provide authentication, confidentiality, and integrity, enabling secure web communication between Coveo and the foundation model service.
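For readers who want to see what an HTTPS client with enforced TLS looks like in general, here is a minimal sketch using Python’s standard `ssl` module. It shows generic client-side settings (certificate verification, hostname checking, and a minimum protocol version) and is not a description of Coveo’s actual configuration; the `make_tls_context` helper and the TLS 1.2 minimum are assumptions chosen for the example.

```python
import ssl


def make_tls_context():
    """Create a client-side TLS context with certificate verification enabled."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


tls_context = make_tls_context()
```

A context like this can be passed to a standard-library HTTPS client, for example `urllib.request.urlopen(url, context=tls_context)`, so that the connection fails if the server’s certificate can’t be verified.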

Zero retention

To maintain data privacy, enterprises must retain complete ownership of their data. With Coveo, you remain the sole owner of your data.

  • You control the content that’s indexed from your enterprise. This means that you control what content to index, when to update the content in the index, and how long your data is kept in the Coveo index. The index is only accessible to you and the people you authorize within your Coveo organization. Coveo doesn’t retain any of your enterprise content after it’s indexed.

  • The Coveo Machine Learning (Coveo ML) models that are used in the Search Agent workflow are only available within your Coveo organization. The models use only the indexed content that you specify, and Coveo won’t fine-tune any other LLMs or share your data with other clients.

  • To generate the answer, Coveo uses a third-party generative LLM that’s hosted on an external foundation model service server. The generative LLM is a stateless model that’s shared by all Coveo customers. The LLM is used solely for the purpose of generating answers. The model isn’t trained on your enterprise data, and it doesn’t retain any of your data for future learning.

Important

While the foundation model service hosts the generative LLM and processes your data for the purpose of generating answers, it doesn’t store your data.

Logged analytics data

Coveo logs data related to generative answering, and retains the data for a period of time as specified in Data retention.