Coveo Cloud V2 Indexing Pipeline

The Coveo Cloud V2 indexing pipeline is the process that each source item goes through when it is indexed. Items enter the indexing pipeline either from Coveo Cloud source crawlers while the source is indexed (see Rebuild, Rescan, or Refresh operations), or when a custom process pushes them through the Push API.

For an administrator or a developer, knowing what the indexing pipeline does is useful in cases such as:

The indexing pipeline consists of a series of sequential stages illustrated and described in the following schema and table. As an administrator or developer, you can control the behavior of the stages shown with a dark gray background, but not that of the other stages.

[Figure: indexing pipeline schema]

Indexing pipeline stage Description

Crawling

A source crawls the target repository to fetch and push the content and properties of each repository item (see Available Coveo Cloud V2 Source Types).

As an administrator, you can configure each source from the Coveo Cloud administration console (see Sources - Page). Some item metadata is already made available by the connector, including the URI, modification date, and more depending on the source type.

Streaming

The first Push API stage receives items to index from an external custom process.

As a developer, you fully control when and what items you send to the Push API, one by one or in batch (see Push API Usage Overview).
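As a rough illustration of pushing items one by one or in batch, the sketch below only builds a description of a single-item request and groups items into batches; it sends nothing. The host name, the URL layout, the `documentId` query parameter, and the helper names `build_single_item_request` and `chunk_for_batch` are assumptions made for this example and should be checked against the Push API reference before use.

```python
import json
from urllib.parse import quote

# Hypothetical placeholder host; the real Push API endpoint is not given here.
PUSH_BASE_URL = "https://push.example.invalid/v1"


def build_single_item_request(org_id, source_id, item):
    """Describe one HTTP request that would push a single item.

    Assumption: the item is addressed by its URI in a query parameter
    and the remaining keys travel in a JSON body.
    """
    url = (
        f"{PUSH_BASE_URL}/organizations/{quote(org_id, safe='')}"
        f"/sources/{quote(source_id, safe='')}/documents"
        f"?documentId={quote(item['uri'], safe='')}"
    )
    body = {key: value for key, value in item.items() if key != "uri"}
    return {"method": "PUT", "url": url, "body": json.dumps(body, sort_keys=True)}


def chunk_for_batch(items, batch_size):
    """Group items into lists of at most batch_size for batch pushes."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Keeping request construction separate from sending makes it easy to decide, per run, whether items go out individually or grouped.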

Push API queue

The Push API queue holds items ready to be processed by the Consuming stage.

Consuming

The last Push API stage transfers items to be processed from the Push API queue to the item processing manager (DPM) queue, one by one or in batch, depending on how they were pushed to the Streaming stage.
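The hand-off between the two queues can be pictured with a toy in-memory model. This is only a sketch: the real queues are internal to the service, and the `consume` function and the convention that a batch is a Python list are assumptions made for illustration.

```python
from collections import deque


def consume(push_queue, dpm_queue):
    """Move every entry from push_queue to dpm_queue in arrival order.

    An entry is either a single item (a dict) or a batch (a list of dicts).
    The grouping chosen when the items were pushed is preserved.
    Returns the number of individual items moved.
    """
    moved = 0
    while push_queue:
        entry = push_queue.popleft()
        dpm_queue.append(entry)
        moved += len(entry) if isinstance(entry, list) else 1
    return moved
```

For example, a queue holding one single item followed by a batch of two ends up in the second queue with the same two entries, in the same order.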

DPM Queue

The DPM queue holds the items ready to be processed by the item processing manager (DPM) set of stages.

Applying extension (preconversion) (optional)

Similar to Applying extension (postconversion) (optional).

Most indexing pipeline extensions are added in the postconversion stage (see Preconversion Versus Postconversion).

Processing

This stage essentially converts the content and properties of each item from its native format into a common format suitable for the Indexing stage using the appropriate Coveo converter for the supported formats (see Supported File Formats).

  • When the item is a PDF file, the PDF converter extracts the text and the properties from the PDF binary file.
  • When the item is an HTML file, the HTML converter extracts the text from the body element and metadata from the meta elements.
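The HTML case above can be sketched with the Python standard library. This is not the Coveo converter; it is a minimal stand-in, assuming that only `meta` elements carrying both `name` and `content` attributes count as metadata and that every text node inside `body` is kept (including script or style text, which a real converter would likely discard).

```python
from html.parser import HTMLParser


class SimpleHtmlConverter(HTMLParser):
    """Collect body text and meta name/content pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_body = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self._in_body = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.metadata[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Keep only non-empty text found inside the body element.
        if self._in_body and data.strip():
            self._chunks.append(data.strip())

    @property
    def text(self):
        return " ".join(self._chunks)


def convert_html(html):
    """Return the extracted text and metadata in a common dict format."""
    parser = SimpleHtmlConverter()
    parser.feed(html)
    parser.close()
    return {"text": parser.text, "metadata": parser.metadata}
```

The returned dict plays the role of the common format: whatever the native format was, later stages only see `text` and `metadata`.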

Mapping

This stage applies standard and custom source mappings to set Coveo Cloud field values with item metadata or literal text.

As an administrator:

  • When you create a source of a given type, a set of standard fields and mappings is automatically created. The standard metadata of the source is therefore automatically available in the index fields.

  • When you want to leverage custom metadata, you must create target Coveo index fields to host these metadata values, and create mappings that set the Coveo index field values with the appropriate metadata or fixed literal content (see Fields - Page).
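The idea of setting field values from metadata or literal text can be sketched as follows. The `%[name]` placeholder notation, the `apply_mappings` helper, and the choice to skip fields whose rule resolves to an empty string are assumptions made for this example; the actual mapping rules are configured in the administration console.

```python
import re

# Assumed placeholder notation: %[metadata_name] is replaced by the metadata value.
PLACEHOLDER = re.compile(r"%\[([^\]]+)\]")


def apply_mappings(mappings, metadata):
    """Resolve each mapping rule into a field value.

    mappings: field name -> rule (literal text, placeholders, or a mix).
    metadata: metadata name -> value extracted for the item.
    Fields whose rule resolves to an empty string are left unset.
    """
    fields = {}
    for field_name, rule in mappings.items():
        value = PLACEHOLDER.sub(
            lambda match: str(metadata.get(match.group(1), "")), rule
        )
        if value:
            fields[field_name] = value
    return fields
```

With this sketch, a rule such as `%[dc.creator]` copies a metadata value into a field, while a rule such as `Web` sets the same literal text on every item of the source.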

Applying extension (postconversion) (optional)

By default, there are no postconversion extensions.

As an administrator, maybe with the help of a developer, you can:

When assigned to a source, the script of an extension is executed for each source item.

When more than one preconversion extension is assigned to a source, they are executed sequentially in the order in which they are added to the source configuration.

When an extension script throws an error, the item still proceeds to the next indexing pipeline stage.
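The execution rules above can be modeled with a small sketch. It does not use the real extension API: items are plain dicts, extensions are plain functions, and `run_extensions` is a hypothetical helper. It also assumes that the remaining extensions still run after one of them fails, which the text does not state explicitly.

```python
def run_extensions(item, extensions):
    """Run extensions on one item, in the order they were added.

    A failing extension is recorded but does not stop the item:
    the item is always returned so that it can move on to the next stage.
    Assumption: later extensions still run after an earlier one fails.
    """
    errors = []
    for extension in extensions:
        try:
            extension(item)
        except Exception as exc:
            errors.append(f"{extension.__name__}: {exc}")
    return item, errors
```

Returning the collected errors alongside the item mirrors the behavior described above: a script failure is worth logging, but it is not a reason to drop the item from the index.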

Indexing

This stage puts the item's extracted content and properties into the Coveo unified index to make the item available for user queries. Temporary files containing the extracted item content and properties are then deleted.

Coveo indexes do not store a copy of your original files. However, stored data include an excerpt of the item content to display in the search results.

If your search page includes the Quick View component, users can use it to view the entire content of their search results (see Coveo Quickview Component).