Indexing Pipeline Extension Overview

Your Coveo Cloud organization uses an indexing pipeline with various stages to process source items from enterprise repositories and make them searchable (see Coveo Cloud V2 Indexing Pipeline).

An extension is a Python script that customizes how source items are indexed. You define the extension in your Coveo Cloud organization, and then apply it to one or more sources. When items are processed in the pipeline, your extension runs at either the pre-conversion or the post-conversion stage.

For example, suppose you make a website searchable using a Web source. The last modification date of each page is only available in a <meta> element, and only as a text string in local time (e.g., March 12, 2017 03:11:32 PM). You want to convert this string and store it in an index date field in UTC format.

You (or a developer) create an extension whose Python script converts the date text to a UTC date value and sets a metadata to the converted value.
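The conversion logic such a script might contain can be sketched as follows, assuming the site's local time is a fixed UTC-05:00 offset and the dates use English month names. The function name `local_text_to_utc` and the ISO 8601 output format are illustrative choices, not Coveo requirements. Reading the raw string from the item and writing the result back to its metadata goes through the extension API, which is not shown here; check the Coveo extension reference for those calls.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumption: the site's local time is a fixed UTC-05:00 offset.
# A real script would need the site's actual time zone, including
# daylight saving rules (for example, via zoneinfo.ZoneInfo).
SITE_TZ = timezone(timedelta(hours=-5))

# Matches strings such as "March 12, 2017 03:11:32 PM".
# %B and %p are locale dependent; English names are assumed.
DATE_FORMAT = "%B %d, %Y %I:%M:%S %p"


def local_text_to_utc(date_text: str, tz: tzinfo = SITE_TZ) -> str:
    """Parse a local-time date string and return it as an ISO 8601 UTC string."""
    naive = datetime.strptime(date_text.strip(), DATE_FORMAT)
    aware = naive.replace(tzinfo=tz)      # attach the assumed local offset
    utc = aware.astimezone(timezone.utc)  # shift the value to UTC
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


print(local_text_to_utc("March 12, 2017 03:11:32 PM"))  # 2017-03-12T20:11:32Z
```

Note that the conversion can change the calendar date: a page stamped late in the evening local time ends up on the next day in UTC, which is exactly why storing the raw text string is not enough for date-based sorting or filtering.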

As shown in the following diagram, when a crawled or pushed item enters the indexing pipeline, the Document Processing Manager (DPM) adds a pre-conversion or post-conversion stage for each extension applied to the item's source.


An extension can be applied conditionally to each source item, based on the item type (Common versus Specific extension application) or on a condition expression.
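The relationship between extensions, stages, and conditions for a single item can be illustrated with a toy model. It is not Coveo's implementation: the class and function names, the `filetype` key, and the collapsed stage list are hypothetical. The Python predicate only stands in for a condition expression, and item-type application (Common versus Specific) is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Item = Dict[str, str]  # hypothetical, simplified stand-in for an item's metadata

PRE = "pre-conversion"
POST = "post-conversion"


def always(item: Item) -> bool:
    """Default condition: apply the extension to every item."""
    return True


@dataclass
class AppliedExtension:
    name: str
    stage: str  # PRE or POST
    # Stand-in for a condition expression. Real conditions are configured
    # in Coveo with their own syntax, not written as Python functions.
    condition: Callable[[Item], bool] = always


def stages_for_item(item: Item, applied: List[AppliedExtension]) -> List[str]:
    """Toy model: the processing steps one item goes through."""
    active = [ext for ext in applied if ext.condition(item)]
    pre = [f"{PRE}: {ext.name}" for ext in active if ext.stage == PRE]
    post = [f"{POST}: {ext.name}" for ext in active if ext.stage == POST]
    # Extension stages surround the conversion step; all other pipeline
    # stages are collapsed into "crawl or push" and "index" for brevity.
    return ["crawl or push", *pre, "conversion", *post, "index"]


applied = [
    AppliedExtension("DateToUtc", POST, lambda item: item.get("filetype") == "html"),
    AppliedExtension("TagSource", PRE),
]
print(stages_for_item({"filetype": "html"}, applied))
print(stages_for_item({"filetype": "pdf"}, applied))
```

For the HTML item, both extensions add a stage; for the PDF item, the condition on `DateToUtc` is false, so only the pre-conversion `TagSource` stage is added. When several extensions share a stage, this sketch simply runs them in list order, which is an assumption rather than documented behavior.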