---
title: Indexing pipeline extension overview
slug: '1556'
canonical_url: https://docs.coveo.com/en/1556/
collection: index-content
source_format: adoc
---

# Indexing pipeline extension overview

Your Coveo organization uses an indexing pipeline with various stages to process source items from enterprise repositories and make them searchable (see [Coveo indexing pipeline](https://docs.coveo.com/en/1893/)).

An _extension_ consists of a Python script used to customize the way source items are indexed. You must define the extension in your Coveo organization, and then apply it to one or more sources. When items are processed in the pipeline, your extension is applied at the pre-conversion or post-conversion stage.

**Example**

You make a site searchable using a **Web** source type. The last modification date of the page content is only available in a `` element in the pages, and only as a text string in local time (for example, `March 12, 2017 03:11:32 PM`). You want to convert this date text string and store it in an index date field in UTC format.

You (or a developer) create an extension with a Python script that converts the date text to a UTC date value and sets a metadata value to the converted date.

As shown in the following diagram, when a crawled or pushed item enters the indexing pipeline, the Document Processing Manager (DPM) adds a pre-conversion or post-conversion stage for each extension that's applied to a source.

![Flowchart showing the steps of the Coveo indexing pipeline with extensions](https://docs.coveo.com/en/assets/images/index-content/indexing-pipeline-flowchart-with-extension.png)

An extension can be conditionally applied to each source item based on the item type (**Common** versus **Specific** extension application), or based on a condition expression.

## Deploying an indexing pipeline extension

The following procedure outlines the steps to get an indexing pipeline extension up and running.

1. Create or adapt a script to use in your extension.

   You need at least basic developer skills to write or adapt a sample script that does the custom processing you need for one or more of your Coveo organization sources.

   > **Note**
   >
   > Indexing pipeline extensions provide great flexibility to process items. However, since executing a script for each item of a source can notably increase indexing time, use them as a last resort, when the other available indexing customization tools don't let you perform the specific task (see [Indexing pipeline customization tools overview](https://docs.coveo.com/en/126/)).

   Consult the following references to help you write your script:

   * [What are possible extension script purposes?](#Purposes)
   * [What's the extension script environment?](#Environment)
   * [Pre-conversion versus post-conversion](#PreVsPost)
   * [Usage limits](#Limits)
   * [`document` object Python API reference](https://docs.coveo.com/en/34/)
   * [Indexing pipeline extension condition syntax reference](https://docs.coveo.com/en/64/)
   * [Indexing pipeline extension script samples](https://docs.coveo.com/en/111/)

1. From the Coveo Administration Console, create an extension to host your script (see [Add or edit an indexing pipeline extension](https://docs.coveo.com/en/1645/)).

1. Test your indexing pipeline extension (see [Indexing pipeline extension testing strategies and good practices](https://docs.coveo.com/en/67/)).

   What you validate depends entirely on what the script does. Verify that the extension performed as expected for all applicable items.

   Extension testing suggestions:

   * Create a temporary source containing a small number of items that are representative of your production source, and use it to test your extension.
   * Use logs in your script while debugging (see [Logging messages from an indexing pipeline extension](https://docs.coveo.com/en/140/)).

1. Apply your extension to your production source (see [Apply an extension to a source](https://docs.coveo.com/en/1936/)).

1. Rebuild your source (see [Refresh, rescan, or rebuild sources](https://docs.coveo.com/en/3390#refresh-rescan-or-rebuild-sources)).

1. Validate that your extension processed all applicable items of your production source as expected.

## Managing credentials in indexing pipeline extensions

In many cases, your indexing pipeline extension will need to handle credentials, API keys, or other sensitive data (for example, to connect to external services or databases). To keep such information secure, you can [use vault parameters in your indexing pipeline extension](https://docs.coveo.com/en/l9he0046/).

By storing your credentials in vault entries, you can securely retrieve them within your indexing pipeline extension. Sensitive information, like passwords and API keys, is therefore not hardcoded in your extension scripts, which reduces the risk of exposing it in logs or source code.

See [Create a vault entry](https://docs.coveo.com/en/m3a90243/) for instructions on how to achieve this.

## [[Purposes]]What are possible extension script purposes?

In short, here are the types of actions an indexing pipeline extension script can perform for each item of a source:

* Add, modify, or clear metadata.
* Get the metadata value and [data streams](https://docs.coveo.com/en/2891/) from any indexing pipeline stage by using the `origin` attribute (identified by the stage or extension name).
* Modify data streams:
  * **Body text**
  * **Body HTML**
  * **Body Markdown**
  * **Thumbnail**
  * **Original file**
* Reject items (exclude them from the index).
* Add, modify, or delete permissions.

## [[Environment]]What's the extension script environment?

An indexing pipeline extension Python 3 script:

* Runs in a separate non-persistent isolated OS instance for each item.
* Can import common Python libraries (such as `Requests`) available in the OS instance (see [Python modules available to indexing pipeline extensions](https://docs.coveo.com/en/116/)).
* Can read and write to the local folder, but without persistence between extension instances for each source item.
* Can access the Internet.

## [[PreVsPost]]Pre-conversion versus post-conversion

The following criteria indicate when each indexing pipeline stage is more appropriate. When in doubt, favor adding your extensions at the post-conversion stage.

**Pre-conversion**

Use when:

* The script purpose is to reject items, and therefore prevent wasting resources on further indexing pipeline stages.

  **Example**

  You want to create separate sources for the oldest and newest items of a repository. In the source for the newest items, you add a script that rejects items with a last modification date older than your splitting date.

* The script modifies the original `Item data` content and you want the `Processing` stage to process your changes.

> **Note**
>
> Metadata added in the pre-conversion stage isn't automatically mapped to a field with a matching name. You must add a mapping to the sources for which you want to leverage the metadata (see [Manage source mappings](https://docs.coveo.com/en/1640/)).

**Post-conversion**

Use when:

* The script needs to get the `Body text` or the `Body HTML` data stream processed by the `Processing` stage.
* You want to ensure that your metadata changes won't be altered by another stage.

  > **Note**
  >
  > When more than one post-conversion extension is applied to a source, another extension could execute after yours.

* You want to create a script to discover all available metadata from all previous stages.

## [[Limits]]Usage limits

By default, the following indexing pipeline extension usage limits apply to all organizations:

* Number of extensions per organization: 10
* Number of extensions that can be applied to a source: 20

  > **Note**
  >
  > You can apply the same extension two times to a source (that is, one time in pre-conversion and one time in post-conversion).

* Extension execution timeout: 5 seconds

Most common indexing pipeline extension applications only modify the `item` metadata and typically execute in significantly less than a second. An extension can take significantly longer when it gets and processes an item's `Body text` or calls an external service to process the item, particularly for large items. Extension execution can also have a significant impact on crawling performance for sources containing many items.

> **Note**
>
> You can review your extension and other [usage limits](https://docs.coveo.com/en/1562#limits) on the Coveo Administration Console **Settings** page, under **License** > [**Limits**](https://platform.cloud.coveo.com/admin/#/orgid/settings/license/limits) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/settings/license/limits) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/settings/license/limits) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/settings/license/limits)).
>
> Contact [Coveo Support](https://connect.coveo.com/s/case/Case/Default) if you'd like to upgrade your Coveo license to increase your extension limits.
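
To illustrate the kind of metadata-only processing that typically stays well within the execution timeout, the following minimal sketch implements the date conversion from the introductory example. It's not an official Coveo sample: the function name `local_text_to_utc` and its `utc_offset_hours` parameter are hypothetical, and the fixed UTC offset is an assumption, since this article doesn't state the time zone of the site (a fixed offset also ignores daylight saving time changes). The sketch runs as plain Python outside the extension environment. In an actual extension, you would read the date text from the item and write the converted value back through the `document` object (see the [`document` object Python API reference](https://docs.coveo.com/en/34/)); those calls are intentionally left out here.

```python
from datetime import datetime, timedelta, timezone

# Format of the date text in the introductory example:
# "March 12, 2017 03:11:32 PM"
DATE_TEXT_FORMAT = "%B %d, %Y %I:%M:%S %p"


def local_text_to_utc(date_text, utc_offset_hours):
    """Parse a local-time date string and return a timezone-aware UTC datetime.

    utc_offset_hours is a hypothetical fixed offset for the site's local
    time (for example, -5). A fixed offset doesn't follow daylight saving
    time changes, so treat this as a simplification.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    naive_local = datetime.strptime(date_text.strip(), DATE_TEXT_FORMAT)
    # Attach the assumed local offset, then convert to UTC.
    return naive_local.replace(tzinfo=local_zone).astimezone(timezone.utc)


if __name__ == "__main__":
    converted = local_text_to_utc("March 12, 2017 03:11:32 PM", -5)
    print(converted.isoformat())  # 2017-03-12T20:11:32+00:00
```

Keeping the conversion in a small pure function makes it straightforward to check against a handful of representative date strings before you paste the logic into an extension and test it on a temporary source.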