Use the Extensions API

Coveo organization sources can pull content from a variety of systems to make your content searchable for those with the appropriate permissions (see Connector types).

The indexing pipeline extension (IPE) feature provides a way to execute Python conversion scripts in a securely isolated non-persistent container, allowing developers to customize how items get indexed. Extension scripts can be executed at two different stages of the indexing pipeline: pre-conversion and post-conversion.
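The following is a minimal sketch of what an extension script's logic might look like. Inside the extension container, the script is assumed to receive a global `document` object and a `log` function. The `get_meta_data_value` and `add_meta_data` calls reflect our reading of the Document object Python API reference and should be checked against it. Because those globals only exist in the container, the sketch defines a hypothetical `StubDocument` class and a stand-in `log` function so the logic can be exercised locally. The `normalizedtitle` field name is also hypothetical.

```python
# Hypothetical local stand-in for the `document` object that the extension
# runtime provides; it mimics only the two methods used below (assumption).
class StubDocument:
    def __init__(self, metadata):
        # Metadata name -> list of values (the real object is assumed to return lists).
        self._metadata = {k: list(v) for k, v in metadata.items()}

    def get_meta_data_value(self, name):
        return self._metadata.get(name, [])

    def add_meta_data(self, values):
        # Accepts a dict of name -> value (or list of values).
        for key, value in values.items():
            self._metadata[key] = value if isinstance(value, list) else [value]


def log(message, severity="Normal"):
    # Stand-in for the runtime's log function (assumption).
    print(f"[{severity}] {message}")


def normalize_title(document):
    """Store a trimmed, title-cased copy of the `title` metadata
    in the hypothetical `normalizedtitle` field."""
    titles = document.get_meta_data_value("title")
    if not titles:
        log("No title metadata found", "Warning")
        return
    document.add_meta_data({"normalizedtitle": titles[0].strip().title()})


# Local trial run against the stub.
doc = StubDocument({"title": ["  quarterly sales report "]})
normalize_title(doc)
print(doc.get_meta_data_value("normalizedtitle"))  # ['Quarterly Sales Report']
```

In an actual extension script, you would drop the stubs and call the logic on the global object, for example `normalize_title(document)`, or inline the function body.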

Usage overview

You can execute an indexing pipeline extension for every item of one or more sources of your organization using the Extension API:

  1. On the Administration Console API Keys (platform-ca | platform-eu | platform-au) page, add an API key and grant it the privilege to edit extensions, that is, the Edit access level on the Extensions domain (see Manage API keys, Manage privileges, and Extensions domain).

  2. Write your extension script using the document object (see Document object Python API reference).

  3. Create your extension (see Creating an indexing pipeline extension with the API).

  4. Add your script to your extension.

  5. Apply your extension to your source(s) (see Apply an extension to a source).

  6. Rebuild your source(s) to make your extension effective.

  7. Validate that your changes perform as expected.
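Steps 3 and 4 can likely be done in a single HTTP call, assuming the creation request accepts the script in its body. The following sketch only builds that request with the standard library's `urllib`; it does not send it. It assumes the `POST /rest/organizations/{organizationId}/extensions` endpoint on `platform.cloud.coveo.com` and the `name`, `description`, `content`, and `requiredDataStreams` body fields, all of which you should confirm in Creating an indexing pipeline extension with the API. The organization ID, API key, and extension name are placeholders.

```python
import json
import urllib.request

# Assumed base URL; organizations hosted in other regions use a
# region-specific host instead.
BASE_URL = "https://platform.cloud.coveo.com/rest/organizations"


def build_create_extension_request(org_id, api_key, name, script, description=""):
    """Build (but do not send) a POST request that creates an extension.

    The endpoint path and body field names are assumptions to verify
    against the Extension API reference.
    """
    body = {
        "name": name,
        "description": description,
        "content": script,          # Python source of the extension script
        "requiredDataStreams": [],  # no additional data streams requested
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{org_id}/extensions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # API key created in step 1 (Edit access on the Extensions domain).
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_create_extension_request(
    "myorgid", "xx-example-key", "Normalize titles",
    'log("hello from my extension")',
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Applying the extension to a source (step 5) is a separate operation described in Apply an extension to a source and is not sketched here.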

Python version deprecation

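Currently, the IPE feature uses Python 3.10. To see what has been deprecated or removed since Python 3.8, refer to the "What's New in Python 3.9" and "What's New in Python 3.10" pages of the official Python documentation.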

Extensions with deprecation warnings can be seen in the Log Browser (platform-ca | platform-eu | platform-au) as shown below.

Python deprecation message in the Log Browser

Execution of extensions using deprecated code may fail following our upgrade to Python 3.10.
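Before deploying, you may want to check your extension logic for deprecation warnings in a local Python 3.10 interpreter rather than waiting for them to appear in the Log Browser. The following is a minimal sketch using only the standard `warnings` module; `legacy_helper` is a hypothetical function whose explicit `warnings.warn` call stands in for a deprecated library call.

```python
import warnings


def collect_deprecation_warnings(func, *args, **kwargs):
    """Run func and return the messages of any DeprecationWarning it emits.

    Warnings are recorded instead of printed, and the "always" filter
    makes a warning visible on every call, not only the first one.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", DeprecationWarning)
        func(*args, **kwargs)
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


# Hypothetical extension helper that still relies on a deprecated code path.
def legacy_helper():
    warnings.warn("legacy_helper() is deprecated", DeprecationWarning)
    return "done"


print(collect_deprecation_warnings(legacy_helper))
```

This only catches warnings emitted at run time. Code that has already been removed, such as the `HTMLParser.unescape` case below, fails with an `AttributeError` instead of a warning.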

Note

The most common warning concerns the unescape method, which has been removed from the HTMLParser class in favor of the unescape function of the html module.

The following code was deprecated before Python 3.8, was removed in Python 3.9, and therefore fails in Python 3.10:

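# Deprecated: HTMLParser.unescape was removed in Python 3.9,
# so this raises AttributeError in Python 3.10.
from html.parser import HTMLParser
h = HTMLParser()
h.unescape("....")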

The preceding code should be replaced with:

# Supported replacement: html.unescape (available since Python 3.4)
from html import unescape
unescape("....")
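To confirm the migration in a local interpreter, the following sketch checks whether `HTMLParser.unescape` still exists and shows that `html.unescape` decodes both named and numeric character references. It uses only the standard library; the sample string is arbitrary.

```python
import sys
from html import unescape
from html.parser import HTMLParser

# HTMLParser.unescape was removed in Python 3.9, so the deprecated
# snippet raises AttributeError in the 3.10 runtime.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}:",
      "HTMLParser.unescape present =", hasattr(HTMLParser, "unescape"))

# html.unescape decodes named (&eacute;, &amp;, &lt;) and numeric (&#169;)
# character references in a single call.
raw = "Caf&eacute; &amp; Bar &lt;b&gt;open&lt;/b&gt; &#169;"
print(unescape(raw))  # Café & Bar <b>open</b> ©
```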