Manage indexing pipeline extensions

An indexing pipeline extension (IPE) is a Python script used to customize the way one or more sources index content (see Indexing Pipeline Extension Overview).

In addition to the required privileges, you need developer skills or the help of a developer to create or adapt Python scripts from samples, and to test them before including them in indexing pipeline extensions (see Coveo Indexing Pipeline Extensions).

Add or edit indexing pipeline extensions

An IPE can be associated with more than one source, so modifying an existing IPE can impact many sources across your Coveo organization.

Before you can apply an IPE to a source, you must first add it to your organization. After creating the extension, you can always edit its configuration.

Access the Extensions (platform-ca | platform-eu | platform-au) page, and then:
- To add an extension, click Add Extension.
- To edit an extension, click the desired extension, and then click Edit in the Action bar.
In the Add/Edit an Extension panel, in the Extension name box, enter or modify the extension name (see "Extension Name" Box Reference).
(Optional) In the Description box, enter information to help understand the purpose or the context of the extension (for example, what it does to specific metadata or a data stream and to what type or which specific sources it applies).
Under Select additional item data that the extension needs to access, to optimize performance, select only the optional item binary data streams needed by your extension code.
In the Extension script box, write or paste your Python script developed or adapted for your extension (see Item Object Python API Reference and Indexing Pipeline Extension Script Samples).

Note

Avoid calling sys.exit in your script, as this can cause issues at the processing stage of the Coveo indexing pipeline.

In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current extension.

For example, when creating a new extension, you could decide that members of Group A can edit its configuration while Group B can only view it.

See Custom access level for more information.
Click Add Extension or Save.

Back on the Extensions (platform-ca | platform-eu | platform-au) page, your new extension or new version of a modified extension is available in your Coveo organization (see Manage Other Extension Versions).
Apply the new or modified extension to one or more sources.

If you have the Enterprise edition, group this extension and your other implementation resources together in a project. See Manage projects.
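
One way to follow the note about sys.exit is to wrap the script logic in a function and return early when there's nothing to do. The following is a minimal sketch rather than a prescribed pattern: the function name run_extension, the metadata keys, and the plain dictionary standing in for item metadata are all hypothetical. A real extension would read and write metadata through the item object (see Item Object Python API Reference).

```python
def run_extension(metadata):
    """Return the metadata to add, or an empty dict when there is nothing to do."""
    # An early return ends the work for this item without calling sys.exit().
    if metadata.get("filetype") != "pdf":
        return {}

    title = metadata.get("title", "").strip()
    if not title:
        return {}

    return {"normalized_title": title.lower()}


# Hypothetical sample input standing in for the metadata of one item.
sample = {"filetype": "pdf", "title": "  Quarterly Report "}
print(run_extension(sample))
```

Because every exit path is a normal return, the script always finishes normally, whichever branch is taken.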

Inspect impacted item logs

You can review the logs for the items impacted by an extension. On the Extensions (platform-ca | platform-eu | platform-au) page, click the desired extension, and then in the More menu, select Inspect impacted items. You’ll be redirected to the Log Browser (platform-ca | platform-eu | platform-au) page, in which only the items modified by the selected extension are displayed (see Review item logs).

Delete an existing extension

Leading practice

Delete old unused or deprecated IPEs.

On the Extensions (platform-ca | platform-eu | platform-au) page, click the extension that you want to delete.

Before deleting the extension, verify if the extension is used by one or more sources:

In the Action bar, click Usage Statistics.
In the Usage Statistics panel that appears, expand Used by the following sources to see the list of source names that are using the extension.
If the extension is in use, confirm that it's safe to delete it.

Note

You can’t delete an extension that’s used by one or more sources. You get an error message when trying to do so. You must first detach the extension from the associated sources (see Apply an extension to a source).

In the Action bar, click More, and then select Delete.
Click Delete to confirm.

Manage other extension versions

Each time you edit and save an extension, a new version of this extension is created and saved. Each extension version has a versionId GUID (and a last modification date/time) allowing you to uniquely identify the version. You can see the list of existing versions for a given extension and even edit them, allowing you, for example, to create a new latest version from an older one.

Note

The extension versions don’t record changes to the Name and Description parameters.

On the Extensions (platform-ca | platform-eu | platform-au) page, click the extension for which you want to see versions, and then click More > Manage versions in the Action bar.
In the Versions panel that appears, you can:
- Review the list of existing versions for this extension.
- Get the Version Id GUID and see the Last update date.
- Edit a version by clicking the desired version, and then clicking Restore in the Action bar.
  
  When you save the modified extension, a new version is created. This version becomes the current latest version that will automatically be used with sources to which the extension is applied without a specified version (see Apply an extension to a source).

Get extension execution and usage information

The Usage Statistics panel shows details about the executions of a given extension over the last 24 hours and lists the sources that use it.

On the Extensions (platform-ca | platform-eu | platform-au) page, click the extension for which you want to see more information, and then click Usage Statistics in the Action bar.
In the Usage Statistics panel, review the available information.

Review the activity regarding extensions

As part of your duties, you may need to review activities related to extensions for investigation or troubleshooting purposes. To do so, click the activity icon in the upper-right corner of the Extensions (platform-ca | platform-eu | platform-au) page.

See Review resource activity for details on activities and alternative ways to access this information.

Reference

"Extensions" page

The body of the Extensions (platform-ca | platform-eu | platform-au) page is essentially a table listing all the extensions that are defined in your Coveo organization.

Here are the details about each column of the table:

Name

The extension Name as entered when creating the extension.
The automatically generated extension ID that never changes, expressed in the form:

[organizationId]-[extensionGUID]

The extension ID is used to uniquely identify an extension.

Description

The extension Description as entered when creating the extension.
The automatically generated current latest version ID (see Manage Other Extension Versions), expressed in the form:

[extensionVersionGUID]

Execution time status

The indicator displays the worst average extension execution time state, considering all items processed by the extension in the last 24 hours or the last 5 minutes. The state is relative to the allowed maximum execution time (default of 5 seconds).

Use this indicator to identify extensions that tend to take too long to execute and see how you can optimize the code efficiency.

The possible values are:

Good (Green indicator): The execution time is significantly below the maximum limit.
Warning (Yellow indicator): The execution time is getting closer to the maximum limit.
Problematic (Red indicator): The execution time is dangerously close to the maximum limit.

Note

If the column is empty, it means the information is unavailable at the moment.
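
Before adding a script to an extension, it can help to time its logic locally against the default 5-second maximum. The sketch below is only a rough aid under stated assumptions: measure and sample_logic are hypothetical names, and a local timing doesn't reproduce the conditions of the indexing pipeline, so treat the result as an indication only.

```python
import time

MAX_EXECUTION_SECONDS = 5.0  # default maximum stated in this article


def measure(script_logic, *args):
    """Run script_logic once and return (result, elapsed_seconds, share_of_limit)."""
    start = time.perf_counter()
    result = script_logic(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed / MAX_EXECUTION_SECONDS


def sample_logic(text):
    # Hypothetical stand-in for the work an extension script would do.
    return text.upper()


result, elapsed, share = measure(sample_logic, "hello")
print(f"{elapsed:.6f} s, {share:.2%} of the limit")
```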

Time out status

An extension times out when its execution time reaches the maximum value (default of 5 seconds), in which case the extension stage is skipped. There are no extension execution retries.

The indicator presents the worst average time out status, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is based on the ratio of timed-out executions to the total number of extension executions.

Use this indicator to proactively optimize the extension code efficiency and identify extensions that could soon be disabled because they often take too long to execute.

Unknown (Grey indicator): No time out status is available.
Good (Green indicator): Indicates an acceptable percentage of time outs.
Warning (Yellow indicator): Indicates a significant percentage of time outs.
Problematic (Red indicator): Indicates a high percentage of time outs.
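
The ratio behind this indicator can be sketched as follows. This article doesn't state the thresholds that separate the Good, Warning, and Problematic states, so the sketch computes only the ratio. The function name timeout_ratio is hypothetical, and returning None when there are no executions is an assumption chosen to mirror the Unknown state.

```python
def timeout_ratio(timeouts, executions):
    """Return timeouts / executions, or None when there were no executions."""
    if executions == 0:
        return None  # nothing to base a status on
    return timeouts / executions


print(timeout_ratio(3, 120))
```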

Version

The date and time at which the extension version was created.
The version ID of that version.

"Extension name" box

When naming an extension, you should consider the following practices:

You can use any characters in the name. Aim for a short display name that clearly distinguishes the purpose of the extension from that of other extensions.
In the extension script, the extension name may be used to specify the indexing pipeline stage origin of a metadata or data stream, so you should create developer-friendly names.
Consider prefixing your extension names with pre- or post- to easily identify if they contain scripts that must be applied as pre-conversion or post-conversion indexing pipeline stages (see Pre-Conversion Versus Post-Conversion).
You can change the extension Name whenever you want. Each extension is uniquely identified by an extension ID (in the form [organizationId]-[extensionGUID]) that never changes and that appears on the Extensions (platform-ca | platform-eu | platform-au) page, below the extension name in the Name column.
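
If you adopt the pre- and post- prefix convention, a small helper can check names against it, for example when reviewing a list of extensions. This is an illustrative sketch only: stage_from_name and the sample names are hypothetical, and matching the prefix case-insensitively is an assumption.

```python
def stage_from_name(extension_name):
    """Guess the intended stage from a pre-/post- name prefix, if any."""
    lowered = extension_name.lower()
    if lowered.startswith("pre-"):
        return "pre-conversion"
    if lowered.startswith("post-"):
        return "post-conversion"
    return None  # the name doesn't follow the prefix convention


for name in ["pre-ocr-scanned-images", "post-normalize-title", "cleanup"]:
    print(name, "->", stage_from_name(name))
```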

"Select additional item data that the extension needs to access" section

By default, none of the data streams are selected, because:

The item object (allowing your script to add, modify, or delete metadata or permissions) is always available to all extension scripts.
Getting data streams can significantly impact the crawling performance.
You don’t need to get a data stream if you want to create one from scratch.

Body text

The Body text is created by the indexing pipeline Processing stage (so available only to post-conversion scripts) and essentially contains all the item text in an appropriate format for the indexing stage that will make the content searchable.

Select Body text only when you want to get and do something with the extracted item text that will be indexed.

Note

For index size and performance optimization, the Body text is limited in size to 50 MB. This means that for rare items with a larger Body text, the excess text won't be indexed, and therefore won't be searchable.

Body HTML

The Body HTML is also created by the indexing pipeline Processing stage (so available only to post-conversion scripts) and contains an HTML version of the item that’s used by the Quick view.

Select Body HTML only when you want to get and modify the item Quick view content.

Note

When you can define your desired Body HTML content as static HTML markup containing metadata placeholders, it's generally simpler to use a mapping on the body field (see Manage source mappings).
For index size and performance optimization, the Body HTML is limited in size to 10 MB. This means that the Quick view of items with larger Body HTML will be truncated.

Thumbnail

A Thumbnail is a small image file that typically represents the content of the item (such as a reduced capture of the first item page). In the processing stage, the converter of some item types (Microsoft Word, PowerPoint, Excel, and Visio items as well as image file types) may include a thumbnail image.

When available, the Thumbnail image can be included in search results templates to allow search users to more easily identify the item from its graphical look.

Select Thumbnail only when you want to get and modify the thumbnail image generated by the crawler in a post-conversion extension.

Original file

The Original file is the actual binary data, or content of the original extracted item.

Example

When the extracted item is a PDF file, the Original file is the actual PDF file content.

Select Original file only when you want to get and modify or do something with the original item binary content in a pre-conversion extension. There’s generally no point in getting and modifying the Original file in a post-conversion extension because the Indexing stage doesn’t process it.

Example

The items are scanned images. You want to extract the text from each image using an optical character recognition (OCR) service. The extension script needs to get the Original file data stream to feed to the OCR service.

Note

Getting the Original file can significantly degrade indexing performance, as the binary data of each item has to be fetched, decompressed, and decrypted.

"Usage statistics" panel

In this panel, you can review the following information:

Creation date

Date and time at which the extension was created.

Execution time status

Use this indicator to identify extensions that tend to take too long to execute and see how you can optimize the code efficiency.

The possible values are:

Good (Green indicator): The execution time is significantly below the maximum limit.
Warning (Yellow indicator): The execution time is getting closer to the maximum limit.
Problematic (Red indicator): The execution time is dangerously close to the maximum limit.

Note

If the column is empty, it means the information is unavailable at the moment.

Time out status

An extension times out when its execution time reaches the maximum value (default of 5 seconds), in which case the extension stage is skipped. There are no extension execution retries.

Use this indicator to proactively optimize the extension code efficiency and identify extensions that could soon be disabled because they often take too long to execute.

Unknown (Grey indicator): No time out status is available.
Good (Green indicator): Indicates an acceptable percentage of time outs.
Warning (Yellow indicator): Indicates a significant percentage of time outs.
Problematic (Red indicator): Indicates a high percentage of time outs.

Daily statistics

Extension execution statistics (for all items from all sources to which the extension is applied) processed over the last 24 hours.

Average duration in seconds

The average time the script takes to execute.

Number of errors

Number of extension executions for which the script raised an exception.

Number of executions

Total number of executions for all items from all sources to which the extension is applied.

Number of skips

Total number of times the extension was skipped, for one of the following reasons:
- The extension condition was evaluated as false.
- The extension timed out.
- The extension is disabled.

Number of timeouts

Total number of extension executions that reached the maximum execution time (default of 5 seconds).

Used by the following sources

Click the Used by the following sources bar to expand the section and see the names of sources to which this extension is applied.

You can apply an extension to a source or detach it from the source (see Apply an extension to a source).

Required privileges

The following table indicates the privileges required to view or edit elements of the Extensions (platform-ca | platform-eu | platform-au) page and associated panels (see Manage privileges and Privilege reference).

Action	Service - Domain	Required access level
View extensions	Content - Extensions, Content - Sources, Organization - Activities, Organization - Organization	View
Edit extensions	Organization - Activities, Organization - Organization	View
Edit extensions	Content - Extensions, Content - Sources	Edit

A member with the View access level on the Activities domain can access the Activity Browser. This member can therefore see all activities taking place in the organization, including those from Coveo Administration Console pages that they can’t access.