Manage Indexing Pipeline Extensions

An indexing pipeline extension (IPE) is a Python script used to customize the way one or more sources index content (see Indexing Pipeline Extension Overview).

Creating or adapting Python scripts from samples, and testing them before including them in indexing pipeline extensions, requires developer skills. If you have the required privileges but not these skills, ask a developer for help (see Coveo Cloud Indexing Pipeline Extensions).

Add or Edit Indexing Pipeline Extensions

An IPE can be associated with more than one source. Therefore, you should be aware that modifying an existing IPE can impact many sources across your Coveo organization.

Before you can apply an IPE to a source, you must first add it to your organization. After creating the extension, you can always edit its configuration.

  1. Access the Extensions page, and then:

    • To add an extension, click Add Extension.

    • To edit an extension, click the desired extension, and then in the Action bar, click Edit.

  2. In the Add/Edit an Extension panel, in the Extension name box, enter or modify the extension name (see “Extension Name” Box Reference).

  3. (Optional) In the Description box, enter information to help understand the purpose or the context of the extension (e.g., what it does to a specific metadata or data stream and to what type or which specific sources it applies).

  4. Under Select additional item data that the extension needs to access, select only the optional item binary data streams that your extension code needs. Selecting fewer data streams improves performance.

  5. In the Extension script box, write or paste your Python script developed or adapted for your extension (see Item Object Python API Reference and Indexing Pipeline Extension Script Samples).

    Avoid including a sys.exit in your script, as this can cause issues at the processing stage of the Coveo Cloud indexing pipeline.

  6. In the Access tab, determine whether each group and API key can view or edit the extension (see Resource Access):

    1. In the Access Level column, select View or Edit for each available group.

    2. On the left-hand side of the tab, if available, click Groups or API Keys to switch lists.

  7. Click Add Extension or Save.

    Back on the Extensions page, your new extension or new version of a modified extension is available in your Coveo organization (see Manage Other Extension Versions).

  8. Apply the new or modified extension to one or more sources.
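The advice in step 5 about avoiding sys.exit can be followed by placing the script logic in a function and returning early when there is nothing to do. The following is a minimal sketch under stated assumptions: the function name compute_updates and the metadata keys title and normalizedtitle are hypothetical, and a plain dict stands in for the item metadata. The actual item object calls are described in the Item Object Python API Reference.

```python
def compute_updates(metadata):
    """Return the metadata values to add, or an empty dict when there is nothing to do.

    `metadata` is a plain dict standing in for the item metadata (an assumption
    made for illustration; it is not the actual item object).
    """
    title = metadata.get("title")
    if not title:
        # Return early instead of calling sys.exit(), so the script ends normally.
        return {}
    return {"normalizedtitle": title.strip().lower()}


updates = compute_updates({"title": "  Quarterly Report  "})
print(updates)
```

Keeping the logic in a function also makes the script easy to test outside the indexing pipeline before you paste it into the Extension script box.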

Inspect Impacted Item Logs

You can review the logs for the items impacted by an extension. On the Extensions page, click the desired extension, and then in the More menu, select Inspect impacted items. You will be redirected to the Log Browser page, in which only the items modified by the selected extension are displayed (see Review Item Logs).

Delete an Existing Extension

You can delete IPEs that are no longer used or that are deprecated.

  1. On the Extensions page, click the extension that you want to delete.

  2. Before deleting the extension, verify if the extension is used by one or more sources:

    1. In the Action bar, click Usage Statistics.

    2. In the Usage Statistics panel that appears, expand Used by the following sources to see the list of source names that are using the extension.

    3. If the extension is used by at least one source, confirm that it can safely be detached from these sources and deleted.

    You can’t delete an extension that’s used by one or more sources. You get an error message when trying to do so. You must first detach the extension from the associated sources (see Apply an Extension to a Source).

  3. In the Action bar, click More, and then select Delete.

  4. Click Delete to confirm.

Manage Other Extension Versions

Each time you edit and save an extension, a new version of this extension is created and saved. Each extension version has a versionId GUID (and a last modification date/time) allowing you to uniquely identify the version. You can see the list of existing versions for a given extension and even edit them, allowing you, for example, to create a new latest version from an older one.

The extension versions don’t record changes to the Name and Description parameters.

  1. On the Extensions page, click the extension for which you want to see versions.

  2. In the Action bar, click More, and then select Manage versions.

  3. In the Versions panel that appears, you can:

    • Review the list of existing versions for this extension.

    • Get the Version Id GUID and see the Last update date.

    • Edit a version by clicking the desired version, and then in the Action bar, clicking Restore.

      When you save the modified extension, a new version is created. This version becomes the current latest version that will automatically be used with sources to which the extension is applied without a specified version (see Apply an Extension to a Source).
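The versioning behavior described in this section can be pictured with a small model. This is only an illustrative sketch, not the way Coveo stores extensions: the ExtensionModel class, its methods, and the dictionary layout are hypothetical, and uuid4 merely stands in for whatever GUID the platform generates.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ExtensionModel:
    name: str          # not recorded in versions
    description: str   # not recorded in versions
    versions: list = field(default_factory=list)  # oldest first

    def save(self, script):
        """Saving always appends a new version, which becomes the latest."""
        version = {"versionId": str(uuid.uuid4()), "script": script}
        self.versions.append(version)
        return version

    def restore(self, version_id):
        """Create a new latest version from the script of an older one."""
        old = next(v for v in self.versions if v["versionId"] == version_id)
        return self.save(old["script"])

    @property
    def latest(self):
        return self.versions[-1]


ext = ExtensionModel(name="post-normalize-title", description="demo")
v1 = ext.save("script A")
v2 = ext.save("script B")
v3 = ext.restore(v1["versionId"])
print(len(ext.versions), ext.latest["script"])
```

In this model, restoring never rewrites history: the older version stays in the list, and a new version with its own versionId becomes the latest.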

Get Extension Execution and Usage Information

You can get more details on the execution of a given extension over the last 24 hours, as well as on its usage, from the Usage Statistics panel.

  1. On the Extensions page, click the extension for which you want to see more information.

  2. In the Action bar, click Usage Statistics.

  3. In the Usage Statistics panel, review the available information.

Review the Activity Regarding Extensions

On the Extensions page, in the right section of the page header, click Activity (see Review Events Related to Specific Coveo Administration Console Resources).

Reference

“Extensions” Page

The body of the Extensions page is essentially a table listing all the extensions that are defined in your Coveo organization.

Here are the details about each column of the table:

Name

  • The extension Name as entered when creating the extension.

  • The automatically generated extension ID that never changes, expressed in the form:

    [organizationId]-[extensionGUID]

    The extension ID is used to uniquely identify an extension.

Description

Execution Time Status

The indicator presents the worst average extension execution time state, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is relative to the allowed maximum execution time (default of 5 seconds).

Use this indicator to identify extensions that tend to take too long to execute and see how you can optimize the code efficiency.

The possible values are:

  • Good - Green indicator

    Significantly below the maximum execution time.

  • Warning - Yellow indicator

    Getting closer to the maximum execution time.

  • Problematic - Red indicator

    Dangerously close to the maximum execution time.

If the column is empty, it means the information is unavailable at the moment.

Time Out Status

An extension times out when its execution time reaches the maximum value (default of 5 seconds), in which case the extension stage is skipped. There are no extension execution retries.

The indicator presents the worst average time out status, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is based on the ratio of timed-out executions to the total number of extension executions.

Use this indicator to proactively optimize the extension code efficiency and identify extensions that could soon be disabled because they often take too long to execute.

  • Unknown - Grey indicator

    No time out status is available.

  • Good - Green indicator

    Acceptable time out ratio.

  • Warning - Yellow indicator

    Significant time out ratio.

  • Problematic - Red indicator

    Severe time out ratio.
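The ratio behind this indicator, and the idea of reporting the worst state of the two time windows, can be sketched as follows. The threshold values are not documented here, so the function takes them as explicit parameters, and the values in the example are purely hypothetical. The function names and the way the two windows are combined are also assumptions made for illustration.

```python
SEVERITY = ["Good", "Warning", "Problematic"]  # least to most severe


def timeout_status(timeouts, executions, warning_ratio, problematic_ratio):
    """Classify the ratio of timed-out executions to total executions."""
    if executions == 0:
        return "Unknown"  # no executions, so no status is available
    ratio = timeouts / executions
    if ratio >= problematic_ratio:
        return "Problematic"
    if ratio >= warning_ratio:
        return "Warning"
    return "Good"


def worst_status(*statuses):
    """Return the most severe known status, or "Unknown" if none is known."""
    known = [s for s in statuses if s in SEVERITY]
    if not known:
        return "Unknown"
    return max(known, key=SEVERITY.index)


# Hypothetical thresholds: 1% for Warning, 10% for Problematic.
last_24_hours = timeout_status(2, 1000, 0.01, 0.10)
last_5_minutes = timeout_status(3, 20, 0.01, 0.10)
print(worst_status(last_24_hours, last_5_minutes))
```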

Version

  • Date and time at which the extension version was created.

  • Version ID

“Extension Name” Box

When naming an extension, you should consider the following practices:

  • You can use any characters in the name to create a short display name for your extension, ideally well describing its purpose compared to other extensions.

  • In the extension script, the extension name may be used to specify a metadata or data stream indexing pipeline stage origin, so you should create developer-friendly names.

  • Consider prefixing your extension names with pre- or post- to easily identify if they contain scripts that must be applied as pre-conversion or post-conversion indexing pipeline stages (see Pre-Conversion Versus Post-Conversion).

  • You can change the extension Name whenever you want. Each extension is uniquely identified by an extension ID (in the form [organizationId]-[extensionGUID]) that never changes and that appears on the Extensions page, below the extension name in the Name column.
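If you adopt the pre- and post- prefix convention suggested above, a small helper can check extension names against it, for example when reviewing a list of names. This is a sketch only: the helper name stage_from_name is hypothetical, and nothing in the platform enforces or reads this prefix.

```python
def stage_from_name(extension_name):
    """Infer the intended stage from a naming-convention prefix.

    Returns "pre-conversion", "post-conversion", or None when the name
    does not follow the convention.
    """
    lowered = extension_name.strip().lower()
    if lowered.startswith("pre-"):
        return "pre-conversion"
    if lowered.startswith("post-"):
        return "post-conversion"
    return None


for name in ["pre-ocr-scanned-images", "post-normalize-title", "cleanup"]:
    print(name, "->", stage_from_name(name))
```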

“Select Additional Item Data That the Extension Needs to Access” Section

By default, none of the data streams are selected, because:

  • The item object (allowing your script to add, modify, or delete metadata or permissions) is always available to all extensions scripts.

  • Getting data streams can significantly impact the crawling performance.

  • You don’t need to get a data stream if you want to create one from scratch.

Body Text

The Body text is created by the indexing pipeline Processing stage (so available only to post-conversion scripts) and essentially contains all the item text in an appropriate format for the indexing stage that will make the content searchable.

Select Body text only when you want to get and do something with the extracted item text that will be indexed.

For index size and performance optimization, the Body text is limited in size to 50 MB. This means that for the rare items with a larger Body text, the excess text isn’t indexed and therefore isn’t searchable.

Body HTML

The Body HTML is also created by the indexing pipeline Processing stage (so available only to post-conversion scripts) and contains an HTML version of the item that’s used by the Quick View.

Select Body HTML only when you want to get and modify the item Quick View content.

  • When you can define your desired Body HTML content as static HTML markup containing metadata placeholders, it’s generally simpler to use a mapping on the body field (see Manage Source Mappings).

  • For index size and performance optimization, the Body HTML is limited in size to 10 MB. This means that the Quick View of items with larger Body HTML will be truncated.

Thumbnail

A Thumbnail is a small image file that typically represents the content of the item (such as a reduced capture of the first item page). In the processing stage, the converter of some item types (Microsoft Word, PowerPoint, Excel, and Visio items as well as image file types) may include a thumbnail image.

When available, the Thumbnail image can be included in search results templates to allow search users to more easily identify the item from its graphical look.

Select Thumbnail only when you want to get and modify the thumbnail image generated by the crawler in a post-conversion extension.

Original File

The Original file is the actual binary data, or content of the original extracted item.

For example, when the extracted item is a PDF file, the Original file is the actual PDF file content.

Select Original file only when you want to get and modify or do something with the original item binary content in a pre-conversion extension. There’s generally no point in getting and modifying the Original file in a post-conversion extension because the Indexing stage doesn’t process it.

For example, suppose the items are scanned images and you want to extract the text from each image using an optical character recognition (OCR) service. The extension script then needs to get the Original file data stream to feed it to the OCR service.

Getting the Original file can significantly degrade indexing performance, as the binary data of each item has to be fetched, decompressed, and decrypted.
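The guidance in this section can be summarized as a small lookup table. The sketch below only restates what is described above: the stage in which each data stream is useful and the size limits mentioned for Body text and Body HTML. The names STREAMS and streams_for are hypothetical, and the table isn't part of any Coveo API.

```python
# Stage in which each optional data stream is useful, and documented size limits.
# limit_mb is None where no limit is stated in this section.
STREAMS = {
    "Body text": {"recommended_stage": "post-conversion", "limit_mb": 50},
    "Body HTML": {"recommended_stage": "post-conversion", "limit_mb": 10},
    "Thumbnail": {"recommended_stage": "post-conversion", "limit_mb": None},
    "Original file": {"recommended_stage": "pre-conversion", "limit_mb": None},
}


def streams_for(stage):
    """List the data streams worth selecting for a script of the given stage."""
    return [
        name
        for name, info in STREAMS.items()
        if info["recommended_stage"] == stage
    ]


print(streams_for("pre-conversion"))
print(streams_for("post-conversion"))
```

Even when a data stream is listed for a stage, select it only if the script actually reads or modifies it, since getting data streams affects performance.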

“Usage Statistics” Panel

In this panel, you can review the following information:

Creation Date

Date and time at which the extension was created.

Execution Time Status

The indicator presents the worst average extension execution time state, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is relative to the allowed maximum execution time (default of 5 seconds).

Use this indicator to identify extensions that tend to take too long to execute and see how you can optimize the code efficiency.

The possible values are:

  • Good - Green indicator

    Significantly below the maximum execution time.

  • Warning - Yellow indicator

    Getting closer to the maximum execution time.

  • Problematic - Red indicator

    Dangerously close to the maximum execution time.

If the indicator is empty, it means the information is unavailable at the moment.

Time Out Status

An extension times out when its execution time reaches the maximum value (default of 5 seconds), in which case the extension stage is skipped. There are no extension execution retries.

The indicator presents the worst average time out status, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is based on the ratio of timed-out executions to the total number of extension executions.

Use this indicator to proactively optimize the extension code efficiency and identify extensions that could soon be disabled because they often take too long to execute.

  • Unknown - Grey indicator

    No time out status is available.

  • Good - Green indicator

    Acceptable time out ratio.

  • Warning - Yellow indicator

    Significant time out ratio.

  • Problematic - Red indicator

    Severe time out ratio.

Daily Statistics

Extension execution statistics (for all items from all sources to which the extension is applied) processed over the last 24 hours.

  • Average duration in seconds

    The average time the script takes to execute.

  • Number of errors

    Number of extension executions for which the script raised an exception.

  • Number of executions

    Total number of executions for all items from all sources to which the extension is applied.

  • Number of skips

    Total number of times the extension was skipped, for one of the following reasons:

    • The extension condition was evaluated as false.

    • The extension timed out.

    • The extension is disabled.

  • Number of timeouts

    Total number of extension executions that reached the maximum execution time (default of 5 seconds).

  • Used by the following sources

    Click the Used by the following sources bar to expand the section and see the names of sources to which this extension is applied.

    You can apply an extension to a source or detach it from the source (see Apply an Extension to a Source).
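When reading the daily statistics, it can help to turn the raw counts into ratios. The following sketch assumes you copied the values from the panel by hand: the DailyStats class and its property names are hypothetical, and it assumes that the number of executions is the right denominator for the error and timeout counts. The 5-second value is the default maximum execution time mentioned above.

```python
from dataclasses import dataclass

MAX_EXECUTION_SECONDS = 5.0  # default maximum execution time


@dataclass
class DailyStats:
    average_duration_seconds: float
    errors: int
    executions: int
    skips: int
    timeouts: int

    def _ratio(self, count):
        # None when there were no executions, so no ratio can be computed.
        return count / self.executions if self.executions else None

    @property
    def error_ratio(self):
        return self._ratio(self.errors)

    @property
    def timeout_ratio(self):
        return self._ratio(self.timeouts)

    @property
    def time_budget_used(self):
        """Fraction of the maximum execution time used on average."""
        return self.average_duration_seconds / MAX_EXECUTION_SECONDS


stats = DailyStats(
    average_duration_seconds=1.25, errors=4, executions=200, skips=10, timeouts=2
)
print(stats.error_ratio, stats.timeout_ratio, stats.time_budget_used)
```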

Required Privileges

The following table indicates the privileges required to view or edit elements of the Extensions page and associated panels (see Manage Privileges and Privilege Reference).

Action   Service - Domain             Required access level
View     Content - Extensions         View
         Content - Sources            View
         Organization - Activities    View
Edit     Organization - Activities    View
         Content - Extensions         Edit
         Content - Sources            Edit