Manage Indexing Pipeline Extensions

An indexing pipeline extension (IPE) is a Python script used to customize the way one or more sources index content (see Indexing Pipeline Extension Overview).

In addition to the required privileges, you need developer skills, or the help of a developer, to create or adapt Python scripts from samples and to test them before you include them in indexing pipeline extensions (see Coveo Indexing Pipeline Extensions).

Add or Edit Indexing Pipeline Extensions

Important

An IPE can be associated with more than one source. Therefore, you should be aware that modifying an existing IPE can impact many sources across your Coveo organization.

Before you can apply an IPE to a source, you must first add it to your organization. After creating the extension, you can always edit its configuration.

  1. Access the Extensions (platform-ca | platform-eu | platform-au) page, and then:

    • To add an extension, click Add Extension.

    • To edit an extension, click the desired extension, and then click Edit in the Action bar.

  2. In the Add/Edit an Extension panel, in the Extension name box, enter or modify the extension name (see "Extension Name" Box Reference).

  3. (Optional) In the Description box, enter information to help understand the purpose or the context of the extension (for example, what it does to specific metadata or a data stream and to what type or which specific sources it applies).

  4. Under Select additional item data that the extension needs to access, to optimize performance, select only the optional item binary data streams needed by your extension code.

  5. In the Extension script box, write or paste your Python script developed or adapted for your extension (see Item Object Python API Reference and Indexing Pipeline Extension Script Samples).

    Note

    Avoid including a sys.exit in your script, as this can cause issues at the processing stage of the Coveo indexing pipeline.

  6. In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current extension.

    For example, when creating a new extension, you could decide that members of Group A can edit its configuration while Group B can only view it.

    See Custom access level for more information.

  7. Click Add Extension or Save.

    Back on the Extensions (platform-ca | platform-eu | platform-au) page, your new extension or new version of a modified extension is available in your Coveo organization (see Manage Other Extension Versions).

  8. Apply the new or modified extension to one or more sources.
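
The note in the Extension script step advises against sys.exit. One way to stop a script early without it is to wrap the logic in a function and return from that function. The following is a minimal sketch, not a reference implementation. It assumes the item object is exposed to the script as a global named document with get_meta_data_value and add_meta_data methods, and that a log helper is available (see Item Object Python API Reference to confirm the exact names and signatures). FakeDocument, the log stub, the run function, and the ispdf metadata key are hypothetical and exist only so the sketch can run locally.

```python
class FakeDocument:
    """Hypothetical local stand-in for the item object (testing only)."""

    def __init__(self, metadata):
        self.metadata = dict(metadata)

    def get_meta_data_value(self, name):
        # Assumed to return a list of values, empty when the key is absent.
        value = self.metadata.get(name)
        return [] if value is None else [value]

    def add_meta_data(self, values):
        self.metadata.update(values)


def log(message, level="Normal"):
    # Hypothetical stand-in for the logging helper.
    print("[%s] %s" % (level, message))


def run(document):
    """Tag PDF items and leave every other item untouched."""
    file_type = document.get_meta_data_value("filetype")
    if not file_type or file_type[0].lower() != "pdf":
        log("Not a PDF, nothing to do")
        return  # Early exit: return from the function instead of calling sys.exit
    document.add_meta_data({"ispdf": "true"})


# In an actual extension, the script would end with: run(document)
pdf_item = FakeDocument({"filetype": "PDF"})
run(pdf_item)
other_item = FakeDocument({"filetype": "html"})
run(other_item)
```

Because run simply returns, the script ends normally for items it doesn't need to process.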

Tip

If you have the Enterprise edition, group this extension and your other implementation resources together in a project. See Manage projects.

Inspect Impacted Item Logs

You can review the logs for the items impacted by an extension. On the Extensions (platform-ca | platform-eu | platform-au) page, click the desired extension, and then in the More menu, select Inspect impacted items. You will be redirected to the Log Browser (platform-ca | platform-eu | platform-au) page, in which only the items modified by the selected extension are displayed (see Review item logs).

Delete an Existing Extension

Tip
Leading practice

Delete old unused or deprecated IPEs.

  1. On the Extensions (platform-ca | platform-eu | platform-au) page, click the extension that you want to delete.

  2. Before deleting the extension, check whether it's used by one or more sources:

    1. In the Action bar, click Usage Statistics.

    2. In the Usage Statistics panel that appears, expand Used by the following sources to see the list of source names that are using the extension.

    3. If the extension is in use, confirm that it's safe to delete it.

    Note

    You can’t delete an extension that’s used by one or more sources. You get an error message when trying to do so. You must first detach the extension from the associated sources (see Apply an Extension to a Source).

  3. In the Action bar, click More, and then select Delete.

  4. Click Delete to confirm.

Manage Other Extension Versions

Each time you edit and save an extension, a new version of this extension is created and saved. Each extension version has a versionId GUID (and a last modification date/time) allowing you to uniquely identify the version. You can see the list of existing versions for a given extension and even edit them, allowing you, for example, to create a new latest version from an older one.

Note

The extension versions don’t record changes to the Name and Description parameters.

  1. On the Extensions (platform-ca | platform-eu | platform-au) page, click the extension for which you want to see versions, and then click More > Manage versions in the Action bar.

  2. In the Versions panel that appears, you can:

    • Review the list of existing versions for this extension.

    • Get the Version Id GUID and see the Last update date.

    • Edit a version by clicking the desired version, and then clicking Restore in the Action bar.

      When you save the modified extension, a new version is created. This version becomes the current latest version that will automatically be used with sources to which the extension is applied without a specified version (see Apply an Extension to a Source).

Get Extension Execution and Usage Information

The Usage Statistics panel shows execution details for a given extension over the last 24 hours, as well as the sources that use it.

  1. On the Extensions (platform-ca | platform-eu | platform-au) page, click the extension for which you want to see more information, and then click Usage Statistics in the Action bar.

  2. In the Usage Statistics panel, review the available information.

Review the Activity Regarding Extensions

As part of your duties, you may need to review activities related to extensions for investigation or troubleshooting purposes. To do so, in the upper-right corner of the Extensions (platform-ca | platform-eu | platform-au) page, click the clock icon.

See Review resource activity for details on activities and alternative ways to access this information.

Reference

"Extensions" Page

The body of the Extensions (platform-ca | platform-eu | platform-au) page is essentially a table listing all the extensions that are defined in your Coveo organization.

Here are the details about each column of the table:

Name

  • The extension Name as entered when creating the extension.

  • The automatically generated extension ID that never changes, expressed in the form:

    [organizationId]-[extensionGUID]

    The extension ID is used to uniquely identify an extension.

Description

Execution Time Status

The indicator presents the worst average extension execution time state, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is relative to the allowed maximum execution time (default of 5 seconds).

Use this indicator to identify extensions that tend to take too long to execute and see how you can optimize the code efficiency.

The possible values are:

  • Good - Green indicator

    Significantly below the maximum execution time.

  • Warning - Yellow indicator

    Getting closer to the maximum execution time.

  • Problematic - Red indicator

    Dangerously close to the maximum execution time.

Note

If the column is empty, it means the information is unavailable at the moment.
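
Before adding a script to an extension, it can help to measure locally how long its core logic takes compared with the maximum execution time. The following is a rough sketch under stated assumptions: time_call and normalize_title are hypothetical helpers, and a local measurement only approximates what happens in the indexing pipeline.

```python
import time

MAX_EXECUTION_SECONDS = 5.0  # Default maximum execution time stated in this article


def time_call(func, *args):
    """Return (result, elapsed_seconds) for a single call to func."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def normalize_title(title):
    # Hypothetical stand-in for the work an extension script would do.
    return " ".join(title.split()).title()


result, elapsed = time_call(normalize_title, "  quarterly   sales report ")
budget_used = elapsed / MAX_EXECUTION_SECONDS
print("elapsed: %.6f s (%.4f%% of the maximum)" % (elapsed, budget_used * 100))
```

Running the same measurement over a sample of representative items gives a better picture than a single call.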

Time Out Status

An extension times out when its execution time reaches the maximum value (default of 5 seconds), in which case the extension stage is skipped. There are no extension execution retries.

The indicator presents the worst average time out status, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is based on the ratio of timed-out executions to the total number of extension executions.

Use this indicator to proactively optimize the extension code efficiency and identify extensions that could soon be disabled because they often take too long to execute.

  • Unknown - Grey indicator

    No time out status is available.

  • Good - Green indicator

    Acceptable time out ratio.

  • Warning - Yellow indicator

    Significant time out ratio.

  • Problematic - Red indicator

    Severe time out ratio.
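
As an illustration of the ratio behind this indicator, the following sketch divides the number of timed-out executions by the total number of executions. The timeout_ratio function and the sample counts are hypothetical. This article doesn't state the thresholds that separate the Good, Warning, and Problematic states, so the sketch doesn't try to map the ratio to a state.

```python
def timeout_ratio(timed_out, total_executions):
    """Share of executions that timed out, from 0.0 to 1.0."""
    if total_executions == 0:
        return None  # No executions, so there is no ratio to report
    return timed_out / total_executions


# Hypothetical counts: 12 timeouts out of 480 executions
ratio = timeout_ratio(12, 480)
print("time out ratio: %.1f%%" % (ratio * 100))
```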

Version

  • Date and time at which the extension version was created.

  • Version ID

"Extension Name" Box

When naming an extension, you should consider the following practices:

  • You can use any characters in the name. Aim for a short display name that clearly describes the purpose of the extension and distinguishes it from your other extensions.

  • In the extension script, the extension name may be used to specify the indexing pipeline stage origin of a metadata or data stream, so you should create developer-friendly names.

  • Consider prefixing your extension names with pre- or post- to easily identify if they contain scripts that must be applied as pre-conversion or post-conversion indexing pipeline stages (see Pre-Conversion Versus Post-Conversion).

  • You can change the extension Name whenever you want. Each extension is uniquely identified by an extension ID (in the form [organizationID]-[guid]) that never changes and that appears on the Extensions (platform-ca | platform-eu | platform-au) page, below the extension name in the Extension name column.

"Select Additional Item Data That the Extension Needs to Access" Section

Important

By default, none of the data streams are selected, because:

  • The item object (allowing your script to add, modify, or delete metadata or permissions) is always available to all extensions scripts.

  • Getting data streams can significantly impact the crawling performance.

  • You don’t need to get a data stream if you want to create one from scratch.

Body Text

The Body text is created by the indexing pipeline Processing stage (so available only to post-conversion scripts) and essentially contains all the item text in an appropriate format for the indexing stage that will make the content searchable.

Select Body text only when you want to get and do something with the extracted item text that will be indexed.

Note

For index size and performance optimization, the Body text is limited in size to 50 MB. This means that for rare items with larger body_text, the exceeding text won’t be indexed, and therefore not searchable.
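
As a minimal sketch of doing something with the extracted text in a post-conversion script, the following reads the Body text and stores a word count as metadata. It assumes the item object offers get_data_stream("body_text") returning a readable stream, and add_meta_data (check Item Object Python API Reference for the exact names, return types, and encoding). FakeDocument, add_word_count, and the wordcount metadata key are hypothetical, and the UTF-8 encoding used by the stub is an assumption.

```python
import io


class FakeDocument:
    """Hypothetical local stand-in for the item object (testing only)."""

    def __init__(self, body_text):
        self._body_text = body_text
        self.metadata = {}

    def get_data_stream(self, name):
        # Only the stream selected for the extension is assumed to be available.
        if name != "body_text":
            raise KeyError(name)
        return io.BytesIO(self._body_text.encode("utf-8"))

    def add_meta_data(self, values):
        self.metadata.update(values)


def add_word_count(document):
    """Read the extracted body text and store its word count as metadata."""
    raw = document.get_data_stream("body_text").read()
    # The stream's type and encoding are assumptions, so accept bytes or str.
    text = raw.decode("utf-8", "replace") if isinstance(raw, bytes) else raw
    document.add_meta_data({"wordcount": len(text.split())})


item = FakeDocument("Indexing pipeline extensions customize indexing.")
add_word_count(item)
print(item.metadata)
```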

Body HTML

The Body HTML is also created by the indexing pipeline Processing stage (so available only to post-conversion scripts) and contains an HTML version of the item that’s used by the Quick View.

Select Body HTML only when you want to get and modify the item Quick View content.

Note
  • When you can define your desired Body HTML content as static HTML markup containing metadata placeholders, it’s generally simpler to use a mapping on the body field (see Manage source mappings).

  • For index size and performance optimization, the Body HTML is limited in size to 10 MB. This means that the Quick View of items with larger Body HTML will be truncated.

Thumbnail

A Thumbnail is a small image file that typically represents the content of the item (such as a reduced capture of the first item page). In the processing stage, the converter of some item types (Microsoft Word, PowerPoint, Excel, and Visio items as well as image file types) may include a thumbnail image.

When available, the Thumbnail image can be included in search results templates to allow search users to more easily identify the item from its graphical look.

Select Thumbnail only when you want to get and modify the thumbnail image generated by the crawler in a post-conversion extension.

Original File

The Original file is the actual binary data, or content of the original extracted item.

Example

When the extracted item is a PDF file, the Original file is the actual PDF file content.

Select Original file only when you want to get and modify or do something with the original item binary content in a pre-conversion extension. There’s generally no point in getting and modifying the Original file in a post-conversion extension because the Indexing stage doesn’t process it.

Example

The items are scanned item images. You want to extract the text from each item image using an optical character recognition (OCR) service. The extension script needs to get the Original file data stream to feed to the OCR service.

Note

Getting the Original file can significantly degrade indexing performance, as each item binary data has to be fetched, decompressed, and decrypted.
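
The OCR example above could be sketched as follows for a pre-conversion extension. This is an illustration under several assumptions, not a definitive implementation: it assumes the original file is exposed as a data stream named documentdata, and that the item object offers get_data_stream, DataStream, and add_data_stream (check Item Object Python API Reference for the exact names). FakeStream, FakeDocument, fake_ocr, and add_ocr_body are hypothetical, and fake_ocr stands in for a call to an actual OCR service.

```python
class FakeStream:
    """Hypothetical named data stream (testing only)."""

    def __init__(self, name, data=b""):
        self.name = name
        self.data = data

    def read(self):
        return self.data

    def write(self, data):
        self.data += data


class FakeDocument:
    """Hypothetical local stand-in for the item object (testing only)."""

    def __init__(self, original_file):
        # "documentdata" as the original file stream name is an assumption.
        self.streams = {"documentdata": FakeStream("documentdata", original_file)}

    def get_data_stream(self, name):
        return self.streams[name]

    def DataStream(self, name):
        return FakeStream(name)

    def add_data_stream(self, stream):
        self.streams[stream.name] = stream


def fake_ocr(image_bytes):
    # Hypothetical stand-in for a call to an external OCR service.
    return "text recognized in %d bytes" % len(image_bytes)


def add_ocr_body(document, ocr=fake_ocr):
    """Feed the original file to the OCR function and store the text as body text."""
    image = document.get_data_stream("documentdata").read()
    text = ocr(image)
    body = document.DataStream("body_text")  # Created from scratch, not fetched
    body.write(text.encode("utf-8"))
    document.add_data_stream(body)


scan = FakeDocument(b"\x89PNG fake image bytes")
add_ocr_body(scan)
print(scan.streams["body_text"].read())
```

Because an extension is skipped when it reaches the maximum execution time (default of 5 seconds), a call to an external service would need to complete well within that limit.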

"Usage Statistics" Panel

In this panel, you can review the following information:

Creation Date

Date and time at which the extension was created.

Execution Time Status

The indicator presents the worst average extension execution time state, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is relative to the allowed maximum execution time (default of 5 seconds).

Use this indicator to identify extensions that tend to take too long to execute and see how you can optimize the code efficiency.

The possible values are:

  • Good - Green indicator

    Significantly below the maximum execution time.

  • Warning - Yellow indicator

    Getting closer to the maximum execution time.

  • Problematic - Red indicator

    Dangerously close to the maximum execution time.

Note

If the column is empty, it means the information is unavailable at the moment.

Time Out Status

An extension times out when its execution time reaches the maximum value (default of 5 seconds), in which case the extension stage is skipped. There are no extension execution retries.

The indicator presents the worst average time out status, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is based on the ratio of timed-out executions to the total number of extension executions.

Use this indicator to proactively optimize the extension code efficiency and identify extensions that could soon be disabled because they often take too long to execute.

  • Unknown - Grey indicator

    No time out status is available.

  • Good - Green indicator

    Acceptable time out ratio.

  • Warning - Yellow indicator

    Significant time out ratio.

  • Problematic - Red indicator

    Severe time out ratio.

Daily Statistics

Extension execution statistics (for all items from all sources to which the extension is applied) processed over the last 24 hours.

  • Average duration in seconds

    The average time the script takes to execute.

  • Number of errors

    Number of extension executions for which the script raised an exception.

  • Number of executions

    Total number of executions for all items from all sources to which the extension is applied.

  • Number of skips

    Total number of times the extension was not executed for one of the following reasons:

    • The extension condition was evaluated as false.

    • The extension timed out.

    • The extension is disabled.

  • Number of timeouts

    Total number of extension executions that reached the maximum execution time (default of 5 seconds).

  • Used by the following sources

    Click the Used by the following sources bar to expand the section and see the names of sources to which this extension is applied.

    You can apply an extension to a source or detach it from the source (see Apply an Extension to a Source).

Required privileges

The following table indicates the privileges required to view or edit elements of the Extensions (platform-ca | platform-eu | platform-au) page and associated panels (see Manage privileges and Privilege reference).

Action             Service - Domain               Required access level

View extensions    Content - Extensions           View
                   Content - Sources
                   Organization - Activities
                   Organization - Organization

Edit extensions    Organization - Activities      View
                   Organization - Organization

                   Content - Extensions           Edit
                   Content - Sources

Important

A member with the View access level on the Activities domain can access the Activity Browser. This member can therefore see all activities taking place in the organization, including those from Coveo Administration Console pages that they can’t access.