Manage indexing pipeline extensions
An indexing pipeline extension (IPE) is a Python script used to customize the way one or more sources index content. For details on how IPEs work, see Indexing pipeline extension overview.
This article explains how to use the Extensions (platform-ca | platform-eu | platform-au) page to manage your IPEs.
Add an indexing pipeline extension
Coveo provides script samples to help you get started with extensions. You’ll most likely need developer skills to adapt a sample script to your needs, and then test it.
Note
Avoid including a |
Once you’re satisfied with your script, follow these steps to add it to your organization, and then apply it to a source.
-
On the Extensions (platform-ca | platform-eu | platform-au) page, click Add extension.
-
In the Add an extension panel, enter a name for your extension.
Leading practice
-
Use a short, descriptive name.
-
Use a developer-friendly name, as it may be used in the extension script to specify a metadata or data stream indexing pipeline stage origin.
-
Prefix your extension name with pre- or post- to indicate at which stage of the pipeline the script will apply.
-
Optionally, enter a description of the extension’s purpose. For example, explain what it does to metadata or a data stream, and to which sources it applies.
-
If applicable, select the types of item data that your extension script needs to access:
Body text
The body text contains all the text found in an item. It is formatted to be used at the pipeline’s indexing stage, which makes the content searchable.
Since the body text is created at the processing stage, it is available to post-conversion extension scripts only.
Select Body text when your extension script needs to process the text extracted from your content items. Selecting this option when it’s not necessary may degrade indexing performance.
Note
For index size and performance optimization, the body text of an item is limited to 50 MB. For rare items with extensive body text, any text that exceeds the limit won’t be indexed and, consequently, won’t be searchable in the Coveo search interface.
Body HTML
The body HTML of an item is an HTML version of the item. It is used by the quickview component of a search interface.
Since the body HTML is created at the processing stage, it is available to post-conversion extension scripts only.
Select Body HTML only if your script needs the item’s quickview content. Selecting this option when it’s not necessary may degrade indexing performance.
Notes
-
If you can define your desired body HTML content as a static HTML markup with metadata placeholders, consider creating a mapping rule for the body field instead of an extension. It’s typically simpler and more efficient.
-
For index size and performance optimization, the body HTML of an item is limited to 10 MB. This means that the quickview of items with a larger body HTML will be truncated.
Thumbnail
A thumbnail is a small image that typically represents the content of the item, such as a miniature image of the first page of a document.
When available, the thumbnail can be included in search results templates to help search users identify the item.
Select Thumbnail only if your post-conversion extension script needs the thumbnail generated by Coveo. Selecting this option when it’s not necessary may degrade indexing performance.
Original file
The original file is the actual binary data, or content, of the item. For example, if the item is a PDF file, then the item data is the actual content of this file.
Select Original file only if your pre-conversion extension script needs the binary data of the item. There’s generally no point in feeding the original file to a post-conversion extension because the indexing stage doesn’t process it.
Note
Getting the original file can significantly degrade indexing performance, as the binary data of each item has to be fetched, decompressed, and decrypted.
-
Under Extension script, enter your Python script.
-
In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current extension.
For example, when creating a new extension, you could decide that members of Group A can edit its configuration while Group B can only view it.
See Custom access level for more information.
-
Click Add extension.
-
Apply your extension to at least one source.
If you edit your IPE in the future, keep in mind that your changes will apply to all sources associated with this IPE.
We recommend testing your changes in a sandbox organization before applying them to your production sources, to avoid any unexpected behavior.
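To make the Extension script step more concrete, here is a minimal sketch of what a small metadata extension could look like. It assumes that the script has access to a `document` object exposing `get_meta_data_value` and `add_meta_data`, as described in the Indexing pipeline extension overview. The `FakeDocument` class, the `run_extension` wrapper, and the `cleantitle` metadata name are hypothetical and exist only so that the sketch can run outside the platform.

```python
class FakeDocument:
    """Hypothetical stand-in for the `document` object available to an IPE."""

    def __init__(self, meta_data):
        # Store every metadata value as a list of values.
        self._meta_data = {key: list(values) for key, values in meta_data.items()}

    def get_meta_data_value(self, name):
        # Assumption: returns a list of values, empty when the metadata is absent.
        return self._meta_data.get(name, [])

    def add_meta_data(self, meta_data):
        # Assumption: accepts a dict mapping metadata names to values.
        for key, value in meta_data.items():
            self._meta_data[key] = [value]


def run_extension(document):
    """Hypothetical extension body: store a trimmed copy of the first title value."""
    titles = document.get_meta_data_value("title")
    if titles:
        document.add_meta_data({"cleantitle": titles[0].strip()})


# Local dry run against the stand-in object.
document = FakeDocument({"title": ["  Quarterly report  "]})
run_extension(document)
print(document.get_meta_data_value("cleantitle"))
```

In an actual extension, only the logic inside `run_extension` would be relevant, running against the object provided by the indexing pipeline. A local dry run like this one doesn't replace testing in a sandbox organization.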
Inspect impacted item logs
You can review the logs for the items impacted by an extension. On the Extensions (platform-ca | platform-eu | platform-au) page, click the desired extension, and then select Inspect impacted items in the More menu.
You’ll be redirected to the Log Browser (platform-ca | platform-eu | platform-au), where only the items modified by the selected extension will be displayed. For more information on the Log Browser, see Use the Log Browser to review indexing logs.
Delete an extension
It’s a good practice to delete unused extensions to keep your Coveo organization clean and optimized.
Ensure that the extension isn’t used by any source
-
On the Extensions (platform-ca | platform-eu | platform-au) page, click the extension that you want to delete, and then click Usage statistics in the Action bar.
-
In the Usage statistics panel that appears, click Used by the following sources to confirm that the extension isn’t used by any source.
-
Close the Usage statistics panel.
Detach the extension from a source
If the extension is used by a source, you must first detach it from the source before you can delete it:
-
On the Sources (platform-ca | platform-eu | platform-au) page, click the source that uses the extension you want to delete, and then click Edit extensions in the More menu.
-
In the Edit extensions panel that opens, click the extension you want to delete, and then click Delete in the Action bar.
-
Click Delete to confirm.
-
Click Save.
-
Repeat the process for all sources that use the extension.
Delete the extension
-
Once your extension is no longer attached to any source, on the Extensions (platform-ca | platform-eu | platform-au) page, select the extension, and then click Delete in the Action bar.
-
Click Delete to confirm.
Edit or restore an old version of an extension
To edit an extension, select it on the Extensions (platform-ca | platform-eu | platform-au) page, and then click Edit in the Action bar. When you save your changes, Coveo creates a new version of this extension with a unique ID and a timestamp.
To view the older versions of an extension, select the extension, and then click More > Manage versions in the Action bar. Then, you can either restore an old version or create a new version based on an existing one:
-
In the Version panel that opens, select a version, and then click Restore in the Action bar to open it.
-
Optionally, edit the extension, and then click Save.
The restored or modified version will be saved as a new version, and will become the latest version, i.e., the version Coveo uses when indexing your content.
Note
The versioning feature doesn’t record changes to the Name and Description parameters. If you edit the extension name, for instance, all extension versions will have the new name.
Copy an extension ID
You can copy the ID of an extension to use it in other parts of the Coveo Platform, including the Coveo Platform API.
To do so, on the Extensions (platform-ca | platform-eu | platform-au) page, click the desired extension, and then click Copy extension ID to clipboard in the More menu.
An extension ID consists of the organization ID and the extension’s unique identifier, separated by a dash. It identifies the extension in the Coveo Platform, whereas the version ID identifies a specific version of an extension.
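If you handle extension IDs in your own tooling, the structure described above can be sketched as follows. This is an illustration only: it assumes that the extension’s unique identifier contains no dash, so the ID is split on its last dash, and the sample ID is made up.

```python
def split_extension_id(extension_id):
    """Split an extension ID into (organization ID, unique identifier).

    Assumption: the unique identifier contains no dash, so the last dash
    in the string is the separator.
    """
    organization_id, separator, unique_id = extension_id.rpartition("-")
    if not separator or not organization_id or not unique_id:
        raise ValueError(f"Unexpected extension ID format: {extension_id!r}")
    return organization_id, unique_id


# Hypothetical ID, for illustration only.
print(split_extension_id("myorg-abc123"))
```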
Review the activity regarding extensions
As part of your duties, you may need to review activities related to extensions for investigation or troubleshooting purposes. To do so, in the upper-right corner of the Extensions (platform-ca | platform-eu | platform-au) page, click .
See Review resource activity for details on activities and alternative ways to access this information.
About extension usage statistics
The Extensions (platform-ca | platform-eu | platform-au) page and the Usage statistics panel provide information about the extension’s usage, including the number of sources to which it’s applied, the number of items processed, and the number of errors encountered. This will help you identify extensions that aren’t used or that are causing issues.
Execution time status
The indicator displays the worst average extension execution time state, considering all items processed by the extension in the last 24 hours or the last 5 minutes. The state is relative to the allowed maximum execution time (default of 5 seconds).
This data helps you identify extensions that tend to take too long to execute. You can then optimize the code efficiency.
The possible values are:
Status | Description |
---|---|
Good | The execution time is significantly below the maximum limit. |
Warning | The execution time is getting closer to the maximum limit. |
Problematic | The execution time is dangerously close to the maximum limit. |
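As a rough illustration of how an average execution time relates to the maximum, the following sketch expresses the average as a fraction of the 5-second default and keeps the worse of the two observation windows. The function names are hypothetical, and the thresholds Coveo uses to separate the Good, Warning, and Problematic states aren’t stated here, so the sketch doesn’t attempt to reproduce them.

```python
MAX_EXECUTION_SECONDS = 5.0  # default maximum execution time


def execution_time_ratio(average_seconds, max_seconds=MAX_EXECUTION_SECONDS):
    """Return the average execution time as a fraction of the allowed maximum."""
    if average_seconds < 0 or max_seconds <= 0:
        raise ValueError("Durations must be non-negative and the maximum positive.")
    return average_seconds / max_seconds


def worst_window_ratio(average_last_24_hours, average_last_5_minutes):
    """Keep the worse (higher) ratio of the two observation windows."""
    return max(
        execution_time_ratio(average_last_24_hours),
        execution_time_ratio(average_last_5_minutes),
    )


print(worst_window_ratio(1.0, 4.0))
```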
Time out status
An extension times out when it takes over 5 seconds to execute. When this happens, the indexing pipeline skips its extension stage. There are no retries.
The indicator presents the worst average timeout status, taking into account all items processed by the extension over the last 24 hours or over the last 5 minutes. The state is based on the ratio of timed-out executions to the total number of extension executions.
This indicator helps you identify extensions whose code you could proactively optimize, as well as extensions that could be disabled because they often take too long to execute.
The possible values are:
Status | Description |
---|---|
Good | The percentage of timeouts is acceptable. |
Warning | The percentage of timeouts is significant. |
Problematic | The percentage of timeouts is high. |
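The ratio behind this indicator can be sketched as shown below. The function name is hypothetical, and returning 0.0 when there are no executions is an assumption made for the example. The percentages at which the status changes aren’t documented here, so none are hard-coded.

```python
def timeout_ratio(timeouts, executions):
    """Return the share of extension executions that timed out."""
    if timeouts < 0 or executions < 0 or timeouts > executions:
        raise ValueError("Expected 0 <= timeouts <= executions.")
    if executions == 0:
        # Assumption: with no executions, report no timeouts.
        return 0.0
    return timeouts / executions


# Example: 5 timeouts out of 200 executions.
print(f"{timeout_ratio(5, 200):.1%}")
```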
Daily statistics
The daily statistics provide an overview of the extension’s usage over the last 24 hours. The following metrics are displayed:
-
Average duration in seconds: The average time the script takes to execute.
-
Number of errors: Number of extension executions for which the script returned an exception.
-
Number of executions: Total number of executions for all items from all sources to which the extension is applied.
-
Number of skips: Total number of executions for which the extension wasn’t executed, either because the extension condition was evaluated as false, the extension timed out, or the extension was disabled.
-
Number of timeouts: Total number of extension executions that reached the maximum execution time of 5 seconds.
Required privileges
The following table indicates the privileges required to view or edit elements of the Extensions (platform-ca | platform-eu | platform-au) page and associated panels (see Manage privileges and Privilege reference).
Action | Service - Domain | Required access level |
---|---|---|
View extensions | Content - Extensions | View |
Edit extensions | Organization - Activities | View |
Edit extensions | Content - Extensions | Edit |

A member with the View access level on the Activities domain can access the Activity Browser. This member can therefore see all activities taking place in the organization, including those from Coveo Administration Console pages that they can’t access.