Add/Edit an Extension - Panel

The Add an Extension or Edit an Extension: [extensionName] panel allows you, when you have the required privileges, to create or modify indexing pipeline extensions. Extensions are Python scripts that modify how items are included in the sources of your organization (see Indexing Pipeline Extension Overview).

You need developer skills or the assistance of a developer to create or adapt Python scripts from samples and test them before you include them in indexing pipeline extensions (see Coveo Cloud V2 Indexing Pipeline Extensions).

To add or edit an indexing pipeline extension

  1. If you are not already in the Add/Edit an Extension: [ExtensionName] panel, open it:

    • To add an extension, in the main menu, under Content, select Extensions, and then click Add Extension.

      OR

    • To edit an extension, in the main menu, under Content, select Extensions, click the extension row, and then click Edit in the Action bar.

      OR

    • In the Apply an Extension on Source Items panel, click Add Extension (see Applying an Extension to a Source).

  2. In the Extension name box, enter or modify the extension name.

    • You can use any characters in the name. Choose a short display name that clearly describes the purpose of the extension and distinguishes it from other extensions.

    • In the extension script, the extension name may be used to specify a metadata or data stream indexing pipeline stage origin, so you may want to create developer-friendly names.

    • Consider prefixing your extension names with pre- or post- to easily identify whether they contain scripts that must be applied as preconversion or postconversion indexing pipeline stages (see Preconversion Versus Postconversion).

    • You can change the extension name whenever you want. Each extension is uniquely identified by an extension ID (in the form [organizationID]-[guid]) that never changes and that appears on the Extensions page, below the extension name in the Extension name column.

  3. Optionally, in the Description box, enter information that helps others understand the purpose or context of the extension, such as what it does to a specific metadata or data stream, and to which types of sources or which specific sources it applies.

  4. Under Select the item binary data available to the extension, to optimize performance, select only the optional item binary data stream(s) needed by your extension code.

    By default, none of the data streams are selected, because:

    • The item object (allowing your script to add, modify, or delete metadata or permissions) is always available to all extension scripts.

    • Getting data streams can significantly impact the crawling performance.

    • You do not need to get a data stream if you want to create one from scratch.

    • Body text

      The Body text is created by the indexing pipeline Processing stage (so it is available only to postconversion scripts) and essentially contains all the item text, in a format appropriate for the Indexing stage that makes the content searchable.

      Select Body text only when you want to get and do something with the extracted item text that will be indexed.

      To optimize index size and performance, the Body text is limited to 50 MB. For the rare items with a larger Body text, the text beyond the limit is not indexed, and is therefore not searchable.
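      To illustrate the size limit above, here is a minimal sketch that estimates how much of an extracted text would fall beyond the limit. It assumes that the limit is measured on the UTF-8 encoded size and that 1 MB means 1024 × 1024 bytes; neither detail is stated in this topic, and the constant and function names are hypothetical.

```python
# Assumption: the limit applies to UTF-8 bytes, with 1 MB = 1024 * 1024 bytes.
BODY_TEXT_LIMIT_BYTES = 50 * 1024 * 1024


def bytes_over_limit(text, limit=BODY_TEXT_LIMIT_BYTES):
    """Return how many encoded bytes of `text` exceed `limit` (0 if it fits)."""
    size = len(text.encode("utf-8"))
    return max(0, size - limit)
```

      A check of this kind could be used in a postconversion script to log items whose text is likely to be partially indexed.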

    • Body HTML

      The Body HTML is also created by the indexing pipeline Processing stage (so it is available only to postconversion scripts) and contains an HTML version of the item that is used by the Quick View.

      Select Body HTML only when you want to get and modify the item Quick View content.

      • When you can define your desired Body HTML content as static HTML markup containing metadata placeholders, it is generally simpler to use a mapping on the body field (see Edit the Mappings of a Source: [SourceName]).

      • To optimize index size and performance, the Body HTML is limited to 10 MB. The Quick View of items with a larger Body HTML is therefore truncated.

    • Thumbnail

      A Thumbnail is a small image file that typically represents the content of the item (such as a reduced capture of the first item page). In the processing stage, the converter of some item types (Microsoft Word, PowerPoint, Excel, and Visio items as well as image file types) may include a thumbnail image.

      When available, the Thumbnail image can be included in search results templates to allow search users to more easily identify the item from its graphical look.

      Select Thumbnail only when you want to get and modify, in a postconversion extension, the thumbnail image generated in the processing stage.

    • Original file

      The Original file is the actual binary data, or content of the original extracted item.

      Example: When the extracted item is a PDF file, the Original file is the actual PDF file content.

      Select Original file only when you want to get and modify or otherwise process the original item binary content in a preconversion extension. There is generally no point in getting and modifying the Original file in a postconversion extension because the Indexing stage does not process it.

      Example: The items are scanned images. You want to extract the text from each image using an optical character recognition (OCR) service. The extension script needs to get the Original file data stream to feed it to the OCR service.

      Getting the Original file can significantly degrade indexing performance, as the binary data of each item has to be fetched, decompressed, and decrypted.
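      The OCR scenario described above can be sketched as follows. This is only an illustration under stated assumptions: in a real extension, the item object is provided by the indexing pipeline, so the FakeDocument and FakeStream classes here are hypothetical stand-ins that make the sketch runnable on its own. The method names (get_data_stream, DataStream, add_data_stream) and the stream names ("documentdata", "body_text") are assumptions to verify against the Item Object Python API Reference, and fake_ocr is a placeholder for a call to an actual OCR service.

```python
class FakeStream:
    """Hypothetical in-memory stand-in for a data stream object."""

    def __init__(self, name, data=b""):
        self.name = name
        self._data = bytearray(data)

    def read(self):
        return bytes(self._data)

    def write(self, data):
        self._data.extend(data)


class FakeDocument:
    """Hypothetical stand-in for the item object provided by the pipeline."""

    def __init__(self, streams):
        self._streams = dict(streams)
        self.added = {}  # streams created by the script, keyed by name

    def get_data_stream(self, name):
        return self._streams[name]

    def DataStream(self, name):
        return FakeStream(name)

    def add_data_stream(self, stream):
        self.added[stream.name] = stream


def fake_ocr(image_bytes):
    """Placeholder for a real OCR service call (hypothetical)."""
    return "recognized text ({} bytes scanned)".format(len(image_bytes))


def run_extension(document, ocr=fake_ocr):
    """Read the original file, run OCR on it, and attach the text as a stream."""
    original = document.get_data_stream("documentdata")  # assumed stream name
    text = ocr(original.read())
    body_text = document.DataStream("body_text")  # assumed stream name
    body_text.write(text.encode("utf-8"))
    document.add_data_stream(body_text)
    return text
```

      Because the script creates the text stream from scratch, only the Original file would need to be selected for an extension of this kind.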

  5. In the Extension script box, write or paste the Python script that you developed or adapted for your extension (see Item Object Python API Reference and Indexing Pipeline Extension Script Samples).

  6. Click Add or Save.

    Back on the Extensions page, your new extension or the new version of a modified extension is available in your Coveo Cloud V2 organization (see Manage Other Extension Versions).
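As an illustration of the kind of script entered in the Extension script box, the following minimal sketch adds a metadata value derived from an existing one. It does not select any data stream, since it only uses the item object. The FakeDocument class is a hypothetical stand-in for the item object that the pipeline provides, the method names (get_meta_data_value, add_meta_data) are assumptions to verify against the Item Object Python API Reference, and the "author" and "department" metadata names are invented for the example.

```python
class FakeDocument:
    """Hypothetical stand-in for the item object provided by the pipeline."""

    def __init__(self, metadata):
        # Each metadata name maps to a list of values.
        self.metadata = {key: list(values) for key, values in metadata.items()}

    def get_meta_data_value(self, name):
        return self.metadata.get(name, [])

    def add_meta_data(self, values):
        for key, value in values.items():
            self.metadata[key] = [value]


def tag_department(document):
    """Set a 'department' metadata value based on the item's 'author' values."""
    authors = document.get_meta_data_value("author")
    department = "support" if "helpdesk" in authors else "general"
    document.add_meta_data({"department": department})
    return department
```

Testing the logic against a stand-in object like this one, before pasting the script into the panel, is one way to follow the recommendation to test scripts before including them in extensions.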

What’s Next?

Apply the new or modified extension to one or more sources (see Applying an Extension to a Source).