Using Pre-Push Extensions

After you create a content source, you may need to add extensions to your indexing pipeline to customize the way source items are indexed (see Creating a Crawling Module Source, Indexing Pipeline, and Indexing Pipeline Extension Overview). You can manage your indexing pipeline extensions from the Coveo Cloud administration console (see Extensions Page).

As detailed under Indexing Pipeline Extension Overview, indexing pipeline extensions are applied in the cloud, after the Push API has pushed your content into the Coveo Cloud platform (see Workflow). An indexing pipeline extension can therefore only handle data that has been pushed into the platform, and cannot interact with on-premises resources to customize your content indexing process. So if you need an extension to leverage data that is not indexed by your source, for instance, you must use a pre-push extension instead.

A pre-push extension is a Python script that you write and save in your Crawling Module folder (see Installing Maestro). This script is applied to every item crawled by your source before it is pushed into the cloud. Pre-push extensions are distinct and independent from indexing pipeline extensions. Consequently, you can apply both a pre-push extension and an indexing pipeline extension to the same content.

On January 1st, 2020, Python 2 will be deprecated. This means that the Python 2 pre-push extension scripts you use with the Crawling Module will need to be translated to Python 3 by the end of 2019. For further information, see Python 2 End-Of-Life.

Applying a pre-push extension to a source may significantly slow down the content crawling process for this source, as the script is executed for every item crawled.

To apply a pre-push extension to a source:

  1. Write a script to be executed by a Python 2.7.x interpreter. The entry point of the script must be a function named do_extension.

    The following script adds a new metadata field named mynewmetadata. You could replace the hard-coded string "my new metadata value" with logic that populates mynewmetadata from data imported from a local database.

     # The entry point function MUST be named do_extension.
     def do_extension(body):
         # body is the JSON representation of the document that will be pushed.
         # Example: add a metadata field to the document.
         body['mynewmetadata'] = "my new metadata value"
         # Return the body with the modifications.
         return body
    

    Scripts importing an external dependency or Python library are supported.

  2. Save the script to C:\ProgramData\Coveo\data\PrePushExtensions, along with any external dependency, formatted as a Python package or module.

  3. In the Coveo Cloud administration console, edit your source JSON to add the PrePushExtension parameter (see Edit a Source JSON Configuration). The value must be your script file name.

    In this excerpt of a source JSON configuration, the pre-push extension script is MyPrePushExtension.py.

     ...
         "parameters": {
             "IndexSharePermissions": {
                 "sensitive": false,
                 "value": "false"
             },
             "PauseOnError": {
                 "sensitive": false,
                 "value": "true"
             },
             ...
             "OrganizationId": {
                 "sensitive": false,
                 "value": "myorganization"
             },
             "SourceId": {
                 "sensitive": false,
                 "value": "uxayrw42v6tkn2zz45tdcqsize-myorganization"
             },
             "PrePushExtension": {
                 "sensitive": false,
                 "value": "MyPrePushExtension.py"
             }
         }
     ...
    
  4. Rebuild your source (see Refresh, Rescan, or Rebuild Sources).