Using pre-push extensions
After you create a Crawling Module source, you may need to add extensions to your indexing pipeline to customize the way source items are indexed. You can manage your indexing pipeline extensions on the Extensions (platform-ca | platform-eu | platform-au) page of the Administration Console.
As detailed under Indexing Pipeline Extension Overview, indexing pipeline extensions are applied in the cloud, after the Push API has pushed your content into Coveo. An indexing pipeline extension can therefore only handle data that has been pushed into the Platform, and it can't interact with on-premises resources to customize your content indexing process. So if, for instance, you need an extension that leverages data your source doesn't index, you must use a pre-push extension instead.
A pre-push extension is a Python script that you write and save on your server. This script is applied to every item crawled by your source before the item is pushed to the cloud. Pre-push extensions are distinct from and independent of indexing pipeline extensions. Consequently, you can apply both a pre-push extension and an indexing pipeline extension to the same content.
Notes
To help you write your extensions, Coveo provides examples in the Python3PrePushExtensions/Samples folder. These examples cover the most common scenarios and can be used as templates to start from.
In addition, here's an example of a content item that your extension receives as its body. Coveo crawls the item data and arranges it in JSON format.
{
  "DocumentId": "file:///c:/tmp/testdata/sample.txt",
  "CompressionType": "ZLIB",
  "CompressedBinaryData": "eAELycgsVgCiRIWS1OISPQAplwUk",
  "clickableuri": "file:///C:/Tmp/TestData/sample.txt",
  "date": "2022-05-04 15:07:39",
  "Permissions": [{
    "PermissionSets": [{
      "AllowAnonymous": false,
      "AllowedPermissions": [{
        "IdentityType": "GROUP",
        "SecurityProvider": "Email Security Provider",
        "Identity": "*@*",
        "AdditionalInfo": {}
      }]
    }]
  }],
  "fileextension": ["txt"],
  "connectortype": ["FileCrawler"],
  "source": ["File"],
  "collection": ["File Collection"],
  "generateexcerpt": [true],
  "contenttype": [""],
  "originaluri": ["file:///c:/tmp/testdata/sample.txt"],
  "printableuri": ["file:///C:/Tmp/TestData/sample.txt"],
  "filename": ["sample.txt"],
  "permanentid": ["9a3a317a4e49c31962b969967c15e51477b0fd9ca33dceac76c94982593b"],
  "size": [15],
  "compressedsize": [21],
  "creationdate": ["2022-05-04 15:07:23"],
  "lastaccessdate": ["2023-07-04 19:07:49"],
  "folder": ["C:\\Tmp\\TestData"],
  "fileowner": ["COVEO\\Bob"],
  "lastwritedate": ["2022-05-04 15:07:39"],
  "parents": ["<?xml version=\"1.0\" encoding=\"utf-16\"?><parents><parent name=\"C:\" uri=\"file:///C:/\" /><parent name=\"Tmp\" uri=\"file:///C:/Tmp/\" /><parent name=\"TestData\" uri=\"file:///C:/Tmp/TestData/\" /><parent name=\"sample.txt\" uri=\"file:///C:/Tmp/TestData/sample.txt\" /></parents>"],
  "coveo_metadatasampling": [1]
}
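In the item above, most metadata values are lists, and the file content itself is in CompressedBinaryData. The following is a minimal sketch of two hypothetical helpers an extension could use to read them. It assumes that CompressedBinaryData is Base64-encoded and compressed with the algorithm named in CompressionType (the sample item decodes this way). Only ZLIB and an assumed UNCOMPRESSED value are handled; any other value raises an error.

```python
import base64
import zlib


def decode_binary_data(body):
    """Return the item's original content as bytes, or None if it has none.

    Assumption: CompressedBinaryData is Base64-encoded, and CompressionType
    names the compression algorithm. Only ZLIB and UNCOMPRESSED are handled.
    """
    encoded = body.get("CompressedBinaryData")
    if not encoded:
        return None
    raw = base64.b64decode(encoded)
    compression = str(body.get("CompressionType", "UNCOMPRESSED")).upper()
    if compression == "ZLIB":
        return zlib.decompress(raw)
    if compression == "UNCOMPRESSED":
        return raw
    raise ValueError("Unsupported CompressionType: " + compression)


def first_value(body, field, default=None):
    """Return the first value of a metadata field, or default if it's missing or empty."""
    values = body.get(field)
    if isinstance(values, list):
        return values[0] if values else default
    # Top-level fields such as "DocumentId" hold a single value rather than a list.
    return default if values is None else values
```

With the sample item, decode_binary_data returns the 15 bytes b"This is a test.", which matches its size value of 15, and first_value(body, "filename") returns "sample.txt".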
Apply a pre-push extension to a source
-
Write a script to be executed by a Python 3 interpreter. The script must define a do_extension method, which is its entry point. Scripts that import an external dependency or a Python library are supported.
Example
The following script creates a new metadata named mynewmetadata. You could replace my new metadata value with code that associates mynewmetadata with data imported from a local database.
# Name of the entry point method MUST be do_extension.
def do_extension(body):
    # Process the body (JSON representation of the document that will be pushed).
    # Example of adding a metadata:
    body['mynewmetadata'] = "my new metadata value"
    # Return the new body with the modifications.
    return body
-
Save the script to the C:\ProgramData\Coveo\Maestro\Python3PrePushExtensions folder (ProgramData is hidden by default). If your script has external dependencies, also add them to the requirements.txt file located in this folder. As you index content, the specified packages and modules are installed or updated so that your extension script works properly.
-
On the Sources (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console, edit your source JSON configuration to add the PrePushExtension parameter. The value must be your script file name.
Example
In this excerpt of a source JSON configuration, the pre-push extension script is MyPrePushExtension.py.
...
"parameters": {
  "IndexSharePermissions": {
    "sensitive": false,
    "value": "false"
  },
  "PauseOnError": {
    "sensitive": false,
    "value": "true"
  },
  ...
  "OrganizationId": {
    "sensitive": false,
    "value": "myorganization"
  },
  "SourceId": {
    "sensitive": false,
    "value": "uxayrw42v6tkn2zz45tdcqsize-myorganization"
  },
  "PrePushExtension": {
    "sensitive": false,
    "value": "MyPrePushExtension.py"
  }
}
...
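The first step above suggests replacing the hard-coded metadata value with data imported from a local database. The following is a minimal sketch of that idea. It assumes a hypothetical SQLite file at C:\Data\metadata.db that contains a hypothetical documents table mapping each DocumentId to a department value. Python's built-in sqlite3 module stands in for whatever on-premises database you actually use, and the path, table, column names, and the extra db_path parameter are placeholders.

```python
import sqlite3

# Hypothetical database file and schema: a "documents" table with
# "document_id" and "department" columns. Replace them with your own.
DB_PATH = r"C:\Data\metadata.db"
LOOKUP_QUERY = "SELECT department FROM documents WHERE document_id = ?"


def lookup_metadata(conn, document_id):
    """Return the value stored for document_id, or None if the item isn't listed."""
    row = conn.execute(LOOKUP_QUERY, (document_id,)).fetchone()
    return row[0] if row else None


# Name of the entry point method MUST be do_extension.
# db_path is a hypothetical extra parameter that only makes local testing easier;
# its default applies when the extension is called with the body alone.
def do_extension(body, db_path=DB_PATH):
    conn = sqlite3.connect(db_path)
    try:
        value = lookup_metadata(conn, body["DocumentId"])
    finally:
        conn.close()
    # Only add the metadata when the database has a value for this item.
    if value is not None:
        body["mynewmetadata"] = value
    return body
```

Opening a connection for each item keeps the sketch simple. Keep in mind that if the script takes too long or fails, Coveo indexes the item without applying it, so a slow lookup on a large source may be worth optimizing.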
Using a pre-push extension to modify your content permissions
Using a pre-push extension to modify your content permissions is possible, although not recommended.
If you make a mistake, manually modifying permissions may let a search interface end user access content they shouldn't be able to access. In addition, as with any other pre-push extension, if your script takes too long or fails due to an error, Coveo indexes your items as they are in your content system, without applying your script. This too can result in content access issues in your search interface.
Before you write an extension that modifies permissions, ensure that you’ve selected the Same users and groups as in your current permission system content security option for your source. If you haven’t, or if your source doesn’t support this option, Coveo will overwrite the permissions of each content item with the source-level permissions to ensure coherent behavior.
The following is a sample script you can start from to modify your content permissions:
log = []

def do_extension(body):
    global log
    log = ['BEGIN :' + body['DocumentId']]
    # This sample assumes the item already has at least one permission set, as in the JSON item above.
    permissionsDict = body['Permissions'][0]
    permissionsSetDict = permissionsDict['PermissionSets'][0]
    allowedPermissionsList = permissionsSetDict['AllowedPermissions']
    # Obtain a new user.
    name = "Bob@coveo.com"
    log.append("Adding new allowed permissions: " + name)
    # Allowed "IdentityType" values are "USER", "GROUP", and "VIRTUALGROUP".
    # "SecurityProvider" is the name of the security provider that will handle your new permission.
    # This example refers to the Email Security Provider, but it could also be a provider that expands identities.
    # In that case, you would need to use the Push API to send the information it needs to expand the identity,
    # for example, how to map it to an email address or how to expand a group.
    myNewAllowedMember = {"IdentityType": "USER", "SecurityProvider": "Email Security Provider", "Identity": name, "AdditionalInfo": {}}
    allowedPermissionsList.append(myNewAllowedMember)
    # Store the log in the item's prepushlog metadata.
    body['prepushlog'] = ';'.join(log)
    return body
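The sample script above reads Permissions[0]['PermissionSets'][0] directly and appends the identity even if it's already allowed. The following is a slightly more defensive sketch that keeps the sample's choice of adding one identity to the first permission set. It skips the addition when an entry with the same IdentityType, SecurityProvider, and Identity is already present, and it leaves the item untouched when there's no permission set to extend. NEW_IDENTITY, add_allowed_identity, and _identity_key are hypothetical names.

```python
import copy

# Hypothetical identity to grant; replace it with the user or group you need.
NEW_IDENTITY = {
    "IdentityType": "USER",
    "SecurityProvider": "Email Security Provider",
    "Identity": "Bob@coveo.com",
    "AdditionalInfo": {},
}


def _identity_key(identity):
    # Two entries count as the same identity when these three fields match exactly.
    return (identity.get("IdentityType"), identity.get("SecurityProvider"), identity.get("Identity"))


def add_allowed_identity(body, identity):
    """Add identity to the item's first permission set; return True if it was added."""
    permissions = body.get("Permissions") or []
    if not permissions or not permissions[0].get("PermissionSets"):
        # No permission set to extend: leave the item's permissions untouched.
        return False
    allowed = permissions[0]["PermissionSets"][0].setdefault("AllowedPermissions", [])
    if any(_identity_key(entry) == _identity_key(identity) for entry in allowed):
        return False  # Already allowed; don't add a duplicate entry.
    # Deep copy so that items never share the same nested dictionaries.
    allowed.append(copy.deepcopy(identity))
    return True


# Name of the entry point method MUST be do_extension.
def do_extension(body):
    log = ["BEGIN :" + body["DocumentId"]]
    if add_allowed_identity(body, NEW_IDENTITY):
        log.append("Adding new allowed permissions: " + NEW_IDENTITY["Identity"])
    else:
        log.append("No permission change")
    body["prepushlog"] = ";".join(log)
    return body
```

Returning quietly instead of raising an exception changes little in practice: a failing script also results in the item being indexed as it is in your content system, but this way the prepushlog metadata records that nothing was changed.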