Indexing pipeline extension testing strategies and good practices

This article compares the available strategies for testing indexing pipeline extensions (IPEs). The most obvious way to test an extension is to apply it to a source, rebuild the source, and check whether the script did what was expected. This method can be very tedious, particularly for a source with a large number of items, since you have to wait for a rebuild after each change.

Simulation alternatives such as the Test an Extension API call and the Coveo Labs pipeline-extension-manager (which relies on the Test an Extension API call) return results much faster, but come with limitations because the metadata they provide is simulated from index fields. Any metadata that wasn’t mapped to a field won’t be available to your extension during the simulation. Similarly, metadata mapped to a field with a different name is only available under the field name, not under its original name.

Note

As a developer, your first choice might be to use the Test an Extension API call. However, while implementing an automated testing process is easier with the API, the API call only exposes mapped metadata, as unmapped metadata isn’t indexed. Furthermore, indexed metadata is retrievable only under the mapped field name and not under the original metadata name, unless the two are identical.
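The effect of this limitation can be pictured with a small model. The following sketch is not Coveo code: `simulate_metadata`, the sample metadata, and the mapping dictionary are all hypothetical, and only illustrate which names an extension would see when its metadata is rebuilt from index fields.

```python
def simulate_metadata(original_metadata, field_mappings):
    """Return the metadata an extension would see in a simulation.

    original_metadata: dict of original metadata name -> list of values.
    field_mappings: dict of original metadata name -> index field name.
    Both structures are hypothetical simplifications for illustration.
    """
    simulated = {}
    for name, values in original_metadata.items():
        field = field_mappings.get(name)
        if field is None:
            # Unmapped metadata is never indexed, so it is not available.
            continue
        # Mapped metadata is exposed under the field name only.
        simulated[field] = list(values)
    return simulated


original = {
    "title": ["Coveo release notes"],
    "author_name": ["Jane Doe"],
    "internal_rank": ["7"],
}
mappings = {"title": "title", "author_name": "author"}

visible = simulate_metadata(original, mappings)
print(sorted(visible))  # ['author', 'title']
```

In this model, a script that reads `author_name` or `internal_rank` would find nothing during a simulation, even though both values exist on the item at indexing time.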

The following overview compares the different methods for testing IPEs.

Using the Test an Extension API call

Testing goal: When a developer needs to implement automated tests on many extension scripts without referring to unmapped metadata.

Advantages:

Easier to implement an automated testing process with the API call.
Immediate results when testing an indexing pipeline extension script on a single item.

Disadvantages:

Only indexed metadata is available. Only metadata mapped to a field is indexed.
Metadata isn’t retrievable with its original name, but rather with the mapped field name.
Metadata origin can’t be used meaningfully, since unmapped values aren’t indexed.
You may need to map metadata and re-index a whole source to access metadata.
Requires developer skills to create or retrieve the document model and feed it to the API.
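To give an idea of what feeding a document model to the API can involve, here is a minimal sketch that only builds a request and doesn’t send it. The host, the endpoint path, and every key in the body (`document`, `permissions`, `metadata`, `Values`, `origin`, `dataStreams`, `parameters`) are assumptions made for illustration and aren’t taken from this article. Check them against the Extension API reference before relying on them.

```python
import json

# Assumed host; not taken from this article.
PLATFORM_URL = "https://platform.cloud.coveo.com"


def build_test_request(organization_id, extension_id, metadata, origin="crawler"):
    """Build (url, body) for a hypothetical Test an Extension request."""
    # Assumed endpoint path layout.
    url = "{}/rest/organizations/{}/extensions/{}/test".format(
        PLATFORM_URL, organization_id, extension_id
    )
    # Assumed body shape: a simplified document model.
    body = {
        "document": {
            "permissions": [],
            # Keys are mapped field names; every value is a list.
            "metadata": [{"Values": metadata, "origin": origin}],
            "dataStreams": [],
        },
        "parameters": {},
    }
    return url, json.dumps(body)


url, body = build_test_request(
    "myorg", "myextension", {"author": ["Jane Doe"], "size": ["512"]}
)
print(url)
```

Sending the request would additionally require an authenticated POST with a suitable access token, which is left out here.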

Logging messages from an indexing pipeline extension

Testing goal: When testing parts of a single extension script or using unmapped metadata.

In the following script, the log messages give details for each step, so a developer can validate assigned values and test if-else statements, for example.

# get_meta_data_value returns a list of values, so read the first entry.
item_size = document.get_meta_data_value('size')
log('1- Size of item: {}'.format(item_size[0]), 'Detail')
item_type = document.get_meta_data_value('filetype')
log('2- Type of item: {}'.format(item_type[0]), 'Detail')

# Log which branch runs so the if-else logic can be checked in the logs.
if int(item_size[0]) > 2000:
    log('item size is greater than 2000', 'Notification')
elif str(item_type[0]) == 'html':
    log('item type is html', 'Notification')
else:
    log('both conditions failed to match', 'Warning')

Advantages:

Easier to test a single line of code with a log message.
All metadata and metadata origin are available with their original names.
You can use try-except code blocks to manage explicitly specified script errors.
You can use the logging messages method jointly with the other testing methods.

Logging messages while indexing a source with a few chosen items is typically the best strategy, except in the following situations:

When implementing automated tests.
When you don’t need to access unmapped metadata.
When you don’t have developer skills.

Disadvantages:

The log values don’t appear immediately on the Log Browser (platform-ca | platform-eu | platform-au) page or in the SourceLogs API.
You may need to index a whole source to find relevant log results to analyze.
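Because log results can take a while to appear, it can help to preview which messages the sample script in this section would emit for a given item before indexing anything. The sketch below does this with hypothetical stand-ins: `StubDocument`, `run_sample`, and `collect_logs` aren’t part of the indexing pipeline, and the stub only mimics the `get_meta_data_value` call used in the sample.

```python
class StubDocument:
    """Hypothetical stand-in for the pipeline's `document` object."""

    def __init__(self, metadata):
        self._metadata = metadata

    def get_meta_data_value(self, name):
        # Values are lists; this stub returns an empty list when the
        # metadata is missing (an assumption of the stub).
        return self._metadata.get(name, [])


def run_sample(document, log):
    """The logging sample from this section, wrapped in a function."""
    item_size = document.get_meta_data_value('size')
    log('1- Size of item: {}'.format(item_size[0]), 'Detail')
    item_type = document.get_meta_data_value('filetype')
    log('2- Type of item: {}'.format(item_type[0]), 'Detail')

    if int(item_size[0]) > 2000:
        log('item size is greater than 2000', 'Notification')
    elif str(item_type[0]) == 'html':
        log('item type is html', 'Notification')
    else:
        log('both conditions failed to match', 'Warning')


def collect_logs(metadata):
    """Run the sample on stub metadata; return (message, severity) pairs."""
    entries = []

    def log(message, severity):
        entries.append((message, severity))

    run_sample(StubDocument(metadata), log)
    return entries


small_html = collect_logs({'size': ['512'], 'filetype': ['html']})
large_pdf = collect_logs({'size': ['4096'], 'filetype': ['pdf']})
print(small_html[-1])  # ('item type is html', 'Notification')
print(large_pdf[-1])   # ('item size is greater than 2000', 'Notification')
```

A dry run like this only checks the script’s branching logic. It says nothing about which metadata the real item carries, so it complements rather than replaces a test against an actual source.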

Testing with a source containing a small number of items

Testing goal: When you need access to unmapped metadata in your extension script.

Advantages:

All metadata and metadata origin are available with their original names.

Disadvantages:

The rebuild process takes time, even with very few items.
It can be difficult to find relevant test items to index.
It’s not always possible to index only a few items of a particular source.

Leading practices

When using a try-except code block in your extension script, you should generally catch explicitly specified errors to manage them, as shown in the following code sample:

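my_title = document.get_meta_data_value('title')

try:
    # get_meta_data_value returns a list of values; use the first one.
    my_title = my_title[0]
    # Raise inside the try block so the except clause below can catch it.
    if 'Coveo' not in my_title:
        raise ValueError('Coveo not in the title')
    my_title = my_title.upper()
    document.add_meta_data({'caps_title': my_title})

except ValueError as e:
    # Only ValueError is handled here; any other error still fails the script.
    log(str(e), 'Error')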

Any error other than ValueError isn’t caught and still makes the script fail. This practice helps you identify unexpected errors in your extension script.
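The difference between a handled and an unhandled error can be checked outside the pipeline with a small stand-in. In this sketch, `StubDocument`, `capitalize_title`, and `record` are hypothetical names, and the stub only mimics the two document calls used in the sample above.

```python
class StubDocument:
    """Hypothetical stand-in for the pipeline's `document` object."""

    def __init__(self, metadata):
        self.metadata = dict(metadata)

    def get_meta_data_value(self, name):
        # This stub returns an empty list for missing metadata.
        return self.metadata.get(name, [])

    def add_meta_data(self, values):
        self.metadata.update(values)


def capitalize_title(document, log):
    """Add caps_title; handle only the ValueError raised on purpose."""
    try:
        title = document.get_meta_data_value('title')[0]  # IndexError if empty
        if 'Coveo' not in title:
            raise ValueError('Coveo not in the title')
        document.add_meta_data({'caps_title': title.upper()})
    except ValueError as e:
        log(str(e), 'Error')


logged = []


def record(message, severity):
    logged.append((message, severity))


# Case 1: valid title, metadata is added and nothing is logged.
ok = StubDocument({'title': ['Coveo docs']})
capitalize_title(ok, record)

# Case 2: the explicit ValueError is caught and logged.
capitalize_title(StubDocument({'title': ['Other docs']}), record)

# Case 3: a missing title raises IndexError, which is not caught.
try:
    capitalize_title(StubDocument({}), record)
    uncaught = None
except IndexError as e:
    uncaught = type(e).__name__

print(ok.metadata['caps_title'], logged, uncaught)
```

Keeping the except clause narrow is what makes the third case visible: a broad `except Exception` would have hidden the missing title behind a log line.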

You can retrieve any uncaught error messages with the Get specified source document logs SourceLogs API call or on the Administration Console Log Browser (platform-ca | platform-eu | platform-au) page. Furthermore, when binding the extension to a source in the JSON configuration or with the API call, you can control what happens when the extension fails by setting the actionOnError value to SKIP_EXTENSION or REJECT_DOCUMENT.
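As a rough illustration of such a binding, the following sketch builds one entry and rejects any actionOnError value other than the two named above. Only `actionOnError` and its two values come from this article. The other key names (`extensionId`, `condition`, `parameters`) and the helper `build_extension_binding` are assumptions and may not match your source configuration exactly.

```python
import json

# The two values named in this article.
ALLOWED_ACTIONS = ("SKIP_EXTENSION", "REJECT_DOCUMENT")


def build_extension_binding(extension_id, action_on_error):
    """Return a binding entry for a source's JSON configuration.

    Key names other than actionOnError are assumptions for illustration.
    """
    if action_on_error not in ALLOWED_ACTIONS:
        raise ValueError(
            "actionOnError must be one of {}".format(ALLOWED_ACTIONS)
        )
    return {
        "extensionId": extension_id,
        "actionOnError": action_on_error,
        "condition": "",
        "parameters": {},
    }


binding = build_extension_binding("myorg-hypothetical-extension-id", "REJECT_DOCUMENT")
print(json.dumps(binding, indent=2, sort_keys=True))
```

Validating the value before it reaches the configuration avoids discovering a typo only after a rebuild.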

WARNING

If your indexing pipeline extension script modifies item permissions, ensure that your code covers every possible case to prevent disclosing restricted-access items to unauthorized users. You should also set actionOnError to REJECT_DOCUMENT to ensure that you never index a document without the proper permissions.