Coveo for Sitecore Indexing Performance Leading Practices

The indexing process consumes resources and takes time. Like most processes in Sitecore, it can be customized and extended. Additional indexing steps can consume considerable time and resources, depending on the number of items to index, the item types, and the work done in each step.

As an administrator, you should optimize the indexing process to balance your users’ needs against indexing speed. If a process doesn’t add value for your users, remove it, as it only slows down indexing. Conversely, if a process does help your users find relevant information faster, make sure it’s optimized so that the strain on the system is as small as possible.

Manual Indexing Methods

Coveo for Sitecore leverages Sitecore index update strategies to automatically index Sitecore items. Created, deleted, and modified items in the master database are indexed as those events occur. In the web database, published items are indexed at the end of the publish operation. Hence, manually rebuilding the indexes should be avoided outside of the development phase.

In the development phase, choosing the right indexing method is a good way to optimize indexing time:

  • Rebuild all: re-index all the items in all the Sitecore indexes, including the Coveo indexes. This should be avoided.
  • Rebuild index: re-index all the items in a specific index. This is recommended after you reduce the indexing scope to ensure that the newly excluded items are deleted from the Coveo indexes.
  • Re-Index Tree: re-index only the selected Content Tree item and its children in all the indexes using the current Sitecore database. This is the optimal approach most of the time. It ensures only the items affected by your changes are re-indexed.

You can find these options in the Content Editor > Developer tab > Indexing tools section.

You can also rebuild several indexes at once in the Control Panel > Indexing Manager section.

Indexing Scope

The more items there are to index, the longer indexing takes. Consider restricting indexing to the items you want to make available in your users’ search results (see Changing the Crawling Root of an Index and Excluding Sitecore Items from your Coveo Index).

Only index items with a layout and downloadable media library files. Avoid indexing items used as data sources, settings items, and media library images that are only displayed on the site.

During indexing, each Sitecore item is converted into a Coveo item and its text is extracted, so the bigger a file is, the slower the conversion.

Consider a folder containing several downloadable PDFs along with very large images. Indexing this folder may take a while. If the images aren’t necessary for your index, consider excluding them.
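The scope rule above can be sketched as a simple predicate. This is an illustrative Python sketch only, not Sitecore or Coveo API code: the Item class, its attributes, and the list of downloadable extensions are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical set of downloadable media extensions; adjust to your content.
DOWNLOADABLE_EXTENSIONS = {"pdf", "docx", "xlsx", "pptx"}


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a Sitecore item."""
    path: str
    has_layout: bool = False
    is_media: bool = False
    extension: str = ""


def should_index(item: Item) -> bool:
    """Keep pages (items with a layout) and downloadable media files only."""
    if item.is_media:
        # Media items: keep documents, skip images used for display.
        return item.extension.lower() in DOWNLOADABLE_EXTENSIONS
    # Non-media items: data sources and settings items have no layout.
    return item.has_layout


items = [
    Item("/content/home", has_layout=True),
    Item("/content/data/promo-datasource"),
    Item("/system/settings/site"),
    Item("/media/guides/manual", is_media=True, extension="pdf"),
    Item("/media/banners/hero", is_media=True, extension="jpg"),
]
indexed_paths = [item.path for item in items if should_index(item)]
```

In this sketch, only the home page and the PDF end up in indexed_paths; the data source, the settings item, and the banner image are left out.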

Processors and Computed Fields Optimization

Indexing pipeline processors and computed fields are the best tools to customize your indexing process. They’re flexible and powerful, but can also be highly demanding for your Sitecore instance if you’re not careful when creating them.

When creating a computed field or a processor in the indexing pipelines, you should:

  • Make sure it’s beneficial to your users.
  • Optimize the code to run fast and exit early if conditions aren’t met.
  • Avoid running HTTP requests.
  • Avoid querying a search index to retrieve items. Use the Sitecore API to get items from the database.
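The guidelines above can be illustrated with a minimal sketch of a computed field that exits early and resolves its value from data already at hand. It is written in Python for brevity and doesn't use the Sitecore API: the Item class, the "Article" template name, the "author" field, and the authors_by_id lookup are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Item:
    """Hypothetical, simplified stand-in for a Sitecore item."""
    template: str
    fields: Dict[str, str] = field(default_factory=dict)


def compute_author_name(item: Item, authors_by_id: Dict[str, str]) -> Optional[str]:
    """Return the author's display name, or None when the field doesn't apply."""
    # Exit early: this field is only meaningful on article items.
    if item.template != "Article":
        return None

    author_id = item.fields.get("author")
    # Exit early: there is nothing to resolve.
    if not author_id:
        return None

    # In-memory lookup standing in for a direct database read:
    # no HTTP request and no search index query.
    return authors_by_id.get(author_id)


authors = {"a1": "Jane Doe"}
article = Item("Article", {"author": "a1"})
folder = Item("Folder")
names = [compute_author_name(article, authors), compute_author_name(folder, authors)]
```

Since a computed field runs for every indexed item, putting the cheapest checks first keeps the cost close to zero for the items the field doesn't apply to.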

Inbound Filters

An inbound filter is an indexing processor solely focused on excluding items from your index. It allows you to use custom logic to decide which items to include or exclude, therefore reducing the size of your index and increasing performance.

Exit your code early if a previous processor has already decided to exclude the item. Keep in mind that every item that isn’t excluded by the crawling roots of your indexes is processed by each inbound filter processor. Hence, for simple path-based filtering, it’s best to modify the crawling roots (see Changing the Crawling Root of an Index).

When you use the Coveo Inbound Filter Pipeline instead of the Sitecore Inbound Filter Pipeline, your non-Coveo indexes aren’t affected by the exclusion logic.
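The early-exit pattern for a chain of inbound filters can be sketched as follows. This is a Python illustration under stated assumptions, not the Coveo or Sitecore pipeline API: the FilterArgs class, its is_excluded flag, and both filter functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FilterArgs:
    """Hypothetical arguments passed from one filter to the next."""
    template: str
    approved: bool = True
    is_excluded: bool = False


def exclude_settings_items(args: FilterArgs) -> None:
    # Exit early: a previous filter already excluded this item.
    if args.is_excluded:
        return
    if args.template == "Settings":
        args.is_excluded = True


def exclude_unapproved_items(args: FilterArgs) -> None:
    # Exit early: a previous filter already excluded this item.
    if args.is_excluded:
        return
    if not args.approved:
        args.is_excluded = True


def is_indexable(args: FilterArgs, filters: List[Callable[[FilterArgs], None]]) -> bool:
    """Run every filter in order; the item is indexed only if none excluded it."""
    for run_filter in filters:
        run_filter(args)
    return not args.is_excluded


chain = [exclude_settings_items, exclude_unapproved_items]
results = [
    is_indexable(FilterArgs("Article"), chain),
    is_indexable(FilterArgs("Settings"), chain),
    is_indexable(FilterArgs("Article", approved=False), chain),
]
```

Each filter still runs for every item, which is why the first check in each one should be the exclusion flag rather than any costly logic.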

Reduce the Number of Indexed Fields

Logically, you want to have as few fields as possible in your index. It’s best to index only the fields that are used by indexed items and leave out the others. To choose which fields to index, Coveo for Sitecore uses the coveoIndexingGetFields pipeline (see Understanding the coveoIndexingGetFields and coveoIndexingGetTemplates Pipelines).

  • IncludeFieldsFromConfigOnlyProcessor only indexes fields that are specified in the configuration file (see Customizing the Indexing Parameters). It has both the advantage and the disadvantage of being very restrictive. If you use this processor, your content owners must be aware that the fields they add aren’t automatically added to the index.
  • Both IncludeFieldsFromConfigOnlyProcessor and IncludeTemplatesFromConfigOnlyProcessor stop the sequence, and no other fields are indexed.
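The restrictive behavior of a config-only field list can be sketched as follows. This is a Python illustration of the idea, not the Coveo for Sitecore implementation: the select_fields function and the field names are hypothetical.

```python
from typing import Dict, Set


def select_fields(item_fields: Dict[str, str], configured: Set[str]) -> Dict[str, str]:
    """Keep only the fields explicitly listed in the (hypothetical) configuration."""
    return {name: value for name, value in item_fields.items() if name in configured}


configured_fields = {"title", "body"}
item_fields = {
    "title": "Home",
    "body": "Welcome",
    "subtitle": "Added later by a content owner",
}
indexed_fields = select_fields(item_fields, configured_fields)
```

Here the subtitle field is silently left out of indexed_fields because it isn’t listed in the configuration, which is the trade-off content owners need to be aware of.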