Indexing Performance Best Practices

The indexing process consumes resources and takes time. Like most Sitecore processes, it can be customized and extended. However, additional indexing steps can consume significant time and resources, depending on the number of items to index, their type, and the work done in each step.

As an administrator, you should optimize the indexing process to balance your users' needs against indexing speed. If a process doesn't add value for your users, remove it, since it slows down indexing. Conversely, if a process does help your users find relevant information faster, make sure it's optimized to keep the strain on the system as small as possible.

Manual Indexing Methods

Coveo for Sitecore leverages Sitecore index update strategies to automatically index Sitecore items (see Use Indexing Strategies). Created, deleted, and modified items in the master database are indexed as those events occur. In the web database, published items are indexed at the end of the publish operation. Hence, manually rebuilding the indexes should be avoided outside of the development phase.

In the development phase, choosing the most appropriate index update method is a good way to optimize indexing time (see Update Your Search Indexes).

Indexing Scope

The more items there are to index, the longer indexing takes. Also keep in mind that a Sitecore item with multiple language versions produces multiple items in the Coveo index (one for each language version).
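Because each language version becomes its own Coveo item, the indexing scope is better estimated by counting language versions than by counting Sitecore items. The following minimal Python sketch illustrates that arithmetic. `SitecoreItem` and `estimate_index_documents` are hypothetical names used only for illustration; they aren't part of the Sitecore or Coveo APIs.

```python
from dataclasses import dataclass


@dataclass
class SitecoreItem:
    # Hypothetical, simplified stand-in for a Sitecore item: only its path
    # and the language versions it exists in matter for this estimate.
    path: str
    languages: list[str]


def estimate_index_documents(items: list[SitecoreItem]) -> int:
    """Each language version of a Sitecore item becomes its own Coveo item."""
    return sum(len(item.languages) for item in items)


items = [
    SitecoreItem("/sitecore/content/Home", ["en", "fr", "de"]),
    SitecoreItem("/sitecore/content/Home/About", ["en", "fr"]),
    SitecoreItem("/sitecore/media library/Docs/guide", ["en"]),
]
print(estimate_index_documents(items))  # 6
```

Here, three Sitecore items produce six Coveo items, so adding a language to a site multiplies the indexing work rather than adding to it.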

During indexing, each Sitecore item is converted into a Coveo item and its text is extracted, so the bigger a file is, the slower the conversion.

EXAMPLE

Consider a folder containing several downloadable PDFs along with very large images. Indexing this folder may take a while. If the images aren't needed in your index, consider excluding them.

Index only items that have a layout and downloadable media library files. Avoid indexing items used as data sources, settings items, and media library images that are only meant to be displayed on the website.

Consider restricting indexing to the items you want to make available in the search results for your users (see Filter Your Content).
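The rule of thumb above can be expressed as a simple decision function. The following Python sketch only models that decision; in Coveo for Sitecore, such logic is applied through crawling roots and inbound filters (see Filter Your Content). The `Item` fields and the `DOWNLOADABLE_EXTENSIONS` set are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical set of media extensions treated as downloadable documents.
DOWNLOADABLE_EXTENSIONS = {"pdf", "docx", "xlsx", "pptx"}


@dataclass
class Item:
    path: str
    has_layout: bool      # True for pages rendered on the website
    is_media: bool        # True for media library items
    extension: str = ""   # file extension for media items, e.g. "pdf"


def should_index(item: Item) -> bool:
    """Keep pages with a layout and downloadable media files; skip the rest."""
    if item.has_layout:
        return True
    if item.is_media:
        # Display images (jpg, png, ...) are skipped; documents are kept.
        return item.extension.lower() in DOWNLOADABLE_EXTENSIONS
    # Data sources, settings items, and other layout-less items are skipped.
    return False


candidates = [
    Item("/sitecore/content/Home", has_layout=True, is_media=False),
    Item("/sitecore/content/Data/Promo", has_layout=False, is_media=False),
    Item("/sitecore/media library/Docs/guide", False, True, "pdf"),
    Item("/sitecore/media library/Images/banner", False, True, "jpg"),
]
print([item.path for item in candidates if should_index(item)])
```

Only the page and the PDF are kept; the data source and the display image are left out of the index.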

Processors and Computed Fields Optimization

Indexing pipeline processors and computed fields are the best tools to customize your indexing process. They're flexible and powerful, but they can also put a heavy load on your Sitecore instance if they aren't written carefully.

When creating a computed field or a processor in the indexing pipelines, you should:

  • Make sure it’s beneficial to your users.

  • Optimize the code to run fast and exit early if conditions aren’t met.

  • Avoid making HTTP requests.

  • Avoid querying a search index to retrieve items. Use the Sitecore API to get items from the database.
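The following Python sketch models a computed field that follows these guidelines: it runs the cheapest checks first, exits early when they fail, and reads only data already available on the item, with no HTTP calls and no search index queries. The `Item` class, the `Article` template, the `Summary` field, and `compute_summary_word_count` are hypothetical; returning `None` stands in for "no value for this item" and isn't meant to describe the exact Sitecore computed field contract.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical template name this computed field applies to.
ARTICLE_TEMPLATE = "Article"


@dataclass
class Item:
    template: str
    fields: dict[str, str] = field(default_factory=dict)


def compute_summary_word_count(item: Item) -> Optional[int]:
    """Hypothetical computed field: number of words in the Summary field."""
    # Exit early: a cheap template check means most items cost almost nothing.
    if item.template != ARTICLE_TEMPLATE:
        return None
    summary = item.fields.get("Summary")
    if not summary:
        return None
    # Use only data already on the item: no HTTP calls, no index queries.
    return len(summary.split())


print(compute_summary_word_count(Item("Article", {"Summary": "Fast indexing matters"})))  # 3
```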

Inbound Filters

An inbound filter is an indexing processor solely focused on excluding items from your index. It lets you use custom logic to decide which items to include or exclude, therefore reducing the size of your index and increasing performance.

Exit your code early if a previous processor has already decided to exclude the item. Keep in mind that every item within the crawling roots of your indexes is processed by each inbound filter processor. Hence, for simple path-based filtering, it's best to modify the crawling roots instead (see Change the Crawling Root of an Index).
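The following Python sketch models why early exits matter: every item reaches every processor in the chain, so each processor should return immediately once the item is already excluded. This is a conceptual model only, not the Coveo or Sitecore inbound filter API; `FilterArgs`, both processors, the `Settings` template, and the `ExcludeFromSearch` field are hypothetical. The filters deliberately avoid path checks, which are better handled by crawling roots.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FilterArgs:
    # Hypothetical stand-in for the arguments passed along the pipeline.
    template: str
    fields: dict[str, str] = field(default_factory=dict)
    is_excluded: bool = False


def exclude_settings_items(args: FilterArgs) -> None:
    if args.is_excluded:  # a previous processor already decided: exit early
        return
    if args.template == "Settings":
        args.is_excluded = True


def exclude_flagged_items(args: FilterArgs) -> None:
    if args.is_excluded:  # exit early again
        return
    # Hypothetical checkbox field; Sitecore stores a checked box as "1".
    if args.fields.get("ExcludeFromSearch") == "1":
        args.is_excluded = True


Processor = Callable[[FilterArgs], None]
PIPELINE: list[Processor] = [exclude_settings_items, exclude_flagged_items]


def is_indexed(args: FilterArgs) -> bool:
    """Run every processor in order, as a pipeline does; True if kept."""
    for process in PIPELINE:
        process(args)
    return not args.is_excluded
```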

When you use the Coveo Inbound Filter Pipeline instead of the Sitecore Inbound Filter Pipeline, your exclusion processors don't affect any of your non-Coveo indexes.

Reduce the Number of Indexed Fields

Logically, you want to index as few fields as possible.
