Analyzing the Rebuild Process

Rebuilding is the action of crawling a set of documents and pushing them into the index. At the end of the process, the search index is expected to contain only the crawled documents.

Starting June 2016 (4.0.222)

In this release, the overall rebuild process remains the same. However, the rebuild task that is launched through Sitecore now covers the whole rebuild process. Sitecore may therefore report that the rebuild task takes longer to complete, even though the end-to-end process takes the same time as in earlier releases.

The main benefit is that once the rebuild task is completed, you know that every crawled item is searchable. From an end-user point of view, this is when the rebuild task really ends.

More Precise Log Traces

Before this release, it was possible to identify when a rebuild started and ended, as well as the errors that occurred in the process. However, nothing indicated which steps were more time consuming than others.

This release brings more precise log traces to help monitor rebuild tasks. The rebuild task is now divided into several parts. The beginning and end of each part are clearly indicated in the logs. It becomes easy to see if one specific part is taking longer than usual to execute.

Also, every trace contains the name of the source that is being rebuilt, making it easier to untangle traces when many search indexes are rebuilt at the same time.

Here is an example of the logged traces during a rebuild task:

ManagedPoolThread #8 10:44:24 INFO  Job started: Index_Update_IndexName=Coveo_web_index
ManagedPoolThread #8 10:44:24 WARN  The index Coveo_web_index has already been initialized.
...
ManagedPoolThread #8 10:44:49 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Rebuild started.
ManagedPoolThread #8 10:44:49 INFO  [YOUR COVEO SOURCE] Synchronizing source...
ManagedPoolThread #8 10:44:49 INFO  Total Field Count for Coveo_web_index: 202, Actual Field Count: 182
...
ManagedPoolThread #8 10:45:18 INFO  [YOUR COVEO SOURCE] Source synchronized.
...
ManagedPoolThread #8 10:45:19 INFO  [Permissions synchronization "Expanded Sitecore Security Provider for YOUR SITECORE INSTANCE"] Starting to send the permissions...
...
ManagedPoolThread #8 10:45:39 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Crawling Sitecore items...
...
ManagedPoolThread #8 10:46:01 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Sitecore items crawled.
ManagedPoolThread #8 10:46:01 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Finalizing rebuild...
ManagedPoolThread #8 10:46:01 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Waiting for items to be uploaded to Coveo Cloud...
...
ManagedPoolThread #8 10:46:04 INFO  [Rebuilding source "YOUR COVEO SOURCE"] items are uploaded.
ManagedPoolThread #8 10:46:04 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Waiting for organization to be provisioned...
ManagedPoolThread #8 10:46:04 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Organization is provisioned.
ManagedPoolThread #8 10:46:05 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Waiting for items to be searchable...
...
ManagedPoolThread #8 10:46:15 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Committed items: 1023 / 2041
...
ManagedPoolThread #8 10:46:50 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Committed items: 2041 / 2041
ManagedPoolThread #8 10:46:50 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Items are searchable.
ManagedPoolThread #8 10:46:51 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Removing old items...
...
ManagedPoolThread #8 10:47:07 INFO  [Rebuilding source "YOUR COVEO SOURCE"] 538 remaining items.
...
ManagedPoolThread #8 10:47:15 INFO  [Rebuilding source "YOUR COVEO SOURCE"] 0 remaining items.
ManagedPoolThread #8 10:47:15 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Old items removed.
ManagedPoolThread #8 10:47:16 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Rebuild finished.
ManagedPoolThread #8 15:47:18 INFO  Job ended: Index_Update_IndexName=Coveo_web_index (units processed: 2041)
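
Since the beginning and end of each part are logged, the time spent in each part can be computed from the traces. The following Python script is a minimal sketch, not a Coveo tool: it assumes the trace layout shown above (thread name, HH:MM:SS time, level, message), and the `step_durations` function and `STEPS` table are hypothetical names introduced for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample above: thread, HH:MM:SS, level, message.
TRACE = re.compile(
    r"^(?P<thread>.+?) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)

# (start message, end message) pairs copied from the sample traces.
STEPS = {
    "Synchronizing source": ("Synchronizing source...", "Source synchronized."),
    "Crawling Sitecore items": ("Crawling Sitecore items...", "Sitecore items crawled."),
    "Waiting for items to be searchable": (
        "Waiting for items to be searchable...",
        "Items are searchable.",
    ),
    "Removing old items": ("Removing old items...", "Old items removed."),
}


def step_durations(lines):
    """Return {step name: seconds} for each step whose start and end were logged.

    The traces carry no date, so a rebuild that crosses midnight is not handled.
    """
    started, durations = {}, {}
    for line in lines:
        match = TRACE.match(line.strip())
        if not match:
            continue  # skips "..." and any line that is not in the assumed layout
        stamp = datetime.strptime(match["time"], "%H:%M:%S")
        msg = match["msg"]
        for name, (start, end) in STEPS.items():
            if msg.endswith(start):
                started[name] = stamp
            elif msg.endswith(end) and name in started:
                durations[name] = int((stamp - started[name]).total_seconds())
    return durations
```

Applied to the sample above, this reports 29 seconds for the source synchronization and 22 seconds for the crawl.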

As you can see, the rebuild process is divided into several parts. Here are some details regarding each one.

Synchronizing source

This is when the Sitecore configuration is compared to the resources in the Coveo platform. The sources, fields, and security providers are modified in the Coveo platform as needed to match the Sitecore configuration.

Coveo for Sitecore (October 2016)

Sending permissions

All Sitecore permissions are sent to Coveo Cloud. This way, Coveo Cloud no longer has to contact the Sitecore instance to access the permissions.

Crawling Sitecore items

Using the crawlers configured on the search index, Sitecore iterates over a set of items and passes them to the search index using the Search Provider framework.

Waiting for items to be uploaded to Coveo Cloud

Since Coveo Cloud is an online service, the items have to be uploaded in order to be indexed. The time required to upload the items varies depending on the number of items, the size of each item, and the bandwidth of your Internet connection.

Waiting for organization to be provisioned

Upon creation, a Coveo Cloud organization takes a few minutes to be up and running. In other words, the Coveo Cloud service has to deploy some components before the organization is fully functional. The rebuild process waits at this point until the organization is ready.

Waiting for items to be searchable

This step validates that all items sent to the Coveo platform are being committed and are searchable. It displays the number of committed items and the number of expected items. This process times out after 1 hour of inactivity.
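
A timeout based on inactivity differs from a fixed deadline: the clock restarts every time progress is observed. The following Python function is a generic sketch of that pattern and does not reflect how Coveo for Sitecore implements it. The function name, its parameters, and the `get_committed` callback are all assumptions made for illustration.

```python
import time


def wait_until_searchable(get_committed, expected, inactivity_timeout=3600.0,
                          poll_interval=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until `expected` items are committed.

    Returns True on success, or False when the committed count has not changed
    for `inactivity_timeout` seconds. `get_committed` is a hypothetical callback
    that returns the current number of committed items.
    """
    last_count = get_committed()
    last_change = clock()
    while last_count < expected:
        if clock() - last_change >= inactivity_timeout:
            return False  # no progress for too long
        sleep(poll_interval)
        count = get_committed()
        if count != last_count:
            # Progress was observed, so the inactivity window restarts.
            last_count, last_change = count, clock()
    return True
```

With this approach, a large rebuild that keeps committing items can run for longer than one hour without timing out, while a stalled rebuild is abandoned one hour after its last progress.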

Removing old items

A rebuild operation replaces the source content with a new set of items. However, the old items have to be removed. This step monitors the old items and ensures they are removed. This process times out after 5 minutes of inactivity.

Before June 2016

In earlier releases, the Sitecore rebuild task completes when all crawled items are sent to the Coveo platform, before the platform has processed them. Every crawled item is eventually searchable and every old item is eventually removed, but nothing indicates the exact moment at which the rebuild process is completed from an end-user point of view.

Here are the steps that Sitecore performs in the rebuild task.

  1. Signals the rebuild start to the Coveo platform.
  2. Synchronizes the source, fields, security provider, etc. with the Coveo platform.
  3. Iterates over all items to index. For each item, it:
    1. Translates the Sitecore item into a format that the Coveo platform can interpret.
    2. Sends the item to the Coveo platform.
  4. Signals the rebuild end to the Coveo platform and removes the old items.
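
The steps above can be sketched as a short Python function. This is an illustration of the sequence only: the `InMemoryPlatform` class, its methods, and the `to_document` helper are hypothetical and do not correspond to the actual Coveo for Sitecore API.

```python
class InMemoryPlatform:
    """Hypothetical stand-in for the Coveo platform that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def begin_rebuild(self):
        self.calls.append("begin")

    def synchronize(self, config):
        self.calls.append("synchronize")

    def push(self, document):
        self.calls.append(("push", document["id"]))

    def end_rebuild(self):
        self.calls.append("end")


def to_document(item):
    """Hypothetical translation of a Sitecore item into a platform document."""
    return {"id": item["id"], "title": item["name"]}


def legacy_rebuild(platform, config, items):
    """Run the four steps in order and return the number of items sent."""
    platform.begin_rebuild()              # 1. signal the rebuild start
    platform.synchronize(config)          # 2. synchronize source, fields, security provider
    sent = 0
    for item in items:                    # 3. iterate over the items to index
        platform.push(to_document(item))  # 3.1 translate, 3.2 send
        sent += 1
    platform.end_rebuild()                # 4. signal the rebuild end
    return sent
```

The function returns as soon as the end signal is sent. It does not wait for the platform to process the pushed items.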

This is where the Sitecore rebuild task ends. Anything that happens after is not monitored by Sitecore, and may take a few minutes to process.

The index transactions are then applied, and the items become searchable. The removed items are not searchable anymore.

Starting July 2016 (4.0.290)

In this release, the overall rebuild process remains the same. However, the rebuild task that is launched through Sitecore now covers the whole rebuild process. Sitecore may therefore report that the rebuild task takes longer to complete, even though the end-to-end process takes the same time as in earlier releases.

The main benefit is that once the rebuild task is completed, you know that every crawled item is searchable. From an end-user point of view, this is when the rebuild task really ends.

More Precise Log Traces

Before this release, it was possible to identify when a rebuild started and ended, as well as the errors that occurred in the process. However, nothing indicated which steps were more time consuming than others.

This release brings more precise log traces to help monitor rebuild tasks. The rebuild task is now divided into several parts. The beginning and end of each part are clearly indicated in the logs. It becomes easy to see if one specific part is taking longer than usual to execute.

Also, every trace contains the name of the source that is being rebuilt, making it easier to untangle traces when many search indexes are rebuilt at the same time.

Here is an example of the logged traces during a rebuild task:

ManagedPoolThread #8 10:44:24 INFO  Job started: Index_Update_IndexName=Coveo_web_index
ManagedPoolThread #8 10:44:24 WARN  The index Coveo_web_index has already been initialized.
...
ManagedPoolThread #8 10:44:49 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Rebuild started.
ManagedPoolThread #8 10:44:49 INFO  [YOUR COVEO SOURCE] Synchronizing source...
ManagedPoolThread #8 10:44:49 INFO  Total Field Count for Coveo_web_index: 202, Actual Field Count: 182
...
ManagedPoolThread #8 10:45:18 INFO  [YOUR COVEO SOURCE] Source synchronized.
...
ManagedPoolThread #8 10:45:39 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Crawling Sitecore items...
...
ManagedPoolThread #8 10:46:01 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Sitecore items crawled.
ManagedPoolThread #8 10:46:01 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Finalizing rebuild...
ManagedPoolThread #8 10:46:02 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Waiting for documents to be searchable...
...
ManagedPoolThread #8 10:46:12 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Committed documents: 1023 / 2041
...
ManagedPoolThread #8 10:46:50 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Committed documents: 2041 / 2041
ManagedPoolThread #8 10:46:50 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Documents are searchable.
ManagedPoolThread #8 10:46:51 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Removing old documents...
...
ManagedPoolThread #8 10:47:07 INFO  [Rebuilding source "YOUR COVEO SOURCE"] 538 remaining documents.
...
ManagedPoolThread #8 10:47:15 INFO  [Rebuilding source "YOUR COVEO SOURCE"] 0 remaining documents.
ManagedPoolThread #8 10:47:15 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Old documents removed.
ManagedPoolThread #8 10:47:16 INFO  [Rebuilding source "YOUR COVEO SOURCE"] Rebuild finished.
ManagedPoolThread #8 15:47:17 INFO  Job ended: Index_Update_IndexName=Coveo_web_index (units processed: 2041)
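
Because every rebuild trace carries the source name, traces from several indexes rebuilt at the same time can be separated by source. The following Python function is a minimal sketch that assumes the two tag layouts seen in the sample above, `[Rebuilding source "NAME"]` and `[NAME]`. The `group_by_source` name is hypothetical, and lines without such a tag (for example `Job started`) are ignored.

```python
import re
from collections import defaultdict

# Matches either [Rebuilding source "NAME"] or a bare [NAME] tag.
SOURCE_TAG = re.compile(
    r'\[(?:Rebuilding source "(?P<quoted>[^"]+)"|(?P<bare>[^\]"]+))\]'
)


def group_by_source(lines):
    """Return {source name: [trace lines]} in order of first appearance."""
    grouped = defaultdict(list)
    for line in lines:
        match = SOURCE_TAG.search(line)
        if match:
            grouped[match["quoted"] or match["bare"]].append(line)
    return dict(grouped)
```

Each resulting list can then be read on its own, as if a single index had been rebuilt.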

As you can see, the rebuild process is divided into several parts. Here are some details regarding each one.

Synchronizing source

This is when the actual Sitecore configuration is compared to the resources in CES. The sources, fields, and security providers are modified in CES as needed to match the Sitecore configuration.

Crawling Sitecore items

Using the crawlers configured on the search index, Sitecore iterates over a set of items and passes them to the search index using the Search Provider framework.

Waiting for documents to be searchable

This step validates that all documents sent to CES are being committed and are searchable. It displays the number of committed documents and the number of expected documents. This process times out after 1 hour of inactivity.
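
The committed and expected counts can be read back from a progress trace. The following Python function is a small sketch that assumes the `Committed documents: X / Y` wording shown in the sample traces (and the `Committed items` variant). The `commit_progress` name is hypothetical.

```python
import re

# Accepts both wordings that appear in the sample traces.
PROGRESS = re.compile(
    r"Committed (?:documents|items): (?P<done>\d+) / (?P<total>\d+)"
)


def commit_progress(line):
    """Return (committed, expected, fraction), or None if the line is not a progress trace."""
    match = PROGRESS.search(line)
    if not match:
        return None
    done, total = int(match["done"]), int(match["total"])
    fraction = done / total if total else 1.0  # an empty rebuild counts as complete
    return done, total, fraction
```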

Removing old documents

A rebuild operation essentially replaces the source content with a new set of documents. However, the old documents have to be removed. This step monitors the old documents and ensures they are removed. This process times out after 5 minutes of inactivity.

Before July 2016

The Sitecore rebuild task is completed when all crawled documents are sent to the Coveo platform. This means that every crawled document is eventually searchable and every old document is eventually removed. However, there are no indications regarding the exact moment at which the rebuild process is completed from an end-user point of view.

Here are the steps that Sitecore performs in the rebuild task.

  1. Signals the rebuild start to the Coveo platform.
  2. Synchronizes the source, fields, security provider, etc. with CES.
  3. Iterates over all items to index. For each item, it:
    1. Translates the Sitecore item into a format that CES can interpret.
    2. Sends the document to CES.
  4. Signals the rebuild end to CES and removes the old documents.

This is where the Sitecore rebuild task ends. Anything that happens after is not monitored by Sitecore, and may take a few minutes to process.

The index transactions are then applied, and the documents become searchable. The removed documents are not searchable anymore.

You can monitor the rebuild state with the Administration Tool Overview page. There, you can see the number of pending transactions and the number of documents enlisted in those transactions. You can even manually force pending transactions to be applied if needed.
