Analyzing the Rebuild Process
Rebuilding is the action of crawling a set of documents and pushing them into the index. At the end of the process, the search index is expected to contain only the crawled documents.
Starting June 2016
In this release, the overall rebuild process remains the same. However, the rebuild task that's launched through Sitecore now covers the whole rebuild process. As a result, Sitecore may report that the rebuild task takes longer to complete, even though the process takes the same time as in earlier releases.
The main benefit is that once the rebuild task is completed, you know that every crawled item is searchable. From an end-user point of view, this is when the rebuild task really ends.
More Precise Log Traces
Before this release, it was possible to identify when a rebuild started and ended, as well as any errors that occurred in the process. However, nothing indicated which steps took more time than others.
This release brings more precise log traces to help monitor rebuild tasks. The rebuild task is now divided into several parts, and the beginning and end of each part are clearly indicated in the logs. This makes it easy to see whether a specific part is taking longer than usual to execute.
Also, every trace contains the name of the source being rebuilt, which makes it easier to untangle traces when several search indexes are rebuilt at the same time.
Here is an example of the logged traces during a rebuild task:
ManagedPoolThread #8 10:44:24 INFO Job started: Index_Update_IndexName=Coveo_web_index
ManagedPoolThread #8 10:44:24 WARN The index Coveo_web_index has already been initialized.
...
ManagedPoolThread #8 10:44:49 INFO [Rebuilding source "YOUR COVEO SOURCE"] Rebuild started.
ManagedPoolThread #8 10:44:49 INFO [YOUR COVEO SOURCE] Synchronizing source...
ManagedPoolThread #8 10:44:49 INFO Total Field Count for Coveo_web_index: 202, Actual Field Count: 182
...
ManagedPoolThread #8 10:45:18 INFO [YOUR COVEO SOURCE] Source synchronized.
...
ManagedPoolThread #8 10:45:19 INFO [Permissions synchronization "Expanded Sitecore Security Provider for YOUR SITECORE INSTANCE"] Starting to send the permissions...
...
ManagedPoolThread #8 10:45:39 INFO [Rebuilding source "YOUR COVEO SOURCE"] Crawling Sitecore items...
...
ManagedPoolThread #8 10:46:01 INFO [Rebuilding source "YOUR COVEO SOURCE"] Sitecore items crawled.
ManagedPoolThread #8 10:46:01 INFO [Rebuilding source "YOUR COVEO SOURCE"] Finalizing rebuild...
ManagedPoolThread #8 10:46:01 INFO [Rebuilding source "YOUR COVEO SOURCE"] Waiting for items to be uploaded to Coveo Cloud...
...
ManagedPoolThread #8 10:46:04 INFO [Rebuilding source "YOUR COVEO SOURCE"] items are uploaded.
ManagedPoolThread #8 10:46:04 INFO [Rebuilding source "YOUR COVEO SOURCE"] Waiting for organization to be provisioned...
ManagedPoolThread #8 10:46:04 INFO [Rebuilding source "YOUR COVEO SOURCE"] Organization is provisioned.
ManagedPoolThread #8 10:46:05 INFO [Rebuilding source "YOUR COVEO SOURCE"] Waiting for items to be searchable...
...
ManagedPoolThread #8 10:46:15 INFO [Rebuilding source "YOUR COVEO SOURCE"] Committed items: 1023 / 2041
...
ManagedPoolThread #8 10:46:50 INFO [Rebuilding source "YOUR COVEO SOURCE"] Committed items: 2041 / 2041
ManagedPoolThread #8 10:46:50 INFO [Rebuilding source "YOUR COVEO SOURCE"] Items are searchable.
ManagedPoolThread #8 10:46:51 INFO [Rebuilding source "YOUR COVEO SOURCE"] Removing old items...
...
ManagedPoolThread #8 10:47:07 INFO [Rebuilding source "YOUR COVEO SOURCE"] 538 remaining items.
...
ManagedPoolThread #8 10:47:15 INFO [Rebuilding source "YOUR COVEO SOURCE"] 0 remaining items.
ManagedPoolThread #8 10:47:15 INFO [Rebuilding source "YOUR COVEO SOURCE"] Old items removed.
ManagedPoolThread #8 10:47:16 INFO [Rebuilding source "YOUR COVEO SOURCE"] Rebuild finished.
ManagedPoolThread #8 10:47:18 INFO Job ended: Index_Update_IndexName=Coveo_web_index (units processed: 2041)
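The start and end traces in the example above can be paired per source to measure how long each part took. The following Python sketch does this with a regular expression. The message strings are copied from this example; the names `TRACE`, `STEPS`, and `step_durations` are hypothetical, and traces from other versions may use different wording (for example, "documents" instead of "items"), so treat this as a starting point built on those assumptions rather than a supported tool.

```python
import re
from datetime import datetime

# Matches traces such as:
#   ManagedPoolThread #8 10:45:39 INFO [Rebuilding source "X"] Crawling Sitecore items...
#   ManagedPoolThread #8 10:44:49 INFO [X] Synchronizing source...
TRACE = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}) INFO '
    r'\[(?:Rebuilding source "(?P<src>[^"]+)"|(?P<plain>[^\]"]+))\] '
    r'(?P<msg>.+?)\s*$'
)

# (step, start message, end message), copied from the example traces above.
# Assumption: other versions may word these messages differently.
STEPS = [
    ("Rebuild", "Rebuild started.", "Rebuild finished."),
    ("Synchronizing source", "Synchronizing source...", "Source synchronized."),
    ("Crawling Sitecore items", "Crawling Sitecore items...", "Sitecore items crawled."),
    ("Uploading items", "Waiting for items to be uploaded to Coveo Cloud...", "items are uploaded."),
    ("Provisioning organization", "Waiting for organization to be provisioned...", "Organization is provisioned."),
    ("Waiting for items to be searchable", "Waiting for items to be searchable...", "Items are searchable."),
    ("Removing old items", "Removing old items...", "Old items removed."),
]
START = {start: step for step, start, _ in STEPS}
END = {end: step for step, _, end in STEPS}


def step_durations(lines):
    """Return {source: {step: seconds}} for every start/end pair found in the traces."""
    opened = {}
    durations = {}
    for line in lines:
        match = TRACE.search(line)
        if match is None:
            continue  # e.g. "Job started", field counts, permission traces
        source = match.group("src") or match.group("plain")
        when = datetime.strptime(match.group("time"), "%H:%M:%S")
        msg = match.group("msg")
        if msg in START:
            opened[(source, START[msg])] = when
        elif msg in END:
            began = opened.pop((source, END[msg]), None)
            if began is not None:
                # Traces carry no date; the modulo handles a step that crosses midnight.
                seconds = (when - began).total_seconds() % 86400
                durations.setdefault(source, {})[END[msg]] = seconds
    return durations
```

Passing the lines of a log file to `step_durations` returns durations in seconds keyed by source name, so the slowest part of each rebuild stands out even when several indexes are rebuilt at the same time.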
As you can see, the rebuild process is divided into several parts. Here are some details about each one.
Synchronizing source
This is when the Sitecore configuration is compared to the resources in the Coveo Platform. Sources, fields, and security providers in the Coveo Platform are modified as needed to match the Sitecore configuration.
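Conceptually, this comparison resembles a set difference between what the Sitecore configuration declares and what already exists on the platform. The Python sketch below models it for fields only. `plan_field_sync` and the dictionary shape are hypothetical, and this page doesn't say how Coveo for Sitecore handles platform fields that Sitecore doesn't declare, so the sketch only reports them.

```python
def plan_field_sync(sitecore_fields, platform_fields):
    """Hypothetical model: compare desired field definitions with existing ones.

    Both arguments map a field name to its settings, e.g. {"title": {"type": "string"}}.
    Returns the names to create, the names whose settings differ, and the names that
    exist on the platform but aren't declared in Sitecore (reported, not acted upon).
    """
    desired, existing = set(sitecore_fields), set(platform_fields)
    to_create = sorted(desired - existing)
    to_update = sorted(
        name for name in desired & existing
        if sitecore_fields[name] != platform_fields[name]
    )
    not_declared = sorted(existing - desired)
    return to_create, to_update, not_declared
```

Sources and security providers could be compared the same way; the sketch keeps to fields to stay short.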
Coveo for Sitecore (October 2016)
Sending permissions
All Sitecore permissions are sent to Coveo Cloud, so Coveo Cloud no longer needs to contact the Sitecore instance to access them.
Crawling Sitecore items
Using the crawlers configured on the search index, Sitecore iterates over a set of items and passes them to the search index using the Search Provider framework.
Waiting for items to be uploaded to Coveo Cloud
Because Coveo Cloud is an online service, items have to be uploaded before they can be indexed. The upload time depends on the number of items, the size of each item, and the bandwidth of your Internet connection.
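As a rough order-of-magnitude check, the upload time is at least the total payload size divided by the available upload bandwidth. The sketch below assumes a uniform average item size and ignores protocol overhead, compression, batching, and retries. The function name and the example figures (50 KB per item, 10 Mbit/s) are illustrative assumptions, not measured values.

```python
def estimated_upload_seconds(item_count, average_item_bytes, bandwidth_bits_per_second):
    """Rough lower bound: total payload size divided by available upload bandwidth.

    Ignores protocol overhead, compression, batching, and retries.
    """
    total_bits = item_count * average_item_bytes * 8
    return total_bits / bandwidth_bits_per_second


# Example: 2041 items averaging 50 KB each over a 10 Mbit/s uplink (assumed figures).
print(round(estimated_upload_seconds(2041, 50_000, 10_000_000), 2))  # 81.64
```

Doubling the average item size or halving the bandwidth doubles this estimate, which is why large items or a slow connection show up directly in this step's duration.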
Waiting for organization to be provisioned
A newly created Coveo organization takes a few minutes to be up and running, because the Coveo Cloud service has to deploy some components before the organization is fully functional. The rebuild process can't continue until the organization is ready.
Waiting for items to be searchable
This step validates that all items sent to the Coveo Platform have been committed and are searchable. It displays the number of committed items and the number of expected items. This step times out after 1 hour of inactivity.
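The progress traces shown in the example log (`Committed items: 1023 / 2041`) can be turned into a completion percentage. This is a hypothetical helper that assumes the trace wording shown on this page and that all lines passed in belong to a single source.

```python
import re

# Matches progress traces such as "... Committed items: 1023 / 2041".
# "documents" is accepted too, since the on-premises traces use that word.
COMMITTED = re.compile(r"Committed (?:items|documents): (\d+) / (\d+)")


def latest_commit_progress(lines):
    """Return (committed, expected, percent) from the last progress trace, or None."""
    latest = None
    for line in lines:
        match = COMMITTED.search(line)
        if match:
            latest = (int(match.group(1)), int(match.group(2)))
    if latest is None:
        return None
    committed, expected = latest
    percent = 100.0 * committed / expected if expected else 100.0
    return committed, expected, percent
```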
Removing old items
A rebuild operation replaces the source content with a new set of items, so the old items have to be removed. This step monitors the old items and ensures that they're removed. It times out after 5 minutes of inactivity.
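The inactivity timeouts described for these steps can be modeled as a polling loop whose timer restarts whenever progress is made. The sketch below is a hypothetical illustration of that behavior, not Coveo for Sitecore's implementation: `get_remaining` stands in for whatever reports the number of old items left, and the 300-second default mirrors the 5-minute timeout above. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time


def wait_until_zero(get_remaining, inactivity_timeout=300.0, poll_interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll a remaining-items counter until it reaches zero.

    The inactivity timer restarts whenever the count decreases; if the count stays
    flat for longer than `inactivity_timeout` seconds, give up and return False.
    """
    last_count = None
    last_progress = clock()
    while True:
        remaining = get_remaining()
        now = clock()
        if remaining == 0:
            return True
        if last_count is None or remaining < last_count:
            # Progress was made (or this is the first reading): restart the timer.
            last_count = remaining
            last_progress = now
        elif now - last_progress > inactivity_timeout:
            return False
        sleep(poll_interval)
```

Measuring inactivity rather than total elapsed time lets a large but steadily shrinking backlog finish, while a stalled one is reported after a bounded wait.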
Before June 2016
In this release, the Sitecore rebuild task completes when all crawled items have been sent to the Coveo Platform, even though the Coveo Platform hasn't processed them yet. Every crawled item eventually becomes searchable and every old item is eventually removed, but nothing indicates the exact moment at which the rebuild process is completed from an end-user point of view.
Here are the steps that Sitecore performs in the rebuild task:
- Signals the rebuild start to the Coveo Platform.
- Synchronizes the source, fields, security provider, etc. with the Coveo Platform.
- Iterates over all items to index. For each item, it:
  - Translates the Sitecore item into a format that the Coveo Platform understands.
  - Sends the item to the Coveo Platform.
- Signals the rebuild end to the Coveo Platform and removes the old items.
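The steps above can be sketched as follows to highlight why the task finishes before the items are searchable: nothing after the end signal waits for the platform. `RecordingPlatform` and `legacy_rebuild` are hypothetical stand-ins for illustration only; they don't correspond to actual Coveo for Sitecore classes.

```python
class RecordingPlatform:
    """Hypothetical stand-in for the Coveo Platform that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def signal_rebuild_start(self, source):
        self.calls.append(("start", source))

    def synchronize(self, source):
        self.calls.append(("sync", source))

    def send(self, source, document):
        self.calls.append(("send", document))

    def signal_rebuild_end(self, source):
        # Old-item removal is modeled as part of the end signal.
        self.calls.append(("end", source))


def legacy_rebuild(source, items, platform, translate):
    """Follow the listed steps; return as soon as the end has been signaled."""
    platform.signal_rebuild_start(source)
    platform.synchronize(source)  # source, fields, security provider, etc.
    for item in items:
        platform.send(source, translate(item))
    platform.signal_rebuild_end(source)
    # Nothing here waits for items to become searchable or old items to disappear.
```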
This is where the Sitecore rebuild task ends. Anything that happens afterward isn't monitored by Sitecore and may take a few minutes to process.
The index transactions are then applied, and the items become searchable. The removed items are no longer searchable.
Starting July 2016
In this release, the overall rebuild process remains the same. However, the rebuild task that's launched through Sitecore now covers the whole rebuild process. As a result, Sitecore may report that the rebuild task takes longer to complete, even though the process takes the same time as in earlier releases.
The main benefit is that once the rebuild task is completed, you know that every crawled item is searchable. From an end-user point of view, this is when the rebuild task really ends.
More Precise Log Traces
Before this release, it was possible to identify when a rebuild started and ended, as well as any errors that occurred in the process. However, nothing indicated which steps took more time than others.
This release brings more precise log traces to help monitor rebuild tasks. The rebuild task is now divided into several parts, and the beginning and end of each part are clearly indicated in the logs. This makes it easy to see whether a specific part is taking longer than usual to execute.
Also, every trace contains the name of the source being rebuilt, which makes it easier to untangle traces when several search indexes are rebuilt at the same time.
Here is an example of the logged traces during a rebuild task:
ManagedPoolThread #8 10:44:24 INFO Job started: Index_Update_IndexName=Coveo_web_index
ManagedPoolThread #8 10:44:24 WARN The index Coveo_web_index has already been initialized.
...
ManagedPoolThread #8 10:44:49 INFO [Rebuilding source "YOUR COVEO SOURCE"] Rebuild started.
ManagedPoolThread #8 10:44:49 INFO [YOUR COVEO SOURCE] Synchronizing source...
ManagedPoolThread #8 10:44:49 INFO Total Field Count for Coveo_web_index: 202, Actual Field Count: 182
...
ManagedPoolThread #8 10:45:18 INFO [YOUR COVEO SOURCE] Source synchronized.
...
ManagedPoolThread #8 10:45:39 INFO [Rebuilding source "YOUR COVEO SOURCE"] Crawling Sitecore items...
...
ManagedPoolThread #8 10:46:01 INFO [Rebuilding source "YOUR COVEO SOURCE"] Sitecore items crawled.
ManagedPoolThread #8 10:46:01 INFO [Rebuilding source "YOUR COVEO SOURCE"] Finalizing rebuild...
ManagedPoolThread #8 10:46:02 INFO [Rebuilding source "YOUR COVEO SOURCE"] Waiting for documents to be searchable...
...
ManagedPoolThread #8 10:46:12 INFO [Rebuilding source "YOUR COVEO SOURCE"] Committed documents: 1023 / 2041
...
ManagedPoolThread #8 10:46:50 INFO [Rebuilding source "YOUR COVEO SOURCE"] Committed documents: 2041 / 2041
ManagedPoolThread #8 10:46:50 INFO [Rebuilding source "YOUR COVEO SOURCE"] Documents are searchable.
ManagedPoolThread #8 10:46:51 INFO [Rebuilding source "YOUR COVEO SOURCE"] Removing old documents...
...
ManagedPoolThread #8 10:47:07 INFO [Rebuilding source "YOUR COVEO SOURCE"] 538 remaining documents.
...
ManagedPoolThread #8 10:47:15 INFO [Rebuilding source "YOUR COVEO SOURCE"] 0 remaining documents.
ManagedPoolThread #8 10:47:15 INFO [Rebuilding source "YOUR COVEO SOURCE"] Old documents removed.
ManagedPoolThread #8 10:47:16 INFO [Rebuilding source "YOUR COVEO SOURCE"] Rebuild finished.
ManagedPoolThread #8 10:47:17 INFO Job ended: Index_Update_IndexName=Coveo_web_index (units processed: 2041)
As you can see, the rebuild process is divided into several parts. Here are some details about each one.
Synchronizing source
This is when the current Sitecore configuration is compared to the resources in CES. Sources, fields, and security providers in CES are modified as needed to match the Sitecore configuration.
Crawling Sitecore items
Using the crawlers configured on the search index, Sitecore iterates over a set of items and passes them to the search index using the Search Provider framework.
Waiting for documents to be searchable
This step validates that all documents sent to CES have been committed and are searchable. It displays the number of committed documents and the number of expected documents. This step times out after 1 hour of inactivity.
Removing old documents
A rebuild operation replaces the source content with a new set of documents, so the old documents have to be removed. This step monitors the old documents and ensures that they're removed. It times out after 5 minutes of inactivity.
Before July 2016
The Sitecore rebuild task is completed when all crawled documents have been sent to the Coveo Platform. Every crawled document eventually becomes searchable and every old document is eventually removed, but nothing indicates the exact moment at which the rebuild process is completed from an end-user point of view.
Here are the steps that Sitecore performs in the rebuild task:
- Signals the rebuild start to the Coveo Platform.
- Synchronizes the source, fields, security provider, etc. with CES.
- Iterates over all items to index. For each item, it:
  - Translates the Sitecore item into a format that CES understands.
  - Sends the document to CES.
- Signals the rebuild end to CES and removes the old documents.
This is where the Sitecore rebuild task ends. Anything that happens afterward isn't monitored by Sitecore and may take a few minutes to process.
The index transactions are then applied, and the documents become searchable. The removed documents aren’t searchable anymore.
You can monitor the rebuild state on the Administration Tool Overview page, which shows the number of pending transactions and the number of documents enlisted in those transactions. If needed, you can also manually force pending transactions to be applied.