Understanding and Customizing the Binary Data Indexing Process

On-Premises only

Sitecore items that contain binary data are indexed slightly differently from other items in Coveo indexes. This page explains what happens when binary data is indexed and how you can customize the process.

Understanding the Default Indexing Process of Binary Data

When an item is indexed, the following events occur in order:

  1. The Coveo Search Provider fetches the item from the database.
  2. The Coveo Search Provider configures the item with the required metadata and fields.
  3. The Coveo Search Provider adds the binary data-related properties using the BinaryDataPropertiesWriter specified in the index configuration.
  4. The Coveo Search Provider pushes the Sitecore item to the RabbitMQ queue.
  5. The Queue Crawler fetches the item stored in the RabbitMQ queue.
  6. The Queue Crawler determines whether it has to retrieve the item’s binary data.
  7. If the binary data must be retrieved, the Queue Crawler sends a request to the Sitecore Web Service for the data.
  8. The Sitecore Web Service retrieves the data and sends it to the Queue Crawler.
  9. The Queue Crawler indexes the item along with the binary data.

Customizing the Default Process

Send the Binary Data to RabbitMQ

The binary data can be sent along with the item’s metadata to the RabbitMQ queue. To do this:

  1. In your Coveo.SearchProvider.config file, locate and copy the contentSearch node.
  2. In your Coveo.SearchProvider.Custom.config file, paste the contentSearch node. You may now close your Coveo.SearchProvider.config file.
  3. Locate the following index configuration, which you just copied into your file.

     <index id="Coveo_web_index" type="Coveo.SearchProvider.ProviderIndex, Coveo.SearchProvider">
       <param desc="p_Name">$(id)</param>
     ...
    
  4. Add the BinaryDataPropertiesWriter node, which sends the binary data to RabbitMQ, as in the example below.

     <index id="Coveo_web_index" type="Coveo.SearchProvider.ProviderIndex, Coveo.SearchProvider">
       <BinaryDataPropertiesWriter type="Coveo.SearchProvider.Documents.BinaryDataPropertiesWriter.BinaryDataInQueuePropertiesWriter, Coveo.SearchProviderBase" />
       <param desc="p_Name">$(id)</param>
     ...
    
  5. Save the file. The binary data is now sent directly to RabbitMQ, so the Queue Crawler no longer has to download it.
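
To show where the edited node ends up, here is a rough sketch of the nesting in Coveo.SearchProvider.Custom.config. It assumes the usual Sitecore include-file layout (configuration > sitecore > contentSearch > configuration > indexes), which this page doesn't show. Attributes on the wrapper elements and all other child nodes are left out, so keep whatever your copied contentSearch node already contains rather than pasting this over it.

     <configuration>
       <sitecore>
         <contentSearch>
           <!-- Wrapper elements follow an assumed layout; their attributes are omitted here. -->
           <configuration>
             <indexes>
               <index id="Coveo_web_index" type="Coveo.SearchProvider.ProviderIndex, Coveo.SearchProvider">
                 <!-- Added in step 4: sends the binary data to RabbitMQ along with the item. -->
                 <BinaryDataPropertiesWriter type="Coveo.SearchProvider.Documents.BinaryDataPropertiesWriter.BinaryDataInQueuePropertiesWriter, Coveo.SearchProviderBase" />
                 <param desc="p_Name">$(id)</param>
                 <!-- Remaining nodes copied from Coveo.SearchProvider.config stay unchanged. -->
               </index>
             </indexes>
           </configuration>
         </contentSearch>
       </sitecore>
     </configuration>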

Compress the Binary Data Sent to the RabbitMQ Queue

When you send binary data to the queue, large messages are compressed. By default, compression applies to messages that include more than 10 MB of binary data. To change this threshold, edit the QueueCompressionThresholdInBytes setting in the Coveo.SearchProvider.Custom.config file. The value is specified in bytes.

If you can’t find the setting in your Coveo.SearchProvider.Custom.config file, copy it from the Coveo.SearchProvider.config file and paste it into the custom file. To avoid upgrade issues, we recommend that you don’t modify the Coveo.SearchProvider.config file.

<QueueCompressionThresholdInBytes>5000000</QueueCompressionThresholdInBytes>
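
As a worked example of the byte conversion, suppose you want compression to start only above 20 MB. Counting one megabyte as 1,000,000 bytes (an assumption, consistent with the round value in the sample above but not stated on this page), the threshold is 20 × 1,000,000 = 20,000,000 bytes:

     <!-- Illustrative value only: 20 MB expressed as 20 x 1,000,000 bytes. -->
     <QueueCompressionThresholdInBytes>20000000</QueueCompressionThresholdInBytes>

A higher threshold means fewer messages are compressed and a lower one means more are, so choose the value based on the size of the binary files you typically index.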