Coveo On-Premises Crawling Module (Docker Version)

This article applies to the old Crawling Module, which works with Docker. If you are using the new Crawling Module, see Coveo On-Premises Crawling Module instead.

The old Crawling Module reached its end of life on December 31, 2020. We recommend switching to the new Crawling Module, which doesn’t require Docker.

To identify which Crawling Module you’re currently using, go to the Crawling Modules page of the Coveo Administration Console and check the version reported by Maestro:

  • Versions > 1: new Crawling Module

  • Versions < 1: Crawling Module with Docker

The Coveo On-Premises Crawling Module lets you crawl on-premises content to make it searchable in a Coveo-powered search page (see Supported Content). It's typically used by customers who can’t open an inbound port in their firewall to let the Coveo cloud-hosted crawlers access their on-premises content.

If you want to crawl on-premises content and can open a port in your firewall to let a Coveo cloud-hosted connector access your content, see the Coveo Administration Console documentation.

The Coveo Cloud crawlers are hosted in the cloud and pull content from cloud and on-premises secured enterprise systems to make it searchable. The Coveo On-Premises Crawling Module, however, runs outside of the Coveo Cloud environment. It pulls content from your on-premises systems, and then sends it to a push-type source, which serves as an intermediary to index the data.

Once the Coveo On-Premises Crawling Module is deployed on your Windows machine, all communications are outbound, and no inbound ports to your secured enterprise system are required (see Deployment Overview). A Coveo Cloud source fed by the Coveo On-Premises Crawling Module still supports the same features as a cloud source (see Sources Page).

  • In March 2020, Coveo Cloud will start enforcing TLS 1.2 for all client communications. If you want to confirm that your Coveo On-Premises Crawling Module instance will transition to TLS 1.2 seamlessly, see TLS 1.1 Deprecation.

  • In version 8.0.4780 of Maestro, Coveo will introduce a Crawling Module security update requiring your host server to run Windows Server 2016 version 14393.3504 or greater. See this knowledge base article for details.

Supported Content

The following on-premises content can be crawled using the Coveo On-Premises Crawling Module:

Your Coveo Cloud license must include Crawling Module connectors (see Crawling Module Requirements).


The Coveo On-Premises Crawling Module consists of three components:

  • Maestro, software that manages your local workers and database. Maestro also acts as a bridge between the workers and database on one side, and the Coveo Platform on the other.

  • The MySQL Coveo Database, which stores information regarding the last update operations, such as the source state and the URI of indexed items. As a result, the workers know what content was crawled during the last crawling operation, and therefore what needs to be crawled during the next update operation.

  • One or more workers, which are responsible for executing content update tasks requested by the Coveo Platform (see Refresh, Rescan, and Rebuild). Each worker can only handle one task at a time, so you may need more than one, depending on your content (see Number of Workers).
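
Because each worker handles a single task at a time, the number of workers determines how long pending update tasks wait. The following Python sketch illustrates this with a simplified model. It is not Coveo code: the function name `simulate_workers`, the task durations, and the assumption that all tasks are due at the same time and picked up in order are all hypothetical.

```python
import heapq


def simulate_workers(task_durations, worker_count):
    """Return the finish time of each task when every worker runs one task at a time.

    Hypothetical model: all tasks are due at time 0 and are picked up in order
    by whichever worker becomes available first.
    """
    if worker_count < 1:
        raise ValueError("at least one worker is required")
    # Min-heap of the times at which each worker becomes available.
    free_at = [0] * worker_count
    heapq.heapify(free_at)
    finish_times = []
    for duration in task_durations:
        start = heapq.heappop(free_at)  # next available worker
        end = start + duration
        finish_times.append(end)
        heapq.heappush(free_at, end)    # worker is busy until the task ends
    return finish_times


# Three update tasks of 30 minutes each (made-up durations).
print(simulate_workers([30, 30, 30], worker_count=1))
print(simulate_workers([30, 30, 30], worker_count=3))
```

In this model, a single worker finishes the three tasks one after the other, while three workers finish them all in parallel. Real update durations depend on your content, so treat the numbers as placeholders.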


The Coveo On-Premises Crawling Module indexing workflow is as follows:

  1. Maestro authenticates to a Coveo organization and receives an API key.

  2. Maestro provides the API key to the Crawling Module workers.

  3. The Crawling Module workers periodically poll the Coveo Platform for source update tasks (see Refresh, Rescan, and Rebuild). When an update is due, the next available worker will execute it (see Number of Workers).

  4. The Coveo Database provides the worker with information regarding the last update operation. The worker uses this information to execute the update task.

  5. The worker crawls the content and provides the Push API with the changes that have been made since the last update operation. The worker authenticates with the API key received from Maestro.

  6. The Push API indexes the received changes so that the content in your Coveo-powered search page reflects your actual on-premises data.
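
The steps above can be sketched as a small in-memory simulation in Python. This is an illustration under stated assumptions, not the actual Crawling Module implementation: `LocalDatabase`, `PushEndpoint`, `compute_changes`, and `run_update_task` are hypothetical names, the API key is a placeholder string, and no real Coveo API is called. The sketch focuses on steps 4 and 5, where the worker compares the previously indexed state with the freshly crawled content and sends only the differences.

```python
from dataclasses import dataclass, field


@dataclass
class LocalDatabase:
    """Hypothetical stand-in for the local database (last known state per source)."""
    indexed: dict = field(default_factory=dict)  # source name -> {uri: fingerprint}


@dataclass
class PushEndpoint:
    """Hypothetical stand-in for the push-type source; it only records what it receives."""
    received: list = field(default_factory=list)

    def push(self, api_key, source, changes):
        if not api_key:
            raise PermissionError("missing API key")
        self.received.append((source, changes))


def compute_changes(previous, current):
    """Compare the state stored after the last update with the freshly crawled state."""
    return {
        "added": sorted(uri for uri in current if uri not in previous),
        "updated": sorted(
            uri for uri in current if uri in previous and previous[uri] != current[uri]
        ),
        "deleted": sorted(uri for uri in previous if uri not in current),
    }


def run_update_task(source, crawl, database, endpoint, api_key):
    """Execute one update task the way a worker would in this simplified model."""
    previous = database.indexed.get(source, {})  # step 4: read the last known state
    current = crawl()                            # step 5: crawl the content
    changes = compute_changes(previous, current)
    endpoint.push(api_key, source, changes)      # step 5: send changes with the API key
    database.indexed[source] = dict(current)     # remember the state for the next update
    return changes


database = LocalDatabase()
endpoint = PushEndpoint()
api_key = "placeholder-key"  # steps 1 and 2: obtained by Maestro, passed to the worker

print(run_update_task("files", lambda: {"file://a": "v1", "file://b": "v1"},
                      database, endpoint, api_key))
print(run_update_task("files", lambda: {"file://a": "v2", "file://c": "v1"},
                      database, endpoint, api_key))
```

The first run reports both items as added, since nothing was stored yet. The second run reports one added, one updated, and one deleted item, which mirrors how the stored state lets a worker limit the next update to what changed.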

Crawling Module Workflow

  • To make deployment easier and increase worker scalability, your workers and local database are inside Docker containers. This ensures that they run smoothly, regardless of your environment configuration.

  • The workflow above doesn’t cover indexing the permissions associated with your secured content. If you want to index secured content and take access permissions into account, contact Coveo Support.

  • You can install the Crawling Module on several servers if your content requires it (see Deploying Multiple Crawling Module Instances).

What’s Next?

To deploy the Crawling Module, see Coveo On-Premises Crawling Module Deployment Overview.
