Coveo On-Premises Crawling Module
This article applies to the new Crawling Module, which works without Docker. If you still use the Crawling Module with Docker, see Coveo On-Premises Crawling Module (Docker Version) instead. You might also want to read about the advantages of the new Crawling Module.
Versions > 1: new Crawling Module
Versions < 1: Crawling Module with Docker
The Coveo On-Premises Crawling Module allows you to index on-premises content in order to make it searchable in a Coveo-powered search interface. Typical Crawling Module users are customers who can’t open an inbound port in their firewall for the Coveo cloud-hosted connectors to access their on-premises content.
If you can open a port in your firewall to let a cloud-hosted connector access your on-premises content, you don’t need to install the Crawling Module. Instead, you can create On-Premises sources in the Coveo Cloud Administration Console. See the connector documentation for detailed instructions.
The cloud-hosted Coveo Cloud connectors pull content from cloud and on-premises secured enterprise systems to make it searchable. The Coveo On-Premises Crawling Module, however, runs outside of the Coveo Platform. It pulls your content from your on-premises systems, and then sends it to a Coveo Cloud push-type source, which serves as an intermediary to index your data.
Once the Coveo On-Premises Crawling Module is deployed on your Windows server, all communications are outbound, and no inbound ports to your secured enterprise system are required. Nevertheless, you can manage a Coveo Cloud source fed by the Crawling Module just like you would manage a cloud source.
The following on-premises content can be indexed using the Coveo On-Premises Crawling Module:
The Coveo On-Premises Crawling Module has three components:
Maestro, software that manages and monitors your local workers and the State Store.
One or more workers, which are responsible for executing content update tasks requested by the Coveo Platform. Each worker can only handle one task at a time, so you may need more than one, depending on the content to index. See Number of Workers for details.
The State Store, which stores information regarding the last update operations, such as the source state and the URI of indexed items. As a result, the workers know what has been indexed during the last update operation, and therefore what needs to be indexed next time.
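Because each worker handles only one task at a time, the number of workers limits how many update tasks can run in parallel. The following is a minimal sketch of that constraint, not the Crawling Module's actual scheduler: the function name `total_time`, the task durations, and the first-free-worker assignment are all assumptions made for illustration.

```python
import heapq


def total_time(task_minutes, worker_count):
    """Return the minutes until all tasks finish when each worker runs one task at a time.

    Hypothetical model: tasks are handed, in order, to whichever worker is free first.
    """
    if worker_count < 1:
        raise ValueError("at least one worker is required")
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * worker_count
    heapq.heapify(free_at)
    for minutes in task_minutes:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + minutes)
    return max(free_at)


tasks = [30, 30, 30, 30]  # four hypothetical update tasks of 30 minutes each
print(total_time(tasks, 1))  # 120
print(total_time(tasks, 2))  # 60
```

In this simplified model, doubling the workers halves the wait only because the tasks are equal in length; real update tasks vary, so the actual benefit depends on the content to index.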
The Coveo On-Premises Crawling Module indexing workflow is the following:
The State Store provides the worker with information regarding the last update operation. The worker uses this information to execute the update task.
The worker crawls the content and updates the State Store information.
The worker provides the Push API with the changes that have been made since the last update operation. The worker authenticates with the API key Maestro received at the end of its installation process.
The Push API indexes the received changes so that the content in your Coveo-powered search interface reflects your actual on-premises data. See Coveo Cloud Indexing Pipeline for details on the indexing process.
Should you make changes to the Crawling Module configuration, Maestro applies them to the workers and the State Store.
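The workflow above can be sketched as a small incremental-update loop. This is an illustration under stated assumptions, not the Crawling Module's implementation: `StateStore`, `run_update`, the content fingerprints, and the `push` callable are hypothetical stand-ins, and no real Push API call is made.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical in-memory stand-in: maps item URI -> content fingerprint."""
    items: dict = field(default_factory=dict)


def run_update(state_store, crawled, push):
    """Compare crawled content with the stored state and push only the changes.

    crawled: dict mapping URI -> fingerprint of the item as it exists now.
    push: callable receiving (added_or_changed_uris, deleted_uris); a stub
          standing in for the calls a worker would make to the Push API.
    """
    previous = state_store.items  # state from the last update operation
    changed = sorted(uri for uri, fp in crawled.items() if previous.get(uri) != fp)
    deleted = sorted(uri for uri in previous if uri not in crawled)
    state_store.items = dict(crawled)  # record what has now been crawled
    if changed or deleted:
        push(changed, deleted)  # send only what differs from the last update
    return changed, deleted


store = StateStore({"file://a": "v1", "file://b": "v1"})
sent = []
run_update(store, {"file://a": "v2", "file://c": "v1"}, lambda c, d: sent.append((c, d)))
print(sent)  # [(['file://a', 'file://c'], ['file://b'])]
```

Running the same update a second time with unchanged content pushes nothing, which is the point of keeping the last known state locally.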