Crawling Module Requirements
Coveo Cloud License
To use the Coveo On-Premises Crawling Module, a valid Coveo Cloud Enterprise edition license is required (see Manage Coveo Cloud Organization Settings). The Crawling Module is not available with the Coveo Cloud Pro edition.
Moreover, your Coveo Cloud license must include the Crawling Module connectors you want to use (see Supported Content and Content Retrieval Methods). Contact the Coveo Sales team if your license does not yet include Crawling Module connectors.
To check whether your Coveo Cloud license includes the desired Crawling Module connectors:
In the navigation menu, click Sources.
In the Sources page, click Add Source (see Adding and Managing Sources).
In the Add a Source of Supported Content panel, look for the desired connector with a Crawling Module tag (see Add a Source Panel and Content Retrieval Methods). If the Crawling Module connector is grayed out and tagged Unavailable, your license does not include it. In that case, contact the Coveo Sales team to upgrade your license.
The Coveo On-Premises Crawling Module must be installed on a server that has the following software installed:
- Docker Enterprise Edition is included in the Windows Server 2016 license and therefore comes with support through Microsoft.
If you prefer to install the Crawling Module on a virtual machine, Coveo provides installation steps for the following setups:
Coveo cannot guarantee that Docker Enterprise Edition will install successfully on a different virtual machine setup (see Installing Docker). See Validating the Installation to ensure Docker works correctly before installing the Crawling Module.
- Any server running the Crawling Module must have access to the content you want to index, regardless of whether the server is a virtual machine.
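As a quick pre-check before installing the Crawling Module, you may want to confirm that Docker responds on the server. The following is a minimal sketch, not the official validation procedure: it assumes the `docker` CLI is on the `PATH` and relies on `docker version` exiting with a non-zero code when the client cannot reach the Docker daemon. The function name `docker_responds` is hypothetical.

```python
import shutil
import subprocess


def docker_responds(timeout_s: int = 30) -> bool:
    """Return True if the Docker CLI is found and `docker version` succeeds.

    Hypothetical helper: this is only a smoke test, not a replacement for
    the steps in "Validating the Installation".
    """
    if shutil.which("docker") is None:
        return False  # Docker CLI is not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "version"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # CLI could not be started or did not answer in time
    # A non-zero exit code usually means the daemon is not reachable.
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker responds:", docker_responds())
```

A `False` result only indicates that something needs attention; the documented validation steps remain the reference.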
Since the Crawling Module uses executable files (.exe extension), antivirus software may block its operation. You should therefore ensure that the antivirus software on the server where you intend to install the Crawling Module does not hinder Crawling Module activities. Because of the large number of antivirus products available, Coveo cannot provide directives for all of them. However, the generic guidelines and directives below are based on previously solved issues. Additional or different product-specific steps may be required depending on your environment and your antivirus software.
Before installing Docker, you should ensure that it supports the antivirus software running on your machine (see Endpoint security for Windows containers). Unsupported products or versions could cause issues when creating or launching Docker containers.
Furthermore, to ensure that your antivirus software does not flag Crawling Module executable files as potential threats, add the following directories to the antivirus software exclusion list:
Windows containers are not supported on hosts running McAfee On-Access Scan (see Windows and Linux container support with McAfee products).
Symantec Endpoint Protection requires additional exclusions for Docker containers to launch (see Endpoint Protection interfering with Docker containers on Windows Server 2016).
Moreover, the Symantec Proactive Threat Protection module is not supported and must be deactivated.
When encountering problems, the Symantec Technical Support team may ask you to enable Vpdebug Logging so that they can gather data on your issue (see How to enable “Vpdebug Logging” on Symantec Endpoint Protection).
To determine your server hardware requirements, you must estimate the number of workers the Crawling Module should have based on the size and update schedule of your content sources.
CPU and RAM
CPU and RAM requirements are based on the number of workers you need. As a guideline, consider that a Crawling Module instance with 4 to 6 workers typically requires 4 CPUs and 16 GB of RAM.
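The guideline above can be turned into a simple check of a candidate server. This is a sketch under stated assumptions: the constants only cover the 4 to 6 worker case quoted in the text, the function name `meets_guideline` is hypothetical, and the RAM figure must be supplied by hand because the Python standard library has no portable call to read installed memory.

```python
import os

# Guideline from the text: an instance with 4 to 6 workers
# typically requires 4 CPUs and 16 GB of RAM.
GUIDELINE_CPUS = 4
GUIDELINE_RAM_GB = 16


def meets_guideline(cpus: int, ram_gb: float) -> bool:
    """Return True if the host meets the 4-to-6-worker guideline."""
    return cpus >= GUIDELINE_CPUS and ram_gb >= GUIDELINE_RAM_GB


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs and may return None.
    detected_cpus = os.cpu_count() or 0
    installed_ram_gb = 16  # assumption: replace with the server's actual RAM
    print("Logical CPUs detected:", detected_cpus)
    print("Meets guideline:", meets_guideline(detected_cpus, installed_ram_gb))
```

The text gives no figures for other worker counts, so the sketch does not extrapolate beyond this one data point.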
When you install the Crawling Module on your server, you must select the disk on which you want to deploy it. The required size of this disk depends on the number of items you intend to index, among other things. As a rule of thumb, consider that 10 million items require about 10 GB of Crawling Module database storage (see Coveo On-Premises Crawling Module Workflow). This value scales linearly with the number of items.
Moreover, you need disk space for Docker images, Maestro, and component logs. The entire Coveo On-Premises Crawling Module solution therefore requires 500 GB of disk space.
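The database rule of thumb is a linear relation, which can be sketched as follows. The function names are hypothetical, the ratio is only the approximate figure quoted above (about 10 GB per 10 million items), and the sketch assumes decimal gigabytes (10^9 bytes) when reading the disk size.

```python
import shutil

# Rule of thumb from the text: about 10 GB of database storage per 10 million items.
REFERENCE_ITEMS = 10_000_000
REFERENCE_GB = 10
# Disk space required for the entire solution, per the text.
REQUIRED_TOTAL_GB = 500


def estimate_db_storage_gb(item_count: int) -> float:
    """Linear estimate of Crawling Module database storage, in GB."""
    return item_count * REFERENCE_GB / REFERENCE_ITEMS


def disk_total_gb(path: str) -> float:
    """Total size of the disk holding `path`, in decimal GB (assumption)."""
    return shutil.disk_usage(path).total / 10**9


if __name__ == "__main__":
    items = 25_000_000  # example value, not from the source
    print("Estimated database storage (GB):", estimate_db_storage_gb(items))
    # "." stands in for the disk you plan to deploy on.
    print("Disk is large enough:", disk_total_gb(".") >= REQUIRED_TOTAL_GB)
```

The estimate covers database storage only; Docker images, Maestro, and component logs are accounted for by the overall 500 GB requirement.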
Due to the MySQL version that the Crawling Module uses, the server on which you install the Crawling Module must have TLS 1.1 and 1.2 communication channels open.
IP Address Whitelist
If your environment restricts communications, make sure to whitelist the IP addresses that the Crawling Module uses (see IP Addresses to Whitelist).
Once you have ensured your environment meets all the above requirements, you can proceed with the Crawling Module deployment (see Coveo On-Premises Crawling Module Deployment Overview).