Crawling Module requirements
Your Coveo license must also include the connectors you want to use. See Supported Content for details on what the Crawling Module can index. These connectors must allow the Crawling Module as a content retrieval method.
To check whether your Coveo license includes the desired Crawling Module connectors:
In the navigation menu, select Sources.
In the Add a Source of Supported Content panel, select the desired source. If the source has more than one content retrieval method, you must select the Crawling Module option. If the Crawling Module source is grayed out, your license doesn’t include it. You must then contact the Coveo Sales team to upgrade your license.
We recommend that you install the Coveo On-Premises Crawling Module on a server running Windows Server 2019. Windows Server 2016 is also supported.
Your server must also host the repository to index or have access to the server on which this repository is located.
By default, after its deployment, the Crawling Module has four content workers and four security workers.
CPU and RAM
CPU and RAM requirements are based on the number of workers, although the type of source you create, the source size, the item size, and other factors can affect your actual needs. As a rule of thumb, consider that each worker typically requires at least:
- 0.5 vCPU (virtual CPU) or 0.5 CPU (physical CPU core)
- 2 GB RAM
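The rule of thumb above can be turned into a quick estimate. The following is a minimal sketch, not an official Coveo sizing tool: the function and class names are hypothetical, and it assumes the per-worker figures apply equally to content workers and security workers. It doesn't account for the operating system or other software running on the server.

```python
import math
from dataclasses import dataclass

# Per-worker minimums taken from the rule of thumb above.
VCPU_PER_WORKER = 0.5
RAM_GB_PER_WORKER = 2


@dataclass
class WorkerSizing:
    workers: int
    min_vcpus: int
    min_ram_gb: int


def estimate_worker_sizing(content_workers: int = 4,
                           security_workers: int = 4) -> WorkerSizing:
    """Return the minimum vCPU and RAM suggested by the rule of thumb.

    Defaults match the default deployment (four content workers and
    four security workers). Assumption: both worker types have the
    same per-worker requirements.
    """
    if content_workers < 0 or security_workers < 0:
        raise ValueError("worker counts must not be negative")
    workers = content_workers + security_workers
    return WorkerSizing(
        workers=workers,
        # Round up, since a fraction of a vCPU can't be provisioned.
        min_vcpus=math.ceil(workers * VCPU_PER_WORKER),
        min_ram_gb=workers * RAM_GB_PER_WORKER,
    )


if __name__ == "__main__":
    print(estimate_worker_sizing())
```

With the default eight workers, this sketch yields a minimum of 4 vCPUs and 16 GB of RAM. Treat the result as a lower bound, since source type, source size, and item size can raise your actual needs.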
When you install the Crawling Module on your server, you must select the disk on which you want to deploy it. Coveo recommends starting with at least 100 GB of disk space. This disk space is mostly used by the State Store and logs.
Your actual required disk space might be higher depending on the number of items to index, as well as their type and size. As a rule of thumb, consider that 10 million items require at least 10 GB of State Store storage. However, complex sources such as SharePoint could require at least 20 GB per 10 million items.
The number of logs you have and your retention schedule may also impact your disk space requirements. Typically, the more workers you have, the more logs will be saved on your disk over time.
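To illustrate the State Store rule of thumb, here is a minimal sketch that estimates State Store storage from item counts. The function names are hypothetical, and the estimate covers only the State Store: log volume depends on your worker count and retention schedule and isn't modeled here.

```python
from typing import Iterable, Tuple

# Rule-of-thumb figures from the text above.
GB_PER_10M_ITEMS_TYPICAL = 10
GB_PER_10M_ITEMS_COMPLEX = 20   # for example, SharePoint sources
RECOMMENDED_STARTING_DISK_GB = 100


def estimate_state_store_gb(item_count: int,
                            complex_source: bool = False) -> float:
    """Estimate the minimum State Store storage, in GB, for one source."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    rate = GB_PER_10M_ITEMS_COMPLEX if complex_source else GB_PER_10M_ITEMS_TYPICAL
    return item_count / 10_000_000 * rate


def total_state_store_gb(sources: Iterable[Tuple[int, bool]]) -> float:
    """Sum the estimates for (item_count, complex_source) pairs."""
    return sum(estimate_state_store_gb(count, is_complex)
               for count, is_complex in sources)


if __name__ == "__main__":
    # Hypothetical deployment: one typical source and one complex source.
    needed = total_state_store_gb([(25_000_000, False), (5_000_000, True)])
    print(f"Estimated State Store storage: {needed} GB")
    print(f"Recommended starting disk: {RECOMMENDED_STARTING_DISK_GB} GB")
```

If the estimate approaches the recommended 100 GB starting point, consider provisioning a larger disk so that logs still have room to grow.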
Rebuild operations require the most bandwidth, as they retrieve the entire source content and forward it to your Coveo index. The type and size of your indexed items also impact your bandwidth needs. Ideally, your network speed shouldn’t limit your content updates. Therefore, the more bandwidth you have, the better.
If your environment restricts outgoing communications, make sure to allow the IP addresses that the Crawling Module uses.
Moreover, ensure that outbound communication from the Crawling Module server is not blocked or slowed down by SSL decryption, antivirus scans, firewalls, Splunk, endpoint protection, etc.
Since the Crawling Module uses executable files (.exe extension), antivirus software can block its operation. You should therefore ensure that the antivirus software on the server where you intend to install the Crawling Module doesn’t hinder Crawling Module activities.
Furthermore, to ensure that your antivirus software doesn’t flag Crawling Module executable files as potential threats, you should add the following to the antivirus software exclusion list:
- Your data folder
- Your installation folder (C:\Program Files\Coveo\Maestro by default)
Once you have ensured your environment meets all the above requirements, you can proceed with the Crawling Module deployment.
If you experience unresponsiveness, crashes, or delayed update operations, you might need to scale up your deployment.