Crawling Module requirements
Before you deploy the Coveo Crawling Module, you must ensure that your Coveo license, host server, and IP address allowlist meet the following requirements.
Coveo license
Product edition
To use the Coveo Crawling Module, a valid Coveo Enterprise edition license is required. Check your license information to confirm your product edition.
Connectors
Your Coveo license must also include the connectors you want to use. See Supported content for details on what the Crawling Module can index. These connectors must also support the Crawling Module as a content retrieval method.
To check whether your Coveo license includes the desired Crawling Module connectors:
- Log in to the Coveo Administration Console as a member of a group with the privileges required to create sources in the target Coveo organization.
- On the Sources (platform-ca | platform-eu | platform-au) page, click Add Source.
- In the Add a source of content panel, select the desired source. If the source offers more than one content retrieval method, select the Crawling Module option. If the Crawling Module option is grayed out, your license doesn't include it, and you must contact the Coveo Sales team to upgrade your license.
Host server
We recommend that you install the Coveo Crawling Module on a server running Windows Server 2022. Windows Server 2016 and later are also supported.
This server can be either a physical server on your premises or a virtual server running in Azure.
Your server must also host the repository to index or have access to the server on which this repository is located.
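If you want to check a candidate server programmatically, the following Python sketch classifies a Windows edition and build number against the operating system guidance above. It assumes the publicly known base build numbers of Windows Server 2016 (14393) and Windows Server 2022 (20348), which don't come from the Coveo documentation, and it assumes that server editions report an edition string containing "Server" through Python's `platform.win32_edition()`. The function names are hypothetical.

```python
import platform

# Assumed base build numbers of the Windows Server releases named above
# (public Microsoft build numbers, not values from the Coveo documentation).
WINDOWS_SERVER_2016_BUILD = 14393
WINDOWS_SERVER_2022_BUILD = 20348


def check_host_os(edition: str, build: int) -> str:
    """Classify a Windows edition/build pair against the host server guidance."""
    # Assumption: Windows Server editions contain "Server" (e.g. "ServerStandard").
    if "server" not in edition.lower():
        return "unsupported: not a Windows Server edition"
    if build == WINDOWS_SERVER_2022_BUILD:
        return "recommended"
    if build >= WINDOWS_SERVER_2016_BUILD:
        return "supported"
    return "unsupported: older than Windows Server 2016"


def check_current_host() -> str:
    """Read the edition and build of the machine running this script."""
    if platform.system() != "Windows":
        return "unsupported: not Windows"
    edition = platform.win32_edition() or ""
    # On Windows, platform.version() returns e.g. "10.0.20348"; keep the build part.
    build = int(platform.version().split(".")[-1])
    return check_host_os(edition, build)
```

Running `check_current_host()` on the candidate server returns its classification. This only covers the operating system; it doesn't confirm that the server can reach the repository to index.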
Hardware
To determine your server hardware requirements, you must estimate the number of workers the Crawling Module should have based on the size and update schedule of your content sources.
By default, after its deployment, the Crawling Module has four content workers and four security workers.
CPU and RAM
CPU and RAM requirements are based on the number of workers, although the type of source you create, the source size, the item size, and other factors can affect your actual needs. As a rule of thumb, consider that each worker typically requires at least:
- 0.5 vCPU (virtual CPU) or 0.5 CPU (physical CPU core)
- 2 GB RAM
If your server CPU or RAM is insufficient, you could experience unresponsiveness or crashes preventing update operations from completing. Monitor usage to determine when to scale up your deployment.
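The per-worker rule of thumb turns into a simple multiplication. The following Python sketch computes the minimum CPU and RAM for a given worker count; with the default deployment of four content workers and four security workers, that's eight workers, so at least 4 vCPUs and 16 GB of RAM. It assumes that content and security workers count equally toward the rule of thumb, and it doesn't include operating system overhead. The `estimate_cpu_ram` function and `Estimate` type are hypothetical names.

```python
from dataclasses import dataclass

# Rule-of-thumb minimums per worker, as stated above.
VCPU_PER_WORKER = 0.5
RAM_GB_PER_WORKER = 2.0


@dataclass
class Estimate:
    workers: int
    vcpus: float
    ram_gb: float


def estimate_cpu_ram(content_workers: int = 4, security_workers: int = 4) -> Estimate:
    """Minimum CPU/RAM for a worker count; defaults match the default deployment."""
    # Assumption: both worker types need the same per-worker resources.
    workers = content_workers + security_workers
    return Estimate(workers, workers * VCPU_PER_WORKER, workers * RAM_GB_PER_WORKER)


print(estimate_cpu_ram())  # Estimate(workers=8, vcpus=4.0, ram_gb=16.0)
```

Treat the result as a floor: source type, source size, and item size can push actual needs higher.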
Disk space
When you install the Crawling Module on your server, you must select the disk on which you want to deploy it. Coveo recommends starting with at least 100 GB of disk space. This disk space is mostly used by the State Store and logs.
Your actual required disk space might be higher depending on the number of items to index, as well as their type and size. As a rule of thumb, consider that 10 million items require at least 10 GB of State Store storage. However, complex sources such as SharePoint could require at least 20 GB per 10 million items.
The volume of logs and your log retention schedule can also affect your disk space requirements. Typically, the more workers you have, the more logs accumulate on your disk over time.
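As a worked example of the State Store rule of thumb, the following Python sketch estimates disk space from per-source item counts: 10 GB per 10 million items, or 20 GB per 10 million items for complex sources such as SharePoint. It assumes that the 100 GB starting recommendation already covers the State Store and logs until the estimate exceeds it, so it returns the larger of the two values. The `log_allowance_gb` parameter is a hypothetical input, because this page gives no figure for log growth, and all function names are illustrative.

```python
from typing import Iterable, Tuple

BASELINE_DISK_GB = 100.0             # recommended starting disk space
STATE_STORE_GB_PER_10M = 10.0        # typical sources
COMPLEX_STATE_STORE_GB_PER_10M = 20.0  # complex sources such as SharePoint


def estimate_state_store_gb(item_count: int, complex_source: bool = False) -> float:
    """State Store estimate for one source, using the rule of thumb above."""
    rate = COMPLEX_STATE_STORE_GB_PER_10M if complex_source else STATE_STORE_GB_PER_10M
    return item_count / 10_000_000 * rate


def estimate_disk_gb(sources: Iterable[Tuple[int, bool]], log_allowance_gb: float = 0.0) -> float:
    """Disk estimate from (item_count, is_complex_source) pairs.

    Assumption: the 100 GB baseline suffices until State Store plus the
    (hypothetical) log allowance exceeds it.
    """
    state_store = sum(estimate_state_store_gb(n, c) for n, c in sources)
    return max(BASELINE_DISK_GB, state_store + log_allowance_gb)


# 30 million items in a non-complex source + 50 million SharePoint items:
# 30 GB + 100 GB = 130 GB of State Store, which exceeds the 100 GB baseline.
print(estimate_disk_gb([(30_000_000, False), (50_000_000, True)]))  # 130.0
```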
Network bandwidth
Rebuild operations require the most bandwidth, as they retrieve the entire source content and forward it to your Coveo index. The type and size of your indexed items also impact your bandwidth needs. Ideally, your network speed shouldn’t limit your content updates. Therefore, the more bandwidth you have, the better.
Outbound communication
If your environment restricts outgoing communications, make sure to allow the IP addresses that the Crawling Module uses.
Moreover, ensure that outbound communication from the Crawling Module server is not blocked or slowed down by SSL decryption, antivirus scans, firewalls, Splunk, endpoint protection, etc.
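To spot-check outbound access from the Crawling Module server, you could run a basic TCP reachability test such as the following Python sketch. The host to test is a placeholder: substitute the endpoints from Coveo's list of IP addresses used by the Crawling Module. Port 443 is an assumed default, and `can_connect` is a hypothetical helper name. A successful TCP connection only shows that traffic isn't blocked outright; it doesn't detect SSL decryption or throttling by intermediate security tools.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Placeholder host: replace with an endpoint the Crawling Module must reach.
# print(can_connect("HOST_TO_CHECK"))
```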
Antivirus
Since the Crawling Module uses executable files (.exe extension), antivirus software can block its operation.
You should therefore ensure that the antivirus software of the server on which you intend to install the Crawling Module doesn’t hinder Crawling Module activities.
Furthermore, to ensure that your antivirus software doesn’t flag Crawling Module executable files as potential threats, you should add the following to the antivirus software exclusion list:
- Your data folder (C:\ProgramData\Coveo\Maestro by default)
- Your installation folder (C:\Program Files\Coveo\Maestro by default)
What’s next?
- Once you have ensured your environment meets all the above requirements, you can proceed with the Crawling Module deployment.
- If you experience unresponsiveness, crashes, or delayed update operations, you might need to scale up your deployment.