Scale up your deployment
As you use your Crawling Module deployment and perhaps add more sources, you may notice that update operations slow down or that your host resources are maxed out. This indicates that your Crawling Module deployment has reached its limits and should be scaled up to work efficiently again.
When to scale up your deployment
As you associate new sources with your Crawling Module deployment or as the number of simultaneous update operations increases, monitor the following factors. They’ll help you determine whether you need to adjust your deployment.
- Delayed update operations
  When content or security identity provider update operations are executed behind schedule, you may notice that the search results in your Coveo-powered search interface don’t reflect your actual data, or that the content access permissions in your search interface don’t match your permission system. For example, your newest items may not be findable in your search interface, or a user recently forbidden from accessing some records may still be able to view them through Coveo. You may also notice that your source’s status on the Sources (platform-ca | platform-eu | platform-au) page remains Starting refresh/rescan/rebuild for a long time before switching to Retrieving content update.
- Maxed out resources
  You notice high CPU or memory usage on the server hosting your Crawling Module deployment. The host is being overtaxed, and this affects the server’s stability.
Scaling options
To improve Crawling Module operations, consider the following solutions.
Adjust the update schedules
A worker executes one update operation at a time. So, if all your content and security provider updates are scheduled at or around the same time and your number of workers is insufficient to execute them all simultaneously, some operations will be delayed. As a result, the most recent changes in your content or permission system won’t be visible in your search interface until the delayed operation is executed.
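To make the queuing effect concrete, here is a minimal simulation sketch. It assumes that operations are picked up in order of their scheduled start time and that each worker runs exactly one operation at a time. The function name `simulate_delays`, the minute-based time unit, and the sample durations are hypothetical and aren't part of the Crawling Module.

```python
import heapq


def simulate_delays(operations, worker_count):
    """Return how long (in minutes) each operation waits before starting.

    operations: list of (scheduled_start, duration) tuples, in minutes.
    worker_count: number of workers, each running one operation at a time.
    Delays are returned in order of scheduled start time.
    """
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * worker_count
    heapq.heapify(free_at)

    delays = []
    for scheduled_start, duration in sorted(operations):
        earliest_free = heapq.heappop(free_at)
        # An operation can't start before its schedule or before a worker is free.
        actual_start = max(scheduled_start, earliest_free)
        delays.append(actual_start - scheduled_start)
        heapq.heappush(free_at, actual_start + duration)
    return delays


# Four hypothetical one-hour updates, all scheduled at 3 AM (minute 180).
all_at_3am = [(180, 60)] * 4
print(simulate_delays(all_at_3am, worker_count=2))  # [0, 0, 60, 60]
print(simulate_delays(all_at_3am, worker_count=4))  # [0, 0, 0, 0]
```

With two workers, the third and fourth operations each wait an hour. With four workers, nothing waits, but four operations then consume host resources at the same time.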
To decrease the operation execution delay without adding more workers, try adjusting the update schedules to spread out the operations over a longer time frame. For example, if all operations are scheduled to take place at 3 AM, you could instead schedule one every two hours starting from 11 PM. This should help you get freshly updated content in the morning without increasing your number of workers.
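The staggering idea in the example above can be sketched as a small helper that computes evenly spaced start times. This is only an illustration for planning purposes: `staggered_start_times` is a hypothetical name, the date is arbitrary, and the actual schedules are still set in the Coveo Administration Console.

```python
from datetime import datetime, timedelta


def staggered_start_times(first_start, interval, operation_count):
    """Return one start time per operation, spaced by a fixed interval."""
    return [first_start + i * interval for i in range(operation_count)]


# Four operations, one every two hours starting from 11 PM (arbitrary date).
starts = staggered_start_times(
    first_start=datetime(2024, 1, 1, 23, 0),
    interval=timedelta(hours=2),
    operation_count=4,
)
print([start.strftime("%H:%M") for start in starts])
# ['23:00', '01:00', '03:00', '05:00']
```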
In addition, ensure that updates are scheduled at a frequency matching the rate at which your content or permission system changes. Consider decreasing the update frequency if the context allows it. For example, if you have a large source where only a few items change every day, an hourly refresh may not be needed. You could try a daily refresh instead.
See Schedule a source update and Edit security identity provider refresh schedules for instructions and best practices.
Add workers
Adding workers allows your Crawling Module deployment to execute more update operations simultaneously. However, the more operations you execute at a time, the more your server’s resources are used.
In Number of workers, Coveo offers rules of thumb to estimate the number of content and security workers you need. However, depending on the types of sources you use with your Crawling Module deployment, some workers may use more resources than others.
Monitor CPU and memory usage over time to determine whether your server’s resources allow adding workers. If the average CPU or memory usage is over 85%, we don’t recommend adding workers to your deployment. Overutilized resources could cause crashes or unresponsiveness. Consider the other options below instead.
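As an illustration of the 85% guideline, the following sketch averages usage samples that you've collected with your own monitoring tool and reports whether there's headroom for more workers. The function name `can_add_workers` and the sample values are hypothetical, and how you collect the samples is up to you.

```python
from statistics import mean

# Threshold taken from the guideline above, as a percentage.
USAGE_THRESHOLD_PERCENT = 85.0


def can_add_workers(cpu_samples, memory_samples, threshold=USAGE_THRESHOLD_PERCENT):
    """Return True if neither average CPU nor average memory usage exceeds the threshold.

    cpu_samples, memory_samples: usage percentages (0-100) collected over time.
    """
    if not cpu_samples or not memory_samples:
        raise ValueError("At least one CPU sample and one memory sample are required.")
    return mean(cpu_samples) <= threshold and mean(memory_samples) <= threshold


print(can_add_workers([70, 80, 90], [60, 65, 70]))  # True: averages are 80% and 65%
print(can_add_workers([90, 95, 88], [60, 60, 60]))  # False: average CPU is 91%
```

Averaging over a longer period, such as several days of scheduled updates, gives a more representative picture than a single spot check.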
When adding workers to your deployment, take into account the type of updates that are delayed. Content workers execute source update operations while security workers are responsible for crawling the content permissions and feeding security identity providers this information. So, if only your security identity provider update operations are delayed, look into adding security workers. Adding content workers will have no effect in such a case.
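The decision described above can be summarized in a few lines. This is only a sketch of the reasoning: the function `worker_types_to_add` and its boolean inputs are hypothetical, and you would determine which update types are delayed from your own observations.

```python
def worker_types_to_add(content_updates_delayed, security_updates_delayed):
    """Return the worker types worth adding, based on which update types are delayed."""
    worker_types = []
    if content_updates_delayed:
        # Content workers execute source update operations.
        worker_types.append("content")
    if security_updates_delayed:
        # Security workers handle security identity provider updates.
        worker_types.append("security")
    return worker_types


# Only security identity provider updates are delayed.
print(worker_types_to_add(content_updates_delayed=False, security_updates_delayed=True))
# ['security']
```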
In addition, keep in mind that disk usage and network traffic will also be affected. Typically, the more workers you have, the more logs will be saved on your disk over time. Your workers might also make more HTTP requests at the same time.
See Number of workers for details.
Scale up the server hardware
In Requirements, Coveo offers rules of thumb to estimate your hardware needs. As explained above, if the average CPU or memory usage is over 85%, adding workers to your deployment could compromise your server’s stability.
Upgrading your server’s CPU, memory, or both could therefore allow you to add more workers.
Split sources between multiple Crawling Module deployments
If you’ve adjusted the update schedules and can’t add more workers due to hardware limitations, you can decrease the load on your host server by distributing your Crawling Module sources across multiple servers. This requires you to deploy another Crawling Module instance on another server, and then to pair the desired sources with your new deployment.
See Deploying multiple Crawling Module instances for details.