About bot traffic
Bot traffic is a common aspect of modern web operations. With advances in AI, bot traffic has become more dynamic in both volume and behavior. The increase in automated traffic from AI-driven agents, web crawlers, and scrapers has led to a surge in traffic across the web. Many of these systems mimic human behavior, which can make them difficult to identify as bots.
In your Coveo solution, this traffic will impact usage metrics and query volume.
Coveo’s solutions are typically integrated into web applications, and therefore bot traffic can interact with these applications in the same way as human users. As a Coveo client, you’re responsible for managing and filtering bot traffic by determining which requests should be allowed or blocked and applying appropriate bot-mitigation strategies within your infrastructure.
At a glance:
This article explains the complexities of bot traffic and provides guidance for managing it before it reaches Coveo.
How bot traffic interacts with your Coveo solution
Bots are automated software agents that perform tasks or interact with a website without direct human input. Some bots interact with your website through the user interface, while others send requests directly to backend services such as APIs. For example, AI agents built on large language models (LLMs) can perform complex tasks and interact with your search interfaces in a human-like manner.
Bot traffic can originate from different IP addresses, and the traffic pattern can change quickly over time. Historically, bot traffic was often categorized as wasteful, but with the rise of AI-driven agents and automated tools, this distinction has changed. Many of these systems interact with websites for legitimate reasons, such as:
- Indexing site content.
- Monitoring site performance and functionality.
- Retrieving and analyzing data for business intelligence.
However, harmful or unintended bot traffic still exists. This traffic can consist of web crawlers or scrapers that often use large pools of distributed IP addresses and rotate client identifiers to evade detection.
Regardless of the intent, bot traffic generates queries and can impact both your usage metrics, such as queries per month (QPM) consumption, and analytics metrics, such as clickthrough rates.
What is counted as a query
Coveo processes queries that are sent to its usage-based APIs (for example, Search API, PR API, Commerce API) by websites, applications, or other automated systems. These queries can originate from human users, automated tools, or other software interacting with your implementation.
When a query reaches the API, it’s considered valid if:
- It includes authentication credentials, such as an API key or search token.
- It’s processed without returning an error.
Your usage metrics are based on the total number of valid queries sent to the APIs, regardless of their origin. This includes valid queries generated by automated systems, including bots or other tools.
For this reason, we recommend that you identify and filter unwanted traffic before it reaches Coveo.
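The two validity criteria above can be sketched as a small predicate. This is an illustrative approximation, not Coveo's actual billing logic; the function name and status-code check are assumptions for the example.

```python
# Illustrative sketch (not Coveo's actual billing logic): a query counts
# toward usage when it carries credentials and is processed without error.

def counts_toward_usage(has_credentials: bool, status_code: int) -> bool:
    """Approximate the two validity criteria: authenticated and processed OK."""
    processed_ok = 200 <= status_code < 300  # processed without returning an error
    return has_credentials and processed_ok

# A bot-issued query with a valid token that returns 200 is still counted,
# while a request rejected for bad credentials (e.g., 401) is not.
```

Note that origin plays no part in the predicate: a request from a bot and a request from a human that both satisfy these criteria are counted identically.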
Approaches to identify and manage bot traffic
Automated traffic should be evaluated and filtered before it reaches Coveo services. Because Coveo operates as a downstream service integrated into your website or application, it has limited visibility into the full context of query traffic. Website owners and infrastructure providers are typically better positioned to analyze traffic patterns and apply security controls.
To identify or limit automated traffic, consider implementing one or both of the following strategies within your own infrastructure.
Use dedicated web application firewalls (WAFs)
Security solutions such as Cloudflare™, Akamai™, or AWS WAF™ are specifically designed to detect, classify, and block malicious or abusive automated traffic.
These platforms combine multiple detection techniques, such as:
- IP reputation databases.
- Traffic pattern analysis.
- Behavioral models that identify automated interactions.
- Adaptive rule engines that respond to evolving bot behavior.
Because these systems operate at the network or application edge, they can analyze and block requests before they reach your website or Coveo APIs.
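To illustrate one of the techniques these platforms combine, traffic pattern analysis, the sketch below flags a client as likely automated when it exceeds a request-rate threshold inside a sliding time window. The window size and threshold are made-up values; real WAFs use far more sophisticated, adaptive signals.

```python
from collections import deque
from typing import Deque, Dict

# Assumed thresholds for the example; production systems tune these adaptively.
WINDOW_SECONDS = 10.0
MAX_REQUESTS_PER_WINDOW = 20

class RateDetector:
    """Toy sliding-window rate detector, one timestamp queue per client."""

    def __init__(self) -> None:
        self._history: Dict[str, Deque[float]] = {}

    def is_suspicious(self, client_id: str, now: float) -> bool:
        """Record a request at time `now` and report whether the client
        exceeded the allowed rate within the sliding window."""
        window = self._history.setdefault(client_id, deque())
        window.append(now)
        # Evict timestamps that have fallen out of the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) > MAX_REQUESTS_PER_WINDOW
```

A human browsing at a few requests per minute never trips the threshold, while a scraper issuing dozens of requests per second is flagged within the first window.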
Implement a reverse proxy
A reverse proxy is a server that sits within your own infrastructure and routes requests from the browser through an endpoint that you control before forwarding them to Coveo.
Using a reverse proxy lets you do the following:
- Inspect and filter incoming traffic.
- Apply rate limiting or access rules.
- Integrate with bot detection or security tools.
- Control how requests are forwarded to Coveo services.
A proxy architecture can also provide additional benefits, such as reducing direct exposure of API credentials and giving you greater visibility into the traffic generated by your website. For more information, see When to use a reverse proxy.
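The architecture above can be sketched with Python's standard library. This is a minimal illustration, not a production implementation: the `UPSTREAM` URL, header names, and user-agent block list are all assumptions, and a real deployment would plug into dedicated bot-detection tooling and attach API credentials server-side.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://example.com"  # hypothetical Coveo-facing endpoint
BLOCKED_AGENT_MARKERS = ("curl", "python-requests", "scrapy")  # example list

def should_forward(headers: dict) -> bool:
    """Inspect an incoming request and decide whether to forward it upstream."""
    agent = headers.get("User-Agent", "").lower()
    if not agent:  # a missing user agent is treated as suspicious here
        return False
    return not any(marker in agent for marker in BLOCKED_AGENT_MARKERS)

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if not should_forward(dict(self.headers)):
            self.send_error(403, "Automated traffic blocked")
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # Forward to the upstream service; credentials would be attached
        # here, server-side, instead of being exposed in the browser.
        upstream = urllib.request.Request(
            UPSTREAM + self.path, data=body,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(upstream) as resp:
            self.send_response(resp.status)
            self.end_headers()
            self.wfile.write(resp.read())

# To run locally (blocking call):
# HTTPServer(("localhost", 8080), ProxyHandler).serve_forever()
```

Because every request funnels through `should_forward`, this single choke point is where rate limiting, logging, or third-party bot-detection calls would be added.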
FAQ
Can Coveo block bot traffic?
While Coveo includes infrastructure-level protections, such as safeguards against DDoS attacks and known malicious traffic patterns, it’s not designed to detect or block all bot traffic.
Coveo services operate at the API layer and process requests sent by websites, applications, or other systems. Detecting and filtering most automated traffic requires advanced analysis of visitor behavior, network characteristics, and request patterns. This type of detection is best performed by specialized security platforms such as web application firewalls (WAFs) or bot management services.
Because these tools operate closer to your website’s entry point, they have better visibility into the full context of incoming traffic and can apply more reliable detection and filtering mechanisms.
Why can’t I block bots using client identifiers or IP addresses?
Using static identifiers such as client identifiers or IP addresses isn’t a reliable way to block automated traffic, as it presents several challenges:
- Dynamic identifiers: Bots rarely rely on fixed client identifiers or IP addresses. Because these identifiers can be rotated, spoofed, or distributed across a large network of servers, static block lists quickly become ineffective.
- Distinguishing harmful from legitimate traffic: Modern crawlers include search engine bots, accessibility tools, AI agents, and integration services. Blocking legitimate traffic by mistake (false positives) can degrade your customer experience or break valid automation.
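The first point can be made concrete with a tiny simulation. The IP addresses below are from the reserved documentation ranges and the pool size is invented; real bot networks rotate across thousands of addresses, which makes the miss rate far worse.

```python
from itertools import cycle

# You observed one bot address and blocked it statically...
blocklist = {"203.0.113.10"}

# ...but the bot rotates through a pool of addresses (all made up here).
bot_ip_pool = cycle(["203.0.113.10", "203.0.113.11", "198.51.100.7"])

total = 9
blocked = sum(1 for _ in range(total) if next(bot_ip_pool) in blocklist)
# Only a third of the rotated requests are stopped; the rest pass through.
```

With a pool of thousands of addresses instead of three, the static list catches almost nothing, which is why behavioral detection outperforms identifier-based blocking.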
Why are automated requests included in my usage metrics?
Coveo processes queries based on their technical validity rather than their origin. When a request reaches Coveo, includes valid authentication credentials, and is successfully processed, it’s treated as a valid request.
Because automated systems can interact with search interfaces in the same way as human users, their requests may be processed and included in your organization’s usage metrics.
For this reason, organizations that want greater control over automated traffic should implement filtering or traffic management strategies within their own infrastructure.
Is bot traffic included in my analytics data?
Some bot traffic, but not all, may be included in your Coveo Analytics data.
Coveo uses filtering mechanisms to reduce the impact of automated traffic on your data, thereby improving data quality and relevance. Because Coveo ML models are trained on analytics data, this filtering also helps reduce the impact of automated traffic on model training. However, these filtering mechanisms may not detect all bot traffic. As a result, some automated activity may still be reflected in analytics data.
You can also use Coveo Analytics reports to help you identify signs of bot traffic in your data and remove it from your reports. While automated traffic can vary widely, certain recurring patterns can help you identify areas where automated traffic may be affecting your usage metrics.
Note
Analytics filtering occurs after a valid query is processed. Therefore, even if a query is filtered out of analytics data, it will still count towards your usage limits if it’s considered valid and processed by Coveo.