---
title: Crawler directives
slug: mc2a1538
canonical_url: https://docs.coveo.com/en/mc2a1538/
collection: index-content
source_format: adoc
---

# Crawler directives

Coveo's Web [source](https://docs.coveo.com/en/246/) [crawler](https://docs.coveo.com/en/2121/) behaves similarly to the bots of web search engines such as Google. The crawler only needs a [starting URL](https://docs.coveo.com/en/malf0160#starting-urls) and then discovers other web pages by following the site navigation and the hyperlinks appearing on the pages.

Bots (including Coveo's Web source [crawler](https://docs.coveo.com/en/2121/)) can have a negative impact on the performance of the targeted website. This is why mechanisms (that is, _directives_) were developed to let websites and web pages provide crawlers with indexing instructions.

By default, when indexing the content of a website, the Coveo crawler obeys all directives it encounters. This can be an obstacle if, for example, you want to use the Coveo Web source to index content that Google doesn't. However, configuring the Web source to [override a directive](https://docs.coveo.com/en/malf0160#directives-overrides) should be a last resort. Instead, you should leverage the fact that directives can usually be given on a per-crawler basis and configure [`coveobot`](https://docs.coveo.com/en/mc1f0219#robotsdottextuseragentstring-string-null)-specific directives.

This article describes the crawler directives the Web source can override and, for each, the recommended way to grant the Coveo crawler more freedom.

## The "robots.txt" override setting

The Web source [**robots.txt**](https://docs.coveo.com/en/malf0160#directives-overrides) directives override setting pertains to the [robots.txt file](https://en.wikipedia.org/wiki/Robots.txt). This file specifies `Allow`/`Disallow` directives that tell a crawler which parts of the website it should or shouldn't visit.
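For instance, a `robots.txt` file can combine directives for all crawlers with a more permissive group for the Coveo crawler only. The paths below are hypothetical, for illustration purposes:

```txt
# Directives for all crawlers
User-agent: *
Disallow: /internal/
Allow: /

# More permissive directives for the Coveo crawler only
User-agent: coveobot
Allow: /
```

A crawler applies the most specific `User-agent` group that matches it, so `coveobot` would use the second group and ignore the first.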
If you must give greater access to the `coveobot` crawler, consider adding a `User-agent: coveobot` section to your `robots.txt` file. See [The User-Agent Line](https://www.rfc-editor.org/rfc/rfc9309.html#name-the-user-agent-line) and [Simple Example](https://www.rfc-editor.org/rfc/rfc9309.html#name-simple-example) for further details.

> **Note**
>
> The Web source crawler is coded to [request no more than one page per second](https://docs.coveo.com/en/malf0160#time-the-crawler-waits-between-requests-to-your-server).
> If the site `robots.txt` file includes a [`Crawl-delay`](https://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive) directive with a different value, the slowest crawling speed applies.

## The "noindex" and "nofollow links" override settings

The Web source [**noindex**](https://docs.coveo.com/en/malf0160#directives-overrides) and [**nofollow links**](https://docs.coveo.com/en/malf0160#directives-overrides) directives override settings are grouped here because their related directives are often used together in a website implementation.

The `noindex` and `nofollow` directives apply to an entire HTML page. They're case-insensitive and can appear:

* In the `<head>` section of an HTML page, as the `content` attribute value of a `<meta>` tag (for example, `<meta name="robots" content="noindex">`).
* In the web server `X-Robots-Tag` HTTP response header following the request for a given page.

The `noindex` directive instructs crawlers not to index the page. The `nofollow` directive instructs crawlers not to follow any of the links on the page.

Whether you use `<meta>` tags or the `X-Robots-Tag` HTTP response header, there are shorthand `content` values. For example:

* `<meta name="robots" content="none">` is equivalent to `<meta name="robots" content="noindex, nofollow">`.
* `<meta name="robots" content="all">` means that there are no indexing and no link-following restrictions.

Meta directives with `name="robots"` and HTTP response header directives without a specified crawler name apply to all crawlers. However, if you want some directives to apply to the Coveo crawler only, you can set the `name` property to `coveobot`. As a result, whenever your page contains directives specifically intended for the Coveo crawler, the crawler follows these instructions and ignores the general, all-robot directives.

**Example: Coveo-specific directives with a `<meta>` tag**

With the following page `<meta>` tags, all robots are instructed not to index the page and not to follow the links in it, except the `coveobot` crawler:

```html
<meta name="robots" content="noindex, nofollow">
<meta name="coveobot" content="all">
```

**Example: Coveo-specific directives with an `X-Robots-Tag` response header**

With the following `X-Robots-Tag` HTTP header, all robots are instructed not to index the page and not to follow the links in it, except the `coveobot` crawler:

```http
HTTP/1.1 200 OK
Date: Mon, 10 April 2023 15:08:11 GMT
(…)
X-Robots-Tag: nofollow, noindex, coveobot: all
(…)
```

## The "nofollow anchors" override setting

Whereas the [`<meta>` tag directives](#the-noindex-and-nofollow-links-override-settings) apply to an entire page, the Web source [**nofollow anchors**](https://docs.coveo.com/en/malf0160#directives-overrides) directive override setting pertains to `nofollow` directives specified on individual anchor tags (that is, `<a>` tags).

**Example**

This link shouldn't be followed by robots: `<a href="/signin" rel="nofollow">sign in</a>`.
There's no way to make an anchor tag's `rel="nofollow"` attribute target a specific crawler. However, if you're using `rel="nofollow"` to prevent the target page from being indexed, consider using the mechanisms described above instead (see [Evolving "nofollow" - new ways to identify the nature of links](https://developers.google.com/search/blog/2019/09/evolving-nofollow-new-ways-to-identify)).
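The precedence rule described in the noindex/nofollow section above — crawler-specific `<meta>` directives replace the generic `name="robots"` ones — can be sketched in a few lines of Python. This is an illustrative approximation of how a crawler named `coveobot` might resolve a page's directives, not Coveo's actual implementation:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect generic and crawler-specific robots <meta> directives."""

    def __init__(self, bot_name):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.generic = None    # content of <meta name="robots">
        self.specific = None   # content of <meta name="coveobot">

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        content = (a.get("content") or "").lower()  # directives are case-insensitive
        if name == "robots":
            self.generic = content
        elif name == self.bot_name:
            self.specific = content

def directives_for(html, bot_name="coveobot"):
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    # Crawler-specific directives, when present, replace the generic ones.
    raw = parser.specific if parser.specific is not None else parser.generic
    tokens = {t.strip() for t in (raw or "all").split(",")}
    if "none" in tokens:  # shorthand for "noindex, nofollow"
        tokens |= {"noindex", "nofollow"}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
    }

page = '''<head>
<meta name="robots" content="noindex, nofollow">
<meta name="coveobot" content="all">
</head>'''
print(directives_for(page))  # {'index': True, 'follow': True}
```

With the page above — the same `<meta>` tags as in the earlier example — the generic `noindex, nofollow` directives are ignored because a `coveobot`-specific directive is present, so the page is both indexed and followed.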