Noindex and Nofollow Directives
Directives can be added to the code of a web page to instruct robots how to index the content of the page. In particular, the `noindex` directive instructs crawlers not to index the page, while the `nofollow` directive forbids them from following the links in the page (see the Robots meta tag and X-Robots-Tag HTTP header specifications).
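For example, either of the following declarations tells robots not to index a page and not to follow its links. This is an illustrative sketch; the surrounding markup and response lines are placeholders:

```html
<!-- In the page <head>: forbid all robots from indexing the page and following its links -->
<meta name="robots" content="noindex, nofollow" />
```

```
HTTP/1.1 200 OK
X-Robots-Tag: noindex, nofollow
```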
When indexing the content of a Web source, the Coveo crawler can either follow or ignore these directives (see Add or Edit a Web Source). When it follows them, the crawler takes the following directives into account. These directives aren't case-sensitive and can appear either in an HTML page `meta` tag or in the `X-Robots-Tag` HTTP header.
| Directive | Instruction |
|---|---|
| `noindex` | Robots are forbidden to index the content of this page. |
| `nofollow` | Robots are forbidden to follow the links in this page. However, if you only want to prevent robots from following a specific link rather than a whole page, you can use the `rel="nofollow"` link attribute on that link instead (see the example after this table). |
| `none` | Don't index the content of this page and don't follow its links. This directive is equivalent to `noindex, nofollow`. |
| `all` | Robots are allowed to index the page and to follow its links. This is the default directive. |
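As referenced in the `nofollow` row above, here's a minimal sketch of a link that robots wouldn't follow; the URL and anchor text are hypothetical:

```html
<!-- The page itself can still be indexed, but robots don't follow this particular link -->
<a href="/example-private-page" rel="nofollow">Example link</a>
```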
Unless otherwise specified, these directives apply to all robots. However, if you want the Coveo crawler to abide by different rules, you have two options:
- If you're the owner of the website you want to make searchable, you can add Coveo-specific directives in the code for the Coveo crawler to follow (see Add Coveo-Specific Directives).
- If you can't edit the code of the website, you can configure your Web source so that it ignores the `noindex` and `nofollow` directives (see Ignore Directives).
Add Coveo-Specific Directives
When you’re the owner of the website you want to make searchable, you can add indexing directives to your page code to indicate which items (i.e., website pages) robots are allowed to crawl.
Meta directives with `name="robots"` and HTTP response header directives without a specified crawler name apply to all crawlers. However, if you want some directives to apply to the Coveo crawler only, you can use the `name` property to specify `coveobot`.
As a result, whenever your page contains directives specifically addressed to the Coveo crawler, the crawler follows these instructions and ignores the general, all-robot directives.
- With the following page `meta` tags, all robots are forbidden to index the page and follow the links in it, except the `coveobot` crawler:

  ```html
  <meta name="robots" content="nofollow, noindex" />
  <meta name="coveobot" content="all" />
  ```
- With the following `X-Robots-Tag` HTTP headers, all robots are forbidden to index the page and follow the links in it, except the `coveobot` crawler:

  ```
  HTTP/1.1 200 OK
  Date: Mon, 29 April 2019 15:08:11 GMT
  (…)
  X-Robots-Tag: nofollow, noindex
  X-Robots-Tag: coveobot: all
  (…)
  ```
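If you control the web server rather than the page markup, you can typically emit the `X-Robots-Tag` header from the server configuration instead. The following is a minimal sketch for nginx; this article doesn't prescribe a particular server setup, so adapt it to your own environment:

```
# nginx (ngx_http_headers_module): send X-Robots-Tag headers with responses from this context
add_header X-Robots-Tag "nofollow, noindex";
add_header X-Robots-Tag "coveobot: all";
```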
Ignore Directives
Alternatively, and especially if you don’t own the website you want to make searchable, you can instruct the Coveo crawler to ignore the website directives. To do so, in the Add/Edit a Web Source panel, under Crawling Settings, deselect the appropriate check box (see Add or Edit a Web Source).
Note
Consider notifying the website owner that you instructed the Coveo crawler to ignore their directives.
What’s Next?
You can also configure a crawling limit rate and instruct the Coveo crawler to bypass restrictions specified in the website's `robots.txt` file, if not already done (see Crawling limit rate and Respect robots.txt directives).
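For context, `robots.txt` restrictions take the form of rules like the following. This is a hypothetical illustration, not content from this article:

```
# Hypothetical robots.txt: forbid all crawlers from visiting a section of the site
User-agent: *
Disallow: /private/
```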