Noindex and Nofollow Directives
Directives can be added to the code of a web page to instruct robots how to index its content. In particular, the noindex directive instructs crawlers not to index the page, while the nofollow directive forbids them from following the links on the page (see the Robots meta tag and X-Robots-Tag HTTP header specifications). When indexing the content of a Web source, the Coveo Cloud crawler can either follow or ignore these directives (see Add or Edit a Web Source).
When indexing the content of a website, the Coveo Cloud crawler takes the following directives into account. These directives aren’t case-sensitive and can appear either in an HTML page meta tag or in the X-Robots-Tag HTTP header.
| Directive | Description |
|---|---|
| noindex | Robots are forbidden to index the content of this page. |
| nofollow | Robots are forbidden to follow the links in this page. However, if the rel="nofollow" attribute is applied to a specific link instead, only that link is affected. This link wouldn't be followed by robots: <a href="…" rel="nofollow">…</a> |
| none | Don't index the content of this page and don't follow its links. This directive is equivalent to noindex, nofollow. |
| all | Robots are allowed to index the page and to follow its links. This is the default directive. |
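For instance, a page that should appear in search results but whose links shouldn’t be crawled could declare the following in its head (a hypothetical illustration):

```html
<!-- In the page <head>: index this page, but don't follow its links -->
<meta name="robots" content="nofollow" />
```

The same effect can be achieved server-side with an X-Robots-Tag: nofollow HTTP response header, which is convenient for non-HTML items such as PDF files.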
Unless otherwise specified, these directives apply to all robots. However, if you want the Coveo Cloud crawler to abide by different rules, you have two options:
If you’re the owner of the website you want to make searchable, you can add Coveo-specific directives in the code for the Coveo Cloud crawler to follow (see Add Coveo-Specific Directives).
If you can’t edit the code of the website, you can configure your Web source to ignore the noindex and nofollow directives (see Ignore Directives).
Add Coveo-Specific Directives
When you’re the owner of the website you want to make searchable, you can add indexing directives to your page code to indicate which items (i.e., website pages) robots are allowed to crawl.
Meta directives with name="robots" and HTTP response header directives without a specified crawler name apply to all crawlers. However, if you want some directives to apply to the Coveo Cloud crawler only, you can use the name property to specify coveobot. As a result, whenever your page contains directives specifically addressed to the Coveo Cloud crawler, the crawler follows these instructions and ignores the general, all-robot directives.
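This precedence can be sketched as follows; the function and the dictionary input are our own illustration, not a Coveo API:

```python
def effective_directives(meta):
    """Return the directives a coveobot-like crawler would obey, given a
    mapping of meta tag name -> content value.

    Sketch of the precedence rule described above (not a Coveo API):
    directives addressed to the crawler by name ("coveobot") win over
    the generic "robots" directives; "all" is the default.
    """
    raw = meta.get("coveobot") or meta.get("robots") or "all"
    return {token.strip().lower() for token in raw.split(",")}

# Generic directives forbid everything, but coveobot is allowed everything:
tags = {"robots": "nofollow, noindex", "coveobot": "all"}
print(effective_directives(tags))  # {'all'}
```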
With the following page meta tags, all robots are forbidden to index the page and to follow the links in it, except the Coveo Cloud crawler, which is allowed to do both:

```html
<meta name="robots" content="nofollow, noindex" />
<meta name="coveobot" content="all" />
```
With the following X-Robots-Tag HTTP headers, all robots are forbidden to index the page and to follow the links in it, except the Coveo Cloud crawler, which is allowed to do both:

```http
HTTP/1.1 200 OK
Date: Mon, 29 April 2019 15:08:11 GMT
(…)
X-Robots-Tag: nofollow, noindex
X-Robots-Tag: coveobot: all
(…)
```
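When reviewing response headers, a crawler-scoped X-Robots-Tag value can be told apart from a general one by its leading crawler-name prefix. A minimal parsing sketch (our own illustration, not Coveo’s implementation):

```python
def parse_x_robots_tag(header_value):
    """Split an X-Robots-Tag value into (crawler, directives).

    A value may be scoped to one crawler with a leading "name:" prefix
    (e.g. "coveobot: all"); otherwise it applies to all crawlers ("*").
    """
    head, sep, rest = header_value.partition(":")
    # A colon appearing before any comma means the value is crawler-scoped.
    if sep and "," not in head:
        crawler, value = head.strip().lower(), rest
    else:
        crawler, value = "*", header_value
    return crawler, [d.strip().lower() for d in value.split(",")]

print(parse_x_robots_tag("nofollow, noindex"))  # ('*', ['nofollow', 'noindex'])
print(parse_x_robots_tag("coveobot: all"))      # ('coveobot', ['all'])
```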
Ignore Directives
Alternatively, and especially if you don’t own the website you want to make searchable, you can instruct the Coveo Cloud crawler to ignore the website directives. To do so, in the Add/Edit a Web Source panel, under Crawling Settings, deselect the appropriate check box (see Add or Edit a Web Source).
Consider notifying the website owner that you instructed the Coveo Cloud crawler to ignore their directives.
You can also configure a crawling limit rate and instruct the Coveo Cloud crawler to bypass restrictions specified in the website robots.txt file, if not already done (see Crawling limit rate and Respect robots.txt directives).