Noindex and Nofollow Directives

Directives can be added to the code of a web page to instruct robots how to index the content of the page. In particular, the noindex directive instructs crawlers not to index the page, while the nofollow directive forbids them from following the links on the page (see Robots meta tag and X-Robots-Tag HTTP header specifications). When indexing the content of a Web source, the Coveo Cloud crawler can either follow or ignore these directives (see Add or Edit a Web Source).

When indexing the content of a website, the Coveo Cloud crawler takes the following directives into account. These directives aren’t case-sensitive and can appear either in an HTML page meta tag or in the X-Robots-Tag HTTP header.

Directive   Instruction
noindex     Robots must not index the content of this page.
nofollow    Robots must not follow the links in this page. When nofollow appears in the rel attribute of a specific link instead, robots must not follow that link only. For example, robots wouldn't follow this link: <a href="signin.php" rel="nofollow">sign in</a>.
none        Don't index the content of this page and don't follow its links. This directive is equivalent to noindex, nofollow.
all         Robots are allowed to index the page and to follow its links. This is the default directive.
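As a rough illustration of how a crawler could interpret the table above, here's a minimal Python sketch. This is not Coveo's actual implementation; the class and function names are hypothetical:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content tokens of <meta name="robots"> tags (hypothetical sketch)."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            for token in attrs.get("content", "").split(","):
                self.directives.add(token.strip().lower())

def may_index_and_follow(html):
    """Return (index, follow) booleans according to the directive table.

    Absent any directive, the default is 'all', i.e., (True, True).
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    d = parser.directives
    index = not ("noindex" in d or "none" in d)
    follow = not ("nofollow" in d or "none" in d)
    return index, follow

page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
print(may_index_and_follow(page))  # (False, False)
```

Note that the directive tokens are lowercased before comparison, matching the case-insensitive behavior described above.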

Unless otherwise specified, these directives apply to all robots. However, if you want the Coveo Cloud crawler to abide by different rules, you have two options:

  • If you’re the owner of the website you want to make searchable, you can add Coveo-specific directives in the code for the Coveo Cloud crawler to follow (see Add Coveo-Specific Directives).

  • If you can’t edit the code of the website, you can configure your Web source so that it ignores the noindex and nofollow directives (see Ignore Directives).

Add Coveo-Specific Directives

When you’re the owner of the website you want to make searchable, you can add indexing directives to your page code to indicate which items (i.e., website pages) robots are allowed to crawl.

Meta directives with name="robots" and HTTP response header directives without a specified crawler name apply to all crawlers. However, if you want some directives to apply to the Coveo Cloud crawler only, you can address them to coveobot, either in the meta tag name attribute or as a crawler name prefix in the X-Robots-Tag header. As a result, whenever your page contains directives specifically addressed to the Coveo Cloud crawler, the crawler follows these instructions and ignores the general, all-robot directives.

  • With the following page meta tag, all robots are forbidden to index the page and follow the links in it, except the coveobot crawler:

    <meta name="robots" content="nofollow, noindex" />
    <meta name="coveobot" content="all" />
    
  • With the following X-Robots-Tag HTTP header, all robots are forbidden to index the page and follow the links in it, except the coveobot crawler:

    HTTP/1.1 200 OK
    Date: Mon, 29 Apr 2019 15:08:11 GMT
    (…)
    X-Robots-Tag: nofollow, noindex, coveobot: all
    (…)
    
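To illustrate the precedence rule described above, the following sketch prefers directives addressed to coveobot over the general robots directives. This is a hypothetical helper, not the actual crawler logic:

```python
def resolve_directives(directives_by_name, crawler="coveobot"):
    """Pick the directive set addressed to the given crawler, falling back
    to the general 'robots' set, then to the default 'all'.
    (Hypothetical sketch; not Coveo's implementation.)"""
    tokens = directives_by_name.get(crawler, directives_by_name.get("robots", "all"))
    d = {t.strip().lower() for t in tokens.split(",")}
    index = not d & {"noindex", "none"}
    follow = not d & {"nofollow", "none"}
    return index, follow

# Mirrors the meta tag example above: all robots are denied, coveobot is allowed.
page_directives = {"robots": "nofollow, noindex", "coveobot": "all"}
print(resolve_directives(page_directives))               # (True, True)
print(resolve_directives(page_directives, "googlebot"))  # (False, False)
```

Because the crawler-specific entry wins outright, the general nofollow, noindex directives are ignored entirely for coveobot, not merged with its own.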

Ignore Directives

Alternatively, and especially if you don’t own the website you want to make searchable, you can instruct the Coveo Cloud crawler to ignore the website directives. To do so, in the Add/Edit a Web Source panel, under Crawling Settings, deselect the appropriate check box (see Add or Edit a Web Source).

Consider notifying the website owner that you instructed the Coveo Cloud crawler to ignore their directives.

What’s Next?

You can also configure a crawling limit rate and instruct the Coveo Cloud crawler to bypass restrictions specified in the website robots.txt file, if not already done (see Crawling limit rate and Respect robots.txt directives).
