---
title: Add a Sitemap source
slug: '1967'
canonical_url: https://docs.coveo.com/en/1967/
collection: index-content
source_format: adoc
---
# Add a Sitemap source
Members with the [required privileges](#required-privileges) can use a Sitemap [source](https://docs.coveo.com/en/246/) to make the content of webpages listed in a sitemap file or sitemap index file searchable.
A Sitemap source requires a sitemap file, which must be added to the website.
The file lists the website's URLs along with their respective [metadata](https://docs.coveo.com/en/218/), including each URL's last-modification date (LMD).
This enables the Sitemap source to perform [refresh](https://docs.coveo.com/en/2710/) updates, which the [Web source](https://docs.coveo.com/en/malf0160/) doesn't support.
For this reason, although a Sitemap source requires the extra step of adding a sitemap file, it offers [better performance than the Web source](https://docs.coveo.com/en/2680#sitemap-or-web-source).
## Source key characteristics
The following table presents the main characteristics of a Sitemap source.
[%header,cols="2,2,2,4"]
|===
2+|Features
^|Supported
|Additional information
2+|Indexable content
^|Webpages (URL)
|
2+|Sitemap file format
a|* XML
* Text
* RSS 2.0
* Atom 1.0
* HTML
* GZ
a|Sitemap files and sitemap index files must respect the [Sitemap protocol](https://www.sitemaps.org/protocol.html). Strict validations can be enforced by enabling the [ParseSitemapInStrictMode](https://docs.coveo.com/en/3158#parsesitemapinstrictmode-boolean) option.
HTML pages that use JavaScript to generate a sitemap or redirect to a sitemap or sitemap index are supported.
For a .gz sitemap file, the web server response [`Content-Type` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type) must be `application/gzip`.
2+|Compressed HTTP responses
^|[check]
|The source automatically handles compressed web server HTTP responses with the following `Content-Encoding` header values: `gzip`, `deflate`, and `br`.
.3+|[Content update operations](https://docs.coveo.com/en/2039/)
|[refresh](https://docs.coveo.com/en/2710/)
^|[check]
a|To be refreshed, an item in the sitemap file must have a last-modification date[.footnote]^[[1](#limitations)]^ (for example, the `lastmod` element in an XML sitemap, or the `updated` element in an Atom sitemap) whose value is more recent than the last refresh operation.
A rescan or rebuild operation is required to account for deleted sitemap entries.
|[rescan](https://docs.coveo.com/en/2711/)
^|[check]
a|[Takes place every day by default](https://docs.coveo.com/en/1933/).
To be rescanned, an item in the sitemap file must have a last-modification date[.footnote]^[[1](#limitations)]^ (for example, the `lastmod` element in an XML sitemap, or the `updated` element in an Atom sitemap) whose value is more recent than the last time the item was indexed.
|[rebuild](https://docs.coveo.com/en/2712/)
^|[check]
|
.3+|[Content security options](#content-security-tab)
|[Same users and groups as in your content system](https://docs.coveo.com/en/1779#same-users-and-groups-as-in-your-content-system)
^|[x]
|
|[Specific users and groups](https://docs.coveo.com/en/1779#specific-users-and-groups)
^|[check]
|
|[Everyone](https://docs.coveo.com/en/1779#everyone)
^|[check]
|
.2+.^|[Authentication methods](#authentication-subtab)
|Basic authentication
^|[check]
.2+a|Supported HTTP authentication schemes:
* Basic
* Digest
* NTLM
* Negotiate/Kerberos
* Form based
|Form authentication
^|[check]
2+|[Crawling rules](#crawling-rules-subtab)
^|[check]
|A variety of basic and advanced rules may be used to ignore the webpages you don't want to [index](https://docs.coveo.com/en/204/).
.4+|[Metadata indexing for search](#index-metadata)
|Automatic mapping of [metadata](https://docs.coveo.com/en/218/) to [fields](https://docs.coveo.com/en/200/) that have the same name
2+a|Disabled by default.
To enable, [access the JSON configuration of your source](https://docs.coveo.com/en/1685#access-the-edit-configuration-with-json-panel), and set [`performFieldMappingUsingAllOrigins`](https://docs.coveo.com/en/1640#about-the-performfieldmappingusingallorigins-setting) to `true`.
|Automatically indexed [metadata](https://docs.coveo.com/en/218/)
2+a|Examples of [auto-populated default fields](https://docs.coveo.com/en/1833#field-origin) (no user-defined metadata required):
* `clickableuri`
* `date`
* `filetype`
* `language` (autodetected from document content)
* `title`
The [`author`](https://docs.coveo.com/en/1833#field-origin) field will also be auto-populated if the content item contains an `author` metadata value.
After a content update, [inspect your item field values](https://docs.coveo.com/en/2053#inspect-search-results) in the **Content Browser**.
|Extracted but not indexed metadata
2+a|The source automatically extracts the `content` attribute from `<meta>` tags that include a `name` attribute.
For example, if the HTML of a page contains `<meta name="author" content="jsmith">`, the source extracts _jsmith_ as the `author` metadata.
After a rebuild, review the [**View and map metadata**](https://docs.coveo.com/en/m9ti0339#view-and-map-metadata-subpage) subpage for the list of indexed metadata, and [index additional metadata](https://docs.coveo.com/en/m9ti0339#index-metadata).
|Custom metadata extraction
2+a|Available using the following source features:
* XML sitemap file [custom metadata extraction](https://docs.coveo.com/en/2656/)
* Webpage [scraping](#web-scraping-subtab)
* Webpage [JSON-LD metadata extraction](#extract-json-ld-metadata)
* Webpage metadata extraction using the [IndexHtmlMetadata](https://docs.coveo.com/en/o2ta0401/) JSON configuration parameter
2+|[JavaScript content rendering](#execute-javascript-on-pages)
^|[check]
|The Sitemap source crawler can execute JavaScript in a webpage to dynamically render content before indexing the page.
2+|[Shadow DOM content retrieval](#execute-javascript-on-pages)
^|[check]
|If you choose to render JavaScript content, you can also specify whether the crawler should traverse and index attached Shadow DOM content.
2+|[Web scraping](#web-scraping-subtab)
^|[check]
|Exclude irrelevant sections in pages and extract [metadata](https://docs.coveo.com/en/218/).
2+|[Optical Character Recognition (OCR)](#content-and-images)
^|[check]
|Available at an extra charge.
Contact [Coveo Sales](https://www.coveo.com/en/contact) to add this feature to your [Coveo organization](https://docs.coveo.com/en/185/) [license](https://docs.coveo.com/en/2864/).
|===
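The refresh condition in the table above, where an entry is refreshed only when its last-modification date is more recent than the last refresh, can be illustrated with a short Node.js sketch. It isn't the Coveo crawler's implementation: it assumes a standard XML sitemap, parses it with regular expressions rather than an XML parser, and the function names `parseSitemapEntries` and `entriesToRefresh` are hypothetical.

```javascript
// Extract <loc> and <lastmod> values from an XML sitemap.
// Regex parsing keeps the sketch dependency-free; real code should use an XML parser.
function parseSitemapEntries(xml) {
  const entries = [];
  for (const block of xml.match(/<url>[\s\S]*?<\/url>/g) || []) {
    const loc = (block.match(/<loc>\s*([^<]+?)\s*<\/loc>/) || [])[1];
    const lastmod = (block.match(/<lastmod>\s*([^<]+?)\s*<\/lastmod>/) || [])[1];
    if (loc) entries.push({ loc, lastmod: lastmod ? new Date(lastmod) : null });
  }
  return entries;
}

// Return the URLs whose last-modification date is more recent than the last refresh.
// Entries with a missing or unparsable lastmod are skipped in this sketch.
function entriesToRefresh(xml, lastRefresh) {
  return parseSitemapEntries(xml)
    .filter((e) => e.lastmod && !Number.isNaN(e.lastmod.getTime()) && e.lastmod > lastRefresh)
    .map((e) => e.loc);
}

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://myorgwebsite.com/a</loc><lastmod>2024-03-01T10:00:00Z</lastmod></url>
  <url><loc>https://myorgwebsite.com/b</loc><lastmod>2024-01-15T08:30:00Z</lastmod></url>
  <url><loc>https://myorgwebsite.com/c</loc></url>
</urlset>`;

console.log(entriesToRefresh(sitemap, new Date("2024-02-01T00:00:00Z")));
// → [ 'https://myorgwebsite.com/a' ]
```

Page `b` was last modified before the last refresh and page `c` has no `lastmod` value, so only page `a` qualifies in this example.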
## Limitations
* The last-modification attribute must specify the modification time in [W3C DateTime format](https://www.w3.org/TR/NOTE-datetime), that is, `YYYY-MM-DDThh:mm:ss`.
Moreover, unless you specify a time zone, you must express the modification time in Coordinated Universal Time (UTC).
* Multi-factor authentication (MFA) and CAPTCHA aren't supported.
* The Sitemap source crawler can handle up to 200 cookies for the same domain, and a total of 3000 cookies.
If the crawled sites add cookies beyond these limits, the crawler will drop older cookies, which can cause issues (for example, if a dropped cookie is required for authentication).
* Indexing page permissions isn't supported.
* The Sitemap source doesn't support `robots.txt` file directives or `<meta name="robots">` tags.
* The [Coveo indexing pipeline](https://docs.coveo.com/en/184/) can handle web pages up to 512 MB only.
Larger pages are [indexed by reference](https://docs.coveo.com/en/l3qg9275#file-type-configurations) (that is, their content is ignored by the Coveo [crawler](https://docs.coveo.com/en/2121/), and only their metadata and path are searchable).
Therefore, no search result [Quick view](https://docs.coveo.com/en/2760#search-result-quick-view) is available for these larger [items](https://docs.coveo.com/en/210/).
* JavaScript usage and limitations:
** The [**Execute JavaScript on pages**](#execute-javascript-on-pages) and **Add time for the crawler to wait before considering a page as fully rendered** settings only pertain to webpage content retrieval for indexing.
When authenticating, the Sitemap crawler applies the [Loading delay](#loading-delay) or the [custom login sequence](#custom-login-sequence) [wait delay](https://docs.coveo.com/en/3289#wait-delay) values.
** JavaScript-rendered sitemaps are supported provided that the `Content-Type` of the targeted sitemap file is `application/xhtml+xml` or `text/html`.
** Content in pop-up windows and elements that require interaction aren't indexed.
** When the [**Execute JavaScript on pages**](#execute-javascript-on-pages) option is enabled, the source doesn't support the [`UseProxy`](https://docs.coveo.com/en/3158#useproxy-boolean) parameter.
* The [`UseProxy`](https://docs.coveo.com/en/3158#useproxy-boolean) parameter can't be used in combination with [**Form authentication**](https://docs.coveo.com/en/1967#form-authentication).
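For the first limitation above, a script that generates sitemap files could format `lastmod` values in UTC as in the following minimal sketch. The helper name `toW3cUtc` is hypothetical. It trims the milliseconds from JavaScript's `toISOString()` output and keeps the trailing `Z`, which is the W3C time zone designator for UTC.

```javascript
// Hypothetical helper: format a Date as a W3C DateTime string in UTC,
// with seconds precision, for example 2024-05-01T14:30:00Z.
function toW3cUtc(date) {
  // toISOString() always returns UTC, for example "2024-05-01T14:30:00.000Z";
  // strip the fractional seconds to match YYYY-MM-DDThh:mm:ss.
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// 10:30 in UTC-04:00 is 14:30 UTC.
console.log(toW3cUtc(new Date("2024-05-01T10:30:00-04:00"))); // → 2024-05-01T14:30:00Z
```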
## Leading practices
* Make sure you have the right to [crawl](https://docs.coveo.com/en/2121/) public content if you don't own the website.
Crawling sites that you neither own nor have the right to crawl could create reachability issues.
Some sites use infrastructure components such as CDN/Caching providers (for example, Akamai, Cloudflare, and Varnish) that can affect Coveo's ability to retrieve content.
For example, a CDN/Caching provider can detect the Coveo crawler and block it from further crawling.
If you're unfamiliar with these mechanisms, learn about them before you configure your source.
* Always try authenticating without a [custom login sequence](https://docs.coveo.com/en/3289/) first.
You should only start working on a custom login sequence when you're sure your form authentication details (that is, login address, user credentials, confirmation method) are accurate and that the standard form authentication process doesn't work.
* It's best to create or edit your source in your sandbox organization first.
Once you have confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either [with a snapshot](https://docs.coveo.com/en/3239/) or manually.
See [About non-production organizations](https://docs.coveo.com/en/2959/) for more information and best practices regarding sandbox organizations.
* Though it's possible to index multiple domains by configuring the source outside the main user interface, doing so is a bad practice.
Always create one source per domain.
This helps:
** Prevent the crawler from using your source authentication credentials on an external site.
** Reduce the number and complexity of crawling and scraping rules.
** Optimize source configurations for each site.
** Avoid having a rebuild/rescan issue on one site cause the deletion of indexed items associated with the other sites.
* The number of [items](https://docs.coveo.com/en/210/) that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration.
See [About crawling speed](https://docs.coveo.com/en/2078/) for information on what can impact crawling speed, as well as possible solutions.
* Break down large sitemap files into multiple sitemap files.
* Group your source and the other implementation [resources](https://docs.coveo.com/en/2820/) together in a [project](https://docs.coveo.com/en/n7ed6189/).
See [Manage projects](https://docs.coveo.com/en/n7ef0517/).
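To illustrate the practice of breaking down large sitemap files: the Sitemap protocol limits each sitemap file to 50,000 URLs and 50 MB (uncompressed), and a sitemap index file can list several sitemap files. The following minimal sketch splits a list of entries into sitemap files and builds a matching index. The `sitemap-N.xml` file naming, the base URL, and the function names are assumptions for illustration.

```javascript
// Escape XML special characters so URLs are valid inside <loc> elements.
function xmlEscape(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Split { loc, lastmod? } entries into sitemap files of at most maxUrls URLs each.
function buildSitemaps(entries, maxUrls = 50000) {
  const files = [];
  for (let i = 0; i < entries.length; i += maxUrls) {
    const body = entries
      .slice(i, i + maxUrls)
      .map((e) => `  <url><loc>${xmlEscape(e.loc)}</loc>` +
        (e.lastmod ? `<lastmod>${e.lastmod}</lastmod>` : "") + "</url>")
      .join("\n");
    files.push('<?xml version="1.0" encoding="UTF-8"?>\n' +
      `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${body}\n</urlset>`);
  }
  return files;
}

// Build a sitemap index referencing each file (hypothetical sitemap-N.xml naming).
function buildSitemapIndex(baseUrl, fileCount) {
  const items = Array.from({ length: fileCount }, (_, n) =>
    `  <sitemap><loc>${xmlEscape(`${baseUrl}/sitemap-${n + 1}.xml`)}</loc></sitemap>`).join("\n");
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${items}\n</sitemapindex>`;
}

// Usage: 5 URLs with a deliberately small limit of 2 per file produce 3 sitemap files.
const entries = ["a", "b", "c", "d", "e"].map((p) => ({
  loc: `https://myorgwebsite.com/${p}`,
  lastmod: "2024-05-01T14:30:00Z",
}));
const files = buildSitemaps(entries, 2);
console.log(files.length); // → 3
console.log(buildSitemapIndex("https://myorgwebsite.com", files.length));
```

You would then enter the sitemap index URL as the sitemap URL of your source.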
## Add a Sitemap source
To add a source
. On the [**Sources**](https://platform.cloud.coveo.com/admin/#/orgid/content/sources/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/sources/)) page, click **Add source**.
. In the **Add a source of content** panel, click the **Cloud** (icon:cloud-icon[alt=cloud-icon,width=16]) or [**Crawling Module**](https://docs.coveo.com/en/3260/) ([crawlingmodule]) tile, depending on your [content retrieval context](https://docs.coveo.com/en/1612/).
With the latter, you must [install the Crawling Module](https://docs.coveo.com/en/3263/) to make your source operational.

. [[addstep3]]In the **Add a new Sitemap source** / **Add a new Crawling Module Sitemap source** panel, fill in the following fields.
--
**Name**: Use a short and descriptive name, using only letters, numbers, hyphens (-), and underscores (_).
The source name can't be modified once it's saved.
**Sitemap URLs**: Enter the direct URL of your sitemap file, not the website address.
Otherwise, the source can interpret the address as an HTML sitemap page and crawl the links it contains.
--
--
**Examples of sitemap URLs**
* Public website sitemap: `+http://myorgwebsite.com/sitemap.xml+`
* Public website sitemap compressed with GZIP: `+http://myorgwebsite.com/sitemap.xml.gz+`
> **Notes**
>
> * If the sitemap URL is an HTML page that uses JavaScript to generate a sitemap or redirect to a sitemap or sitemap index, enable [JavaScript execution on pages](#execute-javascript-on-pages) in the source's advanced settings.
>
> * The [`ParseSitemapInStrictMode`](https://docs.coveo.com/en/3158#parsesitemapinstrictmode-boolean) JSON parameter dictates the extent of validation the Sitemap source applies on sitemap and sitemap index files, and on their referenced URLs.
>
> * The Sitemap source only crawls pages listed in a sitemap file.
> It doesn't crawl links in the listed web pages themselves.
--
**Crawling Module**: If you're creating a Crawling Module Sitemap source, select the installed Crawling Module instance.
**Project**: Specify the [projects](https://docs.coveo.com/en/n7ef0517/) you want to associate your source with.
> **Note**
>
> After source creation, you can update your Coveo project selection under the [**Identification**](#identification-subtab) subtab.
. Click **Next**.
. Select who has [permission to access the content](https://docs.coveo.com/en/1779/) through the search interface and click **Add source**.
> **Note**
>
> This information is editable later in the [**Content security**](#content-security-tab) tab.
. Configure your [source](https://docs.coveo.com/en/246/).
> **Note**
>
> You can save your source settings at any time by clicking **Save**.
### "Configuration" tab
The **Configuration** tab lets you manage the crawling rules, web scraping configurations, advanced settings, and authentication methods of your source.
These configuration groups are presented in subtabs.
#### "Crawling rules" subtab
The **Crawling rules** subtab lets you define the specific pages to [index](https://docs.coveo.com/en/204/).
##### Sitemap URLs
Enter the direct URL of your sitemap file, not the website address.
Otherwise, the source can interpret the address as an HTML sitemap page and crawl the links it contains.
**Examples of sitemap URLs**
* Public website sitemap: `+http://myorgwebsite.com/sitemap.xml+`
* Public website sitemap compressed with GZIP: `+http://myorgwebsite.com/sitemap.xml.gz+`
> **Notes**
>
> * If the sitemap URL is an HTML page that uses JavaScript to generate a sitemap or redirect to a sitemap or sitemap index, enable [JavaScript execution on pages](#execute-javascript-on-pages) in the source's advanced settings.
>
> * The [`ParseSitemapInStrictMode`](https://docs.coveo.com/en/3158#parsesitemapinstrictmode-boolean) JSON parameter dictates the extent of validation the Sitemap source applies on sitemap and sitemap index files, and on their referenced URLs.
>
> * The Sitemap source only crawls pages listed in a sitemap file.
> It doesn't crawl links in the listed web pages themselves.
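For a compressed sitemap such as `sitemap.xml.gz`, the web server response must have a `Content-Type` of `application/gzip` (see the **Sitemap file format** row in [Source key characteristics](#source-key-characteristics)). The following Node.js sketch shows how a client could use that header to decide whether to decompress a sitemap response body; `decodeSitemapBody` is a hypothetical helper, not part of the Coveo crawler.

```javascript
const zlib = require("zlib");

// Hypothetical helper: return the sitemap text from a raw response body (a Buffer).
// Decompress only when the server labels the body as application/gzip.
function decodeSitemapBody(contentType, body) {
  const mediaType = (contentType || "").split(";")[0].trim().toLowerCase();
  if (mediaType === "application/gzip") {
    return zlib.gunzipSync(body).toString("utf8");
  }
  return body.toString("utf8");
}

// Usage: simulate a server returning a gzipped sitemap.
const xml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>';
const gzBody = zlib.gzipSync(Buffer.from(xml, "utf8"));
console.log(decodeSitemapBody("application/gzip", gzBody) === xml); // → true
```

If the server sends a `.gz` file with another `Content-Type`, this sketch would return the compressed bytes as garbled text, which mirrors why the header requirement matters.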
##### Exclusions and inclusions
Add exclusion and inclusion rules to crawl only specific items based on their URL.

The following diagram illustrates how the Sitemap [crawler](https://docs.coveo.com/en/2121/) applies the exclusion and inclusion rules.
This flow applies to all pages, including the sitemap URLs, so make sure your rules don't filter out your sitemap URLs.

> **About the "Include all non-excluded pages" option**
>
> [.float-group]
> --
> 
>
> The **Include all non-excluded pages** option automatically adds an "include all" inclusion rule in the background.
> This ensures that all sitemap URLs meet the `Does URL match at least one inclusion rule?` condition and that all non-excluded pages get crawled.
>
> --
You can use any of the six types of rules:
* **is** and a URL that includes the protocol.
For example, `+https://myfood.com/+`.
* **contains** and a string found in the URL.
For example, `recipes`.
* **begins with** and a string found at the beginning of the URL and which includes the protocol.
For example, `+https://myfood+`.
* **ends with** and a string found at the end of the URL.
For example, `.pdf`.
* **matches wildcard rule** and a wildcard expression that matches the whole URL.
For example, `+https://myfood.com/recipes*+`.
* **matches regex rule** and a regex rule that matches the whole URL.
For example, `^.*(company-(dev|staging)).*html.?$`.
> **Tip**
>
> When using regex rules, make sure they match the desired URLs with a testing tool such as [Regex101](https://regex101.com/).
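The rule types and the exclusion-then-inclusion flow described above can be sketched as follows. This is an illustration under assumptions, not Coveo's implementation: a URL matching any exclusion rule is skipped, a non-excluded URL is crawled only if it matches at least one inclusion rule, comparisons are case-sensitive, and `*` is the only wildcard character. The rule object shape and the `type` identifiers are hypothetical.

```javascript
// Escape regex metacharacters so literal text can be embedded in a RegExp.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical rule shape: { type: "is" | "contains" | "beginsWith" | "endsWith" | "wildcard" | "regex", value }
function matchesRule(url, rule) {
  switch (rule.type) {
    case "is": return url === rule.value;
    case "contains": return url.includes(rule.value);
    case "beginsWith": return url.startsWith(rule.value);
    case "endsWith": return url.endsWith(rule.value);
    case "wildcard": {
      // Assumption: "*" matches any run of characters; the pattern must match the whole URL.
      const pattern = rule.value.split("*").map(escapeRegex).join(".*");
      return new RegExp(`^${pattern}$`).test(url);
    }
    case "regex":
      // The regex must match the whole URL, so anchor it explicitly.
      return new RegExp(`^(?:${rule.value})$`).test(url);
    default:
      throw new Error(`Unknown rule type: ${rule.type}`);
  }
}

// Exclusions are checked first; a non-excluded URL must match at least one inclusion rule.
function shouldCrawl(url, exclusions, inclusions) {
  if (exclusions.some((r) => matchesRule(url, r))) return false;
  return inclusions.some((r) => matchesRule(url, r));
}

const inclusions = [
  { type: "beginsWith", value: "https://myfood.com/recipes" },
  { type: "is", value: "https://myfood.com/sitemap.xml" },
];
const exclusions = [{ type: "endsWith", value: ".pdf" }];

console.log(shouldCrawl("https://myfood.com/recipes/pie.html", exclusions, inclusions)); // → true
console.log(shouldCrawl("https://myfood.com/recipes/pie.pdf", exclusions, inclusions)); // → false
console.log(shouldCrawl("https://myfood.com/sitemap.xml", exclusions, inclusions)); // → true
console.log(shouldCrawl("https://myfood.com/news/today.html", exclusions, inclusions)); // → false
```

In this example, the **is** rule keeps the sitemap URL itself crawlable; without it, `shouldCrawl` would return `false` for the sitemap, which is the situation the note about sitemap URLs warns against.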
#### "Web scraping" subtab
The **Web scraping** subtab lists and lets you manage [web scraping](https://docs.coveo.com/en/2767/) configurations for your source.
When the crawler is about to index a page, it checks whether any of the web scraping configurations you defined apply to that page.
The crawler considers the [**Pages to target**](https://docs.coveo.com/en/mahe0350#configuration-info) rules of each of your web scraping configurations, starting with the configuration at the top of your list.
The crawler will either apply [the first matching configuration or all matching configurations](#single-match-vs-multi-match).
Indexing irrelevant page sections and not extracting custom metadata reduces the quality of search results.
With this in mind, all new Sitemap sources are created with a default web scraping configuration that excludes typical repetitive elements found in web pages that shouldn't be indexed.

Existing Sitemap sources without a web scraping configuration prompt you to add the default configuration when you access the **Web scraping** subtab.

> **Important**
>
> When no web scraping configuration is defined:
>
> * All [crawling rules included pages](#crawling-rules-subtab) are indexed in their entirety (that is, no sections are excluded).
>
> * No custom metadata is extracted.
The Sitemap source features two web scraping configuration management modes: [UI-assisted mode](#ui-assisted-mode) and [Edit with JSON mode](#edit-with-json-mode).
##### UI-assisted mode
You can add (+), edit ([edit]), and delete ([delete]) _one_ web scraping configuration at a time with a user interface that makes many technical aspects transparent.
UI-assisted mode is easier to use and more mistake-proof than Edit with JSON mode.
This is now the recommended mode for all web scraping configurations.
When you add or edit a web scraping configuration using UI-assisted mode, the **Add/Edit a web scraping configuration** panel is displayed.
See [Configurations in UI-assisted mode](https://docs.coveo.com/en/mahe0350#configurations-in-ui-assisted-mode) for more details.
##### Edit with JSON mode
The **Edit with JSON** button gives access to the _aggregated_ web scraping JSON configuration of the source.
Adding, editing, and deleting configurations directly in the JSON requires more technical skills than using UI-assisted mode.
When you add or edit a web scraping configuration in Edit with JSON mode, the **Edit a web scraping JSON configuration** panel is displayed.
See [Configurations in Edit with JSON mode](https://docs.coveo.com/en/mahe0350#configurations-in-edit-with-json-mode) for more details.
##### Single-match vs multi-match
The Sitemap source can apply web scraping configurations in two ways: single-match or multi-match.
In single-match mode, the crawler applies only the first matching web scraping configuration.
In multi-match mode, the crawler applies all matching web scraping configurations.
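The difference between the two modes can be sketched as follows, using configurations evaluated in list order. The configuration objects, their `pagesToTarget` predicates, and `exclude` selector lists are hypothetical simplifications of the real **Pages to target** rules and web scraping settings.

```javascript
// Hypothetical, simplified web scraping configurations, in list order (top first).
const configurations = [
  { name: "Recipes", pagesToTarget: (url) => url.includes("/recipes/"), exclude: [".comments"] },
  { name: "News", pagesToTarget: (url) => url.includes("/news/"), exclude: [".ads"] },
  { name: "Global", pagesToTarget: () => true, exclude: ["header", "footer", "nav"] },
];

// Return the configurations applied to a page in the given mode.
function configurationsToApply(url, configs, mode) {
  const matching = configs.filter((c) => c.pagesToTarget(url));
  // Single-match: only the first matching configuration. Multi-match: all of them.
  return mode === "single" ? matching.slice(0, 1) : matching;
}

const page = "https://myfood.com/recipes/pie.html";
console.log(configurationsToApply(page, configurations, "single").map((c) => c.name)); // → [ 'Recipes' ]
console.log(configurationsToApply(page, configurations, "multi").map((c) => c.name)); // → [ 'Recipes', 'Global' ]
```

With these configurations, a recipe page in single-match mode only gets the `Recipes` configuration, so the `Global` configuration's header, footer, and navigation exclusions never apply to it.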
The animation below demonstrates the application of three web scraping configurations on a culinary website featuring news articles and recipe pages, in single-match mode (left) and multi-match mode (right).

Sitemap sources created before mid-December 2023 were created in single-match mode.
All new Sitemap sources are created in multi-match mode.
Coveo converted existing single-match sources containing zero or one web scraping configuration to multi-match mode.
We recommend you convert any remaining single-match Sitemap sources to multi-match mode.
If a Sitemap source is currently in single-match mode, the **Web scraping** subtab displays a banner prompting you to convert to multi-match mode.

To convert a source to multi-match mode
. In the **Web scraping** subtab, click **Switch to multi-match mode**.
. Confirm you want to convert the source to multi-match mode.
A green **You're currently in multi-match mode** banner will then appear.
. Click **Save**.
Once your source is fully converted, the **Web scraping** subtab no longer shows the green banner and the subtab description reflects the multi-match mode behavior.

#### "Advanced settings" subtab
The **Advanced settings** subtab lets you customize the Coveo crawler behavior.
All advanced settings have default values, which are adequate in most use cases.
##### Execute JavaScript on pages
Because this option can significantly increase the time needed to crawl pages, only enable it when at least one of the following is true:
* The website content you want to consider for indexing is dynamically rendered by JavaScript.
* A [sitemap URL](#sitemap-urls) you specified is an HTML page that uses JavaScript to generate or redirect to a sitemap or sitemap index file.
If you enable **Execute JavaScript on pages**, you'll have the following options:
* **Add time for the crawler to wait before considering a page as fully rendered**:
The default value of this setting is `0`, which means that the crawler doesn't wait after the page is loaded to retrieve its content.
If the JavaScript takes longer to execute than normal or makes asynchronous calls, consider increasing this value to ensure that pages with longer rendering times are indexed with their dynamically rendered content.
* **Enable Shadow DOM content retrieval**:
When you enable this option, the crawler builds a flattened DOM tree by combining the light DOM and the Shadow DOM.
It then processes the resulting structure as it would any other web page.
> **Note**
>
> The crawler adds a custom attribute to the shadow root elements in the flattened DOM, allowing these elements to be targeted using a [special web scraping CSS selector](https://docs.coveo.com/en/mahe0350#css-selectors).
> **Important**
>
> Building the composed DOM can significantly slow down indexing.
> Enable this option only if the Shadow DOM contains valuable content you need to index.
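Conceptually, flattening places each shadow root's content alongside the host element's light DOM children in a single tree. The following sketch models nodes as plain objects so it runs without a browser. It's a simplification: it ignores `<slot>` distribution, and the `shadow-root` wrapper element and `data-shadow-root` attribute are hypothetical stand-ins for the custom attribute the crawler actually adds (see the linked CSS selector documentation for the real one).

```javascript
// Plain-object DOM model: element = { tag, attrs?, children?, shadowRoot? }, text = { text }.
// Simplification: <slot> distribution is ignored; shadow content is placed before light children.
function flatten(node) {
  if (node.text !== undefined) return { text: node.text };
  const children = [];
  if (node.shadowRoot) {
    children.push({
      tag: "shadow-root",
      attrs: { "data-shadow-root": "" }, // hypothetical marker, not Coveo's actual attribute
      children: node.shadowRoot.children.map(flatten),
    });
  }
  children.push(...(node.children || []).map(flatten));
  return { tag: node.tag, attrs: { ...(node.attrs || {}) }, children };
}

// Serialize the flattened tree as HTML-like markup.
function serialize(node) {
  if (node.text !== undefined) return node.text;
  const attrs = Object.entries(node.attrs)
    .map(([k, v]) => (v === "" ? ` ${k}` : ` ${k}="${v}"`))
    .join("");
  return `<${node.tag}${attrs}>${node.children.map(serialize).join("")}</${node.tag}>`;
}

const host = {
  tag: "product-card",
  attrs: { id: "p1" },
  shadowRoot: { children: [{ tag: "p", children: [{ text: "Shadow price: $5" }] }] },
  children: [{ tag: "span", children: [{ text: "Light title" }] }],
};
console.log(serialize(flatten(host)));
// → <product-card id="p1"><shadow-root data-shadow-root><p>Shadow price: $5</p></shadow-root><span>Light title</span></product-card>
```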
##### User Agent string
The [user agent](https://en.wikipedia.org/wiki/User_agent) string that the Sitemap source crawler uses to identify itself when requesting pages from your web server.
The default value is `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko) (compatible; Coveobot/2.0;+http://www.coveo.com/bot.html)`.
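If a CDN, firewall, or bot-management rule on your side must recognize the crawler (see the CDN/Caching provider practice in [Leading practices](#leading-practices)), it could match the `Coveobot` token in the default user agent string, as in this minimal sketch. The helper name `isCoveoCrawler` is hypothetical; adjust the pattern if you change the user agent string. Because user agent strings can be spoofed, don't rely on this check for security.

```javascript
const DEFAULT_COVEO_UA =
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko) (compatible; Coveobot/2.0;+http://www.coveo.com/bot.html)";

// Hypothetical helper: detect the crawler by its product token, for example "Coveobot/2.0".
function isCoveoCrawler(userAgent) {
  return /\bCoveobot\/\d+(\.\d+)*/.test(userAgent || "");
}

console.log(isCoveoCrawler(DEFAULT_COVEO_UA)); // → true
console.log(isCoveoCrawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")); // → false
```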
##### Extract JSON-LD metadata
If you have [JSON-LD](https://json-ld.org/) metadata in your HTML pages that you want to index, enable the **Extract JSON-LD metadata** option.
When enabled, JSON-LD objects in the webpage are extracted, flattened, and represented in `jsonld.parent.child` metadata format in your [Coveo organization](https://docs.coveo.com/en/185/).
**Example**
Given the following JSON-LD script tag in a webpage:
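```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "publisher": {
    "@type": "Organization",
    "name": "BBC News"
  }
}
</script>
```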
On an [indexing](https://docs.coveo.com/en/204/) action, the Sitemap [connector](https://docs.coveo.com/en/2734/) would extract `BBC News` as the value for the `jsonld.publisher.name` [metadata](https://docs.coveo.com/en/218/).
To [index this metadata](https://docs.coveo.com/en/m9ti0339#index-metadata), you would therefore need to use `%[jsonld.publisher.name]` as the [mapping rule](https://docs.coveo.com/en/1839/) for your field.
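The flattening into `jsonld.parent.child` names can be approximated as follows. This minimal sketch isn't the connector's implementation: it finds `application/ld+json` script tags with a regular expression, and because this page doesn't say how arrays and `@`-prefixed keys (such as `@type`) are named, it assumes numeric path segments for array items and keeps `@` keys as-is.

```javascript
// Find JSON-LD script tags and parse their content (regex-based, for illustration only).
function extractJsonLd(html) {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  return [...html.matchAll(re)].map((m) => JSON.parse(m[1]));
}

// Flatten nested objects into "jsonld.parent.child" keys.
// Assumption: array items get numeric path segments and "@" keys are kept as-is.
function flattenJsonLd(value, prefix = "jsonld", out = {}) {
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flattenJsonLd(child, `${prefix}.${key}`, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

const html = `<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "publisher": { "@type": "Organization", "name": "BBC News" }
}
</script>`;

const metadata = Object.assign({}, ...extractJsonLd(html).map((obj) => flattenJsonLd(obj)));
console.log(metadata["jsonld.publisher.name"]); // → BBC News
```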
##### Time the crawler waits between requests to your server
Indicate the number of milliseconds between consecutive HTTP requests to the website server.
The default value is 1000 milliseconds, which limits the crawler to at most one request per second.
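Because the delay is applied between consecutive requests, the default of 1000 milliseconds alone caps the crawler at 3,600 requests per hour (3,600,000 ms ÷ 1000 ms); actual throughput is lower, since each request also takes time to complete. The following sketch of a sequential, delayed crawl loop illustrates this; `politeCrawl`, `maxRequestsPerHour`, and the injected `fetchPage` function are hypothetical.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Upper bound on requests per hour imposed by the delay alone.
function maxRequestsPerHour(delayMs) {
  return Math.floor((60 * 60 * 1000) / delayMs);
}

// Fetch URLs one at a time, waiting delayMs between consecutive requests.
// fetchPage is an injected function, for example a wrapper around fetch().
async function politeCrawl(urls, fetchPage, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delayMs);
    results.push(await fetchPage(urls[i]));
  }
  return results;
}

console.log(maxRequestsPerHour(1000)); // → 3600
console.log(maxRequestsPerHour(500)); // → 7200

// Usage with a stand-in fetcher and a short 50 ms delay.
politeCrawl(["https://myorgwebsite.com/a", "https://myorgwebsite.com/b"], async (url) => `fetched ${url}`, 50)
  .then((pages) => console.log(pages.length)); // → 2
```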
#### "Authentication" subtab
The **Authentication** settings, used by the source crawler, emulate the behavior of a user authenticating to access restricted website content.
If authentication is required, select the authentication type your website uses, whether [**Basic authentication**](#basic-authentication) or [**Form authentication**](#form-authentication).
Then, provide the corresponding login details.
> **Warning**
>
> Whether you use Basic or Form authentication, limit your source [crawling scope](#crawling-rules-subtab) to one domain that you own.
> This reduces the risk of exposing your authentication credentials.
> **Note**
>
> Manual form authentication is now only available on legacy sources.
> We recommend you [migrate existing Manual form authentication sources](#migrate-from-manual-form-authentication) to Form authentication.
##### Basic authentication
When selecting **Basic authentication**, enter the credentials of an account on the website you're making searchable.
See [Source credentials leading practices](https://docs.coveo.com/en/1920/).
> **Important**
>
> When **Execute JavaScript on pages** is enabled on the source, basic authentication significantly impacts indexing performance.
If your sitemap contains a link to a page of a different domain or subdomain that also requires basic authentication, the Sitemap source will provide the credentials you entered when challenged.
> **Important**
>
> To prevent exposing your credentials, provide username and password information only when the site uses a communication protocol secured with TLS or SSL (HTTPS).
> You are responsible for ensuring that your Sitemap links requiring basic authentication credentials use HTTPS for increased security.
> The basic authentication credentials you enter will be provided, regardless of whether the link requiring these credentials uses HTTP or HTTPS.
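The HTTPS recommendation exists because Basic authentication (RFC 7617) only Base64-encodes the `username:password` pair, which anyone who intercepts the request can reverse. The following Node.js sketch builds an `Authorization` header value and decodes it back; the credentials and function names are placeholders for illustration.

```javascript
// Build the value of an HTTP Basic "Authorization" header (RFC 7617).
function basicAuthHeader(username, password) {
  return "Basic " + Buffer.from(`${username}:${password}`, "utf8").toString("base64");
}

// Base64 is an encoding, not encryption: the header is trivially reversible.
function decodeBasicAuthHeader(header) {
  const decoded = Buffer.from(header.replace(/^Basic\s+/i, ""), "base64").toString("utf8");
  const sep = decoded.indexOf(":");
  return { username: decoded.slice(0, sep), password: decoded.slice(sep + 1) };
}

const header = basicAuthHeader("jsmith", "s3cret");
console.log(header); // → Basic anNtaXRoOnMzY3JldA==
console.log(decodeBasicAuthHeader(header)); // → { username: 'jsmith', password: 's3cret' }
```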
##### Form authentication
You can choose between two form authentication workflows:
**Force authentication disabled (recommended)**
With [Force authentication](#force-authentication) disabled, the workflow typically goes as follows:
. Coveo's crawler requests a protected page.
. The web server redirects the crawler to the [**Login page address**](#login-page-address).
. Using the configured [**Validation method**](#validation-method), the crawler determines it's not authenticated.
This automatically triggers the next step.
. The crawler performs a standard login sequence using the provided **Login details**, or the [**Custom login sequence**](#custom-login-sequence) if one is configured.
. After successful authentication, the web server responds by redirecting back to the requested protected page and returning cookies.
. The crawler follows the server redirect to get the protected page and indexes that page.
. The crawler requests the other pages using the cookies.
This is the default and recommended workflow because it best emulates human behavior and ensures that the crawler re-authenticates when needed.
**Force authentication enabled**
With [Force authentication](#force-authentication) enabled, the workflow typically goes as follows:
. The crawler performs a standard login sequence using the provided **Login details**, or the [**Custom login sequence**](#custom-login-sequence) if one is configured.
. After successful authentication, the web server responds with cookies that the crawler will use to request other pages.
. The crawler requests the first URL from the web server using the cookies and indexes that page.
. The crawler requests other pages using the cookies.
If the crawler loses authentication at some point (for example, if a cookie expires), it has no way of knowing it must re-authenticate unless you have a proper authentication status [validation method](#validation-method).
As a result, you may notice at some point that your source has indexed some, but _not all_, protected pages.
Only use [**Force authentication**](#force-authentication) when no reliable authentication status [validation method](#validation-method) can be configured.
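The two workflows above can be compared in a simplified sketch. It isn't Coveo's implementation: `requestPage`, `login`, and `isAuthenticated` are hypothetical injected functions standing in for the HTTP client, the standard or custom login sequence, and the configured validation method, and the fake site simulates a session that expires after two requests.

```javascript
// Simplified sketch of the two form authentication workflows.
async function crawlWithFormAuth(urls, {
  requestPage,
  login,
  isAuthenticated = () => true, // default models "no reliable validation method"
  forceAuthentication = false,
}) {
  // Force authentication: log in before the first request, whether or not it's needed.
  let session = forceAuthentication ? await login() : null;
  const indexed = [];
  for (const url of urls) {
    let response = await requestPage(url, session);
    if (!isAuthenticated(response)) {
      // The validation method detected a missing or lost session: (re-)authenticate, then retry.
      session = await login();
      response = await requestPage(url, session);
    }
    indexed.push({ url, body: response.body });
  }
  return indexed;
}

// Hypothetical test site: each login grants a session valid for 2 requests;
// afterwards, the server redirects to its login page.
function makeFakeSite() {
  let remaining = 0;
  return {
    login: async () => { remaining = 2; return { cookie: "session" }; },
    requestPage: async (url, session) => {
      if (session && remaining > 0) {
        remaining--;
        return { finalUrl: url, body: `content of ${url}` };
      }
      return { finalUrl: "https://mycompany.com/login", body: "Please log in" };
    },
  };
}

const urls = ["a", "b", "c"].map((p) => `https://mycompany.com/${p}`);
const notOnLoginPage = (res) => res.finalUrl !== "https://mycompany.com/login";

// Force authentication disabled, with a validation method: all 3 protected pages are indexed.
crawlWithFormAuth(urls, { ...makeFakeSite(), isAuthenticated: notOnLoginPage })
  .then((pages) => console.log(pages.map((p) => p.body)));

// Force authentication enabled, no validation method: the third entry holds the login page.
crawlWithFormAuth(urls, { ...makeFakeSite(), forceAuthentication: true })
  .then((pages) => console.log(pages[2].body)); // → Please log in
```

With a validation method, the expired session is detected and the third page is fetched after re-authenticating. Without one, the forced-authentication run indexes the login page's content in place of the third protected page, which matches the partial indexing described above.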
> **Note**
>
> The crawler can interact with Shadow DOM elements in your login pages.
> If this is required, make sure the form authentication [loading delay](#loading-delay) allows the Shadow DOM time to load before the crawler begins to interact with the page.
###### Username and password
Enter the credentials required to access the secured content.
See [Source credentials leading practices](https://docs.coveo.com/en/1920/).
###### Login page address
Enter the URL of the website login page where the username and password are to be used.
###### Loading delay
Enter the maximum time the crawler should allow for JavaScript to execute and go through the login sequence before timing out.
###### Validation method
After requesting a page from the web server, the crawler uses the validation method to determine whether it's authenticated.
When the validation method reveals that the crawler isn't authenticated, the crawler immediately tries to re-authenticate.
To configure the validation method
. In the dropdown menu, select your preferred authentication status validation method.
. In the **Value(s)** field, specify the corresponding URL, regex, or text.
** For **Cookie not found** (recommended):
Enter the name of the cookie returned by the server after _successful_ authentication.
If this cookie isn't found, the crawler will immediately authenticate (or re-authenticate).
**Example**
`ASP.NET_SessionId`
** For **Redirection to URL** (recommended):
Enter the URL that users trying to access protected content on the website are redirected to when they're _not_ authenticated.
If the crawler is redirected to this URL, it will immediately authenticate (or re-authenticate).
**Example**
`+https://mycompany.com/login/failed.html+`
** For **Text not found in page** footnote:not-recommended[Less reliable than the recommended validation methods. Can result in false positives, making form authentication issues harder to troubleshoot.]:
Enter the text that appears on the page after _successful_ authentication.
If this text isn't found on the page, the crawler will immediately authenticate (or re-authenticate).
**Example**
When a user successfully logs in, the page shows a `Hello, <USERNAME>!` greeting.
If the login [username you specified](#username-and-password) was `+jsmith@mycompany.com+`,
the text to enter would be:
`Hello, jsmith@mycompany.com!`
**Example**
`Log out`
** For **Text found in page** footnote:not-recommended[]:
Enter the text that appears on the page when a user _isn't_ authenticated.
If this text is found on the page, the crawler will immediately authenticate (or re-authenticate).
**Examples**
* `An error has occurred.`
* `Your username or password is invalid.`
** For **URL matches regex** footnote:not-recommended[]:
Enter a regex rule that matches the URL that users trying to access protected content are redirected to when they're _not_ authenticated.
If the crawler is redirected to a URL that matches this regex, it will immediately authenticate (or re-authenticate).
**Example**
`.+Account\/Login.*`
** For **URL doesn't match regex** footnote:not-recommended[]:
Enter a regex rule that matches the URL that users trying to access protected content are redirected to after _successful_ authentication.
If the crawler isn't redirected to a URL that matches this regex, it will immediately authenticate (or re-authenticate).
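The validation methods above can be expressed as predicates that return `true` when the crawler should (re-)authenticate. In this sketch, the `response` shape (`finalUrl`, `cookieNames`, `text`) and the method keys are hypothetical; it treats the final URL after redirects as the redirection target and assumes that a partial regex match counts.

```javascript
// Each predicate returns true when the crawler should consider itself NOT authenticated,
// which triggers an immediate (re-)authentication.
const needsAuthentication = {
  cookieNotFound: (res, name) => !res.cookieNames.includes(name),
  redirectionToUrl: (res, url) => res.finalUrl === url,
  textNotFoundInPage: (res, text) => !res.text.includes(text),
  textFoundInPage: (res, text) => res.text.includes(text),
  urlMatchesRegex: (res, pattern) => new RegExp(pattern).test(res.finalUrl),
  urlDoesNotMatchRegex: (res, pattern) => !new RegExp(pattern).test(res.finalUrl),
};

// Usage: a response that landed on a login page without a session cookie.
const res = {
  finalUrl: "https://mycompany.com/Account/Login?returnUrl=%2Fdocs",
  cookieNames: ["tracking"],
  text: "Your username or password is invalid.",
};
console.log(needsAuthentication.cookieNotFound(res, "ASP.NET_SessionId")); // → true
console.log(needsAuthentication.urlMatchesRegex(res, String.raw`.+Account\/Login.*`)); // → true
console.log(needsAuthentication.textNotFoundInPage(res, "Log out")); // → true
```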
###### Force authentication
Select this option if you want Coveo's first request to be for authentication, regardless of whether it is actually required.
> **Important**
>
> You should only force authentication if you have no reliable authentication status [validation method](#validation-method).
###### Custom login sequence
The default login sequence for Web and Sitemap sources supports various third-party login pages, such as OneLogin, Google, Salesforce, and Microsoft.
The default login sequence also tries to detect and log in to first-party login forms.
The login process uses the first `