---
title: Sitemap source JSON modification
slug: '3158'
canonical_url: https://docs.coveo.com/en/3158/
collection: index-content
source_format: adoc
---

# Sitemap source JSON modification

Many [source](https://docs.coveo.com/en/246/) configuration parameters can be set through the user interface. Others, such as rarely used or new parameters, must be configured in the **Edit configuration with JSON** panel.

To access this panel from the [**Sources**](https://platform.cloud.coveo.com/admin/#/orgid/content/sources/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/sources/)) page, click the source, and then click **Edit configuration with JSON** in the **More** menu.

This article explains how to configure Sitemap source parameters, whether they're already listed in the JSON or not.

## Configuring listed and unlisted parameters

![Changing a parameter value in the source JSON configuration | Coveo](https://docs.coveo.com/en/assets/images/index-content/changing-parameter-values.gif)

If the parameter you want to change is already listed in the `parameters` section of the source JSON configuration, just modify its `value` in the JSON configuration.

If the parameter isn't listed in the `parameters` section, copy the entire parameter example object from the [Reference](#reference) section below and paste it into that section. Then, update the `value` in the JSON configuration, if necessary.

> **Important**
>
> If a parameter has a `value` attribute that contains sensitive information, set the `sensitive` attribute to `true`.
> Otherwise, the value will appear in clear text in the JSON configuration.

> **Tip**
>
> Document the changes you make to the source JSON configuration in the **Comments** area below the JSON configuration.
> This ensures that you can easily revert to a previous configuration if needed.

## Reference

This section provides information on the Sitemap source parameters that you can only modify through the JSON configuration.

If a JSON configuration parameter isn't documented in this article, configure it through the source edition panel instead.

### `AdditionalHeaders` (String)

Semicolon-separated list of additional HTTP headers added to the connector requests in the following format: `key1\\=value1\\;key2\\=value2`. The parameter is empty by default.

Don't use the `AdditionalHeaders` parameter to send the `Authorization` header. This will generate an error as soon as you start an indexing action.

Avoid using the `AdditionalHeaders` parameter to submit sensitive information, such as API keys or authentication tokens. Doing so is bad practice.

> **Important**
>
> When **Execute JavaScript on pages** is enabled on the source, adding manual cookies significantly impacts indexing performance.

**Example**

```json
"AdditionalHeaders": {
  "sensitive": true,
  "value": "X-CSRF-Token\\=<TOKEN>"
}
```

where you replace `<TOKEN>` with the actual token.

### `AllowAutoRedirect` (Boolean)

Whether a crawler request should automatically follow redirection responses from the web resource.

When set to `true`, the crawler only performs a single HTTP request (that is, for the current page). It automatically follows server HTTP redirect responses to reach the final redirection page.

When set to `false`, the crawler performs an HTTP request for the current page, and then another for each server HTTP redirect response until it reaches the final page.

The default value is `true`.
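The difference between the two settings comes down to how many requests the crawler itself issues. The following Python sketch is a toy model of that description only: `crawler_request_count` and the `redirects` mapping are hypothetical names introduced for illustration, and nothing here reflects the connector's actual implementation.

```python
def crawler_request_count(redirects, start_url, allow_auto_redirect):
    """Count the HTTP requests the crawler issues to reach the final page.

    `redirects` maps a URL to the URL it redirects to (hypothetical test data).
    """
    hops = 0
    url = start_url
    visited = {url}
    while url in redirects:
        url = redirects[url]
        if url in visited:
            raise ValueError(f"Redirect loop detected at {url}")
        visited.add(url)
        hops += 1
    # With automatic redirection, the redirects are followed within the
    # single crawler request. Without it, each redirect response costs the
    # crawler one additional request.
    return 1 if allow_auto_redirect else 1 + hops


redirects = {
    "http://example.com/old": "http://example.com/moved",
    "http://example.com/moved": "http://example.com/new",
}
print(crawler_request_count(redirects, "http://example.com/old", True))   # 1
print(crawler_request_count(redirects, "http://example.com/old", False))  # 3
```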
**Example**

```json
"AllowAutoRedirect": {
  "sensitive": false,
  "value": "false"
}
```

### `AllowedDeletionPercentage` (Integer)

This parameter specifies the maximum allowed percentage of [source](https://docs.coveo.com/en/246/) [items](https://docs.coveo.com/en/210/) that can be deleted from the index at the end of a [rescan](https://docs.coveo.com/en/2711/). If the actual percentage of source items to delete exceeds this value, no items are deleted from the index. By default, this parameter is set to `100`, which means that all source items can be deleted.

The purpose of this parameter is to prevent accidental mass item deletions. This can occur, for example, because of an improper source configuration or if the content to index was moved.

For more information about this parameter and its usage, see [Forbid item deletion based on a percentage condition](https://docs.coveo.com/en/2006#forbid-item-deletion-based-on-a-percentage-condition).

**Example**

You can set `AllowedDeletionPercentage` to `10` in the JSON configuration of your source, as shown in the snippet below. With this configuration, if Coveo detects that more than 10% of the items are flagged for deletion during a rescan, deletion will be blocked. The status on the [**Sources**](https://platform.cloud.coveo.com/admin/#/orgid/content/sources/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/sources/)) page will show your source in error, and the error details will indicate the actual percentage of items that were flagged for deletion versus the allowed percentage (in this case, 10%).
```json
"AllowedDeletionPercentage": {
  "sensitive": false,
  "value": "10"
}
```

### `DateFormat` (String)

(For XML sitemaps only) The custom date format of the Sitemap file. Specify it when the last modification dates aren't in a standard format (for example, `YYYY-MM-DDThh:mm:ss.sTZD`), which triggers the `SITEMAP_INVALID_FORMAT_ERROR` error in the Administration Console.

The format must use the MSDN format specifiers (see [Custom date and time format strings](https://docs.microsoft.com/en-us/dotnet/standard/base-types/custom-date-and-time-format-strings?redirectedfrom=MSDN)).

**Example**

```json
"DateFormat": {
  "sensitive": false,
  "value": "yyyy;MM;ddTHH:mm:sszzz"
}
```

### `EnableJavaScriptRenderingOptimizations` (Boolean)

Whether to enable JavaScript rendering optimizations. When set to `true`, the crawler doesn't download images and external files. The default value is `true`.

**Example where you would set `EnableJavaScriptRenderingOptimizations` to `false`**

On a page, you have a dynamically generated table. The data in the table comes from a JSON file, downloaded from a server, using JavaScript. To index the table data, you would need to set `EnableJavaScriptRenderingOptimizations` to `false`.

**Example**

```json
"EnableJavaScriptRenderingOptimizations": {
  "sensitive": false,
  "value": "false"
}
```

### `ForceBasicAuthorizationHeader` (Boolean)

Whether to enforce basic header authentication. The default value is `false`.

> **Note**
>
> To use `ForceBasicAuthorizationHeader`, you need [**Execute JavaScript on pages**](https://docs.coveo.com/en/1967#execute-javascript-on-pages) to be disabled.

> **Warning**
>
> Enabling this setting is unsafe, as your [basic authentication credentials](https://docs.coveo.com/en/1967#basic-authentication) will be sent [with every page your source requests](https://docs.coveo.com/en/1967#crawling-rules-subtab), regardless of the domain.
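The warning above is easier to appreciate once you see what a Basic `Authorization` header contains. Per RFC 7617, it's only a Base64 encoding of `username:password`, not encryption, so any server that receives the header can recover the credentials. The following Python sketch is illustrative only: the function names are hypothetical, and it doesn't reproduce how the Sitemap connector builds its requests.

```python
import base64


def basic_authorization_header(username, password):
    """Build the value of a Basic `Authorization` header (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_authorization_header(header_value):
    """Recover the credentials from a Basic header, as any receiving server could."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("Not a Basic authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_authorization_header("Aladdin", "open sesame")
print(header)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(decode_basic_authorization_header(header))  # ('Aladdin', 'open sesame')
```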
**Example**

```json
"ForceBasicAuthorizationHeader": {
  "sensitive": false,
  "value": "true"
}
```

### `HtmlXPathSelectorExpression` (String)

The [XPath](https://developer.mozilla.org/en-US/docs/Web/XPath) expression used to select one or more nodes of an HTML format sitemap file containing the URLs to crawl. The parameter is empty by default, which results in the connector indexing all web pages listed in the sitemap file.

**Example**

You only want to index a specific part (only the web pages linked inside the `cbc-sitemap` div container) of the CBC sitemap web page, so you add the following:

```json
"HtmlXPathSelectorExpression": {
  "sensitive": false,
  "value": "/div[@id='cbc-sitemap']"
}
```

> **Note**
>
> The [`ParseSitemapInStrictMode`](#parsesitemapinstrictmode-boolean) JSON parameter must also be set to `false` since an HTML format sitemap file doesn't follow the [Sitemap protocol](https://www.sitemaps.org/protocol.html).

### `IndexHtmlMetadata` (Boolean)

Whether metadata tags found in HTML files should be indexed by the Sitemap crawler. When enabled, the `content` attribute of `<meta>` tags is indexed for tags keyed with one of the following attributes: `name`, `property`, `itemprop`, or `http-equiv`.

The default value is `false` since the parameter has an impact on indexing performance. By default, the Coveo converter extracts metadata from `<meta>` HTML tags with a `name` attribute more efficiently. Therefore, consider enabling this option only when you want to extract from `<meta>` tags with a `property`, `itemprop`, or `http-equiv` attribute.

**Example**

```json
"IndexHtmlMetadata": {
  "sensitive": false,
  "value": "true"
}
```

See [Index HTML page metadata](https://docs.coveo.com/en/o2ta0401/) for more information.

### `ManualCookies` (String | Null)

The list of cookies to add to all HTTP requests the crawler sends to the web server.

The following are examples of use cases for `ManualCookies`:

* Identifying the Sitemap source crawler as the current web browser.
* Setting user interface customization preferences.
* Preventing popups from interfering with page content capture (for example, cookie consent prompts).
* Ensuring session persistence for long-running crawling tasks.

> **Important**
>
> When **Execute JavaScript on pages** is enabled on the source, adding manual cookies significantly impacts indexing performance.

> **Warning**
>
> Don't use `ManualCookies` to add a cookie that impersonates an authenticated session; the Sitemap source crawler automatically emulates user actions to [authenticate](https://docs.coveo.com/en/1967#authentication-subtab).

`ManualCookies` supports all fields of the [W3C WebDriver cookie specification](https://www.w3.org/TR/webdriver1/#cookies). Though it's optional in the specification, we recommend that you specify a `domain` value in each cookie. If you don't specify a `domain` value in a cookie, the crawler will use the domain of the first [**Sitemap URL**](https://docs.coveo.com/en/1967#sitemap-urls) from your source configuration.

> **Note**
>
> Though it's possible to create a source that indexes pages from multiple domains, doing so is [bad practice](https://docs.coveo.com/en/1967#leading-practices).

Use the `ManualCookies.value` property to specify the list of cookies as one string that must adhere to the following syntax:

* Cookie key-value pairs use the `key=value` syntax.
* Cookie key-value pairs are separated by a semicolon (`;`).
* Cookies are separated by double semicolons (`;;`).

**Example: Adding two manual cookies, `cookieConsent` and `locale`.**

```json
"ManualCookies": {
  "sensitive": false,
  "value": "cookieConsent=true;domain=example.com;;locale=en-US;path=/us;domain=example.com"
}
```

### `NumberOfRetries` (Integer)

The number of retries allowed when a failed web request is recoverable. Only the following HTTP errors will be retried: 408, 500, 503, and 504. The default value is `3` retries.
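The retry rule above can be read as a simple decision function. The following Python sketch is one possible reading of that rule, not the connector's actual logic: `should_retry`, `RETRYABLE_STATUS_CODES`, and the `retries_used` counter are hypothetical names introduced for illustration.

```python
# HTTP status codes listed above as recoverable.
RETRYABLE_STATUS_CODES = frozenset({408, 500, 503, 504})


def should_retry(status_code, retries_used, number_of_retries=3):
    """Return True if a failed request should be attempted again.

    `retries_used` is the number of retries already performed for this request.
    `number_of_retries` mirrors the `NumberOfRetries` value (default 3).
    """
    return status_code in RETRYABLE_STATUS_CODES and retries_used < number_of_retries


print(should_retry(503, retries_used=0))  # True: recoverable, retries remain
print(should_retry(503, retries_used=3))  # False: the retry budget is exhausted
print(should_retry(404, retries_used=0))  # False: 404 isn't in the retryable list
```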
**Example**

```json
"NumberOfRetries": {
  "sensitive": false,
  "value": "5"
}
```

### `ParsableContentTypes` (String)

A list of content [types](https://en.wikipedia.org/wiki/Media_type#Naming) in JSON format, for which the content will be parsed to find data such as hyperlinks. The default value is `"application/xhtml+xml", "application/xml", "text/html"`.

**Example**

```json
"ParsableContentTypes": {
  "sensitive": false,
  "value": "[\"application/xhtml+xml\", \"application/xml\", \"text/html\"]"
}
```

### `ParsableContentTypesSuffixes` (String)

A list of content type [suffixes](https://en.wikipedia.org/wiki/Media_type#Suffix) in JSON format, for which the content will be parsed to find data such as hyperlinks. The default value is `+xml`.

**Example**

```json
"ParsableContentTypesSuffixes": {
  "sensitive": false,
  "value": "[\"+xml\", \"+json\"]"
}
```

### `ParseSitemapAlternateLinks` (Boolean)

Whether the Sitemap `<url>` element child `<xhtml:link>` alternate language links should be crawled. The default value is `false`.

> **Note**
>
> When `ParseSitemapAlternateLinks` is set to `true`, if an `<xhtml:link>` element has its `hreflang` attribute set to `x-default`, the corresponding `href` URL will be crawled unless this URL has already been crawled as the `<loc>` element text value.

**Example**

```json
"ParseSitemapAlternateLinks": {
  "sensitive": false,
  "value": "true"
}
```

### `ParseSitemapInStrictMode` (Boolean)

Whether each Sitemap file should be parsed in strict mode. The default value is `false`.

When [`ParseSitemapInStrictMode`](#parsesitemapinstrictmode-boolean) is set to `false`, a URL must only be well-formatted (that is, an absolute HTTP or HTTPS URL) to be considered valid. Non-valid URLs are skipped.

When [`ParseSitemapInStrictMode`](#parsesitemapinstrictmode-boolean) is set to `true`, the Sitemap source also performs the following validations on sitemap files and sitemap index files:

* The uncompressed file must be no larger than 50 MB (even if the file is compressed with GZIP).
  If this condition isn't met, a descriptive error is displayed and the sitemap or sitemap index file isn't processed.
* The file can't contain more than 50,000 URLs.
  If this condition isn't met, a descriptive error is displayed and the sitemap or sitemap index file isn't processed.
* A referenced URL must be relative to the sitemap that references it and in the same domain.
  The [location of a sitemap file](https://www.sitemaps.org/protocol.html#location) determines the set of URLs that can be included in that sitemap.
  For example, a sitemap file located at `http://myorgwebsite.com/tech/sitemap.xml` can include any URL starting with `http://myorgwebsite.com/tech/` but can't include URLs starting with `http://myorgwebsite.com/catalog/`.
  If a URL doesn't meet this condition, it's skipped.
* A referenced URL in the sitemap file must be less than 2,048 characters long.
  If a URL doesn't meet this condition, it's skipped.

**Example**

```json
"ParseSitemapInStrictMode": {
  "sensitive": false,
  "value": "true"
}
```

### `ReadTimeout` (Integer)

The timeout duration in seconds when the connector reads web page content from a stream (that is, when downloading Sitemap or web page content). The default value is `300` seconds. When not receiving input from a stream, the crawler will wait for this duration before moving on.

**Example**

```json
"ReadTimeout": {
  "sensitive": false,
  "value": "500"
}
```

### `SkipOnSitemapError` (Boolean)

Whether the crawler should skip a sitemap instead of stopping when encountering an exception. The default value is `false`.

Skipping may occur on an exception in one of the source **Sitemap URLs**, or in a sitemap referenced in one of the **Sitemap URLs**.

When the crawler skips a sitemap, no error message is displayed in the Coveo Administration Console and Coveo doesn't delete existing indexed items located under that sitemap directory.
**Example**

```json
"SkipOnSitemapError": {
  "sensitive": false,
  "value": "true"
}
```

### `Timeout` (Integer)

The number of seconds to wait before a request times out (that is, the time a server can take to respond to a request). The default value is `100` seconds.

**Example**

```json
"Timeout": {
  "sensitive": false,
  "value": "100"
}
```

### `UseCookies` (Boolean)

Whether cookies must be enabled to crawl. The default value is `false`. Set the value to `true` when you want a cookie container to be initialized and reused for each crawling web request.

**Example**

```json
"UseCookies": {
  "sensitive": false,
  "value": "true"
}
```

### `UseProxy` (Boolean)

> **Important**
>
> * `UseProxy` is specifically for [Crawling Module](https://docs.coveo.com/en/3260/) Sitemap sources.
>   Don't use this parameter in a cloud Sitemap source.
>
> * This parameter doesn't support scenarios where [**Execute JavaScript on pages**](https://docs.coveo.com/en/1967#execute-javascript-on-pages) is enabled.
>
> * This parameter can't be used in combination with [**Form authentication**](https://docs.coveo.com/en/1967#form-authentication).

Whether the Crawling Module should use a proxy to access the content to be crawled. The default value is `false`.

**Example**

```json
"UseProxy": {
  "sensitive": false,
  "value": "false"
}
```