Troubleshooting Sitemap source issues
This article provides help troubleshooting common issues when indexing content with the Sitemap source.
Identify the issue you’re facing using the case context and symptoms provided. Then, apply the recommended resolution steps and rebuild your source.
Troubleshooting symptoms are provided as a guide. Actual symptoms may vary. For example, Coveo may or may not return an error mentioned among the issue symptoms. Review the Activity Browser (platform-ca | platform-eu | platform-au) page for a fuller picture of abnormal indexing activity. You can also download the source update logs for a chronological account of what happened during the indexing process.
Issues are divided into categories. Click a category description below to reach the related section.
Missing pages
Server throttling
Context and symptoms:
Cause: By default, the Request interval delay value is
Resolution: Open your source.
Increase the Request interval delay value (e.g.,
Blocklisting
Context and symptoms:
Cause: Your network may be blocking inbound requests from Coveo.
Resolution: Allow inbound requests from Coveo. If that isn’t possible, install the Coveo On-Premises Crawling Module on your infrastructure to push documents to Coveo instead.
URL exclusion
Context and symptoms:
Cause: The Sitemap source JSON configuration
Resolution: Access the Edit a source JSON configuration panel and review your
Missing or invalid basic authentication configuration
Context and symptoms:
Cause: Accessing the page content requires basic authentication.
Resolution: Request authentication credentials from the web server administrator. Then, open your source and configure basic authentication on the source.
Missing or invalid form authentication configuration
Context and symptoms:
Cause: Accessing the page content requires form authentication.
Resolution: Request authentication credentials from the web server administrator. Then, open your source and configure automatic form authentication on the source.
Authentication confirmation method issue
Context and symptoms:
Cause: The authentication confirmation method may not be configured properly.
Resolution: Open your source. Make sure the Confirmation method you selected and its associated Value are appropriate.
Redirection to login page issue
Context and symptoms:
Cause: The
Resolution:
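When pages are missing, the HTTP response your web server returns can hint at which of the issues above applies. The following is a minimal sketch only: `classify_response` is a hypothetical helper, not part of Coveo, and the mapping reflects general HTTP conventions (429 for rate limiting, 401 with a `WWW-Authenticate: Basic` header for basic authentication, 403 as a possible sign of blocking, and a redirect whose target contains "login" or "signin" as a possible sign of form authentication). Your server may behave differently.

```python
from typing import Mapping, Optional


def classify_response(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Map an HTTP status code and headers to a likely issue category.

    Hypothetical helper for illustration; the categories mirror the
    section names in this article, not any Coveo API.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    if status == 429:
        return "server throttling"
    if status == 401:
        scheme = lowered.get("www-authenticate", "").strip().lower()
        if scheme.startswith("basic"):
            return "basic authentication"
        return "authentication"
    if status == 403:
        # 403 has many possible causes; blocklisting is only one of them.
        return "possible blocklisting"
    if status in (301, 302, 303, 307, 308):
        location = lowered.get("location", "").lower()
        if "login" in location or "signin" in location:
            return "form authentication"
    return None
```

For example, `classify_response(401, {"WWW-Authenticate": 'Basic realm="docs"'})` returns "basic authentication". Treat the result as a starting point for investigation and confirm it against the Activity Browser and the source update logs.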
Content freshness issue
Symptom: Pages recently added to the website are still not appearing in the Content Browser (platform-ca | platform-eu | platform-au).
Cause and Resolution: See Indexed content isn’t up to date.
Extra or unwanted pages
Missing filtering
Symptom: All URLs listed in the sitemap file are indexed.
Cause: By default, the Sitemap JSON configuration
Resolution:
⚠️ You must have at least one
Content freshness issue
Symptom: Pages recently deleted from the website are still appearing in the Content Browser (platform-ca | platform-eu | platform-au).
Cause and Resolution: See Indexed content isn’t up to date.
Unexpected or missing content inside pages
Indexing by reference
Context and symptoms:
Cause: You may be indexing by reference. When indexing by reference, the body of the web page (used for the quick view) isn’t retrieved and no excerpt (used for the item description) is generated.
Resolution:
Web scraping issue
Context and symptoms:
Cause: A web scraping configuration may be removing the missing sections.
Resolution: Open your source and review your web scraping configurations.
Focus on the pages the web scraping configurations are targeting and whether you can configure more restrictive exclusion selectors (i.e., the
Missing dynamic content
Context and symptoms:
Cause: The source may be crawling your page before all its dynamic content is rendered.
Resolution: Open your source. In the Content to include section, make sure Render JavaScript is enabled. Increase the Loading delay value if needed.
Login page content instead of proper page content
Context and symptoms:
Cause: The page to index is protected and automatic form authentication isn’t properly set up.
Resolution:
Indexing pipeline extension
Context and symptoms:
Cause: An indexing pipeline extension (IPE) may be removing the missing sections.
Resolution: Review the logs for the items affected by the extensions. Make the necessary adjustments to the extension script or conditions.
Unexpected item field values
Inexistent field
Context and symptoms:
Cause: The field doesn’t exist. You need to create the field and the field mapping.
Resolution:
Field mapping issue
Context and symptoms:
Cause: There may be a field mapping issue.
Resolution:
Metadata extraction issue
Context and symptoms:
Cause: There may be a metadata extraction issue specifically for that item.
Resolution: Investigate why the metadata extraction process might not be working on that specific item. For example, if you’re using a web scraping configuration, open your source. Then, validate the following:
Title field value selection
Symptom: The item
Cause: Coveo has a
Resolution: Coveo automatically extracts several pieces of metadata that can be used as item titles. See Item title selection mapping rule options to control the value selection process.
Metadata origin selection
Context and symptoms:
Cause: There’s a metadata origin selection issue. For example, you have configured a web scraping configuration to extract a
When values for the same metadata name are extracted in the crawling stage and in the processing (or converter) stage of the Coveo indexing pipeline, the latter value is used by default to populate the mapped field.
Resolution:
Overwritten crawler metadata
Context and symptoms:
Cause: There’s a metadata conflict. You can have two configurations extracting values for the same metadata name at the crawling stage (e.g., one in a web scraping configuration, and another in the sitemap XML file).
When this happens, one value overwrites the other and you only see one
Resolution: Change the metadata name in your configuration to make it unique and adjust your field mapping rule accordingly.
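To spot this kind of conflict before renaming anything, you can compare the metadata names each configuration produces. The following is a minimal sketch: `conflicting_metadata_names` is a hypothetical helper, and the dictionaries stand in for whatever metadata names your web scraping configuration and sitemap XML file actually define. It assumes names are compared exactly as written.

```python
from collections import Counter
from typing import List, Mapping


def conflicting_metadata_names(*sources: Mapping[str, object]) -> List[str]:
    """Return metadata names that appear in more than one source.

    Each source is a mapping of metadata name to value, one per
    configuration that extracts metadata at the crawling stage.
    """
    # Keys are unique within a mapping, so a count above 1 means the
    # same name is produced by at least two different configurations.
    counts = Counter(name for source in sources for name in source)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical sample data for illustration only.
web_scraping = {"author": "Jane Doe", "category": "Guides"}
sitemap_file = {"author": "Docs Team", "price": "10"}
print(conflicting_metadata_names(web_scraping, sitemap_file))
```

Any name the helper returns is a candidate for renaming in one of the two configurations, followed by a matching update to the field mapping rule.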
Indexing is slow
Throttling
Context and symptoms:
Cause: By default, the Request interval delay value is
Resolution: Open your source and increase the Request interval delay value (e.g.,
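The Request interval delay also puts a floor on how long a crawl can take. The following is a rough sketch, assuming requests are issued one at a time and the delay is expressed in milliseconds; both are assumptions for illustration, so adjust them to match your source. `estimate_min_crawl_seconds` is a hypothetical helper, and the estimate ignores server response time.

```python
def estimate_min_crawl_seconds(url_count: int, delay_ms: int) -> float:
    """Estimate the minimum time spent waiting between requests.

    Assumes sequential requests with one delay between each pair of
    consecutive requests; server response time is not included.
    """
    if url_count < 0 or delay_ms < 0:
        raise ValueError("url_count and delay_ms must be non-negative")
    # N requests are separated by N - 1 delays.
    waits = max(url_count - 1, 0)
    return waits * delay_ms / 1000
```

Under these assumptions, a sitemap with 10,001 URLs and a 1,000 ms delay spends at least 10,000 seconds (about 2 hours 47 minutes) waiting between requests, so a longer delay trades indexing speed for a lighter load on the web server.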
Indexed content isn’t up to date
Refresh limitations
Context and symptoms:
Cause: A source refresh doesn’t consider deleted and new sitemap file entries. A rebuild or rescan is required to reflect these changes in your index.
Resolution: Make sure the Sitemap source rescan schedule is enabled.
The default
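To see which entries a refresh would miss, you can compare two snapshots of the sitemap file. The following is a minimal sketch, assuming both snapshots are standard sitemap XML using the sitemaps.org namespace and are not sitemap index files; `sitemap_urls` and `diff_sitemaps` are hypothetical helpers, not Coveo functions.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> Set[str]:
    """Collect the <loc> values of every <url> entry in a sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def diff_sitemaps(old_xml: str, new_xml: str) -> Tuple[List[str], List[str]]:
    """Return (added, removed) URLs between two sitemap snapshots."""
    old, new = sitemap_urls(old_xml), sitemap_urls(new_xml)
    return sorted(new - old), sorted(old - new)
```

Both returned lists correspond to the new and deleted sitemap entries that, per the cause above, only a rescan or rebuild reflects in the index.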
Last modification date refresh support conditions
Context and symptoms:
Cause: Your sitemap file doesn’t meet all
Resolution: As a workaround, consider changing the Sitemap source rescan schedule from a
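One check you can run yourself is whether every sitemap entry carries a last modification date. The following is a sketch under a stated assumption: it treats a non-empty `<lastmod>` element on each `<url>` entry as one plausible precondition, which is an assumption of this example and not the full list of support conditions. `urls_missing_lastmod` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_missing_lastmod(xml_text: str) -> List[str]:
    """List the <loc> of each <url> entry without a non-empty <lastmod>."""
    root = ET.fromstring(xml_text)
    missing = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not lastmod:
            missing.append(loc)
    return missing
```

An empty result only means this one check passed; the sitemap file may still fail other conditions.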
SkipOnSitemapError setting
Context and symptoms:
Cause: The source may be configured with
Resolution: