Troubleshooting Sitemap source issues
This article provides troubleshooting best practices and lists common issues when indexing content with the Sitemap source.
Important: Troubleshooting fundamentals
Though the information provided in the Common issues section will often help you identify and resolve a problem, keep the following in mind:
- A given set of symptoms can be caused by different underlying issues.
- When you expand a content update activity in the Activity panel or on the Activity Browser (platform-ca | platform-eu | platform-au) page, the error code and messages displayed may only be general indicators of the problem.
- Coveo only halts an indexing operation and displays an error when specific conditions are met.
Consequently, finding the root cause of an issue may require more granular information, which only update logs can deliver.
To download an update log
- On the Sources (platform-ca | platform-eu | platform-au) page, click the desired resource, and then click Activity in the Action bar.
- In the Activity panel that opens, click the desired activity, and then click Download Logs in the Action bar. The downloaded file is named after the unique operation ID representing the selected activity.
To locate issue root causes in logs
- Open the log file in a text file viewer.
- Look for WARN, ERROR, and FATAL messages. Use a log file viewer that supports highlighting by log level to make these messages more noticeable.
- If necessary, review INFO messages. They sometimes reveal a configuration that you overlooked and that may be causing the issue.
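The log review steps above can also be scripted. The following Python sketch is a minimal example, assuming each log line carries its level as a standalone uppercase token; the actual update log layout isn't described in this article, and `filter_log_lines` and `count_by_level` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumption: the level appears in each line as a standalone uppercase word.
LEVELS = ("WARN", "ERROR", "FATAL")
LEVEL_PATTERN = re.compile(r"\b(" + "|".join(LEVELS) + r")\b")


def filter_log_lines(lines, levels=LEVELS):
    """Return (line_number, level, text) for each line whose level is in `levels`."""
    matches = []
    for number, line in enumerate(lines, start=1):
        found = LEVEL_PATTERN.search(line)
        if found and found.group(1) in levels:
            matches.append((number, found.group(1), line.rstrip("\n")))
    return matches


def count_by_level(matches):
    """Count how many matched lines were found for each level."""
    return Counter(level for _, level, _ in matches)
```

Passing an open file handle to `filter_log_lines` yields the line numbers to inspect first. Because the match is purely textual, a line whose message merely mentions one of these words is also reported, so treat the result as a starting point.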
Common issues
Issues are divided into categories. Click a category description below to reach the related section.
Missing pages
User agent blocklisting
Context and symptoms
Likely cause and resolution
Cause: Your web server may be blocking access to the Coveo crawler user agent (for example, using the
Resolution:
Sitemap or sitemap index file can’t be reached
Context and symptoms
Likely cause and resolution
Cause: The following are potential causes:
Resolution: Depending on the cause, apply the corresponding resolution below:
Sitemap URL exclusion
Context and symptoms
Likely cause and resolution
Cause: The Sitemap source exclusion and inclusion rules may be filtering out that sitemap URL. Consequently, pages listed in that sitemap file aren’t indexed.
Resolution: Access your source configuration and review your exclusion and inclusion rules. Ensure your sitemap file URL:
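To illustrate how exclusion and inclusion rules can filter out a sitemap URL, here is a minimal Python sketch. It assumes simple wildcard patterns and that exclusions are evaluated before inclusions; it doesn't reproduce the actual rule syntax or evaluation logic of the Sitemap source, and `is_url_kept` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_url_kept(url, exclusions=(), inclusions=None):
    """Return True if `url` passes the filters.

    exclusions: wildcard patterns; any match rejects the URL.
    inclusions: wildcard patterns; None stands for "include all non-excluded pages".
    """
    # Assumption: an exclusion match always wins over an inclusion match.
    if any(fnmatchcase(url, pattern) for pattern in exclusions):
        return False
    if inclusions is None:
        return True
    return any(fnmatchcase(url, pattern) for pattern in inclusions)
```

Running your sitemap file URL through a check like this makes it easier to spot a rule, such as an exclusion on `*.xml`, that rejects the sitemap itself.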
NO_DOCUMENT_INDEXED errors
Context and symptoms
Likely cause and resolution
Cause: The
For example, it’s possible no items were indexed because:
Resolution: More information is needed to diagnose the issue. Review the source activity logs as follows:
Missing or invalid basic authentication configuration
Context and symptoms
Likely cause and resolution
Cause: Accessing the page content requires basic authentication.
Resolution:
Missing or invalid form authentication configuration
Context and symptoms
Likely cause and resolution
Cause: Accessing the page content requires form authentication.
Resolution:
Authentication validation method issue
Context and symptoms
Likely cause and resolution
Cause: The authentication validation method may not be configured properly.
Resolution: Access your source configuration. Ensure the Validation method you selected and the associated Value are adequate.
Redirection to login page issue
Context and symptoms
Likely cause and resolution
Cause: The
Resolution:
Content freshness issue
Context and symptoms
Pages recently added to the site are still not appearing in the Content Browser (platform-ca | platform-eu | platform-au).
Likely cause and resolution
Cause and Resolution:
Extra or unwanted pages
Missing filtering
Context and symptoms
All URLs listed in the sitemap file are indexed.
Likely cause and resolution
Cause: By default, the Sitemap source contains no exclusions and the inclusions are set to Include all non-excluded pages. In other words, you’re not filtering out any URLs listed in the sitemap file.
Resolution:
⚠️ Make sure you don’t exclude your Sitemap URLs.
Duplicate items due to redirects
Context and symptoms
Likely cause and resolution
Cause: Your sitemap file contains several URLs that redirect to the same page.
For each URL listed in your sitemap file, the Sitemap source crawler creates an index item with the
Resolution: There are two ways to address this issue:
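Before choosing a fix, it can help to identify which listed URLs end up on the same page. The Python sketch below assumes you've already resolved each sitemap URL to its final destination by your own means (for example, by following redirects with an HTTP client); `group_by_destination` is a hypothetical helper that only does the grouping.

```python
from collections import defaultdict


def group_by_destination(final_url_by_listed_url):
    """Map each final URL to the sitemap URLs leading to it, keeping only duplicates.

    final_url_by_listed_url: dict of {listed sitemap URL: URL reached after redirects}.
    """
    groups = defaultdict(list)
    for listed, final in final_url_by_listed_url.items():
        groups[final].append(listed)
    # Only destinations reached from more than one listed URL are potential duplicates.
    return {final: sorted(listed) for final, listed in groups.items() if len(listed) > 1}
```

Each key in the result is a page that several sitemap entries point to, which makes it a candidate for the duplicate items described above.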
Content freshness issue
Context and symptoms
Pages recently deleted from the site are still appearing in the Content Browser (platform-ca | platform-eu | platform-au).
Likely cause and resolution
Cause and Resolution:
Unexpected or missing content inside pages
Indexing by reference
Context and symptoms
Likely cause and resolution
Cause: You may be indexing by reference. When indexing by reference, the body of the web page (used for the Quick view) isn’t retrieved and no excerpt (used for the item description) is generated.
Resolution: Access the Edit configuration with JSON panel.
If HTML documents are currently indexed by
Broken images in the Quick view
Context and symptoms
When accessing the Quick view of an item, images are broken.
Likely cause and resolution
Cause: The connector retrieves web page HTML as is and doesn’t retrieve the images referenced in the HTML.
The Content Browser Quick view displays this HTML without any alteration.
This means it doesn’t replace relative paths, such as
Images that require authentication to be viewed also appear broken when browsing the web page item Quick view in the Content Browser.
Resolution: None. This is a known limitation of the Content Browser Quick view. The Quick view is intended to provide a preview of the item content, not a full rendering of the web page.
To view the full web page, users can open the original document by clicking the item
YouTube player not available in the Quickview component
Context and symptoms
In the Quickview component of a Coveo JavaScript Search Framework search result, the YouTube player isn’t available. You notice the following symptoms:
Likely cause and resolution
Cause: For security reasons, the only way to view a YouTube video in the YouTube player within a Coveo JavaScript Search Framework result template is by:
Resolution:
The following is a sample implementation:
Copy protection on PDF
Context and symptoms
When viewing a PDF item in the Content Browser (platform-ca | platform-eu | platform-au), you notice the following:
Likely cause and resolution
Cause: The PDF is password-protected. Therefore, the source can’t retrieve the document binary content it needs to generate the description and the Quick view.
Resolution:
Web scraping issue
Context and symptoms
Likely cause and resolution
Cause: A web scraping configuration may be removing the missing sections.
Resolution: Access your source configuration and review your web scraping configurations. Focus on your Pages to target configurations and whether you can set more restrictive Elements to exclude selectors.
Missing dynamic content
Context and symptoms
Likely cause and resolution
Cause: The source may be crawling your page before all its dynamic content is rendered.
Resolution: Access your source configuration. In the Advanced settings subtab, make sure Execute JavaScript on pages is enabled. If need be, increase the Add time for the crawler to wait before considering a page as fully rendered value.
HTML pages indexed as txt items
Context and symptoms
When accessing the Content Browser (platform-ca | platform-eu | platform-au), pages are appearing under the
Likely cause and resolution
Cause: The web page, at the moment it’s crawled, isn’t valid HTML. If the page includes dynamic content, it might not be fully rendered when the crawler processes it.
Resolution:
Login page content instead of proper page content
Context and symptoms
Likely cause and resolution
Cause: The page to index is protected and form authentication isn’t properly set up.
Resolution:
Indexing pipeline extension
Context and symptoms
Likely cause and resolution
Cause: An indexing pipeline extension (IPE) may be removing the missing sections.
Resolution: Review the logs for the items affected by the extensions. Make the necessary adjustments to the extension script or conditions.
Unexpected item field values
Inexistent field
Context and symptoms
Likely cause and resolution
Cause: The field doesn’t exist. You need to create the field and the field mapping.
Resolution:
Field mapping issue
Context and symptoms
Likely cause and resolution
Cause: There may be a field mapping issue.
Resolution:
Metadata extraction issue
Context and symptoms
Likely cause and resolution
Cause: There may be a metadata extraction issue specifically for that item.
Resolution: Search for reasons why the metadata extraction process wouldn’t be working on your specific item. For example, if you’re using a web scraping configuration, access your source configuration and validate the following:
Title field value selection
Context and symptoms
The item
Likely cause and resolution
Cause: Coveo has a
Resolution: Coveo automatically extracts several pieces of metadata that you can use as item titles.
See Item title selection mapping rule options to control the value selection process.
Edit the
Metadata origin selection
Context and symptoms
Example:
Likely cause and resolution
Cause: There’s a metadata origin selection issue. For example, you’ve configured a web scraping configuration to extract a
When values for the same metadata name are extracted in the crawling stage and in the processing (or converter) stage of the Coveo indexing pipeline, the latter value is used by default to populate the mapped field.
Example:
Resolution:
Overwritten crawler metadata
Context and symptoms
Likely cause and resolution
Cause: There’s a metadata conflict. You can have two configurations extracting values for the same metadata name at the crawling stage (for example, one in a web scraping configuration, and another in the sitemap XML file).
When this happens, one value overwrites the other and you only see one
Resolution: Change the metadata name in your configuration to make it unique and adjust your field mapping rule accordingly.
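To find which metadata names collide, you can compare the names each configuration extracts. The Python sketch below is a minimal example; it assumes you've listed the metadata produced by each configuration yourself as plain dictionaries and that names are compared as exact strings. `find_conflicting_names` is a hypothetical helper, not part of any Coveo tooling.

```python
from collections import Counter


def find_conflicting_names(*extracted):
    """Return the metadata names that appear in more than one configuration.

    extracted: one dict per configuration, mapping metadata name to extracted value.
    """
    counts = Counter(name for metadata in extracted for name in metadata)
    return sorted(name for name, count in counts.items() if count > 1)
```

Any name returned is a candidate for renaming in one of the configurations.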
Indexing is slow
Throttling
Context and symptoms
Likely cause and resolution
Cause: The Time the crawler waits between requests to your server value may be too low and the Sitemap crawler doesn’t take into account website
Resolution: Access your source configuration and increase the Time the crawler waits between requests to your server value (for example,
Also ensure your source
Indexed content is not up to date
Refresh limitations
Context and symptoms
Likely cause and resolution
Cause: A source refresh doesn’t consider deleted and new sitemap file entries. A rebuild or rescan is required to reflect these changes in your index.
Resolution: Make sure the Sitemap source rescan schedule is enabled.
The default
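To see which changes a refresh would miss, you can compare the URLs from two snapshots of your sitemap file. The following Python sketch assumes you've collected both URL lists yourself; `entries_missed_by_refresh` is a hypothetical helper that only computes the difference.

```python
def entries_missed_by_refresh(previous_urls, current_urls):
    """Return the sitemap entries added or removed between two snapshots.

    Per the cause above, these are the changes that require a rescan or rebuild.
    """
    previous, current = set(previous_urls), set(current_urls)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }
```

If both lists in the result are empty, the sitemap entries themselves didn't change between the two snapshots.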
Number of items limit reached
Context and symptoms
Likely cause and resolution
Cause: Indexing is blocked because you’ve reached the 200% license item usage threshold.
Resolution:
Last modification date refresh support conditions
Context and symptoms
Likely cause and resolution
Cause: Your sitemap file doesn’t meet all
Resolution: As a workaround, consider changing the Sitemap source rescan schedule from a
SkipOnSitemapError setting
Context and symptoms
Likely cause and resolution
Cause: The source may be configured with
Resolution: