Troubleshooting Web source issues

This article provides troubleshooting best practices and lists common issues when indexing content with the Web source.

Important: Troubleshooting fundamentals

Though the information provided in the Common issues section will often help you identify and resolve a problem, keep the following in mind:

A given set of symptoms can be caused by different underlying issues.
When you expand a content update activity in the Activity panel or Activity Browser (platform-ca | platform-eu | platform-au) page, the error code and messages displayed may only be general indicators of the problem.
Coveo only halts an indexing operation and displays an error when specific conditions are met.

Consequently, finding the root cause of an issue may require more granular information, which only update logs can deliver.

To download an update log

On the Sources (platform-ca | platform-eu | platform-au) page, click the desired source, and then click Activity in the Action bar.
In the Activity panel that opens, click the desired activity, and then click Download Logs in the Action bar. The downloaded file is named after the unique operation ID representing the selected activity.

To locate issue root causes in logs

Open the log file in a text file viewer.
Look for WARN, ERROR, and FATAL messages.

Use a log file viewer that supports highlighting by log level to make these messages more noticeable.
If necessary, review INFO messages. They sometimes reveal a configuration that you overlooked and that may be causing the issue.
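
The log review steps above can be partly automated. The following is a minimal Python sketch, assuming each log line contains its level as a standalone uppercase word. The update log layout isn't described in this article, so the `filter_log_lines` helper and the line format it expects are hypothetical.

```python
import re

# Levels to surface first, as listed in the steps above.
PROBLEM_LEVELS = ("WARN", "ERROR", "FATAL")


def filter_log_lines(lines, levels=PROBLEM_LEVELS):
    """Return (line_number, line) pairs whose text contains one of the levels.

    Assumption: the level appears as a standalone uppercase word in the line.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, levels)) + r")\b")
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

Passing `levels=("INFO",)` to the same function returns the INFO messages for the optional second pass.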

Common issues

Issues are divided into categories. Click a category description below to reach the related section.

Missing pages
Extra or unwanted pages
Unexpected or missing content inside pages
Unexpected item field values
Indexing is slow
Indexed content is not up to date

Missing pages

User agent blocklisting

Context and symptoms:

No items or a limited number of pages are indexed.
The Activity Browser (platform-ca | platform-eu | platform-au) page may display a WEB_FORBIDDEN_ERROR error code.

Likely cause and resolution

Cause: Your web server may be blocking access to the Coveo crawler user agent (for example, using the .htaccess file mod_rewrite module on an Apache server, or the URL Rewrite module on an IIS server).

Resolution:

Chrome, and other web browsers, let you emulate web requests made with a specific user agent by overriding your browser default user agent string. Use this feature to test if your web server is blocking the Coveo crawler user agent (that is, Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko) (compatible; Coveobot/2.0;+http://www.coveo.com/bot.html)).
If applicable, make the relevant changes to your web server or Web source configuration.
1. (Recommended) Remove the Coveo crawler user agent from the blocklist on your web server.
2. Access the Edit configuration with JSON panel and set the UserAgentString parameter to a value that your web server doesn’t block.
Rebuild your source.
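
As an alternative to the browser emulation described above, the same check can be scripted. This is a minimal sketch using only the Python standard library. The `build_request` and `fetch_status` helpers are hypothetical names, and the request only imitates the crawler's user agent string, not its IP address or any other behavior.

```python
import urllib.error
import urllib.request

# User agent string of the Coveo crawler, as quoted above.
COVEO_USER_AGENT = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko) "
    "(compatible; Coveobot/2.0;+http://www.coveo.com/bot.html)"
)


def build_request(url, user_agent=COVEO_USER_AGENT):
    """Build a GET request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=COVEO_USER_AGENT, timeout=10):
    """Return the HTTP status code the server sends back for this user agent.

    Network failures such as timeouts are raised as urllib.error.URLError.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; keep only the code.
        return error.code
```

Calling `fetch_status` on one of your pages with the default user agent, and again with your browser's user agent string, gives two status codes to compare. A 403 for the first call only would be consistent with user agent blocking.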

IP blocklisting

Context and symptoms:

No items are indexed.
Other symptoms may vary. For example, the Activity Browser (platform-ca | platform-eu | platform-au) page may display a WEB_FORBIDDEN_ERROR error code or an HTTP request may simply time out with no error.

Likely cause and resolution

Cause: Your network may be blocking inbound requests from the Coveo Platform.

Resolution: Allow inbound requests from the Coveo Platform or consider installing the Coveo Crawling Module on your infrastructure to push documents to Coveo instead.

RespectUrlCasing setting issue

Context and symptoms:

Missing pages have URLs that contain uppercase characters. All indexed pages have URLs that only contain lowercase characters.
Other symptoms may vary. For example, the Activity Browser (platform-ca | platform-eu | platform-au) page may display an authentication issue (for example, a WEB_FORBIDDEN_ERROR error code).
The RespectUrlCasing JSON parameter is set to false.

Likely cause and resolution

Cause: With RespectUrlCasing set to false, the Web source crawler lowercases a URL it discovers, and then requests the lowercased URL from the server. If the web server is case sensitive, it doesn’t recognize the requested URL and doesn’t serve the request.

Resolution: Access the Edit configuration with JSON panel and set RespectUrlCasing to true.
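
To estimate which pages this setting affects, you can look for URLs that contain uppercase characters. The sketch below is illustrative only: `has_uppercase` and `at_risk_urls` are hypothetical helpers, and checking the path and query string (but not the host) is an assumption about which parts matter to a case-sensitive server.

```python
from urllib.parse import urlsplit


def has_uppercase(url):
    """Return True if the URL path or query string contains uppercase characters.

    The host is ignored because host names are case insensitive.
    """
    parts = urlsplit(url)
    sensitive = parts.path + "?" + parts.query
    return sensitive != sensitive.lower()


def at_risk_urls(urls):
    """Return the URLs whose path or query string would change if lowercased."""
    return [url for url in urls if has_uppercase(url)]
```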

Crawling rules issue

Symptom: The Content Browser (platform-ca | platform-eu | platform-au) doesn’t show all the web pages you wanted to index.

Likely cause and resolution

Cause: Your current Crawling rules exclusions and inclusions are filtering out the pages you wanted to index.

Resolution: Open your source and review your exclusions and inclusions.

To be indexed, a page:

must not match any exclusion rule, AND
it must match at least one inclusion rule (for example, by selecting the Include all non-excluded pages option).
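
The two conditions above can be expressed as a small function. This is a sketch, not the crawler's implementation: modeling rules as regular expressions and the `include_all_non_excluded` flag are assumptions made for illustration.

```python
import re


def is_indexable(url, exclusions, inclusions, include_all_non_excluded=False):
    """Apply the two crawling rule conditions to one URL.

    Assumption: each rule is a regular expression searched in the URL.
    """
    if any(re.search(rule, url) for rule in exclusions):
        return False  # Matching any exclusion rule rejects the page.
    if include_all_non_excluded:
        return True  # Stands in for the Include all non-excluded pages option.
    # Otherwise the page needs at least one matching inclusion rule.
    return any(re.search(rule, url) for rule in inclusions)
```

Note that a page matching no rule at all is rejected unless the include-all option is on, which is a common reason for missing pages.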

Starting URL exclusion

Context and symptoms:

The Content Browser (platform-ca | platform-eu | platform-au) doesn’t show all web pages you want to index.
One of your Starting URLs isn’t indexed.
The Activity Browser (platform-ca | platform-eu | platform-au) page may show a WEB_NO_DOCUMENT_INDEXED_DUE_TO_FILTERS error code.

Likely cause and resolution

Cause: Your current Crawling rules exclusions and inclusions are filtering out that starting URL. Consequently, the crawler can’t index the pages that are reachable via that starting URL.

Resolution: Open your source. Make adjustments to your exclusions and inclusions to ensure the starting URL and all pages accessible through it aren’t filtered out.

To be indexed, a page:

must not match any exclusion rule, AND
it must match at least one inclusion rule (for example, by selecting the Include all non-excluded pages option).

301 Moved Permanently redirect

Context and symptoms:

The Activity Browser (platform-ca | platform-eu | platform-au) page may show a NO_DOCUMENT_INDEXED error code.
Your Starting URL doesn’t include a www segment (for example, https://abc.com).
When trying to access that starting URL manually in a browser, you’re automatically redirected to a page that includes the www segment (for example, https://www.abc.com).

Likely cause and resolution

Cause: By default, the Web source only indexes pages that are internal to the site. The Web source is considering the page it’s redirected to (for example, https://www.abc.com) as external to the website (for example, https://abc.com). This internal/external validation is unrelated to exclusion and inclusion rules.
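
To confirm that a redirect falls into this case, you can compare the host of the starting URL with the host of the redirect target. The following sketch assumes you already know the target URL (for example, from your browser address bar); `is_www_variant` is a hypothetical helper and doesn't reproduce the Web source internal/external validation.

```python
from urllib.parse import urlsplit


def is_www_variant(starting_url, redirect_url):
    """Return True when the two hosts differ only by a leading "www." label."""
    start = urlsplit(starting_url).hostname or ""
    target = urlsplit(redirect_url).hostname or ""
    if not start or not target or start == target:
        return False
    return target == "www." + start or start == "www." + target
```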

Resolution: If you’re only getting started with a new Web source, you might simply want to delete the source, start fresh with a new one, and include the www segment in the Starting URL. Otherwise, you can proceed as follows:

Access the Edit configuration with JSON panel.

In the JSON configuration field, under the parameters property, add the following code block:
```
"IndexSubdomains": {
  "sensitive": false,
  "value": "true"
},
```
Click Save to exit the Edit configuration with JSON panel.
If necessary, make adjustments to your exclusions and inclusions to ensure the redirection URLs (for example, https://www.abc.com/something) aren’t filtered out. You might also need to add exclusion or inclusion rules to filter out unwanted subdomain pages.
Rebuild your source.

Orphan pages

Context and symptoms:

The Content Browser (platform-ca | platform-eu | platform-au) doesn’t show all web pages you want to index.
All of your Starting URLs are indexed.

Likely cause and resolution

Cause: The missing pages may be orphan pages.

Resolution:

Add links to these pages in your website so that the Web source crawler and other search engines may reach them, OR
Open your source. Add Starting URLs for your orphan pages. If necessary, make adjustments to your exclusions and inclusions to ensure the added Starting URLs aren’t filtered out.

Missing or invalid basic authentication configuration

Context and symptoms:

A page isn’t indexed.
The Activity Browser (platform-ca | platform-eu | platform-au) page may show a WEB_AUTHENTICATION_ERROR error code.
When trying to access that page manually in a browser, you’re prompted for credentials in a pop-up window.

Likely cause and resolution

Cause: Accessing the page content requires basic authentication.

Resolution:

Request the authentication credentials from the web server administrator. Then, open your source and configure Basic authentication on the source.
If you’re using a password manager (for example, LastPass), it may replace the previously recorded username and password with different ones as you edit the source. We recommend checking your password manager options and ensuring that it respects the autocomplete="off" attribute.

Missing or invalid form authentication configuration

Context and symptoms:

A page isn’t indexed.
The Activity Browser (platform-ca | platform-eu | platform-au) page may show a WEB_AUTHENTICATION_ERROR error code.
When trying to access that page manually in a browser, a login page is displayed instead.

Likely cause and resolution

Cause: Accessing the page requires form authentication.

Resolution:

Request the authentication credentials from the web server administrator. Then, open your source and configure Form authentication on the source.
If you’re using a password manager (for example, LastPass), it may replace the previously recorded username and password with different ones as you edit the source. We recommend checking your password manager options and ensuring that it respects the autocomplete="off" attribute.

Authentication status validation issue

Context and symptoms:

A page isn’t indexed.
Accessing that page requires form authentication.
When trying to access that page manually in a browser, the page at the form authentication Login page address is displayed. Entering the credentials and submitting the form brings up the page to be indexed.

Likely cause and resolution

Cause: The authentication Validation method might not be configured properly.

Resolution: Open your source. Make sure the Validation method you’ve selected and the associated value are adequate.

Redirection to login page issue

Context and symptoms:

A page isn’t indexed.
Accessing that page requires form authentication and your source configuration Validation method is Redirection to URL.
When trying to access that page manually in a browser, the page at the form authentication Login page address isn’t displayed.

Likely cause and resolution

Cause: The Redirection to URL Validation method doesn’t work in your use case. Consequently, the Web source crawler doesn’t know it must authenticate before accessing the page to index.

Resolution:

Open your source and choose another Validation method. Select a method based on the way the web server responds when you manually try to access the page to index (when unauthenticated).
If no reliable validation method can be found, try enabling the form authentication Force authentication option.

Content freshness issue

Symptom: Pages recently added to the site are still not appearing in the Content Browser (platform-ca | platform-eu | platform-au).

Likely cause and resolution

Cause and Resolution: See Indexed content is not up to date.

Extra or unwanted pages

Query parameters

Context and symptoms:

The Content Browser (platform-ca | platform-eu | platform-au) shows duplicate items.
Duplicate item URIs only differ in their query string parameter values.

Example:

Two items in the Content Browser with identical URIs except for query string parameter values

Likely cause and resolution

Cause: You’re currently not specifying that the query string parameter should be ignored.

Resolution: Open your source. In the Advanced settings tab, add the parameter to the Query parameters to ignore list.
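
To identify which parameter to ignore, it can help to group item URIs that become identical once a candidate parameter is removed. This is a minimal sketch using the Python standard library. The helper names and the `sessionid` parameter in the usage note are hypothetical, and the function only imitates the effect of the setting on a list of URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_parameters(url, ignored):
    """Return the URL without the query parameters named in `ignored`."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_duplicates(urls, ignored):
    """Group URLs that become identical once the ignored parameters are removed."""
    groups = {}
    for url in urls:
        groups.setdefault(strip_query_parameters(url, ignored), []).append(url)
    # Keep only the groups that actually contain duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

For example, calling `group_duplicates(uris, {"sessionid"})` on a list of item URIs returns the sets of items that would collapse into one if `sessionid` were ignored.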

Example:

Adding a parameter to the Query parameters to ignore list

Multiple URL variants

Context and symptoms:

The Content Browser (platform-ca | platform-eu | platform-au) shows duplicate items.
Duplicate item URIs only differ in their casing.
The RespectUrlCasing JSON parameter is set to true.

Likely cause and resolution

Cause: The Web source crawler discovers multiple variants of the same page, each with different URL casings.

Resolution: If the web server is case insensitive, access the Edit configuration with JSON panel and set RespectUrlCasing to false.

⚠️ Don’t set RespectUrlCasing to false if the web server is case sensitive. If you do, pages with uppercase characters in their URL won’t be indexed.

Missing filtering

Symptom: The Content Browser (platform-ca | platform-eu | platform-au) shows web pages you don’t want to index.

Likely cause and resolution

Cause: Your current Crawling rules exclusions and inclusions don’t filter out the unwanted pages.

Resolution: Open your source and configure exclusions and inclusions to filter out the unwanted pages.

To be indexed, a page:

must not match any exclusion rule, AND
it must match at least one inclusion rule (for example, by selecting the Include all non-excluded pages option).

Content freshness issue

Symptom: Pages recently deleted from the website are still appearing in the Content Browser (platform-ca | platform-eu | platform-au).

Likely cause and resolution

Cause and Resolution: See Indexed content is not up to date.

Unexpected or missing content inside pages

Indexing by reference

Context and symptoms:

When viewing source items in the Content Browser (platform-ca | platform-eu | platform-au), item Description areas are empty.
If you then click a specific item, and then click Properties, the Quick view tab isn’t displayed.

Likely cause and resolution

Cause: You may be indexing by reference. When indexing by reference, the body of the web page (used for the Quick view) isn’t retrieved and no excerpt (used for the item description) is generated.

Resolution:

Access the Edit configuration with JSON panel. If HTML documents are currently indexed by Reference, change that value to Retrieve.

Broken images in the Quick view

Context and symptoms:

When accessing the Quick view of an item, images are broken.

Likely cause and resolution

Cause: The connector retrieves web page HTML as is and doesn’t retrieve the images referenced in the HTML. The Content Browser Quick view displays this HTML without any alteration. This means it doesn’t replace relative paths, such as <img src="/sites/…/myimage.jpg">, with the corresponding absolute paths, such as <img src="https://…/myimage.jpg">. As a result, when web pages contain images that are referenced using relative paths, the images can’t be displayed in the Content Browser Quick view.

Images that require authentication to be viewed also appear broken when browsing the web page item Quick view in the Content Browser.
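
To check whether a given page is affected by the relative path limitation described above, you can list its image references that lack a scheme and host. The sketch below uses the Python standard library. `ImageSourceCollector` and `relative_image_sources` are hypothetical names, and the HTML is assumed to be the page source you saved or downloaded yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def relative_image_sources(html, page_url):
    """Return (src, absolute_url) pairs for images referenced by relative path."""
    collector = ImageSourceCollector()
    collector.feed(html)
    pairs = []
    for src in collector.sources:
        parts = urlsplit(src)
        # A reference with neither scheme nor host is relative to the page.
        if not parts.scheme and not parts.netloc:
            pairs.append((src, urljoin(page_url, src)))
    return pairs
```

Each returned pair shows the relative path found in the HTML next to the absolute URL a browser would resolve it to on the original page.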

Resolution: None. This is a known limitation of the Content Browser Quick view.

The Quick view is intended to provide a preview of the item content, not a full rendering of the web page. To view the full web page, users can open the original document by clicking the item clickable URI link in the search results.

YouTube player not available in the Quickview component

Context and symptoms:

In the Quickview component of a Coveo JavaScript Search Framework search result, the YouTube player isn’t available. You notice the following symptoms:

The YouTube video iframe shows the following error message:

Try watching this video on www.youtube.com, or enable JavaScript if it is disabled in your browser.

In the browser console, the following message appears:

Blocked script execution in '<INSERT SEARCH PAGE URL>' because the document's frame is sandboxed and the 'allow-scripts' permission is not set.

Likely cause and resolution

Cause: For security reasons, the only way to view a YouTube video in the YouTube player within a Coveo JavaScript Search Framework result template is by:

Indexing items with the YouTube source.
Using the CoveoYouTubeThumbnail component. (Note: The Content Browser (platform-ca | platform-eu | platform-au) Quick view doesn’t meet this requirement.)

Resolution:

Index YouTube videos with the YouTube source.
In your Coveo JavaScript Search Framework search interface, use the CoveoYouTubeThumbnail component to show a relevant image of the result video content. Clicking the thumbnail starts the video.

The following is a sample implementation:

```
<div class="CoveoResultList" data-layout="list" data-wait-animation="fade" data-auto-select-fields-to-include="true">
  <script id="YouTubeVideo" class="result-template" type="text/html" data-layout="list" data-field-filetype="YouTubeVideo">
    <div class="coveo-result-frame">
      <div class="coveo-result-row">
        <div class="coveo-result-cell" style="width:220px; padding-top:7px">
          <span class="CoveoYouTubeThumbnail"></span>
        </div>
        <div class="coveo-result-cell">
          <div class="coveo-result-row">
            <div class="coveo-result-cell" style="font-size:16px" role="heading" aria-level="2">
              <a class="CoveoResultLink"></a>
            </div>
            <div class="coveo-result-cell" style="text-align:right; width:120px;font-size:12px">
              <span class="CoveoFieldValue" data-field="@date" data-helper="dateTime"></span>
            </div>
          </div>
          <div class="coveo-result-row" style="margin-top:10px;">
            <div class="coveo-result-cell">
              <span class="CoveoExcerpt"></span>
            </div>
          </div>
          <div class="coveo-result-row" style="margin-top:10px;">
            <div class="coveo-result-cell">
              <span class="CoveoFieldValue" data-field="@author" data-text-caption="Author" style="margin-right:30px;"></span>
              <span class="CoveoFieldValue" data-field="@ytvideoduration" data-helper="timeSpan" data-helper-options-is-milliseconds="false" data-text-caption="Length" style="margin-right:30px;"></span>
              <span class="CoveoFieldValue" data-field="@ytviewcount" data-helper="number" data-helper-options-format="n0" data-text-caption="Views" style="margin-right:30px;"></span>
              <span class="CoveoFieldValue" data-field="@language" data-text-caption="Language" style="margin-right:30px;"></span>
            </div>
          </div>
          <div class="coveo-result-row">
            <div class="coveo-result-cell">
              <div class="CoveoMissingTerms"></div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </script>
  <script id="YouTubePlaylist" class="result-template" type="text/html" data-layout="list" data-field-filetype="YouTubePlaylist">
    <div class="coveo-result-frame">
      <div class="coveo-result-cell" style="vertical-align:top;text-align:center;width:32px;">
        <span class="CoveoIcon" data-small="true" data-with-label="false"></span>
      </div>
      <div class="coveo-result-cell" style="vertical-align: top;padding-left: 16px;">
        <div class="coveo-result-row" style="margin-top:0;">
          <div class="coveo-result-cell coveo-no-wrap" style="vertical-align:top;font-size:16px;" role="heading" aria-level="2">
            <a class="CoveoResultLink"></a>
          </div>
          <div class="coveo-result-cell" style="width:120px;text-align:right;font-size:12px">
            <div class="coveo-result-row">
              <span class="CoveoFieldValue" data-field="@date" data-helper="date"></span>
            </div>
          </div>
        </div>
        <div class="coveo-result-row" style="margin-top:10px;">
          <div class="coveo-result-cell">
            <span class="CoveoFieldValue" data-field="@filetype" data-text-caption="Type" style="margin-right:30px;"></span>
            <span class="CoveoFieldValue" data-field="@author" data-text-caption="Author" style="margin-right:30px;"></span>
            <span class="CoveoFieldValue" data-field="@ytitemcount" data-text-caption="NumberOfVideos" style="margin-right:30px;"></span>
          </div>
        </div>
        <div class="coveo-result-row" style="margin-top:12px;">
          <div class="coveo-result-cell" style="padding-top:5px; padding-bottom:5px; font-size:13px;">
            <span class="CoveoResultFolding" data-result-template-id="YouTubeVideo"></span>
          </div>
        </div>
        <div class="coveo-result-row">
          <div class="coveo-result-cell">
            <div class="CoveoMissingTerms"></div>
          </div>
        </div>
      </div>
    </div>
  </script>
</div>
```

Copy protection on PDF

Context and symptoms:

When viewing a PDF item in the Content Browser (platform-ca | platform-eu | platform-au), you notice the following:

There’s no description.
The Quick view shows the following:

Likely cause and resolution

Cause: The PDF is password-protected.

Therefore, the source can’t retrieve the document binary content it needs to generate the description and the Quick view.

Resolution:

If acceptable, remove the password protection on the PDF in the file system.
Rebuild your source.

Web scraping issue

Context and symptoms:

When accessing the Quick view of an item, sections of the actual web page are missing.
You have one or more web scraping configurations on your source.

Likely cause and resolution

Cause: A web scraping configuration may be removing the missing sections.

Resolution: Open your source and review your web scraping configurations:

Check that web scraping configurations appear in an order that makes sense. In single-match mode, the first matching configuration is applied to the page, and the following are ignored.
In the Configuration info settings of the applied web scraping configuration, try changing or adding a rule to exclude your page.
In the Elements to exclude tab of the applied web scraping configuration, try making your selector more restrictive to avoid removing sections from your page.
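
The effect of configuration order in single-match mode can be illustrated with a short sketch. Representing each web scraping configuration as a (name, regular expression) pair is an assumption made for this example, and `select_configuration` is a hypothetical helper, not part of the source configuration.

```python
import re


def select_configuration(url, configurations):
    """Return the name of the first configuration whose pattern matches the URL.

    `configurations` is an ordered list of (name, regex) pairs. Later
    entries are never consulted once one matches, as in single-match mode.
    """
    for name, pattern in configurations:
        if re.search(pattern, url):
            return name
    return None
```

With a catch-all pattern listed first, a more specific configuration placed after it is never applied, which is why the order deserves a review.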

Missing dynamic content

Context and symptoms:

When accessing the Quick view of an item, sections of the actual web page are missing.
Your web page contains dynamically rendered content (for example, responses to JavaScript API calls).

Likely cause and resolution

Cause: The source may be crawling your page before all its dynamic content is rendered.

Resolution: Open your source. In the Advanced settings subtab, make sure Execute JavaScript on pages is enabled. Increase the Time the crawler waits before considering a page as fully rendered value, if necessary.

HTML pages indexed as txt items

Context and symptoms:

When accessing the Content Browser (platform-ca | platform-eu | platform-au), pages are appearing under the txt file type, instead of html.

Likely cause and resolution

Cause: The web page, at the moment it’s crawled, isn’t valid HTML. If the page includes dynamic content, it might not be fully rendered when the crawler processes it.

Resolution:

If the page includes dynamic content, make sure it’s fully rendered when the crawler processes it.
1. Access your source configuration.
2. In the Advanced settings subtab, make sure Execute JavaScript on pages is enabled.
3. Set or increase the Time the crawler waits before considering a page as fully rendered value (for example, 300 milliseconds).
4. Save and rebuild your source.
Fix the HTML of web pages still indexed as txt.
1. Use an HTML markup validator to identify the most significant issues with the page.
2. Fix these markup issues.
3. Rebuild your source.

Login page content indexed instead of page content

Context and symptoms:

When accessing the Quick view of an item, the content of a login page appears instead of the content of the page specified by the URI. This symptom likely affects many items.
When trying to access the page to index manually in a browser, you’re redirected to that login page.

Likely cause and resolution

Cause: The page to index is protected and form authentication isn’t properly set up.

Resolution:

Request the login page authentication credentials from the web server administrator.
Open your source.
Configure form authentication. Use the provided username and password and set the Login page address to the login page URL.
Set the Validation method to Redirection to URL and the Value to the login page URL.
Rebuild your source.
Validate that the item now contains the proper content.

Indexing pipeline extension

Context and symptoms:

When accessing the Quick view of an item, sections of the actual web page are missing.
The Extensions (platform-ca | platform-eu | platform-au) page shows you have one or several indexing pipeline extensions (IPEs) in your Coveo organization.

Likely cause and resolution

Cause: An IPE may be removing the missing sections.

Resolution: Review the logs for the items affected by the extensions. Make necessary adjustments to the extension script or conditions.

Unexpected item field values

Nonexistent field

Context and symptoms:

When inspecting an item in the Content Browser (platform-ca | platform-eu | platform-au), the expected field name doesn’t appear.
On the Fields (platform-ca | platform-eu | platform-au) page, the expected field doesn’t appear.

Likely cause and resolution

Cause: The field doesn’t exist. You need to create the field and the field mapping.

Resolution:

On the Fields (platform-ca | platform-eu | platform-au) page, at the upper right, click Add field.
Follow the instructions in the Add or edit a field article to configure your field.
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > View and map metadata.
Choose the metadata you want to use to populate the field.
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click Mappings in the Action bar.
Follow the instructions in the Add or edit a mapping rule section to configure your mapping.

Field mapping issue

Context and symptoms:

When inspecting items that should have values for the field in the Content Browser (platform-ca | platform-eu | platform-au), the expected field name doesn’t appear in any item.
On the Fields (platform-ca | platform-eu | platform-au) page, the expected field appears.

Likely cause and resolution

Cause: There may be a field mapping issue.

Resolution:

On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > View and map metadata.
Make sure the metadata that should be used to populate your field appears. If the metadata is being used to populate a field, it will be shown as Indexed. If you see two entries under the same metadata name, take note of the indexed and not indexed metadata Origin values for the final step in this procedure.
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click Mappings in the Action bar.
Make sure the mapping rule for the field references the right metadata name.
Add or edit the Origin value in the field mapping rule (for example, %[description:crawler]).

Metadata extraction issue

Context and symptoms:

When inspecting an item in the Content Browser (platform-ca | platform-eu | platform-au), the expected field name doesn’t appear.
When inspecting other items from the source in the Content Browser (platform-ca | platform-eu | platform-au), the field appears with values in some or all of them.

Likely cause and resolution

Cause: There may be a metadata extraction issue specifically for that item.

Resolution: Search for reasons why the metadata extraction process wouldn’t be working on your specific item. For example, if you’re using a web scraping configuration, open your source, and then validate the following:

Your item matches the Configuration info rules you set for that web scraping configuration. Also remember that only the first matching web scraping configuration is applied to the page in single-match mode.
Your CSS or XPath selector works for that specific item.

Title field value selection

Symptom: The item title field value isn’t ideal.

Likely cause and resolution

Cause: Coveo has a title field selection process to ensure all indexed items have titles. This process may not return ideal titles in your use case.

Resolution: Coveo automatically extracts several pieces of metadata that you can use as item titles. See Item title selection mapping rule options to control the value selection process. Edit the title field mappings on your source.

Metadata origin selection

Context and symptoms:

The indexed item has a value for the given field, but that value isn’t the expected one.
On the Sources (platform-ca | platform-eu | platform-au) page, when you click the source and then click More > View and map metadata, you see two entries under the same metadata name.

Likely cause and resolution

Cause: There’s a metadata origin selection issue.

For example, you’ve configured a web scraping configuration to extract a description metadata. The Web source may also be automatically extracting description metadata from the page <meta> tags.

When values for the same metadata name are extracted in the crawling stage and in the processing (or converter) stage of the Coveo indexing pipeline, the latter value is used by default to populate the mapped field.

Resolution:

Use a unique metadata name and create a dedicated field for the custom metadata you’re extracting, OR
Access the Edit mappings panel. Specify the origin value in the field mapping rule (for example, %[description:crawler]) to populate the field with the custom metadata you’re extracting.
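
The default selection and the effect of specifying an origin can be modeled as follows. This is a simplified sketch, not the indexing pipeline logic: the `resolve` helper, the `converter` origin name, and the two-stage order are assumptions based on the description above. Only the `%[name:origin]` rule syntax comes from this article.

```python
import re

# Matches %[name] or %[name:origin], the mapping rule syntax quoted above.
RULE_PATTERN = re.compile(r"^%\[(?P<name>[^:\]]+)(?::(?P<origin>[^\]]+))?\]$")

# Assumed stage order: a value from a later stage wins by default.
STAGE_ORDER = ("crawler", "converter")


def resolve(rule, extracted):
    """Return the value a mapping rule would select, or None if there is none.

    `extracted` maps (metadata_name, origin) pairs to extracted values.
    """
    match = RULE_PATTERN.match(rule)
    if match is None:
        raise ValueError(f"Unsupported rule: {rule!r}")
    name, origin = match.group("name"), match.group("origin")
    if origin:
        # An explicit origin bypasses the default stage precedence.
        return extracted.get((name, origin))
    for stage in reversed(STAGE_ORDER):
        if (name, stage) in extracted:
            return extracted[(name, stage)]
    return None
```

In this model, `%[description]` returns the value extracted at the later stage, while `%[description:crawler]` returns the value from the web scraping configuration.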

Overwritten crawler metadata

Context and symptoms:

The indexed item has a value for the given field.
On the Sources (platform-ca | platform-eu | platform-au) page, when you click the source and then click More > View and map metadata, you see an entry under the metadata name you chose under the Crawler origin.
You specified the origin value in the field mapping rule (for example, %[description:crawler]), but the field value isn’t the expected one.

Likely cause and resolution

Cause: There’s a metadata conflict.

You can have two configurations extracting values for the same metadata name at the crawling stage. When this happens, one value overwrites the other and you only see one Crawler origin entry for that metadata name on the View and map metadata subpage.

Resolution: Change the metadata name in your configuration to make it unique and adjust your field mapping rule accordingly.

Indexing is slow

Source scope

Symptom: Indexing the source pages is taking a long time.

Likely cause and resolution

Cause: The Web source may be crawling and indexing a very high number of pages, possibly including unwanted ones. This can have several causes (for example, a high number of starting URLs, or crawling rule exclusions and inclusions that are too permissive).

Resolution:

See Extra or unwanted pages.
Consider breaking up the Web source into multiple sources. This improves performance and simplifies source configuration and troubleshooting, OR
Consider using one or multiple Sitemap sources instead.

Crawl delay

Symptom: Indexing the source pages is taking a long time.

Likely cause and resolution

Cause: The Time the crawler waits between requests to your server may be unnecessarily high.

Resolution: Provide proof of website ownership. Then, open your source and, in the Advanced settings tab, reduce the Time the crawler waits between requests to your server value.

ExpandBeforeFiltering setting

Symptom: Indexing the source pages is taking a long time.

Likely cause and resolution

Cause: The source may be configured with ExpandBeforeFiltering set to true.

Resolution: Consider editing the source JSON configuration and setting ExpandBeforeFiltering to false.

Indexed content is not up to date

Source rescan schedule

Symptom: Recent changes to site pages aren’t reflected in the Content Browser (platform-ca | platform-eu | platform-au).

Likely cause and resolution

Cause:

Your source rescan schedule may be disabled, OR
The time interval between consecutive rescans might be too long.

Resolution: Make sure the rescan schedule is enabled and that its recurrence settings are adequate.

Number of items limit reached

Context and symptoms:

Recent updates to web pages and newly added pages aren’t reflected in the Content Browser (platform-ca | platform-eu | platform-au).
The Activity Browser (platform-ca | platform-eu | platform-au) shows schedule-triggered rescan activities are failing with a DOCUMENT_LIMIT_EXCEEDED error code.
On the Limits (platform-ca | platform-eu | platform-au) page, in the Content section, you see that items usage is over twice your license limit.

Likely cause and resolution

Cause: Indexing is blocked because you’ve reached the 200% license item usage threshold.
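
The threshold arithmetic is simple, but a worked example can help when planning how many items to remove. The helper names below are hypothetical, and you would take the actual item count and license limit from the Limits page.

```python
def usage_ratio(item_count, license_limit):
    """Return item usage as a fraction of the license limit (2.0 means 200%)."""
    if license_limit <= 0:
        raise ValueError("license_limit must be positive")
    return item_count / license_limit


def items_over_threshold(item_count, license_limit, threshold=2.0):
    """Return how many items exceed threshold * license_limit (0 if none)."""
    return max(0, item_count - int(threshold * license_limit))
```

For instance, with a hypothetical license limit of 1,000,000 items and 2,100,000 indexed items, usage is at 210% and at least 100,000 items must go to get back to the 200% mark.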

Resolution:

If possible, delete unused sources to bring the item count below the 200% threshold. Then, see the July 20, 2023 Coveo Platform update for suggestions on how to reduce your item count even more.
To reassess your needs and discuss your options, contact your Coveo Customer Success Manager.