Index page content

By default, Coveo indexes each Sitecore item using only the field information retrieved from Sitecore. However, some items have more information than what’s defined in their fields.

This section describes the different content indexing options available and their effect on relevance and performance.

How indexing the HTML of items affects results

The index treats the HTML representation of the page content as free-text searchable content. Users can query free-text content and free-text fields by entering search terms directly in the search box.

With the Coveo JavaScript Search Framework, indexed HTML content enables the CoveoQuickview component in search results.

In the Coveo Platform Content Browser, items with an HTML rendering have a File Type attribute equal to HTML, while the rest of the content is of type sitecoreitem (see Inspect items with the Content Browser).
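As a rough illustration of the File Type distinction above, the following sketch splits a list of item records into those indexed with an HTML rendering and those indexed from fields only. The record shape (a dictionary with a `filetype` key) and the function name `split_by_rendering` are assumptions made for this example, not part of any Coveo API.

```python
from typing import Dict, List, Tuple

Item = Dict[str, str]


def split_by_rendering(items: List[Item]) -> Tuple[List[Item], List[Item]]:
    """Separate items indexed with HTML content from field-only items.

    Assumes each item is a dict with a hypothetical "filetype" key holding
    the File Type value shown in the Content Browser.
    """
    with_html: List[Item] = []
    fields_only: List[Item] = []
    for item in items:
        # Compare case-insensitively; a missing key counts as field-only.
        if item.get("filetype", "").lower() == "html":
            with_html.append(item)
        else:
            fields_only.append(item)
    return with_html, fields_only


sample_items = [
    {"title": "Home", "filetype": "html"},
    {"title": "Settings", "filetype": "sitecoreitem"},
    {"title": "About", "filetype": "HTML"},
]
html_items, other_items = split_by_rendering(sample_items)
```

With the sample data, `html_items` contains the Home and About records, and `other_items` contains the Settings record.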

Available HTML content processors

An HTML representation of the page content can be created with or without executing an HTTP request to get the complete page content.

Processors which perform HTTP requests

Although sending an HTTP request during the indexing process is resource-intensive, it’s sometimes the only way to retrieve related content only available when the page is rendered in a browser.

The recommended Coveo for Sitecore processor for this purpose is the FetchPageContentProcessor. It's more configurable than the HtmlContentInBodyWithRequestsProcessor and has been the default HTML content processor since the October 2018 release of Coveo for Sitecore 4.1 (see Index page content with the FetchPageContentProcessor).

If you upgraded from Coveo for Sitecore 4 to Coveo for Sitecore 5, you might still be using the HtmlContentInBodyWithRequestsProcessor. Coveo for Sitecore 5 provides a simple mechanism to switch from the HtmlContentInBodyWithRequestsProcessor to the FetchPageContentProcessor (see When Currently Using HtmlContentInBodyWithRequestsProcessor).

If you embed search-driven components in your Sitecore layouts, a processor that performs an HTTP request to fetch the HTML content of a page can trigger many unnecessary queries. Coveo for Sitecore provides mechanisms to ignore your search-driven components, whether through processor configuration or by detecting HTTP requests originating from a Coveo agent (see The postProcessing Section and Avoid search queries in page Quick view).
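The agent-detection idea can be sketched in a language-neutral way as follows. This is a minimal illustration, not the Coveo for Sitecore implementation. The marker string `Coveo Sitecore Search Provider`, the header dictionary, and the functions `is_indexing_request` and `render_page` are all assumptions made for the example, so check the linked articles for the actual user agent value and the recommended layout code.

```python
from typing import Mapping

# Assumed marker; verify the real user agent sent by your Coveo agent.
ASSUMED_AGENT_MARKER = "Coveo Sitecore Search Provider"


def is_indexing_request(headers: Mapping[str, str],
                        marker: str = ASSUMED_AGENT_MARKER) -> bool:
    """Return True when the User-Agent header contains the agent marker."""
    user_agent = headers.get("User-Agent", "")
    return marker.lower() in user_agent.lower()


def render_page(headers: Mapping[str, str],
                body_html: str,
                search_component_html: str) -> str:
    """Omit the search-driven component when the indexing agent asks.

    Skipping the component means fetching the page for indexing does not
    trigger the search queries that the component would otherwise run.
    """
    if is_indexing_request(headers):
        return body_html
    return body_html + search_component_html


browser = {"User-Agent": "Mozilla/5.0"}
indexer = {"User-Agent": "Coveo Sitecore Search Provider"}
page_for_browser = render_page(browser, "<main>Article</main>", "<div>Search</div>")
page_for_indexer = render_page(indexer, "<main>Article</main>", "<div>Search</div>")
```

In this sketch, a regular visitor receives the full page including the search component, while the request carrying the assumed agent marker receives only the main content.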

Processor which doesn’t perform HTTP requests

HTTP requests can significantly slow down the indexing process and should be avoided if possible.

Use the BasicHtmlContentInBodyProcessor processor to create an HTML representation of the item without sending an HTTP request (see Index with the BasicHtmlContentInBodyProcessor).