Index page content
By default, each Sitecore item is indexed in a Coveo index using only the field information retrieved from Sitecore. However, some items contain more information than what's defined in their fields.
This section describes the different content indexing options available and their effect on relevance and performance.
How indexing the HTML of items affects results
The index treats the HTML representation of the page content as free-text searchable content. Free-text content and free-text fields can be queried by entering the search terms directly in the search box.
Using the Coveo JavaScript Search Framework, HTML content enables the CoveoQuickview component on the search results.
In the Coveo Platform Content Browser, the items with HTML rendering will have a File Type attribute equal to HTML while the rest of the content will be of type sitecoreitem (see Inspect items with the Content Browser).
Available HTML content processors
An HTML representation of a page's content can be created with or without executing an HTTP request to get the complete page content.
Processors which perform HTTP requests
Although sending an HTTP request during the indexing process is resource-intensive, it’s sometimes the only way to retrieve related content only available when the page is rendered in a browser.
The recommended Coveo for Sitecore processor for this purpose is the FetchPageContentProcessor processor.
More configurable than the HTMLContentInBodyWithRequestsProcessor, the FetchPageContentProcessor processor has been the default HTML content processor since the October 2018 release of Coveo for Sitecore 4.1 (see Index page content with the FetchPageContentProcessor).
If you upgraded from Coveo for Sitecore 4 to Coveo for Sitecore 5, you might still be using the HTMLContentInBodyWithRequestsProcessor processor.
Coveo for Sitecore 5 provides a simple mechanism to switch from the HTMLContentInBodyWithRequestsProcessor to the FetchPageContentProcessor processor (see When Currently Using HtmlContentInBodyWithRequestsProcessor).
If you embed search-driven components in your Sitecore layouts, this can lead to many unnecessary queries when a processor performs an HTTP request to fetch the HTML content of a page. Coveo for Sitecore provides mechanisms to ignore your search-driven components, whether through processor configuration or by detecting HTTP requests originating from a Coveo agent (see The postProcessing Section and Avoid search queries in page Quick view).
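The idea of detecting requests that originate from an indexing agent can be illustrated with a minimal, language-neutral sketch. The Python below isn't Coveo for Sitecore code: the function names, the markup, and the "ExampleIndexingAgent" marker are all hypothetical, and the actual mechanisms are the ones described in the pages referenced above. It only shows the general pattern of leaving search-driven components out of a page when the request's User-Agent header identifies the indexing agent.

```python
from typing import Mapping


def is_indexing_request(headers: Mapping[str, str], agent_marker: str) -> bool:
    """Return True when the User-Agent header contains the agent marker.

    Header names are matched case-insensitively. `agent_marker` is a
    hypothetical value; use the one your indexing agent actually sends.
    """
    user_agent = ""
    for name, value in headers.items():
        if name.lower() == "user-agent":
            user_agent = value
            break
    return agent_marker.lower() in user_agent.lower()


def render_page(headers: Mapping[str, str],
                agent_marker: str = "ExampleIndexingAgent") -> str:
    """Render a page, skipping the search-driven component for the indexer."""
    parts = ["<h1>Article</h1>", "<p>Body text</p>"]
    if not is_indexing_request(headers, agent_marker):
        # Regular visitors still get the component that triggers a query.
        parts.append("<div class='search-driven-component'></div>")
    return "\n".join(parts)
```

With this pattern, a request sent by the indexing agent receives only the static page content, so fetching the HTML of a page during indexing doesn't trigger an extra search query.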
Processor which doesn’t perform HTTP requests
HTTP requests can significantly slow down the indexing process and should be avoided if possible.
Use the BasicHtmlContentInBodyProcessor processor to create an HTML representation of the item without sending an HTTP request (see Index with the BasicHtmlContentInBodyProcessor).