Index Page Content
By default, each Sitecore item is indexed in a Coveo index using only the field information retrieved from Sitecore. However, some items have more information than what’s defined in the fields.
This section describes the different content indexing options available and their effect on relevance and performance.
How Indexing the HTML of Items Affects Results
The HTML representation of the page content will be set by the index as free-text searchable content. Free text content and free text fields can be queried by entering the search terms directly in the search box.
In the Coveo Content Browser, the items with HTML rendering will have a File Type attribute equal to HTML, while the rest of the content will be of type sitecoreitem (see Inspect items with the Content Browser).
Available HTML Content Processors
An HTML representation of a page's content can be created with or without executing an HTTP request to get the complete page content.
Processors Which Perform HTTP Requests
Although sending an HTTP request during the indexing process is resource-intensive, it’s sometimes the only way to retrieve related content only available when the page is rendered in a browser.
The recommended Coveo for Sitecore processor for this purpose is the FetchPageContentProcessor processor. More configurable than the HtmlContentInBodyWithRequestsProcessor processor, the FetchPageContentProcessor processor has been the default HTML content processor since the October 2018 release of Coveo for Sitecore 4.1 (see Index Page Content With the FetchPageContentProcessor).
If you upgraded from Coveo for Sitecore 4 to Coveo for Sitecore 5, you might still be using the HtmlContentInBodyWithRequestsProcessor processor. Coveo for Sitecore 5 provides a simple mechanism to switch from the HtmlContentInBodyWithRequestsProcessor processor to the FetchPageContentProcessor processor (see When Currently Using HtmlContentInBodyWithRequestsProcessor).
If you embed search-driven components in your Sitecore layouts, this can lead to many unnecessary queries when a processor performs an HTTP request to fetch the HTML content of a page. Coveo for Sitecore provides mechanisms to ignore your search-driven components, whether through processor configuration or by detecting HTTP requests originating from a Coveo agent (see The postProcessing Section and Avoid Search Queries in Page Quick View).
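The idea of skipping search-driven components for indexing requests can be sketched generically. The following Python snippet is only an illustration and isn't Coveo for Sitecore code: it assumes the indexing request can be recognized by a marker in its User-Agent header, and the marker string, the function names, and the plain dictionary of headers are all hypothetical.

```python
from typing import Callable, Mapping

# Hypothetical marker; the actual value sent by a Coveo agent isn't given here.
HYPOTHETICAL_AGENT_MARKER = "indexing-agent"


def is_indexing_request(headers: Mapping[str, str]) -> bool:
    """Return True when the User-Agent header contains the assumed marker."""
    user_agent = headers.get("User-Agent", "")
    return HYPOTHETICAL_AGENT_MARKER in user_agent.lower()


def render_page(
    headers: Mapping[str, str],
    render_search_component: Callable[[], str],
) -> str:
    """Render the page, skipping the search component for indexing requests."""
    parts = ["<main>Page content</main>"]
    if not is_indexing_request(headers):
        # Only regular visitors trigger the (query-issuing) search component.
        parts.append(render_search_component())
    return "".join(parts)
```

With this kind of check, a page fetched during indexing still yields its regular content, but the search component never runs, so no query is sent on its behalf.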
Processor Which Doesn’t Perform HTTP Requests
HTTP requests can significantly slow down the indexing process and should be avoided if possible.
Use the BasicHtmlContentInBodyProcessor processor to create an HTML representation of the item without sending an HTTP request (see Index With the BasicHtmlContentInBodyProcessor).
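The trade-off between the two families of processors can be illustrated with a minimal Python sketch. It isn't how the Coveo processors are implemented. It assumes, for illustration only, that the request-free approach builds the HTML from item fields already in memory, while the request-based approach downloads the rendered page. All function names are hypothetical.

```python
from html import escape
from typing import Callable, Mapping, Optional
from urllib.request import urlopen


def build_html_from_fields(fields: Mapping[str, str]) -> str:
    """Build a basic HTML body from item fields, with no HTTP request."""
    rows = "".join(
        f"<p><strong>{escape(name)}</strong>: {escape(value)}</p>"
        for name, value in fields.items()
    )
    return f"<html><body>{rows}</body></html>"


def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Download the rendered page: more complete, but one request per item."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def html_body_for_item(
    fields: Mapping[str, str],
    url: Optional[str] = None,
    fetch: Callable[[str], str] = fetch_rendered_html,
) -> str:
    """Use the request-free path unless a page URL is explicitly provided."""
    if url is None:
        return build_html_from_fields(fields)
    return fetch(url)
```

The request-free path only touches data already available for the item, which is why it scales better across a large index. The request-based path costs one network round trip per item but captures content that exists only once the page is rendered.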