Index With the HTMLContentInBodyWithRequestsProcessor
The recommended Coveo for Sitecore HTML processor is the HtmlContentInBodyWithRequestsProcessor.
When enabled, the HtmlContentInBodyWithRequestsProcessor is fired during the indexing operation and requests the HTML page associated with the content being indexed.
Once the page is rendered, the processor indexes its content so that users can search it.
This HTTP request introduces a delay when indexing.
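The following Python sketch illustrates the two steps described above for a single item: an HTTP GET for the rendered page, then extraction of the text in the page body. It is an illustration only, not Coveo's implementation. The function names, the 30-second timeout, and sending Coveo Sitecore Search Provider as the user agent are assumptions.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

SKIPPED_TAGS = ("script", "style")


class BodyTextExtractor(HTMLParser):
    """Collects the text inside <body>, ignoring <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is inside <body> and outside script/style.
        if self.in_body and self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(html: str) -> str:
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_page(url: str, user_agent: str = "Coveo Sitecore Search Provider") -> str:
    # Hypothetical helper: one HTTP GET per indexed item. This round trip is
    # what adds time to indexing. The user agent value is an assumption.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Each indexed item costs one network round trip plus the server's rendering time. Indexing duration therefore grows with the number of items fetched this way.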
Configuring the HTMLContentInBodyWithRequestsProcessor
Follow these steps to enable it in the configuration file.
On the Sitecore instance server, open the Coveo.SearchProvider.Custom.config file. By default, the file is located in
Add the following processor element:
<processor type="Coveo.SearchProvider.Processors.HtmlContentInBodyWithRequestsProcessor, Coveo.SearchProviderBase"> </processor>
If your Sitecore items can only be accessed by authenticated users, you need to configure form authentication to index their HTML content (see Configuring Form Authentication).
Ignoring Specific HTML Sections
When indexing an item, you might want to ignore certain sections of the HTML body of a layout, sublayout, view, or of a specific item. To do so, you have two options:
The simple option, available since the September 2016 release of Coveo for Sitecore 4, is to use HTML comments.
The more complex option, available since the initial release of Coveo for Sitecore 4, is to use a Quick View device.
Option 1: Using HTML Comments
Before modifying the HTML of your items, you need to configure the processor.
In the Coveo.SearchProvider.Custom.config file, locate the previously created HtmlContentInBodyWithRequestsProcessor processor element.
Add the following elements between the processor start and end tags:
<StartCommentText>BEGIN NOINDEX</StartCommentText> <EndCommentText>END NOINDEX</EndCommentText>
You can replace BEGIN NOINDEX and END NOINDEX with the text of your choice, but avoid using special characters, as they may cause the following error:
ERROR HtmlContentInBodyWithRequestsProcessor not correctly configured, start and end comments are either invalid or the same.
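Before editing the configuration, you could check candidate marker texts with a small helper like the hypothetical one below. The exact rules the processor applies aren't documented here. This sketch assumes that markers must be non-empty, must differ from each other, and should contain only letters, digits, and spaces.

```python
import re

# Assumption: only letters, digits, and spaces count as "safe" comment text.
SAFE_COMMENT_TEXT = re.compile(r"[A-Za-z0-9 ]+")


def validate_noindex_markers(start: str, end: str) -> None:
    """Raise ValueError for marker pairs likely to trigger the processor error."""
    for label, text in (("start", start), ("end", end)):
        if not text.strip():
            raise ValueError(f"{label} comment text is empty")
        if not SAFE_COMMENT_TEXT.fullmatch(text):
            raise ValueError(f"{label} comment text contains special characters: {text!r}")
    if start.strip() == end.strip():
        raise ValueError("start and end comment texts are the same")
```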
Modify the HTML of Your Item
Now that the processor is properly configured, you need to edit the HTML of the layout, sublayout, view, or item you want to modify. To do so, follow these steps:
Open the HTML of the item you want to modify:
For a layout or sublayout, navigate to the <SITECORE_INSTANCE_ROOT>\Website\layouts folder, and open the .aspx file you want to modify.
For a view, navigate to the <SITECORE_INSTANCE_ROOT>\Website\Views folder, and open the .cshtml file you want to modify.
For a specific item, in your Sitecore Content Editor, select the item you want to modify and, under the section you want to modify, select Edit HTML. If you don’t have the Edit HTML option, edit the content directly in the text box.
Add <!-- BEGIN NOINDEX --> and <!-- END NOINDEX --> (or the start and end comment text you configured on the processor) around the part you don’t want to index, as follows:
<!-- BEGIN NOINDEX --> <p>This section won't be indexed.</p> <!-- END NOINDEX --> <p>This section will be indexed.</p>
Save your item and re-index it (see Re-Indexing Only a Section of Your Content Tree). The sections you wrapped in comments should now be ignored by the index.
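Conceptually, the processor drops everything between a start comment and the next end comment before indexing the page. The Python sketch below illustrates that behavior with a non-greedy regular expression. The function name and the regex approach are assumptions, not Coveo's actual code. Because the match is non-greedy, each start comment is paired with the first end comment that follows it, so several excluded sections on one page are handled independently. Nested sections aren't supported by this sketch.

```python
import re


def strip_noindex_sections(html: str,
                           start_text: str = "BEGIN NOINDEX",
                           end_text: str = "END NOINDEX") -> str:
    """Remove each start-comment to end-comment span (markers included) from the HTML."""
    pattern = re.compile(
        r"<!--\s*" + re.escape(start_text) + r"\s*-->"   # opening marker comment
        + r".*?"                                          # shortest run of content...
        + r"<!--\s*" + re.escape(end_text) + r"\s*-->",  # ...up to the next closing marker
        re.DOTALL,                                        # let "." match across line breaks
    )
    return pattern.sub("", html)
```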
Option 2: Using a Quick View device
This strategy involves creating a specific device for the quick view that detects the user agent used for crawling (see Creating a Quick View Device). The quick view page also allows you to set meta tags that add content to item fields.
These fields can then be used for sorting or faceted search (see Adding Meta Information on the Quick View).
To get a quick view on items, make sure you publish all items to the web database (see Constraints and Limitations).
Adding a Specific Quick View Layout
This is a fairly common operation in Sitecore:
Open the Sitecore Content Editor and select the
You’re encouraged to use a project-specific folder for the new layout. To do so, right-click and select Layout Folder. Give the new folder a meaningful name.
Right-click on the folder and select
Follow the wizard steps. A new file is created in the layouts folder of the website, and you can modify it to suit your needs. The default path is
To make your layouts more manageable, you can define new sublayouts that are template-specific. This way, you can reuse the Quick View Layout for many items.
Creating a Quick View Device
Open the Content Editor and select the item
Right-click and select
Give it a meaningful name, such as
Activate the Content tab and find the Browser agent field in the Detection section.
Set the value to Coveo Sitecore Search Provider.
You just enabled an additional device to be used later on by Coveo to fetch an item quick view.
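Device selection by browser agent can be pictured as follows. This Python sketch makes two assumptions. First, a device matches when its Browser agent value appears, case-insensitively, in the request's user agent string. Second, the Default device is used when no other device matches. Sitecore's actual matching rules may differ, and the Device type and resolve_device function are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Device:
    name: str
    browser_agent: str = ""  # value of the Browser agent field; "" means no agent rule


def resolve_device(user_agent: str, devices: Sequence[Device], default: Device) -> Device:
    # Assumption: case-insensitive substring match against the request user agent.
    ua = user_agent.lower()
    for device in devices:
        if device.browser_agent and device.browser_agent.lower() in ua:
            return device
    return default
```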
Setting Item Preview Layout
Open the Content Editor and select a template for which you want to enable the Preview feature.
When you expand the desired template, you’ll see a subitem named __Standard Values. Select it.
Select the Presentation tab at the top of the window, and then click Details.
Next to the Quick View device, click the [No layout specified] link.
Select the Quick View Layout from the tree.
If you’re using any sublayout that’s specific to the template, follow these steps:
Click Edit. A new window will appear.
Click the Controls list item on the left.
To add a sublayout, click Add and then select the desired sublayout from the tree.
Type in the name of the placeholder in the Add to Placeholder text box. The default placeholder name in the samples is
Instances of the template now display content in Preview mode. To see an item in Preview mode, follow these steps:
Open the Page Editor and navigate to the desired item.
Click Preview to enter preview mode.
The preview will use the Default device. Click Default and select the Quick View device.
Adding Meta Information on the Quick View
It’s also possible to set specific field values on items when using the quick view. This allows you, for example, to set field values on an item when rendering a subitem. To add metadata to an item quick view, output meta elements like this one:
<meta name="theFieldName" content="The Field Value">
The meta element doesn’t need to be placed in the page head; you can place it anywhere.
When indexing the item, the field theFieldName is given the value The Field Value in the index for that item.
Note that the meta name must match the Sitecore field name.
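The following Python sketch illustrates how meta elements placed anywhere in a quick view page could be turned into field values. It keeps only names that match an existing Sitecore field, which is the limitation described under Constraints and Limitations. The MetaFieldCollector class and meta_field_values function are hypothetical, and the exact, case-sensitive name comparison is an assumption.

```python
from html.parser import HTMLParser


class MetaFieldCollector(HTMLParser):
    """Collects name/content pairs from <meta> elements anywhere in the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name is not None and content is not None:
            self.fields[name] = content


def meta_field_values(html: str, sitecore_fields: set) -> dict:
    collector = MetaFieldCollector()
    collector.feed(html)
    collector.close()
    # Meta names without a matching Sitecore field are ignored.
    return {name: value for name, value in collector.fields.items() if name in sitecore_fields}
```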
Constraints and Limitations
There are currently some limitations when this method is used.
To get a quick view on an item, the following conditions must be met:
The item must have a layout so the item preview is displayed correctly in Sitecore.
The item preview must be available anonymously.
The quick view contains item data from the web database, regardless of the database the item belongs to. Results are therefore accurate when searches are performed on the web database, but full-text searches on the master database match the item data that’s in the web database. This doesn’t affect item permissions.
If a Sitecore item is secured (can’t be accessed anonymously), Coveo won’t be able to get the quick view for that item.
When adding meta information on the quick view, a Sitecore field with the same name must exist. Otherwise, the meta element is ignored.
Configuring Form Authentication
Form authentication allows you to index the HTML of Sitecore items that can only be accessed by authenticated users.
To log in, form authentication uses the credentials entered during the installation of the Coveo for Sitecore package, which are stored in the Coveo.SearchProvider.Custom.config file.
The password is encrypted in that file.
One of the use cases for form authentication is the HtmlContentInBodyWithRequestsProcessor, which uses an HTTP GET request to fetch the content of a page at indexing time. When the page is secured, it attempts a form authentication using the POST method.
To configure your processor to allow form authentication:
Open the Coveo.SearchProvider.Custom.config file. It’s usually located under
In the defaultIndexConfiguration node, add the following nodes:
<formsAuthConfiguration type="Coveo.Framework.Configuration.FormsAuthConfiguration, Coveo.Framework">
  <formsAuthLoginPage></formsAuthLoginPage>
  <formsAuthUserControl></formsAuthUserControl>
  <formsAuthPasswordControl></formsAuthPasswordControl>
  <formsAuthLoginCommand></formsAuthLoginCommand>
</formsAuthConfiguration>
Enter the following information in the nodes:
formsAuthLoginPage: the URL of your login page.
formsAuthUserControl: the id of the username text field.
formsAuthPasswordControl: the id of the password text field.
Note: You can obtain the id of these fields by inspecting the Sitecore instance authentication page in your browser.
formsAuthLoginCommand: the name and value attribute values of the Sitecore login page button, using the syntax name=value. Spaces have to be replaced with the + character.
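To illustrate the formsAuthLoginCommand syntax, the hypothetical helper below builds the value from a button's name and value attributes. It uses Python's urllib.parse.quote_plus, which turns spaces into + as required. quote_plus also percent-encodes other special characters, and whether the processor expects that encoding is an assumption.

```python
from urllib.parse import quote_plus


def login_command(button_name: str, button_value: str) -> str:
    """Build a formsAuthLoginCommand value as name=value, with spaces encoded as '+'."""
    return f"{quote_plus(button_name)}={quote_plus(button_value)}"
```

For a button named LogInBtn whose value is Log in, the helper produces LogInBtn=Log+in, which matches the format of the login command used in this article's configuration example.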
For example, suppose you want to index content from your http://www.secured.com Sitecore website. You can access the authentication page of the website through
Inspecting this authentication page in your browser, you see the following markup:
Given this markup, you would set the <formsAuthConfiguration> section element values as follows:
<formsAuthConfiguration type="Coveo.Framework.Configuration.FormsAuthConfiguration, Coveo.Framework">
  <formsAuthLoginPage>http://www.secured.com/sitecore/login</formsAuthLoginPage>
  <formsAuthUserControl>UserName</formsAuthUserControl>
  <formsAuthPasswordControl>Password</formsAuthPasswordControl>
  <formsAuthLoginCommand>LogInBtn=Log+in</formsAuthLoginCommand>
</formsAuthConfiguration>
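The Python sketch below shows how these values could fit together in the POST request sent when a page is secured. It reads a <formsAuthConfiguration> element and builds an application/x-www-form-urlencoded body. It is hypothetical: it assumes the configured ids are also the form field names, and it ignores any additional hidden fields a real login form may require.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_login_post(config_xml: str, username: str, password: str):
    """Return (login_url, form_body) for a forms-auth POST based on <formsAuthConfiguration>."""
    config = ET.fromstring(config_xml)
    login_page = config.findtext("formsAuthLoginPage")
    user_control = config.findtext("formsAuthUserControl")
    password_control = config.findtext("formsAuthPasswordControl")
    login_command = config.findtext("formsAuthLoginCommand")
    # Assumption: credentials are posted under the configured field ids.
    body = urlencode({user_control: username, password_control: password})
    # The login command is already encoded as name=value (spaces as '+'), so append it as-is.
    if login_command:
        body += "&" + login_command
    return login_page, body
```

With the configuration above and the hypothetical credentials admin and secret pw, the function returns http://www.secured.com/sitecore/login and the body UserName=admin&Password=secret+pw&LogInBtn=Log+in.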
Save and close the file.
Rebuild your indexes (see Coveo for Sitecore Indexing Guide).