Index with the HTMLContentInBodyWithRequestsProcessor
Deprecated
The recommended Coveo for Sitecore HTML processor is the |
When enabled, the HtmlContentInBodyWithRequestsProcessor runs during the indexing operation and requests the HTML page associated with the item being indexed.
Once the page is rendered, the processor indexes its content so that users can search it.
This HTTP request introduces a delay when indexing.
Configuring the HtmlContentInBodyWithRequestsProcessor
Follow these steps to enable the processor in the configuration file.
- On the Sitecore instance server, open the Coveo.SearchProvider.Custom.config file. By default, the file is located in <SITECORE_INSTANCE_ROOT>\Website\App_Config\Include.
- Find the coveoPostItemProcessingPipeline element.
- Add the following processor element:
  <processor type="Coveo.SearchProvider.Processors.HtmlContentInBodyWithRequestsProcessor, Coveo.SearchProviderBase"> </processor>
If your Sitecore items can only be accessed by authenticated users, you need to configure form authentication to index their HTML content (see Configuring form authentication).
Ignoring specific HTML sections
When indexing an item, you might want to ignore certain sections of the HTML body of a layout, sublayout, view, or specific item. To do so, you have two options:
- The simple option, available since the September 2016 release of Coveo for Sitecore 4, is to use HTML comments.
- The complex option, available since the initial release of Coveo for Sitecore 4, is to use a Quick view device.
Option 1: Using HTML comments
Configure the HtmlContentInBodyWithRequestsProcessor
Before modifying the HTML of your items, you need to configure the processor.
- In the Coveo.SearchProvider.Custom.config file, locate the previously created HtmlContentInBodyWithRequestsProcessor in the coveoPostItemProcessingPipeline.
- Add the following elements between the processor start and end tags:
  <StartCommentText>BEGIN NOINDEX</StartCommentText>
  <EndCommentText>END NOINDEX</EndCommentText>
  You can replace BEGIN NOINDEX and END NOINDEX with the text of your choice, but avoid using special characters. Use of special characters may cause the following error:
  ERROR HtmlContentInBodyWithRequestsProcessor not correctly configured, start and end comments are either invalid or the same.
Modify the HTML of your item
Now that the processor is properly configured, you need to edit the HTML of the layout, sublayout, view, or item you want to modify. To do so, follow these steps:
- Open the HTML of the item you want to modify:
  - For a layout or sublayout, navigate to the <SITECORE_INSTANCE_ROOT>\Website\layouts folder, and open the .aspx file you want to modify.
  - For a view, navigate to the <SITECORE_INSTANCE_ROOT>\Website\Views folder, and open the .cshtml file you want to modify.
  - For a specific item, in your Sitecore Content Editor, select the item you want to modify and, under the section you want to modify, select Edit HTML. If you don’t have the Edit HTML option, edit in the text box itself.
- Add <!-- BEGIN NOINDEX --> and <!-- END NOINDEX --> (or the start and end comments you defined in the Configure the HtmlContentInBodyWithRequestsProcessor section) around the part you don’t want to index, as follows:
  <!-- BEGIN NOINDEX -->
  <p>This section won't be indexed.</p>
  <!-- END NOINDEX -->
  <p>This section will be indexed.</p>
- Save your item, and rebuild it (see Re-indexing only a section of your content tree). Your undesired sections should now be ignored by the index.
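To make the effect of the comment markers concrete, the following Python sketch removes everything between a start and an end comment from an HTML string. It is an illustration only, not the processor's actual implementation; the function name strip_noindex and its parameters are hypothetical, and the default marker texts are the ones used in this section.

```python
import re


def strip_noindex(html: str, start: str = "BEGIN NOINDEX", end: str = "END NOINDEX") -> str:
    """Remove each <!-- start --> ... <!-- end --> block, markers included.

    Hypothetical helper: it only mimics the documented outcome.
    """
    pattern = re.compile(
        r"<!--\s*" + re.escape(start) + r"\s*-->"   # opening comment
        r".*?"                                       # shortest possible ignored section
        r"<!--\s*" + re.escape(end) + r"\s*-->",    # closing comment
        re.DOTALL,
    )
    return pattern.sub("", html)


page = (
    "<!-- BEGIN NOINDEX --> <p>This section won't be indexed.</p> <!-- END NOINDEX -->"
    " <p>This section will be indexed.</p>"
)
print(strip_noindex(page).strip())  # <p>This section will be indexed.</p>
```

Running the sketch against a copy of your rendered page can be a quick way to check that the markers are paired correctly before you rebuild the item.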
Option 2: Using a Quick view device
This strategy requires creating a specific device for the Quick view that detects the user agent used for crawling (see Creating the Quick view device). The Quick view page also allows you to set meta tags that add content to item fields.
Those fields can then be used for sorting or faceted searches (see Adding meta information on the Quick view).
To get a Quick view on items, make sure to publish all items to the web database (see Constraints and limitations).
Adding a specific Quick view layout
This is a fairly common operation in Sitecore:
- Open the Sitecore Content Editor and select the /sitecore/Layout/Layouts item.
- You’re encouraged to use a project-specific folder for the new layout. To do so, right-click and select Insert > Layout Folder. Give the new folder a meaningful name.
- Right-click the folder and select Insert > Layout.
- Follow the wizard steps. A new file is created in the layouts folder of the website. You can modify it to suit your needs. The default path is <SITECORE_INSTANCE_ROOT>\website\layouts.
To make your layouts more manageable, you can define new sublayouts that are template-specific. This way, you can reuse the Quick view layout for many items.
Creating a Quick view device
- Open the Content Editor and select the item /sitecore/Layout/Devices.
- Right-click and select Insert > Device.
- Give it a meaningful name, such as Quick view.
- Activate the Content tab and find the Browser agent field in the Detection section.
- Set the value to Coveo Sitecore Search Provider.
You just enabled an additional device that Coveo will use later on to fetch an item’s Quick view.
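Once the device exists, you may want to see what a request carrying that browser agent receives. The Python sketch below builds such a request with the standard library. It assumes that sending the Browser agent value as the User-Agent header is enough to trigger the device; the exact header the crawler sends may be longer than this string. The function names are hypothetical.

```python
import urllib.request

# Value entered in the Browser agent field of the Quick view device.
CRAWLER_AGENT = "Coveo Sitecore Search Provider"


def build_quick_view_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the crawler's browser agent."""
    return urllib.request.Request(url, headers={"User-Agent": CRAWLER_AGENT})


def fetch_quick_view(url: str, timeout: float = 10.0) -> str:
    """Fetch the page as the Quick view device should render it (network call)."""
    with urllib.request.urlopen(build_quick_view_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_quick_view with the URL of a published item on your own site and comparing the result with a regular browser request should show whether the Quick view layout is being served.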
Setting item preview layout
- Open the Content Editor and select a template for which you want to enable the Preview feature.
- When you expand the desired template, you’ll see a sub item named __Standard Values. Select it.
- Click the Presentation tab at the top of the window, and then click Details.
- Next to the Quick View device, click the [No layout specified] link.
- Select the Quick View Layout from the tree.
- If you’re using any sublayout that’s specific to the template, follow these steps:
  - Click Edit. A new window appears.
  - Click the Controls list item on the left.
  - To add a sublayout, click Add and then select the desired sublayout from the tree.
  - Type the name of the placeholder in the Add to Placeholder text box. The default placeholder name in the samples is content.
- The template instances now show something in Preview mode. To see an item in Preview mode, follow these steps:
  - Open the Page Editor and navigate to the desired item.
  - Click Preview to enter preview mode.
  - The preview uses the Default device. Click Default and select the Quick View device.
Adding meta information on the Quick view
It’s also possible to set specific field values on items when using the Quick view. This lets you, for example, set field values on an item when rendering a sub item. To add metadata to an item Quick view, you only need to output tags like this one:
<meta name="theFieldName" content="The Field Value">
The meta element doesn’t need to be placed in the page head; you can place it anywhere.
When indexing the item, the field theFieldName is given the value The Field Value in the index for the item.
Note that the meta name must match the Sitecore field name.
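As a rough illustration of how meta elements map to field values, the following Python sketch collects every meta name/content pair from a Quick view page, wherever the element appears. It is not Coveo's implementation; the class, the function, and the known_fields parameter are hypothetical, and the exact, case-sensitive name comparison is an assumption.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, Optional


class MetaCollector(HTMLParser):
    """Collect name/content pairs from every <meta> element in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.fields[name] = content


def collect_meta_fields(html: str, known_fields: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Return meta values, optionally keeping only names that match known field names."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    if known_fields is None:
        return parser.fields
    allowed = set(known_fields)  # assumption: exact, case-sensitive match
    return {name: value for name, value in parser.fields.items() if name in allowed}


quick_view = '<body><p>Preview</p><meta name="theFieldName" content="The Field Value"></body>'
print(collect_meta_fields(quick_view))  # {'theFieldName': 'The Field Value'}
```

Passing the list of field names defined on your template as known_fields mimics the rule that a meta element without a matching Sitecore field is ignored.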
Constraints and limitations
There are currently some limitations when this method is used.
- To get a Quick view on an item, the following conditions must be met:
  - The item must have a layout so the item preview is displayed correctly in Sitecore.
  - The item preview must be available anonymously.
- The Quick view contains the item data coming from the web database regardless of the database it belongs to. The returned results are accurate when searches are performed on the web database. When performing full-text search on the master database, results match the item data that’s in the web database. Note that this doesn’t affect item permissions.
- If a Sitecore item is secured (can’t be accessed anonymously), Coveo won’t be able to get the Quick view for that item.
- When adding meta information on the Quick view, a Sitecore field with the same name must exist. Otherwise, the meta element is ignored.
Configuring form authentication
Form authentication allows you to index the HTML of Sitecore items that can only be accessed by authenticated users.
To log in, form authentication uses the credentials entered in the Coveo.SearchProvider.Custom.config file during the installation of the Coveo for Sitecore package.
The password is encrypted in the |
One of the use cases for form authentication is the HtmlContentInBodyWithRequestsProcessor processor, which uses an HTTP GET request to fetch the content of a page at indexing time, and attempts a form authentication using the POST method when the page is secured.
To configure your processor to allow form authentication
- Open the Coveo.SearchProvider.Custom.config file. It’s usually located under <SITECORE_INSTANCE_ROOT>\Website\App_Config\Include\Coveo.
- In the defaultIndexConfiguration node, add the following nodes:
  <formsAuthConfiguration type="Coveo.Framework.Configuration.FormsAuthConfiguration, Coveo.Framework">
    <formsAuthLoginPage></formsAuthLoginPage>
    <formsAuthUserControl></formsAuthUserControl>
    <formsAuthPasswordControl></formsAuthPasswordControl>
    <formsAuthLoginCommand></formsAuthLoginCommand>
  </formsAuthConfiguration>
- Enter the following information in the nodes:
  - formsAuthLoginPage: the URL of your login page.
  - formsAuthUserControl: the id of the username text field.
  - formsAuthPasswordControl: the id of the password text field.
    Note: You can obtain the id of these fields by inspecting the Sitecore instance authentication page in your favorite browser.
  - formsAuthLoginCommand: the name and value attribute values of the <input> element associated with the Sitecore login page button, using the following syntax: name=value
    Note: Spaces have to be replaced with the + symbol.
    Example: You want to index content from your http://www.secured.com Sitecore website. You can access the authentication page of the website through http://www.secured.com/sitecore/login.
    Inspecting this authentication page in your browser, you see the following markup:
    Given this markup, you would set the <formsAuthConfiguration> section element values as follows:
    <formsAuthConfiguration type="Coveo.Framework.Configuration.FormsAuthConfiguration, Coveo.Framework">
      <formsAuthLoginPage>http://www.secured.com/sitecore/login</formsAuthLoginPage>
      <formsAuthUserControl>UserName</formsAuthUserControl>
      <formsAuthPasswordControl>Password</formsAuthPasswordControl>
      <formsAuthLoginCommand>LogInBtn=Log+in</formsAuthLoginCommand>
    </formsAuthConfiguration>
- Save and close the file.
- Rebuild your indexes (see Coveo for Sitecore indexing guide).
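The name=value syntax with + in place of spaces matches how HTML forms encode values. The short Python sketch below builds the formsAuthLoginCommand value from a button's name and value attributes and shows that a form decoder reads Log+in back as Log in. The helper name login_command is hypothetical, and the sketch only applies the space replacement stated above; how Coveo submits the value is not covered here.

```python
from urllib.parse import parse_qs


def login_command(name: str, value: str) -> str:
    """Build the name=value string, replacing spaces in the value with +."""
    return f"{name}={value.replace(' ', '+')}"


# Attribute values taken from the example configuration above.
command = login_command("LogInBtn", "Log in")
print(command)            # LogInBtn=Log+in
print(parse_qs(command))  # {'LogInBtn': ['Log in']}
```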