Index With the HtmlContentInBodyWithRequestsProcessor

Coveo for Sitecore 5

The recommended Coveo for Sitecore HTML processor is the FetchPageContentProcessor (see Index Page Content with the FetchPageContent Processor).

When enabled, the HtmlContentInBodyWithRequestsProcessor runs during indexing and requests the HTML page associated with each item being indexed. Once the page is rendered, the processor indexes its content so that users can search it.

Because the processor issues an HTTP request for each item, it slows down indexing.
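As its name suggests, the processor indexes the content found in the body of the rendered page. The following Python sketch illustrates that idea on HTML that has already been fetched. It is not Coveo's implementation. The helper name `extract_body_text` is hypothetical, and skipping `<script>` and `<style>` text is an assumption of this sketch.

```python
from html.parser import HTMLParser


class _BodyTextParser(HTMLParser):
    """Collects text nodes inside <body>, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits in the body and outside scripts/styles.
        if self.in_body and self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(html: str) -> str:
    """Hypothetical helper: return the visible text of the page body."""
    parser = _BodyTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ("<html><head><title>T</title></head>"
        "<body><h1>Hello</h1><script>x=1</script><p>World</p></body></html>")
print(extract_body_text(page))  # Hello World
```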

Configuring the HtmlContentInBodyWithRequestsProcessor

To enable the processor in the configuration file, follow these steps:

  1. On the Sitecore instance server, open the Coveo.SearchProvider.Custom.config file. By default, the file is located in <SITECORE_INSTANCE_ROOT>\Website\App_Config\Include.

  2. Find the coveoPostItemProcessingPipeline element.

  3. Add the following processor element:

     <processor type="Coveo.SearchProvider.Processors.HtmlContentInBodyWithRequestsProcessor, Coveo.SearchProviderBase">
     </processor>
    

    If your Sitecore items can only be accessed by authenticated users, you need to configure form authentication to index their HTML content (see Configuring Form Authentication).
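If you automate deployments, you could make the same change with a script. The Python sketch below uses the standard xml.etree.ElementTree module to append the processor element to the first coveoPostItemProcessingPipeline element it finds. It makes no change if the processor is already there.

The helper name and the minimal sample XML are hypothetical. A real Coveo.SearchProvider.Custom.config contains much more. ElementTree also drops XML comments by default, so back up the file before rewriting it.

```python
import xml.etree.ElementTree as ET

PROCESSOR_TYPE = (
    "Coveo.SearchProvider.Processors.HtmlContentInBodyWithRequestsProcessor, "
    "Coveo.SearchProviderBase"
)


def add_html_processor(config_xml: str) -> str:
    """Append the processor to coveoPostItemProcessingPipeline unless it is already there."""
    root = ET.fromstring(config_xml)
    # Search the whole tree so the sketch doesn't depend on where the pipeline is nested.
    pipeline = next(root.iter("coveoPostItemProcessingPipeline"), None)
    if pipeline is None:
        raise ValueError("coveoPostItemProcessingPipeline element not found")
    already_present = any(
        p.get("type") == PROCESSOR_TYPE for p in pipeline.findall("processor")
    )
    if not already_present:
        ET.SubElement(pipeline, "processor", {"type": PROCESSOR_TYPE})
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal file, for illustration only.
sample = """<configuration><sitecore><pipelines>
<coveoPostItemProcessingPipeline></coveoPostItemProcessingPipeline>
</pipelines></sitecore></configuration>"""
updated = add_html_processor(sample)
print(PROCESSOR_TYPE in updated)  # True
```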

Ignoring Specific HTML Sections

When indexing an item, you might want to ignore certain sections of the HTML body of a layout, sublayout, view, or of a specific item. To do so, you have two options:

  • The simple option, available since the September 2016 release of Coveo for Sitecore 4, is to use HTML comments.

  • The more complex option, available since the initial release of Coveo for Sitecore 4, is to use a Quick View device.

Option 1: Using HTML comments

Configure the HtmlContentInBodyWithRequestsProcessor

Before modifying the HTML of your items, you need to configure the processor.

  1. In the Coveo.SearchProvider.Custom.config file, locate the HtmlContentInBodyWithRequestsProcessor element you previously added to the coveoPostItemProcessingPipeline.

  2. Add the following elements between the processor start and end tags:

    <StartCommentText>BEGIN NOINDEX</StartCommentText>
    <EndCommentText>END NOINDEX</EndCommentText>
    

    You can replace BEGIN NOINDEX and END NOINDEX with text of your choice, but avoid special characters and make sure the start and end texts are different. Otherwise, the following error may occur:

    ERROR HtmlContentInBodyWithRequestsProcessor not correctly configured, start
    and end comments are either invalid or the same.
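The exact rule behind this error isn't described here. As an illustration only, the Python sketch below assumes that usable comment text meets three conditions:

  • it is non-empty;
  • it contains only letters, digits, and spaces;
  • it differs between the start and end markers.

The function name and the character rule are assumptions, not the processor's actual validation.

```python
import re

# Assumption: "safe" comment text uses only letters, digits, and spaces.
_SAFE_TEXT = re.compile(r"[A-Za-z0-9 ]+")


def check_comment_texts(start: str, end: str) -> None:
    """Hypothetical check: raise ValueError if the start/end texts look unusable."""
    for label, text in (("start", start), ("end", end)):
        if not text.strip() or not _SAFE_TEXT.fullmatch(text):
            raise ValueError(f"{label} comment text {text!r} is empty or has special characters")
    if start.strip() == end.strip():
        raise ValueError("start and end comment texts must be different")


check_comment_texts("BEGIN NOINDEX", "END NOINDEX")  # passes silently
```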
    

Modify the HTML of Your Item

Now that the processor is configured, edit the HTML of the layout, sublayout, view, or item that contains the sections you want to exclude. To do so, follow these steps:

  1. Open the HTML of the item you want to modify:

    1. For a layout or sublayout, navigate to the <SITECORE_INSTANCE_ROOT>\Website\layouts folder, and open the .aspx (layout) or .ascx (sublayout) file you want to modify.

    2. For a view, navigate to the <SITECORE_INSTANCE_ROOT>\Website\Views folder, and open the .cshtml file you want to modify.

    3. For a specific item, select the item in the Sitecore Content Editor and, under the section you want to modify, select Edit HTML. If the Edit HTML option isn’t available, edit the content directly in the text box.

  2. Add <!-- BEGIN NOINDEX --> and <!-- END NOINDEX --> (or the start and end comment texts you defined in the Configure the HtmlContentInBodyWithRequestsProcessor section) around the part you don’t want to index, as follows:

    <!-- BEGIN NOINDEX -->
    <p>This section won't be indexed.</p>
    <!-- END NOINDEX -->
    <p>This section will be indexed.</p>
    
  3. Save your item, and re-index it (see Re-Indexing Only a Section of Your Content Tree). The enclosed sections are now excluded from the index.
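The effect of the markers can be approximated with a regular expression that removes each start-to-end span before the text is indexed. This Python sketch is an approximation, not Coveo's code. The function name is hypothetical, and the markers default to BEGIN NOINDEX and END NOINDEX. The sketch leaves an unmatched start comment in place, which may not match the processor's behavior.

```python
import re


def strip_noindex(html: str, start_text: str = "BEGIN NOINDEX",
                  end_text: str = "END NOINDEX") -> str:
    """Remove each <!-- start_text --> ... <!-- end_text --> span, markers included."""
    pattern = re.compile(
        r"<!--\s*" + re.escape(start_text) + r"\s*-->"  # opening marker
        r".*?"                                          # shortest excluded span
        r"<!--\s*" + re.escape(end_text) + r"\s*-->",   # closing marker
        re.DOTALL,  # let the excluded span cross line breaks
    )
    return pattern.sub("", html)


page = """<!-- BEGIN NOINDEX -->
<p>This section won't be indexed.</p>
<!-- END NOINDEX -->
<p>This section will be indexed.</p>"""
print(strip_noindex(page).strip())  # <p>This section will be indexed.</p>
```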

Option 2: Using a Quick View device

This strategy involves creating a specific quick view device that detects the user agent used for crawling (see Creating a Quick View Device). The quick view page also lets you set meta tags that add content to item fields.

You can then use those fields for sorting or faceted search (see Adding Meta Information on the Quick View).

To get a quick view on items, make sure all items are published to the web database (see Constraints and Limitations).

Adding a Specific Quick View Layout

Adding a layout is a common Sitecore operation:

  1. Open the Sitecore Content Editor and select the /sitecore/Layout/Layouts item.

  2. You’re encouraged to use a project-specific folder for the new layout. To create one, right-click and select Insert > Layout Folder. Give the new folder a meaningful name.

  3. Right-click the folder and select Insert > Layout.

  4. Follow the wizard steps. The wizard creates a new file in the layouts folder of the website (by default, <SITECORE_INSTANCE_ROOT>\Website\layouts), which you can modify to suit your needs.

To make your layouts more manageable, you can define template-specific sublayouts. This way, you can reuse the same Quick View layout for many items.

Creating a Quick View Device

  1. Open the Content Editor and select the item /sitecore/Layout/Devices.

  2. Right-click and select Insert > Device.

  3. Give it a meaningful name, such as Quick View.

  4. Select the Content tab and find the Browser agent field in the Detection section.

  5. Set the value to Coveo Sitecore Search Provider.

You have now created an additional device that Coveo uses to fetch item quick views.
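The device only applies when the request comes from the crawler. The Python sketch below builds, without sending, a GET request whose User-Agent header is the Browser agent value set above. It then uses a case-insensitive substring test as a stand-in for Sitecore's device resolution.

The substring rule, the helper names, and the example URL are assumptions. Sitecore's actual matching logic may differ, and the exact user agent string the processor sends isn't stated here.

```python
import urllib.request

BROWSER_AGENT = "Coveo Sitecore Search Provider"  # value of the device's Browser agent field


def build_quick_view_request(url: str) -> urllib.request.Request:
    """Build (but don't send) a GET request that identifies itself like the crawler."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_AGENT}, method="GET")


def matches_quick_view_device(user_agent: str) -> bool:
    """Assumed rule: the device applies when the user agent contains the Browser agent text."""
    return BROWSER_AGENT.lower() in user_agent.lower()


req = build_quick_view_request("http://www.example.com/some-item")  # hypothetical URL
# urllib stores header names with .capitalize(), hence "User-agent".
print(matches_quick_view_device(req.get_header("User-agent")))  # True
```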

Setting the Item Preview Layout

  1. Open the Content Editor and select a template for which you want to enable the Preview feature.

  2. Expand the template and select its __Standard Values subitem.

  3. Click the Presentation tab at the top of the window, and then click Details.

  4. Next to the Quick View device, click the [No layout specified] link.

  5. Select the Quick View Layout from the tree.

  6. If you’re using any sublayout that’s specific to the template, follow these steps:

    1. Click Edit. A new window will appear.

    2. Click the Controls list item on the left.

    3. To add a sublayout, click Add and then select the desired sublayout from the tree.

    4. Type the placeholder name in the Add to Placeholder text box. In the samples, the default placeholder name is content.

Instances of the template now display content in Preview mode. To see an item in Preview mode, follow these steps:

  1. Open the Page Editor and navigate to the desired item.

  2. Click Preview to enter preview mode.

  3. The preview uses the Default device. Click Default and select the Quick View device.

Adding Meta Information on the Quick View

It’s also possible to set specific field values on items when using the quick view. For example, this lets you set field values on an item when rendering a subitem. To add metadata to an item quick view, output meta tags like this one:

<meta name="theFieldName" content="The Field Value">

The meta element doesn’t need to be in the page head; you can place it anywhere in the page. When the item is indexed, its theFieldName field is given the value The Field Value in the index. The meta name must match the Sitecore field name.
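The mapping from meta elements to field values can be sketched in Python with the standard html.parser module. The function collects every `<meta name content>` pair anywhere in the page. In line with Constraints and Limitations, it keeps only names that correspond to an existing Sitecore field.

The function name and the `known_fields` set are hypothetical. Whether Coveo matches names case-sensitively isn't stated here, so the sketch compares names exactly.

```python
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Records name/content pairs from every <meta> element, wherever it appears."""

    def __init__(self):
        super().__init__()
        self.pairs = {}

    # HTMLParser also routes self-closing <meta ... /> tags through handle_starttag by default.
    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name is not None and content is not None:
                self.pairs[name] = content


def meta_field_values(html: str, known_fields: set) -> dict:
    """Return meta values whose name matches an existing Sitecore field; others are ignored."""
    collector = _MetaCollector()
    collector.feed(html)
    collector.close()
    return {name: value for name, value in collector.pairs.items() if name in known_fields}


page = ('<body><p>Item</p><meta name="theFieldName" content="The Field Value">'
        '<meta name="unknown" content="x"></body>')
print(meta_field_values(page, {"theFieldName"}))  # {'theFieldName': 'The Field Value'}
```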

Constraints and Limitations

This method currently has the following limitations:

  • To get a quick view on an item, the following conditions must be met:

    • The item must have a layout so the item preview is displayed correctly in Sitecore.

    • The item preview must be available anonymously.

  • The quick view contains item data from the web database, regardless of the database the item belongs to. Results are therefore accurate for searches performed on the web database, while full-text searches on the master database match the item data that’s in the web database. This doesn’t affect item permissions.

  • If a Sitecore item is secured (can’t be accessed anonymously), Coveo won’t be able to get the quick view for that item.

  • When adding meta information on the quick view, a Sitecore field with the same name must exist. Otherwise, the meta element is ignored.

Configuring Form Authentication

Form authentication lets you index the HTML of Sitecore items that can only be accessed by authenticated users. To log in, form authentication uses the credentials entered in the Coveo.SearchProvider.Custom.config file during the installation of the Coveo for Sitecore package.

The password is stored encrypted in the Coveo.SearchProvider.Custom.config file, so don’t edit the credentials there directly. To change them, use Control Panel > Coveo Search > Configuration > Sitecore Credentials instead.

One use case for form authentication is the HtmlContentInBodyWithRequestsProcessor. This processor fetches the content of a page at indexing time with an HTTP GET request. When the page is secured, it attempts a form authentication with the POST method.
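The fetch-then-authenticate sequence can be sketched in Python with injected transport functions, so the control flow is visible without a live server. How the real processor decides that a page is secured isn't documented here. This sketch assumes a page is secured when the GET lands on the configured login page (for example, after a redirect). All names in it are hypothetical.

```python
from typing import Callable, Dict, Tuple

# A fetched page is modeled as (final_url, html). These callables stand in for real HTTP calls.
Get = Callable[[str], Tuple[str, str]]
Post = Callable[[str, Dict[str, str]], None]


def fetch_with_forms_auth(url: str, login_page: str, form: Dict[str, str],
                          http_get: Get, http_post: Post) -> str:
    """GET the page; if it lands on the login page, POST the login form once and retry."""
    final_url, html = http_get(url)
    if final_url != login_page:
        return html  # page is public, or the session is already authenticated
    http_post(login_page, form)  # form authentication with the POST method
    final_url, html = http_get(url)
    if final_url == login_page:
        raise PermissionError("form authentication failed for " + url)
    return html


# Usage with a fake in-memory "server".
LOGIN = "http://www.secured.com/sitecore/login"
session = {"logged_in": False}


def fake_get(url: str) -> Tuple[str, str]:
    if session["logged_in"]:
        return url, "<body>secret</body>"
    return LOGIN, "<body>login</body>"  # unauthenticated requests end up on the login page


def fake_post(url: str, form: Dict[str, str]) -> None:
    session["logged_in"] = form.get("Password") == "pw"


html = fetch_with_forms_auth("http://www.secured.com/page", LOGIN,
                             {"UserName": "user", "Password": "pw"}, fake_get, fake_post)
print(html)  # <body>secret</body>
```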

To configure your processor to allow form authentication, follow these steps:

  1. Open the Coveo.SearchProvider.Custom.config file. It’s usually located under <SITECORE_INSTANCE_ROOT>\Website\App_Config\Include\Coveo.

  2. In the defaultIndexConfiguration node, add the following nodes:

    <formsAuthConfiguration type="Coveo.Framework.Configuration.FormsAuthConfiguration, Coveo.Framework">
      <formsAuthLoginPage></formsAuthLoginPage>
      <formsAuthUserControl></formsAuthUserControl>
      <formsAuthPasswordControl></formsAuthPasswordControl>
      <formsAuthLoginCommand></formsAuthLoginCommand>
    </formsAuthConfiguration>
    
  3. Enter the following information in the nodes:

    • formsAuthLoginPage: the URL of your login page.

    • formsAuthUserControl: the id of the username text field.

    • formsAuthPasswordControl: the id of the password text field.

      You can obtain the id of these fields by inspecting the login form on the Sitecore instance authentication page with your browser’s developer tools.

    • formsAuthLoginCommand: the name and value attributes of the <input> element for the Sitecore login page button, using the following syntax:

        name=value
      

      Replace any spaces with the + symbol.

    For example, suppose you want to index content from your http://www.secured.com Sitecore website, whose authentication page is http://www.secured.com/sitecore/login.

    Inspecting this authentication page in your browser, you see the markup of the Log in button.

    (Figure: Sitecore Log In Button HTML)

    Given this markup, you would set the <formsAuthConfiguration> section element values as follows:

    <formsAuthConfiguration type="Coveo.Framework.Configuration.FormsAuthConfiguration, Coveo.Framework">
      <formsAuthLoginPage>http://www.secured.com/sitecore/login</formsAuthLoginPage>
      <formsAuthUserControl>UserName</formsAuthUserControl>
      <formsAuthPasswordControl>Password</formsAuthPasswordControl>
      <formsAuthLoginCommand>LogInBtn=Log+in</formsAuthLoginCommand>
    </formsAuthConfiguration>
    
  4. Save and close the file.

  5. Rebuild your indexes (see Coveo for Sitecore Indexing Guide).
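To see how the four formsAuthConfiguration values fit together, the Python sketch below reads the example section with xml.etree.ElementTree and builds a URL-encoded login form body. It makes three assumptions:

  • the username and password control ids are used as form field names;
  • the name=value pair from formsAuthLoginCommand is appended to the form;
  • the credentials shown are placeholders (the real processor reads them from the encrypted configuration).

```python
import urllib.parse
import xml.etree.ElementTree as ET

CONFIG = """<formsAuthConfiguration type="Coveo.Framework.Configuration.FormsAuthConfiguration, Coveo.Framework">
  <formsAuthLoginPage>http://www.secured.com/sitecore/login</formsAuthLoginPage>
  <formsAuthUserControl>UserName</formsAuthUserControl>
  <formsAuthPasswordControl>Password</formsAuthPasswordControl>
  <formsAuthLoginCommand>LogInBtn=Log+in</formsAuthLoginCommand>
</formsAuthConfiguration>"""


def build_login_post(config_xml: str, user: str, password: str):
    """Return (login URL, urlencoded form body) built from a formsAuthConfiguration section."""
    root = ET.fromstring(config_xml)

    def text(tag: str) -> str:
        return (root.findtext(tag) or "").strip()

    fields = [(text("formsAuthUserControl"), user),
              (text("formsAuthPasswordControl"), password)]
    # "LogInBtn=Log+in" is already form-encoded: parse_qsl turns "+" back into a space.
    fields += urllib.parse.parse_qsl(text("formsAuthLoginCommand"))
    # urlencode encodes spaces as "+", which is why the config uses "+" instead of spaces.
    return text("formsAuthLoginPage"), urllib.parse.urlencode(fields)


url, body = build_login_post(CONFIG, "indexer", "s3cret pw")  # placeholder credentials
print(url)   # http://www.secured.com/sitecore/login
print(body)  # UserName=indexer&Password=s3cret+pw&LogInBtn=Log+in
```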
