Smart Snippets Deployment Overview

To provide relevant snippets of content in a search results list, Coveo Machine Learning (Coveo ML) Smart Snippets models require the content they use to be formatted in a certain way. This article provides best practices for properly scoping, formatting, and testing the content you want the model to use.

Coveo ML Smart Snippets is currently not available for HIPAA organizations.

Step 1: Scope the Content

To optimize the output of a Coveo ML Smart Snippets model, we strongly recommend that you first identify the content that the model must use. This will help you better understand how to configure the model.

Item Language Value

Coveo Smart Snippets models return snippets of content only for items whose language field value is English.

HTML Content

Coveo Smart Snippets models only return snippets of content for items that contain content in HTML format. Therefore, you should ensure that the content that you want to use for creating the model contains HTML elements.

You can use the Content Browser page of the Coveo Administration Console to verify the items that would be available for the model.

For example, you can select the My Source source and the HTML File file type to find the items from the My Source source that the model can use as inputs.

[Image: Content Browser with field selections that target HTML items in a given source]

When checking the items that could be usable by the model in the Content Browser, the number of items that match your requirements may differ from the number of items you will see when inspecting the Item count section of your model’s model building statistics.

During the build process, the model can ignore some items because they either contain invalid HTML, or no snippets could be extracted from the parsed HTML.

PermanentId Field

The items from which you want the model to extract snippets must use the value of the permanentId field as their unique identifier.

You can use the Content Browser page of the Coveo Administration Console and check an item’s properties to verify whether an item uses the permanentId as its unique identifier.
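
The three requirements described so far (an English language value, HTML content, and a permanentId value) can be combined into a rough pre-check. The following Python sketch assumes that each item has been exported as a plain dictionary. The `language`, `permanentid`, and `body` keys and the `is_candidate_item` helper are hypothetical names chosen for illustration and aren't part of any Coveo API. The check only detects the presence of HTML tags, so an item that passes can still be ignored during the model build.

```python
from html.parser import HTMLParser


class _TagDetector(HTMLParser):
    """Records whether at least one HTML start tag was seen."""

    def __init__(self):
        super().__init__()
        self.saw_tag = False

    def handle_starttag(self, tag, attrs):
        self.saw_tag = True


def contains_html(text):
    """Return True when the text contains at least one HTML element."""
    detector = _TagDetector()
    detector.feed(text)
    detector.close()
    return detector.saw_tag


def is_candidate_item(item):
    """Rough pre-check: English language, a permanentId value, and HTML content."""
    has_english = item.get("language", "").lower() == "english"
    has_permanent_id = bool(item.get("permanentid"))
    has_html = contains_html(item.get("body", ""))
    return has_english and has_permanent_id and has_html


# Hypothetical item, for illustration only.
sample_item = {
    "language": "English",
    "permanentid": "abc123",
    "body": "<h2>Title</h2><p>Some content.</p>",
}
print(is_candidate_item(sample_item))
```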

Document Types

When you inspect the items that the model can use (i.e., items containing the required HTML tags in a given source), you may notice that many of these items are available, but not all of them are relevant for the model.

When configuring a Coveo ML Smart Snippets model, you can optionally target certain document types that must be used by the model. This allows you to further narrow the content that the model will use as an input.

EXAMPLE

The content you want the model to extract resides in a Salesforce source. This content is available in items whose documenttype has the Knowledge value. Therefore, you select Knowledge in the Document type is drop-down menu so that the model uses only items whose documenttype field value is Knowledge.

In the Content Browser page of the Coveo Administration Console, you can verify the items that would be available for the model by scoping the items that have the Knowledge value for the documenttype field. For example, you can use a field query to scope the documents as follows:

[Image: Content Browser with field selections that target Knowledge items in a given source]

Content Fields

When you inspect the items that the model can use (i.e., items of a specific document type containing the required HTML tags in a given source), you may notice that these items contain multiple fields that embed HTML content. The content of some of these fields may not be relevant and you may not want the model to use it.

When configuring a Coveo ML Smart Snippets model, you can optionally target certain fields to be used by the model. This allows you to further narrow the content that the model will use as an input. If you don’t mention specific fields, the model will use the value of the item body field by default.

EXAMPLE

The content you want the model to extract resides in a Salesforce source. The relevant information, formatted in HTML, is located in a custom field named sf_case_details_c. Therefore, you select the sf_case_details_c field in the Field(s) containing HTML content drop-down menu so that the model uses only the content that appears in the sf_case_details_c field when extracting the document.

In the Content Browser page of the Coveo Administration Console, you can verify the items that contain the sf_case_details_c field. For example, you can use a field query to scope items of the My Salesforce Source source whose documenttype value is Knowledge and that contain the sf_case_details_c field as follows:

[Image: Content Browser with field selections that target specific items in a given source]
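
As a text equivalent of the field selections described above, the following Python sketch assembles a field query expression for the Content Browser search box. It assumes the usual Coveo query syntax conventions: `@field=="value"` for an exact match, `@field` alone to match items that have a value for the field, and a space between expressions acting as an implicit AND. The helper names are hypothetical, and the resulting expression should be verified against your own index.

```python
def field_equals(field, value):
    """Build an exact-match expression such as @documenttype=="Knowledge"."""
    if '"' in value:
        raise ValueError("This sketch does not handle values containing double quotes.")
    return f'@{field}=="{value}"'


def field_exists(field):
    """Build an expression that matches items having a value for the field."""
    return f"@{field}"


def build_scope_query(source, document_type, content_field):
    """Combine the source, document type, and content field conditions."""
    # Expressions separated by a space are assumed to be combined with an implicit AND.
    return " ".join(
        [
            field_equals("source", source),
            field_equals("documenttype", document_type),
            field_exists(content_field),
        ]
    )


query = build_scope_query("My Salesforce Source", "Knowledge", "sf_case_details_c")
print(query)
```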

Step 2: Optimize the Content

Now that you’ve targeted the content that must be used by the model, you must ensure that this content is properly configured.

Coveo ML Smart Snippets models establish correlations between the headers appearing in result items and user queries. Therefore, your content must be configured accordingly.

For optimal results, we recommend that you use Google structured data in JSON-LD format in the <head> of the HTML items that must be used by the model to extract snippets.

The following code sample shows a simple HTML markup that contains JSON-LD formatted content within the <head>:

<html>
  <head>
    <title>Example Site - Frequently Asked Questions(FAQ)</title>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "FAQPage",
      "mainEntity":[{
        "@type": "Question",
        "name": "What is Smart Snippets?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "<p>The Coveo Smart Snippets feature provides users with answers to their queries directly on the results page by displaying a snippet of the most relevant result item. This allows users to quickly find answers without having to open links from the results page.</p>"
        }
      }]
    }
    </script>
  </head>
  <body>
  </body>
</html>
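
When a page contains many questions, writing the JSON-LD by hand is error-prone, since a single missing brace invalidates the whole block. The following Python sketch generates the same FAQPage structure from a list of question and answer pairs. The `build_faq_jsonld` and `wrap_in_script_tag` helpers are hypothetical names, and the sketch assumes that the answers are already valid HTML fragments.

```python
import json


def build_faq_jsonld(questions_and_answers):
    """Return a JSON-LD string describing a schema.org FAQPage."""
    main_entity = [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer_html},
        }
        for question, answer_html in questions_and_answers
    ]
    document = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": main_entity,
    }
    # json.dumps handles quoting and escaping, so the output is always valid JSON.
    return json.dumps(document, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap the JSON-LD string in the script element expected in the <head>."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


faq_jsonld = build_faq_jsonld(
    [("What is Smart Snippets?", "<p>An answer formatted in HTML.</p>")]
)
print(wrap_in_script_tag(faq_jsonld))
```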

However, Coveo ML Smart Snippets also works with raw HTML. When using this approach, note that the Coveo ML Smart Snippets feature doesn’t exclude navigation menus, featured content, or any other peripheral content in an item. To help the algorithm identify this type of content, you should specify the CSS selectors to exclude.

  • If your web page doesn’t contain Google structured data, and the questions contained in the web page aren’t formatted using HTML headers (<h> tags), you can use the pre-conversion IPE extension script to specify CSS selectors to identify questions and answers in an HTML item.

  • If the content you want the model to use resides in specific fields, and this content isn’t properly configured to be optimally used by the model (e.g., the item doesn’t contain JSON-LD or well-formatted HTML), you can use the post-conversion IPE extension script to specify fields whose content will be identified as questions and answers, and converted to JSON-LD format.

In a given item, the model takes the content that appears within the following tags into consideration when they’re attached to the last header (<h> tag) in a header stack:

  • <br>

  • <ol>

  • <ul>

  • <li>

  • <p>

  • <b>

  • <i>

  • <em>

  • <span>

EXAMPLE

Consider a page that is configured as follows:

<body>
<h1>FAQ</h1>
    <h2>Synchronizing Speedbit Watches</h2>
        <p>The procedure differs depending on the device with which you want to synchronize your watch.</p>
            <h3>Synchronizing a Speedbit Watch With a Smartphone</h3>
                <p>Procedure to synchronize your Speedbit watch with your smartphone.</p>
            <h3>Synchronizing a Speedbit Watch With a Computer</h3>
                <p>Procedure to synchronize your Speedbit watch with your computer.</p>
</body>

The model would process the page as follows:

[
  {"headers":  ["FAQ", "Synchronizing Speedbit Watches", "Synchronizing a Speedbit Watch With a Smartphone"], "excerpt": "<p>Procedure to synchronize your Speedbit watch with your smartphone.</p>"},
  {"headers":  ["FAQ", "Synchronizing Speedbit Watches", "Synchronizing a Speedbit Watch With a Computer"], "excerpt": "<p>Procedure to synchronize your Speedbit watch with your computer.</p>"}
]

  • Coveo ML Smart Snippets ignores the content that appears within the following tags:

    • <script>

    • <style>

    • <form>

    • <table>

    • <img>

    • <input>

  • By default, when a Coveo ML Smart Snippets model finds identical headers in both the JSON-LD and the HTML content, only those found in the JSON-LD are retained. This behavior can be changed by using the parsingMode advanced model parameter when updating the model configuration through the Update the information of a model operation of the Machine Learning API.
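
To make the header stack behavior in the Speedbit example more concrete, the following Python sketch reproduces that example's output. It is an illustration based on one reading of the example, not the model's actual implementation: it only handles `<p>` content, and it assumes that content attached to a header that has deeper headers beneath it is not returned. The `HeaderStackParser` class and the `extract_sections` function are hypothetical names.

```python
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderStackParser(HTMLParser):
    """Tracks the stack of open headers and the paragraphs attached to it."""

    def __init__(self):
        super().__init__()
        self.stack = []         # (level, text) of the headers currently in scope
        self.header_paths = []  # every header path seen, in document order
        self.sections = []      # (header path, excerpt) pairs
        self._header_level = None
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self._header_level = int(tag[1])
            self._buffer = []
        elif tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_data(self, data):
        if self._header_level is not None or self._in_paragraph:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer).strip()
        if tag in HEADER_TAGS and self._header_level is not None:
            # A new header closes every header of the same or a deeper level.
            while self.stack and self.stack[-1][0] >= self._header_level:
                self.stack.pop()
            self.stack.append((self._header_level, text))
            self.header_paths.append(tuple(t for _, t in self.stack))
            self._header_level = None
        elif tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            if self.stack:
                path = tuple(t for _, t in self.stack)
                self.sections.append((path, f"<p>{text}</p>"))


def extract_sections(html):
    """Return excerpts attached to the last header of each header stack."""
    parser = HeaderStackParser()
    parser.feed(html)
    parser.close()
    results = []
    for path, excerpt in parser.sections:
        # Skip excerpts whose header has deeper headers beneath it.
        has_child = any(
            len(other) > len(path) and other[: len(path)] == path
            for other in parser.header_paths
        )
        if not has_child:
            results.append({"headers": list(path), "excerpt": excerpt})
    return results


page = """
<body>
<h1>FAQ</h1>
<h2>Synchronizing Speedbit Watches</h2>
<p>The procedure differs depending on the device with which you want to synchronize your watch.</p>
<h3>Synchronizing a Speedbit Watch With a Smartphone</h3>
<p>Procedure to synchronize your Speedbit watch with your smartphone.</p>
<h3>Synchronizing a Speedbit Watch With a Computer</h3>
<p>Procedure to synchronize your Speedbit watch with your computer.</p>
</body>
"""

for section in extract_sections(page):
    print(section)
```

In this sketch, the paragraph under the Synchronizing Speedbit Watches header is dropped because two deeper headers follow it, which matches the processed output shown in the example.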

Step 3: Create the Model

Now that you’ve scoped the content that must be used by the model, and your content is properly formatted, you can create your Coveo ML Smart Snippets model and configure it as desired.

See Create a Smart Snippets Model for instructions on how to create a Coveo ML Smart Snippets model.

Review the Model Build Information

Now that your model is created and is Active, you can verify whether the model is able to provide snippets of content.

The Get detailed information about a specific model call of the Machine Learning Models API allows you to obtain detailed information about your Coveo ML Smart Snippets model, such as:

  • The number of items it can use to extract snippets of content.

  • The number of HTML headers it can target to find related content.

  • The average length of the snippets it extracted.

  • The total number of snippets that the model can provide.

When performing this API call, you should receive a response that contains a modelBuildingStats object as follows:

"modelBuildingStats": {
    "documentCount": 2688,
    "headerCount": 11239,
    "meanSnippetLength": 104.94903911094738,
    "snippetCount": 16287
}
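
These statistics can be turned into a quick sanity check. The following Python sketch parses a response body shaped like the sample above and derives a few ratios. It assumes the modelBuildingStats object sits at the top level of the response. The `summarize_build_stats` helper and the keys it returns are hypothetical, and the sketch doesn't perform the API call itself.

```python
import json

# Response body reduced to the object shown above, for illustration.
SAMPLE_RESPONSE = """
{
  "modelBuildingStats": {
    "documentCount": 2688,
    "headerCount": 11239,
    "meanSnippetLength": 104.94903911094738,
    "snippetCount": 16287
  }
}
"""


def summarize_build_stats(response_body):
    """Derive simple indicators from the modelBuildingStats object."""
    stats = json.loads(response_body)["modelBuildingStats"]
    documents = stats["documentCount"]
    headers = stats["headerCount"]
    snippets = stats["snippetCount"]
    return {
        # A model with no snippets can't return anything at query time.
        "can_provide_snippets": snippets > 0,
        "snippets_per_document": snippets / documents if documents else 0.0,
        "headers_per_document": headers / documents if documents else 0.0,
    }


summary = summarize_build_stats(SAMPLE_RESPONSE)
print(summary)
```

A documentCount that is much lower than the number of items found in the Content Browser suggests that some items were ignored during the build, for the reasons described in Step 1.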

Step 4: Associate the Model With the Desired Query Pipeline

Now that your model is created, you must associate the model with the query pipeline to which the traffic of the desired search interface is directed.

See Associate a Smart Snippets Model With a Query Pipeline for instructions on how to associate your model with a query pipeline.

Step 5: Configure the Search Interface

Now that your model is configured, and associated with the query pipeline to which the traffic of the desired search interface is directed, you must configure the search interface to include the components that will allow the model to render its output.

See Configure the Search Interface for instructions on how to set up a search interface for Coveo ML Smart Snippets.

Step 6: Test the Model

Now that your model is configured and your search interface is set up to display the model’s output, you can test the model to ensure that everything works as expected.

You can test the model on the search interface that contains the required components, and for which the traffic is directed to the query pipeline that you associated with your Coveo ML Smart Snippets model.

Perform a query that would likely trigger a snippet to appear in the search results. For example, if one of the items you scoped for the model to extract snippets from contains a header that reads When Exactly Is a Model Retrained?, you can perform the When is a model retrained query:

[Image: Example of a Smart Snippets model in action]

You can also inspect the request to the Search API to obtain detailed information about the model’s output for a given query:

  1. Access the search interface that contains the required components, and in which the traffic is directed to the query pipeline that you associated with your Coveo ML Smart Snippets model.

  2. Access your browser developer tools.

  3. In the search box, perform a query that would likely trigger a snippet to appear in the search results.

  4. In your browser developer tools, in the Network tab, under the Name column, select the latest request to the Search API. The request path should contain /rest/search/v2.

  5. Select the Preview tab. You should now see the query response body.

  6. In the query response body, you should see an expandable questionAnswer property. You can expand it to get detailed information about the model’s output for this specific query. The questionAnswer property doesn’t appear if the model can’t provide snippets for the current query.

    [Image: Search API response for a Smart Snippets model]
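
If you save the query response body from the browser developer tools, the same check can be scripted. The following Python sketch only tests for the presence of the questionAnswer property and treats its content as opaque. The `get_question_answer` helper is a hypothetical name, and the sample bodies are placeholders rather than real Search API output.

```python
import json


def get_question_answer(response_body):
    """Return the questionAnswer object, or None when no snippet was provided."""
    body = json.loads(response_body)
    # The property is absent when the model can't provide snippets for the query.
    return body.get("questionAnswer") or None


# Placeholder bodies, for illustration only.
with_snippet = '{"results": [], "questionAnswer": {"placeholder": "model output"}}'
without_snippet = '{"results": []}'

print(get_question_answer(with_snippet) is not None)
print(get_question_answer(without_snippet) is not None)
```
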
What's Next for Me?