Add structured data (JSON-LD) to HTML items from index fields


When using a Coveo Machine Learning (Coveo ML) Smart Snippet model to extract questions and answers from a web page, we recommend that you use Google structured data in JSON-LD format within the <head> of the web page HTML for optimal results.

In addition to JSON-LD, or in its absence, the model searches the headings (<h> tags) in HTML items and uses the content that appears under these headings to extract snippets. See Optimize the content for further information on how Coveo ML Smart Snippet models leverage HTML content to extract snippets.

However, when you use support-case content to build a Smart Snippet model (for example, content originating from a Salesforce or ServiceNow source), this content may not be structured in a way that the model can use optimally.

This article explains how to create an indexing pipeline extension (IPE) that identifies the index fields containing the questions and answers you want the model to use, converts this content to JSON-LD format, and adds it to the <head> of the HTML item.
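
For reference, the JSON-LD that the extension generates has roughly the following shape. This is a minimal sketch that relies only on Python's standard json module. The question and answer strings are invented placeholders, and build_faq_script is a hypothetical helper name, not part of the Coveo extension API.

```python
import json


def build_faq_script(question, answer_html):
    """Return a JSON-LD <script> element for a single question/answer pair."""
    faq = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer_html},
            }
        ],
    }
    return '<script type="application/ld+json">' + json.dumps(faq, indent=2) + "</script>"


# Placeholder values standing in for the content of two index fields.
print(build_faq_script("How do I request access?", "<ol><li>Submit a ticket</li></ol>"))
```

The question field becomes the name of the Question entity, and the answer field becomes the text of its acceptedAnswer.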


Basic recipe

The following code sample shows a post-conversion IPE script that you can use to specify the index fields containing the questions and answers you want the model to use:

from bs4 import BeautifulSoup
import json

def get_safe_meta(meta_data_name):
    # Return the last value of the metadata field, keeping ASCII characters only.
    meta_data_value = document.get_meta_data_value(meta_data_name)
    if meta_data_value:
        return ''.join(char for char in meta_data_value[-1] if ord(char) < 128)
    else:
        return ''

def create_question(name, text):
    # Build a schema.org Question entity with a single accepted answer.
    return {
        "@type": "Question",
        "name": name,
        "acceptedAnswer": {
            "@type": "Answer",
            "text": text
        }
    }


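def clean_answer(answer: str):
    # Convert plain-text whitespace into HTML equivalents so the snippet keeps its layout.
    answer = answer.replace('\t', '&nbsp;&nbsp;')
    answer = answer.replace('\n', '<br/>')
    # get_safe_meta already drops non-ASCII characters, so this only applies to unfiltered input.
    answer = answer.replace('\u00a0', '&nbsp;')
    return answer


def parse_answer(answer: str):
    # YOUR CUSTOM PARSING CODE HERE
    return answer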


QUESTION_FIELD = '<QUESTION_FIELD>'
ANSWER_FIELD = '<ANSWER_FIELD>'

body_html_stream = document.get_data_stream('body_html')
question = get_safe_meta(QUESTION_FIELD)
answer = clean_answer(get_safe_meta(ANSWER_FIELD))

questions = []
questions.append(create_question(question, parse_answer(answer)))

faq_schema = json.dumps({"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": questions})
faq_script = BeautifulSoup("""<script type="application/ld+json">""" + faq_schema + """</script>""", 'html.parser')

body_html_soup = BeautifulSoup(body_html_stream.read(), 'html.parser')
body_html_soup.head.append(faq_script)

output_stream = document.DataStream('body_html')
output_stream.write(str(body_html_soup))

document.add_data_stream(output_stream)
document.add_meta_data({"HasHTMLVersion": True})

In this script:

  • Replace <QUESTION_FIELD> with the field that contains the questions you want the model to use.

  • Replace <ANSWER_FIELD> with the field that contains the answers you want the model to use.

  • Optionally, add custom code below YOUR CUSTOM PARSING CODE HERE to adapt the IPE to your use case.

    Example

    The field that contains the answers you want the model to use may also contain information that you don’t want the snippet to display.

    When configuring your Smart Snippet model, you selected sfresolution as a field to be used by the model to extract content.

    The content of the sfresolution field is configured as follows in the source you selected when configuring the model:

    <!doctype html>
    <html>
    <head>
    <title>My Document Title</title>
    </head>
    <body>
    
      <h2>Objective</h2>
       <ul>
         <li>
           Request Access
         </li>
       </ul>
    
      <h2>Environment</h2>
       <ul>
         <li>
           V2
         </li>
       </ul>
    
      <h2>Procedure</h2>
       <ol>
         <li>
           Submit a ticket
         </li>
         <li>
           Fill out the request
         </li>
         <li>
           After the ticket is submitted, check your inbox for a confirmation.
         </li>
       </ol>
    
      <h2>Additional Information</h2>
       <p>
        Additional information can be found on our support website.
       </p>
    
    </body>
    </html>

    To keep the snippets relevant, you modify the IPE above by adding custom code that keeps only the elements in the Procedure section of the item’s HTML, and uses them as the answer in the JSON-LD that the IPE generates.
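
The Procedure-only scoping described in this example could be sketched as follows. This version of parse_answer uses Python's standard-library html.parser rather than BeautifulSoup so that it can be tried outside the extension environment. SectionExtractor, SECTION_HEADING, and the other helper names are hypothetical, and the sketch assumes that each section starts with a heading tag (<h1> to <h6>) whose text matches the target exactly.

```python
from html import escape
from html.parser import HTMLParser

SECTION_HEADING = "Procedure"  # assumption: heading text of the section to keep
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
WRAPPER_TAGS = {"html", "body"}  # closing tags to ignore if the section is last


class SectionExtractor(HTMLParser):
    """Collects the markup that follows one heading, up to the next heading."""

    def __init__(self, heading_text):
        super().__init__()
        self.heading_text = heading_text
        self.in_heading = False   # currently inside a heading tag
        self.heading_buffer = []  # text of the heading being read
        self.capturing = False    # currently inside the wanted section
        self.parts = []           # markup captured so far

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            # Any new heading ends the section that was being captured.
            self.in_heading = True
            self.heading_buffer = []
            self.capturing = False
        elif self.capturing:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as-is.
        if self.capturing:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self.in_heading = False
            heading = "".join(self.heading_buffer).strip()
            self.capturing = heading == self.heading_text
        elif self.capturing and tag not in WRAPPER_TAGS:
            self.parts.append("</" + tag + ">")

    def handle_data(self, data):
        if self.in_heading:
            self.heading_buffer.append(data)
        elif self.capturing:
            # The parser decodes entities, so re-escape &, <, and >.
            self.parts.append(escape(data, quote=False))


def parse_answer(answer: str) -> str:
    extractor = SectionExtractor(SECTION_HEADING)
    extractor.feed(answer)
    extractor.close()
    section = "".join(extractor.parts).strip()
    # Fall back to the full answer when the heading is not found.
    return section or answer
```

With content structured like the sfresolution sample above, this returns only the markup between the Procedure heading and the next heading (the <ol> list). When no matching heading is found, the original answer is returned unchanged so the extension still produces JSON-LD.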

Usage

This section provides instructions on how to create the post-conversion IPE script and assign it to the desired sources.

Step 1: Create the indexing pipeline extension (IPE) script

  1. On the Extensions (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console, click Add extension.

  2. On the Add an Extension page, in the Extension name input, enter a meaningful name for your extension.

  3. In the Description input, you can optionally add a description for your extension.

  4. In the Select additional item data that the extension needs to access section, select the Body HTML option.

  5. In the Select restricted parameters that the extension needs to access section, make sure the Vault parameters option is cleared.

  6. In the Extension script section, paste the IPE script and update the code to your needs.

  7. Assign the IPE script to your source (see Step 2).

Step 2: Assign the indexing pipeline extension (IPE) script to a source

  1. On the Sources (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console, click the source to which you want to apply the IPE, and then click More > Manage extensions in the Action bar.

  2. On the page that opens, click Add, and then select Extension.

  3. On the page that opens, in the Extensions section, select the IPE you created.

  4. In the Stage section, select Post-Conversion.

  5. In the Action on Error section, select Skip Extension.

  6. In the Apply to section, depending on whether your Coveo ML Smart Snippet model applies to specific item types:

    • If your Coveo ML Smart Snippet model doesn’t scope specific item types, select All items (common).

    • If your Coveo ML Smart Snippet model scopes specific item types, select Specific item types, and then specify the item types to which the IPE should apply.

  7. (Optional) In the Condition(s) to apply section, you can add a condition to the extension to scope the items on which the extension should apply (for example, %[documenttype] == "Solution").

  8. Click Apply extension.

  9. Click Save and rebuild source to apply the IPE to your source.

Important

To see the impact of the IPE in snippets extracted by a Coveo ML Smart Snippet model, you must update the model after the targeted sources have been rebuilt with the IPE.