Add structured data (JSON-LD) to HTML items from index fields
When using a Coveo Machine Learning (Coveo ML) Smart Snippet model to extract questions and answers from a web page, we recommend that you use Google structured data in JSON-LD format within the <head> of the web page HTML for optimal results.
In addition to JSON-LD, or in its absence, the model searches headers (<h> tags) in HTML items and uses the content that appears within these headers to extract snippets.
See Optimize the content for further information on how Coveo ML Smart Snippet models leverage HTML content to extract snippets.
However, when you build a Smart Snippet model from support-case content (for example, content originating from a Salesforce or ServiceNow source), this content may not be structured in a way that the model can use optimally.
This article explains how to create an indexing pipeline extension (IPE) that identifies the index fields containing the questions and answers you want the model to use, converts this content to JSON-LD format, and adds it to the <head> of the HTML item.
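To make the target format concrete, here is a minimal sketch of the kind of FAQPage JSON-LD block that ends up in the <head>. The function name build_faq_script and the sample question and answer are hypothetical and only serve as an illustration. The structure mirrors the one generated by the IPE script in this article.

```python
import json


def build_faq_script(question: str, answer: str) -> str:
    """Hypothetical helper: wrap one question/answer pair in a FAQPage JSON-LD script tag."""
    faq = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }
    return '<script type="application/ld+json">' + json.dumps(faq, indent=2) + "</script>"


# Sample values, made up for illustration only
print(build_faq_script("How do I request access?", "<ol><li>Submit a ticket</li></ol>"))
```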
Basic recipe
The following code sample shows the post-conversion IPE script that can be used to specify the index field containing the questions and answers you want the model to use:
from bs4 import BeautifulSoup
import json


def get_safe_meta(meta_data_name):
    # Return the last value of the metadata field, keeping ASCII characters only
    meta_data_value = document.get_meta_data_value(meta_data_name)
    if meta_data_value:
        return ''.join(char for char in meta_data_value[-1] if ord(char) < 128)
    else:
        return ''


def create_question(name, text):
    # Build one schema.org Question entry with its accepted answer
    return {
        "@type": "Question",
        "name": name,
        "acceptedAnswer": {
            "@type": "Answer",
            "text": text
        }
    }


def clean_answer(answer: str):
    # Normalize whitespace and turn line breaks into HTML line breaks
    answer = answer.replace('\t', ' ')
    answer = answer.replace('\n', '<br/>')
    answer = answer.replace('\u00a0', ' ')
    return answer


def parse_answer(answer: str):
    # YOUR CUSTOM PARSING CODE HERE
    return answer


QUESTION_FIELD = '<QUESTION_FIELD>'
ANSWER_FIELD = '<ANSWER_FIELD>'

body_html_stream = document.get_data_stream('body_html')
question = get_safe_meta(QUESTION_FIELD)
answer = clean_answer(get_safe_meta(ANSWER_FIELD))

questions = []
questions.append(create_question(question, parse_answer(answer)))

# Serialize the FAQPage schema and append it to the <head> of the item's HTML
faq_schema = json.dumps({"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": questions})
faq_script = BeautifulSoup("""<script type="application/ld+json">""" + faq_schema + """</script>""", 'html.parser')
body_html_soup = BeautifulSoup(body_html_stream.read(), 'html.parser')
body_html_soup.head.append(faq_script)

# Write the modified HTML back to the item
output_stream = document.DataStream('body_html')
output_stream.write(str(body_html_soup))
document.add_data_stream(output_stream)
document.add_meta_data({"HasHTMLVersion": True})
Where you replace:
- <QUESTION_FIELD> with the field that contains the questions you want the model to use.
- <ANSWER_FIELD> with the field that contains the answers you want the model to use.
Below the YOUR CUSTOM PARSING CODE HERE comment, you can optionally add custom code to adapt the IPE to your use case.
Example
The field that contains the answers you want the model to use may contain information that you don’t want the snippet to display.
When configuring your Smart Snippet model, you selected sfresolution as a field to be used by the model to extract content.
The content of the sfresolution field is configured as follows in the source you selected when configuring the model:
<!doctype html>
<html>
  <head>
    <title>My Document Title</title>
  </head>
  <body>
    <h2>Objective</h2>
    <ul>
      <li>Request Access</li>
    </ul>
    <h2>Environment</h2>
    <ul>
      <li>V2</li>
    </ul>
    <h2>Procedure</h2>
    <ol>
      <li>Submit a ticket</li>
      <li>Fill out the request</li>
      <li>After the ticket is submitted, check your inbox for a confirmation.</li>
    </ol>
    <h2>Additional Information</h2>
    <p>Additional information can be found on our support website.</p>
  </body>
</html>
To always provide relevant snippets, you modify the above IPE to include custom code that scopes only the elements appearing within the Procedure section of the item’s HTML, and uses this information as the answer section of the JSON-LD generated by the IPE.
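The Procedure scoping described in this example could be sketched as follows. This is one possible approach under stated assumptions, not the reference implementation. It uses only the Python standard library so that it can be tried outside the indexing pipeline. Inside the IPE, the BeautifulSoup import that the script already has would work just as well. The SectionExtractor class, the HEADING_TAGS constant, and the section_title parameter are hypothetical names introduced for illustration. The sketch also assumes that the section ends at the next heading of any level.

```python
from collections import Counter
from html import escape
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionExtractor(HTMLParser):
    """Hypothetical helper: collects the markup between one heading and the next."""

    def __init__(self, section_title):
        super().__init__(convert_charrefs=True)
        self.section_title = section_title.strip().lower()
        self.in_heading = False
        self.heading_text = []
        self.capturing = False
        self.finished = False
        self.open_tags = Counter()  # tags opened while capturing
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            if self.capturing:
                # The next heading closes the section we were capturing
                self.capturing = False
                self.finished = True
            self.in_heading = True
            self.heading_text = []
        elif self.capturing:
            self.open_tags[tag] += 1
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as-is
        if self.capturing:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self.in_heading = False
            title = "".join(self.heading_text).strip().lower()
            if not self.finished and title == self.section_title:
                self.capturing = True
        elif self.capturing and self.open_tags[tag] > 0:
            # Only close tags that were opened inside the section
            self.open_tags[tag] -= 1
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text.append(data)
        elif self.capturing:
            self.parts.append(escape(data, quote=False))


def parse_answer(answer: str, section_title: str = "Procedure") -> str:
    """Return only the markup under the given heading, or the full answer if not found."""
    extractor = SectionExtractor(section_title)
    extractor.feed(answer)
    extractor.close()
    section = "".join(extractor.parts).strip()
    return section or answer


sample = (
    "<html><body><h2>Objective</h2><ul><li>Request Access</li></ul>"
    "<h2>Procedure</h2><ol><li>Submit a ticket</li><li>Fill out the request</li></ol>"
    "<h2>Additional Information</h2><p>See our support website.</p></body></html>"
)
print(parse_answer(sample))
```

Falling back to the full answer when no matching heading is found keeps items that lack a Procedure section usable by the model. Whether that fallback is desirable depends on your content.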
Usage
This section provides instructions on how to create the post-conversion IPE script and assign it to the desired sources.
Step 1: Create the indexing pipeline extension (IPE) script
- On the Extensions (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console, click Add extension.
- On the Add an Extension page, in the Extension name input, enter a meaningful name for your extension.
- In the Extension input, you can optionally add a description for your extension.
- In the Select additional item data that the extension needs to access section, select the Body HTML option.
- In the Select restricted parameters that the extension needs to access section, make sure the Vault parameters option is cleared.
- In the Extension script section, paste the IPE script and update the code to your needs.
Step 2: Assign the indexing pipeline extension (IPE) script to a source
- On the Sources (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console, click the source to which you want to apply the IPE, and then click More > Add extensions in the Action bar.
- On the page that opens, click Add, and then select Extension.
- On the page that opens, in the Extensions section, select the IPE you created.
- In the Stage section, select Post-Conversion.
- In the Action on Error section, select Skip Extension.
- In the Apply to section, depending on whether your Coveo ML Smart Snippet model applies to specific item types:
  - If your Coveo ML Smart Snippet model doesn’t scope specific item types, select All items (common).
  - If your Coveo ML Smart Snippet model scopes specific item types, select Specific item types, and then specify the item types to which the IPE should apply.
- (Optional) In the Condition(s) to apply section, you can add a condition to the extension to scope the items on which the extension should apply (for example, %[documenttype] == "Solution").
- Click Apply extension.
- Click Save and rebuild source to apply the IPE to your source.
To see the impact of the IPE in snippets extracted by a Coveo ML Smart Snippet model, you must update the model after the targeted sources have been rebuilt with the IPE.
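Before updating the model, it can be useful to confirm that a rebuilt item actually carries the generated JSON-LD. The following is a minimal sketch, assuming you have the item's HTML available as a string (how you obtain it is up to you). The JsonLdFinder class and the find_faq_questions function are hypothetical names introduced for illustration and are not part of any Coveo API.

```python
import json
from html.parser import HTMLParser


class JsonLdFinder(HTMLParser):
    """Hypothetical helper: collects the raw content of JSON-LD script tags."""

    def __init__(self):
        super().__init__()
        self.in_json_ld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_json_ld = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "script" and self.in_json_ld:
            self.in_json_ld = False
            self.blocks.append("".join(self.buffer))

    def handle_data(self, data):
        # Script content can arrive in several chunks, so it is buffered
        if self.in_json_ld:
            self.buffer.append(data)


def find_faq_questions(html: str) -> list:
    """Return the question names found in FAQPage JSON-LD blocks of the given HTML."""
    finder = JsonLdFinder()
    finder.feed(html)
    finder.close()
    names = []
    for raw in finder.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if isinstance(data, dict) and data.get("@type") == "FAQPage":
            names.extend(q.get("name", "") for q in data.get("mainEntity", []))
    return names
```

An empty list for an item that should have been processed suggests that the extension was skipped for that item, for example because of the Skip Extension error action or a condition that did not match.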