Modifying Item Bodies
When you need to alter item bodies through an indexing pipeline extension (IPE), you should typically use a pre-conversion script to modify the documentdata stream.
While you could be tempted to use a post-conversion script to modify the body_html and body_text streams instead, doing so can lead to inconsistencies in search results.
This is mainly because item summaries and excerpts are extracted during the processing stage of the indexing pipeline and can no longer be altered thereafter.
Modifying the documentdata stream in a pre-conversion script is therefore the preferred approach.
If you’re creating your pre-conversion IPE through the Coveo Administration Console, ensure that you select the Original file additional item data.
If you’re using the Extension API directly, include the |
Note
YouTube items are an exception to the above recommendations.
Their bodies are mapped from the |
Basic Recipe
The following script shows a typical basic recipe for modifying item bodies through pre-conversion IPEs.
# 1. Get a read-only stream
original_data = document.get_data_stream('documentdata')
# 2. Read/parse the read-only stream data to a workable format
modified_data = original_data.read().decode()
# 3. Make all necessary data alterations
modified_data = modified_data.replace('foo', 'bar')
# 4. Get a modifiable stream
modified_stream = document.DataStream('documentdata')
# 5. Overwrite the modifiable stream data with the previously altered data
modified_stream.write(modified_data)
# 6. Add the modified stream to the item
document.add_data_stream(modified_stream)
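To try the recipe outside the indexing pipeline, the six steps can be wrapped in a plain function and run against a stand-in for the document object. The following is only a minimal sketch: FakeDataStream, FakeDocument, and replace_in_body are hypothetical names made up for local testing, and the stand-ins mimic only the three calls used in the recipe above (including the assumption that written text is stored as UTF-8 bytes), not the real extension runtime.

```python
class FakeDataStream:
    """Hypothetical stand-in for a data stream (local testing only)."""

    def __init__(self, name, data=b""):
        self.name = name
        self._data = data

    def read(self):
        return self._data

    def write(self, text):
        # Assumption: written text ends up stored as UTF-8 bytes
        self._data += text.encode("utf-8")


class FakeDocument:
    """Hypothetical stand-in exposing only the calls used in the recipe."""

    def __init__(self, body):
        self._streams = {"documentdata": FakeDataStream("documentdata", body)}

    def get_data_stream(self, name):
        return self._streams[name]

    def DataStream(self, name):
        return FakeDataStream(name)

    def add_data_stream(self, stream):
        # A stream added under an existing name replaces the previous one
        self._streams[stream.name] = stream


def replace_in_body(document, old, new):
    """Apply steps 1 to 6 of the recipe and return the altered body text."""
    original = document.get_data_stream("documentdata").read().decode()
    modified = original.replace(old, new)
    stream = document.DataStream("documentdata")
    stream.write(modified)
    document.add_data_stream(stream)
    return modified


doc = FakeDocument(b"<p>foo and foo</p>")
result = replace_in_body(doc, "foo", "bar")
```

Keeping the alteration logic in a function like this makes it possible to check the replacement on sample bodies before pasting the script into an extension, where the document object is supplied for you.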
Note
When you modify the content type of
by performing the step below:
where |
Example
This script provides a slightly more concrete example where HTML item bodies are modified through a pre-conversion IPE.
from bs4 import BeautifulSoup

# Parse the original item body
read_only_stream = document.get_data_stream('documentdata')
modified_data = BeautifulSoup(read_only_stream.read().decode(), 'html.parser')

# Remove a node (find() returns None when nothing matches)
node_to_remove = modified_data.find(id='my-node-to-remove')
if node_to_remove is not None:
    node_to_remove.decompose()

# Add a new node
parent_node = modified_data.find(id='my-parent-node')
if parent_node is not None:
    new_node = BeautifulSoup('<p>Hello world!</p>', 'html.parser')
    parent_node.append(new_node)

# Write the altered HTML back as the new documentdata stream
modified_stream = document.DataStream('documentdata')
modified_stream.write(str(modified_data))
document.add_data_stream(modified_stream)
Modifying YouTube Item Bodies
To modify YouTube item bodies, modify the coveo_description and coveo_videoid metadata fields through a pre-conversion IPE, as you would other fields (see Add Metadata).
The following is a sample pre-conversion script to remove the occurrences of certain strings from YouTube item bodies.
# Each metadata field is handled as a list of values, so clean every entry
old_description = document.get_meta_data_value("coveo_description")
new_description = [old.replace("Sentence to remove.", "") for old in old_description]
document.add_meta_data({ "coveo_description": new_description })

# Same approach for the video ID field
old_id = document.get_meta_data_value("coveo_videoid")
new_id = [old.replace("String to remove", "") for old in old_id]
document.add_meta_data({ "coveo_videoid": new_id })
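When several strings need to be removed, the list comprehension above can be generalized into a small helper. The sketch below is one possible way to do it; strip_phrases is a hypothetical name, not part of the extension API, and it assumes the metadata value is a list of strings as in the script above.

```python
def strip_phrases(values, phrases):
    """Return a copy of `values` with every phrase removed from each string."""
    cleaned = []
    for value in values:
        for phrase in phrases:
            value = value.replace(phrase, "")
        cleaned.append(value)
    return cleaned


# Inside a pre-conversion script, the helper would be used like this
# (same calls as in the script above):
#   old_description = document.get_meta_data_value("coveo_description")
#   document.add_meta_data({
#       "coveo_description": strip_phrases(old_description, ["Sentence to remove. "])
#   })
sample = strip_phrases(
    ["Intro. Sentence to remove. Outro.", "Nothing to change."],
    ["Sentence to remove. "],
)
```

Including the trailing space in the phrase avoids leaving a double space where the sentence used to be.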