---
title: Modifying item bodies
slug: '2749'
canonical_url: https://docs.coveo.com/en/2749/
collection: index-content
source_format: adoc
---

# Modifying item bodies

When altering [item](https://docs.coveo.com/en/210/) bodies through an [indexing pipeline extension (IPE)](https://docs.coveo.com/en/206/), you should typically use a pre-conversion script to modify the `documentdata` stream.

While you may be tempted to use a post-conversion script to modify the `body_html` and `body_text` streams instead, doing so can lead to inconsistencies in search results.
This is mainly because item summaries and excerpts are extracted during the processing stage of the [indexing pipeline](https://docs.coveo.com/en/184/) and can no longer be altered after that.
Modifying the `documentdata` stream in a pre-conversion script is therefore always preferable.

> **Important**
>
> If you're creating your pre-conversion IPE through the [Coveo Administration Console](https://docs.coveo.com/en/183/), ensure that you select the **Original file** additional item data.
> If you're using the Extension API directly, include the `DOCUMENT_DATA` string in the `requiredDataStreams` array property of your request payload.

> **Note**
>
> YouTube items are an exception to the above recommendations.
> Their bodies are mapped from the `coveo_description` and `coveo_videoid` metadata fields, [which you can modify through a pre-conversion IPE](#modifying-youtube-item-bodies).

## Basic recipe

The following script shows a typical basic recipe for modifying item bodies through pre-conversion IPEs.

```python
# 1. Get a read-only stream
original_data = document.get_data_stream('documentdata')

# 2. Read/parse the read-only stream data to a workable format
modified_data = original_data.read().decode()

# 3. Make all necessary data alterations
modified_data = modified_data.replace('foo', 'bar')

# 4. Get a modifiable stream
modified_stream = document.DataStream('documentdata')

# 5. Overwrite the modifiable stream data with the previously altered data
modified_stream.write(modified_data)

# 6. Add the modified stream to the item
document.add_data_stream(modified_stream)
```

> **Note**
>
> If you change the content type of `documentdata`, and not just its content, and the new content type is one of the following, you must also specify it:
>
> * `TYPE_HTML`: HTML document
>
> * `TYPE_DOCX`: Microsoft Word 2007 Document (Zipped XML)
>
> * `TYPE_PPTX`: Microsoft PowerPoint 2007 Document (Zipped XML)
>
> * `TYPE_XLSX`: Microsoft Excel 2007 Document (Zipped XML)
>
> * `TYPE_PDF`: PDF (Portable Document Format)
>
> * `TYPE_RTF`: Rich Text Format
>
> * `TYPE_TXT`: Text (ASCII)
>
> To do so, perform the step below:
>
> ```python
> # 7. Specify new content type
> document.add_meta_data({'detectedfileenum': ['<CONTENT_TYPE>']})
> ```
>
> where `<CONTENT_TYPE>` is the new content type.

## Example

The following script gives a more concrete example in which HTML item bodies are modified through a pre-conversion IPE.

```python
from bs4 import BeautifulSoup

# Parse the original item body as HTML
read_only_stream = document.get_data_stream('documentdata')
modified_data = BeautifulSoup(read_only_stream.read().decode(), 'html.parser')

# Remove a node (skipped if the node isn't in the item)
node_to_remove = modified_data.find(id='my-node-to-remove')
if node_to_remove is not None:
    node_to_remove.decompose()

# Add a new node (skipped if the parent node isn't in the item)
new_node = BeautifulSoup('<p>Hello world!</p>', 'html.parser')
parent_node = modified_data.find(id='my-parent-node')
if parent_node is not None:
    parent_node.append(new_node)

# Write the altered HTML to a modifiable stream and add it to the item
modified_stream = document.DataStream('documentdata')
modified_stream.write(str(modified_data))
document.add_data_stream(modified_stream)
```

## Modifying YouTube item bodies

To change YouTube item bodies, modify the `coveo_description` and `coveo_videoid` metadata fields through a pre-conversion IPE, as you would other fields (see [Add metadata](https://docs.coveo.com/en/34#add-meta-data-method)).

The following sample pre-conversion script removes occurrences of certain strings from YouTube item bodies.

```python
# Remove a sentence from every value of the description field
old_description = document.get_meta_data_value("coveo_description")
new_description = [old.replace("Sentence to remove.", "") for old in old_description]
document.add_meta_data({
    "coveo_description": new_description
})

# Remove a string from every value of the video ID field
old_id = document.get_meta_data_value("coveo_videoid")
new_id = [old.replace("String to remove", "") for old in old_id]
document.add_meta_data({
    "coveo_videoid": new_id
})
```
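
When several strings need to be removed, the per-field list comprehensions in the script above can be factored into a small helper. The following is a minimal sketch rather than part of the Coveo API: `remove_phrases` and the sample values are hypothetical names introduced for illustration, and the commented-out usage assumes, as the script above does, that `document.get_meta_data_value` returns a list of strings. The helper is plain Python, so you can try it outside the extension runtime.

```python
import re


def remove_phrases(values, phrases):
    """Return a copy of `values` with every phrase removed from each string."""
    cleaned = []
    for value in values:
        for phrase in phrases:
            value = value.replace(phrase, "")
        # Collapse the runs of spaces left behind by the removals
        cleaned.append(re.sub(r" {2,}", " ", value).strip())
    return cleaned


# Hypothetical sample data standing in for a metadata value
sample = ["Intro. Sentence to remove. Outro."]
print(remove_phrases(sample, ["Sentence to remove."]))

# Inside a pre-conversion IPE, where `document` is provided by the runtime:
# old_description = document.get_meta_data_value("coveo_description")
# document.add_meta_data({
#     "coveo_description": remove_phrases(old_description, ["Sentence to remove."])
# })
```

Keeping the string handling in a function with no dependency on `document` means the same logic can be reused for `coveo_videoid` or any other list-valued metadata field.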