---
title: "`document` object Python API reference"
slug: '34'
canonical_url: https://docs.coveo.com/en/34/
collection: index-content
source_format: adoc
---
# `document` object Python API reference
Creating an [indexing pipeline extension (IPE)](https://docs.coveo.com/en/206/) implies writing Python code that uses the `document` object to manipulate [item](https://docs.coveo.com/en/210/) properties (see [Creating an indexing pipeline extension with the API](https://docs.coveo.com/en/146/) and [Coveo indexing pipeline](https://docs.coveo.com/en/1893/)).

This article provides reference information describing the object methods and their parameters.
You may also want to read about the [versions of the Indexing Pipeline Extensions API](https://docs.coveo.com/en/156#indexing-pipeline-extensions-api-versions), especially if your organization has [dictionary fields](https://docs.coveo.com/en/2036/).

## `log` method

This method displays a message and its severity in the current [**Log Browser**](https://platform.cloud.coveo.com/admin/#/orgid/logs/browser/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/logs/browser/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/logs/browser/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/logs/browser/)) entry.
It's useful when debugging.

> **Leading practice**
>
> Always use the `log` method when you want to monitor something in an IPE.

**Syntax:**

```python
log(message, severity)
```

### `log` parameters

The following table shows the `log` method parameters:

[%header,cols="1,1,5"]
|===
|Parameter
|Type
|Description

|`message`
|**Required**: string
|The message that you want to log when applying an extension script.

|`severity`
|string
a|Optionally used to indicate the message severity type.

Default value is `Normal`.

The allowed severity values, which are case-insensitive, are:

* `Debug`

* `Detail`

* `Error`

* `Fatal`

* `Important`

* `Normal`

* `Notification`

* `Warning`
|===
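
The following is a minimal sketch of how a script could validate a severity string before passing it to `log`. The `normalize_severity` helper and the `ALLOWED_SEVERITIES` constant are hypothetical names, not part of the extension API, and falling back to `Normal` for unknown values is an assumption made for illustration.

```python
# Severity names copied from the table above; matching is case-insensitive.
ALLOWED_SEVERITIES = (
    "Debug", "Detail", "Error", "Fatal",
    "Important", "Normal", "Notification", "Warning",
)


def normalize_severity(value=None):
    """Return the canonical spelling of a severity, or 'Normal' as a fallback.

    Hypothetical helper, not part of the extension API.
    """
    if value is None:
        return "Normal"
    lookup = {name.lower(): name for name in ALLOWED_SEVERITIES}
    return lookup.get(str(value).strip().lower(), "Normal")


# In an extension script, `log` is provided by the runtime:
# log("Hello world!", normalize_severity("WARNING"))
```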

### `log` examples

```python
log("Hello world!", "Notification")
```

```python
fulltitle = document.get_meta_data_value('titleselection', 'crawler', True)

try:  # (1)
    # modifying fulltitle variable
    fulltitle = fulltitle[0]

    # logging a meaningful success message
    log('added metadata value to title: ' + fulltitle)

# catching all exceptions and logging them as a string for debugging purposes
except Exception as e:  # (2)
    log(str(e), 'Error')
```

This script example uses the `log` method in two different ways.

1. First, the `try` block modifies the [metadata](https://docs.coveo.com/en/218/) and logs a success message only when the script runs without raising an error.
In this particular case, the second argument is omitted, so the default value `Normal` defines the log message severity.

2. When the `try` block fails, the `except` block catches the exception and sends a log containing the error message:

![Coveo Log Browser entry regarding error when applying an extension](https://docs.coveo.com/en/assets/images/index-content/logs-applying-extension.png)

> **Note**
>
> Applying an extension populates the `documentLogEntries.meta.logs` field that contains all log messages and severity type strings.
> This field length is limited to approximately 4K characters, after which the content is truncated.
> When the combined length of several log messages exceeds the limit, you can still view all the messages that fit within the limit, but the message that crosses the limit is replaced with a `truncated...` mention and the following messages are ignored.
> 
> For example, when a single very long string exceeds the 4K limit, even if it's the only log message of your extension, the whole string is replaced with the `truncated...` mention.
> The log message generated by an extension script can be seen in the `documentLogEntries.meta.logs` subsection of the JSON response as well as in the **Log Browser**.
> 
> ```json
> {
>   "documentLogEntries": [
>     {
>       "id": "http://www.example.com/",
>       "organizationId": "myorganization",
>       "sourceId": "qqotfbbttohttrnva4ebwykbe4-myorganization",
>       "resourceId": "myorganization-tb5qadfyqqv2mrdtn2gde5kcpi",
>       "task": "EXTENSION",
>       "operation": "ADD",
>       "result": "COMPLETED",
>       "datetime": "2017-08-17T13:01:36.852Z",
>       "requestId": "976520a8-f569-45d9-b252-48e6aea544d5",
>       "meta": {
>         "duration": "0.0559999",
>         "logs": "truncated..."
>       }
>     }
>   ]
> }
> ```
> 
> ![Coveo Log Browser interface showing a log entry with truncated](https://docs.coveo.com/en/assets/images/index-content/logs-truncated.png)
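
Given the length limit described in the preceding note, one possible approach is to shorten long messages before logging them. The following sketch assumes a hypothetical `clip_message` helper. The default of 1000 characters and the `[clipped]` marker are arbitrary choices for illustration, not values defined by the platform.

```python
def clip_message(message, max_chars=1000, marker="[clipped]"):
    """Shorten `message` so that it's at most `max_chars` characters long.

    Hypothetical helper; `max_chars` and `marker` are illustrative defaults.
    """
    if max_chars < len(marker):
        raise ValueError("max_chars must be at least len(marker)")
    text = str(message)
    if len(text) <= max_chars:
        return text
    # Keep the beginning of the message and make the cut visible.
    return text[: max_chars - len(marker)] + marker


# In an extension script, `log` is provided by the runtime:
# log(clip_message(some_long_string), 'Debug')
```

Clipping each message yourself keeps the beginning of the message readable, whereas a message that crosses the field limit is replaced entirely.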

## `get_uri` method

You use this method to get the [item](https://docs.coveo.com/en/210/) URI.

**Syntax:**

```python
document.uri
```

You can easily output an item's `uri` in the **Log Browser** by adding these lines to your extension:

```python
my_variable = document.uri
log(my_variable)
```

## `get_meta_data` method

You use this method to get all item [metadata](https://docs.coveo.com/en/218/).
It returns a list of `MetaDataValue` objects (see [`document` object JSON schema](#document-object-json-schema)).

**Syntax:**

```python
document.get_meta_data()
```

Because unmapped metadata isn't indexed, using this method makes all metadata available before the final indexing step.
The following extension script makes it possible to consult a list of all custom metadata:

```python
import json
document.add_meta_data({'allmetadatavalues': json.dumps(document.get_meta_data())})
```

> **Important**
>
> You must map the `allmetadatavalues` metadata to a [field](https://docs.coveo.com/en/200/) so that you don't lose the populated values while indexing the item.

## `get_meta_data_value` method

You use this method to get a metadata value for a given metadata name and origin.

> **Note**
>
> This method returns a **list** of values.
> If there's only one metadata value, this list will contain a single element.

**Syntax:**

```python
document.get_meta_data_value(name, origin, reverse)
```

### `get_meta_data_value` parameters

The following table shows the `get_meta_data_value` method parameters:

[%header,cols="1,1,5"]
|===
|Parameter
|Type
|Description

|`name`
|**Required**: string
|The name of the metadata to retrieve.

|`origin`
|string
a|The unique identifier of the [Coveo indexing pipeline](https://docs.coveo.com/en/184/) step from which to retrieve a metadata value.

The allowed `origin` values are as follows:

* `crawler`: The metadata value set during the [crawling](https://docs.coveo.com/en/2684#crawling) stage

* Pre-conversion script name - The metadata value set during a specific pre-conversion script

* `converter`: The metadata value set during the [processing](https://docs.coveo.com/en/2684#processing) stage

* `mapping`: The metadata value set during the [mapping](https://docs.coveo.com/en/2684#mapping) stage

* Post-conversion script name - The metadata value set during a specific post-conversion script

> **Note**
>
> If no value is supplied for this parameter, the most recent origin is used.
> For example, the origin would be `crawler` in pre-conversion and `mapping` in post-conversion.

|`reverse`
|Boolean
|Whether to scan the metadata origin in reverse order or not.
The default value is `True`, meaning that the value is fetched from the latest indexing pipeline stage with a non-empty value.
|===

### `get_meta_data_value` example

```python
# Get original title from the crawling module in a log message
original_title = document.get_meta_data_value('title', 'crawler')  # Remember, this method returns a list
log(original_title[0], 'Normal')
```
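
Because the method returns a list, indexing it directly with `[0]` raises an `IndexError` when the list is empty. A minimal sketch of a defensive accessor follows. The `first_value` helper is a hypothetical name, not part of the extension API.

```python
def first_value(values, default=None):
    """Return the first element of a metadata value list, or `default` if there's none."""
    if not values:
        return default
    return values[0]


# In an extension script, `document` and `log` are provided by the runtime:
# original_title = first_value(document.get_meta_data_value('title', 'crawler'), 'Untitled')
# log(original_title, 'Normal')
```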

## `add_meta_data` method

You use this method to add an item metadata key and its associated value.
You can also use it to unset or override item metadata.

> **Important**
>
> For all sources except push sources, if you add metadata before the [mapping](https://docs.coveo.com/en/217/) stage, you must map the metadata to a field for it to be indexed.
> 
> For example, if you add metadata in a post-conversion extension script, the metadata is only indexed when the index contains a field whose name matches the metadata key.

**Syntax:**

```python
document.add_meta_data({metadataKey: [metadataValue]})
```

Use an array to specify values for a multi-value field:

```python
document.add_meta_data({"language": ["en", "fr"]})
```

Because unmapped metadata isn't indexed, using this method makes all metadata available before the final indexing step.
The following extension script makes it possible to consult a list of all custom metadata:

```python
import json
document.add_meta_data({'allmetadatavalues': json.dumps(document.get_meta_data())})
```

> **Important**
>
> You must map the `allmetadatavalues` metadata to a field so that you don't lose the populated values while indexing the item.

### `add_meta_data` example

```python
# Unsetting the author metadata value
document.add_meta_data({'Author': []})
```
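
Since the syntax expects a dictionary whose values are lists, a small wrapper can help keep calls consistent. The following sketch assumes a hypothetical `to_metadata` helper. Mapping `None` to an empty list mirrors the unsetting example above, but it's a convention of this sketch rather than documented behavior.

```python
def to_metadata(key, value):
    """Build a one-entry dict in the {key: [values]} shape shown in the syntax above."""
    if value is None:
        values = []  # an empty list unsets the metadata, as in the example above
    elif isinstance(value, (list, tuple)):
        values = list(value)
    else:
        values = [value]
    return {key: values}


# In an extension script, `document` is provided by the runtime:
# document.add_meta_data(to_metadata('language', ['en', 'fr']))
```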

## `get_permissions` method

You use this method to get all item [permissions](https://docs.coveo.com/en/223/).
It returns a list of `PermissionLevel` objects:

```json
{
  "PermissionSets": [
    {
      "AllowAnonymous": false,
      "DeniedPermissions": [],
      "Name": "",
      "AllowedPermissions": []
    }
  ],
  "Name": ""
},
{
  "PermissionSets": [
    {
      "AllowAnonymous": false,
      "DeniedPermissions": [],
      "Name": "View All Data Members",
      "AllowedPermissions": [
        {
          "SecurityProvider": "SALESFORCE-00Df40000000SAbEAM",
          "IdentityType": "virtualgroup",
          "Identity": "ViewAll:Irrelevant:",
          "AdditionalInfo": {}
        },
        {
          "SecurityProvider": "SALESFORCE-00Df40000000SAbEAM",
          "IdentityType": "virtualgroup",
          "Identity": "ObjectAccess:ViewAllRecordsProfiles:Solution",
          "AdditionalInfo": {}
        },
        {
          "SecurityProvider": "SALESFORCE-00Df40000000SAbEAM",
          "IdentityType": "virtualgroup",
          "Identity": "ObjectAccess:ViewAllRecordsPermissionSets:Solution",
          "AdditionalInfo": {}
        }
      ]
    }
  ],
  "Name": "View All Data"
},
{
  "PermissionSets": [
    {
      "AllowAnonymous": false,
      "DeniedPermissions": [],
      "Name":"Read access members",
      "AllowedPermissions": [
        {
          "SecurityProvider": "SALESFORCE-00Df40000000SAbEAM",
          "IdentityType": "virtualgroup",
          "Identity": "ObjectAccess:ReadRecordsProfiles:Solution",
          "AdditionalInfo": {}
        },
        {
          "SecurityProvider": "SALESFORCE-00Df40000000SAbEAM",
          "IdentityType": "virtualgroup",
          "Identity": "ObjectAccess:ReadRecordsPermissionSets:Solution",
          "AdditionalInfo": {}
        }
      ]
    }
  ],
  "Name": "Read Access & Sharing"
}
```

**Syntax:**

```python
document.get_permissions()
```

### `get_permissions` example

```python
# Get item permissions in a log message
import json
my_permissions = json.dumps(document.get_permissions())
log(str(my_permissions))
```
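
To inspect a specific part of the permission model rather than logging everything, you can walk the structure. The following sketch assumes that the permissions have already been converted to plain dictionaries shaped like the JSON shown earlier in this section (for example, by parsing the string that `json.dumps` produces). The `allowed_identities` helper and the `SAMPLE` data are illustrative only.

```python
import json


def allowed_identities(levels):
    """Collect (level name, identity) pairs for every allowed permission."""
    pairs = []
    for level in levels:
        for permission_set in level.get("PermissionSets", []):
            for permission in permission_set.get("AllowedPermissions", []):
                pairs.append((level.get("Name", ""), permission["Identity"]))
    return pairs


# Trimmed-down sample in the same shape as the JSON shown earlier.
SAMPLE = json.loads("""
[
  {
    "PermissionSets": [
      {
        "AllowAnonymous": false,
        "DeniedPermissions": [],
        "Name": "Read access members",
        "AllowedPermissions": [
          {
            "SecurityProvider": "SALESFORCE-00Df40000000SAbEAM",
            "IdentityType": "virtualgroup",
            "Identity": "ObjectAccess:ReadRecordsProfiles:Solution",
            "AdditionalInfo": {}
          }
        ]
      }
    ],
    "Name": "Read Access & Sharing"
  }
]
""")

print(allowed_identities(SAMPLE))
```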

## `clear_permissions` method

You use this method to clear all item permissions.

> **Important**
>
> Be careful when using the `clear_permissions` method.
> It could allow any user to access potentially sensitive information from originally secured items.

**Syntax:**

```python
document.clear_permissions()
```

## `add_allowed` method

You use this method to add an allowed [security identity](https://docs.coveo.com/en/240/).

**Syntax:**

```python
document.add_allowed(identity, identity_type, security_provider, {additional_info})
```

### `add_allowed` parameters

The following table shows the `add_allowed` method parameters:

[%header,cols="1,1,5"]
|===
|Parameter
|Type
|Description

|`identity`
|**Required**: string
|The allowed security identity name to add.

|`identity_type`
|**Required**: string
a|Allowed values are:

* `user`

An individual [user](https://docs.coveo.com/en/250/).

* `group`

A [group](https://docs.coveo.com/en/202/), which can have users or other groups/virtual groups as members.

* `virtualgroup`

A [virtual group](https://docs.coveo.com/en/252/), which is a group that doesn't exist in the indexed secured enterprise system.

* `unknown`

An entity that doesn't fit any of the aforementioned types.

|`security_provider`
|**Required**: string
a|The name of the security identity provider.

Sample value: `'Email Security Provider'`

|`additional_info`
|dictionary of string
|A collection of key-value pairs that can be used to uniquely identify the security identity.
|===

### `add_allowed` example

```python
# Allowing access to all users logging in with a Coveo account
document.add_allowed('*@coveo.com', 'user', 'Email Security Provider', {})
```

## `add_denied` method

You use this method to add a denied security identity.

**Syntax:**

```python
document.add_denied(identity, identity_type, security_provider, {additional_info})
```

### `add_denied` parameters

The following table shows the `add_denied` method parameters:

[%header,cols="1,1,5"]
|===
|Parameter
|Type
|Description

|`identity`
|**Required**: string
|The denied security identity name to add.

|`identity_type`
|**Required**: string
a|Allowed values are:

* `user`

An individual [user](https://docs.coveo.com/en/250/).

* `group`

A [group](https://docs.coveo.com/en/202/), which can have users or other groups/virtual groups as members.

* `virtualgroup`

A [virtual group](https://docs.coveo.com/en/252/), which is a group that doesn't exist in the indexed secured enterprise system.

* `unknown`

An entity that doesn't fit any of the aforementioned types.

|`security_provider`
|**Required**: string
a|The name of the security identity provider.

Sample value: `'Email Security Provider'`

|`additional_info`
|dictionary of string
|A collection of key-value pairs that can be used to uniquely identify the security identity.
|===

### `add_denied` example

```python
# Denying access to all users logging in with a Hotmail account
document.add_denied('*@hotmail.com', 'user', 'Email Security Provider', {})
```
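
When several identities must be allowed or denied, the calls can be driven from a list of rules. The following is a sketch only: `apply_rules`, the `"allow"`/`"deny"` action names, and the `FakeDocument` stand-in are hypothetical and exist so that the example runs outside an extension. In an extension script, you would pass the runtime `document` object instead.

```python
IDENTITY_TYPES = ("user", "group", "virtualgroup", "unknown")


def apply_rules(doc, rules, security_provider):
    """Apply (action, identity, identity_type) rules to `doc`.

    `action` is either "allow" or "deny" (names chosen for this sketch).
    """
    for action, identity, identity_type in rules:
        if identity_type not in IDENTITY_TYPES:
            raise ValueError("unsupported identity type: %r" % (identity_type,))
        if action == "allow":
            doc.add_allowed(identity, identity_type, security_provider, {})
        elif action == "deny":
            doc.add_denied(identity, identity_type, security_provider, {})
        else:
            raise ValueError("unknown action: %r" % (action,))


class FakeDocument(object):
    """Stand-in that records calls so the sketch runs outside an extension."""

    def __init__(self):
        self.calls = []

    def add_allowed(self, identity, identity_type, security_provider, additional_info):
        self.calls.append(("allowed", identity, identity_type, security_provider))

    def add_denied(self, identity, identity_type, security_provider, additional_info):
        self.calls.append(("denied", identity, identity_type, security_provider))


RULES = [
    ("allow", "*@coveo.com", "user"),
    ("deny", "*@hotmail.com", "user"),
]
fake = FakeDocument()
apply_rules(fake, RULES, "Email Security Provider")
```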

## `set_permissions` method

You use this method to set item permissions.
To set permissions, define at least one [permission level](https://docs.coveo.com/en/224/), one [permission set](https://docs.coveo.com/en/226/), and one permission.

**Syntax:**

```python
document.set_permissions([PermissionLevel])
```

### `PermissionLevel` parameters

The following table shows the [permission level](https://docs.coveo.com/en/224/) parameters:

[%header,cols="1,1,1"]
|===
|Parameter
|Type
|Description

|`level_name`
|String
|The name of the permission level.

|`permission_sets`
|Array of PermissionSet
|Array of permission sets
|===

### `PermissionSet` parameters

The following table shows the [permission set](https://docs.coveo.com/en/226/) parameters:

[%header,cols="1,1,1"]
|===
|Parameter
|Type
|Description

|`set_name`
|**Required**: String
|The name of the permission set.

|`allow_anonymous`
|**Required**: Boolean
|Whether to allow anonymous access.

|`allowed_permissions`
|Array of Permission
|Array of allowed permissions

|`denied_permissions`
|Array of Permission
|Array of denied permissions
|===

### `Permission` parameters

The following table shows the permission parameters:

[%header,cols="1,1,5"]
|===
|Parameter
|Type
|Description

|`identity`
|**Required**: string
a|The name of the security identity.

Sample value: `'*@coveo.com'` to allow access to all users logging in with Coveo email.

|`identity_type`
|**Required**: string
a|Allowed values are:

* `user`

An individual [user](https://docs.coveo.com/en/250/).

* `group`

A [group](https://docs.coveo.com/en/202/), which can have users or other groups/virtual groups as members.

* `virtualgroup`

A [virtual group](https://docs.coveo.com/en/252/), which is a group that doesn't exist in the indexed secured enterprise system.

* `unknown`

An entity that doesn't fit any of the aforementioned types.

|`security_provider`
|**Required**: string
a|The name of the security identity provider.

Sample value: `'Email Security Provider'`

|`additional_info`
|dictionary of string
|A collection of key-value pairs that can be used to uniquely identify the security identity.
|===

### `set_permissions` example

The complexity of the permission model can range from allowing full anonymous access to requiring the resolution of permissions for several permission levels, each containing one or more permission sets.

```python
import json

# defining security levels
# top_level allows ceo@coveo.com and denies Accountants
top_level = document.PermissionLevel('CEO', [document.PermissionSet('TopSet', False,
    [document.Permission('ceo@coveo.com', 'user', 'Email Security Provider')],
    [document.Permission('Accountants', 'group', 'Email Security Provider')])])

# lower_level allows myGroup1 and denies myGroup2 and myGroup3
lower_level = document.PermissionLevel('Employees', [document.PermissionSet('LowerSet', False,
    [document.Permission('myGroup1', 'group', 'Email Security Provider')],
    [document.Permission('myGroup2', 'group', 'Email Security Provider'),
    document.Permission('myGroup3', 'group', 'Email Security Provider')])])

# Set item permission levels
document.set_permissions([top_level, lower_level])

# Get item permissions in a log message
my_permissions = json.dumps(document.get_permissions())
log(str(my_permissions))
```
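
For larger permission models, the nested constructor calls can become hard to read. The following sketch builds the same two levels from a plain data description. The `build_levels` helper, the layout of `SPEC`, and the `FakeDocument` stand-in are assumptions made for illustration. Only the `PermissionLevel`, `PermissionSet`, and `Permission` constructor calls mirror the example above.

```python
from collections import namedtuple


def build_levels(doc, spec, security_provider):
    """Turn a nested list description into PermissionLevel objects.

    `spec` is a list of (level_name, sets) tuples, where `sets` is a list of
    (set_name, allowed, denied) tuples and `allowed`/`denied` are lists of
    (identity, identity_type) pairs. This layout is an assumption of the sketch.
    """
    levels = []
    for level_name, sets in spec:
        permission_sets = []
        for set_name, allowed, denied in sets:
            permission_sets.append(doc.PermissionSet(
                set_name,
                False,  # allow_anonymous
                [doc.Permission(i, t, security_provider) for i, t in allowed],
                [doc.Permission(i, t, security_provider) for i, t in denied],
            ))
        levels.append(doc.PermissionLevel(level_name, permission_sets))
    return levels


class FakeDocument(object):
    """Stand-in exposing the three constructors used in the example above."""
    Permission = namedtuple("Permission", "identity identity_type security_provider")
    PermissionSet = namedtuple(
        "PermissionSet", "set_name allow_anonymous allowed_permissions denied_permissions")
    PermissionLevel = namedtuple("PermissionLevel", "level_name permission_sets")


SPEC = [
    ("CEO", [("TopSet", [("ceo@coveo.com", "user")], [("Accountants", "group")])]),
    ("Employees", [("LowerSet", [("myGroup1", "group")],
                    [("myGroup2", "group"), ("myGroup3", "group")])]),
]
levels = build_levels(FakeDocument(), SPEC, "Email Security Provider")

# In an extension script, `document` is provided by the runtime:
# document.set_permissions(build_levels(document, SPEC, 'Email Security Provider'))
```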

## `get_data_streams` method

You can use this method to get access to item [data streams](https://docs.coveo.com/en/2891/) when you must read or modify these streams.

This method returns a list of `ReadOnlyDataStream` objects.
Each of these is a `BytesIO` value, which is a stream of in-memory bytes (see [Python Buffered Streams](https://docs.python.org/3/library/io.html#buffered-streams)).

```txt
[
    <extension_runner.ApiV1.ReadOnlyDataStream object at 0x7f88fc0665d0>,
    <extension_runner.ApiV1.ReadOnlyDataStream object at 0x7f88fc0662d0>,
    <extension_runner.ApiV1.ReadOnlyDataStream object at 0x7f88fc066590>
]
```

**Syntax:**

```python
document.get_data_streams()
```

### `get_data_streams` example

1. In the **Edit an extension** window, select at least one of the item data checkboxes so that the `get_data_streams()` method returns something.

![Edit extension panel with data access options](https://docs.coveo.com/en/assets/images/index-content/extension-edit-panel.png)

> **Note**
>
> You must specify that your extension requires access to an item's binary data for this data to be downloaded and passed along to the extension runner.
> 
> To optimize indexing performance, you should only access a data stream when necessary.

2. Use the `get_data_streams()` method in your Python extension script.

```python
body = document.get_data_streams()
# body is now a list of `ReadOnlyDataStream` objects which are accessible data streams
# body[1] is the `ReadOnlyDataStream` object corresponding to `body_html`
log(body[1].read())
```

The preceding code has visible effects in the Log Browser:

![Coveo Log Browser entry showing that the extension has been applied](https://docs.coveo.com/en/assets/images/index-content/get-data-streams-example.png)

## `get_data_stream` method

You use this method to get a data stream for a given name and origin.

This method returns a single `ReadOnlyDataStream` object.
This is a `BytesIO` value, which is a stream of in-memory bytes (see [Python Buffered Streams](https://docs.python.org/3/library/io.html#buffered-streams)).

> **Note**
>
> For Web and Sitemap type sources, use the web scraping feature rather than extensions to do common HTML content processing such as excluding sections and extracting metadata (see [Web scraping configuration](https://docs.coveo.com/en/mc1f3573/)).

**Syntax:**

```python
document.get_data_stream(name, origin, reverse)
```

### `get_data_stream` parameters

The following table shows the `get_data_stream` method parameters:

[%header,cols="1,1,5"]
|===
|Parameter
|Type
|Description

|`name`
|**Required**: string
a|The available item data streams are:

* `documentdata`

The complete item binary content extracted by the Crawling stage of the indexing pipeline (see [Coveo indexing pipeline](https://docs.coveo.com/en/1893/)).

**Example:**

The `documentdata` of a PDF file is the actual (binary) PDF file.

The `documentdata` of a web page is the page (binary) HTML markup.

You may want to retrieve an item `documentdata` in a pre-conversion extension to modify the original item content.

**Example:**

You want to extract the text content from scanned items that are saved as image files.
You use a pre-conversion extension to send each image `documentdata` to a third party optical character recognition (OCR) service.
You save the returned text back in the `documentdata` so that the Processing stage can prepare the text content for the Indexing stage.

Getting the `documentdata` can significantly degrade indexing performance because each item's binary data has to be fetched, decompressed, and decrypted.

There's generally no point in getting and modifying the `documentdata` in a post-conversion extension because the Indexing stage doesn't process it.

> **Note**
>
> In the Coveo Administration Console **Add/Edit an Extension** panel, the `documentdata` is referred to as the **Original file**.

* `body_text`

The complete textual content of an item extracted by the converter in the Processing stage of the indexing pipeline (see [Coveo indexing pipeline](https://docs.coveo.com/en/1893/)).

You can get the `body_text` of each item in a post-conversion extension for rare cases where you want to access and possibly modify the item text content.

There's no point in getting and modifying the `body_text` in a pre-conversion extension because the Processing stage would overwrite it.

> **Note**
>
> For index size and performance optimization, the `body_text` is limited in size to 10 MB. This means that for rare items with larger `body_text`, the exceeding text won't be indexed, and therefore not searchable.

* `body_html`

The complete HTML representation of an item created by the converter in the Processing stage of the indexing pipeline (see [Coveo indexing pipeline](https://docs.coveo.com/en/1893/)).
The `body_html` appears in the Quickview of a search result item.

You can get the `body_html` of each item in a post-conversion extension for cases where you want to access and possibly modify the item text content.

**Example:**

Your source indexes a question and answer website.
Each question and each answer is indexed as a separate item even if they can come from the same HTML page.
Your indexed items don't have the `<head>` elements from the original HTML page and therefore are missing resources such as CSS.
Consequently, the Quickview for these items doesn't look good.

You get the `body_html` in an extension and inject the appropriate `<head>` elements.

There's no point in getting and modifying the `body_html` in a pre-conversion extension because the Processing stage would overwrite it.

> **Notes**
>
> * When you can define your desired `body_html` content as a static HTML markup containing metadata placeholders, it's generally simpler to [use a mapping](https://docs.coveo.com/en/1847/) on the body field.
> 
> * For index size and performance optimization, the `body_html` is limited in size to 10 MB.
> This means that the Quickview of items with a larger `body_html` will be truncated.

* `markdown`

The complete Markdown representation of an item created by the converter in the Processing stage of the indexing pipeline (see [Coveo indexing pipeline](https://docs.coveo.com/en/1893/)).
The Markdown data stream is used only by a [CPR model to create chunks](https://docs.coveo.com/en/p9ub0044#chunking-data-stream) for the [embeddings](https://docs.coveo.com/en/ncc87383/) that are used for semantic content retrieval.

You can get the `markdown` of each item in a post-conversion extension for cases where you want to access and possibly modify the item text that's used to create chunks for [embeddings](https://docs.coveo.com/en/ncc87383/).

There's no point in getting and modifying the `markdown` in a pre-conversion extension because the Processing stage would overwrite it.

> **Notes**
>
> * The Markdown data stream is processed for PDF files only.
> All other file types are processed only with body text and body HTML data streams.
> 
> * A PDF file that's already indexed won't have a Markdown data stream until it's re-indexed.
> To make sure all of your PDF files are processed to include a Markdown data stream, [rebuild your source](https://docs.coveo.com/en/2039#rebuild).
> 
> * If a Markdown data stream exists for an item, the [CPR](https://docs.coveo.com/en/oaie9196/) [model](https://docs.coveo.com/en/1012/) automatically uses the Markdown data stream to create the chunks.
> Otherwise, the [CPR](https://docs.coveo.com/en/oaie9196/) [model](https://docs.coveo.com/en/1012/) uses the body text data stream to create the chunks.
> 
> * To optimize indexing performance, the processing time for an item's Markdown data stream is limited to 15 minutes.
> If the limit is reached, the Markdown data stream will be truncated.
> In this case, the [CPR](https://docs.coveo.com/en/oaie9196/) [model](https://docs.coveo.com/en/1012/) still uses the truncated Markdown data stream to create the chunks.

* `$thumbnail$`

The thumbnail image created by the converter in the Processing stage of the indexing pipeline for specific file types (Microsoft Word, Excel, PowerPoint, and Visio, as well as many image file types such as JPG, BMP, GIF, TIF, PSD, and PNG).

You can get the `$thumbnail$` in a post-conversion extension in the rare cases where you want to modify the thumbnail or extract information from the thumbnail image.
Your thumbnail image can have any size, resolution, or format (as long as a browser can display it), but you should stick to a normalized image size and resolution for most cases.

> **Note**
>
> To create or overwrite a thumbnail, you don't need to have already retrieved the `$thumbnail$` data stream.

|`origin`
|string
a|The unique identifier of the Coveo indexing pipeline step from which to retrieve a data stream.

The allowed `origin` values are as follows:

* `crawler`: The data stream set during the [crawling](https://docs.coveo.com/en/2684#crawling) stage

* Pre-conversion script name - The data stream set during a specific pre-conversion script

* `converter`: The data stream set during the [processing](https://docs.coveo.com/en/2684#processing) stage

* `mapping`: The data stream set during the [mapping](https://docs.coveo.com/en/2684#mapping) stage

* Post-conversion script name - The data stream set during a specific post-conversion script

> **Notes**
>
> * If no value is supplied for this parameter, the most recent origin is used.
> For example, the origin would be `crawler` in pre-conversion and `mapping` in post-conversion.
> 
> * If you have two different post-conversion scripts that modify a stream, and you don't specify the origin in the second script, the output of the first post-conversion script is used in the second post-conversion script, because it's the most recent origin.

|`reverse`
|Boolean
|Whether to scan the data stream origins in reverse order.
The default value is `True`, meaning that the value is fetched from the latest indexing pipeline stage with a non-empty value.
|===

### `get_data_stream` examples

```python
# Get document body text data stream to appear in a log message
# You must select the Body text checkbox because this indexing pipeline extension script needs to access it
my_data_stream = document.get_data_stream('body_text').read()
log(my_data_stream)
```

```python
# Get decoded documentdata stream to appear in a log message
# You must select the Original file checkbox because this indexing pipeline extension script needs to access it
my_data_stream = document.get_data_stream('documentdata').read().decode()
log(my_data_stream)
```
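
Because a data stream holds bytes, decoding can fail when the content isn't valid in the expected encoding. The following sketch shows one way to read a binary stream defensively. The `read_text` helper is a hypothetical name, the UTF-8 default is an assumption, and `io.BytesIO` stands in for the stream returned by `get_data_stream`.

```python
import io


def read_text(stream, encoding="utf-8"):
    """Read a binary stream fully and decode it, replacing undecodable bytes."""
    data = stream.read()
    if isinstance(data, bytes):
        return data.decode(encoding, errors="replace")
    return data


# Stand-in stream for running the sketch outside an extension.
sample = io.BytesIO("Café".encode("utf-8"))
print(read_text(sample))

# In an extension script, `document` and `log` are provided by the runtime:
# log(read_text(document.get_data_stream('body_text')))
```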

## `DataStream` attribute setter

You use this method to access and set a `DataStream` object for a given name and origin.

This method returns a single modifiable `DataStream` object.
This is a `BytesIO` value, which is a stream of in-memory bytes (see [Python Buffered Streams](https://docs.python.org/3/library/io.html#buffered-streams)).

When applicable, the extension runner is responsible for writing the item binary data back after executing the script.

**Syntax:**

```python
document.DataStream(name, origin, reverse)
```

The parameters are the same as those listed above for the [`get_data_stream`](#get_data_stream-parameters) method.

### `DataStream` example

```python
# Override the item body text
text = document.DataStream('body_text')
text.write('This is a test')
document.add_data_stream(text)
```

## `add_data_stream` method

You use this method to add or override an item data stream.

**Syntax:**

```python
document.add_data_stream(stream)
```

### `add_data_stream` example

```python
# Import the requests library to perform API calls
import requests

extracted_text = [x.strip('\r\n\t') for x in document.get_data_stream('body_text', 'converter').readlines() if x.strip('\r\n\t')]

# Override item html with perdu.com
html = document.DataStream('Body_HTML')
html.write(requests.get('http://perdu.com').text)

# Override the text with part of the original item
text = document.DataStream('body_text')
text.write('This is a test.')
text.write(extracted_text[0])

# Override the thumbnail of the item with Coveo logo
thumbnail = document.DataStream('$thumbnail$')
thumbnail.write(requests.get('https://careers.coveo.com/assets/images/opengraph.png').content)

document.add_data_stream(html)
document.add_data_stream(text)
document.add_data_stream(thumbnail)
```
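
The `body_html` use case described earlier, injecting `<head>` elements so that the Quickview renders properly, can be sketched as a pure string transformation. The `inject_head` helper is hypothetical, and the regular expressions are a simplification that doesn't handle every HTML edge case. In an extension script, the input could come from `get_data_stream('body_html')`, and the result could be written back with `DataStream` and `add_data_stream` as in the preceding example.

```python
import re

_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)
_HTML_OPEN = re.compile(r"<html\b[^>]*>", re.IGNORECASE)


def inject_head(html, head_markup):
    """Insert `head_markup` into the <head> of `html`, creating the element if needed."""
    match = _HEAD_OPEN.search(html)
    if match:
        # A <head> already exists: add the markup right after its opening tag.
        return html[:match.end()] + head_markup + html[match.end():]
    wrapped = "<head>" + head_markup + "</head>"
    match = _HTML_OPEN.search(html)
    if match:
        # No <head>: create one right after the opening <html> tag.
        return html[:match.end()] + wrapped + html[match.end():]
    # Neither tag is present: prepend a new <head> element.
    return wrapped + html


print(inject_head("<html><body>Hi</body></html>", "<style>p{}</style>"))
```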

## `reject` method

You use this method to set the item state as rejected.

**Syntax:**

```python
document.reject()
```
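
A script typically calls `reject` only when some condition is met. The following sketch assumes a hypothetical rule that rejects items based on their URI suffix. The `should_reject` helper and the suffixes are illustrative, not a recommended policy.

```python
def should_reject(uri, blocked_suffixes=(".tmp", ".bak")):
    """Return True when `uri` ends with one of the blocked suffixes (case-insensitive)."""
    return uri.lower().endswith(tuple(blocked_suffixes))


# In an extension script, `document` is provided by the runtime:
# if should_reject(document.uri):
#     document.reject()
```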

## `document` object JSON schema

The `Document` object can be represented with the following JSON schema:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "definitions": {
    "MetaDataValue": {
      "type": "object",
      "properties": {
        "origin": {
          "type": "string"
        },
        "values": {
          "type": "object",
          "properties": {
            "key": {
              "type": "string"
            },
            "value": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          }
        }
      }
    },
    "Permission": {
      "type": "object",
      "properties": {
        "identity": {
          "type": "string"
        },
        "identity_type": {
          "type": "string",
          "enum": ["user", "group", "virtualgroup", "unknown"]
        },
        "security_provider": {
          "type": "string"
        },
        "additional_info": {
          "type": "object",
          "properties": {
            "key": {
              "type": "string"
            },
            "value": {
              "type": "string"
            }
          }
        }
      }
    },
    "PermissionSet": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "allow_anonymous": {
          "type": "boolean"
        },
        "allowed_permissions": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/Permission"
          }
        },
        "denied_permissions": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/Permission"
          }
        }
      }
    },
    "PermissionLevel": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "permission_sets": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/PermissionSet"
          }
        }
      }
    },
    "DataStream": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "origin": {
          "type": "string"
        }
      }
    },
    "Document": {
      "type": "object",
      "properties": {
        "uri": {
          "type": "string"
        },
        "meta_data": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/MetaDataValue"
          }
        },
        "permissions": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/PermissionLevel"
          }
        },
        "data_streams": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/DataStream"
          }
        }
      }
    }
  }
}
```

To consult the document object of a single item just before indexing time, run the following script as the last executed post-conversion script.
It populates a `documentobject` metadata.

```python
import json
# Get document object JSON into a metadata
document_object = json.dumps(document)
document.add_meta_data({'documentobject': document_object})
```

> **Note**
>
> You must map `documentobject` metadata to a field to index the content and consult the document object values.

The preceding extension script returns the document object JSON:

```json
{
  "DataStream": [
    {
      "Origin": "converter",
      "Name": "body_html"
    },
    {
      "Origin": "mypostconversionextension",
      "Name": "body_html"
    },
    {
      "Origin": "converter",
      "Name": "body_text"
    },
    {
      "Origin": "mypostconversionextension",
      "Name": "body_text"
    },
    {
      "Origin": "mypostconversionextension",
      "Name": "$thumbnail$"
    },
    {
      "Origin": "crawler",
      "Name": "documentdata"
    }
  ],
  "Permissions": [
    {
      "PermissionSets": [
        {
          "AllowAnonymous": false,
          "DeniedPermissions": [],
          "Name": "",
          "AllowedPermissions": [
            {
              "SecurityProvider": "Email Security Provider",
              "IdentityType": "user",
              "Identity": "*@coveo.com",
              "AdditionalInfo": {}
            }
          ]
        }
      ],
      "Name": ""
    }
  ],
  "URI": "http://www.example.com/",
  "MetaData": [
    {
      "Origin": "crawler",
      "Values": {
        "originaluri": [
          "http://www.example.com/"
        ],

             [ ... ]
        "permanentid": [
          "f1777111f5d0f1c81ffa04de75112889e6a0649e06d83370cdf2cbfb05f3"
        ],
        "content-type": [
          "text/html; charset=utf-8"
        ]
      }
    },
    {
      "Origin": "mypreconversionextension",
      "Values": {
        "title": [
          "Brand New Title"
        ]
      }
    },
    {
      "Origin": "converter",
      "Values": {
        "conversionstate": [
          0
        ],
        "detectedtitle": [
          "Example Domain"
        ],
        "language": [
          "English"
        ],

            [ ... ]
        "originalhtmlcharset": [
          65001
        ],
        "extractedsize": [
          420
        ]
      }
    },
    {
      "Origin": "mapping",
      "Values": {
        "sourcetype": [
          "Web"
        ],
        "language": [
          "English"
        ],
        "title": [
          "Example Domain"
        ],

             [ ... ]
        "date": [
          1376092475
        ],
        "permanentid": [
          "f1777111f5d0f1c81ffa04de75112889e6a0649e06d83370cdf2cbfb05f3"
        ],
        "size": [
          1270
        ]
      }
    },
    {
      "Origin": "mypostconversionextension",
      "Values": {
        "author": [
          "Coveo Documentation Team"
        ]
      }
    }
  ]
}
```
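
Once the document object JSON is available, you can check which origin provides a given metadata value. The following sketch scans the `MetaData` entries from last to first and returns the first non-empty match, which approximates the behavior described for `reverse=True`. It assumes that the entries are ordered by pipeline stage, as in the preceding sample. The `latest_value` helper and the `SAMPLE_META_DATA` list are illustrative only.

```python
def latest_value(meta_data, key):
    """Return (origin, values) from the last entry holding a non-empty `key`."""
    for entry in reversed(meta_data):
        values = entry.get("Values", {}).get(key)
        if values:
            return entry["Origin"], values
    return None, []


# Reduced version of the "MetaData" array from the preceding sample.
SAMPLE_META_DATA = [
    {"Origin": "crawler", "Values": {"originaluri": ["http://www.example.com/"]}},
    {"Origin": "mypreconversionextension", "Values": {"title": ["Brand New Title"]}},
    {"Origin": "converter", "Values": {"detectedtitle": ["Example Domain"]}},
    {"Origin": "mapping", "Values": {"title": ["Example Domain"]}},
    {"Origin": "mypostconversionextension", "Values": {"author": ["Coveo Documentation Team"]}},
]

print(latest_value(SAMPLE_META_DATA, "title"))
```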