---
title: Add item data
slug: '3270'
canonical_url: https://docs.coveo.com/en/3270/
collection: index-content
source_format: adoc
---

# Add item data

To index [items](https://docs.coveo.com/en/210/) that store their data outside the source, you can [create a pre-push extension](https://docs.coveo.com/en/3269/).

For example, you might be using a [Database source](https://docs.coveo.com/en/1885/), where the database contains item metadata, including a path to a file that stores the item data. Use a pre-push extension to open the file, extract its content, and add it to the item `body`. The following example shows how.

The script creates an `Extensions` folder under the [`COVEO_LOGS_ROOT`](https://docs.coveo.com/en/3269#coveo-pre-push-extension-environment-variables) folder (if it doesn't already exist) and a subfolder named after the [source ID](https://docs.coveo.com/en/3390#copy-a-source-name-or-id). The script logs relevant information about each crawled item in a `.log` file in that folder.

> **Leading practice**
>
> Apply the extension to a [duplicate of your production source](https://docs.coveo.com/en/3390#duplicate-a-source) with a name that clearly indicates it's for testing purposes only.
> In this test source, [crawl only a small subset of content](https://docs.coveo.com/en/2992/) for faster debugging and to limit the log file size.
>
> Only after fully testing and validating the pre-push extension in the test source should you apply it to your production source.

After you [apply this extension](https://docs.coveo.com/en/3269#apply-a-pre-push-extension-to-a-source), rebuild the source.

```python
# Import required Python libraries. Note: Add non-Python standard libraries to the requirements.txt file.
import os
import base64
import zlib
import logging
from logging.handlers import TimedRotatingFileHandler

# Initialize rotating file logging
log_folder = os.path.join(
    os.getenv("COVEO_LOGS_ROOT"),
    "Extensions",
    os.getenv("SOURCE_ID", "unknown")
)
os.makedirs(log_folder, exist_ok=True)

fname = f"{os.getenv('OPERATION_TYPE','unknown')}_{os.getenv('OPERATION_ID','unknown')}.log"
fpath = os.path.join(log_folder, fname)

handler = TimedRotatingFileHandler(fpath, when="midnight")
handler.suffix = "%Y-%m-%d"
formatter = logging.Formatter(
    fmt="%(asctime)s.%(msecs)03d %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
handler.setFormatter(formatter)
logging.basicConfig(level=logging.INFO, handlers=[handler])

# -----------------------------------------------------------------
# Extension entry point. The do_extension function must be defined.
# -----------------------------------------------------------------
def do_extension(body):
    # Log basic item info
    document_id = body.get("DocumentId", "")
    logging.info("BEGIN processing item: %s", document_id)

    full_path = "C:/Data/sample.pdf"

    # File existence check
    if os.path.isfile(full_path):
        logging.info("File found: %s", full_path)
        try:
            # Open and read the file as a binary (`rb`)
            with open(full_path, "rb") as f:
                file_data = f.read()
        except Exception as ex:
            logging.error("Failed to read file '%s': %s", full_path, ex)
            return body

        # Empty file check
        if len(file_data) > 0:
            logging.info("Read %d bytes from %s", len(file_data), full_path)
            try:
                # Compress and encode the file content using `zlib` and `base64` modules
                compressed = zlib.compress(file_data)
                encoded = base64.b64encode(compressed).decode()
                body["CompressionType"] = "ZLIB"
                body["CompressedBinaryData"] = encoded
                logging.info(
                    "Compressed and encoded file. Original size: %d bytes, Compressed size: %d bytes",
                    len(file_data), len(compressed)
                )
            except Exception as ex:
                logging.error("Error during compression/encoding for '%s': %s", full_path, ex)
        else:
            logging.warning("file_data is empty for document: %s", full_path)
    else:
        logging.warning("File not found: %s", full_path)

    logging.info("END processing document: %s", document_id)
    return body
```
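The core transformation above is the pairing of `CompressionType: "ZLIB"` with a `CompressedBinaryData` value that holds the zlib-compressed file bytes, Base64-encoded as text. If you want to sanity-check that round trip locally before wiring it into an extension, a minimal sketch (using placeholder bytes in place of a real file read with `open(path, "rb")`) looks like this:

```python
import base64
import zlib

# Placeholder standing in for the bytes read from the item's file
file_data = b"%PDF-1.7 example payload " * 100

# What the extension stores in the item body
compressed = zlib.compress(file_data)
encoded = base64.b64encode(compressed).decode()

# Reversing both steps must yield the original bytes
decoded = zlib.decompress(base64.b64decode(encoded))
assert decoded == file_data
```

Because Base64 output is plain ASCII text, `encoded` can safely be placed in the JSON item body alongside the other metadata fields.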