Pre-Push Example for Merging Item Data and Metadata
This article applies to the new Crawling Module, which works without Docker. If you still use the Crawling Module with Docker, see Pre-Push Example for Merging Item Data and Metadata (Docker Version) instead. You might also want to read about the advantages of the new Crawling Module.
Versions > 1: new Crawling Module
Versions < 1: Crawling Module with Docker
If you want to index items whose metadata and data are stored in different locations, the best practice is to create a pre-push extension. This extension then applies to every item crawled, merging its data and metadata before the item is pushed into the cloud.
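At its core, a pre-push extension is a Python script that exposes a do_extension function: it receives the body of the crawled item (a dictionary of fields) and returns it, possibly with fields added or modified. A minimal sketch of that shape, using the prepushflag field from the full example as an illustrative marker:

```python
def do_extension(body):
    # Receive the crawled item body, add or modify
    # metadata fields, and return it to be pushed.
    body['prepushflag'] = 'true'
    return body
```

The full example later in this article follows this same contract, merging file data into the body before returning it.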
For instance, you may be using a database (ODBC) connector, where the database contains item metadata, including a link to the file containing the item data.
To follow the link to an item, extract the item data, and then add that data to the item body, you would use a pre-push extension such as the following:
import base64
import datetime
import os.path
import zlib

# Prepush logs will be sent as "prepushlog" metadata
log = []

def do_extension(body):
    global log
    log = ['BEGIN %s' % (datetime.datetime.now().time())]
    full_path = 'C:/Data/sample.pdf'
    if os.path.isfile(full_path):
        # Open and read the file as a binary (`rb`)
        with open(full_path, 'rb') as f:
            file_data = f.read()
        if len(file_data) > 0:
            # Compress and encode the file using the `zlib` and `base64` modules
            body['CompressionType'] = 'ZLIB'
            body['CompressedBinaryData'] = base64.b64encode(zlib.compress(file_data)).decode()
            #log.append('file_data: ' + base64.b64encode(zlib.compress(file_data)).decode())
            body['prepushflag'] = 'true'
        else:
            log.append('file_data is empty for document: %s' % full_path)
    else:
        log.append('file not found: %s' % full_path)
    body['prepushlog'] = ';'.join(log)
    return body
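The CompressedBinaryData value in the extension above is the file bytes run through zlib compression and then Base64 encoding. You can sanity-check that transformation locally by reversing the two steps, which is presumably how the original bytes are recovered from an item whose CompressionType is ZLIB:

```python
import base64
import zlib

# Stand-in for the bytes read from the linked file
file_data = b'%PDF-1.4 sample file contents'

# Same transformation as in the extension above
encoded = base64.b64encode(zlib.compress(file_data)).decode()

# Reversing the steps recovers the original bytes
decoded = zlib.decompress(base64.b64decode(encoded))
assert decoded == file_data
```

Because b64encode returns bytes, the trailing .decode() is what makes the value a plain string suitable for a metadata field.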