Pre-push example for merging item data and metadata
If you want to index items whose metadata and data are stored in different locations, the best practice is to create a pre-push extension. This extension is then applied to every crawled item, merging its data and metadata before the item is pushed to the cloud.
For example, you may be using a Database source, where the database contains item metadata, including a link to the file containing the item data.
To follow the link to an item, extract the item data, and then add that data to the item body, you would use a pre-push extension such as the following:
import base64
import datetime
import os.path
import zlib

# Pre-push log entries will be sent as "prepushlog" metadata
log = []

def do_extension(body):
    global log
    log = ['BEGIN %s' % (datetime.datetime.now().time())]
    full_path = 'C:/Data/sample.pdf'
    if os.path.isfile(full_path):
        # Open and read the file as binary ('rb')
        with open(full_path, 'rb') as f:
            file_data = f.read()
        if len(file_data) > 0:
            # Compress the data with zlib, then Base64-encode it for transport
            body['CompressionType'] = 'ZLIB'
            body['CompressedBinaryData'] = base64.b64encode(zlib.compress(file_data)).decode()
            body['prepushflag'] = 'true'
        else:
            log.append('file_data is empty for document: %s' % full_path)
    else:
        log.append('file not found: %s' % full_path)
    body['prepushlog'] = ';'.join(log)
    return body
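
In practice, the path to the item data would come from the item's own metadata rather than a hardcoded value. As a minimal sketch, assuming your Database source exposes the link column as a hypothetical file_path metadata key on the body, the hardcoded assignment could be replaced with:

    # Hypothetical metadata key; substitute the field your Database source actually populates
    full_path = body.get('file_path', '')

Because body.get returns an empty string when the key is missing, the existing os.path.isfile check still routes such items to the pre-push log instead of raising an error.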