Use external Python modules

When writing pre-push extension scripts, you may want to use external Python modules that are not part of the Python standard library. This example shows how to import and use those modules in your pre-push extension scripts.

The script creates an Extensions folder under the COVEO_LOGS_ROOT folder (if it doesn’t already exist) and a subfolder named after the source ID. The script logs relevant information about each crawled item in a .log file in that folder.

Tip
Leading practice

Apply the extension to a duplicate of your production source with a name that clearly indicates it’s for testing purposes only. In this test source, crawl only a small subset of content for faster debugging and to limit the log file size.

Only after fully testing and validating the pre-push extension in the test source should you apply it to your production source.

Apply this extension to a source. Make sure you add the required external Python modules to the C:\ProgramData\Coveo\Maestro\Python3PrePushExtensions\requirements.txt file, as follows:

### Enter your python dependencies here.
### They will be available in python extensions.
### This file will be read every time a connector needs to use python (so every crawling run).
### Adding the pendulum and requests third-party libraries that are used in the example script below.
pendulum
requests

Then, rebuild the source.
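If the rebuild fails on an import error, it can help to confirm that the modules listed in requirements.txt are actually resolvable. The following is a minimal sketch, not part of the extension itself. The `missing_modules` helper is a hypothetical name introduced here, and the check is only meaningful if you run it with the same Python environment that the connector uses, which is an assumption about your setup.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that can't be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two third-party modules used by the example script.
missing = missing_modules(["pendulum", "requests"])
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

`importlib.util.find_spec` only locates a module without importing it, so this check doesn't trigger any side effects from the modules themselves.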

# Import required Python libraries. Add the pendulum and requests external Python modules to the requirements.txt file.
import os
import logging
from logging.handlers import TimedRotatingFileHandler
import pendulum
import requests

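# Initialize rotating file logging.
# os.path.join fails if COVEO_LOGS_ROOT is unset (None), so fall back to the
# current directory. This fallback is an assumption; adjust it as needed.
log_folder = os.path.join(
    os.getenv("COVEO_LOGS_ROOT", "."),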
    "Extensions",
    os.getenv("SOURCE_ID", "unknown")
)
os.makedirs(log_folder, exist_ok=True)

fname = f"{os.getenv('OPERATION_TYPE','unknown')}_{os.getenv('OPERATION_ID','unknown')}.log"
fpath = os.path.join(log_folder, fname)

handler = TimedRotatingFileHandler(fpath, when="midnight")
handler.suffix = "%Y-%m-%d"

formatter = logging.Formatter(
    fmt="%(asctime)s.%(msecs)03d %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)
handler.setFormatter(formatter)

logging.basicConfig(level=logging.INFO, handlers=[handler])

# -----------------------------------------------------------------
# Extension entry point. The do_extension function must be defined.
# -----------------------------------------------------------------
def do_extension(body):

    # Log basic item info
    document_id = body.get("DocumentId", "<missing>")
    logging.info("BEGIN processing item: %s", document_id)

    # Add current time metadata using pendulum module
    now = pendulum.now("Europe/Paris")
    formatted_time = now.strftime("%m/%d/%Y, %H:%M:%S")
    body["currentTime"] = formatted_time
    logging.info("Set metadata: currentTime = %s", formatted_time)

    # Perform HTTP GET request using requests module
    try:
        logging.info("Fetching HTML content from http://httpbin.org/html")
        # Set a timeout so that a slow endpoint can't stall the crawl indefinitely
        response = requests.get("http://httpbin.org/html", timeout=10)
        # Treat HTTP 4xx/5xx responses as failures
        response.raise_for_status()
        body["someHtml"] = response.text
        logging.info("Fetched HTML content (%d characters)", len(response.text))
    except requests.RequestException as ex:
        logging.error("Failed to fetch HTML content: %s", ex)
        # You can choose to re-raise, skip, or set fallback metadata
        body["someHtml"] = ""

    logging.info("END processing item: %s", document_id)

    return body
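
Before applying an extension to a test source, you may want to try its do_extension function on a workstation. The following is a rough sketch of one way to do that, under several assumptions: the `fake_crawler_env` and `run_locally` helpers are hypothetical names, the environment variable values and the sample item are made up for illustration, and `sample_extension` is a stand-in for a real do_extension function so that the sketch runs without pendulum or requests installed.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def fake_crawler_env(logs_root, source_id="test-source"):
    """Temporarily set the environment variables that the example script reads."""
    overrides = {
        "COVEO_LOGS_ROOT": logs_root,
        "SOURCE_ID": source_id,
        "OPERATION_TYPE": "Rebuild",      # illustrative value (assumption)
        "OPERATION_ID": "local-test",     # illustrative value (assumption)
    }
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Restore the previous environment, removing keys that were unset.
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value


def run_locally(extension, body, logs_root):
    """Call an extension function on a copy of body with the fake environment set."""
    with fake_crawler_env(logs_root):
        return extension(dict(body))


def sample_extension(body):
    """Stand-in for do_extension: adds one metadata value and returns the item."""
    body["sourceId"] = os.getenv("SOURCE_ID", "unknown")
    return body


with tempfile.TemporaryDirectory() as logs_root:
    item = {"DocumentId": "file:///C:/docs/example.txt"}  # made-up sample item
    print(run_locally(sample_extension, item, logs_root))
```

Because the example script reads its environment variables and configures logging at import time, the real script would need to be imported inside the `fake_crawler_env` block for the values to take effect. With the real script, a persistent log folder is likely a better choice than a temporary one, since the log file stays open while the handler is active.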