Python Modules Available to Indexing Pipeline Extensions

An indexing pipeline extension Python script runs in a separate, non-persistent, isolated OS instance for each source item that goes through the Coveo Cloud V2 indexing pipeline.

The OS instance comes with Python 2.7.6.

In your indexing pipeline extension Python script, you can import modules such as:

  • Modules from the standard Python library (see The Python Standard Library)
  • asn1crypto - Fast ASN.1 parser and serializer (see asn1crypto Documentation 0.23.0)
  • beautifulsoup4 (bs4) - A Python library for pulling data out of HTML and XML files (see Beautiful Soup Documentation 4.6.0)
  • boto3 - Amazon Web Services (AWS) SDK for Python (see Boto 3 Documentation 1.4.0)
  • botocore - A low-level interface for AWS CLI (see Botocore Documentation 1.4.93)
  • cffi - Foreign function interface for calling C code (see cffi Documentation 1.11.2)
  • chardet - A high speed universal character encoding detector (see chardet Documentation 2.0.1)
  • colorama - Cross-platform colored terminal text (see colorama Documentation 0.3.6)
  • cryptography - A cryptographic library (see cryptography Documentation 2.1.3)
  • docutils - A modular system for processing documentation into useful formats such as HTML, XML and LaTeX (see docutils Documentation 0.14)
  • enum34 - A module to manipulate enumerations and their members (see enum34 Documentation 1.1.6)
  • futures - A high-level interface for asynchronously executing callables (see futures Documentation 3.1.1)
  • html5lib - A pure Python library used to parse HTML (see html5lib Documentation 0.999)
  • idna - Support for the Internationalised Domain Names in Applications (IDNA) protocol (see IDNA Documentation 2.6)
  • ipaddress - IPv4/IPv6 manipulation library (see ipaddress Documentation 1.0.18 and ipaddress 1.0.18 repository)
  • jmespath - Lets you declaratively specify how to extract elements from a JSON document (see JMESPath Documentation 0.9.3)
  • msgpack-python - MessagePack (de)serializer (see msgpack-python Documentation 0.4.8)
  • ndg-httpsclient - Provides enhanced HTTPS support for httplib and urllib2 using PyOpenSSL (see ndg-httpsclient Documentation 0.4.2)
  • pip - Module to install Python packages (see pip Documentation 9.0.1)
  • pyasn1 - A pure Python implementation of ASN.1 types and DER/BER/CER codecs (see pyasn1 Documentation 0.2.3)
  • pycparser - A complete parser of the C language (see pycparser Documentation 2.18)
  • pymongo - Python driver for MongoDB (see pymongo Documentation 2.7.2)
  • pyopenssl - A Python wrapper module around the OpenSSL library (see pyopenSSL Documentation 17.4.0)
  • python-dateutil - Provides powerful extensions to the datetime module (see python-dateutil Documentation 2.6.1)
  • pytz - World timezone definitions, modern and historical (see pytz Documentation 2016.10)
  • redis - The Python interface to the Redis key-value store (see redis Documentation 2.10.5)
  • requests - HTTP library for Python (see Requests Documentation 2.9.1)
  • s3transfer - A library to manage Amazon S3 transfers (see S3transfer Documentation 0.1.11)
  • setuptools - A tool to manipulate Python packages (see setuptools Documentation 3.3)
  • six - A Python 2 and 3 compatibility library that provides utility functions (see six Documentation 1.5.2)
  • urllib3 - An HTTP library with thread-safe connection pooling and file post support (see urllib3 Documentation 1.7.1)
  • virtualenv - A tool to create isolated Python environments (see virtualenv Documentation 14.0.0)
  • wheel - A built-package format for Python (see wheel Documentation 0.24.0)

Since this list is subject to minor changes, such as the addition of new modules and updates to existing ones, validating the available modules and their versions is good practice. Some older modules are included for compatibility with the Python 2 interpreter. For instance, the futures 3.1.1 package is a Python 2 backport of the concurrent.futures module, which is otherwise available only in Python 3.2 and later.
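As an illustration of the futures backport, the following minimal sketch uses the concurrent.futures interface, which is the same whether it comes from the futures package on Python 2 or from the standard library on Python 3. The word_count function and the sample texts are hypothetical, and whether running threads is appropriate inside an extension script is an assumption you should validate for your own use case.

```python
# Import works with the 'futures' backport on Python 2
# and with the standard library on Python 3.2+.
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Hypothetical helper: count whitespace-separated words."""
    return len(text.split())


# Hypothetical sample inputs, used only for illustration.
texts = ["first sample body", "second", "a third sample body here"]

# The executor is shut down automatically when the 'with' block exits.
with ThreadPoolExecutor(max_workers=2) as executor:
    # map() returns results in the same order as the inputs.
    counts = list(executor.map(word_count, texts))
```

Because the import path is identical in both cases, a script written this way does not need to change if the interpreter version changes.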

  • To log an up-to-date, exhaustive list of the installed modules and their versions, use the following extension script, preferably on a one-item source:

      # imported to manage packages
      import pip
      # populating 'modules' variable with all values
      modules = pip.get_installed_distributions()
      # sorting values and populating 'modules_list'
      modules_list = sorted(["%s (version %s)" % (i.key, i.version)
                             for i in modules])
      # logging a message with the formatted values
      log(str(modules_list))
    
  • To validate the current Python version in a log message, use the following extension script, preferably on a one-item source:

      import sys
      # assigns python version values to myPythonVersion variable
      myPythonVersion = sys.version_info
      # logging a message containing the values
      log(str(myPythonVersion))
      # the Log Browser then shows a message such as:
      # Normal: sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
    
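When a script depends on only a few modules, a narrower check may be enough. The following is a minimal sketch, not an official Coveo utility: check_modules is a hypothetical helper that tries to import each name and records its __version__ attribute when the module exposes one. Note that the import name can differ from the package name in the list above (for example, bs4 for beautifulsoup4, or dateutil for python-dateutil).

```python
import importlib


def check_modules(names):
    """Hypothetical helper: map each import name to its version string,
    to 'unknown' if the module imports but has no __version__ attribute,
    or to None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is not available in this environment.
            report[name] = None
        else:
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


# 'json' is in the standard library; the second name is deliberately
# made up to show what a missing module looks like in the report.
report = check_modules(["json", "no_such_module_xyz"])
```

In an extension script, you could then pass str(report) to the same log function used in the scripts above and review the result in the Log Browser.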

If you would like to use a Python module that is not currently supported, contact Coveo Support to suggest the addition of the module to the OS instance.