Creating an Amazon EC2 Virtual Machine for the Crawling Module (Docker Version)

This article applies to the old Crawling Module, which works with Docker. The old Crawling Module will soon reach its end-of-life. We recommend switching to the new Crawling Module, which doesn’t require Docker.

To identify which Crawling Module you’re currently using, check the Maestro reported version on the Crawling Modules page of the Coveo Cloud Administration Console:

  • Versions > 1: new Crawling Module

  • Versions < 1: Crawling Module with Docker
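The version check above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and it assumes the Maestro version is a dot-separated string whose first part is a number. The article doesn’t say how a version of exactly 1 is classified, so treating it as the new Crawling Module is also an assumption.

```python
def crawling_module_kind(maestro_version: str) -> str:
    """Classify a Maestro version string (hypothetical helper, not a Coveo API)."""
    # Assumption: the version looks like "0.9.3" or "2.1.0".
    major = int(maestro_version.strip().split(".")[0])
    # Major version 0 means "< 1", which is the Docker-based module.
    # Assumption: major version 1 and above counts as the new module.
    if major >= 1:
        return "new Crawling Module"
    return "Crawling Module with Docker"


if __name__ == "__main__":
    for version in ("0.9.3", "2.1.0"):
        print(version, "->", crawling_module_kind(version))
```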

Coveo provides installation steps for the following virtual machine setups:

  • While you can use a different virtual machine setup, Coveo can’t guarantee that Docker Enterprise Edition will install successfully on such a setup (see Installing Docker). See Validating the Installation to ensure Docker works correctly before installing the Crawling Module.

  • Any server running the Crawling Module must have access to the content you want to index, regardless of whether the server is a virtual machine or not.

Follow these steps to create an Amazon EC2 VM that’s compatible with the Coveo On-Premises Crawling Module.

To create an Amazon EC2 virtual machine

  1. In your Instances panel, create a new Windows Server VM that has:

    • Windows Server 2016, version 14393.1914 or later

    • The GUI enabled

    We recommend that you select the Microsoft Windows Server 2016 Base AMI and install Docker Enterprise Edition manually, as the “with Containers” image is outdated (see Installing Docker).

  2. When selecting the VM specifications, consider the following:

    • Avoid M4 instances, as they’re known to have issues with Docker Swarm, which the Crawling Module uses (see Swarm mode (init or join) terminating Remote Desktop connections in AWS EC2).

    • The VM needs at least 100 GB of disk space to support the installation of Windows, Docker, and the Docker images used by the Crawling Module. The more storage is available, the lower the risk of running out of space during regular activity.

    • You must open a port for RDP inbound communications. By default, port 3389 is used.

    • Other requirements are more flexible. A good rule of thumb is to aim for one virtual CPU and 4 GB of RAM per worker.

      The number of workers is part of the Crawling Module configuration. It determines the number of sources that can refresh at the same time. That number should be between 1 and the total number of Crawling Module sources in your organization and can be changed at any time. See Number of Workers for more information on workers.
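The rule of thumb above (one virtual CPU and 4 GB of RAM per worker) can be turned into a quick sizing calculation. The following is a minimal sketch that applies the rule literally; the function and constant names are hypothetical, and it doesn’t account for any extra resources that Windows or Docker may need.

```python
# Rule of thumb from this article: 1 vCPU and 4 GB of RAM per worker.
VCPUS_PER_WORKER = 1
RAM_GB_PER_WORKER = 4


def resources_for_workers(workers: int) -> tuple[int, int]:
    """Return the (vCPUs, RAM in GB) suggested for a given number of workers."""
    if workers < 1:
        raise ValueError("The number of workers must be at least 1.")
    return workers * VCPUS_PER_WORKER, workers * RAM_GB_PER_WORKER


def max_workers(vcpus: int, ram_gb: float) -> int:
    """Return how many workers a VM supports under the rule of thumb."""
    # The scarcer of the two resources limits the number of workers.
    return int(min(vcpus // VCPUS_PER_WORKER, ram_gb // RAM_GB_PER_WORKER))


if __name__ == "__main__":
    print(resources_for_workers(3))  # vCPUs and GB of RAM for 3 workers
    print(max_workers(2, 8))         # a VM with 2 vCPUs and 8 GB of RAM
```

With this sketch, a VM with 8 vCPUs and 30 GB of RAM is limited by its memory rather than by its CPU count.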

The following are two examples of Amazon EC2 VM configurations that are compatible with the Coveo On-Premises Crawling Module:

  • VM 1:

    • Image: Microsoft Windows Server 2016 Base

    • Manual installation of Docker Enterprise Edition (see Installing Docker)

    • t2.large instance: 8 GB RAM, 2 vCPUs, 500 GB of disk space

  • VM 2:

    • Image: Microsoft Windows Server 2016 Base

    • Manual installation of Docker Enterprise Edition (see Installing Docker)

    • m3.2xlarge instance: 30 GB RAM, 8 vCPUs, 800 GB of disk space
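Before launching an instance, you may want to check a planned configuration against the constraints listed in this article: no M4 instances, at least 100 GB of disk space, and an open inbound RDP port. The following sketch does that with plain Python. The function name and its parameters are hypothetical; it doesn’t call AWS and only inspects the values you pass in.

```python
MIN_DISK_GB = 100        # minimum disk space stated in this article
DEFAULT_RDP_PORT = 3389  # default RDP port stated in this article


def check_vm(instance_type: str, disk_gb: int, inbound_ports: set[int],
             rdp_port: int = DEFAULT_RDP_PORT) -> list[str]:
    """Return a list of problems found in a planned VM configuration."""
    problems = []
    # The instance family is the part before the dot, e.g. "m4" in "m4.large".
    family = instance_type.split(".")[0].lower()
    if family == "m4":
        problems.append("M4 instances are known to have issues with Docker Swarm.")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk space is {disk_gb} GB; at least {MIN_DISK_GB} GB is needed.")
    if rdp_port not in inbound_ports:
        problems.append(f"Inbound port {rdp_port} isn't open for RDP.")
    return problems


if __name__ == "__main__":
    # The two example VMs above, assuming the default RDP port is open.
    print(check_vm("t2.large", 500, {3389}))
    print(check_vm("m3.2xlarge", 800, {3389}))
```

An empty list means that none of the checked constraints is violated; it doesn’t guarantee that Docker will install successfully.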

What’s Next?

Install Docker (see Installing Docker).
