Crawling Module Additional Configuration

This article applies to the new Crawling Module, which works without Docker. If you still use the Crawling Module with Docker, see Additional Configuration (Docker Version) instead. You might also want to read about the advantages of the new Crawling Module.

To identify the Crawling Module you’re currently using, on the Crawling Modules page of the Coveo Administration Console, look at the Maestro reported version:

  • Versions > 1: new Crawling Module

  • Versions < 1: Crawling Module with Docker

Once you have installed and configured the Coveo On-Premises Crawling Module, you can further configure it so that it meets your specific needs. The instructions in this article are optional, and there’s no obligation to make these changes if your environment doesn’t require them.

Subscribing to Crawling Module Notifications

To help you monitor your Crawling Module deployment, the Crawling Modules page of the Coveo Administration Console displays the state of Maestro. The Activity Browser page also lets you monitor Crawling Module updates and activities.

In addition, you may want to be notified when the Crawling Module stops reporting to the Coveo Platform for three consecutive days, for example because it's outdated, its host server has been shut down, it has no Internet access, or there's a communication issue. When you receive a notification, you can take action to resolve the issue so that the content searchable in your Coveo-powered search interface reflects your actual data.

To receive a notification when the Crawling Module is outdated and deactivated

In the Notifications page, create a notification with the following JSON filter:

{
  "content": {},
  "operations": [
    "DISABLE"
  ],
  "resourceTypes": [
    "CRAWLING_MODULE"
  ],
  "resultTypes": [
    "*"
  ]
}
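As a minimal sketch, the filter above can be built and checked in Python before you paste it into the Notifications page. The function name build_crawling_module_filter is hypothetical. Only the field names and values come from the filter shown above.

```python
import json


def build_crawling_module_filter():
    """Return the notification filter shown above as a Python dict."""
    return {
        "content": {},
        "operations": ["DISABLE"],
        "resourceTypes": ["CRAWLING_MODULE"],
        "resultTypes": ["*"],
    }


if __name__ == "__main__":
    # Serialize to JSON text that can be pasted into the filter field.
    print(json.dumps(build_crawling_module_filter(), indent=2))
```

Serializing with json.dumps guarantees that the pasted text is valid JSON, which helps avoid errors such as a missing comma or bracket.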

Switching to TLS 1.2 or Greater

On March 23, 2020, Coveo Cloud disabled the TLS 1.1 protocol on its platform, as it's no longer considered secure. Coveo Cloud now enforces TLS 1.2 for all client communications.

While the transition has taken place across the Coveo Platform, it didn’t affect your Coveo On-Premises Crawling Module sources since they are located in your on-premises environment. Therefore, the decision to keep crawling your content using TLS 1.1 or to switch to TLS 1.2 or greater is yours to make.

To switch to TLS 1.2 or greater

Switching to TLS 1.2 or greater is done on a per-source basis. You may therefore choose to have only some of your sources enforce the more recent versions of the protocol.

To have a source use TLS 1.2 or greater, open its JSON configuration and then, in the parameters object, add "EnforceOnlySecureSecurityProtocols": true. If your source indexes permissions, add the same line to the corresponding security identity provider.

Following the next successful rebuild operation, the source will use TLS 1.2 or greater protocols to crawl your content.
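The edit described above can be sketched in Python as follows. This is an illustration only: the function name enforce_secure_protocols and the sample configuration are hypothetical, and the sketch assumes that you have the JSON configuration loaded as a dictionary. Only the EnforceOnlySecureSecurityProtocols parameter name and the parameters object come from this article.

```python
import copy


def enforce_secure_protocols(config):
    """Return a copy of a source (or security identity provider) JSON
    configuration with the TLS parameter added to its parameters object."""
    updated = copy.deepcopy(config)
    # Assumption: "parameters" is a JSON object; create it if it is missing.
    parameters = updated.setdefault("parameters", {})
    parameters["EnforceOnlySecureSecurityProtocols"] = True
    return updated


if __name__ == "__main__":
    # Hypothetical, heavily simplified source configuration.
    source = {"name": "My source", "parameters": {}}
    print(enforce_secure_protocols(source))
```

The function works on a copy so that the original configuration stays available for comparison. If the source indexes permissions, the same function can be applied to the configuration of the corresponding security identity provider.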

Restrict the IP Addresses From Which the “Crawling Modules” API Key Can Be Used

When you link the Crawling Module to a Coveo organization, an API key named Crawling modules is created and appears on the API Keys page of the Coveo Administration Console. This API key is encrypted and saved on your disk.


You can edit the Crawling modules API key to restrict the IP addresses with which it can be used in requests to the Coveo Platform. We recommend that you only allow addresses that are likely to be used by the server on which you installed the Crawling Module. To do so, you should ask your network administrator for a complete list of outgoing public IP addresses, including those used by HTTP proxies for outbound communication. This ensures that the API key can only be used in requests that originate from the network associated with your organization.
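Before entering the list you obtained from your network administrator, you may want to check that it covers the addresses you expect. The following is a minimal sketch using Python's standard ipaddress module. It isn't how the Coveo Platform evaluates the restriction. The function name is_allowed and the ranges in ALLOWED are hypothetical, and the addresses are reserved documentation ranges, not real ones.

```python
import ipaddress


def is_allowed(address, allowed_ranges):
    """Return True if the address falls within one of the allowed ranges.

    Ranges use CIDR notation; a single address is written with /32 (IPv4).
    """
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(r) for r in allowed_ranges)


# Hypothetical outgoing public addresses, including an HTTP proxy.
ALLOWED = ["203.0.113.0/24", "198.51.100.7/32"]

if __name__ == "__main__":
    print(is_allowed("203.0.113.42", ALLOWED))
    print(is_allowed("192.0.2.1", ALLOWED))
```

Running this kind of check against the public address of the Crawling Module server and of each outbound proxy helps confirm that the list is complete before you restrict the key.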

Moving the “Maestro” Data Folder

When you install Maestro, two Maestro folders are created on your server:

  • The Maestro installation folder, which contains binary files of executable code. These files are identical for all Crawling Module instances. This folder is located under C:\Program Files\Coveo.

  • The Maestro data folder, which contains logs, backups, and your pre-push extensions among other things. This folder is located under C:\ProgramData\Coveo by default.

Although C:\ProgramData\Coveo is the recommended location for the Maestro data folder, you may need to move your Crawling Module data elsewhere, for example due to a lack of space.

  1. Uninstall the Crawling Module by running the unins000 script located under C:\Program Files\Coveo\Maestro.

  2. Reinstall Maestro and, in the wizard, specify the desired folder location.

  3. Stop the Maestro and State Store services:

    1. Press the Windows key + R.

    2. In the Run box, type services.msc, and then press Enter.

    3. In the Services window that opens, select the Coveo Maestro Service, and then click Stop.

    4. Select the Coveo Maestro State Service, and then click Stop.

  4. Copy your old Maestro folder to the location you selected while reinstalling Maestro.

  5. Back in the Services window, select the Coveo Maestro Service and the Coveo Maestro State Service, and then click Restart.

  6. In the Maestro Swagger, make the /api/status/maestro call to confirm that Maestro is up and running again.

  7. Complete the Crawling Module redeployment.
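The copy in step 4 can also be scripted. The following is a minimal sketch, assuming Python 3.8 or later is available on the server and that both services are already stopped. The function name copy_maestro_data is hypothetical, and the folder paths are placeholders that you must replace with your own locations.

```python
import shutil
import tempfile
from pathlib import Path


def copy_maestro_data(old_folder, new_folder):
    """Copy the old Maestro data folder into the new location.

    Returns the relative paths of the files found in the new folder
    after the copy, so that the result can be reviewed.
    """
    old = Path(old_folder)
    new = Path(new_folder)
    if not old.is_dir():
        raise FileNotFoundError(f"Old data folder not found: {old}")
    # dirs_exist_ok lets the copy merge into the folder that the
    # reinstallation already created (requires Python 3.8+).
    shutil.copytree(old, new, dirs_exist_ok=True)
    return sorted(p.relative_to(new).as_posix() for p in new.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Demonstration with temporary folders instead of real Maestro folders.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old"
        (old / "Logs").mkdir(parents=True)
        (old / "Logs" / "example.log").write_text("sample")
        print(copy_maestro_data(old, Path(tmp) / "new"))
```

The old folder is copied rather than moved, so it remains available as a fallback until you have confirmed that Maestro is up and running again.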

Changing the Identity Used by the Maestro Service

By default, the Maestro Service runs using the LOCAL SYSTEM identity. However, you might need to change this identity, for instance to crawl and index a database with Windows integrated security.

  1. In the Windows Start menu, search for Services.

  2. In the Services window that appears, right-click the Coveo Maestro Service, and then select Properties.

  3. In the Coveo Maestro Service Properties window that opens, in the Log On tab, select This account, and then enter the credentials of the desired identity. Click OK.

  4. Restart the service to apply the change.

  5. Once the service is running again, link your Maestro deployment to your Coveo organization again.

Deploying Multiple Crawling Module Instances

A single Crawling Module instance can retrieve the content of several repositories, provided that the server on which you install the Crawling Module can access these repositories. Therefore, a single instance could be sufficient to meet your needs if you adjust your number of workers accordingly.

However, there are situations in which deploying the Crawling Module on more than one server would be relevant, such as when the repositories to make searchable are located on servers you don’t want to make accessible to the server on which you install the Crawling Module. The alternative is then to deploy a Crawling Module instance on each server on which there’s a repository to make searchable.

To deploy an additional Crawling Module instance, follow the deployment instructions. Then, if necessary, edit your Crawling Module sources to pair them with your new Crawling Module instance.

A Crawling Module instance can retrieve the content of multiple repositories, but a source of content can’t be shared between two Crawling Module instances. It can only be paired with a single instance.
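If you plan a deployment with several instances, the pairing rule above can be checked on paper or with a short script. The following is a minimal sketch. The function name find_shared_sources and the source and instance names are hypothetical, and the pairing list is something you would write yourself rather than data exported from Coveo.

```python
from collections import defaultdict


def find_shared_sources(pairings):
    """Return the sources that are paired with more than one instance.

    pairings is an iterable of (source_name, instance_name) tuples.
    """
    instances_by_source = defaultdict(set)
    for source, instance in pairings:
        instances_by_source[source].add(instance)
    return {
        source: sorted(instances)
        for source, instances in instances_by_source.items()
        if len(instances) > 1
    }


if __name__ == "__main__":
    planned = [
        ("HR file share", "instance-a"),
        ("Finance database", "instance-a"),
        ("Finance database", "instance-b"),  # Violates the rule above.
    ]
    print(find_shared_sources(planned))
```

An empty result means that every source in the plan is paired with a single instance, while one instance may still serve several sources.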

In the Coveo Administration Console, the Crawling Modules page lets you monitor your Crawling Module instances. Each instance is identified by a default name, which you can replace with something more convenient by editing the Maestro configuration of the desired instance.

Uninstalling the Crawling Module

If you need to uninstall your Crawling Module instance, type Apps & Features in the Windows Start menu, and then press Enter. In the Apps & Features window that opens, select Maestro, and then click Uninstall.
