Additional Configuration (Docker Version)

This article applies to the old Crawling Module, which works with Docker. If you are using the new Crawling Module, see Crawling Module Additional Configuration instead.

The old Crawling Module will soon reach its end-of-life. We recommend switching to the new Crawling Module, which doesn’t require Docker.

To identify the Crawling Module you’re currently using, on the Crawling Modules page of the Coveo Administration Console, look at the Maestro reported version:

  • Versions > 1: new Crawling Module

  • Versions < 1: Crawling Module with Docker

Once you have installed and configured the Coveo On-Premises Crawling Module, you can further configure it so that it meets your specific needs. The instructions in this article are optional, and there’s no obligation to make these changes if your environment doesn’t require them.

Switching to TLS 1.2 or Greater

On March 23, 2020, Coveo Cloud disabled the TLS 1.1 protocol on its platform because it’s no longer considered secure. Coveo Cloud now enforces TLS 1.2 for all client communications.

While the transition has taken place across the Coveo Cloud platform, it didn’t affect your Coveo On-Premises Crawling Module sources since they’re located in your on-premises environment. Therefore, the decision to keep crawling your content using TLS 1.1 or to switch to TLS 1.2 or greater is yours to make.

To switch to TLS 1.2 or greater

Switching to TLS 1.2 or greater is done on a per-source basis. You may therefore choose to have only some of your sources enforce the more recent versions of the protocol.

To have a source use TLS 1.2 or greater, open its JSON configuration and then, in the parameters object, add "EnforceOnlySecureSecurityProtocols": true.

Following the next successful rebuild operation, the source will use TLS 1.2 or greater protocols to crawl your content.
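As an illustration, the edit described above can be sketched in Python. This is a minimal sketch, assuming the source JSON configuration has already been loaded into a dictionary. The enforce_secure_protocols helper and the trimmed example configuration are hypothetical; only the parameters object and the EnforceOnlySecureSecurityProtocols key come from this article.

```python
import json


def enforce_secure_protocols(source_config: dict) -> dict:
    """Return a copy of the source configuration with the TLS flag set.

    Hypothetical helper: it only adds the key named in this article to the
    "parameters" object and leaves every other setting untouched.
    """
    # A JSON round trip gives a deep copy, so the caller's dictionary is not modified.
    updated = json.loads(json.dumps(source_config))
    updated.setdefault("parameters", {})["EnforceOnlySecureSecurityProtocols"] = True
    return updated


# Hypothetical, heavily trimmed source configuration used for illustration only.
example = {"name": "My source", "parameters": {}}
print(json.dumps(enforce_secure_protocols(example), indent=2))
```

The resulting JSON would then be saved back as the source configuration, and the change would take effect after the next successful rebuild.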

Subscribing to Crawling Module Deactivation Notifications

The Crawling Module updates automatically. However, if it’s unable to do so for a while, the workers stop until the Crawling Module is up to date again. For more details on the Crawling Module update process, see About Crawling Module Updates.

To help you monitor your Crawling Module deployment, the Crawling Modules page of the Coveo Administration Console displays the state of your Crawling Module components. The Activity Browser page also allows you to monitor the Crawling Module update and deactivation activities.

In addition, you may want to be notified when a Crawling Module component becomes obsolete and stops your workers. Should you receive such a deactivation notification, you could then update your Crawling Module to ensure that the content available through your Coveo-powered search interfaces reflects your actual data.

To receive a notification when the Crawling Module is outdated and deactivated

In the Notifications page, create a notification with the following JSON filter:

{
  "content": {},
  "operations": [
    "DISABLE"
  ],
  "resourceTypes": [
    "CRAWLING_MODULE"
  ],
  "resultTypes": [
    "*"
  ]
}

You are notified only when one of your Crawling Module components is out of date. No notification is sent when an up-to-date Crawling Module instance is down.

For more details on activity notifications, see Manage Notifications.

Changing the Default DNS Server Address

Domain Name System (DNS) servers translate host names to public IP addresses. For example, if you type coveo.com in your browser, a DNS server will translate it to the Coveo website server address.

Once you have authenticated to the platform and started a worker, you may notice that your worker can’t communicate with the Coveo Platform properly. The worker log should show the following error: System.Net.WebException: The remote name couldn't be resolved: 'api.cloud.coveo.com' (see About Logs).

To address this issue, you can change the default DNS server address.

In the Maestro Swagger, use the Modify Maestro’s configuration operation described in the following request template to change the default DNS server address.

Request template

PUT http://localhost:5000/api/config HTTP/1.1
Content-Type: application/json-patch+json

Payload

{
  "CustomDnsServers": ["<IPAddress>", "<IPAddress>"]
}

Replace <IPAddress> with the desired DNS server IP addresses.

You can enter several addresses, which are tried in order: if the first one fails to communicate with Coveo Cloud as expected, the next one is used, and so on.

For testing purposes, you can use the Google DNS server address: 8.8.8.8.

The body of a successful response is an empty JSON object ({}).

When you change the DNS server address, your workers automatically stop and restart. They will then try to communicate with the Coveo Platform using the specified DNS server IP address.

If you ever need to revert to the default DNS server IP address, use the same API call and, in the request payload, use an empty array for the CustomDnsServers parameter value.
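If you prefer a script to the Maestro Swagger, the same request can be sketched with the Python standard library. This is a minimal sketch, assuming Maestro listens on the address shown in the request template. The build_dns_request helper is hypothetical, and the actual network call is left commented out so that nothing is sent until you run it on the Crawling Module server.

```python
import json
import urllib.request

# Address taken from the request template in this article.
MAESTRO_CONFIG_URL = "http://localhost:5000/api/config"


def build_dns_request(dns_servers: list[str]) -> urllib.request.Request:
    """Build the PUT request that sets (or clears) the custom DNS servers.

    Hypothetical helper: pass an empty list to revert to the default DNS server.
    """
    body = json.dumps({"CustomDnsServers": dns_servers}).encode("utf-8")
    return urllib.request.Request(
        MAESTRO_CONFIG_URL,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json-patch+json"},
    )


request = build_dns_request(["8.8.8.8"])

# Uncomment on the Crawling Module server to send the request:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Calling build_dns_request([]) produces the payload with an empty CustomDnsServers array, which corresponds to reverting to the default DNS server.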

Adding a DNS Suffix

A DNS suffix is added to an unresolved domain name when searching for an IP address. For example, if you have a coveo.com DNS suffix and want your Crawling Module source to index https://docs.coveo.com, you can provide the short form https://docs as the source starting address, and it will be resolved to the full address.

In the Maestro Swagger, use the Modify Maestro’s configuration operation described in the following request template to add a DNS suffix.

Request template

PUT http://localhost:5000/api/config HTTP/1.1
Content-Type: application/json-patch+json

Payload

{
  "CustomDnsSuffixes": ["<Suffix>", "<Suffix>"]
}

Replace <Suffix> with the desired DNS suffixes.

URL short forms aren’t allowed in the Coveo Administration Console source configuration panels. Therefore, to create a source with a short form starting address, you must use the Coveo Platform API.
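To make the idea of a DNS suffix concrete, the following Python sketch lists the full host names that a short form could expand to. It only illustrates the concept described in this section and isn't the resolution logic used by Maestro or by your operating system. The expand_short_host function is hypothetical.

```python
def expand_short_host(short_host: str, suffixes: list[str]) -> list[str]:
    """List the host names obtained by appending each DNS suffix, in order."""
    # A leading dot on a suffix is dropped so that "docs" + ".coveo.com"
    # and "docs" + "coveo.com" both give "docs.coveo.com".
    return [f"{short_host}.{suffix.lstrip('.')}" for suffix in suffixes]


print(expand_short_host("docs", ["coveo.com"]))  # prints ['docs.coveo.com']
```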

Using a Proxy

Due to your network configuration, you may want the Crawling Module to download new Docker images and to communicate with Coveo Cloud through a proxy.

  1. In the Maestro Swagger, use the Modify Maestro’s configuration operation described in the following request template to provide a proxy address starting with http:// or https://.

    Request template

     PUT http://localhost:5000/api/config HTTP/1.1
     Content-Type: application/json-patch+json
    

    Payload

     {
       "ProxyAddress": "<Address>"
     }
    

    Replace <Address> with the desired proxy address.

    The body of a successful response is an empty JSON object ({}).

  2. Configure Docker so that it uses the same proxy (see Proxy Configuration).

When you change the ProxyAddress setting, your workers will automatically stop and restart. The process may cause a brief downtime.

To stop using the proxy and return to a direct connection, enter an empty ProxyAddress value ("") in Maestro’s JSON configuration and configure Docker accordingly.

Proxy settings are used only for communication with the Coveo Platform. Crawlers and security workers running on your server don’t use the proxy settings when crawling content through an HTTP connection. This means that your server must be able to access your content without going through the proxy.
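The proxy payload can also be prepared and checked in a script before you send it. This is a minimal sketch in Python, assuming the only constraints are the ones stated in this section: the address starts with http:// or https://, and an empty string stops using the proxy. The build_proxy_payload helper and the example proxy address are hypothetical.

```python
import json


def build_proxy_payload(address: str) -> str:
    """Return the JSON payload for the ProxyAddress setting.

    Hypothetical helper: an empty address is accepted because it corresponds
    to returning to a direct connection.
    """
    if address and not address.startswith(("http://", "https://")):
        raise ValueError("Proxy address must start with http:// or https://")
    return json.dumps({"ProxyAddress": address})


# Hypothetical proxy address used for illustration only.
print(build_proxy_payload("http://proxy.example.com:8080"))
print(build_proxy_payload(""))
```

The returned string is meant to be used as the body of the PUT request shown in the request template. Docker still has to be configured separately.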

Deploying Multiple Crawling Module Instances

A single Crawling Module instance can retrieve the content of several repositories, provided that the server on which you install the Crawling Module can access these repositories. Therefore, a single instance could be sufficient to meet your needs if you adjust your number of workers accordingly.

However, deploying the Crawling Module on more than one server makes sense in some situations, for example when the repositories to make searchable are located on servers that you don’t want to expose to the server hosting the Crawling Module. In that case, deploy a Crawling Module instance on each server that hosts a repository to make searchable.

To deploy an additional Crawling Module instance, follow the deployment instructions. Then, if necessary, edit your Crawling Module sources to pair them with your new Crawling Module instance.

A Crawling Module instance can retrieve the content of multiple repositories, but a source of content can’t be shared between two Crawling Module instances. It can only be paired with a single instance.

In the Coveo Administration Console, the Crawling Modules page lets you monitor your Crawling Module instances. Each instance is identified by a default name, which you can replace with something more convenient by editing the Maestro configuration of the desired instance.
