Crawling Module REST API Reference

Maestro is driven using a REST API and listens on port 5000 by default. Since no UI is available yet to manage your workers through Maestro, you must use Swagger at http://localhost:5000/api/swagger/.

Coveo only supports managing Crawling Modules using Swagger. If you want to use a different tool (e.g., PowerShell), keep in mind that the Coveo Support team only offers assistance with Swagger.

Linking Maestro to the Coveo Cloud Platform

Before providing your content to the Coveo Cloud platform, you must specify in which organization you want to index your content. When you link Maestro to an organization on the Coveo Cloud platform, an API key is created for you to provide with subsequent calls. Typically, you only need to follow these instructions once.

  1. If you want the Crawling Module to communicate with Coveo Cloud through a proxy, specify its address in the Crawling Module configuration.

  2. Click here (or here if you have a HIPAA organization) to initiate the OAuth handshake.

  3. Log in to Coveo Cloud.

  4. On the Grant Access page, click the drop-down menu to select the organization to link with the Coveo On-Premises Crawling Module, and then click Next.

  5. Click Authorize to allow access.

    In the Coveo Cloud administration console, on the API Keys page, an API key named Crawling modules is created. This API key is encrypted and saved on your disk.

    Coveo Cloud Administration Console API Keys Page

  6. Optionally, you may want to restrict the IP addresses from which the Crawling modules API key can be used in requests to the Coveo Cloud platform. It’s recommended that you only allow addresses that are likely to be used by the server on which you installed Maestro. This ensures that the API key can only be used in requests that originate from the network associated with your organization.

    1. Obtain the outgoing public IP addresses of the server on which you installed the Crawling Module. Since a server can access the Internet with different IP addresses, you should request that your network administrator provide a complete list of outgoing public IP addresses, including those used by HTTP proxies for outbound communication.

    2. In the Coveo Cloud administration console, on the API Keys page, double-click the Crawling modules API key to edit it (see Adding and Managing API Keys).

    3. Under Allowed IPs, add the outgoing IP addresses.

    4. Click Save.

Viewing Your On-Premises Crawling Module Configuration

The /api/config GET call allows you to review the Coveo On-Premises Crawling Module configuration by returning your:

  • Organization ID

  • Crawling Module setup name

  • Number of workers

  • Automatic update time in your local time zone

  • Log retention period in days

You can use this call to check that your configuration is adequate (see Editing Your On-Premises Crawling Module Configuration).

In the Maestro Swagger, use the /api/config GET call to get the Crawling Module configuration.

Request template

GET http://localhost:5000/api/config HTTP/1.1
 
Accept: application/json

The body of a successful response contains information regarding the Crawling Module configuration.

200 OK response body

{
  "OrganizationId": "MyCoveoCloudOrganization",
  "Name": "EU_CoveoCrawlingModule",
  "NumberOfWorkers": 4
}
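For scripted access, the same request can be issued with Python's standard library. A minimal sketch, assuming Maestro listens on the default port 5000; `build_config_request` and `parse_config` are illustrative helpers, not part of Maestro:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # Maestro's default port; adjust if needed

def build_config_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET /api/config request shown in the template above."""
    return urllib.request.Request(
        f"{base_url}/api/config",
        headers={"Accept": "application/json"},
        method="GET",
    )

def parse_config(body: str) -> dict:
    """Decode the JSON response body."""
    return json.loads(body)

# To actually send the request (requires a running Maestro instance):
#     with urllib.request.urlopen(build_config_request()) as resp:
#         config = parse_config(resp.read().decode("utf-8"))
```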

Editing Your On-Premises Crawling Module Configuration

In the Maestro Swagger, use the /api/config PUT call to specify which Crawling Module configuration values to modify.

Request template

PUT http://localhost:5000/api/config HTTP/1.1
 
Content-Type: application/json-patch+json

Include the following key-value pairs in the request body:

  • "Name": "<Name>"

  • "NumberOfWorkers": <NumberOfWorkers>

  • "NumberOfSecurityWorkers": <NumberOfSecurityWorkers>

  • "LogRetentionPeriodInDays": <NumberOfDays>

  • "AutoUpdateTriggerTime": "<time>"

  • "CustomDnsServers": ["<IPAddress>", "<IPAddress>"]

  • "CustomDnsSuffixes": ["<Suffix>", "<Suffix>"]

  • "ProxyAddress": "<Address>"

where:

  • <Name> is a name identifying your Crawling Module instance. This value appears in the Crawling Module page of the Coveo Cloud Administration Console, as well as in source configuration panels. Only alphanumeric characters, dashes and underscores are allowed.

  • <NumberOfWorkers> is an integer value that represents the desired number of workers.

  • <NumberOfSecurityWorkers> is an integer value that represents the desired number of security workers. Security workers are only required for some Crawling Module secured sources. See About Crawling Module Secured Sources for details.

  • <NumberOfDays> is an integer value that represents the number of days that logs are kept before being automatically deleted. By default, this retention period is 30 days, which is also the minimum allowed. The maximum is 730 days (2 years).

  • <time> is the time at which the automatic update process starts, in the format HH:mm:ss. The default is 23:00:00, and the time zone used is that of your server.

  • <IPAddress> is an element in an array of custom DNS server addresses.

  • <Suffix> is an element in an array of DNS suffixes.

  • <Address> is a proxy address that starts with http:// or https://.

Only the parameters you wish to modify are required in the request payload.

Modifying the name of your Crawling Module setup

{
  "Name": "MyCrawlingModule"
}

The body of a successful response is an empty JSON object ({}).
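Since the payload constraints above are easy to get wrong, you may want to validate them client-side before issuing the PUT. A Python sketch enforcing only the rules stated in this section; `build_config_patch` is a hypothetical helper, not part of Maestro:

```python
import json
import re

def build_config_patch(**fields) -> str:
    """Validate and serialize a partial payload for PUT /api/config.

    Only the fields you pass are included, since only the parameters
    you want to modify are required in the request body.
    """
    name = fields.get("Name")
    if name is not None and not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError("Name allows only alphanumeric characters, dashes, and underscores")
    days = fields.get("LogRetentionPeriodInDays")
    if days is not None and not 30 <= days <= 730:
        raise ValueError("LogRetentionPeriodInDays must be between 30 and 730")
    trigger = fields.get("AutoUpdateTriggerTime")
    if trigger is not None and not re.fullmatch(r"(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d", trigger):
        raise ValueError("AutoUpdateTriggerTime must use the HH:mm:ss format")
    return json.dumps(fields)

# Example: rename the setup, as in the request body shown above.
payload = build_config_patch(Name="MyCrawlingModule")
print(payload)  # {"Name": "MyCrawlingModule"}
```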

Viewing the Unique Identifier of Your Crawling Module Instance

The /api/config/id GET call allows you to review the unique identifier of your Crawling Module instance. This ID is also visible in the Crawling Module page of the Coveo Cloud administration console.

Request template

GET http://localhost:5000/api/config/id HTTP/1.1
 
Accept: application/json

The body of a successful response contains the Crawling Module unique identifier.

200 OK response body

"coveoorganization-345238a4-298h-8e3v-58467815481d"

Creating a New Unique Identifier

When you deploy the Crawling Module, a unique identifier is generated for your instance. However, if you ever need to make a copy of your Crawling Module instance, you will then have two instances with the same identifier. To prevent communication issues, you must manually generate a new unique identifier for one of these instances.

In the Maestro Swagger, use the /api/config/id PUT call to generate a new unique identifier.

Request template

PUT http://localhost:5000/api/config/id HTTP/1.1
 
Accept: application/json

The body of a successful response contains the new unique identifier.

200 OK response body

"coveoorganization-h7d4h6137-w745-5f93-72h95ca314"

Executing a Command Inside the Workers

The /api/debug/command call allows you to run a command inside running workers for debugging purposes. This command is executed inside PowerShell. The result is identical to that of the same command executed from a PowerShell prompt inside the Docker container.

Enter your command as a string between double quotes in the request body.

  1. Ping any address to validate that your workers can access the Internet and resolve URLs.

    POST http://localhost:5000/api/debug/command HTTP/1.1
        
    Content-Type: application/json-patch+json
    

    Payload

    "ping 8.8.8.8"
    

    The body of a successful response indicates that the specified address has been pinged:

    200 OK response body

    [
      "Pinging 8.8.8.8 with 32 bytes of data:",
      "Reply from 8.8.8.8: bytes=32 time=12ms TTL=59",
      "Ping statistics for 8.8.8.8:",
      "    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),",
      "Approximate round trip times in milli-seconds:",
      "    Minimum = 12ms, Maximum = 12ms, Average = 12ms"
    ]
    
  2. Ping api.cloud.coveo.com to validate that your workers can talk to Coveo.

    POST http://localhost:5000/api/debug/command HTTP/1.1
        
    Content-Type: application/json-patch+json
    

    Payload

    "ping api.cloud.coveo.com"
    

    The body of a successful response indicates that the specified address has been pinged:

    200 OK response body

    [
      "Pinging d23op0vm8dczg1.cloudfront.net [52.84.96.152] with 32 bytes of data:",
      "Reply from 52.84.96.152: bytes=32 time=33ms TTL=241",
      "Ping statistics for 52.84.96.152:",
      "    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),",
      "Approximate round trip times in milli-seconds:",
      "    Minimum = 33ms, Maximum = 33ms, Average = 33ms"
    ]
    
  3. Use the following command to validate that your file share is reachable.

    POST http://localhost:5000/api/debug/command HTTP/1.1
        
    Content-Type: application/json-patch+json
    

    Payload

    "net use \\\\MyMachine.corp.mycompany.com\\Users"
    

    Ensure that you enter the full DNS name (e.g., corp.mycompany.com) and escape every backslash.

    The response body indicates whether the command was successful:

    200 OK response body

    [
      "The command completed successfully."
    ]
    

    If the call returns an error similar to the following:

    [
      "Enter the user name for '\\\\MyMachine.corp.mycompany.com': \u0000",
      "System error 1223 has occurred.",
      "The operation was canceled by the user."
    ]
    

    Enter the following command:

    "net use \\\\MyMachine.corp.mycompany.com\\Folder /user:DOMAIN\\username password"
    

Creating a Compressed Logs Archive

The /api/debug/logs call creates a compressed archive containing all currently available Crawling Module logs, then returns the path to the archive in its response body.

The body of a successful response should be similar to the following:

200 OK Response body

C:\ProgramData\Coveo\data\CrawlingModuleLogs-20191018103928.zip

It’s recommended that you kill the workers before creating a log archive to ensure that the collected logs are not actively being written.

Viewing the Available Drivers for an ODBC Source

The /api/odbc/drivers operation allows you to view which drivers you can specify in your ODBC source connection string (see Creating a Crawling Module Source).

Request template

GET http://localhost:5000/api/odbc/drivers HTTP/1.1
 
Accept: application/json

The body of a successful response looks like the example below. X64Drivers lists the available 64-bit drivers, while the 32-bit drivers are listed under X86Drivers:

200 OK response body

{
  "X64Drivers": [
    "SQL Server",
    "PostgreSQL ANSI(x64)",
    "PostgreSQL Unicode(x64)",
    "MySQL ODBC 5.3 ANSI Driver",
    "MySQL ODBC 5.3 Unicode Driver",
    "Oracle in instantclient_12_2"
  ],
  "X86Drivers": [
    "Driver da Microsoft para arquivos texto (*.txt; *.csv)",
    "Driver do Microsoft Access (*.mdb)",
    "Driver do Microsoft dBase (*.dbf)",
    "Driver do Microsoft Excel(*.xls)",
    "Driver do Microsoft Paradox (*.db )",
    "Microsoft Access Driver (*.mdb)",
    "Microsoft Access-Treiber (*.mdb)",
    "Microsoft dBase Driver (*.dbf)",
    "Microsoft dBase-Treiber (*.dbf)",
    "Microsoft Excel Driver (*.xls)",
    "Microsoft Excel-Treiber (*.xls)",
    "Microsoft ODBC for Oracle",
    "Microsoft Paradox Driver (*.db )",
    "Microsoft Paradox-Treiber (*.db )",
    "Microsoft Text Driver (*.txt; *.csv)",
    "Microsoft Text-Treiber (*.txt; *.csv)",
    "SQL Server"
  ]
}

The Oracle in instantclient_12_2 driver supports Oracle database versions 10 and over.
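When scripting source setup, you can verify that the driver named in your ODBC connection string is actually installed. A Python sketch against the response shape shown above; `driver_available` is an illustrative helper:

```python
import json

def driver_available(response_body: str, driver: str, use_64bit: bool = True) -> bool:
    """Check whether the named driver appears in the /api/odbc/drivers response."""
    drivers = json.loads(response_body)
    pool = drivers["X64Drivers"] if use_64bit else drivers["X86Drivers"]
    return driver in pool
```

For example, `driver_available(body, "SQL Server")` checks the 64-bit list, while passing `use_64bit=False` checks the 32-bit list instead.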

Viewing the Crawling Module Status

The /api/status call returns the Coveo On-Premises Crawling Module status. After you execute the /api/workers/start command, the response body also reports the download and extraction progress of the container images.

In the Maestro Swagger, use the /api/status operation to view the status of the Crawling Module.

Request template

GET http://localhost:5000/api/status HTTP/1.1
 
Accept: application/json

The body of a successful response contains information regarding the Crawling Module status. Click Try it out again to send another request and get updated information.

When installing or updating the Crawling Module, as you click Try it out, the response body changes as follows:

  1. Under Downloads, DownloadPercentage indicates the progress of the database image (crawling-module-mysql) download process.

  2. Once DownloadPercentage reaches 100%, the file extraction process starts. Its progress is displayed next to ExtractionPercentage.

  3. Once ExtractionPercentage reaches 100%, Status goes from Pulling image to Pull complete.

  4. Under Downloads, a new object appears. It displays the same information regarding the worker image (crawling-module-worker) that was previously shown for the database image (see steps 1 to 3).

  5. Once the worker images are extracted, under Containers, you should see a database object (crawling-module-mysql) and one or more worker objects (crawling-module-worker), depending on the number of workers you specified in the Crawling Module configuration.

  6. When the Status values for the MySQL database and workers are all Running, you can create a Crawling Module source. The workers and database are also given an Id and a timestamp representing the last time their status changed.

Viewing the status of a Crawling Module with two workers once the entire download, extraction, and starting process is complete

200 OK response body

{
  "Downloads": {
    "064790157154.dkr.ecr.us-east-1.amazonaws.com/connectors/crawling-module-mysql": {
      "IsCompleted": true,
      "CompletionDate": "2019/07/04 17:58:50 +00:00",
      "DownloadPercentage": 100,
      "ExtractionPercentage": 100,
      "Status": "Pull complete"
    },
    "064790157154.dkr.ecr.us-east-1.amazonaws.com/connectors/crawling-module-worker": {
      "IsCompleted": true,
      "CompletionDate": "2019/07/04 18:01:06 +00:00",
      "DownloadPercentage": 100,
      "ExtractionPercentage": 100,
      "Status": "Pull complete"
    }
  },
  "Containers": {
    "HealthyContainers": [
      {
        "Id": "04e809b8573ffc112792a6edeed2246be98809f784e4e564c399818a952c5a33",
        "Image": "064790157154.dkr.ecr.us-east-1.amazonaws.com/connectors/crawling-module-mysql@sha256:8dc906ccf11e392e3698270ae65bcb0d9e74ec6e8840346994ffaf6ef7c71959",
        "Status": "running",
        "LastStatusChange": "2019/07/02 14:13:49 +00:00"
      },
      {
        "Id": "ca9ed5a1f0405e1d9e11522fae5dc1490565eeb923f3c163d1aaca01f861d765",
        "Image": "064790157154.dkr.ecr.us-east-1.amazonaws.com/connectors/crawling-module-worker@sha256:e87e9930f41924a3a03129e2f60bdd74acc82879a6f21c4363073af5680a4e2a",
        "Status": "running",
        "LastStatusChange": "2019/07/04 8:33:42 +00:00"
      },
      {
        "Id": "b37c4caecfd5ab212077db6c974a2c0fca6bbf8c5e1a536277ba5493fff33e3f",
        "Image": "064790157154.dkr.ecr.us-east-1.amazonaws.com/connectors/crawling-module-worker@sha256:e87e9930f41924a3a03129e2f60bdd74acc82879a6f21c4363073af5680a4e2a",
        "Status": "running",
        "LastStatusChange": "2019/07/04 8:33:22 +00:00"
      }
    ],
    "DiscardedContainers": [],
    "MissingContainers": []
  }
}
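Steps 1 to 6 above can be checked programmatically. The following Python sketch, based on the response shape shown, reports whether the Crawling Module is ready for source creation (all image pulls complete, no missing containers, and every healthy container running); `is_ready` is a hypothetical helper:

```python
import json

def is_ready(status_body: str) -> bool:
    """True when every image pull is complete and every container is running."""
    status = json.loads(status_body)
    downloads_done = all(d["IsCompleted"] for d in status["Downloads"].values())
    containers = status["Containers"]
    return (
        downloads_done
        and bool(containers["HealthyContainers"])
        and all(c["Status"] == "running" for c in containers["HealthyContainers"])
        and not containers["MissingContainers"]
    )
```

When `is_ready` returns True, you can create a Crawling Module source.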

Viewing the Database, Maestro, and Worker Versions

The /api/status/versions call returns the database, Maestro, and worker versions in the response body.

200 OK response body

{
  "DatabaseVersion": "8.0.3052.1",
  "MaestroVersion": "0.3.47.0",
  "WorkerVersion": "8.0.3052.1"
}

You can compare these version numbers with those returned by the /rest/organizations/{organizationId}/crawlingmodule/versions/latest Coveo Platform API call at https://platform.cloud.coveo.com/docs?api=Platform#!.
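A script can flag an out-of-date installation by comparing the two version payloads. The sketch below assumes the platform call returns the same three keys as /api/status/versions, which you should verify against the Platform API documentation:

```python
import json

def outdated_components(local_body: str, latest_body: str) -> list[str]:
    """List the components whose local version differs from the latest available one."""
    local = json.loads(local_body)
    latest = json.loads(latest_body)
    return [
        key
        for key in ("DatabaseVersion", "MaestroVersion", "WorkerVersion")
        if local.get(key) != latest.get(key)
    ]
```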

Updating Maestro

The Crawling Module updates automatically as of version 0.3, so you should not need to use the /api/update/maestro call. However, if you ever need to update Maestro manually (e.g., if instructed to do so by the Coveo Support team), use this call to launch the update process.

  • Close any open command prompts in your Crawling Module folder, as they prevent the update process from editing the files in the folder.
  • During the update process, a copy of your obsolete Crawling Module folder is saved under C:\ProgramData\Coveo\data\BackupPackage. If necessary, you can revert to the previous version by replacing the new Crawling Module folder with this copy.
  • When Maestro is two or more versions out of date, it stops the workers. If you use the /api/workers/start call, you get the following response body (see Starting the Workers and the Database):

    {
      "Message": "Your Crawling Module is using obsolete versions.",
      "Details": "Update both Maestro and the workers before reattempting this operation."
    }
    

    You must update Maestro and the workers manually to resume the crawling process (see Updating the Workers).

Updating the Workers

Typically, you should only use the /api/update/workers call when prompted to do so in the /api/workers/start response body, or if instructed by the Coveo Support team (see Starting the Workers and the Database).

Updating the workers can take several minutes and requires at least 30 GB of available disk space.

If the workers were already running, they automatically stop when the download finishes, even if they were crawling your content. They then restart and resume the crawling operation.

You can view the progress of the update using the /api/status call (see Viewing the Crawling Module Status).

When workers are two or more versions out of date, they stop. If you use the /api/workers/start call, you will get the following response body (see Starting the Workers and the Database):

{
  "Message": "Your Crawling Module is using obsolete versions.",
  "Details": "Update both Maestro and the workers before reattempting this operation."
}

You must update Maestro and the workers manually to resume the crawling process.

Starting the Workers and the Database

The Coveo On-Premises Crawling Module creates a MySQL database in Docker in addition to the workers (see Crawling Module Workflow).

The first time the /api/workers/start operation is executed, it downloads the Docker images from the Amazon Elastic Container Registry (ECR), creates the database, and starts the required number of workers. Subsequently, since the database has already been created, the operation only starts the workers.

This call starts the Crawling Module workers using the current configuration. To ensure that you start the workers with the appropriate configuration, you should make an /api/config GET request before using this API call (see Viewing Your On-Premises Crawling Module Configuration).

In the Maestro Swagger, use the /api/workers/start operation to create a database and start the workers.

Request template

POST http://localhost:5000/api/workers/start HTTP/1.1

The body of a successful response is an empty JSON object ({}).

  • Depending on your network connection, the image download and extraction processes triggered by the initial start call may take several minutes. Expect an image size of around 7 GB.

  • Use the /api/status operation to see the progress of the download process. A successful request returns a Status of 200 OK, and the download and extraction percentages are displayed in the Response Body.

  • Older versions of the Symantec Endpoint Protection antivirus software are known to cause an error during the image extraction process. Upgrading to the latest version should solve this issue.
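Because the initial start can take several minutes, scripts typically poll /api/status until the containers are running. A Python sketch with the HTTP fetch injected as a callable so the waiting logic stays testable; the readiness rule mirrors the status fields documented above, and `wait_until_running` is a hypothetical helper:

```python
import json
import time
from typing import Callable

def wait_until_running(get_status: Callable[[], str],
                       timeout_s: float = 900,
                       poll_interval_s: float = 5,
                       sleep=time.sleep) -> bool:
    """Poll until every healthy container reports a 'running' Status.

    get_status returns the JSON body of GET /api/status; in production it
    would issue the HTTP request, here it is injected for testability.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        containers = json.loads(get_status())["Containers"]["HealthyContainers"]
        if containers and all(c["Status"] == "running" for c in containers):
            return True
        sleep(poll_interval_s)
    return False
```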

Stopping the Workers and the Database

If you ever need to stop the Crawling Module, use the /api/workers/stop call.

When you stop your Crawling Module workers, they no longer request and execute jobs to update the Coveo Cloud index. As a result, the content displayed in a Coveo search interface may become outdated: changes in your on-premises system are no longer pushed to Coveo Cloud.

In the Maestro Swagger, use the /api/workers/stop operation to stop the workers.

Request template

POST http://localhost:5000/api/workers/stop HTTP/1.1

The body of a successful response is an empty JSON object ({}).

When you’re ready, you can restart your workers using the /api/workers/start call.

Killing the Workers and Removing the Database

The /api/workers/kill call kills the workers to stop the crawling process and removes the database container. However, the database data and your configuration remain on your machine, so you don’t need to relink Maestro to the Coveo Cloud platform when you’re ready to begin crawling again. You can immediately resume crawling with the /api/workers/start call (see Starting the Workers and the Database).

If you only want to stop the crawling process, use the /api/workers/stop call (see Stopping the Workers and the Database). This will speed up the worker restart process (see Starting the Workers and the Database). You should not need to kill the workers unless you encounter an exceptional situation.

In the Maestro Swagger, use the /api/workers/kill operation to kill the workers and delete the database.

Request template

POST http://localhost:5000/api/workers/kill HTTP/1.1
 
Accept: application/json

The body of a successful response is an empty JSON object ({}).
