Crawling Module REST API Reference

This article applies to the new Crawling Module, which works without Docker. If you still use the Crawling Module with Docker, see Crawling Module REST API Reference (Docker Version) instead. You might also want to read about the advantages of the new Crawling Module.

To identify the Crawling Module you’re currently using, on the Crawling Modules page of the Coveo Administration Console, look at the Maestro reported version:

  • Versions > 1: new Crawling Module

  • Versions < 1: Crawling Module with Docker

Maestro is driven using a REST API and listens on port 5000 by default. Since not all Crawling Module management operations are available in the Coveo Administration Console yet, you must use Swagger at http://localhost:5000/api/swagger/ to accomplish most of them. If you decided to use a different service port while installing Maestro, go to the corresponding address instead (e.g., http://localhost:5001/api/swagger/ if you chose to use port 5001).

Coveo only supports managing Crawling Modules using Swagger. If you want to use a different tool (e.g., PowerShell), keep in mind that the Coveo Support team only offers help with Swagger.
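Because Coveo Support only offers help with Swagger, treat the following Python sketch as an illustration of how the request addresses in this article are put together, not as a supported tool. The helper names maestro_url and maestro_get are hypothetical. The sketch builds a request without sending it.

```python
from urllib.request import Request


def maestro_url(path: str, port: int = 5000, host: str = "localhost") -> str:
    """Return the full address of a Maestro API path such as 'status/maestro'."""
    return f"http://{host}:{port}/api/{path.lstrip('/')}"


def maestro_get(path: str, port: int = 5000) -> Request:
    """Build (but do not send) a GET request that asks for a JSON response."""
    return Request(
        maestro_url(path, port),
        headers={"Accept": "application/json"},
        method="GET",
    )


# Maestro installed on port 5001 instead of the default 5000.
request = maestro_get("status/maestro", port=5001)
print(request.get_method(), request.full_url)
# prints: GET http://localhost:5001/api/status/maestro
```

To actually send such a request, you could pass it to urllib.request.urlopen on the server where Maestro runs.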

Authentication

The /api/authorize/url call returns the URL at which you should log in with a Coveo Cloud account that has the privilege to create API keys. As you log in, you perform a handshake with the Coveo Platform and create an API key for your Crawling Module instance to use when communicating with the Platform.

Request template

GET http://localhost:5000/api/authorize/url HTTP/1.1
Accept: application/json

200 OK response body

{
  https://platform.cloud.coveo.com/oauth/authorize?response_type=token&client_id=CrawlingModule&scope=full&redirect_uri=http://localhost:5000/oauth/receive_token.html
}
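The returned login URL carries a few query parameters, including the redirect address that points back to your Maestro port. As a minimal sketch, the standard Python urllib.parse module can split the example URL above into its parts, which is a quick way to confirm that the redirect address matches the port Maestro listens on.

```python
from urllib.parse import parse_qs, urlsplit

# Example login URL, copied from the response body above.
login_url = (
    "https://platform.cloud.coveo.com/oauth/authorize"
    "?response_type=token&client_id=CrawlingModule&scope=full"
    "&redirect_uri=http://localhost:5000/oauth/receive_token.html"
)

parts = urlsplit(login_url)
# parse_qs maps each parameter name to a list of values; keep the first one.
params = {name: values[0] for name, values in parse_qs(parts.query).items()}

print(parts.netloc)
print(params["redirect_uri"])
print(urlsplit(params["redirect_uri"]).port)
```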

Linking the Crawling Module to an Organization

The /api/authorize call allows you to send an authorization token to complete the process of linking your Crawling Module instance to a Coveo organization. Typically, you shouldn’t need to use this call unless the linking process fails.

Before you use this call, ensure that your Maestro settings contain the correct CoveoEnvironment value. The CoveoEnvironment parameter represents the type of organization you want to link to your Crawling Module instance. Possible values are Production and Hipaa. Should the Crawling Module use a proxy to communicate with Coveo Cloud, specify its address and credentials as well.

In the request, you must provide the token you previously obtained.

The body of a successful response is an empty JSON object ({}).

Once you have linked the Crawling Module to a Coveo organization, you can use the /api/authorize/verify call to confirm the success of the linking process.

Request template

GET http://localhost:5000/api/authorize/verify HTTP/1.1
Accept: application/json

The body of a successful response is an empty JSON object ({}).

Configuration

Getting the Crawling Module Configuration

The /api/config GET call allows you to review your Coveo On-Premises Crawling Module configuration. You can use this call to check that your configuration is adequate, and then edit it if needed.

Request template

GET http://localhost:5000/api/config HTTP/1.1
Accept: application/json

The body of a successful response contains information regarding the Crawling Module configuration.

200 OK response body

{
  "OrganizationId": "connectorsteamtestsmf76kcam",
  "Name": "MyCompanysCrawlingModule",
  "LogRetentionPeriodInDays": 30,
  "AutoUpdateTriggerTime": "23:00:00",
  "NumberOfCrawlerWorkers": 2,
  "NumberOfSecurityWorkers": 1
}

Editing the Crawling Module Configuration

Use the /api/config PUT call to provide new values for your Crawling Module configuration parameters.

Request template

PUT http://localhost:5000/api/config HTTP/1.1
Content-Type: application/json-patch+json

Include the key-value pairs to modify in the request body. Possible pairs are:

  • "Name": "<NAME>"

  • "LogRetentionPeriodInDays": <NUMBER_OF_DAYS>

  • "AutoUpdateTriggerTime": "<TIME>"

  • "NumberOfCrawlerWorkers": <NUMBER_OF_CONTENT_WORKERS>

  • "NumberOfSecurityWorkers": <NUMBER_OF_SECURITY_WORKERS>

where:

  • <NAME> is a name identifying your Crawling Module instance. This value appears on the Crawling Modules page of the Coveo Administration Console, as well as in your Crawling Module source configuration panels. Only alphanumeric characters, dashes and underscores are allowed.

  • <NUMBER_OF_DAYS> is an integer value that represents the number of days that logs are kept before being automatically deleted. By default, this retention period is 30 days, which is also the minimum allowed. The maximum is 730 days (2 years).

  • <TIME> is the time at which the automatic update process starts, in the format HH:mm:ss. The default is 23:00:00, and the time zone used is that of your server.

  • <NUMBER_OF_CONTENT_WORKERS> is an integer value that represents the desired number of content workers.

  • <NUMBER_OF_SECURITY_WORKERS> is an integer value that represents the desired number of security workers. Security workers are only required for some Crawling Module sources that index permissions ("sourceVisibility": "SECURED"). See Indexing Secured Content for details.

Only the parameters you want to modify are required in the request payload.

Modifying the name of your Crawling Module instance

{
  "Name": "MyCrawlingModule"
}

The body of a successful response is an empty JSON object ({}).
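The constraints listed above (allowed name characters, the 30 to 730 day retention range, and the HH:mm:ss time format) can be checked before a request is sent. The following Python sketch is one way to do so. The function name build_config_update is hypothetical, the name check assumes ASCII letters and digits, and the sketch only builds the PUT request without sending it.

```python
import json
import re
from urllib.request import Request

ALLOWED_KEYS = {
    "Name",
    "LogRetentionPeriodInDays",
    "AutoUpdateTriggerTime",
    "NumberOfCrawlerWorkers",
    "NumberOfSecurityWorkers",
}
# Assumption: "alphanumeric" means ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
TIME_PATTERN = re.compile(r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d")


def build_config_update(**changes) -> Request:
    """Validate the given parameters and build (but do not send) the PUT request."""
    unknown = set(changes) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    if "Name" in changes and not NAME_PATTERN.fullmatch(changes["Name"]):
        raise ValueError("Name allows only alphanumeric characters, dashes, and underscores")
    if "LogRetentionPeriodInDays" in changes and not 30 <= changes["LogRetentionPeriodInDays"] <= 730:
        raise ValueError("LogRetentionPeriodInDays must be between 30 and 730")
    if "AutoUpdateTriggerTime" in changes and not TIME_PATTERN.fullmatch(changes["AutoUpdateTriggerTime"]):
        raise ValueError("AutoUpdateTriggerTime must use the HH:mm:ss format")
    return Request(
        "http://localhost:5000/api/config",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json-patch+json"},
        method="PUT",
    )


# Only the parameters to modify are included in the payload.
request = build_config_update(Name="MyCrawlingModule", LogRetentionPeriodInDays=90)
print(request.get_method(), request.data.decode("utf-8"))
```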

Editing Sensitive Configuration Parameters

The /api/config/sensitive call allows you to change the password of your proxy or database. See Password Update for details on when to use this call.

The body of a successful response is an empty JSON object ({}).

Getting the Crawling Module ID

The /api/config/id GET call allows you to review the unique identifier of your Crawling Module instance. This ID is also displayed on the Crawling Modules page of the Coveo Administration Console.

Request template

GET http://localhost:5000/api/config/id HTTP/1.1
Accept: application/json

The body of a successful response contains the Crawling Module unique identifier.

200 OK response body

"coveoorganization-345238a4-298h-8e3v-58467815481d"

Generating a New Crawling Module ID

When you deploy the Crawling Module, a unique identifier is generated for your instance. However, if you ever need to make a copy of your Crawling Module instance, you will then have two instances with the same identifier. To prevent communication issues, you must use the /api/config/id POST call to generate a new unique identifier for one of these instances.

Request template

POST http://localhost:5000/api/config/id HTTP/1.1
Accept: application/json

Payload

"coveoorganization-h7d4h6137-w745-5f93-72h95ca314"

Logging

Creating a Compressed Logs Archive

The /api/logging/logs call creates a compressed archive containing all available Crawling Module logs, and then returns the path to the compressed archive in its response body.

The body of a successful response should be similar to the following:

200 OK response body


C:\ProgramData\Coveo\Maestro\CrawlingModuleLogs-20200618103928.zip
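If you script around this call, you may want to pull the archive name and creation time out of the returned path. The sketch below assumes that the 14-digit suffix in the file name is a timestamp in year, month, day, hour, minute, second order, which the example above suggests but this article doesn't state. The function name archive_timestamp is hypothetical.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def archive_timestamp(response_body: str) -> datetime:
    """Read the assumed timestamp suffix from the archive file name."""
    stem = PureWindowsPath(response_body.strip()).stem  # file name without ".zip"
    # Assumption: the part after the last dash is formatted as yyyyMMddHHmmss.
    return datetime.strptime(stem.rsplit("-", 1)[1], "%Y%m%d%H%M%S")


body = r"C:\ProgramData\Coveo\Maestro\CrawlingModuleLogs-20200618103928.zip"
print(PureWindowsPath(body).name)
print(archive_timestamp(body))
# prints: CrawlingModuleLogs-20200618103928.zip
# prints: 2020-06-18 10:39:28
```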

Deleting the “Dumps” Folder Files

The files in the Dumps folder can be useful for troubleshooting, but they take up a lot of disk space. If your Crawling Module is running as expected, you can use the /api/logging/purge/dump call to delete them and free up that space.

A successful request returns a Status of 200 OK.

ODBC

Getting the Available Drivers for an ODBC Source

The /api/troubleshooting/odbc/drivers call allows you to view which drivers you can specify in your Database source connection string.

Request template

GET http://localhost:5000/api/odbc/drivers HTTP/1.1
Accept: application/json

The body of a successful response looks like the example below.

200 OK response body

{
  "X64Drivers": [
    "SQL Server",
    "PostgreSQL ANSI(x64)",
    "PostgreSQL Unicode(x64)",
    "MySQL ODBC 5.3 ANSI Driver",
    "MySQL ODBC 5.3 Unicode Driver",
    "Oracle in instantclient_12_2"
  ]
}

The Oracle in instantclient_12_2 driver supports Oracle database versions 10 and over.
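When choosing a driver for a connection string, it can help to filter the returned list by database name. The following Python sketch works on a response shaped like the example above. The function name find_drivers is hypothetical.

```python
import json

# Response body shaped like the example above.
response_body = """
{
  "X64Drivers": [
    "SQL Server",
    "PostgreSQL ANSI(x64)",
    "PostgreSQL Unicode(x64)",
    "MySQL ODBC 5.3 ANSI Driver",
    "MySQL ODBC 5.3 Unicode Driver",
    "Oracle in instantclient_12_2"
  ]
}
"""


def find_drivers(body: str, keyword: str) -> list[str]:
    """Return the driver names that contain the keyword, ignoring case."""
    drivers = json.loads(body)["X64Drivers"]
    return [name for name in drivers if keyword.lower() in name.lower()]


print(find_drivers(response_body, "mysql"))
# prints: ['MySQL ODBC 5.3 ANSI Driver', 'MySQL ODBC 5.3 Unicode Driver']
```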

Status

Getting the Workers’ Status

The /api/status/workers call returns the status of each worker, along with other details. This information is also available on the Crawling Module component dashboard.

Request template

GET http://localhost:5000/api/status/workers HTTP/1.1
Accept: application/json

200 OK response body

{
  "WorkerStatus": [
    {
      "IsRunning": true,
      "Name": "Crawler-Worker-68a1015f-7f5f-4f91-984c-0f16bd59da4f",
      "WorkerType": "Crawler",
      "Details": {
        "ProcessId": 15212,
        "Status": "Running",
        "LastStartStopTime": "2020-05-12T14:23:14"
      }
    },
    {
      "IsRunning": true,
      "Name": "Crawler-Worker-73edb6b3-7465-4253-9340-5707c97bbead",
      "WorkerType": "Crawler",
      "Details": {
        "ProcessId": 31188,
        "Status": "Running",
        "LastStartStopTime": "2020-05-12T14:23:14"
      }
    },
    {
      "IsRunning": true,
      "Name": "Security-Worker-c585d778-1d90-444d-bd46-57f8cdabe700",
      "WorkerType": "Security",
      "Details": {
        "ProcessId": 25388,
        "Status": "Running",
        "LastStartStopTime": "2020-05-12T14:23:18"
      }
    }
  ]
}
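A response like the one above can be condensed into a count of running workers per type and a list of workers that aren't running. The Python sketch below is one way to do this on an already parsed response. The function name summarize_workers and the shortened worker names in the sample are hypothetical, and the sample keeps only the fields the sketch reads.

```python
from collections import Counter


def summarize_workers(response: dict) -> tuple[Counter, list[str]]:
    """Count running workers per type and list the names of stopped ones."""
    workers = response["WorkerStatus"]
    running_by_type = Counter(w["WorkerType"] for w in workers if w["IsRunning"])
    stopped = [w["Name"] for w in workers if not w["IsRunning"]]
    return running_by_type, stopped


# Hypothetical sample with one worker that is not running.
sample = {
    "WorkerStatus": [
        {"IsRunning": True, "Name": "Crawler-Worker-1", "WorkerType": "Crawler"},
        {"IsRunning": True, "Name": "Crawler-Worker-2", "WorkerType": "Crawler"},
        {"IsRunning": False, "Name": "Security-Worker-1", "WorkerType": "Security"},
    ]
}

running, stopped = summarize_workers(sample)
print(dict(running))
print(stopped)
# prints: {'Crawler': 2}
# prints: ['Security-Worker-1']
```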

Getting Maestro’s Status

The /api/status/maestro call returns the status of Maestro, along with other information.

Request template

GET http://localhost:5000/api/status/maestro HTTP/1.1
Accept: application/json

200 OK response body

{
  "Uptime": "00:06:52",
  "IsWorkerServiceRunning": true,
  "LinkedOrganization": "myorganizationw376kcrn",
  "IsAbleToReachOrganization": true
}
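For monitoring purposes, the two Boolean fields in this response can be combined into a single health check. The Python sketch below does so on the example response. The function names are hypothetical, and the uptime parser only handles the hours:minutes:seconds form shown above; how Maestro formats uptimes of a day or more isn't covered in this article.

```python
from datetime import timedelta


def parse_uptime(value: str) -> timedelta:
    """Convert an 'HH:mm:ss' uptime string to a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def is_healthy(status: dict) -> bool:
    """Return True when the worker service runs and the organization is reachable."""
    return bool(status["IsWorkerServiceRunning"] and status["IsAbleToReachOrganization"])


# Parsed version of the example response above.
status = {
    "Uptime": "00:06:52",
    "IsWorkerServiceRunning": True,
    "LinkedOrganization": "myorganizationw376kcrn",
    "IsAbleToReachOrganization": True,
}

print(is_healthy(status), parse_uptime(status["Uptime"]).total_seconds())
# prints: True 412.0
```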

Getting the Maestro Version

The /api/status/version call returns the version of Maestro in the response body.

Alternatively, you can find this information on the Crawling Modules page, which also shows the latest version available, and on the component dashboard.

200 OK response body

{
  "MaestroVersion": "1.2.8.0"
}
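The version string can also tell you which Crawling Module you're running, following the rule given at the top of this article (versions above 1 for the new Crawling Module, below 1 for the Docker version). The Python sketch below applies that rule. The function names are hypothetical, and treating a version of exactly 1 as the new Crawling Module is an assumption.

```python
def parse_version(value: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.2.8.0' into a tuple of integers."""
    return tuple(int(part) for part in value.split("."))


def is_new_crawling_module(maestro_version: str) -> bool:
    """Return True for versions 1 and above (assumption: exactly 1 counts as new)."""
    return parse_version(maestro_version) >= (1,)


response = {"MaestroVersion": "1.2.8.0"}
print(parse_version(response["MaestroVersion"]))
print(is_new_crawling_module(response["MaestroVersion"]))
# prints: (1, 2, 8, 0)
# prints: True
```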

Update

Updating Maestro

The Crawling Module updates automatically at the time specified in its configuration, so you shouldn’t need to use the /api/update/maestro call. However, should you ever need to update Maestro manually (e.g., if instructed to do so by the Coveo Support team), use this call to launch the update process.

During the update process, a copy of your obsolete Crawling Module folder is saved under C:\ProgramData\Coveo\Maestro\BackupPackage. If necessary, you can revert to the previous version by replacing the new Crawling Module folder with this copy.

Restarting Maestro

After you edit Maestro settings, you must restart Maestro with the /api/service/restart call to apply your changes.

Request template

POST http://localhost:5000/api/service/restart HTTP/1.1

The body of a successful response is an empty JSON object ({}). You can then check Maestro’s status with the /api/status/maestro call.
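Since Maestro may be unreachable for a short time while it restarts, a script can poll the /api/status/maestro call until the worker service reports that it is running. The Python sketch below shows one way to do this. The function names, the number of attempts, and the delay are hypothetical choices, and the sketch defines the polling logic without contacting Maestro on its own.

```python
import json
import time
from typing import Callable
from urllib.request import Request, urlopen


def fetch_maestro_status() -> dict:
    """Call /api/status/maestro on the default port and return the parsed body."""
    request = Request(
        "http://localhost:5000/api/status/maestro",
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=5) as response:
        return json.load(response)


def wait_until_running(
    fetch_status: Callable[[], dict], attempts: int = 10, delay: float = 3.0
) -> bool:
    """Poll until IsWorkerServiceRunning is true or the attempts run out."""
    for _ in range(attempts):
        try:
            if fetch_status()["IsWorkerServiceRunning"]:
                return True
        except OSError:
            pass  # Connection errors are expected while Maestro restarts; retry.
        time.sleep(delay)
    return False
```

After sending the restart request, you could call wait_until_running(fetch_maestro_status) and treat a False result as a sign that Maestro needs attention.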

Getting the Proxy Status

If you configured the Crawling Module to communicate with Coveo Cloud through a proxy, you can use the /api/troubleshooting/proxy/settings call to review the proxy status.

Request template

GET http://localhost:5000/api/troubleshooting/proxy/settings HTTP/1.1

200 OK response body

{
  "MaestroSettingsProxyUrl": null,
  "IsHttpProxyEnvironmentVariablePresent": false,
  "IsHttpsProxyEnvironmentVariablePresent": false,
  "ProxyUsedForPlatform": "https://platform.cloud.coveo.com/",
  "IsDefaultProxyCredentialsPresent": false,
  "WinhttpProxyStatus": "winhttp proxy is not set."
}
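To see at a glance whether any proxy configuration was detected, you can reduce this response to a list of sources. The Python sketch below does that. The function name proxy_sources is hypothetical, and the meaning of each field is inferred from its name rather than stated in this article.

```python
def proxy_sources(settings: dict) -> list[str]:
    """List where a proxy configuration was detected (meanings inferred from field names)."""
    sources = []
    if settings["MaestroSettingsProxyUrl"]:
        sources.append("Maestro settings")
    if settings["IsHttpProxyEnvironmentVariablePresent"]:
        sources.append("HTTP proxy environment variable")
    if settings["IsHttpsProxyEnvironmentVariablePresent"]:
        sources.append("HTTPS proxy environment variable")
    return sources


# Parsed version of the example response above, limited to the fields used here.
settings = {
    "MaestroSettingsProxyUrl": None,
    "IsHttpProxyEnvironmentVariablePresent": False,
    "IsHttpsProxyEnvironmentVariablePresent": False,
}

print(proxy_sources(settings))
# prints: []
```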