Add a REST API source
Coveo has dedicated connectors for many web and on-premises systems, allowing you to quickly make application content searchable. See the Connector Directory for the full list.
However, you may want to index content from applications that have no dedicated connector. In such cases, provided you have the required privileges, you can use a REST API source to retrieve the desired content and make it searchable with Coveo.
A REST API source allows you to crawl content from a remote repository exposing its data through a REST API. When creating your source, you provide a JSON REST configuration instructing Coveo to retrieve items from the repository REST services and their respective resource endpoints. This configuration indicates which API calls to execute to fetch the desired items, how to parse the responses to extract relevant metadata, and which content type these items represent.
Example
Your company uses Vimeo to host videos. You want to make these videos, along with their comments, accessible through your intranet’s search page. So, you create a REST API source that queries Vimeo’s API to index the desired content.
Leading practice
The number of items that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration. See About crawling speed for information on what can impact crawling speed, as well as possible solutions.
Source key characteristics
The following table presents the main characteristics of a REST API source.
| Features | Supported | Additional information |
|---|---|---|
| | | To enable the refresh capability, add a |
| Content security options | | If your source configuration includes the |
| Automatic mapping of metadata to fields that have the same name | | |
| Automatically indexed metadata | | Examples of auto-populated default fields (no user-defined metadata required): Sample of auto-populated fields (source configuration parameter of matching name required): After a content update, inspect your item field values in the Content Browser. |
| Extracted but not indexed metadata | | Parameters specified in the source configuration |
Commerce requirements
When using a REST API source to index commerce-specific content, such as products, variants, and availabilities, you must complete a catalog configuration process to benefit from all commerce-related capabilities.
More specifically, you must:
- Enable Coveo Personalization-as-you-go (PAYG) capabilities in your source.
- Associate your source with a catalog configuration.
Enable Coveo Personalization-as-you-go
Coveo Machine Learning tools include Coveo Personalization-as-you-go (PAYG) capabilities for commerce use cases. This suite of advanced features learns from a user’s intent and reacts within a few clicks. PAYG models require building a product vector space to represent the products contained in your source. For REST API sources, Coveo PAYG must be enabled to produce the product vector space. Contact your Coveo representative to discuss your options.
Catalog configuration
Behind the scenes, the REST API source uses the Stream API to push content to the Coveo index. Therefore, REST API sources must be associated with a catalog entity to ensure a complete configuration. This allows the source to accurately build a product vector space.
For instructions on how to create a catalog entity, see Commerce catalog entity.
Add a REST API source
Leading practice
It’s best to create or edit your source in your sandbox organization first. Once you’ve confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually. See About non-production organizations for more information and best practices regarding sandbox organizations.
Follow the instructions below to add a REST API source using the desired content retrieval method.
- On the Sources (platform-ca | platform-eu | platform-au) page, click Add source.
- In the Add a source of content panel, click the Cloud or Crawling Module tab, depending on your content retrieval context. With the latter, you must install the Crawling Module to make your source operational.
- Click the REST API tile.
- Configure your source.
The Build the source and Index metadata steps are especially important when creating a source of this type.
"Configuration" tab
In the Add a REST API Source panel, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as the JSON configuration that lets you specify the content to index.
Name
Enter a name for your source.
Leading practice
A source name can’t be modified once it’s saved, so be sure to use a short and descriptive name, using letters, numbers, hyphens (
Project
Use the Project selector to associate your source with one or more Coveo projects.
Content to index
In the JSON configuration box, enter your source JSON configuration. This configuration instructs Coveo on how to retrieve items from your API’s REST services and their respective resource endpoints.
You can start from our Basic example and adapt it to your needs. For example, you could configure your source to benefit from the authentication, paging, and refresh capabilities.
Authentication
In the Authentication section, all parameters are optional. Fill in the appropriate boxes depending on the authentication type used by the source you want to make searchable.
- If your source uses an HTTP, Basic, Kerberos, or NTLM authentication protocol, enter the Username and Password of the account with which you want to crawl the source. Then, use the @Username and @Password placeholders in your source JSON configuration instead of exposing the credentials in clear text. The account whose credentials you enter must have access to all the content that you want to make searchable. See Source Credentials Leading Practices.
- If your source uses the OAuth 2.0 authentication protocol, enter your content source Client ID, Client secret, and Refresh token in the corresponding boxes. Then, use the @ClientID, @ClientSecret, and @RefreshToken placeholders in your source JSON configuration instead of exposing this information in clear text.
- If your source uses an API key to authenticate, enter it in the API key box. Then, use the @ApiKey placeholder in your source JSON configuration instead of exposing the API key in clear text.
- If your source doesn’t require authentication, leave all boxes empty.
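Before saving, it can help to check which credential placeholders a configuration actually references. The following Python sketch is one possible way to do that from outside Coveo. It only assumes the placeholder names listed above. The `placeholders_used` helper and the sample fragment (including its `Headers` and `User` keys) are hypothetical and aren't meant to reflect the documented configuration schema.

```python
import re

# Placeholder names taken from the Authentication section above.
KNOWN_PLACEHOLDERS = (
    "Username", "Password", "ClientID", "ClientSecret", "RefreshToken", "ApiKey",
)

_PLACEHOLDER_RE = re.compile(r"@(" + "|".join(KNOWN_PLACEHOLDERS) + r")\b")


def placeholders_used(config_text: str) -> set[str]:
    """Return the credential placeholders referenced in a JSON configuration string."""
    return {"@" + name for name in _PLACEHOLDER_RE.findall(config_text)}


# Hypothetical fragment: the key names are illustrative only.
sample = '{"Headers": {"Authorization": "Bearer @ApiKey"}, "User": "@Username"}'
print(sorted(placeholders_used(sample)))
```

If the returned set is empty for a source that requires authentication, the configuration may contain credentials in clear text and is worth reviewing.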
Crawling Module
If your source is a Crawling Module source, and if you have more than one Crawling Module linked to this organization, select the one with which you want to pair your source. If you change the Crawling Module instance paired with your source, a successful rebuild is required for your change to apply.
"Items" tab
On the Items tab, you can specify how the source handles items based on their file type or content type.
File types
File types let you define how the source handles items based on their file extension or content type. For each file type, you can specify whether to index the item content and metadata, only the item metadata, or neither.
You should fine-tune the file type configurations with the objective of indexing only the content that’s relevant to your users.
Example
Your repository contains .pdf files, but you don’t want them to appear in search results.
You click Extensions and then, for the .pdf extension, you change the Default action and Action on error values to Ignore item.
For more details about this feature, see File type handling.
Content and images
If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option. The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick view.
Note
When OCR is enabled, ensure the source’s relevant file type configurations index the item content. Indexing the item’s metadata only or ignoring the item will prevent OCR from being applied.
See Enable optical character recognition for details on this feature.
"Content security" tab
Select who will be able to access the source items through a Coveo-powered search interface. For details on the content security options, see Content security.
Note
If, while writing your source JSON configuration, you chose to index content access permissions and used the
"Access" tab
On the Access tab, specify whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.
For example, when creating a new source, you could decide that members of Group A can edit its configuration, while Group B can only view it.
For more information, see Custom access level.
Build the source
- Finish adding or editing your source:
  - When you’re done editing the source and want to make your changes effective, click Add and build source/Save and rebuild source.
  - When you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to make other changes soon, click Add source/Save. On the Sources (platform-ca | platform-eu | platform-au) page, click Launch build or Start required rebuild when you’re ready to make your changes effective and index your content.
- On the Sources (platform-ca | platform-eu | platform-au) page, follow the progress of your source addition or modification.
- Once the source is built or rebuilt, review its content in the Content Browser.
Index metadata
To use metadata values in search interface facets or result templates, the metadata must be mapped to fields. With recently created REST API sources, Coveo automatically maps metadata to fields with the same name.
Note
To enable this auto-mapping behavior on older REST API sources, set the
Coveo has some default fields for commonly extracted metadata (for example, author, date).
For any custom metadata defined in your source JSON configuration, you must create a field with the same name to store the metadata values.
For example, if you’ve defined a department metadata, you must have a department field to store the metadata values.
- Review the Metadata object in your source JSON configuration for the list of metadata you’re currently extracting from your content.
- On the Fields (platform-ca | platform-eu | platform-au) page, for each metadata you want to use in facets or result templates, add a field with the same name, unless one already exists.
  Note
  Fields are shared across sources in your Coveo organization. If a field with the same name as the metadata you want to index already exists and its configuration suits you, use it for the metadata you want to index. Otherwise, you can create a mapping to index the metadata in a new field with a different name.
  Example
  You decided to retrieve picture URIs from your content using the following metadata definition in your source JSON configuration:
    "pictureuri": "%[picture.uri]"
  Since there’s no default Coveo pictureuri field, you need to create the pictureuri field, unless it already exists.
- Return to the Sources (platform-ca | platform-eu | platform-au) page.
- To reindex your source with your new mappings, click your source, and then click More > Rebuild in the Action bar.
- Once the source is rebuilt, review your item field values. They should now include the values of the metadata you selected to index.
  - On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > Open in Content Browser in the Action bar.
  - Select the card of the item for which you want to inspect properties, and then click Properties in the Action bar.
  - In the panel that appears, select the Fields tab.
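To make the relationship between a metadata definition and the resulting field value more concrete, here is a minimal Python sketch. It models the %[picture.uri] value from the example above as a dotted path into one item of a JSON response. This is a simplified illustration, not Coveo's resolver: the `resolve` function, the sample `api_item`, and the choice to return an empty string for a missing path are all assumptions made for this example.

```python
import re
from typing import Any

# Matches tokens such as %[picture.uri] and captures the dotted path.
_DYNAMIC_VALUE = re.compile(r"%\[([^\]]+)\]")


def resolve(template: str, item: dict[str, Any]) -> str:
    """Replace each %[a.b] token with the value found at that dotted path in item."""

    def lookup(match: re.Match) -> str:
        value: Any = item
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return ""  # assumption: a missing path yields an empty string
            value = value[key]
        return str(value)

    return _DYNAMIC_VALUE.sub(lookup, template)


# Hypothetical API response item and the metadata definition from the example.
api_item = {"name": "Team photo", "picture": {"uri": "https://example.com/p/1.jpg"}}
metadata_definition = {"pictureuri": "%[picture.uri]"}

extracted = {name: resolve(path, api_item) for name, path in metadata_definition.items()}
print(extracted)
```

Each key of `extracted` corresponds to a metadata name, so it's also the name of the field that must exist for the value to be indexed.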
Change the source request rate
Your source makes requests to an API to index the desired content. By default, these requests are made immediately one after the other. If Coveo’s requests reach the throttling limit of your API, you can increase the delay between each request.
To do so, in the source JSON configuration (not the crawling configuration) parameters object, add the following code.
"RequestsIntervalInMs": {
  "value": "<NUMBER_OF_MILLISECONDS>"
}
Replace <NUMBER_OF_MILLISECONDS> with the number of milliseconds Coveo should wait between each request.
The default value is 0, that is, there is no delay between requests.
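As a rough guide for choosing the value, the Python sketch below derives an interval from an API rate limit and writes it into a parameters object. It assumes that requests are sent strictly one after the other and that the limit is expressed in requests per minute. Both helper functions and the starting configuration are hypothetical. Only the RequestsIntervalInMs structure comes from this article.

```python
import json
import math


def interval_for_rate_limit(requests_per_minute: int) -> int:
    """Smallest whole-millisecond delay that keeps sequential requests under the limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return math.ceil(60_000 / requests_per_minute)


def set_request_interval(source_config: dict, milliseconds: int) -> dict:
    """Return a copy of the configuration with RequestsIntervalInMs set."""
    if milliseconds < 0:
        raise ValueError("milliseconds must be zero or positive")
    updated = dict(source_config)
    parameters = dict(updated.get("parameters", {}))
    # The value is stored as a string, as in the snippet above.
    parameters["RequestsIntervalInMs"] = {"value": str(milliseconds)}
    updated["parameters"] = parameters
    return updated


# Hypothetical case: the API allows 120 requests per minute.
config = set_request_interval({"parameters": {}}, interval_for_rate_limit(120))
print(json.dumps(config, indent=2))
```

Because the time each request takes also spaces requests apart, the computed interval is a conservative upper bound rather than an exact requirement.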
Ignoring "no first page" errors in subitems
When indexing your content, Coveo may encounter an HTTP error. By default, Coveo stops the crawling process when it encounters such an error.
However, you can configure your source to ignore specific errors and continue indexing.
Similarly, when requesting subitems from your API, Coveo will stop the indexing process if your API returns a 404 error rather than the first page of results. A 404 error on the first page prevents Coveo from indexing any of your subitems, as the missing first page contains the information needed to request the second page, such as a cursor or the URL of the next page.
However, you can configure your source to ignore this error and continue indexing. It will therefore finish indexing the other items of the same endpoint, including their subitems if the API returns valid result pages. Then, your source will move on to any other endpoint you’ve defined.
To ignore "no first page" errors in subitems
- On the Sources (platform-ca | platform-eu | platform-au) page, select your source, and then, in the More menu, click Edit configuration with JSON.
- In your source JSON configuration, in the parameters object, add the following:
    "SkipNoFirstPageErrorsInSubItems": {
      "value": "true"
    },
- Click Save and rebuild source.
This parameter applies to all subitem requests made by your source.
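The difference between the default behavior and the behavior with this parameter enabled can be pictured with a small Python simulation. This is only a sketch of the logic described above, not Coveo's crawler: the `crawl_subitems` function, the `NoFirstPageError` exception, and the fake in-memory API are all hypothetical.

```python
from typing import Callable, Iterable


class NoFirstPageError(Exception):
    """Raised by the fake fetcher when the first page of subitems returns a 404."""


def crawl_subitems(
    parents: Iterable[str],
    fetch_first_page: Callable[[str], list[str]],
    skip_no_first_page_errors: bool,
) -> list[str]:
    """Collect subitems for each parent, optionally skipping missing first pages."""
    indexed: list[str] = []
    for parent in parents:
        try:
            indexed.extend(fetch_first_page(parent))
        except NoFirstPageError:
            if not skip_no_first_page_errors:
                raise  # default behavior: the error stops the crawl
            # Parameter enabled: skip this parent's subitems and continue.
    return indexed


# Fake API: item "b" has no first page of subitems.
PAGES = {"a": ["a-comment-1", "a-comment-2"], "c": ["c-comment-1"]}


def fake_fetch(parent: str) -> list[str]:
    if parent not in PAGES:
        raise NoFirstPageError(parent)
    return PAGES[parent]


print(crawl_subitems(["a", "b", "c"], fake_fetch, skip_no_first_page_errors=True))
```

With the flag set to False, the same call raises NoFirstPageError at item "b", and the subitems of "c" are never reached.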
Required privileges
You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.
Note
The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.
| Actions | Service | Domain | Required access level |
|---|---|---|---|
| View sources, view source update schedules, and subscribe to source notifications | Content | Fields | View |
| | Content | Sources | View |
| | Organization | Organization | View |
| Edit sources, edit source update schedules, and edit source mappings | Organization | Organization | View |
| | Content | Fields | Edit |
| | Content | Sources | Edit |
| View and map metadata | Content | Source metadata | View |
| | Content | Fields | View |
| | Organization | Organization | View |
| | Content | Sources | Edit |