REST API source concepts
Coveo has dedicated connectors for many web and on-premises systems, which lets you quickly make application content searchable. See the Connector directory for the full list.
However, you may want to index the content of an application for which there’s no dedicated connector. In such a case, provided you have the required privileges, you can use a generic API connector to retrieve the desired content and make it searchable with Coveo.
Many web applications offer a public API that developers can use to leverage the application in their own software. Coveo takes advantage of such an API to call the application and therefore retrieve its content. A REST API source allows you to index content from repositories exposing their data through a REST API.
When creating your REST API source, you must provide a JSON configuration allowing Coveo to retrieve content items. This configuration indicates which API calls to execute to fetch the desired items, how to parse the responses to extract relevant metadata, and what type of resources these items represent.
While the Reference article details the elements to use in a JSON configuration, this article explains some basic concepts that apply to any REST API source and its JSON configuration. It describes the typical structure of a repository to index, as well as how to use dynamic values to retrieve your content and how JSON object properties can be passed down to child objects in your JSON configuration.
When working on your REST API source configuration, you may also want to refer to the following articles:
Repository structure
The typical remote repository consists of services, endpoints, and items arranged in a hierarchical fashion. Each service contains one or more resource endpoints, and each of these endpoints represents a type of item to fetch, such as user profiles, web pages, files, etc.
While this structure applies to most repositories, the item types a repository contains vary. For example, a video-sharing website such as Vimeo provides not only video items, but also user profiles, channels, groups, etc., while a customer service management system may have support cases and knowledge articles as item types.
When you create a REST API source in the Coveo Administration Console, you must provide a JSON source configuration listing the services and endpoints to crawl. This JSON configuration must also indicate which API calls to execute to fetch the desired items and how to parse the responses to extract relevant metadata.
Dynamic values
Dynamic values are metadata values acting as placeholders in your REST API source JSON configuration and defining what the Coveo index fields will contain for each item. During indexing, each dynamic value is replaced with metadata or content from the JSON response returned by the API. See REST API source tutorial for examples.
The syntax to use to leverage a dynamic value in your source JSON configuration is typically "CoveoFieldName": "%[DynamicValue]".
However, if your dynamic value contains whitespace characters, the syntax to use is slightly different. See Whitespace characters for details.
A dynamic value can consist of one or more of the following:
- A JSON path leading to a property specified earlier in your JSON configuration or in the JSON response returned by the API.
- coveo_parent, which precedes a JSON path and indicates that this JSON path refers to a property of the parent of an item.
- raw, an indication inserted between coveo_parent and a JSON path, which allows you to retrieve a value in the API JSON response that previously provided you with metadata regarding the parent of an item.
A dynamic value can also act as a placeholder for more than one field value.
For example, if the API returns several tags associated with a blog article, you can use a dynamic value to index some or all of them in the Coveo tags field.
However, if a piece of metadata is constant across all items of an endpoint, you don’t need a dynamic value. You can hardcode your value, that is, enter it as a static value that applies to all items.
For example, suppose all posts on your blog are in English only.
The value of the language field therefore doesn’t vary and can be hardcoded: "language": "English".
Conversely, if you have posts both in English and Spanish, the language value will vary.
You will therefore need to use a JSON path dynamic value: "language": "%[lang]".
coveo_url
coveo_url is a placeholder for your service URL.
You write the following configuration to retrieve user profiles.
Your service URL is https://myapplication.com.
In addition, user profile URIs are built by a dynamic value consisting of your service URL, /users/, and another dynamic value representing the user ID.
As a result, in Coveo, for user profile items, the Uri field will be populated with https://myapplication.com/users/, followed by the corresponding user ID.
{
"Services":[
{
"Url":"https://myapplication.com",
"Endpoints":[
{
"Path":"users/%[userid]",
"Method":"GET",
"ItemType":"User",
"Uri":"%[coveo_url]/users/%[userid]",
"ClickableUri":"%[link]",
"Title":"%[name]",
"Body":"%[bio]",
"Metadata":{
"datebirth":"%[birthday]",
"city":"%[location]"
}
}
]
}
]
}
JSON path
A JSON path is an address that tells Coveo’s crawler where to find the information to index in your API’s JSON response.
To build your source JSON configuration, you must make the API calls that you want Coveo to make to retrieve your content. In the JSON responses, you’ll see how the desired data is organized and labeled, and you’ll then use this information to direct the Coveo crawler to the data you want to retrieve.
Example: user profile
You want to index a specific user profile. When you request the user ID, your API returns the following response:
{
"uri": "/users/4543466",
"userid": "4543466",
"firstname": "Andrew",
"lastname": "Price",
"link": "https://example.com/andrewprice",
"city": "Toronto",
"country": "Canada",
"department": "Parts",
"title": "Electrical Panel Specialist",
"bio": "Andrew Price joined the Parts department in 2008, after working as an Electrical Panel Specialist for a major car manufacturer. He has extensive experience in operations, steering and engines, and is highly skilled in diagnosing and repairing electrical problems. Andrew is a dedicated member of his team, and always puts the safety of his customers first.",
"pictureurl": "https://example.com/andrewprice/pic",
"contact": [
{
"name": "Slack",
"link": "@aprice"
},
{
"name": "Email",
"link": "aprice@barca.group"
}
]
}
You want Coveo to index the user’s bio returned by the API and to display it as the body of the content item representing the user profile. When writing your source JSON configuration, you associate the Coveo fields with the property keys that represent the desired data in your API’s response, using JSONPath syntax and dynamic values.
For example, if you want the value of the bio property to be indexed by Coveo in the Body field, your source configuration should contain "Body": "%[bio]".
The Coveo crawler will retrieve the content of the bio property in the API JSON response, and store this information as the value of the Coveo Body field.
Your full source configuration could therefore look as follows:
{
"Services": [
{
"Url": "http://example.com/api/v1",
"Authentication": {
"Username": "@Username",
"Password": "@Password",
"ForceBasicAuthentication": true
},
"Endpoints": [
{
"Path": "/users/",
"Method": "GET",
"ItemType": "People",
"Uri": "%[coveo_url]/users/%[userid]",
"ClickableUri": "%[coveo_url]/users/%[userid]",
"Title": "%[firstname] %[lastname]",
"Body": "%[bio]",
"Metadata": {
"id": "%[userid]",
"division": "%[department]",
"jobtitle": "%[title]",
"location": "%[city], %[country]",
"email": "%[contact[1].link]",
"picture": "%[pictureurl]"
}
}
]
}
]
}
In this configuration:
- %[coveo_url] acts as a placeholder for your service URL to build a full working URL. For more details, see coveo_url.
- "%[firstname] %[lastname]" shows that Coveo can index multiple pieces of data as a single value. In this case, Coveo will index the first and last names in the Title field, which is dedicated to full names when indexing user profiles.
- Following the JSONPath syntax, "%[contact[1].link]" selects the value of the link property in the second object of the contact array.
With the information indexed by this source configuration, the user profile returned by your API could look as follows in your Coveo search results:
The JSONPath syntax also lets you include a filter expression in dynamic values.
Example: SmartSheet table
The following is a truncated response example from a SmartSheet table. In this response, each row corresponds to a project. The cells of a row contain project data such as its name and status. Each column contains a certain type of data regarding the projects in the table.
{
"id": 952283761449296,
"name": "MY SMARTSHEET",
"permalink": "https://app.smartsheet.com/sheets/mC8mcf4M8j37dbnHbBe77fP0bw939PMw96pq961",
"columns": [
{
"id": 7323702856574852,
"version": 0,
"index": 0,
"title": "Project name",
"type": "TEXT_NUMBER"
},
{
"id": 1694203322361732,
"version": 0,
"index": 1,
"title": "Status",
"type": "TEXT_NUMBER"
},
"..."
],
"rows": [
{
"id": 2314743682492292,
"rowNumber": 1,
"cells": [
{
"columnId": 7323702856574852,
"value": "ERP Vendor Selection",
"displayValue": "ERP Vendor Selection"
},
{
"columnId": 1694203322361732,
"value": "Closed",
"displayValue": "Closed"
},
"..."
]
},
{
"id": 6818343309862788,
"rowNumber": 2,
"siblingId": 2314743682492292,
"cells": [
{
"columnId": 7323702856574852,
"value": "IS Dashboard",
"displayValue": "IS Dashboard"
},
{
"columnId": 1694203322361732,
"value": "On Hold",
"displayValue": "On Hold"
},
"..."
]
}
]
}
You want to index the name, status, and other data of each project item.
You use a JSONPath with a filter expression based on columnId to retrieve the value property of the cell that belongs to each column.
Your JSON source configuration therefore contains:
{
"Endpoints": [
{
"Path": "/sheets/952283761449296",
"Method": "GET",
"ItemPath": "rows",
"ItemType": "Project",
"Uri": "https://app.smartsheet.com/sheets/mC8mcf4M8j37dbnHbBe77fP0bw939PMw96pq961?rowId=%[id]",
"ClickableUri": "https://app.smartsheet.com/sheets/mC8mcf4M8j37dbnHbBe77fP0bw939PMw96pq961?rowId=%[id]",
"Title": "%[cells[?(@.columnId==7323702856574852)].value]",
"CreatedDate": "%[createdAt]",
"ModifiedDate": "%[modifiedAt]",
"Metadata": {
"rowId": "%[id]",
"rowNumber": "%[rowNumber]",
"projectName": "%[cells[?(@.columnId==7323702856574852)].value]",
"projectStatus": "%[cells[?(@.columnId==1694203322361732)].value]"
}
}
]
}
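To make the effect of the filter expression concrete, the following Python sketch mimics what an expression such as cells[?(@.columnId==...)].value selects in one row. It is only an illustration of the filtering logic, not Coveo's crawler code: the cell_value helper is a hypothetical name, and the rows are a trimmed copy of the response above.

```python
def cell_value(row, column_id):
    """Return the 'value' of the first cell whose columnId matches, else None."""
    matches = [
        cell["value"]
        for cell in row["cells"]
        if cell.get("columnId") == column_id and "value" in cell
    ]
    return matches[0] if matches else None


# Trimmed copy of the two rows in the sample response.
rows = [
    {
        "id": 2314743682492292,
        "cells": [
            {"columnId": 7323702856574852, "value": "ERP Vendor Selection"},
            {"columnId": 1694203322361732, "value": "Closed"},
        ],
    },
    {
        "id": 6818343309862788,
        "cells": [
            {"columnId": 7323702856574852, "value": "IS Dashboard"},
            {"columnId": 1694203322361732, "value": "On Hold"},
        ],
    },
]

PROJECT_NAME = 7323702856574852
PROJECT_STATUS = 1694203322361732

for row in rows:
    # Prints the project name and status selected for each row.
    print(cell_value(row, PROJECT_NAME), "|", cell_value(row, PROJECT_STATUS))
```

Because the filter matches on columnId rather than on the position of the cell, the selection keeps working if the cells of a row are returned in a different order.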
Example: WordPress blog post tags
Let’s say you want to index the posts of a WordPress blog. The following is a truncated response example from the WordPress API. The main object represents a blog post.
{
"id": 12345,
"date_gmt": "2024-03-11T15:43:43",
"slug": "coveo-test-page-wp",
"status": "publish",
"link": "https://example.com/coveo-test-page-wp/",
"title": {
"rendered": "Coveo Test Page WP Edited Content"
},
"content": {
"rendered": "<p>This is a sample wordpress page to test integration with Coveo.</p>\n<p>Heklsjfldsfj;ds</p>\n<p>dsfjlkdskjafklds</p>\n<p>lkfjdslkfjldk</p>\n",
"protected": false
},
"author": 4321,
"_embedded": {
"author": [
{
"id": 4321,
"name": "John Smith",
"url": "",
"description": "",
"link": "https://example.com/author/00u1o7cop0d6hhecl0h8/",
"slug": "00u1o7cop0d6hhecl0h8"
}
],
"wp:term": [
[
{
"id": 73,
"link": "https://example.com/category/docs-testing/",
"name": "DocsTesting",
"taxonomy": "category"
}
],
[
{
"id": 242,
"link": "https://example.com/tag/abc-xyz/",
"name": "abc-xyz",
"taxonomy": "post_tag"
},
{
"id": 243,
"link": "https://example.com/tag/coveo-test-wp/",
"name": "coveo-test-wp",
"taxonomy": "post_tag"
}
]
]
}
}
In your search interface, you want to be able to filter posts by tag. So, when indexing a blog post, you also want to index a list of its tags as metadata.
To do so, you use a JSONPath expression, in which you add a filter expression based on the taxonomy property to exclude category names.
In your source JSON configuration, the Metadata object may therefore look as follows:
"Metadata": {
"author": "%[_embedded.author[0].name]",
"date": "%[date_gmt]",
"wordpress_status": "%[status]",
"id": "%[id]",
"wordpress_tag_names": "%[_embedded.wp:term..[?(@.taxonomy=='post_tag')].name]"
}
As a result, Coveo will index the tags associated with each blog post in the wordpress_tag_names field.
In the Content Browser (platform-ca | platform-eu | platform-au), this should look as follows:
However, since you want Coveo to consider these values separately, you must make the wordpress_tag_names field a multi-value field.
As a result, the tags will be displayed separately in the facet, like so:
Otherwise, you’ll see the tags as a single string in the facet, for example: abc-xyz;coveo-test-wp.
Use the JSONPath Online Evaluator to test your JSON paths.
If an application field targeted by your JSON path is missing or empty, Coveo ignores it. The rest of the content to index, if applicable, is indexed normally.
For example, let’s say your source JSON configuration contains the following property: "blogpostcomment": "%[subject] - %[message]".
If, in an item, the subject field contains Hello world! and the message field is empty, the content indexed in the blogpostcomment Coveo field will be: `Hello world! - `.
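The substitution behavior described above can be sketched in a few lines of Python. This is an illustration only, not Coveo's crawler code: the resolve and lookup functions are hypothetical, and the sketch handles plain dotted paths only (no array indexes or filter expressions).

```python
import re

# Matches a %[...] placeholder that contains a plain dotted path (no brackets).
PLACEHOLDER = re.compile(r"%\[([^\[\]]+)\]")


def lookup(item, path):
    """Follow a dotted path such as 'title.rendered'; return '' when absent."""
    value = item
    for key in path.split("."):
        if not isinstance(value, dict) or value.get(key) is None:
            return ""
        value = value[key]
    return str(value)


def resolve(template, item):
    """Replace each placeholder with the matching value from the API item."""
    return PLACEHOLDER.sub(lambda match: lookup(item, match.group(1)), template)


item = {"subject": "Hello world!", "message": ""}

# The empty message leaves the rest of the value intact.
print(repr(resolve("%[subject] - %[message]", item)))  # 'Hello world! - '

# A hardcoded value contains no placeholder and is returned unchanged.
print(repr(resolve("English", item)))  # 'English'
```

In this sketch, a missing property and an empty property are treated the same way: both contribute an empty string, and the static parts of the value are kept.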
coveo_parent
coveo_parent always appears with a JSON path.
In a source configuration, it’s used to instruct the Coveo crawler to refer to the parent item Metadata properties, and then to retrieve the value of the desired property.
So, when writing the JSON configuration to retrieve data regarding a sub-item, if you want to get a metadata value that you specified earlier in your JSON configuration for the parent object, your dynamic value syntax should be the following:
"CoveoFieldName": "%[coveo_parent.MetadataFieldName]"
You use the Vimeo API to index user profiles, and then the videos uploaded by each user as sub-items associated with their profile.
You therefore first make an API call to retrieve user profiles, and use it to build your endpoint configuration.
The API response contains, among other pieces of metadata, the user’s ID: "userid": "jsmith01".
In your user profile endpoint configuration, you therefore indicate:
"username": "%[userid]"
As a result, for user profiles, the userid provided by the Vimeo API is stored in the Coveo index as the value of the username field.
Then, you make another API call to retrieve videos, and use the API response to build the endpoint Subitems section of your JSON configuration.
You would like to index, for each video, the username of the user who originally uploaded it, but the video endpoint doesn’t provide this information.
You therefore need to instruct the Coveo crawler to retrieve it from the parent item configuration, that is, the user profile endpoint configuration.
In your Subitems configuration, you include:
"uploadedby": "%[coveo_parent.username]"
This instructs the Coveo crawler to refer to the Metadata of the parent item in your JSON configuration, to retrieve the value of the username property, and to index it as the value of the uploadedby field for videos.
In Coveo, user IDs will then appear as the username value of user profiles and as the uploadedby value of videos.
A dynamic value can also contain more than one coveo_parent, if needed.
For example, "CoveoFieldName": "%[coveo_parent.coveo_parent.MetadataFieldName]" refers to the parent of the parent of a sub-item.
Note
In a permission configuration, coveo_parent instructs the Coveo crawler to retrieve the value of the desired property from the permission subquery AdditionalInfo properties.
So, once you’ve retrieved the security identities associated with each item through the permission subquery, you must use the permission configuration and coveo_parent to extract the identities' relationships.
raw
In the metadata of a sub-item, you may want to refer to a property that was returned in the API JSON response for the parent item.
If you specified this property before in your JSON configuration for the parent item metadata, you can retrieve it from your own configuration with coveo_parent alone.
If you didn’t specify it, however, you must instruct Coveo to retrieve it from the API response returned for the parent item by adding raw in the JSON path leading to the desired property.
When implementing result folding, you want a sub-item to have its parent item URI in the foldingparent field.
In your sub-item configuration, you therefore write:
"foldingparent": "%[coveo_parent.uri]"
However, you didn’t include the uri property in the parent item endpoint configuration, as it was irrelevant at this point.
So, you must add raw to your sub-item property value:
"foldingparent": "%[coveo_parent.raw.uri]"
This instructs Coveo to retrieve the parent item URI from the metadata provided in the API JSON response rather than from the parent item endpoint configuration you wrote earlier.
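The difference between coveo_parent alone and coveo_parent.raw can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not Coveo's implementation: each item is represented as a hypothetical dictionary holding its configured metadata, the raw API response it came from, and a link to its parent, and the uri value is made up for the example.

```python
def resolve_parent_path(path, item):
    """Resolve a path such as 'coveo_parent.raw.uri' for a sub-item."""
    parts = path.split(".")
    current = item
    # Each leading 'coveo_parent' climbs one level up the item hierarchy.
    while parts and parts[0] == "coveo_parent":
        current = current["parent"]
        parts = parts[1:]
    # 'raw' reads the API response of that item instead of its configured metadata.
    if parts and parts[0] == "raw":
        source, parts = current["raw"], parts[1:]
    else:
        source = current["metadata"]
    for key in parts:
        source = source[key]
    return source


user = {
    # Fields mapped in the user profile endpoint configuration.
    "metadata": {"username": "jsmith01"},
    # API response for the user profile (the uri value is hypothetical).
    "raw": {"userid": "jsmith01", "uri": "/users/jsmith01"},
    "parent": None,
}
video = {"metadata": {}, "raw": {"name": "My video"}, "parent": user}

print(resolve_parent_path("coveo_parent.username", video))  # jsmith01
print(resolve_parent_path("coveo_parent.raw.uri", video))   # /users/jsmith01
```

In this model, "coveo_parent.uri" finds nothing because uri was never mapped in the parent configuration, which is the situation that raw addresses.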
Note
When a JSON path leading to the desired property in the API response is too long or complicated, you can avoid using the raw metadata of the parent item and instead include the property in the parent item endpoint configuration to give it a more suitable JSON path.
In your child item Example The API JSON response provides the following user profile metadata:
You choose not to index website links in user profiles. Your configuration therefore includes:
You also want to index videos as sub-items of user profiles. In Coveo, you want the metadata of video items to include the author’s personal website. Your sub-item endpoint configuration should therefore include:
However, you think that this dynamic value is too long and prefer to avoid it so that reading and interpreting your JSON configuration remains effortless.
So, you add
By doing so, you replace the JSON path leading to the desired value with a simpler one. Although not necessary, it may be more convenient.
Whitespace characters
Depending on the content repository to index, your dynamic values may contain whitespace characters.
In such a case, the syntax to use is slightly different: the property name must be enclosed in single quotes and square brackets in addition to the regular syntax, for example, "%[['property with whitespace']]" as opposed to "%[property]" in the regular syntax.
The following table shows the syntax to use based on the scenario.
Scenario | Syntax to use |
---|---|
No whitespace (regular syntax) | "%[property]" |
Single property | "%[['property with whitespace']]" |
Property with whitespace nested within another property | |
Property with whitespace nested within another property with whitespace | |
Complex expression and property with whitespace | |
Multi-value fields
The examples above show JSON paths leading to properties that have a single value. However, the API’s JSON response may contain arrays in which several objects have a property in common. In such a case, you may want to retrieve some or all of the values associated with this property and populate a single Coveo field with these values. You must therefore write your source configuration accordingly, so that Coveo indexes the desired content.
To populate a Coveo field with many values, use the dynamic value syntax ("CoveoFieldName": "%[DynamicValue]") with, inside the square brackets, a JSONPath expression.
When writing a typical JSON path to populate a Coveo field with many values:
- Specify the objects to take into account between square brackets next to the array name.
- Use a * character to represent all objects in an array, even if there’s only one object.
- If you want to specify certain objects only, decrement the desired object place by one, that is, use 0 to refer to the first object in the array, use 1 to refer to the second object, and so on.
- Use commas to separate the values to which you want to refer.
As a result, "CoveoFieldName": "%[path.object1[*].object2.property]" would populate the CoveoFieldName field with the property value of all object1 objects found under path.
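The selection rules above can be sketched in Python as follows. This is an illustration of zero-based index selection and of the semicolon-separated form that multiple values take in this article's examples, not Coveo's implementation; the select and multi_value helpers are hypothetical names, and the sample data reuses the picture sizes from the example in this section.

```python
def select(objects, indexes="*"):
    """Pick objects by zero-based position; '*' means every object in the array."""
    if indexes == "*":
        return list(objects)
    return [objects[i] for i in indexes if i < len(objects)]


def multi_value(objects, prop, indexes="*"):
    """Collect one property from the chosen objects, joined with semicolons."""
    values = [
        str(obj[prop])
        for obj in select(objects, indexes)
        if obj.get(prop) is not None
    ]
    return ";".join(values)


sizes = [
    {"width": 100, "height": 75},
    {"width": 200, "height": 150},
    {"width": 295, "height": 166},
    {"width": 640, "height": 360},
]

print(multi_value(sizes, "width", [0, 1]))  # 100;200  (like sizes[0,1].width)
print(multi_value(sizes, "width", [3]))     # 640      (like sizes[3].width)
print(multi_value(sizes, "height"))         # 75;150;166;360  (like sizes[*].height)
```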
The API returns the following JSON response, providing four sizes of the same picture.
{
...
"name": "Vimeo Holiday Videos!",
"pictures": {
"uri": "/videos/148903960/pictures/548505676",
"active": true,
"type": "custom",
"sizes": [
{
"width": 100,
"height": 75,
"link": "https://i.vimeocdn.com/video/54855654705676_100x75.jpg?r=pad"
},
{
"width": 200,
"height": 150,
"link": "https://i.vimeocdn.com/video/5485567705676_200x150.jpg?r=pad"
},
{
"width": 295,
"height": 166,
"link": "https://i.vimeocdn.com/video/1231234_295x166.jpg?r=pad"
},
{
"width": 640,
"height": 360,
"link": "https://i.vimeocdn.com/video/548545654605676_640x360.jpg?r=pad"
}
]
}
...
}
In the Coveo width field, you want to list the first two available widths for this picture.
Under Metadata, your JSON configuration therefore contains the following metadata:
"width": "%[pictures.sizes[0,1].width]"
In Coveo, the width field will contain the following information: 100;200.
However, in the Coveo pictureuri field, you only want to have the link to the largest version of the picture.
Your JSON configuration therefore contains the following metadata:
"pictureuri": "%[pictures.sizes[3].link]"
The API returns a JSON response containing the websites array, which includes only one website object.
{
...
"websites": [
{
"name": null,
"link": "http://www.canadashistory.ca",
"description": null
}
]
...
}
You want a website link to appear in the Coveo website field.
Since there’s only one link value in your JSON response, your JSON configuration contains the following metadata.
You can’t omit the [*], even if there’s only one object in the array.
"website": "%[websites[*].link]"
If there were more than one object in the websites array, and you wanted to index only one of the link property values, you would have to specify the object whose property value you want to index.
To index the link of the first object, your JSON configuration would therefore need the following metadata:
"website": "%[websites[0].link]"
Dynamic time expressions
Dynamic time expressions are placeholders for dates in your source JSON configuration. These expressions contain at least a token representing a specific date and time. They may also include a mathematical operator and a number of months, days, hours, or minutes. When indexing or re-indexing your source content, Coveo computes the time expression and retrieves the content matching your date criterion.
Allowed tokens are @Now and @RefreshDate.
@Now represents the start date of the source update operation, while @RefreshDate represents the date of the last source refresh.
Allowed units are M (months), d (days), h (hours), and m (minutes).
Only whole values are supported.
Space characters are supported, but not recommended.
When using a dynamic time expression with a date token, make sure to provide the date format with the DateFormat parameter.
When performing a source refresh, Coveo retrieves all items modified after the last update operation start date:
"RefreshEndpoints":[
{
"DateFormat":"\\'yyyy-MM-dd\\',\\'hh:mm:ss\\'",
"QueryParameters":{
"modified":"%[modified_date]>=@RefreshDate"
}
}
...
]
When indexing a Slack channel, Coveo retrieves all messages written in the last 6 months:
"Endpoints": [
{
"Method": "GET",
"Path": "/api/conversations.history/",
"QueryParameters": {
"token": "@ApiKey",
"channel": "AD8GFL97BFG",
"oldest": "@Now-6M",
"latest": "@Now"
},
"DateFormat": "UnixEpoch",
...
}
...
]
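To illustrate how an expression such as @Now-6M can be read, here is a small Python sketch that evaluates a token with an optional offset. It is a rough model under stated assumptions, not Coveo's implementation: the evaluate and shift_months functions are hypothetical, the clamping of the day when shifting by months is an assumption, and formatting the result according to DateFormat is left out.

```python
import calendar
import re
from datetime import datetime, timedelta

# Token, then an optional operator, whole number, and unit (M, d, h, or m).
EXPRESSION = re.compile(r"^@(Now|RefreshDate)\s*(?:([+-])\s*(\d+)\s*([Mdhm]))?$")


def shift_months(moment, months):
    """Move a date by whole months, clamping the day to the target month length."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month = divmod(index, 12)
    day = min(moment.day, calendar.monthrange(year, month + 1)[1])
    return moment.replace(year=year, month=month + 1, day=day)


def evaluate(expression, now, refresh_date):
    """Compute a date token with an optional +/- offset."""
    match = EXPRESSION.match(expression.strip())
    if match is None:
        raise ValueError(f"Unsupported time expression: {expression!r}")
    token, sign, amount, unit = match.groups()
    moment = now if token == "Now" else refresh_date
    if sign is None:
        return moment
    amount = int(amount) * (1 if sign == "+" else -1)
    if unit == "M":
        return shift_months(moment, amount)
    deltas = {
        "d": timedelta(days=amount),
        "h": timedelta(hours=amount),
        "m": timedelta(minutes=amount),
    }
    return moment + deltas[unit]


now = datetime(2024, 3, 11, 15, 43, 43)
last_refresh = datetime(2024, 3, 10, 8, 0, 0)

print(evaluate("@Now-6M", now, last_refresh))       # 2023-09-11 15:43:43
print(evaluate("@RefreshDate", now, last_refresh))  # 2024-03-10 08:00:00
```

The regular expression only accepts whole numbers, which mirrors the rule that only whole values are supported.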
Inheritable properties
Some properties of your source JSON are inheritable. Inheritable properties are passed down from a higher level in the JSON hierarchy to all lower levels unless overridden. You can take advantage of inheritable properties to avoid redundancy in your JSON configuration.
SkippableErrorCodes, the Paging object, and the Authentication object are inheritable properties, among others.
For example, let’s say you want your source to ignore 404 errors when crawling your content.
If you specify the SkippableErrorCodes property at the service level, it will apply to all endpoints and sub-queries underneath this service.
As a result, you don’t need to specify the SkippableErrorCodes property in each endpoint or sub-query configuration of this service.
Conversely, if you specify a different SkippableErrorCodes value at the endpoint level, the endpoint value will override the service value.
Example
The following JSON configuration specifies a service with two endpoints.
The first endpoint has no SkippableErrorCodes property, so it inherits the value specified at the service level.
As a result, calls to this endpoint will skip the 404 error.
The second endpoint has a different SkippableErrorCodes value than its parent service, so the endpoint value overrides the service value.
Calls to this endpoint will skip both the 404 and 403 errors.
{
"Services": [
{
"Url": "http://example.com/api/v1",
"Headers": {
"Authorization": "Bearer @ApiKey"
},
"SkippableErrorCodes": "404",
"Endpoints": [
{
"Path": "/users",
"Method": "GET",
"ItemType": "User",
"Uri": "%[coveo_url]/users/%[id]",
"ClickableUri": "%[coveo_url]/users/%[id]",
"Title": "%[name]",
"ModifiedDate": "%[updated]",
"Metadata": {
"id": "%[id]",
"username": "%[name]"
}
},
{
"Path": "/posts",
"Method": "GET",
"ItemType": "Post",
"Uri": "%[coveo_url]/posts/%[id]",
"ClickableUri": "%[coveo_url]/posts/%[id]",
"Title": "%[title]",
"ModifiedDate": "%[updated]",
"Metadata": {
"id": "%[id]",
"category": "%[category]"
},
"SkippableErrorCodes": "404;403"
}
]
}
]
}
Object values can also be inherited.
For example, if you add a Paging object at the service level and omit it at the endpoint level, your service paging configuration, which includes OffsetType and PageSize properties, will apply to all endpoints under the service.
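The override rule for inheritable properties can be summarized with the following Python sketch, which looks for a property at the lowest level first and falls back to the levels above it. It is only a model of the rule described in this section: the effective function is a hypothetical helper, the Paging values are placeholders, and the sketch assumes the property it is asked about is an inheritable one.

```python
def effective(prop, *levels):
    """Return the value set at the lowest level that defines the property.

    Levels are passed from highest (service) to lowest (endpoint, sub-query).
    """
    for level in reversed(levels):
        if prop in level:
            return level[prop]
    return None


service = {
    "Url": "http://example.com/api/v1",
    "SkippableErrorCodes": "404",
    # Placeholder paging values, for illustration only.
    "Paging": {"PageSize": 50, "OffsetType": "page"},
}
users_endpoint = {"Path": "/users"}
posts_endpoint = {"Path": "/posts", "SkippableErrorCodes": "404;403"}

print(effective("SkippableErrorCodes", service, users_endpoint))  # 404
print(effective("SkippableErrorCodes", service, posts_endpoint))  # 404;403
print(effective("Paging", service, posts_endpoint))  # the service-level Paging object
```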