REST API source concepts

Coveo has dedicated connectors for many web and on-premises systems, allowing you to quickly make application content searchable. See the Connector directory for the full list.

However, you may want to index the content of an application for which there’s no dedicated connector. In such a case, provided you have the required privileges, you can use a generic API connector to retrieve the desired content and make it searchable with Coveo.

Many web applications offer a public API that developers can use to leverage the application in their own software. Coveo takes advantage of such an API to call the application and retrieve its content. A REST API source allows you to index content from repositories exposing their data through a REST API.

When creating your REST API source, you must provide a JSON configuration allowing Coveo to retrieve content items. This configuration indicates which API calls to execute to fetch the desired items, how to parse the responses to extract relevant metadata, and what type of resources these items represent.

While the Reference article details the elements to use in a JSON configuration, this article explains some basic concepts that apply to any REST API source and its JSON configuration. It describes the typical structure of a repository to index, how to use dynamic values to retrieve your content, and how JSON object properties can be passed down to child objects in your JSON configuration.

Repository structure

The typical remote repository consists of services, endpoints, and items arranged in a hierarchical fashion. Each service contains one or more resource endpoints, and each of these endpoints represents a type of item to fetch, such as user profiles, web pages, files, etc.

[Figure: Generic REST API source key concepts]

While this structure applies to most repositories, the item types a repository contains vary. For example, a video-sharing website such as Vimeo provides not only video items, but also user profiles, channels, groups, etc., while a customer service management system may have support cases and knowledge articles as item types.

When you create a REST API source in the Coveo Administration Console, you must provide a JSON source configuration listing the services and endpoints to crawl. This JSON configuration must also indicate which API calls to execute to fetch the desired items and how to parse the responses to extract relevant metadata.

Dynamic values

Dynamic values are metadata values that act as placeholders in your REST API source JSON configuration and define what the Coveo index fields will contain for each item. When indexing, Coveo replaces each dynamic value with metadata or content from the JSON response returned by the API. See REST API source tutorial for examples.

The syntax to use to leverage a dynamic value in your source JSON configuration is typically "CoveoFieldName": "%[DynamicValue]".

However, if your dynamic value contains whitespace characters, the syntax to use is slightly different. See Whitespace characters for details.

A dynamic value can consist of one or more of the following:

  • coveo_url, a placeholder for your service URL.

  • A JSON path leading to a property specified earlier in your JSON configuration or in the JSON response returned by the API.

  • coveo_parent, which precedes a JSON path and indicates that this JSON path refers to a property of the parent of an item.

  • raw, an indicator inserted between coveo_parent and a JSON path, which lets you retrieve a value from the API JSON response that previously provided the metadata of the parent of an item.

A dynamic value can also act as a placeholder for more than one field value. For example, if the API returns several tags associated with a blog article, you can use a dynamic value to index some or all of them in the Coveo tags field.

However, if a piece of metadata is constant across all items of an endpoint, you don’t need a dynamic value. You can hardcode your value, that is, enter it as a static value that applies to all items.

Example

All posts on your blog are in English only. The value of the language field therefore doesn’t vary and can be hardcoded: "language": "English".

Conversely, if you have posts both in English and Spanish, the language value will vary. You will therefore need to use a JSON path dynamic value: "language": "%[lang]".

coveo_url

coveo_url is a placeholder for your service URL.

Example

You write the following configuration to retrieve user profiles. Your service URL is https://myapplication.com. In addition, user profile URIs are built from a dynamic value representing your service URL, followed by /users/ and another dynamic value representing the user ID.

As a result, in Coveo, for user profile items, the Uri field will be populated with https://myapplication.com/users/, followed by the corresponding user ID.

{
  "Services":[
    {
        "Url":"https://myapplication.com",
        "Endpoints":[
          {
              "Path":"users/%[userid]",
              "Method":"GET",
              "ItemType":"User",
              "Uri":"%[coveo_url]/users/%[userid]",
              "ClickableUri":"%[link]",
              "Title":"%[name]",
              "Body":"%[bio]",
              "Metadata":{
                "datebirth":"%[birthday]",
                "city":"%[location]"
              }
          }
        ]
    }
  ]
}
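
To make the substitution concrete, the following Python sketch shows how the Uri value above could resolve. It’s an illustration only, not Coveo’s crawler: the resolve function, the regular expression, and the sample user ID are assumptions made for this example, and the sketch only handles flat property names rather than full JSON paths.

```python
import re

# Hypothetical helper for illustration only; not part of Coveo.
# It handles flat property names such as %[userid], not full JSONPath.
PLACEHOLDER = re.compile(r"%\[(\w+)\]")


def resolve(template, service_url, item):
    """Replace each %[name] placeholder in template with a value."""

    def substitute(match):
        name = match.group(1)
        if name == "coveo_url":
            return service_url
        # A missing property contributes nothing to the result.
        return str(item.get(name, ""))

    return PLACEHOLDER.sub(substitute, template)


uri = resolve(
    "%[coveo_url]/users/%[userid]",
    service_url="https://myapplication.com",
    item={"userid": "4543466"},  # sample API response (assumed)
)
print(uri)  # https://myapplication.com/users/4543466
```

The literal text between placeholders (/users/ here) is kept as is, which is why a single field value can combine static text with several dynamic values.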

JSON path

A JSON path is an address that tells Coveo’s crawler where to find the information to index in your API’s JSON response.

To build your source JSON configuration, you must make the API calls that you want Coveo to make to retrieve your content. In the JSON responses, you’ll see how the desired data is organized and labeled, and you’ll then use this information to direct the Coveo crawler to the data you want to retrieve.

Example: user profile

You want to index a specific user profile. When you request the user ID, your API returns the following response:

{
  "uri": "/users/4543466",
  "userid": "4543466",
  "firstname": "Andrew",
  "lastname": "Price",
  "link": "https://example.com/andrewprice",
  "city": "Toronto",
  "country": "Canada",
  "department": "Parts",
  "title": "Electrical Panel Specialist",
  "bio": "Andrew Price joined the Parts department in 2008, after working as an Electrical Panel Specialist for a major car manufacturer. He has extensive experience in operations, steering and engines, and is highly skilled in diagnosing and repairing electrical problems. Andrew is a dedicated member of his team, and always puts the safety of his customers first.",
  "pictureurl": "https://example.com/andrewprice/pic",
  "contact": [
    {
      "name": "Slack",
      "link": "@aprice"
    },
    {
      "name": "Email",
      "link": "aprice@barca.group"
    }
  ]
}

You want Coveo to index the user’s bio returned by the API and to display it as the body of the content item representing the user profile. When writing your source JSON configuration, you associate the Coveo fields with the property keys that represent the desired data in your API’s response, using JSONPath syntax and dynamic values.

For example, if you want the value of the bio property to be indexed by Coveo in the Body field, your source configuration should contain "Body": "%[bio]". The Coveo crawler will retrieve the content of the bio property in the API JSON response, and store this information as the value of the Coveo Body field.

Your full source configuration could therefore look as follows:

{
  "Services": [
    {
      "Url": "http://example.com/api/v1",
      "Authentication": {
        "Username": "@Username",
        "Password": "@Password",
        "ForceBasicAuthentication": true
      },
      "Endpoints": [
        {
          "Path": "/users/",
          "Method": "GET",
          "ItemType": "People",
          "Uri": "%[coveo_url]/users/%[userid]",
          "ClickableUri": "%[coveo_url]/users/%[userid]",
          "Title": "%[firstname] %[lastname]",
          "Body": "%[bio]",
          "Metadata": {
            "id": "%[userid]",
            "division": "%[department]",
            "jobtitle": "%[title]",
            "location": "%[city], %[country]",
            "email": "%[contact[1].link]",
            "picture": "%[pictureurl]"
          }
        }
      ]
    }
  ]
}

In this configuration:

  • In the Uri value, %[coveo_url] acts as a placeholder for your service URL to build a full working URL. For more details, see coveo_url.

  • In the Title value, two dynamic values are combined. Coveo can index multiple pieces of data as a single value. In this case, Coveo will index the first and last names in the Title field, which is dedicated to full names when indexing user profiles.

  • In the email value, following the JSONPath syntax, %[contact[1].link] selects the value of the link property in the second object of the contact array.

With the information indexed by this source configuration, the user profile returned by your API could look as follows in your Coveo search results:

User profile Coveo search result
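
As a rough guide to reading these dynamic values, the following Python sketch performs the equivalent lookups on a trimmed copy of the user profile response. It only mirrors what the Title, location, and email values in the configuration above select; it isn’t how the Coveo crawler is implemented.

```python
# Trimmed copy of the user profile response shown earlier.
response = {
    "userid": "4543466",
    "firstname": "Andrew",
    "lastname": "Price",
    "city": "Toronto",
    "country": "Canada",
    "contact": [
        {"name": "Slack", "link": "@aprice"},
        {"name": "Email", "link": "aprice@barca.group"},
    ],
}

# "%[firstname] %[lastname]": two dynamic values joined by literal text.
title = f"{response['firstname']} {response['lastname']}"

# "%[city], %[country]": same idea, with ", " as the literal text.
location = f"{response['city']}, {response['country']}"

# "%[contact[1].link]": index 1 is the second object of the contact array.
email = response["contact"][1]["link"]

print(title)     # Andrew Price
print(location)  # Toronto, Canada
print(email)     # aprice@barca.group
```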

The JSONPath syntax also lets you include a filter expression in dynamic values.

Example: Smartsheet table

The following is a truncated response example from a Smartsheet table. In this response, each row corresponds to a project. The cells of a row contain project data such as its name and status. Each column contains a certain type of data regarding the projects in the table.

{
  "id": 952283761449296,
  "name": "MY SMARTSHEET",
  "permalink": "https://app.smartsheet.com/sheets/mC8mcf4M8j37dbnHbBe77fP0bw939PMw96pq961",
  "columns": [
    {
      "id": 7323702856574852,
      "version": 0,
      "index": 0,
      "title": "Project name",
      "type": "TEXT_NUMBER"
    },
    {
      "id": 1694203322361732,
      "version": 0,
      "index": 1,
      "title": "Status",
      "type": "TEXT_NUMBER"
    },
    "..."
  ],
  "rows": [
    {
      "id": 2314743682492292,
      "rowNumber": 1,
      "cells": [
        {
          "columnId": 7323702856574852,
          "value": "ERP Vendor Selection",
          "displayValue": "ERP Vendor Selection"
        },
        {
          "columnId": 1694203322361732,
          "value": "Closed",
          "displayValue": "Closed"
        },
        "..."
      ]
    },
    {
      "id": 6818343309862788,
      "rowNumber": 2,
      "siblingId": 2314743682492292,
      "cells": [
        {
          "columnId": 7323702856574852,
          "value": "IS Dashboard",
          "displayValue": "IS Dashboard"
        },
        {
          "columnId": 1694203322361732,
          "value": "On Hold",
          "displayValue": "On Hold"
        },
        "..."
      ]
    }
  ]
}

You want to index the name, status, and other data of each project item. You use a JSONPath with a filter expression based on columnId to retrieve the value property of the cell that belongs to each column of interest. Your JSON source configuration therefore contains:

{
  "Endpoints": [
    {
      "Path": "/sheets/952283761449296",
      "Method": "GET",
      "ItemPath": "rows",
      "ItemType": "Project",
      "Uri": "https://app.smartsheet.com/sheets/mC8mcf4M8j37dbnHbBe77fP0bw939PMw96pq961?rowId=%[id]",
      "ClickableUri": "https://app.smartsheet.com/sheets/mC8mcf4M8j37dbnHbBe77fP0bw939PMw96pq961?rowId=%[id]",
      "Title": "%[cells[?(@.columnId==7323702856574852)].value]",
      "CreatedDate": "%[createdAt]",
      "ModifiedDate": "%[modifiedAt]",
      "Metadata": {
        "rowId": "%[id]",
        "rowNumber": "%[rowNumber]",
        "projectName": "%[cells[?(@.columnId==7323702856574852)].value]",
        "projectStatus": "%[cells[?(@.columnId==1694203322361732)].value]"
      }
    }
  ]
}

Example: WordPress blog post tags

Let’s say you want to index the posts of a WordPress blog. The following is a truncated response example from the WordPress API. The main object represents a blog post.

{
  "id": 12345,
  "date_gmt": "2024-03-11T15:43:43",
  "slug": "coveo-test-page-wp",
  "status": "publish",
  "link": "https://example.com/coveo-test-page-wp/",
  "title": {
    "rendered": "Coveo Test Page WP Edited Content"
  },
  "content": {
    "rendered": "<p>This is a sample wordpress page to test integration with Coveo.</p>\n<p>Heklsjfldsfj;ds</p>\n<p>dsfjlkdskjafklds</p>\n<p>lkfjdslkfjldk</p>\n",
    "protected": false
  },
  "author": 4321,
  "_embedded": {
    "author": [
      {
        "id": 4321,
        "name": "John Smith",
        "url": "",
        "description": "",
        "link": "https://example.com/author/00u1o7cop0d6hhecl0h8/",
        "slug": "00u1o7cop0d6hhecl0h8"
      }
    ],
    "wp:term": [
      [
        {
          "id": 73,
          "link": "https://example.com/category/docs-testing/",
          "name": "DocsTesting",
          "taxonomy": "category"
        }
      ],
      [
        {
          "id": 242,
          "link": "https://example.com/tag/abc-xyz/",
          "name": "abc-xyz",
          "taxonomy": "post_tag"
        },
        {
          "id": 243,
          "link": "https://example.com/tag/coveo-test-wp/",
          "name": "coveo-test-wp",
          "taxonomy": "post_tag"
        }
      ]
    ]
  }
}

In your search interface, you want to be able to filter posts by tag. So, when indexing a blog post, you also want to index a list of its tags as metadata.

To do so, you use a JSONPath expression, in which you add a filter expression based on the taxonomy property to exclude category names. In your source JSON configuration, the Metadata object may therefore look as follows:

"Metadata": {
  "author": "%[_embedded.author[0].name]",
  "date": "%[date_gmt]",
  "wordpress_status": "%[status]",
  "id": "%[id]",
  "wordpress_tag_names": "%[_embedded.wp:term..[?(@.taxonomy=='post_tag')].name]"
}

As a result, Coveo will index the tags associated with each blog post in the wordpress_tag_names field. In the Content Browser, this should look as follows:

Blog post item open in Coveo’s Content Browser

However, since you want Coveo to consider these values separately, you must make the wordpress_tag_names field a multi-value field. As a result, the tags will be displayed separately in the facet, like so:

Facet linked to multi-value field wordpress_tag_names in a Coveo search interface

Otherwise, you’ll see the tags as a single string in the facet, for example: abc-xyz;coveo-test-wp.
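
To clarify what the two filter expressions above select, here’s a Python sketch that applies the same conditions with list comprehensions to trimmed copies of the sample responses. It’s an approximation for illustration: the variable names are assumptions, and the two nested loops stand in for the recursive descent (..) of the JSONPath expression.

```python
# One row from the Smartsheet response shown earlier (trimmed).
row = {
    "id": 2314743682492292,
    "cells": [
        {"columnId": 7323702856574852, "value": "ERP Vendor Selection"},
        {"columnId": 1694203322361732, "value": "Closed"},
    ],
}

# cells[?(@.columnId==7323702856574852)].value
project_name = [
    cell["value"]
    for cell in row["cells"]
    if cell["columnId"] == 7323702856574852
]

# The "wp:term" array from the WordPress response (trimmed): a list of lists.
wp_terms = [
    [{"name": "DocsTesting", "taxonomy": "category"}],
    [
        {"name": "abc-xyz", "taxonomy": "post_tag"},
        {"name": "coveo-test-wp", "taxonomy": "post_tag"},
    ],
]

# _embedded.wp:term..[?(@.taxonomy=='post_tag')].name
tag_names = [
    term["name"]
    for group in wp_terms
    for term in group
    if term["taxonomy"] == "post_tag"
]

print(project_name)         # ['ERP Vendor Selection']
print(";".join(tag_names))  # abc-xyz;coveo-test-wp
```

The filter keeps only the objects for which the condition is true, so the category name DocsTesting is excluded from the tag list.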

Tip

Use the JSONPath Online Evaluator to test your JSON paths.

If an application field targeted by your JSON path is missing or empty, Coveo ignores it. The rest of the content to index, if applicable, is indexed normally.

For example, let’s say your source JSON configuration contains the following property: "blogpostcomment": "%[subject] - %[message]". If, in an item, the subject field contains Hello world! and the message field is empty, the content indexed in the blogpostcomment Coveo field will be: `Hello world! - `.

coveo_parent

coveo_parent always appears with a JSON path. In a source configuration, it instructs the Coveo crawler to refer to the Metadata properties of the parent item, and then to retrieve the value of the desired property. So, when writing the JSON configuration to retrieve data regarding a sub-item, if you want to get a metadata value that you specified earlier in your JSON configuration for the parent object, your dynamic value syntax should be the following:

"CoveoFieldName": "%[coveo_parent.MetadataFieldName]"

Example

You use the Vimeo API to index user profiles, and then the videos uploaded by each user as sub-items associated with their profile. You therefore first make an API call to retrieve user profiles, and use it to build your endpoint configuration. The API response contains, among other pieces of metadata, the user’s ID: "userid": "jsmith01". In your user profile endpoint configuration, you therefore indicate:

"username": "%[userid]"

As a result, for user profiles, the userid provided by the Vimeo API is stored in the Coveo index as the value of the username field.

Then, you make another API call to retrieve videos, and use the API response to build the endpoint Subitems section of your JSON configuration. You would like to index, for each video, the username of the user who originally uploaded it, but the video endpoint doesn’t provide this information. You therefore need to instruct the Coveo crawler to retrieve it from the parent item configuration, that is, the user profile endpoint configuration. In your Subitems configuration, you include:

"uploadedby": "%[coveo_parent.username]"

This instructs the Coveo crawler to refer to the Metadata of the parent item in your JSON configuration, to retrieve the value of the username property, and to index it as the value of the uploadedby field for videos. In Coveo, user IDs will then appear as the username value of user profiles and as the uploadedby value of videos.
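
The lookup described above can be pictured with the following Python sketch. The dictionaries and the resolve_coveo_parent function are hypothetical and exist only for this illustration; they don’t reflect Coveo’s internal data structures.

```python
# Hypothetical in-memory model for illustration; not Coveo's data structures.
# Each item keeps the metadata produced by its configuration and a link to
# its parent item.
user_profile = {
    "metadata": {"username": "jsmith01"},  # from "username": "%[userid]"
    "parent": None,
}
video = {"metadata": {}, "parent": user_profile}


def resolve_coveo_parent(item, path):
    """Resolve a path such as 'coveo_parent.username' for the given item."""
    parts = path.split(".")
    # Each leading coveo_parent segment moves one level up the hierarchy.
    while parts and parts[0] == "coveo_parent":
        item = item["parent"]
        parts = parts[1:]
    # The remaining segment names a property in that ancestor's metadata.
    return item["metadata"][parts[0]]


video["metadata"]["uploadedby"] = resolve_coveo_parent(
    video, "coveo_parent.username"
)
print(video["metadata"]["uploadedby"])  # jsmith01
```

Note that the value comes from the parent’s configured metadata (username), not from the property name used in the API response (userid).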

A dynamic value can also contain more than one coveo_parent, if needed.

Example

"CoveoFieldName": "%[coveo_parent.coveo_parent.MetadataFieldName]" refers to the parent of the parent of a sub-item.

Note

coveo_parent alone followed by a JSON path indicates to the Coveo crawler that it must go back up in your JSON configuration, under the Metadata object, to find the specified property. However, if you didn’t specify the desired property in the Metadata of the parent object, you can use raw to retrieve it from the API response you obtained before for the parent item.

In a permission configuration, coveo_parent instructs the Coveo crawler to retrieve the value of the desired property from the permission subquery AdditionalInfo properties. So, once you’ve retrieved the security identities associated with each item through the permission subquery, you must use the permission configuration and coveo_parent to extract the identities' relationships.

raw

In the metadata of a sub-item, you may want to refer to a property that was returned in the API JSON response for the parent item.

If you specified this property before in your JSON configuration for the parent item metadata, you can retrieve it from your own configuration with coveo_parent alone. If you didn’t specify it, however, you must instruct Coveo to retrieve it from the API response returned for the parent item by adding raw in the JSON path leading to the desired property.

Example

When implementing result folding, you want a sub-item to have its parent item URI in the foldingparent field. In your sub-item configuration, you therefore write:

"foldingparent": "%[coveo_parent.uri]"

However, you didn’t include the uri property in the parent item endpoint configuration, as it was irrelevant at this point. So, you must add raw to your sub-item property value:

"foldingparent": "%[coveo_parent.raw.uri]"

This instructs Coveo to retrieve the parent item URI from the metadata provided in the API JSON response rather than from the parent item endpoint configuration you wrote earlier.
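
The difference between the two lookups can be sketched in Python as follows. The parent dictionary and the resolve_from_parent function are assumptions made for this illustration, not Coveo’s implementation: the sketch simply keeps the configured metadata and the raw API response side by side.

```python
# Hypothetical in-memory model for illustration; not Coveo's data structures.
parent = {
    # Metadata produced by the parent endpoint configuration (uri not mapped).
    "metadata": {"title": "John Smith"},
    # Full API JSON response the parent item was built from.
    "raw": {"uri": "/users/jsmith01", "name": "John Smith"},
}


def resolve_from_parent(parent_item, path):
    """Resolve 'coveo_parent.<name>' or 'coveo_parent.raw.<dotted path>'."""
    parts = path.split(".")
    if len(parts) < 2 or parts[0] != "coveo_parent":
        raise ValueError(f"Unsupported path: {path!r}")
    if parts[1] == "raw":
        # raw: walk the parent's API response instead of its metadata.
        value = parent_item["raw"]
        for key in parts[2:]:
            value = value[key]
        return value
    # Without raw, only the configured metadata is visible, so a property
    # that was never mapped resolves to nothing.
    return parent_item["metadata"].get(parts[1])


print(resolve_from_parent(parent, "coveo_parent.uri"))      # None
print(resolve_from_parent(parent, "coveo_parent.raw.uri"))  # /users/jsmith01
```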

Note

When a JSON path leading to the desired property in the API response is too long or complicated, you can avoid using the raw metadata of the parent item and instead include the property in the parent item endpoint configuration to give it a more suitable JSON path. In your child item Metadata, you can then use coveo_parent and the new JSON path.

Example

The API JSON response provides the following user profile metadata:

{
  "uri": "/users/jsmith01",
  "name": "John Smith",
  "location": "Winnipeg, Canada",
  "bio": "I am a tax expert and I like to share videos of my lovely dog!",
  "personalwebsite": {
    "name": "John Smith's Personal Website",
    "link": "http://www.johnsmith.ca",
    "description": "Visit my personal website for funny photos of my basset hound running!"
  },
  "professionalwebsite": {
    "name": "John Smith's Professional Website",
    "link": "http://www.taxseasonsgreetings.ca",
    "description": "Visit my professional website to discover my tips and tricks for filing taxes. Happy tax filing!"
  }
}

You choose not to index website links in user profiles. Your configuration therefore includes:

"title": "%[name]",
"uri": "%[uri]",
"location": "%[location]",

You also want to index videos as sub-items of user profiles. In Coveo, you want the metadata of video items to include the author’s personal website. Your sub-item endpoint configuration should therefore include:

"website": "%[coveo_parent.raw.personalwebsite.link]"

However, you think that this dynamic value is too long and prefer to avoid it so that reading and interpreting your JSON configuration remains effortless. So, you add "website": "%[personalwebsite.link]" to the user profile endpoint JSON configuration, that is, the parent item configuration, and, in the video item configuration, you include:

"website": "%[coveo_parent.website]"

By doing so, you replace the JSON path leading to the desired value with a simpler one. Although not necessary, it may be more convenient.

Whitespace characters

Depending on the content repository to index, your dynamic values may contain whitespace characters. In such a case, the syntax to use is slightly different: the property name must be enclosed in single quotes and square brackets in addition to the regular syntax, for example, "%[['property with whitespace']]" as opposed to "%[property]" in the regular syntax.

The following table shows the syntax to use based on the scenario.

Scenario | Syntax to use

No whitespace (regular syntax) | "%[property]"

Single property | "%[['property with whitespace']]"

Property with whitespace nested within another property | "%[property.['property with whitespace']]"

Property with whitespace nested within another property with whitespace | "%[['property with whitespace'].['property with whitespace']]"

Complex expression and property with whitespace | "%[['property with whitespace'][?(@.type=='CATEGORY')].name]"

Multi-value fields

The examples above show JSON paths leading to properties that have a single value. However, the API’s JSON response may contain arrays in which several objects have a property in common. In such a case, you may want to retrieve some or all of the values associated with this property and populate a single Coveo field with these values. You must therefore write your source configuration accordingly, so that Coveo indexes the desired content.

To populate a Coveo field with many values, use the dynamic value syntax ("CoveoFieldName": "%[DynamicValue]") with, inside the square brackets, a JSONPath expression.

When writing a typical JSON path to populate a Coveo field with many values:

  • Specify the objects to take into account between square brackets next to the array name.

  • Use a * character to represent all objects in an array, even if there’s only one object.

  • If you want to specify certain objects only, use zero-based indexes, that is, use 0 to refer to the first object in the array, use 1 to refer to the second object, and so on.

  • Use commas to separate the values to which you want to refer.

As a result, "CoveoFieldName": "%[path.object1[*].object2.property]" would populate the CoveoFieldName field with the property value of all object1 objects found under path.
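
The selector rules above can be sketched in Python as follows. This is a simplified model, not Coveo’s JSONPath engine: the select and to_field_value functions and the sample sizes array are assumptions made for this example, and the semicolon separator mirrors how several values appear when shown as a single string (for example, abc-xyz;coveo-test-wp).

```python
def select(array, selector):
    """Apply a bracket selector: '*', one index ('3'), or several ('0,1')."""
    if selector == "*":
        return list(array)
    return [array[int(index)] for index in selector.split(",")]


def to_field_value(values):
    """Join several values into the a;b;c form of a multi-value string."""
    return ";".join(str(value) for value in values)


# Sample data (assumed for this sketch): an array of picture sizes.
sizes = [
    {"width": 100, "height": 75},
    {"width": 200, "height": 150},
    {"width": 295, "height": 166},
    {"width": 640, "height": 360},
]

# sizes[0,1].width: first two objects only.
first_two = to_field_value(size["width"] for size in select(sizes, "0,1"))
# sizes[*].width: every object in the array.
all_widths = to_field_value(size["width"] for size in select(sizes, "*"))
# sizes[3].width: fourth object only, since indexes start at 0.
last_width = to_field_value(size["width"] for size in select(sizes, "3"))

print(first_two)   # 100;200
print(all_widths)  # 100;200;295;640
print(last_width)  # 640
```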

Example

The API returns the following JSON response, providing four sizes of the same picture.

{
    ...
    "name": "Vimeo Holiday Videos!",
    "pictures": {
        "uri": "/videos/148903960/pictures/548505676",
        "active": true,
        "type": "custom",
        "sizes": [
            {
                "width": 100,
                "height": 75,
                "link": "https://i.vimeocdn.com/video/54855654705676_100x75.jpg?r=pad"
            },
            {
                "width": 200,
                "height": 150,
                "link": "https://i.vimeocdn.com/video/5485567705676_200x150.jpg?r=pad"
            },
            {
                "width": 295,
                "height": 166,
                "link": "https://i.vimeocdn.com/video/1231234_295x166.jpg?r=pad"
            },
            {
                "width": 640,
                "height": 360,
                "link": "https://i.vimeocdn.com/video/548545654605676_640x360.jpg?r=pad"
            }
        ]
    }
...
}

In the Coveo width field, you want to list the first two available widths for this picture. Under Metadata, your JSON configuration therefore contains the following:

"width": "%[pictures.sizes[0,1].width]"

In Coveo, the width field will contain the following information: 100;200.

However, in the Coveo pictureuri field, you only want to have the link to the largest version of the picture. Your JSON configuration therefore contains the following metadata:

"pictureuri": "%[pictures.sizes[3].link]"

Example

The API returns a JSON response containing the websites array, which includes only one website object.

{
    ...
        "websites": [
        {
            "name": null,
            "link": "http://www.canadashistory.ca",
            "description": null
        }
      ]
    ...
}

You want the website link to appear in the Coveo website field. Although there’s only one link value in the JSON response, websites is an array, so you can’t omit the [*]. Your JSON configuration therefore contains the following metadata:

"website": "%[websites[*].link]"

If there were more than one object in the websites array, and you wanted to index only one of the link property values, you would have to specify the object whose property value you want to index. To index the link of the first object, your JSON configuration would therefore need the following metadata:

"website": "%[websites[0].link]"

Dynamic time expressions

Dynamic time expressions are placeholders for dates in your source JSON configuration. These expressions contain at least a token representing a specific date and time. They may also include a mathematical operator and a number of months, days, hours, or minutes. When indexing or re-indexing your source content, Coveo computes the time expression and retrieves the content matching your date criterion.

Allowed tokens are @Now and @RefreshDate. @Now represents the start date of the source update operation, while @RefreshDate represents the date of the last source refresh.

Allowed units are M (months), d (days), h (hours), and m (minutes). Only whole values are supported. Space characters are supported, but not recommended.

When using a dynamic time expression with a date token, make sure to provide the date format with the DateFormat parameter.

Examples

When performing a source refresh, Coveo retrieves all items modified after the last update operation start date:

"RefreshEndpoints":[
  {
    "DateFormat":"\\'yyyy-MM-dd\\',\\'hh:mm:ss\\'",
    "QueryParameters":{
      "modified":"%[modified_date]>=@RefreshDate"
    }
  }
...
]

When indexing a Slack channel, Coveo retrieves all messages written in the last 6 months:

"Endpoints": [
  {
    "Method": "GET",
    "Path": "/api/conversations.history/",
    "QueryParameters": {
      "token": "@ApiKey",
      "channel": "AD8GFL97BFG",
      "oldest": "@Now-6M",
      "latest": "@Now"
    },
    "DateFormat": "UnixEpoch",
    ...
  }
...
]
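
To illustrate how an expression such as @Now-6M breaks down into a token, an operator, a number, and a unit, here’s a Python sketch that evaluates the syntax described in this section. It’s not Coveo’s implementation: the evaluate and shift_months functions are hypothetical, and clamping the day to the end of a shorter month is an assumption, since this article doesn’t specify how month arithmetic is rounded.

```python
import calendar
import re
from datetime import datetime, timedelta

# Token, then an optional operator, whole number, and unit (M, d, h, or m).
EXPRESSION = re.compile(r"^@(Now|RefreshDate)\s*(?:([+-])\s*(\d+)\s*([Mdhm]))?$")


def shift_months(moment, months):
    """Move a date by whole months, clamping the day (assumed behavior)."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


def evaluate(expression, now, refresh_date):
    """Compute the date that a dynamic time expression stands for."""
    match = EXPRESSION.match(expression.strip())
    if match is None:
        raise ValueError(f"Unsupported time expression: {expression!r}")
    token, sign, amount, unit = match.groups()
    base = now if token == "Now" else refresh_date
    if sign is None:
        return base
    count = int(amount) if sign == "+" else -int(amount)
    if unit == "M":
        return shift_months(base, count)
    deltas = {
        "d": timedelta(days=count),
        "h": timedelta(hours=count),
        "m": timedelta(minutes=count),
    }
    return base + deltas[unit]


now = datetime(2024, 3, 11, 15, 43, 43)          # sample update start date
last_refresh = datetime(2024, 3, 10, 8, 0, 0)    # sample last refresh date
print(evaluate("@Now-6M", now, last_refresh))    # 2023-09-11 15:43:43
print(evaluate("@RefreshDate", now, last_refresh))  # 2024-03-10 08:00:00
```

The computed date would then be rendered according to the DateFormat value before being sent to the API, a step this sketch leaves out.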

Inheritable properties

The SkippableErrorCodes, Paging, and Authentication properties can sometimes be inherited, that is, the value specified in a parent object also applies to its child object if no other value is specified in the child object configuration. You can take advantage of such properties to avoid redundancy in your JSON configuration.

If you want an inheritable property specified for an object to apply to its direct children, omit this property in the child object configuration. If you specify the property for the child object with a different value than that specified for its parent, the child value applies and the parent value is ignored.

However, the inheritable character of these properties isn’t carried forward at all levels:

  • SkippableErrorCodes specified at the service level apply to the child endpoints, and sub-queries underneath these endpoints inherit from this property as well. Sub-items don’t inherit this property.

    Example

    In the following configuration structure, since there’s no SkippableErrorCodes specified for Endpoint 1, the property in the parent object (Service) applies. Similarly, Sub-query 1 uses the SkippableErrorCodes value used by Endpoint 1, which is "404". Endpoint 2, however, has a different SkippableErrorCodes value than its parent object, Service. The Service value is therefore overridden by an empty value, which means that no error code should be skipped. Since Sub-query 2 has no specified SkippableErrorCodes, the value specified in its parent object applies, which is " ".

    • Service SkippableErrorCodes: "404"

      • Endpoint 1 (SkippableErrorCodes not specified)

        • Sub-query 1 (SkippableErrorCodes not specified)

      • Endpoint 2 SkippableErrorCodes: " "

        • Sub-query 2 (SkippableErrorCodes not specified)

  • Paging properties specified at the service level apply to the child endpoints. Sub-items underneath these endpoints also inherit the Paging properties. Sub-queries don’t inherit Paging properties.

    Note

    PermissionSubQueries objects inherit paging properties specified at the service or endpoint level.

  • Authentication properties specified at the service level apply to the child endpoints, and sub-queries underneath them. Typically, you don’t need to override the configuration you entered at the service level further down in the JSON configuration, as the same authentication method usually applies to the entire application content.

To disable paging property inheritance, set DoNotInherit to true in the Paging object. This prevents the paging configuration from being applied to child objects, which can help speed up the crawling process.
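
The SkippableErrorCodes example above follows a simple rule: use the object’s own value if it has one, otherwise the value of its nearest ancestor. The following Python sketch models that rule for the service, endpoint, and sub-query chain only. The effective_value function and the dictionaries are hypothetical, and the sketch doesn’t model the exceptions listed above (such as sub-items not inheriting SkippableErrorCodes) or the DoNotInherit setting.

```python
def effective_value(node, prop, ancestors=()):
    """Return the node's own value for prop, else the nearest ancestor's."""
    # Check the node first, then its ancestors from closest to farthest.
    for candidate in (node, *reversed(ancestors)):
        if prop in candidate:
            return candidate[prop]
    return None


# Same structure as the SkippableErrorCodes example above.
service = {"SkippableErrorCodes": "404"}
endpoint_1 = {}
sub_query_1 = {}
endpoint_2 = {"SkippableErrorCodes": " "}
sub_query_2 = {}

prop = "SkippableErrorCodes"
results = {
    "Endpoint 1": effective_value(endpoint_1, prop, [service]),
    "Sub-query 1": effective_value(sub_query_1, prop, [service, endpoint_1]),
    "Endpoint 2": effective_value(endpoint_2, prop, [service]),
    "Sub-query 2": effective_value(sub_query_2, prop, [service, endpoint_2]),
}
for name, value in results.items():
    print(f"{name}: {value!r}")
```

Endpoint 1 and Sub-query 1 resolve to "404", while Endpoint 2 and Sub-query 2 resolve to " ", matching the explanation in the example.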