REST API Source Concepts

Coveo has dedicated connectors for many web and on-premises systems, allowing you to quickly make application content searchable. See the Connector directory for the full list.

However, you may want to index the content of an application for which there’s no dedicated connector. In such a case, provided you have the required privileges, you can use a generic API connector to retrieve the desired content and make it searchable with Coveo.

Many web applications offer a public API that developers can use to leverage the application in their own software. Coveo takes advantage of such an API to call the application and therefore retrieve its content. A REST API source allows you to index content from repositories exposing their data through a REST API.

When creating your source, in the Coveo Administration Console, you must provide a JSON configuration allowing Coveo to retrieve items from the repository services and their respective resource endpoints. This configuration indicates which API calls to execute to fetch the desired items, how to parse the responses to extract relevant metadata, and what type of resources these items represent.

While the Reference article details the elements to use in a JSON configuration, this article explains some basic concepts that apply to any REST API source and its JSON configuration. It describes the typical structure of a repository to index, how to use dynamic values to retrieve your content, and how JSON object properties can be passed down to child objects in your JSON configuration.

See also Add or Edit a REST API Source for more information on the source and the REST API Source Tutorial for a complete step-by-step tutorial.

Repository Structure

The typical remote repository consists of services, endpoints, and items arranged in a hierarchical fashion. Each service contains one or more resource endpoints, and each of these endpoints represents a type of item to fetch, such as user profiles, web pages, files, etc.

[Figure: Generic REST API source key concepts]

While this structure applies to most repositories, the item types a repository contains vary. For example, a video-sharing website such as Vimeo provides not only video items, but also user profiles, channels, groups, etc., while a customer service management system may have support cases and knowledge articles as item types.

When you create a REST API source in the Coveo Administration Console, you must provide a JSON source configuration listing the services and endpoints to crawl. This JSON configuration must also indicate which API calls to execute to fetch the desired items and how to parse the responses to extract relevant metadata.

Dynamic Values

Dynamic values are metadata values acting as placeholders in your REST API source JSON configuration and defining what the Coveo index fields will contain for each item. Dynamic values are to be replaced with metadata or content from the JSON response returned by the application API. See REST API Source Tutorial for examples.

The syntax to use to leverage a dynamic value in your source JSON configuration is typically "CoveoFieldName": "%[DynamicValue]".

However, if your dynamic value contains whitespace characters, the syntax to use is slightly different. See Whitespace Characters for details.

A dynamic value can consist of one or more of the following:

  • coveo_url, a placeholder for your service URL.

  • A JSON path leading to a property specified earlier in your JSON configuration or in the JSON response returned by the web application API.

  • coveo_parent, which precedes a JSON path and indicates that this JSON path refers to a property of the parent of an item.

  • raw, which is inserted between coveo_parent and a JSON path and allows you to retrieve a value from the API JSON response that previously provided you with metadata regarding the parent of an item.

A dynamic value can also act as a placeholder for more than one field value. For example, if the web application API returns several tags associated with a blog article, you can use a dynamic value to index some or all of them in the Coveo tags field.

However, if a piece of metadata is constant across all items of an endpoint, you don’t need a dynamic value. You can hardcode your value, that is, enter it as a static value that applies to all items.

Example

All posts on your blog are in English only. The value of the language field therefore doesn’t vary and can be hardcoded: "language": "English".

Conversely, if you have posts both in English and Spanish, the language value will vary. You will therefore need to use a JSON path dynamic value: "language": "%[lang]".
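The difference between a hardcoded value and a dynamic value can be pictured with a small Python sketch. This is only a toy model of the substitution idea, not how the Coveo crawler is implemented: the resolve function, the regular expression, and the sample post are all assumptions made for illustration, and the sketch only handles simple top-level property names.

```python
import re

# Matches one dynamic value such as %[lang]. The pattern is an illustrative
# assumption and does not cover nested paths or the whitespace syntax.
PLACEHOLDER = re.compile(r"%\[([^\]]+)\]")


def resolve(template: str, response: dict) -> str:
    """Replace each %[name] with the matching top-level property of the response."""
    def lookup(match: re.Match) -> str:
        value = response.get(match.group(1))
        return "" if value is None else str(value)

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical API response for one blog post.
post = {"lang": "Spanish", "title": "Hola"}

hardcoded = resolve("English", post)  # static value: nothing to substitute
dynamic = resolve("%[lang]", post)    # value read from the API response
```

In this model, the hardcoded template comes back unchanged for every item, while the dynamic one depends on each response.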

coveo_url

coveo_url is a placeholder for your service URL.

Example

You write the following configuration to retrieve user profiles. Your service URL is https://myapplication.com. In addition, user profile URIs are built by a dynamic value consisting of your service URL, /users/, and another dynamic value representing the user ID.

As a result, in Coveo, for user profile items, the Uri field will be populated with https://myapplication.com/users/, followed by the corresponding user ID.

{
  "Services":[
    {
      "Url":"https://myapplication.com",
      "Endpoints":[
        {
          "Path":"users/%[userid]",
          "Method":"GET",
          "ItemType":"User",
          "Uri":"%[coveo_url]/users/%[userid]",
          "ClickableUri":"%[link]",
          "Title":"%[name]",
          "Body":"%[bio]",
          "Metadata":{
            "datebirth":"%[birthday]",
            "city":"%[location]"
          }
        }
      ]
    }
  ]
}
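As a rough illustration of how the Uri value above ends up populated, the following Python sketch substitutes coveo_url with the service URL and the other placeholder with a value from the item. The resolve_uri function and the jsmith01 user ID are hypothetical; this is a simplified model, not Coveo's code.

```python
import re


def resolve_uri(template: str, service_url: str, item: dict) -> str:
    """Fill a template, treating coveo_url as the service URL (toy model)."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name == "coveo_url":
            return service_url
        return str(item.get(name, ""))

    return re.sub(r"%\[([^\]]+)\]", lookup, template)


# Hypothetical user returned by the application API.
uri = resolve_uri(
    "%[coveo_url]/users/%[userid]",
    "https://myapplication.com",
    {"userid": "jsmith01"},
)
```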

JSON Path

A JSON path dynamic value indicates to the Coveo crawler where to find the desired information. Just like a URL leads to a specific page in a website, a JSON path is akin to an address. So, when building your source JSON configuration, you must make calls to the web application whose content you want to index. In the JSON response, you’ll see how the desired data is labeled and organized, and you will then be able to use this information to direct the Coveo crawler to the data you want to retrieve.

Example

You want to index Vimeo user profiles. In Coveo, you want the body of these items to be the user presentation text displayed on their Vimeo profile. In a Vimeo API JSON response, a user’s presentation text is provided as the bio value.

    {
      "uri": "/users/4543466",
      "name": "John Smith",
      "link": "https://vimeo.com/johnsmith",
      "location": "New York City",
      "bio": "Hello, World! My name is John Smith. I like posting cute videos of my pets on Vimeo.",
      "websites": [
          {
              "name": "John Smith's Professional Website",
              "link": "http://www.johnsmith.com",
              "description": "Visit my website to check out some samples of my work as a videographer!"
          }
      ]
    }

In Coveo, the item body field is named body. You therefore configure your REST API source so that the content of the bio Vimeo field is indexed in Coveo as the content of the body field (see Dynamic Values): "Body": "%[bio]". This instructs the Coveo crawler to retrieve the content of the bio property in the API JSON response, and then to store this information as the value of the Coveo body field.

If you want to use a dynamic value to refer to a property value that’s nested within another property, the JSONPath syntax to use in your JSON value is the following:

"CoveoFieldName": "%[ItemMetadataObjectFieldName.ItemMetadataFieldName]"

This instructs the Coveo crawler to look in the ItemMetadataObjectFieldName object for the ItemMetadataFieldName property, and to save the value of this property as the value of the CoveoFieldName field.

Example

You want to index Vimeo user profiles and want the Coveo Picture field to contain a URL to the user profile picture. The API JSON response contains the following:

"Metadata": {
  "profilepicture": {
    "pictureuri": "/user/{user_id}/pictures/{picture_id}",
    "date_uploaded": "2016-12-20T00:41:41+00:00",
    "width": 1920,
    "height": 1080
  }
}

You therefore configure your REST API source so that the content of the pictureuri Vimeo field under profilepicture, that is, the URI of the user’s profile picture, is indexed in Coveo as the content of the picture field (see Dynamic Values and About Fields):

"picture": "%[coveo_url]%[profilepicture.pictureuri]"

This informs the Coveo crawler that the content of the Coveo picture field should consist of the service URL followed by the value of the Vimeo pictureuri, which can be found in the API JSON response under profilepicture.
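Following a nested path such as profilepicture.pictureuri amounts to walking the response one property at a time. The Python sketch below models that walk under simplifying assumptions: get_path is a hypothetical helper, the response places profilepicture at the top level, and the service URL and picture URI values are made up for illustration.

```python
from functools import reduce


def get_path(data: dict, path: str):
    """Follow a dot-separated path, returning None if any segment is missing."""
    def step(current, key):
        return current.get(key) if isinstance(current, dict) else None

    return reduce(step, path.split("."), data)


# Hypothetical response fragment with concrete IDs filled in.
response = {
    "profilepicture": {
        "pictureuri": "/user/123/pictures/456",
        "width": 1920,
    }
}

# Equivalent in spirit to "%[coveo_url]%[profilepicture.pictureuri]".
service_url = "https://myapplication.com"  # assumed service URL
picture = service_url + get_path(response, "profilepicture.pictureuri")
```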

The JSONPath syntax also allows applying a filter expression. See JSONPath With Filter Expression for an example.

Note

Use the JSONPath Online Evaluator to test your JSON paths.

Should an application field targeted by your JSON path be missing or empty, Coveo ignores it. The rest of the content to index, if applicable, is indexed normally.

Example

Your source configuration contains: "privatemessage": "Direct message: %[subject] - %[message]".

If the subject field contains Hello world! and the message field is empty, the content indexed in the privatemessage Coveo field is: Direct message: Hello world! -.
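The behavior described in this example can be approximated in Python as follows. The fill function is a hypothetical stand-in for the crawler's substitution step; whether Coveo trims the trailing space left by the empty value isn't stated here, so the sketch leaves it in place.

```python
import re


def fill(template: str, response: dict) -> str:
    """Substitute %[name] placeholders; missing or empty fields contribute nothing."""
    def lookup(match: re.Match) -> str:
        value = response.get(match.group(1))
        return "" if value in (None, "") else str(value)

    return re.sub(r"%\[([^\]]+)\]", lookup, template)


template = "Direct message: %[subject] - %[message]"
empty_message = fill(template, {"subject": "Hello world!", "message": ""})
missing_message = fill(template, {"subject": "Hello world!"})
```

In this model, an empty field and a missing field produce the same result, and the static parts of the template are kept as written.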

coveo_parent

coveo_parent always appears with a JSON path. In a source configuration, it’s used to instruct the Coveo crawler to refer to the parent item Metadata properties, and then to retrieve the value of the desired property. So, when writing the JSON configuration to retrieve data regarding a sub-item, if you want to get a metadata value that you specified earlier in your JSON configuration for the parent object, your dynamic value syntax should be the following:

"CoveoFieldName": "%[coveo_parent.MetadataFieldName]"

Example

You use the Vimeo API to index user profiles, and then the videos uploaded by each user as sub-items associated with their profile. You therefore first make an API call to retrieve user profiles, and use it to build your endpoint configuration. The API response contains, among other pieces of metadata, the user’s ID: "userid": "jsmith01". In your user profile endpoint configuration, you therefore indicate:

"username": "%[userid]"

As a result, for user profiles, the userid provided by the Vimeo API is stored in the Coveo index as the value of the username field.

Then, you make another API call to retrieve videos, and use the API response to build the endpoint Subitems section of your JSON configuration. You would like to index, for each video, the username of the user who originally uploaded it, but the video endpoint doesn’t provide this information. You therefore need to instruct the Coveo crawler to retrieve it from the parent item configuration, that is, the user profile endpoint configuration. In your Subitems configuration, you include:

"uploadedby": "%[coveo_parent.username]"

This instructs the Coveo crawler to refer to the Metadata of the parent item in your JSON configuration, to retrieve the value of the username property, and to index it as the value of the uploadedby field for videos. In Coveo, user IDs will then appear as the username value of user profiles and as the uploadedby value of videos.

A dynamic value can also contain more than one coveo_parent, if needed.

Example

"CoveoFieldName": "%[coveo_parent.coveo_parent.MetadataFieldName]" refers to the parent of the parent of a sub-item.
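One way to picture coveo_parent is as a step up the item hierarchy before the metadata lookup. The Python sketch below is a simplified model under that assumption: the Item class, the resolve_parent_value function, and the comment sub-item under the video are hypothetical and only serve to show one and two levels of coveo_parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    metadata: dict                   # values produced by the item's Metadata configuration
    parent: Optional["Item"] = None  # None for a top-level item


def resolve_parent_value(item: Item, path: str):
    """Resolve a path such as coveo_parent.username against ancestor metadata."""
    *hops, name = path.split(".")
    for hop in hops:
        # Each coveo_parent segment moves one level up the hierarchy.
        if hop != "coveo_parent" or item.parent is None:
            return None
        item = item.parent
    return item.metadata.get(name) if hops else None


profile = Item({"username": "jsmith01"})
video = Item({}, parent=profile)
comment = Item({}, parent=video)  # hypothetical sub-item of a video

uploaded_by = resolve_parent_value(video, "coveo_parent.username")
```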

Note

coveo_parent alone followed by a JSON path indicates to the Coveo crawler that it must go back up in your JSON configuration, under the Metadata object, to find the specified property. However, if you didn’t specify the desired property in the Metadata of the parent object, you can use raw to retrieve it from the API response you obtained before for the parent item.

In a permission configuration, coveo_parent instructs the Coveo crawler to retrieve the value of the desired property from the permission subquery AdditionalInfo properties. So, once you’ve retrieved the security identities associated with each item through the permission subquery, you must use the permission configuration and coveo_parent to extract the identities' relationships.

raw

In the metadata of a sub-item, you may want to refer to a property that was returned in the API JSON response for the parent item. If you specified this property before in your JSON configuration for the parent item metadata, you can retrieve it from your own configuration with coveo_parent alone. If you didn’t specify it, however, you must instruct Coveo to retrieve it from the API response returned for the parent item by adding raw in the JSON path leading to the desired property.

Example

When implementing result folding, you want a sub-item to have its parent item URI in the foldingparent field. In your sub-item configuration, you therefore write:

"foldingparent": "%[coveo_parent.uri]"

However, you didn’t include the uri property in the parent item endpoint configuration, as it was irrelevant at this point. So, you must add raw to your sub-item property value:

"foldingparent": "%[coveo_parent.raw.uri]"

This instructs Coveo to retrieve the parent item URI from the metadata provided in the API JSON response rather than from the parent item endpoint configuration you wrote earlier.
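The contrast between coveo_parent alone and coveo_parent with raw can be sketched in Python. This toy model assumes each parent item keeps two dictionaries, one for the metadata you configured and one for the original API response; the parent_value function and the sample data are illustrative, not part of Coveo.

```python
def parent_value(parent_metadata: dict, parent_response: dict, path: str):
    """Resolve coveo_parent.<path> or coveo_parent.raw.<path> (toy model)."""
    parts = path.split(".")
    if parts[0] != "coveo_parent":
        raise ValueError("expected a path starting with coveo_parent")
    parts = parts[1:]

    # Without raw, look in the configured metadata; with raw, in the API response.
    source = parent_metadata
    if parts and parts[0] == "raw":
        source, parts = parent_response, parts[1:]

    for key in parts:
        if not isinstance(source, dict):
            return None
        source = source.get(key)
    return source


# The parent configuration only mapped the title, but the response had more.
configured = {"title": "John Smith"}
api_response = {"uri": "/users/jsmith01", "name": "John Smith"}

without_raw = parent_value(configured, api_response, "coveo_parent.uri")
with_raw = parent_value(configured, api_response, "coveo_parent.raw.uri")
```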

Note

When a JSON path leading to the desired property in the API response is too long or complicated, you can choose to avoid using the raw metadata of the parent item, and to rather include the property in the parent item endpoint configuration to give it a more suitable JSON path. In your child item Metadata, you can then use coveo_parent and the new JSON path.

Example

The API JSON response provides the following user profile metadata:

{
     "uri": "/users/jsmith01",
     "name": "John Smith",
     "location": "Winnipeg, Canada",
     "bio": "I am a tax expert and I like to share videos of my lovely dog!",
     "personalwebsite": {
          "name": "John Smith's Personal Website",
          "link": "http://www.johnsmith.ca",
          "description": "Visit my personal website for funny photos of my basset hound running!"
         },
      "professionalwebsite": {
          "name": "John Smith's Professional Website",
          "link": "http://www.taxseasonsgreetings.ca",
          "description": "Visit my professional website to discover my tips and tricks for filing taxes. Happy tax filing!"
         }
 }

You choose not to index website links in user profiles. Your configuration therefore includes:

"title": "%[name]",
"uri": "%[uri]",
"location": "%[location]",

You also want to index videos as sub-items of user profiles. In Coveo, you want the metadata of video items to include the author’s personal website. Your sub-item endpoint configuration should therefore include:

"website": "%[coveo_parent.raw.personalwebsite.link]"

However, you think that this dynamic value is too long and prefer to avoid it so that reading and interpreting your JSON configuration remains effortless. So, you add "website": "%[personalwebsite.link]" to the user profile endpoint JSON configuration, that is, the parent item configuration, and, in the video item configuration, you include:

"website": "%[coveo_parent.website]"

By doing so, you replace the JSON path leading to the desired value with a simpler one. Although not necessary, it may be more convenient.

Whitespace Characters

Depending on the content repository to index, your dynamic values may contain whitespace characters. In such a case, the syntax to use is slightly different: the property name must be enclosed in single quotes and square brackets in addition to the regular syntax, for example, "%[['property with whitespace']]" as opposed to "%[property]" in the regular syntax.

The following table shows the syntax to use based on the scenario.

Scenario                                                                | Syntax to use
No whitespace (regular syntax)                                          | "%[property]"
Single property                                                         | "%[['property with whitespace']]"
Property with whitespace nested within another property                 | "%[property.['property with whitespace']]"
Property with whitespace nested within another property with whitespace | "%[['property with whitespace'].['property with whitespace']]"
Complex expression and property with whitespace                         | "%[['property with whitespace'][?(@.type=='CATEGORY')].name]"
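If you generate configurations programmatically, the first four rows of the table can be reproduced with a small helper. The dynamic_value function below is a hypothetical Python sketch: it quotes only the segments that contain whitespace, and it doesn't handle filter expressions or property names that themselves contain single quotes.

```python
def dynamic_value(*names: str) -> str:
    """Build a dynamic value from property names, quoting those with whitespace."""
    segments = [
        f"['{name}']" if any(ch.isspace() for ch in name) else name
        for name in names
    ]
    return "%[" + ".".join(segments) + "]"


regular = dynamic_value("property")
single = dynamic_value("property with whitespace")
nested = dynamic_value("property", "property with whitespace")
```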

Multi-Value Fields

The examples above show JSON paths leading to properties that have a single value. However, the JSON response returned by your web application may contain arrays in which several objects have a property in common. In such a case, you may want to retrieve some or all of the values associated with this property and populate a single Coveo field with these values. You must therefore write your source configuration accordingly, so that Coveo indexes the desired content.

To populate a Coveo field with many values, use the dynamic value syntax ("CoveoFieldName": "%[DynamicValue]") with, inside the square brackets, a JSONPath expression.

When writing a typical JSON path to populate a Coveo field with many values:

  • Specify the objects to take into account between square brackets next to the array name.

  • Use a * character to represent all objects in an array, even if there’s only one object.

  • If you want to specify certain objects only, use zero-based indices, that is, use 0 to refer to the first object in the array, 1 to refer to the second object, and so on.

  • Use commas to separate the values to which you want to refer.

As a result, "CoveoFieldName": "%[path.object1[*].object2.property]" would populate the CoveoFieldName field with the property value found under object2 in every object of the object1 array under path.

Example

Your web application returns the following JSON response, providing four sizes of the same picture.

{
    ...
    "name": "Vimeo Holiday Videos!",
    "pictures": {
        "uri": "/videos/148903960/pictures/548505676",
        "active": true,
        "type": "custom",
        "sizes": [
            {
                "width": 100,
                "height": 75,
                "link": "https://i.vimeocdn.com/video/54855654705676_100x75.jpg?r=pad"
            },
            {
                "width": 200,
                "height": 150,
                "link": "https://i.vimeocdn.com/video/5485567705676_200x150.jpg?r=pad"
            },
            {
                "width": 295,
                "height": 166,
                "link": "https://i.vimeocdn.com/video/1231234_295x166.jpg?r=pad"
            },
            {
                "width": 640,
                "height": 360,
                "link": "https://i.vimeocdn.com/video/548545654605676_640x360.jpg?r=pad"
            }
        ]
    }
...
}

In the Coveo width field, you want to list the first two available widths for this picture. Under metadata, your JSON configuration therefore contains the following metadata:

"width": "%[pictures.sizes[0,1].width]"

In Coveo, the width field will contain the following information: 100;200.

However, in the Coveo pictureuri field, you only want to have the link to the largest version of the picture. Your JSON configuration therefore contains the following metadata:

"pictureuri": "%[pictures.sizes[3].link]"

Example

Your web application returns a JSON response containing the websites array, which includes only one website object.

{
    ...
        "websites": [
        {
            "name": null,
            "link": "http://www.canadashistory.ca",
            "description": null
        }
      ]
    ...
}

You want a website link to appear in the Coveo website field. Your JSON configuration therefore contains the following metadata. Even though there’s only one object in the array, you can’t omit the [*].

"website": "%[websites[*].link]"

If there were more than one object in the websites array, and you wanted to index only one of the link property values, you would have to specify the object whose property value you want to index. To index the link of the first object, your JSON configuration would therefore need the following metadata:

"website": "%[websites[0].link]"
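The array selectors used in the examples above ([*], [0,1], and a single index) can be modeled in a few lines of Python. The select function is a hypothetical illustration of the selection logic only; it isn't a JSONPath implementation and doesn't reflect how Coveo evaluates paths internally. The sizes data is abridged from the first example.

```python
def select(objects: list, selector: str, prop: str) -> list:
    """Return prop from the objects chosen by '*' or by comma-separated zero-based indices."""
    if selector == "*":
        chosen = objects
    else:
        chosen = [objects[int(index)] for index in selector.split(",")]
    return [obj[prop] for obj in chosen if prop in obj]


# Widths and heights from the picture example; links omitted for brevity.
sizes = [
    {"width": 100, "height": 75},
    {"width": 200, "height": 150},
    {"width": 295, "height": 166},
    {"width": 640, "height": 360},
]
websites = [{"name": None, "link": "http://www.canadashistory.ca", "description": None}]

first_two_widths = select(sizes, "0,1", "width")  # like pictures.sizes[0,1].width
largest_height = select(sizes, "3", "height")     # like pictures.sizes[3].height
all_links = select(websites, "*", "link")         # like websites[*].link

# Several values end up in one field, shown in the example as 100;200.
width_field = ";".join(str(width) for width in first_two_widths)
```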

Dynamic Time Expressions

Dynamic time expressions are placeholders for dates in your source JSON configuration. These expressions contain at least a token representing a specific date and time. They may also include a mathematical operator and a number of months, days, hours, or minutes. When indexing or re-indexing your source content, Coveo computes the time expression and retrieves the content matching your date criterion.

Allowed tokens are @Now and @RefreshDate. @Now represents the start date of the source update operation, while @RefreshDate represents the date of the last source refresh.

Allowed units are M (months), d (days), h (hours), and m (minutes). Only whole values are supported. Space characters are supported, but not recommended.

When using a dynamic time expression with a date token, make sure to provide the date format with the DateFormat parameter.

Examples

When performing a source refresh, Coveo retrieves all items modified after the last update operation start date:

"RefreshEndpoints":[
  {
    "DateFormat":"\\'yyyy-MM-dd\\',\\'hh:mm:ss\\'",
    "QueryParameters":{
      "modified":"%[modified_date]>=@RefreshDate"
    }
  }
...
]

When indexing a Slack channel, Coveo retrieves all messages written in the last 6 months:

"Endpoints": [
  {
    "Method": "GET",
    "Path": "/api/conversations.history/",
    "QueryParameters": {
      "token": "@ApiKey",
      "channel": "AD8GFL97BFG",
      "oldest": "@Now-6M",
      "latest": "@Now"
    },
    "DateFormat": "UnixEpoch",
    ...
  }
...
]
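To make the arithmetic behind expressions such as @Now-6M concrete, here is a Python sketch that evaluates a token with an optional operator, whole number, and unit. It is a model built on assumptions: the evaluate and shift_months functions are hypothetical, only + and - are treated as operators, and clamping to the last day of a shorter month is a choice made for this sketch rather than documented Coveo behavior. Formatting the result according to DateFormat isn't covered.

```python
import calendar
import re
from datetime import datetime, timedelta

# Token, then optionally an operator, a whole number, and a unit (M, d, h, m).
EXPRESSION = re.compile(r"^@(Now|RefreshDate)\s*(?:([+-])\s*(\d+)\s*([Mdhm]))?$")


def shift_months(moment: datetime, months: int) -> datetime:
    """Add or subtract whole months, clamping the day to the target month's length."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def evaluate(expression: str, now: datetime, refresh_date: datetime) -> datetime:
    """Compute the date a dynamic time expression stands for (toy model)."""
    match = EXPRESSION.match(expression)
    if match is None:
        raise ValueError(f"unsupported time expression: {expression!r}")
    token, sign, amount, unit = match.groups()
    base = now if token == "Now" else refresh_date
    if sign is None:
        return base
    count = int(amount) * (1 if sign == "+" else -1)
    if unit == "M":
        return shift_months(base, count)
    deltas = {
        "d": timedelta(days=count),
        "h": timedelta(hours=count),
        "m": timedelta(minutes=count),
    }
    return base + deltas[unit]


# Hypothetical update start date and last refresh date.
now = datetime(2024, 8, 31, 12, 0)
last_refresh = datetime(2024, 8, 1, 9, 30)
oldest = evaluate("@Now-6M", now, last_refresh)
```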

Inheritable Properties

The SkippableErrorCodes, Paging, and Authentication properties can sometimes be inherited, that is, the value specified in a parent object also applies to its child object if no other value is specified in the child object configuration. You can take advantage of such properties to avoid redundancy in your JSON configuration.

If you want an inheritable property specified for an object to apply to its direct children, omit this property in the child object configurations. If you specify the property for a child object with a value different from the one specified for its parent, the child value applies and the parent value is ignored.

However, the inheritable character of these properties isn’t carried forward at all levels:

  • SkippableErrorCodes specified at the service level apply to the child endpoints, and sub-queries underneath these endpoints inherit this property as well. Sub-items don’t inherit this property.

    Example

    In the following configuration structure, since there’s no SkippableErrorCodes specified for Endpoint 1, the property in the parent object (Service) applies. Similarly, Sub-query 1 uses the SkippableErrorCodes value used by Endpoint 1, which is "404". Endpoint 2, however, has a different SkippableErrorCodes value than its parent object, Service. The Service value is therefore overridden by an empty value, which means that no error code should be skipped. Since Sub-query 2 has no specified SkippableErrorCodes, the value specified in its parent object applies, which is " ".

    • Service SkippableErrorCodes: "404"

      • Endpoint 1 (SkippableErrorCodes not specified)

        • Sub-query 1 (SkippableErrorCodes not specified)

      • Endpoint 2 SkippableErrorCodes: " "

        • Sub-query 2 (SkippableErrorCodes not specified)

  • Paging properties specified at the service level apply to the child endpoints. Sub-queries and sub-items underneath don’t inherit this property.

  • Authentication properties specified at the service level apply to the child endpoints, and sub-queries underneath them. Typically, you don’t need to override the configuration you entered at the service level further down in the JSON configuration, as the same authentication method usually applies to the entire application content.
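The SkippableErrorCodes example above can be read as a nearest-ancestor lookup: an object uses its own value if one is specified, and otherwise the value of the closest parent that specifies one. The Python sketch below models only that rule for services, endpoints, and sub-queries. The Node class and effective_codes function are hypothetical, and the sketch doesn't model the exceptions for sub-items or the different rules for Paging.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    skippable_error_codes: Optional[str] = None  # None means "not specified"
    parent: Optional["Node"] = None


def effective_codes(node: Optional[Node]) -> Optional[str]:
    """Return the value from the node itself or from its nearest ancestor that sets one."""
    while node is not None:
        if node.skippable_error_codes is not None:
            return node.skippable_error_codes
        node = node.parent
    return None


service = Node("Service", "404")
endpoint_1 = Node("Endpoint 1", parent=service)
sub_query_1 = Node("Sub-query 1", parent=endpoint_1)
endpoint_2 = Node("Endpoint 2", " ", parent=service)
sub_query_2 = Node("Sub-query 2", parent=endpoint_2)
```

In this model, Sub-query 1 resolves to "404" through Endpoint 1 and the service, while Sub-query 2 resolves to the " " value set on Endpoint 2.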