Generic REST API Source Concepts

Although Coveo Cloud has dedicated connectors for many web and on-premises systems (see Connector Directory), you may want to make other content searchable. In such a case, you can use a Generic REST API source to retrieve and include this content in Coveo Cloud.

Many web applications offer a public API that developers can use to leverage the application in their own software. Coveo Cloud takes advantage of such an API to call the application and retrieve its content. A Generic REST API source therefore lets you make searchable the content of many remote repositories that expose their data through a REST API.

In the Coveo Administration Console, you must provide a JSON REST configuration allowing Coveo to retrieve items from the repository REST services and their respective resource endpoints. This configuration indicates which API calls to execute to fetch the desired items, how to parse the responses to extract relevant metadata, and what type of resources these items represent.

While the Generic REST API Source Reference article details the elements to use in a JSON configuration, this article explains some basic concepts that apply to any Generic REST API source and its JSON configuration. It describes the typical structure of a repository to index, how to use dynamic values to retrieve your content, and how JSON object properties can be passed down to child objects in your JSON configuration.

See also Add or Edit a Generic REST API Source for more information on the source and the Generic REST API Source Tutorial for a complete step-by-step tutorial.

Repository Structure

The typical remote repository consists of services, endpoints, and items arranged in a hierarchical fashion. Each service contains one or more resource endpoints, and each of these endpoints represents a type of item to fetch, such as user profiles, web pages, files, etc.

[Figure: Generic REST API source key concepts]

While this structure applies to most repositories, the item types a repository contains vary. For example, a video-sharing website such as Vimeo provides not only video items, but also user profiles, channels, groups, etc., while a customer service management system may have support cases and knowledge articles as item types.

When you create a Generic REST API source in the Coveo Administration Console, you must provide a JSON source configuration listing the services and endpoints to crawl. This JSON configuration must also indicate which API calls to execute to fetch the desired items and how to parse the responses to extract relevant metadata.

Dynamic Values

Dynamic values are placeholders in your Generic REST API source JSON configuration that define what the Coveo Cloud index fields will contain for each item. Each dynamic value is replaced by metadata or content from the JSON response returned by the application API (see Generic REST API Source Tutorial).

The syntax to use to enter a dynamic value in your JSON configuration is the following (see Add or Edit a Generic REST API Source):

"CoveoFieldName": "%[DynamicValue]"

A dynamic value can consist of one or more of the following:

  • coveo_url, a placeholder for your service URL.

  • A JSON path leading to a property specified earlier in your JSON configuration or in the JSON response returned by the web application API.

  • coveo_parent, which precedes a JSON path and indicates that this JSON path refers to a property of the parent of an item.

  • raw, which is inserted between coveo_parent and a JSON path, and lets you retrieve a value from the API JSON response that was returned for the parent of an item.

A dynamic value can also act as a placeholder for more than one field value (see Multi-Value Fields). For example, if the web application API returns several tags associated with a blog article, you can use a dynamic value to index some or all of them in the Coveo tags field.

However, if a piece of metadata is constant across all items of an endpoint, you don’t need a dynamic value. You can hardcode your value, i.e., enter it as a static value that applies to all items.

All posts on your blog are in English only. The value of the language field therefore doesn’t vary and can be hardcoded: "language": "English".

Conversely, if you have posts both in English and Spanish, the language value will vary. You will therefore need to use a JSON path dynamic value: "language": "%[lang]".

coveo_url

coveo_url is a placeholder for your service URL.

You write the following configuration to retrieve user profiles. Your service URL is https://myapplication.com. In addition, user profile URIs are built from a dynamic value consisting of your service URL, /users/, and another dynamic value representing the user ID.

As a result, in Coveo Cloud, for user profile items, the Uri field will be populated with https://myapplication.com/users/, followed by the corresponding user ID.

{
  "Services": [
    {
      "Url": "https://myapplication.com",
      "Endpoints": [
        {
          "Path": "users/%[userid]",
          "Method": "GET",
          "ItemType": "User",
          "Uri": "%[coveo_url]/users/%[userid]",
          "ClickableUri": "%[link]",
          "Title": "%[name]",
          "Body": "%[bio]",
          "Metadata": {
            "datebirth": "%[birthday]",
            "city": "%[location]"
          }
        }
      ]
    }
  ]
}

JSON Path

A JSON path dynamic value tells the Coveo crawler where to find the desired information. Just as a URL leads to a specific page on a website, a JSON path acts as an address for a property. So, when building your Generic REST API source JSON configuration, you must make calls to the web application whose content you want to index. In the JSON response, you will see how the desired data is labeled and organized, and you will then be able to use this information to direct the Coveo crawler to the data you want to retrieve.

You want to index Vimeo user profiles. In Coveo Cloud, you want the body of these items to be the user presentation text displayed on their Vimeo profile. In a Vimeo API JSON response, a user’s presentation text is provided as the bio value.

    {
      "uri": "/users/4543466",
      "name": "John Smith",
      "link": "https://vimeo.com/johnsmith",
      "location": "New York City",
      "bio": "Hello, World! My name is John Smith. I like posting cute videos of my pets on Vimeo.",
      "websites": [
          {
              "name": "John Smith's Professional Website",
              "link": "http://www.johnsmith.com",
              "description": "Visit my website to check out some samples of my work as a videographer!"
          }
      ]
    }

In Coveo Cloud, the item body field is named body. You therefore configure your Generic REST API source so that the content of the bio Vimeo field is indexed in Coveo Cloud as the content of the body field (see Dynamic Values): "Body": "%[bio]". This instructs the Coveo crawler to retrieve the content of the bio property in the API JSON response, and then to store this information as the value of the Coveo Cloud body field.

If you want to use a dynamic value to refer to a property value that’s nested within another property, use the following JSONPath syntax in your JSON configuration:

"CoveoFieldName": "%[ItemMetadataObjectFieldName.ItemMetadataFieldName]"

This instructs the Coveo crawler to look in the ItemMetadataObjectFieldName object for the ItemMetadataFieldName property, and to save the value of this property as the value of the CoveoFieldName field.

You want to index Vimeo user profiles and want the Coveo Cloud picture field to contain a URL to the user profile picture. The API JSON response contains the following:

"profilepicture": {
  "pictureuri": "/user/{user_id}/pictures/{picture_id}",
  "date_uploaded": "2016-12-20T00:41:41+00:00",
  "width": 1920,
  "height": 1080
}

You therefore configure your Generic REST API source so that the content of the pictureuri Vimeo field under profilepicture, i.e., the URI of the user’s profile picture, is indexed in Coveo Cloud as the content of the picture field (see Dynamic Values and About Fields):

"picture": "%[coveo_url]%[profilepicture.pictureuri]"

This informs the Coveo crawler that the content of the Coveo Cloud picture field should consist of the service URL followed by the value of the Vimeo pictureuri property, which can be found in the API JSON response under profilepicture.

The JSONPath syntax also allows applying a filter expression. See JSONPath With Filter Expression for an example.

Use the JSONPath Online Evaluator to test your JSON paths.

Should an application field targeted by your JSON path be missing or empty, Coveo ignores it. The rest of the content to index, if applicable, is indexed normally.

Your source configuration contains: "privatemessage": "Direct message: %[subject] - %[message]".

If the subject field contains Hello world! and the message field is empty, the content indexed in the privatemessage Coveo field is: Direct message: Hello world! -.
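The substitution behavior described in this section can be illustrated with a short Python sketch. This is not Coveo's implementation: the lookup and resolve functions, the sample response, and the service URL are hypothetical, and the sketch handles only coveo_url and plain dotted paths. It also assumes that a missing or empty property contributes an empty string.

```python
import re

# Matches placeholders of the form %[some.path]
PLACEHOLDER = re.compile(r"%\[([^\]]+)\]")

def lookup(data, dotted_path):
    """Follow a dotted path such as 'profilepicture.pictureuri'.
    Returns '' when a property along the path is missing or null (assumption)."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or current.get(key) is None:
            return ""
        current = current[key]
    return str(current)

def resolve(template, response, service_url):
    """Replace every placeholder in the template; coveo_url maps to the service URL."""
    context = {**response, "coveo_url": service_url}
    return PLACEHOLDER.sub(lambda m: lookup(context, m.group(1)), template)

# Hypothetical API response for one item
response = {
    "subject": "Hello world!",
    "message": "",
    "profilepicture": {"pictureuri": "/user/1/pictures/2"},
}
service_url = "https://myapplication.com"

private_message = resolve("Direct message: %[subject] - %[message]", response, service_url)
picture = resolve("%[coveo_url]%[profilepicture.pictureuri]", response, service_url)
print(repr(private_message))
print(picture)
```

In this sketch, the empty message property leaves the static text around it untouched, which mirrors the privatemessage example above. Whether Coveo trims the resulting trailing whitespace isn't covered here.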

coveo_parent

coveo_parent is always used with a JSON path. It instructs the Coveo crawler to refer to the Metadata properties of the parent item, and then to retrieve the value of the desired property. So, when writing the JSON configuration to retrieve data regarding a sub-item, if you want to get a metadata value that you specified earlier in your JSON configuration for the parent object, your dynamic value syntax should be the following:

"CoveoFieldName": "%[coveo_parent.MetadataFieldName]"

You use the Vimeo API to index user profiles, and then the videos uploaded by each user as sub-items associated with their profile. You therefore first make an API call to retrieve user profiles, and use it to build your endpoint configuration. The API response contains, among other pieces of metadata, the user’s ID: "userid": "jsmith01". In your user profile endpoint configuration, you therefore indicate:

"username": "%[userid]"

As a result, for user profiles, the userid provided by the Vimeo API is stored in the Coveo index as the value of the username field.

Then, you make another API call to retrieve videos, and use the API response to build the endpoint Subitems section of your JSON configuration. You would like to index, for each video, the username of the user who originally uploaded it, but the video endpoint doesn’t provide this information. You therefore need to instruct the Coveo crawler to retrieve it from the parent item configuration, i.e., the user profile endpoint configuration. In your Subitems configuration, you include:

"uploadedby": "%[coveo_parent.username]"

This instructs the Coveo crawler to refer to the Metadata of the parent item in your JSON configuration, to retrieve the value of the username property, and to index it as the value of the uploadedby field for videos. In Coveo, user IDs will then appear as the username value of user profiles and as the uploadedby value of videos.

A dynamic value can also contain more than one coveo_parent, if needed.

"CoveoFieldName": "%[coveo_parent.coveo_parent.MetadataFieldName]" refers to the parent of the parent of a sub-item.

coveo_parent alone followed by a JSON path indicates to the Coveo crawler that it must go back up in your JSON configuration, under the Metadata object, to find the specified property. However, if you didn’t specify the desired property in the Metadata of the parent object, you can use raw to retrieve it from the API response you obtained before for the parent item.

raw

In the metadata of a sub-item, you may want to refer to a property that was returned in the API JSON response for the parent item. If you specified this property before in your JSON configuration for the parent item metadata, you can retrieve it from your own configuration with coveo_parent alone. If you didn’t specify it, however, you must instruct Coveo to retrieve it from the API response returned for the parent item by adding raw in the JSON path leading to the desired property.

When implementing result folding, you want a sub-item to have its parent item URI in the foldingparent field. In your sub-item configuration, you therefore write:

"foldingparent": "%[coveo_parent.uri]"

However, you didn’t include the uri property in the parent item endpoint configuration, as it was irrelevant at this point. So, you must add raw to your sub-item property value:

"foldingparent": "%[coveo_parent.raw.uri]"

This instructs Coveo to retrieve the parent item URI from the metadata provided in the API JSON response rather than from the parent item endpoint configuration you wrote earlier.
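The difference between coveo_parent and coveo_parent.raw can be sketched in Python as follows. This is an illustration only, not Coveo's implementation: the resolve_parent_path function and both dictionaries are hypothetical. The sketch assumes that a property absent from the chosen source resolves to an empty string, and it doesn't handle chained coveo_parent segments.

```python
def resolve_parent_path(path, parent_metadata, parent_raw):
    """Resolve 'coveo_parent.<name>' against the parent's configured metadata,
    or 'coveo_parent.raw.<dotted.path>' against the parent's raw API response."""
    parts = path.split(".")
    if parts[0] != "coveo_parent":
        raise ValueError("expected a path starting with coveo_parent")
    if len(parts) > 1 and parts[1] == "raw":
        source, keys = parent_raw, parts[2:]
    else:
        source, keys = parent_metadata, parts[1:]
    current = source
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return ""  # assumption: missing properties yield an empty value
        current = current[key]
    return str(current)

# Hypothetical raw API response returned for the parent item (a user profile)
parent_raw = {"uri": "/users/jsmith01", "userid": "jsmith01", "name": "John Smith"}
# Metadata produced by the parent endpoint configuration: "username": "%[userid]"
parent_metadata = {"username": "jsmith01"}

uploaded_by = resolve_parent_path("coveo_parent.username", parent_metadata, parent_raw)
not_configured = resolve_parent_path("coveo_parent.uri", parent_metadata, parent_raw)
folding_parent = resolve_parent_path("coveo_parent.raw.uri", parent_metadata, parent_raw)
print(repr(uploaded_by), repr(not_configured), repr(folding_parent))
```

Here, coveo_parent.uri finds nothing because uri was never mapped in the parent configuration, while coveo_parent.raw.uri reads it directly from the parent's API response.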

When the JSON path leading to the desired property in the API response is too long or complicated, you can avoid using the raw metadata of the parent item and instead include the property in the parent item endpoint configuration to give it a more suitable JSON path. In your child item Metadata, you can then use coveo_parent and the new JSON path.

The API JSON response provides the following user profile metadata:

{
  "uri": "/users/jsmith01",
  "name": "John Smith",
  "location": "Winnipeg, Canada",
  "bio": "I am a tax expert and I like to share videos of my lovely dog!",
  "personalwebsite": {
    "name": "John Smith's Personal Website",
    "link": "http://www.johnsmith.ca",
    "description": "Visit my personal website for funny photos of my basset hound running!"
  },
  "professionalwebsite": {
    "name": "John Smith's Professional Website",
    "link": "http://www.taxseasonsgreetings.ca",
    "description": "Visit my professional website to discover my tips and tricks for filing taxes. Happy tax filing!"
  }
}

You choose not to index website links in user profiles. Your configuration therefore includes:

"title": "%[name]",
"uri": "%[uri]",
"location": "%[location]",

You also want to index videos as sub-items of user profiles. In Coveo, you want the metadata of video items to include the author’s personal website. Your sub-item endpoint configuration should therefore include:

"website": "%[coveo_parent.raw.personalwebsite.link]"

However, you think that this dynamic value is too long and prefer to avoid it so that reading and interpreting your JSON configuration remains effortless. So, you add "website": "%[personalwebsite.link]" to the user profile endpoint JSON configuration, i.e., the parent item configuration, and, in the video item configuration, you include:

"website": "%[coveo_parent.website]"

By doing so, you replace the JSON path leading to the desired value with a simpler one. Although not necessary, this may be more convenient.

Multi-Value Fields

The examples above show JSON paths leading to properties that have a single value. However, the JSON response returned by your web application may contain arrays in which several objects have a property in common. In such a case, you may want to retrieve some or all of the values associated with this property and populate a single Coveo Cloud field with these values. You must therefore write your Generic REST API source configuration accordingly, so that Coveo Cloud indexes the desired content.

To populate a Coveo Cloud field with many values, use the dynamic value syntax ("CoveoFieldName": "%[DynamicValue]") with, inside the square brackets, a JSONPath expression.

When writing a typical JSON path to populate a Coveo Cloud field with many values (see JSONPath Syntax):

  • Specify the objects to take into account between square brackets next to the array name.

  • Use a * character to represent all objects in an array, even if there’s only one object.

  • If you want to specify certain objects only, use zero-based indices, i.e., use 0 to refer to the first object in the array, 1 to refer to the second object, and so on.

  • Use commas to separate the values to which you want to refer.

As a result, "CoveoFieldName": "%[path.object1[*].object2.property]" would populate the CoveoFieldName field with the object2.property value of every object in the object1 array found under path.
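The selector rules above can be sketched in a few lines of Python. This only illustrates the rules and is not Coveo's JSONPath engine: the select function and the sample response are hypothetical, and the sketch supports only dotted names with *, single-index, and comma-separated index selectors.

```python
import re

# A path segment is a property name, optionally followed by [selector]
SEGMENT = re.compile(r"^(\w+)(?:\[([^\]]+)\])?$")

def select(data, path):
    """Return the list of values matched by a path such as 'pictures.sizes[0,1].width'."""
    current = [data]
    for segment in path.split("."):
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment}")
        name, selector = match.groups()
        next_values = []
        for node in current:
            if not isinstance(node, dict) or name not in node:
                continue  # skip nodes that lack the property
            value = node[name]
            if selector is None:
                next_values.append(value)
            elif not isinstance(value, list):
                continue  # selectors only apply to arrays
            elif selector == "*":
                next_values.extend(value)  # every object in the array
            else:
                for index in selector.split(","):
                    i = int(index)  # zero-based position
                    if i < len(value):
                        next_values.append(value[i])
        current = next_values
    return current

# Hypothetical response with an array of picture sizes
response = {"pictures": {"sizes": [
    {"width": 100, "link": "https://example.com/100x75.jpg"},
    {"width": 200, "link": "https://example.com/200x150.jpg"},
    {"width": 295, "link": "https://example.com/295x166.jpg"},
    {"width": 640, "link": "https://example.com/640x360.jpg"},
]}}

widths = ";".join(str(v) for v in select(response, "pictures.sizes[0,1].width"))
largest = select(response, "pictures.sizes[3].link")
all_widths = select(response, "pictures.sizes[*].width")
print(widths, largest, all_widths)
```

Joining the selected values with a semicolon is a choice made for this sketch so that several values can be shown as a single field value.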

Your web application returns the following JSON response, providing four sizes of the same picture.

{
    ...
    "name": "Vimeo Holiday Videos!",
    "pictures": {
        "uri": "/videos/148903960/pictures/548505676",
        "active": true,
        "type": "custom",
        "sizes": [
            {
                "width": 100,
                "height": 75,
                "link": "https://i.vimeocdn.com/video/54855654705676_100x75.jpg?r=pad"
            },
            {
                "width": 200,
                "height": 150,
                "link": "https://i.vimeocdn.com/video/5485567705676_200x150.jpg?r=pad"
            },
            {
                "width": 295,
                "height": 166,
                "link": "https://i.vimeocdn.com/video/1231234_295x166.jpg?r=pad"
            },
            {
                "width": 640,
                "height": 360,
                "link": "https://i.vimeocdn.com/video/548545654605676_640x360.jpg?r=pad"
            }
        ]
    }
...
}

In the Coveo Cloud width field, you want to list the first two available widths for this picture. Under Metadata, your JSON configuration therefore contains the following:

"width": "%[pictures.sizes[0,1].width]"

In Coveo, the width field will contain the following information: 100;200.

However, in the Coveo Cloud pictureuri field, you only want to have the link to the largest version of the picture. Your JSON configuration therefore contains the following metadata:

"pictureuri": "%[pictures.sizes[3].link]"

Your web application returns a JSON response containing the websites array, which includes only one website object.

{
    ...
    "websites": [
        {
            "name": null,
            "link": "http://www.canadashistory.ca",
            "description": null
        }
    ]
    ...
}

You want a website link to appear in the Coveo Cloud website field. Even though there’s only one link value in your JSON response, you can’t omit the [*]. Your JSON configuration therefore contains the following metadata:

"website": "%[websites[*].link]"

If there were more than one object in the websites array, and you wanted to index only one of the link property values, you would have to specify the object whose property value you want to index. To index the link of the first object, your JSON configuration would therefore need the following metadata:

"website": "%[websites[0].link]"

Inheritable Properties

The SkippableErrorCodes, Paging, and Authentication properties can sometimes be inherited, i.e., the value specified in a parent object also applies to its child object if no other value is specified in the child object configuration. You can take advantage of such properties to avoid redundancy in your JSON configuration.

If you want an inheritable property to apply to the direct children of an object, omit this property in the child object configuration. If you specify the property for the child object with a different value than the one specified for its parent, the child value applies and the parent value is ignored.

However, the inheritable character of these properties isn’t carried forward at all levels:

  • SkippableErrorCodes specified at the service level apply to the child endpoints, and sub-queries underneath these endpoints inherit this property as well. Sub-items don’t inherit this property.

    In the following configuration structure, since there’s no SkippableErrorCodes specified for Endpoint 1, the property in the parent object (Service) applies. Similarly, Sub-query 1 uses the SkippableErrorCodes value used by Endpoint 1, which is "404". Endpoint 2, however, has a different SkippableErrorCodes value than its parent object, Service. The Service value is therefore overridden by an empty value, which means that no error code should be skipped. Since Sub-query 2 has no specified SkippableErrorCodes, the value specified in its parent object applies, which is " ".

    • Service SkippableErrorCodes: "404"

      • Endpoint 1 (SkippableErrorCodes not specified)

        • Sub-query 1 (SkippableErrorCodes not specified)
      • Endpoint 2 SkippableErrorCodes: " "

        • Sub-query 2 (SkippableErrorCodes not specified)
  • Paging properties specified at the service level apply to the child endpoints. Sub-queries and sub-items underneath don’t inherit this property.

  • Authentication properties specified at the service level apply to the child endpoints, and sub-queries underneath them. Typically, you don’t need to override the configuration you entered at the service level further down in the JSON configuration, as the same authentication method usually applies to the entire application content.
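The SkippableErrorCodes example above follows a rule where the nearest specified value wins, which the following Python sketch illustrates. The function name and the dictionaries are hypothetical stand-ins for the configuration levels, not Coveo's implementation. The sketch covers services, endpoints, and sub-queries only, since sub-items don't inherit this property.

```python
def effective_skippable_codes(service, endpoint, sub_query=None):
    """Return the SkippableErrorCodes value that applies, taken from the
    nearest level that specifies it (sub-query, then endpoint, then service)."""
    for level in (sub_query, endpoint, service):
        if level is not None and "SkippableErrorCodes" in level:
            return level["SkippableErrorCodes"]
    return None  # nothing specified at any level

# Hypothetical configuration levels matching the structure described above
service = {"SkippableErrorCodes": "404"}
endpoint_1 = {}                              # not specified: inherits "404"
sub_query_1 = {}                             # not specified: inherits from Endpoint 1
endpoint_2 = {"SkippableErrorCodes": " "}    # overrides the service value
sub_query_2 = {}                             # not specified: inherits " "

codes_sub_query_1 = effective_skippable_codes(service, endpoint_1, sub_query_1)
codes_sub_query_2 = effective_skippable_codes(service, endpoint_2, sub_query_2)
print(repr(codes_sub_query_1), repr(codes_sub_query_2))
```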
