Add a GraphQL API source
The GraphQL API source is a beta feature. This means it will change over the coming months as Coveo adds to it and enhances your experience. In the meantime, we encourage you to start using it to explore the data that Coveo can index.
Coveo has dedicated connectors for many web and on-premises systems, allowing you to quickly make application content searchable. See the Connector directory for the full list.
However, you may want to index the content of an application for which there’s no dedicated connector. In such a case, when you have the required privileges, you can use a generic API connector to retrieve the desired content and make it searchable with Coveo.
The GraphQL API connector lets you crawl content from a remote repository exposing its data through a GraphQL API. When creating your source, in the Coveo Administration Console, you must provide a JSON configuration allowing Coveo to retrieve content items. This configuration indicates which API calls to execute to fetch the desired items, how to parse the responses to extract relevant metadata, and what type of resources these items represent.
For example, suppose your company uses GitHub to manage its software code, and you want to list the pull requests (PRs) made by each user for statistical purposes. You could create a GraphQL API source that queries GitHub’s API to index the desired content.
The GraphQL source works very similarly to the REST API source.
When working on your GraphQL API source, you may also want to refer to the following articles:
Leading practice
The number of items that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration. See About crawling speed for information on what can impact crawling speed, as well as possible solutions.
Source key characteristics
| Features | | Supported | Additional information |
|---|---|---|---|
| | | | For each endpoint you define in your source configuration, you can provide a refresh endpoint to override the initial endpoint. When you do so, the connector can add, update, or delete specific items in the index instead of refreshing the entire repository. However, the following limitations currently apply: |
| Content security options | | | If your source configuration includes the |
| Metadata indexing for search | Automapping of metadata to a field with a matching name | | |
| | Automatically indexed metadata | | Sample of autopopulated fields (no user-defined metadata required): Sample of autopopulated fields (source configuration parameter of matching name required): After a content update operation, inspect your item field values in the Content Browser. |
| | Collected indexable metadata | | Parameters specified in the source configuration |
Commerce requirements
When using a GraphQL API source to index commerce-specific content, such as products, variants, and availabilities, you have to undergo a catalog configuration process to benefit from all commerce-related capabilities.
More specifically, you must:
- Enable Coveo Personalization-as-you-go (PAYG) capabilities in your source.
- Associate your source with a catalog configuration.
Note that additional configuration is required. Contact your Customer Success Manager to discuss your options.
Enable Coveo Personalization-as-you-go
Coveo Machine Learning tools include Coveo Personalization-as-you-go (PAYG) capabilities for commerce use cases. This suite of advanced features learns from a user’s intent and reacts within a few clicks. PAYG models require the building of a product vector space to represent the products contained in your source. For GraphQL API sources, Coveo PAYG needs to be enabled in order to produce the product vector space.
You must enable PAYG in your source before starting to index content in it.
To enable PAYG in your source
- Modify the parameters section by adding the following:
  "parameters": { "UseStreamApi": { "value": "true" } }
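If you keep your source JSON configuration in a file or script, the same change can be applied programmatically. The following is a minimal sketch only: the enable_stream_api helper and the sample configuration are hypothetical, and the code edits a local dictionary without calling any Coveo API. Only the parameters, UseStreamApi, and value keys come from this article.

```python
import json


def enable_stream_api(source_config: dict) -> dict:
    """Return a copy of the configuration with UseStreamApi set to "true".

    Hypothetical helper for illustration; it only edits a local dictionary.
    """
    updated = json.loads(json.dumps(source_config))  # simple deep copy
    parameters = updated.setdefault("parameters", {})
    parameters["UseStreamApi"] = {"value": "true"}
    return updated


# Assumed sample: a configuration that already holds another parameter.
config = {"parameters": {"SomeOtherParameter": {"value": "42"}}}
print(json.dumps(enable_stream_api(config), indent=2))
```

Existing parameters are preserved, and the original dictionary is left untouched.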
Catalog configuration
Behind the scenes, the GraphQL API source uses the Stream API to push content to the Coveo index. Therefore, GraphQL API sources must be associated with a catalog entity to ensure a complete configuration. This allows the source to accurately build a product vector space.
For instructions on how to create a catalog entity, see Commerce catalog entity.
Add a GraphQL API source
Follow the instructions below to add a GraphQL API source using the desired content retrieval method.
- On the Sources (platform-ca | platform-eu | platform-au) page, click Add source.
- In the Add a source of content panel, click the Cloud or Crawling Module tab, depending on your content retrieval context. With the latter, you must install the Crawling Module to make your source operational.
- Click the GraphQL API tile.
- Configure your source.
The completion steps are especially important when creating a source of this type.
Leading practice
It’s best to create or edit your source in your sandbox organization first. Once you’ve confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually. See About non-production organizations for more information and best practices regarding sandbox organizations.
"Configuration" tab
In the Add a GraphQL API Source panel, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters.
General information
Source name
Enter a name for your source.
Leading practice
A source name can’t be modified once it’s saved, therefore be sure to use a short and descriptive name, using letters, numbers, hyphens ( |
Paired Crawling Module
If your source is a Crawling Module source, and if you have more than one Crawling Module linked to this organization, select the one with which you want to pair your source. If you change the Crawling Module instance paired with your source, a successful rebuild is required for your change to apply.
Optical character recognition (OCR)
If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option.
The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick view. See Enable optical character recognition for details on this feature.
Project
If you have the Enterprise edition, use the Project selector to associate your source with one or multiple Coveo projects.
"Authentication" section
In the Authentication section, all parameters are optional. Fill the appropriate boxes depending on the authentication type used by the source you want to make searchable.
- If your source uses an HTTP, Basic, Kerberos, or NTLM authentication protocol, enter the Username and Password of the account with which you want to crawl the source. Then, use the @Username and @Password placeholders in your source JSON configuration instead of exposing the credentials in clear text. The account whose credentials you enter must have access to all the content that you want to make searchable. See Source credentials leading practices.
- If your source uses the OAuth 2.0 authentication protocol, enter your content source Client ID, Client secret, and Refresh token in the corresponding boxes. Then, use the @ClientID, @ClientSecret, and @RefreshToken placeholders in your source JSON configuration instead of exposing this information in clear text.
- If your source uses an API key to authenticate, enter it in the API key box. Then, use the @ApiKey placeholder in your source JSON configuration instead of exposing the API key in clear text.
- If your source doesn’t require authentication, leave all boxes empty.
"Content to include" section
In the JSON configuration box, enter your source JSON configuration.
Under PayloadJsonContent or QueryParameters, enter a placeholder starting with @ for your GraphQL queries, for example, @MyFirstQuery.
Then, in the GraphQL queries section below, enter the name you used as a placeholder and your actual query.
For more information on the GraphQL API source JSON configuration, see:
"GraphQL queries" section
Click Add query to start adding the GraphQL API queries that you want Coveo to execute to retrieve your content. You may want to use a GraphQL-to-JSON conversion tool such as Data Fetcher’s to help you write your queries.
Under Query name, enter a name that you’ll use as a placeholder for your query in the source JSON configuration.
This name must start with @, for example, @MyFirstQuery.
Then, under GraphQL query, enter your query. For example:
query {
  user(login:"jsmith") {
    pullRequests(first:@pageSize, after:@offset) {
      totalCount
      edges {
        node {
          createdAt
          title
          url
        }
        cursor
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}
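The first and after arguments, together with pageInfo.endCursor and pageInfo.hasNextPage, implement cursor-based paging. As a rough illustration of how such a query walks through a result set, the following sketch follows the cursor until no page remains. It isn't Coveo's implementation: iterate_pull_requests and fake_run_query are hypothetical names, and the fake function stands in for the real API by serving five made-up items.

```python
from typing import Callable, Iterator, Optional


def iterate_pull_requests(
    run_query: Callable[[int, Optional[str]], dict], page_size: int
) -> Iterator[dict]:
    """Yield every node by following endCursor until hasNextPage is false."""
    cursor: Optional[str] = None
    while True:
        response = run_query(page_size, cursor)
        pull_requests = response["data"]["user"]["pullRequests"]
        for edge in pull_requests["edges"]:
            yield edge["node"]
        page_info = pull_requests["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        cursor = page_info["endCursor"]


def fake_run_query(first: int, after: Optional[str]) -> dict:
    """Hypothetical stand-in for the API: five fake items, cursor = item index."""
    all_titles = [f"PR {i}" for i in range(1, 6)]
    start = 0 if after is None else int(after) + 1
    page = all_titles[start:start + first]
    edges = [
        {"node": {"title": title,
                  "url": f"https://example.com/pr/{start + i + 1}",
                  "createdAt": "2024-01-01T00:00:00Z"},
         "cursor": str(start + i)}
        for i, title in enumerate(page)
    ]
    end = start + len(page)
    return {"data": {"user": {"pullRequests": {
        "totalCount": len(all_titles),
        "edges": edges,
        "pageInfo": {"endCursor": str(end - 1) if page else None,
                     "hasNextPage": end < len(all_titles)},
    }}}}


titles = [node["title"] for node in iterate_pull_requests(fake_run_query, page_size=2)]
print(titles)
```

With a page size of 2, the sketch issues three requests and collects all five items in order.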
Finally, back in your source JSON configuration, make sure to reference your query by name (starting with @) under PayloadJsonContent or QueryParameters.
For example, if you named your queries @MyFirstQuery and @MySecondQuery, your source JSON configuration could look as follows:
{
  "Services": [
    {
      "Url": "https://api.github.com/",
      "authentication": {
        "username": "@username",
        "password": "@password",
        "forceBasicAuthentication": "true"
      },
      "Endpoints": [
        {
          "paging": {
            "pageSize": 10,
            "offsetType": "cursor",
            "nextPageKey": "data.user.pullRequests.pageInfo.endCursor"
          },
          "headers": {
            "accept": "application/vnd.github.v3+json",
            "User-Agent": "PostmanRuntime/7.29.0"
          },
          "Path": "graphql",
          "Method": "POST",
          "ItemPath": "data.user.pullRequests.edges",
          "ItemType": "PullRequests",
          "Uri": "%[node.url]",
          "ClickableUri": "%[node.url]",
          "Title": "%[node.title]",
          "ModifiedDate": "%[node.createdAt]",
          "PayloadJsonContent": "@MyFirstQuery"
        }
      ],
      "RefreshEndpoints": [
        {
          "PayloadJsonContent": "@MySecondQuery"
        }
      ]
    }
  ]
}
Alternatively, if you’re using the placeholder under QueryParameters, it could contain the following:
{
  "QueryParameters": {
    "query": "@MyFirstQuery"
  }
}
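Because each placeholder must match a query name defined in the GraphQL queries section, it can help to check a configuration for placeholders that have no matching query before saving. The following sketch is one possible way to do that locally. The function names and the sample configuration are hypothetical, and the check only looks under the PayloadJsonContent and QueryParameters keys named in this article.

```python
from typing import Any, Iterator


def _placeholders_in(value: Any) -> Iterator[str]:
    """Yield every string starting with @ inside a value of any shape."""
    if isinstance(value, str):
        if value.startswith("@"):
            yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from _placeholders_in(child)
    elif isinstance(value, list):
        for child in value:
            yield from _placeholders_in(child)


def find_query_placeholders(node: Any) -> Iterator[str]:
    """Yield placeholders found under PayloadJsonContent or QueryParameters."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ("PayloadJsonContent", "QueryParameters"):
                yield from _placeholders_in(value)
            else:
                yield from find_query_placeholders(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_query_placeholders(item)


# Assumed sample configuration and the set of query names defined so far.
config = {
    "Services": [{
        "authentication": {"username": "@username"},
        "Endpoints": [{"PayloadJsonContent": "@MyFirstQuery"}],
        "RefreshEndpoints": [{"QueryParameters": {"query": "@MySecondQuery"}}],
    }]
}
defined_queries = {"@MyFirstQuery"}
missing = sorted(set(find_query_placeholders(config)) - defined_queries)
print(missing)
```

Credential placeholders such as @username are ignored because they don't appear under the two query-related keys.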
"Content security" tab
Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.
Note
If, while writing your source JSON configuration, you chose to index content access permissions and used the |
"Access" tab
In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.
For example, when creating a new source, you could decide that members of Group A can edit its configuration while Group B can only view it.
See Custom access level for more information.
Completion
- Click Add Source/Save to add/save your source configuration.
- While writing your JSON configuration, you may have decided to populate fields that aren’t default fields. If you have not already created these fields for another source, you must create them in the Fields (platform-ca | platform-eu | platform-au) page before building your source.
  Examples
  - You decided to retrieve picture URIs and to have Coveo populate the pictureuri field with this data. Your item metadata therefore contains:
    "pictureuri": "%[node.picture.uri]"
    However, since the pictureuri field isn’t a default field like author or date, you must create it.
  - You have another GraphQL API source populating the custom field facebookaccountid. When creating your new source, you therefore don’t need to create this field, as it’s already on the Fields (platform-ca | platform-eu | platform-au) page.
- Ensure that your source correctly maps all the fields to populate. In sources created before January 10, 2024, if a field doesn’t have a mapping, you must create one. In sources created after that date, Coveo will automatically populate fields whose name exactly matches a metadata key.
  Example
  In your content, profile picture URIs are stored under profilepic. If a pictureuri field exists in your organization, you’ll create the following mapping to populate the pictureuri field with picture URIs: %[profilepic].
  However, if a profilepic field already exists in your organization, Coveo will automatically populate it with the profilepic value extracted from your content.
- On the Sources (platform-ca | platform-eu | platform-au) page, you must click Launch build or Launch rebuild in the source Status column to add the source content or to make your changes effective, respectively.
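To make the %[...] mapping syntax used in the steps above more concrete, the following sketch resolves such tokens against a sample item. It only approximates the dot-separated paths shown in this article. The resolve function and the sample item are hypothetical, and the connector’s actual resolver may support more syntax than this.

```python
import re
from typing import Any

# Matches tokens such as %[node.url] and captures the path between the brackets.
TOKEN = re.compile(r"%\[([^\]]+)\]")


def resolve(template: str, item: dict) -> str:
    """Replace each %[dot.path] token with the matching value from item."""

    def lookup(match: re.Match) -> str:
        value: Any = item
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError when the path is missing
        return str(value)

    return TOKEN.sub(lookup, template)


# Assumed sample item, shaped like one entry of the API response.
item = {
    "node": {"title": "Fix login bug", "url": "https://example.com/pr/1"},
    "profilepic": "https://example.com/jsmith.png",
}
print(resolve("%[node.title]", item))
print(resolve("%[profilepic]", item))
```

Raising an error on a missing path is a design choice for this sketch: it surfaces a mistyped metadata key early instead of silently producing an empty field value.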
Limitation
When indexing content with the Crawling Module, make sure not to change space character encoding in an item’s URI, as Coveo uses URIs to distinguish items.
For example, an item whose URI changes from example.com/my first item to example.com/my%20first%20item wouldn’t be recognized as the same item by Coveo.
As a result, it would be indexed twice, and the older version wouldn’t be deleted.
Item URIs are displayed in the Content Browser (platform-ca | platform-eu | platform-au).
We recommend you check where these URIs come from before making changes that affect space character encoding.
Depending on your source type, the URI may be an item’s URL, or it may be built out of pieces of metadata by your source mapping rules.
For example, your item URIs may consist of the main site URL plus the item filename, due to a mapping rule such as example.com/%[filename].
In such a case, changing space encoding in the item filename could impact the URI.
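The following sketch shows why the two forms of a URI count as different items when compared as plain strings, and how you might detect such pairs before changing a mapping rule. It uses Python’s standard urllib.parse module. The differs_only_by_encoding helper is a hypothetical name introduced for this illustration.

```python
from urllib.parse import quote, unquote


def differs_only_by_encoding(uri_a: str, uri_b: str) -> bool:
    """Return True when two URIs differ as strings but decode to the same value."""
    return uri_a != uri_b and unquote(uri_a) == unquote(uri_b)


raw_uri = "example.com/my first item"
encoded_uri = quote(raw_uri)  # spaces become %20; "/" is kept by default

print(encoded_uri)
print(raw_uri == encoded_uri)
print(differs_only_by_encoding(raw_uri, encoded_uri))
```

A check like this can be run against a list of URIs copied from the Content Browser to spot items that would be indexed twice.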
Ignoring "no first page" errors in subitems
When indexing your content, Coveo may encounter an HTTP error. By default, Coveo stops the crawling process when it encounters such an error.
However, you can configure your source to ignore specific errors and continue indexing.
Similarly, when requesting subitems from your API, Coveo will stop the indexing process if your API returns a 404 error rather than the first page of results. A 404 error on the first page prevents Coveo from indexing any of your subitems, as the missing first page contains the information needed to request the second page, such as a cursor or the URL of the next page.
However, you can configure your source to ignore this error and continue indexing. It will therefore finish indexing the other items of the same endpoint, including their subitems if the API returns valid result pages. Then, your source will move on to any other endpoint you’ve defined.
To ignore "no first page" errors in subitems
- On the Sources (platform-ca | platform-eu | platform-au) page, select your source, and then, in the More menu, click Edit configuration with JSON.
- In your source JSON configuration, in the parameters object, add the following:
  "SkipNoFirstPageErrorsInSubItems": { "value": "true" },
- Click Save and rebuild source.
This parameter applies to all subitem requests made by your source.
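The difference between the default behavior and the behavior with this parameter enabled can be pictured with the following sketch. It is a simplified model under stated assumptions, not Coveo’s implementation: NoFirstPageError, index_subitems, and fake_fetch are hypothetical names, and the fake function simulates an API that returns a 404 instead of the first page for one parent item.

```python
from typing import Callable, Dict, List


class NoFirstPageError(Exception):
    """Hypothetical error raised when the first page of subitems returns a 404."""


def index_subitems(
    parents: List[str],
    fetch_first_page: Callable[[str], List[str]],
    skip_no_first_page_errors: bool,
) -> Dict[str, List[str]]:
    """Collect subitems per parent, optionally skipping missing first pages."""
    indexed: Dict[str, List[str]] = {}
    for parent in parents:
        try:
            indexed[parent] = fetch_first_page(parent)
        except NoFirstPageError:
            if not skip_no_first_page_errors:
                raise  # default behavior: the indexing process stops
            indexed[parent] = []  # ignore the error and move on to the next item
    return indexed


def fake_fetch(parent: str) -> List[str]:
    """Hypothetical API stand-in: parent "b" has no first page of subitems."""
    if parent == "b":
        raise NoFirstPageError(parent)
    return [f"{parent}-comment-1", f"{parent}-comment-2"]


print(index_subitems(["a", "b", "c"], fake_fetch, skip_no_first_page_errors=True))
```

With the flag set to False, the same call raises the error at parent "b", and parent "c" is never processed.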
Required privileges
You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.
Note
The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.
| Actions | Service | Domain | Required access level |
|---|---|---|---|
| View sources, view source update schedules, and subscribe to source notifications | Content | Fields | View |
| | Content | Sources | View |
| | Organization | Organization | View |
| Edit sources, edit source update schedules, and view the View and map metadata subpage | Content | Fields | Edit |
| | Content | Sources | Edit |
| | Content | Source metadata | View |
| | Organization | Organization | View |