Source JSON Modification Options

This article describes the aspects of a source that you can modify by editing the source JSON configuration.

Add a Hidden Source Parameter

You can edit frequently used source parameters from the Administration Console. Other rarely used parameters aren’t exposed in the console user interface but can be added to the source JSON configuration.

Hidden parameters have two attributes:

  • sensitive, which is false by default for all parameters. Set it to true when the parameter value contains sensitive information. When set to true, the value attribute won’t appear in the JSON configuration once the source is rebuilt.

  • value, which is the value of the parameter.

Example

A Coveo Support agent tells you to add a hidden source parameter in the JSON configuration parameters section to fix a specific issue that you’re experiencing.

Assuming the recommended parameter is a Boolean named AHiddenSourceParameter that should be set to false, you would add:

"AHiddenSourceParameter": {
  "sensitive": false,
  "value": "false"
}
Note

When editing a parameter with the sensitive attribute set to true, you must specify a new value to overwrite the current one, which is hidden.
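If you keep a local copy of the source JSON configuration and edit it with a script, adding a hidden parameter could look like the following minimal sketch. The helper name set_hidden_parameter and the source_config dictionary are hypothetical, and the sketch assumes that hidden parameters live under a top-level parameters object and that values are stored as strings, as in the example above.

```python
import json


def set_hidden_parameter(config: dict, name: str, value, sensitive: bool = False) -> dict:
    """Add or overwrite one entry in the "parameters" section (hypothetical helper)."""
    if isinstance(value, bool):
        # Store Booleans as lowercase strings, like the "false" value in the example above.
        value = "true" if value else "false"
    parameters = config.setdefault("parameters", {})
    parameters[name] = {"sensitive": sensitive, "value": str(value)}
    return config


# Assumption: a local, already-parsed copy of the source JSON configuration.
source_config = {"parameters": {}}
set_hidden_parameter(source_config, "AHiddenSourceParameter", False)
print(json.dumps(source_config, indent=2))
```

The printed parameters section can then be compared with the configuration shown in the Administration Console before you save any change there.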

Add Conditional Indexing for a Salesforce Source

The Salesforce source allows you to index items only when they meet specific conditions, which can reduce the size of your index (see Introducing Conditional Indexing).

Add Source Filters

By default, Coveo indexes all the items in your source URL. However, if you want to index only certain items or ignore unwanted items, you can add filters to your source configuration. Coveo source configurations support inclusion and exclusion filters.

To fine-tune the items to index or ignore, you must define source filters in your source JSON configuration. Alternatively, with a Web source, you can define filters directly in the source configuration panel. In either case, use the reference below as a guide.

Once you’ve saved your configuration changes, launch a source rescan to apply them. A rebuild may be suggested on the Administration Console Sources (platform-eu | platform-au) page, but isn’t required.

JSON Filter Reference

Your source URL and filters appear at the top of your JSON configuration as follows:

"startingAddresses": [
  "http://www.example.com/sitemap.xml"
],
"addressPatterns": [
  {
    "expression": "*",
    "patternType": "Wildcard",
    "allowed": true
  },
  {
    "expression": "<YOUR_FILTERING_EXPRESSION>",
    "patternType": "<EXPRESSION_TYPE>",
    "allowed": <BOOLEAN>
  }
]
The first object in the addressPatterns array is the default, all-inclusive filter.

startingAddresses (Array, Required)

This array contains the source URL(s) that Coveo crawls to retrieve the content to index.

You must encode spaces and special characters in your source URL.

addressPatterns (Array, Required)

This array contains your source filters. Each filter is represented by an object grouping the three mandatory filter parameters. These parameters are: expression, patternType, and allowed.

Important

By default, the addressPatterns array of a newly created source only contains the all-inclusive filter.

"addressPatterns": [
  {
    "expression": "*",
    "patternType": "Wildcard",
    "allowed": true
  }
]

With this default configuration, Coveo doesn’t filter at all (i.e., it crawls all document paths it finds using the startingAddresses).

Importantly, the addressPatterns filters are also applied to the startingAddresses themselves. If you submit an empty addressPatterns array (or if you remove the addressPatterns array altogether), the startingAddresses won’t match any allowed addressPatterns filter and Coveo will return a No Items Indexed error.

Ensure you have at least one allowed addressPatterns filter that matches each of your startingAddresses. Also ensure that none of your exclusion filters match your startingAddresses.
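Before saving a configuration, you may want to check this rule locally. The following is a rough sketch only: the function names are hypothetical, and the matching rules are assumptions (a Wildcard filter treats only * as special, and both filter types must match the whole URI, case-sensitively). The matching that Coveo actually performs may differ.

```python
import re


def wildcard_to_regex(expression: str) -> str:
    """Translate a Wildcard expression to a regex in which only '*' is special."""
    return ".*".join(re.escape(part) for part in expression.split("*"))


def matches(pattern: dict, uri: str) -> bool:
    """Return True if `uri` matches one addressPatterns object (assumed semantics)."""
    if pattern["patternType"] == "Wildcard":
        regex = wildcard_to_regex(pattern["expression"])
    else:
        regex = pattern["expression"]
    # Assumption: the expression must match the entire URI.
    return re.fullmatch(regex, uri) is not None


def unmatched_starting_addresses(config: dict) -> list:
    """List the startingAddresses that no inclusion filter allows or that an exclusion filter rejects."""
    patterns = config.get("addressPatterns", [])
    problems = []
    for uri in config["startingAddresses"]:
        included = any(p["allowed"] and matches(p, uri) for p in patterns)
        excluded = any(not p["allowed"] and matches(p, uri) for p in patterns)
        if not included or excluded:
            problems.append(uri)
    return problems
```

An empty list returned by unmatched_starting_addresses suggests that every starting address passes the filters under these assumptions. An empty or missing addressPatterns array makes every starting address show up in the result.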

expression (String, Required)

This parameter determines the wildcard or regular expression that defines your source filter. Items at URIs matching this pattern will be indexed or ignored by Coveo.

Examples
  • With a wildcard: "expression": "http://career.MyCompany.com/jobs/*"

  • With a regular expression: "expression": ".*\\.(zip|rar|tar|7z|png|jpg)"

You must encode spaces and special characters in your expression. In addition, you must escape every backslash character by adding a backslash in front of it. Slash characters don’t need to be escaped.

For example, if your desired regular expression is:

^https?://docs\.coveo\.com/en/7\d+/$

The expression to provide in the source JSON is:

"expression": "^https?://docs\\.coveo\\.com/en/7\\d+/$",
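If you would rather not double the backslashes by hand, a JSON serializer can do it for you. The short sketch below uses Python’s standard json module, which is not part of Coveo. It only illustrates the escaping rule described above.

```python
import json

# The regular expression as you would normally write it.
raw_regex = r"^https?://docs\.coveo\.com/en/7\d+/$"

# json.dumps doubles each backslash and leaves slash characters untouched.
json_fragment = '"expression": ' + json.dumps(raw_regex)
print(json_fragment)
```

Parsing the resulting JSON string gives back the original regular expression, which is a quick way to confirm that nothing was escaped twice.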

patternType (String Enum, Required)

This parameter determines the type of expression used. Allowed values are Wildcard and RegEx.

Example

You have an AWS S3 source, where the bucket contains PDFs, compressed files, and images. You want to index only PDFs, so you add the following filter:

{
    "expression": ".*\\.(zip|rar|tar|7z|png|jpg)",
    "patternType": "RegEx",
    "allowed": false
}

Note that in the expression value above, the second . character is preceded by two backslashes: one escapes the . for the regular expression, and the other escapes that backslash for the JSON.

allowed (Boolean, Required)

This parameter determines whether the filter is an inclusion filter or an exclusion filter, i.e., whether the items at URIs matching the pattern should be indexed or ignored.

Allowed values are true for an inclusion filter and false for an exclusion filter.

Example

By default, a Sitemap source indexes all web pages listed in a Sitemap. Many listed web pages contain JPG images, but you only want the text to be indexed. So, you add the following filter:

{
    "expression": "*.jpg",
    "patternType": "Wildcard",
    "allowed": false
}

Change Indexed Item Types

By default, each connector is configured to index several item types (based on their file extension) that can typically be found in the specific system. In the source JSON configuration, you can see the list of supported file extensions and the associated settings determining how they’re processed by the source. In particular, you can easily change which item types are indexed or not.

Example

By default, an Amazon S3 source indexes many item types. You index an Amazon S3 bucket that contains .html and .pdf files, but you only want to index the HTML files.

In the documentConfig section of the source JSON configuration, you locate the extensions subsection that contains the .pdf extension, change the action and actionOnError values from Retrieve to Ignore, and then rebuild your source so that the PDF files are rejected.

{
    "extensions": [
        ".pdf"
    ],
    "extensionSetting": {
        "action": "Ignore",
        "actionOnError": "Ignore",
        "converter": "Detect",
        "useContentType": false,
        "indexContainer": true,
        "fileTypeValue": "",
        "generateThumbnail": true,
        "useExternalHTMLGenerator": false,
        "convertDirectlyToHtml": false
    }
}
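When several extensions need the same change, editing a local copy with a script can be less error-prone than editing by hand. The sketch below is illustrative only: ignore_extension is a hypothetical helper, and it takes the list of extension objects directly because the exact location of that list inside documentConfig isn’t shown here. Each object is assumed to have the extensions and extensionSetting keys shown in the example above.

```python
def ignore_extension(extension_entries: list, extension: str) -> int:
    """Set action and actionOnError to "Ignore" for entries listing `extension`.

    Returns the number of entries changed (hypothetical helper).
    """
    changed = 0
    for entry in extension_entries:
        if extension in entry.get("extensions", []):
            entry["extensionSetting"]["action"] = "Ignore"
            entry["extensionSetting"]["actionOnError"] = "Ignore"
            changed += 1
    return changed


# Assumption: a trimmed-down local copy of two extension objects.
entries = [
    {"extensions": [".pdf"], "extensionSetting": {"action": "Retrieve", "actionOnError": "Retrieve"}},
    {"extensions": [".html"], "extensionSetting": {"action": "Retrieve", "actionOnError": "Retrieve"}},
]
changed = ignore_extension(entries, ".pdf")
```

As described above, the source still needs to be rebuilt after the modified configuration is saved.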

Change the Default Security Provider of a Generic REST API Source

Generic REST API sources use the Email security identity provider by default. This means that all security identities are identified with an email address. Alternatively, you can switch to any other existing security identity provider.

For example, to switch to a custom security identity provider populated through the Push API, in the source JSON configuration, replace the securityProviders object with the following configuration:

"securityProviders": {
  "SecurityProvider": {
    "name": "<SECURITY_PROVIDER_NAME>",
    "typeName": "Expanded"
  }
}

Where <SECURITY_PROVIDER_NAME> is the name of the target security provider, as displayed on the Security Identities (platform-eu | platform-au) Administration Console page.

Note

A Generic REST API source can handle only one security identity provider.

Enable Coveo Personalization-as-you-go for a Source

To benefit from the features that support personalization, a source must be Stream API enabled. Catalog sources are Stream API enabled by default. Generic REST API and Salesforce sources also support the Stream API, which you can enable by editing the source JSON configuration from the Administration Console Sources (platform-eu | platform-au) page.

Add the UseStreamApi parameter to the parameters section of the source JSON configuration.

"parameters": {
    "UseStreamApi": {
        "value": "true"
    }
}

Find the Creator of a Source

As an administrator, you may want to know which user has access to a source whose content is accessible to the source creator only. You can find this information in the source JSON configuration.

Example

You have a source whose content is accessible to the source creator only. You want to know the only identity that currently has access to the items of this source. In the permissionSets section of the source JSON configuration, under allowedPermissions, find the identity (typically an email address).

"permissionSets": [
    {
        "allowedPermissions": [
            {
                "identityType": "User",
                "securityProvider": "Email Security Provider",
                "identity": "someone@mycompany.com"
            }
        ]
    }
]

Only a user authenticated with this email identity will be able to see search results from this source. Here, you can also change the identity.
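To review many sources at once, you could extract the allowed identities from saved copies of their JSON configurations. The following is a minimal sketch under that assumption. The function name allowed_identities is hypothetical, and the structure it reads is the permissionSets layout shown in the example above.

```python
def allowed_identities(config: dict) -> list:
    """Return (identityType, identity) pairs found under permissionSets/allowedPermissions."""
    identities = []
    for permission_set in config.get("permissionSets", []):
        for permission in permission_set.get("allowedPermissions", []):
            identities.append((permission["identityType"], permission["identity"]))
    return identities


# Assumption: a local copy of the permissionSets section from the example above.
source_config = {
    "permissionSets": [
        {
            "allowedPermissions": [
                {
                    "identityType": "User",
                    "securityProvider": "Email Security Provider",
                    "identity": "someone@mycompany.com",
                }
            ]
        }
    ]
}
print(allowed_identities(source_config))
```

For a source whose content is accessible to the source creator only, the result should contain a single pair identifying that creator.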
