Source JSON modification options

This article covers the aspects of a source that you can modify by editing its JSON configuration.

Add source filters

By default, Coveo indexes all the items in your source URL. However, if you want to index only certain items or ignore unwanted items on an item path basis, you can add filters to your source configuration. Coveo source configurations support inclusion and exclusion filters. See the JSON filter reference for guidance.

Tip

Many source types now allow you to easily define inclusion and exclusion rules directly in the source configuration user interface, under the Configuration tab. The source configuration user interface lets you define rules using an intuitive language rather than wildcards or regular expressions, as one must do in the JSON configuration.

Once you’ve saved your configuration changes, launch a source rescan to apply them. A rebuild may be suggested on the Sources page, but isn’t required.

JSON filter reference

In the Configuration tab of the Edit configuration with JSON panel, your source filters appear at the top of the JSON configuration box as follows:

"addressPatterns": [
  {
    "allowed": true,
    "expression": "*",
    "patternType": "Wildcard"
  },
  {
    "allowed": <BOOLEAN>,
    "expression": "<YOUR_FILTERING_EXPRESSION>",
    "patternType": "<EXPRESSION_TYPE>"
  },
  ...
]

The first object in the array is the default addressPatterns object (that is, the all-inclusive filter).

addressPatterns (Array, Required)

This array contains your source filters. Each filter is represented by an object grouping the three mandatory filter parameters. These parameters are: allowed, expression, and patternType.

Important

By default, the addressPatterns array of a newly created source only contains the all-inclusive filter.

"addressPatterns": [
  {
    "allowed": true,
    "expression": "*",
    "patternType": "Wildcard"
  }
]

With this default configuration, Coveo doesn’t filter at all (that is, it crawls all document paths it finds using the startingAddresses).

Importantly, the addressPatterns filters are also applied to the startingAddresses themselves. If you submit an empty addressPatterns array (or remove it entirely), the startingAddresses won’t match any allowed addressPatterns filter and Coveo will return a No Items Indexed error.

Ensure you have at least one allowed addressPatterns that matches each of your startingAddresses. Also ensure you don’t have any exclusion filters that match your startingAddresses.
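
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Coveo’s implementation: wildcard matching is approximated with Python’s fnmatchcase, regular expressions are matched against the whole address, and the function name and the example.com addresses are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def matches(address_filter: dict, address: str) -> bool:
    """Approximate one filter; Coveo's actual matching rules may differ."""
    if address_filter["patternType"] == "Wildcard":
        return fnmatchcase(address, address_filter["expression"])
    return re.fullmatch(address_filter["expression"], address) is not None


def unreachable_starting_addresses(address_patterns, starting_addresses):
    """Return the starting addresses that no inclusion filter covers,
    or that at least one exclusion filter matches."""
    problems = []
    for address in starting_addresses:
        included = any(f["allowed"] and matches(f, address) for f in address_patterns)
        excluded = any(not f["allowed"] and matches(f, address) for f in address_patterns)
        if not included or excluded:
            problems.append(address)
    return problems


# Hypothetical filters: index only the /docs/ section and ignore JPG images.
filters = [
    {"allowed": True, "expression": "https://example.com/docs/*", "patternType": "Wildcard"},
    {"allowed": False, "expression": "*.jpg", "patternType": "Wildcard"},
]

print(unreachable_starting_addresses(
    filters, ["https://example.com/docs/", "https://example.com/blog/"]
))  # ['https://example.com/blog/']

# An empty addressPatterns array leaves every starting address uncovered.
print(unreachable_starting_addresses([], ["https://example.com/docs/"]))
# ['https://example.com/docs/']
```

Running a check like this before saving the JSON can catch a starting address that would otherwise lead to the No Items Indexed error.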

allowed (Boolean, Required)

This parameter specifies whether the filter acts as an inclusion filter (indexing items) or an exclusion filter (ignoring items). In other words, it determines if the items at URIs matching the pattern are to be indexed or ignored.

Allowed values are true for an inclusion filter and false for an exclusion filter.

Example

By default, a Sitemap source indexes all web pages listed in a Sitemap. Many listed web pages contain links to JPG images, but you only want the text to be indexed. So, you add the following filter:

{
    "allowed": false,
    "expression": "*.jpg",
    "patternType": "Wildcard"
}

expression (String, Required)

This parameter determines the wildcard or regular expression that defines your source filter. Items at URIs matching this pattern will be indexed or ignored by Coveo.

Examples
  • With a wildcard: "expression": "http://career.MyCompany.com/jobs/*"

  • With a regular expression: "expression": ".*\\.(zip|rar|tar|7z|png|jpg)"

You must encode spaces and special characters in your expression. In addition, you must escape every backslash character by adding a backslash in front of it. Slash characters don’t need to be escaped.

For example, if your desired regular expression is:

^https?://docs\.coveo\.com/en/7\d/$

The expression to provide in the source JSON is:

"expression": "^https?://docs\\.coveo\\.com/en/7\\d/$",

patternType (String Enum, Required)

This parameter determines the type of expression used. Allowed values are Wildcard and RegEx.

Example

You have an AWS S3 source, where the bucket contains PDFs, compressed files, and images. You want to index only the PDFs, so you add the following exclusion filter, which ignores the compressed and image files:

{
    "allowed": false,
    "expression": ".*\\.(zip|rar|tar|7z|png|jpg)",
    "patternType": "RegEx"
}

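Note that in the expression value above, the . character before the file extensions is preceded by two backslashes: the first one escapes the second one for the JSON, and the resulting single backslash escapes the period for the regular expression.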
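
To see the two levels of escaping at work, the following Python sketch decodes the filter with the standard json module and tries the resulting regular expression on a few sample paths. The sample paths are hypothetical, and matching the whole path with re.fullmatch is an assumption made for illustration; Coveo’s regular expression engine may behave differently.

```python
import json
import re

# The filter from the example above, exactly as typed in the JSON configuration.
filter_json = r'''
{
    "allowed": false,
    "expression": ".*\\.(zip|rar|tar|7z|png|jpg)",
    "patternType": "RegEx"
}
'''

address_filter = json.loads(filter_json)
regex = address_filter["expression"]

# After JSON decoding, a single backslash remains to escape the period.
print(regex)  # .*\.(zip|rar|tar|7z|png|jpg)

# Going the other way, json.dumps restores the doubled backslash.
print(json.dumps(regex))  # ".*\\.(zip|rar|tar|7z|png|jpg)"

# Hypothetical item paths: only the PDF escapes the exclusion filter.
sample_paths = ["reports/q1.pdf", "backups/2023.zip", "images/logo.png"]
ignored = [path for path in sample_paths if re.fullmatch(regex, path)]
print(ignored)  # ['backups/2023.zip', 'images/logo.png']
```

Passing a regular expression through json.dumps is a convenient way to produce the escaped form to paste into the expression parameter.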

Change file type configurations

By default, each source is configured to index items of several file types, based on their file extension. In the File types tab of the Edit configuration with JSON panel, you can see the list of supported file extensions and the associated settings determining how they’re processed by the source. See File type handling for more information.

Add conditional indexing for a Salesforce source

The Salesforce source lets you index items only when they meet specific conditions, which can reduce the size of your index (see Introducing conditional indexing).

Enable Coveo Personalization-as-you-go

When using a REST API, Database, Sitemap, Web, or GraphQL API source to index commerce-specific content, such as products, variants, and availabilities, you have to undergo a catalog configuration process to benefit from all commerce-related capabilities.

Coveo Machine Learning tools include Coveo Personalization-as-you-go (PAYG) capabilities for commerce use cases. This suite of advanced features learns from a user’s intent and reacts within a few clicks. PAYG models require the building of a product vector space to represent the products contained in your source. For REST API, Database, Sitemap, Web, or GraphQL API sources, Coveo PAYG needs to be enabled in order to produce the product vector space. Contact your Coveo representative to discuss your options.

Note

With Catalog sources or SAP, however, this modification isn’t required. You’ll be able to benefit from Coveo PAYG functionalities as soon as it’s enabled in your organization.

Forbid item deletion during a rescan

You may have a source whose content rarely or never gets deleted, such as a static site or an application where older content is archived. In rare cases, due to an error with an API or with Coveo, a source rescan may delete this stable content.

For example, if your server returns no items during the rescan, Coveo will consider that your content has been deleted and will remove it from your index. As a result, your source content won’t be searchable through your search interface.

Although this issue rarely happens, Coveo offers a source parameter that forbids item deletion during a source rescan, as an extra layer of security. Enabling this parameter for a source with stable content ensures that your content remains available through your search interface despite the error.

If you know your source content rarely or never gets deleted and you want to forbid item deletion during rescans, edit the source JSON configuration and, in the Parameters tab, add the SkipUncrawledDocumentsDeletionOnRescan parameter with its value set to "true":

"SkipUncrawledDocumentsDeletionOnRescan": {
  "sensitive": false,
  "value": "true"
}

Once the parameter is enabled, the only way to have the source delete content from your index is to launch a source rebuild.

Alternatively, you can configure your source to block the deletion process if it’s about to delete more than a certain percentage of the source items.

Forbid item deletion based on a percentage condition

You may have a source that’s crucial to your business and whose content is mostly stable. That is, if items get deleted at the end of a rescan operation, it’s usually a fraction of the source content.

To protect this source’s content from accidental deletions, in the Parameters tab, use the AllowedDeletionPercentage parameter to block the deletion process if it’s about to delete more than the specified percentage of the source items.

For example, let’s say you set this parameter to 10% as follows.

"AllowedDeletionPercentage": {
  "sensitive": false,
  "value": "10"
}

You then make a change in the source configuration panel. At the next rescan, if Coveo detects that more than 10% of the items are flagged for deletion, the deletion process will be blocked and your source will display an error with code DELETION_BLOCKED_BY_ALLOWED_PERCENTAGE. As a result, your search interface will keep displaying the source content that would have otherwise been deleted.

You may also want to take advantage of the AllowedDeletionPercentage parameter if the source’s content comes from an API or server that you know is unreliable. For example, if a scheduled rescan takes place while your API momentarily returns no items (and no errors either), this parameter will prevent the deletion of all the items in your index, and your search interface will keep displaying your content.
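
The arithmetic behind this safeguard can be sketched as follows. This is a simplified illustration, not Coveo’s actual logic: the function name and the item counts are hypothetical, and treating exactly 10% as allowed is an assumption based on the "more than" wording above.

```python
# The parameter as it appears in the Parameters tab (the value is a string).
parameter = {"sensitive": False, "value": "10"}


def deletion_blocked(indexed_items: int, flagged_for_deletion: int,
                     allowed_percentage: float) -> bool:
    """Return True when the share of items flagged for deletion
    exceeds the allowed percentage."""
    if indexed_items == 0:
        return False  # nothing indexed, so nothing to protect
    flagged_percentage = 100 * flagged_for_deletion / indexed_items
    return flagged_percentage > allowed_percentage


allowed = float(parameter["value"])

# Routine cleanup: 150 of 2,000 items (7.5%) are gone, so deletion proceeds.
print(deletion_blocked(2000, 150, allowed))   # False

# The API momentarily returns nothing: 100% flagged, so deletion is blocked.
print(deletion_blocked(2000, 2000, allowed))  # True
```

When choosing a threshold, compare it with the share of items that a normal rescan typically removes, so that routine deletions aren’t blocked.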

Note

The File system source doesn’t support the AllowedDeletionPercentage parameter yet. Contact your Coveo representative if you’d like to use it with a File system source.

Alternatively, you can configure your source to skip the deletion process following a rescan altogether.

Add a hidden source parameter

You can edit frequently used source parameters from the Administration Console. Other rarely used parameters aren’t exposed in the console user interface but can be added to the source JSON configuration upon instructions from Coveo Support.

Hidden parameters have two attributes:

  • sensitive, which is set to false by default for all parameters. Set it to true when the parameter value contains sensitive information. When sensitive is set to true, the value attribute won’t appear in the JSON configuration once the source is rebuilt.

  • value, which is the value of the parameter.

Example

A Coveo Support agent tells you to add a hidden source parameter in the JSON configuration parameters section to fix a specific issue that you’re experiencing.

Assuming the recommended parameter is a Boolean named AHiddenSourceParameter that should be set to false, you would add the following in the Parameters tab of the Edit configuration with JSON panel:

"AHiddenSourceParameter": {
  "sensitive": false,
  "value": "false"
}

Note

When editing a parameter with the sensitive attribute set to true, you must specify a value to overwrite the current one, which is hidden.