Indexing by Reference

Coveo Cloud crawlers can index the content of many items of various formats and sizes. However, by default, they don’t index certain formats or very large items.

To include large or unsupported items in the search results, Coveo Cloud sources index these items by reference, which means that a source only contains their file information, such as URI, file name, and other metadata. Although omitting their content saves space in the index, this also limits search capability because only an item’s metadata and path are searchable as opposed to its entire content.

You index your company Dropbox account, which contains a Microsoft Publisher item (.pub) created by a user named John Smith. Because Publisher file content isn’t indexed by default for this source, the item is indexed by reference. The Dropbox source includes the following metadata:

  • filename: Retirement_Announcement_Letter

  • title: Early Retirement

  • author: John Smith

  • date of last modification: April 1, 2017

  • URI:

After the item is indexed, John Smith queries retirement letter to retrieve it. Because his query contains keywords that match the above metadata, the item appears in the search results. He can then review the item metadata in the search interface or use the URI to open the item directly from his Dropbox folder. However, if John Smith uses keywords that match the content of the item rather than its metadata (e.g., dear colleagues), the item doesn't appear in the search results.
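The scenario above can be sketched in a few lines of Python. This is a toy model only, not how Coveo Cloud evaluates queries: the IndexedItem class, the matches function, and the placeholder body text are all hypothetical names and values introduced for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class IndexedItem:
    """Toy model of an indexed item (hypothetical, not a Coveo type)."""
    metadata: dict
    content: str = ""            # body text of the file
    by_reference: bool = False   # True when only metadata is indexed

    def searchable_words(self) -> set:
        # A by-reference item exposes its metadata only; its body is left out.
        parts = [str(value) for value in self.metadata.values()]
        if not self.by_reference:
            parts.append(self.content)
        # Lowercase and split on anything that is not a letter or digit,
        # so "Retirement_Announcement_Letter" yields three separate words.
        return set(re.findall(r"[a-z0-9]+", " ".join(parts).lower()))


def matches(item: IndexedItem, query: str) -> bool:
    """Return True when every query keyword is among the searchable words."""
    keywords = re.findall(r"[a-z0-9]+", query.lower())
    return all(word in item.searchable_words() for word in keywords)


metadata = {
    "filename": "Retirement_Announcement_Letter",
    "title": "Early Retirement",
    "author": "John Smith",
}
body = "Dear colleagues"  # placeholder body text, assumed for the example

by_reference = IndexedItem(metadata, content=body, by_reference=True)
by_content = IndexedItem(metadata, content=body, by_reference=False)

print(matches(by_reference, "retirement letter"))  # keywords found in metadata
print(matches(by_reference, "dear colleagues"))    # keywords only in the body
print(matches(by_content, "dear colleagues"))      # body is indexed here
```

In this sketch, the first call finds the item and the second doesn't, because the body words never enter the searchable set of a by-reference item. The third call shows that the same query succeeds once the item is indexed by content.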

Handling File Formats in Source JSON

You can browse the JSON configuration of a source to review and change how Coveo Cloud handles the file formats (or extensions) that it encounters when crawling a system (see Edit the JSON Configuration and Change Indexed Item Types).

In the source JSON, under extensionSetting for the desired file format, two parameters determine how an item is handled:

Parameter        Description
action           Default action to take when encountering an item of the above file format.
actionOnError    Action to take when an error occurs while crawling an item of the above file format.

Both parameters accept the same three values, and each parameter takes exactly one of them:

Value        Description
Retrieve     Index the item by content.
Reference    Index the item by reference.
Ignore       Skip the item (i.e., don’t index it).

For file formats that are common within a given source, action and actionOnError are typically set to Retrieve and Reference, respectively. For uncommon formats, or those associated with larger file sizes, both parameters are often set to Reference.
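The way the two parameters interact can be sketched as follows. This is a minimal illustration under stated assumptions, not the Coveo Cloud implementation: the extension_settings dictionary, its keys, the resolve_action helper, and the choice of .pdf as a "common" format are all hypothetical. Only the action and actionOnError parameter names and their three values come from the description above.

```python
from enum import Enum


class Action(Enum):
    RETRIEVE = "Retrieve"    # index the item by content
    REFERENCE = "Reference"  # index the item by reference
    IGNORE = "Ignore"        # skip the item


# Hypothetical per-format settings; the real source JSON layout may differ.
extension_settings = {
    # Assumed common format: try the content first, fall back to a reference.
    ".pdf": {"action": "Retrieve", "actionOnError": "Reference"},
    # Assumed uncommon format: always index by reference.
    ".pub": {"action": "Reference", "actionOnError": "Reference"},
}


def resolve_action(extension: str, crawl_error: bool) -> Action:
    """Pick the handling for an item, depending on whether crawling failed."""
    setting = extension_settings[extension]
    key = "actionOnError" if crawl_error else "action"
    # Action(...) raises ValueError for anything other than the three values.
    return Action(setting[key])


print(resolve_action(".pdf", crawl_error=False))
print(resolve_action(".pdf", crawl_error=True))
print(resolve_action(".pub", crawl_error=False))
```

With these assumed settings, a .pdf item is indexed by content unless an error occurs, in which case it is indexed by reference rather than dropped. A .pub item is indexed by reference in both cases.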
