Indexing by Reference

Coveo Cloud crawlers can index the content of a large number of items of various formats and sizes (see Supported File Formats). However, by default, crawlers do not index the content of files in certain formats, nor the content of very large items (see Manage Format Handling in Source JSON). To include large or unsupported items in search results despite this limitation, Coveo Cloud sources index these items by reference. Indexing by reference means that the source contains only item information such as the URI, filename, and metadata. Indexing by reference saves index space because item content is not stored, but it also limits Coveo Cloud search capability: only the item metadata and path are searchable, not the item content.
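The decision described above can be sketched as follows. This is only an illustration of the rule, not Coveo Cloud code: the function name, the set of content-indexed extensions, and the size limit are all hypothetical values chosen for the example, and the actual defaults depend on the source configuration.

```python
# Hypothetical values for illustration only; real defaults come from the
# source's format handling configuration, not from this sketch.
CONTENT_INDEXED_EXTENSIONS = {"pdf", "docx", "txt"}
MAX_CONTENT_SIZE_BYTES = 50 * 1024 * 1024


def indexing_mode(extension: str, size_bytes: int) -> str:
    """Return "full" when item content would be indexed, else "reference"."""
    ext = extension.lower().lstrip(".")
    if ext not in CONTENT_INDEXED_EXTENSIONS:
        # Format whose content is not indexed: keep URI, filename, metadata only.
        return "reference"
    if size_bytes > MAX_CONTENT_SIZE_BYTES:
        # Item too large: content is skipped, metadata is still kept.
        return "reference"
    return "full"


print(indexing_mode(".pub", 120_000))   # Publisher file, format not content-indexed
print(indexing_mode(".pdf", 120_000))   # small file in a content-indexed format
```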

For example, suppose you index your company Dropbox account, which contains a Microsoft Publisher item (.pub) created by a user named John Smith. Since Publisher file content is not indexed by default for this source, the item is indexed by reference. The content of the item is not included in the index, but the Dropbox source still includes the following metadata:

  • filename: Retirement_Announcement_Letter

  • title: Early Retirement

  • author: John Smith

  • date of last modification: April 1, 2017

  • URI: www.dropbox.com/yourcompany/jsmith/foldername/RetirementAnnouncementLetter

Once the item has been indexed, John Smith queries "retirement letter" to find the item. Since the query contains keywords matching the above metadata, the item appears in the search results. John Smith can then review the item metadata in the search interface or use the URI to open the item directly from his Dropbox folder. However, if John Smith queries keywords that match only the item content, such as "dear colleagues", the item does not appear in the search results.
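The behavior in this example can be reproduced with a toy model. The sketch below is not how Coveo Cloud evaluates queries; it assumes a simple "every query keyword must appear" match, and the IndexedItem class, the matches function, and the sample body text are hypothetical names and data introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class IndexedItem:
    """Toy stand-in for an index entry; not a Coveo data structure."""
    uri: str
    metadata: Dict[str, str]
    body: Optional[str] = None  # None models an item indexed by reference


def tokens(text: str) -> Set[str]:
    # Lowercase and split on anything that is not a letter or digit,
    # so "Retirement_Announcement_Letter" yields three separate words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(item: IndexedItem, query: str) -> bool:
    """True when every query keyword appears in the item's searchable text."""
    searchable = tokens(item.uri)
    for value in item.metadata.values():
        searchable |= tokens(value)
    if item.body is not None:
        searchable |= tokens(item.body)
    return tokens(query) <= searchable


letter = IndexedItem(
    uri="www.dropbox.com/yourcompany/jsmith/foldername/RetirementAnnouncementLetter",
    metadata={
        "filename": "Retirement_Announcement_Letter",
        "title": "Early Retirement",
        "author": "John Smith",
        "date": "April 1, 2017",
    },
)

print(matches(letter, "retirement letter"))  # True: keywords found in metadata
print(matches(letter, "dear colleagues"))    # False: content was not indexed
```

If the same item were indexed with its content, for instance with body="Dear colleagues" in this sketch, the second query would match as well.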

Coveo Cloud can index several item types (see Supported File Formats). You can include items of other formats using indexing pipeline extensions (see Extensions - Page).