About the PermanentId Field

Coveo Cloud - May 5, 2017

The permanentid field contains a value that uniquely and permanently identifies each item with respect to its original repository. The field value is the same even when an item is indexed more than once through different source types. The permanentid field was introduced with the May 5, 2017 Coveo Cloud V2 release so that Coveo Machine Learning (Coveo ML) models can learn user behavior based on stable item IDs.

Previously, JavaScript Search Framework pages passed the urihash field by default to identify index items, and Coveo ML models used it. However, for some source types, such as those indexing repositories that allow sharing items, an item can have more than one URI, and therefore more than one urihash field value.

For example, in a Box source, item URIs include the Box user ID, so a shared file or folder has multiple URIs. The permanentid field value is instead based on the Box file_id value, which remains the same even when an item is shared and accessed through different paths by different users.

PermanentId Field Value

The way the permanentid field value is obtained depends on the repository type:

  • For most standard Coveo Cloud source types, the permanentid field:

    • Is based on the item URI, which is a suitable unique and permanent identifier for these sources.

    • Is a 60-character hexadecimal hash of the item URI (to optimize index performance).

  • For other repository types such as:

    • Box (personal) and Box Business

    • Dropbox (personal) and Dropbox Business

    • Google Drive (personal) and Google Drive for Work

    • YouTube

      Following the September 16, 2017 release, the YouTube permanentid field value uses a different format. To apply this change to YouTube sources created before this date, you must rebuild them (see Refresh VS Rescan VS Rebuild).

    where one item can have more than one URI, the permanentid field value is typically a hash of a string made of:

      (repository type identification) + (Unique separator) + (Source dependent unique item identifier)
    

    For example, in a Box source, the permanentid field value of an item is a hash of the string:

      "https://www.box.com/" + "@@@" + file_id
    

    where file_id is the Box unique identifier for each item.

  • For Salesforce sources, Coveo for Salesforce editions that use a Coveo index set the permanentid field value to the Salesforce item ID. Coveo for Salesforce editions that use a Salesforce index also pass the Salesforce item ID as the unique identifier in all usage analytics events.

    Consequently, if you migrate from a Coveo for Salesforce edition that uses a Salesforce index to one that uses a Coveo index, the item ID remains the same. This preserves the behavior history that Coveo ML models learned for all Salesforce items.

    For example, the URI of a Salesforce case is:

    https://na26.salesforce.com/5003200004BSRT6BAP

    The permanentid value is then: 5003200004BSRT6BAP
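The derivation rules above can be sketched in Python as follows. This is a minimal illustration, not Coveo's implementation. The hash function is not documented here, so the sketch assumes SHA-256 truncated to 60 hexadecimal characters, and its output will not match real permanentid values. The function names, the Box URIs, and the file_id value are hypothetical. The Salesforce helper assumes the item ID is the last segment of the URI path, as in the case example above.

```python
import hashlib
from urllib.parse import urlparse

# Unique separator, as in the Box example above.
SEPARATOR = "@@@"


def hash60(value: str) -> str:
    """Return a 60-character hexadecimal hash of value.

    ASSUMPTION: SHA-256 truncated to 60 hex characters, for illustration
    only. Coveo's actual hash function is not documented here.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:60]


def standard_permanent_id(uri: str) -> str:
    """Most standard source types: a hash of the item URI."""
    return hash60(uri)


def composite_permanent_id(repository_id: str, item_id: str) -> str:
    """Sources where one item can have several URIs (Box, Dropbox, ...):
    hash of (repository type identification) + (separator) + (item identifier)."""
    return hash60(repository_id + SEPARATOR + item_id)


def salesforce_permanent_id(uri: str) -> str:
    """Salesforce: the item ID itself, assumed to be the last URI path segment."""
    return urlparse(uri).path.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    # HYPOTHETICAL: one shared Box file seen by two users, hence two URIs.
    items = [
        {"uri": "https://www.box.com/users/111/files/222", "file_id": "222"},
        {"uri": "https://www.box.com/users/333/files/222", "file_id": "222"},
    ]
    uri_based = {standard_permanent_id(item["uri"]) for item in items}
    id_based = {composite_permanent_id("https://www.box.com/", item["file_id"])
                for item in items}
    print(len(uri_based))  # 2: one value per URI
    print(len(id_based))   # 1: one value per file
    print(salesforce_permanent_id("https://na26.salesforce.com/5003200004BSRT6BAP"))
    # 5003200004BSRT6BAP
```

The two sets show why the URI-based hash is not enough for shared items. Both URIs point to the same file, but only the file_id-based value collapses them into a single identifier.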

Taking Advantage of the PermanentId Field

Using the permanentid field should be mostly transparent, but you may need to perform some tasks to take full advantage of it in custom implementations, such as custom sources or custom usage analytics events.

  • Standard source types

    The introduction and usage of the permanentid field are meant to be mostly transparent for standard source types:

    • Since the May 5, 2017 release, all Coveo Cloud V2 standard sources automatically include the permanentid metadata and field mapping.

      However, you must rebuild Coveo Cloud V2 sources created before the May 5, 2017 release to ensure that all their items get a permanentid field value. Otherwise, only rescanned or refreshed items will have a permanentid field value.

    • Starting with Coveo JavaScript Search Framework 1.2537.7 (April 2017), a JavaScript Search Framework search page automatically detects whether the permanentid field exists for each clicked item. If it does, the page sends the field name and its value with each usage analytics event, in the contentIdKey and contentIdValue metadata respectively. The page also always sends the urihash field and its value.

    • Updated Coveo ML models use the contentIdKey and contentIdValue metadata passed in usage analytics events to identify each item they learn from, using the urihash field as a fallback. The transition from the urihash field to the permanentid field is therefore automatic. Within an index, or even within a single source, items may be identified by either field.

      For a given item, the models can map older usage analytics events containing only the urihash to newer events containing both the urihash and the permanentid. This way, the item click history is not lost in the transition.

      Following the transition from the urihash field to the permanentid field, your end users may experience a minor, temporary degradation in the performance of Coveo ML Automatic Relevance Tuning and Recommendation models. This happens only if you push custom usage analytics click events that do not include both the urihash and the permanentid.

      This is because items whose unique identifier suddenly changes appear as new items to Coveo ML models. Over time, however, new usage analytics events on those items accumulate, rebuilding their usage history and allowing Coveo ML models to learn from them again.

  • Custom sources (Push API)

    For custom sources populated through the Push API, when the item URI is not a unique and permanent identifier and you want Coveo ML models to learn from usage of the pushed items, you can:

    1. Push metadata that uniquely and permanently identifies each pushed item.

      The metadata value can be anything, such as a URI or a GUID, as long as it is unique and never changes.

      Using an alphanumeric string of at most 60 characters (without spaces) optimizes index performance. However, you can use any value you wish, and an unhashed value can help with troubleshooting.

    2. Map this metadata to the permanentid field (see Edit the Mappings of a Source: [SourceName]).
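Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official Push API client. It only builds the document body and does not send it. The myitemid metadata name, the helper functions, and the internal key are hypothetical. The title and data keys are assumed from the usual Push API document body layout, so confirm them against the Push API reference. The sketch derives a name-based UUID (uuid5) from a key that never changes in your own system. This yields a 32-character hexadecimal value that is stable across pushes, unique as long as the internal keys are unique, and within the recommended 60-character limit.

```python
import json
import re
import uuid

# Recommended format: alphanumeric, no spaces, at most 60 characters.
# Other values are accepted; this check only flags non-optimal ones.
_RECOMMENDED_FORMAT = re.compile(r"[A-Za-z0-9]{1,60}")

# HYPOTHETICAL metadata name; map it to the permanentid field in the
# source mappings (step 2).
ID_METADATA = "myitemid"


def stable_item_id(internal_key: str) -> str:
    """Derive a deterministic GUID (32 hex characters) from a key that
    never changes in your system, such as a database primary key."""
    return uuid.uuid5(uuid.NAMESPACE_URL, internal_key).hex


def is_recommended_format(value: str) -> bool:
    """True if value is 1 to 60 alphanumeric characters with no spaces."""
    return _RECOMMENDED_FORMAT.fullmatch(value) is not None


def build_document_body(title: str, text: str, item_id: str) -> dict:
    """Build a JSON-serializable document body carrying the unique ID as
    metadata. ASSUMPTION: 'title' and 'data' follow the Push API layout."""
    return {"title": title, "data": text, ID_METADATA: item_id}


if __name__ == "__main__":
    item_id = stable_item_id("crm://tickets/42")  # hypothetical internal key
    print(is_recommended_format(item_id))  # True
    body = build_document_body("Ticket 42", "Ticket body text", item_id)
    print(json.dumps(body, indent=2))
```

The ID is derived from an internal key rather than from the URI. As a result, pushing the same item again under a different URI still yields the same metadata value, and therefore the same permanentid once the mapping is in place.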