Add a SharePoint Online source

Members with the required privileges can index SharePoint Online or OneDrive content and make it searchable. In a Coveo-powered search interface, the source content is accessible to either everyone, some specific users and groups, or the same users and groups as in the content system.

SharePoint Online tenants typically hold large volumes of content. Follow the SharePoint Online source configuration leading practices to optimize indexing performance.

Notes
  • To retrieve SharePoint on-premises content, you must create a SharePoint Server source.

  • The item modifications that are retrievable during a source rescan are determined by the options selected when adding or editing the SharePoint Online source in the Additional content section.

  • Following a refresh operation, deleted discussion lists are excluded from your SharePoint Online source content, but replies to the original discussion message will only be excluded following the next rescan operation. This is a known issue caused by a limitation of SharePoint Online.

Source key characteristics

Features Supported Additional information

SharePoint Online version

Latest cloud version

Indexable content

Sites, sub-sites, user profiles, personal websites, lists, list items, list item attachments, document libraries, document sets, documents, web parts, and microblog posts and replies.

Content update operations

  • refresh (supported)

    Takes place every hour by default.

    A refresh doesn’t capture groups that are added/removed using the SharePoint Advanced site permissions panel. Only rescans and rebuilds capture these changes.

  • rescan (supported)

    Takes place every week by default.

    Extracts all the data and indexes the following: modified permissions on items, new items, existing items with a modified date greater than the date in the index, and existing items with a computed entity tag[1] different than the one in the index.

    Coveo runs refreshes in parallel with rescan operations only when App authentication using certificate is used.

  • rebuild (supported)

Content security options

  • Same users and groups as in your content system (supported)

  • Specific users and groups (supported)

  • Everyone (supported)

Metadata indexing for search

Automapping of metadata to a field with a matching name

Disabled by default and not recommended for this source type.

Automatically indexed metadata

Sample of autopopulated fields (no user-defined metadata required):

  • author

  • clickableuri

  • date

  • filename

  • filetype

  • indexeddate

  • language (autodetected from item content)

  • spolistbasetype

  • spolisttype

  • spparentname

  • spsitename

  • title
     

After a content update operation, inspect your item field values in the Content Browser.

Collected indexable metadata

The SharePoint Online source collects some of the site, list, list item, and file-level metadata that the SharePoint Online APIs make available.

After a rebuild, review the View and map metadata subpage for the list of indexed metadata and to index additional metadata from those available.

Custom metadata collection

Add columns to your lists and libraries. Your SharePoint Online source then automatically collects the metadata in these columns on content update operations.

Authentication and site access

A SharePoint Online source uses the OAuth 2.0 authorization protocol to access your SharePoint Online site content, and the source must authenticate through an Azure Active Directory application. You can choose to authenticate the Azure Active Directory application via a client certificate using app-only permissions, or using a delegated SharePoint Online user account (crawling account). The authentication method you choose depends mainly on your individual needs and corporate policy.

Before creating a new SharePoint Online source:

  1. Determine the authentication method you want to use. The following subsections highlight the main advantages of each authentication method.

  2. Perform the related prerequisites (see App authentication using certificate prerequisites or User delegated access using OAuth 2.0 prerequisites).

Notes
  • It’s important to know the difference between site access, indexed items, and user access to content.

    • Authenticating through a certificate or a delegated crawling account gives Coveo authorization to crawl site content in your SharePoint Online tenant.

    • The sites and items that are actually indexed by your SharePoint Online source are determined by your source Content to index settings.

    • User access to the indexed items through a Coveo-powered search interface depends on your source Content Security settings.

  • You must enter authentication credentials to be able to save a new SharePoint Online source configuration.

The main advantages of App authentication using certificate are:

  • It provides a higher throttling rate limit than User delegated access using OAuth 2.0, and is recommended for indexing large amounts of data.

  • It enables parallel refreshes, which improve content freshness by indexing latest changes even when the source is performing a long rescan operation.

  • It allows more flexibility than User delegated access using OAuth 2.0 in terms of content access.

    • You can grant the source permission to access all site content, personal sites, and user profiles in your SharePoint Online tenant without having to provide individual access to each site. The content that’s actually crawled and indexed depends on your source Content to index subtab settings.

    • You can grant the source access to only a subset of site collections using the Sites.Selected permission (instead of Sites.FullControl.All).

  • It provides easier setup, as you don’t need to create and manage a crawling account in SharePoint Online, assign the account appropriate roles and permissions, and grant the account access to content.

App authentication using certificate prerequisites

If you choose to authenticate using a certificate, you must perform the following before creating your SharePoint Online source:

Create a client certificate

Your SharePoint Online source uses the client certificate to authenticate the Azure Active Directory application to crawl your SharePoint Online tenant.

  1. Create a CA-signed certificate using a trusted certificate authority (recommended), or a self-signed certificate using the method of your choice.

    Note

    The certificate file format must be .cer, .pem, or .cert. You’ll need the certificate file when adding the certificate to your Azure Active Directory application.

  2. Export the certificate as a password-protected .pfx file. Depending on how you created the certificate file, the .pfx file may be created for you automatically.

    Note

    You’ll need the .pfx file and password when creating your SharePoint Online source.

Create the Azure Active Directory application

The Azure Active Directory application that you create for use with your source grants Coveo the permissions to crawl your SharePoint Online tenant. Create the application and assign the required permissions as follows:

  1. Access your Azure portal with an administrator account, and create (register) an Azure Active Directory application.

    Notes
    • Select Accounts in this organizational directory only for the Supported account type option when creating the application.

    • Once you register the application, you’re taken to the application Overview page in Azure. Take note of the Application (client) ID and Directory (tenant) ID as you’ll need them when creating your SharePoint Online source.

  2. Grant the Azure Active Directory application the required crawling permissions as follows:

    1. If you’re currently on your application’s page in Azure, proceed to the next step. Otherwise, access your Azure portal with an administrator account, click App registrations, and then click the application you created previously.

    2. Click API permissions.

    3. If the User.Read permission is added by default, click the permission, and then click Remove permission.

    4. For each of the following required permissions, click Add a permission, and then in the Microsoft APIs tab:

  3. Once you’ve added all the required permissions, grant tenant-wide admin consent to the application.

    Note

    You must have the appropriate user role to be authorized to consent on behalf of the organization.

Add the client certificate to the Azure Active Directory application

Follow the Microsoft documentation to add your certificate to the Azure Active Directory application.

(Optional) Create a selected sites list in SharePoint Online

With App authentication using certificate, if you want to use the Selected sites list URL option, you need to create a list of the selected site collections you’ve granted your application access to. You can then reference the URL of this list in your SharePoint Online source.

To create the list in SharePoint Online

  1. In your SharePoint Online tenant, access one of the sites that your application has access to.

  2. Go to Site Contents.

  3. Click + New > List.

  4. Enter a descriptive Name for the list (for example, selected-sites-list), and optionally, a Description. Then click Create.

  5. In the list, click + Add column.

  6. Select the Hyperlink column type, and then click Next.

  7. For the Name field, type Url (with that precise casing), and then click Save.

  8. For each site collection that you want to index, add a new list item, and then enter the site URL in the Url column.


Azure application permissions with App authentication using certificate

To work with Microsoft APIs (CSOM and REST), Coveo must authenticate via an Azure Active Directory application that has the proper permissions. The access token is then limited to these permissions, which are necessary to successfully crawl SharePoint Online.

You must provide tenant-wide admin consent for the permissions in the Azure Active Directory application that’s used to authenticate your source. Typically, you provide consent when creating the Azure Active Directory application for use with Coveo, but you can do so at a later time.

Notes
  • To provide admin consent, you must have the appropriate user role.

  • You won’t be able to index your SharePoint Online content until you provide admin consent.

  • The application permissions determine what Coveo can access when crawling your SharePoint Online tenant. The sites and items that are actually crawled and indexed by your SharePoint Online source depend on your source Content to index subtab settings. User access to the indexed items through a Coveo-powered search interface depends on your source Content security setting.

  • Coveo is a verified publisher for the Azure application.

  • You must provide admin consent for all required application permissions.

The following table provides a description of the permissions that you must grant the application when using App authentication using certificate.

API Permission Justification

SharePoint

Sites.FullControl.All

Allows Coveo to retrieve permissions of crawled items, such as sites, users, lists, and documents.

Why is FullControl required?

Coveo indexes item permissions to support the Same users and groups as in your content system content security source option, ensuring user access to items in a Coveo-powered search interface mirrors your SharePoint permission system. To get all item permissions, Coveo needs to access the list of Site Collection Administrators, which Microsoft considers to contain sensitive information. Microsoft requires a user to be a Site Collection Administrator (in other words, someone with FullControl on the site collection) to view, add, or edit this list. This is why the application needs FullControl.

Sites.Selected (FullControl)

Grants Coveo the permission to access only a specified subset of site collections.

You need to grant the application the FullControl permission on each crawled site.

Why is FullControl required?

Coveo indexes item permissions to support the Same users and groups as in your content system content security source option, ensuring user access to items in a Coveo-powered search interface mirrors your SharePoint permission system. To get all item permissions, Coveo needs to access the list of Site Collection Administrators, which Microsoft considers to contain sensitive information. Microsoft requires a user to be a Site Collection Administrator (in other words, someone with FullControl on the site collection) to view, add, or edit this list. This is why the application needs FullControl.

Note

The Sites.Selected permission can’t be used with the All sites and Hub sites options.

User.Read.All

Grants Coveo the permission to crawl user profiles.

Note

Coveo crawls all user profiles in your SharePoint Online tenant. You can’t choose to grant Coveo access to crawl only specific user profiles.

Microsoft Graph

Sites.Read.All

Grants Coveo the permission to crawl site content.

Note

This permission only grants permission to crawl site content. The site content that’s actually crawled and indexed is determined by your source Content to index settings.

Sites.Selected

Grants Coveo the permission to access only a specified subset of site collections (see Microsoft Graph permissions reference).

Directory.Read.All

Coveo requires this permission to fetch:

  • The Directory Role and Directory Role Members (see List members).

  • All users in Office 365, which is necessary to determine which users are in built-in groups such as Everyone (see List users and Coveo management of security identities and item permissions).

    Note

    The Azure documentation shows that the least privileged permission to retrieve the list of users in a group is actually User.ReadBasic.All, but since Directory.Read.All is already required for other operations, User.ReadBasic.All doesn’t appear in the list of required permissions.

Group.Read.All

Coveo uses this permission to obtain the ID of a group (represents an Azure Active Directory group, which can be an Office 365 group, or a security group), and then a list of the group members (see Get group and List members).
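As a rough illustration of the table above, the following Python sketch compares the permissions granted to an application against the required set. The permission names come from the table; the dictionary layout, the function name, and the assumption that Sites.Selected stands in for Sites.Read.All on the Microsoft Graph side (the article only states this explicitly for Sites.FullControl.All) are hypothetical.

```python
# Permission names are taken from the table above. The data layout and the
# function name are hypothetical and only serve as an illustration.
REQUIRED_APP_PERMISSIONS = {
    "SharePoint": {"Sites.FullControl.All", "User.Read.All"},
    "Microsoft Graph": {"Sites.Read.All", "Directory.Read.All", "Group.Read.All"},
}

# Assumption: with the selected-sites approach, Sites.Selected replaces the
# tenant-wide site permission in each API.
SELECTED_SITES_SWAPS = {
    "SharePoint": ("Sites.FullControl.All", "Sites.Selected"),
    "Microsoft Graph": ("Sites.Read.All", "Sites.Selected"),
}


def missing_permissions(granted, use_selected_sites=False):
    """Return {api: sorted list of required permissions not yet granted}."""
    missing = {}
    for api, required in REQUIRED_APP_PERMISSIONS.items():
        needed = set(required)
        if use_selected_sites:
            tenant_wide, selected = SELECTED_SITES_SWAPS[api]
            needed.discard(tenant_wide)
            needed.add(selected)
        gap = needed - set(granted.get(api, ()))
        if gap:
            missing[api] = sorted(gap)
    return missing
```

A checklist like this can help confirm that nothing was skipped before granting tenant-wide admin consent. It doesn't query Azure; the granted permissions must be supplied by hand.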

User delegated access using OAuth 2.0

For User delegated access using OAuth 2.0, the Azure Active Directory application is automatically created in your SharePoint tenant when you create the source, and is linked to the permissions of the crawling account that you create.

Note

The Azure Active Directory application appears as SharePoint Online Connector in your Azure portal Enterprise applications page.

The main advantages of User delegated access using OAuth 2.0 are:

  • It provides a way to give the crawling account, and by association your source, access to crawl only specific sites and user profiles.

  • It provides a way to grant the crawling account minimal permissions when accessing site content.

However, an important drawback of User delegated access using OAuth 2.0 is that it’s more prone to throttling than App authentication using certificate.

User delegated access using OAuth 2.0 prerequisites

If you decide to use User delegated access using OAuth 2.0, you must perform the following before creating your SharePoint Online source:

  1. Create a SharePoint Online user account (crawling account) with appropriate roles and permissions.

  2. Grant the crawling account permission to access sites.

Azure application permissions with user delegated access using OAuth 2.0

A SharePoint Online source uses the OAuth 2.0 authorization protocol. To work with Microsoft APIs (CSOM and REST), Coveo must authenticate via an Azure Active Directory application that has the proper permissions. The access token is then limited to these permissions, which are necessary to successfully crawl SharePoint Online.

You must provide tenant-wide admin consent for the permissions in the Azure Active Directory application that’s used to authenticate your source. Provide admin consent directly from your SharePoint Online source panel when creating your source (requires SharePoint Global Admin credentials), or from your Azure portal after creating your source (see Grant tenant-wide admin consent in Enterprise apps).

Note

The Azure Active Directory application that’s automatically created in your SharePoint Online tenant after you create your source appears as SharePoint Online Connector in your Azure portal’s Enterprise applications page.

Notes
  • To provide admin consent from the Azure portal, you must have the appropriate user role.

  • You won’t be able to index your SharePoint Online content until you provide admin consent.

  • The application permissions determine what Coveo can access when crawling your SharePoint Online tenant. The sites and items that are actually crawled and indexed by your SharePoint Online source depend on your source Content to index settings. User access to the indexed items through a Coveo-powered search interface depends on your source Content Security setting.

  • Coveo is a verified publisher for the Azure application.

The following table lists the permissions that are automatically assigned to the application when using User delegated access using OAuth 2.0.

API Permission Justification

SharePoint

AllSites.FullControl

Allows Coveo to retrieve permissions of crawled items, such as sites, users, lists, and documents.

Why is FullControl required?

Coveo indexes item permissions to support the Same users and groups as in your content system content security source option, ensuring user access to items in a Coveo-powered search interface mirrors your SharePoint permission system. To get all item permissions, Coveo needs to access the list of Site Collection Administrators, which Microsoft considers to contain sensitive information. Microsoft requires a user to be a Site Collection Administrator (in other words, someone with FullControl on the site collection) to view, add, or edit this list. This is why the application needs FullControl.

Note

Coveo will never have more privileges than the crawling account, as the crawling account permissions take precedence. Coveo can only have the complete set of AllSites.FullControl privileges if the crawling account also has the same level of privileges.

User.Read.All

Grants Coveo the permission to crawl user profiles.

Note

Coveo only crawls the user profiles that the crawling account permissions allow.

Microsoft Graph

Sites.Read.All

Grants Coveo the permission to crawl site content.

Note

Coveo has permission to crawl only the sites that the crawling account permissions allow. The site content that’s actually crawled and indexed depends on your source Content to index settings.

Directory.Read.All

Coveo requires this permission to fetch:

  • The Directory Role and Directory Role Members (see List members).

  • All users in Office 365, which is necessary to determine which users are in built-in groups such as Everyone (see List users and Coveo management of security identities and item permissions).

    Note

    The Azure documentation shows that the least privileged permission to retrieve the list of users in a group is actually User.ReadBasic.All, but since Directory.Read.All is already required for other operations, User.ReadBasic.All doesn’t appear in the list of required permissions.

Group.Read.All

Coveo uses this permission to obtain the ID of a group (represents an Azure Active Directory group, which can be an Office 365 group, or a security group), and then a list of the group members (see Get group and List members).

Domain Name System records configuration for Microsoft 365

Regardless of the chosen authentication method, if you’re using custom domains in SharePoint Online, you must configure your Domain Name System (DNS) records for Microsoft 365.

  1. Access the Domains page of your Office 365 admin center.

  2. Select the checkbox for your corporate domain (not company.onmicrosoft.com).

  3. On the domain page, in the DNS records section, take note of the DNS records.

  4. Configure the DNS records in your DNS host provider.

  5. On the domain page, in the DNS records section, click Check health to ensure that the DNS records were correctly configured.

Add a SharePoint Online source

A SharePoint Online source indexes cloud content. If you want to retrieve on-premises (server) content, see Add or edit a SharePoint Server source instead.

Tip
Leading practice

It’s best to create or edit your source in your sandbox organization first. Once you’ve confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually.

See About non-production organizations for more information and best practices regarding sandbox organizations.

  1. Determine the authentication method you want to use.

  2. Perform the related prerequisites (see App authentication using certificate prerequisites or User delegated access using OAuth 2.0 prerequisites).

  3. If applicable, configure your DNS records for Microsoft 365.

  4. On the Sources (platform-ca | platform-eu | platform-au) page, click Add source, and then click SharePoint Online.

  5. Configure your source.

"Configuration" tab

In the Add a SharePoint Online Source page, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters.

"Identification" subtab

The Identification subtab contains general information about the source.

Name

Enter a name for your source.

Tip
Leading practice

A source name can’t be modified once it’s saved, so be sure to use a short and descriptive name made of letters, numbers, hyphens (-), and underscores (_). Avoid spaces and other special characters.
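The naming guidance above can be checked before saving with a small Python sketch such as the following. The function name and the 40-character threshold for "short" are illustrative assumptions, not Coveo limits.

```python
import re

# Letters, numbers, hyphens, and underscores only, per the leading practice.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")

# Arbitrary illustrative threshold for "short"; not a documented Coveo limit.
MAX_LENGTH = 40


def is_recommended_source_name(name):
    """Return True when the name follows the recommended character set."""
    return len(name) <= MAX_LENGTH and SAFE_NAME.fullmatch(name) is not None
```

For example, `sharepoint-online_support` passes, while `SharePoint Online (support)` fails because of the spaces and parentheses.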

Project

If you have the Enterprise edition, use the Project selector to associate your source with one or multiple Coveo projects.

"Content to index" subtab

Specify the content that your source indexes and makes searchable to users in a Coveo-powered search interface.

Note

For implementations using the Quickview component in a Coveo JavaScript Search Framework result template, the Quick view of ASPX list items (pages) in SharePoint Online is supported only for the Wiki, Publishing, Modern, and Web Part types.

Content

Select whether you want to index SharePoint Online, User profiles, Personal sites, or OneDrive for Business content.

SharePoint Online content

Select this content type to scope the SharePoint Online content you want to index.

The options are:

  • Specific URLs

    Select this option to index specific site collections, subsites, or lists by specifying their URLs.

    Examples of URLs
    • For a specific site: https://contoso.sharepoint.com/sites/support

    • For a specific list: https://contoso.sharepoint.com/sites/support/Lists/contacts/AllItems.aspx

    Notes
    • For User delegated access using OAuth 2.0, the crawling account must have access to the specified sites.

    • A specific folder in a list isn’t supported.

  • Selected sites list URL

    Select this option to index content you’ve granted the SharePoint Online source access to by referencing a custom list of site collections. The list must be in a site that the SharePoint Online source also has access to.

    Tip
    Optimal configuration

    Using the Selected sites list URL option in conjunction with App authentication using certificate and the Sites.Selected application permission is an optimal configuration, both in terms of indexing performance and content access.

  • Hub site URLs

    Select this option to index the content of all sites associated with the hub sites you reference. This includes all subsites and lists in the associated sites.

    Note

    For User delegated access using OAuth 2.0, the crawling account must have access to the hub site and the associated sites. If the crawling account has access only to a subset of the associated sites, only these sites will be indexed and searchable.

  • All sites

    With App authentication using certificate, all sites in your SharePoint Online tenant can be indexed. With User delegated access using OAuth 2.0, only the sites that the crawling account is allowed to access can be indexed.

    Note

    This option corresponds only to top-level site collections and their associated content. It doesn’t include personal site content.
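The constraints scattered across the content options can be summarized in a short Python sketch. The option and method labels mirror the ones used in this article; the helper function is hypothetical and isn't part of any Coveo API.

```python
# Labels mirror the ones used in this article. The function is a hypothetical
# planning helper, not part of any Coveo API.
SITES_SELECTED_UNSUPPORTED = {"All sites", "Hub site URLs"}
USER_DELEGATED = "User delegated access using OAuth 2.0"


def configuration_notes(content_option, auth_method, uses_sites_selected=False):
    """Return reminders that apply to a planned source configuration."""
    notes = []
    if uses_sites_selected and content_option in SITES_SELECTED_UNSUPPORTED:
        notes.append(
            f"Sites.Selected can't be used with the {content_option} option."
        )
    if auth_method == USER_DELEGATED:
        notes.append(
            "Only sites that the crawling account can access are indexed."
        )
    return notes
```

An empty list means that none of the constraints covered by this sketch apply to the planned combination.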

User profiles

Select this content type to index only the user profiles in your SharePoint Online tenant.

Note

For User delegated access using OAuth 2.0, the crawling account must be set as an owner in the personal sites for the user profiles that you want to index.

Personal sites

Select this content type to index only personal sites, which include site collections and OneDrive documents.

Notes
  • For User delegated access using OAuth 2.0, the crawling account must be set as an owner in all personal sites that you want to index.

  • User access to the indexed items through a Coveo-powered search interface depends on your source Content security setting. Personal sites documents and folders are private unless they’re shared with others.

OneDrive for Business

Select this content type to only index the document libraries in OneDrive, including the My Files content of users' personal sites.

If you want to index all content of users' personal sites (all site collections) in addition to document libraries, select the Personal sites option. However, if you only want to index user documents, we recommend using the OneDrive for Business option to limit the crawling scope.

Notes
  • For App authentication using certificate, your source has access to all user content. For User delegated access using OAuth 2.0, the crawling account must be set as an owner in all personal sites that you want to index.

  • User access to the indexed items through a Coveo-powered search interface depends on your source Content security setting. OneDrive for Business documents and folders are private unless they’re shared with others.

Exclusions and inclusions

Add exclusion and inclusion rules to crawl only specific items based on their URL.


The following diagram illustrates how the SharePoint Online crawler applies the exclusion and inclusion rules. This flow applies to all items, including the starting URLs, so be careful not to filter out your starting URLs.

[Diagram: Crawling workflow]

Tip
About the "Include all non-excluded items" option

[Diagram: Crawling flow with the all-inclusive inclusion rule]

The Include all non-excluded items option automatically adds an "include all" inclusion rule in the background. This ensures that all starting URLs meet the Does URL match at least one inclusion rule? condition and that all non-excluded items get crawled.

With SharePoint Online content, the following are common configuration patterns:

  • When you don’t want to exclude content on a URL basis, you don’t add any exclusion rule and you use the default Include all non-excluded items inclusion option.

  • When you want to exclude content on a URL basis with the Hub site URLs and All sites options, you add an inclusion rule for the special root URL Coveo uses to discover sites (that is, the sharepoint://online/Administration tenant URL), and other inclusion rules for each of the sites you want to index.

You can use any of the six types of rules:

  • is and a URL that includes the protocol. For example, https://myfood.com/.

  • contains and a string found in the URL. For example, recipes.

  • begins with and a string found at the beginning of the URL and which includes the protocol. For example, https://myfood.

  • ends with and a string found at the end of the URL. For example, .pdf.

  • matches wildcard rule and a wildcard expression that matches the whole URL. For example, https://myfood.com/recipes*.

  • matches regex rule and a regex rule that matches the whole URL. For example, ^.*(company-(dev|staging)).*html.?$.

    Tip

    When using regex rules, make sure they match the desired URLs with a testing tool such as Regex101.
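To get a feel for how the six rule types behave before entering them in the source configuration, you can approximate them locally. The following Python sketch is only an approximation: the short rule-type keys are hypothetical, matching is assumed to be case-sensitive, and Python's `fnmatch` and `re` modules stand in for Coveo's wildcard and regex engines, whose exact semantics may differ.

```python
import fnmatch
import re


def rule_matches(rule_type, value, url):
    """Approximate one exclusion or inclusion rule against a URL."""
    if rule_type == "is":
        return url == value
    if rule_type == "contains":
        return value in url
    if rule_type == "begins_with":
        return url.startswith(value)
    if rule_type == "ends_with":
        return url.endswith(value)
    if rule_type == "wildcard":
        # fnmatchcase matches the whole URL; * also spans "/" characters.
        return fnmatch.fnmatchcase(url, value)
    if rule_type == "regex":
        # fullmatch requires the expression to cover the whole URL.
        return re.fullmatch(value, url) is not None
    raise ValueError(f"Unknown rule type: {rule_type}")
```

For instance, `rule_matches("wildcard", "https://myfood.com/recipes*", "https://myfood.com/recipes/pasta")` returns `True`, while the same URL doesn't satisfy an `is` rule set to `https://myfood.com/`.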

Exclusion and inclusion configuration examples

The following examples illustrate how to configure exclusion and inclusion rules for your SharePoint Online source. For demonstration purposes only, some of the examples use the All sites content option. You should try to use another content option whenever possible to limit the scope of your source.

Example 1: Using the All sites content option with exclusions

You want to index all site collections in your SharePoint Online tenant, except for the HR (https://mytenant.sharepoint.com/sites/HR) site collection.

Possible solution

  • SharePoint Online content option: All sites

  • Exclusions: https://mytenant.sharepoint.com/sites/HR* (type: matches wildcard rule)

  • Inclusions: Include all non-excluded items[1]

1. With the All sites option, the site discovery address (that is, sharepoint://online/Administration tenant) must be in your inclusions, otherwise no items will be indexed.

Example 2: Using the All sites content option with inclusions

You want to index all top level site collections in your SharePoint Online tenant whose names start with B (that is, their URLs begin with https://mytenant.sharepoint.com/sites/B).

Possible solution

  • SharePoint Online content option: All sites

  • Exclusions: none

  • Inclusions:

    • https://mytenant.sharepoint.com/sites/B* (type: matches wildcard rule)

    • sharepoint://online* (type: matches wildcard rule)[2]

2. With the All sites option, the site discovery address (that is, sharepoint://online/Administration tenant) must be in your inclusions, otherwise no items will be indexed.

Example 3: Using the Hub site URLs content option

You have a hub site, SiteA (https://mytenant.sharepoint.com/sites/SiteA). Sites SiteB (https://mytenant.sharepoint.com/sites/SiteB) and SiteC (https://mytenant.sharepoint.com/sites/SiteC) are associated with SiteA.

You want to index the content of SiteA and SiteB, but exclude the content of SiteC.

Possible solution 1

  • SharePoint Online content option: Hub site URLs (set URL to https://mytenant.sharepoint.com/sites/SiteA)

  • Exclusions: https://mytenant.sharepoint.com/sites/SiteC* (type: matches wildcard rule)

  • Inclusions: Include all non-excluded items[3]

Possible solution 2

  • SharePoint Online content option: Hub site URLs (set URL to https://mytenant.sharepoint.com/sites/SiteA)

  • Exclusions: none

  • Inclusions:

    • https://mytenant.sharepoint.com/sites/SiteA* (type: matches wildcard rule)

    • https://mytenant.sharepoint.com/sites/SiteB* (type: matches wildcard rule)

    • sharepoint://online* (type: matches wildcard rule)[3]

3. With the Hub site URLs option, the site discovery address (that is, sharepoint://online/Administration tenant) must be in your inclusions, otherwise no items will be indexed.
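The examples above can be dry-run with a minimal Python sketch. It assumes that exclusion rules are evaluated before inclusion rules, that `fnmatch` wildcards behave like Coveo's, and that `sharepoint://online/Administration` is a usable stand-in for the site discovery address. None of these details are confirmed here, so treat the result as a sanity check rather than a prediction of crawler behavior.

```python
from fnmatch import fnmatchcase

# Stand-in for the site discovery address; its exact form is assumed.
DISCOVERY_URL = "sharepoint://online/Administration"


def is_crawled(url, exclusions=(), inclusions=None):
    """Approximate the crawl decision for wildcard rules only.

    inclusions=None stands for the "Include all non-excluded items" option.
    Assumption: exclusion rules are checked before inclusion rules.
    """
    if any(fnmatchcase(url, pattern) for pattern in exclusions):
        return False
    if inclusions is None:
        return True
    return any(fnmatchcase(url, pattern) for pattern in inclusions)


# Example 2: only site collections whose names start with B.
example_2 = [
    "https://mytenant.sharepoint.com/sites/B*",
    "sharepoint://online*",
]
billing = is_crawled("https://mytenant.sharepoint.com/sites/Billing", inclusions=example_2)
sales = is_crawled("https://mytenant.sharepoint.com/sites/Sales", inclusions=example_2)
```

Under these assumptions, `billing` is `True` and `sales` is `False`. Dropping the `sharepoint://online*` rule makes `is_crawled(DISCOVERY_URL, inclusions=...)` return `False`, which is the situation the footnotes warn about. Note also that a trailing-wildcard exclusion such as `.../sites/HR*` would match any other site whose URL begins with the same characters (for example, a hypothetical `.../sites/HRPortal`).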

Additional exclusion filters

There are three other ways to exclude content from being indexed:

Exclude items on a metadata value basis

You can define a condition based on metadata values to prevent items from being crawled.

Conditions must reference metadata names using the %[METADATA_NAME] syntax, where METADATA_NAME is replaced with the actual metadata name. Metadata names and values are case-sensitive.

The View and map metadata subpage lists metadata names available in your source. Because metadata-based exclusion is applied at the crawling stage of the Coveo indexing pipeline, make sure you select a metadata name whose Origin value is Crawler. You can’t use a metadata name that’s listed twice: once with the Crawler origin and once with the Converter origin.

[Screenshot: The View and map metadata subpage showing the File Extension metadata sample values. An example of crawler stage metadata.]

The condition may be a single expression or a combination of expressions. The following operators are supported: AND, OR, Exists, NOT, ==, >, and <. Parentheses are also supported to specify operation order.

Important

The > and < operators can only be used with numeric metadata, not with date metadata. For date metadata-based conditions, consider only indexing items modified within a rolling period.

The following table gives examples of conditions and their effects:

Condition | Matches items that | Indexing result

%[documenttype] | Have a documenttype metadata value. | All items with a documenttype metadata value are excluded.

NOT %[documenttype] | Don’t have a documenttype metadata value. | All items that don’t have a documenttype metadata value are excluded.

%[fileextension] == "pdf" | Have the pdf value for the fileextension metadata. | All items with a fileextension metadata value of pdf are excluded.

NOT (%[fileextension] == "pdf") | Don’t have the pdf value for the fileextension metadata. | All items with a fileextension metadata value other than pdf are excluded.

%[documenttype] == "List" OR %[fileextension] == "pdf" | Have the List value for the documenttype metadata, or the pdf value for the fileextension metadata. | All items whose documenttype metadata value is List or whose fileextension metadata value is pdf are excluded.
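To make the example conditions more concrete, the following Python sketch restates each one as a predicate over a dictionary of crawler-stage metadata and lists which sample items it would exclude. The sample items, the excluded_uris helper, and the treatment of an item without a fileextension value as not being pdf are assumptions made for illustration. This isn't the connector's condition parser.

```python
# Hypothetical crawler-stage metadata for three items.
items = [
    {"uri": "a", "documenttype": "List", "fileextension": "aspx"},
    {"uri": "b", "fileextension": "pdf"},
    {"uri": "c", "fileextension": "docx"},
]

# Python equivalents of the example conditions.
# Metadata names and values are compared case-sensitively.
conditions = {
    '%[documenttype]': lambda m: "documenttype" in m,
    'NOT %[documenttype]': lambda m: "documenttype" not in m,
    '%[fileextension] == "pdf"': lambda m: m.get("fileextension") == "pdf",
    # Assumption: an item with no fileextension value also matches here.
    'NOT (%[fileextension] == "pdf")': lambda m: m.get("fileextension") != "pdf",
    '%[documenttype] == "List" OR %[fileextension] == "pdf"':
        lambda m: m.get("documenttype") == "List" or m.get("fileextension") == "pdf",
}

def excluded_uris(condition):
    """Return the URIs of the sample items that the condition would exclude."""
    matches = conditions[condition]
    return [item["uri"] for item in items if matches(item)]

for condition in conditions:
    print(condition, "->", excluded_uris(condition))
```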

Exclude template types

You can configure your SharePoint Online source to ignore specific SharePoint list template types when indexing items.

Enter the list template types to ignore by adding a separate entry for each template type.

Example

You don’t want your source to index DocumentLibrary and Tasks template-type items. Therefore, you enter the following:

Figure: List To Ignore configuration.

Exclude older content

You can configure your SharePoint Online source to index only items that were created or modified within a specified time range. To configure this rolling period, set the amount and period.

Example

You want your source to index only items that were modified in the last two years, so you configure the rolling period as follows:

Figure: Example of a rolling period configuration.

To disable the rolling period, set the amount to 0.

Configuring a rolling period has the following effects during source updates:

Update type Effects

rebuild

Your source is emptied, then only items modified within the rolling window are added to your source.

rescan

The connector crawls the entire content your source targets, and:

  • New items are added to your source.

  • Items that have been modified since the last source update are updated in your source.

  • Items that haven’t been modified within the rolling window or that have been deleted in SharePoint are removed from your source.

refresh

The connector crawls items that have been modified/added since the last source update and items whose last modified date is outside the rolling window, and:

  • Items that have been modified/added since the last source update are updated/added in your source.

  • Items that haven’t been modified within the rolling window are removed from your source.
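The rolling period check can be pictured as a comparison between an item's last modified date and a cutoff date. The following Python sketch is a simplified model of that idea. The is_within_rolling_period helper, the period names, and the approximate period lengths (30-day months, 365-day years) are assumptions of this sketch, not the connector's actual date arithmetic.

```python
from datetime import datetime, timedelta, timezone

# Approximate period lengths in days (an assumption of this sketch).
DAYS_PER_PERIOD = {"days": 1, "weeks": 7, "months": 30, "years": 365}

def is_within_rolling_period(modified, amount, period, now):
    """Return True when an item last modified at `modified` stays in the source."""
    if amount == 0:
        # An amount of 0 disables the rolling period.
        return True
    cutoff = now - timedelta(days=amount * DAYS_PER_PERIOD[period])
    return modified >= cutoff

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent = datetime(2023, 3, 15, tzinfo=timezone.utc)
old = datetime(2021, 1, 10, tzinfo=timezone.utc)

print(is_within_rolling_period(recent, 2, "years", now))  # True
print(is_within_rolling_period(old, 2, "years", now))     # False
print(is_within_rolling_period(old, 0, "years", now))     # True
```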

"Advanced settings" subtab

The Advanced settings subtab lets you customize the SharePoint Online crawler behavior. All advanced settings have default values that are adequate in most use cases.

Content and images

If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option.

The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick view. See Enable optical character recognition for details on this feature.

Additional content

If you selected All sites, Hub site URLs, Specific URLs, Personal sites, or Selected sites list URL in the Content section, select whether to index the following:

Folders

Select this option to index list folders and document sets.

A SharePoint folder is only a container for items. By default, this option isn’t enabled, which means Coveo indexes the items in folders, but not the folders themselves.

Unapproved items

Select this option to retrieve unapproved items, which are items with a Draft or Pending approval status, from lists where moderation is activated. If an unapproved version exists for an item that’s already Approved, your source indexes the unapproved item instead of the approved item. As a result, the unapproved item appears in Coveo search results. If this option is disabled, your source indexes only approved items.

Example

In a list where moderation is active, a document named Meeting Notes is Approved and indexed by Coveo. This document version is 1.0. However, a coworker edits Meeting Notes, thereby creating version 1.1, and the document status becomes Draft. Then, your SharePoint Online source is rescanned. If Unapproved items is enabled in your source, version 1.0 is deleted from the Coveo index and is replaced with the draft version 1.1. If Unapproved items is disabled in your source, Coveo keeps indexing version 1.0 because version 1.1 isn’t yet Approved.

In lists where moderation is deactivated, this option doesn’t apply. Coveo indexes the latest version of an item, be it Approved, Draft, or Pending.

Note

For SharePoint lists that require documents to be checked out before editing, Coveo doesn’t index a document while it’s checked out regardless of the Unapproved items option and the list moderation setting in SharePoint. If a checked out item is checked in and its status changes to Draft or Pending, the unapproved item is indexed only if the Unapproved items option is enabled in your source or if moderation is deactivated for the list.
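The interaction between list moderation and the Unapproved items option can be summarized as a small decision function. The following Python sketch is a simplified model with a hypothetical version_to_index helper and hypothetical parameter names. It deliberately leaves checked-out documents out of scope.

```python
def version_to_index(moderation_enabled, unapproved_items_enabled,
                     latest_version, latest_status, approved_version=None):
    """Return the version the source would index, or None if nothing qualifies.

    Hypothetical helper; checked-out documents aren't modeled here.
    """
    if not moderation_enabled:
        # Without moderation, the latest version is indexed whatever its status.
        return latest_version
    if latest_status == "Approved" or unapproved_items_enabled:
        # With the option enabled, an unapproved latest version replaces the approved one.
        return latest_version
    # Option disabled: only an approved version, if any, is indexed.
    return approved_version

# The Meeting Notes example: version 1.0 is Approved and version 1.1 is a Draft.
print(version_to_index(True, True, "1.1", "Draft", approved_version="1.0"))    # 1.1
print(version_to_index(True, False, "1.1", "Draft", approved_version="1.0"))   # 1.0
print(version_to_index(False, False, "1.1", "Draft", approved_version="1.0"))  # 1.1
```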

"Authentication" subtab

Your SharePoint Online source must authenticate to SharePoint Online using app authentication with a certificate or user delegated access to retrieve your tenant content.

Note

Your source authentication access token can potentially expire and become invalid. See Update a SharePoint Online access token for information on what causes the access token to expire and how to update an expired access token.

  1. Select whether to use App authentication using certificate or User delegated access using OAuth 2.0.

  2. Specify the corresponding settings:

    • For App authentication using certificate:

      1. Enter your SharePoint Online Tenant name or tenant address.

        Examples
        • SharePoint Online tenant name: mycompany

        • SharePoint Online tenant address: https://mycompany.sharepoint.com

      2. Enter your SharePoint Online Tenant ID.

      3. Enter the Client ID for the Azure Active Directory application that you created for your source.

      4. Enter your Certificate password.

      5. Click Upload certificate or Upload and replace existing certificate to select and upload your PFX certificate.

    • For User delegated access using OAuth 2.0:

      1. Click Authorize account.

      2. Enter your SharePoint Online Tenant name, and then click Sign In.

        Example

        If the SharePoint Online tenant address is https://mycompany.sharepoint.com, then the tenant name is mycompany.

      3. Provide admin consent using SharePoint Online user credentials that have the Global Admin role by following the substeps below, or proceed to the next step if you want to provide consent from your Azure portal after creating your source.

        Note

        You can switch your source to the crawling account after you provide admin consent.

        1. Enter the Email and Password of a SharePoint account with the Global Admin role.

        2. Select Consent on behalf of your organization.

        3. Click Accept.

        4. To switch the source to the crawling account, click Authorize account again, enter your SharePoint Online Tenant name, click Sign In, and then proceed to the next step.

      4. Enter the Email and Password of the crawling account that you created earlier and that has access to the desired SharePoint Online content, and then click Sign in.

        Note

        When you create two SharePoint Online sources that retrieve content from the same tenant, they share their security providers, which increases the speed of the security identities refresh operation. You must, however, use the same limited administrator credentials for both sources.
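The relationship between a tenant address and a tenant name, as in the mycompany examples in the steps above, can be sketched in Python as follows. The tenant_name_from_address helper is hypothetical and only handles addresses of the form https://NAME.sharepoint.com.

```python
from urllib.parse import urlparse

def tenant_name_from_address(address):
    """Return the tenant name for an address such as https://mycompany.sharepoint.com."""
    host = urlparse(address).hostname or ""
    suffix = ".sharepoint.com"
    if not host.endswith(suffix) or host == suffix:
        raise ValueError(f"Not a SharePoint Online tenant address: {address!r}")
    return host[: -len(suffix)]

print(tenant_name_from_address("https://mycompany.sharepoint.com"))  # mycompany
```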

"Content security" tab

Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.

Note

When using the Same users and groups as in your content system content security option, you can map Microsoft 365 email aliases to their corresponding primary email addresses so that your repository’s content permissions are respected when a user logs in to a Coveo-powered search interface using an email alias.

Important

When using the Everyone content security option, see Safely apply content filtering for information on how to ensure that your source content is safely filtered and only accessible by intended users.

"Access" tab

In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.

For example, when creating a new source, you could decide that members of Group A can edit its configuration while Group B can only view it.

See Custom access level for more information.

Completion

  1. Finish adding or editing your source:

    • When you want to save your source configuration changes without starting a build/rebuild, for example because you plan to make other changes soon, click Add source/Save.

    • When you’re done editing the source and want to make changes effective, click Add and build source/Save and rebuild source.

      Note

      On the Sources (platform-ca | platform-eu | platform-au) page, you must click Launch build or Start required rebuild in the source Status column to add the source content or to make your changes effective, respectively.

      Back on the Sources (platform-ca | platform-eu | platform-au) page, you can follow the progress of your source addition or modification.

      Once the source is built or rebuilt, you can review its content in the Content Browser.

      Note

      If you selected Specific URLs or User profiles in the Content section, some additional items will appear in the Content Browser. To retrieve user profiles, Coveo must dig through your SharePoint Online instance, including your host site collection and the documents it contains. The items it encounters in the process are retrieved as well and therefore appear in the Content Browser.

  2. Once your source is done building or rebuilding, review the metadata Coveo is retrieving from your content.

    1. On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > View and map metadata in the Action bar.

    2. If you want a facet or result template to use metadata that isn’t currently indexed, map that metadata to a field.

      1. Click the metadata and then, at the top right, click Add to Index.

      2. In the Apply a mapping on all item types of a source panel, select the field you want to map the metadata to, or add a new field if none of the existing fields are appropriate.

        Notes
        • For details on configuring a new field, see Add or edit a field.

        • For advanced mapping configurations, like applying a mapping to a specific item type, see Manage mappings.

      3. Click Apply mapping.

    3. Depending on the source type you use, you may be able to extract additional metadata from your content. You can then map that metadata to a field, just like you did for the default metadata.

      More on custom metadata extraction and indexing

      Some source types let you define rules to extract metadata beyond the default metadata Coveo discovers during the initial source build.

      For example:

      Source type Custom metadata extraction methods

      Push API

      Define metadata key-value pairs in the addOrUpdate section of the PUT request payload used to upload push operations to an Amazon S3 file container.

      In the JSON configuration (REST API | GraphQL API) of the source, define metadata names (REST API | GraphQL API) and specify where to locate the metadata values in the JSON API response Coveo receives.

      Database

      Add <CustomField> elements in the XML configuration. Each element defines a metadata name and the database field used to populate it.

      Web

      Sitemap

      Some source types automatically map metadata to default or user-created fields, making the mapping process unnecessary. Some source types automatically create mappings and fields for you when you configure metadata extraction.

      See your source type documentation for more details.

    4. When you’re done reviewing and mapping metadata, return to the Sources (platform-ca | platform-eu | platform-au) page.

    5. To reindex your source with your new mappings, click Launch rebuild in the source Status column.

    6. Once the source is rebuilt, you can review its content in the Content Browser.

Troubleshooting

After a rebuild, you may notice that your source isn’t indexing as expected. For example, there may be missing or extra items, or the values of some fields may not meet your requirements.

To help you troubleshoot, refer to the list of common issues and solutions when using the SharePoint Online source.

Safely apply content filtering

The best way to ensure that your indexed content is seen only by the intended users is to enforce content security by selecting the Same users and groups as in your content system option. Should this option be unavailable, select Specific users and groups instead.

However, if you need to configure your source so that the indexed source content is accessible to Everyone, adhere to the following leading practices, which ensure that your source content is safely filtered and only accessible by the appropriate users:

  • Configure query filters

  • Use condition-based query pipeline routing

  • Configure the search token

Following the above leading practices results in a workflow whereby the user query is authenticated server side via a search token that enforces the search hub from which the query originates. Therefore, the query can’t be modified by users or client-side code. The query then passes through a specific query pipeline based on a search hub condition, and the query results are filtered using the filter rules.

Configure query filters

Filter rules allow you to enter hidden query expressions to be added to all queries going through a given query pipeline. They’re typically used to add a field-based expression to the constant query expression (cq).

Example

You apply the @objectType=="Solution" query filter to the pipeline to which the traffic of your public support portal is directed. As a result, the @objectType=="Solution" query expression is added to any query sent via this support portal.

Therefore, if a user types Speedbit watch wristband in the search box, the items returned are those that match these keywords and whose objectType has the Solution value. Items matching these keywords but having a different objectType value aren’t returned in the user’s search results.

To learn how to configure query pipeline filter rules, see Manage filter rules.

Note

You can also enforce a filter expression directly in the search token.
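The effect of the filter rule in the example above can be pictured with a toy model in Python. The search function, the sample items, and the naive keyword matching are hypothetical and greatly simplified. They only illustrate that the hidden expression narrows the results regardless of what the user types.

```python
def search(items, keywords, filter_field, filter_value):
    """Return items that contain every keyword and satisfy the hidden filter."""
    words = [word.lower() for word in keywords.split()]
    return [
        item for item in items
        if item.get(filter_field) == filter_value
        and all(word in item["title"].lower() for word in words)
    ]

# Hypothetical indexed items.
items = [
    {"title": "Replace a Speedbit watch wristband", "objectType": "Solution"},
    {"title": "Speedbit watch wristband internal notes", "objectType": "Case"},
    {"title": "Charge a Speedbit watch", "objectType": "Solution"},
]

# Models the @objectType=="Solution" filter rule applied by the pipeline.
results = search(items, "Speedbit watch wristband", "objectType", "Solution")
print([item["title"] for item in results])
```

Only the first item is returned: the second matches the keywords but has a different objectType value, and the third lacks one of the keywords.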

Use condition-based query pipeline routing

Condition-based routing is the recommended and most flexible query pipeline routing mechanism.

When using this routing mechanism, you ensure that search requests are routed to a specific query pipeline according to the search interface from which they originate, and the authentication is done server side.

To accomplish this:

  1. Apply a condition to a query pipeline based on a search hub value, such as Search Hub is Community Search or Search Hub is Agent Panel. This condition ensures that all queries that originate from a specific search hub go through that query pipeline.

  2. Authenticate user queries via a search token that’s generated server side and that contains the search hub parameter that you specified in the query pipeline.
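Conceptually, condition-based routing behaves like a lookup from the search hub enforced in the token to a query pipeline. The following Python sketch illustrates that idea only. The pipeline names, the default pipeline, and the route_query helper are hypothetical.

```python
# Hypothetical pipeline names keyed by the search hub enforced in the search token.
PIPELINE_BY_SEARCH_HUB = {
    "Community Search": "community-pipeline",
    "Agent Panel": "agent-pipeline",
}
DEFAULT_PIPELINE = "default"

def route_query(search_hub):
    """Return the pipeline whose "Search Hub is ..." condition matches the query."""
    return PIPELINE_BY_SEARCH_HUB.get(search_hub, DEFAULT_PIPELINE)

print(route_query("Community Search"))  # community-pipeline
print(route_query("Unknown Hub"))       # default
```

Because the search hub value comes from a token generated server side, client-side code can't switch a query to a different pipeline in this model.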

Configure the search token

When using query filters to secure content, the safest way to enforce content security is to authenticate user queries using a search token that’s generated server side. For instance, when using this approach, you can enforce a search hub value in the search token. This makes every authenticated request that originates from a component use the specified search hub, and therefore be routed to the proper query pipeline. Because this configuration is stored server side and encrypted in the search token, it can’t be modified by users or client-side code.

Implementing search token authentication requires you to add server-side logic to your website or application. Therefore, the actual implementation details will vary from one project to another.

The following procedure provides general guidelines:

Note

If you’re using the Coveo In-Product Experience (IPX) feature, see Implement advanced search token authentication.

  1. Authenticate the user.

  2. Call a service exposed through Coveo to request a search token for the authenticated user.

  3. Specify the userIDs for the search token, and enforce a searchHub parameter in the search token.

Note

You can specify other parameters in the search token, such as a query filter.

For more information and examples, see Search token authentication.
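As a minimal sketch of the server-side part of this procedure, the following Python code builds, without sending, a request for a search token. The endpoint URL, the payload field names (userIds, searchHub, filter), the Email Security Provider value, and the build_token_request helper are assumptions to verify against the Search token authentication documentation before use.

```python
import json
from urllib.request import Request

# Assumed endpoint; verify it against the Search token authentication documentation.
TOKEN_ENDPOINT = "https://platform.cloud.coveo.com/rest/search/v2/token"

def build_token_request(api_key, user_email, search_hub, query_filter=None):
    """Build (but don't send) the server-side request for a search token."""
    payload = {
        # Assumed shape of the user identity entry.
        "userIds": [
            {"name": user_email, "provider": "Email Security Provider", "type": "User"}
        ],
        # Enforcing the search hub in the token drives pipeline routing.
        "searchHub": search_hub,
    }
    if query_filter is not None:
        payload["filter"] = query_filter
    return Request(
        TOKEN_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_token_request(
    api_key="<API_KEY>",  # Placeholder; keep the real key on the server only.
    user_email="jdoe@example.com",
    search_hub="Community Search",
    query_filter='@objectType=="Solution"',
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

The request is only constructed here so that the sketch runs without network access. In a real implementation, the server would send it after authenticating the user and return the resulting token to the search interface.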

Update a SharePoint Online access token

Your SharePoint Online source uses the OAuth 2.0 authorization protocol to access your SharePoint Online site content via an Azure Active Directory application that has the required permissions (see Authentication and site access).

The access token is linked to the certificate or SharePoint Online user account (crawling account) that you specified in your source configuration, and you must update the access token manually if it’s no longer valid. An access token becomes invalid when:

  • The certificate expires

  • The SharePoint Online crawling account’s credentials (email and/or password) are modified

An authentication error appears on the Sources (platform-ca | platform-eu | platform-au) page when your SharePoint Online source access token is no longer valid.

Figure: SharePoint Online source authentication issue.

Note

A source authentication error may also appear due to configuration or connectivity issues. If the certificate hasn’t expired, or the crawling account’s credentials haven’t changed, verify the following:

To update the access token

  • For App authentication using certificate:

    1. Create a client certificate.

    2. Add your certificate to the Azure Active Directory application that you created for use with your source.

    3. On the Sources (platform-ca | platform-eu | platform-au) page, click your SharePoint Online source, and then click Edit in the Action bar.

    4. In the Authentication subtab, click Upload and replace existing certificate and select your new certificate.

    5. Enter the Certificate password.

    6. Click Save or Save and rebuild source.

  • For User delegated access using OAuth 2.0:

    1. On the Sources (platform-ca | platform-eu | platform-au) page, click your SharePoint Online source, and then click Edit in the Action bar.

    2. In the Authentication subtab, click Authorize account.

    3. Enter your SharePoint Online Tenant name, and then click Sign in.

    4. In the Microsoft Online login form, enter the Email and Password of the crawling account, and then click Sign in.

    5. Click Save or Save and rebuild source.

Required privileges

You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.

Note

The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.

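Actions | Service - Domain | Required access level

View sources, view source update schedules, and subscribe to source notifications | Content - Fields | View

View sources, view source update schedules, and subscribe to source notifications | Content - Sources | View

View sources, view source update schedules, and subscribe to source notifications | Organization - Organization | View

Edit sources, edit source update schedules, and view the View and map metadata subpage | Content - Fields | Edit

Edit sources, edit source update schedules, and view the View and map metadata subpage | Content - Sources | Edit

Edit sources, edit source update schedules, and view the View and map metadata subpage | Content - Source metadata | View

Edit sources, edit source update schedules, and view the View and map metadata subpage | Organization - Organization | View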

What’s next?


1. The entity tag is the version identifier of an item and is calculated using the item metadata.