Add or Edit a SharePoint Server Source

Members of the Administrators and Content Managers built-in groups can include SharePoint on-premises content and make it searchable. In a Coveo-powered search interface, the source content can be made accessible to everyone, to specific users and groups, or to the same users and groups as in the content system.

Note

To retrieve SharePoint Online content, you must create a SharePoint Online source.

Leading practice

The number of items that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration. See About Crawling Speed for information on what can impact crawling speed, as well as possible solutions.

Source Key Characteristics

Features | Supported | Additional information
--- | --- | ---
SharePoint version | 2019, 2016, 2013, 2010, Foundation 2013, and Foundation 2010 |
Searchable content types | Yes | Sites, sub-sites, public user profiles[1], personal websites[1], lists, list items, list item attachments, document libraries, document sets, documents, web parts[2], and microblog posts and replies.
Content update operations: Refresh | Yes | Takes place every six hours by default. A rescan or rebuild is required to take deleted user profiles into account.
Content update operations: Rescan | Yes | Takes place every week by default.
Content update operations: Rebuild | Yes |
Content security options: Same users and groups as in your content system | Yes | On-premises Active Directory permission systems aren’t supported with SharePoint Server sources of the On-Premises type. However, Active Directory is supported if you use the Crawling Module.
Content security options: Specific users and groups | Yes |
Content security options: Everyone | Yes |

Requirements

Active Directory Federation Services

When your SharePoint environment uses ADFS as a trusted identity provider, the ADFS service endpoint URL paths must be enabled.

SharePoint Account Permissions

When you want to include SharePoint content, you should create a SharePoint account dedicated to the source. Otherwise, you’ll also need to update the source Password value each time the account password changes to prevent authentication errors (see Username and Password).

  1. Access your SharePoint tenant with an administrator account.

  2. On your SharePoint tenant:

    1. Select or create a user account for the source to use when retrieving your SharePoint content. See the following table to identify the required user type for the authentication method enabled on your SharePoint web application.

      SharePoint environment | Authentication enabled on the web application | User type | User format
      --- | --- | --- | ---
      Classic | Windows | Windows account | domain\username or username@domain.com
      Claims | Windows | Windows account | domain\username or username@domain.com
      Claims | ADFS | ADFS SSO | username@domain.com
      Claims | Okta | Okta SSO | username@domain.com

    2. Grant appropriate SharePoint permissions to the SharePoint account to ensure it has access to the content that you want to make searchable.

      The following table presents the minimal required permissions that the source account must have to perform specific actions.

      Action to perform Minimal required permission

      Content and security indexing, source refresh, and site collection discovery

      Full Read policy for each web application to make searchable (see Add the Full Read Policy to All SharePoint Tenant Web Applications).

      Personal site, public user profile, and social tags indexing

      Note

      When including personal sites or public user profiles, the account used as source credentials must not have its own personal site on the SharePoint server being indexed. Otherwise, attempts to retrieve the list of personal sites can fail.

Add or Edit a SharePoint Server Source

Before you start, ensure that your SharePoint instance meets the source requirements.

When adding a source, in the Add a source of content panel, click the On-Premises or the Crawling Module tab, depending on whether you need to use the Coveo On-Premises Crawling Module to retrieve your content. See Content Retrieval Methods for details.

A SharePoint Server source indexes on-premises (server) content. To retrieve cloud content instead, see Add or Edit a SharePoint Online Source.

"Configuration" Tab

On the Add/Edit a SharePoint Server Source subpage, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters.

General Information

Source Name

Enter a name for your source.

Leading practice

A source name can’t be modified once it’s saved, so be sure to use a short and descriptive name, using letters, numbers, hyphens (-), and underscores (_). Avoid spaces and other special characters.

URL

Enter one or more URLs corresponding to the site collections, lists, websites, and subsites to make searchable. Each URL must include the protocol and tenant name.

Note

URLs pointing to a specific folder in a list aren’t supported.

Examples
  • For a specific web application: https://site:8080/

  • For a specific site collection: https://site:8080/sites/support

  • For a specific website: https://site:8080/sites/support/subsite

  • For a specific list: https://site:8080/sites/support/lists/contacts/allItems.aspx

Scope

In the drop-down menu, select the option for the content type matching the URLs you specified. By default, Web application is selected.

Available options are the following:

Value | Content to make searchable
--- | ---
Web application | All site collections of the specified web application.
Site collection | All websites of the specified site collection.
Web and sub webs | Only the specified website and its sub webs (also known as subsites).
List | Only the specified list or document library.

Paired Crawling Module

If your source is a Crawling Module source, and if you have more than one Crawling Module linked to this organization, select the one with which you want to pair your source. If you change the Crawling Module instance paired with your source, a successful rebuild is required for your change to apply.

Optical Character Recognition (OCR)

If you want Coveo to extract text from image files or PDF files containing images, check the appropriate box. OCR-extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick View. See Enable Optical Character Recognition for details on this feature.

Note

Contact Coveo Sales to add this feature to your organization license.

"Authentication" Section

In the Authentication section, you must provide authentication information so that Coveo can access the content you want to make searchable. In the drop-down menu, select the identity provider that you use to manage identities in your SharePoint site:

  • Active Directory On-Premises (available when using the Crawling Module only)

  • Windows (NTLM or Kerberos)

  • ADFS under claims (simple or chained ADFS identity provider)

  • Okta

Depending on the option you chose in the drop-down menu, you must specify some of the following options.

Username and Password

The username and password of a dedicated SharePoint administrator account that has access to the content to include or, if you use Okta, the username of an Okta administrator account. See Source Credentials Leading Practices.

ADFS Server URL

The URL of an ADFS server trusted by SharePoint.

Example

https://adfs01.subdomain.example.com

SharePoint Trust Identifier

The SharePoint server relying party trust identifier.

Example

https://subdomain.example.com:44626/_trust

To find your relying party trust identifier:

  1. Access the AD FS 2.0 Management Console (Windows Start menu > All Programs > Administrative Tools > AD FS 2.0 Management).

  2. In AD FS 2.0 Management Console, under Trust Relationships, select Relying Party Trusts.

  3. In the Relying Party Trusts list, find the row for SharePoint. The SharePoint relying party trust identifier is the value in the Identifier column.

ADFS Trust Identifier

The relying party trust identifier of the ADFS server acting as an intermediary.

Example

http://adfs01.subdomain.example.com/adfs/services/trust

Identity Provider Server URL

The URL of the identity provider used in SharePoint to authenticate users.

Example

https://adfs2012.subdomain.example.com

You can edit the identity provider server URL in the ADFS settings (see Enable the ADFS Service Endpoint URL Path).

Okta Realm

The SharePoint trusted identity provider realm provided in your Okta application configuration (see Using Okta as a Trusted Identity Provider).

Example

urn:okta:sharepoint:exknuavz9hbOItwsS8e7

Okta Sign in URL

The URL to which users should be redirected in order to authenticate with Okta (see Using Okta as a Trusted Identity Provider).

Example

https://dev-782461.oktapreview.com/app/appname/sso/wsfed/passive

Active Directory Username and Active Directory Password

Enter credentials to grant Coveo access to your Active Directory.

Expand Well-Known SIDs

Select this option to grant the users included in your Active Directory well-known security identifiers (SIDs) access to the indexed content. Expect the security identity provider refresh operation to take longer. Supported well-known SIDs are: Everyone, Authenticated Users, Domain Admins, Domain Users, and Anonymous Users.

Leading practice

If all your content is secured with the Everyone or Authenticated Users well-known SID, it’s more resource-efficient to index it with a source whose content is accessible to everyone than to expand the well-known SID with a source that indexes permissions.
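For reference, the well-known groups listed above correspond to standard Windows SIDs: Everyone is S-1-1-0, Authenticated Users is S-1-5-11, Anonymous Logon (presumably what the list calls Anonymous Users) is S-1-5-7, and Domain Admins and Domain Users are the domain-relative IDs 512 and 513. The following Python sketch recognizes these SIDs in their string form. The function name is hypothetical, and the sketch doesn't reflect how Coveo expands well-known SIDs internally.

```python
import re

# Domain-independent well-known SIDs.
FIXED_SIDS = {
    "S-1-1-0": "Everyone",
    "S-1-5-11": "Authenticated Users",
    "S-1-5-7": "Anonymous Logon",
}

# Domain-relative well-known groups: S-1-5-21-<domain identifier>-<RID>.
DOMAIN_RIDS = {
    "512": "Domain Admins",
    "513": "Domain Users",
}

DOMAIN_SID = re.compile(r"^S-1-5-21-\d+-\d+-\d+-(\d+)$")


def well_known_name(sid):
    """Return the well-known group name for a SID string, or None."""
    if sid in FIXED_SIDS:
        return FIXED_SIDS[sid]
    match = DOMAIN_SID.match(sid)
    if match:
        return DOMAIN_RIDS.get(match.group(1))
    return None


print(well_known_name("S-1-1-0"))  # Everyone
print(well_known_name("S-1-5-21-1004336348-1177238915-682003330-513"))  # Domain Users
print(well_known_name("S-1-5-21-1004336348-1177238915-682003330-1104"))  # None
```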

Enable TLS

Select this option to use a TLS protocol to retrieve your security identities. If you do, we strongly recommend selecting StartTLS whenever possible. Since LDAPS is a much older protocol, select it only if StartTLS is incompatible with your environment.

Email Attributes

By default, Coveo retrieves the email address associated with each security identity from the mail attribute. Optionally, you can specify additional or different attributes to check. If an attribute contains more than one value, Coveo uses the first one.
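The following Python sketch models that lookup. The identity structure, the function name, and the order in which attributes are checked are assumptions made for illustration; userPrincipalName is simply an example of another Active Directory attribute you might list.

```python
def resolve_email(identity, attributes=("mail",)):
    """Return the email address for a security identity, or None.

    identity maps attribute names to lists of values. Attributes are
    checked in the order given (an assumption), and when an attribute
    holds several values, only the first one is used.
    """
    for name in attributes:
        values = identity.get(name) or []
        if values:
            return values[0]
    return None


user = {
    "mail": [],
    "userPrincipalName": ["jdoe@corp.example.com"],
}
print(resolve_email(user))  # None: the default mail attribute is empty
print(resolve_email(user, ("mail", "userPrincipalName")))  # jdoe@corp.example.com
print(resolve_email({"mail": ["a@example.com", "b@example.com"]}))  # a@example.com
```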

"Content to Include" Section

In the Content to Include section, consider changing the default settings to make additional content searchable.

User Profiles

Check this box to index public SharePoint user profiles.

Note

This box is unavailable if you’ve selected ADFS under claims or Okta as an identity provider.

Personal Sites

When the Scope is Web application, check this box to include SharePoint personal sites.

"Crawling Settings" Section

In the Crawling Settings section, the Reindex all child items on UpdateShallow option allows you to reindex the children of an item that has been updated. This ensures that, if the metadata of the child items contains parent item information, this information stays up to date. However, checking this box significantly impacts the source refresh time. Therefore, if you don’t check it, we recommend scheduling source rescans so that the child items are eventually updated as well.

Example

You change your SharePoint site name. In the metadata of the child items, the site name appears under spsitename. If the box isn’t checked, the children aren’t reindexed and keep an outdated spsitename until the next source rescan or rebuild. However, if the box is checked, the children are updated along with the parent SharePoint site item.
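To illustrate the trade-off, here’s a toy Python model of the example above. The Item class and the refresh_site function are hypothetical and don’t represent the connector’s internals; they only show why child items keep an outdated spsitename unless the option is checked.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    uri: str
    metadata: dict
    children: list = field(default_factory=list)


def set_site_name(item, name):
    """Reindex an item and all its descendants with the new site name."""
    item.metadata["spsitename"] = name
    for child in item.children:
        set_site_name(child, name)


def refresh_site(site, new_name, reindex_children):
    """Toy refresh: the updated site item is always reindexed; its
    descendants are reindexed only when reindex_children is True."""
    if reindex_children:
        set_site_name(site, new_name)
    else:
        site.metadata["spsitename"] = new_name


def build_site():
    site = Item("https://site:8080/sites/support", {"spsitename": "Support"})
    doc = Item(site.uri + "/Shared Documents/guide.docx", {"spsitename": "Support"})
    site.children.append(doc)
    return site, doc


site, doc = build_site()
refresh_site(site, "Customer Support", reindex_children=False)
print(doc.metadata["spsitename"])  # Support (stale until the next rescan or rebuild)

site, doc = build_site()
refresh_site(site, "Customer Support", reindex_children=True)
print(doc.metadata["spsitename"])  # Customer Support
```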

"Filters" Section

Use this section to include or exclude content from specific pages based on URL expressions.

Note

You can view your URL expressions in the addressPatterns attribute of your source JSON configuration panel.

Inclusion Filters

Your source indexes only the pages that match a URL expression specified in this section.

Note

The URL(s) that you specified for your source must be part of the inclusion filter scope, otherwise the corresponding content won’t be indexed. For example, if you entered https://site:8080/sites/support as the source URL, that URL must match one of your filter expressions to index the corresponding content. If a source URL redirects to another URL, both URLs must be part of the inclusion filter scope.

  1. Enter a URL expression to apply as the inclusion filter.

  2. Select whether the URL expression uses a Wildcard or a Regex (regular expression) pattern.

Leading practice

You can test your regexes to ensure that they match the desired URLs with tools such as Regex101.

You can customize regexes to suit your use case, focusing on aspects such as:

  • Case insensitivity

  • Capturing groups

  • Trailing slash inclusion

  • File extension

For example, to index HTML pages on your company staging and dev websites regardless of letter case or of a trailing slash (/), you could use the following regex:

(?i)^.*(company-(dev|staging)).*html.?$

The regex matches the following URLs:

  • http://company-dev/important/document.html/

  • http://ComPanY-DeV/important/document.html/ (because of (?i), the case insensitive flag)

  • http://company-dev/important/document.html (with or without trailing / because of .?)

  • http://company-staging/important/document.html/ (because of dev|staging)

but doesn’t match the following ones:

  • http://besttech-dev/important/document.html/ (besttech isn’t included in the regex)

  • http://company-dev/important/document.pdf/ (only html files are included)

  • http://company-prod/important/document.html/ (prod isn’t included in the regex)
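You can also check a regex against sample URLs with a short script. The following sketch uses Python’s re module as a stand-in for the regex engine Coveo uses; the constructs in this pattern (an inline (?i) flag, alternation, and the ? quantifier) behave the same way in most common engines, but confirm edge cases with a tool such as Regex101.

```python
import re

# (?i) turns on case-insensitive matching for the whole pattern.
PATTERN = re.compile(r"(?i)^.*(company-(dev|staging)).*html.?$")

SHOULD_MATCH = [
    "http://company-dev/important/document.html/",
    "http://ComPanY-DeV/important/document.html/",
    "http://company-dev/important/document.html",
    "http://company-staging/important/document.html/",
]
SHOULD_NOT_MATCH = [
    "http://besttech-dev/important/document.html/",
    "http://company-dev/important/document.pdf/",
    "http://company-prod/important/document.html/",
]

for url in SHOULD_MATCH:
    assert PATTERN.search(url), f"expected a match: {url}"
for url in SHOULD_NOT_MATCH:
    assert not PATTERN.search(url), f"unexpected match: {url}"
print("All URLs behave as described.")
```

Note that .? accepts any single trailing character, not only a slash, so a URL ending in document.htmlx also matches. If you only want to tolerate a trailing slash, html/?$ is stricter.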

Example

The www.mycompany.com website you crawl contains versions in several languages and you want to have one source per language. For the US English source, if the source URL is www.mycompany.com/en-us/welcome.html, the inclusion filter would be www.mycompany.com/en-us/*.
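A wildcard filter can be approximated with Python’s fnmatch module, in which * matches any sequence of characters, including slashes. The following sketch, which assumes a hypothetical second source for French content, shows which per-language source would include each URL. Coveo’s wildcard matching may differ in details such as case sensitivity, so treat this as an approximation.

```python
from fnmatch import fnmatchcase

# Hypothetical one-source-per-language setup.
INCLUSION_FILTERS = {
    "US English source": "www.mycompany.com/en-us/*",
    "French source": "www.mycompany.com/fr-fr/*",
}


def sources_including(url):
    """Return the names of the sources whose inclusion filter matches url."""
    return [
        name
        for name, pattern in INCLUSION_FILTERS.items()
        if fnmatchcase(url, pattern)
    ]


print(sources_including("www.mycompany.com/en-us/welcome.html"))  # ['US English source']
print(sources_including("www.mycompany.com/fr-fr/bienvenue.html"))  # ['French source']
print(sources_including("www.mycompany.com/de-de/willkommen.html"))  # []
```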

Exclusion Filters

Your source ignores content from pages that match a URL expression specified in this section.

Note

The URL(s) that you specified for your source must not be part of the exclusion filter scope, otherwise the corresponding content won’t be indexed. For example, if you entered https://site:8080/sites/support as the source URL, and that URL matches one of your exclusion filter expressions, the corresponding content won’t be indexed. If a source URL redirects to another URL, both URLs must not be part of the exclusion filter scope.

  1. Enter a URL expression to apply as the exclusion filter.

    Notes
    • Exclusion filters also apply to shortened and redirected URLs.

    • By default, if pages are only accessible via excluded pages, those pages will also be excluded.

  2. Select whether the URL expression uses a Wildcard or a Regex (regular expression) pattern.

Examples
  • There’s no point in indexing the search page of your website, so you exclude its URL:

    www.mycompany.com/en-us/search.html

  • You don’t want to index ZIP files that are linked from website pages:

    www.mycompany.com/en-us/*.zip
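Taken together, the two filter types follow a simple rule: a page is indexed only if it matches at least one inclusion filter and no exclusion filter, and the same check applies to the source URLs themselves. The following sketch approximates Coveo’s wildcard syntax with Python’s fnmatch module (an assumption) and doesn’t model the exclusion of pages that are reachable only through excluded pages.

```python
from fnmatch import fnmatchcase


def is_indexed(url, inclusions, exclusions):
    """True if url matches at least one inclusion filter and no exclusion filter."""
    included = any(fnmatchcase(url, pattern) for pattern in inclusions)
    excluded = any(fnmatchcase(url, pattern) for pattern in exclusions)
    return included and not excluded


INCLUSIONS = ["www.mycompany.com/en-us/*"]
EXCLUSIONS = [
    "www.mycompany.com/en-us/search.html",
    "www.mycompany.com/en-us/*.zip",
]

for url in [
    "www.mycompany.com/en-us/welcome.html",          # indexed
    "www.mycompany.com/en-us/search.html",           # excluded explicitly
    "www.mycompany.com/en-us/downloads/manual.zip",  # excluded by *.zip
    "www.mycompany.com/fr-fr/welcome.html",          # outside the inclusion scope
]:
    print(url, is_indexed(url, INCLUSIONS, EXCLUSIONS))
```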

"Content Security" Tab

Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content Security.

Important

When using the Everyone content security option, see Safely Apply Content Filtering for information on how to ensure that your source content is safely filtered and only accessible by intended users.

"Access" Tab

In the Access tab, set whether each group and API key can view or edit the source configuration (see Resource Access):

  1. If available, in the left pane, click Groups or API Keys to select the appropriate list.

  2. In the Access Level column for groups or API keys with access to source content, select View or Edit.

Completion

  1. Finish adding or editing your source:

    • When you want to save your source configuration changes without starting a build/rebuild, such as when you know you’ll make other changes soon, click Add Source/Save.

      Note

      On the Sources page, you must click Launch build or Start required rebuild in the source Status column to add the source content or to make your changes effective, respectively.

    • When you’re done editing the source and want to make changes effective, click Add and Build Source/Save and Rebuild Source.

      Back on the Sources page, you can review the progress of your source addition or modification.

      Once the source is built or rebuilt, you can review its content in the Content Browser.

  2. Optionally, consider editing or adding mappings once your source is done building or rebuilding.



Additional Adjustments

  1. If your source retrieves your content through the Crawling Module and access to its content is secured with an Active Directory security identity provider, you must edit the JSON configuration of the security identity provider associated with this source to provide additional information. In the security identity provider JSON configuration, add the following code snippet, replacing <HOSTNAME> with the IP address or domain name of the Active Directory server to connect to.

     "Hostname": {
       "value": "<HOSTNAME>"
     }
  2. Moreover, if you checked the Enable TLS box in the Authentication section, ensure your security certificates are public and installed on the Crawling Module server.
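If you prefer to script the Hostname change, the following Python sketch adds that parameter to a security identity provider configuration exported as JSON. It assumes, without confirmation from this article, that parameters such as Hostname live in a top-level "parameters" object; check your actual configuration and adjust the key if needed. The provider name and host name are placeholders.

```python
import json


def add_hostname(provider_json, hostname):
    """Return the provider configuration with the Hostname parameter set.

    Assumption: parameters are stored under a top-level "parameters" object.
    """
    config = json.loads(provider_json)
    config.setdefault("parameters", {})["Hostname"] = {"value": hostname}
    return json.dumps(config, indent=2)


exported = '{"name": "My SharePoint AD provider", "parameters": {}}'
print(add_hostname(exported, "ad01.corp.example.com"))
```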

Safely Apply Content Filtering

The best way to ensure that your indexed content is seen only by the intended users is to enforce content security by selecting the Same users and groups as in your content system option. Should this option be unavailable, select Specific users and groups instead.

However, if you need to configure your source so that the indexed source content is accessible to Everyone, adhere to the leading practices described in the following sections (configure query filters, use condition-based query pipeline routing, and configure the search token) to ensure that your source content is safely filtered and only accessible by the appropriate users.

Following these leading practices results in a workflow whereby the user query is authenticated server side via a search token that enforces the search hub from which the query originates. Because the token is generated server side, users and client-side code can’t modify it. The query then passes through a specific query pipeline based on a search hub condition, and the query results are filtered using the pipeline filter rules.

Configure Query Filters

Filter rules allow you to enter hidden query expressions to be added to all queries going through a given query pipeline. They’re typically used to add a field-based expression to the constant query expression (cq).

Example

You apply the @objectType=="Solution" query filter to the pipeline to which the traffic of your public support portal is directed. As a result, the @objectType=="Solution" query expression is added to any query sent via this support portal.

Therefore, if a user types Speedbit watch wristband in the search box, the items returned are those that match these keywords and whose objectType value is Solution. Items matching these keywords but having a different objectType value aren’t returned in the user’s search results.
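The effect of the filter rule in this example can be modeled in a few lines of Python. This is a toy illustration only: the item structure and the apply_pipeline function are hypothetical, keyword matching is a plain substring test, and a real pipeline adds the expression to the constant query expression (cq) rather than filtering a list in application code.

```python
def apply_pipeline(query, items, required_fields):
    """Return titles of items that contain every query keyword and satisfy
    every field condition added by the pipeline filter rules."""
    keywords = query.lower().split()
    results = []
    for item in items:
        title = item["title"].lower()
        if not all(word in title for word in keywords):
            continue
        if all(item.get(name) == value for name, value in required_fields.items()):
            results.append(item["title"])
    return results


ITEMS = [
    {"title": "Speedbit watch wristband replacement", "objectType": "Solution"},
    {"title": "Speedbit watch wristband recall memo", "objectType": "InternalMemo"},
]

# Equivalent of the @objectType=="Solution" filter rule.
print(apply_pipeline("Speedbit watch wristband", ITEMS, {"objectType": "Solution"}))
# ['Speedbit watch wristband replacement']
```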

To learn how to configure query pipeline filter rules, see Manage Filter Rules.

Note

You can also enforce a filter expression directly in the search token.

Use Condition-Based Query Pipeline Routing

Condition-based routing is the recommended and most flexible query pipeline routing mechanism.

When using this routing mechanism, you ensure that search requests are routed to a specific query pipeline according to the search interface from which they originate, and that authentication is performed server side.

To accomplish this:

  1. Apply a condition to a query pipeline based on a search hub value, such as Search Hub is Community Search or Search Hub is Agent Panel. This condition ensures that all queries that originate from a specific search hub go through that query pipeline.

  2. Authenticate user queries via a search token that’s generated server side and that contains the search hub parameter that you specified in the query pipeline.
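The following Python sketch models this routing. The pipeline names are hypothetical, and falling back to a default pipeline when no condition matches is an assumption for the example. It shows why enforcing the search hub in the token matters: the hub from the server-generated token takes precedence over any value the client sends.

```python
# Hypothetical query pipelines, each with a "Search Hub is ..." condition.
PIPELINES_BY_SEARCH_HUB = {
    "Community Search": "community-pipeline",
    "Agent Panel": "agent-pipeline",
}
FALLBACK_PIPELINE = "default"


def route(token_claims, request_params):
    """Select a query pipeline from the search hub.

    A search hub enforced in the server-generated token wins over any value
    in the client request, so client-side code can't reroute the query.
    """
    search_hub = token_claims.get("searchHub") or request_params.get("searchHub")
    return PIPELINES_BY_SEARCH_HUB.get(search_hub, FALLBACK_PIPELINE)


token = {"searchHub": "Community Search"}
tampered_request = {"searchHub": "Agent Panel", "q": "Speedbit watch wristband"}
print(route(token, tampered_request))  # community-pipeline
```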

Configure the Search Token

When using query filters to secure content, the safest way to enforce content security is to authenticate user queries using a search token that’s generated server side. For instance, when using this approach, you can enforce a search hub value in the search token. This makes every authenticated request that originates from a component use the specified search hub, and therefore be routed to the proper query pipeline. Because this configuration is stored server side and encrypted in the search token, it can’t be modified by users or client-side code.

Implementing search token authentication requires you to add server-side logic to your website or application. Therefore, implementation details vary from one project to another.

The following procedure provides general guidelines:

Note

If you’re using the Coveo In-Product Experience (IPX) feature, see Implementing Advanced Search Token Authentication.

  1. Authenticate the user.

  2. Call a service exposed through Coveo to request a search token for the authenticated user.

  3. Specify the userIds for the search token, and enforce a searchHub parameter in the search token.

Note

You can specify other parameters in the search token, such as a query filter.

For more information and examples, see Search Token Authentication.
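As an illustration of steps 2 and 3, the following Python sketch builds, without sending, the server-side request for a search token. The endpoint URL, the userIds, searchHub, and filter fields, the Email Security Provider name, and the token field of the response reflect our understanding of the Coveo Search API and should be verified against Search Token Authentication; the host may differ for organizations hosted in other regions. The API key is a placeholder and must never be exposed client side.

```python
import json
import urllib.request

# Assumed endpoint; verify it (and your region's host) before use.
TOKEN_ENDPOINT = "https://platform.cloud.coveo.com/rest/search/v2/token"


def build_token_request(api_key, user_email, search_hub, query_filter=None):
    """Build (without sending) the server-side request for a search token."""
    payload = {
        "userIds": [{"name": user_email, "provider": "Email Security Provider"}],
        # Enforced: every query authenticated with this token uses this hub.
        "searchHub": search_hub,
    }
    if query_filter:
        payload["filter"] = query_filter  # optional hidden filter expression
    return urllib.request.Request(
        TOKEN_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_token_request("<API_KEY>", "asmith@example.com", "Community Search")
# On your server, after authenticating the user:
# with urllib.request.urlopen(request) as response:
#     token = json.load(response)["token"]
```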

What’s Next?


1. Not available in Microsoft SharePoint Foundation.
2. Not all web parts are available in Microsoft SharePoint Foundation 2010 (see Web Parts in SharePoint Foundation).