Add a SharePoint Server source
Members with the required privileges can include SharePoint on-premises content and make it searchable. In a Coveo-powered search interface, the source content is accessible to either everyone, some specific users and groups, or the same users and groups as in the content system.
Note
To retrieve SharePoint Online content, you must create a SharePoint Online source.
Leading practice
The number of items that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration. See About crawling speed for information on what can impact crawling speed, as well as possible solutions.
Source key characteristics
Features | Supported | Additional information | |
---|---|---|---|
SharePoint version |
2019, 2016, 2013, and Foundation 2013 |
||
Indexable content |
Sites, sub-sites, public user profiles[1], personal websites[1], lists, list items, list item attachments, document libraries, document sets, documents, web parts, and microblog posts and replies. |
||
Takes place every six hours by default. A rescan or rebuild is required to take account of deleted user profiles. |
|||
Content security options |
On-premises Active Directory permission systems aren’t supported with SharePoint Server sources of the On-Premises type. However, Active Directory is supported if you use the Crawling Module. |
||
Requirements
SharePoint account permissions
To include SharePoint content, create a dedicated SharePoint account that only the source uses. If you use a shared account instead, you must also update the source Password value each time the account password changes to prevent authentication errors (see Username and Password).
- Access your SharePoint tenant with an administrator account.
- On your SharePoint tenant:
  - Select or create a user account for the source to use when retrieving your SharePoint content. See the following table to identify the required type of user for the authentication enabled on your web application.
SharePoint environment | SharePoint web application enabled authentication | User type | User format |
---|---|---|---|
Classic | Windows | Windows account | domain\username or username@domain.com |
Claims | Windows | Windows account | domain\username or username@domain.com |
Claims | Okta | Okta SSO | username@domain.com |
  - Grant appropriate SharePoint permissions to the SharePoint account to ensure it has access to the content that you want to make searchable.
    The following table presents the minimal required permissions that the source account must have to perform specific actions.
Action to perform | Minimal required permission |
---|---|
Content and security indexing, source refresh, and site collection discovery | Full Read policy for each web application to make searchable (see Add the Full Read Policy to All SharePoint Tenant Web Applications). |
Personal site, public user profile, and social tags indexing | Read permission for the site collection of the source URL (see Add the SharePoint Website Read Permission), and Retrieve People Data for Search Crawlers permission to the User Profile Service Application (see Add the "Retrieve People Data for Search Crawlers" Permission). |
Note: When including personal sites or public user profiles, the account used as source credentials must not have a personal site on the SharePoint server being included to prevent failures when attempting to retrieve the list of personal sites.
Add a SharePoint Server source
A SharePoint Server source indexes on-premises (server) content. To retrieve cloud content instead, see Add a SharePoint Online source.
Before you start, ensure that your SharePoint instance meets the source requirements.
Follow the instructions below to add a SharePoint Server source that uses the desired content retrieval method.
- On the Sources (platform-ca | platform-eu | platform-au) page, click Add source.
- In the Add a source of content panel, click the On-premises or the Crawling Module tab, depending on your content retrieval context. With the latter, you must install the Crawling Module to make your source operational.
- Click the SharePoint Server tile.
- Configure your source.
Leading practice
It’s best to create or edit your source in your sandbox organization first. Once you’ve confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually. See About non-production organizations for more information and best practices regarding sandbox organizations.
"Configuration" tab
In the Add a SharePoint Server Source panel, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters.
General information
Source name
Enter a name for your source.
Leading practice
A source name can’t be modified once it’s saved, therefore be sure to use a short and descriptive name, using letters, numbers, hyphens ( |
URL
Enter one or more URLs corresponding to the desired site collection, lists, websites, and subsites to make searchable. Each URL must include the protocol and tenant name.
Note
Specifying a folder within a list isn’t supported.
- For a specific web application:
  https://site:8080/
- For a specific site collection:
  https://site:8080/sites/support
- For a specific website:
  https://site:8080/sites/support/subsite
- For a specific list:
  https://site:8080/sites/support/lists/contacts/allItems.aspx
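The requirement that each URL include the protocol can be checked before you paste values into the URL box. The following is a minimal sketch using only the Python standard library; the helper name `has_protocol_and_host` is hypothetical and isn't part of any Coveo tooling.

```python
from urllib.parse import urlsplit


def has_protocol_and_host(url: str) -> bool:
    """Return True when the URL has an http(s) scheme and a host part."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = [
    "https://site:8080/",
    "https://site:8080/sites/support",
    "site:8080/sites/support",  # protocol missing on purpose
]

for url in candidates:
    status = "ok" if has_protocol_and_host(url) else "missing protocol or host"
    print(f"{url}: {status}")
```

This only checks the shape of the URL. It doesn't confirm that the address points to a reachable web application, site collection, website, or list.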
Scope
In the dropdown menu, select the option for the content type matching the URLs you specified. By default, Web application is selected.
Available options are the following:
Value | Content to make searchable |
---|---|
Web application | All site collections of the specified web application. |
Site collection | All web sites of the specified site collection. |
Web and sub webs | Only the specified web site and its sub webs (also known as subsites). |
List | Only the specified list or document library. |
Paired Crawling Module
If your source is a Crawling Module source, and if you have more than one Crawling Module linked to this organization, select the one with which you want to pair your source. If you change the Crawling Module instance paired with your source, a successful rebuild is required for your change to apply.
Optical character recognition (OCR)
If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option.
The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick view. See Enable optical character recognition for details on this feature.
Project
If you have the Enterprise edition, use the Project selector to associate your source with one or multiple Coveo projects.
"Authentication" section
In the Authentication section, you must provide authentication information so that Coveo can access the content you want to make searchable.
In the dropdown menu, select the identity provider that you use to manage identities in your SharePoint site, and specify the corresponding options:
- Active Directory On-Premises (available when using the Crawling Module only)
- Windows (NTLM or Kerberos)
- Okta
Depending on the option you chose in the dropdown menu, you must specify some of the following options.
Username and Password
The username and password of a dedicated SharePoint administrator account that has access to the content to include, or if using Okta, the username of an Okta administrator account. See Source Credentials Leading Practices.
Okta realm
The SharePoint trusted identity provider realm provided in your Okta application configuration (see Using Okta as a Trusted Identity Provider).
urn:okta:sharepoint:exknuavz9hbOItwsS8e7
Okta sign in URL
The URL to which users should be redirected to authenticate with Okta (see Using Okta as a Trusted Identity Provider).
https://dev-782461.oktapreview.com/app/appname/sso/wsfed/passive
Active Directory username and Active Directory password
Enter credentials to grant Coveo access to your Active Directory.
Expand well-known SIDs
Select this option if you want the users that are included in your Active Directory well-known security identifiers to be granted access to the indexed content.
Expect an increase in the duration of the security identity provider refresh operation.
Supported well-known SIDs are: Everyone, Authenticated Users, Domain Admins, Domain Users, and Anonymous Users.
Leading practice
If your entire content is secured with |
Expand trusted domains
Select this option to have Coveo connect to your root domain to get the security identities of your other domains through the root domain.
If your environment contains more than one domain, you can establish a bidirectional or outbound cross-link relationship between the root domain of your Crawling Module server and your additional domains. When you do so, these domains trust your root domain, and Coveo can get their security identities through this root domain.
However, when enabling this option, you should expect an increase in the duration of the security identity provider refresh operation. Moreover, if a linked domain is unreachable, Coveo stops the security identity provider refresh operation.
Enable TLS
Select this option to use a TLS protocol to retrieve your security identities. If you do, we strongly recommend selecting StartTLS if you can. Since LDAPS is a much older protocol, you should only select this value if StartTLS is incompatible with your environment.
Email attributes
By default, Coveo retrieves the email address associated with each security identity from the mail attribute.
Optionally, you can specify additional or different attributes to check.
Should an attribute contain more than one value, Coveo uses the first one.
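The "first value wins" rule can be pictured with a short sketch. This is only an illustration under stated assumptions: the function name `pick_email` is hypothetical, the directory entry is modeled as a plain dictionary, and checking attributes in the order they're listed is an assumption rather than documented Coveo behavior.

```python
from typing import Mapping, Optional, Sequence


def pick_email(
    entry: Mapping[str, Sequence[str]],
    attributes: Sequence[str] = ("mail",),
) -> Optional[str]:
    """Return the first value of the first listed attribute that has values."""
    for name in attributes:
        values = entry.get(name) or []
        if values:
            # A multi-valued attribute contributes only its first value.
            return values[0]
    return None


# Hypothetical directory entry with a multi-valued mail attribute.
user = {"mail": ["first@example.com", "second@example.com"]}
print(pick_email(user))
```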
"Content to include" section
In the Content to Include section, consider changing the default settings to make additional content searchable.
User profiles
Check this box to index public SharePoint user profiles.
Note
This box is unavailable if you selected Okta as the identity provider.
Personal sites
When the Scope is Web application, check this box to include SharePoint personal sites.
"Crawling Settings" Section
In the Crawling Settings section, the Reindex all child items on UpdateShallow option allows you to reindex the children of an item that has been updated. This ensures that, if the metadata of the child items contains parent item information, this information stays up to date. However, checking this box significantly impacts the source refresh time. Therefore, if you don’t check it, we recommend scheduling source rescans so that the child items are eventually updated as well.
For example, you change your SharePoint site name. In the metadata of the child items, the site name appears under spsitename.
If the box isn’t checked, the children aren’t reindexed and keep an outdated spsitename until the next source rescan or rebuild.
However, if the box is checked, the children are updated along with the parent SharePoint site item.
"Filters" section
Note
You can view your URL expressions in the |
Inclusion filters
Your source indexes only the pages that match a URL expression specified in this section.
Note
The URL(s) that you specified for your source must be part of the inclusion filter scope, otherwise the corresponding content won’t be indexed.
For example, if you entered |
- Enter a URL expression to apply as the inclusion filter.
- Select whether the URL expression uses a Wildcard or a Regex (regular expression) pattern.
Leading practice
You can test your regexes to ensure that they match the desired URLs with tools such as Regex101. You can customize regexes to meet your use case focusing on aspects such as:
For example, you want to index HTML pages on your company staging and dev websites without taking the case sensitivity or the trailing slash (/) into account, so you use the following regex:
The regex matches the following URLs:
but doesn’t match the following ones:
|
The www.mycompany.com website you crawl contains versions in several languages and you want to have one source per language.
For the US English source, if the source URL is www.mycompany.com/en-us/welcome.html, the inclusion filter would be www.mycompany.com/en-us/*.
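To get a feel for how a wildcard inclusion filter narrows a source to one language, the following sketch tests URLs against the `www.mycompany.com/en-us/*` expression. It assumes that the wildcard behaves like a shell-style glob, which Python's `fnmatch` module approximates; Coveo's actual matching rules may differ. The `fr-ca` URL and the `is_included` helper are hypothetical.

```python
from fnmatch import fnmatchcase

INCLUSION_FILTER = "www.mycompany.com/en-us/*"


def is_included(url: str) -> bool:
    """Approximate a wildcard inclusion filter with a case-sensitive glob."""
    return fnmatchcase(url, INCLUSION_FILTER)


for url in (
    "www.mycompany.com/en-us/welcome.html",
    "www.mycompany.com/fr-ca/welcome.html",  # hypothetical other language
):
    print(url, "->", "indexed" if is_included(url) else "ignored")
```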
Exclusion filters
Your source ignores content from pages that match a URL expression specified in this section.
Note
The URL(s) that you specified for your source must not be part of the exclusion filter scope, otherwise the corresponding content won’t be indexed.
For example, if you entered |
- Enter a URL expression to apply as the exclusion filter.
  Notes:
  - Exclusion filters also apply to shortened and redirected URLs.
  - By default, if pages are only accessible via excluded pages, those pages will also be excluded.
  - Exclusion filters for SharePoint Online sources are not case sensitive when using a Regex (regular expression). For example, (company-(dev|staging)).*html.?$ will match http://ComPanY-dev/important/document.html without adding any additional symbols to account for case sensitivity. Exclusion filters are case sensitive when using Wildcard expressions.
- Select whether the URL expression uses a Wildcard or a Regex (regular expression) pattern.
Examples:
- There’s no point in indexing the search page of your website, so you exclude its URL: www.mycompany.com/en-us/search.html
- You don’t want to index ZIP files that are linked from website pages: www.mycompany.com/en-us/*.zip
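The case-sensitivity difference between Regex and Wildcard exclusion filters can be reproduced locally. The sketch below uses Python's `re` and `fnmatch` modules as stand-ins, so it only approximates how the filters are evaluated. The regex and the mixed-case URL come from the note above, while the ZIP file URLs are hypothetical.

```python
import re
from fnmatch import fnmatchcase

REGEX_FILTER = r"(company-(dev|staging)).*html.?$"
WILDCARD_FILTER = "www.mycompany.com/en-us/*.zip"

mixed_case_url = "http://ComPanY-dev/important/document.html"

# Regex filters are described as case insensitive, modeled here with IGNORECASE.
regex_excluded = re.search(REGEX_FILTER, mixed_case_url, flags=re.IGNORECASE) is not None
# The same pattern evaluated case sensitively doesn't match the mixed-case host.
regex_strict = re.search(REGEX_FILTER, mixed_case_url) is not None

# Wildcard filters are described as case sensitive, modeled with fnmatchcase.
zip_lower = fnmatchcase("www.mycompany.com/en-us/files/manual.zip", WILDCARD_FILTER)
zip_upper = fnmatchcase("www.mycompany.com/en-us/files/MANUAL.ZIP", WILDCARD_FILTER)

print(regex_excluded, regex_strict, zip_lower, zip_upper)
```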
"Content security" tab
Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.
When using the Everyone content security option, see Safely apply content filtering for information on how to ensure that your source content is safely filtered and only accessible by intended users.
"Access" tab
In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.
For example, when creating a new source, you could decide that members of Group A can edit its configuration while Group B can only view it.
See Custom access level for more information.
Completion
- Finish adding or editing your source:
  - When you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to make other changes soon, click Add source/Save.
  - When you’re done editing the source and want to make changes effective, click Add and build source/Save and rebuild source.
  Note: If you saved without building, on the Sources (platform-ca | platform-eu | platform-au) page, you must click Launch build or Start required rebuild in the source Status column to add the source content or to make your changes effective, respectively.
  Back on the Sources (platform-ca | platform-eu | platform-au) page, you can follow the progress of your source addition or modification.
  Once the source is built or rebuilt, you can review its content in the Content Browser.
- Once your source is done building or rebuilding, review the metadata Coveo is retrieving from your content.
  - On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > View and map metadata in the Action bar.
  - If you want to use metadata that isn’t currently indexed in a facet or result template, map it to a field.
    - Click the metadata and then, at the top right, click Add to Index.
    - In the Apply a mapping on all item types of a source panel, select the field you want to map the metadata to, or add a new field if none of the existing fields are appropriate.
      Notes:
      - For details on configuring a new field, see Add or edit a field.
      - For advanced mapping configurations, like applying a mapping to a specific item type, see Manage mappings.
    - Click Apply mapping.
  - Depending on the source type you use, you may be able to extract additional metadata from your content. You can then map that metadata to a field, just like you did for the default metadata.
More on custom metadata extraction and indexing
Some source types let you define rules to extract metadata beyond the default metadata Coveo discovers during the initial source build.
For example:
Source type | Custom metadata extraction methods |
---|---|
 | Define metadata key-value pairs in the addOrUpdate section of the PUT request payload used to upload push operations to an Amazon S3 file container. |
REST API and GraphQL API | In the JSON configuration (REST API or GraphQL API) of the source, define metadata names (REST API or GraphQL API) and specify where to locate the metadata values in the JSON API response Coveo receives. |
 | Add <CustomField> elements in the XML configuration. Each element defines a metadata name and the database field used to populate the metadata. |
 | Configure web scraping configurations that contain metadata extraction rules using CSS or XPath selectors. Extract metadata from JSON-LD <script> tags. |
 | Configure web scraping configurations that contain metadata extraction rules using CSS or XPath selectors. Extract JSON-LD <script> tag metadata. Extract <meta> tag content using the IndexHtmlMetadata JSON parameter. |
Some source types automatically map metadata to default or user created fields, making the mapping process unnecessary. Some source types automatically create mappings and fields for you when you configure metadata extraction.
See your source type documentation for more details.
  - When you’re done reviewing and mapping metadata, return to the Sources (platform-ca | platform-eu | platform-au) page.
  - To reindex your source with your new mappings, click Launch rebuild in the source Status column.
  - Once the source is rebuilt, you can review its content in the Content Browser.
- Additional adjustments
  - If your source retrieves your content through the Crawling Module and if access to its content is secured with an Active Directory security identity provider, you must edit the JSON configuration of the security identity provider associated with this source to provide additional information. In the security identity provider JSON configuration, add the following code snippet, in which you replace <HOSTNAME> with the IP address or domain name of the Active Directory server to connect to.
    "Hostname": { "value": "<HOSTNAME>" }
  - Moreover, if you checked the Enable TLS box in the Authentication section, ensure your security certificates are public and installed on the Crawling Module server.
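If you prepare the identity provider configuration with a script, the Hostname entry can be added programmatically. The sketch below is an assumption-heavy illustration: the `with_hostname` helper is hypothetical, `ad.example.com` is a placeholder, and where the entry belongs inside the full security identity provider JSON isn't covered here, so the function works on whichever parameter object you pass to it.

```python
import json


def with_hostname(parameters: dict, hostname: str) -> dict:
    """Return a copy of the parameter object with the Hostname entry set."""
    updated = dict(parameters)
    updated["Hostname"] = {"value": hostname}
    return updated


# Placeholder host: replace with your Active Directory server IP address or domain name.
snippet = with_hostname({}, "ad.example.com")
print(json.dumps(snippet, indent=2))
```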
Safely apply content filtering
The best way to ensure that your indexed content is seen only by the intended users is to enforce content security by selecting the Same users and groups as in your content system option. Should this option be unavailable, select Specific users and groups instead.
However, if you need to configure your source so that the indexed source content is accessible to Everyone, you should adhere to the following leading practices. These practices ensure that your source content is safely filtered and only accessible by the appropriate users:
- Configure query filters: Apply filter rules on a query pipeline to filter the source content that appears in search results when a query goes through that pipeline.
- Use condition-based query pipeline routing: Apply a condition on a query pipeline to make sure that every query originating from a specific search hub is routed to the right query pipeline.
- Configure the search token: Authenticate user queries via a search token that’s generated server side and that enforces a specific search hub.
Following the above leading practices results in a workflow whereby the user query is authenticated server side via a search token that enforces the search hub from which the query originates. Therefore, the query can’t be modified by users or client-side code. The query then passes through a specific query pipeline based on a search hub condition, and the query results are filtered using the filter rules.
Configure query filters
Filter rules allow you to enter hidden query expressions to be added to all queries going through a given query pipeline.
They’re typically used to add a field-based expression to the constant query expression (cq).
For example, you apply the @objectType=="Solution" query filter to the pipeline to which the traffic of your public support portal is directed. As a result, the @objectType=="Solution" query expression is added to any query sent via this support portal.
Therefore, if a user types Speedbit watch wristband in the search box, the items returned are those that match these keywords and whose objectType has the Solution value. Items matching these keywords but having a different objectType value aren’t returned in the user’s search results.
To learn how to configure query pipeline filter rules, see Manage filter rules.
Note
You can also enforce a filter expression directly in the search token.
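The effect of the filter rule in the example above can be simulated with a few in-memory items. This is a toy model, not how the Coveo index evaluates queries: the `Item` class, the `search` function, the sample titles, and the `Case` object type are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    title: str
    object_type: str


def search(items: List[Item], keywords: str, required_object_type: str) -> List[Item]:
    """Keep items that contain every keyword AND pass the hidden filter."""
    words = keywords.lower().split()
    return [
        item
        for item in items
        if item.object_type == required_object_type
        and all(word in item.title.lower() for word in words)
    ]


catalog = [
    Item("Speedbit watch wristband replacement", "Solution"),
    Item("Speedbit watch wristband internal notes", "Case"),  # hypothetical type
]

results = search(catalog, "Speedbit watch wristband", "Solution")
print([item.title for item in results])
```

Both items match the keywords, but only the one whose object type is Solution comes back, which mirrors what the user sees in the support portal.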
Use condition-based query pipeline routing
Condition-based routing is the recommended and most flexible query pipeline routing mechanism.
When using this routing mechanism, you ensure that search requests are routed to a specific query pipeline according to the search interface from which they originate, and the authentication is done server side.
To accomplish this:
- Apply a condition to a query pipeline based on a search hub value, such as Search Hub is Community Search or Search Hub is Agent Panel. This condition ensures that all queries that originate from a specific search hub go through that query pipeline.
- Authenticate user queries via a search token that’s generated server side and that contains the search hub parameter that you specified in the query pipeline.
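Conceptually, condition-based routing is a lookup from the enforced search hub to a pipeline. The sketch below models that idea only. The pipeline names and the `route` helper are hypothetical, and the fallback to a default pipeline is an assumption made for the sake of the example.

```python
# Hypothetical mapping of "Search Hub is ..." conditions to pipeline names.
PIPELINE_BY_SEARCH_HUB = {
    "Community Search": "community-pipeline",
    "Agent Panel": "agent-pipeline",
}


def route(search_hub: str, default_pipeline: str = "default") -> str:
    """Pick the pipeline whose search hub condition matches the request."""
    return PIPELINE_BY_SEARCH_HUB.get(search_hub, default_pipeline)


for hub in ("Community Search", "Agent Panel", "Unknown Hub"):
    print(hub, "->", route(hub))
```

Because the search hub comes from the server-generated search token rather than from the browser, a user can't steer a query to a different pipeline by editing the request.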
Configure the search token
When using query filters to secure content, the safest way to enforce content security is to authenticate user queries using a search token that’s generated server side. For instance, when using this approach, you can enforce a search hub value in the search token. This makes every authenticated request that originates from a component use the specified search hub, and therefore be routed to the proper query pipeline. Because this configuration is stored server side and encrypted in the search token, it can’t be modified by users or client-side code.
Implementing search token authentication requires you to add server side logic to your web site or application. Therefore, the actual implementation details will vary from one project to another.
The following procedure provides general guidelines:
Note
If you’re using the Coveo In-Product Experience (IPX) feature, see Implement advanced search token authentication.
- Authenticate the user.
- Call a service exposed through Coveo to request a search token for the authenticated user.
- Specify the userIDs for the search token, and enforce a searchHub parameter in the search token.
Note
You can specify other parameters in the search token, such as a query |
For more information and examples, see Search token authentication.
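The guidelines above can be sketched as server-side code that prepares a token request. Treat every specific here as an assumption to verify against the Search token authentication article: the endpoint URL, the `userIds` and `searchHub` field names, the `Email Security Provider` value, and the `build_token_request` helper itself. The sketch only builds the request and never sends it.

```python
import json
from urllib.request import Request

# Assumed endpoint; check the Search token authentication article for your region.
TOKEN_ENDPOINT = "https://platform.cloud.coveo.com/rest/search/v2/token"


def build_token_request(api_key: str, user_email: str, search_hub: str) -> Request:
    """Prepare (but don't send) a server-side request for a search token."""
    payload = {
        # Identity of the already authenticated user (field names assumed).
        "userIds": [
            {"name": user_email, "provider": "Email Security Provider", "type": "User"}
        ],
        # Enforced search hub, used by condition-based pipeline routing.
        "searchHub": search_hub,
    }
    return Request(
        TOKEN_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_token_request("<API_KEY>", "jane@example.com", "Community Search")
print(request.get_method(), request.full_url)
```

Keeping this logic on the server means the API key never reaches the browser, and the client only receives the resulting token.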
Limitation
When indexing content with the Crawling Module, ensure not to change space character encoding in an item’s URI, as Coveo uses URIs to distinguish items.
For example, an item whose URI would change from example.com/my first item to example.com/my%20first%20item wouldn’t be recognized as the same by Coveo. As a result, it would be indexed twice, and the older version wouldn’t be deleted.
Item URIs are displayed in the Content Browser (platform-ca | platform-eu | platform-au).
We recommend you check where these URIs come from before making changes that affect space character encoding.
Depending on your source type, the URI may be an item’s URL, or it may be built out of pieces of metadata by your source mapping rules.
For example, your item URIs may consist of the main site URL plus the item filename, due to a mapping rule such as example.com/%[filename]. In such a case, changing space encoding in the item filename could impact the URI.
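A quick way to spot URIs that differ only by space encoding is to compare them after percent-decoding. The sketch below uses the standard library for that check. The `same_after_decoding` helper is hypothetical and is meant for auditing a planned change, not for altering how Coveo identifies items.

```python
from urllib.parse import unquote


def same_after_decoding(uri_a: str, uri_b: str) -> bool:
    """Return True when two URIs are identical once percent-decoded."""
    return unquote(uri_a) == unquote(uri_b)


old_uri = "example.com/my first item"
new_uri = "example.com/my%20first%20item"

# Compared as raw strings, the URIs differ, so they count as two separate items.
print(old_uri == new_uri)
# Decoded, they point to the same item, which flags a likely duplicate.
print(same_after_decoding(old_uri, new_uri))
```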
Required privileges
You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.
Note
The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.
Actions | Service | Domain | Required access level |
---|---|---|---|
View sources, view source update schedules, and subscribe to source notifications | Content | Fields | View |
 | Content | Sources | View |
 | Organization | Organization | View |
Edit sources, edit source update schedules, and view the View and map metadata subpage | Content | Fields | Edit |
 | Content | Sources | Edit |
 | Content | Source metadata | View |
 | Organization | Organization | View |
What’s next?
- If you selected the Same users and groups as in your content system content security option, you might want to read up on how Coveo manages security identities and item permissions to replicate the permission models of the original repository.
- If you’re using the Crawling Module to retrieve your content, consider subscribing to deactivation notifications to receive an alert when a Crawling Module component becomes obsolete and stops the content crawling process.