Performance leading practices
SharePoint Online is a complex system and tenants typically hold large volumes of content. Capturing only relevant items, limiting indexing times, and maintaining source content freshness can be challenging.
The goal of this article is to present SharePoint Online connector scoping features and other indexing strategies which, when combined, can significantly improve indexing performance.
Scope the content to index
You should only index items that you deem necessary for your search interface users. Excluding unimportant content from being indexed improves search relevance and reduces indexing time.
There are several ways you can configure the SharePoint Online connector to exclude irrelevant content.
Target specific content in your tenant
In a SharePoint Online source, in the Content to include section, the All sites option is selected by default. This configuration crawls every site available in your SharePoint Online tenant. You should instead be selective as to the content you want to index.
For example, select the Specific items option and specify starting URLs you want to crawl. Only SharePoint Online items whose URLs begin with the specified starting URLs will be indexed.

Note
If you need to exclude URLs under a given starting URL (e.g., to exclude specific subsites), you can do so using filters.
Avoid indexing folders and unapproved items
When adding a SharePoint Online source, in the Content to include section, the Folders and Unapproved items additional content options aren’t selected by default. This is the recommended configuration.
SharePoint Online sources created before February 18, 2020 had a different default configuration. If you have old SharePoint Online sources, we recommend you deselect the Folders and Unapproved items options, if applicable.
In SharePoint Online, a folder is strictly a container with some metadata. The following are considered folders:
The SharePoint Online connector crawls items inside a folder even if the Folders option isn’t selected. For example, a discussion won’t be indexed, but all its replies will.
Use inclusion and exclusion filters
In a SharePoint Online source, in the Filters section, add Inclusion filters and/or Exclusion filters to specify which pages you want to index on a URL basis. Inclusion and exclusion filters can be useful, for example, to prevent indexing irrelevant pages you’re redirected to.
Filters support regex and wildcard expressions.
In the Content to include section, you specified the following starting URL: https://sometenant.sharepoint.com/SiteA/. SiteA contains subsites as follows:
- https://sometenant.sharepoint.com/SiteA/SubSiteAA/
- https://sometenant.sharepoint.com/SiteA/SubSiteAA/SubSubSiteAAA/
You want to index the contents of SiteA and SubSubSiteAAA, but not those of SubSiteAA.
To achieve this, you could use the following filters:

For details on how inclusion and exclusion filters are applied, see the Filters section.
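To make the SiteA scenario concrete, here is a minimal Python sketch of how one inclusion and one exclusion expression could separate the three sites. The two regular expressions and the rule "keep a URL that matches an inclusion filter and no exclusion filter" are assumptions made for illustration; they aren't the connector's documented filter syntax or evaluation order, so check the Filters section before reusing them.

```python
import re

# Hypothetical filter expressions for the SiteA example (Python regex syntax).
INCLUSION_FILTERS = [
    r"^https://sometenant\.sharepoint\.com/SiteA/.*",
]
EXCLUSION_FILTERS = [
    # Anything under SubSiteAA, except what sits under SubSubSiteAAA.
    r"^https://sometenant\.sharepoint\.com/SiteA/SubSiteAA/(?!SubSubSiteAAA/).*",
]


def is_indexed(url: str) -> bool:
    """Assumed rule: keep a URL matching an inclusion filter and no exclusion filter."""
    included = any(re.match(pattern, url) for pattern in INCLUSION_FILTERS)
    excluded = any(re.match(pattern, url) for pattern in EXCLUSION_FILTERS)
    return included and not excluded
```

Testing candidate expressions against a handful of real URLs this way, before saving them in the source, can help catch a filter that is too broad or too narrow.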
Ignore non-relevant template types
In a SharePoint Online source, in the Filters section, the List template types to ignore option contains many SharePoint Online list template types by default. Ignore as many template types as possible if they’re not relevant to the search experience.
Ignore specific file types or index them by reference
A SharePoint Online tenant typically contains many file types. If there’s no value in indexing files of a given type, you should ignore items of this type. If you’re not interested in the body of items of a given type, but you’d like to index some metadata on these items, then you should index by reference.
Here are only a few examples of file types that you should consider ignoring:
- Operating system files (e.g., .HSancillary)
- Configuration, log, and other various text files (e.g., .ini, .config, .xml, .json, .log, .bak, .txt)
- Code files (e.g., .py, .css, .sql, .ts, .cs, .java)
You configure file type-specific indexing actions in the Edit a Source JSON Configuration panel.
By default, the action and actionOnError settings for .txt and .log files are set to Retrieve and Reference, respectively.
To ignore .txt, .log, and .bak files, you could edit the related section of the JSON configuration as follows:

For more details on ignoring file formats and indexing by reference, see Customize the indexing process.
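The kind of change involved can be sketched in Python on a simplified model of the settings. Only the action and actionOnError names and the Retrieve and Reference values come from this article; the dictionary layout keyed by extension and the "Ignore" value are assumptions, so rely on the actual structure shown in your own source JSON.

```python
import copy


def ignore_extensions(settings_by_extension: dict, extensions: list) -> dict:
    """Return a copy in which each listed extension has both settings set to "Ignore"."""
    updated = copy.deepcopy(settings_by_extension)
    for extension in extensions:
        entry = updated.setdefault(extension, {})
        entry["action"] = "Ignore"         # assumed value name
        entry["actionOnError"] = "Ignore"  # assumed value name
    return updated


# Hypothetical layout: one entry per file extension.
defaults = {
    ".txt": {"action": "Retrieve", "actionOnError": "Reference"},
    ".log": {"action": "Retrieve", "actionOnError": "Reference"},
}
updated = ignore_extensions(defaults, [".txt", ".log", ".bak"])
```

The helper works on a copy, so the original settings stay available for comparison or rollback.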
Postpone re-indexing changes to list folder content
When performing a refresh, the SharePoint Online connector relies on the value of the RecrawlListFolderContentOnChange JSON configuration parameter to determine whether to recrawl list folders when changes are detected.
Recrawling list folder content based on changes can lead to long indexing times when changes occur frequently.
Your source can potentially re-index list folders continually.
SharePoint Online sources created since September 13, 2021, aren’t set by default to recrawl a list folder upon detected changes. SharePoint Online sources created before September 13, 2021 were set by default to recrawl changed list folders.
To make sure your scheduled refreshes don’t recrawl changed list folders, validate that the RecrawlListFolderContentOnChange parameter in your source JSON configuration is set to false.
When set to false, your source indexes changes in list folders during the following rescan or rebuild.
To set the RecrawlListFolderContentOnChange parameter value to false:
- On the Sources (platform-ca | platform-eu | platform-au) page, click your SharePoint Online source, and then click More > Edit JSON.
- In the Edit a Source JSON Configuration panel, locate the RecrawlListFolderContentOnChange parameter, and set its value to false.
- Click Save to apply your change for subsequent refresh operations.
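If you keep an exported copy of your source JSON, a short script can check the parameter before you open the panel. The sketch below assumes the parameter lives under a top-level "parameters" object and stores its setting in a "value" field as a string; both are assumptions about the layout, so compare them with your own configuration.

```python
import json

PARAM = "RecrawlListFolderContentOnChange"


def recrawl_disabled(config: dict) -> bool:
    """True when the parameter is present and its value reads as false."""
    value = config.get("parameters", {}).get(PARAM, {}).get("value")
    return str(value).lower() == "false"


def disable_recrawl(config: dict) -> dict:
    """Set the parameter value to "false" in place (assumed string form)."""
    entry = config.setdefault("parameters", {}).setdefault(PARAM, {})
    entry["value"] = "false"
    return config


# Hypothetical excerpt of an older source configuration.
exported = json.loads('{"parameters": {"%s": {"value": "true"}}}' % PARAM)
was_disabled = recrawl_disabled(exported)
now_disabled = recrawl_disabled(disable_recrawl(exported))
```

Here was_disabled is False and now_disabled is True, which mirrors the manual edit described in the steps above.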
A word on indexing pipeline extensions
Indexing pipeline extensions (IPEs) are a powerful way to customize the indexing process. However, whereas your SharePoint Online connector configurations are applied in the crawling stage of the Coveo indexing pipeline, IPEs are applied later, in the document processing manager (DPM). Using an IPE to reject items therefore doesn’t reduce the number of items crawled. It only adds to the total time required before source items are indexed.
Only use IPEs as a last resort, when filtering isn’t possible natively in the SharePoint Online connector.
Additional indexing strategies
The SharePoint Online connector uses SharePoint Online APIs to retrieve your tenant content. SharePoint Online throttles applications to control usage of its APIs.
The following leading practices are meant to minimize the impacts of throttling on connector update times.
Use certificates to avoid throttling
Using a certificate to access SharePoint Online APIs rather than a user account increases the call rate limits before throttling is applied. That’s why we recommend using certificate authentication over delegated authentication when adding or editing a SharePoint Online source.
Split your sources and optimize update schedules
SharePoint Online limits API calls per minute and per day. Minimizing connector update times involves taking into account these time window limits. Splitting big SharePoint Online sources into several smaller ones allows you to spread scheduled source updates over time, reducing the risk of throttling and the duration of throttling episodes.
You can split sources into smaller ones by:
- Alphabetical order of site names.
- A given maximum number of specific sites per source.
- Content type (e.g., user profiles, documentation, teams).
Make sure you take into account content that will be added over time in your SharePoint Online tenant (e.g., new sites). For example, you can achieve this by setting up "catch remaining" sources. If you need to create a new source to capture newly added content in your tenant, consider duplicating an existing source that indexes analogous content. This will make configuring the new source simpler and faster.
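The first two splitting criteria above can be combined when planning sources. The following Python sketch sorts site URLs alphabetically and groups them into batches with a maximum number of sites each; the site URLs and the limit of two sites per source are made up for illustration.

```python
def split_sites(site_urls: list, max_sites_per_source: int) -> list:
    """Sort sites case-insensitively, then cut the list into fixed-size groups."""
    if max_sites_per_source < 1:
        raise ValueError("max_sites_per_source must be at least 1")
    ordered = sorted(site_urls, key=str.lower)
    return [
        ordered[start:start + max_sites_per_source]
        for start in range(0, len(ordered), max_sites_per_source)
    ]


# Hypothetical site URLs in a tenant.
sites = [
    "https://sometenant.sharepoint.com/sites/Sales",
    "https://sometenant.sharepoint.com/sites/Engineering",
    "https://sometenant.sharepoint.com/sites/hr",
    "https://sometenant.sharepoint.com/sites/Legal",
    "https://sometenant.sharepoint.com/sites/Docs",
]
groups = split_sites(sites, 2)
```

Each resulting group can serve as the list of starting URLs for one source. Sites created later won't belong to any group, which is where a "catch remaining" source comes in.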
The SharePoint Online connector supports refresh and rescan source updates. You should understand the differences between both source update types and set scheduled update frequencies that make sense in your context. You should also have a global scheduling strategy to prevent overlapping source updates or having multiple lengthy updates running on the same day of the week.
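As a planning aid for such a strategy, the sketch below spreads weekly rescans over the days of the week in round-robin fashion so that lengthy updates don't pile up on the same day. The source names are hypothetical, and the output is only a plan on paper: you would still set the actual schedule on each source yourself.

```python
WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def stagger_rescans(source_names: list) -> dict:
    """Assign each source a weekday in turn, wrapping around after Sunday."""
    plan = {day: [] for day in WEEKDAYS}
    for index, name in enumerate(source_names):
        plan[WEEKDAYS[index % len(WEEKDAYS)]].append(name)
    return plan


# Hypothetical source names.
plan = stagger_rescans(["SPO A-F", "SPO G-M", "SPO N-Z", "SPO catch remaining"])
```

With more than seven sources, some days necessarily hold more than one rescan. In that case, ordering the input so that the longest-running sources land on different days limits overlap.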