---
title: Performance leading practices
slug: n8ub0545
canonical_url: https://docs.coveo.com/en/n8ub0545/
collection: index-content
source_format: adoc
---

# Performance leading practices

SharePoint Online is a complex system and tenants typically hold large volumes of content. Capturing only relevant [items](https://docs.coveo.com/en/210/), limiting [indexing](https://docs.coveo.com/en/204/) times, and maintaining [source](https://docs.coveo.com/en/246/) content freshness can be challenging.

The goal of this article is to present SharePoint Online [connector](https://docs.coveo.com/en/2734/) [scoping features](#scope-the-content-to-index) and [other indexing strategies](#additional-indexing-strategies) which, when combined, can significantly improve indexing performance.

## Scope the content to index

You should only index items that you deem necessary for your [search interface](https://docs.coveo.com/en/2741/) users. Excluding unimportant content from being [indexed](https://docs.coveo.com/en/204/) improves search relevance and reduces indexing time.

There are several ways you can configure the SharePoint Online connector to exclude irrelevant content.

### Use the "Content to index" subtab options

On the **Add/Edit a SharePoint Online Source** page, in the [**Content to index**](https://docs.coveo.com/en/1739#content-to-index-subtab) subtab, the **All sites** option may be selected. This configuration crawls every site available in your SharePoint Online tenant.

You should instead be selective as to the content you want to index. For example, select the **Specific URLs** option and specify starting URLs you want to crawl. Only SharePoint Online items whose URLs begin with the specified starting URLs will be indexed.

![Excluding subsite content](https://docs.coveo.com/en/assets/images/index-content/spo-specific-items-example.png)

To exclude specific subsites under a given starting URL, [use URL exclusions and inclusions](#use-exclusion-and-inclusion-rules).
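
The "begins with" behavior of the **Specific URLs** option can be pictured with a minimal sketch. It assumes a plain, case-sensitive prefix comparison, which may differ from what the connector actually does. The `is_in_scope` function and the `STARTING_URLS` value are hypothetical names used for illustration only and aren't part of any Coveo API.

```python
# Hypothetical starting URL built on a placeholder tenant name.
STARTING_URLS = ["https://sometenant.sharepoint.com/SiteA/"]


def is_in_scope(item_url: str, starting_urls: list[str]) -> bool:
    """Return True when item_url begins with at least one starting URL.

    Assumption: a simple case-sensitive prefix test stands in for the
    connector's real matching logic.
    """
    return item_url.startswith(tuple(starting_urls))


if __name__ == "__main__":
    for url in (
        "https://sometenant.sharepoint.com/SiteA/Docs/report.docx",
        "https://sometenant.sharepoint.com/SiteB/Docs/report.docx",
    ):
        print(url, "->", "indexed" if is_in_scope(url, STARTING_URLS) else "skipped")
```

Under this assumption, an item under `SiteA` is kept while an item under `SiteB` is skipped, which is why narrow starting URLs reduce the amount of content crawled.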

### Avoid indexing folders and unapproved items

On the **Add/Edit a SharePoint Online Source** page, in the [**Content to index**](https://docs.coveo.com/en/1739#content-to-index-subtab) subtab, the additional content **Folders** and **Unapproved items** options aren't selected by default. This is the recommended configuration.

SharePoint Online sources [created before February 18, 2020](https://docs.coveo.com/en/2687/) had a different default configuration. If you have old SharePoint Online sources, [deselect](https://docs.coveo.com/en/1739#add-a-sharepoint-online-source) the **Folders** and **Unapproved items** options, if applicable.

> **Important**
>
> In SharePoint Online, a folder is strictly a container with some metadata.
> The following are considered folders:
>
> * A discussion (that is, a thread) in a discussions list.
>
> * A notebook containing a OneNote note.
>
> The SharePoint Online connector crawls items inside a folder even if the **Folders** option isn't selected.
> For example, a discussion will not be indexed but all its replies will.

### Only index items modified within a rolling period

On the **Add/Edit a SharePoint Online Source** page, in the [**Content to index**](https://docs.coveo.com/en/1739#content-to-index-subtab) subtab, you can configure a rolling period. Only items created or modified within the specified rolling period are [indexed](https://docs.coveo.com/en/204/).

More precisely, the feature has the following effects during [source](https://docs.coveo.com/en/246/) updates, depending on the update type:

* [Rebuild](https://docs.coveo.com/en/2712/): Your source is emptied, then only items modified within the rolling window are added to your source.
* [Rescan](https://docs.coveo.com/en/2711/): The connector crawls the entire content your source targets, and:
  * New items are added to your source.
  * Items that have been modified since the last source update are updated in your source.
  * Items that haven't been modified within the rolling window or that have been deleted in SharePoint are removed from your source.
* [Refresh](https://docs.coveo.com/en/2710/): The connector crawls items that have been modified/added since the last source update and items whose last modified date is outside the rolling window, and:
  * Items that have been modified/added since the last source update are updated/added in your source.
  * Items that haven't been modified within the rolling window are removed from your source.

See [Exclude older content](https://docs.coveo.com/en/1739#exclude-older-content) for configuration details.

### Exclude items on a metadata value basis

On the **Add/Edit a SharePoint Online Source** page, in the [**Content to index**](https://docs.coveo.com/en/1739#content-to-index-subtab) subtab, you can define a condition based on metadata values to prevent items from being crawled.

The condition may be a single expression or a combination of expressions. The following operators are supported: `AND`, `OR`, `Exists`, `NOT`, `==`, `>`, and `<`. Parentheses are also supported to specify operation order.

See [Exclude items on a metadata value basis](https://docs.coveo.com/en/1739#exclude-items-on-a-metadata-value-basis) for configuration details and examples.

### Use exclusion and inclusion rules

On the **Add/Edit a SharePoint Online Source** page, in the [**Content to index**](https://docs.coveo.com/en/1739#content-to-index-subtab) subtab, add exclusion and/or inclusion rules to specify the URLs you want to crawl.

Exclusion and inclusion rules can be useful, for example, to prevent [indexing](https://docs.coveo.com/en/204/) irrelevant pages you're redirected to. Exclusion and inclusion rules support regex and wildcard expressions.

**Example**

In the **Content to index** subtab, you selected the **Specific URLs** option and specified the following URL: `https://sometenant.sharepoint.com/SiteA/`.
`SiteA` contains subsites as follows:

* `https://sometenant.sharepoint.com/SiteA/SubSiteAA/`
* `https://sometenant.sharepoint.com/SiteA/SubSiteAA/SubSubSiteAAA/`

You want to index the contents of `SiteA` and `SubSubSiteAAA`, but not those of `SubSiteAA`. To achieve this, you could use the following rules:

![Excluding subsite content](https://docs.coveo.com/en/assets/images/index-content/spo-exclusion-filters-example.png)

See [Exclusions and inclusions](https://docs.coveo.com/en/1739#exclusions-and-inclusions) for details on how these rules are applied.

### Exclude template types

On the **Add/Edit a SharePoint Online Source** page, in the [**Content to index**](https://docs.coveo.com/en/1739#content-to-index-subtab) subtab, the [**Exclude template types**](https://docs.coveo.com/en/1739#exclude-template-types) option contains many [SharePoint Online list template types](https://learn.microsoft.com/en-us/previous-versions/office/sharepoint-server/ee541191(v=office.15)) by default. Exclude as many template types as possible if they're not relevant to the search experience.

### Index only tagged sites

The SharePoint Online source features an option to index only sites that are "tagged" the same way. This option is useful when you want to index only a subset of your tenant sites.

1. If you haven't already done so, [tag your sites with the `CoveoSiteFilter` managed property](https://docs.coveo.com/en/naoe0288/).

2. On the source configuration [**Content to index**](https://docs.coveo.com/en/1739#content-to-index-subtab) subtab, select [**SharePoint Online content**](https://docs.coveo.com/en/1739#sharepoint-online-content), and then select the **All sites** option.

3. On the [**Sources**](https://platform.cloud.coveo.com/admin/#/orgid/content/sources/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/sources/)) page, click your SharePoint Online source, and then click **More** > **Edit configuration with JSON**.

4. In the **Edit configuration with JSON** panel, locate the `OnlyIndexSitesWithCoveoProperty` parameter.

5. Set the `value` to what you tagged your sites as using the `CoveoSiteFilter` property.

   **Example**

   You tagged all your Canadian sites with the `CoveoSiteFilter` managed property value `Canada`. Therefore, you would set `value` to `Canada`.

   ![Only index sites with the `CoveoSiteFilter` managed property value `Canada`](https://docs.coveo.com/en/assets/images/index-content/spo-only-index-sites-with-coveo-property.png)

6. Click **Save** to apply your change for subsequent update operations.

### Ignore items of a specific file type or index only their metadata

A SharePoint Online tenant typically contains items of many types. If there's no value in indexing files of a given type, you should ignore items of this type. If you're not interested in the [body](https://docs.coveo.com/en/3313/) of items of a given type, you can index only their [metadata](https://docs.coveo.com/en/218/).

Here are a few examples of file types that you should consider ignoring:

* Operating system files (for example, `.HSancillary`)
* Configuration, log, and other various text files (for example, `.ini`, `.config`, `.xml`, `.json`, `.log`, `.bak`, `.txt`)
* Code files (for example, `.py`, `.css`, `.sql`, `.ts`, `.cs`, `.java`)

You configure file type-specific indexing actions in the [**File types**](https://docs.coveo.com/en/1739#file-types) section, on the source configuration **Items** tab.

**Example**

By default, `.txt` and `.log` files are indexed as follows:

* **Action by default**: `Index content and metadata`
* **Action on error**: `Index metadata`

To ignore `.txt` and `.log` files, set their actions to `Ignore item`:

![File types settings showing .txt and .log excluded | Coveo Administration Console](https://docs.coveo.com/en/assets/images/index-content/spo-excluding-extensions.png)

For more details on ignoring items of a specific file type or indexing only their metadata, see [File type handling](https://docs.coveo.com/en/l3qg9275/).

### Postpone reindexing changes to list folder content

When performing a [refresh](https://docs.coveo.com/en/2039#refresh), the SharePoint Online connector uses the value of the [`RecrawlListFolderContentOnChange`](https://docs.coveo.com/en/o2f80147#recrawllistfoldercontentonchange-boolean) JSON configuration parameter to determine whether to recrawl list folder content when changes were made at the folder level.

Recrawling list folder content when folder changes occur frequently can lead to long [indexing](https://docs.coveo.com/en/204/) times and increased API call usage. At worst, the source may reindex list folders continually.

SharePoint Online sources created since September 13, 2021, don't recrawl list folder content upon detected changes. SharePoint Online sources created before September 13, 2021, were set by default to recrawl changed list folders.

To make sure your scheduled [refreshes](https://docs.coveo.com/en/2710/) don't recrawl changed list folders, validate that the `RecrawlListFolderContentOnChange` parameter in your source JSON configuration is set to `false`. When set to `false`, your source [indexes](https://docs.coveo.com/en/204/) changes in list folders during the following [rescan](https://docs.coveo.com/en/2039#rescan) or [rebuild](https://docs.coveo.com/en/2039#rebuild).

To set the `RecrawlListFolderContentOnChange` parameter value to `false`:

1. On the [**Sources**](https://platform.cloud.coveo.com/admin/#/orgid/content/sources/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/sources/)) page, click your SharePoint Online source, and then click **More** > **Edit configuration with JSON**.

2. In the **Edit configuration with JSON** panel, locate the `RecrawlListFolderContentOnChange` parameter, and set its `value` to `false`.

3. Click **Save** to apply your change for later refresh operations.

### A word on indexing pipeline extensions

[Indexing pipeline extensions (IPEs)](https://docs.coveo.com/en/206/) are a powerful way to customize the indexing process. However, whereas your [SharePoint Online connector configurations](https://docs.coveo.com/en/1739#add-a-sharepoint-online-source) are applied in the [crawling stage](https://docs.coveo.com/en/1893/) of the Coveo indexing pipeline, indexing pipeline extensions are applied in the [document processing manager (DPM)](https://docs.coveo.com/en/191/).

Using an IPE to [reject items](https://docs.coveo.com/en/58/) doesn't reduce the number of items crawled and, therefore, only adds to the total time required before source items are indexed. Only use IPEs as a last resort, when filtering isn't possible natively in the SharePoint Online connector.

## Additional indexing strategies

The SharePoint Online connector uses SharePoint Online APIs to retrieve your tenant content. [SharePoint Online throttles](https://learn.microsoft.com/en-us/sharepoint/dev/general-development/how-to-avoid-getting-throttled-or-blocked-in-sharepoint-online) applications to control usage of its APIs. The following leading practices are meant to minimize the impacts of throttling on connector update times.

### Use certificates

Using a certificate to access SharePoint Online APIs rather than a user account increases the call rate limits before throttling is applied. Therefore, use [certificate authentication](https://docs.coveo.com/en/1739#app-authentication-using-certificate-recommended) over delegated authentication when [adding or editing a SharePoint Online source](https://docs.coveo.com/en/1739#add-a-sharepoint-online-source).

Additionally, using a certificate enables [automatic refresh operations](https://docs.coveo.com/en/1739#parallelrefresh) during a rescan.

> **Note**
>
> To avoid user account throttling occurrences, Coveo prevents refreshes from being executed during a rescan when using **User delegated access using OAuth 2.0**.
> Permitting this wouldn't yield any advantages for users.

### Split your sources and optimize update schedules

SharePoint Online limits API calls per minute and per day. Minimizing connector update times involves taking into account these time window limits.

Splitting big SharePoint Online sources into several smaller ones allows you to spread scheduled source updates over time, reducing the risk of throttling and the duration of throttling episodes.

You can split sources into smaller ones by:

* Alphabetical order of site names.
* A given maximum number of specific sites per source.
* Content type (for example, user profiles, documentation, teams).

> **Important**
>
> Make sure you consider content that will be added over time in your SharePoint Online tenant (for example, new sites).
> For example, you can achieve this by setting up "catch remaining" sources.
>
> If you need to create a new source to capture newly added content in your tenant, consider [duplicating an existing source](https://docs.coveo.com/en/3390#duplicate-a-source) that indexes analogous content instead.
> This will make configuring the new source simpler and faster.

The SharePoint Online connector supports [refresh](https://docs.coveo.com/en/2710/) and [rescan](https://docs.coveo.com/en/2711/) source updates. You should understand the [differences between both source update types](https://docs.coveo.com/en/2705/) and [set scheduled update frequencies](https://docs.coveo.com/en/2705#determine-incremental-source-update-frequency) that make sense in your context. You should also have a global scheduling strategy to prevent overlapping source updates or having multiple lengthy updates running on the same day of the week.
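
As a planning aid, the alphabetical split and the staggered schedule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the bucket ranges, the source names (`sites-a-h`, `sites-remaining`, and so on), and the one-lengthy-update-per-weekday rotation are all hypothetical, and nothing here calls a Coveo or SharePoint API.

```python
from itertools import cycle

# Hypothetical alphabetical buckets; each key stands for one smaller source.
BUCKETS = {
    "sites-a-h": ("a", "h"),
    "sites-i-p": ("i", "p"),
    "sites-q-z": ("q", "z"),
}
# Hypothetical "catch remaining" source for sites that match no bucket.
CATCH_ALL = "sites-remaining"
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def assign_sites(site_names: list[str]) -> dict[str, list[str]]:
    """Group site names by first letter, sending unmatched names to CATCH_ALL."""
    plan: dict[str, list[str]] = {source: [] for source in BUCKETS}
    plan[CATCH_ALL] = []
    for site in sorted(site_names, key=str.lower):
        first = site[:1].lower()
        target = next(
            (source for source, (low, high) in BUCKETS.items() if low <= first <= high),
            CATCH_ALL,
        )
        plan[target].append(site)
    return plan


def stagger_rescans(source_names: list[str]) -> dict[str, str]:
    """Assign each source a weekday in rotation so lengthy updates are spread out."""
    return dict(zip(source_names, cycle(WEEKDAYS)))


if __name__ == "__main__":
    plan = assign_sites(["Marketing", "hr", "Sales", "2024-Archive"])
    schedule = stagger_rescans(list(plan))
    for source, sites in plan.items():
        print(f"{source}: rescan on {schedule[source]}, sites={sites}")
```

The catch-all bucket mirrors the "catch remaining" idea from the note above: a site whose name falls outside every range still lands in a source. The weekday rotation is one simple way to avoid running several lengthy updates on the same day, and with more than seven sources it wraps around, so a real plan would also need to account for update durations.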