Excluding Sitecore Items From Your Coveo Index
Most of the time, Coveo for Sitecore indexes more items than you need.
You should exclude as much content as possible that doesn’t bring value to your visitors.
This article highlights the Coveo for Sitecore tools which can help you manage your content.
Index Crawling Root
Changing the crawling root is the preferred way to exclude items from the Coveo index (see Changing the Crawling Root of an Index). With this method, items outside the root aren’t even analyzed, which saves resources and time when you rebuild your index.
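As a rough sketch, a crawling root change is usually made through a Sitecore configuration patch file. The index id (Coveo_web_index), the crawler element layout, and the /sitecore/content/Home path are assumptions for illustration only; verify them against your installed Coveo.SearchProvider.config and the linked article before using anything like this.

```xml
<!-- Hypothetical patch file: the index id, element names, and path are assumptions. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <index id="Coveo_web_index">
            <locations>
              <crawler name="ContentCrawler">
                <!-- Only items under this path would be crawled. -->
                <root>/sitecore/content/Home</root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

A narrower root means fewer items to analyze, so a rebuild should be required after the change for the index to reflect the new scope.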
Available Pipelines
If it’s not possible in your setting to exclude those items by changing the crawling root, you can use pipelines to filter out the items as they’re analyzed. Review the following pipeline descriptions to decide which one suits your needs best.
You can use the inbound indexing pipelines to prevent items from being indexed in Coveo. To do so, you can either use the default Sitecore pipelines or the additional Coveo pipelines that come with Coveo for Sitecore (see Sitecore 7 Inbound and Outbound Filter Pipelines and Understanding the Indexing and Search Pipelines).
Use the Coveo pipelines to avoid affecting other indexes used by Sitecore.
If excluded items were previously indexed, you must rebuild the index to delete them.
coveoInboundFilterPipeline
This pipeline runs only for Coveo indexes, which makes it an ideal candidate to prevent items from being indexed in Coveo while keeping Lucene indexes untouched. Its processors require a different type of argument than those of the Sitecore indexing.filterIndex.inbound pipeline. Therefore, processors written for that pipeline can’t be used directly in this pipeline without adapting their code.
A default Coveo.SearchProvider.InboundFilters.ApplySitecoreInboundFilterProcessor processor is included in the pipeline to run the Sitecore indexing.filterIndex.inbound pipeline. Therefore, all the Sitecore inbound pipeline processors are also run for Coveo indexes. This processor can be removed if desired (see Creating a Custom Coveo Inbound Filter).
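One way to remove the default processor is a configuration patch that uses the standard Sitecore patch:delete element. This is a minimal sketch: the assembly name (Coveo.SearchProviderBase) and the location of the pipeline under the pipelines element are assumptions, and the type attribute must match the entry in your installed configuration exactly for the patch to apply.

```xml
<!-- Sketch only: the assembly name and pipeline location are assumptions. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <coveoInboundFilterPipeline>
        <processor type="Coveo.SearchProvider.InboundFilters.ApplySitecoreInboundFilterProcessor, Coveo.SearchProviderBase">
          <!-- Removes this processor so the Sitecore inbound filters no longer run for Coveo indexes. -->
          <patch:delete />
        </processor>
      </coveoInboundFilterPipeline>
    </pipelines>
  </sitecore>
</configuration>
```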
Coveo for Sitecore (December 2016)
You can exclude from the index all items that don’t have a layout, using the Coveo.SearchProvider.InboundFilters.HasLayoutInboundFilter processor (see Excluding Items Without Layouts From Being Indexed).
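For illustration, registering the layout filter might look like the following patch. The processor type name comes from this article; the assembly name (Coveo.SearchProviderBase) and the pipeline location are assumptions to confirm against your installation.

```xml
<!-- Sketch only: the assembly name and pipeline location are assumptions. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <coveoInboundFilterPipeline>
        <!-- Appended to the end of the pipeline; items without a layout would be excluded. -->
        <processor type="Coveo.SearchProvider.InboundFilters.HasLayoutInboundFilter, Coveo.SearchProviderBase" />
      </coveoInboundFilterPipeline>
    </pipelines>
  </sitecore>
</configuration>
```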
coveoSitecoreInboundFilterPipeline
This pipeline uses the same type of arguments as the Sitecore indexing.filterIndex.inbound pipeline and works in conjunction with the Coveo.SearchProvider.InboundFilters.InvokeSitecoreInboundFilterPipeline processor in the coveoInboundFilterPipeline pipeline. It was created so that you can reuse older Sitecore indexing.filterIndex.inbound pipeline processors for Coveo indexes only, without adjusting their code.
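The reuse scenario could be sketched as follows: an existing processor written for indexing.filterIndex.inbound is registered in the Coveo-specific pipeline instead. MyCompany.Search.LegacyInboundFilter and its assembly are hypothetical placeholders for your own processor, and the pipeline location is an assumption.

```xml
<!-- Sketch only: the processor type is a hypothetical placeholder. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <coveoSitecoreInboundFilterPipeline>
        <!-- Existing Sitecore-style inbound filter, now applied to Coveo indexes only. -->
        <processor type="MyCompany.Search.LegacyInboundFilter, MyCompany.Search" />
      </coveoSitecoreInboundFilterPipeline>
    </pipelines>
  </sitecore>
</configuration>
```

Registering the processor here rather than in indexing.filterIndex.inbound keeps the other Sitecore indexes unaffected.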
indexing.filterIndex.inbound
This pipeline is available out of the box with Sitecore. It runs its processors for all Sitecore search indexes, whether Lucene, Solr, Azure Search, or Coveo. This is a problem because Coveo for Sitecore runs side by side with the default Sitecore indexes, and you probably don’t want to exclude items from those. Therefore, we don’t recommend using this pipeline to exclude items from your Coveo index. Use the coveoInboundFilterPipeline provided with Coveo for Sitecore unless you’re sure that you want to exclude the items from all indexes, including Lucene, Solr, or Azure Search.
What’s Next?
Once the crawling scope is well adjusted and the right filters are in place, you need to decide what content is indexed for each item (see Creating an HTML Representation of Page Content).