Filter Your Content
After indexing content, you should browse the results in the Coveo Administration Console (see Review item properties).
Browsing your index content should reveal opportunities to refine your indexing configuration and filter out irrelevant items and item content. Cleaning your data before it’s indexed increases the relevance of your search results and improves the performance of your solution.
Filtering out extraneous items and item content can be achieved in a number of ways. The goal of this article is to guide you in this process.
Default Item Indexing Filtering
When you install Coveo for Sitecore, it’s configured to exclude certain items from being sent to the Coveo indexes. Before you begin exploring methods to filter items yourself, you should take note of this default filtering (see Out-of-the-Box Item Indexing Filtering).
Filtering Items Based on Their Location in the Sitecore Content Tree
By default, Coveo for Sitecore indexes all items under /sitecore/content, and all items under /sitecore/media library/Files.
You can change these default crawling root settings and define other crawling roots.
This allows you to select the items you want, based on their location in the Sitecore content tree (see Change the Crawling Root of an Index).
Configuring and leveraging the Coveo ItemPathInboundFilter processor achieves the same result, though changing the crawling root is a better practice (see Use the ItemPathInboundFilter).
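To make the idea of location-based filtering concrete, here is a minimal conceptual sketch in Python. It is not the Coveo for Sitecore API or configuration format: the names CRAWLING_ROOTS, is_under_root, and should_index are hypothetical, and the case-insensitive comparison is an assumption made for the illustration. It only shows the decision being made, which is to keep an item when its path falls under one of the configured roots.

```python
# Conceptual sketch only: this is NOT the Coveo for Sitecore API.
# CRAWLING_ROOTS, is_under_root and should_index are hypothetical names.

CRAWLING_ROOTS = ["/sitecore/content", "/sitecore/media library/Files"]


def is_under_root(item_path: str, root: str) -> bool:
    """Return True when item_path is the root itself or one of its descendants."""
    # Assumption: paths are compared case-insensitively.
    path = item_path.rstrip("/").lower()
    base = root.rstrip("/").lower()
    # Compare on a path-segment boundary so that "/sitecore/content"
    # does not accidentally match "/sitecore/content-archive".
    return path == base or path.startswith(base + "/")


def should_index(item_path: str) -> bool:
    """Keep an item when it sits under at least one crawling root."""
    return any(is_under_root(item_path, root) for root in CRAWLING_ROOTS)
```

Matching on a segment boundary rather than a raw string prefix is the detail worth noticing: a plain prefix test would also let sibling trees with similar names through.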
Filtering Items Based on the Sitecore Template They Derive From
Sitecore items are created from templates. You can include or exclude an item from indexing based on the template that was used to create the item (see Specify Which Templates to Index).
Filtering Items That Do Not Have a Layout
Sitecore content items without a layout are most likely items you don’t want to index. Through a simple configuration change, you can enable a Coveo for Sitecore processor that excludes items that don’t have a layout from being indexed (see Exclude Items Without Layouts From Being Indexed).
Filtering Items Programmatically
When none of the simple solutions above let you filter out the items you don’t want to index, but these items have something in common, consider a programmatic approach (see Create a Custom coveoInboundFilterPipeline Processor).
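The filtering approaches in this article share one shape: each filter looks at an item and decides whether to reject it, and an item is indexed only if no filter rejects it. The Python sketch below illustrates that shape under stated assumptions. The Item fields, the EXCLUDED_TEMPLATES set, and the "-draft" naming rule are all hypothetical and chosen for the example. An actual custom processor is written against the Coveo for Sitecore pipeline as described in the linked article, not in Python.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Item:
    """Hypothetical stand-in for a Sitecore item; not a real Sitecore type."""
    path: str
    template: str
    has_layout: bool


# A filter returns True when the item should be REJECTED from indexing.
InboundFilter = Callable[[Item], bool]

# Hypothetical template names, used only for this illustration.
EXCLUDED_TEMPLATES = {"Folder", "Redirect"}


def excluded_template(item: Item) -> bool:
    """Reject items created from a template we do not want to index."""
    return item.template in EXCLUDED_TEMPLATES


def has_no_layout(item: Item) -> bool:
    """Reject items that cannot be rendered as a page."""
    return not item.has_layout


def is_draft_copy(item: Item) -> bool:
    """Custom rule: reject items whose name ends with '-draft' (hypothetical)."""
    return item.path.lower().endswith("-draft")


def run_pipeline(items: Iterable[Item], filters: List[InboundFilter]) -> List[Item]:
    """Keep only the items that no filter rejects."""
    return [item for item in items if not any(f(item) for f in filters)]
```

A possible usage, again with made-up data:

```python
items = [
    Item("/sitecore/content/Home", "Page", True),
    Item("/sitecore/content/Home/Data", "Folder", False),
    Item("/sitecore/content/Home/Pricing-draft", "Page", True),
]
kept = run_pipeline(items, [excluded_template, has_no_layout, is_draft_copy])
print([item.path for item in kept])
```

Keeping each rule as its own small function mirrors the reason a custom processor is useful: the shared trait of the unwanted items is expressed once, in one place.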
Item Content and Fields Filtering
If you’re indexing the HTML content of your rendered Sitecore items to make that content searchable, you may want to exclude useless sections of your web pages from being indexed (for example, the contents of the <nav>, <header>, and <footer> sections).
You can quickly do so by editing your Sitecore layouts and sublayouts, and configuring the FetchPageContentProcessor processor (see Configuring the Processor).
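To show what excluding page sections does to the indexed text, here is a small Python sketch built on the standard library html.parser module. It drops everything inside <nav>, <header>, and <footer> elements by tag name and keeps the remaining text. This is an assumption made for illustration: it is not how FetchPageContentProcessor is implemented or configured, and the names EXCLUDED_TAGS, BodyTextExtractor, and extract_indexable_text are hypothetical.

```python
from html.parser import HTMLParser

# Sections treated as page chrome in this sketch (hypothetical choice).
EXCLUDED_TAGS = {"nav", "header", "footer"}


class BodyTextExtractor(HTMLParser):
    """Collect text that is not nested inside an excluded element."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in EXCLUDED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in EXCLUDED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def extract_indexable_text(html: str) -> str:
    """Return the page text with nav, header and footer content removed."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><body><header>Site title</header>"
    "<nav><a href='/'>Home</a></nav>"
    "<main><h1>Product</h1><p>Details here.</p></main>"
    "<footer>Copyright</footer></body></html>"
)
print(extract_indexable_text(page))  # prints "Product Details here."
```

The navigation label, site title, and copyright notice appear on every page, so leaving them out keeps them from matching queries on pages where they carry no meaning.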
As for Sitecore fields, the Coveo for Sitecore Indexing Manager lets you handpick the ones you want to index (see About the Indexing Manager - Fields).