Keep an index up to date
Once created, a source must periodically retrieve data from its original content repository to maintain up-to-date items (and possibly permissions) in your index. Most sources support three different ways to update their content: refresh, rescan, or rebuild.
This article offers guidelines on scheduling and triggering source updates and monitoring the indexing process. The following diagram and table outline how each source update operation affects the Coveo indexing pipeline at various stages.
Indexing pipeline stage | Refresh | Rescan | Rebuild |
---|---|---|---|
Crawling | N/A | Crawl the entire content repository | Crawl the entire content repository |
DPM | Receive and process an incremental update retrieved directly from the indexed system (for example, using an API) | Receive and process only new or updated crawled content | Receive and process all crawled content |
Indexing | | | |
Note
Each connector available in the Coveo Platform implements its own rebuild, rescan, and possibly refresh mechanisms. However, to leverage a Push source, you must essentially design and host your own custom connector relying on the Push API. This implies that you’re fully responsible for implementing and scheduling the execution of your Push source update mechanisms. The Push API lets you set the status of your Push source so that your activity logs remain coherent.
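As a rough illustration of keeping activity logs coherent, a custom connector could wrap each update in a status change that is always reverted afterward. The sketch below is not the Push API itself: `set_status` is a hypothetical callable standing in for whatever request your connector sends, and the `"REBUILD"` and `"IDLE"` labels are assumed values to check against the Push API reference.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def source_status(
    set_status: Callable[[str], None], active: str, idle: str = "IDLE"
) -> Iterator[None]:
    """Report the source as busy while an update runs, then as idle again."""
    set_status(active)
    try:
        yield
    finally:
        # Reset the status even if the update raised, so the activity log
        # never shows an operation that appears to run forever.
        set_status(idle)


# Stand-in for a real status call: it only records what would be sent.
recorded: List[str] = []

with source_status(recorded.append, "REBUILD"):
    pass  # push every item of the repository here

print(recorded)  # ['REBUILD', 'IDLE']
```

Injecting `set_status` keeps the pattern independent of any particular HTTP client, and makes it easy to test the connector without contacting the platform.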
Trigger a full source rebuild
A full source rebuild re-indexes an entire content repository from scratch. This type of source update can weigh heavily on the computing resources of the indexed system, as well as on those of the Coveo Platform, especially when a large content repository must be re-indexed. For this reason, you can only trigger a source rebuild manually.
Typically, you should only rebuild a source when:
- You create that source, assuming its initial build process didn’t start automatically, or was somehow canceled.
- You modify one or more of the following settings on that source:
  - Its basic configuration (for example, you edit the web scraping configuration of a Sitemap source).
  - Its mappings (for example, you remove a mapping from a source).
  - Its indexing pipeline extensions (IPEs) (for example, you associate a new post-conversion IPE to a source).
    Note: If you modify an IPE that applies to many sources, you should rebuild each of those sources.
- You enable or disable one of the following options on a field that’s associated with that source:
  - Search operator
  - Displayable in results
  - Free text search
  - Ranking
  - Stemming

Any time you change a setting that must apply to all items in a source (that is, to all unmodified, modified, and new items), you should rebuild that source.
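The rebuild criteria above can be summarized as a small check. This is only a sketch: the setting and option labels are hypothetical names chosen for illustration, not identifiers used by the Coveo Platform.

```python
from typing import Iterable

# Source-level settings whose modification calls for a rebuild (hypothetical labels).
REBUILD_SOURCE_SETTINGS = frozenset(
    {"basic_configuration", "mappings", "indexing_pipeline_extensions"}
)

# Field options whose toggling calls for a rebuild (hypothetical labels).
REBUILD_FIELD_OPTIONS = frozenset(
    {
        "search_operator",
        "displayable_in_results",
        "free_text_search",
        "ranking",
        "stemming",
    }
)


def needs_rebuild(
    changed_source_settings: Iterable[str],
    toggled_field_options: Iterable[str],
) -> bool:
    """Return True when at least one change must apply to every item in the source."""
    return bool(
        REBUILD_SOURCE_SETTINGS.intersection(changed_source_settings)
        or REBUILD_FIELD_OPTIONS.intersection(toggled_field_options)
    )


print(needs_rebuild({"mappings"}, set()))  # True
print(needs_rebuild({"schedule"}, {"description"}))  # False
```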
Schedule and trigger incremental source updates
You will typically schedule incremental updates on each of your sources to ensure that your indexed content remains up to date. When you need some content repository changes to become available in your index quickly, you can also trigger an incremental source update manually.
There are two types of incremental source updates:
- Refresh: Retrieves an incremental update directly from an indexed system (for example, by leveraging an exposed REST API).
- Rescan: Re-crawls an entire content repository to retrieve new and updated content.
A refresh consumes fewer computing resources than a rescan. However, while all connectors support rescans, several don’t support refreshes at all, or have limited refresh capabilities. For example, a refresh may fail to update some data, metadata, or permissions, or may leave deleted items in the index.
Consequently, a source that supports refreshing should normally have both a refresh and a rescan schedule. The rescan will clean up the index after several refreshes have occurred.
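To see why a periodic rescan complements refreshes, consider a toy model in which the index and the repository are plain dictionaries. It assumes a refresh mechanism that reports new and updated items but not deletions, which is one of the limitations mentioned above; actual connectors differ, and none of these function names come from the Coveo Platform.

```python
from typing import Dict


def refresh(index: Dict[str, str], changed: Dict[str, str]) -> None:
    """Apply the new and updated items reported by the indexed system.

    In this model, the system never reports deletions.
    """
    index.update(changed)


def rescan(index: Dict[str, str], repository: Dict[str, str]) -> None:
    """Re-crawl the whole repository and reconcile the index with it."""
    # Drop items that no longer exist in the repository.
    for key in set(index) - set(repository):
        del index[key]
    # Process only the crawled items that are new or updated.
    for key, value in repository.items():
        if index.get(key) != value:
            index[key] = value


repository = {"a": "v1", "b": "v1"}
index = dict(repository)

# The repository changes: "a" is edited, "b" is deleted, "c" is created.
repository = {"a": "v2", "c": "v1"}

refresh(index, {"a": "v2", "c": "v1"})
stale_after_refresh = sorted(set(index) - set(repository))
print(stale_after_refresh)  # ['b']

rescan(index, repository)
print(index == repository)  # True
```

The refresh brings the edited and new items into the index cheaply, but the deleted item lingers until the rescan removes it.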
Note
To figure out whether a given source connector supports refresh and, if so, whether that connector has any refresh limitations, see the documentation on managing sources based on that connector.
Determine incremental source update frequency
Non-generic Coveo connectors supporting incremental source updates (for example, SharePoint, Jive, and YouTube) have default schedules which are adapted to the system they connect to, as well as to crawler performance. These default schedules fit most use cases and should only be modified to reduce the load on an indexed system or to ensure that a small source is updated more often.
The refresh and rescan frequency of a source should depend on two factors:
- How often the original content repository changes.
- How quickly original content repository changes need to become available in your index.
In addition, when a source supports refresh, its rescan frequency should depend on the quality of those refreshes. A source that has a highly reliable refresh mechanism may only require a monthly rescan to handle edge-case indexing issues. In contrast, a source with a flawed refresh mechanism might require a daily rescan schedule.
For example:
- A Jira source that indexes an instance with many active users could refresh every 15 to 30 minutes, and rescan every Sunday.
- A Sitemap source that indexes a relatively stable intranet could refresh every weekday, and rescan every Sunday.
- A source that doesn’t support refresh, but indexes content that changes several times a day, could be rescanned every 4 to 6 hours.
- A source that indexes entirely static content shouldn’t have a refresh or rescan schedule at all.
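The reasoning behind these guidelines can be sketched as a simple decision function. The intervals it returns are taken from the figures mentioned in this section and are illustrative only; the function, its parameters, and the `Schedule` type are hypothetical, and a real schedule should also account for how often the repository actually changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Schedule:
    refresh: Optional[str]  # None means "no refresh schedule"
    rescan: Optional[str]  # None means "no rescan schedule"


def suggest_schedule(
    is_static: bool,
    supports_refresh: bool,
    refresh_is_reliable: bool = True,
) -> Schedule:
    """Suggest a starting point for an active source's update schedules."""
    if is_static:
        # Entirely static content needs no incremental update at all.
        return Schedule(refresh=None, rescan=None)
    if not supports_refresh:
        # Rescans are the only way to pick up changes.
        return Schedule(refresh=None, rescan="every 4 to 6 hours")
    # The less reliable the refresh, the more often a rescan must clean up.
    rescan = "monthly" if refresh_is_reliable else "daily"
    return Schedule(refresh="every 15 to 30 minutes", rescan=rescan)


print(suggest_schedule(is_static=False, supports_refresh=True, refresh_is_reliable=False))
```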
Note
Updating a source taxes computing resources in the Coveo Platform and in the indexed system itself. Therefore, you should be careful to schedule source updates to be performed only as frequently as required.
Monitor the indexing process
There are two ways you can monitor the indexing process:
- Inspect the indexing pipeline logs
  The Log Browser (platform-ca | platform-eu | platform-au) lets you review the status of each item going through the indexing process. This information can be extremely useful when troubleshooting indexing issues.
- Subscribe to source notifications
  You can subscribe to email notifications to be alerted when certain activities are triggered on a specific source, such as when a refresh, rescan, or rebuild fails, succeeds, or aborts.
What’s next?
The Explore indexed content article explains how you can navigate and inspect indexed content through the Coveo Administration Console.