---
title: About the indexing process
slug: '2684'
canonical_url: https://docs.coveo.com/en/2684/
collection: project-guide
source_format: adoc
---

# About the indexing process

Each piece of data considered for [indexing](https://docs.coveo.com/en/204/) (including web pages, files, database records, and Salesforce objects, among others) must go through the entire [Coveo indexing pipeline](https://docs.coveo.com/en/184/) to end up as an [item](https://docs.coveo.com/en/210/) in your index.

This article and diagram provide an overview of the main indexing pipeline stages.

![Flowchart showing the steps of the Coveo indexing pipeline with extensions](https://docs.coveo.com/en/assets/images/index-content/indexing-pipeline-flowchart-with-extension.png)

For detailed information on all indexing pipeline stages, see [Coveo indexing pipeline](https://docs.coveo.com/en/1893/).

## Crawling

At the _crawling_ stage of the indexing pipeline, raw data is retrieved from various content repositories, and sent to the [document processing manager (DPM)](https://docs.coveo.com/en/191/) queue.

To specify which content repositories need to be crawled at this stage, you must [create and configure](https://docs.coveo.com/en/3390/) [sources](https://docs.coveo.com/en/246/) in your [Coveo organization](https://docs.coveo.com/en/185/).

The [Coveo Platform](https://docs.coveo.com/en/186/) offers [a broad range](https://docs.coveo.com/en/1702#connector-types) of source [connectors](https://docs.coveo.com/en/2734/), many of which are designed to index data residing in specific systems such as Jira or SharePoint, while others are more generic.

Sources indexing secured content can retrieve [permissions](https://docs.coveo.com/en/223/) and [security identities](https://docs.coveo.com/en/240/) to [replicate secured systems](https://docs.coveo.com/en/1719/) in your index.
In general, sources that [retrieve](https://docs.coveo.com/en/1612/) Cloud-accessible content rely on [Coveo crawlers](https://docs.coveo.com/en/2121/), whereas sources that retrieve content from behind firewalls rely on the Coveo Crawling Module or the Push API directly.

In the [Prepare to index content](https://docs.coveo.com/en/2680/) article, you'll learn how to determine which connectors are required for your search project.

In the [Apply indexing techniques](https://docs.coveo.com/en/2721/) article, you'll learn how to [take advantage of the crawling flexibility](https://docs.coveo.com/en/2721#set-the-crawling-scope-and-refine-or-enhance-crawled-content) of your sources.

## Applying extensions

[Indexing pipeline extensions (IPEs)](https://docs.coveo.com/en/206/) are executed during the _applying extensions_ stages of the indexing pipeline.
An IPE is a custom Python 3 script that runs either before (_pre-conversion_) or after (_post-conversion_) the [processing](#processing) and [mapping](#mapping) stages of the indexing pipeline.

Each source can have [its own set](https://docs.coveo.com/en/1936/) of pre-conversion and post-conversion IPEs.

IPEs typically alter or reject candidate items before they can reach the index.

Later in this guide, you'll learn [how and when to use](https://docs.coveo.com/en/2721#further-refine-or-enhance-content-using-indexing-pipeline-extensions) pre-conversion and post-conversion IPEs.

> **Leading practice**
>
> IPEs can slow down the indexing pipeline process and make troubleshooting more difficult.
> You should only use IPEs when necessary.

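
To make the "alter or reject" behavior of an IPE more concrete, here is a minimal sketch that can run outside the platform. It assumes that the candidate item is exposed to the script as a `document` object with `get_meta_data_value`, `add_meta_data`, and `reject` methods, and that a `log` callable is available; verify these names against the IPE documentation before relying on them. The `FakeDocument` class, the `run_extension` function, and the `status` and `reviewed` metadata keys are hypothetical and exist only for illustration.

```python
class FakeDocument:
    """Hypothetical stand-in for the candidate item that an IPE receives."""

    def __init__(self, uri, metadata):
        self.uri = uri
        self._metadata = {key: list(values) for key, values in metadata.items()}
        self.rejected = False

    def get_meta_data_value(self, name):
        # Return every value stored under the given metadata key.
        return self._metadata.get(name, [])

    def add_meta_data(self, values):
        # Add or overwrite metadata key-value pairs on the candidate item.
        for key, value in values.items():
            self._metadata[key] = [value]

    def reject(self):
        # Flag the candidate item so that it never reaches the index.
        self.rejected = True


def run_extension(document, log):
    """Sketch of an IPE body: reject drafts, tag every other item."""
    if "draft" in document.get_meta_data_value("status"):
        log(f"Rejecting draft item: {document.uri}")
        document.reject()
        return
    document.add_meta_data({"reviewed": "true"})


# Exercise the sketch with two hypothetical candidate items.
messages = []
draft = FakeDocument("https://example.com/a", {"status": ["draft"]})
published = FakeDocument("https://example.com/b", {"status": ["published"]})
run_extension(draft, messages.append)
run_extension(published, messages.append)
```

In this sketch, the draft item is flagged as rejected while the published item gains an extra metadata value. A real extension would contain only the logic inside `run_extension`, since the platform supplies the candidate item itself.
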

See also:

* [Indexing pipeline extension overview](https://docs.coveo.com/en/1556/)
* [Use the Extensions API](https://docs.coveo.com/en/156/)

## Optical character recognition

At the _optical character recognition_ (OCR) stage of the indexing pipeline, the [Coveo Platform](https://docs.coveo.com/en/186/) extracts text from images and PDF files in sources for which the [OCR feature](https://docs.coveo.com/en/2937/) has been enabled.

OCR-extracted text is processed as item data, meaning that it's fully searchable, and will appear in the item [Quick view](https://docs.coveo.com/en/2760#search-result-quick-view).

## Processing

At the _processing_ stage of the indexing pipeline, candidate items are converted to a format suitable for indexing, and automatic language detection occurs, if applicable.
You can exercise no direct control over this stage.

The indexer can convert various standard [file formats](https://docs.coveo.com/en/1689/).
Candidate items whose format isn't supported can still be indexed by reference.

The index also supports a [wide array of languages](https://docs.coveo.com/en/1956/), many of which have their own [stemmer](https://docs.coveo.com/en/1576/).

## Mapping

At the _mapping_ stage of the indexing pipeline, candidate item metadata is associated with [fields](https://docs.coveo.com/en/200/) in the index.
You can exercise [granular control](https://docs.coveo.com/en/1640/) over this stage through the [mapping](https://docs.coveo.com/en/217/) configuration of each of your sources.

Elsewhere in this guide, you can learn how to concatenate metadata using [custom mapping rules](https://docs.coveo.com/en/2721#define-custom-mapping-rules-to-populate-fields) and how to [create and customize fields](https://docs.coveo.com/en/2721#create-and-populate-custom-fields).

> **Note**
>
> The mapping stage merely establishes which metadata key-value pairs are going to populate which fields.
> Fields are actually populated at the [indexing](#indexing) stage.

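
The idea of associating metadata with fields, including concatenation, can be illustrated with a simplified sketch. It assumes that a mapping rule references metadata through `%[key]` placeholders; check the Mapping Rule Syntax Reference for the actual syntax and its additional features. The `resolve_rule` and `plan_field_values` functions and all sample metadata and field names are hypothetical. This is not how the platform implements the stage, only a way to picture the outcome.

```python
import re

# Assumed placeholder syntax: %[metadata_key]
PLACEHOLDER = re.compile(r"%\[([^\]]+)\]")


def resolve_rule(rule, metadata):
    """Replace each %[key] placeholder with the matching metadata value."""
    return PLACEHOLDER.sub(lambda match: str(metadata.get(match.group(1), "")), rule)


def plan_field_values(mappings, metadata):
    """Work out which value each field should receive, without touching an index."""
    return {field: resolve_rule(rule, metadata) for field, rule in mappings.items()}


# Hypothetical candidate item metadata and mapping configuration.
metadata = {"firstname": "Ada", "lastname": "Lovelace", "doctype": "profile"}
mappings = {
    "author": "%[firstname] %[lastname]",  # concatenates two metadata values
    "type": "%[doctype]",
}

planned = plan_field_values(mappings, metadata)
```

Here, `planned` only records which value is destined for which field, which mirrors the distinction between deciding on the association and actually populating the fields.
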

See also:

* [Mapping Rule Syntax Reference](https://docs.coveo.com/en/1839/)
* [Add or edit a body mapping](https://docs.coveo.com/en/1847/)

## Indexing

At the _indexing_ stage of the indexing pipeline, fields are populated with metadata as determined at the [mapping](#mapping) stage, and the fully processed item is committed to the index.
You can exercise no direct control over this stage.

## What's next?

The [Prepare to index content](https://docs.coveo.com/en/2680/) article outlines the steps you should follow before you start indexing content.