---
title: File type handling
slug: l3qg9275
canonical_url: https://docs.coveo.com/en/l3qg9275/
collection: index-content
source_format: adoc
---

# File type handling

Scoping the content to index helps reduce the number of irrelevant [items](https://docs.coveo.com/en/210/) in your index and improves indexing performance. As a result, your content will stay fresh and relevant, ensuring a better search experience for users.

For [supported file formats](https://docs.coveo.com/en/1689/), Coveo's file type handling configurations are typically designed to index more rather than less, with the expectation that users will further refine the scope. You can refine scoping by modifying existing file type configurations or by adding new file type configurations if no existing ones match your needs.

For many source types, this feature is available when adding or editing a source under the **Items** tab. Otherwise, you can edit the file type configurations in the [**Edit configuration with JSON**](https://docs.coveo.com/en/1685/) panel.

## File type configurations

Each file type configuration consists of:

* A **Default action**: the default indexing behavior applied when encountering an item of the given file type.
* An **Action on error**: the fallback indexing behavior when an error occurs [at the conversion stage](#conversion-stage) on an item of the given file type (for example, when a document is corrupted).

  The **Action on error** is typically a more limited action than the **Default action**.

You can choose from the following actions:

* `Index content and metadata` (see [metadata](https://docs.coveo.com/en/218/); not available as an **Action on error** value)
* `Index metadata`
* `Ignore item`

See [Choosing the right default action](#choosing-the-right-default-action) for a comparison of the results of each action from the perspective of search functionality.
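To make the relationship between the two actions concrete, here is a minimal Python sketch of a file type configuration as described in this section. The `Action` enum, the `FileTypeConfig` class, and the `action_for` method are hypothetical names chosen for illustration; they aren't part of any Coveo API or of the source JSON schema. The sketch only encodes two rules stated above: `Index content and metadata` can't be used as an **Action on error**, and the **Action on error** applies when conversion fails.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The three actions listed above, labeled as in the Items tab."""
    INDEX_CONTENT_AND_METADATA = "Index content and metadata"
    INDEX_METADATA = "Index metadata"
    IGNORE_ITEM = "Ignore item"


@dataclass(frozen=True)
class FileTypeConfig:
    """Hypothetical model of one file type configuration (not a Coveo type)."""
    file_type: str           # for example ".pdf"
    default_action: Action
    action_on_error: Action

    def __post_init__(self) -> None:
        # The full indexing action isn't available as an Action on error value.
        if self.action_on_error is Action.INDEX_CONTENT_AND_METADATA:
            raise ValueError(
                "'Index content and metadata' can't be used as an Action on error"
            )

    def action_for(self, conversion_failed: bool) -> Action:
        """Return the fallback action on a conversion error, else the default."""
        return self.action_on_error if conversion_failed else self.default_action


pdf = FileTypeConfig(".pdf", Action.INDEX_CONTENT_AND_METADATA, Action.INDEX_METADATA)
print(pdf.action_for(conversion_failed=False).value)  # Index content and metadata
print(pdf.action_for(conversion_failed=True).value)   # Index metadata
```

Rejecting the invalid combination at construction time mirrors the restriction in the list of actions, so a misconfigured fallback surfaces before any item is processed.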
> **Notes**
>
> * When using the [**Edit configuration with JSON**](https://docs.coveo.com/en/1685/) panel, these action values are `Retrieve`, `Reference`, and `Ignore` respectively.
>
> * When the action applied to an item is `Ignore item`, the [**Log Browser**](https://platform.cloud.coveo.com/admin/#/orgid/logs/browser/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/logs/browser/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/logs/browser/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/logs/browser/)) logs the item as `Skipped`, with the activity value `filterbydoctype`.
>
>   ![Filter by document type in the Log Browser | Coveo](https://docs.coveo.com/en/assets/images/index-content/skipped-item.png)

## File type detection

Coveo's file type configurations allow targeting either **Extensions** (for example, `.pdf`, `.docx`) or **Content-Types**. The **Others** section contains file type configurations that catch items not matching any of the defined extensions or content types.

> **Tip**
>
> File type detection looks at **Extensions** configurations first.
> Favor adding or editing **Extensions** configurations over **Content-Types**.

Coveo uses the file type configurations to detect and handle items at two stages of the [Coveo indexing pipeline](https://docs.coveo.com/en/1893/). [Barring errors](#file-type-configurations), the default action of the matching file type configuration is applied at each stage, as follows:

1. **Crawling stage**

   The source [crawler](https://docs.coveo.com/en/2121/) retrieves the item from the content repository. The crawler tries to determine the item's file type based on the file extension or the content type it receives from the repository. The crawler then applies the matching file type configuration's default action to determine whether to send the item content and metadata further downstream in the Coveo indexing pipeline.

2. **Converter stage**

   Regardless of the source type, all items go through the [document processing manager (DPM)](https://docs.coveo.com/en/191/). The role of this component is to handle items of various file types and convert them into a common format that can be recorded in the index.

   When trying to detect the item's file type, the DPM [uses the item's content](https://docs.coveo.com/en/1893#processing) (for example, the binary data of a file). For the converter, most plain text files are treated the same; whether they're `.txt`, `.bat`, or `.js` files, they all fall under the `.txt` extension. The converter does recognize HTML files based on the presence of an `<html>` tag, and considers the content of their `<meta>` tags as metadata.

   If the converter can determine the item's file type, it applies the first matching file type configuration's default action to index or ignore the item content and metadata. Only if the converter fails to resolve the item's file type does the crawler's file type detection determine the final indexing behavior.

For visibility into file types detected at both stages during a content update operation, you can consult the [**View and map metadata** subpage](https://docs.coveo.com/en/m9ti0339#view-and-map-metadata-subpage) in the [Coveo Administration Console](https://docs.coveo.com/en/183/). Examine the metadata shown in the image below to understand how file types were resolved at each stage. Click the chevron-down icon to expand a metadata entry and see a breakdown of the values detected in the sample of indexed items.
![File type detection reflected on the View and map metadata subpage | Coveo](https://docs.coveo.com/en/assets/images/index-content/file-type-detection.png)

## Choosing the right default action

To determine whether you can settle for a downgraded default action, the following table compares the results of each action from the perspective of search functionality. An example of a [search interface](https://docs.coveo.com/en/2741/) result is provided for the actions that support it.

| Default action | Result |
| --- | --- |
| `Index content and metadata` | You can display the [quickview](https://docs.coveo.com/en/3311/) and a coherent excerpt of the item content in [search interface](https://docs.coveo.com/en/2741/) results.<br>The item content is [free-text searchable](https://docs.coveo.com/en/mc2g0298#free-text-search).<br>You can display item metadata in [search interface](https://docs.coveo.com/en/2741/) results and badges, and use them for result filtering or sorting.<br>![Indexing content and metadata](https://docs.coveo.com/en/assets/images/index-content/full-index.png) |
| `Index metadata` | You can display item metadata in [search interface](https://docs.coveo.com/en/2741/) results and badges, and use them for result filtering or sorting.<br>You can't display the [quickview](https://docs.coveo.com/en/3311/) of items in your search interface results.<br>The item content isn't [free-text searchable](https://docs.coveo.com/en/mc2g0298#free-text-search).<br>![Indexing metadata only](https://docs.coveo.com/en/assets/images/index-content/metadataonly.png) |
| `Ignore item` | Items of this type don't appear in your [search interface](https://docs.coveo.com/en/2741/) results. |

**Example**

You index your company SharePoint Online tenant, which contains a Microsoft Word `.docx` item created by a user named John Smith. In your SharePoint Online source configuration, items with the `.docx` extension have their default action set to `Index metadata`.
The SharePoint Online source item includes the following metadata:

* filename: `Retirement_Announcement_Letter.docx`
* title: `Early Retirement`
* author: `John Smith`
* date: `April 1, 2017`
* URI: `https://mycomp.sharepoint.com/MySite/SiteAssets/Retirement_Announcement_Letter.docx?web=1`

After the item is indexed, John Smith queries `retirement letter` to retrieve it. Because his query contains keywords that match the above metadata, the item appears in the search results. He can then review the item metadata in the [search interface](https://docs.coveo.com/en/2741/) or click the URI to open the document directly in his SharePoint Online tenant.

However, if John Smith uses keywords in his query that match the content of the item rather than its metadata (for example, `dear colleagues`), the file doesn't appear in the search results.

**Example**

Your Amazon S3 source items currently contain both `.html` and `.pdf` files, but you only want to index `.html` files.

In your source configuration **Items** tab, you click **Extensions** and then, for the `.pdf` extension, you change the **Default action** and **Action on error** values to `Ignore item`. You then rebuild your source to apply the changes. As a result, `.pdf` files are removed from your source items.
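
The matching order described under File type detection, applied to the Amazon S3 example above, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Coveo's implementation: the dictionaries, the function names `resolve_action` and `final_file_type`, the `text/html` entry, and the catch-all action chosen for **Others** are all hypothetical.

```python
from typing import Optional

# Hypothetical configuration mirroring the Amazon S3 example above.
EXTENSIONS = {
    ".html": "Index content and metadata",
    ".pdf": "Ignore item",
}
CONTENT_TYPES = {
    "text/html": "Index content and metadata",
}
OTHERS_ACTION = "Index metadata"  # assumed value for the catch-all "Others" section


def resolve_action(extension: Optional[str], content_type: Optional[str]) -> str:
    """Pick a default action: Extensions first, then Content-Types, then Others."""
    if extension is not None and extension in EXTENSIONS:
        return EXTENSIONS[extension]
    if content_type is not None and content_type in CONTENT_TYPES:
        return CONTENT_TYPES[content_type]
    return OTHERS_ACTION


def final_file_type(converter_type: Optional[str], crawler_type: Optional[str]) -> Optional[str]:
    """The crawler's detection is used only if the converter couldn't resolve a type."""
    return converter_type if converter_type is not None else crawler_type


print(resolve_action(".pdf", "application/pdf"))  # Ignore item
print(resolve_action(None, "text/html"))          # Index content and metadata
print(resolve_action(".xyz", None))               # Index metadata
print(final_file_type(None, ".pdf"))              # .pdf
```

Checking extensions before content types reflects the tip that detection looks at **Extensions** configurations first, which is why editing the `.pdf` extension entry is enough to exclude those files in the example.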