---
title: About field decomposition
slug: occg0574
canonical_url: https://docs.coveo.com/en/occg0574/
collection: coveo-for-commerce
source_format: adoc
---
# About field decomposition
Field decomposition breaks down a [field](https://docs.coveo.com/en/200/) value, such as a product identifier, into multiple permutations stored in a separate field.
This allows users to easily search for complex string values, like model numbers or SKUs, by entering only a partial value while still retrieving accurate results.

**Example**

In B2B commerce scenarios, visitors often search for products using product identifiers.

Product identifiers often consist of a combination of letters, numbers, and special characters, making them difficult to remember or type correctly.
For example, a product identifier might look like this: `ABC123DEF456`.

By decomposing the product identifier, you can generate additional search terms from it, such as:

```txt
123DEF456 ; 23DEF456 ; 3DEF456 ; 456 ; ABC ; ABC1 ; ABC12 ; ABC123 ; ABC123D ; ABC123DE ; ABC123DEF ; ABC123DEF4 ; ABC123DEF45 ; ABC123DEF456 ; BC123DEF456 ; C123DEF456 ; DEF456 ; EF456 ; F456
```
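As a rough illustration, the list above can be reproduced by keeping every prefix and every suffix of the identifier that has at least three characters. The following is a minimal sketch under that assumption. The `decompose` name and the `min_len` default are illustrative only. The full set of rules, including hyphen handling, is described later in this article.

```python
def decompose(identifier, min_len=3):
    """Return the sorted prefixes and suffixes of identifier with at least min_len characters."""
    variants = set()
    for length in range(min_len, len(identifier) + 1):
        variants.add(identifier[:length])   # From the start: ABC, ABC1, ABC12, ...
        variants.add(identifier[-length:])  # From the end: 456, F456, EF456, ...
    # sorted() orders strings by code point, so digits come before letters,
    # which matches the order of the list above.
    return sorted(variants)


print(" ; ".join(decompose("ABC123DEF456")))
```

Running it prints the same 19 terms as the list above. For an identifier like this one, trimming one or two characters from either end produces a value that is already among these prefixes and suffixes.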

## Leveraging field decomposition

You can implement field decomposition either in your own systems before sending data to Coveo, or by using a Coveo [indexing pipeline extension (IPE)](https://docs.coveo.com/en/206/).
If you prefer to handle decomposition upstream in your data processing workflows, you can generate the decomposed values and send them to a dedicated field in Coveo.
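If you decompose values upstream, you still need to serialize them for the multi-value field. The sketch below assumes that your pipeline represents each product as a Python dictionary. It also assumes that the target field (the hypothetical `product_id_decomposed`) receives a single semicolon-separated string, the same separator the example script later in this article uses. The `attach_decomposed` helper is illustrative, and sending the enriched record to Coveo is left to your existing ingestion code.

```python
def attach_decomposed(record, values, target_field="product_id_decomposed"):
    """Return a copy of record with the decomposed values stored as one multi-value string.

    Empty strings are dropped, duplicates are removed, and values are sorted
    so that repeated runs produce identical output.
    """
    unique_values = sorted({value for value in values if value})
    enriched = dict(record)  # Don't mutate the caller's record
    enriched[target_field] = ";".join(unique_values)
    return enriched


product = {"ec_product_id": "ABC123"}
print(attach_decomposed(product, ["ABC12", "ABC", "ABC1", "ABC", "", "ABC123"]))
```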

Alternatively, you can create a Coveo IPE to decompose fields during the [indexing](https://docs.coveo.com/en/204/) process.
An IPE runs a custom script that transforms item data as it's being indexed.

Regardless of which approach you choose, the decomposition best practices and algorithms outlined in this article apply to both strategies.
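The [example Python script](#example-field-decomposition-script) provided later can serve as a reference for implementing field decomposition either in your own systems or in a Coveo IPE.
If you run it outside Coveo, replace the IPE-specific `document` and `log` calls with your own data access and logging code.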

### Step 1: Create a decomposition field

The first step is to create a field that will store the decomposed values.

. Access the [**Fields**](https://platform.cloud.coveo.com/admin/#/orgid/content/fields/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/fields/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/fields/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/fields/)) page of the Coveo Administration Console.

. [Create a new field](https://docs.coveo.com/en/1833#add-a-field) that will store the decomposed values (for example, `product_id_decomposed`).
Make sure that this field has the **Multi-value facet** and **Free text search** options enabled.

** You should also ensure that you have a field that stores the original field value (for example, `ec_product_id`).

### Step 2: Create the Coveo indexing pipeline extension

After creating the field that will store the decomposed values, create a Coveo [indexing pipeline extension (IPE)](https://docs.coveo.com/en/206/) that will run the field decomposition script.
You'll then need to apply this IPE to the [source](https://docs.coveo.com/en/246/) that stores the items you want to decompose.

To create the Coveo indexing pipeline extension:

. Access the [**Extensions**](https://platform.cloud.coveo.com/admin/#/orgid/content/extensions/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/extensions/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/extensions/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/extensions/)) page of the Coveo Administration Console.

. Create a new indexing pipeline extension.
This is where you'll [write the script](#step-3-write-the-field-decomposition-script) that will decompose the field values.
See [Use indexing pipeline extensions](https://docs.coveo.com/en/3394/) for detailed instructions.

### Step 3: Write the field decomposition script

When creating the Coveo IPE, you'll be required to write a script.

Consider the following best practices when writing a field decomposition script.
These practices minimize index noise while maintaining effective searchability.
Note that an [example script](#example-field-decomposition-script) that includes these best practices is provided later in this article.

. Cleaning input data:


Remove any non-alphanumeric characters (except for specific [allowed characters](https://docs.coveo.com/en/2744/), like hyphens) from the field value to standardize it for processing.

. Handling hyphen-separated values:


Split the field value into parts at each hyphen.
For example, the product identifier `XYZ-789-DEF` would be split into `XYZ`, `789`, and `DEF`.

. Generating meaningful incremental variations:


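Create incremental variations from the start and from the end of the value, keeping only those with at least three or four characters (the example script uses three) to avoid index noise from overly short terms.
For example, the product identifier `ABC123DEF456` would generate the following variations:

** From the start: `ABC`, `ABC1`, `ABC12`, `ABC123`, `ABC123D`, `ABC123DE`, `ABC123DEF`, `ABC123DEF4`, `ABC123DEF45`, `ABC123DEF456`

** From the end: `456`, `F456`, `EF456`, `DEF456`, `3DEF456`, `23DEF456`, `123DEF456`, `C123DEF456`, `BC123DEF456`, `ABC123DEF456`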


This approach avoids one- and two-character variations, such as `A`, `B`, or `12`, that add noise to the index and reduce query precision.

. Hyphen-progressive concatenation:


For hyphenated identifiers, progressively add characters across hyphen boundaries to generate incremental variants that reflect how users type.
For example, the product identifier `XYZ-789-DEF` would generate the following variations:

** Progressive hyphenated: `XYZ-`, `XYZ-7`, `XYZ-78`, `XYZ-789`, `XYZ-789-`, `XYZ-789-D`, `XYZ-789-DE`, `XYZ-789-DEF`

** Fully concatenated: `XYZ789DEF`

** Individual parts: `XYZ`, `789`, `DEF`

. Normalization of separators:


Create a variation that replaces hyphens with spaces, along with the fully concatenated version.
For example, the product identifier `ABC123-GHI789` would generate:

** Space-separated: `ABC123 GHI789`

** Fully concatenated: `ABC123GHI789`

. Edge trimming for flexibility:


Generate variations that remove one or two leading or trailing characters, to account for users who only remember part of the identifier.
For example, the product identifier `ABC123DEF456` would generate:

** Without leading characters: `BC123DEF456`, `C123DEF456`

** Without trailing characters: `ABC123DEF45`, `ABC123DEF4`

. Combining all variations:


Combine all the variations generated in the previous steps into a single list of decomposed values.
Ensure that there are no duplicates, and sort the list alphabetically.
For example, the product identifier `ABC123DEF456` would generate the following optimized list:


```text
123DEF456 ; 23DEF456 ; 3DEF456 ; 456 ; ABC ; ABC1 ; ABC12 ; ABC123 ; ABC123D ; ABC123DE ; ABC123DEF ; ABC123DEF4 ; ABC123DEF45 ; ABC123DEF456 ; BC123DEF456 ; C123DEF456 ; DEF456 ; EF456 ; F456
```

Note how this approach eliminates one- and two-character terms while keeping the identifier findable from many partial inputs.
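The hyphen-related rules (progressive concatenation, separator normalization, and individual parts) are the least obvious to implement. The following minimal sketch isolates them for a single, already cleaned identifier. The `hyphen_variations` function is hypothetical and uses the same three-character minimum as the example script later in this article.

```python
def hyphen_variations(identifier, min_len=3):
    """Return progressive, concatenated, space-separated, and per-part variants."""
    parts = identifier.split("-")
    variants = set()

    # Progressive concatenation: add one character (or hyphen) at a time,
    # keeping the hyphens so that XYZ-, XYZ-7, ... match what users type.
    progressive = ""
    for index, part in enumerate(parts):
        if index > 0:
            progressive += "-"
            if len(progressive) >= min_len:
                variants.add(progressive)
        for char in part:
            progressive += char
            if len(progressive) >= min_len:
                variants.add(progressive)

    # Separator normalization: fully concatenated and space-separated forms.
    variants.add("".join(parts))
    variants.add(" ".join(parts))

    # Individual parts that meet the minimum length.
    variants.update(part for part in parts if len(part) >= min_len)
    return sorted(variants)


print(hyphen_variations("XYZ-789-DEF"))
print(hyphen_variations("ABC123-GHI789"))
```

In the full example script, these variants are combined with the prefixes and suffixes of the concatenated value, which is where terms such as `XYZ789` come from.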

### Step 4: Apply the indexing pipeline extension to a source

When you're done creating your IPE, you'll need to apply it to the source that contains the items you want to decompose.

To apply the IPE to a source:

. Access the [**Sources**](https://platform.cloud.coveo.com/admin/#/orgid/content/sources/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/sources/)) page of the Coveo Administration Console.

. Click the source to which you want to apply the indexing pipeline extension.

. In the Action bar, click **More** > **Edit extensions**.

. Click **Add** > **Extension**, and then select the indexing pipeline extension you created.

. Under **Stage**, select **Post-conversion**.

. Under **Action on error**, select **Skip extension**.

. Click **Apply extension**.
Then, rebuild the source so that the IPE is applied to the items it already contains.

### Step 5: Validate the field decomposition

After applying the IPE to the source, you should validate that the field decomposition is working as expected.
You can use the [**Content Browser**](https://platform.cloud.coveo.com/admin/#/orgid/content/browser/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/browser/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/browser/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/browser/)) to view the indexed items and verify that the decomposed field exists and contains the expected values.

If the field decomposition isn't working as expected, you can review the [**Log Browser**](https://platform.cloud.coveo.com/admin/#/orgid/logs/browser/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/logs/browser/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/logs/browser/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/logs/browser/)) to identify any errors or issues that may have occurred during the indexing process and adjust the field decomposition script accordingly.
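When reviewing an item in the Content Browser, it can help to compare its decomposed values against a short list of terms you expect users to type. The following sketch assumes that you've copied the field's values into a Python list, or split a semicolon-separated value. `check_decomposition` is a hypothetical helper, not part of any Coveo tool.

```python
def check_decomposition(indexed_values, expected_terms, min_len=3):
    """Compare indexed decomposed values with expected search terms.

    Returns a tuple (missing, too_short): expected terms that aren't indexed,
    and indexed values shorter than min_len that may add index noise.
    """
    indexed = set(indexed_values)
    missing = sorted(term for term in expected_terms if term not in indexed)
    too_short = sorted(value for value in indexed if len(value) < min_len)
    return missing, too_short


field_value = "456;ABC;ABC1;ABC12;ABC123;DEF456"
missing, too_short = check_decomposition(field_value.split(";"), ["ABC123", "DEF456", "C123DEF456"])
print("Missing:", missing)
print("Too short:", too_short)
```

A short value isn't necessarily an error: the example script intentionally keeps the original identifier even when it's shorter than the minimum length.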

## Example field decomposition script

The following example Python script decomposes a product identifier field value according to the best practices outlined in the previous section.
Note that this script considers the product identifier field to be a single value.
If your product identifier field contains multiple values, you'll need to adjust the script accordingly.

```python
import re

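def get_safe_meta_data(meta_data_name):
    # Return the field's values as a list, or an empty list if the field has no value
    meta_data_value = document.get_meta_data_value(meta_data_name)
    return list(meta_data_value) if meta_data_value else []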

def generate_variations(product_identifier):
    variations = set()
    MIN_VARIATION_LENGTH = 3  # Minimum decomposition length to avoid index noise

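    # Remove non-alphanumeric characters (excluding hyphens) from the product identifier
    cleaned_product_identifier = re.sub(r'[^A-Za-z0-9-]+', '', product_identifier)
    if not cleaned_product_identifier:
        return []  # Nothing to decompose (for example, an empty or punctuation-only value)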

    # Handle hyphen-separated parts
    parts = cleaned_product_identifier.split('-')
    concatenated_id = ''.join(parts)

    # 1. Generate incremental variations starting from minimum length for concatenated version
    # This avoids single/double character noise in the index
    for i in range(MIN_VARIATION_LENGTH, len(concatenated_id) + 1):
        variations.add(concatenated_id[:i])

    # 2. Generate suffix variations (from the end) for better searchability
    for i in range(MIN_VARIATION_LENGTH, len(concatenated_id) + 1):
        suffix = concatenated_id[-i:]
        if len(suffix) >= MIN_VARIATION_LENGTH:
            variations.add(suffix)

    # 3. Hyphen-progressive concatenation for hyphenated identifiers
    if len(parts) > 1:
        current_progressive = ""
        for part_idx, part in enumerate(parts):
            if part_idx > 0:
                current_progressive += "-"
                # Add variation with trailing hyphen if it meets minimum length
                if len(current_progressive) >= MIN_VARIATION_LENGTH:
                    variations.add(current_progressive)

            # Add each character progressively within the current part
            for char_idx in range(len(part)):
                current_progressive += part[char_idx]
                if len(current_progressive) >= MIN_VARIATION_LENGTH:
                    variations.add(current_progressive)

    # 4. Generate variations for individual parts that meet minimum length
    for part in parts:
        if len(part) >= MIN_VARIATION_LENGTH:
            variations.add(part)
            # Add incremental variations for parts longer than minimum
            for i in range(MIN_VARIATION_LENGTH, len(part)):
                variations.add(part[:i])

    # 5. Edge trimming: remove leading/trailing characters for flexibility
    if len(concatenated_id) > MIN_VARIATION_LENGTH + 1:
        # Remove 1-2 leading characters
        for trim_count in range(1, min(3, len(concatenated_id) - MIN_VARIATION_LENGTH + 1)):
            trimmed = concatenated_id[trim_count:]
            if len(trimmed) >= MIN_VARIATION_LENGTH:
                variations.add(trimmed)

        # Remove 1-2 trailing characters
        for trim_count in range(1, min(3, len(concatenated_id) - MIN_VARIATION_LENGTH + 1)):
            trimmed = concatenated_id[:-trim_count]
            if len(trimmed) >= MIN_VARIATION_LENGTH:
                variations.add(trimmed)

    # 6. Always include the original cleaned identifier
    variations.add(cleaned_product_identifier)

    # 7. Add space-separated version for better matching
    spaced_version = cleaned_product_identifier.replace('-', ' ')
    variations.add(spaced_version)

    # 8. Filter to ensure minimum length requirement (except for original forms)
    filtered_variations = []
    for v in variations:
        if (len(v) >= MIN_VARIATION_LENGTH or
            v == cleaned_product_identifier or
            v == spaced_version):
            filtered_variations.append(v)

    return sorted(list(set(filtered_variations)))

def main():
    product_identifier_meta_field = '<MY_PRODUCT_IDENTIFIER_FIELD>'  # <1>
    decomposed_meta_field = '<MY_DECOMPOSED_FIELD>'  # <2>

    product_identifiers = get_safe_meta_data(product_identifier_meta_field)
    decomposed_product_identifiers = []

    for product_identifier in product_identifiers:
        log(f"Processing Product Identifier: {product_identifier}")
        variations = generate_variations(product_identifier)
        if not variations:
            continue  # Skip values that produced no variations
        # Join the variations into a single semicolon-separated string
        decomposed_product_identifier = ';'.join(variations)
        log(f"Decomposed Product Identifier: {decomposed_product_identifier}")
        decomposed_product_identifiers.append(decomposed_product_identifier)

    if decomposed_product_identifiers:
        # Add one semicolon-separated string per product identifier to the decomposed field
        document.add_meta_data({decomposed_meta_field: decomposed_product_identifiers})

main()
```

<1> Replace `<MY_PRODUCT_IDENTIFIER_FIELD>` with the name of the field that holds the product identifiers (for example, `ec_product_id`).

<2> Replace `<MY_DECOMPOSED_FIELD>` with the name of the field that will store the decomposed values (for example, `product_id_decomposed`).
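Because the script relies on the `document` object and `log` function that the IPE runtime provides, it can't run as-is outside Coveo. One way to exercise the logic locally is to substitute minimal stand-ins. In the following sketch, `FakeDocument` only mimics the two `document` methods the example script calls, based on how the script uses them; the real objects may behave differently. The `generate_variations` function shown here is a simplified placeholder that you'd replace with the function from the example script.

```python
class FakeDocument:
    """Minimal stand-in for the IPE document object (assumed behavior)."""

    def __init__(self, metadata):
        self.metadata = {name: list(values) for name, values in metadata.items()}
        self.added = {}

    def get_meta_data_value(self, name):
        # Assumption: returns a list of values, empty if the field is absent.
        return self.metadata.get(name, [])

    def add_meta_data(self, values):
        self.added.update(values)


def log(message):
    print(message)


def generate_variations(product_identifier):
    # Placeholder: prefixes of at least three characters only.
    # Paste the full generate_variations function from the example script here.
    return [product_identifier[:i] for i in range(3, len(product_identifier) + 1)]


def run(document, source_field, target_field):
    """Mirror the example script's main() against an injected document."""
    decomposed = []
    for product_identifier in document.get_meta_data_value(source_field):
        variations = generate_variations(product_identifier)
        log(f"Decomposed {product_identifier}: {variations}")
        decomposed.append(";".join(variations))
    if decomposed:
        document.add_meta_data({target_field: decomposed})


doc = FakeDocument({"ec_product_id": ["ABC123"]})
run(doc, "ec_product_id", "product_id_decomposed")
print(doc.added)
```

Passing `document` in as a parameter, rather than relying on a global, is what makes the logic testable here. Inside the IPE, the example script uses the global `document` as-is.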