---
title: About field decomposition
slug: occg0574
canonical_url: https://docs.coveo.com/en/occg0574/
collection: coveo-for-commerce
source_format: adoc
---

# About field decomposition

Field decomposition breaks down a [field](https://docs.coveo.com/en/200/) value, such as a product identifier, into multiple variations stored in a separate field.
This lets users find complex string values, like model numbers or SKUs, by entering only part of the value while still retrieving accurate results.

**Example**

In B2B commerce scenarios, visitors often search for products using product identifiers.
These identifiers typically combine letters, numbers, and special characters, which makes them difficult to remember or type correctly.

For example, a product identifier might look like this: `ABC123DEF456`.

By decomposing the product identifier, you can generate additional search terms from it, such as:

```txt
123DEF456 ; 23DEF456 ; 3DEF456 ; 456 ; ABC ; ABC1 ; ABC12 ; ABC123 ; ABC123D ; ABC123DE ; ABC123DEF ; ABC123DEF4 ; ABC123DEF45 ; ABC123DEF456 ; BC123DEF456 ; C123DEF456 ; DEF456 ; EF456 ; F456
```

## Leveraging field decomposition

You can implement field decomposition either in your own systems before sending data to Coveo, or by using a Coveo [indexing pipeline extension (IPE)](https://docs.coveo.com/en/206/).

If you prefer to handle decomposition upstream in your data processing workflows, you can generate the decomposed values and send them to a dedicated field in Coveo.

Alternatively, you can create a Coveo [indexing pipeline extension (IPE)](https://docs.coveo.com/en/206/) to decompose fields during the [indexing](https://docs.coveo.com/en/204/) process.
With an IPE, you can run a custom script to transform data as it's being indexed.

Regardless of which approach you choose, the decomposition best practices and algorithms outlined in this article apply to both strategies.
The [example Python script](#example-field-decomposition-script) provided later can serve as a reference for implementing field decomposition in either your own systems or within a Coveo IPE.

### Step 1: Create a decomposition field

The first step is to create a field that will store the decomposed values.

1. Access the [**Fields**](https://platform.cloud.coveo.com/admin/#/orgid/content/fields/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/fields/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/fields/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/fields/)) page of the Coveo Administration Console.

2. [Create a new field](https://docs.coveo.com/en/1833#add-a-field) that will store the decomposed values (for example, `product_id_decomposed`).
   Make sure that this field has the **Multi-value facet** and **Free text search** options enabled.

   - You should also ensure that you have a field that stores the original field value (for example, `ec_product_id`).

### Step 2: Create the Coveo indexing pipeline extension

After creating the field that will store the decomposed values, create a Coveo [indexing pipeline extension (IPE)](https://docs.coveo.com/en/206/) that will run the field decomposition script.
You'll then need to apply this IPE to the [source](https://docs.coveo.com/en/246/) that stores the items you want to decompose.

To create the Coveo indexing pipeline extension:

1. Access the [**Extensions**](https://platform.cloud.coveo.com/admin/#/orgid/content/extensions/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/extensions/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/extensions/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/extensions/)) page of the Coveo Administration Console.

2. Create a new indexing pipeline extension.
   This is where you'll [write the script](#step-3-write-the-field-decomposition-script) that will decompose the field values.
   See [Use indexing pipeline extensions](https://docs.coveo.com/en/3394/) for detailed instructions.

### Step 3: Write the field decomposition script

When creating the Coveo IPE, you'll be required to write a script.

Consider the following best practices when writing a field decomposition script.
These practices minimize index noise while maintaining effective searchability.
An [example script](#example-field-decomposition-script) that applies these best practices is provided later in this article.

1. Cleaning input data: Remove any non-alphanumeric characters (except for specific [allowed characters](https://docs.coveo.com/en/2744/), like hyphens) from the field values to standardize them for processing.

2. Handling hyphen-separated values: Split the field value into separate parts at each hyphen.
   For example, the product identifier `XYZ-789-DEF` would be split into `XYZ`, `789`, and `DEF`.

3. Generating meaningful incremental variations: Create incremental substring variations containing at least three characters to avoid index noise from overly short terms.
   For example, the product identifier `ABC123DEF456` would generate the following variations:

   - `ABC`, `ABC1`, `ABC12`, `ABC123`, `ABC123D`, `ABC123DE`, `ABC123DEF`, `ABC123DEF4`, `ABC123DEF45`, `ABC123DEF456`

   This approach avoids generating single-character or double-character variations like `A`, `B`, or `12`, which can create noise and reduce query precision.

4. Hyphen-progressive concatenation: For hyphenated identifiers, progressively add characters across hyphen boundaries to generate incremental variants that reflect how users type.
   For example, the product identifier `XYZ-789-DEF` would generate the following variations:

   - Progressive hyphenated: `XYZ-`, `XYZ-7`, `XYZ-78`, `XYZ-789`, `XYZ-789-`, `XYZ-789-D`, `XYZ-789-DE`, `XYZ-789-DEF`
   - Fully concatenated: `XYZ789DEF`
   - Individual parts: `XYZ`, `789`, `DEF`

5. Normalization of separators: Create variations that replace hyphens with spaces, and provide the fully concatenated version.
   For example, the product identifier `ABC123-GHI789` would generate:

   - Space-separated: `ABC123 GHI789`
   - Fully concatenated: `ABC123GHI789`

6. Edge trimming for flexibility: Generate variations that remove one or two leading or trailing characters to account for partial recall.
   For example, the product identifier `ABC123DEF456` would generate:

   - Without leading characters: `BC123DEF456`, `C123DEF456`
   - Without trailing characters: `ABC123DEF45`, `ABC123DEF4`

7. Combining all variations: Combine all the variations generated in the previous steps into a single list of decomposed values.
   Ensure that there are no duplicates, and sort the list alphabetically.
   For example, the product identifier `ABC123DEF456` would generate the following optimized list:

   ```text
   123DEF456 ; 23DEF456 ; 3DEF456 ; 456 ; ABC ; ABC1 ; ABC12 ; ABC123 ; ABC123D ; ABC123DE ; ABC123DEF ; ABC123DEF4 ; ABC123DEF45 ; ABC123DEF456 ; BC123DEF456 ; C123DEF456 ; DEF456 ; EF456 ; F456
   ```

Note how this approach eliminates single-character and double-character noise while maintaining comprehensive searchability.

### Step 4: Apply the indexing pipeline extension to a source

When you're done creating your IPE, you'll need to apply it to the source that contains the items you want to decompose.

To apply the IPE to a source:

1. Access the [**Sources**](https://platform.cloud.coveo.com/admin/#/orgid/content/sources/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/sources/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/sources/)) page of the Coveo Administration Console.

2. Click the source to which you want to apply the indexing pipeline extension.

3. In the Action bar, click **More** > **Edit extensions**.

4. Click **Add** > **Extension**, and then select the indexing pipeline extension you created.

5. Under **Stage**, select **Post-conversion**.

6. Under **Action on error**, select **Skip extension**.

7. Click **Apply extension**.

You'll need to rebuild the source for the IPE to apply to the items.

### Step 5: Validate the field decomposition

After applying the IPE to the source, you should validate that the field decomposition is working as expected.
You can use the [**Content Browser**](https://platform.cloud.coveo.com/admin/#/orgid/content/browser/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/content/browser/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/content/browser/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/content/browser/)) to view the indexed items and verify that the decomposed field exists and contains the expected values.

If the field decomposition isn't working as expected, review the [**Log Browser**](https://platform.cloud.coveo.com/admin/#/orgid/logs/browser/) ([platform-ca](https://platform-ca.cloud.coveo.com/admin/#/orgid/logs/browser/) | [platform-eu](https://platform-eu.cloud.coveo.com/admin/#/orgid/logs/browser/) | [platform-au](https://platform-au.cloud.coveo.com/admin/#/orgid/logs/browser/)) to identify any errors that occurred during the indexing process, and adjust the field decomposition script accordingly.
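
Part of this check can be scripted. The following is a minimal sketch, assuming you copy the decomposed field value of an item as a semicolon-separated string. `check_decomposed_value` and its parameters are hypothetical names used for illustration, not part of any Coveo API, and the minimum length of 3 is an assumption that mirrors `MIN_VARIATION_LENGTH` in the example script.

```python
MIN_TERM_LENGTH = 3  # assumption: same minimum as MIN_VARIATION_LENGTH in the example script


def check_decomposed_value(raw_value, expected_terms, min_length=MIN_TERM_LENGTH):
    """Return a list of human-readable problems found in a decomposed field value."""
    # Split the semicolon-separated value and ignore empty fragments
    terms = [term.strip() for term in raw_value.split(";") if term.strip()]
    problems = []

    # Partial values that a shopper is expected to be able to search for
    missing = sorted(set(expected_terms) - set(terms))
    if missing:
        problems.append(f"missing terms: {missing}")

    # Very short terms are the index noise the best practices try to avoid
    too_short = sorted({term for term in terms if len(term) < min_length})
    if too_short:
        problems.append(f"terms shorter than {min_length} characters: {too_short}")

    # The combined list should not contain the same term twice
    duplicates = sorted({term for term in terms if terms.count(term) > 1})
    if duplicates:
        problems.append(f"duplicate terms: {duplicates}")

    return problems


# Hypothetical value copied from the decomposed field of an indexed item
copied_value = "123DEF456;456;ABC;ABC1;ABC123DEF456;DEF456"
print(check_decomposed_value(copied_value, ["ABC", "DEF456", "ABC123DEF456"]))  # prints []
```

An empty list means that none of the three checks found a problem. Because the example script always keeps the original identifier, a "shorter than" message is expected for identifiers that are themselves under the minimum length.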
## Example field decomposition script

The following example Python script decomposes a product identifier field value according to the best practices outlined in the previous section.

Note that this script considers the product identifier field to be a single value.
If your product identifier field contains multiple values, you'll need to adjust the script accordingly.

```python
import re

# `document` and `log` are provided by the indexing pipeline extension runtime.


def get_safe_meta_data(meta_data_name):
    meta_data_value = document.get_meta_data_value(meta_data_name)
    return list(meta_data_value)


def generate_variations(product_identifier):
    variations = set()
    MIN_VARIATION_LENGTH = 3  # Minimum decomposition length to avoid index noise

    # Remove non-alphanumeric characters (excluding hyphens) from the product identifier
    cleaned_product_identifier = re.sub(r'[^A-Za-z0-9-]+', '', product_identifier)

    # Handle hyphen-separated parts
    parts = cleaned_product_identifier.split('-')
    concatenated_id = ''.join(parts)

    # 1. Generate incremental variations starting from minimum length for concatenated version
    # This avoids single/double character noise in the index
    for i in range(MIN_VARIATION_LENGTH, len(concatenated_id) + 1):
        variations.add(concatenated_id[:i])

    # 2. Generate suffix variations (from the end) for better searchability
    for i in range(MIN_VARIATION_LENGTH, len(concatenated_id) + 1):
        suffix = concatenated_id[-i:]
        if len(suffix) >= MIN_VARIATION_LENGTH:
            variations.add(suffix)

    # 3. Hyphen-progressive concatenation for hyphenated identifiers
    if len(parts) > 1:
        current_progressive = ""
        for part_idx, part in enumerate(parts):
            if part_idx > 0:
                current_progressive += "-"
                # Add variation with trailing hyphen if it meets minimum length
                if len(current_progressive) >= MIN_VARIATION_LENGTH:
                    variations.add(current_progressive)

            # Add each character progressively within the current part
            for char_idx in range(len(part)):
                current_progressive += part[char_idx]
                if len(current_progressive) >= MIN_VARIATION_LENGTH:
                    variations.add(current_progressive)

    # 4. Generate variations for individual parts that meet minimum length
    for part in parts:
        if len(part) >= MIN_VARIATION_LENGTH:
            variations.add(part)
            # Add incremental variations for parts longer than minimum
            for i in range(MIN_VARIATION_LENGTH, len(part)):
                variations.add(part[:i])

    # 5. Edge trimming: remove leading/trailing characters for flexibility
    if len(concatenated_id) > MIN_VARIATION_LENGTH + 1:
        # Remove 1-2 leading characters
        for trim_count in range(1, min(3, len(concatenated_id) - MIN_VARIATION_LENGTH + 1)):
            trimmed = concatenated_id[trim_count:]
            if len(trimmed) >= MIN_VARIATION_LENGTH:
                variations.add(trimmed)

        # Remove 1-2 trailing characters
        for trim_count in range(1, min(3, len(concatenated_id) - MIN_VARIATION_LENGTH + 1)):
            trimmed = concatenated_id[:-trim_count]
            if len(trimmed) >= MIN_VARIATION_LENGTH:
                variations.add(trimmed)

    # 6. Always include the original cleaned identifier
    variations.add(cleaned_product_identifier)

    # 7. Add space-separated version for better matching
    spaced_version = cleaned_product_identifier.replace('-', ' ')
    variations.add(spaced_version)

    # 8. Filter to ensure minimum length requirement (except for original forms)
    filtered_variations = []
    for v in variations:
        if (len(v) >= MIN_VARIATION_LENGTH
                or v == cleaned_product_identifier
                or v == spaced_version):
            filtered_variations.append(v)

    return sorted(list(set(filtered_variations)))


def main():
    product_identifier_meta_field = ''  # <1>
    decomposed_meta_field = ''  # <2>

    product_identifiers = get_safe_meta_data(product_identifier_meta_field)
    decomposed_product_identifiers = []

    for product_identifier in product_identifiers:
        log(f"Processing Product Identifier: {product_identifier}")
        variations = generate_variations(product_identifier)

        # Properly join variations with a semicolon
        decomposed_product_identifier = ';'.join(variations)
        log(f"Decomposed Product Identifier: {decomposed_product_identifier}")
        decomposed_product_identifiers.append(decomposed_product_identifier)

    if decomposed_product_identifiers:
        # Add metadata as a semicolon-separated string
        document.add_meta_data({decomposed_meta_field: decomposed_product_identifiers})


main()
```

<1> Set `product_identifier_meta_field` to the name of the field that holds the product identifiers (for example, `ec_product_id`).

<2> Set `decomposed_meta_field` to the name of the field that will store the decomposed values (for example, `product_id_decomposed`).
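
If you handle decomposition upstream instead of in an IPE, the same idea can run in your own data pipeline before the items are sent to Coveo. The following is a minimal sketch under stated assumptions, not a replacement for the script above: `decompose_simple` and `build_item` are hypothetical helpers, the decomposition is deliberately reduced to prefixes and suffixes of the hyphen-free identifier, and the field names reuse the `ec_product_id` and `product_id_decomposed` examples from this article. How the resulting item is sent to Coveo is out of scope here.

```python
import re

MIN_LENGTH = 3  # assumption: same minimum term length as the example script


def decompose_simple(product_identifier, min_length=MIN_LENGTH):
    """Return the sorted prefix and suffix terms of the cleaned identifier."""
    # Keep letters and digits only; hyphens are dropped in this simplified version
    cleaned = re.sub(r"[^A-Za-z0-9]+", "", product_identifier)
    terms = set()
    for length in range(min_length, len(cleaned) + 1):
        terms.add(cleaned[:length])   # prefix of this length
        terms.add(cleaned[-length:])  # suffix of this length
    return sorted(terms)


def build_item(product):
    """Copy a product record and attach the decomposed identifier terms."""
    item = dict(product)
    # Semicolon-separated string, matching the join used in the example script
    item["product_id_decomposed"] = ";".join(decompose_simple(product["ec_product_id"]))
    return item


# Hypothetical product record from an upstream catalog export
item = build_item({"ec_product_id": "ABC123DEF456", "ec_name": "Sample product"})
print(item["product_id_decomposed"])
```

For `ABC123DEF456`, this yields the same 19 terms as the optimized list shown in step 3. Hyphenated identifiers need the additional hyphen-progressive and space-separated variations from the full script.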