About field decomposition

Field decomposition breaks down a field value, such as a product identifier, into multiple permutations stored in a separate field. This allows users to easily search for complex string values, like model numbers or SKUs, by entering only a partial value while still retrieving accurate results.

Example

In B2B commerce scenarios, visitors often search for products using product identifiers.

Product identifiers often consist of a combination of letters, numbers, and special characters, making them difficult to remember or type correctly. For example, a product identifier might look like this: ABC123DEF456.

By decomposing the product identifier, you can generate additional search terms from it, such as:

123DEF456 ; 23DEF456 ; 3DEF456 ; 456 ; ABC ; ABC1 ; ABC12 ; ABC123 ; ABC123D ; ABC123DE ; ABC123DEF ; ABC123DEF4 ; ABC123DEF45 ; ABC123DEF456 ; BC123DEF456 ; C123DEF456 ; DEF456 ; EF456 ; F456
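
As a rough illustration, the list above can be reproduced by collecting every prefix and suffix of the identifier that is at least three characters long. The function name prefix_suffix_terms and its min_length parameter are hypothetical names chosen for this sketch, not part of any Coveo API.

```python
def prefix_suffix_terms(identifier, min_length=3):
    """Return the sorted prefixes and suffixes of at least min_length characters."""
    terms = set()
    for length in range(min_length, len(identifier) + 1):
        terms.add(identifier[:length])   # leading part, e.g. ABC, ABC1, ...
        terms.add(identifier[-length:])  # trailing part, e.g. 456, F456, ...
    return sorted(terms)


print(" ; ".join(prefix_suffix_terms("ABC123DEF456")))
```

An identifier shorter than the minimum length yields an empty list, so very short identifiers would need to be indexed through their original field.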

Leveraging field decomposition

You can implement field decomposition either in your own systems before sending data to Coveo, or by using a Coveo indexing pipeline extension (IPE). If you prefer to handle decomposition upstream in your data processing workflows, you can generate the decomposed values and send them to a dedicated field in Coveo.

Alternatively, you can create a Coveo indexing pipeline extension (IPE) to decompose fields during the indexing process. Using this feature, you can run a custom script to transform data as it’s being indexed.

Regardless of which approach you choose, the decomposition best practices and algorithms outlined in this article apply to both strategies. The example Python script provided later can serve as a reference for implementing field decomposition in either your own systems or within a Coveo IPE.
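
If you decompose upstream, the work amounts to adding one more field to each item before you send it to Coveo. The sketch below assumes that items are plain dictionaries and reuses the example field names from this article (ec_product_id and product_id_decomposed). The helper names and the simple prefix-based decomposition are illustrative stand-ins, and no Coveo API is called.

```python
def decompose(identifier, min_length=3):
    """Stand-in decomposition: prefixes of at least min_length characters."""
    return sorted({identifier[:n] for n in range(min_length, len(identifier) + 1)})


def add_decomposed_field(item, source_field="ec_product_id",
                         target_field="product_id_decomposed"):
    """Return a copy of item with the decomposed values joined by semicolons."""
    enriched = dict(item)
    identifier = item.get(source_field, "")
    if identifier:
        enriched[target_field] = ";".join(decompose(identifier))
    return enriched


# Hypothetical item; only ec_product_id matters for this sketch.
item = {"ec_name": "Widget", "ec_product_id": "ABC123"}
print(add_decomposed_field(item))
```

Returning a copy rather than modifying the item in place keeps the original record available for comparison when you validate the results.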

Step 1: Create a decomposition field

The first step is to create a field that will store the decomposed values.

  1. Access the Fields (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.

  2. Create a new field that will store the decomposed values (for example, product_id_decomposed). Make sure that this field has the Multi-value facet and Free text search options enabled.

    • You should also ensure that you have a field that stores the original field value (for example, ec_product_id).

Step 2: Create the Coveo indexing pipeline extension

After creating the field that will store the decomposed values, create a Coveo indexing pipeline extension (IPE) that will run the field decomposition script. You’ll then need to apply this IPE to the source that contains the items whose field values you want to decompose.

To create the Coveo indexing pipeline extension:

  1. Access the Extensions (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.

  2. Create a new indexing pipeline extension. This is where you’ll write the script that will decompose the field values. See Use indexing pipeline extensions for detailed instructions.

Step 3: Write the field decomposition script

When creating the Coveo IPE, you’ll be required to write a script.

Consider the following best practices when writing a field decomposition script. These practices minimize index noise while maintaining effective searchability. Note that an example script that includes these best practices is provided later in this article.

  1. Cleaning input data:

    Remove any non-alphanumeric characters, except for specifically allowed characters such as hyphens, to standardize the field value for processing.

  2. Handling hyphen-separated values:

    Split the field value into separate parts based on the hyphens used in the field value. For example, the product identifier XYZ-789-DEF would be split into XYZ, 789, and DEF.

  3. Generating meaningful incremental variations:

    Create incremental substring variations with a minimum length of three or four characters to avoid index noise from overly short terms. With a minimum length of three, which is what the example script uses, the product identifier ABC123DEF456 would generate the following variations:

    • ABC, ABC1, ABC12, ABC123, ABC123D, ABC123DE, ABC123DEF, ABC123DEF4, ABC123DEF45, ABC123DEF456

    This approach avoids generating single- or double-character variations, such as A, B, or 12, which can create noise and reduce query precision.

  4. Hyphen-progressive concatenation:

    For hyphenated identifiers, progressively add characters across hyphen boundaries to generate incremental variants that reflect how users type. For example, the product identifier XYZ-789-DEF would generate the following variations:

    • Progressive hyphenated: XYZ-, XYZ-7, XYZ-78, XYZ-789, XYZ-789-, XYZ-789-D, XYZ-789-DE, XYZ-789-DEF

    • Fully concatenated: XYZ789DEF

    • Individual parts: XYZ, 789, DEF

  5. Normalization of separators:

    Create variations that replace dashes with spaces and provide the fully concatenated version. For example, the product identifier ABC123-GHI789 would generate:

    • Space-separated: ABC123 GHI789

    • Fully concatenated: ABC123GHI789

  6. Edge trimming for flexibility:

    Generate variations that remove one or two leading or trailing characters to account for partial recall. For example, the product identifier ABC123DEF456 would generate:

    • Without leading characters: BC123DEF456, C123DEF456

    • Without trailing characters: ABC123DEF45, ABC123DEF4

  7. Combining all variations:

    Combine all the variations generated in the previous steps into a single list of decomposed values. Ensure that there are no duplicates, and sort the list alphabetically. For example, the product identifier ABC123DEF456 would generate the following optimized list:

    123DEF456 ; 23DEF456 ; 3DEF456 ; 456 ; ABC ; ABC1 ; ABC12 ; ABC123 ; ABC123D ; ABC123DE ; ABC123DEF ; ABC123DEF4 ; ABC123DEF45 ; ABC123DEF456 ; BC123DEF456 ; C123DEF456 ; DEF456 ; EF456 ; F456

    Note how this approach eliminates single-character noise while maintaining comprehensive searchability.
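
The hyphen-related practices (2, 4, and 5) can be sketched in isolation as follows. This is a simplified illustration rather than the full script: the function name hyphen_variations is hypothetical, the input is assumed to be already cleaned, and only a basic minimum-length filter is applied.

```python
def hyphen_variations(identifier, min_length=3):
    """Return hyphen-aware variants of an already cleaned identifier."""
    parts = identifier.split("-")          # practice 2: XYZ, 789, DEF
    variations = set(parts)

    # Practice 4: grow the identifier one character at a time, keeping hyphens.
    progressive = ""
    for index, part in enumerate(parts):
        if index > 0:
            progressive += "-"
            variations.add(progressive)    # e.g. "XYZ-"
        for char in part:
            progressive += char
            variations.add(progressive)    # e.g. "XYZ-7"

    # Practice 5: normalize separators.
    variations.add(identifier.replace("-", " "))  # "XYZ 789 DEF"
    variations.add("".join(parts))                # "XYZ789DEF"

    # Drop terms that are too short to be useful, such as "X" and "XY".
    return sorted(v for v in variations if len(v) >= min_length)


print(hyphen_variations("XYZ-789-DEF"))
```

Keeping the trailing-hyphen forms, such as XYZ-, is a design choice that mirrors what a user has typed at the moment they reach a separator.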

Step 4: Apply the indexing pipeline extension to a source

When you’re done creating your IPE, you’ll need to apply it to the source that contains the items whose field values you want to decompose.

To apply the IPE to a source:

  1. Access the Sources (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.

  2. Click the source for which you want to apply the indexing pipeline extension.

  3. In the Action bar, click More > Edit extensions.

  4. Click Add > Extension, and then select the indexing pipeline extension you created.

  5. Under Stage, select Post-conversion.

  6. Under Action on error, select Skip extension.

  7. Click Apply extension. You’ll need to rebuild the source for the IPE to apply to the items.

Step 5: Validate the field decomposition

After applying the IPE to the source, you should validate that the field decomposition is working as expected. You can use the Content Browser (platform-ca | platform-eu | platform-au) to view the indexed items and verify that the decomposed field exists and contains the expected values.

If the field decomposition isn’t working as expected, you can review the Log Browser (platform-ca | platform-eu | platform-au) to identify any errors or issues that may have occurred during the indexing process and adjust the field decomposition script accordingly.
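
Before rebuilding a source, it can also help to sanity-check the script's output locally. The helper below is a hypothetical sketch: the name check_decomposition and the rules it enforces are assumptions drawn from the best practices in Step 3. It takes the cleaned original identifier and the list of decomposed values, and reports anything that looks wrong.

```python
def check_decomposition(original, values, min_length=3):
    """Return a list of human-readable problems found in values."""
    problems = []
    if original not in values:
        problems.append("original identifier is missing")
    if len(values) != len(set(values)):
        problems.append("duplicate values found")
    if list(values) != sorted(values):
        problems.append("values are not sorted")
    # The original identifier is allowed to be shorter than the minimum.
    too_short = [v for v in values if len(v) < min_length and v != original]
    if too_short:
        problems.append(f"values shorter than {min_length}: {too_short}")
    return problems


print(check_decomposition("ABC123", ["ABC", "ABC1", "ABC12", "ABC123"]))
print(check_decomposition("ABC123", ["ABC1", "A", "ABC1"]))
```

An empty list means that none of the checks failed. It doesn't prove that the values are the ones you want, so the Content Browser check remains the final validation.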

Example field decomposition script

The following example Python script decomposes a product identifier field value according to the best practices outlined in Step 3. Note that this script considers the product identifier field to be a single value. If your product identifier field contains multiple values, you’ll need to adjust the script accordingly.

import re

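def get_safe_meta_data(meta_data_name):
    # Return the metadata values as a list, or an empty list when none are set
    meta_data_value = document.get_meta_data_value(meta_data_name)
    return list(meta_data_value) if meta_data_value else []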

def generate_variations(product_identifier):
    variations = set()
    MIN_VARIATION_LENGTH = 3  # Minimum decomposition length to avoid index noise

    # Remove non-alphanumeric characters (excluding hyphens) from the product identifier
    cleaned_product_identifier = re.sub(r'[^A-Za-z0-9-]+', '', product_identifier)

    # Handle hyphen-separated parts
    parts = cleaned_product_identifier.split('-')
    concatenated_id = ''.join(parts)

    # 1. Generate incremental variations starting from minimum length for concatenated version
    # This avoids single/double character noise in the index
    for i in range(MIN_VARIATION_LENGTH, len(concatenated_id) + 1):
        variations.add(concatenated_id[:i])

    # 2. Generate suffix variations (from the end) for better searchability
    for i in range(MIN_VARIATION_LENGTH, len(concatenated_id) + 1):
        suffix = concatenated_id[-i:]
        if len(suffix) >= MIN_VARIATION_LENGTH:
            variations.add(suffix)

    # 3. Hyphen-progressive concatenation for hyphenated identifiers
    if len(parts) > 1:
        current_progressive = ""
        for part_idx, part in enumerate(parts):
            if part_idx > 0:
                current_progressive += "-"
                # Add variation with trailing hyphen if it meets minimum length
                if len(current_progressive) >= MIN_VARIATION_LENGTH:
                    variations.add(current_progressive)

            # Add each character progressively within the current part
            for char_idx in range(len(part)):
                current_progressive += part[char_idx]
                if len(current_progressive) >= MIN_VARIATION_LENGTH:
                    variations.add(current_progressive)

    # 4. Generate variations for individual parts that meet minimum length
    for part in parts:
        if len(part) >= MIN_VARIATION_LENGTH:
            variations.add(part)
            # Add incremental variations for parts longer than minimum
            for i in range(MIN_VARIATION_LENGTH, len(part)):
                variations.add(part[:i])

    # 5. Edge trimming: remove leading/trailing characters for flexibility
    if len(concatenated_id) > MIN_VARIATION_LENGTH + 1:
        # Remove 1-2 leading characters
        for trim_count in range(1, min(3, len(concatenated_id) - MIN_VARIATION_LENGTH + 1)):
            trimmed = concatenated_id[trim_count:]
            if len(trimmed) >= MIN_VARIATION_LENGTH:
                variations.add(trimmed)

        # Remove 1-2 trailing characters
        for trim_count in range(1, min(3, len(concatenated_id) - MIN_VARIATION_LENGTH + 1)):
            trimmed = concatenated_id[:-trim_count]
            if len(trimmed) >= MIN_VARIATION_LENGTH:
                variations.add(trimmed)

    # 6. Always include the original cleaned identifier
    variations.add(cleaned_product_identifier)

    # 7. Add space-separated version for better matching
    spaced_version = cleaned_product_identifier.replace('-', ' ')
    variations.add(spaced_version)

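    # 8. Filter to ensure minimum length requirement (except for original forms)
    #    and drop empty or whitespace-only strings left by an empty identifier
    filtered_variations = []
    for v in variations:
        if not v.strip():
            continue
        if (len(v) >= MIN_VARIATION_LENGTH or
            v == cleaned_product_identifier or
            v == spaced_version):
            filtered_variations.append(v)

    # variations is a set, so the filtered list contains no duplicates
    return sorted(filtered_variations)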

def main():
    product_identifier_meta_field = '<MY_PRODUCT_IDENTIFIER_FIELD>'  # 1
    decomposed_meta_field = '<MY_DECOMPOSED_FIELD>'  # 2

    product_identifiers = get_safe_meta_data(product_identifier_meta_field)
    decomposed_product_identifiers = []

    for product_identifier in product_identifiers:
        log(f"Processing Product Identifier: {product_identifier}")
        variations = generate_variations(product_identifier)
        # Properly join variations with a semicolon
        decomposed_product_identifier = ';'.join(variations)
        log(f"Decomposed Product Identifier: {decomposed_product_identifier}")
        decomposed_product_identifiers.append(decomposed_product_identifier)

    if decomposed_product_identifiers:
        # Add one semicolon-separated string per product identifier
        document.add_meta_data({decomposed_meta_field: decomposed_product_identifiers})

main()
1 Replace <MY_PRODUCT_IDENTIFIER_FIELD> with the name of the field that holds the product identifiers (for example, ec_product_id).
2 Replace <MY_DECOMPOSED_FIELD> with the name of the field that will store the decomposed values (for example, product_id_decomposed).
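
If a single field value can hold several identifiers, one possible adjustment is to split the value first and decompose each identifier separately. The sketch below assumes that the identifiers are separated by semicolons, which may not match your data. The decompose_multi helper is hypothetical, and its decompose argument stands for any function that returns a list of variations, such as generate_variations in the script above.

```python
def decompose_multi(raw_value, decompose, separator=";"):
    """Split raw_value on separator and merge the variations of each identifier."""
    merged = set()
    for identifier in raw_value.split(separator):
        identifier = identifier.strip()
        if identifier:  # skip empty fragments such as the middle of "A;;B"
            merged.update(decompose(identifier))
    return sorted(merged)


def prefixes(identifier, min_length=3):
    """Stand-in for generate_variations, used only to keep this sketch runnable."""
    return [identifier[:n] for n in range(min_length, len(identifier) + 1)]


print(decompose_multi("ABC12; XYZ9", prefixes))
```

Merging into a set before sorting keeps the result free of duplicates when two identifiers share a prefix.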