About field decomposition

Field decomposition breaks down a field value, such as a product identifier, into multiple variations stored in a separate field. This lets users search for complex string values, such as model numbers or SKUs, by entering only a partial value while still retrieving accurate results.

Example

In B2B commerce scenarios, visitors often search for products using product identifiers.

Product identifiers often consist of a combination of letters, numbers, and special characters, making them difficult to remember or type correctly. For example, a product identifier might look like this: ABC-123-XYZ.

By decomposing the product identifier, you can generate additional search terms from it, such as:

ABC123XYZ
ABC 123 XYZ
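
As a minimal sketch of the idea, the two variations above can be derived from the original identifier with plain string operations. The function name `simple_variations` is hypothetical and used only for illustration.

```python
def simple_variations(product_identifier):
    """Return the identifier with hyphens removed and with hyphens replaced by spaces."""
    return [
        product_identifier.replace('-', ''),   # ABC-123-XYZ -> ABC123XYZ
        product_identifier.replace('-', ' '),  # ABC-123-XYZ -> ABC 123 XYZ
    ]


print(simple_variations('ABC-123-XYZ'))
```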

Leveraging field decomposition

A good way to decompose fields is to create a Coveo indexing pipeline extension (IPE). Using this feature, you can run a custom script to transform data during the indexing process.

This means that you can use a script on your product source to decompose the field values, generate the desired variations, and store them in a dedicated field.

Note that an example Python script is provided later in this article.

Step 1: Create a decomposition field

The first step is to create a field that will store the decomposed values.

Access the Fields (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.
Create a new field that will store the decomposed values (for example, ec_product_id_decomposed). Make sure that this field has the Multi-value facet and Free text search options enabled.
- You should also ensure that you have a field that stores the original field value (for example, ec_product_id).

Step 2: Create the Coveo indexing pipeline extension

After creating the field that will store the decomposed values, create a Coveo IPE that will run the field decomposition script. You’ll then need to apply this IPE to the source that stores the items you want to decompose.

To create the Coveo indexing pipeline extension:

Access the Extensions (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.
Create a new indexing pipeline extension. This is where you’ll write the script that will decompose the field values. See Use indexing pipeline extensions for detailed instructions.

Step 3: Write the field decomposition script

When creating the Coveo IPE, you’ll be required to write a script.

Consider the following best practices when writing a field decomposition script. An example script that applies these best practices is provided in the Example field decomposition script section of this article.

Cleaning input data:

Remove any non-alphanumeric characters (except for specific allowed characters, like hyphens) from the field value to standardize it for processing.

Handling hyphen-separated values:

Split the field value into separate parts based on the hyphens used in the field value. For example, the product identifier ABC-123-XYZ would be split into ABC, 123, and XYZ.
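
The cleaning and splitting steps above can be sketched as follows. This is an illustration only: the helper name `clean_and_split` is hypothetical, and the set of allowed characters (letters, digits, and hyphens) is an assumption you may need to adapt to your own identifiers.

```python
import re


def clean_and_split(product_identifier):
    """Strip disallowed characters, then split the cleaned value on hyphens."""
    # Keep only letters, digits, and hyphens.
    cleaned = re.sub(r'[^A-Za-z0-9-]+', '', product_identifier)
    return cleaned, cleaned.split('-')


cleaned, parts = clean_and_split(' ABC-123-XYZ#')
print(cleaned)  # ABC-123-XYZ
print(parts)    # ['ABC', '123', 'XYZ']
```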

Generating incremental variations:

Create all possible prefixes from each part of the field value (obtained in the previous step). For example, the product identifier ABC-123-XYZ would generate the following variations:
- Part ABC: A, AB, ABC
- Part 123: 1, 12, 123
- Part XYZ: X, XY, XYZ
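
A short sketch of prefix generation, assuming the parts have already been split out. The helper name `prefixes` is hypothetical.

```python
def prefixes(text):
    """Return every prefix of text, from the first character up to the full value."""
    return [text[:i] for i in range(1, len(text) + 1)]


for part in ['ABC', '123', 'XYZ']:
    print(part, prefixes(part))
```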

Generating hyphen-separated incremental variations of the field value:

Starting from each part in turn, join that part and the parts that follow it with hyphens, and create incremental variations of each joined value. For example, the product identifier ABC-123-XYZ would generate the following variations:
- Part ABC-123-XYZ: ABC-, ABC-1, ABC-12, ABC-123, ABC-123-, ABC-123-X, ABC-123-XY, ABC-123-XYZ
- Part 123-XYZ: 123-, 123-X, 123-XY, 123-XYZ
- Part XYZ: X, XY, XYZ
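
One possible way to produce these hyphen-separated variations is to walk over every run of consecutive parts and collect the prefixes of the joined value. The function name `hyphenated_prefixes` is hypothetical. Note that this sketch also returns the plain prefixes of each part (such as A, AB, and ABC), which the list above omits because they were covered in the previous step.

```python
def hyphenated_prefixes(parts):
    """Collect every prefix of every hyphen-joined run of consecutive parts."""
    variations = set()
    for start in range(len(parts)):
        for end in range(start, len(parts)):
            combined = '-'.join(parts[start:end + 1])
            variations.update(combined[:i] for i in range(1, len(combined) + 1))
    return variations


print(sorted(hyphenated_prefixes(['ABC', '123', 'XYZ'])))
```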

Generating incremental variations of the concatenated field value:

Create incremental variations for the field value with its hyphens removed, starting from the first character. For example, the product identifier ABC-123-XYZ would generate the following variations:
- A
- AB
- ABC
- ABC1
- ABC12
- ABC123
- ABC123X
- ABC123XY
- ABC123XYZ
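
As a rough sketch, this step differs from the per-part prefixes only in that the parts are joined without a separator first. The function name `concatenated_prefixes` is hypothetical.

```python
def concatenated_prefixes(parts):
    """Join the parts without hyphens, then return every prefix of the result."""
    concatenated = ''.join(parts)
    return [concatenated[:i] for i in range(1, len(concatenated) + 1)]


print(concatenated_prefixes(['ABC', '123', 'XYZ']))
```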

Making leading and trailing characters optional:

Create variations where either the leading two or the trailing two characters are removed, for both the hyphenated value and the version without hyphens. For example, the product identifier ABC-123-XYZ would generate the following variations in its hyphenated form:
- Without the leading two characters: C-123-XYZ
- Without the trailing two characters: ABC-123-X
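
A minimal sketch of this trimming step. The function name `trim_variations` and its `count` parameter are hypothetical; the guard against short values is an assumption that avoids producing empty strings.

```python
def trim_variations(identifier, count=2):
    """Return the identifier without its first `count` and without its last `count` characters."""
    if len(identifier) <= count:
        return []  # Too short to trim without emptying the value.
    return [identifier[count:], identifier[:-count]]


print(trim_variations('ABC-123-XYZ'))  # ['C-123-XYZ', 'ABC-123-X']
print(trim_variations('ABC123XYZ'))    # ['C123XYZ', 'ABC123X']
```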

Combining all variations:

Combine all the variations generated in the previous steps into a single list of decomposed values. Ensure that there are no duplicates, and sort the list alphabetically. For example, the product identifier ABC-123-XYZ would generate the following list:
```
1; 12; 123; 123-; 123-X; 123-XY; 123-XYZ; A; AB; ABC; ABC 123 XYZ; ABC-; ABC-1; ABC-12; ABC-123; ABC-123-; ABC-123-X; ABC-123-XY; ABC-123-XYZ; ABC1; ABC12; ABC123; ABC123X; ABC123XY; ABC123XYZ; C-123-XYZ; C123XYZ; X; XY; XYZ
```
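
The combining step can be sketched with a set for deduplication followed by a sort. The function name `combine` is hypothetical, and the `'; '` separator simply mirrors the list shown above.

```python
def combine(*groups):
    """Merge several groups of variations, drop duplicates, and sort the result."""
    unique = set()
    for group in groups:
        unique.update(group)
    return '; '.join(sorted(unique))


print(combine(['XYZ', 'A'], ['A', 'ABC-'], ['ABC 123 XYZ']))
# A; ABC 123 XYZ; ABC-; XYZ
```

Python sorts strings by character code, so a space sorts before a hyphen, a hyphen before digits, and digits before uppercase letters. This is why ABC 123 XYZ appears before ABC- in the list above.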

Step 4: Apply the indexing pipeline extension to a source

When you’re done creating your IPE, you’ll need to apply it to the source that contains the items you want to decompose.

To apply the IPE to a source:

Access the Sources (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.
Click the source for which you want to apply the indexing pipeline extension.
In the Action bar, click More > Edit extensions.
Click Add > Extension, and then select the indexing pipeline extension you created.
Under Stage, select Post-conversion.
Under Action on error, select Skip extension.
Click Apply extension. You’ll need to rebuild the source for the IPE to apply to the items.

Step 5: Validate the field decomposition

After applying the IPE to the source, you should validate that the field decomposition is working as expected. You can use the Content Browser (platform-ca | platform-eu | platform-au) to view the indexed items and verify that the decomposed field exists and contains the expected values.

If the field decomposition isn’t working as expected, you can review the Log Browser (platform-ca | platform-eu | platform-au) to identify any errors or issues that may have occurred during the indexing process and adjust the field decomposition script accordingly.
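
When checking a decomposed field by hand, a small helper can report which expected variations are missing. This sketch assumes that you have copied the field value out of the Content Browser as a single semicolon-separated string. The function name `missing_variations` is hypothetical and isn't part of any Coveo API.

```python
def missing_variations(field_value, expected):
    """Return the expected variations that are absent from a semicolon-separated field value."""
    indexed = {value.strip() for value in field_value.split(';') if value.strip()}
    return [value for value in expected if value not in indexed]


print(missing_variations('A; AB; ABC', ['AB', 'ABC-']))  # ['ABC-']
```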

Example field decomposition script

The following example Python script decomposes a product identifier field value according to the best practices outlined in the previous section. Note that this script considers the product identifier field to be a single value. If your product identifier field contains multiple values, you’ll need to adjust the script accordingly.

```python
import re


def get_safe_meta_data(meta_data_name):
    # Return the metadata values as a list (empty if the metadata is missing)
    meta_data_value = document.get_meta_data_value(meta_data_name)
    return list(meta_data_value or [])


def generate_variations(product_identifier):
    variations = set()

    # Remove non-alphanumeric characters (excluding hyphens) from the product identifier
    cleaned_product_identifier = re.sub(r'[^A-Za-z0-9-]+', '', product_identifier)
    if not cleaned_product_identifier:
        # Nothing left to decompose after cleaning
        return []

    # Handle hyphen-separated parts
    parts = cleaned_product_identifier.split('-')

    for idx, part in enumerate(parts):
        # Incremental variations for the current part
        for i in range(1, len(part) + 1):
            variations.add(part[:i])

        # Incremental variations for hyphen-joined combinations starting from this part
        for sub_idx in range(idx, len(parts)):
            combined = '-'.join(parts[idx:sub_idx + 1])
            for i in range(1, len(combined) + 1):
                variations.add(combined[:i])

    # Incremental variations for the concatenated product identifier
    concatenated_id = ''.join(parts)
    for i in range(1, len(concatenated_id) + 1):
        variations.add(concatenated_id[:i])

    # Variations with optional leading and trailing characters,
    # for both the cleaned and the concatenated identifier
    for identifier in (cleaned_product_identifier, concatenated_id):
        if len(identifier) > 2:
            variations.add(identifier[2:])   # Without the first two characters
            variations.add(identifier[:-2])  # Without the last two characters

    # Add the cleaned product identifier itself
    variations.add(cleaned_product_identifier)

    # Add the version with spaces instead of hyphens
    variations.add(cleaned_product_identifier.replace('-', ' '))

    return sorted(variations)


def main():
    product_identifier_meta_field = '<MY_PRODUCT_IDENTIFIER_FIELD>'
    decomposed_meta_field = '<MY_DECOMPOSED_FIELD>'

    product_identifiers = get_safe_meta_data(product_identifier_meta_field)
    decomposed_product_identifiers = []

    for product_identifier in product_identifiers:
        log(f"Processing Product Identifier: {product_identifier}")
        variations = generate_variations(product_identifier)
        if not variations:
            continue
        # Join the variations into a single semicolon-separated string
        decomposed_product_identifier = ';'.join(variations)
        log(f"Decomposed Product Identifier: {decomposed_product_identifier}")
        decomposed_product_identifiers.append(decomposed_product_identifier)

    if decomposed_product_identifiers:
        # Store one semicolon-separated string per product identifier
        document.add_meta_data({decomposed_meta_field: decomposed_product_identifiers})


main()
```

- Replace `<MY_PRODUCT_IDENTIFIER_FIELD>` with the name of the field that holds the product identifiers (for example, `ec_product_id`).
- Replace `<MY_DECOMPOSED_FIELD>` with the name of the field that will store the decomposed values (for example, `ec_product_id_decomposed`).
