About field decomposition
Field decomposition breaks a field value, such as a product identifier, down into multiple variations that are stored in a separate field. This lets users search for complex string values, like model numbers or SKUs, by entering only part of the value while still retrieving accurate results.
In B2B commerce scenarios, visitors often search for products using product identifiers.
Product identifiers often consist of a combination of letters, numbers, and special characters, making them difficult to remember or type correctly.
For example, a product identifier might look like this: ABC-123-XYZ.
By decomposing the product identifier, you can generate additional search terms from it, such as:
- ABC123XYZ
- ABC 123 XYZ
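As a minimal illustration in plain Python (not tied to any Coveo API), the two search terms above can be derived from the identifier by removing its hyphens or replacing them with spaces. The function name simple_decompositions is hypothetical.

```python
from typing import List


def simple_decompositions(product_identifier: str) -> List[str]:
    """Return the hyphen-free and space-separated forms of a product identifier."""
    concatenated = product_identifier.replace("-", "")  # ABC-123-XYZ -> ABC123XYZ
    spaced = product_identifier.replace("-", " ")       # ABC-123-XYZ -> ABC 123 XYZ
    return [concatenated, spaced]


print(simple_decompositions("ABC-123-XYZ"))  # ['ABC123XYZ', 'ABC 123 XYZ']
```

The full decomposition goes further and also generates partial values (prefixes), so that a user who only types the beginning of an identifier still gets a match.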
Leveraging field decomposition
A good way to decompose fields is to create a Coveo indexing pipeline extension (IPE). Using this feature, you can run a custom script to transform data during the indexing process.
This means that you can use a script on your product source to decompose the field values, generate the desired variations, and store them in a dedicated field.
An example Python script is provided later in this article.
Step 1: Create a decomposition field
The first step is to create a field that will store the decomposed values.
- Access the Fields (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.
- Create a new field that will store the decomposed values (for example, ec_product_id_decomposed). Make sure that this field has the Multi-value facet and Free text search options enabled. See Add or edit a field for detailed instructions.
  - You should also make sure that you have a field that stores the original field value (for example, ec_product_id).
Step 2: Create the Coveo indexing pipeline extension
After creating the field that will store the decomposed values, you need to create a Coveo IPE that will run the field decomposition script. You’ll then need to apply this IPE to the source that stores the items you want to decompose.
To create the Coveo indexing pipeline extension:
- Access the Extensions (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.
- Create a new indexing pipeline extension. This is where you'll write the script that decomposes the field values. See Use indexing pipeline extensions for detailed instructions.
Step 3: Write the field decomposition script
When creating the Coveo IPE, you’ll be required to write a script.
Consider the following best practices when writing a field decomposition script. An example script that follows these best practices is provided in the Example field decomposition script section at the end of this article.
- Cleaning input data: Remove any non-alphanumeric characters (except for specific allowed characters, like hyphens) from the field value to standardize it for processing.
- Handling hyphen-separated values: Split the field value into separate parts at each hyphen. For example, the product identifier ABC-123-XYZ would be split into ABC, 123, and XYZ.
- Generating incremental variations: Create all possible prefixes of each part of the field value (obtained in the previous step). For example, the product identifier ABC-123-XYZ would generate the following variations:
  - Part ABC: A, AB, ABC
  - Part 123: 1, 12, 123
  - Part XYZ: X, XY, XYZ
- Generating hyphen-separated incremental variations of the field value: For each part, join it with the parts that follow it using hyphens, and then create all possible prefixes of the result. For example, the product identifier ABC-123-XYZ would generate the following variations:
  - Part ABC-123-XYZ: ABC-, ABC-1, ABC-12, ABC-123, ABC-123-, ABC-123-X, ABC-123-XY, ABC-123-XYZ
  - Part 123-XYZ: 123-, 123-X, 123-XY, 123-XYZ
  - Part XYZ: X, XY, XYZ
- Generating incremental variations of the concatenated field value: Remove the hyphens to concatenate the parts, and then create incremental variations of the result, starting from the first character. For example, the product identifier ABC-123-XYZ would generate the following variations: A, AB, ABC, ABC1, ABC12, ABC123, ABC123X, ABC123XY, ABC123XYZ
- Making leading and trailing characters optional: Create variations in which the first two or the last two characters are removed, from both the cleaned field value and its concatenated version. For example, the product identifier ABC-123-XYZ would generate the following variations:
  - Without the leading two characters: C-123-XYZ, C123XYZ
  - Without the trailing two characters: ABC-123-X, ABC123X
- Combining all variations: Combine the variations generated in the previous steps, along with the cleaned field value itself and a version in which the hyphens are replaced with spaces (for example, ABC 123 XYZ), into a single list of decomposed values. Remove any duplicates and sort the list (the example script uses Python's default string ordering, which places spaces and hyphens before digits, and digits before letters). For example, the product identifier ABC-123-XYZ would generate the following list: 1; 12; 123; 123-; 123-X; 123-XY; 123-XYZ; A; AB; ABC; ABC 123 XYZ; ABC-; ABC-1; ABC-12; ABC-123; ABC-123-; ABC-123-X; ABC-123-XY; ABC-123-XYZ; ABC1; ABC12; ABC123; ABC123X; ABC123XY; ABC123XYZ; C-123-XYZ; C123XYZ; X; XY; XYZ
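The following plain Python sketch maps each best practice above to its own small function, so that you can check each step's output in isolation before combining them. The function names (clean, prefixes, part_prefixes, hyphenated_prefixes, trimmed, decompose) are hypothetical, and the sketch isn't tied to the Coveo extension API; the example script at the end of this article implements the same logic in a single function.

```python
import re
from typing import List, Set


def clean(value: str) -> str:
    # Cleaning input data: keep only letters, digits, and hyphens.
    return re.sub(r"[^A-Za-z0-9-]+", "", value)


def prefixes(text: str) -> List[str]:
    # Every prefix of a string, from its first character to the whole string.
    return [text[:i] for i in range(1, len(text) + 1)]


def part_prefixes(parts: List[str]) -> Set[str]:
    # Incremental variations of each individual part.
    return {prefix for part in parts for prefix in prefixes(part)}


def hyphenated_prefixes(parts: List[str]) -> Set[str]:
    # Prefixes of each hyphen-joined run of parts, starting at every part.
    result: Set[str] = set()
    for start in range(len(parts)):
        for end in range(start, len(parts)):
            result.update(prefixes("-".join(parts[start:end + 1])))
    return result


def trimmed(values: List[str]) -> Set[str]:
    # Optional leading/trailing characters: drop the first two or the last two
    # characters of each value that is longer than two characters.
    result: Set[str] = set()
    for value in values:
        if len(value) > 2:
            result.add(value[2:])
            result.add(value[:-2])
    return result


def decompose(product_identifier: str) -> List[str]:
    cleaned = clean(product_identifier)
    parts = cleaned.split("-")
    concatenated = "".join(parts)

    variations: Set[str] = set()
    variations |= part_prefixes(parts)
    variations |= hyphenated_prefixes(parts)
    variations |= set(prefixes(concatenated))  # incremental variations of the concatenated value
    variations |= trimmed([cleaned, concatenated])
    variations.add(cleaned)                    # the cleaned value itself
    variations.add(cleaned.replace("-", " "))  # hyphens replaced with spaces

    # The set removes duplicates; sorted() uses Python's default string ordering.
    return sorted(variations)


print("; ".join(decompose("ABC-123-XYZ")))
```

Running it for ABC-123-XYZ should print the same 30 values as the list shown in the last step. Keeping each step in its own function makes it easy to drop a step (for example, trimmed) if it produces matches that are too broad for your catalog.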
Step 4: Apply the indexing pipeline extension to a source
When you’re done creating your IPE, you’ll need to apply it to the source that contains the items you want to decompose.
To apply the IPE to a source:
- Access the Sources (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.
- Click the source to which you want to apply the indexing pipeline extension.
- In the Action bar, click More > Edit extensions.
- Click Add > Extension, and then select the indexing pipeline extension you created.
- Under Stage, select Post-conversion.
- Under Action on error, select Skip extension.
- Click Apply extension. You'll then need to rebuild the source so that the IPE is applied to its items.
Step 5: Validate the field decomposition
After applying the IPE to the source, you should validate that the field decomposition is working as expected. You can use the Content Browser (platform-ca | platform-eu | platform-au) to view the indexed items and verify that the decomposed field exists and contains the expected values.
If the field decomposition isn’t working as expected, you can review the Log Browser (platform-ca | platform-eu | platform-au) to identify any errors or issues that may have occurred during the indexing process and adjust the field decomposition script accordingly.
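When checking an item in the Content Browser, one quick way to confirm that the decomposed field covers the partial values your users are likely to type is to compare the field's values against a list of expected queries. The following plain Python sketch assumes that you've copied the field's values as a semicolon-separated string (adjust the separator to match what you copied); the function name missing_variations and the sample data are hypothetical.

```python
from typing import Iterable, List


def missing_variations(field_value: str, expected: Iterable[str]) -> List[str]:
    """Return the expected partial values that are absent from a semicolon-separated field value."""
    # Split on semicolons, trim surrounding whitespace, and ignore empty entries.
    indexed = {value.strip() for value in field_value.split(";") if value.strip()}
    return [value for value in expected if value not in indexed]


sample = "1;12;123;A;AB;ABC;ABC-123-XYZ;ABC123;ABC123XYZ"
print(missing_variations(sample, ["ABC123", "ABC 123 XYZ", "123"]))  # ['ABC 123 XYZ']
```

An empty result means that every expected value was found; any value that's returned points to a step of the script to revisit.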
Example field decomposition script
The following example Python script decomposes a product identifier field value according to the best practices outlined in Step 3. Note that this script considers the product identifier field to be a single value. If your product identifier field contains multiple values, you'll need to adjust the script accordingly.
import re


def get_safe_meta_data(meta_data_name):
    # Return the field's values as a list, or an empty list if the field is missing
    meta_data_value = document.get_meta_data_value(meta_data_name)
    return list(meta_data_value) if meta_data_value else []


def generate_variations(product_identifier):
    variations = set()

    # Remove non-alphanumeric characters (excluding hyphens) from the product identifier
    cleaned_product_identifier = re.sub(r'[^A-Za-z0-9-]+', '', product_identifier)

    # Handle hyphen-separated parts
    parts = cleaned_product_identifier.split('-')

    # Generate incremental variations for each part and sub-part
    for idx, part in enumerate(parts):
        # Incremental variations for the current part
        for i in range(1, len(part) + 1):
            variations.add(part[:i])

        # Generate variations for combinations starting from this part
        for sub_idx in range(idx, len(parts)):
            combined = '-'.join(parts[idx:sub_idx + 1])  # Combine parts from idx to sub_idx
            for i in range(1, len(combined) + 1):
                variations.add(combined[:i])

    # Generate variations for the concatenated product identifier
    concatenated_id = ''.join(parts)
    for i in range(1, len(concatenated_id) + 1):
        variations.add(concatenated_id[:i])

    # Generate variations with optional leading and trailing characters
    if len(cleaned_product_identifier) > 2:
        # Remove the first two and the last two characters of the cleaned identifier
        variations.add(cleaned_product_identifier[2:])
        variations.add(cleaned_product_identifier[:-2])
    if len(concatenated_id) > 2:
        # Remove the first two and the last two characters of the concatenated version
        variations.add(concatenated_id[2:])
        variations.add(concatenated_id[:-2])

    # Add the cleaned product identifier itself
    variations.add(cleaned_product_identifier)

    # Add the version with spaces instead of hyphens
    variations.add(cleaned_product_identifier.replace('-', ' '))

    return sorted(variations)


def main():
    product_identifier_meta_field = '<MY_PRODUCT_IDENTIFIER_FIELD>'
    decomposed_meta_field = '<MY_DECOMPOSED_FIELD>'

    product_identifiers = get_safe_meta_data(product_identifier_meta_field)
    decomposed_product_identifiers = []

    for product_identifier in product_identifiers:
        log(f"Processing Product Identifier: {product_identifier}")
        variations = generate_variations(product_identifier)
        # Join the variations into a single semicolon-separated string
        decomposed_product_identifier = ';'.join(variations)
        log(f"Decomposed Product Identifier: {decomposed_product_identifier}")
        decomposed_product_identifiers.append(decomposed_product_identifier)

    if decomposed_product_identifiers:
        # Add the decomposed values (one semicolon-separated string per identifier)
        document.add_meta_data({decomposed_meta_field: decomposed_product_identifiers})


main()
- Replace <MY_PRODUCT_IDENTIFIER_FIELD> with the name of the field that holds the product identifiers (for example, ec_product_id).
- Replace <MY_DECOMPOSED_FIELD> with the name of the field that will store the decomposed values (for example, ec_product_id_decomposed).
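Before applying the extension to a source, you might want to run the script on your own machine. In an IPE, the document object and the log function are provided by the indexing pipeline; outside Coveo they don't exist. The following sketch defines minimal, hypothetical stand-ins (StubDocument, run_extension, and a print-based log) that mimic only the two document methods the example script calls. They're assumptions for local testing, not Coveo's implementation, so always confirm the final behavior in the Content Browser and Log Browser.

```python
from typing import Dict, List


class StubDocument:
    """Hypothetical stand-in for the IPE document object (not Coveo's implementation)."""

    def __init__(self, metadata: Dict[str, List[str]]):
        self._metadata = metadata
        self.added: List[dict] = []

    def get_meta_data_value(self, name: str) -> List[str]:
        # Assumption: return an empty list when the metadata is missing.
        return list(self._metadata.get(name, []))

    def add_meta_data(self, values: dict) -> None:
        # Record what the script would have written to the item.
        self.added.append(values)


def run_extension(script_source: str, metadata: Dict[str, List[str]]) -> StubDocument:
    """Execute extension code with stubbed document and log globals."""
    document = StubDocument(metadata)

    def log(message, *args):
        # Assumption: print locally instead of writing to the Log Browser.
        print(f"[log] {message}")

    exec(script_source, {"document": document, "log": log})
    return document
```

For example, save the example script to a local file with its placeholders replaced by ec_product_id and ec_product_id_decomposed, read the file into a string, call run_extension with {"ec_product_id": ["ABC-123-XYZ"]}, and inspect the added attribute of the returned object to see the values the script would write.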