Data completeness

This is for:

Developer

This article discusses how to perform validation checks and maintain data integrity on your live Coveo-powered commerce implementation. We highly recommend maintaining a continuous commitment to conducting the validations described in this article at consistent intervals. This approach ensures thorough and precise monitoring, thereby safeguarding the credibility and resilience of your data.

Monitoring the Data Health page

During and after the implementation of data tracking, you should review the data health dashboard on the Data Health (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.

Data health refers to the integrity of organizational data, determining its reliability and accuracy for sound analytics and Coveo Machine Learning models. The dashboard explicitly identifies failures detected through data validation rules. You can employ this dashboard to pinpoint inconsistencies by analyzing the impact of submitted usage analytics events on the data health.

The data health dashboard provides a comprehensive snapshot of your Coveo organization’s data quality. It outlines the number of events failing validation rules, categorized by severity, and breaks down these occurrences by event type. This dashboard is really helpful for finding inconsistencies in recorded data and understanding how each event affects the overall quality of the data.

Additionally, the Data Health page also provides a data health score. This numerical value ranges from 0 to 100, offering a rapid assessment of an organization’s data quality. A corresponding color indicator offers a visual prompt for potential data quality concerns. This score reflects the effect of data quality on commerce dashboards. A low score should prompt an investigation into potential factors, such as high-severity rules or a significant failure rate.

For most usage analytics related issues, front-end difficulties are the probable culprits. Grasping the scope of failed validation criteria and locating them within the Data Health page can help troubleshoot potential data health problems for your organization.

When examining a specific validation rule, delving into individual events that did not meet the rule can be beneficial. This approach allows you to investigate the root cause and identify any patterns that may be present.

For additional details on utilizing the data health dashboard, see the Data Health troubleshooting tutorial.

Validating conversions

Another layer of validation checks you can perform is validating the event data sent to Coveo UA with your database that records transactions. We recommend you perform this validation approximately one week after implementing data tracking.

The primary objective here is to align orders with their respective revenue figures. This process enables you to ensure the completeness of orders and revenue data within Coveo’s records.

As the Data Health page cannot verify the data health of events not sent to Coveo, it is vital to validate transaction records from your backend by comparing them with the data available in Coveo.

For a thorough validation, compare a substantial volume of orders from Coveo against your backend. This could encompass a few days' worth of records or even up to a week’s worth, depending on the scale of the organization. Match each individual order using its transaction ID and its corresponding revenue figure. This approach allows you to identify any discrepancies between the two data sources.

We recommend conducting this process regularly, such as quarterly. Repeating these validations over time reinforces the trustworthiness of the aggregated information and upholds the integrity of your data analysis.

Sample SQL query

The following presents an SQL query that can be employed to retrieve transaction records from Coveo within a specified timeframe, eliminating any redundant events. Given that you have access to the Snowflake reader account, you can copy the query as is and replace the placeholders with your own values.

USE DATABASE CORE_MODEL_V001;

-- Set parameters
SET tz = 'America/New_York'; 1
ALTER SESSION SET timezone = $tz;

WITH CART_ITEMS_TR_ROLLUP AS (
    SELECT
        cart_id,
        SUM(base_price * quantity) AS cart_subtotal
    FROM COMMERCE.CART_ITEMS
    GROUP BY ALL
)

SELECT
    DATE(CONVERT_TIMEZONE($tz, TRANSACTION_TIMESTAMP)) AS date,
    MIN(base_currency) AS currency,
    COUNT(DISTINCT transaction_xid) AS nbr_transactions,
    ROUND(SUM(cart_subtotal), 2) AS subtotal,
    ROUND(SUM(base_shipping), 2) AS shipping,
    ROUND(SUM(base_tax), 2) AS tax,
    ROUND(SUM(base_total), 2) AS total
FROM
    COMMERCE.TRANSACTIONS
INNER JOIN CART_ITEMS_TR_ROLLUP USING (cart_id)
WHERE 2
    date BETWEEN '2023-08-06 00:00:00.000'::TIMESTAMP_TZ
    AND '2023-08-12 23:59:59.999'::TIMESTAMP_TZ
GROUP BY ALL
ORDER BY date DESC;
1 Set the timezone parameter necessary for performing calculations and filtering in the specified time zone.
2 Specify the timzestamp parameters to specify the timeframe for which you want to retrieve records for. Here, we retrieve records for the week of August 6th, 2023.

The output of this query should be in the following format:

Transactions records output from SQL query

For more intricate queries or variations, if you intend to conduct additional validation using the data sent to Coveo, see Create custom Snowflake commerce dashboards.