Data completeness
Data completeness
This is for:
DeveloperThis article discusses how to perform validation checks and maintain data integrity on your live Coveo-powered commerce implementation. We highly recommend maintaining a continuous commitment to conducting the validations described in this article at consistent intervals. This approach ensures thorough and precise monitoring, thereby safeguarding the credibility and resilience of your data.
Monitoring the Data Health page
During and after the implementation of data tracking, you should review the data health dashboard on the Data Health (platform-ca | platform-eu | platform-au) page of the Coveo Administration Console.
Data health refers to the integrity of organizational data, determining its reliability and accuracy for sound analytics and Coveo Machine Learning models. The dashboard explicitly identifies failures detected through data validation rules. You can review this dashboard to pinpoint inconsistencies by analyzing the impact of submitted usage analytics events on the data health.
The data health dashboard provides a comprehensive snapshot of your Coveo organization’s data quality. It outlines the number of events failing validation rules, categorized by severity, and breaks down these occurrences by event type. This dashboard is helpful for finding inconsistencies in recorded data and understanding how each event affects the overall quality of the data.
Additionally, the Data Health page also provides a data health score. This numerical value ranges from 0 to 100, offering a rapid assessment of an organization’s data quality. A corresponding color indicator offers a visual prompt for potential data quality concerns. This score reflects the effect of data quality on commerce dashboards. When encountering a low score, it’s essential to conduct a thorough investigation into potential factors contributing to the issue. This includes analyzing the cause of any validation rules marked as "Critical" severity.
For most usage analytics related issues, front-end difficulties are the probable culprits. Grasping the scope of failed validation criteria and locating them within the Data Health page can help troubleshoot potential data health problems for your organization.
For additional details on utilizing the data health dashboard, see the Data Health troubleshooting tutorial.
Validating conversions
Another layer of validation checks you can perform is validating the event data sent to Coveo UA with your database that records transactions. We recommend you perform this validation approximately one week after implementing data tracking.
The primary objective here is to align orders with their respective revenue figures. This process enables you to ensure the completeness of orders and revenue data within Coveo’s records.
As the Data Health page can’t verify the data health of events not sent to Coveo, it’s vital to validate transaction records from your backend by comparing them with the data available in Coveo.
For a thorough validation, compare a substantial volume of orders from Coveo against your backend. This could encompass a few days' worth of records or even up to a week’s worth, depending on the scale of the organization. Match each individual order using its transaction ID and its corresponding revenue figure. This approach allows you to identify any discrepancies between the two data sources.
We recommend conducting this process regularly, such as quarterly. Repeating these validations over time reinforces the trustworthiness of the aggregated information and upholds the integrity of your data analysis.
Sample SQL query
The following presents an SQL query that can be employed to retrieve transaction records from Coveo within a specified timeframe, eliminating any redundant events. Given that you have access to the Snowflake reader account, you can copy the query as is and replace the placeholders with your own values.
USE DATABASE CORE_MODEL_V001;
-- Set parameters
SET tz = 'America/New_York';
ALTER SESSION SET timezone = $tz;
WITH CART_ITEMS_TR_ROLLUP AS (
SELECT
cart_id,
SUM(base_price * quantity) AS cart_subtotal
FROM COMMERCE.CART_ITEMS
GROUP BY ALL
)
SELECT
DATE(CONVERT_TIMEZONE($tz, TRANSACTION_TIMESTAMP)) AS date,
MIN(base_currency) AS currency,
COUNT(DISTINCT transaction_xid) AS nbr_transactions,
ROUND(SUM(cart_subtotal), 2) AS subtotal,
ROUND(SUM(base_shipping), 2) AS shipping,
ROUND(SUM(base_tax), 2) AS tax,
ROUND(SUM(base_total), 2) AS total
FROM
COMMERCE.TRANSACTIONS
INNER JOIN CART_ITEMS_TR_ROLLUP USING (cart_id)
WHERE
date BETWEEN '2023-08-06 00:00:00.000'::TIMESTAMP_TZ
AND '2023-08-12 23:59:59.999'::TIMESTAMP_TZ
GROUP BY ALL
ORDER BY date DESC;
Set the timezone parameter necessary for performing calculations and filtering in the specified time zone. |
|
Specify the timzestamp parameters to specify the timeframe for which you want to retrieve records for. Here, we retrieve records for the week of August 6th, 2023. |
The output of this query should be in the following format:
For more intricate queries or variations, if you intend to conduct additional validation using the data sent to Coveo, see Create custom Snowflake commerce dashboards.