Use case: Retrieve Coveo UA data with the Python connector
Use case: Retrieve Coveo UA data with the Python connector
In the Coveo Administration Console, the Snowflake reader account gives you direct access to your organization's' data and it enables you to explore that data as well as create customized exports and materialized views. You can use different Business Intelligence (BI) tools for data analysis, such as Tableau™ and Power BI™.
Alternatively, you have the option to use a native Snowflake connector. In this article, we’ll demonstrate how to use Python™ to explore and export your data.
Step 1: Install the connector
To get started, you must first install the Snowflake Connector for Python using the following command:
pip install snowflake-connector-python
For detailed installation steps, see Installing the Python Connector.
Step 2: Install Pandas
Once you’ve installed the connector, you must also install Pandas, which is a data analysis library that works with Python. Use the following command to install:
pip install pandas
For detailed installation steps, see Using Pandas DataFrames with the Python Connector.
Step 3: Connect to Snowflake
Following the installations of Snowflake Connector for Python and Pandas, you must create a script to connect to Snowflake, as shown in the following example:
import snowflake.connector
import pandas as pd
conn = snowflake.connector.connect(
user='<xxxxxx>',
password='<xxxxxxxx>',
account='<account>',
ocsp_fail_open=False,
)
Step 4: Define a SQL query
Define and test the query in a Snowflake worksheet. For example:
set from_date ='<from_date>';
set to_date ='<to_date>';
select * from "COVEO"."ua"."all_events_shared" where "datetime" between $from_date and $to_date;
Once you’re satisfied with the query, you must add it to your Python script. For example:
sql_query = pd.read_sql_query('select * from "COVEO"."ua"."all_events_shared" where "datetime" between %(from_date)s and %(to_date)s'
,conn
,params={"from_date":from_date, "to_date":to_date})
Step 5: Add the data to a DataFrame
Use the following code to read the data into a Pandas DataFrame.
df = pd.DataFrame(sql_query)
Step 6: Export to CSV
As a final step, export the data to a CSV file as follows:
df.to_csv (r'searches.csv', index = False)
Limitation
When querying a larger dataset, you may encounter an SSL verification error. This occurs because a larger dataset prompts the connector to open multiple HTTPS connections, and each of these connections requires SSL verification. If traffic from specific Snowflake IP addresses is blocked by your firewall, it will prevent the secure handshake required for HTTPS connections. Snowflake’s connector needs to access these IP addresses to retrieve data securely.
To avoid this error, we recommend running the following statement in Snowflake to list the host and port values that should be allowed:
SELECT value:host::varchar AS host, value:port AS port
FROM TABLE(FLATTEN(input => PARSE_JSON(system$allowlist())))
WHERE value:type = 'STAGE';
To enable successful connections without SSL handshake errors, ensure that the hosts and ports listed in the output aren’t blocked by the firewall.
Reference
Code sample
The script below is similar to the Python script used for UA data exports, however it’s shorter since it doesn’t require joining the various tables on the CSV file.
import snowflake.connector
import pandas as pd
conn = snowflake.connector.connect(
user='xxxxxx',
password='xxxxxxxx',
account='account',
ocsp_fail_open=False,
)
# No need to define organization name, your reader account bound to your data
from_date = "2022-01-01"
to_date = "2022-01-02"
# Searches
sql_query = pd.read_sql_query('select * from "COVEO"."ua"."searches_shared" where "datetime" between %(from_date)s and %(to_date)s limit 10;'
,conn
,params={"from_date":from_date, "to_date": to_date})
df = pd.DataFrame(sql_query)
df.to_csv (r'searches.csv', index = False)
# Clicks
sql_query = pd.read_sql_query('select * from "COVEO"."ua"."clicks_shared" where "datetime" between %(from_date)s and %(to_date)s'
,conn
,params={"from_date":from_date, "to_date": to_date})
df = pd.DataFrame(sql_query)
df.to_csv (r'clicks.csv', index = False)
# Customs
sql_query = pd.read_sql_query('select * from "COVEO"."ua"."custom_events_shared" where "datetime" between %(from_date)s and %(to_date)s'
,conn
,params={"from_date":from_date, "to_date": to_date})
df = pd.DataFrame(sql_query)
df.to_csv (r'customs.csv', index = False)
# Views
sql_query = pd.read_sql_query('select * from "COVEO"."ua"."views_shared" where "datetime" between %(from_date)s and %(to_date)s'
,conn
,params={"from_date":from_date, "to_date": to_date})
df = pd.DataFrame(sql_query)
df.to_csv (r'views.csv', index = False)
# All events combined
sql_query = pd.read_sql_query('select * from "COVEO"."ua"."all_events_shared" where "datetime" between %(from_date)s and %(to_date)s'
,conn
,params={"from_date":from_date, "to_date": to_date})
df = pd.DataFrame(sql_query)
df.to_csv (r'all.csv', index = False)