Local SQL Queries¶

Query your fetched data with SQL using DuckDB's powerful analytical engine.

Explore on DeepWiki

Ask questions about SQL patterns, JSON property access, or how to structure complex analytical queries.

Basic Queries¶

Execute and Get DataFrame¶

import mixpanel_data as mp

ws = mp.Workspace()

df = ws.sql("""
    SELECT
        event_name,
        COUNT(*) as count
    FROM jan_events
    GROUP BY 1
    ORDER BY 2 DESC
""")

print(df)

Get Single Value¶

total = ws.sql_scalar("SELECT COUNT(*) FROM jan_events")
print(f"Total events: {total}")

Get Rows as Tuples¶

rows = ws.sql_rows("""
    SELECT event_name, COUNT(*)
    FROM jan_events
    GROUP BY 1
    LIMIT 5
""")

for event_name, count in rows:
    print(f"{event_name}: {count}")

DuckDB JSON Syntax¶

Mixpanel properties are stored as JSON columns. Use DuckDB's JSON operators to access them.

Extract String Property¶

SELECT properties->>'$.country' as country
FROM jan_events

Extract and Cast Numeric¶

SELECT CAST(properties->>'$.amount' AS DECIMAL) as amount
FROM jan_events

Filter on Property¶

SELECT *
FROM jan_events
WHERE properties->>'$.plan' = 'premium'

Nested Property Access¶

SELECT properties->>'$.user.email' as email
FROM jan_events

Check Property Exists¶

SELECT *
FROM jan_events
WHERE properties->>'$.coupon_code' IS NOT NULL

Array Properties¶

-- Array length
SELECT json_array_length(properties->'$.items') as item_count
FROM jan_events

-- Array element
SELECT properties->'$.items'->>0 as first_item
FROM jan_events

Common Query Patterns¶

Daily Event Counts¶

SELECT
    DATE_TRUNC('day', event_time) as day,
    COUNT(*) as count
FROM jan_events
GROUP BY 1
ORDER BY 1

Events by User¶

SELECT
    distinct_id,
    COUNT(*) as event_count,
    MIN(event_time) as first_seen,
    MAX(event_time) as last_seen
FROM jan_events
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

Property Distribution¶

SELECT
    properties->>'$.country' as country,
    COUNT(*) as count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as pct
FROM jan_events
WHERE event_name = 'Purchase'
GROUP BY 1
ORDER BY 2 DESC

Revenue by Day¶

SELECT
    DATE_TRUNC('day', event_time) as day,
    COUNT(*) as purchases,
    SUM(CAST(properties->>'$.amount' AS DECIMAL)) as revenue
FROM jan_events
WHERE event_name = 'Purchase'
GROUP BY 1
ORDER BY 1

User Cohort Analysis¶

WITH first_events AS (
    SELECT
        distinct_id,
        DATE_TRUNC('week', MIN(event_time)) as cohort_week
    FROM jan_events
    WHERE event_name = 'Signup'
    GROUP BY 1
)
SELECT
    cohort_week,
    COUNT(DISTINCT distinct_id) as users
FROM first_events
GROUP BY 1
ORDER BY 1

Funnel Query¶

WITH step1 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'View Product'
),
step2 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'Add to Cart'
    AND distinct_id IN (SELECT distinct_id FROM step1)
),
step3 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'Purchase'
    AND distinct_id IN (SELECT distinct_id FROM step2)
)
SELECT
    (SELECT COUNT(*) FROM step1) as viewed,
    (SELECT COUNT(*) FROM step2) as added,
    (SELECT COUNT(*) FROM step3) as purchased

Joining Events and Profiles¶

Query events with user profile data:

# First, fetch both
ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
ws.fetch_profiles("users")

# Join them
df = ws.sql("""
    SELECT
        e.event_name,
        u.properties->>'$.plan' as plan,
        COUNT(*) as count
    FROM events e
    JOIN users u ON e.distinct_id = u.distinct_id
    GROUP BY 1, 2
    ORDER BY 3 DESC
""")

CLI Usage¶

Run SQL queries from the command line:

# Table output
mp query sql "SELECT event_name, COUNT(*) FROM events GROUP BY 1" --format table

# JSON output
mp query sql "SELECT * FROM events LIMIT 10" --format json

# CSV export
mp query sql "SELECT * FROM events" --format csv > events.csv

# JSONL for streaming
mp query sql "SELECT * FROM events" --format jsonl > events.jsonl

# Filter with built-in jq support
mp query sql "SELECT * FROM events LIMIT 100" --format json \
    --jq '.[] | select(.event_name == "Purchase")'

# Extract specific fields with jq
mp query sql "SELECT event_name, COUNT(*) as cnt FROM events GROUP BY 1" \
    --format json --jq 'map({name: .event_name, count: .cnt})'

Direct DuckDB Access¶

For advanced use cases, access the DuckDB connection directly:

# Get the connection
conn = ws.connection

# Run DuckDB-specific operations
conn.execute("SET threads TO 4")
result = conn.execute("EXPLAIN ANALYZE SELECT * FROM events").fetchall()

Database Path¶

Get the path to the underlying database file:

# Get the database file path
path = ws.db_path
print(f"Data stored at: {path}")

# Useful for reopening the same database later
ws.close()
ws = mp.Workspace.open(path)

Note: db_path returns None for in-memory workspaces created with Workspace.memory().

Performance Tips¶

Use Appropriate Data Types¶

Cast properties to appropriate types for better performance:

-- Instead of string comparison
WHERE CAST(properties->>'$.amount' AS DECIMAL) > 100

-- Consider creating a view with typed columns
CREATE VIEW typed_events AS
SELECT
    event_id,
    event_name,
    event_time,
    distinct_id,
    CAST(properties->>'$.amount' AS DECIMAL) as amount,
    properties->>'$.country' as country
FROM jan_events

Limit Result Sets¶

Always use LIMIT during exploration:

SELECT * FROM jan_events LIMIT 100

Use Aggregations¶

DuckDB is optimized for analytical queries. Prefer aggregations over fetching raw rows.

Next Steps¶

Live Analytics — Query Mixpanel directly
API Reference — Complete Workspace API