Quick Start¶
This guide walks you through your first queries with mixpanel_data in about 5 minutes.
Prerequisites¶
You'll need:
- mixpanel_data installed (pip install git+https://github.com/jaredmcfarland/mixpanel_data.git)
- A Mixpanel service account with username, secret, and project ID
- Your project's data residency region (us, eu, or in)
Step 1: Set Up Service Account Credentials¶
Option A: Environment Variables¶
export MP_USERNAME="sa_abc123..."
export MP_SECRET="your-secret-here"
export MP_PROJECT_ID="12345"
export MP_REGION="us"
Option B: Using the CLI¶
# Interactive prompt (secure, recommended)
mp auth add production \
--username sa_abc123... \
--project 12345 \
--region us
# You'll be prompted for the service account secret with hidden input
This stores credentials in ~/.mp/config.toml and sets production as the default account.
For CI/CD environments, provide the secret via environment variable or stdin:
# Via environment variable
MP_SECRET=your-secret mp auth add production --username sa_abc123... --project 12345
# Via stdin
echo "$SECRET" | mp auth add production --username sa_abc123... --project 12345 --secret-stdin
Step 2: Test Your Connection¶
Verify credentials are working:
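One way to check, sketched with assumed subcommand names (confirm the exact names with `mp --help`):

```shell
# Assumed subcommands -- confirm with `mp --help`.
mp auth list    # show stored accounts and the current default
mp events       # any discovery command exercises the credentials end to end
```

If both succeed, your service account can reach the project.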
Step 3: Explore Your Data¶
Before writing queries, survey your data landscape. Discovery commands let you see what exists in your Mixpanel project without guessing.
List Events¶
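A sketch of event discovery; `Workspace()` appears elsewhere in this guide, but `event_names()` is an assumed method name:

```python
import mixpanel_data as mp

ws = mp.Workspace()

# `event_names()` is an assumed discovery method -- it lists the
# event names defined in the connected Mixpanel project.
for name in ws.event_names():
    print(name)
```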
Drill Into Properties¶
Once you know an event name, see what properties it has:
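For example (a hedged sketch: `event_properties()` is an assumed method name, and "Sign Up" is a placeholder event from your own project):

```python
import mixpanel_data as mp

ws = mp.Workspace()

# Assumed method name; "Sign Up" is a placeholder event.
for prop in ws.event_properties("Sign Up"):
    print(prop)
```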
Sample Property Values¶
See actual values a property contains:
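A hedged sketch; `property_values()` is an assumed method name, and "Sign Up" / "plan" are placeholders:

```python
import mixpanel_data as mp

ws = mp.Workspace()

# Assumed method name; samples real values of the "plan" property
# on the placeholder "Sign Up" event.
for value in ws.property_values("Sign Up", "plan"):
    print(value)
```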
See What's Active¶
Check today's top events by volume:
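A sketch with assumed names; `top_events()` is a guess, and the `event_name`/`count` fields are modeled on the breakdown fields shown in Step 5:

```python
import mixpanel_data as mp

ws = mp.Workspace()

# Assumed method and field names.
for e in ws.top_events(limit=10):
    print(f"{e.event_name}: {e.count:,}")
```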
Browse Saved Assets¶
See funnels, cohorts, and saved reports already defined in Mixpanel:
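A hedged sketch; `funnels()` and `cohorts()` are assumed method names for saved-asset discovery:

```python
import mixpanel_data as mp

ws = mp.Workspace()

# Assumed method names -- list saved assets already defined in Mixpanel.
for funnel in ws.funnels():
    print(funnel.name)
for cohort in ws.cohorts():
    print(cohort.name)
```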
This discovery workflow ensures your queries reference real event names, valid properties, and actual values—no trial and error.
Step 4: Fetch Events to Local Storage¶
Fetch a month of events into a local DuckDB database:
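Building on the `fetch_events()` call shown under Temporary Workspaces below, a persistent-workspace fetch might look like this (the `jan_events` table name matches Step 5):

```python
import mixpanel_data as mp

ws = mp.Workspace()  # persistent, file-backed workspace

# Fetch January's events into a local DuckDB table named "jan_events".
ws.fetch_events("jan_events", from_date="2025-01-01", to_date="2025-01-31")
```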
Parallel Fetching for Large Date Ranges
For date ranges longer than a week, use --parallel (CLI) or parallel=True (Python) for up to 10x faster exports:
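In Python, that flag might be passed like this (a sketch; only the `parallel=True` name is taken from the note above):

```python
import mixpanel_data as mp

ws = mp.Workspace()

# parallel=True enables concurrent export requests for long date ranges.
ws.fetch_events("jan_events", from_date="2025-01-01", to_date="2025-01-31",
                parallel=True)
```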
See Fetching Data for details.
Step 5: Inspect Your Fetched Data¶
Before writing queries, explore what you fetched:
import mixpanel_data as mp

ws = mp.Workspace()

# See tables in your workspace
for table in ws.tables():
    print(f"{table.name}: {table.row_count:,} rows")

# Sample rows to see data shape
print(ws.sample("jan_events", n=3))

# Understand event distribution
breakdown = ws.event_breakdown("jan_events")
print(f"{breakdown.total_events:,} events from {breakdown.total_users:,} users")
for e in breakdown.events[:5]:
    print(f"  {e.event_name}: {e.count:,} ({e.pct_of_total:.1f}%)")

# Discover queryable property keys
print(ws.property_keys("jan_events"))
This tells you what events exist, how they're distributed, and what properties you can query—so your SQL is informed rather than guesswork.
Step 6: Query with SQL¶
Analyze the data with SQL:
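A hedged sketch: `sql()` returning a result set is an assumption (only `sql_scalar()` appears in this guide), and `event` is an assumed column name for the event-name field:

```python
import mixpanel_data as mp

ws = mp.Workspace()

# `sql()` is an assumed method; `event` is an assumed column name.
top = ws.sql("""
    SELECT event, COUNT(*) AS n
    FROM jan_events
    GROUP BY event
    ORDER BY n DESC
    LIMIT 5
""")
print(top)
```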
Step 7: Run Live Queries¶
For real-time analytics, query Mixpanel directly:
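For instance (hedged; `segmentation()` is an assumed method name, chosen because the Live Analytics page covers segmentation, funnels, and retention):

```python
import mixpanel_data as mp

ws = mp.Workspace()

# Assumed method name for a live segmentation query against Mixpanel.
result = ws.segmentation(
    event="Sign Up",  # placeholder event
    from_date="2025-01-01",
    to_date="2025-01-07",
)
print(result)
```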
Alternative: Stream Data Without Storage¶
For ETL pipelines or one-time processing, stream data directly without storing:
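A sketch under assumptions: `stream_events()` is a guessed method name, and the dict-shaped records are assumed:

```python
import mixpanel_data as mp

ws = mp.Workspace()

# Assumed method name: yields events one at a time without writing
# anything to the local DuckDB database.
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-07"):
    print(event["event"])  # assumed record shape
```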
Temporary Workspaces¶
For one-off analysis without persisting data, use ephemeral or in-memory workspaces:
import mixpanel_data as mp

# Ephemeral: uses temp file (best for large datasets, benefits from compression)
with mp.Workspace.ephemeral() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database automatically deleted when context exits

# In-memory: no files created (best for small datasets or zero disk footprint)
with mp.Workspace.memory() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-07")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database gone - no files ever created
Next Steps¶
- Configuration — Multiple accounts and advanced settings
- Fetching Data — Filtering and progress callbacks
- Streaming Data — Process data without local storage
- SQL Queries — DuckDB JSON syntax and patterns
- Live Analytics — Segmentation, funnels, retention