
Quick Start

This guide walks you through your first queries with mixpanel_data in about 5 minutes.

Explore on DeepWiki

🤖 Quick Start Tutorial →

Ask questions about getting started, explore example workflows, or troubleshoot common issues.

Prerequisites

You'll need:

  • mixpanel_data installed (pip install git+https://github.com/jaredmcfarland/mixpanel_data.git)
  • A Mixpanel service account with username, secret, and project ID
  • Your project's data residency region (us, eu, or in)

Step 1: Set Up Service Account Credentials

Option A: Environment Variables

export MP_USERNAME="sa_abc123..."
export MP_SECRET="your-secret-here"
export MP_PROJECT_ID="12345"
export MP_REGION="us"

Option B: Using the CLI

# Interactive prompt (secure, recommended)
mp auth add production \
    --username sa_abc123... \
    --project 12345 \
    --region us
# You'll be prompted for the service account secret with hidden input

This stores credentials in ~/.mp/config.toml and sets production as the default account.

For CI/CD environments, provide the secret via environment variable or stdin:

# Via environment variable
MP_SECRET=your-secret mp auth add production --username sa_abc123... --project 12345

# Via stdin
echo "$SECRET" | mp auth add production --username sa_abc123... --project 12345 --secret-stdin

Step 2: Test Your Connection

Verify credentials are working:

mp auth test

Or in Python:

import mixpanel_data as mp

ws = mp.Workspace()
ws.test_credentials()  # Raises AuthenticationError if invalid
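
If you want to handle a failure in a script rather than let it raise, you can catch the exception. This is only a sketch: it assumes AuthenticationError is exposed on the top-level package, so adjust the import to match your install.

import mixpanel_data as mp

ws = mp.Workspace()
try:
    ws.test_credentials()
    print("Credentials OK")
except mp.AuthenticationError:  # assumed export location; adjust if needed
    print("Invalid credentials: check MP_USERNAME, MP_SECRET, and MP_PROJECT_ID")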

Step 3: Explore Your Data

Before writing queries, survey your data landscape. Discovery commands let you see what exists in your Mixpanel project without guessing.

List Events

mp inspect events

Or in Python:

import mixpanel_data as mp

ws = mp.Workspace()
events = ws.list_events()
for e in events[:10]:
    print(e.name)

Drill Into Properties

Once you know an event name, see what properties it has:

mp inspect properties "Purchase"

Or in Python:

props = ws.list_properties("Purchase")
for p in props:
    print(f"{p.name}: {p.type}")

Sample Property Values

See actual values a property contains:

mp inspect values "Purchase" "country"

Or in Python:

values = ws.list_property_values("Purchase", "country")
print(values)  # ['US', 'UK', 'DE', 'FR', ...]

See What's Active

Check today's top events by volume:

mp inspect top-events

Or in Python:

top = ws.top_events()
for e in top[:5]:
    print(f"{e.name}: {e.count:,} events")

Browse Saved Assets

See funnels, cohorts, and saved reports already defined in Mixpanel:

mp inspect funnels
mp inspect cohorts
mp inspect bookmarks

Or in Python:

funnels = ws.list_funnels()
cohorts = ws.list_cohorts()
bookmarks = ws.list_bookmarks()

This discovery workflow ensures your queries reference real event names, valid properties, and actual values—no trial and error.
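
Putting the discovery calls together, a single pass might look like the sketch below; the "Purchase" event and "country" property are just the illustrative names used above.

import mixpanel_data as mp

ws = mp.Workspace()

# Survey what exists, then drill into one event.
for event in ws.list_events()[:5]:
    print(event.name)

# "Purchase" and "country" are illustrative; use names from your own project.
for prop in ws.list_properties("Purchase"):
    print(f"{prop.name}: {prop.type}")

print(ws.list_property_values("Purchase", "country"))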

Step 4: Fetch Events to Local Storage

Fetch a month of events into a local DuckDB database:

mp fetch events jan_events --from 2025-01-01 --to 2025-01-31

Or in Python:

import mixpanel_data as mp

ws = mp.Workspace()
result = ws.fetch_events(
    name="jan_events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
print(f"Fetched {result.row_count} events in {result.duration_seconds:.1f}s")

Parallel Fetching for Large Date Ranges

For date ranges longer than a week, use --parallel (CLI) or parallel=True (Python) for up to 10x faster exports:

mp fetch events q1_events --from 2025-01-01 --to 2025-03-31 --parallel
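
A Python sketch of the same quarter-long fetch, assuming fetch_events accepts parallel=True as described above:

import mixpanel_data as mp

ws = mp.Workspace()

# parallel=True is assumed to match the --parallel CLI flag shown above.
result = ws.fetch_events(
    name="q1_events",
    from_date="2025-01-01",
    to_date="2025-03-31",
    parallel=True,
)
print(f"Fetched {result.row_count} events in {result.duration_seconds:.1f}s")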

See Fetching Data for details.

Step 5: Inspect Your Fetched Data

Before writing queries, explore what you fetched:

# See tables in your workspace
mp inspect tables

# Sample a few rows to see the data shape
mp inspect sample -t jan_events

# Understand event distribution
mp inspect breakdown -t jan_events

# Discover queryable property keys
mp inspect keys -t jan_events

Or in Python:

import mixpanel_data as mp

ws = mp.Workspace()

# See tables in your workspace
for table in ws.tables():
    print(f"{table.name}: {table.row_count:,} rows")

# Sample rows to see data shape
print(ws.sample("jan_events", n=3))

# Understand event distribution
breakdown = ws.event_breakdown("jan_events")
print(f"{breakdown.total_events:,} events from {breakdown.total_users:,} users")
for e in breakdown.events[:5]:
    print(f"  {e.event_name}: {e.count:,} ({e.pct_of_total:.1f}%)")

# Discover queryable property keys
print(ws.property_keys("jan_events"))

This tells you what events exist, how they're distributed, and what properties you can query, so your SQL is grounded in real data rather than guesswork.

Step 6: Query with SQL

Analyze the data with SQL:

mp query sql "SELECT event_name, COUNT(*) as count FROM jan_events GROUP BY 1 ORDER BY 2 DESC" --format table

Or in Python:

import mixpanel_data as mp

ws = mp.Workspace()

# Get results as DataFrame
df = ws.sql("""
    SELECT
        event_name,
        COUNT(*) as count
    FROM jan_events
    GROUP BY 1
    ORDER BY 2 DESC
""")
print(df)
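
Because ws.sql returns a DataFrame, the result plugs into the usual pandas workflow. For example, to persist the breakdown (the output path here is just illustrative):

import mixpanel_data as mp

ws = mp.Workspace()

# Run the same breakdown query and save the results with pandas.
df = ws.sql("SELECT event_name, COUNT(*) AS count FROM jan_events GROUP BY 1 ORDER BY 2 DESC")
df.to_csv("event_counts.csv", index=False)  # illustrative output path
print(df.head())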

Step 7: Run Live Queries

For real-time analytics, query Mixpanel directly:

mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 --format table

# Filter results with built-in jq support
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
    --format json --jq '.total'

Or in Python:

import mixpanel_data as mp

ws = mp.Workspace()

result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Access as DataFrame
print(result.df)
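
To get at a total in Python, similar to the --jq '.total' filter above, you can aggregate the result DataFrame. Its exact column layout depends on the query, so treat this as a sketch that reuses result from the snippet above.

# Sum the numeric columns of the segmentation DataFrame for an overall count.
# No specific column names are assumed; the layout depends on the query.
print(result.df.sum(numeric_only=True))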

Alternative: Stream Data Without Storage

For ETL pipelines or one-time processing, stream data directly without storing it:

# Stream events as JSONL (memory-efficient for large datasets)
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout > events.jsonl

# Count unique users via Unix pipeline
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \
  | jq -r '.distinct_id' | sort -u | wc -l

Or in Python:

import mixpanel_data as mp

ws = mp.Workspace()
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    send_to_warehouse(event)  # placeholder for your own processing logic
ws.close()
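
A Python counterpart to the JSONL export above, assuming each streamed event is a plain dict (if stream_events yields richer objects, adapt the serialization accordingly):

import json

import mixpanel_data as mp

ws = mp.Workspace()
with open("events.jsonl", "w") as fh:
    for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
        fh.write(json.dumps(event) + "\n")  # assumes dict-like events
ws.close()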

Temporary Workspaces

For one-off analysis without persisting data, use ephemeral or in-memory workspaces:

import mixpanel_data as mp

# Ephemeral: uses temp file (best for large datasets, benefits from compression)
with mp.Workspace.ephemeral() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database automatically deleted when context exits

# In-memory: no files created (best for small datasets or zero disk footprint)
with mp.Workspace.memory() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-07")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database gone - no files ever created

Next Steps