mixpanel_data

A complete programmable interface to Mixpanel analytics—available as both a Python library and a CLI.

AI-Friendly Documentation

🤖 Explore on DeepWiki →

DeepWiki provides an AI-optimized view of this project—perfect for code assistants, agents, and LLM-powered workflows. Ask questions about the codebase, explore architecture, or get contextual help.

Why This Exists

Mixpanel's web UI is built for interactive exploration. But many workflows need something different: scripts that run unattended, notebooks that combine Mixpanel data with other sources, agents that query analytics programmatically, or pipelines that move data between systems.

mixpanel_data provides direct programmatic access to Mixpanel's analytics platform. Core analytics—segmentation, funnels, retention, saved reports—plus capabilities like raw JQL execution and local SQL analysis are available as Python methods or shell commands.

Two Interfaces, One Capability Set

Python Library — For notebooks, scripts, and applications:

import mixpanel_data as mp

ws = mp.Workspace()

# Discover what's in your project
events = ws.list_events()
props = ws.list_properties("Purchase")
values = ws.list_property_values("Purchase", "country")
funnels = ws.list_funnels()
cohorts = ws.list_cohorts()
bookmarks = ws.list_bookmarks()

# Live queries—use discovered data to construct accurate queries
segmentation = ws.segmentation(
    event=events[0].name,
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country"
)

funnel = ws.funnel(
    funnel_id=funnels[0].id,
    from_date="2025-01-01",
    to_date="2025-01-31"
)

saved = ws.saved_report(bookmark_id=bookmarks[0].id)
activity = ws.activity_feed(
    distinct_id="user@example.com",
    from_date="2025-01-01"
)

# Fetch data locally (use parallel=True for large date ranges)
ws.fetch_events(
    "jan_events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
ws.fetch_events(
    "q1_events",
    from_date="2025-01-01",
    to_date="2025-03-31",
    parallel=True  # Up to 10x faster for large date ranges
)
ws.fetch_profiles("power_users", cohort_id=cohorts[0].id)

# Query with full SQL power—joins, window functions, CTEs
df = ws.sql("""
    SELECT
        e.properties->>'$.country' as country,
        COUNT(DISTINCT e.distinct_id) as users,
        COUNT(*) as events
    FROM jan_events e
    JOIN power_users u ON e.distinct_id = u.distinct_id
    GROUP BY 1
    ORDER BY 2 DESC
""")

# Results have .df for pandas interoperability
segmentation.df
funnel.df
df.to_csv("export.csv")

# Execute arbitrary JQL for custom analysis
jql_result = ws.jql("""
    function main() {
        return Events({...}).groupBy([...])
    }
""")

CLI — For shell scripts, pipelines, and agent tool calls:

# Discover your data landscape
mp inspect events
mp inspect properties "Purchase"
mp inspect values "Purchase" "country"
mp inspect top-events
mp inspect funnels
mp inspect cohorts
mp inspect bookmarks

# Live queries against Mixpanel API
mp query segmentation "Purchase" \
    --from 2025-01-01 --to 2025-01-31 --on country
mp query funnel 12345 --from 2025-01-01 --to 2025-01-31
mp query retention \
    --born-event Signup --return-event Purchase --from 2025-01-01
mp query activity-feed user@example.com --from 2025-01-01
mp query saved-report 67890
mp query frequency "Login" --from 2025-01-01

# Fetch data locally (use --parallel for large date ranges)
mp fetch events jan_events --from 2025-01-01 --to 2025-01-31
mp fetch events q1_events --from 2025-01-01 --to 2025-03-31 --parallel
mp fetch profiles users --cohort-id 12345

# Query locally with SQL
mp query sql "SELECT event_name, COUNT(*) FROM jan_events GROUP BY 1"

# Inspect local data
mp inspect tables
mp inspect schema jan_events
mp inspect sample jan_events
mp inspect summarize jan_events

# Filter with built-in jq
mp query segmentation "Purchase" --from 2025-01-01 --format json --jq '.total'

# Stream to Unix tools (memory-efficient for large datasets)
mp fetch events --stdout --from 2025-01-01 --to 2025-01-31 \
    | jq -r '.distinct_id' | sort -u | wc -l

Capabilities

Discovery — Rapidly explore your project's data landscape:

  • List all events, drill into properties, sample actual values
  • Browse saved funnels, cohorts, and reports (bookmarks)
  • Access Lexicon definitions from your data dictionary
  • Analyze property distributions, coverage, and numeric statistics
  • Inspect top events by volume, daily trends, user engagement patterns

Discovery commands let you survey what exists before writing queries—no guessing at event names or property values.

Live Queries — Execute Mixpanel analytics directly:

  • Segmentation with filtering, grouping, and time bucketing
  • Funnel conversion analysis
  • Retention analysis (see the sketch after this list)
  • Saved reports (Insights, Funnels, Flows, Retention)
  • User activity feeds
  • Frequency and engagement analysis
  • Numeric aggregations (sum, average, bucket)
  • Raw JQL execution for custom analysis
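
The CLI examples above include a retention query, but the Python walkthrough does not show one. A minimal sketch, reusing the ws workspace from earlier and assuming a retention method whose parameters mirror the CLI's --born-event and --return-event flags:

# Hypothetical: method name and parameters inferred from `mp query retention`
retention = ws.retention(
    born_event="Signup",      # cohort-defining event (--born-event)
    return_event="Purchase",  # event counted as a return (--return-event)
    from_date="2025-01-01",
    to_date="2025-01-31"
)
retention.df  # assumed to expose .df like the other result objects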

Local Storage — Fetch once, query repeatedly:

  • Store events and profiles in a local DuckDB database
  • Parallel fetching for large date ranges (up to 10x faster)
  • Query with full SQL: joins, window functions, CTEs (example after this list)
  • Introspect tables, sample data, analyze distributions
  • Iterate on analysis without repeated API calls
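
Window functions and CTEs run against locally fetched tables through ws.sql, shown earlier. A sketch over the jan_events table (the event_name column appears in the CLI SQL example; the rest is standard DuckDB SQL):

# Rank events by volume and compute each event's share of the total,
# combining a CTE with window functions over locally stored data
top = ws.sql("""
    WITH counts AS (
        SELECT event_name, COUNT(*) AS n
        FROM jan_events
        GROUP BY event_name
    )
    SELECT
        event_name,
        n,
        RANK() OVER (ORDER BY n DESC) AS volume_rank,
        ROUND(100.0 * n / SUM(n) OVER (), 1) AS pct_of_total
    FROM counts
    ORDER BY n DESC
""")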

Streaming — Process data without storage:

  • Stream events directly for ETL pipelines
  • One-time processing without local persistence
  • Memory-efficient iteration over large datasets
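
One way to consume the stream from Python is the documented --stdout flag, with no local storage involved. A sketch assuming the output is newline-delimited JSON with a distinct_id field, as the jq pipeline above implies:

import json
import subprocess

# Stream events through the CLI and process them line by line,
# holding only the running set of user IDs in memory
proc = subprocess.Popen(
    ["mp", "fetch", "events", "--stdout",
     "--from", "2025-01-01", "--to", "2025-01-31"],
    stdout=subprocess.PIPE,
    text=True,
)

seen = set()
for line in proc.stdout:
    event = json.loads(line)  # assumed: one JSON object per line
    seen.add(event["distinct_id"])

proc.wait()
print(f"{len(seen)} unique users")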

For Humans and Agents

The structured output and deterministic command interface make mixpanel_data particularly effective for AI coding agents—the same properties that make it scriptable for humans make it reliable for automated workflows.

Discovery commands are especially valuable: an agent can rapidly survey your data landscape—listing events, inspecting properties, sampling values—then construct accurate queries based on what actually exists rather than guessing.
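
Expressed with the calls from the Python example above, that survey-then-query loop looks roughly like this (the .name attribute on property entries is an assumption, mirroring the event listing):

import mixpanel_data as mp

ws = mp.Workspace()

# Survey first: only query names that actually exist
event_names = {e.name for e in ws.list_events()}
if "Purchase" in event_names:
    prop_names = {p.name for p in ws.list_properties("Purchase")}  # .name assumed
    if "country" in prop_names:
        result = ws.segmentation(
            event="Purchase",
            from_date="2025-01-01",
            to_date="2025-01-31",
            on="country"
        )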

The tool is designed to be self-documenting: comprehensive --help on every command, complete docstrings on every method, full type annotations throughout, and rich exception messages that explain what went wrong and how to fix it. Agents can discover capabilities, learn correct usage, and recover from mistakes autonomously.
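
In practice, discovering the tool itself is plain introspection, no special tooling required:

import mixpanel_data as mp

# Docstrings and type annotations are available through standard help();
# at the shell, every command answers `mp <command> --help`
help(mp.Workspace)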

LLM-Optimized Documentation

This documentation is built with AI consumption in mind. In addition to the standard HTML pages, we provide:

Endpoint         Size      Use Case
llms.txt         ~3KB      Structured index—discover what documentation exists
llms-full.txt    ~400KB    Complete documentation in one file—comprehensive search
index.md pages   Varies    Each HTML page has a corresponding index.md at the same path

Every page also has a Copy Markdown button in the upper right corner—click it to copy the page content as markdown, ready to paste into your AI assistant's context.

For interactive exploration of the codebase itself, see DeepWiki.

Next Steps