# mixpanel_data

> Python library for working with Mixpanel analytics data, designed for AI coding agents

mixpanel_data is a complete programmable interface to Mixpanel analytics.
Python library and CLI for discovery, querying, and data extraction.
Discover your schema, run live analytics (segmentation, funnels, retention),
execute JQL, and analyze locally with SQL via DuckDB.


# Getting Started

# mixpanel_data

A complete programmable interface to Mixpanel analytics—available as both a Python library and CLI.

AI-Friendly Documentation

🤖 **[Explore on DeepWiki →](https://deepwiki.com/jaredmcfarland/mixpanel_data)**

DeepWiki provides an AI-optimized view of this project—perfect for code assistants, agents, and LLM-powered workflows. Ask questions about the codebase, explore architecture, or get contextual help.

## Why This Exists

Mixpanel's web UI is built for interactive exploration. But many workflows need something different: scripts that run unattended, notebooks that combine Mixpanel data with other sources, agents that query analytics programmatically, or pipelines that move data between systems.

`mixpanel_data` provides direct programmatic access to Mixpanel's analytics platform. Core analytics—segmentation, funnels, retention, saved reports—plus capabilities like raw JQL execution and local SQL analysis are available as Python methods or shell commands.

## Two Interfaces, One Capability Set

**Python Library** — For notebooks, scripts, and applications:

```
import mixpanel_data as mp

ws = mp.Workspace()

# Discover what's in your project
events = ws.list_events()
props = ws.list_properties("Purchase")
values = ws.list_property_values("Purchase", "country")
funnels = ws.list_funnels()
cohorts = ws.list_cohorts()
bookmarks = ws.list_bookmarks()

# Live queries—use discovered data to construct accurate queries
segmentation = ws.segmentation(
    event=events[0].name,
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country"
)

funnel = ws.funnel(
    funnel_id=funnels[0].id,
    from_date="2025-01-01",
    to_date="2025-01-31"
)

saved = ws.saved_report(bookmark_id=bookmarks[0].id)
activity = ws.activity_feed(
    distinct_id="user@example.com",
    from_date="2025-01-01"
)

# Fetch data locally (use parallel=True for large date ranges)
ws.fetch_events(
    "jan_events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
ws.fetch_events(
    "q1_events",
    from_date="2025-01-01",
    to_date="2025-03-31",
    parallel=True  # Up to 10x faster for large date ranges
)
ws.fetch_profiles("power_users", cohort_id=cohorts[0].id)

# Query with full SQL power—joins, window functions, CTEs
df = ws.sql("""
    SELECT
        e.properties->>'$.country' as country,
        COUNT(DISTINCT e.distinct_id) as users,
        COUNT(*) as events
    FROM jan_events e
    JOIN power_users u ON e.distinct_id = u.distinct_id
    GROUP BY 1
    ORDER BY 2 DESC
""")

# Results have .df for pandas interoperability
segmentation.df
funnel.df
df.to_csv("export.csv")

# Execute arbitrary JQL for custom analysis
jql_result = ws.jql("""
    function main() {
        return Events({...}).groupBy([...])
    }
""")
```

**CLI** — For shell scripts, pipelines, and agent tool calls:

```
# Discover your data landscape
mp inspect events
mp inspect properties "Purchase"
mp inspect values "Purchase" "country"
mp inspect top-events
mp inspect funnels
mp inspect cohorts
mp inspect bookmarks

# Live queries against Mixpanel API
mp query segmentation "Purchase" \
    --from 2025-01-01 --to 2025-01-31 --on country
mp query funnel 12345 --from 2025-01-01 --to 2025-01-31
mp query retention \
    --born-event Signup --return-event Purchase --from 2025-01-01
mp query activity-feed user@example.com --from 2025-01-01
mp query saved-report 67890
mp query frequency "Login" --from 2025-01-01

# Fetch data locally (use --parallel for large date ranges)
mp fetch events jan_events --from 2025-01-01 --to 2025-01-31
mp fetch events q1_events --from 2025-01-01 --to 2025-03-31 --parallel
mp fetch profiles users --cohort-id 12345

# Query locally with SQL
mp query sql "SELECT event_name, COUNT(*) FROM jan_events GROUP BY 1"

# Inspect local data
mp inspect tables
mp inspect schema jan_events
mp inspect sample jan_events
mp inspect summarize jan_events

# Filter with built-in jq
mp query segmentation "Purchase" --from 2025-01-01 --format json --jq '.total'

# Stream to Unix tools (memory-efficient for large datasets)
mp fetch events --stdout --from 2025-01-01 --to 2025-01-31 \
    | jq -r '.distinct_id' | sort -u | wc -l
```

## Capabilities

**Discovery** — Rapidly explore your project's data landscape:

- List all events, drill into properties, sample actual values
- Browse saved funnels, cohorts, and reports (bookmarks)
- Access Lexicon definitions from your data dictionary
- Analyze property distributions, coverage, and numeric statistics
- Inspect top events by volume, daily trends, user engagement patterns

Discovery commands let you survey what exists before writing queries—no guessing at event names or property values.

**Live Queries** — Execute Mixpanel analytics directly:

- Segmentation with filtering, grouping, and time bucketing
- Funnel conversion analysis
- Retention analysis
- Saved reports (Insights, Funnels, Flows, Retention)
- User activity feeds
- Frequency and engagement analysis
- Numeric aggregations (sum, average, bucket)
- Raw JQL execution for custom analysis

**Local Storage** — Fetch once, query repeatedly:

- Store events and profiles in a local DuckDB database
- Parallel fetching for large date ranges (up to 10x faster)
- Query with full SQL: joins, window functions, CTEs
- Introspect tables, sample data, analyze distributions
- Iterate on analysis without repeated API calls

**Streaming** — Process data without storage:

- Stream events directly for ETL pipelines
- One-time processing without local persistence
- Memory-efficient iteration over large datasets

## For Humans and Agents

The structured output and deterministic command interface make `mixpanel_data` particularly effective for AI coding agents—the same properties that make it scriptable for humans make it reliable for automated workflows.

Discovery commands are particularly valuable: an agent can rapidly survey your data landscape—listing events, inspecting properties, sampling values—then construct accurate queries based on what actually exists rather than guessing.

The tool is designed to be self-documenting: comprehensive `--help` on every command, complete docstrings on every method, full type annotations throughout, and rich exception messages that explain what went wrong and how to fix it. Agents can discover capabilities, learn correct usage, and recover from mistakes autonomously.

### LLM-Optimized Documentation

This documentation is built with AI consumption in mind. In addition to the standard HTML pages, we provide:

| Endpoint                                                                        | Size   | Use Case                                                       |
| ------------------------------------------------------------------------------- | ------ | -------------------------------------------------------------- |
| [`llms.txt`](https://jaredmcfarland.github.io/mixpanel_data/llms.txt)           | ~3KB   | Structured index—discover what documentation exists            |
| [`llms-full.txt`](https://jaredmcfarland.github.io/mixpanel_data/llms-full.txt) | ~400KB | Complete documentation in one file—comprehensive search        |
| [`index.md`](https://jaredmcfarland.github.io/mixpanel_data/index.md) pages     | Varies | Each HTML page has a corresponding `index.md` at the same path |

Every page also has a **Copy Markdown** button in the upper right corner—click it to copy the page content as markdown, ready to paste into your AI assistant's context.

For interactive exploration of the codebase itself, see [DeepWiki](https://deepwiki.com/jaredmcfarland/mixpanel_data).

## Next Steps

- [Installation](https://jaredmcfarland.github.io/mixpanel_data/getting-started/installation/index.md) — Get started with pip or uv
- [Quick Start](https://jaredmcfarland.github.io/mixpanel_data/getting-started/quickstart/index.md) — Your first queries in 5 minutes
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/index.md) — Complete Python API documentation
- [CLI Reference](https://jaredmcfarland.github.io/mixpanel_data/cli/index.md) — Command-line interface documentation

Copy markdown

# Installation

> **⚠️ Pre-release Software**: This package is under active development and not yet published to PyPI. Install directly from GitHub.

Explore on DeepWiki

🤖 **[Installation Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/2.1-installation)**

Ask questions about requirements, dependencies, or troubleshoot installation issues.

## Requirements

- Python 3.11 or higher
- A Mixpanel service account with API access

## Installing with pip

```
pip install git+https://github.com/jaredmcfarland/mixpanel_data.git
```

## Installing with uv

[uv](https://github.com/astral-sh/uv) is a fast Python package installer:

```
uv pip install git+https://github.com/jaredmcfarland/mixpanel_data.git
```

Or add to your project:

```
uv add git+https://github.com/jaredmcfarland/mixpanel_data.git
```

## Optional Dependencies

### Documentation Tools

If you want to build the documentation locally:

```
pip install mixpanel_data[docs]
```

## Verifying Installation

After installation, verify the CLI is available:

```
mp --version
```

You should see output like:

```
mixpanel_data 0.1.0
```

Test the Python import:

```
import mixpanel_data as mp
print(mp.__version__)
```

## Next Steps

- [Quick Start](https://jaredmcfarland.github.io/mixpanel_data/getting-started/quickstart/index.md) — Set up credentials and run your first query
- [Configuration](https://jaredmcfarland.github.io/mixpanel_data/getting-started/configuration/index.md) — Learn about environment variables and config files

Copy markdown

# Quick Start

This guide walks you through your first queries with mixpanel_data in about 5 minutes.

Explore on DeepWiki

🤖 **[Quick Start Tutorial →](https://deepwiki.com/jaredmcfarland/mixpanel_data/2.3-quick-start-tutorial)**

Ask questions about getting started, explore example workflows, or troubleshoot common issues.

## Prerequisites

You'll need:

- mixpanel_data installed (`pip install git+https://github.com/jaredmcfarland/mixpanel_data.git`)
- A Mixpanel service account with username, secret, and project ID
- Your project's data residency region (us, eu, or in)

## Step 1: Set Up Service Account Credentials

### Option A: Environment Variables

```
export MP_USERNAME="sa_abc123..."
export MP_SECRET="your-secret-here"
export MP_PROJECT_ID="12345"
export MP_REGION="us"
```

### Option B: Using the CLI

```
# Interactive prompt (secure, recommended)
mp auth add production \
    --username sa_abc123... \
    --project 12345 \
    --region us
# You'll be prompted for the service account secret with hidden input
```

This stores credentials in `~/.mp/config.toml` and sets `production` as the default account.

For CI/CD environments, provide the secret via environment variable or stdin:

```
# Via environment variable
MP_SECRET=your-secret mp auth add production --username sa_abc123... --project 12345

# Via stdin
echo "$SECRET" | mp auth add production --username sa_abc123... --project 12345 --secret-stdin
```

## Step 2: Test Your Connection

Verify credentials are working:

```
mp auth test
```

```
import mixpanel_data as mp

ws = mp.Workspace()
ws.test_credentials()  # Raises AuthenticationError if invalid
```

## Step 3: Explore Your Data

Before writing queries, survey your data landscape. Discovery commands let you see what exists in your Mixpanel project without guessing.

### List Events

```
mp inspect events
```

```
import mixpanel_data as mp

ws = mp.Workspace()
events = ws.list_events()
for e in events[:10]:
    print(e.name)
```

### Drill Into Properties

Once you know an event name, see what properties it has:

```
mp inspect properties "Purchase"
```

```
props = ws.list_properties("Purchase")
for p in props:
    print(f"{p.name}: {p.type}")
```

### Sample Property Values

See actual values a property contains:

```
mp inspect values "Purchase" "country"
```

```
values = ws.list_property_values("Purchase", "country")
print(values)  # ['US', 'UK', 'DE', 'FR', ...]
```

### See What's Active

Check today's top events by volume:

```
mp inspect top-events
```

```
top = ws.top_events()
for e in top[:5]:
    print(f"{e.name}: {e.count:,} events")
```

### Browse Saved Assets

See funnels, cohorts, and saved reports already defined in Mixpanel:

```
mp inspect funnels
mp inspect cohorts
mp inspect bookmarks
```

```
funnels = ws.list_funnels()
cohorts = ws.list_cohorts()
bookmarks = ws.list_bookmarks()
```

This discovery workflow ensures your queries reference real event names, valid properties, and actual values—no trial and error.

## Step 4: Fetch Events to Local Storage

Fetch a month of events into a local DuckDB database:

```
mp fetch events jan_events --from 2025-01-01 --to 2025-01-31
```

```
import mixpanel_data as mp

ws = mp.Workspace()
result = ws.fetch_events(
    name="jan_events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
print(f"Fetched {result.row_count} events in {result.duration_seconds:.1f}s")
```

Parallel Fetching for Large Date Ranges

For date ranges longer than a week, use `--parallel` (CLI) or `parallel=True` (Python) for up to 10x faster exports:

```
mp fetch events q1_events --from 2025-01-01 --to 2025-03-31 --parallel
```

See [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/#parallel-fetching) for details.

## Step 5: Inspect Your Fetched Data

Before writing queries, explore what you fetched:

```
# See tables in your workspace
mp inspect tables

# Sample a few rows to see the data shape
mp inspect sample -t jan_events

# Understand event distribution
mp inspect breakdown -t jan_events

# Discover queryable property keys
mp inspect keys -t jan_events
```

```
import mixpanel_data as mp

ws = mp.Workspace()

# See tables in your workspace
for table in ws.tables():
    print(f"{table.name}: {table.row_count:,} rows")

# Sample rows to see data shape
print(ws.sample("jan_events", n=3))

# Understand event distribution
breakdown = ws.event_breakdown("jan_events")
print(f"{breakdown.total_events:,} events from {breakdown.total_users:,} users")
for e in breakdown.events[:5]:
    print(f"  {e.event_name}: {e.count:,} ({e.pct_of_total:.1f}%)")

# Discover queryable property keys
print(ws.property_keys("jan_events"))
```

This tells you what events exist, how they're distributed, and what properties you can query—so your SQL is informed rather than guesswork.

## Step 6: Query with SQL

Analyze the data with SQL:

```
mp query sql "SELECT event_name, COUNT(*) as count FROM jan_events GROUP BY 1 ORDER BY 2 DESC" --format table
```

```
import mixpanel_data as mp

ws = mp.Workspace()

# Get results as DataFrame
df = ws.sql("""
    SELECT
        event_name,
        COUNT(*) as count
    FROM jan_events
    GROUP BY 1
    ORDER BY 2 DESC
""")
print(df)
```

## Step 7: Run Live Queries

For real-time analytics, query Mixpanel directly:

```
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 --format table

# Filter results with built-in jq support
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
    --format json --jq '.total'
```

```
import mixpanel_data as mp

ws = mp.Workspace()

result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Access as DataFrame
print(result.df)
```

## Alternative: Stream Data Without Storage

For ETL pipelines or one-time processing, stream data directly without storing:

```
# Stream events as JSONL (memory-efficient for large datasets)
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout > events.jsonl

# Count unique users via Unix pipeline
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \
  | jq -r '.distinct_id' | sort -u | wc -l
```

```
import mixpanel_data as mp

ws = mp.Workspace()
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    send_to_warehouse(event)
ws.close()
```

## Temporary Workspaces

For one-off analysis without persisting data, use **ephemeral** or **in-memory** workspaces:

```
import mixpanel_data as mp

# Ephemeral: uses temp file (best for large datasets, benefits from compression)
with mp.Workspace.ephemeral() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database automatically deleted when context exits

# In-memory: no files created (best for small datasets or zero disk footprint)
with mp.Workspace.memory() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-07")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database gone - no files ever created
```

## Next Steps

- [Configuration](https://jaredmcfarland.github.io/mixpanel_data/getting-started/configuration/index.md) — Multiple accounts and advanced settings
- [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/index.md) — Filtering and progress callbacks
- [Streaming Data](https://jaredmcfarland.github.io/mixpanel_data/guide/streaming/index.md) — Process data without local storage
- [SQL Queries](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) — DuckDB JSON syntax and patterns
- [Live Analytics](https://jaredmcfarland.github.io/mixpanel_data/guide/live-analytics/index.md) — Segmentation, funnels, retention

Copy markdown

# Configuration

mixpanel_data uses Service Accounts for authentication and supports multiple configuration methods for credentials and settings.

Explore on DeepWiki

🤖 **[Authentication Setup →](https://deepwiki.com/jaredmcfarland/mixpanel_data/2.2-authentication-setup)**

Ask questions about service accounts, environment variables, or multi-account configuration.

## Environment Variables

Set these environment variables to configure credentials:

| Variable         | Description                              | Required           |
| ---------------- | ---------------------------------------- | ------------------ |
| `MP_USERNAME`    | Service account username                 | Yes                |
| `MP_SECRET`      | Service account secret                   | Yes                |
| `MP_PROJECT_ID`  | Mixpanel project ID                      | Yes                |
| `MP_REGION`      | Data residency region (`us`, `eu`, `in`) | No (default: `us`) |
| `MP_CONFIG_PATH` | Override config file location            | No                 |
| `MP_ACCOUNT`     | Account name to use from config file     | No                 |

Example:

```
export MP_USERNAME="sa_abc123..."
export MP_SECRET="your-secret-here"
export MP_PROJECT_ID="12345"
export MP_REGION="us"
```

## Config File

For persistent credential storage, use the config file at `~/.mp/config.toml`:

```
default = "production"

[accounts.production]
username = "sa_abc123..."
secret = "..."
project_id = "12345"
region = "us"

[accounts.staging]
username = "sa_xyz789..."
secret = "..."
project_id = "67890"
region = "eu"

[accounts.development]
username = "sa_dev456..."
secret = "..."
project_id = "11111"
region = "us"
```

### Managing Accounts with CLI

Add a new account:

```
# Interactive prompt (secure, recommended)
mp auth add production \
    --username sa_abc123... \
    --project 12345 \
    --region us
# You'll be prompted for the secret with hidden input
```

For CI/CD environments, provide the secret via environment variable or stdin:

```
# Via environment variable
MP_SECRET=your-secret mp auth add production --username sa_abc123... --project 12345

# Via stdin
echo "$SECRET" | mp auth add production --username sa_abc123... --project 12345 --secret-stdin
```

List configured accounts:

```
mp auth list
```

Switch the default account:

```
mp auth switch staging
```

Remove an account:

```
mp auth remove development
```

Show account details (secrets hidden):

```
mp auth show production
```

### Managing Accounts with Python

```
from mixpanel_data.auth import ConfigManager

config = ConfigManager()

# Add account
config.add_account(
    name="production",
    username="sa_abc123...",
    secret="your-secret",
    project_id="12345",
    region="us"
)

# List accounts
accounts = config.list_accounts()
for account in accounts:
    print(f"{account.name}: project {account.project_id} ({account.region})")

# Set default
config.set_default("production")

# Remove account
config.remove_account("old_account")
```

## Credential Resolution Order

When creating a Workspace, credentials are resolved in this order:

1. **Explicit arguments** — `Workspace(project_id=..., region=...)`
1. **Environment variables** — `MP_USERNAME`, `MP_SECRET`, etc.
1. **Named account** — `Workspace(account="staging")` or `MP_ACCOUNT=staging`
1. **Default account** — The account marked as `default` in config.toml

Example showing resolution:

```
import mixpanel_data as mp

# Uses explicit arguments
ws = mp.Workspace(
    username="sa_...",
    secret="...",
    project_id="12345"
)

# Uses environment variables (if set)
ws = mp.Workspace()

# Uses named account from config file
ws = mp.Workspace(account="staging")
```

## Data Residency Regions

Mixpanel stores data in regional data centers. Use the correct region for your project:

| Region         | Code | API Endpoint      |
| -------------- | ---- | ----------------- |
| United States  | `us` | `mixpanel.com`    |
| European Union | `eu` | `eu.mixpanel.com` |
| India          | `in` | `in.mixpanel.com` |

Region Mismatch

Using the wrong region will result in authentication errors or empty data.

## Workspace Path

By default, the workspace database is stored at `./mixpanel.db`. Override with:

```
import mixpanel_data as mp

# Custom path
ws = mp.Workspace(path="./data/analytics.db")

# Ephemeral (temporary, auto-deleted)
with mp.Workspace.ephemeral() as ws:
    # ... work with data
# Database deleted on exit
```

For CLI, use the `--db` option:

```
mp fetch events --db ./data/my_project.db --from 2025-01-01 --to 2025-01-31
```

## Next Steps

- [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/index.md) — Learn about data ingestion
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/index.md) — Complete API documentation

Copy markdown
# User Guide

# Fetching Data

Fetch events and user profiles from Mixpanel into a local DuckDB database for fast, repeated SQL queries.

Explore on DeepWiki

🤖 **[Fetching Data Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.3-fetching-data)**

Ask questions about fetch options, parallel processing, or troubleshoot data ingestion issues.

## Fetching Events

### Basic Usage

Fetch all events for a date range:

```
import mixpanel_data as mp

ws = mp.Workspace()
result = ws.fetch_events(
    name="jan_events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

print(f"Fetched {result.row_count} events")
print(f"Duration: {result.duration_seconds:.1f}s")
```

```
mp fetch events jan_events --from 2025-01-01 --to 2025-01-31
```

### Filtering Events

Fetch specific event types:

```
result = ws.fetch_events(
    name="purchases",
    from_date="2025-01-01",
    to_date="2025-01-31",
    events=["Purchase", "Checkout Started"]
)
```

```
mp fetch events purchases --from 2025-01-01 --to 2025-01-31 \
    --events Purchase,"Checkout Started"
```

### Using Where Clauses

Filter with Mixpanel expression syntax:

```
result = ws.fetch_events(
    name="premium_purchases",
    from_date="2025-01-01",
    to_date="2025-01-31",
    where='properties["plan"] == "premium"'
)
```

```
mp fetch events premium_purchases --from 2025-01-01 --to 2025-01-31 \
    --where 'properties["plan"] == "premium"'
```

### Limiting Results

Cap the number of events returned (max 100,000):

```
result = ws.fetch_events(
    name="sample_events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    limit=10000
)
```

```
mp fetch events sample_events --from 2025-01-01 --to 2025-01-31 \
    --limit 10000
```

This is useful for testing queries or sampling data before a full fetch.

### Progress Tracking

Monitor fetch progress with a callback:

```
def on_progress(count: int) -> None:
    print(f"Fetched {count} events...")

result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    progress_callback=on_progress
)
```

The CLI automatically displays a progress bar.

### Batch Size

Control the memory/IO tradeoff with `batch_size`:

```
# Smaller batch size = less memory, more disk IO
result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    batch_size=500
)

# Larger batch size = more memory, less disk IO
result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    batch_size=5000
)
```

```
mp fetch events --from 2025-01-01 --to 2025-01-31 --batch-size 500
```

The default is 1000 rows per commit. Valid range: 100-100,000.

## Parallel Fetching

For large date ranges, parallel fetching can dramatically speed up exports—up to 10x faster for multi-month ranges.

### Basic Parallel Fetch

Enable parallel fetching with the `parallel` flag:

```
result = ws.fetch_events(
    name="q4_events",
    from_date="2024-10-01",
    to_date="2024-12-31",
    parallel=True
)

print(f"Fetched {result.total_rows} rows in {result.duration_seconds:.1f}s")
print(f"Batches: {result.successful_batches} succeeded, {result.failed_batches} failed")
```

```
mp fetch events q4_events --from 2024-10-01 --to 2024-12-31 --parallel
```

Parallel fetching splits the date range into 7-day chunks and fetches them concurrently using multiple threads. This bypasses Mixpanel's 100-day limit and enables faster exports.

### How It Works

1. **Date Range Chunking**: The date range is split into chunks (default: 7 days each)
1. **Concurrent Fetching**: Multiple threads fetch chunks simultaneously from Mixpanel
1. **Single-Writer Queue**: A dedicated writer thread serializes writes to DuckDB (respecting its single-writer constraint)
1. **Partial Failure Handling**: Failed batches are tracked for potential retry

### Performance

| Date Range | Sequential | Parallel (10 workers) | Speedup         |
| ---------- | ---------- | --------------------- | --------------- |
| 7 days     | ~5s        | ~5s                   | 1x (no benefit) |
| 30 days    | ~20s       | ~5s                   | 4x              |
| 90 days    | ~60s       | ~8s                   | 7.5x            |

When to Use Parallel Fetching

- **Use parallel** for date ranges > 7 days
- **Use sequential** for small ranges or when you need the `limit` parameter

### Configuring Workers

Control the number of concurrent fetch threads:

```
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    max_workers=5  # Default is 10
)
```

```
mp fetch events --from 2024-01-01 --to 2024-03-31 --parallel --workers 5
```

Higher worker counts may hit Mixpanel rate limits. The default of 10 works well for most cases.

### Configuring Chunk Size

Control how many days each chunk covers:

```
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    chunk_days=14  # Default is 7
)
```

```
mp fetch events --from 2024-01-01 --to 2024-03-31 --parallel --chunk-days 14
```

Smaller chunk sizes create more parallel batches (potentially faster) but increase API overhead. Valid range: 1-100 days.

### Progress Callbacks

Monitor batch completion with a callback:

```
from mixpanel_data import BatchProgress

def on_batch(progress: BatchProgress) -> None:
    status = "✓" if progress.success else "✗"
    print(f"[{status}] Batch {progress.batch_index + 1}/{progress.total_batches}: "
          f"{progress.from_date} to {progress.to_date} ({progress.rows} rows)")

result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    on_batch_complete=on_batch
)
```

The CLI automatically displays batch progress when `--parallel` is used.

### Handling Failures

Parallel fetching tracks failures and provides retry information:

```
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True
)

if result.has_failures:
    print(f"Warning: {result.failed_batches} batches failed")
    for from_date, to_date in result.failed_date_ranges:
        print(f"  Failed: {from_date} to {to_date}")

    # Retry failed ranges with append mode
    for from_date, to_date in result.failed_date_ranges:
        ws.fetch_events(
            name="events",
            from_date=from_date,
            to_date=to_date,
            append=True  # Append to existing table
        )
```

Parallel Fetch Limitations

- **No `limit` parameter**: Parallel fetch does not support the `limit` parameter. Using both raises an error.
- **Exit code 1 on partial failure**: The CLI returns exit code 1 if any batches fail, even if some succeeded.

## Fetching Profiles

Fetch user profiles into local storage:

```
result = ws.fetch_profiles(name="users")
print(f"Fetched {result.row_count} profiles")
```

```
mp fetch profiles users
```

### Filtering Profiles

Use Mixpanel expression syntax:

```
result = ws.fetch_profiles(
    name="premium_users",
    where='properties["plan"] == "premium"'
)
```

```
mp fetch profiles premium_users \
    --where 'properties["plan"] == "premium"'
```

### Filtering by Cohort

Fetch only profiles that are members of a specific cohort:

```
result = ws.fetch_profiles(
    name="power_users",
    cohort_id="12345"
)
```

```
mp fetch profiles power_users --cohort 12345
```

### Selecting Specific Properties

Reduce bandwidth and memory by fetching only the properties you need:

```
result = ws.fetch_profiles(
    name="user_emails",
    output_properties=["$email", "$name", "plan"]
)
```

```
mp fetch profiles user_emails --output-properties '$email,$name,plan'
```

### Combining Filters

Filters can be combined for precise data selection:

```
result = ws.fetch_profiles(
    name="premium_emails",
    cohort_id="premium_cohort",
    output_properties=["$email", "$name"],
    where='properties["country"] == "US"'
)
```

```
mp fetch profiles premium_emails \
    --cohort premium_cohort \
    --output-properties '$email,$name' \
    --where 'properties["country"] == "US"'
```

### Fetching Specific Users by ID

Fetch one or more specific users by their distinct ID:

```
# Single user
result = ws.fetch_profiles(
    name="single_user",
    distinct_id="user_123"
)

# Multiple specific users
result = ws.fetch_profiles(
    name="specific_users",
    distinct_ids=["user_1", "user_2", "user_3"]
)
```

```
# Single user
mp fetch profiles single_user --distinct-id user_123

# Multiple specific users
mp fetch profiles specific_users \
    --distinct-ids user_1 --distinct-ids user_2 --distinct-ids user_3
```

Mutually Exclusive

`distinct_id` and `distinct_ids` cannot be used together. Choose one approach based on your needs.

### Fetching Group Profiles

Fetch group profiles (companies, accounts, etc.) instead of user profiles:

```
result = ws.fetch_profiles(
    name="companies",
    group_id="companies"  # The group type defined in your Mixpanel project
)
```

```
mp fetch profiles companies --group-id companies
```

### Behavioral Filtering

Filter profiles by event behavior—users who performed specific actions. Behaviors use a named pattern that you reference in a `where` clause:

```
# Users who purchased in the last 30 days
result = ws.fetch_profiles(
    name="recent_purchasers",
    behaviors=[{
        "window": "30d",
        "name": "made_purchase",
        "event_selectors": [{"event": "Purchase"}]
    }],
    where='(behaviors["made_purchase"] > 0)'
)

# Users with multiple behavior criteria
result = ws.fetch_profiles(
    name="engaged_users",
    behaviors=[
        {
            "window": "30d",
            "name": "purchased",
            "event_selectors": [{"event": "Purchase"}]
        },
        {
            "window": "7d",
            "name": "active",
            "event_selectors": [{"event": "Page View"}]
        }
    ],
    where='(behaviors["purchased"] > 0) and (behaviors["active"] >= 5)'
)
```

```
# Users who purchased in the last 30 days
mp fetch profiles recent_purchasers \
    --behaviors '[{"window":"30d","name":"made_purchase","event_selectors":[{"event":"Purchase"}]}]' \
    --where '(behaviors["made_purchase"] > 0)'

# Users with multiple behavior criteria
mp fetch profiles engaged_users \
    --behaviors '[{"window":"30d","name":"purchased","event_selectors":[{"event":"Purchase"}]},{"window":"7d","name":"active","event_selectors":[{"event":"Page View"}]}]' \
    --where '(behaviors["purchased"] > 0) and (behaviors["active"] >= 5)'
```

Behavior Format

Each behavior requires:

- `window`: Time window (e.g., "30d", "7d", "90d")
- `name`: Identifier to reference in `where` clause
- `event_selectors`: Array of event filters with `{"event": "Event Name"}`

The `where` clause filters using `behaviors["name"]` to check counts.

Mutually Exclusive

`behaviors` and `cohort_id` cannot be used together. Use one or the other for filtering.

### Historical Profile State

Query profile properties as they existed at a specific point in time:

```
import time

# Get profiles as of January 1, 2024
timestamp = 1704067200  # Unix timestamp

result = ws.fetch_profiles(
    name="historical_profiles",
    as_of_timestamp=timestamp
)
```

```
# Get profiles as of January 1, 2024 (Unix timestamp)
mp fetch profiles historical_profiles --as-of-timestamp 1704067200
```

### Cohort Membership Analysis

Include all users and mark whether they're in a cohort:

```
result = ws.fetch_profiles(
    name="cohort_analysis",
    cohort_id="power_users",
    include_all_users=True  # Include non-members too
)
```

```
mp fetch profiles cohort_analysis \
    --cohort power_users --include-all-users
```

This is useful for comparing users inside and outside a cohort. The response includes a membership indicator for each profile.

Requires Cohort

`include_all_users` requires `cohort_id`. It has no effect without specifying a cohort.

## Parallel Profile Fetching

For large profile datasets (thousands of profiles), parallel fetching can dramatically speed up exports—up to 5x faster.

### Basic Parallel Profile Fetch

Enable parallel fetching with the `parallel` flag:

```
result = ws.fetch_profiles(
    name="all_users",
    parallel=True
)

print(f"Fetched {result.total_rows} profiles in {result.duration_seconds:.1f}s")
print(f"Pages: {result.successful_pages} succeeded, {result.failed_pages} failed")
```

```
mp fetch profiles all_users --parallel
```

Parallel profile fetching uses page-based parallelism—fetching multiple pages of profiles concurrently using a session ID for consistency.

### How It Works

1. **Session-Based Pagination**: The initial page establishes a session ID for consistent results
1. **Dynamic Page Discovery**: Pages are fetched as they're discovered (not pre-scheduled)
1. **Concurrent Fetching**: Multiple threads fetch pages simultaneously (default: 5 workers)
1. **Single-Writer Queue**: A dedicated writer thread serializes writes to DuckDB
1. **Partial Failure Handling**: Failed pages are tracked for potential retry

### Performance

| Profile Count | Sequential | Parallel (5 workers) | Speedup         |
| ------------- | ---------- | -------------------- | --------------- |
| 1,000         | ~2s        | ~2s                  | 1x (no benefit) |
| 10,000        | ~10s       | ~3s                  | 3x              |
| 50,000        | ~50s       | ~12s                 | 4x              |

When to Use Parallel Profile Fetching

- **Use parallel** for datasets with 5,000+ profiles
- **Use sequential** for small datasets or when you need maximum consistency

### Configuring Workers

Control the number of concurrent fetch threads:

```
result = ws.fetch_profiles(
    name="users",
    parallel=True,
    max_workers=3  # Default is 5, max is 5
)
```

```
mp fetch profiles users --parallel --workers 3
```

Worker Limit

Workers are capped at 5 to avoid Mixpanel API rate limits (60 requests/hour for Engage API). Requesting more than 5 workers will be automatically capped.

### Progress Callbacks

Monitor page completion with a callback:

```
from mixpanel_data import ProfileProgress

def on_page(progress: ProfileProgress) -> None:
    status = "✓" if progress.success else "✗"
    print(f"[{status}] Page {progress.page_index}: "
          f"{progress.rows} rows (cumulative: {progress.cumulative_rows})")

result = ws.fetch_profiles(
    name="users",
    parallel=True,
    on_page_complete=on_page
)
```

The CLI automatically displays page progress when `--parallel` is used.

### Handling Failures

Parallel fetching tracks failures and provides information for debugging:

```
result = ws.fetch_profiles(
    name="users",
    parallel=True
)

if result.has_failures:
    print(f"Warning: {result.failed_pages} pages failed")
    print(f"Failed page indices: {result.failed_page_indices}")
```

Parallel Profile Fetch Limitations

- **Rate limits**: The Engage API has a 60 requests/hour limit. Large exports with many pages may hit this limit.
- **Exit code 1 on partial failure**: The CLI returns exit code 1 if any pages fail, even if some succeeded.

### Combining with Filters

Parallel fetching works with all profile filters:

```
result = ws.fetch_profiles(
    name="premium_users",
    where='properties["plan"] == "premium"',
    output_properties=["$email", "$name", "plan"],
    parallel=True,
    max_workers=3
)
```

```
mp fetch profiles premium_users \
    --where 'properties["plan"] == "premium"' \
    --output-properties '$email,$name,plan' \
    --parallel --workers 3
```

## Table Naming

Tables are stored with the name you provide:

```
ws.fetch_events(name="jan_events", ...)   # Creates table: jan_events
ws.fetch_events(name="feb_events", ...)   # Creates table: feb_events
ws.fetch_profiles(name="users")            # Creates table: users
```

Table Names Must Be Unique

Fetching to an existing table name raises `TableExistsError`. Use `--replace` to overwrite, `--append` to add data, or choose a different name.

## Replacing and Appending

### Replace Mode

Drop and recreate a table with fresh data:

```
# First drop the table, then fetch
ws.drop("events")
result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
```

```
mp fetch events --from 2025-01-01 --to 2025-01-31 --replace
```

### Append Mode

Add data to an existing table. Duplicates (by `insert_id` for events, `distinct_id` for profiles) are automatically skipped:

```
# Initial fetch
ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Append more data
ws.fetch_events(
    name="events",
    from_date="2025-02-01",
    to_date="2025-02-28",
    append=True
)
```

```
# Initial fetch
mp fetch events --from 2025-01-01 --to 2025-01-31

# Append more data
mp fetch events --from 2025-02-01 --to 2025-02-28 --append
```

Resuming Failed Fetches

If a fetch crashes or times out, use append mode to resume from where you left off:

```
# Check the last event timestamp
mp query sql "SELECT MAX(event_time) FROM events"
# 2025-01-15T14:30:00

# Resume from that point
mp fetch events --from 2025-01-15 --to 2025-01-31 --append
```

Overlapping date ranges are safe—duplicates are automatically skipped.

## Table Management

### Listing Tables

```
tables = ws.tables()
for table in tables:
    print(f"{table.name}: {table.row_count} rows ({table.type})")
```

### Viewing Table Schema

```
schema = ws.schema("jan_events")
for col in schema.columns:
    print(f"{col.name}: {col.type}")
```

### Dropping Tables

```
ws.drop("jan_events")  # Drop single table
ws.drop_all()          # Drop all tables
```

## FetchResult

Both `fetch_events()` and `fetch_profiles()` return a `FetchResult`:

```
result = ws.fetch_events(...)

# Attributes
result.table_name       # "jan_events"
result.row_count        # 125000
result.duration_seconds # 45.2

# Metadata
result.metadata.from_date    # "2025-01-01"
result.metadata.to_date      # "2025-01-31"
result.metadata.events       # ["Purchase", "Signup"] or None
result.metadata.where        # 'properties["plan"]...' or None
result.metadata.fetched_at   # datetime

# Serialization
result.to_dict()  # JSON-serializable dict
```

## Event Table Schema

Fetched events have this schema:

| Column        | Type      | Description             |
| ------------- | --------- | ----------------------- |
| `event_id`    | VARCHAR   | Unique event identifier |
| `event_name`  | VARCHAR   | Event name              |
| `event_time`  | TIMESTAMP | When the event occurred |
| `distinct_id` | VARCHAR   | User identifier         |
| `insert_id`   | VARCHAR   | Deduplication ID        |
| `properties`  | JSON      | All event properties    |

## Profile Table Schema

Fetched profiles have this schema:

| Column        | Type    | Description                   |
| ------------- | ------- | ----------------------------- |
| `distinct_id` | VARCHAR | User identifier (primary key) |
| `properties`  | JSON    | All profile properties        |

## Best Practices

### Use Parallel Fetching for Large Date Ranges

For date ranges longer than a week, use parallel fetching for the best performance:

```
# Recommended: Parallel fetch for large date ranges
result = ws.fetch_events(
    name="events_2025",
    from_date="2025-01-01",
    to_date="2025-12-31",
    parallel=True
)
print(f"Fetched {result.total_rows} rows in {result.duration_seconds:.1f}s")
```

```
# Recommended: Parallel fetch for large date ranges
mp fetch events events_2025 --from 2025-01-01 --to 2025-12-31 --parallel
```

Parallel fetching automatically handles chunking, concurrent API requests, and serialized writes to DuckDB—no manual chunking required.

### Manual Chunking (Alternative)

If you need the `limit` parameter (incompatible with parallel), or want fine-grained control, you can manually chunk:

```
import datetime

# Fetch first chunk
ws.fetch_events(
    name="events_2025",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Append subsequent chunks
start = datetime.date(2025, 2, 1)
end = datetime.date(2025, 12, 31)

current = start
while current <= end:
    chunk_end = min(current + datetime.timedelta(days=30), end)

    ws.fetch_events(
        name="events_2025",
        from_date=str(current),
        to_date=str(chunk_end),
        append=True  # Add to existing table
    )

    current = chunk_end + datetime.timedelta(days=1)
```

```
# Fetch month by month, appending to a single table
mp fetch events events_2025 --from 2025-01-01 --to 2025-01-31
mp fetch events events_2025 --from 2025-02-01 --to 2025-02-29 --append
mp fetch events events_2025 --from 2025-03-01 --to 2025-03-31 --append
# ... continue for each month
```

```
import datetime

start = datetime.date(2025, 1, 1)
end = datetime.date(2025, 12, 31)

current = start
while current < end:
    chunk_end = min(current + datetime.timedelta(days=30), end)
    table_name = f"events_{current.strftime('%Y%m')}"

    ws.fetch_events(
        name=table_name,
        from_date=str(current),
        to_date=str(chunk_end)
    )

    current = chunk_end + datetime.timedelta(days=1)
```

### Choose the Right Storage Mode

mixpanel_data offers three storage modes:

| Mode           | Method                  | Disk Usage                    | Best For                                     |
| -------------- | ----------------------- | ----------------------------- | -------------------------------------------- |
| **Persistent** | `Workspace()`           | Yes (permanent)               | Repeated analysis, large datasets            |
| **Ephemeral**  | `Workspace.ephemeral()` | Yes (temp file, auto-deleted) | One-off analysis with large data             |
| **In-Memory**  | `Workspace.memory()`    | None                          | Small datasets, testing, zero disk footprint |

**Ephemeral mode** creates a temp file that benefits from DuckDB's compression—up to 8× faster for large datasets:

```
with mp.Workspace.ephemeral() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
    result = ws.sql("SELECT event_name, COUNT(*) FROM events GROUP BY 1")
# Database automatically deleted
```

**In-memory mode** creates no files at all—ideal for small datasets, unit tests, or privacy-sensitive scenarios:

```
with mp.Workspace.memory() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-07")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database gone - no files ever created
```

When to use each mode

- **Persistent**: You'll query the same data multiple times across sessions
- **Ephemeral**: Large datasets where you need compression benefits but won't keep the data
- **In-Memory**: Small datasets, unit tests, or when zero disk footprint is required

## Streaming as an Alternative

If you don't need to store data locally, use streaming instead:

| Approach          | Storage      | Best For                           |
| ----------------- | ------------ | ---------------------------------- |
| `fetch_events()`  | DuckDB table | Repeated SQL analysis              |
| `stream_events()` | None         | ETL pipelines, one-time processing |

```
# Stream directly without storage
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    send_to_warehouse(event)
```

See [Streaming Data](https://jaredmcfarland.github.io/mixpanel_data/guide/streaming/index.md) for details.

## Next Steps

- [Streaming Data](https://jaredmcfarland.github.io/mixpanel_data/guide/streaming/index.md) — Process data without local storage
- [SQL Queries](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) — Query your fetched data with SQL
- [Live Analytics](https://jaredmcfarland.github.io/mixpanel_data/guide/live-analytics/index.md) — Query Mixpanel directly for real-time data

Copy markdown

# Streaming Data

Stream events and user profiles directly from Mixpanel without storing to local database. Ideal for ETL pipelines, one-time exports, and Unix-style piping.

Explore on DeepWiki

🤖 **[Data Flow Patterns →](https://deepwiki.com/jaredmcfarland/mixpanel_data/4.1-data-flow-patterns)**

Ask questions about streaming vs fetching, memory-efficient processing, or ETL pipeline patterns.

## When to Stream vs Fetch

| Use Case               | Recommended       | Why                                  |
| ---------------------- | ----------------- | ------------------------------------ |
| Repeated analysis      | `fetch_events()`  | Query once, analyze many times       |
| ETL to external system | `stream_events()` | No intermediate storage needed       |
| Memory-constrained     | `stream_events()` | Constant memory usage                |
| Ad-hoc exploration     | `fetch_events()`  | SQL iteration is faster              |
| Piping to tools        | `--stdout`        | JSONL integrates with jq, grep, etc. |

## Streaming Events

### Basic Usage

Stream all events for a date range:

```
import mixpanel_data as mp

ws = mp.Workspace()

for event in ws.stream_events(
    from_date="2025-01-01",
    to_date="2025-01-31"
):
    print(f"{event['event_name']}: {event['distinct_id']}")
    # event_time is a datetime object
    # properties contains remaining fields

ws.close()
```

```
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout
```

### Filtering Events

Filter by event name or expression:

```
# Filter by event names
for event in ws.stream_events(
    from_date="2025-01-01",
    to_date="2025-01-31",
    events=["Purchase", "Signup"]
):
    process(event)

# Filter with WHERE clause
for event in ws.stream_events(
    from_date="2025-01-01",
    to_date="2025-01-31",
    where='properties["country"]=="US"'
):
    process(event)
```

```
# Filter by event names
mp fetch events --from 2025-01-01 --to 2025-01-31 \
    --events "Purchase,Signup" --stdout

# Filter with WHERE clause
mp fetch events --from 2025-01-01 --to 2025-01-31 \
    --where 'properties["country"]=="US"' --stdout
```

### Raw API Format

By default, streaming returns normalized data with `event_time` as a datetime. Use `raw=True` to get the exact Mixpanel API format:

```
for event in ws.stream_events(
    from_date="2025-01-01",
    to_date="2025-01-31",
    raw=True
):
    # event has {"event": "...", "properties": {...}} structure
    # properties["time"] is Unix timestamp
    legacy_system.ingest(event)
```

```
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout --raw
```

## Streaming Profiles

### Basic Usage

Stream all user profiles:

```
for profile in ws.stream_profiles():
    sync_to_crm(profile)
```

```
mp fetch profiles --stdout
```

### Filtering Profiles

```
for profile in ws.stream_profiles(
    where='properties["plan"]=="premium"'
):
    send_survey(profile)
```

```
mp fetch profiles --where 'properties["plan"]=="premium"' --stdout
```

### Streaming Specific Users

Stream a single user by their distinct ID:

```
for profile in ws.stream_profiles(distinct_id="user_123"):
    process(profile)
```

```
mp fetch profiles --distinct-id user_123 --stdout
```

Stream multiple specific users:

```
user_ids = ["user_123", "user_456", "user_789"]
for profile in ws.stream_profiles(distinct_ids=user_ids):
    sync_to_external_system(profile)
```

```
mp fetch profiles --distinct-ids "user_123,user_456,user_789" --stdout
```

Mutually Exclusive

`distinct_id` and `distinct_ids` cannot be used together. Use `distinct_id` for a single user, `distinct_ids` for multiple users.

### Streaming Group Profiles

Stream group profiles (e.g., companies, accounts) instead of user profiles:

```
# Stream all company profiles
for company in ws.stream_profiles(group_id="companies"):
    sync_company(company)

# Filter group profiles
for account in ws.stream_profiles(
    group_id="accounts",
    where='properties["plan"]=="enterprise"'
):
    process_enterprise_account(account)
```

```
# Stream company profiles
mp fetch profiles --group-id companies --stdout

# Filter group profiles
mp fetch profiles --group-id accounts \
    --where 'properties["plan"]=="enterprise"' --stdout
```

### Behavioral Filtering

Stream users based on actions they've performed. Behaviors use a named pattern that you reference in a `where` clause:

```
# Users who completed a purchase in last 30 days
behaviors = [{
    "window": "30d",
    "name": "made_purchase",
    "event_selectors": [{"event": "Purchase"}]
}]
for profile in ws.stream_profiles(
    behaviors=behaviors,
    where='(behaviors["made_purchase"] > 0)'
):
    send_thank_you(profile)

# Users who signed up but didn't purchase
behaviors = [
    {"window": "30d", "name": "signed_up", "event_selectors": [{"event": "Signup"}]},
    {"window": "30d", "name": "purchased", "event_selectors": [{"event": "Purchase"}]}
]
for profile in ws.stream_profiles(
    behaviors=behaviors,
    where='(behaviors["signed_up"] > 0) and (behaviors["purchased"] == 0)'
):
    send_conversion_reminder(profile)
```

```
# Users who completed a purchase in last 30 days
mp fetch profiles \
    --behaviors '[{"window":"30d","name":"made_purchase","event_selectors":[{"event":"Purchase"}]}]' \
    --where '(behaviors["made_purchase"] > 0)' \
    --stdout

# Users who signed up but didn't purchase
mp fetch profiles \
    --behaviors '[{"window":"30d","name":"signed_up","event_selectors":[{"event":"Signup"}]},{"window":"30d","name":"purchased","event_selectors":[{"event":"Purchase"}]}]' \
    --where '(behaviors["signed_up"] > 0) and (behaviors["purchased"] == 0)' \
    --stdout
```

Behavior Format

Each behavior requires: `window` (time window like "30d"), `name` (identifier for `where` clause), and `event_selectors` (array with `{"event": "Name"}`).

Mutually Exclusive

`behaviors` cannot be used with `cohort_id`. Use one or the other for filtering.

### Historical Profile State

Query profile state at a specific point in time:

```
import time

# Profile state from 7 days ago
seven_days_ago = int(time.time()) - (7 * 24 * 60 * 60)
for profile in ws.stream_profiles(as_of_timestamp=seven_days_ago):
    compare_historical_state(profile)
```

```
# Query historical state (Unix timestamp)
mp fetch profiles --as-of-timestamp 1704067200 --stdout
```

### Cohort Membership Analysis

Get all users with cohort membership marked:

```
# Stream all users, marking which are in the cohort
for profile in ws.stream_profiles(
    cohort_id="12345",
    include_all_users=True
):
    if profile.get("in_cohort"):
        tag_as_cohort_member(profile)
    else:
        tag_as_non_member(profile)
```

```
mp fetch profiles --cohort-id 12345 --include-all-users --stdout
```

Requires cohort_id

`include_all_users` only works when `cohort_id` is specified.

## CLI Pipeline Examples

The `--stdout` flag outputs JSONL (one JSON object per line), perfect for Unix pipelines:

```
# Filter with jq
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \
    | jq 'select(.event_name == "Purchase")'

# Count events
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout | wc -l

# Save to file
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout > events.jsonl

# Process with custom script
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \
    | python process_events.py

# Extract specific fields
mp fetch profiles --stdout | jq -r '.distinct_id'
```

## Output Formats

### Normalized Format (Default)

Events:

```
{
  "event_name": "Purchase",
  "distinct_id": "user_123",
  "event_time": "2025-01-15T10:30:00+00:00",
  "insert_id": "abc123",
  "properties": {
    "amount": 99.99,
    "currency": "USD"
  }
}
```

Profiles:

```
{
  "distinct_id": "user_123",
  "last_seen": "2025-01-15T14:30:00",
  "properties": {
    "name": "Alice",
    "plan": "premium"
  }
}
```

### Raw Format (`raw=True` or `--raw`)

Events:

```
{
  "event": "Purchase",
  "properties": {
    "distinct_id": "user_123",
    "time": 1705319400,
    "$insert_id": "abc123",
    "amount": 99.99,
    "currency": "USD"
  }
}
```

Profiles:

```
{
  "$distinct_id": "user_123",
  "$properties": {
    "$last_seen": "2025-01-15T14:30:00",
    "name": "Alice",
    "plan": "premium"
  }
}
```

## Common Patterns

### ETL Pipeline

Batch events and send to external system:

```
import mixpanel_data as mp
from your_warehouse import send_batch

ws = mp.Workspace()
batch = []

for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    batch.append(event)
    if len(batch) >= 1000:
        send_batch(batch)
        batch = []

# Send remaining
if batch:
    send_batch(batch)

ws.close()
```

### Aggregation Without Storage

Compute statistics without creating a local table:

```
from collections import Counter
import mixpanel_data as mp

ws = mp.Workspace()
event_counts = Counter()

for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    event_counts[event["event_name"]] += 1

print(event_counts.most_common(10))
ws.close()
```

### Context Manager

Use `with` for automatic cleanup:

```
import mixpanel_data as mp

with mp.Workspace() as ws:
    for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
        process(event)
# No need to call ws.close()
```

## Method Signatures

### stream_events()

```
def stream_events(
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    where: str | None = None,
    raw: bool = False,
) -> Iterator[dict[str, Any]]
```

| Parameter   | Type                | Description                |
| ----------- | ------------------- | -------------------------- |
| `from_date` | `str`               | Start date (YYYY-MM-DD)    |
| `to_date`   | `str`               | End date (YYYY-MM-DD)      |
| `events`    | `list[str] \| None` | Event names to include     |
| `where`     | `str \| None`       | Mixpanel expression filter |
| `raw`       | `bool`              | Return raw API format      |

### stream_profiles()

```
def stream_profiles(
    *,
    where: str | None = None,
    cohort_id: str | None = None,
    output_properties: list[str] | None = None,
    raw: bool = False,
    distinct_id: str | None = None,
    distinct_ids: list[str] | None = None,
    group_id: str | None = None,
    behaviors: list[dict[str, Any]] | None = None,
    as_of_timestamp: int | None = None,
    include_all_users: bool = False,
) -> Iterator[dict[str, Any]]
```

| Parameter           | Type                 | Description                           |
| ------------------- | -------------------- | ------------------------------------- |
| `where`             | `str \| None`        | Mixpanel expression filter            |
| `cohort_id`         | `str \| None`        | Filter by cohort membership           |
| `output_properties` | `list[str] \| None`  | Limit returned properties             |
| `raw`               | `bool`               | Return raw API format                 |
| `distinct_id`       | `str \| None`        | Single user ID to fetch               |
| `distinct_ids`      | `list[str] \| None`  | Multiple user IDs to fetch            |
| `group_id`          | `str \| None`        | Group type for group profiles         |
| `behaviors`         | `list[dict] \| None` | Behavioral filters                    |
| `as_of_timestamp`   | `int \| None`        | Historical state Unix timestamp       |
| `include_all_users` | `bool`               | Include all users with cohort marking |

**Parameter Constraints:**

- `distinct_id` and `distinct_ids` are mutually exclusive
- `behaviors` and `cohort_id` are mutually exclusive
- `include_all_users` requires `cohort_id` to be set

## Next Steps

- [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/index.md) — Store data locally for repeated SQL queries
- [SQL Queries](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) — Query stored data with DuckDB SQL
- [Live Analytics](https://jaredmcfarland.github.io/mixpanel_data/guide/live-analytics/index.md) — Real-time Mixpanel reports

Copy markdown

# Local SQL Queries

Query your fetched data with SQL using DuckDB's powerful analytical engine.

Explore on DeepWiki

🤖 **[Querying Data Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.4-querying-data)**

Ask questions about SQL patterns, JSON property access, or how to structure complex analytical queries.

## Basic Queries

### Execute and Get DataFrame

```
import mixpanel_data as mp

ws = mp.Workspace()

df = ws.sql("""
    SELECT
        event_name,
        COUNT(*) as count
    FROM jan_events
    GROUP BY 1
    ORDER BY 2 DESC
""")

print(df)
```

### Get Single Value

```
total = ws.sql_scalar("SELECT COUNT(*) FROM jan_events")
print(f"Total events: {total}")
```

### Get Rows as Tuples

```
rows = ws.sql_rows("""
    SELECT event_name, COUNT(*)
    FROM jan_events
    GROUP BY 1
    LIMIT 5
""")

for event_name, count in rows:
    print(f"{event_name}: {count}")
```

## DuckDB JSON Syntax

Mixpanel properties are stored as JSON columns. Use DuckDB's JSON operators to access them.

### Extract String Property

```
SELECT properties->>'$.country' as country
FROM jan_events
```

### Extract and Cast Numeric

```
SELECT CAST(properties->>'$.amount' AS DECIMAL) as amount
FROM jan_events
```

### Filter on Property

```
SELECT *
FROM jan_events
WHERE properties->>'$.plan' = 'premium'
```

### Nested Property Access

```
SELECT properties->>'$.user.email' as email
FROM jan_events
```

### Check Property Exists

```
SELECT *
FROM jan_events
WHERE properties->>'$.coupon_code' IS NOT NULL
```

### Array Properties

```
-- Array length
SELECT json_array_length(properties->'$.items') as item_count
FROM jan_events

-- Array element
SELECT properties->'$.items'->>0 as first_item
FROM jan_events
```

## Common Query Patterns

### Daily Event Counts

```
SELECT
    DATE_TRUNC('day', event_time) as day,
    COUNT(*) as count
FROM jan_events
GROUP BY 1
ORDER BY 1
```

### Events by User

```
SELECT
    distinct_id,
    COUNT(*) as event_count,
    MIN(event_time) as first_seen,
    MAX(event_time) as last_seen
FROM jan_events
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
```

### Property Distribution

```
SELECT
    properties->>'$.country' as country,
    COUNT(*) as count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as pct
FROM jan_events
WHERE event_name = 'Purchase'
GROUP BY 1
ORDER BY 2 DESC
```

### Revenue by Day

```
SELECT
    DATE_TRUNC('day', event_time) as day,
    COUNT(*) as purchases,
    SUM(CAST(properties->>'$.amount' AS DECIMAL)) as revenue
FROM jan_events
WHERE event_name = 'Purchase'
GROUP BY 1
ORDER BY 1
```

### User Cohort Analysis

```
WITH first_events AS (
    SELECT
        distinct_id,
        DATE_TRUNC('week', MIN(event_time)) as cohort_week
    FROM jan_events
    WHERE event_name = 'Signup'
    GROUP BY 1
)
SELECT
    cohort_week,
    COUNT(DISTINCT distinct_id) as users
FROM first_events
GROUP BY 1
ORDER BY 1
```

### Funnel Query

```
WITH step1 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'View Product'
),
step2 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'Add to Cart'
    AND distinct_id IN (SELECT distinct_id FROM step1)
),
step3 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'Purchase'
    AND distinct_id IN (SELECT distinct_id FROM step2)
)
SELECT
    (SELECT COUNT(*) FROM step1) as viewed,
    (SELECT COUNT(*) FROM step2) as added,
    (SELECT COUNT(*) FROM step3) as purchased
```

## Joining Events and Profiles

Query events with user profile data:

```
# First, fetch both
ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
ws.fetch_profiles("users")

# Join them
df = ws.sql("""
    SELECT
        e.event_name,
        u.properties->>'$.plan' as plan,
        COUNT(*) as count
    FROM events e
    JOIN users u ON e.distinct_id = u.distinct_id
    GROUP BY 1, 2
    ORDER BY 3 DESC
""")
```

## CLI Usage

Run SQL queries from the command line:

```
# Table output
mp query sql "SELECT event_name, COUNT(*) FROM events GROUP BY 1" --format table

# JSON output
mp query sql "SELECT * FROM events LIMIT 10" --format json

# CSV export
mp query sql "SELECT * FROM events" --format csv > events.csv

# JSONL for streaming
mp query sql "SELECT * FROM events" --format jsonl > events.jsonl

# Filter with built-in jq support
mp query sql "SELECT * FROM events LIMIT 100" --format json \
    --jq '.[] | select(.event_name == "Purchase")'

# Extract specific fields with jq
mp query sql "SELECT event_name, COUNT(*) as cnt FROM events GROUP BY 1" \
    --format json --jq 'map({name: .event_name, count: .cnt})'
```

## Direct DuckDB Access

For advanced use cases, access the DuckDB connection directly:

```
# Get the connection
conn = ws.connection

# Run DuckDB-specific operations
conn.execute("SET threads TO 4")
result = conn.execute("EXPLAIN ANALYZE SELECT * FROM events").fetchall()
```

### Database Path

Get the path to the underlying database file:

```
# Get the database file path
path = ws.db_path
print(f"Data stored at: {path}")

# Useful for reopening the same database later
ws.close()
ws = mp.Workspace.open(path)
```

Note: `db_path` returns `None` for in-memory workspaces created with `Workspace.memory()`.

## Performance Tips

### Use Appropriate Data Types

Cast properties to appropriate types for better performance:

```
-- Instead of string comparison
WHERE CAST(properties->>'$.amount' AS DECIMAL) > 100

-- Consider creating a view with typed columns
CREATE VIEW typed_events AS
SELECT
    event_id,
    event_name,
    event_time,
    distinct_id,
    CAST(properties->>'$.amount' AS DECIMAL) as amount,
    properties->>'$.country' as country
FROM jan_events
```

### Limit Result Sets

Always use LIMIT during exploration:

```
SELECT * FROM jan_events LIMIT 100
```

### Use Aggregations

DuckDB is optimized for analytical queries. Prefer aggregations over fetching raw rows.

## Next Steps

- [Live Analytics](https://jaredmcfarland.github.io/mixpanel_data/guide/live-analytics/index.md) — Query Mixpanel directly
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md) — Complete Workspace API

Copy markdown

# Live Analytics

Query Mixpanel's analytics APIs directly for real-time data without fetching to local storage.

Explore on DeepWiki

🤖 **[Querying Data Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.4-querying-data)**

Ask questions about segmentation, funnels, retention, JQL, or other live query methods.

## When to Use Live Queries

Use live queries when:

- You need the most current data
- You're running one-off analysis
- The query is already optimized by Mixpanel (segmentation, funnels, retention)
- You want to leverage Mixpanel's pre-computed aggregations

Use local queries when:

- You need to run many queries over the same data
- You need custom SQL logic
- You want to minimize API calls
- Context window preservation matters (for AI agents)

## Segmentation

Time-series event counts with optional property segmentation:

```
import mixpanel_data as mp

ws = mp.Workspace()

# Simple count over time
result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Segment by property
result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country"
)

# With filtering
result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country",
    where='properties["plan"] == "premium"',
    unit="week"  # day, week, month
)

# Access as DataFrame
print(result.df)
```

```
# Simple segmentation
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31

# With property breakdown
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
    --on country --format table

# Filter with jq to get just the total
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
    --format json --jq '.total'

# Get top 3 days by volume
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
    --format json --jq '.series | to_entries | sort_by(.value) | reverse | .[:3]'
```

### SegmentationResult

```
result.event          # "Purchase"
result.dates          # ["2025-01-01", "2025-01-02", ...]
result.values         # {"$overall": [100, 150, ...]}
result.segments       # ["US", "UK", "DE", ...]
result.df             # pandas DataFrame
result.to_dict()      # JSON-serializable dict
```

## Funnels

Analyze conversion through a sequence of steps:

```
# First, find your funnel ID
funnels = ws.funnels()
for f in funnels:
    print(f"{f.funnel_id}: {f.name}")

# Query the funnel
result = ws.funnel(
    funnel_id=12345,
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# With segmentation
result = ws.funnel(
    funnel_id=12345,
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country"
)

# Access results
for step in result.steps:
    print(f"{step.event}: {step.count} ({step.conversion_rate:.1%})")
```

```
# List available funnels
mp inspect funnels

# Query a funnel
mp query funnel --funnel-id 12345 --from 2025-01-01 --to 2025-01-31 --format table
```

### FunnelResult

```
result.funnel_id       # 12345
result.steps           # [FunnelStep, ...]
result.overall_rate    # 0.15 (15% overall conversion)
result.df              # DataFrame with step metrics

# Each step
step.event             # "Checkout Started"
step.count             # 5000
step.conversion_rate   # 0.85
step.avg_time          # timedelta or None
```

## Retention

Cohort-based retention analysis:

```
result = ws.retention(
    born_event="Signup",
    return_event="Login",
    from_date="2025-01-01",
    to_date="2025-01-31",
    born_where='properties["source"] == "organic"',
    unit="week"
)

# Access cohorts
for cohort in result.cohorts:
    print(f"{cohort.date}: {cohort.size} users")
    print(f"  Retention: {cohort.retention_rates}")
```

```
mp query retention \
    --born-event Signup \
    --return-event Login \
    --from 2025-01-01 \
    --to 2025-01-31 \
    --unit week \
    --format table
```

### RetentionResult

```
result.born_event      # "Signup"
result.return_event    # "Login"
result.cohorts         # [CohortInfo, ...]
result.df              # DataFrame with retention matrix

# Each cohort
cohort.date            # "2025-01-01"
cohort.size            # 1000
cohort.retention_rates # [1.0, 0.45, 0.32, 0.28, ...]
```

## JQL (JavaScript Query Language)

Run custom JQL scripts for advanced analysis:

```
script = """
function main() {
    return Events({
        from_date: params.from_date,
        to_date: params.to_date,
        event_selectors: [{event: "Purchase"}]
    })
    .groupBy(["properties.country"], mixpanel.reducer.count())
    .sortDesc("value")
    .take(10);
}
"""

result = ws.jql(
    script=script,
    params={"from_date": "2025-01-01", "to_date": "2025-01-31"}
)

print(result.data)  # Raw JQL result
print(result.df)    # As DataFrame
```

```
# From file
mp query jql --script ./query.js --param from_date=2025-01-01 --param to_date=2025-01-31

# Inline
mp query jql --script 'function main() { return Events({...}).count(); }'
```

## Event Counts

Multi-event time series comparison:

```
result = ws.event_counts(
    events=["Signup", "Purchase", "Churn"],
    from_date="2025-01-01",
    to_date="2025-01-31",
    unit="day"
)

# DataFrame with columns: date, Signup, Purchase, Churn
print(result.df)
```

```
mp query event-counts \
    --event Signup --event Purchase --event Churn \
    --from 2025-01-01 --to 2025-01-31 \
    --format table
```

## Property Counts

Break down an event by property values:

```
result = ws.property_counts(
    event="Purchase",
    property_name="country",
    from_date="2025-01-01",
    to_date="2025-01-31",
    limit=10
)

print(result.df)  # Columns: date, US, UK, DE, ...
```

```
mp query property-counts \
    --event Purchase \
    --property country \
    --from 2025-01-01 --to 2025-01-31 \
    --limit 10 \
    --format table
```

## Activity Feed

Get a user's event history:

```
result = ws.activity_feed(
    distinct_ids=["user_123", "user_456"],
    from_date="2025-01-01",
    to_date="2025-01-31"
)

for event in result.events:
    print(f"{event.time}: {event.event}")
    print(f"  Properties: {event.properties}")
```

```
mp query activity-feed \
    --distinct-id user_123 \
    --from 2025-01-01 --to 2025-01-31 \
    --format json
```

## Saved Reports

Query saved reports from Mixpanel (Insights, Retention, Funnels, and Flows).

### Listing Bookmarks

First, find available saved reports:

```
# List all saved reports
bookmarks = ws.list_bookmarks()
for b in bookmarks:
    print(f"{b.id}: {b.name} ({b.type})")

# Filter by type
insights = ws.list_bookmarks(bookmark_type="insights")
funnels = ws.list_bookmarks(bookmark_type="funnels")
```

```
mp inspect bookmarks
mp inspect bookmarks --type insights
mp inspect bookmarks --type funnels --format table
```

### Querying Saved Reports

Query Insights, Retention, or Funnel reports by bookmark ID:

Get Bookmark IDs First

Run `list_bookmarks()` or `mp inspect bookmarks` to find the numeric ID of the report you want to query.

```
# Get the bookmark ID from list_bookmarks() first
bookmarks = ws.list_bookmarks(bookmark_type="insights")
bookmark_id = bookmarks[0].id  # e.g., 98765

result = ws.query_saved_report(bookmark_id=bookmark_id)
print(f"Report type: {result.report_type}")
print(result.df)
```

```
# First find your bookmark ID
mp inspect bookmarks --type insights --format table

# Then query it
mp query saved-report --bookmark-id 98765 --format table
```

## Flows

Query saved Flows reports:

Flows Use Different IDs

Flows reports have their own bookmark IDs. Filter with `--type flows` when listing.

```
# Get Flows bookmark ID
flows = ws.list_bookmarks(bookmark_type="flows")
bookmark_id = flows[0].id  # e.g., 54321

result = ws.query_flows(bookmark_id=bookmark_id)
print(f"Conversion rate: {result.overall_conversion_rate:.1%}")
for step in result.steps:
    print(f"  {step}")
```

```
# First find Flows bookmark IDs
mp inspect bookmarks --type flows --format table

# Then query it
mp query flows --bookmark-id 54321 --format table
```

## Frequency Analysis

Analyze how often users perform an event:

```
result = ws.frequency(
    event="Login",
    from_date="2025-01-01",
    to_date="2025-01-31",
    unit="month",
    addiction_unit="day"
)

# Distribution of logins per day
print(result.buckets)  # {"0": 1000, "1": 500, "2-3": 300, ...}
```

```
mp query frequency \
    --event Login \
    --from 2025-01-01 --to 2025-01-31 \
    --format table
```

## Numeric Aggregations

Aggregate numeric properties:

### Bucketing

```
result = ws.segmentation_numeric(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="amount",
    type="general"  # or "linear", "logarithmic"
)
```

### Sum

```
result = ws.segmentation_sum(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="amount"
)
# Total revenue per time period
```

### Average

```
result = ws.segmentation_average(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="amount"
)
# Average purchase amount per time period
```

## API Escape Hatch

For Mixpanel APIs not covered by the Workspace class, use the `api` property to make authenticated requests directly:

```
import mixpanel_data as mp

ws = mp.Workspace()
client = ws.api

# Example: List annotations from the Annotations API
# Many Mixpanel APIs require the project ID in the URL path
base_url = "https://mixpanel.com/api/app"  # Use eu.mixpanel.com for EU
url = f"{base_url}/projects/{client.project_id}/annotations"

response = client.request("GET", url)
annotations = response["results"]

for ann in annotations:
    print(f"{ann['id']}: {ann['date']} - {ann['description']}")

# Get a specific annotation by ID
if annotations:
    annotation_id = annotations[0]["id"]
    detail_url = f"{base_url}/projects/{client.project_id}/annotations/{annotation_id}"
    annotation = client.request("GET", detail_url)
    print(annotation)
```

### Request Parameters

```
client.request(
    "POST",
    "https://mixpanel.com/api/some/endpoint",
    params={"key": "value"},           # Query parameters
    json_body={"data": "payload"},     # JSON request body
    headers={"X-Custom": "header"},    # Additional headers
    timeout=60.0                       # Request timeout in seconds
)
```

Authentication is handled automatically — the client adds the proper `Authorization` header to all requests.

The client also exposes `project_id` and `region` properties, which are useful when constructing URLs for APIs that require these values in the path.

## Next Steps

- [Data Discovery](https://jaredmcfarland.github.io/mixpanel_data/guide/discovery/index.md) — Explore your event schema
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md) — Complete API documentation

Copy markdown

# Data Discovery

Explore your Mixpanel project's schema before writing queries. Discovery results are cached for the session.

Explore on DeepWiki

🤖 **[Discovery Methods Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.2-discovery-methods)**

Ask questions about schema exploration, caching behavior, or how to discover your data landscape.

## Listing Events

Get all event names in your project:

```
import mixpanel_data as mp

ws = mp.Workspace()

events = ws.events()
print(events)  # ['Login', 'Purchase', 'Signup', ...]
```

```
mp inspect events

# Filter with jq - get first 5 events
mp inspect events --format json --jq '.[:5]'

# Find events containing "User"
mp inspect events --format json --jq '.[] | select(contains("User"))'
```

Events are returned sorted alphabetically.

## Listing Properties

Get properties for a specific event:

```
properties = ws.properties("Purchase")
print(properties)  # ['amount', 'country', 'product_id', ...]
```

```
mp inspect properties --event Purchase
```

Properties include both event-specific and common properties.

## Property Values

Sample values for a property:

```
# Sample values for a property
values = ws.property_values("country", event="Purchase")
print(values)  # ['US', 'UK', 'DE', 'FR', ...]

# Limit results
values = ws.property_values("country", event="Purchase", limit=5)
```

```
mp inspect values --property country --event Purchase --limit 10
```

## Saved Funnels

List funnels defined in Mixpanel:

```
funnels = ws.funnels()
for f in funnels:
    print(f"{f.funnel_id}: {f.name}")
```

```
mp inspect funnels
```

### FunnelInfo

```
f.funnel_id  # 12345
f.name       # "Checkout Funnel"
```

## Saved Cohorts

List cohorts defined in Mixpanel:

```
cohorts = ws.cohorts()
for c in cohorts:
    print(f"{c.id}: {c.name} ({c.count} users)")
```

```
mp inspect cohorts
```

### SavedCohort

```
c.id           # 12345
c.name         # "Power Users"
c.count        # 5000
c.description  # "Users with 10+ logins"
c.created      # datetime
c.is_visible   # True
```

## Lexicon Schemas

Retrieve data dictionary schemas for events and profile properties. Schemas include descriptions, property types, and metadata defined in Mixpanel's Lexicon.

Schema Coverage

The Lexicon API returns only events/properties with explicit schemas (defined via API, CSV import, or UI). It does not return all events visible in Lexicon's UI.

```
# List all schemas
schemas = ws.lexicon_schemas()
for s in schemas:
    print(f"{s.entity_type}: {s.name}")

# Filter by entity type
event_schemas = ws.lexicon_schemas(entity_type="event")
profile_schemas = ws.lexicon_schemas(entity_type="profile")

# Get a specific schema
schema = ws.lexicon_schema("event", "Purchase")
print(schema.schema_json.description)
for prop, info in schema.schema_json.properties.items():
    print(f"  {prop}: {info.type}")
```

```
mp inspect lexicon-schemas
mp inspect lexicon-schemas --type event
mp inspect lexicon-schemas --type profile
mp inspect lexicon-schema --type event --name Purchase
```

### LexiconSchema

```
s.entity_type           # "event", "profile", or other API-returned types
s.name                  # "Purchase"
s.schema_json           # LexiconDefinition object
```

### LexiconDefinition

```
s.schema_json.description                # "User completes a purchase"
s.schema_json.properties                 # dict[str, LexiconProperty]
s.schema_json.metadata                   # LexiconMetadata or None
```

### LexiconProperty

```
prop = s.schema_json.properties["amount"]
prop.type                                # "number"
prop.description                         # "Purchase amount in USD"
prop.metadata                            # LexiconMetadata or None
```

### LexiconMetadata

```
meta = s.schema_json.metadata
meta.display_name       # "Purchase Event"
meta.tags               # ["core", "revenue"]
meta.hidden             # False
meta.dropped            # False
meta.contacts           # ["owner@company.com"]
meta.team_contacts      # ["Analytics Team"]
```

Rate Limit

The Lexicon API has a strict rate limit of **5 requests per minute**. Schema results are cached for the session to minimize API calls.

## Top Events

Get today's most active events:

```
# General top events
top = ws.top_events(type="general")
for event in top:
    print(f"{event.event}: {event.count} ({event.percent_change:+.1f}%)")

# Average top events
top = ws.top_events(type="average", limit=5)
```

```
mp inspect top-events --type general --limit 10
```

### TopEvent

```
event.event           # "Login"
event.count           # 15000
event.percent_change  # 12.5 (compared to yesterday)
```

Not Cached

Unlike other discovery methods, `top_events()` always makes an API call since it returns real-time data.

## JQL-Based Remote Discovery

These methods use JQL (JavaScript Query Language) to analyze data directly on Mixpanel's servers, returning aggregated results without fetching raw data locally.

### Property Value Distribution

Understand what values a property contains and how often they appear:

```
result = ws.property_distribution(
    event="Purchase",
    property="country",
    from_date="2025-01-01",
    to_date="2025-01-31",
    limit=10,
)
print(f"Total: {result.total_count}")
for v in result.values:
    print(f"  {v.value}: {v.count} ({v.percentage:.1f}%)")
```

```
mp inspect distribution -e Purchase -p country --from 2025-01-01 --to 2025-01-31
mp inspect distribution -e Purchase -p country --from 2025-01-01 --to 2025-01-31 --limit 10
```

### Numeric Property Summary

Get statistical summary for numeric properties:

```
result = ws.numeric_summary(
    event="Purchase",
    property="amount",
    from_date="2025-01-01",
    to_date="2025-01-31",
)
print(f"Count: {result.count}")
print(f"Range: {result.min} to {result.max}")
print(f"Avg: {result.avg:.2f}, Stddev: {result.stddev:.2f}")
print(f"Median: {result.percentiles[50]}")
```

```
mp inspect numeric -e Purchase -p amount --from 2025-01-01 --to 2025-01-31
mp inspect numeric -e Purchase -p amount --from 2025-01-01 --to 2025-01-31 --percentiles 25,50,75,90
```

### Daily Event Counts

See event activity over time:

```
result = ws.daily_counts(
    from_date="2025-01-01",
    to_date="2025-01-07",
    events=["Purchase", "Signup"],
)
for c in result.counts:
    print(f"{c.date} {c.event}: {c.count}")
```

```
mp inspect daily --from 2025-01-01 --to 2025-01-07
mp inspect daily --from 2025-01-01 --to 2025-01-07 -e Purchase,Signup
```

### User Engagement Distribution

Understand how engaged users are by their event count:

```
result = ws.engagement_distribution(
    from_date="2025-01-01",
    to_date="2025-01-31",
)
print(f"Total users: {result.total_users}")
for b in result.buckets:
    print(f"  {b.bucket_label} events: {b.user_count} ({b.percentage:.1f}%)")
```

```
mp inspect engagement --from 2025-01-01 --to 2025-01-31
mp inspect engagement --from 2025-01-01 --to 2025-01-31 --buckets 1,5,10,50,100
```

### Property Coverage

Check data quality by seeing how often properties are defined:

```
result = ws.property_coverage(
    event="Purchase",
    properties=["coupon_code", "referrer", "utm_source"],
    from_date="2025-01-01",
    to_date="2025-01-31",
)
print(f"Total events: {result.total_events}")
for c in result.coverage:
    print(f"  {c.property}: {c.coverage_percentage:.1f}% defined")
```

```
mp inspect coverage -e Purchase -p coupon_code,referrer,utm_source --from 2025-01-01 --to 2025-01-31
```

When to Use JQL-Based Discovery

These methods are ideal for:

- **Quick exploration**: Understand data shape before fetching locally
- **Large date ranges**: Analyze months of data without downloading everything
- **Data quality checks**: Verify property coverage and value distributions
- **Trend analysis**: See daily activity patterns

See the [JQL Discovery Types](https://jaredmcfarland.github.io/mixpanel_data/api/types/#jql-discovery-types) in the API reference for return type details.

## Caching

Discovery results are cached for the lifetime of the Workspace:

```
ws = mp.Workspace()

# First call hits the API
events1 = ws.events()

# Second call returns cached result (instant)
events2 = ws.events()

# Clear cache to force refresh
ws.clear_discovery_cache()

# Now hits API again
events3 = ws.events()
```

## Local Data Analysis

After fetching data into DuckDB, use these introspection methods to understand your data before writing SQL queries.

### Sampling Data

Get random sample rows to see data structure:

```
# Get 10 random rows (default)
df = ws.sample("events")
print(df)

# Get 5 random rows
df = ws.sample("events", n=5)
```

```
mp inspect sample -t events
mp inspect sample -t events -n 5
mp inspect sample -t events --format table
```

### Statistical Summary

Get column-level statistics for an entire table:

```
summary = ws.summarize("events")
print(f"Total rows: {summary.row_count}")

for col in summary.columns:
    print(f"{col.column_name}: {col.column_type}")
    print(f"  Nulls: {col.null_percentage:.1f}%")
    print(f"  Unique: {col.approx_unique}")
    if col.avg is not None:  # Numeric columns
        print(f"  Mean: {col.avg:.2f}, Std: {col.std:.2f}")
```

```
mp inspect summarize -t events
mp inspect summarize -t events --format table
```

### Event Breakdown

Analyze event distribution in an events table:

```
breakdown = ws.event_breakdown("events")
print(f"Total events: {breakdown.total_events}")
print(f"Total users: {breakdown.total_users}")
print(f"Date range: {breakdown.date_range[0]} to {breakdown.date_range[1]}")

for event in breakdown.events:
    print(f"{event.event_name}: {event.count} ({event.pct_of_total:.1f}%)")
    print(f"  Users: {event.unique_users}")
    print(f"  First seen: {event.first_seen}")
```

```
mp inspect breakdown -t events
mp inspect breakdown -t events --format table
```

Required Columns

The table must have `event_name`, `event_time`, and `distinct_id` columns.

### Property Key Discovery

Discover all JSON property keys in a table:

```
# All property keys across all events
keys = ws.property_keys("events")
print(keys)  # ['amount', 'country', 'product_id', ...]

# Property keys for a specific event
keys = ws.property_keys("events", event="Purchase")
```

```
mp inspect keys -t events
mp inspect keys -t events -e "Purchase"
```

This is especially useful for building JSON path expressions like `properties->>'$.country'`.

### Column Statistics

Deep analysis of a single column:

```
# Analyze a regular column
stats = ws.column_stats("events", "event_name")
print(f"Total: {stats.count}, Nulls: {stats.null_pct:.1f}%")
print(f"Unique values: {stats.unique_count}")
print("Top values:")
for value, count in stats.top_values:
    print(f"  {value}: {count}")

# Analyze a JSON property
stats = ws.column_stats("events", "properties->>'$.country'", top_n=20)
```

```
mp inspect column -t events -c event_name
mp inspect column -t events -c "properties->>'$.country'" --top 20
```

For numeric columns, additional statistics are available:

```
stats = ws.column_stats("purchases", "properties->>'$.amount'")
print(f"Min: {stats.min}, Max: {stats.max}")
print(f"Mean: {stats.mean:.2f}, Std: {stats.std:.2f}")
```

### Introspection Workflow

A typical workflow for exploring fetched data:

```
import mixpanel_data as mp

ws = mp.Workspace()

# Fetch data first
ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")

# 1. Quick look at the data
print(ws.sample("events", n=3))

# 2. Get overall statistics
summary = ws.summarize("events")
print(f"Rows: {summary.row_count}")

# 3. Understand event distribution
breakdown = ws.event_breakdown("events")
for e in breakdown.events[:5]:
    print(f"{e.event_name}: {e.count}")

# 4. Discover available properties
keys = ws.property_keys("events", event="Purchase")
print(f"Purchase properties: {keys}")

# 5. Deep dive into specific columns
stats = ws.column_stats("events", "properties->>'$.country'")
print(f"Top countries: {stats.top_values[:5]}")

# Now write informed SQL queries
df = ws.sql("""
    SELECT properties->>'$.country' as country, COUNT(*) as count
    FROM events
    WHERE event_name = 'Purchase'
    GROUP BY 1
    ORDER BY 2 DESC
""")
```

## Local Table Discovery

Inspect tables in your local database:

### List Tables

```
tables = ws.tables()
for t in tables:
    print(f"{t.name}: {t.row_count} rows ({t.type})")
```

```
mp inspect tables
```

### Table Schema

```
schema = ws.table_schema("jan_events")
for col in schema.columns:
    print(f"{col.name}: {col.type} (nullable: {col.nullable})")
```

```
mp inspect schema --table jan_events
```

### Workspace Info

```
info = ws.info()
print(f"Database: {info.path}")
print(f"Project: {info.project_id} ({info.region})")
print(f"Account: {info.account}")
print(f"Tables: {len(info.tables)}")
print(f"Size: {info.size_mb:.1f} MB")
```

```
mp inspect info
```

## Discovery Workflow

A typical discovery workflow before analysis:

```
import mixpanel_data as mp

ws = mp.Workspace()

# 1. What events exist?
print("Events:")
for event in ws.events()[:10]:
    print(f"  - {event}")

# 2. What properties does Purchase have?
print("\nPurchase properties:")
for prop in ws.properties("Purchase"):
    print(f"  - {prop}")

# 3. What values does 'country' have?
print("\nCountry values:")
for value in ws.property_values("country", event="Purchase", limit=10):
    print(f"  - {value}")

# 4. What funnels are defined?
print("\nFunnels:")
for f in ws.funnels():
    print(f"  - {f.name} (ID: {f.funnel_id})")

# 5. Now fetch and analyze
ws.fetch_events("purchases", from_date="2025-01-01", to_date="2025-01-31",
                events=["Purchase"])

df = ws.sql("""
    SELECT properties->>'$.country' as country, COUNT(*) as count
    FROM purchases
    GROUP BY 1
    ORDER BY 2 DESC
""")
print(df)
```

## Next Steps

- [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/index.md) — Fetch events for local analysis
- [SQL Queries](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) — Query with SQL
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md) — Complete API documentation

Copy markdown
# API Reference

# API Overview

The `mixpanel_data` Python API provides programmatic access to all library functionality.

Explore on DeepWiki

🤖 **[Python API Reference →](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.2-python-api-reference)**

Ask questions about API methods, explore usage patterns, or get help with specific functionality.

## Import Patterns

```
# Recommended: import with alias
import mixpanel_data as mp

ws = mp.Workspace()
result = ws.segmentation(...)

# Direct imports
from mixpanel_data import Workspace, FetchResult, MixpanelDataError

# Auth utilities
from mixpanel_data.auth import ConfigManager, Credentials
```

## Core Components

### Workspace

The main entry point for all operations:

- **Discovery** — Explore events, properties, funnels, cohorts
- **Fetching** — Download events and profiles to local storage
- **Streaming** — Stream data directly without storage (ETL, pipelines)
- **Local Queries** — SQL queries against DuckDB
- **Live Queries** — Real-time analytics from Mixpanel API
- **Introspection** — Examine local tables and schemas

[View Workspace API](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md)

### Auth Module

Credential and account management:

- **ConfigManager** — Manage accounts in config file
- **Credentials** — Credential container with secrets
- **AccountInfo** — Account metadata (without secrets)

[View Auth API](https://jaredmcfarland.github.io/mixpanel_data/api/auth/index.md)

### Exceptions

Structured error handling:

- **MixpanelDataError** — Base exception
- **APIError** — HTTP/API errors
- **ConfigError** — Configuration errors
- **TableExistsError** / **TableNotFoundError** — Storage errors

[View Exceptions](https://jaredmcfarland.github.io/mixpanel_data/api/exceptions/index.md)

### Result Types

Typed results for all operations:

- **FetchResult** — Fetch operation results
- **SegmentationResult** — Time-series data
- **FunnelResult** — Funnel conversion data
- **RetentionResult** — Retention cohort data
- And many more...

[View Result Types](https://jaredmcfarland.github.io/mixpanel_data/api/types/index.md)

## Type Aliases

The library exports these type aliases:

```
from mixpanel_data import CountType, HourDayUnit, TimeUnit

# CountType: Literal["general", "unique", "average", "median", "min", "max"]
# HourDayUnit: Literal["hour", "day"]
# TimeUnit: Literal["day", "week", "month", "quarter", "year"]
```

## Complete API Reference

- [Workspace](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md) — Main facade class
- [Auth](https://jaredmcfarland.github.io/mixpanel_data/api/auth/index.md) — Authentication and configuration
- [Exceptions](https://jaredmcfarland.github.io/mixpanel_data/api/exceptions/index.md) — Error handling
- [Result Types](https://jaredmcfarland.github.io/mixpanel_data/api/types/index.md) — All result dataclasses

Copy markdown

# Workspace

The `Workspace` class is the unified entry point for all Mixpanel data operations.

Explore on DeepWiki

🤖 **[Workspace Class Deep Dive →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.1-workspace-class)**

Ask questions about Workspace methods, explore usage patterns, or understand how services are orchestrated.

## Overview

Workspace orchestrates four internal services:

- **DiscoveryService** — Schema exploration (events, properties, funnels, cohorts)
- **FetcherService** — Data ingestion from Mixpanel to DuckDB, or streaming without storage
- **LiveQueryService** — Real-time analytics queries
- **StorageEngine** — Local SQL query execution

## Key Features

### Parallel Fetching

For large date ranges, use `parallel=True` for up to 10x faster exports:

```
# Parallel fetch for large date ranges (recommended)
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-12-31",
    parallel=True
)

print(f"Fetched {result.total_rows} rows in {result.duration_seconds:.1f}s")
```

Parallel fetching:

- Splits date ranges into 7-day chunks (configurable via `chunk_days`)
- Fetches chunks concurrently (configurable via `max_workers`, default: 10)
- Returns `ParallelFetchResult` with batch statistics and failure tracking
- Supports progress callbacks via `on_batch_complete`

### Parallel Profile Fetching

For large profile datasets, use `parallel=True` for up to 5x faster exports:

```
# Parallel profile fetch for large datasets
result = ws.fetch_profiles(
    name="users",
    parallel=True,
    max_workers=5  # Default and max is 5
)

print(f"Fetched {result.total_rows} profiles in {result.duration_seconds:.1f}s")
print(f"Pages: {result.successful_pages} succeeded, {result.failed_pages} failed")
```

Parallel profile fetching:

- Uses page-based parallelism with session IDs for consistency
- Fetches pages concurrently (configurable via `max_workers`, default: 5, max: 5)
- Returns `ParallelProfileResult` with page statistics and failure tracking
- Supports progress callbacks via `on_page_complete`

### Append Mode

The `fetch_events()` and `fetch_profiles()` methods support an `append` parameter for incremental data loading:

```
# Initial fetch
ws.fetch_events(name="events", from_date="2025-01-01", to_date="2025-01-31")

# Append more data (duplicates are automatically skipped)
ws.fetch_events(name="events", from_date="2025-02-01", to_date="2025-02-28", append=True)
```

This is useful for:

- **Incremental loading**: Fetch data in chunks without creating multiple tables
- **Crash recovery**: Resume a failed fetch from the last successful point
- **Extending date ranges**: Add more historical or recent data to an existing table
- **Retrying failed parallel batches**: Use append mode to retry specific date ranges

Duplicate events (by `insert_id`) and profiles (by `distinct_id`) are automatically skipped via `INSERT OR IGNORE`.

### Advanced Profile Fetching

The `fetch_profiles()` and `stream_profiles()` methods support advanced filtering options:

```
# Fetch specific users by ID
ws.fetch_profiles(name="vip_users", distinct_ids=["user_1", "user_2", "user_3"])

# Fetch group profiles (e.g., companies)
ws.fetch_profiles(name="companies", group_id="companies")

# Fetch users based on behavior
ws.fetch_profiles(
    name="purchasers",
    behaviors=[{"window": "30d", "name": "buyers", "event_selectors": [{"event": "Purchase"}]}],
    where='(behaviors["buyers"] > 0)'
)

# Query historical profile state
ws.fetch_profiles(
    name="profiles_last_week",
    as_of_timestamp=int(time.time()) - 604800  # 7 days ago
)

# Get all users with cohort membership marked
ws.fetch_profiles(
    name="cohort_analysis",
    cohort_id="12345",
    include_all_users=True
)
```

**Parameter constraints:**

- `distinct_id` and `distinct_ids` are mutually exclusive
- `behaviors` and `cohort_id` are mutually exclusive
- `include_all_users` requires `cohort_id` to be set

## Class Reference

## mixpanel_data.Workspace

```
Workspace(
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    path: str | Path | None = None,
    read_only: bool = False,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
    _storage: StorageEngine | None = None,
)
```

Unified entry point for Mixpanel data operations.

The Workspace class is a facade that orchestrates all services:

- DiscoveryService for schema exploration
- FetcherService for data ingestion
- LiveQueryService for real-time analytics
- StorageEngine for local SQL queries

Examples:

Basic usage with credentials from config:

```
ws = Workspace()
ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
df = ws.sql("SELECT * FROM events LIMIT 10")
```

Ephemeral workspace for temporary analysis:

```
with Workspace.ephemeral() as ws:
    ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database automatically deleted
```

Query-only access to existing database:

```
ws = Workspace.open("path/to/database.db")
df = ws.sql("SELECT * FROM events")
```

Create a new Workspace with credentials and optional database path.

Credentials are resolved in priority order:

1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION)
1. Named account from config file (if account parameter specified)
1. Default account from config file

| PARAMETER         | DESCRIPTION                                                                                                                                 |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `account`         | Named account from config file to use. **TYPE:** \`str                                                                                      |
| `project_id`      | Override project ID from credentials. **TYPE:** \`str                                                                                       |
| `region`          | Override region from credentials (us, eu, in). **TYPE:** \`str                                                                              |
| `path`            | Path to database file. If None, uses default location. **TYPE:** \`str                                                                      |
| `read_only`       | If True, open database in read-only mode allowing concurrent reads. Defaults to False (write access). **TYPE:** `bool` **DEFAULT:** `False` |
| `_config_manager` | Injected ConfigManager for testing. **TYPE:** \`ConfigManager                                                                               |
| `_api_client`     | Injected MixpanelAPIClient for testing. **TYPE:** \`MixpanelAPIClient                                                                       |
| `_storage`        | Injected StorageEngine for testing. **TYPE:** \`StorageEngine                                                                               |

| RAISES                 | DESCRIPTION                        |
| ---------------------- | ---------------------------------- |
| `ConfigError`          | If no credentials can be resolved. |
| `AccountNotFoundError` | If named account doesn't exist.    |

Source code in `src/mixpanel_data/workspace.py`

```
def __init__(
    self,
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    path: str | Path | None = None,
    read_only: bool = False,
    # Dependency injection for testing
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
    _storage: StorageEngine | None = None,
) -> None:
    """Create a new Workspace with credentials and optional database path.

    Credentials are resolved in priority order:
    1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION)
    2. Named account from config file (if account parameter specified)
    3. Default account from config file

    Args:
        account: Named account from config file to use.
        project_id: Override project ID from credentials.
        region: Override region from credentials (us, eu, in).
        path: Path to database file. If None, uses default location.
        read_only: If True, open database in read-only mode allowing
            concurrent reads. Defaults to False (write access).
        _config_manager: Injected ConfigManager for testing.
        _api_client: Injected MixpanelAPIClient for testing.
        _storage: Injected StorageEngine for testing.

    Raises:
        ConfigError: If no credentials can be resolved.
        AccountNotFoundError: If named account doesn't exist.
    """
    # Store injected or create default ConfigManager
    self._config_manager = _config_manager or ConfigManager()

    # Resolve credentials
    self._credentials: Credentials | None = None
    self._account_name: str | None = account

    # Resolve credentials (may raise ConfigError or AccountNotFoundError)
    self._credentials = self._config_manager.resolve_credentials(account)

    # Apply overrides if provided
    if project_id or region:
        from typing import cast

        from pydantic import SecretStr

        from mixpanel_data._internal.config import RegionType

        resolved_region = region or self._credentials.region
        self._credentials = Credentials(
            username=self._credentials.username,
            secret=SecretStr(self._credentials.secret.get_secret_value()),
            project_id=project_id or self._credentials.project_id,
            region=cast(RegionType, resolved_region),
        )

    # Initialize storage lazily
    # Store path for lazy initialization, or use injected storage directly
    self._db_path: Path | None = None
    self._storage: StorageEngine | None = None
    self._read_only = read_only

    if _storage is not None:
        # Injected storage - use directly
        self._storage = _storage
    else:
        # Determine database path for lazy initialization
        if path is not None:
            self._db_path = Path(path) if isinstance(path, str) else path
        else:
            # Default path: ~/.mp/data/{project_id}.db
            self._db_path = (
                Path.home() / ".mp" / "data" / f"{self._credentials.project_id}.db"
            )
        # NOTE: StorageEngine is NOT created here - see storage property

    # Lazy-initialized services (None until first use)
    self._api_client: MixpanelAPIClient | None = _api_client
    self._discovery: DiscoveryService | None = None
    self._fetcher: FetcherService | None = None
    self._live_query: LiveQueryService | None = None
```

### connection

```
connection: DuckDBPyConnection
```

Direct access to the DuckDB connection.

Use this for operations not covered by the Workspace API.

| RETURNS              | DESCRIPTION                       |
| -------------------- | --------------------------------- |
| `DuckDBPyConnection` | The underlying DuckDB connection. |

### db_path

```
db_path: Path | None
```

Path to the DuckDB database file.

Returns the filesystem path where data is stored. Useful for:

- Knowing where your data lives
- Opening the same database later with `Workspace.open(path)`
- Debugging and logging

| RETURNS | DESCRIPTION |
| ------- | ----------- |
| \`Path  | None\`      |

Example

Save the path for later use::

```
ws = mp.Workspace()
path = ws.db_path
ws.close()

# Later, reopen the same database
ws = mp.Workspace.open(path)
```

### api

```
api: MixpanelAPIClient
```

Direct access to the Mixpanel API client.

Use this escape hatch for Mixpanel API operations not covered by the Workspace class. The client handles authentication automatically.

The client provides

- `request(method, url, **kwargs)`: Make authenticated requests to any Mixpanel API endpoint.
- `project_id`: The configured project ID for constructing URLs.
- `region`: The configured region ('us', 'eu', or 'in').

| RETURNS             | DESCRIPTION                       |
| ------------------- | --------------------------------- |
| `MixpanelAPIClient` | The underlying MixpanelAPIClient. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Example

Fetch event schema from the Lexicon Schemas API::

```
import mixpanel_data as mp
from urllib.parse import quote

ws = mp.Workspace()
client = ws.api

# Build the URL with proper encoding
event_name = quote("Added To Cart", safe="")
url = f"https://mixpanel.com/api/app/projects/{client.project_id}/schemas/event/{event_name}"

# Make the authenticated request
schema = client.request("GET", url)
print(schema)
```

### ephemeral

```
ephemeral(
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]
```

Create a temporary workspace that auto-deletes on exit.

| PARAMETER         | DESCRIPTION                                                           |
| ----------------- | --------------------------------------------------------------------- |
| `account`         | Named account from config file to use. **TYPE:** \`str                |
| `project_id`      | Override project ID from credentials. **TYPE:** \`str                 |
| `region`          | Override region from credentials. **TYPE:** \`str                     |
| `_config_manager` | Injected ConfigManager for testing. **TYPE:** \`ConfigManager         |
| `_api_client`     | Injected MixpanelAPIClient for testing. **TYPE:** \`MixpanelAPIClient |

| YIELDS      | DESCRIPTION                                                 |
| ----------- | ----------------------------------------------------------- |
| `Workspace` | A workspace with temporary database. **TYPE::** `Workspace` |

Example

```
with Workspace.ephemeral() as ws:
    ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
    print(ws.sql_scalar("SELECT COUNT(*) FROM events"))
# Database file automatically deleted
```

Source code in `src/mixpanel_data/workspace.py`

````
@classmethod
@contextmanager
def ephemeral(
    cls,
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]:
    """Create a temporary workspace that auto-deletes on exit.

    Args:
        account: Named account from config file to use.
        project_id: Override project ID from credentials.
        region: Override region from credentials.
        _config_manager: Injected ConfigManager for testing.
        _api_client: Injected MixpanelAPIClient for testing.

    Yields:
        Workspace: A workspace with temporary database.

    Example:
        ```python
        with Workspace.ephemeral() as ws:
            ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
            print(ws.sql_scalar("SELECT COUNT(*) FROM events"))
        # Database file automatically deleted
        ```
    """
    storage = StorageEngine.ephemeral()
    ws = cls(
        account=account,
        project_id=project_id,
        region=region,
        _config_manager=_config_manager,
        _api_client=_api_client,
        _storage=storage,
    )
    try:
        yield ws
    finally:
        ws.close()
````

### memory

```
memory(
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]
```

Create a workspace with true in-memory database.

The database exists only in RAM with zero disk footprint. All data is lost when the context manager exits.

Best for:

- Small datasets where zero disk footprint is required
- Unit tests without filesystem side effects
- Quick exploratory queries

For large datasets, prefer ephemeral() which benefits from DuckDB's compression (can be 8x faster for large workloads).

| PARAMETER         | DESCRIPTION                                                           |
| ----------------- | --------------------------------------------------------------------- |
| `account`         | Named account from config file to use. **TYPE:** \`str                |
| `project_id`      | Override project ID from credentials. **TYPE:** \`str                 |
| `region`          | Override region from credentials. **TYPE:** \`str                     |
| `_config_manager` | Injected ConfigManager for testing. **TYPE:** \`ConfigManager         |
| `_api_client`     | Injected MixpanelAPIClient for testing. **TYPE:** \`MixpanelAPIClient |

| YIELDS      | DESCRIPTION                                                 |
| ----------- | ----------------------------------------------------------- |
| `Workspace` | A workspace with in-memory database. **TYPE::** `Workspace` |

Example

```
with Workspace.memory() as ws:
    ws.fetch_events(from_date="2024-01-01", to_date="2024-01-01")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database gone - no cleanup needed, no files left behind
```

Source code in `src/mixpanel_data/workspace.py`

````
@classmethod
@contextmanager
def memory(
    cls,
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]:
    """Create a workspace with true in-memory database.

    The database exists only in RAM with zero disk footprint.
    All data is lost when the context manager exits.

    Best for:
    - Small datasets where zero disk footprint is required
    - Unit tests without filesystem side effects
    - Quick exploratory queries

    For large datasets, prefer ephemeral() which benefits from
    DuckDB's compression (can be 8x faster for large workloads).

    Args:
        account: Named account from config file to use.
        project_id: Override project ID from credentials.
        region: Override region from credentials.
        _config_manager: Injected ConfigManager for testing.
        _api_client: Injected MixpanelAPIClient for testing.

    Yields:
        Workspace: A workspace with in-memory database.

    Example:
        ```python
        with Workspace.memory() as ws:
            ws.fetch_events(from_date="2024-01-01", to_date="2024-01-01")
            total = ws.sql_scalar("SELECT COUNT(*) FROM events")
        # Database gone - no cleanup needed, no files left behind
        ```
    """
    storage = StorageEngine.memory()
    ws = cls(
        account=account,
        project_id=project_id,
        region=region,
        _config_manager=_config_manager,
        _api_client=_api_client,
        _storage=storage,
    )
    try:
        yield ws
    finally:
        ws.close()
````

### open

```
open(path: str | Path, *, read_only: bool = True) -> Workspace
```

Open an existing database for query-only access.

This method opens a database without requiring API credentials. Discovery, fetching, and live query methods will be unavailable.

| PARAMETER   | DESCRIPTION                                                                                                                              |
| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `path`      | Path to existing database file. **TYPE:** \`str                                                                                          |
| `read_only` | If True (default), open in read-only mode allowing concurrent reads. Set to False for write access. **TYPE:** `bool` **DEFAULT:** `True` |

| RETURNS     | DESCRIPTION                                                   |
| ----------- | ------------------------------------------------------------- |
| `Workspace` | A workspace with access to stored data. **TYPE:** `Workspace` |

| RAISES              | DESCRIPTION                     |
| ------------------- | ------------------------------- |
| `FileNotFoundError` | If database file doesn't exist. |

Example

```
ws = Workspace.open("my_data.db")
df = ws.sql("SELECT * FROM events")
ws.close()
```

Source code in `src/mixpanel_data/workspace.py`

````
@classmethod
def open(cls, path: str | Path, *, read_only: bool = True) -> Workspace:
    """Open an existing database for query-only access.

    This method opens a database without requiring API credentials.
    Discovery, fetching, and live query methods will be unavailable.

    Args:
        path: Path to existing database file.
        read_only: If True (default), open in read-only mode allowing
            concurrent reads. Set to False for write access.

    Returns:
        Workspace: A workspace with access to stored data.

    Raises:
        FileNotFoundError: If database file doesn't exist.

    Example:
        ```python
        ws = Workspace.open("my_data.db")
        df = ws.sql("SELECT * FROM events")
        ws.close()
        ```
    """
    db_path = Path(path) if isinstance(path, str) else path
    storage = StorageEngine.open_existing(db_path, read_only=read_only)

    # Create instance without credential resolution
    instance = object.__new__(cls)
    instance._config_manager = ConfigManager()
    instance._credentials = None
    instance._account_name = None
    instance._db_path = db_path
    instance._storage = storage
    instance._read_only = read_only
    instance._api_client = None
    instance._discovery = None
    instance._fetcher = None
    instance._live_query = None

    return instance
````

### close

```
close() -> None
```

Close all resources (database connection, HTTP client).

This method is idempotent and safe to call multiple times.

Source code in `src/mixpanel_data/workspace.py`

```
def close(self) -> None:
    """Close all resources (database connection, HTTP client).

    This method is idempotent and safe to call multiple times.
    """
    # Close storage
    if self._storage is not None:
        self._storage.close()

    # Close API client if we created one
    if self._api_client is not None:
        self._api_client.close()
        self._api_client = None
```

### test_credentials

```
test_credentials(account: str | None = None) -> dict[str, Any]
```

Test account credentials by making a lightweight API call.

This method verifies that credentials are valid and can access the Mixpanel API. It's useful for validating configuration before attempting more expensive operations.

| PARAMETER | DESCRIPTION                                                                                                          |
| --------- | -------------------------------------------------------------------------------------------------------------------- |
| `account` | Named account to test. If None, tests the default account or credentials from environment variables. **TYPE:** \`str |

| RETURNS          | DESCRIPTION                                                                  |
| ---------------- | ---------------------------------------------------------------------------- |
| `dict[str, Any]` | Dict containing: - success: bool - Whether the test succeeded - account: str |

| RAISES                 | DESCRIPTION                        |
| ---------------------- | ---------------------------------- |
| `AccountNotFoundError` | If named account doesn't exist.    |
| `AuthenticationError`  | If credentials are invalid.        |
| `ConfigError`          | If no credentials can be resolved. |

Example

```
# Test default account
result = Workspace.test_credentials()
if result["success"]:
    print(f"Authenticated to project {result['project_id']}")

# Test specific account
result = Workspace.test_credentials("production")
```

Source code in `src/mixpanel_data/workspace.py`

````
@staticmethod
def test_credentials(account: str | None = None) -> dict[str, Any]:
    """Test account credentials by making a lightweight API call.

    This method verifies that credentials are valid and can access the
    Mixpanel API. It's useful for validating configuration before
    attempting more expensive operations.

    Args:
        account: Named account to test. If None, tests the default account
            or credentials from environment variables.

    Returns:
        Dict containing:
            - success: bool - Whether the test succeeded
            - account: str | None - Account name tested
            - project_id: str - Project ID from credentials
            - region: str - Region from credentials
            - events_found: int - Number of events found (validation metric)

    Raises:
        AccountNotFoundError: If named account doesn't exist.
        AuthenticationError: If credentials are invalid.
        ConfigError: If no credentials can be resolved.

    Example:
        ```python
        # Test default account
        result = Workspace.test_credentials()
        if result["success"]:
            print(f"Authenticated to project {result['project_id']}")

        # Test specific account
        result = Workspace.test_credentials("production")
        ```
    """
    config_manager = ConfigManager()
    credentials = config_manager.resolve_credentials(account)

    # Get account info if we used a named account
    account_info = None
    if account is not None:
        account_info = config_manager.get_account(account)
    else:
        # Check if credentials came from a default account
        accounts = config_manager.list_accounts()
        for acc in accounts:
            if acc.is_default:
                account_info = acc
                break

    # Create API client and test with a lightweight call
    api_client = MixpanelAPIClient(credentials)
    try:
        events = api_client.get_events()
        event_count = len(list(events)) if events else 0

        return {
            "success": True,
            "account": account_info.name if account_info else None,
            "project_id": credentials.project_id,
            "region": credentials.region,
            "events_found": event_count,
        }
    finally:
        api_client.close()
````

### events

```
events() -> list[str]
```

List all event names in the Mixpanel project.

Results are cached for the lifetime of the Workspace.

| RETURNS     | DESCRIPTION                                |
| ----------- | ------------------------------------------ |
| `list[str]` | Alphabetically sorted list of event names. |

| RAISES                | DESCRIPTION                       |
| --------------------- | --------------------------------- |
| `ConfigError`         | If API credentials not available. |
| `AuthenticationError` | If credentials are invalid.       |

Source code in `src/mixpanel_data/workspace.py`

```
def events(self) -> list[str]:
    """List all event names in the Mixpanel project.

    Results are cached for the lifetime of the Workspace.

    Returns:
        Alphabetically sorted list of event names.

    Raises:
        ConfigError: If API credentials not available.
        AuthenticationError: If credentials are invalid.
    """
    return self._discovery_service.list_events()
```

### properties

```
properties(event: str) -> list[str]
```

List all property names for an event.

Results are cached per event for the lifetime of the Workspace.

| PARAMETER | DESCRIPTION                                       |
| --------- | ------------------------------------------------- |
| `event`   | Event name to get properties for. **TYPE:** `str` |

| RETURNS     | DESCRIPTION                                   |
| ----------- | --------------------------------------------- |
| `list[str]` | Alphabetically sorted list of property names. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def properties(self, event: str) -> list[str]:
    """List all property names for an event.

    Results are cached per event for the lifetime of the Workspace.

    Args:
        event: Event name to get properties for.

    Returns:
        Alphabetically sorted list of property names.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_properties(event)
```

### property_values

```
property_values(
    property_name: str, *, event: str | None = None, limit: int = 100
) -> list[str]
```

Get sample values for a property.

Results are cached per (property, event, limit) for the lifetime of the Workspace.

| PARAMETER       | DESCRIPTION                                                            |
| --------------- | ---------------------------------------------------------------------- |
| `property_name` | Property to get values for. **TYPE:** `str`                            |
| `event`         | Optional event to filter by. **TYPE:** \`str                           |
| `limit`         | Maximum number of values to return. **TYPE:** `int` **DEFAULT:** `100` |

| RETURNS     | DESCRIPTION                                |
| ----------- | ------------------------------------------ |
| `list[str]` | List of sample property values as strings. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def property_values(
    self,
    property_name: str,
    *,
    event: str | None = None,
    limit: int = 100,
) -> list[str]:
    """Get sample values for a property.

    Results are cached per (property, event, limit) for the lifetime of the Workspace.

    Args:
        property_name: Property to get values for.
        event: Optional event to filter by.
        limit: Maximum number of values to return.

    Returns:
        List of sample property values as strings.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_property_values(
        property_name, event=event, limit=limit
    )
```

### funnels

```
funnels() -> list[FunnelInfo]
```

List saved funnels in the Mixpanel project.

Results are cached for the lifetime of the Workspace.

| RETURNS            | DESCRIPTION                                   |
| ------------------ | --------------------------------------------- |
| `list[FunnelInfo]` | List of FunnelInfo objects (funnel_id, name). |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def funnels(self) -> list[FunnelInfo]:
    """List saved funnels in the Mixpanel project.

    Results are cached for the lifetime of the Workspace.

    Returns:
        List of FunnelInfo objects (funnel_id, name).

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_funnels()
```

### cohorts

```
cohorts() -> list[SavedCohort]
```

List saved cohorts in the Mixpanel project.

Results are cached for the lifetime of the Workspace.

| RETURNS             | DESCRIPTION                  |
| ------------------- | ---------------------------- |
| `list[SavedCohort]` | List of SavedCohort objects. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def cohorts(self) -> list[SavedCohort]:
    """List saved cohorts in the Mixpanel project.

    Results are cached for the lifetime of the Workspace.

    Returns:
        List of SavedCohort objects.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_cohorts()
```

### list_bookmarks

```
list_bookmarks(bookmark_type: BookmarkType | None = None) -> list[BookmarkInfo]
```

List all saved reports (bookmarks) in the project.

Retrieves metadata for all saved Insights, Funnel, Retention, and Flows reports in the project.

| PARAMETER       | DESCRIPTION                                                                                                                                                                    |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `bookmark_type` | Optional filter by report type. Valid values are 'insights', 'funnels', 'retention', 'flows', 'launch-analysis'. If None, returns all bookmark types. **TYPE:** \`BookmarkType |

| RETURNS              | DESCRIPTION                                        |
| -------------------- | -------------------------------------------------- |
| `list[BookmarkInfo]` | List of BookmarkInfo objects with report metadata. |
| `list[BookmarkInfo]` | Empty list if no bookmarks exist.                  |

| RAISES        | DESCRIPTION                                  |
| ------------- | -------------------------------------------- |
| `ConfigError` | If API credentials not available.            |
| `QueryError`  | Permission denied or invalid type parameter. |

Source code in `src/mixpanel_data/workspace.py`

```
def list_bookmarks(
    self,
    bookmark_type: BookmarkType | None = None,
) -> list[BookmarkInfo]:
    """List all saved reports (bookmarks) in the project.

    Retrieves metadata for all saved Insights, Funnel, Retention, and
    Flows reports in the project.

    Args:
        bookmark_type: Optional filter by report type. Valid values are
            'insights', 'funnels', 'retention', 'flows', 'launch-analysis'.
            If None, returns all bookmark types.

    Returns:
        List of BookmarkInfo objects with report metadata.
        Empty list if no bookmarks exist.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Permission denied or invalid type parameter.
    """
    return self._discovery_service.list_bookmarks(bookmark_type=bookmark_type)
```

### top_events

```
top_events(
    *,
    type: Literal["general", "average", "unique"] = "general",
    limit: int | None = None,
) -> list[TopEvent]
```

Get today's most active events.

This method is NOT cached (returns real-time data).

| PARAMETER | DESCRIPTION                                                                                                              |
| --------- | ------------------------------------------------------------------------------------------------------------------------ |
| `type`    | Counting method (general, average, unique). **TYPE:** `Literal['general', 'average', 'unique']` **DEFAULT:** `'general'` |
| `limit`   | Maximum number of events to return. **TYPE:** \`int                                                                      |

| RETURNS          | DESCRIPTION                                              |
| ---------------- | -------------------------------------------------------- |
| `list[TopEvent]` | List of TopEvent objects (event, count, percent_change). |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def top_events(
    self,
    *,
    type: Literal["general", "average", "unique"] = "general",
    limit: int | None = None,
) -> list[TopEvent]:
    """Get today's most active events.

    This method is NOT cached (returns real-time data).

    Args:
        type: Counting method (general, average, unique).
        limit: Maximum number of events to return.

    Returns:
        List of TopEvent objects (event, count, percent_change).

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_top_events(type=type, limit=limit)
```

### lexicon_schemas

```
lexicon_schemas(
    *, entity_type: EntityType | None = None
) -> list[LexiconSchema]
```

List Lexicon schemas in the project.

Retrieves documented event and profile property schemas from the Mixpanel Lexicon (data dictionary).

Results are cached for the lifetime of the Workspace.

| PARAMETER     | DESCRIPTION                                                                                          |
| ------------- | ---------------------------------------------------------------------------------------------------- |
| `entity_type` | Optional filter by type ("event" or "profile"). If None, returns all schemas. **TYPE:** \`EntityType |

| RETURNS               | DESCRIPTION                                          |
| --------------------- | ---------------------------------------------------- |
| `list[LexiconSchema]` | Alphabetically sorted list of LexiconSchema objects. |

| RAISES                | DESCRIPTION                       |
| --------------------- | --------------------------------- |
| `ConfigError`         | If API credentials not available. |
| `AuthenticationError` | If credentials are invalid.       |

Note

The Lexicon API has a strict 5 requests/minute rate limit. Caching helps avoid hitting this limit; call clear_discovery_cache() only when fresh data is needed.

Source code in `src/mixpanel_data/workspace.py`

```
def lexicon_schemas(
    self,
    *,
    entity_type: EntityType | None = None,
) -> list[LexiconSchema]:
    """List Lexicon schemas in the project.

    Retrieves documented event and profile property schemas from the
    Mixpanel Lexicon (data dictionary).

    Results are cached for the lifetime of the Workspace.

    Args:
        entity_type: Optional filter by type ("event" or "profile").
            If None, returns all schemas.

    Returns:
        Alphabetically sorted list of LexiconSchema objects.

    Raises:
        ConfigError: If API credentials not available.
        AuthenticationError: If credentials are invalid.

    Note:
        The Lexicon API has a strict 5 requests/minute rate limit.
        Caching helps avoid hitting this limit; call clear_discovery_cache()
        only when fresh data is needed.
    """
    return self._discovery_service.list_schemas(entity_type=entity_type)
```

### lexicon_schema

```
lexicon_schema(entity_type: EntityType, name: str) -> LexiconSchema
```

Get a single Lexicon schema by entity type and name.

Retrieves a documented schema for a specific event or profile property from the Mixpanel Lexicon (data dictionary).

Results are cached for the lifetime of the Workspace.

| PARAMETER     | DESCRIPTION                                                |
| ------------- | ---------------------------------------------------------- |
| `entity_type` | Entity type ("event" or "profile"). **TYPE:** `EntityType` |
| `name`        | Entity name. **TYPE:** `str`                               |

| RETURNS         | DESCRIPTION                             |
| --------------- | --------------------------------------- |
| `LexiconSchema` | LexiconSchema for the specified entity. |

| RAISES                | DESCRIPTION                       |
| --------------------- | --------------------------------- |
| `ConfigError`         | If API credentials not available. |
| `AuthenticationError` | If credentials are invalid.       |
| `QueryError`          | If schema not found.              |

Note

The Lexicon API has a strict 5 requests/minute rate limit. Caching helps avoid hitting this limit; call clear_discovery_cache() only when fresh data is needed.

Source code in `src/mixpanel_data/workspace.py`

```
def lexicon_schema(
    self,
    entity_type: EntityType,
    name: str,
) -> LexiconSchema:
    """Get a single Lexicon schema by entity type and name.

    Retrieves a documented schema for a specific event or profile property
    from the Mixpanel Lexicon (data dictionary).

    Results are cached for the lifetime of the Workspace.

    Args:
        entity_type: Entity type ("event" or "profile").
        name: Entity name.

    Returns:
        LexiconSchema for the specified entity.

    Raises:
        ConfigError: If API credentials not available.
        AuthenticationError: If credentials are invalid.
        QueryError: If schema not found.

    Note:
        The Lexicon API has a strict 5 requests/minute rate limit.
        Caching helps avoid hitting this limit; call clear_discovery_cache()
        only when fresh data is needed.
    """
    return self._discovery_service.get_schema(entity_type, name)
```

### clear_discovery_cache

```
clear_discovery_cache() -> None
```

Clear cached discovery results.

Subsequent discovery calls will fetch fresh data from the API.

Source code in `src/mixpanel_data/workspace.py`

```
def clear_discovery_cache(self) -> None:
    """Clear cached discovery results.

    Subsequent discovery calls will fetch fresh data from the API.
    """
    if self._discovery is not None:
        self._discovery.clear_cache()
```

### fetch_events

```
fetch_events(
    name: str = "events",
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    where: str | None = None,
    limit: int | None = None,
    progress: bool = True,
    append: bool = False,
    batch_size: int = 1000,
    parallel: bool = False,
    max_workers: int | None = None,
    on_batch_complete: Callable[[BatchProgress], None] | None = None,
    chunk_days: int = 7,
) -> FetchResult | ParallelFetchResult
```

Fetch events from Mixpanel and store in local database.

Note

This is a potentially long-running operation that streams data from Mixpanel's Export API. For large date ranges, use `parallel=True` for significantly faster exports (up to 10x speedup).

| PARAMETER           | DESCRIPTION                                                                                                                                                                                                                                      |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name`              | Table name to create or append to (default: "events"). **TYPE:** `str` **DEFAULT:** `'events'`                                                                                                                                                   |
| `from_date`         | Start date (YYYY-MM-DD). **TYPE:** `str`                                                                                                                                                                                                         |
| `to_date`           | End date (YYYY-MM-DD). **TYPE:** `str`                                                                                                                                                                                                           |
| `events`            | Optional list of event names to filter. **TYPE:** \`list[str]                                                                                                                                                                                    |
| `where`             | Optional WHERE clause for filtering. **TYPE:** \`str                                                                                                                                                                                             |
| `limit`             | Optional maximum number of events to return (max 100000). **TYPE:** \`int                                                                                                                                                                        |
| `progress`          | Show progress bar (default: True). **TYPE:** `bool` **DEFAULT:** `True`                                                                                                                                                                          |
| `append`            | If True, append to existing table. If False (default), create new. **TYPE:** `bool` **DEFAULT:** `False`                                                                                                                                         |
| `batch_size`        | Number of rows per INSERT/COMMIT cycle. Controls the memory/IO tradeoff: smaller values use less memory but more disk IO; larger values use more memory but less IO. Default: 1000. Valid range: 100-100000. **TYPE:** `int` **DEFAULT:** `1000` |
| `parallel`          | If True, use parallel fetching with multiple threads. Splits date range into 7-day chunks and fetches concurrently. Enables export of date ranges exceeding 100 days. Default: False. **TYPE:** `bool` **DEFAULT:** `False`                      |
| `max_workers`       | Maximum concurrent fetch threads when parallel=True. Default: 10. Higher values may hit Mixpanel rate limits. Ignored when parallel=False. **TYPE:** \`int                                                                                       |
| `on_batch_complete` | Callback invoked when each batch completes during parallel fetch. Receives BatchProgress with status. Useful for custom progress reporting. Ignored when parallel=False. **TYPE:** \`Callable\[[BatchProgress], None\]                           |
| `chunk_days`        | Days per chunk for parallel date range splitting. Default: 7. Valid range: 1-100. Smaller values create more parallel batches but may increase API overhead. Ignored when parallel=False. **TYPE:** `int` **DEFAULT:** `7`                       |

| RETURNS       | DESCRIPTION           |
| ------------- | --------------------- |
| \`FetchResult | ParallelFetchResult\` |
| \`FetchResult | ParallelFetchResult\` |

| RAISES                | DESCRIPTION                                        |
| --------------------- | -------------------------------------------------- |
| `TableExistsError`    | If table exists and append=False.                  |
| `TableNotFoundError`  | If table doesn't exist and append=True.            |
| `ConfigError`         | If API credentials not available.                  |
| `AuthenticationError` | If credentials are invalid.                        |
| `ValueError`          | If batch_size is outside valid range (100-100000). |
| `ValueError`          | If limit is outside valid range (1-100000).        |
| `ValueError`          | If max_workers is not positive.                    |
| `ValueError`          | If chunk_days is not in range 1-100.               |

Example

```
# Sequential fetch (default)
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-01-31",
)

# Parallel fetch for large date ranges
result = ws.fetch_events(
    name="events_q4",
    from_date="2024-10-01",
    to_date="2024-12-31",
    parallel=True,
)

# With custom progress callback
def on_batch(progress: BatchProgress) -> None:
    print(f"Batch {progress.batch_index + 1}/{progress.total_batches}")

result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    on_batch_complete=on_batch,
)
```

Source code in `src/mixpanel_data/workspace.py`

````
def fetch_events(
    self,
    name: str = "events",
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    where: str | None = None,
    limit: int | None = None,
    progress: bool = True,
    append: bool = False,
    batch_size: int = 1000,
    parallel: bool = False,
    max_workers: int | None = None,
    on_batch_complete: Callable[[BatchProgress], None] | None = None,
    chunk_days: int = 7,
) -> FetchResult | ParallelFetchResult:
    """Fetch events from Mixpanel and store in local database.

    Note:
        This is a potentially long-running operation that streams data from
        Mixpanel's Export API. For large date ranges, use ``parallel=True``
        for significantly faster exports (up to 10x speedup).

    Args:
        name: Table name to create or append to (default: "events").
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        events: Optional list of event names to filter.
        where: Optional WHERE clause for filtering.
        limit: Optional maximum number of events to return (max 100000).
        progress: Show progress bar (default: True).
        append: If True, append to existing table. If False (default), create new.
        batch_size: Number of rows per INSERT/COMMIT cycle. Controls the
            memory/IO tradeoff: smaller values use less memory but more
            disk IO; larger values use more memory but less IO.
            Default: 1000. Valid range: 100-100000.
        parallel: If True, use parallel fetching with multiple threads.
            Splits date range into 7-day chunks and fetches concurrently.
            Enables export of date ranges exceeding 100 days. Default: False.
        max_workers: Maximum concurrent fetch threads when parallel=True.
            Default: 10. Higher values may hit Mixpanel rate limits.
            Ignored when parallel=False.
        on_batch_complete: Callback invoked when each batch completes
            during parallel fetch. Receives BatchProgress with status.
            Useful for custom progress reporting. Ignored when parallel=False.
        chunk_days: Days per chunk for parallel date range splitting.
            Default: 7. Valid range: 1-100. Smaller values create more
            parallel batches but may increase API overhead.
            Ignored when parallel=False.

    Returns:
        FetchResult when parallel=False, ParallelFetchResult when parallel=True.
        ParallelFetchResult includes per-batch statistics and any failure info.

    Raises:
        TableExistsError: If table exists and append=False.
        TableNotFoundError: If table doesn't exist and append=True.
        ConfigError: If API credentials not available.
        AuthenticationError: If credentials are invalid.
        ValueError: If batch_size is outside valid range (100-100000).
        ValueError: If limit is outside valid range (1-100000).
        ValueError: If max_workers is not positive.
        ValueError: If chunk_days is not in range 1-100.

    Example:
        ```python
        # Sequential fetch (default)
        result = ws.fetch_events(
            name="events",
            from_date="2024-01-01",
            to_date="2024-01-31",
        )

        # Parallel fetch for large date ranges
        result = ws.fetch_events(
            name="events_q4",
            from_date="2024-10-01",
            to_date="2024-12-31",
            parallel=True,
        )

        # With custom progress callback
        def on_batch(progress: BatchProgress) -> None:
            print(f"Batch {progress.batch_index + 1}/{progress.total_batches}")

        result = ws.fetch_events(
            name="events",
            from_date="2024-01-01",
            to_date="2024-03-31",
            parallel=True,
            on_batch_complete=on_batch,
        )
        ```
    """
    # Validate parameters early to avoid wasted API calls
    _validate_batch_size(batch_size)
    _validate_limit(limit)

    # Validate max_workers for parallel mode
    if max_workers is not None and max_workers <= 0:
        raise ValueError("max_workers must be positive")

    # Validate chunk_days for parallel mode
    if chunk_days <= 0:
        raise ValueError("chunk_days must be positive")
    if chunk_days > 100:
        raise ValueError("chunk_days must be at most 100")

    # Create progress callback if requested (only for interactive terminals)
    progress_callback = None
    pbar = None
    if progress and sys.stderr.isatty() and not parallel:
        try:
            from rich.progress import Progress, SpinnerColumn, TextColumn

            pbar = Progress(
                SpinnerColumn(),
                TextColumn("[progress.description]{task.description}"),
                TextColumn("{task.completed} rows"),
            )
            task = pbar.add_task("Fetching events...", total=None)
            pbar.start()

            def callback(count: int) -> None:
                pbar.update(task, completed=count)

            progress_callback = callback
        except Exception:
            # Progress bar unavailable or failed to initialize, skip silently
            pass

    try:
        result = self._fetcher_service.fetch_events(
            name=name,
            from_date=from_date,
            to_date=to_date,
            events=events,
            where=where,
            limit=limit,
            progress_callback=progress_callback,
            append=append,
            batch_size=batch_size,
            parallel=parallel,
            max_workers=max_workers,
            on_batch_complete=on_batch_complete,
            chunk_days=chunk_days,
        )
    finally:
        if pbar is not None:
            pbar.stop()

    return result
````

### fetch_profiles

```
fetch_profiles(
    name: str = "profiles",
    *,
    where: str | None = None,
    cohort_id: str | None = None,
    output_properties: list[str] | None = None,
    progress: bool = True,
    append: bool = False,
    batch_size: int = 1000,
    distinct_id: str | None = None,
    distinct_ids: list[str] | None = None,
    group_id: str | None = None,
    behaviors: list[dict[str, Any]] | None = None,
    as_of_timestamp: int | None = None,
    include_all_users: bool = False,
    parallel: bool = False,
    max_workers: int | None = None,
    on_page_complete: Callable[[ProfileProgress], None] | None = None,
) -> FetchResult | ParallelProfileResult
```

Fetch user profiles from Mixpanel and store in local database.

Note

This is a potentially long-running operation that streams data from Mixpanel's Engage API. For large profile sets, use `parallel=True` for up to 5x faster exports.

| PARAMETER           | DESCRIPTION                                                                                                                                                                                                                                                                                                 |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`              | Table name to create or append to (default: "profiles"). **TYPE:** `str` **DEFAULT:** `'profiles'`                                                                                                                                                                                                          |
| `where`             | Optional WHERE clause for filtering. **TYPE:** \`str                                                                                                                                                                                                                                                        |
| `cohort_id`         | Optional cohort ID to filter by. Only profiles that are members of this cohort will be returned. **TYPE:** \`str                                                                                                                                                                                            |
| `output_properties` | Optional list of property names to include in the response. If None, all properties are returned. **TYPE:** \`list[str]                                                                                                                                                                                     |
| `progress`          | Show progress bar (default: True). **TYPE:** `bool` **DEFAULT:** `True`                                                                                                                                                                                                                                     |
| `append`            | If True, append to existing table. If False (default), create new. **TYPE:** `bool` **DEFAULT:** `False`                                                                                                                                                                                                    |
| `batch_size`        | Number of rows per INSERT/COMMIT cycle. Controls the memory/IO tradeoff: smaller values use less memory but more disk IO; larger values use more memory but less IO. Default: 1000. Valid range: 100-100000. **TYPE:** `int` **DEFAULT:** `1000`                                                            |
| `distinct_id`       | Optional single user ID to fetch. Mutually exclusive with distinct_ids. **TYPE:** \`str                                                                                                                                                                                                                     |
| `distinct_ids`      | Optional list of user IDs to fetch. Mutually exclusive with distinct_id. Duplicates are automatically removed. **TYPE:** \`list[str]                                                                                                                                                                        |
| `group_id`          | Optional group type identifier (e.g., "companies") to fetch group profiles instead of user profiles. **TYPE:** \`str                                                                                                                                                                                        |
| `behaviors`         | Optional list of behavioral filters. Each dict should have 'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors' (list of {"event": "Name"}). Use with where parameter to filter, e.g., where='(behaviors["name"] > 0)'. Mutually exclusive with cohort_id. **TYPE:** \`list\[dict[str, Any]\] |
| `as_of_timestamp`   | Optional Unix timestamp to query profile state at a specific point in time. Must be in the past. **TYPE:** \`int                                                                                                                                                                                            |
| `include_all_users` | If True, include all users and mark cohort membership. Only valid when cohort_id is provided. **TYPE:** `bool` **DEFAULT:** `False`                                                                                                                                                                         |
| `parallel`          | If True, use parallel fetching with multiple threads. Uses page-based parallelism for concurrent profile fetching. Enables up to 5x faster exports. Default: False. **TYPE:** `bool` **DEFAULT:** `False`                                                                                                   |
| `max_workers`       | Maximum concurrent fetch threads when parallel=True. Default: 5, capped at 5. Ignored when parallel=False. **TYPE:** \`int                                                                                                                                                                                  |
| `on_page_complete`  | Callback invoked when each page completes during parallel fetch. Receives ProfileProgress with status. Useful for custom progress reporting. Ignored when parallel=False. **TYPE:** \`Callable\[[ProfileProgress], None\]                                                                                   |

| RETURNS       | DESCRIPTION             |
| ------------- | ----------------------- |
| \`FetchResult | ParallelProfileResult\` |
| \`FetchResult | ParallelProfileResult\` |

| RAISES               | DESCRIPTION                                                                                      |
| -------------------- | ------------------------------------------------------------------------------------------------ |
| `TableExistsError`   | If table exists and append=False.                                                                |
| `TableNotFoundError` | If table doesn't exist and append=True.                                                          |
| `ConfigError`        | If API credentials not available.                                                                |
| `ValueError`         | If batch_size is outside valid range (100-100000) or mutually exclusive parameters are provided. |

Source code in `src/mixpanel_data/workspace.py`

```
def fetch_profiles(
    self,
    name: str = "profiles",
    *,
    where: str | None = None,
    cohort_id: str | None = None,
    output_properties: list[str] | None = None,
    progress: bool = True,
    append: bool = False,
    batch_size: int = 1000,
    distinct_id: str | None = None,
    distinct_ids: list[str] | None = None,
    group_id: str | None = None,
    behaviors: list[dict[str, Any]] | None = None,
    as_of_timestamp: int | None = None,
    include_all_users: bool = False,
    parallel: bool = False,
    max_workers: int | None = None,
    on_page_complete: Callable[[ProfileProgress], None] | None = None,
) -> FetchResult | ParallelProfileResult:
    """Fetch user profiles from Mixpanel and store in local database.

    Note:
        This is a potentially long-running operation that streams data from
        Mixpanel's Engage API. For large profile sets, use ``parallel=True``
        for up to 5x faster exports.

    Args:
        name: Table name to create or append to (default: "profiles").
        where: Optional WHERE clause for filtering.
        cohort_id: Optional cohort ID to filter by. Only profiles that are
            members of this cohort will be returned.
        output_properties: Optional list of property names to include in
            the response. If None, all properties are returned.
        progress: Show progress bar (default: True).
        append: If True, append to existing table. If False (default), create new.
        batch_size: Number of rows per INSERT/COMMIT cycle. Controls the
            memory/IO tradeoff: smaller values use less memory but more
            disk IO; larger values use more memory but less IO.
            Default: 1000. Valid range: 100-100000.
        distinct_id: Optional single user ID to fetch. Mutually exclusive
            with distinct_ids.
        distinct_ids: Optional list of user IDs to fetch. Mutually exclusive
            with distinct_id. Duplicates are automatically removed.
        group_id: Optional group type identifier (e.g., "companies") to fetch
            group profiles instead of user profiles.
        behaviors: Optional list of behavioral filters. Each dict should have
            'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors'
            (list of {"event": "Name"}). Use with `where` parameter to filter,
            e.g., where='(behaviors["name"] > 0)'. Mutually exclusive with
            cohort_id.
        as_of_timestamp: Optional Unix timestamp to query profile state at
            a specific point in time. Must be in the past.
        include_all_users: If True, include all users and mark cohort membership.
            Only valid when cohort_id is provided.
        parallel: If True, use parallel fetching with multiple threads.
            Uses page-based parallelism for concurrent profile fetching.
            Enables up to 5x faster exports. Default: False.
        max_workers: Maximum concurrent fetch threads when parallel=True.
            Default: 5, capped at 5. Ignored when parallel=False.
        on_page_complete: Callback invoked when each page completes during
            parallel fetch. Receives ProfileProgress with status.
            Useful for custom progress reporting. Ignored when parallel=False.

    Returns:
        FetchResult when parallel=False, ParallelProfileResult when parallel=True.
        ParallelProfileResult includes per-page statistics and any failure info.

    Raises:
        TableExistsError: If table exists and append=False.
        TableNotFoundError: If table doesn't exist and append=True.
        ConfigError: If API credentials not available.
        ValueError: If batch_size is outside valid range (100-100000) or
            mutually exclusive parameters are provided.
    """
    # Validate batch_size
    _validate_batch_size(batch_size)

    # Validate max_workers for parallel mode
    if max_workers is not None and max_workers <= 0:
        raise ValueError("max_workers must be positive")

    # Create progress callback if requested (only for interactive terminals)
    # Sequential mode uses spinner progress bar
    progress_callback = None
    pbar = None
    if progress and sys.stderr.isatty() and not parallel:
        try:
            from rich.progress import Progress, SpinnerColumn, TextColumn

            pbar = Progress(
                SpinnerColumn(),
                TextColumn("[progress.description]{task.description}"),
                TextColumn("{task.completed} rows"),
            )
            task = pbar.add_task("Fetching profiles...", total=None)
            pbar.start()

            def callback(count: int) -> None:
                pbar.update(task, completed=count)

            progress_callback = callback
        except Exception:
            # Progress bar unavailable or failed to initialize, skip silently
            pass

    try:
        result = self._fetcher_service.fetch_profiles(
            name=name,
            where=where,
            cohort_id=cohort_id,
            output_properties=output_properties,
            progress_callback=progress_callback,
            append=append,
            batch_size=batch_size,
            distinct_id=distinct_id,
            distinct_ids=distinct_ids,
            group_id=group_id,
            behaviors=behaviors,
            as_of_timestamp=as_of_timestamp,
            include_all_users=include_all_users,
            parallel=parallel,
            max_workers=max_workers,
            on_page_complete=on_page_complete,
        )
    finally:
        if pbar is not None:
            pbar.stop()

    return result
```

### stream_events

```
stream_events(
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    where: str | None = None,
    limit: int | None = None,
    raw: bool = False,
) -> Iterator[dict[str, Any]]
```

Stream events directly from Mixpanel API without storing.

Yields events one at a time as they are received from the API. No database files or tables are created.

| PARAMETER   | DESCRIPTION                                                                                                                                                  |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `from_date` | Start date inclusive (YYYY-MM-DD format). **TYPE:** `str`                                                                                                    |
| `to_date`   | End date inclusive (YYYY-MM-DD format). **TYPE:** `str`                                                                                                      |
| `events`    | Optional list of event names to filter. If None, all events returned. **TYPE:** \`list[str]                                                                  |
| `where`     | Optional Mixpanel filter expression (e.g., 'properties["country"]=="US"'). **TYPE:** \`str                                                                   |
| `limit`     | Optional maximum number of events to return (max 100000). **TYPE:** \`int                                                                                    |
| `raw`       | If True, return events in raw Mixpanel API format. If False (default), return normalized format with datetime objects. **TYPE:** `bool` **DEFAULT:** `False` |

| YIELDS           | DESCRIPTION                                                       |
| ---------------- | ----------------------------------------------------------------- |
| `dict[str, Any]` | dict\[str, Any\]: Event dictionaries in normalized or raw format. |

| RAISES                | DESCRIPTION                                 |
| --------------------- | ------------------------------------------- |
| `ConfigError`         | If API credentials are not available.       |
| `AuthenticationError` | If credentials are invalid.                 |
| `RateLimitError`      | If rate limit exceeded after max retries.   |
| `QueryError`          | If filter expression is invalid.            |
| `ValueError`          | If limit is outside valid range (1-100000). |

Example

```
ws = Workspace()
for event in ws.stream_events(from_date="2024-01-01", to_date="2024-01-31"):
    process(event)
ws.close()
```

With raw format:

```
for event in ws.stream_events(
    from_date="2024-01-01", to_date="2024-01-31", raw=True
):
    legacy_system.ingest(event)
```

Source code in `src/mixpanel_data/workspace.py`

````
def stream_events(
    self,
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    where: str | None = None,
    limit: int | None = None,
    raw: bool = False,
) -> Iterator[dict[str, Any]]:
    """Stream events directly from Mixpanel API without storing.

    Yields events one at a time as they are received from the API.
    No database files or tables are created.

    Args:
        from_date: Start date inclusive (YYYY-MM-DD format).
        to_date: End date inclusive (YYYY-MM-DD format).
        events: Optional list of event names to filter. If None, all events returned.
        where: Optional Mixpanel filter expression (e.g., 'properties["country"]=="US"').
        limit: Optional maximum number of events to return (max 100000).
        raw: If True, return events in raw Mixpanel API format.
             If False (default), return normalized format with datetime objects.

    Yields:
        dict[str, Any]: Event dictionaries in normalized or raw format.

    Raises:
        ConfigError: If API credentials are not available.
        AuthenticationError: If credentials are invalid.
        RateLimitError: If rate limit exceeded after max retries.
        QueryError: If filter expression is invalid.
        ValueError: If limit is outside valid range (1-100000).

    Example:
        ```python
        ws = Workspace()
        for event in ws.stream_events(from_date="2024-01-01", to_date="2024-01-31"):
            process(event)
        ws.close()
        ```

        With raw format:

        ```python
        for event in ws.stream_events(
            from_date="2024-01-01", to_date="2024-01-31", raw=True
        ):
            legacy_system.ingest(event)
        ```
    """
    # Validate limit early to avoid wasted API calls
    _validate_limit(limit)

    api_client = self._require_api_client()
    event_iterator = api_client.export_events(
        from_date=from_date,
        to_date=to_date,
        events=events,
        where=where,
        limit=limit,
    )

    if raw:
        yield from event_iterator
    else:
        for event in event_iterator:
            yield transform_event(event)
````

### stream_profiles

```
stream_profiles(
    *,
    where: str | None = None,
    cohort_id: str | None = None,
    output_properties: list[str] | None = None,
    raw: bool = False,
    distinct_id: str | None = None,
    distinct_ids: list[str] | None = None,
    group_id: str | None = None,
    behaviors: list[dict[str, Any]] | None = None,
    as_of_timestamp: int | None = None,
    include_all_users: bool = False,
) -> Iterator[dict[str, Any]]
```

Stream user profiles directly from Mixpanel API without storing.

Yields profiles one at a time as they are received from the API. No database files or tables are created.

| PARAMETER           | DESCRIPTION                                                                                                                                                                                                                                                                                                 |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `where`             | Optional Mixpanel filter expression for profile properties. **TYPE:** \`str                                                                                                                                                                                                                                 |
| `cohort_id`         | Optional cohort ID to filter by. Only profiles that are members of this cohort will be returned. **TYPE:** \`str                                                                                                                                                                                            |
| `output_properties` | Optional list of property names to include in the response. If None, all properties are returned. **TYPE:** \`list[str]                                                                                                                                                                                     |
| `raw`               | If True, return profiles in raw Mixpanel API format. If False (default), return normalized format. **TYPE:** `bool` **DEFAULT:** `False`                                                                                                                                                                    |
| `distinct_id`       | Optional single user ID to fetch. Mutually exclusive with distinct_ids. **TYPE:** \`str                                                                                                                                                                                                                     |
| `distinct_ids`      | Optional list of user IDs to fetch. Mutually exclusive with distinct_id. Duplicates are automatically removed. **TYPE:** \`list[str]                                                                                                                                                                        |
| `group_id`          | Optional group type identifier (e.g., "companies") to fetch group profiles instead of user profiles. **TYPE:** \`str                                                                                                                                                                                        |
| `behaviors`         | Optional list of behavioral filters. Each dict should have 'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors' (list of {"event": "Name"}). Use with where parameter to filter, e.g., where='(behaviors["name"] > 0)'. Mutually exclusive with cohort_id. **TYPE:** \`list\[dict[str, Any]\] |
| `as_of_timestamp`   | Optional Unix timestamp to query profile state at a specific point in time. Must be in the past. **TYPE:** \`int                                                                                                                                                                                            |
| `include_all_users` | If True, include all users and mark cohort membership. Only valid when cohort_id is provided. **TYPE:** `bool` **DEFAULT:** `False`                                                                                                                                                                         |

| YIELDS           | DESCRIPTION                                                         |
| ---------------- | ------------------------------------------------------------------- |
| `dict[str, Any]` | dict\[str, Any\]: Profile dictionaries in normalized or raw format. |

| RAISES                | DESCRIPTION                                    |
| --------------------- | ---------------------------------------------- |
| `ConfigError`         | If API credentials are not available.          |
| `AuthenticationError` | If credentials are invalid.                    |
| `RateLimitError`      | If rate limit exceeded after max retries.      |
| `ValueError`          | If mutually exclusive parameters are provided. |

Example

```
ws = Workspace()
for profile in ws.stream_profiles():
    sync_to_crm(profile)
ws.close()
```

Filter to premium users:

```
for profile in ws.stream_profiles(where='properties["plan"]=="premium"'):
    send_survey(profile)
```

Filter by cohort and select specific properties:

```
for profile in ws.stream_profiles(
    cohort_id="12345",
    output_properties=["$email", "$name"]
):
    send_email(profile)
```

Fetch specific users by ID:

```
for profile in ws.stream_profiles(distinct_ids=["user_1", "user_2"]):
    print(profile)
```

Fetch group profiles:

```
for company in ws.stream_profiles(group_id="companies"):
    print(company)
```

Source code in `src/mixpanel_data/workspace.py`

````
def stream_profiles(
    self,
    *,
    where: str | None = None,
    cohort_id: str | None = None,
    output_properties: list[str] | None = None,
    raw: bool = False,
    distinct_id: str | None = None,
    distinct_ids: list[str] | None = None,
    group_id: str | None = None,
    behaviors: list[dict[str, Any]] | None = None,
    as_of_timestamp: int | None = None,
    include_all_users: bool = False,
) -> Iterator[dict[str, Any]]:
    """Stream user profiles directly from Mixpanel API without storing.

    Yields profiles one at a time as they are received from the API.
    No database files or tables are created.

    Args:
        where: Optional Mixpanel filter expression for profile properties.
        cohort_id: Optional cohort ID to filter by. Only profiles that are
            members of this cohort will be returned.
        output_properties: Optional list of property names to include in
            the response. If None, all properties are returned.
        raw: If True, return profiles in raw Mixpanel API format.
             If False (default), return normalized format.
        distinct_id: Optional single user ID to fetch. Mutually exclusive
            with distinct_ids.
        distinct_ids: Optional list of user IDs to fetch. Mutually exclusive
            with distinct_id. Duplicates are automatically removed.
        group_id: Optional group type identifier (e.g., "companies") to fetch
            group profiles instead of user profiles.
        behaviors: Optional list of behavioral filters. Each dict should have
            'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors'
            (list of {"event": "Name"}). Use with `where` parameter to filter,
            e.g., where='(behaviors["name"] > 0)'. Mutually exclusive with
            cohort_id.
        as_of_timestamp: Optional Unix timestamp to query profile state at
            a specific point in time. Must be in the past.
        include_all_users: If True, include all users and mark cohort membership.
            Only valid when cohort_id is provided.

    Yields:
        dict[str, Any]: Profile dictionaries in normalized or raw format.

    Raises:
        ConfigError: If API credentials are not available.
        AuthenticationError: If credentials are invalid.
        RateLimitError: If rate limit exceeded after max retries.
        ValueError: If mutually exclusive parameters are provided.

    Example:
        ```python
        ws = Workspace()
        for profile in ws.stream_profiles():
            sync_to_crm(profile)
        ws.close()
        ```

        Filter to premium users:

        ```python
        for profile in ws.stream_profiles(where='properties["plan"]=="premium"'):
            send_survey(profile)
        ```

        Filter by cohort and select specific properties:

        ```python
        for profile in ws.stream_profiles(
            cohort_id="12345",
            output_properties=["$email", "$name"]
        ):
            send_email(profile)
        ```

        Fetch specific users by ID:

        ```python
        for profile in ws.stream_profiles(distinct_ids=["user_1", "user_2"]):
            print(profile)
        ```

        Fetch group profiles:

        ```python
        for company in ws.stream_profiles(group_id="companies"):
            print(company)
        ```
    """
    api_client = self._require_api_client()
    profile_iterator = api_client.export_profiles(
        where=where,
        cohort_id=cohort_id,
        output_properties=output_properties,
        distinct_id=distinct_id,
        distinct_ids=distinct_ids,
        group_id=group_id,
        behaviors=behaviors,
        as_of_timestamp=as_of_timestamp,
        include_all_users=include_all_users,
    )

    if raw:
        yield from profile_iterator
    else:
        for profile in profile_iterator:
            yield transform_profile(profile)
````

### sql

```
sql(query: str) -> pd.DataFrame
```

Execute SQL query and return results as DataFrame.

| PARAMETER | DESCRIPTION                       |
| --------- | --------------------------------- |
| `query`   | SQL query string. **TYPE:** `str` |

| RETURNS     | DESCRIPTION                          |
| ----------- | ------------------------------------ |
| `DataFrame` | pandas DataFrame with query results. |

| RAISES       | DESCRIPTION          |
| ------------ | -------------------- |
| `QueryError` | If query is invalid. |

Source code in `src/mixpanel_data/workspace.py`

```
def sql(self, query: str) -> pd.DataFrame:
    """Execute SQL query and return results as DataFrame.

    Args:
        query: SQL query string.

    Returns:
        pandas DataFrame with query results.

    Raises:
        QueryError: If query is invalid.
    """
    return self.storage.execute_df(query)
```

### sql_scalar

```
sql_scalar(query: str) -> Any
```

Execute SQL query and return single scalar value.

| PARAMETER | DESCRIPTION                                            |
| --------- | ------------------------------------------------------ |
| `query`   | SQL query that returns a single value. **TYPE:** `str` |

| RETURNS | DESCRIPTION                                |
| ------- | ------------------------------------------ |
| `Any`   | The scalar result (int, float, str, etc.). |

| RAISES       | DESCRIPTION                                     |
| ------------ | ----------------------------------------------- |
| `QueryError` | If query is invalid or returns multiple values. |

Source code in `src/mixpanel_data/workspace.py`

```
def sql_scalar(self, query: str) -> Any:
    """Execute SQL query and return single scalar value.

    Args:
        query: SQL query that returns a single value.

    Returns:
        The scalar result (int, float, str, etc.).

    Raises:
        QueryError: If query is invalid or returns multiple values.
    """
    return self.storage.execute_scalar(query)
```

### sql_rows

```
sql_rows(query: str) -> SQLResult
```

Execute SQL query and return structured result with column metadata.

| PARAMETER | DESCRIPTION                       |
| --------- | --------------------------------- |
| `query`   | SQL query string. **TYPE:** `str` |

| RETURNS     | DESCRIPTION                                 |
| ----------- | ------------------------------------------- |
| `SQLResult` | SQLResult with column names and row tuples. |

| RAISES       | DESCRIPTION          |
| ------------ | -------------------- |
| `QueryError` | If query is invalid. |

Example

```
result = ws.sql_rows("SELECT name, age FROM users")
print(result.columns)  # ['name', 'age']
for row in result.rows:
    print(row)  # ('Alice', 30)

# Or convert to dicts for JSON output:
for row in result.to_dicts():
    print(row)  # {'name': 'Alice', 'age': 30}
```

Source code in `src/mixpanel_data/workspace.py`

````
def sql_rows(self, query: str) -> SQLResult:
    """Execute SQL query and return structured result with column metadata.

    Args:
        query: SQL query string.

    Returns:
        SQLResult with column names and row tuples.

    Raises:
        QueryError: If query is invalid.

    Example:
        ```python
        result = ws.sql_rows("SELECT name, age FROM users")
        print(result.columns)  # ['name', 'age']
        for row in result.rows:
            print(row)  # ('Alice', 30)

        # Or convert to dicts for JSON output:
        for row in result.to_dicts():
            print(row)  # {'name': 'Alice', 'age': 30}
        ```
    """
    return self.storage.execute_rows(query)
````

### segmentation

```
segmentation(
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str | None = None,
    unit: Literal["day", "week", "month"] = "day",
    where: str | None = None,
) -> SegmentationResult
```

Run a segmentation query against Mixpanel API.

| PARAMETER   | DESCRIPTION                                                                                 |
| ----------- | ------------------------------------------------------------------------------------------- |
| `event`     | Event name to query. **TYPE:** `str`                                                        |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str`                                                    |
| `to_date`   | End date (YYYY-MM-DD). **TYPE:** `str`                                                      |
| `on`        | Optional property to segment by. **TYPE:** \`str                                            |
| `unit`      | Time unit for aggregation. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'` |
| `where`     | Optional WHERE clause. **TYPE:** \`str                                                      |

| RETURNS              | DESCRIPTION                               |
| -------------------- | ----------------------------------------- |
| `SegmentationResult` | SegmentationResult with time-series data. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def segmentation(
    self,
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str | None = None,
    unit: Literal["day", "week", "month"] = "day",
    where: str | None = None,
) -> SegmentationResult:
    """Run a segmentation query against Mixpanel API.

    Args:
        event: Event name to query.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        on: Optional property to segment by.
        unit: Time unit for aggregation.
        where: Optional WHERE clause.

    Returns:
        SegmentationResult with time-series data.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.segmentation(
        event=event,
        from_date=from_date,
        to_date=to_date,
        on=on,
        unit=unit,
        where=where,
    )
```

### funnel

```
funnel(
    funnel_id: int,
    *,
    from_date: str,
    to_date: str,
    unit: str | None = None,
    on: str | None = None,
) -> FunnelResult
```

Run a funnel analysis query.

| PARAMETER   | DESCRIPTION                                      |
| ----------- | ------------------------------------------------ |
| `funnel_id` | ID of saved funnel. **TYPE:** `int`              |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str`         |
| `to_date`   | End date (YYYY-MM-DD). **TYPE:** `str`           |
| `unit`      | Optional time unit. **TYPE:** \`str              |
| `on`        | Optional property to segment by. **TYPE:** \`str |

| RETURNS        | DESCRIPTION                              |
| -------------- | ---------------------------------------- |
| `FunnelResult` | FunnelResult with step conversion rates. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def funnel(
    self,
    funnel_id: int,
    *,
    from_date: str,
    to_date: str,
    unit: str | None = None,
    on: str | None = None,
) -> FunnelResult:
    """Run a funnel analysis query.

    Args:
        funnel_id: ID of saved funnel.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        unit: Optional time unit.
        on: Optional property to segment by.

    Returns:
        FunnelResult with step conversion rates.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.funnel(
        funnel_id=funnel_id,
        from_date=from_date,
        to_date=to_date,
        unit=unit,
        on=on,
    )
```

### retention

```
retention(
    *,
    born_event: str,
    return_event: str,
    from_date: str,
    to_date: str,
    born_where: str | None = None,
    return_where: str | None = None,
    interval: int = 1,
    interval_count: int = 10,
    unit: Literal["day", "week", "month"] = "day",
) -> RetentionResult
```

Run a retention analysis query.

| PARAMETER        | DESCRIPTION                                                                 |
| ---------------- | --------------------------------------------------------------------------- |
| `born_event`     | Event that defines cohort entry. **TYPE:** `str`                            |
| `return_event`   | Event that defines return. **TYPE:** `str`                                  |
| `from_date`      | Start date (YYYY-MM-DD). **TYPE:** `str`                                    |
| `to_date`        | End date (YYYY-MM-DD). **TYPE:** `str`                                      |
| `born_where`     | Optional filter for born event. **TYPE:** \`str                             |
| `return_where`   | Optional filter for return event. **TYPE:** \`str                           |
| `interval`       | Retention interval. **TYPE:** `int` **DEFAULT:** `1`                        |
| `interval_count` | Number of intervals. **TYPE:** `int` **DEFAULT:** `10`                      |
| `unit`           | Time unit. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'` |

| RETURNS           | DESCRIPTION                                 |
| ----------------- | ------------------------------------------- |
| `RetentionResult` | RetentionResult with cohort retention data. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def retention(
    self,
    *,
    born_event: str,
    return_event: str,
    from_date: str,
    to_date: str,
    born_where: str | None = None,
    return_where: str | None = None,
    interval: int = 1,
    interval_count: int = 10,
    unit: Literal["day", "week", "month"] = "day",
) -> RetentionResult:
    """Run a retention analysis query.

    Args:
        born_event: Event that defines cohort entry.
        return_event: Event that defines return.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        born_where: Optional filter for born event.
        return_where: Optional filter for return event.
        interval: Retention interval.
        interval_count: Number of intervals.
        unit: Time unit.

    Returns:
        RetentionResult with cohort retention data.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.retention(
        born_event=born_event,
        return_event=return_event,
        from_date=from_date,
        to_date=to_date,
        born_where=born_where,
        return_where=return_where,
        interval=interval,
        interval_count=interval_count,
        unit=unit,
    )
```

### jql

```
jql(script: str, params: dict[str, Any] | None = None) -> JQLResult
```

Execute a custom JQL script.

| PARAMETER | DESCRIPTION                                                       |
| --------- | ----------------------------------------------------------------- |
| `script`  | JQL script code. **TYPE:** `str`                                  |
| `params`  | Optional parameters to pass to script. **TYPE:** \`dict[str, Any] |

| RETURNS     | DESCRIPTION                       |
| ----------- | --------------------------------- |
| `JQLResult` | JQLResult with raw query results. |

| RAISES           | DESCRIPTION                       |
| ---------------- | --------------------------------- |
| `ConfigError`    | If API credentials not available. |
| `JQLSyntaxError` | If script has syntax errors.      |

Source code in `src/mixpanel_data/workspace.py`

```
def jql(self, script: str, params: dict[str, Any] | None = None) -> JQLResult:
    """Execute a custom JQL script.

    Args:
        script: JQL script code.
        params: Optional parameters to pass to script.

    Returns:
        JQLResult with raw query results.

    Raises:
        ConfigError: If API credentials not available.
        JQLSyntaxError: If script has syntax errors.
    """
    return self._live_query_service.jql(script=script, params=params)
```

### event_counts

```
event_counts(
    events: list[str],
    *,
    from_date: str,
    to_date: str,
    type: Literal["general", "unique", "average"] = "general",
    unit: Literal["day", "week", "month"] = "day",
) -> EventCountsResult
```

Get event counts for multiple events.

| PARAMETER   | DESCRIPTION                                                                                   |
| ----------- | --------------------------------------------------------------------------------------------- |
| `events`    | List of event names. **TYPE:** `list[str]`                                                    |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str`                                                      |
| `to_date`   | End date (YYYY-MM-DD). **TYPE:** `str`                                                        |
| `type`      | Counting method. **TYPE:** `Literal['general', 'unique', 'average']` **DEFAULT:** `'general'` |
| `unit`      | Time unit. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'`                   |

| RETURNS             | DESCRIPTION                                   |
| ------------------- | --------------------------------------------- |
| `EventCountsResult` | EventCountsResult with time-series per event. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def event_counts(
    self,
    events: list[str],
    *,
    from_date: str,
    to_date: str,
    type: Literal["general", "unique", "average"] = "general",
    unit: Literal["day", "week", "month"] = "day",
) -> EventCountsResult:
    """Get event counts for multiple events.

    Args:
        events: List of event names.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        type: Counting method.
        unit: Time unit.

    Returns:
        EventCountsResult with time-series per event.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.event_counts(
        events=events,
        from_date=from_date,
        to_date=to_date,
        type=type,
        unit=unit,
    )
```

### property_counts

```
property_counts(
    event: str,
    property_name: str,
    *,
    from_date: str,
    to_date: str,
    type: Literal["general", "unique", "average"] = "general",
    unit: Literal["day", "week", "month"] = "day",
    values: list[str] | None = None,
    limit: int | None = None,
) -> PropertyCountsResult
```

Get event counts broken down by property values.

| PARAMETER       | DESCRIPTION                                                                                   |
| --------------- | --------------------------------------------------------------------------------------------- |
| `event`         | Event name. **TYPE:** `str`                                                                   |
| `property_name` | Property to break down by. **TYPE:** `str`                                                    |
| `from_date`     | Start date (YYYY-MM-DD). **TYPE:** `str`                                                      |
| `to_date`       | End date (YYYY-MM-DD). **TYPE:** `str`                                                        |
| `type`          | Counting method. **TYPE:** `Literal['general', 'unique', 'average']` **DEFAULT:** `'general'` |
| `unit`          | Time unit. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'`                   |
| `values`        | Optional list of property values to include. **TYPE:** \`list[str]                            |
| `limit`         | Maximum number of property values. **TYPE:** \`int                                            |

| RETURNS                | DESCRIPTION                                               |
| ---------------------- | --------------------------------------------------------- |
| `PropertyCountsResult` | PropertyCountsResult with time-series per property value. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def property_counts(
    self,
    event: str,
    property_name: str,
    *,
    from_date: str,
    to_date: str,
    type: Literal["general", "unique", "average"] = "general",
    unit: Literal["day", "week", "month"] = "day",
    values: list[str] | None = None,
    limit: int | None = None,
) -> PropertyCountsResult:
    """Get event counts broken down by property values.

    Args:
        event: Event name.
        property_name: Property to break down by.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        type: Counting method.
        unit: Time unit.
        values: Optional list of property values to include.
        limit: Maximum number of property values.

    Returns:
        PropertyCountsResult with time-series per property value.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.property_counts(
        event=event,
        property_name=property_name,
        from_date=from_date,
        to_date=to_date,
        type=type,
        unit=unit,
        values=values,
        limit=limit,
    )
```

### activity_feed

```
activity_feed(
    distinct_ids: list[str],
    *,
    from_date: str | None = None,
    to_date: str | None = None,
) -> ActivityFeedResult
```

Get activity feed for specific users.

| PARAMETER      | DESCRIPTION                                     |
| -------------- | ----------------------------------------------- |
| `distinct_ids` | List of user identifiers. **TYPE:** `list[str]` |
| `from_date`    | Optional start date filter. **TYPE:** \`str     |
| `to_date`      | Optional end date filter. **TYPE:** \`str       |

| RETURNS              | DESCRIPTION                          |
| -------------------- | ------------------------------------ |
| `ActivityFeedResult` | ActivityFeedResult with user events. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def activity_feed(
    self,
    distinct_ids: list[str],
    *,
    from_date: str | None = None,
    to_date: str | None = None,
) -> ActivityFeedResult:
    """Get activity feed for specific users.

    Args:
        distinct_ids: List of user identifiers.
        from_date: Optional start date filter.
        to_date: Optional end date filter.

    Returns:
        ActivityFeedResult with user events.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.activity_feed(
        distinct_ids=distinct_ids,
        from_date=from_date,
        to_date=to_date,
    )
```

### query_saved_report

```
query_saved_report(bookmark_id: int) -> SavedReportResult
```

Query a saved report (Insights, Retention, or Funnel).

Executes a saved report by its bookmark ID. The report type is automatically detected from the response headers.

| PARAMETER     | DESCRIPTION                                                               |
| ------------- | ------------------------------------------------------------------------- |
| `bookmark_id` | ID of saved report (from list_bookmarks or Mixpanel URL). **TYPE:** `int` |

| RETURNS             | DESCRIPTION                                                  |
| ------------------- | ------------------------------------------------------------ |
| `SavedReportResult` | SavedReportResult with report data and report_type property. |

| RAISES        | DESCRIPTION                                    |
| ------------- | ---------------------------------------------- |
| `ConfigError` | If API credentials not available.              |
| `QueryError`  | If bookmark_id is invalid or report not found. |

Source code in `src/mixpanel_data/workspace.py`

```
def query_saved_report(self, bookmark_id: int) -> SavedReportResult:
    """Query a saved report (Insights, Retention, or Funnel).

    Executes a saved report by its bookmark ID. The report type is
    automatically detected from the response headers.

    Args:
        bookmark_id: ID of saved report (from list_bookmarks or Mixpanel URL).

    Returns:
        SavedReportResult with report data and report_type property.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: If bookmark_id is invalid or report not found.
    """
    return self._live_query_service.query_saved_report(bookmark_id=bookmark_id)
```

### query_flows

```
query_flows(bookmark_id: int) -> FlowsResult
```

Query a saved Flows report.

Executes a saved Flows report by its bookmark ID, returning step data, breakdowns, and conversion rates.

| PARAMETER     | DESCRIPTION                                                                     |
| ------------- | ------------------------------------------------------------------------------- |
| `bookmark_id` | ID of saved flows report (from list_bookmarks or Mixpanel URL). **TYPE:** `int` |

| RETURNS       | DESCRIPTION                                              |
| ------------- | -------------------------------------------------------- |
| `FlowsResult` | FlowsResult with steps, breakdowns, and conversion rate. |

| RAISES        | DESCRIPTION                                    |
| ------------- | ---------------------------------------------- |
| `ConfigError` | If API credentials not available.              |
| `QueryError`  | If bookmark_id is invalid or report not found. |

Source code in `src/mixpanel_data/workspace.py`

```
def query_flows(self, bookmark_id: int) -> FlowsResult:
    """Query a saved Flows report.

    Executes a saved Flows report by its bookmark ID, returning
    step data, breakdowns, and conversion rates.

    Args:
        bookmark_id: ID of saved flows report (from list_bookmarks or Mixpanel URL).

    Returns:
        FlowsResult with steps, breakdowns, and conversion rate.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: If bookmark_id is invalid or report not found.
    """
    return self._live_query_service.query_flows(bookmark_id=bookmark_id)
```

### frequency

```
frequency(
    *,
    from_date: str,
    to_date: str,
    unit: Literal["day", "week", "month"] = "day",
    addiction_unit: Literal["hour", "day"] = "hour",
    event: str | None = None,
    where: str | None = None,
) -> FrequencyResult
```

Analyze event frequency distribution.

| PARAMETER        | DESCRIPTION                                                                         |
| ---------------- | ----------------------------------------------------------------------------------- |
| `from_date`      | Start date (YYYY-MM-DD). **TYPE:** `str`                                            |
| `to_date`        | End date (YYYY-MM-DD). **TYPE:** `str`                                              |
| `unit`           | Overall time unit. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'` |
| `addiction_unit` | Measurement granularity. **TYPE:** `Literal['hour', 'day']` **DEFAULT:** `'hour'`   |
| `event`          | Optional event filter. **TYPE:** \`str                                              |
| `where`          | Optional WHERE clause. **TYPE:** \`str                                              |

| RETURNS           | DESCRIPTION                                  |
| ----------------- | -------------------------------------------- |
| `FrequencyResult` | FrequencyResult with frequency distribution. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def frequency(
    self,
    *,
    from_date: str,
    to_date: str,
    unit: Literal["day", "week", "month"] = "day",
    addiction_unit: Literal["hour", "day"] = "hour",
    event: str | None = None,
    where: str | None = None,
) -> FrequencyResult:
    """Analyze event frequency distribution.

    Args:
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        unit: Overall time unit.
        addiction_unit: Measurement granularity.
        event: Optional event filter.
        where: Optional WHERE clause.

    Returns:
        FrequencyResult with frequency distribution.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.frequency(
        from_date=from_date,
        to_date=to_date,
        unit=unit,
        addiction_unit=addiction_unit,
        event=event,
        where=where,
    )
```

### segmentation_numeric

```
segmentation_numeric(
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str,
    unit: Literal["hour", "day"] = "day",
    where: str | None = None,
    type: Literal["general", "unique", "average"] = "general",
) -> NumericBucketResult
```

Bucket events by numeric property ranges.

| PARAMETER   | DESCRIPTION                                                                                   |
| ----------- | --------------------------------------------------------------------------------------------- |
| `event`     | Event name. **TYPE:** `str`                                                                   |
| `from_date` | Start date. **TYPE:** `str`                                                                   |
| `to_date`   | End date. **TYPE:** `str`                                                                     |
| `on`        | Numeric property expression. **TYPE:** `str`                                                  |
| `unit`      | Time unit. **TYPE:** `Literal['hour', 'day']` **DEFAULT:** `'day'`                            |
| `where`     | Optional filter. **TYPE:** \`str                                                              |
| `type`      | Counting method. **TYPE:** `Literal['general', 'unique', 'average']` **DEFAULT:** `'general'` |

| RETURNS               | DESCRIPTION                             |
| --------------------- | --------------------------------------- |
| `NumericBucketResult` | NumericBucketResult with bucketed data. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def segmentation_numeric(
    self,
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str,
    unit: Literal["hour", "day"] = "day",
    where: str | None = None,
    type: Literal["general", "unique", "average"] = "general",
) -> NumericBucketResult:
    """Bucket events by numeric property ranges.

    Args:
        event: Event name.
        from_date: Start date.
        to_date: End date.
        on: Numeric property expression.
        unit: Time unit.
        where: Optional filter.
        type: Counting method.

    Returns:
        NumericBucketResult with bucketed data.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.segmentation_numeric(
        event=event,
        from_date=from_date,
        to_date=to_date,
        on=on,
        unit=unit,
        where=where,
        type=type,
    )
```

### segmentation_sum

```
segmentation_sum(
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str,
    unit: Literal["hour", "day"] = "day",
    where: str | None = None,
) -> NumericSumResult
```

Calculate sum of numeric property over time.

| PARAMETER   | DESCRIPTION                                                        |
| ----------- | ------------------------------------------------------------------ |
| `event`     | Event name. **TYPE:** `str`                                        |
| `from_date` | Start date. **TYPE:** `str`                                        |
| `to_date`   | End date. **TYPE:** `str`                                          |
| `on`        | Numeric property expression. **TYPE:** `str`                       |
| `unit`      | Time unit. **TYPE:** `Literal['hour', 'day']` **DEFAULT:** `'day'` |
| `where`     | Optional filter. **TYPE:** \`str                                   |

| RETURNS            | DESCRIPTION                                  |
| ------------------ | -------------------------------------------- |
| `NumericSumResult` | NumericSumResult with sum values per period. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def segmentation_sum(
    self,
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str,
    unit: Literal["hour", "day"] = "day",
    where: str | None = None,
) -> NumericSumResult:
    """Calculate sum of numeric property over time.

    Args:
        event: Event name.
        from_date: Start date.
        to_date: End date.
        on: Numeric property expression.
        unit: Time unit.
        where: Optional filter.

    Returns:
        NumericSumResult with sum values per period.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.segmentation_sum(
        event=event,
        from_date=from_date,
        to_date=to_date,
        on=on,
        unit=unit,
        where=where,
    )
```

### segmentation_average

```
segmentation_average(
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str,
    unit: Literal["hour", "day"] = "day",
    where: str | None = None,
) -> NumericAverageResult
```

Calculate average of numeric property over time.

| PARAMETER   | DESCRIPTION                                                        |
| ----------- | ------------------------------------------------------------------ |
| `event`     | Event name. **TYPE:** `str`                                        |
| `from_date` | Start date. **TYPE:** `str`                                        |
| `to_date`   | End date. **TYPE:** `str`                                          |
| `on`        | Numeric property expression. **TYPE:** `str`                       |
| `unit`      | Time unit. **TYPE:** `Literal['hour', 'day']` **DEFAULT:** `'day'` |
| `where`     | Optional filter. **TYPE:** \`str                                   |

| RETURNS                | DESCRIPTION                                          |
| ---------------------- | ---------------------------------------------------- |
| `NumericAverageResult` | NumericAverageResult with average values per period. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def segmentation_average(
    self,
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str,
    unit: Literal["hour", "day"] = "day",
    where: str | None = None,
) -> NumericAverageResult:
    """Calculate average of numeric property over time.

    Args:
        event: Event name.
        from_date: Start date.
        to_date: End date.
        on: Numeric property expression.
        unit: Time unit.
        where: Optional filter.

    Returns:
        NumericAverageResult with average values per period.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.segmentation_average(
        event=event,
        from_date=from_date,
        to_date=to_date,
        on=on,
        unit=unit,
        where=where,
    )
```

### property_distribution

```
property_distribution(
    event: str, property: str, *, from_date: str, to_date: str, limit: int = 20
) -> PropertyDistributionResult
```

Get distribution of values for a property.

Uses JQL to count occurrences of each property value, returning counts and percentages sorted by frequency.

| PARAMETER   | DESCRIPTION                                                                        |
| ----------- | ---------------------------------------------------------------------------------- |
| `event`     | Event name to analyze. **TYPE:** `str`                                             |
| `property`  | Property name to get distribution for. **TYPE:** `str`                             |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str`                                           |
| `to_date`   | End date (YYYY-MM-DD). **TYPE:** `str`                                             |
| `limit`     | Maximum number of values to return. Default: 20. **TYPE:** `int` **DEFAULT:** `20` |

| RETURNS                      | DESCRIPTION                                                   |
| ---------------------------- | ------------------------------------------------------------- |
| `PropertyDistributionResult` | PropertyDistributionResult with value counts and percentages. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |
| `QueryError`  | Script execution error.           |

Example

```
result = ws.property_distribution(
    event="Purchase",
    property="country",
    from_date="2024-01-01",
    to_date="2024-01-31",
)
for v in result.values:
    print(f"{v.value}: {v.count} ({v.percentage:.1f}%)")
```

Source code in `src/mixpanel_data/workspace.py`

````
def property_distribution(
    self,
    event: str,
    property: str,
    *,
    from_date: str,
    to_date: str,
    limit: int = 20,
) -> PropertyDistributionResult:
    """Get distribution of values for a property.

    Uses JQL to count occurrences of each property value, returning
    counts and percentages sorted by frequency.

    Args:
        event: Event name to analyze.
        property: Property name to get distribution for.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        limit: Maximum number of values to return. Default: 20.

    Returns:
        PropertyDistributionResult with value counts and percentages.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error.

    Example:
        ```python
        result = ws.property_distribution(
            event="Purchase",
            property="country",
            from_date="2024-01-01",
            to_date="2024-01-31",
        )
        for v in result.values:
            print(f"{v.value}: {v.count} ({v.percentage:.1f}%)")
        ```
    """
    return self._live_query_service.property_distribution(
        event=event,
        property=property,
        from_date=from_date,
        to_date=to_date,
        limit=limit,
    )
````

### numeric_summary

```
numeric_summary(
    event: str,
    property: str,
    *,
    from_date: str,
    to_date: str,
    percentiles: list[int] | None = None,
) -> NumericPropertySummaryResult
```

Get statistical summary for a numeric property.

Uses JQL to compute count, min, max, avg, stddev, and percentiles for a numeric property.

| PARAMETER     | DESCRIPTION                                                                      |
| ------------- | -------------------------------------------------------------------------------- |
| `event`       | Event name to analyze. **TYPE:** `str`                                           |
| `property`    | Numeric property name. **TYPE:** `str`                                           |
| `from_date`   | Start date (YYYY-MM-DD). **TYPE:** `str`                                         |
| `to_date`     | End date (YYYY-MM-DD). **TYPE:** `str`                                           |
| `percentiles` | Percentiles to compute. Default: [25, 50, 75, 90, 95, 99]. **TYPE:** \`list[int] |

| RETURNS                        | DESCRIPTION                                   |
| ------------------------------ | --------------------------------------------- |
| `NumericPropertySummaryResult` | NumericPropertySummaryResult with statistics. |

| RAISES        | DESCRIPTION                                     |
| ------------- | ----------------------------------------------- |
| `ConfigError` | If API credentials not available.               |
| `QueryError`  | Script execution error or non-numeric property. |

Example

```
result = ws.numeric_summary(
    event="Purchase",
    property="amount",
    from_date="2024-01-01",
    to_date="2024-01-31",
)
print(f"Avg: {result.avg}, Median: {result.percentiles[50]}")
```

Source code in `src/mixpanel_data/workspace.py`

````
def numeric_summary(
    self,
    event: str,
    property: str,
    *,
    from_date: str,
    to_date: str,
    percentiles: list[int] | None = None,
) -> NumericPropertySummaryResult:
    """Get statistical summary for a numeric property.

    Uses JQL to compute count, min, max, avg, stddev, and percentiles
    for a numeric property.

    Args:
        event: Event name to analyze.
        property: Numeric property name.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        percentiles: Percentiles to compute. Default: [25, 50, 75, 90, 95, 99].

    Returns:
        NumericPropertySummaryResult with statistics.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error or non-numeric property.

    Example:
        ```python
        result = ws.numeric_summary(
            event="Purchase",
            property="amount",
            from_date="2024-01-01",
            to_date="2024-01-31",
        )
        print(f"Avg: {result.avg}, Median: {result.percentiles[50]}")
        ```
    """
    return self._live_query_service.numeric_summary(
        event=event,
        property=property,
        from_date=from_date,
        to_date=to_date,
        percentiles=percentiles,
    )
````

### daily_counts

```
daily_counts(
    *, from_date: str, to_date: str, events: list[str] | None = None
) -> DailyCountsResult
```

Get daily event counts.

Uses JQL to count events by day, optionally filtered to specific events.

| PARAMETER   | DESCRIPTION                                                                |
| ----------- | -------------------------------------------------------------------------- |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str`                                   |
| `to_date`   | End date (YYYY-MM-DD). **TYPE:** `str`                                     |
| `events`    | Optional list of events to count. None = all events. **TYPE:** \`list[str] |

| RETURNS             | DESCRIPTION                                     |
| ------------------- | ----------------------------------------------- |
| `DailyCountsResult` | DailyCountsResult with date/event/count tuples. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |
| `QueryError`  | Script execution error.           |

Example

```
result = ws.daily_counts(
    from_date="2024-01-01",
    to_date="2024-01-07",
    events=["Purchase", "Signup"],
)
for c in result.counts:
    print(f"{c.date} {c.event}: {c.count}")
```

Source code in `src/mixpanel_data/workspace.py`

````
def daily_counts(
    self,
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
) -> DailyCountsResult:
    """Get daily event counts.

    Uses JQL to count events by day, optionally filtered to specific events.

    Args:
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        events: Optional list of events to count. None = all events.

    Returns:
        DailyCountsResult with date/event/count tuples.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error.

    Example:
        ```python
        result = ws.daily_counts(
            from_date="2024-01-01",
            to_date="2024-01-07",
            events=["Purchase", "Signup"],
        )
        for c in result.counts:
            print(f"{c.date} {c.event}: {c.count}")
        ```
    """
    return self._live_query_service.daily_counts(
        from_date=from_date,
        to_date=to_date,
        events=events,
    )
````

### engagement_distribution

```
engagement_distribution(
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    buckets: list[int] | None = None,
) -> EngagementDistributionResult
```

Get user engagement distribution.

Uses JQL to bucket users by their event count, showing how many users performed N events.

| PARAMETER   | DESCRIPTION                                                                   |
| ----------- | ----------------------------------------------------------------------------- |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str`                                      |
| `to_date`   | End date (YYYY-MM-DD). **TYPE:** `str`                                        |
| `events`    | Optional list of events to count. None = all events. **TYPE:** \`list[str]    |
| `buckets`   | Bucket boundaries. Default: [1, 2, 5, 10, 25, 50, 100]. **TYPE:** \`list[int] |

| RETURNS                        | DESCRIPTION                                               |
| ------------------------------ | --------------------------------------------------------- |
| `EngagementDistributionResult` | EngagementDistributionResult with user counts per bucket. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |
| `QueryError`  | Script execution error.           |

Example

```
result = ws.engagement_distribution(
    from_date="2024-01-01",
    to_date="2024-01-31",
)
for b in result.buckets:
    print(f"{b.bucket_label}: {b.user_count} ({b.percentage:.1f}%)")
```

Source code in `src/mixpanel_data/workspace.py`

````
def engagement_distribution(
    self,
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    buckets: list[int] | None = None,
) -> EngagementDistributionResult:
    """Get user engagement distribution.

    Uses JQL to bucket users by their event count, showing how many
    users performed N events.

    Args:
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        events: Optional list of events to count. None = all events.
        buckets: Bucket boundaries. Default: [1, 2, 5, 10, 25, 50, 100].

    Returns:
        EngagementDistributionResult with user counts per bucket.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error.

    Example:
        ```python
        result = ws.engagement_distribution(
            from_date="2024-01-01",
            to_date="2024-01-31",
        )
        for b in result.buckets:
            print(f"{b.bucket_label}: {b.user_count} ({b.percentage:.1f}%)")
        ```
    """
    return self._live_query_service.engagement_distribution(
        from_date=from_date,
        to_date=to_date,
        events=events,
        buckets=buckets,
    )
````

### property_coverage

```
property_coverage(
    event: str, properties: list[str], *, from_date: str, to_date: str
) -> PropertyCoverageResult
```

Get property coverage statistics.

Uses JQL to count how often each property is defined (non-null) vs undefined for the specified event.

| PARAMETER    | DESCRIPTION                                            |
| ------------ | ------------------------------------------------------ |
| `event`      | Event name to analyze. **TYPE:** `str`                 |
| `properties` | List of property names to check. **TYPE:** `list[str]` |
| `from_date`  | Start date (YYYY-MM-DD). **TYPE:** `str`               |
| `to_date`    | End date (YYYY-MM-DD). **TYPE:** `str`                 |

| RETURNS                  | DESCRIPTION                                                   |
| ------------------------ | ------------------------------------------------------------- |
| `PropertyCoverageResult` | PropertyCoverageResult with coverage statistics per property. |

| RAISES        | DESCRIPTION                       |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |
| `QueryError`  | Script execution error.           |

Example

```
result = ws.property_coverage(
    event="Purchase",
    properties=["coupon_code", "referrer"],
    from_date="2024-01-01",
    to_date="2024-01-31",
)
for c in result.coverage:
    print(f"{c.property}: {c.coverage_percentage:.1f}% defined")
```

Source code in `src/mixpanel_data/workspace.py`

````
def property_coverage(
    self,
    event: str,
    properties: list[str],
    *,
    from_date: str,
    to_date: str,
) -> PropertyCoverageResult:
    """Get property coverage statistics.

    Uses JQL to count how often each property is defined (non-null)
    vs undefined for the specified event.

    Args:
        event: Event name to analyze.
        properties: List of property names to check.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).

    Returns:
        PropertyCoverageResult with coverage statistics per property.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error.

    Example:
        ```python
        result = ws.property_coverage(
            event="Purchase",
            properties=["coupon_code", "referrer"],
            from_date="2024-01-01",
            to_date="2024-01-31",
        )
        for c in result.coverage:
            print(f"{c.property}: {c.coverage_percentage:.1f}% defined")
        ```
    """
    return self._live_query_service.property_coverage(
        event=event,
        properties=properties,
        from_date=from_date,
        to_date=to_date,
    )
````

### info

```
info() -> WorkspaceInfo
```

Get metadata about this workspace.

| RETURNS         | DESCRIPTION                                                         |
| --------------- | ------------------------------------------------------------------- |
| `WorkspaceInfo` | WorkspaceInfo with path, project_id, region, account, tables, size. |

Source code in `src/mixpanel_data/workspace.py`

```
def info(self) -> WorkspaceInfo:
    """Get metadata about this workspace.

    Returns:
        WorkspaceInfo with path, project_id, region, account, tables, size.
    """
    path = self.storage.path
    tables = [t.name for t in self.storage.list_tables()]

    # Calculate database size and creation time
    size_mb = 0.0
    created_at: datetime | None = None
    if path is not None and path.exists():
        try:
            stat = path.stat()
            size_mb = stat.st_size / 1_000_000
            created_at = datetime.fromtimestamp(stat.st_ctime)
        except (OSError, PermissionError):
            # File became inaccessible, use defaults
            pass

    return WorkspaceInfo(
        path=path,
        project_id=self._credentials.project_id if self._credentials else "unknown",
        region=self._credentials.region if self._credentials else "unknown",
        account=self._account_name,
        tables=tables,
        size_mb=size_mb,
        created_at=created_at,
    )
```

### tables

```
tables() -> list[TableInfo]
```

List tables in the local database.

| RETURNS           | DESCRIPTION                                                    |
| ----------------- | -------------------------------------------------------------- |
| `list[TableInfo]` | List of TableInfo objects (name, type, row_count, fetched_at). |

Source code in `src/mixpanel_data/workspace.py`

```
def tables(self) -> list[TableInfo]:
    """List tables in the local database.

    Returns:
        List of TableInfo objects (name, type, row_count, fetched_at).
    """
    return self.storage.list_tables()
```

### table_schema

```
table_schema(table: str) -> TableSchema
```

Get schema for a table in the local database.

| PARAMETER | DESCRIPTION                 |
| --------- | --------------------------- |
| `table`   | Table name. **TYPE:** `str` |

| RETURNS       | DESCRIPTION                          |
| ------------- | ------------------------------------ |
| `TableSchema` | TableSchema with column definitions. |

| RAISES               | DESCRIPTION             |
| -------------------- | ----------------------- |
| `TableNotFoundError` | If table doesn't exist. |

Source code in `src/mixpanel_data/workspace.py`

```
def table_schema(self, table: str) -> TableSchema:
    """Get schema for a table in the local database.

    Args:
        table: Table name.

    Returns:
        TableSchema with column definitions.

    Raises:
        TableNotFoundError: If table doesn't exist.
    """
    return self.storage.get_schema(table)
```

### drop

```
drop(*names: str) -> None
```

Drop specified tables.

| PARAMETER | DESCRIPTION                                            |
| --------- | ------------------------------------------------------ |
| `*names`  | Table names to drop. **TYPE:** `str` **DEFAULT:** `()` |

| RAISES               | DESCRIPTION                 |
| -------------------- | --------------------------- |
| `TableNotFoundError` | If any table doesn't exist. |

Source code in `src/mixpanel_data/workspace.py`

```
def drop(self, *names: str) -> None:
    """Drop specified tables.

    Args:
        *names: Table names to drop.

    Raises:
        TableNotFoundError: If any table doesn't exist.
    """
    for name in names:
        self.storage.drop_table(name)
```

### drop_all

```
drop_all(type: TableType | None = None) -> None
```

Drop all tables from the workspace, optionally filtered by type.

Permanently removes all tables and their data. When used with the type parameter, only tables matching the specified type are dropped.

| PARAMETER | DESCRIPTION                                                                                                                               |
| --------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `type`    | Optional table type filter. Valid values: "events", "profiles". If None, all tables are dropped regardless of type. **TYPE:** \`TableType |

| RAISES               | DESCRIPTION                                      |
| -------------------- | ------------------------------------------------ |
| `TableNotFoundError` | If a table cannot be dropped (rare in practice). |

Example

Drop all event tables:

```
ws = Workspace()
ws.drop_all(type="events")  # Only drops event tables
ws.close()
```

Drop all tables:

```
ws = Workspace()
ws.drop_all()  # Drops everything
ws.close()
```

Source code in `src/mixpanel_data/workspace.py`

````
def drop_all(self, type: TableType | None = None) -> None:
    """Drop all tables from the workspace, optionally filtered by type.

    Permanently removes all tables and their data. When used with the type
    parameter, only tables matching the specified type are dropped.

    Args:
        type: Optional table type filter. Valid values: "events", "profiles".
              If None, all tables are dropped regardless of type.

    Raises:
        TableNotFoundError: If a table cannot be dropped (rare in practice).

    Example:
        Drop all event tables:

        ```python
        ws = Workspace()
        ws.drop_all(type="events")  # Only drops event tables
        ws.close()
        ```

        Drop all tables:

        ```python
        ws = Workspace()
        ws.drop_all()  # Drops everything
        ws.close()
        ```
    """
    tables = self.storage.list_tables()
    for table in tables:
        if type is None or table.type == type:
            self.storage.drop_table(table.name)
````

### sample

```
sample(table: str, n: int = 10) -> pd.DataFrame
```

Return random sample rows from a table.

Uses DuckDB's reservoir sampling for representative results. Unlike LIMIT, sampling returns rows from throughout the table.

| PARAMETER | DESCRIPTION                                                               |
| --------- | ------------------------------------------------------------------------- |
| `table`   | Table name to sample from. **TYPE:** `str`                                |
| `n`       | Number of rows to return (default: 10). **TYPE:** `int` **DEFAULT:** `10` |

| RETURNS     | DESCRIPTION                                                   |
| ----------- | ------------------------------------------------------------- |
| `DataFrame` | DataFrame with n random rows. If table has fewer than n rows, |
| `DataFrame` | returns all available rows.                                   |

| RAISES               | DESCRIPTION             |
| -------------------- | ----------------------- |
| `TableNotFoundError` | If table doesn't exist. |

Example

```
ws = Workspace()
ws.sample("events")  # 10 random rows
ws.sample("events", n=5)  # 5 random rows
```

Source code in `src/mixpanel_data/workspace.py`

````
def sample(self, table: str, n: int = 10) -> pd.DataFrame:
    """Return random sample rows from a table.

    Uses DuckDB's reservoir sampling for representative results.
    Unlike LIMIT, sampling returns rows from throughout the table.

    Args:
        table: Table name to sample from.
        n: Number of rows to return (default: 10).

    Returns:
        DataFrame with n random rows. If table has fewer than n rows,
        returns all available rows.

    Raises:
        TableNotFoundError: If table doesn't exist.

    Example:
        ```python
        ws = Workspace()
        ws.sample("events")  # 10 random rows
        ws.sample("events", n=5)  # 5 random rows
        ```
    """
    # Validate table exists
    self.storage.get_schema(table)

    # Use DuckDB's reservoir sampling
    sql = f'SELECT * FROM "{table}" USING SAMPLE {n}'
    return self.storage.execute_df(sql)
````

### summarize

```
summarize(table: str) -> SummaryResult
```

Get statistical summary of all columns in a table.

Uses DuckDB's SUMMARIZE command to compute min/max, quartiles, null percentage, and approximate distinct counts for each column.

| PARAMETER | DESCRIPTION                              |
| --------- | ---------------------------------------- |
| `table`   | Table name to summarize. **TYPE:** `str` |

| RETURNS         | DESCRIPTION                                                   |
| --------------- | ------------------------------------------------------------- |
| `SummaryResult` | SummaryResult with per-column statistics and total row count. |

| RAISES               | DESCRIPTION             |
| -------------------- | ----------------------- |
| `TableNotFoundError` | If table doesn't exist. |

Example

```
result = ws.summarize("events")
result.row_count         # 1234567
result.columns[0].null_percentage  # 0.5
result.df                # Full summary as DataFrame
```

Source code in `src/mixpanel_data/workspace.py`

````
def summarize(self, table: str) -> SummaryResult:
    """Get statistical summary of all columns in a table.

    Uses DuckDB's SUMMARIZE command to compute min/max, quartiles,
    null percentage, and approximate distinct counts for each column.

    Args:
        table: Table name to summarize.

    Returns:
        SummaryResult with per-column statistics and total row count.

    Raises:
        TableNotFoundError: If table doesn't exist.

    Example:
        ```python
        result = ws.summarize("events")
        result.row_count         # 1234567
        result.columns[0].null_percentage  # 0.5
        result.df                # Full summary as DataFrame
        ```
    """
    # Validate table exists
    self.storage.get_schema(table)

    # Get row count
    row_count = self.storage.execute_scalar(f'SELECT COUNT(*) FROM "{table}"')

    # Get column statistics using SUMMARIZE
    summary_df = self.storage.execute_df(f'SUMMARIZE "{table}"')

    # Convert to ColumnSummary objects (to_dict is more efficient than iterrows)
    columns: list[ColumnSummary] = []
    for row in summary_df.to_dict("records"):
        columns.append(
            ColumnSummary(
                column_name=str(row["column_name"]),
                column_type=str(row["column_type"]),
                min=row["min"],
                max=row["max"],
                approx_unique=int(row["approx_unique"]),
                avg=self._try_float(row["avg"]),
                std=self._try_float(row["std"]),
                q25=row["q25"],
                q50=row["q50"],
                q75=row["q75"],
                count=int(row["count"]),
                null_percentage=float(row["null_percentage"]),
            )
        )

    return SummaryResult(
        table=table,
        row_count=int(row_count),
        columns=columns,
    )
````

### event_breakdown

```
event_breakdown(table: str) -> EventBreakdownResult
```

Analyze event distribution in a table.

Computes per-event counts, unique users, date ranges, and percentage of total for each event type.

| PARAMETER | DESCRIPTION                                                                                           |
| --------- | ----------------------------------------------------------------------------------------------------- |
| `table`   | Table name containing events. Must have columns: event_name, event_time, distinct_id. **TYPE:** `str` |

| RETURNS                | DESCRIPTION                                     |
| ---------------------- | ----------------------------------------------- |
| `EventBreakdownResult` | EventBreakdownResult with per-event statistics. |

| RAISES               | DESCRIPTION                                                                                                              |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `TableNotFoundError` | If table doesn't exist.                                                                                                  |
| `QueryError`         | If table lacks required columns (event_name, event_time, distinct_id). Error message lists the specific missing columns. |

Example

```
breakdown = ws.event_breakdown("events")
breakdown.total_events           # 1234567
breakdown.events[0].event_name   # "Page View"
breakdown.events[0].pct_of_total # 45.2
```

Source code in `src/mixpanel_data/workspace.py`

````
def event_breakdown(self, table: str) -> EventBreakdownResult:
    """Analyze event distribution in a table.

    Computes per-event counts, unique users, date ranges, and
    percentage of total for each event type.

    Args:
        table: Table name containing events. Must have columns:
               event_name, event_time, distinct_id.

    Returns:
        EventBreakdownResult with per-event statistics.

    Raises:
        TableNotFoundError: If table doesn't exist.
        QueryError: If table lacks required columns (event_name,
                   event_time, distinct_id). Error message lists
                   the specific missing columns.

    Example:
        ```python
        breakdown = ws.event_breakdown("events")
        breakdown.total_events           # 1234567
        breakdown.events[0].event_name   # "Page View"
        breakdown.events[0].pct_of_total # 45.2
        ```
    """
    # Validate table exists and get schema
    schema = self.storage.get_schema(table)
    column_names = {col.name for col in schema.columns}

    # Check for required columns
    required_columns = {"event_name", "event_time", "distinct_id"}
    missing = required_columns - column_names
    if missing:
        raise QueryError(
            f"event_breakdown() requires columns {required_columns}, "
            f"but '{table}' is missing: {missing}",
            status_code=0,
        )

    # Get aggregate statistics
    agg_sql = f"""
        SELECT
            COUNT(*) as total_events,
            COUNT(DISTINCT distinct_id) as total_users,
            MIN(event_time) as min_time,
            MAX(event_time) as max_time
        FROM "{table}"
    """
    agg_result = self.storage.execute_rows(agg_sql)
    total_events, total_users, min_time, max_time = agg_result.rows[0]

    # Handle empty table
    if total_events == 0:
        return EventBreakdownResult(
            table=table,
            total_events=0,
            total_users=0,
            date_range=(datetime.min, datetime.min),
            events=[],
        )

    # Get per-event statistics
    breakdown_sql = f"""
        SELECT
            event_name,
            COUNT(*) as count,
            COUNT(DISTINCT distinct_id) as unique_users,
            MIN(event_time) as first_seen,
            MAX(event_time) as last_seen,
            ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as pct_of_total
        FROM "{table}"
        GROUP BY event_name
        ORDER BY count DESC
    """
    breakdown_rows = self.storage.execute_rows(breakdown_sql)

    events: list[EventStats] = []
    for row in breakdown_rows:
        event_name, count, unique_users, first_seen, last_seen, pct = row
        events.append(
            EventStats(
                event_name=str(event_name),
                count=int(count),
                unique_users=int(unique_users),
                first_seen=first_seen
                if isinstance(first_seen, datetime)
                else datetime.fromisoformat(str(first_seen)),
                last_seen=last_seen
                if isinstance(last_seen, datetime)
                else datetime.fromisoformat(str(last_seen)),
                pct_of_total=float(pct),
            )
        )

    return EventBreakdownResult(
        table=table,
        total_events=int(total_events),
        total_users=int(total_users),
        date_range=(
            min_time
            if isinstance(min_time, datetime)
            else datetime.fromisoformat(str(min_time)),
            max_time
            if isinstance(max_time, datetime)
            else datetime.fromisoformat(str(max_time)),
        ),
        events=events,
    )
````

### property_keys

```
property_keys(table: str, event: str | None = None) -> list[str]
```

List all JSON property keys in a table.

Extracts distinct keys from the 'properties' JSON column. Useful for discovering queryable fields in event properties.

| PARAMETER | DESCRIPTION                                                                                                      |
| --------- | ---------------------------------------------------------------------------------------------------------------- |
| `table`   | Table name with a 'properties' JSON column. **TYPE:** `str`                                                      |
| `event`   | Optional event name to filter by. If provided, only returns keys present in events of that type. **TYPE:** \`str |

| RETURNS     | DESCRIPTION                                       |
| ----------- | ------------------------------------------------- |
| `list[str]` | Alphabetically sorted list of property key names. |
| `list[str]` | Empty list if no keys found.                      |

| RAISES               | DESCRIPTION                         |
| -------------------- | ----------------------------------- |
| `TableNotFoundError` | If table doesn't exist.             |
| `QueryError`         | If table lacks 'properties' column. |

Example

All keys across all events:

```
ws.property_keys("events")
# ['$browser', '$city', 'page', 'referrer', 'user_plan']
```

Keys for specific event type:

```
ws.property_keys("events", event="Purchase")
# ['amount', 'currency', 'product_id', 'quantity']
```

Source code in `src/mixpanel_data/workspace.py`

````
def property_keys(
    self,
    table: str,
    event: str | None = None,
) -> list[str]:
    """List all JSON property keys in a table.

    Extracts distinct keys from the 'properties' JSON column.
    Useful for discovering queryable fields in event properties.

    Args:
        table: Table name with a 'properties' JSON column.
        event: Optional event name to filter by. If provided, only
               returns keys present in events of that type.

    Returns:
        Alphabetically sorted list of property key names.
        Empty list if no keys found.

    Raises:
        TableNotFoundError: If table doesn't exist.
        QueryError: If table lacks 'properties' column.

    Example:
        All keys across all events:

        ```python
        ws.property_keys("events")
        # ['$browser', '$city', 'page', 'referrer', 'user_plan']
        ```

        Keys for specific event type:

        ```python
        ws.property_keys("events", event="Purchase")
        # ['amount', 'currency', 'product_id', 'quantity']
        ```
    """
    # Validate table exists and get schema
    schema = self.storage.get_schema(table)
    column_names = {col.name for col in schema.columns}

    # Check for required column
    if "properties" not in column_names:
        raise QueryError(
            f"property_keys() requires a 'properties' column, "
            f"but '{table}' does not have one",
            status_code=0,
        )

    # Build query with optional event filter
    if event is not None:
        # Check if event_name column exists
        if "event_name" not in column_names:
            raise QueryError(
                f"Cannot filter by event: '{table}' lacks 'event_name' column",
                status_code=0,
            )
        sql = f"""
            SELECT DISTINCT unnest(json_keys(properties)) as key
            FROM "{table}"
            WHERE event_name = ?
            ORDER BY key
        """
        result = self.storage.execute_rows_params(sql, [event])
        rows = result.rows
    else:
        sql = f"""
            SELECT DISTINCT unnest(json_keys(properties)) as key
            FROM "{table}"
            ORDER BY key
        """
        result = self.storage.execute_rows(sql)
        rows = result.rows

    return [str(row[0]) for row in rows]
````

### column_stats

```
column_stats(table: str, column: str, *, top_n: int = 10) -> ColumnStatsResult
```

Get detailed statistics for a single column.

Performs deep analysis including null rates, cardinality, top values, and numeric statistics (for numeric columns).

The column parameter supports JSON path expressions for analyzing properties stored in JSON columns:

- `properties->>'$.country'` for string extraction
- `CAST(properties->>'$.amount' AS DOUBLE)` for numeric

| PARAMETER | DESCRIPTION                                                                     |
| --------- | ------------------------------------------------------------------------------- |
| `table`   | Table name to analyze. **TYPE:** `str`                                          |
| `column`  | Column name or expression to analyze. **TYPE:** `str`                           |
| `top_n`   | Number of top values to return (default: 10). **TYPE:** `int` **DEFAULT:** `10` |

| RETURNS             | DESCRIPTION                                             |
| ------------------- | ------------------------------------------------------- |
| `ColumnStatsResult` | ColumnStatsResult with comprehensive column statistics. |

| RAISES               | DESCRIPTION                      |
| -------------------- | -------------------------------- |
| `TableNotFoundError` | If table doesn't exist.          |
| `QueryError`         | If column expression is invalid. |

Example

Analyze standard column:

```
stats = ws.column_stats("events", "event_name")
stats.unique_count      # 47
stats.top_values[:3]    # [('Page View', 45230), ...]
```

Analyze JSON property:

```
stats = ws.column_stats("events", "properties->>'$.country'")
```

Security

The column parameter is interpolated directly into SQL queries to allow expression syntax. Only use with trusted input from developers or AI coding agents. Do not pass untrusted user input.

Source code in `src/mixpanel_data/workspace.py`

````
def column_stats(
    self,
    table: str,
    column: str,
    *,
    top_n: int = 10,
) -> ColumnStatsResult:
    """Get detailed statistics for a single column.

    Performs deep analysis including null rates, cardinality,
    top values, and numeric statistics (for numeric columns).

    The column parameter supports JSON path expressions for
    analyzing properties stored in JSON columns:
    - `properties->>'$.country'` for string extraction
    - `CAST(properties->>'$.amount' AS DOUBLE)` for numeric

    Args:
        table: Table name to analyze.
        column: Column name or expression to analyze.
        top_n: Number of top values to return (default: 10).

    Returns:
        ColumnStatsResult with comprehensive column statistics.

    Raises:
        TableNotFoundError: If table doesn't exist.
        QueryError: If column expression is invalid.

    Example:
        Analyze standard column:

        ```python
        stats = ws.column_stats("events", "event_name")
        stats.unique_count      # 47
        stats.top_values[:3]    # [('Page View', 45230), ...]
        ```

        Analyze JSON property:

        ```python
        stats = ws.column_stats("events", "properties->>'$.country'")
        ```

    Security:
        The column parameter is interpolated directly into SQL queries
        to allow expression syntax. Only use with trusted input from
        developers or AI coding agents. Do not pass untrusted user input.
    """
    # Validate table exists
    self.storage.get_schema(table)

    # Get total row count
    total_rows = self.storage.execute_scalar(f'SELECT COUNT(*) FROM "{table}"')

    # Get basic stats: count, null_count, approx unique
    stats_sql = f"""
        SELECT
            COUNT({column}) as count,
            COUNT(*) - COUNT({column}) as null_count,
            APPROX_COUNT_DISTINCT({column}) as unique_count
        FROM "{table}"
    """
    try:
        stats_result = self.storage.execute_rows(stats_sql)
    except Exception as e:
        raise QueryError(
            f"Invalid column expression: {column}. Error: {e}",
            status_code=0,
        ) from e

    count, null_count, unique_count = stats_result.rows[0]

    # Calculate percentages
    null_pct = (null_count / total_rows * 100) if total_rows > 0 else 0.0
    unique_pct = (unique_count / count * 100) if count > 0 else 0.0

    # Get top values
    top_sql = f"""
        SELECT {column} as value, COUNT(*) as cnt
        FROM "{table}"
        WHERE {column} IS NOT NULL
        GROUP BY {column}
        ORDER BY cnt DESC
        LIMIT {top_n}
    """
    top_result = self.storage.execute_rows(top_sql)
    top_values: list[tuple[Any, int]] = [
        (row[0], int(row[1])) for row in top_result.rows
    ]

    # Detect column type to determine if numeric stats apply
    type_sql = (
        f'SELECT typeof({column}) FROM "{table}" WHERE {column} IS NOT NULL LIMIT 1'
    )
    try:
        type_result = self.storage.execute_rows(type_sql)
        dtype = str(type_result.rows[0][0]) if type_result.rows else "UNKNOWN"
    except Exception:
        dtype = "UNKNOWN"

    # Get numeric stats if applicable
    min_val: float | None = None
    max_val: float | None = None
    mean_val: float | None = None
    std_val: float | None = None

    numeric_types = {
        "INTEGER",
        "BIGINT",
        "DOUBLE",
        "FLOAT",
        "DECIMAL",
        "HUGEINT",
        "SMALLINT",
        "TINYINT",
        "UBIGINT",
        "UINTEGER",
        "USMALLINT",
        "UTINYINT",
    }
    if dtype.upper() in numeric_types:
        numeric_sql = f"""
            SELECT
                MIN({column}) as min_val,
                MAX({column}) as max_val,
                AVG({column}) as mean_val,
                STDDEV({column}) as std_val
            FROM "{table}"
        """
        try:
            numeric_result = self.storage.execute_rows(numeric_sql)
            if numeric_result.rows:
                row = numeric_result.rows[0]
                min_val = float(row[0]) if row[0] is not None else None
                max_val = float(row[1]) if row[1] is not None else None
                mean_val = float(row[2]) if row[2] is not None else None
                std_val = float(row[3]) if row[3] is not None else None
        except Exception:
            # Not numeric, skip
            pass

    return ColumnStatsResult(
        table=table,
        column=column,
        dtype=dtype,
        count=int(count),
        null_count=int(null_count),
        null_pct=round(null_pct, 2),
        unique_count=int(unique_count),
        unique_pct=round(unique_pct, 2),
        top_values=top_values,
        min=min_val,
        max=max_val,
        mean=mean_val,
        std=std_val,
    )
````

Copy markdown

# Auth Module

The auth module provides credential management and configuration.

Explore on DeepWiki

🤖 **[Configuration Reference →](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.3-configuration-reference)**

Ask questions about credential management, ConfigManager, or account configuration.

## Overview

```
from mixpanel_data.auth import ConfigManager, Credentials, AccountInfo

# Manage accounts
config = ConfigManager()
config.add_account("production", username="...", secret="...", project_id="...", region="us")
accounts = config.list_accounts()

# Resolve credentials
creds = config.resolve_credentials(account="production")
```

## ConfigManager

Manages accounts stored in the TOML config file (`~/.mp/config.toml`).

## mixpanel_data.auth.ConfigManager

```
ConfigManager(config_path: Path | None = None)
```

Manages Mixpanel project credentials and configuration.

Handles:

- Adding, removing, and listing project accounts
- Setting the default account
- Resolving credentials from environment variables or config file

Config file location (in priority order):

1. Explicit config_path parameter
1. MP_CONFIG_PATH environment variable
1. Default: ~/.mp/config.toml

Initialize ConfigManager.

| PARAMETER     | DESCRIPTION                                                                |
| ------------- | -------------------------------------------------------------------------- |
| `config_path` | Override config file location. Default: ~/.mp/config.toml **TYPE:** \`Path |

Source code in `src/mixpanel_data/_internal/config.py`

```
def __init__(self, config_path: Path | None = None) -> None:
    """Initialize ConfigManager.

    Args:
        config_path: Override config file location.
                     Default: ~/.mp/config.toml
    """
    if config_path is not None:
        self._config_path = config_path
    elif "MP_CONFIG_PATH" in os.environ:
        self._config_path = Path(os.environ["MP_CONFIG_PATH"])
    else:
        self._config_path = self.DEFAULT_CONFIG_PATH
```

### config_path

```
config_path: Path
```

Return the config file path.

### resolve_credentials

```
resolve_credentials(account: str | None = None) -> Credentials
```

Resolve credentials using priority order.

Resolution order:

1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION)
1. Named account from config file (if account parameter provided)
1. Default account from config file

| PARAMETER | DESCRIPTION                                                      |
| --------- | ---------------------------------------------------------------- |
| `account` | Optional account name to use instead of default. **TYPE:** \`str |

| RETURNS       | DESCRIPTION                   |
| ------------- | ----------------------------- |
| `Credentials` | Immutable Credentials object. |

| RAISES                 | DESCRIPTION                        |
| ---------------------- | ---------------------------------- |
| `ConfigError`          | If no credentials can be resolved. |
| `AccountNotFoundError` | If named account doesn't exist.    |

Source code in `src/mixpanel_data/_internal/config.py`

```
def resolve_credentials(self, account: str | None = None) -> Credentials:
    """Resolve credentials using priority order.

    Resolution order:
    1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION)
    2. Named account from config file (if account parameter provided)
    3. Default account from config file

    Args:
        account: Optional account name to use instead of default.

    Returns:
        Immutable Credentials object.

    Raises:
        ConfigError: If no credentials can be resolved.
        AccountNotFoundError: If named account doesn't exist.
    """
    # Priority 1: Environment variables
    env_creds = self._resolve_from_env()
    if env_creds is not None:
        return env_creds

    # Priority 2 & 3: Config file (named account or default)
    config = self._read_config()
    accounts = config.get("accounts", {})

    if not accounts:
        raise ConfigError(
            "No credentials configured. "
            "Set MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION environment variables, "
            "or add an account with add_account()."
        )

    # Determine which account to use
    account_name: str
    if account is not None:
        account_name = account
    else:
        default_account = config.get("default")
        if default_account is not None and isinstance(default_account, str):
            account_name = default_account
        else:
            # Use the first account if no default set
            account_name = next(iter(accounts.keys()))

    if account_name not in accounts:
        raise AccountNotFoundError(
            account_name,
            available_accounts=list(accounts.keys()),
        )

    account_data = accounts[account_name]
    return Credentials(
        username=account_data["username"],
        secret=SecretStr(account_data["secret"]),
        project_id=account_data["project_id"],
        region=account_data["region"],
    )
```

### list_accounts

```
list_accounts() -> list[AccountInfo]
```

List all configured accounts.

| RETURNS             | DESCRIPTION                                         |
| ------------------- | --------------------------------------------------- |
| `list[AccountInfo]` | List of AccountInfo objects (secrets not included). |

Source code in `src/mixpanel_data/_internal/config.py`

```
def list_accounts(self) -> list[AccountInfo]:
    """List all configured accounts.

    Returns:
        List of AccountInfo objects (secrets not included).
    """
    config = self._read_config()
    accounts = config.get("accounts", {})
    default_name = config.get("default")

    result: list[AccountInfo] = []
    for name, data in accounts.items():
        result.append(
            AccountInfo(
                name=name,
                username=data.get("username", ""),
                project_id=data.get("project_id", ""),
                region=data.get("region", ""),
                is_default=(name == default_name),
            )
        )

    return result
```

### add_account

```
add_account(
    name: str, username: str, secret: str, project_id: str, region: str
) -> None
```

Add a new account configuration.

| PARAMETER    | DESCRIPTION                                         |
| ------------ | --------------------------------------------------- |
| `name`       | Display name for the account. **TYPE:** `str`       |
| `username`   | Service account username. **TYPE:** `str`           |
| `secret`     | Service account secret. **TYPE:** `str`             |
| `project_id` | Mixpanel project ID. **TYPE:** `str`                |
| `region`     | Data residency region (us, eu, in). **TYPE:** `str` |

| RAISES               | DESCRIPTION                     |
| -------------------- | ------------------------------- |
| `AccountExistsError` | If account name already exists. |
| `ValueError`         | If region is invalid.           |

Source code in `src/mixpanel_data/_internal/config.py`

```
def add_account(
    self,
    name: str,
    username: str,
    secret: str,
    project_id: str,
    region: str,
) -> None:
    """Add a new account configuration.

    Args:
        name: Display name for the account.
        username: Service account username.
        secret: Service account secret.
        project_id: Mixpanel project ID.
        region: Data residency region (us, eu, in).

    Raises:
        AccountExistsError: If account name already exists.
        ValueError: If region is invalid.
    """
    # Validate region
    region_lower = region.lower()
    if region_lower not in VALID_REGIONS:
        valid = ", ".join(VALID_REGIONS)
        raise ValueError(f"Region must be one of: {valid}. Got: {region}")

    config = self._read_config()
    accounts = config.setdefault("accounts", {})

    if name in accounts:
        raise AccountExistsError(name)

    accounts[name] = {
        "username": username,
        "secret": secret,
        "project_id": project_id,
        "region": region_lower,
    }

    # If this is the first account, make it the default
    if "default" not in config:
        config["default"] = name

    self._write_config(config)
```

### remove_account

```
remove_account(name: str) -> None
```

Remove an account configuration.

| PARAMETER | DESCRIPTION                             |
| --------- | --------------------------------------- |
| `name`    | Account name to remove. **TYPE:** `str` |

| RAISES                 | DESCRIPTION               |
| ---------------------- | ------------------------- |
| `AccountNotFoundError` | If account doesn't exist. |

Source code in `src/mixpanel_data/_internal/config.py`

```
def remove_account(self, name: str) -> None:
    """Remove an account configuration.

    Args:
        name: Account name to remove.

    Raises:
        AccountNotFoundError: If account doesn't exist.
    """
    config = self._read_config()
    accounts = config.get("accounts", {})

    if name not in accounts:
        raise AccountNotFoundError(name, available_accounts=list(accounts.keys()))

    del accounts[name]

    # If we removed the default, clear it or set to another account
    if config.get("default") == name:
        if accounts:
            config["default"] = next(iter(accounts.keys()))
        else:
            config.pop("default", None)

    self._write_config(config)
```

### set_default

```
set_default(name: str) -> None
```

Set the default account.

| PARAMETER | DESCRIPTION                                     |
| --------- | ----------------------------------------------- |
| `name`    | Account name to set as default. **TYPE:** `str` |

| RAISES                 | DESCRIPTION               |
| ---------------------- | ------------------------- |
| `AccountNotFoundError` | If account doesn't exist. |

Source code in `src/mixpanel_data/_internal/config.py`

```
def set_default(self, name: str) -> None:
    """Set the default account.

    Args:
        name: Account name to set as default.

    Raises:
        AccountNotFoundError: If account doesn't exist.
    """
    config = self._read_config()
    accounts = config.get("accounts", {})

    if name not in accounts:
        raise AccountNotFoundError(name, available_accounts=list(accounts.keys()))

    config["default"] = name
    self._write_config(config)
```

### get_account

```
get_account(name: str) -> AccountInfo
```

Get information about a specific account.

| PARAMETER | DESCRIPTION                   |
| --------- | ----------------------------- |
| `name`    | Account name. **TYPE:** `str` |

| RETURNS       | DESCRIPTION                               |
| ------------- | ----------------------------------------- |
| `AccountInfo` | AccountInfo object (secret not included). |

| RAISES                 | DESCRIPTION               |
| ---------------------- | ------------------------- |
| `AccountNotFoundError` | If account doesn't exist. |

Source code in `src/mixpanel_data/_internal/config.py`

```
def get_account(self, name: str) -> AccountInfo:
    """Get information about a specific account.

    Args:
        name: Account name.

    Returns:
        AccountInfo object (secret not included).

    Raises:
        AccountNotFoundError: If account doesn't exist.
    """
    config = self._read_config()
    accounts = config.get("accounts", {})

    if name not in accounts:
        raise AccountNotFoundError(name, available_accounts=list(accounts.keys()))

    data = accounts[name]
    default_name = config.get("default")

    return AccountInfo(
        name=name,
        username=data.get("username", ""),
        project_id=data.get("project_id", ""),
        region=data.get("region", ""),
        is_default=(name == default_name),
    )
```

## Credentials

Immutable container for authentication credentials.

## mixpanel_data.auth.Credentials

Bases: `BaseModel`

Immutable credentials for Mixpanel API authentication.

This is a frozen Pydantic model that ensures:

- All fields are validated on construction
- The secret is never exposed in repr/str output
- The object cannot be modified after creation

### username

```
username: str
```

Service account username.

### secret

```
secret: SecretStr
```

Service account secret (redacted in output).

### project_id

```
project_id: str
```

Mixpanel project identifier.

### region

```
region: RegionType
```

Data residency region (us, eu, or in).

### validate_region

```
validate_region(v: str) -> str
```

Validate and normalize region to lowercase.

Source code in `src/mixpanel_data/_internal/config.py`

```
@field_validator("region", mode="before")
@classmethod
def validate_region(cls, v: str) -> str:
    """Validate and normalize region to lowercase."""
    if not isinstance(v, str):
        raise ValueError(f"Region must be a string. Got: {type(v).__name__}")
    v_lower = v.lower()
    if v_lower not in VALID_REGIONS:
        valid = ", ".join(VALID_REGIONS)
        raise ValueError(f"Region must be one of: {valid}. Got: {v}")
    return v_lower
```

### validate_non_empty

```
validate_non_empty(v: str) -> str
```

Validate string fields are non-empty.

Source code in `src/mixpanel_data/_internal/config.py`

```
@field_validator("username", "project_id")
@classmethod
def validate_non_empty(cls, v: str) -> str:
    """Validate string fields are non-empty."""
    if not v or not v.strip():
        raise ValueError("Field cannot be empty")
    return v
```

### __repr__

```
__repr__() -> str
```

Return string representation with redacted secret.

Source code in `src/mixpanel_data/_internal/config.py`

```
def __repr__(self) -> str:
    """Return string representation with redacted secret."""
    return (
        f"Credentials(username={self.username!r}, secret=***, "
        f"project_id={self.project_id!r}, region={self.region!r})"
    )
```

### __str__

```
__str__() -> str
```

Return string representation with redacted secret.

Source code in `src/mixpanel_data/_internal/config.py`

```
def __str__(self) -> str:
    """Return string representation with redacted secret."""
    return self.__repr__()
```

## AccountInfo

Account metadata (without the secret).

## mixpanel_data.auth.AccountInfo

```
AccountInfo(
    name: str, username: str, project_id: str, region: str, is_default: bool
)
```

Information about a configured account (without secret).

Used for listing accounts without exposing sensitive credentials.

### name

```
name: str
```

Account display name.

### username

```
username: str
```

Service account username.

### project_id

```
project_id: str
```

Mixpanel project identifier.

### region

```
region: str
```

Data residency region.

### is_default

```
is_default: bool
```

Whether this is the default account.

Copy markdown

# Exceptions

All library exceptions inherit from `MixpanelDataError`, enabling callers to catch all library errors with a single except clause.

Explore on DeepWiki

🤖 **[Error Handling Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.4-error-codes-and-exceptions)**

Ask questions about specific exceptions, error recovery patterns, or debugging strategies.

## Exception Hierarchy

```
MixpanelDataError
├── ConfigError
│   ├── AccountNotFoundError
│   └── AccountExistsError
├── APIError
│   ├── AuthenticationError
│   ├── RateLimitError
│   ├── QueryError
│   ├── ServerError
│   └── JQLSyntaxError
├── TableExistsError
├── TableNotFoundError
├── DatabaseLockedError
└── DatabaseNotFoundError
```

## Catching Errors

```
import mixpanel_data as mp

try:
    ws = mp.Workspace()
    result = ws.segmentation(event="Purchase", from_date="2025-01-01", to_date="2025-01-31")
except mp.AuthenticationError as e:
    print(f"Auth failed: {e.message}")
except mp.RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except mp.MixpanelDataError as e:
    print(f"Error [{e.code}]: {e.message}")
```

## Base Exception

## mixpanel_data.MixpanelDataError

```
MixpanelDataError(
    message: str,
    code: str = "UNKNOWN_ERROR",
    details: dict[str, Any] | None = None,
)
```

Bases: `Exception`

Base exception for all mixpanel_data errors.

All library exceptions inherit from this class, allowing callers to:

- Catch all library errors: except MixpanelDataError
- Handle specific errors: except AccountNotFoundError
- Serialize errors: error.to_dict()

Initialize exception.

| PARAMETER | DESCRIPTION                                                                                           |
| --------- | ----------------------------------------------------------------------------------------------------- |
| `message` | Human-readable error message. **TYPE:** `str`                                                         |
| `code`    | Machine-readable error code for programmatic handling. **TYPE:** `str` **DEFAULT:** `'UNKNOWN_ERROR'` |
| `details` | Additional structured data about the error. **TYPE:** \`dict[str, Any]                                |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str,
    code: str = "UNKNOWN_ERROR",
    details: dict[str, Any] | None = None,
) -> None:
    """Initialize exception.

    Args:
        message: Human-readable error message.
        code: Machine-readable error code for programmatic handling.
        details: Additional structured data about the error.
    """
    super().__init__(message)
    self._message = message
    self._code = code
    self._details = details or {}
```

### code

```
code: str
```

Machine-readable error code.

### message

```
message: str
```

Human-readable error message.

### details

```
details: dict[str, Any]
```

Additional structured error data.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize exception for logging/JSON output.

| RETURNS          | DESCRIPTION                                   |
| ---------------- | --------------------------------------------- |
| `dict[str, Any]` | Dictionary with keys: code, message, details. |
| `dict[str, Any]` | All values are JSON-serializable.             |

Source code in `src/mixpanel_data/exceptions.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize exception for logging/JSON output.

    Returns:
        Dictionary with keys: code, message, details.
        All values are JSON-serializable.
    """
    return {
        "code": self._code,
        "message": self._message,
        "details": self._details,
    }
```

### __str__

```
__str__() -> str
```

Return human-readable error message.

Source code in `src/mixpanel_data/exceptions.py`

```
def __str__(self) -> str:
    """Return human-readable error message."""
    return self._message
```

### __repr__

```
__repr__() -> str
```

Return detailed string representation.

Source code in `src/mixpanel_data/exceptions.py`

```
def __repr__(self) -> str:
    """Return detailed string representation."""
    return (
        f"{self.__class__.__name__}(message={self._message!r}, code={self._code!r})"
    )
```

## API Exceptions

## mixpanel_data.APIError

```
APIError(
    message: str,
    *,
    status_code: int,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
    code: str = "API_ERROR",
)
```

Bases: `MixpanelDataError`

Base class for Mixpanel API HTTP errors.

Provides structured access to HTTP request/response context for debugging and automated recovery by AI agents. All API-related exceptions inherit from this class, enabling agents to:

- Understand what went wrong (status code, error message)
- See exactly what was sent (request method, URL, params, body)
- See exactly what came back (response body, headers)
- Modify their approach and retry autonomously

Example

```
try:
    result = client.segmentation(event="signup", ...)
except APIError as e:
    print(f"Status: {e.status_code}")
    print(f"Response: {e.response_body}")
    print(f"Request URL: {e.request_url}")
    print(f"Request params: {e.request_params}")
```

Initialize APIError.

| PARAMETER        | DESCRIPTION                                                             |
| ---------------- | ----------------------------------------------------------------------- |
| `message`        | Human-readable error message. **TYPE:** `str`                           |
| `status_code`    | HTTP status code from response. **TYPE:** `int`                         |
| `response_body`  | Raw response body (string or parsed dict). **TYPE:** \`str              |
| `request_method` | HTTP method used (GET, POST). **TYPE:** \`str                           |
| `request_url`    | Full request URL. **TYPE:** \`str                                       |
| `request_params` | Query parameters sent. **TYPE:** \`dict[str, Any]                       |
| `request_body`   | Request body sent (for POST requests). **TYPE:** \`dict[str, Any]       |
| `code`           | Machine-readable error code. **TYPE:** `str` **DEFAULT:** `'API_ERROR'` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str,
    *,
    status_code: int,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
    code: str = "API_ERROR",
) -> None:
    """Initialize APIError.

    Args:
        message: Human-readable error message.
        status_code: HTTP status code from response.
        response_body: Raw response body (string or parsed dict).
        request_method: HTTP method used (GET, POST).
        request_url: Full request URL.
        request_params: Query parameters sent.
        request_body: Request body sent (for POST requests).
        code: Machine-readable error code.
    """
    self._status_code = status_code
    self._response_body = response_body
    self._request_method = request_method
    self._request_url = request_url
    self._request_params = request_params
    self._request_body = request_body

    details: dict[str, Any] = {
        "status_code": status_code,
    }
    if response_body is not None:
        details["response_body"] = response_body
    if request_method is not None:
        details["request_method"] = request_method
    if request_url is not None:
        details["request_url"] = request_url
    if request_params is not None:
        details["request_params"] = request_params
    if request_body is not None:
        details["request_body"] = request_body

    super().__init__(message, code=code, details=details)
```

### status_code

```
status_code: int
```

HTTP status code from response.

### response_body

```
response_body: str | dict[str, Any] | None
```

Raw response body (string or parsed dict).

### request_method

```
request_method: str | None
```

HTTP method used (GET, POST).

### request_url

```
request_url: str | None
```

Full request URL.

### request_params

```
request_params: dict[str, Any] | None
```

Query parameters sent.

### request_body

```
request_body: dict[str, Any] | None
```

Request body sent (for POST requests).

## mixpanel_data.AuthenticationError

```
AuthenticationError(
    message: str = "Authentication failed",
    *,
    status_code: int = 401,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
)
```

Bases: `APIError`

Authentication with Mixpanel API failed (HTTP 401).

Raised when credentials are invalid, expired, or lack required permissions. Inherits from APIError to provide full request/response context.

Example

```
try:
    client.segmentation(...)
except AuthenticationError as e:
    print(f"Auth failed: {e.message}")
    print(f"Request URL: {e.request_url}")
    # Check if project_id is correct, credentials are valid, etc.
```

Initialize AuthenticationError.

| PARAMETER        | DESCRIPTION                                                                          |
| ---------------- | ------------------------------------------------------------------------------------ |
| `message`        | Human-readable error message. **TYPE:** `str` **DEFAULT:** `'Authentication failed'` |
| `status_code`    | HTTP status code (default 401). **TYPE:** `int` **DEFAULT:** `401`                   |
| `response_body`  | Raw response body. **TYPE:** \`str                                                   |
| `request_method` | HTTP method used. **TYPE:** \`str                                                    |
| `request_url`    | Full request URL. **TYPE:** \`str                                                    |
| `request_params` | Query parameters sent. **TYPE:** \`dict[str, Any]                                    |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str = "Authentication failed",
    *,
    status_code: int = 401,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
) -> None:
    """Initialize AuthenticationError.

    Args:
        message: Human-readable error message.
        status_code: HTTP status code (default 401).
        response_body: Raw response body.
        request_method: HTTP method used.
        request_url: Full request URL.
        request_params: Query parameters sent.
    """
    super().__init__(
        message,
        status_code=status_code,
        response_body=response_body,
        request_method=request_method,
        request_url=request_url,
        request_params=request_params,
        code="AUTH_FAILED",
    )
```

## mixpanel_data.RateLimitError

```
RateLimitError(
    message: str = "Rate limit exceeded",
    *,
    retry_after: int | None = None,
    status_code: int = 429,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
)
```

Bases: `APIError`

Mixpanel API rate limit exceeded (HTTP 429).

Raised when the API returns a 429 status. The retry_after property indicates when the request can be retried. Inherits from APIError to provide full request context for debugging.

Example

```
try:
    for _ in range(1000):
        client.segmentation(...)
except RateLimitError as e:
    print(f"Rate limited! Retry after {e.retry_after}s")
    print(f"Request: {e.request_method} {e.request_url}")
    time.sleep(e.retry_after or 60)
```

Initialize RateLimitError.

| PARAMETER        | DESCRIPTION                                                                        |
| ---------------- | ---------------------------------------------------------------------------------- |
| `message`        | Human-readable error message. **TYPE:** `str` **DEFAULT:** `'Rate limit exceeded'` |
| `retry_after`    | Seconds until retry is allowed (from Retry-After header). **TYPE:** \`int          |
| `status_code`    | HTTP status code (default 429). **TYPE:** `int` **DEFAULT:** `429`                 |
| `response_body`  | Raw response body. **TYPE:** \`str                                                 |
| `request_method` | HTTP method used. **TYPE:** \`str                                                  |
| `request_url`    | Full request URL. **TYPE:** \`str                                                  |
| `request_params` | Query parameters sent. **TYPE:** \`dict[str, Any]                                  |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str = "Rate limit exceeded",
    *,
    retry_after: int | None = None,
    status_code: int = 429,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
) -> None:
    """Initialize RateLimitError.

    Args:
        message: Human-readable error message.
        retry_after: Seconds until retry is allowed (from Retry-After header).
        status_code: HTTP status code (default 429).
        response_body: Raw response body.
        request_method: HTTP method used.
        request_url: Full request URL.
        request_params: Query parameters sent.
    """
    self._retry_after = retry_after
    if retry_after is not None:
        message = f"{message}. Retry after {retry_after} seconds."

    super().__init__(
        message,
        status_code=status_code,
        response_body=response_body,
        request_method=request_method,
        request_url=request_url,
        request_params=request_params,
        code="RATE_LIMITED",
    )
    # Add retry_after to details
    if retry_after is not None:
        self._details["retry_after"] = retry_after
```

### retry_after

```
retry_after: int | None
```

Seconds until retry is allowed, or None if unknown.

## mixpanel_data.QueryError

```
QueryError(
    message: str = "Query execution failed",
    *,
    status_code: int = 400,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
)
```

Bases: `APIError`

Query execution failed (HTTP 400 or query-specific error).

Raised when an API query fails due to invalid parameters, syntax errors, or other query-specific issues. Inherits from APIError to provide full request/response context for debugging.

Example

```
try:
    client.segmentation(event="nonexistent", ...)
except QueryError as e:
    print(f"Query failed: {e.message}")
    print(f"Response: {e.response_body}")
    print(f"Request params: {e.request_params}")
```

Initialize QueryError.

| PARAMETER        | DESCRIPTION                                                                           |
| ---------------- | ------------------------------------------------------------------------------------- |
| `message`        | Human-readable error message. **TYPE:** `str` **DEFAULT:** `'Query execution failed'` |
| `status_code`    | HTTP status code (default 400). **TYPE:** `int` **DEFAULT:** `400`                    |
| `response_body`  | Raw response body with error details. **TYPE:** \`str                                 |
| `request_method` | HTTP method used. **TYPE:** \`str                                                     |
| `request_url`    | Full request URL. **TYPE:** \`str                                                     |
| `request_params` | Query parameters sent. **TYPE:** \`dict[str, Any]                                     |
| `request_body`   | Request body sent (for POST). **TYPE:** \`dict[str, Any]                              |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str = "Query execution failed",
    *,
    status_code: int = 400,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
) -> None:
    """Initialize QueryError.

    Args:
        message: Human-readable error message.
        status_code: HTTP status code (default 400).
        response_body: Raw response body with error details.
        request_method: HTTP method used.
        request_url: Full request URL.
        request_params: Query parameters sent.
        request_body: Request body sent (for POST).
    """
    super().__init__(
        message,
        status_code=status_code,
        response_body=response_body,
        request_method=request_method,
        request_url=request_url,
        request_params=request_params,
        request_body=request_body,
        code="QUERY_FAILED",
    )
```

## mixpanel_data.ServerError

```
ServerError(
    message: str = "Server error",
    *,
    status_code: int = 500,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
)
```

Bases: `APIError`

Mixpanel server error (HTTP 5xx).

Raised when the Mixpanel API returns a server error. These are typically transient issues that may succeed on retry. The response_body property contains the full error details from Mixpanel, which often include actionable information (e.g., "unit and interval both specified").

Example

```
try:
    client.retention(born_event="signup", ...)
except ServerError as e:
    print(f"Server error {e.status_code}: {e.message}")
    print(f"Response: {e.response_body}")
    print(f"Request params: {e.request_params}")
    # AI agent can analyze response_body to fix the request
```

Initialize ServerError.

| PARAMETER        | DESCRIPTION                                                                 |
| ---------------- | --------------------------------------------------------------------------- |
| `message`        | Human-readable error message. **TYPE:** `str` **DEFAULT:** `'Server error'` |
| `status_code`    | HTTP status code (5xx). **TYPE:** `int` **DEFAULT:** `500`                  |
| `response_body`  | Raw response body with error details. **TYPE:** \`str                       |
| `request_method` | HTTP method used. **TYPE:** \`str                                           |
| `request_url`    | Full request URL. **TYPE:** \`str                                           |
| `request_params` | Query parameters sent. **TYPE:** \`dict[str, Any]                           |
| `request_body`   | Request body sent (for POST). **TYPE:** \`dict[str, Any]                    |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str = "Server error",
    *,
    status_code: int = 500,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
) -> None:
    """Initialize ServerError.

    Args:
        message: Human-readable error message.
        status_code: HTTP status code (5xx).
        response_body: Raw response body with error details.
        request_method: HTTP method used.
        request_url: Full request URL.
        request_params: Query parameters sent.
        request_body: Request body sent (for POST).
    """
    super().__init__(
        message,
        status_code=status_code,
        response_body=response_body,
        request_method=request_method,
        request_url=request_url,
        request_params=request_params,
        request_body=request_body,
        code="SERVER_ERROR",
    )
```

## mixpanel_data.JQLSyntaxError

```
JQLSyntaxError(
    raw_error: str, script: str | None = None, request_path: str | None = None
)
```

Bases: `QueryError`

JQL script execution failed with syntax or runtime error (HTTP 412).

Raised when a JQL script fails to execute due to syntax errors, type errors, or other JavaScript runtime issues. Provides structured access to error details from Mixpanel's response.

Inherits from QueryError (and thus APIError) to provide full HTTP context.

Example

```
try:
    result = live_query.jql(script)
except JQLSyntaxError as e:
    print(f"Error: {e.error_type}: {e.error_message}")
    print(f"Script: {e.script}")
    print(f"Line info: {e.line_info}")
    # AI agent can use this to fix the script and retry
```

Initialize JQLSyntaxError.

| PARAMETER      | DESCRIPTION                                                  |
| -------------- | ------------------------------------------------------------ |
| `raw_error`    | Raw error string from Mixpanel API response. **TYPE:** `str` |
| `script`       | The JQL script that caused the error. **TYPE:** \`str        |
| `request_path` | API request path from error response. **TYPE:** \`str        |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    raw_error: str,
    script: str | None = None,
    request_path: str | None = None,
) -> None:
    """Initialize JQLSyntaxError.

    Args:
        raw_error: Raw error string from Mixpanel API response.
        script: The JQL script that caused the error.
        request_path: API request path from error response.
    """
    # Parse structured error info from raw error string
    self._error_type = self._extract_error_type(raw_error)
    self._error_message = self._extract_message(raw_error)
    self._line_info = self._extract_line_info(raw_error)
    self._stack_trace = self._extract_stack_trace(raw_error)
    self._script = script
    self._raw_error = raw_error
    self._request_path = request_path

    # Build human-readable message
    message = f"JQL {self._error_type}: {self._error_message}"
    if self._line_info:
        message += f"\n{self._line_info}"

    # Build response body dict for APIError
    response_body: dict[str, Any] = {
        "error": raw_error,
    }
    if request_path:
        response_body["request"] = request_path

    super().__init__(
        message,
        status_code=412,
        response_body=response_body,
        request_body={"script": script} if script else None,
    )
    self._code = "JQL_SYNTAX_ERROR"

    # Add JQL-specific details
    self._details["error_type"] = self._error_type
    self._details["error_message"] = self._error_message
    self._details["line_info"] = self._line_info
    self._details["stack_trace"] = self._stack_trace
    self._details["script"] = script
    self._details["request_path"] = request_path
    self._details["raw_error"] = raw_error
```

### error_type

```
error_type: str
```

JavaScript error type (TypeError, SyntaxError, ReferenceError, etc.).

### error_message

```
error_message: str
```

Error message describing what went wrong.

### line_info

```
line_info: str | None
```

Code snippet with caret showing error location, if available.

### stack_trace

```
stack_trace: str | None
```

JavaScript stack trace, if available.

### script

```
script: str | None
```

The JQL script that caused the error.

### raw_error

```
raw_error: str
```

Complete raw error string from Mixpanel.

## Configuration Exceptions

## mixpanel_data.ConfigError

```
ConfigError(message: str, details: dict[str, Any] | None = None)
```

Bases: `MixpanelDataError`

Base for configuration-related errors.

Raised when there's a problem with configuration files, environment variables, or credential resolution.

Initialize ConfigError.

| PARAMETER | DESCRIPTION                                            |
| --------- | ------------------------------------------------------ |
| `message` | Human-readable error message. **TYPE:** `str`          |
| `details` | Additional structured data. **TYPE:** \`dict[str, Any] |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str,
    details: dict[str, Any] | None = None,
) -> None:
    """Initialize ConfigError.

    Args:
        message: Human-readable error message.
        details: Additional structured data.
    """
    super().__init__(message, code="CONFIG_ERROR", details=details)
```

## mixpanel_data.AccountNotFoundError

```
AccountNotFoundError(
    account_name: str, available_accounts: list[str] | None = None
)
```

Bases: `ConfigError`

Named account does not exist in configuration.

Raised when attempting to access an account that hasn't been configured. The available_accounts property lists valid account names to help users.

Initialize AccountNotFoundError.

| PARAMETER            | DESCRIPTION                                                        |
| -------------------- | ------------------------------------------------------------------ |
| `account_name`       | The requested account name that wasn't found. **TYPE:** `str`      |
| `available_accounts` | List of valid account names for suggestions. **TYPE:** \`list[str] |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    account_name: str,
    available_accounts: list[str] | None = None,
) -> None:
    """Initialize AccountNotFoundError.

    Args:
        account_name: The requested account name that wasn't found.
        available_accounts: List of valid account names for suggestions.
    """
    available = available_accounts or []
    if available:
        available_str = ", ".join(f"'{a}'" for a in available)
        message = (
            f"Account '{account_name}' not found. "
            f"Available accounts: {available_str}"
        )
    else:
        message = f"Account '{account_name}' not found. No accounts configured."

    details = {
        "account_name": account_name,
        "available_accounts": available,
    }
    super().__init__(message, details=details)
    self._code = "ACCOUNT_NOT_FOUND"
```

### account_name

```
account_name: str
```

The requested account name that wasn't found.

### available_accounts

```
available_accounts: list[str]
```

List of valid account names.

## mixpanel_data.AccountExistsError

```
AccountExistsError(account_name: str)
```

Bases: `ConfigError`

Account name already exists in configuration.

Raised when attempting to add an account with a name that's already in use.

Initialize AccountExistsError.

| PARAMETER      | DESCRIPTION                                   |
| -------------- | --------------------------------------------- |
| `account_name` | The conflicting account name. **TYPE:** `str` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(self, account_name: str) -> None:
    """Initialize AccountExistsError.

    Args:
        account_name: The conflicting account name.
    """
    message = f"Account '{account_name}' already exists."
    details = {"account_name": account_name}
    super().__init__(message, details=details)
    self._code = "ACCOUNT_EXISTS"
```

### account_name

```
account_name: str
```

The conflicting account name.

## Storage Exceptions

Storage exceptions are raised during fetch and table operations:

| Exception               | Raised When                                                        |
| ----------------------- | ------------------------------------------------------------------ |
| `TableExistsError`      | Fetching to an existing table without `append=True` or `--replace` |
| `TableNotFoundError`    | Using `append=True` on a non-existent table                        |
| `DatabaseLockedError`   | Another process has the database locked                            |
| `DatabaseNotFoundError` | Database file not found in read-only mode                          |

## mixpanel_data.TableExistsError

```
TableExistsError(table_name: str)
```

Bases: `MixpanelDataError`

Table already exists in local database.

Raised when attempting to create a table that already exists. Use drop() first to remove the existing table.

Initialize TableExistsError.

| PARAMETER    | DESCRIPTION                                 |
| ------------ | ------------------------------------------- |
| `table_name` | Name of the existing table. **TYPE:** `str` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(self, table_name: str) -> None:
    """Initialize TableExistsError.

    Args:
        table_name: Name of the existing table.
    """
    message = f"Table '{table_name}' already exists."
    details = {
        "table_name": table_name,
        "suggestion": "Use drop() first to remove the existing table.",
    }
    super().__init__(message, code="TABLE_EXISTS", details=details)
```

### table_name

```
table_name: str
```

Name of the existing table.

## mixpanel_data.TableNotFoundError

```
TableNotFoundError(table_name: str)
```

Bases: `MixpanelDataError`

Table does not exist in local database.

Raised when attempting to access a table that hasn't been created.

Initialize TableNotFoundError.

| PARAMETER    | DESCRIPTION                                |
| ------------ | ------------------------------------------ |
| `table_name` | Name of the missing table. **TYPE:** `str` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(self, table_name: str) -> None:
    """Initialize TableNotFoundError.

    Args:
        table_name: Name of the missing table.
    """
    message = f"Table '{table_name}' not found."
    details = {"table_name": table_name}
    super().__init__(message, code="TABLE_NOT_FOUND", details=details)
```

### table_name

```
table_name: str
```

Name of the missing table.

## mixpanel_data.DatabaseLockedError

```
DatabaseLockedError(db_path: str, holding_pid: int | None = None)
```

Bases: `MixpanelDataError`

Database is locked by another process.

Raised when attempting to access a DuckDB database that is locked by another process. DuckDB uses single-writer, multiple-reader concurrency - only one process can have write access at a time.

Example

```
try:
    ws = Workspace()
except DatabaseLockedError as e:
    print(f"Database {e.db_path} is locked")
    if e.holding_pid:
        print(f"Held by PID {e.holding_pid}")
```

Initialize DatabaseLockedError.

| PARAMETER     | DESCRIPTION                                                |
| ------------- | ---------------------------------------------------------- |
| `db_path`     | Path to the locked database file. **TYPE:** `str`          |
| `holding_pid` | Process ID holding the lock, if available. **TYPE:** \`int |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    db_path: str,
    holding_pid: int | None = None,
) -> None:
    """Initialize DatabaseLockedError.

    Args:
        db_path: Path to the locked database file.
        holding_pid: Process ID holding the lock, if available.
    """
    message = f"Database '{db_path}' is locked by another process"
    if holding_pid is not None:
        message += f" (PID {holding_pid})"
    message += ". Wait for the other operation to complete and try again."

    details: dict[str, str | int] = {
        "db_path": db_path,
        "suggestion": "Wait for the other operation to complete and try again.",
    }
    if holding_pid is not None:
        details["holding_pid"] = holding_pid

    super().__init__(message, code="DATABASE_LOCKED", details=details)
```

### db_path

```
db_path: str
```

Path to the locked database.

### holding_pid

```
holding_pid: int | None
```

Process ID holding the lock, if available.

## mixpanel_data.DatabaseNotFoundError

```
DatabaseNotFoundError(db_path: str)
```

Bases: `MixpanelDataError`

Database file does not exist.

Raised when attempting to open a non-existent database file in read-only mode. DuckDB cannot create a new database file when opened read-only.

This typically happens when running read-only commands (like `mp query` or `mp inspect tables`) before any data has been fetched.

Example

```
try:
    ws = Workspace(read_only=True)
except DatabaseNotFoundError as e:
    print(f"No data yet: {e.db_path}")
    print("Run 'mp fetch events' first to create the database.")
```

Initialize DatabaseNotFoundError.

| PARAMETER | DESCRIPTION                                                   |
| --------- | ------------------------------------------------------------- |
| `db_path` | Path to the database file that doesn't exist. **TYPE:** `str` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(self, db_path: str) -> None:
    """Initialize DatabaseNotFoundError.

    Args:
        db_path: Path to the database file that doesn't exist.
    """
    message = (
        f"Database '{db_path}' does not exist. "
        "Run 'mp fetch events' first to create it."
    )

    details: dict[str, str] = {
        "db_path": db_path,
        "suggestion": "Run 'mp fetch events' or 'mp fetch profiles' to create the database.",
    }

    super().__init__(message, code="DATABASE_NOT_FOUND", details=details)
```

### db_path

```
db_path: str
```

Path to the database file that doesn't exist.

Copy markdown

# Result Types

Explore on DeepWiki

🤖 **[Result Types Reference →](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.5-result-type-reference)**

Ask questions about result structures, DataFrame conversion, or type usage patterns.

All result types are immutable frozen dataclasses with:

- Lazy DataFrame conversion via the `.df` property
- JSON serialization via the `.to_dict()` method
- Full type hints for IDE/mypy support

## Fetch Results

## mixpanel_data.FetchResult

```
FetchResult(
    table: str,
    rows: int,
    type: Literal["events", "profiles"],
    duration_seconds: float,
    date_range: tuple[str, str] | None,
    fetched_at: datetime,
    _data: list[dict[str, Any]] = list(),
    _df_cache: DataFrame | None = None,
)
```

Result of a data fetch operation.

Represents the outcome of fetching events or profiles from Mixpanel and storing them in the local database.

### table

```
table: str
```

Name of the created table.

### rows

```
rows: int
```

Number of rows fetched.

### type

```
type: Literal['events', 'profiles']
```

Type of data fetched.

### duration_seconds

```
duration_seconds: float
```

Time taken to complete the fetch.

### date_range

```
date_range: tuple[str, str] | None
```

Date range for events (None for profiles).

### fetched_at

```
fetched_at: datetime
```

Timestamp when fetch completed.

### df

```
df: DataFrame
```

Convert result data to pandas DataFrame.

Conversion is lazy - computed on first access and cached.

| RETURNS     | DESCRIPTION                  |
| ----------- | ---------------------------- |
| `DataFrame` | DataFrame with fetched data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize result for JSON output.

| RETURNS          | DESCRIPTION                                          |
| ---------------- | ---------------------------------------------------- |
| `dict[str, Any]` | Dictionary representation (excludes raw data).       |
| `dict[str, Any]` | datetime values are converted to ISO format strings. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize result for JSON output.

    Returns:
        Dictionary representation (excludes raw data).
        datetime values are converted to ISO format strings.
    """
    return {
        "table": self.table,
        "rows": self.rows,
        "type": self.type,
        "duration_seconds": self.duration_seconds,
        "date_range": self.date_range,
        "fetched_at": self.fetched_at.isoformat(),
    }
```

## Parallel Fetch Types

Types for parallel event fetching with progress tracking and failure handling.

## mixpanel_data.ParallelFetchResult

```
ParallelFetchResult(
    table: str,
    total_rows: int,
    successful_batches: int,
    failed_batches: int,
    failed_date_ranges: tuple[tuple[str, str], ...],
    duration_seconds: float,
    fetched_at: datetime,
)
```

Result of a parallel fetch operation.

Aggregates results from all batches, providing summary statistics and information about any failures for retry.

| ATTRIBUTE            | DESCRIPTION                                                                                 |
| -------------------- | ------------------------------------------------------------------------------------------- |
| `table`              | Name of the created/appended table. **TYPE:** `str`                                         |
| `total_rows`         | Total number of rows fetched across all batches. **TYPE:** `int`                            |
| `successful_batches` | Number of batches that completed successfully. **TYPE:** `int`                              |
| `failed_batches`     | Number of batches that failed. **TYPE:** `int`                                              |
| `failed_date_ranges` | Date ranges (from_date, to_date) of failed batches. **TYPE:** `tuple[tuple[str, str], ...]` |
| `duration_seconds`   | Total time taken for the parallel fetch. **TYPE:** `float`                                  |
| `fetched_at`         | Timestamp when fetch completed. **TYPE:** `datetime`                                        |

Example

```
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
)

if result.has_failures:
    print(f"Warning: {result.failed_batches} batches failed")
    for from_date, to_date in result.failed_date_ranges:
        print(f"  {from_date} to {to_date}")
```

### table

```
table: str
```

Name of the created/appended table.

### total_rows

```
total_rows: int
```

Total number of rows fetched across all batches.

### successful_batches

```
successful_batches: int
```

Number of batches that completed successfully.

### failed_batches

```
failed_batches: int
```

Number of batches that failed.

### failed_date_ranges

```
failed_date_ranges: tuple[tuple[str, str], ...]
```

Date ranges (from_date, to_date) of failed batches for retry.

### duration_seconds

```
duration_seconds: float
```

Total time taken for the parallel fetch.

### fetched_at

```
fetched_at: datetime
```

Timestamp when fetch completed.

### has_failures

```
has_failures: bool
```

Check if any batches failed.

| RETURNS | DESCRIPTION                                         |
| ------- | --------------------------------------------------- |
| `bool`  | True if at least one batch failed, False otherwise. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                               |
| ---------------- | --------------------------------------------------------- |
| `dict[str, Any]` | Dictionary with all result fields including has_failures. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all result fields including has_failures.
    """
    return {
        "table": self.table,
        "total_rows": self.total_rows,
        "successful_batches": self.successful_batches,
        "failed_batches": self.failed_batches,
        "failed_date_ranges": [list(dr) for dr in self.failed_date_ranges],
        "duration_seconds": self.duration_seconds,
        "fetched_at": self.fetched_at.isoformat(),
        "has_failures": self.has_failures,
    }
```

## mixpanel_data.BatchProgress

```
BatchProgress(
    from_date: str,
    to_date: str,
    batch_index: int,
    total_batches: int,
    rows: int,
    success: bool,
    error: str | None = None,
)
```

Progress update for a parallel fetch batch.

Sent to the on_batch_complete callback when a batch finishes (successfully or with error).

| ATTRIBUTE       | DESCRIPTION                                                         |
| --------------- | ------------------------------------------------------------------- |
| `from_date`     | Start date of this batch (YYYY-MM-DD). **TYPE:** `str`              |
| `to_date`       | End date of this batch (YYYY-MM-DD). **TYPE:** `str`                |
| `batch_index`   | Zero-based index of this batch. **TYPE:** `int`                     |
| `total_batches` | Total number of batches in the parallel fetch. **TYPE:** `int`      |
| `rows`          | Number of rows fetched in this batch (0 if failed). **TYPE:** `int` |
| `success`       | Whether this batch completed successfully. **TYPE:** `bool`         |
| `error`         | Error message if failed, None if successful. **TYPE:** \`str        |

Example

```
def on_batch(progress: BatchProgress) -> None:
    status = "✓" if progress.success else "✗"
    print(f"[{status}] Batch {progress.batch_index + 1}/{progress.total_batches}")

result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    on_batch_complete=on_batch,
)
```

### from_date

```
from_date: str
```

Start date of this batch (YYYY-MM-DD).

### to_date

```
to_date: str
```

End date of this batch (YYYY-MM-DD).

### batch_index

```
batch_index: int
```

Zero-based index of this batch.

### total_batches

```
total_batches: int
```

Total number of batches in the parallel fetch.

### rows

```
rows: int
```

Number of rows fetched in this batch (0 if failed).

### success

```
success: bool
```

Whether this batch completed successfully.

### error

```
error: str | None = None
```

Error message if failed, None if successful.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                |
| ---------------- | ------------------------------------------ |
| `dict[str, Any]` | Dictionary with all batch progress fields. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all batch progress fields.
    """
    return {
        "from_date": self.from_date,
        "to_date": self.to_date,
        "batch_index": self.batch_index,
        "total_batches": self.total_batches,
        "rows": self.rows,
        "success": self.success,
        "error": self.error,
    }
```

## mixpanel_data.BatchResult

```
BatchResult(
    from_date: str,
    to_date: str,
    rows: int,
    success: bool,
    error: str | None = None,
)
```

Result of fetching a single date range chunk.

Internal type used by ParallelFetcherService to track batch outcomes. Contains either the fetched data (on success) or error info (on failure).

| ATTRIBUTE   | DESCRIPTION                                                      |
| ----------- | ---------------------------------------------------------------- |
| `from_date` | Start date of this batch (YYYY-MM-DD). **TYPE:** `str`           |
| `to_date`   | End date of this batch (YYYY-MM-DD). **TYPE:** `str`             |
| `rows`      | Number of rows fetched (0 if failed). **TYPE:** `int`            |
| `success`   | Whether the batch completed successfully. **TYPE:** `bool`       |
| `error`     | Exception message if failed, None if successful. **TYPE:** \`str |

Note

Data is not included in to_dict() as it's consumed by the writer thread and is not JSON-serializable (iterator of dicts).

### from_date

```
from_date: str
```

Start date of this batch (YYYY-MM-DD).

### to_date

```
to_date: str
```

End date of this batch (YYYY-MM-DD).

### rows

```
rows: int
```

Number of rows fetched (0 if failed).

### success

```
success: bool
```

Whether the batch completed successfully.

### error

```
error: str | None = None
```

Exception message if failed, None if successful.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output (excludes data).

| RETURNS          | DESCRIPTION                                           |
| ---------------- | ----------------------------------------------------- |
| `dict[str, Any]` | Dictionary with batch result fields (excluding data). |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output (excludes data).

    Returns:
        Dictionary with batch result fields (excluding data).
    """
    return {
        "from_date": self.from_date,
        "to_date": self.to_date,
        "rows": self.rows,
        "success": self.success,
        "error": self.error,
    }
```

## Parallel Profile Fetch Types

Types for parallel profile fetching with page-based progress tracking.

## mixpanel_data.ParallelProfileResult

```
ParallelProfileResult(
    table: str,
    total_rows: int,
    successful_pages: int,
    failed_pages: int,
    failed_page_indices: tuple[int, ...],
    duration_seconds: float,
    fetched_at: datetime,
)
```

Result of a parallel profile fetch operation.

Aggregates results from all pages, providing summary statistics and information about any failures for retry.

| ATTRIBUTE             | DESCRIPTION                                                         |
| --------------------- | ------------------------------------------------------------------- |
| `table`               | Name of the created/appended table. **TYPE:** `str`                 |
| `total_rows`          | Total number of rows fetched across all pages. **TYPE:** `int`      |
| `successful_pages`    | Number of pages that completed successfully. **TYPE:** `int`        |
| `failed_pages`        | Number of pages that failed. **TYPE:** `int`                        |
| `failed_page_indices` | Page indices of failed pages for retry. **TYPE:** `tuple[int, ...]` |
| `duration_seconds`    | Total time taken for the parallel fetch. **TYPE:** `float`          |
| `fetched_at`          | Timestamp when fetch completed. **TYPE:** `datetime`                |

Example

```
result = ws.fetch_profiles(
    name="users",
    parallel=True,
)

if result.has_failures:
    print(f"Warning: {result.failed_pages} pages failed")
    for idx in result.failed_page_indices:
        print(f"  Page {idx}")
```

### table

```
table: str
```

Name of the created/appended table.

### total_rows

```
total_rows: int
```

Total number of rows fetched across all pages.

### successful_pages

```
successful_pages: int
```

Number of pages that completed successfully.

### failed_pages

```
failed_pages: int
```

Number of pages that failed.

### failed_page_indices

```
failed_page_indices: tuple[int, ...]
```

Page indices of failed pages for retry.

### duration_seconds

```
duration_seconds: float
```

Total time taken for the parallel fetch.

### fetched_at

```
fetched_at: datetime
```

Timestamp when fetch completed.

### has_failures

```
has_failures: bool
```

Check if any pages failed.

| RETURNS | DESCRIPTION                                        |
| ------- | -------------------------------------------------- |
| `bool`  | True if at least one page failed, False otherwise. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                               |
| ---------------- | --------------------------------------------------------- |
| `dict[str, Any]` | Dictionary with all result fields including has_failures. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all result fields including has_failures.
    """
    return {
        "table": self.table,
        "total_rows": self.total_rows,
        "successful_pages": self.successful_pages,
        "failed_pages": self.failed_pages,
        "failed_page_indices": list(self.failed_page_indices),
        "duration_seconds": self.duration_seconds,
        "fetched_at": self.fetched_at.isoformat(),
        "has_failures": self.has_failures,
    }
```

## mixpanel_data.ProfileProgress

```
ProfileProgress(
    page_index: int,
    total_pages: int | None,
    rows: int,
    success: bool,
    error: str | None,
    cumulative_rows: int,
)
```

Progress update for a parallel profile fetch page.

Sent to the on_page_complete callback when a page finishes (successfully or with error). Used for progress visibility during parallel profile fetching operations.

| ATTRIBUTE         | DESCRIPTION                                                        |
| ----------------- | ------------------------------------------------------------------ |
| `page_index`      | Zero-based index of this page. **TYPE:** `int`                     |
| `total_pages`     | Total pages if known, None if not yet determined. **TYPE:** \`int  |
| `rows`            | Number of rows fetched in this page (0 if failed). **TYPE:** `int` |
| `success`         | Whether this page completed successfully. **TYPE:** `bool`         |
| `error`           | Error message if failed, None if successful. **TYPE:** \`str       |
| `cumulative_rows` | Total rows fetched so far across all pages. **TYPE:** `int`        |

Example

```
def on_page(progress: ProfileProgress) -> None:
    status = "✓" if progress.success else "✗"
    pct = f"{progress.page_index + 1}/{progress.total_pages}" if progress.total_pages else f"{progress.page_index + 1}/?"
    print(f"[{status}] Page {pct}: {progress.cumulative_rows} total rows")

result = ws.fetch_profiles(
    name="users",
    parallel=True,
    on_page_complete=on_page,
)
```

### page_index

```
page_index: int
```

Zero-based index of this page.

### total_pages

```
total_pages: int | None
```

Total pages if known, None if not yet determined.

### rows

```
rows: int
```

Number of rows fetched in this page (0 if failed).

### success

```
success: bool
```

Whether this page completed successfully.

### error

```
error: str | None
```

Error message if failed, None if successful.

### cumulative_rows

```
cumulative_rows: int
```

Total rows fetched so far across all pages.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                  |
| ---------------- | -------------------------------------------- |
| `dict[str, Any]` | Dictionary with all profile progress fields. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all profile progress fields.
    """
    return {
        "page_index": self.page_index,
        "total_pages": self.total_pages,
        "rows": self.rows,
        "success": self.success,
        "error": self.error,
        "cumulative_rows": self.cumulative_rows,
    }
```

## mixpanel_data.ProfilePageResult

```
ProfilePageResult(
    profiles: list[dict[str, Any]],
    session_id: str | None,
    page: int,
    has_more: bool,
    total: int,
    page_size: int,
)
```

Result from fetching a single page of profiles.

Contains the profiles from one page of the Engage API along with pagination metadata for fetching subsequent pages.

| ATTRIBUTE    | DESCRIPTION                                                                   |
| ------------ | ----------------------------------------------------------------------------- |
| `profiles`   | List of profile dictionaries from this page. **TYPE:** `list[dict[str, Any]]` |
| `session_id` | Session ID for fetching next page, None if no more pages. **TYPE:** \`str     |
| `page`       | Zero-based page index that was fetched. **TYPE:** `int`                       |
| `has_more`   | True if there are more pages to fetch. **TYPE:** `bool`                       |
| `total`      | Total number of profiles matching the query across all pages. **TYPE:** `int` |
| `page_size`  | Number of profiles per page (typically 1000). **TYPE:** `int`                 |

Example

```
# Fetch first page to get pagination metadata
result = api_client.export_profiles_page(page=0)
all_profiles = list(result.profiles)

# Pre-compute total pages for parallel fetching
total_pages = result.num_pages
print(f"Fetching {total_pages} pages ({result.total} profiles)")

# Continue fetching if more pages
while result.has_more:
    result = api_client.export_profiles_page(
        page=result.page + 1,
        session_id=result.session_id,
    )
    all_profiles.extend(result.profiles)
```

### profiles

```
profiles: list[dict[str, Any]]
```

List of profile dictionaries from this page.

### session_id

```
session_id: str | None
```

Session ID for fetching next page, None if no more pages.

### page

```
page: int
```

Zero-based page index that was fetched.

### has_more

```
has_more: bool
```

True if there are more pages to fetch.

### total

```
total: int
```

Total number of profiles matching the query across all pages.

### page_size

```
page_size: int
```

Number of profiles per page (typically 1000).

### num_pages

```
num_pages: int
```

Calculate total number of pages needed.

Uses ceiling division to ensure partial pages are counted.

| RETURNS | DESCRIPTION                                 |
| ------- | ------------------------------------------- |
| `int`   | Total pages needed to fetch all profiles.   |
| `int`   | Returns 0 if total is 0 (empty result set). |

Example

```
result = api_client.export_profiles_page(page=0)
# If total=5432 and page_size=1000, num_pages=6
for page_idx in range(1, result.num_pages):
    # Fetch remaining pages...
```

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                                           |
| ---------------- | --------------------------------------------------------------------- |
| `dict[str, Any]` | Dictionary with all page result fields including pagination metadata. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all page result fields including pagination metadata.
    """
    return {
        "profiles": self.profiles,
        "session_id": self.session_id,
        "page": self.page,
        "has_more": self.has_more,
        "profile_count": len(self.profiles),
        "total": self.total,
        "page_size": self.page_size,
        "num_pages": self.num_pages,
    }
```

## Query Results

## mixpanel_data.SegmentationResult

```
SegmentationResult(
    event: str,
    from_date: str,
    to_date: str,
    unit: Literal["day", "week", "month"],
    segment_property: str | None,
    total: int,
    series: dict[str, dict[str, int]] = dict(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Result of a segmentation query.

Contains time-series data for an event, optionally segmented by a property.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### event

```
event: str
```

Queried event name.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### unit

```
unit: Literal['day', 'week', 'month']
```

Time unit for aggregation.

### segment_property

```
segment_property: str | None
```

Property used for segmentation (None if total only).

### total

```
total: int
```

Total count across all segments and time periods.

### series

```
series: dict[str, dict[str, int]] = field(default_factory=dict)
```

Time series data by segment.

Structure: {segment_name: {date_string: count}} Example: {"US": {"2024-01-01": 150, "2024-01-02": 200}, "EU": {...}} For unsegmented queries, segment_name is "total".

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, segment, count.

For unsegmented queries, segment column is 'total'.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "unit": self.unit,
        "segment_property": self.segment_property,
        "total": self.total,
        "series": self.series,
    }
```

## mixpanel_data.FunnelResult

```
FunnelResult(
    funnel_id: int,
    funnel_name: str,
    from_date: str,
    to_date: str,
    conversion_rate: float,
    steps: list[FunnelStep] = list(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Result of a funnel query.

Contains step-by-step conversion data for a funnel.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### funnel_id

```
funnel_id: int
```

Funnel identifier.

### funnel_name

```
funnel_name: str
```

Funnel display name.

### from_date

```
from_date: str
```

Query start date.

### to_date

```
to_date: str
```

Query end date.

### conversion_rate

```
conversion_rate: float
```

Overall conversion rate (0.0 to 1.0).

### steps

```
steps: list[FunnelStep] = field(default_factory=list)
```

Step-by-step breakdown.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: step, event, count, conversion_rate.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "funnel_id": self.funnel_id,
        "funnel_name": self.funnel_name,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "conversion_rate": self.conversion_rate,
        "steps": [step.to_dict() for step in self.steps],
    }
```

## mixpanel_data.FunnelStep

```
FunnelStep(event: str, count: int, conversion_rate: float)
```

Single step in a funnel.

### event

```
event: str
```

Event name for this step.

### count

```
count: int
```

Number of users at this step.

### conversion_rate

```
conversion_rate: float
```

Conversion rate from previous step (0.0 to 1.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "count": self.count,
        "conversion_rate": self.conversion_rate,
    }
```

## mixpanel_data.RetentionResult

```
RetentionResult(
    born_event: str,
    return_event: str,
    from_date: str,
    to_date: str,
    unit: Literal["day", "week", "month"],
    cohorts: list[CohortInfo] = list(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Result of a retention query.

Contains cohort-based retention data.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### born_event

```
born_event: str
```

Event that defines cohort membership.

### return_event

```
return_event: str
```

Event that defines return.

### from_date

```
from_date: str
```

Query start date.

### to_date

```
to_date: str
```

Query end date.

### unit

```
unit: Literal['day', 'week', 'month']
```

Time unit for retention periods.

### cohorts

```
cohorts: list[CohortInfo] = field(default_factory=list)
```

Cohort retention data.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: cohort_date, cohort_size, period_N.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "born_event": self.born_event,
        "return_event": self.return_event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "unit": self.unit,
        "cohorts": [cohort.to_dict() for cohort in self.cohorts],
    }
```

## mixpanel_data.CohortInfo

```
CohortInfo(date: str, size: int, retention: list[float] = list())
```

Retention data for a single cohort.

### date

```
date: str
```

Cohort date (when users were 'born').

### size

```
size: int
```

Number of users in cohort.

### retention

```
retention: list[float] = field(default_factory=list)
```

Retention percentages by period (0.0 to 1.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "date": self.date,
        "size": self.size,
        "retention": self.retention,
    }
```

## mixpanel_data.JQLResult

```
JQLResult(_raw: list[Any] = list(), *, _df_cache: DataFrame | None = None)
```

Bases: `ResultWithDataFrame`

Result of a JQL query.

JQL (JavaScript Query Language) allows custom queries against Mixpanel data.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

The df property intelligently detects JQL result patterns (groupBy, percentiles, simple dicts) and converts them to clean tabular format.

### raw

```
raw: list[Any]
```

Raw result data from JQL execution.

### df

```
df: DataFrame
```

Convert result to DataFrame with intelligent structure detection.

The conversion strategy depends on the detected JQL result pattern:

**groupBy results** (detected by {key: [...], value: X} structure):

- Keys expanded to columns: key_0, key_1, key_2, ...
- Single value: "value" column
- Multiple reducers (value array): value_0, value_1, value_2, ...
- Additional fields (from .map()): preserved as-is
- Example: {"key": ["US"], "value": 100, "name": "USA"} -> columns: key_0, value, name

**Nested percentile results** (\[[{percentile: X, value: Y}, ...]\]):

- Outer list unwrapped, inner dicts converted directly

**Simple list of dicts** (already well-structured):

- Converted directly to DataFrame preserving all fields

**Fallback for other structures** (scalars, mixed types, incompatible dicts):

- Safely wrapped in single "value" column to prevent data loss
- Used when structure doesn't match known patterns

| RAISES       | DESCRIPTION                                                                                                                      |
| ------------ | -------------------------------------------------------------------------------------------------------------------------------- |
| `ValueError` | If groupBy structure has inconsistent value types across rows (some scalar, some array) which indicates malformed query results. |

| RETURNS     | DESCRIPTION                                          |
| ----------- | ---------------------------------------------------- |
| `DataFrame` | DataFrame representation, cached after first access. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "raw": self._raw,
        "row_count": len(self._raw),
    }
```

## Discovery Types

## mixpanel_data.FunnelInfo

```
FunnelInfo(funnel_id: int, name: str)
```

A saved funnel definition.

Represents a funnel saved in Mixpanel that can be queried using the funnel() method.

### funnel_id

```
funnel_id: int
```

Unique identifier for funnel queries.

### name

```
name: str
```

Human-readable funnel name.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "funnel_id": self.funnel_id,
        "name": self.name,
    }
```

## mixpanel_data.SavedCohort

```
SavedCohort(
    id: int,
    name: str,
    count: int,
    description: str,
    created: str,
    is_visible: bool,
)
```

A saved cohort definition.

Represents a user cohort saved in Mixpanel for profile filtering.

### id

```
id: int
```

Unique identifier for profile filtering.

### name

```
name: str
```

Human-readable cohort name.

### count

```
count: int
```

Current number of users in cohort.

### description

```
description: str
```

Optional description (may be empty string).

### created

```
created: str
```

Creation timestamp (YYYY-MM-DD HH:mm:ss).

### is_visible

```
is_visible: bool
```

Whether cohort is visible in Mixpanel UI.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "id": self.id,
        "name": self.name,
        "count": self.count,
        "description": self.description,
        "created": self.created,
        "is_visible": self.is_visible,
    }
```

## mixpanel_data.TopEvent

```
TopEvent(event: str, count: int, percent_change: float)
```

Today's event activity data.

Represents an event's current activity including count and trend.

### event

```
event: str
```

Event name.

### count

```
count: int
```

Today's event count.

### percent_change

```
percent_change: float
```

Change vs yesterday (-1.0 to +infinity).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "count": self.count,
        "percent_change": self.percent_change,
    }
```

## Lexicon Types

## mixpanel_data.LexiconSchema

```
LexiconSchema(entity_type: str, name: str, schema_json: LexiconDefinition)
```

Complete schema definition from Mixpanel Lexicon.

Represents a documented event or profile property definition from the Mixpanel data dictionary.

### entity_type

```
entity_type: str
```

Type of entity (e.g., 'event', 'profile', 'custom_event', 'group', etc.).

### name

```
name: str
```

Name of the event or profile property.

### schema_json

```
schema_json: LexiconDefinition
```

Full schema definition.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                         |
| ---------------- | --------------------------------------------------- |
| `dict[str, Any]` | Dictionary with entity_type, name, and schema_json. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with entity_type, name, and schema_json.
    """
    return {
        "entity_type": self.entity_type,
        "name": self.name,
        "schema_json": self.schema_json.to_dict(),
    }
```

## mixpanel_data.LexiconDefinition

```
LexiconDefinition(
    description: str | None,
    properties: dict[str, LexiconProperty],
    metadata: LexiconMetadata | None,
)
```

Full schema definition for an event or profile property in Lexicon.

Contains the structural definition including description, properties, and platform-specific metadata.

### description

```
description: str | None
```

Human-readable description of the entity.

### properties

```
properties: dict[str, LexiconProperty]
```

Property definitions keyed by property name.

### metadata

```
metadata: LexiconMetadata | None
```

Optional Mixpanel-specific metadata for the entity.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                                          |
| ---------------- | -------------------------------------------------------------------- |
| `dict[str, Any]` | Dictionary with properties, and optionally description and metadata. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with properties, and optionally description and metadata.
    """
    result: dict[str, Any] = {
        "properties": {k: v.to_dict() for k, v in self.properties.items()},
    }
    if self.description is not None:
        result["description"] = self.description
    if self.metadata is not None:
        result["metadata"] = self.metadata.to_dict()
    return result
```

## mixpanel_data.LexiconProperty

```
LexiconProperty(
    type: str, description: str | None, metadata: LexiconMetadata | None
)
```

Schema definition for a single property in a Lexicon schema.

Describes the type and metadata for an event or profile property.

### type

```
type: str
```

JSON Schema type (string, number, boolean, array, object, integer, null).

### description

```
description: str | None
```

Human-readable description of the property.

### metadata

```
metadata: LexiconMetadata | None
```

Optional Mixpanel-specific metadata.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                                    |
| ---------------- | -------------------------------------------------------------- |
| `dict[str, Any]` | Dictionary with type, and optionally description and metadata. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with type, and optionally description and metadata.
    """
    result: dict[str, Any] = {"type": self.type}
    if self.description is not None:
        result["description"] = self.description
    if self.metadata is not None:
        result["metadata"] = self.metadata.to_dict()
    return result
```

## mixpanel_data.LexiconMetadata

```
LexiconMetadata(
    source: str | None,
    display_name: str | None,
    tags: list[str],
    hidden: bool,
    dropped: bool,
    contacts: list[str],
    team_contacts: list[str],
)
```

Mixpanel-specific metadata for Lexicon schemas and properties.

Contains platform-specific information about how schemas and properties are displayed and organized in the Mixpanel UI.

### source

```
source: str | None
```

Origin of the schema definition (e.g., 'api', 'csv', 'ui').

### display_name

```
display_name: str | None
```

Human-readable display name in Mixpanel UI.

### tags

```
tags: list[str]
```

Categorization tags for organization.

### hidden

```
hidden: bool
```

Whether hidden from Mixpanel UI.

### dropped

```
dropped: bool
```

Whether data is dropped/ignored.

### contacts

```
contacts: list[str]
```

Owner email addresses.

### team_contacts

```
team_contacts: list[str]
```

Team ownership labels.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                          |
| ---------------- | ------------------------------------ |
| `dict[str, Any]` | Dictionary with all metadata fields. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all metadata fields.
    """
    return {
        "source": self.source,
        "display_name": self.display_name,
        "tags": self.tags,
        "hidden": self.hidden,
        "dropped": self.dropped,
        "contacts": self.contacts,
        "team_contacts": self.team_contacts,
    }
```

## Event Analytics Results

## mixpanel_data.EventCountsResult

```
EventCountsResult(
    events: list[str],
    from_date: str,
    to_date: str,
    unit: Literal["day", "week", "month"],
    type: Literal["general", "unique", "average"],
    series: dict[str, dict[str, int]],
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Time-series event count data.

Contains aggregate counts for multiple events over time with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### events

```
events: list[str]
```

Queried event names.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### unit

```
unit: Literal['day', 'week', 'month']
```

Time unit for aggregation.

### type

```
type: Literal['general', 'unique', 'average']
```

Counting method used.

### series

```
series: dict[str, dict[str, int]]
```

Time series data: {event_name: {date: count}}.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, event, count.

Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "events": self.events,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "unit": self.unit,
        "type": self.type,
        "series": self.series,
    }
```

## mixpanel_data.PropertyCountsResult

```
PropertyCountsResult(
    event: str,
    property_name: str,
    from_date: str,
    to_date: str,
    unit: Literal["day", "week", "month"],
    type: Literal["general", "unique", "average"],
    series: dict[str, dict[str, int]],
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Time-series property value distribution data.

Contains aggregate counts by property values over time with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### event

```
event: str
```

Queried event name.

### property_name

```
property_name: str
```

Property used for segmentation.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### unit

```
unit: Literal['day', 'week', 'month']
```

Time unit for aggregation.

### type

```
type: Literal['general', 'unique', 'average']
```

Counting method used.

### series

```
series: dict[str, dict[str, int]]
```

Time series data by property value.

Structure: {property_value: {date: count}} Example: {"US": {"2024-01-01": 150, "2024-01-02": 200}, "EU": {...}}

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, value, count.

Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "property_name": self.property_name,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "unit": self.unit,
        "type": self.type,
        "series": self.series,
    }
```

## Advanced Query Results

## mixpanel_data.UserEvent

```
UserEvent(event: str, time: datetime, properties: dict[str, Any] = dict())
```

Single event in a user's activity feed.

Represents one event from a user's event history with timestamp and all associated properties.

### event

```
event: str
```

Event name.

### time

```
time: datetime
```

Event timestamp (UTC).

### properties

```
properties: dict[str, Any] = field(default_factory=dict)
```

All event properties including system properties.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "time": self.time.isoformat(),
        "properties": self.properties,
    }
```

## mixpanel_data.ActivityFeedResult

```
ActivityFeedResult(
    distinct_ids: list[str],
    from_date: str | None,
    to_date: str | None,
    events: list[UserEvent] = list(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Collection of user events from activity feed query.

Contains chronological event history for one or more users with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### distinct_ids

```
distinct_ids: list[str]
```

Queried user identifiers.

### from_date

```
from_date: str | None
```

Start date filter (YYYY-MM-DD), None if not specified.

### to_date

```
to_date: str | None
```

End date filter (YYYY-MM-DD), None if not specified.

### events

```
events: list[UserEvent] = field(default_factory=list)
```

Event history (chronological order).

### df

```
df: DataFrame
```

Convert to DataFrame with columns: event, time, distinct_id, + properties.

Flattens event properties into individual columns. Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "distinct_ids": self.distinct_ids,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "event_count": len(self.events),
        "events": [e.to_dict() for e in self.events],
    }
```

## mixpanel_data.FrequencyResult

```
FrequencyResult(
    event: str | None,
    from_date: str,
    to_date: str,
    unit: Literal["day", "week", "month"],
    addiction_unit: Literal["hour", "day"],
    data: dict[str, list[int]] = dict(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Event frequency distribution (addiction analysis).

Contains frequency arrays showing how many users performed events in N time periods, with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### event

```
event: str | None
```

Filtered event name (None = all events).

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### unit

```
unit: Literal['day', 'week', 'month']
```

Overall time period.

### addiction_unit

```
addiction_unit: Literal['hour', 'day']
```

Measurement granularity.

### data

```
data: dict[str, list[int]] = field(default_factory=dict)
```

Frequency arrays by date.

Structure: {date: [count_1, count_2, ...]} Example: {"2024-01-01": [100, 50, 25, 10]}

Each array shows user counts by frequency:

- Index 0: users active exactly 1 time
- Index 1: users active exactly 2 times
- Index N: users active exactly N+1 times

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, period_1, period_2, ...

Each period_N column shows users active in at least N time periods. Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "unit": self.unit,
        "addiction_unit": self.addiction_unit,
        "data": self.data,
    }
```

## mixpanel_data.NumericBucketResult

```
NumericBucketResult(
    event: str,
    from_date: str,
    to_date: str,
    property_expr: str,
    unit: Literal["hour", "day"],
    series: dict[str, dict[str, int]] = dict(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Events segmented into numeric property ranges.

Contains time-series data bucketed by automatically determined numeric ranges, with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### event

```
event: str
```

Queried event name.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### property_expr

```
property_expr: str
```

The 'on' expression used for bucketing.

### unit

```
unit: Literal['hour', 'day']
```

Time aggregation unit.

### series

```
series: dict[str, dict[str, int]] = field(default_factory=dict)
```

Bucket data: {range_string: {date: count}}.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, bucket, count.

Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "property_expr": self.property_expr,
        "unit": self.unit,
        "series": self.series,
    }
```

## mixpanel_data.NumericSumResult

```
NumericSumResult(
    event: str,
    from_date: str,
    to_date: str,
    property_expr: str,
    unit: Literal["hour", "day"],
    results: dict[str, float] = dict(),
    computed_at: str | None = None,
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Sum of numeric property values per time unit.

Contains daily or hourly sum totals for a numeric property with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### event

```
event: str
```

Queried event name.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### property_expr

```
property_expr: str
```

The 'on' expression summed.

### unit

```
unit: Literal['hour', 'day']
```

Time aggregation unit.

### results

```
results: dict[str, float] = field(default_factory=dict)
```

Sum values: {date: sum}.

### computed_at

```
computed_at: str | None = None
```

Computation timestamp (if provided by API).

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, sum.

Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    result: dict[str, Any] = {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "property_expr": self.property_expr,
        "unit": self.unit,
        "results": self.results,
    }
    if self.computed_at is not None:
        result["computed_at"] = self.computed_at
    return result
```

## mixpanel_data.NumericAverageResult

```
NumericAverageResult(
    event: str,
    from_date: str,
    to_date: str,
    property_expr: str,
    unit: Literal["hour", "day"],
    results: dict[str, float] = dict(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Average of numeric property values per time unit.

Contains daily or hourly average values for a numeric property with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### event

```
event: str
```

Queried event name.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### property_expr

```
property_expr: str
```

The 'on' expression averaged.

### unit

```
unit: Literal['hour', 'day']
```

Time aggregation unit.

### results

```
results: dict[str, float] = field(default_factory=dict)
```

Average values: {date: average}.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, average.

Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "property_expr": self.property_expr,
        "unit": self.unit,
        "results": self.results,
    }
```

## Bookmark Types

## mixpanel_data.BookmarkInfo

```
BookmarkInfo(
    id: int,
    name: str,
    type: BookmarkType,
    project_id: int,
    created: str,
    modified: str,
    workspace_id: int | None = None,
    dashboard_id: int | None = None,
    description: str | None = None,
    creator_id: int | None = None,
    creator_name: str | None = None,
)
```

Metadata for a saved report (bookmark) from the Mixpanel Bookmarks API.

Represents a saved Insights, Funnel, Retention, or Flows report that can be queried using query_saved_report() or query_flows().

| ATTRIBUTE      | DESCRIPTION                                                                                  |
| -------------- | -------------------------------------------------------------------------------------------- |
| `id`           | Unique bookmark identifier. **TYPE:** `int`                                                  |
| `name`         | User-defined report name. **TYPE:** `str`                                                    |
| `type`         | Report type (insights, funnels, retention, flows, launch-analysis). **TYPE:** `BookmarkType` |
| `project_id`   | Parent Mixpanel project ID. **TYPE:** `int`                                                  |
| `created`      | Creation timestamp (ISO format). **TYPE:** `str`                                             |
| `modified`     | Last modification timestamp (ISO format). **TYPE:** `str`                                    |
| `workspace_id` | Optional workspace ID if scoped to a workspace. **TYPE:** \`int                              |
| `dashboard_id` | Optional parent dashboard ID if linked to a dashboard. **TYPE:** \`int                       |
| `description`  | Optional user-provided description. **TYPE:** \`str                                          |
| `creator_id`   | Optional creator's user ID. **TYPE:** \`int                                                  |
| `creator_name` | Optional creator's display name. **TYPE:** \`str                                             |

### id

```
id: int
```

Unique bookmark identifier.

### name

```
name: str
```

User-defined report name.

### type

```
type: BookmarkType
```

Report type.

### project_id

```
project_id: int
```

Parent Mixpanel project ID.

### created

```
created: str
```

Creation timestamp (ISO format).

### modified

```
modified: str
```

Last modification timestamp (ISO format).

### workspace_id

```
workspace_id: int | None = None
```

Workspace ID if scoped to a workspace.

### dashboard_id

```
dashboard_id: int | None = None
```

Parent dashboard ID if linked to a dashboard.

### description

```
description: str | None = None
```

User-provided description.

### creator_id

```
creator_id: int | None = None
```

Creator's user ID.

### creator_name

```
creator_name: str | None = None
```

Creator's display name.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                   |
| ---------------- | --------------------------------------------- |
| `dict[str, Any]` | Dictionary with all bookmark metadata fields. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all bookmark metadata fields.
    """
    result: dict[str, Any] = {
        "id": self.id,
        "name": self.name,
        "type": self.type,
        "project_id": self.project_id,
        "created": self.created,
        "modified": self.modified,
    }
    if self.workspace_id is not None:
        result["workspace_id"] = self.workspace_id
    if self.dashboard_id is not None:
        result["dashboard_id"] = self.dashboard_id
    if self.description is not None:
        result["description"] = self.description
    if self.creator_id is not None:
        result["creator_id"] = self.creator_id
    if self.creator_name is not None:
        result["creator_name"] = self.creator_name
    return result
```

## mixpanel_data.SavedReportResult

```
SavedReportResult(
    bookmark_id: int,
    computed_at: str,
    from_date: str,
    to_date: str,
    headers: list[str] = list(),
    series: dict[str, Any] = dict(),
    _df_cache: DataFrame | None = None,
)
```

Data from a saved report (Insights, Retention, or Funnel).

Contains data from a pre-configured saved report with automatic report type detection and lazy DataFrame conversion support.

The report_type property automatically detects the report type based on headers: "$retention" indicates retention, "$funnel" indicates funnel, otherwise it's an insights report.

| ATTRIBUTE     | DESCRIPTION                                                               |
| ------------- | ------------------------------------------------------------------------- |
| `bookmark_id` | Saved report identifier. **TYPE:** `int`                                  |
| `computed_at` | When report was computed (ISO format). **TYPE:** `str`                    |
| `from_date`   | Report start date. **TYPE:** `str`                                        |
| `to_date`     | Report end date. **TYPE:** `str`                                          |
| `headers`     | Report column headers (used for type detection). **TYPE:** `list[str]`    |
| `series`      | Report data (structure varies by report type). **TYPE:** `dict[str, Any]` |

### bookmark_id

```
bookmark_id: int
```

Saved report identifier.

### computed_at

```
computed_at: str
```

When report was computed (ISO format).

### from_date

```
from_date: str
```

Report start date.

### to_date

```
to_date: str
```

Report end date.

### headers

```
headers: list[str] = field(default_factory=list)
```

Report column headers (used for type detection).

### series

```
series: dict[str, Any] = field(default_factory=dict)
```

Report data (structure varies by report type).

For Insights reports: {event_name: {date: count}} For Retention reports: {series_name: {date: {segment: {first, counts, rates}}}} For Funnel reports: {count: {...}, overall_conv_ratio: {...}, ...}

### report_type

```
report_type: SavedReportType
```

Detect the report type from headers.

| RETURNS           | DESCRIPTION                                  |
| ----------------- | -------------------------------------------- |
| `SavedReportType` | 'retention' if headers contain '$retention', |
| `SavedReportType` | 'funnel' if headers contain '$funnel',       |
| `SavedReportType` | 'insights' otherwise.                        |

### df

```
df: DataFrame
```

Convert to DataFrame.

For Insights reports: columns are date, event, count. For Retention/Funnel reports: flattens the nested structure.

Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                                       |
| ---------------- | ----------------------------------------------------------------- |
| `dict[str, Any]` | Dictionary with all report fields including detected report_type. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all report fields including detected report_type.
    """
    return {
        "bookmark_id": self.bookmark_id,
        "computed_at": self.computed_at,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "headers": self.headers,
        "series": self.series,
        "report_type": self.report_type,
    }
```

## mixpanel_data.FlowsResult

```
FlowsResult(
    bookmark_id: int,
    computed_at: str,
    steps: list[dict[str, Any]] = list(),
    breakdowns: list[dict[str, Any]] = list(),
    overall_conversion_rate: float = 0.0,
    metadata: dict[str, Any] = dict(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Data from a saved Flows report.

Contains user path/navigation data from a pre-configured Flows report with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

| ATTRIBUTE                 | DESCRIPTION                                                                          |
| ------------------------- | ------------------------------------------------------------------------------------ |
| `bookmark_id`             | Saved report identifier. **TYPE:** `int`                                             |
| `computed_at`             | When report was computed (ISO format). **TYPE:** `str`                               |
| `steps`                   | Flow step data with event sequences and counts. **TYPE:** `list[dict[str, Any]]`     |
| `breakdowns`              | Path breakdown data showing user flow distribution. **TYPE:** `list[dict[str, Any]]` |
| `overall_conversion_rate` | End-to-end conversion rate (0.0 to 1.0). **TYPE:** `float`                           |
| `metadata`                | Additional API metadata from the response. **TYPE:** `dict[str, Any]`                |

### bookmark_id

```
bookmark_id: int
```

Saved report identifier.

### computed_at

```
computed_at: str
```

When report was computed (ISO format).

### steps

```
steps: list[dict[str, Any]] = field(default_factory=list)
```

Flow step data with event sequences and counts.

### breakdowns

```
breakdowns: list[dict[str, Any]] = field(default_factory=list)
```

Path breakdown data showing user flow distribution.

### overall_conversion_rate

```
overall_conversion_rate: float = 0.0
```

End-to-end conversion rate (0.0 to 1.0).

### metadata

```
metadata: dict[str, Any] = field(default_factory=dict)
```

Additional API metadata from the response.

### df

```
df: DataFrame
```

Convert steps to DataFrame.

Returns DataFrame with columns derived from step data structure. Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                              |
| ---------------- | ---------------------------------------- |
| `dict[str, Any]` | Dictionary with all flows report fields. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all flows report fields.
    """
    return {
        "bookmark_id": self.bookmark_id,
        "computed_at": self.computed_at,
        "steps": self.steps,
        "breakdowns": self.breakdowns,
        "overall_conversion_rate": self.overall_conversion_rate,
        "metadata": self.metadata,
    }
```

## JQL Discovery Types

## mixpanel_data.PropertyDistributionResult

```
PropertyDistributionResult(
    event: str,
    property_name: str,
    from_date: str,
    to_date: str,
    total_count: int,
    values: tuple[PropertyValueCount, ...],
    _df_cache: DataFrame | None = None,
)
```

Distribution of values for a property from JQL analysis.

Contains the top N values for a property with their counts and percentages, enabling quick understanding of property value distribution without fetching all data locally.

| ATTRIBUTE       | DESCRIPTION                                                                        |
| --------------- | ---------------------------------------------------------------------------------- |
| `event`         | The event type analyzed. **TYPE:** `str`                                           |
| `property_name` | The property name analyzed. **TYPE:** `str`                                        |
| `from_date`     | Query start date (YYYY-MM-DD). **TYPE:** `str`                                     |
| `to_date`       | Query end date (YYYY-MM-DD). **TYPE:** `str`                                       |
| `total_count`   | Total number of events with this property defined. **TYPE:** `int`                 |
| `values`        | Top values with counts and percentages. **TYPE:** `tuple[PropertyValueCount, ...]` |

### event

```
event: str
```

Event type analyzed.

### property_name

```
property_name: str
```

Property name analyzed.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### total_count

```
total_count: int
```

Total events with this property defined.

### values

```
values: tuple[PropertyValueCount, ...]
```

Top values with counts and percentages.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: value, count, percentage.

Conversion is lazy - computed on first access and cached.

| RETURNS     | DESCRIPTION                             |
| ----------- | --------------------------------------- |
| `DataFrame` | DataFrame with value distribution data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                            |
| ---------------- | -------------------------------------- |
| `dict[str, Any]` | Dictionary with all distribution data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all distribution data.
    """
    return {
        "event": self.event,
        "property_name": self.property_name,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "total_count": self.total_count,
        "values": [v.to_dict() for v in self.values],
    }
```

## mixpanel_data.PropertyValueCount

```
PropertyValueCount(
    value: str | int | float | bool | None, count: int, percentage: float
)
```

A single value and its count from property distribution analysis.

Represents one row in a property value distribution, showing the value, its occurrence count, and percentage of total.

| ATTRIBUTE    | DESCRIPTION                                                                |
| ------------ | -------------------------------------------------------------------------- |
| `value`      | The property value (can be string, number, bool, or None). **TYPE:** \`str |
| `count`      | Number of occurrences of this value. **TYPE:** `int`                       |
| `percentage` | Percentage of total events (0.0 to 100.0). **TYPE:** `float`               |

### value

```
value: str | int | float | bool | None
```

The property value.

### count

```
count: int
```

Number of occurrences.

### percentage

```
percentage: float
```

Percentage of total (0.0 to 100.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                   |
| ---------------- | --------------------------------------------- |
| `dict[str, Any]` | Dictionary with value, count, and percentage. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with value, count, and percentage.
    """
    return {
        "value": self.value,
        "count": self.count,
        "percentage": self.percentage,
    }
```

## mixpanel_data.NumericPropertySummaryResult

```
NumericPropertySummaryResult(
    event: str,
    property_name: str,
    from_date: str,
    to_date: str,
    count: int,
    min: float,
    max: float,
    sum: float,
    avg: float,
    stddev: float,
    percentiles: dict[int, float],
)
```

Statistical summary of a numeric property from JQL analysis.

Contains min, max, sum, average, standard deviation, and percentiles for a numeric property, enabling understanding of value distributions without fetching all data locally.

| ATTRIBUTE       | DESCRIPTION                                                                |
| --------------- | -------------------------------------------------------------------------- |
| `event`         | The event type analyzed. **TYPE:** `str`                                   |
| `property_name` | The property name analyzed. **TYPE:** `str`                                |
| `from_date`     | Query start date (YYYY-MM-DD). **TYPE:** `str`                             |
| `to_date`       | Query end date (YYYY-MM-DD). **TYPE:** `str`                               |
| `count`         | Number of events with this property defined. **TYPE:** `int`               |
| `min`           | Minimum value. **TYPE:** `float`                                           |
| `max`           | Maximum value. **TYPE:** `float`                                           |
| `sum`           | Sum of all values. **TYPE:** `float`                                       |
| `avg`           | Average value. **TYPE:** `float`                                           |
| `stddev`        | Standard deviation. **TYPE:** `float`                                      |
| `percentiles`   | Percentile values keyed by percentile number. **TYPE:** `dict[int, float]` |

### event

```
event: str
```

Event type analyzed.

### property_name

```
property_name: str
```

Property name analyzed.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### count

```
count: int
```

Number of events with this property defined.

### min

```
min: float
```

Minimum value.

### max

```
max: float
```

Maximum value.

### sum

```
sum: float
```

Sum of all values.

### avg

```
avg: float
```

Average value.

### stddev

```
stddev: float
```

Standard deviation.

### percentiles

```
percentiles: dict[int, float]
```

Percentile values keyed by percentile number (e.g., {50: 98.0}).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                               |
| ---------------- | ----------------------------------------- |
| `dict[str, Any]` | Dictionary with all numeric summary data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all numeric summary data.
    """
    return {
        "event": self.event,
        "property_name": self.property_name,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "count": self.count,
        "min": self.min,
        "max": self.max,
        "sum": self.sum,
        "avg": self.avg,
        "stddev": self.stddev,
        "percentiles": {str(k): v for k, v in self.percentiles.items()},
    }
```

## mixpanel_data.DailyCountsResult

```
DailyCountsResult(
    from_date: str,
    to_date: str,
    events: tuple[str, ...] | None,
    counts: tuple[DailyCount, ...],
    _df_cache: DataFrame | None = None,
)
```

Time-series event counts by day from JQL analysis.

Contains daily event counts for quick activity trend analysis without complex segmentation setup.

| ATTRIBUTE   | DESCRIPTION                                                             |
| ----------- | ----------------------------------------------------------------------- |
| `from_date` | Query start date (YYYY-MM-DD). **TYPE:** `str`                          |
| `to_date`   | Query end date (YYYY-MM-DD). **TYPE:** `str`                            |
| `events`    | Event types included (None for all events). **TYPE:** \`tuple[str, ...] |
| `counts`    | Daily counts for each event. **TYPE:** `tuple[DailyCount, ...]`         |

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### events

```
events: tuple[str, ...] | None
```

Event types included (None for all events).

### counts

```
counts: tuple[DailyCount, ...]
```

Daily counts for each event.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, event, count.

Conversion is lazy - computed on first access and cached.

| RETURNS     | DESCRIPTION                       |
| ----------- | --------------------------------- |
| `DataFrame` | DataFrame with daily counts data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                            |
| ---------------- | -------------------------------------- |
| `dict[str, Any]` | Dictionary with all daily counts data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all daily counts data.
    """
    return {
        "from_date": self.from_date,
        "to_date": self.to_date,
        "events": list(self.events) if self.events else None,
        "counts": [c.to_dict() for c in self.counts],
    }
```

## mixpanel_data.DailyCount

```
DailyCount(date: str, event: str, count: int)
```

Event count for a single date from daily counts analysis.

Represents one row in a daily counts result, showing date, event, and count.

| ATTRIBUTE | DESCRIPTION                                         |
| --------- | --------------------------------------------------- |
| `date`    | Date string (YYYY-MM-DD). **TYPE:** `str`           |
| `event`   | Event name. **TYPE:** `str`                         |
| `count`   | Number of occurrences on this date. **TYPE:** `int` |

### date

```
date: str
```

Date string (YYYY-MM-DD).

### event

```
event: str
```

Event name.

### count

```
count: int
```

Number of occurrences.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                             |
| ---------------- | --------------------------------------- |
| `dict[str, Any]` | Dictionary with date, event, and count. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with date, event, and count.
    """
    return {
        "date": self.date,
        "event": self.event,
        "count": self.count,
    }
```

## mixpanel_data.EngagementDistributionResult

```
EngagementDistributionResult(
    from_date: str,
    to_date: str,
    events: tuple[str, ...] | None,
    total_users: int,
    buckets: tuple[EngagementBucket, ...],
    _df_cache: DataFrame | None = None,
)
```

User engagement distribution from JQL analysis.

Shows how many users performed N events, helping understand user engagement patterns without fetching all data locally.

| ATTRIBUTE     | DESCRIPTION                                                                   |
| ------------- | ----------------------------------------------------------------------------- |
| `from_date`   | Query start date (YYYY-MM-DD). **TYPE:** `str`                                |
| `to_date`     | Query end date (YYYY-MM-DD). **TYPE:** `str`                                  |
| `events`      | Event types included (None for all events). **TYPE:** \`tuple[str, ...]       |
| `total_users` | Total number of distinct users. **TYPE:** `int`                               |
| `buckets`     | Engagement buckets with user counts. **TYPE:** `tuple[EngagementBucket, ...]` |

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### events

```
events: tuple[str, ...] | None
```

Event types included (None for all events).

### total_users

```
total_users: int
```

Total number of distinct users.

### buckets

```
buckets: tuple[EngagementBucket, ...]
```

Engagement buckets with user counts.

### df

```
df: DataFrame
```

Convert to DataFrame with engagement bucket columns.

Conversion is lazy - computed on first access and cached.

| RETURNS     | DESCRIPTION                                  |
| ----------- | -------------------------------------------- |
| `DataFrame` | DataFrame with engagement distribution data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                       |
| ---------------- | ------------------------------------------------- |
| `dict[str, Any]` | Dictionary with all engagement distribution data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all engagement distribution data.
    """
    return {
        "from_date": self.from_date,
        "to_date": self.to_date,
        "events": list(self.events) if self.events else None,
        "total_users": self.total_users,
        "buckets": [b.to_dict() for b in self.buckets],
    }
```

## mixpanel_data.EngagementBucket

```
EngagementBucket(
    bucket_min: int, bucket_label: str, user_count: int, percentage: float
)
```

User count in an engagement bucket from engagement analysis.

Represents one bucket in a user engagement distribution, showing how many users performed events in a certain frequency range.

| ATTRIBUTE      | DESCRIPTION                                                      |
| -------------- | ---------------------------------------------------------------- |
| `bucket_min`   | Minimum events in this bucket. **TYPE:** `int`                   |
| `bucket_label` | Human-readable label (e.g., "1", "2-5", "100+"). **TYPE:** `str` |
| `user_count`   | Number of users in this bucket. **TYPE:** `int`                  |
| `percentage`   | Percentage of total users (0.0 to 100.0). **TYPE:** `float`      |

### bucket_min

```
bucket_min: int
```

Minimum events in this bucket.

### bucket_label

```
bucket_label: str
```

Human-readable label (e.g., '1', '2-5', '100+').

### user_count

```
user_count: int
```

Number of users in this bucket.

### percentage

```
percentage: float
```

Percentage of total users (0.0 to 100.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                  |
| ---------------- | ---------------------------- |
| `dict[str, Any]` | Dictionary with bucket data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with bucket data.
    """
    return {
        "bucket_min": self.bucket_min,
        "bucket_label": self.bucket_label,
        "user_count": self.user_count,
        "percentage": self.percentage,
    }
```

## mixpanel_data.PropertyCoverageResult

```
PropertyCoverageResult(
    event: str,
    from_date: str,
    to_date: str,
    total_events: int,
    coverage: tuple[PropertyCoverage, ...],
    _df_cache: DataFrame | None = None,
)
```

Property coverage analysis result from JQL.

Shows which properties are consistently populated vs sparse, helping understand data quality before writing queries.

| ATTRIBUTE      | DESCRIPTION                                                                     |
| -------------- | ------------------------------------------------------------------------------- |
| `event`        | The event type analyzed. **TYPE:** `str`                                        |
| `from_date`    | Query start date (YYYY-MM-DD). **TYPE:** `str`                                  |
| `to_date`      | Query end date (YYYY-MM-DD). **TYPE:** `str`                                    |
| `total_events` | Total number of events analyzed. **TYPE:** `int`                                |
| `coverage`     | Coverage statistics for each property. **TYPE:** `tuple[PropertyCoverage, ...]` |

### event

```
event: str
```

Event type analyzed.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### total_events

```
total_events: int
```

Total number of events analyzed.

### coverage

```
coverage: tuple[PropertyCoverage, ...]
```

Coverage statistics for each property.

### df

```
df: DataFrame
```

Convert to DataFrame with property coverage columns.

Conversion is lazy - computed on first access and cached.

| RETURNS     | DESCRIPTION                            |
| ----------- | -------------------------------------- |
| `DataFrame` | DataFrame with property coverage data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                        |
| ---------------- | ---------------------------------- |
| `dict[str, Any]` | Dictionary with all coverage data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all coverage data.
    """
    return {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "total_events": self.total_events,
        "coverage": [c.to_dict() for c in self.coverage],
    }
```

## mixpanel_data.PropertyCoverage

```
PropertyCoverage(
    property: str,
    defined_count: int,
    null_count: int,
    coverage_percentage: float,
)
```

Coverage statistics for a single property from coverage analysis.

Shows how often a property is defined vs null for a given event type.

| ATTRIBUTE             | DESCRIPTION                                                               |
| --------------------- | ------------------------------------------------------------------------- |
| `property`            | Property name. **TYPE:** `str`                                            |
| `defined_count`       | Number of events with this property defined. **TYPE:** `int`              |
| `null_count`          | Number of events with this property null/undefined. **TYPE:** `int`       |
| `coverage_percentage` | Percentage of events with property defined (0.0-100.0). **TYPE:** `float` |

### property

```
property: str
```

Property name.

### defined_count

```
defined_count: int
```

Number of events with property defined.

### null_count

```
null_count: int
```

Number of events with property null/undefined.

### coverage_percentage

```
coverage_percentage: float
```

Percentage with property defined (0.0 to 100.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                    |
| ---------------- | ------------------------------ |
| `dict[str, Any]` | Dictionary with coverage data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with coverage data.
    """
    return {
        "property": self.property,
        "defined_count": self.defined_count,
        "null_count": self.null_count,
        "coverage_percentage": self.coverage_percentage,
    }
```

## Introspection Types

## mixpanel_data.ColumnSummary

```
ColumnSummary(
    column_name: str,
    column_type: str,
    min: Any,
    max: Any,
    approx_unique: int,
    avg: float | None,
    std: float | None,
    q25: Any,
    q50: Any,
    q75: Any,
    count: int,
    null_percentage: float,
)
```

Statistical summary of a single column from DuckDB's SUMMARIZE command.

Contains per-column statistics including min/max, quartiles, null percentage, and approximate distinct counts. Numeric columns include additional stats like average and standard deviation.

### column_name

```
column_name: str
```

Name of the column.

### column_type

```
column_type: str
```

DuckDB data type (VARCHAR, TIMESTAMP, INTEGER, JSON, etc.).

### min

```
min: Any
```

Minimum value (type varies by column type).

### max

```
max: Any
```

Maximum value (type varies by column type).

### approx_unique

```
approx_unique: int
```

Approximate count of distinct values (HyperLogLog).

### avg

```
avg: float | None
```

Mean value (None for non-numeric columns).

### std

```
std: float | None
```

Standard deviation (None for non-numeric columns).

### q25

```
q25: Any
```

25th percentile value (None for non-numeric).

### q50

```
q50: Any
```

Median / 50th percentile (None for non-numeric).

### q75

```
q75: Any
```

75th percentile value (None for non-numeric).

### count

```
count: int
```

Number of non-null values.

### null_percentage

```
null_percentage: float
```

Percentage of null values (0.0 to 100.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                            |
| ---------------- | -------------------------------------- |
| `dict[str, Any]` | Dictionary with all column statistics. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all column statistics.
    """
    return {
        "column_name": self.column_name,
        "column_type": self.column_type,
        "min": self.min,
        "max": self.max,
        "approx_unique": self.approx_unique,
        "avg": self.avg,
        "std": self.std,
        "q25": self.q25,
        "q50": self.q50,
        "q75": self.q75,
        "count": self.count,
        "null_percentage": self.null_percentage,
    }
```

## mixpanel_data.SummaryResult

```
SummaryResult(
    table: str,
    row_count: int,
    columns: list[ColumnSummary] = list(),
    _df_cache: DataFrame | None = None,
)
```

Statistical summary of all columns in a table.

Contains row count and per-column statistics from DuckDB's SUMMARIZE command. Provides both structured access via the columns list and DataFrame conversion via the df property.

### table

```
table: str
```

Name of the summarized table.

### row_count

```
row_count: int
```

Total number of rows in the table.

### columns

```
columns: list[ColumnSummary] = field(default_factory=list)
```

Per-column statistics.

### df

```
df: DataFrame
```

Convert to DataFrame with one row per column.

Conversion is lazy - computed on first access and cached.

| RETURNS     | DESCRIPTION                       |
| ----------- | --------------------------------- |
| `DataFrame` | DataFrame with column statistics. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                                   |
| ---------------- | ------------------------------------------------------------- |
| `dict[str, Any]` | Dictionary with table name, row count, and column statistics. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with table name, row count, and column statistics.
    """
    return {
        "table": self.table,
        "row_count": self.row_count,
        "columns": [col.to_dict() for col in self.columns],
    }
```

## mixpanel_data.EventStats

```
EventStats(
    event_name: str,
    count: int,
    unique_users: int,
    first_seen: datetime,
    last_seen: datetime,
    pct_of_total: float,
)
```

Statistics for a single event type.

Contains count, unique users, date range, and percentage of total for a specific event in an events table.

### event_name

```
event_name: str
```

Name of the event.

### count

```
count: int
```

Total occurrences of this event.

### unique_users

```
unique_users: int
```

Count of distinct users who triggered this event.

### first_seen

```
first_seen: datetime
```

Earliest occurrence timestamp.

### last_seen

```
last_seen: datetime
```

Latest occurrence timestamp.

### pct_of_total

```
pct_of_total: float
```

Percentage of all events (0.0 to 100.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                                  |
| ---------------- | ------------------------------------------------------------ |
| `dict[str, Any]` | Dictionary with event statistics (datetimes as ISO strings). |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with event statistics (datetimes as ISO strings).
    """
    return {
        "event_name": self.event_name,
        "count": self.count,
        "unique_users": self.unique_users,
        "first_seen": self.first_seen.isoformat(),
        "last_seen": self.last_seen.isoformat(),
        "pct_of_total": self.pct_of_total,
    }
```

## mixpanel_data.EventBreakdownResult

```
EventBreakdownResult(
    table: str,
    total_events: int,
    total_users: int,
    date_range: tuple[datetime, datetime],
    events: list[EventStats] = list(),
    _df_cache: DataFrame | None = None,
)
```

Distribution of events in a table.

Contains aggregate statistics and per-event breakdown with counts, unique users, date ranges, and percentages.

### table

```
table: str
```

Name of the analyzed table.

### total_events

```
total_events: int
```

Total number of events in the table.

### total_users

```
total_users: int
```

Total distinct users across all events.

### date_range

```
date_range: tuple[datetime, datetime]
```

(earliest, latest) event timestamps.

### events

```
events: list[EventStats] = field(default_factory=list)
```

Per-event statistics, ordered by count descending.

### df

```
df: DataFrame
```

Convert to DataFrame with one row per event type.

Conversion is lazy - computed on first access and cached.

| RETURNS     | DESCRIPTION                      |
| ----------- | -------------------------------- |
| `DataFrame` | DataFrame with event statistics. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                                      |
| ---------------- | ------------------------------------------------ |
| `dict[str, Any]` | Dictionary with table info and event statistics. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with table info and event statistics.
    """
    return {
        "table": self.table,
        "total_events": self.total_events,
        "total_users": self.total_users,
        "date_range": [
            self.date_range[0].isoformat(),
            self.date_range[1].isoformat(),
        ],
        "events": [event.to_dict() for event in self.events],
    }
```

## mixpanel_data.ColumnStatsResult

```
ColumnStatsResult(
    table: str,
    column: str,
    dtype: str,
    count: int,
    null_count: int,
    null_pct: float,
    unique_count: int,
    unique_pct: float,
    top_values: list[tuple[Any, int]] = list(),
    min: float | None = None,
    max: float | None = None,
    mean: float | None = None,
    std: float | None = None,
    _df_cache: DataFrame | None = None,
)
```

Deep statistical analysis of a single column.

Provides detailed statistics including null rates, cardinality, top values, and numeric statistics (for numeric columns). Supports JSON path expressions for analyzing properties.

### table

```
table: str
```

Name of the source table.

### column

```
column: str
```

Column expression analyzed (may include JSON path).

### dtype

```
dtype: str
```

DuckDB data type of the column.

### count

```
count: int
```

Number of non-null values.

### null_count

```
null_count: int
```

Number of null values.

### null_pct

```
null_pct: float
```

Percentage of null values (0.0 to 100.0).

### unique_count

```
unique_count: int
```

Approximate count of distinct values.

### unique_pct

```
unique_pct: float
```

Percentage of values that are unique (0.0 to 100.0).

### top_values

```
top_values: list[tuple[Any, int]] = field(default_factory=list)
```

Most frequent (value, count) pairs.

### min

```
min: float | None = None
```

Minimum value (None for non-numeric).

### max

```
max: float | None = None
```

Maximum value (None for non-numeric).

### mean

```
mean: float | None = None
```

Mean value (None for non-numeric).

### std

```
std: float | None = None
```

Standard deviation (None for non-numeric).

### df

```
df: DataFrame
```

Convert top values to DataFrame with columns: value, count.

Conversion is lazy - computed on first access and cached.

| RETURNS     | DESCRIPTION                                 |
| ----------- | ------------------------------------------- |
| `DataFrame` | DataFrame with top values and their counts. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS          | DESCRIPTION                            |
| ---------------- | -------------------------------------- |
| `dict[str, Any]` | Dictionary with all column statistics. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all column statistics.
    """
    return {
        "table": self.table,
        "column": self.column,
        "dtype": self.dtype,
        "count": self.count,
        "null_count": self.null_count,
        "null_pct": self.null_pct,
        "unique_count": self.unique_count,
        "unique_pct": self.unique_pct,
        "top_values": [[value, count] for value, count in self.top_values],
        "min": self.min,
        "max": self.max,
        "mean": self.mean,
        "std": self.std,
    }
```

## Storage Types

## mixpanel_data.TableMetadata

```
TableMetadata(
    type: Literal["events", "profiles"],
    fetched_at: datetime,
    from_date: str | None = None,
    to_date: str | None = None,
    filter_events: list[str] | None = None,
    filter_where: str | None = None,
    filter_cohort_id: str | None = None,
    filter_output_properties: list[str] | None = None,
    filter_group_id: str | None = None,
    filter_behaviors: str | None = None,
)
```

Metadata for a data fetch operation.

This metadata is passed to table creation methods and stored in the database's internal \_metadata table for tracking fetch operations.

### type

```
type: Literal['events', 'profiles']
```

Type of data fetched.

### fetched_at

```
fetched_at: datetime
```

When the fetch completed (UTC).

### from_date

```
from_date: str | None = None
```

Start date for events (YYYY-MM-DD), None for profiles.

### to_date

```
to_date: str | None = None
```

End date for events (YYYY-MM-DD), None for profiles.

### filter_events

```
filter_events: list[str] | None = None
```

Event names filtered (if applicable).

### filter_where

```
filter_where: str | None = None
```

WHERE clause filter (if applicable).

### filter_cohort_id

```
filter_cohort_id: str | None = None
```

Cohort ID filter for profiles (if applicable).

### filter_output_properties

```
filter_output_properties: list[str] | None = None
```

Property names to include in output (if applicable).

### filter_group_id

```
filter_group_id: str | None = None
```

Group ID for group profile queries (if applicable).

### filter_behaviors

```
filter_behaviors: str | None = None
```

Serialized behaviors filter for behavioral profile queries (if applicable).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "type": self.type,
        "fetched_at": self.fetched_at.isoformat(),
        "from_date": self.from_date,
        "to_date": self.to_date,
        "filter_events": self.filter_events,
        "filter_where": self.filter_where,
        "filter_cohort_id": self.filter_cohort_id,
        "filter_output_properties": self.filter_output_properties,
        "filter_group_id": self.filter_group_id,
        "filter_behaviors": self.filter_behaviors,
    }
```

## mixpanel_data.TableInfo

```
TableInfo(
    name: str,
    type: Literal["events", "profiles"],
    row_count: int,
    fetched_at: datetime,
)
```

Information about a table in the database.

Returned by list_tables() to provide summary information about available tables without retrieving full schemas.

### name

```
name: str
```

Table name.

### type

```
type: Literal['events', 'profiles']
```

Table type.

### row_count

```
row_count: int
```

Number of rows.

### fetched_at

```
fetched_at: datetime
```

When data was fetched (UTC).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "name": self.name,
        "type": self.type,
        "row_count": self.row_count,
        "fetched_at": self.fetched_at.isoformat(),
    }
```

## mixpanel_data.ColumnInfo

```
ColumnInfo(name: str, type: str, nullable: bool, primary_key: bool = False)
```

Information about a table column.

Describes a single column's schema, including name, type, nullability constraints, and primary key status.

### name

```
name: str
```

Column name.

### type

```
type: str
```

DuckDB type (VARCHAR, TIMESTAMP, JSON, INTEGER, etc.).

### nullable

```
nullable: bool
```

Whether column allows NULL values.

### primary_key

```
primary_key: bool = False
```

Whether column is a primary key.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "name": self.name,
        "type": self.type,
        "nullable": self.nullable,
        "primary_key": self.primary_key,
    }
```

## mixpanel_data.TableSchema

```
TableSchema(table_name: str, columns: list[ColumnInfo])
```

Schema information for a table.

Returned by get_schema() to describe the structure of a table, including all column definitions.

### table_name

```
table_name: str
```

Table name.

### columns

```
columns: list[ColumnInfo]
```

Column definitions.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "table_name": self.table_name,
        "columns": [col.to_dict() for col in self.columns],
    }
```

## mixpanel_data.WorkspaceInfo

```
WorkspaceInfo(
    path: Path | None,
    project_id: str,
    region: str,
    account: str | None,
    tables: list[str],
    size_mb: float,
    created_at: datetime | None,
)
```

Information about a Workspace instance.

Returned by Workspace.info() to provide metadata about the workspace including database location, connection details, and table summary.

### path

```
path: Path | None
```

Database file path (None for ephemeral or in-memory workspaces).

### project_id

```
project_id: str
```

Mixpanel project ID.

### region

```
region: str
```

Data residency region (us, eu, in).

### account

```
account: str | None
```

Named account used (None if credentials from environment).

### tables

```
tables: list[str]
```

Names of tables in the database.

### size_mb

```
size_mb: float
```

Database file size in megabytes (0.0 for in-memory workspaces).

### created_at

```
created_at: datetime | None
```

When database was created (None if unknown).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "path": str(self.path) if self.path else None,
        "project_id": self.project_id,
        "region": self.region,
        "account": self.account,
        "tables": self.tables,
        "size_mb": self.size_mb,
        "created_at": self.created_at.isoformat() if self.created_at else None,
    }
```

Copy markdown
# CLI Reference

# CLI Overview

The `mp` command provides full access to mixpanel_data functionality from the command line.

Explore on DeepWiki

🤖 **[CLI Usage Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.1-cli-usage)**

Ask questions about CLI commands, explore options, or get help with specific workflows.

## Installation

The CLI is installed automatically with the package:

```
pip install mixpanel_data
```

Verify installation:

```
mp --version
```

## Global Options

| Option      | Short | Description                             |
| ----------- | ----- | --------------------------------------- |
| `--account` | `-a`  | Account name to use (overrides default) |
| `--quiet`   | `-q`  | Suppress progress output                |
| `--verbose` | `-v`  | Enable debug output                     |
| `--version` |       | Show version and exit                   |
| `--help`    |       | Show help and exit                      |

## Command Groups

### auth — Account Management

Manage stored credentials and accounts.

| Command          | Description              |
| ---------------- | ------------------------ |
| `mp auth list`   | List configured accounts |
| `mp auth add`    | Add a new account        |
| `mp auth remove` | Remove an account        |
| `mp auth switch` | Set the default account  |
| `mp auth show`   | Display account details  |
| `mp auth test`   | Test account credentials |

### fetch — Data Fetching

Fetch data from Mixpanel into local storage, or stream directly to stdout.

| Command             | Description                         |
| ------------------- | ----------------------------------- |
| `mp fetch events`   | Fetch events to local DuckDB        |
| `mp fetch profiles` | Fetch user profiles to local DuckDB |

**Table Options:**

| Option          | Description                                     |
| --------------- | ----------------------------------------------- |
| `--replace`     | Drop and recreate existing table                |
| `--append`      | Add data to existing table (duplicates skipped) |
| `--batch-size`  | Rows per commit (100-100000, default: 1000)     |
| `--no-progress` | Hide progress bar                               |

**Streaming Options:**

| Option     | Description                                          |
| ---------- | ---------------------------------------------------- |
| `--stdout` | Stream data as JSONL to stdout instead of storing    |
| `--raw`    | Output raw Mixpanel API format (requires `--stdout`) |

**Event Filter Options (fetch events only):**

| Option     | Short | Description                                                           |
| ---------- | ----- | --------------------------------------------------------------------- |
| `--events` | `-e`  | Comma-separated event names to filter                                 |
| `--where`  | `-w`  | Mixpanel filter expression                                            |
| `--limit`  | `-l`  | Maximum events to return (max 100000, not compatible with --parallel) |

**Parallel Fetch Options (fetch events):**

| Option         | Short | Description                                                             |
| -------------- | ----- | ----------------------------------------------------------------------- |
| `--parallel`   | `-p`  | Fetch in parallel using multiple threads (faster for large date ranges) |
| `--workers`    |       | Number of parallel workers (default: 10, only with --parallel)          |
| `--chunk-days` |       | Days per chunk for parallel fetching (default: 7, only with --parallel) |

**Parallel Fetch Options (fetch profiles):**

| Option       | Short | Description                                                                   |
| ------------ | ----- | ----------------------------------------------------------------------------- |
| `--parallel` | `-p`  | Fetch in parallel using multiple threads (up to 5x faster for large datasets) |
| `--workers`  |       | Number of parallel workers (default: 5, max: 5, only with --parallel)         |

**Profile Filter Options (fetch profiles only):**

| Option                | Short | Description                                                                          |
| --------------------- | ----- | ------------------------------------------------------------------------------------ |
| `--cohort`            | `-c`  | Filter by cohort ID (mutually exclusive with --behaviors)                            |
| `--output-properties` | `-o`  | Comma-separated properties to include                                                |
| `--where`             | `-w`  | Mixpanel filter expression                                                           |
| `--behaviors`         |       | Behavioral filter as JSON array (requires --where, mutually exclusive with --cohort) |
| `--distinct-id`       |       | Fetch a specific user by distinct_id (mutually exclusive with --distinct-ids)        |
| `--distinct-ids`      |       | Fetch specific users (repeatable flag, mutually exclusive with --distinct-id)        |
| `--group-id`          | `-g`  | Fetch group profiles instead of user profiles                                        |
| `--as-of-timestamp`   |       | Query profile state at a specific Unix timestamp                                     |
| `--include-all-users` |       | Include all users and mark cohort membership (requires --cohort)                     |

### query — Query Operations

Execute queries against local or remote data.

| Command                         | Description                                       |
| ------------------------------- | ------------------------------------------------- |
| `mp query sql`                  | Query local DuckDB with SQL                       |
| `mp query segmentation`         | Time-series event counts                          |
| `mp query funnel`               | Funnel conversion analysis                        |
| `mp query retention`            | Cohort retention analysis                         |
| `mp query jql`                  | Execute JQL scripts                               |
| `mp query event-counts`         | Multi-event time series                           |
| `mp query property-counts`      | Property breakdown time series                    |
| `mp query activity-feed`        | User event history                                |
| `mp query saved-report`         | Query saved reports (Insights, Retention, Funnel) |
| `mp query flows`                | Query saved Flows reports                         |
| `mp query frequency`            | Event frequency distribution                      |
| `mp query segmentation-numeric` | Numeric property bucketing                        |
| `mp query segmentation-sum`     | Numeric property sum                              |
| `mp query segmentation-average` | Numeric property average                          |

Saved Reports Workflow

Use `mp inspect bookmarks` to list available saved reports and get their IDs, then query them with `mp query saved-report` or `mp query flows`.

### inspect — Discovery & Introspection

Explore schema and local database.

| Command                      | Description                                 |
| ---------------------------- | ------------------------------------------- |
| `mp inspect events`          | List event names                            |
| `mp inspect properties`      | List properties for an event                |
| `mp inspect values`          | List values for a property                  |
| `mp inspect funnels`         | List saved funnels                          |
| `mp inspect cohorts`         | List saved cohorts                          |
| `mp inspect bookmarks`       | List saved reports (bookmarks)              |
| `mp inspect top-events`      | List today's top events                     |
| `mp inspect lexicon-schemas` | List Lexicon schemas from data dictionary   |
| `mp inspect lexicon-schema`  | Get a single Lexicon schema                 |
| `mp inspect info`            | Show workspace info                         |
| `mp inspect tables`          | List local tables                           |
| `mp inspect schema`          | Show table schema                           |
| `mp inspect drop`            | Drop a local table                          |
| `mp inspect drop-all`        | Drop all tables (with optional type filter) |
| `mp inspect sample`          | Random sample rows from a table             |
| `mp inspect summarize`       | Statistical summary of all columns          |
| `mp inspect breakdown`       | Event distribution analysis                 |
| `mp inspect keys`            | Discover JSON property keys                 |
| `mp inspect column`          | Deep column-level statistics                |
| `mp inspect distribution`    | Property value distribution (JQL)           |
| `mp inspect numeric`         | Numeric property statistics (JQL)           |
| `mp inspect daily`           | Daily event counts (JQL)                    |
| `mp inspect engagement`      | User engagement distribution (JQL)          |
| `mp inspect coverage`        | Property coverage analysis (JQL)            |

## Output Formats

All commands support the `--format` option:

| Format  | Description          | Use Case                  |
| ------- | -------------------- | ------------------------- |
| `json`  | Pretty-printed JSON  | Default, human-readable   |
| `jsonl` | JSON Lines           | Streaming, large datasets |
| `table` | Rich formatted table | Terminal viewing          |
| `csv`   | CSV with headers     | Spreadsheet export        |
| `plain` | Minimal text         | Scripting                 |

## Filtering with --jq

Commands that output JSON also support the `--jq` option for client-side filtering using jq syntax. This enables powerful transformations without external tools.

```
# Get first 5 events
mp inspect events --format json --jq '.[:5]'

# Filter events by name pattern
mp inspect events --format json --jq '.[] | select(startswith("User"))'

# Count results
mp inspect events --format json --jq 'length'

# Extract specific fields from query results
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
  --format json --jq '.series | to_entries | map({date: .key, count: .value})'

# Filter SQL results
mp query sql "SELECT * FROM events LIMIT 100" --format json \
  --jq '.[] | select(.event_name == "Purchase")'
```

--jq requires JSON format

The `--jq` option only works with `--format json` or `--format jsonl`. Using it with other formats produces an error.

See the [jq manual](https://jqlang.org/manual/) for filter syntax.

### Format Examples

Given this query:

```
mp query sql "SELECT event_name, COUNT(*) as count FROM events GROUP BY 1 LIMIT 3"
```

**json** (default) — Pretty-printed, easy to read:

```
[
  {
    "event_name": "Purchase",
    "count": 1523
  },
  {
    "event_name": "Signup",
    "count": 892
  },
  {
    "event_name": "Login",
    "count": 4201
  }
]
```

**jsonl** — One object per line, ideal for streaming:

```
{"event_name": "Purchase", "count": 1523}
{"event_name": "Signup", "count": 892}
{"event_name": "Login", "count": 4201}
```

**table** — Rich ASCII table for terminal viewing:

```
┏━━━━━━━━━━━━━┳━━━━━━━┓
┃ EVENT NAME  ┃ COUNT ┃
┡━━━━━━━━━━━━━╇━━━━━━━┩
│ Purchase    │ 1523  │
│ Signup      │ 892   │
│ Login       │ 4201  │
└─────────────┴───────┘
```

**csv** — Headers plus comma-separated values:

```
event_name,count
Purchase,1523
Signup,892
Login,4201
```

**plain** — Minimal output, one value per line:

```
Purchase
Signup
Login
```

### Choosing a Format

```
# Terminal viewing
mp inspect events --format table

# Export to spreadsheet
mp query sql "SELECT * FROM events" --format csv > events.csv

# Pipe to jq for processing
mp query segmentation "Purchase" --from 2025-01-01 --format json | jq '.values'

# Count results
mp inspect events --format plain | wc -l

# Stream to another tool
mp query sql "SELECT * FROM events" --format jsonl | python process.py
```

## Exit Codes

| Code | Meaning              | Exception                                    |
| ---- | -------------------- | -------------------------------------------- |
| 0    | Success              | —                                            |
| 1    | General error        | `MixpanelDataError`                          |
| 2    | Authentication error | `AuthenticationError`                        |
| 3    | Invalid arguments    | `ConfigError`, validation errors             |
| 4    | Resource not found   | `TableNotFoundError`, `AccountNotFoundError` |
| 5    | Rate limit exceeded  | `RateLimitError`                             |
| 130  | Interrupted          | Ctrl+C                                       |

## Environment Variables

| Variable         | Description               |
| ---------------- | ------------------------- |
| `MP_USERNAME`    | Service account username  |
| `MP_SECRET`      | Service account secret    |
| `MP_PROJECT_ID`  | Project ID                |
| `MP_REGION`      | Data residency region     |
| `MP_ACCOUNT`     | Account name to use       |
| `MP_CONFIG_PATH` | Override config file path |

## Examples

### Complete Workflow

```
# 1. Set up credentials (prompts for secret securely)
mp auth add production --username sa_... --project 12345 --region us

# 2. Explore schema
mp inspect events
mp inspect properties --event Purchase

# 3. Fetch data
mp fetch events jan --from 2025-01-01 --to 2025-01-31

# 4. Query locally
mp query sql "SELECT event_name, COUNT(*) FROM jan GROUP BY 1" --format table

# 5. Run live queries
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 --format table
```

### Incremental Fetching

```
# Fetch initial data
mp fetch events events --from 2025-01-01 --to 2025-01-31

# Append more data later
mp fetch events events --from 2025-02-01 --to 2025-02-28 --append

# Resume after a crash (overlapping dates are safe)
mp query sql "SELECT MAX(event_time) FROM events"
mp fetch events events --from 2025-02-15 --to 2025-02-28 --append

# Replace with fresh data
mp fetch events events --from 2025-01-01 --to 2025-02-28 --replace

# Parallel fetch for large date ranges (up to 10x faster)
mp fetch events events --from 2025-01-01 --to 2025-12-31 --parallel

# Parallel fetch with custom settings
mp fetch events events --from 2025-01-01 --to 2025-12-31 --parallel --workers 20 --chunk-days 3

# Parallel profile fetch for large datasets (up to 5x faster)
mp fetch profiles users --parallel

# Parallel profile fetch with custom workers
mp fetch profiles users --parallel --workers 3

# Parallel profile fetch with filters
mp fetch profiles premium --where 'properties["plan"] == "premium"' --parallel
```

### Piping and Scripting

```
# Export to file
mp query sql "SELECT * FROM events" --format csv > events.csv

# Built-in jq filtering (no external tools needed)
mp query segmentation --event Login --from 2025-01-01 --to 2025-01-31 \
    --format json --jq '.series | keys | length'

# Or pipe to external jq
mp query segmentation --event Login --from 2025-01-01 --to 2025-01-31 --format json \
    | jq '.values."$overall"'

# Count lines
mp query sql "SELECT * FROM events" --format jsonl | wc -l
```

### Streaming to Stdout

Stream data directly without storing locally:

```
# Stream events as JSONL
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout

# Stream profiles
mp fetch profiles --stdout

# Stream profiles filtered by cohort
mp fetch profiles --stdout --cohort 12345

# Stream specific profile properties only
mp fetch profiles --stdout --output-properties '$email,$name,plan'

# Stream profiles with behavioral filter (users who purchased in last 30 days)
mp fetch profiles --stdout \
    --behaviors '[{"window":"30d","name":"buyers","event_selectors":[{"event":"Purchase"}]}]' \
    --where '(behaviors["buyers"] > 0)'

# Fetch a specific user profile
mp fetch profiles --stdout --distinct-id user_123

# Fetch multiple specific user profiles
mp fetch profiles --stdout --distinct-ids user_123 --distinct-ids user_456

# Fetch group profiles (e.g., companies)
mp fetch profiles --stdout --group-id companies

# Pipe to jq for filtering
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \
    | jq 'select(.event_name == "Purchase")'

# Save to file
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout > events.jsonl

# Raw Mixpanel API format
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout --raw
```

## Full Command Reference

See [Commands](https://jaredmcfarland.github.io/mixpanel_data/cli/commands/index.md) for the complete auto-generated reference.

Copy markdown

# CLI Commands

Complete reference for the `mp` command-line interface.

Explore on DeepWiki

🤖 **[CLI Command Reference →](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.1-cli-command-reference)**

Ask questions about specific commands, explore options, or get examples for your use case.

### mp

Mixpanel data CLI - fetch, store, and query analytics data.

Usage:

```
mp [OPTIONS] COMMAND [ARGS]...
```

Options:

```
  -a, --account TEXT    Account name to use (overrides default).  \[env var:
                        MP_ACCOUNT]
  -q, --quiet           Suppress progress output.
  -v, --verbose         Enable debug output.
  --version             Show version and exit.
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
```

#### auth

Manage authentication and accounts.

Usage:

```
mp auth [OPTIONS] COMMAND [ARGS]...
```

##### add

Add a new account to the configuration.

The secret can be provided via:

- Interactive prompt (default, hidden input)
- MP_SECRET environment variable (for CI/CD)
- --secret-stdin flag to read from stdin

Examples:

```
mp auth add production -u myuser -p 12345
MP_SECRET=abc123 mp auth add production -u myuser -p 12345  # inline env var
echo "$SECRET" | mp auth add production -u myuser -p 12345 --secret-stdin
mp auth add staging -u myuser -p 12345 -r eu --default
```

Usage:

```
mp auth add [OPTIONS] NAME
```

Options:

```
  NAME                 Account name (identifier).  \[required]
  -u, --username TEXT  Service account username.
  -p, --project TEXT   Project ID.
  -r, --region TEXT    Region: us, eu, or in.  \[default: us]
  -d, --default        Set as default account.
  -i, --interactive    Prompt for all credentials.
  --secret-stdin       Read secret from stdin.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
```

##### list

List all configured accounts.

Shows account name, username, project ID, region, and default status.

Examples:

```
mp auth list
mp auth list --format table
```

Usage:

```
mp auth list [OPTIONS]
```

Options:

```
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
```

##### remove

Remove an account from the configuration.

Deletes the account credentials from local config. Use --force to skip the confirmation prompt.

Examples:

```
mp auth remove staging
mp auth remove old_account --force
```

Usage:

```
mp auth remove [OPTIONS] NAME
```

Options:

```
  NAME                 Account name to remove.  \[required]
  --force              Skip confirmation prompt.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
```

##### show

Show account details (secret is redacted).

Displays configuration for the named account or default if omitted.

Examples:

```
mp auth show
mp auth show production
mp auth show --format table
```

Usage:

```
mp auth show [OPTIONS] [NAME]
```

Options:

```
  [NAME]               Account name (default if omitted).
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
```

##### switch

Set an account as the default.

The default account is used when --account is not specified.

Examples:

```
mp auth switch production
mp auth switch staging
```

Usage:

```
mp auth switch [OPTIONS] NAME
```

Options:

```
  NAME                 Account name to set as default.  \[required]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
```

##### test

Test account credentials by pinging the API.

Verifies that the credentials are valid and can access the project.

Examples:

```
mp auth test
mp auth test production
```

Usage:

```
mp auth test [OPTIONS] [NAME]
```

Options:

```
  [NAME]               Account name to test (default if omitted).
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
```

#### fetch

Fetch data from Mixpanel.

Usage:

```
mp fetch [OPTIONS] COMMAND [ARGS]...
```

##### events

Fetch events from Mixpanel into local storage.

Events are stored in a DuckDB table for SQL querying. A progress bar shows fetch progress (disable with --no-progress or --quiet).

**Note:** This is a long-running operation. For large date ranges, use --parallel for up to 10x faster exports.

Use --events to filter by event name (comma-separated list). Use --where for Mixpanel expression filters (e.g., 'properties["country"]=="US"'). Use --limit to cap the number of events returned (max 100000). Use --replace to drop and recreate an existing table. Use --append to add data to an existing table. Use --parallel/-p for faster parallel fetching (recommended for large date ranges). Use --chunk-days to configure days per chunk for parallel fetching (default: 7). Use --stdout to stream JSONL to stdout instead of storing locally. Use --raw with --stdout to output raw Mixpanel API format.

**Output Structure (JSON):**

```
{
  "table": "events",
  "rows": 15234,
  "type": "events",
  "duration_seconds": 12.5,
  "date_range": ["2025-01-01", "2025-01-31"],
  "fetched_at": "2025-01-15T10:30:00Z"
}
```

**Parallel Output Structure (JSON):**

```
{
  "table": "events",
  "total_rows": 15234,
  "successful_batches": 5,
  "failed_batches": 0,
  "has_failures": false,
  "duration_seconds": 2.5,
  "fetched_at": "2025-01-15T10:30:00Z"
}
```

**Examples:**

```
mp fetch events --from 2025-01-01 --to 2025-01-31
mp fetch events signups --from 2025-01-01 --to 2025-01-31 --events "Sign Up"
mp fetch events --from 2025-01-01 --to 2025-01-31 --where 'properties["country"]=="US"'
mp fetch events --from 2025-01-01 --to 2025-01-31 --limit 10000
mp fetch events --from 2025-01-01 --to 2025-01-31 --replace
mp fetch events --from 2025-01-01 --to 2025-01-31 --append
mp fetch events --from 2025-01-01 --to 2025-01-31 --parallel
mp fetch events --from 2025-01-01 --to 2025-01-31 --parallel --chunk-days 1
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout --raw | jq '.event'
```

**jq Examples:**

```
--jq '.rows'                         # Number of events fetched (sequential)
--jq '.total_rows'                   # Number of events fetched (parallel)
--jq '.duration_seconds | round'     # Fetch duration in seconds
--jq '.date_range'                   # Date range fetched
```

Usage:

```
mp fetch events [OPTIONS] [NAME]
```

Options:

```
  [NAME]                      Table name for storing events. Ignored with
                              --stdout.
  --from TEXT                 Start date (YYYY-MM-DD).
  --to TEXT                   End date (YYYY-MM-DD).
  -e, --events TEXT           Comma-separated event filter.
  -w, --where TEXT            Mixpanel filter expression.
  -l, --limit INTEGER RANGE   Maximum events to return (max 100000).
                              [1<=x<=100000]
  --replace                   Replace existing table.
  --append                    Append to existing table.
  --no-progress               Hide progress bar.
  -p, --parallel              Fetch in parallel using multiple threads. Faster
                              for large date ranges.
  --workers INTEGER RANGE     Number of parallel workers (default: 10). Only
                              applies with --parallel.  \[x>=1]
  --chunk-days INTEGER RANGE  Days per chunk for parallel fetching (default:
                              7). Only applies with --parallel.  \[default: 7;
                              1<=x<=100]
  --stdout                    Stream to stdout as JSONL instead of storing.
  --raw                       Output raw API format (only with --stdout).
  --batch-size INTEGER RANGE  Rows per commit. Controls memory/IO tradeoff.
                              (100-100000)  \[default: 1000; 100<=x<=100000]
  -f, --format [TEXT]         Output format: json, jsonl, table, csv, plain.
                              \[default: json]
  --jq TEXT                   Apply jq filter to JSON output (requires
                              --format json or jsonl).
```

##### profiles

Fetch user profiles from Mixpanel into local storage.

Profiles are stored in a DuckDB table for SQL querying. A progress bar shows fetch progress (disable with --no-progress or --quiet).

**Note:** This can be a long-running operation for large profile sets. Use --parallel for up to 5x faster exports.

Use --where for Mixpanel expression filters on profile properties. Use --cohort to filter by cohort ID membership. Use --output-properties to select specific properties (reduces bandwidth). Use --distinct-id to fetch a single user's profile. Use --distinct-ids to fetch multiple specific users (repeatable flag). Use --group-id to fetch group profiles (e.g., companies) instead of users. Use --behaviors with --where to filter by user behavior (see --behaviors help for format). Use --as-of-timestamp to query historical profile state. Use --include-all-users with --cohort to include non-members with membership flag. Use --replace to drop and recreate an existing table. Use --append to add data to an existing table. Use --parallel/-p for faster parallel fetching (recommended for large profile sets). Use --stdout to stream JSONL to stdout instead of storing locally. Use --raw with --stdout to output raw Mixpanel API format.

**Output Structure (JSON - Sequential):**

```
{
  "table": "profiles",
  "rows": 5000,
  "type": "profiles",
  "duration_seconds": 8.2,
  "date_range": null,
  "fetched_at": "2025-01-15T10:30:00Z"
}
```

**Output Structure (JSON - Parallel):**

```
{
  "table": "profiles",
  "total_rows": 5000,
  "successful_pages": 5,
  "failed_pages": 0,
  "failed_page_indices": [],
  "duration_seconds": 1.8,
  "fetched_at": "2025-01-15T10:30:00Z"
}
```

**Examples:**

```
mp fetch profiles
mp fetch profiles users --replace
mp fetch profiles users --append
mp fetch profiles --parallel
mp fetch profiles --parallel --workers 3
mp fetch profiles --where 'properties["plan"]=="premium"'
mp fetch profiles --cohort 12345
mp fetch profiles --output-properties '$email,$name,plan'
mp fetch profiles --distinct-id user_123
mp fetch profiles --distinct-ids user_1 --distinct-ids user_2
mp fetch profiles --group-id companies
mp fetch profiles --behaviors '[{"window":"30d","name":"buyers","event_selectors":[{"event":"Purchase"}]}]' --where '(behaviors["buyers"] > 0)'
mp fetch profiles --as-of-timestamp 1704067200
mp fetch profiles --cohort 12345 --include-all-users
mp fetch profiles --stdout
mp fetch profiles --stdout --raw
```

**jq Examples:**

```
--jq '.rows'                         # Number of profiles fetched (sequential)
--jq '.total_rows'                   # Number of profiles fetched (parallel)
--jq '.table'                        # Table name created
--jq '.duration_seconds | round'     # Fetch duration in seconds
```

Usage:

```
mp fetch profiles [OPTIONS] [NAME]
```

Options:

```
  [NAME]                        Table name for storing profiles. Ignored with
                                --stdout.
  -w, --where TEXT              Mixpanel filter expression.
  -c, --cohort TEXT             Filter by cohort ID.
  -o, --output-properties TEXT  Comma-separated properties to include.
  --replace                     Replace existing table.
  --append                      Append to existing table.
  --no-progress                 Hide progress bar.
  --stdout                      Stream to stdout as JSONL instead of storing.
  --raw                         Output raw API format (only with --stdout).
  --batch-size INTEGER RANGE    Rows per commit. Controls memory/IO tradeoff.
                                (100-100000)  \[default: 1000; 100<=x<=100000]
  --distinct-id TEXT            Fetch a specific user by distinct_id. Mutually
                                exclusive with --distinct-ids.
  --distinct-ids TEXT           Fetch specific users by distinct_id (can be
                                repeated). Mutually exclusive with --distinct-
                                id.
  -g, --group-id TEXT           Fetch group profiles (e.g., 'companies')
                                instead of user profiles.
  --behaviors TEXT              Behavioral filter as JSON array. Each behavior
                                needs: "window" (e.g., "30d"), "name"
                                (identifier), and "event_selectors" (array
                                with {"event":"Name"}). Use with --where to
                                filter by behavior count, e.g., --where
                                '(behaviors["name"] > 0)'. Example: '[{"window
                                ":"30d","name":"buyers","event_selectors":[{"e
                                vent":"Purchase"}]}]'. Mutually exclusive with
                                --cohort.
  --as-of-timestamp INTEGER     Query profile state at a specific Unix
                                timestamp (must be in the past).
  --include-all-users           Include all users and mark cohort membership.
                                Requires --cohort.
  -p, --parallel                Fetch in parallel using multiple threads. Up
                                to 5x faster for large exports.
  --workers INTEGER             Number of parallel workers (default: 5, max:
                                5). Only applies with --parallel.
  -f, --format [TEXT]           Output format: json, jsonl, table, csv, plain.
                                \[default: json]
  --jq TEXT                     Apply jq filter to JSON output (requires
                                --format json or jsonl).
```

#### inspect

Inspect schema and local database.

Usage:

```
mp inspect [OPTIONS] COMMAND [ARGS]...
```

##### bookmarks

List saved reports (bookmarks) in Mixpanel project.

Calls the Mixpanel API to retrieve saved report definitions. Use the bookmark ID with 'mp query saved-report' or 'mp query flows'.

Output Structure (JSON):

```
[
  {"id": 98765, "name": "Weekly KPIs", "type": "insights", "modified": "2024-01-15T10:30:00"},
  {"id": 98766, "name": "Conversion Funnel", "type": "funnels", "modified": "2024-01-14T15:45:00"},
  {"id": 98767, "name": "User Retention", "type": "retention", "modified": "2024-01-13T09:20:00"}
]
```

Examples:

```
mp inspect bookmarks
mp inspect bookmarks --type insights
mp inspect bookmarks --type funnels --format table
```

**jq Examples:**

```
--jq '[.[] | select(.type == "insights")]'    # Get bookmarks by type
--jq '[.[].id]'                               # Get bookmark IDs only
--jq 'sort_by(.modified) | reverse'           # Sort by modified date (newest first)
--jq '.[] | select(.name | test("KPI"; "i"))' # Find bookmark by name
```

Usage:

```
mp inspect bookmarks [OPTIONS]
```

Options:

```
  -t, --type TEXT      Filter by type: insights, funnels, retention, flows,
                       launch-analysis.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### breakdown

Show event distribution in a table.

Analyzes event counts, unique users, date ranges, and percentages for each event type. Requires event_name, event_time, distinct_id columns.

Output Structure (JSON):

```
{
  "table": "events",
  "total_events": 125000,
  "total_users": 8500,
  "date_range": ["2024-01-01T00:00:00", "2024-01-31T23:59:59"],
  "events": [
    {
      "event_name": "Page View",
      "count": 75000,
      "unique_users": 8200,
      "first_seen": "2024-01-01T00:05:00",
      "last_seen": "2024-01-31T23:55:00",
      "pct_of_total": 60.0
    },
    {
      "event_name": "Purchase",
      "count": 5000,
      "unique_users": 2100,
      "first_seen": "2024-01-01T08:30:00",
      "last_seen": "2024-01-31T22:15:00",
      "pct_of_total": 4.0
    }
  ]
}
```

Examples:

```
mp inspect breakdown -t events
mp inspect breakdown -t events --format json
```

**jq Examples:**

```
--jq '.events | sort_by(.count) | reverse | [.[].event_name]'    # Event names sorted by count
--jq '.events | [.[] | select(.pct_of_total > 10)]'              # Events with more than 10%
--jq '.total_events'                                              # Get total event count
--jq '.events | max_by(.unique_users)'                            # Event with most unique users
```

Usage:

```
mp inspect breakdown [OPTIONS]
```

Options:

```
  -t, --table TEXT     Table name.  \[required]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### cohorts

List saved cohorts in Mixpanel project.

Calls the Mixpanel API to retrieve saved cohort definitions. Shows cohort ID, name, user count, and description.

Output Structure (JSON):

```
[
  {"id": 1001, "name": "Power Users", "count": 5420, "description": "Users with 10+ sessions"},
  {"id": 1002, "name": "Trial Users", "count": 892, "description": "Active trial accounts"},
  {"id": 1003, "name": "Churned", "count": 2341, "description": "No activity in 30 days"}
]
```

Examples:

```
mp inspect cohorts
mp inspect cohorts --format table
```

**jq Examples:**

```
--jq '[.[] | select(.count > 1000)]'           # Cohorts with more than 1000 users
--jq '[.[].name]'                              # Get cohort names only
--jq 'sort_by(.count) | reverse'               # Sort by user count descending
--jq '.[] | select(.name == "Power Users")'    # Find cohort by name
```

Usage:

```
mp inspect cohorts [OPTIONS]
```

Options:

```
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### column

Show detailed statistics for a single column.

Performs deep analysis including null rates, cardinality, top values, and numeric statistics. Supports JSON path expressions like "properties->>'$.country'" for analyzing JSON columns.

Output Structure (JSON):

```
{
  "table": "events",
  "column": "properties->>'$.country'",
  "dtype": "VARCHAR",
  "count": 120000,
  "null_count": 5000,
  "null_pct": 4.0,
  "unique_count": 45,
  "unique_pct": 0.04,
  "top_values": [["US", 45000], ["UK", 22000], ["DE", 15000]],
  "min": null,
  "max": null,
  "mean": null,
  "std": null
}
```

Examples:

```
mp inspect column -t events -c event_name
mp inspect column -t events -c "properties->>'$.country'"
mp inspect column -t events -c distinct_id --top 20
```

**jq Examples:**

```
--jq '.top_values'               # Get top values only
--jq '.null_pct'                 # Get null percentage
--jq '.unique_count'             # Get unique count
--jq '.top_values | map(.[0])'   # Get top value names only
```

Usage:

```
mp inspect column [OPTIONS]
```

Options:

```
  -t, --table TEXT     Table name.  \[required]
  -c, --column TEXT    Column name or expression.  \[required]
  --top INTEGER        Number of top values to show.  \[default: 10]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### coverage

Show property coverage statistics from Mixpanel.

Uses JQL to count how often each property is defined (non-null) vs undefined. Useful for data quality assessment.

Output Structure (JSON):

```
{
  "event": "Purchase",
  "from_date": "2024-01-01",
  "to_date": "2024-01-31",
  "total_events": 5000,
  "coverage": [
    {"property": "amount", "defined_count": 5000, "null_count": 0, "coverage_percentage": 100.0},
    {"property": "coupon_code", "defined_count": 1250, "null_count": 3750, "coverage_percentage": 25.0},
    {"property": "referrer", "defined_count": 4500, "null_count": 500, "coverage_percentage": 90.0}
  ]
}
```

Examples:

```
mp inspect coverage -e Purchase -p coupon_code,referrer --from 2024-01-01 --to 2024-01-31
```

**jq Examples:**

```
--jq '.coverage | [.[] | select(.coverage_percentage < 50)]'     # Properties with low coverage
--jq '.coverage | [.[] | select(.coverage_percentage == 100)]'   # Fully covered properties
--jq '.coverage | [.[].property]'                                # Get property names only
--jq '.coverage | sort_by(.coverage_percentage)'                 # Sort by coverage percentage
```

Usage:

```
mp inspect coverage [OPTIONS]
```

Options:

```
  -e, --event TEXT       Event name to analyze.  \[required]
  -p, --properties TEXT  Comma-separated property names to check.  \[required]
  --from TEXT            Start date (YYYY-MM-DD).  \[required]
  --to TEXT              End date (YYYY-MM-DD).  \[required]
  -f, --format [TEXT]    Output format: json, jsonl, table, csv, plain.
                         \[default: json]
  --jq TEXT              Apply jq filter to JSON output (requires --format
                         json or jsonl).
```

##### daily

Show daily event counts from Mixpanel.

Uses JQL to count events by day. Optionally filter to specific events. Useful for understanding activity trends over time.

Output Structure (JSON):

```
{
  "from_date": "2024-01-01",
  "to_date": "2024-01-07",
  "events": ["Purchase", "Signup"],
  "counts": [
    {"date": "2024-01-01", "event": "Purchase", "count": 150},
    {"date": "2024-01-01", "event": "Signup", "count": 45},
    {"date": "2024-01-02", "event": "Purchase", "count": 175},
    {"date": "2024-01-02", "event": "Signup", "count": 52}
  ]
}
```

Examples:

```
mp inspect daily --from 2024-01-01 --to 2024-01-07
mp inspect daily --from 2024-01-01 --to 2024-01-07 -e Purchase,Signup
```

**jq Examples:**

```
--jq '.counts | [.[] | select(.event == "Purchase")] | map(.count) | add'              # Total for one event
--jq '.counts | [.[] | select(.date == "2024-01-01")]'                                 # Counts for specific date
--jq '.counts | [.[].date] | unique'                                                   # Get all dates
--jq '.counts | group_by(.date) | [.[] | {date: .[0].date, total: map(.count) | add}]' # Daily totals
```

Usage:

```
mp inspect daily [OPTIONS]
```

Options:

```
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -e, --events TEXT    Comma-separated event names (or all if omitted).
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### distribution

Show property value distribution from Mixpanel.

Uses JQL to count occurrences of each value for a property, showing counts and percentages sorted by frequency. Useful for understanding what values a property contains before writing queries.

Output Structure (JSON):

```
{
  "event": "Purchase",
  "property_name": "country",
  "from_date": "2024-01-01",
  "to_date": "2024-01-31",
  "total_count": 50000,
  "values": [
    {"value": "US", "count": 25000, "percentage": 50.0},
    {"value": "UK", "count": 10000, "percentage": 20.0},
    {"value": "DE", "count": 7500, "percentage": 15.0}
  ]
}
```

Examples:

```
mp inspect distribution -e Purchase -p country --from 2024-01-01 --to 2024-01-31
mp inspect distribution -e Signup -p referrer --from 2024-01-01 --to 2024-01-31 --limit 10
```

**jq Examples:**

```
--jq '.values | [.[].value]'                          # Get values only
--jq '.values | [.[] | select(.percentage > 10)]'     # Values with more than 10%
--jq '.total_count'                                   # Get total count
--jq '.values[0]'                                     # Get top value
```

Usage:

```
mp inspect distribution [OPTIONS]
```

Options:

```
  -e, --event TEXT     Event name to analyze.  \[required]
  -p, --property TEXT  Property name to get distribution for.  \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -l, --limit INTEGER  Maximum values to return.  \[default: 20]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### drop

Drop a table from the local database.

Permanently removes a table and all its data. Use --force to skip the confirmation prompt. Commonly used before re-fetching data.

Output Structure (JSON):

```
{"dropped": "old_events"}
```

Examples:

```
mp inspect drop -t old_events
mp inspect drop -t events --force
```

**jq Examples:**

```
--jq '.dropped'    # Get dropped table name
```

Usage:

```
mp inspect drop [OPTIONS]
```

Options:

```
  -t, --table TEXT     Table name to drop.  \[required]
  --force              Skip confirmation prompt.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### drop-all

Drop all tables from the local database.

Permanently removes all tables and their data. Use --type to filter by table type. Use --force to skip the confirmation prompt.

Output Structure (JSON):

```
{"dropped_count": 3}

# With type filter:
{"dropped_count": 2, "type_filter": "events"}
```

Examples:

```
mp inspect drop-all --force
mp inspect drop-all --type events --force
mp inspect drop-all -t profiles --force
```

**jq Examples:**

```
--jq '.dropped_count'        # Get count of dropped tables
--jq '.dropped_count > 0'    # Check if any tables were dropped
```

Usage:

```
mp inspect drop-all [OPTIONS]
```

Options:

```
  -t, --type TEXT      Only drop tables of this type: events or profiles.
  --force              Skip confirmation prompt.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### engagement

Show user engagement distribution from Mixpanel.

Uses JQL to bucket users by their event count, showing how many users performed N events. Useful for understanding user engagement levels.

Output Structure (JSON):

```
{
  "from_date": "2024-01-01",
  "to_date": "2024-01-31",
  "events": null,
  "total_users": 8500,
  "buckets": [
    {"bucket_min": 1, "bucket_label": "1", "user_count": 2500, "percentage": 29.4},
    {"bucket_min": 2, "bucket_label": "2-5", "user_count": 3200, "percentage": 37.6},
    {"bucket_min": 6, "bucket_label": "6-10", "user_count": 1800, "percentage": 21.2},
    {"bucket_min": 11, "bucket_label": "11+", "user_count": 1000, "percentage": 11.8}
  ]
}
```

Examples:

```
mp inspect engagement --from 2024-01-01 --to 2024-01-31
mp inspect engagement --from 2024-01-01 --to 2024-01-31 -e Purchase
mp inspect engagement --from 2024-01-01 --to 2024-01-31 --buckets 1,5,10,50,100
```

**jq Examples:**

```
--jq '.total_users'                                               # Get total users
--jq '.buckets | [.[] | select(.bucket_min >= 10)]'               # Power users (high engagement)
--jq '.buckets | .[] | select(.bucket_min == 1) | .percentage'    # Single-event user percentage
--jq '.buckets | [.[].bucket_label]'                              # Get bucket labels only
```

Usage:

```
mp inspect engagement [OPTIONS]
```

Options:

```
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -e, --events TEXT    Comma-separated event names (or all if omitted).
  --buckets TEXT       Comma-separated bucket boundaries (e.g., 1,5,10,50).
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### events

List all event names from Mixpanel project.

Calls the Mixpanel API to retrieve tracked event types. Use this to discover what events exist before fetching or querying.

Output Structure (JSON):

```
["Sign Up", "Login", "Purchase", "Page View", "Add to Cart"]
```

Examples:

```
mp inspect events
mp inspect events --format table
mp inspect events --format json --jq '.[0:3]'
```

**jq Examples:**

```
--jq '.[0:5]'                                 # Get first 5 events
--jq 'length'                                 # Count total events
--jq '[.[] | select(contains("Purchase"))]'  # Find events containing "Purchase"
--jq 'sort'                                   # Sort alphabetically
```

Usage:

```
mp inspect events [OPTIONS]
```

Options:

```
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### funnels

List saved funnels in Mixpanel project.

Calls the Mixpanel API to retrieve saved funnel definitions. Use the funnel_id with 'mp query funnel' to run funnel analysis.

Output Structure (JSON):

```
[
  {"funnel_id": 12345, "name": "Onboarding Flow"},
  {"funnel_id": 12346, "name": "Purchase Funnel"},
  {"funnel_id": 12347, "name": "Trial to Paid"}
]
```

Examples:

```
mp inspect funnels
mp inspect funnels --format table
```

**jq Examples:**

```
--jq '[.[].funnel_id]'                               # Get all funnel IDs
--jq '.[] | select(.name | test("Purchase"; "i"))'   # Find funnel by name pattern
--jq '[.[].name]'                                    # Get funnel names only
--jq 'length'                                        # Count funnels
```

Usage:

```
mp inspect funnels [OPTIONS]
```

Options:

```
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### info

Show workspace information.

Shows current account configuration, database location, and connection status. Uses local configuration only (no API call).

Output Structure (JSON):

```
{
  "path": "/path/to/mixpanel.db",
  "project_id": "12345",
  "region": "us",
  "account": "production",
  "tables": ["events", "profiles"],
  "size_mb": 42.5,
  "created_at": "2024-01-10T08:00:00"
}
```

Examples:

```
mp inspect info
mp inspect info --format json
```

**jq Examples:**

```
--jq '.path'         # Get database path
--jq '.project_id'   # Get project ID
--jq '.tables'       # Get list of tables
--jq '.size_mb'      # Get database size in MB
```

Usage:

```
mp inspect info [OPTIONS]
```

Options:

```
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### keys

List JSON property keys in a table.

Extracts distinct keys from the 'properties' JSON column. Useful for discovering queryable fields in event properties.

Output Structure (JSON):

```
["amount", "browser", "campaign", "country", "currency", "device", "platform"]
```

Examples:

```
mp inspect keys -t events
mp inspect keys -t events -e "Purchase"
mp inspect keys -t events --format table
```

**jq Examples:**

```
--jq '.[0:10]'                            # Get first 10 keys
--jq 'length'                             # Count total property keys
--jq '[.[] | select(contains("utm"))]'    # Find keys containing "utm"
--jq 'sort'                               # Sort keys alphabetically
```

Usage:

```
mp inspect keys [OPTIONS]
```

Options:

```
  -t, --table TEXT     Table name.  \[required]
  -e, --event TEXT     Filter to specific event type.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### lexicon-schema

Get a single Lexicon schema from Mixpanel data dictionary.

Retrieves the full schema definition for a specific event or profile property, including all property definitions and metadata.

Output Structure (JSON):

```
{
  "entity_type": "event",
  "name": "Purchase",
  "schema_json": {
    "description": "User completed a purchase",
    "properties": {
      "amount": {"type": "number", "description": "Purchase amount in USD"},
      "currency": {"type": "string", "description": "Currency code"},
      "product_id": {"type": "string", "description": "Product identifier"}
    },
    "metadata": {"hidden": false, "dropped": false, "tags": ["revenue"]}
  }
}
```

Examples:

```
mp inspect lexicon-schema --type event --name "Purchase"
mp inspect lexicon-schema -t event -n "Sign Up"
mp inspect lexicon-schema -t profile -n "Plan Type" --format json
```

**jq Examples:**

```
--jq '.schema_json.properties | keys'                                                     # Get property names only
--jq '.schema_json.properties | to_entries | [.[] | {name: .key, type: .value.type}]'    # Get property types
--jq '.schema_json.description'                                                           # Get description
--jq '.schema_json.metadata.hidden'                                                       # Check if schema is hidden
```

Usage:

```
mp inspect lexicon-schema [OPTIONS]
```

Options:

```
  -t, --type TEXT      Entity type: event, profile, custom_event, etc.
                       \[required]
  -n, --name TEXT      Entity name.  \[required]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### lexicon-schemas

List Lexicon schemas from Mixpanel data dictionary.

Retrieves documented event and profile property schemas from the Mixpanel Lexicon. Shows schema names, types, and property counts.

Output Structure (JSON):

```
[
  {"entity_type": "event", "name": "Purchase", "property_count": 12, "description": "User completed purchase"},
  {"entity_type": "event", "name": "Sign Up", "property_count": 8, "description": "New user registration"},
  {"entity_type": "profile", "name": "Plan Type", "property_count": 3, "description": "User subscription tier"}
]
```

Examples:

```
mp inspect lexicon-schemas
mp inspect lexicon-schemas --type event
mp inspect lexicon-schemas --type profile --format table
```

**jq Examples:**

```
--jq '[.[] | select(.entity_type == "event")]'             # Get only event schemas
--jq '[.[].name]'                                          # Get schema names
--jq '[.[] | select(.property_count > 10)]'                # Schemas with many properties
--jq '[.[] | select(.description | test("purchase"; "i"))]' # Search by description
```

Usage:

```
mp inspect lexicon-schemas [OPTIONS]
```

Options:

```
  -t, --type TEXT      Entity type: event, profile, custom_event, etc.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### numeric

Show numeric property statistics from Mixpanel.

Uses JQL to compute min, max, avg, stddev, and percentiles for a numeric property. Useful for understanding value ranges and distributions.

Output Structure (JSON):

```
{
  "event": "Purchase",
  "property_name": "amount",
  "from_date": "2024-01-01",
  "to_date": "2024-01-31",
  "count": 5000,
  "min": 9.99,
  "max": 999.99,
  "sum": 125000.50,
  "avg": 25.00,
  "stddev": 45.75,
  "percentiles": {"25": 12.99, "50": 19.99, "75": 49.99, "90": 99.99}
}
```

Examples:

```
mp inspect numeric -e Purchase -p amount --from 2024-01-01 --to 2024-01-31
mp inspect numeric -e Purchase -p amount --from 2024-01-01 --to 2024-01-31 --percentiles 10,50,90
```

**jq Examples:**

```
--jq '.avg'               # Get average value
--jq '.percentiles["50"]' # Get median (50th percentile)
--jq '{min, max}'         # Get min and max
--jq '.percentiles'       # Get all percentiles
```

Usage:

```
mp inspect numeric [OPTIONS]
```

Options:

```
  -e, --event TEXT     Event name to analyze.  \[required]
  -p, --property TEXT  Numeric property name.  \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  --percentiles TEXT   Comma-separated percentiles (e.g., 25,50,75,90).
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### properties

List properties for a specific event.

Calls the Mixpanel API to retrieve property names tracked with an event. Shows both custom event properties and default Mixpanel properties.

Output Structure (JSON):

```
["country", "browser", "device", "$city", "$region", "plan_type"]
```

Examples:

```
mp inspect properties -e "Sign Up"
mp inspect properties -e "Purchase" --format table
```

**jq Examples:**

```
--jq '.[0:10]'                                    # Get first 10 properties
--jq '[.[] | select(startswith("$") | not)]'     # User-defined properties (no $ prefix)
--jq '[.[] | select(startswith("$"))]'           # Mixpanel system properties ($ prefix)
--jq 'length'                                     # Count properties
```

Usage:

```
mp inspect properties [OPTIONS]
```

Options:

```
  -e, --event TEXT     Event name.  \[required]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### sample

Show random sample rows from a table.

Uses reservoir sampling to return representative rows from throughout the table. Useful for quickly exploring data structure and values.

Output Structure (JSON):

```
[
  {
    "event_name": "Purchase",
    "event_time": "2024-01-15T10:30:00",
    "distinct_id": "user_123",
    "properties": {"amount": 99.99, "currency": "USD", "product": "Pro Plan"}
  },
  {
    "event_name": "Login",
    "event_time": "2024-01-15T09:15:00",
    "distinct_id": "user_456",
    "properties": {"browser": "Chrome", "platform": "web"}
  }
]
```

Examples:

```
mp inspect sample -t events
mp inspect sample -t events -n 5 --format json
```

**jq Examples:**

```
--jq '[.[].event_name]'                                    # Get event names from sample
--jq '[.[].distinct_id] | unique'                          # Get unique distinct_ids
--jq '[.[].properties.country]'                            # Extract specific property
--jq '[.[] | select(.event_name == "Purchase")]'           # Filter sample by event type
```

Usage:

```
mp inspect sample [OPTIONS]
```

Options:

```
  -t, --table TEXT     Table name.  \[required]
  -n, --rows INTEGER   Number of rows to sample.  \[default: 10]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### schema

Show schema for a table in local database.

Lists all columns with their types and nullability constraints. Useful for understanding the data structure before writing SQL.

Note: The --sample option is reserved for future implementation.

Output Structure (JSON):

```
{
  "table": "events",
  "columns": [
    {"name": "event_name", "type": "VARCHAR", "nullable": false},
    {"name": "event_time", "type": "TIMESTAMP", "nullable": false},
    {"name": "distinct_id", "type": "VARCHAR", "nullable": false},
    {"name": "properties", "type": "JSON", "nullable": true}
  ]
}
```

Examples:

```
mp inspect schema -t events
mp inspect schema -t events --format table
```

**jq Examples:**

```
--jq '.columns | [.[].name]'                   # Get column names only
--jq '.columns | [.[] | select(.nullable)]'   # Get nullable columns
--jq '.columns | [.[] | {name, type}]'        # Get column types
--jq '.columns | length'                       # Count columns
```

Usage:

```
mp inspect schema [OPTIONS]
```

Options:

```
  -t, --table TEXT     Table name.  \[required]
  --sample             Include sample values.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### summarize

Show statistical summary of all columns in a table.

Uses DuckDB's SUMMARIZE command to compute per-column statistics including min/max, quartiles, null percentage, and distinct counts.

Output Structure (JSON):

```
{
  "table": "events",
  "row_count": 125000,
  "columns": [
    {
      "column_name": "event_name",
      "column_type": "VARCHAR",
      "min": "Add to Cart",
      "max": "View Page",
      "approx_unique": 25,
      "avg": null,
      "std": null,
      "q25": null,
      "q50": null,
      "q75": null,
      "count": 125000,
      "null_percentage": 0.0
    }
  ]
}
```

Examples:

```
mp inspect summarize -t events
mp inspect summarize -t events --format json
```

**jq Examples:**

```
--jq '.columns | [.[].column_name]'                         # Get column names
--jq '.columns | [.[] | select(.null_percentage > 0)]'     # Find columns with nulls
--jq '.row_count'                                           # Get row count
--jq '.columns | [.[] | select(.approx_unique > 1000)]'    # High-cardinality columns
```

Usage:

```
mp inspect summarize [OPTIONS]
```

Options:

```
  -t, --table TEXT     Table name.  \[required]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### tables

List tables in local database.

Shows all tables in the local DuckDB database with row counts and fetch timestamps. Use this to see what data has been fetched.

Output Structure (JSON):

```
[
  {"name": "events", "type": "events", "row_count": 125000, "fetched_at": "2024-01-15T10:30:00"},
  {"name": "jan_events", "type": "events", "row_count": 45000, "fetched_at": "2024-01-10T08:00:00"},
  {"name": "profiles", "type": "profiles", "row_count": 8500, "fetched_at": "2024-01-14T14:20:00"}
]
```

Examples:

```
mp inspect tables
mp inspect tables --format table
```

**jq Examples:**

```
--jq '[.[].name]'                               # Get table names only
--jq '[.[] | select(.row_count > 100000)]'      # Tables with more than 100k rows
--jq '[.[] | select(.type == "events")]'        # Get only event tables
--jq '[.[].row_count] | add'                    # Total row count across all tables
```

Usage:

```
mp inspect tables [OPTIONS]
```

Options:

```
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### top-events

List today's top events by count.

Calls the Mixpanel API to retrieve today's most frequent events. Useful for quick overview of project activity.

Output Structure (JSON):

```
[
  {"event": "Page View", "count": 15234, "percent_change": 12.5},
  {"event": "Login", "count": 8921, "percent_change": -3.2},
  {"event": "Purchase", "count": 1456, "percent_change": 8.7}
]
```

Examples:

```
mp inspect top-events
mp inspect top-events --limit 20 --format table
mp inspect top-events --type unique
```

**jq Examples:**

```
--jq '[.[] | select(.percent_change > 0)]'    # Events with positive growth
--jq '[.[].event]'                            # Get just event names
--jq '[.[] | select(.count > 10000)]'         # Events with count over 10000
--jq 'max_by(.percent_change)'                # Event with highest growth
```

Usage:

```
mp inspect top-events [OPTIONS]
```

Options:

```
  -t, --type TEXT      Count type: general, unique, average.  \[default:
                       general]
  -l, --limit INTEGER  Maximum events to return.  \[default: 10]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### values

List sample values for a property.

Calls the Mixpanel API to retrieve sample values for a property. Useful for understanding the data shape before writing queries.

Output Structure (JSON):

```
["US", "UK", "DE", "FR", "CA", "AU", "JP"]
```

Examples:

```
mp inspect values -p country
mp inspect values -p country -e "Sign Up" --limit 20
mp inspect values -p browser --format table
```

**jq Examples:**

```
--jq '.[0:5]'                          # Get first 5 values
--jq 'length'                          # Count unique values
--jq '[.[] | select(test("^U"))]'      # Filter values matching pattern
--jq 'sort'                            # Sort values alphabetically
```

Usage:

```
mp inspect values [OPTIONS]
```

Options:

```
  -p, --property TEXT  Property name.  \[required]
  -e, --event TEXT     Event name (optional).
  -l, --limit INTEGER  Maximum values to return.  \[default: 100]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

#### query

Query local and live data.

Usage:

```
mp query [OPTIONS] COMMAND [ARGS]...
```

##### activity-feed

Query user activity feed for specific users.

Retrieves the event history for one or more users identified by their distinct_id. Pass comma-separated IDs to --users.

Optionally filter by date range with --from and --to. Without date filters, returns recent activity (API default).

**Output Structure (JSON):**

```
{
  "distinct_ids": ["user123", "user456"],
  "from_date": "2025-01-01",
  "to_date": "2025-01-31",
  "event_count": 47,
  "events": [
    {
      "event": "Login",
      "time": "2025-01-15T10:30:00+00:00",
      "properties": {"$browser": "Chrome", "$city": "San Francisco", ...}
    },
    {
      "event": "Purchase",
      "time": "2025-01-15T11:45:00+00:00",
      "properties": {"product_id": "SKU123", "amount": 99.99, ...}
    }
  ]
}
```

**Examples:**

```
mp query activity-feed --users "user123"
mp query activity-feed --users "user123,user456" --from 2025-01-01 --to 2025-01-31
mp query activity-feed --users "user123" --format table
```

**jq Examples:**

```
--jq '.event_count'                  # Total number of events
--jq '.events | length'              # Same as above
--jq '.events[].event'               # List all event names
--jq '.events | group_by(.event) | map({event: .[0].event, count: length})'
```

Usage:

```
mp query activity-feed [OPTIONS]
```

Options:

```
  -U, --users TEXT     Comma-separated distinct IDs.  \[required]
  --from TEXT          Start date (YYYY-MM-DD).
  --to TEXT            End date (YYYY-MM-DD).
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### event-counts

Query event counts over time for multiple events.

Compares multiple events on the same time series. Pass comma-separated event names to --events (e.g., --events "Sign Up,Login,Purchase").

The --type option controls how counts are calculated:

- general: Total event occurrences (default)
- unique: Unique users who triggered the event
- average: Average events per user

**Output Structure (JSON):**

```
{
  "events": ["Sign Up", "Login", "Purchase"],
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "unit": "day",
  "type": "general",
  "series": {
    "Sign Up": {"2025-01-01": 150, "2025-01-02": 175, ...},
    "Login": {"2025-01-01": 520, "2025-01-02": 610, ...},
    "Purchase": {"2025-01-01": 45, "2025-01-02": 52, ...}
  }
}
```

**Examples:**

```
mp query event-counts --events "Sign Up,Login,Purchase" --from 2025-01-01 --to 2025-01-31
mp query event-counts --events "Sign Up,Purchase" --from 2025-01-01 --to 2025-01-31 --type unique
mp query event-counts --events "Login" --from 2025-01-01 --to 2025-01-31 --unit week
```

**jq Examples:**

```
--jq '.series | keys'                # List event names
--jq '.series["Login"] | add'        # Sum counts for one event
--jq '.series["Login"]["2025-01-01"]'  # Count for specific date
--jq '[.series | to_entries[] | {event: .key, total: (.value | add)}]'
```

Usage:

```
mp query event-counts [OPTIONS]
```

Options:

```
  -e, --events TEXT    Comma-separated event names.  \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -t, --type TEXT      Count type: general, unique, average.  \[default:
                       general]
  -u, --unit TEXT      Time unit: day, week, month.  \[default: day]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### flows

Query a saved Flows report by bookmark ID.

Retrieves data from a saved Flows report in Mixpanel. The bookmark_id can be found in the URL when viewing a flows report (the numeric ID after /flows/).

Flows reports show user paths through a sequence of events with step-by-step conversion rates and path breakdowns.

**Output Structure (JSON):**

```
{
  "bookmark_id": 12345,
  "computed_at": "2025-01-15T10:30:00Z",
  "steps": [
    {"step": 1, "event": "Sign Up", "count": 10000},
    {"step": 2, "event": "Verify Email", "count": 7500},
    {"step": 3, "event": "Complete Profile", "count": 4200}
  ],
  "breakdowns": [
    {"path": ["Sign Up", "Verify Email", "Complete Profile"], "count": 3800},
    {"path": ["Sign Up", "Verify Email", "Drop Off"], "count": 3300}
  ],
  "overall_conversion_rate": 0.42,
  "metadata": {...}
}
```

**Examples:**

```
mp query flows 12345
mp query flows 12345 --format table
```

**jq Examples:**

```
--jq '.overall_conversion_rate'      # End-to-end conversion rate
--jq '.steps | length'               # Number of flow steps
--jq '.steps[] | {event, count}'     # Event and count per step
--jq '.breakdowns | sort_by(.count) | reverse | .[0]'
```

Usage:

```
mp query flows [OPTIONS] BOOKMARK_ID
```

Options:

```
  BOOKMARK_ID          Saved flows report bookmark ID.  \[required]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### frequency

Analyze event frequency distribution (addiction analysis).

Shows how many users performed an event N times within each time period. Useful for understanding user engagement depth and "power user" distribution.

The --addiction-unit controls granularity of frequency buckets (hour or day). For example, with --addiction-unit hour, the data shows how many users performed the event 1 time, 2 times, 3 times, etc. per hour.

**Output Structure (JSON):**

```
{
  "event": "Login",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "unit": "day",
  "addiction_unit": "hour",
  "data": {
    "2025-01-01": [500, 250, 125, 60, 30, 15],
    "2025-01-02": [520, 260, 130, 65, 32, 16],
    ...
  }
}
```

Each array shows user counts by frequency (index 0 = 1x, index 1 = 2x, etc.).

**Examples:**

```
mp query frequency --from 2025-01-01 --to 2025-01-31
mp query frequency -e "Login" --from 2025-01-01 --to 2025-01-31
mp query frequency -e "Login" --from 2025-01-01 --to 2025-01-31 --addiction-unit day
```

**jq Examples:**

```
--jq '.data | keys'                  # List all dates
--jq '.data["2025-01-01"][0]'        # Users who did it once on Jan 1
--jq '.data["2025-01-01"] | add'     # Total active users on Jan 1
--jq '.data | to_entries | map({date: .key, power_users: .value[4:] | add})'
```

Usage:

```
mp query frequency [OPTIONS]
```

Options:

```
  --from TEXT            Start date (YYYY-MM-DD).  \[required]
  --to TEXT              End date (YYYY-MM-DD).  \[required]
  -e, --event TEXT       Event name (all events if omitted).
  -u, --unit TEXT        Time unit: day, week, month.  \[default: day]
  --addiction-unit TEXT  Addiction unit: hour, day.  \[default: hour]
  -w, --where TEXT       Filter expression.
  -f, --format [TEXT]    Output format: json, jsonl, table, csv, plain.
                         \[default: json]
  --jq TEXT              Apply jq filter to JSON output (requires --format
                         json or jsonl).
```

##### funnel

Run live funnel analysis against Mixpanel API.

Analyzes conversion through a saved funnel's steps. The funnel_id can be found in the Mixpanel UI URL when viewing the funnel, or via 'mp inspect funnels'.

**Output Structure (JSON):**

```
{
  "funnel_id": 12345,
  "funnel_name": "Onboarding Funnel",
  "from_date": "2025-01-01",
  "to_date": "2025-01-31",
  "conversion_rate": 0.23,
  "steps": [
    {"event": "Sign Up", "count": 10000, "conversion_rate": 1.0},
    {"event": "Verify Email", "count": 7500, "conversion_rate": 0.75},
    {"event": "Complete Profile", "count": 4200, "conversion_rate": 0.56},
    {"event": "First Purchase", "count": 2300, "conversion_rate": 0.55}
  ]
}
```

**Examples:**

```
mp query funnel 12345 --from 2025-01-01 --to 2025-01-31
mp query funnel 12345 --from 2025-01-01 --to 2025-01-31 --unit week
mp query funnel 12345 --from 2025-01-01 --to 2025-01-31 --on country
```

**jq Examples:**

```
--jq '.conversion_rate'              # Overall conversion rate
--jq '.steps | length'               # Number of funnel steps
--jq '.steps[-1].count'              # Users completing the funnel
--jq '.steps[] | {event, rate: .conversion_rate}'
```

Usage:

```
mp query funnel [OPTIONS] FUNNEL_ID
```

Options:

```
  FUNNEL_ID            Funnel ID.  \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -u, --unit TEXT      Time unit: day, week, month.
  -o, --on TEXT        Property to segment by.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### jql

Execute JQL script against Mixpanel API.

Script can be provided as a file argument or inline with --script. Parameters can be passed with --param key=value (repeatable).

**Output Structure (JSON):**

The output structure depends on your JQL script. Common patterns:

groupBy result:

```
{
  "raw": [
    {"key": ["Login"], "value": 5234},
    {"key": ["Sign Up"], "value": 1892}
  ],
  "row_count": 2
}
```

Aggregation result:

```
{
  "raw": [{"count": 15234, "unique_users": 3421}],
  "row_count": 1
}
```

**Examples:**

```
mp query jql analysis.js
mp query jql --script "function main() { return Events({...}).groupBy(['event'], mixpanel.reducer.count()) }"
mp query jql analysis.js --param start_date=2025-01-01 --param event_name=Login
```

**jq Examples:**

```
--jq '.raw'                          # Get raw result array
--jq '.raw[0]'                       # First result row
--jq '.raw[] | {event: .key[0], count: .value}'
--jq '.row_count'                    # Number of result rows
```

Usage:

```
mp query jql [OPTIONS] [FILE]
```

Options:

```
  [FILE]               JQL script file.
  -c, --script TEXT    Inline JQL script.
  -P, --param TEXT     Parameter (key=value).
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### property-counts

Query event counts broken down by property values.

Shows how event counts vary across different values of a property. For example, --property country shows event counts per country.

The --type option controls how counts are calculated:

- general: Total event occurrences (default)
- unique: Unique users who triggered the event
- average: Average events per user

The --limit option controls how many property values to return (default 10, ordered by count descending).

**Output Structure (JSON):**

```
{
  "event": "Purchase",
  "property_name": "country",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "unit": "day",
  "type": "general",
  "series": {
    "US": {"2025-01-01": 150, "2025-01-02": 175, ...},
    "UK": {"2025-01-01": 75, "2025-01-02": 80, ...},
    "DE": {"2025-01-01": 45, "2025-01-02": 52, ...}
  }
}
```

**Examples:**

```
mp query property-counts -e "Purchase" -p country --from 2025-01-01 --to 2025-01-31
mp query property-counts -e "Sign Up" -p "utm_source" --from 2025-01-01 --to 2025-01-31 --limit 20
mp query property-counts -e "Login" -p browser --from 2025-01-01 --to 2025-01-31 --type unique
```

**jq Examples:**

```
--jq '.series | keys'                # List property values
--jq '.series["US"] | add'           # Sum counts for one value
--jq '.series | to_entries | sort_by(.value | add) | reverse'
--jq '[.series | to_entries[] | {value: .key, total: (.value | add)}]'
```

Usage:

```
mp query property-counts [OPTIONS]
```

Options:

```
  -e, --event TEXT     Event name.  \[required]
  -p, --property TEXT  Property name.  \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -t, --type TEXT      Count type: general, unique, average.  \[default:
                       general]
  -u, --unit TEXT      Time unit: day, week, month.  \[default: day]
  -l, --limit INTEGER  Max property values to return.  \[default: 10]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### retention

Run live retention analysis against Mixpanel API.

Measures how many users return after their first action (birth event). Users are grouped into cohorts by when they first did the birth event, then tracked for how many returned to do the return event.

The --interval and --intervals options control bucket granularity: --interval is the bucket size (default 1), --intervals is the number of buckets to track (default 10). Combined with --unit, this defines the retention window (e.g., --unit day --interval 1 --intervals 7 tracks daily retention for 7 days).

**Output Structure (JSON):**

```
{
  "born_event": "Sign Up",
  "return_event": "Login",
  "from_date": "2025-01-01",
  "to_date": "2025-01-31",
  "unit": "day",
  "cohorts": [
    {"date": "2025-01-01", "size": 500, "retention": [1.0, 0.65, 0.45, 0.38]},
    {"date": "2025-01-02", "size": 480, "retention": [1.0, 0.62, 0.41, 0.35]},
    {"date": "2025-01-03", "size": 520, "retention": [1.0, 0.68, 0.48, 0.40]}
  ]
}
```

**Examples:**

```
mp query retention --born "Sign Up" --return "Login" --from 2025-01-01 --to 2025-01-31
mp query retention --born "Sign Up" --return "Purchase" --from 2025-01-01 --to 2025-01-31 --unit week
mp query retention --born "Sign Up" --return "Login" --from 2025-01-01 --to 2025-01-31 --intervals 7
```

**jq Examples:**

```
--jq '.cohorts | length'                   # Number of cohorts
--jq '.cohorts[0].retention'               # First cohort retention curve
--jq '.cohorts[] | {date, size, day7: .retention[7]}'
```

Usage:

```
mp query retention [OPTIONS]
```

Options:

```
  -b, --born TEXT          Birth event.  \[required]
  -r, --return TEXT        Return event.  \[required]
  --from TEXT              Start date (YYYY-MM-DD).  \[required]
  --to TEXT                End date (YYYY-MM-DD).  \[required]
  --born-where TEXT        Birth event filter.
  --return-where TEXT      Return event filter.
  -i, --interval INTEGER   Bucket size.
  -n, --intervals INTEGER  Number of buckets.
  -u, --unit TEXT          Time unit: day, week, month.  \[default: day]
  -f, --format [TEXT]      Output format: json, jsonl, table, csv, plain.
                           \[default: json]
  --jq TEXT                Apply jq filter to JSON output (requires --format
                           json or jsonl).
```

##### saved-report

Query a saved report (Insights, Retention, or Funnel) by bookmark ID.

Retrieves data from a saved report in Mixpanel. The bookmark_id can be found in the URL when viewing a report (the numeric ID after /insights/, /retention/, or /funnels/).

The report type is automatically detected from the response headers.

**Output Structure (JSON):**

Insights report:

```
{
  "bookmark_id": 12345,
  "computed_at": "2025-01-15T10:30:00Z",
  "from_date": "2025-01-01",
  "to_date": "2025-01-31",
  "headers": ["$event"],
  "series": {
    "Sign Up": {"2025-01-01": 150, "2025-01-02": 175, ...},
    "Login": {"2025-01-01": 520, "2025-01-02": 610, ...}
  },
  "report_type": "insights"
}
```

Funnel/Retention reports have different series structures based on the saved report configuration.

**Examples:**

```
mp query saved-report 12345
mp query saved-report 12345 --format table
```

**jq Examples:**

```
--jq '.report_type'                  # Report type (insights/retention/funnel)
--jq '.series | keys'                # List series names
--jq '.headers'                      # Report column headers
--jq '.series | to_entries | map({name: .key, total: (.value | add)})'
```

Usage:

```
mp query saved-report [OPTIONS] BOOKMARK_ID
```

Options:

```
  BOOKMARK_ID          Saved report bookmark ID.  \[required]
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### segmentation

Run live segmentation query against Mixpanel API.

Returns time-series event counts, optionally segmented by a property. Without --on, returns total counts per time period. With --on, breaks down counts by property values (e.g., --on country shows counts per country).

The --on parameter accepts bare property names (e.g., 'country') or full filter expressions (e.g., 'properties["country"] == "US"').

**Output Structure (JSON):**

```
{
  "event": "Sign Up",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "unit": "day",
  "segment_property": "country",
  "total": 1850,
  "series": {
    "US": {"2025-01-01": 150, "2025-01-02": 175, ...},
    "UK": {"2025-01-01": 75, "2025-01-02": 80, ...}
  }
}
```

**Examples:**

```
mp query segmentation -e "Sign Up" --from 2025-01-01 --to 2025-01-31
mp query segmentation -e "Purchase" --from 2025-01-01 --to 2025-01-31 --on country
mp query segmentation -e "Login" --from 2025-01-01 --to 2025-01-07 --unit week
```

**jq Examples:**

```
--jq '.total'                    # Total event count
--jq '.series | keys'            # List segment names
--jq '.series["US"] | add'       # Sum counts for one segment
```

Usage:

```
mp query segmentation [OPTIONS]
```

Options:

```
  -e, --event TEXT     Event name.  \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -o, --on TEXT        Property to segment by (bare name or expression).
  -u, --unit TEXT      Time unit: day, week, month.  \[default: day]
  -w, --where TEXT     Filter expression.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### segmentation-average

Calculate average of numeric property over time.

Calculates the mean value of a numeric property across all matching events. Useful for tracking averages like order value, session duration, or scores.

For example, --event Purchase --on order_value calculates average order value per time period.

**Output Structure (JSON):**

```
{
  "event": "Purchase",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "property_expr": "order_value",
  "unit": "day",
  "results": {
    "2025-01-01": 85.50,
    "2025-01-02": 92.75,
    "2025-01-03": 78.25,
    ...
  }
}
```

**Examples:**

```
mp query segmentation-average -e "Purchase" --on order_value --from 2025-01-01 --to 2025-01-31
mp query segmentation-average -e "Session" --on duration --from 2025-01-01 --to 2025-01-31 --unit hour
```

**jq Examples:**

```
--jq '.results | add / length'       # Overall average
--jq '.results | to_entries | max_by(.value)'  # Highest day
--jq '.results | to_entries | min_by(.value)'  # Lowest day
--jq '[.results | to_entries[] | {date: .key, avg: .value}]'
```

Usage:

```
mp query segmentation-average [OPTIONS]
```

Options:

```
  -e, --event TEXT     Event name.  \[required]
  -o, --on TEXT        Numeric property to average (bare name or expression).
                       \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -u, --unit TEXT      Time unit: hour, day.  \[default: day]
  -w, --where TEXT     Filter expression.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### segmentation-numeric

Bucket events by numeric property ranges.

Groups events into buckets based on a numeric property's value. Mixpanel automatically determines optimal bucket ranges based on the property's value distribution.

For example, --on price might create buckets like "0-10", "10-50", "50+".

The --type option controls how counts are calculated:

- general: Total event occurrences (default)
- unique: Unique users who triggered the event
- average: Average events per user

**Output Structure (JSON):**

```
{
  "event": "Purchase",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "property_expr": "amount",
  "unit": "day",
  "series": {
    "0-50": {"2025-01-01": 120, "2025-01-02": 135, ...},
    "50-100": {"2025-01-01": 85, "2025-01-02": 92, ...},
    "100-500": {"2025-01-01": 45, "2025-01-02": 52, ...},
    "500+": {"2025-01-01": 12, "2025-01-02": 15, ...}
  }
}
```

**Examples:**

```
mp query segmentation-numeric -e "Purchase" --on amount --from 2025-01-01 --to 2025-01-31
mp query segmentation-numeric -e "Purchase" --on amount --from 2025-01-01 --to 2025-01-31 --type unique
```

**jq Examples:**

```
--jq '.series | keys'                # List bucket ranges
--jq '.series["100-500"] | add'      # Sum counts for a bucket
--jq '[.series | to_entries[] | {bucket: .key, total: (.value | add)}]'
--jq '.series | to_entries | sort_by(.value | add) | reverse'
```

Usage:

```
mp query segmentation-numeric [OPTIONS]
```

Options:

```
  -e, --event TEXT     Event name.  \[required]
  -o, --on TEXT        Numeric property to bucket (bare name or expression).
                       \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -t, --type TEXT      Count type: general, unique, average.  \[default:
                       general]
  -u, --unit TEXT      Time unit: hour, day.  \[default: day]
  -w, --where TEXT     Filter expression.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### segmentation-sum

Calculate sum of numeric property over time.

Sums the values of a numeric property across all matching events. Useful for tracking totals like revenue, quantity, or duration.

For example, --event Purchase --on revenue calculates total revenue per time period.

**Output Structure (JSON):**

```
{
  "event": "Purchase",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "property_expr": "revenue",
  "unit": "day",
  "results": {
    "2025-01-01": 15234.50,
    "2025-01-02": 18456.75,
    "2025-01-03": 12890.25,
    ...
  }
}
```

**Examples:**

```
mp query segmentation-sum -e "Purchase" --on revenue --from 2025-01-01 --to 2025-01-31
mp query segmentation-sum -e "Purchase" --on quantity --from 2025-01-01 --to 2025-01-31 --unit hour
```

**jq Examples:**

```
--jq '.results | add'                # Total sum across all dates
--jq '.results | to_entries | max_by(.value)'  # Highest day
--jq '.results | to_entries | min_by(.value)'  # Lowest day
--jq '[.results | to_entries[] | {date: .key, revenue: .value}]'
```

Usage:

```
mp query segmentation-sum [OPTIONS]
```

Options:

```
  -e, --event TEXT     Event name.  \[required]
  -o, --on TEXT        Numeric property to sum (bare name or expression).
                       \[required]
  --from TEXT          Start date (YYYY-MM-DD).  \[required]
  --to TEXT            End date (YYYY-MM-DD).  \[required]
  -u, --unit TEXT      Time unit: hour, day.  \[default: day]
  -w, --where TEXT     Filter expression.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

##### sql

Execute SQL query against local DuckDB database.

Query can be provided as an argument or read from a file with --file. Use --scalar when your query returns a single value (e.g., COUNT(\*)).

**Output Structure (JSON):**

Default (row results):

```
[
  {"event": "Sign Up", "count": 1500},
  {"event": "Login", "count": 3200},
  {"event": "Purchase", "count": 450}
]
```

With --scalar:

```
{"value": 15234}
```

**Examples:**

```
mp query sql "SELECT COUNT(*) FROM events" --scalar
mp query sql "SELECT event, COUNT(*) FROM events GROUP BY 1" --format table
mp query sql --file analysis.sql --format csv
```

**jq Examples:**

```
--jq '.[0]'                      # First row
--jq '.[] | .event'              # All event names
--jq 'map(select(.count > 100))' # Filter rows
--jq '.value'                    # Scalar result value
```

Usage:

```
mp query sql [OPTIONS] [QUERY]
```

Options:

```
  [QUERY]              SQL query string.
  -F, --file PATH      Read query from file.
  -s, --scalar         Return single value.
  -f, --format [TEXT]  Output format: json, jsonl, table, csv, plain.
                       \[default: json]
  --jq TEXT            Apply jq filter to JSON output (requires --format json
                       or jsonl).
```

Copy markdown
# Architecture

# Architecture

mixpanel_data follows a layered architecture with clear separation of concerns.

Explore on DeepWiki

🤖 **[Architecture Deep Dive →](https://deepwiki.com/jaredmcfarland/mixpanel_data/5-architecture)**

Ask questions about the architecture, trace data flows, or explore component relationships interactively.

## Layer Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                      CLI Layer (Typer)                      │
│         Argument parsing, output formatting, progress       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                   Public API Layer                          │
│              Workspace class, auth module                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                     Service Layer                           │
│     DiscoveryService, FetcherService, LiveQueryService      │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  Infrastructure Layer                       │
│       ConfigManager, MixpanelAPIClient, StorageEngine       │
└─────────────────────────────────────────────────────────────┘
```

## Components

### Workspace (Facade)

The `Workspace` class is the unified entry point that coordinates all services:

- **Credential Resolution** — Env vars → named account → default account
- **Service Orchestration** — Creates and manages service instances
- **Resource Management** — Context manager support for cleanup

### Services

#### DiscoveryService

Schema introspection with session-scoped caching:

- `list_events()` — All event names (cached)
- `list_properties(event)` — Properties for an event (cached per event)
- `list_property_values(property, event)` — Sample values (cached)
- `list_funnels()` — Saved funnels (cached)
- `list_cohorts()` — Saved cohorts (cached)
- `list_top_events()` — Today's top events (NOT cached, real-time)

#### FetcherService

Coordinates data ingestion from Mixpanel API to DuckDB, or direct streaming:

- Streaming transformation (memory efficient)
- Progress callback integration
- Returns `FetchResult` with metadata (fetch mode)
- Returns `Iterator[dict]` without storage (stream mode)

#### LiveQueryService

Executes live analytics queries against Mixpanel Query API:

- Segmentation, funnels, retention, JQL
- Event counts, property counts
- Activity feed, saved reports, flows, frequency
- Numeric aggregations (bucket, sum, average)

### Infrastructure

#### ConfigManager

TOML-based account management at `~/.mp/config.toml`:

- Account CRUD operations
- Credential resolution
- Default account management

#### MixpanelAPIClient

HTTP client with Mixpanel-specific features:

- Service account authentication
- Regional endpoint routing (US, EU, India)
- Automatic rate limit handling with exponential backoff
- Streaming JSONL parsing for large exports

#### StorageEngine

DuckDB-based storage:

- Persistent, ephemeral, and in-memory modes
- Table creation with streaming batch ingestion
- Query execution (DataFrame, scalar, rows)
- Schema introspection and metadata

## Data Paths

### Live Query Path

```
User Request → Workspace → LiveQueryService → MixpanelAPIClient → Mixpanel API
                                                      ↓
                                              Typed Result (e.g., SegmentationResult)
```

Best for:

- Real-time data needs
- One-off analysis
- Pre-computed Mixpanel reports

### Local Analysis Path

```
User Request → Workspace → FetcherService → MixpanelAPIClient → Mixpanel Export API
                                 ↓
                          StorageEngine (DuckDB)
                                 ↓
User Query → Workspace → StorageEngine → SQL Execution → DataFrame
```

Best for:

- Repeated queries over same data
- Custom SQL logic
- Context window preservation (AI agents)
- Offline analysis

### Streaming Path

```
User Request → Workspace → MixpanelAPIClient → Mixpanel Export API
                                    ↓
                          Iterator[dict] (no storage)
                                    ↓
                          Process each record inline
```

Best for:

- ETL pipelines to external systems
- One-time processing without storage
- Memory-constrained environments
- Unix pipeline integration (CLI `--stdout`)

## Key Design Decisions

### Explicit Table Management

Tables are never implicitly overwritten. Fetching to an existing table name raises `TableExistsError`. This prevents accidental data loss and makes data lineage explicit.

### Streaming Ingestion

The API client returns iterators, and storage accepts iterators. This enables memory-efficient processing of large datasets without loading everything into memory.

### JSON Property Storage

Event and profile properties are stored as JSON columns in DuckDB. This preserves the flexible Mixpanel schema while enabling powerful JSON querying:

```
SELECT properties->>'$.country' as country FROM events
```

### Immutable Credentials

Credentials are resolved once at Workspace construction. This prevents confusion from mid-session credential changes.

### Dependency Injection

All services accept their dependencies as constructor arguments. This enables:

- Easy testing with mocks
- Flexible composition
- Clear dependency relationships

## Technology Stack

| Component         | Technology   | Purpose                       |
| ----------------- | ------------ | ----------------------------- |
| Language          | Python 3.11+ | Type hints, modern syntax     |
| CLI Framework     | Typer        | Declarative CLI building      |
| Output Formatting | Rich         | Tables, progress bars, colors |
| Validation        | Pydantic     | Data validation, settings     |
| Database          | DuckDB       | Embedded analytical database  |
| HTTP Client       | httpx        | Async-capable HTTP            |
| DataFrames        | pandas       | Data analysis interface       |

## Package Structure

```
src/mixpanel_data/
├── __init__.py              # Public API exports
├── workspace.py             # Workspace facade
├── auth.py                  # Public auth module
├── exceptions.py            # Exception hierarchy
├── types.py                 # Result types
├── py.typed                 # PEP 561 marker
├── _internal/               # Private implementation
│   ├── config.py            # ConfigManager, Credentials
│   ├── api_client.py        # MixpanelAPIClient
│   ├── storage.py           # StorageEngine
│   └── services/
│       ├── discovery.py     # DiscoveryService
│       ├── fetcher.py       # FetcherService
│       └── live_query.py    # LiveQueryService
└── cli/
    ├── main.py              # Typer app entry point
    ├── commands/            # Command implementations
    ├── formatters.py        # Output formatters
    └── utils.py             # CLI utilities
```

Copy markdown

# Data Model

How Mixpanel data maps to local storage.

Explore on DeepWiki

🤖 **[Data Transformation Deep Dive →](https://deepwiki.com/jaredmcfarland/mixpanel_data/4.5-data-transformation)**

Ask questions about how Mixpanel events and profiles are transformed into DuckDB schemas, or explore the transformation logic.

## Mixpanel Data Model

Mixpanel tracks two primary data types:

### Events

Actions users take in your product:

| Field         | Description                             |
| ------------- | --------------------------------------- |
| `event`       | Event name (e.g., "Purchase", "Signup") |
| `time`        | Unix timestamp when event occurred      |
| `distinct_id` | User identifier                         |
| `$insert_id`  | Deduplication ID                        |
| `properties`  | Custom properties (JSON object)         |

### User Profiles

Persistent attributes about users:

| Field          | Description                      |
| -------------- | -------------------------------- |
| `$distinct_id` | User identifier (primary key)    |
| `$properties`  | Profile properties (JSON object) |

## Local Storage Schema

### Events Table

When you fetch events, they're stored with this schema:

| Column        | Type      | Description             |
| ------------- | --------- | ----------------------- |
| `event_id`    | VARCHAR   | Unique event identifier |
| `event_name`  | VARCHAR   | Event name              |
| `event_time`  | TIMESTAMP | When the event occurred |
| `distinct_id` | VARCHAR   | User identifier         |
| `insert_id`   | VARCHAR   | Deduplication ID        |
| `properties`  | JSON      | All event properties    |

Example query:

```
SELECT
    event_name,
    event_time,
    distinct_id,
    properties->>'$.country' as country,
    CAST(properties->>'$.amount' AS DECIMAL) as amount
FROM events
WHERE event_name = 'Purchase'
```

### Profiles Table

User profiles are stored with:

| Column        | Type    | Description                   |
| ------------- | ------- | ----------------------------- |
| `distinct_id` | VARCHAR | User identifier (primary key) |
| `properties`  | JSON    | All profile properties        |

Example query:

```
SELECT
    distinct_id,
    properties->>'$.name' as name,
    properties->>'$.email' as email,
    properties->>'$.plan' as plan
FROM profiles
WHERE properties->>'$.plan' = 'premium'
```

## JSON Property Access

DuckDB provides powerful JSON operators for querying properties:

### Extract String

```
-- Arrow operator returns JSON, ->> returns text
SELECT properties->>'$.country' as country FROM events
```

### Extract and Cast

```
SELECT CAST(properties->>'$.amount' AS DECIMAL) as amount FROM events
```

### Nested Access

```
SELECT properties->>'$.user.address.city' as city FROM events
```

### Array Access

```
-- First element
SELECT properties->'$.items'->>0 as first_item FROM events

-- Array length
SELECT json_array_length(properties->'$.items') as count FROM events
```

### Check Existence

```
SELECT * FROM events
WHERE properties->>'$.coupon_code' IS NOT NULL
```

## Metadata Table

Each workspace maintains a `_mp_metadata` table for tracking fetch operations:

| Column         | Type      | Description              |
| -------------- | --------- | ------------------------ |
| `table_name`   | VARCHAR   | Name of the data table   |
| `table_type`   | VARCHAR   | "events" or "profiles"   |
| `from_date`    | VARCHAR   | Start date (events only) |
| `to_date`      | VARCHAR   | End date (events only)   |
| `events`       | JSON      | Event filter (if any)    |
| `where_clause` | VARCHAR   | Where filter (if any)    |
| `row_count`    | BIGINT    | Number of rows           |
| `fetched_at`   | TIMESTAMP | When fetch completed     |

This metadata is used by `ws.tables()` and `ws.info()`.

## Common Mixpanel Properties

### Event Properties

| Property          | Type   | Description             |
| ----------------- | ------ | ----------------------- |
| `$city`           | string | User's city             |
| `$region`         | string | User's region/state     |
| `$country_code`   | string | Two-letter country code |
| `$browser`        | string | Browser name            |
| `$device`         | string | Device type             |
| `$os`             | string | Operating system        |
| `mp_country_code` | string | Country code            |
| `$current_url`    | string | Page URL                |
| `$referrer`       | string | Referrer URL            |

### Profile Properties

| Property      | Type      | Description              |
| ------------- | --------- | ------------------------ |
| `$email`      | string    | User's email             |
| `$name`       | string    | User's name              |
| `$first_name` | string    | First name               |
| `$last_name`  | string    | Last name                |
| `$created`    | timestamp | When profile was created |
| `$last_seen`  | timestamp | Last activity time       |

## Query Patterns

### Daily Active Users

```
SELECT
    DATE_TRUNC('day', event_time) as day,
    COUNT(DISTINCT distinct_id) as dau
FROM events
GROUP BY 1
ORDER BY 1
```

### Revenue by Country

```
SELECT
    properties->>'$.country_code' as country,
    SUM(CAST(properties->>'$.amount' AS DECIMAL)) as revenue
FROM events
WHERE event_name = 'Purchase'
GROUP BY 1
ORDER BY 2 DESC
```

### Join Events with Profiles

```
SELECT
    e.event_name,
    p.properties->>'$.plan' as plan,
    COUNT(*) as count
FROM events e
JOIN profiles p ON e.distinct_id = p.distinct_id
GROUP BY 1, 2
```

### Funnel Analysis

```
WITH step1 AS (
    SELECT DISTINCT distinct_id, MIN(event_time) as t1
    FROM events WHERE event_name = 'View Product' GROUP BY 1
),
step2 AS (
    SELECT DISTINCT e.distinct_id, MIN(e.event_time) as t2
    FROM events e
    JOIN step1 s ON e.distinct_id = s.distinct_id
    WHERE e.event_name = 'Add to Cart' AND e.event_time > s.t1
    GROUP BY 1
),
step3 AS (
    SELECT DISTINCT e.distinct_id
    FROM events e
    JOIN step2 s ON e.distinct_id = s.distinct_id
    WHERE e.event_name = 'Purchase' AND e.event_time > s.t2
)
SELECT
    (SELECT COUNT(*) FROM step1) as viewed,
    (SELECT COUNT(*) FROM step2) as added,
    (SELECT COUNT(*) FROM step3) as purchased
```

## See Also

- [SQL Queries Guide](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) — More query examples
- [DuckDB JSON Documentation](https://duckdb.org/docs/extensions/json) — Complete JSON function reference

Copy markdown

# Storage Engine

How mixpanel_data uses DuckDB for local data storage.

Explore on DeepWiki

🤖 **[StorageEngine Deep Dive →](https://deepwiki.com/jaredmcfarland/mixpanel_data/5.3.2-storageengine)**

Ask questions about DuckDB integration, concurrency, or storage internals.

## Overview

The `StorageEngine` class wraps DuckDB to provide persistent local storage for fetched Mixpanel data. Understanding DuckDB's concurrency model helps avoid conflicts when running multiple `mp` commands.

## Storage Modes

Three storage modes are available:

| Mode           | Description                     | Use Case                             |
| -------------- | ------------------------------- | ------------------------------------ |
| **Persistent** | Database file on disk (default) | Production use, data preservation    |
| **Ephemeral**  | Temp file deleted on close      | Testing, one-off analysis            |
| **In-Memory**  | No file, RAM only               | Quick scripts, no persistence needed |

### Mode Selection

```
# Persistent (default) - stored at ~/.mp/data/{project_id}.db
ws = Workspace()

# Custom path
ws = Workspace(path="/path/to/my.db")

# Ephemeral - temp file, deleted on close
ws = Workspace(ephemeral=True)

# In-memory - no file at all
ws = Workspace(in_memory=True)
```

## DuckDB Concurrency Model

DuckDB uses a **single-writer, multiple-reader** concurrency model:

- **One write connection** can be active at a time per database file
- **Multiple read connections** can coexist with each other
- Read and write connections **cannot coexist** on the same file

This differs from client-server databases (PostgreSQL, MySQL) where a server process mediates all access.

### What This Means in Practice

| Scenario                                 | Result                                    |
| ---------------------------------------- | ----------------------------------------- |
| One `mp fetch` command                   | Works normally                            |
| Two `mp fetch` commands to same database | Second command gets `DatabaseLockedError` |
| `mp fetch` + `mp query` to same database | Query command gets `DatabaseLockedError`  |
| Two `mp query` commands to same database | Both work (when no write lock is held)    |
| Two `mp inspect` commands (API-only)     | Both work (no database access)            |

## Lock Conflicts

When a second process tries to open a database that's already locked for writing, DuckDB raises an error. mixpanel_data catches this and raises a `DatabaseLockedError`:

```
Database locked: /home/user/.mp/data/12345.db
Another mp command may be running. Try again shortly.
```

## Database Not Found

When opening a database in read-only mode, the file must already exist. If you run a read command (like `mp query` or `mp inspect tables`) before fetching any data, you'll get a `DatabaseNotFoundError`:

```
No data yet: /home/user/.mp/data/12345.db
Run 'mp fetch events' or 'mp fetch profiles' to create the database.
```

This is different from write mode, which creates the database file automatically.

### Common Causes

1. **Long-running fetch** — Large date ranges take time; other commands must wait
1. **Background processes** — A previous command didn't exit cleanly
1. **Multiple terminals** — Different shells running concurrent `mp` commands

### Resolution

1. **Wait** — Let the first operation complete
1. **Check for stuck processes** — `ps aux | grep mp` to find orphaned commands
1. **Use separate databases** — Specify different `--path` for concurrent work

## Lazy Storage Initialization

To avoid unnecessary lock conflicts, `Workspace` initializes storage **lazily**:

```
# These DON'T open the database:
ws = Workspace()
ws.events()           # API call, no storage
ws.segmentation(...)  # API call, no storage
ws.funnels(...)       # API call, no storage

# These DO open the database (on first access):
ws.fetch_events(...)  # Writes to storage
ws.sql(...)           # Reads from storage
ws.tables()           # Reads metadata
```

This means API-only commands like `mp inspect events` never conflict with fetch operations, even when targeting the same project.

## Avoiding Conflicts

### Use Ephemeral Mode for Testing

```
# Won't conflict with your main database
mp fetch events --from 2025-01-01 --to 2025-01-07 --ephemeral
```

### Use Separate Paths for Parallel Work

```
# Terminal 1
mp fetch events --from 2025-01-01 --to 2025-06-30 --path ./h1.db

# Terminal 2 (parallel)
mp fetch events --from 2025-07-01 --to 2025-12-31 --path ./h2.db
```

### Combine into Single Commands

```
# Instead of two fetches, use date range in one command
mp fetch events --from 2025-01-01 --to 2025-12-31
```

### Stream Instead of Store

If you don't need to query the data repeatedly:

```
# No database, no locks
mp fetch events --from 2025-01-01 --stdout | process_events.py
```

## Connection Lifecycle

The `StorageEngine` manages its DuckDB connection:

```
# Workspace as context manager ensures cleanup
with Workspace() as ws:
    ws.fetch_events(from_date="2025-01-01", to_date="2025-01-31")
    df = ws.sql("SELECT * FROM events LIMIT 10")
# Connection closed, lock released

# Or explicit close
ws = Workspace()
try:
    ws.fetch_events(...)
finally:
    ws.close()
```

CLI commands handle this automatically.

## Technical Details

### Lock File

DuckDB creates a `.wal` (write-ahead log) file alongside the database during write operations. The lock is held for the duration of the connection.

### Process Isolation

Within a single Python process, multiple `Workspace` instances can share the same database file (DuckDB handles internal locking). Lock conflicts occur between **separate processes**.

### Read-Only Mode

Both `StorageEngine` and `Workspace` support a `read_only` parameter:

```
# Default: write access (matches DuckDB's native behavior)
ws = Workspace()  # read_only=False

# Explicit read-only for concurrent access
ws = Workspace(path="data.db", read_only=True)
```

Read-only connections:

- Allow multiple reader processes to access the database concurrently (when no write lock is held)
- Cannot execute INSERT, UPDATE, DELETE, or DDL statements
- Still blocked by an active write lock (DuckDB write locks are exclusive)

The CLI uses this automatically:

- **Read commands** (`mp query`, `mp inspect tables`, etc.) use `read_only=True`
- **Write commands** (`mp fetch`, `mp inspect drop`) use `read_only=False`

**Note:** If a `mp fetch` is running, other commands will still be blocked until it completes. The benefit of read-only mode is enabling multiple concurrent read operations (e.g., two `mp query` commands).

## See Also

- [Design](https://jaredmcfarland.github.io/mixpanel_data/architecture/design/index.md) — Overall architecture
- [Data Model](https://jaredmcfarland.github.io/mixpanel_data/architecture/data-model/index.md) — Table schemas and query patterns
- [DuckDB Documentation](https://duckdb.org/docs/) — Full DuckDB reference

Copy markdown