# mixpanel_data

> Python library for working with Mixpanel analytics data, designed for AI coding agents

mixpanel_data is a complete programmable interface to Mixpanel analytics. Python library and CLI for discovery, querying, and data extraction. Discover your schema, run live analytics (segmentation, funnels, retention), execute JQL, and analyze locally with SQL via DuckDB.

# Getting Started

# mixpanel_data

A complete programmable interface to Mixpanel analyticsβ€”available as both a Python library and CLI.

AI-Friendly Documentation πŸ€–

**[Explore on DeepWiki β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data)**

DeepWiki provides an AI-optimized view of this projectβ€”perfect for code assistants, agents, and LLM-powered workflows. Ask questions about the codebase, explore architecture, or get contextual help.

## Why This Exists

Mixpanel's web UI is built for interactive exploration. But many workflows need something different: scripts that run unattended, notebooks that combine Mixpanel data with other sources, agents that query analytics programmatically, or pipelines that move data between systems.

`mixpanel_data` provides direct programmatic access to Mixpanel's analytics platform. Core analyticsβ€”segmentation, funnels, retention, saved reportsβ€”plus capabilities like raw JQL execution and local SQL analysis are available as Python methods or shell commands.

## Two Interfaces, One Capability Set

**Python Library** β€” For notebooks, scripts, and applications:

```
import mixpanel_data as mp

ws = mp.Workspace()

# Discover what's in your project
events = ws.list_events()
props = ws.list_properties("Purchase")
values = ws.list_property_values("Purchase", "country")
funnels = ws.list_funnels()
cohorts = ws.list_cohorts()
bookmarks = ws.list_bookmarks()

# Live queriesβ€”use discovered data to construct accurate queries
segmentation = ws.segmentation(
    event=events[0].name,
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country"
)
funnel = ws.funnel(
    funnel_id=funnels[0].id,
    from_date="2025-01-01",
    to_date="2025-01-31"
)
saved = ws.saved_report(bookmark_id=bookmarks[0].id)
activity = ws.activity_feed(
    distinct_id="user@example.com",
    from_date="2025-01-01"
)

# Fetch data locally (use parallel=True for large date ranges)
ws.fetch_events(
    "jan_events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
ws.fetch_events(
    "q1_events",
    from_date="2025-01-01",
    to_date="2025-03-31",
    parallel=True  # Up to 10x faster for large date ranges
)
ws.fetch_profiles("power_users", cohort_id=cohorts[0].id)

# Query with full SQL powerβ€”joins, window functions, CTEs
df = ws.sql("""
    SELECT
        e.properties->>'$.country' as country,
        COUNT(DISTINCT e.distinct_id) as users,
        COUNT(*) as events
    FROM jan_events e
    JOIN power_users u ON e.distinct_id = u.distinct_id
    GROUP BY 1
    ORDER BY 2 DESC
""")

# Results have .df for pandas interoperability
segmentation.df
funnel.df
df.to_csv("export.csv")

# Execute arbitrary JQL for custom analysis
jql_result = ws.jql("""
    function main() {
        return Events({...}).groupBy([...])
    }
""")
```

**CLI** β€” For shell scripts, pipelines, and agent tool calls:

```
# Discover your data landscape
mp inspect events
mp inspect properties "Purchase"
mp inspect values "Purchase" "country"
mp inspect top-events
mp inspect funnels
mp inspect cohorts
mp inspect bookmarks

# Live queries against Mixpanel API
mp query segmentation "Purchase" \
  --from 2025-01-01 --to 2025-01-31 --on country
mp query funnel 12345 --from 2025-01-01 --to 2025-01-31
mp query retention \
  --born-event Signup --return-event Purchase --from 2025-01-01
mp query activity-feed user@example.com --from 2025-01-01
mp query saved-report 67890
mp query frequency "Login" --from 2025-01-01

# Fetch data locally (use --parallel for large date ranges)
mp fetch events jan_events --from 2025-01-01 --to 2025-01-31
mp fetch events q1_events --from 2025-01-01 --to 2025-03-31 --parallel
mp fetch profiles users --cohort-id 12345

# Query locally with SQL
mp query sql "SELECT event_name, COUNT(*) FROM jan_events GROUP BY 1"

# Inspect local data
mp inspect tables
mp inspect schema jan_events
mp inspect sample jan_events
mp inspect summarize jan_events

# Filter with built-in jq
mp query segmentation "Purchase" --from 2025-01-01 --format json --jq '.total'

# Stream to Unix tools (memory-efficient for large datasets)
mp fetch events --stdout --from 2025-01-01 --to 2025-01-31 \
  | jq -r '.distinct_id' | sort -u | wc -l
```

## Capabilities

**Discovery** β€” Rapidly explore your project's data landscape:

- List all events, drill into properties, sample actual values
- Browse saved funnels, cohorts, and reports (bookmarks)
- Access Lexicon definitions from your data dictionary
- Analyze property distributions, coverage, and numeric statistics
- Inspect top events by volume, daily trends, user engagement patterns

Discovery commands let you survey what exists before writing queriesβ€”no guessing at event names or property values.

**Live Queries** β€” Execute Mixpanel analytics directly:

- Segmentation with filtering, grouping, and time bucketing
- Funnel conversion analysis
- Retention analysis
- Saved reports (Insights, Funnels, Flows, Retention)
- User activity feeds
- Frequency and engagement analysis
- Numeric aggregations (sum, average, bucket)
- Raw JQL execution for custom analysis

**Local Storage** β€” Fetch once, query repeatedly:

- Store events and profiles in a local DuckDB database
- Parallel fetching for large date ranges (up to 10x faster)
- Query with full SQL: joins, window functions, CTEs
- Introspect tables, sample data, analyze distributions
- Iterate on analysis without repeated API calls

**Streaming** β€” Process data without storage:

- Stream events directly for ETL pipelines
- One-time processing without local persistence
- Memory-efficient iteration over large datasets

## For Humans and Agents

The structured output and deterministic command interface make `mixpanel_data` particularly effective for AI coding agentsβ€”the same properties that make it scriptable for humans make it reliable for automated workflows.

Discovery commands are especially valuable: an agent can rapidly survey your data landscapeβ€”listing events, inspecting properties, sampling valuesβ€”then construct accurate queries based on what actually exists rather than guessing. A minimal sketch of that loop appears below.

The tool is designed to be self-documenting: comprehensive `--help` on every command, complete docstrings on every method, full type annotations throughout, and rich exception messages that explain what went wrong and how to fix it. Agents can discover capabilities, learn correct usage, and recover from mistakes autonomously.
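A minimal sketch of that discovery-first loop, using only the Workspace methods shown above (the property name checked here, `country`, is illustrative):

```
import mixpanel_data as mp

ws = mp.Workspace()

# Survey what exists before constructing any query
events = ws.list_events()
target = events[0].name
props = {p.name for p in ws.list_properties(target)}

# Only segment on a property the event actually has
if "country" in props:
    result = ws.segmentation(
        event=target,
        from_date="2025-01-01",
        to_date="2025-01-31",
        on="country",
    )
else:
    result = ws.segmentation(
        event=target,
        from_date="2025-01-01",
        to_date="2025-01-31",
    )
print(result.df)
```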
### LLM-Optimized Documentation

This documentation is built with AI consumption in mind. In addition to the standard HTML pages, we provide:

| Endpoint | Size | Use Case |
| --- | --- | --- |
| [`llms.txt`](https://jaredmcfarland.github.io/mixpanel_data/llms.txt) | ~3KB | Structured indexβ€”discover what documentation exists |
| [`llms-full.txt`](https://jaredmcfarland.github.io/mixpanel_data/llms-full.txt) | ~400KB | Complete documentation in one fileβ€”comprehensive search |
| [`index.md`](https://jaredmcfarland.github.io/mixpanel_data/index.md) pages | Varies | Each HTML page has a corresponding `index.md` at the same path |

Every page also has a **Copy Markdown** button in the upper right cornerβ€”click it to copy the page content as markdown, ready to paste into your AI assistant's context.

For interactive exploration of the codebase itself, see [DeepWiki](https://deepwiki.com/jaredmcfarland/mixpanel_data).

## Next Steps

- [Installation](https://jaredmcfarland.github.io/mixpanel_data/getting-started/installation/index.md) β€” Get started with pip or uv
- [Quick Start](https://jaredmcfarland.github.io/mixpanel_data/getting-started/quickstart/index.md) β€” Your first queries in 5 minutes
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/index.md) β€” Complete Python API documentation
- [CLI Reference](https://jaredmcfarland.github.io/mixpanel_data/cli/index.md) β€” Command-line interface documentation

# Installation

> **⚠️ Pre-release Software**: This package is under active development and not yet published to PyPI. Install directly from GitHub.

Explore on DeepWiki πŸ€–

**[Installation Guide β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/2.1-installation)**

Ask questions about requirements, dependencies, or troubleshoot installation issues.

## Requirements

- Python 3.11 or higher
- A Mixpanel service account with API access

## Installing with pip

```
pip install git+https://github.com/jaredmcfarland/mixpanel_data.git
```

## Installing with uv

[uv](https://github.com/astral-sh/uv) is a fast Python package installer:

```
uv pip install git+https://github.com/jaredmcfarland/mixpanel_data.git
```

Or add to your project:

```
uv add git+https://github.com/jaredmcfarland/mixpanel_data.git
```

## Optional Dependencies

### Documentation Tools

If you want to build the documentation locally, install the `docs` extra from GitHub (the package is not yet on PyPI):

```
pip install "mixpanel_data[docs] @ git+https://github.com/jaredmcfarland/mixpanel_data.git"
```

## Verifying Installation

After installation, verify the CLI is available:

```
mp --version
```

You should see output like:

```
mixpanel_data 0.1.0
```

Test the Python import:

```
import mixpanel_data as mp

print(mp.__version__)
```

## Next Steps

- [Quick Start](https://jaredmcfarland.github.io/mixpanel_data/getting-started/quickstart/index.md) β€” Set up credentials and run your first query
- [Configuration](https://jaredmcfarland.github.io/mixpanel_data/getting-started/configuration/index.md) β€” Learn about environment variables and config files

# Quick Start

This guide walks you through your first queries with mixpanel_data in about 5 minutes.

Explore on DeepWiki πŸ€–

**[Quick Start Tutorial β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/2.3-quick-start-tutorial)**

Ask questions about getting started, explore example workflows, or troubleshoot common issues.
## Prerequisites

You'll need:

- mixpanel_data installed (`pip install git+https://github.com/jaredmcfarland/mixpanel_data.git`)
- A Mixpanel service account with username, secret, and project ID
- Your project's data residency region (us, eu, or in)

## Step 1: Set Up Service Account Credentials

### Option A: Environment Variables

```
export MP_USERNAME="sa_abc123..."
export MP_SECRET="your-secret-here"
export MP_PROJECT_ID="12345"
export MP_REGION="us"
```

### Option B: Using the CLI

```
# Interactive prompt (secure, recommended)
mp auth add production \
  --username sa_abc123... \
  --project 12345 \
  --region us
# You'll be prompted for the service account secret with hidden input
```

This stores credentials in `~/.mp/config.toml` and sets `production` as the default account.

For CI/CD environments, provide the secret via environment variable or stdin:

```
# Via environment variable
MP_SECRET=your-secret mp auth add production --username sa_abc123... --project 12345

# Via stdin
echo "$SECRET" | mp auth add production --username sa_abc123... --project 12345 --secret-stdin
```

## Step 2: Test Your Connection

Verify credentials are working:

```
mp auth test
```

```
import mixpanel_data as mp

ws = mp.Workspace()
ws.test_credentials()  # Raises AuthenticationError if invalid
```

## Step 3: Explore Your Data

Before writing queries, survey your data landscape. Discovery commands let you see what exists in your Mixpanel project without guessing.

### List Events

```
mp inspect events
```

```
import mixpanel_data as mp

ws = mp.Workspace()
events = ws.list_events()
for e in events[:10]:
    print(e.name)
```

### Drill Into Properties

Once you know an event name, see what properties it has:

```
mp inspect properties "Purchase"
```

```
props = ws.list_properties("Purchase")
for p in props:
    print(f"{p.name}: {p.type}")
```

### Sample Property Values

See actual values a property contains:

```
mp inspect values "Purchase" "country"
```

```
values = ws.list_property_values("Purchase", "country")
print(values)  # ['US', 'UK', 'DE', 'FR', ...]
```

### See What's Active

Check today's top events by volume:

```
mp inspect top-events
```

```
top = ws.top_events()
for e in top[:5]:
    print(f"{e.name}: {e.count:,} events")
```

### Browse Saved Assets

See funnels, cohorts, and saved reports already defined in Mixpanel:

```
mp inspect funnels
mp inspect cohorts
mp inspect bookmarks
```

```
funnels = ws.list_funnels()
cohorts = ws.list_cohorts()
bookmarks = ws.list_bookmarks()
```

This discovery workflow ensures your queries reference real event names, valid properties, and actual valuesβ€”no trial and error.

## Step 4: Fetch Events to Local Storage

Fetch a month of events into a local DuckDB database:

```
mp fetch events jan_events --from 2025-01-01 --to 2025-01-31
```

```
import mixpanel_data as mp

ws = mp.Workspace()
result = ws.fetch_events(
    name="jan_events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
print(f"Fetched {result.row_count} events in {result.duration_seconds:.1f}s")
```

**Parallel Fetching for Large Date Ranges**

For date ranges longer than a week, use `--parallel` (CLI) or `parallel=True` (Python) for up to 10x faster exports:

```
mp fetch events q1_events --from 2025-01-01 --to 2025-03-31 --parallel
```

See [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/#parallel-fetching) for details.
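The Python counterpart of the `--parallel` CLI example above is a sketch like this, reusing the `ws` workspace from earlier in this step (`total_rows` is the row counter that parallel fetches report):

```
result = ws.fetch_events(
    name="q1_events",
    from_date="2025-01-01",
    to_date="2025-03-31",
    parallel=True,
)
# Parallel fetches report batch totals rather than a single row_count
print(f"Fetched {result.total_rows} rows in {result.duration_seconds:.1f}s")
```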
## Step 5: Inspect Your Fetched Data

Before writing queries, explore what you fetched:

```
# See tables in your workspace
mp inspect tables

# Sample a few rows to see the data shape
mp inspect sample -t jan_events

# Understand event distribution
mp inspect breakdown -t jan_events

# Discover queryable property keys
mp inspect keys -t jan_events
```

```
import mixpanel_data as mp

ws = mp.Workspace()

# See tables in your workspace
for table in ws.tables():
    print(f"{table.name}: {table.row_count:,} rows")

# Sample rows to see data shape
print(ws.sample("jan_events", n=3))

# Understand event distribution
breakdown = ws.event_breakdown("jan_events")
print(f"{breakdown.total_events:,} events from {breakdown.total_users:,} users")
for e in breakdown.events[:5]:
    print(f"  {e.event_name}: {e.count:,} ({e.pct_of_total:.1f}%)")

# Discover queryable property keys
print(ws.property_keys("jan_events"))
```

This tells you what events exist, how they're distributed, and what properties you can queryβ€”so your SQL is informed rather than guesswork.

## Step 6: Query with SQL

Analyze the data with SQL:

```
mp query sql "SELECT event_name, COUNT(*) as count FROM jan_events GROUP BY 1 ORDER BY 2 DESC" --format table
```

```
import mixpanel_data as mp

ws = mp.Workspace()

# Get results as DataFrame
df = ws.sql("""
    SELECT event_name, COUNT(*) as count
    FROM jan_events
    GROUP BY 1
    ORDER BY 2 DESC
""")
print(df)
```

## Step 7: Run Live Queries

For real-time analytics, query Mixpanel directly:

```
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 --format table

# Filter results with built-in jq support
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
  --format json --jq '.total'
```

```
import mixpanel_data as mp

ws = mp.Workspace()
result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Access as DataFrame
print(result.df)
```

## Alternative: Stream Data Without Storage

For ETL pipelines or one-time processing, stream data directly without storing:

```
# Stream events as JSONL (memory-efficient for large datasets)
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout > events.jsonl

# Count unique users via Unix pipeline
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \
  | jq -r '.distinct_id' | sort -u | wc -l
```

```
import mixpanel_data as mp

ws = mp.Workspace()
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    send_to_warehouse(event)
ws.close()
```

## Temporary Workspaces

For one-off analysis without persisting data, use **ephemeral** or **in-memory** workspaces:

```
import mixpanel_data as mp

# Ephemeral: uses temp file (best for large datasets, benefits from compression)
with mp.Workspace.ephemeral() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database automatically deleted when context exits

# In-memory: no files created (best for small datasets or zero disk footprint)
with mp.Workspace.memory() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-07")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database gone - no files ever created
```
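As a capstone, one way to combine the steps above into a single throwaway analysis: an ephemeral workspace that fetches, queries, and exports, leaving only a CSV behind (the filename is arbitrary):

```
import mixpanel_data as mp

with mp.Workspace.ephemeral() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
    df = ws.sql("""
        SELECT event_name, COUNT(*) AS count
        FROM events
        GROUP BY 1
        ORDER BY 2 DESC
    """)
    df.to_csv("event_counts.csv", index=False)
# Temp database deleted here; only the CSV remains
```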
## Next Steps

- [Configuration](https://jaredmcfarland.github.io/mixpanel_data/getting-started/configuration/index.md) β€” Multiple accounts and advanced settings
- [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/index.md) β€” Filtering and progress callbacks
- [Streaming Data](https://jaredmcfarland.github.io/mixpanel_data/guide/streaming/index.md) β€” Process data without local storage
- [SQL Queries](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) β€” DuckDB JSON syntax and patterns
- [Live Analytics](https://jaredmcfarland.github.io/mixpanel_data/guide/live-analytics/index.md) β€” Segmentation, funnels, retention

# Configuration

mixpanel_data uses Service Accounts for authentication and supports multiple configuration methods for credentials and settings.

Explore on DeepWiki πŸ€–

**[Authentication Setup β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/2.2-authentication-setup)**

Ask questions about service accounts, environment variables, or multi-account configuration.

## Environment Variables

Set these environment variables to configure credentials:

| Variable | Description | Required |
| --- | --- | --- |
| `MP_USERNAME` | Service account username | Yes |
| `MP_SECRET` | Service account secret | Yes |
| `MP_PROJECT_ID` | Mixpanel project ID | Yes |
| `MP_REGION` | Data residency region (`us`, `eu`, `in`) | No (default: `us`) |
| `MP_CONFIG_PATH` | Override config file location | No |
| `MP_ACCOUNT` | Account name to use from config file | No |

Example:

```
export MP_USERNAME="sa_abc123..."
export MP_SECRET="your-secret-here"
export MP_PROJECT_ID="12345"
export MP_REGION="us"
```

## Config File

For persistent credential storage, use the config file at `~/.mp/config.toml`:

```
default = "production"

[accounts.production]
username = "sa_abc123..."
secret = "..."
project_id = "12345"
region = "us"

[accounts.staging]
username = "sa_xyz789..."
secret = "..."
project_id = "67890"
region = "eu"

[accounts.development]
username = "sa_dev456..."
secret = "..."
project_id = "11111"
region = "us"
```

### Managing Accounts with CLI

Add a new account:

```
# Interactive prompt (secure, recommended)
mp auth add production \
  --username sa_abc123... \
  --project 12345 \
  --region us
# You'll be prompted for the secret with hidden input
```

For CI/CD environments, provide the secret via environment variable or stdin:

```
# Via environment variable
MP_SECRET=your-secret mp auth add production --username sa_abc123... --project 12345

# Via stdin
echo "$SECRET" | mp auth add production --username sa_abc123... --project 12345 --secret-stdin
```

List configured accounts:

```
mp auth list
```

Switch the default account:

```
mp auth switch staging
```

Remove an account:

```
mp auth remove development
```

Show account details (secrets hidden):

```
mp auth show production
```

### Managing Accounts with Python

```
from mixpanel_data.auth import ConfigManager

config = ConfigManager()

# Add account
config.add_account(
    name="production",
    username="sa_abc123...",
    secret="your-secret",
    project_id="12345",
    region="us"
)

# List accounts
accounts = config.list_accounts()
for account in accounts:
    print(f"{account.name}: project {account.project_id} ({account.region})")

# Set default
config.set_default("production")

# Remove account
config.remove_account("old_account")
```
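One pattern the config file enables, sketched with the methods above: iterate every configured account and open a Workspace per project. This assumes each stored account has valid credentials:

```
import mixpanel_data as mp
from mixpanel_data.auth import ConfigManager

config = ConfigManager()
for account in config.list_accounts():
    # Open a workspace bound to this account's project
    ws = mp.Workspace(account=account.name)
    events = ws.list_events()
    print(f"{account.name}: {len(events)} event types")
    ws.close()
```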
## Credential Resolution Order

When creating a Workspace, credentials are resolved in this order:

1. **Explicit arguments** β€” `Workspace(project_id=..., region=...)`
2. **Environment variables** β€” `MP_USERNAME`, `MP_SECRET`, etc.
3. **Named account** β€” `Workspace(account="staging")` or `MP_ACCOUNT=staging`
4. **Default account** β€” The account marked as `default` in config.toml

Example showing resolution:

```
import mixpanel_data as mp

# Uses explicit arguments
ws = mp.Workspace(
    username="sa_...",
    secret="...",
    project_id="12345"
)

# Uses environment variables (if set)
ws = mp.Workspace()

# Uses named account from config file
ws = mp.Workspace(account="staging")
```

## Data Residency Regions

Mixpanel stores data in regional data centers. Use the correct region for your project:

| Region | Code | API Endpoint |
| --- | --- | --- |
| United States | `us` | `mixpanel.com` |
| European Union | `eu` | `eu.mixpanel.com` |
| India | `in` | `in.mixpanel.com` |

**Region Mismatch**

Using the wrong region will result in authentication errors or empty data.

## Workspace Path

By default, the workspace database is stored at `./mixpanel.db`. Override with:

```
import mixpanel_data as mp

# Custom path
ws = mp.Workspace(path="./data/analytics.db")

# Ephemeral (temporary, auto-deleted)
with mp.Workspace.ephemeral() as ws:
    ...  # work with data
# Database deleted on exit
```

For CLI, use the `--db` option:

```
mp fetch events --db ./data/my_project.db --from 2025-01-01 --to 2025-01-31
```

## Next Steps

- [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/index.md) β€” Learn about data ingestion
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/index.md) β€” Complete API documentation

# User Guide

# Fetching Data

Fetch events and user profiles from Mixpanel into a local DuckDB database for fast, repeated SQL queries.

Explore on DeepWiki πŸ€–

**[Fetching Data Guide β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.3-fetching-data)**

Ask questions about fetch options, parallel processing, or troubleshoot data ingestion issues.

## Fetching Events

### Basic Usage

Fetch all events for a date range:

```
import mixpanel_data as mp

ws = mp.Workspace()
result = ws.fetch_events(
    name="jan_events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
print(f"Fetched {result.row_count} events")
print(f"Duration: {result.duration_seconds:.1f}s")
```

```
mp fetch events jan_events --from 2025-01-01 --to 2025-01-31
```

### Filtering Events

Fetch specific event types:

```
result = ws.fetch_events(
    name="purchases",
    from_date="2025-01-01",
    to_date="2025-01-31",
    events=["Purchase", "Checkout Started"]
)
```

```
mp fetch events purchases --from 2025-01-01 --to 2025-01-31 \
  --events Purchase,"Checkout Started"
```

### Using Where Clauses

Filter with Mixpanel expression syntax:

```
result = ws.fetch_events(
    name="premium_purchases",
    from_date="2025-01-01",
    to_date="2025-01-31",
    where='properties["plan"] == "premium"'
)
```

```
mp fetch events premium_purchases --from 2025-01-01 --to 2025-01-31 \
  --where 'properties["plan"] == "premium"'
```

### Limiting Results

Cap the number of events returned (max 100,000):

```
result = ws.fetch_events(
    name="sample_events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    limit=10000
)
```

```
mp fetch events sample_events --from 2025-01-01 --to 2025-01-31 \
  --limit 10000
```

This is useful for testing queries or sampling data before a full fetch.

### Progress Tracking

Monitor fetch progress with a callback:

```
def on_progress(count: int) -> None:
    print(f"Fetched {count} events...")

result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    progress_callback=on_progress
)
```

The CLI automatically displays a progress bar.
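The callback can drive any progress UI. A sketch using `tqdm`, which is not a dependency of mixpanel_data, assuming the count passed in is cumulative as in the example above:

```
from tqdm import tqdm

bar = tqdm(desc="Fetching events", unit=" events")

def on_progress(count: int) -> None:
    # count is the cumulative total, so advance the bar by the delta
    bar.update(count - bar.n)

result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    progress_callback=on_progress,
)
bar.close()
```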
### Batch Size

Control the memory/IO tradeoff with `batch_size`:

```
# Smaller batch size = less memory, more disk IO
result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    batch_size=500
)

# Larger batch size = more memory, less disk IO
result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31",
    batch_size=5000
)
```

```
mp fetch events --from 2025-01-01 --to 2025-01-31 --batch-size 500
```

The default is 1000 rows per commit. Valid range: 100-100,000.

## Parallel Fetching

For large date ranges, parallel fetching can dramatically speed up exportsβ€”up to 10x faster for multi-month ranges.

### Basic Parallel Fetch

Enable parallel fetching with the `parallel` flag:

```
result = ws.fetch_events(
    name="q4_events",
    from_date="2024-10-01",
    to_date="2024-12-31",
    parallel=True
)
print(f"Fetched {result.total_rows} rows in {result.duration_seconds:.1f}s")
print(f"Batches: {result.successful_batches} succeeded, {result.failed_batches} failed")
```

```
mp fetch events q4_events --from 2024-10-01 --to 2024-12-31 --parallel
```

Parallel fetching splits the date range into 7-day chunks and fetches them concurrently using multiple threads. This bypasses Mixpanel's 100-day limit and enables faster exports.

### How It Works

1. **Date Range Chunking**: The date range is split into chunks (default: 7 days each)
2. **Concurrent Fetching**: Multiple threads fetch chunks simultaneously from Mixpanel
3. **Single-Writer Queue**: A dedicated writer thread serializes writes to DuckDB (respecting its single-writer constraint)
4. **Partial Failure Handling**: Failed batches are tracked for potential retry

### Performance

| Date Range | Sequential | Parallel (10 workers) | Speedup |
| --- | --- | --- | --- |
| 7 days | ~5s | ~5s | 1x (no benefit) |
| 30 days | ~20s | ~5s | 4x |
| 90 days | ~60s | ~8s | 7.5x |

**When to Use Parallel Fetching**

- **Use parallel** for date ranges > 7 days
- **Use sequential** for small ranges or when you need the `limit` parameter

### Configuring Workers

Control the number of concurrent fetch threads:

```
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    max_workers=5  # Default is 10
)
```

```
mp fetch events --from 2024-01-01 --to 2024-03-31 --parallel --workers 5
```

Higher worker counts may hit Mixpanel rate limits. The default of 10 works well for most cases.

### Configuring Chunk Size

Control how many days each chunk covers:

```
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    chunk_days=14  # Default is 7
)
```

```
mp fetch events --from 2024-01-01 --to 2024-03-31 --parallel --chunk-days 14
```

Smaller chunk sizes create more parallel batches (potentially faster) but increase API overhead. Valid range: 1-100 days.

### Progress Callbacks

Monitor batch completion with a callback:

```
from mixpanel_data import BatchProgress

def on_batch(progress: BatchProgress) -> None:
    status = "βœ“" if progress.success else "βœ—"
    print(f"[{status}] Batch {progress.batch_index + 1}/{progress.total_batches}: "
          f"{progress.from_date} to {progress.to_date} ({progress.rows} rows)")

result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    on_batch_complete=on_batch
)
```

The CLI automatically displays batch progress when `--parallel` is used.
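A sketch of a callback that tracks cumulative throughput across batches; the timing logic is illustrative and not part of the library:

```
import time

from mixpanel_data import BatchProgress

start = time.monotonic()
rows_done = 0

def on_batch(progress: BatchProgress) -> None:
    global rows_done
    rows_done += progress.rows
    elapsed = max(time.monotonic() - start, 1e-9)
    print(f"Batch {progress.batch_index + 1}/{progress.total_batches}: "
          f"{rows_done:,} rows total ({rows_done / elapsed:,.0f} rows/s)")

result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    on_batch_complete=on_batch,
)
```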
### Handling Failures

Parallel fetching tracks failures and provides retry information:

```
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True
)

if result.has_failures:
    print(f"Warning: {result.failed_batches} batches failed")
    for from_date, to_date in result.failed_date_ranges:
        print(f"  Failed: {from_date} to {to_date}")

    # Retry failed ranges with append mode
    for from_date, to_date in result.failed_date_ranges:
        ws.fetch_events(
            name="events",
            from_date=from_date,
            to_date=to_date,
            append=True  # Append to existing table
        )
```

**Parallel Fetch Limitations**

- **No `limit` parameter**: Parallel fetch does not support the `limit` parameter. Using both raises an error.
- **Exit code 1 on partial failure**: The CLI returns exit code 1 if any batches fail, even if some succeeded.

## Fetching Profiles

Fetch user profiles into local storage:

```
result = ws.fetch_profiles(name="users")
print(f"Fetched {result.row_count} profiles")
```

```
mp fetch profiles users
```

### Filtering Profiles

Use Mixpanel expression syntax:

```
result = ws.fetch_profiles(
    name="premium_users",
    where='properties["plan"] == "premium"'
)
```

```
mp fetch profiles premium_users \
  --where 'properties["plan"] == "premium"'
```

### Filtering by Cohort

Fetch only profiles that are members of a specific cohort:

```
result = ws.fetch_profiles(
    name="power_users",
    cohort_id="12345"
)
```

```
mp fetch profiles power_users --cohort-id 12345
```

### Selecting Specific Properties

Reduce bandwidth and memory by fetching only the properties you need:

```
result = ws.fetch_profiles(
    name="user_emails",
    output_properties=["$email", "$name", "plan"]
)
```

```
mp fetch profiles user_emails --output-properties '$email,$name,plan'
```

### Combining Filters

Filters can be combined for precise data selection:

```
result = ws.fetch_profiles(
    name="premium_emails",
    cohort_id="premium_cohort",
    output_properties=["$email", "$name"],
    where='properties["country"] == "US"'
)
```

```
mp fetch profiles premium_emails \
  --cohort-id premium_cohort \
  --output-properties '$email,$name' \
  --where 'properties["country"] == "US"'
```

### Fetching Specific Users by ID

Fetch one or more specific users by their distinct ID:

```
# Single user
result = ws.fetch_profiles(
    name="single_user",
    distinct_id="user_123"
)

# Multiple specific users
result = ws.fetch_profiles(
    name="specific_users",
    distinct_ids=["user_1", "user_2", "user_3"]
)
```

```
# Single user
mp fetch profiles single_user --distinct-id user_123

# Multiple specific users
mp fetch profiles specific_users \
  --distinct-ids user_1 --distinct-ids user_2 --distinct-ids user_3
```

**Mutually Exclusive**

`distinct_id` and `distinct_ids` cannot be used together. Choose one approach based on your needs.

### Fetching Group Profiles

Fetch group profiles (companies, accounts, etc.) instead of user profiles:

```
result = ws.fetch_profiles(
    name="companies",
    group_id="companies"  # The group type defined in your Mixpanel project
)
```

```
mp fetch profiles companies --group-id companies
```
### Behavioral Filtering

Filter profiles by event behaviorβ€”users who performed specific actions. Behaviors use a named pattern that you reference in a `where` clause:

```
# Users who purchased in the last 30 days
result = ws.fetch_profiles(
    name="recent_purchasers",
    behaviors=[{
        "window": "30d",
        "name": "made_purchase",
        "event_selectors": [{"event": "Purchase"}]
    }],
    where='(behaviors["made_purchase"] > 0)'
)

# Users with multiple behavior criteria
result = ws.fetch_profiles(
    name="engaged_users",
    behaviors=[
        {
            "window": "30d",
            "name": "purchased",
            "event_selectors": [{"event": "Purchase"}]
        },
        {
            "window": "7d",
            "name": "active",
            "event_selectors": [{"event": "Page View"}]
        }
    ],
    where='(behaviors["purchased"] > 0) and (behaviors["active"] >= 5)'
)
```

```
# Users who purchased in the last 30 days
mp fetch profiles recent_purchasers \
  --behaviors '[{"window":"30d","name":"made_purchase","event_selectors":[{"event":"Purchase"}]}]' \
  --where '(behaviors["made_purchase"] > 0)'

# Users with multiple behavior criteria
mp fetch profiles engaged_users \
  --behaviors '[{"window":"30d","name":"purchased","event_selectors":[{"event":"Purchase"}]},{"window":"7d","name":"active","event_selectors":[{"event":"Page View"}]}]' \
  --where '(behaviors["purchased"] > 0) and (behaviors["active"] >= 5)'
```

**Behavior Format**

Each behavior requires:

- `window`: Time window (e.g., "30d", "7d", "90d")
- `name`: Identifier to reference in `where` clause
- `event_selectors`: Array of event filters with `{"event": "Event Name"}`

The `where` clause filters using `behaviors["name"]` to check counts.

**Mutually Exclusive**

`behaviors` and `cohort_id` cannot be used together. Use one or the other for filtering.

### Historical Profile State

Query profile properties as they existed at a specific point in time:

```
# Get profiles as of January 1, 2024
timestamp = 1704067200  # Unix timestamp

result = ws.fetch_profiles(
    name="historical_profiles",
    as_of_timestamp=timestamp
)
```

```
# Get profiles as of January 1, 2024 (Unix timestamp)
mp fetch profiles historical_profiles --as-of-timestamp 1704067200
```
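Rather than hard-coding the Unix timestamp, you can derive it with the standard library; this sketch reproduces the same January 1, 2024 instant:

```
from datetime import datetime, timezone

# Midnight UTC on January 1, 2024 == 1704067200
as_of = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())

result = ws.fetch_profiles(
    name="historical_profiles",
    as_of_timestamp=as_of,
)
```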
### Cohort Membership Analysis

Include all users and mark whether they're in a cohort:

```
result = ws.fetch_profiles(
    name="cohort_analysis",
    cohort_id="power_users",
    include_all_users=True  # Include non-members too
)
```

```
mp fetch profiles cohort_analysis \
  --cohort-id power_users --include-all-users
```

This is useful for comparing users inside and outside a cohort. The response includes a membership indicator for each profile.

**Requires Cohort**

`include_all_users` requires `cohort_id`. It has no effect without specifying a cohort.

## Parallel Profile Fetching

For large profile datasets (thousands of profiles), parallel fetching can dramatically speed up exportsβ€”up to 5x faster.

### Basic Parallel Profile Fetch

Enable parallel fetching with the `parallel` flag:

```
result = ws.fetch_profiles(
    name="all_users",
    parallel=True
)
print(f"Fetched {result.total_rows} profiles in {result.duration_seconds:.1f}s")
print(f"Pages: {result.successful_pages} succeeded, {result.failed_pages} failed")
```

```
mp fetch profiles all_users --parallel
```

Parallel profile fetching uses page-based parallelismβ€”fetching multiple pages of profiles concurrently using a session ID for consistency.

### How It Works

1. **Session-Based Pagination**: The initial page establishes a session ID for consistent results
2. **Dynamic Page Discovery**: Pages are fetched as they're discovered (not pre-scheduled)
3. **Concurrent Fetching**: Multiple threads fetch pages simultaneously (default: 5 workers)
4. **Single-Writer Queue**: A dedicated writer thread serializes writes to DuckDB
5. **Partial Failure Handling**: Failed pages are tracked for potential retry

### Performance

| Profile Count | Sequential | Parallel (5 workers) | Speedup |
| --- | --- | --- | --- |
| 1,000 | ~2s | ~2s | 1x (no benefit) |
| 10,000 | ~10s | ~3s | 3x |
| 50,000 | ~50s | ~12s | 4x |

**When to Use Parallel Profile Fetching**

- **Use parallel** for datasets with 5,000+ profiles
- **Use sequential** for small datasets or when you need maximum consistency

### Configuring Workers

Control the number of concurrent fetch threads:

```
result = ws.fetch_profiles(
    name="users",
    parallel=True,
    max_workers=3  # Default is 5, max is 5
)
```

```
mp fetch profiles users --parallel --workers 3
```

**Worker Limit**

Workers are capped at 5 to avoid Mixpanel API rate limits (60 requests/hour for the Engage API). Requesting more than 5 workers will be automatically capped.

### Progress Callbacks

Monitor page completion with a callback:

```
from mixpanel_data import ProfileProgress

def on_page(progress: ProfileProgress) -> None:
    status = "βœ“" if progress.success else "βœ—"
    print(f"[{status}] Page {progress.page_index}: "
          f"{progress.rows} rows (cumulative: {progress.cumulative_rows})")

result = ws.fetch_profiles(
    name="users",
    parallel=True,
    on_page_complete=on_page
)
```

The CLI automatically displays page progress when `--parallel` is used.

### Handling Failures

Parallel fetching tracks failures and provides information for debugging:

```
result = ws.fetch_profiles(
    name="users",
    parallel=True
)

if result.has_failures:
    print(f"Warning: {result.failed_pages} pages failed")
    print(f"Failed page indices: {result.failed_page_indices}")
```

**Parallel Profile Fetch Limitations**

- **Rate limits**: The Engage API has a 60 requests/hour limit. Large exports with many pages may hit this limit.
- **Exit code 1 on partial failure**: The CLI returns exit code 1 if any pages fail, even if some succeeded.

### Combining with Filters

Parallel fetching works with all profile filters:

```
result = ws.fetch_profiles(
    name="premium_users",
    where='properties["plan"] == "premium"',
    output_properties=["$email", "$name", "plan"],
    parallel=True,
    max_workers=3
)
```

```
mp fetch profiles premium_users \
  --where 'properties["plan"] == "premium"' \
  --output-properties '$email,$name,plan' \
  --parallel --workers 3
```

## Table Naming

Tables are stored with the name you provide:

```
ws.fetch_events(name="jan_events", ...)   # Creates table: jan_events
ws.fetch_events(name="feb_events", ...)   # Creates table: feb_events
ws.fetch_profiles(name="users")           # Creates table: users
```

**Table Names Must Be Unique**

Fetching to an existing table name raises `TableExistsError`. Use `--replace` to overwrite, `--append` to add data, or choose a different name.
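In Python the equivalent of `--replace` is the drop-and-refetch pattern shown under Replace Mode below; a sketch that reacts to the exception instead, assuming `TableExistsError` is exported from the top-level package:

```
from mixpanel_data import TableExistsError  # assumed export; adjust to your install

try:
    ws.fetch_events(
        name="events",
        from_date="2025-01-01",
        to_date="2025-01-31",
    )
except TableExistsError:
    # Table already exists: drop it and fetch fresh data
    ws.drop("events")
    ws.fetch_events(
        name="events",
        from_date="2025-01-01",
        to_date="2025-01-31",
    )
```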
## Replacing and Appending

### Replace Mode

Drop and recreate a table with fresh data:

```
# First drop the table, then fetch
ws.drop("events")
result = ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)
```

```
mp fetch events --from 2025-01-01 --to 2025-01-31 --replace
```

### Append Mode

Add data to an existing table. Duplicates (by `insert_id` for events, `distinct_id` for profiles) are automatically skipped:

```
# Initial fetch
ws.fetch_events(
    name="events",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Append more data
ws.fetch_events(
    name="events",
    from_date="2025-02-01",
    to_date="2025-02-28",
    append=True
)
```

```
# Initial fetch
mp fetch events --from 2025-01-01 --to 2025-01-31

# Append more data
mp fetch events --from 2025-02-01 --to 2025-02-28 --append
```

**Resuming Failed Fetches**

If a fetch crashes or times out, use append mode to resume from where you left off:

```
# Check the last event timestamp
mp query sql "SELECT MAX(event_time) FROM events"
# 2025-01-15T14:30:00

# Resume from that point
mp fetch events --from 2025-01-15 --to 2025-01-31 --append
```

Overlapping date ranges are safeβ€”duplicates are automatically skipped.

## Table Management

### Listing Tables

```
tables = ws.tables()
for table in tables:
    print(f"{table.name}: {table.row_count} rows ({table.type})")
```

### Viewing Table Schema

```
schema = ws.schema("jan_events")
for col in schema.columns:
    print(f"{col.name}: {col.type}")
```

### Dropping Tables

```
ws.drop("jan_events")  # Drop single table
ws.drop_all()          # Drop all tables
```

## FetchResult

Both `fetch_events()` and `fetch_profiles()` return a `FetchResult`:

```
result = ws.fetch_events(...)

# Attributes
result.table_name        # "jan_events"
result.row_count         # 125000
result.duration_seconds  # 45.2

# Metadata
result.metadata.from_date   # "2025-01-01"
result.metadata.to_date     # "2025-01-31"
result.metadata.events      # ["Purchase", "Signup"] or None
result.metadata.where       # 'properties["plan"]...' or None
result.metadata.fetched_at  # datetime

# Serialization
result.to_dict()  # JSON-serializable dict
```

## Event Table Schema

Fetched events have this schema:

| Column | Type | Description |
| --- | --- | --- |
| `event_id` | VARCHAR | Unique event identifier |
| `event_name` | VARCHAR | Event name |
| `event_time` | TIMESTAMP | When the event occurred |
| `distinct_id` | VARCHAR | User identifier |
| `insert_id` | VARCHAR | Deduplication ID |
| `properties` | JSON | All event properties |

## Profile Table Schema

Fetched profiles have this schema:

| Column | Type | Description |
| --- | --- | --- |
| `distinct_id` | VARCHAR | User identifier (primary key) |
| `properties` | JSON | All profile properties |

## Best Practices

### Use Parallel Fetching for Large Date Ranges

For date ranges longer than a week, use parallel fetching for the best performance:

```
# Recommended: Parallel fetch for large date ranges
result = ws.fetch_events(
    name="events_2025",
    from_date="2025-01-01",
    to_date="2025-12-31",
    parallel=True
)
print(f"Fetched {result.total_rows} rows in {result.duration_seconds:.1f}s")
```

```
# Recommended: Parallel fetch for large date ranges
mp fetch events events_2025 --from 2025-01-01 --to 2025-12-31 --parallel
```

Parallel fetching automatically handles chunking, concurrent API requests, and serialized writes to DuckDBβ€”no manual chunking required.
### Manual Chunking (Alternative)

If you need the `limit` parameter (incompatible with parallel), or want fine-grained control, you can manually chunk:

```
import datetime

# Fetch first chunk
ws.fetch_events(
    name="events_2025",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Append subsequent chunks
start = datetime.date(2025, 2, 1)
end = datetime.date(2025, 12, 31)
current = start
while current <= end:
    chunk_end = min(current + datetime.timedelta(days=30), end)
    ws.fetch_events(
        name="events_2025",
        from_date=str(current),
        to_date=str(chunk_end),
        append=True  # Add to existing table
    )
    current = chunk_end + datetime.timedelta(days=1)
```

```
# Fetch month by month, appending to a single table
mp fetch events events_2025 --from 2025-01-01 --to 2025-01-31
mp fetch events events_2025 --from 2025-02-01 --to 2025-02-28 --append
mp fetch events events_2025 --from 2025-03-01 --to 2025-03-31 --append
# ... continue for each month
```

```
import datetime

start = datetime.date(2025, 1, 1)
end = datetime.date(2025, 12, 31)
current = start
while current <= end:
    chunk_end = min(current + datetime.timedelta(days=30), end)
    table_name = f"events_{current.strftime('%Y%m')}"
    ws.fetch_events(
        name=table_name,
        from_date=str(current),
        to_date=str(chunk_end)
    )
    current = chunk_end + datetime.timedelta(days=1)
```

### Choose the Right Storage Mode

mixpanel_data offers three storage modes:

| Mode | Method | Disk Usage | Best For |
| --- | --- | --- | --- |
| **Persistent** | `Workspace()` | Yes (permanent) | Repeated analysis, large datasets |
| **Ephemeral** | `Workspace.ephemeral()` | Yes (temp file, auto-deleted) | One-off analysis with large data |
| **In-Memory** | `Workspace.memory()` | None | Small datasets, testing, zero disk footprint |

**Ephemeral mode** creates a temp file that benefits from DuckDB's compressionβ€”up to 8Γ— faster for large datasets:

```
with mp.Workspace.ephemeral() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
    result = ws.sql("SELECT event_name, COUNT(*) FROM events GROUP BY 1")
# Database automatically deleted
```

**In-memory mode** creates no files at allβ€”ideal for small datasets, unit tests, or privacy-sensitive scenarios:

```
with mp.Workspace.memory() as ws:
    ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-07")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database gone - no files ever created
```

**When to use each mode**

- **Persistent**: You'll query the same data multiple times across sessions
- **Ephemeral**: Large datasets where you need compression benefits but won't keep the data
- **In-Memory**: Small datasets, unit tests, or when zero disk footprint is required

## Streaming as an Alternative

If you don't need to store data locally, use streaming instead:

| Approach | Storage | Best For |
| --- | --- | --- |
| `fetch_events()` | DuckDB table | Repeated SQL analysis |
| `stream_events()` | None | ETL pipelines, one-time processing |

```
# Stream directly without storage
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    send_to_warehouse(event)
```

See [Streaming Data](https://jaredmcfarland.github.io/mixpanel_data/guide/streaming/index.md) for details.
## Next Steps

- [Streaming Data](https://jaredmcfarland.github.io/mixpanel_data/guide/streaming/index.md) β€” Process data without local storage
- [SQL Queries](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) β€” Query your fetched data with SQL
- [Live Analytics](https://jaredmcfarland.github.io/mixpanel_data/guide/live-analytics/index.md) β€” Query Mixpanel directly for real-time data

# Streaming Data

Stream events and user profiles directly from Mixpanel without storing to a local database. Ideal for ETL pipelines, one-time exports, and Unix-style piping.

Explore on DeepWiki πŸ€–

**[Data Flow Patterns β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/4.1-data-flow-patterns)**

Ask questions about streaming vs fetching, memory-efficient processing, or ETL pipeline patterns.

## When to Stream vs Fetch

| Use Case | Recommended | Why |
| --- | --- | --- |
| Repeated analysis | `fetch_events()` | Query once, analyze many times |
| ETL to external system | `stream_events()` | No intermediate storage needed |
| Memory-constrained | `stream_events()` | Constant memory usage |
| Ad-hoc exploration | `fetch_events()` | SQL iteration is faster |
| Piping to tools | `--stdout` | JSONL integrates with jq, grep, etc. |

## Streaming Events

### Basic Usage

Stream all events for a date range:

```
import mixpanel_data as mp

ws = mp.Workspace()

for event in ws.stream_events(
    from_date="2025-01-01",
    to_date="2025-01-31"
):
    print(f"{event['event_name']}: {event['distinct_id']}")
    # event_time is a datetime object
    # properties contains remaining fields

ws.close()
```

```
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout
```

### Filtering Events

Filter by event name or expression:

```
# Filter by event names
for event in ws.stream_events(
    from_date="2025-01-01",
    to_date="2025-01-31",
    events=["Purchase", "Signup"]
):
    process(event)

# Filter with WHERE clause
for event in ws.stream_events(
    from_date="2025-01-01",
    to_date="2025-01-31",
    where='properties["country"]=="US"'
):
    process(event)
```

```
# Filter by event names
mp fetch events --from 2025-01-01 --to 2025-01-31 \
  --events "Purchase,Signup" --stdout

# Filter with WHERE clause
mp fetch events --from 2025-01-01 --to 2025-01-31 \
  --where 'properties["country"]=="US"' --stdout
```
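Streamed events are plain dicts, so they drop straight into other tools. A sketch that builds a small pandas DataFrame from a filtered stream (assumes pandas is installed; keep the date range modest, since materializing a list gives up streaming's constant memory):

```
import pandas as pd

rows = [
    {
        "event": event["event_name"],
        "user": event["distinct_id"],
        "time": event["event_time"],
    }
    for event in ws.stream_events(
        from_date="2025-01-01",
        to_date="2025-01-07",
        events=["Purchase"],
    )
]
df = pd.DataFrame(rows)
print(df.head())
```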
### Raw API Format

By default, streaming returns normalized data with `event_time` as a datetime. Use `raw=True` to get the exact Mixpanel API format:

```
for event in ws.stream_events(
    from_date="2025-01-01",
    to_date="2025-01-31",
    raw=True
):
    # event has {"event": "...", "properties": {...}} structure
    # properties["time"] is Unix timestamp
    legacy_system.ingest(event)
```

```
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout --raw
```

## Streaming Profiles

### Basic Usage

Stream all user profiles:

```
for profile in ws.stream_profiles():
    sync_to_crm(profile)
```

```
mp fetch profiles --stdout
```

### Filtering Profiles

```
for profile in ws.stream_profiles(
    where='properties["plan"]=="premium"'
):
    send_survey(profile)
```

```
mp fetch profiles --where 'properties["plan"]=="premium"' --stdout
```

### Streaming Specific Users

Stream a single user by their distinct ID:

```
for profile in ws.stream_profiles(distinct_id="user_123"):
    process(profile)
```

```
mp fetch profiles --distinct-id user_123 --stdout
```

Stream multiple specific users:

```
user_ids = ["user_123", "user_456", "user_789"]
for profile in ws.stream_profiles(distinct_ids=user_ids):
    sync_to_external_system(profile)
```

```
mp fetch profiles --distinct-ids "user_123,user_456,user_789" --stdout
```

**Mutually Exclusive**

`distinct_id` and `distinct_ids` cannot be used together. Use `distinct_id` for a single user, `distinct_ids` for multiple users.

### Streaming Group Profiles

Stream group profiles (e.g., companies, accounts) instead of user profiles:

```
# Stream all company profiles
for company in ws.stream_profiles(group_id="companies"):
    sync_company(company)

# Filter group profiles
for account in ws.stream_profiles(
    group_id="accounts",
    where='properties["plan"]=="enterprise"'
):
    process_enterprise_account(account)
```

```
# Stream company profiles
mp fetch profiles --group-id companies --stdout

# Filter group profiles
mp fetch profiles --group-id accounts \
  --where 'properties["plan"]=="enterprise"' --stdout
```

### Behavioral Filtering

Stream users based on actions they've performed. Behaviors use a named pattern that you reference in a `where` clause:

```
# Users who completed a purchase in last 30 days
behaviors = [{
    "window": "30d",
    "name": "made_purchase",
    "event_selectors": [{"event": "Purchase"}]
}]
for profile in ws.stream_profiles(
    behaviors=behaviors,
    where='(behaviors["made_purchase"] > 0)'
):
    send_thank_you(profile)

# Users who signed up but didn't purchase
behaviors = [
    {"window": "30d", "name": "signed_up", "event_selectors": [{"event": "Signup"}]},
    {"window": "30d", "name": "purchased", "event_selectors": [{"event": "Purchase"}]}
]
for profile in ws.stream_profiles(
    behaviors=behaviors,
    where='(behaviors["signed_up"] > 0) and (behaviors["purchased"] == 0)'
):
    send_conversion_reminder(profile)
```

```
# Users who completed a purchase in last 30 days
mp fetch profiles \
  --behaviors '[{"window":"30d","name":"made_purchase","event_selectors":[{"event":"Purchase"}]}]' \
  --where '(behaviors["made_purchase"] > 0)' \
  --stdout

# Users who signed up but didn't purchase
mp fetch profiles \
  --behaviors '[{"window":"30d","name":"signed_up","event_selectors":[{"event":"Signup"}]},{"window":"30d","name":"purchased","event_selectors":[{"event":"Purchase"}]}]' \
  --where '(behaviors["signed_up"] > 0) and (behaviors["purchased"] == 0)' \
  --stdout
```

**Behavior Format**

Each behavior requires: `window` (time window like "30d"), `name` (identifier for the `where` clause), and `event_selectors` (array with `{"event": "Name"}`).

**Mutually Exclusive**

`behaviors` cannot be used with `cohort_id`. Use one or the other for filtering.
### Historical Profile State

Query profile state at a specific point in time:

```
import time

# Profile state from 7 days ago
seven_days_ago = int(time.time()) - (7 * 24 * 60 * 60)
for profile in ws.stream_profiles(as_of_timestamp=seven_days_ago):
    compare_historical_state(profile)
```

```
# Query historical state (Unix timestamp)
mp fetch profiles --as-of-timestamp 1704067200 --stdout
```

### Cohort Membership Analysis

Get all users with cohort membership marked:

```
# Stream all users, marking which are in the cohort
for profile in ws.stream_profiles(
    cohort_id="12345",
    include_all_users=True
):
    if profile.get("in_cohort"):
        tag_as_cohort_member(profile)
    else:
        tag_as_non_member(profile)
```

```
mp fetch profiles --cohort-id 12345 --include-all-users --stdout
```

**Requires cohort_id**

`include_all_users` only works when `cohort_id` is specified.

## CLI Pipeline Examples

The `--stdout` flag outputs JSONL (one JSON object per line), perfect for Unix pipelines:

```
# Filter with jq
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \
  | jq 'select(.event_name == "Purchase")'

# Count events
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout | wc -l

# Save to file
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout > events.jsonl

# Process with custom script
mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \
  | python process_events.py

# Extract specific fields
mp fetch profiles --stdout | jq -r '.distinct_id'
```

## Output Formats

### Normalized Format (Default)

Events:

```
{
  "event_name": "Purchase",
  "distinct_id": "user_123",
  "event_time": "2025-01-15T10:30:00+00:00",
  "insert_id": "abc123",
  "properties": {
    "amount": 99.99,
    "currency": "USD"
  }
}
```

Profiles:

```
{
  "distinct_id": "user_123",
  "last_seen": "2025-01-15T14:30:00",
  "properties": {
    "name": "Alice",
    "plan": "premium"
  }
}
```

### Raw Format (`raw=True` or `--raw`)

Events:

```
{
  "event": "Purchase",
  "properties": {
    "distinct_id": "user_123",
    "time": 1705319400,
    "$insert_id": "abc123",
    "amount": 99.99,
    "currency": "USD"
  }
}
```

Profiles:

```
{
  "$distinct_id": "user_123",
  "$properties": {
    "$last_seen": "2025-01-15T14:30:00",
    "name": "Alice",
    "plan": "premium"
  }
}
```

## Common Patterns

### ETL Pipeline

Batch events and send to an external system:

```
import mixpanel_data as mp

from your_warehouse import send_batch

ws = mp.Workspace()

batch = []
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    batch.append(event)
    if len(batch) >= 1000:
        send_batch(batch)
        batch = []

# Send remaining
if batch:
    send_batch(batch)

ws.close()
```

### Aggregation Without Storage

Compute statistics without creating a local table:

```
from collections import Counter

import mixpanel_data as mp

ws = mp.Workspace()

event_counts = Counter()
for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
    event_counts[event["event_name"]] += 1

print(event_counts.most_common(10))
ws.close()
```

### Context Manager

Use `with` for automatic cleanup:

```
import mixpanel_data as mp

with mp.Workspace() as ws:
    for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
        process(event)
# No need to call ws.close()
```
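The Python counterpart of `--stdout > events.jsonl`: a sketch that writes JSONL directly, using `default=str` to serialize the `event_time` datetime:

```
import json

import mixpanel_data as mp

with mp.Workspace() as ws, open("events.jsonl", "w") as f:
    for event in ws.stream_events(from_date="2025-01-01", to_date="2025-01-31"):
        # Normalized events carry a datetime; default=str makes it JSON-safe
        f.write(json.dumps(event, default=str) + "\n")
```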
## Method Signatures

### stream_events()

```
def stream_events(
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    where: str | None = None,
    raw: bool = False,
) -> Iterator[dict[str, Any]]
```

| Parameter | Type | Description |
| --- | --- | --- |
| `from_date` | `str` | Start date (YYYY-MM-DD) |
| `to_date` | `str` | End date (YYYY-MM-DD) |
| `events` | `list[str] \| None` | Event names to include |
| `where` | `str \| None` | Mixpanel expression filter |
| `raw` | `bool` | Return raw API format |

### stream_profiles()

```
def stream_profiles(
    *,
    where: str | None = None,
    cohort_id: str | None = None,
    output_properties: list[str] | None = None,
    raw: bool = False,
    distinct_id: str | None = None,
    distinct_ids: list[str] | None = None,
    group_id: str | None = None,
    behaviors: list[dict[str, Any]] | None = None,
    as_of_timestamp: int | None = None,
    include_all_users: bool = False,
) -> Iterator[dict[str, Any]]
```

| Parameter | Type | Description |
| --- | --- | --- |
| `where` | `str \| None` | Mixpanel expression filter |
| `cohort_id` | `str \| None` | Filter by cohort membership |
| `output_properties` | `list[str] \| None` | Limit returned properties |
| `raw` | `bool` | Return raw API format |
| `distinct_id` | `str \| None` | Single user ID to fetch |
| `distinct_ids` | `list[str] \| None` | Multiple user IDs to fetch |
| `group_id` | `str \| None` | Group type for group profiles |
| `behaviors` | `list[dict] \| None` | Behavioral filters |
| `as_of_timestamp` | `int \| None` | Historical state Unix timestamp |
| `include_all_users` | `bool` | Include all users with cohort marking |

**Parameter Constraints:**

- `distinct_id` and `distinct_ids` are mutually exclusive
- `behaviors` and `cohort_id` are mutually exclusive
- `include_all_users` requires `cohort_id` to be set

## Next Steps

- [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/index.md) β€” Store data locally for repeated SQL queries
- [SQL Queries](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) β€” Query stored data with DuckDB SQL
- [Live Analytics](https://jaredmcfarland.github.io/mixpanel_data/guide/live-analytics/index.md) β€” Real-time Mixpanel reports

# Local SQL Queries

Query your fetched data with SQL using DuckDB's powerful analytical engine.

Explore on DeepWiki πŸ€–

**[Querying Data Guide β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.4-querying-data)**

Ask questions about SQL patterns, JSON property access, or how to structure complex analytical queries.

## Basic Queries

### Execute and Get DataFrame

```
import mixpanel_data as mp

ws = mp.Workspace()
df = ws.sql("""
    SELECT event_name, COUNT(*) as count
    FROM jan_events
    GROUP BY 1
    ORDER BY 2 DESC
""")
print(df)
```

### Get Single Value

```
total = ws.sql_scalar("SELECT COUNT(*) FROM jan_events")
print(f"Total events: {total}")
```

### Get Rows as Tuples

```
rows = ws.sql_rows("""
    SELECT event_name, COUNT(*)
    FROM jan_events
    GROUP BY 1
    LIMIT 5
""")
for event_name, count in rows:
    print(f"{event_name}: {count}")
```

## DuckDB JSON Syntax

Mixpanel properties are stored as JSON columns. Use DuckDB's JSON operators to access them.
### Extract String Property

```
SELECT properties->>'$.country' as country
FROM jan_events
```

### Extract and Cast Numeric

```
SELECT CAST(properties->>'$.amount' AS DECIMAL) as amount
FROM jan_events
```

### Filter on Property

```
SELECT * FROM jan_events
WHERE properties->>'$.plan' = 'premium'
```

### Nested Property Access

```
SELECT properties->>'$.user.email' as email
FROM jan_events
```

### Check Property Exists

```
SELECT * FROM jan_events
WHERE properties->>'$.coupon_code' IS NOT NULL
```

### Array Properties

```
-- Array length
SELECT json_array_length(properties->'$.items') as item_count
FROM jan_events

-- Array element
SELECT properties->'$.items'->>0 as first_item
FROM jan_events
```

## Common Query Patterns

### Daily Event Counts

```
SELECT
    DATE_TRUNC('day', event_time) as day,
    COUNT(*) as count
FROM jan_events
GROUP BY 1
ORDER BY 1
```

### Events by User

```
SELECT
    distinct_id,
    COUNT(*) as event_count,
    MIN(event_time) as first_seen,
    MAX(event_time) as last_seen
FROM jan_events
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
```

### Property Distribution

```
SELECT
    properties->>'$.country' as country,
    COUNT(*) as count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as pct
FROM jan_events
WHERE event_name = 'Purchase'
GROUP BY 1
ORDER BY 2 DESC
```

### Revenue by Day

```
SELECT
    DATE_TRUNC('day', event_time) as day,
    COUNT(*) as purchases,
    SUM(CAST(properties->>'$.amount' AS DECIMAL)) as revenue
FROM jan_events
WHERE event_name = 'Purchase'
GROUP BY 1
ORDER BY 1
```

### User Cohort Analysis

```
WITH first_events AS (
    SELECT
        distinct_id,
        DATE_TRUNC('week', MIN(event_time)) as cohort_week
    FROM jan_events
    WHERE event_name = 'Signup'
    GROUP BY 1
)
SELECT
    cohort_week,
    COUNT(DISTINCT distinct_id) as users
FROM first_events
GROUP BY 1
ORDER BY 1
```

### Funnel Query

```
WITH step1 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'View Product'
),
step2 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'Add to Cart'
      AND distinct_id IN (SELECT distinct_id FROM step1)
),
step3 AS (
    SELECT DISTINCT distinct_id
    FROM jan_events
    WHERE event_name = 'Purchase'
      AND distinct_id IN (SELECT distinct_id FROM step2)
)
SELECT
    (SELECT COUNT(*) FROM step1) as viewed,
    (SELECT COUNT(*) FROM step2) as added,
    (SELECT COUNT(*) FROM step3) as purchased
```

## Joining Events and Profiles

Query events with user profile data:

```
# First, fetch both
ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")
ws.fetch_profiles("users")

# Join them
df = ws.sql("""
    SELECT
        e.event_name,
        u.properties->>'$.plan' as plan,
        COUNT(*) as count
    FROM events e
    JOIN users u ON e.distinct_id = u.distinct_id
    GROUP BY 1, 2
    ORDER BY 3 DESC
""")
```

## CLI Usage

Run SQL queries from the command line:

```
# Table output
mp query sql "SELECT event_name, COUNT(*) FROM events GROUP BY 1" --format table

# JSON output
mp query sql "SELECT * FROM events LIMIT 10" --format json

# CSV export
mp query sql "SELECT * FROM events" --format csv > events.csv

# JSONL for streaming
mp query sql "SELECT * FROM events" --format jsonl > events.jsonl

# Filter with built-in jq support
mp query sql "SELECT * FROM events LIMIT 100" --format json \
  --jq '.[] | select(.event_name == "Purchase")'

# Extract specific fields with jq
mp query sql "SELECT event_name, COUNT(*) as cnt FROM events GROUP BY 1" \
  --format json --jq 'map({name: .event_name, count: .cnt})'
```

## Direct DuckDB Access

For advanced use cases, access the DuckDB connection directly:

```
# Get the connection
conn = ws.connection
operations
conn.execute("SET threads TO 4")
result = conn.execute("EXPLAIN ANALYZE SELECT * FROM events").fetchall()
```

### Database Path

Get the path to the underlying database file:

```
# Get the database file path
path = ws.db_path
print(f"Data stored at: {path}")

# Useful for reopening the same database later
ws.close()
ws = mp.Workspace.open(path)
```

Note: `db_path` returns `None` for in-memory workspaces created with `Workspace.memory()`.

## Performance Tips

### Use Appropriate Data Types

Cast string properties to appropriate types, and if you repeat the same casts across queries, move them into a view:

```
-- Cast rather than comparing strings
WHERE CAST(properties->>'$.amount' AS DECIMAL) > 100

-- For repeated use, create a view with typed columns
CREATE VIEW typed_events AS
SELECT
  event_id,
  event_name,
  event_time,
  distinct_id,
  CAST(properties->>'$.amount' AS DECIMAL) as amount,
  properties->>'$.country' as country
FROM jan_events
```

### Limit Result Sets

Always use LIMIT during exploration:

```
SELECT * FROM jan_events LIMIT 100
```

### Use Aggregations

DuckDB is optimized for analytical queries. Prefer aggregations (`COUNT(*)`, `SUM(...)` with `GROUP BY`) over pulling raw rows with `SELECT *` and aggregating in Python.

## Next Steps

- [Live Analytics](https://jaredmcfarland.github.io/mixpanel_data/guide/live-analytics/index.md) — Query Mixpanel directly
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md) — Complete Workspace API

# Live Analytics

Query Mixpanel's analytics APIs directly for real-time data without fetching to local storage.

Explore on DeepWiki

🤖 **[Querying Data Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.4-querying-data)**

Ask questions about segmentation, funnels, retention, JQL, or other live query methods.

## When to Use Live Queries

Use live queries when:

- You need the most current data
- You're running one-off analysis
- The query is already optimized by Mixpanel (segmentation, funnels, retention)
- You want to leverage Mixpanel's pre-computed aggregations

Use local queries when:

- You need to run many queries over the same data
- You need custom SQL logic
- You want to minimize API calls
- Context window preservation matters (for AI agents)

## Segmentation

Time-series event counts with optional property segmentation:

```
import mixpanel_data as mp

ws = mp.Workspace()

# Simple count over time
result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31"
)

# Segment by property
result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country"
)

# With filtering
result = ws.segmentation(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country",
    where='properties["plan"] == "premium"',
    unit="week"  # day, week, month
)

# Access as DataFrame
print(result.df)
```

```
# Simple segmentation
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31

# With property breakdown
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
  --on country --format table

# Filter with jq to get just the total
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
  --format json --jq '.total'

# Get top 3 days by volume
mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \
  --format json --jq '.series | to_entries | sort_by(.value) | reverse | .[:3]'
```

### SegmentationResult

```
result.event     # "Purchase"
result.dates     # ["2025-01-01", "2025-01-02", ...]
result.values    # {"$overall": [100, 150, ...]}
result.segments  # ["US", "UK", "DE", ...]
result.df # pandas DataFrame result.to_dict() # JSON-serializable dict ``` ## Funnels Analyze conversion through a sequence of steps: ``` # First, find your funnel ID funnels = ws.funnels() for f in funnels: print(f"{f.funnel_id}: {f.name}") # Query the funnel result = ws.funnel( funnel_id=12345, from_date="2025-01-01", to_date="2025-01-31" ) # With segmentation result = ws.funnel( funnel_id=12345, from_date="2025-01-01", to_date="2025-01-31", on="country" ) # Access results for step in result.steps: print(f"{step.event}: {step.count} ({step.conversion_rate:.1%})") ``` ``` # List available funnels mp inspect funnels # Query a funnel mp query funnel --funnel-id 12345 --from 2025-01-01 --to 2025-01-31 --format table ``` ### FunnelResult ``` result.funnel_id # 12345 result.steps # [FunnelStep, ...] result.overall_rate # 0.15 (15% overall conversion) result.df # DataFrame with step metrics # Each step step.event # "Checkout Started" step.count # 5000 step.conversion_rate # 0.85 step.avg_time # timedelta or None ``` ## Retention Cohort-based retention analysis: ``` result = ws.retention( born_event="Signup", return_event="Login", from_date="2025-01-01", to_date="2025-01-31", born_where='properties["source"] == "organic"', unit="week" ) # Access cohorts for cohort in result.cohorts: print(f"{cohort.date}: {cohort.size} users") print(f" Retention: {cohort.retention_rates}") ``` ``` mp query retention \ --born-event Signup \ --return-event Login \ --from 2025-01-01 \ --to 2025-01-31 \ --unit week \ --format table ``` ### RetentionResult ``` result.born_event # "Signup" result.return_event # "Login" result.cohorts # [CohortInfo, ...] result.df # DataFrame with retention matrix # Each cohort cohort.date # "2025-01-01" cohort.size # 1000 cohort.retention_rates # [1.0, 0.45, 0.32, 0.28, ...] ``` ## JQL (JavaScript Query Language) Run custom JQL scripts for advanced analysis: ``` script = """ function main() { return Events({ from_date: params.from_date, to_date: params.to_date, event_selectors: [{event: "Purchase"}] }) .groupBy(["properties.country"], mixpanel.reducer.count()) .sortDesc("value") .take(10); } """ result = ws.jql( script=script, params={"from_date": "2025-01-01", "to_date": "2025-01-31"} ) print(result.data) # Raw JQL result print(result.df) # As DataFrame ``` ``` # From file mp query jql --script ./query.js --param from_date=2025-01-01 --param to_date=2025-01-31 # Inline mp query jql --script 'function main() { return Events({...}).count(); }' ``` ## Event Counts Multi-event time series comparison: ``` result = ws.event_counts( events=["Signup", "Purchase", "Churn"], from_date="2025-01-01", to_date="2025-01-31", unit="day" ) # DataFrame with columns: date, Signup, Purchase, Churn print(result.df) ``` ``` mp query event-counts \ --event Signup --event Purchase --event Churn \ --from 2025-01-01 --to 2025-01-31 \ --format table ``` ## Property Counts Break down an event by property values: ``` result = ws.property_counts( event="Purchase", property_name="country", from_date="2025-01-01", to_date="2025-01-31", limit=10 ) print(result.df) # Columns: date, US, UK, DE, ... 
``` ``` mp query property-counts \ --event Purchase \ --property country \ --from 2025-01-01 --to 2025-01-31 \ --limit 10 \ --format table ``` ## Activity Feed Get a user's event history: ``` result = ws.activity_feed( distinct_ids=["user_123", "user_456"], from_date="2025-01-01", to_date="2025-01-31" ) for event in result.events: print(f"{event.time}: {event.event}") print(f" Properties: {event.properties}") ``` ``` mp query activity-feed \ --distinct-id user_123 \ --from 2025-01-01 --to 2025-01-31 \ --format json ``` ## Saved Reports Query saved reports from Mixpanel (Insights, Retention, Funnels, and Flows). ### Listing Bookmarks First, find available saved reports: ``` # List all saved reports bookmarks = ws.list_bookmarks() for b in bookmarks: print(f"{b.id}: {b.name} ({b.type})") # Filter by type insights = ws.list_bookmarks(bookmark_type="insights") funnels = ws.list_bookmarks(bookmark_type="funnels") ``` ``` mp inspect bookmarks mp inspect bookmarks --type insights mp inspect bookmarks --type funnels --format table ``` ### Querying Saved Reports Query Insights, Retention, or Funnel reports by bookmark ID: Get Bookmark IDs First Run `list_bookmarks()` or `mp inspect bookmarks` to find the numeric ID of the report you want to query. ``` # Get the bookmark ID from list_bookmarks() first bookmarks = ws.list_bookmarks(bookmark_type="insights") bookmark_id = bookmarks[0].id # e.g., 98765 result = ws.query_saved_report(bookmark_id=bookmark_id) print(f"Report type: {result.report_type}") print(result.df) ``` ``` # First find your bookmark ID mp inspect bookmarks --type insights --format table # Then query it mp query saved-report --bookmark-id 98765 --format table ``` ## Flows Query saved Flows reports: Flows Use Different IDs Flows reports have their own bookmark IDs. Filter with `--type flows` when listing. 

```
# Get Flows bookmark ID
flows = ws.list_bookmarks(bookmark_type="flows")
bookmark_id = flows[0].id  # e.g., 54321

result = ws.query_flows(bookmark_id=bookmark_id)
print(f"Conversion rate: {result.overall_conversion_rate:.1%}")
for step in result.steps:
    print(f"  {step}")
```

```
# First find Flows bookmark IDs
mp inspect bookmarks --type flows --format table

# Then query it
mp query flows --bookmark-id 54321 --format table
```

## Frequency Analysis

Analyze how often users perform an event:

```
result = ws.frequency(
    event="Login",
    from_date="2025-01-01",
    to_date="2025-01-31",
    unit="month",
    addiction_unit="day"
)

# Distribution of logins per day
print(result.buckets)  # {"0": 1000, "1": 500, "2-3": 300, ...}
```

```
mp query frequency \
  --event Login \
  --from 2025-01-01 --to 2025-01-31 \
  --format table
```

## Numeric Aggregations

Aggregate numeric properties:

### Bucketing

```
result = ws.segmentation_numeric(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="amount",
    type="general"  # or "linear", "logarithmic"
)
```

### Sum

```
result = ws.segmentation_sum(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="amount"
)
# Total revenue per time period
```

### Average

```
result = ws.segmentation_average(
    event="Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="amount"
)
# Average purchase amount per time period
```

## API Escape Hatch

For Mixpanel APIs not covered by the Workspace class, use the `api` property to make authenticated requests directly:

```
import mixpanel_data as mp

ws = mp.Workspace()
client = ws.api

# Example: List annotations from the Annotations API
# Many Mixpanel APIs require the project ID in the URL path
base_url = "https://mixpanel.com/api/app"  # Use eu.mixpanel.com for EU
url = f"{base_url}/projects/{client.project_id}/annotations"

response = client.request("GET", url)
annotations = response["results"]
for ann in annotations:
    print(f"{ann['id']}: {ann['date']} - {ann['description']}")

# Get a specific annotation by ID
if annotations:
    annotation_id = annotations[0]["id"]
    detail_url = f"{base_url}/projects/{client.project_id}/annotations/{annotation_id}"
    annotation = client.request("GET", detail_url)
    print(annotation)
```

### Request Parameters

```
client.request(
    "POST",
    "https://mixpanel.com/api/some/endpoint",
    params={"key": "value"},        # Query parameters
    json_body={"data": "payload"},  # JSON request body
    headers={"X-Custom": "header"}, # Additional headers
    timeout=60.0                    # Request timeout in seconds
)
```

Authentication is handled automatically — the client adds the proper `Authorization` header to all requests. The client also exposes `project_id` and `region` properties, which are useful when constructing URLs for APIs that require these values in the path.

## Next Steps

- [Data Discovery](https://jaredmcfarland.github.io/mixpanel_data/guide/discovery/index.md) — Explore your event schema
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md) — Complete API documentation

# Data Discovery

Explore your Mixpanel project's schema before writing queries. Discovery results are cached for the session.

Explore on DeepWiki

🤖 **[Discovery Methods Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.2-discovery-methods)**

Ask questions about schema exploration, caching behavior, or how to discover your data landscape.

## Listing Events Get all event names in your project: ``` import mixpanel_data as mp ws = mp.Workspace() events = ws.events() print(events) # ['Login', 'Purchase', 'Signup', ...] ``` ``` mp inspect events # Filter with jq - get first 5 events mp inspect events --format json --jq '.[:5]' # Find events containing "User" mp inspect events --format json --jq '.[] | select(contains("User"))' ``` Events are returned sorted alphabetically. ## Listing Properties Get properties for a specific event: ``` properties = ws.properties("Purchase") print(properties) # ['amount', 'country', 'product_id', ...] ``` ``` mp inspect properties --event Purchase ``` Properties include both event-specific and common properties. ## Property Values Sample values for a property: ``` # Sample values for a property values = ws.property_values("country", event="Purchase") print(values) # ['US', 'UK', 'DE', 'FR', ...] # Limit results values = ws.property_values("country", event="Purchase", limit=5) ``` ``` mp inspect values --property country --event Purchase --limit 10 ``` ## Saved Funnels List funnels defined in Mixpanel: ``` funnels = ws.funnels() for f in funnels: print(f"{f.funnel_id}: {f.name}") ``` ``` mp inspect funnels ``` ### FunnelInfo ``` f.funnel_id # 12345 f.name # "Checkout Funnel" ``` ## Saved Cohorts List cohorts defined in Mixpanel: ``` cohorts = ws.cohorts() for c in cohorts: print(f"{c.id}: {c.name} ({c.count} users)") ``` ``` mp inspect cohorts ``` ### SavedCohort ``` c.id # 12345 c.name # "Power Users" c.count # 5000 c.description # "Users with 10+ logins" c.created # datetime c.is_visible # True ``` ## Lexicon Schemas Retrieve data dictionary schemas for events and profile properties. Schemas include descriptions, property types, and metadata defined in Mixpanel's Lexicon. Schema Coverage The Lexicon API returns only events/properties with explicit schemas (defined via API, CSV import, or UI). It does not return all events visible in Lexicon's UI. ``` # List all schemas schemas = ws.lexicon_schemas() for s in schemas: print(f"{s.entity_type}: {s.name}") # Filter by entity type event_schemas = ws.lexicon_schemas(entity_type="event") profile_schemas = ws.lexicon_schemas(entity_type="profile") # Get a specific schema schema = ws.lexicon_schema("event", "Purchase") print(schema.schema_json.description) for prop, info in schema.schema_json.properties.items(): print(f" {prop}: {info.type}") ``` ``` mp inspect lexicon-schemas mp inspect lexicon-schemas --type event mp inspect lexicon-schemas --type profile mp inspect lexicon-schema --type event --name Purchase ``` ### LexiconSchema ``` s.entity_type # "event", "profile", or other API-returned types s.name # "Purchase" s.schema_json # LexiconDefinition object ``` ### LexiconDefinition ``` s.schema_json.description # "User completes a purchase" s.schema_json.properties # dict[str, LexiconProperty] s.schema_json.metadata # LexiconMetadata or None ``` ### LexiconProperty ``` prop = s.schema_json.properties["amount"] prop.type # "number" prop.description # "Purchase amount in USD" prop.metadata # LexiconMetadata or None ``` ### LexiconMetadata ``` meta = s.schema_json.metadata meta.display_name # "Purchase Event" meta.tags # ["core", "revenue"] meta.hidden # False meta.dropped # False meta.contacts # ["owner@company.com"] meta.team_contacts # ["Analytics Team"] ``` Rate Limit The Lexicon API has a strict rate limit of **5 requests per minute**. Schema results are cached for the session to minimize API calls. 
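
Putting the Lexicon types together: a short sketch that walks a single schema using only the attributes listed above, guarding `metadata`, which can be `None`. A single `lexicon_schema()` call is cached, so this also stays well under the rate limit.

```
import mixpanel_data as mp

ws = mp.Workspace()
schema = ws.lexicon_schema("event", "Purchase")

defn = schema.schema_json
print(defn.description)
for name, prop in defn.properties.items():
    print(f"  {name}: {prop.type} - {prop.description}")

meta = defn.metadata
if meta is not None:  # metadata is optional per LexiconDefinition
    print(f"tags: {meta.tags}, hidden: {meta.hidden}")
```
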
## Top Events Get today's most active events: ``` # General top events top = ws.top_events(type="general") for event in top: print(f"{event.event}: {event.count} ({event.percent_change:+.1f}%)") # Average top events top = ws.top_events(type="average", limit=5) ``` ``` mp inspect top-events --type general --limit 10 ``` ### TopEvent ``` event.event # "Login" event.count # 15000 event.percent_change # 12.5 (compared to yesterday) ``` Not Cached Unlike other discovery methods, `top_events()` always makes an API call since it returns real-time data. ## JQL-Based Remote Discovery These methods use JQL (JavaScript Query Language) to analyze data directly on Mixpanel's servers, returning aggregated results without fetching raw data locally. ### Property Value Distribution Understand what values a property contains and how often they appear: ``` result = ws.property_distribution( event="Purchase", property="country", from_date="2025-01-01", to_date="2025-01-31", limit=10, ) print(f"Total: {result.total_count}") for v in result.values: print(f" {v.value}: {v.count} ({v.percentage:.1f}%)") ``` ``` mp inspect distribution -e Purchase -p country --from 2025-01-01 --to 2025-01-31 mp inspect distribution -e Purchase -p country --from 2025-01-01 --to 2025-01-31 --limit 10 ``` ### Numeric Property Summary Get statistical summary for numeric properties: ``` result = ws.numeric_summary( event="Purchase", property="amount", from_date="2025-01-01", to_date="2025-01-31", ) print(f"Count: {result.count}") print(f"Range: {result.min} to {result.max}") print(f"Avg: {result.avg:.2f}, Stddev: {result.stddev:.2f}") print(f"Median: {result.percentiles[50]}") ``` ``` mp inspect numeric -e Purchase -p amount --from 2025-01-01 --to 2025-01-31 mp inspect numeric -e Purchase -p amount --from 2025-01-01 --to 2025-01-31 --percentiles 25,50,75,90 ``` ### Daily Event Counts See event activity over time: ``` result = ws.daily_counts( from_date="2025-01-01", to_date="2025-01-07", events=["Purchase", "Signup"], ) for c in result.counts: print(f"{c.date} {c.event}: {c.count}") ``` ``` mp inspect daily --from 2025-01-01 --to 2025-01-07 mp inspect daily --from 2025-01-01 --to 2025-01-07 -e Purchase,Signup ``` ### User Engagement Distribution Understand how engaged users are by their event count: ``` result = ws.engagement_distribution( from_date="2025-01-01", to_date="2025-01-31", ) print(f"Total users: {result.total_users}") for b in result.buckets: print(f" {b.bucket_label} events: {b.user_count} ({b.percentage:.1f}%)") ``` ``` mp inspect engagement --from 2025-01-01 --to 2025-01-31 mp inspect engagement --from 2025-01-01 --to 2025-01-31 --buckets 1,5,10,50,100 ``` ### Property Coverage Check data quality by seeing how often properties are defined: ``` result = ws.property_coverage( event="Purchase", properties=["coupon_code", "referrer", "utm_source"], from_date="2025-01-01", to_date="2025-01-31", ) print(f"Total events: {result.total_events}") for c in result.coverage: print(f" {c.property}: {c.coverage_percentage:.1f}% defined") ``` ``` mp inspect coverage -e Purchase -p coupon_code,referrer,utm_source --from 2025-01-01 --to 2025-01-31 ``` When to Use JQL-Based Discovery These methods are ideal for: - **Quick exploration**: Understand data shape before fetching locally - **Large date ranges**: Analyze months of data without downloading everything - **Data quality checks**: Verify property coverage and value distributions - **Trend analysis**: See daily activity patterns See the [JQL Discovery 
Types](https://jaredmcfarland.github.io/mixpanel_data/api/types/#jql-discovery-types) in the API reference for return type details. ## Caching Discovery results are cached for the lifetime of the Workspace: ``` ws = mp.Workspace() # First call hits the API events1 = ws.events() # Second call returns cached result (instant) events2 = ws.events() # Clear cache to force refresh ws.clear_discovery_cache() # Now hits API again events3 = ws.events() ``` ## Local Data Analysis After fetching data into DuckDB, use these introspection methods to understand your data before writing SQL queries. ### Sampling Data Get random sample rows to see data structure: ``` # Get 10 random rows (default) df = ws.sample("events") print(df) # Get 5 random rows df = ws.sample("events", n=5) ``` ``` mp inspect sample -t events mp inspect sample -t events -n 5 mp inspect sample -t events --format table ``` ### Statistical Summary Get column-level statistics for an entire table: ``` summary = ws.summarize("events") print(f"Total rows: {summary.row_count}") for col in summary.columns: print(f"{col.column_name}: {col.column_type}") print(f" Nulls: {col.null_percentage:.1f}%") print(f" Unique: {col.approx_unique}") if col.avg is not None: # Numeric columns print(f" Mean: {col.avg:.2f}, Std: {col.std:.2f}") ``` ``` mp inspect summarize -t events mp inspect summarize -t events --format table ``` ### Event Breakdown Analyze event distribution in an events table: ``` breakdown = ws.event_breakdown("events") print(f"Total events: {breakdown.total_events}") print(f"Total users: {breakdown.total_users}") print(f"Date range: {breakdown.date_range[0]} to {breakdown.date_range[1]}") for event in breakdown.events: print(f"{event.event_name}: {event.count} ({event.pct_of_total:.1f}%)") print(f" Users: {event.unique_users}") print(f" First seen: {event.first_seen}") ``` ``` mp inspect breakdown -t events mp inspect breakdown -t events --format table ``` Required Columns The table must have `event_name`, `event_time`, and `distinct_id` columns. ### Property Key Discovery Discover all JSON property keys in a table: ``` # All property keys across all events keys = ws.property_keys("events") print(keys) # ['amount', 'country', 'product_id', ...] # Property keys for a specific event keys = ws.property_keys("events", event="Purchase") ``` ``` mp inspect keys -t events mp inspect keys -t events -e "Purchase" ``` This is especially useful for building JSON path expressions like `properties->>'$.country'`. ### Column Statistics Deep analysis of a single column: ``` # Analyze a regular column stats = ws.column_stats("events", "event_name") print(f"Total: {stats.count}, Nulls: {stats.null_pct:.1f}%") print(f"Unique values: {stats.unique_count}") print("Top values:") for value, count in stats.top_values: print(f" {value}: {count}") # Analyze a JSON property stats = ws.column_stats("events", "properties->>'$.country'", top_n=20) ``` ``` mp inspect column -t events -c event_name mp inspect column -t events -c "properties->>'$.country'" --top 20 ``` For numeric columns, additional statistics are available: ``` stats = ws.column_stats("purchases", "properties->>'$.amount'") print(f"Min: {stats.min}, Max: {stats.max}") print(f"Mean: {stats.mean:.2f}, Std: {stats.std:.2f}") ``` ### Introspection Workflow A typical workflow for exploring fetched data: ``` import mixpanel_data as mp ws = mp.Workspace() # Fetch data first ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31") # 1. 
Quick look at the data
print(ws.sample("events", n=3))

# 2. Get overall statistics
summary = ws.summarize("events")
print(f"Rows: {summary.row_count}")

# 3. Understand event distribution
breakdown = ws.event_breakdown("events")
for e in breakdown.events[:5]:
    print(f"{e.event_name}: {e.count}")

# 4. Discover available properties
keys = ws.property_keys("events", event="Purchase")
print(f"Purchase properties: {keys}")

# 5. Deep dive into specific columns
stats = ws.column_stats("events", "properties->>'$.country'")
print(f"Top countries: {stats.top_values[:5]}")

# Now write informed SQL queries
df = ws.sql("""
    SELECT
        properties->>'$.country' as country,
        COUNT(*) as count
    FROM events
    WHERE event_name = 'Purchase'
    GROUP BY 1
    ORDER BY 2 DESC
""")
```

## Local Table Discovery

Inspect tables in your local database:

### List Tables

```
tables = ws.tables()
for t in tables:
    print(f"{t.name}: {t.row_count} rows ({t.type})")
```

```
mp inspect tables
```

### Table Schema

```
schema = ws.table_schema("jan_events")
for col in schema.columns:
    print(f"{col.name}: {col.type} (nullable: {col.nullable})")
```

```
mp inspect schema --table jan_events
```

### Workspace Info

```
info = ws.info()
print(f"Database: {info.path}")
print(f"Project: {info.project_id} ({info.region})")
print(f"Account: {info.account}")
print(f"Tables: {len(info.tables)}")
print(f"Size: {info.size_mb:.1f} MB")
```

```
mp inspect info
```

## Discovery Workflow

A typical discovery workflow before analysis:

```
import mixpanel_data as mp

ws = mp.Workspace()

# 1. What events exist?
print("Events:")
for event in ws.events()[:10]:
    print(f"  - {event}")

# 2. What properties does Purchase have?
print("\nPurchase properties:")
for prop in ws.properties("Purchase"):
    print(f"  - {prop}")

# 3. What values does 'country' have?
print("\nCountry values:")
for value in ws.property_values("country", event="Purchase", limit=10):
    print(f"  - {value}")

# 4. What funnels are defined?
print("\nFunnels:")
for f in ws.funnels():
    print(f"  - {f.name} (ID: {f.funnel_id})")

# 5. Now fetch and analyze
ws.fetch_events("purchases", from_date="2025-01-01", to_date="2025-01-31",
                events=["Purchase"])
df = ws.sql("""
    SELECT
        properties->>'$.country' as country,
        COUNT(*) as count
    FROM purchases
    GROUP BY 1
    ORDER BY 2 DESC
""")
print(df)
```

## Next Steps

- [Fetching Data](https://jaredmcfarland.github.io/mixpanel_data/guide/fetching/index.md) — Fetch events for local analysis
- [SQL Queries](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) — Query with SQL
- [API Reference](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md) — Complete API documentation

# API Reference

# API Overview

The `mixpanel_data` Python API provides programmatic access to all library functionality.

Explore on DeepWiki

🤖 **[Python API Reference →](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.2-python-api-reference)**

Ask questions about API methods, explore usage patterns, or get help with specific functionality.

## Import Patterns

```
# Recommended: import with alias
import mixpanel_data as mp

ws = mp.Workspace()
result = ws.segmentation(...)

# Direct imports
from mixpanel_data import Workspace, FetchResult, MixpanelDataError

# Auth utilities
from mixpanel_data.auth import ConfigManager, Credentials
```

## Core Components

### Workspace

The main entry point for all operations:

- **Discovery** — Explore events, properties, funnels, cohorts
- **Fetching** — Download events and profiles to local storage
- **Streaming** — Stream data directly without storage (ETL, pipelines)
- **Local Queries** — SQL queries against DuckDB
- **Live Queries** — Real-time analytics from Mixpanel API
- **Introspection** — Examine local tables and schemas

[View Workspace API](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md)

### Auth Module

Credential and account management:

- **ConfigManager** — Manage accounts in config file
- **Credentials** — Credential container with secrets
- **AccountInfo** — Account metadata (without secrets)

[View Auth API](https://jaredmcfarland.github.io/mixpanel_data/api/auth/index.md)
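
As a quick illustration, here is a minimal sketch that lists configured accounts and resolves credentials by hand. It relies only on the `ConfigManager` methods that `Workspace` itself uses (see the `test_credentials` source further down): `list_accounts()`, `resolve_credentials()`, and the `AccountInfo` fields `name` and `is_default`.

```
from mixpanel_data.auth import ConfigManager

cm = ConfigManager()

# Enumerate configured accounts; AccountInfo carries metadata, not secrets
for acc in cm.list_accounts():
    marker = " (default)" if acc.is_default else ""
    print(f"{acc.name}{marker}")

# Resolve credentials the same way Workspace does:
# environment variables first, then the default account
creds = cm.resolve_credentials(None)
print(creds.project_id, creds.region)
```
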
### Exceptions

Structured error handling:

- **MixpanelDataError** — Base exception
- **APIError** — HTTP/API errors
- **ConfigError** — Configuration errors
- **TableExistsError** / **TableNotFoundError** — Storage errors

[View Exceptions](https://jaredmcfarland.github.io/mixpanel_data/api/exceptions/index.md)
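
Because `MixpanelDataError` is the base exception, one broad handler can catch everything the library raises. A minimal sketch, assuming the subclass hierarchy implied by the list above:

```
from mixpanel_data import Workspace, MixpanelDataError

ws = Workspace()
try:
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
    print(total)
except MixpanelDataError as exc:
    # APIError, ConfigError, and the table errors all land here
    print(f"operation failed: {exc}")
```
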
### Result Types

Typed results for all operations:

- **FetchResult** — Fetch operation results
- **SegmentationResult** — Time-series data
- **FunnelResult** — Funnel conversion data
- **RetentionResult** — Retention cohort data
- And many more...

[View Result Types](https://jaredmcfarland.github.io/mixpanel_data/api/types/index.md)

## Type Aliases

The library exports these type aliases:

```
from mixpanel_data import CountType, HourDayUnit, TimeUnit

# CountType: Literal["general", "unique", "average", "median", "min", "max"]
# HourDayUnit: Literal["hour", "day"]
# TimeUnit: Literal["day", "week", "month", "quarter", "year"]
```

## Complete API Reference

- [Workspace](https://jaredmcfarland.github.io/mixpanel_data/api/workspace/index.md) — Main facade class
- [Auth](https://jaredmcfarland.github.io/mixpanel_data/api/auth/index.md) — Authentication and configuration
- [Exceptions](https://jaredmcfarland.github.io/mixpanel_data/api/exceptions/index.md) — Error handling
- [Result Types](https://jaredmcfarland.github.io/mixpanel_data/api/types/index.md) — All result dataclasses

# Workspace

The `Workspace` class is the unified entry point for all Mixpanel data operations.

Explore on DeepWiki

🤖 **[Workspace Class Deep Dive →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.2.1-workspace-class)**

Ask questions about Workspace methods, explore usage patterns, or understand how services are orchestrated.

## Overview

Workspace orchestrates four internal services:

- **DiscoveryService** — Schema exploration (events, properties, funnels, cohorts)
- **FetcherService** — Data ingestion from Mixpanel to DuckDB, or streaming without storage
- **LiveQueryService** — Real-time analytics queries
- **StorageEngine** — Local SQL query execution

## Key Features

### Parallel Fetching

For large date ranges, use `parallel=True` for up to 10x faster exports:

```
# Parallel fetch for large date ranges (recommended)
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-12-31",
    parallel=True
)
print(f"Fetched {result.total_rows} rows in {result.duration_seconds:.1f}s")
```

Parallel fetching:

- Splits date ranges into 7-day chunks (configurable via `chunk_days`)
- Fetches chunks concurrently (configurable via `max_workers`, default: 10)
- Returns `ParallelFetchResult` with batch statistics and failure tracking
- Supports progress callbacks via `on_batch_complete` (see the sketch at the end of this section)

### Parallel Profile Fetching

For large profile datasets, use `parallel=True` for up to 5x faster exports:

```
# Parallel profile fetch for large datasets
result = ws.fetch_profiles(
    name="users",
    parallel=True,
    max_workers=5  # Default and max is 5
)
print(f"Fetched {result.total_rows} profiles in {result.duration_seconds:.1f}s")
print(f"Pages: {result.successful_pages} succeeded, {result.failed_pages} failed")
```

Parallel profile fetching:

- Uses page-based parallelism with session IDs for consistency
- Fetches pages concurrently (configurable via `max_workers`, default: 5, max: 5)
- Returns `ParallelProfileResult` with page statistics and failure tracking
- Supports progress callbacks via `on_page_complete`

### Append Mode

The `fetch_events()` and `fetch_profiles()` methods support an `append` parameter for incremental data loading:

```
# Initial fetch
ws.fetch_events(name="events", from_date="2025-01-01", to_date="2025-01-31")

# Append more data (duplicates are automatically skipped)
ws.fetch_events(name="events", from_date="2025-02-01", to_date="2025-02-28", append=True)
```

This is useful for:

- **Incremental loading**: Fetch data in chunks without creating multiple tables
- **Crash recovery**: Resume a failed fetch from the last successful point
- **Extending date ranges**: Add more historical or recent data to an existing table
- **Retrying failed parallel batches**: Use append mode to retry specific date ranges

Duplicate events (by `insert_id`) and profiles (by `distinct_id`) are automatically skipped via `INSERT OR IGNORE`.
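
A minimal sketch combining the features above: a parallel fetch that reports progress through `on_batch_complete`, then an `append=True` fetch extending the same table. The fields of the `BatchProgress` payload aren't documented on this page, so the callback treats it as opaque and simply counts completions.

```
import mixpanel_data as mp

ws = mp.Workspace()

completed = 0

def on_batch(progress):  # receives a BatchProgress; treated as opaque here
    global completed
    completed += 1
    print(f"finished batch {completed}")

# Parallel fetch for Q1 with progress reporting
ws.fetch_events(
    "events_2025",
    from_date="2025-01-01",
    to_date="2025-03-31",
    parallel=True,
    on_batch_complete=on_batch,
)

# Later: extend the same table with Q2 (duplicates are skipped automatically)
ws.fetch_events(
    "events_2025",
    from_date="2025-04-01",
    to_date="2025-06-30",
    parallel=True,
    append=True,
)
```
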
### Advanced Profile Fetching

The `fetch_profiles()` and `stream_profiles()` methods support advanced filtering options:

```
import time

# Fetch specific users by ID
ws.fetch_profiles(name="vip_users", distinct_ids=["user_1", "user_2", "user_3"])

# Fetch group profiles (e.g., companies)
ws.fetch_profiles(name="companies", group_id="companies")

# Fetch users based on behavior
ws.fetch_profiles(
    name="purchasers",
    behaviors=[{"window": "30d", "name": "buyers", "event_selectors": [{"event": "Purchase"}]}],
    where='(behaviors["buyers"] > 0)'
)

# Query historical profile state
ws.fetch_profiles(
    name="profiles_last_week",
    as_of_timestamp=int(time.time()) - 604800  # 7 days ago
)

# Get all users with cohort membership marked
ws.fetch_profiles(
    name="cohort_analysis",
    cohort_id="12345",
    include_all_users=True
)
```

**Parameter constraints:**

- `distinct_id` and `distinct_ids` are mutually exclusive
- `behaviors` and `cohort_id` are mutually exclusive
- `include_all_users` requires `cohort_id` to be set

## Class Reference

## mixpanel_data.Workspace

```
Workspace(
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    path: str | Path | None = None,
    read_only: bool = False,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
    _storage: StorageEngine | None = None,
)
```

Unified entry point for Mixpanel data operations.

The Workspace class is a facade that orchestrates all services:

- DiscoveryService for schema exploration
- FetcherService for data ingestion
- LiveQueryService for real-time analytics
- StorageEngine for local SQL queries

Examples:

Basic usage with credentials from config:

```
ws = Workspace()
ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
df = ws.sql("SELECT * FROM events LIMIT 10")
```

Ephemeral workspace for temporary analysis:

```
with Workspace.ephemeral() as ws:
    ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database automatically deleted
```

Query-only access to existing database:

```
ws = Workspace.open("path/to/database.db")
df = ws.sql("SELECT * FROM events")
```

Create a new Workspace with credentials and optional database path.

Credentials are resolved in priority order:

1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION)
2. Named account from config file (if account parameter specified)
3. Default account from config file
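
For instance, a minimal sketch of the environment-variable path (all values are placeholders):

```
import os

import mixpanel_data as mp

# Placeholders; substitute your own service account values
os.environ["MP_USERNAME"] = "my-service-account"
os.environ["MP_SECRET"] = "my-secret"
os.environ["MP_PROJECT_ID"] = "123456"
os.environ["MP_REGION"] = "us"

ws = mp.Workspace()  # environment variables win over config-file accounts
```
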
| PARAMETER | DESCRIPTION |
| --- | --- |
| `account` | Named account from config file to use. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `project_id` | Override project ID from credentials. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `region` | Override region from credentials (us, eu, in). **TYPE:** `str \| None` **DEFAULT:** `None` |
| `path` | Path to database file. If None, uses default location. **TYPE:** `str \| Path \| None` **DEFAULT:** `None` |
| `read_only` | If True, open database in read-only mode allowing concurrent reads. Defaults to False (write access). **TYPE:** `bool` **DEFAULT:** `False` |
| `_config_manager` | Injected ConfigManager for testing. **TYPE:** `ConfigManager \| None` **DEFAULT:** `None` |
| `_api_client` | Injected MixpanelAPIClient for testing. **TYPE:** `MixpanelAPIClient \| None` **DEFAULT:** `None` |
| `_storage` | Injected StorageEngine for testing. **TYPE:** `StorageEngine \| None` **DEFAULT:** `None` |

| RAISES | DESCRIPTION |
| --- | --- |
| `ConfigError` | If no credentials can be resolved. |
| `AccountNotFoundError` | If named account doesn't exist. |

Source code in `src/mixpanel_data/workspace.py`

```
def __init__(
    self,
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    path: str | Path | None = None,
    read_only: bool = False,
    # Dependency injection for testing
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
    _storage: StorageEngine | None = None,
) -> None:
    """Create a new Workspace with credentials and optional database path.

    Credentials are resolved in priority order:
    1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION)
    2. Named account from config file (if account parameter specified)
    3. Default account from config file

    Args:
        account: Named account from config file to use.
        project_id: Override project ID from credentials.
        region: Override region from credentials (us, eu, in).
        path: Path to database file. If None, uses default location.
        read_only: If True, open database in read-only mode allowing
            concurrent reads. Defaults to False (write access).
        _config_manager: Injected ConfigManager for testing.
        _api_client: Injected MixpanelAPIClient for testing.
        _storage: Injected StorageEngine for testing.

    Raises:
        ConfigError: If no credentials can be resolved.
        AccountNotFoundError: If named account doesn't exist.
    """
    # Store injected or create default ConfigManager
    self._config_manager = _config_manager or ConfigManager()

    # Resolve credentials
    self._credentials: Credentials | None = None
    self._account_name: str | None = account

    # Resolve credentials (may raise ConfigError or AccountNotFoundError)
    self._credentials = self._config_manager.resolve_credentials(account)

    # Apply overrides if provided
    if project_id or region:
        from typing import cast

        from pydantic import SecretStr

        from mixpanel_data._internal.config import RegionType

        resolved_region = region or self._credentials.region
        self._credentials = Credentials(
            username=self._credentials.username,
            secret=SecretStr(self._credentials.secret.get_secret_value()),
            project_id=project_id or self._credentials.project_id,
            region=cast(RegionType, resolved_region),
        )

    # Initialize storage lazily
    # Store path for lazy initialization, or use injected storage directly
    self._db_path: Path | None = None
    self._storage: StorageEngine | None = None
    self._read_only = read_only

    if _storage is not None:
        # Injected storage - use directly
        self._storage = _storage
    else:
        # Determine database path for lazy initialization
        if path is not None:
            self._db_path = Path(path) if isinstance(path, str) else path
        else:
            # Default path: ~/.mp/data/{project_id}.db
            self._db_path = (
                Path.home() / ".mp" / "data" / f"{self._credentials.project_id}.db"
            )
        # NOTE: StorageEngine is NOT created here - see storage property

    # Lazy-initialized services (None until first use)
    self._api_client: MixpanelAPIClient | None = _api_client
    self._discovery: DiscoveryService | None = None
    self._fetcher: FetcherService | None = None
    self._live_query: LiveQueryService | None = None
```

### connection

```
connection: DuckDBPyConnection
```

Direct access to the DuckDB connection. Use this for operations not covered by the Workspace API.

| RETURNS | DESCRIPTION |
| --- | --- |
| `DuckDBPyConnection` | The underlying DuckDB connection. |

### db_path

```
db_path: Path | None
```

Path to the DuckDB database file. Returns the filesystem path where data is stored.
Useful for:

- Knowing where your data lives
- Opening the same database later with `Workspace.open(path)`
- Debugging and logging

| RETURNS | DESCRIPTION |
| --- | --- |
| `Path \| None` | Path to the database file, or `None` for in-memory workspaces. |

Example

Save the path for later use:

```
ws = mp.Workspace()
path = ws.db_path
ws.close()

# Later, reopen the same database
ws = mp.Workspace.open(path)
```

### api

```
api: MixpanelAPIClient
```

Direct access to the Mixpanel API client. Use this escape hatch for Mixpanel API operations not covered by the Workspace class. The client handles authentication automatically.

The client provides:

- `request(method, url, **kwargs)`: Make authenticated requests to any Mixpanel API endpoint.
- `project_id`: The configured project ID for constructing URLs.
- `region`: The configured region ('us', 'eu', or 'in').

| RETURNS | DESCRIPTION |
| --- | --- |
| `MixpanelAPIClient` | The underlying MixpanelAPIClient. |

| RAISES | DESCRIPTION |
| --- | --- |
| `ConfigError` | If API credentials not available. |

Example

Fetch event schema from the Lexicon Schemas API:

```
import mixpanel_data as mp
from urllib.parse import quote

ws = mp.Workspace()
client = ws.api

# Build the URL with proper encoding
event_name = quote("Added To Cart", safe="")
url = f"https://mixpanel.com/api/app/projects/{client.project_id}/schemas/event/{event_name}"

# Make the authenticated request
schema = client.request("GET", url)
print(schema)
```

### ephemeral

```
ephemeral(
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]
```

Create a temporary workspace that auto-deletes on exit.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `account` | Named account from config file to use. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `project_id` | Override project ID from credentials. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `region` | Override region from credentials. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `_config_manager` | Injected ConfigManager for testing. **TYPE:** `ConfigManager \| None` **DEFAULT:** `None` |
| `_api_client` | Injected MixpanelAPIClient for testing. **TYPE:** `MixpanelAPIClient \| None` **DEFAULT:** `None` |

| YIELDS | DESCRIPTION |
| --- | --- |
| `Workspace` | A workspace with temporary database. |

Example

```
with Workspace.ephemeral() as ws:
    ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
    print(ws.sql_scalar("SELECT COUNT(*) FROM events"))
# Database file automatically deleted
```

Source code in `src/mixpanel_data/workspace.py`

````
@classmethod
@contextmanager
def ephemeral(
    cls,
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]:
    """Create a temporary workspace that auto-deletes on exit.

    Args:
        account: Named account from config file to use.
        project_id: Override project ID from credentials.
        region: Override region from credentials.
        _config_manager: Injected ConfigManager for testing.
        _api_client: Injected MixpanelAPIClient for testing.

    Yields:
        Workspace: A workspace with temporary database.
    Example:
        ```python
        with Workspace.ephemeral() as ws:
            ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
            print(ws.sql_scalar("SELECT COUNT(*) FROM events"))
        # Database file automatically deleted
        ```
    """
    storage = StorageEngine.ephemeral()
    ws = cls(
        account=account,
        project_id=project_id,
        region=region,
        _config_manager=_config_manager,
        _api_client=_api_client,
        _storage=storage,
    )
    try:
        yield ws
    finally:
        ws.close()
````

### memory

```
memory(
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]
```

Create a workspace with true in-memory database. The database exists only in RAM with zero disk footprint. All data is lost when the context manager exits.

Best for:

- Small datasets where zero disk footprint is required
- Unit tests without filesystem side effects
- Quick exploratory queries

For large datasets, prefer ephemeral() which benefits from DuckDB's compression (can be 8x faster for large workloads).

| PARAMETER | DESCRIPTION |
| --- | --- |
| `account` | Named account from config file to use. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `project_id` | Override project ID from credentials. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `region` | Override region from credentials. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `_config_manager` | Injected ConfigManager for testing. **TYPE:** `ConfigManager \| None` **DEFAULT:** `None` |
| `_api_client` | Injected MixpanelAPIClient for testing. **TYPE:** `MixpanelAPIClient \| None` **DEFAULT:** `None` |

| YIELDS | DESCRIPTION |
| --- | --- |
| `Workspace` | A workspace with in-memory database. |

Example

```
with Workspace.memory() as ws:
    ws.fetch_events(from_date="2024-01-01", to_date="2024-01-01")
    total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database gone - no cleanup needed, no files left behind
```

Source code in `src/mixpanel_data/workspace.py`

````
@classmethod
@contextmanager
def memory(
    cls,
    account: str | None = None,
    project_id: str | None = None,
    region: str | None = None,
    _config_manager: ConfigManager | None = None,
    _api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]:
    """Create a workspace with true in-memory database.

    The database exists only in RAM with zero disk footprint.
    All data is lost when the context manager exits.

    Best for:
    - Small datasets where zero disk footprint is required
    - Unit tests without filesystem side effects
    - Quick exploratory queries

    For large datasets, prefer ephemeral() which benefits from
    DuckDB's compression (can be 8x faster for large workloads).

    Args:
        account: Named account from config file to use.
        project_id: Override project ID from credentials.
        region: Override region from credentials.
        _config_manager: Injected ConfigManager for testing.
        _api_client: Injected MixpanelAPIClient for testing.

    Yields:
        Workspace: A workspace with in-memory database.
    Example:
        ```python
        with Workspace.memory() as ws:
            ws.fetch_events(from_date="2024-01-01", to_date="2024-01-01")
            total = ws.sql_scalar("SELECT COUNT(*) FROM events")
        # Database gone - no cleanup needed, no files left behind
        ```
    """
    storage = StorageEngine.memory()
    ws = cls(
        account=account,
        project_id=project_id,
        region=region,
        _config_manager=_config_manager,
        _api_client=_api_client,
        _storage=storage,
    )
    try:
        yield ws
    finally:
        ws.close()
````

### open

```
open(path: str | Path, *, read_only: bool = True) -> Workspace
```

Open an existing database for query-only access. This method opens a database without requiring API credentials. Discovery, fetching, and live query methods will be unavailable.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `path` | Path to existing database file. **TYPE:** `str \| Path` |
| `read_only` | If True (default), open in read-only mode allowing concurrent reads. Set to False for write access. **TYPE:** `bool` **DEFAULT:** `True` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `Workspace` | A workspace with access to stored data. |

| RAISES | DESCRIPTION |
| --- | --- |
| `FileNotFoundError` | If database file doesn't exist. |

Example

```
ws = Workspace.open("my_data.db")
df = ws.sql("SELECT * FROM events")
ws.close()
```

Source code in `src/mixpanel_data/workspace.py`

````
@classmethod
def open(cls, path: str | Path, *, read_only: bool = True) -> Workspace:
    """Open an existing database for query-only access.

    This method opens a database without requiring API credentials.
    Discovery, fetching, and live query methods will be unavailable.

    Args:
        path: Path to existing database file.
        read_only: If True (default), open in read-only mode allowing
            concurrent reads. Set to False for write access.

    Returns:
        Workspace: A workspace with access to stored data.

    Raises:
        FileNotFoundError: If database file doesn't exist.

    Example:
        ```python
        ws = Workspace.open("my_data.db")
        df = ws.sql("SELECT * FROM events")
        ws.close()
        ```
    """
    db_path = Path(path) if isinstance(path, str) else path
    storage = StorageEngine.open_existing(db_path, read_only=read_only)

    # Create instance without credential resolution
    instance = object.__new__(cls)
    instance._config_manager = ConfigManager()
    instance._credentials = None
    instance._account_name = None
    instance._db_path = db_path
    instance._storage = storage
    instance._read_only = read_only
    instance._api_client = None
    instance._discovery = None
    instance._fetcher = None
    instance._live_query = None
    return instance
````

### close

```
close() -> None
```

Close all resources (database connection, HTTP client). This method is idempotent and safe to call multiple times.

Source code in `src/mixpanel_data/workspace.py`

```
def close(self) -> None:
    """Close all resources (database connection, HTTP client).

    This method is idempotent and safe to call multiple times.
    """
    # Close storage
    if self._storage is not None:
        self._storage.close()

    # Close API client if we created one
    if self._api_client is not None:
        self._api_client.close()
        self._api_client = None
```

### test_credentials

```
test_credentials(account: str | None = None) -> dict[str, Any]
```

Test account credentials by making a lightweight API call. This method verifies that credentials are valid and can access the Mixpanel API.
It's useful for validating configuration before attempting more expensive operations.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `account` | Named account to test. If None, tests the default account or credentials from environment variables. **TYPE:** `str \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `dict[str, Any]` | Dict containing: `success` (bool) - whether the test succeeded; `account` (str or None) - account name tested; `project_id` (str) - project ID from credentials; `region` (str) - region from credentials; `events_found` (int) - number of events found (validation metric). |

| RAISES | DESCRIPTION |
| --- | --- |
| `AccountNotFoundError` | If named account doesn't exist. |
| `AuthenticationError` | If credentials are invalid. |
| `ConfigError` | If no credentials can be resolved. |

Example

```
# Test default account
result = Workspace.test_credentials()
if result["success"]:
    print(f"Authenticated to project {result['project_id']}")

# Test specific account
result = Workspace.test_credentials("production")
```

Source code in `src/mixpanel_data/workspace.py`

````
@staticmethod
def test_credentials(account: str | None = None) -> dict[str, Any]:
    """Test account credentials by making a lightweight API call.

    This method verifies that credentials are valid and can access
    the Mixpanel API. It's useful for validating configuration before
    attempting more expensive operations.

    Args:
        account: Named account to test. If None, tests the default
            account or credentials from environment variables.

    Returns:
        Dict containing:
        - success: bool - Whether the test succeeded
        - account: str | None - Account name tested
        - project_id: str - Project ID from credentials
        - region: str - Region from credentials
        - events_found: int - Number of events found (validation metric)

    Raises:
        AccountNotFoundError: If named account doesn't exist.
        AuthenticationError: If credentials are invalid.
        ConfigError: If no credentials can be resolved.

    Example:
        ```python
        # Test default account
        result = Workspace.test_credentials()
        if result["success"]:
            print(f"Authenticated to project {result['project_id']}")

        # Test specific account
        result = Workspace.test_credentials("production")
        ```
    """
    config_manager = ConfigManager()
    credentials = config_manager.resolve_credentials(account)

    # Get account info if we used a named account
    account_info = None
    if account is not None:
        account_info = config_manager.get_account(account)
    else:
        # Check if credentials came from a default account
        accounts = config_manager.list_accounts()
        for acc in accounts:
            if acc.is_default:
                account_info = acc
                break

    # Create API client and test with a lightweight call
    api_client = MixpanelAPIClient(credentials)
    try:
        events = api_client.get_events()
        event_count = len(list(events)) if events else 0
        return {
            "success": True,
            "account": account_info.name if account_info else None,
            "project_id": credentials.project_id,
            "region": credentials.region,
            "events_found": event_count,
        }
    finally:
        api_client.close()
````

### events

```
events() -> list[str]
```

List all event names in the Mixpanel project. Results are cached for the lifetime of the Workspace.

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[str]` | Alphabetically sorted list of event names. |

| RAISES | DESCRIPTION |
| --- | --- |
| `ConfigError` | If API credentials not available. |
| `AuthenticationError` | If credentials are invalid. |
Source code in `src/mixpanel_data/workspace.py`

```
def events(self) -> list[str]:
    """List all event names in the Mixpanel project.

    Results are cached for the lifetime of the Workspace.

    Returns:
        Alphabetically sorted list of event names.

    Raises:
        ConfigError: If API credentials not available.
        AuthenticationError: If credentials are invalid.
    """
    return self._discovery_service.list_events()
```

### properties

```
properties(event: str) -> list[str]
```

List all property names for an event. Results are cached per event for the lifetime of the Workspace.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `event` | Event name to get properties for. **TYPE:** `str` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[str]` | Alphabetically sorted list of property names. |

| RAISES | DESCRIPTION |
| --- | --- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def properties(self, event: str) -> list[str]:
    """List all property names for an event.

    Results are cached per event for the lifetime of the Workspace.

    Args:
        event: Event name to get properties for.

    Returns:
        Alphabetically sorted list of property names.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_properties(event)
```

### property_values

```
property_values(
    property_name: str, *, event: str | None = None, limit: int = 100
) -> list[str]
```

Get sample values for a property. Results are cached per (property, event, limit) for the lifetime of the Workspace.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `property_name` | Property to get values for. **TYPE:** `str` |
| `event` | Optional event to filter by. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `limit` | Maximum number of values to return. **TYPE:** `int` **DEFAULT:** `100` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[str]` | List of sample property values as strings. |

| RAISES | DESCRIPTION |
| --- | --- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def property_values(
    self,
    property_name: str,
    *,
    event: str | None = None,
    limit: int = 100,
) -> list[str]:
    """Get sample values for a property.

    Results are cached per (property, event, limit) for the
    lifetime of the Workspace.

    Args:
        property_name: Property to get values for.
        event: Optional event to filter by.
        limit: Maximum number of values to return.

    Returns:
        List of sample property values as strings.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_property_values(
        property_name, event=event, limit=limit
    )
```

### funnels

```
funnels() -> list[FunnelInfo]
```

List saved funnels in the Mixpanel project. Results are cached for the lifetime of the Workspace.

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[FunnelInfo]` | List of FunnelInfo objects (funnel_id, name). |

| RAISES | DESCRIPTION |
| --- | --- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def funnels(self) -> list[FunnelInfo]:
    """List saved funnels in the Mixpanel project.

    Results are cached for the lifetime of the Workspace.
    Returns:
        List of FunnelInfo objects (funnel_id, name).

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_funnels()
```

### cohorts

```
cohorts() -> list[SavedCohort]
```

List saved cohorts in the Mixpanel project. Results are cached for the lifetime of the Workspace.

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[SavedCohort]` | List of SavedCohort objects. |

| RAISES | DESCRIPTION |
| --- | --- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def cohorts(self) -> list[SavedCohort]:
    """List saved cohorts in the Mixpanel project.

    Results are cached for the lifetime of the Workspace.

    Returns:
        List of SavedCohort objects.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._discovery_service.list_cohorts()
```

### list_bookmarks

```
list_bookmarks(bookmark_type: BookmarkType | None = None) -> list[BookmarkInfo]
```

List all saved reports (bookmarks) in the project. Retrieves metadata for all saved Insights, Funnel, Retention, and Flows reports in the project.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `bookmark_type` | Optional filter by report type. Valid values are 'insights', 'funnels', 'retention', 'flows', 'launch-analysis'. If None, returns all bookmark types. **TYPE:** `BookmarkType \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `list[BookmarkInfo]` | List of BookmarkInfo objects with report metadata. Empty list if no bookmarks exist. |

| RAISES | DESCRIPTION |
| --- | --- |
| `ConfigError` | If API credentials not available. |
| `QueryError` | Permission denied or invalid type parameter. |

Source code in `src/mixpanel_data/workspace.py`

```
def list_bookmarks(
    self,
    bookmark_type: BookmarkType | None = None,
) -> list[BookmarkInfo]:
    """List all saved reports (bookmarks) in the project.

    Retrieves metadata for all saved Insights, Funnel, Retention,
    and Flows reports in the project.

    Args:
        bookmark_type: Optional filter by report type. Valid values are
            'insights', 'funnels', 'retention', 'flows', 'launch-analysis'.
            If None, returns all bookmark types.

    Returns:
        List of BookmarkInfo objects with report metadata.
        Empty list if no bookmarks exist.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Permission denied or invalid type parameter.
    """
    return self._discovery_service.list_bookmarks(bookmark_type=bookmark_type)
```

### top_events

```
top_events(
    *,
    type: Literal["general", "average", "unique"] = "general",
    limit: int | None = None,
) -> list[TopEvent]
```

Get today's most active events. This method is NOT cached (returns real-time data).

| PARAMETER | DESCRIPTION |
| --- | --- |
| `type` | Counting method (general, average, unique). **TYPE:** `Literal['general', 'average', 'unique']` **DEFAULT:** `'general'` |
| `limit` | Maximum number of events to return. **TYPE:** `int \| None` **DEFAULT:** `None` |
**TYPE:** \`int | | RETURNS | DESCRIPTION | | ---------------- | -------------------------------------------------------- | | `list[TopEvent]` | List of TopEvent objects (event, count, percent_change). | | RAISES | DESCRIPTION | | ------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | Source code in `src/mixpanel_data/workspace.py` ``` def top_events( self, *, type: Literal["general", "average", "unique"] = "general", limit: int | None = None, ) -> list[TopEvent]: """Get today's most active events. This method is NOT cached (returns real-time data). Args: type: Counting method (general, average, unique). limit: Maximum number of events to return. Returns: List of TopEvent objects (event, count, percent_change). Raises: ConfigError: If API credentials not available. """ return self._discovery_service.list_top_events(type=type, limit=limit) ``` ### lexicon_schemas ``` lexicon_schemas( *, entity_type: EntityType | None = None ) -> list[LexiconSchema] ``` List Lexicon schemas in the project. Retrieves documented event and profile property schemas from the Mixpanel Lexicon (data dictionary). Results are cached for the lifetime of the Workspace. | PARAMETER | DESCRIPTION | | ------------- | ---------------------------------------------------------------------------------------------------- | | `entity_type` | Optional filter by type ("event" or "profile"). If None, returns all schemas. **TYPE:** \`EntityType | | RETURNS | DESCRIPTION | | --------------------- | ---------------------------------------------------- | | `list[LexiconSchema]` | Alphabetically sorted list of LexiconSchema objects. | | RAISES | DESCRIPTION | | --------------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | | `AuthenticationError` | If credentials are invalid. | Note The Lexicon API has a strict 5 requests/minute rate limit. Caching helps avoid hitting this limit; call clear_discovery_cache() only when fresh data is needed. Source code in `src/mixpanel_data/workspace.py` ``` def lexicon_schemas( self, *, entity_type: EntityType | None = None, ) -> list[LexiconSchema]: """List Lexicon schemas in the project. Retrieves documented event and profile property schemas from the Mixpanel Lexicon (data dictionary). Results are cached for the lifetime of the Workspace. Args: entity_type: Optional filter by type ("event" or "profile"). If None, returns all schemas. Returns: Alphabetically sorted list of LexiconSchema objects. Raises: ConfigError: If API credentials not available. AuthenticationError: If credentials are invalid. Note: The Lexicon API has a strict 5 requests/minute rate limit. Caching helps avoid hitting this limit; call clear_discovery_cache() only when fresh data is needed. """ return self._discovery_service.list_schemas(entity_type=entity_type) ``` ### lexicon_schema ``` lexicon_schema(entity_type: EntityType, name: str) -> LexiconSchema ``` Get a single Lexicon schema by entity type and name. Retrieves a documented schema for a specific event or profile property from the Mixpanel Lexicon (data dictionary). Results are cached for the lifetime of the Workspace. | PARAMETER | DESCRIPTION | | ------------- | ---------------------------------------------------------- | | `entity_type` | Entity type ("event" or "profile"). **TYPE:** `EntityType` | | `name` | Entity name. 
**TYPE:** `str` | | RETURNS | DESCRIPTION | | --------------- | --------------------------------------- | | `LexiconSchema` | LexiconSchema for the specified entity. | | RAISES | DESCRIPTION | | --------------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | | `AuthenticationError` | If credentials are invalid. | | `QueryError` | If schema not found. | Note The Lexicon API has a strict 5 requests/minute rate limit. Caching helps avoid hitting this limit; call clear_discovery_cache() only when fresh data is needed. Source code in `src/mixpanel_data/workspace.py` ``` def lexicon_schema( self, entity_type: EntityType, name: str, ) -> LexiconSchema: """Get a single Lexicon schema by entity type and name. Retrieves a documented schema for a specific event or profile property from the Mixpanel Lexicon (data dictionary). Results are cached for the lifetime of the Workspace. Args: entity_type: Entity type ("event" or "profile"). name: Entity name. Returns: LexiconSchema for the specified entity. Raises: ConfigError: If API credentials not available. AuthenticationError: If credentials are invalid. QueryError: If schema not found. Note: The Lexicon API has a strict 5 requests/minute rate limit. Caching helps avoid hitting this limit; call clear_discovery_cache() only when fresh data is needed. """ return self._discovery_service.get_schema(entity_type, name) ``` ### clear_discovery_cache ``` clear_discovery_cache() -> None ``` Clear cached discovery results. Subsequent discovery calls will fetch fresh data from the API. Source code in `src/mixpanel_data/workspace.py` ``` def clear_discovery_cache(self) -> None: """Clear cached discovery results. Subsequent discovery calls will fetch fresh data from the API. """ if self._discovery is not None: self._discovery.clear_cache() ``` ### fetch_events ``` fetch_events( name: str = "events", *, from_date: str, to_date: str, events: list[str] | None = None, where: str | None = None, limit: int | None = None, progress: bool = True, append: bool = False, batch_size: int = 1000, parallel: bool = False, max_workers: int | None = None, on_batch_complete: Callable[[BatchProgress], None] | None = None, chunk_days: int = 7, ) -> FetchResult | ParallelFetchResult ``` Fetch events from Mixpanel and store in local database. Note This is a potentially long-running operation that streams data from Mixpanel's Export API. For large date ranges, use `parallel=True` for significantly faster exports (up to 10x speedup). | PARAMETER | DESCRIPTION | | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `name` | Table name to create or append to (default: "events"). **TYPE:** `str` **DEFAULT:** `'events'` | | `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` | | `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` | | `events` | Optional list of event names to filter. **TYPE:** \`list[str] | | `where` | Optional WHERE clause for filtering. **TYPE:** \`str | | `limit` | Optional maximum number of events to return (max 100000). **TYPE:** \`int | | `progress` | Show progress bar (default: True). **TYPE:** `bool` **DEFAULT:** `True` | | `append` | If True, append to existing table. If False (default), create new. **TYPE:** `bool` **DEFAULT:** `False` | | `batch_size` | Number of rows per INSERT/COMMIT cycle. 
Controls the memory/IO tradeoff: smaller values use less memory but more disk IO; larger values use more memory but less IO. Default: 1000. Valid range: 100-100000. **TYPE:** `int` **DEFAULT:** `1000` |
| `parallel` | If True, use parallel fetching with multiple threads. Splits date range into 7-day chunks and fetches concurrently. Enables export of date ranges exceeding 100 days. Default: False. **TYPE:** `bool` **DEFAULT:** `False` |
| `max_workers` | Maximum concurrent fetch threads when parallel=True. Default: 10. Higher values may hit Mixpanel rate limits. Ignored when parallel=False. **TYPE:** `int \| None` **DEFAULT:** `None` |
| `on_batch_complete` | Callback invoked when each batch completes during parallel fetch. Receives BatchProgress with status. Useful for custom progress reporting. Ignored when parallel=False. **TYPE:** `Callable[[BatchProgress], None] \| None` **DEFAULT:** `None` |
| `chunk_days` | Days per chunk for parallel date range splitting. Default: 7. Valid range: 1-100. Smaller values create more parallel batches but may increase API overhead. Ignored when parallel=False. **TYPE:** `int` **DEFAULT:** `7` |

| RETURNS | DESCRIPTION |
| ------------------------------------ | ------------------------------------------------------------------------ |
| `FetchResult \| ParallelFetchResult` | FetchResult when parallel=False, ParallelFetchResult when parallel=True. |
| `FetchResult \| ParallelFetchResult` | ParallelFetchResult includes per-batch statistics and any failure info.  |

| RAISES | DESCRIPTION |
| --------------------- | --------------------------------------------------- |
| `TableExistsError` | If table exists and append=False. |
| `TableNotFoundError` | If table doesn't exist and append=True. |
| `ConfigError` | If API credentials not available. |
| `AuthenticationError` | If credentials are invalid. |
| `ValueError` | If batch_size is outside valid range (100-100000). |
| `ValueError` | If limit is outside valid range (1-100000). |
| `ValueError` | If max_workers is not positive. |
| `ValueError` | If chunk_days is not in range 1-100. |

Example

```
# Sequential fetch (default)
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-01-31",
)

# Parallel fetch for large date ranges
result = ws.fetch_events(
    name="events_q4",
    from_date="2024-10-01",
    to_date="2024-12-31",
    parallel=True,
)

# With custom progress callback
def on_batch(progress: BatchProgress) -> None:
    print(f"Batch {progress.batch_index + 1}/{progress.total_batches}")

result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    on_batch_complete=on_batch,
)
```

Source code in `src/mixpanel_data/workspace.py`

````
def fetch_events(
    self,
    name: str = "events",
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    where: str | None = None,
    limit: int | None = None,
    progress: bool = True,
    append: bool = False,
    batch_size: int = 1000,
    parallel: bool = False,
    max_workers: int | None = None,
    on_batch_complete: Callable[[BatchProgress], None] | None = None,
    chunk_days: int = 7,
) -> FetchResult | ParallelFetchResult:
    """Fetch events from Mixpanel and store in local database.

    Note:
        This is a potentially long-running operation that streams data
        from Mixpanel's Export API. For large date ranges, use
        ``parallel=True`` for significantly faster exports (up to 10x speedup).

    Args:
        name: Table name to create or append to (default: "events").
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        events: Optional list of event names to filter.
        where: Optional WHERE clause for filtering.
        limit: Optional maximum number of events to return (max 100000).
        progress: Show progress bar (default: True).
        append: If True, append to existing table. If False (default), create new.
batch_size: Number of rows per INSERT/COMMIT cycle. Controls the memory/IO tradeoff: smaller values use less memory but more disk IO; larger values use more memory but less IO. Default: 1000. Valid range: 100-100000. parallel: If True, use parallel fetching with multiple threads. Splits date range into 7-day chunks and fetches concurrently. Enables export of date ranges exceeding 100 days. Default: False. max_workers: Maximum concurrent fetch threads when parallel=True. Default: 10. Higher values may hit Mixpanel rate limits. Ignored when parallel=False. on_batch_complete: Callback invoked when each batch completes during parallel fetch. Receives BatchProgress with status. Useful for custom progress reporting. Ignored when parallel=False. chunk_days: Days per chunk for parallel date range splitting. Default: 7. Valid range: 1-100. Smaller values create more parallel batches but may increase API overhead. Ignored when parallel=False. Returns: FetchResult when parallel=False, ParallelFetchResult when parallel=True. ParallelFetchResult includes per-batch statistics and any failure info. Raises: TableExistsError: If table exists and append=False. TableNotFoundError: If table doesn't exist and append=True. ConfigError: If API credentials not available. AuthenticationError: If credentials are invalid. ValueError: If batch_size is outside valid range (100-100000). ValueError: If limit is outside valid range (1-100000). ValueError: If max_workers is not positive. ValueError: If chunk_days is not in range 1-100. Example: ```python # Sequential fetch (default) result = ws.fetch_events( name="events", from_date="2024-01-01", to_date="2024-01-31", ) # Parallel fetch for large date ranges result = ws.fetch_events( name="events_q4", from_date="2024-10-01", to_date="2024-12-31", parallel=True, ) # With custom progress callback def on_batch(progress: BatchProgress) -> None: print(f"Batch {progress.batch_index + 1}/{progress.total_batches}") result = ws.fetch_events( name="events", from_date="2024-01-01", to_date="2024-03-31", parallel=True, on_batch_complete=on_batch, ) ``` """ # Validate parameters early to avoid wasted API calls _validate_batch_size(batch_size) _validate_limit(limit) # Validate max_workers for parallel mode if max_workers is not None and max_workers <= 0: raise ValueError("max_workers must be positive") # Validate chunk_days for parallel mode if chunk_days <= 0: raise ValueError("chunk_days must be positive") if chunk_days > 100: raise ValueError("chunk_days must be at most 100") # Create progress callback if requested (only for interactive terminals) progress_callback = None pbar = None if progress and sys.stderr.isatty() and not parallel: try: from rich.progress import Progress, SpinnerColumn, TextColumn pbar = Progress( SpinnerColumn(), TextColumn("[progress.description]{task.description}"), TextColumn("{task.completed} rows"), ) task = pbar.add_task("Fetching events...", total=None) pbar.start() def callback(count: int) -> None: pbar.update(task, completed=count) progress_callback = callback except Exception: # Progress bar unavailable or failed to initialize, skip silently pass try: result = self._fetcher_service.fetch_events( name=name, from_date=from_date, to_date=to_date, events=events, where=where, limit=limit, progress_callback=progress_callback, append=append, batch_size=batch_size, parallel=parallel, max_workers=max_workers, on_batch_complete=on_batch_complete, chunk_days=chunk_days, ) finally: if pbar is not None: pbar.stop() return result ```` ### fetch_profiles ``` 
fetch_profiles( name: str = "profiles", *, where: str | None = None, cohort_id: str | None = None, output_properties: list[str] | None = None, progress: bool = True, append: bool = False, batch_size: int = 1000, distinct_id: str | None = None, distinct_ids: list[str] | None = None, group_id: str | None = None, behaviors: list[dict[str, Any]] | None = None, as_of_timestamp: int | None = None, include_all_users: bool = False, parallel: bool = False, max_workers: int | None = None, on_page_complete: Callable[[ProfileProgress], None] | None = None, ) -> FetchResult | ParallelProfileResult ``` Fetch user profiles from Mixpanel and store in local database. Note This is a potentially long-running operation that streams data from Mixpanel's Engage API. For large profile sets, use `parallel=True` for up to 5x faster exports. | PARAMETER | DESCRIPTION | | ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `name` | Table name to create or append to (default: "profiles"). **TYPE:** `str` **DEFAULT:** `'profiles'` | | `where` | Optional WHERE clause for filtering. **TYPE:** \`str | | `cohort_id` | Optional cohort ID to filter by. Only profiles that are members of this cohort will be returned. **TYPE:** \`str | | `output_properties` | Optional list of property names to include in the response. If None, all properties are returned. **TYPE:** \`list[str] | | `progress` | Show progress bar (default: True). **TYPE:** `bool` **DEFAULT:** `True` | | `append` | If True, append to existing table. If False (default), create new. **TYPE:** `bool` **DEFAULT:** `False` | | `batch_size` | Number of rows per INSERT/COMMIT cycle. Controls the memory/IO tradeoff: smaller values use less memory but more disk IO; larger values use more memory but less IO. Default: 1000. Valid range: 100-100000. **TYPE:** `int` **DEFAULT:** `1000` | | `distinct_id` | Optional single user ID to fetch. Mutually exclusive with distinct_ids. **TYPE:** \`str | | `distinct_ids` | Optional list of user IDs to fetch. Mutually exclusive with distinct_id. Duplicates are automatically removed. **TYPE:** \`list[str] | | `group_id` | Optional group type identifier (e.g., "companies") to fetch group profiles instead of user profiles. **TYPE:** \`str | | `behaviors` | Optional list of behavioral filters. Each dict should have 'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors' (list of {"event": "Name"}). Use with where parameter to filter, e.g., where='(behaviors["name"] > 0)'. Mutually exclusive with cohort_id. **TYPE:** \`list\[dict[str, Any]\] | | `as_of_timestamp` | Optional Unix timestamp to query profile state at a specific point in time. Must be in the past. **TYPE:** \`int | | `include_all_users` | If True, include all users and mark cohort membership. Only valid when cohort_id is provided. **TYPE:** `bool` **DEFAULT:** `False` | | `parallel` | If True, use parallel fetching with multiple threads. Uses page-based parallelism for concurrent profile fetching. Enables up to 5x faster exports. Default: False. **TYPE:** `bool` **DEFAULT:** `False` | | `max_workers` | Maximum concurrent fetch threads when parallel=True. Default: 5, capped at 5. Ignored when parallel=False. 
**TYPE:** `int \| None` **DEFAULT:** `None` |
| `on_page_complete` | Callback invoked when each page completes during parallel fetch. Receives ProfileProgress with status. Useful for custom progress reporting. Ignored when parallel=False. **TYPE:** `Callable[[ProfileProgress], None] \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| -------------------------------------- | --------------------------------------------------------------------------- |
| `FetchResult \| ParallelProfileResult` | FetchResult when parallel=False, ParallelProfileResult when parallel=True.  |
| `FetchResult \| ParallelProfileResult` | ParallelProfileResult includes per-page statistics and any failure info.    |

| RAISES | DESCRIPTION |
| -------------------- | ------------------------------------------------------------------------------------------------ |
| `TableExistsError` | If table exists and append=False. |
| `TableNotFoundError` | If table doesn't exist and append=True. |
| `ConfigError` | If API credentials not available. |
| `ValueError` | If batch_size is outside valid range (100-100000) or mutually exclusive parameters are provided. |

Source code in `src/mixpanel_data/workspace.py`

```
def fetch_profiles(
    self,
    name: str = "profiles",
    *,
    where: str | None = None,
    cohort_id: str | None = None,
    output_properties: list[str] | None = None,
    progress: bool = True,
    append: bool = False,
    batch_size: int = 1000,
    distinct_id: str | None = None,
    distinct_ids: list[str] | None = None,
    group_id: str | None = None,
    behaviors: list[dict[str, Any]] | None = None,
    as_of_timestamp: int | None = None,
    include_all_users: bool = False,
    parallel: bool = False,
    max_workers: int | None = None,
    on_page_complete: Callable[[ProfileProgress], None] | None = None,
) -> FetchResult | ParallelProfileResult:
    """Fetch user profiles from Mixpanel and store in local database.

    Note:
        This is a potentially long-running operation that streams data
        from Mixpanel's Engage API. For large profile sets, use
        ``parallel=True`` for up to 5x faster exports.

    Args:
        name: Table name to create or append to (default: "profiles").
        where: Optional WHERE clause for filtering.
        cohort_id: Optional cohort ID to filter by. Only profiles that are
            members of this cohort will be returned.
        output_properties: Optional list of property names to include in
            the response. If None, all properties are returned.
        progress: Show progress bar (default: True).
        append: If True, append to existing table. If False (default), create new.
        batch_size: Number of rows per INSERT/COMMIT cycle. Controls the
            memory/IO tradeoff: smaller values use less memory but more
            disk IO; larger values use more memory but less IO.
            Default: 1000. Valid range: 100-100000.
        distinct_id: Optional single user ID to fetch. Mutually exclusive
            with distinct_ids.
        distinct_ids: Optional list of user IDs to fetch. Mutually exclusive
            with distinct_id. Duplicates are automatically removed.
        group_id: Optional group type identifier (e.g., "companies") to fetch
            group profiles instead of user profiles.
        behaviors: Optional list of behavioral filters. Each dict should have
            'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors'
            (list of {"event": "Name"}). Use with `where` parameter to filter,
            e.g., where='(behaviors["name"] > 0)'. Mutually exclusive with cohort_id.
        as_of_timestamp: Optional Unix timestamp to query profile state at
            a specific point in time. Must be in the past.
        include_all_users: If True, include all users and mark cohort
            membership. Only valid when cohort_id is provided.
        parallel: If True, use parallel fetching with multiple threads.
            Uses page-based parallelism for concurrent profile fetching.
            Enables up to 5x faster exports. Default: False.
        max_workers: Maximum concurrent fetch threads when parallel=True.
            Default: 5, capped at 5. Ignored when parallel=False.
on_page_complete: Callback invoked when each page completes during parallel fetch. Receives ProfileProgress with status. Useful for custom progress reporting. Ignored when parallel=False. Returns: FetchResult when parallel=False, ParallelProfileResult when parallel=True. ParallelProfileResult includes per-page statistics and any failure info. Raises: TableExistsError: If table exists and append=False. TableNotFoundError: If table doesn't exist and append=True. ConfigError: If API credentials not available. ValueError: If batch_size is outside valid range (100-100000) or mutually exclusive parameters are provided. """ # Validate batch_size _validate_batch_size(batch_size) # Validate max_workers for parallel mode if max_workers is not None and max_workers <= 0: raise ValueError("max_workers must be positive") # Create progress callback if requested (only for interactive terminals) # Sequential mode uses spinner progress bar progress_callback = None pbar = None if progress and sys.stderr.isatty() and not parallel: try: from rich.progress import Progress, SpinnerColumn, TextColumn pbar = Progress( SpinnerColumn(), TextColumn("[progress.description]{task.description}"), TextColumn("{task.completed} rows"), ) task = pbar.add_task("Fetching profiles...", total=None) pbar.start() def callback(count: int) -> None: pbar.update(task, completed=count) progress_callback = callback except Exception: # Progress bar unavailable or failed to initialize, skip silently pass try: result = self._fetcher_service.fetch_profiles( name=name, where=where, cohort_id=cohort_id, output_properties=output_properties, progress_callback=progress_callback, append=append, batch_size=batch_size, distinct_id=distinct_id, distinct_ids=distinct_ids, group_id=group_id, behaviors=behaviors, as_of_timestamp=as_of_timestamp, include_all_users=include_all_users, parallel=parallel, max_workers=max_workers, on_page_complete=on_page_complete, ) finally: if pbar is not None: pbar.stop() return result ``` ### stream_events ``` stream_events( *, from_date: str, to_date: str, events: list[str] | None = None, where: str | None = None, limit: int | None = None, raw: bool = False, ) -> Iterator[dict[str, Any]] ``` Stream events directly from Mixpanel API without storing. Yields events one at a time as they are received from the API. No database files or tables are created. | PARAMETER | DESCRIPTION | | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `from_date` | Start date inclusive (YYYY-MM-DD format). **TYPE:** `str` | | `to_date` | End date inclusive (YYYY-MM-DD format). **TYPE:** `str` | | `events` | Optional list of event names to filter. If None, all events returned. **TYPE:** \`list[str] | | `where` | Optional Mixpanel filter expression (e.g., 'properties["country"]=="US"'). **TYPE:** \`str | | `limit` | Optional maximum number of events to return (max 100000). **TYPE:** \`int | | `raw` | If True, return events in raw Mixpanel API format. If False (default), return normalized format with datetime objects. **TYPE:** `bool` **DEFAULT:** `False` | | YIELDS | DESCRIPTION | | ---------------- | ----------------------------------------------------------------- | | `dict[str, Any]` | dict\[str, Any\]: Event dictionaries in normalized or raw format. | | RAISES | DESCRIPTION | | --------------------- | ------------------------------------------- | | `ConfigError` | If API credentials are not available. 
| | `AuthenticationError` | If credentials are invalid. | | `RateLimitError` | If rate limit exceeded after max retries. | | `QueryError` | If filter expression is invalid. | | `ValueError` | If limit is outside valid range (1-100000). | Example ``` ws = Workspace() for event in ws.stream_events(from_date="2024-01-01", to_date="2024-01-31"): process(event) ws.close() ``` With raw format: ``` for event in ws.stream_events( from_date="2024-01-01", to_date="2024-01-31", raw=True ): legacy_system.ingest(event) ``` Source code in `src/mixpanel_data/workspace.py` ```` def stream_events( self, *, from_date: str, to_date: str, events: list[str] | None = None, where: str | None = None, limit: int | None = None, raw: bool = False, ) -> Iterator[dict[str, Any]]: """Stream events directly from Mixpanel API without storing. Yields events one at a time as they are received from the API. No database files or tables are created. Args: from_date: Start date inclusive (YYYY-MM-DD format). to_date: End date inclusive (YYYY-MM-DD format). events: Optional list of event names to filter. If None, all events returned. where: Optional Mixpanel filter expression (e.g., 'properties["country"]=="US"'). limit: Optional maximum number of events to return (max 100000). raw: If True, return events in raw Mixpanel API format. If False (default), return normalized format with datetime objects. Yields: dict[str, Any]: Event dictionaries in normalized or raw format. Raises: ConfigError: If API credentials are not available. AuthenticationError: If credentials are invalid. RateLimitError: If rate limit exceeded after max retries. QueryError: If filter expression is invalid. ValueError: If limit is outside valid range (1-100000). Example: ```python ws = Workspace() for event in ws.stream_events(from_date="2024-01-01", to_date="2024-01-31"): process(event) ws.close() ``` With raw format: ```python for event in ws.stream_events( from_date="2024-01-01", to_date="2024-01-31", raw=True ): legacy_system.ingest(event) ``` """ # Validate limit early to avoid wasted API calls _validate_limit(limit) api_client = self._require_api_client() event_iterator = api_client.export_events( from_date=from_date, to_date=to_date, events=events, where=where, limit=limit, ) if raw: yield from event_iterator else: for event in event_iterator: yield transform_event(event) ```` ### stream_profiles ``` stream_profiles( *, where: str | None = None, cohort_id: str | None = None, output_properties: list[str] | None = None, raw: bool = False, distinct_id: str | None = None, distinct_ids: list[str] | None = None, group_id: str | None = None, behaviors: list[dict[str, Any]] | None = None, as_of_timestamp: int | None = None, include_all_users: bool = False, ) -> Iterator[dict[str, Any]] ``` Stream user profiles directly from Mixpanel API without storing. Yields profiles one at a time as they are received from the API. No database files or tables are created. | PARAMETER | DESCRIPTION | | ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `where` | Optional Mixpanel filter expression for profile properties. **TYPE:** \`str | | `cohort_id` | Optional cohort ID to filter by. Only profiles that are members of this cohort will be returned. 
**TYPE:** \`str | | `output_properties` | Optional list of property names to include in the response. If None, all properties are returned. **TYPE:** \`list[str] | | `raw` | If True, return profiles in raw Mixpanel API format. If False (default), return normalized format. **TYPE:** `bool` **DEFAULT:** `False` | | `distinct_id` | Optional single user ID to fetch. Mutually exclusive with distinct_ids. **TYPE:** \`str | | `distinct_ids` | Optional list of user IDs to fetch. Mutually exclusive with distinct_id. Duplicates are automatically removed. **TYPE:** \`list[str] | | `group_id` | Optional group type identifier (e.g., "companies") to fetch group profiles instead of user profiles. **TYPE:** \`str | | `behaviors` | Optional list of behavioral filters. Each dict should have 'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors' (list of {"event": "Name"}). Use with where parameter to filter, e.g., where='(behaviors["name"] > 0)'. Mutually exclusive with cohort_id. **TYPE:** \`list\[dict[str, Any]\] | | `as_of_timestamp` | Optional Unix timestamp to query profile state at a specific point in time. Must be in the past. **TYPE:** \`int | | `include_all_users` | If True, include all users and mark cohort membership. Only valid when cohort_id is provided. **TYPE:** `bool` **DEFAULT:** `False` | | YIELDS | DESCRIPTION | | ---------------- | ------------------------------------------------------------------- | | `dict[str, Any]` | dict\[str, Any\]: Profile dictionaries in normalized or raw format. | | RAISES | DESCRIPTION | | --------------------- | ---------------------------------------------- | | `ConfigError` | If API credentials are not available. | | `AuthenticationError` | If credentials are invalid. | | `RateLimitError` | If rate limit exceeded after max retries. | | `ValueError` | If mutually exclusive parameters are provided. | Example ``` ws = Workspace() for profile in ws.stream_profiles(): sync_to_crm(profile) ws.close() ``` Filter to premium users: ``` for profile in ws.stream_profiles(where='properties["plan"]=="premium"'): send_survey(profile) ``` Filter by cohort and select specific properties: ``` for profile in ws.stream_profiles( cohort_id="12345", output_properties=["$email", "$name"] ): send_email(profile) ``` Fetch specific users by ID: ``` for profile in ws.stream_profiles(distinct_ids=["user_1", "user_2"]): print(profile) ``` Fetch group profiles: ``` for company in ws.stream_profiles(group_id="companies"): print(company) ``` Source code in `src/mixpanel_data/workspace.py` ```` def stream_profiles( self, *, where: str | None = None, cohort_id: str | None = None, output_properties: list[str] | None = None, raw: bool = False, distinct_id: str | None = None, distinct_ids: list[str] | None = None, group_id: str | None = None, behaviors: list[dict[str, Any]] | None = None, as_of_timestamp: int | None = None, include_all_users: bool = False, ) -> Iterator[dict[str, Any]]: """Stream user profiles directly from Mixpanel API without storing. Yields profiles one at a time as they are received from the API. No database files or tables are created. Args: where: Optional Mixpanel filter expression for profile properties. cohort_id: Optional cohort ID to filter by. Only profiles that are members of this cohort will be returned. output_properties: Optional list of property names to include in the response. If None, all properties are returned. raw: If True, return profiles in raw Mixpanel API format. If False (default), return normalized format. 
distinct_id: Optional single user ID to fetch. Mutually exclusive with distinct_ids. distinct_ids: Optional list of user IDs to fetch. Mutually exclusive with distinct_id. Duplicates are automatically removed. group_id: Optional group type identifier (e.g., "companies") to fetch group profiles instead of user profiles. behaviors: Optional list of behavioral filters. Each dict should have 'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors' (list of {"event": "Name"}). Use with `where` parameter to filter, e.g., where='(behaviors["name"] > 0)'. Mutually exclusive with cohort_id. as_of_timestamp: Optional Unix timestamp to query profile state at a specific point in time. Must be in the past. include_all_users: If True, include all users and mark cohort membership. Only valid when cohort_id is provided. Yields: dict[str, Any]: Profile dictionaries in normalized or raw format. Raises: ConfigError: If API credentials are not available. AuthenticationError: If credentials are invalid. RateLimitError: If rate limit exceeded after max retries. ValueError: If mutually exclusive parameters are provided. Example: ```python ws = Workspace() for profile in ws.stream_profiles(): sync_to_crm(profile) ws.close() ``` Filter to premium users: ```python for profile in ws.stream_profiles(where='properties["plan"]=="premium"'): send_survey(profile) ``` Filter by cohort and select specific properties: ```python for profile in ws.stream_profiles( cohort_id="12345", output_properties=["$email", "$name"] ): send_email(profile) ``` Fetch specific users by ID: ```python for profile in ws.stream_profiles(distinct_ids=["user_1", "user_2"]): print(profile) ``` Fetch group profiles: ```python for company in ws.stream_profiles(group_id="companies"): print(company) ``` """ api_client = self._require_api_client() profile_iterator = api_client.export_profiles( where=where, cohort_id=cohort_id, output_properties=output_properties, distinct_id=distinct_id, distinct_ids=distinct_ids, group_id=group_id, behaviors=behaviors, as_of_timestamp=as_of_timestamp, include_all_users=include_all_users, ) if raw: yield from profile_iterator else: for profile in profile_iterator: yield transform_profile(profile) ```` ### sql ``` sql(query: str) -> pd.DataFrame ``` Execute SQL query and return results as DataFrame. | PARAMETER | DESCRIPTION | | --------- | --------------------------------- | | `query` | SQL query string. **TYPE:** `str` | | RETURNS | DESCRIPTION | | ----------- | ------------------------------------ | | `DataFrame` | pandas DataFrame with query results. | | RAISES | DESCRIPTION | | ------------ | -------------------- | | `QueryError` | If query is invalid. | Source code in `src/mixpanel_data/workspace.py` ``` def sql(self, query: str) -> pd.DataFrame: """Execute SQL query and return results as DataFrame. Args: query: SQL query string. Returns: pandas DataFrame with query results. Raises: QueryError: If query is invalid. """ return self.storage.execute_df(query) ``` ### sql_scalar ``` sql_scalar(query: str) -> Any ``` Execute SQL query and return single scalar value. | PARAMETER | DESCRIPTION | | --------- | ------------------------------------------------------ | | `query` | SQL query that returns a single value. **TYPE:** `str` | | RETURNS | DESCRIPTION | | ------- | ------------------------------------------ | | `Any` | The scalar result (int, float, str, etc.). 
| RAISES | DESCRIPTION |
| ------------ | ----------------------------------------------- |
| `QueryError` | If query is invalid or returns multiple values. |

Source code in `src/mixpanel_data/workspace.py`

```
def sql_scalar(self, query: str) -> Any:
    """Execute SQL query and return single scalar value.

    Args:
        query: SQL query that returns a single value.

    Returns:
        The scalar result (int, float, str, etc.).

    Raises:
        QueryError: If query is invalid or returns multiple values.
    """
    return self.storage.execute_scalar(query)
```

### sql_rows

```
sql_rows(query: str) -> SQLResult
```

Execute SQL query and return structured result with column metadata.

| PARAMETER | DESCRIPTION |
| --------- | --------------------------------- |
| `query` | SQL query string. **TYPE:** `str` |

| RETURNS | DESCRIPTION |
| ----------- | ------------------------------------------- |
| `SQLResult` | SQLResult with column names and row tuples. |

| RAISES | DESCRIPTION |
| ------------ | -------------------- |
| `QueryError` | If query is invalid. |

Example

```
result = ws.sql_rows("SELECT name, age FROM users")
print(result.columns)  # ['name', 'age']
for row in result.rows:
    print(row)  # ('Alice', 30)

# Or convert to dicts for JSON output:
for row in result.to_dicts():
    print(row)  # {'name': 'Alice', 'age': 30}
```

Source code in `src/mixpanel_data/workspace.py`

````
def sql_rows(self, query: str) -> SQLResult:
    """Execute SQL query and return structured result with column metadata.

    Args:
        query: SQL query string.

    Returns:
        SQLResult with column names and row tuples.

    Raises:
        QueryError: If query is invalid.

    Example:
        ```python
        result = ws.sql_rows("SELECT name, age FROM users")
        print(result.columns)  # ['name', 'age']
        for row in result.rows:
            print(row)  # ('Alice', 30)

        # Or convert to dicts for JSON output:
        for row in result.to_dicts():
            print(row)  # {'name': 'Alice', 'age': 30}
        ```
    """
    return self.storage.execute_rows(query)
````

### segmentation

```
segmentation(
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str | None = None,
    unit: Literal["day", "week", "month"] = "day",
    where: str | None = None,
) -> SegmentationResult
```

Run a segmentation query against Mixpanel API.

| PARAMETER | DESCRIPTION |
| ----------- | -------------------------------------------------------------------------------------------- |
| `event` | Event name to query. **TYPE:** `str` |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` |
| `on` | Optional property to segment by. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `unit` | Time unit for aggregation. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'` |
| `where` | Optional WHERE clause. **TYPE:** `str \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| -------------------- | ----------------------------------------- |
| `SegmentationResult` | SegmentationResult with time-series data. |

| RAISES | DESCRIPTION |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |
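Example

A minimal usage sketch; `"Purchase"` and `"country"` are placeholder names to swap for values discovered via `events()` and `properties()`:

```
ws = Workspace()
result = ws.segmentation(
    "Purchase",
    from_date="2025-01-01",
    to_date="2025-01-31",
    on="country",
    unit="day",
)
print(result.df)  # time-series data as a pandas DataFrame
```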
Source code in `src/mixpanel_data/workspace.py`

```
def segmentation(
    self,
    event: str,
    *,
    from_date: str,
    to_date: str,
    on: str | None = None,
    unit: Literal["day", "week", "month"] = "day",
    where: str | None = None,
) -> SegmentationResult:
    """Run a segmentation query against Mixpanel API.

    Args:
        event: Event name to query.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        on: Optional property to segment by.
        unit: Time unit for aggregation.
        where: Optional WHERE clause.

    Returns:
        SegmentationResult with time-series data.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.segmentation(
        event=event,
        from_date=from_date,
        to_date=to_date,
        on=on,
        unit=unit,
        where=where,
    )
```

### funnel

```
funnel(
    funnel_id: int,
    *,
    from_date: str,
    to_date: str,
    unit: str | None = None,
    on: str | None = None,
) -> FunnelResult
```

Run a funnel analysis query.

| PARAMETER | DESCRIPTION |
| ----------- | ----------------------------------------------------------------------------- |
| `funnel_id` | ID of saved funnel. **TYPE:** `int` |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` |
| `unit` | Optional time unit. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `on` | Optional property to segment by. **TYPE:** `str \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| -------------- | ---------------------------------------- |
| `FunnelResult` | FunnelResult with step conversion rates. |

| RAISES | DESCRIPTION |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |
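Example

A minimal sketch; assumes the project has at least one saved funnel (discoverable via `funnels()`):

```
saved = ws.funnels()
result = ws.funnel(
    saved[0].funnel_id,
    from_date="2025-01-01",
    to_date="2025-01-31",
)
print(result.df)  # step-by-step conversion data
```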
Source code in `src/mixpanel_data/workspace.py`

```
def funnel(
    self,
    funnel_id: int,
    *,
    from_date: str,
    to_date: str,
    unit: str | None = None,
    on: str | None = None,
) -> FunnelResult:
    """Run a funnel analysis query.

    Args:
        funnel_id: ID of saved funnel.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        unit: Optional time unit.
        on: Optional property to segment by.

    Returns:
        FunnelResult with step conversion rates.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.funnel(
        funnel_id=funnel_id,
        from_date=from_date,
        to_date=to_date,
        unit=unit,
        on=on,
    )
```

### retention

```
retention(
    *,
    born_event: str,
    return_event: str,
    from_date: str,
    to_date: str,
    born_where: str | None = None,
    return_where: str | None = None,
    interval: int = 1,
    interval_count: int = 10,
    unit: Literal["day", "week", "month"] = "day",
) -> RetentionResult
```

Run a retention analysis query.

| PARAMETER | DESCRIPTION |
| ---------------- | -------------------------------------------------------------------------------- |
| `born_event` | Event that defines cohort entry. **TYPE:** `str` |
| `return_event` | Event that defines return. **TYPE:** `str` |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` |
| `born_where` | Optional filter for born event. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `return_where` | Optional filter for return event. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `interval` | Retention interval. **TYPE:** `int` **DEFAULT:** `1` |
| `interval_count` | Number of intervals. **TYPE:** `int` **DEFAULT:** `10` |
| `unit` | Time unit. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'` |

| RETURNS | DESCRIPTION |
| ----------------- | ------------------------------------------- |
| `RetentionResult` | RetentionResult with cohort retention data. |

| RAISES | DESCRIPTION |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def retention(
    self,
    *,
    born_event: str,
    return_event: str,
    from_date: str,
    to_date: str,
    born_where: str | None = None,
    return_where: str | None = None,
    interval: int = 1,
    interval_count: int = 10,
    unit: Literal["day", "week", "month"] = "day",
) -> RetentionResult:
    """Run a retention analysis query.

    Args:
        born_event: Event that defines cohort entry.
        return_event: Event that defines return.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        born_where: Optional filter for born event.
        return_where: Optional filter for return event.
        interval: Retention interval.
        interval_count: Number of intervals.
        unit: Time unit.

    Returns:
        RetentionResult with cohort retention data.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.retention(
        born_event=born_event,
        return_event=return_event,
        from_date=from_date,
        to_date=to_date,
        born_where=born_where,
        return_where=return_where,
        interval=interval,
        interval_count=interval_count,
        unit=unit,
    )
```

### jql

```
jql(script: str, params: dict[str, Any] | None = None) -> JQLResult
```

Execute a custom JQL script.

| PARAMETER | DESCRIPTION |
| --------- | ---------------------------------------------------------------------------------------------- |
| `script` | JQL script code. **TYPE:** `str` |
| `params` | Optional parameters to pass to script. **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| ----------- | --------------------------------- |
| `JQLResult` | JQLResult with raw query results. |

| RAISES | DESCRIPTION |
| ---------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |
| `JQLSyntaxError` | If script has syntax errors. |
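Example

A minimal sketch of a grouped count, assuming the standard JQL builtins (`Events`, `groupBy`, `mixpanel.reducer.count`); the date range is a placeholder:

```
result = ws.jql("""
function main() {
  return Events({
    from_date: "2025-01-01",
    to_date: "2025-01-31"
  }).groupBy(["name"], mixpanel.reducer.count());
}
""")
```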
""" return self._live_query_service.event_counts( events=events, from_date=from_date, to_date=to_date, type=type, unit=unit, ) ``` ### property_counts ``` property_counts( event: str, property_name: str, *, from_date: str, to_date: str, type: Literal["general", "unique", "average"] = "general", unit: Literal["day", "week", "month"] = "day", values: list[str] | None = None, limit: int | None = None, ) -> PropertyCountsResult ``` Get event counts broken down by property values. | PARAMETER | DESCRIPTION | | --------------- | --------------------------------------------------------------------------------------------- | | `event` | Event name. **TYPE:** `str` | | `property_name` | Property to break down by. **TYPE:** `str` | | `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` | | `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` | | `type` | Counting method. **TYPE:** `Literal['general', 'unique', 'average']` **DEFAULT:** `'general'` | | `unit` | Time unit. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'` | | `values` | Optional list of property values to include. **TYPE:** \`list[str] | | `limit` | Maximum number of property values. **TYPE:** \`int | | RETURNS | DESCRIPTION | | ---------------------- | --------------------------------------------------------- | | `PropertyCountsResult` | PropertyCountsResult with time-series per property value. | | RAISES | DESCRIPTION | | ------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | Source code in `src/mixpanel_data/workspace.py` ``` def property_counts( self, event: str, property_name: str, *, from_date: str, to_date: str, type: Literal["general", "unique", "average"] = "general", unit: Literal["day", "week", "month"] = "day", values: list[str] | None = None, limit: int | None = None, ) -> PropertyCountsResult: """Get event counts broken down by property values. Args: event: Event name. property_name: Property to break down by. from_date: Start date (YYYY-MM-DD). to_date: End date (YYYY-MM-DD). type: Counting method. unit: Time unit. values: Optional list of property values to include. limit: Maximum number of property values. Returns: PropertyCountsResult with time-series per property value. Raises: ConfigError: If API credentials not available. """ return self._live_query_service.property_counts( event=event, property_name=property_name, from_date=from_date, to_date=to_date, type=type, unit=unit, values=values, limit=limit, ) ``` ### activity_feed ``` activity_feed( distinct_ids: list[str], *, from_date: str | None = None, to_date: str | None = None, ) -> ActivityFeedResult ``` Get activity feed for specific users. | PARAMETER | DESCRIPTION | | -------------- | ----------------------------------------------- | | `distinct_ids` | List of user identifiers. **TYPE:** `list[str]` | | `from_date` | Optional start date filter. **TYPE:** \`str | | `to_date` | Optional end date filter. **TYPE:** \`str | | RETURNS | DESCRIPTION | | -------------------- | ------------------------------------ | | `ActivityFeedResult` | ActivityFeedResult with user events. | | RAISES | DESCRIPTION | | ------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | Source code in `src/mixpanel_data/workspace.py` ``` def activity_feed( self, distinct_ids: list[str], *, from_date: str | None = None, to_date: str | None = None, ) -> ActivityFeedResult: """Get activity feed for specific users. Args: distinct_ids: List of user identifiers. 
Source code in `src/mixpanel_data/workspace.py`

```
def property_counts(
    self,
    event: str,
    property_name: str,
    *,
    from_date: str,
    to_date: str,
    type: Literal["general", "unique", "average"] = "general",
    unit: Literal["day", "week", "month"] = "day",
    values: list[str] | None = None,
    limit: int | None = None,
) -> PropertyCountsResult:
    """Get event counts broken down by property values.

    Args:
        event: Event name.
        property_name: Property to break down by.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        type: Counting method.
        unit: Time unit.
        values: Optional list of property values to include.
        limit: Maximum number of property values.

    Returns:
        PropertyCountsResult with time-series per property value.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.property_counts(
        event=event,
        property_name=property_name,
        from_date=from_date,
        to_date=to_date,
        type=type,
        unit=unit,
        values=values,
        limit=limit,
    )
```

### activity_feed

```
activity_feed(
    distinct_ids: list[str],
    *,
    from_date: str | None = None,
    to_date: str | None = None,
) -> ActivityFeedResult
```

Get activity feed for specific users.

| PARAMETER | DESCRIPTION |
| -------------- | ------------------------------------------------------------------------ |
| `distinct_ids` | List of user identifiers. **TYPE:** `list[str]` |
| `from_date` | Optional start date filter. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `to_date` | Optional end date filter. **TYPE:** `str \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| -------------------- | ------------------------------------ |
| `ActivityFeedResult` | ActivityFeedResult with user events. |

| RAISES | DESCRIPTION |
| ------------- | --------------------------------- |
| `ConfigError` | If API credentials not available. |

Source code in `src/mixpanel_data/workspace.py`

```
def activity_feed(
    self,
    distinct_ids: list[str],
    *,
    from_date: str | None = None,
    to_date: str | None = None,
) -> ActivityFeedResult:
    """Get activity feed for specific users.

    Args:
        distinct_ids: List of user identifiers.
        from_date: Optional start date filter.
        to_date: Optional end date filter.

    Returns:
        ActivityFeedResult with user events.

    Raises:
        ConfigError: If API credentials not available.
    """
    return self._live_query_service.activity_feed(
        distinct_ids=distinct_ids,
        from_date=from_date,
        to_date=to_date,
    )
```

### query_saved_report

```
query_saved_report(bookmark_id: int) -> SavedReportResult
```

Query a saved report (Insights, Retention, or Funnel).

Executes a saved report by its bookmark ID. The report type is automatically detected from the response headers.

| PARAMETER | DESCRIPTION |
| ------------- | -------------------------------------------------------------------------- |
| `bookmark_id` | ID of saved report (from list_bookmarks or Mixpanel URL). **TYPE:** `int` |

| RETURNS | DESCRIPTION |
| ------------------- | ------------------------------------------------------------- |
| `SavedReportResult` | SavedReportResult with report data and report_type property. |

| RAISES | DESCRIPTION |
| ------------- | ---------------------------------------------- |
| `ConfigError` | If API credentials not available. |
| `QueryError` | If bookmark_id is invalid or report not found. |

Source code in `src/mixpanel_data/workspace.py`

```
def query_saved_report(self, bookmark_id: int) -> SavedReportResult:
    """Query a saved report (Insights, Retention, or Funnel).

    Executes a saved report by its bookmark ID. The report type is
    automatically detected from the response headers.

    Args:
        bookmark_id: ID of saved report (from list_bookmarks or Mixpanel URL).

    Returns:
        SavedReportResult with report data and report_type property.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: If bookmark_id is invalid or report not found.
    """
    return self._live_query_service.query_saved_report(bookmark_id=bookmark_id)
```

### query_flows

```
query_flows(bookmark_id: int) -> FlowsResult
```

Query a saved Flows report.

Executes a saved Flows report by its bookmark ID, returning step data, breakdowns, and conversion rates.

| PARAMETER | DESCRIPTION |
| ------------- | --------------------------------------------------------------------------------- |
| `bookmark_id` | ID of saved flows report (from list_bookmarks or Mixpanel URL). **TYPE:** `int` |

| RETURNS | DESCRIPTION |
| ------------- | --------------------------------------------------------- |
| `FlowsResult` | FlowsResult with steps, breakdowns, and conversion rate. |

| RAISES | DESCRIPTION |
| ------------- | ---------------------------------------------- |
| `ConfigError` | If API credentials not available. |
| `QueryError` | If bookmark_id is invalid or report not found. |

Source code in `src/mixpanel_data/workspace.py`

```
def query_flows(self, bookmark_id: int) -> FlowsResult:
    """Query a saved Flows report.

    Executes a saved Flows report by its bookmark ID, returning step data,
    breakdowns, and conversion rates.

    Args:
        bookmark_id: ID of saved flows report (from list_bookmarks or Mixpanel URL).

    Returns:
        FlowsResult with steps, breakdowns, and conversion rate.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: If bookmark_id is invalid or report not found.
    """
    return self._live_query_service.query_flows(bookmark_id=bookmark_id)
```

### frequency

```
frequency(
    *,
    from_date: str,
    to_date: str,
    unit: Literal["day", "week", "month"] = "day",
    addiction_unit: Literal["hour", "day"] = "hour",
    event: str | None = None,
    where: str | None = None,
) -> FrequencyResult
```

Analyze event frequency distribution.
| PARAMETER | DESCRIPTION | | ---------------- | ----------------------------------------------------------------------------------- | | `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` | | `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` | | `unit` | Overall time unit. **TYPE:** `Literal['day', 'week', 'month']` **DEFAULT:** `'day'` | | `addiction_unit` | Measurement granularity. **TYPE:** `Literal['hour', 'day']` **DEFAULT:** `'hour'` | | `event` | Optional event filter. **TYPE:** \`str | | `where` | Optional WHERE clause. **TYPE:** \`str | | RETURNS | DESCRIPTION | | ----------------- | -------------------------------------------- | | `FrequencyResult` | FrequencyResult with frequency distribution. | | RAISES | DESCRIPTION | | ------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | Source code in `src/mixpanel_data/workspace.py` ``` def frequency( self, *, from_date: str, to_date: str, unit: Literal["day", "week", "month"] = "day", addiction_unit: Literal["hour", "day"] = "hour", event: str | None = None, where: str | None = None, ) -> FrequencyResult: """Analyze event frequency distribution. Args: from_date: Start date (YYYY-MM-DD). to_date: End date (YYYY-MM-DD). unit: Overall time unit. addiction_unit: Measurement granularity. event: Optional event filter. where: Optional WHERE clause. Returns: FrequencyResult with frequency distribution. Raises: ConfigError: If API credentials not available. """ return self._live_query_service.frequency( from_date=from_date, to_date=to_date, unit=unit, addiction_unit=addiction_unit, event=event, where=where, ) ``` ### segmentation_numeric ``` segmentation_numeric( event: str, *, from_date: str, to_date: str, on: str, unit: Literal["hour", "day"] = "day", where: str | None = None, type: Literal["general", "unique", "average"] = "general", ) -> NumericBucketResult ``` Bucket events by numeric property ranges. | PARAMETER | DESCRIPTION | | ----------- | --------------------------------------------------------------------------------------------- | | `event` | Event name. **TYPE:** `str` | | `from_date` | Start date. **TYPE:** `str` | | `to_date` | End date. **TYPE:** `str` | | `on` | Numeric property expression. **TYPE:** `str` | | `unit` | Time unit. **TYPE:** `Literal['hour', 'day']` **DEFAULT:** `'day'` | | `where` | Optional filter. **TYPE:** \`str | | `type` | Counting method. **TYPE:** `Literal['general', 'unique', 'average']` **DEFAULT:** `'general'` | | RETURNS | DESCRIPTION | | --------------------- | --------------------------------------- | | `NumericBucketResult` | NumericBucketResult with bucketed data. | | RAISES | DESCRIPTION | | ------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | Source code in `src/mixpanel_data/workspace.py` ``` def segmentation_numeric( self, event: str, *, from_date: str, to_date: str, on: str, unit: Literal["hour", "day"] = "day", where: str | None = None, type: Literal["general", "unique", "average"] = "general", ) -> NumericBucketResult: """Bucket events by numeric property ranges. Args: event: Event name. from_date: Start date. to_date: End date. on: Numeric property expression. unit: Time unit. where: Optional filter. type: Counting method. Returns: NumericBucketResult with bucketed data. Raises: ConfigError: If API credentials not available. 
""" return self._live_query_service.segmentation_numeric( event=event, from_date=from_date, to_date=to_date, on=on, unit=unit, where=where, type=type, ) ``` ### segmentation_sum ``` segmentation_sum( event: str, *, from_date: str, to_date: str, on: str, unit: Literal["hour", "day"] = "day", where: str | None = None, ) -> NumericSumResult ``` Calculate sum of numeric property over time. | PARAMETER | DESCRIPTION | | ----------- | ------------------------------------------------------------------ | | `event` | Event name. **TYPE:** `str` | | `from_date` | Start date. **TYPE:** `str` | | `to_date` | End date. **TYPE:** `str` | | `on` | Numeric property expression. **TYPE:** `str` | | `unit` | Time unit. **TYPE:** `Literal['hour', 'day']` **DEFAULT:** `'day'` | | `where` | Optional filter. **TYPE:** \`str | | RETURNS | DESCRIPTION | | ------------------ | -------------------------------------------- | | `NumericSumResult` | NumericSumResult with sum values per period. | | RAISES | DESCRIPTION | | ------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | Source code in `src/mixpanel_data/workspace.py` ``` def segmentation_sum( self, event: str, *, from_date: str, to_date: str, on: str, unit: Literal["hour", "day"] = "day", where: str | None = None, ) -> NumericSumResult: """Calculate sum of numeric property over time. Args: event: Event name. from_date: Start date. to_date: End date. on: Numeric property expression. unit: Time unit. where: Optional filter. Returns: NumericSumResult with sum values per period. Raises: ConfigError: If API credentials not available. """ return self._live_query_service.segmentation_sum( event=event, from_date=from_date, to_date=to_date, on=on, unit=unit, where=where, ) ``` ### segmentation_average ``` segmentation_average( event: str, *, from_date: str, to_date: str, on: str, unit: Literal["hour", "day"] = "day", where: str | None = None, ) -> NumericAverageResult ``` Calculate average of numeric property over time. | PARAMETER | DESCRIPTION | | ----------- | ------------------------------------------------------------------ | | `event` | Event name. **TYPE:** `str` | | `from_date` | Start date. **TYPE:** `str` | | `to_date` | End date. **TYPE:** `str` | | `on` | Numeric property expression. **TYPE:** `str` | | `unit` | Time unit. **TYPE:** `Literal['hour', 'day']` **DEFAULT:** `'day'` | | `where` | Optional filter. **TYPE:** \`str | | RETURNS | DESCRIPTION | | ---------------------- | ---------------------------------------------------- | | `NumericAverageResult` | NumericAverageResult with average values per period. | | RAISES | DESCRIPTION | | ------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | Source code in `src/mixpanel_data/workspace.py` ``` def segmentation_average( self, event: str, *, from_date: str, to_date: str, on: str, unit: Literal["hour", "day"] = "day", where: str | None = None, ) -> NumericAverageResult: """Calculate average of numeric property over time. Args: event: Event name. from_date: Start date. to_date: End date. on: Numeric property expression. unit: Time unit. where: Optional filter. Returns: NumericAverageResult with average values per period. Raises: ConfigError: If API credentials not available. 
""" return self._live_query_service.segmentation_average( event=event, from_date=from_date, to_date=to_date, on=on, unit=unit, where=where, ) ``` ### property_distribution ``` property_distribution( event: str, property: str, *, from_date: str, to_date: str, limit: int = 20 ) -> PropertyDistributionResult ``` Get distribution of values for a property. Uses JQL to count occurrences of each property value, returning counts and percentages sorted by frequency. | PARAMETER | DESCRIPTION | | ----------- | ---------------------------------------------------------------------------------- | | `event` | Event name to analyze. **TYPE:** `str` | | `property` | Property name to get distribution for. **TYPE:** `str` | | `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` | | `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` | | `limit` | Maximum number of values to return. Default: 20. **TYPE:** `int` **DEFAULT:** `20` | | RETURNS | DESCRIPTION | | ---------------------------- | ------------------------------------------------------------- | | `PropertyDistributionResult` | PropertyDistributionResult with value counts and percentages. | | RAISES | DESCRIPTION | | ------------- | --------------------------------- | | `ConfigError` | If API credentials not available. | | `QueryError` | Script execution error. | Example ``` result = ws.property_distribution( event="Purchase", property="country", from_date="2024-01-01", to_date="2024-01-31", ) for v in result.values: print(f"{v.value}: {v.count} ({v.percentage:.1f}%)") ``` Source code in `src/mixpanel_data/workspace.py` ```` def property_distribution( self, event: str, property: str, *, from_date: str, to_date: str, limit: int = 20, ) -> PropertyDistributionResult: """Get distribution of values for a property. Uses JQL to count occurrences of each property value, returning counts and percentages sorted by frequency. Args: event: Event name to analyze. property: Property name to get distribution for. from_date: Start date (YYYY-MM-DD). to_date: End date (YYYY-MM-DD). limit: Maximum number of values to return. Default: 20. Returns: PropertyDistributionResult with value counts and percentages. Raises: ConfigError: If API credentials not available. QueryError: Script execution error. Example: ```python result = ws.property_distribution( event="Purchase", property="country", from_date="2024-01-01", to_date="2024-01-31", ) for v in result.values: print(f"{v.value}: {v.count} ({v.percentage:.1f}%)") ``` """ return self._live_query_service.property_distribution( event=event, property=property, from_date=from_date, to_date=to_date, limit=limit, ) ```` ### numeric_summary ``` numeric_summary( event: str, property: str, *, from_date: str, to_date: str, percentiles: list[int] | None = None, ) -> NumericPropertySummaryResult ``` Get statistical summary for a numeric property. Uses JQL to compute count, min, max, avg, stddev, and percentiles for a numeric property. | PARAMETER | DESCRIPTION | | ------------- | -------------------------------------------------------------------------------- | | `event` | Event name to analyze. **TYPE:** `str` | | `property` | Numeric property name. **TYPE:** `str` | | `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` | | `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` | | `percentiles` | Percentiles to compute. Default: [25, 50, 75, 90, 95, 99]. 
### property_distribution

```
property_distribution(
    event: str, property: str, *, from_date: str, to_date: str, limit: int = 20
) -> PropertyDistributionResult
```

Get distribution of values for a property.

Uses JQL to count occurrences of each property value, returning counts and percentages sorted by frequency.

| PARAMETER | DESCRIPTION |
| ----------- | ----------- |
| `event` | Event name to analyze. **TYPE:** `str` |
| `property` | Property name to get distribution for. **TYPE:** `str` |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` |
| `limit` | Maximum number of values to return. Default: 20. **TYPE:** `int` **DEFAULT:** `20` |

| RETURNS | DESCRIPTION |
| ---------------------------- | ----------- |
| `PropertyDistributionResult` | PropertyDistributionResult with value counts and percentages. |

| RAISES | DESCRIPTION |
| ------------- | ----------- |
| `ConfigError` | If API credentials not available. |
| `QueryError` | Script execution error. |

Example

```
result = ws.property_distribution(
    event="Purchase",
    property="country",
    from_date="2024-01-01",
    to_date="2024-01-31",
)
for v in result.values:
    print(f"{v.value}: {v.count} ({v.percentage:.1f}%)")
```

Source code in `src/mixpanel_data/workspace.py`

````
def property_distribution(
    self,
    event: str,
    property: str,
    *,
    from_date: str,
    to_date: str,
    limit: int = 20,
) -> PropertyDistributionResult:
    """Get distribution of values for a property.

    Uses JQL to count occurrences of each property value, returning
    counts and percentages sorted by frequency.

    Args:
        event: Event name to analyze.
        property: Property name to get distribution for.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        limit: Maximum number of values to return. Default: 20.

    Returns:
        PropertyDistributionResult with value counts and percentages.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error.

    Example:
        ```python
        result = ws.property_distribution(
            event="Purchase",
            property="country",
            from_date="2024-01-01",
            to_date="2024-01-31",
        )
        for v in result.values:
            print(f"{v.value}: {v.count} ({v.percentage:.1f}%)")
        ```
    """
    return self._live_query_service.property_distribution(
        event=event,
        property=property,
        from_date=from_date,
        to_date=to_date,
        limit=limit,
    )
````

### numeric_summary

```
numeric_summary(
    event: str,
    property: str,
    *,
    from_date: str,
    to_date: str,
    percentiles: list[int] | None = None,
) -> NumericPropertySummaryResult
```

Get statistical summary for a numeric property.

Uses JQL to compute count, min, max, avg, stddev, and percentiles for a numeric property.

| PARAMETER | DESCRIPTION |
| ------------- | ----------- |
| `event` | Event name to analyze. **TYPE:** `str` |
| `property` | Numeric property name. **TYPE:** `str` |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` |
| `percentiles` | Percentiles to compute. Default: [25, 50, 75, 90, 95, 99]. **TYPE:** `list[int] \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| ------------------------------ | ----------- |
| `NumericPropertySummaryResult` | NumericPropertySummaryResult with statistics. |

| RAISES | DESCRIPTION |
| ------------- | ----------- |
| `ConfigError` | If API credentials not available. |
| `QueryError` | Script execution error or non-numeric property. |

Example

```
result = ws.numeric_summary(
    event="Purchase",
    property="amount",
    from_date="2024-01-01",
    to_date="2024-01-31",
)
print(f"Avg: {result.avg}, Median: {result.percentiles[50]}")
```

Source code in `src/mixpanel_data/workspace.py`

````
def numeric_summary(
    self,
    event: str,
    property: str,
    *,
    from_date: str,
    to_date: str,
    percentiles: list[int] | None = None,
) -> NumericPropertySummaryResult:
    """Get statistical summary for a numeric property.

    Uses JQL to compute count, min, max, avg, stddev, and percentiles
    for a numeric property.

    Args:
        event: Event name to analyze.
        property: Numeric property name.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        percentiles: Percentiles to compute. Default: [25, 50, 75, 90, 95, 99].

    Returns:
        NumericPropertySummaryResult with statistics.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error or non-numeric property.

    Example:
        ```python
        result = ws.numeric_summary(
            event="Purchase",
            property="amount",
            from_date="2024-01-01",
            to_date="2024-01-31",
        )
        print(f"Avg: {result.avg}, Median: {result.percentiles[50]}")
        ```
    """
    return self._live_query_service.numeric_summary(
        event=event,
        property=property,
        from_date=from_date,
        to_date=to_date,
        percentiles=percentiles,
    )
````
### daily_counts

```
daily_counts(
    *, from_date: str, to_date: str, events: list[str] | None = None
) -> DailyCountsResult
```

Get daily event counts.

Uses JQL to count events by day, optionally filtered to specific events.

| PARAMETER | DESCRIPTION |
| ----------- | ----------- |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` |
| `events` | Optional list of events to count. None = all events. **TYPE:** `list[str] \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| ------------------- | ----------- |
| `DailyCountsResult` | DailyCountsResult with date/event/count tuples. |

| RAISES | DESCRIPTION |
| ------------- | ----------- |
| `ConfigError` | If API credentials not available. |
| `QueryError` | Script execution error. |

Example

```
result = ws.daily_counts(
    from_date="2024-01-01",
    to_date="2024-01-07",
    events=["Purchase", "Signup"],
)
for c in result.counts:
    print(f"{c.date} {c.event}: {c.count}")
```

Source code in `src/mixpanel_data/workspace.py`

````
def daily_counts(
    self,
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
) -> DailyCountsResult:
    """Get daily event counts.

    Uses JQL to count events by day, optionally filtered to
    specific events.

    Args:
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        events: Optional list of events to count. None = all events.

    Returns:
        DailyCountsResult with date/event/count tuples.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error.

    Example:
        ```python
        result = ws.daily_counts(
            from_date="2024-01-01",
            to_date="2024-01-07",
            events=["Purchase", "Signup"],
        )
        for c in result.counts:
            print(f"{c.date} {c.event}: {c.count}")
        ```
    """
    return self._live_query_service.daily_counts(
        from_date=from_date,
        to_date=to_date,
        events=events,
    )
````
### engagement_distribution

```
engagement_distribution(
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    buckets: list[int] | None = None,
) -> EngagementDistributionResult
```

Get user engagement distribution.

Uses JQL to bucket users by their event count, showing how many users performed N events.

| PARAMETER | DESCRIPTION |
| ----------- | ----------- |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` |
| `events` | Optional list of events to count. None = all events. **TYPE:** `list[str] \| None` **DEFAULT:** `None` |
| `buckets` | Bucket boundaries. Default: [1, 2, 5, 10, 25, 50, 100]. **TYPE:** `list[int] \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| ------------------------------ | ----------- |
| `EngagementDistributionResult` | EngagementDistributionResult with user counts per bucket. |

| RAISES | DESCRIPTION |
| ------------- | ----------- |
| `ConfigError` | If API credentials not available. |
| `QueryError` | Script execution error. |

Example

```
result = ws.engagement_distribution(
    from_date="2024-01-01",
    to_date="2024-01-31",
)
for b in result.buckets:
    print(f"{b.bucket_label}: {b.user_count} ({b.percentage:.1f}%)")
```

Source code in `src/mixpanel_data/workspace.py`

````
def engagement_distribution(
    self,
    *,
    from_date: str,
    to_date: str,
    events: list[str] | None = None,
    buckets: list[int] | None = None,
) -> EngagementDistributionResult:
    """Get user engagement distribution.

    Uses JQL to bucket users by their event count, showing how many
    users performed N events.

    Args:
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).
        events: Optional list of events to count. None = all events.
        buckets: Bucket boundaries. Default: [1, 2, 5, 10, 25, 50, 100].

    Returns:
        EngagementDistributionResult with user counts per bucket.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error.

    Example:
        ```python
        result = ws.engagement_distribution(
            from_date="2024-01-01",
            to_date="2024-01-31",
        )
        for b in result.buckets:
            print(f"{b.bucket_label}: {b.user_count} ({b.percentage:.1f}%)")
        ```
    """
    return self._live_query_service.engagement_distribution(
        from_date=from_date,
        to_date=to_date,
        events=events,
        buckets=buckets,
    )
````

### property_coverage

```
property_coverage(
    event: str, properties: list[str], *, from_date: str, to_date: str
) -> PropertyCoverageResult
```

Get property coverage statistics.

Uses JQL to count how often each property is defined (non-null) vs undefined for the specified event.

| PARAMETER | DESCRIPTION |
| ------------ | ----------- |
| `event` | Event name to analyze. **TYPE:** `str` |
| `properties` | List of property names to check. **TYPE:** `list[str]` |
| `from_date` | Start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date (YYYY-MM-DD). **TYPE:** `str` |

| RETURNS | DESCRIPTION |
| ------------------------ | ----------- |
| `PropertyCoverageResult` | PropertyCoverageResult with coverage statistics per property. |

| RAISES | DESCRIPTION |
| ------------- | ----------- |
| `ConfigError` | If API credentials not available. |
| `QueryError` | Script execution error. |

Example

```
result = ws.property_coverage(
    event="Purchase",
    properties=["coupon_code", "referrer"],
    from_date="2024-01-01",
    to_date="2024-01-31",
)
for c in result.coverage:
    print(f"{c.property}: {c.coverage_percentage:.1f}% defined")
```

Source code in `src/mixpanel_data/workspace.py`

````
def property_coverage(
    self,
    event: str,
    properties: list[str],
    *,
    from_date: str,
    to_date: str,
) -> PropertyCoverageResult:
    """Get property coverage statistics.

    Uses JQL to count how often each property is defined (non-null)
    vs undefined for the specified event.

    Args:
        event: Event name to analyze.
        properties: List of property names to check.
        from_date: Start date (YYYY-MM-DD).
        to_date: End date (YYYY-MM-DD).

    Returns:
        PropertyCoverageResult with coverage statistics per property.

    Raises:
        ConfigError: If API credentials not available.
        QueryError: Script execution error.

    Example:
        ```python
        result = ws.property_coverage(
            event="Purchase",
            properties=["coupon_code", "referrer"],
            from_date="2024-01-01",
            to_date="2024-01-31",
        )
        for c in result.coverage:
            print(f"{c.property}: {c.coverage_percentage:.1f}% defined")
        ```
    """
    return self._live_query_service.property_coverage(
        event=event,
        properties=properties,
        from_date=from_date,
        to_date=to_date,
    )
````
### info

```
info() -> WorkspaceInfo
```

Get metadata about this workspace.

| RETURNS | DESCRIPTION |
| --------------- | ----------- |
| `WorkspaceInfo` | WorkspaceInfo with path, project_id, region, account, tables, size. |

Source code in `src/mixpanel_data/workspace.py`

```
def info(self) -> WorkspaceInfo:
    """Get metadata about this workspace.

    Returns:
        WorkspaceInfo with path, project_id, region, account, tables, size.
    """
    path = self.storage.path
    tables = [t.name for t in self.storage.list_tables()]

    # Calculate database size and creation time
    size_mb = 0.0
    created_at: datetime | None = None
    if path is not None and path.exists():
        try:
            stat = path.stat()
            size_mb = stat.st_size / 1_000_000
            created_at = datetime.fromtimestamp(stat.st_ctime)
        except (OSError, PermissionError):
            # File became inaccessible, use defaults
            pass

    return WorkspaceInfo(
        path=path,
        project_id=self._credentials.project_id if self._credentials else "unknown",
        region=self._credentials.region if self._credentials else "unknown",
        account=self._account_name,
        tables=tables,
        size_mb=size_mb,
        created_at=created_at,
    )
```

### tables

```
tables() -> list[TableInfo]
```

List tables in the local database.

| RETURNS | DESCRIPTION |
| ----------------- | ----------- |
| `list[TableInfo]` | List of TableInfo objects (name, type, row_count, fetched_at). |

Source code in `src/mixpanel_data/workspace.py`

```
def tables(self) -> list[TableInfo]:
    """List tables in the local database.

    Returns:
        List of TableInfo objects (name, type, row_count, fetched_at).
    """
    return self.storage.list_tables()
```

### table_schema

```
table_schema(table: str) -> TableSchema
```

Get schema for a table in the local database.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `table` | Table name. **TYPE:** `str` |

| RETURNS | DESCRIPTION |
| ------------- | ----------- |
| `TableSchema` | TableSchema with column definitions. |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `TableNotFoundError` | If table doesn't exist. |

Source code in `src/mixpanel_data/workspace.py`

```
def table_schema(self, table: str) -> TableSchema:
    """Get schema for a table in the local database.

    Args:
        table: Table name.

    Returns:
        TableSchema with column definitions.

    Raises:
        TableNotFoundError: If table doesn't exist.
    """
    return self.storage.get_schema(table)
```

### drop

```
drop(*names: str) -> None
```

Drop specified tables.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `*names` | Table names to drop. **TYPE:** `str` **DEFAULT:** `()` |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `TableNotFoundError` | If any table doesn't exist. |

Source code in `src/mixpanel_data/workspace.py`

```
def drop(self, *names: str) -> None:
    """Drop specified tables.

    Args:
        *names: Table names to drop.

    Raises:
        TableNotFoundError: If any table doesn't exist.
    """
    for name in names:
        self.storage.drop_table(name)
```

### drop_all

```
drop_all(type: TableType | None = None) -> None
```

Drop all tables from the workspace, optionally filtered by type.

Permanently removes all tables and their data. When used with the type parameter, only tables matching the specified type are dropped.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `type` | Optional table type filter. Valid values: "events", "profiles". If None, all tables are dropped regardless of type. **TYPE:** `TableType \| None` **DEFAULT:** `None` |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `TableNotFoundError` | If a table cannot be dropped (rare in practice). |

Example

Drop all event tables:

```
ws = Workspace()
ws.drop_all(type="events")  # Only drops event tables
ws.close()
```

Drop all tables:

```
ws = Workspace()
ws.drop_all()  # Drops everything
ws.close()
```

Source code in `src/mixpanel_data/workspace.py`

````
def drop_all(self, type: TableType | None = None) -> None:
    """Drop all tables from the workspace, optionally filtered by type.

    Permanently removes all tables and their data. When used with the
    type parameter, only tables matching the specified type are dropped.

    Args:
        type: Optional table type filter. Valid values: "events", "profiles".
            If None, all tables are dropped regardless of type.

    Raises:
        TableNotFoundError: If a table cannot be dropped (rare in practice).

    Example:
        Drop all event tables:

        ```python
        ws = Workspace()
        ws.drop_all(type="events")  # Only drops event tables
        ws.close()
        ```

        Drop all tables:

        ```python
        ws = Workspace()
        ws.drop_all()  # Drops everything
        ws.close()
        ```
    """
    tables = self.storage.list_tables()
    for table in tables:
        if type is None or table.type == type:
            self.storage.drop_table(table.name)
````
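Example (illustrative). A minimal sketch tying the introspection methods together; `jan_events` is a hypothetical table name, and the attributes used below are those listed in the returns tables above:

```
import mixpanel_data as mp

ws = mp.Workspace()

# Workspace-level metadata
info = ws.info()
print(info.project_id, info.size_mb)

# Enumerate local tables and their row counts
for t in ws.tables():
    print(t.name, t.type, t.row_count)

# Inspect one table's columns, then drop it when done
schema = ws.table_schema("jan_events")  # raises TableNotFoundError if absent
ws.drop("jan_events")
```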
### sample

```
sample(table: str, n: int = 10) -> pd.DataFrame
```

Return random sample rows from a table.

Uses DuckDB's reservoir sampling for representative results. Unlike LIMIT, sampling returns rows from throughout the table.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `table` | Table name to sample from. **TYPE:** `str` |
| `n` | Number of rows to return (default: 10). **TYPE:** `int` **DEFAULT:** `10` |

| RETURNS | DESCRIPTION |
| ----------- | ----------- |
| `DataFrame` | DataFrame with n random rows. If table has fewer than n rows, returns all available rows. |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `TableNotFoundError` | If table doesn't exist. |

Example

```
ws = Workspace()
ws.sample("events")  # 10 random rows
ws.sample("events", n=5)  # 5 random rows
```

Source code in `src/mixpanel_data/workspace.py`

````
def sample(self, table: str, n: int = 10) -> pd.DataFrame:
    """Return random sample rows from a table.

    Uses DuckDB's reservoir sampling for representative results.
    Unlike LIMIT, sampling returns rows from throughout the table.

    Args:
        table: Table name to sample from.
        n: Number of rows to return (default: 10).

    Returns:
        DataFrame with n random rows. If table has fewer than n rows,
        returns all available rows.

    Raises:
        TableNotFoundError: If table doesn't exist.

    Example:
        ```python
        ws = Workspace()
        ws.sample("events")  # 10 random rows
        ws.sample("events", n=5)  # 5 random rows
        ```
    """
    # Validate table exists
    self.storage.get_schema(table)

    # Use DuckDB's reservoir sampling
    sql = f'SELECT * FROM "{table}" USING SAMPLE {n}'
    return self.storage.execute_df(sql)
````
### summarize

```
summarize(table: str) -> SummaryResult
```

Get statistical summary of all columns in a table.

Uses DuckDB's SUMMARIZE command to compute min/max, quartiles, null percentage, and approximate distinct counts for each column.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `table` | Table name to summarize. **TYPE:** `str` |

| RETURNS | DESCRIPTION |
| --------------- | ----------- |
| `SummaryResult` | SummaryResult with per-column statistics and total row count. |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `TableNotFoundError` | If table doesn't exist. |

Example

```
result = ws.summarize("events")
result.row_count  # 1234567
result.columns[0].null_percentage  # 0.5
result.df  # Full summary as DataFrame
```

Source code in `src/mixpanel_data/workspace.py`

````
def summarize(self, table: str) -> SummaryResult:
    """Get statistical summary of all columns in a table.

    Uses DuckDB's SUMMARIZE command to compute min/max, quartiles,
    null percentage, and approximate distinct counts for each column.

    Args:
        table: Table name to summarize.

    Returns:
        SummaryResult with per-column statistics and total row count.

    Raises:
        TableNotFoundError: If table doesn't exist.

    Example:
        ```python
        result = ws.summarize("events")
        result.row_count  # 1234567
        result.columns[0].null_percentage  # 0.5
        result.df  # Full summary as DataFrame
        ```
    """
    # Validate table exists
    self.storage.get_schema(table)

    # Get row count
    row_count = self.storage.execute_scalar(f'SELECT COUNT(*) FROM "{table}"')

    # Get column statistics using SUMMARIZE
    summary_df = self.storage.execute_df(f'SUMMARIZE "{table}"')

    # Convert to ColumnSummary objects (to_dict is more efficient than iterrows)
    columns: list[ColumnSummary] = []
    for row in summary_df.to_dict("records"):
        columns.append(
            ColumnSummary(
                column_name=str(row["column_name"]),
                column_type=str(row["column_type"]),
                min=row["min"],
                max=row["max"],
                approx_unique=int(row["approx_unique"]),
                avg=self._try_float(row["avg"]),
                std=self._try_float(row["std"]),
                q25=row["q25"],
                q50=row["q50"],
                q75=row["q75"],
                count=int(row["count"]),
                null_percentage=float(row["null_percentage"]),
            )
        )

    return SummaryResult(
        table=table,
        row_count=int(row_count),
        columns=columns,
    )
````
### event_breakdown

```
event_breakdown(table: str) -> EventBreakdownResult
```

Analyze event distribution in a table.

Computes per-event counts, unique users, date ranges, and percentage of total for each event type.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `table` | Table name containing events. Must have columns: event_name, event_time, distinct_id. **TYPE:** `str` |

| RETURNS | DESCRIPTION |
| ---------------------- | ----------- |
| `EventBreakdownResult` | EventBreakdownResult with per-event statistics. |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `TableNotFoundError` | If table doesn't exist. |
| `QueryError` | If table lacks required columns (event_name, event_time, distinct_id). Error message lists the specific missing columns. |

Example

```
breakdown = ws.event_breakdown("events")
breakdown.total_events  # 1234567
breakdown.events[0].event_name  # "Page View"
breakdown.events[0].pct_of_total  # 45.2
```

Source code in `src/mixpanel_data/workspace.py`

````
def event_breakdown(self, table: str) -> EventBreakdownResult:
    """Analyze event distribution in a table.

    Computes per-event counts, unique users, date ranges, and
    percentage of total for each event type.

    Args:
        table: Table name containing events.
            Must have columns: event_name, event_time, distinct_id.

    Returns:
        EventBreakdownResult with per-event statistics.

    Raises:
        TableNotFoundError: If table doesn't exist.
        QueryError: If table lacks required columns (event_name, event_time,
            distinct_id). Error message lists the specific missing columns.

    Example:
        ```python
        breakdown = ws.event_breakdown("events")
        breakdown.total_events  # 1234567
        breakdown.events[0].event_name  # "Page View"
        breakdown.events[0].pct_of_total  # 45.2
        ```
    """
    # Validate table exists and get schema
    schema = self.storage.get_schema(table)
    column_names = {col.name for col in schema.columns}

    # Check for required columns
    required_columns = {"event_name", "event_time", "distinct_id"}
    missing = required_columns - column_names
    if missing:
        raise QueryError(
            f"event_breakdown() requires columns {required_columns}, "
            f"but '{table}' is missing: {missing}",
            status_code=0,
        )

    # Get aggregate statistics
    agg_sql = f"""
        SELECT
            COUNT(*) as total_events,
            COUNT(DISTINCT distinct_id) as total_users,
            MIN(event_time) as min_time,
            MAX(event_time) as max_time
        FROM "{table}"
    """
    agg_result = self.storage.execute_rows(agg_sql)
    total_events, total_users, min_time, max_time = agg_result.rows[0]

    # Handle empty table
    if total_events == 0:
        return EventBreakdownResult(
            table=table,
            total_events=0,
            total_users=0,
            date_range=(datetime.min, datetime.min),
            events=[],
        )

    # Get per-event statistics
    breakdown_sql = f"""
        SELECT
            event_name,
            COUNT(*) as count,
            COUNT(DISTINCT distinct_id) as unique_users,
            MIN(event_time) as first_seen,
            MAX(event_time) as last_seen,
            ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) as pct_of_total
        FROM "{table}"
        GROUP BY event_name
        ORDER BY count DESC
    """
    breakdown_rows = self.storage.execute_rows(breakdown_sql)

    events: list[EventStats] = []
    for row in breakdown_rows:
        event_name, count, unique_users, first_seen, last_seen, pct = row
        events.append(
            EventStats(
                event_name=str(event_name),
                count=int(count),
                unique_users=int(unique_users),
                first_seen=first_seen
                if isinstance(first_seen, datetime)
                else datetime.fromisoformat(str(first_seen)),
                last_seen=last_seen
                if isinstance(last_seen, datetime)
                else datetime.fromisoformat(str(last_seen)),
                pct_of_total=float(pct),
            )
        )

    return EventBreakdownResult(
        table=table,
        total_events=int(total_events),
        total_users=int(total_users),
        date_range=(
            min_time
            if isinstance(min_time, datetime)
            else datetime.fromisoformat(str(min_time)),
            max_time
            if isinstance(max_time, datetime)
            else datetime.fromisoformat(str(max_time)),
        ),
        events=events,
    )
````
### property_keys

```
property_keys(table: str, event: str | None = None) -> list[str]
```

List all JSON property keys in a table.

Extracts distinct keys from the 'properties' JSON column. Useful for discovering queryable fields in event properties.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `table` | Table name with a 'properties' JSON column. **TYPE:** `str` |
| `event` | Optional event name to filter by. If provided, only returns keys present in events of that type. **TYPE:** `str \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| ----------- | ----------- |
| `list[str]` | Alphabetically sorted list of property key names. Empty list if no keys found. |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `TableNotFoundError` | If table doesn't exist. |
| `QueryError` | If table lacks 'properties' column. |

Example

All keys across all events:

```
ws.property_keys("events")
# ['$browser', '$city', 'page', 'referrer', 'user_plan']
```

Keys for specific event type:

```
ws.property_keys("events", event="Purchase")
# ['amount', 'currency', 'product_id', 'quantity']
```

Source code in `src/mixpanel_data/workspace.py`

````
def property_keys(
    self,
    table: str,
    event: str | None = None,
) -> list[str]:
    """List all JSON property keys in a table.

    Extracts distinct keys from the 'properties' JSON column.
    Useful for discovering queryable fields in event properties.

    Args:
        table: Table name with a 'properties' JSON column.
        event: Optional event name to filter by. If provided, only
            returns keys present in events of that type.

    Returns:
        Alphabetically sorted list of property key names.
        Empty list if no keys found.

    Raises:
        TableNotFoundError: If table doesn't exist.
        QueryError: If table lacks 'properties' column.

    Example:
        All keys across all events:

        ```python
        ws.property_keys("events")
        # ['$browser', '$city', 'page', 'referrer', 'user_plan']
        ```

        Keys for specific event type:

        ```python
        ws.property_keys("events", event="Purchase")
        # ['amount', 'currency', 'product_id', 'quantity']
        ```
    """
    # Validate table exists and get schema
    schema = self.storage.get_schema(table)
    column_names = {col.name for col in schema.columns}

    # Check for required column
    if "properties" not in column_names:
        raise QueryError(
            f"property_keys() requires a 'properties' column, "
            f"but '{table}' does not have one",
            status_code=0,
        )

    # Build query with optional event filter
    if event is not None:
        # Check if event_name column exists
        if "event_name" not in column_names:
            raise QueryError(
                f"Cannot filter by event: '{table}' lacks 'event_name' column",
                status_code=0,
            )
        sql = f"""
            SELECT DISTINCT unnest(json_keys(properties)) as key
            FROM "{table}"
            WHERE event_name = ?
            ORDER BY key
        """
        result = self.storage.execute_rows_params(sql, [event])
        rows = result.rows
    else:
        sql = f"""
            SELECT DISTINCT unnest(json_keys(properties)) as key
            FROM "{table}"
            ORDER BY key
        """
        result = self.storage.execute_rows(sql)
        rows = result.rows

    return [str(row[0]) for row in rows]
````
### column_stats

```
column_stats(table: str, column: str, *, top_n: int = 10) -> ColumnStatsResult
```

Get detailed statistics for a single column.

Performs deep analysis including null rates, cardinality, top values, and numeric statistics (for numeric columns).

The column parameter supports JSON path expressions for analyzing properties stored in JSON columns:

- `properties->>'$.country'` for string extraction
- `CAST(properties->>'$.amount' AS DOUBLE)` for numeric

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `table` | Table name to analyze. **TYPE:** `str` |
| `column` | Column name or expression to analyze. **TYPE:** `str` |
| `top_n` | Number of top values to return (default: 10). **TYPE:** `int` **DEFAULT:** `10` |

| RETURNS | DESCRIPTION |
| ------------------- | ----------- |
| `ColumnStatsResult` | ColumnStatsResult with comprehensive column statistics. |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `TableNotFoundError` | If table doesn't exist. |
| `QueryError` | If column expression is invalid. |

Example

Analyze standard column:

```
stats = ws.column_stats("events", "event_name")
stats.unique_count  # 47
stats.top_values[:3]  # [('Page View', 45230), ...]
```

Analyze JSON property:

```
stats = ws.column_stats("events", "properties->>'$.country'")
```

Security

The column parameter is interpolated directly into SQL queries to allow expression syntax. Only use with trusted input from developers or AI coding agents. Do not pass untrusted user input.

Source code in `src/mixpanel_data/workspace.py`

````
def column_stats(
    self,
    table: str,
    column: str,
    *,
    top_n: int = 10,
) -> ColumnStatsResult:
    """Get detailed statistics for a single column.

    Performs deep analysis including null rates, cardinality, top values,
    and numeric statistics (for numeric columns).

    The column parameter supports JSON path expressions for analyzing
    properties stored in JSON columns:

    - `properties->>'$.country'` for string extraction
    - `CAST(properties->>'$.amount' AS DOUBLE)` for numeric

    Args:
        table: Table name to analyze.
        column: Column name or expression to analyze.
        top_n: Number of top values to return (default: 10).

    Returns:
        ColumnStatsResult with comprehensive column statistics.

    Raises:
        TableNotFoundError: If table doesn't exist.
        QueryError: If column expression is invalid.

    Example:
        Analyze standard column:

        ```python
        stats = ws.column_stats("events", "event_name")
        stats.unique_count  # 47
        stats.top_values[:3]  # [('Page View', 45230), ...]
        ```

        Analyze JSON property:

        ```python
        stats = ws.column_stats("events", "properties->>'$.country'")
        ```

    Security:
        The column parameter is interpolated directly into SQL queries
        to allow expression syntax. Only use with trusted input from
        developers or AI coding agents. Do not pass untrusted user input.
    """
    # Validate table exists
    self.storage.get_schema(table)

    # Get total row count
    total_rows = self.storage.execute_scalar(f'SELECT COUNT(*) FROM "{table}"')

    # Get basic stats: count, null_count, approx unique
    stats_sql = f"""
        SELECT
            COUNT({column}) as count,
            COUNT(*) - COUNT({column}) as null_count,
            APPROX_COUNT_DISTINCT({column}) as unique_count
        FROM "{table}"
    """
    try:
        stats_result = self.storage.execute_rows(stats_sql)
    except Exception as e:
        raise QueryError(
            f"Invalid column expression: {column}. Error: {e}",
            status_code=0,
        ) from e
Error: {e}", status_code=0, ) from e count, null_count, unique_count = stats_result.rows[0] # Calculate percentages null_pct = (null_count / total_rows * 100) if total_rows > 0 else 0.0 unique_pct = (unique_count / count * 100) if count > 0 else 0.0 # Get top values top_sql = f""" SELECT {column} as value, COUNT(*) as cnt FROM "{table}" WHERE {column} IS NOT NULL GROUP BY {column} ORDER BY cnt DESC LIMIT {top_n} """ top_result = self.storage.execute_rows(top_sql) top_values: list[tuple[Any, int]] = [ (row[0], int(row[1])) for row in top_result.rows ] # Detect column type to determine if numeric stats apply type_sql = ( f'SELECT typeof({column}) FROM "{table}" WHERE {column} IS NOT NULL LIMIT 1' ) try: type_result = self.storage.execute_rows(type_sql) dtype = str(type_result.rows[0][0]) if type_result.rows else "UNKNOWN" except Exception: dtype = "UNKNOWN" # Get numeric stats if applicable min_val: float | None = None max_val: float | None = None mean_val: float | None = None std_val: float | None = None numeric_types = { "INTEGER", "BIGINT", "DOUBLE", "FLOAT", "DECIMAL", "HUGEINT", "SMALLINT", "TINYINT", "UBIGINT", "UINTEGER", "USMALLINT", "UTINYINT", } if dtype.upper() in numeric_types: numeric_sql = f""" SELECT MIN({column}) as min_val, MAX({column}) as max_val, AVG({column}) as mean_val, STDDEV({column}) as std_val FROM "{table}" """ try: numeric_result = self.storage.execute_rows(numeric_sql) if numeric_result.rows: row = numeric_result.rows[0] min_val = float(row[0]) if row[0] is not None else None max_val = float(row[1]) if row[1] is not None else None mean_val = float(row[2]) if row[2] is not None else None std_val = float(row[3]) if row[3] is not None else None except Exception: # Not numeric, skip pass return ColumnStatsResult( table=table, column=column, dtype=dtype, count=int(count), null_count=int(null_count), null_pct=round(null_pct, 2), unique_count=int(unique_count), unique_pct=round(unique_pct, 2), top_values=top_values, min=min_val, max=max_val, mean=mean_val, std=std_val, ) ```` Copy markdown # Auth Module The auth module provides credential management and configuration. Explore on DeepWiki πŸ€– **[Configuration Reference β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.3-configuration-reference)** Ask questions about credential management, ConfigManager, or account configuration. ## Overview ``` from mixpanel_data.auth import ConfigManager, Credentials, AccountInfo # Manage accounts config = ConfigManager() config.add_account("production", username="...", secret="...", project_id="...", region="us") accounts = config.list_accounts() # Resolve credentials creds = config.resolve_credentials(account="production") ``` ## ConfigManager Manages accounts stored in the TOML config file (`~/.mp/config.toml`). ## mixpanel_data.auth.ConfigManager ``` ConfigManager(config_path: Path | None = None) ``` Manages Mixpanel project credentials and configuration. Handles: - Adding, removing, and listing project accounts - Setting the default account - Resolving credentials from environment variables or config file Config file location (in priority order): 1. Explicit config_path parameter 1. MP_CONFIG_PATH environment variable 1. Default: ~/.mp/config.toml Initialize ConfigManager. | PARAMETER | DESCRIPTION | | ------------- | -------------------------------------------------------------------------- | | `config_path` | Override config file location. 
### config_path

```
config_path: Path
```

Return the config file path.

### resolve_credentials

```
resolve_credentials(account: str | None = None) -> Credentials
```

Resolve credentials using priority order.

Resolution order:

1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION)
2. Named account from config file (if account parameter provided)
3. Default account from config file

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `account` | Optional account name to use instead of default. **TYPE:** `str \| None` **DEFAULT:** `None` |

| RETURNS | DESCRIPTION |
| ------------- | ----------- |
| `Credentials` | Immutable Credentials object. |

| RAISES | DESCRIPTION |
| ---------------------- | ----------- |
| `ConfigError` | If no credentials can be resolved. |
| `AccountNotFoundError` | If named account doesn't exist. |

Source code in `src/mixpanel_data/_internal/config.py`

```
def resolve_credentials(self, account: str | None = None) -> Credentials:
    """Resolve credentials using priority order.

    Resolution order:
    1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION)
    2. Named account from config file (if account parameter provided)
    3. Default account from config file

    Args:
        account: Optional account name to use instead of default.

    Returns:
        Immutable Credentials object.

    Raises:
        ConfigError: If no credentials can be resolved.
        AccountNotFoundError: If named account doesn't exist.
    """
    # Priority 1: Environment variables
    env_creds = self._resolve_from_env()
    if env_creds is not None:
        return env_creds

    # Priority 2 & 3: Config file (named account or default)
    config = self._read_config()
    accounts = config.get("accounts", {})

    if not accounts:
        raise ConfigError(
            "No credentials configured. "
            "Set MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION environment variables, "
            "or add an account with add_account()."
        )

    # Determine which account to use
    account_name: str
    if account is not None:
        account_name = account
    else:
        default_account = config.get("default")
        if default_account is not None and isinstance(default_account, str):
            account_name = default_account
        else:
            # Use the first account if no default set
            account_name = next(iter(accounts.keys()))

    if account_name not in accounts:
        raise AccountNotFoundError(
            account_name,
            available_accounts=list(accounts.keys()),
        )

    account_data = accounts[account_name]
    return Credentials(
        username=account_data["username"],
        secret=SecretStr(account_data["secret"]),
        project_id=account_data["project_id"],
        region=account_data["region"],
    )
```

### list_accounts

```
list_accounts() -> list[AccountInfo]
```

List all configured accounts.

| RETURNS | DESCRIPTION |
| ------------------- | ----------- |
| `list[AccountInfo]` | List of AccountInfo objects (secrets not included). |

Source code in `src/mixpanel_data/_internal/config.py`

```
def list_accounts(self) -> list[AccountInfo]:
    """List all configured accounts.

    Returns:
        List of AccountInfo objects (secrets not included).
    """
    config = self._read_config()
    accounts = config.get("accounts", {})
    default_name = config.get("default")

    result: list[AccountInfo] = []
    for name, data in accounts.items():
        result.append(
            AccountInfo(
                name=name,
                username=data.get("username", ""),
                project_id=data.get("project_id", ""),
                region=data.get("region", ""),
                is_default=(name == default_name),
            )
        )
    return result
```
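Example (illustrative). A minimal sketch of resolving and enumerating accounts; `production` is a hypothetical account name, and the `AccountInfo` fields used are those documented below:

```
from mixpanel_data.auth import ConfigManager

config = ConfigManager()

# Env vars win if set; otherwise the named (or default) account is used.
# Raises AccountNotFoundError if "production" isn't configured.
creds = config.resolve_credentials(account="production")
print(creds)  # secret is redacted: Credentials(username=..., secret=***, ...)

for acct in config.list_accounts():
    marker = "*" if acct.is_default else " "
    print(f"{marker} {acct.name} (project {acct.project_id}, {acct.region})")
```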
### add_account

```
add_account(
    name: str, username: str, secret: str, project_id: str, region: str
) -> None
```

Add a new account configuration.

| PARAMETER | DESCRIPTION |
| ------------ | ----------- |
| `name` | Display name for the account. **TYPE:** `str` |
| `username` | Service account username. **TYPE:** `str` |
| `secret` | Service account secret. **TYPE:** `str` |
| `project_id` | Mixpanel project ID. **TYPE:** `str` |
| `region` | Data residency region (us, eu, in). **TYPE:** `str` |

| RAISES | DESCRIPTION |
| -------------------- | ----------- |
| `AccountExistsError` | If account name already exists. |
| `ValueError` | If region is invalid. |

Source code in `src/mixpanel_data/_internal/config.py`

```
def add_account(
    self,
    name: str,
    username: str,
    secret: str,
    project_id: str,
    region: str,
) -> None:
    """Add a new account configuration.

    Args:
        name: Display name for the account.
        username: Service account username.
        secret: Service account secret.
        project_id: Mixpanel project ID.
        region: Data residency region (us, eu, in).

    Raises:
        AccountExistsError: If account name already exists.
        ValueError: If region is invalid.
    """
    # Validate region
    region_lower = region.lower()
    if region_lower not in VALID_REGIONS:
        valid = ", ".join(VALID_REGIONS)
        raise ValueError(f"Region must be one of: {valid}. Got: {region}")

    config = self._read_config()
    accounts = config.setdefault("accounts", {})

    if name in accounts:
        raise AccountExistsError(name)

    accounts[name] = {
        "username": username,
        "secret": secret,
        "project_id": project_id,
        "region": region_lower,
    }

    # If this is the first account, make it the default
    if "default" not in config:
        config["default"] = name

    self._write_config(config)
```

### remove_account

```
remove_account(name: str) -> None
```

Remove an account configuration.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `name` | Account name to remove. **TYPE:** `str` |

| RAISES | DESCRIPTION |
| ---------------------- | ----------- |
| `AccountNotFoundError` | If account doesn't exist. |

Source code in `src/mixpanel_data/_internal/config.py`

```
def remove_account(self, name: str) -> None:
    """Remove an account configuration.

    Args:
        name: Account name to remove.

    Raises:
        AccountNotFoundError: If account doesn't exist.
    """
    config = self._read_config()
    accounts = config.get("accounts", {})

    if name not in accounts:
        raise AccountNotFoundError(name, available_accounts=list(accounts.keys()))

    del accounts[name]

    # If we removed the default, clear it or set to another account
    if config.get("default") == name:
        if accounts:
            config["default"] = next(iter(accounts.keys()))
        else:
            config.pop("default", None)

    self._write_config(config)
```

### set_default

```
set_default(name: str) -> None
```

Set the default account.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `name` | Account name to set as default. **TYPE:** `str` |

| RAISES | DESCRIPTION |
| ---------------------- | ----------- |
| `AccountNotFoundError` | If account doesn't exist. |

Source code in `src/mixpanel_data/_internal/config.py`

```
def set_default(self, name: str) -> None:
    """Set the default account.

    Args:
        name: Account name to set as default.

    Raises:
        AccountNotFoundError: If account doesn't exist.
    """
    config = self._read_config()
    accounts = config.get("accounts", {})

    if name not in accounts:
        raise AccountNotFoundError(name, available_accounts=list(accounts.keys()))

    config["default"] = name
    self._write_config(config)
```
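Example (illustrative). A minimal account-lifecycle sketch using only the methods above; all names and values are hypothetical placeholders:

```
from mixpanel_data.auth import ConfigManager

config = ConfigManager()

# The first account added becomes the default automatically
config.add_account(
    "staging",
    username="svc.staging",
    secret="...",
    project_id="12345",
    region="us",
)
config.set_default("staging")

# Raises AccountNotFoundError if "old-account" isn't configured
config.remove_account("old-account")
```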
### get_account

```
get_account(name: str) -> AccountInfo
```

Get information about a specific account.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `name` | Account name. **TYPE:** `str` |

| RETURNS | DESCRIPTION |
| ------------- | ----------- |
| `AccountInfo` | AccountInfo object (secret not included). |

| RAISES | DESCRIPTION |
| ---------------------- | ----------- |
| `AccountNotFoundError` | If account doesn't exist. |

Source code in `src/mixpanel_data/_internal/config.py`

```
def get_account(self, name: str) -> AccountInfo:
    """Get information about a specific account.

    Args:
        name: Account name.

    Returns:
        AccountInfo object (secret not included).

    Raises:
        AccountNotFoundError: If account doesn't exist.
    """
    config = self._read_config()
    accounts = config.get("accounts", {})

    if name not in accounts:
        raise AccountNotFoundError(name, available_accounts=list(accounts.keys()))

    data = accounts[name]
    default_name = config.get("default")
    return AccountInfo(
        name=name,
        username=data.get("username", ""),
        project_id=data.get("project_id", ""),
        region=data.get("region", ""),
        is_default=(name == default_name),
    )
```

## Credentials

Immutable container for authentication credentials.

## mixpanel_data.auth.Credentials

Bases: `BaseModel`

Immutable credentials for Mixpanel API authentication.

This is a frozen Pydantic model that ensures:

- All fields are validated on construction
- The secret is never exposed in repr/str output
- The object cannot be modified after creation

### username

```
username: str
```

Service account username.

### secret

```
secret: SecretStr
```

Service account secret (redacted in output).

### project_id

```
project_id: str
```

Mixpanel project identifier.

### region

```
region: RegionType
```

Data residency region (us, eu, or in).

### validate_region

```
validate_region(v: str) -> str
```

Validate and normalize region to lowercase.

Source code in `src/mixpanel_data/_internal/config.py`

```
@field_validator("region", mode="before")
@classmethod
def validate_region(cls, v: str) -> str:
    """Validate and normalize region to lowercase."""
    if not isinstance(v, str):
        raise ValueError(f"Region must be a string. Got: {type(v).__name__}")
    v_lower = v.lower()
    if v_lower not in VALID_REGIONS:
        valid = ", ".join(VALID_REGIONS)
        raise ValueError(f"Region must be one of: {valid}. Got: {v}")
    return v_lower
```

### validate_non_empty

```
validate_non_empty(v: str) -> str
```

Validate string fields are non-empty.

Source code in `src/mixpanel_data/_internal/config.py`

```
@field_validator("username", "project_id")
@classmethod
def validate_non_empty(cls, v: str) -> str:
    """Validate string fields are non-empty."""
    if not v or not v.strip():
        raise ValueError("Field cannot be empty")
    return v
```

### __repr__

```
__repr__() -> str
```

Return string representation with redacted secret.

Source code in `src/mixpanel_data/_internal/config.py`

```
def __repr__(self) -> str:
    """Return string representation with redacted secret."""
    return (
        f"Credentials(username={self.username!r}, secret=***, "
        f"project_id={self.project_id!r}, region={self.region!r})"
    )
```

### __str__

```
__str__() -> str
```

Return string representation with redacted secret.

Source code in `src/mixpanel_data/_internal/config.py`

```
def __str__(self) -> str:
    """Return string representation with redacted secret."""
    return self.__repr__()
```
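Example (illustrative). A minimal sketch of secret redaction, assuming pydantic's standard `SecretStr` coercion from plain strings; all values are hypothetical:

```
from mixpanel_data.auth import Credentials

creds = Credentials(
    username="svc.user",
    secret="super-secret",
    project_id="12345",
    region="US",  # normalized to lowercase by validate_region
)
print(creds)  # secret prints as ***
raw = creds.secret.get_secret_value()  # explicit opt-in to read the raw secret
```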
## AccountInfo

Account metadata (without the secret).

## mixpanel_data.auth.AccountInfo

```
AccountInfo(
    name: str, username: str, project_id: str, region: str, is_default: bool
)
```

Information about a configured account (without secret).

Used for listing accounts without exposing sensitive credentials.

### name

```
name: str
```

Account display name.

### username

```
username: str
```

Service account username.

### project_id

```
project_id: str
```

Mixpanel project identifier.

### region

```
region: str
```

Data residency region.

### is_default

```
is_default: bool
```

Whether this is the default account.

# Exceptions

All library exceptions inherit from `MixpanelDataError`, enabling callers to catch all library errors with a single except clause.

Explore on DeepWiki πŸ€– **[Error Handling Guide β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.4-error-codes-and-exceptions)** Ask questions about specific exceptions, error recovery patterns, or debugging strategies.

## Exception Hierarchy

```
MixpanelDataError
β”œβ”€β”€ ConfigError
β”‚   β”œβ”€β”€ AccountNotFoundError
β”‚   └── AccountExistsError
β”œβ”€β”€ APIError
β”‚   β”œβ”€β”€ AuthenticationError
β”‚   β”œβ”€β”€ RateLimitError
β”‚   β”œβ”€β”€ QueryError
β”‚   β”œβ”€β”€ ServerError
β”‚   └── JQLSyntaxError
β”œβ”€β”€ TableExistsError
β”œβ”€β”€ TableNotFoundError
β”œβ”€β”€ DatabaseLockedError
└── DatabaseNotFoundError
```

## Catching Errors

```
import mixpanel_data as mp

try:
    ws = mp.Workspace()
    result = ws.segmentation(event="Purchase", from_date="2025-01-01", to_date="2025-01-31")
except mp.AuthenticationError as e:
    print(f"Auth failed: {e.message}")
except mp.RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except mp.MixpanelDataError as e:
    print(f"Error [{e.code}]: {e.message}")
```

## Base Exception

## mixpanel_data.MixpanelDataError

```
MixpanelDataError(
    message: str,
    code: str = "UNKNOWN_ERROR",
    details: dict[str, Any] | None = None,
)
```

Bases: `Exception`

Base exception for all mixpanel_data errors.

All library exceptions inherit from this class, allowing callers to:

- Catch all library errors: except MixpanelDataError
- Handle specific errors: except AccountNotFoundError
- Serialize errors: error.to_dict()

Initialize exception.

| PARAMETER | DESCRIPTION |
| --------- | ----------- |
| `message` | Human-readable error message. **TYPE:** `str` |
| `code` | Machine-readable error code for programmatic handling. **TYPE:** `str` **DEFAULT:** `'UNKNOWN_ERROR'` |
| `details` | Additional structured data about the error. **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |
""" super().__init__(message) self._message = message self._code = code self._details = details or {} ``` ### code ``` code: str ``` Machine-readable error code. ### message ``` message: str ``` Human-readable error message. ### details ``` details: dict[str, Any] ``` Additional structured error data. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize exception for logging/JSON output. | RETURNS | DESCRIPTION | | ---------------- | --------------------------------------------- | | `dict[str, Any]` | Dictionary with keys: code, message, details. | | `dict[str, Any]` | All values are JSON-serializable. | Source code in `src/mixpanel_data/exceptions.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize exception for logging/JSON output. Returns: Dictionary with keys: code, message, details. All values are JSON-serializable. """ return { "code": self._code, "message": self._message, "details": self._details, } ``` ### __str__ ``` __str__() -> str ``` Return human-readable error message. Source code in `src/mixpanel_data/exceptions.py` ``` def __str__(self) -> str: """Return human-readable error message.""" return self._message ``` ### __repr__ ``` __repr__() -> str ``` Return detailed string representation. Source code in `src/mixpanel_data/exceptions.py` ``` def __repr__(self) -> str: """Return detailed string representation.""" return ( f"{self.__class__.__name__}(message={self._message!r}, code={self._code!r})" ) ``` ## API Exceptions ## mixpanel_data.APIError ``` APIError( message: str, *, status_code: int, response_body: str | dict[str, Any] | None = None, request_method: str | None = None, request_url: str | None = None, request_params: dict[str, Any] | None = None, request_body: dict[str, Any] | None = None, code: str = "API_ERROR", ) ``` Bases: `MixpanelDataError` Base class for Mixpanel API HTTP errors. Provides structured access to HTTP request/response context for debugging and automated recovery by AI agents. All API-related exceptions inherit from this class, enabling agents to: - Understand what went wrong (status code, error message) - See exactly what was sent (request method, URL, params, body) - See exactly what came back (response body, headers) - Modify their approach and retry autonomously Example ``` try: result = client.segmentation(event="signup", ...) except APIError as e: print(f"Status: {e.status_code}") print(f"Response: {e.response_body}") print(f"Request URL: {e.request_url}") print(f"Request params: {e.request_params}") ``` Initialize APIError. | PARAMETER | DESCRIPTION | | ---------------- | ----------------------------------------------------------------------- | | `message` | Human-readable error message. **TYPE:** `str` | | `status_code` | HTTP status code from response. **TYPE:** `int` | | `response_body` | Raw response body (string or parsed dict). **TYPE:** \`str | | `request_method` | HTTP method used (GET, POST). **TYPE:** \`str | | `request_url` | Full request URL. **TYPE:** \`str | | `request_params` | Query parameters sent. **TYPE:** \`dict[str, Any] | | `request_body` | Request body sent (for POST requests). **TYPE:** \`dict[str, Any] | | `code` | Machine-readable error code. 
## mixpanel_data.AuthenticationError

```
AuthenticationError(
    message: str = "Authentication failed",
    *,
    status_code: int = 401,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
)
```

Bases: `APIError`

Authentication with Mixpanel API failed (HTTP 401).

Raised when credentials are invalid, expired, or lack required permissions. Inherits from APIError to provide full request/response context.

Example

```
try:
    client.segmentation(...)
except AuthenticationError as e:
    print(f"Auth failed: {e.message}")
    print(f"Request URL: {e.request_url}")
    # Check if project_id is correct, credentials are valid, etc.
```

Initialize AuthenticationError.

| PARAMETER | DESCRIPTION |
| ---------------- | ----------- |
| `message` | Human-readable error message. **TYPE:** `str` **DEFAULT:** `'Authentication failed'` |
| `status_code` | HTTP status code (default 401). **TYPE:** `int` **DEFAULT:** `401` |
| `response_body` | Raw response body. **TYPE:** `str \| dict[str, Any] \| None` **DEFAULT:** `None` |
| `request_method` | HTTP method used. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `request_url` | Full request URL. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `request_params` | Query parameters sent. **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |
Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str = "Authentication failed",
    *,
    status_code: int = 401,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
) -> None:
    """Initialize AuthenticationError.

    Args:
        message: Human-readable error message.
        status_code: HTTP status code (default 401).
        response_body: Raw response body.
        request_method: HTTP method used.
        request_url: Full request URL.
        request_params: Query parameters sent.
    """
    super().__init__(
        message,
        status_code=status_code,
        response_body=response_body,
        request_method=request_method,
        request_url=request_url,
        request_params=request_params,
        code="AUTH_FAILED",
    )
```

## mixpanel_data.RateLimitError

```
RateLimitError(
    message: str = "Rate limit exceeded",
    *,
    retry_after: int | None = None,
    status_code: int = 429,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
)
```

Bases: `APIError`

Mixpanel API rate limit exceeded (HTTP 429).

Raised when the API returns a 429 status. The retry_after property indicates when the request can be retried. Inherits from APIError to provide full request context for debugging.

Example

```
try:
    for _ in range(1000):
        client.segmentation(...)
except RateLimitError as e:
    print(f"Rate limited! Retry after {e.retry_after}s")
    print(f"Request: {e.request_method} {e.request_url}")
    time.sleep(e.retry_after or 60)
```

Initialize RateLimitError.

| PARAMETER | DESCRIPTION |
| ---------------- | ----------- |
| `message` | Human-readable error message. **TYPE:** `str` **DEFAULT:** `'Rate limit exceeded'` |
| `retry_after` | Seconds until retry is allowed (from Retry-After header). **TYPE:** `int \| None` **DEFAULT:** `None` |
| `status_code` | HTTP status code (default 429). **TYPE:** `int` **DEFAULT:** `429` |
| `response_body` | Raw response body. **TYPE:** `str \| dict[str, Any] \| None` **DEFAULT:** `None` |
| `request_method` | HTTP method used. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `request_url` | Full request URL. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `request_params` | Query parameters sent. **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str = "Rate limit exceeded",
    *,
    retry_after: int | None = None,
    status_code: int = 429,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
) -> None:
    """Initialize RateLimitError.

    Args:
        message: Human-readable error message.
        retry_after: Seconds until retry is allowed (from Retry-After header).
        status_code: HTTP status code (default 429).
        response_body: Raw response body.
        request_method: HTTP method used.
        request_url: Full request URL.
        request_params: Query parameters sent.
    """
    self._retry_after = retry_after
    if retry_after is not None:
        message = f"{message}. Retry after {retry_after} seconds."

    super().__init__(
        message,
        status_code=status_code,
        response_body=response_body,
        request_method=request_method,
        request_url=request_url,
        request_params=request_params,
        code="RATE_LIMITED",
    )

    # Add retry_after to details
    if retry_after is not None:
        self._details["retry_after"] = retry_after
```

### retry_after

```
retry_after: int | None
```

Seconds until retry is allowed, or None if unknown.
## mixpanel_data.QueryError

```
QueryError(
    message: str = "Query execution failed",
    *,
    status_code: int = 400,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
)
```

Bases: `APIError`

Query execution failed (HTTP 400 or query-specific error).

Raised when an API query fails due to invalid parameters, syntax errors, or other query-specific issues. Inherits from APIError to provide full request/response context for debugging.

Example

```
try:
    client.segmentation(event="nonexistent", ...)
except QueryError as e:
    print(f"Query failed: {e.message}")
    print(f"Response: {e.response_body}")
    print(f"Request params: {e.request_params}")
```

Initialize QueryError.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `message` | Human-readable error message. **TYPE:** `str` **DEFAULT:** `'Query execution failed'` |
| `status_code` | HTTP status code (default 400). **TYPE:** `int` **DEFAULT:** `400` |
| `response_body` | Raw response body with error details. **TYPE:** `str \| dict[str, Any] \| None` **DEFAULT:** `None` |
| `request_method` | HTTP method used. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `request_url` | Full request URL. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `request_params` | Query parameters sent. **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |
| `request_body` | Request body sent (for POST). **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str = "Query execution failed",
    *,
    status_code: int = 400,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
) -> None:
    """Initialize QueryError.

    Args:
        message: Human-readable error message.
        status_code: HTTP status code (default 400).
        response_body: Raw response body with error details.
        request_method: HTTP method used.
        request_url: Full request URL.
        request_params: Query parameters sent.
        request_body: Request body sent (for POST).
    """
    super().__init__(
        message,
        status_code=status_code,
        response_body=response_body,
        request_method=request_method,
        request_url=request_url,
        request_params=request_params,
        request_body=request_body,
        code="QUERY_FAILED",
    )
```

## mixpanel_data.ServerError

```
ServerError(
    message: str = "Server error",
    *,
    status_code: int = 500,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
)
```

Bases: `APIError`

Mixpanel server error (HTTP 5xx).

Raised when the Mixpanel API returns a server error. These are typically transient issues that may succeed on retry. The response_body property contains the full error details from Mixpanel, which often include actionable information (e.g., "unit and interval both specified").

Example

```
try:
    client.retention(born_event="signup", ...)
except ServerError as e:
    print(f"Server error {e.status_code}: {e.message}")
    print(f"Response: {e.response_body}")
    print(f"Request params: {e.request_params}")
    # AI agent can analyze response_body to fix the request
```

Initialize ServerError.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `message` | Human-readable error message. **TYPE:** `str` **DEFAULT:** `'Server error'` |
| `status_code` | HTTP status code (5xx). **TYPE:** `int` **DEFAULT:** `500` |
| `response_body` | Raw response body with error details. **TYPE:** `str \| dict[str, Any] \| None` **DEFAULT:** `None` |
| `request_method` | HTTP method used. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `request_url` | Full request URL. **TYPE:** `str \| None` **DEFAULT:** `None` |
| `request_params` | Query parameters sent. **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |
| `request_body` | Request body sent (for POST). **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str = "Server error",
    *,
    status_code: int = 500,
    response_body: str | dict[str, Any] | None = None,
    request_method: str | None = None,
    request_url: str | None = None,
    request_params: dict[str, Any] | None = None,
    request_body: dict[str, Any] | None = None,
) -> None:
    """Initialize ServerError.

    Args:
        message: Human-readable error message.
        status_code: HTTP status code (5xx).
        response_body: Raw response body with error details.
        request_method: HTTP method used.
        request_url: Full request URL.
        request_params: Query parameters sent.
        request_body: Request body sent (for POST).
    """
    super().__init__(
        message,
        status_code=status_code,
        response_body=response_body,
        request_method=request_method,
        request_url=request_url,
        request_params=request_params,
        request_body=request_body,
        code="SERVER_ERROR",
    )
```
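Since 5xx responses are often transient, a bounded retry with backoff is a reasonable pattern. A sketch under those assumptions; the attempt count and delays are arbitrary, and `ws.retention(...)` stands in for any query method:

```
import time

import mixpanel_data as mp
from mixpanel_data import ServerError

ws = mp.Workspace()

for attempt in range(3):
    try:
        result = ws.retention(
            born_event="Signup",
            return_event="Purchase",
            from_date="2025-01-01",
            to_date="2025-01-31",
        )
        break
    except ServerError as e:
        if attempt == 2:
            raise
        # response_body often names the actual problem (e.g. conflicting params).
        print(f"HTTP {e.status_code}, retrying: {e.response_body}")
        time.sleep(2 ** attempt)
```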
""" # Parse structured error info from raw error string self._error_type = self._extract_error_type(raw_error) self._error_message = self._extract_message(raw_error) self._line_info = self._extract_line_info(raw_error) self._stack_trace = self._extract_stack_trace(raw_error) self._script = script self._raw_error = raw_error self._request_path = request_path # Build human-readable message message = f"JQL {self._error_type}: {self._error_message}" if self._line_info: message += f"\n{self._line_info}" # Build response body dict for APIError response_body: dict[str, Any] = { "error": raw_error, } if request_path: response_body["request"] = request_path super().__init__( message, status_code=412, response_body=response_body, request_body={"script": script} if script else None, ) self._code = "JQL_SYNTAX_ERROR" # Add JQL-specific details self._details["error_type"] = self._error_type self._details["error_message"] = self._error_message self._details["line_info"] = self._line_info self._details["stack_trace"] = self._stack_trace self._details["script"] = script self._details["request_path"] = request_path self._details["raw_error"] = raw_error ``` ### error_type ``` error_type: str ``` JavaScript error type (TypeError, SyntaxError, ReferenceError, etc.). ### error_message ``` error_message: str ``` Error message describing what went wrong. ### line_info ``` line_info: str | None ``` Code snippet with caret showing error location, if available. ### stack_trace ``` stack_trace: str | None ``` JavaScript stack trace, if available. ### script ``` script: str | None ``` The JQL script that caused the error. ### raw_error ``` raw_error: str ``` Complete raw error string from Mixpanel. ## Configuration Exceptions ## mixpanel_data.ConfigError ``` ConfigError(message: str, details: dict[str, Any] | None = None) ``` Bases: `MixpanelDataError` Base for configuration-related errors. Raised when there's a problem with configuration files, environment variables, or credential resolution. Initialize ConfigError. | PARAMETER | DESCRIPTION | | --------- | ------------------------------------------------------ | | `message` | Human-readable error message. **TYPE:** `str` | | `details` | Additional structured data. **TYPE:** \`dict[str, Any] | Source code in `src/mixpanel_data/exceptions.py` ``` def __init__( self, message: str, details: dict[str, Any] | None = None, ) -> None: """Initialize ConfigError. Args: message: Human-readable error message. details: Additional structured data. """ super().__init__(message, code="CONFIG_ERROR", details=details) ``` ## mixpanel_data.AccountNotFoundError ``` AccountNotFoundError( account_name: str, available_accounts: list[str] | None = None ) ``` Bases: `ConfigError` Named account does not exist in configuration. Raised when attempting to access an account that hasn't been configured. The available_accounts property lists valid account names to help users. Initialize AccountNotFoundError. | PARAMETER | DESCRIPTION | | -------------------- | ------------------------------------------------------------------ | | `account_name` | The requested account name that wasn't found. **TYPE:** `str` | | `available_accounts` | List of valid account names for suggestions. **TYPE:** \`list[str] | Source code in `src/mixpanel_data/exceptions.py` ``` def __init__( self, account_name: str, available_accounts: list[str] | None = None, ) -> None: """Initialize AccountNotFoundError. Args: account_name: The requested account name that wasn't found. 
## Configuration Exceptions

## mixpanel_data.ConfigError

```
ConfigError(message: str, details: dict[str, Any] | None = None)
```

Bases: `MixpanelDataError`

Base for configuration-related errors.

Raised when there's a problem with configuration files, environment variables, or credential resolution.

Initialize ConfigError.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `message` | Human-readable error message. **TYPE:** `str` |
| `details` | Additional structured data. **TYPE:** `dict[str, Any] \| None` **DEFAULT:** `None` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    message: str,
    details: dict[str, Any] | None = None,
) -> None:
    """Initialize ConfigError.

    Args:
        message: Human-readable error message.
        details: Additional structured data.
    """
    super().__init__(message, code="CONFIG_ERROR", details=details)
```

## mixpanel_data.AccountNotFoundError

```
AccountNotFoundError(
    account_name: str, available_accounts: list[str] | None = None
)
```

Bases: `ConfigError`

Named account does not exist in configuration.

Raised when attempting to access an account that hasn't been configured. The available_accounts property lists valid account names to help users.

Initialize AccountNotFoundError.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `account_name` | The requested account name that wasn't found. **TYPE:** `str` |
| `available_accounts` | List of valid account names for suggestions. **TYPE:** `list[str] \| None` **DEFAULT:** `None` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(
    self,
    account_name: str,
    available_accounts: list[str] | None = None,
) -> None:
    """Initialize AccountNotFoundError.

    Args:
        account_name: The requested account name that wasn't found.
        available_accounts: List of valid account names for suggestions.
    """
    available = available_accounts or []
    if available:
        available_str = ", ".join(f"'{a}'" for a in available)
        message = (
            f"Account '{account_name}' not found. "
            f"Available accounts: {available_str}"
        )
    else:
        message = f"Account '{account_name}' not found. No accounts configured."

    details = {
        "account_name": account_name,
        "available_accounts": available,
    }
    super().__init__(message, details=details)
    self._code = "ACCOUNT_NOT_FOUND"
```

### account_name

```
account_name: str
```

The requested account name that wasn't found.

### available_accounts

```
available_accounts: list[str]
```

List of valid account names.

## mixpanel_data.AccountExistsError

```
AccountExistsError(account_name: str)
```

Bases: `ConfigError`

Account name already exists in configuration.

Raised when attempting to add an account with a name that's already in use.

Initialize AccountExistsError.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `account_name` | The conflicting account name. **TYPE:** `str` |

Source code in `src/mixpanel_data/exceptions.py`

```
def __init__(self, account_name: str) -> None:
    """Initialize AccountExistsError.

    Args:
        account_name: The conflicting account name.
    """
    message = f"Account '{account_name}' already exists."
    details = {"account_name": account_name}
    super().__init__(message, details=details)
    self._code = "ACCOUNT_EXISTS"
```

### account_name

```
account_name: str
```

The conflicting account name.
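Because `available_accounts` ships with the error, the failure is self-describing. A sketch of surfacing it to the user; note that the `account=` selector on `Workspace` is an assumption for illustration (the account-selection mechanism is documented elsewhere):

```
import mixpanel_data as mp
from mixpanel_data import AccountNotFoundError

try:
    ws = mp.Workspace(account="staging")  # hypothetical account selector
except AccountNotFoundError as e:
    print(f"No account named '{e.account_name}'.")
    print(f"Configured accounts: {', '.join(e.available_accounts) or '(none)'}")
```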
""" message = f"Table '{table_name}' not found." details = {"table_name": table_name} super().__init__(message, code="TABLE_NOT_FOUND", details=details) ``` ### table_name ``` table_name: str ``` Name of the missing table. ## mixpanel_data.DatabaseLockedError ``` DatabaseLockedError(db_path: str, holding_pid: int | None = None) ``` Bases: `MixpanelDataError` Database is locked by another process. Raised when attempting to access a DuckDB database that is locked by another process. DuckDB uses single-writer, multiple-reader concurrency - only one process can have write access at a time. Example ``` try: ws = Workspace() except DatabaseLockedError as e: print(f"Database {e.db_path} is locked") if e.holding_pid: print(f"Held by PID {e.holding_pid}") ``` Initialize DatabaseLockedError. | PARAMETER | DESCRIPTION | | ------------- | ---------------------------------------------------------- | | `db_path` | Path to the locked database file. **TYPE:** `str` | | `holding_pid` | Process ID holding the lock, if available. **TYPE:** \`int | Source code in `src/mixpanel_data/exceptions.py` ``` def __init__( self, db_path: str, holding_pid: int | None = None, ) -> None: """Initialize DatabaseLockedError. Args: db_path: Path to the locked database file. holding_pid: Process ID holding the lock, if available. """ message = f"Database '{db_path}' is locked by another process" if holding_pid is not None: message += f" (PID {holding_pid})" message += ". Wait for the other operation to complete and try again." details: dict[str, str | int] = { "db_path": db_path, "suggestion": "Wait for the other operation to complete and try again.", } if holding_pid is not None: details["holding_pid"] = holding_pid super().__init__(message, code="DATABASE_LOCKED", details=details) ``` ### db_path ``` db_path: str ``` Path to the locked database. ### holding_pid ``` holding_pid: int | None ``` Process ID holding the lock, if available. ## mixpanel_data.DatabaseNotFoundError ``` DatabaseNotFoundError(db_path: str) ``` Bases: `MixpanelDataError` Database file does not exist. Raised when attempting to open a non-existent database file in read-only mode. DuckDB cannot create a new database file when opened read-only. This typically happens when running read-only commands (like `mp query` or `mp inspect tables`) before any data has been fetched. Example ``` try: ws = Workspace(read_only=True) except DatabaseNotFoundError as e: print(f"No data yet: {e.db_path}") print("Run 'mp fetch events' first to create the database.") ``` Initialize DatabaseNotFoundError. | PARAMETER | DESCRIPTION | | --------- | ------------------------------------------------------------- | | `db_path` | Path to the database file that doesn't exist. **TYPE:** `str` | Source code in `src/mixpanel_data/exceptions.py` ``` def __init__(self, db_path: str) -> None: """Initialize DatabaseNotFoundError. Args: db_path: Path to the database file that doesn't exist. """ message = ( f"Database '{db_path}' does not exist. " "Run 'mp fetch events' first to create it." ) details: dict[str, str] = { "db_path": db_path, "suggestion": "Run 'mp fetch events' or 'mp fetch profiles' to create the database.", } super().__init__(message, code="DATABASE_NOT_FOUND", details=details) ``` ### db_path ``` db_path: str ``` Path to the database file that doesn't exist. 
# Result Types

Explore on DeepWiki

πŸ€– **[Result Types Reference β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.5-result-type-reference)**

Ask questions about result structures, DataFrame conversion, or type usage patterns.

All result types are immutable frozen dataclasses with:

- Lazy DataFrame conversion via the `.df` property
- JSON serialization via the `.to_dict()` method
- Full type hints for IDE/mypy support

## Fetch Results

## mixpanel_data.FetchResult

```
FetchResult(
    table: str,
    rows: int,
    type: Literal["events", "profiles"],
    duration_seconds: float,
    date_range: tuple[str, str] | None,
    fetched_at: datetime,
    _data: list[dict[str, Any]] = list(),
    _df_cache: DataFrame | None = None,
)
```

Result of a data fetch operation.

Represents the outcome of fetching events or profiles from Mixpanel and storing them in the local database.

### table

```
table: str
```

Name of the created table.

### rows

```
rows: int
```

Number of rows fetched.

### type

```
type: Literal['events', 'profiles']
```

Type of data fetched.

### duration_seconds

```
duration_seconds: float
```

Time taken to complete the fetch.

### date_range

```
date_range: tuple[str, str] | None
```

Date range for events (None for profiles).

### fetched_at

```
fetched_at: datetime
```

Timestamp when fetch completed.

### df

```
df: DataFrame
```

Convert result data to pandas DataFrame.

Conversion is lazy - computed on first access and cached.

| RETURNS | DESCRIPTION |
| --- | --- |
| `DataFrame` | DataFrame with fetched data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize result for JSON output.

| RETURNS | DESCRIPTION |
| --- | --- |
| `dict[str, Any]` | Dictionary representation (excludes raw data). datetime values are converted to ISO format strings. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize result for JSON output.

    Returns:
        Dictionary representation (excludes raw data).
        datetime values are converted to ISO format strings.
    """
    return {
        "table": self.table,
        "rows": self.rows,
        "type": self.type,
        "duration_seconds": self.duration_seconds,
        "date_range": self.date_range,
        "fetched_at": self.fetched_at.isoformat(),
    }
```
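In practice, a fetch call returns this object immediately, and the lazy `.df` conversion means you only pay for a DataFrame if you ask for one. A small sketch; the table name and dates are illustrative:

```
import mixpanel_data as mp

ws = mp.Workspace()

result = ws.fetch_events("jan_events", from_date="2025-01-01", to_date="2025-01-31")
print(f"{result.rows} rows into '{result.table}' in {result.duration_seconds:.1f}s")

summary = result.to_dict()  # JSON-safe: datetimes become ISO strings
df = result.df              # computed on first access, then cached
```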
## Parallel Fetch Types

Types for parallel event fetching with progress tracking and failure handling.

## mixpanel_data.ParallelFetchResult

```
ParallelFetchResult(
    table: str,
    total_rows: int,
    successful_batches: int,
    failed_batches: int,
    failed_date_ranges: tuple[tuple[str, str], ...],
    duration_seconds: float,
    fetched_at: datetime,
)
```

Result of a parallel fetch operation.

Aggregates results from all batches, providing summary statistics and information about any failures for retry.

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `table` | Name of the created/appended table. **TYPE:** `str` |
| `total_rows` | Total number of rows fetched across all batches. **TYPE:** `int` |
| `successful_batches` | Number of batches that completed successfully. **TYPE:** `int` |
| `failed_batches` | Number of batches that failed. **TYPE:** `int` |
| `failed_date_ranges` | Date ranges (from_date, to_date) of failed batches. **TYPE:** `tuple[tuple[str, str], ...]` |
| `duration_seconds` | Total time taken for the parallel fetch. **TYPE:** `float` |
| `fetched_at` | Timestamp when fetch completed. **TYPE:** `datetime` |

Example

```
result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
)
if result.has_failures:
    print(f"Warning: {result.failed_batches} batches failed")
    for from_date, to_date in result.failed_date_ranges:
        print(f"  {from_date} to {to_date}")
```

### table

```
table: str
```

Name of the created/appended table.

### total_rows

```
total_rows: int
```

Total number of rows fetched across all batches.

### successful_batches

```
successful_batches: int
```

Number of batches that completed successfully.

### failed_batches

```
failed_batches: int
```

Number of batches that failed.

### failed_date_ranges

```
failed_date_ranges: tuple[tuple[str, str], ...]
```

Date ranges (from_date, to_date) of failed batches for retry.

### duration_seconds

```
duration_seconds: float
```

Total time taken for the parallel fetch.

### fetched_at

```
fetched_at: datetime
```

Timestamp when fetch completed.

### has_failures

```
has_failures: bool
```

Check if any batches failed.

| RETURNS | DESCRIPTION |
| --- | --- |
| `bool` | True if at least one batch failed, False otherwise. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| --- | --- |
| `dict[str, Any]` | Dictionary with all result fields including has_failures. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all result fields including has_failures.
    """
    return {
        "table": self.table,
        "total_rows": self.total_rows,
        "successful_batches": self.successful_batches,
        "failed_batches": self.failed_batches,
        "failed_date_ranges": [list(dr) for dr in self.failed_date_ranges],
        "duration_seconds": self.duration_seconds,
        "fetched_at": self.fetched_at.isoformat(),
        "has_failures": self.has_failures,
    }
```
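`failed_date_ranges` is shaped for direct retry. A sketch that re-fetches only the failed batches into the same table; it assumes the `append=True` flag referenced in the storage-exceptions table above:

```
import mixpanel_data as mp

ws = mp.Workspace()

result = ws.fetch_events(
    "q1_events", from_date="2025-01-01", to_date="2025-03-31", parallel=True
)

# Re-fetch only what failed, appending to the existing table.
for from_date, to_date in result.failed_date_ranges:
    ws.fetch_events("q1_events", from_date=from_date, to_date=to_date, append=True)
```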
## mixpanel_data.BatchProgress

```
BatchProgress(
    from_date: str,
    to_date: str,
    batch_index: int,
    total_batches: int,
    rows: int,
    success: bool,
    error: str | None = None,
)
```

Progress update for a parallel fetch batch.

Sent to the on_batch_complete callback when a batch finishes (successfully or with error).

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `from_date` | Start date of this batch (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date of this batch (YYYY-MM-DD). **TYPE:** `str` |
| `batch_index` | Zero-based index of this batch. **TYPE:** `int` |
| `total_batches` | Total number of batches in the parallel fetch. **TYPE:** `int` |
| `rows` | Number of rows fetched in this batch (0 if failed). **TYPE:** `int` |
| `success` | Whether this batch completed successfully. **TYPE:** `bool` |
| `error` | Error message if failed, None if successful. **TYPE:** `str \| None` **DEFAULT:** `None` |

Example

```
def on_batch(progress: BatchProgress) -> None:
    status = "βœ“" if progress.success else "βœ—"
    print(f"[{status}] Batch {progress.batch_index + 1}/{progress.total_batches}")

result = ws.fetch_events(
    name="events",
    from_date="2024-01-01",
    to_date="2024-03-31",
    parallel=True,
    on_batch_complete=on_batch,
)
```

### from_date

```
from_date: str
```

Start date of this batch (YYYY-MM-DD).

### to_date

```
to_date: str
```

End date of this batch (YYYY-MM-DD).

### batch_index

```
batch_index: int
```

Zero-based index of this batch.

### total_batches

```
total_batches: int
```

Total number of batches in the parallel fetch.

### rows

```
rows: int
```

Number of rows fetched in this batch (0 if failed).

### success

```
success: bool
```

Whether this batch completed successfully.

### error

```
error: str | None = None
```

Error message if failed, None if successful.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| --- | --- |
| `dict[str, Any]` | Dictionary with all batch progress fields. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all batch progress fields.
    """
    return {
        "from_date": self.from_date,
        "to_date": self.to_date,
        "batch_index": self.batch_index,
        "total_batches": self.total_batches,
        "rows": self.rows,
        "success": self.success,
        "error": self.error,
    }
```

## mixpanel_data.BatchResult

```
BatchResult(
    from_date: str,
    to_date: str,
    rows: int,
    success: bool,
    error: str | None = None,
)
```

Result of fetching a single date range chunk.

Internal type used by ParallelFetcherService to track batch outcomes. Contains either the fetched data (on success) or error info (on failure).

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `from_date` | Start date of this batch (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | End date of this batch (YYYY-MM-DD). **TYPE:** `str` |
| `rows` | Number of rows fetched (0 if failed). **TYPE:** `int` |
| `success` | Whether the batch completed successfully. **TYPE:** `bool` |
| `error` | Exception message if failed, None if successful. **TYPE:** `str \| None` **DEFAULT:** `None` |

Note

Data is not included in to_dict() as it's consumed by the writer thread and is not JSON-serializable (iterator of dicts).

### from_date

```
from_date: str
```

Start date of this batch (YYYY-MM-DD).

### to_date

```
to_date: str
```

End date of this batch (YYYY-MM-DD).

### rows

```
rows: int
```

Number of rows fetched (0 if failed).

### success

```
success: bool
```

Whether the batch completed successfully.

### error

```
error: str | None = None
```

Exception message if failed, None if successful.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output (excludes data).

| RETURNS | DESCRIPTION |
| --- | --- |
| `dict[str, Any]` | Dictionary with batch result fields (excluding data). |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output (excludes data).

    Returns:
        Dictionary with batch result fields (excluding data).
    """
    return {
        "from_date": self.from_date,
        "to_date": self.to_date,
        "rows": self.rows,
        "success": self.success,
        "error": self.error,
    }
```
## Parallel Profile Fetch Types

Types for parallel profile fetching with page-based progress tracking.

## mixpanel_data.ParallelProfileResult

```
ParallelProfileResult(
    table: str,
    total_rows: int,
    successful_pages: int,
    failed_pages: int,
    failed_page_indices: tuple[int, ...],
    duration_seconds: float,
    fetched_at: datetime,
)
```

Result of a parallel profile fetch operation.

Aggregates results from all pages, providing summary statistics and information about any failures for retry.

| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| `table` | Name of the created/appended table. **TYPE:** `str` |
| `total_rows` | Total number of rows fetched across all pages. **TYPE:** `int` |
| `successful_pages` | Number of pages that completed successfully. **TYPE:** `int` |
| `failed_pages` | Number of pages that failed. **TYPE:** `int` |
| `failed_page_indices` | Page indices of failed pages for retry. **TYPE:** `tuple[int, ...]` |
| `duration_seconds` | Total time taken for the parallel fetch. **TYPE:** `float` |
| `fetched_at` | Timestamp when fetch completed. **TYPE:** `datetime` |

Example

```
result = ws.fetch_profiles(
    name="users",
    parallel=True,
)
if result.has_failures:
    print(f"Warning: {result.failed_pages} pages failed")
    for idx in result.failed_page_indices:
        print(f"  Page {idx}")
```

### table

```
table: str
```

Name of the created/appended table.

### total_rows

```
total_rows: int
```

Total number of rows fetched across all pages.

### successful_pages

```
successful_pages: int
```

Number of pages that completed successfully.

### failed_pages

```
failed_pages: int
```

Number of pages that failed.

### failed_page_indices

```
failed_page_indices: tuple[int, ...]
```

Page indices of failed pages for retry.

### duration_seconds

```
duration_seconds: float
```

Total time taken for the parallel fetch.

### fetched_at

```
fetched_at: datetime
```

Timestamp when fetch completed.

### has_failures

```
has_failures: bool
```

Check if any pages failed.

| RETURNS | DESCRIPTION |
| --- | --- |
| `bool` | True if at least one page failed, False otherwise. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| --- | --- |
| `dict[str, Any]` | Dictionary with all result fields including has_failures. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all result fields including has_failures.
    """
    return {
        "table": self.table,
        "total_rows": self.total_rows,
        "successful_pages": self.successful_pages,
        "failed_pages": self.failed_pages,
        "failed_page_indices": list(self.failed_page_indices),
        "duration_seconds": self.duration_seconds,
        "fetched_at": self.fetched_at.isoformat(),
        "has_failures": self.has_failures,
    }
```
print(f"[{status}] Page {pct}: {progress.cumulative_rows} total rows") result = ws.fetch_profiles( name="users", parallel=True, on_page_complete=on_page, ) ``` ### page_index ``` page_index: int ``` Zero-based index of this page. ### total_pages ``` total_pages: int | None ``` Total pages if known, None if not yet determined. ### rows ``` rows: int ``` Number of rows fetched in this page (0 if failed). ### success ``` success: bool ``` Whether this page completed successfully. ### error ``` error: str | None ``` Error message if failed, None if successful. ### cumulative_rows ``` cumulative_rows: int ``` Total rows fetched so far across all pages. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | -------------------------------------------- | | `dict[str, Any]` | Dictionary with all profile progress fields. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with all profile progress fields. """ return { "page_index": self.page_index, "total_pages": self.total_pages, "rows": self.rows, "success": self.success, "error": self.error, "cumulative_rows": self.cumulative_rows, } ``` ## mixpanel_data.ProfilePageResult ``` ProfilePageResult( profiles: list[dict[str, Any]], session_id: str | None, page: int, has_more: bool, total: int, page_size: int, ) ``` Result from fetching a single page of profiles. Contains the profiles from one page of the Engage API along with pagination metadata for fetching subsequent pages. | ATTRIBUTE | DESCRIPTION | | ------------ | ----------------------------------------------------------------------------- | | `profiles` | List of profile dictionaries from this page. **TYPE:** `list[dict[str, Any]]` | | `session_id` | Session ID for fetching next page, None if no more pages. **TYPE:** \`str | | `page` | Zero-based page index that was fetched. **TYPE:** `int` | | `has_more` | True if there are more pages to fetch. **TYPE:** `bool` | | `total` | Total number of profiles matching the query across all pages. **TYPE:** `int` | | `page_size` | Number of profiles per page (typically 1000). **TYPE:** `int` | Example ``` # Fetch first page to get pagination metadata result = api_client.export_profiles_page(page=0) all_profiles = list(result.profiles) # Pre-compute total pages for parallel fetching total_pages = result.num_pages print(f"Fetching {total_pages} pages ({result.total} profiles)") # Continue fetching if more pages while result.has_more: result = api_client.export_profiles_page( page=result.page + 1, session_id=result.session_id, ) all_profiles.extend(result.profiles) ``` ### profiles ``` profiles: list[dict[str, Any]] ``` List of profile dictionaries from this page. ### session_id ``` session_id: str | None ``` Session ID for fetching next page, None if no more pages. ### page ``` page: int ``` Zero-based page index that was fetched. ### has_more ``` has_more: bool ``` True if there are more pages to fetch. ### total ``` total: int ``` Total number of profiles matching the query across all pages. ### page_size ``` page_size: int ``` Number of profiles per page (typically 1000). ### num_pages ``` num_pages: int ``` Calculate total number of pages needed. Uses ceiling division to ensure partial pages are counted. | RETURNS | DESCRIPTION | | ------- | ------------------------------------------- | | `int` | Total pages needed to fetch all profiles. | | `int` | Returns 0 if total is 0 (empty result set). 
| Example ``` result = api_client.export_profiles_page(page=0) # If total=5432 and page_size=1000, num_pages=6 for page_idx in range(1, result.num_pages): # Fetch remaining pages... ``` ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | --------------------------------------------------------------------- | | `dict[str, Any]` | Dictionary with all page result fields including pagination metadata. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with all page result fields including pagination metadata. """ return { "profiles": self.profiles, "session_id": self.session_id, "page": self.page, "has_more": self.has_more, "profile_count": len(self.profiles), "total": self.total, "page_size": self.page_size, "num_pages": self.num_pages, } ``` ## Query Results ## mixpanel_data.SegmentationResult ``` SegmentationResult( event: str, from_date: str, to_date: str, unit: Literal["day", "week", "month"], segment_property: str | None, total: int, series: dict[str, dict[str, int]] = dict(), *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Result of a segmentation query. Contains time-series data for an event, optionally segmented by a property. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### event ``` event: str ``` Queried event name. ### from_date ``` from_date: str ``` Query start date (YYYY-MM-DD). ### to_date ``` to_date: str ``` Query end date (YYYY-MM-DD). ### unit ``` unit: Literal['day', 'week', 'month'] ``` Time unit for aggregation. ### segment_property ``` segment_property: str | None ``` Property used for segmentation (None if total only). ### total ``` total: int ``` Total count across all segments and time periods. ### series ``` series: dict[str, dict[str, int]] = field(default_factory=dict) ``` Time series data by segment. Structure: {segment_name: {date_string: count}} Example: {"US": {"2024-01-01": 150, "2024-01-02": 200}, "EU": {...}} For unsegmented queries, segment_name is "total". ### df ``` df: DataFrame ``` Convert to DataFrame with columns: date, segment, count. For unsegmented queries, segment column is 'total'. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "event": self.event, "from_date": self.from_date, "to_date": self.to_date, "unit": self.unit, "segment_property": self.segment_property, "total": self.total, "series": self.series, } ``` ## mixpanel_data.FunnelResult ``` FunnelResult( funnel_id: int, funnel_name: str, from_date: str, to_date: str, conversion_rate: float, steps: list[FunnelStep] = list(), *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Result of a funnel query. Contains step-by-step conversion data for a funnel. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### funnel_id ``` funnel_id: int ``` Funnel identifier. ### funnel_name ``` funnel_name: str ``` Funnel display name. ### from_date ``` from_date: str ``` Query start date. ### to_date ``` to_date: str ``` Query end date. ### conversion_rate ``` conversion_rate: float ``` Overall conversion rate (0.0 to 1.0). 
### steps ``` steps: list[FunnelStep] = field(default_factory=list) ``` Step-by-step breakdown. ### df ``` df: DataFrame ``` Convert to DataFrame with columns: step, event, count, conversion_rate. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "funnel_id": self.funnel_id, "funnel_name": self.funnel_name, "from_date": self.from_date, "to_date": self.to_date, "conversion_rate": self.conversion_rate, "steps": [step.to_dict() for step in self.steps], } ``` ## mixpanel_data.FunnelStep ``` FunnelStep(event: str, count: int, conversion_rate: float) ``` Single step in a funnel. ### event ``` event: str ``` Event name for this step. ### count ``` count: int ``` Number of users at this step. ### conversion_rate ``` conversion_rate: float ``` Conversion rate from previous step (0.0 to 1.0). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "event": self.event, "count": self.count, "conversion_rate": self.conversion_rate, } ``` ## mixpanel_data.RetentionResult ``` RetentionResult( born_event: str, return_event: str, from_date: str, to_date: str, unit: Literal["day", "week", "month"], cohorts: list[CohortInfo] = list(), *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Result of a retention query. Contains cohort-based retention data. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### born_event ``` born_event: str ``` Event that defines cohort membership. ### return_event ``` return_event: str ``` Event that defines return. ### from_date ``` from_date: str ``` Query start date. ### to_date ``` to_date: str ``` Query end date. ### unit ``` unit: Literal['day', 'week', 'month'] ``` Time unit for retention periods. ### cohorts ``` cohorts: list[CohortInfo] = field(default_factory=list) ``` Cohort retention data. ### df ``` df: DataFrame ``` Convert to DataFrame with columns: cohort_date, cohort_size, period_N. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "born_event": self.born_event, "return_event": self.return_event, "from_date": self.from_date, "to_date": self.to_date, "unit": self.unit, "cohorts": [cohort.to_dict() for cohort in self.cohorts], } ``` ## mixpanel_data.CohortInfo ``` CohortInfo(date: str, size: int, retention: list[float] = list()) ``` Retention data for a single cohort. ### date ``` date: str ``` Cohort date (when users were 'born'). ### size ``` size: int ``` Number of users in cohort. ### retention ``` retention: list[float] = field(default_factory=list) ``` Retention percentages by period (0.0 to 1.0). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "date": self.date, "size": self.size, "retention": self.retention, } ``` ## mixpanel_data.JQLResult ``` JQLResult(_raw: list[Any] = list(), *, _df_cache: DataFrame | None = None) ``` Bases: `ResultWithDataFrame` Result of a JQL query. JQL (JavaScript Query Language) allows custom queries against Mixpanel data. 
Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method The df property intelligently detects JQL result patterns (groupBy, percentiles, simple dicts) and converts them to clean tabular format. ### raw ``` raw: list[Any] ``` Raw result data from JQL execution. ### df ``` df: DataFrame ``` Convert result to DataFrame with intelligent structure detection. The conversion strategy depends on the detected JQL result pattern: **groupBy results** (detected by {key: [...], value: X} structure): - Keys expanded to columns: key_0, key_1, key_2, ... - Single value: "value" column - Multiple reducers (value array): value_0, value_1, value_2, ... - Additional fields (from .map()): preserved as-is - Example: {"key": ["US"], "value": 100, "name": "USA"} -> columns: key_0, value, name **Nested percentile results** (\[[{percentile: X, value: Y}, ...]\]): - Outer list unwrapped, inner dicts converted directly **Simple list of dicts** (already well-structured): - Converted directly to DataFrame preserving all fields **Fallback for other structures** (scalars, mixed types, incompatible dicts): - Safely wrapped in single "value" column to prevent data loss - Used when structure doesn't match known patterns | RAISES | DESCRIPTION | | ------------ | -------------------------------------------------------------------------------------------------------------------------------- | | `ValueError` | If groupBy structure has inconsistent value types across rows (some scalar, some array) which indicates malformed query results. | | RETURNS | DESCRIPTION | | ----------- | ---------------------------------------------------- | | `DataFrame` | DataFrame representation, cached after first access. | ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "raw": self._raw, "row_count": len(self._raw), } ``` ## Discovery Types ## mixpanel_data.FunnelInfo ``` FunnelInfo(funnel_id: int, name: str) ``` A saved funnel definition. Represents a funnel saved in Mixpanel that can be queried using the funnel() method. ### funnel_id ``` funnel_id: int ``` Unique identifier for funnel queries. ### name ``` name: str ``` Human-readable funnel name. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "funnel_id": self.funnel_id, "name": self.name, } ``` ## mixpanel_data.SavedCohort ``` SavedCohort( id: int, name: str, count: int, description: str, created: str, is_visible: bool, ) ``` A saved cohort definition. Represents a user cohort saved in Mixpanel for profile filtering. ### id ``` id: int ``` Unique identifier for profile filtering. ### name ``` name: str ``` Human-readable cohort name. ### count ``` count: int ``` Current number of users in cohort. ### description ``` description: str ``` Optional description (may be empty string). ### created ``` created: str ``` Creation timestamp (YYYY-MM-DD HH:mm:ss). ### is_visible ``` is_visible: bool ``` Whether cohort is visible in Mixpanel UI. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. 
Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "id": self.id, "name": self.name, "count": self.count, "description": self.description, "created": self.created, "is_visible": self.is_visible, } ``` ## mixpanel_data.TopEvent ``` TopEvent(event: str, count: int, percent_change: float) ``` Today's event activity data. Represents an event's current activity including count and trend. ### event ``` event: str ``` Event name. ### count ``` count: int ``` Today's event count. ### percent_change ``` percent_change: float ``` Change vs yesterday (-1.0 to +infinity). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "event": self.event, "count": self.count, "percent_change": self.percent_change, } ``` ## Lexicon Types ## mixpanel_data.LexiconSchema ``` LexiconSchema(entity_type: str, name: str, schema_json: LexiconDefinition) ``` Complete schema definition from Mixpanel Lexicon. Represents a documented event or profile property definition from the Mixpanel data dictionary. ### entity_type ``` entity_type: str ``` Type of entity (e.g., 'event', 'profile', 'custom_event', 'group', etc.). ### name ``` name: str ``` Name of the event or profile property. ### schema_json ``` schema_json: LexiconDefinition ``` Full schema definition. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | --------------------------------------------------- | | `dict[str, Any]` | Dictionary with entity_type, name, and schema_json. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with entity_type, name, and schema_json. """ return { "entity_type": self.entity_type, "name": self.name, "schema_json": self.schema_json.to_dict(), } ``` ## mixpanel_data.LexiconDefinition ``` LexiconDefinition( description: str | None, properties: dict[str, LexiconProperty], metadata: LexiconMetadata | None, ) ``` Full schema definition for an event or profile property in Lexicon. Contains the structural definition including description, properties, and platform-specific metadata. ### description ``` description: str | None ``` Human-readable description of the entity. ### properties ``` properties: dict[str, LexiconProperty] ``` Property definitions keyed by property name. ### metadata ``` metadata: LexiconMetadata | None ``` Optional Mixpanel-specific metadata for the entity. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | -------------------------------------------------------------------- | | `dict[str, Any]` | Dictionary with properties, and optionally description and metadata. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with properties, and optionally description and metadata. 
""" result: dict[str, Any] = { "properties": {k: v.to_dict() for k, v in self.properties.items()}, } if self.description is not None: result["description"] = self.description if self.metadata is not None: result["metadata"] = self.metadata.to_dict() return result ``` ## mixpanel_data.LexiconProperty ``` LexiconProperty( type: str, description: str | None, metadata: LexiconMetadata | None ) ``` Schema definition for a single property in a Lexicon schema. Describes the type and metadata for an event or profile property. ### type ``` type: str ``` JSON Schema type (string, number, boolean, array, object, integer, null). ### description ``` description: str | None ``` Human-readable description of the property. ### metadata ``` metadata: LexiconMetadata | None ``` Optional Mixpanel-specific metadata. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | -------------------------------------------------------------- | | `dict[str, Any]` | Dictionary with type, and optionally description and metadata. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with type, and optionally description and metadata. """ result: dict[str, Any] = {"type": self.type} if self.description is not None: result["description"] = self.description if self.metadata is not None: result["metadata"] = self.metadata.to_dict() return result ``` ## mixpanel_data.LexiconMetadata ``` LexiconMetadata( source: str | None, display_name: str | None, tags: list[str], hidden: bool, dropped: bool, contacts: list[str], team_contacts: list[str], ) ``` Mixpanel-specific metadata for Lexicon schemas and properties. Contains platform-specific information about how schemas and properties are displayed and organized in the Mixpanel UI. ### source ``` source: str | None ``` Origin of the schema definition (e.g., 'api', 'csv', 'ui'). ### display_name ``` display_name: str | None ``` Human-readable display name in Mixpanel UI. ### tags ``` tags: list[str] ``` Categorization tags for organization. ### hidden ``` hidden: bool ``` Whether hidden from Mixpanel UI. ### dropped ``` dropped: bool ``` Whether data is dropped/ignored. ### contacts ``` contacts: list[str] ``` Owner email addresses. ### team_contacts ``` team_contacts: list[str] ``` Team ownership labels. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | ------------------------------------ | | `dict[str, Any]` | Dictionary with all metadata fields. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with all metadata fields. """ return { "source": self.source, "display_name": self.display_name, "tags": self.tags, "hidden": self.hidden, "dropped": self.dropped, "contacts": self.contacts, "team_contacts": self.team_contacts, } ``` ## Event Analytics Results ## mixpanel_data.EventCountsResult ``` EventCountsResult( events: list[str], from_date: str, to_date: str, unit: Literal["day", "week", "month"], type: Literal["general", "unique", "average"], series: dict[str, dict[str, int]], *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Time-series event count data. Contains aggregate counts for multiple events over time with lazy DataFrame conversion support. 
Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### events ``` events: list[str] ``` Queried event names. ### from_date ``` from_date: str ``` Query start date (YYYY-MM-DD). ### to_date ``` to_date: str ``` Query end date (YYYY-MM-DD). ### unit ``` unit: Literal['day', 'week', 'month'] ``` Time unit for aggregation. ### type ``` type: Literal['general', 'unique', 'average'] ``` Counting method used. ### series ``` series: dict[str, dict[str, int]] ``` Time series data: {event_name: {date: count}}. ### df ``` df: DataFrame ``` Convert to DataFrame with columns: date, event, count. Conversion is lazy - computed on first access and cached. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "events": self.events, "from_date": self.from_date, "to_date": self.to_date, "unit": self.unit, "type": self.type, "series": self.series, } ``` ## mixpanel_data.PropertyCountsResult ``` PropertyCountsResult( event: str, property_name: str, from_date: str, to_date: str, unit: Literal["day", "week", "month"], type: Literal["general", "unique", "average"], series: dict[str, dict[str, int]], *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Time-series property value distribution data. Contains aggregate counts by property values over time with lazy DataFrame conversion support. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### event ``` event: str ``` Queried event name. ### property_name ``` property_name: str ``` Property used for segmentation. ### from_date ``` from_date: str ``` Query start date (YYYY-MM-DD). ### to_date ``` to_date: str ``` Query end date (YYYY-MM-DD). ### unit ``` unit: Literal['day', 'week', 'month'] ``` Time unit for aggregation. ### type ``` type: Literal['general', 'unique', 'average'] ``` Counting method used. ### series ``` series: dict[str, dict[str, int]] ``` Time series data by property value. Structure: {property_value: {date: count}} Example: {"US": {"2024-01-01": 150, "2024-01-02": 200}, "EU": {...}} ### df ``` df: DataFrame ``` Convert to DataFrame with columns: date, value, count. Conversion is lazy - computed on first access and cached. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "event": self.event, "property_name": self.property_name, "from_date": self.from_date, "to_date": self.to_date, "unit": self.unit, "type": self.type, "series": self.series, } ``` ## Advanced Query Results ## mixpanel_data.UserEvent ``` UserEvent(event: str, time: datetime, properties: dict[str, Any] = dict()) ``` Single event in a user's activity feed. Represents one event from a user's event history with timestamp and all associated properties. ### event ``` event: str ``` Event name. ### time ``` time: datetime ``` Event timestamp (UTC). ### properties ``` properties: dict[str, Any] = field(default_factory=dict) ``` All event properties including system properties. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. 
Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "event": self.event, "time": self.time.isoformat(), "properties": self.properties, } ``` ## mixpanel_data.ActivityFeedResult ``` ActivityFeedResult( distinct_ids: list[str], from_date: str | None, to_date: str | None, events: list[UserEvent] = list(), *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Collection of user events from activity feed query. Contains chronological event history for one or more users with lazy DataFrame conversion support. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### distinct_ids ``` distinct_ids: list[str] ``` Queried user identifiers. ### from_date ``` from_date: str | None ``` Start date filter (YYYY-MM-DD), None if not specified. ### to_date ``` to_date: str | None ``` End date filter (YYYY-MM-DD), None if not specified. ### events ``` events: list[UserEvent] = field(default_factory=list) ``` Event history (chronological order). ### df ``` df: DataFrame ``` Convert to DataFrame with columns: event, time, distinct_id, + properties. Flattens event properties into individual columns. Conversion is lazy - computed on first access and cached. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "distinct_ids": self.distinct_ids, "from_date": self.from_date, "to_date": self.to_date, "event_count": len(self.events), "events": [e.to_dict() for e in self.events], } ``` ## mixpanel_data.FrequencyResult ``` FrequencyResult( event: str | None, from_date: str, to_date: str, unit: Literal["day", "week", "month"], addiction_unit: Literal["hour", "day"], data: dict[str, list[int]] = dict(), *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Event frequency distribution (addiction analysis). Contains frequency arrays showing how many users performed events in N time periods, with lazy DataFrame conversion support. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### event ``` event: str | None ``` Filtered event name (None = all events). ### from_date ``` from_date: str ``` Query start date (YYYY-MM-DD). ### to_date ``` to_date: str ``` Query end date (YYYY-MM-DD). ### unit ``` unit: Literal['day', 'week', 'month'] ``` Overall time period. ### addiction_unit ``` addiction_unit: Literal['hour', 'day'] ``` Measurement granularity. ### data ``` data: dict[str, list[int]] = field(default_factory=dict) ``` Frequency arrays by date. Structure: {date: [count_1, count_2, ...]} Example: {"2024-01-01": [100, 50, 25, 10]} Each array shows user counts by frequency: - Index 0: users active exactly 1 time - Index 1: users active exactly 2 times - Index N: users active exactly N+1 times ### df ``` df: DataFrame ``` Convert to DataFrame with columns: date, period_1, period_2, ... Each period_N column shows users active in at least N time periods. Conversion is lazy - computed on first access and cached. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. 
Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "event": self.event, "from_date": self.from_date, "to_date": self.to_date, "unit": self.unit, "addiction_unit": self.addiction_unit, "data": self.data, } ``` ## mixpanel_data.NumericBucketResult ``` NumericBucketResult( event: str, from_date: str, to_date: str, property_expr: str, unit: Literal["hour", "day"], series: dict[str, dict[str, int]] = dict(), *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Events segmented into numeric property ranges. Contains time-series data bucketed by automatically determined numeric ranges, with lazy DataFrame conversion support. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### event ``` event: str ``` Queried event name. ### from_date ``` from_date: str ``` Query start date (YYYY-MM-DD). ### to_date ``` to_date: str ``` Query end date (YYYY-MM-DD). ### property_expr ``` property_expr: str ``` The 'on' expression used for bucketing. ### unit ``` unit: Literal['hour', 'day'] ``` Time aggregation unit. ### series ``` series: dict[str, dict[str, int]] = field(default_factory=dict) ``` Bucket data: {range_string: {date: count}}. ### df ``` df: DataFrame ``` Convert to DataFrame with columns: date, bucket, count. Conversion is lazy - computed on first access and cached. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "event": self.event, "from_date": self.from_date, "to_date": self.to_date, "property_expr": self.property_expr, "unit": self.unit, "series": self.series, } ``` ## mixpanel_data.NumericSumResult ``` NumericSumResult( event: str, from_date: str, to_date: str, property_expr: str, unit: Literal["hour", "day"], results: dict[str, float] = dict(), computed_at: str | None = None, *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Sum of numeric property values per time unit. Contains daily or hourly sum totals for a numeric property with lazy DataFrame conversion support. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method ### event ``` event: str ``` Queried event name. ### from_date ``` from_date: str ``` Query start date (YYYY-MM-DD). ### to_date ``` to_date: str ``` Query end date (YYYY-MM-DD). ### property_expr ``` property_expr: str ``` The 'on' expression summed. ### unit ``` unit: Literal['hour', 'day'] ``` Time aggregation unit. ### results ``` results: dict[str, float] = field(default_factory=dict) ``` Sum values: {date: sum}. ### computed_at ``` computed_at: str | None = None ``` Computation timestamp (if provided by API). ### df ``` df: DataFrame ``` Convert to DataFrame with columns: date, sum. Conversion is lazy - computed on first access and cached. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. 
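A quick sketch of consuming the documented `results` mapping; the dates and sums here are invented:

```
# Shaped like NumericSumResult.results: {date: sum}
results = {"2025-01-01": 1250.0, "2025-01-02": 980.5, "2025-01-03": 1410.25}

grand_total = sum(results.values())
peak_date = max(results, key=results.get)
print(f"total={grand_total}, peak={peak_date} ({results[peak_date]})")
```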
Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    result: dict[str, Any] = {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "property_expr": self.property_expr,
        "unit": self.unit,
        "results": self.results,
    }
    if self.computed_at is not None:
        result["computed_at"] = self.computed_at
    return result
```

## mixpanel_data.NumericAverageResult

```
NumericAverageResult(
    event: str,
    from_date: str,
    to_date: str,
    property_expr: str,
    unit: Literal["hour", "day"],
    results: dict[str, float] = dict(),
    *,
    _df_cache: DataFrame | None = None,
)
```

Bases: `ResultWithDataFrame`

Average of numeric property values per time unit.

Contains daily or hourly average values for a numeric property with lazy DataFrame conversion support.

Inherits from ResultWithDataFrame to provide:

- Lazy DataFrame caching via \_df_cache field
- Normalized table output via to_table_dict() method

### event

```
event: str
```

Queried event name.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### property_expr

```
property_expr: str
```

The 'on' expression averaged.

### unit

```
unit: Literal['hour', 'day']
```

Time aggregation unit.

### results

```
results: dict[str, float] = field(default_factory=dict)
```

Average values: {date: average}.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, average.

Conversion is lazy - computed on first access and cached.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "property_expr": self.property_expr,
        "unit": self.unit,
        "results": self.results,
    }
```

## Bookmark Types

## mixpanel_data.BookmarkInfo

```
BookmarkInfo(
    id: int,
    name: str,
    type: BookmarkType,
    project_id: int,
    created: str,
    modified: str,
    workspace_id: int | None = None,
    dashboard_id: int | None = None,
    description: str | None = None,
    creator_id: int | None = None,
    creator_name: str | None = None,
)
```

Metadata for a saved report (bookmark) from the Mixpanel Bookmarks API.

Represents a saved Insights, Funnel, Retention, or Flows report that can be queried using query_saved_report() or query_flows().

| ATTRIBUTE | DESCRIPTION |
| -------------- | -------------------------------------------------------------------------------------------- |
| `id` | Unique bookmark identifier. **TYPE:** `int` |
| `name` | User-defined report name. **TYPE:** `str` |
| `type` | Report type (insights, funnels, retention, flows, launch-analysis). **TYPE:** `BookmarkType` |
| `project_id` | Parent Mixpanel project ID. **TYPE:** `int` |
| `created` | Creation timestamp (ISO format). **TYPE:** `str` |
| `modified` | Last modification timestamp (ISO format). **TYPE:** `str` |
| `workspace_id` | Optional workspace ID if scoped to a workspace. **TYPE:** `int \| None` |
| `dashboard_id` | Optional parent dashboard ID if linked to a dashboard. **TYPE:** `int \| None` |
| `description` | Optional user-provided description. **TYPE:** `str \| None` |
| `creator_id` | Optional creator's user ID. **TYPE:** `int \| None` |
| `creator_name` | Optional creator's display name. **TYPE:** `str \| None` |

### id

```
id: int
```

Unique bookmark identifier.

### name

```
name: str
```

User-defined report name.

### type

```
type: BookmarkType
```

Report type.
### project_id ``` project_id: int ``` Parent Mixpanel project ID. ### created ``` created: str ``` Creation timestamp (ISO format). ### modified ``` modified: str ``` Last modification timestamp (ISO format). ### workspace_id ``` workspace_id: int | None = None ``` Workspace ID if scoped to a workspace. ### dashboard_id ``` dashboard_id: int | None = None ``` Parent dashboard ID if linked to a dashboard. ### description ``` description: str | None = None ``` User-provided description. ### creator_id ``` creator_id: int | None = None ``` Creator's user ID. ### creator_name ``` creator_name: str | None = None ``` Creator's display name. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | --------------------------------------------- | | `dict[str, Any]` | Dictionary with all bookmark metadata fields. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with all bookmark metadata fields. """ result: dict[str, Any] = { "id": self.id, "name": self.name, "type": self.type, "project_id": self.project_id, "created": self.created, "modified": self.modified, } if self.workspace_id is not None: result["workspace_id"] = self.workspace_id if self.dashboard_id is not None: result["dashboard_id"] = self.dashboard_id if self.description is not None: result["description"] = self.description if self.creator_id is not None: result["creator_id"] = self.creator_id if self.creator_name is not None: result["creator_name"] = self.creator_name return result ``` ## mixpanel_data.SavedReportResult ``` SavedReportResult( bookmark_id: int, computed_at: str, from_date: str, to_date: str, headers: list[str] = list(), series: dict[str, Any] = dict(), _df_cache: DataFrame | None = None, ) ``` Data from a saved report (Insights, Retention, or Funnel). Contains data from a pre-configured saved report with automatic report type detection and lazy DataFrame conversion support. The report_type property automatically detects the report type based on headers: "$retention" indicates retention, "$funnel" indicates funnel, otherwise it's an insights report. | ATTRIBUTE | DESCRIPTION | | ------------- | ------------------------------------------------------------------------- | | `bookmark_id` | Saved report identifier. **TYPE:** `int` | | `computed_at` | When report was computed (ISO format). **TYPE:** `str` | | `from_date` | Report start date. **TYPE:** `str` | | `to_date` | Report end date. **TYPE:** `str` | | `headers` | Report column headers (used for type detection). **TYPE:** `list[str]` | | `series` | Report data (structure varies by report type). **TYPE:** `dict[str, Any]` | ### bookmark_id ``` bookmark_id: int ``` Saved report identifier. ### computed_at ``` computed_at: str ``` When report was computed (ISO format). ### from_date ``` from_date: str ``` Report start date. ### to_date ``` to_date: str ``` Report end date. ### headers ``` headers: list[str] = field(default_factory=list) ``` Report column headers (used for type detection). ### series ``` series: dict[str, Any] = field(default_factory=dict) ``` Report data (structure varies by report type). For Insights reports: {event_name: {date: count}} For Retention reports: {series_name: {date: {segment: {first, counts, rates}}}} For Funnel reports: {count: {...}, overall_conv_ratio: {...}, ...} ### report_type ``` report_type: SavedReportType ``` Detect the report type from headers. 
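The rule is simple enough to sketch directly; this mirrors the behavior documented here rather than quoting the library's source:

```
def detect_report_type(headers: list[str]) -> str:
    """Mirrors the documented detection: $retention, then $funnel, else insights."""
    if "$retention" in headers:
        return "retention"
    if "$funnel" in headers:
        return "funnel"
    return "insights"

assert detect_report_type(["$retention"]) == "retention"
assert detect_report_type(["$funnel"]) == "funnel"
assert detect_report_type(["date", "count"]) == "insights"
```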
| RETURNS | DESCRIPTION | | ----------------- | -------------------------------------------- | | `SavedReportType` | 'retention' if headers contain '$retention', | | `SavedReportType` | 'funnel' if headers contain '$funnel', | | `SavedReportType` | 'insights' otherwise. | ### df ``` df: DataFrame ``` Convert to DataFrame. For Insights reports: columns are date, event, count. For Retention/Funnel reports: flattens the nested structure. Conversion is lazy - computed on first access and cached. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | ----------------------------------------------------------------- | | `dict[str, Any]` | Dictionary with all report fields including detected report_type. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with all report fields including detected report_type. """ return { "bookmark_id": self.bookmark_id, "computed_at": self.computed_at, "from_date": self.from_date, "to_date": self.to_date, "headers": self.headers, "series": self.series, "report_type": self.report_type, } ``` ## mixpanel_data.FlowsResult ``` FlowsResult( bookmark_id: int, computed_at: str, steps: list[dict[str, Any]] = list(), breakdowns: list[dict[str, Any]] = list(), overall_conversion_rate: float = 0.0, metadata: dict[str, Any] = dict(), *, _df_cache: DataFrame | None = None, ) ``` Bases: `ResultWithDataFrame` Data from a saved Flows report. Contains user path/navigation data from a pre-configured Flows report with lazy DataFrame conversion support. Inherits from ResultWithDataFrame to provide: - Lazy DataFrame caching via \_df_cache field - Normalized table output via to_table_dict() method | ATTRIBUTE | DESCRIPTION | | ------------------------- | ------------------------------------------------------------------------------------ | | `bookmark_id` | Saved report identifier. **TYPE:** `int` | | `computed_at` | When report was computed (ISO format). **TYPE:** `str` | | `steps` | Flow step data with event sequences and counts. **TYPE:** `list[dict[str, Any]]` | | `breakdowns` | Path breakdown data showing user flow distribution. **TYPE:** `list[dict[str, Any]]` | | `overall_conversion_rate` | End-to-end conversion rate (0.0 to 1.0). **TYPE:** `float` | | `metadata` | Additional API metadata from the response. **TYPE:** `dict[str, Any]` | ### bookmark_id ``` bookmark_id: int ``` Saved report identifier. ### computed_at ``` computed_at: str ``` When report was computed (ISO format). ### steps ``` steps: list[dict[str, Any]] = field(default_factory=list) ``` Flow step data with event sequences and counts. ### breakdowns ``` breakdowns: list[dict[str, Any]] = field(default_factory=list) ``` Path breakdown data showing user flow distribution. ### overall_conversion_rate ``` overall_conversion_rate: float = 0.0 ``` End-to-end conversion rate (0.0 to 1.0). ### metadata ``` metadata: dict[str, Any] = field(default_factory=dict) ``` Additional API metadata from the response. ### df ``` df: DataFrame ``` Convert steps to DataFrame. Returns DataFrame with columns derived from step data structure. Conversion is lazy - computed on first access and cached. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | ---------------------------------------- | | `dict[str, Any]` | Dictionary with all flows report fields. 
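A hand-constructed example using the signature above; the step payload and numbers are invented:

```
from mixpanel_data import FlowsResult

flows = FlowsResult(
    bookmark_id=67890,
    computed_at="2025-02-01T00:00:00",
    steps=[{"event": "Signup", "count": 1000}, {"event": "Purchase", "count": 420}],
    breakdowns=[],
    overall_conversion_rate=0.42,
)

print(f"{flows.overall_conversion_rate:.0%} end-to-end")  # 42% end-to-end
print(flows.to_dict()["steps"][1]["count"])               # 420
```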
Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all flows report fields.
    """
    return {
        "bookmark_id": self.bookmark_id,
        "computed_at": self.computed_at,
        "steps": self.steps,
        "breakdowns": self.breakdowns,
        "overall_conversion_rate": self.overall_conversion_rate,
        "metadata": self.metadata,
    }
```

## JQL Discovery Types

## mixpanel_data.PropertyDistributionResult

```
PropertyDistributionResult(
    event: str,
    property_name: str,
    from_date: str,
    to_date: str,
    total_count: int,
    values: tuple[PropertyValueCount, ...],
    _df_cache: DataFrame | None = None,
)
```

Distribution of values for a property from JQL analysis.

Contains the top N values for a property with their counts and percentages, enabling quick understanding of property value distribution without fetching all data locally.

| ATTRIBUTE | DESCRIPTION |
| --------------- | ---------------------------------------------------------------------------------- |
| `event` | The event type analyzed. **TYPE:** `str` |
| `property_name` | The property name analyzed. **TYPE:** `str` |
| `from_date` | Query start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | Query end date (YYYY-MM-DD). **TYPE:** `str` |
| `total_count` | Total number of events with this property defined. **TYPE:** `int` |
| `values` | Top values with counts and percentages. **TYPE:** `tuple[PropertyValueCount, ...]` |

### event

```
event: str
```

Event type analyzed.

### property_name

```
property_name: str
```

Property name analyzed.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### total_count

```
total_count: int
```

Total events with this property defined.

### values

```
values: tuple[PropertyValueCount, ...]
```

Top values with counts and percentages.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: value, count, percentage.

Conversion is lazy - computed on first access and cached.

| RETURNS | DESCRIPTION |
| ----------- | --------------------------------------- |
| `DataFrame` | DataFrame with value distribution data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| ---------------- | -------------------------------------- |
| `dict[str, Any]` | Dictionary with all distribution data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all distribution data.
    """
    return {
        "event": self.event,
        "property_name": self.property_name,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "total_count": self.total_count,
        "values": [v.to_dict() for v in self.values],
    }
```

## mixpanel_data.PropertyValueCount

```
PropertyValueCount(
    value: str | int | float | bool | None, count: int, percentage: float
)
```

A single value and its count from property distribution analysis.

Represents one row in a property value distribution, showing the value, its occurrence count, and percentage of total.

| ATTRIBUTE | DESCRIPTION |
| ------------ | ----------------------------------------------------------------------------------------------------------- |
| `value` | The property value (can be string, number, bool, or None). **TYPE:** `str \| int \| float \| bool \| None` |
| `count` | Number of occurrences of this value. **TYPE:** `int` |
| `percentage` | Percentage of total events (0.0 to 100.0). **TYPE:** `float` |

### value

```
value: str | int | float | bool | None
```

The property value.
### count ``` count: int ``` Number of occurrences. ### percentage ``` percentage: float ``` Percentage of total (0.0 to 100.0). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | --------------------------------------------- | | `dict[str, Any]` | Dictionary with value, count, and percentage. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with value, count, and percentage. """ return { "value": self.value, "count": self.count, "percentage": self.percentage, } ``` ## mixpanel_data.NumericPropertySummaryResult ``` NumericPropertySummaryResult( event: str, property_name: str, from_date: str, to_date: str, count: int, min: float, max: float, sum: float, avg: float, stddev: float, percentiles: dict[int, float], ) ``` Statistical summary of a numeric property from JQL analysis. Contains min, max, sum, average, standard deviation, and percentiles for a numeric property, enabling understanding of value distributions without fetching all data locally. | ATTRIBUTE | DESCRIPTION | | --------------- | -------------------------------------------------------------------------- | | `event` | The event type analyzed. **TYPE:** `str` | | `property_name` | The property name analyzed. **TYPE:** `str` | | `from_date` | Query start date (YYYY-MM-DD). **TYPE:** `str` | | `to_date` | Query end date (YYYY-MM-DD). **TYPE:** `str` | | `count` | Number of events with this property defined. **TYPE:** `int` | | `min` | Minimum value. **TYPE:** `float` | | `max` | Maximum value. **TYPE:** `float` | | `sum` | Sum of all values. **TYPE:** `float` | | `avg` | Average value. **TYPE:** `float` | | `stddev` | Standard deviation. **TYPE:** `float` | | `percentiles` | Percentile values keyed by percentile number. **TYPE:** `dict[int, float]` | ### event ``` event: str ``` Event type analyzed. ### property_name ``` property_name: str ``` Property name analyzed. ### from_date ``` from_date: str ``` Query start date (YYYY-MM-DD). ### to_date ``` to_date: str ``` Query end date (YYYY-MM-DD). ### count ``` count: int ``` Number of events with this property defined. ### min ``` min: float ``` Minimum value. ### max ``` max: float ``` Maximum value. ### sum ``` sum: float ``` Sum of all values. ### avg ``` avg: float ``` Average value. ### stddev ``` stddev: float ``` Standard deviation. ### percentiles ``` percentiles: dict[int, float] ``` Percentile values keyed by percentile number (e.g., {50: 98.0}). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | ----------------------------------------- | | `dict[str, Any]` | Dictionary with all numeric summary data. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with all numeric summary data. """ return { "event": self.event, "property_name": self.property_name, "from_date": self.from_date, "to_date": self.to_date, "count": self.count, "min": self.min, "max": self.max, "sum": self.sum, "avg": self.avg, "stddev": self.stddev, "percentiles": {str(k): v for k, v in self.percentiles.items()}, } ``` ## mixpanel_data.DailyCountsResult ``` DailyCountsResult( from_date: str, to_date: str, events: tuple[str, ...] | None, counts: tuple[DailyCount, ...], _df_cache: DataFrame | None = None, ) ``` Time-series event counts by day from JQL analysis. 
Contains daily event counts for quick activity trend analysis without complex segmentation setup.

| ATTRIBUTE | DESCRIPTION |
| ----------- | -------------------------------------------------------------------------------- |
| `from_date` | Query start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | Query end date (YYYY-MM-DD). **TYPE:** `str` |
| `events` | Event types included (None for all events). **TYPE:** `tuple[str, ...] \| None` |
| `counts` | Daily counts for each event. **TYPE:** `tuple[DailyCount, ...]` |

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### events

```
events: tuple[str, ...] | None
```

Event types included (None for all events).

### counts

```
counts: tuple[DailyCount, ...]
```

Daily counts for each event.

### df

```
df: DataFrame
```

Convert to DataFrame with columns: date, event, count.

Conversion is lazy - computed on first access and cached.

| RETURNS | DESCRIPTION |
| ----------- | --------------------------------- |
| `DataFrame` | DataFrame with daily counts data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| ---------------- | -------------------------------------- |
| `dict[str, Any]` | Dictionary with all daily counts data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all daily counts data.
    """
    return {
        "from_date": self.from_date,
        "to_date": self.to_date,
        "events": list(self.events) if self.events else None,
        "counts": [c.to_dict() for c in self.counts],
    }
```

## mixpanel_data.DailyCount

```
DailyCount(date: str, event: str, count: int)
```

Event count for a single date from daily counts analysis.

Represents one row in a daily counts result, showing date, event, and count.

| ATTRIBUTE | DESCRIPTION |
| --------- | --------------------------------------------------- |
| `date` | Date string (YYYY-MM-DD). **TYPE:** `str` |
| `event` | Event name. **TYPE:** `str` |
| `count` | Number of occurrences on this date. **TYPE:** `int` |

### date

```
date: str
```

Date string (YYYY-MM-DD).

### event

```
event: str
```

Event name.

### count

```
count: int
```

Number of occurrences.

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| ---------------- | --------------------------------------- |
| `dict[str, Any]` | Dictionary with date, event, and count. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with date, event, and count.
    """
    return {
        "date": self.date,
        "event": self.event,
        "count": self.count,
    }
```

## mixpanel_data.EngagementDistributionResult

```
EngagementDistributionResult(
    from_date: str,
    to_date: str,
    events: tuple[str, ...] | None,
    total_users: int,
    buckets: tuple[EngagementBucket, ...],
    _df_cache: DataFrame | None = None,
)
```

User engagement distribution from JQL analysis.

Shows how many users performed N events, helping understand user engagement patterns without fetching all data locally.

| ATTRIBUTE | DESCRIPTION |
| ------------- | --------------------------------------------------------------------------------- |
| `from_date` | Query start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | Query end date (YYYY-MM-DD). **TYPE:** `str` |
| `events` | Event types included (None for all events). **TYPE:** `tuple[str, ...] \| None` |
| `total_users` | Total number of distinct users. **TYPE:** `int` |
| `buckets` | Engagement buckets with user counts. **TYPE:** `tuple[EngagementBucket, ...]` |

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### events

```
events: tuple[str, ...] | None
```

Event types included (None for all events).

### total_users

```
total_users: int
```

Total number of distinct users.

### buckets

```
buckets: tuple[EngagementBucket, ...]
```

Engagement buckets with user counts.

### df

```
df: DataFrame
```

Convert to DataFrame with engagement bucket columns.

Conversion is lazy - computed on first access and cached.

| RETURNS | DESCRIPTION |
| ----------- | -------------------------------------------- |
| `DataFrame` | DataFrame with engagement distribution data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| ---------------- | ------------------------------------------------- |
| `dict[str, Any]` | Dictionary with all engagement distribution data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all engagement distribution data.
    """
    return {
        "from_date": self.from_date,
        "to_date": self.to_date,
        "events": list(self.events) if self.events else None,
        "total_users": self.total_users,
        "buckets": [b.to_dict() for b in self.buckets],
    }
```

## mixpanel_data.EngagementBucket

```
EngagementBucket(
    bucket_min: int, bucket_label: str, user_count: int, percentage: float
)
```

User count in an engagement bucket from engagement analysis.

Represents one bucket in a user engagement distribution, showing how many users performed events in a certain frequency range.

| ATTRIBUTE | DESCRIPTION |
| -------------- | ---------------------------------------------------------------- |
| `bucket_min` | Minimum events in this bucket. **TYPE:** `int` |
| `bucket_label` | Human-readable label (e.g., "1", "2-5", "100+"). **TYPE:** `str` |
| `user_count` | Number of users in this bucket. **TYPE:** `int` |
| `percentage` | Percentage of total users (0.0 to 100.0). **TYPE:** `float` |

### bucket_min

```
bucket_min: int
```

Minimum events in this bucket.

### bucket_label

```
bucket_label: str
```

Human-readable label (e.g., '1', '2-5', '100+').

### user_count

```
user_count: int
```

Number of users in this bucket.

### percentage

```
percentage: float
```

Percentage of total users (0.0 to 100.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| ---------------- | ---------------------------- |
| `dict[str, Any]` | Dictionary with bucket data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with bucket data.
    """
    return {
        "bucket_min": self.bucket_min,
        "bucket_label": self.bucket_label,
        "user_count": self.user_count,
        "percentage": self.percentage,
    }
```

## mixpanel_data.PropertyCoverageResult

```
PropertyCoverageResult(
    event: str,
    from_date: str,
    to_date: str,
    total_events: int,
    coverage: tuple[PropertyCoverage, ...],
    _df_cache: DataFrame | None = None,
)
```

Property coverage analysis result from JQL.

Shows which properties are consistently populated vs sparse, helping understand data quality before writing queries.

| ATTRIBUTE | DESCRIPTION |
| -------------- | ------------------------------------------------------------------------------- |
| `event` | The event type analyzed. **TYPE:** `str` |
| `from_date` | Query start date (YYYY-MM-DD). **TYPE:** `str` |
| `to_date` | Query end date (YYYY-MM-DD). **TYPE:** `str` |
| `total_events` | Total number of events analyzed. **TYPE:** `int` |
| `coverage` | Coverage statistics for each property. **TYPE:** `tuple[PropertyCoverage, ...]` |

### event

```
event: str
```

Event type analyzed.

### from_date

```
from_date: str
```

Query start date (YYYY-MM-DD).

### to_date

```
to_date: str
```

Query end date (YYYY-MM-DD).

### total_events

```
total_events: int
```

Total number of events analyzed.

### coverage

```
coverage: tuple[PropertyCoverage, ...]
```

Coverage statistics for each property.

### df

```
df: DataFrame
```

Convert to DataFrame with property coverage columns.

Conversion is lazy - computed on first access and cached.

| RETURNS | DESCRIPTION |
| ----------- | -------------------------------------- |
| `DataFrame` | DataFrame with property coverage data. |

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| ---------------- | ---------------------------------- |
| `dict[str, Any]` | Dictionary with all coverage data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with all coverage data.
    """
    return {
        "event": self.event,
        "from_date": self.from_date,
        "to_date": self.to_date,
        "total_events": self.total_events,
        "coverage": [c.to_dict() for c in self.coverage],
    }
```

## mixpanel_data.PropertyCoverage

```
PropertyCoverage(
    property: str,
    defined_count: int,
    null_count: int,
    coverage_percentage: float,
)
```

Coverage statistics for a single property from coverage analysis.

Shows how often a property is defined vs null for a given event type.

| ATTRIBUTE | DESCRIPTION |
| --------------------- | ------------------------------------------------------------------------- |
| `property` | Property name. **TYPE:** `str` |
| `defined_count` | Number of events with this property defined. **TYPE:** `int` |
| `null_count` | Number of events with this property null/undefined. **TYPE:** `int` |
| `coverage_percentage` | Percentage of events with property defined (0.0-100.0). **TYPE:** `float` |

### property

```
property: str
```

Property name.

### defined_count

```
defined_count: int
```

Number of events with property defined.

### null_count

```
null_count: int
```

Number of events with property null/undefined.

### coverage_percentage

```
coverage_percentage: float
```

Percentage with property defined (0.0 to 100.0).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

| RETURNS | DESCRIPTION |
| ---------------- | ------------------------------ |
| `dict[str, Any]` | Dictionary with coverage data. |

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output.

    Returns:
        Dictionary with coverage data.
    """
    return {
        "property": self.property,
        "defined_count": self.defined_count,
        "null_count": self.null_count,
        "coverage_percentage": self.coverage_percentage,
    }
```

## Introspection Types

## mixpanel_data.ColumnSummary

```
ColumnSummary(
    column_name: str,
    column_type: str,
    min: Any,
    max: Any,
    approx_unique: int,
    avg: float | None,
    std: float | None,
    q25: Any,
    q50: Any,
    q75: Any,
    count: int,
    null_percentage: float,
)
```

Statistical summary of a single column from DuckDB's SUMMARIZE command.

Contains per-column statistics including min/max, quartiles, null percentage, and approximate distinct counts.
Numeric columns include additional stats like average and standard deviation. ### column_name ``` column_name: str ``` Name of the column. ### column_type ``` column_type: str ``` DuckDB data type (VARCHAR, TIMESTAMP, INTEGER, JSON, etc.). ### min ``` min: Any ``` Minimum value (type varies by column type). ### max ``` max: Any ``` Maximum value (type varies by column type). ### approx_unique ``` approx_unique: int ``` Approximate count of distinct values (HyperLogLog). ### avg ``` avg: float | None ``` Mean value (None for non-numeric columns). ### std ``` std: float | None ``` Standard deviation (None for non-numeric columns). ### q25 ``` q25: Any ``` 25th percentile value (None for non-numeric). ### q50 ``` q50: Any ``` Median / 50th percentile (None for non-numeric). ### q75 ``` q75: Any ``` 75th percentile value (None for non-numeric). ### count ``` count: int ``` Number of non-null values. ### null_percentage ``` null_percentage: float ``` Percentage of null values (0.0 to 100.0). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | -------------------------------------- | | `dict[str, Any]` | Dictionary with all column statistics. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with all column statistics. """ return { "column_name": self.column_name, "column_type": self.column_type, "min": self.min, "max": self.max, "approx_unique": self.approx_unique, "avg": self.avg, "std": self.std, "q25": self.q25, "q50": self.q50, "q75": self.q75, "count": self.count, "null_percentage": self.null_percentage, } ``` ## mixpanel_data.SummaryResult ``` SummaryResult( table: str, row_count: int, columns: list[ColumnSummary] = list(), _df_cache: DataFrame | None = None, ) ``` Statistical summary of all columns in a table. Contains row count and per-column statistics from DuckDB's SUMMARIZE command. Provides both structured access via the columns list and DataFrame conversion via the df property. ### table ``` table: str ``` Name of the summarized table. ### row_count ``` row_count: int ``` Total number of rows in the table. ### columns ``` columns: list[ColumnSummary] = field(default_factory=list) ``` Per-column statistics. ### df ``` df: DataFrame ``` Convert to DataFrame with one row per column. Conversion is lazy - computed on first access and cached. | RETURNS | DESCRIPTION | | ----------- | --------------------------------- | | `DataFrame` | DataFrame with column statistics. | ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | ------------------------------------------------------------- | | `dict[str, Any]` | Dictionary with table name, row count, and column statistics. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with table name, row count, and column statistics. """ return { "table": self.table, "row_count": self.row_count, "columns": [col.to_dict() for col in self.columns], } ``` ## mixpanel_data.EventStats ``` EventStats( event_name: str, count: int, unique_users: int, first_seen: datetime, last_seen: datetime, pct_of_total: float, ) ``` Statistics for a single event type. Contains count, unique users, date range, and percentage of total for a specific event in an events table. ### event_name ``` event_name: str ``` Name of the event. 
### count ``` count: int ``` Total occurrences of this event. ### unique_users ``` unique_users: int ``` Count of distinct users who triggered this event. ### first_seen ``` first_seen: datetime ``` Earliest occurrence timestamp. ### last_seen ``` last_seen: datetime ``` Latest occurrence timestamp. ### pct_of_total ``` pct_of_total: float ``` Percentage of all events (0.0 to 100.0). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | ------------------------------------------------------------ | | `dict[str, Any]` | Dictionary with event statistics (datetimes as ISO strings). | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with event statistics (datetimes as ISO strings). """ return { "event_name": self.event_name, "count": self.count, "unique_users": self.unique_users, "first_seen": self.first_seen.isoformat(), "last_seen": self.last_seen.isoformat(), "pct_of_total": self.pct_of_total, } ``` ## mixpanel_data.EventBreakdownResult ``` EventBreakdownResult( table: str, total_events: int, total_users: int, date_range: tuple[datetime, datetime], events: list[EventStats] = list(), _df_cache: DataFrame | None = None, ) ``` Distribution of events in a table. Contains aggregate statistics and per-event breakdown with counts, unique users, date ranges, and percentages. ### table ``` table: str ``` Name of the analyzed table. ### total_events ``` total_events: int ``` Total number of events in the table. ### total_users ``` total_users: int ``` Total distinct users across all events. ### date_range ``` date_range: tuple[datetime, datetime] ``` (earliest, latest) event timestamps. ### events ``` events: list[EventStats] = field(default_factory=list) ``` Per-event statistics, ordered by count descending. ### df ``` df: DataFrame ``` Convert to DataFrame with one row per event type. Conversion is lazy - computed on first access and cached. | RETURNS | DESCRIPTION | | ----------- | -------------------------------- | | `DataFrame` | DataFrame with event statistics. | ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | ------------------------------------------------ | | `dict[str, Any]` | Dictionary with table info and event statistics. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with table info and event statistics. """ return { "table": self.table, "total_events": self.total_events, "total_users": self.total_users, "date_range": [ self.date_range[0].isoformat(), self.date_range[1].isoformat(), ], "events": [event.to_dict() for event in self.events], } ``` ## mixpanel_data.ColumnStatsResult ``` ColumnStatsResult( table: str, column: str, dtype: str, count: int, null_count: int, null_pct: float, unique_count: int, unique_pct: float, top_values: list[tuple[Any, int]] = list(), min: float | None = None, max: float | None = None, mean: float | None = None, std: float | None = None, _df_cache: DataFrame | None = None, ) ``` Deep statistical analysis of a single column. Provides detailed statistics including null rates, cardinality, top values, and numeric statistics (for numeric columns). Supports JSON path expressions for analyzing properties. ### table ``` table: str ``` Name of the source table. ### column ``` column: str ``` Column expression analyzed (may include JSON path). 
### dtype ``` dtype: str ``` DuckDB data type of the column. ### count ``` count: int ``` Number of non-null values. ### null_count ``` null_count: int ``` Number of null values. ### null_pct ``` null_pct: float ``` Percentage of null values (0.0 to 100.0). ### unique_count ``` unique_count: int ``` Approximate count of distinct values. ### unique_pct ``` unique_pct: float ``` Percentage of values that are unique (0.0 to 100.0). ### top_values ``` top_values: list[tuple[Any, int]] = field(default_factory=list) ``` Most frequent (value, count) pairs. ### min ``` min: float | None = None ``` Minimum value (None for non-numeric). ### max ``` max: float | None = None ``` Maximum value (None for non-numeric). ### mean ``` mean: float | None = None ``` Mean value (None for non-numeric). ### std ``` std: float | None = None ``` Standard deviation (None for non-numeric). ### df ``` df: DataFrame ``` Convert top values to DataFrame with columns: value, count. Conversion is lazy - computed on first access and cached. | RETURNS | DESCRIPTION | | ----------- | ------------------------------------------- | | `DataFrame` | DataFrame with top values and their counts. | ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. | RETURNS | DESCRIPTION | | ---------------- | -------------------------------------- | | `dict[str, Any]` | Dictionary with all column statistics. | Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output. Returns: Dictionary with all column statistics. """ return { "table": self.table, "column": self.column, "dtype": self.dtype, "count": self.count, "null_count": self.null_count, "null_pct": self.null_pct, "unique_count": self.unique_count, "unique_pct": self.unique_pct, "top_values": [[value, count] for value, count in self.top_values], "min": self.min, "max": self.max, "mean": self.mean, "std": self.std, } ``` ## Storage Types ## mixpanel_data.TableMetadata ``` TableMetadata( type: Literal["events", "profiles"], fetched_at: datetime, from_date: str | None = None, to_date: str | None = None, filter_events: list[str] | None = None, filter_where: str | None = None, filter_cohort_id: str | None = None, filter_output_properties: list[str] | None = None, filter_group_id: str | None = None, filter_behaviors: str | None = None, ) ``` Metadata for a data fetch operation. This metadata is passed to table creation methods and stored in the database's internal \_metadata table for tracking fetch operations. ### type ``` type: Literal['events', 'profiles'] ``` Type of data fetched. ### fetched_at ``` fetched_at: datetime ``` When the fetch completed (UTC). ### from_date ``` from_date: str | None = None ``` Start date for events (YYYY-MM-DD), None for profiles. ### to_date ``` to_date: str | None = None ``` End date for events (YYYY-MM-DD), None for profiles. ### filter_events ``` filter_events: list[str] | None = None ``` Event names filtered (if applicable). ### filter_where ``` filter_where: str | None = None ``` WHERE clause filter (if applicable). ### filter_cohort_id ``` filter_cohort_id: str | None = None ``` Cohort ID filter for profiles (if applicable). ### filter_output_properties ``` filter_output_properties: list[str] | None = None ``` Property names to include in output (if applicable). ### filter_group_id ``` filter_group_id: str | None = None ``` Group ID for group profile queries (if applicable). 
### filter_behaviors ``` filter_behaviors: str | None = None ``` Serialized behaviors filter for behavioral profile queries (if applicable). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "type": self.type, "fetched_at": self.fetched_at.isoformat(), "from_date": self.from_date, "to_date": self.to_date, "filter_events": self.filter_events, "filter_where": self.filter_where, "filter_cohort_id": self.filter_cohort_id, "filter_output_properties": self.filter_output_properties, "filter_group_id": self.filter_group_id, "filter_behaviors": self.filter_behaviors, } ``` ## mixpanel_data.TableInfo ``` TableInfo( name: str, type: Literal["events", "profiles"], row_count: int, fetched_at: datetime, ) ``` Information about a table in the database. Returned by list_tables() to provide summary information about available tables without retrieving full schemas. ### name ``` name: str ``` Table name. ### type ``` type: Literal['events', 'profiles'] ``` Table type. ### row_count ``` row_count: int ``` Number of rows. ### fetched_at ``` fetched_at: datetime ``` When data was fetched (UTC). ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "name": self.name, "type": self.type, "row_count": self.row_count, "fetched_at": self.fetched_at.isoformat(), } ``` ## mixpanel_data.ColumnInfo ``` ColumnInfo(name: str, type: str, nullable: bool, primary_key: bool = False) ``` Information about a table column. Describes a single column's schema, including name, type, nullability constraints, and primary key status. ### name ``` name: str ``` Column name. ### type ``` type: str ``` DuckDB type (VARCHAR, TIMESTAMP, JSON, INTEGER, etc.). ### nullable ``` nullable: bool ``` Whether column allows NULL values. ### primary_key ``` primary_key: bool = False ``` Whether column is a primary key. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "name": self.name, "type": self.type, "nullable": self.nullable, "primary_key": self.primary_key, } ``` ## mixpanel_data.TableSchema ``` TableSchema(table_name: str, columns: list[ColumnInfo]) ``` Schema information for a table. Returned by get_schema() to describe the structure of a table, including all column definitions. ### table_name ``` table_name: str ``` Table name. ### columns ``` columns: list[ColumnInfo] ``` Column definitions. ### to_dict ``` to_dict() -> dict[str, Any] ``` Serialize for JSON output. Source code in `src/mixpanel_data/types.py` ``` def to_dict(self) -> dict[str, Any]: """Serialize for JSON output.""" return { "table_name": self.table_name, "columns": [col.to_dict() for col in self.columns], } ``` ## mixpanel_data.WorkspaceInfo ``` WorkspaceInfo( path: Path | None, project_id: str, region: str, account: str | None, tables: list[str], size_mb: float, created_at: datetime | None, ) ``` Information about a Workspace instance. Returned by Workspace.info() to provide metadata about the workspace including database location, connection details, and table summary. ### path ``` path: Path | None ``` Database file path (None for ephemeral or in-memory workspaces). ### project_id ``` project_id: str ``` Mixpanel project ID. 
### region

```
region: str
```

Data residency region (us, eu, in).

### account

```
account: str | None
```

Named account used (None if credentials from environment).

### tables

```
tables: list[str]
```

Names of tables in the database.

### size_mb

```
size_mb: float
```

Database file size in megabytes (0.0 for in-memory workspaces).

### created_at

```
created_at: datetime | None
```

When database was created (None if unknown).

### to_dict

```
to_dict() -> dict[str, Any]
```

Serialize for JSON output.

Source code in `src/mixpanel_data/types.py`

```
def to_dict(self) -> dict[str, Any]:
    """Serialize for JSON output."""
    return {
        "path": str(self.path) if self.path else None,
        "project_id": self.project_id,
        "region": self.region,
        "account": self.account,
        "tables": self.tables,
        "size_mb": self.size_mb,
        "created_at": self.created_at.isoformat() if self.created_at else None,
    }
```

# CLI Reference

# CLI Overview

The `mp` command provides full access to mixpanel_data functionality from the command line.

Explore on DeepWiki

🤖 **[CLI Usage Guide →](https://deepwiki.com/jaredmcfarland/mixpanel_data/3.1-cli-usage)**

Ask questions about CLI commands, explore options, or get help with specific workflows.

## Installation

The CLI is installed automatically with the package:

```
pip install mixpanel_data
```

Verify installation:

```
mp --version
```

## Global Options

| Option | Short | Description |
| ----------- | ----- | --------------------------------------- |
| `--account` | `-a` | Account name to use (overrides default) |
| `--quiet` | `-q` | Suppress progress output |
| `--verbose` | `-v` | Enable debug output |
| `--version` | | Show version and exit |
| `--help` | | Show help and exit |

## Command Groups

### auth — Account Management

Manage stored credentials and accounts.

| Command | Description |
| ---------------- | ------------------------ |
| `mp auth list` | List configured accounts |
| `mp auth add` | Add a new account |
| `mp auth remove` | Remove an account |
| `mp auth switch` | Set the default account |
| `mp auth show` | Display account details |
| `mp auth test` | Test account credentials |

### fetch — Data Fetching

Fetch data from Mixpanel into local storage, or stream directly to stdout.
| Command | Description | | ------------------- | ----------------------------------- | | `mp fetch events` | Fetch events to local DuckDB | | `mp fetch profiles` | Fetch user profiles to local DuckDB | **Table Options:** | Option | Description | | --------------- | ----------------------------------------------- | | `--replace` | Drop and recreate existing table | | `--append` | Add data to existing table (duplicates skipped) | | `--batch-size` | Rows per commit (100-100000, default: 1000) | | `--no-progress` | Hide progress bar | **Streaming Options:** | Option | Description | | ---------- | ---------------------------------------------------- | | `--stdout` | Stream data as JSONL to stdout instead of storing | | `--raw` | Output raw Mixpanel API format (requires `--stdout`) | **Event Filter Options (fetch events only):** | Option | Short | Description | | ---------- | ----- | --------------------------------------------------------------------- | | `--events` | `-e` | Comma-separated event names to filter | | `--where` | `-w` | Mixpanel filter expression | | `--limit` | `-l` | Maximum events to return (max 100000, not compatible with --parallel) | **Parallel Fetch Options (fetch events):** | Option | Short | Description | | -------------- | ----- | ----------------------------------------------------------------------- | | `--parallel` | `-p` | Fetch in parallel using multiple threads (faster for large date ranges) | | `--workers` | | Number of parallel workers (default: 10, only with --parallel) | | `--chunk-days` | | Days per chunk for parallel fetching (default: 7, only with --parallel) | **Parallel Fetch Options (fetch profiles):** | Option | Short | Description | | ------------ | ----- | ----------------------------------------------------------------------------- | | `--parallel` | `-p` | Fetch in parallel using multiple threads (up to 5x faster for large datasets) | | `--workers` | | Number of parallel workers (default: 5, max: 5, only with --parallel) | **Profile Filter Options (fetch profiles only):** | Option | Short | Description | | --------------------- | ----- | ------------------------------------------------------------------------------------ | | `--cohort` | `-c` | Filter by cohort ID (mutually exclusive with --behaviors) | | `--output-properties` | `-o` | Comma-separated properties to include | | `--where` | `-w` | Mixpanel filter expression | | `--behaviors` | | Behavioral filter as JSON array (requires --where, mutually exclusive with --cohort) | | `--distinct-id` | | Fetch a specific user by distinct_id (mutually exclusive with --distinct-ids) | | `--distinct-ids` | | Fetch specific users (repeatable flag, mutually exclusive with --distinct-id) | | `--group-id` | `-g` | Fetch group profiles instead of user profiles | | `--as-of-timestamp` | | Query profile state at a specific Unix timestamp | | `--include-all-users` | | Include all users and mark cohort membership (requires --cohort) | ### query β€” Query Operations Execute queries against local or remote data. 
| Command | Description | | ------------------------------- | ------------------------------------------------- | | `mp query sql` | Query local DuckDB with SQL | | `mp query segmentation` | Time-series event counts | | `mp query funnel` | Funnel conversion analysis | | `mp query retention` | Cohort retention analysis | | `mp query jql` | Execute JQL scripts | | `mp query event-counts` | Multi-event time series | | `mp query property-counts` | Property breakdown time series | | `mp query activity-feed` | User event history | | `mp query saved-report` | Query saved reports (Insights, Retention, Funnel) | | `mp query flows` | Query saved Flows reports | | `mp query frequency` | Event frequency distribution | | `mp query segmentation-numeric` | Numeric property bucketing | | `mp query segmentation-sum` | Numeric property sum | | `mp query segmentation-average` | Numeric property average | Saved Reports Workflow Use `mp inspect bookmarks` to list available saved reports and get their IDs, then query them with `mp query saved-report` or `mp query flows`. ### inspect β€” Discovery & Introspection Explore schema and local database. | Command | Description | | ---------------------------- | ------------------------------------------- | | `mp inspect events` | List event names | | `mp inspect properties` | List properties for an event | | `mp inspect values` | List values for a property | | `mp inspect funnels` | List saved funnels | | `mp inspect cohorts` | List saved cohorts | | `mp inspect bookmarks` | List saved reports (bookmarks) | | `mp inspect top-events` | List today's top events | | `mp inspect lexicon-schemas` | List Lexicon schemas from data dictionary | | `mp inspect lexicon-schema` | Get a single Lexicon schema | | `mp inspect info` | Show workspace info | | `mp inspect tables` | List local tables | | `mp inspect schema` | Show table schema | | `mp inspect drop` | Drop a local table | | `mp inspect drop-all` | Drop all tables (with optional type filter) | | `mp inspect sample` | Random sample rows from a table | | `mp inspect summarize` | Statistical summary of all columns | | `mp inspect breakdown` | Event distribution analysis | | `mp inspect keys` | Discover JSON property keys | | `mp inspect column` | Deep column-level statistics | | `mp inspect distribution` | Property value distribution (JQL) | | `mp inspect numeric` | Numeric property statistics (JQL) | | `mp inspect daily` | Daily event counts (JQL) | | `mp inspect engagement` | User engagement distribution (JQL) | | `mp inspect coverage` | Property coverage analysis (JQL) | ## Output Formats All commands support the `--format` option: | Format | Description | Use Case | | ------- | -------------------- | ------------------------- | | `json` | Pretty-printed JSON | Default, human-readable | | `jsonl` | JSON Lines | Streaming, large datasets | | `table` | Rich formatted table | Terminal viewing | | `csv` | CSV with headers | Spreadsheet export | | `plain` | Minimal text | Scripting | ## Filtering with --jq Commands that output JSON also support the `--jq` option for client-side filtering using jq syntax. This enables powerful transformations without external tools. 
``` # Get first 5 events mp inspect events --format json --jq '.[:5]' # Filter events by name pattern mp inspect events --format json --jq '.[] | select(startswith("User"))' # Count results mp inspect events --format json --jq 'length' # Extract specific fields from query results mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 \ --format json --jq '.series | to_entries | map({date: .key, count: .value})' # Filter SQL results mp query sql "SELECT * FROM events LIMIT 100" --format json \ --jq '.[] | select(.event_name == "Purchase")' ``` --jq requires JSON format The `--jq` option only works with `--format json` or `--format jsonl`. Using it with other formats produces an error. See the [jq manual](https://jqlang.org/manual/) for filter syntax. ### Format Examples Given this query: ``` mp query sql "SELECT event_name, COUNT(*) as count FROM events GROUP BY 1 LIMIT 3" ``` **json** (default) β€” Pretty-printed, easy to read: ``` [ { "event_name": "Purchase", "count": 1523 }, { "event_name": "Signup", "count": 892 }, { "event_name": "Login", "count": 4201 } ] ``` **jsonl** β€” One object per line, ideal for streaming: ``` {"event_name": "Purchase", "count": 1523} {"event_name": "Signup", "count": 892} {"event_name": "Login", "count": 4201} ``` **table** β€” Rich ASCII table for terminal viewing: ``` ┏━━━━━━━━━━━━━┳━━━━━━━┓ ┃ EVENT NAME ┃ COUNT ┃ ┑━━━━━━━━━━━━━╇━━━━━━━┩ β”‚ Purchase β”‚ 1523 β”‚ β”‚ Signup β”‚ 892 β”‚ β”‚ Login β”‚ 4201 β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜ ``` **csv** β€” Headers plus comma-separated values: ``` event_name,count Purchase,1523 Signup,892 Login,4201 ``` **plain** β€” Minimal output, one value per line: ``` Purchase Signup Login ``` ### Choosing a Format ``` # Terminal viewing mp inspect events --format table # Export to spreadsheet mp query sql "SELECT * FROM events" --format csv > events.csv # Pipe to jq for processing mp query segmentation "Purchase" --from 2025-01-01 --format json | jq '.values' # Count results mp inspect events --format plain | wc -l # Stream to another tool mp query sql "SELECT * FROM events" --format jsonl | python process.py ``` ## Exit Codes | Code | Meaning | Exception | | ---- | -------------------- | -------------------------------------------- | | 0 | Success | β€” | | 1 | General error | `MixpanelDataError` | | 2 | Authentication error | `AuthenticationError` | | 3 | Invalid arguments | `ConfigError`, validation errors | | 4 | Resource not found | `TableNotFoundError`, `AccountNotFoundError` | | 5 | Rate limit exceeded | `RateLimitError` | | 130 | Interrupted | Ctrl+C | ## Environment Variables | Variable | Description | | ---------------- | ------------------------- | | `MP_USERNAME` | Service account username | | `MP_SECRET` | Service account secret | | `MP_PROJECT_ID` | Project ID | | `MP_REGION` | Data residency region | | `MP_ACCOUNT` | Account name to use | | `MP_CONFIG_PATH` | Override config file path | ## Examples ### Complete Workflow ``` # 1. Set up credentials (prompts for secret securely) mp auth add production --username sa_... --project 12345 --region us # 2. Explore schema mp inspect events mp inspect properties --event Purchase # 3. Fetch data mp fetch events jan --from 2025-01-01 --to 2025-01-31 # 4. Query locally mp query sql "SELECT event_name, COUNT(*) FROM jan GROUP BY 1" --format table # 5. 
Run live queries mp query segmentation --event Purchase --from 2025-01-01 --to 2025-01-31 --format table ``` ### Incremental Fetching ``` # Fetch initial data mp fetch events events --from 2025-01-01 --to 2025-01-31 # Append more data later mp fetch events events --from 2025-02-01 --to 2025-02-28 --append # Resume after a crash (overlapping dates are safe) mp query sql "SELECT MAX(event_time) FROM events" mp fetch events events --from 2025-02-15 --to 2025-02-28 --append # Replace with fresh data mp fetch events events --from 2025-01-01 --to 2025-02-28 --replace # Parallel fetch for large date ranges (up to 10x faster) mp fetch events events --from 2025-01-01 --to 2025-12-31 --parallel # Parallel fetch with custom settings mp fetch events events --from 2025-01-01 --to 2025-12-31 --parallel --workers 20 --chunk-days 3 # Parallel profile fetch for large datasets (up to 5x faster) mp fetch profiles users --parallel # Parallel profile fetch with custom workers mp fetch profiles users --parallel --workers 3 # Parallel profile fetch with filters mp fetch profiles premium --where 'properties["plan"] == "premium"' --parallel ``` ### Piping and Scripting ``` # Export to file mp query sql "SELECT * FROM events" --format csv > events.csv # Built-in jq filtering (no external tools needed) mp query segmentation --event Login --from 2025-01-01 --to 2025-01-31 \ --format json --jq '.series | keys | length' # Or pipe to external jq mp query segmentation --event Login --from 2025-01-01 --to 2025-01-31 --format json \ | jq '.series."$overall"' # Count lines mp query sql "SELECT * FROM events" --format jsonl | wc -l ``` ### Streaming to Stdout Stream data directly without storing locally: ``` # Stream events as JSONL mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout # Stream profiles mp fetch profiles --stdout # Stream profiles filtered by cohort mp fetch profiles --stdout --cohort 12345 # Stream specific profile properties only mp fetch profiles --stdout --output-properties '$email,$name,plan' # Stream profiles with behavioral filter (users who purchased in last 30 days) mp fetch profiles --stdout \ --behaviors '[{"window":"30d","name":"buyers","event_selectors":[{"event":"Purchase"}]}]' \ --where '(behaviors["buyers"] > 0)' # Fetch a specific user profile mp fetch profiles --stdout --distinct-id user_123 # Fetch multiple specific user profiles mp fetch profiles --stdout --distinct-ids user_123 --distinct-ids user_456 # Fetch group profiles (e.g., companies) mp fetch profiles --stdout --group-id companies # Pipe to jq for filtering mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout \ | jq 'select(.event_name == "Purchase")' # Save to file mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout > events.jsonl # Raw Mixpanel API format mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout --raw ``` ## Full Command Reference See [Commands](https://jaredmcfarland.github.io/mixpanel_data/cli/commands/index.md) for the complete auto-generated reference. # CLI Commands Complete reference for the `mp` command-line interface. Explore on DeepWiki 🤖 **[CLI Command Reference →](https://deepwiki.com/jaredmcfarland/mixpanel_data/7.1-cli-command-reference)** Ask questions about specific commands, explore options, or get examples for your use case. ### mp Mixpanel data CLI - fetch, store, and query analytics data. Usage: ``` mp [OPTIONS] COMMAND [ARGS]... ``` Options: ``` -a, --account TEXT Account name to use (overrides default).
\[env var: MP_ACCOUNT] -q, --quiet Suppress progress output. -v, --verbose Enable debug output. --version Show version and exit. --install-completion Install completion for the current shell. --show-completion Show completion for the current shell, to copy it or customize the installation. ``` #### auth Manage authentication and accounts. Usage: ``` mp auth [OPTIONS] COMMAND [ARGS]... ``` ##### add Add a new account to the configuration. The secret can be provided via: - Interactive prompt (default, hidden input) - MP_SECRET environment variable (for CI/CD) - --secret-stdin flag to read from stdin Examples: ``` mp auth add production -u myuser -p 12345 MP_SECRET=abc123 mp auth add production -u myuser -p 12345 # inline env var echo "$SECRET" | mp auth add production -u myuser -p 12345 --secret-stdin mp auth add staging -u myuser -p 12345 -r eu --default ``` Usage: ``` mp auth add [OPTIONS] NAME ``` Options: ``` NAME Account name (identifier). \[required] -u, --username TEXT Service account username. -p, --project TEXT Project ID. -r, --region TEXT Region: us, eu, or in. \[default: us] -d, --default Set as default account. -i, --interactive Prompt for all credentials. --secret-stdin Read secret from stdin. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] ``` ##### list List all configured accounts. Shows account name, username, project ID, region, and default status. Examples: ``` mp auth list mp auth list --format table ``` Usage: ``` mp auth list [OPTIONS] ``` Options: ``` -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] ``` ##### remove Remove an account from the configuration. Deletes the account credentials from local config. Use --force to skip the confirmation prompt. Examples: ``` mp auth remove staging mp auth remove old_account --force ``` Usage: ``` mp auth remove [OPTIONS] NAME ``` Options: ``` NAME Account name to remove. \[required] --force Skip confirmation prompt. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] ``` ##### show Show account details (secret is redacted). Displays configuration for the named account or default if omitted. Examples: ``` mp auth show mp auth show production mp auth show --format table ``` Usage: ``` mp auth show [OPTIONS] [NAME] ``` Options: ``` [NAME] Account name (default if omitted). -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] ``` ##### switch Set an account as the default. The default account is used when --account is not specified. Examples: ``` mp auth switch production mp auth switch staging ``` Usage: ``` mp auth switch [OPTIONS] NAME ``` Options: ``` NAME Account name to set as default. \[required] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] ``` ##### test Test account credentials by pinging the API. Verifies that the credentials are valid and can access the project. Examples: ``` mp auth test mp auth test production ``` Usage: ``` mp auth test [OPTIONS] [NAME] ``` Options: ``` [NAME] Account name to test (default if omitted). -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] ``` #### fetch Fetch data from Mixpanel. Usage: ``` mp fetch [OPTIONS] COMMAND [ARGS]... ``` ##### events Fetch events from Mixpanel into local storage. Events are stored in a DuckDB table for SQL querying. A progress bar shows fetch progress (disable with --no-progress or --quiet). **Note:** This is a long-running operation. 
For large date ranges, use --parallel for up to 10x faster exports. Use --events to filter by event name (comma-separated list). Use --where for Mixpanel expression filters (e.g., 'properties["country"]=="US"'). Use --limit to cap the number of events returned (max 100000). Use --replace to drop and recreate an existing table. Use --append to add data to an existing table. Use --parallel/-p for faster parallel fetching (recommended for large date ranges). Use --chunk-days to configure days per chunk for parallel fetching (default: 7). Use --stdout to stream JSONL to stdout instead of storing locally. Use --raw with --stdout to output raw Mixpanel API format. **Output Structure (JSON):** ``` { "table": "events", "rows": 15234, "type": "events", "duration_seconds": 12.5, "date_range": ["2025-01-01", "2025-01-31"], "fetched_at": "2025-01-15T10:30:00Z" } ``` **Parallel Output Structure (JSON):** ``` { "table": "events", "total_rows": 15234, "successful_batches": 5, "failed_batches": 0, "has_failures": false, "duration_seconds": 2.5, "fetched_at": "2025-01-15T10:30:00Z" } ``` **Examples:** ``` mp fetch events --from 2025-01-01 --to 2025-01-31 mp fetch events signups --from 2025-01-01 --to 2025-01-31 --events "Sign Up" mp fetch events --from 2025-01-01 --to 2025-01-31 --where 'properties["country"]=="US"' mp fetch events --from 2025-01-01 --to 2025-01-31 --limit 10000 mp fetch events --from 2025-01-01 --to 2025-01-31 --replace mp fetch events --from 2025-01-01 --to 2025-01-31 --append mp fetch events --from 2025-01-01 --to 2025-01-31 --parallel mp fetch events --from 2025-01-01 --to 2025-01-31 --parallel --chunk-days 1 mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout mp fetch events --from 2025-01-01 --to 2025-01-31 --stdout --raw | jq '.event' ``` **jq Examples:** ``` --jq '.rows' # Number of events fetched (sequential) --jq '.total_rows' # Number of events fetched (parallel) --jq '.duration_seconds | round' # Fetch duration in seconds --jq '.date_range' # Date range fetched ``` Usage: ``` mp fetch events [OPTIONS] [NAME] ``` Options: ``` [NAME] Table name for storing events. Ignored with --stdout. --from TEXT Start date (YYYY-MM-DD). --to TEXT End date (YYYY-MM-DD). -e, --events TEXT Comma-separated event filter. -w, --where TEXT Mixpanel filter expression. -l, --limit INTEGER RANGE Maximum events to return (max 100000). [1<=x<=100000] --replace Replace existing table. --append Append to existing table. --no-progress Hide progress bar. -p, --parallel Fetch in parallel using multiple threads. Faster for large date ranges. --workers INTEGER RANGE Number of parallel workers (default: 10). Only applies with --parallel. \[x>=1] --chunk-days INTEGER RANGE Days per chunk for parallel fetching (default: 7). Only applies with --parallel. \[default: 7; 1<=x<=100] --stdout Stream to stdout as JSONL instead of storing. --raw Output raw API format (only with --stdout). --batch-size INTEGER RANGE Rows per commit. Controls memory/IO tradeoff. (100-100000) \[default: 1000; 100<=x<=100000] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### profiles Fetch user profiles from Mixpanel into local storage. Profiles are stored in a DuckDB table for SQL querying. A progress bar shows fetch progress (disable with --no-progress or --quiet). **Note:** This can be a long-running operation for large profile sets. Use --parallel for up to 5x faster exports. 
Use --where for Mixpanel expression filters on profile properties. Use --cohort to filter by cohort ID membership. Use --output-properties to select specific properties (reduces bandwidth). Use --distinct-id to fetch a single user's profile. Use --distinct-ids to fetch multiple specific users (repeatable flag). Use --group-id to fetch group profiles (e.g., companies) instead of users. Use --behaviors with --where to filter by user behavior (see --behaviors help for format). Use --as-of-timestamp to query historical profile state. Use --include-all-users with --cohort to include non-members with membership flag. Use --replace to drop and recreate an existing table. Use --append to add data to an existing table. Use --parallel/-p for faster parallel fetching (recommended for large profile sets). Use --stdout to stream JSONL to stdout instead of storing locally. Use --raw with --stdout to output raw Mixpanel API format. **Output Structure (JSON - Sequential):** ``` { "table": "profiles", "rows": 5000, "type": "profiles", "duration_seconds": 8.2, "date_range": null, "fetched_at": "2025-01-15T10:30:00Z" } ``` **Output Structure (JSON - Parallel):** ``` { "table": "profiles", "total_rows": 5000, "successful_pages": 5, "failed_pages": 0, "failed_page_indices": [], "duration_seconds": 1.8, "fetched_at": "2025-01-15T10:30:00Z" } ``` **Examples:** ``` mp fetch profiles mp fetch profiles users --replace mp fetch profiles users --append mp fetch profiles --parallel mp fetch profiles --parallel --workers 3 mp fetch profiles --where 'properties["plan"]=="premium"' mp fetch profiles --cohort 12345 mp fetch profiles --output-properties '$email,$name,plan' mp fetch profiles --distinct-id user_123 mp fetch profiles --distinct-ids user_1 --distinct-ids user_2 mp fetch profiles --group-id companies mp fetch profiles --behaviors '[{"window":"30d","name":"buyers","event_selectors":[{"event":"Purchase"}]}]' --where '(behaviors["buyers"] > 0)' mp fetch profiles --as-of-timestamp 1704067200 mp fetch profiles --cohort 12345 --include-all-users mp fetch profiles --stdout mp fetch profiles --stdout --raw ``` **jq Examples:** ``` --jq '.rows' # Number of profiles fetched (sequential) --jq '.total_rows' # Number of profiles fetched (parallel) --jq '.table' # Table name created --jq '.duration_seconds | round' # Fetch duration in seconds ``` Usage: ``` mp fetch profiles [OPTIONS] [NAME] ``` Options: ``` [NAME] Table name for storing profiles. Ignored with --stdout. -w, --where TEXT Mixpanel filter expression. -c, --cohort TEXT Filter by cohort ID. -o, --output-properties TEXT Comma-separated properties to include. --replace Replace existing table. --append Append to existing table. --no-progress Hide progress bar. --stdout Stream to stdout as JSONL instead of storing. --raw Output raw API format (only with --stdout). --batch-size INTEGER RANGE Rows per commit. Controls memory/IO tradeoff. (100-100000) \[default: 1000; 100<=x<=100000] --distinct-id TEXT Fetch a specific user by distinct_id. Mutually exclusive with --distinct-ids. --distinct-ids TEXT Fetch specific users by distinct_id (can be repeated). Mutually exclusive with --distinct-id. -g, --group-id TEXT Fetch group profiles (e.g., 'companies') instead of user profiles. --behaviors TEXT Behavioral filter as JSON array. Each behavior needs: "window" (e.g., "30d"), "name" (identifier), and "event_selectors" (array with {"event":"Name"}). Use with --where to filter by behavior count, e.g., --where '(behaviors["name"] > 0)'.
Example: '[{"window":"30d","name":"buyers","event_selectors":[{"event":"Purchase"}]}]'. Mutually exclusive with --cohort. --as-of-timestamp INTEGER Query profile state at a specific Unix timestamp (must be in the past). --include-all-users Include all users and mark cohort membership. Requires --cohort. -p, --parallel Fetch in parallel using multiple threads. Up to 5x faster for large exports. --workers INTEGER Number of parallel workers (default: 5, max: 5). Only applies with --parallel. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` #### inspect Inspect schema and local database. Usage: ``` mp inspect [OPTIONS] COMMAND [ARGS]... ``` ##### bookmarks List saved reports (bookmarks) in Mixpanel project. Calls the Mixpanel API to retrieve saved report definitions. Use the bookmark ID with 'mp query saved-report' or 'mp query flows'. Output Structure (JSON): ``` [ {"id": 98765, "name": "Weekly KPIs", "type": "insights", "modified": "2024-01-15T10:30:00"}, {"id": 98766, "name": "Conversion Funnel", "type": "funnels", "modified": "2024-01-14T15:45:00"}, {"id": 98767, "name": "User Retention", "type": "retention", "modified": "2024-01-13T09:20:00"} ] ``` Examples: ``` mp inspect bookmarks mp inspect bookmarks --type insights mp inspect bookmarks --type funnels --format table ``` **jq Examples:** ``` --jq '[.[] | select(.type == "insights")]' # Get bookmarks by type --jq '[.[].id]' # Get bookmark IDs only --jq 'sort_by(.modified) | reverse' # Sort by modified date (newest first) --jq '.[] | select(.name | test("KPI"; "i"))' # Find bookmark by name ``` Usage: ``` mp inspect bookmarks [OPTIONS] ``` Options: ``` -t, --type TEXT Filter by type: insights, funnels, retention, flows, launch-analysis. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### breakdown Show event distribution in a table. Analyzes event counts, unique users, date ranges, and percentages for each event type. Requires event_name, event_time, distinct_id columns. Output Structure (JSON): ``` { "table": "events", "total_events": 125000, "total_users": 8500, "date_range": ["2024-01-01T00:00:00", "2024-01-31T23:59:59"], "events": [ { "event_name": "Page View", "count": 75000, "unique_users": 8200, "first_seen": "2024-01-01T00:05:00", "last_seen": "2024-01-31T23:55:00", "pct_of_total": 60.0 }, { "event_name": "Purchase", "count": 5000, "unique_users": 2100, "first_seen": "2024-01-01T08:30:00", "last_seen": "2024-01-31T22:15:00", "pct_of_total": 4.0 } ] } ``` Examples: ``` mp inspect breakdown -t events mp inspect breakdown -t events --format json ``` **jq Examples:** ``` --jq '.events | sort_by(.count) | reverse | [.[].event_name]' # Event names sorted by count --jq '.events | [.[] | select(.pct_of_total > 10)]' # Events with more than 10% --jq '.total_events' # Get total event count --jq '.events | max_by(.unique_users)' # Event with most unique users ``` Usage: ``` mp inspect breakdown [OPTIONS] ``` Options: ``` -t, --table TEXT Table name. \[required] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### cohorts List saved cohorts in Mixpanel project. Calls the Mixpanel API to retrieve saved cohort definitions. Shows cohort ID, name, user count, and description.
Output Structure (JSON): ``` [ {"id": 1001, "name": "Power Users", "count": 5420, "description": "Users with 10+ sessions"}, {"id": 1002, "name": "Trial Users", "count": 892, "description": "Active trial accounts"}, {"id": 1003, "name": "Churned", "count": 2341, "description": "No activity in 30 days"} ] ``` Examples: ``` mp inspect cohorts mp inspect cohorts --format table ``` **jq Examples:** ``` --jq '[.[] | select(.count > 1000)]' # Cohorts with more than 1000 users --jq '[.[].name]' # Get cohort names only --jq 'sort_by(.count) | reverse' # Sort by user count descending --jq '.[] | select(.name == "Power Users")' # Find cohort by name ``` Usage: ``` mp inspect cohorts [OPTIONS] ``` Options: ``` -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### column Show detailed statistics for a single column. Performs deep analysis including null rates, cardinality, top values, and numeric statistics. Supports JSON path expressions like "properties->>'$.country'" for analyzing JSON columns. Output Structure (JSON): ``` { "table": "events", "column": "properties->>'$.country'", "dtype": "VARCHAR", "count": 120000, "null_count": 5000, "null_pct": 4.0, "unique_count": 45, "unique_pct": 0.04, "top_values": [["US", 45000], ["UK", 22000], ["DE", 15000]], "min": null, "max": null, "mean": null, "std": null } ``` Examples: ``` mp inspect column -t events -c event_name mp inspect column -t events -c "properties->>'$.country'" mp inspect column -t events -c distinct_id --top 20 ``` **jq Examples:** ``` --jq '.top_values' # Get top values only --jq '.null_pct' # Get null percentage --jq '.unique_count' # Get unique count --jq '.top_values | map(.[0])' # Get top value names only ``` Usage: ``` mp inspect column [OPTIONS] ``` Options: ``` -t, --table TEXT Table name. \[required] -c, --column TEXT Column name or expression. \[required] --top INTEGER Number of top values to show. \[default: 10] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### coverage Show property coverage statistics from Mixpanel. Uses JQL to count how often each property is defined (non-null) vs undefined. Useful for data quality assessment. Output Structure (JSON): ``` { "event": "Purchase", "from_date": "2024-01-01", "to_date": "2024-01-31", "total_events": 5000, "coverage": [ {"property": "amount", "defined_count": 5000, "null_count": 0, "coverage_percentage": 100.0}, {"property": "coupon_code", "defined_count": 1250, "null_count": 3750, "coverage_percentage": 25.0}, {"property": "referrer", "defined_count": 4500, "null_count": 500, "coverage_percentage": 90.0} ] } ``` Examples: ``` mp inspect coverage -e Purchase -p coupon_code,referrer --from 2024-01-01 --to 2024-01-31 ``` **jq Examples:** ``` --jq '.coverage | [.[] | select(.coverage_percentage < 50)]' # Properties with low coverage --jq '.coverage | [.[] | select(.coverage_percentage == 100)]' # Fully covered properties --jq '.coverage | [.[].property]' # Get property names only --jq '.coverage | sort_by(.coverage_percentage)' # Sort by coverage percentage ``` Usage: ``` mp inspect coverage [OPTIONS] ``` Options: ``` -e, --event TEXT Event name to analyze. \[required] -p, --properties TEXT Comma-separated property names to check. \[required] --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). 
\[required] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### daily Show daily event counts from Mixpanel. Uses JQL to count events by day. Optionally filter to specific events. Useful for understanding activity trends over time. Output Structure (JSON): ``` { "from_date": "2024-01-01", "to_date": "2024-01-07", "events": ["Purchase", "Signup"], "counts": [ {"date": "2024-01-01", "event": "Purchase", "count": 150}, {"date": "2024-01-01", "event": "Signup", "count": 45}, {"date": "2024-01-02", "event": "Purchase", "count": 175}, {"date": "2024-01-02", "event": "Signup", "count": 52} ] } ``` Examples: ``` mp inspect daily --from 2024-01-01 --to 2024-01-07 mp inspect daily --from 2024-01-01 --to 2024-01-07 -e Purchase,Signup ``` **jq Examples:** ``` --jq '.counts | [.[] | select(.event == "Purchase")] | map(.count) | add' # Total for one event --jq '.counts | [.[] | select(.date == "2024-01-01")]' # Counts for specific date --jq '.counts | [.[].date] | unique' # Get all dates --jq '.counts | group_by(.date) | [.[] | {date: .[0].date, total: map(.count) | add}]' # Daily totals ``` Usage: ``` mp inspect daily [OPTIONS] ``` Options: ``` --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] -e, --events TEXT Comma-separated event names (or all if omitted). -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### distribution Show property value distribution from Mixpanel. Uses JQL to count occurrences of each value for a property, showing counts and percentages sorted by frequency. Useful for understanding what values a property contains before writing queries. Output Structure (JSON): ``` { "event": "Purchase", "property_name": "country", "from_date": "2024-01-01", "to_date": "2024-01-31", "total_count": 50000, "values": [ {"value": "US", "count": 25000, "percentage": 50.0}, {"value": "UK", "count": 10000, "percentage": 20.0}, {"value": "DE", "count": 7500, "percentage": 15.0} ] } ``` Examples: ``` mp inspect distribution -e Purchase -p country --from 2024-01-01 --to 2024-01-31 mp inspect distribution -e Signup -p referrer --from 2024-01-01 --to 2024-01-31 --limit 10 ``` **jq Examples:** ``` --jq '.values | [.[].value]' # Get values only --jq '.values | [.[] | select(.percentage > 10)]' # Values with more than 10% --jq '.total_count' # Get total count --jq '.values[0]' # Get top value ``` Usage: ``` mp inspect distribution [OPTIONS] ``` Options: ``` -e, --event TEXT Event name to analyze. \[required] -p, --property TEXT Property name to get distribution for. \[required] --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] -l, --limit INTEGER Maximum values to return. \[default: 20] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### drop Drop a table from the local database. Permanently removes a table and all its data. Use --force to skip the confirmation prompt. Commonly used before re-fetching data. 
Output Structure (JSON): ``` {"dropped": "old_events"} ``` Examples: ``` mp inspect drop -t old_events mp inspect drop -t events --force ``` **jq Examples:** ``` --jq '.dropped' # Get dropped table name ``` Usage: ``` mp inspect drop [OPTIONS] ``` Options: ``` -t, --table TEXT Table name to drop. \[required] --force Skip confirmation prompt. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### drop-all Drop all tables from the local database. Permanently removes all tables and their data. Use --type to filter by table type. Use --force to skip the confirmation prompt. Output Structure (JSON): ``` {"dropped_count": 3} # With type filter: {"dropped_count": 2, "type_filter": "events"} ``` Examples: ``` mp inspect drop-all --force mp inspect drop-all --type events --force mp inspect drop-all -t profiles --force ``` **jq Examples:** ``` --jq '.dropped_count' # Get count of dropped tables --jq '.dropped_count > 0' # Check if any tables were dropped ``` Usage: ``` mp inspect drop-all [OPTIONS] ``` Options: ``` -t, --type TEXT Only drop tables of this type: events or profiles. --force Skip confirmation prompt. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### engagement Show user engagement distribution from Mixpanel. Uses JQL to bucket users by their event count, showing how many users performed N events. Useful for understanding user engagement levels. Output Structure (JSON): ``` { "from_date": "2024-01-01", "to_date": "2024-01-31", "events": null, "total_users": 8500, "buckets": [ {"bucket_min": 1, "bucket_label": "1", "user_count": 2500, "percentage": 29.4}, {"bucket_min": 2, "bucket_label": "2-5", "user_count": 3200, "percentage": 37.6}, {"bucket_min": 6, "bucket_label": "6-10", "user_count": 1800, "percentage": 21.2}, {"bucket_min": 11, "bucket_label": "11+", "user_count": 1000, "percentage": 11.8} ] } ``` Examples: ``` mp inspect engagement --from 2024-01-01 --to 2024-01-31 mp inspect engagement --from 2024-01-01 --to 2024-01-31 -e Purchase mp inspect engagement --from 2024-01-01 --to 2024-01-31 --buckets 1,5,10,50,100 ``` **jq Examples:** ``` --jq '.total_users' # Get total users --jq '.buckets | [.[] | select(.bucket_min >= 10)]' # Power users (high engagement) --jq '.buckets | .[] | select(.bucket_min == 1) | .percentage' # Single-event user percentage --jq '.buckets | [.[].bucket_label]' # Get bucket labels only ``` Usage: ``` mp inspect engagement [OPTIONS] ``` Options: ``` --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] -e, --events TEXT Comma-separated event names (or all if omitted). --buckets TEXT Comma-separated bucket boundaries (e.g., 1,5,10,50). -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### events List all event names from Mixpanel project. Calls the Mixpanel API to retrieve tracked event types. Use this to discover what events exist before fetching or querying. 
Output Structure (JSON): ``` ["Sign Up", "Login", "Purchase", "Page View", "Add to Cart"] ``` Examples: ``` mp inspect events mp inspect events --format table mp inspect events --format json --jq '.[0:3]' ``` **jq Examples:** ``` --jq '.[0:5]' # Get first 5 events --jq 'length' # Count total events --jq '[.[] | select(contains("Purchase"))]' # Find events containing "Purchase" --jq 'sort' # Sort alphabetically ``` Usage: ``` mp inspect events [OPTIONS] ``` Options: ``` -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### funnels List saved funnels in Mixpanel project. Calls the Mixpanel API to retrieve saved funnel definitions. Use the funnel_id with 'mp query funnel' to run funnel analysis. Output Structure (JSON): ``` [ {"funnel_id": 12345, "name": "Onboarding Flow"}, {"funnel_id": 12346, "name": "Purchase Funnel"}, {"funnel_id": 12347, "name": "Trial to Paid"} ] ``` Examples: ``` mp inspect funnels mp inspect funnels --format table ``` **jq Examples:** ``` --jq '[.[].funnel_id]' # Get all funnel IDs --jq '.[] | select(.name | test("Purchase"; "i"))' # Find funnel by name pattern --jq '[.[].name]' # Get funnel names only --jq 'length' # Count funnels ``` Usage: ``` mp inspect funnels [OPTIONS] ``` Options: ``` -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### info Show workspace information. Shows current account configuration, database location, and connection status. Uses local configuration only (no API call). Output Structure (JSON): ``` { "path": "/path/to/mixpanel.db", "project_id": "12345", "region": "us", "account": "production", "tables": ["events", "profiles"], "size_mb": 42.5, "created_at": "2024-01-10T08:00:00" } ``` Examples: ``` mp inspect info mp inspect info --format json ``` **jq Examples:** ``` --jq '.path' # Get database path --jq '.project_id' # Get project ID --jq '.tables' # Get list of tables --jq '.size_mb' # Get database size in MB ``` Usage: ``` mp inspect info [OPTIONS] ``` Options: ``` -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### keys List JSON property keys in a table. Extracts distinct keys from the 'properties' JSON column. Useful for discovering queryable fields in event properties. Output Structure (JSON): ``` ["amount", "browser", "campaign", "country", "currency", "device", "platform"] ``` Examples: ``` mp inspect keys -t events mp inspect keys -t events -e "Purchase" mp inspect keys -t events --format table ``` **jq Examples:** ``` --jq '.[0:10]' # Get first 10 keys --jq 'length' # Count total property keys --jq '[.[] | select(contains("utm"))]' # Find keys containing "utm" --jq 'sort' # Sort keys alphabetically ``` Usage: ``` mp inspect keys [OPTIONS] ``` Options: ``` -t, --table TEXT Table name. \[required] -e, --event TEXT Filter to specific event type. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### lexicon-schema Get a single Lexicon schema from Mixpanel data dictionary. Retrieves the full schema definition for a specific event or profile property, including all property definitions and metadata. 
Output Structure (JSON): ``` { "entity_type": "event", "name": "Purchase", "schema_json": { "description": "User completed a purchase", "properties": { "amount": {"type": "number", "description": "Purchase amount in USD"}, "currency": {"type": "string", "description": "Currency code"}, "product_id": {"type": "string", "description": "Product identifier"} }, "metadata": {"hidden": false, "dropped": false, "tags": ["revenue"]} } } ``` Examples: ``` mp inspect lexicon-schema --type event --name "Purchase" mp inspect lexicon-schema -t event -n "Sign Up" mp inspect lexicon-schema -t profile -n "Plan Type" --format json ``` **jq Examples:** ``` --jq '.schema_json.properties | keys' # Get property names only --jq '.schema_json.properties | to_entries | [.[] | {name: .key, type: .value.type}]' # Get property types --jq '.schema_json.description' # Get description --jq '.schema_json.metadata.hidden' # Check if schema is hidden ``` Usage: ``` mp inspect lexicon-schema [OPTIONS] ``` Options: ``` -t, --type TEXT Entity type: event, profile, custom_event, etc. \[required] -n, --name TEXT Entity name. \[required] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### lexicon-schemas List Lexicon schemas from Mixpanel data dictionary. Retrieves documented event and profile property schemas from the Mixpanel Lexicon. Shows schema names, types, and property counts. Output Structure (JSON): ``` [ {"entity_type": "event", "name": "Purchase", "property_count": 12, "description": "User completed purchase"}, {"entity_type": "event", "name": "Sign Up", "property_count": 8, "description": "New user registration"}, {"entity_type": "profile", "name": "Plan Type", "property_count": 3, "description": "User subscription tier"} ] ``` Examples: ``` mp inspect lexicon-schemas mp inspect lexicon-schemas --type event mp inspect lexicon-schemas --type profile --format table ``` **jq Examples:** ``` --jq '[.[] | select(.entity_type == "event")]' # Get only event schemas --jq '[.[].name]' # Get schema names --jq '[.[] | select(.property_count > 10)]' # Schemas with many properties --jq '[.[] | select(.description | test("purchase"; "i"))]' # Search by description ``` Usage: ``` mp inspect lexicon-schemas [OPTIONS] ``` Options: ``` -t, --type TEXT Entity type: event, profile, custom_event, etc. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### numeric Show numeric property statistics from Mixpanel. Uses JQL to compute min, max, avg, stddev, and percentiles for a numeric property. Useful for understanding value ranges and distributions. 
Output Structure (JSON): ``` { "event": "Purchase", "property_name": "amount", "from_date": "2024-01-01", "to_date": "2024-01-31", "count": 5000, "min": 9.99, "max": 999.99, "sum": 125000.50, "avg": 25.00, "stddev": 45.75, "percentiles": {"25": 12.99, "50": 19.99, "75": 49.99, "90": 99.99} } ``` Examples: ``` mp inspect numeric -e Purchase -p amount --from 2024-01-01 --to 2024-01-31 mp inspect numeric -e Purchase -p amount --from 2024-01-01 --to 2024-01-31 --percentiles 10,50,90 ``` **jq Examples:** ``` --jq '.avg' # Get average value --jq '.percentiles["50"]' # Get median (50th percentile) --jq '{min, max}' # Get min and max --jq '.percentiles' # Get all percentiles ``` Usage: ``` mp inspect numeric [OPTIONS] ``` Options: ``` -e, --event TEXT Event name to analyze. \[required] -p, --property TEXT Numeric property name. \[required] --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] --percentiles TEXT Comma-separated percentiles (e.g., 25,50,75,90). -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### properties List properties for a specific event. Calls the Mixpanel API to retrieve property names tracked with an event. Shows both custom event properties and default Mixpanel properties. Output Structure (JSON): ``` ["country", "browser", "device", "$city", "$region", "plan_type"] ``` Examples: ``` mp inspect properties -e "Sign Up" mp inspect properties -e "Purchase" --format table ``` **jq Examples:** ``` --jq '.[0:10]' # Get first 10 properties --jq '[.[] | select(startswith("$") | not)]' # User-defined properties (no $ prefix) --jq '[.[] | select(startswith("$"))]' # Mixpanel system properties ($ prefix) --jq 'length' # Count properties ``` Usage: ``` mp inspect properties [OPTIONS] ``` Options: ``` -e, --event TEXT Event name. \[required] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### sample Show random sample rows from a table. Uses reservoir sampling to return representative rows from throughout the table. Useful for quickly exploring data structure and values. Output Structure (JSON): ``` [ { "event_name": "Purchase", "event_time": "2024-01-15T10:30:00", "distinct_id": "user_123", "properties": {"amount": 99.99, "currency": "USD", "product": "Pro Plan"} }, { "event_name": "Login", "event_time": "2024-01-15T09:15:00", "distinct_id": "user_456", "properties": {"browser": "Chrome", "platform": "web"} } ] ``` Examples: ``` mp inspect sample -t events mp inspect sample -t events -n 5 --format json ``` **jq Examples:** ``` --jq '[.[].event_name]' # Get event names from sample --jq '[.[].distinct_id] | unique' # Get unique distinct_ids --jq '[.[].properties.country]' # Extract specific property --jq '[.[] | select(.event_name == "Purchase")]' # Filter sample by event type ``` Usage: ``` mp inspect sample [OPTIONS] ``` Options: ``` -t, --table TEXT Table name. \[required] -n, --rows INTEGER Number of rows to sample. \[default: 10] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### schema Show schema for a table in local database. Lists all columns with their types and nullability constraints. Useful for understanding the data structure before writing SQL. 
Note: The --sample option is reserved for future implementation. Output Structure (JSON): ``` { "table": "events", "columns": [ {"name": "event_name", "type": "VARCHAR", "nullable": false}, {"name": "event_time", "type": "TIMESTAMP", "nullable": false}, {"name": "distinct_id", "type": "VARCHAR", "nullable": false}, {"name": "properties", "type": "JSON", "nullable": true} ] } ``` Examples: ``` mp inspect schema -t events mp inspect schema -t events --format table ``` **jq Examples:** ``` --jq '.columns | [.[].name]' # Get column names only --jq '.columns | [.[] | select(.nullable)]' # Get nullable columns --jq '.columns | [.[] | {name, type}]' # Get column types --jq '.columns | length' # Count columns ``` Usage: ``` mp inspect schema [OPTIONS] ``` Options: ``` -t, --table TEXT Table name. \[required] --sample Include sample values. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### summarize Show statistical summary of all columns in a table. Uses DuckDB's SUMMARIZE command to compute per-column statistics including min/max, quartiles, null percentage, and distinct counts. Output Structure (JSON): ``` { "table": "events", "row_count": 125000, "columns": [ { "column_name": "event_name", "column_type": "VARCHAR", "min": "Add to Cart", "max": "View Page", "approx_unique": 25, "avg": null, "std": null, "q25": null, "q50": null, "q75": null, "count": 125000, "null_percentage": 0.0 } ] } ``` Examples: ``` mp inspect summarize -t events mp inspect summarize -t events --format json ``` **jq Examples:** ``` --jq '.columns | [.[].column_name]' # Get column names --jq '.columns | [.[] | select(.null_percentage > 0)]' # Find columns with nulls --jq '.row_count' # Get row count --jq '.columns | [.[] | select(.approx_unique > 1000)]' # High-cardinality columns ``` Usage: ``` mp inspect summarize [OPTIONS] ``` Options: ``` -t, --table TEXT Table name. \[required] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### tables List tables in local database. Shows all tables in the local DuckDB database with row counts and fetch timestamps. Use this to see what data has been fetched. Output Structure (JSON): ``` [ {"name": "events", "type": "events", "row_count": 125000, "fetched_at": "2024-01-15T10:30:00"}, {"name": "jan_events", "type": "events", "row_count": 45000, "fetched_at": "2024-01-10T08:00:00"}, {"name": "profiles", "type": "profiles", "row_count": 8500, "fetched_at": "2024-01-14T14:20:00"} ] ``` Examples: ``` mp inspect tables mp inspect tables --format table ``` **jq Examples:** ``` --jq '[.[].name]' # Get table names only --jq '[.[] | select(.row_count > 100000)]' # Tables with more than 100k rows --jq '[.[] | select(.type == "events")]' # Get only event tables --jq '[.[].row_count] | add' # Total row count across all tables ``` Usage: ``` mp inspect tables [OPTIONS] ``` Options: ``` -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### top-events List today's top events by count. Calls the Mixpanel API to retrieve today's most frequent events. Useful for quick overview of project activity. 
Output Structure (JSON): ``` [ {"event": "Page View", "count": 15234, "percent_change": 12.5}, {"event": "Login", "count": 8921, "percent_change": -3.2}, {"event": "Purchase", "count": 1456, "percent_change": 8.7} ] ``` Examples: ``` mp inspect top-events mp inspect top-events --limit 20 --format table mp inspect top-events --type unique ``` **jq Examples:** ``` --jq '[.[] | select(.percent_change > 0)]' # Events with positive growth --jq '[.[].event]' # Get just event names --jq '[.[] | select(.count > 10000)]' # Events with count over 10000 --jq 'max_by(.percent_change)' # Event with highest growth ``` Usage: ``` mp inspect top-events [OPTIONS] ``` Options: ``` -t, --type TEXT Count type: general, unique, average. \[default: general] -l, --limit INTEGER Maximum events to return. \[default: 10] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### values List sample values for a property. Calls the Mixpanel API to retrieve sample values for a property. Useful for understanding the data shape before writing queries. Output Structure (JSON): ``` ["US", "UK", "DE", "FR", "CA", "AU", "JP"] ``` Examples: ``` mp inspect values -p country mp inspect values -p country -e "Sign Up" --limit 20 mp inspect values -p browser --format table ``` **jq Examples:** ``` --jq '.[0:5]' # Get first 5 values --jq 'length' # Count unique values --jq '[.[] | select(test("^U"))]' # Filter values matching pattern --jq 'sort' # Sort values alphabetically ``` Usage: ``` mp inspect values [OPTIONS] ``` Options: ``` -p, --property TEXT Property name. \[required] -e, --event TEXT Event name (optional). -l, --limit INTEGER Maximum values to return. \[default: 100] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` #### query Query local and live data. Usage: ``` mp query [OPTIONS] COMMAND [ARGS]... ``` ##### activity-feed Query user activity feed for specific users. Retrieves the event history for one or more users identified by their distinct_id. Pass comma-separated IDs to --users. Optionally filter by date range with --from and --to. Without date filters, returns recent activity (API default). **Output Structure (JSON):** ``` { "distinct_ids": ["user123", "user456"], "from_date": "2025-01-01", "to_date": "2025-01-31", "event_count": 47, "events": [ { "event": "Login", "time": "2025-01-15T10:30:00+00:00", "properties": {"$browser": "Chrome", "$city": "San Francisco", ...} }, { "event": "Purchase", "time": "2025-01-15T11:45:00+00:00", "properties": {"product_id": "SKU123", "amount": 99.99, ...} } ] } ``` **Examples:** ``` mp query activity-feed --users "user123" mp query activity-feed --users "user123,user456" --from 2025-01-01 --to 2025-01-31 mp query activity-feed --users "user123" --format table ``` **jq Examples:** ``` --jq '.event_count' # Total number of events --jq '.events | length' # Same as above --jq '.events[].event' # List all event names --jq '.events | group_by(.event) | map({event: .[0].event, count: length})' ``` Usage: ``` mp query activity-feed [OPTIONS] ``` Options: ``` -U, --users TEXT Comma-separated distinct IDs. \[required] --from TEXT Start date (YYYY-MM-DD). --to TEXT End date (YYYY-MM-DD). -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). 
``` ##### event-counts Query event counts over time for multiple events. Compares multiple events on the same time series. Pass comma-separated event names to --events (e.g., --events "Sign Up,Login,Purchase"). The --type option controls how counts are calculated: - general: Total event occurrences (default) - unique: Unique users who triggered the event - average: Average events per user **Output Structure (JSON):** ``` { "events": ["Sign Up", "Login", "Purchase"], "from_date": "2025-01-01", "to_date": "2025-01-07", "unit": "day", "type": "general", "series": { "Sign Up": {"2025-01-01": 150, "2025-01-02": 175, ...}, "Login": {"2025-01-01": 520, "2025-01-02": 610, ...}, "Purchase": {"2025-01-01": 45, "2025-01-02": 52, ...} } } ``` **Examples:** ``` mp query event-counts --events "Sign Up,Login,Purchase" --from 2025-01-01 --to 2025-01-31 mp query event-counts --events "Sign Up,Purchase" --from 2025-01-01 --to 2025-01-31 --type unique mp query event-counts --events "Login" --from 2025-01-01 --to 2025-01-31 --unit week ``` **jq Examples:** ``` --jq '.series | keys' # List event names --jq '.series["Login"] | add' # Sum counts for one event --jq '.series["Login"]["2025-01-01"]' # Count for specific date --jq '[.series | to_entries[] | {event: .key, total: (.value | add)}]' ``` Usage: ``` mp query event-counts [OPTIONS] ``` Options: ``` -e, --events TEXT Comma-separated event names. \[required] --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] -t, --type TEXT Count type: general, unique, average. \[default: general] -u, --unit TEXT Time unit: day, week, month. \[default: day] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### flows Query a saved Flows report by bookmark ID. Retrieves data from a saved Flows report in Mixpanel. The bookmark_id can be found in the URL when viewing a flows report (the numeric ID after /flows/). Flows reports show user paths through a sequence of events with step-by-step conversion rates and path breakdowns. **Output Structure (JSON):** ``` { "bookmark_id": 12345, "computed_at": "2025-01-15T10:30:00Z", "steps": [ {"step": 1, "event": "Sign Up", "count": 10000}, {"step": 2, "event": "Verify Email", "count": 7500}, {"step": 3, "event": "Complete Profile", "count": 4200} ], "breakdowns": [ {"path": ["Sign Up", "Verify Email", "Complete Profile"], "count": 3800}, {"path": ["Sign Up", "Verify Email", "Drop Off"], "count": 3300} ], "overall_conversion_rate": 0.42, "metadata": {...} } ``` **Examples:** ``` mp query flows 12345 mp query flows 12345 --format table ``` **jq Examples:** ``` --jq '.overall_conversion_rate' # End-to-end conversion rate --jq '.steps | length' # Number of flow steps --jq '.steps[] | {event, count}' # Event and count per step --jq '.breakdowns | sort_by(.count) | reverse | .[0]' ``` Usage: ``` mp query flows [OPTIONS] BOOKMARK_ID ``` Options: ``` BOOKMARK_ID Saved flows report bookmark ID. \[required] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### frequency Analyze event frequency distribution (addiction analysis). Shows how many users performed an event N times within each time period. Useful for understanding user engagement depth and "power user" distribution. The --addiction-unit controls granularity of frequency buckets (hour or day). 
For example, with --addiction-unit hour, the data shows how many users performed the event 1 time, 2 times, 3 times, etc. per hour. **Output Structure (JSON):** ``` { "event": "Login", "from_date": "2025-01-01", "to_date": "2025-01-07", "unit": "day", "addiction_unit": "hour", "data": { "2025-01-01": [500, 250, 125, 60, 30, 15], "2025-01-02": [520, 260, 130, 65, 32, 16], ... } } ``` Each array shows user counts by frequency (index 0 = 1x, index 1 = 2x, etc.). **Examples:** ``` mp query frequency --from 2025-01-01 --to 2025-01-31 mp query frequency -e "Login" --from 2025-01-01 --to 2025-01-31 mp query frequency -e "Login" --from 2025-01-01 --to 2025-01-31 --addiction-unit day ``` **jq Examples:** ``` --jq '.data | keys' # List all dates --jq '.data["2025-01-01"][0]' # Users who did it once on Jan 1 --jq '.data["2025-01-01"] | add' # Total active users on Jan 1 --jq '.data | to_entries | map({date: .key, power_users: .value[4:] | add})' ``` Usage: ``` mp query frequency [OPTIONS] ``` Options: ``` --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] -e, --event TEXT Event name (all events if omitted). -u, --unit TEXT Time unit: day, week, month. \[default: day] --addiction-unit TEXT Addiction unit: hour, day. \[default: hour] -w, --where TEXT Filter expression. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### funnel Run live funnel analysis against Mixpanel API. Analyzes conversion through a saved funnel's steps. The funnel_id can be found in the Mixpanel UI URL when viewing the funnel, or via 'mp inspect funnels'. **Output Structure (JSON):** ``` { "funnel_id": 12345, "funnel_name": "Onboarding Funnel", "from_date": "2025-01-01", "to_date": "2025-01-31", "conversion_rate": 0.23, "steps": [ {"event": "Sign Up", "count": 10000, "conversion_rate": 1.0}, {"event": "Verify Email", "count": 7500, "conversion_rate": 0.75}, {"event": "Complete Profile", "count": 4200, "conversion_rate": 0.56}, {"event": "First Purchase", "count": 2300, "conversion_rate": 0.55} ] } ``` **Examples:** ``` mp query funnel 12345 --from 2025-01-01 --to 2025-01-31 mp query funnel 12345 --from 2025-01-01 --to 2025-01-31 --unit week mp query funnel 12345 --from 2025-01-01 --to 2025-01-31 --on country ``` **jq Examples:** ``` --jq '.conversion_rate' # Overall conversion rate --jq '.steps | length' # Number of funnel steps --jq '.steps[-1].count' # Users completing the funnel --jq '.steps[] | {event, rate: .conversion_rate}' ``` Usage: ``` mp query funnel [OPTIONS] FUNNEL_ID ``` Options: ``` FUNNEL_ID Funnel ID. \[required] --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] -u, --unit TEXT Time unit: day, week, month. -o, --on TEXT Property to segment by. -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### jql Execute JQL script against Mixpanel API. Script can be provided as a file argument or inline with --script. Parameters can be passed with --param key=value (repeatable). **Output Structure (JSON):** The output structure depends on your JQL script. 
Common patterns: groupBy result: ``` { "raw": [ {"key": ["Login"], "value": 5234}, {"key": ["Sign Up"], "value": 1892} ], "row_count": 2 } ``` Aggregation result: ``` { "raw": [{"count": 15234, "unique_users": 3421}], "row_count": 1 } ``` **Examples:** ``` mp query jql analysis.js mp query jql --script "function main() { return Events({...}).groupBy(['event'], mixpanel.reducer.count()) }" mp query jql analysis.js --param start_date=2025-01-01 --param event_name=Login ``` **jq Examples:** ``` --jq '.raw' # Get raw result array --jq '.raw[0]' # First result row --jq '.raw[] | {event: .key[0], count: .value}' --jq '.row_count' # Number of result rows ``` Usage: ``` mp query jql [OPTIONS] [FILE] ``` Options: ``` [FILE] JQL script file. -c, --script TEXT Inline JQL script. -P, --param TEXT Parameter (key=value). -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### property-counts Query event counts broken down by property values. Shows how event counts vary across different values of a property. For example, --property country shows event counts per country. The --type option controls how counts are calculated: - general: Total event occurrences (default) - unique: Unique users who triggered the event - average: Average events per user The --limit option controls how many property values to return (default 10, ordered by count descending). **Output Structure (JSON):** ``` { "event": "Purchase", "property_name": "country", "from_date": "2025-01-01", "to_date": "2025-01-07", "unit": "day", "type": "general", "series": { "US": {"2025-01-01": 150, "2025-01-02": 175, ...}, "UK": {"2025-01-01": 75, "2025-01-02": 80, ...}, "DE": {"2025-01-01": 45, "2025-01-02": 52, ...} } } ``` **Examples:** ``` mp query property-counts -e "Purchase" -p country --from 2025-01-01 --to 2025-01-31 mp query property-counts -e "Sign Up" -p "utm_source" --from 2025-01-01 --to 2025-01-31 --limit 20 mp query property-counts -e "Login" -p browser --from 2025-01-01 --to 2025-01-31 --type unique ``` **jq Examples:** ``` --jq '.series | keys' # List property values --jq '.series["US"] | add' # Sum counts for one value --jq '.series | to_entries | sort_by(.value | add) | reverse' --jq '[.series | to_entries[] | {value: .key, total: (.value | add)}]' ``` Usage: ``` mp query property-counts [OPTIONS] ``` Options: ``` -e, --event TEXT Event name. \[required] -p, --property TEXT Property name. \[required] --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] -t, --type TEXT Count type: general, unique, average. \[default: general] -u, --unit TEXT Time unit: day, week, month. \[default: day] -l, --limit INTEGER Max property values to return. \[default: 10] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### retention Run live retention analysis against Mixpanel API. Measures how many users return after their first action (birth event). Users are grouped into cohorts by when they first did the birth event, then tracked for how many returned to do the return event. The --interval and --intervals options control bucket granularity: --interval is the bucket size (default 1), --intervals is the number of buckets to track (default 10). Combined with --unit, this defines the retention window (e.g., --unit day --interval 1 --intervals 7 tracks daily retention for 7 days). 
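For instance, weekly cohorts tracked across eight one-week buckets (an illustrative sketch using the options documented below; the event names are placeholders):

```
# 8 buckets x 1-week interval = an 8-week retention window
mp query retention --born "Sign Up" --return "Login" \
  --from 2025-01-01 --to 2025-03-31 \
  --unit week --interval 1 --intervals 8
```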
**Output Structure (JSON):** ``` { "born_event": "Sign Up", "return_event": "Login", "from_date": "2025-01-01", "to_date": "2025-01-31", "unit": "day", "cohorts": [ {"date": "2025-01-01", "size": 500, "retention": [1.0, 0.65, 0.45, 0.38]}, {"date": "2025-01-02", "size": 480, "retention": [1.0, 0.62, 0.41, 0.35]}, {"date": "2025-01-03", "size": 520, "retention": [1.0, 0.68, 0.48, 0.40]} ] } ``` **Examples:** ``` mp query retention --born "Sign Up" --return "Login" --from 2025-01-01 --to 2025-01-31 mp query retention --born "Sign Up" --return "Purchase" --from 2025-01-01 --to 2025-01-31 --unit week mp query retention --born "Sign Up" --return "Login" --from 2025-01-01 --to 2025-01-31 --intervals 7 ``` **jq Examples:** ``` --jq '.cohorts | length' # Number of cohorts --jq '.cohorts[0].retention' # First cohort retention curve --jq '.cohorts[] | {date, size, day7: .retention[7]}' ``` Usage: ``` mp query retention [OPTIONS] ``` Options: ``` -b, --born TEXT Birth event. \[required] -r, --return TEXT Return event. \[required] --from TEXT Start date (YYYY-MM-DD). \[required] --to TEXT End date (YYYY-MM-DD). \[required] --born-where TEXT Birth event filter. --return-where TEXT Return event filter. -i, --interval INTEGER Bucket size. -n, --intervals INTEGER Number of buckets. -u, --unit TEXT Time unit: day, week, month. \[default: day] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### saved-report Query a saved report (Insights, Retention, or Funnel) by bookmark ID. Retrieves data from a saved report in Mixpanel. The bookmark_id can be found in the URL when viewing a report (the numeric ID after /insights/, /retention/, or /funnels/). The report type is automatically detected from the response headers. **Output Structure (JSON):** Insights report: ``` { "bookmark_id": 12345, "computed_at": "2025-01-15T10:30:00Z", "from_date": "2025-01-01", "to_date": "2025-01-31", "headers": ["$event"], "series": { "Sign Up": {"2025-01-01": 150, "2025-01-02": 175, ...}, "Login": {"2025-01-01": 520, "2025-01-02": 610, ...} }, "report_type": "insights" } ``` Funnel/Retention reports have different series structures based on the saved report configuration. **Examples:** ``` mp query saved-report 12345 mp query saved-report 12345 --format table ``` **jq Examples:** ``` --jq '.report_type' # Report type (insights/retention/funnel) --jq '.series | keys' # List series names --jq '.headers' # Report column headers --jq '.series | to_entries | map({name: .key, total: (.value | add)})' ``` Usage: ``` mp query saved-report [OPTIONS] BOOKMARK_ID ``` Options: ``` BOOKMARK_ID Saved report bookmark ID. \[required] -f, --format [TEXT] Output format: json, jsonl, table, csv, plain. \[default: json] --jq TEXT Apply jq filter to JSON output (requires --format json or jsonl). ``` ##### segmentation Run live segmentation query against Mixpanel API. Returns time-series event counts, optionally segmented by a property. Without --on, returns total counts per time period. With --on, breaks down counts by property values (e.g., --on country shows counts per country). The --on parameter accepts bare property names (e.g., 'country') or full filter expressions (e.g., 'properties["country"] == "US"'). 
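Both forms of --on side by side (an illustrative sketch; the dates and property are placeholders):

```
# Bare property name: one series per observed country
mp query segmentation -e "Purchase" --from 2025-01-01 --to 2025-01-31 --on country

# Full expression: series keyed by the expression's result (typically true/false)
mp query segmentation -e "Purchase" --from 2025-01-01 --to 2025-01-31 \
  --on 'properties["country"] == "US"'
```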
**Output Structure (JSON):**

```
{
  "event": "Sign Up",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "unit": "day",
  "segment_property": "country",
  "total": 1850,
  "series": {
    "US": {"2025-01-01": 150, "2025-01-02": 175, ...},
    "UK": {"2025-01-01": 75, "2025-01-02": 80, ...}
  }
}
```

**Examples:**

```
mp query segmentation -e "Sign Up" --from 2025-01-01 --to 2025-01-31
mp query segmentation -e "Purchase" --from 2025-01-01 --to 2025-01-31 --on country
mp query segmentation -e "Login" --from 2025-01-01 --to 2025-01-07 --unit week
```

**jq Examples:**

```
--jq '.total'               # Total event count
--jq '.series | keys'       # List segment names
--jq '.series["US"] | add'  # Sum counts for one segment
```

Usage:

```
mp query segmentation [OPTIONS]
```

Options:

```
-e, --event TEXT     Event name. [required]
--from TEXT          Start date (YYYY-MM-DD). [required]
--to TEXT            End date (YYYY-MM-DD). [required]
-o, --on TEXT        Property to segment by (bare name or expression).
-u, --unit TEXT      Time unit: day, week, month. [default: day]
-w, --where TEXT     Filter expression.
-f, --format [TEXT]  Output format: json, jsonl, table, csv, plain. [default: json]
--jq TEXT            Apply jq filter to JSON output (requires --format json or jsonl).
```

##### segmentation-average

Calculate average of numeric property over time.

Calculates the mean value of a numeric property across all matching events. Useful for tracking averages like order value, session duration, or scores. For example, --event Purchase --on order_value calculates average order value per time period.

**Output Structure (JSON):**

```
{
  "event": "Purchase",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "property_expr": "order_value",
  "unit": "day",
  "results": {
    "2025-01-01": 85.50,
    "2025-01-02": 92.75,
    "2025-01-03": 78.25,
    ...
  }
}
```

**Examples:**

```
mp query segmentation-average -e "Purchase" --on order_value --from 2025-01-01 --to 2025-01-31
mp query segmentation-average -e "Session" --on duration --from 2025-01-01 --to 2025-01-31 --unit hour
```

**jq Examples:**

```
--jq '.results | add / length'                 # Overall average
--jq '.results | to_entries | max_by(.value)'  # Highest day
--jq '.results | to_entries | min_by(.value)'  # Lowest day
--jq '[.results | to_entries[] | {date: .key, avg: .value}]'
```

Usage:

```
mp query segmentation-average [OPTIONS]
```

Options:

```
-e, --event TEXT     Event name. [required]
-o, --on TEXT        Numeric property to average (bare name or expression). [required]
--from TEXT          Start date (YYYY-MM-DD). [required]
--to TEXT            End date (YYYY-MM-DD). [required]
-u, --unit TEXT      Time unit: hour, day. [default: day]
-w, --where TEXT     Filter expression.
-f, --format [TEXT]  Output format: json, jsonl, table, csv, plain. [default: json]
--jq TEXT            Apply jq filter to JSON output (requires --format json or jsonl).
```

##### segmentation-numeric

Bucket events by numeric property ranges.

Groups events into buckets based on a numeric property's value. Mixpanel automatically determines optimal bucket ranges based on the property's value distribution. For example, --on price might create buckets like "0-10", "10-50", "50+".
The --type option controls how counts are calculated:

- general: Total event occurrences (default)
- unique: Unique users who triggered the event
- average: Average events per user

**Output Structure (JSON):**

```
{
  "event": "Purchase",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "property_expr": "amount",
  "unit": "day",
  "series": {
    "0-50": {"2025-01-01": 120, "2025-01-02": 135, ...},
    "50-100": {"2025-01-01": 85, "2025-01-02": 92, ...},
    "100-500": {"2025-01-01": 45, "2025-01-02": 52, ...},
    "500+": {"2025-01-01": 12, "2025-01-02": 15, ...}
  }
}
```

**Examples:**

```
mp query segmentation-numeric -e "Purchase" --on amount --from 2025-01-01 --to 2025-01-31
mp query segmentation-numeric -e "Purchase" --on amount --from 2025-01-01 --to 2025-01-31 --type unique
```

**jq Examples:**

```
--jq '.series | keys'            # List bucket ranges
--jq '.series["100-500"] | add'  # Sum counts for a bucket
--jq '[.series | to_entries[] | {bucket: .key, total: (.value | add)}]'
--jq '.series | to_entries | sort_by(.value | add) | reverse'
```

Usage:

```
mp query segmentation-numeric [OPTIONS]
```

Options:

```
-e, --event TEXT     Event name. [required]
-o, --on TEXT        Numeric property to bucket (bare name or expression). [required]
--from TEXT          Start date (YYYY-MM-DD). [required]
--to TEXT            End date (YYYY-MM-DD). [required]
-t, --type TEXT      Count type: general, unique, average. [default: general]
-u, --unit TEXT      Time unit: hour, day. [default: day]
-w, --where TEXT     Filter expression.
-f, --format [TEXT]  Output format: json, jsonl, table, csv, plain. [default: json]
--jq TEXT            Apply jq filter to JSON output (requires --format json or jsonl).
```

##### segmentation-sum

Calculate sum of numeric property over time.

Sums the values of a numeric property across all matching events. Useful for tracking totals like revenue, quantity, or duration. For example, --event Purchase --on revenue calculates total revenue per time period.

**Output Structure (JSON):**

```
{
  "event": "Purchase",
  "from_date": "2025-01-01",
  "to_date": "2025-01-07",
  "property_expr": "revenue",
  "unit": "day",
  "results": {
    "2025-01-01": 15234.50,
    "2025-01-02": 18456.75,
    "2025-01-03": 12890.25,
    ...
  }
}
```

**Examples:**

```
mp query segmentation-sum -e "Purchase" --on revenue --from 2025-01-01 --to 2025-01-31
mp query segmentation-sum -e "Purchase" --on quantity --from 2025-01-01 --to 2025-01-31 --unit hour
```

**jq Examples:**

```
--jq '.results | add'                          # Total sum across all dates
--jq '.results | to_entries | max_by(.value)'  # Highest day
--jq '.results | to_entries | min_by(.value)'  # Lowest day
--jq '[.results | to_entries[] | {date: .key, revenue: .value}]'
```

Usage:

```
mp query segmentation-sum [OPTIONS]
```

Options:

```
-e, --event TEXT     Event name. [required]
-o, --on TEXT        Numeric property to sum (bare name or expression). [required]
--from TEXT          Start date (YYYY-MM-DD). [required]
--to TEXT            End date (YYYY-MM-DD). [required]
-u, --unit TEXT      Time unit: hour, day. [default: day]
-w, --where TEXT     Filter expression.
-f, --format [TEXT]  Output format: json, jsonl, table, csv, plain. [default: json]
--jq TEXT            Apply jq filter to JSON output (requires --format json or jsonl).
```

##### sql

Execute SQL query against the local DuckDB database.

The query can be provided as an argument or read from a file with --file. Use --scalar when your query returns a single value (e.g., COUNT(*)).
**Output Structure (JSON):**

Default (row results):

```
[
  {"event_name": "Sign Up", "count": 1500},
  {"event_name": "Login", "count": 3200},
  {"event_name": "Purchase", "count": 450}
]
```

With --scalar:

```
{"value": 15234}
```

**Examples:**

```
mp query sql "SELECT COUNT(*) FROM events" --scalar
mp query sql "SELECT event_name, COUNT(*) FROM events GROUP BY 1" --format table
mp query sql --file analysis.sql --format csv
```

**jq Examples:**

```
--jq '.[0]'                       # First row
--jq '.[] | .event_name'          # All event names
--jq 'map(select(.count > 100))'  # Filter rows
--jq '.value'                     # Scalar result value
```

Usage:

```
mp query sql [OPTIONS] [QUERY]
```

Options:

```
[QUERY]              SQL query string.
-F, --file PATH      Read query from file.
-s, --scalar         Return single value.
-f, --format [TEXT]  Output format: json, jsonl, table, csv, plain. [default: json]
--jq TEXT            Apply jq filter to JSON output (requires --format json or jsonl).
```
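Together with fetch, the sql command supports a fetch-once, query-repeatedly loop. A sketch of the round trip (the table name is illustrative; columns match the local schema described under Data Model below):

```
# Fetch January events into a local table...
mp fetch events jan_events --from 2025-01-01 --to 2025-01-31

# ...then iterate locally without further API calls
mp query sql "SELECT event_name, COUNT(*) AS n FROM jan_events GROUP BY 1 ORDER BY n DESC" --format table
mp query sql "SELECT COUNT(DISTINCT distinct_id) FROM jan_events" --scalar --jq '.value'
```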
# Architecture

mixpanel_data follows a layered architecture with clear separation of concerns.

Explore on DeepWiki

πŸ€– **[Architecture Deep Dive β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/5-architecture)**

Ask questions about the architecture, trace data flows, or explore component relationships interactively.

## Layer Diagram

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  CLI Layer (Typer)                   β”‚
β”‚    Argument parsing, output formatting, progress     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Public API Layer                   β”‚
β”‚             Workspace class, auth module             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Service Layer                     β”‚
β”‚  DiscoveryService, FetcherService, LiveQueryService  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                            β”‚
                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 Infrastructure Layer                 β”‚
β”‚   ConfigManager, MixpanelAPIClient, StorageEngine    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Components

### Workspace (Facade)

The `Workspace` class is the unified entry point that coordinates all services:

- **Credential Resolution** β€” Env vars β†’ named account β†’ default account
- **Service Orchestration** β€” Creates and manages service instances
- **Resource Management** β€” Context manager support for cleanup

### Services

#### DiscoveryService

Schema introspection with session-scoped caching:

- `list_events()` β€” All event names (cached)
- `list_properties(event)` β€” Properties for an event (cached per event)
- `list_property_values(property, event)` β€” Sample values (cached)
- `list_funnels()` β€” Saved funnels (cached)
- `list_cohorts()` β€” Saved cohorts (cached)
- `list_top_events()` β€” Today's top events (NOT cached, real-time)

#### FetcherService

Coordinates data ingestion from the Mixpanel API to DuckDB, or direct streaming:

- Streaming transformation (memory efficient)
- Progress callback integration
- Returns `FetchResult` with metadata (fetch mode)
- Returns `Iterator[dict]` without storage (stream mode)

#### LiveQueryService

Executes live analytics queries against the Mixpanel Query API:

- Segmentation, funnels, retention, JQL
- Event counts, property counts
- Activity feed, saved reports, flows, frequency
- Numeric aggregations (bucket, sum, average)

### Infrastructure

#### ConfigManager

TOML-based account management at `~/.mp/config.toml`:

- Account CRUD operations
- Credential resolution
- Default account management

#### MixpanelAPIClient

HTTP client with Mixpanel-specific features:

- Service account authentication
- Regional endpoint routing (US, EU, India)
- Automatic rate limit handling with exponential backoff
- Streaming JSONL parsing for large exports

#### StorageEngine

DuckDB-based storage:

- Persistent, ephemeral, and in-memory modes
- Table creation with streaming batch ingestion
- Query execution (DataFrame, scalar, rows)
- Schema introspection and metadata

## Data Paths

### Live Query Path

```
User Request β†’ Workspace β†’ LiveQueryService β†’ MixpanelAPIClient β†’ Mixpanel API
                                   ↓
                 Typed Result (e.g., SegmentationResult)
```

Best for:

- Real-time data needs
- One-off analysis
- Pre-computed Mixpanel reports

### Local Analysis Path

```
User Request β†’ Workspace β†’ FetcherService β†’ MixpanelAPIClient β†’ Mixpanel Export API
                                   ↓
                        StorageEngine (DuckDB)
                                   ↓
User Query β†’ Workspace β†’ StorageEngine β†’ SQL Execution β†’ DataFrame
```

Best for:

- Repeated queries over same data
- Custom SQL logic
- Context window preservation (AI agents)
- Offline analysis

### Streaming Path

```
User Request β†’ Workspace β†’ MixpanelAPIClient β†’ Mixpanel Export API
                                ↓
                   Iterator[dict] (no storage)
                                ↓
                   Process each record inline
```

Best for:

- ETL pipelines to external systems
- One-time processing without storage
- Memory-constrained environments
- Unix pipeline integration (CLI `--stdout`)

## Key Design Decisions

### Explicit Table Management

Tables are never implicitly overwritten. Fetching to an existing table name raises `TableExistsError`. This prevents accidental data loss and makes data lineage explicit.

### Streaming Ingestion

The API client returns iterators, and storage accepts iterators. This enables memory-efficient processing of large datasets without loading everything into memory.

### JSON Property Storage

Event and profile properties are stored as JSON columns in DuckDB. This preserves the flexible Mixpanel schema while enabling powerful JSON querying:

```
SELECT properties->>'$.country' as country FROM events
```

### Immutable Credentials

Credentials are resolved once at Workspace construction. This prevents confusion from mid-session credential changes.

### Dependency Injection

All services accept their dependencies as constructor arguments. This enables:

- Easy testing with mocks
- Flexible composition
- Clear dependency relationships
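A minimal sketch of what constructor injection buys in tests β€” the class and method names follow the service descriptions above, but the internal signatures are implementation details, so treat this as illustrative rather than the library's actual test suite:

```
from unittest.mock import Mock

# Internal module path per Package Structure below; signatures are assumed.
from mixpanel_data._internal.services.live_query import LiveQueryService

# Stand in for MixpanelAPIClient with a stub (assumed response shape).
fake_client = Mock()
fake_client.segmentation.return_value = {"Login": {"2025-01-01": 42}}

# Because the service takes its client as a constructor argument,
# no real HTTP traffic is needed to exercise it.
service = LiveQueryService(api_client=fake_client)
service.segmentation(event="Login", from_date="2025-01-01", to_date="2025-01-01")

fake_client.segmentation.assert_called_once()
```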
## Technology Stack

| Component         | Technology   | Purpose                       |
| ----------------- | ------------ | ----------------------------- |
| Language          | Python 3.11+ | Type hints, modern syntax     |
| CLI Framework     | Typer        | Declarative CLI building      |
| Output Formatting | Rich         | Tables, progress bars, colors |
| Validation        | Pydantic     | Data validation, settings     |
| Database          | DuckDB       | Embedded analytical database  |
| HTTP Client       | httpx        | Async-capable HTTP            |
| DataFrames        | pandas       | Data analysis interface       |

## Package Structure

```
src/mixpanel_data/
β”œβ”€β”€ __init__.py        # Public API exports
β”œβ”€β”€ workspace.py       # Workspace facade
β”œβ”€β”€ auth.py            # Public auth module
β”œβ”€β”€ exceptions.py      # Exception hierarchy
β”œβ”€β”€ types.py           # Result types
β”œβ”€β”€ py.typed           # PEP 561 marker
β”œβ”€β”€ _internal/         # Private implementation
β”‚   β”œβ”€β”€ config.py      # ConfigManager, Credentials
β”‚   β”œβ”€β”€ api_client.py  # MixpanelAPIClient
β”‚   β”œβ”€β”€ storage.py     # StorageEngine
β”‚   └── services/
β”‚       β”œβ”€β”€ discovery.py   # DiscoveryService
β”‚       β”œβ”€β”€ fetcher.py     # FetcherService
β”‚       └── live_query.py  # LiveQueryService
└── cli/
    β”œβ”€β”€ main.py        # Typer app entry point
    β”œβ”€β”€ commands/      # Command implementations
    β”œβ”€β”€ formatters.py  # Output formatters
    └── utils.py       # CLI utilities
```

# Data Model

How Mixpanel data maps to local storage.

Explore on DeepWiki

πŸ€– **[Data Transformation Deep Dive β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/4.5-data-transformation)**

Ask questions about how Mixpanel events and profiles are transformed into DuckDB schemas, or explore the transformation logic.

## Mixpanel Data Model

Mixpanel tracks two primary data types:

### Events

Actions users take in your product:

| Field         | Description                             |
| ------------- | --------------------------------------- |
| `event`       | Event name (e.g., "Purchase", "Signup") |
| `time`        | Unix timestamp when event occurred      |
| `distinct_id` | User identifier                         |
| `$insert_id`  | Deduplication ID                        |
| `properties`  | Custom properties (JSON object)         |

### User Profiles

Persistent attributes about users:

| Field          | Description                      |
| -------------- | -------------------------------- |
| `$distinct_id` | User identifier (primary key)    |
| `$properties`  | Profile properties (JSON object) |

## Local Storage Schema

### Events Table

When you fetch events, they're stored with this schema:

| Column        | Type      | Description             |
| ------------- | --------- | ----------------------- |
| `event_id`    | VARCHAR   | Unique event identifier |
| `event_name`  | VARCHAR   | Event name              |
| `event_time`  | TIMESTAMP | When the event occurred |
| `distinct_id` | VARCHAR   | User identifier         |
| `insert_id`   | VARCHAR   | Deduplication ID        |
| `properties`  | JSON      | All event properties    |

Example query:

```
SELECT
    event_name,
    event_time,
    distinct_id,
    properties->>'$.country' as country,
    CAST(properties->>'$.amount' AS DECIMAL) as amount
FROM events
WHERE event_name = 'Purchase'
```

### Profiles Table

User profiles are stored with:

| Column        | Type    | Description                   |
| ------------- | ------- | ----------------------------- |
| `distinct_id` | VARCHAR | User identifier (primary key) |
| `properties`  | JSON    | All profile properties        |

Example query:

```
SELECT
    distinct_id,
    properties->>'$.name' as name,
    properties->>'$.email' as email,
    properties->>'$.plan' as plan
FROM profiles
WHERE properties->>'$.plan' = 'premium'
```
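The same queries run from Python through the workspace. A short sketch β€” it assumes events and profiles tables were fetched earlier under those names:

```
import mixpanel_data as mp

with mp.Workspace() as ws:
    # Local SQL over previously fetched tables; no Mixpanel API calls here
    premium = ws.sql("""
        SELECT p.distinct_id,
               p.properties->>'$.email' AS email,
               COUNT(e.event_id) AS events
        FROM profiles p
        LEFT JOIN events e ON e.distinct_id = p.distinct_id
        WHERE p.properties->>'$.plan' = 'premium'
        GROUP BY 1, 2
    """)
    print(premium.head())
```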
## JSON Property Access

DuckDB provides powerful JSON operators for querying properties:

### Extract String

```
-- The -> operator returns JSON; ->> returns text
SELECT properties->>'$.country' as country FROM events
```

### Extract and Cast

```
SELECT CAST(properties->>'$.amount' AS DECIMAL) as amount FROM events
```

### Nested Access

```
SELECT properties->>'$.user.address.city' as city FROM events
```

### Array Access

```
-- First element
SELECT properties->'$.items'->>0 as first_item FROM events

-- Array length
SELECT json_array_length(properties->'$.items') as count FROM events
```

### Check Existence

```
SELECT * FROM events WHERE properties->>'$.coupon_code' IS NOT NULL
```

## Metadata Table

Each workspace maintains a `_mp_metadata` table for tracking fetch operations:

| Column         | Type      | Description              |
| -------------- | --------- | ------------------------ |
| `table_name`   | VARCHAR   | Name of the data table   |
| `table_type`   | VARCHAR   | "events" or "profiles"   |
| `from_date`    | VARCHAR   | Start date (events only) |
| `to_date`      | VARCHAR   | End date (events only)   |
| `events`       | JSON      | Event filter (if any)    |
| `where_clause` | VARCHAR   | Where filter (if any)    |
| `row_count`    | BIGINT    | Number of rows           |
| `fetched_at`   | TIMESTAMP | When fetch completed     |

This metadata is used by `ws.tables()` and `ws.info()`.

## Common Mixpanel Properties

### Event Properties

| Property          | Type   | Description             |
| ----------------- | ------ | ----------------------- |
| `$city`           | string | User's city             |
| `$region`         | string | User's region/state     |
| `$country_code`   | string | Two-letter country code |
| `$browser`        | string | Browser name            |
| `$device`         | string | Device type             |
| `$os`             | string | Operating system        |
| `mp_country_code` | string | Country code            |
| `$current_url`    | string | Page URL                |
| `$referrer`       | string | Referrer URL            |

### Profile Properties

| Property      | Type      | Description              |
| ------------- | --------- | ------------------------ |
| `$email`      | string    | User's email             |
| `$name`       | string    | User's name              |
| `$first_name` | string    | First name               |
| `$last_name`  | string    | Last name                |
| `$created`    | timestamp | When profile was created |
| `$last_seen`  | timestamp | Last activity time       |

## Query Patterns

### Daily Active Users

```
SELECT
    DATE_TRUNC('day', event_time) as day,
    COUNT(DISTINCT distinct_id) as dau
FROM events
GROUP BY 1
ORDER BY 1
```

### Revenue by Country

```
SELECT
    properties->>'$.country_code' as country,
    SUM(CAST(properties->>'$.amount' AS DECIMAL)) as revenue
FROM events
WHERE event_name = 'Purchase'
GROUP BY 1
ORDER BY 2 DESC
```

### Join Events with Profiles

```
SELECT
    e.event_name,
    p.properties->>'$.plan' as plan,
    COUNT(*) as count
FROM events e
JOIN profiles p ON e.distinct_id = p.distinct_id
GROUP BY 1, 2
```

### Funnel Analysis

```
WITH step1 AS (
    SELECT distinct_id, MIN(event_time) as t1
    FROM events
    WHERE event_name = 'View Product'
    GROUP BY 1
),
step2 AS (
    SELECT e.distinct_id, MIN(e.event_time) as t2
    FROM events e
    JOIN step1 s ON e.distinct_id = s.distinct_id
    WHERE e.event_name = 'Add to Cart' AND e.event_time > s.t1
    GROUP BY 1
),
step3 AS (
    SELECT DISTINCT e.distinct_id
    FROM events e
    JOIN step2 s ON e.distinct_id = s.distinct_id
    WHERE e.event_name = 'Purchase' AND e.event_time > s.t2
)
SELECT
    (SELECT COUNT(*) FROM step1) as viewed,
    (SELECT COUNT(*) FROM step2) as added,
    (SELECT COUNT(*) FROM step3) as purchased
```
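### Most Active Users

One more pattern in the same spirit, using only the columns documented above (a sketch; the limit is arbitrary):

```
-- Ten most active users, with first/last seen times
SELECT
    distinct_id,
    COUNT(*) as events,
    MIN(event_time) as first_seen,
    MAX(event_time) as last_seen
FROM events
GROUP BY 1
ORDER BY events DESC
LIMIT 10
```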
## See Also

- [SQL Queries Guide](https://jaredmcfarland.github.io/mixpanel_data/guide/sql-queries/index.md) β€” More query examples
- [DuckDB JSON Documentation](https://duckdb.org/docs/extensions/json) β€” Complete JSON function reference

# Storage Engine

How mixpanel_data uses DuckDB for local data storage.

Explore on DeepWiki

πŸ€– **[StorageEngine Deep Dive β†’](https://deepwiki.com/jaredmcfarland/mixpanel_data/5.3.2-storageengine)**

Ask questions about DuckDB integration, concurrency, or storage internals.

## Overview

The `StorageEngine` class wraps DuckDB to provide persistent local storage for fetched Mixpanel data. Understanding DuckDB's concurrency model helps avoid conflicts when running multiple `mp` commands.

## Storage Modes

Three storage modes are available:

| Mode           | Description                     | Use Case                             |
| -------------- | ------------------------------- | ------------------------------------ |
| **Persistent** | Database file on disk (default) | Production use, data preservation    |
| **Ephemeral**  | Temp file deleted on close      | Testing, one-off analysis            |
| **In-Memory**  | No file, RAM only               | Quick scripts, no persistence needed |

### Mode Selection

```
# Persistent (default) - stored at ~/.mp/data/{project_id}.db
ws = Workspace()

# Custom path
ws = Workspace(path="/path/to/my.db")

# Ephemeral - temp file, deleted on close
ws = Workspace(ephemeral=True)

# In-memory - no file at all
ws = Workspace(in_memory=True)
```

## DuckDB Concurrency Model

DuckDB uses a **single-writer, multiple-reader** concurrency model:

- **One write connection** can be active at a time per database file
- **Multiple read connections** can coexist with each other
- Read and write connections **cannot coexist** on the same file

This differs from client-server databases (PostgreSQL, MySQL), where a server process mediates all access.

### What This Means in Practice

| Scenario                                  | Result                                    |
| ----------------------------------------- | ----------------------------------------- |
| One `mp fetch` command                    | Works normally                            |
| Two `mp fetch` commands to same database  | Second command gets `DatabaseLockedError` |
| `mp fetch` + `mp query` to same database  | Query command gets `DatabaseLockedError`  |
| Two `mp query` commands to same database  | Both work (when no write lock is held)    |
| Two `mp inspect` commands (API-only)      | Both work (no database access)            |

## Lock Conflicts

When a second process tries to open a database that's already locked for writing, DuckDB raises an error. mixpanel_data catches this and raises a `DatabaseLockedError`:

```
Database locked: /home/user/.mp/data/12345.db
Another mp command may be running. Try again shortly.
```

### Common Causes

1. **Long-running fetch** β€” Large date ranges take time; other commands must wait
2. **Background processes** β€” A previous command didn't exit cleanly
3. **Multiple terminals** β€” Different shells running concurrent `mp` commands

### Resolution

1. **Wait** β€” Let the first operation complete
2. **Check for stuck processes** β€” `ps aux | grep mp` to find orphaned commands
3. **Use separate databases** β€” Specify a different `--path` for concurrent work

## Database Not Found

When opening a database in read-only mode, the file must already exist. If you run a read command (like `mp query` or `mp inspect tables`) before fetching any data, you'll get a `DatabaseNotFoundError`:

```
No data yet: /home/user/.mp/data/12345.db
Run 'mp fetch events' or 'mp fetch profiles' to create the database.
```

This is different from write mode, which creates the database file automatically.
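In scripts, both conditions can be handled explicitly and retried. A sketch β€” it assumes the two exceptions are importable from the package's exception hierarchy (`exceptions.py` in the package layout) and uses the `read_only` flag described under Technical Details below:

```
import time

import mixpanel_data as mp
from mixpanel_data.exceptions import (  # assumed import path
    DatabaseLockedError,
    DatabaseNotFoundError,
)

def sql_with_retry(query: str, attempts: int = 3, delay_s: float = 5.0):
    """Retry reads that lose the race against a concurrent mp fetch."""
    for attempt in range(attempts):
        try:
            with mp.Workspace(read_only=True) as ws:
                return ws.sql(query)
        except DatabaseLockedError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay_s)  # another mp command holds the write lock
        except DatabaseNotFoundError:
            raise SystemExit("No local data yet - run 'mp fetch events' first.")
```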
## Lazy Storage Initialization

To avoid unnecessary lock conflicts, `Workspace` initializes storage **lazily**:

```
# These DON'T open the database:
ws = Workspace()
ws.events()           # API call, no storage
ws.segmentation(...)  # API call, no storage
ws.funnels(...)       # API call, no storage

# These DO open the database (on first access):
ws.fetch_events(...)  # Writes to storage
ws.sql(...)           # Reads from storage
ws.tables()           # Reads metadata
```

This means API-only commands like `mp inspect events` never conflict with fetch operations, even when targeting the same project.

## Avoiding Conflicts

### Use Ephemeral Mode for Testing

```
# Won't conflict with your main database
mp fetch events --from 2025-01-01 --to 2025-01-07 --ephemeral
```

### Use Separate Paths for Parallel Work

```
# Terminal 1
mp fetch events --from 2025-01-01 --to 2025-06-30 --path ./h1.db

# Terminal 2 (parallel)
mp fetch events --from 2025-07-01 --to 2025-12-31 --path ./h2.db
```

### Combine into Single Commands

```
# Instead of two fetches, use date range in one command
mp fetch events --from 2025-01-01 --to 2025-12-31
```

### Stream Instead of Store

If you don't need to query the data repeatedly:

```
# No database, no locks
mp fetch events --from 2025-01-01 --stdout | process_events.py
```

## Connection Lifecycle

The `StorageEngine` manages its DuckDB connection:

```
# Workspace as context manager ensures cleanup
with Workspace() as ws:
    ws.fetch_events(from_date="2025-01-01", to_date="2025-01-31")
    df = ws.sql("SELECT * FROM events LIMIT 10")
# Connection closed, lock released

# Or explicit close
ws = Workspace()
try:
    ws.fetch_events(...)
finally:
    ws.close()
```

CLI commands handle this automatically.

## Technical Details

### Lock File

DuckDB creates a `.wal` (write-ahead log) file alongside the database during write operations. The lock is held for the duration of the connection.

### Process Isolation

Within a single Python process, multiple `Workspace` instances can share the same database file (DuckDB handles internal locking). Lock conflicts occur between **separate processes**.

### Read-Only Mode

Both `StorageEngine` and `Workspace` support a `read_only` parameter:

```
# Default: write access (matches DuckDB's native behavior)
ws = Workspace()  # read_only=False

# Explicit read-only for concurrent access
ws = Workspace(path="data.db", read_only=True)
```

Read-only connections:

- Allow multiple reader processes to access the database concurrently (when no write lock is held)
- Cannot execute INSERT, UPDATE, DELETE, or DDL statements
- Are still blocked by an active write lock (DuckDB write locks are exclusive)

The CLI uses this automatically:

- **Read commands** (`mp query`, `mp inspect tables`, etc.) use `read_only=True`
- **Write commands** (`mp fetch`, `mp inspect drop`) use `read_only=False`

**Note:** If an `mp fetch` is running, other commands will still be blocked until it completes. The benefit of read-only mode is enabling multiple concurrent read operations (e.g., two `mp query` commands).

## See Also

- [Design](https://jaredmcfarland.github.io/mixpanel_data/architecture/design/index.md) β€” Overall architecture
- [Data Model](https://jaredmcfarland.github.io/mixpanel_data/architecture/data-model/index.md) β€” Table schemas and query patterns
- [DuckDB Documentation](https://duckdb.org/docs/) β€” Full DuckDB reference