Workspace¶
The Workspace class is the unified entry point for all Mixpanel data operations.
Explore on DeepWiki
Ask questions about Workspace methods, explore usage patterns, or understand how services are orchestrated.
Overview¶
Workspace orchestrates four internal services:
- DiscoveryService — Schema exploration (events, properties, funnels, cohorts)
- FetcherService — Data ingestion from Mixpanel to DuckDB, or streaming without storage
- LiveQueryService — Real-time analytics queries
- StorageEngine — Local SQL query execution
Key Features¶
Parallel Fetching¶
For large date ranges, use parallel=True for up to 10x faster exports:
# Parallel fetch for large date ranges (recommended)
result = ws.fetch_events(
name="events",
from_date="2024-01-01",
to_date="2024-12-31",
parallel=True
)
print(f"Fetched {result.total_rows} rows in {result.duration_seconds:.1f}s")
Parallel fetching:
- Splits date ranges into 7-day chunks (configurable via
chunk_days) - Fetches chunks concurrently (configurable via
max_workers, default: 10) - Returns
ParallelFetchResultwith batch statistics and failure tracking - Supports progress callbacks via
on_batch_complete
Parallel Profile Fetching¶
For large profile datasets, use parallel=True for up to 5x faster exports:
# Parallel profile fetch for large datasets
result = ws.fetch_profiles(
name="users",
parallel=True,
max_workers=5 # Default and max is 5
)
print(f"Fetched {result.total_rows} profiles in {result.duration_seconds:.1f}s")
print(f"Pages: {result.successful_pages} succeeded, {result.failed_pages} failed")
Parallel profile fetching:
- Uses page-based parallelism with session IDs for consistency
- Fetches pages concurrently (configurable via
max_workers, default: 5, max: 5) - Returns
ParallelProfileResultwith page statistics and failure tracking - Supports progress callbacks via
on_page_complete
Append Mode¶
The fetch_events() and fetch_profiles() methods support an append parameter for incremental data loading:
# Initial fetch
ws.fetch_events(name="events", from_date="2025-01-01", to_date="2025-01-31")
# Append more data (duplicates are automatically skipped)
ws.fetch_events(name="events", from_date="2025-02-01", to_date="2025-02-28", append=True)
This is useful for:
- Incremental loading: Fetch data in chunks without creating multiple tables
- Crash recovery: Resume a failed fetch from the last successful point
- Extending date ranges: Add more historical or recent data to an existing table
- Retrying failed parallel batches: Use append mode to retry specific date ranges
Duplicate events (by insert_id) and profiles (by distinct_id) are automatically skipped via INSERT OR IGNORE.
Advanced Profile Fetching¶
The fetch_profiles() and stream_profiles() methods support advanced filtering options:
# Fetch specific users by ID
ws.fetch_profiles(name="vip_users", distinct_ids=["user_1", "user_2", "user_3"])
# Fetch group profiles (e.g., companies)
ws.fetch_profiles(name="companies", group_id="companies")
# Fetch users based on behavior
ws.fetch_profiles(
name="purchasers",
behaviors=[{"window": "30d", "name": "buyers", "event_selectors": [{"event": "Purchase"}]}],
where='(behaviors["buyers"] > 0)'
)
# Query historical profile state
ws.fetch_profiles(
name="profiles_last_week",
as_of_timestamp=int(time.time()) - 604800 # 7 days ago
)
# Get all users with cohort membership marked
ws.fetch_profiles(
name="cohort_analysis",
cohort_id="12345",
include_all_users=True
)
Parameter constraints:
distinct_idanddistinct_idsare mutually exclusivebehaviorsandcohort_idare mutually exclusiveinclude_all_usersrequirescohort_idto be set
Class Reference¶
mixpanel_data.Workspace
¶
Workspace(
account: str | None = None,
project_id: str | None = None,
region: str | None = None,
path: str | Path | None = None,
read_only: bool = False,
_config_manager: ConfigManager | None = None,
_api_client: MixpanelAPIClient | None = None,
_storage: StorageEngine | None = None,
)
Unified entry point for Mixpanel data operations.
The Workspace class is a facade that orchestrates all services: - DiscoveryService for schema exploration - FetcherService for data ingestion - LiveQueryService for real-time analytics - StorageEngine for local SQL queries
Examples:
Basic usage with credentials from config:
ws = Workspace()
ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
df = ws.sql("SELECT * FROM events LIMIT 10")
Ephemeral workspace for temporary analysis:
with Workspace.ephemeral() as ws:
ws.fetch_events(from_date="2024-01-01", to_date="2024-01-31")
total = ws.sql_scalar("SELECT COUNT(*) FROM events")
# Database automatically deleted
Query-only access to existing database:
Create a new Workspace with credentials and optional database path.
Credentials are resolved in priority order: 1. Environment variables (MP_USERNAME, MP_SECRET, MP_PROJECT_ID, MP_REGION) 2. Named account from config file (if account parameter specified) 3. Default account from config file
| PARAMETER | DESCRIPTION |
|---|---|
account
|
Named account from config file to use.
TYPE:
|
project_id
|
Override project ID from credentials.
TYPE:
|
region
|
Override region from credentials (us, eu, in).
TYPE:
|
path
|
Path to database file. If None, uses default location.
TYPE:
|
read_only
|
If True, open database in read-only mode allowing concurrent reads. Defaults to False (write access).
TYPE:
|
_config_manager
|
Injected ConfigManager for testing.
TYPE:
|
_api_client
|
Injected MixpanelAPIClient for testing.
TYPE:
|
_storage
|
Injected StorageEngine for testing.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If no credentials can be resolved. |
AccountNotFoundError
|
If named account doesn't exist. |
Source code in src/mixpanel_data/workspace.py
184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 | |
connection
property
¶
Direct access to the DuckDB connection.
Use this for operations not covered by the Workspace API.
| RETURNS | DESCRIPTION |
|---|---|
DuckDBPyConnection
|
The underlying DuckDB connection. |
db_path
property
¶
Path to the DuckDB database file.
Returns the filesystem path where data is stored. Useful for:
- Knowing where your data lives
- Opening the same database later with Workspace.open(path)
- Debugging and logging
| RETURNS | DESCRIPTION |
|---|---|
Path | None
|
The database file path, or None for in-memory workspaces. |
Example
Save the path for later use::
ws = mp.Workspace()
path = ws.db_path
ws.close()
# Later, reopen the same database
ws = mp.Workspace.open(path)
api
property
¶
Direct access to the Mixpanel API client.
Use this escape hatch for Mixpanel API operations not covered by the Workspace class. The client handles authentication automatically.
The client provides
request(method, url, **kwargs): Make authenticated requests to any Mixpanel API endpoint.project_id: The configured project ID for constructing URLs.region: The configured region ('us', 'eu', or 'in').
| RETURNS | DESCRIPTION |
|---|---|
MixpanelAPIClient
|
The underlying MixpanelAPIClient. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Example
Fetch event schema from the Lexicon Schemas API::
import mixpanel_data as mp
from urllib.parse import quote
ws = mp.Workspace()
client = ws.api
# Build the URL with proper encoding
event_name = quote("Added To Cart", safe="")
url = f"https://mixpanel.com/api/app/projects/{client.project_id}/schemas/event/{event_name}"
# Make the authenticated request
schema = client.request("GET", url)
print(schema)
ephemeral
classmethod
¶
ephemeral(
account: str | None = None,
project_id: str | None = None,
region: str | None = None,
_config_manager: ConfigManager | None = None,
_api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]
Create a temporary workspace that auto-deletes on exit.
| PARAMETER | DESCRIPTION |
|---|---|
account
|
Named account from config file to use.
TYPE:
|
project_id
|
Override project ID from credentials.
TYPE:
|
region
|
Override region from credentials.
TYPE:
|
_config_manager
|
Injected ConfigManager for testing.
TYPE:
|
_api_client
|
Injected MixpanelAPIClient for testing.
TYPE:
|
| YIELDS | DESCRIPTION |
|---|---|
Workspace
|
A workspace with temporary database.
TYPE::
|
Example
Source code in src/mixpanel_data/workspace.py
memory
classmethod
¶
memory(
account: str | None = None,
project_id: str | None = None,
region: str | None = None,
_config_manager: ConfigManager | None = None,
_api_client: MixpanelAPIClient | None = None,
) -> Iterator[Workspace]
Create a workspace with true in-memory database.
The database exists only in RAM with zero disk footprint. All data is lost when the context manager exits.
Best for: - Small datasets where zero disk footprint is required - Unit tests without filesystem side effects - Quick exploratory queries
For large datasets, prefer ephemeral() which benefits from DuckDB's compression (can be 8x faster for large workloads).
| PARAMETER | DESCRIPTION |
|---|---|
account
|
Named account from config file to use.
TYPE:
|
project_id
|
Override project ID from credentials.
TYPE:
|
region
|
Override region from credentials.
TYPE:
|
_config_manager
|
Injected ConfigManager for testing.
TYPE:
|
_api_client
|
Injected MixpanelAPIClient for testing.
TYPE:
|
| YIELDS | DESCRIPTION |
|---|---|
Workspace
|
A workspace with in-memory database.
TYPE::
|
Example
Source code in src/mixpanel_data/workspace.py
open
classmethod
¶
Open an existing database for query-only access.
This method opens a database without requiring API credentials. Discovery, fetching, and live query methods will be unavailable.
| PARAMETER | DESCRIPTION |
|---|---|
path
|
Path to existing database file.
TYPE:
|
read_only
|
If True (default), open in read-only mode allowing concurrent reads. Set to False for write access.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Workspace
|
A workspace with access to stored data.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
FileNotFoundError
|
If database file doesn't exist. |
Source code in src/mixpanel_data/workspace.py
close
¶
Close all resources (database connection, HTTP client).
This method is idempotent and safe to call multiple times.
Source code in src/mixpanel_data/workspace.py
test_credentials
staticmethod
¶
Test account credentials by making a lightweight API call.
This method verifies that credentials are valid and can access the Mixpanel API. It's useful for validating configuration before attempting more expensive operations.
| PARAMETER | DESCRIPTION |
|---|---|
account
|
Named account to test. If None, tests the default account or credentials from environment variables.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict[str, Any]
|
Dict containing: - success: bool - Whether the test succeeded - account: str | None - Account name tested - project_id: str - Project ID from credentials - region: str - Region from credentials - events_found: int - Number of events found (validation metric) |
| RAISES | DESCRIPTION |
|---|---|
AccountNotFoundError
|
If named account doesn't exist. |
AuthenticationError
|
If credentials are invalid. |
ConfigError
|
If no credentials can be resolved. |
Example
Source code in src/mixpanel_data/workspace.py
events
¶
List all event names in the Mixpanel project.
Results are cached for the lifetime of the Workspace.
| RETURNS | DESCRIPTION |
|---|---|
list[str]
|
Alphabetically sorted list of event names. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
AuthenticationError
|
If credentials are invalid. |
Source code in src/mixpanel_data/workspace.py
properties
¶
List all property names for an event.
Results are cached per event for the lifetime of the Workspace.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name to get properties for.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[str]
|
Alphabetically sorted list of property names. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
property_values
¶
Get sample values for a property.
Results are cached per (property, event, limit) for the lifetime of the Workspace.
| PARAMETER | DESCRIPTION |
|---|---|
property_name
|
Property to get values for.
TYPE:
|
event
|
Optional event to filter by.
TYPE:
|
limit
|
Maximum number of values to return.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[str]
|
List of sample property values as strings. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
funnels
¶
List saved funnels in the Mixpanel project.
Results are cached for the lifetime of the Workspace.
| RETURNS | DESCRIPTION |
|---|---|
list[FunnelInfo]
|
List of FunnelInfo objects (funnel_id, name). |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
cohorts
¶
List saved cohorts in the Mixpanel project.
Results are cached for the lifetime of the Workspace.
| RETURNS | DESCRIPTION |
|---|---|
list[SavedCohort]
|
List of SavedCohort objects. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
list_bookmarks
¶
List all saved reports (bookmarks) in the project.
Retrieves metadata for all saved Insights, Funnel, Retention, and Flows reports in the project.
| PARAMETER | DESCRIPTION |
|---|---|
bookmark_type
|
Optional filter by report type. Valid values are 'insights', 'funnels', 'retention', 'flows', 'launch-analysis'. If None, returns all bookmark types.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[BookmarkInfo]
|
List of BookmarkInfo objects with report metadata. |
list[BookmarkInfo]
|
Empty list if no bookmarks exist. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
QueryError
|
Permission denied or invalid type parameter. |
Source code in src/mixpanel_data/workspace.py
top_events
¶
top_events(
*,
type: Literal["general", "average", "unique"] = "general",
limit: int | None = None,
) -> list[TopEvent]
Get today's most active events.
This method is NOT cached (returns real-time data).
| PARAMETER | DESCRIPTION |
|---|---|
type
|
Counting method (general, average, unique).
TYPE:
|
limit
|
Maximum number of events to return.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[TopEvent]
|
List of TopEvent objects (event, count, percent_change). |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
lexicon_schemas
¶
List Lexicon schemas in the project.
Retrieves documented event and profile property schemas from the Mixpanel Lexicon (data dictionary).
Results are cached for the lifetime of the Workspace.
| PARAMETER | DESCRIPTION |
|---|---|
entity_type
|
Optional filter by type ("event" or "profile"). If None, returns all schemas.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[LexiconSchema]
|
Alphabetically sorted list of LexiconSchema objects. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
AuthenticationError
|
If credentials are invalid. |
Note
The Lexicon API has a strict 5 requests/minute rate limit. Caching helps avoid hitting this limit; call clear_discovery_cache() only when fresh data is needed.
Source code in src/mixpanel_data/workspace.py
lexicon_schema
¶
Get a single Lexicon schema by entity type and name.
Retrieves a documented schema for a specific event or profile property from the Mixpanel Lexicon (data dictionary).
Results are cached for the lifetime of the Workspace.
| PARAMETER | DESCRIPTION |
|---|---|
entity_type
|
Entity type ("event" or "profile").
TYPE:
|
name
|
Entity name.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
LexiconSchema
|
LexiconSchema for the specified entity. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
AuthenticationError
|
If credentials are invalid. |
QueryError
|
If schema not found. |
Note
The Lexicon API has a strict 5 requests/minute rate limit. Caching helps avoid hitting this limit; call clear_discovery_cache() only when fresh data is needed.
Source code in src/mixpanel_data/workspace.py
clear_discovery_cache
¶
Clear cached discovery results.
Subsequent discovery calls will fetch fresh data from the API.
fetch_events
¶
fetch_events(
name: str = "events",
*,
from_date: str,
to_date: str,
events: list[str] | None = None,
where: str | None = None,
limit: int | None = None,
progress: bool = True,
append: bool = False,
batch_size: int = 1000,
parallel: bool = False,
max_workers: int | None = None,
on_batch_complete: Callable[[BatchProgress], None] | None = None,
chunk_days: int = 7,
) -> FetchResult | ParallelFetchResult
Fetch events from Mixpanel and store in local database.
Note
This is a potentially long-running operation that streams data from
Mixpanel's Export API. For large date ranges, use parallel=True
for significantly faster exports (up to 10x speedup).
| PARAMETER | DESCRIPTION |
|---|---|
name
|
Table name to create or append to (default: "events").
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
events
|
Optional list of event names to filter.
TYPE:
|
where
|
Optional WHERE clause for filtering.
TYPE:
|
limit
|
Optional maximum number of events to return (max 100000).
TYPE:
|
progress
|
Show progress bar (default: True).
TYPE:
|
append
|
If True, append to existing table. If False (default), create new.
TYPE:
|
batch_size
|
Number of rows per INSERT/COMMIT cycle. Controls the memory/IO tradeoff: smaller values use less memory but more disk IO; larger values use more memory but less IO. Default: 1000. Valid range: 100-100000.
TYPE:
|
parallel
|
If True, use parallel fetching with multiple threads. Splits date range into 7-day chunks and fetches concurrently. Enables export of date ranges exceeding 100 days. Default: False.
TYPE:
|
max_workers
|
Maximum concurrent fetch threads when parallel=True. Default: 10. Higher values may hit Mixpanel rate limits. Ignored when parallel=False.
TYPE:
|
on_batch_complete
|
Callback invoked when each batch completes during parallel fetch. Receives BatchProgress with status. Useful for custom progress reporting. Ignored when parallel=False.
TYPE:
|
chunk_days
|
Days per chunk for parallel date range splitting. Default: 7. Valid range: 1-100. Smaller values create more parallel batches but may increase API overhead. Ignored when parallel=False.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
FetchResult | ParallelFetchResult
|
FetchResult when parallel=False, ParallelFetchResult when parallel=True. |
FetchResult | ParallelFetchResult
|
ParallelFetchResult includes per-batch statistics and any failure info. |
| RAISES | DESCRIPTION |
|---|---|
TableExistsError
|
If table exists and append=False. |
TableNotFoundError
|
If table doesn't exist and append=True. |
ConfigError
|
If API credentials not available. |
AuthenticationError
|
If credentials are invalid. |
ValueError
|
If batch_size is outside valid range (100-100000). |
ValueError
|
If limit is outside valid range (1-100000). |
ValueError
|
If max_workers is not positive. |
ValueError
|
If chunk_days is not in range 1-100. |
Example
# Sequential fetch (default)
result = ws.fetch_events(
name="events",
from_date="2024-01-01",
to_date="2024-01-31",
)
# Parallel fetch for large date ranges
result = ws.fetch_events(
name="events_q4",
from_date="2024-10-01",
to_date="2024-12-31",
parallel=True,
)
# With custom progress callback
def on_batch(progress: BatchProgress) -> None:
print(f"Batch {progress.batch_index + 1}/{progress.total_batches}")
result = ws.fetch_events(
name="events",
from_date="2024-01-01",
to_date="2024-03-31",
parallel=True,
on_batch_complete=on_batch,
)
Source code in src/mixpanel_data/workspace.py
825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 | |
fetch_profiles
¶
fetch_profiles(
name: str = "profiles",
*,
where: str | None = None,
cohort_id: str | None = None,
output_properties: list[str] | None = None,
progress: bool = True,
append: bool = False,
batch_size: int = 1000,
distinct_id: str | None = None,
distinct_ids: list[str] | None = None,
group_id: str | None = None,
behaviors: list[dict[str, Any]] | None = None,
as_of_timestamp: int | None = None,
include_all_users: bool = False,
parallel: bool = False,
max_workers: int | None = None,
on_page_complete: Callable[[ProfileProgress], None] | None = None,
) -> FetchResult | ParallelProfileResult
Fetch user profiles from Mixpanel and store in local database.
Note
This is a potentially long-running operation that streams data from
Mixpanel's Engage API. For large profile sets, use parallel=True
for up to 5x faster exports.
| PARAMETER | DESCRIPTION |
|---|---|
name
|
Table name to create or append to (default: "profiles").
TYPE:
|
where
|
Optional WHERE clause for filtering.
TYPE:
|
cohort_id
|
Optional cohort ID to filter by. Only profiles that are members of this cohort will be returned.
TYPE:
|
output_properties
|
Optional list of property names to include in the response. If None, all properties are returned.
TYPE:
|
progress
|
Show progress bar (default: True).
TYPE:
|
append
|
If True, append to existing table. If False (default), create new.
TYPE:
|
batch_size
|
Number of rows per INSERT/COMMIT cycle. Controls the memory/IO tradeoff: smaller values use less memory but more disk IO; larger values use more memory but less IO. Default: 1000. Valid range: 100-100000.
TYPE:
|
distinct_id
|
Optional single user ID to fetch. Mutually exclusive with distinct_ids.
TYPE:
|
distinct_ids
|
Optional list of user IDs to fetch. Mutually exclusive with distinct_id. Duplicates are automatically removed.
TYPE:
|
group_id
|
Optional group type identifier (e.g., "companies") to fetch group profiles instead of user profiles.
TYPE:
|
behaviors
|
Optional list of behavioral filters. Each dict should have
'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors'
(list of {"event": "Name"}). Use with
TYPE:
|
as_of_timestamp
|
Optional Unix timestamp to query profile state at a specific point in time. Must be in the past.
TYPE:
|
include_all_users
|
If True, include all users and mark cohort membership. Only valid when cohort_id is provided.
TYPE:
|
parallel
|
If True, use parallel fetching with multiple threads. Uses page-based parallelism for concurrent profile fetching. Enables up to 5x faster exports. Default: False.
TYPE:
|
max_workers
|
Maximum concurrent fetch threads when parallel=True. Default: 5, capped at 5. Ignored when parallel=False.
TYPE:
|
on_page_complete
|
Callback invoked when each page completes during parallel fetch. Receives ProfileProgress with status. Useful for custom progress reporting. Ignored when parallel=False.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
FetchResult | ParallelProfileResult
|
FetchResult when parallel=False, ParallelProfileResult when parallel=True. |
FetchResult | ParallelProfileResult
|
ParallelProfileResult includes per-page statistics and any failure info. |
| RAISES | DESCRIPTION |
|---|---|
TableExistsError
|
If table exists and append=False. |
TableNotFoundError
|
If table doesn't exist and append=True. |
ConfigError
|
If API credentials not available. |
ValueError
|
If batch_size is outside valid range (100-100000) or mutually exclusive parameters are provided. |
Source code in src/mixpanel_data/workspace.py
979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 | |
stream_events
¶
stream_events(
*,
from_date: str,
to_date: str,
events: list[str] | None = None,
where: str | None = None,
limit: int | None = None,
raw: bool = False,
) -> Iterator[dict[str, Any]]
Stream events directly from Mixpanel API without storing.
Yields events one at a time as they are received from the API. No database files or tables are created.
| PARAMETER | DESCRIPTION |
|---|---|
from_date
|
Start date inclusive (YYYY-MM-DD format).
TYPE:
|
to_date
|
End date inclusive (YYYY-MM-DD format).
TYPE:
|
events
|
Optional list of event names to filter. If None, all events returned.
TYPE:
|
where
|
Optional Mixpanel filter expression (e.g., 'properties["country"]=="US"').
TYPE:
|
limit
|
Optional maximum number of events to return (max 100000).
TYPE:
|
raw
|
If True, return events in raw Mixpanel API format. If False (default), return normalized format with datetime objects.
TYPE:
|
| YIELDS | DESCRIPTION |
|---|---|
dict[str, Any]
|
dict[str, Any]: Event dictionaries in normalized or raw format. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials are not available. |
AuthenticationError
|
If credentials are invalid. |
RateLimitError
|
If rate limit exceeded after max retries. |
QueryError
|
If filter expression is invalid. |
ValueError
|
If limit is outside valid range (1-100000). |
Example
ws = Workspace()
for event in ws.stream_events(from_date="2024-01-01", to_date="2024-01-31"):
process(event)
ws.close()
With raw format:
Source code in src/mixpanel_data/workspace.py
1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 | |
stream_profiles
¶
stream_profiles(
*,
where: str | None = None,
cohort_id: str | None = None,
output_properties: list[str] | None = None,
raw: bool = False,
distinct_id: str | None = None,
distinct_ids: list[str] | None = None,
group_id: str | None = None,
behaviors: list[dict[str, Any]] | None = None,
as_of_timestamp: int | None = None,
include_all_users: bool = False,
) -> Iterator[dict[str, Any]]
Stream user profiles directly from Mixpanel API without storing.
Yields profiles one at a time as they are received from the API. No database files or tables are created.
| PARAMETER | DESCRIPTION |
|---|---|
where
|
Optional Mixpanel filter expression for profile properties.
TYPE:
|
cohort_id
|
Optional cohort ID to filter by. Only profiles that are members of this cohort will be returned.
TYPE:
|
output_properties
|
Optional list of property names to include in the response. If None, all properties are returned.
TYPE:
|
raw
|
If True, return profiles in raw Mixpanel API format. If False (default), return normalized format.
TYPE:
|
distinct_id
|
Optional single user ID to fetch. Mutually exclusive with distinct_ids.
TYPE:
|
distinct_ids
|
Optional list of user IDs to fetch. Mutually exclusive with distinct_id. Duplicates are automatically removed.
TYPE:
|
group_id
|
Optional group type identifier (e.g., "companies") to fetch group profiles instead of user profiles.
TYPE:
|
behaviors
|
Optional list of behavioral filters. Each dict should have
'window' (e.g., "30d"), 'name' (identifier), and 'event_selectors'
(list of {"event": "Name"}). Use with
TYPE:
|
as_of_timestamp
|
Optional Unix timestamp to query profile state at a specific point in time. Must be in the past.
TYPE:
|
include_all_users
|
If True, include all users and mark cohort membership. Only valid when cohort_id is provided.
TYPE:
|
| YIELDS | DESCRIPTION |
|---|---|
dict[str, Any]
|
dict[str, Any]: Profile dictionaries in normalized or raw format. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials are not available. |
AuthenticationError
|
If credentials are invalid. |
RateLimitError
|
If rate limit exceeded after max retries. |
ValueError
|
If mutually exclusive parameters are provided. |
Example
Filter to premium users:
Filter by cohort and select specific properties:
for profile in ws.stream_profiles(
cohort_id="12345",
output_properties=["$email", "$name"]
):
send_email(profile)
Fetch specific users by ID:
Fetch group profiles:
Source code in src/mixpanel_data/workspace.py
1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 | |
sql
¶
Execute SQL query and return results as DataFrame.
| PARAMETER | DESCRIPTION |
|---|---|
query
|
SQL query string.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
pandas DataFrame with query results. |
| RAISES | DESCRIPTION |
|---|---|
QueryError
|
If query is invalid. |
Source code in src/mixpanel_data/workspace.py
sql_scalar
¶
Execute SQL query and return single scalar value.
| PARAMETER | DESCRIPTION |
|---|---|
query
|
SQL query that returns a single value.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Any
|
The scalar result (int, float, str, etc.). |
| RAISES | DESCRIPTION |
|---|---|
QueryError
|
If query is invalid or returns multiple values. |
Source code in src/mixpanel_data/workspace.py
sql_rows
¶
Execute SQL query and return structured result with column metadata.
| PARAMETER | DESCRIPTION |
|---|---|
query
|
SQL query string.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
SQLResult
|
SQLResult with column names and row tuples. |
| RAISES | DESCRIPTION |
|---|---|
QueryError
|
If query is invalid. |
Example
Source code in src/mixpanel_data/workspace.py
segmentation
¶
segmentation(
event: str,
*,
from_date: str,
to_date: str,
on: str | None = None,
unit: Literal["day", "week", "month"] = "day",
where: str | None = None,
) -> SegmentationResult
Run a segmentation query against Mixpanel API.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name to query.
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
on
|
Optional property to segment by.
TYPE:
|
unit
|
Time unit for aggregation.
TYPE:
|
where
|
Optional WHERE clause.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
SegmentationResult
|
SegmentationResult with time-series data. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
funnel
¶
funnel(
funnel_id: int,
*,
from_date: str,
to_date: str,
unit: str | None = None,
on: str | None = None,
) -> FunnelResult
Run a funnel analysis query.
| PARAMETER | DESCRIPTION |
|---|---|
funnel_id
|
ID of saved funnel.
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
unit
|
Optional time unit.
TYPE:
|
on
|
Optional property to segment by.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
FunnelResult
|
FunnelResult with step conversion rates. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
retention
¶
retention(
*,
born_event: str,
return_event: str,
from_date: str,
to_date: str,
born_where: str | None = None,
return_where: str | None = None,
interval: int = 1,
interval_count: int = 10,
unit: Literal["day", "week", "month"] = "day",
) -> RetentionResult
Run a retention analysis query.
| PARAMETER | DESCRIPTION |
|---|---|
born_event
|
Event that defines cohort entry.
TYPE:
|
return_event
|
Event that defines return.
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
born_where
|
Optional filter for born event.
TYPE:
|
return_where
|
Optional filter for return event.
TYPE:
|
interval
|
Retention interval.
TYPE:
|
interval_count
|
Number of intervals.
TYPE:
|
unit
|
Time unit.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
RetentionResult
|
RetentionResult with cohort retention data. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
jql
¶
Execute a custom JQL script.
| PARAMETER | DESCRIPTION |
|---|---|
script
|
JQL script code.
TYPE:
|
params
|
Optional parameters to pass to script.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
JQLResult
|
JQLResult with raw query results. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
JQLSyntaxError
|
If script has syntax errors. |
Source code in src/mixpanel_data/workspace.py
event_counts
¶
event_counts(
events: list[str],
*,
from_date: str,
to_date: str,
type: Literal["general", "unique", "average"] = "general",
unit: Literal["day", "week", "month"] = "day",
) -> EventCountsResult
Get event counts for multiple events.
| PARAMETER | DESCRIPTION |
|---|---|
events
|
List of event names.
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
type
|
Counting method.
TYPE:
|
unit
|
Time unit.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
EventCountsResult
|
EventCountsResult with time-series per event. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
property_counts
¶
property_counts(
event: str,
property_name: str,
*,
from_date: str,
to_date: str,
type: Literal["general", "unique", "average"] = "general",
unit: Literal["day", "week", "month"] = "day",
values: list[str] | None = None,
limit: int | None = None,
) -> PropertyCountsResult
Get event counts broken down by property values.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name.
TYPE:
|
property_name
|
Property to break down by.
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
type
|
Counting method.
TYPE:
|
unit
|
Time unit.
TYPE:
|
values
|
Optional list of property values to include.
TYPE:
|
limit
|
Maximum number of property values.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
PropertyCountsResult
|
PropertyCountsResult with time-series per property value. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
activity_feed
¶
activity_feed(
distinct_ids: list[str],
*,
from_date: str | None = None,
to_date: str | None = None,
) -> ActivityFeedResult
Get activity feed for specific users.
| PARAMETER | DESCRIPTION |
|---|---|
distinct_ids
|
List of user identifiers.
TYPE:
|
from_date
|
Optional start date filter.
TYPE:
|
to_date
|
Optional end date filter.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ActivityFeedResult
|
ActivityFeedResult with user events. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
query_saved_report
¶
Query a saved report (Insights, Retention, or Funnel).
Executes a saved report by its bookmark ID. The report type is automatically detected from the response headers.
| PARAMETER | DESCRIPTION |
|---|---|
bookmark_id
|
ID of saved report (from list_bookmarks or Mixpanel URL).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
SavedReportResult
|
SavedReportResult with report data and report_type property. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
QueryError
|
If bookmark_id is invalid or report not found. |
Source code in src/mixpanel_data/workspace.py
query_flows
¶
Query a saved Flows report.
Executes a saved Flows report by its bookmark ID, returning step data, breakdowns, and conversion rates.
| PARAMETER | DESCRIPTION |
|---|---|
bookmark_id
|
ID of saved flows report (from list_bookmarks or Mixpanel URL).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
FlowsResult
|
FlowsResult with steps, breakdowns, and conversion rate. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
QueryError
|
If bookmark_id is invalid or report not found. |
Source code in src/mixpanel_data/workspace.py
frequency
¶
frequency(
*,
from_date: str,
to_date: str,
unit: Literal["day", "week", "month"] = "day",
addiction_unit: Literal["hour", "day"] = "hour",
event: str | None = None,
where: str | None = None,
) -> FrequencyResult
Analyze event frequency distribution.
| PARAMETER | DESCRIPTION |
|---|---|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
unit
|
Overall time unit.
TYPE:
|
addiction_unit
|
Measurement granularity.
TYPE:
|
event
|
Optional event filter.
TYPE:
|
where
|
Optional WHERE clause.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
FrequencyResult
|
FrequencyResult with frequency distribution. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
segmentation_numeric
¶
segmentation_numeric(
event: str,
*,
from_date: str,
to_date: str,
on: str,
unit: Literal["hour", "day"] = "day",
where: str | None = None,
type: Literal["general", "unique", "average"] = "general",
) -> NumericBucketResult
Bucket events by numeric property ranges.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name.
TYPE:
|
from_date
|
Start date.
TYPE:
|
to_date
|
End date.
TYPE:
|
on
|
Numeric property expression.
TYPE:
|
unit
|
Time unit.
TYPE:
|
where
|
Optional filter.
TYPE:
|
type
|
Counting method.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
NumericBucketResult
|
NumericBucketResult with bucketed data. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
segmentation_sum
¶
segmentation_sum(
event: str,
*,
from_date: str,
to_date: str,
on: str,
unit: Literal["hour", "day"] = "day",
where: str | None = None,
) -> NumericSumResult
Calculate sum of numeric property over time.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name.
TYPE:
|
from_date
|
Start date.
TYPE:
|
to_date
|
End date.
TYPE:
|
on
|
Numeric property expression.
TYPE:
|
unit
|
Time unit.
TYPE:
|
where
|
Optional filter.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
NumericSumResult
|
NumericSumResult with sum values per period. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
segmentation_average
¶
segmentation_average(
event: str,
*,
from_date: str,
to_date: str,
on: str,
unit: Literal["hour", "day"] = "day",
where: str | None = None,
) -> NumericAverageResult
Calculate average of numeric property over time.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name.
TYPE:
|
from_date
|
Start date.
TYPE:
|
to_date
|
End date.
TYPE:
|
on
|
Numeric property expression.
TYPE:
|
unit
|
Time unit.
TYPE:
|
where
|
Optional filter.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
NumericAverageResult
|
NumericAverageResult with average values per period. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
Source code in src/mixpanel_data/workspace.py
property_distribution
¶
property_distribution(
event: str, property: str, *, from_date: str, to_date: str, limit: int = 20
) -> PropertyDistributionResult
Get distribution of values for a property.
Uses JQL to count occurrences of each property value, returning counts and percentages sorted by frequency.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name to analyze.
TYPE:
|
property
|
Property name to get distribution for.
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
limit
|
Maximum number of values to return. Default: 20.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
PropertyDistributionResult
|
PropertyDistributionResult with value counts and percentages. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
QueryError
|
Script execution error. |
Example
Source code in src/mixpanel_data/workspace.py
numeric_summary
¶
numeric_summary(
event: str,
property: str,
*,
from_date: str,
to_date: str,
percentiles: list[int] | None = None,
) -> NumericPropertySummaryResult
Get statistical summary for a numeric property.
Uses JQL to compute count, min, max, avg, stddev, and percentiles for a numeric property.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name to analyze.
TYPE:
|
property
|
Numeric property name.
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
percentiles
|
Percentiles to compute. Default: [25, 50, 75, 90, 95, 99].
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
NumericPropertySummaryResult
|
NumericPropertySummaryResult with statistics. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
QueryError
|
Script execution error or non-numeric property. |
Example
Source code in src/mixpanel_data/workspace.py
daily_counts
¶
daily_counts(
*, from_date: str, to_date: str, events: list[str] | None = None
) -> DailyCountsResult
Get daily event counts.
Uses JQL to count events by day, optionally filtered to specific events.
| PARAMETER | DESCRIPTION |
|---|---|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
events
|
Optional list of events to count. None = all events.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DailyCountsResult
|
DailyCountsResult with date/event/count tuples. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
QueryError
|
Script execution error. |
Example
Source code in src/mixpanel_data/workspace.py
engagement_distribution
¶
engagement_distribution(
*,
from_date: str,
to_date: str,
events: list[str] | None = None,
buckets: list[int] | None = None,
) -> EngagementDistributionResult
Get user engagement distribution.
Uses JQL to bucket users by their event count, showing how many users performed N events.
| PARAMETER | DESCRIPTION |
|---|---|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
events
|
Optional list of events to count. None = all events.
TYPE:
|
buckets
|
Bucket boundaries. Default: [1, 2, 5, 10, 25, 50, 100].
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
EngagementDistributionResult
|
EngagementDistributionResult with user counts per bucket. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
QueryError
|
Script execution error. |
Example
Source code in src/mixpanel_data/workspace.py
property_coverage
¶
property_coverage(
event: str, properties: list[str], *, from_date: str, to_date: str
) -> PropertyCoverageResult
Get property coverage statistics.
Uses JQL to count how often each property is defined (non-null) vs undefined for the specified event.
| PARAMETER | DESCRIPTION |
|---|---|
event
|
Event name to analyze.
TYPE:
|
properties
|
List of property names to check.
TYPE:
|
from_date
|
Start date (YYYY-MM-DD).
TYPE:
|
to_date
|
End date (YYYY-MM-DD).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
PropertyCoverageResult
|
PropertyCoverageResult with coverage statistics per property. |
| RAISES | DESCRIPTION |
|---|---|
ConfigError
|
If API credentials not available. |
QueryError
|
Script execution error. |
Example
Source code in src/mixpanel_data/workspace.py
info
¶
Get metadata about this workspace.
| RETURNS | DESCRIPTION |
|---|---|
WorkspaceInfo
|
WorkspaceInfo with path, project_id, region, account, tables, size. |
Source code in src/mixpanel_data/workspace.py
tables
¶
List tables in the local database.
| RETURNS | DESCRIPTION |
|---|---|
list[TableInfo]
|
List of TableInfo objects (name, type, row_count, fetched_at). |
table_schema
¶
Get schema for a table in the local database.
| PARAMETER | DESCRIPTION |
|---|---|
table
|
Table name.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
TableSchema
|
TableSchema with column definitions. |
| RAISES | DESCRIPTION |
|---|---|
TableNotFoundError
|
If table doesn't exist. |
Source code in src/mixpanel_data/workspace.py
drop
¶
Drop specified tables.
| PARAMETER | DESCRIPTION |
|---|---|
*names
|
Table names to drop.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
TableNotFoundError
|
If any table doesn't exist. |
Source code in src/mixpanel_data/workspace.py
drop_all
¶
Drop all tables from the workspace, optionally filtered by type.
Permanently removes all tables and their data. When used with the type parameter, only tables matching the specified type are dropped.
| PARAMETER | DESCRIPTION |
|---|---|
type
|
Optional table type filter. Valid values: "events", "profiles". If None, all tables are dropped regardless of type.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
TableNotFoundError
|
If a table cannot be dropped (rare in practice). |
Example
Drop all event tables:
Drop all tables:
Source code in src/mixpanel_data/workspace.py
sample
¶
Return random sample rows from a table.
Uses DuckDB's reservoir sampling for representative results. Unlike LIMIT, sampling returns rows from throughout the table.
| PARAMETER | DESCRIPTION |
|---|---|
table
|
Table name to sample from.
TYPE:
|
n
|
Number of rows to return (default: 10).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
DataFrame with n random rows. If table has fewer than n rows, |
DataFrame
|
returns all available rows. |
| RAISES | DESCRIPTION |
|---|---|
TableNotFoundError
|
If table doesn't exist. |
Example
Source code in src/mixpanel_data/workspace.py
summarize
¶
Get statistical summary of all columns in a table.
Uses DuckDB's SUMMARIZE command to compute min/max, quartiles, null percentage, and approximate distinct counts for each column.
| PARAMETER | DESCRIPTION |
|---|---|
table
|
Table name to summarize.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
SummaryResult
|
SummaryResult with per-column statistics and total row count. |
| RAISES | DESCRIPTION |
|---|---|
TableNotFoundError
|
If table doesn't exist. |
Example
Source code in src/mixpanel_data/workspace.py
event_breakdown
¶
Analyze event distribution in a table.
Computes per-event counts, unique users, date ranges, and percentage of total for each event type.
| PARAMETER | DESCRIPTION |
|---|---|
table
|
Table name containing events. Must have columns: event_name, event_time, distinct_id.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
EventBreakdownResult
|
EventBreakdownResult with per-event statistics. |
| RAISES | DESCRIPTION |
|---|---|
TableNotFoundError
|
If table doesn't exist. |
QueryError
|
If table lacks required columns (event_name, event_time, distinct_id). Error message lists the specific missing columns. |
Example
Source code in src/mixpanel_data/workspace.py
2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 | |
property_keys
¶
List all JSON property keys in a table.
Extracts distinct keys from the 'properties' JSON column. Useful for discovering queryable fields in event properties.
| PARAMETER | DESCRIPTION |
|---|---|
table
|
Table name with a 'properties' JSON column.
TYPE:
|
event
|
Optional event name to filter by. If provided, only returns keys present in events of that type.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list[str]
|
Alphabetically sorted list of property key names. |
list[str]
|
Empty list if no keys found. |
| RAISES | DESCRIPTION |
|---|---|
TableNotFoundError
|
If table doesn't exist. |
QueryError
|
If table lacks 'properties' column. |
Example
All keys across all events:
Keys for specific event type:
Source code in src/mixpanel_data/workspace.py
2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 | |
column_stats
¶
Get detailed statistics for a single column.
Performs deep analysis including null rates, cardinality, top values, and numeric statistics (for numeric columns).
The column parameter supports JSON path expressions for
analyzing properties stored in JSON columns:
- properties->>'$.country' for string extraction
- CAST(properties->>'$.amount' AS DOUBLE) for numeric
| PARAMETER | DESCRIPTION |
|---|---|
table
|
Table name to analyze.
TYPE:
|
column
|
Column name or expression to analyze.
TYPE:
|
top_n
|
Number of top values to return (default: 10).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ColumnStatsResult
|
ColumnStatsResult with comprehensive column statistics. |
| RAISES | DESCRIPTION |
|---|---|
TableNotFoundError
|
If table doesn't exist. |
QueryError
|
If column expression is invalid. |
Example
Analyze standard column:
stats = ws.column_stats("events", "event_name")
stats.unique_count # 47
stats.top_values[:3] # [('Page View', 45230), ...]
Analyze JSON property:
Security
The column parameter is interpolated directly into SQL queries to allow expression syntax. Only use with trusted input from developers or AI coding agents. Do not pass untrusted user input.
Source code in src/mixpanel_data/workspace.py
2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 | |