Data Discovery¶
Explore your Mixpanel project's schema before writing queries. Discovery results are cached for the session.
Listing Events¶
Get all event names in your project:
Events are returned sorted alphabetically.
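A minimal sketch of the call shape, using a hypothetical stand-in class so it runs without credentials (the real `ws.events()` queries the Mixpanel API, as shown in the Caching section on this page):

```python
# Hypothetical stand-in for Workspace.events(); the real method hits the
# Mixpanel API once per session and returns event names sorted alphabetically.
class FakeWorkspace:
    def events(self) -> list[str]:
        return sorted(["Sign Up", "Purchase", "Login"])

ws = FakeWorkspace()
print(ws.events())  # ['Login', 'Purchase', 'Sign Up']
```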
Listing Properties¶
Get properties for a specific event:
Properties include both event-specific and common properties.
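A hypothetical stand-in sketch of how event-specific and common properties combine; the real `ws.properties(event)` call fetches this from the API, and the names below are illustrative:

```python
# Illustrative data only: Mixpanel's common properties are $-prefixed,
# while event-specific properties are defined by your tracking code.
EVENT_SPECIFIC = {"Purchase": ["amount", "currency"]}
COMMON = ["$browser", "$city", "$os"]

def properties(event: str) -> list[str]:
    # Merge event-specific and common property names.
    return sorted(EVENT_SPECIFIC.get(event, []) + COMMON)

print(properties("Purchase"))  # ['$browser', '$city', '$os', 'amount', 'currency']
```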
Property Values¶
Sample values for a property:
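A hypothetical stand-in sketch of the call shape; `property_values` mirrors its use in the Discovery Workflow on this page, and the sample values are illustrative:

```python
# Illustrative data only: the real ws.property_values(prop, event=..., limit=...)
# samples observed values for the property from the Mixpanel API.
SAMPLES = {("country", "Purchase"): ["US", "DE", "JP", "BR", "IN"]}

def property_values(prop: str, event: str, limit: int = 100) -> list[str]:
    return SAMPLES[(prop, event)][:limit]

print(property_values("country", event="Purchase", limit=3))  # ['US', 'DE', 'JP']
```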
Saved Funnels¶
List funnels defined in Mixpanel:
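A sketch using illustrative data, assuming FunnelInfo exposes funnel_id and name (the two fields used in the Discovery Workflow on this page):

```python
from collections import namedtuple

# Hypothetical stand-in: the real ws.funnels() returns FunnelInfo objects.
FunnelInfo = namedtuple("FunnelInfo", ["funnel_id", "name"])

funnels = [FunnelInfo(12345, "Onboarding"), FunnelInfo(67890, "Checkout")]
for f in funnels:
    print(f"{f.name} (ID: {f.funnel_id})")
```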
FunnelInfo¶
Saved Cohorts¶
List cohorts defined in Mixpanel:
SavedCohort¶
```python
c.id          # 12345
c.name        # "Power Users"
c.count       # 5000
c.description # "Users with 10+ logins"
c.created     # datetime
c.is_visible  # True
```
Lexicon Schemas¶
Retrieve data dictionary schemas for events and profile properties. Schemas include descriptions, property types, and metadata defined in Mixpanel's Lexicon.
Schema Coverage
The Lexicon API returns only events/properties with explicit schemas (defined via API, CSV import, or UI). It does not return all events visible in Lexicon's UI.
```python
# List all schemas
schemas = ws.lexicon_schemas()
for s in schemas:
    print(f"{s.entity_type}: {s.name}")

# Filter by entity type
event_schemas = ws.lexicon_schemas(entity_type="event")
profile_schemas = ws.lexicon_schemas(entity_type="profile")

# Get a specific schema
schema = ws.lexicon_schema("event", "Purchase")
print(schema.schema_json.description)
for prop, info in schema.schema_json.properties.items():
    print(f"  {prop}: {info.type}")
```
LexiconSchema¶
```python
s.entity_type # "event", "profile", or other API-returned types
s.name        # "Purchase"
s.schema_json # LexiconDefinition object
```
LexiconDefinition¶
```python
s.schema_json.description # "User completes a purchase"
s.schema_json.properties  # dict[str, LexiconProperty]
s.schema_json.metadata    # LexiconMetadata or None
```
LexiconProperty¶
```python
prop = s.schema_json.properties["amount"]
prop.type        # "number"
prop.description # "Purchase amount in USD"
prop.metadata    # LexiconMetadata or None
```
LexiconMetadata¶
```python
meta = s.schema_json.metadata
meta.display_name  # "Purchase Event"
meta.tags          # ["core", "revenue"]
meta.hidden        # False
meta.dropped       # False
meta.contacts      # ["owner@company.com"]
meta.team_contacts # ["Analytics Team"]
```
Rate Limit
The Lexicon API has a strict rate limit of 5 requests per minute. Schema results are cached for the session to minimize API calls.
Top Events¶
Get today's most active events:
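A hypothetical sketch; the field names on this stand-in TopEvent are illustrative, not the documented schema:

```python
from collections import namedtuple

# Illustrative stand-in only: the real ws.top_events() makes a live API call.
TopEvent = namedtuple("TopEvent", ["event", "count"])

top = [TopEvent("Page View", 120_000), TopEvent("Login", 45_000)]
for t in top:
    print(f"{t.event}: {t.count}")
```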
TopEvent¶
Not Cached
Unlike other discovery methods, top_events() always makes an API call since it returns real-time data.
JQL-Based Remote Discovery¶
These methods use JQL (JavaScript Query Language) to analyze data directly on Mixpanel's servers, returning aggregated results without fetching raw data locally.
Property Value Distribution¶
Understand what values a property contains and how often they appear:
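Conceptually, the result is a count per distinct value. A local sketch of the same aggregation over toy records (the real method runs it as JQL on Mixpanel's servers):

```python
from collections import Counter

# Toy records standing in for events; the real method aggregates server-side.
records = [{"country": "US"}, {"country": "US"}, {"country": "DE"}, {"country": None}]

distribution = Counter(r["country"] for r in records)
for value, count in distribution.most_common():
    print(f"{value}: {count}")  # US: 2, DE: 1, None: 1
```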
Numeric Property Summary¶
Get a statistical summary for numeric properties:
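The summary amounts to standard descriptive statistics. A local sketch over toy values (the real method computes this server-side, and the exact fields it returns may differ):

```python
import statistics

# Toy purchase amounts; the real method summarizes via JQL without
# downloading raw data.
amounts = [9.99, 19.99, 19.99, 49.99, 99.99]

summary = {
    "count": len(amounts),
    "min": min(amounts),
    "max": max(amounts),
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
}
print(summary)
```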
Daily Event Counts¶
See event activity over time:
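Conceptually this is a per-day bucket count. A local sketch over toy dates (the real method buckets events server-side via JQL):

```python
from collections import Counter
from datetime import date

# Toy event dates standing in for event timestamps.
events = [date(2025, 1, 1), date(2025, 1, 1), date(2025, 1, 2)]

per_day = Counter(events)
for day in sorted(per_day):
    print(f"{day}: {per_day[day]}")
```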
User Engagement Distribution¶
Understand how engaged users are by their event count:
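Conceptually: count events per user, then count users per event count. A local sketch over a toy (user, event) stream:

```python
from collections import Counter

# Toy (user, event) stream; the real method computes this via JQL server-side.
stream = [("u1", "Login"), ("u1", "Purchase"), ("u2", "Login"),
          ("u1", "Login"), ("u3", "Login")]

events_per_user = Counter(user for user, _ in stream)  # u1: 3, u2: 1, u3: 1
engagement = Counter(events_per_user.values())         # users at each event count
print(dict(engagement))  # {3: 1, 1: 2}
```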
Property Coverage¶
Check data quality by seeing how often properties are defined:
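Coverage is simply the share of rows where the property is non-null. A local sketch (the real method measures this via JQL server-side):

```python
# Toy records; r.get("country") is None both for explicit nulls and
# for records missing the key entirely.
records = [{"country": "US"}, {"country": None}, {}, {"country": "DE"}]

defined = sum(1 for r in records if r.get("country") is not None)
coverage = 100 * defined / len(records)
print(f"country coverage: {coverage:.1f}%")  # 50.0%
```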
When to Use JQL-Based Discovery
These methods are ideal for:
- Quick exploration: Understand data shape before fetching locally
- Large date ranges: Analyze months of data without downloading everything
- Data quality checks: Verify property coverage and value distributions
- Trend analysis: See daily activity patterns
See the JQL Discovery Types in the API reference for return type details.
Caching¶
Discovery results are cached for the lifetime of the Workspace:
```python
ws = mp.Workspace()

# First call hits the API
events1 = ws.events()

# Second call returns cached result (instant)
events2 = ws.events()

# Clear cache to force refresh
ws.clear_discovery_cache()

# Now hits API again
events3 = ws.events()
```
Local Data Analysis¶
After fetching data into DuckDB, use these introspection methods to understand your data before writing SQL queries.
Sampling Data¶
Get random sample rows to see data structure:
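Conceptually this is random sampling from the table. A local sketch with random.sample over toy rows (the real ws.sample(table, n=...) samples rows from DuckDB, as used in the Introspection Workflow on this page):

```python
import random

# Toy rows standing in for a fetched events table.
rows = [{"event_name": f"event_{i}", "distinct_id": f"u{i}"} for i in range(100)]

random.seed(0)  # deterministic for the example
for row in random.sample(rows, k=3):
    print(row)
```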
Statistical Summary¶
Get column-level statistics for an entire table:
```python
summary = ws.summarize("events")
print(f"Total rows: {summary.row_count}")
for col in summary.columns:
    print(f"{col.column_name}: {col.column_type}")
    print(f"  Nulls: {col.null_percentage:.1f}%")
    print(f"  Unique: {col.approx_unique}")
    if col.avg is not None:  # Numeric columns
        print(f"  Mean: {col.avg:.2f}, Std: {col.std:.2f}")
```
Event Breakdown¶
Analyze event distribution in an events table:
```python
breakdown = ws.event_breakdown("events")
print(f"Total events: {breakdown.total_events}")
print(f"Total users: {breakdown.total_users}")
print(f"Date range: {breakdown.date_range[0]} to {breakdown.date_range[1]}")
for event in breakdown.events:
    print(f"{event.event_name}: {event.count} ({event.pct_of_total:.1f}%)")
    print(f"  Users: {event.unique_users}")
    print(f"  First seen: {event.first_seen}")
```
Required Columns
The table must have event_name, event_time, and distinct_id columns.
Property Key Discovery¶
Discover all JSON property keys in a table:
This is especially useful for building JSON path expressions like properties->>'$.country'.
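Conceptually: parse each properties payload and union the keys. A local sketch with toy JSON payloads (the real ws.property_keys scans the properties column in DuckDB):

```python
import json

# Toy JSON payloads standing in for the properties column.
payloads = ['{"country": "US", "amount": 9.99}', '{"country": "DE", "plan": "pro"}']

keys = sorted({k for p in payloads for k in json.loads(p)})
print(keys)  # ['amount', 'country', 'plan']
```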
Column Statistics¶
Deep analysis of a single column:
```python
# Analyze a regular column
stats = ws.column_stats("events", "event_name")
print(f"Total: {stats.count}, Nulls: {stats.null_pct:.1f}%")
print(f"Unique values: {stats.unique_count}")
print("Top values:")
for value, count in stats.top_values:
    print(f"  {value}: {count}")

# Analyze a JSON property
stats = ws.column_stats("events", "properties->>'$.country'", top_n=20)
```
For numeric columns, additional statistics are available:
```python
stats = ws.column_stats("purchases", "properties->>'$.amount'")
print(f"Min: {stats.min}, Max: {stats.max}")
print(f"Mean: {stats.mean:.2f}, Std: {stats.std:.2f}")
```
Introspection Workflow¶
A typical workflow for exploring fetched data:
```python
import mixpanel_data as mp

ws = mp.Workspace()

# Fetch data first
ws.fetch_events("events", from_date="2025-01-01", to_date="2025-01-31")

# 1. Quick look at the data
print(ws.sample("events", n=3))

# 2. Get overall statistics
summary = ws.summarize("events")
print(f"Rows: {summary.row_count}")

# 3. Understand event distribution
breakdown = ws.event_breakdown("events")
for e in breakdown.events[:5]:
    print(f"{e.event_name}: {e.count}")

# 4. Discover available properties
keys = ws.property_keys("events", event="Purchase")
print(f"Purchase properties: {keys}")

# 5. Deep dive into specific columns
stats = ws.column_stats("events", "properties->>'$.country'")
print(f"Top countries: {stats.top_values[:5]}")

# Now write informed SQL queries
df = ws.sql("""
    SELECT properties->>'$.country' as country, COUNT(*) as count
    FROM events
    WHERE event_name = 'Purchase'
    GROUP BY 1
    ORDER BY 2 DESC
""")
```
Local Table Discovery¶
Inspect tables in your local database:
List Tables¶
Table Schema¶
Workspace Info¶
Discovery Workflow¶
A typical discovery workflow before analysis:
```python
import mixpanel_data as mp

ws = mp.Workspace()

# 1. What events exist?
print("Events:")
for event in ws.events()[:10]:
    print(f"  - {event}")

# 2. What properties does Purchase have?
print("\nPurchase properties:")
for prop in ws.properties("Purchase"):
    print(f"  - {prop}")

# 3. What values does 'country' have?
print("\nCountry values:")
for value in ws.property_values("country", event="Purchase", limit=10):
    print(f"  - {value}")

# 4. What funnels are defined?
print("\nFunnels:")
for f in ws.funnels():
    print(f"  - {f.name} (ID: {f.funnel_id})")

# 5. Now fetch and analyze
ws.fetch_events("purchases", from_date="2025-01-01", to_date="2025-01-31",
                events=["Purchase"])
df = ws.sql("""
    SELECT properties->>'$.country' as country, COUNT(*) as count
    FROM purchases
    GROUP BY 1
    ORDER BY 2 DESC
""")
print(df)
```
Next Steps¶
- Fetching Data — Fetch events for local analysis
- SQL Queries — Query with SQL
- API Reference — Complete API documentation