User Profile Queries¶
Query user profiles from Mixpanel's Engage API — filter by properties, sort, select fields, count matching profiles, and fetch large result sets with parallel pagination. Uses the same Filter vocabulary as all other query engines.
New in v0.2
Workspace.query_user() is the 5th engine in the unified query system. It answers identity questions ("who are these users?") that complement the behavioral questions answered by insights, funnels, retention, and flows.
When to Use query_user()¶
Use query_user() when you need to work with user profiles rather than events:
| Use Case | Example |
|---|---|
| Filter profiles by property | query_user(mode="profiles", where=Filter.equals("plan", "premium")) |
| Count matching profiles | query_user(where=Filter.is_set("$email")) |
| Get top users by a metric | query_user(mode="profiles", sort_by="ltv", sort_order="descending", limit=50) |
| Look up specific users | query_user(mode="profiles", distinct_id="user_abc123") |
| Profile a behavioral cohort | query_user(mode="profiles", cohort=CohortDefinition.all_of(...)) |
| Export profiles at scale | query_user(mode="profiles", properties=[...], limit=5000, parallel=True) |
| Cross-engine profiling | Insights identifies a segment, query_user() profiles those users |
Use stream_profiles() when you need to iterate over raw profile dicts without structured filtering or DataFrame output.
Getting Started¶
import mixpanel_data as mp
from mixpanel_data import Filter
ws = mp.Workspace()
# Quick count — how many profiles exist? (default: mode="aggregate")
result = ws.query_user()
print(f"Total profiles: {result.value}")
# Quick peek — one sample profile
result = ws.query_user(mode="profiles")
print(result.df)
# Filter and select properties
result = ws.query_user(
mode="profiles",
where=Filter.equals("plan", "premium"),
properties=["$email", "$name", "ltv"],
sort_by="ltv",
sort_order="descending",
limit=50,
)
print(result.df) # distinct_id | last_seen | email | name | ltv
Aggregate Mode¶
Aggregate mode is the default (mode="aggregate"). Compute statistics across matching profiles without fetching individual records:
# Count users with email (aggregate is the default mode)
count = ws.query_user(where=Filter.is_set("$email"))
print(f"Users with email: {count.value}")
# Total users (all)
total = ws.query_user()
print(f"Total profiles: {total.value}")
# Extremes (min/max) of a numeric property
result = ws.query_user(
aggregate="extremes",
aggregate_property="ltv",
)
print(result.aggregate_data) # {"max": 9500, "min": 0, ...}
# Percentile
result = ws.query_user(
aggregate="percentile",
aggregate_property="ltv",
percentile=90,
)
print(result.aggregate_data) # {"percentile": 90, "result": 4500}
# Numeric summary (count, mean, variance, sum_of_squares)
result = ws.query_user(
aggregate="numeric_summary",
aggregate_property="ltv",
)
print(result.aggregate_data) # {"count": 1532, "mean": 245.6, ...}
# Segmented count by cohort IDs
result = ws.query_user(
segment_by=[12345, 67890],
)
print(result.df) # columns: segment, value
Behavioral Filtering¶
Filter by behavioral criteria using the same CohortDefinition builders available across all engines:
from mixpanel_data import CohortDefinition, CohortCriteria
# Users who purchased 3+ times in 30 days
result = ws.query_user(
mode="profiles",
cohort=CohortDefinition.all_of(
CohortCriteria.did_event("Purchase", at_least=3, within_days=30),
),
properties=["$email", "plan", "ltv"],
limit=200,
)
print(f"Power buyers: {len(result.profiles)}")
# Filter by saved cohort ID
result = ws.query_user(mode="profiles", cohort=12345, limit=100)
Parallel Fetching¶
For large result sets, enable concurrent page retrieval:
result = ws.query_user(
mode="profiles",
where=Filter.is_set("$email"),
properties=["$email", "plan", "ltv"],
limit=5000,
parallel=True,
workers=5,
)
print(f"Fetched {len(result.profiles)} profiles")
print(f"Pages: {result.meta['pages_fetched']}, Workers: {result.meta['workers']}")
Cross-Engine Composition¶
The real power of query_user() is combining it with behavioral engines. Identify interesting behavior with event engines, then profile those users:
# Step 1: Which plan drives the most DAU?
dau = ws.query("Login", math="dau", group_by="plan", last=30)
top_plan = dau.df.sort_values("count", ascending=False).iloc[0]["event"]
# Step 2: Profile users from that plan
users = ws.query_user(
mode="profiles",
where=Filter.equals("plan", top_plan),
properties=["$email", "company", "ltv"],
sort_by="ltv",
sort_order="descending",
limit=100,
)
print(f"Plan '{top_plan}' has {len(users.profiles)} top users")
print(users.df.describe())
UserQueryResult¶
All results are returned as UserQueryResult, a frozen dataclass with:
| Property | Type | Description |
|---|---|---|
.df |
pd.DataFrame |
Lazy cached DataFrame. Profiles mode: distinct_id, last_seen, then alphabetical properties ($ prefix stripped). Aggregate mode: metric/value columns. |
.total |
int |
Number of profiles returned (len(profiles)). Use mode='aggregate', aggregate='count' for full population count. |
.profiles |
list[dict] |
Normalized profile dicts |
.distinct_ids |
list[str] |
List of distinct IDs from profiles |
.value |
int \| float \| None |
Scalar aggregate result (aggregate mode only) |
.params |
dict |
Engage API params used (for debugging) |
.meta |
dict |
Execution metadata (session_id, pages_fetched, parallel, workers) |
.to_dict() |
dict |
JSON-serializable output |
Previewing Parameters¶
Inspect the generated Engage API params without executing:
params = ws.build_user_params(
where=Filter.equals("plan", "premium"),
properties=["$email", "ltv"],
sort_by="ltv",
)
import json
print(json.dumps(params, indent=2))
What's Next¶
- Unified Query System — how all five engines work together
- Insights Queries — event-level analytics
- Funnel Queries — conversion analysis
- Retention Queries — cohort retention
- Flow Queries — path analysis
- API Reference — full method signatures