Skip to content

User Profile Queries

Query user profiles from Mixpanel's Engage API — filter by properties, sort, select fields, count matching profiles, and fetch large result sets with parallel pagination. Uses the same Filter vocabulary as all other query engines.

New in v0.2

Workspace.query_user() is the 5th engine in the unified query system. It answers identity questions ("who are these users?") that complement the behavioral questions answered by insights, funnels, retention, and flows.

When to Use query_user()

Use query_user() when you need to work with user profiles rather than events:

Use Case Example
Filter profiles by property query_user(mode="profiles", where=Filter.equals("plan", "premium"))
Count matching profiles query_user(where=Filter.is_set("$email"))
Get top users by a metric query_user(mode="profiles", sort_by="ltv", sort_order="descending", limit=50)
Look up specific users query_user(mode="profiles", distinct_id="user_abc123")
Profile a behavioral cohort query_user(mode="profiles", cohort=CohortDefinition.all_of(...))
Export profiles at scale query_user(mode="profiles", properties=[...], limit=5000, parallel=True)
Cross-engine profiling Insights identifies a segment, query_user() profiles those users

Use stream_profiles() when you need to iterate over raw profile dicts without structured filtering or DataFrame output.

Getting Started

import mixpanel_data as mp
from mixpanel_data import Filter

ws = mp.Workspace()

# Quick count — how many profiles exist? (default: mode="aggregate")
result = ws.query_user()
print(f"Total profiles: {result.value}")

# Quick peek — one sample profile
result = ws.query_user(mode="profiles")
print(result.df)

# Filter and select properties
result = ws.query_user(
    mode="profiles",
    where=Filter.equals("plan", "premium"),
    properties=["$email", "$name", "ltv"],
    sort_by="ltv",
    sort_order="descending",
    limit=50,
)
print(result.df)  # distinct_id | last_seen | email | name | ltv

Aggregate Mode

Aggregate mode is the default (mode="aggregate"). Compute statistics across matching profiles without fetching individual records:

# Count users with email (aggregate is the default mode)
count = ws.query_user(where=Filter.is_set("$email"))
print(f"Users with email: {count.value}")

# Total users (all)
total = ws.query_user()
print(f"Total profiles: {total.value}")

# Extremes (min/max) of a numeric property
result = ws.query_user(
    aggregate="extremes",
    aggregate_property="ltv",
)
print(result.aggregate_data)  # {"max": 9500, "min": 0, ...}

# Percentile
result = ws.query_user(
    aggregate="percentile",
    aggregate_property="ltv",
    percentile=90,
)
print(result.aggregate_data)  # {"percentile": 90, "result": 4500}

# Numeric summary (count, mean, variance, sum_of_squares)
result = ws.query_user(
    aggregate="numeric_summary",
    aggregate_property="ltv",
)
print(result.aggregate_data)  # {"count": 1532, "mean": 245.6, ...}

# Segmented count by cohort IDs
result = ws.query_user(
    segment_by=[12345, 67890],
)
print(result.df)  # columns: segment, value

Behavioral Filtering

Filter by behavioral criteria using the same CohortDefinition builders available across all engines:

from mixpanel_data import CohortDefinition, CohortCriteria

# Users who purchased 3+ times in 30 days
result = ws.query_user(
    mode="profiles",
    cohort=CohortDefinition.all_of(
        CohortCriteria.did_event("Purchase", at_least=3, within_days=30),
    ),
    properties=["$email", "plan", "ltv"],
    limit=200,
)
print(f"Power buyers: {len(result.profiles)}")

# Filter by saved cohort ID
result = ws.query_user(mode="profiles", cohort=12345, limit=100)

Parallel Fetching

For large result sets, enable concurrent page retrieval:

result = ws.query_user(
    mode="profiles",
    where=Filter.is_set("$email"),
    properties=["$email", "plan", "ltv"],
    limit=5000,
    parallel=True,
    workers=5,
)
print(f"Fetched {len(result.profiles)} profiles")
print(f"Pages: {result.meta['pages_fetched']}, Workers: {result.meta['workers']}")

Cross-Engine Composition

The real power of query_user() is combining it with behavioral engines. Identify interesting behavior with event engines, then profile those users:

# Step 1: Which plan drives the most DAU?
dau = ws.query("Login", math="dau", group_by="plan", last=30)
top_plan = dau.df.sort_values("count", ascending=False).iloc[0]["event"]

# Step 2: Profile users from that plan
users = ws.query_user(
    mode="profiles",
    where=Filter.equals("plan", top_plan),
    properties=["$email", "company", "ltv"],
    sort_by="ltv",
    sort_order="descending",
    limit=100,
)
print(f"Plan '{top_plan}' has {len(users.profiles)} top users")
print(users.df.describe())

UserQueryResult

All results are returned as UserQueryResult, a frozen dataclass with:

Property Type Description
.df pd.DataFrame Lazy cached DataFrame. Profiles mode: distinct_id, last_seen, then alphabetical properties ($ prefix stripped). Aggregate mode: metric/value columns.
.total int Number of profiles returned (len(profiles)). Use mode='aggregate', aggregate='count' for full population count.
.profiles list[dict] Normalized profile dicts
.distinct_ids list[str] List of distinct IDs from profiles
.value int \| float \| None Scalar aggregate result (aggregate mode only)
.params dict Engage API params used (for debugging)
.meta dict Execution metadata (session_id, pages_fetched, parallel, workers)
.to_dict() dict JSON-serializable output

Previewing Parameters

Inspect the generated Engage API params without executing:

params = ws.build_user_params(
    where=Filter.equals("plan", "premium"),
    properties=["$email", "ltv"],
    sort_by="ltv",
)
import json
print(json.dumps(params, indent=2))

What's Next