Private Draft

The 29 personas behind AI

We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.

Shaped by Industry Experts
Kumar Chellapilla, VPE
Jennifer Anderson, VPE / Stanford PhD
Thuan Pham, CTO
Akash Garg, CTO
Linghao Zhang, Research Engineer
Wayne Chang, Early FB Engineer
Indrajit Khare, EM & Head of Product

Training

Scales ideas into training runs

Known as: Member of Technical Staff, Research Engineer, Research Software Engineer

Trains models from scratch (pre-training), continues training on specialized datasets (mid-training), or distills frontier model capabilities into smaller, deployable form factors. Owns training runs end-to-end: data mixing, hyperparameter tuning, convergence monitoring, and checkpoint evaluation. A large and growing share of training compute goes to distillation — compressing what the frontier model knows into models cheap and fast enough to ship.
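To make the data-mixing side of the role concrete, here is a minimal sketch of phase-dependent mixture sampling. The domain names and ratios are invented for illustration; real recipes are set and validated through ablation experiments, not hand-picked weights like these:

```python
import random

# Hypothetical mixing ratios for two training phases (illustrative only).
PHASES = {
    "pretrain": {"web": 0.6, "code": 0.25, "books": 0.15},
    "midtrain": {"web": 0.2, "code": 0.5, "domain_docs": 0.3},
}

def sample_domain(phase, rng=random):
    """Pick the source domain for the next batch according to the
    phase's mixing ratios."""
    weights = PHASES[phase]
    domains = list(weights)
    return rng.choices(domains, weights=[weights[d] for d in domains])[0]
```

In practice the sampler sits in front of the data loader, so shifting a recipe between phases is a config change rather than a pipeline rewrite.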

Specializations

Data Mixing & Curriculum: Decides what data goes into each phase of training and in what proportions. Data mixing ratios, curriculum scheduling, multi-epoch strategies, domain weighting, and the ablation experiments that validate choices. The training recipe that determines what the model learns and when.
Scaling & Convergence: The experimental and optimization decisions that make training runs work: learning rate schedules, loss diagnostics, convergence monitoring, precision tuning, and gradient health. Reads loss curves and gradient statistics to decide when a run is healthy, when to intervene, and when to kill it.
Checkpoint Evaluation & Run Management: Evaluating checkpoints against capability and safety benchmarks, deciding when to stop or branch, managing the lifecycle of training experiments, and determining whether a run has produced something useful. Eval signal is the primary feedback loop for data mixing and hyperparameter decisions (consumes signals from the Evaluation & Benchmarking persona).
Distillation: Trains smaller models against the logits of a larger teacher model to compress frontier capabilities into deployable form factors. A fundamentally different optimization problem from pre-training: the objective is to preserve as much capability as possible within a parameter and latency budget, not to push the frontier. Multi-epoch training on soft labels pulls capability out of the teacher that hard labels miss. At frontier labs, distillation is what turns each generation's flagship model into the affordable, low-latency variant that actually ships at scale: the model behind the API, in the search product, in the email client. The strategic decisions (what teacher to distill from, what quality targets to hit, where to accept capability loss) are as consequential as the pre-training recipe.
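The teacher-student objective behind distillation can be sketched in a few lines. This is a minimal single-token illustration of the soft-label loss, not any lab's actual training code; the temperature value and the T-squared scaling convention (from Hinton et al.'s distillation paper) are stated assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert a list of logits to probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [v / total for v in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so its gradient magnitude is comparable to a
    hard-label cross-entropy term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    return temperature ** 2 * kl
```

Raising the temperature flattens the teacher's distribution, which is what exposes the "dark knowledge" in the non-argmax logits that hard labels throw away.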
[1] Substrate
[2] Compute: Secondary

Runs distributed training across GPU clusters, consuming massive compute for weeks-long runs.

[3] Intelligence: Primary

Owns the training recipe — data mixing, hyperparameters, convergence — that turns compute into learned capability.

[4] Systems
[5] Distribution
Philip Wagener
Anthropic
Data mixing & curriculum

Decides what the model learns when — mixture ratios, phase schedules, and ablation-driven recipe changes.

Lori Eliza
OpenAI
Convergence diagnostician

Reads loss and gradient health, tunes schedules and precision, and calls whether a run is sick or salvageable.

Zack Miller
xAI
Run manager

Owns stop/branch decisions and checkpoint triage that turn compute into usable model artifacts.

Early-Stage: Rare
Growth: Occasional
Enterprise: Primary

Frontier labs for pre-training; growth-stage companies for domain-specific training with the right data and compute.

Let’s Find Your Next Builder

If you’re hiring at the AI frontier, let’s talk.