Private Draft

The 29 personas behind AI

We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.

Shaped by Industry Experts
Kumar Chellapilla, VPE
Jennifer Anderson, VPE / Stanford PhD
Thuan Pham, CTO
Akash Garg, CTO
Linghao Zhang, Research Engineer
Wayne Chang, Early FB Engineer
Indrajit Khare, EM & Head of Product

Training

Scales ideas into training runs

Known as: Member of Technical Staff, Research Engineer, Research Software Engineer

Trains models from scratch (pre-training), continues training on specialized datasets (mid-training), or distills frontier model capabilities into smaller, deployable form factors. Owns training runs end-to-end: data mixing, hyperparameter tuning, convergence monitoring, and checkpoint evaluation. A large and growing share of training compute goes to distillation — compressing what the frontier model knows into models cheap and fast enough to ship.
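To make the data-mixing side of the role concrete, here is a minimal sketch of phase-dependent mixture sampling. The domain names and ratios are invented for illustration; real recipes are set and validated through ablation experiments, not hand-picked weights like these:

```python
import random

# Hypothetical mixing ratios for two training phases (illustrative only).
PHASES = {
    "pretrain": {"web": 0.6, "code": 0.25, "books": 0.15},
    "midtrain": {"web": 0.2, "code": 0.5, "domain_docs": 0.3},
}

def sample_domain(phase, rng=random):
    """Pick the source domain for the next batch according to the
    phase's mixing ratios."""
    weights = PHASES[phase]
    domains = list(weights)
    return rng.choices(domains, weights=[weights[d] for d in domains])[0]
```

In practice the sampler sits in front of the data loader, so shifting a recipe between phases is a config change rather than a pipeline rewrite.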

Specializations

Data Mixing & Curriculum: Decides what data goes into each phase of training and in what proportions. Data mixing ratios, curriculum scheduling, multi-epoch strategies, domain weighting, and the ablation experiments that validate choices. The training recipe that determines what the model learns and when.
Scaling & Convergence: The experimental and optimization decisions that make training runs work: learning rate schedules, loss diagnostics, convergence monitoring, precision tuning, and gradient health. Reads loss curves and gradient statistics to decide when a run is healthy, when to intervene, and when to kill it.
Checkpoint Evaluation & Run Management: Evaluating checkpoints against capability and safety benchmarks, deciding when to stop or branch, managing the lifecycle of training experiments, and determining whether a run has produced something useful. Eval signal is the primary feedback loop for data mixing and hyperparameter decisions (consumes signals from the Evaluation & Benchmarking persona).
Distillation: Trains smaller models against the logits of a larger teacher model to compress frontier capabilities into deployable form factors. A fundamentally different optimization problem from pre-training: the objective is to preserve as much capability as possible within a parameter and latency budget, not to push the frontier. Multi-epoch training on soft labels pulls capability out of the teacher that hard labels miss. At frontier labs, distillation is what turns each generation's flagship model into the affordable, low-latency variant that actually ships at scale: the model behind the API, in the search product, in the email client. The strategic decisions (what teacher to distill from, what quality targets to hit, where to accept capability loss) are as consequential as the pre-training recipe.
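The teacher-student objective behind distillation can be sketched in a few lines. This is a minimal single-token illustration of the soft-label loss, not any lab's actual training code; the temperature value and the T-squared scaling convention (from Hinton et al.'s distillation paper) are stated assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert a list of logits to probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [v / total for v in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so its gradient magnitude is comparable to a
    hard-label cross-entropy term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student))
    return temperature ** 2 * kl
```

Raising the temperature flattens the teacher's distribution, which is what exposes the "dark knowledge" in the non-argmax logits that hard labels throw away.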
[1] Substrate
[2] Compute: Secondary

Runs distributed training across GPU clusters, consuming massive compute for weeks-long runs.

[3] Intelligence: Primary

Owns the training recipe — data mixing, hyperparameters, convergence — that turns compute into learned capability.

[4] Systems
[5] Distribution
Philip Wagener
Anthropic
Data mixing & curriculum

Decides what the model learns when — mixture ratios, phase schedules, and ablation-driven recipe changes.

Lori Eliza
OpenAI
Convergence diagnostician

Reads loss and gradient health, tunes schedules and precision, and calls whether a run is sick or salvageable.

Zack Miller
xAI
Run manager

Owns stop/branch decisions and checkpoint triage that turn compute into usable model artifacts.

Early-Stage: Rare
Growth: Occasional
Enterprise: Primary

Frontier labs for pre-training; growth-stage companies for domain-specific training with the right data and compute.

Let’s Find Your Next Builder

If you’re hiring at the AI frontier, let’s talk.