We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.
Summary
Known as: Evaluation Engineer, Research Engineer (Evals), Research Scientist (Evals)
Owns eval methodology, task design, benchmark suites, and launch gating as a cross-cutting function. Defines what "better" means and builds the suites that tell the org whether things are improving. Most personas consume eval signals; this persona designs the methodology and maintains the benchmarks.
Specializations
Eval and reward design are collapsing into a single function at frontier labs. As RLVR (reinforcement learning from verifiable rewards) scales, the people designing capability benchmarks are increasingly the same people designing reward curricula, which is shifting hiring toward eval engineers who can build training feedback loops, not just measurement suites.
Where the Work Lives
Defines capability benchmarks, task design, and eval methodology that measure model progress.
Owns launch gating, regression suites, and product-quality thresholds for safe deployment.
Candidate Archetypes
Writes tasks, graders, and suites that measure real capability without becoming training targets.
Owns thresholds, suite health, and cross-release comparability that drive go/no-go decisions.
Defends eval signal from benchmark contamination, data leakage, and optimization pressure.
Company Scale
Frontier labs (Anthropic, OpenAI, DeepMind) run dedicated eval orgs. At growth-stage companies, evals are typically owned part-time by researchers or a senior engineer; dedicated teams emerge at scale.
Featured Roles
If you’re hiring at the AI frontier, let’s talk.