We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.
Summary
Known as: Member of Technical Staff, Research Engineer, RL Engineer
Modifies model weights to turn base models into deployment-ready systems: instruction following, stronger reasoning on multi-step tasks, steerable behavior, and safer, more reliable outputs. Uses fine-tuning and preference-based optimization (often with reinforcement learning) to shape behavior. This is where many of the changes users actually feel are made.
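To make the preference-based optimization mentioned above concrete, here is a minimal sketch of the Direct Preference Optimization (DPO) loss for a single preference pair. The function name and inputs are illustrative, not from any specific library; real training code operates on batched token log-probabilities from a policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under
    the policy being trained and under a frozen reference model.
    """
    # Implicit reward margin: how much more the policy favors the
    # chosen response (relative to the reference) than it favors
    # the rejected one.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Negative log-sigmoid: minimized by driving the margin up.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; the loss shrinks as the policy learns to prefer the chosen response.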
Specializations
How to split compute between pre-training and RL is a live frontier decision: labs are only beginning to scale RL compute and expect to increase it dramatically. That shift changes the hiring weight toward RL infrastructure and reward engineering and away from data-mixing optimization. The surface area of RL environments is also expanding fast: computer use (GUI navigation, web browsers, desktop applications) is now a distinct training domain alongside code, math, and tool use.
Where the Work Lives
Modifies model weights via RLHF, DPO, and other RL methods to shape behavior, reasoning, and safety.
Defines how models behave in deployment — safety tuning, instruction following, and behavioral guardrails.
Candidate Archetypes
Turns base models into compliant assistants via supervised fine-tuning, format hardening, and behavior shaping.
Owns preference datasets, reward signals, and the RLHF/DPO/RLVR optimization loop that shapes model behavior.
Drives refusal boundaries, policy adherence, and behavior regression gating via tight eval loops.
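The behavior regression gating named in the last archetype can be sketched as a simple release check: block any checkpoint whose eval scores drop beyond a tolerance versus the current baseline. The function, suite names, and threshold below are hypothetical illustrations of the pattern, not a specific lab's tooling.

```python
def regression_gate(baseline, candidate, max_drop=0.01):
    """Return True (safe to ship) only if no eval suite regresses
    by more than max_drop relative to the baseline checkpoint.

    baseline / candidate map suite name -> score in [0, 1],
    higher is better (e.g. a policy-adherence pass rate).
    """
    for suite, base_score in baseline.items():
        # A missing suite on the candidate counts as a full regression.
        if candidate.get(suite, 0.0) < base_score - max_drop:
            return False  # behavioral regression: block the release
    return True
```

In practice such a gate runs inside the "tight eval loop": every post-training checkpoint is scored on refusal, instruction-following, and safety suites before it can replace the deployed model.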
Company Scale
Frontier labs hire for RLHF; growth-stage companies occasionally hire for fine-tuning.
Featured Roles
If you’re hiring at the AI frontier, let’s talk.