Private Draft

The 29 personas behind AI

We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.

Shaped by Industry Experts

← ATOMS & ENERGYUSERS & MARKETS →

← Back

Alignment

Makes models do what we intend

Alignment

Summary

Known as: Research Scientist (Safety), Alignment Researcher, Interpretability Researcher, AI Red Team Engineer, Safety Engineer, Adversarial ML Researcher

Research and engineering for alignment, interpretability, and safe model behavior. Makes models do what people intend even when instructions are ambiguous, users try to break them (jailbreaks, prompt injection), or systems take actions over many steps.

Specializations

Alignment Research & Interpretability — Core alignment theory and methods: reward modeling, scalable oversight, Constitutional AI, debate, and related approaches to keeping models aligned as capabilities grow. Interpretability research studies model internals (superposition, circuits, feature visualization) to understand and predict behavior. The goal is robust alignment techniques that scale with model capability.

Safety Evaluation & Red Teaming — Adversarial probing, safety benchmarks, attack discovery, threat modeling, automated safety testing, and abuse/misuse scenario generation. Finds and exploits failure modes (jailbreaks, prompt injection, tool misuse) and turns them into mitigations, evals, and hardening work. Includes offensive AI security: novel attack surface discovery, adversarial ML research, and jailbreak research.

In many orgs the same people red-team and then fix what they find via RLHF. The split between safety research and safety tuning is an org-design choice, not a hard technical boundary.

Where the Work Lives

[1]Substrate

[2]Compute

[3]Intelligence

Primary

Researches alignment techniques, interpretability, and reward modeling to make models do what we intend.

[4]Systems

Primary

Safety evaluation, red teaming, and adversarial testing that harden models before and during deployment.

[5]Distribution

Candidate Archetypes

Tom Banks

Anthropic

Interpretability

Studies internal representations and circuits to predict and constrain model behavior.

Xing Anh

OpenAI

Safety eval & red team

Discovers jailbreak, prompt-injection, and tool-misuse failure modes and turns them into repeatable test assets.

Lillian Wilkinson

DeepMind

Scalable oversight

Builds supervision schemes and reward models that hold up as model capability grows.

Company Scale

Early-Stage

Rare

Growth

Occasional

Enterprise

Primary

Frontier labs and safety-focused orgs. Most enterprises do governance, not alignment research.

Featured Roles

Partnership Inquiries

We partner selectively with teams hiring for roles where the right person changes the trajectory.