We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.

Summary
Known as: Data Engineer, ML Engineer, Research Engineer (Data), Program Manager, Operations Lead
Curates and creates the data that shapes model capability. Serves both pre-training and post-training: the same function supplies web-scale corpora for base models and preference data for RLHF. This function is more central than it sounds: which data makes it into training, and how it is weighted, strongly determines what the model can do.
Specializations
Where the Work Lives
Candidate Archetypes
Negotiates access to licensed and partnered corpora and builds pipelines for them and for crawled data, all under provenance constraints.
Generates targeted data, runs quality gates, and closes the loop by feeding verified outputs back into training.
Runs the RLHF/RLVR supply chain: rubrics, QA, throughput, vendors, and the annotation tooling surface.
Company Scale
Any organization training or fine-tuning models. Early-stage companies outsource to vendors such as Scale or Surge; growth-stage and larger companies build in-house.
Featured Roles
We partner selectively with teams hiring for roles where the right person changes the trajectory.