Private Draft

The 29 personas behind AI

We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.

Shaped by Industry Experts
Kumar Chellapilla (VPE)
Jennifer Anderson (VPE, Stanford PhD)
Thuan Pham (CTO)
Akash Garg (CTO)
Linghao Zhang (Research Engineer)
Wayne Chang (Early FB Engineer)
Indrajit Khare (EM & Head of Product)

Serving Infrastructure

Runs the serving platform

Known as: Serving Platform Engineer, Inference Infrastructure Engineer, GPU Platform Engineer, ML Serving Engineer

Builds and operates the platform that serves models at scale: GPU scheduling, multi-tenant serving, capacity management, cost attribution, and cloud compute procurement.

Specializations

GPU Scheduling & Multi-Tenant Serving: GPU resource management, multi-tenant isolation, request routing, priority scheduling, and the orchestration layer that turns a pool of GPUs into a reliable serving platform. Handles heterogeneous hardware, mixed workloads, and preemption policies.
Capacity Management & Cost Attribution: Capacity planning for inference workloads, cost-per-query attribution, utilization optimization, and demand forecasting. Turns GPU spend into measurable cost-per-token and cost-per-request, enabling product teams to make informed tradeoffs between quality, latency, and cost.
Cloud Compute & Capacity Planning: GPU-as-a-service procurement, cloud provider management, workload scheduling optimization, and cloud cost engineering. Navigates multi-cloud GPU allocation, reserved vs. spot capacity, and the brokerage layer (CoreWeave, Lambda, Crusoe) that connects training and serving demand to available compute.
Agent Execution Infrastructure: Sandboxed compute environments for autonomous agent actions, state management and checkpointing for long-running agent tasks, delegated credential scoping and least-privilege enforcement, and observability for agent execution traces. A distinct workload pattern from model serving (long-running, stateful, credential-bearing), emerging as its own infrastructure category at companies building agentic products (Anthropic, Perplexity, OpenAI). Includes secure tool-execution runtimes, container isolation for untrusted code, and the recovery infrastructure that lets agents resume across failures.
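To make the cost-attribution specialization concrete, here is a minimal sketch of how GPU spend becomes a cost-per-token figure. The function name and every number are illustrative assumptions, not real pricing or throughput data.

```python
# Hedged sketch: cost-per-token attribution for a homogeneous GPU pool.
# All names and numbers are illustrative assumptions.

def cost_per_token(gpu_hourly_usd: float,
                   gpus: int,
                   utilization: float,
                   tokens_per_gpu_second: float) -> float:
    """Blended cost per generated token for a serving fleet.

    gpu_hourly_usd:        all-in hourly cost of one GPU (compute + overhead)
    gpus:                  number of GPUs in the serving pool
    utilization:           fraction of wall-clock time spent serving (0..1)
    tokens_per_gpu_second: sustained decode throughput per busy GPU
    """
    fleet_cost_per_second = gpu_hourly_usd * gpus / 3600.0
    tokens_per_second = gpus * utilization * tokens_per_gpu_second
    return fleet_cost_per_second / tokens_per_second

# Example: $2.50/hr GPUs, an 8-GPU pool, 60% utilization, 400 tok/s per GPU.
cost = cost_per_token(2.50, 8, 0.60, 400.0)
print(f"${cost * 1_000_000:.2f} per million tokens")  # → $2.89 per million tokens
```

Note that fleet size cancels algebraically; it is kept as a parameter because real attribution happens per fleet, where utilization and throughput vary by pool.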
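The checkpoint-and-resume pattern named in the agent-execution specialization can be sketched in a few lines. This assumes a step-wise agent loop; the function names, file layout, and JSON state format are hypothetical, not any particular product's implementation.

```python
# Hedged sketch: durable checkpoint/resume for a long-running agent task.
import json
import os
import tempfile

def save_checkpoint(path: str, state: dict) -> None:
    """Atomically persist agent state so a crashed task can resume."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename: readers never see a torn file

def run_task(path: str, steps: list, execute) -> dict:
    """Run `execute` over `steps`, resuming from the last completed step."""
    state = {"next_step": 0, "results": []}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from persisted progress
    for i in range(state["next_step"], len(steps)):
        state["results"].append(execute(steps[i]))
        state["next_step"] = i + 1
        save_checkpoint(path, state)  # durable after every step
    return state
```

Re-invoking `run_task` with the same checkpoint path skips already-completed steps, which is the recovery property that lets agents resume across failures.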
Substrate / Compute (Primary)

GPU scheduling, capacity management, and cloud compute procurement for inference workloads.

Intelligence / Systems (Primary)

Builds the multi-tenant serving platform: request routing, isolation, and cost attribution.

Distribution
Philip Wagener (Together): GPU platform

Turns a GPU pool into an isolatable, reliable serving platform with routing and priority semantics.

Clive Silvia (Anyscale): Capacity & cost

Makes cost-per-token legible and optimizable as a business control surface.

Lori Eliza (Fireworks): Cloud procurement

Arbitrages reserved and spot capacity across providers to keep inference supply ahead of demand.

Early-Stage: Occasional
Growth: Common
Enterprise: Primary

Inference companies building on serving runtimes like vLLM and SGLang (Anyscale, Modal, Together, Fireworks) and frontier lab serving teams.

Let’s Find Your Next Builder

If you’re hiring at the AI frontier, let’s talk.