Private Draft

The 29 personas behind AI

We’ve organized every stage and persona in the AI supply chain, informed by real recruiting at frontier companies. Click any row to see matching profiles from our talent graph.

Shaped by Industry Experts
Kumar Chellapilla (VPE)
Jennifer Anderson (VPE, Stanford PhD)
Thuan Pham (CTO)
Akash Garg (CTO)
Linghao Zhang (Research Engineer)
Wayne Chang (Early FB Engineer)
Indrajit Khare (EM & Head of Product)

Serving Infrastructure

Runs the serving platform

Known as: Serving Platform Engineer, Inference Infrastructure Engineer, GPU Platform Engineer, ML Serving Engineer

Builds and operates the platform that serves models at scale: GPU scheduling, multi-tenant serving, capacity management, cost attribution, and cloud compute procurement.

Specializations

GPU Scheduling & Multi-Tenant Serving: GPU resource management, multi-tenant isolation, request routing, priority scheduling, and the orchestration layer that turns a pool of GPUs into a reliable serving platform. Handles heterogeneous hardware, mixed workloads, and preemption policies.
Capacity Management & Cost Attribution: Capacity planning for inference workloads, cost-per-query attribution, utilization optimization, and demand forecasting. Turns GPU spend into measurable cost-per-token and cost-per-request, enabling product teams to make informed tradeoffs between quality, latency, and cost.
Cloud Compute & Capacity Planning: GPU-as-a-service procurement, cloud provider management, workload scheduling optimization, and cloud cost engineering. Navigates multi-cloud GPU allocation, reserved vs. spot capacity, and the brokerage layer (CoreWeave, Lambda, Crusoe) that connects training and serving demand to available compute.
Agent Execution Infrastructure: Sandboxed compute environments for autonomous agent actions, state management and checkpointing for long-running agent tasks, delegated credential scoping and least-privilege enforcement, and observability for agent execution traces. A distinct workload pattern from model serving (long-running, stateful, credential-bearing), emerging as its own infrastructure category at companies building agentic products (Anthropic, Perplexity, OpenAI). Includes secure tool-execution runtimes, container isolation for untrusted code, and the recovery infrastructure that lets agents resume across failures.
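To make the cost-attribution specialization concrete, here is a minimal sketch of how GPU spend becomes a cost-per-token figure. The function name and every number are illustrative assumptions, not real pricing or throughput data.

```python
# Hedged sketch: cost-per-token attribution for a homogeneous GPU pool.
# All names and numbers are illustrative assumptions.

def cost_per_token(gpu_hourly_usd: float,
                   gpus: int,
                   utilization: float,
                   tokens_per_gpu_second: float) -> float:
    """Blended cost per generated token for a serving fleet.

    gpu_hourly_usd:        all-in hourly cost of one GPU (compute + overhead)
    gpus:                  number of GPUs in the serving pool
    utilization:           fraction of wall-clock time spent serving (0..1)
    tokens_per_gpu_second: sustained decode throughput per busy GPU
    """
    fleet_cost_per_second = gpu_hourly_usd * gpus / 3600.0
    tokens_per_second = gpus * utilization * tokens_per_gpu_second
    return fleet_cost_per_second / tokens_per_second

# Example: $2.50/hr GPUs, an 8-GPU pool, 60% utilization, 400 tok/s per GPU.
cost = cost_per_token(2.50, 8, 0.60, 400.0)
print(f"${cost * 1_000_000:.2f} per million tokens")  # → $2.89 per million tokens
```

Note that fleet size cancels algebraically; it is kept as a parameter because real attribution happens per fleet, where utilization and throughput vary by pool.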
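The checkpoint-and-resume pattern named in the agent-execution specialization can be sketched in a few lines. This assumes a step-wise agent loop; the function names, file layout, and JSON state format are hypothetical, not any particular product's implementation.

```python
# Hedged sketch: durable checkpoint/resume for a long-running agent task.
import json
import os
import tempfile

def save_checkpoint(path: str, state: dict) -> None:
    """Atomically persist agent state so a crashed task can resume."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename: readers never see a torn file

def run_task(path: str, steps: list, execute) -> dict:
    """Run `execute` over `steps`, resuming from the last completed step."""
    state = {"next_step": 0, "results": []}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from persisted progress
    for i in range(state["next_step"], len(steps)):
        state["results"].append(execute(steps[i]))
        state["next_step"] = i + 1
        save_checkpoint(path, state)  # durable after every step
    return state
```

Re-invoking `run_task` with the same checkpoint path skips already-completed steps, which is the recovery property that lets agents resume across failures.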
Substrate / Compute (Primary)

GPU scheduling, capacity management, and cloud compute procurement for inference workloads.

Intelligence / Systems (Primary)

Builds the multi-tenant serving platform: request routing, isolation, and cost attribution.

Distribution
Philip Wagener (Together): GPU platform

Turns a GPU pool into an isolatable, reliable serving platform with routing and priority semantics.

Clive Silvia (Anyscale): Capacity & cost

Makes cost-per-token legible and optimizable as a business control surface.

Lori Eliza (Fireworks): Cloud procurement

Arbitrages reserved and spot capacity across providers to keep inference supply ahead of demand.

Early-Stage: Occasional
Growth: Common
Enterprise: Primary

Inference companies building on serving runtimes like vLLM and SGLang (Anyscale, Modal, Together, Fireworks) and frontier lab serving teams.

Let’s Find Your Next Builder

If you’re hiring at the AI frontier, let’s talk.