What about Mark 1
Mark 1 infrastructure was originally designed for serverless ML model inference but proved inadequate for running agentic workloads. Built on Knative Custom Resource Definitions (CRDs) running atop managed Kubernetes clusters, it leveraged Knative Serving’s scale-to-zero capabilities and Kubernetes’ container orchestration features. The infrastructure handled pod autoscaling through the Knative Pod Autoscaler (KPA). It also allowed federating multiple clusters via a Blaxel agent that offloaded inference requests from one Knative cluster to another based on a usage metric. While it demonstrated reasonable stability even at 20+ requests per second and achieved passable cold starts through runtime optimization, its architecture wasn’t suited to the more lightweight workloads that make up most agentic activity: tool calls, agent orchestration, and external model routing. Mark 1 infrastructure was decommissioned in January 2025.
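For illustration, here is a minimal sketch of the kind of scale-to-zero Knative Service that an architecture like Mark 1 relies on, created through the Kubernetes Python client. The service name, namespace, image, and scaling values are hypothetical placeholders, not Blaxel's actual configuration; the autoscaling annotations themselves are standard Knative Serving settings.

```python
# Minimal sketch: creating a scale-to-zero Knative Service via the
# Kubernetes Python client. Names, namespace, image, and scaling
# values below are hypothetical examples.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "inference-demo", "namespace": "default"},
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    # Standard Knative autoscaling annotations: allow the
                    # revision to scale to zero when idle and back up under load.
                    "autoscaling.knative.dev/min-scale": "0",
                    "autoscaling.knative.dev/max-scale": "10",
                    # Target concurrent requests per pod, used by the
                    # Knative Pod Autoscaler (KPA) to make scaling decisions.
                    "autoscaling.knative.dev/target": "20",
                }
            },
            "spec": {
                "containers": [
                    # Hypothetical container image for an inference workload.
                    {"image": "ghcr.io/example/inference-model:latest"}
                ]
            },
        }
    },
}

# Knative Services are CRDs, so they are created through the
# custom objects API rather than the core API.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev",
    version="v1",
    namespace="default",
    plural="services",
    body=service,
)
```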
What about Mark 2
Mark 2 infrastructure ran workloads in containers, emulating most Linux system calls. Cold starts typically took between 2 and 10 seconds. After a deployment was queried, it stayed warm for a period that varied with overall infrastructure usage, allowing it to serve subsequent requests instantly. Mark 2 infrastructure was suitable when:
- your workload required system calls not supported by Mark 3 infrastructure
- boot times of around 5 seconds were suitable for your needs
- your deployment received consistent traffic that kept it warm (see the keep-warm sketch after this list)
- you needed to run workloads in specific regions for sovereignty or regulatory compliance, using deployment policies
- you required revision control for rollbacks or canary deployments
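Because a warm instance served requests instantly while an idle one incurred the 2-10 second cold start, a common workaround for low-traffic deployments was synthetic keep-warm traffic. Below is a minimal sketch of that pattern, assuming a hypothetical deployment URL and an assumed ping interval shorter than the idle timeout; it is not a Blaxel-provided tool.

```python
# Minimal keep-warm sketch: ping a deployment on an interval shorter
# than its idle timeout so real requests avoid the 2-10 s cold start.
# The URL and interval below are hypothetical assumptions.
import time
import urllib.request

DEPLOYMENT_URL = "https://my-deployment.example.com/health"  # hypothetical endpoint
PING_INTERVAL_SECONDS = 60  # assumed shorter than the deployment's idle timeout

while True:
    try:
        with urllib.request.urlopen(DEPLOYMENT_URL, timeout=10) as resp:
            print(f"keep-warm ping: HTTP {resp.status}")
    except Exception as exc:
        # A failed ping may mean the instance went cold; the next
        # successful request will trigger a cold start.
        print(f"keep-warm ping failed: {exc}")
    time.sleep(PING_INTERVAL_SECONDS)
```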
