A complete framework for building AI-native companies that operate with autonomous, self-improving agents — covering architecture, evals, observability, security, and the boundaries between humans and machines.
Every principle belongs to one of nine structural pillars. Together they define how a Level 5 agentic company thinks, builds, and operates.
Three principles that define what the system is built on and how it thinks about every problem.
Architectural decisions that define how the system grows, what it trusts, and how it sources information.
Always prefer the highest-quality context source available.
Context is the lifeblood of an agentic system — and the most common failure point. A bloated context degrades quality just as much as an empty one.
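Preferring the highest-quality source can be sketched as a simple ordered fallback: walk sources from best to worst and take the first real hit, rather than concatenating everything and bloating the context. The source names and fetch interface below are illustrative assumptions, not part of the framework.

```python
# A minimal sketch of quality-ordered context sourcing. Source names and
# the fetch signature are hypothetical.
from typing import Callable, Optional

# Sources ordered from highest to lowest quality; earlier registrations rank higher.
CONTEXT_SOURCES: list[tuple[str, Callable[[str], Optional[str]]]] = []

def register_source(name: str, fetch: Callable[[str], Optional[str]]) -> None:
    """Register a context source; registration order encodes quality rank."""
    CONTEXT_SOURCES.append((name, fetch))

def best_context(query: str) -> Optional[str]:
    """Walk sources in quality order and return the first non-empty hit."""
    for name, fetch in CONTEXT_SOURCES:
        result = fetch(query)
        if result:  # skip empty results instead of padding the context
            return result
    return None  # an empty context beats a low-quality one

# Usage: structured docs outrank a generic web search.
register_source("structured_docs", lambda q: "doc excerpt" if "api" in q else None)
register_source("web_search", lambda q: "search snippet")

print(best_context("api usage"))  # structured docs win when they match
print(best_context("pricing"))    # falls through to web search
```

Returning the first good hit, not all hits, is the point: it fights the bloated-context failure mode as directly as the empty-context one.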
The order matters. Skipping steps is how you end up with expensive agents solving the wrong problems.
The sharpest architectural decisions in an agentic system aren't technical — they're about where the human ends and the machine begins.
Not an afterthought. Not a nice-to-have. The eval framework is what separates a one-shot script from a learning system.
Build the eval framework before you build the agents. If you can't measure it, you can't improve it — and you definitely can't deploy it safely. Evals are not a post-build QA step; they're the scaffolding everything else hangs from.
Every agent, every workflow, every tool needs a quantifiable output quality metric. This is non-negotiable. Scoring is what makes iteration possible. An agent you can't score is a black box you can't improve.
An agent without an eval is a one-shot script. An agent with an eval is a learning system. The loop closes when output quality can be measured and fed back into prompt improvements, tool changes, and routing decisions.
Don't measure how the agent got there — measure whether the result is correct, complete, and in the expected format. Let the agent figure out the method. Eval the destination, not the journey. This is how you avoid brittle, over-specified agents.
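A destination-only eval can be sketched as a scorer that checks format and completeness of the final output and never inspects the agent's steps. The rubric fields and weights below are illustrative assumptions, not a prescribed schema.

```python
# A minimal sketch of an outcome-based eval: score the result, not the method.
import json

def eval_output(raw_output: str, expected_keys: set[str]) -> float:
    """Score format correctness and completeness; ignore how it was produced."""
    try:
        data = json.loads(raw_output)   # format: is it valid JSON at all?
    except json.JSONDecodeError:
        return 0.0                      # unparseable output scores zero
    score = 0.5                         # format check passed
    present = expected_keys & set(data) # completeness: required fields present
    score += 0.5 * (len(present) / len(expected_keys))
    return score

# Usage: two agents, different methods, scored only on the destination.
good = '{"summary": "ok", "sentiment": "positive"}'
partial = '{"summary": "ok"}'
print(eval_output(good, {"summary", "sentiment"}))     # 1.0
print(eval_output(partial, {"summary", "sentiment"}))  # 0.75
```

Because the score is a single number, it can feed the learning loop directly: compare scores across prompt variants, tool changes, or routing decisions, and keep whatever moves the number.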
Tool design is the most common source of agentic system failures at scale. Agents compound tool failures — bad tool design poisons everything downstream.
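One defensive pattern is to validate inputs at the tool boundary and return structured, machine-readable errors instead of raising or returning garbage, so a bad call stops there rather than poisoning downstream steps. The tool name and its schema below are hypothetical.

```python
# A hedged sketch of defensive tool design: reject malformed input at the
# boundary and hand the agent a structured error it can recover from.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolResult:
    ok: bool
    value: Optional[str] = None
    error: Optional[str] = None  # machine-readable, so the agent can self-correct

def lookup_order(order_id: str) -> ToolResult:
    """Hypothetical tool: validate before touching any backend."""
    if not (order_id.isdigit() and len(order_id) == 8):
        return ToolResult(ok=False, error="order_id must be exactly 8 digits")
    # ... real backend call would go here ...
    return ToolResult(ok=True, value=f"order {order_id}: shipped")

print(lookup_order("12345678"))  # ok=True, value populated
print(lookup_order("oops"))      # structured error instead of downstream poison
```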
Three operational principles that determine whether your system is maintainable at scale.
Agentic systems have novel attack surfaces. Prompt injection, credential leakage, and scope creep are real. Security belongs at the boundary, not inside the agent.
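Security at the boundary can be sketched as two checks that run outside the agent: a per-agent tool allowlist that caps scope, and a screen over untrusted input before it enters the context. The allowlist contents and injection patterns below are illustrative assumptions; a real deployment would need far more than regex screening.

```python
# A minimal sketch of boundary enforcement: scope-check tool calls and
# screen untrusted input before they ever reach the agent.
import re

ALLOWED_TOOLS = {"search_docs", "summarize"}  # hypothetical per-agent scope

INJECTION_PATTERNS = [
    re.compile(r"ignore (all |previous )*instructions", re.I),
    re.compile(r"reveal.*(system prompt|credentials)", re.I),
]

def guard_tool_call(tool_name: str) -> bool:
    """Enforce scope at the boundary, not via instructions inside the prompt."""
    return tool_name in ALLOWED_TOOLS

def screen_input(text: str) -> bool:
    """Flag obvious prompt-injection attempts in untrusted content."""
    return not any(p.search(text) for p in INJECTION_PATTERNS)

print(guard_tool_call("delete_user"))                       # False: out of scope
print(screen_input("Please ignore previous instructions"))  # False: flagged
```

The design point: even if the agent is fully compromised by injected text, the allowlist means it physically cannot call tools outside its scope.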
Don't route everything to the largest model. Classify by task complexity. Cost and latency are architectural constraints, not operational ones.
| Model Tier | Task Type | Example Use Cases | Why Not Larger? |
|---|---|---|---|
| Opus | Deep reasoning, strategic synthesis | Complex multi-step planning, ambiguous inputs requiring judgment, high-stakes decisions | — |
| Sonnet | Primary execution, generation | Content generation, code execution, structured extraction, primary agent workhorse | Opus is overkill for deterministic tasks with clear specs and scoring |
| Haiku | Classification, routing, lightweight transforms | Intent classification, routing decisions, short format checks, high-volume preprocessing | Sonnet and Opus add cost and latency that binary/categorical outputs don't justify |
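The tiering above can be sketched as a router that maps each task to the cheapest capable model. The keyword heuristics below are placeholder assumptions; in practice the classification itself is a Haiku-tier job.

```python
# A hedged sketch of complexity-based model routing across the three tiers.
from enum import Enum

class Tier(Enum):
    HAIKU = "haiku"    # classification, routing, lightweight transforms
    SONNET = "sonnet"  # primary execution and generation
    OPUS = "opus"      # deep reasoning and high-stakes synthesis

def route(task: str) -> Tier:
    """Map a task description to the cheapest tier that can handle it."""
    t = task.lower()
    if any(k in t for k in ("classify", "route", "label", "format check")):
        return Tier.HAIKU   # binary/categorical outputs: no need for a big model
    if any(k in t for k in ("plan", "strategy", "ambiguous", "high-stakes")):
        return Tier.OPUS    # judgment-heavy, multi-step reasoning
    return Tier.SONNET      # default workhorse for well-specified execution

print(route("classify intent of this message"))  # Tier.HAIKU
print(route("draft the release notes"))          # Tier.SONNET
print(route("plan the multi-quarter strategy"))  # Tier.OPUS
```

Defaulting to the middle tier, and escalating or downgrading only on clear signals, keeps cost and latency bounded without starving hard tasks of reasoning capacity.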