Introduction

Eighty-eight percent of AI agents never reach production. The ones that fail aren't failing because the model isn't good enough — they're failing because the engineering around the model is wrong. Wrong retrieval strategy. Wrong context management. No bounds on the agent loop. Token costs compounding at the square of context length on tasks engineers assumed were linear. A second model that would catch the errors the first cannot see in itself — never added because nobody had a name for it.

The patterns exist. Engineers at different companies are independently discovering that routing works better with a classifier in front of it, that long-running agents need checkpointing, that parallel workers sharing a stable context prefix should cache it once. The techniques circulate as blog posts, conference talks, and GitHub repositories — each framed slightly differently, none connected to the others, none carrying the analysis that would let a practitioner know when to use one, why it solves what it solves, or what it costs.

In 1994, Gamma, Helm, Johnson, and Vlissides faced a structurally similar problem in object-oriented software. They did not invent Observer, Factory Method, Strategy, or Decorator; experienced engineers were already using them. What they did was name them precisely, describe the forces each one resolves, and give practitioners a shared vocabulary to reason with. A generation of engineers could say "use a Strategy here" and mean something exact. The vocabulary spread because it was useful, not because it was novel.

This catalog applies that method to AI engineering. It is a homage to the Gang of Four approach, not a claim to their authority. The patterns documented here are already in practice — the goal is to name them precisely enough to be useful.

The throughline is simple: use the smallest sufficient pattern. Zero-shot before few-shot. Single agent before multi-agent. Retrieval only when it earns its context budget. The patterns here are not ranked by sophistication: a well-placed Zero-Shot is not a lesser engineering decision than a Tree of Thoughts. Each pattern is appropriate or inappropriate given the problem's actual requirements, the context budget available, and what the next simpler pattern fails to achieve.
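One way to read "smallest sufficient" is as a ladder climbed only as far as needed. The sketch below is illustrative and not a procedure taken from the catalog entries: the names `smallest_sufficient`, `ladder`, and `is_sufficient` are hypothetical, and the two stand-in patterns return fixed strings where a real system would call a model.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a pattern is a name plus a function from task to output.
Pattern = Tuple[str, Callable[[str], str]]


def smallest_sufficient(
    task: str,
    ladder: Sequence[Pattern],
    is_sufficient: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return the first (name, output) on the ladder that passes the check."""
    for name, run in ladder:
        output = run(task)
        if is_sufficient(output):
            return name, output  # stop here; anything further is unearned cost
    return None  # nothing on the ladder was sufficient


# Stand-in patterns, ordered simplest first. Real ones would call a model.
def zero_shot(task: str) -> str:
    return f"answer to {task}"


def few_shot(task: str) -> str:
    return f"answer to {task}, formatted like the examples"


ladder = [("zero-shot", zero_shot), ("few-shot", few_shot)]

# Stand-in acceptance check; in practice this would be an evaluation suite.
chosen = smallest_sufficient("classify ticket", ladder, lambda out: "formatted" in out)
```

Here `chosen` names the few-shot rung, because the stand-in check rejects the zero-shot output. The point of the design is the early return: a more elaborate pattern is tried only after a simpler one has been shown to fall short.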

This book is for engineers building LLM systems in production: architects choosing between retrieval patterns, engineers implementing agent loops that need bounds, teams debugging why a multi-agent system costs ten times what was projected. It assumes you can write code. It does not assume you have read the transformer papers — that material is in the Mechanisms chapter at the back, there when you need it.

The catalog covers seven categories. Signal patterns govern how instructions, personas, and examples are shaped before the model sees them. Knowledge patterns govern context engineering — what information and memory the model has access to during a task. Reasoning patterns govern how a model structures its thinking: chain-of-thought, planning, tool use, reflection, and verification. Orchestration patterns govern how agents are coordinated — chains, routers, parallel workers, and hierarchies. Reliability patterns govern safety, cost bounds, and production hardening. Integration patterns govern how agents reach tools and other agents. Humanizer patterns govern continuity and adaptive behaviour across sessions.

Each entry gives you: an Intent (one sentence); a Motivation (the concrete problem and why it recurs); Applicability (when to use it and when not to); Decision Criteria (measurements and thresholds that distinguish this pattern from its alternatives); a Participants table with explicit must-not constraints; an Implementation Sketch; and a Related Patterns section covering dependencies, conflicts, and upgrade paths.

How to read this book. These entries are reference material — use them when you are choosing between alternatives, implementing a pattern for the first time, or recognising a failure mode you have encountered. You do not need to read sequentially. Jump to the category you need; each category introduction covers the forces every pattern in that group resolves before you reach individual entries.

The Mechanisms chapter at the back derives twelve principles from how transformers actually compute — attention cost scaling, KV cache structure, prefix caching economics, subagent context bounding. It is there for when you want to understand why a pattern's costs are what they are, not just that they are. Mechanism citations in pattern entries (for example, mechanism 2 — n² compute cost) are cross-references to that chapter. You can use the catalog without opening it; it is a derivation of what the patterns already tell you.
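As a rough illustration of how a cost assumed to be linear can turn out quadratic, the sketch below counts input tokens for an agent loop that resends its whole history on every turn. It is a simplified model and not the derivation in the Mechanisms chapter: the function name, the fixed tokens-per-turn figure, and the absence of any prefix caching are assumptions made for the example.

```python
def cumulative_input_tokens(
    turns: int, tokens_per_turn: int, prefix_tokens: int = 0
) -> int:
    """Total input tokens sent across a loop that resends full history each turn.

    Assumes every turn adds a fixed number of tokens and nothing is cached.
    """
    total = 0
    context = prefix_tokens
    for _ in range(turns):
        context += tokens_per_turn  # history grows by a fixed amount per turn
        total += context            # the whole context is sent as input again
    return total


# A 20-turn task adding 500 tokens per turn.
linear_guess = 20 * 500                      # 10,000 if cost were linear in turns
actual = cumulative_input_tokens(20, 500)    # 500 * (20 * 21 / 2) = 105,000
doubled = cumulative_input_tokens(40, 500)   # 500 * (40 * 41 / 2) = 410,000
```

Under these assumptions the loop sends 105,000 input tokens where a linear estimate predicts 10,000, and doubling the number of turns roughly quadruples the total. Bounding the loop or caching a stable prefix changes the arithmetic, which is why the pattern entries treat those as explicit design decisions.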

No production system uses a single pattern in isolation. The Implementation Sketches throughout name which patterns compose naturally — which Reliability patterns wrap which Orchestration patterns, which Knowledge patterns feed which Reasoning patterns. The Appendix on Conflicts documents the tensions that require explicit design decisions: patterns that cannot run simultaneously, dependencies that are non-negotiable, and tradeoffs that cannot be resolved by convention.

The vocabulary this catalog establishes is a tool for thinking, not a checklist for compliance. Use it to communicate precisely with colleagues, to evaluate proposals against known forces, and to recognise when a new problem is in fact an old problem that has already been solved.

GO4 — AI Engineering Design Patterns
