Category III — Reasoning Patterns

A Reasoning pattern is a design pattern that governs how the model processes its context to produce a structured answer — how it decomposes the problem, what intermediate work it writes down, whether it branches or backtracks, whether it calls tools, and how it checks itself before committing. Reasoning patterns separate the shape of thought from the content the model is reasoning over.

Usage

A language model that answers in one forward pass commits to the first plausible completion it can find. For arithmetic, multi-hop questions, planning, tool use, and any task with a non-obvious solution path, that is precisely where it fails: the answer is fluent, the reasoning behind it is invented after the fact, and there is no internal step at which the model could have caught its own mistake. Reasoning patterns intervene on the process — they prescribe what the model writes between the question and its final answer, and what it does with those intermediate writings.

The shift Reasoning patterns embody is from one-shot generation to structured deliberation. The category covers everything from the smallest such intervention — appending "Let's think step by step" — to the most elaborate: Monte Carlo Tree Search over an agent's own action trajectories. Apply a Reasoning pattern whenever:

the task requires multi-step inference the model will not produce by default;
the answer must be auditable as well as correct (intermediate steps inspectable);
the model must interact with the world via tools and adapt to what it observes;
one-shot generation is empirically unreliable on the task and quality matters more than the saved tokens.

Forces

Every Reasoning pattern resolves three forces in tension. A pattern fits a situation when it balances them in the way that situation demands.

Tokens are not free, and reasoning is tokens. Every intermediate step the model writes, every retry, every branch in a tree of thoughts, every verification pass — all of them are tokens billed and latency added. Cost rises at least linearly in call count, and quadratically in attention cost per call as context accumulates (mechanism 2 — O(seq_len²) attention). For looping patterns (ReAct, Reflexion, Self-Refine, Tree-of-Thoughts), each step appends to the accumulated context, so each subsequent LLM call attends over a longer prefix at growing quadratic cost — making the true cost super-linear, not linear, with deliberation depth.
One-shot answers are confident but unreliable. Left to itself, the model commits to its first plausible completion. On compositional, numerical, multi-hop, or tool-mediated tasks, that completion is wrong often enough that an intervention is required — and the intervention has to be structural, because the model cannot self-detect its own failure mode without being asked to.
Adaptability and efficiency trade off directly. A pattern that decides every next step in light of the previous observation (ReAct) is maximally adaptive but expensive. A pattern that plans everything upfront and executes blind (ReWOO) is cheap but cannot react. Every Reasoning pattern picks a point on this spectrum, explicitly. The mechanistic basis of this trade-off: ReAct's accumulated trajectory context grows with each step, paying O(n²) attention cost (mechanism 2) on every subsequent LLM call; ReWOO's Worker phase is deterministic code execution (mechanism 7) — no LLM calls, no stochastic variance, no KV-cache growth. The adaptability/efficiency spectrum maps directly onto the question of how much in-context stochastic generation versus deterministic computation is on the critical path.

A Reasoning pattern is, in each case, a disciplined answer to one question: what structure of deliberation gives this task the quality it needs at a cost the system can afford?

Structure

All Reasoning patterns share one skeleton. They interpose a deliberation stage between the question and the answer:

  Question ────▶ Deliberation ────▶ Answer
                 (decompose,
                  search,
                  tool-use,
                  verify,
                  iterate)

Patterns differ in what the deliberation does — write a linear chain, branch into a tree, alternate thought and tool call, plan-then-execute, draft-then-verify, retry-with-critique — and in where the intermediate work lives — in the prompt itself, across multiple LLM calls, in a sandboxed interpreter, or in an external memory carried across attempts. These four locations correspond to distinct storage tiers with different cost profiles (mechanism 9): in-context storage pays O(n²) attention cost on every call; prefix caching across calls (mechanism 5) pays a one-time write cost then reads at ~10% of normal token cost within a TTL window; external execution environments (deterministic interpreters, tool sandboxes) store intermediate values at near-zero LLM cost (mechanism 7); external stores (vector indices, key-value stores) pay retrieval cost but no attention cost per token (mechanism 10). Choosing where deliberation lives is a cost-architecture decision, not just a structural one. The sub-bands below group patterns by the shape of the deliberation they prescribe.

Examples

III-A — Chain-of-Thought family. Linear, in-context reasoning traces.

R1 Zero-Shot CoT — append a trigger phrase; let the model write the chain.
R2 Few-Shot CoT — supply worked examples with reasoning steps.
R19 Step-Back Prompting — abstract the question first, derive the principle, then specialise back.

III-B — Plan-and-Act. Separate planning from execution.

R3 Plan-and-Solve — produce an inspectable plan upfront, then execute it.
R5 ReWOO — plan every tool call with placeholders, execute without an LLM in the loop, synthesise once.

III-C — Tool-Use loops. Interleave reasoning with actions against the world.

R4 ReAct — Thought $\to$ Action $\to$ Observation, repeat; each next step conditioned on what came back.
R13 CodeAct — emit executable Python as the action language, with stdout / errors returning as the Observation.
R14 Program of Thoughts — delegate the computation (not the orchestration) to a deterministic interpreter.

III-D — Decomposition. Break the question apart before answering it.

R6 Self-Ask — explicit follow-up sub-questions, each answered, then composed.
R12 Skeleton-of-Thought — outline first, then expand each outline point in parallel.

III-E — Search. Explore a space of partial solutions rather than committing to one path.

R9 Tree of Thoughts — branching search with LLM-evaluated nodes and backtracking.
R10 LATS — Monte Carlo Tree Search unifying ReAct, ToT, and Reflexion under UCB selection.
R18 Graph of Thoughts — directed graph of thoughts with aggregate edges that merge branches no tree can.
R11 Buffer of Thoughts — retrieve a thought-template from past problems instead of re-searching.

III-F — Reflection and Verification. Generate, then check or improve.

R7 Reflexion — verbal critique of a failed attempt carried into the retry.
R8 Self-Refine — generate, self-critique, revise, loop — single model, no external signal.
R17 Self-Consistency Voting — sample N independent reasoning paths and take the majority.
R20 Chain-of-Verification — draft, generate verification questions, answer them independently, revise.

III-G — Multi-Mode. Run two distinct reasoning modes side by side.

R16 Talker-Reasoner — a fast conversational Talker and a slow deliberative Reasoner running concurrently against a shared memory.

Quick Reference

#	Pattern	Also Known As	LLM Calls	Best For
R1	Zero-Shot CoT	"Think step by step"	1	Quick reasoning improvement; no examples
R2	Few-Shot CoT	Exemplar CoT	1	Consistent reasoning format with examples
R3	Plan-and-Solve	Explicit Planning	2	Well-defined multi-step workflows
R4	ReAct	Reason+Act Loop	N per step	Exploratory; adaptive; unpredictable paths
R5	ReWOO	Reasoning Without Observation	2 total	Independent tool calls; 5$\times$ cheaper than R4
R6	Self-Ask	Decomposition	1 + N follow-ups	Multi-hop factual questions
R7	Reflexion	Verbal Reinforcement	N $\times$ retries	Clear pass/fail criteria; retries acceptable
R8	Self-Refine	Generate-Critique-Refine	N iterations	General quality improvement; no separate judge
R9	Tree of Thoughts	ToT	N (branching)	Hard open-ended; path unknown
R10	LATS	Language Agent Tree Search	N (tree search)	Highest quality; highest cost
R11	Buffer of Thoughts	BoT	1 + template	12% cost of ToT; reusable templates
R12	Skeleton-of-Thought	SoT	1 + N parallel	Parallel generation; latency reduction
R13	CodeAct	Executable Code Actions	N (with execution)	Multi-tool; ~20pp accuracy gain over JSON
R14	Program of Thoughts	PoT	1 + execution	Numerical/mathematical tasks
R16	Talker-Reasoner	System 1/System 2	Dual async	Real-time + deliberative combined
R17	Self-Consistency	Majority Voting	N samples	Factual tasks; sample and vote
R18	Graph of Thoughts	GoT	N (DAG)	Non-linear reasoning; merging thought branches
R19	Step-Back Prompting	Abstraction Prompting	2	Abstract to principle before answering
R20	Chain of Verification	CoVe	1 + N verifications	Reduce hallucination; verify each claim

R1 — Zero-Shot CoT

Append a short reasoning-elicitation trigger (canonically "Let's think step by step") to a zero-shot prompt and let the model write its reasoning out before the final answer — no examples, no decomposition, no scaffold.

Full entry: R1-Zero-Shot-CoT.md

R2 — Few-Shot CoT

Put k worked examples in the prompt — each one a complete question with its reasoning steps leading to the answer — so the model learns from the demonstrations both how to reason about the task and what the answer should look like.

Full entry: R2-Few-Shot-CoT.md

R3 — Plan-and-Solve

Split reasoning into two distinct LLM calls — first a Plan call that produces an explicit, inspectable step list from the full task in view, then an Execute call (or chain) that carries the plan out — so plan quality and execution efficiency are tuned independently.

Full entry: R3-Plan-and-Solve.md

R4 — ReAct

Interleave a free-text Thought, a structured Action (tool call), and the returning Observation in a single loop, so each next reasoning step is conditioned on what the previous action actually returned rather than on a plan made before the world was seen.

Full entry: R4-ReAct.md

R5 — ReWOO

Plan every tool call upfront in a single LLM pass, execute the plan without any LLM in the loop, then synthesise the answer from the collected evidence — trading mid-run adaptability for roughly 5$\times$ token efficiency over R4.

Full entry: R5-ReWOO.md

R6 — Self-Ask

Decompose a compositional question into explicit follow-up sub-questions, answer each one (optionally via a tool or retriever), then compose the final answer from the intermediate answers.

Full entry: R6-Self-Ask.md

R7 — Reflexion

Retry a failed task with a verbal critique of the previous attempt in context — converting an automated pass/fail signal into linguistic feedback that the next attempt can read and act on.

Full entry: R7-Reflexion.md

R8 — Self-Refine

Have one model generate an output, critique its own output, and revise it from that critique — looping until a stopping condition fires, with no external feedback signal and no second model.

Full entry: R8-Self-Refine.md

R9 — Tree of Thoughts

Search a tree of partial-solution states by having the LLM generate candidate next thoughts, evaluate the promise of each, and explore the most promising branches with backtracking — turning one-shot reasoning into deliberate exploration of a solution space.

Full entry: R9-Tree-of-Thoughts.md

R10 — Language Agent Tree Search (LATS)

Run Monte Carlo Tree Search over an agent's reasoning trajectories: select promising branches by UCB, expand with LLM-proposed actions, evaluate with an LLM value function, simulate forward, and backpropagate value through the tree — so the agent searches the solution space the way AlphaGo searches a board. Unifies R4, R7, and R9 under MCTS.

Full entry: R10-LATS.md

R11 — Buffer of Thoughts

Maintain a meta-buffer of reusable high-level thought-templates distilled from past problems, and for each new problem retrieve the most relevant template and instantiate it — trading expensive per-problem search for amortised reuse of reasoning structure.

Full entry: R11-Buffer-of-Thoughts.md

R12 — Skeleton-of-Thought

Generate an outline of the answer in one call, then expand each outline point in parallel, then aggregate — turning a sequentially-decoded long-form response into a fan-out / fan-in inside a single agent's thinking.

Full entry: R12-Skeleton-of-Thought.md

R13 — CodeAct

Have the agent emit executable Python code as its action — calling tools, composing them with control flow, and parking intermediate values in variables — instead of emitting a single structured JSON tool call per step, with the code running in a sandbox and its stdout / errors returning as the Observation.

Full entry: R13-CodeAct.md

R14 — Program of Thoughts

Generate a self-contained program that computes the answer, run it in a deterministic interpreter, return the interpreter's output — delegating numerical and symbolic work out of the model's tokens and into code. Distinct from R13: PoT offloads computation, CodeAct offloads orchestration.

Full entry: R14-Program-of-Thoughts.md

R15 — Inner Monologue: intentional gap. The MIRROR paper (arXiv:2506.00430) proposed inner monologue as a background reasoning substrate. After review, this was classified as a Humanizer concern — it describes how an agent maintains continuous inner speech across turns and sessions, not a reasoning technique applied within a single turn. Documented as H6 Continuous Inner Monologue (Humanizers category). R15 is reserved and will not be reused.

R16 — Talker-Reasoner

Split the agent into a fast, conversational Talker that handles every user turn in real time and a slow, deliberative Reasoner that thinks in the background and injects conclusions when ready — two cognitive speeds running concurrently against a shared memory.

Full entry: R16-Talker-Reasoner.md

R17 — Self-Consistency Voting

Run the same prompt N times with diversity-inducing sampling, then select the answer by majority vote — marginalising over independent reasoning paths instead of trusting any single one.

Full entry: R17-Self-Consistency-Voting.md — was a Signal pattern (former S7); relocated here because the mechanism is sampling diverse reasoning paths, not shaping the prompt.

R18 — Graph of Thoughts

Represent reasoning as a directed graph whose vertices are LLM-generated thoughts and whose edges are generate, refine, and — uniquely — aggregate operations, so partial results from different branches can be merged into a single composite thought that no tree-shaped search can produce.

Full entry: R18-Graph-of-Thoughts.md

R19 — Step-Back Prompting

Before answering a specific question, ask a more abstract version of it, derive the underlying principle or concept, and then specialise that principle back to the original — so reasoning starts from a level the model handles more reliably than the specific case.

Full entry: R19-Step-Back-Prompting.md — the Step-Back-as-retrieval-key move is the Step-Back variant of K2 Query Transformation; same abstraction applied at a different layer.

R20 — Chain-of-Verification

Have a model draft an answer, generate verification questions targeted at its own factual claims, answer each question independently so the answers do not lean on the draft, and revise the draft from those answers — turning hallucination into a thing the model checks against itself.

Full entry: R20-Chain-of-Verification.md

GO4 — AI Engineering Design Patterns