GO4 — Master Reference List

Consolidated bibliography for all patterns across all seven categories, organised by source type. Every citation used in any pattern file appears here; the patterns that cite each source are listed after "Cited by:".

Academic Papers

Foundational LLM Papers

Brown, T., Mann, B., Ryder, N., et al. (2020) "Language Models are Few-Shot Learners" NeurIPS 2020 arXiv: 2005.14165 $\to$ Established in-context learning (few-shot). The empirical foundation for S2 (Few-Shot), I2 (Function Call). Cited by: S2, I2

Vaswani, A., Shazeer, N., Parmar, N., et al. (2017) "Attention Is All You Need" NeurIPS 2017 arXiv: 1706.03762 $\to$ The transformer architecture underlying all patterns in this collection. Cited by: foundational context

Olsson, C., Elhage, N., Nanda, N., et al. (2022) "In-Context Learning and Induction Heads" Transformer Circuits Thread (Anthropic) transformer-circuits.pub/2022/in-context-learning/index.html $\to$ Induction heads: a two-step attention circuit performing match-and-copy ([A][B]…[A]$\to$[B]); argued to be a major mechanism behind in-context learning. Mechanistic basis for why few-shot examples work. Cited by: S2

Liu, N. F., Lin, K., Hewitt, J., et al. (2024) "Lost in the Middle: How Language Models Use Long Contexts" TACL 2024 arXiv: 2307.03172 $\to$ U-shaped recall over long context: strong at the start/end, materially weaker in the middle. Empirical foundation for the "clean the data room first" discipline. Cited by: K-series (Chapter 0 Mechanism 4)

Prompting and Reasoning Papers

Wei, J., Wang, X., Schuurmans, D., et al. (2022) "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" NeurIPS 2022 arXiv: 2201.11903 $\to$ Established CoT as a prompting technique. Direct foundation for R1 (Zero-Shot CoT) and R2 (Few-Shot CoT). Cited by: R1, R2

Wang, X., Wei, J., Schuurmans, D., et al. (2022) "Self-Consistency Improves Chain of Thought Reasoning in Language Models" ICLR 2023 arXiv: 2203.11171 $\to$ Established self-consistency voting. N=5-10 samples; majority vote outperforms greedy decoding on reasoning tasks. Cited by: R17, R-category conflict notes

Kojima, T., Gu, S. S., Reid, M., et al. (2022) "Large Language Models are Zero-Shot Reasoners" NeurIPS 2022 arXiv: 2205.11916 $\to$ "Let's think step by step" zero-shot CoT. Foundation for R1. Cited by: R1

Wang, L., Xu, W., Lan, Y., et al. (2023) "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models" ACL 2023 arXiv: 2305.04091 $\to$ Establishes Plan-and-Solve as two-step: extract plan $\to$ execute. Foundation for R3. Cited by: R3

Yao, S., Zhao, J., Yu, D., et al. (2022) "ReAct: Synergizing Reasoning and Acting in Language Models" ICLR 2023 arXiv: 2210.03629 $\to$ The foundational ReAct paper. Thought-Action-Observation loop. One of the most cited papers in this collection. Cited by: R4, R5-conflict

Xu, B., Peng, Z., Lei, B., et al. (2023) "ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models" arXiv: 2305.18323 $\to$ Reasoning Without Observation. Plans all tool calls upfront instead of interleaving them with observations. Reports 5$\times$ token efficiency over ReAct on HotpotQA. Cited by: R5

Press, O., Zhang, M., Min, S., et al. (2022) "Measuring and Narrowing the Compositionality Gap in Language Models" arXiv: 2210.03350 $\to$ Self-Ask decomposition pattern. Compositional multi-hop question answering. Cited by: R6

Shinn, N., Cassano, F., Berman, E., et al. (2023) "Reflexion: Language Agents with Verbal Reinforcement Learning" NeurIPS 2023 arXiv: 2303.11366 $\to$ GPT-4 HumanEval 80% $\to$ 91% via verbal self-critique. Foundation for R7, H2. Cited by: R7, H2

Madaan, A., Tandon, N., Gupta, P., et al. (2023) "Self-Refine: Iterative Refinement with Self-Feedback" NeurIPS 2023 arXiv: 2303.17651 $\to$ Generate-Critique-Refine loop without separate judge. Foundation for R8, O5. Cited by: R8

Yao, S., Yu, D., Zhao, J., et al. (2023) "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" NeurIPS 2023 arXiv: 2305.10601 $\to$ BFS/DFS over reasoning states. Foundation for R9. Cited by: R9

Zhou, A., Yan, K., Shlapentokh-Rothman, M., et al. (2024) "Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models" ICML 2024 arXiv: 2310.04406 $\to$ MCTS + ReAct + Reflexion unified. Foundation for R10. Cited by: R10

Yang, L., Yu, Z., Zhang, T., et al. (2024) "Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models" NeurIPS 2024 arXiv: 2406.04271 $\to$ Reusable thought templates retrieved from a meta-buffer. Reports roughly 12% of the cost of multi-query methods such as ToT/GoT. Foundation for R11. Cited by: R11

Ning, X., Lin, Z., Zhou, Z., et al. (2024) "Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation" ICLR 2024 arXiv: 2307.15337 $\to$ Parallel section generation via outline. Reduces latency for structured long-form output. Foundation for R12. Cited by: R12

Wang, X., Chen, Y., Yuan, L., et al. (2024) "Executable Code Actions Elicit Better LLM Agents" ICML 2024 arXiv: 2402.01030 $\to$ CodeAct: executable Python as the agent's action space instead of JSON or text tool calls. Reports up to 20% higher success rate than those alternatives. Foundation for R13. Cited by: R13, V8

Chen, W., Ma, X., Wang, X., Cohen, W. W. (2022) "Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks" arXiv: 2211.12588 $\to$ Delegates computation to Python interpreter. Foundation for R14. Cited by: R14

Adams, G., Fabbri, A., Ladhak, F., et al. (2023) "From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting" arXiv: 2309.04269 $\to$ Iterative densification without length increase. Foundation for K6 Chain-of-Density variant. Cited by: K6

Memory and Knowledge Papers

Packer, C., Fang, V., Patil, S. G., et al. (2023) "MemGPT: Towards LLMs as Operating Systems" arXiv: 2310.08560 $\to$ OS-inspired memory hierarchy for LLMs. Main memory / external storage analogy. Foundation for K10, K11, H9. Cited by: K10, K11, H2, H9

Gao, L., Ma, X., Lin, J., Callan, J. (2023) "Precise Zero-Shot Dense Retrieval without Relevance Labels" ACL 2023 arXiv: 2212.10496 $\to$ HyDE: the model writes a hypothetical answer document, and that document's embedding is used in place of the raw query's, improving zero-shot dense retrieval without relevance labels. Foundation for K2. Cited by: K2

Edge, D., Trinh, H., Cheng, N., et al. (2024) "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" arXiv: 2404.16130 $\to$ GraphRAG: entity-relationship graph for multi-hop retrieval. Foundation for K3. Cited by: K3

Sarthi, P., Abdullah, R., Tuli, A., et al. (2024) "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval" ICLR 2024 arXiv: 2401.18059 $\to$ Multi-level summary tree for hierarchical retrieval. Foundation for K4. Cited by: K4

Asai, A., Wu, Z., Wang, Y., et al. (2024) "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" ICLR 2024 arXiv: 2310.11511 $\to$ Model decides when to retrieve; critiques own outputs. Foundation for K5. Cited by: K5

Yan, S., Gu, J., Zhu, Y., Ling, Z. (2024) "Corrective Retrieval Augmented Generation" arXiv: 2401.15884 $\to$ Evaluates retrieval quality; triggers web search fallback. Foundation for K6. Cited by: K6

Agent Architecture Papers

Wang, G., Xie, Y., Jiang, Y., et al. (2023) "Voyager: An Open-Ended Embodied Agent with Large Language Models" arXiv: 2305.16291 $\to$ Autonomous Minecraft agent building a skill library. Foundation for H4. Cited by: H4

Salemi, A., Mysore, S., Bendersky, M., Zamani, H. (2023) "LaMP: When Large Language Models Meet Personalization" arXiv: 2304.11406 $\to$ LLM personalisation: user-specific style adaptation. Foundation for H7. Cited by: H7

Cognitive Architecture Papers

"Theater of Mind: A Global Workspace Framework for LLM Agent Architecture" (2025) arXiv: 2604.08206 $\to$ Global Workspace Theory applied to LLMs. Introduces: Genesis State, autobiographical directives, entropy monitoring for deadlock breaking, epistemic state tracking. Foundation for H1, H3, H9. Cited by: H1, H3, H6, H9

"MIRROR: Inner Monologue as a First-Class Architectural Component" (2025) arXiv: 2506.00430 $\to$ Background Thinker process, continuous inner monologue, LEGOMem skill accumulation. Foundation for H4, H6, R15. Cited by: H4, H6, R15

"Talker-Reasoner: Dual-Process Architecture for Conversational Agents" (2024) arXiv: 2410.08328 $\to$ System 1 (Talker: fast, reactive) + System 2 (Reasoner: slow, deliberative) dual architecture. Foundation for R16. Cited by: R16

"Agentic Communities: Patterns for Multi-Agent AI Systems" (2025) arXiv: 2601.03624 $\to$ 46-pattern catalog. ISO ODP-EL deontic governance tokens (PERMIT, PROHIBIT, OBLIGATE, WAIVE). Foundation for V7, O-category patterns, H5. Cited by: V7, H5, O9-O13

"Inside the Scaffold: Empirical Taxonomy of Coding Agent Architectures" (2025) arXiv: 2604.03515 $\to$ 13 coding agents, 12 dimensions, 5 loop primitives. Key finding: 11/13 use stacked primitives. Two fault lines: LLM-as-navigator vs scaffold-understands-code. Foundation for O16. Cited by: O16

"Blackboard Multi-Agent Systems for LLMs" (bMAS) (2024) arXiv: 2510.01285 $\to$ Shared blackboard architecture achieving SOTA reasoning at lower token cost than static pipelines. Foundation for O11. Cited by: O11

Evaluation Papers

Zheng, L., Chiang, W.-L., Sheng, Y., et al. (2023) "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena" NeurIPS 2023 arXiv: 2306.05685 $\to$ LLM-as-Judge methodology; documents position, verbosity, and self-enhancement bias. Foundation for V15. Cited by: V15

Safety and Security Papers

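Bai, Y., Jones, A., Ndousse, K., et al. (2022) "Constitutional AI: Harmlessness from AI Feedback" Anthropic arXiv: 2212.08073 $\to$ Constitutional AI: supervised self-critique and revision against a written set of principles, followed by reinforcement learning from AI feedback (RLAIF) in place of human harmlessness labels. Foundation for S9, H5. Cited by: S9, H5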

Perez, F., Ribeiro, I. (2022) "Ignore Previous Prompt: Attack Techniques for Language Models" arXiv: 2211.09527 $\to$ An early systematic study of prompt injection. Introduces the PromptInject framework and two attack classes: goal hijacking and prompt leaking. Foundation for V6. Cited by: V6

Prompt Engineering Papers

White, J., Fu, Q., Hays, S., et al. (2023) "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT" PLoP 2023 (Vanderbilt University) arXiv: 2302.11382 $\to$ 16-pattern prompt pattern catalog in GoF format. The closest prior work to this entire project. Covers Signal patterns primarily. Cited by: S1-S10, meta-reference

"AutoPDL: Automated Prompt Design with Large Language Models" (2025) arXiv: 2504.04365 $\to$ Automated prompt design loop. Foundation for S8, H8. Cited by: S8, H8

"Meta Prompting: Enhancing Language Models with Task-Agnostic Scaffolding" (2023) arXiv: 2311.11482 $\to$ Meta-prompting: model generates candidate prompts; selects best. Foundation for S8. Cited by: S8

Books

Gamma, E., Helm, R., Johnson, R., Vlissides, J. (1994) Design Patterns: Elements of Reusable Object-Oriented Software Addison-Wesley $\to$ The original Gang of Four. This entire project is an attempt to do for AI engineering what GoF did for OOP. Cited by: all files (foundational)

Nygard, M. T. (2007) Release It! Design and Deploy Production-Ready Software Pragmatic Bookshelf (2nd ed. 2018) $\to$ Circuit breaker pattern. Stability patterns for production systems. Foundation for V9. Cited by: V9

Baddeley, A. D. (2007) Working Memory, Thought, and Action Oxford University Press (Original model: Baddeley & Hitch, 1974; episodic buffer added by Baddeley, 2000) $\to$ Episodic buffer, central executive, visuospatial sketchpad, phonological loop. Grounds K10 Long-Term Memory (episodic, semantic, and procedural variants). Foundation for cognitive grounding of memory patterns. Cited by: K10, H9

Minsky, M. (1986) The Society of Mind Simon & Schuster $\to$ Society of mind as multi-agent architecture. Foundation for O10 (Swarm). Cited by: O10

Kahneman, D. (2011) Thinking, Fast and Slow Farrar, Straus and Giroux $\to$ System 1 (fast, intuitive) / System 2 (slow, deliberative) dual-process theory. Foundation for R16 (Talker-Reasoner). Cited by: R16

Specifications and Standards

Anthropic Model Context Protocol (MCP) Specification (November 2024) modelcontextprotocol.io $\to$ Standardised tool discovery, authentication, and invocation. Foundation for I3. Cited by: I3, V13, CONFLICTS

Google Agent2Agent (A2A) Protocol Specification (2025) github.com/google-a2a/A2A $\to$ Structured cross-agent task delegation with streaming status. Foundation for I5, I6. Cited by: I5, I6

IBM/Red Hat Agent Communication Protocol (ACP) (2025) $\to$ RESTful, message-based agent communication. Alternative to A2A. Foundation for I6. Cited by: I6

Linux Foundation Agentic AI Interoperability Framework (AAIF) (2025) $\to$ Standards body for agent interoperability. Covers A2A, ACP, ANP. Foundation for I5, I6. Cited by: I5, I6

OpenTelemetry GenAI Semantic Conventions (CNCF, 2024-25) opentelemetry.io/docs/specs/semconv/gen-ai/ $\to$ Standard trace format for LLM operations. Foundation for V14. Cited by: V14

OWASP LLM Top 10 (2025 Edition) owasp.org/www-project-top-10-for-large-language-model-applications/ $\to$ LLM01 Prompt Injection, LLM05 Improper Output Handling (including model output passed to exec/eval), LLM06 Excessive Agency, LLM07 System Prompt Leakage. Foundation for V3, V4, V6, V8. Cited by: V3, V4, V5, V6, V8

European Union AI Act (2024) eur-lex.europa.eu, Regulation (EU) 2024/1689 $\to$ Article 9 (Risk Management), Article 14 (Human Oversight), Article 50 (Transparency obligations). Foundation for V1, V7, H10. Cited by: V1, V7, H10

NIST AI Risk Management Framework (AI RMF 1.0) (2023) airc.nist.gov/technical-reports/ [direct PDF link stale — landing page confirmed live] $\to$ Govern, Map, Measure, Manage framework. Foundation for V5, V7, V18. Cited by: V5, V7, V18

IETF RFC 8615 — Well-Known Uniform Resource Identifiers (2019) $\to$ /.well-known/ standard. Foundation for I5 (Agent Card URL convention). Cited by: I5

ISO/IEC 15414, ODP Enterprise Language (ODP-EL) $\to$ Deontic modalities used in the Agentic Communities paper for governance tokens. Foundation for V7. Cited by: V7

Practitioner Frameworks

Andrew Ng (2024) "What's next for AI agentic workflows" Sequoia Capital AI Ascent talk; agentic design patterns series in DeepLearning.AI's The Batch $\to$ Four agentic patterns: Reflection, Tool Use, Planning, Multi-Agent Collaboration. Cited by: all categories (foundational context)

Anthropic (2024-25) "Building Effective Agents" anthropic.com/research/building-effective-agents $\to$ Five workflow patterns: Prompt Chaining, Routing, Parallelization, Orchestrator-Workers, Evaluator-Optimizer. Primary source for O2-O6. Cited by: O2, O3, O4, O5, O6, V1, V14

Anthropic (2025) "Effective Context Engineering for AI Agents" anthropic.com/engineering/effective-context-engineering-for-ai-agents $\to$ Canonical "context as finite resource" post. Verbatim: LLMs have an "attention budget"; transformer attention is n² in tokens; recall degrades as context grows; goal is "the smallest possible set of high-signal tokens." Primary mechanistic source for the K-series and the data-room workflow. Cited by: K-series (Chapter 0 Mechanisms 2, 5)

Anthropic (2025) "Equipping Agents for the Real World with Agent Skills" anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills $\to$ Three-level progressive disclosure (metadata $\to$ SKILL.md $\to$ bundled files); bundled context "effectively unbounded." Mechanistic basis for skills-not-prompts. Cited by: I-series (Chapter 0 Mechanism 1)

Anthropic (2025) "Writing Effective Tools for AI Agents" anthropic.com/engineering/writing-tools-for-agents $\to$ Tools as a contract between deterministic systems and non-deterministic agents; bundle deterministic operations rather than have the model re-derive them. Cited by: I-series, V-series

Anthropic (2025) "Code Execution with MCP: Building More Efficient AI Agents" anthropic.com/engineering/code-execution-with-mcp $\to$ Treating tool calls as code keeps intermediate results out of context; reports ~98.7% token reduction (150k $\to$ 2k) in one case. Determinism-vs-sampling evidence. Cited by: I-series

Anthropic (2025-26) "Claude Code Memory" and "Memory Tool" (docs) docs.anthropic.com/en/docs/claude-code/memory · platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool $\to$ Persistence is externalised memory (CLAUDE.md / MEMORY.md / /memory files re-loaded into context), not weight updates. Corrects the "skills compound" folk-claim. Cited by: H-series (Chapter 0 Mechanism 10)

Dex Horthy / HumanLayer (2025) "12-Factor Agents: Best Practices for Building AI Agents in Production" github.com/humanlayer/12-factor-agents [original domain 12factor.agency has expired] $\to$ The 12 factors: Natural Language to Tool Calls; Own Your Prompts; Own Your Context Window; Tools Are Just Structured Outputs; Unify Execution State and Business State; Launch/Pause/Resume with Simple APIs; Contact Humans with Tool Calls; Own Your Control Flow; Compact Errors into Context Window; Small, Focused Agents; Trigger from Anywhere, Meet Users Where They Are; Make Your Agent a Stateless Reducer. Cited by: V1, V9, V10, V11, V12, V14

Lilian Weng (2023-25) "LLM-powered Autonomous Agents" lilianweng.github.io/posts/2023-06-23-agent/ $\to$ Comprehensive survey covering planning, memory, tool use, multi-agent. One of the most-cited practitioner resources. Cited by: S2, S3, R17, R4, R7, K10, K11, H7, V15

Simon Willison (2022-25) "Prompt injection attacks against GPT-3" and subsequent posts simonwillison.net $\to$ Lethal Trifecta concept: catastrophic injection risk arises when an agent combines access to private data, exposure to untrusted content, and the ability to communicate externally. 6 defense patterns. Dual LLM pattern. Cited by: V3, V4, V5, V6

Andrej Karpathy (2025) "Software Is Changing (Again)" (YC AI Startup School) and related talks $\to$ "Harness engineering" era framing. Vibe coding $\to$ agentic engineering transition. Context engineering. Cited by: all categories (foundational context)

Martin Fowler and Birgitta Böckeler (2024) "Exploring Generative AI" series martinfowler.com/articles/exploring-gen-ai.html $\to$ Harness Architecture 2$\times$2 framework. Practical agent design patterns. Cited by: background context

Industry Reports

Composio (2025) "AI Agent Report 2025" composio.dev/blog/ai-agent-report [temporarily unavailable June 2026 due to security incident — report expected to return] $\to$ Key findings: 88% of AI agents never reach production. Tool overload quantification: 43% $\to$ 14% selection accuracy. Production failure root cause analysis. Simulation as recommended mitigation. Cited by: V1, V9, V13, V16, V18

Pinecone (2025) "Nexus: Agent Operating Context" and NoQL query language pinecone.io/blog/nexus [link unavailable as of June 2026; content may have moved within Pinecone docs] $\to$ Explicit repositioning from vector similarity to agent operating context bundles. NoQL carries intent, filters, access policy, provenance, response shape, and confidence, not just similarity. Rediscovery quantification: up to 85% of agent compute consumed by context re-assembly rather than task execution. Conceptual and empirical foundation for K13 Retrieval Bundle. Cited by: K13

PageIndex (2025) Document tree retrieval — hierarchical indexing for structured documents pageindex.ai $\to$ Claim: many documents should never be chunked because document structure carries meaning that vector flattening destroys. Hierarchical tree approach (table of contents with per-node summaries; model reasons through tree to find section). Reports 98.7% accuracy on FinanceBench evaluation using tree retrieval vs. lower accuracy with embedding-based chunk retrieval. Foundation for the structured document shape in K13 and confirmation of K4 RAPTOR's core principle. Cited by: K13, K4

Chroma (2025) "Context Rot" research trychroma.com $\to$ Model performance degrades as context window fills with mixed-authority, mixed-freshness, and inferred-alongside-confirmed content — not because the correct answer is absent, but because it is not presented in a form the model uses reliably. Named failure mode: context rot. Distinct from lost-in-the-middle (mechanism 4): context rot is specifically about authority and freshness mixing, not positional under-attendance. Foundation for K13's per-field authority labeling requirement and K9's "appropriate context not maximum context" discipline. Cited by: K13, K9

Cognitive Science References

Tulving, E. (1985) "Memory and Consciousness" Canadian Psychology, 26(1), 1–12 DOI: 10.1037/h0080017 $\to$ Episodic vs. semantic memory distinction. Foundation for K10/K11 split. Cited by: K10, K11, H1

Berlyne, D. E. (1966) "Curiosity and Exploration" Science, 153(3731), 25–33 DOI: 10.1126/science.153.3731.25 $\to$ Optimal arousal theory. Curiosity as entropy-seeking. Foundation for H3. Cited by: H3

Premack, D., Woodruff, G. (1978) "Does the chimpanzee have a theory of mind?" Behavioral and Brain Sciences, 1(4), 515–526 DOI: 10.1017/S0140525X00076512 $\to$ Theory of Mind. Foundation for H7 (Adaptive Persona as user model). Cited by: H7

Clark, A., Chalmers, D. (1998) "The Extended Mind" Analysis, 58(1), 7–19 DOI: 10.1093/analys/58.1.7 $\to$ External tools as cognitive extensions. Foundation for K11 (Observational Memory as extended mind). Cited by: K11

Saltzer, J. H., Schroeder, M. D. (1975) "The Protection of Information in Computer Systems" Proceedings of the IEEE, 63(9), 1278–1308 DOI: 10.1109/PROC.1975.9939 $\to$ Principle of least privilege. Foundation for V4 (Dual LLM), V8 (Tool Sandboxing). Cited by: V4, V8

Baars, B. J. (1988) A Cognitive Theory of Consciousness Cambridge University Press archive.org/details/cognitivetheoryo0000baar $\to$ Global Workspace Theory. Conscious processing as broadcast to global workspace. Foundation for O11 (Blackboard System). Cited by: O11, H6, Theater of Mind paper

Vygotsky, L. S. (1934/1986) Thought and Language MIT Press (Kozulin translation) archive.org/details/thoughtlanguage0000vygo $\to$ Inner speech as internalized dialogue. Foundation for R15 (Inner Monologue), H6 (Continuous Inner Monologue). Cited by: R15, H6

Skjuve, M., Følstad, A., Fostervold, K. I., Brandtzaeg, P. B. (2021) "My Chatbot Companion — a Study of Human-Chatbot Relationships" International Journal of Human-Computer Studies, 149, 102601 DOI: 10.1016/j.ijhcs.2021.102601 $\to$ Parasocial relationship formation with AI agents. Foundation for H10 (Relational Memory) ethical constraints. Cited by: H10

Community Sources

Hacker News — MCP and Tool Overhead Discussion (2024-25) Multiple threads including: "Show HN: Model Context Protocol" discussion; "MCP is the npm of AI tools" thread Search on Hacker News $\to$ Community quantification of token overhead. Practitioner backlash on schema costs. "Supply chain risk" framing. Cited by: I3

Hacker News — LangChain Backlash (2024) "Ask HN: Why are people moving away from LangChain?" Search on Hacker News $\to$ 80+ package dependencies. Death by abstraction. MCP as disruption of LangChain value proposition. Cited by: I6

Hacker News — Production Agent Failures (2024-25) Various threads on agent reliability and production incidents Search on Hacker News $\to$ Context for A1-A15 anti-patterns. Empirical grounding for reliability patterns. Cited by: V-category patterns

Reference Summary by Pattern Category

Category	Key Primary Sources
Signal (S)	White et al. 2023 (PLoP), Brown et al. 2020, Bai et al. 2022, Adams et al. 2023, Wang et al. 2022
Knowledge (K)	Packer et al. 2023, Gao et al. 2023, Edge et al. 2024, Sarthi et al. 2024, Asai et al. 2024, Clark & Chalmers 1998, Pinecone 2025, PageIndex 2025, Chroma 2025
Reasoning (R)	Wei et al. 2022, Yao et al. 2022 (ReAct), Xu et al. 2023 (ReWOO), Shinn et al. 2023, Yao et al. 2023 (ToT), Zhou et al. 2024 (LATS), Wang et al. 2024 (CodeAct)
Orchestration (O)	Anthropic 2024-25, Agentic Communities 2025, Scaffold Taxonomy 2025, bMAS 2024, Minsky 1986, Kahneman 2011
Reliability (V)	OWASP LLM 2025, EU AI Act 2024, NIST AI RMF, Willison 2023-25, Nygard 2007, Bai et al. 2022, Zheng et al. 2023, Composio 2025, 12-Factor Agents
Integration (I)	Anthropic MCP 2024, Google A2A 2025, IBM ACP 2025, AAIF 2025, Brown et al. 2020
Humanizers (H)	Theater of Mind 2025, MIRROR 2025, Talker-Reasoner 2024, Shinn et al. 2023, Voyager 2023, Salemi et al. 2023, Tulving 1985, Berlyne 1966, Skjuve et al. 2021

Open Access Links

All arXiv papers are freely available at arxiv.org/abs/[ID].
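Because the URL convention is mechanical, the links for the table below can be generated rather than typed. A minimal Python sketch, assuming only new-style identifiers (YYMM.NNNN or YYMM.NNNNN, with no version suffix such as "v2"); the helper names are illustrative, not part of any arXiv tooling. The YYMM prefix records the month of first submission, so the same parse doubles as a cheap consistency check on each entry's stated year.

```python
import re

# New-style arXiv IDs: YYMM.NNNN (2007-2014) or YYMM.NNNNN (2015 onward).
# Version suffixes like "v2" are deliberately rejected in this sketch.
NEW_STYLE_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})$")


def submission_year_month(arxiv_id: str) -> tuple[int, int]:
    """Return (year, month) of first submission, read from the YYMM prefix."""
    match = NEW_STYLE_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    year, month = 2000 + int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in arXiv ID: {arxiv_id!r}")
    return year, month


def arxiv_abs_url(arxiv_id: str) -> str:
    """Build the abstract-page URL, following the arxiv.org/abs/[ID] convention."""
    submission_year_month(arxiv_id)  # validate the ID before formatting it
    return f"https://arxiv.org/abs/{arxiv_id}"
```

For example, `arxiv_abs_url("2210.03629")` gives the ReAct abstract page, and `submission_year_month("1706.03762")` returns `(2017, 6)`. An entry whose stated year is earlier than the year its ID encodes needs either the year or the ID rechecked.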

Paper	arXiv ID
GPT-3 (Brown et al.)	2005.14165
Chain-of-Thought (Wei et al.)	2201.11903
Self-Consistency (Wang et al.)	2203.11171
Zero-Shot CoT (Kojima et al.)	2205.11916
Plan-and-Solve (Wang et al.)	2305.04091
ReAct (Yao et al.)	2210.03629
ReWOO (Xu et al.)	2305.18323
Self-Ask (Press et al.)	2210.03350
Reflexion (Shinn et al.)	2303.11366
Self-Refine (Madaan et al.)	2303.17651
Tree of Thoughts (Yao et al.)	2305.10601
LATS (Zhou et al.)	2310.04406
Buffer of Thoughts (Yang et al.)	2406.04271
Skeleton-of-Thought (Ning et al.)	2307.15337
CodeAct (Wang et al.)	2402.01030
Program of Thoughts (Chen et al.)	2211.12588
Chain of Density (Adams et al.)	2309.04269
MemGPT (Packer et al.)	2310.08560
HyDE (Gao et al.)	2212.10496
GraphRAG (Edge et al.)	2404.16130
RAPTOR (Sarthi et al.)	2401.18059
Self-RAG (Asai et al.)	2310.11511
Corrective RAG (Yan et al.)	2401.15884
Voyager (Wang et al.)	2305.16291
LaMP Personalisation (Salemi et al.)	2304.11406
LLM-as-Judge (Zheng et al.)	2306.05685
Constitutional AI (Bai et al.)	2212.08073
Prompt Injection (Perez & Ribeiro)	2211.09527
Prompt Pattern Catalog (White et al.)	2302.11382
AutoPDL	2504.04365
Meta Prompting	2311.11482
Theater of Mind	2604.08206
MIRROR Inner Monologue	2506.00430
Talker-Reasoner	2410.08328
Agentic Communities	2601.03624
Scaffold Taxonomy	2604.03515
Blackboard MAS (bMAS)	2510.01285

GO4 — AI Engineering Design Patterns