Is this a recurring workflow type with known context requirements?
→ K13 (Retrieval Bundle): specify the exact context bundle BEFORE writing retrieval code
Prevents the rediscovery problem (up to 85% of token budget on context assembly)
Does the entire working set fit an affordable context window?
→ Benchmark K9 (Long Context) vs K1 (Vanilla RAG) at your actual corpus size
K9 wins when: corpus fits, queries are diverse, retrieval precision is hard to tune
K1 wins when: corpus is large, queries are targeted, caching matters
Do you need in-context retrieval?
Are queries multi-hop or relational? → K3 (GraphRAG)
Variable abstraction levels required? → K4 (RAPTOR)
Factuality-critical, possibly stale corpus? → K5 (Adaptive RAG)
Query/document mismatch suspected? → K2 (Query Transformation) wrapping K1
Default retrieval: → K1 (Vanilla RAG)
Does the context window need management during a session?
Remove spent/irrelevant spans (lossless)? → K7 (Context Pruning) — preserves prefix cache
Summarise overflowing history (lossy)? → K6 (Context Compression)
⚠ K6 and K7 invalidate the provider prefix cache
Agent needs explicit scratchpad? → K8 (Working Memory)
Do you need cross-session memory?
Flat facts across sessions? → K10 (Long-Term Memory)
Append-only activity log + prefix caching? → K11 (Observational Memory)
LLM-curated structured notes? → K12 (Karpathy Memory)
K11 and K12 are complementary branches of the same memory strategy — run together