Advanced RAG Patterns
Standard RAG is a knife. These patterns are the rest of the kitchen. You don’t always need them — but when you do, you’ll know.
After this chapter, you will be able to: identify which advanced RAG pattern solves a specific problem, explain how each one works, and make an honest decision about whether you actually need one.
When Standard RAG Falls Short
The pipeline you have built — chunk, embed, retrieve, generate — works remarkably well for straightforward questions over clean documents. But the real world is not straightforward. Users ask vague questions. They ask complex multi-part questions. Sometimes your retrieval just misses.
These advanced patterns exist to handle those edge cases. But here is the honest truth up front: most RAG use cases need zero of these patterns. Some need one. Rarely two. If your standard pipeline scores well on evaluation (Chapter 7), do not add complexity for the sake of it. Every pattern adds latency, cost, and debugging surface area.
With that caveat, here are the six patterns worth knowing.
Pattern 1 — HyDE (Hypothetical Document Embeddings)
The problem it solves: the user’s query is vague or phrased very differently from how the answer appears in your documents.
Think about this: a user asks “Why is my app slow?” Your knowledge base contains a chunk that says “Database connection pooling reduces latency by reusing existing connections.” These two sentences mean related things, but their surface-level similarity is low. The query is a question in casual language. The answer is a technical statement. Standard semantic search might miss it.
HyDE’s fix: before retrieving, ask an LLM to write a hypothetical answer to the query, then embed that answer and search with it instead of the query. The hypothetical answer does not need to be correct. It just needs to be phrased like the kind of document that would contain the real answer. This bridges the vocabulary gap between how people ask questions and how documents are written.
User query: "Why is my app slow?"HyDE step: LLM generates → "Application slowness is commonly caused by unoptimised database queries, insufficient connection pooling, and memory leaks in long-running processes."Search with: Embed the hypothetical answer, not the original queryUse HyDE when:
- ✅ Queries are vague or conversational
- ✅ Documents are formal and technical
- ✅ You see vocabulary mismatch between user language and document language
Skip HyDE when:
- ⚠️ Queries are already specific and precise
- ⚠️ Your current retrieval quality is strong
- ⚠️ Added latency and cost are not justified
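As a minimal, runnable sketch of the HyDE flow: `generate_hypothetical` stands in for the LLM call and `embed` is a toy bag-of-words embedding (both are hypothetical names, not a real API). The point is only that the hypothetical answer’s vector lands closer to the document chunk than the raw query’s vector does.

```python
def generate_hypothetical(query: str) -> str:
    """Stand-in for an LLM call: 'Write a passage that answers: {query}'."""
    return ("Application slowness is commonly caused by unoptimised "
            "database queries, insufficient connection pooling, and "
            "memory leaks in long-running processes.")

def embed(text: str) -> list[float]:
    """Toy bag-of-words embedding; a real system uses an embedding model."""
    vocab = ["database", "connection", "pooling", "latency", "slow", "memory"]
    words = [w.strip(".,?") for w in text.lower().split()]
    return [float(words.count(v)) for v in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

chunk = "Database connection pooling reduces latency by reusing existing connections."
query = "Why is my app slow?"

direct = cosine(embed(query), embed(chunk))
hyde = cosine(embed(generate_hypothetical(query)), embed(chunk))
print(f"direct: {direct:.2f}  hyde: {hyde:.2f}")  # hyde similarity is higher
```

With the toy vocabulary above, the raw query shares no terms with the chunk, so its similarity is zero, while the hypothetical answer overlaps on “database”, “connection”, and “pooling”. A real embedding model shows the same effect less starkly.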
Pattern 2 — Query Decomposition
The problem it solves: the user asks a complex question that requires information from multiple different parts of your knowledge base.
Consider the question: “Compare Apple’s and Google’s revenue in 2023 and explain which company grew faster.” A single retrieval pass will struggle here. It needs chunks about Apple’s revenue, chunks about Google’s revenue, and possibly chunks about growth rates. A single query embedding cannot capture all three needs simultaneously. Query decomposition asks an LLM to break the question into focused sub-questions:
- “What was Apple’s revenue in 2023?”
- “What was Google’s revenue in 2023?”
- “What were the year-over-year growth rates for each?”
Each sub-question gets its own retrieval pass. The results are combined and the LLM synthesises a final comparative answer from all the retrieved chunks.
Use Query Decomposition when:
- ✅ Questions are multi-part or comparative
- ✅ One query needs evidence from multiple document areas
Skip Query Decomposition when:
- ⚠️ Questions are short and single-topic
- ⚠️ Additional retrieval passes would add unnecessary latency
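The decompose-retrieve-merge loop above can be sketched as follows. `decompose` is a hard-coded stand-in for the LLM call, `retrieve` is a toy keyword-overlap retriever rather than real vector search, and the corpus figures are illustrative:

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def decompose(question: str) -> list[str]:
    """Stand-in for an LLM call that splits a complex question
    into single-topic sub-questions."""
    return [
        "What was Apple's revenue in 2023?",
        "What was Google's revenue in 2023?",
        "What were the year-over-year growth rates for each?",
    ]

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Toy keyword-overlap retriever; a real system uses vector search."""
    return sorted(corpus, key=lambda c: len(tokens(query) & tokens(c)),
                  reverse=True)[:k]

corpus = [
    "Apple reported revenue of $383 billion in fiscal 2023.",
    "Google parent Alphabet reported revenue of $307 billion in 2023.",
    "Alphabet grew revenue 9% year-over-year; Apple declined 3%.",
]

# One retrieval pass per sub-question; deduplicate before handing the
# combined evidence to the LLM for synthesis.
context: list[str] = []
for sub in decompose("Compare Apple's and Google's revenue in 2023 "
                     "and explain which company grew faster."):
    for chunk in retrieve(sub, corpus):
        if chunk not in context:
            context.append(chunk)

print(len(context))  # 3: each sub-question pulled its own chunk
```

Note that the original compound question would have matched only one or two of these chunks in a single pass; each sub-question reliably reaches its own region of the corpus.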
Pattern 3 — Self-RAG
The problem it solves: standard RAG always retrieves, even when it does not need to, and never checks whether its own answer is actually grounded.
Think of it like a student taking an open-book exam who is also honest about their own confidence. They check their notes when they need to, skip the notes when they are sure, and re-check when their answer feels shaky.
The self-evaluation step is the key innovation. In standard RAG, the model never questions its own output. In Self-RAG, the model generates special “reflection tokens” that indicate whether it thinks retrieval was helpful and whether the answer is grounded. If the self-check fails, it can retrieve again with a refined query.
Use Self-RAG when:
- ✅ Reliability requirements are strict
- ✅ You want retrieval decisions and self-checking behavior
- ✅ Hallucination control needs to improve beyond prompt-only guardrails
Skip Self-RAG when:
- ⚠️ Use case is straightforward document Q&A
- ⚠️ Retrieval is always required anyway
- ⚠️ You want to minimize system complexity
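The retrieve-generate-critique loop can be sketched as a control flow, without the fine-tuned reflection tokens of the actual Self-RAG paper. Here `needs_retrieval`, `generate`, and `refine` are toy stand-ins for model behavior (all hypothetical names), and the knowledge base is a single entry:

```python
KB = {"pooling": "Connection pooling reduces latency by reusing connections."}

def retriever(query: str) -> list[str]:
    return [text for key, text in KB.items() if key in query.lower()]

def needs_retrieval(query: str) -> bool:
    """Stand-in for the model's retrieve-or-not reflection decision."""
    return True

def generate(query: str, context: list[str]) -> tuple[str, bool]:
    """Stand-in generation that also reports a grounding verdict,
    mimicking Self-RAG's critique tokens."""
    if context:
        return f"Grounded answer based on: {context[0]}", True
    return "Ungrounded guess.", False

def refine(query: str, attempt: int) -> str:
    """Stand-in query rewrite used after a failed self-check."""
    return query + " connection pooling"

def self_rag(query: str, max_retries: int = 2) -> str:
    # 1. Decide whether to retrieve at all.
    context = retriever(query) if needs_retrieval(query) else []
    for attempt in range(max_retries + 1):
        answer, grounded = generate(query, context)
        # 2. Self-critique: accept only answers judged grounded.
        if grounded:
            return answer
        # 3. Self-check failed: retry with a refined query.
        context = retriever(refine(query, attempt))
    return "I could not find a grounded answer."

print(self_rag("Why is my app slow?"))
```

In this trace, the first retrieval misses, the self-check rejects the ungrounded draft, and the refined query succeeds on the second pass. The real system makes these decisions with learned reflection tokens rather than hand-written rules.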
Pattern 4 — Corrective RAG (CRAG)
The problem it solves: sometimes retrieval fails completely. The chunks you get back are not relevant to the question at all, but the model tries to answer from them anyway — producing a confidently wrong response.
CRAG adds a lightweight relevance evaluator that scores each retrieved chunk before generation; when the scores are low, the system takes corrective action instead of answering from bad evidence. That corrective action is typically a fallback to web search. If your internal documents do not have the answer, search the web instead. The system can also partially correct: keep the one good chunk, discard the three bad ones, and supplement with web results.
```
Query → Retrieve chunks → Score relevance
  ├── Scores HIGH   → Proceed normally with retrieved chunks
  ├── Scores MEDIUM → Keep best chunks + supplement with web search
  └── Scores LOW    → Discard all chunks → Fall back to web search
```
Use CRAG when:
- ✅ Internal retrieval sometimes fails completely
- ✅ Fallback-to-web is acceptable for your product
- ✅ You need a quality gate before generation
Skip CRAG when:
- ⚠️ Internal corpus already covers the full query space
- ⚠️ Security/compliance blocks external web access
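The three-way branch above can be sketched directly. The thresholds, `score_relevance` (a toy word-overlap score; the CRAG paper trains a small evaluator model for this), and `web_search` (a stub) are all stand-ins:

```python
def score_relevance(query: str, chunk: str) -> float:
    """Toy evaluator: fraction of query words found in the chunk."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def web_search(query: str) -> list[str]:
    """Stand-in for the web-search fallback."""
    return [f"[web result for: {query}]"]

def corrective_retrieve(query: str, chunks: list[str],
                        high: float = 0.5, low: float = 0.2) -> list[str]:
    scored = sorted(((score_relevance(query, c), c) for c in chunks),
                    reverse=True)
    best = scored[0][0] if scored else 0.0
    if best >= high:                       # HIGH: proceed normally
        return [c for s, c in scored if s >= high]
    if best >= low:                        # MEDIUM: best chunk + web
        return [scored[0][1]] + web_search(query)
    return web_search(query)               # LOW: discard all, web only

docs = ["connection pooling reuses database connections",
        "our holiday schedule for 2024"]
print(corrective_retrieve("what is connection pooling", docs))
print(corrective_retrieve("quantum flux capacitors", docs))
```

The first query clears the high threshold and proceeds with internal chunks only; the second matches nothing and falls through to the web fallback. Tuning the two thresholds against your evaluation set is the main calibration work in practice.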
Pattern 5 — Multi-Vector Retrieval
The problem it solves: a single embedding per chunk captures only one view of what that chunk is about. Some queries match the chunk’s summary better than its full text. Some match the hypothetical questions the chunk could answer. Multi-vector retrieval addresses this by indexing several representations of each chunk:
- The full text embedding (what you already have)
- A summary embedding (a one-sentence summary of the chunk, embedded separately)
- Hypothetical question embeddings (questions that this chunk would answer, generated by an LLM and embedded)
When a query comes in, retrieval searches across all representations. A query phrased as a question might match the hypothetical question embedding better than the full text embedding. A broad query might match the summary embedding best.
Use Multi-Vector Retrieval when:
- ✅ Documents are long or semantically dense
- ✅ Users query in many different styles
- ✅ Single-vector retrieval misses relevant content
Skip Multi-Vector Retrieval when:
- ⚠️ Chunks are short and well-scoped
- ⚠️ Single-vector retrieval already scores well
- ⚠️ Storage/indexing cost needs to stay lean
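A minimal sketch of the indexing and lookup: `summarise` and `make_questions` are hard-coded stand-ins for LLM calls, and `overlap` replaces real vector similarity. The key design point is that every representation maps back to its parent chunk, so the generator always receives the full text no matter which view matched:

```python
chunks = ["Database connection pooling reduces latency by "
          "reusing existing connections."]

def summarise(chunk: str) -> str:
    """Stand-in for an LLM-written one-sentence summary."""
    return "Connection pooling improves database performance."

def make_questions(chunk: str) -> list[str]:
    """Stand-in for LLM-generated hypothetical questions."""
    return ["Why is my app slow?", "How do I reduce database latency?"]

def overlap(a: str, b: str) -> int:
    """Toy similarity; a real system compares embeddings."""
    return len(set(a.lower().split()) & set(b.lower().split()))

# Index every representation, each pointing back to its parent chunk.
index: list[tuple[str, str]] = []
for chunk in chunks:
    index.append((chunk, chunk))                 # full text
    index.append((summarise(chunk), chunk))      # summary
    for q in make_questions(chunk):
        index.append((q, chunk))                 # hypothetical questions

def search(query: str) -> str:
    matched, parent = max(index, key=lambda p: overlap(query, p[0]))
    return parent  # return the parent chunk, never the matched view

print(search("Why is my app slow?"))
```

Here the query matches a hypothetical-question representation, yet the search still returns the full chunk text. This matched-view-to-parent indirection is what the pattern adds on top of single-vector retrieval.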
Pattern 6 — Agentic RAG
The problem it solves: the user’s question requires multiple steps, and the system needs to decide dynamically what to do next based on intermediate results.
Think of it as the difference between a vending machine and a chef. Standard RAG is a vending machine — put in a query, get an answer, done. Agentic RAG is a chef who checks the pantry, decides what is missing, sends someone to the store, adjusts the recipe based on what is available, and tastes as they go.
Example flow:
- User asks: “How does our vacation policy compare to industry standard?”
- Agent retrieves from internal knowledge base → finds the company vacation policy
- Agent decides it needs external data → searches the web for industry benchmarks
- Agent combines both sources → generates a comparative answer with citations
Use Agentic RAG when:
- ✅ Multi-step reasoning and tool orchestration are truly required
- ✅ Workflow depends on dynamic decisions across steps
Skip Agentic RAG when:
- ⚠️ Standard RAG or one focused pattern solves the problem
- ⚠️ Latency and operational complexity are key constraints
- ⚠️ You do not need autonomous tool routing
When to Use Each Pattern — The Honest Framework
Most people reading about advanced patterns want to use all of them. Resist that urge. Here is the decision framework:
| Your Problem | Pattern to Consider | Complexity Added |
|---|---|---|
| Vague, conversational queries | HyDE | Low — one extra LLM call |
| Complex multi-part questions | Query Decomposition | Medium — multiple retrieval passes |
| Need high reliability, reduce hallucination | Self-RAG | High — requires model fine-tuning or careful prompting |
| Retrieval sometimes fails completely | CRAG | Medium — needs relevance scoring + fallback |
| Documents are long, queries vary widely | Multi-Vector Retrieval | Medium — multiplied storage and indexing |
| Multi-step reasoning required | Agentic RAG | Very High — full agent architecture |
The rule of thumb: start with standard RAG. Evaluate it (Chapter 7). If evaluation reveals a specific, measurable problem — low recall on vague queries, poor precision on complex questions — then pick the one pattern that targets that problem. Do not stack patterns until you have evidence that one is not enough.
Try It Yourself — Which Pattern Do I Need?
Answer a few questions about your specific situation and get a recommendation for which pattern (if any) would help.
The Full Advanced Architecture
Here is where each pattern plugs into the standard RAG pipeline. Not every system uses all of these — most use zero or one.
Your Project Step
Look at your chatbot’s RAGAS scores from Chapter 7. If all metrics are above 0.8, congratulations — you probably do not need any advanced patterns. If context recall is low on certain types of queries, try HyDE. If precision drops on complex questions, try query decomposition. The key discipline: identify the problem with data first, then pick the pattern that fixes it.
What You Just Built
You have completed the full LearnRAG curriculum. You understand the six advanced patterns that extend standard RAG, you know when each one is and is not appropriate, and — most importantly — you have the discipline to only use them when evaluation data justifies the complexity.
Let us recap the entire journey. You started by understanding why RAG exists (Chapter 1). You learned to ingest and chunk documents (Chapter 2), embed them into vectors (Chapter 3), and store those vectors in a database (Chapter 4). You built smart retrieval (Chapter 5), crafted prompts that turn retrieved chunks into grounded answers (Chapter 6), and measured whether your pipeline actually works (Chapter 7). Now you have the advanced toolkit for when standard RAG is not enough.
You have a working chatbot over your own notes. You have a measured, evaluated pipeline. And you know exactly where to go next when you encounter a problem your current setup cannot handle.
Sources
- Gao, L., Ma, X., Lin, J., & Callan, J. (2022). Precise Zero-Shot Dense Retrieval without Relevance Labels. [src: gao2022hyde]
- Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. [src: asai2023selfrag]
- Yan, S., Gu, J., Zhu, Y., & Ling, Z. (2024). Corrective Retrieval Augmented Generation. [src: yan2024crag]
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. [src: lewis2020]