Why RAG Exists
Your AI is trained on yesterday’s news. RAG is how you plug it into today.
After this chapter, you’ll be able to: Explain what RAG is, why it exists, and when to use it instead of fine-tuning — in plain English.
The Two Problems with AI Today
Problem 1: The Memory Cutoff
Every LLM has a training cutoff date. Everything it knows comes from the text it was trained on. Ask it about something that happened after that date, and it simply doesn’t know. [src: lewis2020, §1]
This isn’t a minor limitation. If you’re building a customer support bot, the AI doesn’t know about your latest product update. If you’re building a legal research tool, it doesn’t know about last month’s court ruling. The world changes daily — your AI’s knowledge doesn’t.
Problem 2: Hallucinations
Here’s the scarier problem: when an LLM doesn’t know something, it doesn’t say “I don’t know.” Instead, it makes something up — confidently, fluently, and convincingly. This is called a hallucination.
A hallucinating AI will invent statistics, fabricate quotes, and cite papers that don’t exist. It does this because it’s optimised to produce fluent text, not truthful text. [src: lewis2020, §1]
See It For Yourself
Don’t take our word for it. Ask a model without RAG about the (nonexistent) “2026 Global AI Safety Report” and you get something like this:

> The 2026 Global AI Safety Report found that 78% of organizations have implemented basic AI safety measures, with a particular focus on bias detection and model interpretability. The report highlighted three major breakthroughs in automated red-teaming techniques.

Every detail above is invented: the report, the statistic, the breakthroughs.
What RAG Actually Does
Find the right documents first, then ask the AI to answer using only those documents.
That’s it. The entire idea is that simple.
The Open-Book Exam Analogy
Think about two ways to take an exam:
- Closed-book (fine-tuning): You memorise the entire textbook beforehand. You might remember most things, but you’ll misremember details and you can’t learn new material without re-studying everything.
- Open-book (RAG): You bring the textbook to the exam. When you get a question, you look up the relevant pages first, then write your answer based on what you just read.
RAG is the open-book approach. The AI doesn’t need to have memorised your company’s docs — it just needs to be able to look them up when asked.
The RAG Pipeline in Three Steps
- Retrieve — Search your document collection for chunks relevant to the user’s question
- Augment — Add those chunks to the AI’s prompt as context
- Generate — The AI writes an answer grounded in the retrieved information (see the sketch just below)
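Here is a minimal sketch of those three steps in plain Python. It assumes the openai SDK (v1+), NumPy, and an OPENAI_API_KEY in your environment; the toy documents, model names, and helper function are illustrative choices, not the course’s official lab code.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A toy "document collection" standing in for your real docs.
docs = [
    "Refunds are issued within 14 days of an approved return request.",
    "Our offices are closed on public holidays.",
    "Premium support is available 24/7 on enterprise plans.",
]

def embed(texts):
    """Turn strings into vectors with an embedding model."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(docs)
question = "How long do refunds take?"

# 1. Retrieve: rank docs by cosine similarity to the question.
q = embed([question])[0]
scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
best_chunk = docs[int(scores.argmax())]

# 2. Augment: put the retrieved chunk into the prompt as context.
prompt = (
    "Answer using ONLY the context below.\n\n"
    f"Context:\n{best_chunk}\n\nQuestion: {question}"
)

# 3. Generate: the model writes an answer grounded in that context.
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```

Every RAG system, however elaborate, is a variation on this loop.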
RAG vs. Fine-Tuning
Both RAG and fine-tuning are ways to adapt an LLM to your own data, but they solve different problems:
| | RAG | Fine-Tuning |
|---|---|---|
| How it works | Looks up documents at query time | Retrains the model on your data |
| Cost | Low — just search + API calls | High — GPU hours for training |
| Update speed | Instant — add new docs anytime | Slow — retrain needed for new data |
| Hallucination control | High — answers cite sources | Medium — model may still hallucinate |
| Best for | Knowledge bases, docs, FAQs | Changing the model’s writing style or behaviour |
The rule of thumb: If you need the AI to know specific facts, use RAG. If you need the AI to behave differently, use fine-tuning. Many production systems use both. [src: lewis2020, §5]
When Should You Use RAG?
Use RAG when:
- Your data changes frequently (product docs, knowledge bases, news)
- You need the AI to cite its sources
- You want to control exactly what information the AI has access to
- You need answers grounded in specific documents
- You want to avoid the cost of model training
Consider fine-tuning when:
- You need the AI to adopt a specific tone or persona
- You want to change how the AI responds, not what it knows
- Your data is static and well-defined
Common Misconceptions About RAG
These are the questions that confuse almost every beginner. If any of these have been in your head, you’re in good company.
“Can’t I just give the AI a huge context window instead?”
Modern models like Gemini 1.5 Pro and Claude 3.5 Sonnet have context windows of 100k–1M tokens. Why not just paste your entire knowledge base and skip RAG entirely?
Two reasons: cost and quality. Sending a million tokens with every request costs real money — at $3 per million input tokens, a 500k-token context costs $1.50 per query. At scale, that’s unsustainable. More importantly, research consistently shows LLMs lose focus in very long contexts — the “lost in the middle” effect means information buried in the middle of a huge context is often ignored. [src: liu2023lost] RAG retrieves only the 3–5 most relevant chunks, keeping costs low and attention sharp.
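The cost arithmetic is easy to check yourself (the $3-per-million rate is the illustrative figure from above; real prices vary by model and provider):

```python
# Worked check of the cost claim above; the rate is illustrative.
price_per_million_tokens = 3.00   # USD
tokens_per_query = 500_000
cost = tokens_per_query / 1_000_000 * price_per_million_tokens
print(f"${cost:.2f} per query")   # $1.50
```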
“Does RAG eliminate hallucinations completely?”
No — but it reduces them dramatically. RAG gives the AI a source to cite and ground its answer in. However, if the retrieved chunks are irrelevant (a retrieval problem) or the prompt doesn’t instruct the model to stay faithful to the source (a prompting problem), the model can still hallucinate. Chapters 5–7 cover how to measure and reduce this.
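The prompting side of that is easy to illustrate. A minimal sketch of a faithfulness instruction; the wording is hypothetical, not a canonical template:

```python
# A hypothetical grounding instruction; the exact wording is illustrative.
GROUNDING_PROMPT = """Answer the question using ONLY the context below.
If the context does not contain the answer, reply "I don't know" rather
than guessing. Do not use outside knowledge."""

def build_prompt(context: str, question: str) -> str:
    """Assemble the augmented prompt from retrieved context."""
    return f"{GROUNDING_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}"
```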
“Is RAG the same as a search engine?”
The retrieval step in RAG is a form of search — but it’s semantic search, not keyword search. A traditional search engine like Elasticsearch looks for matching words. RAG retrieves by meaning, so “return policy” and “refund process” match each other even with zero word overlap. The generation step (where the LLM writes an answer using the retrieved text) is what separates RAG from pure search.
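You can check the zero-word-overlap claim directly. A sketch assuming the openai SDK and NumPy; the embedding model is one choice among several:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
resp = client.embeddings.create(
    model="text-embedding-3-small",  # illustrative model choice
    input=["return policy", "refund process", "banana bread recipe"],
)
a, b, c = (np.array(d.embedding) for d in resp.data)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(a, b))  # high: same meaning, zero shared words
print(cosine(a, c))  # much lower: unrelated meaning
```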
“Do I need a vector database to do RAG?”
For learning and small projects: no. You can store embeddings in a simple JSON file or NumPy array and compute similarity with a few lines of code. For production systems with thousands of documents: yes, a vector database like ChromaDB or Qdrant becomes essential for performance. Chapter 4 covers when you actually need one.
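Here is what that no-database version can look like: embeddings kept in a JSON file and searched with NumPy. The file layout is an assumption for illustration, and the three-dimensional vectors are toys (real embedding models produce hundreds or thousands of dimensions):

```python
import json
import numpy as np

# Save: one entry per chunk, embedding stored as a plain list.
store = [
    {"text": "Refunds are issued within 14 days.", "embedding": [0.1, 0.7, 0.2]},
    {"text": "Support hours are 9am-5pm.", "embedding": [0.6, 0.1, 0.3]},
]
with open("store.json", "w") as f:
    json.dump(store, f)

# Load and search: cosine similarity against a query vector.
with open("store.json") as f:
    store = json.load(f)

matrix = np.array([entry["embedding"] for entry in store])
query = np.array([0.2, 0.8, 0.1])  # stands in for the embedded question

scores = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
for i in scores.argsort()[::-1][:2]:  # top-2 chunks, best first
    print(round(float(scores[i]), 3), store[i]["text"])
```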
RAG in the Real World
RAG is not a niche technique — it powers some of the most widely used AI products:
- Customer support bots at companies like Intercom and Zendesk use RAG over their help documentation, so the AI only answers with documented facts.
- Legal and compliance tools use RAG over contracts and regulations so lawyers can query the actual document text, not the model’s memory of similar documents.
- Developer tools like GitHub Copilot Chat use retrieval over your local codebase context so responses are grounded in your actual code.
- Internal knowledge bases are the most common enterprise use case: RAG over your company’s Notion, Confluence, or Slack — so employees can ask “what’s our parental leave policy?” and get a cited answer.
The pattern is always the same: private or frequently updated information that an LLM can’t know from training → a perfect RAG use case.
How Long Does It Take to Build?
This is a real question, so here’s a real answer.
A basic RAG pipeline — ingest a document, embed it, retrieve chunks, generate an answer — takes about 30 lines of Python and 15 minutes to get working using LangChain + ChromaDB + the OpenAI API. That’s what Lab 1 walks you through.
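For a sense of scale, a rough sketch of such a pipeline follows. This is not Lab 1’s code: it assumes the langchain-openai and langchain-community packages plus chromadb, the file name is a placeholder, and LangChain’s APIs shift between versions, so treat it as an outline rather than copy-paste instructions.

```python
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Ingest + embed: Chroma stores chunks and their embeddings locally.
docs = open("my_notes.txt").read().split("\n\n")  # naive paragraph chunking
db = Chroma.from_texts(texts=docs, embedding=OpenAIEmbeddings())

# Retrieve: the 3 chunks most similar to the question.
question = "What's our parental leave policy?"
chunks = db.similarity_search(question, k=3)
context = "\n\n".join(c.page_content for c in chunks)

# Augment + generate.
llm = ChatOpenAI(model="gpt-4o-mini")
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(llm.invoke(prompt).content)
```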
A production-quality system — with evaluation, re-ranking, metadata filtering, error handling, and a UI — takes weeks to months, depending on your data complexity and quality requirements.
This course gets you from zero to “working demo” in the chapters, and from “working demo” to “production-ready” in the labs. By the end, you’ll know exactly which decisions matter at scale and which don’t.
What You’re Building
Throughout this course, you’ll build one thing: a RAG chatbot over your own notes. Every chapter adds a piece to the puzzle.
By the end, you’ll have a working pipeline that:
- Takes your text files
- Splits them into searchable chunks
- Converts those chunks into vectors
- Finds the most relevant chunks for any question
- Generates an accurate, cited answer
This chapter was the “why.” Next, we start building — beginning with how to prepare your documents.
Quick Check
What is the main advantage of RAG over fine-tuning for knowledge that changes frequently?
What is a “hallucination” in the context of AI?
Sources:
- Lewis et al. (2020) — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [NeurIPS 2020]
- Liu et al. (2023) — Lost in the Middle: How Language Models Use Long Contexts [TACL 2024]
- Anthropic documentation on context windows and model limitations