
Why RAG Exists

Chapter 1 of 8
Explorer · ~12 min

Your AI is trained on yesterday’s news. RAG is how you plug it into today.

After this chapter, you’ll be able to explain what RAG is, why it exists, and when to use it instead of fine-tuning — in plain English.


Large Language Models like GPT-4 and Claude are incredibly powerful. They can write code, explain quantum physics, and draft legal contracts. But they have two fundamental problems that make them unreliable for real-world applications.

Every LLM has a training cutoff date. Everything it knows comes from the text it was trained on. Ask it about something that happened after that date, and it simply doesn’t know. [src: lewis2020, §1]

This isn’t a minor limitation. If you’re building a customer support bot, the AI doesn’t know about your latest product update. If you’re building a legal research tool, it doesn’t know about last month’s court ruling. The world changes daily — your AI’s knowledge doesn’t.

Here’s the scarier problem: when an LLM doesn’t know something, it doesn’t say “I don’t know.” Instead, it makes something up — confidently, fluently, and convincingly. This is called a hallucination.

A hallucinating AI will invent statistics, fabricate quotes, and cite papers that don’t exist. It does this because it’s optimised to produce fluent text, not truthful text. [src: lewis2020, §1]


Don’t take our word for it. Toggle between an AI with and without RAG and watch what happens.

Try It: The Hallucination Toggle

User asks: "What were the key findings of the 2026 Global AI Safety Report?"
Without RAG — The AI made this up

The 2026 Global AI Safety Report found that 78% of organizations have implemented basic AI safety measures, with a particular focus on bias detection and model interpretability. The report highlighted three major breakthroughs in automated red-teaming techniques.

This sounds confident and specific, but the AI has no access to this document. Every detail here is fabricated — the 78% figure, the 'three breakthroughs,' all of it. This is a hallucination.

RAG stands for Retrieval-Augmented Generation. In one sentence:

Find the right documents first, then ask the AI to answer using only those documents.

PLAIN ENGLISH
RAG is an open-book exam for AI — instead of memorising everything, it looks up the right notes before answering.

That’s it. The entire idea is that simple.

Think about two ways to take an exam:

  • Closed-book (fine-tuning): You memorise the entire textbook beforehand. You might remember most things, but you’ll misremember details and you can’t learn new material without re-studying everything.
  • Open-book (RAG): You bring the textbook to the exam. When you get a question, you look up the relevant pages first, then write your answer based on what you just read.

RAG is the open-book approach. The AI doesn’t need to have memorised your company’s docs — it just needs to be able to look them up when asked.

  1. Retrieve — Search your document collection for chunks relevant to the user’s question
  2. Augment — Add those chunks to the AI’s prompt as context
  3. Generate — The AI writes an answer grounded in the retrieved information
The RAG pipeline: Retrieve relevant documents, augment the prompt, generate a grounded answer.
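The three steps can be made concrete with a toy sketch. Real systems retrieve with embeddings (covered in later chapters); here retrieval is simple word overlap so the example runs offline with no dependencies. The documents, function names, and prompt wording are all invented for illustration.

```python
# Toy sketch of the three RAG steps. Retrieval here scores by shared
# words, standing in for real embedding-based retrieval. DOCS is invented.
import re

DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The support team is available Monday to Friday, 9am to 5pm.",
    "Premium plans include priority email and phone support.",
]

def words(text: str) -> set[str]:
    # lowercase, strip punctuation, drop very short filler words
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    # Step 1 - Retrieve: rank documents by word overlap with the question
    return sorted(docs, key=lambda d: len(words(question) & words(d)), reverse=True)[:k]

def augment(question: str, chunks: list[str]) -> str:
    # Step 2 - Augment: put the retrieved chunks into the prompt as context
    context = "\n".join(chunks)
    return ("Answer using ONLY the context below. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

# Step 3 - Generate: `prompt` would now be sent to an LLM via a chat
# completion API call; that call is omitted so the sketch runs offline.
question = "What is the refund policy?"
prompt = augment(question, retrieve(question, DOCS))
print(prompt)
```

The prompt's "using ONLY the context below" instruction is what grounds the generation step — the model is told to answer from the retrieved text, not from memory.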

Both RAG and fine-tuning solve the “my AI doesn’t know about X” problem. But they solve it in very different ways.

| | RAG | Fine-Tuning |
|---|---|---|
| How it works | Looks up documents at query time | Retrains the model on your data |
| Cost | Low — just search + API calls | High — GPU hours for training |
| Update speed | Instant — add new docs anytime | Slow — retrain needed for new data |
| Hallucination control | High — answers cite sources | Medium — model may still hallucinate |
| Best for | Knowledge bases, docs, FAQs | Changing the model’s writing style or behaviour |

The rule of thumb: If you need the AI to know specific facts, use RAG. If you need the AI to behave differently, use fine-tuning. Many production systems use both. [src: lewis2020, §5]

TIP
Start with RAG. It is cheaper, faster to set up, and easier to update than fine-tuning. Only add fine-tuning later if you need to change the model’s tone or behaviour.
Fine-tuning memorises your data. RAG looks it up on demand.

Use RAG when:

  • Your data changes frequently (product docs, knowledge bases, news)
  • You need the AI to cite its sources
  • You want to control exactly what information the AI has access to
  • You need answers grounded in specific documents
  • You want to avoid the cost of model training

Consider fine-tuning when:

  • You need the AI to adopt a specific tone or persona
  • You want to change how the AI responds, not what it knows
  • Your data is static and well-defined

These are the questions that confuse almost every beginner. If any of these have been in your head, you’re in good company.

“Can’t I just give the AI a huge context window instead?”

Modern models have huge context windows — Claude 3.5 Sonnet handles 200k tokens, and Gemini 1.5 Pro up to a million. Why not just paste your entire knowledge base and skip RAG entirely?

Two reasons: cost and quality. Sending a million tokens with every request costs real money — at $3 per million input tokens, a 500k-token context costs $1.50 per query. At scale, that’s unsustainable. More importantly, research consistently shows LLMs lose focus in very long contexts — the “lost in the middle” effect means information buried in the middle of a huge context is often ignored. [src: liu2023lost] RAG retrieves only the 3–5 most relevant chunks, keeping costs low and attention sharp.
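The cost arithmetic is worth seeing explicitly. The $3-per-million figure is the illustrative price from the paragraph above, not a quote for any specific model:

```python
# Quick sanity check on the cost arithmetic: full-context stuffing
# vs. sending only a few retrieved chunks. Price is illustrative.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, illustrative

def input_cost(tokens: int) -> float:
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

full_context = input_cost(500_000)   # paste the knowledge base every query
rag_context = input_cost(5 * 500)    # ~5 retrieved chunks of ~500 tokens each

print(f"Full-context query: ${full_context:.2f}")  # $1.50
print(f"RAG query: ${rag_context:.4f}")            # $0.0075
```

Per query that is a 200x difference — and it compounds with every request your users make.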

“Does RAG eliminate hallucinations completely?”

No — but it reduces them dramatically. RAG gives the AI a source to cite and ground its answer in. However, if the retrieved chunks are irrelevant (a retrieval problem) or the prompt doesn’t instruct the model to stay faithful to the source (a prompting problem), the model can still hallucinate. Chapters 5–7 cover how to measure and reduce this.

“Is RAG the same as a search engine?”

The retrieval step in RAG is a form of search — but it’s semantic search, not keyword search. A traditional keyword engine (Elasticsearch with its default BM25 ranking, for example) looks for matching words. RAG retrieves by meaning, so “return policy” and “refund process” match each other even with zero word overlap. The generation step (where the LLM writes an answer using the retrieved text) is what separates RAG from pure search.
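The difference is easy to show with toy numbers. The 2-D vectors below are invented to illustrate the geometry — a real embedding model produces (much longer) vectors like these automatically:

```python
# Keyword overlap vs. semantic similarity. The 2-D vectors are
# hand-picked toy values standing in for real embeddings.
import math

def keyword_overlap(a: str, b: str) -> int:
    return len(set(a.split()) & set(b.split()))

def cosine(u, v):
    return sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))

toy_embeddings = {
    "return policy":  (0.70, 0.71),  # nearly the same direction as...
    "refund process": (0.68, 0.73),  # ...this one: similar meaning
    "gpu training":   (0.99, -0.10), # unrelated topic, different direction
}

print(keyword_overlap("return policy", "refund process"))  # 0 shared words
print(cosine(toy_embeddings["return policy"],
             toy_embeddings["refund process"]))  # close to 1.0: near-identical meaning
```

Zero shared words, yet the vectors point almost the same way — that angle is what semantic retrieval ranks by.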

“Do I need a vector database to do RAG?”

For learning and small projects: no. You can store embeddings in a simple JSON file or NumPy array and compute similarity with a few lines of code. For production systems with thousands of documents: yes, a vector database like ChromaDB or Qdrant becomes essential for performance. Chapter 4 covers when you actually need one.
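That “few lines of code” looks roughly like this — a similarity search over embeddings kept in a plain NumPy array. The 3-dimensional vectors are toy values; real embeddings have hundreds or thousands of dimensions:

```python
# Nearest-document lookup without a vector database: store embeddings
# in a NumPy array and rank by cosine similarity. Vectors are toy values.
import numpy as np

doc_embeddings = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.1, 0.8, 0.3],   # doc 1
    [0.0, 0.2, 0.9],   # doc 2
])
query = np.array([0.85, 0.15, 0.05])

def normalise(v):
    # L2-normalise so the dot product below equals cosine similarity
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

sims = normalise(doc_embeddings) @ normalise(query)
best = int(np.argmax(sims))
print("closest document:", best)  # doc 0
```

This brute-force scan is O(n) per query — fine for hundreds or thousands of chunks, which is exactly where a vector database starts to earn its keep.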


RAG is not a niche technique — it powers some of the most widely used AI products:

  • Customer support bots at companies like Intercom and Zendesk use RAG over their help documentation, so the AI only answers with documented facts.
  • Legal and compliance tools use RAG over contracts and regulations so lawyers can query the actual document text, not the model’s memory of similar documents.
  • Developer tools like GitHub Copilot Chat use retrieval over your local codebase context so responses are grounded in your actual code.
  • Internal knowledge bases are the most common enterprise use case: RAG over your company’s Notion, Confluence, or Slack — so employees can ask “what’s our parental leave policy?” and get a cited answer.

The pattern is always the same: private or frequently-updated information that an LLM can’t know from training → perfect RAG use case.


This is a real question, so here’s a real answer.

A basic RAG pipeline — ingest a document, embed it, retrieve chunks, generate an answer — takes about 30 lines of Python and 15 minutes to get working using LangChain + ChromaDB + the OpenAI API. That’s what Lab 1 walks you through.

A production-quality system — with evaluation, re-ranking, metadata filtering, error handling, and a UI — takes weeks to months, depending on your data complexity and quality requirements.

This course gets you from zero to “working demo” in the chapters, and from “working demo” to “production-ready” in the labs. By the end, you’ll know exactly which decisions matter at scale and which don’t.


Throughout this course, you’ll build one thing: a RAG chatbot over your own notes. Every chapter adds a piece to the puzzle.

By the end, you’ll have a working pipeline that:

  1. Takes your text files
  2. Splits them into searchable chunks
  3. Converts those chunks into vectors
  4. Finds the most relevant chunks for any question
  5. Generates an accurate, cited answer
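Step 2 of that pipeline can be sketched as a minimal character-based chunker with overlap, so a sentence that straddles a chunk boundary stays fully visible in at least one chunk. Sizes here are characters for simplicity; real splitters work on tokens or sentence boundaries, which document preparation (the next chapter) gets into:

```python
# Minimal character-based chunker with overlap. Fixed character sizes
# are a simplification; real splitters respect token or sentence limits.

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("word " * 100, size=40, overlap=10)
print(len(chunks), "chunks, each at most 40 characters")
```

Each chunk repeats the last `overlap` characters of its predecessor, which costs a little storage but prevents a relevant sentence from being split invisibly in half.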

This chapter was the “why.” Next, we start building — beginning with how to prepare your documents.


Q1

What is the main advantage of RAG over fine-tuning for knowledge that changes frequently?

Q2

What is a 'hallucination' in the context of AI?




Sources:

  • lewis2020 — Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”, NeurIPS 2020.
  • liu2023lost — Liu et al., “Lost in the Middle: How Language Models Use Long Contexts”, TACL 2024.