Last updated: April 5, 2026 · Model Architecture · by Daniel Ashford
What is RAG (Retrieval-Augmented Generation)?
A technique that gives LLMs access to external documents to improve accuracy and reduce hallucination.
Definition
Retrieval-Augmented Generation (RAG) is an architecture that combines an LLM with an external knowledge retrieval system. Instead of relying solely on training data, RAG retrieves relevant documents from a knowledge base and includes them in the prompt, allowing grounded responses.
How It Works
A typical RAG pipeline: (1) the user query is converted into a vector embedding, (2) similar documents are retrieved from a vector database, and (3) retrieved documents are inserted into the LLM prompt as context. RAG dramatically reduces hallucination because the model cites specific sources rather than generating from memory.
Example
A support chatbot using RAG: when a customer asks "What is your refund policy?", the system retrieves the actual refund policy document and includes it in the prompt, so the answer is based on the real policy.
Related Terms
See How Models Compare
Understanding rag (retrieval-augmented generation) is important when choosing the right AI model. See how 12 models compare on our leaderboard.