What’s Retrieval-Augmented Generation? An ELI5 guide.

Dec 19, 2023

RAG is a framework that helps large language models to generate more accurate and up-to-date answers.

By Qurratulain Saleem, Technical Lead — AI/ML, SourceFuse

Photo by Planet Volumes on Unsplash

So first, let’s discuss what generation means in the context of large language models.

Generation refers to the part of a large language model (LLM) that produces text in response to a user query, referred to as a prompt. This is how ChatGPT works: you ask a question, and it generates an answer for you. But like everything in AI/ML, these answers are not 100% accurate. First, we will look at two challenges that large language models are prone to, and then discuss how RAG can help us overcome them.

Two behaviors are considered problematic when interacting with large language models:

  1. Out-of-date information
  2. No source for the answer to a particular query

Large language models are everywhere. They get some things amazingly right and other things very interestingly wrong.

Let’s try to understand why this is a problem using a cosmic example.

There has always been an intense rivalry between Jupiter and Saturn over which has the most moons. If you had asked me a while back, I would have said Jupiter, since I had once read an article about new moons being discovered around the biggest planet of our solar system. But Saturn regained its crown as the planet with the most moons just months after being overtaken by its fellow gas giant. The leapfrog came after the discovery of 62 new moons of Saturn, bringing the official total to a whopping 145. Since I was asked before I knew about this discovery, my answer was not up to date. That is the first problem.

Ask the same question to someone who, unlike me, has no knack for astronomy, and you would probably get the same answer. They might give it because it is the closest guess, or they might reason that since Jupiter is the biggest planet it must also have the most moons. Either way, the answer has no source behind it and is effectively a hallucination. That is the second problem.

Let’s try asking ChatGPT the same “how many moons” question.

ChatGPT 3.5

As you can see, it also mentions that this information is only valid as of its last update. For the model to answer this question correctly, we would need to train it again, which is not always feasible. Keeping a large language model up to date this way would be very tough and uneconomical.

That’s where Retrieval-Augmented Generation, or RAG, comes in. This strategy helps address both LLM hallucinations and out-of-date training data, and pairing a large language model with this architecture boosts its capabilities without spending time and money on additional training.

Without RAG, the user asks a question about moons (the prompt), and the large language model confidently answers Jupiter, since that is all it knows from the parameters it learned during training. The model would be quite confident in generating this response, although it would most likely be wrong.

A non-RAG process of answer generation

But when you add retrieval augmentation to this generation process, instead of relying only on what the LLM knows, we add a content store. This content store could be anything: a collection of documents, internal or external databases, or even the open internet. The LLM now receives an instruction along with the prompt that says: first look in the content store for contextually relevant information, combine it with the user’s question, and only then generate the response, along with the evidence for why the response is what it is.

Answer generation with the RAG strategy

At this point, the large language model will give an informed and up-to-date answer, which is Saturn.
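The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `CONTENT_STORE` entries, the source tags, the word-overlap scoring, and the function names `retrieve` and `build_prompt` are all made up for this example, and the final call to an actual LLM is left out. Real systems usually retrieve with embedding-based search rather than word overlap.

```python
import re

# Hypothetical content store; in practice this could be a document
# collection, a database, or web search results.
CONTENT_STORE = [
    {"source": "astronomy-news-2023",
     "text": "Saturn has 145 confirmed moons after the discovery of 62 new moons."},
    {"source": "cooking-blog",
     "text": "Simmer the sauce for twenty minutes before serving."},
    {"source": "solar-system-basics",
     "text": "Jupiter is the biggest planet of our solar system."},
]

# Small illustrative stopword list so common words do not count as matches.
STOPWORDS = {"the", "a", "of", "in", "is", "has", "which", "for"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, store, top_k=2):
    """Return up to top_k documents that share at least one word with the question."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(doc["text"])), doc) for doc in store]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, docs):
    """Combine the retrieved context with the user's question."""
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in docs)
    return (
        "Answer the question using only the context below, "
        "and cite the source tag you relied on.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


question = "Which planet has the most moons in the solar system?"
docs = retrieve(question, CONTENT_STORE)
prompt = build_prompt(question, docs)
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

The point of the sketch is the shape of the pipeline: the model is handed the relevant passages and their source tags along with the question, so it can ground its answer in them instead of relying only on its training data.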

I hope you can now visualize how RAG helps overcome the challenges we saw earlier:

  1. Outdated: Instead of retraining the model when new information comes up (like the discovery of new moons; maybe Jupiter will overtake Saturn again in the future), all you have to do is augment the data store with the new information. The language model can then provide an updated and more relevant answer by retrieving the most up-to-date information.
  2. Source: The large language model is now instructed to pay attention to the primary source data before giving its response. Being able to provide evidence makes it less likely to hallucinate, because it no longer relies solely on the outdated data it was trained on.
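The first point, updating the data store instead of retraining, can be illustrated with a toy example. In this sketch the `ContentStore` class, its `add` and `latest` methods, and the two article entries are hypothetical names invented for illustration. It simply prefers the most recently added document that mentions a keyword, which is an assumption about how freshness might be handled, not a description of any particular RAG system.

```python
class ContentStore:
    """Toy in-memory store that prefers newer entries on retrieval."""

    def __init__(self):
        self._docs = []  # kept in insertion order, oldest first

    def add(self, source, text):
        """Augment the store with a new document."""
        self._docs.append({"source": source, "text": text})

    def latest(self, keyword):
        """Return the most recently added document mentioning keyword, or None."""
        for doc in reversed(self._docs):
            if keyword.lower() in doc["text"].lower():
                return doc
        return None


store = ContentStore()
store.add("older-article",
          "Jupiter has overtaken Saturn as the planet with the most moons.")
before = store.latest("most moons")

# A new discovery is announced: we only add a document.
# The language model itself is not retrained.
store.add("newer-article",
          "Saturn again has the most moons: 62 new discoveries bring its total to 145.")
after = store.latest("most moons")

print(before["source"])
print(after["source"])
```

The same question retrieves different evidence before and after the update, and each retrieved document carries a source tag that the model can cite in its answer.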

A lot of research is still being conducted on improving these systems from both ends: improving the retriever so that it gives the large language model the best-quality data on which to ground its response, and improving the generator so that the LLM can give the richest and best response to the user.
