Summary of What is RAG? - Retrieval-Augmented Generation AI Explained - AWS

  • aws.amazon.com

    LLM: The Foundation of AI

    Large language models (LLMs) are powerful tools that can generate human-like text, translate languages, write many kinds of creative content, and answer questions in an informative way. However, LLMs are limited to the data they were trained on. To enhance their capabilities and make them more versatile, we need a complementary approach: Retrieval-Augmented Generation (RAG).

    • RAG is a technique that augments the knowledge of an LLM by retrieving relevant information from external data sources.
    • This additional information allows the LLM to provide more accurate and contextually relevant responses.
    • By integrating RAG, we can bridge the gap between an LLM's inherent knowledge and the vast pool of information readily available in the real world.

    RAG: Unlocking the Power of External Data

    Imagine a chatbot that can answer your questions about company policies, but it only has access to the information it was trained on. It wouldn't be able to provide you with the most up-to-date information, such as changes in leave policies. This is where RAG comes in.

    • RAG allows the LLM to access external data, such as company policy documents, databases, or APIs.
    • This supplementary information is referred to as "external data," and it can come in various formats, such as files, database records, or long-form text.
    • RAG uses a process called "embedding" to convert the external data into numerical representations, making it understandable to the LLM.
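The embedding step above can be sketched in a few lines. This is a deliberately simplified toy: it hashes word tokens into a fixed-size vector, whereas a real RAG system would call a trained embedding model (the function name, dimensions, and example document are illustrative assumptions, not from the article).

```python
import hashlib

def embed(text: str, dims: int = 8) -> list[float]:
    """Toy embedding: hash each word token into a bucket of a
    fixed-size vector. A real system would use a trained
    embedding model instead of hashing."""
    vec = [0.0] * dims
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    return vec

# Example: convert a policy document into its numerical representation.
policy_vec = embed("Employees accrue 20 days of annual leave per year.")
```

The key idea is only that every document ends up as a fixed-length list of numbers, so documents and queries can later be compared mathematically.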

    Information Retrieval: Finding the Relevant Data

    When a user asks a question, RAG uses information retrieval techniques to find the most relevant information from the external data. This is like searching a library for the specific book you need.

    • The user's query is converted into a numerical representation (a vector), and it's compared to the vectors of the external data.
    • The information with the closest vector similarity to the user's query is retrieved, ensuring that the LLM receives the most relevant data.
    • The database that stores these vector representations, and supports fast similarity search over them, is known as a "vector database".
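The similarity comparison described above is commonly done with cosine similarity. The sketch below assumes an in-memory list of records standing in for a vector database; the record layout and function names are illustrative.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec: list[float], store: list[dict], top_k: int = 1) -> list[dict]:
    """Return the top_k stored records whose vectors are closest
    to the query vector."""
    ranked = sorted(
        store,
        key=lambda rec: cosine_similarity(query_vec, rec["vector"]),
        reverse=True,
    )
    return ranked[:top_k]
```

A production vector database replaces this linear scan with an approximate nearest-neighbor index, but the retrieval contract is the same: in goes a query vector, out come the closest documents.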

    Annual Leave Example: Putting RAG into Action

    Let's take a real-world example of an employee asking "How much annual leave do I have?" RAG would work as follows:

    • The user's query, "How much annual leave do I have?", is converted into a vector.
    • This vector is compared to the vectors of the external data, which could include the company's annual leave policy and the employee's individual leave records.
    • The system would retrieve the most relevant information: the annual leave policy and the employee's leave history.
    • This relevant data is then provided to the LLM, along with the user's original query.
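The four steps above can be traced end to end with a toy scorer. Here word overlap with the query stands in for vector similarity (the documents and scoring function are made up for illustration; a real system would embed and compare vectors as described earlier).

```python
# Candidate external data: a policy document, the employee's own
# record, and an irrelevant distractor.
docs = [
    "Annual leave policy: full-time employees accrue 20 days per year.",
    "Employee leave record: 8 days of annual leave used this year.",
    "Cafeteria menu: the lunch special rotates weekly.",
]
query = "How much annual leave do I have?"

def overlap(a: str, b: str) -> int:
    """Number of words the two texts share (stand-in for similarity)."""
    return len(set(a.lower().split()) & set(b.lower().split()))

# Retrieve the two most relevant documents for the query.
top_two = sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:2]
```

The policy document and the leave record both share the words "annual" and "leave" with the query, so they are retrieved; the cafeteria menu scores zero and is left out. Those two documents are what gets handed to the LLM alongside the original question.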

    Augmenting Prompts: Giving the LLM the Context it Needs

    With the relevant data retrieved, the LLM can now generate a much more accurate and informative response. This is because the LLM has the necessary information to understand the context of the user's query.

    • The process of adding the retrieved data to the user's original query is called "augmenting the prompt".
    • This augmented prompt provides the LLM with a richer context, enabling it to generate a more complete and accurate response.
    • This is where "prompt engineering" plays a crucial role. Prompt engineering involves crafting effective prompts to guide the LLM towards generating the desired output.
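Augmenting the prompt is, mechanically, string assembly: the retrieved documents are placed before the user's question with instructions on how to use them. The template below is one plausible layout, not the article's exact wording.

```python
def augment_prompt(query: str, retrieved_docs: list[str]) -> str:
    """Combine retrieved context with the user's original query
    into a single augmented prompt for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = augment_prompt(
    "How much annual leave do I have?",
    ["Annual leave policy: 20 days per year.",
     "Leave record: 8 days used this year."],
)
```

How the instructions, context, and question are arranged is exactly where prompt engineering comes in: small wording changes in this template can noticeably change the quality of the LLM's answer.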

    Keeping Data Fresh: The Importance of Updates

    Imagine a chatbot that provides inaccurate information because the data it accesses is outdated. This can be a significant problem, especially in areas like company policies or news events.

    • To prevent this, it's essential to regularly update the external data used by RAG.
    • This involves automatically updating documents, database records, or APIs with the most recent information.
    • The updated data is then converted into vectors and added to the vector database.
    • This ensures that the LLM always has access to the latest and most relevant information.
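The refresh loop above amounts to an "upsert": when a source document changes, it is re-embedded and the stale entry in the vector database is overwritten. A minimal sketch, using a trivial stand-in embedding and a dict as the store (both are assumptions for illustration):

```python
def toy_embed(text: str) -> list[float]:
    # Stand-in for a real embedding model.
    return [float(len(text)), float(text.count(" "))]

def upsert(store: dict, doc_id: str, text: str) -> None:
    """Re-embed a document and overwrite any stale entry, keeping
    the vector store in sync with the source data."""
    store[doc_id] = {"text": text, "vector": toy_embed(text)}

store: dict = {}
upsert(store, "leave-policy", "Annual leave: 20 days per year.")
# The policy changes; re-running the upsert replaces the old entry.
upsert(store, "leave-policy", "Annual leave: 25 days per year.")
```

Keying on a stable document ID is what prevents duplicates: an updated policy replaces its old vector rather than sitting next to it, so retrieval can never surface the outdated version.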

    Conceptual Flow of RAG with LLMs

    The diagram below illustrates the workflow of RAG with LLMs:

    [Diagram: conceptual flow of RAG with LLMs]

    As you can see, RAG is a powerful tool that can enhance the capabilities of LLMs by giving them access to external data. This allows LLMs to provide more accurate and relevant responses to user queries, making them more valuable and useful in a wide range of applications.
