Wolfgang Groß
Chief AI Officer

Own Your RAG – Developing RAG Systems Reliably using the DAIKI Framework

While pre-built models can work for many use cases, private RAG and LLM systems guarantee data protection and ownership, making them relevant for organizations seeking to implement AI in a secure way.

Chatbots based on Large Language Models (LLMs), like ChatGPT, remain a hot topic. Now, Retrieval Augmented Generation (RAG) Systems are starting to gain attention as more organizations explore AI implementation. In this article, we provide an overview of RAG systems and explore the advantages of custom RAG systems and LLMs for the enterprise.

RAG in Context

ChatGPT reached 100 million monthly active users in just under two months. OpenAI got there in record time because it made AI highly accessible through a self-explanatory user interface.

Chatbots are the primary use case of Large Language Models (LLMs) today. This was anticipated years before LLMs gained widespread attention, but most earlier chatbots failed to deliver because their language fidelity was insufficient for a convincing and helpful interaction.

Powerful LLMs made up for this shortcoming but still leave something to be desired. For example, there is a need to use the latest information, provide clear attribution to sources, and use private documents specific to your use case. Models like ChatGPT are trained on data only up to a specific cutoff date; anything that happens after that point is unknown to the model.

With Retrieval Augmented Generation (RAG), developers found an elegant way out of this predicament: they pair Information Retrieval (IR) systems – the technology that powers search engines like Google – with LLMs.

A Retrieval Augmented Generation (RAG) application is a whole system of components that work together and provide a chat interface that can answer user questions articulately and faithfully. 

RAG systems are widely popular and encapsulate the idea of bringing external information to the model, which means the AI model does not have to memorize everything by heart. 

Like in an open book exam, the retrieval system can look things up and the LLM processes this information.

Services vs. Private LLMs 

Generic LLMs as well as RAG systems are readily available as cloud-based Software as a Service (SaaS). 

But while pre-built tools – including the new OpenAI Custom GPTs – are great for many use cases, it’s still very valuable to understand and “own” the technology and the processes behind it.

This is especially true in an enterprise context, where dependence on external services that can break at any time (as happened with ChatGPT recently) is a big issue. In these cases, custom, controlled RAG and private LLM systems can be very beneficial.

In addition to this technological sovereignty, private RAG and LLM systems can be a very important factor in your risk management and mitigation strategy. Data protection and ownership can effectively be guaranteed only with private RAG setups.

Cloud-based solutions require your private data to be sent to the cloud, i.e. outside your safe on-premise realm. To make matters worse, many providers reserve the right not only to inspect your data but also to train their models on it. Such models may then reveal your data to any third party using the cloud service.

In addition, misinformation and hallucinations are two risk factors that should be addressed when using LLM technology in an enterprise context. While hallucinations are, to a certain degree, inherent to the underlying transformer technology, RAG systems make it possible to control the sources used for text generation and to link to them whenever accurate source checking is needed.

In order to allow for the necessary customizations for high answer quality and to exclude any possibility of data leakage, you need a flexible on-premise RAG system with an integrated private LLM.

It is paramount that you own the system with exclusive deep control of both private data retrieval and augmented generation. Only when you are in complete control can you achieve the highest possible answer quality, while keeping your private data safe.

Main Components of a RAG System

The first thing to understand about RAG systems is that they are composed of multiple independent components that are glued together by structural code. 

There are numerous options for each component, and you need to select one based on the requirements of your application. Some popular frameworks like LangChain and LlamaIndex provide glue code and easy access to multiple components. 

Good engineering demands strategic planning and a solid requirement elicitation phase. The Daiki software provides templates that ensure high-quality results and transparent processes across the organization.  

As with any software application, the choice between off-the-shelf parts and custom solutions depends on the demands of the application and the experience of the development team.

Successful RAG engineering is about selecting the components that best fit the use case. The main components of a RAG application are the user and the user interface, a retrieval component, like the one a search engine uses, and a generation component, like the one behind ChatGPT.

A systematic approach is paramount for the successful application of a RAG system.
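
To make the interplay of these components concrete, here is a minimal, framework-agnostic sketch of the overall flow in Python. The retriever, prompt composer, and LLM objects are hypothetical placeholders rather than a specific library API; they stand in for whichever off-the-shelf or custom parts you end up selecting.

```python
# Minimal sketch of the end-to-end RAG flow (placeholder components assumed).
def answer(question: str, retriever, prompt_composer, llm) -> str:
    # 1. Retrieval: find the document chunks that best match the question.
    chunks = retriever.search(question, top_k=5)

    # 2. Augmentation: compose a prompt that contains the retrieved chunks
    #    as context alongside the user's question.
    prompt = prompt_composer.build(question=question, context=chunks)

    # 3. Generation: let the LLM answer based on the provided context.
    return llm.generate(prompt)
```

Frameworks such as LangChain or LlamaIndex provide exactly this kind of glue code out of the box, but the same structure can be implemented directly when you need full control over each component.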

User Interface

The user interface determines how the generated information should be displayed and what questions the user can pose to the system. In the simplest case, a user would ask a textual question, and the system would return a block of text that answers this question.

The next step up in complexity is a continuous chat where the user can ask follow-up questions, and the system retains previous questions and answers. To further improve this, the user interface could provide additional input fields for the user to specify more precisely which documents should be retrieved, for example based on metadata such as documents created after a specific date or by a specific author.

You should specify the user interface (UI) requirements early when planning the RAG system’s development to be sure you select technical components that can support this UI. A good user interface establishes trust with the user and amplifies the potential of the RAG system.
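
As a rough illustration, the sketch below shows a minimal command-line chat loop with history and an optional metadata filter. The rag_answer function and the created_after filter key are assumptions made for this example, not part of any particular framework.

```python
# Minimal chat loop sketch; rag_answer() is a hypothetical function that wraps
# retrieval and generation and accepts chat history and metadata filters.
def chat_loop(rag_answer):
    history = []  # retained questions and answers enable follow-up questions
    while True:
        question = input("You: ").strip()
        if question.lower() in {"quit", "exit"}:
            break
        # Optional metadata filters narrow which documents are retrieved,
        # e.g. only documents created after a given date (illustrative value).
        filters = {"created_after": "2023-01-01"}
        answer = rag_answer(question, history=history, filters=filters)
        history.append((question, answer))
        print("Assistant:", answer)
```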

Retrieval Components

The retrieval component is what makes a RAG system different from a ChatGPT-like application that just uses prompt engineering. The job of the retrieval system is to find and return the documents from a database that best fit the query.

These documents are passed to the generative component, and this is precisely why the generative component knows about the latest and potentially private information even though it has not seen it at training time.

The generative component can then also attribute the answer it gives to the documents provided by the retrieval system. To find relevant documents, the retrieval system chunks and embeds the chosen documents into a database – typically as vector representations, which is why it is often referred to as a vector database – and calculates similarity metrics.

The data ingestion is computationally heavy but must only be done when the database is updated. When a user writes a question during operation, only the query must be embedded, and the system then finds the document chunks in the database that best match the semantics of the question.

Nonetheless, the ingestion process should be planned carefully. Only then can it be run often and robustly to ensure the database is accurate and current. How the documents are chunked, embedded, and matched to the query has huge implications for the speed and quality of the retrieval system. There is a broad range of databases, chunking methods, and embedding methods to choose from, and the criteria to evaluate for your application are speed, scalability, reliability, development support, security, compatibility, and the type of representation and similarity metrics.
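
To illustrate the two phases, here is a minimal retrieval sketch in Python. It assumes an embed function supplied by whatever embedding model you choose, uses naive fixed-size chunking, and replaces a real vector database with an in-memory list and cosine similarity; a production system would swap in proper chunking and a dedicated vector store.

```python
import numpy as np

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Naive fixed-size character chunking with overlap; real systems often
    # chunk along sentences, paragraphs, or document structure instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest(documents: list[str], embed) -> list[tuple[str, np.ndarray]]:
    # Computationally heavy step: run only when the document base changes.
    index = []
    for doc in documents:
        for piece in chunk(doc):
            index.append((piece, embed(piece)))
    return index

def retrieve(query: str, index, embed, top_k: int = 5) -> list[str]:
    # Light step at query time: embed only the question, then rank stored
    # chunks by cosine similarity to the query embedding.
    q = embed(query)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```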

Generation Components

The generation side’s main components are the generative LLM and the prompt composer. RAG systems still rely on prompt engineering, because the information from the retrieval system is presented to the generative LLM by placing it in the prompt.

This reveals one of the hard limits of today’s RAG systems: the context window of LLMs is limited, and we can’t put everything into the prompt. Furthermore, when the prompt gets very long, the latency of the answer increases.

We must balance the length and the number of chunks we present to the model and ensure that all relevant information – and only relevant information – is included. The prompt provides the context for the LLM; if information is missing or irrelevant, the LLM has no chance of answering correctly.

Sound RAG systems should put guardrails in place that prompt the model to answer truthfully and accurately based on the information provided. Prompt libraries, such as LangChain Hub, also provide off-the-shelf prompts for RAG systems. To summarize: the documents provide the knowledge, and the LLM provides the comprehension.
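
The sketch below shows one simple way a prompt composer can combine a guardrail instruction, the retrieved chunks, and the user question while respecting a context budget. The guardrail wording and the character budget are illustrative assumptions and should be tuned to your model and use case.

```python
# Illustrative guardrail instruction; adjust the wording to your requirements.
GUARDRAIL = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say that you don't know. "
    "Cite the source of every statement you make."
)

def compose_prompt(question: str, chunks: list[str], budget_chars: int = 6000) -> str:
    # Add chunks in retrieval order and stop before exceeding the budget,
    # keeping the prompt within the context window and the latency bounded.
    selected, used = [], 0
    for i, piece in enumerate(chunks):
        if used + len(piece) > budget_chars:
            break
        selected.append(f"[Source {i + 1}]\n{piece}")
        used += len(piece)
    context = "\n\n".join(selected)
    return f"{GUARDRAIL}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```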

RAG Systems in Operation

If these components are tuned well together, the generative and retrieval components become more than the sum of their parts. This is a testament to how responsible, process-focused engineering can bring out the best of AI and non-AI components when they are set up well. It is paramount that the system requirements are defined early and the components are evaluated appropriately.

However, developing RAG systems is incomplete without proper evaluation and continuous monitoring in operation. Like any software, RAG systems require proper planning, testing, and monitoring, but they also have unique requirements for their LLM-specific and IR-specific elements.
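
As a starting point for such evaluation, a small hand-labelled test set can be used to check whether the retrieval side surfaces the passages that contain the expected answer. The sketch below assumes a retrieve function and a list of question and expected-phrase pairs; it measures a simple hit rate and is no substitute for a full evaluation of answer quality and hallucinations.

```python
# Minimal retrieval evaluation sketch (assumed retrieve() function and test set).
def retrieval_hit_rate(test_set, retrieve, top_k: int = 5) -> float:
    hits = 0
    for question, expected_phrase in test_set:
        chunks = retrieve(question, top_k=top_k)
        # Count a hit if any retrieved chunk contains the expected phrase.
        if any(expected_phrase.lower() in piece.lower() for piece in chunks):
            hits += 1
    return hits / len(test_set)
```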

RAG the Daiki Way

At Daiki, we take a strongly process-focused approach to developing AI applications of this kind, which is needed to ensure durable quality and long-term success beyond the pilot phase.

In contrast to off-the-shelf solutions, Daiki RAG does not limit you to a specific tool stack (e.g. LangChain) or vendor ecosystem (e.g. OpenAI and ChatGPT). It allows you to cherry-pick the best components and easily deploy a solution specifically optimized for each of your tasks.

The Daiki RAG system is a flexible and scalable on-premise solution that provides you with total control over the entire retrieval and augmentation pipeline. The Daiki software can help you select the right tech stack and set up a RAG system to your specifications, develop it successfully, and satisfy the users’ needs. Daiki comes with pre-made recipes and templates that aid planning and communication, to help you do your best work.

 

Need help with AI implementation?

Request a demo to learn more about how Daiki can help.
