Daiki logo blue
Picture of Wolfgang Groß
Wolfgang Groß
Chief AI Officer

A Framework for Responsible Use of LLMs: The Daiki Approach

The Daiki process enables you to use and customize language models responsibly, enhancing the safety, truthfulness, and helpfulness of LLMs.

A set of non-trivial challenges posed by data, technical, and ethical issues awaits any organization that plans to adopt and customize large language models (LLMs) to its specific needs. Daiki has defined a principled process to help organizations with these challenges.

The Daiki process enables you to use and customize language models responsibly, by putting in place the necessary procedures to enhance the safety, truthfulness, and helpfulness of large language models when interacting with humans (conversational AI) or when adapted to a certain (downstream) task. The principles behind this process include the commitment to help people with their tasks, to provide truthful and reliable information, and to prevent any harmful or biased behavior from the LLM.

At the time of this writing, an open issue related to violating intellectual property laws and plagiarism must be addressed. Indeed, LLMs are usually trained on a trove of publicly available data, potentially violating the copyright laws that may protect these data. We will leave the discussion about data copyright to future posts.

The Daiki process

In this post, we will focus on the widespread practice of choosing an existing LLM (i.e. a pre-trained/foundation model) and adapting it to a specific downstream task. The Daiki process for using LLMs responsibly consists of seven steps, from the initial user research phase and interaction with relevant domain experts to the selection and adaptation of the LLM to the task at hand and to the final deployment of the AI system based on the LLM.

By following this process, Daiki can make the whole pipeline of natural language processing in the era of large language models safe. In detail, here are the steps of the Daiki process for responsible use of LLMs:

1. User Research and System Design

The system requirements must be specified as a prerequisite for building an AI system based on LLMs. They include the system design and the identification of the system users.

Often, the users may interact directly with the LM, for example, with a chat interface. To ensure an effective testing of the system at the later stages, the system’s requirements should also define how the user interacts with the LM and how this interaction is evaluated.

This stage should prevent project misspecifications, i.e. when the problem that the technical solution proposed is not a good fit for the application specification. To avoid this issue, good communication across all stakeholders and roles is crucial.

2. Q&A with Domain Experts

The domain knowledge elicited from the domain experts ensures the relevance and accuracy of the language model customization. We define the downstream task and the available task-specific data at this stage.

In addition, we identify the stakeholders of the AI system, that is, everyone impacted by the system. The stakeholders include (but are not limited to) the system’s end users.

The concerns of each stakeholder about potential failures and harmful behavior of the AI system are collected and organized into the Ethical Matrix (O’Neil and Gunn). The collected concerns will be used later in the process to organize the test phase.

Furthermore, at this stage, the engineers should clarify all relevant issues with the domain experts and make sure that the technical implications are understood by everyone involved in the project.

3. Model and System Specification

Choosing the right foundation model for the downstream task is crucial. Benchmarking tools like MTEB (Massive Text Embedding Benchmark) (Muenninghoff et al.) can guide this choice.

The analysis of potential risks, like privacy breaches (Lukas et al.) or prompt injection attacks (Anderson), as well as the selection of the training method (in-context learning or fine-tuning), are also done at this step. Finally, the system requirements from the first step are translated into a set of functional requirements for the developers.

4. Data and Foundation Model Auditing.

This auditing is a sanity check for the task-specific data collection process and the foundation model. If the training data reflect any unwanted societal biases, the learned model may reproduce or even amplify the unwanted behavior (Gebru et al.).

Therefore, the data collection process has to be properly audited. In addition, the foundation model is checked for bias and toxic behavior, possibly validating the evaluation results in the model card accompanying the foundation model.

5. Responsible Training

Responsible training aims to align the language model with the specific downstream task while following ethical principles and upholding human values. At the end of this step, the model is fully developed according to the functional requirements.

The weights of the pre-trained LM may be updated by different methods, including fine-tuning the foundation model, reinforcement learning from human feedback (Ouyang et al., Stiennon et al., Bai et al.), or fine-tuning with distillation (Hsieh et al.). If updating the weights of the pre-trained LM is not desirable, due, for example, to the scarcity of task-specific examples, or even impossible, in the case of a closed-source pre-trained LM, the suitable task demonstrations for in-context learning are defined. The task demonstrations will be included in the few-shot prompts generated at inference time (step 7).

6. Model Evaluation

Evaluation judges the trained LM’s performance both with and without human involvement. Typical tests involving humans are manually written test cases, manual evaluation of selected model outputs, or structured system testing. Evaluations not requiring human interaction are intrinsic and extrinsic performance metrics that can be measured with an adequate dataset.

For high-risk applications, internal or external red teams that focus on stress testing and risk analysis of the model can be set up. Unlike the traditional machine learning approach to model evaluation, which usually consists of measuring the accuracy on a hold-out set of examples, the Daiki process tests AI systems in a way that is more similar to how software engineers test software: by identifying and focusing on the cases where the system fails.

7. Responsible Deployment

Responsible deployment ensures that the system reliably delivers high-quality outputs to the users. The output quality is measured not only in terms of their accuracy but also of their relevancy and harmlessness.

A common practice for ensuring the quality of the outputs consists of setting up content-moderation filters. Content moderation may also be applied to decide which users’ queries can engage with the LM. Query filtering should also mitigate the possibility of harmful responses, enhancing the safety and harmlessness of the outputs given to users. The quality controls and the safety measures implemented in the content filters depend on the downstream task and the harm it could cause.

Finally, the AI system should be subject to continuous monitoring and regular auditing to ensure it is not drifting away from the original requirements. In addition to the automatic control of the deployed system, the Daiki process establishes procedures to enable human oversight, both from the end users who are given the possibility to raise potential safety issues and from designated members of the organization deploying the AI system.

Following the seven steps of the Daiki process enhances the reliability, harmlessness, and usefulness of language models’ usage and customization, contributing to a better and safer AI-powered future.

A Closer Look at the Responsible Training of Language Models

This phase typically involves various approaches, including fine-tuning on a domain-specific dataset, Reinforcement Learning from Human Feedback (RLHF) (Bai et al., Ouyang et al., Stiennon et al.), Constitutional AI (Bai et al.), fine-tuning to follow instructions (FLAN) (Chung et al.), prompt engineering or fine-tuning with distillation.

Fine-tuning forms the basis of domain adaptation, while RLHF is the primary method when human feedback for training upholds the moral principles of responsible language modeling. On the other hand, constitutional AI determines model values based on a set of predefined rules, forming the “constitution.”

If, during the design phase, in-context learning has been selected to align the model with the desired downstream task, the prompts are designed, engineered, and optimized during the training phase.

Fine-Tuning LMs

A common practice in modern natural language (NLP) processing is fine-tuning an existing pre-trained language model on a domain-specific dataset. This transfer-learning approach is often also used in computer vision, and it works well in NLP for the same reasons.

The pre-trained model develops a general understanding of the language that is useful in various tasks, and through fine-tuning, the model acquires the specific knowledge necessary for the task at hand. Furthermore, due to the gigantic amount of data and the heavy computing resources needed, training a large LM from scratch is generally impractical for medium-to-small-size organizations. It remains predominantly a privilege of large tech companies or well–funded universities. The large carbon footprint of the training process is also a big pro in favor of transfer-learning approaches.

Reinforcement Learning from Human Feedback

Generally, Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., Stiennon et al., Bai et al.) has many benefits and no cost to performance but requires a human in the loop. RLHF is a rather challenging concept involving multiple model training processes, which is impractical in many situations.

Fine-tune to follow instructions (FLAN)

A pre-trained LLM can also be fine-tuned on training instances containing instructions in natural language. An instruction-formatted instance consists of a task description (called an instruction e.g., “Please translate French to English”), an input-output pair, and optionally a small number of demonstrations.

The instruction-formatted instances are usually generated by formatting existing datasets for different tasks (summarization, text classification, translation) with natural language descriptions of the tasks. These formatted instances are used to fine-tune the LLM in a supervised learning way (e.g., training with the sequence-to-sequence loss). This method can be very effective in making the model robust and generalizable, but it also requires a specific training set of instructions. Even if some open-source instruction datasets are freely available, manual work is usually needed to generate a suitable dataset for a specific task.

Fine-tune on a curated dataset for adapting to society

The Process for Adapting Language Models to Society (PALMS) (Solaiman and Dennison) suggests improving a language model behavior by fine-tuning it on a curated dataset reflecting a predetermined set of target values. Even if experimental results show that PALMS is able to align the model with human values, generating the curated dataset also requires manual work.

Constitutional AI

Bai et al. propose a method to use an AI model to supervise other AI models (Bai et al.). The main advantage of Constitutional AI (CAI) is reduced human effort in generating the supervised training data. But CAI still requires a sophisticated process that, like RLHF, might not be suitable for all circumstances. 

In Summary

By investing in responsible usage and customization of LLMs, we are fostering an environment where AI doesn’t merely reflect the biases and flaws of our society but serves as a tool for progress and fairness. The real breakthrough happens when we successfully deliver high-quality and ethical artificial intelligence that upholds the principles of responsibility. 

Our aim should be to maximize the benefits while minimizing the harm, for that is where the true potential of AI and language modeling lies. We achieve this by following a thorough and structured process integrating expertise from multiple disciplines.


References

O’Neil, Cathy, and Hanna Gunn, ‘Near-Term Artificial Intelligence and the Ethical Matrix’, in S. Matthew Liao (ed.), Ethics of Artificial Intelligence (New York, 2020; online edn, Oxford Academic, 22 Oct. 2020), accessed 14 July 2023.

Anderson, Carol. “Newly discovered “prompt injection”​ tactic threatens large language models.” 7 10 2022, https://www.linkedin.com/pulse/newly-discovered-prompt-injection-tactic-threatens-large-anderson/. Accessed 10 July 2023.

Bai, Yuntao, et al. “Constitutional AI: Harmlessness from AI Feedback.” 2022, https://arxiv.org/abs/2212.08073.

Bai, Yuntao, et al. “Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.” 2022, https://arxiv.org/abs/2204.05862.

Chung, Hyung Won, et al. “Scaling Instruction-Finetuned Language Models.” 2022, https://arxiv.org/abs/2210.11416.

Hsieh, Chen-Yu, et al. “Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes.” 2023, https://arxiv.org/abs/2305.02301.

Lukas, Bils, et al. “Analyzing Leakage of Personally Identifiable Information in Language Models.” 2023, https://arxiv.org/abs/2302.00539.

Timnit, Gebru, et al.  “Datasheets for datasets.” 2021.  https://arxiv.org/abs/1803.09010v8

Muenninghoff, Niklas, et al. “MTEB: Massive Text Embedding Benchmark.” 2023. https://arxiv.org/abs/2210.07316.

Ouyang, Long, et al. “Training language models to follow instructions with human feedback.” 2022, https://arxiv.org/abs/2203.02155.

Solaiman, Irene, and Christy Dennison. “Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets.” 2021, https://arxiv.org/abs/2106.10328.

Stiennon, Nisan, et al. “Learning to summarize from human feedback.” 2022, https://arxiv.org/abs/2009.01325.

Need help with AI implementation?

Request a demo to learn more about Daiki can help

Related articles

The Stanford Center for Responsible Quantum Technology aims to connect the quantum community to explore how to balance maximizing the benefits and mitigating the risks of a new class of applied quantum technologies
Given the complexity of AI projects, it's important to have a strategic approach when embarking on your first AI project to ensure that you're set up for success
The United Nations General Assembly adopted a resolution advocating for trustworthy AI systems, offering a good basis for continued international policy on AI governance
Daiki Logo

Apply today

Daiki Logo

Book Demo Today

Daiki Logo

Partner with us

Daiki Logo

Join our waitlist for responsible AI development

Daiki Logo

Join our waitlist for responsible AI development

Daiki Logo

Join our waitlist for responsible AI development

Daiki Logo

Join waitlist for responsible AI development