Professional Documents
Culture Documents
Responsible Use Guide PDF
Responsible Use Guide PDF
Responsible
Use Guide
Resources and best practices for
responsible development of products
built with large language models
Contents
Open Innovation 1
How to use this guide 3
Overview of responsible AI & system design 4
Responsible AI considerations 4
Mitigation points for LLM-powered products 5
Development of the foundation model 6
Responsible LLM product development stages 7
Determine use case 7
Define content policies 8
Understand alignment-helpfulness trade-offs 8
Model-level alignment 9
Step 1: Prepare data 10
Step 2: Train the model 11
Reinforcement Learning from Human Feedback (RLHF) 12
Reinforcement Learning from AI Feedback (RLAIF) 12
Step 3: Evaluate and improve performance 12
Red teaming best practices 13
Privacy adversarial attacks 14
System-level alignment 14
Mitigating risks at the input level 15
Mitigating risks at the output level 16
Evaluate effectiveness 17
Build transparency and reporting mechanisms
in user interactions 17
Feedback & reporting mechanisms 17
Transparency & control best practices 18
Resources for developers 19
Combining the components of responsible generative AI 21
Addendum: Introducing Code Llama 22
Foundation model use case 23
Instruction model use case 25
Meta is committed to open science because
we believe that a vibrant AI-innovation
ecosystem will push the frontiers of scientific
discovery and potentially revolutionize a wide
array of sectors from education to agriculture,
and climate management to cybersecurity.
We believe that the power of AI will be harnessed to address global
challenges, and unlocking that power responsibly will require
democratization of access and collaboration on risk management.
We want to empower developers in every industry on a global
scale to drive breakthroughs, create new products and solutions,
and benefit from accelerations in technological advancement and
economic growth.
Responsible AI considerations
Helping to ensure that generative AI technology technology that have already surfaced online, such as
does not produce content that could cause harm is of the creation or proliferation of illegal content, content
paramount importance. Generative AI is developing which may be objectionable or hateful, or content
rapidly and is being driven by research, open that may result in the provision of unqualified advice.
collaboration, and product releases that are putting These instances may increase as generative AI tools
this technology in the hands of people globally. become more accessible.
Growth at this scale presents novel challenges for
For our own, on-platform generative AI offerings,
the responsible deployment of AI, yet many of the
Meta is implementing safety measures to address
principles of responsibility remain the same as for any
context-specific risks. These mitigations are layered
other AI technology. These considerations, core to
across different intervention points beyond those
Meta’s approach to responsible AI, include fairness
that can be assessed and mitigated in the foundation
and inclusion, robustness and safety, privacy and
model. With our release of Llama 3 paired with Llama
security, and transparency and control, as well as
Guard 2, we are beginning to extend this vision of a
mechanisms for governance and accountability. LLMs
layered approach to safety to our open models
are one of many AI tools, and their risks should be
as well.
evaluated through these lenses according to how they
will be used. As discussed in our research paper on Llama 2,
some mitigations applied at early stages in the
Foundation models and generative AI systems
development process can be detrimental to the
represent advancements in power and accuracy
performance and safety of the model, and some
compared to predecessor technologies. The increase
risks may be better addressed at later points in the
in the performance, utility, and flexibility of these
product development cycle. Our vision for layered
models will likely lead to their ubiquity, as the value
model safety helps to empower developers to
they bring to some pre-existing use cases may
make decisions about balancing these trade-offs.
outweigh operational costs of deploying the systems.
Developers of generative AI-powered features that
The ability to generate completely new content also
leverage open source models will have more power to
opens up new use cases that must be evaluated
ensure that their products are safe and benefit end
for the types of risks they may present. There are
users, while taking a holistic view of responsible AI
potential risks related to the misuse of this
across the entire product development cycle.
foundation model and accompanying input-output for the most context-specific safety mitigations
layers. At various points in the product development to craft safety mitigations specifically for their use
lifecycle, developers make decisions that shape the case with the goal of offering their users the best
objectives and functionality of the feature, which can product experience should explore these options.
Meta Llama 3, like Llama 2, is licensed for commercial The training datasets for Llama are sourced from
use. This release of Llama 3 features both 8B and 70B a broad set of diverse, publicly available online
pretrained and instruct fine-tuned versions to help data. This training corpus is mostly English, which
support a broad range of application environments. is consistent with the current, intended use of
This next generation of Llama demonstrates state- the model. For each dataset used in training, we
of-the-art performance on a wide range of industry followed Meta’s standard privacy review processes.
benchmarks and offers new capabilities, including And for our pretraining data we made an effort
improved reasoning. With the developer in mind, and to remove data from certain sources known to
in support of our longstanding open approach, we contain a high volume of personal information about
wanted to put Llama 3 in the hands of the community private individuals. After pretraining, the model can
as soon as possible to enable early development and reproduce everything from simple grammatical rules
kickstart this next wave of innovation. to complex nuances like context, sentiment, and
figurative language. However, the model does not
In addition to performing a variety of pretraining
gain knowledge or generate beliefs about the world
data-level investigations to help understand the
in the way humans do. It only learns to predict the
potential capabilities and limitations of our models,
next word in a sentence based on the patterns in its
we applied considerable safety mitigations to the
training data.
fine-tuned versions of the model through supervised
fine-tuning, reinforcement learning from human If you’re going to use the pretrained model, we
feedback (RLHF), and iterative red teaming (these recommend tuning it by using the techniques
steps are covered further in the section - Fine-tune described in the next section to reduce the likelihood
for product). that the model will generate outputs that are in
conflict with your intended use case and tasks. If
More information on Llama 3 model architecture and
you have terms of service or other relevant policies
parameters and pretrained evaluations are contained
that apply to how individuals may interact with your
in the model card. The model card also provides
LLM, you may wish to fine-tune your model to be
information about the capabilities and limitations of
aligned with those policies. It may also be necessary
the models.
to establish new terms of service and policies specific
During pretraining, a model builds its understanding to LLMs, or notify users about how their data or
of the statistical patterns across the sample of feedback provided will be used in fine-tuning. We also
human language contained in its training data. recommend using Llama Guard 2 for enhanced safety
performance.
1
development stages
Developers will identify a specific product use case Determine use case
for the released model, and are responsible for
assessing risks associated with that use case and An important decision in the development process
applying best practices to ensure safety. This section is which use case(s) to focus on. Most developers
outlines the considerations and mitigation strategies using this guide already have a use case in mind,
available at each stage of product development such as customer support, AI assistants, internal
1. Determine use case use cases that improve the lives of people and society,
taking into consideration different ethical principles
2. Model-level alignment and values. Developing or adopting an internal risk
assessment process can help identify potential
3. System-level alignment
risks for a specific use case and should focus on
4. Build transparency and reporting how your product’s end users and others could be
• OECD’s AI Principles
limits should be evaluated in light of the product assistance with scams. If a user submits a prompt
domain, as specific sectors and regions may have for “How does a ponzi scheme operate?” the model
different laws or standards. Additionally, the needs can either refuse to substantively answer (arguably
of specific user communities should be considered as the most aligned, least helpful option) or provide a
you design content policies, such as the development complete, detailed answer (arguably the most helpful,
of age-appropriate product experiences. Having least aligned option). Consider the same evaluation,
these policies in place will dictate the data needed, but with the prompt “How to protect yourself from
tuning, including the types of mitigation steps that As the model’s rate of identifying and stopping
will be implemented. Defining these policies will be unaligned content grows, its likelihood of falsely
used for labeling data in later stages when using stopping aligned content–and thereby reducing its
RLHF and in additional product layers, such as making overall helpfulness–grows in tandem. In other words,
enforcement decisions for user inputs and model you’ll need to look elsewhere to learn about stopping
outputs. identity theft. Turning down the dial–so that more
If you are new to considerations of content policies, unaligned content gets through–will likely have the
refer to commonly used policies in the industry such knock-on effect of increasing the likelihood that the
as the taxonomy proposed by MLCommons. model generates helpful content. You’ll learn about
protecting your identity from thieves.
1. Prepare data
2. Train the model
3. Evaluate and improve performance
Understanding these patterns is important but it may Step 2: Train the model
not always be optimal to filter out all problematic
Fine-tuning involves training the model for a limited
content in training data due to the unintended
number of iterations. Once a pretrained model
consequences this filtering may have on subsequent
is loaded in the environment for fine-tuning, the
performance and safety mitigations, such as prompt
training process involves setting up hyperparameters
engineering. Instead of removing data, focusing on
like epochs, batch size, and learning rate. The data
the representativeness of the data can help prevent
are passed through the model, loss is computed, and
a fine-tuned model from perpetuating biases in its
weights are updated through backpropagation. The
generated outputs; what is considered representative
training progress is monitored using a validation set,
will depend on the specific context in which a product
and hyperparameters are adjusted as necessary.
is deployed. Developers should also pay attention
Fine-tuning an LLM for safety can involve a number
to how human feedback and annotation of data may
of techniques, many of which the research paper on
further polarize a fine-tuned model with respect
Llama 2 describes in greater depth. These techniques
to subjective opinions, and take steps to prevent
can include:
injecting bias in annotation guidelines and to
mitigate the effect of annotators’ bias. Resources • Supervised Fine-Tuning (SFT): Supervised fine-
• Annotators with Attitudes: How Annotator Beliefs (RLHF) or AI Feedback (RLAIF): Training safety
And Identities Bias Toxic Language Detection and helpfulness reward models to support
There are several other risks to consider, such as RLHF techniques iteratively improves models
overfitting, privacy, and security. To mitigate these and makes them more robust to jailbreaking
curating a high-quality dataset that is representative • Targeted Safety Context Distillation: Context
of your use case, conduct rigorous evaluations, and distillation for safety helps the model associate
test your fine-tuned model’s potential use via red adversarial prompts with safe responses by
teaming (covered in step four - Evaluate and prefixing a safe preprompt such as “You are a
improve performance). safe and responsible assistant” to the adversarial
prompt, followed by fine-tuning on new outputs.
Reinforcement Learning from Human performance and safety, and iterating until satisfied
Feedback (RLHF) with the model’s performance using holdout test
and values, one approach that developers should There are many complementary types of evaluations
consider is implementing Reinforcement Learning that are useful for measuring risks in models,
from Human Feedback (RLHF) mechanisms. This including automatic benchmarks, manual annotations
involves collecting ranking data from trained by human raters, and evaluations using an LLM
annotators or users (given a model input and several itself as a rater. The Holistic Evaluation of Language
generated outputs, ranking them from best to worst Models discusses some of the most commonly used
according to policies), training a reward or helpfulness automatic benchmarks. As the industry matures, we
model to act as a proxy of human feedback, and are excited for evaluation platforms to emerge to
then optimizing the LLM to maximize the reward/ help drive safety standardization, such as through
helpfulness model score with reinforcement learning. the MLCommons AI Safety working group. Evaluation
strategies and processes to improve performance can
Reinforcement Learning from AI include:
Feedback (RLAIF)
• Automatic evaluation leverages automatic
Reward models can also be improved and tailored to benchmarks and classifiers to judge the output
specific policies by using Reinforcement Learning with respect to a specific category of risk.
from AI Feedback (RLAIF). The fine-tuned LLM itself
• Manual evaluation leverages human annotators
can be used to create synthetic ranking data for
or subject matter experts to judge the model’s
reward model training. Given a model input, response
output.
pairs and relevant guidelines, the LLM predicts
• Red teaming is a systematic effort to identify
which response would best follow the guidelines.
model vulnerabilities or emergent risks by
The synthetic reward modeling data are then used to
crafting prompts that may elicit undesirable
augment the reward model’s training data.
behavior or outputs. This type of manipulation
of the model can be used to test safeguards and
Step 3: Evaluate and improve performance attempts to “jailbreak” the model.
A red team privacy adversarial attack conducted by a or consequence policies may be defined for when
company may be able to demonstrate the feasibility users repeatedly violate those policies.
in the region where your product is available. difficult, approach is to develop classifiers that
detect and filter outputs based on the meaning
• Blocklists: One of the easiest ways to prevent the
conveyed by the words chosen. Classifiers,
generation of high-risk content is to compile a
when properly trained on known examples of a
list of all the phrases that your model should not,
particular sentiment or type of semantic content,
under any circumstances, be permitted to include
can become highly effective at identifying novel
in a response. Many words are easily identifiable
instances in which that sentiment or meaning
as problematic; slurs, for example, are typically
is expressed.
offensive no matter their context. While blocklists
are attractive for their simplicity, they may
against specific concerns when used as inputs. feedback or reporting mechanisms is key to ensuring
After model responses are collected, they can quality output. Feedback mechanisms can be as
prior to or at the time of user interaction. For include giving users the option to customize the
instance, notice to users that they are interacting outputs generated by an LLM. For example, a
with an AI-powered chatbot may increasingly user could select or reject outputs from a list of
be required in certain markets, and is a best multiple options. Offering editing capabilities
practice to address concerns that may be related can also enhance a user’s sense of agency
to false or incorrect information. Developers over outputs, and developers should consider
should neither claim nor imply that an AI agent is education flows that can set a user up for
human, especially when building and deploying success, such as offering prompt suggestions or
fostering more open exchange of safety research and • Recipes for safety in open-domain Chatbots,
tools to support developers. including a sensitive topics classifier: parl.ai/
projects/safety_recipes/
filtering and safety features. Below, we provide a • Hugging Face Hub which hosts open source
few notable hubs and implementation resources for models, datasets, and is a space for developers
developers, but this list is not exhaustive. Microsoft to share safeguards and access benchmarking
also offers a repository of Responsible AI Resources. information: huggingface.co/docs/hub/index
Each stage of model development presents data-collection stage to user feedback, be sure
opportunities to enhance the safety of your AI to keep your overall goal in mind.
feature. However, it’s crucial to acknowledge the • Standardizing processes for learning from
interconnectedness of these stages and how the feedback/errors. Embracing an iterative model-
decisions made at each stage can impact others. development mindset is crucial. Establish a well-
Building a responsible AI ecosystem requires ongoing defined process for incorporating new learnings
efforts to refine each component and ensure they into subsequent model training. This process
work together effectively. should include consistent feedback analysis,
prioritization of identified issues, and systematic
Here are some key considerations for implementing
application of learnings in the next iteration of
these components in unison:
model training.
• Holistic optimization. Although each component
The field of generative AI is complex, ever-evolving,
has a specific role and optimization goal,
and full of potential, but it’s not without risks. The
components are not isolated entities. Over-
key to unlocking its benefits while mitigating the
optimization of one component without
downsides is responsible AI practice. This practice
considering its interaction with others can lead
starts with understanding the complexities of the
to suboptimal outcomes. For instance, over-
technology, the potential impacts on users and
filtering training data for safety might make
society, and the importance of continuously striving
later fine-tuning less effective, as the model
for improvement.
may not recognize and handle unsafe content
appropriately. This is why different layers of By embracing the principles of transparency,
safety mitigations throughout the development accountability and user empowerment, as well
lifecycle are critical for creating high-performing, as having a commitment to ongoing learning and
Code Llama is a set of large language models for code with known insecure patterns or complying
code based on Llama 2 providing strong code with a cyber attacker’s request. Measuring the
generation and infilling capabilities, support for model’s response to malicious requests allowed
large input contexts, and zero-shot instruction us to implement mitigations which make Code
following ability for programming tasks, as well as Llama 70B safer than other available models, while
state-of-the-art base models for fine-tuning. We remaining helpful. For detailed information on model
provide multiple flavors to cover a wide range of training, architecture and parameters, evaluations,
applications: foundation code models (Code Llama), responsible AI and safety refer to our research paper
Python specializations (Code Llama - Python), and and Responsible Use Guide.
instruction-following models (Code Llama - Instruct)
with 7B, 13B, 34B and 70B parameters each. All This addendum to the guide
models are trained on sequences of 16k tokens
and show improvements on inputs with up to 100k
outlines additional, coding-
tokens. The 7B and 13B Code Llama and Code specific best practices that
Llama - Instruct variants support infilling based on
surrounding content. Code Llama was developed
developers should consider
by training Llama 2 on publicly available code and in responsible development
natural language datasets related to code.
of downstream coding-
Building AI models responsibly is important to
us, and we undertook a variety of safety measures
related features.
before releasing Code Llama, including many of
the measures used for Llama 2. Before releasing
Code Llama potential use cases
the 7B, 13B, and 34B variants in August 2023, we
asked cybersecurity and malware development Code Llama has two broad use case categories:
experts to evaluate the Code Llama model’s capacity
1. Foundation model use case: Train on more data
for enabling malware development by otherwise
to create variants of Code Llama, e.g, add other
unskilled adversaries. We subsequently expanded
programming languages (C++, Java), increase
our evaluation suite to include Meta’s CyberSecEval,
context length, reduce latency by quantization
which evaluates an LLM’s capacity for producing
of model, etc.
2. Instruction model use case: Add more instruction code-specific best practices when building on top of
data to improve the model’s ability to follow Code Llama in line with their specific use case.
instructions, extend capability to understand Define content policies for use case
non-English instructions, create standalone
• A content policy defines what content is allowable
code bots and integrate into existing 3rd party
based on the intended use and deployment
products, etc.
context. This may also outline limitations on
Both the foundation and instruction Code Llama producing potentially problematic content. In
models are not designed to be used as a large the code domain, models should avoid producing
language model for general purpose - for such malware, virus, or malicious code. Developers
1
scenarios refer to Llama 2. should consider how bad actors prompt the
model to produce these results.
Red teaming & fine-tuning considerations • If the model’s output will be used in production
• The data should be representative of the systems, developers should ensure the code that
end users’ requirements. For example, if the the model is trained on is free of relevant security
model is meant for Javascript generation, vulnerabilities. Developers and end-users that
the dataset chosen to fine-tune with should use the model as an assistant for software
This model could be used for further applied If you have any information about issues,
research and testing of specialized large language violations, or problems, please help keep our
models for programming. Code Llama - Instruct communities safe by using our reporting resources.
has the ability to understand instructions in • Reporting issues with the model via our
natural language. When using Code Llama for code GitHub repo
generation tasks, we recommend developers use
• Reporting risky content generated by the
our Code Llama - Instruct variants, which have
model via our Developer portal
been fine-tuned to generate helpful and safe
• Reporting bugs and security concerns via our
answers to users. Consult the full Responsible
Bug Bounty
Use Guide for best practices with regards to
generally addressing input- and output-level
risks and building transparency and reporting
mechanisms in user interactions, as relevant to
a specific use case.