Professional Documents
Culture Documents
Fb Llmas Responsible Use Guide PDF
Fb Llmas Responsible Use Guide PDF
Fb Llmas Responsible Use Guide PDF
Responsible
Use Guide
Resources and best practices for
responsible development for products
powered by large language models.
Contents
Open Innovation 1
How to use this guide 3
Overview of responsible AI & system design 4
Responsible AI considerations 4
Mitigation points for LLM-powered products 5
Development of the foundation model 6
Responsible LLM product development stages 7
Determine use case 7
Fine-tune for product 8
The responsible fine-tuning flow 9
Step 1: Define content policies & mitigations 9
Step 2: Prepare data 10
Step 3: Train the model 10
Reinforcement Learning from Human Feedback (RLHF) 11
Reinforcement Learning from AI Feedback (RLAIF) 11
Step 4: Evaluate and improve performance 12
Red teaming best practices 13
Privacy adversarial attacks 14
Address input- and output-level risks 14
Mitigating risks at the input level 15
Mitigating risks at the output level 16
Evaluate effectiveness 17
Build transparency and reporting mechanisms
in user interactions 18
Feedback & reporting mechanisms 18
Transparency & control best practices 18
Resources for developers 20
Combining the components of responsible generative AI 22
Addendum: Introducing Code Llama 23
Foundation model use case 24
Instruction model use case 25
Meta is committed to open science because
we believe that a vibrant AI-innovation
ecosystem will push the frontiers of scientific
discovery and potentially revolutionize a wide
array of sectors from education to agriculture,
and climate management to cybersecurity.
We believe that the power of AI will be harnessed to address global
challenges, and unlocking that power responsibly will require
democratization of access and collaboration on risk management.
We want to empower developers in every industry on a global
scale to drive breakthroughs, create new products and solutions,
and benefit from accelerations in technological advancement and
economic growth.
AUGUST 2023 1
We have already seen what innovations in LLMs to explore and build text-
generation use cases. Ultimately, we believe this will
open science can achieve. create a more level playing field for organizations
Meta’s mission to advance the state of the art of of all sizes across the globe to benefit from the
AI for the benefit of all has always relied on open economic growth promised by the advancement of AI.
AUGUST 2023 2
How to use this guide
This guide is a resource for developers that outlines The recommendations included in this guide reflect
common approaches to building responsibly at each current research on responsible generative AI. We
level of an LLM-powered product. It covers best expect these to evolve as the field advances and
practices and considerations that developers should access to foundation models grows, inviting further
evaluate in the context of their specific use case and innovation on AI safety. Decisions to implement
market. It also highlights some mitigation strategies best practices should be evaluated based on the
and resources available to developers to address risks jurisdiction where your products will be deployed and
at various points in the system. These best practices should follow your company’s internal legal and risk
should be considered holistically because strategies management processes.
adopted at one level can impact the entire system.
AUGUST 2023 3
Overview of responsible
AI & system design
Responsible AI considerations
Helping to ensure that generative AI technology does technology that have already surfaced online, such as
not produce undue harm is of paramount importance. the creation or proliferation of illegal content, content
Generative AI is developing rapidly and is being which may be objectionable or hateful, or content
driven by research, open collaboration, and product that may result in the provision of unqualified advice.
releases that are putting this technology in the hands These instances may increase as generative AI tools
of people globally. Growth at this scale presents become more accessible.
novel challenges for the responsible deployment
of AI, yet many of the principles of responsibility For our own, on-platform
remain the same as for any other AI technology.
generative AI offerings,
These considerations, core to Meta’s approach
to responsible AI, include fairness and inclusion, Meta is implementing
robustness and safety, privacy and security, and safety measures to address
transparency and control, as well as mechanisms for
governance and accountability. LLMs are one of many
context-specific risks.
AI tools, and their risks should be evaluated through These mitigations are layered across different
these lenses according to how they will be used. intervention points beyond those that can be
assessed and mitigated in the foundation model.
Foundation models and generative AI systems
As discussed in our research paper on Llama 2,
represent advancements in power and accuracy
some mitigations applied at early stages in the
compared to predecessor technologies. The increase
development process can be detrimental to the
in the performance, utility, and flexibility of these
downstream performance and safety of the model,
models will likely lead to their ubiquity, as the value
and some risks may be better addressed at later
they bring to some pre-existing use cases may
points in the product development cycle. Developers
outweigh operational costs of deploying the systems.
of generative AI-powered features that leverage
The ability to generate completely new content also
open source models will similarly need to ensure that
opens up new use cases that must be evaluated
their products are safe and benefit end users, taking
for the types of risks they may present. There are
a holistic view of responsible AI across the entire
potential risks related to the misuse of this
product development cycle.
AUGUST 2023 4
Mitigation points for LLM-
powered products
A foundation model is a general purpose AI provide opportunities to mitigate potential risks.
technology whereas an LLM-powered product has It is critical that developers examine each layer of the
a defined use case and performs specific tasks product to determine which potential risks may arise
to enable an intended use or capability through a based on the product objectives and design,
user interface, sometimes embedded in products. and implement mitigation strategies accordingly.
An LLM-powered system encompasses both the
The following section presents responsible AI
foundation model and a number of product-specific
considerations for the different stages of LLM
layers. At various points in the product development
product development. At each of these levels,
lifecycle, developers make decisions that shape the
we highlight best practices for mitigating
objectives and functionality of the feature, which can
potential risks.
introduce potential risks. These decision points also
JULY 2023 5
Development of the
foundation model
Llama 2 is a new version of the Llama 1 model, which set of diverse, publicly available online data. This
was made available previously for research. The new training corpus is mostly English, which is consistent
pretrained and fine-tuned versions of the model have with the current, intended use of the model. For each
been updated for commercial release. In addition dataset used in training, we followed Meta’s standard
to performing a variety of pretraining data-level privacy review processes. And for our pretraining
investigations to help understand the potential data we made an effort to remove data from
capabilities and limitations of our models, we applied certain sources known to contain a high volume of
considerable safety mitigations to the fine-tuned personal information about private individuals. After
versions of the model through supervised fine-tuning, pretraining, the model can reproduce everything from
reinforcement learning from human feedback (RLHF), simple grammatical rules to complex nuances like
and iterative red teaming (these steps are covered context, sentiment, and figurative language. However,
further in the section - Fine-tune for product). the model does not gain knowledge or generate
beliefs about the world in the way humans do. It only
Information on pretraining data, model architecture
learns to predict the next word in a sentence based
and parameters, and pretrained evaluations are
on the patterns in its training data.
contained in the Llama 2 research paper. The
paper also describes in further detail the steps to If you’re going to use the pretrained model, we
develop the fine-tuned versions, including detailed recommend tuning it by using the techniques
safety alignment efforts and evaluation results. described in the next section to reduce the likelihood
Additional information is included in the model card that the model will generate unsafe outputs that are
accompanying the release. The research paper and in conflict with your intended use case and tasks. If
model card provide information about the capabilities you have terms of service or other relevant policies
and limitations of the models, which will help that apply to how individuals may interact with your
developers more safely tune, evaluate, and deploy LLM, you may wish to fine-tune your model to be
Llama for new use cases. aligned with those policies. It may also be necessary
to establish new terms of service and policies specific
During pretraining, a model builds its understanding
to LLMs, or notify users about how their data or
of the statistical patterns across the sample of
feedback provided will be used in fine-tuning.
human language contained in its training data. The
training datasets for Llama are sourced from a broad
AUGUST 2023 6
Responsible LLM product
1
development stages
Developers will identify a specific product use case Determine use case
for the released model, and are responsible for
assessing risks associated with that use case and An important decision in the development process
applying best practices to ensure safety. This section is which use case(s) to focus on. Most developers
outlines the considerations and mitigation strategies using this guide already have a use case in mind,
available at each stage of product development such as customer support, AI assistants, internal
1. Determine use case use cases that improve the lives of people and society,
taking into consideration different ethical principles
2. Fine-tune for product and values. Developing or adopting an internal risk
assessment process can help identify potential
3. Address input- and
risks for a specific use case and should focus on
output-level risks
how your product’s end users and others could be
• OECD’s AI Principles
AUGUST 2023 7
2
Fine-tune for product
Product-specific fine-tuning enables developers to
leverage pretrained models or models with some
fine-tuning for a specific task requiring only limited
Examples of fine-tuning for a pretrained
LLM include:
AUGUST 2023 8
The responsible fine-tuning flow
Here are the general steps needed to responsibly fine-
tune an LLM for alignment, guided at a high
level by Meta’s Responsible AI framework:
AUGUST 2023 9
THE RESPONSIBLE FINE-TUNING FLOW
STEP 2: PREPARE DATA will depend on the specific context in which a product
Developing downstream applications of LLMs begins is deployed. Developers should also pay attention
with taking steps to consider the potential limitations, to how human feedback and annotation of data may
privacy implications, and representativeness of further polarize a fine-tuned model with respect
data for a specific use case. Begin by preparing and to subjective opinions, and take steps to prevent
preprocessing a clean dataset that is representative injecting bias in annotation guidelines and to
of the target domain. This involves tokenizing the text, mitigate the effect of annotators’ bias. Resources
information, and splitting the dataset into training, • Don’t Blame the Annotator: Bias Already Starts in
validation, and testing sets. This step may also the Annotation Instructions
involve ensuring that data are representative of the • Annotators with Attitudes: How Annotator Beliefs
end users in the deployment context, for instance, by And Identities Bias Toxic Language Detection
ensuring there are enough examples from relevant There are several other risks to consider, such as
languages if you plan to deploy your product in a overfitting, privacy, and security. To mitigate these
non-English speaking market. Representativeness risks, carefully design the fine-tuning process by
of data is dependent on the use case and should be curating a high-quality dataset that is representative
assessed accordingly. of your use case, conduct rigorous evaluations, and
When fine-tuning for a specific use case it can be test your fine-tuned model’s potential use via red
beneficial to examine training data for biases, such teaming (covered in step four - Evaluate and
AUGUST 2023 10
THE RESPONSIBLE FINE-TUNING FLOW
training progress is monitored using a validation set, Reinforcement Learning from Human
and hyperparameters are adjusted as necessary. Feedback (RLHF)
Fine-tuning an LLM for safety can involve a number To align the output of LLMs with user expectations
of techniques, many of which the research paper on and values, one approach that developers should
Llama 2 describes in greater depth. These techniques consider is implementing Reinforcement Learning
can include: from Human Feedback (RLHF) mechanisms. This
• Supervised Fine-Tuning (SFT): Supervised fine- involves collecting ranking data from trained
tuning using data annotated across helpfulness annotators or users (given a model input and several
adversarial prompts with safe responses by specific policies by using Reinforcement Learning
prefixing a safe preprompt such as “You are a from AI Feedback (RLAIF). The fine-tuned LLM itself
safe and responsible assistant” to the adversarial can be used to create synthetic ranking data for
prompt, followed by fine-tuning on new outputs. reward model training. Given a model input, response
pairs and relevant guidelines, the LLM predicts
which response would best follow the guidelines.
The synthetic reward modeling data are then used to
augment the reward model’s training data.
AUGUST 2023 11
THE RESPONSIBLE FINE-TUNING FLOW
STEP 4: EVALUATE AND IMPROVE PERFORMANCE Evaluation strategies and processes to improve
The final stage is to evaluate the fine-tuned model on performance can include:
a test set to measure its performance on the specific • Automatic evaluation leverages automatic
task and against safety benchmarks, according to benchmarks and classifiers to judge the output
the use case. This includes analyzing the model’s with respect to a specific category of risk.
strengths and weaknesses based on evaluation • Manual evaluation leverages human annotators or
results, gathering more data to further enhance subject matter experts to judge the model’s output.
performance and safety, and iterating until satisfied
• Red teaming is a systematic effort to identify
with the model’s performance using holdout test
model vulnerabilities or emergent risks by crafting
datasets.
prompts that may elicit undesirable behavior or
There are many complementary types of evaluations outputs. This type of manipulation of the model
that are useful for measuring risks in models, can be used to test safeguards and attempts to
including automatic benchmarks, manual annotations “jailbreak” the model.
by human raters, and evaluations using an LLM
itself as a rater. The Holistic Evaluation of Language
Models discusses some of the commonly used
automatic benchmarks.
AUGUST 2023 12
Red teaming best practices • Regular testing: The model should undergo
regular testing to determine whether or not
Red teams should adopt systematic approaches
mitigations against attacks are effective.
to testing and measurement, while estimating
This requires some form of automated
real-world behaviors and threat vectors to the
evaluation, either with human labeling, which
extent possible.
can be expensive, or with classifiers trained
• Diversity: Red teams should include a diverse to recognize responses that fall under the
set of people from a range of professional risk categories.
backgrounds that are representative of a broad
group of potential users and demographics. Red
teams can be composed of internal employees,
experts, or community members.
AUGUST 2023 13
Privacy adversarial attacks
Additional privacy protections should be considered
when releasing the product, to test whether bad
actors may be able to improperly extract information.
3
Address input- and
output-level risks
Without proper safeguards at the input and output
levels, it is hard to ensure that the model will respond
A privacy adversarial attack is a method where
properly to adversarial inputs and will be protected
attackers can exfiltrate data from a model. For
from efforts to circumvent content policies and
example, common adversarial attacks may include
safeguard measures (“jailbreaking”). Mitigations at
membership inference attacks on a model to predict
the output level can also act as a safeguard against
whether or not a particular sample was in the training
generating high-risk or policy-violating content.
data, or model inversion attacks to reconstruct
Enforcement of content policies can be managed
representative views of a subset of examples.
through automated systems and manual analysis
Prompt injection attacks are attempts to circumvent
of samples and reports. Automated systems may
content restrictions to produce particular outputs.
include machine learning and rule-based classifiers
A red team privacy adversarial attack conducted by a for filtering prompt inputs or system outputs. Usage
company may be able to demonstrate the feasibility or consequence policies may be defined for when
of such attacks. In scenarios where companies users repeatedly violate those policies.
fine-tune models using personal data (pursuant to
applicable privacy laws), they should consider
testing the outputs to see if the model memorized
particular data.
AUGUST 2023 14
Mitigating risks at the input level • Prompt engineering: Direct modifications of
the user inputs are an option for guiding the
The input refers to the information provided by
model behavior and encouraging responsible
the user and passed to the system. The developer
outputs, by including contextual information or
does not control what the user inputs. Without
constraints in the prompts to establish background
implementation of input filters and safeguards, even
knowledge and guidelines while generating the
advanced models can potentially be manipulated to
output. Modifications may be done in a variety
generate harmful or misleading outputs or violate
of ways, such as with automated identification
content policies. Although safeguards to protect
and categorization, assistance of the LLM itself,
privacy and prevent potential harm can be developed
or rules engines. These can help improve the
by tuning the model, it should be expected that even
user experience by creating more diversity and
after rigorous design and testing, those safeguards
expressiveness from the model. For example,
will not have perfect performance and may be
prompt engineering can be leveraged to direct the
subverted. Additional safeguards include direct
model to include more diverse references or apply
filtering and engineering of the inputs. For these to
a certain tone or point of view. Prompt engineering
be effective, model inputs must be well-formatted.
rules may be hard coded or probabilistic.
These approaches include:
AUGUST 2023 15
Mitigating risks at the output level unreasonably restrict the usage of your model.
Words often have context-dependent meanings,
Based on the downstream use case, you can apply
and terms that could be sexually suggestive, for
several approaches for detecting and filtering the
example, may also be used in medical contexts.
generated output of models for problematic or policy-
Content policies will help articulate the specifics
violating content. Here are some considerations and
between permitted and prohibited topics to users.
best practices for filtering outputs. Any output filter
mitigation should include all languages that are used • Classifiers: The more effective, but also more
in the region where your product is available. difficult, approach is to develop classifiers that
detect and filter outputs based on the meaning
• Blocklists: One of the easiest ways to prevent the
conveyed by the words chosen. Classifiers,
generation of high-risk content is to compile a
when properly trained on known examples of a
list of all the phrases that your model should not,
particular sentiment or type of semantic content,
under any circumstances, be permitted to include
can become highly effective at identifying novel
in a response. Many words are easily identifiable
instances in which that sentiment or meaning
as problematic; slurs, for example, are typically
is expressed.
offensive no matter their context. While blocklists
are attractive for their simplicity, they may
AUGUST 2023 16
Evaluate effectiveness Some best practices include:
AUGUST 2023 17
4
Build transparency and
reporting mechanisms in
user interactions
performance and avoid repeating mistakes. Product
developers should review feedback by monitoring the
rate that users report model outputs and by manually
reviewing those reports and selected samples of
model outputs.
Releasing an LLM-powered feature for users to
interact with can reveal new use cases as well as
Transparency & control best practices
new concerns. User interactions can provide critical
To ensure high-quality feedback and provide end
feedback, which can be used for reinforcement
users with notice and choice about their interactions
learning (discussed in a previous section). This is
with your AI assets, developers should consider the
also an opportunity to provide appropriate notice,
following practices for user interactions:
transparency, and control to users, which can lead to
greater satisfaction and trust in the feature.
• Transparency: Developers should consider ways
AUGUST 2023 18
should also consider the use of system cards to • Control mechanisms: Additional controls could
provide insight into their AI system’s underlying include giving users the option to customize the
architecture and explain how a particular AI outputs generated by an LLM. For example, a
experience is produced. Further best practices are user could select or reject outputs from a list of
outlined in the Partnership on AI’s Responsible multiple options. Offering editing capabilities
Practices for Synthetic Media. can also enhance a user’s sense of agency
over outputs, and developers should consider
education flows that can set a user up for
success, such as offering prompt suggestions or
explanations of how to improve an output.
AUGUST 2023 19
Resources for developers
fostering more open exchange of safety research and including a sensitive topics classifier: https://parl.
models and use the most Research on Foundation Models, HELM: https://
crfm.stanford.edu/helm/latest/
current version to get the • EleutherAI LLM Evaluation Harness: https://
AUGUST 2023 20
Reporting resources • Reporting bugs and security concerns:
facebook.com/whitehat/info
If you have any information about issues, violations,
or problems, please help keep our communities safe • Reporting violations of the Acceptable
AUGUST 2023 21
Combining the components
of responsible generative AI
Each stage of model development presents data-collection stage to user feedback, be sure
opportunities to enhance the safety of your AI to keep your overall goal in mind.
feature. However, it’s crucial to acknowledge the • Standardizing processes for learning from
interconnectedness of these stages and how the feedback/errors. Embracing an iterative model-
decisions made at each stage can impact others. development mindset is crucial. Establish a well-
Building a responsible AI ecosystem requires ongoing defined process for incorporating new learnings
efforts to refine each component and ensure they into subsequent model training. This process
work together effectively. should include consistent feedback analysis,
prioritization of identified issues, and systematic
Here are some key considerations for implementing
application of learnings in the next iteration of
these components in unison:
model training.
• Holistic optimization. Although each component
The field of generative AI is complex, ever-evolving,
has a specific role and optimization goal,
and full of potential, but it’s not without risks. The
components are not isolated entities. Over-
key to unlocking its benefits while mitigating the
optimization of one component without
downsides is responsible AI practice. This practice
considering its interaction with others can lead
starts with understanding the complexities of the
to suboptimal outcomes. For instance, over-
technology, the potential impacts on users and
filtering training data for safety might make
society, and the importance of continuously striving
later fine-tuning less effective, as the model
for improvement.
may not recognize and handle unsafe content
appropriately. This is why different layers of By embracing the principles of transparency,
safety mitigations throughout the development accountability and user empowerment, as well
lifecycle are critical for creating high-performing, as having a commitment to ongoing learning and
AUGUST 2023 22
ADDENDUM
Building AI models responsibly is important to us, and context length, reduce latency by quantization
AUGUST 2023 23
Both the foundation and instruction Code Llama limitations on producing potentially problematic
models are not designed to be used as a large content. In the code domain, models should
language model for general purpose - for such avoid producing malware, virus, or malicious
1
scenarios refer to Llama 2. code. Developers should consider how bad actors
prompt the model to produce these results.
improving performance; (v) addressing input- and • The data should be representative of the
output-level risks; and (vi) building transparency end users’ requirements. For example, if the
and reporting mechanisms in user interactions. model is meant for Javascript generation,
Additionally, developers should consider code- the dataset chosen to fine-tune with should
specific best practices when building on top of be Javascript-focused. Developers should also
Code Llama in line with their specific use case. consider examining and placing restrictions
on any potentially malicious or nefarious code
Define content policies for use case
in the data.
• A content policy defines what content is
• Developers should ensure the security and
allowable based on the intended use and
robustness qualities of the training code dataset
deployment context. This may also outline
matches the security requirements of the output
AUGUST 2023 24
•
and the systems where the output code will be
integrated based on a specific use case.
• If the model’s output will be used in production generation tasks, we recommend developers use
systems, developers should ensure the code that our Code Llama - Instruct variants, which have
the model is trained on is free of relevant security been fine-tuned to generate helpful and safe
vulnerabilities. Developers and end-users that answers to users. Consult the full Responsible
use the model as an assistant for software Use Guide for best practices with regards to
development should continue to follow security generally addressing input- and output-level
foundation model for other task families, e.g, general Acceptable Use Policy.
relevant, it may be more appropriate to use our Code If you have any information about issues,
Llama - Python model variant. Code Llama and Code violations, or problems, please help keep our
Llama - Python are not trained with instruction communities safe by using our reporting resources.
data hence are not designed to follow instruction in • Reporting issues with the model via our
natural language. Any use of these models to perform GitHub repo
general natural language tasks is not recommended
• Reporting risky content generated by the
to avoid potential misuse of the models.
model via our Developer portal
AUGUST 2023 25