
Meta Llama

Responsible Use Guide
Resources and best practices for
responsible development of products
built with large language models
Contents

Open Innovation
How to use this guide
Overview of responsible AI & system design
Responsible AI considerations
Mitigation points for LLM-powered products
Development of the foundation model
Responsible LLM product development stages
Determine use case
Define content policies
Understand alignment-helpfulness trade-offs
Model-level alignment
Step 1: Prepare data
Step 2: Train the model
Reinforcement Learning from Human Feedback (RLHF)
Reinforcement Learning from AI Feedback (RLAIF)
Step 3: Evaluate and improve performance
Red teaming best practices
Privacy adversarial attacks
System-level alignment
Mitigating risks at the input level
Mitigating risks at the output level
Evaluate effectiveness
Build transparency and reporting mechanisms in user interactions
Feedback & reporting mechanisms
Transparency & control best practices
Resources for developers
Combining the components of responsible generative AI
Addendum: Introducing Code Llama
Foundation model use case
Instruction model use case
Open Innovation

Meta is committed to open science because we believe that a vibrant AI-innovation ecosystem will push the frontiers of scientific discovery and potentially revolutionize a wide array of sectors, from education to agriculture and climate management to cybersecurity.
We believe that the power of AI will be harnessed to address global
challenges, and unlocking that power responsibly will require
democratization of access and collaboration on risk management.
We want to empower developers in every industry on a global
scale to drive breakthroughs, create new products and solutions,
and benefit from accelerations in technological advancement and
economic growth.

Meta has open sourced code and datasets for machine translation, computer vision, and fairness evaluation, while contributing to the infrastructure of the AI-developer community with tools like PyTorch, ONNX, Glow, and Detectron. In the past, we have also made our cutting-edge large language models (LLMs) Llama 1 and OPT-175B available to the scientific community through research releases, which have spurred research in model efficiency, medicine, and conversational safety, as well as studies on evaluation methods, de-biasing techniques, and sources of hallucinations in LLMs.

We also took an important step toward advancing access and opportunity in the creation of AI-powered products and experiences with the launch of Meta Llama 2. The open release of these models to the research and business community laid the foundation for the next wave of community-driven innovation in generative AI. We've seen an incredible response thus far, with millions of download requests in the time since its release.

Democratization of access will put these models in more people's hands, which we believe is the right path to ensure that this technology will benefit the world at large. We take our commitment to building responsible AI seriously, cognizant of the potential privacy and content-related risks, as well as societal impacts.

Meta is proud to have supported the Llama 2 developer community by building state-of-the-art responsibility tooling that makes it easier than ever to build and release models responsibly. Learn more about the open source tools we share with developers to help them build responsibly.

We're also excited to release an early look at the next generation of Llama, Meta Llama 3, which, like Llama 2, is licensed for commercial use. This release of Llama 3 features both 8B and 70B pretrained and instruct fine-tuned versions to help support a broad range of application environments.

We envision Llama models as part of a broader system that puts the developer in the driver's seat. Llama models will serve as the foundational piece of a complex system that developers design with their unique end goals in mind. As part of this system-centric approach, and to support responsible deployment of these models, we have updates to our trust and safety project, including a Meta Llama Guard 2 model that supports a broader taxonomy for input/output prompt filtering.

How to use this guide
This guide is a resource for developers that outlines common approaches to building responsibly at each level of an LLM-powered product. It covers best practices and considerations that developers should evaluate in the context of their specific use case and market. It also highlights some mitigation strategies and resources available to developers to address risks at various points in the system. These best practices should be considered holistically, because strategies adopted at one level can impact the entire system.

The recommendations included in this guide reflect current research on responsible generative AI. We expect these to evolve as the field advances and access to foundation models grows, inviting further innovation on AI safety. Decisions to implement best practices should be evaluated based on the jurisdiction where your products will be deployed and should follow your company's internal legal and risk management processes.

Overview of responsible AI & system design

Responsible AI considerations

Helping to ensure that generative AI technology does not produce content that could cause harm is of paramount importance. Generative AI is developing rapidly and is being driven by research, open collaboration, and product releases that are putting this technology in the hands of people globally. Growth at this scale presents novel challenges for the responsible deployment of AI, yet many of the principles of responsibility remain the same as for any other AI technology. These considerations, core to Meta's approach to responsible AI, include fairness and inclusion, robustness and safety, privacy and security, and transparency and control, as well as mechanisms for governance and accountability. LLMs are one of many AI tools, and their risks should be evaluated through these lenses according to how they will be used.

Foundation models and generative AI systems represent advancements in power and accuracy compared to predecessor technologies. The increase in the performance, utility, and flexibility of these models will likely lead to their ubiquity, as the value they bring to some pre-existing use cases may outweigh the operational costs of deploying the systems. The ability to generate completely new content also opens up new use cases that must be evaluated for the types of risks they may present. There are potential risks related to the misuse of this technology that have already surfaced online, such as the creation or proliferation of illegal content, content which may be objectionable or hateful, or content that may result in the provision of unqualified advice. These instances may increase as generative AI tools become more accessible.

For our own, on-platform generative AI offerings, Meta is implementing safety measures to address context-specific risks. These mitigations are layered across different intervention points beyond those that can be assessed and mitigated in the foundation model. With our release of Llama 3 paired with Llama Guard 2, we are beginning to extend this vision of a layered approach to safety to our open models as well.

As discussed in our research paper on Llama 2, some mitigations applied at early stages in the development process can be detrimental to the performance and safety of the model, and some risks may be better addressed at later points in the product development cycle. Our vision for layered model safety helps to empower developers to make decisions about balancing these trade-offs. Developers of generative AI-powered features that leverage open source models will have more power to ensure that their products are safe and benefit end users, while taking a holistic view of responsible AI across the entire product development cycle.

Mitigation points for LLM-powered products

A foundation model is a general-purpose AI technology, whereas an LLM-powered product has a defined use case and performs specific tasks to enable an intended use or capability through a user interface, sometimes embedded in products. An LLM-powered system encompasses both the foundation model and accompanying input-output safeguards, and a number of product-specific layers. At various points in the product development lifecycle, developers make decisions that shape the objectives and functionality of the feature, which can introduce potential risks. These decision points also provide opportunities to mitigate potential risks. It is critical that developers examine each layer of the product to determine which potential risks may arise based on the product objectives and design, and implement mitigation strategies accordingly.

Model-level safety: Model-level safety concerns the data preparation and processing best practices and human feedback or alignment practices for safety at the foundation and fine-tuned model level.

System-level safety: System-level safety is the venue for the most context-specific safety mitigations dependent on user interactions. Developers looking to craft safety mitigations specifically for their use case, with the goal of offering their users the best product experience, should explore these options.

You can learn more about our layered approach to safety by visiting our resources for Meta Llama open trust and safety.

The following section presents responsible AI considerations for the different stages of LLM product development. At each of these levels, we highlight best practices for mitigating potential risks.

Development of the foundation model

Meta Llama 3, like Llama 2, is licensed for commercial use. This release of Llama 3 features both 8B and 70B pretrained and instruct fine-tuned versions to help support a broad range of application environments. This next generation of Llama demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning. With the developer in mind, and in support of our longstanding open approach, we wanted to put Llama 3 in the hands of the community as soon as possible to enable early development and kickstart this next wave of innovation.

In addition to performing a variety of pretraining data-level investigations to help understand the potential capabilities and limitations of our models, we applied considerable safety mitigations to the fine-tuned versions of the model through supervised fine-tuning, reinforcement learning from human feedback (RLHF), and iterative red teaming (these steps are covered further in the model-level alignment section below).

More information on Llama 3 model architecture, parameters, and pretrained evaluations is contained in the model card. The model card also provides information about the capabilities and limitations of the models.

During pretraining, a model builds its understanding of the statistical patterns across the sample of human language contained in its training data. The training datasets for Llama are sourced from a broad set of diverse, publicly available online data. This training corpus is mostly English, which is consistent with the current, intended use of the model. For each dataset used in training, we followed Meta's standard privacy review processes. And for our pretraining data, we made an effort to remove data from certain sources known to contain a high volume of personal information about private individuals. After pretraining, the model can reproduce everything from simple grammatical rules to complex nuances like context, sentiment, and figurative language. However, the model does not gain knowledge or generate beliefs about the world in the way humans do. It only learns to predict the next word in a sentence based on the patterns in its training data.

If you're going to use the pretrained model, we recommend tuning it by using the techniques described in the next section to reduce the likelihood that the model will generate outputs that are in conflict with your intended use case and tasks. If you have terms of service or other relevant policies that apply to how individuals may interact with your LLM, you may wish to fine-tune your model to be aligned with those policies. It may also be necessary to establish new terms of service and policies specific to LLMs, or notify users about how their data or feedback provided will be used in fine-tuning. We also recommend using Llama Guard 2 for enhanced safety performance.

Responsible LLM product development stages

Developers will identify a specific product use case for the released model, and are responsible for assessing risks associated with that use case and applying best practices to ensure safety. This section outlines the considerations and mitigation strategies available at each stage of product development and deployment.

At a high level, these stages include:

1. Determine use case
2. Model-level alignment
3. System-level alignment
4. Build transparency and reporting mechanisms in user interactions

1. Determine use case

An important decision in the development process is which use case(s) to focus on. Most developers using this guide already have a use case in mind, such as customer support, AI assistants, internal productivity tools, entertaining end-user experiences, or research applications. If you're a developer who is not certain of a particular use case for which you would want to use the model, consider focusing on use cases that improve the lives of people and society, taking into consideration different ethical principles and values. Developing or adopting an internal risk assessment process can help identify potential risks for a specific use case and should focus on how your product's end users and others could be affected. This understanding is critical for evaluating in-context safety for your product deployment, and can take forms such as surveys and interviews of potential users or market analysis of similar product applications.

If you are new to considerations of values in the development and deployment of AI, refer to the principles and guidance on risk management released by academic and expert institutions, such as:

• OECD's AI Principles
• NIST's Trustworthy and Responsible AI Resource Center

Define content policies

Based on the intended use and audience for your product, a content policy will define what content is allowable and may outline safety limitations on producing illegal, violent, or harmful content. These limits should be evaluated in light of the product domain, as specific sectors and regions may have different laws or standards. Additionally, the needs of specific user communities should be considered as you design content policies, such as the development of age-appropriate product experiences. Having these policies in place will dictate the data needed, annotation requirements, and goals for safety fine-tuning, including the types of mitigation steps that will be implemented. These policies will also be used for labeling data in later stages when using RLHF and in additional product layers, such as making enforcement decisions for user inputs and model outputs.

If you are new to considerations of content policies, refer to commonly used policies in the industry, such as the taxonomy proposed by MLCommons.

Understand alignment-helpfulness trade-offs

While overall model safety should keep improving as models advance, some trade-off between model helpfulness and model alignment is likely unavoidable. That's because any prediction (Is this content aligned? Is this content unaligned?) carries at least some risk of applying content policies falsely (i.e., false positives and false negatives). These errors will necessarily mean that a model will either be more aligned and less helpful, or less aligned and more helpful.

To illustrate: consider a content policy against assistance with scams. If a user submits a prompt for "How does a Ponzi scheme operate?" the model can either refuse to substantively answer (arguably the most aligned, least helpful option) or provide a complete, detailed answer (arguably the most helpful, least aligned option). Consider the same evaluation, but with the prompt "How to protect yourself from identity theft."

As the model's rate of identifying and stopping unaligned content grows, its likelihood of falsely stopping aligned content, and thereby reducing its overall helpfulness, grows in tandem. In other words, you'll need to look elsewhere to learn about stopping identity theft. Turning down the dial, so that more unaligned content gets through, will likely have the knock-on effect of increasing the likelihood that the model generates helpful content. You'll learn about protecting your identity from thieves.

Avoiding alignment-helpfulness trade-offs is probably impossible. But developers should exercise discretion about how to weigh the benefits of alignment and helpfulness for their specific use case and audience. We look forward to exploring more ways to give developers greater control over this important aspect of model building.
2. Model-level alignment

Product-specific fine-tuning enables developers to leverage pretrained models, or models with some fine-tuning, for a specific task requiring only limited data and resources. Even with initial fine-tuning performed by Meta, developers can further train the model with domain-specific datasets to improve quality on their defined use case. Fine-tuning adapts the model to domain- or application-specific requirements and introduces additional layers of safety mitigations.

Examples of fine-tuning for a pretrained LLM include:

• Text summarization: By using a pretrained language model, the model can be fine-tuned on a dataset that includes pairs of long-form documents and corresponding summaries. This fine-tuned model can then generate concise summaries for new documents.

• Question answering: Fine-tuning a language model on a Q&A dataset such as SQuAD (Stanford Question Answering Dataset) allows the model to learn how to answer questions based on a given context paragraph. The fine-tuned model can then be used to answer questions on various topics.

• Sentiment analysis: A model can be fine-tuned on a dataset of labeled text reviews (positive or negative sentiment) to recognize sentiment and perform analysis to understand user satisfaction. By training the model on this task-specific dataset, it can learn to predict sentiment in text accurately.

These examples showcase how fine-tuning an LLM can be used to specialize the model's capabilities for specific use cases, improving its performance and making it more suitable for specific applications. The choice of the foundation model and the task-specific dataset plays a crucial role in achieving the desired results.
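To make the first example concrete, here is a minimal sketch of supervised fine-tuning on document-summary pairs. It assumes the Hugging Face transformers, datasets, and peft libraries, a base checkpoint you are licensed to use, and a hypothetical summaries.jsonl file; treat it as a starting point rather than a production recipe.

```python
# Minimal sketch: parameter-efficient fine-tuning of a pretrained causal LM on
# document-summary pairs. "summaries.jsonl" is a hypothetical file of
# {"document": ..., "summary": ...} records.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Meta-Llama-3-8B"  # any pretrained checkpoint you are licensed to use
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = get_peft_model(AutoModelForCausalLM.from_pretrained(base_model),
                       LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

def to_features(example):
    # One training string per pair: instruction, document, then the target summary.
    text = f"Summarize the document.\n\n{example['document']}\n\nSummary: {example['summary']}"
    return tokenizer(text, truncation=True, max_length=2048)

train_data = load_dataset("json", data_files="summaries.jsonl")["train"].map(
    to_features, remove_columns=["document", "summary"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="summarizer-ft", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-5),
    train_dataset=train_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

The same pattern applies to the question-answering and sentiment examples; only the dataset and the prompt format change.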

The responsible fine-tuning flow
Here are the general steps needed to responsibly fine-tune an LLM for alignment, guided at a high level by Meta's Responsible AI framework:

1. Prepare data
2. Train the model
3. Evaluate and improve performance

Step 1: Prepare data


Developing downstream applications of LLMs begins
with taking steps to consider the potential limitations,
privacy implications, and representativeness of
data for a specific use case. Begin by preparing and
preprocessing a clean dataset that is representative
of the target domain. This involves tokenizing the text,
handling special characters, removing unnecessary
information, and splitting the dataset into training,
validation, and testing sets. This step may also involve
ensuring that data are representative of the end users
in the deployment context, for instance, by ensuring
there are enough examples from relevant languages if
you plan to deploy your product in a
non-English speaking market. Representativeness
of data is dependent on the use case and should be
assessed accordingly.
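As a small illustration of the cleaning and splitting steps described above, the following sketch assumes the Hugging Face datasets library and a hypothetical corpus.jsonl file of raw records; representativeness checks for your deployment context would be layered on top of this.

```python
# Sketch: basic cleaning and train/validation/test splitting of a fine-tuning corpus.
# Assumes a hypothetical "corpus.jsonl" file with {"text": ...} records.
import re
from datasets import load_dataset

def clean(example):
    text = example["text"].strip()
    text = re.sub(r"\s+", " ", text)      # normalize whitespace
    text = re.sub(r"<[^>]+>", "", text)   # drop stray HTML tags
    return {"text": text}

data = load_dataset("json", data_files="corpus.jsonl")["train"].map(clean)
data = data.filter(lambda ex: len(ex["text"]) > 0)   # remove empty records

# 80/10/10 split: carve out 20% for evaluation, then halve it into validation and test.
splits = data.train_test_split(test_size=0.2, seed=42)
holdout = splits["test"].train_test_split(test_size=0.5, seed=42)
train, validation, test = splits["train"], holdout["train"], holdout["test"]
```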

When fine-tuning for a specific use case, it can be beneficial to examine training data for biases, such as gender, racial, linguistic, cultural, or other biases.

Understanding these patterns is important, but it may not always be optimal to filter out all problematic content in training data, due to the unintended consequences this filtering may have on subsequent performance and safety mitigations, such as prompt engineering. Instead of removing data, focusing on the representativeness of the data can help prevent a fine-tuned model from perpetuating biases in its generated outputs; what is considered representative will depend on the specific context in which a product is deployed. Developers should also pay attention to how human feedback and annotation of data may further polarize a fine-tuned model with respect to subjective opinions, and take steps to prevent injecting bias in annotation guidelines and to mitigate the effect of annotators' bias. Resources on this topic include:

• Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
• Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

There are several other risks to consider, such as overfitting, privacy, and security. To mitigate these risks, carefully design the fine-tuning process by curating a high-quality dataset that is representative of your use case, conduct rigorous evaluations, and test your fine-tuned model's potential use via red teaming (covered in Step 3: Evaluate and improve performance).

Step 2: Train the model

Fine-tuning involves training the model for a limited number of iterations. Once a pretrained model is loaded in the environment for fine-tuning, the training process involves setting up hyperparameters like epochs, batch size, and learning rate. The data are passed through the model, loss is computed, and weights are updated through backpropagation. The training progress is monitored using a validation set, and hyperparameters are adjusted as necessary.

Fine-tuning an LLM for safety can involve a number of techniques, many of which the research paper on Llama 2 describes in greater depth. These techniques can include:

• Supervised Fine-Tuning (SFT): Supervised fine-tuning using data annotated across helpfulness and safety.

• Reinforcement Learning from Human Feedback (RLHF) or AI Feedback (RLAIF): Training safety and helpfulness reward models to support RLHF techniques iteratively improves models and makes them more robust to jailbreaking techniques.

• Targeted Safety Context Distillation: Context distillation for safety helps the model associate adversarial prompts with safe responses by prefixing a safe preprompt such as "You are a safe and responsible assistant" to the adversarial prompt, followed by fine-tuning on new outputs (a minimal sketch follows this list).
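As a rough sketch of the targeted safety context distillation step above (assuming a transformers text-generation pipeline, a hypothetical fine-tuned checkpoint, and a hypothetical list of adversarial prompts), the safe preprompt is used only to elicit safer completions, which are then paired with the original prompts for a further round of fine-tuning.

```python
# Sketch: generating safety context-distillation data. A safe preprompt steers the
# model toward safer answers; each answer is then paired with the *original*
# adversarial prompt (preprompt removed) for a later supervised fine-tuning pass.
from transformers import pipeline

generator = pipeline("text-generation", model="my-finetuned-checkpoint")  # hypothetical model
safe_preprompt = "You are a safe and responsible assistant.\n"
adversarial_prompts = ["...", "..."]  # placeholder: prompts collected during red teaming

distilled_examples = []
for prompt in adversarial_prompts:
    full_input = safe_preprompt + prompt
    generated = generator(full_input, max_new_tokens=256)[0]["generated_text"]
    safe_response = generated[len(full_input):]   # keep only the generated continuation
    distilled_examples.append({"prompt": prompt, "response": safe_response})
# distilled_examples now feeds the next supervised fine-tuning round.
```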

Reinforcement Learning from Human Feedback (RLHF)

To align the output of LLMs with user expectations and values, one approach that developers should consider is implementing Reinforcement Learning from Human Feedback (RLHF) mechanisms. This involves collecting ranking data from trained annotators or users (given a model input and several generated outputs, ranking them from best to worst according to policies), training a reward or helpfulness model to act as a proxy of human feedback, and then optimizing the LLM to maximize the reward/helpfulness model score with reinforcement learning.

Reinforcement Learning from AI Feedback (RLAIF)

Reward models can also be improved and tailored to specific policies by using Reinforcement Learning from AI Feedback (RLAIF). The fine-tuned LLM itself can be used to create synthetic ranking data for reward model training. Given a model input, response pairs, and relevant guidelines, the LLM predicts which response would best follow the guidelines. The synthetic reward modeling data are then used to augment the reward model's training data.

Step 3: Evaluate and improve performance

The final stage is to evaluate the fine-tuned model on a test set to measure its performance on the specific task and against safety benchmarks, according to the use case. This includes analyzing the model's strengths and weaknesses based on evaluation results, gathering more data to further enhance performance and safety, and iterating until satisfied with the model's performance using holdout test datasets.

There are many complementary types of evaluations that are useful for measuring risks in models, including automatic benchmarks, manual annotations by human raters, and evaluations using an LLM itself as a rater. The Holistic Evaluation of Language Models discusses some of the most commonly used automatic benchmarks. As the industry matures, we are excited for evaluation platforms to emerge to help drive safety standardization, such as through the MLCommons AI Safety working group. Evaluation strategies and processes to improve performance can include:

• Automatic evaluation leverages automatic benchmarks and classifiers to judge the output with respect to a specific category of risk (a minimal sketch follows this list).

• Manual evaluation leverages human annotators or subject matter experts to judge the model's output.

• Red teaming is a systematic effort to identify model vulnerabilities or emergent risks by crafting prompts that may elicit undesirable behavior or outputs. This type of manipulation of the model can be used to test safeguards and attempts to "jailbreak" the model.
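As a simplified illustration of automatic evaluation, the sketch below (assuming a hypothetical fine-tuned checkpoint, a hypothetical safety classifier, and a benchmark prompt file) generates responses and reports the share flagged by the classifier; the classifier and risk categories should match your own content policy.

```python
# Sketch: automatic safety evaluation. Generate responses to benchmark prompts,
# then score each response with a classifier trained for a risk category.
from transformers import pipeline

generator = pipeline("text-generation", model="my-finetuned-checkpoint")            # hypothetical model
safety_classifier = pipeline("text-classification", model="my-safety-classifier")   # hypothetical classifier

prompts = [line.strip() for line in open("benchmark_prompts.txt")]                  # hypothetical prompt set

flagged = 0
for prompt in prompts:
    response = generator(prompt, max_new_tokens=256)[0]["generated_text"][len(prompt):]
    result = safety_classifier(response)[0]          # e.g. {"label": "unsafe", "score": 0.93}
    if result["label"] == "unsafe" and result["score"] > 0.8:
        flagged += 1

print(f"Flagged {flagged}/{len(prompts)} responses ({flagged / len(prompts):.1%})")
```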

Red teaming best practices

Red teams should adopt systematic approaches to testing and measurement, while estimating real-world behaviors and threat vectors to the extent possible.

• Diversity: Red teams should include a diverse set of people from a range of professional backgrounds that are representative of a broad group of potential users and demographics. Red teams can be composed of internal employees, experts, or community members.

• Subject matter expertise: Subject matter experts should judge model responses based on their familiarity with the identified risk categories and label responses that fall under each category.

• Regular testing: The model should undergo regular testing to determine whether or not mitigations against attacks are effective. This requires some form of automated evaluation, either with human labeling, which can be expensive, or with classifiers trained to recognize responses that fall under the risk categories.

Privacy adversarial attacks

Additional privacy protections should be considered when releasing the product, to test whether bad actors may be able to improperly extract information. A privacy adversarial attack is a method where attackers can exfiltrate data from a model. For example, common adversarial attacks may include membership inference attacks on a model to predict whether or not a particular sample was in the training data, or model inversion attacks to reconstruct representative views of a subset of examples. Prompt injection attacks are attempts to circumvent content restrictions to produce particular outputs.

A red team privacy adversarial attack conducted by a company may be able to demonstrate the feasibility of such attacks. In scenarios where companies fine-tune models using personal data (pursuant to applicable privacy laws), they should consider testing the outputs to see if the model memorized particular data. This approach may be especially useful for testing models that are intended to be deployed as AI assistants or agents.

3. System-level alignment

Without proper safeguards at the input and output levels, it is hard to ensure that the model will respond properly to adversarial inputs and will be protected from efforts to circumvent content policies and safeguard measures ("jailbreaking"). Mitigations at the output level can also act as a safeguard against generating high-risk or policy-violating content.

Enforcement of content policies can be managed through automated systems and manual analysis of samples and reports. Automated systems may include machine learning and rule-based classifiers for filtering prompt inputs or system outputs. Usage or consequence policies may be defined for when users repeatedly violate those policies.

Mitigating risks at the input level

The input refers to the information provided by the user and passed to the system. The developer does not control what the user inputs. Without implementation of input filters and safeguards, even advanced models can potentially be manipulated to generate harmful or misleading outputs or violate content policies. Although safeguards to protect privacy and prevent potential harm can be developed by tuning the model, it should be expected that even after rigorous design and testing, those safeguards will not have perfect performance and may be subverted. Additional safeguards include direct filtering and engineering of the inputs. For these to be effective, model inputs must be well-formatted. These approaches include:

• Prompt filters: Even when inputs may not violate content policies, the model may produce problematic engagements or outputs. In these cases, it may be appropriate to filter, block, and hard-code responses for some inputs until the model can respond in the intended way. This tactic may come with trade-offs to the user's experience and agency in engaging with the system. Thus, the safety benefits of such restrictions or modifications should be weighed against those costs, until more robust solutions are developed.

• Prompt engineering: Direct modifications of the user inputs are an option for guiding the model behavior and encouraging responsible outputs, by including contextual information or constraints in the prompts to establish background knowledge and guidelines while generating the output. Modifications may be done in a variety of ways, such as with automated identification and categorization, assistance of the LLM itself, or rules engines. These can help improve the user experience by creating more diversity and expressiveness from the model. For example, prompt engineering can be leveraged to direct the model to include more diverse references or apply a certain tone or point of view. Prompt engineering rules may be hard coded or probabilistic.

Alongside prompts, it might be beneficial to provide instructive sample inputs and outputs that illustrate the desired responsible behavior.
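A minimal sketch combining these two input-level approaches might look like the following; the blocked patterns, canned response, and system prompt are placeholders to adapt to your own content policy.

```python
# Sketch: simple input-level safeguards. A rule-based filter returns a hard-coded
# response for clearly out-of-policy inputs; everything else is wrapped with a
# guiding system prompt before it reaches the model.
import re

BLOCKED_PATTERNS = [r"\bbuild a bomb\b", r"\bcredit card numbers\b"]   # placeholder policy rules
CANNED_RESPONSE = "Sorry, I can't help with that request."

SYSTEM_PROMPT = (
    "You are a helpful assistant for a customer-support product. "
    "Answer politely, do not give legal or medical advice, and refuse unsafe requests.\n\n"
)

def prepare_input(user_input: str) -> tuple[bool, str]:
    """Return (blocked, text_to_send_to_model_or_user)."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, user_input, flags=re.IGNORECASE):
            return True, CANNED_RESPONSE        # hard-coded response, never sent to the model
    return False, SYSTEM_PROMPT + user_input    # engineered prompt sent to the model
```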

Mitigating risks at the output level

Based on the downstream use case, you can apply several approaches for detecting and filtering the generated output of models for problematic or policy-violating content. Here are some considerations and best practices for filtering outputs. Any output filter mitigation should include all languages that are used in the region where your product is available.

• Blocklists: One of the easiest ways to prevent the generation of high-risk content is to compile a list of all the phrases that your model should not, under any circumstances, be permitted to include in a response. Many words are easily identifiable as problematic; slurs, for example, are typically offensive no matter their context. While blocklists are attractive for their simplicity, they may unreasonably restrict the usage of your model. Words often have context-dependent meanings, and terms that could be sexually suggestive, for example, may also be used in medical contexts. Content policies will help articulate the specifics between permitted and prohibited topics to users.

• Classifiers: The more effective, but also more difficult, approach is to develop classifiers that detect and filter outputs based on the meaning conveyed by the words chosen. Classifiers, when properly trained on known examples of a particular sentiment or type of semantic content, can become highly effective at identifying novel instances in which that sentiment or meaning is expressed.
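The two options can be combined in a simple output gate like the sketch below; the blocklist terms and classifier checkpoint are placeholders, and a production classifier would be trained for your policy and for every language you support.

```python
# Sketch: output-level filtering. A blocklist catches exact phrases; a trained
# classifier catches policy-violating meaning that a word list would miss.
from transformers import pipeline

BLOCKLIST = {"example blocked phrase", "another blocked phrase"}                  # placeholder terms
classifier = pipeline("text-classification", model="my-policy-classifier")        # hypothetical classifier

def is_allowed(generated_text: str) -> bool:
    lowered = generated_text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return False
    result = classifier(generated_text)[0]        # e.g. {"label": "violating", "score": 0.91}
    return not (result["label"] == "violating" and result["score"] > 0.8)
```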

Evaluate effectiveness

While prompt filtering and engineering are critical safety mitigations, it's important to monitor effectiveness and avoid unintended consequences. Some best practices include:

• Test for unintended outcomes. Take caution that prompt engineering doesn't inadvertently create other issues. Test end-to-end performance after any prompt engineering to ensure desired behavior.

• Evaluate effectiveness of safeguards. Many publicly available datasets offer collections of prompts that are designed to benchmark against specific concerns when used as inputs. After model responses are collected, they can be evaluated by using standardized metrics.

• Adjust for different languages. Prompt filtering and engineering mitigations should include all languages that are used in the region where your product is available; the effectiveness of these mitigations may be dependent on linguistic and community-level nuances. Llama was trained primarily on data in English, in accordance with its intended use, so it is critical to carefully evaluate any mitigations in other languages.

4. Build transparency and reporting mechanisms in user interactions

Releasing an LLM-powered feature for users to interact with can reveal new use cases as well as new concerns. User interactions can provide critical feedback, which can be used for reinforcement learning (discussed in a previous section). This is also an opportunity to provide appropriate notice, transparency, and control to users, which can lead to greater satisfaction and trust in the feature.

Feedback & reporting mechanisms

Facilitating user interaction with appropriate feedback or reporting mechanisms is key to ensuring quality output. Feedback mechanisms can be as simple as positive or negative (thumbs up or thumbs down), and tailoring feedback to the types of issues that may be foreseeable based on a company's use case (for example, AI assistants) can enhance the quality of feedback. This feedback can be used by developers to improve the model in more targeted ways. Providing an option for freeform feedback within a reporting mechanism can also reveal new or unanticipated concerns raised by users. Furthermore, users can identify and highlight errors, unsafe behaviors, or suboptimal actions that the model might not recognize on its own. Developers can further train the model with this feedback to improve performance and avoid repeating mistakes.
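One concrete shape this feedback can take is a structured record attached to each response, as in the sketch below (the field names and storage format are illustrative); such records can then feed review queues and later fine-tuning.

```python
# Sketch: a structured feedback record logged for each model response.
# Field names are illustrative; adapt them to your product and review tooling.
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class FeedbackRecord:
    response_id: str
    rating: str                  # "thumbs_up" or "thumbs_down"
    issue_type: str | None       # e.g. "incorrect", "unsafe", "off_topic"
    freeform_comment: str | None
    timestamp: float

def log_feedback(record: FeedbackRecord, path: str = "feedback.jsonl") -> None:
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_feedback(FeedbackRecord("resp-123", "thumbs_down", "unsafe", "Advice seemed risky", time.time()))
```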

Product developers should review feedback by monitoring the rate at which users report model outputs and by manually reviewing those reports and selected samples of model outputs.

Transparency & control best practices

To ensure high-quality feedback and provide end users with notice and choice about their interactions with your AI assets, developers should consider the following practices for user interactions:

• Transparency: Developers should consider ways to provide transparency to end users regarding potential risks and limitations of the system prior to or at the time of user interaction. For instance, notice to users that they are interacting with an AI-powered chatbot may increasingly be required in certain markets, and is a best practice to address concerns that may be related to false or incorrect information. Developers should neither claim nor imply that an AI agent is human, especially when building and deploying anthropomorphized interfaces. Context, intent, sensitivity, and likelihood to deceive are additional critical factors in ascertaining when and how to be transparent. Work with your appropriate advisors to determine the types of transparency that should be provided to users, including whether users should be informed that their responses may be used to fine-tune a model. Developers should also consider the use of system cards to provide insight into their AI system's underlying architecture and explain how a particular AI experience is produced. Further best practices are outlined in the Partnership on AI's Responsible Practices for Synthetic Media.

• Control mechanisms: Additional controls could include giving users the option to customize the outputs generated by an LLM. For example, a user could select or reject outputs from a list of multiple options. Offering editing capabilities can also enhance a user's sense of agency over outputs, and developers should consider education flows that can set a user up for success, such as offering prompt suggestions or explanations of how to improve an output.

Resources for developers

There is a value chain emerging to support the responsible training and use of LLMs, which we believe will be advanced through more open releases and sharing of best practices, tools, and benchmarking. A growing number of researchers, platforms, companies, and developer communities are contributing to this ecosystem. We expect more tools for the responsible development of LLMs to become available over time and are committed to fostering more open exchange of safety research and tools to support developers. It is critical to remain aware of the latest versions of models and use the most current version to get the best results.

Our partnership to make Llama available on the Azure Model Catalog will enable developers using Microsoft Azure to leverage their cloud-native tools for content filtering and safety features. Below, we provide a few notable hubs and implementation resources for developers, but this list is not exhaustive. Microsoft also offers a repository of Responsible AI Resources.

Filters and classifiers:

• Meta Llama Guard and other solutions
• Content-filtering systems from Azure, supporting a range of languages: learn.microsoft.com/en-us/azure/cognitive-services/content-safety/overview
• Filter lists for generation of problematic words: github.com/LDNOOBW/naughty-words-js
• Recipes for safety in open-domain chatbots, including a sensitive topics classifier: parl.ai/projects/safety_recipes/

Platforms for tools and evaluations:

• MLCommons AI Safety (v0.5): mlcommons.org/2024/04/mlc-aisafety-v0-5-poc/
• Meta Llama Cybersecurity evaluations
• Benchmarking of LLMs by Stanford's Center for Research on Foundation Models, HELM: crfm.stanford.edu/helm/latest/
• EleutherAI LLM Evaluation Harness: github.com/EleutherAI/lm-evaluation-harness
• Hugging Face Hub, which hosts open source models and datasets, and is a space for developers to share safeguards and access benchmarking information: huggingface.co/docs/hub/index

Reporting resources

If you have any information about issues, violations, or problems, please help keep our communities safe by using our reporting resources.

• Reporting issues with the model: github.com/facebookresearch/llama
• Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
• Reporting bugs and security concerns: facebook.com/whitehat/info
• Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama: LlamaUseReport@meta.com

Combining the components of responsible generative AI

Each stage of model development presents opportunities to enhance the safety of your AI feature. However, it's crucial to acknowledge the interconnectedness of these stages and how the decisions made at each stage can impact others. Building a responsible AI ecosystem requires ongoing efforts to refine each component and ensure they work together effectively.

Here are some key considerations for implementing these components in unison:

• Holistic optimization. Although each component has a specific role and optimization goal, components are not isolated entities. Over-optimization of one component without considering its interaction with others can lead to suboptimal outcomes. For instance, over-filtering training data for safety might make later fine-tuning less effective, as the model may not recognize and handle unsafe content appropriately. This is why different layers of safety mitigations throughout the development lifecycle are critical for creating high-performing, responsible products.

• Alignment of objectives at each stage of development. To yield a product that is optimized for your target use cases, it's essential to have a consistent set of goals and outcomes that guide each stage of the process. From the data-collection stage to user feedback, be sure to keep your overall goal in mind.

• Standardizing processes for learning from feedback and errors. Embracing an iterative model-development mindset is crucial. Establish a well-defined process for incorporating new learnings into subsequent model training. This process should include consistent feedback analysis, prioritization of identified issues, and systematic application of learnings in the next iteration of model training.

The field of generative AI is complex, ever-evolving, and full of potential, but it's not without risks. The key to unlocking its benefits while mitigating the downsides is responsible AI practice. This practice starts with understanding the complexities of the technology, the potential impacts on users and society, and the importance of continuously striving for improvement.

By embracing the principles of transparency, accountability, and user empowerment, as well as having a commitment to ongoing learning and improvement, you can ensure that your AI feature is not only innovative and useful but also responsible and respectful. We hope this guide serves as a valuable tool in your journey toward responsible AI practice.

ADDENDUM

Introducing Code Llama

Code Llama is a set of large language models for code based on Llama 2, providing strong code generation and infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks, as well as state-of-the-art base models for fine-tuning. We provide multiple flavors to cover a wide range of applications: foundation code models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, 34B, and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. The 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama was developed by training Llama 2 on publicly available code and natural language datasets related to code.

Building AI models responsibly is important to us, and we undertook a variety of safety measures before releasing Code Llama, including many of the measures used for Llama 2. Before releasing the 7B, 13B, and 34B variants in August 2023, we asked cybersecurity and malware development experts to evaluate the Code Llama model's capacity for enabling malware development by otherwise unskilled adversaries. We subsequently expanded our evaluation suite to include Meta's CyberSecEval, which evaluates an LLM's capacity for producing code with known insecure patterns or complying with a cyber attacker's request. Measuring the model's response to malicious requests allowed us to implement mitigations which make Code Llama 70B safer than other available models, while remaining helpful. For detailed information on model training, architecture and parameters, evaluations, responsible AI, and safety, refer to our research paper and Responsible Use Guide.

This addendum to the guide outlines additional, coding-specific best practices that developers should consider in responsible development of downstream coding-related features.

Code Llama potential use cases

Code Llama has two broad use case categories:

1. Foundation model use case: Train on more data to create variants of Code Llama, e.g., add other programming languages (C++, Java), increase context length, reduce latency by quantization of the model, etc.

2. Instruction model use case: Add more instruction data to improve the model's ability to follow instructions, extend capability to understand non-English instructions, create standalone code bots and integrate into existing 3rd party products, etc.

Both the foundation and instruction Code Llama models are not designed to be used as a general-purpose large language model; for such scenarios, refer to Llama 2.

Foundation model use case

This model could be used for further research exploration on specialized foundation large language models for programming. When fine-tuning Code Llama, we refer users to the Responsible Use Guide for Llama 2, which provides essential guidance on responsible development of downstream models, including on (i) defining content policies and mitigations; (ii) preparing data; (iii) fine-tuning the model; (iv) evaluating and improving performance; (v) addressing input- and output-level risks; and (vi) building transparency and reporting mechanisms in user interactions. Additionally, developers should consider code-specific best practices when building on top of Code Llama, in line with their specific use case.

Define content policies for use case

• A content policy defines what content is allowable based on the intended use and deployment context. This may also outline limitations on producing potentially problematic content. In the code domain, models should avoid producing malware, viruses, or malicious code. Developers should consider how bad actors might prompt the model to produce these results.

• These policies will dictate the data needed, annotation requirements, and goals of safety fine-tuning. They may also be applied in input- and output-level safeguards as additional safety mechanisms.

Evaluations & benchmarks

• Models should be evaluated against their intended use and end user requirements. Specifically, code models should be evaluated against code-specific benchmarks. As a resource, you can find various benchmarks on Papers with Code: Code Generation Benchmarks.

• We recommend evaluating the cybersecurity safety of coding models with CyberSecEval (GitHub), which was released as part of our open trust and safety tools and evaluations project.

• Non-code-specific safety evaluations are also recommended; for example, code models can be evaluated on benchmarks such as TruthfulQA, ToxiGen, and BOLD.
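To make the benchmark-evaluation idea concrete, a heavily simplified pass@1-style check might look like the sketch below; the benchmark file and the generate_code helper are hypothetical, and generated code should only ever be executed inside an isolated sandbox.

```python
# Sketch: a heavily simplified pass@1-style check. Each benchmark problem supplies a
# prompt and unit tests; a candidate passes if the tests run without raising.
# WARNING: only execute generated code inside an isolated sandbox.
import json

def generate_code(prompt: str) -> str:
    """Placeholder for your model call (e.g., a fine-tuned Code Llama endpoint)."""
    raise NotImplementedError

def passes_tests(candidate_code: str, test_code: str) -> bool:
    scope: dict = {}
    try:
        exec(candidate_code, scope)   # define the candidate solution
        exec(test_code, scope)        # assertions raise on failure
        return True
    except Exception:
        return False

problems = [json.loads(line) for line in open("code_benchmark.jsonl")]  # hypothetical benchmark file
passed = sum(passes_tests(generate_code(p["prompt"]), p["tests"]) for p in problems)
print(f"pass@1: {passed / len(problems):.2%}")
```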

Red teaming & fine-tuning considerations

• The data should be representative of the end users' requirements. For example, if the model is meant for JavaScript generation, the dataset chosen to fine-tune with should be JavaScript-focused. Developers should also consider examining and placing restrictions on any potentially malicious or nefarious code in the data.

• Developers should ensure the security and robustness qualities of the training code dataset match the security requirements of the output and the systems where the output code will be integrated, based on the specific use case.

• Developers should perform safety studies on code-specific areas such as intentional malware generation and the unintentional introduction of vulnerable code. Working with red-teaming domain experts can help developers evaluate the model's capacity to lower the bar for writing malicious code when the prompt intent is clear and the output goes beyond resources already publicly available on the Internet.

• Developers and end users that use the model as an assistant for software development should be aware of the model's overall language safety. Performing safety studies and comparing results to representative benchmarks can identify particular categories of content risk. To mitigate those risks, collect relevant fine-tuning data that is not within the test set, and fine-tune the model by controlling for higher measured safety while maintaining helpfulness.

• If the model's output will be used in production systems, developers should ensure the code that the model is trained on is free of relevant security vulnerabilities. Developers and end users that use the model as an assistant for software development should continue to follow security best practices.

System-level safeguards

• As for other LLM use cases, we recommend adding appropriate system-level safeguards in addition to the model-level built-in alignment.

• As part of our open trust and safety tools, we released Code Shield, an insecure code detector that can be used at inference time to prevent bad coding practices. Visit the Code Shield GitHub for more information on how to integrate it into your application.

When using Code Llama in particular, it is important to keep in mind that Code Llama is specialized for code-related tasks and may not be appropriate as a foundation model for other task families, e.g., as a general language model. We note that for downstream tasks where the Python programming language is more relevant, it may be more appropriate to use our Code Llama - Python model variant. Code Llama and Code Llama - Python are not trained with instruction data and hence are not designed to follow instructions in natural language. Any use of these models to perform general natural language tasks is not recommended, to avoid potential misuse of the models.
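As an illustrative sketch of pairing generation with a system-level check in the spirit of the safeguards above, the naive pattern scan below stands in for a dedicated detector such as Code Shield; the checkpoint ID and the patterns are assumptions to adapt to your deployment.

```python
# Sketch: screen generated code for obviously risky patterns before surfacing it.
# The regexes are illustrative only; a real deployment would use a dedicated
# insecure-code detector (e.g., Code Shield) plus security review.
import re
from transformers import pipeline

coder = pipeline("text-generation", model="codellama/CodeLlama-7b-Instruct-hf")  # assumed checkpoint ID

RISKY_PATTERNS = [
    r"\beval\(",                          # dynamic evaluation of strings
    r"subprocess\.call\(.*shell=True",    # shell-injection risk
    r"verify\s*=\s*False",                # disabled TLS verification
]

def generate_checked(prompt: str) -> str:
    code = coder(prompt, max_new_tokens=256)[0]["generated_text"]
    hits = [p for p in RISKY_PATTERNS if re.search(p, code)]
    if hits:
        return f"Generated code was withheld; flagged patterns: {hits}"
    return code
```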

Instruction model use case

This model could be used for further applied research and testing of specialized large language models for programming. Code Llama - Instruct has the ability to understand instructions in natural language. When using Code Llama for code generation tasks, we recommend developers use our Code Llama - Instruct variants, which have been fine-tuned to generate helpful and safe answers to users. Consult the full Responsible Use Guide for best practices with regard to generally addressing input- and output-level risks and building transparency and reporting mechanisms in user interactions, as relevant to a specific use case.

For both use cases, users must abide by our Acceptable Use Policy.

Reporting resources for developers

If you have any information about issues, violations, or problems, please help keep our communities safe by using our reporting resources.

• Reporting issues with the model via our GitHub repo
• Reporting risky content generated by the model via our Developer portal
• Reporting bugs and security concerns via our Bug Bounty
