
Large Language Models: Cheat Sheet

Large Language Models

LLMs, or large language models, are artificial intelligence systems capable of producing text that resembles human writing by learning from extensive training data. These models find applications in tasks like language translation, chatbots, and content generation.

Some popular LLMs

Some popular LLMs include GPT-3 (Generative Pre-trained Transformer) by OpenAI, BERT (Bidirectional Encoder Representations from Transformers) by Google, and XLNet by Carnegie Mellon University and Google.

How are LLMs trained?

LLMs are trained using a process called unsupervised learning. This involves feeding the model massive amounts of text data, such as books, articles, and websites, and having the model learn the patterns and relationships between words and phrases in the text. The model is then fine-tuned on a specific task, such as language translation or text summarization.
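As an illustration of this pre-training objective, here is a minimal, self-contained sketch of next-token prediction, assuming PyTorch; the tiny LSTM model and random token IDs are stand-ins for the Transformer architectures and huge text corpora used by real LLMs.

```python
# Minimal sketch of the next-token-prediction objective used when pre-training
# language models on raw text. The tiny LSTM and random "tokens" are placeholders.
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64
embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, embed_dim, batch_first=True)
to_vocab = nn.Linear(embed_dim, vocab_size)
params = list(embed.parameters()) + list(lstm.parameters()) + list(to_vocab.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

tokens = torch.randint(0, vocab_size, (8, 33))    # stand-in for a batch of tokenized text
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one: predict the next token

hidden, _ = lstm(embed(inputs))
logits = to_vocab(hidden)                         # (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                   # "learning patterns" = minimizing this loss
optimizer.step()
```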
Choose between LLMs

When evaluating various models, it's crucial to take into account their architecture, model size, the volume of training data utilized, and their performance on specific NLP tasks.

Components of LLMs

LLMs typically consist of an encoder, a decoder, and attention mechanisms. The encoder takes in input text and converts it into a set of hidden representations, while the decoder generates the output text. The attention mechanisms help the model focus on the most relevant parts of the input text.
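To make the attention mechanism concrete, here is a minimal sketch of scaled dot-product attention, the core operation behind Transformer-style LLMs, assuming NumPy; the shapes and variable names are illustrative only.

```python
# Minimal sketch of scaled dot-product attention: each query attends to all keys,
# and the resulting weights select the most relevant values.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays of queries, keys, and values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))   # 5 tokens, 16-dimensional projections
K = rng.normal(size=(5, 16))
V = rng.normal(size=(5, 16))
print(scaled_dot_product_attention(Q, K, V).shape)     # (5, 16)
```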
Applications of LLMs

LLMs are used in a wide range of applications, including language translation, chatbots, content creation, and text summarization. They can also be used to improve search engines, voice assistants, and virtual assistants.

Fine-Tuning

Fine-tuning is the process of training a pre-trained large language model on a specific task using a smaller dataset. This allows the model to learn task-specific features and improve its performance.

The fine-tuning process typically involves freezing the weights of the pre-trained model and only training the task-specific layers.

When fine-tuning a model, it's important to consider factors such as the size of the fine-tuning dataset, the choice of optimizer and learning rate, and the choice of evaluation metrics.
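The sketch below illustrates the freeze-the-backbone, train-only-the-head recipe described above, assuming PyTorch; `PretrainedEncoder`, the layer sizes, and the two-class labels are hypothetical stand-ins for a real pre-trained model and task dataset.

```python
# Minimal sketch of "freeze the pre-trained weights, train only the task head".
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):
    """Placeholder for a pre-trained model that maps token IDs to a pooled vector."""
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
    def forward(self, ids):
        return self.encoder(self.embed(ids)).mean(dim=1)   # pooled representation

backbone = PretrainedEncoder()           # imagine this was loaded with pre-trained weights
for p in backbone.parameters():
    p.requires_grad = False              # freeze the pre-trained weights

head = nn.Linear(64, 2)                  # new task-specific layer (e.g., 2-class sentiment)
optimizer = torch.optim.AdamW(head.parameters(), lr=2e-5)   # only the head is trained

ids = torch.randint(0, 1000, (4, 12))    # a small task-specific batch
labels = torch.tensor([0, 1, 1, 0])
loss = nn.functional.cross_entropy(head(backbone(ids)), labels)
loss.backward()
optimizer.step()
```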
Example of fine-tuning LLMs

Model cost: $500 - $5,000 per month, depending on the size and complexity of the language model.

GPU size: NVIDIA GeForce RTX 3080 or higher.

Number of GPUs: 1-4, depending on the size of the language model and the desired speed of fine-tuning. For example, fine-tuning the GPT-3 model, which is one of the largest language models available, would require a minimum of 4 GPUs.

The size of the data that GPT-3 is fine-tuned on can vary greatly depending on the specific use case and the size of the model itself. GPT-3 is one of the largest language models available, with over 175 billion parameters, so it typically requires a large amount of data for fine-tuning to see a noticeable improvement in performance.

Note: fine-tuning GPT-3 on a small dataset of only a few gigabytes may not result in a significant improvement in performance, while fine-tuning on a much larger dataset of several terabytes could result in a substantial improvement. The size of the fine-tuning data will also depend on the specific NLP task the model is being fine-tuned for and the desired level of accuracy.

This is just one example, and actual costs and GPU specifications may vary depending on the language model, the data it is being fine-tuned on, and other factors. It's always best to check with the language model provider for the latest information and specific recommendations for fine-tuning.

Preprocessing

Text normalization is the process of converting text to a standard format, such as lowercasing all text, removing special characters, and converting numbers to their written form.

Tokenization is the process of breaking down text into individual units, such as words or phrases. This is an important step in preparing text data for NLP tasks.

Stop words are common words that are usually removed during text processing, as they do not carry much meaning and can introduce noise or affect the results of NLP tasks. Examples of stop words include "the," "a," "an," "in," and "is."

Stemming and lemmatization are techniques used to reduce words to their base form, which helps to reduce the dimensionality of the data and improve the performance of models. Lemmatization reduces words to their base or dictionary form by taking into account their part of speech and context; it is a more sophisticated technique than stemming and produces more accurate results, but it is computationally more expensive.
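The following is a small, self-contained sketch of these preprocessing steps chained together; the tiny stop-word list, the regex-based normalizer, and the naive suffix-stripping "stemmer" are illustrative stand-ins for what libraries such as NLTK or spaCy provide.

```python
# Toy preprocessing pipeline: normalization -> tokenization -> stop-word removal -> stemming.
import re

STOP_WORDS = {"the", "a", "an", "in", "is", "and", "of", "to"}   # illustrative subset

def normalize(text: str) -> str:
    text = text.lower()                       # lowercase everything
    return re.sub(r"[^a-z0-9\s]", " ", text)  # strip special characters

def tokenize(text: str) -> list[str]:
    return text.split()                       # whitespace tokenization into words

def remove_stop_words(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token: str) -> str:
    # Extremely naive suffix stripping, standing in for a real stemmer or lemmatizer.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

text = "The models are learning patterns in the training data!"
tokens = remove_stop_words(tokenize(normalize(text)))
print([stem(t) for t in tokens])   # ['model', 'are', 'learn', 'pattern', 'train', 'data']
```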
Input Representations

Word Embeddings:

Each token is substituted with a vector that represents its meaning in a continuous vector space. Common techniques for word embeddings include Word2Vec, GloVe, and fastText.
Subword Embeddings:

Each token is broken down into smaller subword units (e.g., characters or character n-grams), and each subword is substituted with a vector representing its meaning. This approach can handle out-of-vocabulary (OOV) words and improves the model's ability to capture morphological and semantic similarities. Popular methods for subword embeddings include Byte Pair Encoding (BPE), Unigram Language Model (ULM), and SentencePiece.
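To illustrate the core idea behind BPE, here is a compact, self-contained sketch that learns merge rules from a toy word list; real tokenizers (e.g., SentencePiece or Hugging Face tokenizers) add many refinements on top of this.

```python
# Minimal BPE sketch: repeatedly merge the most frequent pair of adjacent symbols.
from collections import Counter

def learn_bpe_merges(words, num_merges):
    vocab = Counter(tuple(word) for word in words)   # each word starts as characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)   # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])   # merge the best pair
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = learn_bpe_merges(["lower", "lowest", "newer", "newest"], num_merges=5)
print(merges)        # learned subword merges (most frequent pairs first)
print(list(vocab))   # words segmented into subword units
```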
Positional Encodings:

Since LLMs work on sequences of tokens, they require a way to encode the position of each token in the sequence. Positional encodings are vectors added to the word or subword embeddings to provide positional information for each token.
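One common choice is the fixed sinusoidal encoding from the original Transformer paper ("Attention Is All You Need"); the sketch below, assuming NumPy, shows how such encodings are computed and added to token embeddings. Learned positional embeddings are a frequent alternative.

```python
# Sinusoidal positional encodings: sine on even dimensions, cosine on odd dimensions.
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                       # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions: cosine
    return encoding

token_embeddings = np.random.randn(10, 16)                          # 10 tokens, 16-dim embeddings
inputs = token_embeddings + sinusoidal_positional_encoding(10, 16)  # add positional information
print(inputs.shape)                                                 # (10, 16)
```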
Segment Embeddings:

In certain LLMs, such as Transformer-based models like BERT, the input sequence can be split into multiple segments (e.g., sentences or paragraphs). Segment embeddings are added to the word or subword embeddings to specify the segment to which each token belongs.
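To tie the three representations together, here is a minimal sketch of how a BERT-style model sums token, positional, and segment embeddings, assuming PyTorch; the vocabulary size, sequence length, and hidden size are illustrative.

```python
# BERT-style input: token + positional + segment embeddings, summed element-wise.
import torch
import torch.nn as nn

vocab_size, max_len, num_segments, hidden = 1000, 32, 2, 64
token_embed = nn.Embedding(vocab_size, hidden)
position_embed = nn.Embedding(max_len, hidden)      # learned positional embeddings
segment_embed = nn.Embedding(num_segments, hidden)  # segment A = 0, segment B = 1

# Two "sentences" packed into one sequence, with segment IDs marking each part.
token_ids = torch.randint(0, vocab_size, (1, 10))
segment_ids = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1]])
position_ids = torch.arange(10).unsqueeze(0)

inputs = token_embed(token_ids) + position_embed(position_ids) + segment_embed(segment_ids)
print(inputs.shape)   # (1, 10, 64) -- one combined vector per token
```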
