
LARGE LANGUAGE MODELS

A REVIEW FOR THE LE AI HACKATHON

Shay Zweig, 19/4/2023


WHAT ARE WE GOING TO COVER
• What are Large Language Models
• LLM strengths and limitations
• Prompt engineering
• Context and short-term memory
• Retrieval augmented generation
• Augmenting LLMs: tools, plugins and agents
• Helpful libraries (LangChain)
WHAT ARE LANGUAGE MODELS?
Given a sequence of words – predict the next word.

The room in the hotel was ___
• Amazing – 0.4
• Disappointing – 0.3
• Clean – 0.19
• Haunted – 0.01

Language modeling has been around for a while.
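The next-word task can be made concrete with a toy model: count which word follows each word in a corpus and turn the counts into probabilities. This is a hypothetical bigram sketch, nothing like a real LLM's implementation, but the training objective is the same:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, the distribution of words that follow it."""
    following = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def next_word_probs(model, word):
    """Turn the follow-counts for `word` into a probability distribution."""
    counts = model[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

model = train_bigram("the hotel was amazing . the hotel was clean . the room was clean")
probs = next_word_probs(model, "was")
# "was" is followed by amazing (once) and clean (twice), so clean gets probability 2/3
```

A real LLM replaces the count table with a neural network conditioned on the whole preceding sequence, but it is still optimized to output exactly this kind of distribution.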
WHAT ARE LARGE LANGUAGE MODELS (LLMs)?
Large language models are huge artificial neural networks trained on the word prediction task.

• Not magic – function optimization
• Only trained to predict the next word*

How big? (GPT-3)
• 175B parameters
• Trained on much of the internet – ~500B tokens

Why does it work?
• Quality data
• It turns out word completion is a great task
• Also – we don't know

* also RLHF
WHAT IS IT GOOD FOR?
• Completion – write a rap song about Luxury Escapes
• Q&A – when was Luxury Escapes founded?
• Summarization – summarize the following document...
• Classification – given a hotel description, classify it into one or more of the following classes: [family friendly, city break...]
• Knowledge extraction – given a hotel description, extract the name, location, number of rooms...
• Sentiment analysis – what is the sentiment of the following text: "The hotel was..."
• Paraphrasing – rewrite the following text in 10 different styles
• Coding – write a Python function that takes a document and analyzes...

And much more!


LIMITATIONS – BE CAREFUL...
• Hallucinations and alignment
• Knowledge cutoff
• Consistency and predictability – how do I know I get the right result?
• Evaluation – how do I evaluate the results?
• Number of tokens
• Cost of inference
• ...
OPENAI API
Completion: text-davinci-003

    import os
    import openai

    openai.api_key = os.getenv("OPENAI_API_KEY")
    response = openai.Completion.create(model="text-davinci-003", prompt="Say this is a test")

Chat completion: gpt-4 / gpt-3.5-turbo

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
            {"role": "user", "content": "Where was it played?"},
        ],
    )

Embeddings: text-embedding-ada-002

    def get_embedding(text, model="text-embedding-ada-002"):
        # The embeddings endpoint works best without newlines
        text = text.replace("\n", " ")
        return openai.Embedding.create(input=[text], model=model)["data"][0]["embedding"]
PROMPT ENGINEERING
The prompt: our main way to control the model's behavior.

From: task-specific model training and fine-tuning
To: emergent behaviour – zero-shot and few-shot learning
PROMPT ENGINEERING
The prompt: the "programming language" of the model.

Instructions: Answer the user query given the specified hotel description. If there is no information in the description to answer the query, answer "I don't know".

Context: hotel description: Fly high in one of the world's ultimate...

Examples (few shots):
Query: Did the hotel win any prizes?
Answer: Yes, it won the Tripadvisor Travellers' Choice for 2020
Query: Does it have a gluten-free meal?
Answer: I don't know

User input: Query: Can I charge my electric car in the hotel?

Output indicator: Answer:
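These pieces (instructions, context, few-shot examples, user input, output indicator) can be assembled into a single prompt string for a completion model. A minimal sketch; the helper name and section labels are illustrative, not part of any library:

```python
def build_prompt(instructions, context, examples, query):
    """Assemble instructions, context, few-shot examples, and the user query into one prompt."""
    lines = [instructions, "", f"Hotel description: {context}", ""]
    for ex_query, ex_answer in examples:  # few-shot examples of the desired behavior
        lines.append(f"Query: {ex_query}")
        lines.append(f"Answer: {ex_answer}")
    lines.append(f"Query: {query}")
    lines.append("Answer:")  # output indicator – the model continues from here
    return "\n".join(lines)

prompt = build_prompt(
    instructions="Answer the user query given the specified hotel description. "
                 'If there is no information to answer the query, answer "I do not know".',
    context="Fly high in one of the world's ultimate...",
    examples=[("Did the hotel win any prizes?",
               "Yes, it won the Tripadvisor Travellers' Choice for 2020.")],
    query="Can I charge my electric car in the hotel?",
)
```

The resulting string would be sent as the `prompt` argument of a completion call; the trailing "Answer:" nudges the model to produce only the answer.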


CHAT PROMPTS – ROLES
Only relevant in the new chat models (GPT-3.5 / GPT-4)

system: "You are LeGPT, an expert in travel. You can answer questions in reference to provided context. You answer questions in a fun and engaging way."
user: "I want to take my family on a vacation in December, where should I go?"
assistant: "December is a great time to take a family vacation! If you're looking for a fun and festive experience, I suggest visiting one of the many Christmas markets in Europe. Germany, Austria, and Switzerland are known for their beautiful markets..."

• System prompts are used for general instructions to the model – they are more useful in GPT-4.
• User prompts are used for the user's side of the conversation.
• Assistant prompts are used for the model's responses within the conversation.
PROMPT ENGINEERING – TIPS AND TRICKS
• Tell the model its role: "As an expert in..."
• Be as explicit and elaborate as possible:
  • ...If you don't have the answer, say: I don't know
  • ...No more than 60 words, but can be less than 60 words.

• Chain-of-thought (CoT) reasoning – "Let's think step by step"


• A good reference:
https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/
• Automatic prompt generator (careful – expensive....):
https://github.com/keirp/automatic_prompt_engineer
THE TEMPERATURE PARAMETER
The temperature parameter sets the randomness level of the model.

Temperature = 0 ==> an (almost) deterministic output

Temperature = 1 ==> Increase randomness – different outputs every time, higher "creativity"
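Under the hood, temperature rescales the model's raw scores (logits) before they are converted to probabilities: dividing by a small temperature sharpens the distribution toward the top choice, dividing by a large one flattens it. A sketch of the math; the logits are made up:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; low T sharpens, high T flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate next words
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic: top word gets almost all the mass
warm = softmax_with_temperature(logits, 1.0)  # spread out: other words get sampled too
```

Sampling from `cold` returns the top word almost every time (the "Temperature = 0" behavior above), while sampling from `warm` varies between runs.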
CONTEXT – SHORT-TERM MEMORY
• LLMs are stateless – all the context needs to be passed in the prompt...
• The prompt therefore grows with every turn: "total_tokens": 263 on one request becomes "total_tokens": 488 on the next.

Problem: token* explosion
• Every model has a token limit (4K for ChatGPT / 8K for GPT-4)
• Billing is usually by the token, as is runtime

Possible solutions:
• Context window (include only the last X iterations)
• Summarization (summarize the chat to this point)

* Tokens are the atoms of the language model – each token can be one or more words, or even part of a word.
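The context-window solution can be sketched as a function that keeps the system message plus only the last X messages of a chat history. The message format matches the OpenAI chat API; the truncation helper itself is illustrative:

```python
def truncate_history(messages, max_turns):
    """Keep the system message plus only the last max_turns user/assistant messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "turn 1"},
    {"role": "assistant", "content": "reply 1"},
    {"role": "user", "content": "turn 2"},
    {"role": "assistant", "content": "reply 2"},
    {"role": "user", "content": "turn 3"},
]
trimmed = truncate_history(history, max_turns=3)
# the system message survives; only the last 3 chat messages are kept
```

The summarization solution replaces `rest[:-max_turns]` with an LLM-written summary instead of dropping it, trading one extra API call for a much shorter prompt.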
CONTEXT – FEW SHOTS
Providing the model with examples of the desired behavior will greatly improve performance.

Problem:
• Few-shot examples increase the number of tokens significantly...

Possible solutions:
• Example selection
• Fine-tuning (only available for GPT-3)
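Example selection means sending, for each query, only the few stored examples most relevant to it rather than all of them. A minimal sketch using word overlap as the similarity measure; real implementations (e.g. LangChain's example selectors) typically use embedding similarity instead:

```python
def select_examples(examples, query, k):
    """Pick the k examples whose text shares the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(example):
        return len(query_words & set(example.lower().split()))

    return sorted(examples, key=overlap, reverse=True)[:k]

examples = [
    "Query: Does the hotel have a pool? Answer: Yes, two infinity pools.",
    "Query: Is breakfast included? Answer: Yes, daily breakfast.",
    "Query: How far is the airport? Answer: 20 minutes by car.",
]
chosen = select_examples(examples, "Does the hotel have a gym?", k=1)
# only the most similar example goes into the prompt, keeping the token count down
```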
CONTEXT
Remember – Always count your tokens...
https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
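Exact counts come from tiktoken, as in the linked notebook. When you only need a ballpark before making a call, a common rule of thumb (an approximation, not an API) is roughly 4 characters per token for English text:

```python
def estimate_tokens(text):
    """Very rough token estimate using the ~4 characters per token rule of thumb
    for English text. For exact counts, use tiktoken as in the linked notebook."""
    return max(1, round(len(text) / 4))

# Useful for a quick budget check against a model's context limit
budget_ok = estimate_tokens("Say this is a test") < 4096
```

Treat this only as a sanity check; prompts with code, non-English text, or unusual tokens can deviate a lot from the heuristic.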
RETRIEVAL AUGMENTED GENERATION
• The best way to handle knowledge cutoff, hallucinations, referencing, and using internal knowledge
• Use the strengths of the generative model, but ground them in external knowledge
EMBEDDINGS
An embedding maps a text to a vector of numbers:

"Top-Rated Five-Star Maldives Paradise with Two Infinity Pools & Eight Restaurants" → [2, -79, 1, 4, -30, 26, 8, 94, -1, ...]
EMBEDDINGS
Similar texts map to nearby vectors:

"Ultimate All-Inclusive Pullman Maldives Villas with Unlimited Drinks & Roundtrip Domestic Malé Flights"
"Top-Rated Five-Star Maldives Paradise with Two Infinity Pools & Eight Restaurants" → [2, -79, 1, 4, -30, 26, 8, 94, -1, ...]
"Vibrant Five-Star Pullman Stay in the Heart of Melbourne CBD's Shopping & Dining District with Daily Breakfast"
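Closeness between embedding vectors is usually measured with cosine similarity. A self-contained sketch with toy 3-dimensional vectors (real embeddings such as text-embedding-ada-002 have 1536 dimensions; the vectors below are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

maldives_1 = [0.9, 0.1, 0.0]  # toy vectors standing in for the two Maldives deals
maldives_2 = [0.8, 0.2, 0.1]
melbourne = [0.1, 0.2, 0.9]   # and for the Melbourne deal

sim_close = cosine_similarity(maldives_1, maldives_2)  # high – similar texts
sim_far = cosine_similarity(maldives_1, melbourne)     # low – different topic
```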
RETRIEVAL AUGMENTED GENERATION
• Embeddings + vector DB + retrieval + contextual generation

Indexing: documents → embedding → embedding vectors → vector DB*
Query time: query → embedding → kNN similarity search in the vector DB → similar docs → context → LLM generation → response

* Pinecone / Chroma / FAISS
Tutorial link
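The query-time path can be sketched end to end: embed the query, find the k nearest documents, and paste them into the prompt as context. The `embed` function below is a crude stand-in for a real embedding model (such as get_embedding above), and the in-memory list stands in for a vector DB:

```python
def embed(text):
    """Stand-in for a real embedding model: a crude bag-of-letters vector."""
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def knn(query_vec, index, k):
    """Return the k documents whose vectors have the highest dot product with the query."""
    def score(item):
        return sum(q * d for q, d in zip(query_vec, item["vector"]))
    return [item["doc"] for item in sorted(index, key=score, reverse=True)[:k]]

# Indexing: embed each document once and store the vectors
docs = ["The hotel has two infinity pools.", "Breakfast is served daily."]
index = [{"doc": d, "vector": embed(d)} for d in docs]

# Query time: embed the query, retrieve similar docs, build a grounded prompt
similar = knn(embed("Does the hotel have a pool?"), index, k=1)
prompt = (
    "Answer the query using only the context below.\n"
    f"Context: {' '.join(similar)}\n"
    "Query: Does the hotel have a pool?\nAnswer:"
)
# `prompt` would then be sent to the LLM for generation
```

A real system swaps in proper embeddings and a vector DB (Pinecone / Chroma / FAISS), but the shape of the pipeline is exactly this.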
RETRIEVAL AUGMENTED GENERATION
• Use cases:
  • Search
  • Recommendation
  • In-context QA
  • ...
RETRIEVAL AUGMENTED GENERATION
• Pros:
  • Reduces hallucinations dramatically!
  • Augments the LLM with new and external knowledge (like organizational knowledge)
  • Can reduce cost (embeddings are cheap and computed once)
  • Leverages the LLM's strength for generation
  • Allows referencing

• Cons:
  • Complexity:
    • Preprocessing the data
    • Vector DB ops
  • Cost: vector DB costs
AUGMENTING LLMS – TOOLS
• External APIs meant to augment LLMs, such as:
  • Search
  • Calculator
  • DB query
  • Bash / Python interpreter
  • Other AI models (HuggingGPT)
  • Humans!
  • ...

• Each tool should have a good description of its function and expected input.
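In practice a tool is just a function paired with the description the LLM reads: the model names a tool and an input, and your code dispatches the call. A minimal, hypothetical registry (the calculator is a toy, not a library API):

```python
TOOLS = {
    "calculator": {
        "description": "Evaluates a basic arithmetic expression, e.g. '2 + 2 * 3'.",
        # Toy only – never eval untrusted input in a real system
        "run": lambda expr: str(eval(expr, {"__builtins__": {}})),
    },
    "search": {
        "description": "Searches the web for a query and returns a snippet.",
        # Stub – a real tool would call a search API here
        "run": lambda query: f"(search results for: {query})",
    },
}

def tool_descriptions():
    """Rendered into the prompt so the model knows what it can call and how."""
    return "\n".join(f"{name}: {tool['description']}" for name, tool in TOOLS.items())

def dispatch(tool_name, tool_input):
    """Run the tool the model asked for and return its output as text."""
    return TOOLS[tool_name]["run"](tool_input)
```

This is why the description matters: it is the only interface documentation the model ever sees.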


AUGMENTING LLMS – PLUGINS
• Exposing external APIs to OpenAI's ChatGPT
• You can think of it as an app store for the new chat UI
LLM AGENTS
• LLMs that can plan, use tools and self-improve:
  • Plan a strategy to reach a goal
  • Perform tasks (using external tools or internal subtasks)
  • Take observations from tasks
  • Self-reflect and improve

Be careful! Use of agents can get expensive...

AutoGPT, Demo, LangChain Agents
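The plan → act → observe loop can be sketched without any framework. The LLM is faked here with a canned function so the sketch is self-contained and the "Action: tool[input]" reply format is an assumption of this sketch; a real agent would call the chat API and parse its reply the same way:

```python
def fake_llm(prompt):
    """Stand-in for a real LLM call, replying in a fixed 'Action: tool[input]' format."""
    if "Observation:" not in prompt:
        return "Action: calculator[17 * 23]"
    return "Final Answer: 17 * 23 = 391"

def run_agent(goal, tools, llm, max_steps=5):
    """Loop: ask the LLM what to do, run the tool it names, feed the observation back."""
    prompt = f"Goal: {goal}\n"
    for _ in range(max_steps):
        reply = llm(prompt)
        if reply.startswith("Final Answer:"):
            return reply
        # Parse "Action: tool[input]", run the tool, append the observation
        tool_name, tool_input = reply[len("Action: "):].rstrip("]").split("[", 1)
        observation = tools[tool_name](tool_input)
        prompt += f"{reply}\nObservation: {observation}\n"
    return "Gave up"

tools = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}
answer = run_agent("What is 17 * 23?", tools, fake_llm)
```

The cost warning above comes from exactly this loop: every step is another LLM call, and a confused agent can burn through many steps before reaching a final answer.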


LANGCHAIN – A LIBRARY FOR LLM APP DEV
A lot of wrappers and functionality – makes life very easy!
• Multiple LLMs
• Prompt templates and selectors
• Chains (composing different LLM components together)
• Memory (multiple types)
• Agents
• Indexes (vector DB wrappers)
• Loaders (easily load text data)
Tutorial link

Should be the go-to lib for the AI hackathon.


NOT ONLY OPEN AI
• AI21Labs (Task specific APIs)
• CoHere (multilingual embeddings!)
• HuggingFace – many open-source models
• Anthropic (closed beta)
REFERENCES
• Great post about production LLMs: link
• Prompt engineering tricks tl;dr: link
• Retrieval augmented generation tutorial: link
• LangChain crash course: link
• Data preprocessing for link
