
LARGE LANGUAGE MODELS

A REVIEW FOR THE LE AI HACKATHON

Shay Zweig, 19/4/2023


WHAT ARE WE GOING TO COVER
• What are Large Language Models
• LLM strengths and limitations
• Prompt engineering
• Context and short-term memory
• Retrieval augmented generation
• Augmenting LLMs: tools, plugins and agents
• Helpful libraries (LangChain)
WHAT ARE LANGUAGE MODELS?
Given a sequence of words – predict the next word.

The room in the hotel was ___
• Amazing – 0.4
• Disappointing – 0.3
• Clean – 0.19
• Haunted – 0.01

Language modeling has been around for a while.
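The next-word task can be made concrete with a toy model: count which word follows each word in a corpus and turn the counts into probabilities. This is a hypothetical bigram sketch, nothing like a real LLM's implementation, but the training objective is the same:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, the distribution of words that follow it."""
    following = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def next_word_probs(model, word):
    """Turn the follow-counts for `word` into a probability distribution."""
    counts = model[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

model = train_bigram("the hotel was amazing . the hotel was clean . the room was clean")
probs = next_word_probs(model, "was")
# "was" is followed by amazing (once) and clean (twice), so clean gets probability 2/3
```

A real LLM replaces the count table with a neural network conditioned on the whole preceding sequence, but it is still optimized to output exactly this kind of distribution.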
WHAT ARE LARGE LANGUAGE MODELS (LLMs)?
Large language models are huge artificial neural networks trained on the word prediction task.

• Not magic – function optimization
• Only trained to predict the next word*

How big? (GPT-3)
• 175B parameters
• Trained on much of the internet – ~500B tokens

Why does it work?
• Quality data
• It turns out word completion is a great task
• Also – we don't know

* also RLHF
WHAT IS IT GOOD FOR?
• Completion – write a rap song about Luxury Escapes
• Q&A – when was Luxury Escapes founded?
• Summarization – summarize the following document...
• Classification – given a hotel description, classify it into one or more of the following classes: [family friendly, city break...]
• Knowledge extraction – given a hotel description, extract the name, location, number of rooms...
• Sentiment analysis – what is the sentiment of the following text: "The hotel was..."
• Paraphrasing – rewrite the following text in 10 different styles
• Coding – write a Python function that takes a document and analyzes...

And much more!


LIMITATIONS – BE CAREFUL...
• Hallucinations and alignment
• Knowledge cutoff
• Consistency and predictability – how do I know I get the right result?
• Evaluation – how do I evaluate the results?
• Number of tokens
• Cost of inference
• ...
OPENAI API
Completion: text-davinci-003

    import os
    import openai

    openai.api_key = os.getenv("OPENAI_API_KEY")
    response = openai.Completion.create(model="text-davinci-003", prompt="Say this is a test")

Chat completion: gpt-4 / gpt-3.5-turbo

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"},
            {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
            {"role": "user", "content": "Where was it played?"},
        ],
    )

Embeddings: text-embedding-ada-002

    def get_embedding(text, model="text-embedding-ada-002"):
        # The embeddings endpoint works best without newlines
        text = text.replace("\n", " ")
        return openai.Embedding.create(input=[text], model=model)["data"][0]["embedding"]
PROMPT ENGINEERING
The prompt: our main way to control the model's behavior.

From: task-specific model training and fine-tuning
To: emergent behaviour – zero-shot and few-shot learning
PROMPT ENGINEERING
The prompt: the "programming language" of the model.

Instructions: Answer the user query given the specified hotel description. If there is no information in the description to answer the query, answer "I don't know".

Context: hotel description: Fly high in one of the world's ultimate...

Examples (few shots):
Query: Did the hotel win any prizes?
Answer: Yes, it won the Tripadvisor Travellers' Choice for 2020
Query: Does it have a gluten-free meal?
Answer: I don't know

User input: Query: Can I charge my electric car in the hotel?

Output indicator: Answer:
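These pieces (instructions, context, few-shot examples, user input, output indicator) can be assembled into a single prompt string for a completion model. A minimal sketch; the helper name and section labels are illustrative, not part of any library:

```python
def build_prompt(instructions, context, examples, query):
    """Assemble instructions, context, few-shot examples, and the user query into one prompt."""
    lines = [instructions, "", f"Hotel description: {context}", ""]
    for ex_query, ex_answer in examples:  # few-shot examples of the desired behavior
        lines.append(f"Query: {ex_query}")
        lines.append(f"Answer: {ex_answer}")
    lines.append(f"Query: {query}")
    lines.append("Answer:")  # output indicator – the model continues from here
    return "\n".join(lines)

prompt = build_prompt(
    instructions="Answer the user query given the specified hotel description. "
                 'If there is no information to answer the query, answer "I do not know".',
    context="Fly high in one of the world's ultimate...",
    examples=[("Did the hotel win any prizes?",
               "Yes, it won the Tripadvisor Travellers' Choice for 2020.")],
    query="Can I charge my electric car in the hotel?",
)
```

The resulting string would be sent as the `prompt` argument of a completion call; the trailing "Answer:" nudges the model to produce only the answer.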


CHAT PROMPTS – ROLES
Only relevant in the new chat models (GPT-3.5 / GPT-4)

system: "You are LeGPT, an expert in travel. You can answer questions in reference to provided context. You answer questions in a fun and engaging way."
user: "I want to take my family on a vacation in December, where should I go?"
assistant: "December is a great time to take a family vacation! If you're looking for a fun and festive experience, I suggest visiting one of the many Christmas markets in Europe. Germany, Austria, and Switzerland are known for their beautiful markets..."

• System prompts are used for general instructions to the model – they are more useful in GPT-4.
• User prompts are used for the user's side of the conversation.
• Assistant prompts are used for the model's responses within the conversation.
PROMPT ENGINEERING – TIPS AND TRICKS
• Tell the model its role: "As an expert in..."
• Be as explicit and elaborate as possible:
  • ...If you don't have the answer, say: I don't know
  • ...No more than 60 words, but can be less than 60 words.

• Chain-of-thought (CoT) reasoning – "Let's think step by step"


• A good reference:
https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/
• Automatic prompt generator (careful – expensive....):
https://github.com/keirp/automatic_prompt_engineer
THE TEMPERATURE PARAMETER
The temperature parameter sets the randomness level of the model.

Temperature = 0 ==> an (almost) deterministic output

Temperature = 1 ==> Increase randomness – different outputs every time, higher "creativity"
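Under the hood, temperature rescales the model's raw scores (logits) before they are converted to probabilities: dividing by a small temperature sharpens the distribution toward the top choice, dividing by a large one flattens it. A sketch of the math; the logits are made up:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; low T sharpens, high T flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate next words
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic: top word gets almost all the mass
warm = softmax_with_temperature(logits, 1.0)  # spread out: other words get sampled too
```

Sampling from `cold` returns the top word almost every time (the "Temperature = 0" behavior above), while sampling from `warm` varies between runs.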
CONTEXT – SHORT-TERM MEMORY
• LLMs are stateless – all the context needs to be passed in the prompt...
• The prompt therefore grows with every turn: "total_tokens": 263 on one request becomes "total_tokens": 488 on the next.

Problem: token* explosion
• Every model has a token limit (4K for ChatGPT / 8K for GPT-4)
• Billing is usually by the token, as is runtime

Possible solutions:
• Context window (include only the last X iterations)
• Summarization (summarize the chat to this point)

* Tokens are the atoms of the language model – each token can be one or more words, or even part of a word.
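The context-window solution can be sketched as a function that keeps the system message plus only the last X messages of a chat history. The message format matches the OpenAI chat API; the truncation helper itself is illustrative:

```python
def truncate_history(messages, max_turns):
    """Keep the system message plus only the last max_turns user/assistant messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "turn 1"},
    {"role": "assistant", "content": "reply 1"},
    {"role": "user", "content": "turn 2"},
    {"role": "assistant", "content": "reply 2"},
    {"role": "user", "content": "turn 3"},
]
trimmed = truncate_history(history, max_turns=3)
# the system message survives; only the last 3 chat messages are kept
```

The summarization solution replaces `rest[:-max_turns]` with an LLM-written summary instead of dropping it, trading one extra API call for a much shorter prompt.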
CONTEXT – FEW SHOTS
Providing the model with examples of the desired behavior will greatly improve performance.

Problem:
• Few-shot examples increase the number of tokens significantly...

Possible solutions:
• Example selection
• Fine-tuning (only available for GPT-3)
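Example selection means sending, for each query, only the few stored examples most relevant to it rather than all of them. A minimal sketch using word overlap as the similarity measure; real implementations (e.g. LangChain's example selectors) typically use embedding similarity instead:

```python
def select_examples(examples, query, k):
    """Pick the k examples whose text shares the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(example):
        return len(query_words & set(example.lower().split()))

    return sorted(examples, key=overlap, reverse=True)[:k]

examples = [
    "Query: Does the hotel have a pool? Answer: Yes, two infinity pools.",
    "Query: Is breakfast included? Answer: Yes, daily breakfast.",
    "Query: How far is the airport? Answer: 20 minutes by car.",
]
chosen = select_examples(examples, "Does the hotel have a gym?", k=1)
# only the most similar example goes into the prompt, keeping the token count down
```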
CONTEXT
Remember – Always count your tokens...
https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
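Exact counts come from tiktoken, as in the linked notebook. When you only need a ballpark before making a call, a common rule of thumb (an approximation, not an API) is roughly 4 characters per token for English text:

```python
def estimate_tokens(text):
    """Very rough token estimate using the ~4 characters per token rule of thumb
    for English text. For exact counts, use tiktoken as in the linked notebook."""
    return max(1, round(len(text) / 4))

# Useful for a quick budget check against a model's context limit
budget_ok = estimate_tokens("Say this is a test") < 4096
```

Treat this only as a sanity check; prompts with code, non-English text, or unusual tokens can deviate a lot from the heuristic.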
RETRIEVAL AUGMENTED GENERATION
• The best way to handle knowledge cutoff, hallucinations, referencing, and using internal knowledge
• Use the strengths of the generative model, but ground them in external knowledge
EMBEDDINGS
An embedding maps a text to a vector of numbers:

"Top-Rated Five-Star Maldives Paradise with Two Infinity Pools & Eight Restaurants" → [2, -79, 1, 4, -30, 26, 8, 94, -1, ...]
EMBEDDINGS
Similar texts map to nearby vectors:

"Ultimate All-Inclusive Pullman Maldives Villas with Unlimited Drinks & Roundtrip Domestic Malé Flights"
"Top-Rated Five-Star Maldives Paradise with Two Infinity Pools & Eight Restaurants" → [2, -79, 1, 4, -30, 26, 8, 94, -1, ...]
"Vibrant Five-Star Pullman Stay in the Heart of Melbourne CBD's Shopping & Dining District with Daily Breakfast"
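Closeness between embedding vectors is usually measured with cosine similarity. A self-contained sketch with toy 3-dimensional vectors (real embeddings such as text-embedding-ada-002 have 1536 dimensions; the vectors below are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

maldives_1 = [0.9, 0.1, 0.0]  # toy vectors standing in for the two Maldives deals
maldives_2 = [0.8, 0.2, 0.1]
melbourne = [0.1, 0.2, 0.9]   # and for the Melbourne deal

sim_close = cosine_similarity(maldives_1, maldives_2)  # high – similar texts
sim_far = cosine_similarity(maldives_1, melbourne)     # low – different topic
```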
RETRIEVAL AUGMENTED GENERATION
• Embeddings + vector DB + retrieval + contextual generation

Indexing: documents → embedding → embedding vectors → vector DB*
Query time: query → embedding → kNN similarity search in the vector DB → similar docs → context → LLM generation → response

* Pinecone / Chroma / FAISS
Tutorial link
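The query-time path can be sketched end to end: embed the query, find the k nearest documents, and paste them into the prompt as context. The `embed` function below is a crude stand-in for a real embedding model (such as get_embedding above), and the in-memory list stands in for a vector DB:

```python
def embed(text):
    """Stand-in for a real embedding model: a crude bag-of-letters vector."""
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def knn(query_vec, index, k):
    """Return the k documents whose vectors have the highest dot product with the query."""
    def score(item):
        return sum(q * d for q, d in zip(query_vec, item["vector"]))
    return [item["doc"] for item in sorted(index, key=score, reverse=True)[:k]]

# Indexing: embed each document once and store the vectors
docs = ["The hotel has two infinity pools.", "Breakfast is served daily."]
index = [{"doc": d, "vector": embed(d)} for d in docs]

# Query time: embed the query, retrieve similar docs, build a grounded prompt
similar = knn(embed("Does the hotel have a pool?"), index, k=1)
prompt = (
    "Answer the query using only the context below.\n"
    f"Context: {' '.join(similar)}\n"
    "Query: Does the hotel have a pool?\nAnswer:"
)
# `prompt` would then be sent to the LLM for generation
```

A real system swaps in proper embeddings and a vector DB (Pinecone / Chroma / FAISS), but the shape of the pipeline is exactly this.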
RETRIEVAL AUGMENTED GENERATION
• Use cases:
  • Search
  • Recommendation
  • In-context QA
  • ...
RETRIEVAL AUGMENTED GENERATION
• Pros:
  • Reduces hallucinations dramatically!
  • Augments the LLM with new and external knowledge (like organizational knowledge)
  • Can reduce cost (embeddings are cheap and computed once)
  • Leverages the LLM's strength for generation
  • Allows referencing

• Cons:
  • Complexity:
    • Preprocessing the data
    • Vector DB ops
  • Cost: vector DB costs
AUGMENTING LLMS – TOOLS
• External APIs meant to augment LLMs, such as:
  • Search
  • Calculator
  • DB query
  • Bash / Python interpreter
  • Other AI models (HuggingGPT)
  • Humans!
  • ...

• Each tool should have a good description of its function and expected input.
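In practice a tool is just a function paired with the description the LLM reads: the model names a tool and an input, and your code dispatches the call. A minimal, hypothetical registry (the calculator is a toy, not a library API):

```python
TOOLS = {
    "calculator": {
        "description": "Evaluates a basic arithmetic expression, e.g. '2 + 2 * 3'.",
        # Toy only – never eval untrusted input in a real system
        "run": lambda expr: str(eval(expr, {"__builtins__": {}})),
    },
    "search": {
        "description": "Searches the web for a query and returns a snippet.",
        # Stub – a real tool would call a search API here
        "run": lambda query: f"(search results for: {query})",
    },
}

def tool_descriptions():
    """Rendered into the prompt so the model knows what it can call and how."""
    return "\n".join(f"{name}: {tool['description']}" for name, tool in TOOLS.items())

def dispatch(tool_name, tool_input):
    """Run the tool the model asked for and return its output as text."""
    return TOOLS[tool_name]["run"](tool_input)
```

This is why the description matters: it is the only interface documentation the model ever sees.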


AUGMENTING LLMS – PLUGINS
• Exposing external APIs to OpenAI's ChatGPT
• You can think of it as an app store for the new chat UI
LLM AGENTS
• LLMs that can plan, use tools and self-improve:
  • Plan a strategy to reach a goal
  • Perform tasks (using external tools or internal subtasks)
  • Take observations from tasks
  • Self-reflect and improve

Be careful! Use of agents can get expensive...

AutoGPT, Demo, LangChain Agents
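The plan → act → observe loop can be sketched without any framework. The LLM is faked here with a canned function so the sketch is self-contained and the "Action: tool[input]" reply format is an assumption of this sketch; a real agent would call the chat API and parse its reply the same way:

```python
def fake_llm(prompt):
    """Stand-in for a real LLM call, replying in a fixed 'Action: tool[input]' format."""
    if "Observation:" not in prompt:
        return "Action: calculator[17 * 23]"
    return "Final Answer: 17 * 23 = 391"

def run_agent(goal, tools, llm, max_steps=5):
    """Loop: ask the LLM what to do, run the tool it names, feed the observation back."""
    prompt = f"Goal: {goal}\n"
    for _ in range(max_steps):
        reply = llm(prompt)
        if reply.startswith("Final Answer:"):
            return reply
        # Parse "Action: tool[input]", run the tool, append the observation
        tool_name, tool_input = reply[len("Action: "):].rstrip("]").split("[", 1)
        observation = tools[tool_name](tool_input)
        prompt += f"{reply}\nObservation: {observation}\n"
    return "Gave up"

tools = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}
answer = run_agent("What is 17 * 23?", tools, fake_llm)
```

The cost warning above comes from exactly this loop: every step is another LLM call, and a confused agent can burn through many steps before reaching a final answer.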


LANGCHAIN – A LIBRARY FOR LLM APP DEV
A lot of wrappers and functionality – makes life very easy!
• Multiple LLMs
• Prompt templates and selectors
• Chains (composing different LLM components together)
• Memory (multiple types)
• Agents
• Indexes (vector DB wrappers)
• Loaders (easily load text data)
Tutorial link

Should be the go-to lib for the AI hackathon.


NOT ONLY OPEN AI
• AI21Labs (Task specific APIs)
• CoHere (multilingual embeddings!)
• HuggingFace – many open-source models
• Anthropic (closed beta)
REFERENCES
• Great post about production LLMs: link
• Prompt engineering tricks tl;dr: link
• Retrieval augmented generation tutorial: link
• LangChain crash course: link
• Data preprocessing for link
