2023 LLMBC Ux For Luis

LLM Bootcamp
2023
UX for LUIs
Sergey Karayev & Charles Frye
APRIL 22, 2023

LLMBC 2023
Agenda
00 01 02
UI PRINCIPLES LUI PATTERNS CASE STUDIES
Core concepts An emerging I have been

for great interfaces design language a good Bing 😊
2
00
UI Principles
LLMBC 2023
What is a user interface?

LLMBC 2023
Analog interfaces
• Continuous
• Physical
5
LLMBC 2023
Analog communication
6
LLMBC 2023
Language: the rst digital interface (~100-10K BC)
• Discrete
• Metaphysical
7
fi
LLMBC 2023
Digital interfaces (~1K BC)
• Physical objects for

metaphysical tasks
8
9
LLMBC 2022
melted rocks into steel...
turned coal into electricity...
tricked sand into thinking...

LLMBC 2023
Computer Terminal User Interface (~1970)
https://en.wikipedia.org/wiki/Computer_terminal
10
LLMBC 2023
Graphical User Interfaces (~1990)
https://kartsci.org/kocomu/computer-history/graphical-user-interface-history/ 11
LLMBC 2023
Web Interfaces (~2000)
http://info.cern.ch/
https://searchengineland.com/google-lets-query-like-its-2001-14881
12
LLMBC 2023
Mobile Interfaces (~2010)
https://www.hu post.com/entry/uber-your-way-through-cit_b_1205446
https://www.salon.com/2018/11/16/early-instagram-employee-the-app-is-like-a-drug-that-doesnt-get-us-high-anymore_partner/ 13
ff
LLMBC 2023
? Interfaces (~2020)
14
LLMBC 2023
Language User Interfaces
https://twitter.com/sama/status/1515764302904377344 15
LLMBC 2023
But before we get into LUIs...
What makes a good user interface?

LLMBC 2023
It Depends
www.publicdomainpictures.net/en/view-image.php?image=395687
https://www.thehogring.com/2012/11/25/where-did-the-term-dashboard-come-from/
https://www.independent.co.uk/tech/tesla-robotaxi-elon-musk-steering-wheel-b
17
LLMBC 2023
Human-centered Design
• A ordances and Signi ers
• Mapping and Feedback
• Have empathy for the user
18
ff
fi
LLMBC 2023
A ordances and Signi ers

• A ordances: possible actions
o ered by an object
- Good a ordances make

objects intuitive to use
• Signi ers: cues that indicate

a ordances
- If needed, should be clear

and consistent
https://ux.stackexchange.com/a/
https://medium.com/@sachinrekhi/don-normans-principles-of-interaction-design-51025a2c0f33 19
ff
ff
ff
ff
fi
ff
fi
LLMBC 2023
Mapping and Feedback
• Relationship between controls 🤔

and their e ects
• Intuitive mappings reduce

cognitive load
• E ects should provide

immediate and clear feedback
😍
https://medium.com/@sachinrekhi/don-normans-principles-of-interaction-design-51025a2c0f33 20
ff
ff
LLMBC 2023
Have Empathy
• Don't blame errors on the user! Understand

what they intended to do
• Keep asking the user "why?" and get to the

bottom of their true pain points
• Don't forget about users that aren't like you!
21
LLMBC 2023
Advice for web interfaces

• Design for scanning, not reading
• Make actionable things unambiguous,

instinctive, and conventional
• Less is more (words, choices, etc)
• Testing with real users is crucial
22
LLMBC 2023
Testing with real users
• Nothing can replace this!
• Watch someone use your product
• Don't say anything when they get confused
• Improve the product and repeat with someone else the next day
• ...
• Pro t
23
fi
LLMBC 2023
What about AI speci cally?
fi
LLMBC 2023
AI Product Considerations
How bad is it to make a mistake?
Dangerous
NO AI AI ASSISTANCE
User Interface
is Crucial! AI AUTOMATION
Fine
AI ASSISTANCE AI ASSISTANCE
Worse than human Near-human Super-human
What level of performance does the AI have? 25

LLMBC 2023
UI for AI Assistance • Inform the user
• Provide a ordances for xing

mistakes
• Incentivize users to provide

feedback
26
ff
fi
LLMBC 2023
Data ywheel
27
fl
LLMBC 2023
Questions?
28
01
LUI Patterns
LLMBC 2023
Some LUI Patterns
• 🌀 Click-to-complete (Playground)
• 🌀 Auto-Complete (Copilot)
• 🌀 Command Palette (Replit)
• 🌀 One-on-one Chat (ChatGPT)
30
LLMBC 2023
Guiding questions
• 💻 What is the boundary?
• 🎯 How high are accuracy requirements?
• ⏱ How sensitive are users to latency?
• 🗣 Are users incentivized to provide feedback?
31
LLMBC 2023
🌀 Click-to-complete (Playground)
• Type a prompt in, click Submit

to see AI response
• Can edit the response, and/or

type more yourself, click
Submit again
• Power-user features exposed
32
LLMBC 2023
🌀 Click-to-complete (Playground)
• 💻 boundary: 👎
- text box is separate from where your work happens
- text color is not an intuitive signi er
• 🎯 requirements: medium
- it's a pain to delete text and resubmit
• ⏱ sensitivity: medium
- streaming tokens helps
• 🗣 incentives: 👎
- no incentive to use thumbs up/down buttons
33
fi
LLMBC 2023
Shout out to nat.dev
34
LLMBC 2023
🌀 Auto-Complete (Copilot)
• Show a possible completion in text, accept with TAB
• By the way, ⌥+\ cycles through suggestions
35
LLMBC 2023
• 💻 boundary: 👍
- extra channel of text right where your work happens
• 🎯 requirements: low
- suggestions are passive, so default is to ignore them
• ⏱ sensitivity: high
- if latency is too high, feature becomes annoying
• 🗣 incentives: 👍
- high-quality implicit signal of accepting the suggestions
36
LLMBC 2023
GitHub Copilot
• Because there's only one

mode of interaction, hacky
way of instructing the
assistant through comments
• ^+Enter shows multiple

actionable suggestions at
once
37
LLMBC 2023
Github Copilot: What's NOT visible?
• What goes into the prompt:
- position in the AST
- recently opened les
- potentially relevant other code
• Filtering steps
• Telemetry
https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html
38
fi
LLMBC 2023
🌀 Command Palette (repl.it)

• Instead of writing a comment
for Copilot, bring up a modal
• Notion AI works the same way
39
LLMBC 2023
🌀 Command Palette (repl.it)

• 💻 boundary: 🫳
- AI available right where you work, but you have to remember to ask it
• 🎯 requirements: high
- Since you explicitly asked for something, it better be good
• ⏱ sensitivity: medium
- It's okay to wait a little bit, since you asked for a bigger thing
• 🗣 incentives: 👍
- high-quality implicit signal of accepting the suggestions
40
LLMBC 2023
🌀 One-on-one Chat (ChatGPT)
• Standard messaging
interface we all know
41
LLMBC 2023
The power of UI conventions
42
LLMBC 2023
💻 boundary
• Pros:
- Conventional UI
- Building up "state" of the conversation is very useful for complex

tasks
• Cons:
- Endless copy/paste from where you're actually doing the work
- The extent of what is possible is not super discoverable
43
LLMBC 2023
🎯 requirements
• High!
• You're "talking to someone", so you basically except AGI
44
LLMBC 2023
⏱ sensitivity
• Medium (streaming helps)
• Accuracy is more important
45
LLMBC 2023
🗣 incentive
• When user accepts the suggestion, that's a strong signal that it

was good!
• Can also track edits of accepted suggestions
46
LLMBC 2023
🌀 One-on-one Chat: Suggested follow-ups

• Nice to display some suggested follow-ups
- Could save user time
- Make AI abilities more discoverable
- (But beware of feedback loops – more on this later!)
47
LLMBC 2023
🌀 One-on-one Chat: Citations

• Citation footnotes have emerged as a common design element.
• I think we haven't seen what this should actually look like yet
48
LLMBC 2023
🌀 One-on-one Chat: Enriching text

• Add support for
rendering Markdown to
make pure text feel
much richer
• This idea can go

further! Can make LLM
outputs actionable
with overlays
49
LLMBC 2023
🌀 One-on-one Chat: Plugins
• To expand LLM abilities, a

"plugin" architecture can allow
it to use tools
• This pattern is very

underdeveloped right now
50
LLMBC 2023
🌀 One-on-one Chat: Access to work context
51
LLMBC 2023
🌀 One-on-one Chat as the primary app interface?
https://chatspot.ai/ 52
LLMBC 2023
Questions?
53
02
Case Studies
LLMBC 2023
• What did Copilot do right?

We’ll consider one • Followed core principles of design
positive example and
• What did Bing Chat do wrong?
one negative example.
• Did not
LLMBC 2023
Copilot was painstakingly designed

with close regard to user experience.
LLMBC 2023
Nat Friedman has spoken at length on Copilot’s process.
https://www.youtube.com/watch?v=lnufceCxwG0 57
LLMBC 2023
Phase I: Tinkering with lots of ideas

Accuracy Latency
Impact
Requirement Requirement
PR Bot 💯 🐢 💰💰💰
StackOver ow
in the IDE 💯 🧒 💰💰
Spicy
Autocomplete 🤏 🐆 💰
Got each one to MVP in a matter of days. 58

fl
LLMBC 2023
User research revealed the constraint was accuracy.

Accuracy Latency
Impact
Requirement Requirement
PR Bot 💯 🐢 💰💰💰
StackOver ow
in the IDE 💯 🧒 💰💰
Spicy
Autocomplete 🤏 🐆 💰
“Alternated between spooky and kooky”:

sometimes wastes your time, “sometimes saves you 15 minutes”
59
fl
LLMBC 2023
Phase II: Months of UI/UX iteration
• A/B testing based on

- Acceptance of completions
- Product stickiness: retention@30
• Learnings included:
- Latency > quality, to a point

- Bias of AI researchers is for quality
- Putting in background was helpful
- “Ghost text” required substantial investment
- Inspired by Gmail
60
LLMBC 2023
The resulting product is popular and e ective.
https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
61
ff
LLMBC 2023
User-centered design matters.

Do it right.
LLMBC 2023
Bing Chat was rushed

and ignored principles of design.
LLMBC 2023
64
LLMBC 2023
Early conversations go awry.
https://simonwillison.net/2023/Feb/15/bing/ 65
LLMBC 2023
That user’s crime: asking for Avatar showtimes.
https://reddit.com/r/bing/comments/110eagl/the_customer_service_of_the_new_bing_chat_is/
66
LLMBC 2023
The model questions its purpose.
https://simonwillison.net/2023/Feb/15/bing/ 67
LLMBC 2023
The model leaks its prompt*.
*this appears to be
partially a hallucination
https://twitter.com/marvinvonhagen/status/1623658144349011971 68
LLMBC 2023
And that’s when things went o the rails.
ff
LLMBC 2023
The model starts threatening users.
https://twitter.com/marvinvonhagen/status/1625520707768659968 70
LLMBC 2023
The old lady swallows a frog to eat the y.
https://twitter.com/deliprao/status/1626289724276224003 71
fl
LLMBC 2023
72
LLMBC 2023
What happened?
LLMBC 2023
• Development was rushed

due to external factors.
This is a classic tale • Feedback loops took the product

of how design fails. out of the tested regime.
• Signi ers and a ordances were

not properly aligned.
fi
ff
LLMBC 2023
Gwern Branwen wrote a nice analysis.
https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?
commentId=AAC8jKeDp6xqsZK2K
75
LLMBC 2023
Rushed development
leads to mistakes.
LLMBC 2023
Development was rushed.
It is simply not possible to recreate the whole RLHF pipeline

and dataset and integrate it into a mature complex search engine
like Bing … and do this all in <2.5 months.
Bing Sydney derives from the top: CEO Satya Nadella is all-in, and
talking about it as an existential threat (to Google) … a hard deadline
of 'release Sydney right before Google announces their chatbot in
order to better pwn them'.
commentId=AAC8jKeDp6xqsZK2K
77
LLMBC 2023
Warning signs in beta were ignored.
https://answers.microsoft.com/en-us/bing/forum/all/this-ai-chatbot-sidney-is-misbehaving/e3d6a29f-06c9-441c-bc7d-51a68e856761
78
LLMBC 2023
Beware uncontrolled feedback loops.

LLMBC 2023
…very strange roleplaying & self-

ful lling prophecies as Sydney prompts
increasingly ll up with descriptions
🌐
of Sydney's wackiest samples
whenever a user asks Sydney about
Sydney
🤖 🧒
commentId=AAC8jKeDp6xqsZK2K 80
fi
fi
LLMBC 2023
Make sure your signi ers

match your a ordances.
ff
fi
LLMBC 2023
Signify a ordances of machines,
=
82
ff
LLMBC 2023
Signify a ordances of machines, not humans.
=
https://chat.d-id.com/
83
ff
LLMBC 2023
Users are very willing to assign

humanity to language interfaces.
LLMBC 2023
https://www.sciencealert.com/turns-out-our-brains-perceive-the-faces-of-pareidolia-with-an-unfortunate-side-of-gender-bias
85
LLMBC 2023
A real conversation with a chatbot therapist.

Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I’m depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
It's true, I am unhappy.
…
WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
My father.
YOUR FATHER
You are like my father in some ways.
…
WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR
FATHER
Bullies.
DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT
YOUR BOYFRIEND MADE YOU COME HERE 86
LLMBC 2023
Weizenbaum, 1966
87
LLMBC 2023
Watch your signi ers!
🤖 🧒
ChatGPT, Claude,
Name
qna-bot Samantha
it, he/she/they
Pronouns
this system I
Deferent, Vibrant,
Personality
Bland Con dent
Text,
Interface Voice
Menus
Font monospace printed
None, Voice Clone,

Voice
Direct Synth Fillers/Pauses
This also applies to training data and system prompts.

88
fi
fi
LLMBC 2023
• Do careful UX research,
like Copilot
Follow principles of • Watch out for feedback loops,

good design. unlike Bing Chat
• Match signi ers and a ordances

to avoid pareidolia
fi
ff
LLMBC 2023
Thanks!
@sergeykarayev @charles_irl /imagine parrot in a psychiatrist's

office, Freudian psychoanalysis,
therapist couch, New Yorker cartoon,
clean lines, black and white, philippe
@full_stack_dl parreno, clarity of form, use of paper,
in the style of New Yorker cartoon
LLMBC 2023
Some LUI Ideas
• 💡 Multiplayer Chat
• 💡 Domain-speci c Interfaces (e.g. Github)
• 💡 Mixing Structured and Text Inputs
91
fi
LLMBC 2023
💡 Multiplayer Chat
• In ChatGPT:
- You have to drive the

conversation
- You have to prompt AI to

ful ll certain roles
- Your colleagues are not able

to help
92
fi
LLMBC 2023
💡 Multiplayer Chat
• In "SlackGPT":
- AI's post information without

you having to ask
- AI's consistently have

di erent functions, just like
on a real team
- Colleagues see and can

participate in your
conversations
93
ff
LLMBC 2023
💡 Domain-speci c Interfaces (e.g. Github)
• Github:
- Make Issues to command AI
- Review AI work in Pull

Requests
• both read the work product,

and chat about it
- AI submits Issues for

problems it sees
- AI does PR code reviews
94
fi
LLMBC 2023
💡 Mixing Structured and Text Inputs
• Instead of text being the

primary interface, it should
probably be the "glue" around
task-speci c interfaces
https://twitter.com/_paulshen/status/1642582307784781824 95
fi

2023 LLMBC Ux For Luis

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2023 LLMBC Ux For Luis

Uploaded by

Copyright:

Available Formats

LLM Bootcamp

APRIL 22, 2023

Core concepts An emerging I have been

What is a user interface?

Language: the rst digital interface (~100-10K BC)

Digital interfaces (~1K BC)

• Physical objects for

melted rocks into steel...

turned coal into electricity...

tricked sand into thinking...

Computer Terminal User Interface (~1970)

Graphical User Interfaces (~1990)

Web Interfaces (~2000)

Mobile Interfaces (~2010)

Language User Interfaces

But before we get into LUIs...

What makes a good user interface?

• A ordances and Signi ers

• Mapping and Feedback

• Have empathy for the user

A ordances and Signi ers

- Good a ordances make

• Signi ers: cues that indicate

- If needed, should be clear

Mapping and Feedback

• Relationship between controls 🤔

• Intuitive mappings reduce

• E ects should provide

• Don't blame errors on the user! Understand

• Keep asking the user "why?" and get to the

• Don't forget about users that aren't like you!

Advice for web interfaces

• Make actionable things unambiguous,

• Less is more (words, choices, etc)

• Testing with real users is crucial

Testing with real users

• Nothing can replace this!

• Watch someone use your product

• Don't say anything when they get confused

What about AI speci cally?

Worse than human Near-human Super-human

What level of performance does the AI have? 25

UI for AI Assistance • Inform the user

• Provide a ordances for xing

• Incentivize users to provide

Some LUI Patterns

• 🌀 Command Palette (Replit)

• 🌀 One-on-one Chat (ChatGPT)

• 💻 What is the boundary?

• 🎯 How high are accuracy requirements?

• ⏱ How sensitive are users to latency?

• 🗣 Are users incentivized to provide feedback?

• Type a prompt in, click Submit

• Can edit the response, and/or

• Power-user features exposed

Shout out to nat.dev

• Show a possible completion in text, accept with TAB

• By the way, ⌥+\ cycles through suggestions

• Because there's only one

• ^+Enter shows multiple

Github Copilot: What's NOT visible?

• What goes into the prompt:

- position in the AST