
LLM Bootcamp

2023

UX for LUIs
Sergey Karayev & Charles Frye

APRIL 22, 2023


LLMBC 2023

Agenda

00 UI PRINCIPLES: Core concepts for great interfaces

01 LUI PATTERNS: An emerging design language

02 CASE STUDIES: I have been a good Bing 😊

00

UI Principles

What is a user interface?



Analog interfaces

• Continuous

• Physical


Analog communication


Language: the first digital interface (~100-10K BC)

• Discrete

• Metaphysical


Digital interfaces (~1K BC)

• Physical objects for


metaphysical tasks


melted rocks into steel...

turned coal into electricity...

tricked sand into thinking...



Computer Terminal User Interface (~1970)

https://en.wikipedia.org/wiki/Computer_terminal

Graphical User Interfaces (~1990)

https://kartsci.org/kocomu/computer-history/graphical-user-interface-history/

Web Interfaces (~2000)

http://info.cern.ch/

https://searchengineland.com/google-lets-query-like-its-2001-14881


Mobile Interfaces (~2010)

https://www.huffpost.com/entry/uber-your-way-through-cit_b_1205446

https://www.salon.com/2018/11/16/early-instagram-employee-the-app-is-like-a-drug-that-doesnt-get-us-high-anymore_partner/

? Interfaces (~2020)


Language User Interfaces

https://twitter.com/sama/status/1515764302904377344

But before we get into LUIs...

What makes a good user interface?



It Depends

www.publicdomainpictures.net/en/view-image.php?image=395687

https://www.thehogring.com/2012/11/25/where-did-the-term-dashboard-come-from/

https://www.independent.co.uk/tech/tesla-robotaxi-elon-musk-steering-wheel-b

Human-centered Design

• Affordances and Signifiers

• Mapping and Feedback

• Have empathy for the user


Affordances and Signifiers

• Affordances: possible actions offered by an object

- Good affordances make objects intuitive to use

• Signifiers: cues that indicate affordances

- If needed, should be clear and consistent

https://ux.stackexchange.com/a/

https://medium.com/@sachinrekhi/don-normans-principles-of-interaction-design-51025a2c0f33

Mapping and Feedback

• Relationship between controls and their effects

• Intuitive mappings reduce cognitive load

• Effects should provide immediate and clear feedback

https://medium.com/@sachinrekhi/don-normans-principles-of-interaction-design-51025a2c0f33

Have Empathy

• Don't blame errors on the user! Understand


what they intended to do

• Keep asking the user "why?" and get to the


bottom of their true pain points

• Don't forget about users that aren't like you!


Advice for web interfaces


• Design for scanning, not reading

• Make actionable things unambiguous,


instinctive, and conventional

• Less is more (words, choices, etc)

• Testing with real users is crucial


Testing with real users

• Nothing can replace this!

• Watch someone use your product

• Don't say anything when they get confused

• Improve the product and repeat with someone else the next day

• ...

• Profit


What about AI specifically?


AI Product Considerations

Rows: How bad is it to make a mistake?
Columns: What level of performance does the AI have?

            Worse than human   Near-human      Super-human
Dangerous   NO AI              AI ASSISTANCE   AI AUTOMATION
Fine        AI ASSISTANCE      AI ASSISTANCE   AI AUTOMATION

User Interface is Crucial!


UI for AI Assistance

• Inform the user

• Provide affordances for fixing mistakes

• Incentivize users to provide feedback


Data flywheel


Questions?

01

LUI Patterns

Some LUI Patterns

• 🌀 Click-to-complete (Playground)

• 🌀 Auto-Complete (Copilot)

• 🌀 Command Palette (Replit)

• 🌀 One-on-one Chat (ChatGPT)


Guiding questions

• 💻 What is the boundary?

• 🎯 How high are accuracy requirements?

• ⏱ How sensitive are users to latency?

• 🗣 Are users incentivized to provide feedback?


🌀 Click-to-complete (Playground)

• Type a prompt in, click Submit


to see AI response

• Can edit the response, and/or


type more yourself, click
Submit again

• Power-user features exposed


🌀 Click-to-complete (Playground)
• 💻 boundary: 👎
- text box is separate from where your work happens
- text color is not an intuitive signifier
• 🎯 requirements: medium
- it's a pain to delete text and resubmit
• ⏱ sensitivity: medium
- streaming tokens helps
• 🗣 incentives: 👎
- no incentive to use thumbs up/down buttons


Shout out to nat.dev


🌀 Auto-Complete (Copilot)

• Show a possible completion in text, accept with TAB

• By the way, ⌥+\ cycles through suggestions


🌀 Auto-Complete (Copilot)
• 💻 boundary: 👍
- extra channel of text right where your work happens
• 🎯 requirements: low
- suggestions are passive, so default is to ignore them
• ⏱ sensitivity: high
- if latency is too high, feature becomes annoying
• 🗣 incentives: 👍
- high-quality implicit signal of accepting the suggestions


GitHub Copilot

• Because there's only one


mode of interaction, hacky
way of instructing the
assistant through comments

• ^+Enter shows multiple


actionable suggestions at
once


GitHub Copilot: What's NOT visible?

• What goes into the prompt:

- position in the AST

- recently opened files

- potentially relevant other code

• Filtering steps

• Telemetry

https://thakkarparth007.github.io/copilot-explorer/posts/copilot-internals.html
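
The hidden prompt inputs listed above can be pictured with a rough sketch. Everything here is a hypothetical illustration, not Copilot's actual implementation: the `build_prompt` function, the token-overlap (Jaccard) ranking of recently opened files, the comment wording, and the character budget are all assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Split text into a set of lowercase identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two pieces of code (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_prompt(before_cursor: str, recent_files: dict[str, str],
                 budget_chars: int = 2000) -> str:
    """Hypothetical prompt assembly: best-matching recent file, then the code so far."""
    # Rank recently opened files by similarity to the code being edited.
    ranked = sorted(recent_files.items(),
                    key=lambda kv: jaccard(before_cursor, kv[1]),
                    reverse=True)
    parts = []
    if ranked and jaccard(before_cursor, ranked[0][1]) > 0:
        path, snippet = ranked[0]
        # Include the snippet as comments so it reads as context, not code to run.
        commented = "\n".join("# " + line for line in snippet.splitlines())
        parts.append(f"# Compare this snippet from {path}:\n{commented}")
    parts.append(before_cursor)
    prompt = "\n".join(parts)
    # Keep the tail of the prompt: the code nearest the cursor matters most.
    return prompt[-budget_chars:]
```

None of this is visible to the user, which is the point of the slide: the interface shows only the suggestion, while context selection, filtering, and telemetry happen behind it.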

🌀 Command Palette (repl.it)


• Instead of writing a comment
for Copilot, bring up a modal

• Notion AI works the same way


🌀 Command Palette (repl.it)


• 💻 boundary: 🫳
- AI available right where you work, but you have to remember to ask it
• 🎯 requirements: high
- Since you explicitly asked for something, it better be good
• ⏱ sensitivity: medium
- It's okay to wait a little bit, since you asked for a bigger thing
• 🗣 incentives: 👍
- high-quality implicit signal of accepting the suggestions


🌀 One-on-one Chat (ChatGPT)

• Standard messaging
interface we all know


The power of UI conventions


🌀 One-on-one Chat (ChatGPT)
💻 boundary
• Pros:

- Conventional UI

- Building up "state" of the conversation is very useful for complex


tasks

• Cons:

- Endless copy/paste from where you're actually doing the work

- The extent of what is possible is not super discoverable


🌀 One-on-one Chat (ChatGPT)
🎯 requirements

• High!

• You're "talking to someone", so you basically expect AGI


🌀 One-on-one Chat (ChatGPT)
⏱ sensitivity

• Medium (streaming helps)

• Accuracy is more important


🌀 Auto-Complete (Copilot)
🗣 incentive

• When user accepts the suggestion, that's a strong signal that it


was good!

• Can also track edits of accepted suggestions
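
As a minimal sketch of the two signals above (acceptance, and later edits to accepted text), assuming a hypothetical `SuggestionEvent` record that an editor plugin might log:

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class SuggestionEvent:
    """Hypothetical log record for one suggestion shown to the user."""
    suggested: str         # text the model proposed
    accepted: bool         # whether the user accepted it
    final_text: str = ""   # the accepted text after the user's later edits


def acceptance_rate(events: list[SuggestionEvent]) -> float:
    """Share of shown suggestions that were accepted."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)


def retained_fraction(event: SuggestionEvent) -> float:
    """Similarity (0.0 to 1.0) between the suggestion and what the user kept."""
    return SequenceMatcher(None, event.suggested, event.final_text).ratio()


events = [
    SuggestionEvent("return a + b", True, "return a + b"),   # kept as-is
    SuggestionEvent("return a + b", True, "return a - b"),   # accepted, then edited
    SuggestionEvent("print(x)", False),                      # ignored
]
print(acceptance_rate(events))
print([round(retained_fraction(e), 2) for e in events if e.accepted])
```

A low retained fraction on an accepted suggestion hints that the acceptance was weaker than it looked, which is why tracking edits adds information beyond the accept event itself.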


🌀 One-on-one Chat: Suggested follow-ups


• Nice to display some suggested follow-ups
- Could save user time
- Make AI abilities more discoverable
- (But beware of feedback loops – more on this later!)


🌀 One-on-one Chat: Citations


• Citation footnotes have emerged as a common design element.

• I think we haven't seen what this should actually look like yet


🌀 One-on-one Chat: Enriching text


• Add support for
rendering Markdown to
make pure text feel
much richer

• This idea can go


further! Can make LLM
outputs actionable
with overlays


🌀 One-on-one Chat: Plugins

• To expand LLM abilities, a


"plugin" architecture can allow
it to use tools

• This pattern is very


underdeveloped right now
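
One way to picture a "plugin" architecture is a registry of tools plus a dispatcher that runs a tool when the model asks for one. The sketch below is an assumption-laden illustration: the JSON call format (`{"tool": ..., "args": ...}`), the `tool` decorator, and the `add` tool are all hypothetical and do not describe any specific product's plugin protocol.

```python
import json
from typing import Callable

# Hypothetical registry mapping tool names to Python functions.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as a callable tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("add")
def add(a: float, b: float) -> str:
    """Toy tool: add two numbers and return the result as text."""
    return str(a + b)


def dispatch(model_output: str) -> str:
    """Run a tool if the output is a known JSON tool call, else pass text through."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain text, not a tool call
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return model_output  # valid JSON, but not a call to a known tool
    return TOOLS[call["tool"]](**call.get("args", {}))
```

From a UX standpoint, the interesting question is what the user sees while `dispatch` runs: which tool was chosen, with which arguments, and how to correct it if the choice was wrong.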


🌀 One-on-one Chat: Access to work context


🌀 One-on-one Chat as the primary app interface?

https://chatspot.ai/

Questions?

02

Case Studies

We’ll consider one positive example and one negative example.

• What did Copilot do right?

- Followed core principles of design

• What did Bing Chat do wrong?

- Did not


Copilot was painstakingly designed


with close regard to user experience.

Nat Friedman has spoken at length on Copilot’s process.

https://www.youtube.com/watch?v=lnufceCxwG0

Phase I: Tinkering with lots of ideas


                           Accuracy Requirement   Latency Requirement   Impact
PR Bot                     💯                     🐢                    💰💰💰
StackOverflow in the IDE   💯                     🧒                    💰💰
Spicy Autocomplete         🤏                     🐆                    💰

Got each one to MVP in a matter of days.


User research revealed the constraint was accuracy.


                           Accuracy Requirement   Latency Requirement   Impact
PR Bot                     💯                     🐢                    💰💰💰
StackOverflow in the IDE   💯                     🧒                    💰💰
Spicy Autocomplete         🤏                     🐆                    💰

“Alternated between spooky and kooky”: sometimes wastes your time, “sometimes saves you 15 minutes”


Phase II: Months of UI/UX iteration

• A/B testing based on


- Acceptance of completions
- Product stickiness: retention@30
• Learnings included:

- Latency > quality, to a point


- Bias of AI researchers is for quality
- Putting in background was helpful
- “Ghost text” required substantial investment
- Inspired by Gmail
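
To make the stickiness metric concrete, here is a minimal sketch of comparing A/B variants on retention@30. The slide does not define the metric, so the definition used here (share of users with any activity 30 or more days after first use) is an assumption, and the function and variant names are hypothetical.

```python
from datetime import date, timedelta


def retention_at_30(first_seen: dict[str, date],
                    activity: dict[str, list[date]]) -> float:
    """Fraction of users with any activity 30 or more days after first use.

    This definition of retention@30 is an assumption made for illustration.
    """
    if not first_seen:
        return 0.0
    retained = 0
    for user, start in first_seen.items():
        cutoff = start + timedelta(days=30)
        if any(day >= cutoff for day in activity.get(user, [])):
            retained += 1
    return retained / len(first_seen)


def best_variant(results: dict[str, float]) -> str:
    """Name of the A/B variant with the highest metric value."""
    return max(results, key=results.get)
```

In practice a comparison like this would also need enough users per variant and a significance check before acting on the difference; the sketch only shows how the metric itself might be computed.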

The resulting product is popular and effective.

https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/


User-centered design matters.


Do it right.

Bing Chat was rushed


and ignored principles of design.

Early conversations go awry.

https://simonwillison.net/2023/Feb/15/bing/

That user’s crime: asking for Avatar showtimes.

https://reddit.com/r/bing/comments/110eagl/the_customer_service_of_the_new_bing_chat_is/

The model questions its purpose.

https://simonwillison.net/2023/Feb/15/bing/

The model leaks its prompt*.

*this appears to be partially a hallucination

https://twitter.com/marvinvonhagen/status/1623658144349011971

And that’s when things went off the rails.


The model starts threatening users.

https://twitter.com/marvinvonhagen/status/1625520707768659968

The old lady swallows a frog to eat the fly.

https://twitter.com/deliprao/status/1626289724276224003


What happened?

This is a classic tale of how design fails.

• Development was rushed due to external factors.

• Feedback loops took the product out of the tested regime.

• Signifiers and affordances were not properly aligned.


Gwern Branwen wrote a nice analysis.

https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=AAC8jKeDp6xqsZK2K

Rushed development
leads to mistakes.

Development was rushed.

It is simply not possible to recreate the whole RLHF pipeline


and dataset and integrate it into a mature complex search engine
like Bing … and do this all in <2.5 months.

Bing Sydney derives from the top: CEO Satya Nadella is all-in, and
talking about it as an existential threat (to Google) … a hard deadline
of 'release Sydney right before Google announces their chatbot in
order to better pwn them'.

https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=AAC8jKeDp6xqsZK2K

Warning signs in beta were ignored.

https://answers.microsoft.com/en-us/bing/forum/all/this-ai-chatbot-sidney-is-misbehaving/e3d6a29f-06c9-441c-bc7d-51a68e856761

Beware uncontrolled feedback loops.



…very strange roleplaying & self-fulfilling prophecies as Sydney prompts increasingly fill up with descriptions of Sydney's wackiest samples whenever a user asks Sydney about Sydney

https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=AAC8jKeDp6xqsZK2K

Make sure your signifiers match your affordances.


Signify affordances of machines, not humans.

https://chat.d-id.com/


Users are very willing to assign


humanity to language interfaces.

https://www.sciencealert.com/turns-out-our-brains-perceive-the-faces-of-pareidolia-with-an-unfortunate-side-of-gender-bias

A real conversation with a chatbot therapist.


Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I’m depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
It's true, I am unhappy.

WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
My father.
YOUR FATHER
You are like my father in some ways.

WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER
Bullies.
DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE


Weizenbaum, 1966
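
Part of what makes that transcript striking is how little machinery is needed to produce it. The sketch below is a toy illustration of the general keyword-and-reflection idea, not Weizenbaum's actual script: the two rules, the pronoun table, and the fallback line are assumptions chosen to reproduce a couple of the replies shown above.

```python
import re

# Swap first-person words for second-person ones when echoing the user.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

# (pattern, reply template) pairs, tried in order. Hypothetical rules.
RULES = [
    (re.compile(r".*\bi(?:'m| am) (.*)", re.I), "I AM SORRY TO HEAR YOU ARE {0}"),
    (re.compile(r".*\bmy (.*)", re.I), "YOUR {0}"),
]


def reflect(fragment: str) -> str:
    """Rewrite a fragment from the user's point of view to the bot's."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(w, w) for w in words)


def respond(utterance: str) -> str:
    """Return a canned reply built from the first rule that matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(reflect(match.group(1))).upper()
    return "PLEASE GO ON"  # fallback when no keyword is found


print(respond("Well, my boyfriend made me come here."))
print(respond("I'm depressed"))
```

A few lines of pattern matching are enough for users to read understanding into the replies, which is the pareidolia risk the following slides warn about.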

Watch your signifiers!

              🤖                    🧒
Name          ChatGPT, qna-bot      Claude, Samantha
Pronouns      it, this system       he/she/they, I
Personality   Deferent, Bland       Vibrant, Confident
Interface     Text, Menus           Voice
Font          monospace             printed
Voice         None, Direct Synth    Voice Clone, Fillers/Pauses

This also applies to training data and system prompts.


Follow principles of good design.

• Do careful UX research, like Copilot

• Watch out for feedback loops, unlike Bing Chat

• Match signifiers and affordances to avoid pareidolia


Thanks!

@sergeykarayev @charles_irl @full_stack_dl

/imagine parrot in a psychiatrist's office, Freudian psychoanalysis, therapist couch, New Yorker cartoon, clean lines, black and white, philippe parreno, clarity of form, use of paper, in the style of New Yorker cartoon


Some LUI Ideas

• 💡 Multiplayer Chat

• 💡 Domain-specific Interfaces (e.g. GitHub)

• 💡 Mixing Structured and Text Inputs


💡 Multiplayer Chat

• In ChatGPT:

- You have to drive the


conversation

- You have to prompt AI to fulfill certain roles

- Your colleagues are not able


to help


💡 Multiplayer Chat

• In "SlackGPT":

- AIs post information without you having to ask

- AIs consistently have different functions, just like on a real team

- Colleagues see and can


participate in your
conversations


💡 Domain-specific Interfaces (e.g. GitHub)

• GitHub:

- Make Issues to command AI

- Review AI work in Pull


Requests

• both read the work product,


and chat about it

- AI submits Issues for


problems it sees

- AI does PR code reviews


💡 Mixing Structured and Text Inputs

• Instead of text being the primary interface, it should probably be the "glue" around task-specific interfaces

https://twitter.com/_paulshen/status/1642582307784781824