Natural Language Processing
Amanda, Annalice, Catherine, Cameron, Claire, Jayden, & Reuben
May 13, 2021
History

2
Outline of Topics
- Intro / thesis
- History / background
- Siri
- NLP today / research
- How AI changes what we already have
- UW specific research?
- Application/Demo? - 2 people
- Written vs. Audio
- Body language???
- Recognizing emotion in faces >> deciphering abstract meaning
- Lie detector??!?!!!
- Scopes (syllables, sentences, meaning, being convincing)
- Grammarly/Word spell checking
- https://teachablemachine.withgoogle.com/
- Societal Implications

3
Intro (not a section)
2 minutes?

- Formal definition of NLP

4
History/Background - Amanda
- Court documenter

- 2001: A Space Odyssey reference (HAL scene)

- Two parts of speech processing in the brain: motor skills vs. content (stutter vs. nonsense)

- ELIZA therapist bot

- Customer service automation field

5
History/Background

1957: Syntactic Structures
Noam Chomsky, a linguist, publishes a revolutionary book on universal grammar.

1964: ELIZA
Created at MIT, ELIZA was one of the first chatbots. It operated by following pre-determined scripts.

1995: N-grams
A paper is published suggesting new ways to approach language modelling using n-grams (an n-gram is a contiguous sequence of n items).

2006: Watson
A question-answering machine developed by IBM. Watson would later go on to win Jeopardy.

2010: Siri
A virtual assistant that could respond to varying human commands and reply is developed. Siri would later be integrated into the iPhone.

???: Future
NLP is becoming a larger part of our everyday lives.
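
The timeline defines an n-gram as a contiguous sequence of n items. A minimal sketch of that definition in Python follows. The helper name `ngrams` and the sample phrase are assumptions made for illustration and do not come from the slides.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n items from tokens, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Sample phrase chosen for illustration only.
words = "to be or not to be".split()
bigrams = ngrams(words, 2)

# Counting how often each bigram occurs is the raw material
# for a simple n-gram language model.
bigram_counts = Counter(bigrams)
print(bigrams)
print(bigram_counts.most_common(1))
```

An n-gram language model would go one step further and turn counts like these into probabilities for the next word.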

6
Overview

12
Current Applications (Claire)
- Theory of operation / how it works today

- Siri/Alexa/Cortana

- Translation (also Google Translate)

- Movie transcription / YouTube captions

- Spelling/grammar checks in word processors

- Phone / email word prediction

- Autocorrect

- From following rules (recursive, algorithmic, etc.) to actively learning (into the future!)

13
General Overview
● NLP helps machines process and
understand human languages in
various contexts
● To perform algorithmic tasks
○ Translation
○ Summarization
○ Classification

Challenges:

● Ambiguity in language: idiomatic
expressions, homonyms, slang

14
Pipeline
1. Sentence segmentation
a. Breaks text into separate sentences to separate single ideas
2. Word tokenization
a. Breaks sentence into separate words and punctuation marks
3. Predicting parts of speech for each token
a. Classifies each word by its part of speech
4. Text lemmatization
a. Determine most basic form / lemma of each word
5. Identify stop words
a. Take out filler words before statistical analysis
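
Steps 1, 2, and 5 above can be roughed out with the standard library alone. This is a hedged sketch rather than a production pipeline. The function names, the tiny `STOP_WORDS` list, and the sample text are all assumptions. Steps 3 and 4 (part-of-speech tagging and lemmatization) are left out because in practice they usually rely on trained models from libraries such as spaCy or NLTK.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to", "in"}

def split_sentences(text):
    """Step 1: naive segmentation on ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Step 2: split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def remove_stop_words(tokens):
    """Step 5: drop filler words (case-insensitive) and keep everything else."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

text = "London is the capital of England. It is a large city!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
content = [remove_stop_words(t) for t in tokens]

print(sentences)
print(tokens[0])
print(content[0])
```

The regex-based splitter will break on abbreviations such as "Dr." or "e.g.", which is one reason real pipelines use learned sentence segmenters.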

15
Pipeline
6. Dependency Parsing
a. Determines how words relate to each other
7. Finding Noun Phrases
a. Group together words that represent the
same idea
8. Named Entity Recognition
a. Detect and label nouns with the real-world
concepts they represent
i. “London is the capital and most
populous city of England and the United
Kingdom”
9. Coreference Resolution
a. Assign meaning to pronouns by
tracking them across sentences
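
To make step 8 concrete, here is a toy named entity tagger run on the example sentence from the slide. It is only a lookup against a hand-written list (a gazetteer). Real NER systems generally use trained statistical models instead. The `GAZETTEER` contents, the `CITY` and `COUNTRY` labels, and the function name are hypothetical choices for illustration.

```python
import re

# Hypothetical mini-gazetteer; real NER uses trained models, not a fixed list.
GAZETTEER = {
    "United Kingdom": "COUNTRY",
    "England": "COUNTRY",
    "London": "CITY",
}

def tag_entities(sentence):
    """Return (entity text, label) for each gazetteer match, left to right."""
    # Try longer names first so multi-word names are matched as a whole.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), GAZETTEER[m.group(1)]) for m in pattern.finditer(sentence)]

sentence = ("London is the capital and most populous city of "
            "England and the United Kingdom")
entities = tag_entities(sentence)
print(entities)
```

A lookup like this cannot label names it has never seen and cannot tell "London" the city from "London" the surname, which is where context-aware models come in.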
16
Current Applications

17
Virtual Assistants (Siri, Alexa, Cortana)
● Interpret voice commands to perform a
variety of tasks
● Present on over 4 billion devices

18
Google Translate
● Can currently translate 109 languages
● Can translate websites, documents, speech, images, and handwritten text

19
Email Filtering
● Emails are sorted based on their sender and contents

20
Automated Journalism
● Used by large news outlets like
the Washington Post and the Associated
Press
● Used primarily to compose
articles on data-driven subjects

21
Grammarly
● Provides suggestions to make writing clearer and easier to read

22
Hate Speech Monitoring
● Used by a variety of social media
platforms
● Assigns each comment a toxicity
score

23
Technique: Search

● Correlate two bodies of text
○ Target often more complex than search terms
● Varying complexity
○ Exact match
○ Morphology-Aware
○ Semantics-Aware
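
The difference between the first two levels of search complexity can be sketched briefly. The code below is a rough illustration under stated assumptions: `crude_stem` is a toy suffix stripper invented here, not a real stemming algorithm, and the resume text is made up. Semantics-aware search is not shown because it typically depends on learned word or sentence representations.

```python
import re

def words(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip a few common English suffixes. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def exact_match(query, document):
    """True only if every query word appears verbatim in the document."""
    doc_words = set(words(document))
    return all(q in doc_words for q in words(query))

def morphology_aware_match(query, document):
    """True if every query word shares a crude stem with some document word."""
    doc_stems = {crude_stem(w) for w in words(document)}
    return all(crude_stem(q) in doc_stems for q in words(query))

# Made-up resume line for illustration.
resume = "Designed and tested search engines for large catalogs"

print(exact_match("test engine", resume))
print(morphology_aware_match("test engine", resume))
```

With a keyword-only resume filter, the query "test engine" would miss this candidate even though "tested" and "engines" are present. Matching on stems recovers it.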

24
Search Applications

● Resumes
○ Generally keyword based
● Search Engine Optimization
● Search Engine Design
○ Primary influence on a platform’s marketing potential

25
26
Technique: Sentiment Analysis

● Essentially text comprehension
● Multiple challenges confound word-by-word
analysis
○ Ambiguity
○ Irony
○ Qualification
○ Multi-target sentiments
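
A small sketch can show why word-by-word analysis struggles with qualification and multi-target sentiments. Everything here is an assumption for illustration: the six-word `LEXICON`, the `NEGATORS` set, and the rule that a negator flips the next sentiment word. Real sentiment systems are considerably more sophisticated.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def word_by_word_score(text):
    """Sum the polarity of each word in isolation, ignoring context."""
    return sum(LEXICON.get(t, 0) for t in tokens(text))

def negation_aware_score(text):
    """Same sum, except a negator flips the next sentiment word it precedes."""
    score = 0
    negate = False
    for t in tokens(text):
        if t in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(t, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score

qualified = "The food was not good"
mixed = "The plot was great but the acting was terrible"

print(word_by_word_score(qualified), negation_aware_score(qualified))
print(word_by_word_score(mixed))
```

The qualified sentence scores as positive word by word and as negative once negation is handled. The mixed sentence sums to zero, which hides the fact that it holds two opposite opinions about two different targets.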
27
Sentiment Analysis Applications

● Reputation Management
● Recommendations / Search
● Multi-factor results provide additional insights

28
Teachable Machine

29
30
Google Teachable Machine
● Accessible web-based tool
for creating ML models with
no expertise
● User provides video / audio
snippets
● Transfer learning: pre-trained
neural network + new layer
trained on user data
● User data trains only that last
layer of the neural network
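
The transfer-learning idea on this slide can be illustrated with a deliberately tiny stand-in. In the sketch below, `frozen_features` plays the role of the pre-trained network and is never updated, while `train_head` fits a new final layer (plain logistic regression) on a handful of user examples. The feature function, data, and hyperparameters are all invented for illustration and say nothing about how Teachable Machine is actually implemented.

```python
import math

def frozen_features(x):
    """Stand-in for a pre-trained network: a fixed mapping that is never updated."""
    a, b = x
    return [a + b, a - b, 1.0]  # the constant 1.0 acts as a bias input

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=500, lr=0.5):
    """Fit only the new final layer (logistic regression) on the user's examples."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            p = sigmoid(sum(w * v for w, v in zip(weights, f)))
            # Gradient step on the head weights only; the features stay frozen.
            weights = [w + lr * (y - p) * v for w, v in zip(weights, f)]
    return weights

def predict(weights, x):
    f = frozen_features(x)
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, f))) >= 0.5 else 0

# Four made-up "user examples": class 1 has large values, class 0 small ones.
samples = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
labels = [1, 1, 0, 0]
head = train_head(samples, labels)

print([predict(head, s) for s in samples])
print(predict(head, (0.95, 0.85)), predict(head, (0.05, 0.1)))
```

Because only three head weights are learned, a few examples are enough, which is the same reason a tool built on a large pre-trained network can work from a short recording session.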

31
32
Future
Applications

33
Future Application - Annalice, Catherine
- Written vs. Audio
- Body language???
- Recognizing emotion in faces >> deciphering abstract meaning
- Lie detector??!?!!!
- Scopes (syllables, sentences, meaning, being convincing)
- Grammarly/Word spell checking
- https://teachablemachine.withgoogle.com/
- Reference video of real-time audio translation

34
Future Applications

1. Improving existing AI
2. Creation of more human-like & creative AI

35
Neuroscience
“With a radical new
approach, doctors have
found a way to extract
a person’s speech
directly from their
brain.” (Sample, 2019)

36
Welcome to the 1st Annual NLP Products Fair!

37
MarketMate

1. Acquire data (customer reviews)
2. Compile in machine
3. Digitally produce marketing campaign

38
Host

1. Pull up to drive-thru
2. Place order with Host
3. Pick up food at window

39
Zoocognition

1. Download Zoocognition app
2. Select the enclosure (giraffes)
3. Read transcriptions of their “conversations”

40
Compatible
1. Download Compatible app
2. Turn listener on
- Question suggestions
- Determines compatibility at the end of the date
3. After the date, you can save your personal data
- Compatible will find potential matches for you

41
Society stuff storyboard
1. Implications of NLP can be summed up into 3 categories: the good, the
bad, and the controversial.
a. Good stuff: Increasing access, democratizing language
b. Bad stuff: Exclusion, Generalization, Exposure bias
i. Ethical point 1: Gender issues
c. Controversial Stuff: Fact Checking/Fake news detection
2. Tricky: The predicament
a. Ethical point 2: Race issues
b. Ethical point 3: Privacy issues
c. The Problem: Demographic issues and privacy issues can NEVER both be solved; fixing
one creates the other
3. Any ideas how to fix? Doom and despair? Or a hopeful ending?
a. I wanna end optimistic

42
AI + NLP:

Societal Impact
“It’s about society,
maaan”

43
The Good

44
Societal Impact: Removes Barriers

The Good The Bad The… Controversial?


45
The Bad

46
Societal Impact: Generalization & Gender

47
Societal Impact: Exposure Inequality

48
??
The Controversial

49
Societal Impact: Fact Checking

“Your problem” “Our problem”

58
Societal: The Controversial: Toxicity Detection
- Ideas of toxicity are fairly subjective! Making a single, objective measure of
toxicity can be dangerous (but necessary)

60
Societal Impact: The Problem

Privacy Security

The Good The Bad The Controversial


61
Societal Impact: The Problem

Ban it: de facto bans Black Twitter
No wait: de facto allows hate speech

63
Societal Impact: The Problem
[Diagram labels: The Law, Activism, Toxicity; Equity, Safety, Freedom; Privacy, Hate Speech, Doxxing; 1st amendment, 4chan, “cancelled”, Going Viral, Instagram owned by Facebook, The Zucc, “Infographics worth 1000 words?”]

72
Thanks, bye!

Oh wait, questions?

74