Natural Language Processing (NLP)
Applications of NLP:
-> Sentiment analysis
-> Text classification
-> Smart assistants
-> Machine translation
-> Topic modelling
-> Spell checkers
-> Chatbot
-> Text generation
Best NLP language models:
-> BERT (Bidirectional Encoder Representations from Transformers)
-> XLNet
-> OpenAI's GPT-3
-> ALBERT
-> T5
-> Fairseq

NLP is a branch of AI which deals with human language; it sits at the intersection of computer science, AI and human language.

Learning settings:
-> self-supervised
-> zero-shot
-> few-shot
-> unsupervised
-> semi-supervised
-> classification, e.g. binary classification (2 classes)
Levels of linguistic structure:
-> Pragmatics: meaning in context
-> Semantics: literal meaning
-> Syntax: structure of sentences (words & phrases)
-> Morphology: study of the structure of words
-> Phonology / phonetics: study of linguistic sounds (phonemes) and of sound making in speech; outside the control of text-based NLP
of identifying
↳pography-study
stempreet
Morphology
uncomfortable -> un- (prefix) + comfort (stem/root) + -able (suffix)

Word formation: (1) inflection (2) derivation (3) compounding

(1) Inflection: stem -> plural, past, progressive
    jump -> jumps, jumped, jumping
    like -> likes, liked, liking
(2) Derivation:
    paint -> painter
    print -> reprint, printable
corpora -> plural of corpus; a corpus is a collection of text data
(3) Compounding: stem + stem
    house + boat -> houseboat
    boat + house -> boathouse
    (order matters)
order does not matter
Morphological ambiguity: the same word form can carry different meanings.
Ex: "will will will"
    will -> person name (noun)
    will -> modal verb
    will -> verb (to desire, to wish)
Pipeline stages of NLP:
Segmentation -> Tokenization -> Stemming -> Lemmatization -> POS (parts of speech) tagging -> NER (named entity recognition) -> Chunking

Types of stemming:
↳ Porter stemming
↳ Lancaster stemming
↳ Snowball stemming
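The first stages of such a pipeline can be sketched in a few lines. This is only an illustrative toy, assuming a regex tokenizer and a hypothetical `crude_stem` helper that strips a fixed list of suffixes; it is not the Porter, Lancaster or Snowball algorithm, and it will over-strip some words (for example "liked" becomes "lik").

```python
import re

def tokenize(text):
    """Lower-case the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word, suffixes=("ing", "ed", "s")):
    """Strip the first matching suffix, keeping at least 3 characters.

    Hypothetical helper for illustration only; real stemmers apply
    many ordered rewrite rules instead of a single suffix list.
    """
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

tokens = tokenize("He jumps, she jumped; they are jumping.")
stems = [crude_stem(t) for t in tokens]
print(tokens)
print(stems)
```

The inflected forms jumps, jumped and jumping all collapse to the stem jump, which is the effect stemming is meant to have before counting words.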
PIPELINE (in the wider sense) / Lifecycle:
Modelling -> Evaluation -> Deployment -> Maintaining & updating model -> Improvement
Probability / language model
↳ Joint probability
↳ Conditional probability
↳ Marginal probability

Chain rule:
$P(A, B) = P(A)\,P(B \mid A)$
$P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid A, B)$
$P(A, B, C, D) = P(A)\,P(B \mid A)\,P(C \mid A, B)\,P(D \mid A, B, C)$
Markov's Assumption -> n-gram
-> uni-gram: $P(w_i)$
-> bi-gram: $P(w_i \mid w_{i-1})$
-> tri-gram: $P(w_i \mid w_{i-2}, w_{i-1})$
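A minimal sketch of the bi-gram case, assuming maximum-likelihood estimates $P(w_i \mid w_{i-1}) = \mathrm{count}(w_{i-1}, w_i) / \mathrm{count}(w_{i-1})$ with no smoothing. The toy corpus and the function names are hypothetical and are not taken from the notes.

```python
from collections import Counter

# Hypothetical toy corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> i like nlp </s>",
    "<s> i like python </s>",
    "<s> you like nlp </s>",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """MLE estimate of P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(sentence):
    """Markov (bi-gram) approximation: product of P(w_i | w_{i-1})."""
    tokens = sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word)
    return prob

print(sentence_prob("<s> i like nlp </s>"))
```

For this toy corpus the product is $\frac{2}{3} \times 1 \times \frac{2}{3} \times 1 = \frac{4}{9}$. Any bi-gram that never occurs in the corpus makes the whole sentence probability 0, which is why smoothing is normally added in practice.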
Basics of probability theory
Ex: tossing two coins simultaneously; possible outcomes $S = \{TT, TH, HT, HH\}$
1. Sample space
2. Random experiment
3. Favourable event
4. Success
5. Random variable
Probability distribution
-> Discrete: general discrete
-> Continuous: general continuous

Random variable (RV)
-> Discrete random variable (DRV) -> pmf (prob. mass function)
-> Continuous random variable (CRV) -> pdf (prob. density function)
For a discrete RV X with pmf $f(x) = P(X = x)$:
(1) Mean (average of X): $E(X) = \sum_x x\,f(x)$
(2) Variance: $\mathrm{Var}(X) = E(X^2) - (E(X))^2$
(3) Standard deviation: $\sigma = \sqrt{\mathrm{Var}(X)}$
Q. Consider the following pmf of a random variable X:
$P(x, q) = q$ if $x = 0$; $1 - q$ if $x = 1$; $0$ otherwise.
If $q = 0.4$, variance = ?

Q. A machine produces 0, 1 or 2 defective pieces in a day with associated prob of 1/6, 2/3 and 1/6 respectively. Then the mean value and the variance of the no. of defective pieces produced by the machine in a day are?
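Ans 1:
$E(X) = 0 \times 0.4 + 1 \times 0.6 = 0.6$
$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = (0^2 \times 0.4 + 1^2 \times 0.6) - 0.36 = 0.6 - 0.36 = 0.24$

Ans 2:
$E(X) = \sum x\,f(x) = 0 \times \frac{1}{6} + 1 \times \frac{2}{3} + 2 \times \frac{1}{6} = 1$
$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \left(0 + \frac{2}{3} + \frac{4}{6}\right) - 1 = \frac{1}{3}$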
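The two answers can be checked numerically. The sketch below assumes the pmfs are {0: 0.4, 1: 0.6} and {0: 1/6, 1: 2/3, 2: 1/6}; the function name `mean_and_variance` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """pmf maps each value x to P(X = x); returns (E[X], Var[X])."""
    assert sum(pmf.values()) == 1, "probabilities must sum to 1"
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    # Var(X) = E(X^2) - (E(X))^2
    return mean, second_moment - mean ** 2

q = Fraction(4, 10)
bernoulli = mean_and_variance({0: q, 1: 1 - q})
machine = mean_and_variance(
    {0: Fraction(1, 6), 1: Fraction(2, 3), 2: Fraction(1, 6)}
)
print(bernoulli)  # mean 3/5, variance 6/25 (= 0.24)
print(machine)    # mean 1, variance 1/3
```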
P(n) Plul
8. function is
given by where A
=
The and
to
(A) R -
1 (B) n 1 +
x)((x
-
1(B)Y/n + 1
B [p(n) =
2) 28E 1
=
=) A(++E 5 + ..
+
-
J 1
=
#> j
eaxtra.et
2) A
(ic], = 1
*
20
A+
I] 1
=
*
=) 1
=
a l
en
poppydistrico
-a
continuous
y f(n)
=
a >n
is
X is a continuous random variable with pdf $f(x)$:
(i) $P(X = a) = 0$ and $P(X = b) = 0$
(ii) $P(a \le X \le b) = \int_a^b f(x)\,dx$
(iii) $P(a < X < b) = P(a < X \le b) = P(a \le X < b) = \int_a^b f(x)\,dx$
(iv) $f(x) \ge 0$, never negative
0. h is
If a crv.
having pdf is
given by
(02n<l
[
f(x)
ca2
=
in,'urse
(i) Find c
(ii)
p((((X(3/2)
p(x <3(u)
(iii)
(iv) p(n)(2)
*
1
(i)
Sf(u)dn
=
(nz.du (Yn.an
-
+
1
=
(! ==
y
1
=> +
z
3 5 [(x2 EJ 1
-
+ =
=>
5 E 1
=
& taa
a/
c 6(1)
=
3
finnan n.de
(ii) PCnF
2!
*
E1
= +
5
=
-
z q -
+
(5 -
zy F
+
-
z]
1
r
-
5x
=
5x En e
(iii) p =
junz.du
o
+
I (n.dn
3/2
-3!! +
2 1,
5 19 E
-
+
=
12
8 + 27
-
4x -
=
1xz2
(iv) P(x=( 2)
Scuranton
=
an
1/2
=
-1, + In
5
=
-
zy
+
c-
=
(5 -
E E
+
-
1]
1x
=
7.36
=
-x+
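Q. For the function $f(x) = a + bx$, $0 \le x \le 1$, to be a valid pdf, which one of the following statements is true?
(a) a = 1, b = 4
(b) a = 0.5, b = 1
(c) a = 0, b = 1
(d) a = 1, b = -1

=> $\int_0^1 (a + bx)\,dx = 1$
=> $\left[ax + \frac{b x^2}{2}\right]_0^1 = 1$
=> $a + \frac{b}{2} = 1$
Only (b) satisfies this (0.5 + 1/2 = 1), and it keeps $f(x) \ge 0$ on [0, 1].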
↓by
:
πl).
Az 1
A +Az
=
+
-
1x(xzh 2x(x3h
+
1
=
bx(xh
+
=>
i) n 13
=
9. g<X70
Son
-
# (x) =
.0=nc)
j
3n 1
-
1[n<2
9
(2=xc
L 1
(i) find P(z<X(3(2)
(ii)p(X (2)=
(iii)p(x(3/2)
Q. If X is uniformly distributed in (0, 10), then find
(i) f(x)
(ii) mean, variance, std deviation
(iii) P(2 < X < 6)
(iv) P(0 < X < 5)
(1 3)
P(X =
(vi) P(X > 8)
If a random variable is uniformly distributed
0.2 variance 13, then P(X<(z) =?
-
meantand
with
random variable X
is
9.3 The pdf of
a
22) 0212
for
(4
-
f(x) =
-
otherwise
· ,
=
Find mean
the
Uniform Density
If X is a uniformly distributed continuous random variable, then the probability density function in the finite interval (a, b) is given as
$f(x) = \dfrac{1}{b - a}$ for $a \le x \le b$, and $f(x) = 0$ otherwise.
Properties of pdf
1. $f(x) \ge 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
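Ans:
(i) $f(x) = \dfrac{1}{b - a} = \dfrac{1}{10}$ for $0 < x < 10$
(ii) mean $= E(X) = \int_a^b x\,f(x)\,dx = \dfrac{1}{b - a}\left[\dfrac{x^2}{2}\right]_a^b = \dfrac{a + b}{2} = 5$
variance $= E(X^2) - (E(X))^2$, with $E(X^2) = \int_a^b x^2\,f(x)\,dx = \dfrac{a^2 + ab + b^2}{3}$
=> variance $= \dfrac{(b - a)^2}{12} = \dfrac{100}{12} = \dfrac{25}{3}$
std deviation $= \sqrt{\text{variance}} = \sqrt{25/3} = \dfrac{5}{\sqrt{3}}$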
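The uniform-density results can be checked with a short sketch. It assumes X is uniform on (0, 10), which matches the mean of 5 in the notes; the helper names are hypothetical.

```python
import math

def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) inside [a, b], 0 otherwise."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X ~ Uniform(a, b): overlap length / (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

a, b = 0.0, 10.0
mean = (a + b) / 2
variance = (b - a) ** 2 / 12
std = math.sqrt(variance)

print(uniform_pdf(3, a, b))            # density inside the interval
print(mean, variance, std)
print(uniform_prob(2, 6, a, b))        # P(2 < X < 6)
print(uniform_prob(0, 5, a, b))        # P(0 < X < 5)
print(uniform_prob(8, math.inf, a, b)) # P(X > 8)
```

Because X is continuous, the probability of any single point such as P(X = 3) is 0, which `uniform_prob(3, 3, a, b)` also returns.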
((k) 5
1:mean=1 b
As 19
=
b a 2
+
=
=
1 b -
a 2
=
-o
p(xc)(2)
= ***.>
f(z)E(4 n2) for
13
0 n22
: -
0
= potheswise
=>
(n. (4 -22) an
E(x)
1
=
d0
28)
2π
⑧
sin
[
(2x7, -2x52) sinpiticos
I
-
=
-
2π
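Continuous exponential pdf
$f(x) = \lambda e^{-\lambda x}$ for $x > 0$, and $f(x) = 0$ for $x < 0$

Mean: $E(X) = \int_0^\infty x\,f(x)\,dx = \int_0^\infty x\,\lambda e^{-\lambda x}\,dx = \left[-x e^{-\lambda x} - \dfrac{e^{-\lambda x}}{\lambda}\right]_0^\infty = \dfrac{1}{\lambda}$

Variance: $\mathrm{Var}(X) = E(X^2) - (E(X))^2$, with $E(X^2) = \int_0^\infty x^2\,\lambda e^{-\lambda x}\,dx = \dfrac{2}{\lambda^2}$
=> $\mathrm{Var}(X) = \dfrac{2}{\lambda^2} - \dfrac{1}{\lambda^2} = \dfrac{1}{\lambda^2}$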
8. If the
then
find
dutation exceeds Emins
call
(i) 3 5ming
/
between
(ii) than 8
less
((i,j)
(iv) greater than and of call duration.
n
Gte
-
=
n!0
-
f(x) =
n(0
o
(+ 70 dn
-
P(n(t) e -
=
5en0
= 27/10
(ii) je..du
3
3/10
1
= 5
-
2)
-
-
-
2
+
e
= -
(iii)
j0fe 2/10am
-
=
- i2-((= - 1800 - 1
(u)x 1/10
=
- 110 =
P(X),0)
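Q. Let X be an ERV with mean = 1. Find $P(X > 2 \mid X > 1)$.

mean $= 1/\lambda = 1$ => $\lambda = 1$
$f(x) = e^{-x}$ for $x \ge 0$, and $f(x) = 0$ for $x < 0$
$P(X > 2) = \int_2^\infty e^{-x}\,dx = e^{-2}$
$P(X > 1) = \int_1^\infty e^{-x}\,dx = e^{-1}$
$P(X > 2 \mid X > 1) = \dfrac{P(X > 2)}{P(X > 1)} = \dfrac{e^{-2}}{e^{-1}} = e^{-1} \approx 0.368$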
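A small numerical check of this conditional probability, assuming an exponential random variable with rate $\lambda = 1$ (mean 1) and using the survival function $P(X > x) = e^{-\lambda x}$. The function name is hypothetical.

```python
import math

def exp_survival(x, rate):
    """P(X > x) for X ~ Exponential(rate), valid for x >= 0."""
    return math.exp(-rate * x)

rate = 1.0  # mean = 1 / rate = 1
p_cond = exp_survival(2, rate) / exp_survival(1, rate)

print(p_cond)                 # P(X > 2 | X > 1)
print(exp_survival(1, rate))  # P(X > 1), the same value
```

The conditional probability equals P(X > 1): having already waited 1 unit does not change the distribution of the remaining wait. This is the memoryless property of the exponential distribution.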
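Q. Let $X_1$ & $X_2$ be two independent ERVs with means 0.5 and 0.25 respectively, and let $Y = \min(X_1, X_2)$. What is the mean of Y?

mean $= 1/\lambda$ => $\lambda_1 = 1/0.5 = 2$, $\lambda_2 = 1/0.25 = 4$
$f(x_1) = 2 e^{-2 x_1}$, $f(x_2) = 4 e^{-4 x_2}$
$P(Y > y) = P(X_1 > y)\,P(X_2 > y) = e^{-2y}\,e^{-4y} = e^{-6y}$
=> Y is an ERV with $\lambda = 6$, so mean of Y $= 1/6$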
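$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$
If X & Y are independent variables, $\mathrm{Cov}(X, Y) = 0$.
Correlation coeff $= \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$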
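Q. Consider two boxes, box1 & box2. Box1 contains 4 red and 6 black balls whereas Box2 contains 5 red and 5 black balls. Now, a coin is tossed; if head occurs then one ball is randomly drawn from box1, whereas if tail occurs then one ball is drawn from Box2.
(i) Find the probability of getting a red ball.
(ii) If the ball is red, find the probability that it comes from box1.

$P(B_1) = \frac{1}{2}$, $P(B_2) = \frac{1}{2}$
(i) $P(\text{Red}) = P(B_1)\,P(\text{Red} \mid B_1) + P(B_2)\,P(\text{Red} \mid B_2) = \frac{1}{2} \times \frac{4}{10} + \frac{1}{2} \times \frac{5}{10} = \frac{9}{20}$
(ii) $P(B_1 \mid \text{Red}) = \dfrac{P(B_1)\,P(\text{Red} \mid B_1)}{P(\text{Red})} = \dfrac{4/20}{9/20} = \dfrac{4}{9}$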
Events
1) Mutually dependent
2) Mutually independent: $P(A \cap B) = P(A)\,P(B)$

Types of probability: joint, conditional, marginal
Joint probability (dependent events): $P(A \cap B) = P(A)\,P(B \mid A)$

Total probability theorem:
$P(B) = P(A_1 \cap B) + P(A_2 \cap B) + P(A_3 \cap B) = P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + P(A_3)\,P(B \mid A_3)$

Bayes' theorem:
$P(A_1 \mid B) = \dfrac{P(A_1)\,P(B \mid A_1)}{P(B)}$
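Q. In a bolt factory, machines A, B & C manufacture 25%, 35% & 40% of the total output respectively. Of their output, 5%, 4% and 2% respectively are defective. A bolt is drawn at random.
(i) Prob that it is defective. [0.0345]
(ii) Prob that the defective bolt is drawn from A. [0.3623]

(i) $P(\text{def}) = P(A)\,P(\text{def} \mid A) + P(B)\,P(\text{def} \mid B) + P(C)\,P(\text{def} \mid C) = \dfrac{25 \times 5 + 35 \times 4 + 40 \times 2}{10000} = \dfrac{125 + 140 + 80}{10000} = \dfrac{345}{10000} = 0.0345$
(ii) $P(A \mid \text{def}) = \dfrac{P(A)\,P(\text{def} \mid A)}{P(\text{def})} = \dfrac{125}{345} \approx 0.3623$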
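The total probability theorem and Bayes' theorem can be sketched together in code. The example assumes machine shares of 25%, 35% and 40% and defect rates of 5%, 4% and 2%, which reproduce the 0.3623 value in the notes; the function name `total_and_posterior` is hypothetical.

```python
def total_and_posterior(priors, likelihoods):
    """Return P(B) by total probability and P(A_k | B) by Bayes' theorem.

    priors[k]      = P(A_k)
    likelihoods[k] = P(B | A_k)
    """
    evidence = sum(priors[k] * likelihoods[k] for k in priors)
    posterior = {k: priors[k] * likelihoods[k] / evidence for k in priors}
    return evidence, posterior

# Assumed values for the bolt-factory problem.
priors = {"A": 0.25, "B": 0.35, "C": 0.40}
defect_rates = {"A": 0.05, "B": 0.04, "C": 0.02}

p_def, post = total_and_posterior(priors, defect_rates)
print(round(p_def, 4))      # P(defective)
print(round(post["A"], 4))  # P(machine A | defective)
```

The posteriors over A, B and C always sum to 1, which is a quick sanity check on any Bayes computation of this kind.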
Q. Consider the following corpus of sentences.
<s> three friends aman, akbar and anthony are reading a book </s>
<s> aman is reading malgudi days </s>
<s> anthony is
Assume a bigram language model. Calculate
P(<s> aman is reading a book </s>)
Q. There are 3 bags B1, B2 & B3. Bag B1 contains 5 red & 5 green balls, bag B2 contains 3 red & 5 green balls, and bag B3 contains 5 red & 3 green balls. Bags B1, B2 & B3 have probabilities 3/10, 3/10 & 4/10 respectively of being chosen. A bag is selected at random and a ball is chosen at random from the bag.
(A) Find the probability that the selected bag is B3, given that the chosen ball is green.
(B) Find the probability that the selected bag is B3 and the chosen ball is green.
(C) Find the probability that the chosen ball is green, given that the selected bag is B3.
(D) Find the prob that the chosen ball is green.
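Ans:
$P(\text{green}) = \frac{3}{10} \times \frac{5}{10} + \frac{3}{10} \times \frac{5}{8} + \frac{4}{10} \times \frac{3}{8} = 0.15 + 0.1875 + 0.15 = 0.4875$
$P(\text{green} \mid B_3) = \frac{3}{8} = 0.375$
$P(B_3 \cap \text{green}) = P(B_3)\,P(\text{green} \mid B_3) = \frac{4}{10} \times \frac{3}{8} = \frac{3}{20} = 0.15$
$P(B_3 \mid \text{green}) = \dfrac{P(B_3 \cap \text{green})}{P(\text{green})} = \dfrac{0.15}{0.4875} = \dfrac{4}{13} \approx 0.3077$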
I
V
Closed class words (stop words / function words): a, an, the, to, is, of, ...
    prepositions - in, on, of, by
    determiners - a, an, the
    pronouns - I, he, his, him, ...
Open class words (carry the info/topic): we keep on adding new words
    noun, verb, adjective, adverbs
Token vs. type
Ex: "will will will" -> 3 tokens, 1 type
TTR (type-token ratio) = no. of types / no. of tokens = 1/3
Q. In a corpus, it was found that the word with rank 4 has a frequency of 600. What can be the ... [rest of the question, which involves the value 300, is not recoverable]
Q. In the sentence "the only thing we have to fear is the fear itself", find TTR.
=> tokens = 11, types = 9 ("the" and "fear" each occur twice), so TTR = 9/11
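A short sketch of the type-token ratio computation, assuming tokens are lower-cased runs of letters (so "The" and "the" count as one type). The function name `type_token_counts` is hypothetical.

```python
import re

def type_token_counts(text):
    """Return (number of types, number of tokens) for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)), len(tokens)

types, tokens = type_token_counts(
    "The only thing we have to fear is the fear itself"
)
ttr = types / tokens
print(types, tokens, round(ttr, 3))

print(type_token_counts("will will will"))
```

A TTR close to 1 means almost every token is a new word; a low TTR means heavy repetition.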
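Q. Let w1 & w2 be 2 words in a corpus. Let the rank of w1 and w2 be 1600 and 400 respectively. Let m1 & m2 represent the no. of meanings of w1 and w2 respectively. The ratio m1 : m2 would tentatively be
=> $m \propto 1/\sqrt{r}$, so $m_1 : m_2 = \sqrt{400} : \sqrt{1600} = 1 : 2$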
Q. Which are true:
(a) Ambiguity can appear in tokenization steps.
(b) Ambiguity will not appear in sentence segmentation step. -> F
(c) Function words are generally used more frequently than content words in any real text. -> T
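Zipf's Law
f: freq of word
r: rank of word (in decreasing order of frequency)
$f \propto 1/r$ => $f \cdot r = k$ (k = constant)
$P_r$: prob of word of rank r
$m \propto 1/\sqrt{r}$, where m: no. of meanings, r: rank of a word
$f \propto 1/l$, where l: length of a word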
Heaps' Law
$|V| = K N^{\beta}$
|V| = size of vocabulary, N = total no. of words

Q. What is the size of unique words in a document where total no. of words is 12000, K = 3.71 & β = 0.69?
$|V| = 3.71 \times (12000)^{0.69} \approx 2421$
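The Heaps' law estimate is a one-line computation. The sketch below plugs in the values from the question (K = 3.71, β = 0.69, N = 12000); the function name is hypothetical.

```python
def heaps_vocabulary_size(n_tokens, k, beta):
    """Heaps' law estimate |V| = K * N^beta of the number of unique words."""
    return k * n_tokens ** beta

v = heaps_vocabulary_size(12000, 3.71, 0.69)
print(round(v))
```

Since β < 1, the vocabulary grows more slowly than the text: doubling N multiplies |V| by only $2^{0.69} \approx 1.61$.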
& second
8. If the first corpus following
has TTR=0.78.
which the
of
corpus
are F.
tendency have
to unique
has more
F (i)1*corpus
words.
(ii) Ind
.....
T ... 4, 22
10-12 slides
Vectorization: conversion of text into vectors
1. BoW (bag of words)
2. Binary BoW

Pre-processing:
① Stop-word removal
② Lowering of the case
③ Stemming & lemmatization
④ Removal of punctuation symbols
⑤ Handling of negation

After removal of stop words, each vocabulary word is a feature and its frequency is the feature value:
awesome -> f1
boy -> f2
dance -> f3
girl
good
movie
·trstsentence awesomeboyarresom danceor a girl
dancer good movie acting
awesome boy
[2
101000]
I
S,: I
L I L 7 1
S2:
in isa
comespresent
do
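A minimal sketch of BoW and Binary BoW vectors over the vocabulary listed above (awesome, boy, dance, girl, good, movie). The two pre-processed sentences are hypothetical stand-ins, since the original example sentences are not legible, and the function names are assumptions as well.

```python
from collections import Counter

def build_vocabulary(docs):
    """Sorted list of unique words across all documents."""
    return sorted({word for doc in docs for word in doc})

def bow_vector(doc, vocab, binary=False):
    """Count vector over vocab; with binary=True, 1 if present else 0."""
    counts = Counter(doc)
    if binary:
        return [1 if counts[w] > 0 else 0 for w in vocab]
    return [counts[w] for w in vocab]

# Hypothetical pre-processed sentences (stop words already removed).
docs = [
    ["awesome", "boy", "awesome", "dance"],
    ["girl", "good", "movie"],
]
vocab = build_vocabulary(docs)

print(vocab)
print(bow_vector(docs[0], vocab))               # counts
print(bow_vector(docs[0], vocab, binary=True))  # presence only
print(bow_vector(docs[1], vocab))
```

Both representations ignore word order; they differ only in whether repeated words (here "awesome") are counted or merely flagged.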
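$TF(t, d) = \dfrac{\text{no. of times term } t \text{ is present in document } d}{\text{total no. of terms in } d}$
$IDF(t) = \log\left(\dfrac{\text{total no. of documents}}{\text{no. of documents containing term } t}\right)$
TF-IDF score: $\text{TF-IDF}(t, d) = TF(t, d) \times IDF(t)$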
Q.
d1: He is a good boy
d2: She is a good girl
d3: Both the boy and the girl are good
Calculate TF-IDF score for the above corpus.

After pre-processing:
d1: good boy
d2: good girl
d3: boy girl good

TF(t, d):
         d1     d2     d3
good     1/2    1/2    1/3
boy      1/2    0      1/3
girl     0      1/2    1/3

IDF(t):
good: log(3/3) = 0
boy:  log(3/2)
girl: log(3/2)

TF-IDF = TF × IDF:
         good   boy              girl
d1       0      (1/2) log(3/2)   0
d2       0      0                (1/2) log(3/2)
d3       0      (1/3) log(3/2)   (1/3) log(3/2)
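The TF-IDF table for this corpus can be reproduced with a short sketch. It assumes the pre-processed documents d1 = "good boy", d2 = "good girl", d3 = "boy girl good", and a base-10 logarithm (the notes do not state the base). The function names are hypothetical.

```python
import math

docs = {
    "d1": ["good", "boy"],
    "d2": ["good", "girl"],
    "d3": ["boy", "girl", "good"],
}

def tf(term, doc):
    """Term frequency: occurrences of term / total terms in the document."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency; the term must occur in at least one doc."""
    n_containing = sum(1 for doc in docs.values() if term in doc)
    return math.log10(len(docs) / n_containing)

def tf_idf(term, doc_id, docs):
    return tf(term, docs[doc_id]) * idf(term, docs)

for doc_id in docs:
    row = {t: round(tf_idf(t, doc_id, docs), 4) for t in ("good", "boy", "girl")}
    print(doc_id, row)
```

"good" occurs in every document, so its IDF and therefore its TF-IDF score is 0 everywhere; only "boy" and "girl" help to tell the documents apart.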
Q. From a website we got 3 reviews of a movie:
R1: This movie is very scary and long.
R2: This movie is not scary and is slow.
R3: This movie is spooky and good.
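TF per review (R1 has 7 words, R2 has 8, R3 has 6) and IDF:
          R1     R2     R3     IDF
this      1/7    1/8    1/6    log(3/3) = 0
movie     1/7    1/8    1/6    log(3/3) = 0
and       1/7    1/8    1/6    log(3/3) = 0
long      1/7    0      0      log(3/1)
not       0      1/8    0      log(3/1)

TF-IDF = TF × IDF, so "this", "movie" and "and" score 0 in every review, while "long" scores (1/7) log 3 in R1 and "not" scores (1/8) log 3 in R2.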
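Spelling correction
"I am writing an email on ... of KIIT."   (the word before "of KIIT" is incorrect -> find the correct one)
Candidates:
· behalf
· behave
· behaviour
Pick the candidate with the min. distance = min no. of operations:
↳ Insertion
↳ Deletion
↳ Substitution
(Levenshtein distance)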
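Minimum edit distance (dynamic programming):
$D(i, j) = \min \begin{cases} D(i-1, j) + 1 \\ D(i, j-1) + 1 \\ D(i-1, j-1) + \begin{cases} 2 & \text{if } x_i \ne y_j \\ 0 & \text{if } x_i = y_j \end{cases} \end{cases}$
[Worked edit-distance table: rows and columns are labelled with "#" followed by the characters of the two strings (one axis reads E X E C U T ...); the cell values are not recoverable.]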
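The dynamic-programming computation of minimum edit distance can be sketched as follows. Insertion and deletion cost 1; the substitution cost is a parameter, assumed here to default to 2 (the variant whose table values grow to 12 for a nine-letter pair), while `sub_cost=1` gives the plain Levenshtein distance. The word pair "intention"/"execution" is an assumption based on the legible axis labels.

```python
def edit_distance(source, target, sub_cost=2):
    """Minimum edit distance between source and target via DP.

    dist[i][j] = cost of turning source[:i] into target[:j].
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i  # delete all i characters
    for j in range(1, cols):
        dist[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost) # substitution
            )
    return dist[-1][-1]

print(edit_distance("intention", "execution"))              # sub cost 2
print(edit_distance("intention", "execution", sub_cost=1))  # Levenshtein
print(edit_distance("behave", "behalf", sub_cost=1))
```

For spelling correction, the candidate with the smallest distance to the misspelled word is proposed as the correction.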
Q. "The ..." -> POS tag for each word:
on -> preposition - IN
a -> determiner - DT
number -> noun - NN
of -> preposition - IN
other -> adjective - JJ
at
D(determine
flight -> NN
from -> IN
Atlanta -> NNP
$P(VB \mid TO) \times P(NR \mid VB) \times P(\text{race} \mid VB)$
$P(\text{race} \mid VB)$ = emission prob.
$P(VB \mid TO)$, $P(NR \mid VB)$ = state transition prob.
P2D.
0
P 1e/T
=
=
independence in 1947.
(2) India got p 0.04
=
Ir
will get 5 emails in next one year.
(3) You
will home
The prime minister
come to
your
0 =0.01 IN
(3) toMosOW.
1=0.002 N
will snow in Delhi in June.
(5) It
holiday next Sunday. p=0.04 In
167 You will not get a
will back.
(7) The dog pr It
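I: Information content
$p(x_i) \uparrow \Rightarrow I(x_i) \downarrow$
$I(x_i) = \log_2 \dfrac{1}{p(x_i)} = -\log_2 p(x_i)$
For independent events: $I(x_i, y_j) = I(x_i) + I(y_j)$

Entropy: $H = \sum_i p(x_i) \log_2 \dfrac{1}{p(x_i)} = -\sum_i p(x_i) \log_2 p(x_i)$
Average length: $L_{avg} = \sum_i p_i\,l_i$

$P(x, y) = p(x)\,p(y \mid x)$
Chain rule of entropy: $H(X, Y) = H(X) + H(Y \mid X)$
If X & Y are independent: $H(Y \mid X) = H(Y)$, so $H(X, Y) = H(X) + H(Y)$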
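Information content and entropy are easy to check numerically. The sketch assumes base-2 logarithms (bits); the function names are hypothetical.

```python
import math

def information_content(p):
    """I(x) = -log2 p(x): rarer events carry more information."""
    return -math.log2(p)

def entropy(probs):
    """H = -sum p log2 p, skipping zero-probability outcomes."""
    return sum(p * information_content(p) for p in probs if p > 0)

print(information_content(0.5))   # fair coin flip
print(information_content(0.25))  # rarer event, more information
print(entropy([0.5, 0.5]))        # fair coin
print(entropy([0.25] * 4))        # four equally likely outcomes

# Independent events: information adds up.
print(information_content(0.5 * 0.25))
```

The last line shows additivity for independent events: the joint event with probability 0.5 × 0.25 carries 1 + 2 = 3 bits.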
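Bayes classification
$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$

Fruit = {yellow, sweet, long}
$P(\text{fruit} \mid \text{orange}) = P(\text{yellow} \mid \text{orange}) \times P(\text{sweet} \mid \text{orange}) \times P(\text{long} \mid \text{orange})$
$P(\text{fruit} \mid \text{banana}) = P(\text{yellow} \mid \text{banana}) \times P(\text{sweet} \mid \text{banana}) \times P(\text{long} \mid \text{banana})$
$P(\text{fruit} \mid \text{others}) = P(\text{yellow} \mid \text{others}) \times P(\text{sweet} \mid \text{others}) \times P(\text{long} \mid \text{others})$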
50 850x200
=0in
(sweetlosange) x
=
P(long lowange)
0
=
:P(fruit/orange) 0
=
P(yellow/bananal=00x 08x =
1
0.75
↑ (sweet/banana 1 =
3870
x0x
=
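$P(y \mid x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$
$\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$
y: dependent variable (class); $x_1, \ldots, x_n$: independent features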
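The arg max rule can be sketched for the fruit example. All priors and likelihoods below are hypothetical illustration values (the figures in the notes are not legible); the only property kept from the notes is that P(long | orange) = 0, which forces the orange score to 0. The function name is an assumption.

```python
def naive_bayes_predict(priors, likelihoods, features):
    """Score each class as P(y) * prod P(x_i | y) and return the arg max."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in features:
            score *= likelihoods[label].get(feature, 0.0)
        scores[label] = score
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical values, for illustration only.
priors = {"banana": 0.5, "orange": 0.3, "others": 0.2}
likelihoods = {
    "banana": {"yellow": 0.9, "sweet": 0.7, "long": 0.8},
    "orange": {"yellow": 1.0, "sweet": 0.5, "long": 0.0},
    "others": {"yellow": 0.25, "sweet": 0.75, "long": 0.5},
}

label, scores = naive_bayes_predict(
    priors, likelihoods, ["yellow", "sweet", "long"]
)
print(label)
print(scores)
```

The scores are unnormalised: the common denominator P(x_1, ..., x_n) is the same for every class, so it can be dropped when only the arg max is needed. A single zero likelihood wipes out a class entirely, which is why smoothing is often added in practice.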