
Seventh Semester B.E. Degree Examination, June/July 2019
Machine Learning
Time: 3 hrs.                                                     Max. Marks: 80
Note: Answer any FIVE full questions, choosing ONE full question from each module.

Module - 1
1. a. Define machine learning. Describe the steps in designing a learning system.
(08 Marks)
Ans. A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E.
Steps involved in designing a learning system are:
1. Choosing the training experience:-
The first choice is to choose the type of training experience from which the system
will learn.

The type of training experience available can have a significant impact on the success
or failure of the learner.
Example:- In learning to play checkers:
Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: games played against itself.
In order to complete the design of the learning system we must now choose
1. The exact type of knowledge to be learned
2. A representation for this target knowledge
3. A learning mechanism
2. Choosing the target function:-
The next design choice is to determine exactly what type of knowledge will be
learned and how this will be used by the performance program. Let us begin with a
checkers-playing program that can generate the legal moves from any board state. The
program needs only to learn how to choose the best move among these legal moves.
The choice for the type of information to be learned is therefore a function that chooses
the best move for any given board state.
Let us call this function ChooseMove and use the notation ChooseMove : B → M to
indicate that this function accepts as input any board from the set of legal board states
B and produces as output some move from the set of legal moves M.
ChooseMove is an obvious choice for the target function, but this function is difficult
to learn.
An alternative target function, which turns out to be easier to learn, is an evaluation
function that assigns a numerical score to any given board state.
Let us call this target function V and again use the notation V : B → R.

1. If b is a final board state that is won, V(b) = 100
2. If b is a final board state that is lost, V(b) = -100
3. If b is a final board state that is drawn, V(b) = 0
4. If b is not a final state in the game, then V(b) = V(b'), where b' is the best final
board state that can be achieved starting from b and playing optimally until the end of
the game.
3. Choosing a representation for the target function:-
Now that we have specified the ideal target function V, we must choose a representation
that the learning program will use to describe the function V̂ that it will learn.
V̂ could, for example, be represented as a quadratic polynomial function of predefined
board features; here we pick an expressive yet simple representation that allows as close
an approximation as possible to the ideal target function V.
V̂ will be calculated as a linear combination of the following board features:
x1: number of black pieces on the board
x2: number of red pieces on the board
x3: number of black kings on the board
x4: number of red kings on the board
x5: number of black pieces threatened by red
x6: number of red pieces threatened by black
Thus the learning program will represent V̂(b) as a linear function of the form
V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
4. Choosing a function approximation algorithm:-
In order to learn the target function V̂ we require a set of training examples, each
describing a specific board state b and a training value V_train(b) for b.
In other words, each training example is an ordered pair of the form (b, V_train(b)).
For instance, the following training example describes a board state b in which black
has won the game, and the target function value V_train(b) is therefore +100:
((x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0), +100)

Now adjust the weights w_i to best fit these training examples.



Estimating training values:-
For estimating the training values for intermediate board states, one simple approach
is the following rule:
V_train(b) ← V̂(Successor(b))

Adjusting the weights:-
Specify the learning algorithm for choosing the weights w_i to best fit the set of
training examples {(b, V_train(b))}.
One common approach is to define the best hypothesis as the one that minimizes the
squared error E between the training values and the values predicted by the hypothesis:
E ≡ Σ_{(b, V_train(b)) ∈ training examples} (V_train(b) - V̂(b))²
Minimizing this sum of squared errors is the criterion used for adjusting the weights.

Several algorithms are known for finding the weights of a linear function that minimize E;
one such algorithm is the least mean squares (LMS) algorithm.
LMS algorithm:-
For each training example (b, V_train(b)):
• Use the current weights to calculate V̂(b)
• For each weight w_i, update it as
  w_i ← w_i + η (V_train(b) - V̂(b)) x_i
where η is a small constant (for example 0.1) that moderates the size of the weight
update.
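The LMS update above can be sketched in a few lines of Python. This is an illustration
rather than part of the original answer; the feature vector, learning rate and board
example are chosen only to show the mechanics.

```python
import numpy as np

def lms_update(weights, features, v_train, lr=0.1):
    """One LMS step: nudge the weights toward the training value for this board.

    weights  -- current weight vector (w0..w6); w0 pairs with a constant feature 1
    features -- board feature vector (1, x1..x6) for board state b
    v_train  -- training value V_train(b), e.g. V_hat(Successor(b))
    lr       -- learning rate eta, a small constant such as 0.1
    """
    v_hat = np.dot(weights, features)          # current estimate V_hat(b)
    error = v_train - v_hat                    # (V_train(b) - V_hat(b))
    return weights + lr * error * features     # w_i <- w_i + eta * error * x_i

# Example: the winning board from above, x = (x1=3, x2=0, x3=1, x4=0, x5=0, x6=0)
w = np.zeros(7)
x = np.array([1.0, 3, 0, 1, 0, 0, 0])          # leading 1 is the bias feature for w0
w = lms_update(w, x, v_train=100.0)
```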
5. The final design:-
The final design of the checkers learning system can be naturally described by four
distinct program modules that represent the central components in many learning
systems: the Performance System, the Critic, the Generalizer and the Experiment
Generator.
[Figure: Final design of the checkers learning program. The Experiment Generator
produces a new problem (initial game board) for the Performance System; the
Performance System outputs a solution trace (game history) to the Critic; the Critic
produces training examples {(b1, V_train(b1)), (b2, V_train(b2)), ...} for the
Generalizer; and the Generalizer passes the learned hypothesis back to the Experiment
Generator.]


[Fig: Summary of choices in designing the checkers learning program - determine the
type of training experience (e.g. games against experts, games against self, table of
correct moves); determine the target function; determine a representation of the
learned function (e.g. linear function of six features, artificial neural network);
determine the learning algorithm (e.g. gradient descent, linear programming);
completed design.]
b. Write Find-S algorithm and explain with example. (04 Marks)
Ans. Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
   For each attribute constraint a_i in h:
      If the constraint a_i is satisfied by x, then do nothing
      Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output the hypothesis h
Example:
Step 1:- Initialize h to the most specific hypothesis in H
h ← (φ, φ, φ, φ, φ, φ)

Example  Sky    Air Temp  Humidity  Wind    Water  Forecast  Enjoy Sport
x1       Sunny  Warm      Normal    Strong  Warm   Same      Yes
x2       Sunny  Warm      High      Strong  Warm   Same      Yes
x3       Rainy  Cold      High      Strong  Warm   Change    No
x4       Sunny  Warm      High      Strong  Cool   Change    Yes

Consider the first positive example x1 = (Sunny, Warm, Normal, Strong, Warm, Same):
h1 ← (Sunny, Warm, Normal, Strong, Warm, Same)
Resultant h, where x is the sample and h is the hypothesis.
Step 2:- x2 = (Sunny, Warm, High, Strong, Warm, Same)
Compare x2 and h1:
x1 = (Sunny, Warm, Normal, Strong, Warm, Same)
x2 = (Sunny, Warm, High, Strong, Warm, Same)
h2 = (Sunny, Warm, ?, Strong, Warm, Same)
Step 3:- x3 = (Rainy, Cold, High, Strong, Warm, Change) - No
In the Find-S algorithm we ignore negative samples,
hence keep the previous (Step 2) hypothesis:
h3 = (Sunny, Warm, ?, Strong, Warm, Same)
So h3 and h2 are the same.
Step 4:- Consider sample 4:
x4 = (Sunny, Warm, High, Strong, Cool, Change)
Compare x4 and h3; wherever the values differ, replace the attribute with ?:
h3 = (Sunny, Warm, ?, Strong, Warm, Same)
x4 = (Sunny, Warm, High, Strong, Cool, Change)
h4 = (Sunny, Warm, ?, Strong, ?, ?)
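The Find-S trace above can be sketched in Python. This is an illustration rather than
part of the original answer; the string 'phi' stands for the most specific constraint φ.

```python
def find_s(examples):
    """Find-S: return the maximally specific hypothesis consistent with the
    positive examples. Each example is (attribute_tuple, label)."""
    n = len(examples[0][0])
    h = ['phi'] * n                       # step 1: most specific hypothesis
    for x, label in examples:
        if label != 'Yes':                # negative examples are ignored
            continue
        for i, value in enumerate(x):
            if h[i] == 'phi':
                h[i] = value              # first positive example: copy it
            elif h[i] != value:
                h[i] = '?'                # generalize the mismatching attribute
    return h

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), 'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```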
c. Explain the list-then-eliminate algorithm? (04 Marks)
Ans. Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example (x, c(x)), remove from VersionSpace any hypothesis h
for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
OR
2. a. List out any 5 applications of machine learning? (05 Marks)
Ans. Applications of machine learning include:
1. Web search engine:- One of the reasons why search engines like Google, Bing etc.
work so well is because the systems have learnt how to rank pages through a complex
learning algorithm.
2. Photo tagging applications:- Be it Facebook or any other photo tagging application,
the ability to tag friends makes it even more happening; it is all possible because of
a face recognition algorithm that runs behind the application.
3. Spam detector:- Our mail agent like Gmail or Hotmail does a lot of hard work for
us in classifying the mails and moving spam mails to the spam folder.
4. Database mining for growth of automation:-
Typical applications include web click data for better UX (user experience), medical
records for better automation in health care, biological data, and many more.
5. Machine learning is an application of artificial intelligence that provides systems
the ability to automatically learn and improve from experience without being
explicitly programmed.
b. What do you mean by hypothesis space, instance space, and version space?
(03 Marks)
Ans. Hypothesis space:- A hypothesis h in H is described by a conjunction of constraints
on the attributes.
Each constraint may be "?" (any value is acceptable), "φ" (no value is acceptable), or
a specific value. The hypothesis space H is the set of all such hypotheses.
Instance space:- The instance space is the set of all possible examples.
Version space:- The version space, denoted VS_{H,D}, with respect to hypothesis
space H and training examples D, is the subset of hypotheses from H consistent with
the training examples in D:
VS_{H,D} = {h ∈ H | Consistent(h, D)}
c. Find the maximally general hypotheses and maximally specific hypotheses for
the training examples given in the table using the candidate elimination algorithm.

Day  Sky    Air Temp  Humidity  Wind    Water  Forecast  Enjoy Sport
1    Sunny  Warm      Normal    Strong  Warm   Same      Yes
2    Sunny  Warm      High      Strong  Warm   Same      Yes
3    Rainy  Cold      High      Strong  Warm   Change    No
4    Sunny  Warm      High      Strong  Cool   Change    Yes
(08 Marks)
Ans. Initialization:-
S0 = {φ, φ, φ, φ, φ, φ}
G0 = {?, ?, ?, ?, ?, ?}
1. The first instance is positive (Yes), so we need to generalize S:
S1 = {Sunny, Warm, Normal, Strong, Warm, Same}
G1 = {?, ?, ?, ?, ?, ?}
2. Further generalize S with the 2nd positive instance
x2 = {Sunny, Warm, High, Strong, Warm, Same}:
S2 = {Sunny, Warm, ?, Strong, Warm, Same}
G2 = {?, ?, ?, ?, ?, ?}
Writing S in this way is called writing the minimal generalization of the hypothesis.
3. The next instance is negative; for a negative instance we write G, the minimal
specializations that are consistent with the negative sample.
The instance is x3 = {Rainy, Cold, High, Strong, Warm, Change}.
If d is a negative sample, remove from S any hypothesis inconsistent with d; since S2
is already consistent with it, keep the previous S value:
S3 = S2 = {Sunny, Warm, ?, Strong, Warm, Same}
G3 = {<Sunny, ?, ?, ?, ?, ?>
      <?, Warm, ?, ?, ?, ?>
      <?, ?, ?, ?, ?, Same>}
4. The next instance is positive:
x4 = {Sunny, Warm, High, Strong, Cool, Change}
S3 = {Sunny, Warm, ?, Strong, Warm, Same}
S4 = {Sunny, Warm, ?, Strong, ?, ?}
Now write G. The previous
G3 = {<Sunny, ?, ?, ?, ?, ?>
      <?, Warm, ?, ?, ?, ?>
      <?, ?, ?, ?, ?, Same>}
contains the member <?, ?, ?, ?, ?, Same>, which is inconsistent with x4
(Forecast = Change), so it is removed:
G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
S4 = {Sunny, Warm, ?, Strong, ?, ?}
Module - 2
3. a. Construct a decision tree for the following data using the ID3 algorithm.

Day  A1     A2    A3      Classification
1    True   Hot   High    No
2    True   Hot   High    No
3    False  Hot   High    Yes
4    False  Cool  Normal  Yes
5    False  Cool  Normal  Yes
6    True   Cool  High    No
7    True   Hot   High    No
8    True   Hot   Normal  Yes
9    False  Cool  Normal  Yes
10   False  Cool  High    No
(16 Marks)

Ans. S is a collection of 10 examples of some boolean concept (Yes or No), including 5 -ve
and 5 +ve samples.
Entropy(S) = -P⊕ log2 P⊕ - P⊖ log2 P⊖
where P⊕ = (number of positive instances) / (total number of instances).
Entropy(S) = Entropy([5+, 5-]) = -(5/10) log2(5/10) - (5/10) log2(5/10)
= -0.5 log2 0.5 - 0.5 log2 0.5
= -[0.5 × (-1)] - [0.5 × (-1)]
= 0.5 + 0.5
Entropy(S) = 1
Note:- the entropy is 1 when the collection contains an equal number of +ve and -ve examples.
Attribute A1 takes the values True and False. Classifying the instances of each
attribute value into +ve and -ve:
S = [5+, 5-]
S_true ← [1+, 4-]
S_false ← [4+, 1-]
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

For attribute A1:
Gain(S, A1) = Entropy(S) - (|S_true| / |S|) Entropy(S_true) - (|S_false| / |S|) Entropy(S_false)   ...(1)

Now calculate the entropy of S_true:
Entropy(S_true) = -(1/5) log2(1/5) - (4/5) log2(4/5)
= 0.2 × 2.32 + 0.8 × 0.32
= 0.464 + 0.256
= 0.72

Entropy(S_false) = -(4/5) log2(4/5) - (1/5) log2(1/5)
= 0.8 × 0.32 + 0.2 × 2.32
= 0.72


Update these entropy values in equation (1):
Gain(S, A1) = 1 - (5/10)(0.72) - (5/10)(0.72)
= 1 - (0.5)(0.72) - (0.5)(0.72)
= 1 - (0.36) - (0.36)
= 0.28
Gain(S, A1) = 0.28
Calculate Gain(S, A2):
A2 has the values Hot and Cool:
S_hot ← [2+, 3-], S_cool ← [3+, 2-]
Gain(S, A2) = Entropy(S) - (|S_hot| / |S|) Entropy(S_hot) - (|S_cool| / |S|) Entropy(S_cool)
Entropy(S_hot) = -(2/5) log2(2/5) - (3/5) log2(3/5)
= 0.4 × 1.321 + 0.6 × 0.73
= 0.528 + 0.438
= 0.966
Entropy(S_cool) = -(3/5) log2(3/5) - (2/5) log2(2/5)
= 0.6 × 0.73 + 0.4 × 1.321
= 0.438 + 0.528
= 0.966
Gain(S, A2) = 1 - (5/10)(0.966) - (5/10)(0.966)
= 1 - (0.483) - (0.483)
= 0.034
Gain(S, A2) = 0.034
Calculate Gain(S, A3):
A3 has the values High and Normal:
S_high ← [1+, 5-], S_normal ← [4+, 0-]
Entropy(S_high) = -(1/6) log2(1/6) - (5/6) log2(5/6)
= 0.166 × 2.58 + 0.83 × 0.263
= 0.429 + 0.218
= 0.647
Entropy(S_normal) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0
Gain(S, A3) = 1 - (6/10)(0.647) - (4/10)(0)
= 1 - (0.6)(0.647)
= 1 - 0.388
= 0.612
Summary:
Gain(S, A1) = 0.28
Gain(S, A2) = 0.034
Gain(S, A3) = 0.612
The attribute with the highest gain, A3, becomes the root node.

[Figure: The decision tree for the given training examples.]
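The entropy and information-gain calculations above can be reproduced with a short
Python sketch. This is an illustration and not part of the original answer; the data are
the ten training examples from the question.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions in S."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values() if c > 0)

def info_gain(rows, labels, attr_index):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)."""
    total = len(labels)
    gain = entropy(labels)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr_index], []).append(label)
    for subset in by_value.values():
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# The ten training examples (A1, A2, A3) with their classifications
rows = [("True","Hot","High"), ("True","Hot","High"), ("False","Hot","High"),
        ("False","Cool","Normal"), ("False","Cool","Normal"), ("True","Cool","High"),
        ("True","Hot","High"), ("True","Hot","Normal"), ("False","Cool","Normal"),
        ("False","Cool","High")]
labels = ["No","No","Yes","Yes","Yes","No","No","Yes","Yes","No"]

for i, name in enumerate(["A1", "A2", "A3"]):
    print(name, round(info_gain(rows, labels, i), 3))   # ~0.278, ~0.029, ~0.610
```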

4. a. Explain the concept of decision tree learning. Discuss the necessary measure
required to select attributes for building a decision tree using the ID3 algorithm.
(08 Marks)
Ans. Decision tree learning is a method for approximating discrete-valued target functions,
in which the learned function is represented by a decision tree.
Learned trees can also be re-represented as sets of if-then rules to improve human
readability. These learning methods are among the most popular of inductive inference
algorithms.
Decision trees classify instances by sorting them down the tree from the root to some
leaf node, which provides the classification of the instance. Each node in the tree
specifies a test of some attribute of the instance, and each branch descending from
that node corresponds to one of the possible values for this attribute.
An instance is classified by starting at the root node of the tree, testing the attribute
specified by this node, then moving down the tree branch corresponding to the value
of the attribute in the given example. This process is then repeated for the subtree
rooted at the new node.

The necessary measure to select the attribute is based on the ID3 algorithm.
ID3 decides which attribute to test at each node in the tree using a statistical property
called information gain, which measures how well a given attribute separates the
training examples according to their target classification.
Information gain is based on entropy.
Entropy characterizes the impurity of an arbitrary collection of examples.
Given a collection S, containing positive and negative examples of some target
concept, the entropy of S relative to this boolean classification is
Entropy(S) = -P⊕ log2 P⊕ - P⊖ log2 P⊖
More generally, for c classes,
Entropy(S) = Σ_{i=1}^{c} -P_i log2 P_i
where P_i is the proportion of S belonging to class i. The logarithm is still base 2 because
entropy is a measure of the expected encoding length measured in bits.
Information Gain:-
The measure we use for selecting the attributes of the training data is information gain:
the expected reduction in entropy caused by partitioning the examples according
to the attribute.
Information gain, Gain(S, A), of an attribute A, relative to a collection of examples
S, is defined as
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
. . .
b. Discuss the issues of avoiding overfitting the data, handling continuous data
and missing values in decision trees. (08 Marks)

Avoiding overfitting:-
Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if
there exists some alternative hypothesis h' ∈ H, such that h has smaller error than h'
over the training examples, but h' has a smaller error than h over the entire
distribution of instances.
The figure illustrates the impact of overfitting in a typical application of decision tree
learning.
[Fig: Overfitting in decision tree learning - accuracy (roughly 0.5 to 0.9) plotted
against the size of the tree (0 to 100) for the two curves "On training data" and
"On test data".]
Handling continuous data:-
By its definition, ID3 takes a discrete set of values; the target attribute whose value is
predicted by the learned tree must be discrete valued.
The attributes tested in the decision nodes of the tree must also be discrete valued.
This can be accomplished by dynamically defining new discrete-valued attributes
that partition the continuous attribute value into a discrete set of intervals.
For an attribute A that is continuous valued, the algorithm can dynamically create a new
boolean attribute A_c that is true if A < c and false otherwise.
Missing values in decision trees:-
In some cases, the available data may be missing values for some attributes.
For example, in a medical domain in which we wish to predict patient outcome based
on various lab tests, it may be that the blood-test result is available only for
a subset of the patients.
In such cases it is common to estimate the missing attribute value based on the other
examples for which this attribute has a known value.
A second, more complex procedure is to assign a probability to each of the possible
values of A rather than simply assigning the most common value to A(x).
Module - 3
5. a. Explain artificial neural network based on the perceptron concept with a diagram?
(06 Marks)
Ans. Perceptron:- One type of artificial neural network system is based on a unit called a
perceptron and is illustrated in the figure.
A perceptron takes a vector of real-valued inputs, calculates a linear combination
of these inputs, then outputs 1 if the result is greater than some threshold and -1
otherwise.

[Figure: Perceptron - real-valued inputs x1...xn with weights w1...wn (and bias w0)
are summed and passed through a threshold unit.]
We will sometimes write the perceptron function as
o(x) = sgn(w · x)
where sgn(y) = 1 if y > 0, and -1 otherwise.
Learning a perceptron involves choosing values for the weights w0, ..., wn.
Therefore the space H of candidate hypotheses considered in perceptron learning is
the set of all possible real-valued weight vectors
H = {w | w ∈ R^(n+1)}
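A minimal Python sketch of the perceptron output function described above. It is an
illustration, not part of the original answer; the AND example weights are the usual
textbook choice and are used only to show the thresholding.

```python
import numpy as np

def perceptron_output(weights, x):
    """o(x) = sgn(w . x): weights[0] is the bias w0 (paired with a constant input 1)."""
    activation = weights[0] + np.dot(weights[1:], x)
    return 1 if activation > 0 else -1

# Example: a perceptron computing logical AND of two binary (0/1) inputs
w = np.array([-0.8, 0.5, 0.5])                   # w0 = -0.8, w1 = w2 = 0.5
print(perceptron_output(w, np.array([1, 1])))    #  1
print(perceptron_output(w, np.array([0, 1])))    # -1
```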
h, Wh:it is i:r:uli,•nt ,lcsi•cnl and 1il'itu n1lc'! Why ,t:itistil' :ippro~i,u:,tion to
~r:uli,•nt dcs,·cnt i, 11cc1icd, (04 M:,rl~•)
Ans. c;nutil'nt ,ksccnt rule:- 1.nch 1, ,,ining c~:11npk i•, a pai, 111 1hc li11111 ( >.I) whc1c x
is Iill' \ eelIll , , f iII P" I \ I,I UC~ ,111,I ii ,~ 1hc 1,111•cl 11111 pul v:, Inc. 11 is Ihe lc:u II iII!; ,:,le

(lkll
c:-.:1111pll':-
u ru k :-0.5I he
) deII:, 111 le is hcsl 1111dc"'' 11111 h) c1111, idc1i11g Ihe l,tsk 111 I, ain inii an

tlu csh11ldcd pc1L·cpl 11111 ,.c. a linc:11 1111 ii 1111 1d1irh 1hc ,,111p111 is givc11 h)

o(,) ,,.:\
I hus a liucar unit c1111 cs ponds 111 lirst st:,gc ul a pcrccpil<JII without 1hc tlue;hold.
I hm· :11 c pract ica I dillic11 ltics in applying g, ad icnl descent arc:
I, Cunl'crging to a lucal mini1num can sometimes be slow, it re4uires thousands of

gradient
~- Ir thesedcs~cnl steps and
arc n1ultiplc local minimal in the error surface then there is 110 guarantee
that the pro~l:du1c \\ ill lind glulntl lllinimulll.
One c,m1mu11 v:11-i.ition 011 gradient descent intended to overcome dilliculties arc

statisti~ gradient descent


Where as gradient descent training rule presented in equation

L\W, L(t, 1 - oJx,,1 (1) 35


computes weight updates after summing over all training examples in D,
the idea behind stochastic gradient descent is to approximate this gradient descent
search by updating the weights incrementally, following the calculation of the error
for each individual training example.
This is done by modifying the training rule in equation (1) to
Δw_i = η (t - o) x_i
where t is the target value, o is the output of the unit, and x_i is the i-th input for the
training example.
To modify the gradient descent algorithm to implement this stochastic approximation,
the batch update
w_i ← w_i + Δw_i, with Δw_i = η Σ_{d ∈ D} (t_d - o_d) x_id,
is replaced by
w_i ← w_i + η (t - o) x_i
One way to view this stochastic gradient descent is to consider a distinct error
function E_d(w) defined for each individual training example d as follows:
E_d(w) = (1/2) (t_d - o_d)²
where t_d is the target value and o_d is the output of the unit for training example d.
Stochastic gradient descent iterates over the training examples d in D; at each
iteration it alters the weights according to the gradient with respect to E_d(w).
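The incremental (stochastic) delta rule can be sketched in Python as follows. This is
an illustration, not part of the original answer; the toy data, learning rate and number
of epochs are arbitrary choices.

```python
import numpy as np

def stochastic_gradient_descent(X, t, lr=0.05, epochs=50):
    """Incremental (stochastic) delta rule for a linear unit o(x) = w . x.

    X -- training inputs, one example per row
    t -- target output values
    Weights are updated after every individual example:
        w_i <- w_i + lr * (t_d - o_d) * x_id
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_d, t_d in zip(X, t):
            o_d = np.dot(w, x_d)          # linear unit output
            w += lr * (t_d - o_d) * x_d   # per-example weight update
    return w

# Toy usage: learn t = 2*x1 - x2 from a few examples
X = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 3.0], [1.5, 2.0]])
t = 2 * X[:, 0] - X[:, 1]
print(stochastic_gradient_descent(X, t))   # approaches [2, -1]
```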
c. Describe multilayer neural networks. Explain why the backpropagation algorithm
is required. (06 Marks)
Ans. Multilayer neural network:
[Figure: Multilayer neural network - the input data feeds the input layer, which is
followed by a 1st hidden layer, a 2nd hidden layer, and the output layer.]
Multilayer networks learned by the backpropagation algorithm are capable of
expressing a rich variety of nonlinear decision surfaces.
The backpropagation algorithm learns the weights for a multilayer network.
Given a network with a fixed set of units and interconnections, it employs gradient
descent to attempt to minimize the squared error between the network output values
and the target values for these outputs.
Because we are considering networks with multiple output units rather than a single
output unit as before, we begin by redefining E to sum the errors over all the network
output units:
E(w) ≡ (1/2) Σ_{d ∈ D} Σ_{k ∈ outputs} (t_kd - o_kd)²
where outputs is the set of output units in the network, and t_kd and o_kd are the
target and output values associated with the k-th output unit and training example d.
The learning problem faced by backpropagation is to search a large hypothesis
space defined by all possible weight values for all the units in the network.
OR
6. a. Derive the backpropagation rule considering the output layer and the training rule
for the output unit weights. (08 Marks)
Ans. Backpropagation rule, considering the output layer (training rule for output unit
weights): a weight w_ji can influence the rest of the network only through net_j, and
net_j can influence the network only through o_j. Therefore we can invoke the chain
rule to write
∂E_d/∂net_j = (∂E_d/∂o_j)(∂o_j/∂net_j)    ...(1)
To begin, consider the first term ∂E_d/∂o_j:
∂E_d/∂o_j = ∂/∂o_j [ (1/2) Σ_{k ∈ outputs} (t_k - o_k)² ]
The derivatives ∂/∂o_j (t_k - o_k)² will be 0 for all output units k except when k = j.
We therefore drop the summation over output units and simply set k = j:
∂E_d/∂o_j = ∂/∂o_j (1/2)(t_j - o_j)²
          = (1/2) · 2 (t_j - o_j) · ∂(t_j - o_j)/∂o_j      (on differentiating)
          = (t_j - o_j) · (-1)
∂E_d/∂o_j = -(t_j - o_j)    ...(2)
Now the second term in equation (1): ∂o_j/∂net_j is just the derivative of the sigmoid
function σ(net_j), which we have already noted is equal to σ(net_j)(1 - σ(net_j)).
Therefore
∂o_j/∂net_j = o_j (1 - o_j)    ...(3)
Substituting expressions (2) and (3) into equation (1), we obtain
∂E_d/∂net_j = (∂E_d/∂o_j)(∂o_j/∂net_j) = -(t_j - o_j) o_j (1 - o_j)    ...(4)
Combining this with the stochastic gradient descent rule
Δw_ji = -η ∂E_d/∂w_ji
gives the training rule for output unit weights:
Δw_ji = η (t_j - o_j) o_j (1 - o_j) x_ji    ...(5)
Note this training rule is exactly the weight update rule implemented by the equations
δ_k ← o_k (1 - o_k)(t_k - o_k)  and  Δw_ji = η δ_j x_ji.
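A minimal Python sketch of the resulting output-unit update. It is an illustration, not
part of the original answer; the weights, inputs and target in the usage example are
hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_unit_update(w_j, x, t_j, lr=0.1):
    """One stochastic update of the weights feeding output unit j.

    Implements delta_j = o_j (1 - o_j)(t_j - o_j) and w_ji <- w_ji + lr * delta_j * x_i,
    i.e. the rule derived above.
    """
    o_j = sigmoid(np.dot(w_j, x))                # unit output o_j = sigma(net_j)
    delta_j = o_j * (1.0 - o_j) * (t_j - o_j)    # error term for the output unit
    return w_j + lr * delta_j * x

# Hypothetical example: 3 inputs (the first acting as a bias of 1), target 1.0
w = np.array([0.1, -0.2, 0.05])
x = np.array([1.0, 0.4, 0.7])
w = output_unit_update(w, x, t_j=1.0)
```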
b. What is a squashing function and why is it needed? (04 Marks)
Ans. A squashing function is essentially defined as a function that squashes its input to
one of the ends of a small interval.
In neural networks, these can be used at nodes in a hidden layer to squash the output.
They serve an important role in neural networks, and there are several squashing
functions in use.
c. List out and explain in brief the representational power of feedforward networks?
(04 Marks)
Ans. These quite general results are known for the representational power of feedforward
networks:
1. Boolean functions:- Every boolean function can be represented exactly by some
network with two layers of units, although the number of hidden units required grows
exponentially in the worst case with the number of network inputs.
Consider the following general scheme for representing an arbitrary boolean function:
for each possible input vector, create a distinct hidden unit and set its weights so that
it activates if and only if this specific vector is input to the network. This produces a
hidden layer that will always have exactly one unit active.
2. Continuous functions:- Every bounded continuous function can be approximated
with arbitrarily small error by a network with two layers of units. The theorem in this
case applies to networks that use sigmoid units in the hidden layer and linear units in
the output layer.
3. Arbitrary functions:- Any function can be approximated to arbitrary accuracy by a
network with three layers of units; the output layer uses linear units, the two hidden
layers use sigmoid units, and the number of units required at each layer is not known
in general.
Module - 4
7. a. Explain maximum a posteriori (MAP) hypothesis using Bayes theorem.
(06 Marks)
Ans. Refer Q7(a), Dec 2018/Jan 2019.
b. Estimate the conditional probabilities of each attribute {colour, legs, height,
smelly} for the species classes {M, H} using the data given in the table. Using
these probabilities estimate the probability values for the new instance (colour =
green, legs = 2, height = tall and smelly = no). (10 Marks)

No  Colour  Legs  Height  Smelly  Species
1   White   3     Short   Yes     M
2   Green   2     Tall    No      M
3   Green   3     Short   Yes     M
4   White   3     Short   Yes     M
5   Green   2     Short   No      H
6   White   2     Tall    No      H
7   White   2     Tall    No      H
8   White   2     Short   Yes     H

Ans. New instance: (colour = green, legs = 2, height = tall and smelly = no).
Our task is to predict the target value (M or H) of the target concept Species for this
new instance. The target value v_NB is given by
v_NB = argmax_{vj ∈ {M,H}} P(vj) Π_i P(a_i | vj)
     = argmax_{vj} P(vj) P(colour = green | vj) P(legs = 2 | vj) P(height = tall | vj) P(smelly = no | vj)   ...(1)
The probabilities of the different target values can easily be estimated based on their
frequencies over the 8 training examples:
P(species = M) = 4/8 = 0.5
P(species = H) = 4/8 = 0.5
Similarly, we can estimate the conditional probabilities:
1. Colour:
P(colour = white | species = M) = 2/4 = 0.5
P(colour = white | species = H) = 3/4 = 0.75
P(colour = green | species = M) = 2/4 = 0.5
P(colour = green | species = H) = 1/4 = 0.25
2. Legs:
P(legs = 3 | species = M) = 3/4 = 0.75
P(legs = 3 | species = H) = 0/4 = 0
P(legs = 2 | species = M) = 1/4 = 0.25
P(legs = 2 | species = H) = 4/4 = 1
3. Height:
P(height = short | species = M) = 3/4 = 0.75
P(height = short | species = H) = 2/4 = 0.5
P(height = tall | species = M) = 1/4 = 0.25
P(height = tall | species = H) = 2/4 = 0.5
4. Smelly:
P(smelly = yes | species = M) = 3/4 = 0.75
P(smelly = yes | species = H) = 1/4 = 0.25
P(smelly = no | species = M) = 1/4 = 0.25
P(smelly = no | species = H) = 3/4 = 0.75
For the new instance (colour = green, legs = 2, height = tall, smelly = no):
P(M) P(green | M) P(legs = 2 | M) P(tall | M) P(no | M)
= 0.5 × 0.5 × 0.25 × 0.25 × 0.25 = 0.0039
P(H) P(green | H) P(legs = 2 | H) P(tall | H) P(no | H)
= 0.5 × 0.25 × 1 × 0.5 × 0.75 = 0.0469
Since 0.0469 > 0.0039, the naive Bayes classifier assigns the target value species = H
to this new instance, based on the probability estimates learned from the training data:
(colour = green, legs = 2, height = tall and smelly = no) → species = H
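The estimate above can be reproduced with a short Python sketch. It is an illustration
rather than part of the original answer; it simply recounts the frequencies from the
table.

```python
# Training data from the table above: (colour, legs, height, smelly) -> species
data = [
    (("White", 3, "Short", "Yes"), "M"), (("Green", 2, "Tall",  "No"),  "M"),
    (("Green", 3, "Short", "Yes"), "M"), (("White", 3, "Short", "Yes"), "M"),
    (("Green", 2, "Short", "No"),  "H"), (("White", 2, "Tall",  "No"),  "H"),
    (("White", 2, "Tall",  "No"),  "H"), (("White", 2, "Short", "Yes"), "H"),
]

def naive_bayes_score(instance, species):
    """P(species) * product over attributes of P(attribute value | species)."""
    rows = [x for x, label in data if label == species]
    score = len(rows) / len(data)                       # prior P(species)
    for i, value in enumerate(instance):
        match = sum(1 for r in rows if r[i] == value)
        score *= match / len(rows)                      # conditional P(a_i | species)
    return score

new = ("Green", 2, "Tall", "No")
scores = {s: naive_bayes_score(new, s) for s in ("M", "H")}
print(scores)                       # {'M': 0.0039..., 'H': 0.0468...}
print(max(scores, key=scores.get))  # 'H'
```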
OR
8. a. Explain Naive Bayes classifier and Bayesian belief networks. (10 Marks)
Ans. Naive Bayes classifier
The naive Bayes classifier applies to learning tasks where each instance x is
described by a conjunction of attribute values and where the target function f(x) can
take on any value from some finite set V.
A set of training examples of the target function is provided, and a new instance is
presented, described by the tuple of attribute values (a1, a2, ..., an).
The learner is asked to predict the target value, or classification, for the new instance.
The Bayesian approach to classifying the new instance is to assign the most probable
target value v_MAP, given the attribute values (a1, a2, ..., an):
v_MAP = argmax_{vj ∈ V} P(vj | a1, a2, ..., an)
We can use Bayes theorem to rewrite this expression:
v_MAP = argmax_{vj ∈ V} P(a1, a2, ..., an | vj) P(vj) / P(a1, a2, ..., an)
      = argmax_{vj ∈ V} P(a1, a2, ..., an | vj) P(vj)    ...(1)
P(vj) is estimated by counting the frequency with which each target value vj occurs
in the training data.

Under the naive Bayes assumption that the attribute values are conditionally
independent given the target value, this becomes the naive Bayes classifier:
v_NB = argmax_{vj ∈ V} P(vj) Π_i P(a_i | vj)
where v_NB denotes the target value output by the naive Bayes classifier.
Bayesian Belief Networks
A Bayesian belief network describes the probability distribution governing a set of
variables by specifying a set of conditional independence assumptions along with a
set of conditional probabilities.
In general, a Bayesian belief network describes the probability distribution over a
set of variables. Consider an arbitrary set of random variables Y1, ..., Yn, where each
variable Yi can take on the set of possible values V(Yi).
Conditional independence
Let us begin our discussion of Bayesian belief networks by defining the notion of
conditional independence.
Let X, Y, Z be three discrete-valued random variables.
We say that X is conditionally independent of Y given Z if the probability distribution
governing X is independent of the value of Y given a value for Z, that is, if
(∀ xi, yj, zk) P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)
where xi ∈ V(X), yj ∈ V(Y) and zk ∈ V(Z).
We commonly write this expression as
P(X | Y, Z) = P(X | Z)
We say that the set of variables X1, ..., Xl is conditionally independent of the set of
variables Y1, ..., Ym given the set of variables Z1, ..., Zn if
P(X1, ..., Xl | Y1, ..., Ym, Z1, ..., Zn) = P(X1, ..., Xl | Z1, ..., Zn)
The naive Bayes classifier assumes that the instance attribute A1 is conditionally
independent of instance attribute A2 given the target value V.
This allows the naive Bayes classifier to calculate P(A1, A2 | V) as follows:
P(A1, A2 | V) = P(A1 | A2, V) P(A2 | V) = P(A1 | V) P(A2 | V)
• Representation:- A Bayesian belief network represents the joint probability
distribution for a set of variables.
• For example:- The Bayesian network in the figure represents the joint probability
distribution over the boolean variables Storm, Lightning, Thunder, ForestFire,
Campfire and BusTourGroup.


[Fig: A Bayesian belief network over the variables Storm, BusTourGroup, Lightning,
Campfire, Thunder and ForestFire.]
In general, a Bayesian network represents the joint probability distribution by
specifying a set of conditional independence assumptions together with a set of local
conditional probabilities.
Network arcs represent the assertion that the variable is conditionally independent of
its non-descendants in the network given its immediate predecessors in the network.
We say X is a descendant of Y if there is a directed path from Y to X.
Second, a conditional probability table is given for each variable, describing the
probability distribution for that variable given the values of its immediate
predecessors.
The joint probability for any desired assignment of values (y1, ..., yn) to the tuple of
network variables (Y1, ..., Yn) can be computed by the formula
P(y1, ..., yn) = Π_i P(yi | Parents(Yi))
where Parents(Yi) denotes the set of immediate predecessors of Yi in the network.
• Learning Bayesian belief networks
First, the network structure might be given in advance, or it might have to be inferred
from the training data.
Second, all the network variables might be directly observable in each training
example, or some might be unobservable.
• Inference
We wish to use a Bayesian network to infer the value of some target variable given
the observed values of the other variables.
This inference step can be straightforward if values for all of the other variables in
the network are known exactly.
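To make the factorization concrete, here is a minimal Python sketch for a tiny,
hypothetical fragment of the network (Storm, BusTourGroup and Campfire only); all the
probability numbers are made up purely for illustration and are not from the original
answer.

```python
# Hypothetical CPTs: Storm and BusTourGroup are parentless, Campfire has parents
# (Storm, BusTourGroup). All numbers are illustrative only.
p_storm = {True: 0.2, False: 0.8}
p_bus = {True: 0.1, False: 0.9}
p_campfire = {  # P(Campfire=True | Storm, BusTourGroup)
    (True, True): 0.4, (True, False): 0.1,
    (False, True): 0.8, (False, False): 0.2,
}

def joint(storm, bus, campfire):
    """P(storm, bus, campfire) = P(storm) * P(bus) * P(campfire | storm, bus)."""
    p_c_true = p_campfire[(storm, bus)]
    p_c = p_c_true if campfire else 1.0 - p_c_true
    return p_storm[storm] * p_bus[bus] * p_c

print(joint(storm=True, bus=False, campfire=False))   # 0.2 * 0.9 * 0.9 = 0.162
```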
h. Prove that how maximum liklih~od (Bayesian learning) ca~ be used in any
learning algorithms tha tare used to minimize the squared error between actual
output hypothesis and predicted output hypothesis. (06 Marks)

Ans. The task is to find the maximum likelihood hypothesis
h_ML = argmax_{h ∈ H} p(D | h)
p(D | h) is the product of the individual p(d_i | h):
h_ML = argmax_{h ∈ H} Π_{i=1}^{m} p(d_i | h)
Given that the noise e_i obeys a Normal distribution with zero mean and unknown
variance σ², each d_i must also obey a Normal distribution with variance σ² centered
around the true target value f(x_i) rather than zero.
Therefore p(d_i | h) can be written as a Normal distribution with variance σ² and
mean μ = f(x_i) = h(x_i), giving
h_ML = argmax_{h ∈ H} Π_{i=1}^{m} (1 / √(2πσ²)) e^{-(1/(2σ²))(d_i - h(x_i))²}
We now apply a transformation that is common in maximum likelihood calculations:
rather than maximizing the above complicated expression, we choose to maximize its
logarithm.
This is justified because ln P is a monotonic function of P;
therefore maximizing ln P also maximizes P:
h_ML = argmax_{h ∈ H} Σ_{i=1}^{m} [ ln (1 / √(2πσ²)) - (1/(2σ²))(d_i - h(x_i))² ]
The first term in this expression is a constant independent of h, and can therefore be
discarded, yielding
h_ML = argmax_{h ∈ H} Σ_{i=1}^{m} -(1/(2σ²))(d_i - h(x_i))²
Maximizing this negative quantity is equivalent to minimizing the corresponding
positive quantity:
h_ML = argmin_{h ∈ H} Σ_{i=1}^{m} (1/(2σ²))(d_i - h(x_i))²
Finally we can again discard the constants that are independent of h:
h_ML = argmin_{h ∈ H} Σ_{i=1}^{m} (d_i - h(x_i))²
This equation shows that the maximum likelihood hypothesis h_ML is the one that
minimizes the sum of the squared errors between the observed training values d_i and
the hypothesis predictions h(x_i).

Module - 5
9. a. Explain locally weighted regression. (08 Marks)
Ans. Let us consider the case of locally weighted linear regression, in which the target
function f is approximated near x_q using a linear function of the form
f̂(x) = w0 + w1 a1(x) + ... + wn an(x)
As before, a_j(x) denotes the value of the j-th attribute of the instance x.
The weights are those that minimize the squared error summed over the set D of
training examples,
E ≡ (1/2) Σ_{x ∈ D} (f(x) - f̂(x))²    ...(1)
which led us to the gradient descent training rule
Δw_j = η Σ_{x ∈ D} (f(x) - f̂(x)) a_j(x)    ...(2)
where η is a constant learning rate and where the training rule has been re-expressed
in the notation f(x), f̂(x) and a_j(x).
Three possible criteria for fitting the local approximation are:
1. Minimize the squared error over just the k nearest neighbors:
E1(x_q) ≡ (1/2) Σ_{x ∈ k nearest nbrs of x_q} (f(x) - f̂(x))²
2. Minimize the squared error over the entire set D of training examples, while
weighting the error of each training example by some decreasing function K of its
distance from x_q:
E2(x_q) ≡ (1/2) Σ_{x ∈ D} (f(x) - f̂(x))² K(d(x_q, x))
3. Combine 1 and 2:
E3(x_q) ≡ (1/2) Σ_{x ∈ k nearest nbrs of x_q} (f(x) - f̂(x))² K(d(x_q, x))
The second approach requires computation that grows linearly with the number of
training examples. Using criterion 3 and the gradient descent rule, we obtain the
training rule
Δw_j = η Σ_{x ∈ k nearest nbrs of x_q} K(d(x_q, x)) (f(x) - f̂(x)) a_j(x)
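A small Python sketch of locally weighted linear regression follows. It is an
illustration, not part of the original answer: it fits the kernel-weighted linear model
in closed form rather than by the gradient rule above, and the Gaussian kernel width
tau and the toy data are assumed for the example.

```python
import numpy as np

def locally_weighted_predict(X, y, x_query, tau=1.0):
    """Predict f(x_query) with locally weighted linear regression.

    Each training example is weighted by a Gaussian kernel
    K(d(x_query, x)) = exp(-d^2 / (2 tau^2)) of its distance from the query,
    and a linear model w0 + w . x is fit by weighted least squares.
    """
    A = np.hstack([np.ones((X.shape[0], 1)), X])        # prepend the constant feature
    d2 = np.sum((X - x_query) ** 2, axis=1)             # squared distances to the query
    k = np.exp(-d2 / (2 * tau ** 2))                    # kernel weights K(d(x_q, x))
    sqrt_k = np.sqrt(k)[:, None]
    w, *_ = np.linalg.lstsq(sqrt_k * A, np.sqrt(k) * y, rcond=None)
    return w[0] + np.dot(w[1:], x_query)

# Toy usage: noisy samples of f(x) = x^2, queried near x = 1.0
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=50)
print(locally_weighted_predict(X, y, np.array([1.0]), tau=0.5))   # close to 1.0
```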

b. What do you mean by reinforcement learning? How does the reinforcement
learning problem differ from other function approximation tasks? (05 Marks)
Ans. Reinforcement learning addresses the question of how an autonomous agent that
senses and acts in its environment can learn to choose optimal actions to achieve its
goals.
The reinforcement learning problem differs from the function approximation task in
several important respects:
• Delayed reward:- The task of the agent is to learn a target function π that maps
from the current state s to the optimal action a = π(s). In reinforcement learning,
training information of this form is not available; the trainer provides only a sequence
of immediate reward values as the agent executes its sequence of actions.
• Exploration:- In reinforcement learning, the agent influences the distribution of
training examples by the action sequence it chooses. This raises the question of
which experimentation strategy produces the most effective learning.
• Partially observable states:- Although it is convenient to assume that the agent's
sensors can perceive the entire state of the environment at each time step, in
many practical situations sensors provide only partial information. For example, a
robot with a forward-pointing camera cannot see what is behind it. In such cases, it
may be necessary for the agent to consider its previous observations together with
its current sensor data when choosing actions, and the best policy may be one that
chooses actions specifically to improve the observability of the environment.
• Life-long learning:- Unlike isolated function approximation tasks, robot
learning often requires that the robot learn several related tasks within the same
environment, using the same sensors.
For example, a mobile robot may need to learn how to dock on its battery charger,
how to navigate through narrow corridors, and how to pick up output from laser
printers.

c. Write down the Q-learning algorithm? (03 Marks)
Ans. Q-learning algorithm:
• For each (s, a), initialize the table entry Q̂(s, a) to zero
• Observe the current state s
• Do forever:
  • Select an action a and execute it
  • Receive immediate reward r
  • Observe the new state s'
  • Update the table entry for Q̂(s, a) as follows:
    Q̂(s, a) ← r + γ max_{a'} Q̂(s', a')
  • s ← s'
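A tabular Q-learning sketch in Python follows. It is an illustration, not part of the
original answer: the ε-greedy action selection, the episode loop and the `step`
function interface are assumptions added beyond the algorithm as written, and the toy
environment is hypothetical.

```python
import random
from collections import defaultdict

def q_learning(states, actions, step, episodes=500, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning for a deterministic environment.

    step(s, a) is assumed to return (reward, next_state); next_state is None
    when the episode ends. The table is updated with
        Q(s, a) <- r + gamma * max_a' Q(s', a')
    """
    Q = defaultdict(float)                      # Q(s, a), initialized to zero
    for _ in range(episodes):
        s = random.choice(states)
        while s is not None:
            if random.random() < epsilon:       # epsilon-greedy action selection
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            r, s_next = step(s, a)              # execute a, observe r and s'
            best_next = 0.0 if s_next is None else max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] = r + gamma * best_next   # table update
            s = s_next
    return Q

# Toy usage: one state; action 'right' reaches the goal with reward 100
def step(s, a):
    if s == "s0" and a == "right":
        return 100, None          # episode ends at the goal
    return 0, "s0"                # any other move stays in s0 with no reward

Q = q_learning(states=["s0"], actions=["left", "right"], step=step)
print(round(Q[("s0", "right")], 1))   # 100.0
```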

OR
10. a. What is instance based learning? Explain the K-nearest neighbor algorithm.
(08 Marks)
Ans. Instance based learning methods simply store the training examples. Generalizing
beyond these examples is postponed until a new instance must be classified.
Each time a new query instance is encountered, its relationship to the previously
stored examples is examined in order to assign a target function value to the new
instance.
K-nearest neighbor algorithm
The most basic instance based method is the K-nearest neighbor algorithm.
This algorithm assumes all instances correspond to points in the n-dimensional
space ℝⁿ.

In nearest neighbor learning the target function may be either discrete-valued or
real-valued.
The K-nearest neighbor algorithm for approximating a discrete-valued target function
f : ℝⁿ → V is:
Training algorithm:
• For each training example (x, f(x)), add the example to the list training_examples.
Classification algorithm:
• Given a query instance x_q to be classified,
• let x1, ..., xk denote the k instances from training_examples that are nearest to x_q,
• and return
  f̂(x_q) ← argmax_{v ∈ V} Σ_{i=1}^{k} δ(v, f(x_i))
  where δ(a, b) = 1 if a = b and δ(a, b) = 0 otherwise.
[Fig: K-nearest neighbor - a set of positive and negative training instances is shown
along with the query instance x_q.]
The K-nearest neighbor algorithm for a continuous-valued target function returns the
mean value of the k nearest training examples:
f̂(x_q) ← (Σ_{i=1}^{k} f(x_i)) / k
b. Explain sample error, true error, confidence interval and Q-learning functions?
(08 Marks)
Ans. Sample error:- The sample error, denoted error_S(h), of a hypothesis h with respect
to target function f and data sample S is
error_S(h) = (1/n) Σ_{x ∈ S} δ(f(x), h(x))
where n is the number of examples in S, and the quantity δ(f(x), h(x)) is 1 if f(x) ≠
h(x) and 0 otherwise.
True error:- The true error, denoted error_D(h), of hypothesis h with respect to target
function f and distribution D is the probability that h will misclassify an instance
drawn at random according to D.
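A tiny Python sketch of the sample error defined above; the target concept f and the
hypothesis h used in the usage line are hypothetical and purely illustrative.

```python
def sample_error(f, h, S):
    """error_S(h) = (1/n) * sum over x in S of delta(f(x), h(x)),
    where delta is 1 when f(x) != h(x) and 0 otherwise."""
    return sum(1 for x in S if f(x) != h(x)) / len(S)

# Hypothetical usage: f and h disagree on one of the four samples
f = lambda x: x >= 0
h = lambda x: x > 1
print(sample_error(f, h, [-2, 0.5, 2, 3]))   # 0.25
```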

