K-Nearest Neighbour (KNN) Example

(Slides by Mahesh Huddar, vtupulse.com.)

Consider the following data concerning credit default. Age and Loan are two numerical variables (predictors) and Default is the target.

Test case: Age = 48, Loan = 1,42,000. Predict the class label.

Distance from the test case to the training point (Age = 25, Loan = 40,000):

sqrt((48 - 25)^2 + (1,42,000 - 40,000)^2) = 102000.0026

With K = 3, two of the three closest neighbors have Default = Y and one has Default = N. The prediction for the test case (Age = 48, Loan = 1,42,000) is therefore class label Y.
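The distance computation and majority vote above can be sketched in a few lines of Python. Only the test case (48; 1,42,000) and one training point (25; 40,000) are legible in this extract, so the snippet checks just that distance; the helper names `euclidean` and `knn_predict` are mine, not from the slides.

```python
from math import sqrt
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Majority label among the k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Distance from the test case (Age=48, Loan=142000) to the
# training point (Age=25, Loan=40000) shown in the slide:
d = euclidean((48, 142000), (25, 40000))
print(round(d, 4))  # 102000.0026
```

Because the loan amounts dwarf the ages, raw Euclidean distance is dominated by Loan; in practice the features would be normalised first.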
Decision Tree (ID3): Play Tennis Example

Training data:

Day  Outlook   Temp  Humidity  Wind    PlayTennis
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Hot   High      Weak    Yes
D4   Rain      Mild  High      Weak    Yes
D5   Rain      Cool  Normal    Weak    Yes
D6   Rain      Cool  Normal    Strong  No
D7   Overcast  Cool  Normal    Strong  Yes
D8   Sunny     Mild  High      Weak    No
D9   Sunny     Cool  Normal    Weak    Yes
D10  Rain      Mild  Normal    Weak    Yes
D11  Sunny     Mild  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes
D14  Rain      Mild  High      Strong  No

Entropy(S) = -(9/14)log2(9/14) - (5/14)log2(5/14) = 0.94

Attribute: Outlook
Values(Outlook) = Sunny, Overcast, Rain
S_Sunny = [2+, 3-], Entropy(S_Sunny) = 0.971
S_Overcast = [4+, 0-], Entropy(S_Overcast) = 0.0
S_Rain = [3+, 2-], Entropy(S_Rain) = 0.971
Gain(S, Outlook) = Entropy(S) - (5/14)Entropy(S_Sunny) - (4/14)Entropy(S_Overcast) - (5/14)Entropy(S_Rain)
                 = 0.94 - (5/14)(0.971) - 0 - (5/14)(0.971) = 0.2464

Attribute: Temp
Values(Temp) = Hot, Mild, Cool
S_Hot = [2+, 2-], Entropy(S_Hot) = 1.0
S_Mild = [4+, 2-], Entropy(S_Mild) = -(4/6)log2(4/6) - (2/6)log2(2/6) = 0.9183
S_Cool = [3+, 1-], Entropy(S_Cool) = -(3/4)log2(3/4) - (1/4)log2(1/4) = 0.8113
Gain(S, Temp) = Entropy(S) - (4/14)Entropy(S_Hot) - (6/14)Entropy(S_Mild) - (4/14)Entropy(S_Cool)
              = 0.94 - (4/14)(1.0) - (6/14)(0.9183) - (4/14)(0.8113) = 0.0289

Attribute: Humidity
Values(Humidity) = High, Normal
S_High = [3+, 4-], Entropy(S_High) = 0.9852
S_Normal = [6+, 1-], Entropy(S_Normal) = 0.5916
Gain(S, Humidity) = Entropy(S) - (7/14)Entropy(S_High) - (7/14)Entropy(S_Normal)
                  = 0.94 - (7/14)(0.9852) - (7/14)(0.5916) = 0.1516

Attribute: Wind
Values(Wind) = Strong, Weak
S_Strong = [3+, 3-], Entropy(S_Strong) = 1.0
S_Weak = [6+, 2-], Entropy(S_Weak) = -(6/8)log2(6/8) - (2/8)log2(2/8) = 0.8113
Gain(S, Wind) = Entropy(S) - (6/14)Entropy(S_Strong) - (8/14)Entropy(S_Weak)
              = 0.94 - (6/14)(1.0) - (8/14)(0.8113) = 0.0478
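The four root-node gains can be reproduced with a short script. This is a sketch, not the lecturer's code; it hard-codes the standard 14-row Play-Tennis table. Exact arithmetic gives 0.2467, 0.0292, 0.1518, and 0.0481; the slides' 0.2464, 0.0289, 0.1516, and 0.0478 differ only because they round the intermediate entropies.

```python
from math import log2
from collections import Counter

# The 14-example Play-Tennis training set, one tuple per day:
# (Outlook, Temp, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    """Shannon entropy of the class label (last field) in `rows`."""
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def gain(rows, attr):
    """Information gain from splitting `rows` on the named attribute."""
    i = ATTRS[attr]
    rem = 0.0
    for v in {r[i] for r in rows}:
        sub = [r for r in rows if r[i] == v]
        rem += len(sub) / len(rows) * entropy(sub)
    return entropy(rows) - rem

for a in ATTRS:
    print(a, round(gain(DATA, a), 4))  # Outlook has the largest gain
```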
Summary of information gains at the root:
Gain(S, Outlook)  = 0.2464
Gain(S, Temp)     = 0.0289
Gain(S, Humidity) = 0.1516
Gain(S, Wind)     = 0.0478

Outlook has the highest information gain, so it becomes the root node. Splitting {D1, D2, ..., D14} [9+, 5-] on Outlook:

Sunny:    {D1, D2, D8, D9, D11}   [2+, 3-]
Overcast: {D3, D7, D12, D13}      [4+, 0-]  -> all Yes, leaf node
Rain:     {D4, D5, D6, D10, D14}  [3+, 2-]

Branch Outlook = Sunny ({D1, D2, D8, D9, D11}):

Day  Temp  Humidity  Wind    PlayTennis
D1   Hot   High      Weak    No
D2   Hot   High      Strong  No
D8   Mild  High      Weak    No
D9   Cool  Normal    Weak    Yes
D11  Mild  Normal    Strong  Yes

Entropy(S_Sunny) = -(2/5)log2(2/5) - (3/5)log2(3/5) = 0.97

Attribute: Temp
Values(Temp) = Hot, Mild, Cool
S_Hot = [0+, 2-], Entropy(S_Hot) = 0.0
S_Mild = [1+, 1-], Entropy(S_Mild) = 1.0
S_Cool = [1+, 0-], Entropy(S_Cool) = 0.0
Gain(S_Sunny, Temp) = 0.97 - (2/5)(0.0) - (2/5)(1.0) - (1/5)(0.0) = 0.570

Attribute: Humidity
Values(Humidity) = High, Normal
S_High = [0+, 3-], Entropy(S_High) = 0.0
S_Normal = [2+, 0-], Entropy(S_Normal) = 0.0
Gain(S_Sunny, Humidity) = 0.97 - (3/5)(0.0) - (2/5)(0.0) = 0.97

Attribute: Wind
Values(Wind) = Strong, Weak
S_Strong = [1+, 1-], Entropy(S_Strong) = 1.0
S_Weak = [1+, 2-], Entropy(S_Weak) = 0.918
Gain(S_Sunny, Wind) = 0.97 - (2/5)(1.0) - (3/5)(0.918) = 0.019

Summary of gains on the Sunny branch:
Gain(S_Sunny, Temp)     = 0.570
Gain(S_Sunny, Humidity) = 0.97
Gain(S_Sunny, Wind)     = 0.019

Humidity has the highest gain, so the Sunny branch splits on Humidity.
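The Sunny-branch gains can be recomputed with the same entropy/information-gain procedure; this self-contained sketch hard-codes the five Sunny rows. Exact arithmetic gives 0.571, 0.971, and 0.020, matching the slides' rounded 0.570, 0.97, and 0.019.

```python
from math import log2
from collections import Counter

# The five Outlook=Sunny examples D1, D2, D8, D9, D11:
# (Temp, Humidity, Wind, PlayTennis)
SUNNY = [
    ("Hot", "High", "Weak", "No"),
    ("Hot", "High", "Strong", "No"),
    ("Mild", "High", "Weak", "No"),
    ("Cool", "Normal", "Weak", "Yes"),
    ("Mild", "Normal", "Strong", "Yes"),
]

def entropy(rows):
    """Shannon entropy of the class label (last field) in `rows`."""
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def gain(rows, i):
    """Information gain from splitting `rows` on attribute index `i`."""
    rem = 0.0
    for v in {r[i] for r in rows}:
        sub = [r for r in rows if r[i] == v]
        rem += len(sub) / len(rows) * entropy(sub)
    return entropy(rows) - rem

for name, i in [("Temp", 0), ("Humidity", 1), ("Wind", 2)]:
    print(name, round(gain(SUNNY, i), 3))  # Humidity splits the branch purely
```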
Splitting the Sunny branch on Humidity gives two pure leaves: High -> No ({D1, D2, D8}) and Normal -> Yes ({D9, D11}).

Branch Outlook = Rain ({D4, D5, D6, D10, D14}):

Day  Temp  Humidity  Wind    PlayTennis
D4   Mild  High      Weak    Yes
D5   Cool  Normal    Weak    Yes
D6   Cool  Normal    Strong  No
D10  Mild  Normal    Weak    Yes
D14  Mild  High      Strong  No

Entropy(S_Rain) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.97

Attribute: Temp
Values(Temp) = Mild, Cool
S_Mild = [2+, 1-], Entropy(S_Mild) = 0.918
S_Cool = [1+, 1-], Entropy(S_Cool) = 1.0
Gain(S_Rain, Temp) = 0.97 - (3/5)(0.918) - (2/5)(1.0) = 0.0192

Attribute: Humidity
Values(Humidity) = High, Normal
S_High = [1+, 1-], Entropy(S_High) = 1.0
S_Normal = [2+, 1-], Entropy(S_Normal) = 0.918
Gain(S_Rain, Humidity) = 0.97 - (2/5)(1.0) - (3/5)(0.918) = 0.0192

Attribute: Wind
Values(Wind) = Strong, Weak
S_Strong = [0+, 2-], Entropy(S_Strong) = 0.0
S_Weak = [3+, 0-], Entropy(S_Weak) = 0.0
Gain(S_Rain, Wind) = 0.97 - (2/5)(0.0) - (3/5)(0.0) = 0.97
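The Rain-branch gains follow the same pattern; this sketch hard-codes the five Rain rows. Exact arithmetic gives about 0.020 for Temp and Humidity and 0.971 for Wind; the slides' 0.0192 and 0.97 come from rounding Entropy(S_Rain) to 0.97 first.

```python
from math import log2
from collections import Counter

# The five Outlook=Rain examples D4, D5, D6, D10, D14:
# (Temp, Humidity, Wind, PlayTennis)
RAIN = [
    ("Mild", "High", "Weak", "Yes"),
    ("Cool", "Normal", "Weak", "Yes"),
    ("Cool", "Normal", "Strong", "No"),
    ("Mild", "Normal", "Weak", "Yes"),
    ("Mild", "High", "Strong", "No"),
]

def entropy(rows):
    """Shannon entropy of the class label (last field) in `rows`."""
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def gain(rows, i):
    """Information gain from splitting `rows` on attribute index `i`."""
    rem = 0.0
    for v in {r[i] for r in rows}:
        sub = [r for r in rows if r[i] == v]
        rem += len(sub) / len(rows) * entropy(sub)
    return entropy(rows) - rem

for name, i in [("Temp", 0), ("Humidity", 1), ("Wind", 2)]:
    print(name, round(gain(RAIN, i), 3))  # Wind has the largest gain
```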
Summary of gains on the Rain branch:
Gain(S_Rain, Temp)     = 0.0192
Gain(S_Rain, Humidity) = 0.0192
Gain(S_Rain, Wind)     = 0.97

Wind has the highest gain, so the Rain branch splits on Wind. The final decision tree:

Outlook = Sunny    -> Humidity
    Humidity = High   -> No   ({D1, D2, D8})
    Humidity = Normal -> Yes  ({D9, D11})
Outlook = Overcast -> Yes  ({D3, D7, D12, D13})
Outlook = Rain     -> Wind
    Wind = Strong -> No   ({D6, D14})
    Wind = Weak   -> Yes  ({D4, D5, D10})

Bag of Words Algorithm

Consider four comments: "I Love Cricket", "I Love Football", "I Love Hockey", "I Love Golf". Collecting the unique words (excluding punctuation) gives the vocabulary: "I", "Love", "Cricket", "Football", "Hockey", "Golf".

The next step is to create vectors, which convert the text into a numerical form that a machine learning model can use. Each position counts how often the corresponding vocabulary word appears in the comment. For the first comment: I = 1, Love = 1, Cricket = 1, Football = 0, Hockey = 0, Golf = 0. The four comments are represented as:

I Love Cricket  : [1, 1, 1, 0, 0, 0]
I Love Football : [1, 1, 0, 1, 0, 0]
I Love Hockey   : [1, 1, 0, 0, 1, 0]
I Love Golf     : [1, 1, 0, 0, 0, 1]
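The vectors above can be produced with a short count-based sketch. `bag_of_words` is a hypothetical helper name, and the comments are assumed to have punctuation already stripped.

```python
# Corpus and vocabulary from the slide (vocabulary order follows the slide):
comments = ["I Love Cricket", "I Love Football", "I Love Hockey", "I Love Golf"]
vocab = ["I", "Love", "Cricket", "Football", "Hockey", "Golf"]

def bag_of_words(text, vocab):
    """Count-based vector over a fixed vocabulary."""
    words = text.split()
    return [words.count(term) for term in vocab]

vectors = [bag_of_words(c, vocab) for c in comments]
print(vectors[0])  # "I Love Cricket" -> [1, 1, 1, 0, 0, 0]
```

Note that word order is discarded: the representation only records how often each vocabulary term occurs.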
Feature Engineering: TF-IDF Vectors as Features

The TF-IDF score represents the relative importance of a term in the document and in the entire corpus. It is composed of two parts: the first computes the normalized term frequency (TF); the second is the inverse document frequency (IDF), computed as the logarithm of the number of documents in the corpus divided by the number of documents in which the specific term appears.

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)
IDF(t) = log(Total number of documents / Number of documents with term t in it)

TF-IDF weight calculation example: consider a document containing 100 words in which the word "cat" appears 3 times. The term frequency (tf) for "cat" is 3/100 = 0.03. Now assume we have 10 million documents and the word "cat" appears in one thousand of these. The inverse document frequency (idf) is log(10,000,000 / 1,000) = 4 (a base-10 logarithm, since log10(10,000) = 4). The TF-IDF weight is the product of these quantities: 0.03 x 4 = 0.12.

[Handwritten notes on the Apriori algorithm follow; they are largely illegible in this extract. The one recoverable caption: "Figure 6.2 Generation of the candidate itemsets and frequent itemsets, where the minimum support count is 2."]
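The TF-IDF arithmetic in the worked example above can be checked with a few lines of Python. This is an illustrative sketch (the function names `tf` and `idf` are mine, not from the slides); it uses a base-10 logarithm, which is what the example's idf of 4 implies.

```python
from math import log10

def tf(term_count, total_terms):
    """Normalized term frequency within one document."""
    return term_count / total_terms

def idf(num_docs, docs_with_term):
    """Inverse document frequency; base-10 log reproduces the slide's idf of 4."""
    return log10(num_docs / docs_with_term)

tf_cat = tf(3, 100)                # "cat" appears 3 times in a 100-word document
idf_cat = idf(10_000_000, 1_000)   # log10(10,000) = 4.0
print(tf_cat * idf_cat)            # 0.12
```

Libraries such as scikit-learn's TfidfVectorizer use a smoothed, natural-log variant of this formula, so their weights differ from this hand calculation by a constant factor.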
