Supervised Learning

Classification

Machine learning is the process whereby computers are given the ability to learn to make decisions from data, without being explicitly programmed.
* Example: classifying an email as spam or not spam.

Unsupervised learning
* Uncovering hidden patterns from unlabeled data.
* Example: grouping customers into distinct categories (clustering); analysis of customer churn.

Supervised learning
* The predicted values are known.
* Aim: predict the target values of unseen data, given the features.
* Classification: the target variable consists of categories. Example: fraudulent vs. non-fraudulent transaction.
* Regression: the target variable is continuous. Example: use the number of bedrooms and the size of a property to predict the target variable (price).
* Feature = independent variable. Target variable = dependent variable.

Requirements of supervised learning
* No missing values.
* Data in numeric format.
* Data stored in a pandas DataFrame or NumPy array.
* Perform exploratory data analysis (EDA) first.

Classifying labels of unseen data
1. Build a model.
2. The model learns from the labeled data we pass to it.
3. Pass unlabeled data to the model as input.
4. The model predicts the labels of the unseen data.
(Labeled data = training data.)

k-Nearest Neighbors
* Predict the label of a data point by:
  o looking at the k closest labeled data points;
  o taking a majority vote (the majority class among the neighbors is the answer).
* KNN creates a decision boundary used to predict labels.
[Figure: binary classification decision boundary (k = 15)]

Measuring model performance
* In classification, accuracy is a commonly used metric.
* Accuracy = correct predictions / total observations
* We could compute accuracy on the data used to fit the classifier, but that is not indicative of the ability to generalize.
* Instead, split the data: fit/train the classifier on a training set and compute accuracy on a held-out test set.

Model complexity
* Larger k = less complex model; can cause underfitting.
* Smaller k = more complex model; can lead to overfitting.
[Figure: 2-class classification decision boundaries for k = 1, k = 9, k = 18]

Introduction to Regression
* The target variable typically has continuous values.
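The k-nearest neighbors procedure from the classification notes above (look at the k closest labeled points, take a majority vote, then measure accuracy on held-out data) can be sketched in plain Python. This is a from-scratch illustration, not scikit-learn's implementation; the function name `knn_predict`, the choice of Euclidean distance, and the toy points are assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Distance from the query to every labeled training point (Euclidean, an assumption).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest labeled points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy training data: two small clusters labeled "blue" and "red".
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y_train = ["blue", "blue", "blue", "red", "red", "red"]

# Held-out test points, not used when "fitting" (KNN simply stores the training data).
X_test = [(1.2, 1.4), (6.8, 6.2)]
y_test = ["blue", "red"]

predictions = [knn_predict(X_train, y_train, x, k=3) for x in X_test]
# Accuracy = correct predictions / total observations.
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```

Varying `k` in this sketch is a quick way to see the complexity trade-off: `k=1` follows individual points closely, while a large `k` smooths the decision boundary.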
* Example: predicting blood glucose levels.
* Creating feature and target arrays.
* Making predictions from a single feature.
* Plotting glucose vs. body mass index.
* Fitting a (linear) regression model.

Basics of linear regression

Regression mechanics
* $y = ax + b$: simple linear regression uses one feature.
* $a$, $b$: parameters (coefficients) of the model, the slope and the intercept.
* $x$: the single feature.
* Define an error function for any given line and choose the line that minimizes the error function.
* Error function = loss function = cost function.

The loss function
* Residual sum of squares: $RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
* Ordinary least squares (OLS): minimize RSS.

Linear regression in higher dimensions
* $y = a_1 x_1 + a_2 x_2 + b$
* To fit a linear regression model here, we need to specify 3 variables: $a_1$, $a_2$, $b$.
* In higher dimensions (known as multiple regression), we must specify a coefficient for each feature and the variable $b$:
  $y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots + a_n x_n + b$

R-squared
* The default metric of linear regression is R-squared.
* $R^2$ quantifies the variance in the target values explained by the features. Values range from 0 to 1.
* $R^2 = 1$: the features completely explain the target's variance.
[Figure: high $R^2$ vs. low $R^2$]

Mean squared error and root mean squared error
* $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. MSE is measured in target units, squared.
* $RMSE = \sqrt{MSE}$. RMSE is measured in the same units as the target variable.

Cross-validation motivation
* Model performance is dependent on the way we split up the data.
* A single split is not representative of the model's ability to generalize to unseen data.
* Solution: cross-validation.

Cross-validation basics
[Figure: Split 1 to Split 5, each with a different fold used as test data and the rest as training data]
* 5 splits give 5 values of R-squared (the default score for linear regression).
* 5 folds = 5-fold CV; k folds = k-fold CV.
* More folds = more computationally expensive.
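The regression notes above (fit $y = ax + b$ by ordinary least squares, score with $R^2$ and RMSE, then repeat over k folds) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names `fit_line`, `r_squared`, `rmse`, and `k_fold_scores` are hypothetical, the data are made up, and folds are formed by taking every k-th point rather than by shuffling.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(ys, preds):
    """R^2 = 1 - RSS / TSS, where TSS is the spread of ys around their mean."""
    mean_y = sum(ys) / len(ys)
    rss = sum((y - p) ** 2 for y, p in zip(ys, preds))
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1 - rss / tss


def rmse(ys, preds):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


def k_fold_scores(xs, ys, k=5):
    """Hold out each of k folds in turn, fit on the rest, score R^2 on the held-out fold."""
    scores = []
    for fold in range(k):
        x_train = [x for i, x in enumerate(xs) if i % k != fold]
        y_train = [y for i, y in enumerate(ys) if i % k != fold]
        x_test, y_test = xs[fold::k], ys[fold::k]
        a, b = fit_line(x_train, y_train)
        scores.append(r_squared(y_test, [a * x + b for x in x_test]))
    return scores


# Hypothetical data: roughly y = 2x + 1 with small deviations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8, 19.1, 20.9]

a, b = fit_line(xs, ys)
full_rmse = rmse(ys, [a * x + b for x in xs])
cv_scores = k_fold_scores(xs, ys, k=5)  # one R^2 value per fold
print(a, b, full_rmse, cv_scores)
```

Reporting the mean and spread of `cv_scores` gives a less split-dependent picture of performance than a single train/test split.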
95% confidence interval
* If we were to take 100 different samples and compute a 95% confidence interval for each sample, then approximately 95 of the 100 confidence intervals will contain the true mean value ($\mu$).

Regularized Regression

Regularization in regression is used to avoid overfitting.
* Linear regression minimizes a loss function.
* It chooses a coefficient $a$ for each feature variable, plus $b$.
* Large coefficients can lead to overfitting.
* Regularization penalizes large coefficients.

Ridge regression
* Loss function = OLS loss function + $\alpha \sum_{i=1}^{n} a_i^2$
* Ridge penalizes large positive or negative coefficients.
* $\alpha$ is a parameter we need to choose, similar to picking k in KNN.
* $\alpha$ controls model complexity.
* $\alpha = 0$ gives OLS (can lead to overfitting).

Lasso regression
* Loss function = OLS loss function + $\alpha \sum_{i=1}^{n} |a_i|$
* Lasso can select important features of a dataset.
* It shrinks the coefficients of less important features to zero.
* Features not shrunk to zero are the ones selected.

How good is your model?

Measuring model performance with accuracy
* Accuracy: fraction of correctly classified samples.
* Not always a useful metric.

Class imbalance
* Example: classification for predicting fraudulent bank transactions, where 99% of transactions are legitimate and 1% are fraudulent.
* We could build a classifier that predicts NONE of the transactions are fraudulent:
  o 99% accurate;
  o but terrible at actually predicting fraudulent transactions;
  o fails at its original purpose.
* Class imbalance = uneven frequency of classes.
* We need a different way to assess performance.

Confusion matrix for assessing classification performance

                      Predicted: legitimate   Predicted: fraudulent
Actual: legitimate    True negative           False positive
Actual: fraudulent    False negative          True positive

* True positives: number of fraudulent transactions correctly labeled.
* True negatives: number of legitimate transactions correctly labeled.
* False negatives: number of fraudulent transactions incorrectly labeled as legitimate.
* False positives: number of legitimate transactions incorrectly labeled as fraudulent.

Accuracy
* $\text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$

Precision (positive predictive value)
* $\text{precision} = \frac{tp}{tp + fp}$
* High precision = lower false positive rate.
* High precision = not many legitimate transactions are predicted to be fraudulent.
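The class-imbalance argument above can be checked with a small plain-Python sketch that builds the confusion-matrix counts and computes accuracy and precision. The helper names, the 0/1 label encoding (1 = fraudulent), and both toy classifiers are assumptions for illustration; returning 0.0 for precision when nothing is predicted positive is also a convention chosen here, since the ratio is undefined in that case.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    # Undefined when nothing is predicted positive; 0.0 is an assumed convention.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical data: 99 legitimate transactions (0) and 1 fraudulent (1).
y_true = [0] * 99 + [1]

# Classifier A predicts that NONE of the transactions are fraudulent.
always_legitimate = [0] * 100
counts_a = confusion_counts(y_true, always_legitimate)

# Classifier B flags two transactions, one of which really is fraudulent.
flags_two = [0] * 98 + [1, 1]
counts_b = confusion_counts(y_true, flags_two)

acc_a, prec_a = accuracy(*counts_a), precision(counts_a[0], counts_a[2])
acc_b, prec_b = accuracy(*counts_b), precision(counts_b[0], counts_b[2])
print(counts_a, acc_a, prec_a)
print(counts_b, acc_b, prec_b)
```

Both classifiers reach the same accuracy here, yet only the second ever identifies the fraudulent transaction, which is why accuracy alone is a poor guide on imbalanced data.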
Recall (sensitivity)
* $\text{recall} = \frac{tp}{tp + fn}$
* High recall = lower false negative rate.
* High recall = most fraudulent transactions are predicted correctly.

F1 score
* $F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
* The harmonic mean of precision and recall; it gives equal weight to precision and recall.

Logistic Regression and the ROC Curve
* Logistic regression is used for classification problems.
* Logistic regression outputs probabilities.
* If the probability $p > 0.5$, the data is labeled 1.
* If the probability $p < 0.5$, the data is labeled 0.
* Logistic regression produces a linear decision boundary.
[Figure: Logistic Regression: Binary Classification]

Probability thresholds
* By default, the logistic regression threshold is 0.5.
* Thresholds are not specific to logistic regression; KNN classifiers also have thresholds.
* What happens as we vary the threshold?

Receiver operating characteristic (ROC) curve
* Visualizes how different thresholds affect the true positive and false positive rates.
* If we have a model with 1 for the true positive rate and 0 for the false positive rate, this would be the perfect model.

AUC (area under the curve)
* A measure of the ability of a binary classifier to distinguish between classes.
* Used as a summary of the ROC curve.
* The higher the AUC, the better the model's performance at distinguishing between the positive and negative classes.

Hyperparameter tuning
* Ridge/lasso regression: choosing alpha.
* KNN: choosing n_neighbors.
* Hyperparameters: parameters we specify before fitting the model (like alpha and n_neighbors).

Choosing the correct hyperparameters
1. Try lots of different hyperparameter values.
2. Fit all of them separately.
3. See how well they perform.
4. Choose the best performing values.
* This is called hyperparameter tuning.
* It is essential to use cross-validation to avoid overfitting to the test data.
* We can still split the data and perform cross-validation on the training set.
* We withhold the test set for final evaluation.

Hyperparameter tuning: grid search.
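The tuning steps above (try many values, fit each separately with cross-validation, keep the best) can be sketched as a small grid search in plain Python. scikit-learn offers this as `GridSearchCV`; the version below is only a from-scratch illustration, and the names `knn_predict`, `cv_accuracy`, `grid_search`, the 1-D toy data, and the fold assignment by index are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def knn_predict(train, query, n_neighbors):
    """1-D k-NN classifier: majority label among the n_neighbors closest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:n_neighbors]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, n_neighbors, n_folds):
    """Mean accuracy over n_folds folds; each fold is held out exactly once."""
    fold_scores = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [pair for i, pair in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
        fold_scores.append(correct / len(test))
    return sum(fold_scores) / n_folds


def grid_search(data, param_grid, n_folds=3):
    """Try every combination in param_grid; return (best_params, best_score, n_fits)."""
    names = list(param_grid)
    best_params, best_score, n_fits = None, -1.0, 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(data, n_folds=n_folds, **params)
        n_fits += n_folds  # one fit per fold for every combination
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_fits


# Hypothetical training data only; a separate test set would be withheld for final evaluation.
training_data = [(float(x), "a" if x <= 6 else "b") for x in range(1, 13)]
param_grid = {"n_neighbors": [1, 3, 5]}

best_params, best_score, n_fits = grid_search(training_data, param_grid, n_folds=3)
print(best_params, round(best_score, 3), n_fits)
```

The `n_fits` counter makes the cost explicit: the number of model fits is the number of folds multiplied by the number of hyperparameter combinations tried.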
Grid search tries every combination in a grid of hyperparameter values.

Limitations and an alternative approach
* 3-fold cross-validation, 1 hyperparameter, 10 total values = 30 fits
* 10-fold cross-validation, 3 hyperparameters, 30 total values = 900 fits
* Alternative: RandomizedSearchCV, which picks random hyperparameter values rather than exhaustively searching through all options.

ROC is a graph showing the performance of a classification model at all classification thresholds.

Preprocessing Data

Requirements
* Numeric data.
* No missing values.
* With real-world data this is rarely the case; we will often need to preprocess our data first.

Dealing with categorical features
* scikit-learn will not accept categorical features by default.
* We need to convert categorical features into numeric values.
* Convert to binary features called dummy variables.
[Table: dummy variables for a genre feature, one binary column per category (e.g. Classical, Country, Electronic, Hip-Hop, Jazz, Rap, Rock); cell values not recoverable]
* scikit-learn: OneHotEncoder()
* pandas: get_dummies()

Handling missing data
* Missing data: no value for a feature in a particular row.
* This can occur because:
  o there may have been no observation;
  o the data might be corrupt.
* We need to deal with missing data.
* A common approach is to remove missing observations accounting for less than 5% of all data.

Imputing values
* Imputation: use subject-matter expertise to replace missing data with educated guesses.
* It is common to use the mean.
* We can also use the median, or another value.
* For categorical values, we typically use the most frequent value (the mode).
* We must split our data first, to avoid data leakage (test set information reaching our model).
* Due to their ability to transform our data, imputers are known as transformers.
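The imputation rules above (mean for numeric values, mode for categorical values, and splitting first to avoid leakage) can be sketched in plain Python. scikit-learn's `SimpleImputer` plays this role in practice; the helpers `fit_imputer` and `transform`, the use of `None` to mark missing entries, and the toy columns are assumptions for illustration only.

```python
from statistics import mean, mode


def fit_imputer(column, strategy="mean"):
    """Learn the fill value from the observed (non-missing) entries of a TRAINING column."""
    observed = [value for value in column if value is not None]
    return mean(observed) if strategy == "mean" else mode(observed)


def transform(column, fill_value):
    """Replace missing entries (None) with the learned fill value."""
    return [fill_value if value is None else value for value in column]


# Hypothetical data that has already been split into training and test sets.
train_age = [20, None, 30, 40]
test_age = [None, 50]

# The fill value comes from the training split only, so no test information leaks in.
age_fill = fit_imputer(train_age, strategy="mean")
train_age_filled = transform(train_age, age_fill)
test_age_filled = transform(test_age, age_fill)

# Categorical feature: use the most frequent training value (the mode).
train_genre = ["Rock", None, "Jazz", "Rock"]
genre_fill = fit_imputer(train_genre, strategy="most_frequent")
train_genre_filled = transform(train_genre, genre_fill)

print(train_age_filled, test_age_filled, train_genre_filled)
```

Keeping "learn the fill value" and "apply the fill value" as separate steps mirrors the fit/transform split that makes imputers usable as transformers.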
Imputing within a pipeline
* A pipeline is an object used to run a series of transformations and build a model in a single workflow.
* Data imputation is one of several important preprocessing steps for machine learning.

Centering and scaling data

Why scale our data?
* Many models use some form of distance to inform them.
* Features on larger scales can disproportionately influence the model.
* Example: KNN uses distance explicitly when making predictions.
* We want features to be on a similar scale.
* Solution: normalizing or standardizing (scaling and centering).

How to scale our data
* Subtract the mean and divide by the standard deviation.
  o All features are then centered around zero and have a variance of one.
  o This is called standardization.
* Can also subtract the minimum and divide by the range (minimum zero and maximum one).
* Can also normalize so the data ranges from -1 to +1.

Evaluating multiple models

Different models for different problems. Some guiding principles:
* Size of the dataset
  o Fewer features = simpler model, faster training time
  o Some models require large amounts of data to perform well
* Interpretability
  o Some models are easier to explain, which can be important for stakeholders
  o Linear regression has high interpretability, as we can understand the coefficients
* Flexibility
  o May improve accuracy, by making fewer assumptions about data
  o KNN is a more flexible model, doesn't assume any linear relationships

It's all in the metrics
* Regression model performance: RMSE, R-squared.
* Classification model performance: accuracy, confusion matrix, precision, recall, F1-score, ROC AUC.
* Train several models and evaluate performance out of the box.

Scaling
* Models affected by scaling: KNN, linear regression (plus ridge and lasso), logistic regression, artificial neural network.
* Best to scale our data before evaluating mo
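The two scaling recipes described in the notes can be sketched in plain Python. Standardization here divides by the standard deviation, which is what gives each feature unit variance. scikit-learn's `StandardScaler` serves this purpose in practice; the function names `standardize` and `min_max_scale` and the toy numbers are assumptions, and the scaling statistics are learned from the training split only.

```python
from statistics import mean, pstdev, pvariance


def standardize(train, test):
    """Center on the training mean and divide by the training standard deviation."""
    mu, sigma = mean(train), pstdev(train)
    scaled_train = [(value - mu) / sigma for value in train]
    scaled_test = [(value - mu) / sigma for value in test]
    return scaled_train, scaled_test


def min_max_scale(train, test):
    """Subtract the training minimum and divide by the training range."""
    low, high = min(train), max(train)
    scaled_train = [(value - low) / (high - low) for value in train]
    scaled_test = [(value - low) / (high - low) for value in test]
    return scaled_train, scaled_test


# Hypothetical feature values, already split into training and test sets.
train_feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test_feature = [5.0, 11.0]

std_train, std_test = standardize(train_feature, test_feature)
mm_train, mm_test = min_max_scale(train_feature, test_feature)

# After standardization the training feature has mean 0 and variance 1.
print(mean(std_train), pvariance(std_train))
# After min-max scaling the training feature spans 0 to 1.
print(min(mm_train), max(mm_train))
```

Test values can land outside the training range after min-max scaling (as `11.0` does here), which is expected when the statistics come from the training data alone.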
