Sharing Vision Test - Jupyter Notebook
In [3]:
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set_theme(color_codes=True)
    pd.set_option("display.max_columns", None)

Dataset: the data contain demographic details, work-related metrics, and an attrition flag.

- Attrition - Did the employee attrite?
- Age - Age of the employee
- BusinessTravel - Travel commitment for the job
- Department - Employee department
- DistanceFromHome - Distance from work to home (in km)
- Education - 1-Below College, 2-College, 3-Bachelor, 4-Master, 5-Doctor
- EnvironmentSatisfaction - 1-Low, 2-Medium, 3-High, 4-Very High
- HourlyRate - Data description not available
- JobLevel - Level of job (1 to 5)
- JobSatisfaction - 1-Low, 2-Medium, 3-High, 4-Very High
- MaritalStatus - Marital status
- MonthlyIncome - Monthly salary
- MonthlyRate - Data description not available
- NumCompaniesWorked - Number of companies worked at
- Over18 - Over 18 years of age?
- PercentSalaryHike - The percentage increase in salary last year
- PerformanceRating - 1-Low, 2-Good, 3-Excellent, 4-Outstanding
- RelationshipSatisfaction - 1-Low, 2-Medium, 3-High, 4-Very High
- StockOptionLevel - Stock option level
- TotalWorkingYears - Total years worked
- TrainingTimesLastYear - Number of trainings attended last year
- WorkLifeBalance - 1-Low, 2-Good, 3-Excellent, 4-Outstanding
- YearsAtCompany - Years at the company
- YearsSinceLastPromotion - Years since the last promotion

EDA

(1) Drop every column of employee.csv that is not needed, then run univariate EDA on each numeric column, covering:
a. a histogram and a boxplot for each column
b. basic statistics for each column: mean, std, min, Q1, median, Q3, and max
c. the upper-whisker and lower-whisker values of each column's boxplot
d. if there are outliers (< Q1 - 1.5*IQR or > Q3 + 1.5*IQR): the count, proportion, and list of outliers for each column
e. anything you find interesting in the EDA results

Before dropping any columns, the first step is to inspect the contents of the dataset.

In [ ]:
    df = pd.read_csv('employee.csv')
    df.head()

[Output: first five rows - Unnamed: 0, EmployeeNumber, Attrition, Age, BusinessTravel, DailyRate, Department, DistanceFromHome, Education, EducationField, EnvironmentSatisfaction, ...]

In [ ]:
    # Number of unique values per object-dtype column
    df.select_dtypes(include='object').nunique()

[Output: unique-value counts for the object columns, including Attrition 2, Department 3, and OverTime 2]

In [ ]:
    # Number of unique values per int-dtype column
    df.select_dtypes(include='int').nunique()

[Output: unique-value counts for the integer columns]

In [ ]:
    # Drop the columns Unnamed: 0, EmployeeNumber, StandardHours, Over18, and PerformanceRating
    df.drop(columns=['Unnamed: 0', 'EmployeeNumber', 'StandardHours',
                     'Over18', 'PerformanceRating'], inplace=True)

In [6]:
    df.select_dtypes(include='int')

[Output: the integer columns - Age, DailyRate, DistanceFromHome, Education, EnvironmentSatisfaction, HourlyRate, JobInvolvement, JobLevel, MonthlyIncome, MonthlyRate, ...]

In [7]:
    # Get the names of all columns with data type 'int'
    int_vars = df.select_dtypes(include='int').columns.tolist()

    # Create a figure with subplots
    num_cols = len(int_vars)
    num_rows = (num_cols + 2) // 3  # To make sure there are enough rows for the subplots
    fig, axs = plt.subplots(nrows=num_rows, ncols=3, figsize=(15, 5*num_rows))
    axs = axs.flatten()

    # Create a histogram for each integer variable
    for i, var in enumerate(int_vars):
        df[var].plot.hist(ax=axs[i])
        axs[i].set_title(var)

    # Remove any extra empty subplots if needed
    if num_cols < len(axs):
        for i in range(num_cols, len(axs)):
            fig.delaxes(axs[i])

    # Adjust spacing between subplots
    fig.tight_layout()

    # Show plot
    plt.show()

[Figure: histograms of all integer columns]
In [8]:
    # Get the names of all columns with data type 'int'
    int_vars = df.select_dtypes(include='int').columns.tolist()

    # Create a figure with subplots
    num_cols = len(int_vars)
    num_rows = (num_cols + 2) // 3  # To make sure there are enough rows for the subplots
    fig, axs = plt.subplots(nrows=num_rows, ncols=3, figsize=(15, 5*num_rows))
    axs = axs.flatten()

    # Create a box plot for each integer variable using Seaborn
    for i, var in enumerate(int_vars):
        sns.boxplot(x=df[var], ax=axs[i])
        axs[i].set_title(var)

    # Remove any extra empty subplots if needed
    if num_cols < len(axs):
        for i in range(num_cols, len(axs)):
            fig.delaxes(axs[i])

    # Adjust spacing between subplots
    fig.tight_layout()

    # Show plot
    plt.show()

[Figure: boxplots of all integer columns]

In [9]:
    # Filter columns with data type 'int'
    int_columns = df.select_dtypes(include='int')

    mean_values = int_columns.mean()
    std_values = int_columns.std()
    min_values = int_columns.min()
    q1_values = int_columns.quantile(0.25)
    q2_values = int_columns.quantile(0.5)
    q3_values = int_columns.quantile(0.75)
    iqr_values = q3_values - q1_values
    max_values = int_columns.max()

    describe_df = pd.DataFrame({
        'Mean': mean_values,
        'Std': std_values,
        'Min': min_values,
        'Q1': q1_values,
        'Q2 (Median)': q2_values,
        'Q3': q3_values,
        'IQR': iqr_values,
        'Max': max_values
    })
    describe_df

[Output: basic statistics per integer column, covering DailyRate, MonthlyIncome, MonthlyRate, PercentSalaryHike, RelationshipSatisfaction, StockOptionLevel, TotalWorkingYears, TrainingTimesLastYear, YearsAtCompany, and the other numeric columns]

In [10]:
    # Filter columns with data type 'int'
    int_columns = df.select_dtypes(include='int')

    q1_values = int_columns.quantile(0.25)
    q3_values = int_columns.quantile(0.75)
    iqr_values = q3_values - q1_values

    upper_whisker_values = q3_values + 1.5 * iqr_values
    lower_whisker_values = q1_values - 1.5 * iqr_values

    whiskers_df = pd.DataFrame({
        'Upper Whisker': upper_whisker_values,
        'Lower Whisker': lower_whisker_values
    })
    whiskers_df

[Output: upper- and lower-whisker values per integer column]
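One detail worth flagging: the values above are the 1.5*IQR fences, which is not exactly where matplotlib draws a boxplot's whiskers - the drawn whisker ends at the most extreme observation still inside the fence. A minimal sketch of that clipping, reusing `int_columns` and the two whisker Series from the cell above (this variant is an addition, not part of the original notebook):

    # Whiskers as actually drawn by a boxplot: the most extreme
    # observations that still lie inside the 1.5*IQR fences.
    drawn_upper = int_columns[int_columns.le(upper_whisker_values)].max()
    drawn_lower = int_columns[int_columns.ge(lower_whisker_values)].min()
    pd.DataFrame({'Drawn Upper Whisker': drawn_upper,
                  'Drawn Lower Whisker': drawn_lower})

For columns with no observations beyond the fences, the drawn whiskers simply coincide with the column min and max.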
In [11]:
    import pandas as pd

    # Assuming you already have the DataFrame "df" containing the data
    int_columns = df.select_dtypes(include='int')

    q1_values = int_columns.quantile(0.25)
    q3_values = int_columns.quantile(0.75)
    iqr_values = q3_values - q1_values

    upper_whisker_values = q3_values + 1.5 * iqr_values
    lower_whisker_values = q1_values - 1.5 * iqr_values

    outlier_count = ((int_columns < lower_whisker_values) |
                     (int_columns > upper_whisker_values)).sum()
    total_data_points = len(int_columns)
    outlier_proportion = (outlier_count / total_data_points) * 100

    outlier_list = {}
    for col in int_columns.columns:
        outliers = int_columns[col][(int_columns[col] < lower_whisker_values[col]) |
                                    (int_columns[col] > upper_whisker_values[col])]
        outlier_list[col] = outliers.tolist()

    outlier_df = pd.DataFrame({
        'Outlier Count': outlier_count,
        'Outlier Proportion (%)': outlier_proportion,
        'Outlier List': outlier_list
    })
    outlier_df

[Output: per-column outlier count, proportion, and list; most columns have none, while MonthlyIncome, NumCompaniesWorked, StockOptionLevel, TotalWorkingYears, TrainingTimesLastYear, YearsAtCompany, YearsSinceLastPromotion, and YearsWithCurrManager show non-zero counts]

(2) Run univariate EDA on every categorical column of employee.csv, covering:
a. a countplot for each column
b. the list of unique categories and their frequencies for each column
c. anything you find interesting in the EDA results

In [12]:
    # Get the names of all columns with data type 'object' (categorical columns)
    cat_vars = df.select_dtypes(include='object').columns.tolist()

    # Create a figure with subplots
    num_cols = len(cat_vars)
    num_rows = (num_cols + 2) // 3  # To make sure there are enough rows for the subplots
    fig, axs = plt.subplots(nrows=num_rows, ncols=3, figsize=(15, 5*num_rows))
    axs = axs.flatten()

    # Create a countplot for each categorical variable using Seaborn
    for i, var in enumerate(cat_vars):
        sns.countplot(x=var, data=df, ax=axs[i])
        axs[i].set_title(var)
        axs[i].tick_params(axis='x', rotation=90)  # Rotate x-axis labels for readability

    # Remove any extra empty subplots if needed
    if num_cols < len(axs):
        for i in range(num_cols, len(axs)):
            fig.delaxes(axs[i])

    # Adjust spacing between subplots
    fig.tight_layout()

    # Show plot
    plt.show()

[Figure: countplots of all categorical columns]

In [13]:
    # Dictionary to store unique categories and their frequencies for each column
    unique_categories = {}

    # Iterate over each column in the DataFrame
    for col in df.columns:
        # If the column has object data type (categorical column)
        if df[col].dtype == 'object':
            value_counts = df[col].value_counts()
            unique_categories[col] = {
                'Categories': value_counts.index.tolist(),
                'Frequency': value_counts.values.tolist()
            }

    # Display the list of unique categories and their frequencies for each column
    for col, data in unique_categories.items():
        print(f"Column: {col}")
        print(pd.DataFrame(data))
        print()
[Output: category frequency tables for BusinessTravel, Department, EducationField, Gender, JobRole, MaritalStatus, and OverTime]

Findings:
- The majority of employees rarely travel, and the majority work in Research and Development.
- The majority of employees work overtime.

(3) Run bivariate EDA on each pairing of a numeric column with the 'Attrition' column of employee.csv, covering:
a. a boxplot (or a variation of it) of each numeric column (y-axis) against the 'Attrition' column (x-axis)
b. anything you find interesting in the EDA results

In [17]:
    # Get the names of all columns with data type 'int'
    int_vars = df.select_dtypes(include='int').columns.tolist()

    # Create a figure with subplots
    num_cols = len(int_vars)
    num_rows = (num_cols + 2) // 3  # To make sure there are enough rows for the subplots
    fig, axs = plt.subplots(nrows=num_rows, ncols=3, figsize=(15, 5*num_rows))
    axs = axs.flatten()

    # Create a box plot for each integer variable using Seaborn with hue 'Attrition'
    for i, var in enumerate(int_vars):
        sns.boxplot(y=var, x='Attrition', data=df, ax=axs[i])
        axs[i].set_title(var)

    # Remove any extra empty subplots if needed
    if num_cols < len(axs):
        for i in range(num_cols, len(axs)):
            fig.delaxes(axs[i])

    # Adjust spacing between subplots
    fig.tight_layout()

    # Show plot
    plt.show()

[Figure: boxplots of each integer column split by Attrition]

In [18]:
    # Get the names of all columns with data type 'int'
    int_vars = df.select_dtypes(include='int').columns.tolist()

    # Create a figure with subplots
    num_cols = len(int_vars)
    num_rows = (num_cols + 2) // 3  # To make sure there are enough rows for the subplots
    fig, axs = plt.subplots(nrows=num_rows, ncols=3, figsize=(15, 5*num_rows))
    axs = axs.flatten()

    # Create a histogram for each integer variable with hue 'Attrition'
    for i, var in enumerate(int_vars):
        sns.histplot(data=df, x=var, hue='Attrition', kde=True, ax=axs[i])
        axs[i].set_title(var)

    # Remove any extra empty subplots if needed
    if num_cols < len(axs):
        for i in range(num_cols, len(axs)):
            fig.delaxes(axs[i])

    # Adjust spacing between subplots
    fig.tight_layout()

    # Show plot
    plt.show()

[Figure: histograms with KDE of each integer column split by Attrition]
(4) Run bivariate EDA on each pairing of a categorical column with the 'Attrition' column of employee.csv, covering:
a. a countplot for each categorical column with 'Attrition' as the hue
b. a stacked bar plot showing the proportion of 'Attrition' values for each category of every categorical column (one plot per categorical column, each plot with as many bars as that column has unique categories; a proportion-based sketch follows the density plots below)
c. anything you find interesting in the EDA results

In [19]:
    # Get the names of all columns with data type 'object' (categorical columns)
    cat_vars = df.select_dtypes(include='object').columns.tolist()

    # Create a figure with subplots
    num_cols = len(cat_vars)
    num_rows = (num_cols + 2) // 3  # To make sure there are enough rows for the subplots
    fig, axs = plt.subplots(nrows=num_rows, ncols=3, figsize=(15, 5*num_rows))
    axs = axs.flatten()

    # Create a countplot for each categorical variable using Seaborn with 'Attrition' as hue
    for i, var in enumerate(cat_vars):
        sns.countplot(x=var, hue='Attrition', data=df, ax=axs[i])
        axs[i].set_title(var)
        axs[i].tick_params(axis='x', rotation=90)  # Rotate x-axis labels for readability

    # Remove any extra empty subplots if needed
    if num_cols < len(axs):
        for i in range(num_cols, len(axs)):
            fig.delaxes(axs[i])

    # Adjust spacing between subplots
    fig.tight_layout()

    # Show plot
    plt.show()

[Figure: countplots of categorical columns with Attrition as hue]

In [20]:
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd

    # Assuming you already have the DataFrame "df" containing the data

    # Get the names of all columns with data type 'object' (categorical columns)
    cat_vars = df.select_dtypes(include='object').columns.tolist()

    # Create the stacked density plots
    num_cols = len(cat_vars)
    num_rows = (num_cols + 2) // 3
    fig, axs = plt.subplots(nrows=num_rows, ncols=3, figsize=(15, 5*num_rows))
    axs = axs.flatten()

    for i, var in enumerate(cat_vars):
        sns.histplot(data=df, x=var, hue='Attrition', stat='density',
                     multiple='stack', ax=axs[i])
        axs[i].set_title(var)
        axs[i].set_ylabel('Density')
        axs[i].tick_params(axis='x', rotation=90)

    # Remove any extra empty subplots if needed
    if num_cols < len(axs):
        for i in range(num_cols, len(axs)):
            fig.delaxes(axs[i])

    fig.tight_layout()

    # Show plot
    plt.show()

[Figure: stacked density plots of categorical columns with Attrition as hue]
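The stacked density plots above stack overall densities rather than the within-category attrition proportions that point 4b asks for. A minimal sketch of a per-category proportion chart, assuming the same `df` and `cat_vars` as above; `pd.crosstab` with `normalize='index'` does the proportion work:

    # One stacked bar plot per categorical column: each bar is a category,
    # and its segments are the within-category proportions of Attrition values.
    for var in cat_vars:
        if var == 'Attrition':
            continue  # skip the target itself
        props = pd.crosstab(df[var], df['Attrition'], normalize='index')
        props.plot(kind='bar', stacked=True, figsize=(6, 4), title=var)
        plt.ylabel('Proportion')
        plt.tight_layout()
        plt.show()

Because every bar sums to 1, categories with unusually high attrition stand out regardless of how many employees they contain.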
(5) Run an independent t-test (ttest_ind) with:
- H0: there is no difference in mean 'TotalWorkingYears' between employees who left and those who stayed (use the 'Attrition' column as the reference)
- H1: there is a difference in mean 'TotalWorkingYears' between employees who left and those who stayed
- alpha = 5%

Run the test and explain the conclusion.

In [ ]:
    from scipy.stats import ttest_ind

    # Extract 'TotalWorkingYears' for employees who left (Attrition = 'Yes') and stayed (Attrition = 'No')
    working_years_left = df[df['Attrition'] == 'Yes']['TotalWorkingYears']
    working_years_stayed = df[df['Attrition'] == 'No']['TotalWorkingYears']

    # Perform the independent t-test
    alpha = 0.05
    t_statistic, p_value = ttest_ind(working_years_left, working_years_stayed)

    print("Independent T-Test Results:")
    print(f"T-statistic: {t_statistic}")
    print(f"P-value: {p_value}")

[Output: the t-statistic and a p-value below 0.05]

In [ ]:
    hasil = p_value < alpha  # True -> reject H0
    # Reject the null hypothesis (H0): there is a significant difference in mean
    # 'TotalWorkingYears' between employees who left and those who stayed.

(6) Run a one-way ANOVA with:
- H0: there is no difference in mean 'Age' across the departments in the dataset
- H1: at least two departments differ in mean employee age

Run the one-way ANOVA and explain the conclusion.

In [ ]:
    from scipy.stats import f_oneway

    # Extract 'Age' for employees from each department
    age_Sales = df[df['Department'] == 'Sales']['Age']
    age_Research_Development = df[df['Department'] == 'Research & Development']['Age']
    age_Human_Resources = df[df['Department'] == 'Human Resources']['Age']

    # Perform the one-way ANOVA
    alpha = 0.05
    f_statistic, p_value = f_oneway(age_Sales, age_Research_Development, age_Human_Resources)

    # Print the results
    print("One-way ANOVA Results:")
    print(f"F-statistic: {f_statistic}")
    print(f"P-value: {p_value}")

[Output: F-statistic 2.5925719912969073 and a p-value above 0.05]

In [ ]:
    hasil = p_value > alpha  # True -> fail to reject H0
    # The null hypothesis (H0) is accepted: there is no significant difference in mean
    # 'Age' among employees of the three departments.

Classification

(7) Perform whatever feature engineering is considered necessary on the train set: imputation, encoding, scaling, selection. Apply the same transformations to the test set without fitting again.

In [ ]:
    # Check missing values
    check_missing = df.isnull().sum() * 100 / df.shape[0]
    check_missing[check_missing > 0].sort_values(ascending=False)

Series([], dtype: float64)
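The cells below cover the encoding step, but the scaling that point (7) mentions is never shown. A minimal sketch of fit-on-train-only scaling, assuming the `X_train`/`X_test` split created further down; `StandardScaler` is one reasonable choice here, not the notebook's own:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # fit statistics on the train set only
    X_test_scaled = scaler.transform(X_test)        # reuse train statistics; no refitting

Fitting only on the train set is exactly the "without fitting again" requirement: the test set is transformed with the train set's mean and variance, so no information leaks from test to train.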
In [27]:
    # Loop over each column in the DataFrame where dtype is 'object'
    for col in df.select_dtypes(include=['object']).columns:
        # Print the column name and the unique values
        print(f"{col}: {df[col].unique()}")

Attrition: ['Yes' 'No']
BusinessTravel: ['Travel_Rarely' 'Travel_Frequently' 'Non-Travel']
Department: ['Sales' 'Research & Development' 'Human Resources']
EducationField: ['Life Sciences' 'Other' 'Medical' 'Marketing' 'Technical Degree' 'Human Resources']
Gender: ['Female' 'Male']
JobRole: ['Sales Executive' 'Research Scientist' 'Laboratory Technician' 'Manufacturing Director' 'Healthcare Representative' 'Manager' 'Sales Representative' 'Research Director' 'Human Resources']
MaritalStatus: ['Single' 'Married' 'Divorced']
OverTime: ['Yes' 'No']

In [28]:
    from sklearn import preprocessing

    # Loop over each column in the DataFrame where dtype is 'object'
    for col in df.select_dtypes(include=['object']).columns:
        # Initialize a LabelEncoder object
        label_encoder = preprocessing.LabelEncoder()

        # Fit the encoder to the unique values in the column
        label_encoder.fit(df[col].unique())

        # Transform the column using the encoder
        df[col] = label_encoder.transform(df[col])

        # Print the column name and the unique encoded values
        print(f"{col}: {df[col].unique()}")

Attrition: [1 0]
BusinessTravel: [2 1 0]
Department: [2 1 0]
EducationField: [1 4 3 2 5 0]
Gender: [0 1]
JobRole: [7 6 2 4 0 3 8 5 1]
MaritalStatus: [2 1 0]
OverTime: [1 0]

In [30]:
    # Correlation heatmap
    plt.figure(figsize=(20, 20))
    sns.heatmap(df.corr(), annot=True, fmt='.2g')

[Figure: correlation heatmap of all columns]

(9) Prepare the dataset for classification. Make the 'Attrition' column the target. Drop every column that is considered unnecessary.

In [ ]:
    X = df.drop('Attrition', axis=1)
    y = df['Attrition']

(10) Perform a train-test split with test_size = 0.2 and stratify = y, then remove outliers using the Z-score.

In [ ]:
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

In [ ]:
    import numpy as np
    from scipy import stats

    # Define the columns for which you want to remove outliers
    selected_columns = ['MonthlyIncome', 'StockOptionLevel', 'TotalWorkingYears',
                        'TrainingTimesLastYear', 'YearsAtCompany', 'YearsInCurrentRole',
                        'YearsSinceLastPromotion', 'YearsWithCurrManager']

    # Calculate the Z-scores for the selected columns in the training data
    z_scores = np.abs(stats.zscore(X_train[selected_columns]))

    # Set a threshold value for outlier detection (e.g., 3)
    threshold = 3

    # Find the indices of outliers based on the threshold
    outlier_indices = np.where(z_scores > threshold)[0]

    # Remove the outliers from the training data
    X_train = X_train.drop(X_train.index[outlier_indices])
    y_train = y_train.drop(y_train.index[outlier_indices])

(11) Prepare 3 estimators and run cross-validation with Logistic Regression, Decision Tree Classifier, and XGBoost Classifier as the estimators (if you cannot install xgboost, pick another classifier) to find the optimum values for several hyperparameters of each estimator.

In [ ]:
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from xgboost import XGBClassifier

    # Estimator 1: Logistic Regression
    lr_classifier = LogisticRegression(solver='liblinear', random_state=42)
    lr_params = {'C': [0.1, 1.0, 10.0]}

    # Estimator 2: Decision Tree Classifier
    dt_classifier = DecisionTreeClassifier(random_state=42)
    dt_params = {'max_depth': [None, 5, 10, 15]}

    # Estimator 3: XGBoost Classifier
    xgb_classifier = XGBClassifier(random_state=42)
    xgb_params = {
        'n_estimators': [50, 100, 150],
        'max_depth': [3, 5, 7],
        'learning_rate': [0.1, 0.01, 0.001]
    }

    # Cross-validation for Logistic Regression
    lr_grid_search = GridSearchCV(lr_classifier, lr_params, cv=5, scoring='accuracy')
    lr_grid_search.fit(X_train, y_train)
    print("Logistic Regression - Best Parameters:", lr_grid_search.best_params_)
    print("Logistic Regression - Best Cross-Validation Score:", lr_grid_search.best_score_)

    # Cross-validation for Decision Tree Classifier
    dt_grid_search = GridSearchCV(dt_classifier, dt_params, cv=5, scoring='accuracy')
    dt_grid_search.fit(X_train, y_train)
    print("Decision Tree Classifier - Best Parameters:", dt_grid_search.best_params_)
    print("Decision Tree Classifier - Best Cross-Validation Score:", dt_grid_search.best_score_)

    # Cross-validation for XGBoost Classifier
    xgb_grid_search = GridSearchCV(xgb_classifier, xgb_params, cv=5, scoring='accuracy')
    xgb_grid_search.fit(X_train, y_train)
    print("XGBoost Classifier - Best Parameters:", xgb_grid_search.best_params_)
    print("XGBoost Classifier - Best Cross-Validation Score:", xgb_grid_search.best_score_)

Logistic Regression - Best Parameters: {'C': 1.0}
Logistic Regression - Best Cross-Validation Score: 0.8689917874718646
Decision Tree Classifier - Best Parameters: {'max_depth': None}
XGBoost Classifier - Best Parameters: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100}

(12) Fit the three estimators on the train set. Print a classification report for the train set and the test set for each of the three estimators.
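Point (12) asks for classification reports, while the cells below print individual metric scores instead. A minimal sketch using scikit-learn's `classification_report`, assuming a fitted estimator such as the `dtree` defined in the next cells:

    from sklearn.metrics import classification_report

    # Per-class precision, recall, and F1 for train and test predictions
    print(classification_report(y_train, dtree.predict(X_train)))
    print(classification_report(y_test, dtree.predict(X_test)))

The same two lines work unchanged for the random forest and XGBoost models fitted further down.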
In [ ]:
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import GridSearchCV

    dtree = DecisionTreeClassifier(class_weight='balanced')
    param_grid = {
        'max_depth': [3, 4, 5, 6, 7, 8],
        'min_samples_split': [2, 3, 4],
        'min_samples_leaf': [1, 2, 3, 4],
        'random_state': [0, 42]
    }

    # Perform a grid search with cross-validation to find the best hyperparameters
    grid_search = GridSearchCV(dtree, param_grid, cv=5)
    grid_search.fit(X_train, y_train)

    # Print the best hyperparameters
    print(grid_search.best_params_)

{'max_depth': 8, 'min_samples_leaf': 1, 'min_samples_split': 2, 'random_state': 0}

In [37]:
    from sklearn.tree import DecisionTreeClassifier

    dtree = DecisionTreeClassifier(random_state=0, max_depth=8, min_samples_leaf=1,
                                   min_samples_split=2, class_weight='balanced')
    dtree.fit(X_train, y_train)

Out[37]: DecisionTreeClassifier(class_weight='balanced', max_depth=8, random_state=0)

In [38]:
    from sklearn.metrics import accuracy_score

    y_pred = dtree.predict(X_test)
    print("Accuracy Score :", round(accuracy_score(y_test, y_pred)*100, 2), "%")

Accuracy Score : 84.18 %

In [39]:
    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, jaccard_score, log_loss

    print('F-1 Score : ', (f1_score(y_test, y_pred, average='micro')))
    print('Precision Score : ', (precision_score(y_test, y_pred, average='micro')))
    print('Recall Score : ', (recall_score(y_test, y_pred, average='micro')))
    print('Jaccard Score : ', (jaccard_score(y_test, y_pred, average='micro')))
    print('Log Loss : ', (log_loss(y_test, y_pred)))

F-1 Score :  0.8418367346938775
Precision Score :  0.8418367346938775
Recall Score :  0.8418367346938775
Jaccard Score :  0.7268722466968352
Log Loss :  5.45…

In [40]:
    imp_df = pd.DataFrame({
        'Feature Name': X_train.columns,
        'Importance': dtree.feature_importances_
    })
    fi = imp_df.sort_values(by='Importance', ascending=False)
    fi2 = fi.head(10)
    plt.figure(figsize=(10, 8))
    sns.barplot(data=fi2, x='Importance', y='Feature Name')
    plt.title('Top 10 Feature Importance Each Attributes (Decision Tree)', fontsize=18)
    plt.xlabel('Importance', fontsize=16)
    plt.ylabel('Feature Name', fontsize=16)
    plt.show()

[Figure: top-10 feature importance bar chart (Decision Tree)]

In [41]:
    import shap

    explainer = shap.TreeExplainer(dtree)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)

[Figure: SHAP summary bar plot (mean |SHAP value| per feature, classes 0 and 1); top features include OverTime, StockOptionLevel, EnvironmentSatisfaction, TotalWorkingYears, NumCompaniesWorked, JobSatisfaction, and MonthlyIncome]

In [42]:
    # Compute SHAP values
    explainer = shap.TreeExplainer(dtree)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values[1], X_test.values, feature_names=X_test.columns)

[Figure: SHAP beeswarm plot for the positive class; features ordered by impact - OverTime, Age, JobLevel, StockOptionLevel, EnvironmentSatisfaction, TotalWorkingYears, ...]

In [43]:
    from sklearn.metrics import confusion_matrix

    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(5, 5))
    sns.heatmap(data=cm, linewidths=.5, annot=True, cmap='Blues')
    plt.ylabel('Actual label')
    plt.xlabel('Predicted label')
    all_sample_title = 'Accuracy Score for Decision Tree: {0}'.format(dtree.score(X_test, y_test))
    plt.title(all_sample_title, size=15)

Out[43]: Text(0.5, 1.0, 'Accuracy Score for Decision Tree: 0.8418367346938775')

[Figure: confusion matrix heatmap for the decision tree]
In [44]:
    from sklearn.metrics import roc_curve, roc_auc_score

    y_pred_proba = dtree.predict_proba(X_test)[:][:, 1]

    df_actual_predicted = pd.concat([pd.DataFrame(np.array(y_test), columns=['y_actual']),
                                     pd.DataFrame(y_pred_proba, columns=['y_pred_proba'])], axis=1)
    df_actual_predicted.index = y_test.index

    fpr, tpr, tr = roc_curve(df_actual_predicted['y_actual'], df_actual_predicted['y_pred_proba'])
    auc = roc_auc_score(df_actual_predicted['y_actual'], df_actual_predicted['y_pred_proba'])

    plt.plot(fpr, tpr, label='AUC = %0.4f' % auc)
    plt.plot(fpr, fpr, linestyle='--', color='k')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve', size=15)
    plt.legend()

Out[44]: <matplotlib.legend.Legend at 0x...>

[Figure: ROC curve for the decision tree, AUC = 0.7771]

In [45]:
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    rfc = RandomForestClassifier(class_weight='balanced')
    param_grid = {
        'n_estimators': [100, 200],
        'max_depth': [None, 5, 10],
        'max_features': ['sqrt', 'log2', None],
        'random_state': [0, 42]
    }

    # Perform a grid search with cross-validation to find the best hyperparameters
    grid_search = GridSearchCV(rfc, param_grid, cv=5)
    grid_search.fit(X_train, y_train)

    # Print the best hyperparameters
    print(grid_search.best_params_)

{'max_depth': None, 'max_features': 'sqrt', 'n_estimators': 100, 'random_state': 42}

In [46]:
    from sklearn.ensemble import RandomForestClassifier

    rfc = RandomForestClassifier(random_state=42, max_depth=None, max_features='sqrt',
                                 n_estimators=100, class_weight='balanced')
    rfc.fit(X_train, y_train)

Out[46]: RandomForestClassifier(class_weight='balanced', max_features='sqrt', random_state=42)

In [47]:
    y_pred = rfc.predict(X_test)
    print("Accuracy Score :", round(accuracy_score(y_test, y_pred)*100, 2), "%")

Accuracy Score : 97.28 %

In [48]:
    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, jaccard_score, log_loss

    print('F-1 Score : ', (f1_score(y_test, y_pred, average='micro')))
    print('Precision Score : ', (precision_score(y_test, y_pred, average='micro')))
    print('Recall Score : ', (recall_score(y_test, y_pred, average='micro')))
    print('Jaccard Score : ', (jaccard_score(y_test, y_pred, average='micro')))
    print('Log Loss : ', (log_loss(y_test, y_pred)))

F-1 Score :  0.9727891156462585
Precision Score :  0.9727891156462585
Recall Score :  0.9727891156462585
Jaccard Score :  0.947…
Log Loss :  0.98…

In [49]:
    imp_df = pd.DataFrame({
        'Feature Name': X_train.columns,
        'Importance': rfc.feature_importances_
    })
    fi = imp_df.sort_values(by='Importance', ascending=False)
    fi2 = fi.head(10)
    plt.figure(figsize=(10, 8))
    sns.barplot(data=fi2, x='Importance', y='Feature Name')
    plt.title('Top 10 Feature Importance Each Attributes (Random Forest)', fontsize=18)
    plt.xlabel('Importance', fontsize=16)
    plt.ylabel('Feature Name', fontsize=16)
    plt.show()

[Figure: top-10 feature importance bar chart (Random Forest) - MonthlyIncome, OverTime, Age, DailyRate, TotalWorkingYears, YearsAtCompany, DistanceFromHome, NumCompaniesWorked, ...]

In [50]:
    import shap

    explainer = shap.TreeExplainer(rfc)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)

[Figure: SHAP summary bar plot (Random Forest); top features include MonthlyIncome, OverTime, StockOptionLevel, Age, YearsAtCompany, TotalWorkingYears, YearsWithCurrManager, and JobSatisfaction]

In [51]:
    # Compute SHAP values
    explainer = shap.TreeExplainer(rfc)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values[1], X_test.values, feature_names=X_test.columns)

[Figure: SHAP beeswarm plot for the positive class (Random Forest) - OverTime, MonthlyIncome, Age, StockOptionLevel, JobLevel, EnvironmentSatisfaction, MaritalStatus, ...]

In [52]:
    from sklearn.metrics import confusion_matrix

    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(5, 5))
    sns.heatmap(data=cm, linewidths=.5, annot=True, cmap='Blues')
    plt.ylabel('Actual label')
    plt.xlabel('Predicted label')
    all_sample_title = 'Accuracy Score for Random Forest: {0}'.format(rfc.score(X_test, y_test))
    plt.title(all_sample_title, size=15)

Out[52]: Text(0.5, 1.0, 'Accuracy Score for Random Forest: 0.9727891156462585')

[Figure: confusion matrix heatmap for the random forest]

In [53]:
    from sklearn.metrics import roc_curve, roc_auc_score

    y_pred_proba = rfc.predict_proba(X_test)[:][:, 1]

    df_actual_predicted = pd.concat([pd.DataFrame(np.array(y_test), columns=['y_actual']),
                                     pd.DataFrame(y_pred_proba, columns=['y_pred_proba'])], axis=1)
    df_actual_predicted.index = y_test.index

    fpr, tpr, tr = roc_curve(df_actual_predicted['y_actual'], df_actual_predicted['y_pred_proba'])
    auc = roc_auc_score(df_actual_predicted['y_actual'], df_actual_predicted['y_pred_proba'])

    plt.plot(fpr, tpr, label='AUC = %0.4f' % auc)
    plt.plot(fpr, fpr, linestyle='--', color='k')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve', size=15)
    plt.legend()

Out[53]: <matplotlib.legend.Legend at 0x...>
[Figure: ROC curve for the random forest, AUC = 0.9807]

In [54]:
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    # Create an XGBoost classifier
    xgb = XGBClassifier()

    # Define the parameter grid for the grid search
    param_grid = {
        'n_estimators': [100, 200],
        'max_depth': [3, 5, 7],
        'learning_rate': [0.1, 0.01, 0.001],
        'gamma': [0, 0.1, 0.2]
    }

    # Perform a grid search with cross-validation to find the best hyperparameters
    grid_search = GridSearchCV(xgb, param_grid, cv=5)
    grid_search.fit(X_train, y_train)

    # Print the best hyperparameters
    print(grid_search.best_params_)

{'gamma': 0.2, 'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 200}

In [55]:
    from xgboost import XGBClassifier

    xgb = XGBClassifier(gamma=0.2, learning_rate=0.1, max_depth=7, n_estimators=200)
    xgb.fit(X_train, y_train)

Out[55]: XGBClassifier(gamma=0.2, learning_rate=0.1, max_depth=7, n_estimators=200, ...)

In [56]:
    from sklearn.metrics import accuracy_score

    y_pred = xgb.predict(X_test)
    print("Accuracy Score :", round(accuracy_score(y_test, y_pred)*100, 2), "%")

Accuracy Score : 97.28 %

In [57]:
    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, jaccard_score, log_loss

    print('F-1 Score : ', (f1_score(y_test, y_pred, average='micro')))
    print('Precision Score : ', (precision_score(y_test, y_pred, average='micro')))
    print('Recall Score : ', (recall_score(y_test, y_pred, average='micro')))
    print('Jaccard Score : ', (jaccard_score(y_test, y_pred, average='micro')))
    print('Log Loss : ', (log_loss(y_test, y_pred)))

F-1 Score :  0.9727891156462585
Precision Score :  0.9727891156462585
Recall Score :  0.9727891156462585
Jaccard Score :  0.947…
Log Loss :  0.9398533690208025

In [58]:
    import shap

    explainer = shap.TreeExplainer(xgb)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

[Figure: SHAP beeswarm plot (XGBoost) - OverTime, StockOptionLevel, MonthlyIncome, Age, DistanceFromHome, NumCompaniesWorked, EnvironmentSatisfaction, JobSatisfaction, ...]

In [59]:
    from sklearn.metrics import confusion_matrix

    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(5, 5))
    sns.heatmap(data=cm, linewidths=.5, annot=True, cmap='Blues')
    plt.ylabel('Actual label')
    plt.xlabel('Predicted label')
    all_sample_title = 'Accuracy Score for XGBoost: {0}'.format(xgb.score(X_test, y_test))
    plt.title(all_sample_title, size=15)

Out[59]: Text(0.5, 1.0, 'Accuracy Score for XGBoost: 0.9727891156462585')

[Figure: confusion matrix heatmap for XGBoost]

In [60]:
    from sklearn.metrics import roc_curve, roc_auc_score

    y_pred_proba = xgb.predict_proba(X_test)[:][:, 1]

    df_actual_predicted = pd.concat([pd.DataFrame(np.array(y_test), columns=['y_actual']),
                                     pd.DataFrame(y_pred_proba, columns=['y_pred_proba'])], axis=1)
    df_actual_predicted.index = y_test.index

    fpr, tpr, tr = roc_curve(df_actual_predicted['y_actual'], df_actual_predicted['y_pred_proba'])
    auc = roc_auc_score(df_actual_predicted['y_actual'], df_actual_predicted['y_pred_proba'])

    plt.plot(fpr, tpr, label='AUC = %0.4f' % auc)
    plt.plot(fpr, fpr, linestyle='--', color='k')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve', size=15)
    plt.legend()

Out[60]: <matplotlib.legend.Legend at 0x...>

[Figure: ROC curve for XGBoost, AUC = 0.9809]

(13) From the performance results you obtained, which estimator is the best?
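The notebook leaves point (13) unanswered in code. A minimal sketch that gathers the test-set scores reported above into one frame, assuming the fitted `dtree`, `rfc`, and `xgb` estimators are still in scope; going by the reported figures, the random forest and XGBoost are essentially tied on accuracy (0.9728), with XGBoost marginally ahead on AUC:

    from sklearn.metrics import accuracy_score, roc_auc_score

    # Collect test-set accuracy and ROC-AUC for the three fitted classifiers
    scores = pd.DataFrame([
        {'Model': name,
         'Accuracy': accuracy_score(y_test, model.predict(X_test)),
         'ROC-AUC': roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])}
        for name, model in [('Decision Tree', dtree), ('Random Forest', rfc), ('XGBoost', xgb)]
    ])
    print(scores.sort_values('ROC-AUC', ascending=False))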
Regression

In [65]:
    df2 = pd.read_csv('employee.csv')
    df2.head()

[Output: first five rows of the reloaded dataset]

In [66]:
    # Drop the columns Unnamed: 0, EmployeeNumber, StandardHours, Over18, and PerformanceRating
    df2.drop(columns=['Unnamed: 0', 'EmployeeNumber', 'StandardHours',
                      'Over18', 'PerformanceRating'], inplace=True)

In [67]:
    from sklearn import preprocessing

    # Loop over each column in the DataFrame where dtype is 'object'
    for col in df2.select_dtypes(include=['object']).columns:
        # Initialize a LabelEncoder object
        label_encoder = preprocessing.LabelEncoder()

        # Fit the encoder to the unique values in the column
        label_encoder.fit(df2[col].unique())

        # Transform the column using the encoder
        df2[col] = label_encoder.transform(df2[col])

        # Print the column name and the unique encoded values
        print(f"{col}: {df2[col].unique()}")

Attrition: [1 0]
BusinessTravel: [2 1 0]
Department: [2 1 0]
EducationField: [1 4 3 2 5 0]
Gender: [0 1]
JobRole: [7 6 2 4 0 3 8 5 1]
MaritalStatus: [2 1 0]
OverTime: [1 0]

(15) Split the train set and the test set with test_size = 0.2.

In [68]:
    X = df2.drop('MonthlyIncome', axis=1)
    y = df2['MonthlyIncome']

In [69]:
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

(16) Apply whatever outlier treatment is needed on the train set. Apply it to the test set as well, without fitting again.

In [70]:
    import numpy as np
    from scipy import stats

    # Define the columns for which you want to remove outliers
    selected_columns = ['StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear',
                        'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',
                        'YearsWithCurrManager']

    # Calculate the Z-scores for the selected columns in the training data
    z_scores = np.abs(stats.zscore(X_train[selected_columns]))

    # Set a threshold value for outlier detection (e.g., 3)
    threshold = 3

    # Find the indices of outliers based on the threshold
    outlier_indices = np.where(z_scores > threshold)[0]

    # Remove the outliers from the training data
    X_train = X_train.drop(X_train.index[outlier_indices])
    y_train = y_train.drop(y_train.index[outlier_indices])
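Point (16) also asks for the same treatment on the test set without refitting, which the cell above does not show. A minimal sketch, assuming the train-set mean and std are the "fitted" statistics to reuse (the usual reading of "without fitting again"):

    # Reuse the train-set statistics instead of recomputing them on the test set
    train_mean = X_train[selected_columns].mean()
    train_std = X_train[selected_columns].std()

    z_scores_test = np.abs((X_test[selected_columns] - train_mean) / train_std)
    test_outlier_rows = (z_scores_test > threshold).any(axis=1)

    X_test = X_test[~test_outlier_rows]
    y_test = y_test[~test_outlier_rows]

Dropping test rows changes the evaluation set, so an alternative reading is to cap or winsorize rather than remove; either way, the statistics must come from the train set only.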
error") # GrtdsearchCv for XGboost Regressor gb_gri_search = cridsearchev(agh_regressor, xgb_parans, apbgri_searen 2G train, ¥_273in), pristcsdGacost Rogresior = bert Paraneters:*, ngh_grid_search.best_parsas_) Printt“XG300st Repressor = Best Cross-validation RMSE: na.SQPEC xD RP seaPch.best_score_)) Scortng="neg_tean_squared_ervor) 1 Getdsearchev for Polynomial Regression locahost 888inotebooks/Sharing Vision Testipynbt 49167 7128723, 935 PM ‘Sharing Vision Test -Jupyter Notebook 1 [73) ous(73} te (74) poly.arid_search = Gridsearchevipoly_regressor, poly_parans, c¥-5, scoring neR-nean_squared_error*) polyoarid-acarch.flt@t train poly, Yoteain) oniai Regression ~ Best Poraneters:", poly grid_search.best_parans_) iymomiah Regression ~ Best Cross-validation WAGES", npvsart (-poly-ar arch. vest_score_)) # eriasearchcv for Linear Regression Isnear_ertdsearch - GniaSearchcV( linear regressor, Linear_parans, v5, scoring: nee_near_aquared_error") Aipear_grid_seareh-1U0K trashy y_ train) DrinttLinese Repression « here Parameters", Tinean_ante_seareh.best_parans_) Print(-Linear Repression = best Cross-validation Wei, Spraqrt( Linear grids sch. best_sc0r6_)) "narnalize’ was daprecates Jn vision 1.0 and will be removed in 1.2, Plesse leave the normalize parameter to Sts default val * se to sllonce ths worning. the default boravior of this ertsnator is t0 not G0 ary nonnallzation. If normalseation ix needed please use sklesrn, preprocessing. Standardscaler instead Poiyrontal Regression ~ best Parareters: ("fit_intercept': True, ‘norsalize’: False) Doiypantal Repression - best Cross:validation RASC: 143. 87S6Bb09¢44%4 Linear Regression = gest Parameters: ("flt-antercept': Troe, "normalize": False) Linear Regression - Gest Crorscvalseation BHSE: 1427.1635765488027 maize’ was deprecated in version 1.0 and will be renoved in 1.2, please leave the normalize paraseter to its default val serto stlence tMs warning. the default beravior of this ertinator is €0 not Go any nomnalization. if normalization is needed please use sklearn.praprocessing.standaréSealer instead norealize’ was deprecated in version 1.0 and will be renoved in 1.2, Please leave the nornalize paraneter to its default val sito stlence ths warning. she default benavior of this eetinator is to not Go ary noenal ization. if normalszation 1s needed please use sklearn. preprocessing. standardsealer instead noraaiize’ was doprecates in version 1.0 and will be reneved £9 2.2. If you wash fo scale the gata, vse Pipeline with 2 Standardscaler in a preprocessing stage. To reproduce the previous behavio fron skleara.pipeline seport make_pipeline e ‘fram sklearn-tree import Decisiontreetegressor Gtree = Decistontreetegressor(randon stated, max deptholore, aax_features='auto", ain sonples_leaf=t, nin sanples_ spl eres. 1c train, y-trein) beck stontrestegressor(nax features’ sito", randon_stite-d) fram akLearn Saport metrics nport math yapred " dtree.sredict(x test) Tae" netricacnwuncabastite.errov(y test, pred) se = etnies near_squared_error(y test, y. pred) Pa» wetricsar2_score(y. test, ¥-Dre3) Pose" aatheserBtese), Drint(MAE is ()'forwat(mae)) rint(-MsE £3 {)frmatfase)) printt 2 format(r2)) printt roe is localhost 8888inotebooks/Sharing Vision Testipyibt 50167 7128723, 935 PM ‘Sharing Vision Test -Jupyter Notebook 1m (73) nya = pa vataFrane(( feature hane"s Xctrain. columns, “Enportance'? 
In [74]:
    from sklearn.tree import DecisionTreeRegressor

    dtree = DecisionTreeRegressor(random_state=0, max_depth=None, max_features='auto',
                                  min_samples_leaf=1, min_samples_split=2)
    dtree.fit(X_train, y_train)

Out[74]: DecisionTreeRegressor(max_features='auto', random_state=0)

In [75]:
    from sklearn import metrics
    import math

    y_pred = dtree.predict(X_test)
    mae = metrics.mean_absolute_error(y_test, y_pred)
    mse = metrics.mean_squared_error(y_test, y_pred)
    r2 = metrics.r2_score(y_test, y_pred)
    rmse = math.sqrt(mse)

    print('MAE is {}'.format(mae))
    print('MSE is {}'.format(mse))
    print('R2 score is {}'.format(r2))
    print('RMSE score is {}'.format(rmse))

[Output: MAE, MSE, R2, and RMSE of the decision tree regressor on the test set]

In [ ]:
    imp_df = pd.DataFrame({
        'Feature Name': X_train.columns,
        'Importance': dtree.feature_importances_
    })
    fi = imp_df.sort_values(by='Importance', ascending=False)
    fi2 = fi.head(10)
    plt.figure(figsize=(10, 8))
    sns.barplot(data=fi2, x='Importance', y='Feature Name')
    plt.title('Feature Importance Each Attributes (Decision Tree Regressor)', fontsize=18)
    plt.xlabel('Importance', fontsize=16)
    plt.ylabel('Feature Name', fontsize=16)
    plt.show()

[Figure: feature importance bar chart (Decision Tree Regressor) - JobLevel dominates, followed by HourlyRate, MonthlyRate, NumCompaniesWorked, ...]

In [76]:
    import shap

    explainer = shap.TreeExplainer(dtree)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)

[Figure: SHAP beeswarm plot (Decision Tree Regressor) - JobLevel, TotalWorkingYears, Department, HourlyRate, MonthlyRate, PercentSalaryHike, ...]

In [77]:
    explainer = shap.Explainer(dtree, X_test)
    shap_values = explainer(X_test)
    shap.plots.waterfall(shap_values[0])

[Figure: SHAP waterfall plot for the first test observation (Decision Tree Regressor)]

In [78]:
    from sklearn.ensemble import RandomForestRegressor

    rf = RandomForestRegressor(random_state=0, max_depth=None, min_samples_split=2, n_estimators=100)
    rf.fit(X_train, y_train)

Out[78]: RandomForestRegressor(random_state=0)

In [79]:
    from sklearn import metrics
    from sklearn.metrics import mean_absolute_percentage_error
    import math

    y_pred = rf.predict(X_test)
    mae = metrics.mean_absolute_error(y_test, y_pred)
    mape = mean_absolute_percentage_error(y_test, y_pred)
    mse = metrics.mean_squared_error(y_test, y_pred)
    r2 = metrics.r2_score(y_test, y_pred)
    rmse = math.sqrt(mse)

    print('MAE is {}'.format(mae))
    print('MAPE is {}'.format(mape))
    print('MSE is {}'.format(mse))
    print('R2 score is {}'.format(r2))
    print('RMSE score is {}'.format(rmse))

[Output: MAE, MAPE ≈ 0.0…, MSE 416579.5358489798, R2, RMSE 645.4297768533632]

In [80]:
    imp_df = pd.DataFrame({
        'Feature Name': X_train.columns,
        'Importance': rf.feature_importances_
    })
    fi = imp_df.sort_values(by='Importance', ascending=False)
    fi2 = fi.head(10)
    plt.figure(figsize=(10, 8))
    sns.barplot(data=fi2, x='Importance', y='Feature Name')
    plt.title('Feature Importance Each Attributes (Random Forest Regressor)', fontsize=18)
    plt.xlabel('Importance', fontsize=16)
    plt.ylabel('Feature Name', fontsize=16)
    plt.show()

[Figure: feature importance bar chart (Random Forest Regressor) - JobLevel dominates, followed by HourlyRate, MonthlyRate, NumCompaniesWorked, ...]
In [81]:
    import shap

    explainer = shap.TreeExplainer(rf)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)

[Figure: SHAP beeswarm plot (Random Forest Regressor) - JobLevel, TotalWorkingYears, JobRole, HourlyRate, DistanceFromHome, DailyRate, MonthlyRate, NumCompaniesWorked, ...]

In [82]:
    explainer = shap.Explainer(rf, X_test, check_additivity=False)
    shap_values = explainer(X_test, check_additivity=False)
    shap.plots.waterfall(shap_values[0])

[Figure: SHAP waterfall plot for the first test observation (Random Forest Regressor)]

In [85]:
    from xgboost import XGBRegressor

    xgb_regressor = XGBRegressor(learning_rate=0.1, max_depth=7, n_estimators=150)
    xgb_regressor.fit(X_train, y_train)

Out[85]: XGBRegressor(learning_rate=0.1, max_depth=7, n_estimators=150, ...)

In [86]:
    from sklearn import metrics
    from sklearn.metrics import mean_absolute_percentage_error
    import math

    y_pred = xgb_regressor.predict(X_test)
    mae = metrics.mean_absolute_error(y_test, y_pred)
    mape = mean_absolute_percentage_error(y_test, y_pred)
    mse = metrics.mean_squared_error(y_test, y_pred)
    r2 = metrics.r2_score(y_test, y_pred)
    rmse = math.sqrt(mse)

    print('MAE is {}'.format(mae))
    print('MAPE is {}'.format(mape))
    print('MSE is {}'.format(mse))
    print('R2 score is {}'.format(r2))
    print('RMSE score is {}'.format(rmse))

[Output: MAE, MAPE, MSE 387105.37737368, R2, and RMSE of the XGBoost regressor on the test set]

In [87]:
    import shap

    explainer = shap.TreeExplainer(xgb_regressor)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

[Figure: SHAP beeswarm plot (XGBoost Regressor) - JobLevel, TotalWorkingYears, DailyRate, MonthlyRate, Department, HourlyRate, DistanceFromHome, YearsAtCompany, ...]

In [88]:
    explainer = shap.Explainer(xgb_regressor, X_test, check_additivity=False)
    shap_values = explainer(X_test, check_additivity=False)
    shap.plots.waterfall(shap_values[0])
[Figure: SHAP waterfall plot for the first test observation (XGBoost Regressor)]

In [90]:
    import numpy as np
    from sklearn.linear_model import LinearRegression

    linreg = LinearRegression(fit_intercept=True, normalize=False)
    linreg.fit(X_train, y_train)

'normalize' was deprecated in version 1.0 and will be removed in 1.2. Please leave the normalize parameter to its default value to silence this warning. The default behavior of this estimator is to not do any normalization. If normalization is needed please use sklearn.preprocessing.StandardScaler instead.

Out[90]: LinearRegression(normalize=False)

In [91]:
    from sklearn import metrics
    from sklearn.metrics import mean_absolute_percentage_error
    import math

    y_pred = linreg.predict(X_test)
    mae = metrics.mean_absolute_error(y_test, y_pred)
    mape = mean_absolute_percentage_error(y_test, y_pred)
    mse = metrics.mean_squared_error(y_test, y_pred)
    r2 = metrics.r2_score(y_test, y_pred)
    rmse = math.sqrt(mse)

    print('MAE is {}'.format(mae))
    print('MAPE is {}'.format(mape))
    print('MSE is {}'.format(mse))
    print('R2 score is {}'.format(r2))
    print('RMSE score is {}'.format(rmse))

[Output: MAE, MAPE 0.2164627544281208, MSE 1958738.1157680628, R2, and RMSE 1399.4779701619696]

In [92]:
    imp_df = pd.DataFrame({
        'Feature Name': X_train.columns,
        'Importance': np.abs(linreg.coef_)
    })

    # Sort by importance (absolute value of coefficients)
    fi = imp_df.sort_values(by='Importance', ascending=False)

    # Select the top 10 features
    fi2 = fi.head(10)

    # Plot the feature importance
    plt.figure(figsize=(10, 8))
    sns.barplot(data=fi2, x='Importance', y='Feature Name')
    plt.title('Feature Importance Each Attribute (Linear Regression)', fontsize=18)
    plt.xlabel('Importance', fontsize=16)
    plt.ylabel('Feature Name', fontsize=16)

[Figure: absolute-coefficient feature importance (Linear Regression) - Gender, StockOptionLevel, JobLevel, BusinessTravel, YearsWithCurrManager, TotalWorkingYears, EnvironmentSatisfaction, MaritalStatus, ...]

(18) Print the metrics r2, mse, rmse, mae, and mape (as a DataFrame) for the train set and the test set for both regressors. Which model has the better performance?
(18) Print the metrics R2, MSE, RMSE, MAE, and MAPE (as a dataframe) for all of the regressor models, from the linear regression to the XGB regressor. Which model has the best performance?

In [93]: from sklearn.linear_model import LinearRegression
         from sklearn.tree import DecisionTreeRegressor
         from sklearn.ensemble import RandomForestRegressor
         from xgboost import XGBRegressor
         from sklearn.metrics import mean_squared_error, mean_absolute_error
         from sklearn.metrics import mean_absolute_percentage_error as MAPE

         # Regressor 1: Linear Regression
         linear_regressor = LinearRegression(fit_intercept=True, normalize=False)
         linear_regressor.fit(X_train, y_train)
         y_train_pred_linear = linear_regressor.predict(X_train)
         y_test_pred_linear = linear_regressor.predict(X_test)

         # Calculate metrics for Linear Regression
         linear_r2_train = linear_regressor.score(X_train, y_train)
         linear_r2_test = linear_regressor.score(X_test, y_test)
         linear_mse_train = mean_squared_error(y_train, y_train_pred_linear)
         linear_mse_test = mean_squared_error(y_test, y_test_pred_linear)
         linear_rmse_train = np.sqrt(linear_mse_train)
         linear_rmse_test = np.sqrt(linear_mse_test)
         linear_mae_train = mean_absolute_error(y_train, y_train_pred_linear)
         linear_mae_test = mean_absolute_error(y_test, y_test_pred_linear)
         linear_mape_train = MAPE(y_train, y_train_pred_linear)
         linear_mape_test = MAPE(y_test, y_test_pred_linear)

         # Regressor 2: Decision Tree Regressor
         dt_regressor = DecisionTreeRegressor(random_state=0, max_depth=None, max_features=None, min_samples_leaf=1, min_samples_split=2)
         dt_regressor.fit(X_train, y_train)
         y_train_pred_dt = dt_regressor.predict(X_train)
         y_test_pred_dt = dt_regressor.predict(X_test)

         # Calculate metrics for Decision Tree Regressor
         dt_r2_train = dt_regressor.score(X_train, y_train)
         dt_r2_test = dt_regressor.score(X_test, y_test)
         dt_mse_train = mean_squared_error(y_train, y_train_pred_dt)
         dt_mse_test = mean_squared_error(y_test, y_test_pred_dt)
         dt_rmse_train = np.sqrt(dt_mse_train)
         dt_rmse_test = np.sqrt(dt_mse_test)
         dt_mae_train = mean_absolute_error(y_train, y_train_pred_dt)
         dt_mae_test = mean_absolute_error(y_test, y_test_pred_dt)
         dt_mape_train = MAPE(y_train, y_train_pred_dt)
         dt_mape_test = MAPE(y_test, y_test_pred_dt)

         # Regressor 3: Random Forest Regressor
         rf_regressor = RandomForestRegressor(random_state=42, max_depth=None, min_samples_split=2, n_estimators=100)
         rf_regressor.fit(X_train, y_train)
         y_train_pred_rf = rf_regressor.predict(X_train)
         y_test_pred_rf = rf_regressor.predict(X_test)

         # Calculate metrics for Random Forest Regressor
         rf_r2_train = rf_regressor.score(X_train, y_train)
         rf_r2_test = rf_regressor.score(X_test, y_test)
         rf_mse_train = mean_squared_error(y_train, y_train_pred_rf)
         rf_mse_test = mean_squared_error(y_test, y_test_pred_rf)
         rf_rmse_train = np.sqrt(rf_mse_train)
         rf_rmse_test = np.sqrt(rf_mse_test)
         rf_mae_train = mean_absolute_error(y_train, y_train_pred_rf)
         rf_mae_test = mean_absolute_error(y_test, y_test_pred_rf)
         rf_mape_train = MAPE(y_train, y_train_pred_rf)
         rf_mape_test = MAPE(y_test, y_test_pred_rf)

         # Regressor 4: XGBoost Regressor
         xgb_regressor = XGBRegressor(learning_rate=0.1, max_depth=7, n_estimators=150)
         xgb_regressor.fit(X_train, y_train)
         y_train_pred_xgb = xgb_regressor.predict(X_train)
         y_test_pred_xgb = xgb_regressor.predict(X_test)

         # Calculate metrics for XGBoost Regressor
         xgb_r2_train = xgb_regressor.score(X_train, y_train)
         xgb_r2_test = xgb_regressor.score(X_test, y_test)
         xgb_mse_train = mean_squared_error(y_train, y_train_pred_xgb)
         xgb_mse_test = mean_squared_error(y_test, y_test_pred_xgb)
         xgb_rmse_train = np.sqrt(xgb_mse_train)
         xgb_rmse_test = np.sqrt(xgb_mse_test)
         xgb_mae_train = mean_absolute_error(y_train, y_train_pred_xgb)
         xgb_mae_test = mean_absolute_error(y_test, y_test_pred_xgb)
         xgb_mape_train = MAPE(y_train, y_train_pred_xgb)
         xgb_mape_test = MAPE(y_test, y_test_pred_xgb)

         # Create a dataframe to compare the performance of the regressors
         metrics_data = {
             'Regressor': ['Linear Regression', 'Decision Tree Regressor', 'Random Forest Regressor', 'XGBoost Regressor'],
             'R2 (Train)': [linear_r2_train, dt_r2_train, rf_r2_train, xgb_r2_train],
             'R2 (Test)': [linear_r2_test, dt_r2_test, rf_r2_test, xgb_r2_test],
             'MSE (Train)': [linear_mse_train, dt_mse_train, rf_mse_train, xgb_mse_train],
             'MSE (Test)': [linear_mse_test, dt_mse_test, rf_mse_test, xgb_mse_test],
             'RMSE (Train)': [linear_rmse_train, dt_rmse_train, rf_rmse_train, xgb_rmse_train],
             'RMSE (Test)': [linear_rmse_test, dt_rmse_test, rf_rmse_test, xgb_rmse_test],
             'MAE (Train)': [linear_mae_train, dt_mae_train, rf_mae_train, xgb_mae_train],
             'MAE (Test)': [linear_mae_test, dt_mae_test, rf_mae_test, xgb_mae_test],
             'MAPE (Train)': [linear_mape_train, dt_mape_train, rf_mape_train, xgb_mape_train],
             'MAPE (Test)': [linear_mape_test, dt_mape_test, rf_mape_test, xgb_mape_test]
         }
         metrics_df = pd.DataFrame(metrics_data)
         metrics_df

`normalize` was deprecated in version 1.0 and will be removed in 1.2. Please leave the normalize parameter to its default value to silence this warning. The default behavior of this estimator is to not do any normalization. If normalization is needed please use sklearn.preprocessing.StandardScaler instead.

Out[93]: [comparison table with one row per regressor and the R2/MSE/RMSE/MAE/MAPE train and test columns; most extracted values are unreadable, but the Decision Tree row shows a train R2 of 1.000000 against a far lower test R2, the usual signature of overfitting.]
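To answer which model performs best, the comparison table can be ranked directly. A minimal sketch, assuming the `metrics_df` built in In [93] above (`ranked` and `best_model` are illustrative names, not from the original notebook):

    # Rank models by held-out performance: highest test R2 first
    ranked = metrics_df.sort_values(by='R2 (Test)', ascending=False)
    print(ranked[['Regressor', 'R2 (Test)', 'RMSE (Test)', 'MAPE (Test)']])

    # The top row is the best-performing model on unseen data
    best_model = ranked.iloc[0]['Regressor']
    print('Best model by test R2: {}'.format(best_model))

Ranking on the test metrics rather than the train metrics matters here: an unconstrained decision tree can reach a perfect train R2 while generalizing poorly.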
Clustering

(19) Re-instantiate the data needed from employee.csv.

In [ ]: df2 = pd.read_csv('employee.csv')
        df2.head()

[df2.head() output: the same columns as before, including Unnamed: 0, EmployeeNumber, Attrition, Age, BusinessTravel, DailyRate, Department, DistanceFromHome, Education, EducationField, and EnvironmentSatisfaction.]

In [ ]: # Drop the Unnamed: 0, EmployeeNumber, StandardHours, Over18, and PerformanceRating columns
        df2.drop(columns=['Unnamed: 0', 'EmployeeNumber', 'StandardHours', 'Over18', 'PerformanceRating'], inplace=True)

(21) Using the elbow method and the silhouette score, determine the optimal value of k for building the k-means clustering model.

In [ ]: from sklearn.cluster import KMeans
        from sklearn.metrics import silhouette_score

        # Function to find optimal k using the Elbow Method
        def find_optimal_k_elbow(X, max_k):
            inertias = []
            for k in range(1, max_k + 1):
                kmeans = KMeans(n_clusters=k, random_state=42)
                kmeans.fit(X)
                inertias.append(kmeans.inertia_)

            # Plot elbow graph
            plt.figure(figsize=(8, 6))
            plt.plot(range(1, max_k + 1), inertias, marker='o')
            plt.xlabel('Number of clusters (k)')
            plt.ylabel('Inertia')
            plt.title('Elbow Method')
            plt.show()

        # Function to find optimal k using the Silhouette Score
        def find_optimal_k_silhouette(X, max_k):
            silhouette_scores = []
            for k in range(2, max_k + 1):
                kmeans = KMeans(n_clusters=k, random_state=42)
                kmeans.fit(X)
                labels = kmeans.labels_
                silhouette_scores.append(silhouette_score(X, labels))

            # Plot silhouette score graph
            plt.figure(figsize=(8, 6))
            plt.plot(range(2, max_k + 1), silhouette_scores, marker='o')
            plt.xlabel('Number of clusters (k)')
            plt.ylabel('Silhouette Score')
            plt.title('Silhouette Score')
            plt.show()

        selected_columns = ['MonthlyIncome', 'Age', 'DailyRate']
        X = df2[selected_columns]
        max_k = 10

        # Find the optimal k using the Elbow Method
        find_optimal_k_elbow(X, max_k)

        # Find the optimal k using the Silhouette Score
        find_optimal_k_silhouette(X, max_k)

[Elbow Method plot: inertia (1e10 scale) against the number of clusters k = 1-10]

[Silhouette Score plot: silhouette score (roughly 0.40 to 0.85) against the number of clusters k = 2-9]

(22) Add a label column to the dataset containing the cluster number for each row, based on the k-means clustering model that was built.
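One caveat before the re-fit in the next cell: the three selected features live on very different scales (MonthlyIncome is in the thousands while Age is two digits), so unscaled Euclidean distances are dominated by income alone. A minimal sketch of a scaled variant, assuming the `df2` from above (`X_scaled`, `kmeans_scaled`, and the choice of n_clusters=3 are illustrative, not from the original notebook):

    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Standardize each feature to zero mean and unit variance before clustering
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df2[['MonthlyIncome', 'Age', 'DailyRate']])

    # Cluster in the scaled space so no single feature dominates the distance
    kmeans_scaled = KMeans(n_clusters=3, random_state=42)
    labels_scaled = kmeans_scaled.fit_predict(X_scaled)

The elbow and silhouette curves would then also be recomputed on the scaled matrix, since the optimal k can change once the features share a common scale.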
In [105]: from sklearn.cluster import KMeans
          from sklearn.metrics import silhouette_score

          # Function to find optimal k using the Elbow Method
          def find_optimal_k_elbow(X, max_k):
              inertias = []
              for k in range(1, max_k + 1):
                  kmeans = KMeans(n_clusters=k, random_state=42)
                  kmeans.fit(X)
                  inertias.append(kmeans.inertia_)

              # Plot elbow graph
              plt.figure(figsize=(8, 6))
              plt.plot(range(1, max_k + 1), inertias, marker='o')
              plt.xlabel('Number of clusters (k)')
              plt.ylabel('Inertia')
              plt.title('Elbow Method')
              plt.show()

              # Return the fitted kmeans model with optimal k (the k after the largest drop in inertia)
              optimal_k = np.argmin(np.diff(inertias)) + 2
              kmeans = KMeans(n_clusters=optimal_k, random_state=42)
              kmeans.fit(X)
              return kmeans

          # Function to find optimal k using the Silhouette Score
          def find_optimal_k_silhouette(X, max_k):
              silhouette_scores = []
              for k in range(2, max_k + 1):
                  kmeans = KMeans(n_clusters=k, random_state=42)
                  kmeans.fit(X)
                  labels = kmeans.labels_
                  silhouette_scores.append(silhouette_score(X, labels))

              plt.figure(figsize=(8, 6))
              plt.plot(range(2, max_k + 1), silhouette_scores, marker='o')
              plt.xlabel('Number of clusters (k)')
              plt.ylabel('Silhouette Score')
              plt.title('Silhouette Score')
              plt.show()

              # Return the fitted kmeans model with optimal k (the k with the highest silhouette score)
              optimal_k = np.argmax(silhouette_scores) + 2
              kmeans = KMeans(n_clusters=optimal_k, random_state=42)
              kmeans.fit(X)
              return kmeans

          selected_columns = ['MonthlyIncome', 'Age', 'DailyRate']
          X = df2[selected_columns]
          max_k = 10

          # Find the optimal k using the Elbow Method
          kmeans_model = find_optimal_k_elbow(X, max_k)

          # Predict cluster labels and add 'label' column to the DataFrame
          df2['label'] = kmeans_model.predict(X)

[Elbow Method plot, repeated from the re-fit]

In [106]: from mpl_toolkits.mplot3d import Axes3D

          # 3D visualization of the clusters
          fig = plt.figure(figsize=(10, 8))
          ax = fig.add_subplot(111, projection='3d')

          # Create a colormap for the clusters
          colors = plt.cm.tab10(df2['label'] / float(max(df2['label'])))

          # Plot the data points
          ax.scatter(df2['MonthlyIncome'], df2['Age'], df2['DailyRate'], c=colors, s=50)

          # Set labels for the axes
          ax.set_xlabel('MonthlyIncome')
          ax.set_ylabel('Age')
          ax.set_zlabel('DailyRate')

          # Set the title
          ax.set_title('Employee Cluster')

          plt.show()

[3D scatter plot "Employee Cluster" of MonthlyIncome, Age, and DailyRate, colored by cluster label]
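To make the clusters interpretable beyond the 3D plot, it helps to profile them numerically. A minimal sketch, assuming the `df2`, `selected_columns`, and 'label' column from the cells above (`cluster_profile` is an illustrative name, not from the original notebook):

    # Cluster sizes
    print(df2['label'].value_counts().sort_index())

    # Mean of each clustering feature per cluster
    cluster_profile = df2.groupby('label')[selected_columns].mean()
    print(cluster_profile)

Comparing per-cluster means of MonthlyIncome, Age, and DailyRate gives each cluster a plain-language description (for example, which cluster is the high-income one), which is harder to read off the scatter plot alone.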
