Hcin620 Final


8/13/23, 8:56 PM HCIN620_M6_Lab6_FinalProject_ckd_TT_Update (1).ipynb - Colaboratory

HCIN 620 Lab 6 Course Project

In this project we are tasked with predicting the stages of Chronic Kidney Disease based on Glomerular Filtration Rate (GFR). Information for the dataset is available at this link: http://archive.ics.uci.edu/dataset/336/chronic+kidney+disease

In order to succeed in this final you will be required to input most of the Python code. Please review all the previous labs before you begin, and read the instructions carefully. Rather than using a question/answer format, we have commented the code cells with the notation #TO DO. This is a placeholder technique (a To Do list, so to speak) commonly used in machine learning. Complete each #TO DO task requested of you. Good luck!

Notebook by Reza Afra, Ph.D. and Barbara Berkovich, Ph.D., M.A. Last update December 28, 2020. Revised by Thidarat Tinnakornsrisuphap, Ph.D.

Step 1: Environment Setup

You have learned that in order to set up an environment for your project, you first need to import the libraries you need.

    import warnings
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    plt.style.use("ggplot")
    from sklearn.metrics import r2_score, mean_squared_error, accuracy_score, confusion_matrix
    from sklearn.compose import ColumnTransformer
    # Also needed by later cells
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    warnings.filterwarnings("ignore")

    # TO DO 2. Add a print command to acknowledge completion of the import.
    print("Import is complete")

Step 2: Data Cleaning

Upload the data file called data-lab-6-ckd-course-project.csv. Read the CSV file into a data frame, and use the name of the data frame to print the first and last 5 rows.

    # TO DO 3. Load the data (lab 6) into a pandas dataframe called "data"
    data = pd.read_csv("data-lab-6-ckd-course-project.csv")

[Output: table showing the first and last 5 rows of the data frame]
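The load-and-inspect step can be sketched on a small in-memory table. This is only an illustration: the three columns and their values are hypothetical stand-ins for the course CSV, and the check for a "?" placeholder is an assumption about how missing entries might be marked.

```python
import io
import pandas as pd

# Hypothetical three-column stand-in for the course CSV; the real file has many more columns.
csv_text = """Age,Serum Creatinine,Class
48,1.2,ckd
7,0.8,ckd
62,1.8,ckd
51,1.4,notckd
60,1.1,notckd
68,0.6,notckd
"""
demo = pd.read_csv(io.StringIO(csv_text))

# First and last rows in one view (n=2 here because the demo frame is tiny).
preview = pd.concat([demo.head(2), demo.tail(2)])
print(preview)

# Missing values may appear as NaN or as a placeholder such as "?"; count both per column.
missing = demo.isna().sum() + demo.isin(["?"]).sum()
print(missing)
```

With the real file, `demo` would be replaced by the frame returned from `pd.read_csv` on the uploaded CSV, and `head()`/`tail()` would use their default of 5 rows.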
    # TO DO 5. Perform data cleaning. How many missing values are in this dataset? Any further actions required?
    data.isin([0]).sum()

[Output: per-attribute counts, ending with Blood Glucose Random and Packed Cell Volume]

There are no missing values and no further action is required. All of the attributes listed above that may indicate a missing value include 0 as a possible input. For example, Albumin has a possible input of 0, 1, 2, 3, 4, or 5. The same goes for Sugar (with an input of 0, 1, 2, 3, 4, or 5) and Class (with an input of 0 for notckd or 1 for ckd).

Step 3: Exploratory Data Analysis (EDA) and Preprocessing

Next we're going to build the targets, which are stages of CKD, in a column we will call "CKD Stages". There are various equations for calculating eGFR, but here we will stick with a simplified form of it. Please read about GFR in the following link.

Source: https://www.niddk.nih.gov/health-information/professionals/clinical-tools-patient-management/kidney-disease/laboratory-evaluation/glomerular-filtration-rate/estimating

    # Used a formula given by NIDDK which was simpler, and made it even simpler by …
    # GFR (mL/min/1.73 m2) = 175 x (Scr)^-1.154 x (Age)^-0.203
    def calc_gfr(Scr, Age):
        # Function to calculate GFR
        return (175 * (Scr) ** -1.154) * (Age ** -0.203)

    # … had no instances.
    # Reduced the number of classes to 3.
    def calc_ckd_stage(gfr):
        # Function for determining CKD stage
        bins = [0, 45, 90, 250]
        # …
        ret = pd.cut(gfr, bins=bins, labels=labels)
        # …

    # DO NOT change the code below
    data["GFR"] = calc_gfr(data["Serum Creatinine"], data["Age"])
    gfr = data["GFR"]
    removed_outliers = gfr.between(gfr.quantile(.05), gfr.quantile(.95))
    data = data[removed_outliers]

    # DO NOT change the code below
    data["CKD Stages"].value_counts()

[Output: counts per stage; Name: CKD Stages, dtype: int64]

Histogram

    # TO DO 6: Create a histogram of the values of "Serum Creatinine" and provide interpretations.
    sns.histplot(data["Serum Creatinine"], bins=50);

[Figure: histogram of Serum Creatinine (x-axis) against Count (y-axis)]

Scatterplot

    # TO DO 7: Create a scatterplot of the values of "GFR" (y-axis) vs "Serum Creatinine" and provide interpretations.
    plt.figure(figsize=(28, 6))
    for stage in set(data["CKD Stages"]):
        mask = data["CKD Stages"] == stage
        plt.scatter(data.loc[mask, "Serum Creatinine"], data.loc[mask, "GFR"], label=f"Stage {stage}", alpha=…)
    plt.title("GFR vs Serum Creatinine with CKD Stages")
    plt.xlabel("Serum Creatinine")
    plt.ylabel("GFR")
    plt.legend()
    plt.grid(True)
    plt.show()

[Figure: scatter plot "GFR vs Serum Creatinine with CKD Stages"; legend: Stage 1, Stage 2, Stage 3]

The scatter plot shown above indicates that the glomerular filtration rate (GFR) is highest when Serum Creatinine is lowest. The GFR drops quickly between a Creatinine level of 1 and 2, and slowly approaches zero the higher the Creatinine gets.

Isolate features from target

    # DO NOT change the code below
    X = data.drop(["Class", "CKD Stages"], axis=1)
    # …

There are various ways to encode categorical data. You learned about some of them during labs. Visit this link and read it thoroughly. Find an appropriate encoding scheme and transform the categorical attributes of your dataset.

    categorical_features = data.drop(['Age', 'Blood Pressure', 'Specific Gravity', 'Albumin', 'Blood Glucose Random', 'Blood Urea', … 'White Blood Cell Count', 'Red Blood Cell Count', 'Serum Creatinine', 'Sodium', 'Potassium', 'Hemoglobin', 'Packe…
    encoder = OneHotEncoder()
    encoded_results = encoder.fit_transform(categorical_features).toarray()

    numerical_features = data.drop(['Red Blood Cells', 'Pus Cell', 'Pus Cell Clumps', 'Bacteria', 'Hypertension', 'Diabetes Mellitus', 'Coronary Artery Disease', 'Appetite', 'Pedal Edema', 'Anemia', 'Class', 'CKD Stages'], axis=1)
    scaler = StandardScaler()
    scaled_numerical = scaler.fit_transform(numerical_features)
    scaled_y = scaler.fit_transform(np.array(y).reshape(-1, 1))

    # DO NOT change the code below
    numerical_ix = X.select_dtypes(include=['float64']).columns
    categorical_ix = X.select_dtypes(include=['object']).columns
    t = [("cat", OneHotEncoder(), categorical_ix), ("num", StandardScaler(), numerical_ix)]
    transform = ColumnTransformer(transformers=t)
    X = transform.fit_transform(X)

    # TO DO 11: Create a heatmap of correlation between features. Pick at least 2 pairs of features and explain their correlations.
    corr = data.corr()

    # DO NOT change the code below
    plt.figure(figsize=(8, 8))
    sns.heatmap(data.corr(), cbar=True, annot=False, yticklabels=numerical_ix, xticklabels=numerical_ix)

[Figure: correlation heatmap of the numerical features]

Specific Gravity & Albumin: Specific gravity and albumin have an extremely high negative correlation of less than -0.75.

Albumin & GFR: Albumin and GFR have an extremely high positive correlation of almost 1.00.

Split the Data

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=data["CKD Stages"], random_state=308)

Step 4: Build the Models and Evaluate

Logistic Regression

    # Use Logistic Regression to predict the stage of the kidney function
    log_reg = LogisticRegression()
    log_reg.fit(X_train, y_train)
    y_pred = log_reg.predict(X_test)  # Prediction based on a value given from the test data
    print(f"Accuracy on test set: {accuracy_score(y_test, y_pred):.3f}")
    print(f"\nAccuracy on train set: {accuracy_score(y_train, log_reg.predict(X_train)):.3f}")
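As a minimal sketch of how the GFR calculation, staging, split, and logistic regression fit together, the example below runs the whole chain on synthetic data. The 300 random rows, the three column names kept, the stage labels 3/2/1, and the open-ended top bin are all assumptions made for illustration, not the course dataset. It also differs from the notebook in two hedged ways: the ColumnTransformer and classifier are wrapped in a scikit-learn `Pipeline` so the encoder and scaler are fit on the training split only, and GFR is dropped from the features because the stages are derived from it.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 300
# Synthetic stand-in data; column names mirror the notebook, values are made up.
demo = pd.DataFrame({
    "Age": rng.integers(20, 80, n),
    "Serum Creatinine": rng.uniform(0.5, 4.0, n),
    "Hypertension": rng.choice(["yes", "no"], n),
})

# Simplified estimate from the notebook: 175 x Scr^-1.154 x Age^-0.203.
demo["GFR"] = 175 * demo["Serum Creatinine"] ** -1.154 * demo["Age"] ** -0.203

# Three stages from GFR cut points at 45 and 90; the label values are an assumption.
demo["CKD Stages"] = pd.cut(
    demo["GFR"], bins=[0, 45, 90, np.inf], labels=[3, 2, 1]
).astype(int)

# Features exclude GFR (the target is derived from it) and the target itself.
X = demo.drop(columns=["GFR", "CKD Stages"])
y = demo["CKD Stages"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Encode the categorical column and scale the numeric ones inside the pipeline,
# so both transformers are fit on the training split only.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Hypertension"]),
    ("num", StandardScaler(), ["Age", "Serum Creatinine"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Accuracy on test set: {test_accuracy:.3f}")
```

Fitting the transformers inside the pipeline is one common way to keep test-set statistics out of preprocessing; whether it changes the reported accuracy on the course data is not something this sketch can show.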
Confusion Matrix

    # …
    df = pd.DataFrame(data_, columns=['y_true', 'y_pred'])
    confusion_matrix = pd.crosstab(df['y_true'], df['y_pred'], rownames=['ACTUAL'], colnames=['PREDICTED'])
    sns.heatmap(confusion_matrix, annot=True)
    plt.show()

[Figure: confusion-matrix heatmap, ACTUAL (rows) against PREDICTED (columns)]

K-Nearest Neighbors

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    print(f"Accuracy on test set: {accuracy_score(y_test, y_pred):.3f}")

    accuracies = []
    for k in range(1, 28):
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train, y_train)
        y_pred = knn.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        accuracies.append(acc)
    accuracies = np.array(accuracies)  # convert to numpy array
    sns.lineplot(x=np.arange(1, 28), y=accuracies);
    best_k = 1 + np.argmax(accuracies)  # add 1 b/c arrays are 0-indexed
    best_accuracy = np.max(accuracies)
    print(f"Best k: {best_k} \nBest Accuracy from KNN: {best_accuracy:.3f}")

[Figure: line plot of test accuracy against the number of neighbors k]

The optimal number of neighbors is 14 and the accuracy of this model is 0.895.

If your code runs cleanly all the way through and you have answered all questions, then submit both the Notebook and PDF files to Blackboard for grading.
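The loop above picks k by comparing accuracy on the test set. One alternative, sketched here on synthetic data, is to choose k with 5-fold cross-validation on the training split and score the test split only once at the end. This is a swapped-in technique, not the notebook's method; the `make_classification` data, the fold count, and the fixed random seeds are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class data standing in for the CKD features (an assumption).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Same candidate range as the notebook's loop: k = 1 .. 27.
k_values = np.arange(1, 28)

# Mean 5-fold cross-validated accuracy on the training split for each k.
cv_scores = np.array([
    cross_val_score(
        KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5
    ).mean()
    for k in k_values
])
best_k = int(k_values[np.argmax(cv_scores)])

# Refit with the chosen k and score once on the held-out test split.
knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)
print(f"Best k: {best_k}\nTest accuracy: {test_accuracy:.3f}")
```

Because the test split plays no part in choosing k, the final accuracy is a less optimistic estimate than one taken from the same data used for selection.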
