Aquilorar_1212.ipynb - Colaboratory (7:43 PM)

We have imported the necessary libraries to create our ML model. The DataFrame is named HR.

    import warnings
    warnings.filterwarnings('ignore')

    import pandas as pd

    HR = pd.read_csv('/content/HR_comma_sep.csv')
    HR.info()

(Output, partly legible: 14999 non-null entries per column; dtypes: float64(2), int64(6), object(2).)

Next we can check the head of the DataFrame.

    HR.head()

(Output columns include satisfaction_level, last_evaluation, number_project, average_montly_hours, time_spend_company, ...)

Now we can check the count of employees who left (coded 1) and who were retained (coded 0).

    HR.left.value_counts()

(Output: Name: left, dtype: int64; the counts are not legible in the printout.)

Creating a list with all the columns:

    l = list(HR.columns)
    l

Dropping the column "left" from the list of features, since it is the target:

    X_features = list(HR.columns)
    X_features.remove('left')
    X_features

(Output: ['satisfaction_level', ..., 'number_project', ..., 'promotion_last_5years', ...])

Using one-hot encoding to create N-1 dummy variables for a categorical variable with N categories. drop_first=True drops one category per variable, and that category becomes the baseline.

    # dtype=float keeps the dummies numeric; with pandas >= 2.0 they would
    # otherwise be bool, and sm.Logit rejects the resulting object-dtype array
    encoded_HR_df = pd.get_dummies(HR[X_features], drop_first=True, dtype=float)
    list(encoded_HR_df.columns)

(Output: ['satisfaction_level', ..., 'Department_marketing', 'Department_product_mng', ..., 'salary_medium'])

Importing the statsmodels library as sm. Creating the independent variables (features) as X and the dependent variable (target) as y, and adding a constant column so the model estimates an intercept:

    import statsmodels.api as sm

    y = HR.left
    X = sm.add_constant(encoded_HR_df)

Training the model on 70% of the DataFrame and testing it on the remaining 30%:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)

Fitting a logistic regression with statsmodels and naming the fitted model logit_model:

    logit = sm.Logit(y_train, X_train)
    logit_model = logit.fit()

Now checking the summary of the model and identifying significant and non-significant variables:

    logit_model.summary2()

Selected fields from the printed summary (partly legible):

    Dependent Variable: left               Pseudo R-squared: 0.225
    Date:               2023-03-07 13:24
    No. Observations:   10499              Log-Likelihood:   -4466.8
    Df Model:           18                 LLR p-value:      0.0000
    Df Residuals:       10480              Converged:        1.0000
    No. Iterations:     7.0000             Scale:            1.0000

Coefficient rows that survived in the printout (other rows and cells are illegible):

    Variable                  Coef.              z     P>|z|
    number_project            -0.3019      -11.8800   0.0000
    average_montly_hours       0.0048        7.7658   0.0000
    time_spend_company        (illegible)   14.3073   0.0000
    promotion_last_5years     (illegible)   -4.4534   0.0000
    Department_marketing      (illegible)    1.7891   0.0736
    Department_product_mng    (illegible)    0.8283   0.4075
    Department_support         0.1707        1.4552   0.1456
    salary_medium              1.5442        9.5877   0.0000

The three department dummies shown have p-values above 0.05, so they are not significant at the 5% level.

Defining a function which extracts the significant variables (p-value <= 0.05) from logit_model:

    def get_significant_vars(lm):
        # One row per coefficient: its p-value, indexed by variable name
        var_p_vals_df = pd.DataFrame(lm.pvalues)
        var_p_vals_df['vars'] = var_p_vals_df.index
        var_p_vals_df.columns = ['pvals', 'vars']
        # Keep only the variables that are significant at the 5% level
        return list(var_p_vals_df[var_p_vals_df.pvals <= 0.05]['vars'])

Viewing the significant variables:

    significant_vars = get_significant_vars(logit_model)
    significant_vars

(Output: ['const', ..., 'salary_medium'])

Now creating a new model, final_logit, with only the significant variables of the previous model. Because significant_vars already contains 'const', sm.add_constant leaves the selected columns unchanged (its default is has_constant='skip').

    final_logit = sm.Logit(y_train, sm.add_constant(X_train[significant_vars])).fit()

(Output: Current function value: 0.425996)

Now checking the summary of the final model:

    final_logit.summary2()

Selected fields from the printed summary (partly legible):

    Dependent Variable: left               Pseudo R-squared: 0.224
    Date:               2023-03-07 13:26
    No. Observations:   10499              LLR p-value:      0.0000
    Df Residuals:       10488              Converged:        1.0000
    No. Iterations:     7.0000             Scale:            1.0000

Dropping the non-significant variables barely changes the fit: Pseudo R-squared goes from 0.225 to 0.224.

Creating a new DataFrame, y_pred_df, which holds the actual values and the predicted probabilities on the test set:

    y_pred_df = pd.DataFrame({
        'actual': y_test,
        'predicted_prob': final_logit.predict(sm.add_constant(X_test[significant_vars]))
    })

Checking a sample of the predicted values:

    y_pred_df.sample(10, random_state=42)

Creating a column, predicted, which classifies an employee as 1 (left) if the predicted probability is greater than 0.5 and as 0 otherwise, so it can be compared with the actual values:

    y_pred_df['predicted'] = y_pred_df.predicted_prob.map(lambda x: 1 if x > 0.5 else 0)
    y_pred_df.sample(10, random_state=42)

Importing the necessary libraries for plotting graphs:

    import matplotlib.pyplot as plt
    import seaborn as sn
    # metrics provides confusion_matrix, classification_report and the ROC helpers
    from sklearn import metrics

Defining a function to draw a confusion matrix to evaluate the performance of a classification model. A confusion matrix is a table that summarizes the performance of a classification model by showing the number of true positives, true negatives, false positives, and false negatives.

    def draw_cm(actual, predicted):
        # Rows are true labels, columns are predicted labels; class 1 (left) comes first
        cm = metrics.confusion_matrix(actual, predicted, labels=[1, 0])
        sn.heatmap(cm, annot=True, fmt='d',
                   xticklabels=['Left', 'Retained'],
                   yticklabels=['Left', 'Retained'])
        plt.ylabel('True label')
        plt.xlabel('Predicted label')
        plt.show()

    draw_cm(y_pred_df.actual, y_pred_df.predicted)

[Figure: confusion matrix at the 0.5 cut-off]

Now printing a classification report, which gives precision, recall and F1-score for each class together with the overall accuracy:

    print(metrics.classification_report(y_pred_df.actual, y_pred_df.predicted))

(Output: mostly illegible in the printout; the test set has 4500 rows.)

Plotting a histogram of the predicted probabilities for each class (employees who left and employees who were retained):

    plt.figure(figsize=(8, 6))
    # distplot is deprecated in recent seaborn; histplot draws the same count histogram
    sn.histplot(y_pred_df[y_pred_df.actual == 1]['predicted_prob'], color='b', label='Left')
    sn.histplot(y_pred_df[y_pred_df.actual == 0]['predicted_prob'], color='g', label='Retained')
    plt.legend()
    plt.show()

[Figure: histogram of predicted probabilities by actual class]

Defining a function to plot the Receiver Operating Characteristic (ROC) curve of a binary classification model and calculate the Area Under the Curve (AUC) score:

    def draw_roc(actual, probs):
        # drop_intermediate=False keeps every threshold so each can be inspected later
        fpr, tpr, thresholds = metrics.roc_curve(actual, probs, drop_intermediate=False)
        auc_score = metrics.roc_auc_score(actual, probs)
        plt.figure(figsize=(8, 6))
        plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % auc_score)
        plt.xlabel('False Positive Rate or [1 - True Negative Rate]')
        plt.ylabel('True Positive Rate')
        plt.legend(loc='lower right')
        plt.show()
        return fpr, tpr, thresholds

    fpr, tpr, thresholds = draw_roc(y_pred_df.actual, y_pred_df.predicted_prob)

[Figure: ROC curve, True Positive Rate against False Positive Rate]

Now calculating the AUC score from the predicted probabilities and the actual labels:

    auc_score = metrics.roc_auc_score(y_pred_df.actual, y_pred_df.predicted_prob)
    round(float(auc_score), 2)

Creating a DataFrame that contains the True Positive Rate (TPR), False Positive Rate (FPR) and threshold values, and sorting it by TPR - FPR to look for a better cut-off:

    tpr_fpr = pd.DataFrame({'tpr': tpr, 'fpr': fpr, 'thresholds': thresholds})
    tpr_fpr['diff'] = tpr_fpr.tpr - tpr_fpr.fpr
    tpr_fpr.sort_values('diff', ascending=False)[0:5]

(Output: the top rows with columns tpr, fpr, thresholds and diff; the values are mostly illegible in the printout.)

Plotting the confusion matrix again: first creating a new column, predicted_new, in y_pred_df, where the predicted class labels are based on a new threshold value of 0.22 for the predicted probabilities:

    y_pred_df['predicted_new'] = y_pred_df.predicted_prob.map(lambda x: 1 if x > 0.22 else 0)

Again, plotting the confusion matrix with the new threshold:

    draw_cm(y_pred_df.actual, y_pred_df.predicted_new)

Again printing the classification report to see how precision and recall change at the 0.22 cut-off compared with 0.5. Lowering the cut-off labels more employees as leavers, so recall for class 1 cannot go down, usually at some cost in precision.

    print(metrics.classification_report(y_pred_df.actual, y_pred_df.predicted_new))
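The 70/30 split leaves 10499 rows for training (the No. Observations in both summaries) and 4500 for testing, out of 14999 rows. The sketch below reproduces that arithmetic with the standard library only. It assumes that scikit-learn rounds the training share down when only a float train_size is given and hands the remainder to the test set; split_indices is a hypothetical helper, and its shuffled order will not match scikit-learn's for the same seed.

```python
import math
import random


def split_indices(n_samples, train_size=0.7, seed=42):
    """Hypothetical helper: shuffle row indices and cut them into train/test parts.

    Assumed sizing rule: train = floor(train_size * n), test = the remainder.
    The order of rows differs from sklearn's train_test_split.
    """
    n_train = math.floor(train_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(14999)
print(len(train_idx), len(test_idx))
```

For 14999 rows this gives 10499 training and 4500 test indices, matching the sizes seen in the notebook's outputs.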
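To show what drop_first=True does, here is a plain-Python sketch of N-1 dummy encoding. Dropping one category per variable avoids the "dummy variable trap": the N dummies of one variable always sum to 1 and would duplicate the constant added by sm.add_constant. dummy_encode is a hypothetical helper; it assumes, as pd.get_dummies does for a plain string column, that categories are sorted and the first one becomes the baseline.

```python
def dummy_encode(values, prefix):
    """Hypothetical helper: encode a categorical column as N-1 0/1 dummy columns.

    Categories are sorted and the first one is dropped; it becomes the
    baseline that the model's intercept absorbs.
    """
    categories = sorted(set(values))
    kept = categories[1:]  # drop the first (baseline) category
    columns = [f"{prefix}_{c}" for c in kept]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return columns, rows


salary = ["low", "medium", "high", "medium"]
cols, rows = dummy_encode(salary, "salary")
print(cols)
print(rows)
```

Here "high" sorts first and becomes the baseline, leaving salary_low and salary_medium, which is consistent with salary_medium appearing among the encoded columns; a "high" row is all zeros.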
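A logit model turns its linear predictor into a probability with the sigmoid function, and each coefficient is a change in log-odds. The sketch below (hypothetical helpers, standard library only) shows both ideas and converts the number_project coefficient from the first summary, -0.3019, into an odds ratio.

```python
import math


def predict_prob(intercept, coefs, x):
    """P(y = 1 | x) in a logistic regression: the sigmoid of the linear predictor."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coef):
    """Factor by which the odds of y = 1 change for a one-unit increase in x."""
    return math.exp(coef)


# A linear predictor of 0 corresponds to even odds, i.e. probability 0.5
print(predict_prob(0.0, [], []))

# number_project coefficient read from the first model's summary
print(round(odds_ratio(-0.3019), 4))
```

Read this way, each additional project multiplies the odds of leaving by about 0.74, holding the other variables fixed; a small positive coefficient such as average_montly_hours (0.0048) gives a ratio just above 1.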
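The P>|z| column that get_significant_vars filters on is the two-sided p-value of each coefficient's Wald z statistic under a standard normal, which is how statsmodels reports Logit coefficients. The sketch below recomputes a few p-values from the z values in the first summary and applies the same 0.05 rule; two_sided_p and significant are hypothetical helpers.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a z statistic: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def significant(z_by_var, alpha=0.05):
    """Names whose two-sided p-value is at most alpha (the rule get_significant_vars uses)."""
    return [name for name, z in z_by_var.items() if two_sided_p(z) <= alpha]


# z values read from the first model's summary table
z_by_var = {
    "number_project": -11.8800,
    "Department_marketing": 1.7891,
    "Department_product_mng": 0.8283,
    "Department_support": 1.4552,
}
for name, z in z_by_var.items():
    print(f"{name}: p = {two_sided_p(z):.4f}")
print(significant(z_by_var))
```

The recomputed p-values agree with the printed ones (for example 0.0736 for Department_marketing and 0.4075 for Department_product_mng), and only number_project passes the 0.05 filter among these four.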
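Both summaries report a Pseudo R-squared (0.225 for logit_model, 0.224 for final_logit). This is assumed to be McFadden's measure, which is what statsmodels reports for Logit: one minus the ratio of the model's log-likelihood to that of an intercept-only model that predicts the base rate for everyone. The helpers and toy data below are hypothetical and only illustrate the formula.

```python
import math


def log_likelihood(actual, probs):
    """Bernoulli log-likelihood of 0/1 labels given predicted P(y = 1).

    Probabilities must lie strictly between 0 and 1, or math.log fails.
    """
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(actual, probs))


def mcfadden_r2(actual, probs):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only model)."""
    base_rate = sum(actual) / len(actual)  # the intercept-only model predicts this for everyone
    ll_null = log_likelihood(actual, [base_rate] * len(actual))
    return 1.0 - log_likelihood(actual, probs) / ll_null


# Toy labels and probabilities, made up for illustration
actual = [1, 0, 1, 0]
print(mcfadden_r2(actual, [0.5, 0.5, 0.5, 0.5]))  # no better than the base rate
print(mcfadden_r2(actual, [0.9, 0.1, 0.9, 0.1]))  # a much better fit
```

A model no better than the base rate scores 0; on the HR data the small drop from 0.225 to 0.224 suggests the removed variables added little explanatory power.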
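The move from a 0.5 cut-off to 0.22 can be illustrated on toy data. The sketch below uses hypothetical helpers, and its eight labels and probabilities are made up (not taken from the HR test set). It computes precision, recall and F1 for class 1 with the same definitions classification_report uses: precision = TP / (TP + FP), recall = TP / (TP + FN).

```python
def classify(probs, cutoff):
    """Label 1 when the predicted probability is above the cut-off, else 0."""
    return [1 if p > cutoff else 0 for p in probs]


def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn), treating class 1 (left) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(actual, predicted):
    """Precision, recall and F1 for class 1 (0.0 where a ratio is undefined)."""
    tp, fp, fn, _ = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data, made up for illustration
actual = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.70, 0.40, 0.25, 0.45, 0.30, 0.20, 0.10, 0.05]
for cutoff in (0.5, 0.22):
    print(cutoff, precision_recall_f1(actual, classify(probs, cutoff)))
```

On this toy data the 0.5 cut-off gives precision 1.0 and recall 1/3 for class 1, while 0.22 gives precision 0.6 and recall 1.0: the lower cut-off catches every leaver at the cost of two false alarms.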
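draw_roc relies on metrics.roc_curve, and the notebook then sorts thresholds by tpr - fpr (Youden's J statistic) to pick a cut-off. The sketch below rebuilds both steps in plain Python on toy data. roc_points, auc and youden_cutoff are hypothetical helpers; a score counts as positive when it is >= the threshold, and the first point uses an infinite threshold (nothing predicted positive), following the convention of recent scikit-learn versions.

```python
def roc_points(actual, probs):
    """(fpr, tpr, threshold) triples, one per distinct score, highest threshold first."""
    pos = sum(actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0, float("inf"))]  # threshold above every score
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for a, p in zip(actual, probs) if p >= t and a == 1)
        fp = sum(1 for a, p in zip(actual, probs) if p >= t and a == 0)
        points.append((fp / neg, tp / pos, t))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def youden_cutoff(points):
    """Threshold with the largest tpr - fpr (the 'diff' column sorted in the notebook)."""
    return max(points, key=lambda pt: pt[1] - pt[0])[2]


# Toy data, made up for illustration
actual = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.70, 0.40, 0.25, 0.45, 0.30, 0.20, 0.10, 0.05]
points = roc_points(actual, probs)
print(round(auc(points), 4))
print(youden_cutoff(points))
```

Here the AUC is 0.8 (12 of the 15 leaver/stayer pairs are ranked correctly), and the cut-off maximising tpr - fpr is 0.25, where tpr = 1.0 and fpr = 0.4.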
