237723, 743 PM ‘Aquilorar_1212 ipynb- Colaboratory
We have imported the libraries needed to create our ML model.
The dataframe is named HR.
Next we check the structure and the head of the dataframe.
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
HR = pd.read_csv("/content/HR_comma_sep.csv")
HR.info()
[Output of HR.info(), partly legible: 14999 non-null values per column; dtypes: float64(2), int64(6), object]
HR.head()
satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  ...
Now we can count the employees who left (coded 1) and those who were retained (coded 0)
HR.left.value_counts()
Name: left, dtype: int64
Creating a list with all the columns
list(HR.columns)
[Output: list of column names, beginning with 'satisfaction_level']
Dropping the column "left"
X_features = list(HR.columns)
X_features.remove('left')
X_features
['satisfaction_level', ..., 'number_project', ..., 'promotion_last_5years', ...]
Using one-hot encoding to create (N-1) dummy variables for N categories
# dtype=int keeps the dummy columns numeric, which statsmodels requires
encoded_df = pd.get_dummies(HR[X_features],
                            drop_first=True, dtype=int)
list(encoded_df.columns)
['satisfaction_level', ..., 'Department_marketing', 'Department_product_mng', ..., 'salary_medium']
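To make the (N-1) idea concrete, here is a minimal dependency-free sketch of drop-first dummy encoding. The helper name `dummy_encode` and the sample `salary` values are hypothetical and only illustrate the technique; the notebook itself relies on `pd.get_dummies`.

```python
def dummy_encode(values, drop_first=True):
    """Return (kept_categories, rows) of 0/1 indicators for a categorical list."""
    categories = sorted(set(values))
    # Dropping the first category makes it the baseline: it is encoded as all zeros.
    kept = categories[1:] if drop_first else categories
    rows = [[1 if value == category else 0 for category in kept] for value in values]
    return kept, rows


# Hypothetical column with N = 3 categories, so N - 1 = 2 dummy columns remain.
salary = ["low", "medium", "high", "low"]
columns, rows = dummy_encode(salary)
print(columns)  # ['low', 'medium']
print(rows)     # [[1, 0], [0, 1], [0, 0], [1, 0]]
```

The dropped category ("high" here, because it sorts first) is represented by a row of zeros, which avoids perfectly collinear columns once an intercept is added.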
Importing the statsmodels library as 'sm'
Creating the dependent variable as Y
Creating the independent variables as X and adding a constant so that the model estimates an
intercept value
Y = HR.left
X = sm.add_constant(encoded_df)
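As a rough illustration of why the constant matters, the sketch below evaluates the logistic function by hand. The function name `predict_prob` and all coefficient values are hypothetical and are not taken from the fitted model.

```python
import math


def predict_prob(intercept, coefs, x):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two predictors.
coefs = [0.5, -0.3]
all_zero = [0.0, 0.0]

# Without an intercept, a row of all-zero predictors is forced to p = 0.5.
print(predict_prob(0.0, coefs, all_zero))   # 0.5
# With an intercept, the baseline probability is free to differ from 0.5.
print(round(predict_prob(-1.0, coefs, all_zero), 4))  # 0.2689
```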
Splitting the dataframe: the model is trained on 70% of the data and tested on the
remaining 30%
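from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)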
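The split sizes can be checked by hand. The sketch below assumes 14,999 rows, which is consistent with the 10,499 training observations reported in the model summary, and rounds the test share up. The helper `split_indices` is hypothetical and does not reproduce the exact shuffle that scikit-learn performs.

```python
import math
import random


def split_indices(n_rows, test_size=0.3, seed=42):
    """Shuffle row indices reproducibly and split them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_rows)  # the test share is rounded up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(14999)
print(len(train_idx), len(test_idx))  # 10499 4500
```

Fixing the seed, as `random_state = 42` does in the notebook, makes the split repeatable across runs.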
Importing the statsmodels library and naming the fitted regression model "logit_model"
import statsmodels.api as sm
logit = sm.Logit(y_train, X_train)
logit_model = logit.fit()
Now checking the summary of the model and identifying the significant and non-significant
variables
logit_model.summary2()
Model:               Logit             Pseudo R-squared:  0.225
Dependent Variable:  left              AIC:               ?
Date:                2023-03-07 13:24  BIC:               ?109.5462
No. Observations:    10499             Log-Likelihood:    -4466.8
Df Model:            18                LL-Null:           ?
Df Residuals:        10480             LLR p-value:       0.0000
Converged:           1.0000            Scale:             1.0000
No. Iterations:      7.0000

                         Coef.     Std.Err.  z          P>|z|    [0.025    0.975]
number_project           -0.3019   0.0254    -11.8800   0.0000   -0.3517   -0.252?
average_montly_hours     0.0048    0.0006    7.7658     0.0000   0.0036    0.0060
time_spend_company       ?         ?         14.3073    0.0000   ?         ?
promotion_last_5years    ?         0.3021    -4.4534    0.0000   -1.9405   ?
Department_marketing     ?         0.1489    1.7891     0.0736   -0.0251   ?
Department_product_mng   ?         0.1402    0.8283     0.4075   -0.1587   ?
Department_support       0.1707    ?         1.4552     0.145?   -0.0592   0.4006
salary_medium            1.5442    ?         ?          0.0000   ?         ?
[? marks characters that are not legible; the other coefficient rows are missing]
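The AIC and BIC reported in a summary like this follow from the log-likelihood. The sketch below recomputes them from the figures that are legible above (log-likelihood of -4466.8, 10,499 observations), assuming 19 estimated parameters, that is, a Df Model of 18 plus the intercept. Because the log-likelihood is rounded, the results are approximations.

```python
import math

log_likelihood = -4466.8  # read from the summary (rounded)
n_obs = 10499             # No. Observations
n_params = 19             # assumption: Df Model (18) plus the intercept

# AIC = 2k - 2 ln(L); BIC = k ln(n) - 2 ln(L)
aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

print(round(aic, 1))  # 8971.6
print(round(bic, 1))  # 9109.5
```

BIC penalizes each extra parameter by ln(n) rather than 2, so with this many observations it favors smaller models more strongly than AIC does.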
Defining a function that extracts the significant variables (p-value <= 0.05) from logit_model
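def get_significant_vars(lm):
    # One row per variable, holding its p-value
    var_p_vals_df = pd.DataFrame(lm.pvalues)
    var_p_vals_df['vars'] = var_p_vals_df.index
    var_p_vals_df.columns = ['pvals', 'vars']
    # Keep only the variables that are significant at the 5% level
    return list(var_p_vals_df[var_p_vals_df.pvals <= 0.05]['vars'])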
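The same filtering rule can be sketched without pandas. The helper name `significant_names` and the p-values below are illustrative only; they are not the full set of p-values of the fitted model.

```python
def significant_names(p_values, alpha=0.05):
    """Return the variable names whose p-value is at most alpha, in input order."""
    return [name for name, p in p_values.items() if p <= alpha]


# Illustrative p-values keyed by variable name.
p_values = {
    "number_project": 0.0000,
    "Department_marketing": 0.0736,
    "Department_product_mng": 0.4075,
    "salary_medium": 0.0000,
}
print(significant_names(p_values))  # ['number_project', 'salary_medium']
```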
Viewing the significant variables
significant_vars = get_significant_vars(logit_model)
significant_vars
['const', ..., 'salary_medium']
Now creating a new model "final_logit" with only the significant variables of the previous
model
final_logit = sm.Logit(y_train,
                       sm.add_constant(X_train[significant_vars])).fit()
Current function value: 0.425996
Now checking the summary of the final model
final_logit.summary2()
Model:               Logit             Pseudo R-squared:  0.224
Dependent Variable:  left              AIC:               ?
Date:                2023-03-07 13:26  BIC:               ?
No. Observations:    10499             Log-Likelihood:    -4472.?
Df Model:            ?                 LL-Null:           ?
Df Residuals:        10488             LLR p-value:       0.0000
Converged:           1.0000            Scale:             1.0000
No. Iterations:      7.0000

                         Coef.     Std.Err.  z          P>|z|    [0.025    0.975]
[? marks characters that are not legible; the coefficient rows are missing]
Creating a new dataframe y_pred_df that holds the actual labels and the predicted probabilities
y_pred_df = pd.DataFrame({"actual": y_test,
                          "predicted_prob": final_logit.predict(
                              sm.add_constant(X_test[significant_vars]))})
Checking the predicted values
y_pred_df.sample(10, random_state = 42)
actual  predicted_prob
[sampled rows not legible]
Creating a new column "predicted" that can be compared with the actual values: it is 1 if the
predicted probability is greater than 0.5, else 0
y_pred_df['predicted'] = y_pred_df.predicted_prob.map(
    lambda x: 1 if x > 0.5 else 0)
y_pred_df.sample(10, random_state = 42)
actual  predicted_prob  predicted
[sampled rows not legible]
Importing the necessary libraries for plotting graphs
import matplotlib.pyplot as plt
import seaborn as sn
Defining a function to draw a confusion matrix to evaluate the performance of a
classification model. A confusion matrix is a table that summarizes the performance of
a classification model by showing the number of true positives, true negatives, false
positives, and false negatives.
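from sklearn import metrics

def draw_cm(actual, predicted):
    # Rows are the actual labels, columns are the predicted labels
    cm = metrics.confusion_matrix(actual, predicted)
    sn.heatmap(cm, annot=True, fmt='.2f',
               xticklabels=["Bad credit", "Good Credit"],
               yticklabels=["Bad credit", "Good Credit"])
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()

draw_cm(y_pred_df.actual,
        y_pred_df.predicted)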
[Figure: confusion matrix heatmap for the 0.5 threshold]
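The four cells of a confusion matrix can be counted by hand, as in this small sketch. The function name `confusion_counts` and the label lists are hypothetical; class 1 (an employee who left) is treated as the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels for eight employees.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 0]
print(confusion_counts(actual, predicted))  # (2, 1, 2, 3)
```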
Now printing a classification report, which provides a comprehensive evaluation of the
performance of a classification model.
print(metrics.classification_report(y_pred_df.actual,
                                    y_pred_df.predicted))
              precision    recall  f1-score   support
[Rows for each class, accuracy, macro avg and weighted avg; the values are not legible]
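The per-class columns of such a report are simple ratios of confusion-matrix counts. A minimal sketch, using hypothetical counts (2 true positives, 1 false positive, 2 false negatives) and a hypothetical helper name:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive class, guarding against zero division."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(tp=2, fp=1, fn=2)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.67 0.5 0.57
```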
Plotting a histogram of the predicted probabilities for each class (bad credit and good
credit).
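plt.figure(figsize = (8, 6))
# Predicted probabilities for the rows whose actual label is 1
sn.distplot(y_pred_df[y_pred_df.actual == 1]["predicted_prob"],
            kde=False,
            label = "Bad Credit")
# Predicted probabilities for the rows whose actual label is 0
sn.distplot(y_pred_df[y_pred_df.actual == 0]["predicted_prob"],
            kde=False, color = 'g',
            label = "Good Credit")
plt.legend()
plt.show()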
[Figure: histograms of the predicted probabilities for the two classes]
Defining a function to plot the Receiver Operating Characteristic (ROC) curve of a binary
classification model and calculate the Area Under the Curve (AUC) score.
def draw_roc(actual, probs):
    # One (fpr, tpr) point per candidate threshold
    fpr, tpr, thresholds = metrics.roc_curve(actual,
                                             probs,
                                             drop_intermediate = False)
    auc_score = metrics.roc_auc_score(actual, probs)
    plt.figure(figsize=(8, 8))
    plt.plot(fpr, tpr,
             label='ROC curve (area = %0.2f)' % auc_score)
    plt.xlabel('False Positive Rate or [1 - True Negative Rate]')
    plt.ylabel('True Positive Rate')
    plt.legend(loc="lower right")
    plt.show()
    return fpr, tpr, thresholds

fpr, tpr, thresholds = draw_roc(y_pred_df.actual,
                                y_pred_df.predicted_prob)
[Figure: ROC curve; x-axis: False Positive Rate, y-axis: True Positive Rate]
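To show what the ROC curve and its area measure, here is a small dependency-free sketch that builds the curve one threshold at a time and integrates it with the trapezoid rule. The helper names and the four sample scores are hypothetical, and the sketch assumes both classes are present. It is not how scikit-learn implements `roc_curve`.

```python
def roc_points(actual, probs):
    """Return (fpr, tpr) lists with one point per distinct threshold, starting at (0, 0)."""
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(probs), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for a, p in zip(actual, probs) if p >= threshold and a == 1)
        fp = sum(1 for a, p in zip(actual, probs) if p >= threshold and a == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Hypothetical labels and predicted probabilities.
actual = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(actual, probs)
print(fpr)                      # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)                      # [0.0, 0.5, 0.5, 1.0, 1.0]
print(auc_trapezoid(fpr, tpr))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.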
Now calculating the Area Under the Curve (AUC) score of a binary classification model
using the predicted probabilities and actual labels.
auc_score = metrics.roc_auc_score(y_pred_df.actual,
                                  y_pred_df.predicted_prob)
round(float(auc_score), 2)
Creating DataFrame that contains True Positive Rate (TPR), False Positive Rate (FPR),
and threshold values for a binary classification model.
tpr_fpr = pd.DataFrame({'tpr': tpr,
                        'fpr': fpr,
                        'thresholds': thresholds})
tpr_fpr['diff'] = tpr_fpr.tpr - tpr_fpr.fpr
tpr_fpr.sort_values('diff', ascending = False)[0:5]
tpr  fpr  thresholds  diff
[Top rows by diff; only partly legible, with fpr 0.175029 in the first row]
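Sorting by the difference between TPR and FPR amounts to choosing the threshold that maximizes that difference (often called Youden's index). A minimal sketch with hypothetical values; the helper name `best_threshold` and all numbers are illustrative only:

```python
def best_threshold(tpr, fpr, thresholds):
    """Return (threshold, diff) for the point where tpr - fpr is largest."""
    diffs = [t - f for t, f in zip(tpr, fpr)]
    best = max(range(len(diffs)), key=diffs.__getitem__)
    return thresholds[best], diffs[best]


# Hypothetical ROC points, ordered from the strictest to the loosest threshold.
tpr = [0.0, 0.4, 0.8, 0.9, 1.0]
fpr = [0.0, 0.05, 0.2, 0.6, 1.0]
thresholds = [0.9, 0.6, 0.22, 0.1, 0.0]

threshold, diff = best_threshold(tpr, fpr, thresholds)
print(threshold, round(diff, 2))  # 0.22 0.6
```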
Again plotting a confusion matrix
Now creating a new column predicted_new in the pandas DataFrame y_pred_df, where the
predicted class labels are based on a new threshold value of 0.22 for the predicted
probabilities.
y_pred_df['predicted_new'] = y_pred_df.predicted_prob.map(
    lambda x: 1 if x > 0.22 else 0)
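The effect of lowering the threshold can be seen on a handful of made-up probabilities. The helper name `classify` and the values below are hypothetical:

```python
def classify(probs, threshold):
    """Label a probability 1 when it is strictly greater than the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]


# Hypothetical predicted probabilities of leaving.
probs = [0.07, 0.30, 0.45, 0.80]
print(classify(probs, 0.5))   # [0, 0, 0, 1]
print(classify(probs, 0.22))  # [0, 1, 1, 1]
```

A lower threshold flags more rows as class 1, which tends to raise recall for that class at the cost of more false positives.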
Again plotting the confusion matrix, now with the new threshold value
draw_cm(y_pred_df.actual,
        y_pred_df.predicted_new)
Again printing a new classification report with higher precision and recall values
print(metrics.classification_report(y_pred_df.actual,
                                    y_pred_df.predicted_new))