Employee Attrition
Contents
1.0 Scope and Objectives
2.0 Methodology
Tools
Analysis
1. Multivariate analysis
2. Correlation
3. Descriptive Statistics
4. t-Test two samples assuming unequal variance
3.0 Initial analysis statement
1.0 Scope and Objectives
Attrition is the number or percentage of employees who leave a company to work for another
company or to pursue other career paths. This report discusses employee attrition at a company
in relation to other variables, both categorical and non-categorical.
The study aimed to use t-tests and correlation to identify significant predictors of attrition
among the correlated variables. Because the response variable is categorical, however, the
analysis was unable to explain the attrition states.
2.0 Methodology
Tools
Excel provides various tools for examining and interpreting data. Microsoft Excel is one of the
most widely used data analysis programs, and its pivot table is a powerful analysis tool
(Agarwal, 2021). Several data analysis tools in Microsoft Excel 2016, including pivot tables,
regression analysis, descriptive summaries, and t-tests, were used to analyse the given data.
Analysis
1. Multivariate analysis
A multivariate approach analyses multiple variables together to arrive at a single outcome
(Great Learning Team, 2020). Here, pairs of related variables are considered in order to
understand the reasons for employee attrition.
[Chart legends: Attrition (No, Yes); Gender (Female, Male)]
d) Attrition based on Years at Company and Years since the last promotion
Employees who have been with the company for 0-9 years and received their last promotion
0-4 years ago account for the highest counts of both attrition and retention.
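The pivot-table grouping used for this kind of comparison can also be sketched outside Excel. The Python sketch below bins years at the company and years since the last promotion, then counts employees per attrition state. The records are hypothetical, and the band widths (10 and 5 years, chosen so that the 0-9 and 0-4 bands named above appear) and the function names are assumptions for illustration, not the report's data or method.

```python
from collections import Counter


def band(value, width):
    """Return a label such as '0-9' for the band of the given width containing value."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def cross_tab(records):
    """Count employees per (years-at-company band, promotion band, attrition)."""
    counts = Counter()
    for years_at_company, years_since_promotion, attrition in records:
        key = (band(years_at_company, 10), band(years_since_promotion, 5), attrition)
        counts[key] += 1
    return counts


# Hypothetical records: (years at company, years since last promotion, attrition)
records = [
    (2, 0, "Yes"), (5, 1, "No"), (7, 3, "No"),
    (12, 6, "No"), (1, 0, "Yes"), (9, 4, "No"),
]

table = cross_tab(records)
for key, n in sorted(table.items()):
    print(key, n)
```

Because both "Yes" and "No" counts are taken from the same group, a large group can top both lists at once, which is why the rate within each group is usually more informative than the raw count.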
2. Correlation
The correlation coefficient r, which ranges from -1 to +1, measures the strength and direction
of the linear relationship between two variables (BYJUS, n.d.). Here, the correlation of
Attrition with each non-categorical variable is analysed. Distance from Home, Monthly Rate,
Number of Companies Worked, and Performance Rating are negatively correlated with Attrition,
although all four coefficients are close to zero. Age, Job Level, Monthly Income, Total Working
Years, Years in Current Role, and Years with Current Manager have the largest coefficients in
the table, yet each is still only a weak positive correlation (r between 0.15 and 0.18).
Table: Correlation of Attrition with non-categorical variables
Variable                      Correlation with Attrition
Age                            0.159205007
Daily Rate                     0.056651992
Distance from Home            -0.077923583
Education                      0.03137282
Environment Satisfaction       0.103368978
Hourly Rate                    0.00684555
Job Involvement                0.130015957
Job Level                      0.169104751
Job Satisfaction               0.103481126
Monthly Income                 0.159839582
Monthly Rate                  -0.015170213
Number Companies Worked       -0.043493739
Percent Salary Hike            0.013478202
Performance Rating            -0.002888752
Relationship Satisfaction      0.045872279
Stock Option Level             0.137144919
Total Working Years            0.171063246
Training Times Last Year       0.059477799
Work-Life Balance              0.063939047
Years At Company               0.134392214
Years In Current Role          0.160545004
Years Since Last Promotion     0.033018775
Years With Current Manager     0.156199316
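For readers who want to see how a coefficient like those in the table is obtained, here is a minimal Python sketch of the Pearson correlation between a numeric variable and a 0/1 attrition flag. The report does not state how Attrition was coded as a number, so the flag, the toy data, and the function name are assumptions made for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: a 0/1 attrition flag and the matching ages
attrition_flag = [1, 1, 0, 1, 0, 1]
age = [45, 38, 24, 51, 29, 33]

r = pearson_r(attrition_flag, age)
print(round(r, 3))
```

The sign of r depends on which attrition state is coded as 1, so the coding has to be known before a positive or negative coefficient can be interpreted.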
3. Descriptive Statistics
In descriptive statistics, a sample or data set is summarised by its characteristics, such as
its mean, standard deviation, or frequency (Bhandari, 2020). Here, the data is summarised for
Attrition and the non-categorical variables.
Statistic            Attrition      Age            Daily Rate     Distance From Home
Mean                 0.83877551     36.92380952    802.4857143    9.192517007
Standard Error       0.009594613    0.238269054    10.52433506    0.211443453
Median               1              36             802            7
Mode                 1              35             691            2
Standard Deviation   0.367863032    9.135373489    403.5090999    8.106864436
Sample Variance      0.13532321     83.45504879    162819.5937    65.72125098
Kurtosis             1.403594201    -0.404145137   -1.203822808   -0.224833405
Skewness             -1.844366124   0.413286302    -0.003518568   0.958117996
Range                1              42             1397           28
Minimum              0              18             102            1
Maximum              1              60             1499           29
Sum                  1233           54278          1179654        13513
Count                1470           1470           1470           1470

Statistic            Education      Environment Satisfaction   Hourly Rate    Job Involvement
Mean                 2.91292517     2.721768707                65.89115646    2.729931973
Standard Error       0.026712297    0.028509799                0.53023267     0.018558957
Median               3              3                          66             3
Mode                 3              3                          66             3
Standard Deviation   1.024164945    1.093082215                20.32942759    0.711561143
Sample Variance      1.048913834    1.194828728                413.2856263    0.50631926
Kurtosis             -0.559114966   -1.202520522               -1.196398456   0.270998766
Skewness             -0.289681082   -0.321654448               -0.032310953   -0.498419364
Range                4              3                          70             3
Minimum              1              1                          30             1
Maximum              5              4                          100            4
Sum                  4282           4001                       96860          4013
Count                1470           1470                       1470           1470
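Several rows of these summaries follow arithmetically from others, which gives a quick way to sanity-check the Excel output. The Python sketch below recomputes the mean, standard error, sample variance, and range for the Age column from the reported Sum, Count, Standard Deviation, Minimum, and Maximum. The function and key names are assumptions made for illustration.

```python
import math


def derived_stats(total, count, std_dev, minimum, maximum):
    """Recompute summary values that follow from other rows of the summary."""
    return {
        "mean": total / count,                         # Sum / Count
        "standard_error": std_dev / math.sqrt(count),  # SD / sqrt(n)
        "sample_variance": std_dev ** 2,               # SD squared
        "range": maximum - minimum,                    # Maximum - Minimum
    }


# Age column as reported in the first descriptive statistics table
age = derived_stats(total=54278, count=1470, std_dev=9.135373489,
                    minimum=18, maximum=60)

for name, value in age.items():
    print(f"{name}: {value:.6f}")
```

The recomputed values can be compared with the reported Mean (36.92380952), Standard Error (0.238269054), Sample Variance (83.45504879), and Range (42) for Age.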
4. t-Test two samples assuming unequal variance

                                 No             Yes
Mean                             411            79
Variance                         171157         5259
Observations                     3              3
Hypothesized Mean Difference     0
df                               2
t Stat                           1.369082836
P(T<=t) one-tail                 0.152225139
t Critical one-tail              2.91998558
P(T<=t) two-tail                 0.304450279
t Critical two-tail              4.30265273

                                 No             Yes
Mean                             411            79
Variance                         153369         3787
Observations                     3              3
Hypothesized Mean Difference     0
df                               2
t Stat                           1.450551752
P(T<=t) one-tail                 0.141990753
t Critical one-tail              2.91998558
P(T<=t) two-tail                 0.283981505
t Critical two-tail              4.30265273

                                 No             Yes
Mean                             411            79
Variance                         24547          1911
Observations                     3              3
Hypothesized Mean Difference     0
df                               2
t Stat                           3.535250603
P(T<=t) one-tail                 0.035766786
t Critical one-tail              2.91998558
P(T<=t) two-tail                 0.071533572
t Critical two-tail              4.30265273

                                 No             Yes
Mean                             616.5          118.5
Variance                         26680.5        1984.5
Observations                     2              2
Hypothesized Mean Difference     0
df                               1
t Stat                           4.159761
P(T<=t) one-tail                 0.075096
t Critical one-tail              6.313752
P(T<=t) two-tail                 0.150192
t Critical two-tail              12.7062

                                 No             Yes
Mean                             137            26.3333
Variance                         6872           563
Observations                     9              9
Hypothesized Mean Difference     0
df                               9
t Stat                           3.850327
P(T<=t) one-tail                 0.001952
t Critical one-tail              1.833113
P(T<=t) two-tail                 0.003904
t Critical two-tail              2.262157
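The t Stat and df rows can be reproduced from the Mean, Variance, and Observations rows alone. The following Python sketch applies the standard unequal-variance (Welch) formulas to the first table. The function name is an assumption, and the whole-number df in the tables is taken here to be the Welch-Satterthwaite value rounded to the nearest integer, which matches the reported figures.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    from group means, sample variances, and group sizes."""
    se1 = var1 / n1  # squared standard error of the first group mean
    se2 = var2 / n2  # squared standard error of the second group mean
    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df


# First table: No (mean 411, variance 171157, 3 observations)
# versus Yes (mean 79, variance 5259, 3 observations)
t_stat, df = welch_t(411, 171157, 3, 79, 5259, 3)

# Compare with the reported t Stat (1.369082836) and df (2)
print(round(t_stat, 6), round(df))
```

With only two or three observations per group, the degrees of freedom are very small, so the critical values are large and even a wide gap between the group means may not reach significance.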
Conclusion: At the 5% significance level (two-tailed), the null hypothesis of equal means was
rejected only for the category ‘Job Role’. For every other category, a significant difference in
attrition cannot be inferred from the observed difference between the sample means.
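The two-tailed decision rule behind such a conclusion can be sketched in Python: the null hypothesis of equal means is rejected when the absolute t Stat exceeds t Critical two-tail, which is equivalent to a two-tail p-value below 0.05. The value pairs are copied from the five t-test tables in the order they appear; the function and variable names are assumptions for illustration.

```python
def reject_null(t_stat, t_critical_two_tail):
    """Two-tailed decision: reject when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical_two_tail


# (t Stat, t Critical two-tail) in the order the five tables appear
reported = [
    (1.369082836, 4.30265273),
    (1.450551752, 4.30265273),
    (3.535250603, 4.30265273),
    (4.159761, 12.7062),
    (3.850327, 2.262157),
]

decisions = [reject_null(t, crit) for t, crit in reported]
print(decisions)  # → [False, False, False, False, True]
```

Under this rule only the last table, the one with 9 observations per group, shows a significant difference. The third table would pass a one-tailed test (p = 0.0358) but not the two-tailed one (p = 0.0715).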
3.0 Initial analysis statement
The current study used both categorical and continuous predictors to predict a categorical
variable. Because the response variable is categorical, the t-test and correlation analysis
could not explain the attrition states. A different model, one designed for a categorical
response, could improve the results.
References
Agarwal, G. (2021). A Comprehensive Guide on Microsoft Excel for Data Analysis. Available at:
https://www.analyticsvidhya.com/blog/2021/11/a-comprehensive-guide-on-microsoft-excel-for-data-analysis/
(Accessed on June 5, 2022)
Bhandari, P. (2020). Descriptive Statistics | Definitions, Types, Examples. Available at:
https://www.scribbr.com/statistics/descriptive-statistics/ (Accessed on June 5, 2022)
BYJUS (n.d.). Correlation. Available at: https://byjus.com/maths/correlation/ (Accessed on
June 5, 2022)
Great Learning Team (2020). Overview of Multivariate Analysis | What is Multivariate Analysis
and Model Building Process? Available at:
https://www.mygreatlearning.com/blog/introduction-to-multivariate-analysis/
(Accessed on June 5, 2022)
Soetewey, A. (2020). How to do a t-test or ANOVA for more than one variable at once in R.
Available at:
https://statsandr.com/blog/how-to-do-a-t-test-or-anova-for-many-variables-at-once-in-r-and-communicate-the-results-in-a-better-way/
(Accessed on June 5, 2022)