
Individual Project III

Contents
1.0 Scope and Objectives
2.0 Methodology
Tools
Analysis
1. Multivariate analysis
2. Correlation
3. Descriptive Statistics
4. t-Test two samples assuming unequal variance
3.0 Initial analysis statement
1.0 Scope and Objectives
Attrition is the number or percentage of employees who leave a company to work for another
company or to pursue other career paths. This report examines employee attrition in a company
in relation to other variables, covering both categorical and non-categorical data.
The study aimed to identify significant predictors of attrition by applying correlation analysis
and t-tests to the variables most strongly correlated with attrition. Because the response
variable is categorical, the analysis was unable to explain the attrition states.
2.0 Methodology
Tools
Excel provides various tools for examining and interpreting data. Microsoft Excel is one of the
most widely used data analysis programs, and the pivot table is one of its most powerful
analysis tools (Agarwal, 2021). Several tools in Microsoft Excel 2016, including pivot tables,
regression analysis, descriptive summaries and t-tests, were used to analyse the given data.
Analysis
1. Multivariate analysis
A multivariate approach examines more than one variable at a time in relation to an
outcome (Great Learning Team, 2020). Here, two related variables at a time are cross-tabulated
against attrition to understand the reasons behind employee attrition.
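
The cross-tabulations in this section were produced with Excel pivot tables. As a rough illustration of the same idea, the sketch below bands a numeric variable and counts employees per cell. The records are hypothetical and are not rows from the data set; the field names and the helper functions `distance_band` and `cross_tab` are assumptions made for this example.

```python
from collections import Counter

# Hypothetical records for illustration only; NOT rows from the report's data.
records = [
    {"BusinessTravel": "Travel_Rarely", "DistanceFromHome": 2, "Attrition": "No"},
    {"BusinessTravel": "Travel_Rarely", "DistanceFromHome": 4, "Attrition": "Yes"},
    {"BusinessTravel": "Travel_Frequently", "DistanceFromHome": 8, "Attrition": "No"},
    {"BusinessTravel": "Non-Travel", "DistanceFromHome": 12, "Attrition": "No"},
    {"BusinessTravel": "Travel_Rarely", "DistanceFromHome": 3, "Attrition": "No"},
]


def distance_band(km, width=5):
    """Map a distance (1 or more) to a band label such as '1-5' or '6-10'."""
    lower = ((km - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"


def cross_tab(rows):
    """Count employees per (distance band, travel category, attrition) cell."""
    return Counter(
        (distance_band(r["DistanceFromHome"]), r["BusinessTravel"], r["Attrition"])
        for r in rows
    )


table = cross_tab(records)
for cell, count in sorted(table.items()):
    print(cell, count)
```

Each cell holds a head count, which is what the charts in this section plot. Dividing a cell's "Yes" count by the cell total would give an attrition rate instead.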

a) Attrition based on Business Travel and Distance


Employees who travel rarely and live within 1-5 km have the largest numbers of both leavers
(attrition) and stayers (retention), whereas non-travellers living 11-15 km away have the
smallest numbers of both. Employees who travel frequently show a constant level of attrition
across the distance bands.
[Figure: Attrition based on Business Travel and Distance. Employee counts by attrition (No/Yes) for the distance bands 1-5, 6-10, 11-15, 16-20, 21-25 and 26-30 km; series: Non-Travel, Travel_Frequently, Travel_Rarely.]

b) Attrition based on Department


The Research & Development department has the largest numbers of both leavers (attrition)
and stayers (retention) compared with the other departments.

[Figure: Attrition based on Department. Employee counts for Human Resources, Research & Development and Sales; series: attrition No, Yes.]

c) Attrition based on Age and Gender


Among both female and male employees, those aged 25-44 account for the most retained
employees, whereas those aged 25-34 account for the most attrition.
[Figure: Attrition based on Age and Gender. Employee counts by attrition (No/Yes) for the age bands 15-24, 25-34, 35-44, 45-54 and 55-64; series: Female, Male.]

d) Attrition based on Years at Company and Years since the last promotion
Employees who have been with the company for 0-9 years and received their last promotion
0-4 years ago have the largest numbers of both leavers and stayers.

[Figure: Attrition based on Years at Company & Years since last promotion. Employee counts by attrition (No/Yes) for years since last promotion 0-4, 5-9 and 10-15; series (years at company): 0-9, 10-19, 20-29, 30-40.]

e) Attrition based on Marital Status and Monthly Income


Married employees with a monthly income of 1000-5999 account for the most retained
employees, whereas single employees with a monthly income of 1000-5999 account for the
most attrition.
[Figure: Attrition based on Marital Status & Monthly Income. Employee counts by attrition (No/Yes) for the monthly income bands 1000-5999, 6000-10999, 11000-15999 and 16000-20999; series: Divorced, Married, Single.]

f) Attrition based on Job Role and % Salary Hiked


Sales Executives who received a salary hike of 11-15% account for the most retained
employees, whereas Laboratory Technicians who received a salary hike of 11-15% account for
the most attrition. Employees who received a salary hike of more than 15%, however, appear
more stable in their jobs.

[Figure: Attrition based on Job Role & % Salary hiked. Employee counts by attrition (No/Yes) for the salary hike bands 11-15, 16-20 and 21-25 per cent; series: Healthcare Representative, Human Resources, Laboratory Technician, Manager, Manufacturing Director, Research Director, Research Scientist, Sales Executive, Sales Representative.]

2. Correlation
The correlation coefficient r takes values between -1 and +1 and measures the strength and
direction of the relationship between two variables (BYJUS, n.d.). Here, the correlation of
Attrition with the non-categorical variables is analysed. Distance from Home, Monthly Rate,
Number of Companies Worked and Performance Rating are negatively correlated with Attrition,
whereas variables such as Age, Job Level, Monthly Income, Total Working Years, Years in
Current Role and Years with Current Manager have a weak positive correlation with Attrition.
All coefficients are below 0.2 in magnitude. Attrition appears to be coded with 1 for employees
who stayed (its Sum of 1233 in the descriptive statistics equals the 'No' total in the t-test
tables, 3 x 411), so a positive coefficient indicates an association with retention.
Table for Correlation of Attrition with non-categorical variables
  Attrition
Age 0.159205007
Daily Rate 0.056651992
Distance from Home -0.077923583
Education 0.03137282
Environment Satisfaction 0.103368978
Hourly Rate 0.00684555
Job Involvement 0.130015957
Job Level 0.169104751
Job Satisfaction 0.103481126
Monthly Income 0.159839582
Monthly Rate -0.015170213
Number Companies Worked -0.043493739
Percent Salary Hike 0.013478202
Performance Rating -0.002888752
Relationship Satisfaction 0.045872279
Stock Option Level 0.137144919
Total Working Years 0.171063246
Training Times Last Year 0.059477799
Work-Life Balance 0.063939047
Years At Company 0.134392214
Years In Current Role 0.160545004
Years Since Last Promotion 0.033018775
Years With Current Manager 0.156199316
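
As a minimal sketch of how one coefficient in such a table can be computed, the function below implements the Pearson correlation from its definition. The two short lists are hypothetical and are not taken from the data set; coding "stayed" as 1 is an assumption that mirrors the way Attrition appears to be coded in this report.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: 1 = stayed, 0 = left (assumed coding), with ages.
stayed = [1, 1, 0, 1, 0, 1]
age = [41, 37, 24, 52, 29, 45]

r = pearson_r(stayed, age)
print(f"r = {r:.3f}")
```

With a 0/1 variable on one side, the sign of r simply shows which group has the higher mean of the other variable, so reversing the coding flips every sign in the table.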
3. Descriptive Statistics
Descriptive statistics summarise a sample or data set by showing its characteristics, such as its
mean, standard deviation or frequency (Bhandari, 2020). Here, the data is summarised for the
numerically coded variables, including Attrition coded as 0/1.
Statistic    Attrition    Age    Daily Rate    Distance From Home
Mean 0.83877551 36.92380952 802.4857143 9.192517007
Standard Error 0.009594613 0.238269054 10.52433506 0.211443453
Median 1 36 802 7
Mode 1 35 691 2
Standard Deviation 0.367863032 9.135373489 403.5090999 8.106864436
Sample Variance 0.13532321 83.45504879 162819.5937 65.72125098
Kurtosis 1.403594201 -0.404145137 -1.203822808 -0.224833405
Skewness -1.844366124 0.413286302 -0.003518568 0.958117996
Range 1 42 1397 28
Minimum 0 18 102 1
Maximum 1 60 1499 29
Sum 1233 54278 1179654 13513
Count 1470 1470 1470 1470

Statistic    Education    Environment Satisfaction    Hourly Rate    Job Involvement
Mean 2.91292517 2.721768707 65.89115646 2.729931973
Standard Error 0.026712297 0.028509799 0.53023267 0.018558957
Median 3 3 66 3
Mode 3 3 66 3
Standard Deviation 1.024164945 1.093082215 20.32942759 0.711561143
Sample Variance 1.048913834 1.194828728 413.2856263 0.50631926
Kurtosis -0.559114966 -1.202520522 -1.196398456 0.270998766
Skewness -0.289681082 -0.321654448 -0.032310953 -0.498419364
Range 4 3 70 3
Minimum 1 1 30 1
Maximum 5 4 100 4
Sum 4282 4001 96860 4013
Count 1470 1470 1470 1470

Statistic    Job Level    Job Satisfaction    Monthly Income    Monthly Rate
Mean 2.063945578 2.728571429 6502.931293 14313.1034
Standard Error 0.028871236 0.028764462 122.7930538 185.6462846
Median 2 3 4919 14235.5
Mode 1 4 2342 9150
Standard Deviation 1.106939899 1.102846123 4707.956783 7117.786044
Sample Variance 1.22531594 1.216269571 22164857.07 50662878.17
Kurtosis 0.399152055 -1.222192568 1.005232691 -1.2149561
Skewness 1.025401283 -0.329671959 1.369816681 0.018577808
Range 4 3 18990 24905
Minimum 1 1 1009 2094
Maximum 5 4 19999 26999
Sum 3034 4011 9559309 21040262
Count 1470 1470 1470 1470

Statistic    Number Companies Worked    Percent Salary Hike    Performance Rating    Relationship Satisfaction
Mean 2.693197279 15.20952381 3.153741497 2.712244898
Standard Error 0.065153137 0.095458593 0.009411009 0.028200119
Median 2 14 3 3
Mode 1 11 3 3
Standard Deviation 2.498009006 3.659937717 0.360823525 1.081208886
Sample Variance 6.240048994 13.39514409 0.130193616 1.169012656
Kurtosis 0.010213817 -0.300598222 1.69593867 -1.184813982
Skewness 1.026471112 0.821127976 1.921882702 -0.302827565
Range 9 14 1 3
Minimum 0 11 3 1
Maximum 9 25 4 4
Sum 3959 22358 4636 3987
Count 1470 1470 1470 1470

Statistic    Stock Option Level    Total Working Years    Training Times Last Year    Work-Life Balance
Mean 0.793877551 11.27959184 2.799319728 2.76122449
Standard Error 0.022223886 0.202938554 0.033626791 0.018426321
Median 1 10 3 3
Mode 0 10 2 3
Standard Deviation 0.852076668 7.780781676 1.289270621 0.70647583
Sample Variance 0.726034648 60.54056348 1.662218734 0.499108098
Kurtosis 0.364634334 0.918269537 0.494992986 0.419460495
Skewness 0.968980317 1.117171853 0.553124171 -0.552480299
Range 3 40 6 3
Minimum 0 0 0 1
Maximum 3 40 6 4
Sum 1167 16581 4115 4059
Count 1470 1470 1470 1470

Statistic    Years At Company    Years In Current Role    Years Since Last Promotion    Years With Current Manager
Mean 7.008163265 4.229251701 2.187755102 4.123129252
Standard Error 0.159792192 0.094498756 0.084047512 0.093064221
Median 5 3 1 3
Mode 5 2 0 2
Standard Deviation 6.126525152 3.623137035 3.222430279 3.568136121
Sample Variance 37.53431044 13.12712197 10.3840569 12.73159537
Kurtosis 3.935508756 0.477420774 3.612673115 0.171058084
Skewness 1.764529454 0.917363156 1.984289983 0.833450992
Range 40 18 15 17
Minimum 0 0 0 0
Maximum 40 18 15 17
Sum 10302 6217 3216 6061
Count 1470 1470 1470 1470
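
The Attrition column of the first table can be cross-checked from its own totals. The sketch below rebuilds the column from the reported Sum (1233) and Count (1470), assuming it contains only 0 and 1 as its Minimum and Maximum suggest, and recomputes several of the statistics with Python's standard `statistics` module. It is a check under that assumption, not the procedure used to produce the tables, which came from Excel.

```python
import math
import statistics

# Rebuilt from the table totals: Sum = 1233 ones out of Count = 1470.
# Assumes the column holds only 0 and 1 (Minimum 0, Maximum 1).
attrition = [1] * 1233 + [0] * (1470 - 1233)

mean = statistics.mean(attrition)
variance = statistics.variance(attrition)         # sample variance, divisor n - 1
std_dev = statistics.stdev(attrition)             # sample standard deviation
std_error = std_dev / math.sqrt(len(attrition))   # standard error of the mean
median = statistics.median(attrition)
mode = statistics.mode(attrition)

print(f"Mean               {mean:.9f}")
print(f"Standard Error     {std_error:.9f}")
print(f"Median             {median}")
print(f"Mode               {mode}")
print(f"Standard Deviation {std_dev:.9f}")
print(f"Sample Variance    {variance:.9f}")
```

The recomputed values agree with the Attrition column above to the precision shown there, which supports reading its mean of about 0.839 as the share of employees coded 1.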
4. t-Test two samples assuming unequal variance
A two-sample t-test compares two populations to determine whether their means are equal
(Soetewey, 2020). Here, for each categorical variable, the employee counts in each category are
split by attrition (No and Yes), and the t-test checks the null hypothesis that the mean counts
of the two groups are equal.
t-Test: Two-Sample Assuming Unequal Variances (Business Travel)

  No Yes
Mean 411 79
Variance 171157 5259
Observations 3 3
Hypothesized Mean Difference 0
df 2
t Stat 1.369082836
P(T<=t) one-tail 0.152225139
t Critical one-tail 2.91998558
P(T<=t) two-tail 0.304450279
t Critical two-tail 4.30265273  

t-Test: Two-Sample Assuming Unequal Variances (Department)

  No Yes
Mean 411 79
Variance 153369 3787
Observations 3 3
Hypothesized Mean Difference 0
df 2
t Stat 1.450551752
P(T<=t) one-tail 0.141990753
t Critical one-tail 2.91998558
P(T<=t) two-tail 0.283981505
t Critical two-tail 4.30265273  

t-Test: Two-Sample Assuming Unequal Variances (Marital Status)

  No Yes
Mean 411 79
Variance 24547 1911
Observations 3 3
Hypothesized Mean Difference 0
df 2
t Stat 3.535250603
P(T<=t) one-tail 0.035766786
t Critical one-tail 2.91998558
P(T<=t) two-tail 0.071533572
t Critical two-tail 4.30265273  

t-Test: Two-Sample Assuming Unequal Variances (Gender)

  No Yes
Mean 616.5 118.5
Variance 26680.5 1984.5
Observations 2 2
Hypothesized Mean Difference 0
df 1
t Stat 4.159761
P(T<=t) one-tail 0.075096
t Critical one-tail 6.313752
P(T<=t) two-tail 0.150192
t Critical two-tail 12.7062  

t-Test: Two-Sample Assuming Unequal Variances (Job Role)

  No Yes
Mean 137 26.3333
Variance 6872 563
Observations 9 9
Hypothesized Mean Difference 0
df 9
t Stat 3.850327
P(T<=t) one-tail 0.001952
t Critical one-tail 1.833113
P(T<=t) two-tail 0.003904
t Critical two-tail 2.262157  
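
The t Stat and df rows above can be recomputed from the Mean, Variance and Observations rows. The sketch below applies the unequal-variance (Welch) t statistic and the Welch-Satterthwaite degrees of freedom to the Business Travel and Job Role tables. The function names are assumptions made for this example, the critical values are copied from the tables rather than computed, and rounding df to the nearest integer is an assumption that happens to match the df values Excel reports here.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite df) from summary statistics."""
    se1 = var1 / n1  # squared standard error of the first sample mean
    se2 = var2 / n2  # squared standard error of the second sample mean
    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df


def decision(t_stat, t_critical):
    """Two-tailed rule: reject H0 only when |t| exceeds the critical value."""
    return "reject H0" if abs(t_stat) > t_critical else "fail to reject H0"


# Summary rows taken from the tables above (No group first, Yes group second).
t_bt, df_bt = welch_t(411, 171157, 3, 79, 5259, 3)    # Business Travel
t_jr, df_jr = welch_t(137, 6872, 9, 26.3333, 563, 9)  # Job Role

# Two-tail critical values copied from the same tables.
print("Business Travel", round(t_bt, 6), round(df_bt), decision(t_bt, 4.30265273))
print("Job Role", round(t_jr, 6), round(df_jr), decision(t_jr, 2.262157))
```

Under this rule a t Stat below the two-tail critical value means the null hypothesis of equal means is not rejected, which is equivalent to the two-tail P value exceeding 0.05.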

Conclusion: At the two-tailed 5% level, the null hypothesis is rejected only for the category
‘Job Role’, where the t Stat exceeds the two-tail critical value (3.85 > 2.26, two-tail
P = 0.0039). For Business Travel, Department, Marital Status and Gender, the t Stat is below the
critical value, so the null hypothesis cannot be rejected and a significant difference cannot be
inferred from the observed difference between the sample means.

Category t Stat t Critical two-tail Remarks
Business Travel 1.369082836 4.30265273 Fail to reject Null Hypothesis
Department 1.450551752 4.30265273 Fail to reject Null Hypothesis
Marital Status 3.535250603 4.30265273 Fail to reject Null Hypothesis
Gender 4.159760892 12.70620474 Fail to reject Null Hypothesis
Job Role 3.850326845 2.262157163 Rejection of Null Hypothesis

3.0 Initial analysis statement


Initially, all variables were analysed for their relationship with attrition. Multivariate
analysis showed how attrition varies with both categorical and non-categorical variables.
Correlations between the non-categorical variables and the response variable were then
investigated. Based on the correlations found in the correlation analysis, the descriptive
statistics and the t-tests were constructed.

In the current study, both categorical and continuous predictors were used to predict a
categorical variable. Because the response variable was categorical, the analysis could not
explain the attrition states. A model suited to a categorical response could improve the results.
References
Agarwal, G. (2021). A Comprehensive Guide on Microsoft Excel for Data Analysis. Available at: https://www.analyticsvidhya.com/blog/2021/11/a-comprehensive-guide-on-microsoft-excel-for-data-analysis/ (Accessed on June 5, 2022)
Bhandari, P. (2020). Descriptive Statistics | Definitions, Types, Examples. Available at: https://www.scribbr.com/statistics/descriptive-statistics/ (Accessed on June 5, 2022)
BYJUS (n.d.). Correlation. Available at: https://byjus.com/maths/correlation/ (Accessed on June 5, 2022)
Great Learning Team (2020). Overview of Multivariate Analysis | What is Multivariate Analysis and Model Building Process? Available at: https://www.mygreatlearning.com/blog/introduction-to-multivariate-analysis/ (Accessed on June 5, 2022)
Soetewey, A. (2020). How to do a t-test or ANOVA for more than one variable at once in R. Available at: https://statsandr.com/blog/how-to-do-a-t-test-or-anova-for-many-variables-at-once-in-r-and-communicate-the-results-in-a-better-way/ (Accessed on June 5, 2022)
