
ASSIGNMENT

TECHNOLOGY PARK MALAYSIA


CT127-3-2-PFDA
PROGRAMMING FOR DATA ANALYSIS
APD2F2206CS(DA)
HAND OUT DATE: 25 JULY 2022
HAND IN DATE: 12 AUGUST 2022
WEIGHTAGE: 50%
NAME: LIEW JUN YEN
TP NUMBER: TP064175

INSTRUCTIONS TO CANDIDATES:

1 Submit your assignment at the administrative counter.


2 Students are advised to underpin their answers with the use of references (cited
using the American Psychological Association (APA) Referencing).
3 Late submission will be awarded zero (0) unless Extenuating Circumstances (EC)
are upheld.
4 Cases of plagiarism will be penalized.
5 The assignment should be bound in an appropriate style (comb bound or
stapled).
6 Where the assignment should be submitted in both hardcopy and softcopy, the
softcopy of the written assignment and source code (where appropriate) should
be on a CD in an envelope / CD cover and attached to the hardcopy.
7 You must obtain 50% overall to pass this module.

CT127-3-2-PFDA APU2F2206CS(DA)

TABLE OF CONTENTS

INTRODUCTION
PACKAGES INSTALLATION
DATA IMPORT
DATA PRE-PROCESSING
Data Cleaning
DATA EXPLORATION
ANALYSIS
Question 1: Which gender has higher termination occurrences?
1.1 Number of terminated male and female employees
1.1.1 Number of male and female employees who have retired
1.1.2 Number of male and female employees who volunteered for resignation
1.2 Number of terminations of male and female employees in each store
1.3 Number of terminations of involuntary male and female employees according to age in the 2014 year.
1.4 Number of terminations of male and female employees in each department
1.5 Number of terminations of male and female employees in each city according to age
1.5.1 Number of terminations of male and female employees in Vancouver city
Conclusion Q1
Question 2: How many laid-off employees are in the organisation?
2.1 Comparison of the number of laid-off employees between 2014 and 2015
2.1.1 Number of laid-off employees in each city in the 2014 year
2.1.2 Proportion of laid-off employees in each job based on Fort Nelson city in 2014
2.1.4 Which store has the most laid-off Cashier employees based on the Fort Nelson city in 2014?
2.2 Number of laid-off employees in each department
2.2.1 Number of laid-off employees in each city based on the Customer Service department
2.2.2 Number of the length of service laid-off employees based on the Customer Service department in White Rock city.
Conclusion Q2
Question 3: What are the proportions of the terminated employees?
3.1 Number of terminated employees in each year
3.1.1 Number of termination type descriptions from employees according to status year
3.2 Number of termination reason description in each department
3.2.1 Number of type of termination reason based on status year
3.2.2 Relationship between the type of termination reason and city based on the 2015 year
3.3 Number of terminated employees in each job title
3.3.1 How many terminated employees in Meat Cutter based on 2015 year?
3.3.2 How many terminated employees in Meat Cutter in each city based on the 2015 year?
Conclusion Q3
Question 4: What is the rate of active and terminated employees?
4.1 Number of active and terminated employees in each department
4.1.1 How many active and terminated employees have worked less than 2 years in Meats department?
4.2 Relationship between active and terminated employees in each store
4.2.1 Number of active and terminated employees in each job based on store 46
4.2.2 Are the active and terminated employees increasing or decreasing in Meats position based on age in store 46?
Conclusion Q4
Question 5: How many workers have been hired by this organisation?
5.1 Total male employees hired in this organisation
5.2 Total female employees hired in this company
5.3 Which city has the most employment rate?
5.3.1 Which store has hired the most employee in Vancouver city?
5.3.2 Which department has the most employment rate in Vancouver city store 42?
Conclusion Q5
EXTRA FEATURES
I. getwd( )
II. str(Employee_data)
III. position_dodge( )
IV. theme_bw( )
V. theme( )
VI. Density plot + scale_fill_manual( )
VII. facet_wrap( ) + coord_flip( )
VIII. table( )
IX. stat_count( )
X. coord_polar( ) + position_stack( )
XI. Violin plot
XII. position = “fill”
XIII. Jitter plot
CONCLUSION
REFERENCE


INTRODUCTION
The hypothetical company wishes to utilise data science principles and procedures to deal
with employee attrition, a human resource issue. The given dataset, “Employee Attrition”,
consists of 49654 rows and 18 columns, which include the personal details of the staff,
location, working status, reason for termination, department, and position. Data import,
cleaning, and analysis are conducted on the given dataset using the R programming
language. A conclusion is provided for each question of the analysis.


PACKAGES INSTALLATION
R packages are collections of R functions, compiled code, and sample data in a well-defined
format. In the R environment, they are kept in a "library" directory. Several packages were
installed for the data analysis of this employee attrition dataset.

Figure 1

dplyr: A grammar of data manipulation that provides a consistent collection of verbs for
solving the most frequent data manipulation problems, including mutate( ), select( ),
filter( ), summarise( ), and arrange( ).
plyr: Makes it simple to split data apart, manipulate the pieces, and combine them back
together, which is a frequent step in data manipulation. Importantly, plyr makes it simple to
manage the input and output data formats using a collection of functions with a consistent
syntax.
readr: Provides a fast way to read rectangular data from delimited files, such as
comma-separated values (CSV).
crayon: Colourises terminal output and combines styles, which can be used to style notes,
warnings, and errors.
ggplot2: A system for producing graphics by mapping variables in the data to aesthetics and
drawing them with graphical primitives.
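
Since the original screenshot of the installation code is not reproduced here, the following is
only a minimal sketch of how the five packages named above could be checked and loaded. The
variable names pkgs and installed are illustrative assumptions, and the install and library
calls are left commented out so that the sketch runs without network access.

```r
# Packages named in this section
pkgs <- c("plyr", "dplyr", "readr", "crayon", "ggplot2")

# Check which packages are already available in the library directory
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Install only the missing ones (uncomment to run; needs internet access)
# install.packages(pkgs[!installed])

# Load the packages; plyr is listed before dplyr so that dplyr's verbs
# are not masked by plyr functions of the same name
# invisible(lapply(pkgs, library, character.only = TRUE))
```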


DATA IMPORT

Figure 2

The first step of the data analysis is to import the given dataset, “employee.attrition”. This
dataset was given as a CSV file and was saved in the current working directory. Therefore,
the getwd( ) function was used to display the current working directory before reading the
CSV employee attrition file with headers. After reading, the dataset is fully imported into
the R environment. The View( ) function was then used to inspect the imported dataset in a
spreadsheet-style viewer.
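
The import steps described above can be sketched as follows. This is an illustration under
stated assumptions, not the original code from Figure 2: the file name
employee_attrition_sample.csv and its three columns are hypothetical stand-ins written to a
temporary folder so that the example runs on its own.

```r
# Display the current working directory
print(getwd())

# Hypothetical stand-in for the assignment's CSV file (not the real dataset)
csv_path <- file.path(tempdir(), "employee_attrition_sample.csv")
writeLines(c(
  "EmployeeID,city_name,STATUS",
  "1318,Vancouver,ACTIVE",
  "1319,Victoria,TERMINATED"
), csv_path)

# Read the CSV file, treating the first row as column headers
Employee_data <- read.csv(csv_path, header = TRUE)

# Open the spreadsheet-style viewer only in an interactive session
if (interactive()) View(Employee_data)
```

When the CSV file sits in the working directory, passing just its file name to read.csv( ) is
enough, which is why checking getwd( ) first is helpful.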


DATA PRE-PROCESSING
Data pre-processing is the transformation of raw data into a format that can be understood. It
is an important step in data mining, as raw data cannot be worked with reliably until it has
been pre-processed. One major task is conducted in data pre-processing here, namely data
cleaning.

● Data Cleaning

Figure 3

Renaming column names: All the columns were renamed to understandable column names
by assigning new names through names(Employee_data).

Fixing error inputs: In the column “termreason_desc”, the input “resignaton” was identified
as a misspelt value. Therefore, it was corrected to the appropriate word, “Resignation”.

Converting variables to appropriate data types: Four variables were identified as having
inappropriate data types, namely the recorddate_key, birthdate_key, orighiredate_key, and
terminationdate_key variables. The recorddate_key variable’s data type was converted to
POSIXct and the remaining variables’ data types were converted to Date.


Sorting rows according to Employee_ID: The rows of the given employee attrition dataset
were found to be unsorted. Therefore, they were sorted by Employee_ID as part of data
cleaning.

Removing duplicate column: In the given employee attrition dataset, the column named
“gender_short” was identified as a duplicate variable because “gender_short” and
“gender_full” represent the same values. Therefore, the “gender_short” column was
removed to prevent data redundancy.
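
The five cleaning steps above can be sketched in base R as follows. The small data frame, its
values, and the date formats are assumptions made for illustration only; the column names
follow those quoted in this section, and the original code in Figure 3 may differ.

```r
# Hypothetical three-row stand-in for the dataset (values are made up)
Employee_data <- data.frame(
  EmployeeID      = c(1320L, 1318L, 1319L),
  recorddate_key  = c("12/31/2014 0:00", "12/31/2014 0:00", "12/31/2015 0:00"),
  birthdate_key   = c("1/3/1954", "1/3/1957", "1/2/1955"),
  termreason_desc = c("Resignaton", "Not Applicable", "Retirement"),
  gender_short    = c("M", "F", "F"),
  gender_full     = c("Male", "Female", "Female"),
  stringsAsFactors = FALSE
)

# 1. Rename columns by assigning a full vector of names
names(Employee_data) <- c("Employee_ID", "recorddate_key", "birthdate_key",
                          "termreason_desc", "gender_short", "gender_full")

# 2. Fix the misspelt termination reason
wrong <- Employee_data$termreason_desc == "Resignaton"
Employee_data$termreason_desc[wrong] <- "Resignation"

# 3. Convert text columns to date-time and date types (formats are assumed)
Employee_data$recorddate_key <- as.POSIXct(Employee_data$recorddate_key,
                                           format = "%m/%d/%Y %H:%M", tz = "UTC")
Employee_data$birthdate_key <- as.Date(Employee_data$birthdate_key,
                                       format = "%m/%d/%Y")

# 4. Sort the rows by employee ID
Employee_data <- Employee_data[order(Employee_data$Employee_ID), ]

# 5. Drop the redundant gender column
Employee_data$gender_short <- NULL

str(Employee_data)
```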


DATA EXPLORATION
The data exploration was conducted after data pre-processing. This step builds an
understanding of the given dataset by examining its attributes. Five functions were used to
explore the employee attrition dataset: str(Employee_data), dim(Employee_data),
names(Employee_data), summary(Employee_data), and View(Employee_data).

Figure 4

str(Employee_data): To display the internal structure of employee attrition dataset

Figure 5

dim(Employee_data): To check the number of rows and columns in the employee attrition
dataset

Figure 6

names(Employee_data): To list the column names of the employee attrition dataset


Figure 7

summary(Employee_data): To produce a summary of every column in the dataset

Figure 8

View(Employee_data): To invoke a spreadsheet-style data viewer to view the dataset

Figure 9
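
The five exploration functions listed above can be tried on a small example as follows. The
data frame here is a hypothetical stand-in with made-up values, so the printed output will not
match Figures 4 to 9.

```r
# Hypothetical stand-in for the cleaned dataset
Employee_data <- data.frame(
  Employee_ID = c(1318L, 1319L, 1320L),
  age         = c(52L, 58L, 60L),
  city_name   = c("Vancouver", "Victoria", "Vancouver"),
  stringsAsFactors = FALSE
)

str(Employee_data)                 # internal structure: types and first values
dims <- dim(Employee_data)         # number of rows, then number of columns
print(dims)
col_names <- names(Employee_data)  # column names
print(col_names)
print(summary(Employee_data))      # per-column summary statistics

# The spreadsheet-style viewer needs an interactive session
if (interactive()) View(Employee_data)
```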


ANALYSIS

Question 1: Which gender has higher termination occurrences?

1.1 Number of terminated male and female employees

➔ This analysis is to find out which gender has higher termination occurrences.

Figure 10

Figure 11

The result shows that female employees have more termination occurrences than male
employees in the organisation. The difference between the female and male termination
occurrences is 345.

1.1.1 Number of male and female employees who have retired

➔ This analysis determines the number of male and female employees who have retired
from the organisation.

Figure 12

Figure 13

The result shows that more female employees than male employees have retired from the
organisation: there are 591 female retirees compared with 294 male retirees.


1.1.2 Number of male and female employees who volunteered for resignation

➔ This analysis is to find out the number of male and female employees who
volunteered for resignation.

Figure 14

Figure 15

The result shows that 208 terminated female employees volunteered for resignation,
compared with 174 terminated male employees in the organisation.


1.2 Number of terminations of male and female employees in each store

➔ This analysis is to find which store has the highest number of terminations of male
and female employees by using dodged bar graph.

Figure 16

Figure 17

This dodged bar graph shows that store 35 has the highest number of terminations, and that
the number of terminated female employees is higher than the number of terminated male
employees in store 35.


1.3 Number of terminations of involuntary male and female employees according to age
in the 2014 year.

➔ This analysis is to investigate which gender shows the biggest variation in involuntary
terminations according to age in the year 2014.

Figure 18

Figure 19

The density graph shows how the terminations of male and female employees in 2014 are
distributed across age. The density for terminated male employees has two peaks, at around
age 30 and age 55. However, the density for terminated female employees has only one
peak, at around age 32.


1.4 Number of terminations of male and female employees in each department


➔ This analysis is to find the difference between terminated male and female employees
in each department by using dodged bar graph.

Figure 20

Figure 21

This dodged bar graph shows that the number of terminated female employees is highest in
the Meats department. The Produce department has the second highest number of
terminations of male and female employees.


1.5 Number of terminations of male and female employees in each city according to age

➔ This analysis is to find which city has the highest terminations of male and female
employees according to their ages by using the boxplot graph.

Figure 22

Figure 23

Based on the boxplots for each city, Vancouver city has the highest number of terminations
of male and female employees according to age.

1.5.1 Number of terminations of male and female employees in Vancouver city


➔ This analysis is to find the number of terminated male and female employees in
Vancouver city.

Figure 24


Figure 25

This result shows that 197 terminated female employees and 104 terminated male employees
were found in Vancouver city.


Conclusion Q1

Regarding retirements and resignations, terminated female employees have more
retirement and resignation occurrences than terminated male employees. Overall, female
employees have higher termination occurrences than male employees.
Apart from that, store 35 has the highest number of terminations among all the stores,
and its terminated female employees outnumber its terminated male employees. Therefore,
it is possible that numerous female employees in store 35 encountered some issues.
There is also a big variation in the year 2014, with involuntary terminations of both
female and male employees increasing in the age range from 25 to 41. However, involuntary
terminations of male employees increase again in the age range from 41 to 55. This may
suggest that male employees experienced age discrimination and gender inequality, because
involuntarily terminated male employees outnumbered involuntarily terminated female
employees in 2014.
In the Meats department, most of the terminations are of female employees.
Therefore, it is possible that female employees are not interested in working in the Meats
department.
Vancouver city also has the highest number of terminations of male and female
employees according to age, with 197 terminated female employees and 104 terminated
male employees.


Question 2: How many laid-off employees are in the organisation?

2.1 Comparison of the number of laid-off employees between 2014 and 2015

➔ This analysis is to investigate which year had the most laid-off employees

Figure 26

Figure 27

The bar graph shows that the year 2014 had more laid-off employees than the year 2015.

2.1.1 Number of laid-off employees in each city in the 2014 year

➔ This analysis is to find which city had the highest number of laid-off employees in the
year 2014.


Figure 28

Figure 29

The bar graph shows that Fort Nelson city had the highest number of laid-off employees in
the year 2014, with 39 employees.

2.1.2 Proportion of laid-off employees in each job based on Fort Nelson city in 2014

➔ This analysis is to find which job had the highest proportion of laid-off employees in
Fort Nelson city in 2014.


Figure 30

Figure 31

The pie chart shows that the Cashier position had the highest number of laid-off employees
in Fort Nelson city in the year 2014, at 25.64%. The Bakery Manager, Customer Service
Manager, Meats Manager, Processed Foods Manager, and Produce Manager positions had
the fewest laid-off employees in Fort Nelson city in the year 2014, at 2.56% each.


2.1.4 Which store has the most laid-off Cashier employees based on the Fort Nelson city
in 2014?

➔ This analysis finds which store has the most laid-off Cashier employees based on Fort
Nelson city in the year 2014.

Figure 32

Figure 33

The bar graph shows that Store 11 was the only store with laid-off Cashier employees in Fort
Nelson city in the year 2014, with 11 laid-off Cashier employees.


2.2 Number of laid-off employees in each department

➔ This analysis is to find which department has the most laid-off employees.

Figure 34

Figure 35

The result shows that the Customer Service department has the most laid-off employees,
with 70 laid-off employees, while the Store Management department has the fewest, with
6 laid-off employees.

2.2.1 Number of laid-off employees in each city based on the Customer Service
department

➔ This analysis finds which city has the most laid-off employees based on the Customer
Service department.

Figure 36


Figure 37

From the bar graph above, the result shows that White Rock city has the most laid-off
employees in the Customer Service department, reaching the number of 18 laid-off
employees.

2.2.2 Number of the length of service laid-off employees based on the Customer Service
department in White Rock city.

➔ This analysis finds the length of service of laid-off employees in the Customer
Service department in White Rock city.

Figure 38


Figure 39

The results show that the laid-off employees in the Customer Service department in White
Rock city had served for at least 2 years in the organisation. At least 8 of these laid-off
employees had served for more than 10 years.


Conclusion Q2

The year 2014 had more laid-off employees than the year 2015. In 2014, Fort Nelson
city had the highest number of laid-off employees, with 39. The Cashier position had the
highest lay-off rate in Fort Nelson city, at 25.64%, and Store 11 had 11 laid-off Cashier
employees in Fort Nelson city in the year 2014. Therefore, it is possible that this city’s
sales were the worst among the cities, which led to the staff reduction.
The Customer Service department has the most laid-off employees among the
departments, with 70 laid-off employees. Within this department, White Rock city has the
highest number of laid-off employees, and at least 8 of them had served for more than 10
years.


Question 3: What are the proportions of the terminated employees?

3.1 Number of terminated employees in each year

➔ This analysis finds the variation in the number of terminated employees in each year.

Figure 40

Figure 41

The histogram shows no significant variation from 2006 to 2013. In 2014, however, the
number of terminated employees increased by 148, reaching 253 terminated employees.

3.1.1 Number of termination type descriptions from employees according to status year

➔ This analysis finds the number of termination type descriptions from employees
according to status year.


Figure 42

Figure 43

The bar chart shows that the majority of terminations every year were voluntary. However,
a considerable number of involuntarily terminated employees were found in the years 2014
and 2015.


3.2 Number of termination reason description in each department

➔ This analysis finds the number of termination reason description in each department

Figure 44

Figure 45

The bar graph shows that the Meats department has the highest number of retirements, at
317, while the highest number of resignations is from the Customer Service department.

3.2.1 Number of type of termination reason based on status year

➔ This analysis will look at the proportion of the type of termination reason every year.


Figure 46

Figure 47

The violin graph with boxplot above shows that lay-offs occurred the most in the year 2014,
resignations occurred the most in the year 2012, and retirements occurred the most in the
year 2007.

3.2.2 Relationship between the type of termination reason and city based on the 2015
year

➔ This analysis looks at the types of termination reason that occurred in every city in the
year 2015.

Figure 48


Figure 49

The visualisation shows that 7 cities have the highest retirement in the year 2015, which
include West Vancouver, Vernon, Terrace, Squamish, Kelowna, Fort St John, and
Aldergrove.


3.3 Number of terminated employees in each job title

➔ This analysis finds which job has the most terminated employees

Figure 50

Figure 51

The visualisation shows that the job with the highest number of terminated employees is the
Meat Cutter position, followed by Produce Clerk with the second highest number.

3.3.1 How many terminated employees in Meat Cutter based on 2015 year?

➔ This analysis is to find the number of terminated employees who worked in the Meat
Cutter position in the year 2015.


Figure 52

Figure 53

The result shows that there was a total of 31 terminated employees in the Meat Cutter
position in 2015.

3.3.2 How many terminated employees in Meat Cutter in each city based on the 2015
year?

➔ This analysis is to find the number of terminated employees in Meat Cutter in each
city based on the 2015 year.


Figure 54

Figure 55

The pie graph above shows that, in the year 2015, the highest share of terminated Meat
Cutter employees is from Victoria city, at 38.71%. The lowest shares are from 3 cities,
namely Cranbrook, Princeton, and White Rock, at 3.23% each.


Conclusion Q3

From the year 2013 to 2014, the number of terminated employees increased by 148,
reaching 253 terminated employees in the year 2014. It was also found that involuntary
terminations occurred in this organisation in the years 2014 and 2015.
The Meats department has the highest number of retirements, at 317, while the highest
number of resignations is from the Customer Service department. There are also 7 cities
which have the highest retirement in the year 2015, namely West Vancouver, Vernon,
Terrace, Squamish, Kelowna, Fort St John, and Aldergrove.
The Meat Cutter position has the highest number of terminated employees of any job,
with 31 terminated employees in the year 2015. Victoria city accounts for 38.71% of these
terminated Meat Cutter employees, which is the highest share among the cities in 2015.


Question 4: What is the rate of active and terminated employees?

4.1 Number of active and terminated employees in each department


➔ This analysis shows the number of active and terminated employees in each
department.

Figure 56

Figure 57

The jitter graph above shows that several departments have many active employees, namely
the Produce, Processed Foods, Meats, Dairy, Customer Service, and Bakery departments.
The two least active departments are the Management and Executive departments. Apart
from that, four departments have the most terminated employees, namely the Produce,
Meats, Dairy, and Customer Service departments.


4.1.1 How many active and terminated employees have worked less than 2 years in
Meats department?

➔ This analysis investigates the number of active and terminated employees who have
worked less than 2 years in Meats department.

Figure 58

Figure 59

The bar graph shows a total of 125 active employees who have worked less than 2 years in
the Meats department, and a total of 264 terminated employees who worked less than 2
years in the same department.


4.2 Relationship between active and terminated employees in each store

➔ This analysis is to find which store has the most active employees.

Figure 60

Figure 61

The visualisation shows that store 46 has the most active employees.

4.2.1 Number of active and terminated employees in each job based on store 46

➔ This analysis is to investigate the number of active and terminated employees in every
job based on store 46.


Figure 62

Figure 63

The bar graph shows that the Meat Cutter position has the highest number of active
employees in store 46, while the Store Manager, Produce Manager, Processed Foods
Manager, Meats Manager, Customer Service Manager, and Bakery Manager positions have
the fewest terminations in store 46.

4.2.2 Are the active and terminated employees increasing or decreasing in Meats
position based on age in store 46?

➔ This analysis is to find whether the numbers of active and terminated employees are
increasing or decreasing in the Meats position based on age in store 46.

Figure 64


Figure 65

The line graph shows that the number of active employees in store 46’s Meats position
increases from age 47 to age 62 but decreases after age 62, while the number of terminated
employees remains at 1 from age 54 to age 58.


Conclusion Q4

We have found that the most active departments in this organisation are the Produce,
Processed Foods, Meats, Dairy, Customer Service, and Bakery departments. In the Meats
department, there are 125 active employees and 264 terminated employees who worked less
than 2 years.
Store 46 has the most active employees, and the Meat Cutter position has the highest
number of active employees in store 46. The 6 manager positions (Store Manager, Produce
Manager, Processed Foods Manager, Meats Manager, Customer Service Manager, and
Bakery Manager) have the fewest terminations in store 46. I have also found that the number
of active employees in store 46’s Meats position increases from age 47 to age 62, while the
number of terminated employees remains at 1 from age 54 to age 58.


Question 5: How many workers have been hired by this organisation?

5.1 Total male employees hired in this organisation


➔ This analysis is to find how many male employees have been hired in this
organisation.

Figure 66

Figure 67

The result shows that 3006 male employees have been hired in this organisation.


5.2 Total female employees hired in this company

➔ This analysis is to find how many female employees have been hired in this
organisation.

Figure 68

Figure 69

The result shows that 3278 female employees have been hired in this organisation.


5.3 Which city has the most employment rate?

➔ This analysis is to find which city has the highest employment rate.

Figure 70

Figure 71

The pie graph above shows that Vancouver city has the highest employment rate among all
the cities, at 22.15%.

5.3.1 Which store has hired the most employee in Vancouver city?

➔ This analysis is to find which store has hired the most employees in Vancouver city.


Figure 72

Figure 73

The bar graphs above show that there are 6 stores in Vancouver city. The store with the
highest number of employees in Vancouver city is store 42, with 392 employees.

5.3.2 Which department has the most employment rate in Vancouver city store 42?

➔ This analysis is to find which department has the most employment rate in Vancouver
city store 42.

Figure 74


Figure 75

The visualisation shows that the Customer Service department has the highest employment
in the Vancouver city store 42, with 128 employees.


Conclusion Q5

A total of 6284 employees have been hired in this organisation, of whom 3006 are male
employees and 3278 are female employees. I have also found that Vancouver city has the
highest employment rate among all the cities, at 22.15%. Store 42 in Vancouver city has the
highest number of employees among the stores in that city, with 392 employees, of whom
128 were hired for the Customer Service department.


EXTRA FEATURES

I. getwd( )

By calling the getwd( ) function, we can display the current working directory. This simple
function does not take any arguments and returns the current working directory. Therefore,
it is useful for debugging file path problems (Finnstat, 2021).

Source Code Example: I used the getwd( ) function to check the working directory before
importing my CSV file, because my CSV file is in the same file location. Therefore, I can
import my CSV file directly by reading it by file name.

Output: The getwd( ) function shows the directory in which my R session is working.


II . str(Employee_data)

The str( ) function displays the internal structure of an R object such as a dataset. It can
also show the structure of large lists that are nested inside each other. It gives one line of
output for each basic component, informing the user about the object and its components
(GeeksforGeeks, 2020).

Source Code Example: I used str( ) function to view the structure of my Employee_data
dataset.

Output: The information of the Employee_data dataset’s columns’ name, data types, etc are
displayed neatly.


III. position_dodge( )

The position_dodge( ) function preserves a geom’s vertical position while adjusting its
horizontal position, so that bars in the same group sit side by side (ggplot2, n.d.).
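
As a minimal sketch of this behaviour, the example below draws a dodged bar graph of
terminations per store and gender. The data frame terminated and its values are made up for
illustration; the column names store_name and gender_full are assumptions.

```r
library(ggplot2)

# Hypothetical terminated-employee records (not the real dataset)
terminated <- data.frame(
  store_name  = factor(c(35, 35, 35, 46, 46)),
  gender_full = c("Female", "Female", "Male", "Female", "Male")
)

# position_dodge() places the Female and Male bars side by side per store
p <- ggplot(terminated, aes(x = store_name, fill = gender_full)) +
  geom_bar(position = position_dodge()) +
  labs(x = "Store", y = "Number of terminations", fill = "Gender")

# print(p)  # uncomment to draw the plot
```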

Output:


IV. theme_bw( )
theme_bw( ) is a function that applies the classic dark-on-light ggplot2 theme, with a white
background, a dark panel border, and light grey gridlines (theme_bw function -
RDocumentation, 2022).

Output:


V. theme( )

Themes are an effective technique to modify the non-data elements of our graphs. For
example, titles, labels, backgrounds, gridlines, and legends can be customised with the
fonts, colours, and sizes we want. Therefore, themes can be utilised to give plots a
consistent and customisable look (theme function - RDocumentation, 2022).

Source Code Example:
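
The original screenshot is not reproduced here, so the following is only an illustrative
sketch with made-up data. It applies theme_bw( ) first and then adjusts individual elements
with theme( ); the order matters, because a complete theme added afterwards would reset the
adjustments.

```r
library(ggplot2)

# Hypothetical employee records (not the real dataset)
employees <- data.frame(
  department_name = c("Meats", "Meats", "Produce", "Dairy", "Dairy", "Dairy")
)

p <- ggplot(employees, aes(x = department_name)) +
  geom_bar() +
  labs(title = "Employees per department", x = "Department", y = "Count") +
  theme_bw() +                                   # complete theme first
  theme(
    plot.title  = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "bottom"
  )

# print(p)  # uncomment to draw the plot
```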

Output:


VI. Density plot + scale_fill_manual( )

Density plot is used for representing the distribution of a numeric variable by using a kernel
density estimate to illustrate the variable’s probability density function (GeeksforGeeks,
2021).

scale_fill_manual( ) function is used for manually assigning colors for the graphs.

Source Code Example:
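
A minimal sketch under stated assumptions: the ages below are randomly generated rather than
taken from the dataset, and the colour names are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
# Hypothetical ages of terminated employees (randomly generated)
terminated <- data.frame(
  age = c(rnorm(50, mean = 30, sd = 4), rnorm(50, mean = 55, sd = 4)),
  gender_full = rep(c("Female", "Male"), each = 50)
)

# One kernel density estimate per gender, with manually assigned fill colours
p <- ggplot(terminated, aes(x = age, fill = gender_full)) +
  geom_density(alpha = 0.5) +
  scale_fill_manual(values = c(Female = "tomato", Male = "steelblue")) +
  labs(x = "Age", y = "Density", fill = "Gender")

# print(p)  # uncomment to draw the plot
```

Naming the elements of values ties each colour to a specific level, so the colours do not
change if the level order changes.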

Output:


VII. facet_wrap( ) + coord_flip( )

The facet_wrap( ) function takes a long one-dimensional ribbon of panels and wraps it into
two dimensions. This is helpful if there is a single variable with many levels that you want
to show on your graph, because it takes up less space to show the data (Hadley Wickham,
a., n.d.).

The coord_flip( ) function flips the x and y axes. It is useful for converting geoms and
statistics that show y conditional on x into x conditional on y (coord_flip function -
RDocumentation, 2022).

Source Code Example:
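
An illustrative sketch with made-up data, assuming columns named city_name and job_title:
one panel is drawn per city, and the bars are flipped to run horizontally so that long job
titles stay readable.

```r
library(ggplot2)

# Hypothetical employee records (not the real dataset)
employees <- data.frame(
  city_name = c("Vancouver", "Vancouver", "Victoria",
                "Victoria", "Kelowna", "Kelowna"),
  job_title = c("Cashier", "Meat Cutter", "Cashier",
                "Cashier", "Baker", "Cashier")
)

p <- ggplot(employees, aes(x = job_title)) +
  geom_bar() +
  coord_flip() +                       # job titles on the vertical axis
  facet_wrap(~ city_name, ncol = 2) +  # one panel per city, wrapped into 2 columns
  labs(x = "Job title", y = "Number of employees")

# print(p)  # uncomment to draw the plot
```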

Output:

VIII. table( )

The table( ) function builds a frequency table of categorical data, showing each category of
a variable together with the number of times it occurs.
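The sketch below shows a one-way and a two-way frequency table. The vectors gender and status are hypothetical values used only for illustration.

```r
# Hypothetical toy vectors; the values are assumptions.
gender <- c("Female", "Male", "Female", "Female", "Male")
status <- c("ACTIVE", "TERMINATED", "TERMINATED", "ACTIVE", "ACTIVE")

# One-way table: how often each gender occurs.
gender_counts <- table(gender)
print(gender_counts)

# Two-way table: counts for every combination of gender and status.
cross <- table(gender, status)
print(cross)
```

Passing two vectors produces a cross-tabulation, which is a quick way to check counts before plotting them.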

Source Code Example:

Output:

IX. stat_count( )

The stat_count( ) function counts the number of cases at each x position. The resulting counts
can then be displayed on the graph, for example as text labels above the bars.
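One common way to show the counts as labels is sketched below. The data frame emp and its gender column are hypothetical, and the label placement is only one possible choice.

```r
library(ggplot2)

# Hypothetical toy data; the column name is an assumption.
emp <- data.frame(gender = c("Female", "Male", "Female", "Female", "Male"))

# geom_bar() draws the bars; stat_count() with geom = "text" computes the
# same counts again and prints them just above each bar.
p_count <- ggplot(emp, aes(x = gender)) +
  geom_bar() +
  stat_count(geom = "text", aes(label = after_stat(count)), vjust = -0.5)
print(p_count)
```

after_stat(count) refers to the count variable computed by the statistic, so no separate summary table is needed.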

Source Code Example:

Output:

X. coord_polar( ) + position_stack( )

The coord_polar( ) function is utilised for generating pie charts. A pie chart is essentially
a stacked bar chart drawn in polar coordinates (ggplot2 pie chart : Quick start guide - R software and data
visualization - Easy Guides - Wiki - STHDA, 2022).

The position_stack( ) function stacks bars on top of each other. It can also be used to place
text labels at the matching positions inside the stacked segments.
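A minimal pie chart sketch is given below. The data frame reasons, its columns, and the counts are hypothetical and do not come from the actual results.

```r
library(ggplot2)

# Hypothetical summary data; the names and counts are assumptions.
reasons <- data.frame(
  reason = c("Layoff", "Resignation", "Retirement"),
  n      = c(2, 3, 5)
)

# A single stacked bar (x = "") is bent into a circle by coord_polar().
# position_stack(vjust = 0.5) puts each label in the middle of its slice.
p_pie <- ggplot(reasons, aes(x = "", y = n, fill = reason)) +
  geom_col(width = 1) +
  geom_text(aes(label = n), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y")
print(p_pie)
```

Setting theta = "y" maps the stacked heights to angles, which is what turns the bar segments into slices.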

Source Code Example:

Output:

XI. Violin plot

A violin plot is utilised for displaying numerical data. In particular, it displays the
shape of the distribution and can be combined with summary statistics of the numerical data.
A violin plot is useful for comparing distributions across different groups or variables
(Marsja, E., 2021).
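The sketch below draws a violin plot with a narrow box plot inside it to add the summary statistics. The data frame emp and its columns are hypothetical, and overlaying a box plot is one possible design choice rather than the method used in the analysis.

```r
library(ggplot2)

# Hypothetical toy data; the column names are assumptions.
emp <- data.frame(
  age    = c(22, 25, 31, 38, 45, 52, 58, 61, 63, 65),
  gender = rep(c("Female", "Male"), each = 5)
)

# The violin shows the shape of the age distribution for each gender;
# the thin box plot adds the median and quartiles.
p_violin <- ggplot(emp, aes(x = gender, y = age)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.1)
print(p_violin)
```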

Source Code Example:

Output:

XII. position = “fill”

position = “fill” is an argument rather than a function. It stacks the bars and standardises
each stack to a constant height, so that the bars show proportions instead of counts.
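The effect of this argument can be sketched as follows. The data frame emp and its gender and status columns are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data; the column names are assumptions.
emp <- data.frame(
  gender = c("Female", "Male", "Female", "Male", "Female", "Male"),
  status = c("ACTIVE", "ACTIVE", "TERMINATED", "TERMINATED", "ACTIVE", "TERMINATED")
)

# Every bar is stretched to height 1, so the coloured segments show the
# share of each gender within a status rather than the raw counts.
p_fill <- ggplot(emp, aes(x = status, fill = gender)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
print(p_fill)
```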

Source Code Example:

Output:

XIII. Jitter plot

A jitter plot is utilised for viewing points that would otherwise overlap because the data is
discrete. It adds a small amount of random noise to each point's position, which reduces
overplotting in larger datasets (How to Create a ggplot Jitter Plot in R,
2022).
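A minimal jitter plot sketch is shown below. The data frame emp, its columns, and the jitter width are hypothetical choices for illustration.

```r
library(ggplot2)

# Hypothetical toy data; the column names are assumptions.
emp <- data.frame(
  age    = c(22, 25, 31, 38, 45, 52, 58, 61, 63, 65),
  gender = rep(c("Female", "Male"), each = 5)
)

set.seed(1)  # makes the random horizontal offsets reproducible

# Points are shifted randomly by up to 0.2 units sideways; height = 0
# keeps the age values themselves unchanged.
p_jitter <- ggplot(emp, aes(x = gender, y = age)) +
  geom_jitter(width = 0.2, height = 0)
print(p_jitter)
```

Jittering only along the discrete axis keeps the numeric values accurate while still separating the points.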

Source Code Example:

Output:

CONCLUSION
This report addressed a total of 6 questions through 33 analyses. Each question was
answered through data analysis using the R programming language. In addition, a total of
15 extra features were utilised in the analyses, and every extra feature has been explained
with a brief description and an example.

REFERENCE
R - Packages. Tutorialspoint.com. (2022).

https://www.tutorialspoint.com/r/r_packages.htm

A Grammar of Data Manipulation. Dplyr.tidyverse.org. (n.d.).

https://dplyr.tidyverse.org/#:~:text=dplyr%20is%20a%20grammar%20of,cases%20based%20on%20their%20values

Anderson, S. (2012). Seananderson.ca.

https://seananderson.ca/courses/12-plyr/plyr_2012.pdf

README. Cran.r-project.org. (n.d.).

https://cran.r-project.org/web/packages/readr/readme/README.html

R-project.org. (n.d.).

https://www.r-project.org/nosvn/pandoc/crayon.html.

Create Elegant Data Visualisations Using the Grammar of Graphics. Ggplot2.tidyverse.org. (n.d.).

https://ggplot2.tidyverse.org/

Finnstat (2021, Dec 29) Get and Set working directory (setwd / getwd) in R | R-bloggers. R-bloggers.

https://www.r-bloggers.com/2021/12/get-and-set-working-directory-setwd-getwd-in-r/#:~:text=The%20getwd()%20function%20in,display%20the%20current%20working%20directory.&text=This%20simple%20function%2C%20which%20takes,in%20very%20handily%20for%20debugging

GeeksforGeeks (2020, Jun 05) Display the internal Structure of an Object in R Programming - str() Function

https://www.geeksforgeeks.org/display-the-internal-structure-of-an-object-in-r-programming-str-function/#:~:text=str()%20function%20in%20R,the%20object%20and%20its%20constituents

Dodge overlapping objects side-to-side — position_dodge. Ggplot2.tidyverse.org. (n.d.).

https://ggplot2.tidyverse.org/reference/position_dodge.html#:~:text=position_dodge.Rd,while%20adjusting%20the%20horizontal%20position

theme_bw function - RDocumentation. Rdocumentation.org. (n.d.).

https://www.rdocumentation.org/packages/ggplot2/versions/0.9.0/topics/theme_bw

theme function - RDocumentation. Rdocumentation.org. (n.d.).

https://www.rdocumentation.org/packages/ggplot2/versions/3.3.6/topics/theme

GeeksforGeeks (2021, Feb 25) Histograms and Density Plots in R.

https://www.geeksforgeeks.org/histograms-and-density-plots-in-r/#:~:text=A%20density%20plot%20is%20a,to%20compute%20kernel%20density%20estimates

Hadley Wickham, a. (n.d.). 17 Faceting | ggplot2. Ggplot2-book.org.

https://ggplot2-book.org/facet.html#:~:text=facet_wrap()%20makes%20a%20long,a%20more%20space%20efficient%20manner

coord_flip function - RDocumentation. Rdocumentation.org. (n.d.).

https://www.rdocumentation.org/packages/ggplot2/versions/1.0.0/topics/coord_flip

ggplot2 pie chart : Quick start guide - R software and data visualization - Easy Guides - Wiki - STHDA. Sthda.com. (n.d.).

http://www.sthda.com/english/wiki/ggplot2-pie-chart-quick-start-guide-r-software-and-data-visualization#:~:text=The%20function%20coord_polar()%20is,bar%20chart%20in%20polar%20coordinates

Marsja, E. (2021, Jun 30). How to Create a Violin plot in R with ggplot2 and Customize it. Erik Marsja.

https://www.marsja.se/how-to-create-a-violin-plot-in-r-with-ggplot2-and-customize-it/#:~:text=A%20violin%20plot%20is%20showing,or%20variables%20in%20our%20datasets

How to Create a ggplot Jitter Plot in R. Koalatea.io. (n.d.)

https://koalatea.io/r-gglot-jitter-plot/
