Business Analytics-1: STR (Crew - Data)

Business Analytics- 1
1) List the categorical and numeric variables of the data set

ANS.
A. Categorical variables:
1. Hire.date
2. Lastname
3. Firstname
4. Location
5. Phone (stored as int and shown as "Phne" in the output, but it is an identifier rather than a quantity)
6. EmpId
7. Job.code
B. Numeric variable:
1. Salary
Output:
>str(Crew.data)
'data.frame': 69 obs. of 8 variables:
$ Hire.date: Factor w/ 69 levels "1-Jul-87","1-Mar-90",..: 35 50 3 16 27 36 62 60 24 17 ...
$ Lastname : Factor w/ 69 levels "BEAUMONT","BERGAMASCO",..: 21 35 69 19 41 18 42 64 67 9 ...
$ Firstname: Factor w/ 69 levels "ANITA M.","ANNETTE M.",..: 30 29 24 58 54 26 68 39 59 37 ...
$ Location : Factor w/ 3 levels "CARY","FRANKFURT",..: 1 2 3 1 3 2 3 2 2 3 ...
$ Phne : int 1168 2164 1565 1157 2360 1595 2366 1197 1553 1369 ...
$ EmpId : Factor w/ 69 levels "E00034","E00084",..: 53 36 49 46 31 4 25 29 41 18 ...
$ Job.code : Factor w/ 6 levels "FLTAT1","FLTAT2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Salary : int 21000 22000 22000 23000 24000 25000 25000 26000 27000 28000 ...
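The split into categorical and numeric variables can also be checked from the storage type of each column. The sketch below is only an illustration: crew_toy is a hypothetical three-row stand-in built from rows 1, 2 and 4 of the str() output above, not the real Crew.data file.

```r
# Hypothetical toy stand-in for Crew.data (values taken from the str() output above)
crew_toy <- data.frame(
  Location = factor(c("CARY", "FRANKFURT", "CARY")),
  Phne     = c(1168L, 2164L, 1157L),
  Job.code = factor(c("FLTAT1", "FLTAT1", "FLTAT1")),
  Salary   = c(21000L, 22000L, 23000L)
)

# TRUE for every column that R stores as a number (int or num)
is_num <- sapply(crew_toy, is.numeric)

numeric_cols     <- names(crew_toy)[is_num]
categorical_cols <- names(crew_toy)[!is_num]

print(numeric_cols)
print(categorical_cols)
```

Note that this check flags Phne as numeric because it is stored as int. Treating the phone extension as categorical is a judgment about its meaning, which the storage type alone cannot tell.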
2) Describe the numeric variable using descriptive techniques
Ans:
a) Summary
Output:
>summary(Crew.data$Salary)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  21000   33000   42000   52145   73000  112000
b) Mean
Output:
>mean(Crew.data$Salary)
[1] 52144.93
c) Median
Output:
>median(Crew.data$Salary)
[1] 42000
d) Standard Deviation
Output:
>sd(Crew.data$Salary)
[1] 25521.78
e) Variance
Output:
>var(Crew.data$Salary)
[1] 651361040
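To show how these measures relate to each other, the sketch below recomputes them by hand. It assumes only the first ten Salary values visible in the str() output (salary10 is an illustrative name, and the results are for those ten values, not for all 69 employees).

```r
# First ten Salary values from the str() output (not the full column)
salary10 <- c(21000, 22000, 22000, 23000, 24000,
              25000, 25000, 26000, 27000, 28000)

n <- length(salary10)
m <- sum(salary10) / n                  # mean
v <- sum((salary10 - m)^2) / (n - 1)    # sample variance (divides by n - 1)
s <- sqrt(v)                            # standard deviation = square root of variance

# The hand-computed values agree with the built-in functions
print(c(mean = m, variance = v, sd = s))
print(all.equal(v, var(salary10)))
print(all.equal(s, sd(salary10)))
```

The same relation holds for the full column: squaring the reported standard deviation 25521.78 gives the reported variance 651361040 up to rounding.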
3) How many groups does the variable “Job code” contain?
Ans: There are 6 categories.
O/p 1: Using the dplyr count function:
>Crew.data%>%count(Job.code)
# A tibble: 6 x 2
  Job.code     n
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8
O/p 2: Using the group_by function:
>Crew.data%>%group_by(Job.code)%>%summarise(count=n())
# A tibble: 6 x 2
  Job.code count
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8
4) Enumerate all functions explained in the video for “Job code”
Ans:
a) Count function:
Output:
>Crew.data%>%count(Job.code)
# A tibble: 6 x 2
  Job.code     n
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8
b) Group by function:
Output:
>Crew.data%>%group_by(Job.code)%>%summarise(count=n())
# A tibble: 6 x 2
  Job.code count
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8
c) Table function:
Output:
>table(Crew.data$Job.code)

FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
    14     18     12      8      9      8
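All three functions return the same six counts. As a small base R sketch, the Job.code column can be rebuilt from those counts (job_code is an illustrative vector, not the original column; the order of employees is lost) to check the number of groups and to turn the counts into shares.

```r
# Rebuild a Job.code-like factor from the counts reported above
job_code <- factor(rep(
  c("FLTAT1", "FLTAT2", "FLTAT3", "PILOT1", "PILOT2", "PILOT3"),
  times = c(14, 18, 12, 8, 9, 8)
))

n_groups  <- nlevels(job_code)                          # number of categories
counts    <- table(job_code)                            # frequency per category
counts_df <- as.data.frame(counts, responseName = "n")  # same counts as a data frame
share_pct <- round(100 * prop.table(counts), 1)         # percentage per category

print(n_groups)
print(counts_df)
print(share_pct)
```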
5) Enumerate all functions explained in the video for “salary”
Ans:
a) Mean Salary:
Output:
>mean(Crew.data$Salary)
[1] 52144.93
b) Standard Deviation in Salary:
Output:
>sd(Crew.data$Salary)
[1] 25521.78
c) Variance
Output:
>var(Crew.data$Salary)
[1] 651361040
d) Summary
Output:
>summary(Crew.data$Salary)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  21000   33000   42000   52145   73000  112000
e) Median Salary:
Output:
>median(Crew.data$Salary)
[1] 42000
f) Job code category-wise mean Salary
Output:
>Crew.data%>%group_by(Job.code)%>%summarise(mean(Salary))
# A tibble: 6 x 2
  Job.code `mean(Salary)`
  <fct>             <dbl>
1 FLTAT1           25643.
2 FLTAT2           35111.
3 FLTAT3           44250
4 PILOT1           69500
5 PILOT2           80111.
6 PILOT3           99875
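The group means and the overall mean are linked: weighting each group mean by its group size gives back the overall mean. The sketch below checks this using only the printed (rounded) group means and the counts per Job.code reported earlier, so the result can differ from 52144.93 in the last digits.

```r
# Group sizes and rounded group means as printed in the outputs above
n_by_job    <- c(FLTAT1 = 14, FLTAT2 = 18, FLTAT3 = 12,
                 PILOT1 = 8,  PILOT2 = 9,  PILOT3 = 8)
mean_by_job <- c(FLTAT1 = 25643, FLTAT2 = 35111, FLTAT3 = 44250,
                 PILOT1 = 69500, PILOT2 = 80111, PILOT3 = 99875)

# Weighted average of the group means, weights = group sizes
overall_mean <- weighted.mean(mean_by_job, w = n_by_job)

print(overall_mean)
```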
Question 2:
1) Enumerate all functions explained in the video for all categorical and numerical variables of the data set.
Ans:
Although str() reports all 11 variables of mtcars as numeric, 5 of them take only a few distinct values and are treated here as categorical.

Categorical variables: cyl, vs, am, gear and carb (5)
Numeric variables: mpg, disp, hp, drat, wt and qsec (6)
Output:
>str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
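Since str() cannot distinguish the two kinds of variables here, one option is to convert the five categorical columns to factors on a copy of the data. This is a minimal sketch using the built-in mtcars data set; mtcars_f and cat_cols are illustrative names.

```r
# Columns treated as categorical in this answer
cat_cols <- c("cyl", "vs", "am", "gear", "carb")

# Work on a copy so the built-in data set stays unchanged
mtcars_f <- mtcars
mtcars_f[cat_cols] <- lapply(mtcars_f[cat_cols], factor)

# After conversion the storage type separates the two kinds of variables
num_cols <- names(mtcars_f)[sapply(mtcars_f, is.numeric)]

str(mtcars_f)
print(num_cols)
```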
Numeric Variables:
1) mpg:
Mean:
>mean(mtcars$mpg)
[1] 20.09062
Median:
>median(mtcars$mpg)
[1] 19.2
Standard deviation:
>sd(mtcars$mpg)
[1] 6.026948
Variance:
>var(mtcars$mpg)
[1] 36.3241
Summary:
>summary(mtcars$mpg)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.40   15.43   19.20   20.09   22.80   33.90
2) disp
Mean:
>mean(mtcars$disp)
[1] 230.7219
Median:
>median(mtcars$disp)
[1] 196.3
Standard deviation:
>sd(mtcars$disp)
[1] 123.9387
Variance:
>var(mtcars$disp)
[1] 15360.8
Summary:
>summary(mtcars$disp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   71.1   120.8   196.3   230.7   326.0   472.0
3) hp:
Mean:
>mean(mtcars$hp)
[1] 146.6875
Median:
>median(mtcars$hp)
[1] 123
Standard deviation:
>sd(mtcars$hp)
[1] 68.56287
Variance:
>var(mtcars$hp)
[1] 4700.867
Summary:
>summary(mtcars$hp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   52.0    96.5   123.0   146.7   180.0   335.0
4) drat
Mean:
>mean(mtcars$drat)
[1] 3.596563
Median:
>median(mtcars$drat)
[1] 3.695
Standard deviation:
>sd(mtcars$drat)
[1] 0.5346787
Variance:
>var(mtcars$drat)
[1] 0.285881
Summary:
>summary(mtcars$drat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.760   3.080   3.695   3.597   3.920   4.930
5) wt
Mean:
>mean(mtcars$wt)
[1] 3.21725
Median:
>median(mtcars$wt)
[1] 3.325
Standard deviation:
>sd(mtcars$wt)
[1] 0.9784574
Variance:
>var(mtcars$wt)
[1] 0.957379
Summary:
>summary(mtcars$wt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.513   2.581   3.325   3.217   3.610   5.424
6) qsec
Mean:
>mean(mtcars$qsec)
[1] 17.84875
Median:
>median(mtcars$qsec)
[1] 17.71
Standard deviation:
>sd(mtcars$qsec)
[1] 1.786943
Variance:
>var(mtcars$qsec)
[1] 3.193166
Summary:
>summary(mtcars$qsec)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  14.50   16.89   17.71   17.85   18.90   22.90
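The same four measures can be computed for all six numeric variables in one call instead of one variable at a time. A minimal base R sketch follows; describe is a hypothetical helper name, not a function from the video.

```r
num_cols <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# Hypothetical helper: the four measures used above for a single variable
describe <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), var = var(x))
}

# One column of results per variable (4 rows x 6 columns)
stats <- sapply(mtcars[num_cols], describe)

# Transpose so each variable is a row, rounded for reading
print(round(t(stats), 3))
```

Each row of the printed table should agree with the individual outputs listed above.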
Categorical Variables:
1) cyl:
a) using count function
>mtcars%>%count(cyl)
# A tibble: 3 x 2
    cyl     n
  <dbl> <int>
1     4    11
2     6     7
3     8    14
b) using group_by function
>mtcars%>%group_by(cyl)%>%summarise(count=n())
# A tibble: 3 x 2
    cyl count
  <dbl> <int>
1     4    11
2     6     7
3     8    14
2) vs:
a) using count function
>mtcars%>%count(vs)
# A tibble: 2 x 2
     vs     n
  <dbl> <int>
1     0    18
2     1    14
b) using group_by function
>mtcars%>%group_by(vs)%>%summarise(count=n())
# A tibble: 2 x 2
     vs count
  <dbl> <int>
1     0    18
2     1    14
3) am:
a) using count function
>mtcars%>%count(am)
# A tibble: 2 x 2
     am     n
  <dbl> <int>
1     0    19
2     1    13
b) using group_by function
>mtcars%>%group_by(am)%>%summarise(count=n())
# A tibble: 2 x 2
     am count
  <dbl> <int>
1     0    19
2     1    13
4) gear:
a) using count function
>mtcars%>%count(gear)
# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3    15
2     4    12
3     5     5
b) using group_by function
>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
   gear count
  <dbl> <int>
1     3    15
2     4    12
3     5     5
5) carb:
a) using count function
>mtcars%>%count(carb)
# A tibble: 6 x 2
   carb     n
  <dbl> <int>
1     1     7
2     2    10
3     3     3
4     4    10
5     6     1
6     8     1
b) using group_by function
>mtcars%>%group_by(carb)%>%summarise(count=n())
# A tibble: 6 x 2
   carb count
  <dbl> <int>
1     1     7
2     2    10
3     3     3
4     4    10
5     6     1
6     8     1
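The counts for all five categorical variables can also be produced in one step. The sketch below uses base R table() rather than dplyr, so it runs without extra packages; counts and n_groups are illustrative names.

```r
cat_cols <- c("cyl", "vs", "am", "gear", "carb")

# One frequency table per categorical variable
counts <- lapply(mtcars[cat_cols], table)

# Number of groups in each variable
n_groups <- sapply(counts, length)

print(counts)
print(n_groups)
```

Every table sums to 32, the number of cars in mtcars, which is a quick check that no observation was dropped.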
2. Prepare a data frame for at least two categorical variables and find the mean salary of those groups.
Ans:
Numeric Variables:
I. Finding the mean mpg of the cars with different gears.
a) using count function:
>mtcars%>%count(gear)
# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3    15
2     4    12
3     5     5
b) using group_by function:
>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
   gear count
  <dbl> <int>
1     3    15
2     4    12
3     5     5
Mean mpg by number of gears:
>mtcars%>%group_by(gear)%>%summarise(mean(mpg))
# A tibble: 3 x 2
   gear `mean(mpg)`
  <dbl>       <dbl>
1     3        16.1
2     4        24.5
3     5        21.4
II. Finding the average horsepower of the cars with different gears.
a) using count function:
>mtcars%>%count(gear)
# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3    15
2     4    12
3     5     5
b) using group_by function:
>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
   gear count
  <dbl> <int>
1     3    15
2     4    12
3     5     5
Mean hp by number of gears:
>mtcars%>%group_by(gear)%>%summarise(mean(hp))
# A tibble: 3 x 2
   gear `mean(hp)`
  <dbl>      <dbl>
1     3      176.
2     4       89.5
3     5      196.
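The question asks for groups formed from at least two categorical variables, while the outputs above group by gear only. As a possible extension, the base R sketch below first repeats the per-gear means with tapply() and then groups by gear and am together with aggregate(). The choice of am as the second variable is an assumption made for illustration.

```r
# Per-gear means, base R equivalent of group_by(gear) + summarise(mean(...))
mpg_by_gear <- tapply(mtcars$mpg, mtcars$gear, mean)
hp_by_gear  <- tapply(mtcars$hp,  mtcars$gear, mean)

# Two grouping variables: one row per (gear, am) combination that occurs in the data
mpg_by_gear_am <- aggregate(mpg ~ gear + am, data = mtcars, FUN = mean)

print(round(mpg_by_gear, 1))
print(round(hp_by_gear, 1))
print(mpg_by_gear_am)
```

Only combinations that actually occur appear in the aggregate() result, so it has fewer rows than the 3 x 2 possible pairs.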
Categorical Variables:
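1) For cyl:
Steps:
table(mtcars$cyl)                # frequency of each cyl value
mtcarst=table(mtcars$cyl)        # store the table
class(mtcarst)                   # check the class: "table"
mtcarsf=as.data.frame(mtcarst)   # convert the table to a data frame
mtcarsf
Output:
>mtcarsf
  Var1 Freq
1    4   11
2    6    7
3    8   14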
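2) For am:
Steps:
table(mtcars$am)                  # frequency of each am value
mtcarst1=table(mtcars$am)         # store the table
class(mtcarst1)                   # check the class: "table"
mtcarsf1=as.data.frame(mtcarst1)  # convert the table to a data frame
mtcarsf1
Output:
>mtcarsf1
  Var1 Freq
1    0   19
2    1   13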
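The default column names Var1 and Freq do not say which variable was counted. A small variation on the steps above, sketched here with illustrative object names, names the variable inside table() and the count column through responseName, and also builds one data frame from two categorical variables at once.

```r
# Name the variable in table() so the data frame column is "cyl" instead of "Var1"
cyl_df <- as.data.frame(table(cyl = mtcars$cyl), responseName = "n")

# Two categorical variables in one data frame: one row per (cyl, am) pair
cyl_am_df <- as.data.frame(table(cyl = mtcars$cyl, am = mtcars$am))

print(cyl_df)
print(cyl_am_df)
```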
