Business Analytics-1: STR (Crew - Data)
a) Summary
Output:
>summary(Crew.data$Salary)
Min. 1st Qu. Median Mean 3rd Qu. Max.
21000 33000 42000 52145 73000 112000
b) Mean
Output:
>mean(Crew.data$Salary)
[1] 52144.93
c) Median
Output:
>median(Crew.data$Salary)
[1] 42000
d) Standard Deviation
Output:
>sd(Crew.data$Salary)
[1] 25521.78
e) Variance
Output:
>var(Crew.data$Salary)
[1] 651361040
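The standard deviation and variance above are linked: the variance is the square of the standard deviation. The sketch below illustrates this in base R with a small made-up salary vector (the values are hypothetical, since the Crew data itself is not reproduced here).

```r
# Hypothetical salaries for illustration only; not taken from Crew.data
salary <- c(21000, 33000, 42000, 73000, 112000)

s <- sd(salary)    # sample standard deviation (n - 1 denominator)
v <- var(salary)   # sample variance (n - 1 denominator)

# The two agree up to floating-point error
all.equal(s^2, v)
```

Applied to the reported figures, 25521.78 squared is about 651361254, which agrees with the reported variance of 651361040 to within the rounding of the printed standard deviation.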
Ans:
a) Count function
Output:
>Crew.data%>%count(Job.code)
# A tibble: 6 x 2
  Job.code     n
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8
b) Group by function:
Output:
>Crew.data%>%group_by(Job.code)%>%summarise(count=n())
# A tibble: 6 x 2
  Job.code count
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8
c) Table function:
Output
>table(Crew.data$Job.code)
FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
    14     18     12      8      9      8
a) Mean Salary:
Output:
>mean(Crew.data$Salary)
[1] 52144.93
b) Standard Deviation in Salary:
Output:
>sd(Crew.data$Salary)
[1] 25521.78
c) Variance
Output:
>var(Crew.data$Salary)
[1] 651361040
d) Summary
Output:
>summary(Crew.data$Salary)
Min. 1st Qu. Median Mean 3rd Qu. Max.
21000 33000 42000 52145 73000 112000
e) Median Salary:
Output:
>median(Crew.data$Salary)
[1] 42000
Output:
>Crew.data%>%group_by(Job.code)%>%summarise(mean(Salary))
# A tibble: 6 x 2
  Job.code `mean(Salary)`
  <fct>             <dbl>
1 FLTAT1           25643.
2 FLTAT2           35111.
3 FLTAT3           44250
4 PILOT1           69500
5 PILOT2           80111.
6 PILOT3           99875
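The group means and group counts can be cross-checked against the overall mean, since the overall mean is the count-weighted average of the group means. A minimal sketch, using only the figures printed in this document (the vector names are chosen here for illustration):

```r
# Group sizes and group means as printed in the outputs of this document
n_by_job    <- c(FLTAT1 = 14, FLTAT2 = 18, FLTAT3 = 12,
                 PILOT1 = 8,  PILOT2 = 9,  PILOT3 = 8)
mean_by_job <- c(FLTAT1 = 25643, FLTAT2 = 35111, FLTAT3 = 44250,
                 PILOT1 = 69500, PILOT2 = 80111, PILOT3 = 99875)

# Overall mean = sum(n_i * mean_i) / sum(n_i)
overall <- weighted.mean(mean_by_job, w = n_by_job)
overall

# Total number of crew records implied by the counts
sum(n_by_job)
```

The result lands within one unit of the reported overall mean of 52144.93; the small gap comes from the group means being printed rounded to whole numbers.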
Question 2:
1) Enumerate all functions explained in the video for all categorical and
numerical variables of the data set.
Ans:
Output:
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Numeric Variables:
1) mpg:
Mean:
>mean(mtcars$mpg)
[1] 20.09062
Median:
>median(mtcars$mpg)
[1] 19.2
Standard deviation:
>sd(mtcars$mpg)
[1] 6.026948
Variance:
>var(mtcars$mpg)
[1] 36.3241
Summary:
>summary(mtcars$mpg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.43 19.20 20.09 22.80 33.90
2) disp
Mean:
>mean(mtcars$disp)
[1] 230.7219
Median:
>median(mtcars$disp)
[1] 196.3
Standard deviation:
>sd(mtcars$disp)
[1] 123.9387
Variance:
>var(mtcars$disp)
[1] 15360.8
Summary:
>summary(mtcars$disp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
71.1 120.8 196.3 230.7 326.0 472.0
3) hp:
Mean:
>mean(mtcars$hp)
[1] 146.6875
Median:
>median(mtcars$hp)
[1] 123
Standard deviation:
>sd(mtcars$hp)
[1] 68.56287
Variance:
>var(mtcars$hp)
[1] 4700.867
Summary:
>summary(mtcars$hp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
52.0 96.5 123.0 146.7 180.0 335.0
4) drat
Mean:
>mean(mtcars$drat)
[1] 3.596563
Median:
>median(mtcars$drat)
[1] 3.695
Standard deviation:
>sd(mtcars$drat)
[1] 0.5346787
Variance:
>var(mtcars$drat)
[1] 0.285881
Summary:
>summary(mtcars$drat)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.760 3.080 3.695 3.597 3.920 4.930
5) wt
Mean:
>mean(mtcars$wt)
[1] 3.21725
Median:
>median(mtcars$wt)
[1] 3.325
Standard deviation:
>sd(mtcars$wt)
[1] 0.9784574
Variance:
>var(mtcars$wt)
[1] 0.957379
Summary:
>summary(mtcars$wt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.513 2.581 3.325 3.217 3.610 5.424
6) qsec
Mean:
>mean(mtcars$qsec)
[1] 17.84875
Median:
>median(mtcars$qsec)
[1] 17.71
Standard deviation:
>sd(mtcars$qsec)
[1] 1.786943
Variance:
>var(mtcars$qsec)
[1] 3.193166
Summary:
>summary(mtcars$qsec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
14.50 16.89 17.71 17.85 18.90 22.90
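The same four statistics can also be produced for all six numeric variables in one call instead of one command per variable. The following is a sketch in base R, assuming the same split into numeric columns as used above; the names num_vars and stats are illustrative.

```r
# Columns treated as numeric in the answer above
num_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# sapply() applies the function to each column and binds the results
# into a matrix: one row per statistic, one column per variable
stats <- sapply(mtcars[num_vars], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), var = var(x))
})

round(stats, 3)
```

Individual values can then be read off by name, for example stats["median", "wt"].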
Categorical Variables:
1) cyl:
a) using dplyr package
>mtcars%>%count(cyl)
# A tibble: 3 x 2
    cyl     n
  <dbl> <int>
1     4    11
2     6     7
3     8    14
2) vs:
a) using dplyr package:
>mtcars%>%count(vs)
# A tibble: 2 x 2
     vs     n
  <dbl> <int>
1     0    18
2     1    14
>mtcars%>%group_by(vs)%>%summarise(count=n())
# A tibble: 2 x 2
     vs count
  <dbl> <int>
1     0    18
2     1    14
3) am:
a) using dplyr package:
>mtcars%>%count(am)
# A tibble: 2 x 2
     am     n
  <dbl> <int>
1     0    19
2     1    13
4) gear:
a) using dplyr package:
>mtcars%>%count(gear)
# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3    15
2     4    12
3     5     5
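The frequency counts for the four categorical variables can also be obtained together in base R, without dplyr. This is a sketch only; cat_vars and freq are names introduced here for illustration.

```r
# Columns treated as categorical in the answer above
cat_vars <- c("cyl", "vs", "am", "gear")

# table() counts each distinct value; lapply() returns one table per column
freq <- lapply(mtcars[cat_vars], table)

freq$cyl    # counts of 4-, 6- and 8-cylinder cars
freq$gear   # counts of cars with 3, 4 and 5 gears
```

Each element of freq holds the same counts as the corresponding count() output, in table form rather than as a tibble.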
2) Prepare a data frame for at least two categorical variables and find the
mean salary of those groups.
Ans:
Numeric Variables:
>mtcars%>%group_by(gear)%>%summarise(mean(hp))
# A tibble: 3 x 2
   gear `mean(hp)`
  <dbl>      <dbl>
1     3      176.
2     4       89.5
3     5      196.
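Since the question asks for at least two categorical variables, the grouped mean can be extended to a second grouping variable. A minimal sketch using base R's aggregate(), with am chosen here as the second variable purely as an example:

```r
# Mean horsepower for every observed combination of gear and am
hp_by_group <- aggregate(hp ~ gear + am, data = mtcars, FUN = mean)
hp_by_group
```

The result is an ordinary data frame with one row per gear/am combination that actually occurs in the data, so combinations with no cars are simply absent.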
Categorical Variables:
1) For Cyl:
Steps:
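table(mtcars$cyl)                   # print the frequency table for cyl
mtcarst <- table(mtcars$cyl)        # store the table
class(mtcarst)                      # "table"
mtcarsf <- as.data.frame(mtcarst)   # convert to a data frame (Var1, Freq)
mtcarsf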
Output:
>mtcarsf
Var1 Freq
1 4 11
2 6 7
3 8 14
2) For am:
Steps:
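table(mtcars$am)                     # print the frequency table for am
mtcarst1 <- table(mtcars$am)         # store the table
class(mtcarst1)                      # "table"
mtcarsf1 <- as.data.frame(mtcarst1)  # convert to a data frame (Var1, Freq)
mtcarsf1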
Output:
>mtcarsf1
Var1 Freq
1 0 19
2 1 13
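The two one-variable tables above can be combined into a single data frame covering both categorical variables at once. The sketch below cross-tabulates cyl against am; the object names cyl_am and cyl_am_df are illustrative.

```r
# Cross-tabulate number of cylinders against transmission type.
# Naming the arguments gives readable column names instead of Var1/Var2.
cyl_am <- table(cyl = mtcars$cyl, am = mtcars$am)

# as.data.frame() yields one row per cyl/am combination plus a Freq column
cyl_am_df <- as.data.frame(cyl_am)
cyl_am_df
```

Summing the Freq column within each level of cyl or am recovers the single-variable counts shown above.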