Business Analytics-1: STR (Crew - Data)

Business Analytics- 1

1) List the categorical and numeric variables of the data set


ANS.
A. Categorical variables:
1. Hire.date
2. Lastname
3. Firstname
4. Location
5. Phone (stored as an integer, but it is an identifier rather than a quantity)
6. EmpId
7. Job.code
B. Numeric variable:
1. Salary
Output:
>str(Crew.data)
'data.frame': 69 obs. of 8 variables:
$ Hire.date: Factor w/ 69 levels "1-Jul-87","1-Mar-90",..: 35 50 3 16 27 36 62 60 24 17 ...
$ Lastname : Factor w/ 69 levels "BEAUMONT","BERGAMASCO",..: 21 35 69 19 41 18 42 64 67 9 ...
$ Firstname: Factor w/ 69 levels "ANITA M.","ANNETTE M.",..: 30 29 24 58 54 26 68 39 59 37 ...
$ Location : Factor w/ 3 levels "CARY","FRANKFURT",..: 1 2 3 1 3 2 3 2 2 3 ...
$ Phne : int 1168 2164 1565 1157 2360 1595 2366 1197 1553 1369 ...
$ EmpId : Factor w/ 69 levels "E00034","E00084",..: 53 36 49 46 31 4 25 29 41 18 ...
$ Job.code : Factor w/ 6 levels "FLTAT1","FLTAT2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Salary : int 21000 22000 22000 23000 24000 25000 25000 26000 27000 28000 ...
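
The split between stored type and intended meaning can be checked in code. The sketch below is only an illustration: crew_toy is a hypothetical three-row stand-in (the real Crew.data file is not reproduced here), and only four of the eight columns are included.

```r
# Hypothetical stand-in for Crew.data; row contents are illustrative only.
crew_toy <- data.frame(
  Location = factor(c("CARY", "FRANKFURT", "CARY")),
  Phone    = c(1168L, 2164L, 1565L),
  Job.code = factor(c("FLTAT1", "FLTAT1", "FLTAT1")),
  Salary   = c(21000L, 22000L, 22000L)
)

# Which columns does R store as numbers?
is_num <- sapply(crew_toy, is.numeric)
names(crew_toy)[is_num]    # stored as numbers
names(crew_toy)[!is_num]   # stored as factors

# Phone is an identifier, so it can be recoded as a factor
# to stop it being summarised like a quantity.
crew_toy$Phone <- factor(crew_toy$Phone)
str(crew_toy)
```

After the recode, Salary is the only column left that is_numeric would report as a number, which matches the answer above.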
2) Describe the numeric variable using descriptive techniques
Ans:

a) Summary
Output:
summary(Crew.data$Salary)
Min. 1st Qu. Median Mean 3rd Qu. Max.
21000 33000 42000 52145 73000 112000

b) Mean
Output:
>mean(Crew.data$Salary)
[1] 52144.93

c) Median
Output:
>median(Crew.data$Salary)
[1] 42000

d) Standard Deviation
Output:
>sd(Crew.data$Salary)
[1] 25521.78

e) Variance
Output:
>var(Crew.data$Salary)
[1] 651361040
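
As a cross-check on d) and e), the variance is the square of the standard deviation. The sketch below uses only the first ten salaries printed by str() above (an assumption made for illustration, not the full 69 rows) to show what var() and sd() compute, and then compares the two reported figures.

```r
# First ten Salary values as printed by str(); not the full data set.
salary_head <- c(21000, 22000, 22000, 23000, 24000,
                 25000, 25000, 26000, 27000, 28000)

n <- length(salary_head)
m <- mean(salary_head)

# Sample variance: squared deviations from the mean, divided by n - 1.
v_manual <- sum((salary_head - m)^2) / (n - 1)

all.equal(v_manual, var(salary_head))        # same as var()
all.equal(sqrt(v_manual), sd(salary_head))   # sd() is its square root

# The reported sd (25521.78) squared agrees with the reported
# variance (651361040) up to the rounding of the printed sd.
all.equal(25521.78^2, 651361040, tolerance = 1e-6)
```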

3) How many groups does the variable “Job code” contain?

Ans: There are 6 categories: FLTAT1, FLTAT2, FLTAT3, PILOT1, PILOT2 and PILOT3.

Output 1: Using the dplyr count function

>Crew.data%>%count(Job.code)
# A tibble: 6 x 2
Job.code n
<fct> <int>
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8

Output 2: Using the group_by function:

>Crew.data%>%group_by(Job.code)%>%summarise(count=n())
# A tibble: 6 x 2
Job.code count
<fct> <int>
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8

4) Enumerate all functions explained in the video for “Job code”

Ans:
a) Count function
Output:
>Crew.data%>%count(Job.code)
# A tibble: 6 x 2
Job.code n
<fct> <int>
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8

b) Group by function:
Output:
>Crew.data%>%group_by(Job.code)%>%summarise(count=n())
# A tibble: 6 x 2
Job.code count
<fct> <int>
1 FLTAT1 14
2 FLTAT2 18
3 FLTAT3 12
4 PILOT1 8
5 PILOT2 9
6 PILOT3 8

c) Table function:
Output:
>table(Crew.data$Job.code)

FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
    14     18     12      8      9      8
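
All three functions return the same six counts. Since Crew.data itself is not available here, the sketch below rebuilds a Job.code vector from the counts reported above (job_counts and job_code are names introduced for this illustration) and answers the group question in base R.

```r
# Counts copied from the outputs above.
job_counts <- c(FLTAT1 = 14, FLTAT2 = 18, FLTAT3 = 12,
                PILOT1 = 8,  PILOT2 = 9,  PILOT3 = 8)

# Rebuild a factor with one entry per employee.
job_code <- factor(rep(names(job_counts), times = job_counts))

nlevels(job_code)          # number of groups: 6
length(job_code)           # total observations: 69

tab <- table(job_code)     # same counts as count() and group_by()
round(prop.table(tab), 3)  # share of crew in each job code
```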

5) Enumerate all functions explained in the video for “salary”


Ans:
a) Mean Salary:
Output:

mean(Crew.data$Salary)
[1] 52144.93
b) Standard Deviation in Salary:

Output:

sd(Crew.data$Salary)
[1] 25521.78

c) Variance

Output:

>var(Crew.data$Salary)
[1] 651361040

d) Summary

Output:

summary(Crew.data$Salary)
Min. 1st Qu. Median Mean 3rd Qu. Max.
21000 33000 42000 52145 73000 112000

e) Median Salary:

Output:

median(Crew.data$Salary)
[1] 42000

f) Mean salary by Job.code category

Output:

>Crew.data%>%group_by(Job.code)%>%summarise(mean(Salary))
# A tibble: 6 x 2
Job.code `mean(Salary)`
<fct> <dbl>
1 FLTAT1 25643.
2 FLTAT2 35111.
3 FLTAT3 44250
4 PILOT1 69500
5 PILOT2 80111.
6 PILOT3 99875
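
The group means in f) can be checked against the overall mean in a): weighting each group mean by its group size should give back roughly 52144.93. A minimal sketch, using the counts and the means as printed above (the printed means are shortened for display, so a small discrepancy is expected):

```r
# Group sizes and group means copied from the outputs above.
n_by_job    <- c(14, 18, 12, 8, 9, 8)
mean_by_job <- c(25643, 35111, 44250, 69500, 80111, 99875)

# Overall mean = sum(n_i * mean_i) / sum(n_i)
overall <- weighted.mean(mean_by_job, w = n_by_job)
overall

# Difference from the reported mean(Crew.data$Salary);
# it stays well under 1 because only the display was shortened.
abs(overall - 52144.93)
```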

Question 2:

1) Enumerate all functions explained in the video for all categorical and
numerical variables of the data set.
Ans:

Although str() reports all 11 columns of mtcars as numeric, 5 of them are categorical in nature.

Categorical variables: cyl, vs, am, gear and carb (5)
Numeric variables: mpg, disp, hp, drat, wt and qsec (6)

Output:
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
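
One way to make R treat the five categorical columns as such is to convert them to factors. The sketch below works on a copy (mt and cat_vars are names chosen for this example) so that the built-in mtcars stays unchanged.

```r
# Work on a copy so the built-in data set is left untouched.
mt <- mtcars
cat_vars <- c("cyl", "vs", "am", "gear", "carb")

# Convert each categorical column to a factor.
mt[cat_vars] <- lapply(mt[cat_vars], factor)

str(mt)                        # the five columns now show as Factor
sapply(mt[cat_vars], nlevels)  # number of groups in each
```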

Numeric Variables:
1) mpg:
Mean:
>mean(mtcars$mpg)
[1] 20.09062

Median:
>median(mtcars$mpg)
[1] 19.2

Standard deviation:
>sd(mtcars$mpg)
[1] 6.026948
Variance:
>var(mtcars$mpg)
[1] 36.3241

Summary:
>summary(mtcars$mpg)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.43 19.20 20.09 22.80 33.90

2) disp

Mean:
>mean(mtcars$disp)
[1] 230.7219

Median:
>median(mtcars$disp)
[1] 196.3

Standard deviation:
>sd(mtcars$disp)
[1] 123.9387

Variance:
>var(mtcars$disp)
[1] 15360.8
Summary:
>summary(mtcars$disp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
71.1 120.8 196.3 230.7 326.0 472.0

3) hp:

Mean:
>mean(mtcars$hp)
[1] 146.6875

Median:
>median(mtcars$hp)
[1] 123

Standard deviation:
>sd(mtcars$hp)
[1] 68.56287

Variance:
>var(mtcars$hp)
[1] 4700.867

Summary:
>summary(mtcars$hp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
52.0 96.5 123.0 146.7 180.0 335.0

4) drat

Mean:
>mean(mtcars$drat)
[1] 3.596563

Median:
>median(mtcars$drat)
[1] 3.695

Standard deviation:
>sd(mtcars$drat)
[1] 0.5346787

Variance:
>var(mtcars$drat)
[1] 0.285881

Summary:
>summary(mtcars$drat)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.760 3.080 3.695 3.597 3.920 4.930

5) wt

Mean:
>mean(mtcars$wt)
[1] 3.21725

Median:
>median(mtcars$wt)
[1] 3.325

Standard deviation:
>sd(mtcars$wt)
[1] 0.9784574

Variance:
>var(mtcars$wt)
[1] 0.957379

Summary:
>summary(mtcars$wt)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.513 2.581 3.325 3.217 3.610 5.424
6) qsec
Mean:
>mean(mtcars$qsec)
[1] 17.84875

Median:
>median(mtcars$qsec)
[1] 17.71

Standard deviation:
>sd(mtcars$qsec)
[1] 1.786943

Variance:
>var(mtcars$qsec)
[1] 3.193166

Summary:
>summary(mtcars$qsec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
14.50 16.89 17.71 17.85 18.90 22.90
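
The same four statistics can be produced for all six numeric variables in one pass rather than one call at a time. A minimal sketch with sapply (num_vars and stats are names introduced here):

```r
num_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# One column per variable, one row per statistic.
stats <- sapply(mtcars[num_vars], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), var = var(x))
})

round(stats, 3)
```

summary(mtcars[num_vars]) gives the quartiles for the same columns in a single call.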

Categorical Variables:
1) cyl:
a) using the dplyr count function
>mtcars%>%count(cyl)
# A tibble: 3 x 2
cyl n
<dbl> <int>
1 4 11
2 6 7
3 8 14

b) using group_by function


>mtcars%>%group_by(cyl)%>%summarise(count=n())
# A tibble: 3 x 2
cyl count
<dbl> <int>
1 4 11
2 6 7
3 8 14

2) vs:
a) using the dplyr count function:
>mtcars%>%count(vs)
# A tibble: 2 x 2
vs n
<dbl> <int>
1 0 18
2 1 14

b) using group_by function

>mtcars%>%group_by(vs)%>%summarise(count=n())
# A tibble: 2 x 2
vs count
<dbl> <int>
1 0 18
2 1 14

3) am:
a) using the dplyr count function:
>mtcars%>%count(am)
# A tibble: 2 x 2
am n
<dbl> <int>
1 0 19
2 1 13

b) using group_by function


>mtcars%>%group_by(am)%>%summarise(count=n())
# A tibble: 2 x 2
am count
<dbl> <int>
1 0 19
2 1 13

4) gear:
a) using the dplyr count function:
>mtcars%>%count(gear)
# A tibble: 3 x 2
gear n
<dbl> <int>
1 3 15
2 4 12
3 5 5

b) using group_by function


>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
gear count
<dbl> <int>
1 3 15
2 4 12
3 5 5
5) carb:
a) using the dplyr count function:
>mtcars%>%count(carb)
# A tibble: 6 x 2
carb n
<dbl> <int>
1 1 7
2 2 10
3 3 3
4 4 10
5 6 1
6 8 1

b) using group_by function


>mtcars%>%group_by(carb)%>%summarise(count=n())
# A tibble: 6 x 2
carb count
<dbl> <int>
1 1 7
2 2 10
3 3 3
4 4 10
5 6 1
6 8 1
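
The five frequency tables can also be built in one call using base R's table(). This is a sketch of an alternative to the dplyr pipeline, not the method shown in the video; counts and cat_vars are names chosen for the example.

```r
cat_vars <- c("cyl", "vs", "am", "gear", "carb")

# A named list holding one frequency table per categorical variable.
counts <- lapply(mtcars[cat_vars], table)

counts$cyl    # 4: 11, 6: 7, 8: 14
counts$carb   # six carburettor groups

# Every table should add up to the 32 cars in the data set.
sapply(counts, sum)
```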

2. Prepare a data frame for at least two categorical variables and find the
mean salary of those groups.
Ans:
Numeric Variables:

I. Finding the mean mpg of the cars with different gears.


a) using count function:
>mtcars%>%count(gear)
# A tibble: 3 x 2
gear n
<dbl> <int>
1 3 15
2 4 12
3 5 5

b) using group by function:


>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
gear count
<dbl> <int>
1 3 15
2 4 12
3 5 5
Mean mpg by number of gears:
>mtcars%>%group_by(gear)%>%summarise(mean(mpg))
# A tibble: 3 x 2
gear `mean(mpg)`
<dbl> <dbl>
1 3 16.1
2 4 24.5
3 5 21.4

II. Finding average horsepower generated by different geared cars

a) using count function:


>mtcars%>%count(gear)
# A tibble: 3 x 2
gear n
<dbl> <int>
1 3 15
2 4 12
3 5 5

b) using group by function:


>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
gear count
<dbl> <int>
1 3 15
2 4 12
3 5 5
Mean hp by number of gears:

>mtcars%>%group_by(gear)%>%summarise(mean(hp))
# A tibble: 3 x 2
gear `mean(hp)`
<dbl> <dbl>
1 3 176.
2 4 89.5
3 5 196.
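
Both group means above use gear alone. Since the question asks for at least two categorical variables, the sketch below shows one possible way to group by gear and am together, using base R's aggregate() instead of the dplyr pipeline. The choice of am as the second variable is an assumption made for illustration.

```r
# Mean mpg and hp by gear (reproduces the two tables above).
by_gear <- aggregate(cbind(mpg, hp) ~ gear, data = mtcars, FUN = mean)
by_gear

# Mean mpg and hp for each gear / transmission (am) combination.
# Only combinations that occur in the data get a row.
by_gear_am <- aggregate(cbind(mpg, hp) ~ gear + am, data = mtcars, FUN = mean)
by_gear_am
```

The result of aggregate() is already a data frame, so it can be stored, filtered or merged like any other.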

Categorical Variables:
1) For Cyl:
Steps:
table(mtcars$cyl)
mtcarst=table(mtcars$cyl)
class(mtcarst)
mtcarsf=as.data.frame(mtcarst)
mtcarsf

Output:
>mtcarsf
Var1 Freq
1 4 11
2 6 7
3 8 14

2) For am:
Steps:
table(mtcars$am)
mtcarst1=table(mtcars$am)
class(mtcarst1)
mtcarsf1=as.data.frame(mtcarst1)
mtcarsf1

Output:
>mtcarsf1
Var1 Freq
1 0 19
2 1 13
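
The default column names Var1 and Freq say little about what was counted. A minimal sketch of the same steps with named columns, plus a two-variable version; cyl_freq and cyl_am_freq are names introduced for this example.

```r
# Naming the table dimension carries over to the data frame column;
# responseName renames the count column.
cyl_freq <- as.data.frame(table(cyl = mtcars$cyl), responseName = "count")
cyl_freq

# Two categorical variables at once: one row per cyl / am combination.
cyl_am_freq <- as.data.frame(table(cyl = mtcars$cyl, am = mtcars$am),
                             responseName = "count")
cyl_am_freq
```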
