Lab Assessment 2: Creating The Student Dataset
19BCE2698
SHAIL PATEL
STATISTICS FOR ENGINEERS
df1<-data.frame(
StudentName = c(paste("Student", 1:60)),
MathMarks =
c(
sample(80:100, 12, replace = TRUE),
sample(70:90, 12, replace = TRUE),
sample(60:75, 12, replace = TRUE),
sample(50:80, 12, replace = TRUE),
sample(40:90, 12, replace = TRUE)
),
PhyMarks =
c(
sample(80:100, 12, replace = TRUE),
sample(70:90, 12, replace = TRUE),
sample(60:75, 12, replace = TRUE),
sample(50:80, 12, replace = TRUE),
sample(40:90, 12, replace = TRUE)
),
ChemMarks =
c(
sample(80:100, 12, replace = TRUE),
sample(70:90, 12, replace = TRUE),
sample(60:75, 12, replace = TRUE),
sample(50:80, 12, replace = TRUE),
sample(40:90, 12, replace = TRUE)
),
BioMarks =
c(
sample(80:100, 12, replace = TRUE),
sample(70:90, 12, replace = TRUE),
sample(60:75, 12, replace = TRUE),
sample(50:80, 12, replace = TRUE),
sample(40:90, 12, replace = TRUE)
),
EnglishMarks =
c(
sample(80:100, 12, replace = TRUE),
sample(70:90, 12, replace = TRUE),
sample(60:75, 12, replace = TRUE),
sample(50:80, 12, replace = TRUE),
sample(40:90, 12, replace = TRUE)
)
)
View(df1)
write.table(
df1,
file = "Student_dataset2.csv",
sep = ",",
append = FALSE,  # overwrite the file, so rerunning the script does not duplicate rows
row.names = FALSE
)
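Because sample() draws at random, rerunning the code above produces a different dataset each time. The following is a minimal sketch of a reproducible variant, assuming a fixed seed is acceptable; the seed value, the helper make_marks and the data frame df_demo are hypothetical names introduced here for illustration and are not part of the original lab code.

```r
set.seed(42)                              # arbitrary seed, chosen only for illustration

# Hypothetical helper: 60 marks drawn from the same five bands used above.
make_marks <- function() {
  c(
    sample(80:100, 12, replace = TRUE),
    sample(70:90, 12, replace = TRUE),
    sample(60:75, 12, replace = TRUE),
    sample(50:80, 12, replace = TRUE),
    sample(40:90, 12, replace = TRUE)
  )
}

# Smaller demo frame with two subjects; further columns follow the same pattern.
df_demo <- data.frame(
  StudentName = paste("Student", 1:60),
  MathMarks = make_marks(),
  PhyMarks = make_marks()
)
dim(df_demo)
```

Wrapping the five sample() calls in one function also removes the repetition across subjects.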
After uploading the CSV file to a hosting service such as GitHub, we read the
data back from there:
df3<-read.csv("https://raw.githubusercontent.com/rohilsaraf97/datasets/main/Student_dataset2.csv",sep=",")
head(df3)
Descriptive Statistics
Mean:
colMeans(df3[2:6])
round(colMeans(df3[2:6]))
The result above gives the mean marks for each subject.
Median:
apply(df3[2:6], 2, median)
round(apply(df3[2:6], 2, median))
The result above gives the median marks for each subject.
Mode:
apply(df3[2:6], 2, getmode)
The result above gives the mode (most frequent mark) for each subject.
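getmode is not a base R function, and its definition does not appear in this report, so the call above relies on a user-defined helper. A minimal sketch of one common way to write it follows; the body is an assumption, and ties are resolved by taking the first of the most frequent values.

```r
# Hypothetical helper: base R has no function that returns the statistical mode.
getmode <- function(v) {
  uniq <- unique(v)                       # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniq))      # how often each distinct value occurs
  uniq[which.max(counts)]                 # first value with the highest count
}

marks <- c(72, 85, 72, 90, 85, 72)        # made-up marks, not taken from the dataset
getmode(marks)                            # 72 occurs three times
```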
Measures of Dispersion
R offers several measures of variability for comparing data sets:
• Variance
• Standard Deviation
• Range
• Mean Deviation
• Interquartile Range
Variance
Variance is a measure of how far each value lies from a reference point, usually the mean.
Mathematically, it is the average of the squared differences from the mean (R's var() uses the sample form, dividing by n − 1).
apply(df3[2:6], 2, var)
The result above gives the variance of the marks about the mean for each subject.
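To see how the sample form used by var() relates to the plain average of squared deviations, here is a small sketch on made-up marks; the vector and variable names are assumptions for illustration only.

```r
marks <- c(60, 70, 80, 90)                # made-up marks, not taken from the dataset
n <- length(marks)

sample_var <- var(marks)                          # divides by n - 1
pop_var <- sum((marks - mean(marks))^2) / n       # divides by n

c(sample = sample_var, population = pop_var)
all.equal(pop_var, sample_var * (n - 1) / n)      # the two differ only by this factor
```

For the 60 students in this dataset the factor is 59/60, so the two versions are close but not identical.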
Standard Deviation
Standard deviation measures the spread of the data values about the mean; mathematically,
it is the square root of the variance.
apply(df3[2:6], 2, sd)
Range
Range is the difference between the maximum and minimum values of a data set. In R, max() and
min() are used to compute it; the range() function instead returns the minimum and maximum values
themselves.
apply(df3[2:6], 2, getrange)
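getrange is a user-defined helper whose definition is not shown in this report. A minimal sketch consistent with the description above could look like this; the body is an assumption.

```r
# Hypothetical helper: difference between the largest and smallest value.
getrange <- function(v) {
  max(v) - min(v)
}

marks <- c(55, 91, 68, 74)                # made-up marks, not taken from the dataset
getrange(marks)                           # 91 - 55
range(marks)                              # returns c(min, max) rather than the difference
diff(range(marks))                        # same value as getrange(marks)
```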
Mean Deviation
Mean deviation is the average of the absolute differences between each value and a central value.
The central value can be the mean, median, or mode.
apply(df3[2:6], 2, getMeanAD)
About Mean
apply(df3[2:6], 2, getMedianAD)
About Median
About Mode
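getMeanAD and getMedianAD are user-defined helpers whose definitions are not shown in this report. The sketch below gives one plausible set of definitions; the bodies are assumptions, and getModeAD is a hypothetical name added here for the mode-centred case. Note that base R's mad() computes a different quantity (a scaled median of absolute deviations), so it is not a drop-in replacement.

```r
# Hypothetical helpers: average absolute deviation about a chosen centre.
getMeanAD <- function(v) mean(abs(v - mean(v)))        # about the mean
getMedianAD <- function(v) mean(abs(v - median(v)))    # about the median
getModeAD <- function(v) {                             # about the mode
  uniq <- unique(v)
  mode_value <- uniq[which.max(tabulate(match(v, uniq)))]
  mean(abs(v - mode_value))
}

marks <- c(50, 60, 60, 70, 110)           # made-up marks, not taken from the dataset
getMeanAD(marks)                          # centre is the mean, 70
getMedianAD(marks)                        # centre is the median, 60
getModeAD(marks)                          # centre is the mode, 60
```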
Interquartile Range
The interquartile range is based on splitting a data set into parts called quartiles. There are 3 quartile values
(Q1, Q2, Q3) that divide the whole data set into 4 equal parts. Q2 is the median of the whole data
set.
The interquartile range is defined as
IQR = Q3 − Q1
apply(df3[2:6], 2, IQR)
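As a small check of the definition, the sketch below computes the quartiles with quantile() and compares Q3 − Q1 against IQR(). The marks vector is made up for illustration; both functions use R's default quantile method (type 7).

```r
marks <- c(40, 50, 60, 70, 80, 90, 100)  # made-up marks, not taken from the dataset

q <- quantile(marks, probs = c(0.25, 0.5, 0.75))   # Q1, Q2, Q3
q
unname(q[3] - q[1])                       # Q3 - Q1
IQR(marks)                                # same value
```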
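# Kurtosis coefficient: beta2 = m4 / m2^2, from the central moments m2 and m4
getBeta2 <- function(v) {
  m4 <- sum((v - mean(v))^4) / length(v)  # fourth central moment
  m2 <- sum((v - mean(v))^2) / length(v)  # second central moment (divides by n, matching m4)
  beta2 <- m4 / (m2^2)
  beta2
}
apply(df3[2:6], 2, getBeta2)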
getGama2<-function(v){
gama2=getBeta2(v)-3
gama2
}
apply(df3[2:6],2, getGama2)
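The two helpers above compute the kurtosis coefficient (the fourth central moment divided by the square of the second) and the excess kurtosis, which subtracts 3. The following is a worked sketch on a small made-up vector, assuming both moments divide by n; the names kurt_beta2 and kurt_gamma2 are hypothetical and used only to keep the example self-contained.

```r
# Hypothetical stand-alone versions, using population moments (divide by n).
kurt_beta2 <- function(v) {
  d <- v - mean(v)                        # deviations from the mean
  m2 <- sum(d^2) / length(v)              # second central moment
  m4 <- sum(d^4) / length(v)              # fourth central moment
  m4 / m2^2
}
kurt_gamma2 <- function(v) kurt_beta2(v) - 3   # excess kurtosis

v <- c(1, 2, 3, 4, 5)                     # made-up values: m2 = 2, m4 = 6.8
kurt_beta2(v)                             # 6.8 / 4 = 1.7
kurt_gamma2(v)                            # 1.7 - 3 = -1.3
```

A negative excess kurtosis, as here, indicates a distribution flatter than the normal distribution.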
Graphical Representation
df4<-data.frame(
Subjects=colnames(df3[2:6]),
meanMarks=apply(df3[2:6],2, mean)
)
Line Chart Representation for mean marks
[Figure: line chart of the mean marks for each subject. x-axis: Subjects (BioMarks, ChemMarks, EnglishMarks, MathMarks, PhyMarks); y-axis: Mean marks for each subject, with ticks from 71 to 75.]
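The plotting code for this chart is not included in the report. A minimal base R sketch that produces a comparable line chart is given below; the data frame here is a hypothetical stand-in for df4 with invented mean values, and the original chart may well have been drawn with a different plotting package.

```r
# Hypothetical stand-in for df4; the mean values are invented for illustration.
df4 <- data.frame(
  Subjects = c("MathMarks", "PhyMarks", "ChemMarks", "BioMarks", "EnglishMarks"),
  meanMarks = c(73.2, 71.5, 74.1, 72.8, 72.0),
  stringsAsFactors = FALSE
)
df4 <- df4[order(df4$Subjects), ]         # alphabetical subject order along the x-axis

plot(
  seq_len(nrow(df4)), df4$meanMarks,
  type = "o",                             # points joined by line segments
  xaxt = "n",                             # suppress the default numeric x-axis
  xlab = "Subjects",
  ylab = "Mean marks for each subject",
  main = "Line Chart Representation for mean marks"
)
axis(1, at = seq_len(nrow(df4)), labels = df4$Subjects)   # subject names as tick labels
```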