
Student Study Hours Dataset Insights
Baylas, Fatima Joan B.
BSCS-2
CSPE 3100 : Data Science
Dataset
The dataset contains two columns: the number of hours each student
studied (Hours) and the marks they obtained (Scores).

himanshunakrani

kaggle datasets download -d himanshunakrani/student-study-hours
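# Install ggplot2 once if it is not already available, then load it.
# read.csv() below is base R, so readr is not needed.
install.packages("ggplot2")
library(ggplot2)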

studstudyhour = read.csv("score.csv", sep=",")


summary(studstudyhour)
head(studstudyhour)

hours = studstudyhour[,"Hours"]
scores = studstudyhour[,"Scores"]

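# Scatter plot of hours studied (x) against scores (y)
plot(hours, scores, pch = 16, col = "blue")

# Correlation between hours and scores
cor(hours, scores)

# Linear regression model
model = lm(scores~hours, data=studstudyhour)
summary(model)
# Add the fitted regression line to the scatter plot above
abline(model)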

# Same scatter plot and regression line using ggplot2
ggplot(data = studstudyhour, aes(x = hours, y = scores)) +
  geom_point(colour = "black", size = 1.5) +
  geom_smooth(method = "lm", se = FALSE, colour = "red", size = 0.8)
Insights
> cor(hours, scores)
[1] 0.9761907
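
As a rough illustration of what cor() computes, the sketch below works out Pearson's correlation by hand and compares it with the built-in. The vectors x and y are hypothetical toy values chosen for this example, not the Kaggle data, and r_manual is a name introduced only here.

```r
# Hypothetical toy data (NOT the Kaggle values), for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(12, 25, 29, 41, 52)

# Pearson's r: sum of cross-deviations divided by the square root of
# the product of the two sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Compare with R's built-in cor(), which uses Pearson by default
all.equal(r_manual, cor(x, y))
```

A value close to 1, such as the 0.976 reported above, indicates a strong positive linear relationship between the two variables.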
#plot(x,y)
plot(hours, scores, pch = 16, col = "blue")
> model = lm(scores~hours, data=studstudyhour)
> summary(model)

Call:
lm(formula = scores ~ hours, data = studstudyhour)

Residuals:
    Min      1Q  Median      3Q     Max
-10.578  -5.340   1.839   4.593   7.265

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.4837     2.5317   0.981    0.337
hours         9.7758     0.4529  21.583   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.603 on 23 degrees of freedom
Multiple R-squared: 0.9529, Adjusted R-squared: 0.9509
F-statistic: 465.8 on 1 and 23 DF, p-value: < 2.2e-16

R-square value: 0.95
P-value: < 2.2e-16
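
The reported coefficients can be read as the line scores = 2.4837 + 9.7758 * hours. The sketch below, which assumes only the numbers printed in the output above, applies that line to an arbitrary example value and checks that the squared correlation matches the reported R-squared. The helper predict_score and the input of 5 hours are hypothetical and introduced only for illustration.

```r
# Coefficients copied from the summary(model) output above
intercept <- 2.4837
slope <- 9.7758

# Hypothetical helper: predicted score for a given number of study hours
predict_score <- function(h) intercept + slope * h

# 5 hours is an arbitrary example input, not a value taken from the dataset
predict_score(5)

# In simple linear regression, R-squared equals the squared correlation
r <- 0.9761907
round(r^2, 4)  # 0.9529, matching "Multiple R-squared" above
```

The slope suggests that each additional hour of study is associated with an increase of roughly 9.8 marks, while the intercept is not significantly different from zero (p = 0.337).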
abline(model)
ggplot(data = studstudyhour,aes(x = hours,y = scores)) +
geom_point(colour = "black",size = 1.5) +
geom_smooth(method = "lm",se = FALSE,colour = "red",size = 0.8)
