강준혁, Regression Analysis Assignment 4
Q1.
a)
b)
We can use a plot of the standardized residuals against each of the predictor variables.
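A minimal R sketch of this check, assuming simulated data: the data frame `dat` and the predictors `x1`, `x2`, `x3` are hypothetical stand-ins, not the assignment data. It fits a linear model and draws one plot of the standardized residuals (`rstandard()`) against each predictor, with a horizontal reference line at zero. A curved trend in a panel would suggest that predictor does not enter the model linearly.

```r
# Hypothetical simulated data: three predictors and a response
set.seed(1)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = runif(n), x3 = rnorm(n, mean = 10))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + 0.5 * dat$x3 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3, data = dat)
rstd <- rstandard(fit)                      # standardized residuals

# One panel per predictor instead of repeating plot() by hand
predictors <- c("x1", "x2", "x3")
par(mfrow = c(1, length(predictors)))
for (v in predictors) {
  plot(dat[[v]], rstd, xlab = v, ylab = "Standardized Residuals")
  abline(h = 0)                             # residuals should scatter around 0
}
```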
c)
d)
Q2.
a)
The plot of standardized residuals versus fitted values is used to check whether the standardized
residuals are related to the fitted values. If the plot shows randomly scattered points with no
pattern, the standardized residuals are not related to the fitted values. Otherwise (for example, a
curved trend or a funnel shape), the standardized residuals are related to the fitted values.
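The standardized residual used in this plot is $r_i = e_i / (\hat\sigma \sqrt{1 - h_{ii}})$, where $e_i$ is the ordinary residual and $h_{ii}$ is the leverage of observation $i$. The sketch below uses simulated (hypothetical) data: it computes $r_i$ by hand, which should match R's `rstandard()`, and then plots it against the fitted values.

```r
# Hypothetical simulated data, not the assignment data
set.seed(2)
n <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 3 + x1 - 2 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

e <- residuals(fit)                   # ordinary residuals e_i
h <- hatvalues(fit)                   # leverage values h_ii
sigma_hat <- summary(fit)$sigma       # sqrt(SSE / (n - p - 1))
r_manual <- e / (sigma_hat * sqrt(1 - h))

# Standardized residuals versus fitted values, with a reference line at 0
plot(fitted(fit), r_manual, xlab = "Fitted Values", ylab = "Standardized Residuals")
abline(h = 0)
```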
The potential-residual plot is used to identify observations that are outliers, high-leverage
points, or highly influential points.
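A sketch of how a potential-residual plot can be built in R, assuming the definitions commonly used with Hadi's influence measure: normalized residuals $d_i = e_i / \sqrt{SSE}$, potential function $h_{ii}/(1 - h_{ii})$, and residual function $\frac{p+1}{1-h_{ii}} \cdot \frac{d_i^2}{1-d_i^2}$, where $p$ is the number of predictors. The data are simulated, with one planted high-leverage point (observation 1) and one planted outlier (observation 2); both are hypothetical. Points high on the vertical axis have high leverage, and points far to the right are outliers.

```r
# Hypothetical simulated data with one planted high-leverage point (obs 1)
# and one planted outlier in the response (obs 2)
set.seed(3)
n <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
x1[1] <- 6
y <- 1 + x1 + x2 + rnorm(n)
y[2] <- y[2] + 10
fit <- lm(y ~ x1 + x2)

p <- length(coef(fit)) - 1                          # predictors, excluding intercept
h <- hatvalues(fit)                                 # leverage values h_ii
e <- residuals(fit)
d <- e / sqrt(sum(e^2))                             # normalized residuals d_i

potential   <- h / (1 - h)                          # potential function
residual_fn <- (p + 1) / (1 - h) * d^2 / (1 - d^2)  # residual function

plot(residual_fn, potential, xlab = "Residual", ylab = "Potential")
```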
The scatter plot matrix of the predictors $x_1, \ldots, x_p$ is used to check whether any pair of them
is linearly correlated. We need this check because the model assumes that $x_1, \ldots, x_p$ are not
linearly correlated with each other.
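A short R sketch, using hypothetical simulated predictors in which `x2` is built to be nearly collinear with `x1`. `pairs()` draws the scatter plot matrix, and `cor()` gives the matching correlation matrix as a numeric check on what the panels show.

```r
# Hypothetical predictors: x2 is constructed to be strongly correlated with x1
set.seed(4)
n <- 50
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

pairs(X)                 # scatter plot matrix of the predictors
R <- round(cor(X), 2)    # pairwise correlations, a numeric companion to pairs()
print(R)
```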
b)
c)
Q3.
a)
Age – In the past, the harmfulness of cigarettes was not well known, so older people may be more likely to smoke.
Black – On average, Black residents have had lower living standards than white residents, and I
think that smoking rates go up when living standards fall.
HS – People with a higher level of education are likely to be better informed about the harmfulness
of cigarettes, so the smoking rate is likely to be negatively correlated with HS.
Income – People with high incomes tend to pay more attention to their health, so I expect a negative relationship with Sales.
b)
Except for Price, all other explanatory variables have a positive linear relationship with Sales.
c)
> result<-lm(Sales~HS+Income+Black+Female+Price+Age,data=cigarette2)
> summary(result)
Call:
lm(formula = Sales ~ HS + Income + Black + Female + Price + Age,
data = cigarette2)
Residuals:
    Min      1Q  Median      3Q     Max
-48.398 -12.388  -5.367   6.270 133.213
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 103.34485  245.60719   0.421  0.67597
HS           -0.06159    0.81468  -0.076  0.94008
Income        0.01895    0.01022   1.855  0.07036 .
Black         0.35754    0.48722   0.734  0.46695
Female       -1.05286    5.56101  -0.189  0.85071
Price        -3.25492    1.03141  -3.156  0.00289 **
Age           4.52045    3.21977   1.404  0.16735
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I expected a negative linear relationship between Income and Sales, but the estimated coefficient of
Income is positive (0.01895).
d)
> rstandard<-rstandard(result)
> p<-predict(result)
> plot(p,rstandard,ylab="Standardized Residuals", xlab="Predicted Values")
> abline(h=0)
If the linearity assumption is valid, the plot should show randomly scattered points. However, a
few points lie far away from the rest, so the linearity assumption does not appear to be valid.
e)
We can use a normal probability plot of the standardized residuals and plots of the standardized
residuals against each of the predictor variables.
> par(mfrow=c(1,1))
> qqnorm(rstandard, ylab="Standardized Residuals", xlab="Normal Scores" )
> par(mfrow=c(3,2))
> plot(cigarette$Age,rstandard,ylab="Standardized Residuals", xlab="Age")
> abline(h=0)
>
> plot(cigarette$HS,rstandard,ylab="Standardized Residuals", xlab="HS")
> abline(h=0)
>
> plot(cigarette$Income,rstandard,ylab="Standardized Residuals", xlab="Income")
> abline(h=0)
>
> plot(cigarette$Black,rstandard,ylab="Standardized Residuals", xlab="Black")
> abline(h=0)
>
> plot(cigarette$Female,rstandard,ylab="Standardized Residuals", xlab="Female")
> abline(h=0)
>
> plot(cigarette$Price,rstandard,ylab="Standardized Residuals", xlab="Price")
> abline(h=0)
As you can see, the first plot (the normal probability plot) does not follow a straight line with slope 1 through (0, 0), so the normality assumption does not appear to hold.
f)
> pairs(cigarette2)
There is a linear relationship between HS and Income, and between Black and Income.
g)
> leverage=hatvalues(result)
> plot(leverage,ylab="Leverage Values", xlab="Observation Number")
As you can see, there are two points with much larger leverage values than the others. So the
observations are not equally influential on the least squares results.
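High leverage alone does not make a point influential; influence also depends on the size of its residual, which Cook's distance (`cooks.distance()`) combines with leverage. The sketch below uses simulated data with two planted extreme `x1` values (observations 7 and 21, hypothetical) and flags points whose leverage exceeds $2(p+1)/n$. That cutoff is a common rule of thumb and an assumption here, not something stated in the assignment.

```r
# Hypothetical simulated data with two planted extreme x1 values
set.seed(5)
n <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
x1[c(7, 21)] <- c(5, -5)
y <- 2 + x1 + x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

p <- length(coef(fit)) - 1
leverage <- hatvalues(fit)
cutoff <- 2 * (p + 1) / n          # rule-of-thumb cutoff (assumption)
high <- which(leverage > cutoff)   # indices of high-leverage observations

plot(leverage, xlab = "Observation Number", ylab = "Leverage Values")
abline(h = cutoff, lty = 2)        # dashed line at the cutoff

cooks <- cooks.distance(fit)       # influence: leverage combined with residual size
print(high)
```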
h)
> install.packages("car")
> library(car)
> avPlots(result)
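Since the added-variable plots for HS and Female do not show an obvious slope, these two variables do
not appear to be useful in the model once the other predictors are included. This agrees with their
large p-values in part c) (0.94008 for HS and 0.85071 for Female).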
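An added-variable plot for one predictor can also be built by hand, which shows what `avPlots()` draws: regress the response on the other predictors, regress the chosen predictor on the same other predictors, and plot the two sets of residuals against each other. The slope of that plot equals the predictor's coefficient in the full model, so a flat added-variable plot corresponds to a coefficient near zero. The sketch uses simulated data (hypothetical `x1`, `x2`, `x3`) in which `x2` has no true effect.

```r
# Hypothetical simulated data: x2 has no real effect on y
set.seed(6)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Added-variable plot for x2, built by hand
ry <- residuals(lm(y ~ x1 + x3, data = dat))    # part of y not explained by x1, x3
rx <- residuals(lm(x2 ~ x1 + x3, data = dat))   # part of x2 not explained by x1, x3
av_fit <- lm(ry ~ rx)
av_slope <- unname(coef(av_fit)["rx"])

plot(rx, ry, xlab = "x2 | others", ylab = "y | others")
abline(av_fit)                                  # slope equals coef(fit)["x2"]
```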