강준혁 Regression Analysis Assignment 4


2020150423 강준혁 Regression Analysis Assignment 4

Q1.

a)

We can use a plot of the standardized residuals against the predicted values.

b)

We can use a plot of the standardized residuals against each of the predictor variables.

c)

We can use a normal probability plot of the standardized residuals.

d)

We can use an index plot of the leverage values.


Q2.

a)

The plot of standardized residuals versus fitted values is used to check whether the standardized
residuals are related to the fitted values. If the points are scattered randomly, the standardized
residuals are not related to the fitted values. Otherwise, the standardized residuals are related
to the fitted values.

The potential-residual plot is used to identify observations that are outliers, high-leverage
points, or highly influential points.
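
The construction of a potential-residual plot can be sketched in R as follows. This is only a minimal sketch: it assumes Hadi's potential and residual functions as the two axes, and it uses the built-in mtcars data and the object name fit as stand-ins, since neither appears in this assignment.

```r
# Stand-in data (assumption): built-in mtcars, not the assignment's data set.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

p_ii <- hatvalues(fit)          # leverage values
e    <- residuals(fit)          # ordinary residuals
d    <- e / sqrt(sum(e^2))      # normalized residuals, e_i / sqrt(SSE)
k    <- length(coef(fit)) - 1   # number of predictors

# Potential function: large for high-leverage points.
potential <- p_ii / (1 - p_ii)
# Residual function: large for points with large residuals (outliers).
residual <- (k + 1) / (1 - p_ii) * d^2 / (1 - d^2)

plot(residual, potential, xlab = "Residual", ylab = "Potential")
```

Points far to the right have large residuals, and points high up have high leverage.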

The scatter plot matrix of the variables x_1, ..., x_p is used to check whether x_1, ..., x_p are
linearly correlated with one another. It is used because we assume that x_1, ..., x_p are not
linearly correlated.

b)

c)


Q3.

a)

Positive: Age, Black

Age: Because the harm of cigarettes was not well known in the past, older populations are likely
to smoke more.

Black: Black people's living standards are, on average, lower than white people's, and I think
that smoking rates go up when living standards fall.

Negative: HS, Income, Female, Price

HS: People with more education have probably learned more about the harm of cigarettes, so the
smoking rate is likely to be negatively correlated with education.

Income: People with high incomes tend to pay more attention to their health.

Price: If the price of cigarettes goes up, sales will decrease.

b)

> cigarette = read.table("C://Users//강준혁학부재학통계학과//Desktop//cigarette.txt",header=T)
> cigarette2<-cigarette[,-1]
> pairs(cigarette2)
> cor(cigarette2)
               Age          HS      Income       Black      Female       Price       Sales
Age     1.00000000 -0.09891626  0.25658098 -0.04033021  0.55303189  0.24775673  0.22655492
HS     -0.09891626  1.00000000  0.53400534 -0.50171191 -0.41737794  0.05697473  0.06669476
Income  0.25658098  0.53400534  1.00000000  0.01728756 -0.06882666  0.21455717  0.32606789
Black  -0.04033021 -0.50171191  0.01728756  1.00000000  0.45089974 -0.14777619  0.18959037
Female  0.55303189 -0.41737794 -0.06882666  0.45089974  1.00000000  0.02247351  0.14622124
Price   0.24775673  0.05697473  0.21455717 -0.14777619  0.02247351  1.00000000 -0.30062263
Sales   0.22655492  0.06669476  0.32606789  0.18959037  0.14622124 -0.30062263  1.00000000

Except for Price, every explanatory variable has a positive linear correlation with Sales.

c)

> result<-lm(Sales~HS+Income+Black+Female+Price+Age,data=cigarette2)
> summary(result)

Call:
lm(formula = Sales ~ HS + Income + Black + Female + Price + Age,
    data = cigarette2)

Residuals:
    Min      1Q  Median      3Q     Max
-48.398 -12.388  -5.367   6.270 133.213

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 103.34485  245.60719   0.421  0.67597
HS           -0.06159    0.81468  -0.076  0.94008
Income        0.01895    0.01022   1.855  0.07036 .
Black         0.35754    0.48722   0.734  0.46695
Female       -1.05286    5.56101  -0.189  0.85071
Price        -3.25492    1.03141  -3.156  0.00289 **
Age           4.52045    3.21977   1.404  0.16735
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 28.17 on 44 degrees of freedom
Multiple R-squared: 0.3208, Adjusted R-squared: 0.2282
F-statistic: 3.464 on 6 and 44 DF, p-value: 0.006857

I expected Income to have a negative relationship with Sales, but its estimated coefficient is
positive.

d)

We can use a plot of the standardized residuals against the predicted values.

> rstandard<-rstandard(result)
> p<-predict(result)
> plot(p,rstandard,ylab="Standardized Residuals", xlab="Predicted Values")
> abline(h=0)

If the linearity assumption is valid, the plot should show randomly scattered points. However, a
few points lie far away from the rest, so the linearity assumption does not appear to be valid.

e)

We can use a normal probability plot and a plot of the standardized residuals against each of the
predictor variables.

> par(mfrow=c(1,1))
> qqnorm(rstandard, ylab="Standardized Residuals", xlab="Normal Scores")
> par(mfrow=c(3,2))
> plot(cigarette$Age,rstandard,ylab="Standardized Residuals", xlab="Age")
> abline(h=0)
> plot(cigarette$HS,rstandard,ylab="Standardized Residuals", xlab="HS")
> abline(h=0)
> plot(cigarette$Income,rstandard,ylab="Standardized Residuals", xlab="Income")
> abline(h=0)
> plot(cigarette$Black,rstandard,ylab="Standardized Residuals", xlab="Black")
> abline(h=0)
> plot(cigarette$Female,rstandard,ylab="Standardized Residuals", xlab="Female")
> abline(h=0)
> plot(cigarette$Price,rstandard,ylab="Standardized Residuals", xlab="Price")
> abline(h=0)

As you can see, the points in the first plot do not follow a straight line with slope 1 that
passes through (0,0).

It means that the normality assumption is not valid.

The second set of plots does not show a random scatter.

It means that the homogeneity (constant variance) assumption is not valid.
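
Judging whether the points follow the line with slope 1 through (0,0) is easier when that line is drawn on the normal probability plot. The sketch below shows one way to do this. It uses the built-in mtcars data and the names fit, r and qq as stand-ins (assumptions), because cigarette.txt is not available here.

```r
# Stand-in model (assumption): built-in mtcars instead of the cigarette data.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
r   <- rstandard(fit)   # standardized residuals

# Normal probability plot; qqnorm() invisibly returns the plotted coordinates.
qq <- qqnorm(r, ylab = "Standardized Residuals", xlab = "Normal Scores")

# Reference line with intercept 0 and slope 1. Standardized residuals from a
# model with normal errors should fall close to it.
abline(0, 1)
```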


f)

We can use a scatterplot matrix.

> pairs(cigarette2)

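There is a linear relationship between HS and Income (correlation 0.53 in part b). Black and
Income, however, have a correlation of only 0.02; the other clearly related pairs are Age and
Female (0.55) and HS and Black (-0.50).

So, the assumption that the explanatory variables are not linearly related to one another is doubtful.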
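
The visual reading of a scatterplot matrix can be cross-checked by listing the pairwise correlations among the predictors only, sorted by absolute size. The following is a rough sketch on stand-in data: the mtcars columns and the names X, R and pairs_tbl are assumptions, not part of this assignment.

```r
# Stand-in predictors (assumption): three columns of the built-in mtcars data.
X <- mtcars[, c("wt", "hp", "disp")]

R <- cor(X)
R[upper.tri(R, diag = TRUE)] <- NA   # keep each pair of predictors only once

# Long format: one row per pair, with the correlation in column Freq.
pairs_tbl <- na.omit(as.data.frame(as.table(R)))

# Strongest linear relationships first.
pairs_tbl[order(-abs(pairs_tbl$Freq)), ]
```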


g)

We can use an index plot of the leverage values.

> leverage=hatvalues(result)
> plot(leverage,ylab="Leverage Values", xlab="Observation Number")

As you can see, two points stand apart from the rest. So, the observations are not equally
influential on the least squares results.
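
Unusual points in an index plot of leverage values can also be flagged numerically. The sketch below uses the common rule of thumb that a leverage value above 2(p + 1)/n is high, where p is the number of predictors and n the number of observations. The mtcars data and the names fit, lev, cutoff and high are stand-ins (assumptions).

```r
# Stand-in model (assumption): built-in mtcars instead of the cigarette data.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

lev <- hatvalues(fit)            # leverage values
n   <- length(lev)               # number of observations
k   <- length(coef(fit)) - 1     # number of predictors

cutoff <- 2 * (k + 1) / n        # rule-of-thumb threshold for high leverage

plot(lev, ylab = "Leverage Values", xlab = "Observation Number")
abline(h = cutoff, lty = 2)      # dashed line at the threshold

high <- which(lev > cutoff)      # observations above the threshold
high
```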

h)

We can use added-variable plots.

> install.packages("car")
> library(car)
> avPlots(result)

Since the added-variable plots for HS and Female do not show an obvious slope, these two
variables do not appear to be worth keeping in the model.
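
What an added-variable plot shows can be sketched by hand without the car package: regress the response on all other predictors, regress the predictor of interest on the same other predictors, and plot one set of residuals against the other. The slope of that plot equals the predictor's coefficient in the full model. The mtcars data and the names full, ry, rx and av are stand-ins (assumptions).

```r
# Stand-in model (assumption): built-in mtcars instead of the cigarette data.
full <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Added-variable plot for wt, built by hand.
ry <- residuals(lm(mpg ~ hp + disp, data = mtcars))  # response, others removed
rx <- residuals(lm(wt ~ hp + disp, data = mtcars))   # wt, others removed

av <- lm(ry ~ rx)   # line through the added-variable plot

plot(rx, ry, xlab = "wt | others", ylab = "mpg | others")
abline(av)

# The slope of this line matches the coefficient of wt in the full model.
c(av_slope = unname(coef(av)["rx"]), full_coef = unname(coef(full)["wt"]))
```

A nearly flat line therefore corresponds to a coefficient close to zero in the full model.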
