Simple Linear Regression

11

Simple Linear Regression

What is the difference between a statistician and a mortician?


Nobody's dying to see the statistician!

What is the difference between an introverted and an extroverted statistician?

The introverted statistician stares DOWN at his shoes, whereas the extroverted statistician stares OVER at your shoes!

Regression

We want to build a mathematical model that uses some variables (the independent variables) to predict other variables (the dependent variables).

Linear regression means that we will use a straight-line equation.

Y = β0 + β1.X + ε

     Temperature (°C)   Strength (N/m²)

1)   67                 75
2)   87                 65
3)   87                 67
4)   68                 70
5)   72                 82
6)   62                 67
7)   76                 51
8)   65                 52
9)   73                 50

We are interested in the relationship between the two variables, but more than that, we want to use temperature to predict strength.

Correlation indicates how strongly the variables are associated, while regression is used to build a model that can be used for prediction.

Scatter plot

[Figure: scatter plot of Strength (N/m²) against Temp (°C)]

No straight line will pass exactly through all the points.

We need the line that fits these data "best".

We know that we will make an error at some, if not all, of the points. We want the line that minimises these errors.

Y = β0 + β1.X + ε

β0 ≡ Beta 0     Y-intercept

β1 ≡ Beta 1     Coefficient of X (the slope)

ε ≡ Epsilon     The error we make

Use the Method of Least Squares to estimate β0 and β1 → calculator.
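As a cross-check on the calculator, the least-squares estimates can be sketched in a few lines of Python. The function and variable names (`least_squares`, `temps`, `strengths`) are illustrative choices, not part of the slides. The formulas are the standard closed-form ones: the slope is Sxy / Sxx and the intercept is ȳ - b1·x̄.

```python
# Temperature (x) and strength (y) from the data table above.
temps = [67, 87, 87, 68, 72, 62, 76, 65, 73]
strengths = [75, 65, 67, 70, 82, 67, 51, 52, 50]

def least_squares(x, y):
    """Return (b0, b1): least-squares estimates of the intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

b0, b1 = least_squares(temps, strengths)
print(f"b0 = {b0:.3f}, b1 = {b1:.4f}")  # b0 = 68.164, b1 = -0.0525
```

On the calculator these are the values reported as A (β0) and B (β1).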

Scatter plot

[Figure: scatter plot of Strength (N/m²) against Temp (°C), with the fitted line y = 68.164 - 0.0525x]

Calculator

MODE → STAT → A+BX

Enter the data in the X and Y columns (rows 1, 2, 3, 4, ...), then press AC.

SHIFT → STAT [1]
SUM → Sums of squares
VAR → Basic statistics
REG → A = β0, B = β1, r

     Temp   Strength   Predicted strength   Error (predicted - actual)

1)   67     75         64.65                -10.35
2)   87     65         63.60                 -1.40
3)   87     67         63.60                 -3.40
4)   68     70         64.59                 -5.41
5)   72     82         64.38                -17.62
6)   62     67         64.91                 -2.09
7)   76     51         64.17                 13.17
8)   65     52         64.75                 12.75
9)   73     50         64.33                 14.33
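The predicted values and errors in this table can be reproduced with a short sketch. It assumes the rounded coefficients of the fitted line (68.164 and -0.0525) and follows the table's sign convention, error = predicted - actual. The names `predict` and `rows` are illustrative.

```python
# Rounded coefficients of the fitted line y = 68.164 - 0.0525x.
B0, B1 = 68.164, -0.0525

def predict(temp):
    """Predicted strength for a given temperature."""
    return B0 + B1 * temp

# (temperature, actual strength) pairs from the data table.
data = [(67, 75), (87, 65), (87, 67), (68, 70), (72, 82),
        (62, 67), (76, 51), (65, 52), (73, 50)]

# Each row: temperature, actual, predicted, error (predicted - actual).
rows = [(t, s, predict(t), predict(t) - s) for t, s in data]

for t, s, pred, err in rows:
    print(f"{t:4d} {s:4d} {pred:8.2f} {err:8.2f}")
```

Every prediction falls between roughly 63.6 and 64.9 although the actual strengths range from 50 to 82, which already hints that temperature explains very little here.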

We can fit a regression model to any dataset, but this does not mean that the model is always a good model or that it should be used to make predictions.

In our example a temperature of 0 corresponds to a predicted strength of 68.164, and the predicted strength decreases by 0.0525 for each additional degree above 0.

Measures of Variation

We need measures that indicate how good the model is and whether its predictions will actually be useful.

Correlation Coefficient
Coefficient of Determination
Standard Error of the Estimate

Correlation Coefficient

Same measure as discussed in chapter 5.

r     -1 ≤ r ≤ 1

This gives us an indication of the linear association between the variables. If there is a strong association, a linear regression model is an appropriate model to fit.
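For the temperature and strength data, r can be computed directly from its definition, r = Sxy / sqrt(Sxx · Syy). The sketch below uses illustrative names and only the standard library.

```python
import math

temps = [67, 87, 87, 68, 72, 62, 76, 65, 73]
strengths = [75, 65, 67, 70, 82, 67, 51, 52, 50]

def correlation(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r = correlation(temps, strengths)
print(f"r = {r:.4f}")  # r = -0.0421
```

A value this close to 0 indicates almost no linear association. Excel's "Multiple R" is the absolute value of r, so it shows 0.042 without the sign.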

Coefficient of Determination

r²     0 ≤ r² ≤ 1

Gives us an indication of what proportion of the variation in the Y values is explained by the X values.

The closer to 1, the more of the variation in Y is explained by X, and thus the better our model is for predictions.
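One way to see what "proportion of variation explained" means is to split the total sum of squares (SST) into a regression part (SSR) and a residual part (SSE), so that r² = SSR / SST. The sketch below does this for the example data. The variable names are illustrative, and the line is refitted inside the block so that it runs on its own.

```python
temps = [67, 87, 87, 68, 72, 62, 76, 65, 73]
strengths = [75, 65, 67, 70, 82, 67, 51, 52, 50]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(strengths) / n

# Least-squares fit.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, strengths))
s_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in temps]

sst = sum((y - y_bar) ** 2 for y in strengths)                   # total variation
ssr = sum((p - y_bar) ** 2 for p in predicted)                   # explained by the line
sse = sum((y - p) ** 2 for y, p in zip(strengths, predicted))    # left unexplained

r2 = ssr / sst
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, r2 = {r2:.4f}")
```

Here SSR is about 1.784 out of an SST of 1008, so r² is about 0.0018: temperature explains less than 0.2% of the variation in strength.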

Standard Error of the Estimate

A measure of how much the predicted values differ from the actual values.

Ŷ (predicted) vs Y (actual)

The larger the value, the further the predicted values lie from the actual values, and thus the worse our model.
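The slides do not show the formula, so the sketch below assumes the usual definition for simple linear regression, s = sqrt(SSE / (n - 2)), where SSE is the sum of squared differences between actual and predicted values. The names are illustrative.

```python
import math

temps = [67, 87, 87, 68, 72, 62, 76, 65, 73]
strengths = [75, 65, 67, 70, 82, 67, 51, 52, 50]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(strengths) / n

# Least-squares fit.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, strengths))
s_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Sum of squared errors, then divide by n - 2 because two
# parameters (b0 and b1) were estimated from the data.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temps, strengths))
std_error = math.sqrt(sse / (n - 2))
print(f"s = {std_error:.3f}")  # s = 11.989
```

A standard error of about 12 N/m² is large relative to strengths that range from 50 to 82.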

Calculate SST and SSR using the calculator.

→ SHIFT STAT → Var

→ SHIFT STAT → Reg

Excel

Regression Statistics
Multiple R           0.042
R Square             0.0018   (r²)
Adjusted R Square   -0.141
Standard Error      11.989
Observations         9

ANOVA
             df   SS          MS         F        Sig F
Regression   1       1.7840     1.7840   0.0124   0.9144
Residual     7    1006.2160   143.7451
Total        8    1008

             Coef     SE       t Stat   P-value
Intercept    68.164   34.614    1.969   0.090
Temp         -0.052    0.471   -0.111   0.914

Inference

Intervals

→ SHIFT STAT → Var

Hypothesis Tests

Different hypotheses can be tested after a regression model has been built. These tests assess in a more formal way whether the model is a good model or a bad one.

F-Test

The F-test tests whether the slope parameters in the model are simultaneously 0. In simple linear regression there is only one slope, β1, so the test reduces to whether β1 is 0. The intercept β0 is not part of this test.

H0: β1 = 0
H1: β1 ≠ 0
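The F statistic in the ANOVA table is the regression mean square divided by the residual mean square. A minimal sketch for the example data follows, with illustrative names. It computes only the statistic; the p-value ("Sig F") comes from the F distribution with 1 and n - 2 degrees of freedom and is taken from Excel here.

```python
temps = [67, 87, 87, 68, 72, 62, 76, 65, 73]
strengths = [75, 65, 67, 70, 82, 67, 51, 52, 50]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(strengths) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, strengths))
s_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in temps)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temps, strengths))

df_regression = 1        # one predictor (temperature)
df_residual = n - 2      # observations minus estimated parameters
msr = ssr / df_regression
mse = sse / df_residual
f_stat = msr / mse
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.4f}")  # F = 0.0124
```

With F this close to 0 and Sig F = 0.9144 in the Excel output, the null hypothesis is not rejected at any conventional significance level.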

T-Test

Each parameter in the regression equation (each β) has its own hypothesis test that tests whether that specific parameter is 0.

Excel provides the p-value for each of these tests.

H0: β0 = 0   H1: β0 ≠ 0
H0: β1 = 0   H1: β1 ≠ 0   (equivalent to H0: ρ = 0, H1: ρ ≠ 0)
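Each t statistic is the estimated coefficient divided by its standard error. The sketch below assumes the usual standard-error formulas for simple linear regression, SE(b1) = s / sqrt(Sxx) and SE(b0) = s · sqrt(1/n + x̄² / Sxx). The names are illustrative, and the p-values are left to Excel.

```python
import math

temps = [67, 87, 87, 68, 72, 62, 76, 65, 73]
strengths = [75, 65, 67, 70, 82, 67, 51, 52, 50]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(strengths) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, strengths))
s_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temps, strengths))
s = math.sqrt(sse / (n - 2))

# Standard errors of the slope and the intercept.
se_b1 = s / math.sqrt(s_xx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

t_b1 = b1 / se_b1
t_b0 = b0 / se_b0
print(f"slope:     SE = {se_b1:.3f}, t = {t_b1:.3f}")   # SE = 0.471, t = -0.111
print(f"intercept: SE = {se_b0:.3f}, t = {t_b0:.3f}")   # SE = 34.614, t = 1.969
```

In simple linear regression the square of the slope's t statistic equals the F statistic (about 0.0124 here), which is why the slope's p-value and Sig F are both 0.914.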

Excel

Regression Statistics
Multiple R           0.042
R Square             0.002
Adjusted R Square   -0.141
Standard Error      11.989
Observations         9

ANOVA
             df   SS          MS         F        Sig F
Regression   1       1.7840     1.7840   0.0124   0.9144
Residual     7    1006.2160   143.7451
Total        8    1008

             Coef     SE       t Stat   P-value
Intercept    68.164   34.614    1.969   0.090
Temp         -0.052    0.471   -0.111   0.914

EXAMPLE 11-1 (p. 408)
→ Table 11-2 (p. 409)
EXAMPLE 11-2 (p. 417)
→ Excel, table 11-2
EXAMPLE 11-4 (p. 421)
EXAMPLE 11-5 (p. 422)
EXAMPLE 11-6 (p. 424)
