2.5 Modeling Real-World Data
[Figure: scatter plot of Series1; x-axis: years since 1999 (0 to 8); y-axis: 0 to 1,200,000]
We want to draw a line that passes close to all the points.
This line is called the best-fit line.
We use linear regression to find it.
Linear Regression
Most commonly, linear regression refers to a
model in which the conditional mean of y
given the value of X is an affine function of
X.
What!
It is the line that fits closest to all the points.
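In symbols, the definition above and the "closest line" idea can be written as follows. This is one common formulation, given as a sketch: the names a, b, x_i, y_i and n are notation introduced here, not taken from the slides, and "closest" is assumed to mean the smallest sum of squared vertical distances (least squares).

```latex
\mathbb{E}[\, y \mid X = x \,] = a + b x,
\qquad
(\hat{a}, \hat{b}) = \arg\min_{a,\, b} \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
```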
How do we find it?
We can do it by hand (yeah).
We have to find the mean of the x-values (the domain)
and the mean of the y-values (the range). Then
subtract the mean of x from each x, and do
the same for each y and the mean of y. Square
the
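The by-hand description above is cut off in the slides, so the rest of the procedure is not known from the source. The sketch below uses the standard least-squares formulas, which start the same way (means, then deviations from the means); it may differ from how the slide went on. The function name fit_line and the five data points are hypothetical, since the chart's actual values are not listed.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n          # mean of the domain
    mean_y = sum(ys) / n          # mean of the range

    # Subtract the mean from each x and from each y.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Sum of squared x-deviations, and sum of products of paired deviations.
    sxx = sum(d * d for d in dx)
    sxy = sum(a * b for a, b in zip(dx, dy))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, NOT the values plotted in the chart.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 2, 5, 4]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.1f}x + {intercept:.1f}")
```

The function assumes at least two points with different x-values; otherwise the division by sxx fails.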
[Figure: scatter plot of Series1; x-axis: years since 1999 (0 to 8); y-axis: 0 to 1,200,000]
In your Excel spreadsheet, go to the Chart menu.
Choose Add Trendline.
Choose Linear.
And a line is born.
[Figure: "female out of school" scatter plot with a linear trendline; series: Series1, Linear (Series1); x-axis: years since 1999 (0 to 8); y-axis: 0 to 1,200,000]
Want to find the equation?
Go to Add Trendline again.
Click on the Options tab.
Check the box that says "Display equation on chart".
[Figure: "female out of school" scatter plot with a linear trendline; series: Series1, Linear (Series1); x-axis: years since 1999 (0 to 8); y-axis: 0 to 1,200,000]
Want to find the equation?
Move the equation so you can read it.
[Figure: scatter plot with a linear trendline and its equation displayed; series: Series1, Linear (Series1); x-axis: years since 1999 (0 to 8); y-axis: 0 to 1,200,000]
y = 62492x + 412671
Now that we have the equation, we can predict.
How many females will be out of school in the year
2019?
That is twenty years after the study began.
y = 62492x + 412671; x = 20
y = 62492(20) + 412671
y = 1,662,511
Is this number correct? Why or why not?
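The arithmetic above can be checked with a few lines of Python. The slope and intercept come from the trendline equation on the chart; the function name predict is just an illustrative choice.

```python
# Coefficients from the trendline equation y = 62492x + 412671.
SLOPE = 62492
INTERCEPT = 412671


def predict(years_since_1999):
    """Evaluate the trendline at a given number of years after 1999."""
    return SLOPE * years_since_1999 + INTERCEPT


print(predict(20))  # 1662511
```

Note that x = 20 lies well outside the 0 to 8 range shown on the chart's horizontal axis, which is worth keeping in mind when answering the question above.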
Let's try another one.
http://www.gapminder.org/data/