Chapter 15

Chapter 15: Describing Relationships: Regression, Prediction, and Causation
Objective:
- Learn how to apply statistical methods in order to predict one variable from other variables.
- Distinguish between the ability to predict one variable from others and the issue of whether
changes in one variable are caused by changes in others.
In short: look for the things that make the value of something go up or down, and use them to predict it.
Regression Lines
A regression line is a straight line that describes how a response variable y changes as an explanatory
variable x changes. We often use a regression line to predict the value of y for a given value of x.
- If a scatterplot shows a straight-line relationship between two quantitative variables, we
summarize the overall pattern by drawing a line on the graph.
- A regression line summarizes the relationship between two variables, but only in one specific setting:
one of the variables helps explain or predict the other.
Ex.1
Ex.2
Regression Equations
The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical
distances of the data points from the line as small as possible.
15.2: different people might draw different lines by eye.
Because we want to predict y from x, we want a line that is close to the points in the vertical direction.
15.4 demonstrates the least-squares idea.
It focuses on 3 points: to find the least-squares line, look at the vertical distances from the points to
the line, square them, and move the line until the sum of the squares is the smallest it can be for any line.
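The least-squares idea can be sketched in a few lines of Python. The three points and the two "by eye" lines below are invented for illustration and are not the data behind 15.4; the sketch only shows that the least-squares line for these points gives a smaller sum of squared vertical distances than the other candidates.

```python
# Hypothetical data: three (x, y) points, invented for illustration only.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]


def sum_squared_vertical_distances(a, b, data):
    """Sum of squared vertical distances from each point to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in data)


# Each candidate is (intercept a, slope b). The first two are lines someone
# might draw by eye; the third is the least-squares line for these points.
candidates = {
    "by eye 1": (0.0, 1.5),
    "by eye 2": (1.0, 1.0),
    "least squares": (1 / 3, 1.5),
}

for name, (a, b) in candidates.items():
    print(name, round(sum_squared_vertical_distances(a, b, points), 4))
```

Moving the line away from the least-squares position in any direction can only make the printed sum larger.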
In writing the equation of a line, x = explanatory variable and y = response variable:
y = a + bx
b is the slope of the line, the amount by which y changes when x increases by one unit.
a is the intercept, the value of y when x = 0.
- To use the equation for prediction, just substitute the x value into the equation and calculate the
resulting y.
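As a minimal sketch of how the intercept a and slope b can be computed and then used for prediction, the Python below applies the standard closed-form least-squares formulas (slope = sum of products of deviations divided by sum of squared x deviations; intercept = mean of y minus slope times mean of x). The function names and the data are hypothetical, chosen so the points lie exactly on y = 1 + 2x.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope: change in y per one-unit increase in x
    a = y_bar - b * x_bar  # intercept: predicted y when x = 0
    return a, b


def predict(a, b, x):
    """Substitute x into y = a + b*x."""
    return a + b * x


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = least_squares_line(xs, ys)
print("intercept a =", a, "slope b =", b)
print("predicted y at x = 2.5:", predict(a, b, 2.5))
```

With real data the points will not sit exactly on the line, but the same two steps apply: fit a and b, then substitute x.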
Ex.3
Understanding Prediction
We can use several explanatory variables to predict a response.
E.g., as part of the college admissions process, a college might use SAT Math and Verbal scores and high
school grades in math, English, and science (five explanatory variables) to predict first-year college grades.
- All statistical methods of predicting a response share some basic properties of least-squares regression
lines.
- Prediction is based on fitting some “model” to a set of data; e.g., in 15.1 and 15.2 our model is a
straight line that we draw through the points in a scatterplot. Other prediction methods use
more elaborate models.
- Prediction works best when the model fits the data closely. In 15.1 the data closely follow a
line; in 15.2 they do not. Prediction is more trustworthy in 15.1 because the pattern is
stronger. If the pattern is not easy to see and there is a lot of variation, there is no strong pattern
and prediction is inaccurate.
- Prediction outside the range of the available data is risky. Such prediction is referred to as
extrapolation
o (making predictions based on data that you do not have)
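One way to make the extrapolation warning concrete is to have the prediction step report whether the requested x lies outside the range of the data used to fit the line. This is only a sketch: the function name, the line y = 1 + 2x, and the x range of 1 to 4 are all hypothetical.

```python
def predict_with_range_check(a, b, x, x_min, x_max):
    """Return (prediction, is_extrapolation) for the line y = a + b*x.

    is_extrapolation is True when x lies outside [x_min, x_max], the range
    of x values that were available when the line was fitted.
    """
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation


# Hypothetical fitted line y = 1 + 2x, fitted on x values from 1 to 4.
a, b = 1.0, 2.0
for x in (2.5, 10.0):
    y_hat, risky = predict_with_range_check(a, b, x, x_min=1.0, x_max=4.0)
    note = "extrapolation, treat with caution" if risky else "inside data range"
    print(f"x = {x}: predicted y = {y_hat} ({note})")
```

The formula happily produces a number for any x; the flag is a reminder that nothing in the data supports the line beyond the observed range.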
Ex.4
Correlation and Regression
Correlation measures the direction and strength of a straight-line relationship
Regression draws a line to describe the relationship
They are both affected by outliers
The correlation for all 50 states is r = 0.510. (If we leave out Hawaii, it drops to r = 0.248.)
The solid red line is the least-squares line for predicting the annual record from the 23-hour record.
(Leaving out Hawaii, the least-squares line drops down to the dotted line.)
- Line is nearly flat once we drop Hawaii
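The effect of a single outlier on both the correlation and the least-squares slope can be reproduced with a small made-up data set. The numbers below are not the state precipitation records; they are invented so that five points show almost no pattern and a sixth point sits far from the rest.

```python
def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def slope(xs, ys):
    """Slope b of the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: the last point (20, 15) is an outlier in both x and y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
ys = [3.0, 1.0, 4.0, 2.0, 3.0, 15.0]

r_all, b_all = correlation(xs, ys), slope(xs, ys)
r_without, b_without = correlation(xs[:-1], ys[:-1]), slope(xs[:-1], ys[:-1])

print("with outlier:    r =", round(r_all, 3), " slope =", round(b_all, 3))
print("without outlier: r =", round(r_without, 3), " slope =", round(b_without, 3))
```

Dropping the one extreme point makes both the correlation and the slope fall sharply, the same qualitative behavior described for Hawaii.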
The square of the correlation, r^2, is the proportion of the variation in the values of y that is explained by
the least-squares regression of y on x.
The usefulness of the regression line for prediction depends on the strength of the association.
- How useful the regression line is depends on the correlation between the variables.
o The square of the correlation is the right measure.
- When there is a straight-line relationship, some of the variation in y is accounted for by the fact
that as x changes it pulls y along with it.
Ex.5
In reporting a regression, it is usual to give r^2 as a measure of how successful the regression was in
explaining the response.
- When you see a correlation, square it to get a better feel for the strength of the association. Perfect
correlation means the points lie exactly on a line.
- E.g., r = 1 or r = -1 gives r^2 = 1, and all the variation in one variable is accounted for by the
straight-line relationship with the other variable.
- E.g., r = -0.7 or r = 0.7 gives r^2 = 0.49, and about half the variation is accounted for by the
straight-line relationship. (On the r^2 scale, correlation +-0.7 is about halfway between 0 and +-1.)
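The statement that the squared correlation is the proportion of variation explained can be checked numerically. The sketch below uses invented data, fits the least-squares line, and compares the squared correlation with 1 minus (unexplained variation / total variation); for a least-squares line with an intercept the two agree.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data, invented for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

a, b = least_squares_line(xs, ys)
y_bar = sum(ys) / len(ys)

total_variation = sum((y - y_bar) ** 2 for y in ys)
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - unexplained / total_variation

r = correlation(xs, ys)
print("r =", round(r, 4))
print("r squared =", round(r ** 2, 4))
print("fraction of variation explained =", round(explained_fraction, 4))
```

Here r = 0.8 looks strong, yet only 64% of the variation in y is accounted for by the line, which is why squaring gives a better feel for the strength of the association.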
The Question of Causation
Statistics and causation
1. A strong relationship between two variables does not always mean that changes in one variable
cause changes in the other.
2. The relationship between two variables is often influenced by other variables lurking in the
background.
3. The best evidence for causation comes from randomized comparative experiments.
Ex.6
Ex.7
Ex.8
Evidence for Causation
4. The observed relationship between two variables may be due to direct causation, common
response, or confounding. Two or more of these factors may be present together.
5. An observed relationship can, however, be used for prediction without worrying about
causation as long as the patterns found in past data continue to hold true.
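Common response can be illustrated with a small simulation. In the sketch below, a lurking variable z drives both x and y, while x has no effect on y at all; the variable names, the noise level, and the seed are all assumptions made for the illustration.

```python
import random


def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# z is the lurking variable. x and y each respond to z plus their own noise;
# neither x nor y appears in the other's formula.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r = correlation(x, y)
print("correlation between x and y:", round(r, 3))
```

The correlation comes out strong even though changing x would do nothing to y. The relationship could still be used to predict y from x, as long as z keeps behaving the same way.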
It is possible to build a strong case for causation in the absence of experiments.
Criteria for establishing causation when we cannot do an experiment:
- The association is strong: the association between smoking and lung cancer is very strong.
- The association is consistent: many studies of different kinds of people in many countries link
smoking to lung cancer. (This reduces the chance that a lurking variable explains the association.)
- Higher doses are associated with stronger responses: people who smoke more get cancer more
often.
- The alleged cause precedes the effect in time: lung cancer develops after years of smoking, and the
number of men dying from lung cancer rose as smoking became more common, with a lag of about 30
years.
- The alleged cause is plausible: experiments with animals show that tars from cigarette smoke do
cause cancer.
Correlation, Prediction, and Big Data

Chapter 17 Thinking about Chance
Learn fundamental concepts about chance behavior
Interpret probabilities like 1 in 133,000
Identify common myths concerning chance behavior, such as misinterpretations of the law of averages
and the notion of the “hot hand” in sports.
The Idea of Probability
Chance behavior is unpredictable in the short run but has a regular and predictable distribution of
outcomes in the long run
- E.g., when tossing a coin, the result cannot be predicted in advance because it will vary from toss to
toss, but there is still a regular pattern in the results.
Ex.1
- Random in statistics is a description of events that are unpredictable in the short run but that
exhibit a kind of order that emerges only in the long run.
- Random is not the same as haphazard, which means lacking any principle of organization.
We call a phenomenon random if individual outcomes are uncertain but there is, nonetheless, a regular
distribution of outcomes in a large number of repetitions.
The probability of any outcome of a random phenomenon is a number between 0 and 1 that describes
the proportion of times the outcome would occur in a very long series of repetitions.
Ex.2
- An outcome with probability 0 never occurs

- An outcome with probability 1 happens on every repetition
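The idea of probability as a long-run proportion can be sketched with a simulated fair die (an assumption; any random phenomenon would do). The function name and seed below are hypothetical. The observed proportion of sixes is erratic for a handful of rolls and settles near 1/6 as the number of rolls grows.

```python
import random


def proportion_of_sixes(n_rolls, rng):
    """Roll a simulated fair six-sided die n_rolls times; return the proportion of sixes."""
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


rng = random.Random(42)  # fixed seed so the run is repeatable
for n_rolls in (10, 100, 10_000, 100_000):
    print(n_rolls, "rolls: proportion of sixes =", round(proportion_of_sixes(n_rolls, rng), 4))

print("long-run value 1/6 =", round(1 / 6, 4))
```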
The Ancient History of Chance
- In ancient times, it was “rolling the bones”

o Tossing the astragalus (six-sided animal heel bone that, when thrown, will come to rest
on one of four sides)
Myths about Chance behavior
The key question: what would happen if we did this many times?
The myth of short-run regularity: the idea of probability is that randomness is regular in the long run.
- Our intuition wrongly says that random phenomena should also be regular in the short run, so when
they are not, we look for an explanation other than chance variation.
Ex.4
Ex.5
The myth of the surprising coincidence: matching numbers can be a legitimate coincidence produced by chance alone.
Ex.6
Ex.7
The myth of the law of averages: if you have been winning too much, you are due to lose. “The law of
averages says that you must lose now so that the wins and losses will balance out.”
Ex.8
Law of averages; Is there a law of averages?
- No, knowing the outcome of one trial does not change the probability for the outcomes of any
other trials
- The proportion of heads gradually becomes closer and closer to 0.5 as the number of tosses
increases.
- How far the total number of heads differs from exactly half of the tosses varies more and more
as the number of tosses increases.
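The difference between the proportion of heads and the count of heads can be seen in a short simulation of a fair coin. The seed and checkpoints below are arbitrary choices. At each checkpoint the sketch reports how far the proportion is from 0.5 and how far the count is from exactly half the tosses; the two are linked by count gap = n times proportion gap, so the proportion can close in on 0.5 while the count gap stays large. The typical count gap grows roughly like the square root of n, although in any single run it need not grow at every checkpoint.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable
checkpoints = (100, 1_000, 10_000, 100_000)

heads = 0
results = []  # (n, gap in proportion from 0.5, gap in count from n/2)
for n in range(1, max(checkpoints) + 1):
    heads += rng.random() < 0.5  # True counts as 1 head
    if n in checkpoints:
        proportion_gap = abs(heads / n - 0.5)
        count_gap = abs(heads - n / 2)
        results.append((n, proportion_gap, count_gap))

for n, proportion_gap, count_gap in results:
    print(f"{n:>7} tosses: |proportion - 0.5| = {proportion_gap:.4f}, "
          f"|heads - n/2| = {count_gap:.1f}")
```

Nothing in the loop remembers earlier tosses, which is the point of the answer above: the coin does not compensate for past outcomes.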
Personal Probabilities
Probability is based on data about many repetitions of the same random phenomenon.
A personal probability of an outcome is a number between 0 and 1 that expresses an individual’s
judgment of how likely the outcome is. (It is not a probability in the long-run-proportion sense.)
- Personal probabilities are different in kind from probabilities as “proportions in many
repetitions”
o They express individual opinion, so they can’t be said to be right or wrong
- There is no reason a person’s degree of confidence in the outcome of one try must agree with
the results of many tries
o Personal probability and what happens in many trials are different ideas
Probability and Risk
- We use probabilities from data to describe the risk of an unpleasant event.
Ex.9
- We feel safer when a risk seems under control.
- It is hard to comprehend very small probabilities.
Chapter 18 Probability Models
Describe random phenomena by listing possible outcomes and their associated probabilities
Learn the basic rules that probabilities must obey
Understand the relation between the odds of an event and the probability of an event
Probability Models
A probability model for a random phenomenon describes all the possible outcomes and says how to
assign probabilities to any collection of outcomes. We sometimes call a collection of outcomes an
event.
Probability Rules
A. Any probability is a number between 0 and 1.
B. All possible outcomes together must have probability 1.
C. The probability that an event does not occur is 1 minus the probability that the event does occur.
D. If two events have no outcomes in common, the probability that one or the other occurs is the
sum of their individual probabilities.
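Rules A through D can be checked on a small probability model. The sketch below assumes a fair six-sided die as the model; the helper names and the two example events are hypothetical.

```python
def is_valid_model(model, tol=1e-9):
    """Rules A and B: every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in model.values())
    sums_to_one = abs(sum(model.values()) - 1.0) < tol
    return in_range and sums_to_one


def prob(model, event):
    """Probability of an event, given as a collection of outcomes."""
    return sum(model[outcome] for outcome in event)


# Assumed model: a fair six-sided die, each face with probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}

even = {2, 4, 6}
low_odd = {1, 3}  # shares no outcomes with `even`

print("valid model:", is_valid_model(die))

# Rule C: P(not even) = 1 - P(even)
not_even = set(die) - even
print("P(not even) =", round(prob(die, not_even), 4),
      " 1 - P(even) =", round(1 - prob(die, even), 4))

# Rule D: for events with no outcomes in common, P(A or B) = P(A) + P(B)
print("P(even or low_odd) =", round(prob(die, even | low_odd), 4),
      " P(even) + P(low_odd) =", round(prob(die, even) + prob(die, low_odd), 4))
```

Rule D applies only because the two events are disjoint; for overlapping events the shared outcomes would be counted twice.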
