
INTRODUCTION TO STATISTICS & PROBABILITY

Chapter 2:
Looking at Data–Relationships

Dr. Nahid Sultana



Chapter 2:
Looking at Data–Relationships

Introduction
2.4 Least-Squares Regression
2.5 Cautions about Correlation and Regression
2.6 Data Analysis for Two-Way Tables

Introduction

Objectives

➢ Relationships
➢ Scatterplots
➢ Correlation

Bivariate data

➢ For each individual studied, we record data on two variables.


➢ We then examine whether there is a relationship between these two
variables:

Size and price of a coffee beverage: Suppose you visited a local Starbucks
to buy a Mocha. The barista explains that this blended coffee beverage comes
in three sizes: small, medium, and large, and the prices are $3.15, $4.65,
and $5.15, respectively.

✓ There is a clear association between the size of the Mocha and its price.

Associations Between Variables


Many interesting examples of the use of statistics involve relationships
between pairs of variables.

Two variables measured on the same cases are associated if


knowing the value of one of the variables tells you something about
the values of the other variable that you would not know without this
information.

➢ A response (dependent) variable measures an outcome of a study.


➢ An explanatory (independent) variable explains changes in the
response variable.
Scatterplot

➢ The most useful graph for displaying the relationship between two
quantitative variables on the same individuals is a scatterplot.

How to Make a Scatterplot


1. Decide which variable should go on which axis.
2. Typically, the explanatory or independent variable is plotted
on the x-axis, and the response or dependent variable is plotted
on the y-axis.
3. Label and scale your axes.
4. Plot individual data values.
Scatterplot (Cont…)

Example: Make a scatterplot of the relationship between body weight and
backpack weight for a group of hikers.

Body weight (lb):     120  187  109  103  131  165  158  116
Backpack weight (lb):  26   30   26   24   29   35   31   28
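The slides do not tie this example to any particular software; as an illustration only, here is a minimal sketch of how the scatterplot could be drawn in Python (matplotlib assumed):

```python
# Sketch: scatterplot of the hiker data above (matplotlib assumed)
import matplotlib.pyplot as plt

body = [120, 187, 109, 103, 131, 165, 158, 116]   # explanatory variable (x)
backpack = [26, 30, 26, 24, 29, 35, 31, 28]       # response variable (y)

plt.scatter(body, backpack)
plt.xlabel("Body weight (lb)")
plt.ylabel("Backpack weight (lb)")
plt.show()
```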

Interpreting Scatterplots

How to Examine a Scatterplot

➢ After plotting two variables on a scatterplot, we describe the overall
pattern of the relationship. Specifically, we look for form, direction,
and strength.
Form: linear, curved, clusters, no pattern
Direction: positive, negative, no direction
Strength: how closely the points fit the “form”
➢ … and clear deviations from that pattern
Outliers: an individual value that falls outside the overall pattern of the
relationship.
Interpreting Scatterplots (Cont…)
(Form)

[Example scatterplots showing three forms: linear, no relationship, nonlinear]

Interpreting Scatterplots (Cont…)
(Direction)

Positive association: High values of one variable tend to occur together
with high values of the other variable.
Negative association: High values of one variable tend to occur together
with low values of the other variable.

Interpreting Scatterplots (Cont…)

No relationship: X and Y vary independently. Knowing X tells you


nothing about Y.

Interpreting Scatterplots (Cont…)
(Strength)

The strength of the relationship between the two variables can be


seen by how much variation, or scatter, there is around the main
form.

Interpreting Scatterplots (Cont…)
(Outliers)

In a scatterplot, outliers are points that fall outside of the overall


pattern of the relationship.

Interpreting Scatterplots (Cont…)

✓ There is one possible outlier―the hiker with the body weight of 187 pounds
seems to be carrying relatively less weight than are the other group members.


✓ There is a moderately strong, positive, linear relationship between body
weight and backpack weight.
✓ It appears that lighter hikers are carrying lighter backpacks.
Categorical variables in scatterplots

To add a categorical variable, use a different plot color or symbol for each
category.

What may look like a positive linear relationship is in fact a series of
negative linear associations. Plotting different habitats in different
colors allows us to make that important distinction.
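A minimal sketch of this idea (the habitat values below are hypothetical, since the slide's data are not reproduced here; matplotlib assumed):

```python
# Sketch: marking a categorical variable with a different color per category
import matplotlib.pyplot as plt

groups = {
    "habitat A": ([1, 2, 3, 4], [6.0, 5.1, 4.3, 3.6]),   # hypothetical values
    "habitat B": ([5, 6, 7, 8], [8.2, 7.0, 6.1, 5.4]),
}
for label, (x, y) in groups.items():
    plt.scatter(x, y, label=label)   # each category gets its own color
plt.legend()
plt.show()
```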

Categorical variables in scatterplots (Cont…)
Comparison of men and women
racing records over time.
Each group shows a very strong
negative linear relationship that
would not be apparent without the
gender categorization.
Relationship between lean body
mass and metabolic rate in men
and women.
Both men and women follow the
same positive linear trend, but
women show a stronger association.
Categorical explanatory variables
When the explanatory variable is categorical, you cannot make a
scatterplot, but you can compare the different categories side by side on
the same graph (boxplots, or mean +/− standard deviation).

Comparison of income (quantitative


response variable) for different
education levels (five categories).

But be careful in your


interpretation: This is NOT a
positive association, because
education is not quantitative.
Nonlinear Relationships
▪ There are other forms of relationships besides linear. The
scatterplot below is an example of a nonlinear form.

▪ Note that there is curvature in the relationship between x


and y.

Correlation

➢ The correlation coefficient r


➢ Properties of r
➢ Influential points

The correlation coefficient "r"

➢ The correlation coefficient is a measure of the direction and strength of
a linear relationship.
➢ Correlation can only be used to describe quantitative variables.
Categorical variables don’t have means and standard deviations.
➢ It is calculated using the mean and the standard deviation of both the x
and y variables.
➢ Suppose that we have data on variables x and y for n individuals. The
means and standard deviations of the two variables are x̄ and sx for the
x-values, and ȳ and sy for the y-values.
➢ The correlation r between x and y is

        r = (1/(n−1)) Σᵢ₌₁ⁿ [(xᵢ − x̄)/sx] [(yᵢ − ȳ)/sy]
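As an illustration only (not part of the original slides; numpy assumed), the formula above can be applied directly, here reusing the hiker data from the earlier scatterplot example:

```python
# Sketch: r computed from its definition (standardize x and y, average the products)
import numpy as np

x = np.array([120, 187, 109, 103, 131, 165, 158, 116], dtype=float)  # body weight
y = np.array([26, 30, 26, 24, 29, 35, 31, 28], dtype=float)          # backpack weight

n = len(x)
zx = (x - x.mean()) / x.std(ddof=1)   # standardized x values (sample sd)
zy = (y - y.mean()) / y.std(ddof=1)   # standardized y values
r = (zx * zy).sum() / (n - 1)         # r = (1/(n-1)) * sum of the products

print(round(r, 3))                    # same value as np.corrcoef(x, y)[0, 1]
```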


"r" ranges from -1 to +1

Properties of Correlation
➢ r is always a number between –1 and 1.
➢ r > 0 indicates a positive association.
r < 0 indicates a negative association.
➢ Values of r near 0 indicate a very
weak linear relationship.
➢ The strength of the linear relationship
increases as r moves away from 0
toward –1 or 1.
➢ The extreme values r = –1 and r = 1
occur only in the case of a perfect
linear relationship.
Properties of Correlation

1. Correlation makes no distinction between explanatory and response


variables.
2. r has no units and does not change when we change the units of
measurement of x, y, or both.
3. Positive r indicates positive association between the variables, and
negative r indicates negative association.
4. The correlation r is always a number between –1 and 1.
Cautions:
▪ Correlation requires that both variables be quantitative.
▪ Correlation does not describe curved relationships between
variables, no matter how strong the relationship is.
▪ Correlation is not resistant. r is strongly affected by a few
outlying observations.
▪ Correlation is not a complete summary of two-variable data.
2.4 Least-Squares Regression

Objectives

➢ Regression lines
➢ Least-squares regression line
➢ Facts about Least-Squares Regression
➢ Correlation and Regression

Regression line

➢ Correlation tells us about strength and direction of the linear


relationship between two quantitative variables.
➢ In regression, we study the association between two variables in order to
explain the values of one from the values of the other (i.e., to make
predictions).
➢ When there is a linear association between two variables, then a
straight line equation can be used to model the relationship.
➢ In regression the distinction between Response and Explanatory is
important.

Regression Line

A regression line is a straight line that describes how a response variable


y changes as an explanatory variable x changes.
We can use a regression line to predict the value of y for a given value of x.

Example: Predict the number of


new adult birds that join the colony
based on the percent of adult
birds that return to the colony from
the previous year.

If 60% of adults return, how


many new birds are predicted?
Regression line (Cont…)

➢ A regression line is a line that best describes the linear relationship
between the two variables, and it is expressed by means of an equation of
the form:

        ŷ = b0 + b1 x

where b1 is the slope and b0 is the intercept.

➢ Once the equation of the regression line is established, we can


use it to predict the response y for a specific value of the
explanatory variable x .
The least-squares regression line

The least-squares regression line is the line that makes the sum of
the squares of the vertical distances of the data points from the
line as small as possible.

The least-squares regression line (Cont.)

The equation of the least-squares regression line of y on x is

        ŷ = b0 + b1 x

ŷ is the predicted y value (“y hat”)
b1 is the slope
b0 is the y-intercept

How to plot the least-squares regression line

First we calculate the slope of the line:

        b1 = r (sy / sx)

where
r is the correlation,
sy is the standard deviation of the response variable y,
sx is the standard deviation of the explanatory variable x.

Once we know b1, the slope, we can calculate b0, the y-intercept:

        b0 = ȳ − b1 x̄

where x̄ and ȳ are the sample means of the x and y variables.

Typically, we use stats software.
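For illustration, a minimal sketch of these two formulas in Python (the function name and the example numbers are ours, not from the slides):

```python
# Sketch: least-squares slope and intercept from summary statistics
def lsrl_from_summary(r, x_bar, y_bar, s_x, s_y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    b1 = r * s_y / s_x          # slope: b1 = r * (sy / sx)
    b0 = y_bar - b1 * x_bar     # intercept: b0 = y-bar - b1 * x-bar
    return b0, b1

# Example call with made-up summary numbers:
b0, b1 = lsrl_from_summary(r=0.9, x_bar=10.0, y_bar=50.0, s_x=2.0, s_y=5.0)
print(b0, b1)   # 27.5 and 2.25
```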


Two different regression lines can be drawn if we interchange the roles of
x and y.

Example: Fitted line plots for the same data, with the roles of x and y
interchanged:

    Fat = 3.505 − 0.003441 NEA   (fat gain, in kilograms, against nonexercise activity, in calories)
    NEA = 745.3 − 176.1 Fat      (nonexercise activity against fat gain)

The correlation coefficient of NEA and Fat, r = −0.779, stays the same in both cases.
BEWARE!!!

Not all calculators and software use the same convention. Some use:

        ŷ = a + bx

and some use:

        ŷ = ax + b
Make sure you know what YOUR calculator gives you for a and b before
you answer homework or exam questions.
Facts About Least-Squares Regression

Least-squares is the most common method for fitting a regression line to
data. Here are some facts about least-squares regression lines.

➢ Fact 1: A change of one standard deviation in x corresponds to a change
of r standard deviations in y.
➢ Fact 2: The LSRL always passes through (x̄, ȳ).
➢ Fact 3: The distinction between explanatory and response variables is
essential.
Example: Powerboat registrations (in 1000s) and manatee deaths

Year   Powerboats (1000s)   Dead manatees
1977        447                 13
1978        460                 21
1979        481                 24
1980        498                 16
1981        513                 24
1982        512                 20
1983        526                 15
1984        559                 34
1985        585                 33
1986        614                 33
1987        645                 39
1988        675                 43
1989        711                 50
1990        719                 47

Least-squares regression line: ŷ = 0.125x − 41.4

➢ There is a positive linear relationship between the number of


powerboats registered and the number of manatee deaths.
➢ The least-squares regression line has the equation: ŷ = 0.125x − 41.4
➢ Thus, if we were to limit the number of powerboat registrations to
500,000, what could we expect for the number of manatee deaths?
        ŷ = 0.125(500) − 41.4 = 62.5 − 41.4 = 21.1   ➢ Roughly 21 manatees.
----Could we use this regression line to predict the number of manatee
deaths for a year with 200,000 powerboat registrations?
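As an illustration only (numpy assumed; not part of the original slides), the line can be refit from the table above and used for the 500 (i.e., 500,000-registration) prediction. Note that x = 200 would lie well below the observed range of roughly 447 to 719, so that prediction would be an extrapolation (see the next slide):

```python
# Sketch: refitting the manatee regression line from the table above
import numpy as np

powerboats = np.array([447, 460, 481, 498, 513, 512, 526, 559,
                       585, 614, 645, 675, 711, 719], dtype=float)   # in 1000s
manatees = np.array([13, 21, 24, 16, 24, 20, 15, 34,
                     33, 33, 39, 43, 50, 47], dtype=float)

b1, b0 = np.polyfit(powerboats, manatees, 1)   # slope ~0.125, intercept ~ -41.4
print(round(b0 + b1 * 500, 1))                 # predicted deaths at 500,000 registrations ~ 21
```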
Extrapolation !!!

Extrapolation is the use of a


regression line for prediction
far outside the range of values
of x used to obtain the line.

Such predictions are often not


accurate.

Extrapolation (cont…)

➢ Sarah’s height was plotted


against her age.
➢ Can you guess (predict)
her height at age 42
months?
➢ Can you predict her height
at age 30 years (360
months)?

Extrapolation (cont…)

➢ Regression line: ŷ = 71.95 + 0.383x
➢ Height at age 42 months? ŷ = 88
➢ Height at age 30 years? ŷ = 209.8
➢ She is predicted to be 6’10.5”
at age 30! What’s wrong?
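A quick arithmetic check of these predictions (the heights appear to be recorded in centimeters, which is an inference from the numbers, not stated on the slide): ŷ = 71.95 + 0.383(42) = 88.0, and ŷ = 71.95 + 0.383(360) = 209.8 cm ≈ 82.6 inches ≈ 6 ft 10.6 in, essentially the 6’10.5” quoted above. The line was fit to data from early childhood, so using it at x = 360 months is extrapolation.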
Coefficient of determination, r²

➢ Least-squares regression looks at the distances of the data points


from the line only in the y direction.

➢ The variables x and y play different roles in regression.

➢ Even though correlation r ignores the distinction between x and y,


there is a close connection between correlation and regression.

➢ r² is called the coefficient of determination.

➢ r² represents the percentage of the variance in y (vertical scatter from
the regression line) that can be explained by changes in x.
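As an illustration only (numpy assumed), r² can be computed two equivalent ways: as the square of the correlation, or as the fraction of the total scatter in y that the line accounts for. The manatee data from the earlier example are reused here:

```python
# Sketch: r-squared two ways for a simple least-squares fit
import numpy as np

x = np.array([447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719], float)
y = np.array([13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47], float)

r = np.corrcoef(x, y)[0, 1]
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ss_resid = ((y - y_hat) ** 2).sum()      # vertical scatter left around the line
ss_total = ((y - y.mean()) ** 2).sum()   # total scatter of y around its mean
print(r ** 2, 1 - ss_resid / ss_total)   # the two values agree
```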


r = −1, r² = 1: Changes in x explain 100% of the variations in y. Y can be
entirely predicted for any given value of x.

r = 0, r² = 0: Changes in x explain 0% of the variations in y. The values y
takes are entirely independent of what value x takes.

r = 0.87, r² = 0.76: Here the change in x only explains 76% of the change in
y. The rest of the change in y (the vertical scatter, shown as red arrows)
must be explained by something other than x.
r = –0.3, r² = 0.09, or 9%: The regression model explains not even 10% of
the variations in y.

r = –0.7, r² = 0.49, or 49%: The regression model explains nearly half of
the variations in y.

r = –0.99, r² = 0.9801, or ~98%: The regression model explains almost all of
the variations in y.
2.5 Cautions About Correlation and Regression
Objectives

➢ Residuals and residual plots


➢ Outliers and influential observations
➢ Lurking variables
➢ Correlation and causation

Residuals
A residual is the difference between an observed value of the
response variable and the value predicted by the regression line:
residual = observed y – predicted y = y − ŷ

Points above the line have a positive residual; points below the line have
a negative residual. The sum of these residuals is always 0.

[Figure: for each point, the vertical distance (y − ŷ) between the observed
y and the predicted ŷ is the residual.]

Residual plots

➢ A residual plot is a scatterplot of the regression residuals against


the explanatory variable.
➢ Residual plots help us assess the fit of a regression line.
➢ If the residuals are scattered randomly around 0, chances are your data
fit a linear model, are approximately normally distributed, and don’t
contain outliers.
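A minimal sketch of building a residual plot (numpy and matplotlib assumed), reusing the hiker data from earlier in the chapter:

```python
# Sketch: residuals plotted against the explanatory variable
import numpy as np
import matplotlib.pyplot as plt

x = np.array([120, 187, 109, 103, 131, 165, 158, 116], float)   # body weight (lb)
y = np.array([26, 30, 26, 24, 29, 35, 31, 28], float)           # backpack weight (lb)

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)    # residual = observed y - predicted y

plt.scatter(x, residuals)        # residual plot
plt.axhline(0)                   # reference line at residual = 0
plt.xlabel("Body weight (lb)")
plt.ylabel("Residual")
plt.show()
```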
The x-axis in a residual plot is the same as on the scatterplot. Only the
y-axis is different.
➢ Residuals are randomly scattered—good!

➢ Curved pattern—means the relationship you are looking at is not linear.

➢ A change in variability across the plot is a warning sign. You need to
find out why it happens, and remember that predictions made in areas of
larger variability will not be as good.
Outliers and Influential Points
An outlier is an observation that lies outside the overall pattern of the
other observations.

➢ Outliers in the y direction have large residuals.


➢ Outliers in the x direction are often influential for the least-squares
regression line, meaning that the removal of such points would
markedly change the equation of the line.

Outliers and Influential Points (cont…)

Gesell Adaptive Score and Age at First Word

From all of the data: r² = 41%
After removing child 18: r² = 11%

Cautions About Correlation and Regression
➢ Both describe linear relationships.
➢ Both are affected by outliers.
➢ Always plot the data before interpreting.
➢ Beware of extrapolation: Use caution in predicting y when x is
outside the range of observed x’s.
➢ Beware of lurking variables--these have an important effect on the
relationship among the variables in a study, but are not included in
the study.
➢ Correlation does not imply causation!
Example:
A personal trainer wants to look at the relationship between number of hours of
exercise per week and resting heart rate of her clients. The data show a linear
pattern with the summary statistics shown below:

                                            mean    standard deviation
x = hours of exercise per week                        sx = 4.8
y = resting heart rate (beats per minute)             sy = 7.2

r = −0.88

Find the equation of the least-squares regression line for predicting resting
heart rate from the hours of exercise per week.
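One worked step toward the answer, as an illustration (only the slope can be evaluated here, because the sample means x̄ and ȳ did not survive in these notes): b1 = r(sy/sx) = (−0.88)(7.2/4.8) = −1.32 beats per minute per additional hour of exercise, and the intercept would then be b0 = ȳ − b1 x̄ = ȳ + 1.32 x̄.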

2.6 Data Analysis for Two-Way Tables

Objectives

➢ The Two-Way Table


➢ Joint distribution
➢ Marginal Distribution
➢ Conditional Distributions

Two-way tables

Two-way tables summarize data about two categorical variables (or


factors) collected on the same set of individuals.
Example (Smoking Survey in Arizona): High school students were
asked whether they smoke and whether their parents smoke.
Does parental smoking influence the smoking habits of their high school
children?
Explanatory Variable: Smoking habit of student’s parents
(both smoke/ one smoke/ neither smoke)
Response variable: Smoking habit of student
(smokes/does not smoke)
To analyze the relationship we can summarize the result in a two-way table:
Two-way tables (Cont …)

Explanatory (Row) Variable: Smoking habit of student’s parents
Response (Column) Variable: Smoking habit of student

High school students were asked whether they smoke, and whether their
parents smoke (first factor: parent smoking status; second factor: student
smoking status):

                          Student smokes   Student does not smoke
Both parents smoke             400               1380
One parent smokes              416               1823
Neither parent smokes          188               1168

This 3×2 two-way table has 3 rows and 2 columns. The numbers are counts, or
frequencies.
Margins

Margins show the total for each column and each row.

                          Student smokes   Student does not smoke   Total
Both parents smoke             400               1380               1780
One parent smokes              416               1823               2239
Neither parent smokes          188               1168               1356
Total                         1004               4371               5375

The row totals (1780, 2239, 1356) are the margin for parental smoking; the
column totals (1004, 4371) are the margin for student smoking.

➢ For each cell, we can compute a proportion by dividing the cell entry by
the total sample size.
➢ The collection of these proportions is the joint distribution of the two
categorical variables.
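A minimal sketch of these computations (pandas assumed; not part of the original slides), using the counts from the table above:

```python
# Sketch: the 3x2 table, its joint distribution, and its margins
import pandas as pd

counts = pd.DataFrame(
    {"Smokes": [400, 416, 188], "Does not smoke": [1380, 1823, 1168]},
    index=["Both parents smoke", "One parent smokes", "Neither parent smokes"],
)

joint = counts / counts.values.sum()   # each cell divided by the grand total (5375)
print(joint.round(3))
print(counts.sum(axis=1))              # row margins: 1780, 2239, 1356
print(counts.sum(axis=0))              # column margins: 1004, 4371
```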
Marginal distributions
(When we examine the distribution of a single variable in a two-way table)

❖ Marginal distributions: The distribution of the column variable separately
(or the row variable separately), expressed in counts or percents.

                          Student smokes   Student does not smoke   Total
Both parents smoke             400               1380               33.1%
One parent smokes              416               1823               41.7%
Neither parent smokes          188               1168               25.2%
Total                         18.7%              81.3%              100%

For example, the marginal percent for “both parents smoke” is
1780/5375 = 33.1%, and the marginal percent for “student smokes” is
1004/5375 = 18.7%.
Marginal distribution (Cont…)

Parental smoking      Smoker   Nonsmoker   Total
Both                    400      1380      33.1%
One                     416      1823      41.7%
Neither                 188      1168      25.2%
Total                  18.7%     81.3%     100%

The marginal distributions can be displayed on separate bar graphs,
typically expressed as percents instead of raw counts. Each graph represents
only one of the two variables, ignoring the second one. Each marginal
distribution can also be shown in a pie chart.

[Bar graphs: percent of students interviewed by parental smoking status
(Both/One/Neither) and by student smoking status (Smoker/Nonsmoker).]
Conditional Distribution

A conditional distribution is the distribution of one factor for each level
of the other factor.

A conditional percent is computed using the counts within a single row or a
single column. The denominator is the corresponding row or column total
(rather than the table grand total).

                          Student smokes   Student does not smoke
Both parents smoke             400               1380
One parent smokes              416               1823
Neither parent smokes          188               1168

Percent of students who smoke when both parents smoke = 400/1780 = 22.5%
Conditional distributions (Cont…)

➢ Comparing conditional distributions helps us describe the “relationship"


between the two categorical variables.
➢ We can compare the percent of individuals in one level of factor 1 for
each level of factor 2.

400 1380
416 1823
188 1168

Conditional distribution of student smokers for different parental smoking statuses:


Percent of students who smoke when both parents smoke = 400/1780 = 22.5%
Percent of students who smoke when one parent smokes = 416/2239 = 18.6%
Percent of students who smoke when neither parent smokes = 188/1356 = 13.9%
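A minimal sketch of the same conditional computations (pandas assumed; not part of the original slides):

```python
# Sketch: conditional distributions, dividing each row by its own row total
import pandas as pd

counts = pd.DataFrame(
    {"Smokes": [400, 416, 188], "Does not smoke": [1380, 1823, 1168]},
    index=["Both parents smoke", "One parent smokes", "Neither parent smokes"],
)
conditional = counts.div(counts.sum(axis=1), axis=0)   # row percents, not grand-total percents
print((100 * conditional).round(1))                    # "Smokes" column: 22.5, 18.6, 13.9
```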
Conditional distributions (Cont…)

The conditional distributions can be compared graphically by displaying the
percents making up one level of one factor, for each level of the other
factor.

Conditional distribution of student smoking status for different levels of
parental smoking status:

                          Percent who smoke   Percent who do not smoke   Row total
Both parents smoke               22%                   78%                 100%
One parent smokes                19%                   81%                 100%
Neither parent smokes            14%                   86%                 100%


Conditional Distribution
➢ In the table below, the 25 to 34 age group occupies the first column.


Conditional distributions (Cont…)

Here the percents are calculated by age range (columns).

        29.30% = 11071 / 37785 = cell total / column total


The conditional distributions can be compared graphically using side-by-side
bar graphs of one variable for each value of the other variable. Here, the
percents are calculated by age range (columns).



Young adults by gender and chance of getting rich by age 30

Female Male Total


Almost no chance 96 98 194
Some chance, but probably not 426 286 712
A 50-50 chance 696 720 1416
A good chance 663 758 1421
Almost certain 486 597 1083
Total 2367 2459 4826

What are the variables described by this two-way table?

How many young adults were surveyed?


Marginal Distribution

Young adults by gender and chance of getting rich

Examine the marginal distribution of chance of getting rich.

                               Female   Male   Total
Almost no chance                  96      98     194
Some chance, but probably not    426     286     712
A 50-50 chance                   696     720    1416
A good chance                    663     758    1421
Almost certain                   486     597    1083
Total                           2367    2459    4826

Response             Percent
Almost no chance     194/4826 = 4.0%
Some chance          712/4826 = 14.8%
A 50-50 chance       1416/4826 = 29.3%
A good chance        1421/4826 = 29.4%
Almost certain       1083/4826 = 22.4%
Conditional Distribution

Young adults by gender and chance of getting rich

                               Female   Male   Total
Almost no chance                  96      98     194
Some chance, but probably not    426     286     712
A 50-50 chance                   696     720    1416
A good chance                    663     758    1421
Almost certain                   486     597    1083
Total                           2367    2459    4826

1. Calculate the conditional distribution of opinion among males.
2. Examine the relationship between gender and opinion.

Response             Male                Female
Almost no chance     98/2459 = 4.0%      96/2367 = 4.1%
Some chance          286/2459 = 11.6%    426/2367 = 18.0%
A 50-50 chance       720/2459 = 29.3%    696/2367 = 29.4%
A good chance        758/2459 = 30.8%    663/2367 = 28.0%
Almost certain       597/2459 = 24.3%    486/2367 = 20.5%
Simpson’s Paradox

Consider the acceptance rates for the following groups of men and women who
applied to college.

Counts     Accepted   Not accepted   Total
Men           198          162        360
Women          88          112        200
Total         286          274        560

Percents   Accepted   Not accepted
Men           55%          45%
Women         44%          56%

A higher percentage of men were accepted: Is there evidence of
discrimination?
Simpson’s Paradox (cont…)

Consider the acceptance rates when broken down by type of school.

BUSINESS SCHOOL
Counts     Accepted   Not accepted   Total
Men            18          102        120
Women          24           96        120
Total          42          198        240
Percents: Men 15% accepted, 85% not accepted; Women 20% accepted, 80% not accepted.

ART SCHOOL
Counts     Accepted   Not accepted   Total
Men           180           60        240
Women          64           16         80
Total         244           76        320
Percents: Men 75% accepted, 25% not accepted; Women 80% accepted, 20% not accepted.

Within each school a higher percentage of women were accepted than men.
Simpson’s Paradox (cont…)
Within each school a higher percentage of women were accepted than men.

So there is no discrimination against women!!!

➢ Lurking variables have an important effect on the relationship among the
variables in a study, but are not included in the study.

✓ Lurking variable: Applications were split between the Business School
(240) and the Art School (320).

This is an example of Simpson’s Paradox.
➢ When the lurking variable (type of school: Business or Art) is ignored,
the data seem to suggest discrimination against women.
➢ However, when the type of school is considered, the association is
reversed and suggests discrimination against men.
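As an illustration only (not part of the original slides), the reversal can be reproduced directly from the counts in the tables above:

```python
# Sketch: acceptance rates overall and within each school (Simpson's paradox)
accepted = {"Business": {"Men": 18, "Women": 24}, "Art": {"Men": 180, "Women": 64}}
applied  = {"Business": {"Men": 120, "Women": 120}, "Art": {"Men": 240, "Women": 80}}

for sex in ["Men", "Women"]:
    overall = sum(accepted[s][sex] for s in accepted) / sum(applied[s][sex] for s in applied)
    print(sex, "overall:", round(100 * overall), "%")        # 55% for men, 44% for women
    for school in ["Business", "Art"]:
        rate = accepted[school][sex] / applied[school][sex]
        print("  ", school, ":", round(100 * rate), "%")     # women higher within each school
```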
Simpson’s Paradox (cont…)

An association or comparison that holds for all of several groups


can reverse direction when the data are combined to form a
single group. This reversal is called Simpson’s paradox.
