Parth Sharma (A1004820106) Vardan Malik (A1004820147) Kanishka Bisht (A1004820145) Harshita Gupta (A1004820132)

MODULE 1
BY:-
Parth Sharma(A1004820106)
Vardan Malik(A1004820147)
Kanishka Bisht(A1004820145)
Harshita Gupta(A1004820132)
Module 1
Measures of Central Tendency
• What is a measure of central tendency?
• Measures of Central Tendency
– Mode
– Median
– Mean
• Shape of the Distribution
• Considerations for Choosing an
Appropriate Measure of Central
Tendency
What is a measure of Central
Tendency?
• Numbers that describe what is average or
typical of the distribution
• You can think of this value as where the
middle of a distribution lies.
The Mode
• The category or score with the largest
frequency (or percentage) in the
distribution.
• The mode can be calculated for variables
with levels of measurement that are:
nominal, ordinal, or interval-ratio.
The Mode: An Example
• Example: Number of Votes for Candidates for
Mayor. The mode, in this case, gives you the
“central” response of the voters: the most
popular candidate.
Candidate A – 11,769 votes
Candidate B – 39,443 votes
Candidate C – 78,331 votes
The Mode: “Candidate C”
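As a small illustration, the mode of the vote counts above can be found by picking the category with the largest frequency. This is a minimal sketch; the function name `mode_category` is an assumption made for this example, not part of the slides.

```python
# Vote counts from the mayoral example above.
votes = {
    "Candidate A": 11_769,
    "Candidate B": 39_443,
    "Candidate C": 78_331,
}

def mode_category(freq):
    """Return the category (or categories, if tied) with the largest frequency."""
    top = max(freq.values())
    return [category for category, count in freq.items() if count == top]

print(mode_category(votes))  # ['Candidate C']
```

Returning a list keeps ties visible, which matters for bimodal distributions.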
The Median
• The score that divides the distribution
into two equal parts, so that half the
cases are above it and half below it.
• The median is the middle score, or
average of middle scores in a
distribution.
Median Exercise #1 (N is odd)
Calculate the median for this hypothetical
distribution:
Job Satisfaction Frequency
Very High 2
High 3
Moderate 5
Low 7
Very Low 4
TOTAL 21
Median Exercise #2 (N is even)
Calculate the median for this hypothetical
distribution:
Satisfaction with Health Frequency
Very High 5
High 7
Moderate 6
Low 7
Very Low 3
TOTAL 28
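One way to work the two exercises is to locate the middle case (N odd) or the two middle cases (N even) in the cumulative frequencies. The sketch below assumes the categories are listed in rank order; the helper name `ordinal_median` is hypothetical.

```python
def ordinal_median(categories, freqs):
    """Median of an ordinal frequency distribution.

    categories must be in rank order. Returns the category holding the
    middle case (N odd) or the categories of the two middle cases (N even).
    """
    n = sum(freqs)
    # 1-based positions of the middle case(s)
    positions = [(n + 1) // 2] if n % 2 else [n // 2, n // 2 + 1]
    result = []
    for pos in positions:
        cumulative = 0
        for category, f in zip(categories, freqs):
            cumulative += f
            if pos <= cumulative:
                result.append(category)
                break
    return result

levels = ["Very High", "High", "Moderate", "Low", "Very Low"]
job = ordinal_median(levels, [2, 3, 5, 7, 4])     # N = 21: the 11th case
health = ordinal_median(levels, [5, 7, 6, 7, 3])  # N = 28: the 14th and 15th cases
print(job, health)  # ['Low'] ['Moderate', 'Moderate']
```

When both middle cases fall in the same category, as in Exercise #2, that category is the median.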
Finding the Median in
Grouped Data
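$$\text{Median} = L + \frac{N(.5) - Cf}{f}\, w$$
where L = lower limit of the interval containing the median, Cf = cumulative frequency below that interval, f = frequency of that interval, w = width of that interval, and N = total number of cases.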
Percentiles
• A score below which a specific percentage of
the distribution falls.
• Finding percentiles in grouped data:
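$$25\% = L + \frac{N(.25) - Cf}{f}\, w$$
where L, f, and w are the lower limit, frequency, and width of the interval containing the 25th percentile, and Cf is the cumulative frequency below that interval.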
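The grouped-data formulas for the median and the 25th percentile share one pattern, which can be sketched as a single function of the proportion p (.5 for the median, .25 for the 25th percentile). This sketch assumes the usual reading of the symbols: L is the lower limit of the interval containing the percentile, Cf the cumulative frequency below it, f its frequency, and w its width. The intervals and frequencies are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def grouped_percentile(lower_limits, width, freqs, p):
    """Percentile from grouped data: L + ((N * p - Cf) / f) * w."""
    n = sum(freqs)
    target = n * p          # N(p): the case position we are looking for
    cf = 0                  # cumulative frequency below the current interval
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cf + f >= target:
            return lower + (target - cf) / f * width
        cf += f
    raise ValueError("p must be between 0 and 1")

# Hypothetical intervals 0-10, 10-20, 20-30, 30-40 (width 10), N = 50
lower_limits = [0, 10, 20, 30]
freqs = [5, 15, 20, 10]

median = grouped_percentile(lower_limits, 10, freqs, 0.5)
q25 = grouped_percentile(lower_limits, 10, freqs, 0.25)
print(median, q25)  # 22.5 15.0
```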
The Mean
• The arithmetic average obtained by
adding up all the scores and dividing
by the total number of scores.
Formula for the Mean
$$\bar{Y} = \frac{\sum Y}{N}$$
•“Y bar” equals the sum of all the scores, Y,
divided by the number of scores, N.
Calculating the mean with
grouped scores
$$\bar{Y} = \frac{\sum fY}{N}$$
•where: f Y = a score multiplied by its frequency
Mean: Grouped Scores
Grouped Data: the Mean &
Median
•Calculate the median and mean for the grouped
frequency below.
Number of People Age 18 or older living in a U.S. Household in 1996 (GSS 1996)
Number of People Frequency
1 190
2 316
3 54
4 17
5 2
6 2
TOTAL 581
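The exercise above can be checked with a short script. This is a sketch that treats each household size as a discrete score (rather than as an interval to interpolate within): the mean is the sum of each score times its frequency divided by N, and the median is the score of the middle case.

```python
people = [1, 2, 3, 4, 5, 6]            # number of people age 18 or older
freq = [190, 316, 54, 17, 2, 2]        # frequencies from the table above

n = sum(freq)                          # 581
mean = sum(f * y for y, f in zip(people, freq)) / n   # sum of fY over N

# N is odd, so the median is the score of case number (N + 1) / 2.
middle = (n + 1) // 2
cumulative = 0
for y, f in zip(people, freq):
    cumulative += f
    if middle <= cumulative:
        median = y
        break

print(round(mean, 2), median)  # 1.85 2
```

The mean sits below the median here because most households report one or two adults.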
Shape of the Distribution
• Symmetrical (mean is about equal to
median)
• Skewed
– Negatively (example: years of education)
mean < median
– Positively (example: income)
mean > median
• Bimodal (two distinct modes)
• Multi-modal (more than 2 distinct modes)
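The mean-versus-median comparison above can be turned into a rough check of skew direction. The sketch below is only a rule of thumb (equal mean and median do not guarantee a symmetrical shape), and the scores are hypothetical.

```python
from statistics import mean, median

def skew_direction(scores, tol=1e-9):
    """Compare mean and median as a rough indicator of skew."""
    m, md = mean(scores), median(scores)
    if abs(m - md) <= tol:
        return "roughly symmetrical"
    return "positive" if m > md else "negative"

incomes = [20, 25, 30, 35, 40, 200]    # hypothetical: one very high value
education = [2, 10, 12, 12, 14, 16]    # hypothetical: one very low value

print(skew_direction(incomes))    # positive  (mean > median)
print(skew_direction(education))  # negative  (mean < median)
```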
Data
• Quantitative explanatory variable X
• Quantitative response variable Y
• Objective: To quantify the linear relationship
between X and Y
04/22/22 18
Illustrative Data (Doll, 1955)
X: per capita cigarette consumption
Y: lung cancer mortality per 100,000 in 1950
n = 11
Scatterplot
Assess:
• Form
• Direction of
association
• Outliers
• Strength of
relation
Doll, 1955
• Form: linear
• Direction: positive
association
• Outlier: no clear
outliers
• Strength: difficult to
determine by eye
Correlation Coefficient r
• r ≡ Pearson’s product-moment
correlation coefficient (Karl Pearson, 1857–1936)
• Measures degree to which X
and Y “go together”
• Always between −1 and 1
• r ≈ 0 → no correlation
• r > 0 → positive correlation
• r < 0 → negative correlation
• The closer r is to 1 or −1, the
stronger the correlation
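To make the definition concrete, r can be computed from the deviations of X and Y about their means. This is a minimal sketch using the standard product-moment formula; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: sum of cross-products over the root of the product of sums of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]          # hypothetical explanatory values
y = [2, 4, 5, 4, 5]          # hypothetical response values
r = pearson_r(x, y)
print(round(r, 3))           # positive, so X and Y tend to rise together
```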

Correlational Direction and Strength
Coefficient of
determination (r²)
• Square the correlation coefficient
– r² = proportion of variance in Y
mathematically explained by X
• Illustrative data: r² = 0.737² = 0.54
– 54% of variance in lung cancer
mortality is mathematically
explained by per capita smoking
rates
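The arithmetic for the illustrative data can be checked in two lines, reading the slide's correlation as r = 0.737.

```python
r = 0.737                    # correlation for the illustrative data
r_squared = r ** 2           # coefficient of determination
print(round(r_squared, 2))   # 0.54, i.e. about 54% of the variance in Y
```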
Cautions
• Outliers
• Non-linear relations
• Confounding
(correlation is NOT
causation)
• Randomness
Outliers
Outliers can have a profound influence on r.
These data have r = 0.82, all because of a single outlying point.
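The effect of a single outlier can be reproduced with a toy data set. In this sketch the four base points are hypothetical and chosen to have no linear association; adding one extreme point pushes r close to 1. These are not the data plotted on the slide.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Four hypothetical points forming a square: no linear association.
x = [1, 1, 3, 3]
y = [1, 3, 1, 3]
r_without = pearson_r(x, y)

# The same points plus one outlier at (10, 10).
r_with = pearson_r(x + [10], y + [10])

print(round(r_without, 2), round(r_with, 2))
```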
Linear Relations Only
r = 0.00
This strong
relationship is
missed by r
because it is not
linear
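A perfect but non-linear relationship can give r = 0. The sketch below uses hypothetical data where Y is exactly X squared over a range symmetric about zero, so Y is completely determined by X yet the linear correlation vanishes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]   # a strong, purely curved relationship

r = pearson_r(x, y)
print(r)  # zero: the curve has no linear trend for r to detect
```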
Least Squares Line
Residual ≡ vertical distance of data point from regression line (dotted)
The best fitting
line minimizes
the sum of the squared residuals
Determine the intercept a and
slope b of the best fitting
line via formula,
calculator, or
computer.
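The "via formula" route can be sketched with the standard least-squares expressions: the slope b is the sum of cross-products divided by the sum of squares of X, and the intercept a is the mean of Y minus b times the mean of X. The function name and the data are hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for the line y-hat = a + b * x that minimizes squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    return a, b

x = [1, 2, 3, 4, 5]               # hypothetical explanatory values
y = [2, 4, 5, 4, 5]               # hypothetical response values
a, b = least_squares(x, y)

# Residual = observed y minus the value predicted by the fitted line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(a, 2), round(b, 2))   # 2.2 0.6
```

With an intercept in the model, the residuals of a least-squares line sum to zero, which is a quick sanity check on a fit.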
ŷ = 6.756 + 0.0228 ∙ X
Slope = “rise over run”
0.0228 increase per unit X
“Rise” over 200 units
= 200 ∙ 0.0228
= 4.56
Intercept = 6.756
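The "rise over run" reading of the slope can be checked numerically. This sketch takes the intercept of 6.756 and the per-unit increase of 0.0228 stated on the slide; the function name `predict` is an assumption made for this example.

```python
INTERCEPT = 6.756   # a: predicted Y when X = 0
SLOPE = 0.0228      # b: increase in predicted Y per unit of X

def predict(x):
    """Predicted value y-hat = a + b * x."""
    return INTERCEPT + SLOPE * x

# "Rise" over a run of 200 units of X
rise = predict(200) - predict(0)
print(round(rise, 2))  # 4.56
```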
Population Regression Model
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
where
• α ≡ intercept parameter
• β ≡ slope parameter
• εi ≡ residual error, observation i
Objective:
To estimate β with (1 – α)100% confidence
THANK YOU!