
MODULE 1

BY:-
Parth Sharma(A1004820106)
Vardan Malik(A1004820147)
Kanishka Bisht(A1004820145)
Harshita Gupta(A1004820132)

Module 1
Measures of Central Tendency
• What is a measure of central tendency?
• Measures of Central Tendency
– Mode
– Median
– Mean
• Shape of the Distribution
• Considerations for Choosing an Appropriate Measure of Central Tendency
What is a measure of central tendency?

• Numbers that describe what is average or typical of the distribution

• You can think of this value as where the middle of a distribution lies.
The Mode
• The category or score with the largest
frequency (or percentage) in the
distribution.

• The mode can be calculated for variables with levels of measurement that are nominal, ordinal, or interval-ratio.
The Mode: An Example
• Example: Number of Votes for Candidates for
Mayor. The mode, in this case, gives you the
“central” response of the voters: the most
popular candidate.
Candidate A – 11,769 votes
Candidate B – 39,443 votes
Candidate C – 78,331 votes
The Mode: “Candidate C”
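As a minimal sketch (storing the counts in a Python dictionary is an illustrative choice, not part of the slides), the mode is simply the category with the largest frequency:

```python
# Vote counts from the mayoral example above.
votes = {
    "Candidate A": 11_769,
    "Candidate B": 39_443,
    "Candidate C": 78_331,
}

# The mode is the category with the largest frequency.
# max() with key=votes.get returns the key whose count is highest.
mode = max(votes, key=votes.get)
print(mode)  # Candidate C
```

If two categories tie for the largest count, `max()` returns only the first one it encounters, so a tie would need to be checked for separately.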
The Median
• The score that divides the distribution
into two equal parts, so that half the
cases are above it and half below it.

• The median is the middle score, or the average of the two middle scores, in a distribution.
Median Exercise #1 (N is odd)
Calculate the median for this hypothetical
distribution:
Job Satisfaction    Frequency
Very High                   2
High                        3
Moderate                    5
Low                         7
Very Low                    4

TOTAL                      21
Median Exercise #2 (N is even)
Calculate the median for this hypothetical
distribution:
Satisfaction with Health    Frequency
Very High                           5
High                                7
Moderate                            6
Low                                 7
Very Low                            3

TOTAL                              28
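Both exercises can be checked with a short sketch that walks the cumulative frequencies until it reaches the middle case (case (N + 1)/2 when N is odd, cases N/2 and N/2 + 1 when N is even). The function name `ordinal_median` is hypothetical, and reporting both categories when the two middle cases differ is an assumed convention for ordinal data, since ordinal categories cannot be averaged.

```python
def ordinal_median(categories, freqs):
    """Median category of an ordered frequency table.

    categories must be listed in rank order; freqs are the matching counts.
    If the two middle cases (N even) fall in different categories, both are
    returned as a tuple (an assumed convention for ordinal data).
    """
    n = sum(freqs)
    # 1-based positions of the middle case(s).
    if n % 2 == 1:
        positions = [(n + 1) // 2]
    else:
        positions = [n // 2, n // 2 + 1]

    result = []
    for pos in positions:
        cumulative = 0
        for cat, f in zip(categories, freqs):
            cumulative += f
            if cumulative >= pos:  # this category contains case number `pos`
                result.append(cat)
                break
    return result[0] if len(set(result)) == 1 else tuple(result)


levels = ["Very High", "High", "Moderate", "Low", "Very Low"]
job = ordinal_median(levels, [2, 3, 5, 7, 4])     # N = 21: case 11
health = ordinal_median(levels, [5, 7, 6, 7, 3])  # N = 28: cases 14 and 15
print(job, health)  # Low Moderate
```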
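Finding the Median in Grouped Data

$$\text{Median} = L + \left(\frac{N(.5) - C_f}{f}\right) w$$

where L = lower limit of the interval containing the median, Cf = cumulative frequency of all intervals below that interval, f = frequency of the interval containing the median, and w = width of that interval.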
Percentiles
• A score below which a specific percentage of
the distribution falls.
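• Finding percentiles in grouped data (here, the 25th percentile):

$$25\text{th percentile} = L + \left(\frac{N(.25) - C_f}{f}\right) w$$

with L, Cf, f, and w now taken from the interval containing the 25th percentile.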
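The interpolation formula can be sketched in Python as follows. The median is the case p = .5 and the 25th percentile is p = .25. The function name, the frequency table, and the use of 0, 10, 20, 30 as the interval lower limits (rather than real limits such as 9.5) are all hypothetical choices for illustration.

```python
def grouped_percentile(lower_limits, width, freqs, p):
    """Interpolated percentile for grouped data: L + ((N*p - Cf) / f) * w.

    lower_limits: lower limit of each interval, in ascending order
    width:        common interval width w
    freqs:        frequency f of each interval
    p:            proportion, e.g. 0.5 for the median, 0.25 for the 25th percentile
    """
    n = sum(freqs)
    target = n * p
    cf = 0  # cumulative frequency of the intervals below the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cf + f >= target:  # this interval contains the target case
            return lower + (target - cf) / f * width
        cf += f
    raise ValueError("p must be between 0 and 1")


# Hypothetical grouped data: intervals 0-9, 10-19, 20-29, 30-39 (N = 40).
limits, freqs = [0, 10, 20, 30], [5, 10, 15, 10]
median = grouped_percentile(limits, 10, freqs, 0.5)
q1 = grouped_percentile(limits, 10, freqs, 0.25)
print(f"median = {median:.2f}, 25th percentile = {q1:.2f}")  # 23.33, 15.00
```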
The Mean
• The arithmetic average obtained by
adding up all the scores and dividing
by the total number of scores.
Formula for the Mean

$$\bar{Y} = \frac{\sum Y}{N}$$

• Ȳ (“Y bar”) equals the sum of all the scores, ΣY, divided by the number of scores, N.
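The formula translates directly into code. This is a minimal sketch with hypothetical scores:

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by N."""
    if not scores:
        raise ValueError("mean of empty data")
    return sum(scores) / len(scores)


scores = [3, 7, 8, 10, 12]  # hypothetical scores: sum = 40, N = 5
print(mean(scores))  # 8.0
```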
Calculating the Mean with Grouped Scores

$$\bar{Y} = \frac{\sum fY}{N}$$

• where fY = a score multiplied by its frequency
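A short sketch, using a hypothetical frequency table, shows that ΣfY / N gives the same result as writing every score out individually and averaging them:

```python
# Hypothetical frequency table: score -> frequency
table = {1: 4, 2: 3, 3: 2, 4: 1}

n = sum(table.values())                         # N = 10
sum_fy = sum(y * f for y, f in table.items())   # ΣfY = 4 + 6 + 6 + 4 = 20
grouped_mean = sum_fy / n                       # 2.0

# Same result as expanding the table back into raw scores.
raw = [y for y, f in table.items() for _ in range(f)]
print(grouped_mean, sum(raw) / len(raw))  # 2.0 2.0
```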
Mean: Grouped Scores
Grouped Data: the Mean & Median
• Calculate the median and mean for the grouped frequency distribution below.

• Number of People Age 18 or Older Living in a U.S. Household in 1996 (GSS 1996)

Number of People    Frequency
1                         190
2                         316
3                          54
4                          17
5                           2
6                           2
TOTAL                     581
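One way to check the exercise is the following sketch, which applies ΣfY / N for the mean and locates the median by cumulative frequency. Because N = 581 is odd, the median is the score of case (N + 1)/2 = 291. The code assumes an odd N and would need the two-middle-case rule for an even N.

```python
# Number of adults (18+) per household, GSS 1996, from the table above.
freq = {1: 190, 2: 316, 3: 54, 4: 17, 5: 2, 6: 2}

n = sum(freq.values())                               # 581
mean = sum(y * f for y, f in freq.items()) / n       # ΣfY / N = 1074 / 581

# Median: N is odd here, so it is the score of case (N + 1) / 2.
middle = (n + 1) // 2                                # 291
cumulative = 0
for y in sorted(freq):
    cumulative += freq[y]
    if cumulative >= middle:  # cases 191-506 all have the score 2
        median = y
        break

print(f"N = {n}, mean = {mean:.2f}, median = {median}")  # N = 581, mean = 1.85, median = 2
```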
Shape of the Distribution
• Symmetrical (mean is about equal to median)
• Skewed
– Negatively (example: years of education): mean < median
– Positively (example: income): mean > median
• Bimodal (two distinct modes)
• Multi-modal (more than 2 distinct modes)
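The mean-versus-median rule of thumb above can be sketched as follows. The data are hypothetical, and the comparison is only a rough indicator of shape, not a formal skewness measure.

```python
import statistics


def skew_direction(data, tolerance=0.0):
    """Rough shape check using the rule of thumb: compare the mean with the median."""
    m, md = statistics.mean(data), statistics.median(data)
    if abs(m - md) <= tolerance:
        return "approximately symmetrical"
    return "positively skewed" if m > md else "negatively skewed"


incomes = [20, 25, 30, 35, 40, 200]  # hypothetical, one very high value
years_edu = [4, 12, 12, 13, 14, 16]  # hypothetical, one very low value

print(skew_direction(incomes))          # positively skewed (58.33 > 32.5)
print(skew_direction(years_edu))        # negatively skewed (11.83 < 12.5)
print(skew_direction([1, 2, 3, 4, 5]))  # approximately symmetrical
```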
Data
• Quantitative explanatory variable X
• Quantitative response variable Y
• Objective: To quantify the linear relationship
between X and Y

04/22/22 18
Illustrative Data (Doll, 1955)
[Data table: per capita cigarette consumption (X) and lung cancer mortality per 100,000 in 1950 (Y); n = 11]
Scatterplot
Assess:
• Form
• Direction of association
• Outliers
• Strength of relation

Doll, 1955
• Form: linear
• Direction: positive association
• Outliers: no clear outliers
• Strength: difficult to determine by eye

Correlation Coefficient r
• r ≡ Pearson’s product-moment correlation coefficient
• Measures the degree to which X and Y “go together”
• Always between −1 and 1
• r ≈ 0 ⇒ no linear correlation
• r > 0 ⇒ positive correlation
• r < 0 ⇒ negative correlation
• The closer r is to 1 or −1, the stronger the correlation
[Portrait: Karl Pearson, 1857-1936]
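A minimal sketch of computing r from its definitional formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The Doll data are not reproduced here, so the inputs are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in X
    syy = sum((b - my) ** 2 for b in y)                   # variation in Y
    return sxy / math.sqrt(sxx * syy)  # ZeroDivisionError if X or Y is constant


x = [1, 2, 3, 4, 5]  # hypothetical data
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```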


Correlational Direction and Strength

Coefficient of Determination (r²)

• Square the correlation coefficient
⇒ r² = proportion of the variance in Y mathematically explained by X
• Illustrative data: r² = 0.737² = 0.54
⇒ 54% of the variance in lung cancer mortality is mathematically explained by per capita smoking rates
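The sketch below first squares the reported r. It then checks, on hypothetical data, that r² equals the share of the variance in Y accounted for by the least-squares line, 1 − SS_residual / SS_total.

```python
import math

r = 0.737  # correlation reported for the illustrative (Doll, 1955) data
print(f"r^2 = {r ** 2:.2f}")  # r^2 = 0.54

# Hypothetical data for checking r^2 = 1 - SS_residual / SS_total.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mx, my = sum(x) / len(x), sum(y) / len(y)
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)  # SS_total
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

r_xy = sxy / math.sqrt(sxx * syy)
b = sxy / sxx      # least-squares slope
a = my - b * mx    # least-squares intercept
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

print(round(r_xy ** 2, 6), round(1 - ss_res / syy, 6))  # 0.6 0.6
```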

Cautions
• Outliers
• Non-linear relations
• Confounding (correlation is NOT causation)
• Randomness

Outliers
Outliers can have a profound influence on r.

These data have r = 0.82, all because of a single outlying point.
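The effect can be reproduced with a small hypothetical data set. Five points with no linear association (r = 0) acquire a very high r once a single extreme point is added.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with no linear association...
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 4, 2]
r_without = pearson_r(x, y)

# ...plus one extreme point far from the rest.
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_without:.2f}")  # 0.00
print(f"with outlier:    r = {r_with:.2f}")     # 0.97
```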

Linear Relations Only

r = 0.00
This strong relationship is missed by r because it is not linear.
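A hypothetical illustration: Y is completely determined by X (Y = X²), yet r is exactly 0 because the relationship is a symmetric curve rather than a straight line.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # a perfect, but curved, relationship
r = pearson_r(x, y)
print(f"r = {r:.2f}")  # r = 0.00
```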

Least Squares Line
• Residual ≡ vertical distance of a data point from the regression line (dotted)
• The best-fitting line minimizes the sum of the squared residuals.
• Determine the intercept a and slope b of the best-fitting line via formula, calculator, or computer.
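The "via formula" route can be sketched with the standard least-squares formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The data are hypothetical. The last line checks that nudging the slope away from b makes the sum of squared residuals larger.

```python
def least_squares(x, y):
    """Return (a, b) for y-hat = a + b*x minimizing the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through (x-bar, y-bar)
    return a, b


def sse(a, b, x, y):
    """Sum of squared residuals for the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)

print(f"a = {a:.2f}, b = {b:.2f}")               # a = 2.20, b = 0.60
print(sse(a, b, x, y) < sse(a, b + 0.1, x, y))   # True
```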

ŷ = 6.756 + 0.0284 ∙ X
• Slope = “rise over run”: a .0284 increase in ŷ per unit increase in X
• “Rise” over 200 units = 200 ∙ .0284 = 5.68
• Intercept = 6.756 (the predicted ŷ when X = 0)
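The slope arithmetic can be checked with a short sketch. The fitted coefficients come from the slide. The X values 1000 and 1200 are arbitrary illustrative inputs, not values from the data.

```python
# Fitted line from the slide: y-hat = 6.756 + 0.0284 * X
A, B = 6.756, 0.0284


def predict(x):
    """Predicted lung cancer mortality per 100,000 at per capita consumption x."""
    return A + B * x


# Rise over a run of 200 units of X, computed two ways.
rise = B * 200
rise_between_points = predict(1200) - predict(1000)  # arbitrary X values

print(f"intercept = {predict(0)}")            # 6.756, the prediction at X = 0
print(f"rise over 200 units = {rise:.2f}")    # 5.68
print(f"check = {rise_between_points:.2f}")   # 5.68
```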

Population Regression Model

$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

where
• α ≡ intercept parameter
• β ≡ slope parameter
• εi ≡ residual error, observation i
Objective:
To estimate β with (1 – α)100% confidence
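A simulation sketch of the population model: data are generated from hypothetical "true" parameters, and β is then estimated by least squares. The parameter values, the normal error distribution, and the use of 1.96 as the critical value are all assumptions. The 1.96 value is a large-sample normal approximation to the t critical value for a 95% interval, so it applies only when n is large.

```python
import math
import random

random.seed(1)  # reproducible illustration

# Hypothetical "true" population parameters (not from the slides).
ALPHA, BETA, SIGMA = 2.0, 0.5, 5.0

n = 10_000
x = [random.uniform(0, 100) for _ in range(n)]
# Y_i = alpha + beta * X_i + eps_i, with eps_i ~ Normal(0, SIGMA) (assumed)
y = [ALPHA + BETA * xi + random.gauss(0, SIGMA) for xi in x]

# Least-squares estimates of alpha and beta.
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
b = sxy / sxx
a = my - b * mx

# Standard error of b and an approximate 95% interval (large-n normal approximation).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
lo, hi = b - 1.96 * se_b, b + 1.96 * se_b

print(f"a = {a:.2f}, b = {b:.3f}, approx. 95% CI for beta: ({lo:.3f}, {hi:.3f})")
```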

THANK YOU! 
