
ECON 225: Data and Statistics for Economics

Lecture 6
Review

Plan for today
• Density curves
• Normal distributions
• Standardizing a normal distribution
• Normal quantile plots
• If there is time, Excel: Boxplot (Q 1.51 in Alwan)
Density curves
NOTE: the total area under the whole distribution equals 100% = 1

• A density curve is a mathematical model of a distribution
• The total area under the curve, by
definition, is equal to 1, or 100%.
• The area under the curve for a
range of values is the proportion of
all observations that fall in that
range.
• The smooth density curve drawn on top of the histogram is a
theoretical description of the population
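
As a quick check of the two area facts above, here is a minimal Python sketch. The density f(x) = 2x on the interval from 0 to 1 is an assumed toy example (it is not from the lecture), and `area_under` is a hypothetical helper that approximates area with the trapezoid rule.

```python
def density(x):
    """Toy density curve (an assumption for illustration): f(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area_under(f, a, b, steps=1000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width


total_area = area_under(density, 0.0, 1.0)  # area under the whole curve
share = area_under(density, 0.5, 1.0)       # proportion of observations between 0.5 and 1

print(f"total area = {total_area:.3f}")
print(f"proportion between 0.5 and 1 = {share:.3f}")
```

For this toy curve the total area is 1 and the area between 0.5 and 1 is 0.75, so 75% of observations would fall in that range.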
Density curves
Density curves can come in any shape.

Some can be described mathematically, and others cannot.

[Figure: example density curves labeled uniform, left skewed, and bimodal]

NOTE: as long as the curve is continuous, you can calculate


Median and mean of a density curve
• The median of a density curve is the equal-areas point, the point that
divides the area under the curve in half
• The mean of a density curve is the balance point, at which the curve
would balance if made of solid material.

NOTE: the mean is the one that is sensitive to skewness (it is pulled toward the long tail); the median is more resistant
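
To see the equal-areas point and the balance point come apart on a skewed curve, here is a small Python sketch. The left-skewed density f(x) = 2x on the interval from 0 to 1 is an assumed toy example, and `integrate` is a hypothetical trapezoid-rule helper; none of this is from the slides.

```python
def density(x):
    """Toy left-skewed density (an assumption for illustration): f(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(g, a, b, steps=10_000):
    """Trapezoid-rule approximation of the area under g between a and b."""
    width = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * width)
    return total * width


# Mean = balance point: the integral of x * f(x).
mean = integrate(lambda x: x * density(x), 0.0, 1.0)

# Median = equal-areas point: bisect until the area to the left is 0.5.
lo, hi = 0.0, 1.0
for _ in range(40):
    mid = (lo + hi) / 2.0
    if integrate(density, 0.0, mid, steps=200) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2.0

print(f"mean = {mean:.3f}, median = {median:.3f}")
```

Here the mean (2/3) is smaller than the median (about 0.707): the long tail is on the left, and the mean is pulled toward it.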


Normal distributions
• Normal (or Gaussian) distributions are a family of symmetrical, bell-shaped
density curves defined by a mean μ (mu) and a standard deviation σ (sigma):
N(μ, σ).
• The probability density function:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

μ = mean
σ = SD
σ^2 = variance
N = Normal
X ~ N(mean, SD)
Normal distributions
same mean, different SD

Here, means are the same (μ = 15), whereas standard deviations are
different (σ = 2, 4, and 6).

same SD, different mean

Here, means are different, whereas standard deviations are the same.

NOTE: the bigger the SD, the more spread out the distribution
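
A short Python sketch of the "same mean, different SD" case, using the slide's values μ = 15 and σ = 2, 4, 6. It relies on `statistics.NormalDist` from the standard library (Python 3.8 or later); the choice to report the peak height and the middle 95% of the curve is mine, made only to put a number on "spread".

```python
from statistics import NormalDist

mu = 15
peaks = {}
for sigma in (2, 4, 6):
    curve = NormalDist(mu, sigma)
    peaks[sigma] = curve.pdf(mu)   # height of the curve at its center
    low = curve.inv_cdf(0.025)     # 2.5% of the area lies below this value
    high = curve.inv_cdf(0.975)    # 97.5% of the area lies below this value
    print(f"sigma={sigma}: peak height {peaks[sigma]:.3f}, "
          f"middle 95% from {low:.1f} to {high:.1f}")
```

As σ grows, the peak gets lower and the interval holding the middle 95% gets wider, while the center stays at 15.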


The 68-95-99.7 rule
NOTE: only for Normal distributions

• About 68% of all observations are within 1 standard deviation
(σ) of the mean (μ).
• About 95% of all observations are
within 2 σ of μ.
• Almost all (99.7%) observations
are within 3 σ of μ.
• Note: this rule only applies to
Normal distributions
Conclusion:
68% = within 1 SD of the mean
95% = within 2 SD of the mean
99.7% = within 3 SD of the mean
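
The three percentages can be checked against the standard Normal curve. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the dictionary name `within` is just an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```

The areas come out close to 0.68, 0.95, and 0.997, which is where the rule's name comes from.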
Example 1
If we notice that these histograms are approximately normal, can we approximate
the standard deviation in each?
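Use the 99.7 part of the rule.
Why? Because 99.7% is almost all of the distribution, the full width of each
histogram covers about 6 SD (3 SD on each side of the mean).

SD ≈ (1150 - 850) / 6 = 50

SD ≈ (1250 - 775) / 6 ≈ 79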
The standard normal distribution
Because all Normal distributions share the same properties, we can
standardize our data to transform any Normal curve N(μ, σ) into the
standard Normal curve N(0, 1).
Standardizing a normal distribution
• For each data point x, we calculate a new value, z (called a z-score).
• A z-score measures the number of standard deviations that a data
value x is from the mean μ.

z = (x - μ) / σ, where x is the value of the observation


When x is 1 standard deviation larger than the mean, then z = 1.

When x is 2 standard deviations larger than the mean, then z = 2.

When x is larger than the mean, z is positive.

When x is smaller than the mean, z is negative.
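
A minimal Python sketch of standardizing, assuming a Normal distribution with μ = 15 and σ = 2 (values borrowed from the earlier slide). The function name `z_score` and the data points 17, 19, and 11 are illustrative choices, not from the lecture.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


mu, sigma = 15, 2  # assumed example values

one_sd_above = z_score(17, mu, sigma)   # 1 SD above the mean
two_sd_above = z_score(19, mu, sigma)   # 2 SD above the mean
below_mean = z_score(11, mu, sigma)     # below the mean, so negative

print(one_sd_above, two_sd_above, below_mean)
```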
Example 2
Women’s heights follow the N(64.5”,
2.5”) distribution. What percent of
women are shorter than 67 inches tall
(that’s 5’6”)?
mean µ = 64.5”

standard deviation σ = 2.5”

x (height) = 67”

z = (67 - 64.5) / 2.5 = 1

Proportion = 0.8413, or 84.13%


We calculate z, the standardized value of x:
z = (x - μ) / σ = (67 - 64.5) / 2.5 = 1, so 67” is 1 standard deviation above the mean.

Because of the 68-95-99.7 rule, we can conclude that the percent of women shorter than
67” should be, approximately, 0.68 + half of (1 - 0.68) = 0.84, or 84%.
Using the standard normal table
The table gives the area under the standard Normal curve to the left of any
z-value.
Find the percent of women shorter than 67” using the table
For z = 1.00, the area under
the standard Normal curve to
the left of z is 0.8413.

Conclusion:
84.13% of women are shorter than 67”.

By subtraction, 1 - 0.8413 = 0.1587, or 15.87% of women are taller than 67”.
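
The table lookup can be reproduced in software. This sketch uses `statistics.NormalDist` from the Python standard library in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # women's heights in inches, from the example

z = (67 - heights.mean) / heights.stdev   # standardized value of 67 inches
shorter = NormalDist().cdf(z)             # area to the left of z under N(0, 1)
taller = 1 - shorter                      # remaining area to the right

print(f"z = {z:.2f}, shorter = {shorter:.4f}, taller = {taller:.4f}")
```

This agrees with the table values of 0.8413 and 0.1587 to four decimal places.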
Calculating the area to the right of z
Because the Normal distribution is symmetrical, there are two
ways that you can calculate the area under the standard
Normal curve to the right of a z-value.
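
My reading of the two ways (the slide's figure did not survive extraction, so treat this as an assumption): either subtract the left-hand area from 1, or use symmetry and look up the area to the left of -z. A small Python check with `statistics.NormalDist`, using z = 1 as an example value:

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)
z = 1.0                    # example z-value

right_by_subtraction = 1 - std_normal.cdf(z)  # way 1: total area minus area to the left
right_by_symmetry = std_normal.cdf(-z)        # way 2: area to the left of -z

print(f"{right_by_subtraction:.4f} {right_by_symmetry:.4f}")
```

Both routes give the same area, about 0.1587 for z = 1.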
Calculating the area in between two values
• To calculate the area between two z-values, first get the area under
N(0, 1) to the left of each z-value from the table.
• Then subtract the smaller area from the larger area.

area between z1 and z2 (with z1 > z2) =
area left of z1 - area left of z2
Example 3
The National Collegiate Athletic Association (NCAA) requires Division I
athletes to score at least 820 on the combined math and verbal SAT exam to
compete in their first college year. The SAT scores of 2003 were
approximately normal with mean 1026 and standard deviation 209.
What proportion of all students would be NCAA qualifiers (SAT ≥ 820)?
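X ~ N(1026, 209)
z = (820 - 1026) / 209 = -0.9856, roughly -0.99
Area to the left of z = -0.99 is 0.1611
Proportion = 1 - 0.1611 = 0.8389, or 83.89%

NOTE: for a ≥ question, compute 1 - (area to the left of z), or equivalently look up the area to the left of -z in the table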
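
A sketch of the same calculation in Python with `statistics.NormalDist`. It computes the answer twice: once with z rounded to two decimals, as the table forces us to do, and once without rounding. The variable names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)  # combined SAT scores, from the example

z = (820 - sat.mean) / sat.stdev      # about -0.9856

# Table-style answer: round z to two decimals before looking up the area.
p_table = 1 - NormalDist().cdf(round(z, 2))

# Software answer: no rounding of z.
p_exact = 1 - sat.cdf(820)

print(f"z = {z:.4f}, table-style = {p_table:.4f}, unrounded = {p_exact:.4f}")
```

The two answers differ slightly in the third decimal place because of the rounding of z; both say that roughly 84% of students qualify.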


Example 4
The NCAA defines a “partial qualifier” eligible to practice and receive an
athletic scholarship, but not to compete, as a combined SAT score of at least
720.
What proportion of all students who take the SAT would be partial
qualifiers? That is, what proportion have scores between 720 and 820?
Ans:
X ~ N(1026, 209)

720 ≤ X < 820

z1 = (720 - 1026) / 209 = -1.46 [area to the left = 0.0721]

z2 = (820 - 1026) / 209 = -0.99 [area to the left = 0.1611]

Proportion: 0.1611 - 0.0721 = 0.0890, or 8.9%
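
The "area between two values" step can be sketched in Python with `statistics.NormalDist`. This version does not round the z-scores, so it will not match the table answer exactly.

```python
from statistics import NormalDist

sat = NormalDist(mu=1026, sigma=209)  # combined SAT scores, from the example

left_of_820 = sat.cdf(820)  # area to the left of the upper cutoff
left_of_720 = sat.cdf(720)  # area to the left of the lower cutoff

partial_qualifiers = left_of_820 - left_of_720  # larger area minus smaller area

print(f"proportion between 720 and 820 = {partial_qualifiers:.3f}")
```

The result is about 9%, in line with the 8.9% obtained from the table.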


Finding a value when given a proportion
Inverse normal calculations: We may also want to find the observed
range of values that corresponds to a given proportion under the curve.
For that, we can use the table backward:
• We first find the desired area/proportion in the body of the table.
• We then read the corresponding z-value from the left column and top row.
Example 5: Inverse normal calculation
SAT Verbal test scores follow approximately the N(505,
110) distribution. How high must a student score to place
in the top 10% of all students taking the SAT?

Top 10% means 90% of scores lie below, so the area to the left is 0.900.

From the table, z = 1.28

1.28 = (x - 505) / 110

x - 505 = 1.28 × 110 = 140.8
x = 645.8, so a score of about 646
1. z = 1.28 is the standardized value with area 0.9 to its left and 0.1
to its right.
2. Unstandardize:
(x - 505) / 110 = 1.28
Solving for x, we get x = 505 + 1.28 × 110 = 645.8, an SAT score of about 646.
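
Using the table backward corresponds to an inverse CDF call in software. A minimal Python sketch with `statistics.NormalDist.inv_cdf`; the variable names are illustrative.

```python
from statistics import NormalDist

verbal = NormalDist(mu=505, sigma=110)  # SAT Verbal scores, from the example

# Step 1: z-value with area 0.90 to its left (top 10% to its right).
z = NormalDist().inv_cdf(0.90)

# Step 2: unstandardize, x = mu + z * sigma.
cutoff = verbal.mean + z * verbal.stdev

print(f"z = {z:.2f}, cutoff score = {cutoff:.0f}")
```

The unrounded z is a little above 1.28, and the cutoff still rounds to a score of 646.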
Questions
1. In a normal distribution, how would you describe the relative
position of the mean and the median?
They are equal: a Normal curve is symmetric, so the mean and the median are both at its center.
2. Suppose that test scores have an approximate Normal distribution
with a mean of 75 and a variance of 64. A score of 83 translates to a
z-score of 1. What are the units associated with this z-score?
A z-score is unitless: SD = √64 = 8, so z = (83 - 75) / 8 = 1, and the units in the numerator and denominator cancel.
Assessing the normality of data
• Many of the rules and methods we’ve seen only apply to Normal
distributions
• How do we tell if data follow an approximately Normal distribution?
• One way is to plot the data on a normal quantile plot:
• We find the percentile rank of each data point
• We convert those percentiles to Normal scores (that is, we find the
z-score of a Normal distribution with the same percentage of
observations to the left).
• The Normal scores are then plotted on the x-axis, with the
corresponding data points plotted on the y-axis.
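
The steps above can be sketched in Python. Two things here are assumptions, not from the lecture: the percentile convention (rank - 0.5) / n, which is one common choice that keeps every percentile strictly between 0 and 1, and the made-up `sample` data. `normal_scores` is a hypothetical helper name.

```python
from statistics import NormalDist


def normal_scores(data):
    """Pair each sorted observation with its Normal score (z-score).

    Returns (normal_score, observation) pairs: x-coordinate first, y-coordinate second.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Assumed percentile convention: (rank - 0.5) / n, with rank = i + 1.
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


sample = [96, 102, 110, 89, 121, 105, 99, 114, 93, 108]  # made-up data for illustration
points = normal_scores(sample)

for score, value in points:
    print(f"{score:6.2f}  {value}")
```

Plotting these pairs, with the Normal scores on the x-axis and the data on the y-axis, gives the normal quantile plot; roughly Normal data should fall close to a straight line.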
Normal quantile plot
NOTE: Normal distribution = points fall along a straight line; this plot shows a roughly Normal distribution

Each point on the plot shows the value of the observation
(y-coordinate) and the corresponding z-score for that
value (x-coordinate).

Each point compares the location of a given percentile in a standard
normal distribution with the location of the same percentile in the
distribution of the data.
Normal quantile plot of fifth-graders’ IQ scores
Normal quantile plot

If the data are normally distributed, the plot will show a
straight line, indicating a good match between the data and a
normal distribution.

From this plot, we can say that the distribution of fifth-graders’
IQ scores is roughly normal.

Normal quantile plot of fifth-graders’ IQ scores


Non-normality in the normal quantile plot
Systematic deviations from a straight line indicate a non-Normal
distribution. Outliers appear as points that are far away from the
overall pattern of the plot.
NOTE: outliers are marked here on the plot

The distribution of the time to start a business is not normal.
It is skewed to the right.

Normal quantile plot of the time to start a business


Example 6: Normal Quantile Plot
What does this normal quantile plot show?

a) A strong linear relationship between z-score and minutes
b) A highly skewed distribution of song lengths
c) An approximately Normal distribution of song lengths
For next class
• Reading: Alwan 4.1-4.2
• For practice: Alwan 4.5, 4.31, 4.33
• Reminder: Krauth Chapter 6 (reading from last week) has additional
Excel instructions.
