Unit 4


Unit IV

Normal Distribution
• Probability
• Characteristics and application of the normal
probability curve
• Sampling error
Probability
The study of probability is concerned with
random phenomena. Even though we
cannot be certain whether a given result
will occur, we often can obtain a good
measure of its likelihood, or probability.

• Probability is the study of randomness and uncertainty
• In the early days, probability was associated
with games of chance (gambling).
Probability
Random Experiment…
In the study of probability, any observation, or
measurement, of a random phenomenon is an
experiment.
An experiment is any process that can be repeated
in which the results are uncertain.
A random experiment is an action or process that leads
to one of several possible outcomes. For example:
Experiment     Outcomes
Flip a coin    Heads, Tails
Exam           Numbers: 0, 1, 2, ..., 100
Probability
The possible results of the experiment are called
outcomes, and the set of all possible outcomes is
called the sample space.
This list must be exhaustive, i.e. ALL possible
outcomes included.
Die roll: {1, 2, 3, 5, 6} is not exhaustive (4 is missing); {1, 2, 3, 4, 5, 6} is.

A simple event is any single outcome from a
probability experiment. Each simple event is
denoted ei.
Probability
An event is any collection of outcomes from a
probability experiment. An event may consist of
one or more simple events. Events are denoted
using capital letters such as E.
Usually we are interested in some particular
collection of the possible outcomes. Any such
subset of the sample space is called an event.
Probability
If E is an event that may happen when an
experiment is performed, and all outcomes are
equally likely, then the probability of event E
is given by

$$P(E) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}} = \frac{n(E)}{n(S)}.$$

The empirical probability of event E is estimated
from repeated trials:

$$P(E) \approx \frac{\text{number of times event } E \text{ occurred}}{\text{number of times the experiment was performed}}.$$
Example: Gender of a Student
A school has 820 male students and 835 female
students. If a student from the school is selected
at random, what is the probability that the
student would be a female?

Solution
$$P(\text{female}) = \frac{\text{number of female students}}{\text{total number of students}} = \frac{835}{820 + 835} \approx 0.505$$
EXAMPLE: Consider the probability
experiment of selecting a candy
A bag contains 9 brown candies, 6 yellow candies, 7 red
candies, 4 orange candies, 2 blue candies, and 2 green
candies. Suppose that a candy is randomly selected.
(a) What is the probability that it is brown?
(b) What is the probability that it is blue?

Solution:
The bag holds 9 + 6 + 7 + 4 + 2 + 2 = 30 candies.
a) 9/30 = 3/10 (= 0.3)
b) 2/30 = 1/15 (≈ 0.07)
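The candy example can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary `bag` and the helper `prob` are names chosen here, not part of the original slides, and the calculation assumes every candy is equally likely to be picked.

```python
from fractions import Fraction

# Candy counts taken from the example above.
bag = {"brown": 9, "yellow": 6, "red": 7, "orange": 4, "blue": 2, "green": 2}
total = sum(bag.values())  # n(S): total number of candies


def prob(colour):
    """P(colour) = favorable outcomes / total outcomes (hypothetical helper)."""
    return Fraction(bag[colour], total)


p_brown = prob("brown")
p_blue = prob("blue")

print(total)                      # 30
print(p_brown, float(p_brown))    # 3/10 0.3
print(p_blue)                     # 1/15
```

Using `Fraction` keeps the answers exact, so 9/30 is reduced to 3/10 automatically.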
Properties of Probabilities

1. The probability of any event E, P(E), must be
between 0 and 1 inclusive (i.e., 0 ≤ P(E) ≤ 1).
2. If an event is impossible, the probability of the
event is 0.
3. If an event is a certainty, the probability of the
event is 1.
4. For any event E, P(E does not occur) = 1 – P(E).
5. If S = {e1, e2, …, en}, then
P(e1) + P(e2) + … + P(en) = 1.
Complement of an Event
Let S denote the sample space of a probability
experiment and let E denote an event.
The complement of E, denoted Ē, is all simple
events in the sample space S that are not simple
events in the event E.
Complement Rule
If E represents any event and Ē represents
the complement of E, then
P(Ē) = 1 – P(E)

This rule is useful if it is easier to determine
the probability of the complementary event
than the probability of the event itself.
EXAMPLE: Consider the probability
experiment of selecting a candy
A bag contains 9 brown candies, 6 yellow candies, 7 red
candies, 4 orange candies, 2 blue candies, and 2 green
candies. Suppose that a candy is randomly selected.
(a) What is the probability that it is not brown?
(b) What is the probability that it is not blue?

Solution:
a) 21/30 (=0.7) (= 1 – 9/30)
b) 28/30 (≈0.93) (= 1 – 2/30)
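As a rough check of the complement rule on this example, the sketch below computes each answer two ways: as 1 – P(E), and by counting the other candies directly. The names `bag` and `prob_not` are illustrative choices, and equally likely selection is assumed.

```python
from fractions import Fraction

# Candy counts taken from the example above.
bag = {"brown": 9, "yellow": 6, "red": 7, "orange": 4, "blue": 2, "green": 2}
total = sum(bag.values())


def prob_not(colour):
    """Complement rule: P(not colour) = 1 - P(colour) (hypothetical helper)."""
    return 1 - Fraction(bag[colour], total)


p_not_brown = prob_not("brown")
p_not_blue = prob_not("blue")

# Cross-check: count every candy that is not brown and divide by the total.
direct_not_brown = Fraction(
    sum(n for colour, n in bag.items() if colour != "brown"), total
)

print(p_not_brown, p_not_blue)          # 7/10 14/15
print(direct_not_brown == p_not_brown)  # True
```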
Deterministic vs. Random Processes
In deterministic processes, the outcome can be
predicted exactly in advance
Example: Force = mass × acceleration.
If we are given values for mass and acceleration, we know
the value of force exactly

In random processes, the outcome is not known
exactly, but we can still describe the probability
distribution of possible outcomes
Example: 10 coin tosses:
we don’t know exactly how many heads we will get, but
we can calculate the probability of getting a certain
number of heads
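One way to carry out the coin-toss calculation is the binomial formula. The sketch below assumes a fair coin (p = 0.5) and independent tosses, neither of which the slide states explicitly; `prob_heads` is a name invented for this illustration.

```python
from math import comb


def prob_heads(k, n=10, p=0.5):
    """P(exactly k heads in n independent tosses), assuming P(head) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Full distribution of the number of heads in 10 tosses.
dist = [prob_heads(k) for k in range(11)]

for k, pk in enumerate(dist):
    print(f"P({k} heads) = {pk:.4f}")
```

The individual outcome stays unpredictable, but the listed probabilities add up to 1 and describe the whole distribution.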
Random variables
A random variable is a numerical outcome of
a random process or random event
Example: three tosses of a coin
S = {HHH, THH, HTH, HHT, HTT, THT, TTH, TTT}
• Random variable X = number of observed tails
• Possible values for X = {0,1, 2, 3}
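The mapping from outcomes to values of X can be sketched by enumerating the sample space. The probabilities at the end assume a fair coin, so that all eight outcomes are equally likely; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# All 2 * 2 * 2 = 8 outcomes of three tosses, e.g. "HHT".
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# The random variable X assigns a number (count of tails) to each outcome.
X = {outcome: outcome.count("T") for outcome in sample_space}

# Probability of each value of X under equally likely outcomes.
counts = Counter(X.values())
distribution = {x: counts[x] / len(sample_space) for x in sorted(counts)}

print(sorted(set(X.values())))  # [0, 1, 2, 3]
print(distribution)             # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```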

Why do we need random variables?
• We use them as a model for our observed data
Random variables

Random variables that have a finite
(countable) list of possible outcomes, with
probabilities assigned to each of these
outcomes, are called discrete.
Random variables that can take on any
value in an interval, with probabilities
given as areas under a density curve, are
called continuous.
Discrete Random Variables

A discrete random variable has a finite or
countable number of distinct values.
Discrete random variables can be
summarized by listing all values along with
their probabilities. This listing is called
a probability distribution.
Example
Random variable X = the sum of two dice
X takes on values from 2 to 12
The probability distribution (using the “equally-likely outcomes” rule):

X                2     3     4     5     6     7     8     9     10    11    12
# of outcomes    1     2     3     4     5     6     5     4     3     2     1
P(X)             1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
P(X), decimal    0.03  0.06  0.08  0.11  0.14  0.17  0.14  0.11  0.08  0.06  0.03
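The distribution of the sum of two dice can be rebuilt by listing all 36 equally likely rolls. This is a small sketch assuming two fair six-sided dice; `counts` and `dist` are names chosen for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (die 1, die 2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Equally likely outcomes: P(X = s) = (# of outcomes with sum s) / 36.
dist = {s: Fraction(n, 36) for s, n in sorted(counts.items())}

for s, p in dist.items():
    print(f"X = {s:2d}   outcomes = {counts[s]}   P(X) = {counts[s]}/36 = {float(p):.2f}")
```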

If a discrete r.v. takes on many values, it is better
to use a probability histogram
Probability Histograms
[Figure: probability histogram of the sum of two dice. Horizontal axis: sum, 2 to 12; vertical axis: probability, 0 to 0.2.]

Using the disjoint addition rule, probabilities for discrete
random variables are calculated by adding up the “bars” of
the probability histogram:
P(sum ≥ 10) = P(sum = 10) + P(sum = 11) + P(sum = 12)
= 3/36 + 2/36 + 1/36 = 6/36 = 1/6 ≈ 0.17
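The "adding up bars" calculation can be sketched as follows, again assuming two fair dice. The helper `prob_sum` is hypothetical; it computes the probability that the sum satisfies a given condition.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) pairs.
rolls = list(product(range(1, 7), repeat=2))


def prob_sum(condition):
    """P(sum satisfies condition) under equally likely rolls (hypothetical helper)."""
    favorable = sum(1 for a, b in rolls if condition(a + b))
    return Fraction(favorable, len(rolls))


# Direct calculation of P(sum >= 10).
p_ge_10 = prob_sum(lambda s: s >= 10)

# Same probability via the disjoint addition rule: add the three bars.
by_bars = sum(prob_sum(lambda s, k=k: s == k) for k in (10, 11, 12))

print(p_ge_10, by_bars)  # 1/6 1/6
```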
Continuous Random Variables
Continuous random variables have an uncountable
number of values.
We can’t list the entire probability distribution, so
we use a density curve instead of a histogram.
E.g., the normal density curve.
Calculating Probabilities
Discrete: add up bars from probability histogram
Continuous: we have to use integration to calculate
the area under the density curve.

Although it seems more complicated, it is often easier to
integrate than add up discrete “bars”.
If a discrete r.v. has many possible values, we often treat
that variable as continuous
Why do we need Probability?

We have several graphical and numerical
statistics for summarizing our data.
We want to make probability statements
about the significance of our statistics.

Example: r = 0.782 for marks obtained and number of
hours read.
What is the chance that the true correlation is
significantly different from zero?
Types of Distribution

• Frequency Distribution
• Normal Distribution
• Poisson Distribution
• Binomial Distribution
• Sampling Distribution
• t distribution
• F distribution
Probability Density Function
A probability density function is an equation that is used
to compute probabilities of continuous random variables
that must satisfy the following two properties.
Let f(x) be a probability density function.
1. The area under the graph of the equation over all
possible values of the random variable must equal
one, i.e.,

$$\int_{x \in S} f(x)\,dx = 1$$

2. The graph of the equation must be greater than or
equal to zero for all possible values of the random
variable, i.e., f(x) ≥ 0 (so the probability that x assumes
a value in any interval lies in the range 0 to 1)
What is Normal Distribution?
It is defined as a continuous frequency distribution
of infinite range (can take any values not just integers as in the
case of binomial and Poisson distribution).

A random variable X with mean μ and standard
deviation σ is normally distributed if its probability
density function is given by

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

where −∞ < X < ∞, π = 3.14159…, and e = 2.71828…
The Normal Distribution:
as mathematical function (pdf)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Constants: π = 3.14159, e = 2.71828
This is a bell-shaped curve with different centers and
spreads depending on μ and σ
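The density formula translates directly into code. The function `normal_pdf` below is a sketch written for illustration; it is compared against `statistics.NormalDist` from the Python standard library (Python 3.8+) as a sanity check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2 * pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Height of the curve at its center for mu = 3, sigma = 1: 1 / sqrt(2 * pi).
peak = normal_pdf(3, mu=3, sigma=1)
print(round(peak, 4))  # 0.3989

# The curve is symmetric about mu: equal heights one unit either side.
print(math.isclose(normal_pdf(2, 3, 1), normal_pdf(4, 3, 1)))  # True

# Agreement with the standard-library implementation.
print(math.isclose(normal_pdf(7.5, 6, 2), NormalDist(6, 2).pdf(7.5)))  # True
```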
What is Normal Distribution?
A normal distribution curve is a symmetrical, bell-shaped
curve defined by the mean and standard deviation of a
data set.

The normal distribution is also known as the
Gaussian distribution, and the curve is also known
as the Gaussian curve, named after the German
mathematician-astronomer Carl Friedrich Gauss.
Normal Distribution

A family of bell-shaped curves that differ only in
their means and standard deviations.
µ = the mean of the distribution
σ = the standard deviation

[Figure: normal curve with µ = 3 and σ = 1; X axis from 0 to 12]
Normal Distribution

[Figure: two normal curves on an X axis from 0 to 12, one with µ = 3 and σ = 1, the other with µ = 6 and σ = 1]
Normal Distribution

[Figure: two normal curves on an X axis from 0 to 12, one with µ = 6 and σ = 2, the other with µ = 6 and σ = 1]
The effects of μ
How does the expected value affect the location of f(x)?

[Figure: three normal distribution curves with different means (μ = 10, μ = 11, μ = 12) but the same standard deviation]

μ controls location
The effects of σ
How does the standard deviation affect the shape of f(x)?

[Figure: three normal distribution curves with the same mean (μ = 12) but different standard deviations (σ = 2, σ = 3, σ = 4)]

σ controls spread
Normal Distribution
[Figure: normal curve f(X) with the mean μ marked on the X axis and the spread σ indicated]

Changing σ increases or decreases the spread.
Changing μ shifts the distribution left or right.

• There are infinitely many normal distributions
• The expected value E(X) (also called the mean μ) can be
any number
• The standard deviation σ can be any positive number
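The roles of μ and σ can be illustrated with `statistics.NormalDist`. The means 10, 11, 12 and standard deviations 2, 3, 4 follow the figures above; pairing the three means with σ = 2 is an assumption made only for this sketch.

```python
from statistics import NormalDist

# Same sigma, different means: the peak moves, its height does not change.
for mu in (10, 11, 12):
    curve = NormalDist(mu=mu, sigma=2)  # sigma = 2 is an assumed value
    print(f"mu = {mu}: peak at x = {curve.mean}, height = {curve.pdf(mu):.4f}")

# Same mean, different sigmas: a larger sigma gives a lower, wider curve.
peak_heights = {
    sigma: NormalDist(mu=12, sigma=sigma).pdf(12) for sigma in (2, 3, 4)
}
for sigma, height in peak_heights.items():
    print(f"sigma = {sigma}: height at mu = {height:.4f}")
```

Because the total area must stay at 1, spreading the curve out (larger σ) necessarily lowers its peak.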
Normal Distribution

Total area under a normal curve
[Figure: normal curve centered at μ with the entire area under the curve shaded]
The shaded area is 1.0 or 100%


Normal Distribution
A normal curve is symmetric about the mean.
[Figure: normal curve split at μ into two shaded halves, each labeled .5]
Each of the two shaded areas is 0.5 or 50%.
Normal Distribution

No matter what μ and σ are,
the area between μ – σ and μ + σ is about 68%;
the area between μ – 2σ and μ + 2σ is about 95%; and
the area between μ – 3σ and μ + 3σ is about 99.7%.
Almost all values fall within 3 standard deviations.
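The 68%, 95%, and 99.7% figures can be reproduced from the normal cumulative distribution function. The sketch uses `statistics.NormalDist` from the standard library; `area_within` is a hypothetical helper, and μ = 100, σ = 15 are arbitrary values chosen to show that the answer does not depend on them.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    curve = NormalDist(mu, sigma)
    return curve.cdf(mu + k * sigma) - curve.cdf(mu - k * sigma)


for k in (1, 2, 3):
    standard = area_within(k)                    # mu = 0, sigma = 1
    shifted = area_within(k, mu=100, sigma=15)   # arbitrary mu and sigma
    print(f"within {k} SD: {standard:.4f} (standard), {shifted:.4f} (mu=100, sigma=15)")
```

The printed areas are about 0.6827, 0.9545, and 0.9973 for both curves.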
Normal Distribution
Areas of the normal curve beyond μ ± 3σ
[Figure: normal curve with the tails below μ – 3σ and above μ + 3σ shaded]
Each of the two shaded areas is very close to zero.
Points of inflection

The points of inflection lie one σ below and one σ above μ
(at μ – σ and μ + σ).
Area under Curve
The area under the graph of a density function over some
interval represents the probability of observing a value of
the random variable in that interval.
P(a < X < b) = Area under the density curve
between a and b.

[Figure: density curve f(x) with the area between a and b shaded and labeled P(a < X < b)]

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
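For a normal density, the area between a and b can be found either by numerical integration or from the cumulative distribution function. The sketch below does both. The values μ = 6 and σ = 2 echo an earlier figure, while a = 3 and b = 9 are arbitrary choices for illustration; `area_between` is a hypothetical helper.

```python
from statistics import NormalDist

curve = NormalDist(mu=6, sigma=2)
a, b = 3, 9  # illustrative interval, not from the slides


def area_between(dist, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of dist.pdf over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width


numeric = area_between(curve, a, b)   # integrate f(x) from a to b
exact = curve.cdf(b) - curve.cdf(a)   # same area from the CDF

print(round(numeric, 4), round(exact, 4))  # 0.8664 0.8664
```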
Characteristics of Normal Distribution
• The normal distribution has a bell-shaped curve
and is symmetric around the mean (the two halves of
the curve are the same; the highest point occurs at the mean,
i.e., mean, median, and mode all have the same value)
• Normal curves are unimodal.
• The total area under the curve is 1
(area under the curve
to the right of μ = area to the left of μ = 0.5)
• It has inflection points at 1 standard deviation
from the mean (i.e., at μ – σ and μ + σ)
Characteristics of Normal Distribution
• It links frequency distribution to probability
distribution
• The area within μ ± σ is about 68%; within μ ± 2σ is
about 95%; and within μ ± 3σ is about 99.7%.
• The two tails of the curve extend indefinitely.
Normal Distribution
Why are normal distributions so important?
• Many dependent variables are commonly assumed
to be normally distributed in the population
• The normal distribution and its properties are well
known, and if our variable of interest is normally
distributed, we can apply what we know about the
normal distribution to our situation.
– find the probabilities associated with particular
outcomes
– can make inferences about values of that variable
Example: Sampling distribution of the mean
Positive Skewness (Tail to Right)

The tail of the distribution on the right-hand (positive)
side is longer than on the left-hand side.

For a right-skewed distribution, the mean is
typically greater than the median.
Negative Skewness (Tail to Left)

A distribution that is skewed left has exactly the opposite characteristics
of one that is skewed right:
• the mean is typically less than the median;
• the tail of the distribution is longer on the left hand side
than on the right hand side; and
• the median is closer to the 3rd quartile than to the 1st quartile
Skewness

• Positive Skewness: Mean ≥ Median

• Negative Skewness: Median ≥ Mean

• Pearson’s Coefficient of Skewness:

= 3 (Mean – Median) / Standard deviation
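Pearson's coefficient can be computed with the standard `statistics` module. The data sets below are invented for illustration, and using the sample standard deviation (`stdev`) is an assumption, since the slide does not say which form of the standard deviation is meant.

```python
from statistics import mean, median, stdev


def pearson_skewness(data):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(data) - median(data)) / stdev(data)


# Hypothetical data with a long right tail (mean 4.7 > median 3).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]
# Mirror image: long left tail.
left_skewed = [-x for x in right_skewed]
# Symmetric data: mean equals median.
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(f"{name}: skewness = {pearson_skewness(data):.3f}")
```

The sign of the coefficient matches the direction of the longer tail: positive for the right-skewed data, negative for the left-skewed data, and zero for the symmetric data.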


Are my data normally distributed?
1. Look at the histogram! Does it appear bell shaped?
2. Compute descriptive summary measures—are mean,
median, and mode similar?
3. Do 2/3 of observations lie within 1 SD of the mean?
Do 95% of observations lie within 2 SD of the mean?
4. Look at a normal probability plot—is it approximately
linear?
5. Run tests of normality (such as Kolmogorov-Smirnov).
But be cautious: these tests are highly influenced by sample size!
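Checks 2 and 3 from the list above can be sketched in a few lines. The data here are simulated with `random.gauss` (mean 50, SD 10, both arbitrary) purely as a stand-in for real observations; with real data, replace `data` with your own values. Histograms, normal probability plots, and formal tests are not covered by this sketch.

```python
import random
from statistics import mean, median, stdev

random.seed(0)  # fixed seed so the simulated sample is reproducible
data = [random.gauss(50, 10) for _ in range(5000)]  # hypothetical sample

m, s = mean(data), stdev(data)

# Check 2: are the mean and median similar?
mean_median_gap = abs(m - median(data))

# Check 3: share of observations within 1 SD and within 2 SD of the mean.
within_1sd = sum(abs(x - m) <= s for x in data) / len(data)
within_2sd = sum(abs(x - m) <= 2 * s for x in data) / len(data)

print(f"mean - median gap: {mean_median_gap:.3f}")
print(f"within 1 SD: {within_1sd:.1%} (expect about 68%)")
print(f"within 2 SD: {within_2sd:.1%} (expect about 95%)")
```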
