Topic 2: Probability Distribution


ENVIRONMENTAL DATA ANALYSIS - TL5002

Probability
Distribution

ENVIRONMENTAL ENGINEERING Ahmad Soleh Setiyawan


INSTITUT TEKNOLOGI BANDUNG
RANDOM VARIABLES
A random variable x represents a numerical value
associated with each outcome of a probability experiment.

A random variable is discrete if it has a finite or countable
number of possible outcomes that can be listed.
[Figure: number line for x, marked from 0 to 10]

A random variable is continuous if it has an uncountable
number of possible outcomes, represented by an interval
on a number line.
[Figure: number line for x, marked from 0 to 10]
DISCRETE PROBABILITY DISTRIBUTIONS

A discrete probability distribution lists each possible value
the random variable can assume, together with its
probability. A probability distribution must satisfy the
following conditions.

In Words                                  In Symbols
1. The probability of each value of       0 ≤ P(x) ≤ 1
   the discrete random variable is
   between 0 and 1, inclusive.
2. The sum of all the probabilities       ΣP(x) = 1
   is 1.
CONSTRUCTING A DISCRETE PROBABILITY DISTRIBUTION

Guidelines
Let x be a discrete random variable with possible
outcomes x1, x2, … , xn.
1. Make a frequency distribution for the possible
outcomes.
2. Find the sum of the frequencies.
3. Find the probability of each possible outcome by
dividing its frequency by the sum of the frequencies.
4. Check that each probability is between 0 and 1 and
that the sum is 1.
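The four guideline steps can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `build_distribution` and the frequency counts are hypothetical.

```python
def build_distribution(frequencies):
    """Convert a {outcome: frequency} mapping into {outcome: probability}."""
    total = sum(frequencies.values())  # step 2: sum of the frequencies
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    # Step 3: divide each frequency by the sum of the frequencies
    return {x: f / total for x, f in frequencies.items()}


# Step 1: a frequency distribution (hypothetical counts, for illustration only)
frequencies = {1: 24, 2: 33, 3: 42, 4: 30, 5: 21}
distribution = build_distribution(frequencies)

# Step 4: each probability is between 0 and 1, and the probabilities sum to 1
assert all(0 <= p <= 1 for p in distribution.values())
assert abs(sum(distribution.values()) - 1) < 1e-12
```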
MEAN
The mean of a discrete random variable is given by
µ = ΣxP(x).
Each value of x is multiplied by its corresponding
probability and the products are added.
Example:
Find the mean of the probability distribution for the sum of
the two spins.
x    P(x)      xP(x)
2    0.0625    2(0.0625) = 0.125
3    0.375     3(0.375)  = 1.125
4    0.5625    4(0.5625) = 2.25

ΣxP(x) = 3.5
The mean for the two spins is 3.5.
VARIANCE
The variance of a discrete random variable is given by
σ² = Σ(x – µ)²P(x).
Example:
Find the variance of the probability distribution for the sum
of the two spins. The mean is 3.5.

x    P(x)      x – µ    (x – µ)²    P(x)(x – µ)²
2    0.0625    –1.5     2.25        ≈ 0.141
3    0.375     –0.5     0.25        ≈ 0.094
4    0.5625     0.5     0.25        ≈ 0.141

ΣP(x)(x – µ)² ≈ 0.376
The variance for the two spins is approximately 0.376.
STANDARD DEVIATION
The standard deviation of a discrete random variable is
given by
σ = √(σ²).
Example:
Find the standard deviation of the probability distribution
for the sum of the two spins. The variance is 0.376.

x    P(x)      x – µ    (x – µ)²    P(x)(x – µ)²
2    0.0625    –1.5     2.25        0.141
3    0.375     –0.5     0.25        0.094
4    0.5625     0.5     0.25        0.141

σ = √0.376 ≈ 0.613

Most of the sums differ from the mean by no more
than 0.6 points.
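The mean, variance, and standard deviation of the two-spin distribution can be checked with a short Python sketch. The variable names are illustrative. Without intermediate rounding the variance comes out as exactly 0.375 and the standard deviation as about 0.612; the slide values 0.376 and 0.613 result from rounding each product to three decimals before adding.

```python
import math

# Distribution of the sum of the two spins, as tabulated in the slides
distribution = {2: 0.0625, 3: 0.375, 4: 0.5625}

# Mean: multiply each value by its probability and add the products
mean = sum(x * p for x, p in distribution.items())

# Variance: probability-weighted squared deviations from the mean
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(mean)               # 3.5
print(variance)           # 0.375
print(round(std_dev, 3))  # 0.612
```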
EXPECTED VALUE
The expected value of a discrete random variable is equal to
the mean of the random variable.
Expected Value = E(x) = µ = ΣxP(x).

Example:
At a raffle, 500 tickets are sold for $1 each for two prizes of
$100 and $50. What is the expected value of your gain?

Your gain for the $100 prize is $100 – $1 = $99.


Your gain for the $50 prize is $50 – $1 = $49.
Write a probability distribution for the possible gains
(or outcomes).

Gain, x                       P(x)
$99                           1/500
$49                           1/500
–$1 (winning no prize)        498/500

E(x) = ΣxP(x)
     = $99 × (1/500) + $49 × (1/500) + (–$1) × (498/500)
     = –$0.70

Because the expected value is negative, you can expect to lose
$0.70 for each ticket you buy.
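The raffle calculation can be reproduced with exact fractions. This is a small sketch; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
from fractions import Fraction

# Gains (in dollars) and their probabilities from the raffle example
gains = {
    99: Fraction(1, 500),    # win the $100 prize, minus the $1 ticket
    49: Fraction(1, 500),    # win the $50 prize, minus the $1 ticket
    -1: Fraction(498, 500),  # win no prize, lose the $1 ticket
}

# A valid probability distribution must sum to 1
assert sum(gains.values()) == 1

# Expected value: sum of each gain times its probability
expected_gain = sum(x * p for x, p in gains.items())
print(float(expected_gain))  # -0.7
```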
BINOMIAL DISTRIBUTIONS
BINOMIAL EXPERIMENTS

A binomial experiment is a probability experiment that
satisfies the following conditions.
1. The experiment is repeated for a fixed number of
trials, where each trial is independent of other trials.
2. There are only two possible outcomes of interest for
each trial. The outcomes can be classified as a success
(S) or as a failure (F).
3. The probability of a success P (S) is the same for each
trial.
4. The random variable x counts the number of
successful trials.
NOTATION FOR BINOMIAL EXPERIMENTS

Symbol        Description
n             The number of times a trial is repeated.
p = P(S)      The probability of success in a single trial.
q = P(F)      The probability of failure in a single trial
              (q = 1 – p).
x             The random variable represents a count
              of the number of successes in n trials:
              x = 0, 1, 2, 3, … , n.
BINOMIAL PROBABILITY FORMULA
In a binomial experiment, the probability of exactly x
successes in n trials is
P(x) = nCx p^x q^(n – x) = [n! / ((n – x)! x!)] p^x q^(n – x).

Example:
A bag contains 10 chips. 3 of the chips are red, 5 of the chips are
white, and 2 of the chips are blue. Three chips are selected, with
replacement. Find the probability that you select exactly one red chip.

p = the probability of selecting a red chip = 3/10 = 0.3
q = 1 – p = 0.7
n = 3
x = 1

P(1) = 3C1 (0.3)^1 (0.7)^2
     = 3(0.3)(0.49)
     = 0.441
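The binomial probability formula translates directly into code. The sketch below uses `math.comb` from the Python standard library (Python 3.8 or later) for nCx; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure in a single trial
    return comb(n, x) * p**x * q ** (n - x)


# Red-chip example: n = 3 selections with replacement, p = 0.3, x = 1
p_one_red = binomial_pmf(1, 3, 0.3)
print(round(p_one_red, 3))  # 0.441
```

Summing `binomial_pmf(x, n, p)` over x = 0, 1, …, n gives 1, which is a quick check that the function describes a valid probability distribution.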
FINDING PROBABILITIES
Example:
The following probability distribution represents the probability of
selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected.

x    P(x)
0    0.24
1    0.412
2    0.265
3    0.076
4    0.008

a.) Find the probability of selecting no more than 3 red chips.
b.) Find the probability of selecting at least 1 red chip.

a.) P(no more than 3) = P(x ≤ 3) = P(0) + P(1) + P(2) + P(3)
                      = 0.24 + 0.412 + 0.265 + 0.076 = 0.993
b.) P(at least 1) = P(x ≥ 1) = 1 – P(0) = 1 – 0.24 = 0.76 (using the complement)
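Both answers can be checked from the tabulated probabilities. This is a minimal sketch with illustrative variable names. The table entries are rounded to three decimals and add up to 1.001, so a result obtained through a different complement (for example 1 – P(4)) can differ in the last digit.

```python
# Probabilities from the table in the slides (rounded to three decimals)
distribution = {0: 0.24, 1: 0.412, 2: 0.265, 3: 0.076, 4: 0.008}

# a.) P(x <= 3): add the probabilities of 0, 1, 2 and 3 red chips
p_no_more_than_3 = sum(p for x, p in distribution.items() if x <= 3)

# b.) P(x >= 1): use the complement, 1 - P(0)
p_at_least_1 = 1 - distribution[0]

print(round(p_no_more_than_3, 3))  # 0.993
print(round(p_at_least_1, 2))      # 0.76
```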
GRAPHING BINOMIAL PROBABILITIES
Example:
The following probability distribution represents the probability of
selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected. Graph
the distribution using a histogram.
x    P(x)
0    0.24
1    0.412
2    0.265
3    0.076
4    0.008

[Figure: histogram titled "Selecting Red Chips"; horizontal axis: number of red chips, x = 0 to 4; vertical axis: probability P(x), 0 to 0.5]
MEAN, VARIANCE AND STANDARD DEVIATION
Population Parameters of a Binomial Distribution
Mean: µ = np
Variance: σ² = npq
Standard deviation: σ = √(npq)
Example:
One out of 5 students at a local college say that they skip breakfast in
the morning. Find the mean, variance and standard deviation if 10
students are randomly selected.
n = 10
p = 1/5 = 0.2
q = 0.8

µ = np = 10(0.2) = 2
σ² = npq = (10)(0.2)(0.8) = 1.6
σ = √(npq) = √1.6 ≈ 1.3
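The three population parameters follow from n and p alone, as this short sketch shows. The function name `binomial_parameters` is illustrative.

```python
import math


def binomial_parameters(n, p):
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, math.sqrt(variance)


# Breakfast example: n = 10 students, p = 1/5 = 0.2
mean, variance, std_dev = binomial_parameters(10, 0.2)
print(mean)                # 2.0
print(round(variance, 1))  # 1.6
print(round(std_dev, 1))   # 1.3
```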
MORE DISCRETE
PROBABILITY DISTRIBUTIONS
GEOMETRIC DISTRIBUTION

A geometric distribution is a discrete probability
distribution of a random variable x that satisfies the
following conditions.
1. A trial is repeated until a success occurs.
2. The repeated trials are independent of each other.
3. The probability of a success p is constant for each
trial.
The probability that the first success will occur on trial x
is
P(x) = p q^(x – 1), where q = 1 – p.
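A short sketch of the geometric probability formula follows. The function name `geometric_pmf` and the numbers in the example (p = 0.3, first success on trial 3) are hypothetical and chosen only for illustration.

```python
def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    q = 1 - p  # probability of failure on a single trial
    # x - 1 failures followed by one success
    return p * q ** (x - 1)


# Hypothetical example: p = 0.3, first success on the third trial
p_third_trial = geometric_pmf(3, 0.3)
print(round(p_third_trial, 3))  # 0.147
```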
POISSON DISTRIBUTION

The Poisson distribution is a discrete probability distribution of
a random variable x that satisfies the following conditions.
1. The experiment consists of counting the number of times, x,
an event occurs in a given interval. The interval can be
an interval of time, area, or volume.
2. The probability of the event occurring is the same for each
interval.
3. The number of occurrences in one interval is independent
of the number of occurrences in other intervals.
The probability of exactly x occurrences in an interval is
P(x) = µ^x e^(–µ) / x!
where e ≈ 2.71828 and µ is the mean number of occurrences.
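The Poisson formula can be evaluated with the standard library alone. In this sketch the function name `poisson_pmf` and the example values (a mean of 3 occurrences per interval, exactly 4 occurrences) are hypothetical.

```python
import math


def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the mean number is mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)


# Hypothetical example: mean of 3 occurrences per interval, exactly 4 occur
p_four = poisson_pmf(4, 3)
print(round(p_four, 3))  # 0.168
```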

Normal Probability
Distribution

PROPERTIES OF NORMAL DISTRIBUTIONS

A continuous random variable has an infinite number of
possible values that can be represented by an interval on
the number line.

[Figure: number line titled "Hours spent studying in a day", marked from 0 to 24]
The time spent studying can be any number between 0 and 24.

The probability distribution of a continuous random
variable is called a continuous probability distribution.
PROPERTIES OF NORMAL DISTRIBUTIONS
The most important probability distribution in
statistics is the normal distribution.

[Figure: the normal curve]

A normal distribution is a continuous probability
distribution for a random variable, x. The graph of a
normal distribution is called the normal curve.
PROPERTIES OF NORMAL DISTRIBUTIONS
Properties of a Normal Distribution
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and symmetric about
the mean.
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches, the
x-axis as it extends farther and farther away from the
mean.
5. Between µ - σ and µ + σ (in the center of the curve), the
graph curves downward. The graph curves upward to
the left of µ - σ and to the right of µ + σ. The points at
which the curve changes from curving upward to
curving downward are called the inflection points.
PROPERTIES OF NORMAL DISTRIBUTIONS

[Figure: normal curve with total area = 1 and inflection points at µ - σ and µ + σ; horizontal axis x marked at µ - 3σ, µ - 2σ, µ - σ, µ, µ + σ, µ + 2σ, µ + 3σ]

If x is a continuous random variable having a normal
distribution with mean µ and standard deviation σ, you
can graph a normal curve with the equation
y = (1 / (σ√(2π))) e^(-(x - µ)² / (2σ²)),
where e ≈ 2.718 and π ≈ 3.14.
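The normal curve equation can be coded directly and compared with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch: the function name `normal_curve` and the parameters µ = 10, σ = 5 are illustrative assumptions.

```python
import math
from statistics import NormalDist


def normal_curve(x, mu, sigma):
    """Height y of the normal curve with mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma**2)
    return coefficient * math.exp(exponent)


# Illustrative parameters
mu, sigma = 10, 5
for x in (5, 10, 15):
    # The hand-written formula should agree with the standard library
    assert math.isclose(normal_curve(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

# The curve is symmetric about the mean
assert math.isclose(normal_curve(mu - 3, mu, sigma), normal_curve(mu + 3, mu, sigma))
```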
MEANS AND STANDARD DEVIATIONS
A normal distribution can have any mean and
any positive standard deviation.
The mean gives the location of the line of symmetry.

[Figure: two normal curves with their inflection points marked.
Left curve (x from 1 to 6): mean µ = 3.5, standard deviation σ ≈ 1.3.
Right curve (x from 1 to 11): mean µ = 6, standard deviation σ ≈ 1.9.]

The standard deviation describes the spread of the data.
THE STANDARD NORMAL DISTRIBUTION
The standard normal distribution is a normal distribution
with a mean of 0 and a standard deviation of 1.

[Figure: standard normal curve; the horizontal scale, z from -3 to 3, corresponds to z-scores]

Any value can be transformed into a z-score by using the
formula
z = (Value - Mean) / (Standard deviation) = (x - µ) / σ.
THE STANDARD NORMAL DISTRIBUTION
If each data value of a normally distributed random
variable x is transformed into a z-score, the result will be
the standard normal distribution.
The area that falls in the interval under
the nonstandard normal curve (the x-values)
is the same as the area under
the standard normal curve (within the
corresponding z-boundaries).

[Figure: standard normal curve, z from -3 to 3]

After the formula is used to transform an x-value into a
z-score, the Standard Normal Table in Appendix B is
used to find the cumulative area under the curve.
THE STANDARD NORMAL TABLE

Properties of the Standard Normal Distribution
1. The cumulative area is close to 0 for z-scores close to z = -3.49.
2. The cumulative area increases as the z-scores increase.
3. The cumulative area for z = 0 is 0.5000.
4. The cumulative area is close to 1 for z-scores close to z = 3.49.

[Figure: standard normal curve, z from -3 to 3; the cumulative area is close to 0 at z = -3.49, is 0.5000 at z = 0, and is close to 1 at z = 3.49]
GUIDELINES FOR FINDING AREAS

Finding Areas Under the Standard Normal Curve
1. Sketch the standard normal curve and shade the
appropriate area under the curve.
2. Find the area by following the directions for each case
shown.
a. To find the area to the left of z, find the area that
corresponds to z in the Standard Normal Table.

Example for z = 1.23:
1. Use the table to find the area for the z-score.
2. The area to the left of z = 1.23 is 0.8907.
GUIDELINES FOR FINDING AREAS

Finding Areas Under the Standard Normal Curve
b. To find the area to the right of z, use the Standard
Normal Table to find the area that corresponds to z.
Then subtract the area from 1.

Example for z = 1.23:
1. Use the table to find the area for the z-score.
2. The area to the left of z = 1.23 is 0.8907.
3. Subtract to find the area to the right of z = 1.23:
1 - 0.8907 = 0.1093.
GUIDELINES FOR FINDING AREAS

Finding Areas Under the Standard Normal Curve
c. To find the area between two z-scores, find the area
corresponding to each z-score in the Standard
Normal Table. Then subtract the smaller area from
the larger area.

Example for z = -0.75 and z = 1.23:
1. Use the table to find the area for each z-score.
2. The area to the left of z = 1.23 is 0.8907.
3. The area to the left of z = -0.75 is 0.2266.
4. Subtract to find the area of the region between the two
z-scores: 0.8907 - 0.2266 = 0.6641.
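The three cases (area to the left, area to the right, and area between two z-scores) can be reproduced without a printed table by using the cumulative distribution function of `statistics.NormalDist`. This sketch stands in for the Standard Normal Table and is not the slides' method. Because the table values are rounded to four decimals, the last digit of a difference can disagree slightly with the unrounded computation.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# a. Area to the left of z = 1.23 (the cumulative area)
area_left = standard_normal.cdf(1.23)

# b. Area to the right of z = 1.23: subtract the cumulative area from 1
area_right = 1 - standard_normal.cdf(1.23)

# c. Area between z = -0.75 and z = 1.23: larger area minus smaller area
area_between = standard_normal.cdf(1.23) - standard_normal.cdf(-0.75)

print(round(area_left, 4))     # 0.8907
print(round(area_right, 4))    # 0.1093
print(round(area_between, 3))  # 0.664
```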
NORMAL DISTRIBUTIONS:
FINDING PROBABILITIES

PROBABILITY AND NORMAL DISTRIBUTIONS

If a random variable, x, is normally distributed,
you can find the probability that x will fall in a
given interval by calculating the area under the
normal curve for that interval.

[Figure: normal curve with µ = 10 and σ = 5; the area to the left of x = 15 is shaded and represents P(x < 15)]
PROBABILITY AND NORMAL DISTRIBUTIONS

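Normal Distribution          Standard Normal Distribution
µ = 10, σ = 5                µ = 0, σ = 1
P(x < 15)                    P(z < 1)

[Figure: the shaded area to the left of x = 15 under the normal curve is the same as the shaded area to the left of z = 1 under the standard normal curve]

The x-value 15 corresponds to z = (15 - 10) / 5 = 1, so

P(x < 15) = P(z < 1) = Shaded area under the curve
= 0.8413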
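The equality of the two areas can be checked numerically. In this sketch `statistics.NormalDist` stands in for the Standard Normal Table; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 10, 5
x = 15

# Transform the x-value into a z-score
z = (x - mu) / sigma

# Area under the original normal curve and under the standard normal curve
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z)                     # 1.0
print(round(p_direct, 4))    # 0.8413
print(round(p_standard, 4))  # 0.8413
```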
NORMAL DISTRIBUTIONS:
FINDING VALUES

FINDING Z-SCORES
Example:
Find the z-score that corresponds to a cumulative area
of 0.9973.

Appendix B: Standard Normal Table
z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0    .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1    .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2    .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
2.6    .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7    .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8    .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981

Find the z-score by locating 0.9973 in the body of the Standard
Normal Table. The values at the beginning of the
corresponding row (2.7) and at the top of the column (.08) give the z-score.
The z-score is 2.78.
FINDING Z-SCORES

Example:
Find the z-score that corresponds to a cumulative area
of 0.4170.

Appendix B: Standard Normal Table
z       .09    .08    .07    .06    .05    .04    .03    .02    .01    .00
-3.4    .0002  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003
-3.3    .0003  .0004  .0004  .0004  .0004  .0004  .0004  .0005  .0005  .0005
-0.3    .3483  .3520  .3557  .3594  .3632  .3669  .3707  .3745  .3783  .3821
-0.2    .3859  .3897  .3936  .3974  .4013  .4052  .4090  .4129  .4168  .4207
-0.1    .4247  .4286  .4325  .4364  .4404  .4443  .4483  .4522  .4562  .4602
-0.0    .4641  .4681  .4721  .4761  .4801  .4840  .4880  .4920  .4960  .5000

Find the z-score by locating 0.4170 in the body of the Standard
Normal Table. Use the value closest to 0.4170, which is .4168
(row -0.2, column .01).
The z-score is -0.21.
FINDING A Z-SCORE GIVEN A PERCENTILE

Example:
Find the z-score that corresponds to P75.

[Figure: standard normal curve with an area of 0.75 shaded to the left of the unknown z-score, z = 0.67]

The z-score that corresponds to P75 is the same z-score that
corresponds to an area of 0.75.

The z-score is 0.67.
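Looking up a z-score from a cumulative area is the inverse of the cumulative distribution function. As a sketch, `NormalDist.inv_cdf` from the Python standard library can stand in for a reverse table lookup; the dictionary name `z_scores` is illustrative. Rounded to two decimals, the three areas used in the examples give 2.78, -0.21 and 0.67.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# Cumulative areas from the three examples: 0.9973, 0.4170 and P75 (0.75)
areas = (0.9973, 0.4170, 0.75)

# inv_cdf returns the z-score whose cumulative area equals the given value
z_scores = {area: round(standard_normal.inv_cdf(area), 2) for area in areas}

for area, z in z_scores.items():
    print(f"area = {area:.4f}  ->  z = {z:.2f}")
```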
TRANSFORMING A Z-SCORE TO AN X-SCORE

To transform a standard z-score to a data value, x, in
a given population, use the formula
x = µ + zσ.
Example:
The monthly electric bills in a city are normally distributed
with a mean of $120 and a standard deviation of $16. Find
the x-value corresponding to a z-score of 1.60.

x = µ + zσ
  = 120 + 1.60(16)
  = 145.6
We can conclude that an electric bill of $145.60 is 1.6 standard
deviations above the mean.
FINDING A SPECIFIC DATA VALUE
Example:
The weights of bags of chips for a vending machine are
normally distributed with a mean of 1.25 ounces and a
standard deviation of 0.1 ounce. Bags that have weights in
the lower 8% are too light and will not work in the machine.
What is the least a bag of chips can weigh and still work in the
machine?

P(z < ?) = 0.08
P(z < -1.41) = 0.08

x = µ + zσ
  = 1.25 + (-1.41)(0.1)
  ≈ 1.11

[Figure: normal curve with the lower 8% shaded; z = -1.41 corresponds to x = 1.11, and z = 0 corresponds to x = 1.25]

The least a bag can weigh and still work in the machine is 1.11 ounces.
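The electric bill and chip bag examples both apply x = µ + zσ. A minimal sketch follows; the function name `x_from_z` is an illustrative choice, and `NormalDist.inv_cdf` stands in for the table lookup of the z-score that cuts off the lower 8%.

```python
import math
from statistics import NormalDist


def x_from_z(z, mu, sigma):
    """Transform a z-score into a data value x = mu + z * sigma."""
    return mu + z * sigma


# Electric bill example: mu = 120, sigma = 16, z = 1.60
bill = x_from_z(1.60, 120, 16)
print(round(bill, 2))  # 145.6

# Chip bag example: find the z-score that cuts off the lower 8%
z_cutoff = NormalDist().inv_cdf(0.08)
min_weight = x_from_z(z_cutoff, 1.25, 0.1)
print(round(z_cutoff, 2))    # -1.41
print(round(min_weight, 2))  # 1.11

# The same value comes directly from the nonstandard normal distribution
assert math.isclose(min_weight, NormalDist(1.25, 0.1).inv_cdf(0.08), rel_tol=1e-9)
```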
SAMPLING DISTRIBUTIONS AND
THE CENTRAL LIMIT THEOREM

SAMPLING DISTRIBUTIONS
A sampling distribution is the probability distribution of a
sample statistic that is formed when samples of size n are
repeatedly taken from a population.

[Figure: a population from which many samples are repeatedly drawn]
SAMPLING DISTRIBUTIONS

If the sample statistic is the sample mean, then the
distribution is the sampling distribution of sample means.

[Figure: six samples, Sample 1 to Sample 6, with sample means x̄1, x̄2, x̄3, x̄4, x̄5, x̄6]

The sampling distribution consists of the values of the
sample means, x̄1, x̄2, x̄3, x̄4, x̄5, x̄6.
PROPERTIES OF SAMPLING DISTRIBUTIONS

Properties of Sampling Distributions of Sample Means
1. The mean of the sample means, µx̄, is equal to the population
mean.
µx̄ = µ
2. The standard deviation of the sample means, σx̄, is equal to the
population standard deviation, σ, divided by the square root of n.
σx̄ = σ / √n
The standard deviation of the sampling distribution of the sample
means is called the standard error of the mean.
THE CENTRAL LIMIT THEOREM
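If samples of size n ≥ 30 are taken from a population with
any type of distribution that has a mean = µ and standard
deviation = σ,
[Figure: population distributions of different shapes, each with mean µ]
the sample means will have an approximately normal distribution.
[Figure: distribution of the sample means x̄, centered at µ]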
THE CENTRAL LIMIT THEOREM

If the population itself is normally distributed, with
mean = µ and standard deviation = σ,
[Figure: normal population distribution with mean µ]
the sample means will have a normal distribution for
any sample size n.
[Figure: distribution of the sample means x̄, centered at µ]
THE CENTRAL LIMIT THEOREM
In either case, the sampling distribution of sample means
has a mean equal to the population mean.

µx̄ = µ          (mean of the sample means)

The sampling distribution of sample means has a standard
deviation equal to the population standard deviation
divided by the square root of n.

σx̄ = σ / √n     (standard deviation of the sample means)

This is also called the
standard error of the mean.
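A small simulation illustrates both properties. This sketch is not part of the slides: the exponential population (mean 1, standard deviation 1, strongly skewed), the sample size n = 36, the number of samples, and the random seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 36              # sample size (n >= 30)
num_samples = 5000  # number of repeated samples

# Assumed population: exponential with rate 1, so mean = 1 and std dev = 1
population_mean = 1.0
population_sd = 1.0

# Draw repeated samples and record each sample mean
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = population_sd / math.sqrt(n)

print(mean_of_means)   # close to the population mean, 1
print(sd_of_means)     # close to the standard error below
print(standard_error)  # sigma / sqrt(n) = 1/6
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.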
NORMALITY TEST
• IF A DISTRIBUTION IS NORMAL → PROBABILITIES
CAN BE CALCULATED USING THE
NORMAL DISTRIBUTION TABLE.
• FOR THE SAMPLING DISTRIBUTION OF THE MEAN
→ THE TRANSFORMATION BECOMES:

Z = (x̄ - µx̄) / σx̄
NORMALITY TEST

• WAYS OF TESTING NORMALITY:
A. NORMALITY TEST ON PROBABILITY PAPER
B. NORMALITY TEST WITH CHI-SQUARE (GOODNESS-OF-FIT):

χ² = Σ (f0 - fe)² / fe

f0 = OBSERVED FREQUENCY (SAMPLE DATA)
fe = THEORETICAL FREQUENCY (EXPECTED FROM THE NORMAL CURVE)

• Rule: computed χ² < theoretical χ² → the data are
normally distributed
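The chi-square goodness-of-fit rule can be sketched as follows. Everything specific here is an assumption made for illustration: the function name `chi_square_statistic`, the observed class counts, the class boundaries, and the fitted mean 50 and standard deviation 10. The critical value 3.841 is the tabulated chi-square value for 1 degree of freedom at α = 0.05, taking the degrees of freedom as 4 classes minus 1, minus 2 estimated parameters; the slides do not state which convention to use.

```python
from statistics import NormalDist


def chi_square_statistic(observed, cut_points, mu, sigma):
    """Sum of (f0 - fe)^2 / fe over classes separated by cut_points."""
    normal = NormalDist(mu, sigma)
    n = sum(observed)
    # Cumulative probabilities at the class boundaries, from -inf to +inf
    cumulative = [0.0] + [normal.cdf(c) for c in cut_points] + [1.0]
    # Expected (theoretical) frequency of each class under the normal curve
    expected = [n * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical data: counts in the classes <40, 40-50, 50-60, >=60
observed = [14, 36, 33, 17]
cut_points = [40, 50, 60]
chi_square = chi_square_statistic(observed, cut_points, mu=50, sigma=10)

critical_value = 3.841  # chi-square table: 1 degree of freedom, alpha = 0.05

print(round(chi_square, 2))         # 0.44
print(chi_square < critical_value)  # True: consistent with a normal distribution
```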
