Statistics

Geomathematics (M. Simonetta Bernabei and Horst Thaler)


Probability theory

Introduction.

Even though complete knowledge of a statistical population is the main goal of a statistical
analysis, in general only the partial information given by a sample is at our disposal. With
descriptive statistics we were able to describe data by means of regrouping, graphical
representation and the calculation of numerical measures like mean, median, mode, variance and
standard deviation. Our aim now is to make generalizations or inferences about the complete population
based on the information in the sample. To understand the reasoning that leads to this
generalization it is essential to introduce the language of probability.

In everyday language we often hear phrases like
- Very likely the weather next weekend will be fine.
- It is very improbable that he will win the lottery.
- With the probability of 50% I will get the job.

The words "very likely" and "very improbable" indicate in a qualitative manner the possibility that
an event may happen. Probability theory is a branch of mathematics which furnishes a method to
quantify the uncertainty of events. In general the probability of an event is a numerical value that
measures how likely it is that the event will happen. Probability is measured on a scale of values
ranging from 0 to 1, where a value close to 0 indicates a very unlikely event and a value close to 1
indicates a very probable event. In the final part on inferential statistics we shall see how
probability theory enters as the basic instrument.

In this context the notion of experiment is used not only for studies performed in laboratories, but
more generally for observations of any phenomenon which shows variation in its possible
outcomes.
The sample space S, or event space, is the set of all possible outcomes of an experiment. It can be
discrete (e.g. throwing a die) or continuous (e.g. measuring a length). Every element of S is called an
elementary event and will be denoted by e. An event is a subset of S, and we say that an event
happens when the outcome of the experiment is an elementary event belonging to this subset. In particular the event S
is called the certain event and the empty set $\emptyset$ is called the impossible event.

Probability axioms

The probability of an event A, denoted by $P(A)$, should satisfy the following properties:

1. $0 \le P(A) \le 1$ for all $A \subseteq S$, and $P(S) = 1$, $P(\emptyset) = 0$.
2. For every collection of pairwise disjoint events $E_1, E_2, \dots, E_n$, i.e. $E_i \cap E_j = \emptyset$ for $i \ne j$,
$$P\Big(\bigcup_{i=1}^{n} E_i\Big) = \sum_{i=1}^{n} P(E_i), \qquad n = 1, 2, \dots$$

A sample space S together with a probability function P is called a probability space.

Example 1. Throwing a die. The event space is S = {1,2,3,4,5,6}, with elementary events 1, 2, …, 6.
An event is, for example, A = "the result of a throw is an even number", that is, A = {2,4,6}.

Assigning probabilities

I. Uniform model: If our event space is finite with N elements and every elementary event has
the same probability, then
$$P(A) = \frac{|A|}{N},$$
where $|A|$ is the number of elements in A.

Example 3: when tossing a fair coin the possible outcomes are Head, H, and Tail, T; therefore
S = {T, H} and $P(T) = P(H) = \frac{1}{2}$.

Example 4: when throwing a fair die we have S = {1,2,3,4,5,6} and
$P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$.
Let A be the event "the throw gives an even number"; then $P(A) = \frac{3}{6} = \frac{1}{2}$.
Of course, only special phenomena can be described by such a symmetric model.
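The uniform model reduces probability to counting. A minimal Python sketch, assuming the sample space is given as a finite set of equally likely outcomes; the helper name `uniform_probability` is hypothetical and exact fractions are used to avoid rounding:

```python
from fractions import Fraction

def uniform_probability(event, sample_space):
    """P(A) = |A| / N under the uniform model (all outcomes equally likely)."""
    event = set(event) & set(sample_space)  # only outcomes that belong to S are counted
    return Fraction(len(event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
even = {e for e in die if e % 2 == 0}  # A = {2, 4, 6}

print(uniform_probability(even, die))          # 1/2
print(uniform_probability({"H"}, {"H", "T"}))  # 1/2
```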

Exercise 5: Gregor Mendel, pioneer of genetics, conceived a theory of heredity to
explain the generation patterns of pea plants. According to Mendel, inherited characteristics
are transmitted from one generation to the next by genes. Genes occur in pairs and the offspring
obtain their pair by taking one gene from each parent. A simple uniform probability model is the
basis of Mendel's theory of the selection mechanism.
Mendel's experiment consists of cross-fertilizing red flowers (R) with white flowers (W), giving
pink-flowered hybrids which have one gene of each type. Crossing these hybrids leads to one of
four possible gene pairs.
According to Mendel's law, the four possibilities have equal probability and therefore
P(Pink) = 1/2 and P(White) = P(Red) = 1/4.

WR × WR → WW, WR, RW, RR

An experiment performed by Correns, one of Mendel's followers, gave the frequencies 141,
291 and 132, which are nearly in the ratio 1:2:1.
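The agreement with Mendel's 1:2:1 model can be checked by comparing relative frequencies with the model probabilities. A small Python sketch, assuming the three counts are listed in the same order as the proportions 1/4, 1/2, 1/4 (the text does not say which colour each count belongs to):

```python
from fractions import Fraction

observed = [141, 291, 132]  # Correns' counts, in the order quoted in the text
ratio = [1, 2, 1]           # Mendel's predicted ratio 1:2:1

total = sum(observed)
expected_p = [Fraction(r, sum(ratio)) for r in ratio]  # 1/4, 1/2, 1/4

for count, p in zip(observed, expected_p):
    print(f"relative frequency {count / total:.3f}   model probability {float(p):.3f}")
```

The relative frequencies differ from the model probabilities by less than 0.02, which is what "nearly in the ratio 1:2:1" means here.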

II. Frequentist interpretation of probability: Assuming that we repeat an experiment under
the same conditions, it may happen that, as the number of trials increases, the relative
frequency of an event tends to a definite value, which we then define as the probability of the
event. For example, consider the number of heads in a sequence of n tosses of a coin.

[Figure: relative frequency of the number of heads (vertical axis, 0 to 1) plotted against the number of trials (horizontal axis, 0 to 500).]
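The behaviour shown in the figure can be reproduced by simulation. This Python sketch assumes a fair coin simulated with the standard `random` module; the seed 0 is an arbitrary choice made here for reproducibility. Any single run fluctuates, but the relative frequency of heads tends to drift towards 1/2 as n grows:

```python
import random

def relative_frequencies(n_trials, seed=0):
    """Relative frequency of heads after each of n_trials fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    freqs = []
    for n in range(1, n_trials + 1):
        heads += rng.random() < 0.5  # True counts as 1 (a head)
        freqs.append(heads / n)
    return freqs

freqs = relative_frequencies(10_000)
for n in (10, 100, 1_000, 10_000):
    print(n, round(freqs[n - 1], 3))
```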

III. Subjective interpretation of probability: This interpretation uses the subjective degree of
confidence of an observer in the outcome of an event. It is the one applied, for example, when
bets are made.

Composed events

The negation or complement of an event $A \subseteq S$ is the event that A doesn't happen. It is
denoted by $A^c$, and in terms of sets we have $A^c = S \setminus A$, i.e. it is the set that contains all
elements of S which are not contained in A.

The intersection of two events $A_1, A_2$ is the event that both events occur. It is denoted by
$A_1 \cap A_2$.

The union of two events $A_1, A_2$ is the event that $A_1$ or $A_2$ (or both) occurs. It is denoted by
$A_1 \cup A_2$.

Example 2. Let $A_1$ be the event "Mr. Rossi catches the train" and $A_2$ the event "Mr. Bianchi
catches the train". Then we have

- the event $A_1^c$ = "Mr. Rossi misses the train";

- the event $A_1 \cap A_2$ = "Mr. Rossi and Mr. Bianchi catch the train";
- the event $A_1 \cup A_2$ = "Mr. Rossi or Mr. Bianchi catches the train";
- the event $A_1^c \cap A_2^c$ = "Neither Mr. Rossi nor Mr. Bianchi catches the train".

Properties of Probability

- $P(A^c) = 1 - P(A)$ for all $A \subseteq S$;

- $P(\emptyset) = 0$;

- $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$ for all $A_1, A_2 \subseteq S$.

Definition: Two events $A_1$ and $A_2$ are called incompatible if $A_1 \cap A_2 = \emptyset$. In this case one
has
$$P(A_1 \cup A_2) = P(A_1) + P(A_2).$$
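The complement rule, the addition rule and the rule for incompatible events can be checked on a fair die with Python sets, where `-`, `&` and `|` play the roles of complement (relative to S), intersection and union. A minimal sketch; the events A1, A2, A3 below are hypothetical examples chosen here, not taken from the text:

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # fair die, uniform model

def P(event):
    return Fraction(len(event), len(S))

A1 = {2, 4, 6}  # hypothetical event "even number"
A2 = {4, 5, 6}  # hypothetical event "at least 4"

# complement rule: P(A^c) = 1 - P(A)
assert P(S - A1) == 1 - P(A1)

# addition rule: P(A1 ∪ A2) = P(A1) + P(A2) - P(A1 ∩ A2)
print(P(A1 | A2), P(A1) + P(A2) - P(A1 & A2))  # 2/3 2/3

# incompatible events (A1 ∩ A3 = ∅): probabilities simply add
A3 = {1, 3}
assert not (A1 & A3)
assert P(A1 | A3) == P(A1) + P(A3)
```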

Exercise. The following contingency table shows the data (in percent) of patients with diabetes
during an observation period of one year.

Age of patient | Light, parents diabetic | Light, parents not diabetic | Serious, parents diabetic | Serious, parents not diabetic
Below 40 | 15 | 10 | 8 | 2
Above 40 | 15 | 20 | 20 | 10

Suppose that a patient is chosen randomly from this group, and let A, B and C be the following
events:
A: the patient is a serious case;
B: the patient is younger than 40;
C: the patient's parents suffer from diabetes.

a) Find the probabilities $P(A)$, $P(B)$, $P(A \cap B)$, $P(A \cap B \cap C)$.
b) Describe the following events and find their respective probabilities: $A \cup B$, $A^c \cap B^c$.

Solution:
a) $P(A) = 0{,}08 + 0{,}02 + 0{,}20 + 0{,}10 = 0{,}40$;
$P(B) = 0{,}15 + 0{,}10 + 0{,}08 + 0{,}02 = 0{,}35$;
$P(A \cap B) = 0{,}08 + 0{,}02 = 0{,}10$
(the probability that the patient is younger than 40 and is a serious case);
$P(A \cap B \cap C) = 0{,}08$ (the probability that the patient is younger than 40, is a
serious case and the parents suffer from diabetes).
b)

$A \cup B$ is the event that the patient is a serious case or is younger than 40:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0{,}40 + 0{,}35 - 0{,}10 = 0{,}65.$$

$A^c \cap B^c$ is the event that the patient is not a serious case and is not younger than 40:
$$P(A^c \cap B^c) = 0{,}15 + 0{,}20 = 0{,}35.$$
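All of these probabilities are sums of table cells, so they can be recomputed mechanically. A Python sketch, assuming the table is encoded as a dictionary keyed by (age group, severity, parents diabetic); this encoding and the helper `P` are choices made here for illustration:

```python
from fractions import Fraction

# Percentages from the contingency table, keyed by (age, severity, parents_diabetic)
table = {
    ("below 40", "light", True): 15, ("below 40", "light", False): 10,
    ("below 40", "serious", True): 8, ("below 40", "serious", False): 2,
    ("above 40", "light", True): 15, ("above 40", "light", False): 20,
    ("above 40", "serious", True): 20, ("above 40", "serious", False): 10,
}

def P(condition):
    """Probability of the event described by condition(age, severity, parents)."""
    return Fraction(sum(v for k, v in table.items() if condition(*k)), 100)

def A(age, sev, par): return sev == "serious"   # serious case
def B(age, sev, par): return age == "below 40"  # younger than 40
def C(age, sev, par): return par                # parents diabetic

print(P(A), P(B))                                # 2/5 7/20
print(P(lambda *k: A(*k) and B(*k)))             # 1/10
print(P(lambda *k: A(*k) and B(*k) and C(*k)))   # 2/25
print(P(lambda *k: A(*k) or B(*k)))              # 13/20
print(P(lambda *k: not A(*k) and not B(*k)))     # 7/20
```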

Combinatorics

Very often the number of outcomes of an event or of the sample space is very large.
In order to handle such large numbers it is useful to take advantage of certain
counting techniques which also provide compact formulas.

Product rule

Suppose an experiment consists of r parts, where the first part has $k_1$ possible
outcomes, the second $k_2$, and so on. Then there are $k_1 k_2 \cdots k_r$ possible
outcomes for the experiment.

Example. A die is thrown twice, or two dice are thrown at once. There are
$6 \cdot 6 = 36$ outcomes and the probability of getting, e.g., the pair (2,5) is $\frac{1}{36}$.
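The product rule can be confirmed by listing all ordered outcomes. A Python sketch using `itertools.product`, assuming both throws are fair and independent; the last lines also check the sum-equals-10 case posed in Exercise 2 of the exercise list:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))  # ordered pairs (first throw, second throw)
print(len(outcomes))  # 36

p_2_5 = Fraction(outcomes.count((2, 5)), len(outcomes))
print(p_2_5)  # 1/36

p_sum_10 = Fraction(sum(a + b == 10 for a, b in outcomes), len(outcomes))
print(p_sum_10)  # 1/12
```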

Permutations

Given n different objects $a_1, \dots, a_n$, we may ask for the number of possible
different (ordered) arrangements, also called permutations, of these objects.

Example. For three objects a, b, c we have 6 possible arrangements

abc, acb, bac, bca, cab, cba.

In the general case this number can be found as follows: for the first object we have
n possibilities; once we have selected the first object, there remain $n-1$
possibilities for the second object. Proceeding in this way until the last object, we
find that the number of possible permutations in this case is
$$n! = n(n-1)(n-2)\cdots 2 \cdot 1.$$
The number $n!$ is called n factorial (by convention one puts $0! = 1$).
By the same argument one finds that the number of different (ordered) arrangements of k
objects selected from n objects equals
$$P^n_k = n(n-1)\cdots(n-k+1).$$
Note that $P^n_n = n!$. This number can also be expressed as
$$P^n_k = \frac{n!}{(n-k)!}.$$

Example. An urn contains nine balls which are labeled with the numbers 1, …, 9.
Four balls are drawn at random without replacement.

a) What is the probability of getting the numbers 2, 4, 6, 7 in this order?

b) What is the probability that the drawn numbers contain the sequence 3, 4, 5 (consecutively, in this order)?

Solution:

a) Our event "2467" consists of one element and the sample space has $P^9_4$
elements; therefore we get
$$P(\text{"2467"}) = \frac{1}{P^9_4} = \frac{1}{9 \cdot 8 \cdot 7 \cdot 6} = \frac{1}{3024} \approx 0.0003.$$

b) This event can be realized by 12 outcomes, namely x345 or 345x, where x is one of
the 6 numbers from 1, …, 9 different from 3, 4, 5. We thus get
$$P(\text{"x345 or 345x"}) = \frac{12}{9 \cdot 8 \cdot 7 \cdot 6} \approx 0.004.$$
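Both answers can be verified by brute force, enumerating all $P^9_4 = 3024$ ordered draws. A Python sketch using `itertools.permutations` and `math.perm` (Python 3.8 or later); the helper `contains_345` is a name introduced here:

```python
from fractions import Fraction
from itertools import permutations
from math import perm

draws = list(permutations(range(1, 10), 4))  # ordered draws without replacement
assert len(draws) == perm(9, 4) == 3024

p_2467 = Fraction(draws.count((2, 4, 6, 7)), len(draws))

def contains_345(seq):
    """True if 3, 4, 5 appear consecutively in this order (x345 or 345x)."""
    return seq[:3] == (3, 4, 5) or seq[1:] == (3, 4, 5)

p_345 = Fraction(sum(map(contains_345, draws)), len(draws))
print(p_2467, float(p_2467))  # 1/3024, about 0.00033
print(p_345, float(p_345))    # 1/252, about 0.004
```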

Example. In a room there are 23 persons. What is the probability that no two of
them have the same birthday (ignoring leap years and assuming that all days are equally likely)?
- Since each person may have been born on any day of the year, there are
$365^{23}$ possible combinations of birthdays.
- In order that all birthdays be different, we have 365 choices for the first person,
364 for the second and so on. The number we are looking for therefore equals
$365 \cdot 364 \cdots (365-22)$, and the probability is
$$\frac{365 \cdot 364 \cdots (365-22)}{365^{23}} \approx 0{,}493.$$
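The product in the birthday example is easy to evaluate exactly with integers before the final division. A Python sketch, assuming 365 equally likely days as in the text; `math.prod` requires Python 3.8 or later:

```python
from math import prod

def prob_all_distinct(k, days=365):
    """P(no two of k people share a birthday), assuming all days equally likely."""
    favourable = prod(days - i for i in range(k))  # 365 * 364 * ... * (365 - k + 1)
    return favourable / days**k

print(round(prob_all_distinct(23), 3))  # 0.493
```

With 366 or more people the product contains a factor 0, so the probability of all birthdays being distinct becomes 0, as it must.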

Combinations

In how many ways can we choose k elements from n different elements if the
order is not considered? In order to solve this problem, we can take the number of
ordered arrangements of k elements from n elements and divide by the number of
permutations of every group of k elements, which equals $k!$. We thus obtain that
the number of possible combinations of k elements out of n elements is given by
the binomial coefficient
$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}.$$
By convention one puts $\binom{n}{0} = 1$. In particular, the binomial coefficient fulfills the
relation
$$\binom{n}{k} = \binom{n}{n-k}.$$

Example. Find the combinations of 3 elements from a, b, c, d. We have the
following choices:
{a, b, c}; {a, b, d}; {a, c, d}; {b, c, d}.

Example. 4 cards are chosen from a pile of 52 cards. What is the probability of
getting 4 aces?

- The number of possible outcomes equals the number of possible combinations of
4 cards drawn from a pile of 52 without regard to order, which is
$$\binom{52}{4} = \frac{52 \cdot 51 \cdot 50 \cdot 49}{4 \cdot 3 \cdot 2 \cdot 1} = 270725;$$
only one of these combinations consists of the four aces, so the probability is 1/270725.
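Python's standard library provides both the enumeration and the count. A short sketch using `itertools.combinations` and `math.comb` (Python 3.8 or later) that reproduces the listing of 3-element subsets of a, b, c, d, the symmetry $\binom{n}{k} = \binom{n}{n-k}$, and the four-aces probability:

```python
from fractions import Fraction
from itertools import combinations
from math import comb

subsets = list(combinations("abcd", 3))
print(subsets)  # [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd')]
assert comb(4, 3) == comb(4, 1) == len(subsets)  # symmetry C(n, k) = C(n, n - k)

hands = comb(52, 4)               # unordered 4-card hands
p_four_aces = Fraction(1, hands)  # exactly one hand consists of the four aces
print(hands, p_four_aces)         # 270725 1/270725
```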

Conditional probability and independence

Very often the probability of an event A has to be modified once we know that another
event B, related to A, has occurred.

Example 5. A sample of 200 people has been classified according to body weight and
the incidence of hypertension. The results are listed in the following table.

Hypertension / weight | Overweight | Normal weight | Underweight | Total
Hypertension | 20 | 16 | 4 | 40
No hypertension | 30 | 90 | 40 | 160
Total | 50 | 106 | 44 | 200

a) What is the probability that a person chosen randomly from this group has
hypertension?

b) A person has been chosen randomly and has been found to be overweight. What is the
probability that this person has hypertension?

Solution:
Define the following events

A: The person has hypertension
B: The person is overweight

a) The probability of finding a person with hypertension (event A) equals the ratio of the
number of persons with hypertension to the total number of the group:
$$P(A) = \frac{20+16+4}{200} = 0.2.$$

b) This probability can be calculated by considering only the subgroup of overweight
persons (50). The probability of finding a person with hypertension within this group is

P(A|B) = 20/50 = 0,4.

Conditional probability: Given a probability space and an event B such that $P(B) > 0$,
the conditional probability of A, knowing that B occurs, is defined as
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

Very often one also uses the phrase "conditional probability of A under the hypothesis
B".

Independence: Given a probability space S, two events $A, B \subseteq S$ are called
independent if
$$P(A \cap B) = P(A)\,P(B),$$
which, in the case $P(B) > 0$, is equivalent to
$$P(A|B) = P(A).$$

Example 6. The events A and B from Example 5 above are not independent.
In fact

P(A) = 0,2 and

P(A|B) = 0,4,
therefore

$P(A|B) \ne P(A)$, and the events A and B are dependent.
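The definition $P(A|B) = P(A \cap B)/P(B)$ gives the same value as restricting attention to the overweight subgroup. A Python sketch, assuming the table of Example 5 (with column total 106 for normal weight) is encoded as a dictionary of counts; the key names are chosen here for illustration:

```python
from fractions import Fraction

# counts from Example 5: (hypertension status, weight class) -> number of people
counts = {
    ("hyper", "over"): 20, ("hyper", "normal"): 16, ("hyper", "under"): 4,
    ("no", "over"): 30, ("no", "normal"): 90, ("no", "under"): 40,
}
n = sum(counts.values())  # 200

p_A = Fraction(sum(v for (h, _), v in counts.items() if h == "hyper"), n)
p_B = Fraction(sum(v for (_, w), v in counts.items() if w == "over"), n)
p_AB = Fraction(counts[("hyper", "over")], n)

p_A_given_B = p_AB / p_B      # definition P(A|B) = P(A ∩ B) / P(B)
print(p_A, p_A_given_B)       # 1/5 2/5
print(p_AB == p_A * p_B)      # False: A and B are dependent
```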

Exercise. The probability that a cat lives 12 years is $\frac{1}{4}$, the probability that a dog lives
12 years is $\frac{1}{3}$.
Calculate the following probabilities, assuming that the dog and the cat are born at the
same time.

a) the cat will not be alive in 12 years;
b) both will be alive in 12 years;
c) at least one of them will be alive in 12 years;
d) neither of them will be alive in 12 years.

Solution:

Let A be the event "the cat will be alive in 12 years" and B the event "the dog will be
alive in 12 years".

a) $P(A^c) = 1 - P(A) = 1 - \frac{1}{4} = \frac{3}{4}$.
b) We can suppose that the events A and B are independent. Therefore, using the
property of independence we may write:
$$P(A \cap B) = P(A)\,P(B) = \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{12}.$$

c) The event "at least one of them will be alive in 12 years" is the composed event
$A \cup B$, and we get
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{4} + \frac{1}{3} - \frac{1}{12} = \frac{6}{12} = \frac{1}{2}.$$

d) The event "neither of them will be alive in 12 years" is the complement of the event
$A \cup B$; therefore
$$P((A \cup B)^c) = 1 - P(A \cup B) = 1 - \frac{1}{2} = \frac{1}{2}.$$
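The four answers follow from the complement rule, independence and the addition rule, and can be reproduced with exact fractions. A Python sketch, assuming (as the solution does) that the two events are independent; the final line cross-checks d) with $P(A^c \cap B^c) = P(A^c)P(B^c)$, which also holds under independence:

```python
from fractions import Fraction

p_cat, p_dog = Fraction(1, 4), Fraction(1, 3)  # P(A), P(B)

p_cat_dead = 1 - p_cat                   # a) complement rule
p_both = p_cat * p_dog                   # b) assumes independence
p_at_least_one = p_cat + p_dog - p_both  # c) addition rule
p_none = 1 - p_at_least_one              # d) complement of A ∪ B

print(p_cat_dead, p_both, p_at_least_one, p_none)  # 3/4 1/12 1/2 1/2
assert p_none == (1 - p_cat) * (1 - p_dog)         # same answer via P(A^c) P(B^c)
```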

Calculating probabilities by distinction of cases

Sometimes it is useful to split a random experiment into two or more disjoint
events (cases) whose union is the sample space, and to use the conditional probabilities
with respect to these cases.
For arbitrary events A and E in a probability space, with $0 < P(A) < 1$, the following formula
holds:
$$P(E) = P(E|A)\,P(A) + P(E|A^c)\,P(A^c).$$

To see this, note that $E = E \cap (A \cup A^c) = (E \cap A) \cup (E \cap A^c)$ with
disjoint union, and therefore
$$P(E) = P(E \cap A) + P(E \cap A^c) = P(E|A)\,P(A) + P(E|A^c)\,P(A^c).$$

The formula above can be generalized to more than two cases. Suppose that
we have a partition of the sample space S by events $A_1, \dots, A_n$, which
means that $A_i \cap A_j = \emptyset$ for $i \ne j$ and $\bigcup_{i=1}^{n} A_i = S$; then
$$P(E) = \sum_{i=1}^{n} P(E|A_i)\,P(A_i).$$

The latter formula is sometimes called formula of total probability.

Example. An urn contains 2 black and 5 white balls. Two balls are drawn
without replacement. What is the probability of getting a black ball in the
second draw?

Solution. Let E = "the second ball is black" and A = "the first ball is
black".
We have $P(A) = 2/7$, $P(A^c) = 5/7$, $P(E|A) = 1/6$ and $P(E|A^c) = 2/6$.
Applying the formula we thus get
$$P(E) = \frac{1}{6} \cdot \frac{2}{7} + \frac{2}{6} \cdot \frac{5}{7} = \frac{12}{42} = \frac{2}{7}.$$
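The result can be double-checked by enumerating all ordered pairs of distinct balls, which models drawing without replacement, and comparing with the formula of total probability. A Python sketch, assuming the balls are distinguishable by position in a list (an encoding chosen here):

```python
from fractions import Fraction
from itertools import permutations

balls = ["B", "B", "W", "W", "W", "W", "W"]  # 2 black, 5 white
# ordered pairs of distinct positions = draws without replacement
draws = list(permutations(range(len(balls)), 2))
p_second_black = Fraction(sum(balls[j] == "B" for _, j in draws), len(draws))

# formula of total probability with A = "first ball black"
p_A = Fraction(2, 7)
formula = Fraction(1, 6) * p_A + Fraction(2, 6) * (1 - p_A)
print(p_second_black, formula)  # 2/7 2/7
```

The second ball is black with the same probability as the first, a symmetry that the enumeration makes visible.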

Formula of Bayes

As an immediate consequence of the formula of total probability we get the
formula of Bayes. Given a partition of the sample space by events $A_1, \dots, A_n$,
we have for any event E with $P(E) > 0$
$$P(A_j|E) = \frac{P(A_j \cap E)}{P(E)} = \frac{P(E|A_j)\,P(A_j)}{\sum_{i=1}^{n} P(E|A_i)\,P(A_i)}.$$

If we think of the events $A_i$ as possible different hypotheses which might
have an influence on an experiment E, then the formula of Bayes shows us
how to update our hypotheses after having performed the experiment, by
passing from $P(A_j)$ to $P(A_j|E)$.

Example: In a population, a certain illness is present with probability 1/1000
and can be detected by a test that is positive with probability 0,99 if
the person is really ill, and positive with probability 0,05 if the
person is not ill. Compute the probability that a person whose test is
positive is ill.

Solution. Let M be the event "the person is ill", and let us denote by S the
event "the person is not ill". Moreover let + be the event "the test is
positive" and - the event "the test is negative". Then we have the
following information: P(M) = 0.001, P(+|M) = 0.99, P(+|S) = 0.05 (hence P(S) = 0.999). We apply
the Bayes formula in order to find the solution:

$$P(M|+) = \frac{0.001 \cdot 0.99}{0.001 \cdot 0.99 + 0.05 \cdot 0.999} \approx 0.0194.$$
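The formula of Bayes translates directly into a few lines of code: the denominator is the total probability of E over the partition. A Python sketch; the function name `bayes` and its argument layout (lists of priors $P(A_i)$ and likelihoods $P(E|A_i)$) are choices made here, not a standard library API:

```python
def bayes(priors, likelihoods, j):
    """P(A_j | E) from priors P(A_i) and likelihoods P(E | A_i) over a partition."""
    total = sum(p * lik for p, lik in zip(priors, likelihoods))  # P(E), total probability
    return priors[j] * likelihoods[j] / total

priors = [0.001, 0.999]     # P(M), P(S)
likelihoods = [0.99, 0.05]  # P(+|M), P(+|S)
print(round(bayes(priors, likelihoods, 0), 4))  # 0.0194
```

Despite the accurate test, fewer than 2% of positive results come from ill persons, because the illness is so rare.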

Exercises.

Exercise 1. The following table shows the age and the sex of 85 students of a
mathematics course:

Sex / Age | ≤ 20 | > 20 | Total
Male | 15 | 30 | 45
Female | 20 | 20 | 40
Total | 35 | 50 | 85

A student is chosen randomly from the class.
a) Find the probability that the student is male and at most 20 years old;
b) Find the probability that the student is male;

c) Find the probability that the student is (male or female) and at most 20
years old;
d) Find the probability that the student is male or female.
[R: a) 0,18, b) 0,53, c) 0,41, d) 1]

Exercise 2. A die is thrown twice. Find the probability that the sum of
the outcomes is 10.
[R: 1/12]

Exercise 3. The PIN of a credit card is given by a five-digit number.
Suppose that each sequence has the same probability.
a) Find the probability that all the PIN digits are different.
b) Find the probability that the PIN code has at least two equal digits.
[R: a) 0,30; b) 0,70]

Exercise 4. Let S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} be a sample space with equally likely outcomes, and
consider the following events: A = {an even number is chosen} and B = {a
multiple of 3 is chosen}.
Find
a) $P(A \cap B)$;
[1/10]
b) $P(A \cup B)$;
[7/10]
c) Are the events A and B independent?
[They are dependent]
d) the probability that the event B happens knowing that A occurs.
[1/5]

Exercise 5. A group of people with hypertension has been treated with one of
two different therapies, A and B. The following table shows the number of
patients who have improved:

Therapy | Improved | Not improved
Therapy A | 34 | 10
Therapy B | 61 | 35

Find the probability that, choosing a patient randomly,
a) the patient has undergone therapy B and has improved; [61/140]
b) the patient has undergone therapy B or has improved; [13/14]
c) Are the events "the patient has undergone therapy B" and "the
patient has improved" independent? [not independent]
d) knowing that the patient has undergone therapy B, find the probability
that he has improved. [61/96]

Exercise 6. The following table shows 3 different types of a certain gene
observed in 2 different places of the Apennines (Abetone, Pisanino).

Place | Gene 1 | Gene 2 | Gene 3 | Total
Abetone | 10 | 240 | 50 | 300
Pisanino | 10 | 150 | 20 | 180
Total | 20 | 390 | 70 | 480

a) Find the probability that the gene is of type 1 and is from Abetone.
b) Knowing that the gene is of type 1, find the probability that it is from Abetone.
c) Is there a dependence between the gene being of type 1 and its being
from Abetone?

Exercise 7. In a building there live 3 families A, B, C. Family A consists
of 4 men, family B of 3 men and 1 woman, family C of 2 men and 2
women. At a certain moment one observes a man leaving the
building. What is the probability that he belongs to family B?
[R=1/3]

Exercise 8. A committee is composed of 10 people: 4 women and 6 men.
Choosing a subset of 5 people from the committee at random, what is the
probability that there is no woman in the group?
[R=1/42]

Exercise 9. An urn contains 100 dice; 50 of them are fair, and 50 are unfair so
that the probability of throwing 1 is … for half of them and 1/10 for the other
half.
a) A die is drawn and thrown. What is the probability of getting 3?
b) Knowing that the outcome of the throw is 2, what is the probability that
the die is unfair?
[R: 2/15; 3/8]

Exercise 10. The next table lists the frequencies of different
types of prey and predator present in a certain place.

Predator / Prey | Hare | Chicken
Wolf | 60 | 30
Weasel | 40 | 20
Fox | 10 | 90

Find the probability that the prey is a hare and the predator a wolf.
[R=6/25]
Find the probability that a captured chicken has been caught by the fox.
[R=9/14]
Are the events "the prey is a hare" and "the predator is a wolf"
dependent? Why?
[R: yes, since P(hare ∩ wolf) = 6/25 = 150/625, while P(hare) P(wolf) = (11/25)(9/25) = 99/625]

Exercise 11. In a US city the percentage of Republicans in the
population is 48%. Find the probability that in a sample of 200 people fewer
than 100 are Republicans. Use a suitable approximation and find the error.
[R: 0,7143; 0,6915]

Random variables

Random variables serve to analyse the event space by assigning values to the elementary
events and studying their distributions.
Given a probability space, a random variable X is a function that associates a numerical
value with every outcome of an experiment. The adjective random indicates that, a priori,
we do not know the outcome of the experiment. A random variable can be discrete or
continuous, depending on whether it assumes discrete or continuous values.

For a discrete random variable that assumes the values $x_1, \dots, x_n$, we can associate with
each of its values the probability that X will assume this value, i.e.
$$p_i = P(\{e : X(e) = x_i\}) = P(X = x_i), \qquad i = 1, \dots, n.$$

The set $\{p_1, \dots, p_n\}$ is called the probability distribution of the random variable X.

Example 7. Let X be the random variable that counts the number of heads in three
tosses of a fair coin. Hence, the possible values of X are {0,1,2,3}, and the event space is
given by

TTT → X=0
HTT, THT, TTH → X=1
THH, HTH, HHT → X=2
HHH → X=3

and the probability distribution is the following:

x_i | p_i
0 | 1/8
1 | 3/8
2 | 3/8
3 | 1/8

The corresponding histogram is

[Figure: histogram with bars of height 0,125, 0,375, 0,375, 0,125 at x = 0, 1, 2, 3.]
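The distribution of Example 7 can be obtained by listing the 8 equally likely sequences and counting heads in each. A Python sketch using `itertools.product` and `collections.Counter`, assuming a fair coin as in the text:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))              # 8 equally likely sequences
counts = Counter(seq.count("H") for seq in outcomes)  # value of X -> number of sequences
dist = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in dist.items():
    print(x, p)  # 0 1/8, 1 3/8, 2 3/8, 3 1/8 (one pair per line)
assert sum(dist.values()) == 1  # the probabilities sum to 1
```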

The probability distribution of a random variable X is the table of the distinct values the
variable can assume together with their corresponding probabilities.
The probability distribution of a random variable has the following properties:
- $p_i \ge 0$ for all i;
- $p_1 + p_2 + \cdots + p_n = 1$.

The expectation value or mean of a r.v. X is given by
$$\mu = E(X) = \sum_{i=1}^{n} x_i p_i = x_1 p_1 + x_2 p_2 + \dots + x_n p_n.$$

Properties of Expectation
- E(c) = c if c is a constant;
- E(X+Y)=E(X)+E(Y) for all r.v. X and Y;

Example. An insurance company pays every client an amount of 1000 euros in the case
of an accident or robbery during a 5-day trip. If the risk of such an event can be
estimated as 1 in 200, what is the fair price the client should pay for the policy?

Solution. The probability that the company has to pay the client is 1/200 = 0.005; therefore,
if X is the sum paid by the company, its distribution is given by

x_i | p_i
0 | 0.995
1000 | 0.005

with expectation value

$$E(X) = 0 \cdot 0.995 + 1000 \cdot 0.005 = 5 \text{ euros}.$$

The fair price to be paid by the client is 5 euros.
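The expectation is a probability-weighted sum over the table of values. A minimal Python sketch, assuming the distribution is stored as a dictionary {value: probability}; the helper `expectation` is a name introduced here:

```python
from fractions import Fraction

def expectation(dist):
    """E(X) = sum of x_i * p_i for a discrete distribution given as {x_i: p_i}."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

# payment X of the insurance company (risk 1 in 200)
payout = {0: Fraction(199, 200), 1000: Fraction(1, 200)}
print(expectation(payout))  # 5
```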

The variance $\mathrm{Var}(X)$ of a random variable X with mean $\mu$ is defined by
$$\mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - E(X))^2 p_i = \sum_{i=1}^{n} x_i^2 p_i - (E(X))^2.$$

The variance measures the concentration of the distribution around the mean or, if you
like, the dispersion of the distribution. It is zero for constant random variables.

The standard deviation of a random variable X with mean $\mu$ is the square root of the
variance, $\sigma = \sqrt{\mathrm{Var}(X)}$.
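Both expressions for the variance (the definition and the shortcut $E(X^2) - (E(X))^2$) can be evaluated side by side. A Python sketch, applied to the number of heads in three tosses from Example 7; `mean_var` is a hypothetical helper and exact fractions make the two formulas agree exactly:

```python
from fractions import Fraction
from math import sqrt

def mean_var(dist):
    """Mean and variance of a discrete distribution {x_i: p_i}."""
    mu = sum(x * p for x, p in dist.items())
    var_def = sum((x - mu) ** 2 * p for x, p in dist.items())      # definition
    var_short = sum(x * x * p for x, p in dist.items()) - mu ** 2  # E(X^2) - E(X)^2
    assert var_def == var_short  # exact equality with Fractions
    return mu, var_def

heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mu, var = mean_var(heads)
print(mu, var, sqrt(var))  # 3/2 3/4 and sigma of about 0.866
```

A constant random variable, e.g. `{5: Fraction(1)}`, gives variance 0, as stated in the text.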

More examples of discrete distributions

Binomial Distribution

A Bernoulli trial is a random experiment with two possible outcomes, A
(success) and $A^c$ (failure), with probabilities p and 1-p, respectively.
Consider the random variable X which assigns the value 1 to event A and 0
to event $A^c$. We thus get
$$P(X=1) = P(A) = p \quad\text{and}\quad P(X=0) = P(A^c) = 1-p.$$

x_i | p_i
1 | p
0 | 1-p

Next, let us consider a fixed number n of independent Bernoulli trials, each with success
probability p. The number of successes obtained in n trials is a random variable we denote by Y.
This random variable Y is called a binomial random variable, with corresponding
binomial distribution given by
$$P(Y=k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \dots, n.$$

In fact, there are $\binom{n}{k}$ ways to get k successes in n trials, and each such
outcome has probability $p^k (1-p)^{n-k}$.

We shall also denote the binomial distribution by $B(n, p)$, that is,
$B(n,p)(k) = P(Y=k)$.

A binomial r.v. can be written as $X = X_1 + X_2 + \dots + X_n$, where $X_i$ (the outcome of the
i-th trial) equals 1 with probability p and 0 with probability 1-p.
By the above properties the expectation of a binomial r.v. is
$E(X) = E(X_1) + E(X_2) + \dots + E(X_n)$.
But $E(X_i) = p$, and therefore $E(X) = np$.

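The variance of X is $\mathrm{Var}(X) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)$ because of the
independence of the r.v. $X_i$, $i = 1, 2, \dots, n$. Moreover
$$\mathrm{Var}(X_i) = E(X_i^2) - (E(X_i))^2 = p - p^2 = p(1-p),$$
because $X_i^2 = 1$ with probability p and 0 with probability 1-p. Hence $\mathrm{Var}(X) = np(1-p)$.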

Example. A test consists of 10 questions, where each question has 4 possible
answers, only one of which is correct. In order to pass the test one has to
answer at least 8 questions correctly. What is the probability of passing the test
if somebody chooses the answers randomly?

Solution. For every question, the probability of guessing the right answer is
1/4.
If we assign 1 point for every correct answer, and Y is the random variable
that counts the points, then we get a binomial distribution with n = 10 and
p = 1/4. The probability is given by
$$P(Y \ge 8) = P(Y=8) + P(Y=9) + P(Y=10)$$
$$= \binom{10}{8}\left(\frac{1}{4}\right)^{8}\left(\frac{3}{4}\right)^{2} + \binom{10}{9}\left(\frac{1}{4}\right)^{9}\left(\frac{3}{4}\right)^{1} + \binom{10}{10}\left(\frac{1}{4}\right)^{10}\left(\frac{3}{4}\right)^{0}$$
$$= \frac{10 \cdot 9}{2} \cdot \frac{3^2}{4^{10}} + 10 \cdot \frac{3}{4^{10}} + \frac{1}{4^{10}} = \frac{436}{4^{10}} \approx 0.00042.$$
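The binomial probabilities, and the formulas $E(X) = np$ and $\mathrm{Var}(X) = np(1-p)$, can be checked exactly for the test example. A Python sketch using `math.comb` and `fractions.Fraction`; the helper `binom_pmf` is a name introduced here:

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(Y = k) = C(n, k) p^k (1 - p)^(n - k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, Fraction(1, 4)
p_pass = sum(binom_pmf(k, n, p) for k in range(8, n + 1))  # P(Y >= 8)
print(p_pass, float(p_pass))  # 109/262144 (= 436/4^10), about 0.00042

mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var = sum(k * k * binom_pmf(k, n, p) for k in range(n + 1)) - mean**2
print(mean, var)  # 5/2 15/8, i.e. np and np(1 - p)
```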

Hypergeometric distribution

The classical application of the hypergeometric distribution is sampling without
replacement. Think of an urn with two types of marbles, red ones and white ones. Define
drawing a white marble as a success and drawing a red marble as a failure. In the case with
replacement we obtain the binomial distribution. If N+M denotes the number of
all marbles in the urn and M the number of white marbles, then N is
the number of red marbles. Now, assume that there are 5 white and 4 red marbles in the urn.
Standing next to the urn, you close your eyes and draw 3 marbles without replacement. What
is the probability that exactly 2 of the 3 are white? Note that although we are looking at
success/failure, the data cannot be modeled by the binomial distribution, because the
probability of success is not the same on each trial: the composition of the remaining marbles
changes as we remove each marble.
Let us denote by Y the number of white marbles drawn (Y = 0, 1, 2, 3). The probability of drawing 2
white marbles is obtained as the ratio of favorable cases to all possible cases. The number of all
possible cases is equal to the number of combinations of 3 marbles out of the total of 9 marbles,
that is
$$\binom{9}{3} = \frac{9!}{3!\,6!} = 84.$$

The number of favorable cases is given by the product of the number of combinations of 2 white
marbles out of 5 and that of 1 red marble out of 4. Then one obtains
$$P(Y=2) = \frac{\binom{5}{2}\binom{4}{1}}{\binom{9}{3}} = \frac{\frac{5!}{2!\,3!} \cdot \frac{4!}{1!\,3!}}{\frac{9!}{3!\,6!}} = \frac{\frac{5 \cdot 4}{2} \cdot 4}{\frac{9 \cdot 8 \cdot 7}{3 \cdot 2}} = \frac{40}{84} = \frac{10}{21}.$$

In the case with replacement the probability of finding 2 white marbles in a sample of
size 3 is given by the binomial distribution with n = 3 and p = 5/9 (the probability of drawing a
white marble from an urn with 5 white marbles and 4 red ones). Then the required
probability is
$$P(Y=2) = \binom{3}{2}\left(\frac{5}{9}\right)^{2}\left(\frac{4}{9}\right)^{1} = \frac{100}{243} \approx 0.41.$$

Generally the hypergeometric distribution describes the number of successes in a
sequence of n draws without replacement from a finite population of size M+N, with $n \le M+N$,
composed of 2 kinds of objects: M of type I (success) and N of type II (failure). Then the
probability of having k successes ($k \le n$) in the sample is
$$P(Y=k) = \frac{\binom{M}{k}\binom{N}{n-k}}{\binom{M+N}{n}}.$$

Denote the hypergeometric distribution by Hyper(M+N, M, n). One can prove that
$$E(Y) = n\,\frac{M}{M+N}, \qquad \mathrm{Var}(Y) = n\,\frac{M}{M+N}\left(1 - \frac{M}{M+N}\right)\left(1 - \frac{n-1}{M+N-1}\right).$$

If $p = M/(M+N)$, then
$$E(Y) = np, \qquad \mathrm{Var}(Y) = np(1-p)\left(1 - \frac{n-1}{M+N-1}\right).$$

The binomial approximation to the hypergeometric

If M+N is much larger than n (at least 10 times: $M+N \ge 10n$), then the hypergeometric
distribution can be approximated by a binomial distribution with parameters n and $p = M/(M+N)$.

Exercise.
In a lake there are 12 fish belonging to different species; 6 of them are trout. Catching
4 fish at random from the lake, find the probability that none of them is a trout.

One has: M+N = 12, M = 6, n = 4, k = 0; then
$$P(Y=0) = \frac{\binom{6}{0}\binom{6}{4}}{\binom{12}{4}} = \frac{15}{495} \approx 0.0303.$$
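The hypergeometric probabilities are ratios of binomial coefficients, so they are easy to compute exactly. A Python sketch; the helper `hyper_pmf(k, total, successes, n)` follows the parametrization Hyper(M+N, M, n) used above, and its name is an assumption made here. It reproduces the marble and trout results, the with-replacement binomial value, and the mean and variance formulas for the marble urn:

```python
from fractions import Fraction
from math import comb

def hyper_pmf(k, total, successes, n):
    """P(Y = k) for Hyper(M+N, M, n): k successes in n draws without replacement."""
    failures = total - successes
    return Fraction(comb(successes, k) * comb(failures, n - k), comb(total, n))

print(hyper_pmf(2, 9, 5, 3))   # marbles: 10/21
print(hyper_pmf(0, 12, 6, 4))  # trout: 1/33 (= 15/495)

# with replacement the marble question is binomial B(3, 5/9)
p = Fraction(5, 9)
print(comb(3, 2) * p**2 * (1 - p))  # 100/243

# mean and variance for Hyper(9, 5, 3), compared with np and np(1-p)(1-(n-1)/(M+N-1))
pmf = {k: hyper_pmf(k, 9, 5, 3) for k in range(4)}
mean = sum(k * q for k, q in pmf.items())
var = sum(k * k * q for k, q in pmf.items()) - mean**2
print(mean, var)  # 5/3 5/9
```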

Example.

From a batch of 100 computers a sample of size 10 is chosen and tested. The batch will be
accepted if the sample contains at most 2 defective computers. Assuming that 10% of the
computers in the batch are defective, find the probability that the batch will be accepted.
Solution. Let Y be the number of defective computers in the sample. Then the probability
distribution is hypergeometric, Hyper(100, 10, 10), and the required probability is $P(Y \le 2)$:
$$P(Y \le 2) = \sum_{k=0}^{2} \frac{\binom{10}{k}\binom{90}{10-k}}{\binom{100}{10}} = P(Y=0) + P(Y=1) + P(Y=2),$$
where
$$P(Y=0) = \frac{\binom{10}{0}\binom{90}{10}}{\binom{100}{10}} = \frac{90 \cdot 89 \cdot 88 \cdots 81}{100 \cdot 99 \cdot 98 \cdots 91} \approx 0{,}330476,$$
$$P(Y=1) = \frac{\binom{10}{1}\binom{90}{9}}{\binom{100}{10}} = \frac{90 \cdot 89 \cdot 88 \cdots 82}{99 \cdot 98 \cdots 91} \approx 0{,}407995,$$
$$P(Y=2) = \frac{\binom{10}{2}\binom{90}{8}}{\binom{100}{10}} = \frac{9 \cdot 9 \cdot 90 \cdot 89 \cdot 88 \cdots 83}{2 \cdot 99 \cdot 98 \cdots 91} \approx 0{,}20151.$$

Finally we have
$$P(Y \le 2) = 0{,}330476 + 0{,}407995 + 0{,}20151 \approx 0{,}94.$$

Approximation by the binomial distribution

Using the binomial approximation ($100 \ge 10n$) with n = 10 and p = 0,1, we get
$$P(Y \le 2) \approx \sum_{k=0}^{2} \binom{10}{k} 0{,}1^{k}\, 0{,}9^{10-k} = 0{,}9^{10} + 10 \cdot 0{,}1 \cdot 0{,}9^{9} + \frac{10 \cdot 9}{2}\, 0{,}1^{2}\, 0{,}9^{8} \approx 0{,}93.$$

The error is about 0,01.

Hypergeometric distribution with Excel.

- HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, cumulative), where
- sample_s is the number of successes in the sample,
- number_sample is the size of the sample,
- population_s is the number of successes in the population,
- number_pop is the size of the population,
- cumulative is a logical value: for the cumulative distribution function, use TRUE; for
the probability mass function, use FALSE.
- The older function HYPGEOMDIST(sample_s, number_sample, population_s, number_pop) has no
cumulative argument and returns only the probability mass function.
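The exact hypergeometric answer and its binomial approximation can be compared directly. A Python sketch, assuming the batch of 100 computers with 10 defective as in the example; `hyper_cdf` and `binom_cdf` are hypothetical helper names introduced here:

```python
from math import comb

def hyper_cdf(k_max, total, successes, n):
    """P(Y <= k_max) for Hyper(total, successes, n)."""
    return sum(comb(successes, k) * comb(total - successes, n - k)
               for k in range(k_max + 1)) / comb(total, n)

def binom_cdf(k_max, n, p):
    """P(Y <= k_max) for B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

exact = hyper_cdf(2, 100, 10, 10)  # Hyper(100, 10, 10)
approx = binom_cdf(2, 10, 0.1)     # B(10, 0.1), allowed since 100 >= 10 * 10
print(round(exact, 4), round(approx, 4), round(exact - approx, 4))
```

The printed values are about 0.940 and 0.930, with a difference of about 0.01, matching the error quoted above.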

[Figure: Comparison of Hypergeometric(100,10,10) and binomial B(100; 0,1), probabilities plotted for k = 0, …, 10.]

Poisson distribution

Binomial distributions $B(n, p)$, where n is very large and p is very small such that
$np = \lambda$, can be very well approximated by the so-called Poisson distribution with
parameter $\lambda$, defined by
$$P(X=k) = \frac{\lambda^k}{k!}\,e^{-\lambda}, \qquad k = 0, 1, 2, \dots$$

The mathematical statement is that
$$\lim_{n \to \infty} B(n, p)(k) = \frac{\lambda^k}{k!}\,e^{-\lambda}, \qquad k = 0, 1, 2, \dots,$$
where p and n are related by $np = \lambda$.

The corresponding random variable X is naturally called a Poisson random
variable. Because p is small, one speaks of rare events. Such situations arise,
for example, for the number of defective pieces produced by some machine, the
number of persons older than 100 years, etc.
We may also interpret the case of rare events as follows. Whenever we observe
purely random events in a certain time period, then by subdividing this period
into small time intervals, such that the probability of the event occurring in each
interval is small, we are in the situation described above. Consider, for example,
the number of calls to some service line, the number of persons joining some queue,
or the number of e-mails distributed by some mail server.

Example. A worker produces objects which are defect with the probability
p=0.1. Assuming that the quality remains the same, what is the probability
that a sample of 10 pieces will contain at most one defect piece.

Solution. The distribution is binomial with n = 10 and p = 0.1. If X denotes
the random variable that counts the defective pieces, then we are interested in
the probability

$$P(X \le 1) = P(X = 0) + P(X = 1) = \binom{10}{0}0.1^{0}\,0.9^{10} + \binom{10}{1}0.1^{1}\,0.9^{9} \approx 0.7361$$

Using the Poisson distribution as an approximation, with parameter $\lambda = 10\cdot 0.1 = 1$, we get

$$P(X = 0) + P(X = 1) = e^{-1}\frac{1^{0}}{0!} + e^{-1}\frac{1^{1}}{1!} = 2e^{-1} \approx 0.7358$$
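Both numbers in this example can be reproduced with a few lines of Python; this is only a sketch of the arithmetic, with variable names chosen for illustration.

```python
from math import comb, exp, factorial

n, p = 10, 0.1
lam = n * p  # Poisson parameter lambda = np = 1

# Exact binomial: P(X <= 1) = P(X = 0) + P(X = 1)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(2))

# Poisson approximation with the same mean np
approx = sum(exp(-lam) * lam**k / factorial(k) for k in range(2))

print(f"binomial: {exact:.4f}, Poisson: {approx:.4f}, error: {exact - approx:.4f}")
```

It should print values agreeing with 0.7361 and 0.7358 to four decimals.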

Exercises
Exercise 1. In a village of 200 people, 5 have a genetic illness. A sample of 3 people is
chosen from the population. Let X be the number of ill people in the sample.
a) Find the probability distribution of X;
b) Find a suitable approximation for the distribution of X.

Exercise 2. In the office of a company, customer orders arrive by fax at an average
rate of 10 per day.
a) Find the distribution of the number of orders arriving on a randomly chosen day.
Compute the probability that at most 3 orders arrive on a given day. (0,01)
2% of all orders cannot be satisfied because they refer to articles that are no longer
produced.
b) Which distribution describes the number of orders that can be satisfied among 100
received orders? Find the probability that among 100 orders at least 2 cannot be
satisfied. (0,5967)
c) Use a suitable approximation. (0,594)

Exercise 3. A web site is visited on average 1000 times per hour. On average, 10 out of
every 1000 visitors come from an Italian address (.it).
Find the probability that in the next hour the number of visits from Italy is twice its mean.
Use a suitable approximation and compute the error. (R=0.00179, 0.00187).

Exercise 4. A company receives fuses in groups of 40 pieces. From each group, 4 fuses are
chosen at random and inspected. If at least 1 fuse is broken, the company rejects the
whole group of 40 pieces.
If 10% of the group is defective, find the probability that the group will be accepted. Find an
approximation for this probability and compute the error. (0,6445;0,6561)

Exercise 5. Suppose that the probability that a product is defective is p = 0,001. Taking a
sample of size 10000:
a) Find the mean number of defective pieces. (10)
b) Find the probability that the sample contains no defective pieces. (0,0000451733)
c) Find the probability that it contains at least 10 defective pieces. (0,542132876)
d) Find the probability in c) by using a suitable approximation.

Exercise 6. The probability that a certain basketball player scores a point is 0.8. In a
sequence of 9 shots, what is the probability that he scores
a) at least 5 points?
b) at most 5 points?
c) fewer than 5 points?
R=[0,9804; 0,0856; 0,0196]

Exercise 7. Suppose that in a population the percentage of people over 90 years old is 1%.
Find the probability that in a sample of size 1000 at least 10 people are over 90 years old.
Find a suitable approximation and compute the error.
[R=0,5427; 0,5421]

Continuous distributions. Continuous random variables take values on a continuous
scale, i.e. their values are real numbers. In the case of discrete random variables we may
regard the frequency distribution as a particular histogram in which each value forms
a class. For a continuous random variable we may imagine starting from a histogram
that corresponds to a certain precision of measurement. Performing finer and finer
measurements while at the same time increasing the number of measurements, we obtain
histograms with more and more classes. Proceeding in this way, the histograms
approximate a continuous curve ever more closely.


To every continuous random variable one associates its probability density $f(x)$, which
plays the role of the probability distribution in the discrete case. The probability density is a
function of the real variable $x$. It satisfies the following properties:
- $f(x) \ge 0$ for all $x$;
- the total area under the probability density is 1;
- the area under the curve between the values $a$ and $b$ is
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx.$$

The expectation value can be calculated by

$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx,$$

and the variance as

$$\mathrm{Var}(X) = \int_{-\infty}^{\infty}\big(x - E(X)\big)^{2} f(x)\,dx.$$

The mode is the value where the density attains its largest value, and the median is the
value $m$ such that

$$\int_{-\infty}^{m} f(x)\,dx = \int_{m}^{\infty} f(x)\,dx = \frac{1}{2}.$$
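These integrals can also be approximated numerically. The sketch below assumes a hypothetical density f(x) = 2x on [0, 1] (not taken from the notes), for which E(X) = 2/3, Var(X) = 1/18 and the median is 1/√2; it uses a simple midpoint rule for the integrals and bisection for the median, which is one choice among many numerical methods.

```python
def f(x: float) -> float:
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(g, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


# The support of f is [0, 1], so the infinite limits can be replaced by 0 and 1.
total = integrate(f, 0.0, 1.0)                                # total area, should be 1
mean = integrate(lambda x: x * f(x), 0.0, 1.0)                # E(X)
var = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)   # Var(X)


def median(lo: float = 0.0, hi: float = 1.0, tol: float = 1e-10) -> float:
    """Bisection for m such that the area under f from 0 to m equals 1/2."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, 0.0, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


m = median()
print(total, mean, var, m)
```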

The Gaussian or normal random variables

A random variable $X$ is called normal or Gaussian with parameters $\mu$ and $\sigma^{2}$ if its
probability density is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}.$$

For a normal r.v. we have:

- $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^{2}$;
- its density is symmetric with respect to the mean;
- the mean, median and mode coincide (they are all equal to $\mu$);
- its density is strictly positive everywhere.
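A short sketch comparing the density formula above with Python's `statistics.NormalDist` (standard library, Python 3.8+), which is parametrized by the mean and the standard deviation σ rather than by the variance. The parameters μ = 2 and σ = 0.5 are arbitrary illustrative values.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density with mean mu and standard deviation sigma (variance sigma**2)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


mu, sigma = 2.0, 0.5  # illustrative parameters
ref = NormalDist(mu, sigma)

for x in (0.5, 1.3, 2.0, 2.7, 3.5):
    # The hand-written formula should agree with the standard-library density.
    print(x, normal_pdf(x, mu, sigma), ref.pdf(x))

# Symmetry about the mean: f(mu + d) equals f(mu - d)
d = 0.8
print(normal_pdf(mu + d, mu, sigma), normal_pdf(mu - d, mu, sigma))
```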

The following plot shows the density of a normal r.v. with parameters $\mu = 0$ and $\sigma^{2} = 1$.
[Figure: the standard normal density dnorm(x), plotted for x from -4 to 4]

Standardization of random variables. Given a continuous r.v. $X$ with mean $\mu$ and standard
deviation $\sigma$, it is possible to transform it into a standardized form, i.e. a r.v. $Z$ with
mean 0 and standard deviation 1. This is obtained by setting

$$Z = \frac{X - \mu}{\sigma}.$$
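When X is normal, standardization reduces every probability to the standard normal distribution function Φ(z) = P(Z ≤ z): P(X ≤ x) = Φ((x - μ)/σ). A minimal check in Python, assuming hypothetical values μ = 10, σ = 2 and x = 13 (so z = 1.5):

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0  # hypothetical mean and standard deviation of X
x = 13.0

z = (x - mu) / sigma          # standardized value, here 1.5
phi = NormalDist().cdf        # Phi(z) = P(Z <= z) for the standard normal

p_via_z = phi(z)                          # P(X <= x) computed through Z
p_direct = NormalDist(mu, sigma).cdf(x)   # same probability from the distribution of X

print(z, p_via_z, p_direct)
```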

Remark. The Gaussian distribution, or curve of errors, is the most important continuous
distribution. It was proposed by Gauss (1809) in the context of the theory of errors. It
is sometimes also attributed to Laplace (1812), who established its basic properties. The
adjective normal comes from the observation that for many physical and biological
phenomena the observed distribution can be modelled by a normal distribution.

Changing the mean does not change the shape of the curve; it only shifts the position of the maximum.
The following curves all have standard deviation 1 but different means.

[Figure: normal distributions with equal variance (standard deviation 1) and different means, plotted for x from -3 to 4]

[Figure: normal distributions with equal mean and different standard deviations, plotted for x from -1,5 to 3,5]

Critical values with two tails.

If the total area of two symmetric tails is $\alpha$, what is the corresponding critical value $z_c$?

Solution. Tables of the standardized normal distribution usually list the values of
$\Phi(z) = P(Z \le z)$, from which we find $z_c$ as the solution of the equation

$$2\big(1 - \Phi(z_c)\big) = \alpha.$$

For example, for $\alpha = 0{,}05$ we find $z_c = 1.96$.
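Instead of reading a table, z_c can be obtained from the inverse of Φ, since the equation above is equivalent to Φ(z_c) = 1 - α/2. A sketch using `statistics.NormalDist().inv_cdf` from the Python standard library; the function name is illustrative:

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha: float) -> float:
    """z_c with 2 * (1 - Phi(z_c)) = alpha, i.e. Phi(z_c) = 1 - alpha / 2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    zc = two_tailed_critical_value(alpha)
    # Check: the two symmetric tails beyond -zc and +zc have total area alpha.
    tails = 2 * (1 - NormalDist().cdf(zc))
    print(f"alpha = {alpha:.2f}: z_c = {zc:.3f}, tail area = {tails:.4f}")
```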

Central Limit Theorem. Given a sequence of independent random variables $\{X_n\}$ with the
same probability distribution, consider their sum $S_n = X_1 + X_2 + \dots + X_n$. If
$E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^{2}$, then by the properties of the expectation and the
variance we get $E(S_n) = n\mu$ and $\mathrm{Var}(S_n) = n\sigma^{2}$.
The corresponding normalized variable is

$$Z = \frac{S_n - n\mu}{\sigma\sqrt{n}}.$$

A fundamental result of probability theory, the Central Limit Theorem (CLT), states that
if the sample size is large enough (for instance $n \ge 30$), then, whatever the initial
probability distribution, the distribution of the normalized sum $Z$ can be approximated by
the standard Gaussian distribution.
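The theorem can be illustrated by simulation. The sketch below is an illustration under its own assumptions, not part of the notes: it uses exponential variables with mean 1 and standard deviation 1, a deliberately skewed distribution, sums n = 30 of them, standardizes the sum as above and compares the result with the standard normal. With n = 30 some skewness remains, so the agreement is close but not exact.

```python
import random
from statistics import fmean, pstdev


def standardized_sums(n: int, reps: int, seed: int = 1) -> list[float]:
    """Simulate Z = (S_n - n*mu) / (sigma*sqrt(n)) for sums of n exponential(1) variables."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of an exponential(1) variable
    zs = []
    for _ in range(reps):
        s = sum(rng.expovariate(1.0) for _ in range(n))
        zs.append((s - n * mu) / (sigma * n**0.5))
    return zs


zs = standardized_sums(n=30, reps=20_000)
share_below = sum(z <= 1.96 for z in zs) / len(zs)
print(f"mean {fmean(zs):.3f}, sd {pstdev(zs):.3f}, "
      f"P(Z <= 1.96) ~ {share_below:.3f} (standard normal: 0.975)")
```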

Example: A pair of dice is thrown and X is the random variable giving the sum of the two
outcomes. Suppose the pair of dice is thrown 5 times and this experiment is repeated 100
times; each row of the table lists the five sums X1, ..., X5 and their average.

X1 X2 X3 X4 X5 Average
5 7 5 7 5 5,8
8 4 9 11 7 7,8
6 10 8 8 7 7,8
6 5 6 7 6 6
7 12 8 5 9 8,2
9 9 4 5 4 6,2
6 3 10 6 4 5,8
5 7 8 5 7 6,4
8 10 7 7 2 6,8
11 3 5 9 9 7,4
6 6 3 5 7 5,4
7 7 10 3 10 7,4
6 3 7 6 8 6
10 7 9 9 4 7,8
6 6 11 12 10 9
7 7 3 8 5 6
9 11 3 11 10 8,8
6 10 8 3 11 7,6
6 2 6 4 8 5,2
2 11 8 7 7 7
9 8 10 4 12 8,6
4 8 5 7 6 6
8 5 6 5 5 5,8
12 5 7 8 11 8,6
7 10 6 11 6 8
9 9 8 7 9 8,4
4 3 2 6 3 3,6
8 4 6 7 8 6,6
5 6 8 8 4 6,2
7 8 4 11 8 7,6
4 6 6 5 6 5,4
11 7 11 8 5 8,4
5 8 9 4 6 6,4
7 5 6 5 6 5,8
7 8 6 6 2 5,8
8 10 4 11 7 8
6 10 8 10 8 8,4
5 5 7 3 7 5,4
6 2 6 6 4 4,8
7 3 6 9 9 6,8
12 10 9 7 7 9
6 5 7 8 7 6,6
8 7 11 11 5 8,4
7 5 6 6 7 6,2
6 7 12 3 4 6,4
11 7 6 10 6 8
11 5 6 6 4 6,4
6 4 8 4 7 5,8
7 8 8 5 11 7,8
10 6 5 8 7 7,2
8 4 8 4 7 6,2
7 8 9 4 7 7
11 4 9 7 7 7,6
6 9 6 3 11 7
11 7 9 12 4 8,6
7 3 7 5 7 5,8
9 3 9 6 6 6,6
8 9 7 7 4 7
9 7 5 8 6 7
7 9 4 9 6 7
6 12 6 11 5 8
12 7 6 8 8 8,2
8 10 11 6 7 8,4
7 5 9 9 7 7,4
6 6 9 7 9 7,4
4 8 6 7 9 6,8
5 8 4 5 8 6
6 8 8 4 10 7,2
6 6 6 6 4 5,6
10 6 7 4 3 6
8 11 10 9 8 9,2
5 6 9 7 10 7,4
8 5 3 6 3 5
9 6 8 9 7 7,8
5 7 10 7 8 7,4
8 7 8 5 12 8
8 7 9 6 6 7,2
5 8 6 9 4 6,4
6 7 7 8 11 7,8
7 5 7 9 10 7,6
4 12 9 6 8 7,8
4 6 7 9 3 5,8
6 8 9 4 7 6,8
7 11 6 6 5 7
4 4 5 7 4 4,8
9 5 10 5 7 7,2
7 8 8 6 4 6,6
4 6 8 3 7 5,6
9 6 10 10 5 8
3 5 7 8 8 6,2
10 10 11 3 4 7,6
6 6 10 6 6 6,8
6 7 6 10 5 6,8
8 3 8 9 7 7
7 5 5 5 2 4,8
8 7 9 3 5 6,4
7 8 8 10 11 8,8
12 9 4 7 5 7,4
8 11 9 5 5 7,6
10 2 10 8 9 7,8

[Figure: histogram of the 100 averages, with classes 3 to 12 and More]

Descriptive statistics

Mean                  6,96
Standard Error        0,11
Median                7,00
Mode                  5,80
Standard Deviation    1,11
Sample Variance       1,24
Kurtosis             -0,25
Skewness             -0,25
Range                 5,60
Minimum               3,60
Maximum               9,20
Sum                 696,20
Count                  100
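The experiment in the table can be repeated by simulation. In this sketch the seed, the repetition counts and the function name are arbitrary choices; the theoretical values follow from the facts that the sum of two fair dice has mean 7 and variance 35/6, so the average of 5 independent sums has mean 7 and standard deviation √(35/30) ≈ 1,08, in line with the observed mean 6,96 and standard deviation 1,11. Simulated values will differ from the table, since they come from a different random sample.

```python
import random
from statistics import fmean, stdev


def dice_experiment(reps: int, throws: int = 5, seed: int = 2) -> list[float]:
    """Each repetition: throw a pair of dice `throws` times and average the sums."""
    rng = random.Random(seed)
    averages = []
    for _ in range(reps):
        sums = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(throws)]
        averages.append(sum(sums) / throws)
    return averages


# Theory: the sum of two dice has mean 7 and variance 35/6,
# so the average of 5 such sums has mean 7 and standard deviation sqrt(35/6/5).
theory_sd = (35 / 6 / 5) ** 0.5

small = dice_experiment(reps=100)      # same size as the table above
large = dice_experiment(reps=20_000)   # many more repetitions
print(f"100 reps:   mean {fmean(small):.2f}, sd {stdev(small):.2f}")
print(f"20000 reps: mean {fmean(large):.2f}, sd {stdev(large):.2f}, theory sd {theory_sd:.2f}")
```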

Gaussian approximation to the binomial.

The binomial distribution is symmetric for $p = 0{,}5$ and becomes asymmetric for $p \ne 0{,}5$.
A binomial distribution can be well approximated by the normal distribution when
$np \ge 5$ and $n(1-p) \ge 5$;
otherwise, under suitable hypotheses, one can use the Poisson distribution.
Generally, if $X$ is a binomial r.v. satisfying these conditions, then

$$P(X \le x) \approx \Phi\left(\frac{x - np}{\sqrt{np(1-p)}}\right),$$

where

$$\Phi(z) = P(Z \le z)$$

and $Z$ is the standard normal distribution. The values of $\Phi$ can be found in statistical
tables.
Note that the binomial distribution is discrete while the Gaussian distribution is continuous. In
order to compare these distributions one adds a continuity correction of 0,5:

$$P(a \le X \le b) \approx \Phi\left(\frac{b - np + 0{,}5}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{a - np - 0{,}5}{\sqrt{np(1-p)}}\right).$$

Example: A new robot has been designed to make welds and will replace an old
robot. The new model will be considered good if it makes errors on 1% of welds and bad if it
makes errors on 6% of welds. 100 welds are tested. The new design will be accepted if the
number of errors is at most 2, and rejected otherwise.
What is the probability that a good design will be rejected?
What is the probability that a bad design will be accepted?

Solution. Let X be the number of defective welds among 100. The random variable X can
be considered binomial with n = 100 and p = 0,01 or 0,06. If the design is good,
p = 0,01, and the design is rejected if X > 2. For p = 0,01, P(X > 2) can be approximated by a
Poisson distribution with $\lambda = np = 1$, but not by a Gaussian distribution, because
np = 100·0,01 = 1 < 5. Therefore

$$P(X > 2) = 1 - P(X \le 2) = 1 - P(X = 0) - P(X = 1) - P(X = 2) = 1 - e^{-1}\left(\frac{1^{0}}{0!} + \frac{1^{1}}{1!} + \frac{1^{2}}{2!}\right) = 1 - e^{-1}(1 + 1 + 0{,}5) \approx 0{,}0803$$

Using the binomial distribution,

$$P(X > 2) = 1 - P(X \le 2) = 1 - \binom{100}{0}0{,}99^{100} - \binom{100}{1}0{,}99^{99}\,0{,}01 - \binom{100}{2}0{,}99^{98}\,0{,}01^{2} \approx 0{,}0794$$

The approximation is good. Analogously, the probability that a bad design will be accepted,
under the assumption p = 0,06, can be approximated by the Gaussian distribution, because
np = 0,06·100 = 6 > 5 (and n(1-p) = 94 > 5). With the continuity correction and
$\sqrt{np(1-p)} = \sqrt{5{,}64} \approx 2{,}37$,

$$P(X \le 2) \approx \Phi\left(\frac{2 + 0{,}5 - 6}{2{,}37}\right) \approx \Phi(-1{,}47) \approx 0{,}0708$$

Computing this probability directly, one obtains

$$P(X \le 2) = \binom{100}{0}0{,}94^{100} + \binom{100}{1}0{,}94^{99}\,0{,}06 + \binom{100}{2}0{,}94^{98}\,0{,}06^{2} \approx 0{,}057$$

The error is e = 0,057 - 0,071 = -0,014.
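The four probabilities of this example can be recomputed with the Python standard library. In this sketch the function names are illustrative, and the normal approximation uses the unrounded value z ≈ -1,474, so it gives about 0,0703 rather than the 0,0708 obtained above from z rounded to -1,47.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist


def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x) for a Poisson variable with parameter lam."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))


def normal_cdf_cc(x: int, n: int, p: float) -> float:
    """Normal approximation of P(X <= x) with the 0.5 continuity correction."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(x + 0.5)


n, limit = 100, 2  # 100 welds tested; accepted if at most 2 errors

# Good design (p = 0.01): probability of rejection P(X > 2); np = 1 < 5, so Poisson.
good_exact = 1 - binom_cdf(limit, n, 0.01)
good_poisson = 1 - poisson_cdf(limit, n * 0.01)

# Bad design (p = 0.06): probability of acceptance P(X <= 2); np = 6 > 5, so normal.
bad_exact = binom_cdf(limit, n, 0.06)
bad_normal = normal_cdf_cc(limit, n, 0.06)

print(f"good rejected: exact {good_exact:.4f}, Poisson {good_poisson:.4f}")
print(f"bad accepted:  exact {bad_exact:.4f}, normal {bad_normal:.4f}")
```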
