Probability and Probability Distributions: Dr Martin C. Simuunza, Dept of Disease Control, School of Veterinary Medicine


1.1 Relative frequency and probability
• The event A occurs r times during m experiments or trials.
• The relative frequency h(A) is then defined by the formula
h(A) = r/m

Example 1.1
• Toss a coin m = 10 times (10 experiments) and define the event A as getting "head".
• Assume that “head” occurred r = 8 times. The relative frequency will
then be

h(head) = r/m = 8/10 = 0.8


• Example 1.2
• m = 50 persons are examined for a syndrome called
“restless legs”. The event A of interest is a given
person suffering from “restless legs”. Assume that r
= 21 of the examined persons suffered from the
syndrome. The relative frequency of “restless legs”
will then be:

h(restless legs) = 21/50 = 0.42

• The number of experiments m in both examples is small. What happens if we increase it?
Example 1.1 (cont)

The number of coin tosses is increased from m = 10 to m = 10,000.

No. of experiments   No. of "heads"   Relative frequency
        10                  8               0.800
        50                 20               0.400
       100                 46               0.460
      1000                526               0.526
    10,000               5080               0.508
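The stabilisation seen in the table can be reproduced with a short simulation (a sketch in Python; the simulated coin is assumed fair, so h(head) should settle near 0.5 rather than the 0.508 above):

```python
import random

random.seed(1)  # reproducible run

# Estimate P(head) by the relative frequency h(A) = r/m for growing m,
# mirroring the table above.
for m in (10, 50, 100, 1000, 10_000):
    r = sum(random.random() < 0.5 for _ in range(m))  # r = number of heads
    print(f"m = {m:6d}   h(head) = {r / m:.3f}")
```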
Example 1.2 (Cont.)
• Number of examined persons is increased from m = 50 to m = 4722

No. of experiments   No. with restless legs   Relative frequency
        50                    21                    0.42
       100                    23                    0.23
       200                    56                    0.28
       500                   161                    0.32
      1000                   316                    0.316
      4722                  1448                    0.307
• From these two experiments we can see that the relative frequency becomes more stable as the number of experiments increases.
• It seems that the relative frequency approaches a "true underlying frequency". This "true underlying frequency" is called the probability.

Definition
The probability of an event is the event’s long-run
relative frequency in repeated experiments (trials)
under the same conditions

We refer to the probability of an event A as P(A).


Read as “probability of A”.
Sample Space and events
• The term "experiment", when used in statistics, is not restricted to laboratory experiments, but includes:

• Any activity that results in the collection of data pertaining to a phenomenon that exhibits variation.

Definition
An experiment is the process of collecting data relevant to a phenomenon that exhibits variation in its outcomes
Definition
The sample space is an exhaustive list of all the
possible outcomes of an experiment.

Each possible outcome is represented by one and only one point in the sample space, which is usually denoted by S

Example 1.3
Experiment: A blood sample is examined for blood
grouping.
Sample space: {O, A, B, AB}
Example 1.4
Experiment: Throwing a die
Sample space: {1, 2, 3, 4, 5, 6}

In all the above examples, the sample space is finite.

Definition
A sample event is defined as each distinct outcome
of an experiment
Example 1.3 (cont.)
The outcome of blood examination with regard to
blood group may be O, A, B, or AB. All these
outcomes or results are sample events.

Example 1.4 (cont.)


The outcome of throwing one die may be 1, 2, 3, 4, 5
or 6. All these results are sample events.
Definition
A collection of elementary outcomes (sample event) characterised by
some descriptive feature is called an event.
An event is a subset of a sample space

Example 1.3 (cont.)


Sample space: {O, A, B, AB}
Events: all possible combinations of sample events

Example 1.4 (cont.)


Sample space: {1, 2, 3, 4, 5, 6}
Events: all possible combinations of sample events
Definitions
A discrete sample space is a sample space consisting of either a finite or countably infinite number of sample events (elements).

A continuous sample space is a sample space consisting of an uncountably infinite number of sample events (elements).

The sample spaces described in examples 1.3 and 1.4 are both discrete
sample spaces.

Example 1.5
Experiment: The age of a cow is to be determined.
Sample space: {All real numbers between 0 and 20}

This is a continuous sample space


1.3 Probability and events
Basic Definition

P(event) = (number of times the event occurs) / (number of times the experiment is repeated)

Example 1.6
Experiment: Examine one cow
Event: Clinical mastitis

N = 100 experiments (cows examined)
n = 24 cases of clinical mastitis

P(event) = P(clinical mastitis) = n/N = 24/100 = 0.24


• If all sample events are equally likely, this rule can be written as

P(event) = (number of sample events in the event) / (number of sample events in the sample space)

Example 1.3 (cont.)


Sample space: {O, A, B, AB}
Sample events: O, A, B or AB

We are interested in blood group A or AB

P(event) = (number of sample events in the event) / (number of sample events in the sample space)
= n/N = 2/4 = 0.5
However, this probability calculation assumes that each sample event is
equally likely. This is not always the case.
Example 1.7

We analyse N = 100 blood samples for blood grouping, i.e. we repeat the experiment 100 times.

The event is still {A, AB}

The results are : 43 cases with A and 3 cases with AB.

P(event) = (number of times the event occurs) / (number of repeated experiments) = (43 + 3)/100 = 0.46

This is an estimate based on the available information.
Conditions for a model of probability
for discrete sample space
The probability is a function defined on events that satisfies the following conditions:
1. For all events A, 0≤P(A)≤1

2. If A is an event that must occur, then P(A) = 1. The event A then consists of all the sample events in the sample space.

3. If A is an event that can never occur, then P(A) = 0. The event A will
not consist of any sample events in the sample space.

4. P(A) is the sum of probabilities of all sample events belonging to A.

5. P(sample space) = 1
Permutation and combination
As long as the sample space is simple and it is easy to find "the number of elements in A", we can directly use the rule:

P(A) = (number of elements in A) / (number of elements in the sample space)

If the event A is more complex, we need some new rules

Rule of permutations
The number of different orderings that can be formed with r objects selected from a group of n distinct objects is denoted by

P(n,r) = n(n−1) ··· (n−r+1)


Example 1.8:
How many different orderings can be formed with r objects selected from a group of n = 10?

r = 1:  P(10,1) = 10
r = 2:  P(10,2) = 10 · 9 = 90
r = 3:  P(10,3) = 10 · 9 · 8 = 720
r = 4:  P(10,4) = 10 · 9 · 8 · 7 = 5040
r = 5:  P(10,5) = 10 · 9 · 8 · 7 · 6 = 30240
The number of different orderings of all n objects in a group of n is
P(n,n) = n(n−1)(n−2) ··· 2 · 1
P(n,n) = n! = "n factorial"
Example 1.9
How many different orderings can be formed with n objects in a group of n?

n = 2 objects, A and B:
A, B
B, A          P(2,2) = 2 · 1 = 2

n = 3 objects, A, B and C:
ABC
ACB
BAC
BCA           P(3,3) = 3 · 2 · 1 = 6
CAB
CBA
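The counts above can be checked with Python's math.perm and itertools.permutations (a short sketch):

```python
from itertools import permutations
from math import perm

# P(n, r) = n(n-1)...(n-r+1): the number of orderings of r objects out of n.
for r in range(1, 6):
    print(f"P(10,{r}) = {perm(10, r)}")   # 10, 90, 720, 5040, 30240

# Example 1.9: list all orderings of {A, B, C}; there are 3! = 6 of them.
orders = list(permutations("ABC"))
print(len(orders))  # 6
```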
The permutation rule deals with enumerating all arrangements when
choosing r objects out of n
In most situations we are interested only in the number of possible
choices of a group of r objects out of n, without looking at the order.

Example 1.10
We have a group of n = 3 objects (A, B, C) and we want to select 2.
If the ordering is of interest, we can perform this in
P(3,2) = 3 · 2 = 6 ways

[AB, BA, AC, CA, BC, CB]

If the ordering is of no interest, we can perform this in [AB, AC, BC] = 3 ways

This is called Combination


Rule of Combinations
The number of possible collections of r objects chosen from a group of n distinct objects is denoted by

C(n,r) = n! / (r!(n−r)!) = P(n,r) / r!

Example 1.11
How many different ways can r objects be selected from a group of n = 10?

r = 1:  C(10,1) = 10!/(1!·9!) = 10   (same as for permutation)
r = 2:  C(10,2) = 10!/(2!·8!) = 45
r = 3:  C(10,3) = 10!/(3!·7!) = 120
r = 4:  C(10,4) = 10!/(4!·6!) = 210
r = 5:  C(10,5) = 10!/(5!·5!) = 252
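Python's math.comb gives the same values, and the identity C(n,r) = P(n,r)/r! can be verified directly (a short sketch):

```python
from math import comb, factorial, perm

# C(n, r) = n! / (r!(n-r)!) = P(n, r) / r!: the number of unordered selections.
for r in range(1, 6):
    assert comb(10, r) == perm(10, r) // factorial(r)
    print(f"C(10,{r}) = {comb(10, r)}")   # 10, 45, 120, 210, 252
```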
Example 1.12
In a herd of n = 10 cows in which four are sub-clinically ill, we are going
to randomly select two cows. This has to be done in such a way that
we are not able to investigate the first cow before selecting the
second.

a) What is the probability of selecting two sub-clinically ill cows?


b) What is the probability of selecting only one cow with sub-clinical
disease?
a) The event A in this case is to select two sick cows. The number of sample events in A is equal to the number of possible collections of r = 2 cows chosen from the group of 4 sick cows, combined with r = 0 cows chosen from the group of 6 healthy cows:

C(4,2) · C(6,0) = (4!/(2!·2!)) · (6!/(0!·6!)) = 6 · 1 = 6

The number of sample events in the sample space is equal to the number of possible ways of collecting r = 2 cows chosen from the total herd of n = 10. Consequently, the sample space consists of

C(10,2) = 45 sample events
In accordance with the rule

P(A) = (number of elements in A) / (number of elements in the sample space)

P(A) = 6/45 ≈ 0.13

b) This question can be solved in the same way.
Event A = selecting one sick cow and one healthy cow.

Number of elements in A: C(4,1) · C(6,1) = 4 · 6 = 24

The number of elements in the sample space is the same as in (a).

P(A) = 24/45 ≈ 0.53
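Both answers can be checked in a few lines (a sketch using math.comb):

```python
from math import comb

# Example 1.12: 10 cows, 4 sub-clinically ill, 6 healthy; select 2 at random.
sample_space = comb(10, 2)                           # 45 ways to pick 2 of 10

p_two_ill = comb(4, 2) * comb(6, 0) / sample_space   # a) both selected cows ill
p_one_ill = comb(4, 1) * comb(6, 1) / sample_space   # b) exactly one cow ill

print(round(p_two_ill, 2), round(p_one_ill, 2))      # 0.13 0.53
```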


Conditional probability

[Figure: an event A within a sample space; two events A and B that either (1) overlap or (2) are disjoint]
Two events are called disjoint or mutually exclusive events if
they have no sample events in common.

The intersection of two events A and B is defined as the set of sample events that belong simultaneously to both A and B. The intersection is denoted by A∩B.

The union of two events A and B is defined as the set of sample events in A, in B, or in both. The union is denoted by A∪B.

The complement of an event A is the set of sample events that are not in A. The complement is denoted by Ā.

Additive rule
P(AUB) = P(A) + P(B) – P(A∩B)
If the two events are disjoint P(A∩B) = 0

Then P(AUB) = P(A) + P(B)

In general, if A1, A2, ..., Ar are disjoint events, then

P(A1∪A2∪A3∪...∪Ar) = P(A1) + P(A2) + P(A3) + ... + P(Ar)
Example 1.13
Experiment: One blood sample is analysed for
blood grouping
Sample space: {O, A, B, AB}

Event: {A, AB}

The sample events are mutually exclusive and

P(O) = 0.46, P(A) = 0.43, P(B) = 0.08, P(AB) = 0.03

In accordance with the additive rule

P(event) = P(A∪AB) = P(A) + P(AB) = 0.43 + 0.03 = 0.46

The complement law

P(A) = 1 − P(Ā)
P(Ā) = 1 − P(A)

Example
We found that P(event) = P(A∪AB) = 0.46.
The complement: P(O∪B) = 1 − P(A∪AB) = 1 − 0.46 = 0.54
The probability of an event A must often be modified
after information is obtained as to whether or not a
related event B has taken place.

The revised probability of A when it is known that B has occurred is called

The Conditional probability of A, given B

P(A|B)
Example 1.14
There are 10 mice, of which 4 are white and 6 are grey. Two mice are randomly selected. What is the probability that the second selected mouse is white when the first was grey?

Let A denote the event that the second selected mouse is white and let B be the event that the first was grey.

Then we have
P(A) = 4/10 and P(B) = 6/10

But
P(A|B) = 4/9
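The conditional probability P(A|B) = 4/9 can also be obtained by brute-force enumeration of all ordered draws without replacement (a sketch; representing the mice as a string of W/G labels is an illustrative choice):

```python
from itertools import permutations
from fractions import Fraction

# Example 1.14: 4 white (W) and 6 grey (G) mice; draw two without replacement.
mice = "WWWWGGGGGG"
draws = list(permutations(range(10), 2))      # all ordered pairs of distinct mice

b = [d for d in draws if mice[d[0]] == "G"]           # B: first mouse grey
a_and_b = [d for d in b if mice[d[1]] == "W"]         # ... and second mouse white

print(Fraction(len(a_and_b), len(b)))         # P(A|B) = 4/9
```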
What is P(A∩B)?
It is easy to convince yourself that

P(A∩B) = P(A) · P(B|A)

or

P(A∩B) = P(B) · P(A|B)

This is the multiplicative law


Independent events
Two events A and B are said to be independent if:

P(A|B) = P(A)

P(B|A) = P(B)

P(A∩B) = P(A) . P(B)


Example 1.15
Assume we have two mice of which one is grey and the other is white.
Let A denote the event that the first selected mouse is white and B the
event that the second selected mouse is grey.
Assume further that we replace the mouse after each selection. Then
we have

P(A) = ½ & P(A|B) = ½

The two events A and B are independent.

Assume now that we don't replace the mouse after each selection. Then we have
P(B) = ½ and P(B|A) = 1

The two events are dependent.


Example 1.16
(From Br. Med. J. 1985; Telstad W. & Larsen S.)

                              Cramps
                         No      Yes     Total
Restless legs    No     2874     400     3274
                 Yes     675     773     1448
Total                   3549    1173     4722

Let A denote the event of suffering from cramps in the legs and B of suffering from restless legs.

P(B) = 1448/4722 = 0.307
P(B|A) = 773/1173 = 0.659
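These probabilities follow directly from the 2×2 table; a short sketch (encoding the table as a dictionary is an illustrative choice):

```python
# Example 1.16: restless legs (B) vs cramps (A) in 4722 persons.
# Keys: (restless legs, cramps) -> count, taken from the table above.
table = {("no", "no"): 2874, ("no", "yes"): 400,
         ("yes", "no"): 675, ("yes", "yes"): 773}

n = sum(table.values())                                    # 4722 persons
p_b = (table[("yes", "no")] + table[("yes", "yes")]) / n   # P(restless legs)
p_b_given_a = table[("yes", "yes")] / (table[("no", "yes")] + table[("yes", "yes")])

print(round(p_b, 3), round(p_b_given_a, 3))                # 0.307 0.659
```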
This strongly indicates that the two events "restless legs" and "cramps" are dependent.
Bayes' Theorem
• Bayes' Theorem is a result that allows new information
to be used to update the conditional probability of an
event.
• Using the multiplication rule gives Bayes' Theorem in its simplest form:

P(A|B) = P(A∩B)/P(B) = P(B|A)·P(A)/P(B)

Writing the denominator with the law of total probability:

P(A|B) = P(B|A)·P(A) / [P(B|A)·P(A) + P(B|Ā)·P(Ā)]
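A sketch of this full form with hypothetical diagnostic-test numbers (the prevalence and test accuracies below are illustrative assumptions, not taken from the lecture):

```python
# Hypothetical screening test: A = diseased, B = test positive.
p_a = 0.10               # P(A): assumed prevalence of disease
p_b_given_a = 0.90       # P(B|A): positive test given diseased (sensitivity)
p_b_given_not_a = 0.05   # P(B|not A): false-positive rate

# Denominator: total probability of a positive test, P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b     # updated probability of disease

print(round(p_a_given_b, 3))              # 0.667
```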
2. RANDOM VARIABLES AND
PROBABILITY DISTRIBUTION
• Events and random variables
• Describing data
• Probability distribution for discrete variables
• Probability distribution for continuous variables
• Parameters describing the probability distribution
• Expectation; The theoretical mean
• Variance and standard deviation; A measure of
dispersion
• Skewness and kurtosis
• Joint distributions; Covariance and correlation
2.1 Events and Random Variables
• Assume that we carry out the experiments of flipping a coin
three times. The sample space of the event will then be:

• {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

• The sample space described by sample events is a description by attributes rather than by numbers. Instead of reporting the results as sample events, we can report them in numbers like

• "number of tails" & "number of heads"


• If we change from events and report the results as "X = number of heads", the sample space changes to

• {0, 1, 2, 3}

• And the sample event has changed to a random variable.

• Definition
• A random variable is a numerical valued function defined
on a sample space.
NOT ALL EVENTS CAN BE RESTRUCTURED TO A RANDOM VARIABLE

• We have the following types of "variable":

• Categorical variables → events
  • Nominal (without natural ranking)
  • Ordinal (with natural ranking)

• Numerical variables → random variables
  • Discrete variables
  • Continuous variables

CATEGORICAL VARIABLES → EVENTS

• Gender (sex)
  Male
  Female            — nominal

• Blood group
  O
  A                 — nominal
  B
  AB

• Smoking
  Non-smoker (1)
  Ex-smoker (2)
  Light smoker (3)  — ordinal
  Heavy smoker (4)
NUMERICAL VARIABLES → RANDOM VARIABLES

1. Discrete random variables

A discrete random variable is a variable which can result only in a certain, countable number of numerical values.

The sample space of a discrete random variable consists of a countable number of numerical values.
• Example 2.1:
Experiment: Flipping a coin 10 times
Variable: Number of heads

Sample space: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

Example 2.2:
Experiment: A herd of 8 elephants is to be investigated with regard to a given pattern of symptoms.
Variable: The number of elephants with the given pattern of symptoms.

Sample space: {0, 1, 2, 3, 4, 5, 6, 7, 8}


2. Continuous random variables
• A continuous random variable is a variable which
theoretically can result in an unlimited number of
numerical values within a limited or unlimited interval.

• The sample space of a continuous random variable is a limited or unlimited interval consisting of an unlimited number of numerical values.

Example 2.3:
Experiment: A herd of cows is investigated with regard to
body temperature.
Variable: Body temperature
Sample space: {all real numbers between 36 and 42}
Example 2.4:
Experiment: All the lambs in a sheep herd are to be treated with anti-
parasite drugs in the spring and the body weight, before
and after the summer is to be recorded.
Variable: Increase in body weight
Sample space: {all real numbers larger than 0 and less than or equal to 40 kg}

Example 2.5:
Experiment: A total of n = 98 patients with Rheumatoid Arthritis (RA) are investigated and the patients report the degree of pain on a 10 cm Visual Analogue Scale (VAS).
Variable: Degree of pain
Sample space: {all real numbers between 0 and 10}
2.2 Describing Data
HOW TO PRESENT DATA?

1. Descriptive statistics
- Table and figures

2. Statistical analysis
CATEGORICAL VARIABLES
Variable: Frequency of perinatal mortality in England and Wales in 1979 by day of the week.

[Table: no. of deaths per 1000 births for each day of the week; the seven daily rates were 13.4, 14.3, 13.7, 13.9, 14.2, 16.1 and 17.0]
CONTINUOUS RANDOM VARIABLE
Variable: Age and PImax in 25 patients with cystic fibrosis (O'Neill et al. 1983)

Subject   Age (years)   PImax (cm H2O)
   1           7              80
   2           7              85
   3           8             110
   4           8              95
   5           8              95
   6           9             100
   7          11              45
   8          12              95
   9          12             130
  10          13              75
  11          13              80
  12          14              70
  13          14              80
  14          15             100
  15          16             120
  16          17             110
  17          17             125
  18          17              75
  19          17             100
  20          19              40
  21          19              75
  22          20             110
  23          23             150
  24          23              75
  25          23              95
Table: Describing averages

Lung function
(Pl max/cm H2O)
Mean 92.6
Median 95
Mode 95
Range 40-150
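The averages in the table can be recomputed from the raw PImax values (a sketch using Python's statistics module; subject 21's PImax is taken as 75 cm H2O, the value consistent with the reported mean of 92.6 and range 40-150):

```python
from statistics import mean, median, multimode

# PImax values (cm H2O) for the 25 cystic fibrosis patients in the table above.
pimax = [80, 85, 110, 95, 95, 100, 45, 95, 130, 75, 80, 70, 80,
         100, 120, 110, 125, 75, 100, 40, 75, 110, 150, 75, 95]

print("Mean:  ", mean(pimax))                    # 92.6
print("Median:", median(pimax))                  # 95
print("Mode:  ", multimode(pimax))               # most frequent value(s)
print("Range: ", f"{min(pimax)}-{max(pimax)}")   # 40-150
```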
Variable: Serum IgM in 298 children aged from 6 months to 6 years

IgM (g/l)   Number of children
0.1 3
0.2 7
0.3 19
0.4 27
0.5 32
0.6 35
0.7 38
0.8 38
0.9 22
1.0 16
1.1 16
1.2 6
1.3 7
1.4 9
1.5 6
1.6 2
1.7 3
1.8 3
2.0 3
2.1 2
2.2 1
2.5 1
2.7 1
4.5 1
STEM-AND-LEAF PLOT

[Figure: the IgM frequency table above displayed as a stem-and-leaf plot]
Cumulative Frequency

IgM (g/l)   Frequency   Relative Frequency %   Cumulative Frequency   Cumulative Relative Frequency %
0.1 3 1.0 3 1.0
0.2 7 2.3 10 3.4
0.3 19 6.4 29 9.7
0.4 27 9.1 56 18.8
0.5 32 10.7 88 29.5
0.6 35 11.7 123 41.3
0.7 38 12.8 161 54.0
0.8 38 12.8 199 66.8
0.9 22 7.4 221 74.2
1.0 16 5.4 237 79.5
1.1 16 5.4 253 84.9
1.2 6 2.0 259 86.9
1.3 7 2.3 266 89.3
1.4 9 3.0 275 92.3
1.5 6 2.0 281 94.3
1.6 2 0.7 283 95.0
1.7 3 1.0 286 96.0
1.8 3 1.0 289 97.0
2.0 3 1.0 292 98.0
2.1 2 0.7 294 98.7
2.2 1 0.3 295 99.0
2.5 1 0.3 296 99.3
2.7 1 0.3 297 99.7
4.5 1 0.3 298 100.0
Total 298 99.9
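The cumulative columns can be rebuilt from the frequency column alone (a short sketch):

```python
# IgM frequency table above: (IgM in g/l, number of children).
igm = [(0.1, 3), (0.2, 7), (0.3, 19), (0.4, 27), (0.5, 32), (0.6, 35),
       (0.7, 38), (0.8, 38), (0.9, 22), (1.0, 16), (1.1, 16), (1.2, 6),
       (1.3, 7), (1.4, 9), (1.5, 6), (1.6, 2), (1.7, 3), (1.8, 3),
       (2.0, 3), (2.1, 2), (2.2, 1), (2.5, 1), (2.7, 1), (4.5, 1)]

total = sum(f for _, f in igm)   # 298 children
cum = 0
for value, freq in igm:
    cum += freq                  # running (cumulative) frequency
    print(f"{value:4.1f}  {freq:3d}  {100*freq/total:5.1f}%"
          f"  {cum:3d}  {100*cum/total:5.1f}%")
```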
2.3 Probability distribution for
discrete variables
Each random variable has a sample space which consists of all the possible values of the variable.

Let S denote the sample space and X the random variable.

When X is a discrete random variable, S consists of a limited number of X-values:
S = {x1, x2, x3, ..., xn}

P(X=x) = f(x) for all x ∈ S

is called the probability density: the probability distribution of the variable X over the sample space S.

The demands on the probability density are:

0 ≤ P(X=x) ≤ 1 for all values x ∈ S
P(X=x1) + P(X=x2) + ... + P(X=xn) = 1
Example 2.6
Experiment: Flipping a coin 3 times
Variable: Number of heads
Sample space: {0, 1, 2, 3}

From before we know that the sample space written with sample events is

{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

Then we can calculate Probability

• P(X=0) = 1/8
• P(X=1) = 3/8
• P(X=2) = 3/8
• P(X=3) = 1/8
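The four point probabilities can be confirmed by enumerating the 8 equally likely sample events (a short sketch):

```python
from itertools import product
from collections import Counter

# Example 2.6: flip a coin 3 times; X = number of heads.
outcomes = list(product("HT", repeat=3))        # the 8 sample events
x_counts = Counter(o.count("H") for o in outcomes)

for x in range(4):
    print(f"P(X={x}) = {x_counts[x]}/8")        # 1/8, 3/8, 3/8, 1/8
```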
Example 2.7
Experiment: A herd of 8 cows is to be investigated with regard to a given pattern of symptoms.
Variable: The number of cows with the given pattern.
Sample space: {0, 1, 2, 3, 4, 5, 6, 7, 8}

In this case each point probability P(X=x) is unknown and needs to be calculated. This follows later.
Assume that the calculations gave the following results:
P(X=0)=0.017 P(X=1)=0.089 P(X=2)=0.209
P(X=3)=0.279 P(X=4)=0.232 P(X=5)=0.124
P(X=6)=0.041 P(X=7)=0.008 P(X=8)=0.001
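The point probabilities listed above are consistent with a binomial model, P(X=x) = C(8,x) p^x (1−p)^(8−x) with p = 0.4 (the model itself is an assumption here; the lecture derives these probabilities later). Assuming it, they can be reproduced to within rounding:

```python
from math import comb

# Binomial point probabilities for n = 8 cows, assuming p = 0.4 per cow.
p = 0.4
for x in range(9):
    print(f"P(X={x}) = {comb(8, x) * p**x * (1 - p)**(8 - x):.3f}")
```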
Important issues with probability distributions for discrete random variables:

1) Σ P(X=x) = 1 over all x ∈ S: the area under the density curve is 1
2) P(X=x) = f(x): the point probability of each value x
3) F(x) = P(X ≤ x): the sum of f(t) over all t ≤ x
4) P(a < X ≤ b) = F(b) − F(a)

f(x) is the Probability Density or Probability Distribution

F(x) is the Cumulative Probability Distribution
Example 2.7 (cont.)
Experiment: A herd of 8 cows investigated regarding a pattern
of symptoms.
Variable: No. of cows with the symptom.
Sample space: {0, 1, 2, 3, 4, 5, 6, 7, 8}
2.4 Probability distribution for continuous variables
• When X is a continuous random variable, the sample space consists of an unlimited number of values within a limited or unlimited interval.

• The probability density of a continuous random variable, f(x), is consequently a continuous function and not of the "histogram" type as for discrete random variables.

Demands on the probability density for a continuous random variable:

1. The probability density f(x) is always non-negative.

2. The area under the probability density curve f(x) is equal to 1.




4.5 Properties of an estimator

• An estimator is defined as a function of a random variable.
• Consequently, an estimator is also a random variable:
  – it has a probability distribution
  – E(θ̂) and Var(θ̂) are defined
• An estimator θ̂ is unbiased for the parameter θ if E(θ̂) = θ, whatever the true value of θ. If this property does not hold, θ̂ is a biased estimator.

However, in some situations there exists more than one unbiased estimator. In these situations, we have to look at SD(θ̂).

Select the unbiased estimator of θ that has the smallest variance, whatever the true value of θ. If one exists, it is called the minimum variance unbiased estimator of θ.
Which of the two demands
– unbiased
– small SD
is the most important?

• In order to answer this question, think of an estimator as a gun:
  – You have two guns. One is expected to hit the target on average, even though its shots scatter relatively widely.
  – The other gun is not expected to hit the target on average, but its shots scatter little.

• The most important property is that the estimator is unbiased for the parameter θ.

E.g., the expected value of a chi-squared variable with n − 1 degrees of freedom is n − 1.
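The preference for unbiasedness can be illustrated with a simulation: dividing the sum of squared deviations by n − 1 (which makes (n−1)s²/σ² a chi-squared variable with expectation n − 1) gives an unbiased variance estimate, while dividing by n does not. The normal population with σ² = 4 below is an illustrative assumption:

```python
import random
from statistics import mean, pvariance, variance

random.seed(2)  # reproducible run

# Repeatedly estimate a known variance (sigma^2 = 4) from samples of size 5,
# once dividing by n (pvariance) and once dividing by n-1 (variance).
estimates_n, estimates_n1 = [], []
for _ in range(20_000):
    sample = [random.gauss(0, 2) for _ in range(5)]
    estimates_n.append(pvariance(sample))    # biased: averages near (n-1)/n * 4 = 3.2
    estimates_n1.append(variance(sample))    # unbiased: averages near 4.0

print(round(mean(estimates_n), 2), round(mean(estimates_n1), 2))
```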
