Prob Dist
Abstract: This paper gives a general definition of probability, of its main mathematical features, and of the features it presents under particular circumstances. The behavior of probability is linked to the features of the phenomenon we want to predict, and this link can be described by a probability distribution. Depending on the characteristics of the phenomena (which we can also call variables), different probability distributions are defined. For categorical (or discrete) variables, the probability can be described by a binomial or Poisson distribution in the majority of cases. For continuous variables, the probability can be described by the most important distribution in statistics, the normal distribution. These probability distributions are briefly described, together with some examples of their possible application.
Submitted Nov 09, 2014. Accepted for publication Dec 17, 2014.
doi: 10.3978/j.issn.2072-1439.2015.01.37
View this article at: http://dx.doi.org/10.3978/j.issn.2072-1439.2015.01.37
A short definition of probability

We can define the probability of a given event by evaluating, in previous observations, the incidence of the same event under circumstances that are as similar as possible to the circumstances we are observing [this is the frequentist definition of probability, and it is based on the relative frequency of an event observed in previous circumstances (1)]. In other words, probability describes the possibility that an event occurs given a series of circumstances (or under a series of pre-event factors). It is a form of inference, a way to predict what may happen based on what happened before under the same (though never exactly identical) circumstances.

Probability obeys the following rules:

(I) If, under some circumstances, a given number of events (E) could occur (E1, E2, E3, …, En), the probability (P) of any E is always greater than or equal to zero;

(II) The sum of the probabilities of all the possible, mutually exclusive events, P(E1) + P(E2) + … + P(En), is 100%;

(III) If E1 and E3 are two mutually exclusive events, the probability that one or the other happens, P(E1 or E3), is equal to the sum of the probability of E1 and the probability of E3 (Eq. [2]):

$$P(E_1 \text{ or } E_3) = P(E_1) + P(E_3)$$ [2]

Probability can be described by a formula or by a graph in which each event is linked to its probability. This kind of description of probability is called a probability distribution.

Binomial distribution

The binomial distribution applies when each individual shows one of two mutually exclusive characteristics (for example, left-handed or right-handed). The probability that x individuals present a given characteristic, p, that is mutually exclusive of another one, called q, depends on the possible number of combinations of x individuals within the population.
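To make the counting of combinations concrete, the following minimal Python sketch enumerates the possible arrangements for a small case, assuming a population of five individuals of whom three show p. The helper name `arrangements` is hypothetical and introduced here only for illustration; it is not part of the original article.

```python
from itertools import combinations
from math import comb


def arrangements(n: int, x: int) -> list[str]:
    """List every way to place x individuals with p among n individuals (the rest are q)."""
    result = []
    # Choose which of the n positions are occupied by individuals with p.
    for p_positions in combinations(range(n), x):
        result.append("".join("p" if i in p_positions else "q" for i in range(n)))
    return result


found = arrangements(5, 3)
print(len(found), comb(5, 3))  # both counts are 10
print(", ".join(found))
```

Enumeration is only practical for small populations; for larger n the count is obtained directly with `math.comb`.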
© Journal of Thoracic Disease. All rights reserved. www.jthoracdis.com J Thorac Dis 2015;7(3):E7-E10
E8 Viti et al. Probability distributions
This number of combinations is called C. If my population is composed of five individuals, each of whom can be p or q, there are ten possible combinations of, for instance, three individuals with p:

pppqq, ppqpq, ppqqp, pqppq, pqpqp, pqqpp, qpppq, qppqp, qpqpp, qqppp

Each of these combinations has the same probability (Eq. [3]), where p and q also denote the probabilities of the two characteristics, with q = 1 − p:

$$P(p, p, p, q, q) = p \cdot p \cdot p \cdot q \cdot q = p^3 q^2$$ [3]

Then p³q² must be multiplied by the number of combinations (ten).

If the experimental population contains a large number of individuals (n), the number of combinations of x individuals within the population will be (Eq. [4]):

$${}_nC_x = \frac{n!}{x!\,(n - x)!}$$ [4]

Therefore, the probability that a group of x individuals within the population of n individuals presents the characteristic p, which excludes q, is described by the following formula (Eq. [5]):

$$f(x) = {}_nC_x\, p^x q^{n-x}$$ [5]

which describes the binomial distribution. The probabilities of all the possible values of x sum to 1:

$$\sum_{x=0}^{n} f(x) = 1$$

Suppose that, in a given population, the probability of being left-handed is p = 0.3 and that of being right-handed is q = 0.7. If we select ten individuals from this population, what is the probability that four out of ten individuals are left-handed? We can apply the binomial distribution, since we suppose that a person may be either left-handed or right-handed. So we can use our formula (Eq. [7]):

$$f(4) = {}_{10}C_4 \times 0.3^4 \times 0.7^6 = 0.2001$$ [7]

Poisson distribution

Another important distribution of probability is the Poisson distribution. […] when the period of observation is longer.

To predict the probability, I must know how the events behave (this information comes from previous, or historical, observations of the same event, made before the time at which I am performing my analysis). This parameter, which is the mean number of events in a given interval as derived from previous observations, is called λ. The Poisson distribution follows the formula (Eq. [8]):

$$f(x) = P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$$ [8]

where the number e is an important mathematical constant, the base of the natural logarithm, approximately equal to 2.71828.

For example, the number of major thoracic traumas needing intensive care unit (ICU) admission during a month, observed over the last three years in a Third Level Trauma Center, follows a Poisson distribution with λ = 2.75. In a future period of one month, what is the probability of having three patients with major thoracic trauma in the ICU (Eq. [9])?

$$P(X = 3) = \frac{2.75^3\, e^{-2.75}}{3!} \approx 0.2216$$ [9]

Other variables can take an infinite number of values within a given interval. These are called continuous variables (3).

Distributions of continuous variables

An example of a continuous variable is the systolic blood pressure. Within a given cohort, the distribution of systolic blood pressure can be presented as in Figure 1. Each bar of the histogram spans an interval of the measure of interest between two values on the x-axis.
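The two discrete worked examples (Eqs. [7] and [9]) can be checked numerically. The sketch below is a minimal illustration using only the Python standard library; the function names `binomial_pmf` and `poisson_pmf` are hypothetical helpers chosen here, and the inputs (n = 10, p = 0.3, λ = 2.75) are the values used in the text.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Eq. [5]: f(x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    q = 1.0 - p
    return math.comb(n, x) * p**x * q ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """Eq. [8]: f(x) = lam**x * exp(-lam) / x!."""
    return lam**x * math.exp(-lam) / math.factorial(x)


# Four left-handed individuals out of ten, with p = 0.3 (Eq. [7]).
print(f"{binomial_pmf(4, 10, 0.3):.4f}")  # 0.2001

# Three major thoracic traumas in one month, with lambda = 2.75 (Eq. [9]).
print(f"{poisson_pmf(3, 2.75):.4f}")  # 0.2216

# The binomial probabilities over all possible x add up to 1.
print(f"{sum(binomial_pmf(x, 10, 0.3) for x in range(11)):.4f}")  # 1.0000
```

Writing the formulas out explicitly keeps the correspondence with Eqs. [5] and [8] visible; a statistics library would normally be used instead for real analyses.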
Figure 1 Graphical description of the distribution of systolic blood pressure in a given population. [Histogram; x-axis: systolic blood pressure (mmHg) in classes 110–119, 120–129, 130–139 and 140–149.]

Figure 2 Graphical description of the normal distribution. [Curve; x-axis: systolic blood pressure (mmHg); y-axis: N. of pts.]

[…] direct analysis), the probability distribution of X becomes similar to a particular form of distribution, called the normal distribution or Gauss distribution. […] The area under the curve between µ − σ and µ + σ covers about 68% of all the possible values of X, while the area between µ ± 2σ covers about 95% of all the values. The two parameters of the distribution, the mean µ and the standard deviation σ, are linked in the formula (Eq. [10]):

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$ [10]

For µ = 0 and σ = 1, the curve is called the standardized normal distribution. All the possible normal distributions of x may be "normalized" by defining a derived variable called z (Eq. [11]):

$$z = \frac{x - \mu}{\sigma}$$ [11]

whose distribution is

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty$$

To calculate the probability that our variable falls within a given interval, for instance between z0 and z1, we should calculate the following definite integral (Eq. [12]):

$$P(z_0 \le z \le z_1) = \int_{z_0}^{z_1} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz$$ [12]

For example, in a population whose weight is normally distributed with µ = 70 kg and σ = 3 kg, what is the probability that a randomly selected individual from this population would have a weight of 65 kg or less?

To "normalize" our distribution, we should calculate the value of z (Eq. [13]):

$$z = \frac{65 - 70}{3} = -1.67$$ [13]

Then, we should calculate the area under the curve (Eq. [14]):

$$\int_{-\infty}^{-1.67} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz = P(z \le -1.67) = 0.0475$$ [14]
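The standardization and the area under the curve (Eqs. [11] to [14]) can also be computed without a printed table. The following is a minimal sketch, assuming the cumulative area is written with the error function from the Python standard library; the function names are hypothetical helpers, and µ = 70 kg and σ = 3 kg are the values implied by Eq. [13].

```python
import math


def z_score(x: float, mu: float, sigma: float) -> float:
    """Eq. [11]: standardize x using the mean mu and the standard deviation sigma."""
    return (x - mu) / sigma


def standard_normal_pdf(z: float) -> float:
    """Density of the standardized normal distribution (mu = 0, sigma = 1)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z), expressed through the error function instead of a table lookup."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_probability(z0: float, z1: float) -> float:
    """Eq. [12]: area under the standardized curve between z0 and z1."""
    return standard_normal_cdf(z1) - standard_normal_cdf(z0)


# Weight of 65 kg or less, with mu = 70 kg and sigma = 3 kg (Eqs. [13] and [14]).
z = round(z_score(65, 70, 3), 2)  # rounded to two decimals, as in a z table
print(z)  # -1.67
print(f"{standard_normal_cdf(z):.4f}")  # 0.0475

# Share of values within one and two standard deviations of the mean.
print(f"{interval_probability(-1, 1):.4f}")  # 0.6827
print(f"{interval_probability(-2, 2):.4f}")  # 0.9545
```

Rounding z to two decimals mirrors the table-based calculation in the text; using the unrounded z gives a slightly different fourth decimal.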
In the same way, the probability of a weight between 65 and 74 kg is the difference between the proportion of individuals whose weight is 74 kg or less and those whose weight is 65 kg or less (Eq. [15]):

$$P(65 \le x \le 74) = P\left(\frac{65 - 70}{3} \le z \le \frac{74 - 70}{3}\right) = P(-1.67 \le z \le 1.33) = P(z \le 1.33) - P(z \le -1.67)$$ [15]

We already know that (Eq. [16]):

$$P(z \le -1.67) = 0.0475$$ [16]

In the standard normal distribution table (4) we can also find the value for (Eq. [17]):

$$P(z \le 1.33) = 0.9082$$ [17]

Our probability is (Eq. [18]):

$$P = 0.9082 - 0.0475 = 0.8607 = 86.07\%$$ [18]

Conclusions

Probability distributions are a common way to describe, and possibly predict, the probability of an event. The main point is to define the character of the variables whose behaviour we are trying to describe through probability (discrete or continuous). The identification of the right category will allow a proper application of a model (for instance, the standardized normal distribution) that would easily predict the probability of a given event.

Acknowledgements

Disclosure: The authors declare no conflict of interest.

References

1. Daniel WW. Biostatistics: a foundation for analysis in the health sciences. New York: John Wiley & Sons, 1995.
2. Kolmogorov AN. Foundations of the Theory of Probability. Oxford: Chelsea Publishing, 1950.
3. Lim E. Basic statistics (the fundamental concepts). J Thorac Dis 2014;6:1875-8.
4. Standard Normal Distribution Table. Available online: http://www.mathsisfun.com/data/standard-normal-distribution-table.html