1 One-Dimensional Random Variables
Definition 1. A variable is called random if, as a result of an experiment, it can take real values with definite probabilities.
Suppose a randomly selected family has 3 children. Let X = the number of boys
in the family. Then x = 0, 1, 2, or 3. If necessary, you can use a tree diagram to list
all of the possible 3-children families, as follows. The column labeled x corresponds
to the number of boys in the particular outcome. The column labeled "Probability"
identifies the probability of the particular outcome, assuming a boy and a girl are
equally likely, i.e. P(G) = P(B) = 0.5. Note that the probability is calculated
assuming independence, i.e. assuming that the gender of one child does not affect
the gender of the next child, a seemingly reasonable assumption. So, for example,
P(GGG) = P(G) × P(G) × P(G) = 0.5 × 0.5 × 0.5 = 0.125.
Outcome x Probability
GGG 0 0.5 × 0.5 × 0.5 = 0.125
BGG 1 0.5 × 0.5 × 0.5 = 0.125
GBG 1 0.5 × 0.5 × 0.5 = 0.125
GGB 1 0.5 × 0.5 × 0.5 = 0.125
BBG 2 0.5 × 0.5 × 0.5 = 0.125
GBB 2 0.5 × 0.5 × 0.5 = 0.125
BGB 2 0.5 × 0.5 × 0.5 = 0.125
BBB 3 0.5 × 0.5 × 0.5 = 0.125
The function f(x) is called the probability density function, or pdf. The notation for a
probability density function, f(x), entails using a lowercase "f". As we will soon see,
F(x), with a capital "F", takes on a different meaning.
We can determine the probability density function by noting that the possible
3-child families are mutually exclusive outcomes. So:
x 0 1 2 3
f(x) 0.125 0.375 0.375 0.125
Note that a probability density function must follow basic probability rules. The
probabilities must be numbers between 0 and 1 (inclusive) and the probabilities must
add to 1.
The cumulative distribution function F(x) = P(X ≤ x) is found by accumulating these probabilities:
x 0 1 2 3
F(x) 0.125 0.5 0.875 1
As we will see, typically we are faced with the situation in which we have some
assumed cumulative distribution function, but we need to calculate some probability.
The following example illustrates how to use a cumulative distribution function F(x)
to find various probabilities.
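To make this concrete, the pdf and cdf of the 3-children example can be tabulated programmatically. This is a minimal Python sketch; the names `pmf` and `cdf` are ours.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 3-child families; each child is a boy (B) or girl (G)
# with probability 1/2, independently, so each outcome has probability (1/2)^3.
pmf = {x: Fraction(0) for x in range(4)}
for family in product("BG", repeat=3):
    boys = family.count("B")
    pmf[boys] += Fraction(1, 8)

# Cumulative distribution function F(x) = P(X <= x).
cdf = {}
total = Fraction(0)
for x in range(4):
    total += pmf[x]
    cdf[x] = total

print({x: float(p) for x, p in pmf.items()})  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print({x: float(p) for x, p in cdf.items()})  # {0: 0.125, 1: 0.5, 2: 0.875, 3: 1.0}
```

Exact rational arithmetic (`Fraction`) avoids any floating-point rounding in the table.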
Both the density function and the distribution function may also be described by a
probability histogram.
For example, consider the probability density function shown in the graph below.
Suppose we wanted to know the probability that the random variable X is less than
or equal to a. The probability that X is less than or equal to a equals the area
under the curve bounded by a and minus infinity, as indicated by the shaded area.
Note: the shaded area in the graph represents the probability that the random
variable X is less than or equal to a. This is a cumulative probability.
Figure 1: Density function
The density function f(x) and the distribution function F(x) have the following properties:
1) f(x) ≥ 0 for all x ∈ R;
2) P(a ≤ X ≤ b) = P(a < X ≤ b) = F(b) − F(a) = ∫_{a}^{b} f(x) dx;
3) f(x) = F'(x);
4) ∫_{−∞}^{∞} f(x) dx = 1;
5) for a discrete random variable, Σ_{i=−∞}^{∞} P(X = i) = 1.
Mean.
The mean of a random variable indicates its average or central value. It is a useful
summary value (a number) of the variable's distribution. Stating the mean gives a
general impression of the behavior of some random variable without giving full details
of its probability distribution (if it is discrete) or its probability density function (if
it is continuous). The mean can be notated as µ, m or E(X).
The mean of a discrete random variable X is a weighted average of the possible
values that the random variable can take.
µX = p1x1 + p2x2 + ... + pnxn = Σ_{i=1}^{n} pi xi.
The mean of a random variable provides the long-run average of the variable, or
the expected average outcome over many observations.
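For the 3-children example above, this weighted average works out as a quick Python check:

```python
# Mean of X (number of boys among 3 children) as a weighted average.
xs = [0, 1, 2, 3]
ps = [0.125, 0.375, 0.375, 0.125]
mu = sum(p * x for p, x in zip(ps, xs))
print(mu)  # 1.5
```

On average a 3-child family has 1.5 boys, even though no single family can have exactly 1.5.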
If X is a continuous random variable with probability density function f(x), then
the expected value of X is defined by:
µX = ∫_{−∞}^{∞} x f(x) dx.
Properties of mean:
1) E(cX) = cE(X), where c is a constant;
2) E(X + Y ) = E(X) + E(Y );
3) E(XY ) = E(X)E(Y ) for independent random variables.
Median.
The median of X is the least number xm such that F(xm) = P(X ≤ xm) ≥ 1/2.
Mode.
A mode of X is a number x̃ such that P(X = x̃) is largest. This is the most likely
value of X, or one of the most likely values if X has several values with the same
largest probability. For a continuous random variable, a mode is a number x̃ such
that the probability density function is highest at x = x̃.
4.2 Measures of Dispersion
The variance of X, written σX² or D(X), measures the spread of the distribution about the mean:
σX² = E(X − E(X))² = EX² − (EX)².
Properties of variance:
1) D(cX) = c2 D(X), where c is a constant;
2) D(X + Y ) = D(X) + D(Y ) for independent random variables.
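For the 3-children example, the direct definition and the shortcut form of the variance give the same number (a quick Python check):

```python
# Variance of X (number of boys among 3 children), two equivalent ways.
xs = [0, 1, 2, 3]
ps = [0.125, 0.375, 0.375, 0.125]
mu = sum(p * x for p, x in zip(ps, xs))
ex2 = sum(p * x * x for p, x in zip(ps, xs))
var_shortcut = ex2 - mu ** 2                                # E(X^2) - (E X)^2
var_direct = sum(p * (x - mu) ** 2 for p, x in zip(ps, xs)) # E(X - E(X))^2
print(var_shortcut, var_direct)  # 0.75 0.75
```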
5 Some distributions of discrete random variables
A Bernoulli random variable X takes the value 1 with success probability p and the value 0 with failure probability q = 1 − p:
f(x) =
p, if x = 1;
q, if x = 0.
The corresponding distribution function is
F(x) =
0, if x < 0;
q, if 0 ≤ x < 1;
1, if x ≥ 1.
Its mean is
E(X) = 1 × p + 0 × q = p
and its variance is
D(X) = E(X²) − (E(X))² = p − p² = pq.
Let X1, ..., Xn be independent Bernoulli random variables. Often we are only interested
in the number of successes:
Y = X1 + ... + Xn.
Example: a coin is tossed 50 times and we are only interested in the number of 'heads'
occurring.
Define for the ith toss:
Xi =
1, if a head occurs;
0, if a tail occurs.
Y = X1 + ... + Xn.
Here Y (with n = 50) is the number of "heads" which occur after 50 tosses.
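One realisation of Y can be simulated in a few lines of Python (the seed is an arbitrary choice for reproducibility):

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only
p, n = 0.5, 50
# Indicator X_i = 1 if toss i is a head, 0 if a tail.
xs = [1 if random.random() < p else 0 for _ in range(n)]
y = sum(xs)     # number of heads in 50 tosses
print(y)
```

Each run with a different seed gives a different value of Y between 0 and 50.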
We know:
1) Xi is Bernoulli distributed with parameter p;
2) there are just two outcomes of each trial: success and failure;
What is the distribution of Y? Y has density function
b(y; n, p) = P(Y = y) = C_n^y p^y q^(n−y),
where
y = 0, 1, 2, ..., n;
n = 1, 2, 3, ...;
p is the success probability, 0 ≤ p ≤ 1;
q is the failure probability, q = 1 − p.
A discrete random variable Y is said to follow a Binomial distribution with parameters
n and p, written Y ∼ Bi(n, p) or Y ∼ B(n, p), if it has the probability distribution above.
The trials must meet the following requirements:
1) the total number of trials is fixed in advance;
2) there are just two outcomes of each trial: success and failure;
3) the outcomes of all the trials are statistically independent;
4) all the trials have the same probability of success.
The distributional function is:
F(x) =
0, if x < 0;
q^n, if 0 ≤ x < 1;
q^n + npq^(n−1), if 1 ≤ x < 2;
...
Σ_{i=0}^{k} b(i, n, p), if k ≤ x < k + 1;
...
1, if x ≥ n.
The Binomial distribution has mean E(X) = np and variance D(X) = np(1 − p).
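The pmf, mean, and variance can be checked numerically with `math.comb`; the function name `binom_pmf` is our own.

```python
from math import comb

def binom_pmf(y, n, p):
    """b(y; n, p) = C(n, y) * p**y * (1 - p)**(n - y)."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

n, p = 50, 0.5
pmf = [binom_pmf(y, n, p) for y in range(n + 1)]
mean = sum(y * q for y, q in enumerate(pmf))
var = sum((y - mean) ** 2 * q for y, q in enumerate(pmf))
# Probabilities sum to 1, the mean is np = 25, the variance is np(1-p) = 12.5.
assert abs(sum(pmf) - 1) < 1e-9
assert abs(mean - n * p) < 1e-9
assert abs(var - n * p * (1 - p)) < 1e-9
```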
Suppose independent Bernoulli trials, each with success probability p, are repeated until
the first success, and let X be the number of the trial on which the first success occurs.
Then P(X = k) = p(1 − p)^(k−1) for k = 1, 2, ..., and X is geometrically distributed
with parameter p. The distributional function is:
F(x) =
0, if x < 1;
p, if 1 ≤ x < 2;
p + p(1 − p), if 2 ≤ x < 3;
...
Σ_{i=1}^{k} p(1 − p)^(i−1) = 1 − (1 − p)^k, if k ≤ x < k + 1;
...
A discrete random variable X follows a hypergeometric distribution with parameters n, N1 and N2 if
P(X = x) = h(x; n, N1, N2) = C_{N1}^{x} C_{N2}^{n−x} / C_{N1+N2}^{n}
for integers x such that 0 ≤ x ≤ n and n ≤ N1 + N2.
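This pmf can be sketched in Python with `math.comb`; the card-drawing numbers below are our own illustration.

```python
from math import comb

def hyper_pmf(x, n, n1, n2):
    """h(x; n, N1, N2): probability of x successes in a sample of size n
    drawn without replacement from N1 successes and N2 failures."""
    return comb(n1, x) * comb(n2, n - x) / comb(n1 + n2, n)

# Example: draw 5 cards from a 52-card deck and count hearts (N1=13, N2=39).
probs = [hyper_pmf(x, 5, 13, 39) for x in range(6)]
# The probabilities over all possible x must sum to 1.
assert abs(sum(probs) - 1) < 1e-12
```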
Often we are interested in the number of events which occur in a specific period of
time or in a specific area or volume:
1) the number of cars passing a fixed point in a 5 minute interval;
2) the number of telephone calls coming into an exchange during one unit of time;
3) the number of diseased trees per acre of a certain woodland;
4) the number of death claims received per day by an insurance company.
A discrete random variable X is said to follow a Poisson distribution with parameter
λ, written X ∼ P(λ), if it has probability distribution
p(x, λ) = P(X = x) = (λ^x / x!) e^(−λ),
where x = 0, 1, 2, ...; λ > 0.
The following requirements must be met:
1) The probability that the event occurs in a given unit of time is the same for all
the units;
2) The number of events that occur in one unit of time is independent of the
number of events in other units;
3) The mean (or expected) rate is λ.
The Poisson distribution has expected value E(X) = λ and variance D(X) = λ.
The Poisson distribution can sometimes be used to approximate the Binomial dis-
tribution with parameters n and p. When the number of observations n is large, and
the success probability p is small, the Bi(n, p) distribution approaches the Poisson
distribution with the parameter given by λ = np. This is useful since the computations
involved in calculating binomial probabilities are greatly reduced.
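The quality of the approximation can be checked directly; n = 10000 and p = 0.0003 are our own illustrative values, giving λ = np = 3.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return lam ** x / factorial(x) * exp(-lam)

n, p = 10_000, 0.0003       # large n, small p
lam = n * p                 # lambda = np = 3
# Compare the two pmfs pointwise over the values carrying most of the mass.
gaps = [abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(20)]
assert max(gaps) < 1e-3
```

The Poisson pmf needs only λ, while the binomial pmf needs large binomial coefficients, which is the computational saving the text refers to.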
A continuous random variable X is uniformly distributed on the interval [a, b] if its
density is f(x) = 1/(b − a) for a ≤ x ≤ b and 0 elsewhere. The distributional function is
F(x) =
0, if x ≤ a;
(x − a)/(b − a), if a < x ≤ b;
1, if x > b.
The mean is
E(X) = ∫_{a}^{b} x · 1/(b − a) dx = (1/(b − a)) · (a + b)(b − a)/2 = (a + b)/2
and for the variance we first compute
E(X²) = ∫_{a}^{b} x² · 1/(b − a) dx = (1/(b − a)) · (x³/3)|_{a}^{b} = (1/(b − a)) · (b³ − a³)/3 = (1/(b − a)) · (b − a)(b² + ab + a²)/3 = (b² + ab + a²)/3.
Then
D(X) = E(X²) − (E(X))² = (b² + ab + a²)/3 − (a + b)²/4 = (4(b² + ab + a²) − 3(a² + 2ab + b²))/12 = (b² − 2ab + a²)/12 = (b − a)²/12.
So the Uniform distribution has mean E(X) = (a + b)/2 and variance D(X) = (b − a)²/12.
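These formulas can be verified numerically with a simple midpoint-rule integration; the endpoints a = 2, b = 5 are our own illustrative choice.

```python
# Midpoint-rule check of E(X) and D(X) for X ~ Uniform(a, b).
a, b = 2.0, 5.0
n = 100_000
h = (b - a) / n
xs = [a + (i + 0.5) * h for i in range(n)]
density = 1.0 / (b - a)
mean = sum(x * density * h for x in xs)
ex2 = sum(x * x * density * h for x in xs)
var = ex2 - mean ** 2
assert abs(mean - (a + b) / 2) < 1e-6        # (a+b)/2 = 3.5
assert abs(var - (b - a) ** 2 / 12) < 1e-6   # (b-a)^2/12 = 0.75
```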
Exponential distributions describe the waiting time between events occurring uniformly
in time: if x is the time variable and λx is the expected number of events in the
interval [0, x], then e^(−λx) is the probability of no event in [0, x]. The function
f(x) =
λe^(−λx), for x ≥ 0;
0, elsewhere.
is called the exponential probability density function. The distributional function of
X is:
F(x) =
0, for x < 0;
1 − e^(−λx), for x ≥ 0.
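A sketch of this distribution function in Python (the function name `exp_cdf` is ours):

```python
from math import exp

def exp_cdf(x, lam):
    """F(x) = 1 - e^(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0

lam = 2.0
# P(X > x) = e^(-lam * x): the probability of no event in [0, x].
assert abs((1.0 - exp_cdf(1.0, lam)) - exp(-2.0)) < 1e-12
assert exp_cdf(-1.0, lam) == 0.0
```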
Strictly, a Normal random variable should be capable of assuming any value on the
real line, though this requirement is often waived in practice. For example, height at
a given age for a given gender in a given racial group is adequately described by a
Normal random variable even though heights must be positive.
A continuous random variable X, taking all real values in the range (−∞, ∞), is said
to follow a Normal distribution with parameters µ and σ if it has probability density
function
f(x) = ϕ(x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)).
The distributional function is
F(x) = ∫_{−∞}^{x} f(y) dy = (1/√(2πσ²)) ∫_{−∞}^{x} e^(−(y−µ)²/(2σ²)) dy. (3)
The graph of the normal distribution depends on two factors - the mean and
the standard deviation. The mean of the distribution determines the location of the
center of the graph, and the standard deviation determines the height and width of
the graph. When the standard deviation is large, the curve is short and wide; when
the standard deviation is small, the curve is tall and narrow. All normal distributions
look like a symmetric, bell-shaped curve, as shown below.
Figure 5: Normal distribution and probability
1) About 68% of the area under the curve falls within 1 standard deviation of the
mean;
2) About 95% of the area under the curve falls within 2 standard deviations of the
mean;
3) About 99.7% of the area under the curve falls within 3 standard deviations of
the mean.
Collectively, these points are known as the empirical rule, the 68-95-99.7 rule, or the
1-2-3 σ rule. Clearly, given a normal distribution, most outcomes will be within
3 standard deviations of the mean.
The simplest case of the normal distribution, known as the Standard Normal
Distribution, has expected value zero and variance one. This is written as N(0,1). The
probability density function is
φ(x) = (1/√(2π)) e^(−x²/2), −∞ < x < ∞. (4)
The distributional function is
Φ(x) = ∫_{−∞}^{x} φ(y) dy = (1/√(2π)) ∫_{−∞}^{x} e^(−y²/2) dy. (5)
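Φ has no closed form, but it can be evaluated through the error function; this also confirms the 68-95-99.7 rule stated earlier (a Python sketch, function name ours):

```python
from math import erf, sqrt

def phi_cdf(x):
    """Standard normal distribution function via the error function:
    Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Mass within 1, 2, and 3 standard deviations of the mean for N(0, 1):
within1 = phi_cdf(1) - phi_cdf(-1)
within2 = phi_cdf(2) - phi_cdf(-2)
within3 = phi_cdf(3) - phi_cdf(-3)
print(round(within1, 4), round(within2, 4), round(within3, 4))
# 0.6827 0.9545 0.9973
```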
The probability density function of the exponential distribution has a mode of zero.
In many instances, it is known a priori that the mode of the distribution of a particular
random variable of interest is not equal to zero (e.g., when modeling the distribution
of the life-times of a product such as an electric light bulb, or the serving time taken
at a ticket booth at a baseball game). In those cases, the gamma distribution is more
appropriate for describing the underlying distribution. The gamma distribution is
defined as:
f(x) = (λ^γ / Γ(γ)) x^(γ−1) e^(−λx) for x > 0, γ > 0,
where
γ is the shape parameter,
λ is the scale parameter,
Γ is the Gamma function, defined as Γ(γ) = ∫_{0}^{∞} x^(γ−1) e^(−x) dx. Its properties:
• Γ(1) = 1;
• Γ(γ + 1) = γΓ(γ).
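Both properties can be checked with `math.gamma`; the test values below are arbitrary.

```python
from math import gamma, factorial

# Gamma(1) = 1 and the recurrence Gamma(g + 1) = g * Gamma(g).
assert gamma(1.0) == 1.0
for g in (0.5, 2.3, 7.0):
    assert abs(gamma(g + 1.0) - g * gamma(g)) < 1e-9

# Iterating the recurrence gives Gamma(n + 1) = n! for positive integers n.
assert abs(gamma(6.0) - factorial(5)) < 1e-9
```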
If X1, X2, ..., Xk are independent standard normal random variables, then the sum of
their squares X1² + X2² + ... + Xk² has a chi-square distribution. The chi-square
distribution has one parameter: k, a positive integer that specifies the number of
degrees of freedom (i.e. the number of Xi).
The best-known situations in which the chi-square distribution is used are the
common chi-square tests for goodness of fit of an observed distribution to a theoretical
one, and of the independence of two criteria of classification of qualitative data.
However, many other statistical tests lead to a use of this distribution. One example
is Friedman's analysis of variance by ranks.
A probability density function of the chi-square distribution is
f(x) =
(1 / (2^(k/2) Γ(k/2))) x^(k/2−1) e^(−x/2), for x > 0;
0, for x ≤ 0.
Here
Γ(k/2) is the gamma function;
k is the number of degrees of freedom.
The Student's t distribution is symmetric about zero, and its general shape is similar
to that of the standard normal distribution. It is most commonly used in testing
hypotheses about the mean of a particular population. Suppose X0, X1, X2, ..., Xn
are independent normally distributed random variables with mean 0 and variance σ²
(Xi ∼ N(0; σ), i = 0, 1, ..., n). Then the random variable
tn = X0 / √((1/n) Σ_{i=1}^{n} Xi²)
has a Student's t distribution with n degrees of freedom.
Figure 8: Student's density functions
7.4 F Distribution
If U and V are independent chi-square random variables with m and n degrees of
freedom respectively, then the ratio X = (U/m)/(V/n) follows an F distribution.
Here m and n are the shape parameters, the degrees of freedom. The probability density
function of the F distribution can be written in this way:
f(x) =
(Γ((m + n)/2) / (Γ(m/2) Γ(n/2))) (m/n)^(m/2) x^(m/2−1) (1 + mx/n)^(−(m+n)/2), for x > 0;
0, for x ≤ 0.