1 One-dimensional random variables

Definition 1. A variable is called random if it can receive real values with definite probabilities as a result of an experiment.

Such variables are denoted by capital letters such as X, Y or X1, X2. The corresponding lower case letters x, y, x1, x2 represent their actual values. For example, P(X = x) = 0.4 means that the probability that the random variable X is equal to x is 0.4.
If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable.

2 Probability density and cumulative distribution functions for a discrete random variable

Suppose a randomly selected family has 3 children. Let X = the number of boys in the family. Then x = 0, 1, 2, or 3. If necessary, you can use a tree diagram to list all of the possible 3-children families, as follows. The column labeled x corresponds to the number of boys in the particular outcome. The column labeled "Probability" identifies the probability of the particular outcome, assuming a boy and a girl are equally likely, i.e. P(G) = P(B) = 0.5. Note that the probability is calculated assuming independence, i.e. assuming that the gender of one child does not affect the gender of the next child, a seemingly reasonable assumption. So, for example,
P(GGG) = P(G) × P(G) × P(G) = 0.5 × 0.5 × 0.5 = 0.125.

Outcome x Probability
GGG 0 0.5 × 0.5 × 0.5 = 0.125
BGG 1 0.5 × 0.5 × 0.5 = 0.125
GBG 1 0.5 × 0.5 × 0.5 = 0.125
GGB 1 0.5 × 0.5 × 0.5 = 0.125
BBG 2 0.5 × 0.5 × 0.5 = 0.125
GBB 2 0.5 × 0.5 × 0.5 = 0.125
BGB 2 0.5 × 0.5 × 0.5 = 0.125
BBB 3 0.5 × 0.5 × 0.5 = 0.125

We summarize the probability that a discrete random variable X takes on certain values x by way of a PROBABILITY DENSITY FUNCTION f(x). The probability density function, which can be described by a formula or in a table, is defined as:

f(xi) = P(X = xi) = pi,  i = 1, 2, ...,  with ∑_i pi = 1,  (1)

which determines the correspondence between a value xi of the variable X and the probability pi that X receives this value.

The function f(x) is called the probability density function or pdf. The notation for a probability density function, f(x), entails using a lowercase "f". As we will soon see, F(x), with a capital "F", takes on a different meaning.
We can determine the probability density function by noting that the possible
3-child families are mutually exclusive outcomes. So:

P(X = 0) = P(GGG) = 0.125
P(X = 1) = P(BGG or GBG or GGB) = P(BGG) + P(GBG) + P(GGB) = 0.375
P(X = 2) = P(BBG or GBB or BGB) = P(BBG) + P(GBB) + P(BGB) = 0.375
P(X = 3) = P(BBB) = 0.125
So, the probability density function for X, the number of boys in a 3-child family, is:

x 0 1 2 3
f(x) 0.125 0.375 0.375 0.125

Note that a probability density function must follow basic probability rules. The
probabilities must be numbers between 0 and 1 (inclusive) and the probabilities must
add to 1.

Using the probability density function we can answer questions such as:
What is the probability that a randomly selected 3-child family has at least one boy?
What is the probability that a randomly selected 3-child family has fewer than 2 boys?
What is the probability that a randomly selected 3-child family has at least 1, but no more than 2 boys?
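All three questions amount to summing the appropriate values of f(x). A minimal Python sketch (ours, not part of the original text) that encodes the table above as a dictionary:

f = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}  # pdf of X from the table above

# P(X >= 1): at least one boy
print(sum(p for x, p in f.items() if x >= 1))       # 0.875
# P(X < 2): fewer than 2 boys
print(sum(p for x, p in f.items() if x < 2))        # 0.5
# P(1 <= X <= 2): at least 1 but no more than 2 boys
print(sum(p for x, p in f.items() if 1 <= x <= 2))  # 0.75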

The CUMULATIVE DISTRIBUTION FUNCTION F(x) is the more common way that the probabilities of a random variable are summarized. The cumulative distribution function, which also can be described by a formula or in a table, is defined as:

F(x) = P(X ≤ x).
Again, consider X, the number of boys in a 3-child family. Then:

P(X ≤ 0) = P(X = 0) = 0.125

P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.125 + 0.375 = 0.50


P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.125 + 0.375 + 0.375 = 0.875
P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.125 + ... + 0.125 = 1
Then, summarizing the cumulative distribution function F (x) for X in a table:

x 0 1 2 3
F(x) 0.125 0.5 0.875 1

As we will see, typically we are faced with the situation in which we have some assumed cumulative distribution function, but we need to calculate some probability. The following example illustrates how to use a cumulative distribution function F(x) to find various probabilities.
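For instance, using the table above:
P(X ≥ 1) = 1 − F(0) = 1 − 0.125 = 0.875;
P(X < 2) = P(X ≤ 1) = F(1) = 0.5;
P(1 ≤ X ≤ 2) = F(2) − F(0) = 0.875 − 0.125 = 0.75.
These agree with the answers obtained directly from the probability density function.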
Both the probability density function and the cumulative distribution function may also be represented by a probability histogram.

3 Probability density and cumulative distribution functions for a continuous random variable

A continuous probability density function differs from a discrete probability density function in several ways.
1) The probability that a continuous random variable will assume a particular value is zero.
2) As a result, a continuous probability distribution cannot be expressed in tabular form.
3) Instead, an equation or formula is used to describe a continuous probability distribution.
For a continuous random variable, the density function has the following proper-
ties:

1) Since the continuous random variable is defined over a continuous range of values (called the domain of the variable), the graph of the density function will also be continuous over that range.
2) The area bounded by the curve of the density function and the x-axis is equal to 1, when computed over the domain of the variable.
3) The probability that a random variable assumes a value between a and b is equal to the area under the density function bounded by a and b.

For example, consider the probability density function shown in the graph below.
Suppose we wanted to know the probability that the random variable X was less than
or equal to a. The probability that X is less than or equal to a is equal to the area
under the curve bounded by a and minus infinity, as indicated by the shaded area.

Figure 1: Density function

Note: The shaded area in the graph represents the probability that the random variable X is less than or equal to a. This is a cumulative probability. However, the probability that X is exactly equal to a would be zero. A continuous random variable can take on an infinite number of values. The probability that it will equal a specific value (such as a) is always zero.
Definition 2. The density function for a continuous random variable X is defined by the relation:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx.  (2)

Properties of the density function:
1) f(x) ≥ 0;
2) ∫_{−∞}^{∞} f(x) dx = 1.

The CUMULATIVE DISTRIBUTION FUNCTION of a continuous random variable X is defined as:

F(x) = P(X ≤ x).

Definition 3. A random variable X is continuous if its cumulative distribution function F(x) is a continuous function.
If X is a continuous random variable then we must have P(X = x) = 0 for all x ∈ R. It also implies that P(X < x) = P(X ≤ x).

3.1 Properties of probability density and cumulative distribution functions, relationship between them

Properties of the distribution function:
1) 0 ≤ F(x) ≤ 1;
2) if x1 ≤ x2, then F(x1) ≤ F(x2), i.e. F(x) is a nondecreasing function;
3) P(a ≤ X ≤ b) = F(b) − F(a);
4) F(−∞) = lim_{x→−∞} F(x) = 0, F(∞) = lim_{x→∞} F(x) = 1.
Properties of the probability density function. Let X be a continuous random variable. Then:
1) f(x) ≥ 0 for all x ∈ R;
2) P(a ≤ X ≤ b) = P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx;
3) F(b) = P(X ≤ b) = ∫_{−∞}^b f(x) dx;
4) f(x) = F′(x);
5) ∫_{−∞}^{∞} f(x) dx = 1.

Properties of the probability density function. Let X be a discrete random variable. Then:
1) 0 ≤ f(x) ≤ 1 for all x ∈ Z;
2) P(X = a) = F(a) − F(a−);
3) P(a < X ≤ b) = F(b) − F(a) = ∑_{i=a}^{b} P(X = i) − P(X = a);
4) P(a ≤ X ≤ b) = F(b) − F(a−) = ∑_{i=a}^{b} P(X = i);
5) P(a ≤ X < b) = F(b−) − F(a−) = ∑_{i=a}^{b} P(X = i) − P(X = b);
6) P(a < X < b) = F(b−) − F(a) = ∑_{i=a}^{b} P(X = i) − P(X = a) − P(X = b);
7) F(b) = P(X ≤ b) = ∑_{i=−∞}^{b} P(X = i);
8) ∑_{i=−∞}^{∞} P(X = i) = 1.
Here F(a−) denotes the left-hand limit of F at a.

4 Characteristics of random variables

4.1 Measures of Central Tendency

Mean.
The mean of a random variable indicates its average or central value. It is a useful
summary value (a number) of the variable's distribution. Stating the mean gives a
general impression of the behavior of some random variable without giving full details
of its probability distribution (if it is discrete) or its probability density function (if
it is continuous). The mean can be notated as µ, m or E(X).
The mean of a discrete random variable X is a weighted average of the possible
values that the random variable can take.
µX = p1x1 + p2x2 + ... + pnxn = ∑_{i=1}^{n} pi xi.

The mean of a random variable provides the long-run average of the variable, or
the expected average outcome over many observations.

If X is a continuous random variable with probability density function f(x), then the expected value of X is defined by:

µX = ∫_{−∞}^{∞} x f(x) dx.

Properties of the mean:
1) E(cX) = cE(X), where c is a constant;
2) E(X + Y) = E(X) + E(Y);
3) E(XY) = E(X)E(Y) for independent random variables X and Y.
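For example, for X, the number of boys in a 3-child family:

µX = 0 × 0.125 + 1 × 0.375 + 2 × 0.375 + 3 × 0.125 = 1.5,

so over many such families the average number of boys is 1.5.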
Median.
The median of X is the least number xm such that

P(X ≤ xm) ≥ 1/2 and P(X ≥ xm) ≥ 1/2.

Mode.
A mode of X is a number x̃ such that P(X = x̃) is largest. This is the most likely value of X, or one of the most likely values if X has several values with the same largest probability. For a continuous random variable, a mode is a number x̃ such that the probability density function is highest at x = x̃.

Figure 2: Mode examples

4.2 Measures of Dispersion

Variance and standard deviation.


The (population) variance of a random variable is a non-negative number which gives
an idea of how widely spread the values of the random variable are likely to be; the
larger the variance, the more scattered the observations on average.
Stating the variance gives an impression of how closely concentrated round the ex-
pected value the distribution is; it is a measure of the 'spread' of a distribution about
its average value.
The variance of a discrete random variable X is defined by

σX² = ∑_{i=1}^{n} (xi − µX)² pi.
The variance can be notated as V ar(X) or D(X).
NOTES:
• the larger the variance, the further that individual values of the random variable
(observations) tend to be from the mean, on average;
• the smaller the variance, the closer that individual values of the random variable
(observations) tend to be to the mean, on average.
The standard deviation σ is the square root of the variance.
The variance of a continuous random variable X is defined by

σX² = ∫_{−∞}^{∞} (x − µX)² f(x) dx.

The variance can also be written as

σX² = E(X − E(X))² = E(X²) − (E(X))².
Properties of the variance:
1) D(cX) = c²D(X), where c is a constant;
2) D(X + Y) = D(X) + D(Y) for independent random variables X and Y.
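For example, for X, the number of boys in a 3-child family (µX = 1.5):

σX² = E(X²) − (E(X))² = (0² × 0.125 + 1² × 0.375 + 2² × 0.375 + 3² × 0.125) − 1.5² = 3 − 2.25 = 0.75,

and the standard deviation is σX = √0.75 ≈ 0.87.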

4.2.1 Moments of the random variables


Definition 4. The moment αk is defined (if it exists) as the expected value of the random variable X^k, that is

αk = E(X^k).

Definition 5. The kth central moment Mk is defined (if it exists) as the expected value of the random variable (X − E(X))^k, that is

Mk = E(X − E(X))^k.

Notice that α1 = E(X) and M2 = E(X − E(X))² = σX².

5 Some distributions of discrete random variables

5.1 Bernoulli Distribution

The Bernoulli distribution is a discrete distribution having two possible outcomes labeled by x = 0 and x = 1, in which x = 1 ("success") occurs with probability p and x = 0 ("failure") occurs with probability q = 1 − p, where 0 < p < 1. It therefore has probability function

f(x) = { p, if x = 1;  q, if x = 0 }.

The corresponding distribution function is

F(x) = { 0, if x < 0;
         q, if 0 ≤ x < 1;
         1, if x ≥ 1. }

The Bernoulli distribution has mean

E(X) = 1 × p + 0 × q = p

and variance

D(X) = (1 − p)² × p + (0 − p)² × q = q²p + p²q = pq(q + p) = pq.

5.2 Binomial Distribution

Let X1, ..., Xn be independent Bernoulli random variables. Often we are only interested in the number of successes:

Y = X1 + ... + Xn.

Example: A coin is tossed 50 times and we are only interested in the number of 'heads' occurring. Define for the ith toss:

Xi = { 1, if a head occurs;  0, if a tail occurs }.

Then Y = X1 + ... + X50 is the number of 'heads' which occur after 50 tosses.
We know:
1) Xi is Bernoulli distributed with parameter p;
2) there are just two outcomes of each trial: success and failure;

What is the distribution of Y? Y has density function

b(y, n, p) = P(Y = y) = C_n^y p^y q^(n−y),  y = 0, 1, 2, ..., n,

where
n = 1, 2, 3, ... is the number of trials;
p is the success probability, 0 ≤ p ≤ 1;
q = 1 − p is the failure probability.
A discrete random variable Y is said to follow a Binomial distribution with parameters n and p, written Y ∼ Bi(n, p) or Y ∼ B(n, p), if it has this probability distribution. The trials must meet the following requirements:
1) the total number of trials is fixed in advance;
2) there are just two outcomes of each trial: success and failure;
3) the outcomes of all the trials are statistically independent;
4) all the trials have the same probability of success.
The distribution function is:

F(x) = { 0, if x < 0;
         q^n, if 0 ≤ x < 1;
         q^n + npq^(n−1), if 1 ≤ x < 2;
         ...
         ∑_{i=0}^{k} b(i, n, p), if k ≤ x < k + 1;
         ...
         1, if x ≥ n. }

The Binomial distribution has mean E(X) = np and variance D(X) = np(1 − p).
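A short Python sketch of the binomial density, directly translating the formula above (the function name binomial_pmf and the example values are ours):

from math import comb

def binomial_pmf(y, n, p):
    # b(y, n, p) = C_n^y * p^y * (1 - p)^(n - y)
    return comb(n, y) * p**y * (1 - p)**(n - y)

# P(exactly 25 heads in 50 fair-coin tosses)
print(binomial_pmf(25, 50, 0.5))   # about 0.1123
# E(Y) = np and D(Y) = np(1 - p) for n = 50, p = 0.5
print(50 * 0.5, 50 * 0.5 * 0.5)    # 25.0 12.5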

5.3 Geometric Distribution

Consider a sequence of independent Bernoulli trials.
1) On each trial, a success occurs with probability p;
2) Let X be the number of trials up to the first success.
What is the distribution of X?
1) The probability of no success in the first x − 1 trials is (1 − p)^(x−1);
2) The probability of a success on the xth trial is p.
Then the density function of X is

f(x) = p(1 − p)^(x−1),  x = 1, 2, 3, ...

X is geometrically distributed with parameter p. The distribution function is:

F(x) = { 0, if x < 1;
         p, if 1 ≤ x < 2;
         p + p(1 − p), if 2 ≤ x < 3;
         ...
         ∑_{i=1}^{k} p(1 − p)^(i−1) = 1 − (1 − p)^k, if k ≤ x < k + 1;
         ... }

The Geometric distribution has mean E(X) = 1/p and variance D(X) = (1 − p)/p².
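For example, for a fair coin (p = 0.5), the probability that the first head appears on the third toss is f(3) = 0.5 × (1 − 0.5)² = 0.125; on average the first head appears after E(X) = 1/0.5 = 2 tosses, with variance D(X) = 0.5/0.25 = 2.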

5.4 Hypergeometric Distribution

The hypergeometric distribution arises when a random selection (without repetition) is made among objects of two distinct types. Typical example: choose a team of 8 from a group of 10 boys and 7 girls.
Suppose a population consists of N = N1 + N2 items, of which N1 are of the first type and N2 are of the second type. A random sample drawn from that population consists of n items, X of which are of the first type. Then the hypergeometric probability is:

P(X = x) = h(x; n, N1, N2) = C_{N1}^x C_{N2}^(n−x) / C_{N1+N2}^n

for integers x such that 0 ≤ x ≤ n and 0 ≤ n ≤ N1 + N2.
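A short Python sketch of this formula, applied to the team example above (the function name hypergeom_pmf and the choice of "exactly 5 boys" are ours):

from math import comb

def hypergeom_pmf(x, n, N1, N2):
    # h(x; n, N1, N2) = C(N1, x) * C(N2, n - x) / C(N1 + N2, n)
    return comb(N1, x) * comb(N2, n - x) / comb(N1 + N2, n)

# Probability that a team of 8, chosen at random from 10 boys and
# 7 girls, contains exactly 5 boys
print(hypergeom_pmf(5, 8, 10, 7))   # about 0.363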

5.5 Poisson Distribution

Often we are interested in the number of events which occur in a specific period of time or in a specific area or volume:
1) the number of cars passing a fixed point in a 5 minute interval;
2) the number of telephone calls coming into an exchange during one unit of time;
3) the number of diseased trees per acre of a certain woodland;
4) the number of death claims received per day by an insurance company.
A discrete random variable X is said to follow a Poisson distribution with parameter λ, written X ∼ P(λ), if it has probability distribution

p(x, λ) = P(X = x) = (λ^x / x!) e^(−λ),

where x = 0, 1, 2, ... and λ > 0.
The following requirements must be met:

1) The probability that the event occurs in a given unit of time is the same for all the units;
2) The number of events that occur in one unit of time is independent of the number of events in other units;
3) The mean (or expected) rate is λ.

The Poisson distribution has expected value E(X) = λ and variance D(X) = λ.
The Poisson distribution can sometimes be used to approximate the Binomial distribution with parameters n and p. When the number of observations n is large and the success probability p is small, the Bi(n, p) distribution approaches the Poisson distribution with parameter λ = np. This is useful since the computations involved in calculating binomial probabilities are greatly reduced.
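A quick numerical check of this approximation in Python (the parameter values n = 1000, p = 0.003 are illustrative assumptions):

from math import comb, exp, factorial

n, p = 1000, 0.003        # large n, small p
lam = n * p               # lambda = np = 3

# Compare Bi(n, p) and P(lambda) probabilities for small x
for x in range(5):
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    poisson = lam**x / factorial(x) * exp(-lam)
    print(x, round(binom, 4), round(poisson, 4))

The two columns agree to about three decimal places, as the approximation predicts.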

6 Some distributions of continuous random variables

6.1 Uniform Distribution

A uniform distribution, sometimes also known as a rectangular distribution, is a distribution that has constant probability density. For example, if buses arrive at a given bus stop every 15 minutes, and you arrive at the bus stop at a random time, the time you wait for the next bus to arrive could be described by a uniform distribution over the interval from 0 to 15.
Let a and b be constants (a < b). The function

f(x) = { 1/(b − a), for a < x < b;  0, elsewhere }

is called the uniform probability density function, where a is the location parameter and (b − a) is the scale parameter. The distribution function is:

F(x) = { 0, if x ≤ a;
         (x − a)/(b − a), if a < x ≤ b;
         1, if x > b. }

Let us find the mean:

µX = ∫_a^b x · 1/(b − a) dx = (1/(b − a)) · (x²/2)|_a^b = (1/(b − a)) · (b² − a²)/2 = (1/(b − a)) · (a + b)(b − a)/2 = (a + b)/2,

and the second moment:

E(X²) = ∫_a^b x² · 1/(b − a) dx = (1/(b − a)) · (x³/3)|_a^b = (1/(b − a)) · (b³ − a³)/3 = (1/(b − a)) · (b − a)(b² + ab + a²)/3 = (b² + ab + a²)/3.

Then

D(X) = E(X²) − (E(X))² = (b² + ab + a²)/3 − (a + b)²/4 = [4(b² + ab + a²) − 3(a² + 2ab + b²)]/12 = (b² − 2ab + a²)/12 = (b − a)²/12.
So the Uniform distribution has mean E(X) = (a + b)/2 and variance D(X) = (b − a)²/12.
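For the bus-stop example above (a = 0, b = 15): the probability of waiting at most 5 minutes is F(5) = (5 − 0)/(15 − 0) = 1/3, the mean waiting time is (0 + 15)/2 = 7.5 minutes, and the variance is (15 − 0)²/12 = 18.75.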

6.2 Exponential Distribution

Exponential distributions describe the distance between events that occur uniformly in time: if x is the time variable and λx is the expected number of events in the interval [0, x], then e^(−λx) is the probability of no event in [0, x]. The function

f(x) = { λe^(−λx), for x > 0;  0, elsewhere }

is called the exponential probability density function. The distribution function of X is:

F(x) = { 0, for x < 0;  1 − e^(−λx), for x ≥ 0. }

The exponential distribution has mean E(X) = 1/λ and variance D(X) = 1/λ².

Figure 3: Exponential pdf and cdf
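For example, if telephone calls arrive at a rate of λ = 2 per minute (an illustrative value), the waiting time X until the next call is exponential with mean 1/λ = 0.5 minutes, and the probability of waiting more than one minute is P(X > 1) = 1 − F(1) = e^(−2) ≈ 0.135.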

6.3 Normal Distribution

Strictly, a Normal random variable should be capable of assuming any value on the
real line, though this requirement is often waived in practice. For example, height at
a given age for a given gender in a given racial group is adequately described by a
Normal random variable even though heights must be positive.
A continuous random variable X, taking all real values in the range (−∞; ∞), is said to follow a Normal distribution with parameters µ and σ if it has probability density function

f(x) = ϕ(x) = (1/√(2πσ²)) e^(−(x − µ)²/(2σ²)).

The distribution function is

F(x) = ∫_{−∞}^x f(y) dy = (1/√(2πσ²)) ∫_{−∞}^x e^(−(y − µ)²/(2σ²)) dy.  (3)

We write X ∼ N(µ, σ²). This probability density function is a symmetrical, bell-shaped curve, centered at its mean µ. The variance is σ². Many distributions arising in practice can be approximated by a Normal distribution. Other random variables may be transformed to normality.

The graph of the normal distribution depends on two factors - the mean and
the standard deviation. The mean of the distribution determines the location of the
center of the graph, and the standard deviation determines the height and width of
the graph. When the standard deviation is large, the curve is short and wide; when
the standard deviation is small, the curve is tall and narrow. All normal distributions
look like a symmetric, bell-shaped curve, as shown below.

Figure 4: Normal distribution

The normal distribution is a continuous probability distribution. This has several implications for probability.
1) The total area under the normal curve is equal to 1;
2) The probability that a normal random variable X equals any particular value is 0;
3) The probability that X is greater than a equals the area under the normal curve bounded by a and plus infinity (as indicated by the non-shaded area in the figure below);
4) The probability that X is less than a equals the area under the normal curve bounded by a and minus infinity (as indicated by the shaded area in the figure below).

Figure 5: Normal distribution and probability

Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the following "rule":
1) About 68% of the area under the curve falls within 1 standard deviation of the mean;
2) About 95% of the area under the curve falls within 2 standard deviations of the mean;
3) About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
Collectively, these points are known as the empirical rule or the 68 − 95 − 99.7 rule or
1 − 2 − 3 σ rule. Clearly, given a normal distribution, most outcomes will be within
3 standard deviations of the mean.
The simplest case of the normal distribution, known as the Standard Normal Distribution, has expected value zero and variance one. This is written as N(0, 1). The probability density function is

φ(x) = (1/√(2π)) e^(−x²/2),  −∞ < x < ∞.  (4)

The distribution function is

Φ(x) = ∫_{−∞}^x φ(y) dy = (1/√(2π)) ∫_{−∞}^x e^(−y²/2) dy.  (5)

Proposition 1. Let X be a random variable described by a Normal distribution with mean µ and variance σ², X ∼ N(µ, σ²). Characteristics of this random variable X can be written through characteristics of the standard Normal distribution:
1) density function: f(x) = (1/σ) φ((x − µ)/σ);
2) distribution function: F(x) = Φ((x − µ)/σ).
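A small Python sketch of this standardization; it relies on the identity Φ(x) = (1 + erf(x/√2))/2, a standard fact not stated in the text, and the numerical example values are ours:

from math import erf, sqrt

def Phi(x):
    # Standard normal cdf via the error function
    return (1 + erf(x / sqrt(2))) / 2

def normal_cdf(x, mu, sigma):
    # F(x) = Phi((x - mu) / sigma) for X ~ N(mu, sigma^2)
    return Phi((x - mu) / sigma)

# P(X <= 190) for, say, heights X ~ N(175, 10^2)
print(normal_cdf(190, 175, 10))   # about 0.9332
# The 68-95-99.7 rule, middle case: P(mu - 2s <= X <= mu + 2s)
print(Phi(2) - Phi(-2))           # about 0.9545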

7 The main distributions used in mathematical statistics

7.1 Gamma distribution

The probability density function of the exponential distribution has a mode of zero.
In many instances, it is known a priori that the mode of the distribution of a particular
random variable of interest is not equal to zero (e.g., when modeling the distribution
of the life-times of a product such as an electric light bulb, or the serving time taken at a ticket booth at a baseball game). In those cases, the gamma distribution is more
appropriate for describing the underlying distribution. The gamma distribution is
defined as:

f(x) = (λ^γ / Γ(γ)) x^(γ−1) e^(−λx),  for x > 0, γ > 0,

where
γ is the shape parameter,
λ is the scale parameter,
Γ is the Gamma function, defined as Γ(γ) = ∫_0^∞ x^(γ−1) e^(−x) dx. Its properties:

• Γ(1) = 1;

• Γ(γ + 1) = γΓ(γ).

Figure 6: Gamma density functions

7.2 Chi-square distribution

In probability theory and statistics, the chi-square distribution (also χ² distribution) is one of the most widely used theoretical probability distributions in inferential statistics, i.e. in statistical significance tests. It is useful because, under reasonable assumptions, easily calculated quantities can be proven to have distributions that approximate to the chi-square distribution if the null hypothesis is true.
If Xi are k independent, normally distributed random variables with mean 0 and
variance 1, then the random variable
Q = ∑_{i=1}^{k} Xi²

is distributed according to the chi-square distribution. This is usually written

Q ∼ χ²_k.

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e. the number of Xi).
The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit of an observed distribution to a theoretical one, and of the independence of two criteria of classification of qualitative data. However, many other statistical tests lead to a use of this distribution. One example is Friedman's analysis of variance by ranks.
A probability density function of the chi-square distribution is

f(x) = { (1/(2^(k/2) Γ(k/2))) x^(k/2−1) e^(−x/2), for x > 0;
         0, for x ≤ 0. }

Here Γ(k/2) is the gamma function and k is the number of degrees of freedom.

Figure 7: Chi-square density functions
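A small simulation sketch of the definition of Q above (the sample sizes are arbitrary; that χ²_k has mean k and variance 2k is a standard fact not stated in the text):

import random

random.seed(0)  # reproducibility
k, trials = 5, 100_000
# Q = X1^2 + ... + Xk^2 with Xi ~ N(0, 1) should follow chi-square
# with k degrees of freedom, so its mean should be near k = 5 and
# its variance near 2k = 10.
qs = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(trials)]
mean = sum(qs) / trials
var = sum((q - mean) ** 2 for q in qs) / trials
print(mean, var)  # close to 5 and 10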

7.3 Student's t Distribution

The Student's t distribution is symmetric about zero, and its general shape is similar to that of the standard normal distribution. It is most commonly used in testing hypotheses about the mean of a particular population. Suppose X0, X1, X2, ..., Xn are independent normally distributed random variables with mean 0 and variance σ² (Xi ∼ N(0; σ), i = 0, 1, ..., n). Then the random variable

t_n = X0 / √((1/n) ∑_{i=1}^{n} Xi²)

is said to follow Student's (or t) distribution with n degrees of freedom.


The density of the Student's t distribution is (for n = 1, 2, ...):

f(x) = (1/√(πn)) · (Γ((n + 1)/2) / Γ(n/2)) · (1 + x²/n)^(−(n+1)/2).

The shape of the student's t distribution is determined by the degrees of freedom.

Figure 8: Student's density functions

7.4 F Distribution

Snedecor's F distribution is most commonly used in tests of variance. Suppose U1, U2, ..., Um and V1, V2, ..., Vn are independent normally distributed random variables with mean 0 and variance σ² (Ui ∼ N(0; σ), i = 1, ..., m; Vj ∼ N(0; σ), j = 1, ..., n). Then the ratio of two chi-squares, each divided by its respective degrees of freedom, is said to follow an F distribution:

F_{m,n} = ((1/m) ∑_{i=1}^{m} Ui²) / ((1/n) ∑_{j=1}^{n} Vj²) = (χ²_m / m) / (χ²_n / n).

Here m and n are the shape parameters, the degrees of freedom. The probability density function of the F distribution can be written in this way:

f(x) = { (Γ((m + n)/2) / (Γ(m/2) Γ(n/2))) · (m/n)^(m/2) · x^(m/2 − 1) · (1 + mx/n)^(−(m+n)/2), for x > 0;
         0, for x ≤ 0. }

Figure 9: Fisher's density functions
