Chapter 7 Eng


Analysis of economic data (20606)

Chapter 7
Basic concepts of probability
• You should read the introduction to probability theory before reading this chapter…
Inferential statistics

Provides methods to estimate the characteristics of a group (the population) based on data from a small subset (the sample).

[Figure: a sample drawn from a population]
Random variable
A random variable X is a function that associates each outcome of the sample space of an experiment E with a real numerical value:

X : E → ℝ
ω ↦ X(ω)

Example:
Let the random experiment be tossing two coins at once.
The random variable is X = 'number of heads obtained'.
Outcomes: (head, head), (head, tail), (tail, head), (tail, tail).
X = 0 (no heads), 1 (one head), 2 (two heads).
A numeric value has been associated with each possible outcome:

Outcome S(x)   X (r.v.)
HH             2
HT             1
TH             1
TT             0
The random variables may be discrete or continuous:

Discrete: the set of possible values is finite or countably infinite (number of children in a family, number of cars ...).

Continuous: the set of possible values is uncountable; the variable can take any value in a range (height, income, salary, weight...).

Distribution of a random variable

● Let X be a discrete random variable. Its distribution is given by the values it can take, x1, x2, x3, …, xn, and the probabilities with which it takes them, p1, p2, p3, …, pn:

pi = P{X = xi}

These quantities are called the probability function.
● The probability function verifies that:

p(x) ≥ 0
Σ_x p(x) = 1

● Let X be a continuous random variable. Continuous random variables are treated differently because in the continuous case it is not possible to assign a probability to each of the infinitely many possible values in such a way that these probabilities add up to one (as in the discrete case). A different approach is needed to obtain the probability distribution of a continuous random variable.

We will use the density function f(x), which must fulfill:

f(x) ≥ 0
∫ f(x) dx = 1   (integrating over the whole real line)

For the example X = 'number of heads' when tossing two coins, the following table shows its probability distribution, together with its graph:

Outcome S(x)   X (r.v.)   P(x)
TT             0          ¼
HT, TH         1          ½
HH             2          ¼

[Figure: bar chart of the probability function P(x) at x = 0, 1, 2, and an example of a density function]
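As a small illustration (Python is assumed here; the course material itself only refers to Excel), the following sketch builds this probability function from the four equally likely outcomes and checks that it is nonnegative and sums to one:

# Probability function of X = number of heads when tossing two coins.
from fractions import Fraction

value = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}          # the random variable X
p = {}                                                 # p(x) = P(X = x)
for outcome, x in value.items():                       # each outcome has probability 1/4
    p[x] = p.get(x, Fraction(0)) + Fraction(1, 4)

for x in sorted(p):
    print(x, p[x])                                     # 0 1/4, 1 1/2, 2 1/4

assert all(prob >= 0 for prob in p.values())           # p(x) >= 0
assert sum(p.values()) == 1                            # probabilities sum to 1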
Cumulative distribution of a random variable

● Let X be a discrete random variable. Its cumulative probability distribution is defined as:

F(x) = P(X ≤ x) = Σ_{xi ≤ x} p(xi)

● Let X be a continuous random variable. Its cumulative distribution function is defined as:

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt

Properties:

• 0 ≤ F(x) ≤ 1
• F(−∞) = lim_{x→−∞} F(x) = lim_{x→−∞} P(X ≤ x) = 0
• F(+∞) = lim_{x→+∞} F(x) = lim_{x→+∞} P(X ≤ x) = 1
• F(x + ε) ≥ F(x) for all ε ≥ 0, i.e. F is monotone increasing.
• F is continuous on the right: the probability that a discrete random variable X takes a concrete value is equal to the jump of the distribution function at that point.
In addition, the following should be fulfilled:

Continuous variables:

• P[x1 < X ≤ x2] = ∫_{x1}^{x2} f(x) dx = F(x2) − F(x1)

• ∂F(x)/∂x = f(x)

• P(X = a) = ∫_{a}^{a} f(x) dx = 0. Since f is an integrable function, the probability of any single point is zero.

Discrete variables:

• P[x1 ≤ X ≤ x2] = Σ_{x1 ≤ xi ≤ x2} p(xi)

• P(X > x) = 1 − F(x)
For the example X = 'number of heads' when tossing two coins, the following table shows its cumulative probability distribution, together with its graph.

Outcome S(x)   X (r.v.)   P(x)   F(x)
TT             0          ¼      ¼
HT, TH         1          ½      ¾
HH             2          ¼      1

[Figure: step graph of F(x), jumping from ¼ to ¾ to 1 at x = 0, 1, 2]

Continuous case:

P[a < X ≤ b] = ∫_{a}^{b} f(x) dx = F(b) − F(a)

[Figure: shaded area under the density between a and b, equal to F(b) − F(a)]
Example: Draw the probability function f(x) and the distribution function F(x) of a discrete variable defined as:
X = number shown by a die.
X has possible values x = 1, 2, 3, 4, 5, 6, each with probability 1/6.

[Figure: probability function f(x), bars of constant height 1/6 at x = 1, …, 6; distribution function F(x), a staircase rising from 1/6 to 1 in steps of 1/6]
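A hedged sketch of the same example in code (Python assumed, not prescribed by the slides): tabulate f(x) and accumulate it to obtain F(x).

# f(x) and F(x) for X = number shown by a fair die: f(x) = 1/6 and F(x) = x/6.
from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}   # probability function

F, running = {}, Fraction(0)
for x in sorted(f):                            # cumulative distribution function
    running += f[x]
    F[x] = running

for x in sorted(f):
    print(x, f[x], F[x])                       # e.g. 3 1/6 1/2 and 6 1/6 1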


PARAMETERS:
Expected Value E(X):

The mean or expected value is a measure of location, which indicates the value around which the random variable X fluctuates; it is the "center of gravity" and is defined as:

E(X) = Σ_{xi} xi p(xi)   in the discrete case

E(X) = ∫ x f(x) dx   in the continuous case

Example

A home insurance company must determine the average cost per contract signed, knowing that each year 1 out of 10,000 contracts ends in a claim of 20 million, 1 out of 1,000 in a claim of 5 million, 1 out of 50 in a claim of 200,000, and the rest in no claim (0).

E(X) = (1/10000)(2×10^7) + (1/1000)(5×10^6) + (1/50)(2×10^5) + (9789/10000)(0) = 11,000
PROPERTIES

Let a, b and c be constants. It can be shown that:

(1) E(c) = c
(2) E(P1(x) + P2(x)) = E(P1(x)) + E(P2(x))
(3) E(aX + b) = aE(X) + b (checked numerically in the sketch below)
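A minimal Python sketch (Python and the constants a, b are illustrative assumptions, not part of the slides) that reproduces the insurance calculation above and checks property (3):

# Expected cost per contract (the insurance example) and a check of E(aX + b) = a E(X) + b.
values = [20_000_000, 5_000_000, 200_000, 0]
probs  = [1 / 10_000, 1 / 1_000, 1 / 50, 9_789 / 10_000]

def expectation(vals, ps):
    return sum(v * p for v, p in zip(vals, ps))

mu = expectation(values, probs)
print(round(mu, 2))                          # 11000.0, the average cost per contract

a, b = 2, 500                                # arbitrary constants, for illustration only
lhs = expectation([a * v + b for v in values], probs)
print(round(lhs, 2), round(a * mu + b, 2))   # both 22500.0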
Variance and standard deviation

Variance of a discrete random variable:

σ² = Var(X) = E((X − μ)²) = Σ_i (xi − μ)² P(xi)

Variance of a continuous random variable:

σ² = Var(X) = E((X − μ)²) = ∫ (x − μ)² f(x) dx

Standard deviation: σ = √Var(X)

Both measure the "dispersion of the data." Note that the standard deviation does so in the same units as the data itself.
Example

X     P(X)   (X − μ)   (X − μ)²   (X − μ)² P(X)
−1    0.1    −2        4          0.4
0     0.2    −1        1          0.2
1     0.4    0         0          0.0
2     0.2    1         1          0.2
3     0.1    2         4          0.4
Sum                               1.2

(Here μ = E(X) = 1.)

σ² = Σ (xi − μ)² P(xi) = 1.2

σ = √Var(X) ≈ 1.10
REMINDER OF THE PROPERTIES OF THE VARIANCE

a) The variance of a constant is zero.

b) The variance of any variable is always nonnegative.

c) If we add (or subtract) a constant to a random variable, its variance does not change.

d) If we multiply a random variable by a constant, its variance is multiplied by the square of this constant.

e) The variance of a linear transformation aX ± b is equal to a² Var(X), as illustrated in the numerical sketch below.
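A quick numerical check of properties c), d) and e), using the distribution from the previous example (a sketch, assuming Python is available; the constants a = 3 and b = 5 are arbitrary choices for illustration):

# Var(X) for the example above, and a check that Var(aX + b) = a^2 Var(X):
# adding b changes nothing, multiplying by a scales the variance by a^2.
import math

xs = [-1, 0, 1, 2, 3]
ps = [0.1, 0.2, 0.4, 0.2, 0.1]

def var(values, probs):
    mu = sum(v * p for v, p in zip(values, probs))
    return sum((v - mu) ** 2 * p for v, p in zip(values, probs))

v_x = var(xs, ps)
print(round(v_x, 6), round(math.sqrt(v_x), 2))      # 1.2 and 1.1, as in the example

a, b = 3, 5
v_t = var([a * x + b for x in xs], ps)
print(round(v_t, 6), round(a ** 2 * v_x, 6))        # both 10.8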
Contents
• Discrete distributions
  BERNOULLI
  BINOMIAL
  POISSON
• Continuous distributions
  NORMAL
  T-STUDENT
  χ² (Chi-square)
  F
Probability distribution
• The distribution of a variable X is defined as a description of all possible values of X, together with the probability associated with each of these values.

• For a discrete random variable, the probability distribution is described by a probability function, represented by P(x), which defines the probability of each value of the variable analyzed.
Bernoulli distribution
Bernoulli experiment: there are only two possible outcomes, success or failure. We can define a discrete random variable X such that:

success → 1
failure → 0

If the probability of success is p and of failure 1 − p, we can construct the probability function:

P(x) = p^x (1 − p)^(1−x),   x = 0, 1

A typical Bernoulli experiment is tossing a coin with probability p for head and (1 − p) for tail.

P(x) = p^x (1 − p)^(1−x),   x = 0, 1

Distribution function:

F(x) = 0,      for x < 0
F(x) = 1 − p,  for 0 ≤ x < 1
F(x) = 1,      for x ≥ 1
Expected value and variance of the Bernoulli distribution

E[X] = μ = Σ_{x=0}^{1} x P(X = x) = 0·P(X = 0) + 1·P(X = 1) = p

Var(X) = E[X²] − (E[X])² = Σ_{x=0}^{1} x² P(X = x) − p²
       = 0²·P(X = 0) + 1²·P(X = 1) − p² = p − p² = p(1 − p)
Binomial distribution
• It is a discrete probability distribution known for its various applications related to multi-stage experiments.
• A binomial experiment has four properties:

1. The experiment consists of a sequence of n identical trials.
2. In each trial only two results are possible: success or failure.
3. The probability of success, represented by p, does not change from one trial to another. Consequently, the probability of failure, (1 − p), does not change from one trial to another either. This is the stationarity assumption.
4. The trials are independent.

• If only properties 2, 3 and 4 are fulfilled, it is a Bernoulli process.
• An example of the binomial distribution is to determine the probability of obtaining x heads (successes) and n − x tails (failures) in n tosses of a coin.
• The combinatorial formula for selecting x objects from a group of n provides the number of experimental results with exactly x successes in n trials:

C(n, x) = n! / (x!(n − x)!)

• It is also necessary to know the probability associated with each of these experimental results, which can be determined through the following relationship:

p^x (1 − p)^(n−x)

• Combining the two expressions we obtain the binomial probability function:

f(x) = C(n, x) p^x (1 − p)^(n−x)
f(x) = probability of x successes in n trials.

C(n, x) = n! / (x!(n − x)!)

p = probability of a success in any trial.

(1 − p) = probability of failure in any trial.

• Expected value of the binomial probability distribution:

E(x) = μ = np

• Variance of the binomial probability distribution:

Var(x) = σ² = np(1 − p)
Example
• The manager of a department store needs to determine the probability that two out of three customers who enter the store make a purchase. He knows that the probability that a customer buys is 0.3.

C(3, 2) = 3! / (2!(3 − 2)!) = 3        Number of experimental results

0.3² (1 − 0.3)^(3−2) = 0.063           Probability of each experimental outcome in which two of the three customers buy

Then 3 × 0.063 = 0.189 is the probability that, out of 3 customers entering the shop, exactly 2 buy.
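The same result can be obtained directly from the binomial probability function. A minimal sketch, assuming the scipy library is available (it is not part of the course material, which uses Excel):

# P(exactly 2 of 3 customers buy) for a binomial with n = 3 trials and p = 0.3.
from scipy.stats import binom

print(round(binom.pmf(2, n=3, p=0.3), 3))   # 0.189, matching 3 * 0.3^2 * 0.7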
Exercise:

What is the probability that in a family of four children there are exactly 2 girls?

p(x) = C(n, x) p^x (1 − p)^(n−x)

p = 0.5; n = 4; x = 2

p(2) = C(4, 2) (0.5)² (1 − 0.5)^(4−2) = 6 × 0.0625 = 0.375
Exercise:

If one tenth of people have a certain blood type, what is the probability that, among 100 randomly chosen people, exactly 8 of them belong to this blood group?

p(x) = C(n, x) p^x (1 − p)^(n−x)

p = 0.1; n = 100; x = 8

p(8) = C(100, 8) (0.1)^8 (1 − 0.1)^92

What if the question is 8 or fewer?

P(X ≤ 8) = Σ_{x=0}^{8} C(100, x) (0.1)^x (0.9)^(100−x)
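Both answers can be evaluated numerically with a short sketch (again assuming scipy):

# Binomial probabilities for n = 100 people and p = 0.1.
from scipy.stats import binom

n, p = 100, 0.1
print(round(binom.pmf(8, n, p), 4))   # P(X = 8)  is approximately 0.1148
print(round(binom.cdf(8, n, p), 4))   # P(X <= 8) is approximately 0.3209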
Calculate the probability of obtaining at least two sixes when throwing a die four times.

P(k) = C(n, k) p^k q^(n−k),   k = 0, 1, …, n

p = 1/6, q = 5/6, n = 4
"At least two sixes" implies that k can be 2, 3 or 4:

P(2) + P(3) + P(4)
= C(4, 2)(1/6)²(5/6)² + C(4, 3)(1/6)³(5/6) + C(4, 4)(1/6)^4
= (1/6^4)(6·25 + 4·5 + 1) = 171/1296 ≈ 0.132
Poisson Distribution
• If we analyze a random variable that is a count of events over a period of time, we can use the Poisson distribution.
• This distribution can only take discrete, non-negative values, i.e. 0, 1, 2, 3, ...
• A random variable that follows a Poisson distribution with parameter λ (which expresses the rate) has the probability mass function:

P(Y = y) = f(y | λ) = e^(−λ) λ^y / y!,   y = 0, 1, 2, ...

• λ is the rate; it is the expected value of the variable and indicates the expected number of times that an event occurs per unit of time. For example, the number of patents per month, the number of doctor visits per year, etc.
• The Poisson distribution has the following properties:
• 1) The events are independent.
• 2) The expected value is E(Y) = λ.
• 3) The variance is Var(Y) = λ.
• 4) The probability of a value of 0 is smaller the larger the rate is.
• 5) As the rate increases, the Poisson distribution approximates the normal distribution.
• The first property implies that past events do not influence future events.
• The second and third properties together are the equidispersion property, i.e. the expected value is equal to the variance. It is a property of the distribution, but in essence it is an assumption that must be met in order to use the distribution. (That is, in practical cases, it may be a constraint that is not satisfied.) Note that the Poisson distribution has only one parameter, while, for example, the normal distribution has two (one for the expected value and one for the variance).
Example
• A real estate agent sells on average 1.24 apartments per week.

a) Calculate the probability of not selling any apartment in a week.
We know that λ = 1.24 and y = 0, because we want to calculate the probability of selling 0 apartments. We assume that the probability can be found by using the Poisson distribution:

P(Y = 0) = f(0 | 1.24) = e^(−1.24) · 1.24^0 / 0! = exp(−1.24) · 1 / 1 ≈ 0.289

b) Calculate the probability of selling one apartment in a week.
We know that λ = 1.24 and y = 1, because we want to calculate the probability of selling 1 apartment. We assume that the probability can be found by using the Poisson distribution:

P(Y = 1) = f(1 | 1.24) = e^(−1.24) · 1.24^1 / 1! = exp(−1.24) · 1.24 / 1 ≈ 0.359

c) Calculate the probability of selling two apartments in a week.
We know that λ = 1.24 and y = 2, because we want to calculate the probability of selling 2 apartments. We assume that the probability can be found by using the Poisson distribution:

P(Y = 2) = f(2 | 1.24) = e^(−1.24) · 1.24² / 2! = exp(−1.24) · 1.24² / 2 ≈ 0.222

d) Calculate the probability of selling more than two apartments in a week.
Here we take advantage of the calculations already done:

P(Y > 2) = 1 − [P(Y = 0) + P(Y = 1) + P(Y = 2)] ≈ 0.129

(We have used more decimals in the calculations.)
• In Excel we can use the formula:
• =POISSON(x; mean; accumulated)
• x is what we have labelled y, mean is the rate, and accumulated is 0 if we want the probability of exactly the value x and 1 if we want the probability of x or less. In question d) we can calculate:
=1-POISSON(2; 1,24; 1).
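Outside Excel, the same four probabilities can be reproduced with a short Python sketch (scipy is assumed here; the slides themselves only mention Excel):

# Poisson probabilities for a rate of 1.24 apartments sold per week.
from scipy.stats import poisson

rate = 1.24
print(round(poisson.pmf(0, rate), 3))      # a) P(Y = 0) is about 0.289
print(round(poisson.pmf(1, rate), 3))      # b) P(Y = 1) is about 0.359
print(round(poisson.pmf(2, rate), 3))      # c) P(Y = 2) is about 0.222
print(round(1 - poisson.cdf(2, rate), 3))  # d) P(Y > 2) is about 0.129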
Normal distribution
• Abraham de Moivre derived the normal probability distribution in 1733, in work later incorporated into his book The Doctrine of Chances.
• It is the most important continuous probability distribution.
• In particular cases it can be applied as an approximation when working with discrete variables.
• The normal probability density function is expressed as:

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))

[Figure: bell-shaped density curve, centered at the mean μ]
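As an illustration of using this density in practice, the following sketch computes a normal probability both directly and through the standardization z = (x − μ)/σ described further below (scipy is assumed; the values μ = 10, σ = 2 and x = 12 are invented for the example):

# P(X <= 12) for X ~ N(mu = 10, sigma = 2), directly and via the standardized value z.
from scipy.stats import norm

mu, sigma, x = 10, 2, 12          # hypothetical values, for illustration only
z = (x - mu) / sigma              # z = 1.0

print(round(norm.cdf(x, loc=mu, scale=sigma), 4))   # 0.8413
print(round(norm.cdf(z), 4))                        # 0.8413, same probability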
Features of the normal distribution:

– There is a family of normal distributions. Each member is identified by its mean and standard deviation.
– The highest point of the curve is at the mean.
– The mean can be any numeric value.
– The normal distribution is symmetric. The tails extend to infinity (they never touch the x-axis).
– The standard deviation determines the width of the curve.
– The total area under the curve is 1.

When we have a normal distribution with mean 0 and standard deviation 1 we talk about the standard normal distribution (tabulated). Any normal variable can be standardized as:

z = (x − μ) / σ
Chi-square distribution of Pearson

χ²_n = X1² + X2² + ⋯ + Xn²

where X1, X2, ..., Xn are independent random variables and Xi ~ N(0, 1) for i = 1, 2, …, n. The graph below is its density function.

Mean: n = degrees of freedom
Variance: 2n

[Figure: chi-square density with 13 degrees of freedom]
When it is tabulated:

[Figure: chi-square density with 13 degrees of freedom, with the upper-tail area α shaded to the right of the critical value]

χ²_{n,α} is the value such that P(χ²_n > χ²_{n,α}) = α
• The chi-square distribution is the distribution of the sum of squares of standard normal deviates.
• The degrees of freedom of the distribution equal the number of standard normal deviates that are added.
• Therefore, chi-square with one degree of freedom, written as χ²(1), is simply the distribution of a single squared standard normal deviate.
• The χ² distribution is (positively) skewed, and the skewness decreases when the degrees of freedom increase.
• As the degrees of freedom grow, the χ² distribution approximates the normal distribution.
• We can use a table for the chi-square distribution to find the corresponding probabilities.
• In Excel we can calculate:
=DISTR.CHICUAD.CD(x; degrees of freedom)
Example
We randomly choose two results from a standard normal distribution. We square each value and add the squared results.
What is the probability that the sum of these two results will be 6 or higher?
We added two results, so we have 2 degrees of freedom. We look at the table of χ² ...
The result is a probability of less than 0.05.

(If we use Excel, we find
=DISTR.CHICUAD.CD(6;2)
=0.049787068)
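The same upper-tail probability can be obtained in Python (a sketch assuming scipy; chi2.sf is the survival function, i.e. 1 minus the CDF, which corresponds to Excel's DISTR.CHICUAD.CD):

# P(chi-square with 2 degrees of freedom >= 6): upper-tail probability.
from scipy.stats import chi2

print(chi2.sf(6, df=2))   # 0.049787..., i.e. exp(-3) for 2 degrees of freedom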
Using the Chi2 distribution in chapter 4
We can use the chi-square distribution to evaluate the value of the coefficient of association χ² (Pearson) that we saw in chapter 4.
The degrees of freedom are (r − 1) × (c − 1),
where r and c indicate the number of rows and the number of columns. (We lose degrees of freedom when calculating the frequencies under the assumption of independence.)
If we have a frequency under the assumption of independence that is < 5, we should regroup to avoid it. If we do not, the results are unreliable. We should also regroup if we find cells with a frequency of 0 in the contingency table.
Example
• We have a contingency table with 4 rows and 3 columns, and the coefficient of association χ² is 13.
• What is the probability of finding a χ² as high as (or higher than) 13 if the variables are independent?
• Degrees of freedom: (r − 1) × (c − 1) = (4 − 1) × (3 − 1) = 6
• Look in the table: the probability is less than 0.05. If you use Excel: 0.043.
• In conclusion: it is unlikely that the variables are independent.
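The corresponding calculation for this example, as a brief sketch under the same scipy assumption:

# Upper-tail probability of a chi-square value of 13 with (4-1)*(3-1) = 6 degrees of freedom.
from scipy.stats import chi2

df = (4 - 1) * (3 - 1)
print(round(chi2.sf(13, df), 3))   # about 0.043, so independence is unlikely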
t Student distribution

t_n = z / √(χ²_n / n)

where z ~ N(0, 1) is a standard normal variable and χ²_n is a chi-squared variable with n degrees of freedom.

[Figure: Student's t density with 10 degrees of freedom]

It is symmetric.
Mean: 0
Variance: n / (n − 2) (for n > 2)
It is flatter (has heavier tails) than the normal distribution.
When it is tabulated:

[Figure: Student's t density with 10 degrees of freedom, with the upper-tail area α shaded to the right of the critical value]

t_{n,α} is the value such that P(t_n > t_{n,α}) = α
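A small sketch relating the tabulated value to these properties (scipy assumed; n = 10 degrees of freedom and α = 0.05 are chosen to match a typical table lookup):

# Critical value t_{n,alpha} with P(t_n > t_{n,alpha}) = alpha, and the variance n / (n - 2).
from scipy.stats import t

n, alpha = 10, 0.05
critical = t.ppf(1 - alpha, df=n)        # upper-tail critical value
print(round(critical, 3))                # about 1.812
print(t.var(df=n), n / (n - 2))          # both 1.25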
F of Fisher-Snedecor

F_{n1,n2} = (χ²_{n1} / n1) / (χ²_{n2} / n2)

where χ²_{n1} is a chi-squared variable with n1 degrees of freedom and χ²_{n2} is an independent chi-squared variable with n2 degrees of freedom.

[Figure: F density with 10 numerator and 10 denominator degrees of freedom]
When it is tabulated:

[Figure: F density with 10 numerator and 10 denominator degrees of freedom, with the upper-tail area p shaded to the right of the critical value]

F_{n1,n2,p} is the value such that P(F_{n1,n2} > F_{n1,n2,p}) = p
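And a matching sketch for the F distribution (scipy assumed; the 10 and 10 degrees of freedom correspond to the figure above):

# Critical value F_{n1,n2,p} with P(F_{n1,n2} > F_{n1,n2,p}) = p.
from scipy.stats import f

n1, n2, p = 10, 10, 0.05
critical = f.ppf(1 - p, dfn=n1, dfd=n2)
print(round(critical, 3))                            # about 2.978
print(round(f.sf(critical, dfn=n1, dfd=n2), 3))      # 0.05, recovering p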
