MH3500 Statistics

Lecture Notes
Chapter 1
Introduction and Revision
Table of Contents
Introduction
1. PDF, CDF, PMF, Mean, Variance, and Moments
2. Common Probability Distributions
3. Upper Percentage Points of Distributions
4. Moment Generating Functions
5. Probability Distributions of Functions of Random Variables
6. Distribution of Maximum and Minimum of Random Variables
7. Statistical Populations and Random Samples
8. Statistics and Sampling Distributions
9. Sample Mean and Sample Variance
10. Chi-Square Distribution
11. Distribution of Standardized Sample Variance in the Normal Case
12. t-Distribution
13. F-Distribution
14. Some Limit Theorems
What is Statistics?
› Statistics is the science of data that involves:
– Collecting
– Classifying
– Summarizing
– Organizing and
– Interpreting
of (usually) numerical information.
› It includes mathematical methods for the collection, analysis, and
presentation of numerical data
› The aim is to make rational decisions under uncertain conditions
and to derive insights from data
› In manufacturing, computer software, pharmaceuticals and other
areas, information is collected and analyzed to improve the
quality of a process or product (Inferential Statistics)
Basic Procedures of Statistics
Study of data in four steps:

1. Data Collection
2. Statistical Modeling: assume an appropriate joint distribution of the data $X_1, \dots, X_n$ with parameters $\theta$
3. Data Analysis: estimate $\theta$ by functions of $X_1, \dots, X_n$ and quantify the uncertainty of these estimates
4. Decision Making
Review of Probability
1) PDF, CDF, PMF, Mean, Variance,
and Moments
Some Standard Notation

Let $X$ be a random variable (r.v.)

Cumulative Distribution Function (CDF): $F(x) = P(X \le x)$

Probability Density Function (PDF, for continuous $X$): $f(x) = F'(x)$

Probability Mass Function (PMF, for discrete $X$): $p(x) = P(X = x)$


Computing Probabilities

Continuous case: $P(a \le X \le b) = \int_a^b f(x)\,dx = F(b) - F(a)$

Discrete case: $P(a \le X \le b) = \sum_{a \le x \le b} p(x)$
Example
Let $X \sim N(0,1)$ (standard normal distribution)
PDF: $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$

CDF: $\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt$

[Figures: PDF and CDF of $N(0,1)$]
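In R, the PDF and CDF of the standard normal distribution are available as dnorm and pnorm; a quick sketch (the particular evaluation points are chosen only for illustration):

dnorm(0)                           # PDF at 0: 1/sqrt(2*pi) = 0.3989...
pnorm(1.96)                        # CDF at 1.96: approximately 0.975
dnorm(1) - exp(-1/2)/sqrt(2*pi)    # 0: matches the explicit PDF formula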
Median and Quartiles of Distributions

Consider a distribution with CDF $F$.

Median: value $m$ with $F(m) = 0.5$

Lower Quartile: value $q_L$ with $F(q_L) = 0.25$

Upper Quartile: value $q_U$ with $F(q_U) = 0.75$

(median and quartiles may not exist or may not be unique)
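For continuous distributions, R's quantile functions (prefix "q") invert the CDF, so the median and quartiles of, say, $N(0,1)$ can be computed directly; a small illustration:

qnorm(0.5)              # median of N(0,1): 0
qnorm(c(0.25, 0.75))    # lower and upper quartile: -0.6745, 0.6745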


Expected Values

Let $X$ be a random variable and $g$ a function

Let $f$ be the PDF of $X$ (if continuous), respectively the PMF of $X$
(if discrete)

$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$ if $X$ is continuous

$E[g(X)] = \sum_x g(x) f(x)$ if $X$ is discrete
Properties of Expected Values

Linearity: $E[a_1 X_1 + \dots + a_n X_n] = a_1 E[X_1] + \dots + a_n E[X_n]$

(this holds even if the $X_i$ are not independent)

Product formula: $E[X_1 X_2 \cdots X_n] = E[X_1]\, E[X_2] \cdots E[X_n]$

(only holds if the $X_i$ are independent)


Mean, Variance, Moments

Let $X$ be a random variable

Mean (expected value): $\mu = E[X]$

Variance: $\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2]$

$k$th moment: $E[X^k]$

$k$th central moment: $E[(X - \mu)^k]$
2) Common Probability Distributions

• Bernoulli • Uniform
• Binomial • Normal
• Geometric • Exponential
• Poisson • Gamma
Bernoulli Distribution
$X \sim \text{Bernoulli}(p)$ means that $X$ only takes values 0
(failure) and 1 (success) such that $P(X = 1) = p$ and $P(X = 0) = 1 - p$

PMF: $p(1) = p$, $p(0) = 1 - p$

Mean: $p$
Variance: $p(1-p)$

[Figure: CDF of Bernoulli(1/3)]

PMF of Bernoulli Distribution

Alternative formula: $p(x) = p^x (1-p)^{1-x}$

for $x \in \{0, 1\}$ and $p(x) = 0$ otherwise

Mean and Variance of Bernoulli Random Variable

$E[X] = 0 \cdot (1 - p) + 1 \cdot p = p$
$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p) = pq$, if we let $q = 1 - p$
Binomial Distribution
$X \sim \text{Bin}(n, p)$ means that $X$ is the sum of $n$
independent Bernoulli($p$) variables

PMF: $p(k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$
Mean: $np$
Variance: $np(1-p)$

Binomial Identity: $\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1$

[Figure: PMF of Binomial(100, 0.5)]
Geometric Distribution

$X \sim \text{Geom}(p)$ means that $X$ counts the number of trials up to
and including the first success in a sequence of independent Bernoulli($p$)
trials
PMF: $p(k) = (1-p)^{k-1} p$, $k = 1, 2, 3, \dots$
Mean: $1/p$
Variance: $(1-p)/p^2$

Geometric Identity: $\sum_{k=1}^{\infty} (1-p)^{k-1} p = 1$

[Figure: PMF of Geom(0.1)]
Poisson Distribution
$X \sim \text{Poisson}(\lambda)$ approximates a $\text{Bin}(n, p)$ variable
with $\lambda = np$ (acceptable for large $n$ and small $p$).
The Poisson distribution is used to model rare events
PMF: $p(k) = e^{-\lambda}\, \frac{\lambda^k}{k!}$, $k = 0, 1, 2, \dots$
Mean: $\lambda$
Variance: $\lambda$

Exponential Identity: $\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}$

[Figure: PMF of Poisson(1)]
Uniform Distribution

$X \sim U(a, b)$ means that the PDF of $X$ is
$f(x) = \frac{1}{b-a}$ for $a \le x \le b$ and $f(x) = 0$ otherwise
Mean: $\frac{a+b}{2}$
Variance: $\frac{(b-a)^2}{12}$

[Figure: PDF of U(-1,1)]
Normal Distribution
$X \sim N(\mu, \sigma^2)$ means that the PDF of $X$ is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$
Mean: $\mu$
Variance: $\sigma^2$

Normal Identity: $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1$

[Figure: PDF of N(0,1)]
Exponential Distribution
Let $\lambda > 0$. We write $X \sim \text{Exp}(\lambda)$ if the PDF of $X$ is

$f(x) = \frac{1}{\lambda}\, e^{-x/\lambda}$ for $x \ge 0$ and $f(x) = 0$ for $x < 0$

Mean: $\lambda$
Variance: $\lambda^2$

Application: time until
occurrence of a rare event, e.g.,
radioactive decay, earthquake,
tsunami

[Figure: PDF of Exp(10)]
Gamma Distribution
$X \sim \text{Gamma}(\alpha, \lambda)$ means that the PDF of $X$ is

$f(x) = \frac{1}{\Gamma(\alpha)\,\lambda^{\alpha}}\, x^{\alpha-1} e^{-x/\lambda}$ for $x > 0$ and $f(x) = 0$ for $x \le 0$

Mean: $\alpha\lambda$
Variance: $\alpha\lambda^2$

Note: $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt$

[Figure: PDF of Gamma(5,1)]

• $\text{Gamma}(1, \lambda)$ is an $\text{Exp}(\lambda)$ distribution

• $\text{Gamma}(n/2, 2)$ is a $\chi_n^2$ (chi-square) distribution
Remarks on Gamma Distribution
• There are two common ways to specify a Gamma
distribution (see Wikipedia); one uses the shape and
scale and the other uses shape and rate
• The version we use in this course is shape and scale. In
our notation, the shape parameter is $\alpha$ and the scale
parameter is $\lambda$ (in Wikipedia, however, the shape
parameter is $k$ and the scale parameter is $\theta$)
• To switch from one version of the parameters to the other,
just use the fact that rate = 1/scale
3) Upper Percentage Points of Distributions

Upper Percentage Points
Let $X$ be a random variable with distribution $D$

• $x_\alpha$ is the point such that the probability for $X$ being larger
than this point is $\alpha$: $P(X > x_\alpha) = \alpha$

• $x_\alpha$ is called the upper $\alpha$ point of $D$

• Percentage points are frequently used for confidence
intervals and hypothesis testing. They measure how
significant an observation for $X$ is.
Connection Between Percentage Points and CDF

Let $X$ be a random variable with distribution $D$ and CDF $F$

• $x_\alpha$ is the point where $F(x_\alpha) = 1 - \alpha$
Upper Percentage Points of $N(0,1)$
Example
Consider a situation where the value of a random variable
is viewed as "significant" if the probability to observe this
value or something more extreme is 2.5%.

• Suppose $X$ is $N(0,1)$ distributed; which positive values of $X$
are significant?

• Need to know the point $z_{0.025}$ with $P(X > z_{0.025}) = 0.025$. This is
exactly the upper 2.5% point of $N(0,1)$

• From the last slide, $\Phi(z_{0.025}) = 1 - 0.025 = 0.975$, which gives
$z_{0.025} \approx 1.96$. So $X \ge 1.96$ is viewed as
significant
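In R, this upper percentage point can be obtained from the quantile function instead of a table; a quick check of the value used above:

qnorm(0.975)                       # 1.959964, since Phi(z) = 1 - 0.025
qnorm(0.025, lower.tail = FALSE)   # the same upper 2.5% point, directly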
Remark on Upper Percentage Points

• If percentage points are needed in the exam or midterm,
then tables will be given which contain the necessary
information
• Similar tables are available in most textbooks (chi-square
upper percentage points, t-distribution percentage points,
and standard normal CDF)
• The percentage points can be obtained either from a table
of percentage points or from a table of the CDF of the
distribution
4) Moment Generating Function (MGF)
Let $X$ be a random variable

Moment Generating Function: $M(t) = E[e^{tX}]$

Continuous case: $M(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$
($f$ is the PDF of $X$)

Discrete case: $M(t) = \sum_x e^{tx} p(x)$
($p$ is the PMF of $X$)
Example (Discrete r.v.)
Let $X \sim \text{Bernoulli}(p)$.

That is, $P(X = 1) = p$, $P(X = 0) = 1 - p$

The MGF of $X$ is:

$M(t) = e^{t \cdot 0}(1 - p) + e^{t \cdot 1} p = 1 - p + p e^t$
Example (Continuous r.v.)
Let $\lambda > 0$, and $X \sim \text{Exp}(\lambda)$.

Then the PDF of $X$ is $f(x) = \frac{1}{\lambda}\, e^{-x/\lambda}$ for $x \ge 0$ and $f(x) = 0$ otherwise.

The MGF of $X$ is:

$M(t) = \int_0^{\infty} e^{tx}\, \frac{1}{\lambda}\, e^{-x/\lambda}\,dx = \frac{1}{\lambda} \int_0^{\infty} e^{-(1/\lambda - t)x}\,dx = \frac{1}{1 - \lambda t}$ for $t < 1/\lambda$
Example (Normal r.v.)
Let $X \sim N(0, 1)$.
Then the PDF of $X$ is $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ for $-\infty < x < \infty$.
The MGF of $X$ is:
$M(t) = \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(x-t)^2/2}\,dx = e^{t^2/2}$
Some Properties of the MGF
Property A (Inversion Theorem)
If $M_X(t) = M_Y(t)$ for all $t$ in an open interval containing zero, then
$X$ and $Y$ have the same distribution. That is,
they have the same CDF.
We will not prove this result here, because it involves
higher-level mathematics, such as the Laplace
transform.
Property B
If the MGF $M(t)$ of $X$ exists in an open interval
containing zero, then
$M^{(r)}(0) = E[X^r]$ for all positive integers $r$.
We know $e^{tX} = \sum_{r=0}^{\infty} \frac{(tX)^r}{r!}$, hence,
$M(t) = E[e^{tX}] = \sum_{r=0}^{\infty} \frac{E[X^r]}{r!}\, t^r$
Therefore, $M^{(r)}(0) = E[X^r]$
Property C
If $Y = a + bX$ then $M_Y(t) = e^{at} M_X(bt)$
Example (Normal r.v.)
Let $X \sim N(0, 1)$ and $Y = \mu + \sigma X$.

$M_Y(t) = e^{\mu t} M_X(\sigma t) = e^{\mu t + \sigma^2 t^2/2}$

We see that the distribution of $Y$ is the same as $N(\mu, \sigma^2)$

Note that $e^{\mu t + \sigma^2 t^2/2}$ is the MGF of an $N(\mu, \sigma^2)$ random variable
Property D
If $X_1, \dots, X_n$ are independent, then
$M_{X_1 + \dots + X_n}(t) = M_{X_1}(t) \cdots M_{X_n}(t)$

• Note: Properties (A) & (D) often can be used to determine
the distribution of $X_1 + \dots + X_n$ if the MGFs of $X_1, \dots, X_n$ are known
Example
Suppose $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent. What is the
distribution of $X_1 + X_2$?

• Property (D): $M_{X_1 + X_2}(t) = e^{\mu_1 t + \sigma_1^2 t^2/2}\, e^{\mu_2 t + \sigma_2^2 t^2/2} = e^{(\mu_1 + \mu_2)t + (\sigma_1^2 + \sigma_2^2)t^2/2}$

• Let $Y \sim N(\mu_1 + \mu_2,\, \sigma_1^2 + \sigma_2^2)$. Then $M_Y(t) = M_{X_1 + X_2}(t)$.

• Property (A): $X_1 + X_2 \sim N(\mu_1 + \mu_2,\, \sigma_1^2 + \sigma_2^2)$
MGFs of Common Distributions
Distribution: MGF
Bernoulli($p$): $1 - p + p e^t$
Geom($p$): $\frac{p e^t}{1 - (1-p)e^t}$ for $t < -\ln(1-p)$
Binomial($n, p$): $(1 - p + p e^t)^n$
Poisson($\lambda$): $e^{\lambda(e^t - 1)}$
$U(a, b)$: $\frac{e^{bt} - e^{at}}{(b-a)t}$ for $t \ne 0$, $1$ for $t = 0$
Gamma($\alpha, \lambda$): $(1 - \lambda t)^{-\alpha}$ for $t < 1/\lambda$
Exp($\lambda$): $(1 - \lambda t)^{-1}$ for $t < 1/\lambda$
5) Probability Distributions of Functions of Random Variables

• Let $X$ be a random variable. Let $Y = g(X)$,
where $g$ is a function whose domain contains the range of $X$

• Goal: Find the distribution of $Y$ given the distribution of $X$

• We discuss the CDF, Jacobian, and MGF methods
CDF Method

• Given $Y = g(X)$
• Find the CDF of $Y$ from the CDF of $X$ and the
definition of $g$: $F_Y(y) = P(g(X) \le y)$

• If $Y$ is continuous, then a PDF for $Y$ is given by $f_Y(y) = F_Y'(y)$
Example for CDF Method
$X \sim U(-1, 1)$ (uniform distribution on $[-1, 1]$)

• Find the CDF and PDF of $Y = X^2$

PDF of $X$:

$f_X(x) = \frac{1}{2}$ for $-1 \le x \le 1$ and $f_X(x) = 0$ otherwise

CDF of $X$:

$F_X(x) = \frac{x + 1}{2}$ for $-1 \le x \le 1$
Example of CDF Method (graphical)

[Figure: CDF and PDF of $X \sim U(-1, 1)$]
Example for CDF Method
So, we know the CDF of $X$: $F_X(x) = \frac{x+1}{2}$ for $-1 \le x \le 1$

To find the CDF of $Y = X^2$:

Let $0 \le y \le 1$. Then
$F_Y(y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) = \sqrt{y}$
Example of CDF Method (graphical)

[Figure: CDF of $Y = X^2$ constructed from the CDF of $X$]
Example for CDF Method
$F_Y(y) = 0$ for $y < 0$
$F_Y(y) = \sqrt{y}$ for $0 \le y \le 1$
$F_Y(y) = 1$ for $y > 1$

$f_Y(y) = F_Y'(y) = \frac{1}{2\sqrt{y}}$ for $0 < y < 1$
$f_Y(y) = 0$ for $y < 0$ or $y > 1$
Result: $f_Y(y) = \frac{1}{2\sqrt{y}}$ for $0 < y < 1$ and $f_Y(y) = 0$ otherwise

• Note that $f_Y(y) = F_Y'(y)$ only for $y \ne 0, 1$, since $F_Y$ is not
differentiable at 0 and 1
Example of CDF Method (graphical)

[Figure: CDF and PDF of $Y = X^2$]
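A quick Monte Carlo check of this example in R: simulate $X \sim U(-1,1)$, square it, and compare the empirical CDF of $Y = X^2$ with the derived $F_Y(y) = \sqrt{y}$ (sample size and evaluation points are arbitrary choices):

set.seed(1)
x <- runif(100000, -1, 1)   # observations from U(-1, 1)
y <- x^2                    # observations of Y = X^2
mean(y <= 0.25)             # empirical F_Y(0.25): close to sqrt(0.25) = 0.5
mean(y <= 0.81)             # empirical F_Y(0.81): close to sqrt(0.81) = 0.9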
Jacobian Method
Let $X$ be a continuous random variable with CDF $F_X$ and
PDF $f_X$. Set $I = \{x : f_X(x) > 0\}$

• Consider $Y = g(X)$ where $g$ is strictly increasing
or strictly decreasing on $I$

• In this case, $X$ can be expressed explicitly by $Y$: $X = g^{-1}(Y)$

• Hence we get explicit formulas for $F_Y$ and $f_Y$

• This way to find the PDF of $Y$ is called the Jacobian Method


Reason Why Jacobian Method Works

• If $g$ is strictly increasing, then
$g(x_1) < g(x_2) \iff x_1 < x_2$ for all $x_1, x_2$
Moreover, $g^{-1}$ exists and is also strictly increasing

• If $g$ is strictly decreasing, then
$g(x_1) < g(x_2) \iff x_1 > x_2$ for all $x_1, x_2$
Moreover, $g^{-1}$ exists and is also strictly decreasing

• Recall: The inverse function of $g$ is the function $g^{-1}$ with
$g^{-1}(g(x)) = x$ for all $x$
Strictly Increasing Functions of Random Variables
Let $X$ be a continuous random variable with CDF $F_X$ and
PDF $f_X$. Set $I = \{x : f_X(x) > 0\}$

• Suppose $Y = g(X)$ where $g$ is strictly increasing on $I$

• The image of $I$ is $g(I) = \{g(x) : x \in I\}$
Let $y$ be an arbitrary element of the image of $I$. Then

$F_Y(y) = P(Y \le y) = P(g(X) \le y)$, i.e.,

$F_Y(y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y))$
Strictly Increasing Functions of Random Variables
We have shown:
$F_Y(y) = F_X(g^{-1}(y))$ ($y$ in the image of $I$)

Hence, by the chain rule,
$f_Y(y) = F_Y'(y) = f_X(g^{-1}(y)) \cdot \frac{d}{dy}\, g^{-1}(y)$

• If $y$ is not in the image of $I$, then $Y$ cannot attain the value $y$,
thus we set $f_Y(y) = 0$ in this case
Summary: PDF of Strictly Increasing Function of Random Variable
$X$ random variable with PDF $f_X$
Set $I = \{x : f_X(x) > 0\}$

• Suppose $Y = g(X)$ where $g$ is strictly increasing on $I$

• Then $f_Y(y) = f_X(g^{-1}(y)) \cdot \frac{d}{dy}\, g^{-1}(y)$ if $y$ is in the image of $I$, and $f_Y(y) = 0$
otherwise
Strictly Decreasing Case

• Let $Y = g(X)$ where $g$ is strictly decreasing on $I$. Then

$F_Y(y) = P(g(X) \le y) = P(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y))$

• Hence $f_Y(y) = -f_X(g^{-1}(y)) \cdot \frac{d}{dy}\, g^{-1}(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}\, g^{-1}(y)\right|$ if $y$ is in the image of $I$,
and $f_Y(y) = 0$ otherwise
Procedure for Application of Jacobian Method (Summary)
Goal: Find the PDF of $Y = g(X)$

• Write down the PDF $f_X$

• Determine $I = \{x : f_X(x) > 0\}$
• Verify that $g$ is strictly monotone on $I$ and find $g^{-1}$
• Determine the image of $I$, i.e., $g(I)$
• Apply the formula $f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}\, g^{-1}(y)\right|$ for $y \in g(I)$, and $f_Y(y) = 0$ otherwise
Example for Jacobian Method
$X \sim \text{Exp}(\lambda)$. Find the PDF of $Y = \sqrt{X}$
• The PDF of $X$ is $f_X(x) = \frac{1}{\lambda}\, e^{-x/\lambda}$, $x > 0$, and $f_X(x) = 0$ otherwise

• $I = (0, \infty)$ (set of positive real numbers)

• $g(x) = \sqrt{x}$ is strictly increasing on $I$ with $g^{-1}(y) = y^2$

• The image of $I$ is $(0, \infty)$
Hence, for $y > 0$, we have
$f_Y(y) = f_X(y^2) \cdot 2y = \frac{2y}{\lambda}\, e^{-y^2/\lambda}$.
Moreover, $f_Y(y) = 0$ for $y \le 0$.
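A simulation check of the worked example above (a sketch, assuming the Exp/square-root example as stated; note that R's rexp is parameterized by rate = 1/scale):

set.seed(1)
lambda <- 2
y <- sqrt(rexp(100000, rate = 1/lambda))   # observations of Y = sqrt(X)
hist(y, breaks = 50, freq = FALSE)         # empirical density of Y
curve(2*x/lambda * exp(-x^2/lambda), add = TRUE, col = "red")  # derived f_Y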
MGF Method
Goal: Find the distribution of $Y = X_1 + \dots + X_n$

Based on:
Inversion Theorem: If $M_X(t) = M_Y(t)$ for all $t$ in an open interval
containing zero, then $X$ and $Y$ have the same distribution

Procedure:

1) Find the MGF of $Y$

2) Find a random variable $Z$ which has the same MGF
3) Inversion Theorem: $Y$ has the same distribution as $Z$
Example for MGF Method
Let $X_1, X_2$ be independent random variables with
$X_1 \sim \text{Gamma}(\alpha_1, \lambda)$ and $X_2 \sim \text{Gamma}(\alpha_2, \lambda)$.
Find the distribution of $Y = X_1 + X_2$

$M_Y(t) = M_{X_1}(t)\, M_{X_2}(t) = (1 - \lambda t)^{-\alpha_1} (1 - \lambda t)^{-\alpha_2} = (1 - \lambda t)^{-(\alpha_1 + \alpha_2)}$ for $t < 1/\lambda$,

where $(1 - \lambda t)^{-(\alpha_1 + \alpha_2)}$ is the MGF of $\text{Gamma}(\alpha_1 + \alpha_2, \lambda)$

Hence $Y \sim \text{Gamma}(\alpha_1 + \alpha_2, \lambda)$
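A simulation check of this example in R (a sketch with arbitrary illustrative parameters; rgamma accepts shape and, as a named argument, scale):

set.seed(1)
x1 <- rgamma(100000, shape = 2, scale = 3)
x2 <- rgamma(100000, shape = 5, scale = 3)
y <- x1 + x2              # should behave like Gamma(7, 3)
c(mean(y), 7 * 3)         # both approximately 21 (mean = alpha * lambda)
c(var(y), 7 * 3^2)        # both approximately 63 (variance = alpha * lambda^2)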
6) Distribution of Maximum and Minimum of Random Variables
Let $X_1, \dots, X_n$ be continuous i.i.d. random variables with
CDF $F$ and PDF $f$

(recall: "i.i.d." = independent, identically distributed)

• We want to find the distributions of $\max(X_1, \dots, X_n)$ and $\min(X_1, \dots, X_n)$

• We just need to apply the CDF method

Distribution of the Maximum
Let $X_{\max} = \max(X_1, \dots, X_n)$

$F_{\max}(x) = P(X_{\max} \le x) = P(X_1 \le x, \dots, X_n \le x) = F(x)^n$

Result: $F_{\max}(x) = F(x)^n$ and $f_{\max}(x) = n\, F(x)^{n-1} f(x)$
Example of $X_{\max}$ when $X_i \sim \text{Exp}(\lambda)$

We know that $F(x) = 1 - e^{-x/\lambda}$
and $f(x) = \frac{1}{\lambda}\, e^{-x/\lambda}$, for $x \ge 0$

Let $X_{\max} = \max(X_1, \dots, X_n)$, then
$F_{\max}(x) = \left(1 - e^{-x/\lambda}\right)^n$

Hence, $f_{\max}(x) = n \left(1 - e^{-x/\lambda}\right)^{n-1} \frac{1}{\lambda}\, e^{-x/\lambda}$
Distribution of the Minimum
Let $X_{\min} = \min(X_1, \dots, X_n)$

$1 - F_{\min}(x) = P(X_{\min} > x) = P(X_1 > x, \dots, X_n > x) = (1 - F(x))^n$

Result: $F_{\min}(x) = 1 - (1 - F(x))^n$ and $f_{\min}(x) = n\,(1 - F(x))^{n-1} f(x)$
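For example, for i.i.d. $\text{Exp}(\lambda)$ variables, $F_{\min}(x) = 1 - e^{-nx/\lambda}$, so the minimum is again exponential with scale $\lambda/n$. A short R check (the parameters are arbitrary; rexp uses rate = 1/scale):

set.seed(1)
n <- 5; lambda <- 10
m <- replicate(100000, min(rexp(n, rate = 1/lambda)))  # minima of n draws
c(mean(m), lambda/n)    # both approximately 2, as for Exp(lambda/n)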
7) Statistical Populations and Random
Samples
A population is a set of objects of a certain kind.
A specific property of these objects is analyzed statistically.
Examples:

Objects (Population): Property
Stars in the universe: Luminosity
Undergraduate students in NTU: CGPA
Chess players in Russia: Elo rating
Electrons emitted by photoelectric effect: Kinetic energy
Caucasians: IQ
Population Distribution
Measurements of the property in a population to be
investigated are modeled by random variables

• Usually, these are i.i.d. random variables $X_1, X_2, \dots$ such that
$X_i$ measures the property of object $i$

• Example: $X_i$ is the CGPA of student $i$ of NTU, $i = 1, \dots, n$

• The distribution of the random variables $X_i$ is called the
population distribution
Random Samples
A random sample is a randomly chosen subset of the
objects in a population

• Instead of the whole population, often only a random
sample is investigated (easier)

• The population distribution usually is assumed to have a
certain form (say, $N(\mu, \sigma^2)$ or $\text{Poisson}(\lambda)$), but the parameters
(here $\mu, \sigma^2$ or $\lambda$) are unknown

• Parameters are estimated from a random sample
Statistical Inference
Statistical inference is the process by which we acquire
information and draw conclusions about populations from samples.

[Diagram: a sample is selected from the population (parameter); findings
(statistics) from the sample are generalised to the population]
Statistical Model for Random Samples

• Suppose we collect $n$ independent measurements (data) $x_1, \dots, x_n$
of a property on a random sample
• $x_1, \dots, x_n$ can be viewed as realizations of i.i.d. random
variables $X_1, \dots, X_n$
• Let $D$ be the distribution of the underlying population
• $X_1, \dots, X_n$ is called an i.i.d. sample drawn from $D$
• We also say that $X_1, \dots, X_n$ is drawn from a population
with population distribution $D$
Example
Population: Undergraduate students of NTU
Property to be studied: CGPA
Population distribution: $N(\mu, \sigma^2)$
Random sample: 100 randomly chosen NTU students
Data: $x_1, \dots, x_{100}$
Statistical model: $X_1, \dots, X_{100}$ i.i.d. $N(\mu, \sigma^2)$

• Note: "Random sample" has two meanings:

– Randomly chosen subset of the population
– Set of random variables modeling the data
Population Mean and Variance

• Let $X_1, \dots, X_n$ be a random sample drawn from a
population with population distribution $D$
• $\mu = E[X_i]$ is called the population mean
• $\sigma^2 = \mathrm{Var}(X_i)$ is called the population variance

Example: $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Population mean: $\mu$
Population variance: $\sigma^2$
Observations

• The measurements $x_1, \dots, x_n$ are also called
observations for $X_1, \dots, X_n$

• Observations can be repeated: There can be further
observations $x_1', \dots, x_n'$, then $x_1'', \dots, x_n''$, etc., all for the
same random sample

• Observations are $n$-tuples of real numbers (not random
variables) and are denoted by lower-case letters. Random
variables are denoted by upper-case letters.
Example
A fair dice is rolled three times
outcome of -th roll,
is an i.i.d. random sample from the discrete
distribution with , for
Possible (repeated) observations:

1 2 6
6 6 3
4 3 5
5 3 2
Generating Observations for Random Samples in R
In R, we use the prefix "r" to generate observations for a
random sample drawn from a distribution. Examples:

S <- rnorm(100, 10, 3)    # 100 draws with mean 10 and standard deviation 3
After this, S will contain 100 observations drawn from
N(10,9)

S <- rgamma(1000, 10, 1/10)    # shape 10, rate 1/10 (R's rgamma uses rate = 1/scale)
Creates 1000 observations drawn from Gamma(10,10)
8) Statistics and Sampling Distributions
Let $X_1, \dots, X_n$ be a random sample

• A real-valued function $T = T(X_1, \dots, X_n)$ is called a statistic

• Note that a statistic is a random variable which only
depends on $X_1, \dots, X_n$ and not on any parameters of
the distribution of the $X_i$'s
• The distribution of a statistic is called a sampling
distribution
Examples
Let $X_1, \dots, X_n$ be a random sample

• $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, $\min(X_1, \dots, X_n)$, $\max(X_1, \dots, X_n)$ are statistics
• For each of these statistics, its distribution is a sampling
distribution
• Suppose $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$.
The function $\bar{X} - \mu$ is not a statistic, as it involves
a parameter
Difference Between Population Distribution and Sampling Distribution

• Let $X_1, \dots, X_n$ be an i.i.d. random sample

• Population distribution: distribution of the $X_i$

• Sampling distribution: distribution of a statistic based on $X_1, \dots, X_n$

• More concretely: Let $T = T(X_1, \dots, X_n)$ be a statistic based on $X_1, \dots, X_n$
(any real-valued function of $X_1, \dots, X_n$, for instance,
$\bar{X}$). Then the distribution of $T$ is a sampling
distribution
9) Sample Mean and Sample Variance
Let $X_1, \dots, X_n$ be a random sample
Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Sample variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

• Note that $\bar{X}$ and $S^2$ are statistics. Their distributions are
examples of sampling distributions
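In R, the sample mean and sample variance are mean() and var(); note that var() already uses the denominator $n - 1$. A small sketch:

set.seed(1)
x <- rnorm(50, mean = 10, sd = 3)         # a sample of size n = 50
mean(x)                                   # sample mean
var(x)                                    # sample variance S^2
sum((x - mean(x))^2) / (length(x) - 1)    # same value, computed by hand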
Distribution of Sample Mean for Normally Distributed Sample
Suppose $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

In Tutorial 1 we show $\bar{X} \sim N(\mu, \sigma^2/n)$
For $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ we have $Z \sim N(0, 1)$ for all $n$

• Note that $\bar{X}$ is a statistic, but $Z$ is not
Features of Population Versus Features of Statistics
Let $X_1, \dots, X_n$ be an i.i.d. random sample drawn from an $N(\mu, \sigma^2)$
population. It is important to distinguish the features of the
population from those of statistics based on $X_1, \dots, X_n$:

Features of population | Features of statistics
Population mean: $\mu$ | Sample mean: $\bar{X}$
Population variance: $\sigma^2$ | Sample variance: $S^2$
Population distribution: $N(\mu, \sigma^2)$ | Sampling distributions: will be discussed later
Independence of Sample Mean and Sample Variance
If $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, then $\bar{X}$ and $S^2$ are
independent random variables (this is only true for normally
distributed samples!)
This is useful to find distributions of estimators involving $\bar{X}$
and $S^2$

• Proof of independence of $\bar{X}$ and $S^2$: Textbook, Section 6.3,
Corollary A
The proof depends on the fact that the vector
$(\bar{X},\, X_1 - \bar{X},\, \dots,\, X_n - \bar{X})$
has a multivariate normal distribution
Alternate Proof
Assume that $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$.

We know $\bar{X} \sim N(\mu, \sigma^2/n)$, and $\mathrm{Var}(\bar{X}) = \sigma^2/n$

Furthermore,
$\mathrm{Cov}(\bar{X},\, X_i - \bar{X}) = \mathrm{Cov}(\bar{X}, X_i) - \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0$
for all $i$
Alternate Proof
It follows from the fact that
if $U$ and $V$ are jointly normally distributed and $\mathrm{Cov}(U, V) = 0$, then
$U$ and $V$ are independent

• This implies that $\bar{X}$ is independent of each of $X_1 - \bar{X}, \dots, X_n - \bar{X}$.

• Hence, $\bar{X}$ is independent of $S^2$, since $S^2$ is a function of $X_1 - \bar{X}, \dots, X_n - \bar{X}$


10) Chi-Square Distribution

• Let $Z_1, \dots, Z_n$ be i.i.d. $N(0, 1)$

• Definition: The distribution of $Z_1^2 + \dots + Z_n^2$ is called a chi-
square distribution with $n$ degrees of freedom

• Notation: $\chi_n^2$; CDF of $\chi_n^2$: $F_{\chi_n^2}$; PDF of $\chi_n^2$: $f_{\chi_n^2}$

PDF of $\chi_1^2$
Let $Y \sim \chi_1^2$, that is, $Y = Z^2$ where $Z \sim N(0, 1)$

Let $y > 0$. Then
$F_Y(y) = P(Z^2 \le y) = P(-\sqrt{y} \le Z \le \sqrt{y}) = 2\Phi(\sqrt{y}) - 1$
and hence
$f_Y(y) = F_Y'(y) = \varphi(\sqrt{y})\, y^{-1/2} = \frac{1}{\sqrt{2\pi}}\, y^{-1/2} e^{-y/2}$
Gamma Distribution
Recall that for $\alpha, \lambda > 0$, $X \sim \text{Gamma}(\alpha, \lambda)$ means that the
PDF of $X$ is

$f(x) = \frac{1}{\Gamma(\alpha)\,\lambda^{\alpha}}\, x^{\alpha-1} e^{-x/\lambda}$

for $x > 0$ and $f(x) = 0$ for $x \le 0$.

Hence, we see that by letting $\alpha = 1/2$ and $\lambda = 2$:

$f(y) = \frac{1}{\Gamma(1/2)\,\sqrt{2}}\, y^{-1/2} e^{-y/2} = \frac{1}{\sqrt{2\pi}}\, y^{-1/2} e^{-y/2}$ (using $\Gamma(1/2) = \sqrt{\pi}$),

which is the same as the PDF of $\chi_1^2$.
$\chi_n^2$ and Gamma
We have shown that $\chi_n^2$ is the distribution of $Z_1^2 + \dots + Z_n^2$ with $Z_1, \dots, Z_n$ i.i.d. $N(0,1)$,
and $Z_i^2 \sim \text{Gamma}(1/2, 2)$ for each $i$

From Tutorial 1, Problem 4: a sum of independent Gamma variables with a
common scale parameter is again Gamma, with the shape parameters added

Conclusion: $\chi_n^2 = \text{Gamma}(n/2, 2)$ with PDF

$f(y) = \frac{1}{\Gamma(n/2)\, 2^{n/2}}\, y^{n/2 - 1} e^{-y/2}$ for $y > 0$

PDF of $\chi_n^2$

[Figure: PDF of $\chi_n^2$]

The upper percentage point $\chi_{n,\alpha}^2$ is defined by
$P(Y > \chi_{n,\alpha}^2) = \alpha$, where $Y \sim \chi_n^2$
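In R, chi-square percentage points come from qchisq rather than a table; for instance (the 5% level and 10 degrees of freedom are arbitrary choices):

qchisq(0.05, df = 10, lower.tail = FALSE)   # upper 5% point of chi-square(10)
qchisq(0.95, df = 10)                       # same point via F(x) = 1 - alpha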


Summary of Chi-Square Distribution

• $\chi_n^2$ is the distribution of the sum of squares of $n$
independent standard normal variables

• As a consequence of this, if $X$ and $Y$ are independent
and $X \sim \chi_n^2$ and $Y \sim \chi_m^2$, then $X + Y \sim \chi_{n+m}^2$
• The MGF of a $\chi_n^2$ distributed random variable is
$M(t) = (1 - 2t)^{-n/2}$ for $t < 1/2$
• Percentage points of $\chi_n^2$ can be used to test the validity of
statistical models involving normal distributions
11) Distribution of Standardized Sample Variance in the Normal Case
Recall that $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

Have shown: $\bar{X} \sim N(\mu, \sigma^2/n)$ if $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$

• Our goal now is to find a similar result for the distribution of
$S^2$. This has great practical value.
Distribution of Standardized Sample Variance in the Normal Case
Suppose $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Sample variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

Standardized sample variance: $\frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$

• Proof: See textbook Section 6.3, Theorem B. Relies on the
independence of $\bar{X}$ and $S^2$ and the MGF of $\chi_n^2$
Proof
We know that
$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi_n^2$

Further,
$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$
where the two terms on the right are independent (since $S^2$ and $\bar{X}$ are)
Proof
$\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi_1^2$, independent of $\frac{(n-1)S^2}{\sigma^2}$

Therefore, taking MGFs of both sides:
$(1 - 2t)^{-n/2} = M_{(n-1)S^2/\sigma^2}(t) \cdot (1 - 2t)^{-1/2}$,
so $M_{(n-1)S^2/\sigma^2}(t) = (1 - 2t)^{-(n-1)/2}$, the MGF of $\chi_{n-1}^2$
Significance of Standardized Sample Variance
$\frac{(n-1)S^2}{\sigma^2}$ involves the (usually unknown) parameter $\sigma^2$,
but its distribution $\chi_{n-1}^2$ does not depend on $\sigma^2$

• For all $a < b$, we can find
$P\left(a \le \frac{(n-1)S^2}{\sigma^2} \le b\right)$
without knowing $\sigma^2$
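A quick simulation of this result in R (a sketch with arbitrary parameters): the simulated values of $(n-1)S^2/\sigma^2$ should show the mean $n-1$ and variance $2(n-1)$ of a $\chi_{n-1}^2$ distribution.

set.seed(1)
n <- 8; mu <- 5; sigma <- 2
w <- replicate(100000, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)
c(mean(w), n - 1)        # both approximately 7
c(var(w), 2 * (n - 1))   # both approximately 14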
12) t-Distribution
Let $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Recall that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Suppose both $\mu$ and $\sigma^2$ are unknown and our task is to
estimate $\mu$ from $X_1, \dots, X_n$

• If the sample size $n$ is large, then $\sigma^2$ can be
approximated by the sample variance $S^2$, that is, $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is
approximately $N(0, 1)$ distributed

• For large samples, this helps to approximate probabilities
involving $\bar{X}$ and construct confidence intervals for $\mu$
t-Distribution Motivation

• However, if the sample is small, then $S^2$ might be far from $\sigma^2$,
and it makes no sense to approximate the distribution of
$\frac{\bar{X} - \mu}{S/\sqrt{n}}$ by $N(0, 1)$

• In this case, we should use the exact distribution of
$\frac{\bar{X} - \mu}{S/\sqrt{n}}$, which turns out to be a t-distribution
Definition of t-Distribution

• Let $Z \sim N(0, 1)$ and $W \sim \chi_n^2$ be independent

• The distribution of
$T = \frac{Z}{\sqrt{W/n}}$
is called a t-distribution (or Student's t-distribution) with $n$
degrees of freedom

• Notation: $t_n$
PDF of t-Distribution
If a random variable $T$ has a t-distribution (or Student's t-
distribution) with $n$ degrees of freedom, denoted by $T \sim t_n$,
then the PDF of $T$ is

$f(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}$ for $-\infty < t < \infty$

$\frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-Distribution

$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{Z}{\sqrt{W/(n-1)}} \sim t_{n-1}$

with $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ and $W = \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$
Since $\bar{X}$ and $S^2$ are independent, $Z$ and $W$ are independent
Importance of t-Distribution
We have shown:

$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ if $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$

• $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ can be analyzed without knowing $\sigma^2$. This is useful for
small samples

• Compare to $\frac{\bar{X} - \mu}{S/\sqrt{n}} \approx N(0, 1)$: This can only be used for
large samples, when $\sigma^2$ can be approximated by $S^2$
Properties of the t-Distribution

• The PDF is similar to that of
$N(0, 1)$, but the curve is a bit
lower and wider
• $t_n$ tends to $N(0, 1)$ for $n \to \infty$

[Figure: PDFs of $t_n$ for several df (degrees of freedom) and of $N(0,1)$]
Historical Notes

• The probability distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ was first published in
1908 in a paper written by W. S. Gosset, but under the
name of "Student".

• At that time, Gosset was employed by an Irish brewery
that prohibited publication of research by its staff
members.

• Consequently, the distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is called the Student
t-distribution or simply t-distribution.
Summary of t-Distribution

• $t_n$ is the distribution of $\frac{Z}{\sqrt{W/n}}$ where $Z \sim N(0,1)$ and $W \sim \chi_n^2$
are independent
• The t-distribution is similar to $N(0,1)$, but more useful than
$N(0,1)$ for small samples
• We have $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ for normally distributed samples
• The t-distribution can be used for hypothesis tests concerning
the population mean of normally distributed samples
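In R, t percentage points come from qt; comparing them with qnorm illustrates how $t_n$ approaches $N(0,1)$ as $n$ grows (the degrees of freedom are chosen arbitrarily):

qt(0.975, df = 5)    # 2.5706
qt(0.975, df = 30)   # 2.0423
qnorm(0.975)         # 1.9600, the N(0,1) limit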
13) F-Distribution
Let $U \sim \chi_{d_1}^2$ and $V \sim \chi_{d_2}^2$ be independent. The distribution of

$F = \frac{U/d_1}{V/d_2}$

is called an F-distribution, denoted by $F(d_1, d_2)$

PDF: $f(x) = \frac{\Gamma\left(\frac{d_1 + d_2}{2}\right)}{\Gamma\left(\frac{d_1}{2}\right)\Gamma\left(\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1} \left(1 + \frac{d_1}{d_2}\, x\right)^{-(d_1 + d_2)/2}$ for $x > 0$

Mean: $\frac{d_2}{d_2 - 2}$ for $d_2 > 2$
Variance: $\frac{2\, d_2^2\, (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$ for $d_2 > 4$
[Figure: PDF of F(d1, d2) Distribution]
Remarks on the F-Distribution
• The F-distribution is important for statistical tests involving
variances
• A major application of the F-distribution is Analysis of
Variance (ANOVA)
• If the random variable $T$ follows a $t_n$-distribution, then
$T^2$ follows an $F(1, n)$ distribution
Informally, $t_n^2 = F(1, n)$
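This relationship can be checked numerically in R via the quantile functions (level and degrees of freedom are arbitrary): since $P(T^2 > c) = P(|T| > \sqrt{c})$, the upper 5% point of $F(1, n)$ is the square of the upper 2.5% point of $t_n$.

qt(0.975, df = 12)^2          # approximately 4.747
qf(0.95, df1 = 1, df2 = 12)   # the same value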
Relationship between Distributions
[Chart: relationships between common univariate distributions]
http://www.math.wm.edu/~leemis/chart/UDR/UDR.html
14) Some Limit Theorems
Chebyshev's Inequality:
Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$.
Then for any $t > 0$,
$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}$

Proof: for a continuous PDF $f$, we have
$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \ge \int_{|x - \mu| \ge t} (x - \mu)^2 f(x)\,dx \ge t^2\, P(|X - \mu| \ge t)$

• This means if $\sigma^2$ is small, there is a high probability that $X$
will not deviate much from $\mu$.
(weak) Law of Large Numbers (wLLN)
Let $X_1, X_2, \dots$ be i.i.d. random variables with mean $\mu$ and
variance $\sigma^2$.
Then for any $\varepsilon$ with $\varepsilon > 0$,
$P(|\bar{X}_n - \mu| > \varepsilon) \to 0$ as $n \to \infty$

Proof: we already know $E[\bar{X}_n] = \mu$ and $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$, so by
Chebyshev's inequality
$P(|\bar{X}_n - \mu| > \varepsilon) \le \frac{\sigma^2}{n \varepsilon^2} \to 0$

• We say that $\bar{X}_n$ converges to $\mu$ in probability.
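The wLLN can be visualized in R by plotting the running sample mean of simulated data (a sketch; the distribution and parameters are arbitrary):

set.seed(1)
x <- rnorm(10000, mean = 3, sd = 2)
running_mean <- cumsum(x) / seq_along(x)   # sample mean after 1, 2, ... draws
plot(running_mean, type = "l")
abline(h = 3, col = "red")                 # the running mean settles near mu = 3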


Convergence in Distribution
Definition: Let $X_1, X_2, \dots$ be a sequence of random
variables with CDFs $F_1, F_2, \dots$ and let $X$ be a random
variable with CDF $F$.
We say that $X_n$ converges in distribution to $X$ if
$F_n(x) \to F(x)$ at every point $x$ at which $F$ is continuous.
Note: MGFs are often used in establishing convergence of
CDFs. So, we need the following.

Continuity Theorem: Let $F_n$ be a sequence of CDFs with the
corresponding MGFs $M_n$. Let $F$ be a CDF with MGF $M$.
If $M_n(t) \to M(t)$ for all $t$ in an open interval containing zero,
then $F_n(x) \to F(x)$ at all continuity points of $F$.
Convergence of Poisson Distribution
Let $(\lambda_n)$ be an increasing sequence with $\lambda_n \to \infty$, and
let $(X_n)$ be a sequence of Poisson random variables
with the corresponding parameters.

Since $E[X_n] = \lambda_n$, $\mathrm{Var}(X_n) = \lambda_n$, and MGF $M_{X_n}(t) = e^{\lambda_n(e^t - 1)}$,

we let $Z_n = \frac{X_n - \lambda_n}{\sqrt{\lambda_n}}$, so that by Property C,

$M_{Z_n}(t) = e^{-t\sqrt{\lambda_n}}\, e^{\lambda_n\left(e^{t/\sqrt{\lambda_n}} - 1\right)}$
Convergence of Poisson Distribution
Using $e^{t/\sqrt{\lambda_n}} - 1 = \frac{t}{\sqrt{\lambda_n}} + \frac{t^2}{2\lambda_n} + O\!\left(\lambda_n^{-3/2}\right)$, we get

$M_{Z_n}(t) = e^{t^2/2 + O(\lambda_n^{-1/2})} \to e^{t^2/2}$ as $n \to \infty$

Note that $e^{t^2/2}$ is the MGF of $N(0, 1)$, so $Z_n \to N(0, 1)$ in distribution
Central Limit Theorem (CLT)
Let $X_1, X_2, \dots$ be i.i.d. with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$.
Then

$P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le x\right) \to \Phi(x)$ for $n \to \infty$,

for all $x$, where $\Phi$ is the CDF of $N(0, 1)$

• This means for large $n$, the standardized sample mean
has an approximately standard normal distribution.
Proof of CLT
Let $M$ be the common MGF of the standardized variables $Z_i = \frac{X_i - \mu}{\sigma}$.
We define $S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$.
Then,

$M_{S_n}(t) = \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n$

Using a Taylor series expansion about zero:

$M(s) = 1 + s\, E[Z_i] + \frac{s^2}{2}\, E[Z_i^2] + s^2\, \varepsilon(s)$,

where $\varepsilon(s) \to 0$ as $s \to 0$.
Proof of CLT
Since we know $E[Z_i] = 0$ and
$E[Z_i^2] = 1$, we have

$M\!\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{t^2}{2n} + \frac{t^2}{n}\, \varepsilon\!\left(\frac{t}{\sqrt{n}}\right)$,

where $\varepsilon(t/\sqrt{n}) \to 0$ as $n \to \infty$.

Therefore,

$M_{S_n}(t) = \left(1 + \frac{t^2/2 + t^2 \varepsilon(t/\sqrt{n})}{n}\right)^n \to e^{t^2/2}$,

which is the MGF of the $N(0, 1)$ distribution

Proof of CLT
So, we have
$S_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$ in distribution

Finally, note that $\left(1 + \frac{a}{n}\right)^n \to e^a$,

and in general, $\left(1 + \frac{a_n}{n}\right)^n \to e^a$ if $a_n \to a$, which is the limit used above
Non-Normal Samples
The CLT is important because in practice, we often encounter
i.i.d. samples $X_1, \dots, X_n$ which may not be normally
distributed.

• In such situations, the exact distributions of $\bar{X}$ and $S^2$ cannot
be determined

• For large samples, however, the Central Limit Theorem
and its generalizations (asymptotic normality of maximum
likelihood estimators) provide approximations to the
distributions of $\bar{X}$ and $S^2$ (and other statistics)
Consequences of CLT for $\bar{X}$
The CLT may be used to approximate probabilities such as

$P(a \le \bar{X} \le b)$, where $a$, $b$ are given constants:

$P(a \le \bar{X} \le b) \approx \Phi\!\left(\frac{b - \mu}{\sigma/\sqrt{n}}\right) - \Phi\!\left(\frac{a - \mu}{\sigma/\sqrt{n}}\right)$ (by the CLT, if $n$ is large)
Consequences of CLT for $\sum X_i$, Summary
$X_1, \dots, X_n$ i.i.d. with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$. If $n$ is
large, then $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ and $\sum_{i=1}^{n} X_i$ is approximately $N(n\mu, n\sigma^2)$
Example
Let $X_1, \dots, X_{100}$ i.i.d. Bernoulli(0.8)
Then, $\sum_{i=1}^{100} X_i$ is approximately $N(80, 16)$ and
$P\left(70 \le \sum_{i=1}^{100} X_i \le 90\right) \approx \Phi(2.5) - \Phi(-2.5) \approx 0.9876$
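The normal approximation above can be evaluated in R with pnorm (this reproduces the value that the simulation on the next slide estimates):

pnorm((90 - 80)/4) - pnorm((70 - 80)/4)   # about 0.9876, so roughly 990 of 1000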
Example, R Simulation
Create a vector of 100 Bernoulli(0.8) observations:
test <- rbinom(100,1,0.8)    # 100 draws of Binomial(1, 0.8) = Bernoulli(0.8)
Compute the sum of the observations:
sum_x_i <- sum(test)
Replicate this experiment 1000 times:
sums_x_i <- replicate(1000, sum(rbinom(100,1,0.8)))
Extract those results with a sum between 70 and 90:
count <- sums_x_i[sums_x_i>=70 & sums_x_i<=90]
Determine in how many cases $70 \le \sum x_i \le 90$ holds:
length(count)
Result should be around 990
Significance of CLT for Statistics
• For large samples, the CLT provides an approximation
of the distribution of the standardized sample mean
• Similarly, it can be used to approximate the distribution
of other statistics (e.g. parameter estimators) for large
samples (to be discussed later)
• For example, we will discuss the asymptotic normality of
maximum likelihood estimators, which is a generalization
of the CLT
What the CLT does NOT say

"For large $n$, the sample mean $\bar{X}_n$ has approximately a standard normal distribution."

Wrong!

• Correct: The distribution of the standardized sample
mean approaches a standard normal distribution for $n \to \infty$
What the CLT does NOT say

"By the CLT, $\bar{X}_n \to \mu$ for $n \to \infty$."

Wrong!

• This actually is an imprecise formulation of a different
theorem.

Which one?
What the CLT does NOT say

"$X_1 + \dots + X_n \to N(n\mu, n\sigma^2)$ for $n \to \infty$."

Absolutely wrong!!

• This mixes up random variables and distributions.

• Even worse, if the $X_i$'s are independent random variables,
then $X_1 + \dots + X_n$ usually diverges almost surely.
What the CLT does NOT say
“By the CLT, the sum of independent trials tends to a
constant.”

(interview presentation, January 2019)

Wrong!

The mean of i.i.d. random variables converges to a


constant under the conditions of the Law of Large
Numbers, but the sum of independent random variables
usually diverges
Correct Version
The CLT verbally (without formulas) can be stated as
follows.

1) For i.i.d. samples with finite population mean and
variance, the standardized sample mean asymptotically
has a standard normal distribution ("asymptotically" means
for $n \to \infty$)

2) For i.i.d. samples with finite population mean and
variance, the standardized sample mean tends in distribution
to a standard normal distribution as the sample
size increases
