MH3500 Statistics

Lecture Notes
Chapter 1
Introduction and Revision
Table of Contents
Introduction
1. PDF, CDF, PMF, Mean, Variance, and Moments
2. Common Probability Distributions
3. Upper Percentage Points of Distributions
4. Moment Generating Functions
5. Probability Distributions of Functions of Random Variables
6. Distribution of Maximum and Minimum of Random Variables
7. Statistical Populations and Random Samples
8. Statistics and Sampling Distributions
9. Sample Mean and Sample Variance
10. Chi-Square Distribution
11. Distribution of Standardized Sample Variance in the Normal Case
12. t-Distribution
13. F-Distribution
14. Some Limit Theorems
What is Statistics?
› Statistics is the science of data that involves:
– Collecting
– Classifying
– Summarizing
– Organizing and
– Interpreting
of (usually) numerical information.
› It includes mathematical methods for the collection, analysis, and
presentation of numerical data
› The aim is to make rational decisions under uncertain conditions
and to derive insights from data
› In manufacturing, computer software, pharmaceuticals and other
areas, information is collected and analyzed to improve the
quality of a process or product (Inferential Statistics)
Basic Procedures of Statistics
Study of data in four steps:

1. Data Collection
2. Statistical Modeling: assume an appropriate joint distribution of the data $X_1, \dots, X_n$ with parameters $\theta$
3. Data Analysis: estimate $\theta$ by functions of $X_1, \dots, X_n$ and quantify the uncertainty of these estimates
4. Decision Making
Review of Probability
1) PDF, CDF, PMF, Mean, Variance,
and Moments
Some Standard Notation

Let $X$ be a random variable (r.v.)

Cumulative Distribution Function (CDF): $F(x) = P(X \le x)$

Probability Density Function (PDF, for continuous $X$): $f(x) = F'(x)$

Probability Mass Function (PMF, for discrete $X$): $p(x) = P(X = x)$


Computing Probabilities

Continuous case: $P(a \le X \le b) = \int_a^b f(x)\,dx = F(b) - F(a)$

Discrete case: $P(a \le X \le b) = \sum_{a \le x \le b} p(x)$
Example
Let $X \sim N(0,1)$ (standard normal distribution)
PDF: $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$

CDF: $\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt$

[Figures: PDF and CDF of $N(0,1)$]
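In R, the PDF and CDF of the standard normal distribution are available as dnorm and pnorm; a quick sketch (the particular evaluation points are chosen only for illustration):

dnorm(0)                           # PDF at 0: 1/sqrt(2*pi) = 0.3989...
pnorm(1.96)                        # CDF at 1.96: approximately 0.975
dnorm(1) - exp(-1/2)/sqrt(2*pi)    # 0: matches the explicit PDF formula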
Median and Quartiles of Distributions

Consider a distribution with CDF $F$.

Median: value $m$ with $F(m) = 0.5$

Lower Quartile: value $q_L$ with $F(q_L) = 0.25$

Upper Quartile: value $q_U$ with $F(q_U) = 0.75$

(median and quartiles may not exist or may not be unique)
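For continuous distributions, R's quantile functions (prefix "q") invert the CDF, so the median and quartiles of, say, $N(0,1)$ can be computed directly; a small illustration:

qnorm(0.5)              # median of N(0,1): 0
qnorm(c(0.25, 0.75))    # lower and upper quartile: -0.6745, 0.6745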


Expected Values

Let $X$ be a random variable and $g$ a function

Let $f$ be the PDF of $X$ (if continuous), respectively the PMF of $X$
(if discrete)

$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$ if $X$ is continuous

$E[g(X)] = \sum_x g(x) f(x)$ if $X$ is discrete
Properties of Expected Values

Linearity: $E[a_1 X_1 + \dots + a_n X_n] = a_1 E[X_1] + \dots + a_n E[X_n]$

(this holds even if the $X_i$ are not independent)

Product formula: $E[X_1 X_2 \cdots X_n] = E[X_1]\, E[X_2] \cdots E[X_n]$

(only holds if the $X_i$ are independent)


Mean, Variance, Moments

Let $X$ be a random variable

Mean (expected value): $\mu = E[X]$

Variance: $\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2]$

$k$th moment: $E[X^k]$

$k$th central moment: $E[(X - \mu)^k]$
2) Common Probability Distributions

• Bernoulli • Uniform
• Binomial • Normal
• Geometric • Exponential
• Poisson • Gamma
Bernoulli Distribution
$X \sim \text{Bernoulli}(p)$ means that $X$ only takes values 0
(failure) and 1 (success) such that $P(X = 1) = p$ and $P(X = 0) = 1 - p$

PMF: $p(1) = p$, $p(0) = 1 - p$

Mean: $p$
Variance: $p(1-p)$

[Figure: CDF of Bernoulli(1/3)]

PMF of Bernoulli Distribution

Alternative formula: $p(x) = p^x (1-p)^{1-x}$

for $x \in \{0, 1\}$ and $p(x) = 0$ otherwise

Mean and Variance of Bernoulli Random Variable

$E[X] = 0 \cdot (1 - p) + 1 \cdot p = p$
$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p) = pq$, if we let $q = 1 - p$
Binomial Distribution
$X \sim \text{Bin}(n, p)$ means that $X$ is the sum of $n$
independent Bernoulli($p$) variables

PMF: $p(k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$
Mean: $np$
Variance: $np(1-p)$

Binomial Identity: $\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1$

[Figure: PMF of Binomial(100, 0.5)]
Geometric Distribution

$X \sim \text{Geom}(p)$ means that $X$ counts the number of trials up to
and including the first success in a sequence of independent Bernoulli($p$)
trials
PMF: $p(k) = (1-p)^{k-1} p$, $k = 1, 2, 3, \dots$
Mean: $1/p$
Variance: $(1-p)/p^2$

Geometric Identity: $\sum_{k=1}^{\infty} (1-p)^{k-1} p = 1$

[Figure: PMF of Geom(0.1)]
Poisson Distribution
$X \sim \text{Poisson}(\lambda)$ approximates a $\text{Bin}(n, p)$ variable
with $\lambda = np$ (acceptable for large $n$ and small $p$).
The Poisson distribution is used to model rare events
PMF: $p(k) = e^{-\lambda}\, \frac{\lambda^k}{k!}$, $k = 0, 1, 2, \dots$
Mean: $\lambda$
Variance: $\lambda$

Exponential Identity: $\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}$

[Figure: PMF of Poisson(1)]
Uniform Distribution

$X \sim U(a, b)$ means that the PDF of $X$ is
$f(x) = \frac{1}{b-a}$ for $a \le x \le b$ and $f(x) = 0$ otherwise
Mean: $\frac{a+b}{2}$
Variance: $\frac{(b-a)^2}{12}$

[Figure: PDF of U(-1,1)]
Normal Distribution
$X \sim N(\mu, \sigma^2)$ means that the PDF of $X$ is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$
Mean: $\mu$
Variance: $\sigma^2$

Normal Identity: $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1$

[Figure: PDF of N(0,1)]
Exponential Distribution
Let $\lambda > 0$. We write $X \sim \text{Exp}(\lambda)$ if the PDF of $X$ is

$f(x) = \frac{1}{\lambda}\, e^{-x/\lambda}$ for $x \ge 0$ and $f(x) = 0$ for $x < 0$

Mean: $\lambda$
Variance: $\lambda^2$

Application: time until
occurrence of a rare event, e.g.,
radioactive decay, earthquake,
tsunami

[Figure: PDF of Exp(10)]
Gamma Distribution
$X \sim \text{Gamma}(\alpha, \lambda)$ means that the PDF of $X$ is

$f(x) = \frac{1}{\Gamma(\alpha)\,\lambda^{\alpha}}\, x^{\alpha-1} e^{-x/\lambda}$ for $x > 0$ and $f(x) = 0$ for $x \le 0$

Mean: $\alpha\lambda$
Variance: $\alpha\lambda^2$

Note: $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt$

[Figure: PDF of Gamma(5,1)]

• $\text{Gamma}(1, \lambda)$ is an $\text{Exp}(\lambda)$ distribution

• $\text{Gamma}(n/2, 2)$ is a $\chi_n^2$ (chi-square) distribution
Remarks on Gamma Distribution
• There are two common ways to specify a Gamma
distribution (see Wikipedia); one uses the shape and
scale and the other uses shape and rate
• The version we use in this course is shape and scale. In
our notation, the shape parameter is $\alpha$ and the scale
parameter is $\lambda$ (in Wikipedia, however, the shape
parameter is $k$ and the scale parameter is $\theta$)
• To switch from one version of the parameters to the other,
just use the fact that rate = 1/scale
3) Upper Percentage Points of Distributions

Upper Percentage Points
Let $X$ be a random variable with distribution $D$

• $x_\alpha$ is the point such that the probability for $X$ being larger
than this point is $\alpha$: $P(X > x_\alpha) = \alpha$

• $x_\alpha$ is called the upper $\alpha$ point of $D$

• Percentage points are frequently used for confidence
intervals and hypothesis testing. They measure how
significant an observation for $X$ is.
Connection Between Percentage Points and CDF

Let $X$ be a random variable with distribution $D$ and CDF $F$

• $x_\alpha$ is the point where $F(x_\alpha) = 1 - \alpha$
Upper Percentage Points of $N(0,1)$
Example
Consider a situation where the value of a random variable
is viewed as "significant" if the probability to observe this
value or something more extreme is 2.5%.

• Suppose $X$ is $N(0,1)$ distributed; which positive values of $X$
are significant?

• Need to know the point $z_{0.025}$ with $P(X > z_{0.025}) = 0.025$. This is
exactly the upper 2.5% point of $N(0,1)$

• From the last slide, $\Phi(z_{0.025}) = 1 - 0.025 = 0.975$, which gives
$z_{0.025} \approx 1.96$. So $X \ge 1.96$ is viewed as
significant
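In R, this upper percentage point can be obtained from the quantile function instead of a table; a quick check of the value used above:

qnorm(0.975)                       # 1.959964, since Phi(z) = 1 - 0.025
qnorm(0.025, lower.tail = FALSE)   # the same upper 2.5% point, directly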
Remark on Upper Percentage Points

• If percentage points are needed in the exam or midterm,
then tables will be given which contain the necessary
information
• Similar tables are available in most textbooks (chi-square
upper percentage points, t-distribution percentage points,
and standard normal CDF)
• The percentage points can be obtained either from a table
of percentage points or from a table of the CDF of the
distribution
4) Moment Generating Function (MGF)
Let $X$ be a random variable

Moment Generating Function: $M(t) = E[e^{tX}]$

Continuous case: $M(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$
($f$ is the PDF of $X$)

Discrete case: $M(t) = \sum_x e^{tx} p(x)$
($p$ is the PMF of $X$)
Example (Discrete r.v.)
Let $X \sim \text{Bernoulli}(p)$.

That is, $P(X = 1) = p$, $P(X = 0) = 1 - p$

The MGF of $X$ is:

$M(t) = e^{t \cdot 0}(1 - p) + e^{t \cdot 1} p = 1 - p + p e^t$
Example (Continuous r.v.)
Let $\lambda > 0$, and $X \sim \text{Exp}(\lambda)$.

Then the PDF of $X$ is $f(x) = \frac{1}{\lambda}\, e^{-x/\lambda}$ for $x \ge 0$ and $f(x) = 0$ otherwise.

The MGF of $X$ is:

$M(t) = \int_0^{\infty} e^{tx}\, \frac{1}{\lambda}\, e^{-x/\lambda}\,dx = \frac{1}{\lambda} \int_0^{\infty} e^{-(1/\lambda - t)x}\,dx = \frac{1}{1 - \lambda t}$ for $t < 1/\lambda$
Example (Normal r.v.)
Let $X \sim N(0, 1)$.
Then the PDF of $X$ is $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ for $-\infty < x < \infty$.
The MGF of $X$ is:
$M(t) = \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(x-t)^2/2}\,dx = e^{t^2/2}$
Some Properties of the MGF
Property A (Inversion Theorem)
If $M_X(t) = M_Y(t)$ for all $t$ in an open interval containing zero, then
$X$ and $Y$ have the same distribution. That is,
they have the same CDF.
We will not prove this result here, because it involves
higher-level mathematics, such as the Laplace
transform.
Property B
If the MGF $M(t)$ of $X$ exists in an open interval
containing zero, then
$M^{(r)}(0) = E[X^r]$ for all positive integers $r$.
We know $e^{tX} = \sum_{r=0}^{\infty} \frac{(tX)^r}{r!}$, hence,
$M(t) = E[e^{tX}] = \sum_{r=0}^{\infty} \frac{E[X^r]}{r!}\, t^r$
Therefore, $M^{(r)}(0) = E[X^r]$
Property C
If $Y = a + bX$ then $M_Y(t) = e^{at} M_X(bt)$
Example (Normal r.v.)
Let $X \sim N(0, 1)$ and $Y = \mu + \sigma X$.

$M_Y(t) = e^{\mu t} M_X(\sigma t) = e^{\mu t + \sigma^2 t^2/2}$

We see that the distribution of $Y$ is the same as $N(\mu, \sigma^2)$

Note that $e^{\mu t + \sigma^2 t^2/2}$ is the MGF of an $N(\mu, \sigma^2)$ random variable
Property D
If $X_1, \dots, X_n$ are independent, then
$M_{X_1 + \dots + X_n}(t) = M_{X_1}(t) \cdots M_{X_n}(t)$

• Note: Properties (A) & (D) often can be used to determine
the distribution of $X_1 + \dots + X_n$ if the MGFs of $X_1, \dots, X_n$ are known
Example
Suppose $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent. What is the
distribution of $X_1 + X_2$?

• Property (D): $M_{X_1 + X_2}(t) = e^{\mu_1 t + \sigma_1^2 t^2/2}\, e^{\mu_2 t + \sigma_2^2 t^2/2} = e^{(\mu_1 + \mu_2)t + (\sigma_1^2 + \sigma_2^2)t^2/2}$

• Let $Y \sim N(\mu_1 + \mu_2,\, \sigma_1^2 + \sigma_2^2)$. Then $M_Y(t) = M_{X_1 + X_2}(t)$.

• Property (A): $X_1 + X_2 \sim N(\mu_1 + \mu_2,\, \sigma_1^2 + \sigma_2^2)$
MGFs of Common Distributions
Distribution: MGF
Bernoulli($p$): $1 - p + p e^t$
Geom($p$): $\frac{p e^t}{1 - (1-p)e^t}$ for $t < -\ln(1-p)$
Binomial($n, p$): $(1 - p + p e^t)^n$
Poisson($\lambda$): $e^{\lambda(e^t - 1)}$
$U(a, b)$: $\frac{e^{bt} - e^{at}}{(b-a)t}$ for $t \ne 0$, $1$ for $t = 0$
Gamma($\alpha, \lambda$): $(1 - \lambda t)^{-\alpha}$ for $t < 1/\lambda$
Exp($\lambda$): $(1 - \lambda t)^{-1}$ for $t < 1/\lambda$
5) Probability Distributions of Functions of Random Variables

• Let $X$ be a random variable. Let $Y = g(X)$,
where $g$ is a function whose domain contains the range of $X$

• Goal: Find the distribution of $Y$ given the distribution of $X$

• We discuss the CDF, Jacobian, and MGF methods
CDF Method

• Given $Y = g(X)$
• Find the CDF of $Y$ from the CDF of $X$ and the
definition of $g$: $F_Y(y) = P(g(X) \le y)$

• If $Y$ is continuous, then a PDF for $Y$ is given by $f_Y(y) = F_Y'(y)$
Example for CDF Method
$X \sim U(-1, 1)$ (uniform distribution on $[-1, 1]$)

• Find the CDF and PDF of $Y = X^2$

PDF of $X$:

$f_X(x) = \frac{1}{2}$ for $-1 \le x \le 1$ and $f_X(x) = 0$ otherwise

CDF of $X$:

$F_X(x) = \frac{x + 1}{2}$ for $-1 \le x \le 1$
Example of CDF Method (graphical)

[Figure: CDF and PDF of $X \sim U(-1, 1)$]
Example for CDF Method
So, we know the CDF of $X$: $F_X(x) = \frac{x+1}{2}$ for $-1 \le x \le 1$

To find the CDF of $Y = X^2$:

Let $0 \le y \le 1$. Then
$F_Y(y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) = \sqrt{y}$
Example of CDF Method (graphical)

[Figure: CDF of $Y = X^2$ constructed from the CDF of $X$]
Example for CDF Method
$F_Y(y) = 0$ for $y < 0$
$F_Y(y) = \sqrt{y}$ for $0 \le y \le 1$
$F_Y(y) = 1$ for $y > 1$

$f_Y(y) = F_Y'(y) = \frac{1}{2\sqrt{y}}$ for $0 < y < 1$
$f_Y(y) = 0$ for $y < 0$ or $y > 1$
Result: $f_Y(y) = \frac{1}{2\sqrt{y}}$ for $0 < y < 1$ and $f_Y(y) = 0$ otherwise

• Note that $f_Y(y) = F_Y'(y)$ only for $y \ne 0, 1$, since $F_Y$ is not
differentiable at 0 and 1
Example of CDF Method (graphical)

[Figure: CDF and PDF of $Y = X^2$]
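A quick Monte Carlo check of this example in R: simulate $X \sim U(-1,1)$, square it, and compare the empirical CDF of $Y = X^2$ with the derived $F_Y(y) = \sqrt{y}$ (sample size and evaluation points are arbitrary choices):

set.seed(1)
x <- runif(100000, -1, 1)   # observations from U(-1, 1)
y <- x^2                    # observations of Y = X^2
mean(y <= 0.25)             # empirical F_Y(0.25): close to sqrt(0.25) = 0.5
mean(y <= 0.81)             # empirical F_Y(0.81): close to sqrt(0.81) = 0.9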
Jacobian Method
Let $X$ be a continuous random variable with CDF $F_X$ and
PDF $f_X$. Set $I = \{x : f_X(x) > 0\}$

• Consider $Y = g(X)$ where $g$ is strictly increasing
or strictly decreasing on $I$

• In this case, $X$ can be expressed explicitly by $Y$: $X = g^{-1}(Y)$

• Hence we get explicit formulas for $F_Y$ and $f_Y$

• This way to find the PDF of $Y$ is called the Jacobian Method


Reason Why Jacobian Method Works

• If $g$ is strictly increasing, then
$g(x_1) < g(x_2) \iff x_1 < x_2$ for all $x_1, x_2$
Moreover, $g^{-1}$ exists and is also strictly increasing

• If $g$ is strictly decreasing, then
$g(x_1) < g(x_2) \iff x_1 > x_2$ for all $x_1, x_2$
Moreover, $g^{-1}$ exists and is also strictly decreasing

• Recall: The inverse function of $g$ is the function $g^{-1}$ with
$g^{-1}(g(x)) = x$ for all $x$
Strictly Increasing Functions of Random Variables
Let $X$ be a continuous random variable with CDF $F_X$ and
PDF $f_X$. Set $I = \{x : f_X(x) > 0\}$

• Suppose $Y = g(X)$ where $g$ is strictly increasing on $I$

• The image of $I$ is $g(I) = \{g(x) : x \in I\}$
Let $y$ be an arbitrary element of the image of $I$. Then

$F_Y(y) = P(Y \le y) = P(g(X) \le y)$, i.e.,

$F_Y(y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y))$
Strictly Increasing Functions of Random Variables
We have shown:
$F_Y(y) = F_X(g^{-1}(y))$ ($y$ in the image of $I$)

Hence, by the chain rule,
$f_Y(y) = F_Y'(y) = f_X(g^{-1}(y)) \cdot \frac{d}{dy}\, g^{-1}(y)$

• If $y$ is not in the image of $I$, then $Y$ cannot attain the value $y$,
thus we set $f_Y(y) = 0$ in this case
Summary: PDF of Strictly Increasing Function of Random Variable
$X$ random variable with PDF $f_X$
Set $I = \{x : f_X(x) > 0\}$

• Suppose $Y = g(X)$ where $g$ is strictly increasing on $I$

• Then $f_Y(y) = f_X(g^{-1}(y)) \cdot \frac{d}{dy}\, g^{-1}(y)$ if $y$ is in the image of $I$, and $f_Y(y) = 0$
otherwise
Strictly Decreasing Case

• Let $Y = g(X)$ where $g$ is strictly decreasing on $I$. Then

$F_Y(y) = P(g(X) \le y) = P(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y))$

• Hence $f_Y(y) = -f_X(g^{-1}(y)) \cdot \frac{d}{dy}\, g^{-1}(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}\, g^{-1}(y)\right|$ if $y$ is in the image of $I$,
and $f_Y(y) = 0$ otherwise
Procedure for Application of Jacobian Method (Summary)
Goal: Find the PDF of $Y = g(X)$

• Write down the PDF $f_X$

• Determine $I = \{x : f_X(x) > 0\}$
• Verify that $g$ is strictly monotone on $I$ and find $g^{-1}$
• Determine the image of $I$, i.e., $g(I)$
• Apply the formula $f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}\, g^{-1}(y)\right|$ for $y \in g(I)$, and $f_Y(y) = 0$ otherwise
Example for Jacobian Method
$X \sim \text{Exp}(\lambda)$. Find the PDF of $Y = \sqrt{X}$
• The PDF of $X$ is $f_X(x) = \frac{1}{\lambda}\, e^{-x/\lambda}$, $x > 0$, and $f_X(x) = 0$ otherwise

• $I = (0, \infty)$ (set of positive real numbers)

• $g(x) = \sqrt{x}$ is strictly increasing on $I$ with $g^{-1}(y) = y^2$

• The image of $I$ is $(0, \infty)$
Hence, for $y > 0$, we have
$f_Y(y) = f_X(y^2) \cdot 2y = \frac{2y}{\lambda}\, e^{-y^2/\lambda}$.
Moreover, $f_Y(y) = 0$ for $y \le 0$.
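A simulation check of the worked example above (a sketch, assuming the Exp/square-root example as stated; note that R's rexp is parameterized by rate = 1/scale):

set.seed(1)
lambda <- 2
y <- sqrt(rexp(100000, rate = 1/lambda))   # observations of Y = sqrt(X)
hist(y, breaks = 50, freq = FALSE)         # empirical density of Y
curve(2*x/lambda * exp(-x^2/lambda), add = TRUE, col = "red")  # derived f_Y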
MGF Method
Goal: Find the distribution of $Y = X_1 + \dots + X_n$

Based on:
Inversion Theorem: If $M_X(t) = M_Y(t)$ for all $t$ in an open interval
containing zero, then $X$ and $Y$ have the same distribution

Procedure:

1) Find the MGF of $Y$

2) Find a random variable $Z$ which has the same MGF
3) Inversion Theorem: $Y$ has the same distribution as $Z$
Example for MGF Method
Let $X_1, X_2$ be independent random variables with
$X_1 \sim \text{Gamma}(\alpha_1, \lambda)$ and $X_2 \sim \text{Gamma}(\alpha_2, \lambda)$.
Find the distribution of $Y = X_1 + X_2$

$M_Y(t) = M_{X_1}(t)\, M_{X_2}(t) = (1 - \lambda t)^{-\alpha_1} (1 - \lambda t)^{-\alpha_2} = (1 - \lambda t)^{-(\alpha_1 + \alpha_2)}$ for $t < 1/\lambda$,

where $(1 - \lambda t)^{-(\alpha_1 + \alpha_2)}$ is the MGF of $\text{Gamma}(\alpha_1 + \alpha_2, \lambda)$

Hence $Y \sim \text{Gamma}(\alpha_1 + \alpha_2, \lambda)$
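A simulation check of this example in R (a sketch with arbitrary illustrative parameters; rgamma accepts shape and, as a named argument, scale):

set.seed(1)
x1 <- rgamma(100000, shape = 2, scale = 3)
x2 <- rgamma(100000, shape = 5, scale = 3)
y <- x1 + x2              # should behave like Gamma(7, 3)
c(mean(y), 7 * 3)         # both approximately 21 (mean = alpha * lambda)
c(var(y), 7 * 3^2)        # both approximately 63 (variance = alpha * lambda^2)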
6) Distribution of Maximum and Minimum of Random Variables
Let $X_1, \dots, X_n$ be continuous i.i.d. random variables with
CDF $F$ and PDF $f$

(recall: "i.i.d." = independent, identically distributed)

• We want to find the distributions of $\max(X_1, \dots, X_n)$ and $\min(X_1, \dots, X_n)$

• We just need to apply the CDF method

Distribution of the Maximum
Let $X_{\max} = \max(X_1, \dots, X_n)$

$F_{\max}(x) = P(X_{\max} \le x) = P(X_1 \le x, \dots, X_n \le x) = F(x)^n$

Result: $F_{\max}(x) = F(x)^n$ and $f_{\max}(x) = n\, F(x)^{n-1} f(x)$
Example of $X_{\max}$ when $X_i \sim \text{Exp}(\lambda)$

We know that $F(x) = 1 - e^{-x/\lambda}$
and $f(x) = \frac{1}{\lambda}\, e^{-x/\lambda}$, for $x \ge 0$

Let $X_{\max} = \max(X_1, \dots, X_n)$, then
$F_{\max}(x) = \left(1 - e^{-x/\lambda}\right)^n$

Hence, $f_{\max}(x) = n \left(1 - e^{-x/\lambda}\right)^{n-1} \frac{1}{\lambda}\, e^{-x/\lambda}$
Distribution of the Minimum
Let $X_{\min} = \min(X_1, \dots, X_n)$

$1 - F_{\min}(x) = P(X_{\min} > x) = P(X_1 > x, \dots, X_n > x) = (1 - F(x))^n$

Result: $F_{\min}(x) = 1 - (1 - F(x))^n$ and $f_{\min}(x) = n\,(1 - F(x))^{n-1} f(x)$
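For example, for i.i.d. $\text{Exp}(\lambda)$ variables, $F_{\min}(x) = 1 - e^{-nx/\lambda}$, so the minimum is again exponential with scale $\lambda/n$. A short R check (the parameters are arbitrary; rexp uses rate = 1/scale):

set.seed(1)
n <- 5; lambda <- 10
m <- replicate(100000, min(rexp(n, rate = 1/lambda)))  # minima of n draws
c(mean(m), lambda/n)    # both approximately 2, as for Exp(lambda/n)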
7) Statistical Populations and Random
Samples
A population is a set of objects of a certain kind.
A specific property of these objects is analyzed statistically.
Examples:

Objects (Population): Property
Stars in the universe: Luminosity
Undergraduate students in NTU: CGPA
Chess players in Russia: Elo rating
Electrons emitted by photoelectric effect: Kinetic energy
Caucasians: IQ
Population Distribution
Measurements of the property in a population to be
investigated are modeled by random variables

• Usually, these are i.i.d. random variables $X_1, X_2, \dots$ such that
$X_i$ measures the property of object $i$

• Example: $X_i$ is the CGPA of student $i$ of NTU, $i = 1, \dots, n$

• The distribution of the random variables $X_i$ is called the
population distribution
Random Samples
A random sample is a randomly chosen subset of the
objects in a population

• Instead of the whole population, often only a random
sample is investigated (easier)

• The population distribution usually is assumed to have a
certain form (say, $N(\mu, \sigma^2)$ or $\text{Poisson}(\lambda)$), but the parameters
(here $\mu, \sigma^2$ or $\lambda$) are unknown

• Parameters are estimated from a random sample
Statistical Inference
Statistical inference is the process by which we acquire
information and draw conclusions about populations from samples.

[Diagram: a sample is selected from the population (parameter); findings
(statistics) from the sample are generalised to the population]
Statistical Model for Random Samples

• Suppose we collect $n$ independent measurements (data) $x_1, \dots, x_n$
of a property on a random sample
• $x_1, \dots, x_n$ can be viewed as realizations of i.i.d. random
variables $X_1, \dots, X_n$
• Let $D$ be the distribution of the underlying population
• $X_1, \dots, X_n$ is called an i.i.d. sample drawn from $D$
• We also say that $X_1, \dots, X_n$ is drawn from a population
with population distribution $D$
Example
Population: Undergraduate students of NTU
Property to be studied: CGPA
Population distribution: $N(\mu, \sigma^2)$
Random sample: 100 randomly chosen NTU students
Data: $x_1, \dots, x_{100}$
Statistical model: $X_1, \dots, X_{100}$ i.i.d. $N(\mu, \sigma^2)$

• Note: "Random sample" has two meanings:

– Randomly chosen subset of the population
– Set of random variables modeling the data
Population Mean and Variance

• Let $X_1, \dots, X_n$ be a random sample drawn from a
population with population distribution $D$
• $\mu = E[X_i]$ is called the population mean
• $\sigma^2 = \mathrm{Var}(X_i)$ is called the population variance

Example: $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Population mean: $\mu$
Population variance: $\sigma^2$
Observations

• The measurements $x_1, \dots, x_n$ are also called
observations for $X_1, \dots, X_n$

• Observations can be repeated: There can be further
observations $x_1', \dots, x_n'$, then $x_1'', \dots, x_n''$, etc., all for the
same random sample

• Observations are $n$-tuples of real numbers (not random
variables) and are denoted by lower-case letters. Random
variables are denoted by upper-case letters.
Example
A fair dice is rolled three times
outcome of -th roll,
is an i.i.d. random sample from the discrete
distribution with , for
Possible (repeated) observations:

1 2 6
6 6 3
4 3 5
5 3 2
Generating Observations for Random Samples in R
In R, we use the prefix "r" to generate observations for a
random sample drawn from a distribution. Examples:

S <- rnorm(100, 10, 3)    # 100 draws with mean 10 and standard deviation 3
After this, S will contain 100 observations drawn from
N(10,9)

S <- rgamma(1000, 10, 1/10)    # shape 10, rate 1/10 (R's rgamma uses rate = 1/scale)
Creates 1000 observations drawn from Gamma(10,10)
8) Statistics and Sampling Distributions
Let $X_1, \dots, X_n$ be a random sample

• A real-valued function $T = T(X_1, \dots, X_n)$ is called a statistic

• Note that a statistic is a random variable which only
depends on $X_1, \dots, X_n$ and not on any parameters of
the distribution of the $X_i$'s
• The distribution of a statistic is called a sampling
distribution
Examples
Let $X_1, \dots, X_n$ be a random sample

• $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, $\min(X_1, \dots, X_n)$, $\max(X_1, \dots, X_n)$ are statistics
• For each of these statistics, its distribution is a sampling
distribution
• Suppose $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$.
The function $\bar{X} - \mu$ is not a statistic, as it involves
a parameter
Difference Between Population Distribution and Sampling Distribution

• Let $X_1, \dots, X_n$ be an i.i.d. random sample

• Population distribution: distribution of the $X_i$

• Sampling distribution: distribution of a statistic based on $X_1, \dots, X_n$

• More concretely: Let $T = T(X_1, \dots, X_n)$ be a statistic based on $X_1, \dots, X_n$
(any real-valued function of $X_1, \dots, X_n$, for instance,
$\bar{X}$). Then the distribution of $T$ is a sampling
distribution
9) Sample Mean and Sample Variance
Let $X_1, \dots, X_n$ be a random sample
Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Sample variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

• Note that $\bar{X}$ and $S^2$ are statistics. Their distributions are
examples of sampling distributions
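In R, the sample mean and sample variance are mean() and var(); note that var() already uses the denominator $n - 1$. A small sketch:

set.seed(1)
x <- rnorm(50, mean = 10, sd = 3)         # a sample of size n = 50
mean(x)                                   # sample mean
var(x)                                    # sample variance S^2
sum((x - mean(x))^2) / (length(x) - 1)    # same value, computed by hand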
Distribution of Sample Mean for Normally Distributed Sample
Suppose $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

In Tutorial 1 we show $\bar{X} \sim N(\mu, \sigma^2/n)$
For $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ we have $Z \sim N(0, 1)$ for all $n$

• Note that $\bar{X}$ is a statistic, but $Z$ is not
Features of Population Versus Features of Statistics
Let $X_1, \dots, X_n$ be an i.i.d. random sample drawn from an $N(\mu, \sigma^2)$
population. It is important to distinguish the features of the
population from those of statistics based on $X_1, \dots, X_n$:

Features of population | Features of statistics
Population mean: $\mu$ | Sample mean: $\bar{X}$
Population variance: $\sigma^2$ | Sample variance: $S^2$
Population distribution: $N(\mu, \sigma^2)$ | Sampling distributions: will be discussed later
Independence of Sample Mean and Sample Variance
If $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, then $\bar{X}$ and $S^2$ are
independent random variables (this is only true for normally
distributed samples!)
This is useful to find distributions of estimators involving $\bar{X}$
and $S^2$

• Proof of independence of $\bar{X}$ and $S^2$: Textbook, Section 6.3,
Corollary A
The proof depends on the fact that the vector
$(\bar{X},\, X_1 - \bar{X},\, \dots,\, X_n - \bar{X})$
has a multivariate normal distribution
Alternate Proof
Assume that $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$.

We know $\bar{X} \sim N(\mu, \sigma^2/n)$, and $\mathrm{Var}(\bar{X}) = \sigma^2/n$

Furthermore,
$\mathrm{Cov}(\bar{X},\, X_i - \bar{X}) = \mathrm{Cov}(\bar{X}, X_i) - \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} - \frac{\sigma^2}{n} = 0$
for all $i$
Alternate Proof
It follows from the fact that
if $U$ and $V$ are jointly normally distributed and $\mathrm{Cov}(U, V) = 0$, then
$U$ and $V$ are independent

• This implies that $\bar{X}$ is independent of each of $X_1 - \bar{X}, \dots, X_n - \bar{X}$.

• Hence, $\bar{X}$ is independent of $S^2$, since $S^2$ is a function of $X_1 - \bar{X}, \dots, X_n - \bar{X}$


10) Chi-Square Distribution

• Let $Z_1, \dots, Z_n$ be i.i.d. $N(0, 1)$

• Definition: The distribution of $Z_1^2 + \dots + Z_n^2$ is called a chi-
square distribution with $n$ degrees of freedom

• Notation: $\chi_n^2$; CDF of $\chi_n^2$: $F_{\chi_n^2}$; PDF of $\chi_n^2$: $f_{\chi_n^2}$

PDF of $\chi_1^2$
Let $Y \sim \chi_1^2$, that is, $Y = Z^2$ where $Z \sim N(0, 1)$

Let $y > 0$. Then
$F_Y(y) = P(Z^2 \le y) = P(-\sqrt{y} \le Z \le \sqrt{y}) = 2\Phi(\sqrt{y}) - 1$
and hence
$f_Y(y) = F_Y'(y) = \varphi(\sqrt{y})\, y^{-1/2} = \frac{1}{\sqrt{2\pi}}\, y^{-1/2} e^{-y/2}$
Gamma Distribution
Recall that for $\alpha, \lambda > 0$, $X \sim \text{Gamma}(\alpha, \lambda)$ means that the
PDF of $X$ is

$f(x) = \frac{1}{\Gamma(\alpha)\,\lambda^{\alpha}}\, x^{\alpha-1} e^{-x/\lambda}$

for $x > 0$ and $f(x) = 0$ for $x \le 0$.

Hence, we see that by letting $\alpha = 1/2$ and $\lambda = 2$:

$f(y) = \frac{1}{\Gamma(1/2)\,\sqrt{2}}\, y^{-1/2} e^{-y/2} = \frac{1}{\sqrt{2\pi}}\, y^{-1/2} e^{-y/2}$ (using $\Gamma(1/2) = \sqrt{\pi}$),

which is the same as the PDF of $\chi_1^2$.
$\chi_n^2$ and Gamma
We have shown that $\chi_n^2$ is the distribution of $Z_1^2 + \dots + Z_n^2$ with $Z_1, \dots, Z_n$ i.i.d. $N(0,1)$,
and $Z_i^2 \sim \text{Gamma}(1/2, 2)$ for each $i$

From Tutorial 1, Problem 4: a sum of independent Gamma variables with a
common scale parameter is again Gamma, with the shape parameters added

Conclusion: $\chi_n^2 = \text{Gamma}(n/2, 2)$ with PDF

$f(y) = \frac{1}{\Gamma(n/2)\, 2^{n/2}}\, y^{n/2 - 1} e^{-y/2}$ for $y > 0$

PDF of $\chi_n^2$

[Figure: PDF of $\chi_n^2$]

The upper percentage point $\chi_{n,\alpha}^2$ is defined by
$P(Y > \chi_{n,\alpha}^2) = \alpha$, where $Y \sim \chi_n^2$
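In R, chi-square percentage points come from qchisq rather than a table; for instance (the 5% level and 10 degrees of freedom are arbitrary choices):

qchisq(0.05, df = 10, lower.tail = FALSE)   # upper 5% point of chi-square(10)
qchisq(0.95, df = 10)                       # same point via F(x) = 1 - alpha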


Summary of Chi-Square Distribution

• $\chi_n^2$ is the distribution of the sum of squares of $n$
independent standard normal variables

• As a consequence of this, if $X$ and $Y$ are independent
and $X \sim \chi_n^2$ and $Y \sim \chi_m^2$, then $X + Y \sim \chi_{n+m}^2$
• The MGF of a $\chi_n^2$ distributed random variable is
$M(t) = (1 - 2t)^{-n/2}$ for $t < 1/2$
• Percentage points of $\chi_n^2$ can be used to test the validity of
statistical models involving normal distributions
11) Distribution of Standardized Sample Variance in the Normal Case
Recall that $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

Have shown: $\bar{X} \sim N(\mu, \sigma^2/n)$ if $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$

• Our goal now is to find a similar result for the distribution of
$S^2$. This has great practical value.
Distribution of Standardized Sample Variance in the Normal Case
Suppose $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Sample variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

Standardized sample variance: $\frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$

• Proof: See textbook Section 6.3, Theorem B. Relies on the
independence of $\bar{X}$ and $S^2$ and the MGF of $\chi_n^2$
Proof
We know that
$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi_n^2$

Further,
$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$
where the two terms on the right are independent (since $S^2$ and $\bar{X}$ are)
Proof
$\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi_1^2$, independent of $\frac{(n-1)S^2}{\sigma^2}$

Therefore, taking MGFs of both sides:
$(1 - 2t)^{-n/2} = M_{(n-1)S^2/\sigma^2}(t) \cdot (1 - 2t)^{-1/2}$,
so $M_{(n-1)S^2/\sigma^2}(t) = (1 - 2t)^{-(n-1)/2}$, the MGF of $\chi_{n-1}^2$
Significance of Standardized Sample Variance
$\frac{(n-1)S^2}{\sigma^2}$ involves the (usually unknown) parameter $\sigma^2$,
but its distribution $\chi_{n-1}^2$ does not depend on $\sigma^2$

• For all $a < b$, we can find
$P\left(a \le \frac{(n-1)S^2}{\sigma^2} \le b\right)$
without knowing $\sigma^2$
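A quick simulation of this result in R (a sketch with arbitrary parameters): the simulated values of $(n-1)S^2/\sigma^2$ should show the mean $n-1$ and variance $2(n-1)$ of a $\chi_{n-1}^2$ distribution.

set.seed(1)
n <- 8; mu <- 5; sigma <- 2
w <- replicate(100000, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)
c(mean(w), n - 1)        # both approximately 7
c(var(w), 2 * (n - 1))   # both approximately 14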
12) t-Distribution
Let $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Recall that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Suppose both $\mu$ and $\sigma^2$ are unknown and our task is to
estimate $\mu$ from $X_1, \dots, X_n$

• If the sample size $n$ is large, then $\sigma^2$ can be
approximated by the sample variance $S^2$, that is, $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is
approximately $N(0, 1)$ distributed

• For large samples, this helps to approximate probabilities
involving $\bar{X}$ and construct confidence intervals for $\mu$
t-Distribution Motivation

• However, if the sample is small, then $S^2$ might be far from $\sigma^2$,
and it makes no sense to approximate the distribution of
$\frac{\bar{X} - \mu}{S/\sqrt{n}}$ by $N(0, 1)$

• In this case, we should use the exact distribution of
$\frac{\bar{X} - \mu}{S/\sqrt{n}}$, which turns out to be a t-distribution
Definition of t-Distribution

• Let $Z \sim N(0, 1)$ and $W \sim \chi_n^2$ be independent

• The distribution of
$T = \frac{Z}{\sqrt{W/n}}$
is called a t-distribution (or Student's t-distribution) with $n$
degrees of freedom

• Notation: $t_n$
PDF of t-Distribution
If a random variable $T$ has a t-distribution (or Student's t-
distribution) with $n$ degrees of freedom, denoted by $T \sim t_n$,
then the PDF of $T$ is

$f(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}$ for $-\infty < t < \infty$

$\frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-Distribution

$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{Z}{\sqrt{W/(n-1)}} \sim t_{n-1}$

with $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$ and $W = \frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$
Since $\bar{X}$ and $S^2$ are independent, $Z$ and $W$ are independent
Importance of t-Distribution
We have shown:

$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ if $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$

• $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ can be analyzed without knowing $\sigma^2$. This is useful for
small samples

• Compare to $\frac{\bar{X} - \mu}{S/\sqrt{n}} \approx N(0, 1)$: This can only be used for
large samples, when $\sigma^2$ can be approximated by $S^2$
Properties of the t-Distribution

• The PDF is similar to that of
$N(0, 1)$, but the curve is a bit
lower and wider
• $t_n$ tends to $N(0, 1)$ for $n \to \infty$

[Figure: PDFs of $t_n$ for several df (degrees of freedom) and of $N(0,1)$]
Historical Notes

• The probability distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ was first published in
1908 in a paper written by W. S. Gosset, but under the
name of "Student".

• At that time, Gosset was employed by an Irish brewery
that prohibited publication of research by its staff
members.

• Consequently, the distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is called the Student
t-distribution or simply t-distribution.
Summary of t-Distribution

• $t_n$ is the distribution of $\frac{Z}{\sqrt{W/n}}$ where $Z \sim N(0,1)$ and $W \sim \chi_n^2$
are independent
• The t-distribution is similar to $N(0,1)$, but more useful than
$N(0,1)$ for small samples
• We have $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ for normally distributed samples
• The t-distribution can be used for hypothesis tests concerning
the population mean of normally distributed samples
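In R, t percentage points come from qt; comparing them with qnorm illustrates how $t_n$ approaches $N(0,1)$ as $n$ grows (the degrees of freedom are chosen arbitrarily):

qt(0.975, df = 5)    # 2.5706
qt(0.975, df = 30)   # 2.0423
qnorm(0.975)         # 1.9600, the N(0,1) limit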
13) F-Distribution
Let $U \sim \chi_{d_1}^2$ and $V \sim \chi_{d_2}^2$ be independent. The distribution of

$F = \frac{U/d_1}{V/d_2}$

is called an F-distribution, denoted by $F(d_1, d_2)$

PDF: $f(x) = \frac{\Gamma\left(\frac{d_1 + d_2}{2}\right)}{\Gamma\left(\frac{d_1}{2}\right)\Gamma\left(\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1} \left(1 + \frac{d_1}{d_2}\, x\right)^{-(d_1 + d_2)/2}$ for $x > 0$

Mean: $\frac{d_2}{d_2 - 2}$ for $d_2 > 2$
Variance: $\frac{2\, d_2^2\, (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$ for $d_2 > 4$
[Figure: PDF of F(d1, d2) Distribution]
Remarks on the F-Distribution
• The F-distribution is important for statistical tests involving
variances
• A major application of the F-distribution is Analysis of
Variance (ANOVA)
• If the random variable $T$ follows a $t_n$-distribution, then
$T^2$ follows an $F(1, n)$ distribution
Informally, $t_n^2 = F(1, n)$
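This relationship can be checked numerically in R via the quantile functions (level and degrees of freedom are arbitrary): since $P(T^2 > c) = P(|T| > \sqrt{c})$, the upper 5% point of $F(1, n)$ is the square of the upper 2.5% point of $t_n$.

qt(0.975, df = 12)^2          # approximately 4.747
qf(0.95, df1 = 1, df2 = 12)   # the same value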
Relationship between Distributions
[Chart: relationships between common univariate distributions]
http://www.math.wm.edu/~leemis/chart/UDR/UDR.html
14) Some Limit Theorems
Chebyshev's Inequality:
Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$.
Then for any $t > 0$,
$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}$

Proof: for a continuous PDF $f$, we have
$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \ge \int_{|x - \mu| \ge t} (x - \mu)^2 f(x)\,dx \ge t^2\, P(|X - \mu| \ge t)$

• This means if $\sigma^2$ is small, there is a high probability that $X$
will not deviate much from $\mu$.
(weak) Law of Large Numbers (wLLN)
Let $X_1, X_2, \dots$ be i.i.d. random variables with mean $\mu$ and
variance $\sigma^2$.
Then for any $\varepsilon$ with $\varepsilon > 0$,
$P(|\bar{X}_n - \mu| > \varepsilon) \to 0$ as $n \to \infty$

Proof: we already know $E[\bar{X}_n] = \mu$ and $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$, so by
Chebyshev's inequality
$P(|\bar{X}_n - \mu| > \varepsilon) \le \frac{\sigma^2}{n \varepsilon^2} \to 0$

• We say that $\bar{X}_n$ converges to $\mu$ in probability.
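The wLLN can be visualized in R by plotting the running sample mean of simulated data (a sketch; the distribution and parameters are arbitrary):

set.seed(1)
x <- rnorm(10000, mean = 3, sd = 2)
running_mean <- cumsum(x) / seq_along(x)   # sample mean after 1, 2, ... draws
plot(running_mean, type = "l")
abline(h = 3, col = "red")                 # the running mean settles near mu = 3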


Convergence in Distribution
Definition: Let $X_1, X_2, \dots$ be a sequence of random
variables with CDFs $F_1, F_2, \dots$ and let $X$ be a random
variable with CDF $F$.
We say that $X_n$ converges in distribution to $X$ if
$F_n(x) \to F(x)$ at every point $x$ at which $F$ is continuous.
Note: MGFs are often used in establishing convergence of
CDFs. So, we need the following.

Continuity Theorem: Let $F_n$ be a sequence of CDFs with the
corresponding MGFs $M_n$. Let $F$ be a CDF with MGF $M$.
If $M_n(t) \to M(t)$ for all $t$ in an open interval containing zero,
then $F_n(x) \to F(x)$ at all continuity points of $F$.
Convergence of Poisson Distribution
Let $(\lambda_n)$ be an increasing sequence with $\lambda_n \to \infty$, and
let $(X_n)$ be a sequence of Poisson random variables
with the corresponding parameters.

Since $E[X_n] = \lambda_n$, $\mathrm{Var}(X_n) = \lambda_n$, and MGF $M_{X_n}(t) = e^{\lambda_n(e^t - 1)}$,

we let $Z_n = \frac{X_n - \lambda_n}{\sqrt{\lambda_n}}$, so that by Property C,

$M_{Z_n}(t) = e^{-t\sqrt{\lambda_n}}\, e^{\lambda_n\left(e^{t/\sqrt{\lambda_n}} - 1\right)}$
Convergence of Poisson Distribution
Using $e^{t/\sqrt{\lambda_n}} - 1 = \frac{t}{\sqrt{\lambda_n}} + \frac{t^2}{2\lambda_n} + O\!\left(\lambda_n^{-3/2}\right)$, we get

$M_{Z_n}(t) = e^{t^2/2 + O(\lambda_n^{-1/2})} \to e^{t^2/2}$ as $n \to \infty$

Note that $e^{t^2/2}$ is the MGF of $N(0, 1)$, so $Z_n \to N(0, 1)$ in distribution
Central Limit Theorem (CLT)
Let $X_1, X_2, \dots$ be i.i.d. with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$.
Then

$P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le x\right) \to \Phi(x)$ for $n \to \infty$,

for all $x$, where $\Phi$ is the CDF of $N(0, 1)$

• This means for large $n$, the standardized sample mean
has an approximately standard normal distribution.
Proof of CLT
Let $M$ be the common MGF of the standardized variables $Z_i = \frac{X_i - \mu}{\sigma}$.
We define $S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$.
Then,

$M_{S_n}(t) = \left[M\!\left(\frac{t}{\sqrt{n}}\right)\right]^n$

Using a Taylor series expansion about zero:

$M(s) = 1 + s\, E[Z_i] + \frac{s^2}{2}\, E[Z_i^2] + s^2\, \varepsilon(s)$,

where $\varepsilon(s) \to 0$ as $s \to 0$.
Proof of CLT
Since we know $E[Z_i] = 0$ and
$E[Z_i^2] = 1$, we have

$M\!\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{t^2}{2n} + \frac{t^2}{n}\, \varepsilon\!\left(\frac{t}{\sqrt{n}}\right)$,

where $\varepsilon(t/\sqrt{n}) \to 0$ as $n \to \infty$.

Therefore,

$M_{S_n}(t) = \left(1 + \frac{t^2/2 + t^2 \varepsilon(t/\sqrt{n})}{n}\right)^n \to e^{t^2/2}$,

which is the MGF of the $N(0, 1)$ distribution

Proof of CLT
So, we have
$S_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$ in distribution

Finally, note that $\left(1 + \frac{a}{n}\right)^n \to e^a$,

and in general, $\left(1 + \frac{a_n}{n}\right)^n \to e^a$ if $a_n \to a$, which is the limit used above
Non-Normal Samples
The CLT is important because in practice, we often encounter
i.i.d. samples $X_1, \dots, X_n$ which may not be normally
distributed.

• In such situations, the exact distributions of $\bar{X}$ and $S^2$ cannot
be determined

• For large samples, however, the Central Limit Theorem
and its generalizations (asymptotic normality of maximum
likelihood estimators) provide approximations to the
distributions of $\bar{X}$ and $S^2$ (and other statistics)
Consequences of CLT for $\bar{X}$
The CLT may be used to approximate probabilities such as

$P(a \le \bar{X} \le b)$, where $a$, $b$ are given constants:

$P(a \le \bar{X} \le b) \approx \Phi\!\left(\frac{b - \mu}{\sigma/\sqrt{n}}\right) - \Phi\!\left(\frac{a - \mu}{\sigma/\sqrt{n}}\right)$ (by the CLT, if $n$ is large)
Consequences of CLT for $\sum X_i$, Summary
$X_1, \dots, X_n$ i.i.d. with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$. If $n$ is
large, then $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ and $\sum_{i=1}^{n} X_i$ is approximately $N(n\mu, n\sigma^2)$
Example
Let $X_1, \dots, X_{100}$ i.i.d. Bernoulli(0.8)
Then, $\sum_{i=1}^{100} X_i$ is approximately $N(80, 16)$ and
$P\left(70 \le \sum_{i=1}^{100} X_i \le 90\right) \approx \Phi(2.5) - \Phi(-2.5) \approx 0.9876$
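The normal approximation above can be evaluated in R with pnorm (this reproduces the value that the simulation on the next slide estimates):

pnorm((90 - 80)/4) - pnorm((70 - 80)/4)   # about 0.9876, so roughly 990 of 1000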
Example, R Simulation
Create a vector of 100 Bernoulli(0.8) observations:
test <- rbinom(100,1,0.8)    # 100 draws of Binomial(1, 0.8) = Bernoulli(0.8)
Compute the sum of the observations:
sum_x_i <- sum(test)
Replicate this experiment 1000 times:
sums_x_i <- replicate(1000, sum(rbinom(100,1,0.8)))
Extract those results with a sum between 70 and 90:
count <- sums_x_i[sums_x_i>=70 & sums_x_i<=90]
Determine in how many cases $70 \le \sum x_i \le 90$ holds:
length(count)
Result should be around 990
Significance of CLT for Statistics
• For large samples, the CLT provides an approximation
of the distribution of the standardized sample mean
• Similarly, it can be used to approximate the distribution
of other statistics (e.g. parameter estimators) for large
samples (to be discussed later)
• For example, we will discuss the asymptotic normality of
maximum likelihood estimators, which is a generalization
of the CLT
What the CLT does NOT say

"For large $n$, the sample mean $\bar{X}_n$ has approximately a standard normal distribution."

Wrong!

• Correct: The distribution of the standardized sample
mean approaches a standard normal distribution for $n \to \infty$
What the CLT does NOT say

"By the CLT, $\bar{X}_n \to \mu$ for $n \to \infty$."

Wrong!

• This actually is an imprecise formulation of a different
theorem.

Which one?
What the CLT does NOT say

"$X_1 + \dots + X_n \to N(n\mu, n\sigma^2)$ for $n \to \infty$."

Absolutely wrong!!

• This mixes up random variables and distributions.

• Even worse, if the $X_i$'s are independent random variables,
then $X_1 + \dots + X_n$ usually diverges almost surely.
What the CLT does NOT say
“By the CLT, the sum of independent trials tends to a
constant.”

(interview presentation, January 2019)

Wrong!

The mean of i.i.d. random variables converges to a


constant under the conditions of the Law of Large
Numbers, but the sum of independent random variables
usually diverges
Correct Version
The CLT verbally (without formulas) can be stated as
follows.

1) For i.i.d. samples with finite population mean and
variance, the standardized sample mean asymptotically
has a standard normal distribution ("asymptotically" means
for $n \to \infty$)

2) For i.i.d. samples with finite population mean and
variance, the standardized sample mean tends in distribution
to a standard normal distribution as the sample
size increases
