Lecture Notes
Chapter 1
Introduction and Revision
Table of Contents
Introduction
1. PDF, CDF, PMF, Mean, Variance, and Moments
2. Common Probability Distributions
3. Upper Percentage Points of Distributions
4. Moment Generating Functions
5. Probability Distributions of Functions of Random Variables
6. Distribution of Maximum and Minimum of Random Variables
7. Statistical Populations and Random Samples
8. Statistics and Sampling Distributions
9. Sample Mean and Sample Variance
10. Chi-Square Distribution
11. Distribution of Standardized Sample Variance in the Normal Case
12. t-Distribution
13. F-Distribution
14. Some Limit Theorems
What is Statistics?
› Statistics is the science of data that involves:
– Collecting
– Classifying
– Summarizing
– Organizing, and
– Interpreting
(usually) numerical information.
› It includes mathematical methods for collection, analysis, and
presentation of numerical data
› The aim is to make rational decisions under uncertain conditions
and to derive insights from data
› In manufacturing, computer software, pharmaceuticals, and other
areas, information is collected and analyzed to improve the
quality of a process or product (Inferential Statistics)
Basic Procedures of Statistics
Study of data in four steps:
1. Data Collection
2. Statistical Modeling: assume an appropriate joint distribution of the data, with unknown parameters
3. Data Analysis: estimate the parameters by functions of the data, and quantify the uncertainty of these estimates
4. Decision Making
Review of Probability
1) PDF, CDF, PMF, Mean, Variance,
and Moments
Some Standard Notation
Continuous case: $X$ has PDF $f(x)$ and CDF $F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$
Discrete case: $X$ has PMF $p(x) = P(X = x)$ and CDF $F(x) = \sum_{u \le x} p(u)$
Example
Let $Z \sim N(0, 1)$ (standard normal distribution)
PDF: $\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
CDF: $\Phi(z) = \int_{-\infty}^{z} \varphi(u)\,du$
[Figure: PDF and CDF of $N(0,1)$]
Median and Quartiles of Distributions
The median $m$ satisfies $F(m) = \tfrac{1}{2}$ if $X$ is continuous,
and $P(X \le m) \ge \tfrac{1}{2}$, $P(X \ge m) \ge \tfrac{1}{2}$ if $X$ is discrete.
The lower and upper quartiles are defined analogously with $\tfrac14$ and $\tfrac34$.
Properties of Expected Values
Linearity: $E[aX + bY] = aE[X] + bE[Y]$
Product formula: $E[XY] = E[X]\,E[Y]$ if $X$ and $Y$ are independent
Variance: $\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$
$k$th moment: $E[X^k]$
$k$th central moment: $E[(X - \mu)^k]$
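These identities can be checked directly on a small discrete example; the fair-die values below are illustrative and not from the notes:

```python
# Fair six-sided die (illustrative): check Var(X) = E[X^2] - (E[X])^2.
values = [1, 2, 3, 4, 5, 6]
mean = sum(values) / 6                              # E[X] = 3.5
second_moment = sum(v * v for v in values) / 6      # E[X^2] = 91/6
variance = second_moment - mean**2                  # = 35/12
central = sum((v - mean) ** 2 for v in values) / 6  # E[(X - mu)^2]
assert abs(variance - central) < 1e-12
```

Both routes give the same variance, as the shortcut formula promises.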
2) Common Probability Distributions
Discrete: Bernoulli, Binomial, Geometric, Poisson
Continuous: Uniform, Normal, Exponential, Gamma
Bernoulli Distribution
$X \sim \mathrm{Bernoulli}(p)$ means that $X$ only takes values 0
(failure) and 1 (success) such that $P(X = 1) = p$
PMF: $p(x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$
Mean: $E[X] = p$
Variance: $\mathrm{Var}(X) = p(1-p)$
Alternative formula: $\mathrm{Var}(X) = pq$
if we let $q = 1 - p$
Binomial Distribution
$X \sim \mathrm{Bin}(n, p)$ means that $X$ is the sum of $n$
independent $\mathrm{Bernoulli}(p)$ variables
PMF: $p(x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \ldots, n$
Mean: $E[X] = np$
Variance: $\mathrm{Var}(X) = np(1-p)$
Binomial Identity: $\sum_{x=0}^{n} \binom{n}{x} a^x b^{n-x} = (a+b)^n$
PMF of Binomial(100,0.5)
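As a quick check of these formulas, a minimal sketch using the parameters of the plotted Binomial(100, 0.5):

```python
from math import comb

def binom_pmf(x, n, p):
    """PMF of Bin(n, p) at x, straight from the formula."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 100, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf)  # should be 1 by the binomial identity with a = p, b = 1-p
mean = sum(x * q for x, q in zip(range(n + 1), pmf))  # should be np = 50
assert abs(total - 1) < 1e-12
assert abs(mean - 50) < 1e-9
```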
Geometric Distribution
PMF: $p(x) = (1-p)^{x-1} p$, $x = 1, 2, \ldots$
Mean: $E[X] = 1/p$
Variance: $\mathrm{Var}(X) = \frac{1-p}{p^2}$
Geometric Identity: $\sum_{k=0}^{\infty} r^k = \frac{1}{1-r}$ for $|r| < 1$
PMF of Geom(0.1)
Poisson Distribution
$\mathrm{Poisson}(\lambda)$ with $\lambda = np$ approximates a $\mathrm{Bin}(n, p)$ variable
(acceptable for large $n$ and small $p$).
The Poisson distribution is used to model rare events
PMF: $p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $x = 0, 1, 2, \ldots$
Mean: $E[X] = \lambda$
Variance: $\mathrm{Var}(X) = \lambda$
Exponential Identity: $\sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{\lambda}$
PMF of Poisson(1)
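The approximation can be checked numerically; a sketch in the illustrative regime $n = 100$, $p = 0.01$, so $\lambda = np = 1$, matching the plotted Poisson(1):

```python
from math import comb, exp, factorial

n, p = 100, 0.01          # illustrative: large n, small p
lam = n * p               # λ = np = 1

def binom_pmf(x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pois_pmf(x):
    return lam**x * exp(-lam) / factorial(x)

# The two PMFs should agree closely in this regime.
max_gap = max(abs(binom_pmf(x) - pois_pmf(x)) for x in range(11))
assert max_gap < 0.005
```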
Uniform Distribution
$X \sim U(a, b)$ means that the PDF of $X$ is $f(x) = \frac{1}{b-a}$ for $a \le x \le b$
Mean: $E[X] = \frac{a+b}{2}$
Variance: $\mathrm{Var}(X) = \frac{(b-a)^2}{12}$
PDF of U(-1,1)
Normal Distribution
$X \sim N(\mu, \sigma^2)$ means that the PDF of $X$ is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Mean: $E[X] = \mu$
Variance: $\mathrm{Var}(X) = \sigma^2$
Normal Identity: $\int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \sigma\sqrt{2\pi}$
PDF of N(0,1)
Exponential Distribution
Let $\lambda > 0$. We write $X \sim \mathrm{Exp}(\lambda)$ if the PDF of $X$ is
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $0$ otherwise
Mean: $E[X] = 1/\lambda$
Variance: $\mathrm{Var}(X) = 1/\lambda^2$
4) Moment Generating Functions
The moment generating function (MGF) of $X$ is $M_X(t) = E[e^{tX}]$
Continuous case: $M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$
($f$ is the PDF of $X$)
Discrete case: $M_X(t) = \sum_x e^{tx} p(x)$
($p$ is the PMF of $X$)
Example (Discrete r.v.)
Let $X \sim \mathrm{Bernoulli}(p)$.
That is, $P(X = 1) = p$, $P(X = 0) = 1 - p$
Therefore, $M_X(t) = E[e^{tX}] = (1-p)e^{0} + p e^{t} = 1 - p + pe^t$
Property C
If $Y = aX + b$ then $M_Y(t) = e^{bt} M_X(at)$
Example (Normal r.v.)
Let $Z \sim N(0, 1)$ and $X = \mu + \sigma Z \sim N(\mu, \sigma^2)$.
Note that $M_Z(t) = e^{t^2/2}$, so by Property C, $M_X(t) = e^{\mu t} M_Z(\sigma t) = e^{\mu t + \sigma^2 t^2/2}$
Property D
If $X$ and $Y$ are independent, then $M_{X+Y}(t) = M_X(t)\,M_Y(t)$
Example for Property (D):
Let $X \sim \mathrm{Bin}(n, p)$, so $X = X_1 + \cdots + X_n$ with the $X_i$
independent $\mathrm{Bernoulli}(p)$. Then $M_X(t) = (1 - p + pe^t)^n$
Property (A): $M_X^{(k)}(0) = E[X^k]$ (moments are derivatives of the MGF at $0$)
MGFs of Common Distributions
Distribution                MGF
Bernoulli($p$)              $1 - p + pe^t$
Geo($p$)                    $\frac{pe^t}{1 - (1-p)e^t}$ for $t < -\ln(1-p)$
Binomial($n, p$)            $(1 - p + pe^t)^n$
Poisson($\lambda$)          $e^{\lambda(e^t - 1)}$
$U(a, b)$                   $\frac{e^{tb} - e^{ta}}{t(b-a)}$ for $t \ne 0$, $1$ for $t = 0$
Gamma($\alpha, \lambda$)    $\left(\frac{\lambda}{\lambda - t}\right)^{\alpha}$ for $t < \lambda$
Exp($\lambda$)              $\frac{\lambda}{\lambda - t}$ for $t < \lambda$
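One entry of the table can be verified by Monte Carlo; a sketch for the Exp($\lambda$) row with illustrative $\lambda = 2$ and $t = 0.5$ (any $t < \lambda$ works):

```python
import random
from math import exp

random.seed(0)
lam, t = 2.0, 0.5          # illustrative parameters; need t < λ
n = 200_000

# Monte Carlo estimate of M_X(t) = E[e^{tX}] for X ~ Exp(λ).
est = sum(exp(t * random.expovariate(lam)) for _ in range(n)) / n
exact = lam / (lam - t)    # table value: λ/(λ - t) = 4/3
assert abs(est - exact) < 0.02
```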
5) Probability Distributions of
Functions of Random Variables
Given $Y = g(X)$:
Find the CDF of $Y$ from the CDF of $X$ and the
definition of $g$
Example: let $X \sim U(-1, 1)$, so $f_X(x) = \tfrac12$ for $-1 \le x \le 1$ and $f_X(x) = 0$ otherwise
CDF of $X$:
$F_X(x) = \frac{x+1}{2}$ for $-1 \le x \le 1$
[Figure: PDF and CDF of $U(-1, 1)$]
Example for CDF Method
So, we know the CDF of $X$: $F_X(x) = \frac{x+1}{2}$ for $-1 \le x \le 1$
Let $Y = X^2$. Then for $0 \le y \le 1$,
$F_Y(y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y})$
$= F_X(\sqrt{y}) - F_X(-\sqrt{y}) = \frac{\sqrt{y}+1}{2} - \frac{1-\sqrt{y}}{2} = \sqrt{y}$
Also $F_Y(y) = 0$ for $y < 0$ and $F_Y(y) = 1$ for $y > 1$
Result: $F_Y(y) = \sqrt{y}$ for $0 \le y \le 1$, and $f_Y(y) = F_Y'(y) = \frac{1}{2\sqrt{y}}$ for $0 < y < 1$, $0$ otherwise
[Figure: CDF and PDF of $Y = X^2$]
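A simulation sanity check of the CDF method, assuming the running example $X \sim U(-1, 1)$, $Y = X^2$ with $F_Y(y) = \sqrt{y}$:

```python
import random

random.seed(1)
n = 100_000

# Simulate X ~ U(-1, 1) and check that Y = X^2 has CDF F_Y(y) = sqrt(y):
# e.g. F_Y(0.25) should be sqrt(0.25) = 0.5.
hits = sum(random.uniform(-1, 1) ** 2 <= 0.25 for _ in range(n))
assert abs(hits / n - 0.5) < 0.01
```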
Jacobian Method
Let $X$ be a continuous random variable with CDF $F_X$ and
PDF $f_X$. Set $Y = g(X)$, where $g$ is strictly increasing and differentiable.
Then $F_Y(y) = F_X(g^{-1}(y))$, i.e.,
$f_Y(y) = f_X(g^{-1}(y)) \cdot \frac{d}{dy} g^{-1}(y)$
Strictly Increasing Functions of Random
Variables
We have shown:
$F_Y(y) = F_X(g^{-1}(y))$ (for $y$ in the image of $g$), i.e.,
$f_Y(y) = f_X(g^{-1}(y)) \cdot \frac{d}{dy} g^{-1}(y)$
Example for Jacobian Method
Let $X \sim U(0, 1)$. Find the PDF of $Y = g(X) = X^2$
PDF of $X$ is $f_X(x) = 1$, $0 < x < 1$, and $0$ otherwise
Image of $g$ is $(0, 1)$
Hence, for $0 < y < 1$, we have $g^{-1}(y) = \sqrt{y}$ and
$f_Y(y) = f_X(\sqrt{y}) \cdot \frac{d}{dy}\sqrt{y} = \frac{1}{2\sqrt{y}}$.
Moreover, $f_Y(y) = 0$ for $y \notin (0, 1)$.
MGF Method
Goal: Find the distribution of $Y = X_1 + \cdots + X_n$
Based on:
Inversion Theorem: If for all $t$ in a neighbourhood of $0$,
$M_X(t) = M_Y(t)$, then $X$ and $Y$ have the same distribution
Procedure:
$M_Y(t) = M_{X_1}(t) \cdots M_{X_n}(t)$ where $X_1, \ldots, X_n$ are independent
Hence $Y$ has the distribution whose MGF matches $M_Y(t)$
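As a worked instance of this procedure, using the Poisson MGF $e^{\lambda(e^t-1)}$ from the table of common MGFs:

```latex
If $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$
are independent, then
\[
M_{X_1 + X_2}(t) = M_{X_1}(t)\,M_{X_2}(t)
  = e^{\lambda_1(e^t - 1)}\, e^{\lambda_2(e^t - 1)}
  = e^{(\lambda_1 + \lambda_2)(e^t - 1)},
\]
which is the MGF of $\mathrm{Poisson}(\lambda_1 + \lambda_2)$. By the inversion
theorem, $X_1 + X_2 \sim \mathrm{Poisson}(\lambda_1 + \lambda_2)$.
```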
6) Distribution of Maximum and
Minimum of Random Variables
Let $X_1, \ldots, X_n$ be continuous i.i.d. random variables with
CDF $F$ and PDF $f$. Let $M = \max(X_1, \ldots, X_n)$
Result: $F_M(x) = P(X_1 \le x) \cdots P(X_n \le x) = F(x)^n$, so $f_M(x) = n F(x)^{n-1} f(x)$
Example of $\max$ when $X_i \sim \mathrm{Exp}(\lambda)$
We know that $F(x) = 1 - e^{-\lambda x}$
and $f(x) = \lambda e^{-\lambda x}$, for $x \ge 0$
Let $M = \max(X_1, \ldots, X_n)$, then $F_M(x) = (1 - e^{-\lambda x})^n$
Hence, $f_M(x) = n(1 - e^{-\lambda x})^{n-1} \lambda e^{-\lambda x}$ for $x \ge 0$
Distribution of $\min$
Let $m = \min(X_1, \ldots, X_n)$
Result: $F_m(x) = 1 - P(\text{all } X_i > x) = 1 - (1 - F(x))^n$, so $f_m(x) = n(1 - F(x))^{n-1} f(x)$
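The result for the minimum can be checked by simulation; a sketch assuming i.i.d. Exp($\lambda$) samples with illustrative $\lambda = 1$, $n = 5$:

```python
import random
from math import exp

random.seed(2)
lam, n, trials = 1.0, 5, 100_000

# Minimum of n i.i.d. Exp(λ): F_min(x) = 1 - (1 - F(x))^n = 1 - e^{-nλx},
# i.e. the minimum is itself Exp(nλ). Check P(min <= 0.1).
hits = sum(
    min(random.expovariate(lam) for _ in range(n)) <= 0.1
    for _ in range(trials)
)
exact = 1 - exp(-n * lam * 0.1)
assert abs(hits / trials - exact) < 0.01
```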
7) Statistical Populations and Random
Samples
A population is a set of objects of a certain kind.
A specific property of these objects is analyzed statistically.
A Sample is selected from the Population (characterised by a parameter).
Findings from the sample are then generalised back to the Population (Statistics).
Statistical Model for Random Samples
Example: $X_1, \ldots, X_n$ i.i.d.
Population mean: $\mu = E[X_i]$
Population variance: $\sigma^2 = \mathrm{Var}(X_i)$
Observations
1 2 6
6 6 3
4 3 5
5 3 2
Generating Observations for Random
Samples in R
In R, we use the prefix “r” to generate observations for a
random sample drawn from a distribution. Examples:
S <- rnorm(100,10,3)
After this, S will contain 100 observations drawn from
N(10,9)
S <- rgamma(1000,10,1/10)
Creates 1000 observations drawn from a Gamma distribution with
shape 10 and rate 1/10 (rgamma’s third argument is the rate)
8) Statistics and Sampling
Distributions
Let $X_1, \ldots, X_n$ be a random sample
$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$, $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$, $\max_i X_i$ are statistics
For each statistic $T = T(X_1, \ldots, X_n)$ the distribution of $T$ is a sampling
distribution
Suppose $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$
The function $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is not a statistic, as it involves
the parameters $\mu$ and $\sigma$
Difference Between Population
Distribution and Sampling Distribution
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$
In Tutorial 1 we show $E[S^2] = \sigma^2$
For $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ we have $E[\bar{X}] = \mu$ for all $n$
We know $E[X_i] = \mu$, and $\mathrm{Var}(X_i) = \sigma^2$
Furthermore,
$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$ for all $n$
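These facts can be verified by simulation; a sketch with illustrative parameters $n = 5$, $\mu = 10$, $\sigma = 3$:

```python
import random
import statistics

random.seed(3)
n, trials = 5, 20_000
mu, sigma = 10.0, 3.0

means, s2s = [], []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    s2s.append(statistics.variance(sample))   # divides by n - 1

# E[X̄] = μ, Var(X̄) = σ²/n = 1.8, E[S²] = σ² = 9
assert abs(statistics.fmean(means) - mu) < 0.05
assert abs(statistics.variance(means) - sigma**2 / n) < 0.1
assert abs(statistics.fmean(s2s) - sigma**2) < 0.2
```

Note that `statistics.variance` uses the $n-1$ denominator, matching the definition of $S^2$ above; that is exactly why its average comes out unbiased for $\sigma^2$.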
Alternate Proof
It follows from the fact that
if two jointly normally distributed random variables are uncorrelated, then
they are independent; since $\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = 0$, $\bar{X}$ and $S^2$ are independent
10) Chi-Square Distribution
Let $Z_1, \ldots, Z_n$ be i.i.d. $N(0, 1)$.
Notation:
$\varphi$: PDF of $N(0, 1)$; $\Phi$: CDF of $N(0, 1)$
Let $Y = Z_1^2 + \cdots + Z_n^2$, that is, $Y \sim \chi^2_n$, where
$\chi^2_n$ denotes the chi-square distribution with $n$ degrees of freedom
Gamma Distribution
Recall that for $\alpha, \lambda > 0$, $X \sim \mathrm{Gamma}(\alpha, \lambda)$ means that the
PDF of $X$ is
$f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}$
for $x \ge 0$ and $f(x) = 0$ for $x < 0$.
In particular, $\chi^2_n = \mathrm{Gamma}\!\left(\frac{n}{2}, \frac{1}{2}\right)$, with PDF
$f(x) = \frac{(1/2)^{n/2}}{\Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}$ for $x \ge 0$
Further, if $X \sim \mathrm{Gamma}(\alpha_1, \lambda)$ and $Y \sim \mathrm{Gamma}(\alpha_2, \lambda)$ are
independent, then $X + Y \sim \mathrm{Gamma}(\alpha_1 + \alpha_2, \lambda)$
Proof: since $X$ and $Y$ are independent,
$M_{X+Y}(t) = M_X(t)\,M_Y(t) = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_1} \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_2} = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha_1 + \alpha_2}$
Therefore, $X + Y \sim \mathrm{Gamma}(\alpha_1 + \alpha_2, \lambda)$ by the inversion theorem
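The notes' key distributional result here, that $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ for normal samples, can be checked against the known mean $n-1$ and variance $2(n-1)$ of $\chi^2_{n-1}$; the parameters below are illustrative:

```python
import random
import statistics

random.seed(4)
n, trials = 6, 20_000
mu, sigma = 0.0, 2.0

stats = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)          # S² with n-1 denominator
    stats.append((n - 1) * s2 / sigma**2)

# (n-1)S²/σ² ~ χ²_{n-1}: mean n-1 = 5, variance 2(n-1) = 10
assert abs(statistics.fmean(stats) - (n - 1)) < 0.1
assert abs(statistics.variance(stats) - 2 * (n - 1)) < 0.5
```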
Significance of Standardized Sample
Variance
$\frac{(n-1)S^2}{\sigma^2}$ involves the (usually unknown) parameter $\sigma^2$,
but its distribution, $\chi^2_{n-1}$, does not depend on $\sigma^2$
This allows probability statements about $S^2$ without knowing $\sigma^2$
12) t-Distribution
Let $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$
Recall that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Suppose both $\mu$ and $\sigma^2$ are unknown and our task is to
estimate $\mu$ from $X_1, \ldots, X_n$
The distribution of $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ is the $t$-distribution
Notation: $T \sim t_{n-1}$
PDF of t-Distribution
If a random variable $T$ has a t-distribution (or Student’s
t-distribution) with $n$ degrees of freedom, denoted by $T \sim t_n$,
then the PDF of $T$ is
$f(t) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$
for $-\infty < t < \infty$
$T = \frac{Z}{\sqrt{V/n}}$ has a $t_n$-Distribution
with $Z \sim N(0, 1)$ and $V \sim \chi^2_n$ independent
Since $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ and $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ are independent,
$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)S^2}{\sigma^2} \big/ (n-1)}} \sim t_{n-1}$
Importance of t-Distribution
We have shown:
$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ if $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$
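The heavier tails of $t_{n-1}$ relative to $N(0,1)$ show up already for small samples; a simulation sketch with illustrative $n = 3$:

```python
import random
import statistics
from math import sqrt

random.seed(7)
n, trials = 3, 20_000
mu, sigma = 0.0, 1.0

# T = (X̄ - μ)/(S/√n) for normal samples follows t_{n-1} = t_2, which has
# much heavier tails than N(0,1): P(|T| > 2) ≈ 0.1835 versus ≈ 0.0455.
hits = 0
for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    t = (statistics.fmean(x) - mu) / (statistics.stdev(x) / sqrt(n))
    hits += abs(t) > 2
assert abs(hits / trials - 0.1835) < 0.02
```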
13) F-Distribution
$F = \frac{V_1/d_1}{V_2/d_2} \sim F(d_1, d_2)$ if $V_1 \sim \chi^2_{d_1}$ and $V_2 \sim \chi^2_{d_2}$ are independent
PDF: $f(x) = \frac{\Gamma\!\left(\frac{d_1+d_2}{2}\right)}{\Gamma\!\left(\frac{d_1}{2}\right)\Gamma\!\left(\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1} \left(1 + \frac{d_1}{d_2}x\right)^{-\frac{d_1+d_2}{2}}$ for $x > 0$,
Mean: $\frac{d_2}{d_2 - 2}$ for $d_2 > 2$
Variance: $\frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$ for $d_2 > 4$
[Figure: PDF of F(d1, d2) Distribution]
Remarks on the F-Distribution
F-distribution is important for statistical tests involving
variances
A major application of the F-Distribution is Analysis of
Variance (ANOVA)
If the random variable $T$ follows a $t_n$-distribution, then $T^2$
follows an $F(1, n)$ distribution
Informally, $t_n^2 = F(1, n)$
Relationship between Distributions
http://www.math.wm.edu/~leemis/chart/UDR/UDR.html
14) Some Limit Theorems
Chebyshev’s Inequality:
Let be a random variable with mean and variance .
Then for any $k > 0$,
$P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$
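Chebyshev's bound is typically loose but always valid; a sketch comparing it with the empirical tail probability of an (illustrative) standard normal:

```python
import random

random.seed(5)
mu, sigma, k = 0.0, 1.0, 2.0
n = 100_000

# Empirical P(|X - μ| >= k) for X ~ N(0,1) versus the Chebyshev bound σ²/k².
hits = sum(abs(random.gauss(mu, sigma)) >= k for _ in range(n))
empirical = hits / n          # true value ≈ 0.0455
bound = sigma**2 / k**2       # = 0.25, much larger but still an upper bound
assert empirical <= bound
```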
Convergence of Poisson Distribution
Let $X_\lambda \sim \mathrm{Poisson}(\lambda)$, with mean $\lambda$ and variance $\lambda$.
Then $\frac{X_\lambda - \lambda}{\sqrt{\lambda}} \to N(0, 1)$ in distribution
as $\lambda \to \infty$.
Central Limit Theorem (CLT): if $X_1, X_2, \ldots$ are i.i.d. with mean $\mu$ and
variance $\sigma^2 < \infty$, then $Z_n = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \to N(0, 1)$ in distribution.
Proof of CLT (sketch)
Since we know $E\!\left[\frac{X_i - \mu}{\sigma}\right] = 0$ and
$E\!\left[\left(\frac{X_i - \mu}{\sigma}\right)^2\right] = 1$, we have
$M_{\frac{X_i - \mu}{\sigma}}\!\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{t^2}{2n} + \frac{\varepsilon_n}{n}$,
where $\varepsilon_n \to 0$ as $n \to \infty$.
Therefore, $M_{Z_n}(t) = \left(1 + \frac{t^2}{2n} + \frac{\varepsilon_n}{n}\right)^n$
Finally, $M_{Z_n}(t) \to e^{t^2/2}$, the MGF of $N(0, 1)$, as $n \to \infty$
Note that, $\left(1 + \frac{a}{n}\right)^n \to e^a$,
and in general, $\left(1 + \frac{a}{n} + \frac{\varepsilon_n}{n}\right)^n \to e^a$ when $\varepsilon_n \to 0$.
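A simulation sketch of the CLT, standardizing sums of illustrative $U(0, 1)$ variables ($\mu = \tfrac12$, $\sigma^2 = \tfrac{1}{12}$):

```python
import random
from math import sqrt

random.seed(6)
n, trials = 30, 20_000

# Standardize sums of n i.i.d. U(0,1) variables and compare
# P(Z_n <= 1) with the normal value Φ(1) ≈ 0.8413.
mu, sigma = 0.5, sqrt(1 / 12)
hits = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * sqrt(n))
    hits += z <= 1
assert abs(hits / trials - 0.8413) < 0.02
```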
Non-Normal Samples
CLT is important because in practice, we often encounter
i.i.d. samples $X_1, \ldots, X_n$ which may not be normally
distributed.
What the CLT does NOT say
The CLT does not say that the $X_i$ themselves are (or become)
normally distributed, and it does not give the exact distribution of
$\bar{X}$ for any fixed $n$; it only describes the limiting distribution of the
standardized sum.