
Math 3200: Elementary to Intermediate Statistics

Instructor: Jimin Ding


jmding@wustl.edu

Department of Mathematics
Washington University in St. Louis
Class materials are available on the course website
(http://www.math.wustl.edu/~jmding/math3200/)

Spring 2017

Jimin Ding, Math WUSTL Math 3200 Spring 2017 1 / 67


Introduction



About Class
• Class materials are available at
  http://www.math.wustl.edu/~jmding/math3200/
• Study goals:
    • statistical reasoning
    • basic analytic skills
    • critical thinking (in real life and in empirical research studies)
• Syllabus
• Tentative class schedule
• Online survey through bb.wustl.edu
• Two interactive learning technologies: iClicker & Crowdmark
• Two statistical software packages: R & SAS
Learning tip: learn by reading others’ code, practice by writing your
own code, and improve by searching online help and documentation.



What is Statistics?



Learning from Data = Statistics

Statistics is the art and science of learning from data.

• Art comes from the various creative and informative ways to
  visualize, summarize, and analyze data.
• Science comes from using applied and theoretical mathematics
  and probability to make objective decisions.
An analysis that does not contain both aspects is often incomplete
and difficult to understand and use.
• Note that “statistics” is also the plural of “statistic”, which is a
  numerical fact or summary.
  Example: average/mean, variance, range, ...
• Formally, a statistic is a function of data.
• Key: understand and quantify uncertainty/variability.



Example 1: Men’s Olympic Triple Jump

Christian Taylor won the gold medal in the men’s triple jump at the
2016 Olympics with a jump of 17.86 meters. It is amazing how much
farther a human can jump compared with a century ago. The record at
the 1896 Olympics, when the men’s triple jump was first contested,
was only 13.71 m. To understand how the triple jump distance has
improved, we collect Olympic records from 1896 to 2016.
• How would you like to present the data?
• What can you tell from the data?
• Can you make any prediction about the 2020 Olympic men’s triple
  jump distance?
• How would you quantify the uncertainty of the 2020 Olympic
  men’s triple jump distance?



Example 2: Music and Memory [1]

Is it a good idea to listen to music when studying for a big test?

Can you design a study to test your hypothesis?
In a study conducted by some statistics students, 62 people were
randomly assigned to listen to rap music, Mozart, or no music
while attempting to memorize objects pictured on a page. They
were then asked to list all the objects they could remember.
• How would you like to present and summarize the data?
• If given this summary table, what would you conclude? Why?
• Are there any pitfalls in this study? If so, how would they affect
  your conclusion?

[1] From: De Veaux et al. (2012), “Stats: Data and Models”.
Example 3: Kidney Stones Treatment [2]
In the 1980s, a medical study was conducted to compare several
treatments for kidney stones. It found that 273 out of 350
patients who underwent open surgery were successfully cured,
while 289 out of 350 who underwent noninvasive percutaneous
nephrolithotomy were successfully cured.
• If you have a friend or relative with kidney stones,
  which treatment would you suggest to him/her?
• In practice, the treatment for kidney stones is often assigned
  based on the size of the stones rather than at random. See the data
  from the two subgroups of small and large stones.

               New Treatment          Standard Treatment
Small Stones   81 out of 87 (93%)     234 out of 270 (87%)
Large Stones   192 out of 263 (73%)   55 out of 80 (69%)
Total          273 out of 350 (78%)   289 out of 350 (83%)

Now which would you suggest?

[2] From: Charig et al. (1986). “Comparison of treatment of renal calculi by
open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave
lithotripsy”, British Medical Journal.
Example 4: Care Pathway [3][4]
An OB doctor (obstetrician) wanted to evaluate the effect of a care
pathway protocol in childbirth that was implemented two years
ago. She counted the number of deliveries of each type before
and after implementation of the care pathway. She found that the
rate of C-sections dropped by 3%, and the rate of spontaneous
vaginal deliveries increased by 2%.
• Do you think this care pathway is beneficial? Why?
• Could these small percentage changes be caused by randomness in
  data collection? How can we distinguish a random change from a
  true improvement?
• If there were 33 C-sections out of 100 inductions pre-pathway
  and 30 C-sections out of 100 inductions post-pathway, are
  these 3 cases enough to claim a benefit of the care pathway?
• If data were collected from 2000 patients, ... ?

[3] From: a recent hospital consulting project.
[4] Clinical care pathways are essentially protocols used to manage
the quality of healthcare through the standardization of care processes.
Implementation of a care pathway promotes organized and efficient patient care.
Example 5: “Do I really have cancer?”

A patient saw that his diagnostic test report for a certain cancer is
“+”, which indicates the presence of cancer. He immediately searched
online and found that this type of diagnostic test is very accurate:
• the sensitivity of the test is 99% (if a person has this cancer,
  there is a 99% chance that his test result will be “+”)
• the specificity of the test is 95% (if a person does not have this
  cancer, there is a 95% chance that his test result will be “-”).
Of course, the patient was really scared and worried. But his
doctor told him that this type of cancer is very rare, with a
prevalence of only 0.1%, so he does not need to worry too much. Why
does the doctor say that?
Hint: Knowing that his test result is “+”, what is the chance
(probability) that he actually has this type of cancer?



R



About R

• R is a very powerful and popular statistical software package
• R is open source and FREE
• R runs on many operating systems: Windows, Mac, Unix, Linux, ...
• R provides efficient data management and storage facilities
• R is the most widely used statistical software in research
• R has a steep learning curve and good documentation support
• Download R: https://cran.r-project.org/bin/windows/base/
• Introduction to R:
    • Long version:
      https://cran.r-project.org/doc/manuals/R-intro.pdf
    • Short version:
      https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf



Learning R

• Installation and interface of R (and RStudio)
• R console, script, workspace, working directory, libraries
• R code: comments, help
• Data structures in R: scalar, vector, matrix, array, list, data
  frame
• Data types in R: class (numeric, factor, character,
  user-defined), names, attributes
• Summary statistics and plots: mean, sd, cor, var, plot, hist, summary
• In and out of R: workspace, csv/Excel data



SAS



About SAS
• SAS is powerful and is the most widely used statistical package
• SAS is the standard and official package in many application
  fields
• Knowledge of SAS is an asset in many job markets:
    • SAS programmer
    • official statistical package authorized by FDA
    • used by all Fortune 500 companies
• SAS is good at data management and at handling data in
  various formats
    • much faster than other statistical packages for extremely large data
    • supports Structured Query Language (SQL)
    • can be interactive
• SAS is good at routine statistical analyses
• SAS (“Statistical Analysis System”) was developed in the 1970s at NCSU



More about SAS

• SAS is NOT free.
• SAS does NOT fully support Mac OS.
• SAS is NOT much used for statistical methodological research.
• SAS is very different from all other programming languages.
• SAS has a very rigid code format.



Where to find SAS

• SAS is available on computers in the ArtSci computing lab in
  Seigle Hall, room 012
• SAS is available in the WU Medical School computing lab
• If you have purchased a SAS license, you may download SAS
  through WUSTL
• SAS University Edition (on cloud): free, works on both
  Windows and Mac
• Access Virtual SAS through the campus network



Learning SAS

• Goal: write simple SAS code to use basic statistical
  procedures, and understand the output.
• We will focus only on the windowed version of SAS.
• The part of SAS we learn is a very small part of SAS: Base,
  STAT, Graph, IML, SQL, ...
• A strategy: check the details of the sample code, and learn the
  details of each procedure when you need them.
• There are two main components of most SAS programs: data
  steps and procedure steps.



SAS Interface

• Program editor:
  where you write your SAS program code (commands and comments);
  save it frequently; xxx.sas.
• Log window:
  contains errors, warnings, and notes on how SAS interprets your
  code; check it every time you run a program; xxx.log.
• Output window:
  results; xxx.lst.
• Explorer and Results windows:
  easy-to-use data/file management and navigation tools.



SAS Programs

• End of a statement:
  ;
• Beginning of a block:
  DATA xxx;
  PROC GLM;
• End of a block:
  RUN; QUIT;
• Comments:
  /* comments */
  * comments;
Remark: SAS is case insensitive.



SAS Sample codes

• Entering data using the data step and data importation
• Exploring data using descriptive statistics and graphics
• Inferential statistics: tests
www.ats.ucla.edu/stat/sas/modules/default.htm



Q1: Test your iClickers

Have you submitted the online survey through Blackboard yet?

• Yes.
• No.
If you have not completed the online survey for Math 3200, please do
so.



Q2: Test your iClickers

Have you used R since last Friday?

• No, but I have used R before and am very familiar with R.
• No, but I plan to try it this week when I do my homework.
• Yes, I have installed R on my computer, read the introduction
  document, and tried R code.
• Yes, I have installed R on my computer, but have not used R yet.



Data Exploring and Descriptive Statistics



Variables



Type of Variables

When we are given data, we have to know the meaning of the
numbers to really understand them. Variables are the results of
observing or measuring selected characteristics of the study units.
We often classify variables into the following types to find more
appropriate models and presentations for them.
• Categorical (Qualitative)
    • Nominal (non-ordered, “character”)
    • Ordinal (ordered, “factor”)
• Numerical (Quantitative)
    • Continuous (any possible value in an interval, “numeric”)
    • Discrete (finite or countably infinite, “integer”)
Sometimes a variable can be classified differently (e.g., income), and
one needs to make a choice case by case.



Recall Example 4: Care Pathway

An OB doctor (obstetrician) wanted to evaluate the effect of a care
pathway protocol in childbirth that was implemented on Sep. 1st,
2014. She counted the number of deliveries of each type before
and after implementation of the care pathway. She found that the
rate of C-sections dropped by 3%, and the rate of spontaneous
vaginal deliveries increased by 2%.
• See a random sample of 200 cases
• How many variables are in this dataset? What types of variables?
• How would you present, summarize, and analyze these data?
Summary Statistics for Single Numerical Variable



Measure of Central Tendency (Location Measurement)

Let $x_1, \cdots, x_n$ be the $n$ observed values.

• (Sample) Mean: $\bar{x} = \sum_{i=1}^{n} x_i / n$
  average; uses information from all values; sensitive to outliers
• (Sample) Median: $\tilde{x}$ (also written $\tilde{x}_{0.5}$)
  middle value; depends only on one or two values and the ranks of
  the observations; insensitive to outliers (robust)
• Trimmed mean
• Mode
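
To see what “sensitive to outliers” means in practice, here is a minimal sketch in Python (used here only for illustration; the course itself works in R and SAS). The data values and the helper name `trimmed_mean` are made up for this example, and the trimmed mean is assumed to drop the same number of values from each end.

```python
from statistics import mean, median


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


# Hypothetical data: five typical values, then the same data with one outlier.
clean = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

print(mean(clean), median(clean))                # mean and median agree
print(mean(with_outlier), median(with_outlier))  # mean jumps, median does not
print(trimmed_mean(with_outlier, 1))             # drops 2 and 60 before averaging
```

Replacing a single value moves the mean a long way, while the median and the trimmed mean stay at the center of the bulk of the data.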



Measure of Dispersion (Variability)

• Range: max − min
• Variance: $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$ (standard deviation: $s$)
• IQR: interquartile range, $Q_3 - Q_1$
  Quantile: an extension of the median
  $Q_1$ ($\tilde{x}_{0.25}$): first quartile, the median of the lower half
  $Q_3$ ($\tilde{x}_{0.75}$): third quartile, the median of the upper half
  $\tilde{x}_p$, $p \in [0, 1]$: in general, the $p$th quantile (the $100p$th
  percentile) is the value that has a fraction $p$ of the data less than
  or equal to it and a fraction $1 - p$ of the data greater than it.
  Five-number summary: min, $Q_1$, $Q_2$ ($= \tilde{x}$), $Q_3$, max
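
The definitions above can be sketched in Python as follows (an illustration only, on made-up data). The helper `five_number_summary` is hypothetical and uses the “median of the lower/upper half” rule from this slide; statistical software often uses slightly different quantile conventions, so its quartiles may not match exactly.

```python
from statistics import median, variance


def five_number_summary(values):
    """min, Q1, median, Q3, max, with Q1/Q3 as medians of the lower/upper half."""
    xs = sorted(values)
    half = len(xs) // 2          # for odd n the middle value is left out of both halves
    q1 = median(xs[:half])
    q3 = median(xs[len(xs) - half:])
    return xs[0], q1, median(xs), q3, xs[-1]


data = [1, 3, 5, 7, 9, 11, 13, 15]   # hypothetical observations
lo, q1, q2, q3, hi = five_number_summary(data)
print("five-number summary:", lo, q1, q2, q3, hi)
print("range:", hi - lo)
print("IQR:", q3 - q1)
print("variance:", variance(data), "sd:", variance(data) ** 0.5)
```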



Other Statistics

Note that the summary statistics above are sensitive to location and/or
scale changes, which might be less appealing. Here are some other
statistics that are insensitive to location and/or scale changes.
• Coefficient of variation (CV): $CV = s/\bar{x}$, a relative measure of
  dispersion, insensitive to scale change
• Centered data: subtract the sample mean from all observations
• Scaled data: divide all observations by the sample standard
  deviation
• Standardized data (z-scores): centered and then scaled
  observations, $z_i = (x_i - \bar{x})/s$
• Order statistics: ranks of the observations
Sample skewness and kurtosis
R example: on Labor Data
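
A short Python sketch (illustrative only, with made-up heights; this is not the Labor Data example) shows the invariance claims: z-scores are unchanged by a location shift plus a positive rescaling, and the CV is unchanged by rescaling but not by shifting. The function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Center by the sample mean, then scale by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def coefficient_of_variation(values):
    return stdev(values) / mean(values)


# Hypothetical heights in meters, converted to centimeters, then also shifted.
meters = [1.5, 1.6, 1.7, 1.8, 1.9]
centimeters = [100 * v for v in meters]        # scale change only
shifted_cm = [100 * v + 20 for v in meters]    # location and scale change

print(z_scores(meters))
print(z_scores(shifted_cm))   # same z-scores up to rounding error
print(coefficient_of_variation(meters), coefficient_of_variation(centimeters))
print(coefficient_of_variation(shifted_cm))    # the shift changes the CV
```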



Graphic Presentation of Data



Histogram

Count the number of observations (frequency) within each bin
(“break”) for a numerical variable.
• The vertical axis could also show relative frequency instead of frequency
• Great for seeing the shape (distribution) of the data:
  unimodal/bimodal/multimodal, symmetric/skewed, heavy-tailed,
  uniform, ...
• It is crucial to choose an appropriate bin size: different bin
  sizes may tell different stories
• R function: hist
Other visual tools: stem-and-leaf plot, dotplot, ...
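
The effect of bin size can be sketched with a few lines of Python (illustrative only; in class the R function hist does this work). The data and the helper `bin_counts` are made up, and bins are assumed to be left-closed intervals of equal width.

```python
def bin_counts(values, start, width, n_bins):
    """Count values falling in [start + k*width, start + (k+1)*width) for each bin k."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


data = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9]   # hypothetical observations

wide = bin_counts(data, start=0, width=5, n_bins=2)     # bins [0,5), [5,10)
narrow = bin_counts(data, start=0, width=2, n_bins=5)   # bins [0,2), ..., [8,10)

for label, counts in (("width 5", wide), ("width 2", narrow)):
    print(label)
    for k, c in enumerate(counts):
        print(f"  bin {k}: " + "*" * c)
```

With the wide bins the data look like one lopsided group; the narrow bins reveal an empty bin between two clusters, which is the kind of “different story” the slide warns about.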



Q1: Histogram and Standard Deviation
Please order the histograms below from the smallest standard
deviation to the largest standard deviation.

[Figure: three histograms labeled A, B, and C]

a. A<B<C
b. C<B<A
c. B<C<A
d. C<A<B
e. B<A<C
Boxplot

• Usually for numerical data, but can be grouped by one or two
  categorical variables.
• It shows the five-number summary through the box, median, fences,
  whiskers, and outliers
• R function: boxplot
Other visual tools: pie chart, bar chart, ...



For More Than One Numerical Variable

• When there are two numerical variables: scatterplot
• When there are two numerical variables and one categorical
  variable: scatterplot with different symbols
• When there are three numerical variables: 3D plots
• When there are several numerical variables: matrix of
  scatterplots
• ...
R function: plot



Descriptive Statistics for Relationship Between
Two Variables



For Numerical Variables: Covariance and Correlation

Let $(x_1, y_1), \cdots, (x_n, y_n)$ be $n$ pairs of observations.

• Covariance: $s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})/(n - 1)$.
  invariant to location shift
• Correlation: $r_{xy} = s_{xy}/(s_x s_y)$.
  scaled version of covariance, invariant to both location and
  (positive) scale change
    • $r_{xy} > 0$: positive correlation, y tends to increase as x increases
    • $r_{xy} < 0$: negative correlation, y tends to decrease as x increases
    • $r_{xy} = 0$: linearly uncorrelated
Alert 1: uncorrelated ≠ independent
Alert 2: correlation ≠ causation
Covariance and correlation do not change when x and y are
exchanged.
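
The formulas and Alert 1 can be checked with a small Python sketch (illustrative only; in R the corresponding functions are cov and cor). All data below are made up, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean


def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y), correlation(x, y))

# Location shift and positive rescaling of x leave the correlation unchanged.
x_rescaled = [10 + 3 * a for a in x]
print(correlation(x_rescaled, y))

# Alert 1: v is completely determined by u, yet the correlation is zero.
u = [-2, -1, 0, 1, 2]
v = [a * a for a in u]
print(correlation(u, v))
```

The last pair is dependent in the strongest possible sense (v = u squared), but the relationship is not linear, so the correlation coefficient misses it.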



Extension to single variable: Autocorrelation
The idea of correlation can also be applied to a single variable to check
the association over “time” (index) within that variable.
Basically, we replace $y_i$ by $x_{i+k}$ for $k = 1, \cdots, n - 1$. Here, $k$ is
called the lag.
The first autocorrelation coefficient ($k = 1$) is

$$r_1 = \sum_{i=1}^{n-1} (x_i - \bar{x})(x_{i+1} - \bar{x}) \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Generally, the $k$th autocorrelation coefficient is

$$r_k = \sum_{i=1}^{n-k} (x_i - \bar{x})(x_{i+k} - \bar{x}) \Big/ \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

This is useful in quantifying the linear dependence structure in time
series.
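
As a sketch of the formula for the kth autocorrelation coefficient, the following Python function (illustrative only, with a hypothetical name and made-up series) computes it directly from the two sums.

```python
from statistics import mean


def autocorrelation(x, k):
    """k-th autocorrelation coefficient: lagged cross-products over total sum of squares."""
    n = len(x)
    m = mean(x)
    numerator = sum((x[i] - m) * (x[i + k] - m) for i in range(n - k))
    denominator = sum((v - m) ** 2 for v in x)
    return numerator / denominator


trend = [1, 2, 3, 4, 5]                # steadily increasing series
alternating = [1, -1, 1, -1, 1, -1]    # series that flips sign every step
print(autocorrelation(trend, 1))
print(autocorrelation(alternating, 1))
```

The trending series has a positive lag-1 coefficient (neighbors tend to be on the same side of the mean), while the alternating series has a strongly negative one.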



Regression Line on Scatterplot

Visually, it might be useful to add a line to a scatterplot to
summarize the relationship between two numerical variables.
Naturally, we want to find a line that is closest to all the data.
For simplicity, let’s first consider a straight line $y = a + bx$. How do
we find $a$ and $b$ so that this line is closest to all $(x_i, y_i)$, $i = 1, \cdots, n$?
Least squares criterion:

$$\min_{a,b} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$$

One can solve the minimization by taking derivatives with respect to $a$ and $b$.
Let $\hat{a}$ and $\hat{b}$ be the minimizers. Then we refer to $y = \hat{a} + \hat{b}x$ as the
regression line or least squares line.
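
Setting the two derivatives to zero gives the standard closed form: the slope is the sum of cross-products divided by the sum of squares of x, and the intercept is the mean of y minus the slope times the mean of x. The Python sketch below (illustrative only, with made-up data and hypothetical function names; in R one would call lm) applies that closed form and checks that moving away from it increases the criterion.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (a_hat, b_hat) minimizing sum((y_i - (a + b*x_i))**2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b_hat = sxy / sxx
    a_hat = my - b_hat * mx
    return a_hat, b_hat


def sum_of_squares(x, y, a, b):
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a_hat, b_hat = least_squares_line(x, y)
print(a_hat, b_hat)
# Nudging either coefficient away from the minimizer increases the criterion.
print(sum_of_squares(x, y, a_hat, b_hat) < sum_of_squares(x, y, a_hat, b_hat + 0.1))
print(sum_of_squares(x, y, a_hat, b_hat) < sum_of_squares(x, y, a_hat - 0.1, b_hat))
```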



Correlation and Regression

Correlation and regression are closely related.

• It can be proved that the slope of the regression line, $\hat{b}$, for
  standardized data is the same as the correlation coefficient
  between the two variables. (Homework: check in R)
• The sign of the slope of the regression line is the same as the sign
  of the correlation coefficient.
• In general, $\hat{b} = r_{xy} \, s_y / s_x$.
• In regression, we are more interested in seeing how x affects y.
  We call x the explanatory variable (covariate or independent
  variable) and y the response variable (outcome or
  dependent variable). In a correlation calculation, by contrast, the
  roles of x and y are exchangeable.
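
Both relations can be checked numerically. The sketch below uses Python on made-up data (the homework asks for the same check in R); the function names are hypothetical, and the slope is computed from the usual least squares closed form.

```python
from math import sqrt
from statistics import mean, stdev


def slope(x, y):
    """Least squares slope: sum of cross-products over sum of squares of x."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def correlation(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def standardize(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(slope(x, y), r * stdev(y) / stdev(x))       # general relation
print(slope(standardize(x), standardize(y)), r)   # standardized data
```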



Q2: Correlation from Scatterplot
What number might be the Pearson correlation coefficient between
Verbal and Math SAT scores?

a. 0 b. -0.7 c. 0.7 d. 2 e. -1
Extension to Nonlinear and Multiple Regression

$$\min_{a,b} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$$

The idea of the least squares criterion can be generalized beyond the linear
regression line.
• For any parametric form $f(x)$, one may solve
  $\min_f \sum_{i=1}^{n} [y_i - f(x_i)]^2$ to find the best $f$ to describe the
  relationship between x and y. For example, $f$ can be a quadratic
  function.
• One may have multiple covariates $x_{i1}, x_{i2}, \cdots, x_{ip}$.



For Categorical Variables: Contingency Table
Recall Example 3: Kidney Stones Treatment

                                   Success   Failure   Row Total
New Treatment (Open Surgery)         273        77        350
Standard Treatment (Noninvasive)     289        61        350
Column Total                         562       138        700

• This is a two-way 2 × 2 table. The row variable is treatment
  and the column variable is treatment outcome.
• An I × J table has I levels of the row variable and J levels of the
  column variable.
• Percentages (row, column, cell) might be presented in the
  table. For example, one can find that the success rates for open
  surgery and noninvasive treatment are 78% and 83%,
  respectively.



Contingency Table for More Than Two Variables
In the previous example, the treatment of kidney stones depends
heavily on the size of the stones. Hence we may want to split the table
by a third variable, “stone size”, which has two levels, “small” and
“large”.

               New Treatment (Open Surgery)     Standard Treatment (Noninvasive)
               Success  Failure  Row Total      Success  Failure  Row Total
Small Stones      81       6        87            234      36       270
Large Stones     192      71       263             55      25        80
Column Total     273      77       350            289      61       350

• This is called a three-way contingency table.
• One may extend it to higher-way contingency tables.
• Higher-way contingency tables are often harder to read. Hence it is
  important to focus on the one or two variables of interest in
  exploratory analysis. To deal with more variables simultaneously,
  we need more sophisticated models.
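
The counts in the three-way table can be turned into success rates with a few lines of Python (an illustrative sketch; the dictionary layout and names are assumptions made for this example, and in R one would use table and prop.table). It shows why splitting by stone size matters: the comparison between treatments reverses once the strata are combined.

```python
# (successes, total) for each treatment within each stone-size stratum,
# copied from the three-way table.
counts = {
    "small": {"new": (81, 87), "standard": (234, 270)},
    "large": {"new": (192, 263), "standard": (55, 80)},
}


def rate(successes, total):
    return successes / total


for size, by_treatment in counts.items():
    for treatment, (s, n) in by_treatment.items():
        print(f"{size:5s} {treatment:8s} {rate(s, n):.0%}")

# Collapse over stone size to recover the two-way table.
overall = {}
for treatment in ("new", "standard"):
    s = sum(counts[size][treatment][0] for size in counts)
    n = sum(counts[size][treatment][1] for size in counts)
    overall[treatment] = rate(s, n)
    print(f"all   {treatment:8s} {overall[treatment]:.0%}")
```

The new treatment has the higher success rate within each stone size, yet the lower rate overall, because it was given mostly to patients with large stones.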
R Functions

• cov
• cor
• lm
• table, prop.table, margin.table
• CrossTable (in package gmodels)



Probability



Review of Basic Probability Concepts



Topics
• Experiments, outcomes, sample space, and events
• Union, Intersection, complement, disjoint Events
• Probability
• Axioms of Probability
Motivating Probability
• Consider the following colored ball example
• What happens if we put our hand in the bowl and pull a ball out
  randomly? Can we guess which color we are most likely to get?

Population size: 17 balls
  5 Maroon
  6 Orange
  5 Blue
  1 Green
Motivating Probability
• Recall the colored ball example for proportions
• What happens if we put our hand in the bowl and pull a ball out
  without looking? Can we guess the color?

Proportion Maroon: 5/17 = 0.294
Proportion Orange: 6/17 = 0.353
Proportion Blue: 5/17 = 0.294
Proportion Green: 1/17 = 0.059

I say that I have a 35.3% chance of pulling out an orange ball.
What do I mean by chance?
• I mean the relative frequency with which I expect some “event” to
occur
• Event: Pulling out an orange ball (doesn’t matter which one).
• Random draw: Put my hand in the bowl and pull ball out without
looking
• I record the color and put the ball back.
• If I did this 10 times then I would expect about 10*0.353=3.53 of
the draws to be orange.
• That is, I’d expect the relative frequency of orange balls with
respect to the 10 balls to be around 35.3%.
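
The “relative frequency” reading of chance can be tried out by simulation. The Python sketch below is only an illustration (the seed and function name are arbitrary choices): it draws from the 17-ball bowl with replacement and reports the fraction of orange draws for increasing numbers of draws.

```python
import random

bowl = ["maroon"] * 5 + ["orange"] * 6 + ["blue"] * 5 + ["green"] * 1


def orange_relative_frequency(n_draws, rng):
    """Draw n_draws balls with replacement; return the fraction that are orange."""
    draws = [rng.choice(bowl) for _ in range(n_draws)]
    return draws.count("orange") / n_draws


rng = random.Random(3200)   # fixed seed (arbitrary) so the run is repeatable
for n in (10, 1_000, 100_000):
    print(n, orange_relative_frequency(n, rng))
print("proportion orange:", 6 / 17)
```

With only 10 draws the relative frequency can be far from 35.3%; with many draws it settles close to the proportion of orange balls in the bowl.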
Experiments, Outcomes, and Events
• The above example had three parts to it:
1. Action of pulling out a single ball
2. Four different colors we could get on one pull
3. Specific event of getting an orange ball
Experiments, Outcomes, and Sample Space
• Experiment: single, “random” trial
• Pull ball out of bowl
• Flipping coin
• Measurement from randomly selected person
• Outcomes: observable, potential result of the trial
• Pulling a green ball out of the bowl
• Getting heads
• Sample Space: Set of all outcomes
Events and probabilities
• Event: any subset of the sample space
• Probability of an event: the relative frequency with which that event
  can be expected to occur

Experiment: One throw of the die
Outcome: 1
Sample Space: {1, 2, 3, 4, 5, 6}
Events: {1} or {2} or {3}
Notation!
• We could also think of probability as being a function that takes
  every event and assigns it a number between 0 and 1
• We use P to denote the function and A to denote some event. Y must
  be a number between 0 and 1.

$P(A) = Y$
Read as: the probability of event A is Y
Note: an event goes inside the parentheses, not a probability
Notation!
• Probability of getting heads on one coin flip is 1/2
  $P(\text{Heads}) = 0.5$
• Probability of rolling a 2 with a six-sided die is 1/6
  $P(\text{Rolling 2}) = 1/6$, or $P(\{2\}) = 1/6$
• Probability of getting an orange ball is 6/17
  $P(\text{Orange}) = 6/17$
More on Events
• Events are described as subsets of sample spaces
• Subset: collection of any outcomes
• Elementary event: an event consisting of a single outcome
• There is one subset called the empty set that contains no outcome
• On the other hand, a subset can also contain all outcomes, so that
  the event equals the sample space

[Venn diagrams: event A drawn inside a rectangle S]
• All possible outcomes are contained in the rectangle S. Event A lies
  inside S, and some of the outcomes lie “inside” of it.
• A could be so “small” that it doesn’t contain anything. This is the
  empty set.
• Or A could be so big that it takes up the entire sample space (S = A)!
Complements
• What about all the outcomes that lie “outside” of A?
• The complement of an event A is the set of all outcomes that are in S
  but not in A
• Denoted by $A^c$
• Fact: every outcome is either in A or in $A^c$

[Venn diagram: A inside S; the region of S outside A is $A^c$]
Multiple Events
• We could have multiple events of interest, say A and B
• There are two parts worth describing
    • The part that “overlaps” or “intersects”: outcomes that are in both
      events!
    • The parts that do not overlap

[Venn diagrams: events A and B inside S]
• In the first diagram, the events do not overlap, so there is no outcome
  common to both.
• In the second diagram, they do overlap! There are outcomes common to
  both!
Event Operations: Intersection
• What if we really only care about the outcomes common to both?
• C = outcomes in BOTH A and B
• That is, I want the overlap of A and B

$C = A \cap B$
Read: A “intersect” B

[Venn diagram: all outcomes in the overlapping region C of A and B form the
intersection]
Mutually Exclusive Events
• In some cases there won’t be any overlap between the sets
• In this case $C = A \cap B = \emptyset$, the empty set!
• Mutually exclusive events: two events whose intersection equals the
  empty set (no outcomes common to both)
• Fact: Every event is a union of mutually exclusive events
  (just union all the elementary events!)
• Mutually exclusive events cannot happen at the same time.
  Example: A and $A^c$ are mutually exclusive!

[Venn diagram: non-overlapping events A and B inside S]
Event Operations: Union
• We can define new events using multiple events!
• C = the event that has all the outcomes contained in A or B (or both)
• That is, I want to join (union) the outcomes in A and B

$C = A \cup B$
Read: A “union” B

[Venn diagram: C is the combined region covered by A and B]

What happens when we union an event and its complement?
Are there any outcomes that lie outside this union?
• Ex: Roll a die once
• The sample space is {1, 2, 3, 4, 5, 6}
• Event A = Roll an even number: {2, 4, 6}
• Event B = Roll 1, 2, or 3: {1, 2, 3}

What are the elementary events that make up A? What about B?
What is the complement of A? What about B?
$A \cup B =$
$A \cap B =$
Axioms of Probability
• It’s crucial to understand how events work in order to understand
  probability
• We want probability to be the relative frequency with which we expect
  the event to occur
• Axiom 1: $P(S) = 1$
  Why: Relative frequencies always add up to 100%
• Axiom 2: $0 \le P(A) \le 1$
  Why: Relative frequencies are never negative, and never exceed 100%
• Axiom 3: If A and B are mutually exclusive, then
  $P(A \cup B) = P(A) + P(B)$
  Why: There are no outcomes in common, so just compute the relative
  frequencies separately and add.
Rolling A Die Once
• “Fair” die = equal probability for each side
• There are 6 sides, so 1/6 for each side
• Event A = Roll an even number
• Event B = Roll 1, 2, or 3

$P(A) =$            $P(B) =$
$P(A^c) =$          $P(B^c) =$
$P(A \cup B) =$     $P(A \cap B) =$
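
One way to check answers to the fair-die questions is to represent events as sets and count outcomes. The Python sketch below is only an illustration; the helper `prob` is hypothetical and assumes all six sides are equally likely, as stated on the slide.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll an even number
B = {1, 2, 3}   # roll 1, 2, or 3


def prob(event):
    """Equally likely outcomes: P(event) = |event| / |sample space|."""
    return Fraction(len(event), len(sample_space))


print("A union B:", A | B, prob(A | B))
print("A intersect B:", A & B, prob(A & B))
print("complement of A:", sample_space - A, prob(sample_space - A))

# A and B overlap, so P(A or B) is not P(A) + P(B); Axiom 3 applies only
# to mutually exclusive events such as A and its complement.
print(prob(A) + prob(sample_space - A) == prob(sample_space))
```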
Rolling A Die Twice
• Record the side on the first and second roll:
  (Side 1, Side 2)
• What’s the sample space?

(1, 1) (2, 1) (3, 1) (4, 1) (5, 1) (6, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2) (6, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5) (6, 5)
(1, 6) (2, 6) (3, 6) (4, 6) (5, 6) (6, 6)

If we roll a 1 first, we could roll a 1, 2, …, 6 on the second.
Rolling a Die Twice
• 36 total outcomes, each equally likely

$P(\{(1, 1)\}) =$            $P(\text{Sum of Rolls} = 7) =$
$P(\text{Roll a 1}) =$       $P(\text{Roll Two Odd \#'s}) =$

(1, 1) (2, 1) (3, 1) (4, 1) (5, 1) (6, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2) (6, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5) (6, 5)
(1, 6) (2, 6) (3, 6) (4, 6) (5, 6) (6, 6)
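
Because the 36 outcomes are equally likely, each probability is a count divided by 36. The Python sketch below (illustrative only) enumerates the sample space and counts. Two of the events are worded loosely on the slide, so the readings used here are assumptions, flagged in the comments.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first roll, second roll) pairs.
outcomes = list(product(range(1, 7), repeat=2))


def prob(condition):
    """P(event) = (# outcomes satisfying the condition) / 36."""
    favorable = [o for o in outcomes if condition(o)]
    return Fraction(len(favorable), len(outcomes))


print(prob(lambda o: o == (1, 1)))
print(prob(lambda o: sum(o) == 7))
# Assumption: "roll a 1" is read as "at least one of the two rolls is a 1".
print(prob(lambda o: 1 in o))
# Assumption: "roll two odd numbers" is read as "both rolls are odd".
print(prob(lambda o: o[0] % 2 == 1 and o[1] % 2 == 1))
```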
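• Ex: Assume we have a rectangle with area 1

[Figure: Venn diagram of events A, B, and C inside the rectangle; the
regions are labeled with areas 0.04, 0.30, 0.13, 0.03, 0.02, 0.04, 0.07,
and 0.37]

$P(B) =$
$P(A \cap B) =$
$P(A \cap B \cap C) =$
$P((A \cup B \cup C)^c) =$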
Conditional Probability and Independence



Motivating Examples

• What is the probability that a randomly chosen student is a
  sophomore? What is the probability that a student from our
  class is a sophomore?
• Recall the kidney stone treatment example. What is the
  success rate of the new treatment (open surgery)? What is
  the success rate of the new treatment for patients with
  large kidney stones?
• Recall the example “Do I really have cancer?” What are
  the sensitivity and specificity of a diagnostic test?



Recall Example: “Do I really have cancer?”

A patient saw that his diagnostic test report for a certain cancer is
“+”, which indicates the presence of cancer. He immediately searched
online and found that this type of diagnostic test is very accurate:
• the sensitivity of the test is 99% (if a person has this cancer,
  there is a 99% chance that his test result will be “+”)
• the specificity of the test is 95% (if a person does not have this
  cancer, there is a 95% chance that his test result will be “-”).
Of course, the patient was really scared and worried. But his
doctor told him that this type of cancer is very rare, with a
prevalence of only 0.1%, so he does not need to worry too much. Why
does the doctor say that?
Hint: Knowing that his test result is “+”, what is the chance
(probability) that he actually has this type of cancer?
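
The hint can be worked out with Bayes’ rule: divide the probability of “has cancer and tests +” by the total probability of testing “+”. The Python sketch below is an illustration using the three numbers from the slide; the function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | test is +) by Bayes' rule."""
    true_positive = sensitivity * prevalence                # P(+ and disease)
    false_positive = (1 - specificity) * (1 - prevalence)   # P(+ and no disease)
    return true_positive / (true_positive + false_positive)


ppv = positive_predictive_value(sensitivity=0.99, specificity=0.95, prevalence=0.001)
print(f"P(cancer | +) = {ppv:.3f}")
```

Because the cancer is so rare, false positives from the large healthy group far outnumber true positives, so the probability of cancer given a “+” result is only about 2%.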



Topics
• Conditional probability
• Independent events
• Probability Rules: multiplication, addition, complement
• Bayes’ Rule
Conditional probability
• Changing relative frequencies by “conditioning” on the value of an
  explanatory variable

                                   Major
Gender     Int. Bus   Geo    Socio   H. Develop   PU Affairs   Finance   Row Freq
Male         0.10     0.28   0.18      0.18         0.06        0.20       1.0
Female       0.20     0.20   0.20      0.16         0.08        0.16       1.0
Col Freq     0.12     0.26   0.18      0.18         0.06        0.19       1.0

• In some cases, being told that event A occurred implies that the
  probability of event B changes
Conditional probability
• Conditional probability: the relative frequency with which we can expect an
  event to occur under the condition that additional, preexisting
  information is known about some other event
• Denoted $P(A \mid B)$, read as:
  1) Probability of A given B
  2) Probability of A, knowing B
  3) Probability of A happening, knowing B has already occurred

                                   Major
Gender     Int. Bus   Geo    Socio   H. Develop   PU Affairs   Finance   Row Freq
Male         0.10     0.28   0.18      0.18         0.06        0.20       1.0
Female       0.20     0.20   0.20      0.16         0.08        0.16       1.0
Col Freq     0.12     0.26   0.18      0.18         0.06        0.19       1.0

$P(\text{Geo} \mid \text{Male}) =$
$P(\text{Int Bus} \mid \text{Female}) =$
Conditional probability
• If we know that B happened, we know that all outcomes outside of
  B could not happen
• This means that events comprised only of these outside outcomes
  now have zero probability!

[Venn diagram: overlapping events A and B]
• A can still occur given that B occurred
• All outcomes in B become the new sample space!
Conditional probability
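• We “standardize” the probability of A with the probability of B:
  $P(A \mid B) = P(A \cap B) / P(B)$

[Venn diagram: overlapping events A and B inside S, with A shaded maroon]
The maroon area takes up more space in B than it does
in S… the probability of A is higher if we know B happened!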
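
As a small worked instance of dividing by the probability of B, the Python sketch below reuses the fair-die events from earlier in the deck (A = even number, B = 1, 2, or 3). The helper names are hypothetical, and equally likely outcomes are assumed. Note that in this particular example conditioning on B lowers the probability of A, unlike the picture on the slide.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll an even number
B = {1, 2, 3}   # roll 1, 2, or 3


def prob(event):
    return Fraction(len(event), len(sample_space))


def conditional_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)


print(conditional_prob(A, B))   # only the outcome 2 is even within B
print(prob(A))                  # unconditional probability for comparison
```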
Example
• Say we pulled a ball out and didn’t put it back in. This will change the probability if we pull out another ball!

Before the first pull (17 balls in the bowl):

Color         Orange   Maroon   Blue   Green
Probability   6/17     5/17     5/17   1/17

There are now only 16 balls in the bowl! If the first ball was orange:

Color         Orange   Maroon   Blue   Green
Probability   5/16     5/16     5/16   1/16

$$P(O_2 \mid O_1) = \frac{5}{16} = 0.3125$$

• Here $O_2$ is shorthand for “Orange ball, 2nd pull”
• This is lower than the first-pull probability $6/17 \approx 0.353$, and lower than $6/16 = 0.375$, the second-pull probability had the first ball not been orange
• Starting again from the original bowl of 17 balls:
  $P(M_2 \mid O_1) = ?$
  $P(O_2 \mid M_1) = ?$
  $P(G_2 \mid G_1) = ?$
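The conditional probabilities for the second pull can be checked by removing the first ball and recounting. This is a small sketch using the ball counts from the table; the function name `second_given_first` is a hypothetical helper, not something from the course.

```python
from fractions import Fraction

# Ball counts in the bowl before any pull (17 balls in total).
bowl = {"Orange": 6, "Maroon": 5, "Blue": 5, "Green": 1}

def second_given_first(second, first, counts=bowl):
    """P(second color on pull 2 | first color on pull 1), without replacement."""
    remaining = dict(counts)
    remaining[first] -= 1                      # the first ball is not put back
    return Fraction(remaining[second], sum(remaining.values()))

p_O2_given_O1 = second_given_first("Orange", "Orange")
p_M2_given_O1 = second_given_first("Maroon", "Orange")
p_O2_given_M1 = second_given_first("Orange", "Maroon")
p_G2_given_G1 = second_given_first("Green", "Green")

for name, p in [("P(O2|O1)", p_O2_given_O1), ("P(M2|O1)", p_M2_given_O1),
                ("P(O2|M1)", p_O2_given_M1), ("P(G2|G1)", p_G2_given_G1)]:
    print(name, "=", p)
```

Note that P(G2 | G1) comes out as zero: once the only green ball is gone, a second green ball is impossible.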
Independent events

• Independent events: two events are independent if the occurrence (or non-occurrence) of one event does not change the probability of the other event occurring

$$P(A \mid B) = P(A \mid B^c) = P(A)$$

• Conditioning on B or its complement does not change the probability of A occurring
Independent events
• This actually gives us a quick way to calculate $P(A \cap B)$
• If A and B are independent we have

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = P(A)$$

• Multiply by $P(B)$ on both sides to get

$$P(A \cap B) = P(A)\,P(B)$$

• We don’t need to find the intersection, just multiply the two probabilities!
Example
• A good free-throw percentage is 90%
• Assume the probability of making a free-throw on one shot is 0.90
• Event = make two shots in a row
• Let’s assume the second shot is independent of the first
• Let A = Make 1st and B = Make 2nd, so $P(A) = P(B) = 0.90$ and $P(A^c) = P(B^c) = 0.10$

First   Second   Probability
Miss    Make     P(Miss 1st)P(Make 2nd) = 0.10*0.90 = 0.09
Miss    Miss     P(Miss 1st)P(Miss 2nd) = 0.10*0.10 = 0.01
Make    Make     P(Make 1st)P(Make 2nd) = 0.90*0.90 = 0.81
Make    Miss     P(Make 1st)P(Miss 2nd) = 0.90*0.10 = 0.09

• The outcomes (Make, Make) and (Make, Miss) make up A
• The outcomes (Miss, Make) and (Make, Make) make up B
• The event we want is (Make, Make), so Event = $A \cap B$
• Why do we care that the event is $A \cap B$? Because A and B are independent, so $P(A \cap B) = P(A)P(B) = 0.81$; the other rows are $P(A^c)P(B)$, $P(A^c)P(B^c)$, and $P(A)P(B^c)$
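The free-throw table can be rebuilt by enumerating the four outcomes and multiplying, which is only valid under the independence assumption stated above. A minimal sketch (variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

p_make = Fraction(9, 10)                      # P(make one free throw) = 0.90
shot = {"Make": p_make, "Miss": 1 - p_make}

# Independence assumption: P(first, second) = P(first) * P(second).
table = {(first, second): shot[first] * shot[second]
         for first, second in product(shot, repeat=2)}

p_A = sum(p for (first, _), p in table.items() if first == "Make")    # make 1st
p_B = sum(p for (_, second), p in table.items() if second == "Make")  # make 2nd
p_A_and_B = table[("Make", "Make")]                                   # make both

for outcome, p in table.items():
    print(outcome, float(p))
print("P(A and B) =", float(p_A_and_B))
```

Summing the rows that make up A (or B) recovers 0.90, and the four rows together sum to 1.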
Mutually Exclusive vs Independence
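• It’s easy to confuse mutually exclusive with independent, but these are very different concepts
• Say we have two mutually exclusive events A and B in S, each with positive probability

$$A \cap B = \emptyset \quad\Longrightarrow\quad P(A \cap B) = 0$$

• To be independent we need $P(A \mid B) = P(A)$
• Here $P(A \mid B) = P(A \cap B)/P(B) = 0$, but $P(A) > 0$!
• These events will never be independent!
[Venn diagram: disjoint events A and B in S]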


Rules of probabilities
• One way to calculate probabilities of events created from
unions/intersections/complements is to first find all the outcomes
in that event and then find the probability
$$P(A \cup B) = P(C)$$
Useful if we know what the event $C = A \cup B$ looks like

• Could use probabilities of the individual events in some cases


• Addition Rule: Unions
• Complement Rule: Complements
• Multiplication Rule: Intersections
Multiplication Rule
• Recall: When two events were independent we could write

$$P(A \cap B) = P(A)\,P(B)$$

• This was because $P(A) = P(A \mid B)$; in general this isn’t true!

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad\Longrightarrow\quad P(A \cap B) = P(A \mid B)\,P(B)$$
Multiplication Rule
• There are two ways to write this

$$P(A \cap B) = P(A \mid B)\,P(B)$$
$$P(A \cap B) = P(B \mid A)\,P(A)$$

• Called the multiplication rule
• This gives us a useful way to calculate probabilities of intersections, assuming we know the conditional probability
Example
• You pull out two balls in a row (not putting them back in)

Color         Orange   Maroon   Blue   Green
Probability   6/17     5/17     5/17   1/17

$P(\text{Two Orange Balls}) = ?$
$P(G_1 \cap M_2) = ?$
$P(B_1 \cap O_2) = ?$
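One way to work these is to apply the multiplication rule directly: P(first and second) = P(first) × P(second | first). The sketch below does this with exact fractions; `two_draws` is a hypothetical helper name.

```python
from fractions import Fraction

bowl = {"Orange": 6, "Maroon": 5, "Blue": 5, "Green": 1}

def two_draws(first, second, counts=bowl):
    """P(first color on pull 1 AND second color on pull 2), without replacement."""
    total = sum(counts.values())
    p_first = Fraction(counts[first], total)
    remaining = dict(counts)
    remaining[first] -= 1                                   # ball is not put back
    p_second_given_first = Fraction(remaining[second], total - 1)
    return p_first * p_second_given_first                   # multiplication rule

p_two_orange = two_draws("Orange", "Orange")
p_G1_and_M2 = two_draws("Green", "Maroon")
p_B1_and_O2 = two_draws("Blue", "Orange")

print(p_two_orange, p_G1_and_M2, p_B1_and_O2)
```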
Example
• Randomly pull out five cards from a deck of 52; how likely are you to get a royal flush?
• There are four royal flushes (one per suit) in the sample space of all possible 5-card combinations
• They are equally likely, so just do one and multiply by 4
• Start by picking one suit
• Idea: Find the probability for a royal flush of Hearts, then multiply by 4 (the other suits have the same probability)
• We’ll need to repeatedly use the multiplication rule
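The repeated multiplication rule can be sketched as follows: the first card must be one of the 5 needed Hearts out of 52, the next one of the remaining 4 out of 51, and so on. The cross-check against the number of 5-card hands is an addition for illustration, not part of the slides.

```python
from fractions import Fraction
from math import comb

# P(royal flush of Hearts): at step k, (5 - k) needed cards remain
# among (52 - k) cards left in the deck.
p_hearts = Fraction(1)
for k in range(5):
    p_hearts *= Fraction(5 - k, 52 - k)

# Four suits, each equally likely and mutually exclusive.
p_royal_flush = 4 * p_hearts

print("P(royal flush of Hearts) =", p_hearts)
print("P(any royal flush)       =", p_royal_flush, "≈", float(p_royal_flush))
```

The single-suit probability matches 1 divided by the number of 5-card hands, `comb(52, 5)`, since exactly one hand is the royal flush of Hearts.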
Addition Rule
• Remember Axiom 3: if A and B are mutually exclusive events, then

$$P(A \cup B) = P(A) + P(B)$$

• Here we don’t need to figure out C, we just add the probabilities!
• What happens if they aren’t mutually exclusive?
[Venn diagram: overlapping events A and B, with shaded areas illustrating $P(A \cup B)$, $P(A)$, and $P(B)$]
Addition Rule
• If we just add P(A) and P(B) we get too much area!

$$P(A) + P(B) \neq P(A \cup B)$$

• Idea: Maybe we can just subtract off the extra area that we added. But what is this extra area?
Addition Rule
• This extra area is the intersection of A and B! This gives us the following addition rule:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

• Note that if A and B are mutually exclusive the intersection goes away!
• This is only useful if we know $P(A \cap B)$
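A quick numerical check of the addition rule, again on an assumed fair six-sided die with hypothetical events A (even roll) and B (roll greater than 3); none of these come from the slides.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # hypothetical: one roll of a fair die
A = {2, 4, 6}            # assumed event: even roll
B = {4, 5, 6}            # assumed event: roll greater than 3

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

p_union_direct = prob(A | B)                       # count the union's outcomes
p_union_rule = prob(A) + prob(B) - prob(A & B)     # addition rule
p_naive_sum = prob(A) + prob(B)                    # double-counts the overlap

print(p_union_direct, p_union_rule, p_naive_sum)
```

The naive sum overshoots by exactly P(A ∩ B), the area that was counted twice.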
Example

[Venn diagram: events A, B, and C in S, with region probabilities 0.04, 0.30, 0.13, 0.03, 0.02, 0.04, 0.37, 0.07]

$P(A) = ?$
$P(B) = ?$
$P(A \cap B) = ?$
$P(A \cup B) = ?$
Complement Rule
• Recall: any event and its complement account for all outcomes in S

$$S = A \cup A^c$$

• What do we know about these events and their probabilities?

$$P(S) = 1, \qquad P(A \cup A^c) = P(A) + P(A^c)$$

• Since these two events are equivalent, their probabilities must be equal!

$$P(S) = P(A \cup A^c) \quad\Longrightarrow\quad 1 = P(A) + P(A^c)$$
Complement Rule
• Doing simple algebra, we get the probability of the complement

$$P(A^c) = 1 - P(A)$$

• This is called the complement rule
• If we know the probability of an event we can easily find out the probability of its complement
• If we know the probability of the complement of an event, we can find out the probability of the event
• Most useful if an event is “large” but the complement event is “small”
Examples
• There are around 60 students in this class. What’s the probability that two or more people have the same birthday?
• Ignore leap years, so there are 365 possible birthdays
• Total # of different combinations of 60 birthdays?

$$\underbrace{365 \times 365 \times \cdots \times 365}_{60 \text{ factors}} = 365^{60}$$
Examples
• How can we possibly count every single combination where at least two people have the same birthday!?
• Idea: Let’s count the complement instead!
• Complement Event: No two people have the same birthday

$$365 \times 364 \times 363 \times \cdots \times (365 - 60 + 1)$$
Examples
• Divide these two quantities and subtract from 1

$$P(A) = 1 - \frac{365 \times 364 \times \cdots \times (365 - 60 + 1)}{365^{60}} = 0.994$$

• What about a group of 23 people?

$$P(A) = 1 - \frac{365 \times 364 \times \cdots \times (365 - 23 + 1)}{365^{23}} = \cdots > 0.5$$

• It is almost guaranteed that two of you have the same birthday!
• The complement rule allowed us to make this problem a lot easier!
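The complement-rule calculation is easy to run for any group size. The sketch below multiplies the factors (365 − k)/365 one at a time rather than forming the huge numerator and denominator separately; the function name is illustrative, and the same assumptions apply (365 equally likely birthdays, no leap years).

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), via the complement rule."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

p60 = p_shared_birthday(60)
p23 = p_shared_birthday(23)

print(f"n = 60: {p60:.4f}")
print(f"n = 23: {p23:.4f}")
```

Under these assumptions, 23 is the smallest group size for which the probability exceeds one half.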
Let’s Make A Deal (hosted by Monty Hall, 1960s-70s)
• There are three doors; two have a goat behind them, one has a new car!
[Image: three closed doors]
Let’s Make A Deal (hosted by Monty Hall, 1960s-70s)
• You choose a door and the game show host (who knows what’s behind the doors) opens one of the other doors with a goat behind it
• You have the option to change your answer. Should you? Does it matter?
[Image: three doors, with your chosen door marked and one of the other doors opened to show a goat]
Let’s Make A Deal (hosted by Monty Hall, 1960s-70s)
• Strategy 1: You think you’re being tricked! Never change your answer.
• P(Winning Car) = 1/3
• Strategy 2: Always change your answer.

First Choice    Car             Goat 1   Goat 2
Known           Goat 1 (or 2)   Goat 2   Goat 1
Second Choice   Goat 2 (or 1)   Car      Car

$$P(\text{Car} \mid \text{Goat Shown, Change Answer}) = \frac{2}{3}$$

The information, which seemed useless, doubles your chances of getting the car!
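The table above can be reproduced by enumerating every case exactly. This sketch assumes the car is equally likely to be behind each door and that, when the host has two goat doors to choose from, he picks one uniformly at random; the second assumption is not stated in the slides.

```python
from fractions import Fraction

doors = [0, 1, 2]

def win_probability(switch):
    """Exact P(win the car) for the 'always switch' or 'never switch' strategy."""
    wins = Fraction(0)
    first_choice = 0                       # by symmetry, fix the player's first pick
    for car in doors:                      # car is behind each door with prob. 1/3
        # Host may open any door that is neither the player's pick nor the car.
        openable = [d for d in doors if d != first_choice and d != car]
        for opened in openable:            # assumed: host chooses uniformly
            weight = Fraction(1, 3) * Fraction(1, len(openable))
            if switch:
                final = next(d for d in doors if d not in (first_choice, opened))
            else:
                final = first_choice
            if final == car:
                wins += weight
    return wins

p_stay = win_probability(switch=False)
p_switch = win_probability(switch=True)
print("never change :", p_stay)
print("always change:", p_switch)
```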
Bayes’ Rule
• Use to calculate the “reverse” probability

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$$

• Tree diagrams/Venn diagrams
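As an illustration of the formula, consider a diagnostic test where A = "has the disease" and B = "tests positive". All three numbers below (1% prevalence, 95% chance of a positive result for a sick patient, 5% chance of a positive result for a healthy patient) are hypothetical values chosen for this sketch, not figures from the course.

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A | B) from P(A), P(B | A), and P(B | A^c) using Bayes' rule."""
    numerator = p_b_given_a * prior
    denominator = numerator + p_b_given_not_a * (1 - prior)
    return numerator / denominator

# Hypothetical inputs (assumptions for illustration only):
posterior = bayes_posterior(prior=0.01,             # P(A): prevalence
                            p_b_given_a=0.95,       # P(B | A)
                            p_b_given_not_a=0.05)   # P(B | A^c)

print(f"P(disease | positive test) = {posterior:.3f}")
```

With a low prior, the posterior stays far below the test's accuracy because most positive results come from the much larger healthy group.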
Summary
• Probability of an event: the relative frequency with which that event
can be expected to occur
• Understand the relationship between sample spaces and events
• Event operations may be used to combine multiple events
• Know the three axioms of probability
• The size of the sample space determines how we assign probabilities
• Conditional probabilities “reduce the sample space”
• Mutually exclusive versus Independence
• Multiplication, Addition, and Complement rules make calculating
probabilities easier
• Bayes’ Rule can be used to calculate the “reverse” probability
Random Variables and Distributions



Probability as Function

I We originally defined probability as a specific function of events:
   $P(A) = Y$
I In some cases these probabilities were intuitive and we did not
really need to think of it as being a function. For example, flip
a coin to see which side faces up.
I But in the world of large sample spaces, it is impossible to
just know or manually assign a probability to every outcome.
For example, weight of newborns.



Random Variable
I If our outcomes are numbers, we can create a function to
assign probabilities to all outcome numbers in the sample
space.
I Not every sample space will be made up of numerical
outcomes. For example, flipping a coin (H/T) or drawing
different colored balls.
I It is mathematically convenient to associate with every
outcome some numerical value.
I Random variable: a variable or a function that assigns a
unique numerical value to every outcome in the sample space,
S.
I If s is an outcome in S, then we can think of a random
variable this way:

X(s) = Real Number



Random Variable

I If we flip a coin once, the possible outcomes are Head or Tail.
I What is a random variable we can use?

$$X(s) = \begin{cases} 1, & \text{if Head} \\ 0, & \text{if Tail} \end{cases}$$

I A random variable is also a function, but we often drop the function notation.
I A random variable is usually denoted by a capital letter, such as X, Y, Z.
I An observed value of X (not random) is denoted by lower case x. (We call it a realization of X.)



Type of Random Variables

I Random variables are always quantitative.
I We classified numerical data as discrete or continuous; the same classification can be applied to random variables.
I Discrete random variable: the possible values are ____ along the real line.
I Continuous random variable: the possible values form ____ along the real line.
   Which pair fills the two blanks?
   a. finite, infinite;
   b. discrete, an interval or intervals;
   c. countable, uncountable.
I Probability function: a function that assigns probability to the values of a random variable. The type of random variable determines what type of probability function we can use.



Probability Distribution of a Discrete R.V.

I Often we use a table to present the probability distribution of a discrete R.V.
I Ex. Let X = the number of heads in tossing a fair coin once. The probability distribution of X is

   x           0     1
   P(X = x)    1/2   1/2

I Ex. Let X = the number of heads in tossing a fair coin twice...
I Ex. Say we have a R.V. with 4 possible values: {1, 2, 3, 4} and we assign probability $P(X = x) = x/10$, for x = 1, 2, 3, 4.
I Besides listing probabilities of every possible outcome in a table, one may also display the probability distribution using a formula as above. Furthermore, ...



Distribution Functions for Discrete R.V.
I Probability Density Function (pdf) for a discrete r.v. (also called the probability mass function (pmf)):
   $$f(x) := P(X = x)$$
   Following the definition of probability, f satisfies
   1. $\sum_i f(x_i) = 1$;
   2. $0 \le f(x_i) \le 1$.
I Remark: It is very important to list all possible values for x or specify the range of x in stating a pdf.
I The set of all possible values of a random variable is called its support.
I Cumulative Distribution Function (cdf) for a discrete r.v.:
   $$F(x) := P(X \le x)$$
   Following the definition of probability, F satisfies
   1. $F(-\infty) = 0$, $F(\infty) = 1$;
   2. F is monotone increasing (non-decreasing);
   3. F is right continuous.
Distribution Function for Discrete R.V.: Plots

Recall: Ex. Say we have a R.V. with 4 possible values: {1, 2, 3, 4} and we assign probability $P(X = x) = x/10$, for x = 1, 2, 3, 4.
Can you plot the pmf and cdf of the random variables on the previous slide?
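Before plotting, it can help to tabulate the pmf and cdf. The sketch below does so for P(X = x) = x/10 with exact fractions; the function `F` is an illustrative helper that evaluates the cdf at any real number, showing that it is a step function that jumps at each value in the support.

```python
from fractions import Fraction
from itertools import accumulate

support = [1, 2, 3, 4]
pmf = {x: Fraction(x, 10) for x in support}                  # f(x) = P(X = x)

# Running totals of the pmf give the cdf at the support points.
cdf = dict(zip(support, accumulate(pmf[x] for x in support)))

def F(x):
    """cdf F(x) = P(X <= x) for any real x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

for x in support:
    print(f"x = {x}: f(x) = {pmf[x]}, F(x) = {cdf[x]}")
print("F(2.5) =", F(2.5))
```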



Q2: Calculation using cdf or pdf

Find the value of k so that the following function is a probability distribution of a r.v. X: $f(x) = k(x + 2)$, where x = 0, 1, 2, 3.
a. 1/6
b. 1/14
c. 1/8
d. 1/20
For this probability distribution, find $F(2)$.
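One way to check an answer to this question is to use the fact that the pmf must sum to 1 over the support, which pins down k, and then add up f(x) for x ≤ 2. A minimal sketch:

```python
from fractions import Fraction

support = [0, 1, 2, 3]

# The pmf must sum to 1:  k * sum(x + 2) = 1  =>  k = 1 / sum(x + 2).
k = Fraction(1, sum(x + 2 for x in support))
f = {x: k * (x + 2) for x in support}

# F(2) = P(X <= 2) = f(0) + f(1) + f(2)
F2 = sum(f[x] for x in support if x <= 2)

print("k    =", k)
print("F(2) =", F2)
```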



Distribution Functions for Continuous R.V.
As in the discrete case, one can also define the cdf and pdf for a continuous r.v.
Consider randomly dropping a point on the interval [0, 1]. Let X = distance between the point and the origin, which is a continuous r.v. and can take any value between 0 and 1.
I Cumulative Distribution Function for a continuous r.v.:

   $$F(x) := P(X \le x), \quad x \in \mathbb{R}.$$

I Probability Density Function (pdf) for a continuous r.v.:

   $$f(x) := \frac{dF(x)}{dx}, \quad x \in \mathbb{R}.$$

I Replacing $\sum$ by $\int$, the properties of cdf and pdf still hold, with one change: a density must satisfy $f(x) \ge 0$ but is not required to be at most 1.
I We will talk more about pdf for continuous r.v. later.
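For the point dropped on [0, 1], the two functions can be written out explicitly if we assume "randomly" means every location is equally likely (a uniform distribution; this reading is an assumption, not stated on the slide):

```latex
F(x) = P(X \le x) =
\begin{cases}
0, & x < 0, \\
x, & 0 \le x \le 1, \\
1, & x > 1,
\end{cases}
\qquad
f(x) = \frac{dF(x)}{dx} =
\begin{cases}
1, & 0 < x < 1, \\
0, & x < 0 \text{ or } x > 1.
\end{cases}
```

Here $\int_{-\infty}^{\infty} f(x)\,dx = \int_0^1 1\,dx = 1$, the continuous analogue of the pmf summing to 1.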



Parameters of Distributions



Motivating Example: Drawing Colored Balls

Recall the example of drawing colored balls from a bowl: we had 17 balls in total, 5 Maroon, 6 Orange, 5 Blue, 1 Green. Now let’s assume that balls of different colors are worth different amounts of money. Say $1 for Maroon balls, $2 for Orange balls, $3 for Blue balls, and $5 for the Green ball. Then how much will I win if I randomly pull out a ball?
I Of course, the amount of money I win is random.
I Then, “on average”, how much will I win? What is the expectation of my award?
   (1 + 2 + 3 + 5)/4? or (1 × 5 + 2 × 6 + 3 × 5 + 5 × 1)/17?
I Furthermore, how precise is this expectation? What is the variance of my award?



Mean /Expectation/Expected Value

(Population) Mean is a measure of central tendency of the population, denoted by $\mu$. It is also referred to as the Expectation or Expected value of the distribution (or of the random variable), denoted by $E(X)$.

$$\mu = E(X) := \sum_i x_i f(x_i) = \sum_{\text{all values}} x_i P(X = x_i)$$

I So expectation is a weighted average of all possible values that a random variable can take, with weight being the probability of taking that value.
I The mean/expectation/expected value of a continuous r.v. can be calculated in the same fashion, replacing $\sum$ by $\int$.



Variance

(Population) Variance is a measure of spread of the population, denoted by $\sigma^2$ or $Var(X)$. Precisely,

$$\sigma^2 = Var(X) := E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 f(x_i)$$

I Variance measures the average squared distance from the center of the population.
I It is a special expectation.
I The population standard deviation is simply $\sigma = SD(X) = \sqrt{Var(X)}$.
I The variance of a continuous r.v. can be calculated in the same fashion, replacing $\sum$ by $\int$.
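Returning to the colored-ball example, the two definitions can be applied directly. The sketch below builds the pmf of the award from the ball counts and dollar values given earlier, then computes the mean, variance, and standard deviation; variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

counts = {"Maroon": 5, "Orange": 6, "Blue": 5, "Green": 1}   # 17 balls in total
value = {"Maroon": 1, "Orange": 2, "Blue": 3, "Green": 5}    # award in dollars
total = sum(counts.values())

# pmf of the award X: P(X = value of a color) = (balls of that color) / 17
pmf = {value[color]: Fraction(n, total) for color, n in counts.items()}

mean = sum(x * p for x, p in pmf.items())                    # E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
sd = sqrt(variance)

naive_average = Fraction(1 + 2 + 3 + 5, 4)   # ignores how many balls of each color

print("E(X)   =", mean, "≈", round(float(mean), 3))
print("Var(X) =", variance, "≈", round(float(variance), 3))
print("SD(X)  ≈", round(sd, 3))
print("naive average of the four values =", naive_average)
```

The weighted form (1 × 5 + 2 × 6 + 3 × 5 + 5 × 1)/17 is the one that matches the definition of expectation; the unweighted (1 + 2 + 3 + 5)/4 would only be right if all four colors were equally likely.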





Linear Transformation of Random Variable

Let’s consider a linear transformation of a random variable: $Y = a + bX$. Denote the pdf, cdf, mean and variance of X by $f_X$, $F_X$, $\mu_X$ and $\sigma_X^2$, respectively. Can you use them to derive information for Y (the pdf, cdf, mean and variance of Y)?
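For the mean and variance, the standard results are $E(Y) = a + b\mu_X$ and $Var(Y) = b^2\sigma_X^2$. The sketch below checks them numerically on the colored-ball award distribution; the constants a = 10 and b = 2 are arbitrary values chosen for illustration.

```python
from fractions import Fraction

# pmf of X: the award in the colored-ball example ($1, $2, $3, $5).
pmf_X = {1: Fraction(5, 17), 2: Fraction(6, 17), 3: Fraction(5, 17), 5: Fraction(1, 17)}

def mean(pmf):
    """Expectation of a discrete r.v. given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    """Variance of a discrete r.v. given as {value: probability}."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

a, b = 10, 2                                        # hypothetical constants
# Y = a + bX takes the value a + b*x with the same probability as X takes x.
pmf_Y = {a + b * x: p for x, p in pmf_X.items()}

print("E(Y)   =", mean(pmf_Y), " vs  a + b*E(X)   =", a + b * mean(pmf_X))
print("Var(Y) =", var(pmf_Y), " vs  b^2 * Var(X) =", b ** 2 * var(pmf_X))
```

The shift a moves the center but does not affect the spread, which is why it drops out of the variance.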
