Math3200 Classnotes PDF

Math 3200: Elementary to Intermediate Statistics
Instructor: Jimin Ding

jmding@wustl.edu
Department of Mathematics
Washington University in St. Louis
Class materials are available on course website
(www.math.wustl.edu/~jmding/math3200/)
Spring 2017
Jimin Ding, Math WUSTL Math 3200 Spring 2017 1 / 67

Introduction

About Class
• Class materials are available at
  http://www.math.wustl.edu/~jmding/math3200/
• Study goals:
  • statistical reasoning
  • basic analytic skills
  • critical thinking (in real life and empirical research studies)
• Syllabus
• Tentative class schedule
• Online survey through bb.wustl.edu
• Two interactive learning technologies: iClicker & Crowdmark
• Two statistical software packages: R & SAS
Learning tip: learn by reading others’ code, practice by
writing your own code, and improve by searching online
help/documentation

What is Statistics?

Learning from Data= Statistics
Statistics is the art and science of learning from data.

• Art comes from the various creative and informative ways to
  visualize, summarize, and analyze data.
• Science comes from using applied and theoretical mathematics
  and probability to make objective decisions.
An analysis that does not contain both aspects is often incomplete
and difficult to understand and use.
• Note: “statistics” is also the plural of “statistic”, which is a
  numerical fact or summary.
  Example: average/mean, variance, range, ...
• Formally, a statistic is a function of data.
• Key: understand and quantify uncertainty/variability.

Example 1: Men’s Olympic Triple Jump
Christian Taylor won the gold medal in the men’s triple jump at the 2016
Olympics with a jump of 17.86 meters. It is amazing how much farther a
human can jump compared with a century ago. The record at the
1896 Olympics, when the men’s triple jump was first contested,
was only 13.71 m. To understand how the triple jump distance has
improved, we collect Olympic records from 1896 to 2016.
• How would you like to present the data?
• What can you tell from the data?
• Can you make any prediction about the 2020 Olympic men’s triple
  jump distance?
• How would you quantify the uncertainty of the 2020 Olympic
  men’s triple jump distance?

Example 2: Music and memory [1]
Is it a good idea to listen to music when studying for a big test?

Can you design a study to test your hypothesis?
In a study conducted by some Statistics students, 62 people were
randomly assigned to listen to rap music, Mozart, or no music
while attempting to memorize objects pictured on a page. They
were then asked to list all the objects they could remember.
• How would you like to present and summarize the data?
• If given this summary table, what would you conclude? Why?
• Is there any pitfall in this study? If so, how would it affect your
  conclusion?
[1] From: Veaux (2012), “Stats: Data and Models”.
Example 3: Kidney Stones Treatment [2]
In the 1980s, a medical study was conducted to compare several
treatments for kidney stones. It found that 273 out of 350
patients who underwent open surgery were successfully cured,
while 289 out of 350 who underwent noninvasive percutaneous
nephrolithotomy were successfully cured.
• If you have a friend or relative with a kidney stone problem,
  which treatment would you suggest to him or her?
• In practice, the treatment for kidney stones is often assigned
  based on the size of the stones rather than at random. See the data
  from the two subgroups of small and large stones.

               New Treatment           Standard Treatment
               (Open Surgery)          (Noninvasive)
Small Stones   81 out of 87 (93%)      234 out of 270 (87%)
Large Stones   192 out of 263 (73%)    55 out of 80 (69%)
Total          273 out of 350 (78%)    289 out of 350 (83%)

Now which would you suggest?
[2] From: Charig et al. (1986). “Comparison of treatment of renal calculi by
open surgery, percutaneous nephrolithotomy, and extracorporeal shockwave
lithotripsy”, British Medical Journal.
Example 4: Care Pathway [3][4]
An OB doctor (obstetrician) wanted to evaluate the effect of a care
pathway protocol in childbirth that was implemented two years
ago. She counted the number of deliveries of each type before
and after implementation of the care pathway. She found that the
rate of C-sections dropped by 3%, and the rate of spontaneous
vaginal deliveries increased by 2%.
• Do you think this care pathway is beneficial? Why?
• Could these small percentage changes be caused by randomness in
  data collection? How can we distinguish a random change from a
  true improvement?
• If there were 33 C-sections out of 100 inductions pre-pathway
  and 30 C-sections out of 100 inductions post-pathway, are
  these 3 cases enough to claim a benefit of the care pathway?
• If data were collected from 2000 patients, ... ?
[3] From: a recent hospital consulting project.
[4] Clinical care pathways are essentially protocols used to manage
quality in healthcare through the standardization of care processes.
Implementing a care pathway promotes organized and efficient patient care.
Example 5: “Do I really have cancer?”
A patient saw that his medical diagnostic report for a certain cancer is “+”,
which indicates the presence of cancer. He immediately searched
online and found that this type of diagnostic test is very accurate:
• the sensitivity of the test is 99% (if a person has this cancer, then with
  99% chance the test result will be “+”)
• the specificity of the test is 95% (if a person does not have this cancer,
  then with 95% chance the test result will be “-”).
Of course, the patient was really scared and worried. But his
doctor told him that this type of cancer is very rare, with a prevalence
of only 0.1%, so he does not need to worry too much. Why does
the doctor say that?
Hint: Knowing his diagnostic result is “+”, what is the chance
(probability) that he actually has this type of cancer?

R

About R
• R is a very powerful and popular statistical software package
• R is open source and FREE
• R runs on many operating systems: Windows, Mac, Unix, Linux, ...
• R is an efficient data management and storage facility
• R is the most widely used statistical software in research
• R has a steep learning curve and good documentation support
• Download R: https://cran.r-project.org/bin/windows/base/
• Introduction to R:
  • Long version: https://cran.r-project.org/doc/manuals/R-intro.pdf
  • Short version: https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf

Learning R
• Installation and interface of R (and RStudio)
• R console, script, workspace, working directory, libraries
• R code: comments, help
• Data structures in R: scalar, vector, matrix, array, list, data frame
• Data types in R: class (numeric, factor, character,
  user-defined), names, attributes
• Summary statistics and plots: mean, sd, cor, var, plot, hist, summary
• In and out of R: workspace, csv/Excel data

SAS

About SAS
• SAS is a powerful and the most widely used statistical package
• SAS is the standard and official package in many application
  fields.
• Knowledge of SAS is an asset in many job markets:
  • SAS programmer
  • official statistical package authorized by the FDA
  • used by all Fortune 500 companies
• SAS is good at data management and handling data in
  various formats
  • much faster than other statistical software for extremely large data
  • supports Structured Query Language (SQL)
  • can be interactive
• SAS is good at routine statistical analyses.
• SAS (“Statistical Analysis System”) was developed in the 1970s at NCSU.

More about SAS
• SAS is NOT free.
• SAS does NOT fully support Mac OS.
• SAS is NOT much used for statistical methodological research.
• SAS is very different from all other programming languages.
• SAS has a very rigid code format.

Where to find SAS
• SAS is available on computers in the ArtSci computing lab in
  Seigle Hall, room 012.
• SAS is available in the WU Medical School computing lab.
• If you have purchased a SAS license, you may download SAS
  through WUSTL.
• SAS University Edition (in the cloud): free, works on both
  Windows and Mac.
• Access Virtual SAS through the campus network.

Learning SAS
• Goal: write simple SAS code that uses basic statistical
  procedures, and understand the output.
• We will focus only on the windowed version of SAS.
• The part of SAS we learn is a very small part of SAS: Base,
  STAT, Graph, IML, SQL, ...
• A strategy: check the details of the sample code, and learn the
  details of each procedure when you need them.
• There are two main components of most SAS programs: data
  steps and procedure steps.

SAS Interface
• Program editor:
  write your SAS program code (commands and comments);
  save it frequently; xxx.sas.
• Log window:
  contains errors, warnings, and notes on how SAS interprets your
  code; check it every time you run a program; xxx.log.
• Output window:
  results; xxx.lst.
• Explorer and Results windows:
  easy-to-use data/file management and navigation tools.

SAS Programs
• End of a statement:
  ;
• Beginning of a block:
  DATA xxx;
  PROC GLM;
• End of a block:
  Run; Quit;
• Comments:
  /* comments */
  * comments;
Remark: SAS is case insensitive.

SAS Sample codes
• Entering data using the data step and data importation
• Exploring data using descriptive statistics and graphics
• Inferential statistics: tests
www.ats.ucla.edu/stat/sas/modules/default.htm

Q1: Test your iClickers
Have you submitted the online survey through Blackboard yet?
• Yes.
• No.
If you have not done your online survey for Math 3200, please do
so.

Q2: Test your iClickers
Have you used R since last Friday?
• No, but I have used R before and am very familiar with R.
• No, but I plan to try it this week when I do my homework.
• Yes, I have installed R on my computer, read the introduction
  document, and tried R code.
• Yes, I have installed R on my computer, but not used R yet.

Data Exploring and Descriptive Statistics

Variables

Type of Variables
When we are given data, we have to know the meaning of the
numbers to really understand them. Variables are the results of
observing/measuring selected characteristics of the study units.
We often classify variables into the following types to find more
appropriate models/presentations for them.
• Categorical (qualitative)
  • Nominal (non-ordered, “character”)
  • Ordinal (ordered, “factor”)
• Numerical (quantitative)
  • Continuous (any possible value in an interval, “numeric”)
  • Discrete (finitely many or countably infinitely many values, “integer”)
Sometimes a variable can be classified in more than one way (e.g., income), and
one needs to make a choice case by case.

Recall: Example 4: Care Pathway [5][6]
An OB doctor (obstetrician) wanted to evaluate the effect of a care
pathway protocol in childbirth that was implemented on Sep. 1st,
2014. She counted the number of deliveries of each type before
and after implementation of the care pathway. She found that the
rate of C-sections dropped by 3%, and the rate of spontaneous
vaginal deliveries increased by 2%.
• See a random sample of 200 cases
• How many variables are in this dataset? What types of variables?
• How would you present, summarize, and analyze these data?
[5] From: a recent hospital consulting project.
[6] Clinical care pathways are essentially protocols used to manage
quality in healthcare through the standardization of care processes.
Implementing a care pathway promotes organized and efficient patient care.
Summary Statistics for Single Numerical Variable

Measure of Central Tendency (Location Measurement)
Let $x_1, \dots, x_n$ be the n observed values.
• (Sample) Mean: $\bar{x} = \sum_{i=1}^{n} x_i / n$
  average; uses information from all values; sensitive to outliers
• (Sample) Median: $\tilde{x}$ (also written $\tilde{x}_{0.5}$)
  middle value; depends only on one or two values and the ranks of the
  observations; insensitive to outliers (robust)
• Trimmed mean
• Mode

Measure of Dispersion (Variability)
• Range
• Variance: $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$ (standard deviation: $s$)
• IQR: interquartile range, $Q_3 - Q_1$
  Quantile: extension of the median
  $Q_1$ ($\tilde{x}_{0.25}$): first quartile, median of the lower half
  $Q_3$ ($\tilde{x}_{0.75}$): third quartile, median of the upper half
  $\tilde{x}_p$, $p \in [0, 1]$: in general, the pth quantile (100p-th percentile) is the value
  that has a fraction p of the data less than or equal to it
  and a fraction 1 − p of the data greater than it.
Five-number summary: min, $Q_1$, $Q_2$ ($= \tilde{x}$), $Q_3$, max
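The location and dispersion measures above can be checked on a small example. The following is a minimal Python sketch (the course itself uses R); the data values are made up, and the helper `five_number_summary` is a hypothetical name. It assumes the "median of the lower/upper half" convention from the slide, leaving out the middle value when n is odd. Other software may use a different quantile rule and give slightly different quartiles.

```python
from statistics import mean, median, stdev, variance


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data; the middle value is left out of both halves when n is odd.
    """
    xs = sorted(values)
    half = len(xs) // 2
    q1 = median(xs[:half])
    q3 = median(xs[-half:])
    return xs[0], q1, median(xs), q3, xs[-1]


data = [1, 3, 4, 6, 8, 9, 11]  # made-up observations

x_bar = mean(data)      # sample mean
s2 = variance(data)     # sample variance, divisor n - 1
s = stdev(data)         # sample standard deviation
lo, q1, med, q3, hi = five_number_summary(data)
data_range = hi - lo
iqr = q3 - q1

print("mean:", x_bar, "variance:", s2, "sd:", s)
print("five-number summary:", (lo, q1, med, q3, hi))
print("range:", data_range, "IQR:", iqr)
```

Replacing the largest value 11 by 1100 changes the mean and variance a great deal but leaves the median and IQR unchanged, which is the robustness mentioned above.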

Other Statistics
Note that the summary statistics above are sensitive to location and/or
scale changes, which might be less appealing. Here are some other
statistics that are insensitive to location and/or scale changes.
• Coefficient of variation (CV): $CV = s / \bar{x}$, a relative measure of
  dispersion, insensitive to scale change
• Centered data: subtract the sample mean from all observations
• Scaled data: divide all observations by the sample standard
  deviation
• Standardized data (z-scores): centered and then scaled
  observations, $z_i = (x_i - \bar{x}) / s$
• Order statistics: ranks of the observations
Sample skewness and kurtosis
R example: on Labor Data
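To see what "insensitive to scale change" means in practice, here is a small Python sketch (not the R example on the Labor Data, which is not reproduced in these notes). The data and the helper names `coefficient_of_variation` and `z_scores` are illustrative assumptions.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = s / x_bar."""
    return stdev(values) / mean(values)


def z_scores(values):
    """Center by the sample mean, then scale by the sample sd."""
    x_bar, s = mean(values), stdev(values)
    return [(x - x_bar) / s for x in values]


data = [1, 3, 4, 6, 8, 9, 11]               # made-up observations
rescaled = [10 * x for x in data]           # scale change only
shifted = [x + 100 for x in data]           # location change only
transformed = [2 * x + 100 for x in data]   # location and (positive) scale change

cv = coefficient_of_variation(data)
cv_rescaled = coefficient_of_variation(rescaled)  # same as cv
cv_shifted = coefficient_of_variation(shifted)    # differs from cv

z = z_scores(data)
z_transformed = z_scores(transformed)             # same as z

print("CV:", cv, "after rescaling:", cv_rescaled, "after shifting:", cv_shifted)
print("z-scores:", [round(v, 3) for v in z])
```

The CV survives a change of units but not a shift, while z-scores survive both, which is why standardized data always have mean 0 and standard deviation 1.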

Graphic Presentation of Data

Histogram
Count the number of observations (frequency) within each bin
(“break”) for a numerical variable.
• The vertical axis could also show relative frequency instead of frequency
• Great for seeing the shape (distribution) of the data:
  unimodal/bimodal/multimodal, symmetric/skewed, heavy-tailed,
  uniform, ...
• It is crucial to choose an appropriate bin size: different bin
  sizes may tell different stories
• R function: hist
Other visual tools: stem-and-leaf plot, dotplot, ...

Q1: Histogram and Standard Deviation
Please order the histograms below from the smallest standard
deviation to the largest standard deviation.
[Figure: three histograms labeled A, B, and C]
a. A<B<C
b. C<B<A
c. B<C<A
d. C<A<B
e. B<A<C
Boxplot
• Usually for numerical data, but can be grouped by one or two
  categorical variables.
• It shows the five-number summary: box, median, fences, whiskers,
  outliers
• R function: boxplot
Other visual tools: pie chart, bar chart, ...

For more than one numerical variable
• When there are two numerical variables: scatterplot
• When there are two numerical variables and one categorical
  variable: scatterplot with different symbols
• When there are three numerical variables: 3D plots
• When there are several numerical variables: matrix of
  scatterplots
• ...
R function: plot

Descriptive Statistics for Relationship Between
Two Variables

For Numerical Variable: Covariance and Correlation
Let $(x_1, y_1), \dots, (x_n, y_n)$ be n pairs of observations.
• Covariance: $s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)$.
  invariant to location shift
• Correlation: $r_{xy} = s_{xy} / (s_x s_y)$.
  scaled version of covariance, invariant to both location and
  (positive) scale changes
  • $r_{xy} > 0$: positive correlation, y tends to increase as x increases
  • $r_{xy} < 0$: negative correlation, y tends to decrease as x increases
  • $r_{xy} = 0$: linearly uncorrelated
Alert 1: uncorrelated ≠ independent
Alert 2: correlation ≠ causation
Covariance and correlation do not change when one exchanges x
and y.
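The two formulas above translate directly into code. This is a minimal Python sketch with made-up data; `sample_cov` and `sample_cor` are hypothetical helper names (in R one would simply call cov and cor). It also checks the invariance and symmetry properties stated above.

```python
from math import sqrt


def sample_cov(x, y):
    """s_xy = sum (x_i - x_bar)(y_i - y_bar) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def sample_cor(x, y):
    """r_xy = s_xy / (s_x s_y); note that s_x^2 = s_xx."""
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))


x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]

s_xy = sample_cov(x, y)
r_xy = sample_cor(x, y)

# Location shift leaves the covariance unchanged; a positive
# rescaling changes the covariance but not the correlation.
x_shifted = [a + 10 for a in x]
x_rescaled = [3 * a for a in x]

print("s_xy =", s_xy, " r_xy =", round(r_xy, 4))
print("cov after shift:", sample_cov(x_shifted, y))
print("cov after rescale:", sample_cov(x_rescaled, y),
      " cor after rescale:", round(sample_cor(x_rescaled, y), 4))
```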

Extension to single variable: Autocorrelation
The idea of correlation can also be applied to a single variable to check
the association over “time” (index) within that variable.
Basically, we replace $y_i$ by $x_{i+k}$ for $k = 1, \dots, n - 1$. Here, k is
called the lag.
The first autocorrelation coefficient (k = 1) is
$r_1 = \sum_{i=1}^{n-1} (x_i - \bar{x})(x_{i+1} - \bar{x}) / \sum_{i=1}^{n} (x_i - \bar{x})^2.$
Generally, the kth autocorrelation coefficient is
$r_k = \sum_{i=1}^{n-k} (x_i - \bar{x})(x_{i+k} - \bar{x}) / \sum_{i=1}^{n} (x_i - \bar{x})^2.$
This is useful for quantifying the linear dependence structure in time
series.
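As a quick check of the formula for the kth autocorrelation coefficient, here is a short Python sketch on a made-up series. The function name `autocorrelation` is an illustrative choice, not something from the course materials.

```python
def autocorrelation(x, k):
    """k-th autocorrelation coefficient r_k of the series x.

    Numerator: sum over i = 1..n-k of (x_i - x_bar)(x_{i+k} - x_bar).
    Denominator: sum over i = 1..n of (x_i - x_bar)^2.
    """
    n = len(x)
    x_bar = sum(x) / n
    numerator = sum((x[i] - x_bar) * (x[i + k] - x_bar) for i in range(n - k))
    denominator = sum((v - x_bar) ** 2 for v in x)
    return numerator / denominator


series = [1, 2, 3, 4, 5]   # made-up, steadily increasing series

r1 = autocorrelation(series, 1)
r2 = autocorrelation(series, 2)

print("r_1 =", r1)
print("r_2 =", r2)
```

With lag k = 0 the numerator and denominator coincide, so the coefficient is 1, just as a variable is perfectly correlated with itself.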

Regression Line on Scatterplot
Visually, it might be useful to add a line to a scatterplot to
summarize the relationship between two numerical variables.
Naturally, we want to find a line that is closest to all the data.
For simplicity, let’s first consider a straight line $y = a + bx$. How do we
find a and b so that this line is closest to all $(x_i, y_i)$, $i = 1, \dots, n$?
Least squares criterion:
$\min_{a,b} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$
One can solve the minimization by taking derivatives with respect to a and b.
Let $\hat{a}$ and $\hat{b}$ be the minimizers. Then we refer to $y = \hat{a} + \hat{b}x$ as the
regression line or least squares line.
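Setting the two derivatives to zero gives the standard closed-form solution $\hat{b} = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $\hat{a} = \bar{y} - \hat{b}\bar{x}$. These formulas are not written out in the notes, so treat them as an assumption of this sketch. The Python code below (R users would call lm) fits the line to made-up data and confirms that nudging either coefficient increases the sum of squares; `least_squares_line` and `sum_of_squares` are hypothetical names.

```python
def least_squares_line(x, y):
    """Return (a_hat, b_hat) minimizing sum [y_i - (a + b x_i)]^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def sum_of_squares(x, y, a, b):
    """Least squares criterion evaluated at the line y = a + b x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]

a_hat, b_hat = least_squares_line(x, y)
best = sum_of_squares(x, y, a_hat, b_hat)

print("regression line: y =", round(a_hat, 3), "+", round(b_hat, 3), "x")
print("criterion at the minimizer:", round(best, 3))
print("criterion with a nudged slope:", round(sum_of_squares(x, y, a_hat, b_hat + 0.1), 3))
```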

Correlation and Regression
Correlation and regression are closely related.
• It can be proved that the slope of the regression line, $\hat{b}$, for
  standardized data is the same as the correlation coefficient
  between the two variables. (Homework: check in R)
• The sign of the slope of the regression line is the same as the sign
  of the correlation coefficient.
• In general, $\hat{b} = r_{xy} s_y / s_x$.
• In regression, we are more interested in how x affects y.
  We call x the explanatory variable (covariate or independent
  variable), and y the response variable (outcome or
  dependent variable). In a correlation calculation, by contrast, the roles
  of x and y are exchangeable.

Q2: Correlation from Scatterplot
What number might be the Pearson correlation coefficient between
Verbal and Math SAT scores?
a. 0 b. -0.7 c. 0.7 d. 2 e. -1
Extension to Nonlinear and Multiple Regression
$\min_{a,b} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$
The idea of the least squares criterion can be generalized beyond a linear
regression line.
• For any parametric form of x, $f(x)$, one may solve
  $\min_f \sum_{i=1}^{n} [y_i - f(x_i)]^2$ to find the best f to describe the
  relationship between x and y. For example, f may be a quadratic
  function.
• One may have multiple covariates $x_{i1}, x_{i2}, \dots, x_{ip}$.

For Categorical Variable: Contingency Table
Recall Example 3: Kidney Stones Treatment

                                   Success  Failure  Row Total
New Treatment (Open Surgery)         273       77       350
Standard Treatment (Noninvasive)     289       61       350
Column Total                         562      138       700

• This is a two-way 2 × 2 table. The row variable is treatment
  and the column variable is treatment outcome.
• An I × J table has I levels of the row variable and J levels of the column
  variable.
• Percentages (row, column, cell) might be presented in the
  table. For example, one can find that the success rates for open
  surgery and noninvasive treatment are 78% and 83%,
  respectively.

Contingency Table for More Than Two Variables
In the previous example, the choice of treatment for kidney stones depends
heavily on the size of the stones. Hence we may want to split the table
by a third variable, “stone size”, which has two levels, “small” and
“large”.

               New Treatment (Open Surgery)    Standard Treatment (Noninvasive)
               Success  Failure  Row Total     Success  Failure  Row Total
Small Stones      81       6        87           234      36       270
Large Stones     192      71       263            55      25        80
Column Total     273      77       350           289      61       350

• This is called a three-way contingency table.
• One may extend it to higher-way contingency tables.
• Higher-way contingency tables are often harder to read. Hence it is
  important to focus on the one or two most interesting variables in
  exploratory analysis. To deal with more variables simultaneously,
  we need more sophisticated models.
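The counts in the three-way table are enough to reproduce the reversal seen in Example 3. The Python sketch below uses exactly the counts from the table; the dictionary layout and the helper `success_rate` are illustrative choices (in R, table and prop.table do the same job).

```python
from fractions import Fraction

# (successes, patients) for each treatment and stone size, from the table
counts = {
    ("new", "small"): (81, 87),
    ("new", "large"): (192, 263),
    ("standard", "small"): (234, 270),
    ("standard", "large"): (55, 80),
}


def success_rate(treatment, size=None):
    """Success rate for a treatment, within one stone size or pooled over sizes."""
    cells = [v for (t, s), v in counts.items()
             if t == treatment and (size is None or s == size)]
    successes = sum(c[0] for c in cells)
    patients = sum(c[1] for c in cells)
    return Fraction(successes, patients)


for size in ("small", "large", None):
    label = size if size is not None else "all"
    new = success_rate("new", size)
    std = success_rate("standard", size)
    print(f"{label:>5} stones: new {float(new):.1%}, standard {float(std):.1%}")
```

Within each stone size the new treatment (open surgery) has the higher success rate, yet the pooled rate is lower, because open surgery was used mostly on large stones, which are harder to treat.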
R Functions
• cov
• cor
• lm
• table, prop.table, margin.table
• CrossTable (in package gmodels)

Probability

Review of Basic Probability Concepts

Topics
• Experiments, outcomes, sample space, and events
• Union, Intersection, complement, disjoint Events
• Probability
• Axioms of Probability
Motivating Probability
• Consider the following colored ball example
• What happens if we put our hand in the bowl and pull a ball out
randomly? Can we guess which color we are most likely to get?
Population size: 17 balls

5 Maroon
6 Orange
5 Blue
1 Green
Motivating Probability
• Recall the colored ball example for proportions
• What happens if we put our hand in the bowl and pull a ball out
without looking? Can we guess the color?
Proportion Maroon: 5/17 = 0.294
Proportion Orange: 6/17 = 0.353
Proportion Blue: 5/17 = 0.294
Proportion Green: 1/17 = 0.059
I say that I have a 35.3% chance of pulling out an orange ball.
What do I mean by chance?
• I mean the relative frequency with which I expect some “event” to
occur
• Event: Pulling out an orange ball (doesn’t matter which one).
• Random draw: Put my hand in the bowl and pull ball out without
looking
• I record the color and put the ball back.
• If I did this 10 times then I would expect about 10*0.353=3.53 of
the draws to be orange.
• That is, I’d expect the relative frequency of orange balls with
respect to the 10 balls to be around 35.3%.
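The relative-frequency interpretation can be illustrated by simulating draws with replacement from the bowl. This is a Python sketch under stated assumptions: the seed 3200 and the numbers of draws are arbitrary choices, and `orange_relative_frequency` is a hypothetical helper name.

```python
import random

# The bowl from the example: 5 maroon, 6 orange, 5 blue, 1 green
bowl = ["maroon"] * 5 + ["orange"] * 6 + ["blue"] * 5 + ["green"] * 1


def orange_relative_frequency(n_draws, seed):
    """Draw n_draws balls with replacement; return the fraction that are orange."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    draws = [rng.choice(bowl) for _ in range(n_draws)]
    return draws.count("orange") / n_draws


freq_small = orange_relative_frequency(10, seed=3200)
freq_large = orange_relative_frequency(100_000, seed=3200)

print("P(orange) = 6/17 =", round(6 / 17, 3))
print("relative frequency in 10 draws:", freq_small)
print("relative frequency in 100000 draws:", round(freq_large, 3))
```

With only 10 draws the relative frequency can be far from 35.3%, but with many draws it settles close to 6/17.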
Experiments, Outcomes, and Events
• The above example had three parts to it:
1. Action of pulling out a single ball
2. Four different colors we could get on one pull
3. Specific event of getting an orange ball
Experiments, Outcomes, and Sample Space
• Experiment: single, “random” trial
• Pull ball out of bowl
• Flipping coin
• Measurement from randomly selected person
• Outcomes: observable, potential result of the trial
• Pulling a green ball out of the bowl
• Getting heads
• Sample Space: Set of all outcomes
Events and probabilities
• Event: any subset of the sample space
• Probability of an event: the relative frequency with which that event
can be expected to occur
Experiment: One throw of the die
Outcome: 1
Sample Space: {1, 2, 3, 4, 5, 6}
Events: {1} or {2} or {3}
Notation!
• We could also think of probability as being a function that takes
every event and assigns it a number between 0 and 1
• We use P to denote the function and A to denote some event:
$P(A) = Y$
Read as: the probability of event A is Y
• The event goes inside the parentheses, not a probability. Y must
be a number between 0 and 1.
Notation!
• Probability of getting a heads on one coin flip is 1/2
$P(\text{Heads}) = 0.5$
• Probability of rolling a 2 for a six-sided die is 1/6
$P(\text{Rolling 2}) = 1/6$, or $P(\{2\}) = 1/6$
• Probability of getting an orange ball is 6/17
$P(\text{Orange}) = 6/17$
More on Events
• Events are described as subsets of sample spaces
• Subset: collection of any outcomes
• Elementary event: an event consisting of a single outcome
• There is one subset called the empty set that contains no outcome
• On the other hand, a subset can also contain all outcomes, so that
the event equals the sample space
[Figure: a rectangle S containing all possible outcomes; event A lies inside S, and some of the outcomes lie “inside” of A]
• A could be so “small” that it doesn’t contain anything.
This is the empty set.
• Or A could be so big that it takes up the entire sample
space (S = A)!
Complements
• What about all the events that lie “outside” of A?
• The complement of an event A is the set of all outcomes that are in S but
not in A
• Denoted by $A^c$
Fact: Every outcome is either in A or in $A^c$
Multiple Events
• We could have multiple events of interest, say A and B
• There are two parts worth describing
• The part that “overlaps” or “intersects”
• The parts that do not overlap
[Figure: events A and B inside S, not overlapping] These events do not overlap, so
there is no outcome common to both.
[Figure: events A and B inside S, overlapping] Now they do overlap! There
are outcomes common to both: the outcomes that are in both events!
Event Operations: Intersection
• What if we really only care about the outcomes common to both?
• C = outcomes in BOTH A and B
• That is, I want the overlap of A and B
$C = A \cap B$
Read: A “intersect” B
[Figure: overlapping events A and B; all outcomes in the overlap C are the intersection]
Mutually Exclusive Events
• In some cases there won’t be any overlap between the sets
• In this case $C = A \cap B = \emptyset$, the empty set!
• Mutually exclusive events: two events whose intersection equals the
empty set (no outcomes common to both)
• Fact: Every event is made up of a union of mutually exclusive events
(just union all the elementary events!)
Mutually exclusive events cannot happen at the same time.
Example: A and $A^c$ are mutually exclusive!
Event Operations: Union
• We can define new events using multiple events!
• C = the event that has all the outcomes contained in A or B (or both)
• That is, I want to join (union) the outcomes in A and B
$C = A \cup B$
Read: A “union” B
• What happens when we union an event and its complement?
Are there any outcomes that lie outside this union?
• Ex: Roll a die once
• The sample space is {1, 2, 3, 4, 5, 6}
• Event A = Roll an even number {2, 4, or 6}
• Event B = Roll 1, 2, or 3
What are the elementary events that make up A? What about B?
What is the complement of A? What about B?
$A \cup B =$
$A \cap B =$
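One way to check answers to the die exercise is to treat events as sets of outcomes. The following Python sketch uses the built-in set type; the variable names are illustrative.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die
A = {2, 4, 6}            # roll an even number
B = {1, 2, 3}            # roll 1, 2, or 3

A_complement = S - A     # outcomes in S but not in A
B_complement = S - B
A_union_B = A | B        # outcomes in A or B (or both)
A_intersect_B = A & B    # outcomes in both A and B

print("A^c =", sorted(A_complement))
print("B^c =", sorted(B_complement))
print("A union B =", sorted(A_union_B))
print("A intersect B =", sorted(A_intersect_B))

# An event and its complement are mutually exclusive and together fill S
print("A and A^c disjoint:", A.isdisjoint(A_complement))
print("A union A^c equals S:", (A | A_complement) == S)
```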
Axioms of Probability
• It’s crucial to understand how events work in order to understand
probability
• We want probability to be the relative frequency with which we expect
the event to occur
• Axiom 1: $P(S) = 1$
  Why: Relative frequencies always add up to 100%
• Axiom 2: $0 \le P(A) \le 1$
  Why: Relative frequencies are never negative, and never exceed 100%
• Axiom 3: If A and B are mutually exclusive, $P(A \cup B) = P(A) + P(B)$
  Why: There are no outcomes in common, so just do the relative
  frequencies separately and add.
Rolling A Die Once
• “Fair” die = equal probability for each side
• There are 6 sides, so 1/6 for each side
• Event A = Roll an even number
• Event B = Roll 1, 2, or 3
$P(A) =$    $P(B) =$
$P(A^c) =$    $P(B^c) =$
$P(A \cup B) =$    $P(A \cap B) =$
Rolling A Die Twice
• Record side on first and second roll:
(Side 1, Side 2)
• What’s the sample space?
If we roll a 1 first, we could roll a 1, 2, …, 6 on the second.
(1, 1) (2, 1) (3, 1) (4, 1) (5, 1) (6, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2) (6, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5) (6, 5)
(1, 6) (2, 6) (3, 6) (4, 6) (5, 6) (6, 6)
Rolling a Die Twice
• 36 total outcomes, each equally likely
$P(\{(1, 1)\}) =$    $P(\text{Sum of Rolls} = 7) =$
$P(\text{Roll a 1}) =$    $P(\text{Roll Two Odd \#'s}) =$
(1, 1) (2, 1) (3, 1) (4, 1) (5, 1) (6, 1)

(1, 2) (2, 2) (3, 2) (4, 2) (5, 2) (6, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5) (6, 5)
(1, 6) (2,6) (3, 6) (4, 6) (5, 6) (6, 6)
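Because the 36 outcomes are equally likely, the probability of any event is the number of outcomes in it divided by 36. The Python sketch below enumerates the sample space and counts. Two assumptions: `prob` is a hypothetical helper, and “Roll a 1” is read as “at least one of the two rolls shows a 1”, which is one possible reading of the slide.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first roll, second roll)
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) for equally likely outcomes: favorable outcomes / total outcomes."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


p_double_one = prob(lambda o: o == (1, 1))
p_sum_seven = prob(lambda o: o[0] + o[1] == 7)
p_at_least_one_1 = prob(lambda o: 1 in o)   # assumed meaning of "Roll a 1"
p_two_odd = prob(lambda o: o[0] % 2 == 1 and o[1] % 2 == 1)

print("number of outcomes:", len(sample_space))
print("P((1, 1)) =", p_double_one)
print("P(sum = 7) =", p_sum_seven)
print("P(at least one 1) =", p_at_least_one_1)
print("P(two odd numbers) =", p_two_odd)
```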
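• Ex: Assume we have a rectangle with area 1
[Figure: Venn diagram of events A, B, and C inside the rectangle, with region areas 0.04, 0.30, 0.13, 0.03, 0.02, 0.04, 0.07, and 0.37]
$P(B) =$
$P(A \cap B) =$
$P(A \cap B \cap C) =$
$P((A \cup B \cup C)^c) =$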
Conditional Probability and Independence

Motivating Examples
• What is the probability that a randomly chosen student is a
  sophomore? What is the probability that a student from our
  class is a sophomore?
• Recall the kidney stone treatment example. What is the
  success rate of the new treatment (open surgery)? What is
  the success rate of the new treatment for patients with
  large kidney stones?
• Recall the example “Do I really have cancer?” What are
  the sensitivity and specificity of a medical diagnostic test?

Recall Example: “Do I really have cancer?”
A patient saw that his medical diagnostic report for a certain cancer is “+”,
which indicates the presence of cancer. He immediately searched
online and found that this type of diagnostic test is very accurate:
• the sensitivity of the test is 99% (if a person has this cancer, then with
  99% chance the test result will be “+”)
• the specificity of the test is 95% (if a person does not have this cancer,
  then with 95% chance the test result will be “-”).
Of course, the patient was really scared and worried. But his
doctor told him that this type of cancer is very rare, with a prevalence
of only 0.1%, so he does not need to worry too much. Why does
the doctor say that?
Hint: Knowing his diagnostic result is “+”, what is the chance
(probability) that he actually has this type of cancer?
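A sketch of the calculation behind the hint, written as a LaTeX derivation. It uses Bayes’ rule, which is listed among the topics below. The symbols C (“has this cancer”) and + (“test is positive”) are notation introduced here, and the false-positive rate 0.05 is one minus the stated specificity.

```latex
\begin{aligned}
P(C \mid +)
  &= \frac{P(+ \mid C)\,P(C)}{P(+ \mid C)\,P(C) + P(+ \mid C^c)\,P(C^c)} \\
  &= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999}
   = \frac{0.00099}{0.05094} \approx 0.019
\end{aligned}
```

So even after a positive result, the chance of actually having this cancer is only about 2%: the disease is so rare that false positives among healthy people far outnumber true positives.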

Topics
• Conditional probability
• Independent events
• Probability Rules: multiplication, addition, complement
• Bayes’ Rule
Conditional probability
• Changing relative frequencies by “conditioning” on the value of an
explanatory variable

                                   Major
Gender    Int. Bus  Geo   Socio  H. Develop  PU Affairs  Finance  Row Freq
Male        0.10    0.28  0.18     0.18        0.06       0.20      1.0
Female      0.20    0.20  0.20     0.16        0.08       0.16      1.0
Col Freq    0.12    0.26  0.18     0.18        0.06       0.19      1.0
• In some cases, being told that event A occurred implies that the
probability of event B changes
• Conditional probability: the relative frequency we can expect an
event to occur under the condition that additional, preexisting
information is known about some other event
• Denoted $P(A \mid B)$, read as:
1) Probability of A given B
2) Probability of A, knowing B
3) Probability of A happening,
knowing B has already occurred
Using the table of majors by gender above (each row is conditioned on gender):
$P(\text{Geo} \mid \text{Male}) =$
$P(\text{Int. Bus} \mid \text{Female}) =$
• If we know that B happened, we know that all outcomes outside of
B could not happen
• This means that events comprised only of these outside outcomes
now have zero probability!
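[Figure: overlapping events A and B] A can still occur given B occurred.
All outcomes in B become the new sample space!
• We “standardize” the probability of A with the probability of B:
$P(A \mid B) = P(A \cap B) / P(B)$
[Figure: overlapping events A and B, with the overlap shaded maroon] The maroon area takes up
more space in B than it does in S… the probability of A is higher
if we know B happened!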
Example
• Say we pulled a ball out and didn’t put it back in. This will change
the probability if we pull out another ball!

Color        Orange  Maroon  Blue  Green
Probability   6/17    5/17   5/17  1/17

• Suppose the first ball is orange. There are now only 16 balls in the bowl!

Color        Orange  Maroon  Blue  Green
Probability   5/16    5/16   5/16  1/16

$P(O_2 \mid O_1) = 5/16 = 0.3125 < 0.353 \approx 6/17$
($O_2$ is shorthand for “Orange ball, 2nd pull”)
Example
• Starting again from the full bowl:

Color        Orange  Maroon  Blue  Green
Probability   6/17    5/17   5/17  1/17

$P(M_2 \mid O_1) =$
$P(O_2 \mid M_1) =$
$P(G_2 \mid G_1) =$
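Conditioning on the first pull simply means recounting the bowl with one ball removed. The Python sketch below does that recount with exact fractions; `p_second_given_first` is a hypothetical helper name.

```python
from fractions import Fraction

bowl = {"orange": 6, "maroon": 5, "blue": 5, "green": 1}   # 17 balls


def p_second_given_first(second, first):
    """P(second color on pull 2 | first color on pull 1), without replacement."""
    remaining = dict(bowl)
    remaining[first] -= 1   # the first ball is not put back
    return Fraction(remaining[second], sum(remaining.values()))


p_o2_given_o1 = p_second_given_first("orange", "orange")
p_m2_given_o1 = p_second_given_first("maroon", "orange")
p_o2_given_m1 = p_second_given_first("orange", "maroon")
p_g2_given_g1 = p_second_given_first("green", "green")

print("P(O2 | O1) =", p_o2_given_o1)
print("P(M2 | O1) =", p_m2_given_o1)
print("P(O2 | M1) =", p_o2_given_m1)
print("P(G2 | G1) =", p_g2_given_g1)
```

The last probability is 0 because the only green ball is gone after the first pull.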
Independent events
• Independent events: two events are independent if the occurrence (or
non-occurrence) of one event does not change the probability of the other
event occurring
$P(A \mid B) = P(A \mid B^c) = P(A)$
Conditioning on B or its complement does not change the
probability of A occurring
Independent events
• This actually gives us a quick way to calculate $P(A \cap B)$
• If A and B are independent we have
$P(A \mid B) = P(A \cap B) / P(B) = P(A)$
Multiply by P(B) on both sides to get
$P(A \cap B) = P(A) P(B)$
We don’t need to find the intersection,
just multiply the two probabilities!
Example
• A good free-throw percentage is 90%
• Assume the probability of making a free-throw on one
shot is 0.90
• Event = make two shots in a row
• Let’s assume the second shot is independent of the first
A = Make 1st, B = Make 2nd

First  Second  Probability
Miss   Make    P(Miss 1st)P(Make 2nd) = 0.10*0.90 = 0.09
Miss   Miss    P(Miss 1st)P(Miss 2nd) = 0.10*0.10 = 0.01
Make   Make    P(Make 1st)P(Make 2nd) = 0.90*0.90 = 0.81   (this is the event we want)
Make   Miss    P(Make 1st)P(Miss 2nd) = 0.90*0.10 = 0.09

• The two outcomes Make-Make and Make-Miss make up A
• The two outcomes Miss-Make and Make-Make make up B
• Event = $A \cap B$, the single outcome Make-Make
• Why do I care about this? Because A and B are independent!
$P(A) = P(B) = 0.90$, $P(A^c) = P(B^c) = 0.10$

First  Second  Probability
Miss   Make    P(A^c)P(B)   = 0.10*0.90 = 0.09
Miss   Miss    P(A^c)P(B^c) = 0.10*0.10 = 0.01
Make   Make    P(A)P(B)     = 0.90*0.90 = 0.81
Make   Miss    P(A)P(B^c)   = 0.90*0.10 = 0.09
Mutually Exclusive vs Independence
• It’s easy to confuse mutually exclusive with independent, but these
are very different concepts
• Say we have two mutually exclusive events A and B, each with
positive probability
$A \cap B = \emptyset$, so $P(A \cap B) = 0$ and hence $P(A \mid B) = 0$
To be independent we need $P(A \mid B) = P(A)$. But $P(A) > 0$!
• These events will never be independent!
Rules of probabilities
• One way to calculate probabilities of events created from
unions/intersections/complements is to first find all the outcomes
in that event and then find the probability
$P(A \cup B) = P(C)$
Useful if we know what C looks like
• Could use probabilities of the individual events in some cases
• Addition Rule: Unions
• Complement Rule: Complements
• Multiplication Rule: Intersections
Multiplication Rule
• Recall: When two events were independent we could write
$P(A \cap B) = P(A) P(B)$
• This was because $P(A) = P(A \mid B)$; in general this isn’t true!
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, so $P(A \cap B) = P(A \mid B) P(B)$
Multiplication Rule
• There are two ways to write this
$P(A \cap B) = P(A \mid B) P(B)$
$P(A \cap B) = P(B \mid A) P(A)$
• Called the multiplication rule
• This gives us a useful way to calculate probabilities of
intersections… assuming we know the conditional probability
Example
• You pull out two balls in a row (not putting them back in)

Color        Orange  Maroon  Blue  Green
Probability   6/17    5/17   5/17  1/17

$P(\text{Two Orange Balls}) =$
$P(G_1 \cap M_2) =$
$P(B_1 \cap O_2) =$
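These three probabilities follow from the multiplication rule in the form P(first ∩ second) = P(first) P(second | first). A minimal Python sketch with exact fractions, where `p_first_then_second` is a hypothetical helper name:

```python
from fractions import Fraction

bowl = {"orange": 6, "maroon": 5, "blue": 5, "green": 1}   # 17 balls
total = sum(bowl.values())


def p_first_then_second(first, second):
    """P(first color on pull 1 AND second color on pull 2), without replacement."""
    p_first = Fraction(bowl[first], total)
    remaining = dict(bowl)
    remaining[first] -= 1   # the first ball is not put back
    p_second_given_first = Fraction(remaining[second], total - 1)
    return p_first * p_second_given_first   # multiplication rule


p_two_orange = p_first_then_second("orange", "orange")
p_g1_and_m2 = p_first_then_second("green", "maroon")
p_b1_and_o2 = p_first_then_second("blue", "orange")

print("P(two orange balls) =", p_two_orange)
print("P(G1 and M2) =", p_g1_and_m2)
print("P(B1 and O2) =", p_b1_and_o2)
```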
Example
• Randomly pull out five cards from a deck of 52 and want to know
how likely you are to get a royal flush.
• There are four possibilities (one per suit) in the sample space of all
possible 5-card combinations
• They are equally likely, so just do one
and multiply by 4
• Start by picking one suit
Example
• Idea: Find the probability for a royal flush of Hearts, then multiply by
4 (the other suits have the same probability)
• We’ll need to repeatedly use the multiplication rule
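The repeated use of the multiplication rule can be sketched in Python as follows. This is one possible way to organize the calculation, assuming a standard 52-card deck and five draws without replacement; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# Royal flush of hearts: every draw must be one of the needed hearts
# (10, J, Q, K, A) that has not been drawn yet, with no replacement.
p_hearts = Fraction(1)
for k in range(5):
    needed_left = 5 - k   # royal-flush hearts still in the deck
    cards_left = 52 - k   # cards still in the deck
    # P(this draw is good | all earlier draws were good)
    p_hearts *= Fraction(needed_left, cards_left)

p_royal_flush = 4 * p_hearts  # four suits, mutually exclusive and equally likely
print(p_royal_flush)          # 1/649740

# Cross-check by counting: 4 favorable hands out of C(52, 5) equally likely hands.
assert p_royal_flush == Fraction(4, comb(52, 5))
```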
Addition Rule
• Remember Axiom 3: if A and B are mutually exclusive events, then
$P(A \cup B) = P(A) + P(B)$
• Here we don’t need to figure out C, we just add the probabilities!
• What happens if they aren’t mutually exclusive?
[Figure: Venn diagrams of two overlapping events A and B, with the regions for $P(A \cup B)$, $P(A)$, and $P(B)$ shaded]
Addition Rule
• If we just add P(A) and P(B) we get too much area!
[Figure: shaded Venn diagrams showing that $P(A) + P(B) \neq P(A \cup B)$ when A and B overlap]
Idea: Maybe we can just subtract off the extra area that we added. But what is this extra area?
Addition Rule
• This extra area is the intersection of A and B! This gives us the
following addition rule:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• Note that if A and B are mutually exclusive the intersection goes away!
• This is only useful if we know $P(A \cap B)$
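A quick way to convince yourself of the addition rule is to count outcomes directly and compare. The sketch below uses a hypothetical example that is not from the slides: one roll of a fair six-sided die, with A = "even" and B = "greater than 3".

```python
from fractions import Fraction

# Hypothetical example (not from the slides): one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

direct = prob(A | B)                        # count the outcomes in the union
by_rule = prob(A) + prob(B) - prob(A & B)   # addition rule
print(direct, by_rule)                      # 2/3 2/3
```

Adding `prob(A)` and `prob(B)` alone would give 1, because the outcomes 4 and 6 would be counted twice.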
Example
[Figure: Venn diagram of events A, B, and C with region probabilities 0.04, 0.30, 0.13, 0.03, 0.02, 0.04, 0.37, 0.07]
$P(A) =$
$P(B) =$
$P(A \cap B) =$
$P(A \cup B) =$
Complement Rule
• Recall: any event and its complement account for all outcomes in S
$S = A \cup A^c$
• What do we know about these events and their probabilities?
$P(S) = 1$ and $P(A \cup A^c) = P(A) + P(A^c)$
• Since these two events are equivalent, their probabilities must be equal!
$P(S) = P(A \cup A^c) \quad\Longrightarrow\quad 1 = P(A) + P(A^c)$
Complement Rule
• Doing simple algebra, we get the probability of the complement
$P(A^c) = 1 - P(A)$
• This is called the complement rule
• If we know the probability of an event we can easily find out the
probability of its complement
• If we know the probability of the complement of an event, we can
find out the probability of the event.
Most useful if an event is “large” but
the complement event is “small”
Examples
• There are around 60 students in this class. What’s the probability
that two or more people have the same birthday?
• Ignore leap year, so 365 possible birthdays
• Total # of different combinations of 60 birthdays?
$365 \times 365 \times \cdots \times 365 = 365^{60}$ (60th power!)
Examples
• How can we possibly count every single combination where at least
two people had the same birthday!?
• Idea: Let’s count the complement instead!
• Complement Event: No two people have the same birthday
$365 \times 364 \times 363 \times \cdots \times (365 - 60 + 1)$
Examples
• Divide these two quantities and subtract from 1
$P(A) = 1 - \frac{365 \times 364 \times \cdots \times (365 - 60 + 1)}{365^{60}} = 0.994$
• What about a group of 23 people?
$P(A) = 1 - \frac{365 \times 364 \times \cdots \times (365 - 23 + 1)}{365^{23}} = \ldots > 0.5$
• It is almost guaranteed that two of you have the same birthday!
• The complement rule allowed us to make this problem a lot easier!
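The birthday calculation is easy to reproduce numerically. A minimal Python sketch, assuming (as on the slide) 365 equally likely birthdays and no leap years; the function name `prob_shared_birthday` is an illustrative choice.

```python
def prob_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), all days equally likely."""
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays that are already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different  # complement rule

print(round(prob_shared_birthday(60), 3))  # 0.994
print(round(prob_shared_birthday(23), 3))  # 0.507
```

Multiplying the ratios one at a time avoids forming the very large numbers in the numerator and in 365 to the 60th power.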
Let’s Make A Deal (hosted by Monty Hall, 1960s to 1970s)
• There are three doors, two have a goat behind them, one has a new
car!
• You choose a door and the game show host (who knows what’s
behind the doors) opens one of the other doors with a goat behind it
• You have the option to change your answer. Should you? Does it matter?
• Strategy 1: You think you’re being tricked! Never change your answer.
• P(Winning Car) = 1/3
• Strategy 2: Always change your answer.
First Choice    Car             Goat 1   Goat 2
Known           Goat 1 (or 2)   Goat 2   Goat 1
Second Choice   Goat 2 (or 1)   Car      Car

$P(\text{Car} \mid \text{Goat Shown, Change Answer}) = \frac{2}{3}$
The information, which seemed useless, doubles your chances of getting the car!
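If the 1/3 versus 2/3 answer is hard to believe, a simulation can help. The following Python sketch plays the game many times under the rules stated above (the host knows where the car is and always opens a goat door that you did not pick). The function names, the number of trials, and the fixed seed are illustrative assumptions.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    first_choice = rng.choice(doors)
    # The host knows where the car is and opens a goat door the player did not pick.
    opened = rng.choice([d for d in doors if d != first_choice and d != car])
    if switch:
        # Move to the single door that is neither the first choice nor the opened one.
        final_choice = next(d for d in doors if d != first_choice and d != opened)
    else:
        final_choice = first_choice
    return final_choice == car

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("never change: ", win_rate(switch=False))  # close to 1/3
print("always change:", win_rate(switch=True))   # close to 2/3
```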
Bayes’ Rule
• Use to calculate the “reverse” probability
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid A^c) P(A^c)}$
• Tree diagrams/Venn Diagrams
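Bayes' rule translates directly into a few lines of code. The sketch below is a Python illustration with made-up numbers for a diagnostic test (A = has the disease, B = test is positive). The function name `bayes` and all three input probabilities are assumptions chosen for the example, not values from the course.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A | B) from P(B | A), P(A), and P(B | A^c) using Bayes' rule."""
    p_not_a = 1 - p_a                                    # complement rule
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a  # denominator: P(B)
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: A = has the disease, B = test is positive.
posterior = bayes(
    p_b_given_a=Fraction(95, 100),     # P(+ | disease), assumed
    p_a=Fraction(1, 100),              # P(disease), assumed
    p_b_given_not_a=Fraction(5, 100),  # P(+ | no disease), assumed
)
print(posterior, float(posterior))  # 19/118, about 0.161
```

With these assumed inputs, a positive result still leaves the probability of disease fairly low, because the disease is rare to begin with.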
Summary
• Probability of an event: the relative frequency with which that event
can be expected to occur
• Understand the relationship between sample spaces and events
• Event operations may be used to combine multiple events
• Know the three axioms of probability
• The size of the sample space determines how we assign probabilities
• Conditional probabilities “reduce the sample space”
• Mutually exclusive versus Independence
• Multiplication, Addition, and Complement rules make calculating
probabilities easier
• Bayes’ Rule can be used to calculate the “reverse” probability
Random Variables and Distributions

Probability as Function
I We originally defined probability as a specific function of events:
$P(A) = Y$
I In some cases these probabilities were intuitive and we did not
really need to think of it as being a function. For example, flip
a coin to see which side faces up.
I But in the world of large sample spaces, it is impossible to
just know or manually assign a probability to every outcome.
For example, weight of newborns.

Random Variable
I If our outcomes are numbers, we can create a function to
assign probabilities to all outcome numbers in the sample
space.
I Not every sample space will be made up of numerical
outcomes. For example, flip a coin (H/T), draw different
colored balls
I It is mathematically convenient to associate with every
outcome some numerical value.
I Random variable: a variable or a function that assigns a
unique numerical value to every outcome in the sample space,
S.
I If s is an outcome in S, then we can think of a random
variable this way:
X(s) = Real Number

Random Variable
I If we flip a coin once, the possible outcomes are Head or Tail.

I What is a random variable we can use?
$X(s) = \begin{cases} 1, & \text{if Head} \\ 0, & \text{if Tail} \end{cases}$
I Although a random variable is a function, we often drop the
function notation and write X instead of X(s).
I A random variable is usually denoted by a capital letter, such
as, X, Y, Z.
I An observed value of X (not random) is denoted by lower
case x. (We call it a realization of X.)

Type of Random Variables
I Random variables are always quantitative.

I We classified numerical data as discrete or continuous;
the same classification can be applied to random variables.
I Discrete random variable: the possible values are ____ along
the real line.
I Continuous random variable: the possible values form ____
along the real line.
a. finite, infinite;
b. discrete, an interval or intervals;
c. countable, uncountable.
I Probability function: a function that assigns probability to the
values of a random variable. The type of random variable
determines what type of probability function we can use.

Probability Distribution of a Discrete R.V.
I Often we use a table to present the probability distribution of a
discrete R.V.
I Ex. Let X = the number of heads in tossing a fair coin. The
probability distribution of X is
I Ex. Let X = the number of heads in tossing a fair coin
twice...
I Ex. Say we have a R.V. with 4 possible values: {1, 2, 3, 4} and
we assign probability P (X = x) = x/10, for x = 1, 2, 3, 4.
I Besides listing probabilities of every possible outcome in a
table, one may also display the probability distribution using a
formula as above. Furthermore, ...
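The coin-tossing example shows how a probability distribution follows from a random variable defined on a sample space. A minimal Python sketch for two tosses of a fair coin, assuming the four outcomes are equally likely; the names `sample_space`, `X`, and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for tossing a fair coin twice: HH, HT, TH, TT (equally likely).
sample_space = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

# Count how many outcomes map to each value of X, then divide by the total.
counts = Counter(X(s) for s in sample_space)
pmf = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)   # prints 0 1/4, then 1 1/2, then 2 1/4
```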

Distribution Functions for Discrete R.V.
I Probability Density Function (pdf) for a discrete r.v. (It is
also called probability mass function (pmf))
f (x) := P (X = x)
Following the definition of probability, f satisfies
1. $\sum_i f(x_i) = 1$;
2. $0 \le f(x_i) \le 1$.
I Remark: It is very important to list all possible values for x or
specify the range of x in stating a pdf.
I The set of all possible values of a random variable is called its support.
I Cumulative Distribution Function (cdf) for a discrete r.v.
F (x) := P (X ≤ x)
Following the definition of probability, F satisfies
1. F (−∞) = 0, F (∞) = 1;
2. F is monotone increasing (that is, nondecreasing);
3. F is right continuous.
Distribution Function for Discrete R.V.: Plots
Recall: Ex. Say we have a R.V. with 4 possible values: {1, 2, 3, 4}
and we assign probability P (X = x) = x/10, for x = 1, 2, 3, 4.
Can you plot the pmf and cdf of the random variables on the
previous slide?
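Before plotting, it helps to tabulate the values. The sketch below computes the pmf and cdf for the example P(X = x) = x/10 in Python, using exact fractions. It only prints the numbers; drawing the plots is left to the plotting tool of your choice.

```python
from fractions import Fraction
from itertools import accumulate

support = [1, 2, 3, 4]
pmf = {x: Fraction(x, 10) for x in support}   # f(x) = P(X = x) = x/10
assert sum(pmf.values()) == 1                 # a valid pmf sums to 1

# F(x) = P(X <= x): running total of the pmf over the ordered support.
cdf = dict(zip(support, accumulate(pmf[x] for x in support)))

for x in support:
    print(x, pmf[x], cdf[x])
# 1 1/10 1/10
# 2 1/5 3/10
# 3 3/10 3/5
# 4 2/5 1
```

The cdf is a step function: it is flat between support points and jumps by f(x) at each x in the support.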

Q2: Calculation using cdf or pdf
Find the value of k so that the following function is a probability
distribution of a r.v. X: f (x) = k(x + 2), where x = 0, 1, 2, 3.
a. 1/6
b. 1/14
c. 1/8
d. 1/20
For this probability distribution, find F (2) .
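One way to check your answer to Q2 is to use the fact that the probabilities must sum to 1. The following Python sketch is a hedged illustration of that reasoning; the variable names are made up for the example.

```python
from fractions import Fraction

support = [0, 1, 2, 3]
weights = {x: x + 2 for x in support}   # f(x) is proportional to x + 2

# Probabilities must sum to 1, so k = 1 / (sum of the weights).
k = Fraction(1, sum(weights.values()))
pmf = {x: k * w for x, w in weights.items()}

F_2 = sum(p for x, p in pmf.items() if x <= 2)   # F(2) = P(X <= 2)
print(k, F_2)                                    # 1/14 9/14
```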

Distribution Functions for Continuous R.V.
As for a discrete r.v., one can also define the cdf and pdf for a
continuous r.v.
Consider randomly dropping a point on the interval [0, 1]. Let
X = distance between the point and the origin, which is a
continuous r.v. and can take any value between 0 and 1.
I Cumulative Distribution Function for a continuous r.v.:
F (x) := P (X ≤ x), x ∈ R.
I Probability Density Function (pdf) for a continuous r.v.:
$f(x) := \frac{dF(x)}{dx}, \quad x \in \mathbb{R}$.
I Replacing $\sum$ by $\int$, the properties of cdf and pdf still hold.
I We will talk more about pdf for continuous r.v. later.
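For the point-dropping example, the cdf and pdf can be written down and checked numerically. The sketch below assumes "randomly" means uniformly at random on [0, 1], so that F(x) = x on that interval; this is an assumption, and the function names `F` and `f_numeric` are illustrative.

```python
def F(x):
    """cdf of X, assuming the point is dropped uniformly at random on [0, 1]."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x

def f_numeric(x, h=1e-6):
    """Approximate the pdf f(x) = dF(x)/dx with a central difference."""
    return (F(x + h) - F(x - h)) / (2 * h)

print(F(0.3))                    # 0.3
print(round(f_numeric(0.3), 6))  # 1.0
print(F(0.7) - F(0.2))           # P(0.2 < X <= 0.7), about 0.5
```

Under this assumption the pdf is constant (equal to 1) inside the interval, and probabilities of intervals are differences of the cdf.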

Parameters of Distributions

Motivating Example: Drawing Colored Balls
Recall the example of drawing colored balls from a bowl. We
had 17 balls in total: 5 Maroon, 6 Orange, 5 Blue, 1 Green. Now
let’s assume each color of ball is worth a different amount of
money. Say $1 for Maroon balls, $2 for Orange balls, $3 for Blue
balls, and $5 for the Green ball. Then how much will I win if I
randomly pull out a ball?
I Of course, the amount of money I win is random.
I Then, “on average”, how much will I win? What is the
expectation of my award?
(1 + 2 + 3 + 5)/4? or (1 × 5 + 2 × 6 + 3 × 5 + 5 × 1)/17?
I Furthermore, how precise is this expectation? What is the
variance of my award?

Mean /Expectation/Expected Value
(Population) Mean is a measure of central tendency of the
population, denoted by µ.
It is also referred to as Expectation or Expected value of the
distribution (or of the random variable), denoted by E(X).
$\mu = E(X) := \sum_i x_i f(x_i) = \sum_{\text{all values}} x_i P(X = x_i)$
I So expectation is a weighted average of all possible values
that a random variable can take, with weight being the
probability of taking that value.
I Mean/Expectation/Expected value of a continuous r.v. can
be calculated in the same fashion, replacing $\sum$ by $\int$.
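Applied to the colored-ball example, the definition gives the weighted average rather than the plain average of the four dollar amounts. A minimal Python sketch, using the ball counts and dollar values from the motivating example; the variable names are illustrative.

```python
from fractions import Fraction

# Award in dollars -> number of balls with that award (17 balls in total).
balls = {1: 5, 2: 6, 3: 5, 5: 1}   # Maroon, Orange, Blue, Green
total = sum(balls.values())

pmf = {x: Fraction(n, total) for x, n in balls.items()}   # f(x) = P(X = x)
mean = sum(x * p for x, p in pmf.items())                 # E(X) = sum of x * f(x)

print(mean, float(mean))            # 37/17, about 2.18
print(Fraction(1 + 2 + 3 + 5, 4))   # 11/4: the unweighted average ignores the counts
```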

Variance
(Population) Variance is a measure of spread of the population,
denoted by $\sigma^2$ or $Var(X)$. Precisely,
$\sigma^2 = Var(X) := E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 f(x_i)$
I Variance measures the average squared distance away from the
center of the population.
I It is a special expectation.
I The population standard deviation is simply
$\sigma = SD(X) = \sqrt{Var(X)}$.
I Variance of a continuous r.v. can be calculated in the same
fashion, replacing $\sum$ by $\int$.
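The same colored-ball example can be used to compute a variance and a standard deviation. The following Python sketch applies the definition directly, using the ball counts and dollar values from the motivating example; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Award in dollars -> number of balls with that award (17 balls in total).
balls = {1: 5, 2: 6, 3: 5, 5: 1}
total = sum(balls.values())
pmf = {x: Fraction(n, total) for x, n in balls.items()}

mu = sum(x * p for x, p in pmf.items())                     # mean, 37/17
variance = sum((x - mu) ** 2 * p for x, p in pmf.items())   # E[(X - mu)^2]
sd = sqrt(variance)                                         # standard deviation

print(variance)       # 314/289
print(round(sd, 3))   # 1.042
```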

Linear Transformation of Random Variable
Let’s consider a linear transformation of a random variable:
Y = a + bX. Denote the pdf, cdf, mean and variance of X by
$f_X$, $F_X$, $\mu_X$ and $\sigma_X^2$, respectively. Can you use them to derive
information for Y? (pdf, cdf, mean and variance of Y)
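For the mean and variance, the standard results are $E(a + bX) = a + b\mu_X$ and $Var(a + bX) = b^2 \sigma_X^2$. The Python sketch below checks both numerically for a discrete X. It uses the ball-award distribution from the motivating example and hypothetical constants a = 10 and b = 2 chosen only for illustration; it does not address the pdf or cdf of Y.

```python
from fractions import Fraction

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# pmf of X: the ball-award distribution from the motivating example.
pmf_x = {1: Fraction(5, 17), 2: Fraction(6, 17), 3: Fraction(5, 17), 5: Fraction(1, 17)}

a, b = 10, 2   # hypothetical constants, chosen only for illustration
# Y = a + bX takes the value a + b*x with the same probability as X = x (b != 0).
pmf_y = {a + b * x: p for x, p in pmf_x.items()}

assert mean(pmf_y) == a + b * mean(pmf_x)            # E(Y) = a + b E(X)
assert variance(pmf_y) == b ** 2 * variance(pmf_x)   # Var(Y) = b^2 Var(X)
print(mean(pmf_y), variance(pmf_y))                  # 244/17 1256/289
```

The shift a moves the center but does not change the spread, which is why it drops out of the variance.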
