
Midterm Review

Dr. Shoemaker

STAT 5301/MATH 5310

Chapter 1 - Descriptive stats


Mean, median, mode! Sample variance and sample standard deviation!
x <- 1:15
mean(x) # not the same as the expected value!

## [1] 8

median(x)

## [1] 8

var(x) # SAMPLE variance

## [1] 20

sd(x) # SAMPLE standard deviation

## [1] 4.472136
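R has no built-in function for the statistical mode (`mode()` reports an object's storage type instead), so a small helper is needed; `stat_mode` is a name introduced here for illustration, not part of base R:

```r
# Tabulate the values and return the most frequent one(s)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

stat_mode(c(1, 2, 2, 3, 3, 3))

## [1] 3
```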

Chapter 2 - Probability
Terms:
• Sample space is the set of all the outcomes in an experiment
• Outcome is the result of an experiment
• Event is one or several outcomes in an experiment
• Complement is “everything else” in the sample space.
• Union, ∪, of two events is everything in one or both of those events
• Intersection, ∩, of two events is everything in both of those events.

Probability of Complements, Unions, and Intersects


P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

P(A) + P(Aᶜ) = 1 ⟹ P(Aᶜ) = 1 − P(A)


A and B are independent if P(A) · P(B) = P(A ∩ B)

The Multiplication Rule and Conditional Probability


P(A ∩ B) = P(A | B) P(B)
The conditional probability P(A | B) indicates what knowing B tells us about the
probability of A. We compute it as:
P(A | B) = P(A ∩ B) / P(B)
Note this is just a re-writing of the above rule…

Example
A student goes to the library. Let events:
• B = the student checks out a book
• D = the student checks out a DVD
Suppose that:
• P(B) = 0.40
• P(D) = 0.30
• P(D|B) = 0.5.
Find P(B AND D):
Find P(B OR D):
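Filling in the blanks with the multiplication rule and the addition rule:

```r
p_B <- 0.40          # P(B)
p_D <- 0.30          # P(D)
p_D_given_B <- 0.5   # P(D|B)

p_B_and_D <- p_D_given_B * p_B       # multiplication rule
p_B_or_D  <- p_B + p_D - p_B_and_D   # addition rule

p_B_and_D

## [1] 0.2

p_B_or_D

## [1] 0.5
```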

Chapter 3 - Discrete RVs


Random variables! Random variables have possible outcomes, and we have probability
maps for those outcomes, called pmfs (or pdfs, for continuous).
A pmf gives the probability that the random variable takes on a given value. A cdf gives the
probability that the random variable takes on any value up to and including that value.
Suppose our RV is defined by the following pmf:

X       1     2     3
P(X)   1/2   1/4   1/4

The value of the pmf of X at 2 is 1/4, and the value of the cdf of X at 2 is 3/4:

P(X = 2) = 1/4    P(X ≤ 2) = ∑ᵢ₌₁² P(X = i) = 3/4
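In R we can store the pmf as a vector and get the cdf with `cumsum()`; the probabilities 1/2, 1/4, 1/4 are the ones implied by the pmf and cdf values quoted above:

```r
p <- c(1/2, 1/4, 1/4)   # pmf over X = 1, 2, 3
cumsum(p)               # cdf: P(X <= 1), P(X <= 2), P(X <= 3)

## [1] 0.50 0.75 1.00
```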

Families: Binomial, Poisson.


Probability families are a way to concisely define a pmf without having to define it row by
row like above.
For these families:
• We need to define the values of the parameters that family uses
– Binomial needs n and p
• We have a formula for the pdf for any given value of X, and formulas for the expected
value and standard deviation of the random variable.
– Binomial expected value is n*p
• We can use R to compute the pdfs (dpois) and the cdfs (pbinom) for probabilities for
our families
An example: Y ∼ Binom(5, 0.2)
Find:
P(Y = 2)    P(Y < 4)
dbinom(2, 5, .2)

## [1] 0.2048
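For P(Y < 4), note that for a discrete RV P(Y < 4) = P(Y ≤ 3), so we use the cdf at 3:

```r
pbinom(3, 5, .2)

## [1] 0.99328
```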

Chapter 4 - Continuous RVs


This is just like the discrete, except now our cdf is an integral instead of a sum!

Normal, student-t,…
So many families. The same logic applies as above:
• We need to define the values of the parameters that family uses
– Normal needs μ and σ
• We have a formula for the pdf for any given value of X, and formulas for the expected
value and standard deviation of the random variable.
– Normal expected value is just μ
• We can use R to compute the pdfs (dnorm) and the cdfs (pexp) for probabilities for our
families
Suppose X ∼ Beta(2, 2)
Find:
• P(X < 0.2)
• P(0.2 < X < 0.7)
pbeta(0.2, 2, 2)

## [1] 0.104

pbeta(0.7, 2, 2) - pbeta(0.2, 2, 2)

## [1] 0.68

Chapter 5 - Sampling Distributions


We learned how to find the distribution of the sample mean.
For a random variable X with mean μ, IF the variance σ² is known:

X̄ ∼ N(μ, σ/√n)
This is also true for unknown variance when the samples are large, because of the central
limit theorem.
For a random variable X with mean μ and unknown variance, we estimate the sample
variance s² and our standardized sample mean follows the student-t distribution with n − 1
degrees of freedom:

t = (X̄ − μ) / (s/√n) ∼ t_{n−1}

Example:
If X has a normal distribution with mean 222 and standard deviation 2.2, what is the
probability that the sample mean is between 220 and 224 when n = 22?
X̄ ∼ N(222, 2.2/√22)
mu <- 222     # avoids shadowing base R's mean() and sd()
sigma <- 2.2
n <- 22

pnorm(224, mu, sigma/sqrt(n)) - pnorm(220, mu, sigma/sqrt(n))

## [1] 0.9999799

Chapter 6 - point estimates


You estimate μ with x̄ and you estimate σ with s.
Chapter 7 - Confidence Intervals!
Point estimates only go so far!
• Sample 1: 48, 49, 51, 52
• Sample 2: 50, 50, 50.1, 49.1
• Sample 3: 0, 100
All of these have means of 50, so if I’m trying to estimate the true population mean, these
would all have the same point estimate.
Confidence intervals give you an interval estimate around the point estimate that gives an
idea about your… confidence.
The interval is the point estimate plus or minus an error bound, where the error bound is
the critical value for your distribution times the standard error of your point estimate.

Example: the sample mean


We start with our sample mean's distribution, X̄ ∼ N(μ, σ/√n) (for large samples, or with
known variance).
So, we know that:

P( −z_{α/2} ≤ (X̄ − μ)/(σ/√n) < z_{α/2} ) = 1 − α

Which we convert into a 100(1 − α)% confidence interval for μ:

( x̄ − z_{α/2} · (σ/√n),  x̄ + z_{α/2} · (σ/√n) )
We can find these critical values in R:


alpha <- 0.05
qnorm(alpha/2, lower.tail = F)

## [1] 1.959964
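Putting the pieces together, a sketch of a z-interval (the sample mean, σ, and n here are made-up numbers for illustration):

```r
xbar  <- 50    # hypothetical sample mean
sigma <- 4     # hypothetical known population sd
n     <- 16
alpha <- 0.05

z <- qnorm(alpha/2, lower.tail = F)
c(xbar - z * sigma/sqrt(n), xbar + z * sigma/sqrt(n))

## [1] 48.04004 51.95996
```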

This is all similar for the student’s t, just using s instead of σ and the critical values for the t
distribution with n −1 degrees of freedom.
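A t-interval sketch, using Sample 1 from above (48, 49, 51, 52) as the data:

```r
x <- c(48, 49, 51, 52)
n <- length(x)
alpha <- 0.05

# t critical value with n - 1 degrees of freedom
t_crit <- qt(alpha/2, df = n - 1, lower.tail = F)
mean(x) + c(-1, 1) * t_crit * sd(x)/sqrt(n)
```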

Error size:
Our error bound is z_{α/2} · σ/√n, and if we want to control/ensure our error bound is a certain
size, the only thing we can control is the sample size!
So, to find the sample size needed for an error bound of size c, we just rearrange to get:

n = ( z_{α/2} · σ / c )²
And ALWAYS ROUND UP.
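For example (σ = 2.2 and a desired bound of c = 0.5 are made-up numbers for illustration):

```r
sigma <- 2.2
c_bound <- 0.5   # desired error bound size
alpha <- 0.05

z <- qnorm(alpha/2, lower.tail = F)
ceiling((z * sigma / c_bound)^2)   # ALWAYS round up

## [1] 75
```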
