
RE 397 Intro to RE Data Modeling

Lecture 7: Random Variable & Conf. Interval


Instructor: Feiyang Sun
Winter, 2022
Review
• Independence and mutually exclusive are two very different
concepts
- Mutually exclusive says the two events cannot occur together, that
is, they have no intersection
- Independence says each event does not affect the other event’s
probability

• Pr(A and B) = Pr(A) * Pr(B) when A and B are independent
- If Pr(A) and Pr(B) are both nonzero, then Pr(A and B) is nonzero
- Thus, independent events with nonzero probabilities have an
intersection

• Events with nonzero probabilities cannot be both mutually
exclusive and independent
- If two such events are independent, then they are not mutually exclusive
- If two such events are mutually exclusive, then they are not independent
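
The distinction can be checked numerically. The sketch below assumes a hypothetical experiment that is not part of the slides: one roll of a fair six-sided die, with events chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical experiment (an assumption): one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event (a set of outcomes) with equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
at_most_two = {1, 2}
odd = {1, 3, 5}

# Independent pair: Pr(A and B) equals Pr(A) * Pr(B), and the events intersect.
independent = pr(even & at_most_two) == pr(even) * pr(at_most_two)

# Mutually exclusive pair: no intersection, so Pr(A and B) = 0,
# which cannot equal Pr(A) * Pr(B) when both probabilities are nonzero.
exclusive = len(even & odd) == 0
product_rule_holds = pr(even & odd) == pr(even) * pr(odd)

print(independent, exclusive, product_rule_holds)
```

Here "even" and "at most two" share the outcome 2 and satisfy the product rule, while "even" and "odd" share no outcome and fail it.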
Random Variables

• Random Variable: A variable that assumes a unique
numerical value for each of the outcomes in the sample
space of a probability experiment

Notes:
- Used to denote the outcomes of a probability experiment
- Each outcome in a probability experiment is assigned to
a unique value
Discrete & Continuous Random Variables

• Discrete Random Variable: A quantitative random
variable that can assume a countable number of values

Note: Usually associated with counting

• Continuous Random Variable: A quantitative random
variable that can assume an uncountable number of
values

Note: Usually associated with a measurement


Probability Distribution

• Probability Distribution: A distribution of the
probabilities associated with each of the values of a
random variable. The probability distribution is a
theoretical distribution; it is used to represent
populations.

Notes:
- The probability distribution tells you everything you need
to know about the random variable.
- The probability distribution may be presented in the form
of a table, chart, function, etc.



• Probability Function: A rule that assigns probabilities
to the values of the random variable
Probability Distribution

• Example: The number of people staying in a selected
room at a local hotel within a certain time period is a
random variable ranging in value from 0 to 4. The
probability distribution is known and is given in various
forms below:

x       0      1      2      3      4
P(x)    2/15   4/15   5/15   3/15   1/15

Probability Function: Pr(x = 2) = 5/15
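
As a minimal sketch, the hotel-room table can be stored as a lookup from each value of x to its probability. The names `hotel` and `pr` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Probability distribution from the hotel-room example: x -> P(x)
hotel = {
    0: Fraction(2, 15),
    1: Fraction(4, 15),
    2: Fraction(5, 15),
    3: Fraction(3, 15),
    4: Fraction(1, 15),
}

def pr(x):
    """Probability function: returns P(x), or 0 for values x cannot take."""
    return hotel.get(x, Fraction(0))

# A valid probability distribution sums to 1 over all values of x.
total = sum(hotel.values())

# Probabilities of compound events are sums over the matching values.
pr_at_least_2 = sum(p for x, p in hotel.items() if x >= 2)

print(pr(2), total, pr_at_least_2)
```

Exact fractions are used so that the check that the probabilities sum to 1 is not affected by floating-point rounding.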


Probability Distribution

[Figure: Hotel Room Probability Distribution, a bar chart of P(x) (vertical axis, 0.0 to 0.4) against x = 0, 1, 2, 3, 4]
Normal Probability Distribution

• A continuous random variable
• Symmetric around mean
• Normal probability distribution function
• The mean, median, and mode are equal
• The value x is unrestricted (-∞ < x < ∞)

[Figure: normal curve with the x-axis marked at μ−3σ, μ−2σ, μ−σ, μ, μ+σ, μ+2σ, μ+3σ]
Probability of Normal Distribution

$$P(a \le x \le b) = \int_a^b f(x)\,dx$$

[Figure: area under the normal curve f(x) between a and b, representing P(a ≤ x ≤ b)]
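
In practice the area between a and b is usually obtained from the cumulative distribution function rather than by integrating by hand. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name and the standard-normal defaults are assumptions made for illustration.

```python
from statistics import NormalDist

def normal_interval_probability(a, b, mean=0.0, sd=1.0):
    """P(a <= x <= b) for a normal random variable, computed as CDF(b) - CDF(a)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(b) - dist.cdf(a)

# Area within one and within two standard deviations of the mean.
within_one_sd = normal_interval_probability(-1, 1)
within_two_sd = normal_interval_probability(-2, 2)

print(round(within_one_sd, 4), round(within_two_sd, 4))
```

The two results are approximately 0.68 and 0.95, the familiar shares of a normal population lying within one and two standard deviations of the mean.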
Probability of Normal Distribution
Probability Distribution

• z-Score: The position a particular value of x has relative
to the mean, measured in standard deviations. The z-score
is found by the formula:

$$z = \frac{\text{value} - \text{mean}}{\text{st. dev.}} = \frac{x - \bar{x}}{s}$$
Probability Distribution

• Example: A certain data set has mean 35.6 and standard
deviation 7.1. Find the z-score for 46.

Solution:
$$z = \frac{x - \bar{x}}{s} = \frac{46 - 35.6}{7.1} \approx 1.46$$
46 is about 1.46 standard deviations above the mean
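
The same calculation as a short sketch; the function name `z_score` is an illustrative choice.

```python
def z_score(value, mean, st_dev):
    """Distance of a value from the mean, measured in standard deviations."""
    return (value - mean) / st_dev

# Data set from the example: mean 35.6, standard deviation 7.1, value 46.
z = z_score(46, 35.6, 7.1)
print(round(z, 2))
```

A positive z-score places the value above the mean and a negative one places it below.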
Probability of Normal Distribution
Confidence Intervals

Level of confidence    z
99%                    2.58
95%                    1.96
90%                    1.645
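
These z values can be recovered from the inverse of the standard normal cumulative distribution function. The sketch below assumes a two-sided interval, so the probability left outside the interval is split equally between the two tails; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(confidence):
    """z value that leaves (1 - confidence) / 2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.99, 0.95, 0.90):
    print(f"{level:.0%}: z = {critical_z(level):.3f}")
```

The computed values agree with the table once rounded (2.576 is commonly reported as 2.58).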
Probability Distribution

• Many methods for studying a normal population

• It is important to check for population normality before
conducting statistical analysis

• A Normal Probability Plot (NPP) is a graphical
examination of normality based on percentiles
Probability Distribution
Discrete Random Variable

• Binomial Probability Distribution: A type of discrete
distribution, also known as the Bernoulli random variable
distribution

• Based on a series of repeated trials whose outcomes can be
classified in one of two categories: success or failure

• Distribution based on a binomial probability experiment

[Photo: Jacob Bernoulli (1654-1705)]
Discrete Random Variable

• Binomial Probability Experiment: An experiment that is
made up of repeated trials that possess the following
properties:
1. There are n repeated independent trials
2. Each trial has two possible outcomes (success, failure)
3. Pr(success) = p, Pr(failure) = q, and p + q = 1
4. The binomial random variable x is the count of the
number of successful trials that occur; x may take on any
integer value from zero to n

Example: The number of food businesses closed during the
pandemic in each neighborhood.
Discrete Random Variable

• Example: It is known that 40% of all graduating seniors on


the campus of a very large university have taken a statistics
class. Five seniors are selected at random and asked if they
have taken a statistics class. This approximates a binomial
experiment:
1. A trial is asking one student, repeated 5 times. The trials are
independent since the probability of taking a statistics class for any
one student is not affected by the results from any other student.
2. Two outcomes on each trial: taken a statistics class (success), not
taken a statistics class (failure)
3. p = Pr(taken a statistics class) = 0.40
q = Pr(not taken a statistics class) = 0.60
4. x = number of students who have taken a statistics class
Discrete Random Variable

• What is the probability of obtaining x successes in n trials?

• Example: What is the probability of obtaining 2 heads


from a coin that was tossed 5 times?

P(HHTTT) = (1/2)^5 = 1/32



But there are more possibilities:

HHTTT HTHTT HTTHT HTTTH


THHTT THTHT THTTH
TTHHT TTHTH
TTTHH

P(2 heads) = 10 × 1/32 = 10/32
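
The count of ten sequences can be verified by brute-force enumeration. This is a minimal sketch that assumes a fair coin, so that all 32 sequences of five tosses are equally likely.

```python
from fractions import Fraction
from itertools import product

# All 2^5 = 32 equally likely sequences of five tosses, e.g. ('H', 'H', 'T', 'T', 'T').
sequences = list(product("HT", repeat=5))

# Keep only the sequences containing exactly two heads.
two_heads = [s for s in sequences if s.count("H") == 2]

p_two_heads = Fraction(len(two_heads), len(sequences))
print(len(two_heads), len(sequences), p_two_heads)
```

`Fraction` reduces 10/32 to 5/16, which is the same probability.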


Discrete Random Variable

• In general, if the trials result in a particular sequence of
successes and failures,

• then the probability of x successes in that order is:

$$P(x) = p^x \cdot q^{n-x}$$

p: probability of success
q: 1 - p, probability of failure
x: number of successes
n: number of trials
Discrete Random Variable

• For a binomial experiment, let p represent the probability of
a “success” and q represent the probability of a “failure” on a
single trial; then Pr(x), the probability that there will be
exactly x successes in n trials, is:

$$\Pr(x) = \binom{n}{x} p^x q^{n-x}, \quad \text{for } x = 0, 1, 2, \ldots, n$$

Notes:
- The number of ways that exactly x successes can occur in n
trials: $\binom{n}{x}$
Discrete Random Variable

• The number of ways that exactly x successes can occur in a
set of n trials is represented by the symbol $\binom{n}{x}$
1. Must always be a positive integer
2. Called the binomial coefficient
3. Found by using the formula:

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

Note:
n! is an abbreviation for n factorial: $n! = n(n-1)(n-2)\cdots(3)(2)(1)$
$6! = 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 720$
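
A short sketch of the binomial coefficient and the binomial probability function using Python's standard `math` module. The function names are illustrative; `math.comb` computes the same coefficient directly.

```python
from math import comb, factorial

def binomial_coefficient(n, x):
    """Number of ways exactly x successes can occur in n trials: n! / (x! (n - x)!)."""
    return factorial(n) // (factorial(x) * factorial(n - x))

def binomial_pr(x, n, p):
    """Probability of exactly x successes in n independent trials with Pr(success) = p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

print(factorial(6))                # 6! from the note above
print(binomial_coefficient(5, 2))  # ways to place 2 heads among 5 tosses
print(binomial_pr(2, 5, 0.5))      # coin example: 2 heads in 5 tosses
```

The last line reproduces the coin-toss result of 10/32 = 0.3125.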
Discrete Random Variable

• Example: According to a recent study, 65% of all homes in a
certain county have high levels of gas leaking into their
basements. Four homes are selected at random and tested
for the leak. The random variable x is the number of homes
with high levels of leaking (out of the four).
Properties:
1. There are 4 repeated trials: n = 4. The trials are independent.
2. Each test for a leak is a trial, and each test has two outcomes: leak
or no leak
3. p = Pr(leak) = 0.65, q = Pr(no leak) = 0.35, and p + q = 1
4. x is the number of homes with high levels of leaking; possible
values: 0, 1, 2, 3, 4

What is the probability of x = 0, 1, 2, 3, 4?


Discrete Random Variable

$$\Pr(x) = \binom{4}{x}(0.65)^x(0.35)^{4-x}, \quad \text{for } x = 0, 1, 2, 3, 4$$

$$\Pr(0) = \binom{4}{0}(0.65)^0(0.35)^4 = (1)(1)(0.0150) = 0.0150$$

$$\Pr(1) = \binom{4}{1}(0.65)^1(0.35)^3 = (4)(0.65)(0.0429) = 0.1115$$

$$\Pr(2) = \binom{4}{2}(0.65)^2(0.35)^2 = (6)(0.4225)(0.1225) = 0.3105$$

$$\Pr(3) = \binom{4}{3}(0.65)^3(0.35)^1 = (4)(0.2746)(0.35) = 0.3845$$
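
The full distribution for the four-home example, including x = 4, can be tabulated with a few lines. This is a sketch using the example's n = 4 and p = 0.65; the variable names are illustrative.

```python
from math import comb

n, p = 4, 0.65   # four homes tested, Pr(leak) = 0.65
q = 1 - p        # Pr(no leak) = 0.35

# Pr(x) = C(n, x) * p^x * q^(n - x) for every possible count of leaking homes.
probabilities = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

for x, prob in probabilities.items():
    print(f"Pr({x}) = {prob:.4f}")

# The five probabilities cover every possible outcome, so they should sum to 1.
print(f"total = {sum(probabilities.values()):.4f}")
```

Checking that the probabilities sum to 1 is a quick way to catch arithmetic slips in a hand calculation.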
Procedure of Learning about a Population

• Statistics: The science of collecting, describing, analyzing,
interpreting, displaying, and making decisions based on data

Sample Statistics

Population Parameters
Procedure of Learning about a Population

• So far:
• Define a population
• Describe the population parameter of concern
• Draw a sample
• Calculate the value of sample statistic

• Next: Make an inference


- Estimate the value of a population parameter
- Test a hypothesis: the value of a population parameter = the
value of a sample statistic
- Other hypotheses (≠, >, ≥, <, ≤)
Point Estimate for a Parameter

• Point estimate: the value of the corresponding sample statistic

• Example: 106 of 200 people in a sample, a proportion of 0.53,
support a sports stadium.

• 0.53 is a point estimate of the population proportion


Point Estimate for a Parameter

• Example: students in this class take on average 24.7
minutes to commute every day.

• x̄ = 24.7 minutes is a point estimate (single number value) of
the mean commuting time µ of the sampled population
(people at UW)

How good is the point estimate? Is it high? Or low?


Don’t know
Would another sample yield the same result?
Unlikely
Then what?
Interval estimation
Confidence Intervals

• Problem with point estimates: No information about the
uncertainty associated with the estimate

• Gives you no idea how close your sample mean is to the
population mean

• Confidence intervals: An interval of values, computed
from the sample, that is very likely to cover the true
population value.

• Confidence intervals: A 95% confidence interval around
the sample mean has a high probability of containing the
population mean.
Confidence Intervals

• What does “95%” mean?


- Interpretation: In 95% of the samples we take, the interval
computed from the sample will contain the true population
proportion (or mean).

- This is what we mean by saying we are 95% confident that
the true population proportion (or mean) is in the interval

Level of confidence: a probability that represents the
percentage of intervals that will contain the population
parameter if a large number of repeated samples are obtained
Confidence Intervals

• What does “95%” mean?


[Diagram: level of confidence]
Take repeated samples → Calculate a confidence interval for each sample
- 95% of intervals contain the population mean
- 5% of intervals do not contain the population mean
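
The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below rests on assumptions that are not from the slides: a normal population with mean 50 and known standard deviation 10, samples of size 30, 2000 repetitions, and an interval of the form x̄ ± z·σ/√n.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=50.0, sigma=10.0, n=30, repeats=2000, confidence=0.95, seed=1):
    """Share of repeated-sample intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(repeats):
        # Draw one sample and build its interval around the sample mean.
        sample_mean = mean([rng.gauss(mu, sigma) for _ in range(n)])
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / repeats

share_covering = coverage()
print(f"Share of intervals containing the population mean: {share_covering:.3f}")
```

The share should come out close to 0.95. Any single interval either contains the population mean or does not; the 95% describes the procedure over many samples.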
Confidence Intervals

• Level of confidence

• It is denoted as 100(1-α)%

α: significance level (tolerance level)


Confidence Intervals

• Level of confidence
• Typical levels: 90%, 95% and 99%

• A confidence level of 100(1-α)% implies that the intervals
computed from 100(1-α)% of all samples would include the true
value of the parameter estimated.

• The higher the confidence level, the more strongly we
believe that the true value of the parameter being estimated
lies within the interval.
Confidence Intervals

• The construction of a confidence interval for the population
mean depends upon three factors:
- The point estimate of the population mean
- The level of confidence
- The standard deviation of the sample mean

• One assumption:
- The sampling distribution of x̄ is normal
Confidence Intervals

• Suppose a simple random sample of size n is taken from a
population with unknown mean µ and known standard
deviation σ. A 1-α confidence interval for µ is:

$$\bar{x} - z(\alpha/2)\,\frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{x} + z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$$

Notes:
1. x̄ is the point estimate and the center point of the
confidence interval

2. z(α/2): confidence coefficient, the number needed to
construct an interval estimate of the correct width to
have a level of confidence 1-α
Confidence Intervals

1-α     α       z(α/2)
99%     0.01    2.58
95%     0.05    1.96
90%     0.1     1.645


Estimation of mean μ : (σ known)

• Step 1. Describe the population


- parameter of concern: mean, μ
• Step 2. Specify the confidence interval criteria
- Check the assumptions (normal distribution, large sample size)
- Determine the test statistics
- Specify level of confidence
• Step 3. Collect and present sample evidence
• Step 4. Determine the confidence interval
- Determine the confidence coefficient, z(α/2):
- Calculate the lower confidence limit (LCL) and upper confidence
limit (UCL)
• Step 5. Describe the results: Between LCL and UCL
Estimation of mean μ : (σ known)

• Example: How far does the average community-college
student commute to college each day?

• Data: One-way distance from a random sample of 100
commuting students

• Sample mean = 10.22 miles
• Level of confidence = 95%
• Standard deviation σ = 6 miles
Estimation of mean μ : (σ known)

• Step 1. Describe the population (mean)
• Step 2. Specify the confidence interval criteria (0.95)
• Step 3. Collect and present sample evidence
• Step 4. Determine the confidence interval
- Determine the confidence coefficient, z(α/2): 1.96
- LCL: 10.22 – 1.96 * 6/sqrt(100) = 9.04
- UCL: 10.22 + 1.96 * 6/sqrt(100) = 11.40
• Step 5. Describe the results: Between LCL and UCL

With 95% confidence we can say,
“The mean one-way distance is
between 9.04 and 11.40 miles”
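
The five steps reduce to a short calculation once σ is known. The sketch below uses the commuting example's numbers; the function name `z_interval` is a hypothetical helper, and the confidence coefficient is computed from the standard normal rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # confidence coefficient z(alpha/2)
    margin = z * sigma / sqrt(n)                         # z(alpha/2) * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Commuting example: sample mean 10.22 miles, sigma = 6, n = 100, 95% confidence.
lcl, ucl = z_interval(10.22, 6, 100)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```

Raising the confidence level widens the interval, while a larger sample narrows it, because the margin shrinks with the square root of n.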
Review

• Random Variables

• Probability Distribution of Random Variables


- Continuous Random Variable (Normal Distribution)
- Discrete Random Variable (Binomial Distribution)

• Procedure of Learning about a Population

• Confidence Intervals
