Lecture 7 Random Variable Confidence Interval

RE 397 Intro to RE Data Modeling
Lecture 7: Random Variable & Conf. Interval

Instructor: Feiyang Sun
Winter, 2022
Review
• Independence and mutual exclusivity are two very different
concepts
- Mutually exclusive says the two events cannot occur together, that
is, they have no intersection
- Independence says each event does not affect the other event’s
probability
• Pr(A and B) = Pr(A) * Pr(B) when A and B are independent
- Since Pr(A) and Pr(B) are not zero, Pr(A and B) is nonzero
- Thus, independent events have an intersection
• Events with nonzero probability cannot be both mutually exclusive
and independent
- If two events are independent, then they are not mutually exclusive
- If two events are mutually exclusive, then they are not independent
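
The two definitions above can be checked on a small sample space. The sketch below assumes one roll of a fair six-sided die; the die, the events and the helper names (`pr`, `is_independent`, `is_mutually_exclusive`) are illustrative and not part of the lecture.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die, all outcomes equally likely.
OUTCOMES = {1, 2, 3, 4, 5, 6}


def pr(event):
    """Probability of an event, given as a set of outcomes."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))


def is_independent(a, b):
    """Independent: Pr(A and B) equals Pr(A) * Pr(B)."""
    return pr(a & b) == pr(a) * pr(b)


def is_mutually_exclusive(a, b):
    """Mutually exclusive: the two events have no intersection."""
    return len(a & b) == 0


even = {2, 4, 6}        # Pr = 1/2
at_most_two = {1, 2}    # Pr = 1/3
low = {1, 2}            # Pr = 1/3
high = {5, 6}           # Pr = 1/3

# Independent events share an outcome (here, the outcome 2).
print(is_independent(even, at_most_two), is_mutually_exclusive(even, at_most_two))
# Mutually exclusive events with nonzero probability are not independent.
print(is_independent(low, high), is_mutually_exclusive(low, high))
```

Exact fractions are used so the equality test in `is_independent` is not affected by floating-point rounding.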
Random Variables
• Random Variable: A variable that assumes a unique
numerical value for each of the outcomes in the sample
space of a probability experiment
Notes:
- Used to denote the outcomes of a probability experiment
- Each outcome in a probability experiment is assigned to
a unique value
[Figure: illustration of a random variable]
Discrete & Continuous Random Variables
• Discrete Random Variable: A quantitative random
variable that can assume a countable number of values
Note: Usually associated with counting
• Continuous Random Variable: A quantitative random
variable that can assume an uncountable number of
values
Note: Usually associated with a measurement

Probability Distribution
• Probability Distribution: A distribution of the
probabilities associated with each of the values of a
random variable. The probability distribution is a
theoretical distribution; it is used to represent
populations.
Notes:
- The probability distribution tells you everything you need
to know about the random variable.
- The probability distribution may be presented in the form
of a table, chart, function, etc.

Probability Function: A rule that assigns probabilities
to the values of the random variable
• Example: The number of people staying in a selected
room at a local hotel within a certain time period is a
random variable ranging in value from 0 to 4. The
probability distribution is known and is given in various
forms below:
x      0     1     2     3     4
P(x)   2/15  4/15  5/15  3/15  1/15
Probability Function: Pr(x = 2) = 5/15
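
The hotel-room table can also be written as a small lookup, which makes it easy to confirm that the probabilities add up to 1. This is only a sketch: the names `hotel_room` and `probability` are made up for illustration, and the values are copied from the table above.

```python
from fractions import Fraction

# Probability table for the hotel-room example (values copied from the slide).
hotel_room = {
    0: Fraction(2, 15),
    1: Fraction(4, 15),
    2: Fraction(5, 15),
    3: Fraction(3, 15),
    4: Fraction(1, 15),
}


def probability(x):
    """Probability function: Pr(x) for a value x, 0 if x is not possible."""
    return hotel_room.get(x, Fraction(0))


total = sum(hotel_room.values())                 # must equal 1 for a valid distribution
p_at_most_one = probability(0) + probability(1)  # Pr(x = 0 or x = 1)

print(total, probability(2), p_at_most_one)
```

Note that `Fraction` reduces results automatically, so 5/15 prints as 1/3 and 6/15 as 2/5.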

[Figure: Hotel Room Probability Distribution, bar chart of P(x) from 0.0 to 0.4 against x = 0, 1, 2, 3, 4]
Normal Probability Distribution
• A continuous random variable
• Symmetric around the mean
• Normal probability distribution function
• The mean, median, and mode are equal
[Figure: bell curve with the x-axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
• The value x is unrestricted (−∞ < x < ∞)
Probability of Normal Distribution
$$P(a \le x \le b) = \int_a^b f(x)\,dx$$
[Figure: area under the normal curve between a and b, representing P(a ≤ x ≤ b)]
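
In practice the integral is evaluated as a difference of cumulative probabilities rather than by hand. A minimal sketch using `NormalDist` from Python's standard `statistics` module follows; the function name `normal_interval_probability` and the chosen intervals are assumptions for illustration.

```python
from statistics import NormalDist


def normal_interval_probability(a, b, mean=0.0, st_dev=1.0):
    """P(a <= x <= b) for a normal variable: the area under f(x) between a and b."""
    dist = NormalDist(mu=mean, sigma=st_dev)
    return dist.cdf(b) - dist.cdf(a)


# Area within one standard deviation of the mean (standard normal).
within_one_sd = normal_interval_probability(-1, 1)
# Area between -1.96 and 1.96 standard deviations.
within_196 = normal_interval_probability(-1.96, 1.96)

print(round(within_one_sd, 4), round(within_196, 4))
```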
• z-Score: The position a particular value of x has relative
to the mean, measured in standard deviations. The z-score
is found by the formula:
$$z = \frac{\text{value} - \text{mean}}{\text{st. dev.}} = \frac{x - \bar{x}}{s}$$
• Example: A certain data set has mean 35.6 and standard
deviation 7.1. Find the z-score for 46.
Solution:
$$z = \frac{x - \bar{x}}{s} = \frac{46 - 35.6}{7.1} = 1.46$$
46 is 1.46 standard deviations above the mean
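
The same calculation as a short function, using the mean, standard deviation and value from the example. The name `z_score` is an illustrative choice.

```python
def z_score(value, mean, st_dev):
    """Position of a value relative to the mean, in standard deviations."""
    if st_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / st_dev


z = z_score(46, mean=35.6, st_dev=7.1)
print(round(z, 2))
```

A positive z means the value lies above the mean; a negative z means it lies below.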
Confidence Intervals
Level of confidence (P)   z
99%                       2.58
95%                       1.96
90%                       1.645
• Many methods for studying a normal population
• It is important to check for population normality before
conducting statistical analysis
• Normal Probability Plot (NPP) is a graphical
examination of normality based on the percentile
Discrete Random Variable
• Binomial Probability Distribution: A type of
discrete distribution, also known as the Bernoulli Random
Variable distribution
• Based on a series of repeated trials whose outcomes can be
classified in one of two categories: success or failure
• Distribution based on a binomial probability experiment
[Photo: Jacob Bernoulli (1654-1705)]
• Binomial Probability Experiment: An experiment that is
made up of repeated trials that possess the following properties:
1. There are n repeated independent trials
2. Each trial has two possible outcomes (success, failure)
3. Pr(success) = p, Pr(failure) = q, and p + q = 1
4. The binomial random variable x is the count of the
number of successful trials that occur; x may take on any
integer value from zero to n
Example: The number of food businesses closed during the

pandemic in each neighborhood.
• Example: It is known that 40% of all graduating seniors on

the campus of a very large university have taken a statistics
class. Five seniors are selected at random and asked if they
have taken a statistics class. This approximates a binomial
experiment:
1. A trial is asking one student, repeated 5 times. The trials are
independent since the probability of taking a statistics class for any
one student is not affected by the results from any other student.
2. Two outcomes on each trial: taken a statistics class (success), not
taken a statistics class (failure)
3. p = Pr(taken a statistics class) = 0.40
q = Pr(not taken a statistics class) = 0.60
4. x = number of students who have taken a statistics class
• What is the probability of obtaining x successes in n trials?
• Example: What is the probability of obtaining 2 heads
from a coin that was tossed 5 times?
P(HHTTT) = (1/2)^5 = 1/32
But there are more possibilities:
HHTTT HTHTT HTTHT HTTTH
THHTT THTHT THTTH
TTHHT TTHTH
TTTHH
P(2 heads) = 10 × 1/32 = 10/32
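
The count of 10 can be confirmed by brute force: list all 2^5 = 32 head/tail sequences and keep those with exactly two heads. A sketch, with `sequences_with_heads` as an illustrative helper name:

```python
from fractions import Fraction
from itertools import product


def sequences_with_heads(n_tosses, n_heads):
    """All H/T sequences of length n_tosses that contain exactly n_heads heads."""
    return [
        "".join(seq)
        for seq in product("HT", repeat=n_tosses)
        if seq.count("H") == n_heads
    ]


two_heads = sequences_with_heads(5, 2)
# Every sequence of a fair coin is equally likely, with probability (1/2)^5.
p_two_heads = Fraction(len(two_heads), 2 ** 5)

print(len(two_heads), p_two_heads)
```

Enumeration only works for small n; the binomial coefficient gives the same count without listing the sequences.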

• In general, if trials result in a series of successes and failures,
• Then the probability of x successes in that order is:
$$P(x) = p^{x} q^{n-x}$$
p: probability of success;
q: 1 − p, probability of failure;
x: number of successes
n: number of trials
• For a binomial experiment, let p represent the probability of
a “success” and q represent the probability of a “failure” on a
single trial; then Pr(x), the probability that there will be
exactly x successes on n trials is:
$$\Pr(x) = \binom{n}{x} p^{x} q^{n-x}, \quad \text{for } x = 0, 1, 2, \ldots, n$$
Notes:
- The number of ways that exactly x successes can occur in n
trials: $\binom{n}{x}$
• The number of ways that exactly x successes can occur in a
set of n trials is represented by the symbol $\binom{n}{x}$
1. Must always be a positive integer
2. Called the binomial coefficient
3. Found by using the formula:
$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$
Note:
n! is an abbreviation for n factorial: $n! = n(n-1)(n-2)\cdots(3)(2)(1)$
$6! = 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 720$
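
The binomial formula translates almost directly into code. The sketch below uses `math.comb` for the binomial coefficient; the function name `binomial_probability` is an illustrative choice, and the two calls reuse the coin-toss and graduating-seniors settings from earlier slides.

```python
from math import comb, factorial


def binomial_probability(x, n, p):
    """Pr(x) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    if not 0 <= x <= n:
        raise ValueError("x must be an integer from 0 to n")
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


# The binomial coefficient agrees with the factorial formula n! / (x! (n - x)!).
ways = comb(5, 2)
ways_from_factorials = factorial(5) // (factorial(2) * factorial(3))

# Exactly 2 heads in 5 tosses of a fair coin.
p_coin = binomial_probability(2, n=5, p=0.5)
# Exactly 2 of 5 seniors have taken a statistics class when p = 0.40.
p_seniors = binomial_probability(2, n=5, p=0.40)

print(ways, ways_from_factorials, p_coin, round(p_seniors, 4))
```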
• Example: According to a recent study, 65% of all homes in a

certain county have high levels of gas leaking into their
basements. Four homes are selected at random and tested
for the leak. The random variable x is the number of homes
with high levels of leaking (out of the four).
Properties:
1. There are 4 repeated trials: n = 4. The trials are independent.
2. Each test for radon is a trial, and each test has two outcomes: Leak
or no leak
3. p = Pr(leak) = 0.65, q = Pr(no leak) = 0.35
p+q=1
4. x is the number of homes with high levels of leaking, possible
values:
0, 1, 2, 3, 4
What is the probability of x = 0, 1, 2, 3, 4?

$$\Pr(x) = \binom{4}{x}(0.65)^{x}(0.35)^{4-x}, \quad \text{for } x = 0, 1, 2, 3, 4$$
$$\Pr(0) = \binom{4}{0}(0.65)^{0}(0.35)^{4} = (1)(1)(0.0150) = 0.0150$$
$$\Pr(1) = \binom{4}{1}(0.65)^{1}(0.35)^{3} = (4)(0.65)(0.0429) = 0.1115$$
$$\Pr(2) = \binom{4}{2}(0.65)^{2}(0.35)^{2} = (6)(0.4225)(0.1225) = 0.3105$$
$$\Pr(3) = \binom{4}{3}(0.65)^{3}(0.35)^{1} = (4)(0.2746)(0.35) = 0.3845$$
$$\Pr(4) = \binom{4}{4}(0.65)^{4}(0.35)^{0} = (1)(0.1785)(1) = 0.1785$$
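
The full distribution for the four-home example can be tabulated in a few lines, which also gives a quick check that the probabilities for x = 0 to 4 sum to 1. A sketch assuming n = 4 and p = 0.65 as stated in the example; the name `leak_distribution` is illustrative.

```python
from math import comb

N_HOMES = 4    # number of homes tested (n)
P_LEAK = 0.65  # Pr(leak) for a single home (p)

# Pr(x) for every possible number of homes with a leak, x = 0, 1, ..., 4.
leak_distribution = {
    x: comb(N_HOMES, x) * P_LEAK ** x * (1 - P_LEAK) ** (N_HOMES - x)
    for x in range(N_HOMES + 1)
}

for x, prob in leak_distribution.items():
    print(f"Pr({x}) = {prob:.4f}")

total = sum(leak_distribution.values())
print(f"Sum = {total:.4f}")
```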
Procedure of Learning about a Population
• Statistics: The science of collecting, describing, analyzing,
interpreting, displaying, and making decisions based on data
Sample Statistics
Population Parameters
Procedure of Learning about a Population
• So far:
• Define a population
• Describe the population parameter of concern
• Draw a sample
• Calculate the value of sample statistic
• Next: Make an inference

- Estimate the value of a population parameter
- Test a hypothesis: the value of a population parameter = the
value of a sample statistic
- Other hypotheses (≠, >, ≥, <, ≤)
Point Estimate for a Parameter
• Point estimate: the value of the corresponding statistic
• Example: 106 of 200 people in a sample, or 0.53, support a
sports stadium.
• 0.53 is a point estimate of the population proportion
• Example: students in this class take on average 24.7
minutes to commute every day.
• x̄ = 24.7 minutes is a point estimate (single number value) of
the mean commuting time µ of the sampled population
(people at UW)
How good is the point estimate? Is it high? Or low?

Don’t know
Would another sample yield the same result?
Unlikely
Then what?
Interval estimation
• Problem with point estimates: No information about the
uncertainty associated with the estimate
• Gives you no idea how close your sample mean is to the
population mean
• Confidence intervals: An interval of values computed
from the sample that is almost sure to cover the true
population value.
• Confidence intervals: A 95% confidence interval around
the sample mean has a high probability of containing the
population mean.
• What does “95%” mean?
- Interpretation: In 95% of the samples we take, the true
population proportion (or mean) will be in the interval.
- This is also the same as saying we are 95% confident that
the true population proportion (or mean) will be in the
interval
• Level of confidence: a probability that represents the
percentage of intervals that will contain the population mean
if a large number of repeated samples are obtained
• What does “95%” mean?
[Diagram: level of confidence. Take repeated samples, then calculate a confidence interval for each sample: 95% of intervals contain the population mean; 5% of intervals do not contain the population mean]
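
The repeated-sampling idea can be tried out with a small simulation. This is a rough sketch under stated assumptions, not part of the lecture: the population is taken to be normal with a known standard deviation, each interval is computed as the sample mean plus or minus z times sigma over the square root of n, and the population values, sample size, seed and function name `coverage_rate` are all made up for illustration.

```python
import random
from statistics import NormalDist


def coverage_rate(mu, sigma, n, confidence=0.95, n_samples=2000, seed=397):
    """Share of repeated-sample intervals that contain the population mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_samples):
        # Draw one sample of size n and compute its mean.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Does this sample's interval cover the true mean?
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / n_samples


rate = coverage_rate(mu=10.0, sigma=6.0, n=100)
print(f"Intervals containing the population mean: {rate:.1%}")
```

The printed share should land close to 95%, though not exactly on it, because the simulation itself is a finite set of samples.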
• Level of confidence
• It is denoted as 100(1-α)%
α: significance level (tolerance level)
• Common levels of confidence: 90%, 95% and 99%
• A confidence level of 100(1-α)% implies that 100(1-α)% of
all samples would produce an interval that includes the true
value of the parameter estimated.
• The higher the confidence level, the more strongly we

believe that the true value of the parameter being estimated
lies within the interval.
• The construction of a confidence interval for the population
mean depends upon three factors:
- The point estimate of the population mean
- The level of confidence
- The standard deviation of the sample mean
• One assumption:
- The sampling distribution of x̄ is normal
• Suppose a simple random sample of size n is taken from a
population with unknown mean µ and known standard
deviation σ. A 1 − α confidence interval for µ is:
$$\bar{x} - z(\alpha/2)\,\frac{\sigma}{\sqrt{n}} \quad \text{to} \quad \bar{x} + z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$$
Notes:
1. x̄ is the point estimate and the center point of the
confidence interval
2. z(α/2): confidence coefficient, the number needed to
construct an interval estimate of the correct width to
have a level of confidence 1 − α

1 − α    α      z(α/2)
99%      0.01   2.58
95%      0.05   1.96
90%      0.10   1.645
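
The confidence coefficients in the table can be recovered from the standard normal distribution instead of being memorized. A sketch using the standard library's `NormalDist.inv_cdf`; the function name `confidence_coefficient` is an illustrative choice.

```python
from statistics import NormalDist


def confidence_coefficient(confidence_level):
    """z(alpha/2): the z value that leaves alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.99, 0.95, 0.90):
    print(f"{level:.0%}  alpha = {1 - level:.2f}  z = {confidence_coefficient(level):.3f}")
```

The computed values carry more decimals than the table, which rounds them to 2.58, 1.96 and 1.645.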

Estimation of mean μ : (σ known)
• Step 1. Describe the population
- parameter of concern: mean, μ
• Step 2. Specify the confidence interval criteria
- Check the assumptions (normal distribution, large sample size)
- Determine the test statistic
- Specify level of confidence
• Step 3. Collect and present sample evidence
• Step 4. Determine the confidence interval
- Determine the confidence coefficient, z(α/2):
- Calculate the lower confidence limit (LCL) and upper confidence
limit (UCL)
• Step 5. Describe the results: Between LCL and UCL
• Example: How far does the average community-college

student commute to college each day?
• Data: One-way distance from a random sample of 100

commuting students
• Sample mean = 10.22 miles

• Level of Confidence 95%
• Standard deviation = 6

• Step 1. Describe the population (mean)
• Step 2. Specify the confidence interval criteria (0.95)
• Step 4. Determine the confidence interval
- Determine the confidence coefficient, z(α/2): 1.96
- LCL: 10.22 - 1.96 * 6/sqrt(100) = 10.22 - 1.176 = 9.044
- UCL: 10.22 + 1.96 * 6/sqrt(100) = 10.22 + 1.176 = 11.396
• Step 5. Describe the results: Between LCL and UCL
With 95% confidence we can say,
“The mean one-way distance is
between 9.04 and 11.40 miles”
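
The commuting example can be reproduced with a short function. This is a sketch for the case where the population standard deviation is known; the name `mean_confidence_interval` is illustrative, and the inputs (sample mean 10.22, standard deviation 6, n = 100, 95% confidence) come from the example.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (LCL, UCL) for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # confidence coefficient z(alpha/2)
    margin = z * sigma / sqrt(n)                        # z times the st. dev. of the sample mean
    return sample_mean - margin, sample_mean + margin


lcl, ucl = mean_confidence_interval(sample_mean=10.22, sigma=6, n=100, confidence=0.95)
print(f"Between {lcl:.2f} and {ucl:.2f} miles")
```

Raising the confidence level to 99% widens the interval, while a larger sample narrows it, since the margin shrinks with the square root of n.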
Review
• Random Variables
• Probability Distribution of Random Variables

- Continuous Random Variable (Normal Distribution)
- Discrete Random Variable (Binomial Distribution)
• Procedure of Learning about a Population
• Confidence Intervals
