Statistical Methods For Research - Additional Notes On Sampling Tests (2022)
If 4 people are chosen at random from the list, how likely is it that:
None have type O blood?
One has type O?
Two have type O?
Three? Four?
Solution:
First, list the different possible outcomes for a sample of 4 people. Let O mean that a
person has type O blood, and let N mean that the person does not have type O blood.
The sequence of symbols indicates the results in the order in which they occur in the
experiment, so NNON is a different outcome from ONNN.
In summary, for Example 3.2, the probability distribution is shown in Figure 3.3.
[Figure 3.3: Probability mass function of the binomial distribution with n = 4 and P(O) = 0.4. Horizontal axis: x = 0, 1, 2, 3, 4; vertical axis: p(x), from 0 to 0.35.]
The figure above shows the distribution of a binomial discrete random variable with $n = 4$ and
probability of success $p = 0.4$, i.e. $N(O) \sim \mathrm{Bin}(n = 4,\ P(O) = 0.4)$.
The discrete random variable, with values 0, 1, 2, 3, 4, represents the number of people with
type O blood in a random sample of 4 people.
$P[N(O) = n(O)]$ is the probability function of $N(O)$ (the binomial probability distribution).
Note that a binomial probability distribution is a model of an experiment with only two
possible outcomes. We concentrate on one of the outcomes, type O blood (for instance),
and count the number of occurrences (successes) in the sample. The probability of type O
blood does not change from observation to observation, and the observations are
independent of each other.
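The probabilities behind Figure 3.3 can be reproduced with a minimal sketch, assuming only the values stated in the caption ($n = 4$, $P(O) = 0.4$). The helper name `binomial_pmf` is introduced here for illustration and is not part of the notes.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 4, 0.4  # values taken from the caption of Figure 3.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(N(O) = {k}) = {prob:.4f}")
```

The five probabilities sum to 1, and the two largest (for one and two people with type O blood) are equal at 0.3456, which matches the height of the tallest bars on the 0 to 0.35 axis.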
Example 3.3:
A large drug company has 100 potential new prescription drugs under clinical test. About
20% of all drugs that reach this stage are eventually licensed for sale. What is the
probability that at least 15 of the 100 drugs are eventually licensed? Assume that the
binomial assumptions are satisfied, and use a normal approximation with continuity
corrections.
Solution:
Let $X$ be the number of drugs that are licensed for sale. Then
$X \sim \mathrm{Bin}(n = 100,\ p = 0.2)$
so that $\mu = np = 100 \times 0.2 = 20$ and $\sigma^2 = np(1 - p) = 100 \times 0.2 \times 0.8 = 16$.
We may approximate $X$ by a normal distribution, i.e., $X \to N(\mu = 20,\ \sigma^2 = 16)$.
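The solution as preserved here stops once the approximating normal distribution is stated. The remaining step can be sketched as follows, assuming the usual continuity correction, in which $P(X \ge 15)$ is approximated by the normal probability above 14.5. The helper `normal_cdf` and the exact binomial sum are added for illustration only.

```python
from math import comb, erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n, p = 100, 0.2
mu = n * p                      # 20
sigma = sqrt(n * p * (1 - p))   # 4

# Continuity correction: P(X >= 15) is approximated by P(Y >= 14.5), Y ~ N(20, 16).
z = (14.5 - mu) / sigma
approx = 1.0 - normal_cdf(z)

# Exact binomial tail, shown only as a check on the approximation.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(15, n + 1))

print(f"z = {z:.3f}, normal approximation = {approx:.4f}, exact = {exact:.4f}")
```

Under these assumptions $z = -1.375$ and the approximate probability is about 0.915, close to the exact binomial tail.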
For $i = 1, \dots, n$, $X_i$ is normally distributed with mean $\mu$ and variance $\sigma^2$. For the
normal distribution, $X_i$ can be denoted as,
$$X_i \sim N(\mu, \sigma^2)$$
The average of the $X_i$ is $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, and the mean and variance of the average are
$E(\bar{X}) = \mu$ and $Var(\bar{X}) = \frac{\sigma^2}{n}$. Hence, $\bar{X}$ is normally distributed,
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
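and,
$$\frac{\partial}{\partial \sigma^2}\, l(\mu, \sigma^2 \mid \boldsymbol{x}) = 0 - \frac{n}{2}\left(\frac{1}{\sigma^2}\right) + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$$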
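This yields, with $\mu$ replaced by its estimate $\bar{x}$,
$$\check{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
The sample variance $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$ is an unbiased estimator of $\sigma^2$.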
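The two variance estimators differ only in their divisor. A small sketch on made-up data (the values in `x` are hypothetical, chosen purely for illustration) shows the relation $s^2 = \frac{n}{n-1}\check{\sigma}^2$ and checks both against the standard library.

```python
from statistics import mean, pvariance, variance

x = [4.1, 5.3, 6.0, 4.8, 5.9, 5.2]  # hypothetical sample, not from the notes
n = len(x)
xbar = mean(x)

sum_sq = sum((xi - xbar) ** 2 for xi in x)
sigma2_mle = sum_sq / n   # maximum likelihood estimate (divisor n)
s2 = sum_sq / (n - 1)     # unbiased sample variance (divisor n - 1)

print(f"MLE variance      = {sigma2_mle:.4f} (pvariance: {pvariance(x):.4f})")
print(f"unbiased variance = {s2:.4f} (variance:  {variance(x):.4f})")
```

Because the divisor $n$ is larger than $n - 1$, the maximum likelihood estimate is always slightly smaller than $s^2$ for the same data.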
1.5 Additional rules of the normal distribution
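According to #1;
$$X_i \sim N(\mu, \sigma^2) \;\rightarrow\; \frac{X_i - \mu}{\sigma} \sim N(0, 1) \;\rightarrow\; \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_1$$
Also,
$$\bar{X} \sim N(\mu, \sigma^2/n) \;\rightarrow\; \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \;\rightarrow\; \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2_1$$
According to #3;
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 = \underbrace{\left(\frac{X_1 - \mu}{\sigma}\right)^2}_{\chi^2_1} + \underbrace{\left(\frac{X_2 - \mu}{\sigma}\right)^2}_{\chi^2_1} + \cdots + \underbrace{\left(\frac{X_n - \mu}{\sigma}\right)^2}_{\chi^2_1}$$
Thus,
$$\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2_n$$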
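Consider the decomposition
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \underbrace{\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2}_{\chi^2_n} - \underbrace{\frac{(\bar{x} - \mu)^2}{\sigma^2/n}}_{\chi^2_1}$$
The first term on the right-hand side follows a chi-square distribution with $n$ degrees of
freedom. The second term on the right-hand side follows a chi-square distribution with 1 degree
of freedom and is independent of the left-hand side. Thus, according to #3,
$$\therefore\ \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2 \sim \chi^2_{n-1}$$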
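The $\chi^2_{n-1}$ result can be checked informally by simulation. The sketch below assumes hypothetical values for $\mu$, $\sigma$ and $n$ (none come from the notes) and compares the simulated mean and variance of $\frac{1}{\sigma^2}\sum (x_i - \bar{x})^2$ with the values $n - 1$ and $2(n - 1)$ expected for a chi-square variable with $n - 1$ degrees of freedom.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 2.0, 5, 20000  # hypothetical settings


def scaled_sum_of_squares(sample, sigma):
    """Return sum((x - xbar)^2) / sigma^2 for one sample."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / sigma**2


draws = [
    scaled_sum_of_squares([random.gauss(mu, sigma) for _ in range(n)], sigma)
    for _ in range(reps)
]

sim_mean = sum(draws) / reps
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (reps - 1)

print(f"simulated mean     = {sim_mean:.3f} (chi-square mean n - 1 = {n - 1})")
print(f"simulated variance = {sim_var:.3f} (chi-square variance 2(n - 1) = {2 * (n - 1)})")
```

A simulation of this kind does not prove the result, but agreement with $n - 1$ rather than $n$ illustrates the degree of freedom lost by estimating $\mu$ with $\bar{x}$.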
These statistics are useful in Analysis of Variance (ANOVA).
Statistics
Objective: to make inferences about a population based on information contained in a
sample.
This means that any time $\bar{x}$ falls in the interval $\mu \pm 1.96\sigma_{\bar{x}}$, the interval $\bar{x} \pm 1.96\sigma_{\bar{x}}$ will
contain the parameter $\mu$.
The probability of $\bar{x}$ falling in the interval $\mu \pm 1.96\sigma_{\bar{x}}$ is 0.95.
The interval $\bar{x} \pm 1.96\sigma_{\bar{x}}$ is an interval estimate of $\mu$ with level of confidence 0.95.
In repeated sampling, 95% of the intervals calculated using the formula $\bar{x} \pm 1.96\sigma_{\bar{x}}$
will contain the mean $\mu$.
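The repeated-sampling interpretation can be illustrated with a short simulation. This is a sketch under assumed values: the population mean, standard deviation and sample size below are hypothetical, and $\sigma$ is treated as known.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable
mu, sigma, n, reps = 70.0, 3.0, 25, 10000  # hypothetical settings
se = sigma / sqrt(n)  # standard error of the mean, sigma known

covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    # Does the interval xbar +/- 1.96 * se contain the true mean?
    if xbar - 1.96 * se <= mu <= xbar + 1.96 * se:
        covered += 1

coverage = covered / reps
print(f"proportion of intervals containing mu: {coverage:.3f}")
```

The proportion should land close to 0.95, although any single interval either contains $\mu$ or does not.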
[Figure: density curves labelled n = 5, n = 10 and n = 25, plotted over the horizontal range -3 to 3.]
Example 3.4:
The percentage of calories from fat is summarised by the following sample information,
$\bar{x} = 36.92$, $s = 6.73$ and $n = 168$.
Solution:
The 95% CI based on $\bar{x}$ (which we may denote as $CI_{95\%}(\bar{x})$) is,
$$CI_{95\%}(\bar{x}) \equiv \bar{x} \pm z_{2.5\%} \times \frac{s}{\sqrt{n}} = 36.92 \pm 1.96 \times \frac{6.73}{\sqrt{168}} = (35.90,\ 37.94)$$
We are 95% confident that the average percentage of calories from fat is a value
between 35.90 and 37.94.
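The arithmetic of Example 3.4 can be retraced directly. The sketch below uses only the figures given in the example and the rounded critical value 1.96.

```python
from math import sqrt

xbar, s, n = 36.92, 6.73, 168  # sample summary from Example 3.4
z = 1.96                       # critical value z_{2.5%} used in the notes

half_width = z * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"half-width = {half_width:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")  # prints 95% CI = (35.90, 37.94)
```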
For a specified value of $(1 - \alpha)$, a $100(1 - \alpha)\%$ CI for $\mu$ (provided that $\sigma^2$ is
known) is $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
Solution:
The 99% confidence interval for $\mu$ is
$$\bar{x} \pm 2.58\,\sigma_{\bar{x}} = 27.3 \pm 2.58\left(\frac{12.1}{\sqrt{50}}\right) = 27.3 \pm 4.41$$
We are 99% confident that the average tree count per acre is between 22.89
and 31.71.
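Rather than reading $z_{\alpha/2}$ from a table, the critical value can be computed for any confidence level. The following is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The function name `z_interval` is introduced here for illustration, and the inputs are the figures that appear in the solution above.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sd: float, n: int, confidence: float):
    """Return (lower, upper) for xbar +/- z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 critical value
    half_width = z * sd / sqrt(n)
    return xbar - half_width, xbar + half_width


lower, upper = z_interval(xbar=27.3, sd=12.1, n=50, confidence=0.99)
print(f"99% CI = ({lower:.2f}, {upper:.2f})")
```

The computed critical value is about 2.576 rather than the rounded 2.58, so the limits can differ from the hand calculation in the third decimal place.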
then,
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
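Then, to meet the requirement on the confidence interval, the tolerable error $W$ must be
greater than or equal to the width of the interval, i.e.,
$$W \ge (\bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}) - (\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}) = 2 z_{\alpha/2}\,\sigma_{\bar{x}} = 2 z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
So that, with $\sigma$ estimated by $s$,
$$n \ge \left(2 z_{\alpha/2}\,\frac{s}{W}\right)^2$$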
Example 3.6:
In the dietary intake example, the researchers wanted to estimate the mean percentage of
calories from fat (PCF) with a 95% CI having a tolerable error of 3. From previous
studies, the values of PCF ranged from 10% to 50%. How many observations must the
researchers include in the sample to achieve their specification? (Case study: percentage of
calories from fat; Ott, p. 193.)
Solution:
Here, we have $x_{min} = 10\%$ and $x_{max} = 50\%$. Hence, a rough estimate of the standard deviation
is the range divided by 4,
$$s = \frac{x_{max} - x_{min}}{4} = \frac{50 - 10}{4} = 10$$
We want a 95% CI with tolerable error $W = 3$. Setting $\alpha = 5\%$ gives
$z_{2.5\%} = 1.96$. Then,
$$n \ge \left(2 z_{\alpha/2}\,\frac{s}{W}\right)^2 = \left(2 \times 1.96 \times \frac{10}{3}\right)^2 = 170.74$$
So, a random sample of 171 observations should give a 95% CI for $\mu$ with the desired width
of 3, provided 10 is a reasonable estimate of $\sigma$.
Example 3.7:
A federal agency has decided to investigate the advertised weight printed on cartons of a
certain brand of cereal. The company in question periodically samples cartons of cereal
coming off the production line to check their weight. A summary of 1,500 of the weights
made available to the agency indicates a mean weight of 11.80 ounces per carton and a
standard deviation of 0.7 ounce. Find the number of cereal cartons the federal agency
must examine to estimate the average weight of cartons being produced now, using a
99% CI with a width of 0.50.
Solution:
The federal agency has specified $W = 0.5$.
Assuming that the weights made available to the agency by the company are accurate, we
take $\sigma = 0.7$. For a 99% CI, we set $\alpha = 1\%$ and find $z_{\alpha/2} = 2.58$. Thus, the
required sample size is,
$$n \ge \left(2 z_{\alpha/2}\,\frac{s}{W}\right)^2 = \left(2 \times 2.58 \times \frac{0.7}{0.5}\right)^2 = 52.2$$
So, the federal agency must obtain a random sample of 53 cereal cartons to estimate the
mean weight to within ±0.25.
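The sample-size rule used in Examples 3.6 and 3.7 can be wrapped in a small helper. This is a sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The function name `sample_size_for_width` is hypothetical, and the critical value is computed rather than rounded to 1.96 or 2.58.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_width(sd: float, width: float, confidence: float) -> int:
    """Smallest n with 2 * z_{alpha/2} * sd / sqrt(n) <= width."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ceil((2.0 * z * sd / width) ** 2)  # always round up


# Example 3.6: sd estimated as range / 4 = (50 - 10) / 4 = 10, W = 3, 95% CI
n_example_3_6 = sample_size_for_width(sd=(50 - 10) / 4, width=3, confidence=0.95)

# Example 3.7: sd = 0.7, W = 0.5, 99% CI
n_example_3_7 = sample_size_for_width(sd=0.7, width=0.5, confidence=0.99)

print(n_example_3_6, n_example_3_7)
```

Rounding up matters: a fractional result such as 52.2 means that 52 observations would give an interval slightly wider than requested, so the next whole number is taken.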