Estimation of the Population Average
Carlito O. Daarol
Instructor
Descriptive Statistics and Statistical Inference
Note: This is your reference for question number 1 in the final exam
Example 1: The population consists of 47,558 students whose average body mass index
(BMI) is µbmi = 19.32374 (which is considered healthy).
Estimate the mean using random sampling, repeated random sampling, and the Central Limit Theorem.
Example 2: The same population, whose average age is µage = 26.49931 years.
Estimate the mean using random sampling, repeated random sampling, and the Central Limit Theorem.
Case 2: The population is completely unknown, with parameter µ. We are given only a set of random
samples.
Random samples = {28, -44, 29, 30, 26, 27, 22, 23, 33, 16, 24, 29, 24, 40, 21, 31, 34, -2, 25, 19}
Estimate µ using random sampling, repeated random sampling, and the Central Limit Theorem.
Remark: The variable being measured is the speed of light. The data were collected
in the 19th century.
Step 1: Start from the original random sample:
RandomSample = {28, -44, 29, 30, 26, 27, 22, 23, 33, 16, 24, 29, 24, 40, 21, 31, 34, -2, 25, 19}
Step 2: Draw a random sample of size n from the original set of random samples (with replacement).
This process is called resampling; repeat it many times (say 20,000 times). Doing so
generates the following sets of random samples:
Set1: 21 25 40 29 16 -2 24 26 19 40 26 25 33 29 23 22 16 33 25 30
Set2: 40 34 24 22 29 21 16 24 22 33 33 16 22 27 -44 26 23 29 24 -2
Set3: 28 27 21 33 21 31 19 27 24 23 22 31 34 -2 34 -44 30 24 26
Set4: 19 40 29 23 31 29 40 29 40 22 29 21 26 23 25 16 -2 16 29 -44
Set5: 27 31 24 30 29 40 25 22 33 22 -44 31 24 25 19 21 22 30 28 2321
…
…
Set20000: 31 21 -44 25 21 31 34 24 27 25 19 34 28 19 26 23 34 26 27 16
The sets Set1, Set2, Set3, …, Set20000 are called bootstrap random samples; they are obtained by
repeated resampling with the original set as the source.
Step 3: For each bootstrap random sample, compute the mean value. This gives us a sequence of
mean values.
The collection of sample means (25.00, 20.95, 21.70, 22.05, 27.40, 23.10, …, 20.85, 22.35) forms
what is called a distribution of sampling means. The mean of these sampling means is the
estimate of the true population mean, and this result is guaranteed by the Central Limit
Theorem.
R commands to generate Set1 and mean1, then repeat the resampling 20,000 times:

random_samples <- c(28, -44, 29, 30, 26, 27, 22, 23, 33, 16,
                    24, 29, 24, 40, 21, 31, 34, -2, 25, 19)
Set1 <- sample(random_samples, length(random_samples), replace = TRUE)
Set1
mean(Set1)

Set   <- vector("list", 20000)   # storage for the bootstrap samples
Means <- numeric(20000)          # storage for their means
for (i in 1:20000) {
  Set[[i]] <- sample(random_samples, length(random_samples), replace = TRUE)
  Means[i] <- mean(Set[[i]])     # compute the mean of the resampled set
}
mean(Means)                      # estimate of the population mean
Lastly, the mean of the sampling means is the estimate of the population parameter.
The normal curve on the right side represents the distribution of the sampling means,
which are computed from the bootstrap samples.
The mean of the sampling means is the estimate of the true unknown population parameter µ.
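The bootstrap procedure above can be sketched outside R as well. Below is a minimal Python version of the same idea, using the handout's 20 observations and 20,000 resamples; only the standard library is used, and the variable names are illustrative.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# the original random sample from the handout
random_samples = [28, -44, 29, 30, 26, 27, 22, 23, 33, 16,
                  24, 29, 24, 40, 21, 31, 34, -2, 25, 19]

boot_means = []
for _ in range(20_000):
    # resample n values with replacement (one bootstrap sample)
    resample = random.choices(random_samples, k=len(random_samples))
    boot_means.append(statistics.mean(resample))

# the mean of the sampling means estimates the population mean
estimate = statistics.mean(boot_means)  # close to the sample mean, 21.75
```

The estimate converges on the mean of the original sample (21.75 here); the extra value of the bootstrap is that `boot_means` also reveals the spread of the sampling distribution.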
Step 4: Invoke the Central Limit Theorem
General Idea: Regardless of the population distribution model, as the sample size
increases, the sample mean tends to be normally distributed around the population mean, and
its standard deviation shrinks as n increases.
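Both claims — normality around the population mean and a shrinking standard deviation — can be checked with a short simulation. This sketch is an illustration, not part of the handout: it draws samples from a skewed exponential population with mean 1 and watches the spread of the sample mean fall like 1/√n.

```python
import random
import statistics

random.seed(1)

def sample_mean(n):
    # mean of one sample of size n from an exponential population (mean 1, sd 1)
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# standard deviation of the sample mean for increasing n
spread = {}
for n in (5, 50, 500):
    means = [sample_mean(n) for _ in range(2000)]
    spread[n] = statistics.stdev(means)  # should be close to 1 / sqrt(n)
```

Even though the exponential population is far from normal, spread[500] lands near 1/√500 ≈ 0.045, and a histogram of the stored means looks increasingly bell-shaped as n grows.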
2. From the given population parameters, you can derive the sample properties
It is believed that nearsightedness affects about 8% of all children. 194 incoming children have their
eyesight tested. Can the CLT be used in this situation?
The answer is yes: both Np = 194(0.08) = 15.52 and Nq = 194(0.92) = 178.48 are at least 10, so the
normal approximation to the binomial is reasonable.
2. For a binomial random variable X, the mean of X is equal to Np and the variance of X is Npq.
3. The proportion of nearsightedness is p = 8% = 0.08 and the number of children tested is N = 194.
This means that, for the entire group, the expected number of nearsighted students is 0.08(194) = 15.52 ≈ 16.
By the CLT, the count X should be approximately normally distributed with mean Np = 194(0.08) = 15.52
and standard deviation √variance = √(Npq) = √(194 × 0.08 × 0.92) ≈ 3.7787.
4. With 194 incoming children, what is a reasonable range of nearsighted children the school can
expect?
We use the 68-95-99.7 normality rule here. A reasonable estimate covers 99.7% of the data,
which falls within 3 standard deviations of the mean: 15.52 ± 3(3.7787), or roughly 4 to 27
nearsighted children.
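The arithmetic in steps 3 and 4 can be verified in a few lines; nothing here is new, the numbers simply reproduce the handout's p = 0.08 and N = 194.

```python
import math

p, N = 0.08, 194
q = 1 - p

mu = N * p                    # expected count of nearsighted children: 15.52
sigma = math.sqrt(N * p * q)  # standard deviation of the count: about 3.7787

# 68-95-99.7 rule: about 99.7% of outcomes lie within 3 standard deviations
low = mu - 3 * sigma          # about 4.2
high = mu + 3 * sigma         # about 26.9
```

So the school can reasonably expect between roughly 4 and 27 nearsighted children.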
The investor picks random samples of the stocks, with each sample comprising
at least 30 stocks. The samples must be random, and any previously selected
stocks must be returned to the pool before the next draw (sampling with
replacement) to avoid bias.
If the first sample produces an average return of 7.5%, the next sample may
produce an average return of 7.8%. Because of the randomized sampling,
each sample will produce a different result. As you draw more and more
samples, the sample means will start forming their own distribution.
The distribution of the sample means will move toward normal as the value of
n increases. The average return of the stocks in each sample estimates the
return of the whole index of 100,000 stocks, and that average return is
approximately normally distributed.
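The investor's procedure is easy to simulate. The sketch below builds a hypothetical index of 100,000 returns (the distribution and its parameters are invented for illustration) and shows that the average of many 30-stock sample means tracks the whole-index average.

```python
import random
import statistics

random.seed(7)

# hypothetical index: 100,000 stock returns in percent (illustrative numbers)
index_returns = [random.gauss(7.6, 5.0) for _ in range(100_000)]
true_mean = statistics.mean(index_returns)

# repeatedly draw 30-stock samples with replacement and record each mean
sample_means = [statistics.mean(random.choices(index_returns, k=30))
                for _ in range(5_000)]

# the average of the sample means estimates the whole-index average return
estimate = statistics.mean(sample_means)
```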
The initial version of the central limit theorem was formulated by Abraham de
Moivre, a French-born mathematician. In an article published in 1733, de
Moivre used the normal distribution to approximate the number of heads resulting
from multiple tosses of a coin. The concept was unpopular at the time, and it
was quickly forgotten.
Later, in 1901, the central limit theorem was expanded by Aleksandr Lyapunov,
a Russian mathematician. Lyapunov went a step further, defining the concept
in general terms and proving mathematically how it worked. The
characteristic functions he used to prove the theorem were adopted in
modern probability theory.
What is the Central Limit Theorem (CLT)?
The Central Limit Theorem (CLT) is a statistical concept that states that the
sample mean distribution of a random variable will assume a near-normal or
normal distribution if the sample size is large enough. In simple terms, the
theorem states that the sampling distribution of the mean approaches a
normal distribution as the size of the sample increases, regardless of the shape
of the original population distribution.
As the sample size increases to 30, 40, 50, and so on, the graph of
the sample means will move towards a normal distribution. A common rule of
thumb is that a sample size of 30 or more is large enough for the central
limit theorem to apply.
One of the most important components of the theorem is that the mean of
the sample means will match the mean of the entire population. If you calculate the
mean of multiple samples of the population, add them up, and find their
average, the result will be an estimate of the population mean.
The standard deviation behaves differently: the standard deviation of the
sample means (the standard error) is not the population standard deviation
itself, but the population standard deviation divided by the square root of
the sample size n.
The central limit theorem underpins much of statistical inference. It
makes it easy to understand how population estimates behave when
subjected to repeated sampling. When plotted on a graph, the theorem shows
the shape of the distribution formed by the means of repeated samples of
the population.
As the sample sizes get bigger, the distribution of the means from the
repeated samples tends toward a normal distribution. The result remains
the same regardless of the original shape of the distribution, as
illustrated in the figure below:
From the figure above, we can deduce that even though the original
distribution was uniform, the distribution of sample means tends towards a
normal distribution as the value of n (the sample size) increases.
Apart from showing the shape that the sample means will take, the central
limit theorem also describes the mean and variance of that distribution.
The mean of the sample-mean distribution equals the actual mean of the
population from which the samples were taken.
The variance of the sample-mean distribution, on the other hand, is the
variance of the population divided by n. Therefore, the larger the sample
size, the smaller the variance of the sample mean.
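This last claim — variance of the sample mean = population variance / n — can be confirmed numerically. The sketch below uses a uniform population on [0, 1), whose variance is 1/12; the sample size of 25 is an arbitrary illustration.

```python
import random
import statistics

random.seed(3)

n = 25                     # size of each sample
sigma2 = 1 / 12            # variance of the uniform [0, 1) population

# draw 4,000 samples of size n and record each sample mean
means = [statistics.mean(random.random() for _ in range(n))
         for _ in range(4_000)]

observed = statistics.variance(means)  # variance of the sample means
predicted = sigma2 / n                 # theory: sigma^2 / n, about 0.00333
```

The observed variance of the 4,000 sample means lands very close to the theoretical sigma²/n; raising n shrinks both in lockstep.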