Estimation of the Population Average


Mindanao State University

General Santos City

December 28, 2020

Carlito O. Daarol
Instructor
Descriptive Statistics and Statistical Inference

Note: This is your reference for question number 1 in the final exam

Estimation of the population average


Case 1: The population is given with parameter µ.

Example 1: The population consists of 47,558 students whose average body mass index
(BMI) is µbmi = 19.32374 (which is considered a healthy status)

Using random sampling, repeated random sampling and Central Limit Theorem

Actual value of µbmi is 19.32374


A point estimate of µbmi is 19.32907
And a 95% Confidence interval for µbmi is the interval (19.28632, 19.3717)
(Complete R code is needed to get these results.)
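The procedure can be sketched in R as follows. This is a minimal illustration, not the complete course code: the population vector bmi is simulated here (with an assumed standard deviation of 2), since the actual 47,558 BMI values are not reproduced in this handout, so the printed numbers will not match 19.32907 or (19.28632, 19.3717) exactly; only the structure of the computation matches the procedure described above.

```r
set.seed(123)

# Placeholder population: the actual 47,558 BMI values are not reproduced
# in this handout, so a population with the stated mean is simulated here
# purely for illustration (the sd of 2 is an assumption).
bmi <- rnorm(47558, mean = 19.32374, sd = 2)

# Repeated random sampling: draw B samples of size n and record each mean
B <- 10000
n <- 100
sample_means <- replicate(B, mean(sample(bmi, n)))

point_estimate <- mean(sample_means)          # point estimate of mu_bmi
ci <- quantile(sample_means, c(0.025, 0.975)) # 95% confidence interval

point_estimate
ci
```

The same skeleton works for Example 2 by replacing the BMI vector with the vector of ages.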

Example 2: The same population, whose average age is µage = 26.49931 years old
Using random sampling, repeated random sampling and Central Limit Theorem

Actual value of µage is 26.49931


A point estimate of µage is 26.48184 years old
And a 95% Confidence interval for µage is the interval (26.3384, 26.6282)
(Complete R code is needed to get these results.)

Case 2: The population is completely unknown with parameter µ. We are only given a set of random
samples.

Random samples = {28, -44, 29, 30, 26, 27, 22, 23, 33, 16, 24, 29, 24, 40 , 21, 31, 34, -2, 25, 19}

Using random sampling, repeated random sampling and Central Limit Theorem

Actual mean value of µ: NA (Unknown Population)


Point Estimate for the population mean µ is 21.74553
95% Confidence Interval for the mean µ is (15.60, 30.30)
(Complete bootstrapping code is needed to get these results)

Remark: The variable being measured is the speed of light. The data were collected
sometime in the late 19th century.

How is it done? (The bootstrap process)

Step 1: Get a random sample of size n

RandomSample = {28, -44, 29, 30, 26, 27, 22, 23, 33, 16, 24, 29, 24, 40 , 21, 31, 34, -2, 25, 19}

Step 2: Draw a random sample of size n from the original set of random samples (with replacement). This
process is called resampling; repeat it many times (say, 20,000 times). Doing so
generates the following sets of random samples:

Set1: 21 25 40 29 16 -2 24 26 19 40 26 25 33 29 23 22 16 33 25 30
Set2: 40 34 24 22 29 21 16 24 22 33 33 16 22 27 -44 26 23 29 24 -2
Set3: 28 27 21 33 21 31 19 27 24 23 22 31 34 -2 34 -44 30 24 26
Set4: 19 40 29 23 31 29 40 29 40 22 29 21 26 23 25 16 -2 16 29 -44
Set5: 27 31 24 30 29 40 25 22 33 22 -44 31 24 25 19 21 22 30 28 2321


Set20000: 31 21 -44 25 21 31 34 24 27 25 19 34 28 19 26 23 34 26 27 16

The sets Set1, Set2, Set3, …, Set20000 are called bootstrap random samples; they are obtained by
repeated resampling using the original set as the source.

Step 3: For each bootstrap random sample, compute the mean value. This gives us a sequence of
mean values:

mean1 = mean of Set1 = 25.00


mean2 = mean of Set2 = 20.95
mean3 = mean of Set3 = 21.70
mean4 = mean of Set4 = 22.05
mean5 = mean of Set5 = 27.40
mean6 = mean of Set6 = 23.10


Mean20000 = mean of Set20000 = 22.35

The collection of sample means (25.00, 20.95, 21.70, 22.05, 27.40, 23.10, …, 20.85, 22.35) forms
what is called a distribution of sampling means. The mean of these sampling means is the
estimate of the true population mean, and this result is guaranteed by the Central Limit
Theorem.
R commands to generate Set1 and mean1 (with the original sample stored in random_samples):
random_samples <- c(28, -44, 29, 30, 26, 27, 22, 23, 33, 16, 24, 29, 24, 40, 21, 31, 34, -2, 25, 19)
Set1 <- sample(random_samples, length(random_samples), replace = TRUE)
Set1
mean(Set1)

R for loop to repeat the resampling B = 20,000 times

set.seed(123) # so that your results coincide exactly with mine

Set <- list()
Means <- NULL
random_samples <- c(28, -44, 29, 30, 26, 27, 22, 23, 33, 16, 24, 29, 24, 40, 21, 31, 34, -2, 25, 19)

for (i in 1:20000){
Set[[i]] <- sample(random_samples, length(random_samples), replace = TRUE)
Means[i] <- mean(Set[[i]]) # compute the mean of the resampled set
}

The for loop generates


1) a collection of resampled sets Set[[i]], where i = 1, 2, 3, …, B (with B = 20,000)
2) a collection of sampling means (one sample mean computed for each Set[[i]])

Lastly, the mean of the sampling means is the estimate of the population parameter.
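Putting the pieces together, a self-contained sketch of the whole bootstrap computation is shown below. The percentile method (taking the 2.5% and 97.5% quantiles of the bootstrap means) is one standard way to form the 95% interval; the exact numbers depend on the seed and R version, so they may differ slightly from those quoted earlier.

```r
set.seed(123)

# Original set of random samples (the 20 observations given above)
random_samples <- c(28, -44, 29, 30, 26, 27, 22, 23, 33, 16, 24, 29,
                    24, 40, 21, 31, 34, -2, 25, 19)

# Resample B times, with replacement, recording the mean of each resample
B <- 20000
Means <- numeric(B)
for (i in 1:B) {
  resample <- sample(random_samples, length(random_samples), replace = TRUE)
  Means[i] <- mean(resample)
}

point_estimate <- mean(Means)          # estimate of the population mean mu
ci <- quantile(Means, c(0.025, 0.975)) # 95% percentile bootstrap interval

point_estimate
ci
```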

Bootstrap process in one picture (figure not reproduced here)

The normal curve on the right side represents the distribution of the sampling means.
The sample means are computed from the bootstrap samples.
The mean of the sampling means is the estimate of the true unknown population parameter θ.
Step 4: Invoke the Central Limit Theorem

General Idea: Regardless of the population distribution model, as the sample size
increases, the sample mean tends to be normally distributed around the population mean, and
its standard deviation shrinks as n increases.

In symbols, this means X̄ ≈ N(µ, σ²/n): the sample mean is approximately normally distributed around the population mean µ, with standard deviation σ/√n.

Application of Central Limit Theorem

CLT can be applied in two ways


1. From a collection of sets of random samples, you can estimate the population parameters.
An example is the estimation of population parameters as discussed above

2. From the given population parameters, you can derive the sample properties

Example for the second application of CLT:

It is believed that nearsightedness affects about 8% of all children. 194 incoming children have their
eyesight tested. Can the CLT be used in this situation?
Answer is yes!

1. Nearsightedness is a two-level variable (so the count of nearsighted children is a binomial random variable): either
you are nearsighted or you are not.

2. For a binomial random variable X, the mean of X is equal to Np and the variance of X is Npq, where q = 1 − p.

3. The proportion of nearsightedness is p = 8% = 0.08 and the number of children tested is N = 194. This means that,
among the 194 children, the expected number of nearsighted students is 0.08(194) = 15.52 ≈ 16.

By the CLT, the count X should be approximately normally distributed with a mean of Np = 194(0.08) = 15.52
and standard deviation = √variance = √(Npq) = √(194 × 0.08 × 0.92) ≈ 3.7787.

4. With 194 incoming children, what is a reasonable range of nearsighted children the school can
expect?

We are going to use the 68-95-99.7 normality rule here. A reasonable estimate is to cover
99.7% of the data, which falls within 3 standard deviations of the mean.

3 standard deviations is 3SD = 3(3.7787) = 11.3361.


This within-3SD interval is (15.52 − 11.3361, 15.52 + 11.3361) = (4.1839, 26.8561).
The school should expect between 4 and 27 nearsighted children out of 194 incoming children.
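The arithmetic in this example can be checked with a few lines of R:

```r
N <- 194      # number of incoming children screened
p <- 0.08     # assumed prevalence of nearsightedness
q <- 1 - p

mu <- N * p              # expected number of nearsighted children
sigma <- sqrt(N * p * q) # standard deviation of the binomial count

lower <- mu - 3 * sigma  # 99.7% range via the 68-95-99.7 rule
upper <- mu + 3 * sigma

round(c(mu = mu, sigma = sigma, lower = lower, upper = upper), 4)
```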

Example of Central Limit Theorem in Accounting

Central Limit Theorem - Overview, History, and Example (corporatefinanceinstitute.com)

An investor is interested in estimating the return of ABC stock market index


that is comprised of 100,000 stocks. Due to the large size of the index, the
investor is unable to analyze each stock independently and instead chooses to
use random sampling to get an estimate of the overall return of the index.

The investor picks random samples of the stocks, with each sample comprising
at least 30 stocks. The samples must be random, and previously selected stocks must be
returned to the pool before the next draw (sampling with replacement) to avoid bias.

If the first sample produces an average return of 7.5%, the next sample may
produce an average return of 7.8%. With the nature of randomized sampling,
each sample will produce a different result. As you pick more and more
samples, the sample means will start forming their own distribution.

The distribution of the sample means will move toward normal as the value of
n increases. The average return of the stocks in the sample index estimates the
return of the whole index of 100,000 stocks, and the average return is normally
distributed.

History of the Central Limit Theorem

The initial version of the central limit theorem was developed by Abraham De
Moivre, a French-born mathematician. In an article published in 1733, De
Moivre used the normal distribution to find the number of heads resulting
from multiple tosses of a coin. The concept was unpopular at the time, and it
was forgotten quickly.

However, in 1812, the concept was reintroduced by Pierre-Simon Laplace,


another famous French mathematician. Laplace re-introduced the normal
distribution concept in his work titled “Théorie Analytique des Probabilités,”
where he attempted to approximate binomial distribution with the normal
distribution.

The mathematician found that the average of independent random variables,
when increased in number, tends to follow a normal distribution. At that time,
Laplace’s findings on the central limit theorem attracted attention from other
theorists and academicians.

Later in 1901, the central limit theorem was expanded by Aleksandr Lyapunov,
a Russian mathematician. Lyapunov went a step ahead to define the concept
in general terms and prove how the concept worked mathematically. The
characteristic functions that he used to provide the theorem were adopted in
modern probability theory.
What is the Central Limit Theorem (CLT)?
The Central Limit Theorem (CLT) is a statistical concept that states that the
sample mean distribution of a random variable will assume a near-normal or
normal distribution if the sample size is large enough. In simple terms, the
theorem states that the sampling distribution of the mean approaches a
normal distribution as the size of the sample increases, regardless of the shape
of the original population distribution.

As the user increases the number of samples to 30, 40, 50, etc., the graph of
the sample means will move towards a normal distribution. A common rule of
thumb is that a sample size of 30 or more is large enough for the approximation to hold.

One of the most important components of the theorem is that the mean of
the sampling distribution equals the mean of the entire population. If you
calculate the means of multiple samples of the population, add them up, and
find their average, the result will be an estimate of the population mean.
The spread behaves differently, however: the standard deviation of the sample
means (the standard error) is not the population standard deviation itself but
the population standard deviation divided by √n, so it shrinks as the sample size grows.

How Does the Central Limit Theorem Work?

The central limit theorem underpins much of statistical inference. It
makes it easy to understand how population estimates behave when
subjected to repeated sampling. When plotted on a graph, the theorem shows
the shape of the distribution formed by the means of repeated population
samples.

As the sample sizes get bigger, the distribution of the means from the
repeated samples tends toward a normal distribution, regardless of the
original shape of the distribution. This can be illustrated with a figure
(not reproduced here) of a uniform parent distribution being repeatedly
sampled: even though the original distribution is uniform, the distribution
of the sample means tends toward a normal distribution as the value of n
(sample size) increases.

Apart from showing the shape that the sample means will take, the central
limit theorem also describes the mean and variance of the distribution. The
mean of the sampling distribution is the actual population mean from which
the samples were taken.

The variance of the sample distribution, on the other hand, is the variance of
the population divided by n. Therefore, the larger the sample size of the
distribution, the smaller the variance of the sample mean.
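This shrinking of the variance is easy to verify by simulation. The uniform population below is only a placeholder, chosen deliberately to be non-normal so the check also illustrates that the result does not depend on the population's shape:

```r
set.seed(123)

# Placeholder population: deliberately non-normal (uniform on [0, 10])
population <- runif(100000, min = 0, max = 10)
sigma2 <- var(population)  # (approximate) population variance

# Draw many samples of size n and record each sample mean
n <- 50
sample_means <- replicate(20000, mean(sample(population, n)))

var(sample_means)  # close to sigma2 / n
sigma2 / n
```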

You might also like