
Data Analysis for Managers

Unit 5
Sampling and Estimation
Unit 5 - Sampling and Estimation

In business, before taking any strategic decision it is essential to analyze sample data to test the underlying assumptions. For example, before opening a new branch or launching a new product, a survey is needed to assess the potential. In practice it is not possible to reach every potential customer in the catchment area, so a sample has to be selected and analyzed instead.
Let us say we wish to study the preferences of the employees of a large corporate. Since our research is to test their preferences about banking products, we can select an appropriate sample comprising a representative set. The preferences of clerical cadre employees could be different from those of senior management, so we cannot arrive at conclusions by analyzing the preferences of just one group. Therefore, we need to ensure that the sample being studied is representative.
The sample should be not only representative but also of an appropriate size. This raises the question: what is the appropriate sample size? In this unit, we shall learn different methods of choosing an appropriate sample.
There are multiple approaches, but this unit seeks to provide an approach that is free of complexities and easy to apply.

© Copyright 2017, all rights reserved. Manipal Global Education Services Pvt. Ltd. 2

Learning Objectives

At the completion of this unit, you will be able to:

• State the principle of sampling
• Identify the appropriate method of sampling
• Explain the importance of sample size
• Calculate sample size
• Apply standardization at the workplace
• Describe estimator and its properties
• Calculate confidence level


Table of contents

S.No Details

1. Types of Sampling
1.1 Probability Sampling
1.2 Judgmental Sampling
2. Importance of Sample Size
3. Estimation
3.1 Desirable Properties of Estimators
4. Confidence Level
5. Standardizing Values
6. Chapter Summary
7. Required Reading


1. Types of Sampling
Sampling can be of many types, but it can be broadly classified as:
a) Probability Sampling
b) Judgmental Sampling

1.1 Probability Sampling

This is also called ‘random’ sampling, and all elements in the population have an equal chance of being chosen.
Random sampling can be of many types, but for the purposes of a survey or research the following methods deserve critical consideration.
1. Simple Random Sampling: Each element has an equal chance of being selected.
2. Systematic Sampling: The elements are chosen at fixed intervals. The interval is usually arrived at by dividing the population size by the sample size. For example, in a class of 60, the teacher may ask students to count up to 5 and select every fifth student. A more practical example: at a branch with a population of 1000 customers and a sample of 250, the interval is 1000 / 250 = 4, so every 4th customer in account-number order is selected.
3. Stratified Sampling: As the name implies, we classify the population into strata (homogeneous groups, for example by age). The researcher randomly selects the sample from each stratum.
4. Cluster Sampling: We divide the population into clusters and study the clusters. Clusters are more heterogeneous than strata. Cluster sampling is also known as area sampling or geographical sampling when the population is divided into clusters based on geographical area. For example, if you’re studying your branch area for a sales campaign, you can divide it area-wise and then select your customers area-wise. This will ensure that the entire area is covered.
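
The four probability sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer list, the age bands and the area names are all made up, and the cluster function shows one common variant (picking whole clusters at random), which is not necessarily the exact procedure intended in the text.

```python
import random

def simple_random_sample(population, k, rng):
    """Pick k elements; every element has the same chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, k, rng):
    """Pick every (N // k)-th element, starting from a random offset."""
    interval = len(population) // k
    start = rng.randrange(interval)
    return population[start::interval][:k]

def stratified_sample(strata, fraction, rng):
    """Randomly pick the same fraction of elements from every stratum."""
    picked = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        picked.extend(rng.sample(members, k))
    return picked

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every element inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)            # fixed seed so the sketch is repeatable
customers = list(range(1, 1001))   # hypothetical account numbers 1..1000

srs = simple_random_sample(customers, 250, rng)
systematic = systematic_sample(customers, 250, rng)   # interval = 1000 // 250 = 4

# Hypothetical strata: customers grouped by age band.
strata = {
    "under_30": customers[:300],
    "30_to_50": customers[300:800],
    "over_50": customers[800:],
}
stratified = stratified_sample(strata, 0.25, rng)

# Hypothetical clusters: customers grouped into ten areas of 100 each.
clusters = {f"area_{i}": customers[i * 100:(i + 1) * 100] for i in range(10)}
clustered = cluster_sample(clusters, 3, rng)

print(len(srs), len(systematic), len(stratified), len(clustered))  # 250 250 250 300
```

Note that the systematic sample reproduces the branch example from the list: with 1000 customers and a sample of 250, consecutive selected account numbers are always 4 apart.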


1.2 Judgmental Sampling

Here the researcher applies his/her knowledge while choosing the sample components. For example, to understand the efficiency of the union budget, researchers may choose to collect the opinions of well-known economists.
Similarly, when deciding whether to open a new branch, the top management and the regional head will be called to discuss and decide.

2. Importance of Sample Size

When you’re conducting a survey or research on your customer base, it is normally difficult to survey all customers, i.e. the population. Surveying the entire population is possible only if the population is small. The practical approach is to study a small subset of the population and use this subset to make inferences about the population. The subset of the population that you’re studying is called a sample.
The challenge, however, is to determine the size of the sample. A small sample may not represent the population precisely and, being easily influenced by outliers, may be skewed in either direction. The larger the sample, the more accurate the result, with the caveat that a large sample is time-consuming and costly. There are many methods of determining sample size, and the choice depends on how much you know about the population.
One example of a sample size calculator is the use of Slovin’s formula:

Sample S = N / (1 + N × e²)

N stands for total population and e stands for margin of error


Example: Let’s assume you’re assessing a sample of HNI customers at your branch and the total number of customers is 500. You decide that the acceptable error margin is 10%. Using Slovin’s formula, you can calculate your sample size as:
N = 500
e = 0.10 (10% is 10/100 therefore 0.10)
Sample S = 500 / (1 + (500 × 0.10²))
Sample S = 500 / (1 + 5)
Sample S = 500 / 6 = 83.33
The sample size for the purpose of the study is 83 (rounded off to a whole number, as a sample must consist of whole customers).
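
The same calculation can be expressed as a small Python function. This is a minimal sketch; the function name `slovin_sample_size` and the input checks are illustrative choices, not part of the unit.

```python
def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula: S = N / (1 + N * e^2)."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a fraction between 0 and 1")
    return population_size / (1 + population_size * margin_of_error ** 2)

# Worked example from the text: N = 500 customers, e = 0.10.
raw = slovin_sample_size(500, 0.10)
print(round(raw, 2))  # 83.33
print(round(raw))     # 83, following the rounding used in the example
```

Trying a tighter margin such as 0.05 with the same function shows how quickly the required sample grows as the acceptable error shrinks.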

Sampling Distributions and standard error:

If you are conducting a survey of customer preferences, you would normally restrict your research to a small representative sample of customers.
Let’s say you have selected about 100 customers for research on average balances. The average balance obtained for this set of customers could be, let’s say, Rs. 15000.
If you take another set of 100 and repeat the study, you’re likely to get a different average. Multiple sets of 100 are likely to result in differing average balances.
The probability distribution of all possible means of different samples of the same size from the same population is called the sampling distribution of the mean.
The standard deviation of the distribution of sample means measures the extent to which we expect the means from the different samples to vary because of chance error in the sampling process. Thus, the standard deviation of the distribution of a sample statistic is known as the standard error of the statistic; for sample means, it is the standard error of the mean.

How Do We Calculate Standard Error?

We need to know the standard deviation and the number of observations in the sample to arrive at the standard error.
Standard error of mean = (Standard deviation) / (Square root of the number of observations)
SE = σ / √n
Let’s say you have a sample of 100 current account customers with an average of 90 transactions per month. The standard deviation is 15 transactions.
The standard error works out to:
SE = 15 / √100 = 1.50
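
The formula, and the idea of a sampling distribution behind it, can be checked with a short Python simulation. The sketch below first reproduces the worked example and then draws repeated samples from a made-up population of average balances (the mean of 15000 and spread of 3000 are assumptions chosen only for illustration) to compare the observed spread of the sample means with σ / √n.

```python
import math
import random
import statistics

def standard_error(std_dev, n):
    """Standard error of the mean: SE = sigma / sqrt(n)."""
    return std_dev / math.sqrt(n)

# Worked example from the text: sigma = 15 transactions, n = 100 customers.
se_example = standard_error(15, 100)
print(se_example)  # 1.5

# Simulation with a made-up population of 10,000 average balances.
rng = random.Random(0)
population = [rng.gauss(15000, 3000) for _ in range(10_000)]
sigma = statistics.pstdev(population)

# Draw many samples of 100 (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(rng.choices(population, k=100)) for _ in range(2000)
]

observed = statistics.pstdev(sample_means)  # spread of the sample means
predicted = standard_error(sigma, 100)      # what the formula predicts
print(round(observed, 1), round(predicted, 1))
```

The two printed values should be close to each other, which is the sense in which the standard error describes how much sample means vary from one sample to the next.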

3. Estimation
We have seen how to go about selecting samples from a large population. The procedure used to select a sample from a population is called the sampling method. The objective in obtaining a sample is to make inferences about the population from the sample. For example, bankers could make inferences about repayment patterns during festival periods by studying past data from a sample. The conclusion could be in the form of a statement such as: 3 to 4% of customers do not repay during the festival season. This range of 3 to 4% is an interval, and such an estimate is called an interval estimate.
An interval estimate is a range of values, computed from the sample, within which the population parameter is expected to lie. Point estimates, on the other hand, are single values such as the sample average or the sample standard deviation. A point estimate, therefore, can be defined as a single value that is used to estimate the population parameter.

3.1 Desirable Properties of Estimators

The following characteristics describe the properties of a good estimator.
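• Unbiasedness: An estimator is considered unbiased if its expected value equals the population parameter, that is, if on average across samples the estimate neither overstates nor understates the parameter.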


• Efficiency: An estimator is considered efficient if the variance of the estimate is small.

• Sufficiency: An estimator is sufficient if it uses all the relevant information in the sample about the population parameter.
• Consistency: An estimator is consistent if it approaches the population parameter as the sample size increases.
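
Three of these properties can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not a proof: it uses a made-up, roughly symmetric population, treats the sample mean and the sample median as two competing estimators of the population centre, and checks unbiasedness, relative efficiency and consistency numerically.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population: 10,000 values spread evenly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
true_mean = statistics.fmean(population)

means, medians = [], []
for _ in range(2000):
    sample = rng.sample(population, 25)   # one sample of 25 observations
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Unbiasedness: averaged over many samples, the sample mean sits near the true mean.
average_of_means = statistics.fmean(means)

# Efficiency: for this population the sample mean varies less than the sample median.
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

# Consistency: with a much larger sample the estimate lands close to the true mean.
error_large = abs(statistics.fmean(rng.sample(population, 5000)) - true_mean)

print(round(true_mean, 2), round(average_of_means, 2))
print(round(var_mean, 2), round(var_median, 2))
print(round(error_large, 3))
```

Which estimator is more efficient depends on the population; the comparison here holds for this evenly spread example and should not be read as a general rule.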

4. Confidence Level
The probability associated with an interval estimate is called the confidence level. In simple terms, it indicates how confident we are that the interval estimate will contain the population parameter. For example, if we estimate with 95% confidence that the average balance in the accounts at your branch lies between Rs. 15000 and Rs. 25000, then the confidence interval is Rs. 15000 to Rs. 25000 and the confidence level is 95%: we are 95% sure that the average balance (the population parameter) lies in this range.
A confidence level is expressed as a percentage. A confidence interval, on the other hand, is the range given by the estimate plus or minus the margin of error. In other words, a 95% confidence level states that if the study were carried out repeatedly with different samples from the same population, about 95% of the intervals obtained would contain the population parameter. For most research purposes, the confidence levels used are 90%, 95% or 99%. The confidence level required depends on what is being tested. A 99% level may be required where a higher degree of reliability is needed. For example, research on medicine requires a higher confidence level, as it would be risky to declare a medicine effective at only a 90% confidence level. In this regard, it is worth mentioning that the confidence level is the complement of the level of significance (a 95% confidence level corresponds to a 5% level of significance).
How do we calculate the margin of error (confidence interval)?
We can calculate the margin of error by using the formula:
Margin of Error = Z × (σ / √n)

Z stands for the Z-score
σ stands for population standard deviation
n stands for sample size
Alternately it can be given as:
Margin of Error = Z-score × Standard Error

How to Calculate Critical Value?

When testing a hypothesis, the critical value is the point on the distribution against which the calculated test statistic is compared to decide whether or not to reject the null hypothesis. If the absolute value of the test statistic is greater than the critical value, then we can reject the null hypothesis and accept the alternate hypothesis.
The critical value can be read from the Z-score table. For example, a 95% confidence level means that we are taking a 5% level of significance. Since that leaves 2.5% in each tail, we look up 0.9750. The table in the appendix (greater than mean) shows that 0.9750 is at row 1.9 and column 0.06. Therefore, Z is 1.96.
In the previous example on standard error, we obtained a value of 1.50 as the standard error. The margin of error in this instance at a 95% confidence level works out to 1.96 x 1.50 = 2.94.
The confidence interval in the example can be written as:
Average + / - Margin of error
Confidence interval is: 90 - 2.94 to 90 + 2.94, that is, 87.06 to 92.94 transactions per month
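
The table lookup and the arithmetic above can also be done in Python. This minimal sketch uses `statistics.NormalDist` from the standard library in place of the printed Z-table; the function names `critical_z` and `confidence_interval` are illustrative, and the interval assumes the sample mean is approximately normally distributed.

```python
import math
from statistics import NormalDist

def critical_z(confidence_level):
    """Two-sided critical value, e.g. 0.95 -> about 1.96."""
    alpha = 1 - confidence_level   # level of significance
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(mean, std_dev, n, confidence_level=0.95):
    """Return (lower, upper, margin) for mean +/- Z * sigma / sqrt(n)."""
    margin = critical_z(confidence_level) * std_dev / math.sqrt(n)
    return mean - margin, mean + margin, margin

# Worked example: mean 90 transactions, sigma 15, n = 100, 95% confidence.
z = critical_z(0.95)
lower, upper, margin = confidence_interval(90, 15, 100)
print(round(z, 2))                       # 1.96
print(round(margin, 2))                  # 2.94
print(round(lower, 2), round(upper, 2))  # 87.06 92.94
```

Calling `critical_z(0.99)` gives a larger critical value, so the same data produce a wider interval at a higher confidence level.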


If the margin of error is large, the researcher can explore options to increase the size of the sample. A lower margin of error indicates that the sample estimate closely matches the corresponding characteristic of the total population.
As a researcher, you should keep in mind that in practice the actual sample is the number of respondents who have actually answered your query, and allow for this when deciding on the sample size. This is largely a judgmental call based on experience; alternately, you can increase the sample if the error margin is high.

Appendix 1
Please refer to the link to know more about the Z-table:


5. Standardizing Values
At times we have to compare data that come from totally different sets. Take the example of two large sales teams. One is headed by a very strict Manager who gives his team scores only between 60 and 70%. The other is headed by a lenient Manager who normally rates his team between 90 and 100%. This causes problems when comparing performance, because the lowest performer in the second team has a score higher than the best performer in the first team. To resolve this, we need to standardize the scores. This can be done using the formula:

Z = (X - μ) / σ

μ = Mean
σ = Standard deviation
X = The observation (a specific value that you are calculating the z-score for)

Let’s take the example of two sales teams managed by different sales managers.

Team A     Score    Team B      Score
Xavier     60       Roshan      90
Padmini    62       Sunil       92
Monica     64       Richa       94
Amalor     66       Ankit       96
Nirjhar    68       Marudhar    98
Manisha    70       Tanya       100

By standardizing using the formula, we get the following for Team A.

Mean for Team A (μ) = 65
Standard deviation (σ) = 3.42 (3.4157 before rounding)

Team A     Score    Z-score
Xavier     60       -1.463850109
Padmini    62       -0.878310066
Monica     64       -0.292770022
Amalor     66        0.292770022
Nirjhar    68        0.878310066
Manisha    70        1.463850109

By standardizing using the formula, we get the following for Team B.

Mean for Team B (μ) = 95
Standard deviation (σ) = 3.42 (3.4157 before rounding)

Team B      Score    Z-score
Roshan      90       -1.463850109
Sunil       92       -0.878310066
Richa       94       -0.292770022
Ankit       96        0.292770022
Marudhar    98        0.878310066
Tanya       100       1.463850109
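
The Z-scores for both teams can be reproduced with a few lines of Python. This is a minimal sketch that assumes the population standard deviation (dividing by the number of scores), since that is what matches the values shown for the two teams; the function name `z_scores` is illustrative.

```python
import statistics

def z_scores(scores):
    """Standardize values: z = (x - mean) / population standard deviation."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return [(x - mu) / sigma for x in scores]

team_a = [60, 62, 64, 66, 68, 70]    # Xavier ... Manisha
team_b = [90, 92, 94, 96, 98, 100]   # Roshan ... Tanya

z_a = z_scores(team_a)
z_b = z_scores(team_b)

for a, b in zip(z_a, z_b):
    print(round(a, 4), round(b, 4))
```

Each printed pair is identical because the two teams have the same spread around their own means; only the means differ.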

Note that the Z-scores of the two teams are the same, and Tanya and Manisha have the same score after standardization. These standardized Z-scores can be converted into formats that are easy to use. For example, the Branch Manager may decide that the average branch performance is 80 and the standard deviation is 5.


The above scores will be converted as:

Score of individual = (Standard deviation × Z-score) + Average = (5 × Z-score) + 80

Team A     Score    Z-score         New score after standardisation
Xavier     60       -1.463850109    73
Padmini    62       -0.878310066    76
Monica     64       -0.292770022    79
Amalor     66        0.292770022    81
Nirjhar    68        0.878310066    84
Manisha    70        1.463850109    87

Team B      Score    Z-score         New score after standardisation
Roshan      90       -1.463850109    73
Sunil       92       -0.878310066    76
Richa       94       -0.292770022    79
Ankit       96        0.292770022    81
Marudhar    98        0.878310066    84
Tanya       100       1.463850109    87

Note carefully that after standardization Tanya and Manisha have the same score.
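
The conversion to the Branch Manager's scale can be sketched in Python as follows. The helper name `rescale` is hypothetical, and the sketch assumes the population standard deviation and rounding to the nearest whole number, which reproduces the new scores in the tables.

```python
import statistics

def rescale(scores, target_mean, target_sd):
    """Convert each score to a z-score, then to target_sd * z + target_mean."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return [round(target_sd * (x - mu) / sigma + target_mean) for x in scores]

team_a = [60, 62, 64, 66, 68, 70]    # strict manager
team_b = [90, 92, 94, 96, 98, 100]   # lenient manager

new_a = rescale(team_a, target_mean=80, target_sd=5)
new_b = rescale(team_b, target_mean=80, target_sd=5)
print(new_a)  # [73, 76, 79, 81, 84, 87]
print(new_b)  # [73, 76, 79, 81, 84, 87]
```

After rescaling, members in the same relative position within their own team receive the same score, regardless of how strict or lenient the original rating was.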


6. Chapter Summary
Here are the key points discussed in this unit.
• Because of cost and time constraints, we study a sample to draw inferences about a population parameter.
• The sample should represent the population.
• There are two types of sampling:
  • Probability sampling: This is also called ‘random’ sampling and all elements in the population have an equal chance of being chosen.
  • Judgmental sampling: Here the researcher applies his/her knowledge while choosing sample components.
• The different types of probability sampling are simple random sampling, systematic sampling, stratified sampling, and cluster sampling.
• A very small sample may not represent the population parameters appropriately, and a very large sample may not be cost- and time-effective; hence it is important to choose the appropriate sample size.
• The standard error helps to quantify the variation between the means of different samples of the same population.
• The purpose behind sampling is to arrive at estimates of the population parameters.
• There are two types of estimates: the point estimate and the interval estimate.


7. Required Reading

• ND Vohra, Business Statistics, McGraw Hill Education, 2017
• Srivatsava TN and Shailaja Rego, Statistics for Management, Tata McGraw Hill, 2008
• Richard Levin, David Rubin, Masood Siddiqui, Sanjay Rastogi, Statistics for Management, Eighth Edition, 2017
