Statistics For Business and Economics


Data Analysis for Managers

Unit III (Part 1):


Sampling Methods and Estimation

MISSION: CHRIST is a nurturing ground for an individual’s holistic development to make effective contribution to the society in a dynamic environment
VISION: Excellence and Service
CORE VALUES: Faith in God | Moral Uprightness | Love of Fellow Beings | Social Responsibility | Pursuit of Excellence
CHRIST
Deemed to be University

Sampling – Need

● An element is the entity on which data are collected.
● A population is the collection of all the elements of interest.
● A sample is a subset of the population.
● The reason we select a sample is to collect data to make an inference
and answer a research question about a population.
● A tyre manufacturer is considering producing a new tyre designed to
provide an increase in mileage over the firm’s current line of tyres. To
estimate the mean useful life of the new tyres, the manufacturer
produced a sample of 120 tyres for testing. The test results provided a
sample mean of 36,500 miles. Hence, an estimate of the mean useful
life for the population of new tyres was 36,500 miles.


Benefits, Limitations, Definitions, Methods

● A sample mean provides an estimate of a population mean, and a
sample proportion provides an estimate of a population proportion.
● With estimates such as these, some estimation error can be expected.
● With proper sampling methods, the sample results will provide
“good” estimates of the population parameters.
● We need to find a basis for determining how large the estimation error
might be.
● The sampled population is the population from which the sample is
drawn, and a frame is a list of the elements that the sample will be
selected from.
● The target population is the population we want to make inferences
about.
● Numerical characteristics of a population are called parameters.
● There are probability and non-probability sampling methods. With a
probability sample, each possible sample has a known probability of
being selected.
Probability Sampling Methods
● Samples can be selected using random number tables or computer generated
random numbers [Excel function RANDBETWEEN()].
○ (The Excel Data Analysis Add-in ‘Sampling’ can generate a random sample of a given
size; the size is entered under “Number of Samples”)
● A simple random sample of size n from a finite population of size N is a
sample selected such that each possible sample of size n has the same
probability of being selected. "Simple" -> Same probability.
● If we do not want to select an element more than one time, any previously
used random numbers are ignored because the corresponding element is
already included in the sample. Selecting a sample in this manner is referred
to as sampling without replacement. If we selected a sample such that
previously used random numbers are acceptable and specific elements could
be included in the sample two or more times, we would be sampling with
replacement.
● A random sample of size n from an infinite population is a sample selected
such that the following conditions are satisfied.
○ Each element selected comes from the same population.
○ Each element is selected independently (There should not be any selection bias).
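The difference between sampling without and with replacement can be sketched in a few lines of Python. This is only an illustration: the frame of 500 element IDs and the function name `simple_random_sample` are hypothetical, and Python's `random` module stands in for the random number table or Excel function mentioned above.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n elements from the frame, each equally likely to be picked."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same element may appear two or more times.
        return rng.choices(frame, k=n)
    # Without replacement: an element already chosen cannot be chosen again.
    return rng.sample(frame, n)

frame = list(range(1, 501))  # hypothetical frame: element IDs 1..500
without_replacement = simple_random_sample(frame, 10, seed=1)
with_replacement = simple_random_sample(frame, 10, replace=True, seed=1)
print(without_replacement)
print(with_replacement)
```

Passing a seed only makes the draw repeatable for classroom use; in practice it can be left as `None`.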
Probability Sampling Methods (Contd.)
● In stratified random sampling, the elements in the population are
first divided into groups called strata, such that each element in the
population belongs to one and only one stratum. The basis for
forming the strata, such as department, location, age, industry type,
and so on, is at the discretion of the designer of the sample.
● However, the best results are obtained when the elements within each
stratum are as much alike (homogeneous) as possible.
● After the strata are formed, a simple random sample is taken from
each stratum.
● In cluster sampling, the elements in the population are first divided
into separate groups called clusters. Each element of the population
belongs to one and only one cluster. A simple random sample of the
clusters is then taken. All elements within each sampled cluster form
the sample.
● Cluster sampling tends to provide the best results when the elements
within the clusters are not alike. In the ideal case, each cluster is a
representative small-scale version of the entire population.
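The contrast between stratified and cluster sampling described above may be easier to see in code. The following is a minimal sketch under assumed data: the department strata, the branch clusters and both function names are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample from EVERY stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep ALL elements inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [element for name in chosen for element in clusters[name]]

# Hypothetical population of 100 employee IDs.
strata = {
    "Sales": list(range(0, 40)),
    "Finance": list(range(40, 60)),
    "IT": list(range(60, 100)),
}
clusters = {
    "Branch A": list(range(0, 25)),
    "Branch B": list(range(25, 50)),
    "Branch C": list(range(50, 75)),
    "Branch D": list(range(75, 100)),
}

strat = stratified_sample(strata, n_per_stratum=5, seed=7)
clust = cluster_sample(clusters, n_clusters=2, seed=7)
print(strat)
print(len(clust))
```

Note the design difference: the stratified sample touches every group but only a few elements in each, while the cluster sample touches only a few groups but every element in them.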
Probability Sampling Methods (Contd.)
● In systematic sampling, the first sample element is randomly selected
from the first (N/n) population elements. Every (N/n)th element
thereafter is chosen to be in the sample of size n.
● Because the first element selected is a random choice, a systematic
sample is usually assumed to have the properties of a simple random
sample. This assumption is especially applicable when the list of
elements in the population is a random ordering of the elements.

● In all probability sampling methods, elements selected from the
population have a known probability of being included in the sample.
● The advantage is that the sampling (probability) distribution of the
appropriate sample statistic (any quantity calculated from the sample)
generally can be identified.
● Then the sampling distribution can be used to make probability
statements about the error associated with using the sample results to
make inferences about the population.
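The systematic sampling rule above (random start within the first N/n elements, then every (N/n)th element) can be sketched as follows. The frame of 1,000 IDs and the function name are hypothetical, and the sketch assumes n does not exceed N; when N/n is not a whole number it simply rounds the interval down.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first k = N // n elements, then every kth element."""
    rng = random.Random(seed)
    k = len(frame) // n          # sampling interval, N/n rounded down
    start = rng.randrange(k)     # random position among the first k elements
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1000))        # hypothetical ordered list of N = 1000 elements
sample = systematic_sample(frame, n=50, seed=3)
print(sample[:5])
```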

Non-Probability Sampling Methods

● In Convenience sampling, the sample is identified primarily by
convenience. Elements are included in the sample without pre-specified
or known probabilities of being selected.
● In Judgment sampling, the person most knowledgeable on the
subject of the study selects elements of the population that he or she
feels are most representative of the population.
● In Quota sampling, the population is divided into mutually
exclusive sub-groups, from which the sample items are selected on
the basis of the investigator's knowledge and professional judgment.
The quotas, i.e., proportions of the groups, are decided and then,
within the quotas, the choice of sample items depends exclusively on
the investigator’s judgment.
● In Snowball sampling, a sample element is selected first and then
any referrals of this element are also included in the sample. The
sample develops like a "snowball".

Sampling Distributions

● A Statistic is a characteristic of the sample, calculated from the
sample elements. It is called a Point estimator if it is used to estimate
the value of a population parameter. Examples are the sample mean,
sample standard deviation and sample proportion.
● The probability distribution of a statistic is called a Sampling
distribution.
● Central Limit Theorem (CLT): For a sufficiently large sample, the
sample mean, Xbar, approximately follows a Normal distribution with mean
equal to the population mean, 𝜇, and standard deviation equal to 𝜎/sqrt(n).
● The standard deviation of the sample mean, 𝜎/sqrt(n), is called the
standard error of the mean.
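A small simulation can make the CLT statement concrete. The sketch below is an illustration only: it assumes an exponential population with mean 1 and standard deviation 1 (a deliberately skewed choice, not taken from the slides), draws many samples of size 40, and compares the spread of the sample means with 𝜎/sqrt(n).

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Return the means of `reps` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means = simulate_sample_means(n=40, reps=5000)
observed_mean = statistics.fmean(means)   # should be close to the population mean, 1
observed_se = statistics.stdev(means)     # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / 40 ** 0.5

print(round(observed_mean, 3), round(observed_se, 3), round(theoretical_se, 3))
```

Plotting a histogram of `means` would show the bell shape even though the population itself is strongly skewed.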

Exercise 1
● BusinessWeek conducted a survey of graduates from 30 top MBA
programs (BusinessWeek, September 22, 2003). On the basis of the
survey, assume that the mean annual salary for male and female
graduates 10 years after graduation is $168,000 and $117,000,
respectively. Assume the standard deviation for the male graduates is
$40,000, and for the female graduates it is $25,000.

What is the probability that


a. a simple random sample of 40 male graduates will provide a sample
mean within $10,000 of the population mean, $168,000?
b. a simple random sample of 40 female graduates will provide a
sample mean within $10,000 of the population mean, $117,000?
c. In which of the preceding two cases, part (a) or part (b), do we have a
higher probability of obtaining a sample estimate within $10,000 of
the population mean? Why?
d. What is the probability that a simple random sample of 100 male
graduates will provide a sample mean more than $4000 below the
population mean?

Solution to Exercise 1
a. Let Xbar be the sample mean of 40 male graduates. As per the CLT, Xbar
approximately follows a Normal distribution with mean = population mean of male
graduates = 168,000, and standard deviation 𝜎/sqrt(n) = 40,000/sqrt(40)
= 6324.55.
● The required probability = P[(168000 – 10000) < Xbar < (168000 + 10000)]
● = P[-10000/6324.55 < Z < 10000/6324.55]
● = P[-1.58 < Z < 1.58]
● = 0.9429 – 0.0571
● = 0.8858
● There is approximately an 89% chance that the sample mean of 40 male
graduates lies within $10,000 of the population mean.
b. Standard error = 25,000/sqrt(40) = 3952.85.
● The required probability = P[-10000/3952.85 < Z < 10000/3952.85]
● = P[-2.53 < Z < 2.53] = 0.9943 – 0.0057 = 0.9886
c. Part (b). The standard error of the female graduates’ sample mean is
smaller, so the sample mean is more likely to fall within $10,000 of the
population mean.
d. n = 100, SE of mean = 40000/sqrt(100) = 40000/10 = 4000
P[Xbar < (168000 – 4000)] = P[Z < -1] = 0.1587
● {General observation: As n increases, the probability that the sample
mean lies within a given distance of the population mean increases.}
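The table look-ups in this solution can be cross-checked with Python's standard library. This is a sketch, not part of the original solution: `prob_within` is a hypothetical helper name, and the results differ slightly from the table-based values because Z is not rounded to two decimals.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(mu, sigma, n, distance):
    """P(mu - distance < Xbar < mu + distance) using the CLT approximation."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(mu + distance) - sampling_dist.cdf(mu - distance)

p_male = prob_within(168_000, 40_000, 40, 10_000)      # part (a)
p_female = prob_within(117_000, 25_000, 40, 10_000)    # part (b)
# Part (d): sample mean more than $4,000 BELOW the population mean, n = 100.
p_below = NormalDist(168_000, 40_000 / sqrt(100)).cdf(168_000 - 4_000)

print(round(p_male, 4), round(p_female, 4), round(p_below, 4))
```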

Sample Proportion

● The sample proportion, pbar, approximately follows a Normal distribution
with mean equal to the population proportion, p, and standard
deviation sqrt(p(1 – p)/n).

● Exercise 2: The president of Doerman Distributors, Inc., believes that
30% of the firm’s orders come from first-time customers. A random
sample of 100 orders will be used to estimate the proportion of
first-time customers.
a. Assume that the president is correct and p is .30. What is the
sampling distribution of the sample proportion for this study?
b. What is the probability that the sample proportion will be
between .20 and .40?
c. What is the probability that the sample proportion will be
between .25 and .35?

a. The sample proportion approximately follows a Normal distribution
with mean equal to the population proportion = 0.3, and standard
deviation = sqrt(0.3*0.7/100) = 0.046.
b. The required probability = P[0.2 < pbar < 0.4]
= P[(0.2 – 0.3)/0.046 < Z < (0.4 – 0.3)/0.046]
= P[-2.17 < Z < 2.17]
= 2 * 0.485
= 0.97
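Parts (b) and (c) of Exercise 2 can be checked numerically with the same normal approximation. The sketch below uses a hypothetical helper, `proportion_sampling_dist`, and keeps the standard deviation unrounded, so the answers differ slightly from the table-based working.

```python
from math import sqrt
from statistics import NormalDist

def proportion_sampling_dist(p, n):
    """Normal approximation to the sampling distribution of the sample proportion."""
    return NormalDist(p, sqrt(p * (1 - p) / n))

dist = proportion_sampling_dist(0.30, 100)
p_b = dist.cdf(0.40) - dist.cdf(0.20)   # part (b): between .20 and .40
p_c = dist.cdf(0.35) - dist.cdf(0.25)   # part (c): between .25 and .35

print(round(dist.stdev, 4), round(p_b, 4), round(p_c, 4))
```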


Point and Interval Estimation of Mean; 𝜎 Known


● The sample mean and proportion are said to be point estimators of the
population mean and proportion respectively. As their expected values
(or means) are the population mean and proportion respectively, they
are said to be unbiased estimators (Excel Data Analysis Add-in
‘Descriptive Statistics’ -> Mean is provided).
● P[𝜇 – k < Xbar < 𝜇 + k] = (1 – 𝛼), where the margin of error, k, is fixed such
that 𝛼 is a very small value, usually 0.05.
P[-k/(𝜎/sqrt(n)) < Z < k/(𝜎/sqrt(n))] = 1 – 𝛼
P[-Z𝛼/2 < Z < Z𝛼/2] = 1 – 𝛼
k/(𝜎/sqrt(n)) = Z𝛼/2
[Figure: standard normal curve with central area 1 – 𝛼 between -Z𝛼/2 and Z𝛼/2]

Margin of Error and C.I.

P[𝜇 – k < Xbar < 𝜇 + k] = (1 – 𝛼)
P[-k/(𝜎/sqrt(n)) < Z < k/(𝜎/sqrt(n))] = 1 – 𝛼
Lower tail prob. = Upper tail prob. = 𝛼/2
Z𝛼/2 = k/(𝜎/sqrt(n))
k = Z𝛼/2 𝜎/sqrt(n)
● The 100(1 – 𝛼)% confidence interval for 𝜇 is given by
Xbar ± Z𝛼/2 𝜎/sqrt(n)
● (Excel Data Analysis Add-in ‘Descriptive Statistics’ -> Margin of
Error is provided).


Exercise 3

● The Wall Street Journal reported that automobile crashes cost the
United States $162 billion annually (The Wall Street Journal, March
5, 2008). The average cost per person for crashes in the Tampa,
Florida, area was reported to be $1,599. Suppose this average cost
was based on a sample of 50 persons who had been involved in car
crashes and that the population standard deviation is $600. What is
the margin of error for a 95% confidence interval? What would you
recommend if the study required a margin of error of $150 or less?


● Given, n = 50, Xbar = 1,599, sigma = 600, (1 – alpha) = 0.95
● This implies alpha = 0.05 (5%)
● Margin of error = Z𝛼/2 𝜎/sqrt(n)
● Z𝛼/2 = 1.96
● Margin of error = 1.96 * 600/sqrt(50) = 166.31.
The 95% confidence interval for the population mean is
(1599 – 166.31, 1599 + 166.31) = (1432.69, 1765.31); intervals constructed
this way miss the population mean 5% of the time.
● To get a margin of error of 150 or less, the sample size should be more
than 50, or the std. deviation should have been less than 600, or alpha
should be more than 0.05 (i.e., a lower confidence level).
● Ideally, to get a margin of error of 150 or less, we should take a sample
size of more than 50 (other things are beyond our control or not to be
manipulated).
● Specifically, n should be such that 1.96 * 600/sqrt(n) = 150 or less
● n = (1.96 * 600/150) ^ 2 = 61.47, i.e., n = 62 approx.
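The two calculations in this solution (margin of error for a given n, and the n needed for a given margin of error) can be sketched as small functions. The function names are hypothetical, and the critical value comes from Python's `statistics.NormalDist` rather than the printed Z table, so it is 1.95996 instead of 1.96.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Z value with upper-tail probability alpha/2, where alpha = 1 - confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(sigma, n, confidence=0.95):
    """k = Z_{alpha/2} * sigma / sqrt(n), for known sigma."""
    return z_critical(confidence) * sigma / sqrt(n)

def required_sample_size(sigma, max_error, confidence=0.95):
    """Smallest whole n whose margin of error does not exceed max_error."""
    return ceil((z_critical(confidence) * sigma / max_error) ** 2)

me = margin_of_error(sigma=600, n=50)
n_needed = required_sample_size(sigma=600, max_error=150)
print(round(me, 2), n_needed)
```

Rounding n up rather than to the nearest integer guarantees that the margin of error requirement is actually met.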
Exercise 4
● Playbill magazine reported that the mean annual household income of
its readers is $119,155 (Playbill, January 2006). Assume this estimate
of the mean annual household income is based on a sample of 80
households, and based on past studies, the population standard
deviation is known to be $30,000.
a. Develop a 90% confidence interval estimate of the population mean.
● The 100(1 – 𝛼)% confidence interval for 𝜇 is given by
Xbar ± Z𝛼/2 𝜎/sqrt(n) = 119155 ± 1.645 x 30000/sqrt(80)
ME = 1.645 x 30000/sqrt(80) = 49350/8.944 = 5517
90% CI = (119155 – 5517, 119155 + 5517) = (113638, 124672)
b. Develop a 95% confidence interval estimate of the population mean.

● n = 80, Xbar = 119,155, sigma = 30,000
a. Given alpha = 0.1, Z𝛼/2 = 1.645.
The 90% CI for the mean, Xbar ± Z𝛼/2 𝜎/sqrt(n), is
119155 ± 1.645 * 30000/sqrt(80)
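Both intervals asked for in Exercise 4 follow from the same formula with a different Z value. The sketch below is an illustration under the exercise's stated assumptions (𝜎 known); `z_interval` is a hypothetical name, and the Z values are computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known: xbar +/- z * sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z * sigma / sqrt(n)
    return xbar - me, xbar + me

ci_90 = z_interval(119_155, 30_000, 80, 0.90)
ci_95 = z_interval(119_155, 30_000, 80, 0.95)
print([round(v) for v in ci_90], [round(v) for v in ci_95])
```

The 95% interval is wider than the 90% interval: more confidence is bought with less precision when n is fixed.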


Interval Estimation of Mean; 𝜎 Unknown


● 𝜎 can be estimated by the sample standard deviation,
s = sqrt[Σ(xi – Xbar)^2/(n – 1)]. It is known that E(s^2) = 𝜎^2.
● The margin of error turns out to be t𝛼/2 s/sqrt(n), where t𝛼/2 is the value of a
Student's-t random variable with (n – 1) degrees of freedom
corresponding to an upper tail probability of 𝛼/2 (Pg. 977 of e-book).
● The 100(1 – 𝛼)% confidence interval for 𝜇 is given by
Xbar ± t𝛼/2 s/sqrt(n)
● The t distribution is the sampling distribution of (Xbar – 𝜇)/(s/sqrt(n)).
● Degrees of freedom refer to the number of independent pieces of
information that go into the computation of Σ(xi – Xbar)^2. The n pieces of
information are
(x1 – Xbar), (x2 – Xbar), . . . , (xn – Xbar).
Since the sum of all these is equal to zero, if we know (n – 1) of them,
the remaining value can be determined from this condition.
● When 𝜎 is unknown, a sample size of n ≥ 50 is recommended if the
population distribution is believed to be highly skewed or has outliers.
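The degrees-of-freedom argument can be verified on a toy data set. The five values below are hypothetical; the sketch shows that the deviations from the sample mean sum to zero, so the last deviation is fixed by the other n – 1, and that dividing by n – 1 reproduces Python's `statistics.stdev`.

```python
import statistics

data = [12.0, 15.0, 9.0, 20.0, 14.0]          # hypothetical sample, n = 5
xbar = statistics.fmean(data)
deviations = [x - xbar for x in data]

# Knowing the first n - 1 deviations determines the last one.
implied_last = -sum(deviations[:-1])

# Sample standard deviation with the (n - 1) divisor.
s = (sum(d ** 2 for d in deviations) / (len(data) - 1)) ** 0.5

print(deviations, implied_last, round(s, 3))
```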


Exercise 5

● Sales personnel for Skillings Distributors submit weekly reports
listing the customer contacts made during the week. A sample of 65
weekly reports showed a sample mean of 19.5 customer contacts per
week. The sample standard deviation was 5.2.
Provide 90% and 95% confidence intervals for the population mean
number of weekly customer contacts for the sales personnel.


● Given, Xbar = 19.5, s = 5.2, n = 65
● For 90% CI for the mean number of customer contacts, we take alpha
= 1 – 0.9 = 0.1
● The 100(1 – 𝛼)% confidence interval for 𝜇 is given by
Xbar ± t𝛼/2 s/sqrt(n)
● From the table of Student's t distribution, for n – 1 = 64 degrees of
freedom,
t𝛼/2 = 1.669
Therefore the 90% CI for the mean … is given by {19.5 ± (1.669 * 5.2/sqrt(65))}
= (18.42, 20.58).
For 95% …, alpha = 0.05
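The interval arithmetic for Exercise 5 can be sketched as below. Python's standard library has no t distribution, so the critical value is passed in as a table look-up: 1.669 is the value used above, while 1.998 for the 95% case is an assumption taken from a standard t table for 64 degrees of freedom, not from the slides. The function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown: xbar +/- t * s/sqrt(n)."""
    me = t_crit * s / sqrt(n)
    return xbar - me, xbar + me

# t values for n - 1 = 64 degrees of freedom (table look-ups, see lead-in).
ci_90 = t_interval(19.5, 5.2, 65, t_crit=1.669)
ci_95 = t_interval(19.5, 5.2, 65, t_crit=1.998)
print([round(v, 2) for v in ci_90], [round(v, 2) for v in ci_95])
```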

Exercise 6

● The average cost per night of a hotel room in New York City is $273
(SmartMoney, March 2009). Assume this estimate is based on a
sample of 45 hotels and that the sample standard deviation is $65.
a. With 95% confidence, what is the margin of error?
b. What is the 95% confidence interval estimate of the population mean?
c. Two years ago the average cost of a hotel room in New York City was
$229. Discuss the change in cost over the two-year period.

Determination of Sample Size

● If we fix the margin of error, k, we can determine the optimum sample
size that will yield estimates of the mean within a particular interval
with a specified level of confidence.
● Assumes that 𝜎 is known. We can determine a planning value of 𝜎 by
one of the following methods:
1. Use the estimate of the population standard deviation computed
from data of previous studies as the planning value for 𝜎.
2. Use a pilot study to select a preliminary sample. The sample
standard deviation from the preliminary sample can be used as the
planning value for 𝜎.
3. Use judgment or a “best guess” for the value of 𝜎. For example,
the range (difference between the largest and smallest values in the
data) divided by 4 is often suggested as a rough approximation
of 𝜎.


Exercise 7

● Annual starting salaries for college graduates with degrees in business
administration are generally expected to be between $30,000 and
$45,000. Assume that a 95% confidence interval estimate of the
population mean annual starting salary is desired. What is the
planning value for the population standard deviation? How large a
sample should be taken if the desired margin of error is
a. $500?
b. $200?
c. $100?
d. Would you recommend trying to obtain the $100 margin of error?
Explain.
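One way to work Exercise 7 is to take range/4 as the planning value for 𝜎, as suggested on the previous slide, and then solve k = Z𝛼/2 𝜎/sqrt(n) for n. The sketch below follows that approach; the function names are hypothetical and the Z value is computed rather than read from a table.

```python
from math import ceil
from statistics import NormalDist

def planning_sigma(low, high):
    """Rough planning value for sigma: range divided by 4."""
    return (high - low) / 4

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """n = (z * sigma / margin)^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

sigma_plan = planning_sigma(30_000, 45_000)
sizes = {m: sample_size_for_mean(sigma_plan, m) for m in (500, 200, 100)}
print(sigma_plan, sizes)
```

Because n grows with the square of 1/k, cutting the margin of error from $500 to $100 multiplies the required sample size by roughly 25, which is the trade-off part (d) asks about.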


Interval Estimation for Population Proportion

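● The margin of error turns out to be Z𝛼/2 sqrt[pbar(1 – pbar)/n], where Z𝛼/2 is
the value of Z corresponding to an upper tail probability of 𝛼/2.
● The 100(1 – 𝛼)% confidence interval for p is given by
pbar ± Z𝛼/2 sqrt[pbar(1 – pbar)/n]
● Sample Size Determination: n = (Z𝛼/2)^2 p*(1 – p*)/k^2, where p* is a
planning value for p and k is the desired margin of error.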


Exercise 8

● The Consumer Reports National Research Center conducted a
telephone survey of 2000 adults to learn about the major economic
concerns for the future (Consumer Reports, January 2009). The
survey results showed that 1760 of the respondents think the future
health of Social Security is a major economic concern.
a. What is the point estimate of the population proportion of adults
who think the future health of Social Security is a major economic
concern?
b. At 90% confidence, what is the margin of error?
c. Develop a 90% confidence interval for the population proportion
of adults who think the future health of Social Security is a major
economic concern. (0.88 – 0.0120, 0.88 + 0.0120) = (0.868, 0.892)
d. Develop a 95% confidence interval for this population proportion.
e. How large a sample is needed if the desired margin of error is .05?
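Exercise 8 can be worked end to end with the interval formula for a proportion. The sketch below is an illustration with hypothetical function names. Two assumptions are made for part (e), which the exercise leaves open: a 95% confidence level, and the sample proportion 0.88 as the planning value.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Z value with upper-tail probability alpha/2."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_interval(successes, n, confidence):
    """Point estimate, margin of error and confidence interval for a proportion."""
    pbar = successes / n
    me = z_critical(confidence) * sqrt(pbar * (1 - pbar) / n)
    return pbar, me, (pbar - me, pbar + me)

def sample_size_for_proportion(p_plan, margin, confidence=0.95):
    """n = z^2 * p*(1 - p*) / margin^2, rounded up."""
    return ceil(z_critical(confidence) ** 2 * p_plan * (1 - p_plan) / margin ** 2)

pbar, me_90, ci_90 = proportion_interval(1760, 2000, 0.90)   # parts (a)-(c)
_, me_95, ci_95 = proportion_interval(1760, 2000, 0.95)      # part (d)
n_e = sample_size_for_proportion(pbar, 0.05)                 # part (e), assumes 95%

print(pbar, round(me_90, 4), round(me_95, 4), n_e)
```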
