
CHAPTER 3: SAMPLING

What is a Survey and Why do we use Sampling (rather than full enumeration)?
The recording of information on an entire population is called a census. An example of this is collecting the grades of all the Grade 11 learners,
or the decennial population census done by the Philippine Statistics Authority (PSA).
A sample survey is a method of systematically gathering information on a segment of the population, such as individuals, families, wildlife, farms, business firms, and
unions of workers, for the purpose of inferring quantitative descriptors of the attributes of the population.
The fraction of the population being studied is called a sample.
Reasons why we resort to sampling.
Cost. A sample often provides useful and reliable information at a much lower cost than a census. For extremely large populations, the conduct of a census can
even be impractical. In fact, the difficulty of analyzing complete census data led to summarizing a census by taking a “sample” of returns.
Timeliness. A sample usually provides more timely information because fewer data are to be collected and processed. This attribute is particularly important when
information is needed quickly.
Accuracy. A sample often provides information as accurate, or more accurate, than a census, because data errors typically can be controlled better in smaller tasks.
Detailed information. More time is spent in getting detailed information with sample surveys than with censuses. In a census, we can often only obtain stock, not
flow data. For instance, agricultural production cannot be generated from censuses.
Destructive testing. When a test involves the destruction of an item, sampling must be used. Battery life tests must use sampling because something must be left to
sell!
Types of Sampling

1. Probability Sampling
If data are to be used to make decisions about a population, then how the data are collected is critical. For sample data to provide reliable information
about a population of interest, the sample must be representative of that population. Selecting samples from the population using chance helps make the
samples representative.
Basic Types of Probability Sampling

a. Simple random sampling (SRS) involves allowing each possible sample to have an equal chance of being picked and every member of the population has
an equal chance of being included in the sample. Selection may be with replacement (selected individual or unit is returned to frame for possible
reselection) or without replacement (selected individual or unit isn’t returned to the frame). This sampling method requires a listing of the elements of the
population called the sampling frame.

b. Stratified sampling is an extension of simple random sampling which allows for different homogeneous groups, called strata, in the population to be
represented in the sample. To obtain a stratified sample, the population is divided into two or more strata based on common characteristics, and a random sample is then drawn from each stratum.
c. In systematic sampling, elements are selected from the population at a uniform interval that is measured in time, order, or space.
d. Cluster sampling divides the population into groups called clusters, selects a random sample of clusters, and then subjects the sampled clusters to
complete enumeration, that is, everyone in the sampled clusters is made part of the sample.
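
The four probability sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 40 numbered learners, the two strata, the five sections used as clusters, and all function names are hypothetical and do not come from the lesson.

```python
import random

def simple_random_sample(frame, n, rng, replace=False):
    """Draw n units from the sampling frame, each unit equally likely."""
    if replace:
        return [rng.choice(frame) for _ in range(n)]
    return rng.sample(frame, n)

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample (without replacement) inside each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

def systematic_sample(frame, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]

def cluster_sample(clusters, m, rng):
    """Pick m clusters at random and completely enumerate the chosen clusters."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(2024)            # fixed seed so the run is repeatable
frame = list(range(1, 41))           # hypothetical frame: learners numbered 1 to 40

srs = simple_random_sample(frame, 10, rng)

strata = {"boys": frame[:20], "girls": frame[20:]}   # hypothetical strata
strat = stratified_sample(strata, {"boys": 5, "girls": 5}, rng)

syst = systematic_sample(frame, 4, rng)              # every 4th learner

# Hypothetical clusters: five sections of eight learners each
clusters = {f"section_{i}": frame[i * 8:(i + 1) * 8] for i in range(5)}
clus = cluster_sample(clusters, 2, rng)

print(srs, strat, syst, clus, sep="\n")
```

Note that only the cluster sample takes everyone in the selected groups, while the stratified sample takes a few units from every group.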

2. Non-probability Sampling

Non-probability or judgment sampling is the generic name of several sampling methods in which some units in the population have no chance of being
selected in the sample, or in which the inclusion probabilities cannot be computed.

Types of Non-probability Samples

a. Haphazard or accidental sampling involves an unsystematic selection of sample units. Some disciplines like archaeology, history, and even medicine draw
conclusions from whatever items are made available. Some disciplines like astronomy, experimental physics, and chemistry often do not care about the
“representativeness” of their specimens.
b. In convenience sampling, sample units expedient to the sampler are taken.
c. For volunteer sampling, sample units are volunteers in studies wherein the measuring process is painful or troublesome to a respondent.
d. Purposive sampling pertains to having an expert select a representative sample based on his own subjective judgment. For instance, in Accounting, a
sample audit of ledgers may be taken of certain weeks (which are viewed as typical). Many agricultural surveys also adopt this procedure for lack of a
specific sampling frame.
e. In Quota Sampling, sample units are picked for convenience but certain quotas (such as the number of persons to interview) are given to interviewers.
This design is especially used in market research.
f. In Snowball Sampling, additional sample units are identified by asking previously picked sample units for people they know who can be added to the
sample. Usually, this is used when the topic is not common, or the population is hard to access.
Ways of classifying surveys.
size of the sample – e.g. large-scale or small-scale
periodicity – longitudinal or panel, where respondents are monitored periodically; cross-section; quarterly
main objective – descriptive, analytic
method of data collection – mail, face-to-face interview, e-survey, phone survey, SMS survey
respondents – individual, household, establishment (or enterprise), farmer, OFW

Survey Errors
When collecting data, whether through sample surveys or censuses, a variety of survey errors may arise. This is why it is crucial to design the data collection process
very carefully. Censuses may also overcount or undercount certain portions of the population of interest. Survey errors involve sampling errors and non-sampling
errors:
Types of Survey errors
• Sampling error results from chance variation from sample to sample in a probability sample. It is roughly the difference between the value obtained in a sample
statistic and the value of the population parameter that would have arisen had a census been conducted. Since estimates of a parameter from a probability sample
would vary from sample to sample, the variation in estimates serves as a measure of sampling error.

• Non-sampling error:
Coverage error or selection bias results if some groups are excluded from the frame and have no chance of being selected.
Non-response error or bias occurs when people who do not respond may be different from those who do respond.
Measurement error arises from weaknesses in question design, respondent error, and the interviewer’s impact on the respondent.
Sampling Distribution
Sample table
Number in sample    Weights of sample    Sample mean (x̄)    Sample proportion (p̂)
1-1                 40 kg, 40 kg         40 kg               1
1-2                 40 kg, 38 kg         39 kg               0.5
...
5-5                 39 kg, 39 kg         39 kg               0
Total                                    1237.5              15
Average                                  49.5 kg             0.6

1. Expected Value and Standard Error
Given a sample data set that was drawn from a certain population, the resulting sample mean (x̄) serves as an estimate
of the mean of the population from which the sample was derived. In a replicated run of the random sampling protocol, the new estimate is likely to vary from the
first estimate because of randomization. If the protocol were to be further replicated many times, then we would have a distribution of estimates. The set of all
possible estimates is called the sampling distribution. This sampling distribution has a mean (the mean of all the sample means, µ_x̄) and a
standard deviation (σ_x̄). Henceforth, we define:

• the expected value (EV) as the mean of the sampling distribution
• the standard error (SE) as the standard deviation of the sampling distribution

It turns out that for the sampling distribution of the mean, EV is the population mean µ. That is,

EV = µ_x̄ = µ

Thus, we say that the sample mean is an unbiased estimate of the population mean µ. In addition, the SE can be viewed as measuring the amount of
chance variability in the estimates that could be generated over all possible samples. Ask learners whether they prefer the estimates to have less
variability. Their intuition should lead them to say that they desire estimates with a small SE, since in this case, the chances are good that the
estimate will be close to the true value of the parameter.
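
The idea of a sampling distribution can be checked by brute force on a tiny population. The sketch below lists every possible sample of size 2 drawn with replacement from five units, then computes the EV and SE of the sample mean. The five weights are hypothetical values chosen for illustration (they are not the full data behind the sample table), and the comparison with σ/√n relies on the standard result for sampling with replacement, which this lesson does not derive.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [40, 38, 42, 41, 39]   # hypothetical weights in kg of 5 units
n = 2                               # sample size

# All 5 * 5 = 25 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

ev = mean(sample_means)     # expected value: mean of the sampling distribution
se = pstdev(sample_means)   # standard error: SD of the sampling distribution

print("population mean:", mean(population))
print("EV of sample mean:", ev)
print("SE of sample mean:", se)
print("sigma / sqrt(n):", pstdev(population) / sqrt(n))
```

The EV matches the population mean, which is what unbiasedness means, and the SE agrees with the population standard deviation divided by the square root of the sample size.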

CHAPTER 4: ON ESTIMATION OF PARAMETERS

Concepts of Point and Interval Estimation

A point estimate is a numerical value and it identifies a location or a position in the distribution of possible values.
A confidence interval estimate is a range of values where one has a certain percentage of confidence that the true value will likely fall in.
Determine whether inferential statistics is applicable or not in each of the following situations:
a. The government would like to know the per capita rice consumption per day of Filipinos.
b. The effectiveness of a newly developed cure for cancer is to be determined.
c. A presidential candidate decides to take a survey through text messaging to determine the proportion of voters who are likely to vote for him/her.
d. A farmer wants to estimate the number of pigs he has in the pig pen. He decides to capture 20 pigs, puts a red mark on the captured pigs, and then lets
them loose. After a day or so, the farmer recaptures another set of pigs, say, 10 of them, and notices that only one of them has a red mark, and
so he estimates that he has 20/(0.1) = 200 pigs.
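
The farmer's reasoning in item (d) can be written as a small function. This is a minimal sketch: the function and parameter names are hypothetical, and it assumes the marked pigs mix evenly with the rest, so that the marked share of the second catch estimates the marked share of the whole pen.

```python
def capture_recapture_estimate(marked, recaptured, marked_in_recapture):
    """Estimate the population size from a mark-and-recapture experiment.

    marked: number of animals marked in the first capture
    recaptured: number of animals caught in the second capture
    marked_in_recapture: how many of the second catch carry a mark
    """
    if marked_in_recapture == 0:
        raise ValueError("no marked animals were recaptured; estimate undefined")
    # Same as marked / (marked_in_recapture / recaptured), written to avoid
    # rounding issues with fractions such as 0.1
    return marked * recaptured / marked_in_recapture

estimate = capture_recapture_estimate(marked=20, recaptured=10, marked_in_recapture=1)
print(estimate)  # 200.0, matching 20/(0.1) in the text
```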

Lesson: Point Estimation Of The Population

A parameter is a characteristic of the population which is usually unknown and needs to be estimated.
A statistic, on the other hand, is computed from a random sample; hence, it is known and is used to estimate the unknown parameter.

There are two types of estimation: point and interval estimation. In estimating a parameter, the mathematical expression or formula used in coming up with the
estimate is referred to as an estimator, while the estimate is the numerical value that you arrive at when you apply the estimator to the sample data.

With several estimators, we must choose and use the “best” estimator but how do we choose the “best”? An estimator could be evaluated based on the two
statistical properties: accuracy and precision, which are both measures of closeness. Accuracy is a measure of closeness of the estimates to the true value while
precision is a measure of closeness of the estimates to each other.

Properties of a good estimator

Accuracy
Accuracy is a measure of how close the estimates are to the parameter. It can be measured by bias, i.e., the difference of the expected value of the estimate
from the true value of the parameter. An estimator is said to be unbiased if its bias is zero. Otherwise, the estimator is biased. When bias is positive or greater
than zero, the estimator overestimates the parameter. If the bias is negative, the estimator underestimates the parameter.
Precision
Precision is a measure of how close the estimates are with each other. The variance of the estimator or its standard error gives a measure of how precise the
estimator is. The smaller the value of the standard error of an estimator, the more precise the estimator is.

In general, we want the estimator to be both accurate and precise. We can illustrate precision and accuracy by way of an analogy. Let us represent the
parameter as a target bull’s eye while the estimates of the parameters are the arrows shot by an archer. The first target (1) in the figure below illustrates a
precise but not an accurate estimator. The second target (2) shows that the archer or estimator is accurate but not precise. The third target (3) shows the
archer is both precise and accurate while the last target (4) shows an estimator that is neither accurate nor precise.

(1) (2) (3) (4)


Figure 3-02.5 Analogy between estimation and hitting the bull’s eye
Example: The sample mean (of a simple random sample) is an estimator of the population mean that is both accurate and precise. Its expected value is equal to
the population mean itself, which is why it is unbiased and, consequently, an accurate estimator. It is precise because statistical theory shows that it has
the smallest standard error among linear unbiased estimators of the population mean. These properties make the sample mean a good estimator of the
population mean.
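
Bias and standard error can be computed exactly for a small population by listing all possible samples. The sketch below compares two estimators of the population mean: the sample mean and, as a deliberately poor alternative, the sample maximum. The population values, the sample size of 3, and the helper name `bias_and_se` are hypothetical choices made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]   # hypothetical population values
mu = mean(population)           # true parameter: the population mean

# All 10 possible samples of size 3 drawn without replacement
samples = list(combinations(population, 3))

def bias_and_se(estimator):
    """Return (bias, standard error) of an estimator over all possible samples."""
    estimates = [estimator(s) for s in samples]
    bias = mean(estimates) - mu   # accuracy: expected value minus true value
    se = pstdev(estimates)        # precision: spread of the estimates
    return bias, se

mean_bias, mean_se = bias_and_se(mean)
max_bias, max_se = bias_and_se(max)

print("sample mean    -> bias:", mean_bias, " SE:", mean_se)
print("sample maximum -> bias:", max_bias, " SE:", max_se)
```

In this setup the sample mean has zero bias (up to floating-point rounding), while the sample maximum has a positive bias and therefore overestimates the population mean.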
ASSESSMENT.
1. The process of using sample statistics to draw conclusions about true population parameters is called
a. statistical inference b. the scientific method c. sampling d. descriptive statistics
2. The universe or "totality of items or things" under consideration is called
a. a sample b. a population c. a parameter d. a statistic
3. The portion of the universe that has been selected for analysis is called
a. a sample b. a frame c. a parameter d. a statistic

4. A summary measure that is computed to describe a characteristic from only a sample of the population is called
a. a parameter b. a census c. a statistic d. the scientific method
5. A summary measure that is computed to describe a characteristic of an entire population is called
a. a parameter b. a census c. a statistic d. the scientific method
6. Which of the following is most likely a population as opposed to a sample?
a. respondents to a newspaper survey
b. the first 5 learners completing an assignment
c. every third person to arrive at the bank
d. registered voters in a county
7. Which of the following is most likely a parameter as opposed to a statistic?
a. The average score of the first five learners completing an assignment
b. The proportion of females registered to vote in a county
c. The average height of people randomly selected from a database
d. The proportion of trucks stopped yesterday that were cited for bad brakes
8. Which of the following is NOT a reason for the need for sampling?
a. It is usually too costly to study the whole population.
b. It is usually too time-consuming to look at the whole population.
c. It is sometimes destructive to observe the entire population.
d. It is always more informative to investigate a sample than the entire population.
9. Which of the following is NOT a reason for drawing a sample?
a. A sample is less time consuming than a census.
b. A sample is less costly to administer than a census.
c. A sample is always a good representation of the target population.
d. A sample is less cumbersome and more practical to administer.
10. The Philippine Airlines Internet site provides a questionnaire instrument that can be answered electronically. Which of the 4 methods of data collection is
involved when people complete the questionnaire?
a. Published sources
b. Experimentation
c. Surveying
d. Observation
11. Identify which sampling method is applied in the following situations.
1. The teacher randomly selects 20 boys and 15 girls from a batch of learners to be members of a group that will go to a field trip.
2. A sample of 10 mice are selected at random from a set of 40 mice to test the effect of a certain medicine.
3. The people in a certain seminar belong to five groups. All members of two of the five groups are asked what they think about the president.
4. A barangay health worker asks every fourth house in the village for the ages of the children living in those households.
5. A sales clerk for a brand of clothing asks people who come up to her whether they own an article of clothing from her brand.
6. A psychologist asks his patient, who suffers from depression, whether he knows other people with the same condition, so he can include them in his
study.
7. A brand manager of a toothpaste asks the ten dentists whose clinics are closest to his office whether they use a particular brand of toothpaste.

CHAPTER 5: TESTS OF HYPOTHESIS


Lesson 1: Basic Concepts in Hypothesis Testing
Consider the following pronouncement: “The country will experience the El Niño phenomenon in the next few months.”
Reactions of learners to this pronouncement may include the following:
a. The occurrence of El Niño phenomenon is not sure. There is a possibility that El Niño phenomenon may not occur.
b. The effects of El Niño phenomenon are devastating to the country.
c. Some of the consequences of the El Niño phenomenon are tolerable while other consequences are not.
d. The validity of the statement could be tested based on some empirical facts.

• The pronouncement is a claim that may be true or false. Such claim could be referred to as a statistical hypothesis. A statistical hypothesis is a claim or a
conjecture that may either be true or false. The claim is usually expressed in terms of the value of a parameter or the distribution of the population values.
• There are two possible actions that one can do with the statement. These actions are either to accept the statement or to reject it. These actions are
brought about by a decision whether the statement is true or false. Some of the learners may believe that the statement is true, hence they accept the
pronouncement. Others may think that the statement is false, hence they reject the claim.
• The actions we made have consequences. Possible consequences of accepting that the statement is true include: (a) increase the importation of rice in
anticipation of supply shortage; (b) buy materials for water storage; (c) use drought-resistant varieties of rice; (d) invest in programs to make Filipinos ready;
and the like. On the other hand, when the statement is rejected because we think it is false, possible consequences are (a) We are not prepared for rice and
water shortage; (b) Farmers experience great loss on production; or (c) We do not do anything.
Some of the consequences are tolerable while other consequences are severe. Experiencing a few days of water shortage is tolerable but having rice shortage for a
month or two is unbearable. The degree of the possible consequence is the basis in making the decision. If the consequences of accepting the claim that El Niño
phenomenon is going to happen are tolerable, then we may not reject the pronouncement. However, if the consequences are severe, then we reject the claim.
Consider another statement or claim but this time regarding a parameter. Consider the average number of text messages that a Grade 11 student sends in a day. The
statement could be stated as follows:

“The average daily number of text messages that a Grade 11 student sends is equal to 100.”

As discussed earlier, this statement can either be true or false. Hence, one can accept or reject this statement. The validity of this statement can be assessed through
a series of steps known as test of hypothesis. A test of hypothesis is a procedure based on a random sample of observations with a given level of probability of
committing an error in making the decision, whether the hypothesis is true or false.

In hypothesis testing, we first formulate the hypotheses to be tested. In the formulation of the hypotheses, we take note of the following:
• There are two kinds of statistical hypotheses: the null and the alternative hypothesis. A null hypothesis is the statement or claim or conjecture to be
tested while an alternative hypothesis is the claim that is accepted in case the null hypothesis is rejected. The symbol “Ho” is used to represent a null hypothesis
while “Ha” is used to represent an alternative hypothesis. The statement “The average daily number of text messages that a Grade 11 student sends is equal to
100.” is considered a null hypothesis. In the event that we reject this claim, we can accept another statement which states otherwise, that is, “The average
daily number of text messages that a Grade 11 student sends is not equal to 100.” This statement is our alternative hypothesis.

• In formulating the hypotheses, we can use the following guidelines:


A null hypothesis is generally a statement of no change. Thus, a statement of equality or one which involves the equality is usually considered in the null
hypothesis. Possible forms of the null hypothesis include (a) equality; (b) less than or equal; and (c) greater than or equal.
The statistical hypothesis is about a parameter or distribution of the population values. For example, the parameter in the statement is the average daily
number of text messages that a Grade 11 student sends. Usually, the parameter is represented by a symbol, like for the population mean, we use µ.
Hence, the null and alternative hypotheses could be stated using symbols as “Ho: µ = 100 against Ha: µ ≠ 100.”
The null and alternative hypotheses are complementary and must not overlap. The usual pairs are as follows:
Ho: Parameter = Value versus Ha: Parameter ≠ Value;
Ho: Parameter = Value versus Ha: Parameter < Value;
Ho: Parameter = Value versus Ha: Parameter > Value;
Ho: Parameter ≤ Value versus Ha: Parameter > Value; and
Ho: Parameter ≥ Value versus Ha: Parameter < Value

• As discussed earlier, there are two actions that one can make on the hypothesis. One can either reject or fail to reject (accept) a hypothesis. The table
below shows these actions:

Action                                    Hypothesis is TRUE    Hypothesis is FALSE
Reject the hypothesis                     Error committed       No error committed
Fail to reject (accept) the hypothesis    No error committed    Error committed

• The table shows that there are no errors committed when we reject a false hypothesis and when we fail to reject a true hypothesis. On the other hand,
an error is committed when we reject a true hypothesis and such error is called a Type I error. Also, when we fail to reject (accept) a false hypothesis, we are
committing a Type II error.
• As mentioned earlier, for every action that one takes, there are consequences. When we commit an error, there are consequences, too. Since it is an
error in decision making, the consequences may be tolerable or severe, even severe enough to cost lives. In Statistics, we measure the chance of committing
the error so that we have a basis for making a decision.
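
The four cells of the decision table can be expressed as a short function, which may help when checking answers about Type I and Type II errors. The function name and its two Boolean arguments are hypothetical; the sketch only labels an outcome and does not perform a statistical test.

```python
def classify_decision(null_is_true, reject_null):
    """Label the outcome of a decision about the null hypothesis."""
    if null_is_true and reject_null:
        return "Type I error"       # rejected a true hypothesis
    if not null_is_true and not reject_null:
        return "Type II error"      # failed to reject a false hypothesis
    return "no error"               # the decision matched the truth

# Walk through all four combinations in the decision table
for null_is_true in (True, False):
    for reject_null in (True, False):
        truth = "true" if null_is_true else "false"
        action = "reject" if reject_null else "fail to reject"
        print(f"Ho is {truth}, we {action}: {classify_decision(null_is_true, reject_null)}")
```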

ASSESSMENT

As an assessment, choose one of the following problems and ask learners to formulate the appropriate null and alternative hypotheses. You can also ask them to
identify situations where Type I and Type II errors are committed. Have them state its possible consequences.
1. A manufacturer of IT gadgets recently announced they had developed a new battery for a tablet and claimed that it has an average life of at least 24 hours.
Would you buy this battery?
2. A teenager who wants to lose weight is contemplating following a diet she read about on Facebook. She wants to adopt it but, unfortunately, following
the diet requires buying nutritious, low calorie yet expensive food. Help her decide.
3. Alden is exclusively dating Maine. He remembers that on their first date, Maine told him that her birthday was this month. However, he forgot the exact date.
Ashamed to admit that he did not remember, he decides to use hypothesis testing to make an educated guess that today is Maine’s birthday. Help Alden do it.
4. After senior high school, Lilifut is pondering whether or not to pursue a degree in Statistics. She was told that if she graduates with a degree in Statistics, a life of
fulfilment and happiness awaits her. Assist her in making a decision.
5. An airline company regularly does quality control checks on airplanes. Tire inspection is included since tires are sensitive to the heat produced when the
airplane passes through the airport’s runway. Since it began operating, the company has used a particular type of tire which is guaranteed to perform even at a
maximum surface temperature of 107 °C. However, the tires cannot be used and need to be replaced when the mean surface temperature exceeds 107 °C.
Help the company decide whether or not to do a complete tire replacement.
6. Which of the following would be an appropriate null hypothesis?
a. The mean of a population is equal to 50.
b. The mean of a sample is equal to 50.
c. The mean of a population is greater than 50.
d. Only (a) and (c) are true.
7. Which of the following would be an appropriate null hypothesis?
a. The population proportion is less than 0.45.
b. The sample proportion is less than 0.45.
c. The population proportion is no less than 0.45.
d. The sample proportion is no less than 0.45.
8. Which of the following would be an appropriate alternative hypothesis?
a. The mean of a population is equal to 50.
b. The mean of a sample is equal to 50.
c. The mean of a population is greater than 50.
d. The mean of a sample is greater than 50.
9. Which of the following would be an appropriate alternative hypothesis?
a. The population proportion is less than 0.45.
b. The sample proportion is less than 0.45.
c. The population proportion is no less than 0.45.
d. The sample proportion is no less than 0.45.
10. A Type II error is committed when
a. we reject a null hypothesis that is true.
b. we don't reject a null hypothesis that is true.
c. we reject a null hypothesis that is false.
d. we don't reject a null hypothesis that is false.
11. A Type I error is committed when
a. we reject a null hypothesis that is true.
b. we don't reject a null hypothesis that is true.
c. we reject a null hypothesis that is false.
d. we don't reject a null hypothesis that is false.

12. Suppose we wish to test H0: µ ≤ 47 versus H1: µ > 47. What will result if we conclude that the mean is greater than 47 when its true value is really 52?
a. We have made a Type I error.
b. We have made a Type II error.
c. We have made a correct decision.
d. None of the above is correct.
13. If, as a result of a hypothesis test, we reject the null hypothesis when it is false, then we have committed
a. a Type II error.
b. a Type I error.
c. no error.
d. an acceptance error.
14. The owner of a local restaurant has recently surveyed a random sample of n = 250 customers of the restaurant. She would now like to determine whether or
not the mean age of her customers is over 30. If so, she planned to provide background music to appeal to an older crowd. If not, no changes would be made to
the background music in the restaurant. The appropriate hypotheses to test are:
a. H0: µ ≥ 30 versus H1: µ < 30.
b. H0: µ ≤ 30 versus H1: µ > 30.
c. H0: x̄ ≥ 30 versus H1: x̄ < 30.
d. H0: x̄ ≤ 30 versus H1: x̄ > 30.
15. A major telco is considering opening a new telecom center in an area that currently does not have any such centers. The telco will open the center if there is
evidence that more than 5,000 of the 20,000 households in the area use the telco. It conducts a poll of 300 randomly selected households in the area and finds
that 96 subscribe to the telco. State the test of interest to the telco.
a. H0: p ≤ 0.32 versus H1: p > 0.32
b. H0: p ≤ 0.25 versus H1: p > 0.25
c. H0: p ≤ 5,000 versus H1: p > 5,000
d. H0: µ ≤ 5,000 versus H1: µ > 5,000
