Sta104 Chapter 1
Statistics is a branch of mathematics. It is a set of concepts, rules and procedures that help us to:
- collect and organize numerical data in the form of tables, graphs and charts;
- understand statistical techniques underlying decisions that affect our lives and well-being; and
- make informed decisions.
Hence, one can define statistics as the knowledge and skills needed to collect, process, present and
interpret data. It also allows us to describe and summarize our results and to make certain
predictions.
Term            Meaning
Sample          A sample is a portion or subset of the total group or population of interest. For
                example, a sample may consist of 2000 voters randomly selected from the list of
                eligible voters in Kelantan.
Census          A census is a study of the entire population. For example, if we wish to study
                the monthly income of all fishermen in a village, then it is a census of the
                population.
Variable        A variable is a measurable characteristic of the population under study that may
                take different values, such as weight, height and gender, which differ from
                individual to individual.
Sample survey   A sample survey involves a subgroup (or sample) of a population being chosen
                and questioned on a set of topics. The results of this sample survey are usually
                used to make inferences about the larger population.
Pilot study     A pilot study is a study done before the actual fieldwork is carried out.
Comparing the effectiveness of two types of detergents based on tests performed on some
dirty laundry.
In the first two examples cited above, data can be described and summarized using tabular,
graphical or numerical methods. The last two examples involve making inferences and
predictions based on collected data. These two areas of usage actually divide Statistics into two
types:
STATISTICS

DESCRIPTIVE
- consists of the collection, organization, summarization and presentation of data
- presenting data in some meaningful form, such as charts, graphs or tables

INFERENTIAL
- consists of generalizing from samples to populations
- performing estimations, determining relationships among variables and making predictions
- uses probability
Variable

Quantitative
- are numerical
- can be ordered or ranked
- Ex: age, height, weight, body temperature

Qualitative
- can be placed into distinct categories, according to some characteristic or attribute
- Ex: gender, religion

Quantitative variables are either discrete or continuous.

Discrete
- a variable whose values are countable
- Ex: the number of children in a family

Continuous
- a variable that can assume an infinite number of values between any two specific values
- often include fractions and decimals
- Ex: height, weight
Interval scale
• Ranks data.
• Differences between data values are meaningful but cannot be manipulated with multiplication
and division.
• No meaningful zero.
• Ex: IQ, temperature
Sampling is the process of selecting a sample from a population. Since the information obtained
from the sample is used to generalise or draw conclusions about the population, the sample
must be selected in such a way that it accurately represents its population.
Sampling techniques are scientific methods of selecting samples from populations. As far as
possible, the samples selected must be random and representative of the population from which
the samples are selected.
Sampling Techniques

Probability Sampling Techniques
- Simple random sampling
- Systematic sampling
- Stratified sampling
- Cluster sampling
- Multistage sampling

Nonprobability Sampling Techniques
- Quota sampling
- Snowball sampling
- Convenience sampling
- Judgmental sampling
Before we look at each sampling method, we must have a list of the population
elements. This population list is called a sampling frame.
Example 1:
Probability sampling techniques are used when a researcher plans to make inferences about the
population. The sample is selected based on known probabilities.
SIMPLE RANDOM SAMPLING
A simple random sample is selected from the population in such a way that each item has the
same chance of being included in the sample. The sample is drawn randomly from a sampling
frame.
We can use a table of random numbers, computer software, a calculator with a random number
generator or the lottery method to choose the sample.
The advantage of this method is that it is simple and easy to use with small populations.
However, it requires a sampling frame, and a larger sample size costs more and takes more
time to implement.
Example 2:
Hasnah Agency wants to assess clients' views about the quality of their service last month. These
include assessing the satisfaction with regard to speed of service and hours of operation. The
agency has a list of names of all their clients for last month in their records. The total number of
clients is 20.
Describe the steps needed to choose 5 clients from the 20 clients using simple random sampling.
- Lottery method
Steps:
1. Get the list of clients' names. This is the sampling frame.
2. Give each client a number from 01 to 20.
3. Print each number on a separate piece of paper and place the numbers in a box.
Mix the numbers well and, with your eyes closed, pull out a number.
Record the number.
4. Repeat the process of pulling out a number until you get 5 different numbers.
5. The 5 numbers represent the sample of clients that will be asked about their
views.
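The lottery steps above can be sketched in Python. This is only an illustration, not part of the original notes: the function name `lottery_sample` and the optional `seed` argument are assumptions introduced here, and the standard-library `random.sample` call plays the role of pulling distinct slips of paper out of the box.

```python
import random

def lottery_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct client numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is optional; it only makes a draw repeatable
    box = list(range(1, population_size + 1))  # one numbered slip per client
    # random.sample draws without replacement, so no number is picked twice
    return sorted(rng.sample(box, sample_size))

chosen = lottery_sample(20, 5)
print([f"{n:02d}" for n in chosen])  # five distinct numbers between 01 and 20
```

Each run gives a different set of five clients unless a seed is supplied.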
SYSTEMATIC SAMPLING
The first element is chosen using simple random sampling, while the subsequent elements are
chosen at a fixed interval k from the sampling frame, where
k = N/n
N = population size
n = sample size
The advantage of this method is that it is easier and less expensive than simple random
sampling. Its disadvantage is that it is not suitable if certain patterns exist in the sampling
frame.
Example 3:
Using the same data as in Example 2, describe the steps needed to choose 5 clients from the 20
clients using systematic sampling.
Steps:
1. Get the list of clients' names. This is the sampling frame.
2. Give each client a number from 01 to 20.
3. Calculate the interval size, k = N/n = 20/5 = 4.
4. Use simple random sampling to choose one number from the first four numbers, 01 to 04.
5. If the number chosen is 02, then the subsequent samples are 06, 10, 14 and 18. These
numbers represent the sample of clients that will be asked about their views.
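As a rough sketch of the same procedure in Python (the function name `systematic_sample` and its parameters are hypothetical, and the population size is assumed to be an exact multiple of the sample size, as it is in Example 3):

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick every k-th numbered element after a random start within the first k."""
    k = population_size // sample_size  # interval size, k = N/n
    if start is None:
        # simple random choice of the starting number from 1..k
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

# Reproducing Example 3 with the starting number 02:
print(systematic_sample(20, 5, start=2))  # [2, 6, 10, 14, 18]
```

Leaving `start` unset lets the function choose the starting number at random, as step 4 describes.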
STRATIFIED SAMPLING
The population is divided into subgroups called strata. Within each stratum, a simple random
sample is selected. Strata are subgroups whose members are similar to one another (homogeneous
within a stratum) but differ from the members of other strata (heterogeneous between strata).
Examples of strata are gender (male, female), religion (Islam, Christianity, others)
and marital status (married, not married).
The advantages of this method are that it ensures each subgroup is represented, allows
us to analyze each subgroup and lets us use different data collection methods for different strata.
However, it is costly if the strata have to be created.
Example 4:
Hasnah Agency wants to assess clients' views about the quality of their service last year. These
include assessing the satisfaction with regard to speed of service and hours of operation. Their
clients can be grouped according to race: Malay, Chinese and Indians. There are 50 Malays, 40
Chinese and 10 Indians. The agency wants to choose a sample of 10 clients.
Describe the steps needed to choose 10 clients from the population using stratified sampling.
Steps:
1. The population is divided into three strata: Malay, Chinese and Indians.
2. Calculate the sample size for each stratum.
i. Malay: 50/100 × 10 = 5
ii. Chinese: 40/100 × 10 = 4
iii. Indian: 10/100 × 10 = 1
3. We need separate sampling frames for each stratum; thus, one name list for the
Malays, one for the Chinese and another for the Indians are required.
4. Then, use simple random sampling method to choose 5 Malays from the 50 Malays, 4
Chinese from the 40 Chinese and 1 Indian from the 10 Indians.
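The allocation and selection steps of Example 4 could be sketched as follows. The function names and the placeholder client labels (such as "Malay-01") are made up for illustration; only the stratum sizes and the sample size come from the example. Rounding works cleanly here because the proportions give whole numbers, which will not always be the case.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Sample size per stratum: (stratum size / population size) x sample size."""
    total = sum(strata_sizes.values())
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}

def stratified_sample(frames, allocation, seed=None):
    """Draw a simple random sample from each stratum's own sampling frame."""
    rng = random.Random(seed)
    return {name: rng.sample(frames[name], allocation[name]) for name in frames}

sizes = {"Malay": 50, "Chinese": 40, "Indian": 10}
allocation = proportional_allocation(sizes, 10)
print(allocation)  # {'Malay': 5, 'Chinese': 4, 'Indian': 1}

# Hypothetical name lists, one sampling frame per stratum
frames = {name: [f"{name}-{i:02d}" for i in range(1, size + 1)]
          for name, size in sizes.items()}
print(stratified_sample(frames, allocation))
```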
CLUSTER SAMPLING
This method requires us to divide the population into groups called clusters and then select a
number of clusters randomly from them. All members within the chosen clusters are included in
the sample. Ideally, each cluster is heterogeneous internally, while the clusters are similar to
one another (homogeneous between clusters). Examples of clusters are factories, schools and
geographic areas.
The advantages of cluster sampling are that it is less expensive and can be applied without a
sampling frame of individual elements; we only need a listing of the clusters. It is also useful
if the population is spread out over a wide geographic area. Its disadvantage is that in practice
it is difficult to get heterogeneous groups, and this produces less accurate results.
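A minimal Python sketch of the idea, under the assumption that the clusters are available as a dictionary mapping each cluster name to its members. The data below (three schools and their student labels) is invented purely for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    # Only the list of cluster names is needed for the random selection
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
}
chosen, members = cluster_sample(schools, 2)
print(chosen, members)
```

Note that the sample size depends on which clusters are drawn, since every member of a chosen cluster is included.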
Example 5:
Hasnah Agency wants to assess potential clients' views about their agency. Since the agency is
in Perlis, all households in Perlis are considered their potential clients. The agency does not have
a complete list of households in Perlis.
Describe the steps needed to choose households from the population using cluster sampling if all
households are in the areas Kangar, Arau or Padang Besar.
Steps:
MULTISTAGE SAMPLING
Multistage sampling is similar to cluster sampling. However, samples are also selected randomly at
the second or later stage. This means we randomly select a few clusters and then, instead of
including all elements in these clusters, we select a sample within each chosen cluster.
The advantages of multistage sampling are that it is less expensive and more convenient. Its
disadvantage is that it produces less accurate results.
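A two-stage version can be sketched in Python as below. The function name, the fixed number of elements taken per cluster and the area and household labels are all assumptions made for this illustration; real designs may use more stages or unequal sample sizes per cluster.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick elements at random within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)            # stage 1
    return {name: rng.sample(clusters[name], n_per_cluster)      # stage 2
            for name in chosen}

# Hypothetical housing areas, each with ten numbered households
areas = {f"Area {i}": [f"H{i}-{j:02d}" for j in range(1, 11)] for i in range(1, 5)}
print(two_stage_sample(areas, 2, 3))
```

The only difference from plain cluster sampling is the second `rng.sample` call, which replaces "take everyone in the cluster" with a random subset.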
Example 6:
Given the same information as Example 5, describe the steps to choose households according to
the housing areas they live in.
Steps:
Nonprobability sampling techniques are used when generalisation concerning the population is
not required or when sampling frames are difficult to obtain.
QUOTA SAMPLING
In this method, the population is divided into subgroups and samples are chosen non-randomly
from these subgroups to get the required quota.
First, decide on the characteristic that the representative sample should reflect. Then, decide on
the quota for each subgroup. Finally, the interviewer picks cases that have the specified
characteristic until each quota is filled.
The advantages are that it is less costly and does not need a sampling frame. Its disadvantages are
that the choice of respondents is left to the researcher and that estimates of population
parameters are not possible.
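To make the contrast with random selection concrete, here is a small Python sketch in which respondents are simply taken in the order the interviewer meets them until each quota is filled. The function name, the respondent labels and the quota values are hypothetical.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in the order encountered until every quota is filled.

    respondents: iterable of (respondent_id, subgroup) pairs
    quotas: dict mapping subgroup -> number of respondents required
    """
    remaining = dict(quotas)
    selected = []
    for respondent_id, subgroup in respondents:
        if remaining.get(subgroup, 0) > 0:   # still room in this subgroup's quota
            selected.append((respondent_id, subgroup))
            remaining[subgroup] -= 1
        if not any(remaining.values()):      # all quotas filled
            break
    return selected

passers_by = [("P1", "male"), ("P2", "male"), ("P3", "female"),
              ("P4", "male"), ("P5", "female"), ("P6", "female")]
print(quota_sample(passers_by, {"male": 2, "female": 2}))
# [('P1', 'male'), ('P2', 'male'), ('P3', 'female'), ('P5', 'female')]
```

No random number generator appears anywhere in the sketch, which is why the selection probabilities are unknown and population parameters cannot be estimated from such a sample.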
SNOWBALL SAMPLING
An initial group of respondents is selected, usually at random. After being interviewed, these
respondents are asked to identify others who belong to the target population of interest. This
procedure is applied until the researcher obtains the required number of respondents.
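The referral process can be sketched in Python as a queue of people waiting to be interviewed. This is an illustrative assumption about how referrals are followed (first referred, first interviewed); the function name and the referral data are invented.

```python
from collections import deque

def snowball_sample(initial_group, referrals, target_size):
    """Interview the initial group, then the people they refer, until target_size is reached."""
    selected = []
    seen = set(initial_group)          # people already interviewed or queued
    queue = deque(initial_group)
    while queue and len(selected) < target_size:
        person = queue.popleft()
        selected.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:   # avoid interviewing the same person twice
                seen.add(referred)
                queue.append(referred)
    return selected

# Hypothetical referrals: who each respondent names after being interviewed
referrals = {"A": ["C", "D"], "B": ["D", "E"], "C": ["F"]}
print(snowball_sample(["A", "B"], referrals, 5))  # ['A', 'B', 'C', 'D', 'E']
```

If the referrals run out before the target is reached, the function returns a smaller sample.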
CONVENIENCE SAMPLING
In this method, the selection of elements or sampling units is left primarily to the interviewers.
Often, respondents are selected because they happen to be in the right place at the right time. For
example, an interviewer may look for respondents at a shopping complex, an airport or a stadium
to conduct the interview for sampling purposes.
JUDGMENTAL SAMPLING
In this method, the person taking the sample decides or uses his or her judgement to determine
who will or will not be included in the sample. Usually, the sample must conform to a certain
criterion set by the researcher. For example, in a study about homeless people, the researcher
may want to talk to those who are homeless.
The first step in writing a questionnaire is to define the objectives of the survey, topic areas and
specific questions.