
STA104 1

CHAPTER 1: INTRODUCTION TO STATISTICS

1.1 DEFINITION OF STATISTICS

Statistics is a branch of mathematics. It is a set of concepts, rules and procedures that help us

• Collect and organize numerical data in the form of tables, graphs and charts;
• Understand statistical techniques underlying decisions that affect our lives and well-being; and
• Make informed decisions.

Hence, one can define statistics as the knowledge and skills needed to collect, process, present and
interpret data. It also allows us to describe and summarize our results and to make certain
predictions.

1.2 TERMS IN STATISTICS

Term           Meaning

Element        An element is an object on which a measurement is taken.

Population     A population is the collection of all elements of interest, or the measurements
               obtained from all individuals or objects of interest. For example, in a study on
               the voting pattern in Kelantan, the population consists of all voters in the state
               of Kelantan.

Sample         A sample is a portion or subset of the total group or population of interest. For
               example, a sample may consist of 2000 voters randomly selected from the list of
               eligible voters in Kelantan.

Census         A census is a study of the entire population. For example, if we study the monthly
               income of all fishermen in a village, then the study is a census of that population.

Parameter      A parameter is a numerical descriptive measure of the population. Parameters are
               used to represent a certain population characteristic. For example, in a country
               of 10 million students, if we compute the mean English oral score of all 10 million
               students and find that it is 60, this mean is a population parameter.

Statistic      A statistic is a numerical descriptive measure taken from a sample. It is used to
               give information about unknown values in the corresponding population. For
               example, if 10,000 students are randomly selected from the 10 million students in
               the country and the average score of their English oral test is calculated, then
               this average is a statistic.

Variable       A variable is a characteristic of the population under study that may take
               different values from individual to individual, such as weight, height and gender.

Data           Data are observations or information that have been recorded or collected.

Sample survey  A sample survey involves a subgroup (or sample) of a population being chosen and
               questioned on a set of topics. The results of the sample survey are usually used
               to make inferences about the larger population.

Pilot study    A pilot study is a study done before the actual fieldwork is carried out.
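
To make the difference between a parameter and a statistic concrete, here is a minimal Python sketch. The population below is hypothetical: its size (100,000), the 0-100 score range, the sample size and the seed are all assumptions chosen for illustration and are not figures from these notes.

```python
import random
from statistics import mean

rng = random.Random(104)  # fixed seed so the illustration is repeatable

# Hypothetical population: oral test scores (0-100) for 100,000 students.
population = [rng.randint(0, 100) for _ in range(100_000)]

# Parameter: computed from every element of the population.
parameter = mean(population)

# Statistic: computed from a simple random sample of 1,000 students.
sample = rng.sample(population, 1_000)
statistic = mean(sample)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The two printed values will usually be close but not identical, which is why a statistic is treated as an estimate of the unknown parameter.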

1.3 TYPES OF STATISTICS

The following are a few examples of how statistics is used.

• Presenting the data on federal government revenue in a table.
• Calculating the average annual salary of football players for the year 2012.
• Predicting future sales of computers on the basis of sales made over the last ten years.
• Comparing the effectiveness of two types of detergent based on tests performed on some
dirty laundry.

In the first two examples cited above, data can be described and summarized using tabular,
graphical or numerical methods. The last two examples involve making inferences and
predictions based on collected data. These two areas of usage actually divide Statistics into two
types:

STATISTICS

DESCRIPTIVE
- consists of the collection, organization, summarization and presentation of data
- presenting data in some meaningful form, such as charts, graphs or tables

INFERENTIAL
- consists of generalizing from samples to populations
- performing estimations, determining relationships among variables and making predictions
- uses probability

1.4 TYPES OF DATA

Primary data
Definition: Data that must be collected directly from the respondents.
Advantages:
- Normally more accurate and consistent with the objectives of the research.
- The researcher will be able to explain how the data are collected and the limitations of
their use.
Disadvantage:
- Requires more time, manpower and higher cost to collect.

Secondary data
Definition: Data that are already available.
Advantages:
- Easily accessible from the internet, journals, annual reports and newspapers.
- Requires less time to collect.
Disadvantages:
- Data may lack accuracy because the measurement procedure and the method of data
collection are not explained by the previous researchers.
- The data may be biased because the original purpose is not known.
- Data may not meet the specific needs and objectives of the current research.

1.5 TYPES OF VARIABLE

Variable

Quantitative
- are numerical
- can be ordered or ranked
- Ex: age, height, weight, body temperature

Qualitative
- can be placed into distinct categories, according to some characteristic or attribute
- Ex: gender, religion

Quantitative variables are either discrete or continuous:

Discrete
- a variable whose values are countable.
- Ex: the number of children in a family

Continuous
- a variable that can assume an infinite number of values between any two specific values.
- often includes fractions and decimals.
- Ex: height, weight

1.6 SCALE OF MEASUREMENT

Nominal scale
• The lowest level of the data measurement scale.
• Classifies data into mutually exclusive (nonoverlapping) categories.
• No order or ranking can be placed on the data.
• Ex: gender (male, female), marital status (single, married, divorced, widowed, separated)

Ordinal scale
• Classifies data into categories that can be ranked.
• The differences between data values cannot be determined or are meaningless.
• Ex: academic qualification (SPM, Diploma, Degree, Masters, PhD), grade (A, B, C, D, E, F)

Interval scale
• Ranks data.
• Differences between data values are meaningful, but the values cannot be meaningfully
multiplied or divided.
• No meaningful zero.
• Ex: IQ, temperature

Ratio scale
• The highest level of the data measurement scale.
• Ranks data.
• Differences between two values and the ratio of two values are meaningful.
• There exists a true zero.
• Ex: height, weight, time, salary, age

1.7 SAMPLING TECHNIQUES

Sampling is the process of selecting a sample from a population. Since the information obtained
from the sample is used to generalise or to draw conclusions about the population, the sample
must be selected in such a way that it accurately represents its population.

Sampling techniques are scientific methods of selecting samples from populations. As far as
possible, the samples selected must be random and representative of the population from which
the samples are selected.

The sampling techniques are classified as follows:

Sampling Techniques

Probability Sampling Techniques
- Simple random sampling
- Systematic sampling
- Stratified sampling
- Cluster sampling
- Multistage sampling

Nonprobability Sampling Techniques
- Quota sampling
- Snowball sampling
- Convenience sampling
- Judgmental sampling

1.7.1 SAMPLING FRAME

Before a sample can be selected with any of these methods, we need a list of the population
elements. This list is called the sampling frame.

Example 1:

Population                                      Sampling Frame

Customers of a bank during the month of July    A list of all customers of the bank during the month of July
Students of UiTM Perlis this semester           A list of all UiTM Perlis students enrolled this semester
Customers of a telephone company                A telephone directory
Insurance companies in Malaysia                 A list of all insurance companies in Malaysia

1.7.2 PROBABILITY SAMPLING TECHNIQUES

Probability sampling techniques are used when a researcher plans to make inferences about the
population. The sample is selected based on known probabilities.

SIMPLE RANDOM SAMPLING

A simple random sample is selected from the population in such a way that each element has the
same chance of being included in the sample. The sample is drawn randomly from a sampling
frame.

To choose the sample we can use a table of random numbers, computer software, a calculator with
a random number generator, or the lottery method.

The advantage of this method is that it is easy and simple to use with small populations.
However, this sampling method requires a sampling frame. If a larger sample size is required,
the method becomes more expensive and takes more time to implement.

Example 2:

Hasnah Agency wants to assess its clients' views about the quality of its service last month. These
include their satisfaction with regard to speed of service and hours of operation. The
agency has a list of the names of all its clients for last month in its records. The total number of
clients is 20.

Describe the steps needed to choose 5 clients from the 20 clients using simple random sampling.

- Lottery method
Steps:
1. Get the list of clients' names. This is the sampling frame.
2. Give each client a number from 01 to 20.
3. Write each number on a separate piece of paper and place the pieces in a box.
Mix the pieces well and, with your eyes closed, pull out a number.
Record the number.
4. Repeat the process of pulling out a number until you have 5 different numbers.
5. The 5 numbers represent the sample of clients that will be asked about their
views.

- Using a table of random numbers
Steps:
1. Get a complete sampling frame.
2. Give each client a number from 01 to 20.
3. Select 5 numbers from a table of random numbers. Decide beforehand on a pattern
of movement through the table. Any number that is larger than 20 or is repeated is
discarded.
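
The same selection can also be done with computer software. The following is a minimal Python sketch, not part of the original notes: the function name `simple_random_sample` and the seed are assumptions made for illustration, and the standard-library call `random.sample` plays the role of drawing numbers from the box without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(frame, n)

# Sampling frame: clients numbered 01 to 20, as in Example 2.
clients = [f"{i:02d}" for i in range(1, 21)]

chosen = simple_random_sample(clients, 5, seed=1)
print(sorted(chosen))
```

Because the draw is made without replacement, no client number can appear twice, which matches the rule of discarding repeated numbers.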

SYSTEMATIC SAMPLING

The first element is chosen using simple random sampling, while the subsequent elements are
chosen at a fixed interval k from the sampling frame, where

k = N / n

N = population size

n = sample size

The advantage of this method is that it is easier and less expensive than simple random
sampling. A disadvantage is that this method is not suitable if certain patterns exist in the
sampling frame.

Example 3:

Using the same data as in Example 2, describe the steps needed to choose 5 clients from the 20
clients using systematic sampling.

Steps:

1. Get the list of clients' names. This is the sampling frame.
2. Give each client a number from 01 to 20.
3. Calculate the interval size, k = N / n = 20 / 5 = 4.

4. Use simple random sampling to choose one number from the first four numbers, 01 to 04.
5. If the number chosen is 02, then the subsequent numbers are 06, 10, 14 and 18. These
five numbers represent the sample of clients that will be asked about their views.
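
The steps above can be sketched in Python as follows. This is an illustrative sketch only: the function name `systematic_sample` and its `start` and `seed` arguments are assumptions, and it assumes that N is an exact multiple of n, as in this example.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Pick every k-th element of the frame after a random start, with k = N // n."""
    N = len(frame)
    k = N // n  # sampling interval; this sketch assumes N is a multiple of n
    if start is None:
        # Random start among the first k positions (0-based index).
        start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]

clients = [f"{i:02d}" for i in range(1, 21)]

# Fixing the start at index 1 (client 02) reproduces the numbers in Example 3.
print(systematic_sample(clients, 5, start=1))  # ['02', '06', '10', '14', '18']

# Leaving start unset picks the first client at random from 01 to 04.
random_start_sample = systematic_sample(clients, 5, seed=3)
print(random_start_sample)
```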

STRATIFIED SAMPLING

The population is divided into subgroups called strata. Within each stratum, a simple random
sample is selected. Strata are subgroups whose members are similar to one another (internally
homogeneous) but different from the members of other strata (heterogeneous between strata).
Examples of strata are gender (male, female), religion (Islam, Christianity, others)
and marital status (married, not married).

The advantages of this method are that it ensures each subgroup is represented, allows us to
analyze each subgroup, and lets us use different data collection methods for different strata.
However, it is expensive if the strata have to be created.

Example 4:

Hasnah Agency wants to assess its clients' views about the quality of its service last year. These
include their satisfaction with regard to speed of service and hours of operation. The
clients can be grouped according to race: Malay, Chinese and Indian. There are 50 Malays, 40
Chinese and 10 Indians. The agency wants to choose a sample of 10 clients.

Describe the steps needed to choose 10 clients from the population using stratified sampling.

Steps:

1. The population is divided into three strata: Malay, Chinese and Indian.
2. Calculate the sample size for each stratum in proportion to its size (the total
population is 50 + 40 + 10 = 100).

i. Malay: (50 / 100) × 10 = 5

ii. Chinese: (40 / 100) × 10 = 4

iii. Indian: (10 / 100) × 10 = 1

3. We need a separate sampling frame for each stratum; thus, one name list for the
Malays, one for the Chinese and another for the Indians are required.
4. Then, use simple random sampling to choose 5 Malays from the 50 Malays, 4
Chinese from the 40 Chinese and 1 Indian from the 10 Indians.
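
As an illustration, the proportional allocation and the within-stratum selection can be sketched in Python. The client identifiers (M01, C01, I01 and so on), the function name `stratified_sample` and the seed are hypothetical; only the stratum sizes (50, 40, 10) and the total sample size of 10 come from Example 4.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Allocate n across strata in proportion to their size, then draw an SRS in each."""
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding may not sum exactly to n in general.
        n_h = round(len(members) * n / N)
        sample[name] = rng.sample(members, n_h)
    return sample

# One sampling frame per stratum, with made-up client IDs.
strata = {
    "Malay": [f"M{i:02d}" for i in range(1, 51)],
    "Chinese": [f"C{i:02d}" for i in range(1, 41)],
    "Indian": [f"I{i:02d}" for i in range(1, 11)],
}

sample = stratified_sample(strata, 10, seed=1)
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)  # {'Malay': 5, 'Chinese': 4, 'Indian': 1}
```

When the stratum proportions do not give whole numbers, the rounded sizes need to be adjusted by hand so that they add up to the required sample size.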

CLUSTER SAMPLING

This method requires us to divide the population into groups called clusters and then select a
number of clusters randomly from them. All members within the chosen clusters are included in
the sample. Ideally, each cluster is heterogeneous internally, while the clusters are homogeneous
with one another, so that each cluster resembles the population as a whole.
Examples of clusters are factories, schools and geographic areas.

The advantages of cluster sampling are that it is less expensive and that it can be applied
without a sampling frame of individual elements; we only need a listing of the clusters. It is
also useful if the population is spread out over a wide geographic area. Its disadvantage is
that, in practice, it is difficult to get heterogeneous clusters, and this produces less
accurate results.

Example 5:

Hasnah Agency wants to assess potential clients' views about the agency. Since the agency is
in Perlis, all households in Perlis are considered its potential clients. The agency does not have
a complete list of households in Perlis.

Describe the steps needed to choose households from the population using cluster sampling if all
households are in the areas Kangar, Arau or Padang Besar.

Steps:

1. Let Kangar, Arau and Padang Besar represent the clusters.


2. Use simple random sampling method to choose 1 area out of the 3 areas.
3. If Kangar is chosen, all households in Kangar are members of the sample.
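
A minimal Python sketch of these steps follows. The household identifiers and their counts are invented for illustration, since the notes give no household data, and the function name `cluster_sample` and the seed are likewise assumptions.

```python
import random

# Hypothetical household IDs for each area; the notes give no actual counts.
clusters = {
    "Kangar": ["K1", "K2", "K3", "K4"],
    "Arau": ["A1", "A2", "A3"],
    "Padang Besar": ["P1", "P2"],
}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

chosen_areas, households = cluster_sample(clusters, n_clusters=1, seed=7)
print(chosen_areas, households)
```

Only the list of cluster names is needed to make the random choice; the households are enumerated only for the area that is selected.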

MULTISTAGE SAMPLING

Multistage sampling is similar to cluster sampling. However, random selection is repeated at
the second or later stages. This means we randomly select a few clusters and then, instead of
including all elements in these clusters, we select a sample from within each chosen cluster.

The advantages of multistage sampling are that it is less expensive and is convenient. Its
disadvantage is that it produces less accurate results.

Example 6:

Given the same information as Example 5, describe the steps to choose households according to
the housing areas they live in.

Steps:

1. Let Kangar, Arau and Padang Besar represent the clusters.


2. Use simple random sampling method to choose 1 area out of the 3 areas.
3. If Kangar is chosen, divide the households in Kangar according to their different housing
areas.
4. Use simple random sampling method to choose a few housing areas. Then all households
from the chosen housing areas are the sample.
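
The two stages can be sketched in Python as shown below. All housing-area names and household identifiers are hypothetical, as are the function name `multistage_sample`, the number of housing areas drawn (2) and the seed.

```python
import random

# Hypothetical data: area -> housing area -> household IDs.
areas = {
    "Kangar": {"Kangar H1": ["K1-a", "K1-b"], "Kangar H2": ["K2-a", "K2-b", "K2-c"], "Kangar H3": ["K3-a"]},
    "Arau": {"Arau H1": ["A1-a"], "Arau H2": ["A2-a", "A2-b"], "Arau H3": ["A3-a", "A3-b"]},
    "Padang Besar": {"PB H1": ["P1-a", "P1-b"], "PB H2": ["P2-a"], "PB H3": ["P3-a", "P3-b"]},
}

def multistage_sample(areas, n_areas, n_housing, seed=None):
    """Stage 1: pick areas at random. Stage 2: pick housing areas within each chosen area."""
    rng = random.Random(seed)
    sample = []
    for area in rng.sample(sorted(areas), n_areas):                    # stage 1
        housing_areas = areas[area]
        for housing in rng.sample(sorted(housing_areas), n_housing):   # stage 2
            # All households in a chosen housing area join the sample.
            sample.extend((area, housing, h) for h in housing_areas[housing])
    return sample

households = multistage_sample(areas, n_areas=1, n_housing=2, seed=7)
for row in households:
    print(row)
```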

1.7.3 NONPROBABILITY SAMPLING TECHNIQUES

Nonprobability sampling techniques are used when generalisation concerning the population is
not required or when sampling frames are difficult to obtain.

QUOTA SAMPLING
In this method, the population is divided into subgroups and samples are chosen non-randomly
from these subgroups to get the required quota.
First, you need to decide on a characteristic for the representative sample. Then, decide on the
quota to be used. Finally, the interviewer can pick out cases with the specific characteristics and
quota.

The advantages are that it is less costly and does not need a sampling frame. Its disadvantages
are that the choice of respondents is left to the researcher and that estimates of population
parameters are not possible.
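
As a rough illustration of how a quota is filled, the following Python sketch accepts respondents in the order the interviewer meets them until each subgroup quota is reached. The respondents, the characteristic (gender), the quota of 2 per group and the function name `quota_sample` are all hypothetical.

```python
def quota_sample(respondents, key, quotas):
    """Take respondents in the order encountered until every subgroup quota is filled."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person in respondents:
        group = person[key]
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
        if counts == quotas:
            break  # every quota is filled, so stop interviewing
    return sample

# People in the order the interviewer happens to meet them (made-up data).
walk_ins = [
    {"name": "R1", "gender": "male"},
    {"name": "R2", "gender": "male"},
    {"name": "R3", "gender": "female"},
    {"name": "R4", "gender": "male"},
    {"name": "R5", "gender": "female"},
    {"name": "R6", "gender": "female"},
]

picked = quota_sample(walk_ins, "gender", {"male": 2, "female": 2})
print([p["name"] for p in picked])  # ['R1', 'R2', 'R3', 'R5']
```

Nothing in the selection is random: who ends up in the sample depends on whom the interviewer meets first, which is why the results cannot be generalised in the way a probability sample allows.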

SNOWBALL SAMPLING
An initial group of respondents is selected, usually at random. After being interviewed, these
respondents are asked to identify others who belong to the target population of interest. This
procedure is applied until the researcher obtains the required number of respondents.

CONVENIENCE SAMPLING
In this method, the selection of elements or sampling units is left primarily to the interviewers.
Often, respondents are selected because they happen to be in the right place at the right time. For
example, an interviewer may look for respondents at a shopping complex, an airport or a stadium
to conduct the interview for sampling purposes.

JUDGMENTAL SAMPLING
In this method, the person taking the sample decides or uses his or her judgement to determine
who will or will not be included in the sample. Usually, the sample must conform to a certain
criterion set by the researcher. For example, in a study about homeless people, the researcher
may want to talk to those who are homeless.

1.8 DATA COLLECTION METHODS

Face-to-face interview / personal interview
Description: An interviewer asks the questions, normally from a questionnaire, and records
the responses.
Advantages:
- Higher response rates.
- Higher data quality since clarification of questions and answers can be done.
- The interviewer can note specific reactions of the respondent and the surroundings.
Disadvantages:
- Expensive.
- Interviewers must be carefully selected and trained.
- Facial expressions and statements by interviewers can affect responses.

Telephone interview
Description: The interviewer interviews the respondents by telephone. The interview is
normally short in duration.
Advantages:
- High response rate.
- Moderate in cost.
- A researcher can monitor the interviews.
- Can reach a wide geographic area.
Disadvantages:
- Respondents are restricted to individuals who can be reached by telephone.
- Not suitable for lengthy questions or too many questions.

Mail questionnaire
Description: A questionnaire is sent to each respondent with a stamped addressed envelope
attached.
Advantages:
- Cheaper.
- The respondents are given sufficient time to respond.
- Provides privacy to the respondents.
- The research coverage is wider.
Disadvantages:
- Low response rate.
- Only very simple questions can be asked.
- Respondents are restricted to individuals who can read.
- Nobody is on hand to explain the questions.

Observation
Description: This method requires us to observe.
Advantages:
- We can collect the data at the time they occur.
- Does not depend on others for the information.
Disadvantages:
- Requires the observer to be at the scene of the event.
- It is limited to the present event and not the past.

1.9 GUIDELINES TO WRITE A QUESTIONNAIRE

The first step in writing a questionnaire is to define the objectives of the survey, topic areas and
specific questions.

Then write questions according to the following guidelines:

1. Use short questions.
2. Use simple language.
3. Ask about only one issue per question. Do not write questions that ask about two things at once.
4. Use clear terms. If necessary, define terms that are not familiar to the respondents.
5. Avoid personal questions.
6. Avoid sensitive questions or words that may offend the respondents, their organisation or
their ethnic group.
7. Avoid questions that require calculations to be made by respondents.
8. Use more closed-ended questions. Minimize the use of open-ended questions.
