
LECTURE NOTES

ON

STATISTICS FOR COMPUTING I (COM 114)

DEPARTMENT OF COMPUTER SCIENCE

FEDERAL COLLEGE OF ANIMAL HEALTH AND PRODUCTION

TECHNOLOGY, VOM

Compiled By

Mrs Ruth Opabunmi

1.1 Definition of Statistics

What is Statistics?
Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting,
and presenting empirical data, and for drawing valid conclusions from them. Statistics is a highly
interdisciplinary field: research in statistics finds application in virtually all scientific fields, and research
questions in those fields motivate the development of new statistical methods and theory. In developing methods
and studying the theory that underlies them, statisticians draw on a variety of mathematical and computational
tools.

Two fundamental ideas in the field of statistics are uncertainty and variation. There are many situations that we
encounter in science (or more generally in life) in which the outcome is uncertain. In some cases the uncertainty
is because the outcome in question is not determined yet (e.g., we may not know whether it will rain tomorrow)
while in other cases the uncertainty is because although the outcome has been determined already we are not
aware of it (e.g., we may not know whether we passed a particular exam).

Probability is a mathematical language used to discuss uncertain events and probability plays a key role in
statistics. Any measurement or data collection effort is subject to a number of sources of variation. By this we
mean that if the same measurement were repeated, then the answer would likely change. Statisticians attempt to
understand and control (where possible) the sources of variation in any situation.
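
The idea that a repeated measurement rarely gives exactly the same answer can be illustrated with a small simulation. This is only a sketch: the true value, the size of the random error, and the function name are assumptions made for illustration and do not come from these notes.

```python
import random
import statistics

def repeated_measurements(true_value, noise_sd, n, seed=0):
    """Simulate n repeated measurements of one quantity with random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]

# Hypothetical quantity with true value 50.0, measured 10 times.
readings = repeated_measurements(true_value=50.0, noise_sd=2.0, n=10)

print("readings:", [round(r, 2) for r in readings])
print("mean:", round(statistics.mean(readings), 2))
print("standard deviation:", round(statistics.stdev(readings), 2))
```

The readings differ from one another even though the quantity being measured has not changed; the standard deviation summarizes how large that variation is.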

1.2 Identify various sources of statistical data

Types of Data and Data Collection

Primary data
As the name suggests, primary data are first-hand information collected by the statistician. Data collected this
way are original and gathered for a specific purpose; they have not undergone any statistical treatment before.
The collected data may also be published. The census is an example of primary data.

Methods of Primary Data Collection:


The main methods are questionnaires, interviews, experiments, and observation.
Questionnaires
Questionnaires are structured tools for gathering information from a predefined group. They can
include various question types, such as open-ended, closed-ended, or multiple-choice questions. The
data collected are often quantitative and can be analyzed statistically.
Example: A market research company conducting an online survey to understand consumer preferences
regarding a new product line.
Interviews
Interviews involve a one-on-one interaction between the researcher and the respondent. This method
allows in-depth exploration of the respondent’s perspectives, feelings, and experiences. Interviews can
be structured, semi-structured, or unstructured, offering varying degrees of flexibility regarding the
questions.
Example: A researcher conducting face-to-face interviews with industry experts to gather qualitative
data for a case study.
Personal interview: The Statistician collects the data himself/herself. The data so collected is reliable
but is suited for small projects.
Collection Via Investigators: Trained investigators are employed to contact the respondents to collect
data.
Telephonic Interview: The collection of data is done through asking questions over the telephone to
give quick and accurate information.
Observations
Observation involves collecting data by directly watching and analyzing a phenomenon or behaviour in
its natural setting. This method can provide rich, contextual insights. Observation can be participant
(where the observer is part of the observed group) or non-participant (where the observer remains
separate).

Secondary data
Secondary data are the opposite of primary data: they have already been collected and published (by some
organization, for instance). Statisticians can use them as a source of data for their own analysis. Secondary
data are impure in the sense that they have undergone statistical treatment at least once.

Methods of Secondary Data Collection:

Official publications of bodies such as the Ministry of Finance, statistical departments of the government,
federal bureaus, agricultural statistical boards, etc. Semi-official sources include the State Bank, Boards of
Economic Enquiry, etc.
Data published by chambers of commerce, trade associations, and boards.
Articles in newspapers, journals, and technical publications.

1.3 State important uses of statistics


Importance of Statistical Data in Various Fields
Now statistics holds a central position in almost every field, including industry, commerce, trade, physics,
chemistry, economics, mathematics, biology, botany, psychology, astronomy, etc., so the application of
statistics is very wide. Now we shall discuss some important fields in which statistics is commonly applied.

(1) Business
Statistics plays an important role in business. A successful businessman must be quick and accurate in
decision making. He must know what his customers want, and therefore what to produce and sell and in
what quantities.

Statistics helps businessmen to plan production according to the taste of the customers, and the quality of the
products can also be checked more efficiently by using statistical methods. Many business activities therefore
rest on statistical information. With it, businessmen can make better-informed decisions about the location of
the business, marketing of the products, financial resources, etc.

(2) Economics
Economics depends heavily on statistics. National income accounts are multipurpose indicators for
economists and administrators, and statistical methods are used to prepare these accounts. In economic
research, statistical methods are used to collect and analyze data and to test hypotheses. The relationship
between supply and demand is studied by statistical methods; imports and exports, inflation rates, and per capita
income are topics that require a good knowledge of statistics.

(3) Mathematics
Statistics plays a central role in almost all natural and social sciences. The methods used in natural sciences are
the most reliable but conclusions drawn from them are only probable because they are based on incomplete
evidence.

Statistics helps in describing these measurements more precisely. Statistics is a branch of applied mathematics.
Many statistical methods, such as probability, averages, dispersion, and estimation, are used in
mathematics, and techniques of pure mathematics such as integration, differentiation, and algebra are used
in statistics.

(4) Banking
Statistics plays an important role in banking. Banks make use of statistics for a number of purposes. They work
on the principle that not everyone who deposits money with the bank withdraws it at the same time.
The bank earns profits out of these deposits by lending them to others at interest. Bankers use statistical
approaches based on probability to estimate the number of deposits and the claims on them for a given day.
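
As a rough illustration of the probability reasoning described above, the sketch below estimates how many depositors withdraw on a given day. It assumes a simple binomial model in which each depositor withdraws independently with the same probability; the model, the figures, and the function name are illustrative assumptions, not a description of how any bank actually works.

```python
import math

def withdrawal_estimate(n_depositors, p_withdraw):
    """Expected number of withdrawals and its standard deviation (binomial model)."""
    expected = n_depositors * p_withdraw
    std_dev = math.sqrt(n_depositors * p_withdraw * (1 - p_withdraw))
    return expected, std_dev

# Hypothetical bank: 10,000 depositors, each with a 5% chance of withdrawing today.
expected, std_dev = withdrawal_estimate(n_depositors=10_000, p_withdraw=0.05)
print(f"expected withdrawals: {expected:.0f}")
print(f"standard deviation:   {std_dev:.1f}")
```

Under these assumptions the number of withdrawals stays close to its expected value, which is why a bank can lend out most deposits while keeping only a fraction on hand.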

(5) State Management (Administration)


Statistics is essential to a country. Many governmental policies are based on statistics, and statistical data are
now widely used in making administrative decisions. Suppose the government wants to revise the pay
scales of employees in view of an increase in the cost of living; statistical methods will be used to
determine the rise in the cost of living. The preparation of federal and provincial government budgets
depends mainly on statistics, because it helps in estimating the expected expenditures and the revenue from
different sources. In this sense, statistics are the eyes of the administration of the state.

(6) Accounting and Auditing


Accounting is impossible without exactness. For decision-making purposes, however, that much precision is not
essential; a decision may be made on the basis of an approximation obtained by statistical methods. The
correction of the values of current assets is made on the basis of the purchasing power of money or its current
value. In auditing, sampling techniques are commonly used. An auditor determines the sample size to be audited
on the basis of the error that can be tolerated.

(7) Natural and Social Sciences


Statistics plays a vital role in almost all the natural and social sciences. Statistical methods are commonly used
for analyzing experimental results and testing their significance in biology, physics, chemistry, mathematics,
meteorology, research, chambers of commerce, sociology, business, public administration, communications and
information technology, etc.

(8) Astronomy
Astronomy is one of the oldest fields in which statistics has been applied; it deals with measuring the distances,
sizes, masses, and densities of heavenly bodies by means of observations. Errors are unavoidable in these
measurements, so the most probable values are found by using statistical methods.
Example: The distance of the moon from the earth is measured. Astronomers have long used
statistical methods such as the method of least squares to determine the movements of stars.
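
The method of least squares mentioned above can be sketched for the simplest case, fitting a straight line to noisy observations. The data values and the function name are hypothetical and chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line y = slope*x + intercept
    that minimises the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical noisy observations of a quantity that changes roughly linearly in time.
times = [0, 1, 2, 3, 4]
positions = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = least_squares_line(times, positions)
print(f"fitted line: position = {slope:.2f} * time + {intercept:.2f}")
```

No single line passes through every observation because each one contains error; the fitted line is the one that makes the total squared error as small as possible.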

(9) Health - Predicting Disease


Statistics also plays a role in the medical field. It helps us to know how many people are
suffering from a disease and how many have died from it.

Statistics also helps you to judge how likely you are to be affected by a disease. For
example, suppose a study shows that more than 75% of people are infected with a disease that is caused by mango.
In that case, you might avoid mango to avoid the disease.

1.4 Explain the importance of computer in statistics

Statistics and the computer
There are two different ways in which the computer is changing the field of statistics.
First, computers can help us to do what we did before the advent of the computer, but more efficiently.
Second, computers can help us to do things nobody thought of before the advent of the computer. To the first
category belong statistical data analysis by numerical and graphical methods, and simulation; to the second
belong, for example, various computer-intensive methods (see Diaconis and Efron, 1983). Another way to
categorise the relationship between statistics and the computer is to list the different ways the computer can be
used in statistics. The following are examples of such uses: numerical and graphical data analysis; symbolic
computations; simulations; storing statistical knowledge; presentation of results. The close relationship between
statistics and computing implies that when one changes, the other will also change. The following are some new
practical procedures in computing which have turned out to be of great importance for statistics:
(i) The change from mainframe batch computing to personal computing.
(ii) The introduction of multiple dynamic displays.
(iii) The possibility of direct manipulation of graphical objects.
Some trends in statistics are also obviously very much influenced by what has happened in computing.
Examples of such trends are:
(i) emphasis on exploratory data analysis instead of hypothesis testing;
(ii) the use of computer-intensive methods;
(iii) the introduction of new diagnostic methods.
1.5 State uses of statistical data
(1) Statistics helps in providing a better understanding and exact description of a phenomenon of nature.
(2) Statistics helps in the proper and efficient planning of a statistical inquiry in any field of study.
(3) Statistics helps in collecting appropriate quantitative data.

Basic Uses of Statistics in Our Daily Life

Government
Governments use statistics to make judgments about health, population,
education, and much more. Statistics may help the government to check which education schedule is beneficial
for students, for example by tracking the progress of high school students who follow a particular curriculum.
The government can also assemble specific data about the population of the country using a census.

Weather Forecast
Have you ever seen a weather forecast? Do you know how the government does its weather forecasting?
Statistics plays a crucial role in weather forecasting.

The computer programs used in weather forecasting are based on a set of statistical functions. These functions
compare current weather conditions with previously recorded seasons and conditions. This helps the government
to produce its forecasts.

Emergency Preparedness
Statistics is also helpful in emergency preparedness. With the help of statistics, we can estimate the likelihood
of a natural disaster in the near future, which helps us to prepare for an emergency. It also helps rescue teams
to prepare to save the lives of people who are in danger.

Political Campaigns
Statistics is crucial in a political campaign; it is hard to run an effective campaign without it.
It gives politicians an idea of their chances of winning an election in a
particular area.

Statistics also helps news channels to predict the winner of an election. It helps political parties to
know how many voters support them in a particular voting zone. In addition, it helps the country to
anticipate the future government.

Sports

There are many uses of statistics in sports. Every sport uses statistics to make it more effective.
Statistics helps a sportsperson to get an idea of his or her performance in a particular sport.

Nowadays sports are taking the use of statistical data to the next level. The reason is that sport is becoming
more popular, and various kinds of equipment are used in sports to collect data on
various factors. Statistics is used to draw conclusions from the collected data.

Research

Statistics plays an essential role in the work of researchers. For instance, statistics can be
applied in data acquisition, analysis, explanation, interpretation, and presentation. Using statistics in
research helps researchers to summarize, characterize, and describe the
outcome of the research.

Besides this, medicine would be less effective without research to determine which drugs or
interventions work best and how individual groups respond to a medicine. Medical experts also conduct studies
by age, race, or country to identify the effect of these characteristics on one’s health.

Education
Statistics is useful in education because teachers can act as researchers
in their own classrooms, working out which teaching technique works for which pupils and why.
They also need to evaluate test results to determine whether students are performing as expected.
There are statistical studies of student achievement at all levels of testing and education, from
kindergarten to the GRE or SAT. For example, teachers can calculate the average of students’ marks and adopt
new techniques that can help the students improve their grades.

Prediction
Statistics helps us make predictions about something that is going to happen in the future. Based on what we
face in our daily lives, we make predictions.

How accurate a prediction is will depend on many factors. When we make a prediction, we take into account
the external or internal factors that may affect our future. Statisticians do the same when they apply
statistical techniques to estimate an event.

Doctors, engineers, artists, and practitioners all use statistics to make predictions about future events. For
example, doctors use statistics to understand the likely course of a disease. They can predict the magnitude of
the flu in each winter season through the use of data.

Engineers use statistics to estimate the success of their ongoing project, and they also use the data to evaluate
how long it will take to complete a project.

Quality Testing
Quality testing is another important use of statistics in every area of life. On a day-to-day basis, we conduct
quality tests to ensure that our purchase is correct and get the best results from what we spend.
We do a sample test of what we expect to buy to get the best. If the sample test that we have done passes the
quality test, we want to buy it.

Insurance
Insurance is a vast industry. There are many kinds of insurance, e.g. car insurance, bike insurance, life
insurance, and many more. Insurance premiums are based on statistics. Insurance companies use statistics
collected from homeowners, drivers, vehicle registration offices, and many other sources. They receive the data
from all these sources and then decide the premium amount.

Consumer Goods
Statistics is widely used for consumer goods, because consumer goods are products that are used daily.
Businesses use statistics to work out which consumer goods are available in a store and which are not.

They also use statistics to find out which store needs which consumer goods and when to ship the products.
Sound statistical decisions help businesses to make substantial revenue on consumer goods.

Financial Market
The financial market relies heavily on statistics. Stock prices are analyzed with the help of
statistics, which helps investors to decide whether to invest in a particular stock. It also helps
companies to manage their finances for long-term business.

Business Statistics
Every large organization uses business statistics and a range of data analysis tools, for instance
to estimate probabilities and see where sales may be headed in the future. Several tools are used for
business statistics; they are built on the mean, median, and mode, the bell curve, bar graphs, and
basic probability. These can be applied to research problems related to employees, products, customer
service, and much more. With them, a business can identify what is working and what is not.
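
The mean, median, and mode mentioned above can be computed with Python's standard `statistics` module. The sales figures below are hypothetical and used only for illustration.

```python
import statistics

# Hypothetical daily sales figures (units sold) for one week.
daily_sales = [12, 15, 15, 18, 20, 15, 31]

mean_sales = statistics.mean(daily_sales)      # arithmetic average
median_sales = statistics.median(daily_sales)  # middle value once sorted
mode_sales = statistics.mode(daily_sales)      # most frequent value

print("mean:", mean_sales)
print("median:", median_sales)
print("mode:", mode_sales)
```

Here the single unusually high day (31) pulls the mean above the median, which is one reason a business may look at more than one measure of the typical value.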


Advanced Uses of Statistics

Computer Science

Statistics is essential for all branches of science, as it is very useful for decision making and for examining
the correctness of the choices that one has made. With the application of statistics in computer science and
machine learning, the efficiency of algorithms can be increased significantly. Statistics can also help to reduce
the cost of processing. Without an understanding of statistics, it is difficult to understand such
algorithms and challenging to develop them.

Computer scientists need to concentrate on data acquisition and cleaning, retrieval, reporting, and mining. They
are tasked with improving algorithms and system efficiency. Besides this, they focus on machine learning,
especially data mining (discovering models and relationships in information for objectives such as finance
and marketing).

Robotics
Statistics has various uses in the field of robotics. Various techniques can be applied in this field, such as
expectation-maximization (EM), particle filters, Kalman filters, Bayesian networks, and many more. A robot senses
its present state by estimating a probability density function over possible states. With each new sensor input,
the robot updates this estimate and prioritizes its current actions. Apart from this, robots can compare estimated
and actual values and act accordingly. Statistics is therefore an important tool
in robotics.

Aerospace

Statistics is one of the important tools on which aerospace engineering relies. There are numerous ways
in which statistics is applied, such as tracking the shrinkage and growth rate for a route. Apart
from this, statistics is used to study traffic decline and growth, the number of accidents due to aerospace
failures, etc. Airlines use this statistical information to see how they can work towards a
better aerospace future.

Data Science
A data scientist uses different statistical techniques to study the collected data, such as classification,
hypothesis testing, regression, time series analysis, and many more. Data scientists design proper experiments
and interpret the results using these statistical techniques. Statistics can also be used to summarize
information quickly and effectively. Statistics is therefore one of the most helpful tools for data scientists
who want to draw relevant conclusions from a sample.

Machine Learning
Statistics is used to quantify the uncertainty of the estimated skill of machine learning models.
This uncertainty is expressed with the help of confidence intervals and tolerance intervals. Statistics can be
used for machine learning in various ways, such as for:

• Problem Framing
• Understanding the data
• Data Cleaning
• Selection of data
• Data Preparation
• Model Evaluation
• Model Configuration
• Selection of Model
• Model Presentation
• Model Prediction

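The confidence intervals mentioned under Machine Learning can be illustrated for a model's accuracy on a test set. The sketch below uses the normal approximation for a proportion, which is one common choice among several; the counts and the function name are hypothetical.

```python
import math

def accuracy_confidence_interval(correct, total, z=1.96):
    """Approximate 95% confidence interval for accuracy (normal approximation).

    z = 1.96 corresponds to a 95% confidence level.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical result: a model classifies 88 of 100 test examples correctly.
low, high = accuracy_confidence_interval(correct=88, total=100)
print(f"accuracy = 0.88, 95% interval roughly ({low:.3f}, {high:.3f})")
```

The interval is wide because the test set is small; the same accuracy measured on a larger test set would give a narrower interval. The approximation is less reliable when the accuracy is very close to 0 or 1.
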
Deep Learning
Statistics and probability are both methods of handling aggregated or incomplete data.
Deep learning can use statistics to abstract useful properties of the data while ignoring
unimportant details. Statistics and probability thus provide a way to formalize the deep
learning process mathematically. Statistics is therefore basic to deep learning,
and it is worth understanding how statistics is used in it.

1.6 Explain quantitative data

Quantitative Data: Definition


Quantitative data is defined as the value of data in the form of counts or numbers, where each data point has a
unique numerical value associated with it. This data is any quantifiable information that can be used for
mathematical calculations and statistical analysis, such that real-life decisions can be made based on these
mathematical derivations. Quantitative data is used to answer questions such as “How many?”, “How often?”,
“How much?”. This data can be verified and can also be conveniently evaluated using mathematical techniques.

For example, there are quantities corresponding to various parameters; “How much did that laptop
cost?” is a question that collects quantitative data. There are values associated with most measured
parameters, such as pounds or kilograms for weight, dollars for cost, etc.

Quantitative data makes measuring various parameters manageable because of the ease of the mathematical
derivations it allows. Quantitative data is usually collected for statistical analysis using surveys, polls, or
questionnaires sent to a specific section of a population. The results can then be generalized to the
population.

1.7 Identify various scales of measurement

What is Measurement?
Normally, when one hears the term measurement, one may think in terms of measuring the length
of something (e.g., the length of a piece of wood) or measuring a quantity of something (e.g., a cup of
flour). This represents a limited use of the term measurement. In statistics, the term measurement
is used more broadly and is more appropriately termed scales of measurement. Scales of
measurement refer to ways in which variables/numbers are defined and categorized. Each scale of
measurement has certain properties, which in turn determine the appropriateness of certain
statistical analyses. The four scales of measurement are nominal, ordinal, interval, and ratio.

Nominal: Categorical data and numbers that are simply used as identifiers or names represent a
nominal scale of measurement. Numbers on the back of a baseball jersey (St. Louis Cardinals 1 =
Ozzie Smith) and your social security number are examples of nominal data. If I conduct a study
and include gender as a variable, I will code Female as 1 and Male as 2, or vice versa, when I
enter my data into the computer. Thus, I am using the numbers 1 and 2 to represent categories of
data.
Ordinal: An ordinal scale of measurement represents an ordered series of relationships or rank
order. Individuals competing in a contest may be fortunate to achieve first, second, or third place.
First, second, and third place represent ordinal data. If Roscoe takes first and Wilbur takes second,
we do not know if the competition was close; we only know that Roscoe outperformed Wilbur.
Likert-type scales (such as "On a scale of 1 to 10 with one being no pain and ten being high pain,
how much pain are you in today?") also represent ordinal data. Fundamentally, these scales do not
represent a measurable quantity. An individual may respond 8 to this question and be in less pain
than someone else who responded 5. A person may not be in half as much pain if they responded
4 than if they responded 8. All we know from this data is that an individual who responds 6 is in
less pain than if they responded 8 and in more pain than if they responded 4. Therefore, Likert-
type scales only represent a rank ordering.
Interval: A scale which represents quantity and has equal units, but for which zero represents
simply an additional point of measurement, is an interval scale. The Fahrenheit scale is a clear
example of the interval scale of measurement. Thus, 60 degrees Fahrenheit or -10 degrees
Fahrenheit are interval data. Elevation measured relative to sea level is another example of an interval scale.
With each of these scales there is a direct, measurable quantity with equality of units. In addition,
zero does not represent the absolute lowest value. Rather, it is a point on the scale with numbers
both above and below it (for example, -10 degrees Fahrenheit).
Ratio: The ratio scale of measurement is similar to the interval scale in that it also represents
quantity and has equality of units. However, this scale also has an absolute zero (no numbers exist
below the zero). Very often, physical measures will represent ratio data (for example, height and
weight). If one is measuring the length of a piece of wood in centimeters, there is quantity, equal
units, and that measure cannot go below zero centimeters. A negative length is not possible.
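
The practical difference between the interval and ratio scales can be checked with a short calculation. On a ratio scale, statements such as "twice as long" hold in any unit; on an interval scale, such as temperature in Fahrenheit, the ratio of two readings changes when the unit changes. The values below are chosen for illustration only.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: the ratio of two temperatures depends on the unit chosen.
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)

# Ratio scale: the ratio of two lengths is the same in centimetres and inches.
ratio_cm = 200 / 100
ratio_in = (200 / 2.54) / (100 / 2.54)

print("temperature ratio in Fahrenheit:", ratio_f)
print("temperature ratio in Celsius:   ", round(ratio_c, 2))
print("length ratio in centimetres:    ", ratio_cm)
print("length ratio in inches:         ", round(ratio_in, 2))
```

60 degrees Fahrenheit looks like "twice" 30 degrees Fahrenheit, yet in Celsius the same two temperatures give a completely different ratio, so the statement has no physical meaning. The length ratio is 2 in both units because length has an absolute zero.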

2.1 Describe basic sampling techniques: Random, Systematic, Stratified, Quota Sampling etc

Sampling methods
What are sampling methods?
In a statistical study, sampling methods refer to how we select members from the population to be in the study.
If a sample isn't randomly selected, it will probably be biased in some way and the data may not be
representative of the population.
There are many ways to select a sample—some good and some bad.
Bad ways to sample
Convenience sample: The researcher chooses a sample that is readily available in some non-random way.
Example—A researcher polls people as they walk by on the street.
Why it's probably biased: The location and time of day and other factors may produce a biased sample of
people.
Voluntary response sample: The researcher puts out a request for members of a population to join the sample,
and people decide whether or not to be in the sample.
Example—A TV show host asks his viewers to visit his website and respond to an online poll.
Why it's probably biased: People who take the time to respond tend to have similarly strong opinions compared
to the rest of the population.


PRACTICE PROBLEM 1
A restaurant leaves comment cards on all of its tables and encourages customers to participate in a brief survey
to learn about their overall experience.

Good ways to sample
Simple random sample: Every member and every set of members has an equal chance of being included in the
sample. Technology, random number generators, or some other sort of chance process is needed to get a simple
random sample.
Example: A teacher puts students' names in a hat and chooses without looking to get a sample of
students.
Why it's good: Random samples are usually fairly representative since they don't favor certain members.

Stratified random sample: The population is first split into groups. The overall sample consists of some
members from every group. The members from each group are chosen randomly.

Example: A student council surveys 100 students by getting random samples of 25
freshmen, 25 sophomores, 25 juniors, and 25 seniors.
Why it's good: A stratified sample guarantees that members from each group will be represented in the
sample, so this sampling method is good when we want some members from every group.

Cluster random sample: The population is first split into groups. The overall sample consists of every member
from some of the groups. The groups are selected at random.
Example: An airline company wants to survey its customers one day, so they randomly select 5 flights
that day and survey every passenger on those flights.
Why it's good: A cluster sample gets every member from some of the groups, so it's good when each
group reflects the population as a whole.
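
The airline example can be sketched in code: whole groups (flights) are chosen at random, and then every member of each chosen group is surveyed. The flight and passenger names are made up, and the function name is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick n_clusters groups at random and keep every member of each chosen group."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical data: 20 flights with 3 passengers each.
flights = {
    f"flight_{i}": [f"passenger_{i}_{j}" for j in range(3)]
    for i in range(20)
}

surveyed = cluster_sample(flights, n_clusters=5)
for flight, passengers in surveyed.items():
    print(flight, passengers)
```

Note that randomness enters only when the flights are chosen; within a chosen flight nobody is left out.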

Systematic random sample: Members of the population are put in some order. A starting point is selected at
random, and every nth member after it is selected to be in the sample.

There are two types of sampling methods:

Probability sampling involves random selection, allowing you to make statistical inferences about the whole
group.
Non-probability sampling involves non-random selection based on convenience or other criteria, allowing you
to easily collect initial data.
You should clearly explain how you selected your sample in the methodology section of your paper or thesis.

A. Probability sampling methods


Probability sampling means that every member of the population has a chance of being selected. It is mainly
used in quantitative research. If you want to produce results that are representative of the whole population, you
need to use a probability sampling technique.

Three main types of probability sample are described below, followed by quota sampling, which is a
non-probability method.

Probability sampling
1. Simple random sampling
In a simple random sample, every member of the population has an equal chance of being selected. Your
sampling frame should include the whole population.

To conduct this type of sampling, you can use tools like random number generators or other techniques that are
based entirely on chance.

Example
You want to select a simple random sample of 100 employees of Company X. You assign a number to
every employee in the company database from 1 to 1000, and use a random number generator to select
100 numbers.
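
The Company X example can be sketched with Python's standard `random` module. The seed is fixed only so that the sketch gives the same result each time it is run; the variable names are illustrative.

```python
import random

rng = random.Random(42)           # seeded random number generator
employee_ids = range(1, 1001)     # employees numbered 1 to 1000

# Draw 100 distinct employee numbers; each employee is equally likely to be chosen.
sample_ids = rng.sample(employee_ids, 100)

print(sorted(sample_ids)[:10])    # first few selected numbers, for inspection
```

`sample` draws without replacement, so no employee can be selected twice.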

2. Systematic sampling
Systematic sampling is defined as a probability sampling method where the researcher chooses elements from a
target population by selecting a random starting point and selects sample members after a fixed ‘sampling
interval.’ Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct.
Every member of the population is listed with a number, but instead of randomly generating numbers,
individuals are chosen at regular intervals.

Example
All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly
select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6,
16, 26, 36, and so on), and you end up with a sample of 100 people.
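
A minimal sketch of this procedure is given below. The function name `systematic_sample` and its parameters are hypothetical, and the sketch assumes the population size is an exact multiple of the sample size, as it is in the example (1000 employees, 100 selected).

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return the list positions chosen by systematic sampling."""
    interval = population_size // sample_size  # the fixed sampling interval
    if start is None:
        # Random starting point among the first `interval` positions.
        start = rng.randint(1, interval)
    return list(range(start, population_size + 1, interval))

# Reproduces the example: 1000 employees, starting point 6, every 10th person.
chosen = systematic_sample(1000, 100, start=6)
print(chosen[:4], len(chosen))  # [6, 16, 26, 36] 100
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the interval.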

If you use this technique, it is important to make sure that there is no hidden pattern in the list that might skew
the sample. For example, if the HR database groups employees by team, and team members are listed in order
of seniority, there is a risk that your interval might skip over people in junior roles, resulting in a sample that is
skewed towards senior employees.

3. Stratified sampling
Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It
allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the
sample.

To use this sampling method, you divide the population into subgroups (called strata) based on the relevant
characteristic (e.g. gender, age range, income bracket, job role).

Based on the overall proportions of the population, you calculate how many people should be sampled from
each subgroup. Then you use random or systematic sampling to select a sample from each subgroup.

Example
The company has 800 female employees and 200 male employees. You want to ensure that the sample
reflects the gender balance of the company, so you sort the population into two strata based on gender.
Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a
representative sample of 100 people.
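
The proportional allocation in this example can be sketched as follows. The function `stratified_sample` and the employee labels (F1, M1 and so on) are made up for illustration; only the stratum sizes (800 and 200) and the total sample size (100) come from the example.

```python
import random

def stratified_sample(strata, total_sample_size, rng):
    """Sample each stratum in proportion to its share of the population."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; with awkward proportions the rounded
        # counts may not add up exactly to total_sample_size.
        n = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, n)  # simple random sample within the stratum
    return sample

strata = {
    "female": [f"F{i}" for i in range(1, 801)],  # 800 hypothetical employee labels
    "male": [f"M{i}" for i in range(1, 201)],    # 200 hypothetical employee labels
}
sample = stratified_sample(strata, 100, random.Random(0))
print({name: len(chosen) for name, chosen in sample.items()})  # {'female': 80, 'male': 20}
```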

4. Quota sampling
Quota sampling is a non-probability sampling method in which researchers build a sample of individuals who
represent a population, choosing them according to specific traits or qualities. The population is divided into
subgroups that are each homogeneous in the chosen trait, and a fixed quota of individuals is then taken from
each subgroup; two variables can be used to filter the population. It is easy to administer and allows quick
comparison between subgroups.

Example
With probability sampling, like simple random sampling, there are rules that govern how to get the
sample. However, with quota sampling, no formal rules exist. The general steps to follow are:

1. Divide the population into subgroups. These should be exclusive. For example, you might divide
employees by type of educational degree.
2. Figure out the proportion of subgroups to the population. For example, employees who have a
physical science degree might be 1 out of 4.
3. Choose your sample size. For example, if the population is 10,000 people you might have a
quota sample of 100.
4. Choose participants, being careful to adhere to the subgroup’s characteristics. For this example,
25% of your sample should have a physical science degree. The selection process continues until
your quotas are filled.
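
The steps above can be sketched as follows. Everything in the code is hypothetical: the function `quota_sample`, the list of people and the order in which they are met. What it is meant to show is that people are accepted in the order they turn up, with no random selection, until each quota (25 with a physical science degree and 75 others, out of 100) is filled.

```python
def quota_sample(people, quotas, group_of):
    """Accept people in the order met until every subgroup quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in people:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled, so stop recruiting
    return chosen

# Hypothetical population of 10,000 in which 1 person in 4 has a physical science degree.
people = [
    {"id": i, "degree": "physical science" if i % 4 == 0 else "other"}
    for i in range(1, 10001)
]
quotas = {"physical science": 25, "other": 75}

sample = quota_sample(people, quotas, group_of=lambda p: p["degree"])
print(len(sample))  # 100
```

Because the first people met are the ones accepted, the result depends on who is easiest to reach, which is why quota sampling is not a probability method.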

5. Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar
characteristics to the whole population. Instead of sampling individuals from each subgroup, you randomly select
entire subgroups.

If it is practically possible, you might include every individual from each sampled cluster. If the clusters
themselves are large, you can also sample individuals from within each cluster using one of the techniques
above.

This method is good for dealing with large and dispersed populations, but there is more risk of error in the
sample, as there could be substantial differences between clusters. It’s difficult to guarantee that the sampled
clusters are really representative of the whole population.

Example
The company has offices in 10 cities across the country (all with roughly the same number of employees
in similar roles). You don’t have the capacity to travel to every office to collect your data, so you use
random sampling to select 3 offices – these are your clusters.
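
A minimal sketch of this example follows. The office names, the figure of 100 employees per office and the seed are assumptions made for illustration; the example itself only states that there are 10 offices of roughly equal size and that 3 are selected.

```python
import random

# Hypothetical staff lists: 10 offices with 100 employees each.
offices = {
    f"Office {k}": [f"Office {k} / employee {i}" for i in range(1, 101)]
    for k in range(1, 11)
}

rng = random.Random(7)  # arbitrary seed, used only for reproducibility

# Randomly select 3 whole offices; these are the clusters.
selected_offices = rng.sample(sorted(offices), 3)

# Include every employee of each selected office (single-stage cluster sampling).
cluster_sample = [person for office in selected_offices for person in offices[office]]

print(selected_offices, len(cluster_sample))
```

If the offices were too large to survey in full, the last step could be replaced by a simple random sample within each selected office.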

B. Non-probability sampling methods


In a non-probability sample, individuals are selected based on non-random criteria, and not every individual has
a chance of being included.

This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias, and you can’t use it
to make valid statistical inferences about the whole population.

Non-probability sampling techniques are often appropriate for exploratory and qualitative research. In these
types of research, the aim is not to test a hypothesis about a broad population, but to develop an initial
understanding of a small or under-researched population.

i. Convenience sampling
A convenience sample simply includes the individuals who happen to be most accessible to the researcher.

This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is
representative of the population, so it can’t produce generalizable results.

Example
You are researching opinions about student support services in your university, so after each of your
classes, you ask your fellow students to complete a survey on the topic. This is a convenient way to
gather data, but as you only surveyed students taking the same classes as you at the same level, the
sample is not representative of all the students at your university.

ii. Voluntary response sampling


Similar to a convenience sample, a voluntary response sample is mainly based on ease of access. Instead of the
researcher choosing participants and directly contacting them, people volunteer themselves (e.g. by responding
to a public online survey).

Voluntary response samples are always at least somewhat biased, as some people will inherently be more likely
to volunteer than others.

Example
You send out the survey to all students at your university and a lot of students decide to complete it. This
can certainly give you some insight into the topic, but the people who responded are more likely to be
those who have strong opinions about the student support services, so you can’t be sure that their
opinions are representative of all students.

iii. Purposive sampling


This type of sampling involves the researcher using their judgement to select a sample that is most useful to the
purposes of the research.

It is often used in qualitative research, where the researcher wants to gain detailed knowledge about a specific
phenomenon rather than make statistical inferences. An effective purposive sample must have clear criteria and
rationale for inclusion.

Example
You want to know more about the opinions and experiences of disabled students at your university, so
you purposefully select a number of students with different support needs in order to gather a varied
range of data on their experiences with student services.

2.2 Distinguish between the following methods of data collection: Interviews, Questionnaires, Observation
and Surveys.

Open-Ended Surveys and Questionnaires


Opposite to closed-ended are open-ended surveys and questionnaires. The main difference between the two is
the fact that closed-ended surveys offer predefined answer options the respondent must choose from, whereas
open-ended surveys allow the respondents much more freedom and flexibility when providing their answers.

Here’s an example that best illustrates the difference:

[Figure: Closed vs. Open-Ended Question]


When creating an open-ended survey, keep in mind the length of your survey and the number and complexity of
questions. You need to carefully determine the optimal number of questions, as answering open-ended questions
can be time-consuming and demanding, and you don’t want to overwhelm your respondents.

Compared to closed-ended surveys, one of the quantitative data collection methods, the findings of open-ended
surveys are more difficult to compile and analyze due to the fact that there are no uniform answer options to
choose from.

1-on-1 Interviews
One-on-one (or face-to-face) interviews are one of the most common types of data collection methods in
qualitative research. Here, the interviewer collects data directly from the interviewee. Because it is a very
personal approach, this data collection technique is well suited to gathering highly personalized data.

Depending on your specific needs, the interview can be informal, unstructured, conversational, and even
spontaneous (as if you were talking to your friend) – in which case it’s more difficult and time-consuming to
process the obtained data – or it can be semi-structured and standardized to a certain extent (if you, for example,
ask the same series of open-ended questions).

Focus groups
The focus groups data collection method is essentially an interview method, but instead of being done 1-on-1,
here we have a group discussion.

Whenever the resources for 1-on-1 interviews are limited (whether in terms of people, money, or time) or you
need to recreate a particular social situation in order to gather data on people’s attitudes and behaviors, focus
groups can come in very handy.

Ideally, a focus group should have 3-10 people, plus a moderator. Of course, depending on the research goal
and what the data obtained is to be used for, there should be some common denominators for all the members of
the focus group.

For example, if you’re doing a study on the rehabilitation of teenage female drug users, all the members of
your focus group have to be girls recovering from drug addiction. Other parameters, such as age,
education, employment and marital status, do not have to be similar.

Direct observation
Direct observation is one of the most passive qualitative data collection methods. Here, the data collector takes
a participatory stance, observing the setting in which the subjects of their observation are while taking down
notes, video/audio recordings, photos, and so on.

Due to its participatory nature, direct observation can lead to bias in research, as the participation may influence
the attitudes and opinions of the researcher, making it challenging for them to remain objective. Plus, the fact
that the researcher is a participant too can affect the naturalness of the actions and behaviors of subjects who
know they’re being observed.

Questionnaire Design - Guidelines on how to design a good questionnaire
A good questionnaire should not be too lengthy. Simple English should be used and the questions should not be
difficult to answer. A good questionnaire requires sensible language, editing, assessment, and redrafting.

Survey

A survey is a research method used for collecting data from a predefined group of respondents to gain
information and insights into various topics of interest. Surveys can have multiple purposes, and researchers can
conduct them in many ways depending on the methodology chosen and the study’s goal. Research is of great
importance, and hence it is essential to understand the benefits of social research for a target population
using the right survey tool.

The data is usually obtained through the use of standardized procedures to ensure that each respondent can
answer the questions on a level playing field, avoiding biased opinions that could influence the outcome of the
research or study. The process involves asking people for information through a questionnaire, which can be
either online or offline. However, with the arrival of new technologies, it is common to distribute questionnaires
using digital media such as social networks, email, QR codes, or URLs.

Questionnaire Design Process


1. State the information required - This will depend upon the nature of the problem, the purpose of the study and
   the hypothesis framed. The target audience must be kept in focus.
2. State the kind of interviewing technique - The interviewing method can be telephone, mail, personal interview
   or electronic interview. A telephone interview can be computer assisted. A personal interview can be conducted
   at the respondent’s place or at a mall or shopping place. A mail interview can take the form of a mail panel.
   An electronic interview takes place either through electronic mail or through the internet.
3. Decide the matter/content of individual questions - There are two deciding factors for this:
   • Is the question significant? Observe the contribution of each question. Does the question contribute to the
     objective of the study?
   • Is there a need for several questions or a single question? Several questions are asked in the following
     cases:
     - When there is a need for cross-checking
     - When the answers are ambiguous
     - When people are hesitant to give correct information
4. Overcome the respondents’ inability and unwillingness to answer - The respondents may be unable to answer
   the questions for the following reasons:
   • The respondent may not be fully informed
   • The respondent may not remember
   • The respondent may be unable to express or articulate the answer
   The respondent may be unwilling to answer for the following reasons:
   • There may be sensitive information which may cause embarrassment or harm the respondent’s image
   • The respondent may not be familiar with the genuine purpose
   • The question may appear to be irrelevant to the respondent
   • The respondent will not be willing to reveal traits like aggressiveness (for instance, if he is asked “Do
     you hit your wife, sister”, etc.)
   To overcome the respondent’s unwillingness to answer:
   • Place the sensitive topics at the end of the questionnaire
   • Preface the question with a statement
   • Use the third person technique (for example: Mark needed a job badly and he used wrong means to get it. Is
     it right? Different people will have different opinions depending upon the situation)
   • Categorize the responses rather than asking for a specific figure (for example, group income levels as
     0-24,999; 25,000-49,999; 50,000 and above)
5. Decide on the structure of the question - Questions can be of two types:
   • Structured questions - These specify the set of response alternatives and the response format. These can be
     classified into multiple choice questions (having various response categories), dichotomous questions
     (having only 2 response categories such as “Yes” or “No”) and scales (discussed already).
   • Unstructured questions - These are also known as open-ended questions. No alternatives are suggested and the
     respondents are free to answer these questions in any way they like.
6. Determine the question language/phrasing - If the questions are poorly worded, then either the respondents
   will refuse to answer the question or they may give incorrect answers. Thus, the words of the question should
   be carefully chosen. Ordinary and unambiguous words should be used. Avoid implicit assumptions,
   generalizations and implicit alternatives. Avoid biased questions. Define the issue in terms of who the
   questionnaire is being addressed to, what information is required, when the information is required, why the
   question is being asked, etc.
7. Properly arrange the questions - To determine the order of the questions, take decisions on aspects like:
   • opening questions (simple, interesting questions should be used as opening questions to gain the
     co-operation and confidence of respondents),
   • type of information (basic information relates to the research issue, classification information relates to
     social and demographic characteristics, and identification information relates to personal information such
     as name, address and contact number of respondents),
   • difficult questions (complex, embarrassing, dull and sensitive questions could be difficult),
   • effect on subsequent questions, logical sequence, etc.
8. Recognize the form and layout of the questionnaire - This is essential for a self-administered questionnaire.
   The questions should be numbered and pre-coded. The layout should be such that it appears to be neat and
   orderly, and not cluttered.
9. Reproduce the questionnaire - Paper quality should be good. The questionnaire should appear to be
   professional. The space provided for the answers should be sufficient. The font type and size should be
   appropriate. Vertical response questions should be used, for example:
       Do you use brand X of shampoo?
       Yes
       No
10. Pre-test the questionnaire - The questionnaire should be pre-tested on a small number of respondents to
    identify the likely problems and to eliminate them. Each and every dimension of the questionnaire should be
    pre-tested. The sample respondents should be similar to the target respondents of the survey.
11. Finalize the questionnaire - Check the final draft questionnaire. Ask yourself how much the information
    obtained from each question will contribute to the study. Make sure that irrelevant questions are not asked.
    Obtain feedback of the respondents on the questionnaire.
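
The advice in the design process to categorize responses, rather than ask for an exact figure, can also be applied when coding answers. The sketch below is only an illustration: the function name `income_bracket` is made up, and it assumes that each boundary value (25,000 and 50,000) belongs to the higher bracket so that the categories do not overlap.

```python
from bisect import bisect_right

# Lower limits of the second and third income brackets.
BOUNDS = [25_000, 50_000]
LABELS = ["0-24,999", "25,000-49,999", "50,000 and above"]

def income_bracket(income):
    """Return the label of the bracket that contains the given income."""
    if income < 0:
        raise ValueError("income cannot be negative")
    # bisect_right counts how many bracket limits are less than or equal to income.
    return LABELS[bisect_right(BOUNDS, income)]

for income in (12_000, 25_000, 49_999, 80_000):
    print(income, "->", income_bracket(income))
```

Every possible answer falls into exactly one category, which is what a well-designed set of response options requires.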

Sampling and Non-Sampling Errors.


Two major types of error can arise when a sample of observations is taken from a population: sampling error
and non-sampling error.

Sampling Error: Sampling error refers to differences between the sample and the population that exist only
because of the observations that happened to be selected for the sample.

Another way to look at this is that the differences in results between different samples (of the same size) are
due to sampling error.
E.g. Suppose two samples of size 10 are drawn from 1,000 households. If we happened to get the highest income
levels in our first sample and the lowest income levels in the second, the difference between the two results
would be a consequence of sampling error.

Increasing the sample size will reduce this type of error.
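
The effect of sample size on sampling error can be seen in a small simulation. The sketch below is not a proof: the population of 1,000 household incomes is invented (drawn uniformly between 10,000 and 100,000), and the function `typical_sampling_error`, the seed and the number of repeats are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # arbitrary seed, used only for reproducibility

# Hypothetical population: incomes of 1,000 households.
population = [rng.uniform(10_000, 100_000) for _ in range(1000)]
true_mean = statistics.mean(population)

def typical_sampling_error(n, repeats=500):
    """Average absolute gap between a sample mean and the population mean."""
    gaps = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # simple random sample of size n
        gaps.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(gaps)

errors_by_size = {n: typical_sampling_error(n) for n in (10, 100, 500)}
for n, error in errors_by_size.items():
    print(f"sample size {n:>3}: typical error about {error:,.0f}")
```

The typical gap between the sample mean and the population mean shrinks as the sample size grows, because the only source of error in this simulation is the chance selection of households.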

Non-Sampling Error
Non-sampling errors are more serious and are due to mistakes made in the acquisition of data or due to the
sample observations being selected improperly.

There are three types of non-sampling errors:


• Errors in data acquisition,
• Nonresponse errors, and
• Selection bias.

Increasing the sample size will not reduce this type of error.

Errors in Data Acquisition arise from the recording of incorrect responses, due to:

- incorrect measurements being taken because of faulty equipment,
- mistakes made during transcription from primary sources,
- inaccurate recording of data due to misinterpretation of terms, or
- inaccurate responses to questions concerning sensitive issues.

Nonresponse Error refers to error (or bias) introduced when responses are not obtained from some members of
the sample, i.e. the sample observations that are collected may not be representative of the target population.
As mentioned earlier, the Response Rate (i.e. the proportion of all people selected who complete the survey) is a
key survey parameter and helps in understanding the validity of the survey and the sources of nonresponse
error.
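
The response rate defined above is a simple proportion. In the sketch below the function name `response_rate` and the figures (1,000 people selected, 640 completed surveys) are hypothetical.

```python
def response_rate(completed, selected):
    """Proportion of all people selected who completed the survey."""
    if selected <= 0:
        raise ValueError("at least one person must be selected")
    if not 0 <= completed <= selected:
        raise ValueError("completed must be between 0 and selected")
    return completed / selected

# Hypothetical survey: 1,000 people selected, 640 completed questionnaires.
rate = response_rate(640, 1000)
print(f"Response rate: {rate:.0%}")  # Response rate: 64%
```

A low response rate does not by itself prove bias, but it is a warning that the people who answered may differ from those who did not.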
