COM 114 Note-1
ON
TECHNOLOGY, VOM
Compiled By
What is Statistics?
Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting,
and presenting empirical data and for drawing valid conclusions from them. Statistics is a highly interdisciplinary
field: research in statistics finds applicability in virtually all scientific fields, and research questions in the
various scientific fields motivate the development of new statistical methods and theory. In developing methods and
studying the theory that underlies them, statisticians draw on a variety of mathematical and computational
tools.
Two fundamental ideas in the field of statistics are uncertainty and variation. There are many situations that we
encounter in science (or more generally in life) in which the outcome is uncertain. In some cases the uncertainty
is because the outcome in question is not determined yet (e.g., we may not know whether it will rain tomorrow)
while in other cases the uncertainty is because although the outcome has been determined already we are not
aware of it (e.g., we may not know whether we passed a particular exam).
Probability is a mathematical language used to discuss uncertain events and probability plays a key role in
statistics. Any measurement or data collection effort is subject to a number of sources of variation. By this we
mean that if the same measurement were repeated, then the answer would likely change. Statisticians attempt to
understand and control (where possible) the sources of variation in any situation.
Primary data
Primary data, as the name suggests, are first-hand information collected by the statistician. Such data are
original, are collected for a specific purpose, and have never undergone any statistical treatment before. The
collected data may be published as well. A census is an example of primary data.
Secondary data
Secondary data are the opposite of primary data. They have already been collected and published (by some
organization, for instance), and statisticians can use them as a source of data for their analysis.
Secondary data are impure in the sense that they have undergone statistical treatment at least once.
Sources of secondary data include:
(i) Official publications of bodies such as the Ministry of Finance, statistical departments of the government,
federal bureaus, agricultural statistical boards, etc. Semi-official sources include the State Bank, Boards of
Economic Enquiry, etc.
(ii) Data published by chambers of commerce, trade associations, and boards.
(iii) Articles in newspapers, journals, and technical publications.
(1) Business
Statistics plays an important role in business. A successful businessman must be very quick and accurate in
decision making. He knows what his customers want; he should therefore know what to produce and sell and in
what quantities.
Statistics helps businessmen to plan production according to the taste of the customers, and the quality of the
products can also be checked more efficiently by using statistical methods. Thus, it can be seen that all business
activities are based on statistical information. Businessmen can make correct decisions about the location of
business, marketing of the products, financial resources, etc.
(2) Economics
Economics largely depends upon statistics. National income accounts are multipurpose indicators for
economists and administrators, and statistical methods are used to prepare these accounts. In economics
research, statistical methods are used to collect and analyze the data and test hypotheses. The relationship
between supply and demand is studied by statistical methods; imports and exports, inflation rates, and per capita
income are problems which require a good knowledge of statistics.
(3) Mathematics
Statistics plays a central role in almost all natural and social sciences. The methods used in natural sciences are
the most reliable but conclusions drawn from them are only probable because they are based on incomplete
evidence.
Statistics helps in describing these measurements more precisely. Statistics is a branch of applied mathematics.
A large number of statistical methods, such as probability, averages, dispersion, and estimation, are used in
mathematics, and different techniques of pure mathematics, such as integration, differentiation, and algebra, are
used in statistics.
(4) Banking
Statistics plays an important role in banking. Banks make use of statistics for a number of purposes. They work
on the principle that not everyone who deposits money with a bank withdraws it at the same time.
The bank earns profits from these deposits by lending them to others at interest. Bankers use statistical
approaches based on probability to estimate the number of deposits and the claims on them for a given day.
(8) Astronomy
Astronomy is one of the oldest branches of statistical study; it deals with measuring the distances, sizes,
masses, and densities of heavenly bodies by means of observations. Errors are unavoidable in these measurements,
so the most probable values are found by using statistical methods.
Example: The distance of the moon from the earth is measured. Throughout history, astronomers have used
statistical methods such as the method of least squares to determine the movements of stars.
But the best part of statistics is that it also helps you find out how much you may be affected by a disease. For
example, suppose a study has shown that more than 75% of people are infected with a disease that is caused by
mangoes. In that case, you might avoid mangoes to avoid this disease.
Statistics and the computer
There are two different ways in which the computer is changing the field of statistics.
First, computers can help us to do what we did before the advent of the computer, but in a more efficient way.
Second, computers can help us to do things nobody thought of before the advent of the computer. To the first
category belong statistical data analysis by numerical and graphical methods, and simulation; to the second
belong, for example, various computer-intensive methods (see Diaconis and Efron, 1983). Another way to
categorise the relation between statistics and the computer is to list the different ways the computer can be used
in statistics.
The following are examples of such uses: numerical and graphical data analysis; symbolic computations;
simulations; storing statistical knowledge; presentation of results. The close relationship between statistics and
computing implies that when one changes the other will also change. The following are some new practical
procedures in computing which have turned out to have a great importance for statistics:
(i) The change from mainframe batch computing to personal computing.
(ii) The introduction of multiple dynamic displays.
(iii) The possibility of direct manipulation of graphical objects.
Some trends in statistics are also obviously very much influenced by what has happened in computing.
Examples of such trends are:
(i) emphasis on exploratory data analysis instead of hypothesis testing;
(ii) the use of computer-intensive methods;
(iii) the introduction of new diagnostic methods.
1.5 State uses of statistical data
(1) Statistics helps in providing a better understanding and exact description of a phenomenon of nature.
(2) Statistics helps in the proper and efficient planning of a statistical inquiry in any field of study.
(3) Statistics helps in collecting appropriate quantitative data.
Government
Governments use statistics to make judgments about health, population, education, and much more. Statistics may
help the government determine which education schedule benefits students, or how high school students are
progressing under a particular curriculum. For example, the government can assemble specific data about the
population of the country using a census.
Weather Forecast
Have you ever seen a weather forecast? Do you know how the government does its weather forecasting?
Statistics plays a crucial role in weather forecasting.
The computer models used in weather forecasting are based on a set of statistical functions. These functions
compare current weather conditions with previously recorded seasons and conditions. This helps the government.
Emergency Preparedness
Statistics is also helpful in emergency preparedness. With the help of statistics, we can estimate the likelihood
of a natural disaster occurring in the near future. This helps us prepare for an emergency, and it helps rescue
teams prepare to save the lives of people who are in danger.
Political Campaigns
Statistics are crucial in a political campaign. Without statistics, it is hard to run a political campaign
well. Statistics give politicians an idea of their chances of winning an election in a particular area.
Statistics also help news channels predict the winner of an election, and they help political parties
know how many voters support them in a particular voting zone. In addition, they help the country
anticipate the future government.
Sports
There are many uses of statistics in sports. Every sport uses statistics to become more effective.
Statistics help athletes understand their performance in a particular sport.
Nowadays sports are taking the use of statistical data to the next level. The reason is that sports are becoming
more popular, and various kinds of equipment are used to collect data on many factors. Statistics is used to
draw conclusions from these data.
Research
Statistics plays an essential role in the work of researchers. For instance, statistics can be
applied in data acquisition, analysis, explanation, interpretation, and presentation. Statistics helps
researchers summarize, characterize, and describe the outcome of their research.
Besides this, medicine would be less effective without research to determine which drugs or
interventions work best and how individual groups respond to a medicine. Medical experts also conduct studies
by age, race, or country to identify the effect of these characteristics on one’s health.
Education
Statistics is useful in education because teachers can act as researchers in their own classrooms, identifying
which teaching techniques work for which pupils and why. They also need to evaluate test results to determine
whether students are performing as expected. There are statistical studies of student achievement at all levels
of testing and education, from kindergarten to the GRE or SAT. For example, teachers can calculate the average
of students’ marks and adopt new techniques that can help the students improve their grades.
Prediction
Statistics helps us make predictions about what is going to happen in the future. Based on what we
face in our daily lives, we make predictions.
How accurate a prediction is depends on many factors. When we make a prediction, we take into account
the external and internal factors that may affect our future. Statisticians do the same when they apply
statistical techniques to estimate an event.
Doctors, engineers, artists, and practitioners all use statistics to make predictions about future events. For
example, doctors use statistics to understand the likely course of a disease. They can predict the severity of the
flu in each winter season through the use of data.
Engineers use statistics to estimate the success of their ongoing projects, and they also use data to evaluate
how long it will take to complete a project.
Quality Testing
Quality testing is another important use of statistics in every area of life. On a day-to-day basis, we conduct
quality tests to ensure that our purchases are sound and that we get the best results from what we spend.
We test a sample of what we expect to buy. If the sample passes the quality test, we go ahead and buy it.
Insurance
Insurance is a vast industry. There are many kinds of insurance, e.g. car insurance, bike insurance, life
insurance, and many more. Insurance premiums are based on statistics. Insurance companies use statistics
collected from homeowners, drivers, vehicle registration offices, and many other sources. They receive data
from all these sources and then decide the premium amount.
Consumer Goods
Statistics are widely used for consumer goods, because consumer goods are products used every day.
Businesses use statistics to determine which consumer goods are in stock in a store and which are not.
They also use statistics to find out which stores need consumer goods and when to ship the products. Sound
statistical decisions help businesses earn substantial revenue from consumer goods.
Financial Market
The financial market relies heavily on statistics. Stock prices are calculated and analyzed with the help of
statistics. Statistics also helps investors decide whether to invest in a particular stock, and it helps
corporations manage their finances for long-term business.
Business Statistics
Each large organization uses business statistics and various data analysis tools, for instance to
estimate probabilities and see where sales may be headed in the future. Several tools are used for
business statistics, built on the basis of the mean, median, and mode, the bell curve, bar graphs, and
basic probability. These can be employed for research problems related to employees, products, customer
service, and much more. Businesses can then see what is working and what is not.
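As a small illustration of the mean, median, and mode mentioned above, the following sketch uses Python's standard statistics module. The sales figures are hypothetical and are not taken from these notes.

```python
from statistics import mean, median, mode

# Hypothetical monthly sales figures (units sold); illustrative only.
monthly_sales = [120, 135, 120, 150, 160, 120, 145]

average_sales = mean(monthly_sales)    # arithmetic mean: sum divided by count
middle_sales = median(monthly_sales)   # middle value of the sorted data
typical_sales = mode(monthly_sales)    # most frequently occurring value

print("mean:", round(average_sales, 2))
print("median:", middle_sales)
print("mode:", typical_sales)
```

The three measures can differ noticeably when the data are skewed, which is one reason a business might report more than one of them.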
Computer Science
Statistics is essential for all branches of science, as it is highly beneficial for decision making and for
examining the correctness of the choices that one has made. Applying statistics in computer science and
machine learning can significantly increase the efficiency of algorithms. It can also reduce the cost of
processing. Without an understanding of statistics, it is difficult to understand such algorithms and
challenging to develop them.
Computer scientists need to concentrate on data acquisition and cleaning, retrieval, reporting, and mining. They
are responsible for improving algorithms and system efficiency. Besides this, they focus on machine learning,
especially data mining (discovering models and relationships in information for various purposes, such as finance
and marketing).
Robotics
Statistics has various uses in the field of robotics. Various techniques can be applied in this field, such as
expectation-maximization (EM), particle filters, Kalman filters, Bayesian networks, and much more. A robot
senses its present state by estimating a probability density function over its possible states. With new sensor
inputs, the robot continuously updates this estimate and prioritizes its current actions. Apart from this, robots
can compare estimated and actual values and act accordingly. Therefore, statistics is an important tool in
robotics.
Aerospace
Statistics is one of the important tools on which aerospace engineering relies. There are numerous ways
in which statistics are applied, such as tracking the shrinkage and growth rate of a route. Apart
from this, statistics are used to study traffic decline and growth, the number of accidents due to aerospace
failures, etc. Several airlines use this statistical information to work out how they can build a
better aerospace future.
Data Science
A data scientist uses different statistical techniques to study the collected data, such as classification,
hypothesis testing, regression, time series analysis, and much more. Data scientists design proper experiments
and obtain results using these statistical techniques. Besides all this, statistics can be used to draw
conclusions from information quickly and effectively. Therefore, statistics is one of the most helpful tools
data scientists have for obtaining relevant results from a sample.
Machine Learning
Statistics is used to quantify the uncertainty in the estimated skill of machine learning models.
This uncertainty is expressed with the help of confidence intervals and tolerance intervals. Statistics can be
used for machine learning in various ways, such as for:
Problem Framing
Understanding the data
Data Cleaning
Selection of data
Data Preparation
Model Evaluation
Model Configuration
Selection of Model
Model Presentation
Model Prediction
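To make the idea of a confidence interval for model skill concrete, here is a minimal sketch. It assumes the skill measure is classification accuracy on a test set and uses the normal approximation to the binomial distribution, which is only one of several possible methods. The function name and the figures (850 correct out of 1000) are hypothetical.

```python
import math

def accuracy_confidence_interval(correct, total, z=1.96):
    """Approximate confidence interval for a classification accuracy.

    Uses the normal approximation to the binomial; z = 1.96 corresponds
    to roughly 95% confidence. Reasonable only for fairly large test sets.
    """
    accuracy = correct / total
    margin = z * math.sqrt(accuracy * (1 - accuracy) / total)
    # Clip to [0, 1] because an accuracy cannot fall outside that range.
    return max(0.0, accuracy - margin), min(1.0, accuracy + margin)

# Hypothetical result: 850 of 1000 test examples classified correctly.
lower, upper = accuracy_confidence_interval(correct=850, total=1000)
print(f"accuracy = 0.850, approximate 95% interval = ({lower:.3f}, {upper:.3f})")
```

A larger test set gives a narrower interval, which is the sense in which statistics quantifies how far an estimated skill can be trusted.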
Deep Learning
Statistics and probability are both considered methods of handling aggregation of, or ignorance about, data.
Deep learning can use statistics to abstract several useful properties of the data while ignoring the
details. Therefore, statistics and probability are methods for formalizing the deep learning process
mathematically. That is why statistics is basic to deep learning, and it is worth understanding how
statistics is used in it.
Quantitative data are values corresponding to various parameters; for instance, “How much did that laptop
cost?” is a question that collects quantitative data. There are values associated with most measured
parameters, such as pounds or kilograms for weight, dollars for cost, etc.
Quantitative data make measuring various parameters manageable because of the ease of the mathematical
derivations they allow. Quantitative data are usually collected for statistical analysis using surveys, polls, or
questionnaires sent to a specific section of a population. The results can then be generalized to the
population.
What is Measurement?
Normally, when one hears the term measurement, they may think in terms of measuring the length
of something (e.g., the length of a piece of wood) or measuring a quantity of something (e.g., a cup of
flour). This represents a limited use of the term measurement. In statistics, the term measurement
is used more broadly and is more appropriately termed scales of measurement. Scales of
measurement refer to ways in which variables/numbers are defined and categorized. Each scale of
measurement has certain properties which in turn determine the appropriateness for use of certain
statistical analyses. The four scales of measurement are nominal, ordinal, interval, and ratio.
Nominal: Categorical data and numbers that are simply used as identifiers or names represent a
nominal scale of measurement. Numbers on the back of a baseball jersey (St. Louis Cardinals 1 =
Ozzie Smith) and your social security number are examples of nominal data. If I conduct a study
and I'm including gender as a variable, I will code Female as 1 and Male as 2 or vice versa when I
enter my data into the computer. Thus, I am using the numbers 1 and 2 to represent categories of
data.
Ordinal: An ordinal scale of measurement represents an ordered series of relationships or rank
order. Individuals competing in a contest may be fortunate to achieve first, second, or third place.
First, second, and third place represent ordinal data. If Roscoe takes first and Wilbur takes second,
we do not know if the competition was close; we only know that Roscoe outperformed Wilbur.
Likert-type scales (such as "On a scale of 1 to 10 with one being no pain and ten being high pain,
how much pain are you in today?") also represent ordinal data. Fundamentally, these scales do not
represent a measurable quantity. An individual may respond 8 to this question and be in less pain
than someone else who responded 5. A person may not be in half as much pain if they responded
4 than if they responded 8. All we know from this data is that an individual who responds 6 is in
less pain than if they responded 8 and in more pain than if they responded 4. Therefore, Likert-
type scales only represent a rank ordering.
Interval: A scale which represents quantity and has equal units but for which zero represents
simply an additional point of measurement is an interval scale. The Fahrenheit scale is a clear
example of the interval scale of measurement. Thus, 60 degrees Fahrenheit or -10 degrees
Fahrenheit are interval data. Measurement of sea level is another example of an interval scale.
With each of these scales there is direct, measurable quantity with equality of units. In addition,
zero does not represent the absolute lowest value. Rather, it is a point on the scale with numbers
both above and below it (for example, -10 degrees Fahrenheit).
Ratio: The ratio scale of measurement is similar to the interval scale in that it also represents
quantity and has equality of units. However, this scale also has an absolute zero (no numbers exist
below the zero). Very often, physical measures will represent ratio data (for example, height and
weight). If one is measuring the length of a piece of wood in centimeters, there is quantity, equal
units, and that measure cannot go below zero centimeters. A negative length is not possible.
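The following sketch illustrates how the scale of measurement limits which summaries are meaningful. All of the data values are hypothetical, and the pairing of each scale with one summary is a common rule of thumb rather than a complete list.

```python
from statistics import mean, median, mode

# Hypothetical data, one small list per scale of measurement.
gender_codes = [1, 2, 1, 1, 2]      # nominal: 1 = Female, 2 = Male (labels only)
finishing_places = [1, 3, 2, 5, 4]  # ordinal: rank order in a contest
temperatures_f = [60, -10, 32, 75]  # interval: degrees Fahrenheit
lengths_cm = [120.0, 60.0, 90.0]    # ratio: lengths with a true zero

# Nominal: only counting categories makes sense, so use the mode.
most_common_code = mode(gender_codes)

# Ordinal: order is meaningful, so the median is meaningful.
middle_place = median(finishing_places)

# Interval: equal units make the mean meaningful.
mean_temperature = mean(temperatures_f)

# Ratio: an absolute zero makes statements like "twice as long" meaningful.
length_ratio = lengths_cm[0] / lengths_cm[1]

print(most_common_code, middle_place, mean_temperature, length_ratio)
```

Averaging the gender codes, by contrast, would produce a number with no interpretation, because the codes 1 and 2 are only names.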
2.1 Describe basic sampling techniques: Random, Systematic, Stratified, Quota Sampling etc
Sampling methods
What are sampling methods?
In a statistical study, sampling methods refer to how we select members from the population to be in the study.
If a sample isn't randomly selected, it will probably be biased in some way and the data may not be
representative of the population.
There are many ways to select a sample—some good and some bad.
Bad ways to sample
Convenience sample: The researcher chooses a sample that is readily available in some non-random way.
Example—A researcher polls people as they walk by on the street.
Why it's probably biased: The location and time of day and other factors may produce a biased sample of
people.
Voluntary response sample: The researcher puts out a request for members of a population to join the sample,
and people decide whether or not to be in the sample.
Example—A TV show host asks his viewers to visit his website and respond to an online poll.
Why it's probably biased: People who take the time to respond tend to have similarly strong opinions compared
to the rest of the population.
PRACTICE PROBLEM 1
A restaurant leaves comment cards on all of its tables and encourages customers to participate in a brief survey
to learn about their overall experience.
Good ways to sample
Simple random sample: Every member and set of members has an equal chance of being included in the
sample. Technology, random number generators, or some other sort of chance process is needed to get a simple
random sample.
Example: A teacher puts students' names in a hat and chooses without looking to get a sample of
students.
Why it's good: Random samples are usually fairly representative since they don't favor certain members.
Stratified random sample: The population is first split into groups. The overall sample consists of some
members from every group. The members from each group are chosen randomly.
Example: A student council surveys 100 students by getting random samples of 25
freshmen, 25 sophomores, 25 juniors, and 25 seniors.
Why it's good: A stratified sample guarantees that members from each group will be represented in the
sample, so this sampling method is good when we want some members from every group.
Cluster random sample: The population is first split into groups. The overall sample consists of every member
from some of the groups. The groups are selected at random.
Example: An airline company wants to survey its customers one day, so they randomly select 5 flights
that day and survey every passenger on those flights.
Why it's good: A cluster sample gets every member from some of the groups, so it's good when each
group reflects the population as a whole.
Systematic random sample: Members of the population are put in some order. A starting point is selected at
random, and every nth member after it is selected for the sample.
Probability sampling involves random selection, allowing you to make statistical inferences about the whole
group.
Non-probability sampling involves non-random selection based on convenience or other criteria, allowing you
to easily collect initial data.
You should clearly explain how you selected your sample in the methodology section of your paper or thesis.
Probability sampling
1. Simple random sampling
In a simple random sample, every member of the population has an equal chance of being selected. Your
sampling frame should include the whole population.
To conduct this type of sampling, you can use tools like random number generators or other techniques that are
based entirely on chance.
Example
You want to select a simple random sample of 100 employees of Company X. You assign a number to
every employee in the company database from 1 to 1000, and use a random number generator to select
100 numbers.
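The example above can be sketched in a few lines of Python using the standard random module. The fixed seed is an assumption made only so that the draw can be repeated; in a real study you would normally not fix it.

```python
import random

# Seeded generator (an assumption, for reproducibility of this sketch).
rng = random.Random(42)

# Employees of Company X are numbered 1 to 1000, as in the example.
employee_ids = range(1, 1001)

# Draw 100 distinct numbers; every employee has the same chance of selection.
sample = rng.sample(employee_ids, k=100)

print(sorted(sample)[:10])  # first few selected employee numbers
```

Because random.sample draws without replacement, no employee can appear in the sample twice.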
2. Systematic sampling
Systematic sampling is defined as a probability sampling method where the researcher chooses elements from a
target population by selecting a random starting point and then selecting sample members at a fixed ‘sampling
interval.’ Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct.
Every member of the population is listed with a number, but instead of randomly generating numbers,
individuals are chosen at regular intervals.
Example
All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly
select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6,
16, 26, 36, and so on), and you end up with a sample of 100 people.
If you use this technique, it is important to make sure that there is no hidden pattern in the list that might skew
the sample. For example, if the HR database groups employees by team, and team members are listed in order
of seniority, there is a risk that your interval might skip over people in junior roles, resulting in a sample that is
skewed towards senior employees.
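A minimal sketch of the systematic sampling example follows. The function name systematic_sample and the employee labels are hypothetical; the list is assumed to be already in the order you want to sample from.

```python
import random

def systematic_sample(items, interval, rng):
    """Pick a random start within the first interval, then every interval-th item."""
    start = rng.randrange(interval)  # 0-based index among the first `interval` items
    return items[start::interval]

# Hypothetical list of 1000 employees, assumed to be in alphabetical order.
employees = [f"employee_{i}" for i in range(1, 1001)]

sample = systematic_sample(employees, interval=10, rng=random.Random(0))
print(len(sample), sample[:3])
```

With 1000 employees and an interval of 10, any starting point yields exactly 100 people. The code does nothing to detect a hidden pattern in the list, so that check remains the researcher's responsibility.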
3. Stratified sampling
Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It
allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the
sample.
To use this sampling method, you divide the population into subgroups (called strata) based on the relevant
characteristic (e.g. gender, age range, income bracket, job role).
Based on the overall proportions of the population, you calculate how many people should be sampled from
each subgroup. Then you use random or systematic sampling to select a sample from each subgroup.
Example
The company has 800 female employees and 200 male employees. You want to ensure that the sample
reflects the gender balance of the company, so you sort the population into two strata based on gender.
Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a
representative sample of 100 people.
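The proportional allocation in this example can be sketched as follows. The function name stratified_sample and the employee labels are hypothetical, and the simple rounding used here is an assumption that works because the shares come out as whole numbers.

```python
import random

def stratified_sample(strata, total_sample_size, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding may need adjusting when shares are not whole.
        stratum_size = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, stratum_size)
    return sample

# 800 female and 200 male employees, as in the example (labels are made up).
strata = {
    "female": [f"F{i}" for i in range(800)],
    "male": [f"M{i}" for i in range(200)],
}

result = stratified_sample(strata, total_sample_size=100, rng=random.Random(1))
print({name: len(members) for name, members in result.items()})
```

The printed sizes match the 80 women and 20 men described in the example.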
4. Quota sampling
Quota sampling is a sampling methodology wherein data is collected from a homogeneous group. It involves a
two-step process where two variables can be used to filter information from the population. It can easily be
administered and helps in quick comparison. It is defined as a non-probability sampling method in which
researchers create a sample involving individuals that represent a population. Researchers choose these
individuals according to specific traits or qualities.
Example
With probability sampling, like simple random sampling, there are rules that govern how to get the
sample. However, with quota sampling, no formal rules exist. The general steps to follow are:
1. Divide the population into subgroups. These should be exclusive. For example, you might divide
employees by type of educational degree.
2. Figure out the proportion of subgroups to the population. For example, employees who have a
physical science degree might be 1 out of 4.
3. Choose your sample size. For example, if the population is 10,000 people, you might take a
quota sample of 100.
4. Choose participants, being careful to adhere to the subgroup’s characteristics. For this example,
25% of your sample should have a physical science degree. The selection process continues until
your quotas are filled.
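The four steps above can be sketched as follows. The function name quota_sample and the stream of respondents are hypothetical. Note that nothing here is random: people are accepted in the order they are encountered until each quota is full, which is what makes this a non-probability method.

```python
def quota_sample(people, quotas):
    """Accept people in the order encountered until every subgroup quota is filled.

    people: iterable of (name, subgroup) pairs.
    quotas: dict mapping subgroup to the number of participants wanted.
    """
    remaining = dict(quotas)
    chosen = []
    for name, subgroup in people:
        if remaining.get(subgroup, 0) > 0:
            chosen.append((name, subgroup))
            remaining[subgroup] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return chosen

# Hypothetical respondents in the order the researcher meets them.
people = [
    (f"person_{i}", "physical science" if i % 3 == 0 else "other")
    for i in range(1000)
]

# A sample of 100 in which 25% hold a physical science degree.
quotas = {"physical science": 25, "other": 75}
sample = quota_sample(people, quotas)
print(len(sample))
```

Once the physical science quota is full, further physical science graduates are simply skipped, so who ends up in the sample depends on the order in which people are encountered.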
5. Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup should have similar
characteristics to the whole population. Instead of sampling individuals from each subgroup, you randomly select
entire subgroups.
If it is practically possible, you might include every individual from each sampled cluster. If the clusters
themselves are large, you can also sample individuals from within each cluster using one of the techniques
above.
This method is good for dealing with large and dispersed populations, but there is more risk of error in the
sample, as there could be substantial differences between clusters. It’s difficult to guarantee that the sampled
clusters are really representative of the whole population.
Example
The company has offices in 10 cities across the country (all with roughly the same number of employees
in similar roles). You don’t have the capacity to travel to every office to collect your data, so you use
random sampling to select 3 offices – these are your clusters.
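The office example can be sketched as follows. The city names, the figure of 100 employees per office, and the fixed seed are assumptions made for illustration.

```python
import random

rng = random.Random(7)  # seeded only so the sketch is repeatable

# Hypothetical company: 10 offices (clusters) with 100 employees each.
offices = {
    f"city_{c}": [f"city_{c}_employee_{e}" for e in range(1, 101)]
    for c in range(1, 11)
}

# Randomly select 3 whole offices; these are the sampled clusters.
chosen_offices = rng.sample(sorted(offices), k=3)

# Include every employee from each selected office.
sample = [employee for office in chosen_offices for employee in offices[office]]

print(chosen_offices, len(sample))
```

Here the random selection happens at the level of offices, not individuals, which is the defining difference from stratified sampling.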
Non-probability sampling
A non-probability sample is easier and cheaper to access than a probability sample, but it has a higher risk of
sampling bias, and you can’t use it to make valid statistical inferences about the whole population.
Non-probability sampling techniques are often appropriate for exploratory and qualitative research. In these
types of research, the aim is not to test a hypothesis about a broad population, but to develop an initial
understanding of a small or under-researched population.
i. Convenience sampling
A convenience sample simply includes the individuals who happen to be most accessible to the researcher.
This is an easy and inexpensive way to gather initial data, but there is no way to tell if the sample is
representative of the population, so it can’t produce generalizable results.
Example
You are researching opinions about student support services in your university, so after each of your
classes, you ask your fellow students to complete a survey on the topic. This is a convenient way to
gather data, but as you only surveyed students taking the same classes as you at the same level, the
sample is not representative of all the students at your university.
ii. Voluntary response sampling
Voluntary response samples are always at least somewhat biased, as some people will inherently be more likely
to volunteer than others.
Example
You send out the survey to all students at your university and a lot of students decide to complete it. This
can certainly give you some insight into the topic, but the people who responded are more likely to be
those who have strong opinions about the student support services, so you can’t be sure that their
opinions are representative of all students.
iii. Purposive sampling
Purposive sampling, in which the researcher deliberately selects the members of the sample, is often used in
qualitative research, where the researcher wants to gain detailed knowledge about a specific
phenomenon rather than make statistical inferences. An effective purposive sample must have clear criteria and
rationale for inclusion.
Example
You want to know more about the opinions and experiences of disabled students at your university, so
you purposefully select a number of students with different support needs in order to gather a varied
range of data on their experiences with student services.
2.2 Distinguish between the following methods of data collection: Interviews, Questionnaires, Observation
and Surveys.
Compared to closed-ended surveys (one of the quantitative data collection methods), the findings of open-ended
surveys are more difficult to compile and analyze because there are no uniform answer options to choose from.
1-on-1 Interviews
One-on-one (or face-to-face) interviews are one of the most common types of data collection methods in
qualitative research. Here, the interviewer collects data directly from the interviewee. Because it is a very
personal approach, this data collection technique is well suited to gathering highly personalized data.
Depending on your specific needs, the interview can be informal, unstructured, conversational, and even
spontaneous (as if you were talking to your friend) – in which case it’s more difficult and time-consuming to
process the obtained data – or it can be semi-structured and standardized to a certain extent (if you, for example,
ask the same series of open-ended questions).
Focus groups
The focus groups data collection method is essentially an interview method, but instead of being done 1-on-1,
here we have a group discussion.
Whenever the resources for 1-on-1 interviews are limited (whether in terms of people, money, or time) or you
need to recreate a particular social situation in order to gather data on people’s attitudes and behaviors, focus
groups can come in very handy.
Ideally, a focus group should have 3-10 people, plus a moderator. Of course, depending on the research goal
and what the data obtained is to be used for, there should be some common denominators for all the members of
the focus group.
For example, if you’re doing a study on the rehabilitation of teenage female drug users, all the members of
your focus group have to be girls recovering from drug addiction. Other parameters, such as age,
education, employment, and marital status, do not have to be similar.
Direct observation
Direct observation is one of the most passive qualitative data collection methods. Here, the data collector takes
a participatory stance, observing the setting in which the subjects of their observation are while taking down
notes, video/audio recordings, photos, and so on.
Due to its participatory nature, direct observation can lead to bias in research, as the participation may influence
the attitudes and opinions of the researcher, making it challenging for them to remain objective. In addition, the
fact that the researcher is also a participant can affect the naturalness of the actions and behaviors of subjects
who know they’re being observed.
Questionnaire Design - Guidelines on how to design a good questionnaire
A good questionnaire should not be too lengthy. Simple English should be used, and the questions shouldn’t be
difficult to answer. A good questionnaire requires sensible language, editing, assessment, and redrafting.
Survey
A survey is a research method used for collecting data from a predefined group of respondents to gain
information and insights into various topics of interest. Surveys can have multiple purposes, and researchers can
conduct them in many ways depending on the methodology chosen and the study’s goal. Research is of great
importance today, so it is essential to understand the benefits of social research for a target population using
the right survey tool.
The data is usually obtained through standardized procedures, so that every respondent answers the questions
on a level playing field and biased opinions do not influence the outcome of the research or study. The process
involves asking people for information through a questionnaire, which can be either online or offline. With the
arrival of new technologies, it is now common to distribute surveys using digital media such as social networks,
email, QR codes, or URLs.
Is the question significant? - Observe the contribution of each question. Does the question contribute to the
objective of the study?
Is there a need for several questions or a single question? - Several questions are asked in the following cases:
When there is a need for cross-checking
When the answers are ambiguous
When people are hesitant to give correct information.
Overcome the respondents’ inability and unwillingness to answer - The respondents may be unable to answer
the questions for the following reasons:
The respondent may not be fully informed
The respondent may not remember
The respondent may be unable to express or articulate the answer
The respondent may be unwilling to answer for the following reasons:
There may be sensitive information which may cause embarrassment or harm the respondent’s image.
The respondent may not be familiar with the genuine purpose
The question may appear to be irrelevant to the respondent
The respondent may not be willing to reveal traits like aggressiveness (for instance, if asked “Do you hit
your wife or sister?”)
To overcome the respondent’s unwillingness to answer:
Sampling Error: Sampling error refers to differences between the sample and the population that exist only
because of the observations that happened to be selected for the sample.
Another way to look at this is: the differences in results between different samples (of the same size) are due to
sampling error.
E.g. Take two samples of size 10 from 1,000 households. If we happened to get the highest income levels in
our first sample and all the lowest income levels in the second, the difference between them is a consequence of
sampling error.
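The household example can be sketched in Python as follows. The income figures are simulated purely for illustration (a lognormal distribution with arbitrary parameters is assumed); only the sample size of 10 and the population of 1,000 households come from the example.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical incomes for 1,000 households (simulated, not real data).
incomes = [round(rng.lognormvariate(10, 0.5)) for _ in range(1000)]
population_mean = statistics.mean(incomes)

# Two random samples of size 10 from the same population.
sample_a = rng.sample(incomes, 10)
sample_b = rng.sample(incomes, 10)

# The extreme case from the example: the 10 lowest and the 10 highest incomes.
ordered = sorted(incomes)
lowest, highest = ordered[:10], ordered[-10:]

print("population mean:", round(population_mean))
print("sample A mean:  ", round(statistics.mean(sample_a)))
print("sample B mean:  ", round(statistics.mean(sample_b)))
print("lowest-10 mean: ", round(statistics.mean(lowest)))
print("highest-10 mean:", round(statistics.mean(highest)))
```

The two random samples will generally give different means from each other and from the population mean, even though nothing went wrong in collecting them. That gap is the sampling error, and the lowest and highest groups show how large it could be in the worst case.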
Non-Sampling Error
Non-sampling errors are more serious and are due to mistakes made in the acquisition of data or due to the
sample observations being selected improperly.
Nonresponse Error: This refers to error (or bias) introduced when responses are not obtained from some members
of the sample, i.e. the sample observations that are collected may not be representative of the target population.
As mentioned earlier, the Response Rate (i.e. the proportion of all people selected who complete the survey) is a
key survey parameter and helps in understanding the validity of the survey and the sources of nonresponse
error.
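The response rate defined above is a simple proportion. A minimal sketch, assuming hypothetical counts (500 people selected, 180 completed surveys) and a hypothetical helper name `response_rate`:

```python
def response_rate(n_selected, n_completed):
    """Proportion of all people selected who completed the survey."""
    if n_selected <= 0:
        raise ValueError("n_selected must be positive")
    if not 0 <= n_completed <= n_selected:
        raise ValueError("n_completed must be between 0 and n_selected")
    return n_completed / n_selected


# Hypothetical figures: 500 people selected, 180 completed surveys.
rate = response_rate(500, 180)
print(f"Response rate: {rate:.0%}")  # prints "Response rate: 36%"
```

A low response rate does not prove that the results are biased, but it signals that nonresponse error should be investigated.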