
MADHA Engineering College, Kundrathur, Chennai-69.

Master of Computer Application


N.Vinodh, MBA, M.Phil, Department of Management Studies
Unit-2 Research Methodology & IPR

Measurements, Measurement Scales, Questionnaires and Instruments, Sampling and
sampling methods. Data - Preparing, Exploring, Examining and Displaying.

Measurement:
Measurement is the process of observing and recording the observations that are
collected as part of a research effort. There are two major issues that will be considered
here.
First, we need to understand the fundamental ideas involved in measuring. Here we
consider two major measurement concepts. In Levels of Measurement, we explain the
meaning of the four major levels of measurement: nominal, ordinal, interval and ratio.
Then we move on to the reliability of measurement, including consideration of true
score theory and a variety of reliability estimators.
Second, we need to understand the different types of measures that you might use in
social research. We consider four broad categories of measurements. Survey
research includes the design and implementation of interviews and
questionnaires. Scaling involves consideration of the major methods of developing and
implementing a scale. Qualitative research provides an overview of the broad range of
non-numerical measurement approaches. And unobtrusive measures present a
variety of measurement methods that don't intrude on or interfere with the context of
the research.

LEVELS OF MEASUREMENT
There are different levels of measurement. These levels differ as to how closely they
approach the structure of the number system we use. It is important to understand the
level of measurement of variables in research, because the level of measurement
determines the type of statistical analysis that can be conducted, and, therefore, the type
of conclusions that can be drawn from the research.

Nominal Level
A nominal level of measurement uses symbols to classify observations into categories
that must be both mutually exclusive and exhaustive. Exhaustive means that there must
be enough categories that all the observations will fall into some category. Mutually
exclusive means that the categories must be distinct enough that no observations will fall
into more than one category. This is the most basic level of measurement; it is essentially
labeling. It can only establish whether two observations are alike or different, for
example, sorting a deck of cards into two piles: red cards and black cards.

In a survey of boaters, one variable of interest was place of residence. It was measured
by a question on a questionnaire asking for the zip code of the boater's principal place of
residence. The observations were divided into zip code categories. These categories are
mutually exclusive and exhaustive: every respondent lives in some zip code category
(exhaustive), and no boater lives in more than one zip code category (mutually exclusive).
Similarly, the sex of the boater was determined by a question on the
questionnaire. Observations were sorted into two mutually exclusive and exhaustive
categories, male and female. Observations could be labeled with the letters M and F, or
the numerals 0 and 1.

The variable of marital status may be measured by two categories, married and
unmarried. But these must each be defined so that all possible observations will fit into
one category but no more than one: legally married, common-law marriage, religious
marriage, civil marriage, living together, never married, divorced, informally separated,
legally separated, widowed, abandoned, annulled, etc.
In nominal measurement, all observations in one category are alike on some property,
and they differ from the objects in the other category (or categories) on that property
(e.g., zip code, sex). There is no ordering of categories (no category is better or worse, or
more or less than another).

Ordinal Level
An ordinal level of measurement uses symbols to classify observations into categories
that are not only mutually exclusive and exhaustive; in addition, the categories have some
explicit relationship among them.

For example, observations may be classified into categories such as taller and shorter,
greater and lesser, faster and slower, harder and easier, and so forth. However, each
observation must still fall into one of the categories (the categories are exhaustive) but
no more than one (the categories are mutually exclusive). Meats are categorized as
regular, choice, or prime; the military uses ranks to distinguish categories of soldiers.

Most of the commonly used questions which ask about job satisfaction use the ordinal
level of measurement. For example, asking whether one is very satisfied, satisfied,
neutral, dissatisfied, or very dissatisfied with one's job is using an ordinal scale of
measurement.

Interval Level
An interval level of measurement classifies observations into categories that are not only
mutually exclusive and exhaustive, and have some explicit relationship among them, but
the relationship between the categories is known and exact. This is the first quantitative
application of numbers.

In the interval level, a common and constant unit of measurement has been established
between the categories. For example, the commonly used measures of temperature are
interval level scales. We know that a temperature of 75 degrees is one degree warmer
than a temperature of 74 degrees, just as a temperature of 42 degrees is one degree
warmer than a temperature of 41 degrees.

Numbers may be assigned to the observations because the relationship between the
categories is assumed to be the same as the relationship between numbers in the number
system. For example, 74+1=75 and 41+1=42.

The intervals between categories are equal, but they originate from an arbitrary
origin; that is, there is no meaningful zero point on an interval scale.

Ratio Level
The ratio level of measurement is the same as the interval level, with the addition of a
meaningful zero point. There is a meaningful and non-arbitrary zero point from which
the equal intervals between categories originate.
For example, weight, area, speed, and velocity are measured on a ratio level scale. In
public policy and administration, budgets and the number of program participants are
measured on ratio scales.

In many cases, interval and ratio scales are treated alike in terms of the statistical tests
that are applied.

Variables measured at a higher level can always be converted to a lower level, but not
vice versa. For example, observations of actual age (ratio scale) can be converted to
categories of older and younger (ordinal scale), but age measured as simply older or
younger cannot be converted to measures of actual age.
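As a quick illustration of this one-way conversion, here is a minimal Python sketch (not
from the original notes; the ages and the cut-off of 40 are hypothetical values chosen
for the example):

    # Ratio-level data: actual ages, with a meaningful zero and equal intervals.
    ages = [23, 67, 41, 35, 58]

    # Downward conversion is always possible: ratio -> ordinal.
    categories = ["younger" if a < 40 else "older" for a in ages]
    print(categories)   # ['younger', 'older', 'older', 'younger', 'older']

    # The reverse is not possible: from "older"/"younger" alone, the actual
    # ages cannot be recovered, which is why conversion only works downward.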
Questionnaires & Instruments:
A questionnaire is a research tool featuring a series of questions used to collect useful
information from respondents. These instruments include either written or oral
questions and comprise an interview-style format. Questionnaires may be qualitative or
quantitative and can be conducted online, by phone, on paper or face-to-face, and
questions don’t necessarily have to be administered with a researcher present.
Questionnaires feature either open or closed questions and sometimes employ a mixture
of both. Open-ended questions enable respondents to answer in their own words in as
much or as little detail as they desire. Closed questions provide respondents with a series
of predetermined responses they can choose from.

Is a Questionnaire Just Another Word for “Survey”?


While the two terms seem synonymous, they are not quite the same. A questionnaire is
a set of questions created for the purpose of gathering information; that information may
not be used for a survey. However, all surveys do require questionnaires. If you are using
a questionnaire for survey sampling, it’s important to ensure that it is designed to gather
the most accurate answers from respondents.

Why Are Questionnaires Effective in Research?


Questionnaires are popular research methods because they offer a fast, efficient and
inexpensive means of gathering large amounts of information from sizeable sample
volumes. These tools are particularly effective for measuring subject behavior,
preferences, intentions, attitudes and opinions. Their use of open and closed research
questions enables researchers to obtain both qualitative and quantitative data, resulting
in more comprehensive results.

Advantages of Questionnaires
Some of the many benefits of using questionnaires as a research tool include:
 Practicality: Questionnaires enable researchers to strategically manage their
target audience, questions and format while gathering large data quantities on any
subject.
 Cost-efficiency: You don’t need to hire surveyors to deliver your survey questions
— instead, you can place them on your website or email them to respondents at
little to no cost.
 Speed: You can gather survey results quickly and effortlessly using mobile tools,
obtaining responses and insights in 24 hours or less.
 Comparability: Researchers can use the same questionnaire yearly and compare
and contrast research results to gain valuable insights and minimize translation
errors.
 Scalability: Questionnaires are highly scalable, allowing researchers to distribute
them to demographics anywhere across the globe.
 Standardization: You can standardize your questionnaire with as many
questions as you want about any topic.
 Respondent comfort: When taking a questionnaire, respondents are completely
anonymous and not subject to stressful time constraints, helping them feel relaxed
and encouraging them to provide truthful responses.
 Easy analysis: Questionnaires often have built-in tools that automate analyses,
making it fast and easy to interpret your results.
Disadvantages of Questionnaires
Questionnaires also have their disadvantages, such as:
 Answer dishonesty: Respondents may not always be completely truthful with
their answers — some may have hidden agendas, while others may answer how
they think society would deem most acceptable.
 Question skipping: Make sure to require answers for all your survey questions.
Otherwise, you may run the risk of respondents leaving questions unanswered.
 Interpretation difficulties: If a question isn’t straightforward enough,
respondents may struggle to interpret it accurately. That’s why it’s important to
state questions clearly and concisely, with explanations when necessary.
 Survey fatigue: Respondents may experience survey fatigue if they receive too
many surveys or a questionnaire is too long.
 Analysis challenges: Though closed questions are easy to analyze, open
questions require a human to review and interpret them. Try limiting open-ended
questions in your survey to gain more quantifiable data you can evaluate and
utilize more quickly.
 Unconscientious responses: If respondents don’t read your questions
thoroughly or completely, they may offer inaccurate answers that can impact data
validity. You can minimize this risk by making questions as short and simple as
possible.
Types of Questionnaires in Research
There are various types of questionnaires in survey research, including:
 Postal: Postal questionnaires are paper surveys that participants receive through
the mail. Once respondents complete the survey, they mail them back to the
organization that sent them.
 In-house: In this type of questionnaire, researchers visit respondents in their
homes or workplaces and administer the survey in person.
 Telephone: With telephone surveys, researchers call respondents and conduct
the questionnaire over the phone.
 Electronic: Perhaps the most common type of questionnaire, electronic surveys
are presented via email or through a different online medium.

What are Research Instruments?


A research instrument is a tool used to collect, measure, and analyze data related
to your research topic. Research instruments can be tests, surveys, scales,
questionnaires, or even checklists.

Decide which instrument to use based on the type of study you are conducting:
quantitative, qualitative, or mixed-method. For instance, for a quantitative study you may
decide to use a questionnaire, and for a qualitative study you may choose to use a scale.

While it helps to use an established instrument, as its efficacy is already established, you
may if needed use a new instrument or even create your own instrument.

What are the Different Types of Interview Research Instruments?


The general format of an interview is one in which the interviewer asks the interviewee
a set of questions, normally asked and answered verbally. There are several different
types of interview research instruments.
1. A structured interview may be used, in which a specific number of questions are
formally asked of the interviewee and their responses recorded using a systematic
and standard methodology.
2. An unstructured interview on the other hand may still be based on the same
general theme of questions but here the person asking the questions (the
interviewer) may change the order the questions are asked in and the specific way
in which they’re asked.
3. A focus interview is one in which the interviewer will adapt their line or content
of questioning based on the responses from the interviewee.
4. A focus group interview is one in which a group of volunteers or interviewees are
asked questions to understand their opinion or thoughts on a specific subject.
5. A non-directive interview is one in which there are no specific questions agreed
upon but instead the format is open-ended and more reactionary in the discussion
between interviewer and interviewee.

What is sampling?
Sampling is a technique of selecting individual members or a subset of the population to
make statistical inferences from them and estimate characteristics of the whole
population. Different sampling methods are widely used by researchers in market
research so that they do not need to research the entire population to collect actionable
insights.

It is also a time-convenient and a cost-effective method and hence forms the basis of
any research design. Sampling techniques can be used in a research survey software for
optimum derivation.
For example, if a drug manufacturer would like to research the adverse side effects of a
drug on the country’s population, it is almost impossible to conduct a research study that
involves everyone. In this case, the researcher decides on a sample of people from
each demographic and then researches them, giving him/her indicative feedback on the
drug's behavior.

Types of sampling: sampling methods


Sampling in market research is of two types – probability sampling and non-probability
sampling. Let’s take a closer look at these two methods of sampling.
1. Probability sampling: Probability sampling is a sampling technique where a
researcher sets a selection of a few criteria and chooses members of a
population randomly. All the members have an equal opportunity to be a part
of the sample with this selection parameter.
2. Non-probability sampling: In non-probability sampling, the researcher
chooses members for research arbitrarily, based on subjective judgment
rather than random selection. This sampling method is not a fixed or
predefined selection process, which makes it difficult for all elements of a
population to have equal opportunities to be included in a sample.

Below, we discuss the various probability and non-probability sampling methods
that you can implement in any market research study.

Types of probability sampling with examples:


Probability sampling is a sampling technique in which researchers choose samples from
a larger population using a method based on the theory of probability. This sampling
method considers every member of the population and forms samples based on a fixed
process.

For example, in a population of 1000 members, every member will have a 1/1000
chance of being selected to be a part of a sample. Probability sampling eliminates bias in
the population and gives all members a fair chance to be included in the sample.

There are four types of probability sampling techniques:


 Simple random sampling: One of the best probability sampling techniques that
helps in saving time and resources is the simple random sampling method. It
is a reliable method of obtaining information where every single member of a
population is chosen randomly, merely by chance. Each individual has the same
probability of being chosen to be a part of a sample.
For example, in an organization of 500 employees, if the HR team decides on
conducting team-building activities, it is highly likely that they would prefer
picking chits out of a bowl. In this case, each of the 500 employees has an equal
opportunity of being selected.
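To make this concrete, here is a minimal Python sketch (an illustration, not part of the
original notes; the employee IDs and the sample size of 50 are hypothetical):

    import random

    # Simple random sampling: every member has the same chance of selection.
    population = [f"employee_{i}" for i in range(1, 501)]
    sample = random.sample(population, k=50)   # draw 50 members without replacement
    print(sample[:5])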

 Cluster sampling: Cluster sampling is a method where the researchers divide the
entire population into sections or clusters that represent a population. Clusters
are identified and included in a sample based on demographic parameters like
age, sex, location, etc. This makes it very simple for a survey creator to derive
effective inference from the feedback.

For example, if the United States government wishes to evaluate the number of
immigrants living in the Mainland US, they can divide it into clusters based on
states such as California, Texas, Florida, Massachusetts, Colorado, Hawaii, etc.
This way of conducting a survey will be more effective as the results will be
organized into states and provide insightful immigration data.
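A hedged sketch of the same idea, assuming whole states are selected at random and
everyone in the chosen states is surveyed (the state names and resident counts are
hypothetical stand-ins):

    import random

    # Cluster sampling: randomly pick whole clusters, then include every
    # member of the chosen clusters in the sample.
    clusters = {
        "California": [f"CA_resident_{i}" for i in range(100)],
        "Texas":      [f"TX_resident_{i}" for i in range(80)],
        "Florida":    [f"FL_resident_{i}" for i in range(60)],
        "Colorado":   [f"CO_resident_{i}" for i in range(40)],
    }
    chosen_states = random.sample(list(clusters), k=2)   # 2 clusters at random
    sample = [person for state in chosen_states for person in clusters[state]]
    print(chosen_states, len(sample))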

 Systematic sampling: Researchers use the systematic sampling method to
choose the sample members of a population at regular intervals. It requires
selecting a starting point for the sample and a fixed interval at which members
are repeatedly chosen. Because this type of sampling method has a predefined
range, it is the least time-consuming.
For example, a researcher intends to collect a systematic sample of 500 people
in a population of 5000. He/she numbers each element of the population from
1-5000 and will choose every 10th individual to be a part of the sample (Total
population/ Sample Size = 5000/500 = 10).
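A minimal sketch of that 5000-to-500 example (illustrative only; the random starting
point is an assumption, since the text does not specify one):

    import random

    # Systematic sampling: pick a starting point, then every k-th element.
    # Population of 5000 and sample of 500 mirror the example above (k = 10).
    population = list(range(1, 5001))
    k = len(population) // 500           # sampling interval: 5000 / 500 = 10
    start = random.randrange(k)          # random start within the first interval
    sample = population[start::k]        # every 10th element from the start
    print(len(sample), sample[:5])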

 Stratified random sampling: Stratified random sampling is a method in which
the researcher divides the population into smaller groups that don't overlap but
together represent the entire population. While sampling, these groups can be
organized, and a sample can then be drawn from each group separately.
For example, a researcher looking to analyze the characteristics of people
belonging to different annual income divisions will create strata (groups)
according to the annual family income. Eg – less than $20,000, $21,000 –
$30,000, $31,000 to $40,000, $41,000 to $50,000, etc. By doing this, the
researcher concludes the characteristics of people belonging to different
income groups. Marketers can analyze which income groups to target and which
ones to eliminate to create a roadmap that would bear fruitful results.
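A hedged sketch of the income-band example (the band labels and member counts are
hypothetical; the 10% sampling fraction is an assumption made for the illustration):

    import random

    # Stratified random sampling: split the population into non-overlapping
    # strata, then draw a simple random sample from each stratum separately.
    strata = {
        "under_20k": [f"p{i}" for i in range(200)],
        "21k_30k":   [f"p{i}" for i in range(200, 350)],
        "31k_40k":   [f"p{i}" for i in range(350, 450)],
        "41k_50k":   [f"p{i}" for i in range(450, 500)],
    }
    sample = []
    for band, members in strata.items():
        # take 10% of each stratum so every income group is represented
        sample.extend(random.sample(members, k=max(1, len(members) // 10)))
    print(len(sample))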
Uses of probability sampling
There are multiple uses of probability sampling:
 Reduce Sample Bias: Using the probability sampling method, the bias in the
sample derived from a population is negligible to non-existent, because the
selection of the sample does not depend on the understanding or inference of the
researcher. Probability sampling leads to higher quality data collection as the
sample appropriately represents the population.

 Diverse Population: When the population is vast and diverse, it is essential to
have adequate representation so that the data is not skewed towards
one demographic. For example, if Square would like to understand the people
who could make use of their point-of-sale devices, a survey conducted from a
sample of people across the US from different industries and socio-economic
backgrounds helps.

 Create an Accurate Sample: Probability sampling helps the researchers plan and
create an accurate sample. This helps to obtain well-defined data.

Types of non-probability sampling with examples


The non-probability method is a sampling method that involves a collection of feedback
based on a researcher or statistician's sample selection capabilities and not on a fixed
selection process. In most situations, the output of a survey conducted with a
non-probability sample leads to skewed results, which may not represent the desired
target population. But there are situations, such as the preliminary stages of research or
cost constraints for conducting research, where non-probability sampling will be much
more useful than the other type.

Four types of non-probability sampling explain the purpose of this sampling method in a
better manner:
 Convenience sampling: This method is dependent on the ease of access to
subjects, such as surveying customers at a mall or passers-by on a busy street. It
is termed convenience sampling because of the researcher's ease of carrying it
out and getting in touch with the subjects. Researchers have nearly no authority
to select the sample elements, and the selection is purely done based on
proximity and not representativeness. This non-probability sampling method is
used when there are time and cost limitations in collecting feedback, such as in
the initial stages of research when resources are limited.
For example, startups and NGOs usually conduct convenience sampling at a mall
to distribute leaflets of upcoming events or promotion of a cause – they do that
by standing at the mall entrance and giving out pamphlets randomly.

 Judgmental or purposive sampling: Judgmental or purposive samples are
formed at the discretion of the researcher. Researchers purely consider the
purpose of the study, along with the understanding of the target audience. For
instance, when researchers want to understand the thought process of people
interested in studying for their master's degree, the selection criterion will be:
"Are you interested in doing your masters in …?", and those who respond with a
"No" are excluded from the sample.

 Snowball sampling: Snowball sampling is a sampling method that researchers
apply when the subjects are difficult to trace. For example, it will be extremely
challenging to survey shelterless people or illegal immigrants. In such cases,
using the snowball technique, researchers can track down a few members of the
target category to interview and derive results. Researchers also implement this
sampling method in situations where the topic is highly sensitive and not openly
discussed, for example, surveys to gather information about HIV/AIDS. Not many
victims will readily respond to the questions, but researchers can contact people
they might know or volunteers associated with the cause to get in touch with the
victims and collect information.

 Quota sampling: In quota sampling, members are selected based on a pre-set
standard. In this case, as a sample is formed based on specific attributes, the
created sample will have the same qualities found in the total population. It is a
rapid method of collecting samples.
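A minimal sketch of quota sampling under stated assumptions (a single hypothetical
attribute, sex, with a quota of 25 per group; respondents are accepted in arrival order,
which is precisely what makes the method non-random):

    import random

    # Quota sampling: accept arriving respondents until each quota is full.
    quotas = {"M": 25, "F": 25}
    arrivals = [(f"resp_{i}", random.choice(["M", "F"])) for i in range(200)]
    sample = []
    for name, sex in arrivals:
        # non-random: whoever turns up first fills the quota for their group
        if sum(1 for _, s in sample if s == sex) < quotas[sex]:
            sample.append((name, sex))
    print(len(sample))   # 50 once both quotas are filled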

Uses of non-probability sampling


Non-probability sampling is used for the following:
 Create a hypothesis: Researchers use the non-probability sampling method to
create an assumption when little to no prior information is available. This
method helps with the immediate return of data and builds a base for further
research.

 Exploratory research: Researchers use this sampling technique widely when
conducting qualitative research, pilot studies, or exploratory research.


 Budget and time constraints: The non-probability method is used when there are
budget and time constraints, and some preliminary data must be collected.
Since the survey design is not rigid, it is easier to pick respondents at random
and have them take the survey or questionnaire.

How do you decide on the type of sampling to use?


For any research, it is essential to choose a sampling method accurately to meet the goals
of your study. The effectiveness of your sampling relies on various factors. Here are some
steps expert researchers follow to decide the best sampling method.
 Jot down the research goals. Generally, it must be a combination of cost, precision,
or accuracy.
 Identify the effective sampling techniques that might potentially achieve the
research goals.
 Test each of these methods and examine whether they help in achieving your goal.
 Select the method that works best for the research.

Difference between probability sampling and non-probability sampling methods


We have looked at the different types of sampling methods above and their subtypes. To
encapsulate the whole discussion, though, the significant differences between probability
sampling methods and non-probability sampling methods are as below:

In each comparison below, the probability sampling characteristic is given first,
followed by the non-probability sampling characteristic.

 Definition: Probability sampling is a sampling technique in which samples from a
larger population are chosen using a method based on the theory of probability.
Non-probability sampling is a sampling technique in which the researcher selects
samples based on the researcher's subjective judgment rather than random selection.
 Alternatively known as: the random sampling method. The non-random sampling
method.
 Population selection: The population is selected randomly. The population is
selected arbitrarily.
 Nature: The research is conclusive. The research is exploratory.
 Sample: Since there is a method for deciding the sample, the population
demographics are conclusively represented. Since the sampling method is arbitrary,
the population demographics representation is almost always skewed.
 Time taken: Takes longer to conduct, since the research design defines the selection
parameters before the market research study begins. Quick, since neither the sample
nor the selection criteria of the sample are defined in advance.
 Results: This type of sampling is entirely unbiased, and hence the results are
unbiased too and conclusive. This type of sampling is entirely biased, and hence the
results are biased too, rendering the research speculative.
 Hypothesis: In probability sampling, there is an underlying hypothesis before the
study begins, and the objective of the method is to prove the hypothesis. In
non-probability sampling, the hypothesis is derived after conducting the research
study.

Data Preparation Steps


The specifics of the data preparation process vary by industry, organization and need, but
the framework remains largely the same.
1. Gather data
The data preparation process begins with finding the right data. This can come from an
existing data catalog or can be added ad-hoc.
2. Discover and assess data
After collecting the data, it is important to discover each dataset. This step is about
getting to know the data and understanding what has to be done before the data becomes
useful in a particular context.
Discovery is a big task, but Talend’s data preparation platform offers visualization tools
which help users profile and browse their data.
3. Cleanse and validate data
Cleaning up the data is traditionally the most time-consuming part of the data preparation
process, but it's crucial for removing faulty data and filling in gaps. Important tasks here
include:
 Removing extraneous data and outliers.
 Filling in missing values.
 Conforming data to a standardized pattern.
 Masking private or sensitive data entries.
Once data has been cleansed, it must be validated by testing for errors in the data
preparation process up to this point. Oftentimes, an error in the system will become
apparent during this step and will need to be resolved before moving forward. A brief
sketch of cleansing and validation follows.
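Here is a hedged sketch of these cleansing and validation tasks, assuming pandas is
available (the column names, the valid age range, and the sentinel outlier are all
hypothetical values chosen for the example):

    import pandas as pd

    # Cleanse: remove outliers and fill missing values.
    df = pd.DataFrame({
        "age":    [25, 31, None, 29, 999],           # 999 is a faulty outlier
        "income": [42000, 51000, 39000, None, 47000],
    })
    df = df[df["age"].between(0, 120) | df["age"].isna()]  # drop the outlier row
    df["age"] = df["age"].fillna(df["age"].median())       # fill missing ages
    df["income"] = df["income"].fillna(df["income"].mean())

    # Validate: test that cleansing left no gaps or out-of-range values.
    assert df.isna().sum().sum() == 0
    assert df["age"].between(0, 120).all()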
4. Transform and enrich data
Transforming data is the process of updating the format or value entries in order to reach
a well-defined outcome, or to make the data more easily understood by a wider
audience. Enriching data refers to adding and connecting data with other related
information to provide deeper insights.
5. Store data
Once prepared, the data can be stored or channeled into a third party application—such
as a business intelligence tool—clearing the way for processing and analysis to take place.
What is Data Exploration?
Data exploration definition: Data exploration refers to the initial step in data analysis in
which data analysts use data visualization and statistical techniques to describe dataset
characterizations, such as size, quantity, and accuracy, in order to better understand the
nature of the data.

Data exploration techniques include both manual analysis and automated data
exploration software solutions that visually explore and identify relationships between
different data variables, the structure of the dataset, the presence of outliers, and the
distribution of data values in order to reveal patterns and points of interest, enabling data
analysts to gain greater insight into the raw data.

Data is often gathered in large, unstructured volumes from various sources and data
analysts must first understand and develop a comprehensive view of the data before
extracting relevant data for further analysis, such as univariate, bivariate, multivariate,
and principal components analysis.

Data Exploration Tools
Manual data exploration methods entail either writing scripts to analyze raw data or
manually filtering data into spreadsheets. Automated data exploration tools, such as data
visualization software, help data scientists easily monitor data sources and perform big
data exploration on otherwise overwhelmingly large datasets. Graphical displays of data,
such as bar charts and scatter plots, are valuable tools in visual data exploration.

A popular tool for manual data exploration is Microsoft Excel spreadsheets, which can be
used to create basic charts for data exploration, to view raw data, and to identify the
correlation between variables. To identify the correlation between two continuous
variables in Excel, use the function CORREL() to return the correlation. To identify the
correlation between two categorical variables in Excel, the two-way table method, the
stacked column chart method, and the chi-square test are effective.
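An analogous check can be done outside Excel; here is a minimal Python sketch,
assuming pandas and scipy are installed (the variables and values are hypothetical):

    import pandas as pd
    from scipy.stats import chi2_contingency

    df = pd.DataFrame({
        "hours_studied": [2, 4, 6, 8, 10],
        "test_score":    [55, 60, 70, 78, 90],
        "sex":           ["M", "F", "F", "M", "F"],
        "passed":        ["no", "no", "yes", "yes", "yes"],
    })

    # Two continuous variables: Pearson correlation (the CORREL() analogue).
    print(df["hours_studied"].corr(df["test_score"]))

    # Two categorical variables: chi-square test on a two-way table.
    table = pd.crosstab(df["sex"], df["passed"])
    chi2, p, dof, expected = chi2_contingency(table)
    print(chi2, p)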

There is a wide variety of proprietary automated data exploration solutions,
including business intelligence tools, data visualization software, data preparation
software vendors, and data exploration platforms. There are also open source data
exploration tools that include regression capabilities and visualization features, which
can help businesses integrate diverse data sources to enable faster data exploration. Most
data analytics software includes data visualization tools.
Why is Data Exploration Important?

Humans process visual data better than numerical data; therefore, it is extremely
challenging for data scientists and data analysts to assign meaning to thousands of rows
and columns of data points and communicate that meaning without any visual
components.

Data visualization in data exploration leverages familiar visual cues such as shapes,
dimensions, colors, lines, points, and angles so that data analysts can effectively visualize
and define the metadata, and then perform data cleansing. Performing the initial step of
data exploration enables data analysts to better understand and visually identify
anomalies and relationships that might otherwise go undetected.

What is Data Preparation?


Data preparation is the process of cleaning and transforming raw data prior to processing
and analysis. It is an important step prior to processing and often involves reformatting
data, making corrections to data, and combining data sets to enrich the data.

Data preparation is often a lengthy undertaking for data professionals or business users,
but it is essential as a prerequisite to put data in context in order to turn it into insights
and eliminate bias resulting from poor data quality.

For example, the data preparation process usually includes standardizing data formats,
enriching source data, and/or removing outliers.

Benefits of Data Preparation + The Cloud


76% of data scientists say that data preparation is the worst part of their job, but
efficient, accurate business decisions can only be made with clean data. Data preparation
helps:
 Fix errors quickly — Data preparation helps catch errors before
processing. After data has been removed from its original source, these
errors become more difficult to understand and correct.
 Produce top-quality data — Cleaning and reformatting datasets ensures
that all data used in analysis will be high quality.
 Make better business decisions — Higher quality data that can be
processed and analyzed more quickly and efficiently leads to more timely,
efficient and high-quality business decisions.

Additionally, as data and data processes move to the cloud, data preparation moves with
it for even greater benefits, such as:

 Superior scalability — Cloud data preparation can grow at the pace of the
business. Enterprises don't have to worry about the underlying
infrastructure or try to anticipate its evolution.

 Future proof — Cloud data preparation upgrades automatically so that new
capabilities or problem fixes can be turned on as soon as they are released.
This allows organizations to stay ahead of the innovation curve without
delays and added costs.

 Accelerated data usage and collaboration — Doing data prep in the cloud
means it is always on, doesn’t require any technical installation, and lets
teams collaborate on the work for faster results.

What Is Data Analysis?


Although many groups, organizations, and experts have different ways to approach data
analysis, most of them can be distilled into a one-size-fits-all definition. Data analysis is
the process of cleaning, changing, and processing raw data, and extracting actionable,
relevant information that helps businesses make informed decisions. The procedure
helps reduce the risks inherent in decision-making by providing useful insights and
statistics, often presented in charts, images, tables, and graphs.
It’s not uncommon to hear the term “big data” brought up in discussions about data
analysis. Data analysis plays a crucial role in processing big data into useful information.
Neophyte data analysts who want to dig deeper by revisiting big data fundamentals
should go back to the basic question, “What is data?”

Why is Data Analysis Important?


Here is a list of reasons why data analysis is such a crucial part of doing business today.
 Better Customer Targeting: You don’t want to waste your business’s precious
time, resources, and money putting together advertising campaigns targeted at
demographic groups that have little to no interest in the goods and services you
offer. Data analysis helps you see where you should be focusing your
advertising efforts.

 You Will Know Your Target Customers Better: Data analysis tracks how well
your products and campaigns are performing within your target demographic.
Through data analysis, your business can get a better idea of your target
audience’s spending habits, disposable income, and most likely areas of
interest. This data helps businesses set prices, determine the length of ad
campaigns, and even project the quantity of goods needed.

 Reduce Operational Costs: Data analysis shows you which areas in your
business need more resources and money, and which areas are not producing
and thus should be scaled back or eliminated outright.

 Better Problem-Solving Methods: Informed decisions are more likely to be
successful decisions. Data provides businesses with information. You can see
where this progression is leading. Data analysis helps businesses make the
right choices and avoid costly pitfalls.

 You Get More Accurate Data: If you want to make informed decisions, you need
data, but there’s more to it. The data in question must be accurate. Data analysis
helps businesses acquire relevant, accurate information, suitable for
developing future marketing strategies, business plans, and realigning the
company’s vision or mission.

What Is the Data Analysis Process?


Answering the question "what is data analysis" is only the first step. Now we will look at
how it's performed. The data analysis process, or alternatively, the data analysis steps,
involves gathering all the information, processing it, exploring the data, and using it to
find patterns and other insights. The process consists of:

 Data Requirement Gathering: Ask yourself why you’re doing this analysis, what
type of data analysis you want to use, and what data you are planning on
analyzing.

 Data Collection: Guided by the requirements you've identified, it's time to
collect the data from your sources. Sources include case studies, surveys,
interviews, questionnaires, direct observation, and focus groups. Make sure to
organize the collected data for analysis.

 Data Cleaning: Not all of the data you collect will be useful, so it's time to clean
it up. This process is where you remove white spaces, duplicate records, and
basic errors. Data cleaning is mandatory before sending the information on for
analysis; a short sketch of this step appears after this list.

 Data Analysis: Here is where you use data analysis software and other tools to
help you interpret and understand the data and arrive at conclusions. Data
analysis tools include Excel, Python, R, Looker, Rapid Miner, Chartio, Metabase,
Redash, and Microsoft Power BI.

 Data Interpretation: Now that you have your results, you need to interpret
them and come up with the best courses of action, based on your findings.

 Data Visualization: Data visualization is a fancy way of saying, "graphically
show your information in a way that people can read and understand it." You
can use charts, graphs, maps, bullet points, or a host of other methods.
Visualization helps you derive valuable insights by helping you compare
datasets and observe relationships.
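As promised above, here is a minimal end-to-end sketch of the cleaning and analysis
steps, assuming pandas (and matplotlib for the final plot); the survey data is
hypothetical:

    import pandas as pd

    raw = pd.DataFrame({
        "respondent": [" A1 ", "A2", "A2", "B7"],
        "rating":     [4, 5, 5, 3],
    })

    # Data cleaning: remove white spaces and duplicate records.
    raw["respondent"] = raw["respondent"].str.strip()
    clean = raw.drop_duplicates()

    # Data analysis: summarize the cleaned responses.
    print(clean["rating"].describe())

    # Data visualization: a bar chart of ratings (requires matplotlib).
    clean.plot.bar(x="respondent", y="rating")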

What Is the Importance of Data Analysis in Research?


A huge part of a researcher’s job is to sift through data. That is literally the definition of
“research.” However, today’s Information Age routinely produces a tidal wave of data,
enough to overwhelm even the most dedicated researcher.

Data analysis, therefore, plays a key role in distilling this information into a more accurate
and relevant form, making it easier for researchers to do their job.

Data analysis also provides researchers with a vast selection of different tools, such as
descriptive statistics, inferential analysis, and quantitative analysis.
So, to sum it up, data analysis offers researchers better data and better ways to analyze
and study said data.

What is Data Analysis: Types of Data Analysis


There are a half-dozen popular types of data analysis available today, commonly
employed in the worlds of technology and business. They are:
 Diagnostic Analysis: Diagnostic analysis answers the question, "Why did this
happen?" Using insights gained from statistical analysis (more on that later!),
analysts use diagnostic analysis to identify patterns in data. Ideally, the analysts
find similar patterns that existed in the past and can use those solutions to
resolve the present challenges.

 Predictive Analysis: Predictive analysis answers the question, "What is most
likely to happen?" By using patterns found in older data as well as current
events, analysts predict future events. While there's no such thing as 100
percent accurate forecasting, the odds improve if the analysts have plenty of
detailed information and the discipline to research it thoroughly.

 Prescriptive Analysis: Mix all the insights gained from the other data analysis
types, and you have prescriptive analysis. Sometimes, an issue can’t be solved
solely with one analysis type, and instead requires multiple insights.

 Statistical Analysis: Statistical analysis answers the question, "What
happened?" This analysis covers data collection, analysis, modeling,
interpretation, and presentation using dashboards. Statistical analysis
breaks down into two sub-categories (a numeric sketch of both appears after
this list):

1. Descriptive: Descriptive analysis works with either complete or selections of
summarized numerical data. It illustrates means and deviations in continuous
data, and percentages and frequencies in categorical data.

2. Inferential: Inferential analysis works with samples derived from complete
data. An analyst can arrive at different conclusions from the same
comprehensive data set just by choosing different samplings.

 Text Analysis: Also called “data mining,” text analysis uses databases and data
mining tools to discover patterns residing in large datasets. It transforms raw data
into useful business information. Text analysis is arguably the most
straightforward and the most direct method of data analysis.
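To make the descriptive/inferential distinction concrete, here is a hedged Python sketch
(assumes scipy is installed; the two groups of scores are hypothetical):

    import statistics
    from scipy import stats

    group_a = [72, 85, 78, 90, 66, 81]
    group_b = [64, 70, 75, 68, 72, 61]

    # Descriptive: summarize the data at hand with means and deviations.
    print(statistics.mean(group_a), statistics.stdev(group_a))

    # Inferential: use the samples to draw a conclusion beyond the data,
    # e.g. an independent-samples t-test comparing the two groups.
    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(t_stat, p_value)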
 Displaying data in research is the last step of the research process. It is important to display
data accurately because it helps in presenting the findings of the research effectively to the
reader. The purpose of displaying data in research is to make the findings more visible and
to make comparisons easy. When the researcher presents the research before the research
committee, the committee will easily understand the findings from the displayed data. The
readers of the research will also be able to understand it better. Without displayed data, the
data looks too scattered and the reader cannot make inferences.
 There are basically two ways to display data: tables and graphs. Tabulated data and
graphical representations should both be used to give a more accurate picture of the
research. In quantitative research it is very necessary to display data; in qualitative research,
on the other hand, the researcher decides whether there is a need to display data or not. The
researcher can use appropriate software to help tabulate and display the data in the form of
graphs. Microsoft Excel is one such example; it is a user-friendly program that you can use
to help display the data.

Tables for displaying data in research


 The use of tables to display data is very common in research. Tables are very
effective in presenting a large amount of data. They organize data very well and
make the data very visible. Badly tabulated data also occurs; if you do not have
knowledge of tables and tabulating data, consult a statistician to do this step
effectively.
 Parts of a table
 To tabulate data correctly you should know the parts, or structure, of a table.
There are five parts of a table, namely:
 Title
 The title of the table speaks about the contents of the table. The title should be
concise and precise, with no extra details, and should be written in sentence
case.
 Stub
 The left-most column of the table is called the stub. A stub has a stub-heading
at the top of the column; not all tables have a stub. The stub shows the
subcategories that are listed along the Y-axis.
 Caption
 The caption is the column heading; the variable might have subcategories,
which are captioned. These subcategories are provided on the X-axis, and the
captions are provided at the top of each column.
 Body
The body of the table is the actual part of the table in which the values, results,
and analysis reside.
 Footnotes
 There can be many different types of notes that you may have to provide at the
end of the table. Footnotes are provided just below the table. One common
footnote is the source note, generally provided when the table has been taken
from some other source. Footnotes are also provided to explain some point in
the table. Sometimes only part of the table is taken from a source, and that
should also be mentioned.
 Types of tables
 Tables are the simplest means to display data; they can be categorized into the
following:
 Univariate
 Bivariate
 Polyvariate
 These categories are based on the number of variables that need to be tabulated
in the table. A univariate table has one variable to be tabulated; a bivariate table,
as the name suggests, has two variables to be tabulated; and a polyvariate table
has more than two variables to be tabulated.
 Graphs to display data
 The purpose of displaying data is to make communication easier. Graphs
should be used in displaying data when they can add to the visual appeal of the
data. The researcher should decide whether a table alone is needed or whether
the data should also be presented in the form of a suitable graph.
 Types of graphs
 You can use a suitable graph type depending on the type of data and the variables
involved in the data.
 The histogram
 The histogram is a graph that is widely used for displaying data. A histogram
consists of rectangles that are drawn next to each other on the graph, with no
space in between them. A histogram can be drawn for a single variable as well
as for two or more variables. The height of the bars in the histogram represents
the frequency of each value or interval. It is typically drawn for continuous
variables; for categorical variables, the bar chart described next is used.
 The bar chart
 The bar chart is similar to a histogram except that it is drawn only for categorical
variables. Since it is used for categorical variables, it is drawn with space
between the rectangles.
 The frequency polygon
 A frequency polygon is also very much like a histogram, but the values used to
draw it are the midpoints of the intervals. The height at each midpoint describes
the frequency of that interval. A line is drawn that touches the midpoints at the
highest frequency level on the Y-axis and touches the X-axis at each extreme
end.
 The cumulative frequency polygon
 The cumulative frequency polygon is also a frequency polygon, but it is drawn
using the cumulative frequencies on the Y-axis. The values on the X-axis are the
endpoints of the intervals. The endpoints of the intervals are joined to each
other because the cumulative frequency is always based on the upper limit of
an interval.
 The stem and leaf display
 The stem and leaf display is another easy way to display data. A stem and leaf
display, if rotated 90 degrees, becomes a histogram.
 The pie chart
 The pie chart is a very different way to display data. A pie chart is a circle; since
a circle has 360 degrees, the values are expressed as percentages and the whole
pie or circle represents the whole population. The pie or circle is divided into
slices or sections, and each section represents the magnitude of a category or
sub-category.
 The trend curve
 The trend curve is also called the line diagram. It is drawn by plotting the
midpoints on the X-axis and the frequency of each interval on the Y-axis. The
trend curve is drawn only for a set of data that has been measured on a
continuous, interval, or ratio scale. A trend diagram or line diagram is most
suitable for plotting values that show changes over a period of time.
 The area chart
 The area chart is a variation of the trend curve. In an area chart, the
sub-categories of a variable can be displayed. The categories in the chart are
displayed by shading them with different colors or patterns. For example, if
there are both male and female categories in the dataset, both can be
highlighted in this chart.
 The scattergram
 A scattergram is a very simple way to plot data on a chart. The scattergram is
used for data where a change in one variable is associated with a change in the
other variable. Each pair of values is plotted with the help of dots.
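A hedged sketch of a few of the display types named above, assuming matplotlib is
installed (the ages, categories, and counts are hypothetical example values):

    import matplotlib.pyplot as plt

    ages = [22, 25, 31, 34, 35, 41, 44, 52, 53, 60]
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))

    axes[0].hist(ages, bins=4)               # histogram: continuous data
    axes[0].set_title("Histogram")

    axes[1].bar(["M", "F"], [6, 4])          # bar chart: categorical data
    axes[1].set_title("Bar chart")

    axes[2].pie([60, 40], labels=["M", "F"]) # pie chart: shares of a whole
    axes[2].set_title("Pie chart")

    plt.tight_layout()
    plt.show()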
