Unit - 3 Data Taxonomy
Data are individual pieces of factual information recorded and used for the purpose of analysis.
They are the raw information from which statistics are created. Data analysis is the practice of
working with data to glean useful information, which can then be used to make informed
decisions.
DATA TAXONOMY:
Data taxonomy is the classification of data into categories and sub-categories. It provides
a unified view of the data in an organization and introduces common terminologies and
semantics across multiple systems. Establishing a hierarchy within a set of metadata and
segregating it into categories creates a better understanding of the relationships between data
points.
Levels of Measurements
There are four different scales of measurement, and any variable can be classified as being
measured on one of them. The four types of scales are:
Nominal Scale
Ordinal Scale
Interval Scale
Ratio Scale
Nominal Scale
A nominal scale is the 1st level of measurement scale in which the numbers serve as “tags” or
“labels” to classify or identify the objects. A nominal scale usually deals with non-numeric
variables or with numbers that carry no quantitative value.
Characteristics of Nominal Scale
• A nominal scale variable is classified into two or more categories. In this measurement
mechanism, each answer should fall into exactly one of the categories.
• It is qualitative. The numbers are used here only to identify the objects.
• The numbers don’t define the object characteristics. The only permissible operation on
numbers in the nominal scale is “counting.”
Example:
An example of a nominal scale measurement is given below:
What is your gender?
M- Male
F- Female
Here, the variables are used as tags, and the answer to this question should be either M or F.
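The point that counting is the only permissible operation on nominal data can be sketched in a few lines. This is a minimal illustration, assuming a made-up list of responses; the names `responses`, `counts` and `mode_tag` are hypothetical.

```python
from collections import Counter

# Hypothetical survey answers, coded with the tags M and F.
responses = ["M", "F", "F", "M", "F"]

# Counting how often each tag occurs is meaningful on a nominal scale.
counts = Counter(responses)

# The most frequent tag (the mode) is the only sensible "average" here.
mode_tag, mode_count = counts.most_common(1)[0]

print(counts["M"], counts["F"])  # 2 3
print(mode_tag)                  # F
```

Averaging or sorting the tags would not make sense, because M and F have no numeric value and no order.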
Ordinal Scale
The ordinal scale is the 2nd level of measurement that reports the ordering and ranking of data
without establishing the degree of variation between them. Ordinal represents the “order.”
Ordinal data is known as qualitative data or categorical data. It can be grouped, named and also
ranked.
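As a rough sketch of what "grouped, named and also ranked" means in practice, the example below ranks some made-up ratings and picks the middle one. The category order in `ORDER` and the sample `ratings` are assumptions for illustration only.

```python
# Assumed ordering of the categories, from lowest to highest.
ORDER = ["low", "medium", "high"]

# Hypothetical satisfaction ratings collected from five respondents.
ratings = ["high", "low", "medium", "medium", "high"]

# Map each label to its rank so the labels can be sorted in order.
rank = {label: position for position, label in enumerate(ORDER)}
sorted_ratings = sorted(ratings, key=lambda label: rank[label])

# With an odd number of answers, the median is the middle element.
median_label = sorted_ratings[len(sorted_ratings) // 2]

print(sorted_ratings)  # ['low', 'medium', 'medium', 'high', 'high']
print(median_label)    # medium
```

The ranks only say which label comes first. They do not say that the gap between "low" and "medium" equals the gap between "medium" and "high".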
Characteristics of the Ordinal Scale
Interval Scale
The interval scale is the 3rd level of measurement scale. It is defined as a quantitative
measurement scale in which the difference between two values is meaningful. In other
words, the variables are measured in exact units rather than only relative to one another,
but the zero point on the scale is arbitrary.
Characteristics of Interval Scale:
• The interval scale is quantitative as it can quantify the difference between the values
• It allows calculating the mean and median of the variables
• To understand the difference between two values, you can subtract one from the other
• The interval scale is widely used in statistics as it helps to assign numerical
values to otherwise arbitrary assessments such as feelings, calendar types, etc.
Example:
Likert Scale
Net Promoter Score (NPS)
Bipolar Matrix Table
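To see why an arbitrary zero matters, consider temperature, a standard interval-scale example that is an assumption here rather than one of the examples listed above. The sketch below shows that differences convert consistently between Celsius and Fahrenheit, while ratios do not.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Two hypothetical readings in degrees Celsius.
warm_c, cool_c = 20.0, 10.0
warm_f = celsius_to_fahrenheit(warm_c)  # 68.0
cool_f = celsius_to_fahrenheit(cool_c)  # 50.0

# Differences are meaningful: 10 Celsius degrees is always 18 Fahrenheit degrees.
diff_c = warm_c - cool_c  # 10.0
diff_f = warm_f - cool_f  # 18.0

# Ratios are not: "twice as warm" depends on where the unit places zero.
ratio_c = warm_c / cool_c  # 2.0
ratio_f = warm_f / cool_f  # 1.36
```

Because the two units disagree about the ratio, a statement such as "20 degrees is twice as hot as 10 degrees" has no fixed meaning on an interval scale.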
Ratio Scale
The ratio scale is the 4th level of measurement scale, which is quantitative. It allows
researchers to compare differences or intervals, and its distinguishing feature is that it
has a true origin, or zero point.
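A true zero point is what makes ratios meaningful. The sketch below uses weight as an assumed example: the ratio between two weights is the same in kilograms and in pounds. The conversion factor `KG_TO_LB` is an approximation.

```python
# Approximate conversion factor from kilograms to pounds.
KG_TO_LB = 2.20462

# Two hypothetical weights in kilograms.
heavy_kg, light_kg = 20.0, 10.0

# The ratio in kilograms ...
ratio_kg = heavy_kg / light_kg

# ... matches the ratio in pounds, because both units share the same zero.
ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)

print(ratio_kg)  # 2.0
```

On a ratio scale it is therefore valid to say that one object is twice as heavy as another, whatever the unit.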
Characteristics of Ratio Scale:
1. Structured data –
Structured data is data whose elements are addressable for effective analysis. It has
been organized into a formatted repository that is typically a database. It covers all
data that can be stored in an SQL database in tables with rows and columns. Such data has
relational keys and can easily be mapped into pre-designed fields. Today, structured data
is the most commonly processed kind and the simplest to manage.
Example: Relational data.
2. Semi-Structured data –
Semi-structured data is information that does not reside in a relational database but that
has some organizational properties that make it easier to analyze. With some processing,
it can be stored in a relational database (though this can be very hard for some kinds of
semi-structured data), but the semi-structured form exists to save space. Example: XML data.
3. Unstructured data –
Unstructured data is data that is not organized in a predefined manner or does not
have a predefined data model, so it is not a good fit for a mainstream relational
database. Alternative platforms exist for storing and managing unstructured data. It is
increasingly prevalent in IT systems and is used by organizations in a
variety of business intelligence and analytics applications. Example: Word, PDF, Text,
Media logs.
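The three kinds of data can be contrasted with a small sketch using only the Python standard library. All sample records, names and field names below are made up for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, and every row has the same fields.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
names = [row["name"] for row in rows]

# Semi-structured: tags organize the data, but fields may differ per record.
xml_text = """
<customers>
  <customer id="1"><name>Asha</name><city>Pune</city></customer>
  <customer id="2"><name>Ravi</name></customer>
</customers>
"""
root = ET.fromstring(xml_text.strip())
# findtext returns None when a record has no <city> element.
cities = [customer.findtext("city") for customer in root.findall("customer")]

# Unstructured: free text with no data model; facts must be extracted.
note = "Ravi called from Delhi about his order."
cities_mentioned = re.findall(r"\b(Pune|Delhi)\b", note)

print(names)             # ['Asha', 'Ravi']
print(cities)            # ['Pune', None]
print(cities_mentioned)  # ['Delhi']
```

The structured rows can be read field by field, the XML needs a check for missing fields, and the free text needs a pattern that already knows what to look for.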
What is quantitative data?
Quantitative data refers to any information that can be quantified. If it can be counted or
measured, and given a numerical value, it’s quantitative data. Quantitative data can tell you
“how many,” “how much,” or “how often”—for example, how many people attended last
week’s webinar? How much revenue did the company make in 2019? How often does a certain
customer group use online banking?
What is qualitative data?
Unlike quantitative data, qualitative data cannot be measured or counted. It’s descriptive,
expressed in terms of language rather than numerical values. Researchers will often turn to
qualitative data to answer “Why?” or “How?” questions. For example, if your quantitative data
tells you that a certain website visitor abandoned their shopping cart three times in one week,
you’d probably want to investigate why—and this might involve collecting some form of
qualitative data from the user. Perhaps you want to know how a user feels about a particular
product; again, qualitative data can provide such insights. In this case, you’re not just looking
at numbers; you’re asking the user to tell you, using language, why they did something or how
they feel. Qualitative data also refers to the words or labels used to describe certain
characteristics or traits—for example, describing the sky as blue or labeling a particular ice
cream flavor as vanilla.
What are the main differences between quantitative and qualitative data?
The main differences between quantitative and qualitative data lie in what they tell us, how
they are collected, and how they are analyzed. Let’s summarize the key differences before
exploring each aspect in more detail:
• Quantitative data is countable or measurable, relating to numbers. Qualitative data is
descriptive, relating to language.
• Quantitative data tells us how many, how much, or how often (e.g. “20 people signed
up to our email newsletter last week”). Qualitative data can help us to understand the
“why” or “how” behind certain behaviors, or it can simply describe a certain attribute,
for example, “The postbox is red” or “I signed up to the email newsletter because I’m
really interested in hearing about local events.”
• Quantitative data is fixed and “universal,” while qualitative data is subjective and
dynamic. For example, if something weighs 20 kilograms, that can be considered an
objective fact. However, two people may have very different qualitative accounts of
how they experience a particular event.
• Quantitative data is gathered by measuring and counting. Qualitative data is collected
by interviewing and observing.
• Quantitative data is analyzed using statistical analysis, while qualitative data is
analyzed by grouping it in terms of meaningful categories or themes.
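The difference in how the two kinds of data are analyzed can be sketched as follows. The sign-up numbers, the comments and the keyword-to-theme mapping are all hypothetical, and simple keyword matching stands in for real thematic coding, which is usually done by a human analyst.

```python
import statistics
from collections import defaultdict

# Quantitative: weekly newsletter sign-ups, summarized with a statistic.
signups_per_week = [20, 25, 15, 30]
mean_signups = statistics.mean(signups_per_week)

# Qualitative: free-text answers, grouped into themes.
comments = [
    "I signed up because I'm interested in local events",
    "The newsletter has useful discounts",
    "I like hearing about events nearby",
]
# Assumed mapping from theme name to keywords that signal the theme.
themes = {"events": ["event"], "offers": ["discount", "offer"]}

grouped = defaultdict(list)
for comment in comments:
    for theme, keywords in themes.items():
        if any(keyword in comment.lower() for keyword in keywords):
            grouped[theme].append(comment)

print(mean_signups)            # 22.5
print(len(grouped["events"]))  # 2
print(len(grouped["offers"]))  # 1
```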
Scales of measurement are how variables are defined and categorised. Psychologist Stanley
Stevens developed the four common scales of measurement: nominal, ordinal, interval and
ratio. Each scale of measurement has properties that determine how to properly analyse
the data. The properties evaluated are identity, magnitude, equal intervals and a minimum
value of zero.
Properties of Measurement
• Identity: Identity refers to each value having a unique meaning.
• Magnitude: Magnitude means that the values have an ordered relationship to one
another, so there is a specific order to the variables.
• Equal intervals: Equal intervals mean that adjacent points along the scale are the same
distance apart, so the difference between data points one and two will be the same as the
difference between data points five and six.
• A minimum value of zero: A minimum value of zero means the scale has a true zero
point, where zero indicates the absence of the quantity. Temperature in degrees, for
example, can fall below zero and still have meaning, so it has no true zero. Weight
does: if you weigh nothing, you don’t exist.
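The four properties map onto the four scales cumulatively: each scale has every property of the one before it, plus one more. A minimal sketch of that mapping, with the hypothetical names `SCALE_PROPERTIES` and `has_property`:

```python
# Which measurement properties each scale possesses.
SCALE_PROPERTIES = {
    "nominal": {"identity"},
    "ordinal": {"identity", "magnitude"},
    "interval": {"identity", "magnitude", "equal intervals"},
    "ratio": {"identity", "magnitude", "equal intervals", "true zero"},
}

def has_property(scale, prop):
    """Return True if the given scale has the given measurement property."""
    return prop in SCALE_PROPERTIES[scale]

print(has_property("ordinal", "magnitude"))  # True
print(has_property("interval", "true zero"))  # False
```

A lookup like this can guard an analysis step, for example by refusing to compute a ratio unless the scale has a true zero.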
Quantitative Processing
Quantitative processing describes the relationships in the data. Depending on the sample, there
are different ways to communicate quantitative data.
• Nominal comparison: Sub-categories are individually compared in no particular order.
• Time series: An individual variable is tracked over a period of time, usually represented
in a line chart.
• Ranking: Sub-categories are ranked in order, usually represented in a bar chart.
• Part-to-whole: Sub-categories are represented as a ratio in comparison with the whole,
usually represented in a bar or pie chart.
• Deviation: Sub-categories are compared with a reference point, usually represented in
a bar chart.
• Frequency distribution: Sub-categories are counted in intervals, usually represented in
a histogram.
• Correlation: Two sets of measures are compared to identify if they move in the same or
opposite directions, usually represented in a scatter plot.
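Several of these relationships reduce to short computations before any chart is drawn. The sketch below covers part-to-whole, ranking, frequency distribution and correlation on made-up figures. Every variable name and value is an assumption, and `pearson` is a plain implementation of the Pearson correlation coefficient.

```python
import math
from collections import Counter

# Hypothetical sales per region.
sales = {"north": 40, "south": 25, "east": 35}

# Part-to-whole: each sub-category as a share of the total.
total = sum(sales.values())
shares = {region: amount / total for region, amount in sales.items()}

# Ranking: sub-categories ordered from largest to smallest.
ranking = sorted(sales, key=sales.get, reverse=True)

# Frequency distribution: count values falling into 10-year intervals.
ages = [21, 25, 34, 38, 42, 47, 29]
age_bins = Counter((age // 10) * 10 for age in ages)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Correlation: do two measures move in the same direction (+) or opposite (-)?
ad_spend = [1, 2, 3, 4]
revenue = [2, 4, 6, 8]
r = pearson(ad_spend, revenue)

print(ranking)       # ['north', 'east', 'south']
print(age_bins[20])  # 3
```

A value of `r` near 1 means the two measures rise together, a value near -1 means they move in opposite directions, and a value near 0 means there is no linear relationship.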
FUZZY LOGIC:
FUZZY LOGIC NUMERICAL