
Week #3

Data Types

Artificial Intelligence & Intelligent Systems
Slides by Dr. Rami Ibrahim
Types of Data
• Unstructured data vs. structured data
• Quantitative data vs. qualitative data
• Levels of structured data
Definitions
• Structured data: Data that can be thought of as observations and characteristics. It is usually organized as a table, with observations in rows and characteristics in columns.
• Structured data examples: Excel sheets, CSV files, database tables.
• Semi-structured data: Data that does not follow a tabular structure but contains tags that separate its semantic elements.
• Semi-structured data examples: XML files, HTML files.
• Unstructured data: Data that exists as a free entity and does not follow any standard organizational hierarchy.
• Unstructured data examples: Text files, server logs, images, videos, voice recordings.
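As a rough illustration of the three definitions, the sketch below holds the same made-up coffee shop record in three forms. The shop names, revenue figures, and sentence are hypothetical sample values, not data from the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: rows and columns; every record has the same fields.
csv_text = "name,revenue\nBean There,120\nDaily Grind,95\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: no table, but tags mark the semantic elements.
xml_text = (
    "<shops><shop><name>Bean There</name>"
    "<revenue>120</revenue></shop></shops>"
)
root = ET.fromstring(xml_text)
xml_names = [shop.findtext("name") for shop in root.iter("shop")]

# Unstructured: free text; nothing marks where the name or revenue is.
free_text = "Bean There had a great month and took in about 120 thousand."
words = free_text.split()

print(rows[0]["name"], rows[0]["revenue"])  # fields are addressable by column
print(xml_names)                            # fields are addressable by tag
print(len(words))                           # only raw tokens are available
```

In the first two forms a field can be looked up by column name or tag; in the free-text form the same facts have to be extracted by some additional processing.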
Examples
Real Life Data
• By commonly cited estimates, around 90% of the world’s data is unstructured.
• Classical machine learning algorithms are generally designed to work with structured data, while deep learning algorithms are often used to work with unstructured data.
• Remember the iceberg image from previous classes: the vast majority of data is unstructured and has not been explored yet.
Definitions
• Quantitative data: Data that can be described using numbers, and on which basic mathematical operations, such as addition, are meaningful.
• Qualitative data: Data that cannot be described using numbers and basic mathematics. It is generally described using "natural" categories and language.
Example
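• Let's assume the following coffee shop data:

Coffee Shop Data                      Qualitative/Quantitative
Name of coffee shop                   Qualitative
Revenue (in thousands of dollars)     Quantitative
Postal code                           Qualitative
Average monthly customers             Quantitative
Country of coffee origin              Qualitative

• To determine the type, ask yourself:
1- Is it numeric?
2- Can we perform meaningful mathematical operations on it?
• If the answer to both questions is yes, the data is quantitative; otherwise it is qualitative. A postal code may be written with digits, but adding or averaging postal codes is meaningless, so it is qualitative.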
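The two questions can be sketched as a small function. The function name `classify` and the yes/no answers recorded for each column are illustrative assumptions; in particular, the postal code is assumed here to be written with digits only.

```python
def classify(is_numeric: bool, arithmetic_makes_sense: bool) -> str:
    """Quantitative only if the data is numeric AND arithmetic is meaningful."""
    if is_numeric and arithmetic_makes_sense:
        return "Quantitative"
    return "Qualitative"


# (is it numeric?, do mathematical operations make sense?) for each column
coffee_shop_columns = {
    "Name of coffee shop": (False, False),
    "Revenue (in thousands of dollars)": (True, True),
    "Postal code": (True, False),  # digits, but averaging them means nothing
    "Average monthly customers": (True, True),
    "Country of coffee origin": (False, False),
}

labels = {col: classify(*answers) for col, answers in coffee_shop_columns.items()}
for col, label in labels.items():
    print(f"{col}: {label}")
```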
Example
Quantitative Types
• Discrete data: The data is counted.
• Discrete data examples: A die roll, because it can only take on six values, and the number of customers in a café, because you cannot have a fractional number of people.
• Continuous data: The data is measured.
• Continuous data examples: The height of a person or a building, because an infinite range of decimal values is possible. Time and temperature are also continuous.
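A minimal sketch of the difference, using simulated values. The seed, the sample size, and the 150 to 200 cm height range are arbitrary assumptions made for the illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Discrete: counted values drawn from a finite set of outcomes.
die_rolls = [random.randint(1, 6) for _ in range(10)]

# Continuous: measured values; any value inside the range is possible.
heights_cm = [random.uniform(150.0, 200.0) for _ in range(10)]

print(die_rolls)
print([round(h, 2) for h in heights_cm])
```

However many times the die is rolled, only six distinct values can appear, whereas the simulated heights can land anywhere between the two bounds.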
Example
Levels of Structured Data
• The Nominal Level
• The Ordinal Level
• The Interval Level
• The Ratio Level
The Nominal Data
• Nominal data is described by name or category only.
• Nominal data is always qualitative.
• Examples of nominal data: Gender, eye color, animal species.
• We cannot perform mathematical operations on nominal data because it is qualitative. The one exception is the mode (the most frequent category), which only requires counting.
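As a small sketch of the one statistic that applies at this level, the mode of a nominal variable can be found by counting categories. The eye color sample below is made up for illustration.

```python
from collections import Counter
from statistics import mode

# Hypothetical sample of a nominal variable.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_colors)    # how many observations fall in each category
most_common = mode(eye_colors)  # the most frequent category

print(counts)
print(most_common)
```

Operations such as a mean or a sort order would have no meaning here, since the categories are names only.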
The Ordinal Data
• Ordinal data is described by name and also has a natural rank order.
• Ordinal data is always qualitative.
• Examples of ordinal data: Likert scale (unsatisfied, neutral, satisfied).
• We cannot perform mathematical operations except ordering, comparison between two values (example: neutral is higher than unsatisfied), and the mode. The median is also commonly considered valid at this level, since it depends only on order.
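The operations listed for ordinal data can be sketched with an explicit rank mapping. The `rank` dictionary and the sample responses are assumptions for illustration; the integers encode order only, not distance between categories.

```python
from statistics import mode

# Rank order of the scale; the numbers express order, not distance.
rank = {"unsatisfied": 0, "neutral": 1, "satisfied": 2}

# Hypothetical survey responses.
responses = [
    "satisfied", "neutral", "unsatisfied",
    "satisfied", "neutral", "satisfied",
]

ordered = sorted(responses, key=rank.get)                  # ordering
neutral_is_higher = rank["neutral"] > rank["unsatisfied"]  # comparison
most_common = mode(responses)                              # mode

print(ordered)
print(neutral_is_higher)
print(most_common)
```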
The Interval Data
• Interval data is measured on a scale where equal differences between values represent equal distances.
• Interval scales have no “true zero” (zero does not mean “none of the quantity”) and may contain negative values.
• Interval data is always quantitative.
• Examples of interval data: Temperature in Celsius, IQ scores.
• We can compute the following statistics: mode, median, mean, range, standard deviation, variance.
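A short sketch of these statistics on interval data, using a made-up list of Celsius readings. The population forms of variance and standard deviation are used here as one possible choice; the sample forms (`variance`, `stdev`) would work just as well.

```python
from statistics import mean, median, mode, pstdev, pvariance

# Hypothetical temperature readings in Celsius (negative values are allowed).
temps_c = [-5.0, 0.0, 10.0, 10.0, 20.0]

summary = {
    "mode": mode(temps_c),
    "median": median(temps_c),
    "mean": mean(temps_c),
    "range": max(temps_c) - min(temps_c),
    "variance": pvariance(temps_c),  # population variance
    "std_dev": pstdev(temps_c),      # population standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```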
The Ratio Data
• Ratio data, like interval data, is measured on a scale with equal distances between values.
• Ratio scales have a “true zero” (zero means none of the quantity), so values are typically non-negative.
• Ratio data is always quantitative.
• Examples of ratio data: Temperature in Kelvin, age in years.
• We can compute the following statistics: mode, median, mean, range, standard deviation, variance.
• Because of the true zero, ratios between values are meaningful, so ratio data can be added, subtracted, multiplied, and divided.
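To see why a true zero matters for multiplication and division, the sketch below compares the same two temperatures on the Celsius scale and on the Kelvin scale. The helper `celsius_to_kelvin` and the two sample temperatures are assumptions for illustration.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15


a_c, b_c = 20.0, 10.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences agree on both scales (up to floating-point rounding).
diff_c = a_c - b_c
diff_k = a_k - b_k

# Ratios do not agree: only the scale with a true zero gives a meaningful one.
ratio_c = a_c / b_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = a_k / b_k  # slightly above 1

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The Celsius ratio depends on where the scale happens to put its zero, while the Kelvin ratio reflects the actual quantities being compared.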
Exercise
• Determine whether the given data is nominal, ordinal, interval, or ratio:
1- Blood Types: O-, O+, A-, A+, B-, B+, AB-, AB+
2- Height in centimeters
3- Vehicle type: Car, Truck, Motorcycle
4- Pearson course grades: U, P, M, D
5- Income status: Low income, medium income, high income
6- Electronic device: Desktop, Laptop, Smartphone, Tablet
7- Table length in inches
8- Degree of Pain: Small pain, medium pain, severe pain
9- Place you live: City, Suburbs, Rural
10- Weight in kilograms
References
Chapter 1, ‘Training Models’, from ‘Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems’, 2nd Edition
