This document discusses different types of data including structured, unstructured, quantitative, and qualitative data. It provides definitions and examples of each type. It also describes the different levels of structured data from nominal to ratio and provides examples of each level. Key points covered include that most data is unstructured, ML works best with structured data while deep learning works with unstructured, and the majority of data remains unexplored.
Intelligent Systems
Slides by Dr. Rami Ibrahim

Types of Data
Unstructured data vs. Structured data
Quantitative data vs. Qualitative data
Levels of structured data

Definitions
Structured data: This is data that can be thought of as observations and characteristics. It is usually organized as a table (rows and columns).
Structured data examples: Excel sheets, CSV files, database tables.
Semi-structured data: This is data that does not follow a tabular structure but contains tags that separate semantic elements.
Semi-structured data examples: XML files, HTML files.
Unstructured data: This data exists as a free entity and does not follow any standard organizational hierarchy.
Unstructured data examples: Text files, server logs, images, videos, audio recordings.

Real Life Data
An often-cited estimate is that about 90% of the world’s data is unstructured.
Classical machine learning algorithms are typically designed to work with structured data, while deep learning algorithms are well suited to unstructured data.
Remember the iceberg image from previous classes: the vast majority of data is unstructured and has not been explored yet.

Definitions
Quantitative data: This data can be described using numbers, and basic mathematical procedures, including addition, are possible on the set.
Qualitative data: This data cannot be described using numbers and basic mathematics. It is generally described using "natural" categories and language.

Example
Let's assume the following coffee shop data:

Coffee Shop Data                        Qualitative/Quantitative
Name of coffee shop                     Qualitative
Revenue (in thousands of dollars)       Quantitative
Postal code                             Qualitative
Average monthly customers               Quantitative
Country of coffee origin                Qualitative

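The coffee shop table above can be reproduced with a small Python sketch that applies the two definitions directly: a column counts as quantitative only if its values are numeric and arithmetic on them is meaningful. The sample values and the `arithmetic_makes_sense` flag are assumptions made up for illustration; that flag is a human judgement, not something the code can infer.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    sample: object                # one illustrative value (made up for this sketch)
    arithmetic_makes_sense: bool  # human judgement: does adding two values mean anything?


def is_numeric(value) -> bool:
    """Return True if the value is a number or a string that parses as one."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def classify(col: Column) -> str:
    """Quantitative only if numeric AND arithmetic on the values is meaningful."""
    if is_numeric(col.sample) and col.arithmetic_makes_sense:
        return "Quantitative"
    return "Qualitative"


columns = [
    Column("Name of coffee shop", "Bean There", False),
    Column("Revenue (in thousands of dollars)", 120.5, True),
    Column("Postal code", "90210", False),  # looks numeric, but adding codes is meaningless
    Column("Average monthly customers", 1800, True),
    Column("Country of coffee origin", "Ethiopia", False),
]

labels = {col.name: classify(col) for col in columns}
for name, label in labels.items():
    print(f"{name}: {label}")
```

The postal code is the instructive case: it passes the numeric check but fails the arithmetic check, so it is labelled qualitative, matching the table.
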
To determine the type, ask yourself:
1- Is it numeric?
2- Can we perform mathematical operations on it?
If the answer to both questions is yes, the data is quantitative; otherwise it is qualitative.

Quantitative Types
Discrete data: The data is counted.
Discrete data examples: A die roll, because it can only take on six values, and the number of customers in a café, because you cannot have a fraction of a person.
Continuous data: The data is measured.
Continuous data examples: The height of a person or building, because an infinite scale of decimals is possible; time; temperature.

Levels of Structured Data
The Nominal Level
The Ordinal Level
The Interval Level
The Ratio Level

The Nominal Data
Nominal data can be described by name.
Nominal data is always qualitative.
Examples of nominal data: Gender, eye color, animal species.
We cannot perform mathematical operations on nominal data because it is qualitative. The only statistic available is the mode.

The Ordinal Data
Ordinal data can be described by name and also has a rank order.
Ordinal data is always qualitative.
Examples of ordinal data: Likert scale (unsatisfied, neutral, satisfied).
We cannot perform mathematical operations on ordinal data except ordering, comparison between two values (example: neutral is higher than unsatisfied), and the mode.

The Interval Data
Interval data is measured on a scale with equal distances between adjacent values.
Interval datasets have no “true zero” and may contain negative values.
Interval data is always quantitative.
Examples of interval data: Temperature in Celsius, IQ scores.
We can compute the following statistics: Mode, Median, Mean, Range, Standard Deviation, Variance.

The Ratio Data
Ratio data is measured on a scale with equal distances between adjacent values.
Ratio datasets have a “true zero” and do not contain negative values.
Ratio data is always quantitative.
Examples of ratio data: Temperature in Kelvin, age in years.
We can compute the following statistics: Mode, Median, Mean, Range, Standard Deviation, Variance.
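As a rough illustration of which operations each level supports, the following Python sketch uses only the standard `statistics` module. All sample values are made up for the example, and `LIKERT_ORDER` and `rank` are hypothetical helper names, not part of the slides.

```python
from statistics import mean, median, mode, stdev

# Nominal: names only, so the mode is the only meaningful statistic.
eye_colors = ["brown", "blue", "brown", "green", "brown"]
nominal_mode = mode(eye_colors)  # 'brown'

# Ordinal: names with a rank order, so ordering and comparison also work.
LIKERT_ORDER = ["unsatisfied", "neutral", "satisfied"]  # lowest to highest
rank = {label: position for position, label in enumerate(LIKERT_ORDER)}
responses = ["satisfied", "neutral", "unsatisfied", "satisfied", "satisfied"]
neutral_beats_unsatisfied = rank["neutral"] > rank["unsatisfied"]  # True
sorted_responses = sorted(responses, key=rank.__getitem__)
ordinal_mode = mode(responses)  # 'satisfied'

# Interval: equal distances but no true zero, so differences and means are fine.
temps_c = [10.0, 20.0, 30.0]
interval_mean = mean(temps_c)                    # 20.0
interval_median = median(temps_c)                # 20.0
interval_range = max(temps_c) - min(temps_c)     # 20.0
interval_stdev = stdev(temps_c)                  # 10.0 (sample standard deviation)

# Ratio: a true zero exists, so ratios between values are meaningful.
temps_k = [c + 273.15 for c in temps_c]
celsius_ratio = temps_c[1] / temps_c[0]  # 2.0, but 20 °C is not "twice as hot" as 10 °C
kelvin_ratio = temps_k[1] / temps_k[0]   # about 1.035, the physically meaningful ratio

print(nominal_mode, ordinal_mode, sorted_responses)
print(interval_mean, interval_range, interval_stdev)
print(celsius_ratio, round(kelvin_ratio, 3))
```

The last two lines show why the Celsius scale is interval while the Kelvin scale is ratio: the same pair of temperatures gives a ratio of 2.0 in Celsius only because the Celsius zero point is arbitrary.
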
Because ratio data has a true zero, ratios between values are meaningful, so ratio data can be multiplied and divided as well as added and subtracted.

Exercise
Determine if the given data is nominal, ordinal, interval, or ratio:
1- Blood types: O-, O+, A-, A+, B-, B+, AB-, AB+
2- Height in centimeters
3- Vehicle type: Car, Truck, Motorcycle
4- Pearson course grades: U, P, M, D
5- Income status: Low income, medium income, high income
6- Electronic device: Desktop, Laptop, Smartphone, Tablet
7- Table length in inches
8- Degree of pain: Small pain, medium pain, severe pain
9- Place you live: City, Suburbs, Rural
10- Weight in kilograms

References
Chapter 1, "Training Models", from Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd Edition.