Session Objectives
By the end of this session, you will be able to:
1. Describe and differentiate the types of digital data.

Unit Content Summary
2.1 Introduction
2.2 Structured Data
2.3 Unstructured Data
2.4 Semi-structured Data
2.5 Real Time Data
2.6 Right Time Data

Introduction
• A huge volume of data is generated every day at a very fast rate.
• Data is broadly categorized into three types: structured, unstructured, and semi-structured.
• According to a Gartner report, about 20% of data is structured; the remainder is unstructured or semi-structured.
• Most of the data generated nowadays falls under the unstructured category.

Structured Data
• Structured data is usually stored in tabular format (rows and columns) in Excel sheets, CSV files, relational database management systems (RDBMS), etc.
• Example: day-to-day operational data generated by transaction processing systems such as railway reservation systems, core banking solutions, ERP, CRM, and point-of-sale systems.
• Structured data can be easily stored, manipulated, and processed by any RDBMS, such as Oracle or MySQL.
• Data warehouse data is also an example of structured data.

Unstructured Data
• Data that does not follow any predefined structure for storage and processing falls under the unstructured category.
• The larger share of the world's data generated today falls under this category.
• Example: data in the form of text, images, video, etc.
• Every day we generate a huge volume of unstructured data by posting text messages, images, and videos on social media platforms such as Facebook, Instagram, Twitter, WhatsApp, and YouTube.
• Storing unstructured data requires a large storage capacity, mainly because of images and videos.

Semi-structured Data
• Semi-structured data lies between structured and unstructured data.
• It does not follow a fixed schema like the tables of a relational database management system, but it carries some structural markers, such as tags and elements.
• E-mails, XML, and JSON are good examples of semi-structured data.

Real Time Data
• Data that is used immediately after it is collected is called real-time data.
• Several applications today rely on real-time data, for example recommender systems, navigation systems, self-driving cars, and Google Maps.
• Data loses value as time passes, so it is increasingly important to leverage data as soon as it is generated.

Right Time Data
• Data becomes useless if it is not available at the right time, i.e. when it is needed.
• Therefore, making data available and accessible to the right people at the right time is very important.

Exercises
• Discuss the characteristics of structured, unstructured, and semi-structured data.
• Explain real-time data with the help of an example.
• Explain right-time data in detail.

Thank You
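
As a starting point for the first exercise, the three data categories can be made concrete with a minimal Python sketch that handles one small sample of each kind. All sample values (the ticket rows, the contact records, and the social media post) are hypothetical and chosen only for illustration; the sketch uses only the standard csv, json, and io modules.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
# (Hypothetical reservation data, illustration only.)
csv_text = "ticket_id,passenger,fare\n101,Asha,450\n102,Ravi,620\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_fare = sum(int(row["fare"]) for row in rows)

# Semi-structured: keys describe each value, but records need not
# share the same set of fields (no fixed schema).
json_text = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]}
]
"""
records = json.loads(json_text)
all_keys = sorted({key for record in records for key in record})

# Unstructured: free text with no schema; only generic operations
# such as splitting into words apply directly.
post = "Had a great train journey today, the views were amazing!"
word_count = len(post.split())

print(total_fare)  # sum of the fare column
print(all_keys)    # union of the keys seen across all records
print(word_count)  # number of whitespace-separated words
```

The structured sample can be queried by column name right away, the semi-structured sample has to be inspected to learn which keys each record carries, and the unstructured sample offers no fields at all without further processing.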