Adobe Scan

1. We suggest you make your own notes/bookmarks while reading the chapter.
2. We suggest you refer to the learning resources suggested at the end of this chapter and also work through the exercises to get a grip on this topic.

1.1 CLASSIFICATION OF DIGITAL DATA

As depicted in Figure 1.1, digital data can be broadly classified into structured, semi-structured, and unstructured data.

1. Unstructured data: This is the data which does not conform to a data model or is not in a form which can be used easily by a computer program. About 80-90% of an organization's data is in this form; for example, memos, chat rooms, PowerPoint presentations, images, videos, letters, papers, body of an email, etc.
2. Semi-structured data: This is the data which does not conform to a data model but has some structure. However, it is not in a form which can be used easily by a computer program; for example, XML, markup languages like HTML, etc. Metadata for this data is available but is not sufficient.
3. Structured data: This is the data which is in an organized form (e.g., in rows and columns) and can be easily used by a computer program. Relationships exist between entities of data, such as classes and their objects. Data stored in databases is an example of structured data.

Since the 1980s most of the enterprise data has been stored in relational databases complete with rows/records/tuples, columns/attributes/fields, primary keys, foreign keys, etc. Over a period of time the Relational Database Management System (RDBMS) matured, and the RDBMS, as they are available today, are robust, cost-effective, and efficient. We have grown comfortable working with RDBMS, and the storage, retrieval, and management of data has been immensely simplified. The data held in RDBMS is typically structured data. However, with the Internet connecting the world, data that existed beyond one's enterprise started to become an integral part of daily transactions. This data grew by leaps and bounds, so much so that it became difficult for the enterprises to ignore it. All of this data was not structured. In fact, Gartner estimates that almost 80% of data generated in any enterprise is unstructured. Roughly around 10% of data is in the structured and semi-structured categories.

A very basic question: when do we say that the data is structured?
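To make the three categories concrete, here is a minimal Python sketch that is not taken from the text. It assumes a hypothetical schema (`SCHEMA`) and made-up sample values; `conforms` and `classify` are illustrative names. A record counts as structured only if it matches the schema's fields and types, a string counts as semi-structured if it parses as well-formed XML (it carries tags but is not checked against any data model), and everything else is treated as unstructured.

```python
import xml.etree.ElementTree as ET

# Hypothetical pre-defined schema: field name -> expected Python type.
SCHEMA = {"emp_id": int, "name": str, "salary": float}


def conforms(record: dict) -> bool:
    """True if the record has exactly the schema's fields with matching types."""
    return (
        record.keys() == SCHEMA.keys()
        and all(isinstance(record[key], typ) for key, typ in SCHEMA.items())
    )


def classify(item) -> str:
    """Crude three-way classification, for illustration only."""
    if isinstance(item, dict) and conforms(item):
        return "structured"
    if isinstance(item, str):
        try:
            ET.fromstring(item)  # well-formed markup carries its own tags
            return "semi-structured"
        except ET.ParseError:
            pass
    return "unstructured"


samples = [
    {"emp_id": 101, "name": "Asha", "salary": 52000.0},    # a row in a table
    "<memo><to>HR</to><body>Leave request</body></memo>",  # XML
    "Hi team, please find the slides attached.",           # body of an email
]
for sample in samples:
    print(classify(sample))
```

Real systems rarely draw the line this sharply; the point is only that structured data can be checked against a schema fixed in advance, while the other two kinds cannot.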
When data conforms to a pre-defined schema/structure, we say it is structured data.

1970s and before was the era of mainframes. The data was essentially primitive and structured. Relational databases evolved in the 1980s and 1990s; that era was one of data-intensive applications. The World Wide Web and the Internet of Things have led to an onslaught of structured, unstructured, and multimedia data. Refer Table 2.1.

Table 2.1 The evolution of big data

Era                 Phase                          Data and technology
1970s and before    Data Generation and Storage    Mainframes: basic data storage
1980s and 1990s     Data Utilization               Relational databases: data-intensive applications
2000s and beyond    Data Driven                    Structured data, unstructured data, multimedia data

DEFINITION OF BIG DATA

If we were to ask you the simple question "Define Big Data", what would your answer be? Well, we will give a few responses that we have heard over time:

1. Anything beyond the human and technical infrastructure needed to support storage, processing, and analysis.
2. Today's BIG may be tomorrow's NORMAL.
3. Terabytes or petabytes or zettabytes of data.
4. I think it is about 3 Vs.

Refer Figure 2.2. Well, all of these responses are correct. But it is not just one of these; in fact, big data is all of the above and more.

Big data is high-volume, high-velocity, and high-variety information assets that demand cost effective, innovative forms of information processing for enhanced insight and decision making.
Source: Gartner IT Glossary

The 3Vs concept was introduced by the analyst Doug Laney in a 2001 MetaGroup research publication titled 3D Data Management: Controlling Data Volume, Variety and Velocity.
Source: blogs.gartner.com/doug-l...

For the sake of easy comprehension, we will look at the definition in three parts. Refer Figure 2.3.

Figure 2.3 Definition of big data - Gartner.
Part I of the definition, "big data is high-volume, high-velocity, and high-variety information assets", talks about voluminous data (humongous data) that may have great variety (a good mix of structured, semi-structured, and unstructured data) and will require a good speed/pace for storage, preparation, processing, and analysis.

Part II of the definition, "cost effective, innovative forms of information processing", talks about embracing new techniques and technologies to capture (ingest), store, process, persist, integrate, and visualize the high-volume, high-velocity, and high-variety data.

Part III of the definition, "enhanced insight and decision making", talks about deriving deeper, richer, and meaningful insights and then using these insights to make faster and better decisions to gain business value and thus a competitive edge.

Data → Information → Actionable intelligence → Better decisions → Enhanced business value

2.4 CHALLENGES WITH BIG DATA

Refer Figure 2.4. Following are a few challenges with big data:

* Data today is growing at an exponential rate. Most of the data that we have today has been generated in the last 2-3 years. This high tide of data will continue to rise incessantly. The key questions here are: "Will all this data be useful for analysis?", "Do we work with all this data or a subset of it?", "How will we separate the knowledge from the noise?", etc.
* Cloud computing and virtualization are here to stay. Cloud computing is the answer to managing infrastructure for big data as far as cost-efficiency, elasticity, and easy upgrading/downgrading is concerned. This further complicates the decision to host big data solutions outside the enterprise.
* Another challenge is to decide on the period of retention of big data. Just how long should one retain this data? A tricky question indeed, as some data is useful for making long-term decisions, whereas in a few cases the data may quickly become irrelevant and obsolete just a few hours after having been generated.
* Data visualization is becoming popular as a separate discipline. We are short by quite a number as far as visualization experts are concerned.

2.5 WHAT IS BIG DATA?

Big data is data that is big in volume, velocity, and variety. Refer Figure 2.5.

2.5.1 Volume

We have seen it grow from bits to bytes to petabytes and exabytes. Refer Table 2.2 and Figure 2.6.

Bits → Bytes → Kilobytes → Megabytes → Gigabytes → Terabytes → Petabytes → Exabytes → Zettabytes → Yottabytes

2.5.1.1 Where Does This Data get Generated?

There are a multitude of sources for big data. An XLS, a DOC, a PDF, etc. is unstructured data; a video on YouTube, a chat conversation on Internet Messenger, a customer feedback form on an online [...]

2.6 OTHER CHARACTERISTICS OF DATA WHICH ARE NOT DEFINITIONAL TRAITS OF BIG DATA

There are yet other characteristics of data which are not necessarily the definitional traits of big data. A few of these are listed as follows:

1. Veracity and validity: Veracity asks whether the data that is mined and analyzed is meaningful and pertinent to the problem under consideration. Validity refers to the accuracy and correctness of the data. Any data that is picked up for analysis needs to be accurate. This is not true of big data alone.
2. Volatility: Volatility of data deals with how long the data is valid and how long it should be stored. There is some data that is required for long-term decisions and remains valid for longer periods of time. However, there are also pieces of data that quickly become obsolete minutes after their generation.
3. Variability: Data flows can be highly inconsistent with periodic peaks.
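The unit ladder in Section 2.5.1 can be made concrete with a small conversion helper. The following is a minimal Python sketch, not part of the original text: `UNITS`, `humanize`, and the sample sizes are illustrative, and it assumes each step up the ladder is a factor of 1024 (the binary convention). Pass `base=1000` for the decimal convention.

```python
# Units from the ladder in Section 2.5.1, starting at bytes.
UNITS = ["Bytes", "Kilobytes", "Megabytes", "Gigabytes", "Terabytes",
         "Petabytes", "Exabytes", "Zettabytes", "Yottabytes"]


def humanize(num_bytes: int, base: int = 1024) -> str:
    """Express a byte count in the largest unit that keeps the value below `base`."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the top of the ladder.
        if value < base or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= base


print(humanize(8 // 8))        # 8 bits make 1 byte
print(humanize(5 * 1024**4))   # 5.00 Terabytes
print(humanize(3 * 1024**5))   # 3.00 Petabytes
```

Each rung is roughly a thousand times the one below it, which is why a move from terabytes to petabytes changes the storage and processing problem rather than just its size.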
