DECS 43A - Big Data Analysis


Big Data Analytics

Government Arts and Science College


Tittagudi-606106
Department of Computer Science

DECS 43A – Big Data Analysis


II Year IV Semester

Unit -1
Introduction to Big Data
Dr. S. P. Ponnusamy
Assistant Professor and Head


Unit -1
Introduction to Big Data
• Data
• Characteristics of data
• Types of digital data: Unstructured, Semi-structured and Structured
• Sources of data
• Working with unstructured data
• Evolution and Definition of big data
• Characteristics and Need of big data
• Challenges of big data
• Data environment versus big data environment


Data
• The quantities, characters, or symbols on which operations are performed by a
computer, which may be stored and transmitted in the form of electrical signals and
recorded on magnetic, optical, or mechanical recording media.

Big Data
• Big Data is a collection of data that is huge in volume and keeps growing
exponentially with time.
• Its size and complexity are so large that none of the traditional data
management tools can store or process it efficiently.
• Working with big data involves data mining, data storage, data analysis, data
sharing, and data visualization.


Data vs Big Data


Data Growth

• 1,024 bytes = 1 kilobyte (KB)
• 1,024 KB = 1 megabyte (MB)
• 1,024 MB = 1 gigabyte (GB)
• 1,024 GB = 1 terabyte (TB)
• 1,024 TB = 1 petabyte (PB)
• 1,024 PB = 1 exabyte (EB)
• 1,024 EB = 1 zettabyte (ZB)
• 1,024 ZB = 1 yottabyte (YB)
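
The units above follow the binary convention, in which each unit is 1,024 (2^10) times the previous one. A minimal Python sketch, assuming that convention, converts a raw byte count into the largest unit that keeps the value at or above 1; the function name human_readable is hypothetical.

```python
# Binary (1,024-based) units, in the order listed on the Data Growth slide
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit where the value is at least 1."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024  # step up to the next unit
    # Anything at or beyond 1,024 ZB is reported in yottabytes
    return f"{value:.2f} {UNITS[-1]}"


print(human_readable(1024))         # 1.00 KB
print(human_readable(5 * 1024**4))  # 5.00 TB
print(human_readable(1024**8))      # 1.00 YB
```

Each step divides by 1,024, so a value moves up one unit per division, mirroring the table line by line.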


Data Growth


Types of Data


Structured Data
• This is data in an organized form (e.g., rows and columns) that a computer program
can use easily.
• Relationships exist between entities of data, such as classes and their objects.
• Data stored in relational databases is an example of structured data.
• Structured data is also called relational data.
• It is often split into multiple tables so that each entity is depicted by a single
record, which enhances the integrity of the data.
• Structured Query Language (SQL) is used to bring the data from these tables back together.
• Structured data is easy to enter, query, and analyze.
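
To illustrate how SQL brings split tables back together, here is a minimal sketch using Python's built-in sqlite3 module. The department and student tables and their rows are hypothetical, chosen only to show one entity per record and a join across tables.

```python
import sqlite3

# In-memory database, so the example needs no files
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each entity (department, student) is stored once, in its own table
cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
cur.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(1, "Computer Science"), (2, "Mathematics")],
)
cur.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Anu", 1), (102, "Bala", 2), (103, "Chitra", 1)],
)
conn.commit()

# SQL reassembles the split data with a JOIN on the shared key
rows = cur.execute(
    "SELECT s.name, d.name FROM student AS s "
    "JOIN department AS d ON s.dept_id = d.dept_id "
    "ORDER BY s.roll_no"
).fetchall()

for student_name, dept_name in rows:
    print(student_name, "-", dept_name)
```

The department name is stored only once, so correcting it updates every student row that refers to it; this is the integrity benefit the slide describes.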


Structured Data


Structured Data - Sources


Ease with Structured Data


Semi-Structured Data
• This is data that does not conform to a formal data model but still has some structure.
• However, it is not in a form that a computer program can use easily.
• Examples: emails, XML, other markup languages such as HTML, JSON documents, etc.
• Metadata for this data is available but is not sufficient.
• It is commonly called NoSQL data.


Semi-Structured Data - Sources


Semi-Structured Data – XML Example

<ProgrammerDetails>
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<CodingPlatforms>
<CodingPlatform Type="Fav">GeeksforGeeks</CodingPlatform>
<CodingPlatform Type="2ndFav">Code4Eva!</CodingPlatform>
<CodingPlatform Type="3rdFav">CodeisLife</CodingPlatform>
</CodingPlatforms>
</ProgrammerDetails>

<!-- The 2ndFav and 3rdFav coding platforms are imaginary because GeeksforGeeks is
the best! -->
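
Because the tags carry the structure, a program can still navigate this document without a fixed table schema. A minimal sketch with Python's standard xml.etree.ElementTree module reads the same record; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<ProgrammerDetails>
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
  <CodingPlatforms>
    <CodingPlatform Type="Fav">GeeksforGeeks</CodingPlatform>
    <CodingPlatform Type="2ndFav">Code4Eva!</CodingPlatform>
    <CodingPlatform Type="3rdFav">CodeisLife</CodingPlatform>
  </CodingPlatforms>
</ProgrammerDetails>
"""

root = ET.fromstring(XML_TEXT.strip())

# Tags act as self-describing labels for each value
full_name = f"{root.findtext('FirstName')} {root.findtext('LastName')}"

# The Type attribute adds further structure to the repeated elements
platforms = {p.get("Type"): p.text for p in root.iter("CodingPlatform")}

print(full_name)  # Jane Doe
print(platforms)  # {'Fav': 'GeeksforGeeks', '2ndFav': 'Code4Eva!', '3rdFav': 'CodeisLife'}
```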


Semi-Structured Data – JSON Example
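
The JSON example for this slide survives only as a title. As an illustrative stand-in (an assumption, not the original example), records similar to the XML programmer record can be written in JSON and read with Python's standard json module. The second record is hypothetical and deliberately carries a field the first lacks, which JSON allows because no fixed schema is enforced.

```python
import json

# Illustrative JSON: two records with different fields (no fixed schema)
JSON_TEXT = """
[
  {"FirstName": "Jane", "LastName": "Doe",
   "CodingPlatforms": [{"Type": "Fav", "Name": "GeeksforGeeks"}]},
  {"FirstName": "John", "LastName": "Roe", "Email": "john@example.com"}
]
"""

records = json.loads(JSON_TEXT)

for rec in records:
    # .get() copes with keys that appear in some records but not in others
    platforms = [p["Name"] for p in rec.get("CodingPlatforms", [])]
    print(rec["FirstName"], rec.get("Email", "<no email>"), platforms)

# Collect every key seen across records: the "schema" is discovered, not declared
all_keys = sorted({key for rec in records for key in rec})
print(all_keys)
```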


Characteristics of Semi-Structured Data


Unstructured Data
• This is data that does not conform to a data model or is not in a form that a computer
program can use easily.
• It cannot be stored in rows and columns as in relational databases.
• It does not follow any semantics or rules.
• It lacks any particular format or sequence.
• It has no easily identifiable structure.
• Because it lacks an identifiable structure, computer programs cannot use it easily.
• An estimated 80–90% of an organization's data is in this format.
• Examples: memos, chat rooms, PowerPoint presentations, images, videos, letters, research papers,
white papers, the body of an email, etc.
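
Since unstructured text has no fields to query, a program has to impose structure on it. As a minimal sketch, assuming a hypothetical memo and deliberately simple regular expressions, Python's re module can pull email addresses and dates out of free text and build a crude word-frequency profile. Real text analytics needs far more robust techniques than these patterns.

```python
import re
from collections import Counter

# A hypothetical memo: free text with no rows, columns, or schema
memo = """Team, the data review meeting moves to 2024-03-15.
Send your slides to priya@example.com before the meeting.
Questions about the review? Contact ravi@example.org."""

# Simple (assumed) patterns impose structure on text that has none of its own
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", memo)
dates = re.findall(r"\d{4}-\d{2}-\d{2}", memo)

# A crude word-frequency profile of the text
words = re.findall(r"[a-z]+", memo.lower())
top_words = Counter(words).most_common(3)

print(emails)
print(dates)
print(top_words)
```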


Unstructured Data – Example


Unstructured Data – Sources


Unstructured Data – issues


Dealing with Unstructured Data


Definition of Big Data


Big Data is high-volume, high-velocity, and high-variety information assets that
demand cost-effective, innovative forms of information processing for enhanced
insight and decision making.

Source: Gartner IT Glossary

The definition has three parts:
• High-volume, high-velocity, and high-variety information assets
• Cost-effective, innovative forms of information processing
• Enhanced insight and decision making


Characteristics of Data
1. Composition: The composition of data deals with the structure of data, that is,
the sources of data, the granularity, the types, and the nature of data as to
whether it is static or real-time streaming.

2. Condition: The condition of data deals with the state of data, that is, "Can one
use this data as is for analysis?" or "Does it require cleansing for further
enhancement and enrichment?"

3. Context: The context of data deals with questions such as "Where has this data been
generated?", "Why was this data generated?", "How sensitive is this data?", "What are the
events associated with this data?", and so on.
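
The condition question ("Can one use this data as is, or does it require cleansing?") can be partly checked by a program. A minimal sketch, assuming hypothetical weather-style records and a few simple rules of our own choosing (missing values, empty text, repeated ids, wrong types), might look like this in Python; real data-quality rules depend on the dataset.

```python
from typing import Any

# Hypothetical raw records, as they might arrive from a source system
raw_records = [
    {"id": 1, "city": "Chennai", "temp_c": 31.5},
    {"id": 2, "city": "Madurai", "temp_c": None},    # missing value
    {"id": 3, "city": "", "temp_c": 29.0},           # empty text field
    {"id": 1, "city": "Chennai", "temp_c": 31.5},    # repeats id 1
    {"id": 4, "city": "Salem", "temp_c": "thirty"},  # wrong type
]


def condition_report(records: list[dict[str, Any]]) -> dict[str, int]:
    """Count problems suggesting the data needs cleansing before analysis."""
    seen_ids: set[int] = set()
    report = {"missing": 0, "empty_text": 0, "duplicates": 0, "bad_type": 0}
    for rec in records:
        # Assumed rule: a repeated id is treated as a duplicate record
        if rec["id"] in seen_ids:
            report["duplicates"] += 1
            continue  # do not double-count problems inside a duplicate
        seen_ids.add(rec["id"])
        if rec["temp_c"] is None:
            report["missing"] += 1
        elif not isinstance(rec["temp_c"], (int, float)):
            report["bad_type"] += 1
        if rec["city"] == "":
            report["empty_text"] += 1
    return report


report = condition_report(raw_records)
usable_as_is = all(count == 0 for count in report.values())
print(report)
print("Usable as is?", usable_as_is)
```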


Evolution of Big Data








Why Big Data?


Need of Big Data


Characteristics of Big Data/What is Big Data?


• Volume: the size and amount of big data that companies manage and
analyze.
• Variety: the diversity and range of different data types, including
unstructured, semi-structured, and structured data.
• Velocity: the speed at which companies receive, store, and manage data,
e.g., the number of social media posts or search queries
received within a day, an hour, or another unit of time.
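
As a toy illustration of the three Vs, the following Python sketch measures them for a small batch of hypothetical incoming records: volume as total bytes, variety as the set of data types present, and velocity as records received per second. The record format and the crude rule used to classify each payload are assumptions made for this sketch; real systems measure these properties at vastly larger scale.

```python
# Hypothetical batch of arrivals: (arrival time in seconds, payload)
batch = [
    (0.0, ("u1", 12)),                         # structured: a fixed-column row
    (0.5, "<post><text>Hello</text></post>"),  # semi-structured: tagged markup
    (1.0, "Great lecture today!"),             # unstructured: free text
    (1.5, ("u2", 3)),
    (2.0, "<post><text>Big data</text></post>"),
]


def classify(payload) -> str:
    """Crude, assumed rule for deciding what kind of data a payload is."""
    if isinstance(payload, tuple):
        return "structured"
    if payload.lstrip().startswith("<"):
        return "semi-structured"
    return "unstructured"


def size_in_bytes(payload) -> int:
    """Size of the payload serialized as UTF-8 text (rows written as CSV)."""
    text = ",".join(str(v) for v in payload) if isinstance(payload, tuple) else payload
    return len(text.encode("utf-8"))


volume = sum(size_in_bytes(p) for _, p in batch)
variety = sorted({classify(p) for _, p in batch})
elapsed = batch[-1][0] - batch[0][0]
velocity = len(batch) / elapsed if elapsed > 0 else float(len(batch))

print("Volume (bytes):", volume)
print("Variety:", variety)
print("Velocity (records/s):", velocity)
```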


Characteristics of Big Data




Characteristics of Big Data – other V’s

• Value: the value that big data can provide, which relates directly to what
organizations can do with the collected data.
• Veracity: the “truth” or accuracy of data and information assets, which often
determines executive-level confidence in that data.


Challenges with Big Data

• Capture
• Storage
• Curation
• Search
• Analysis
• Transfer
• Visualization
• Privacy violations

Sources of Big Data


Traditional Business Intelligence (BI) versus Big Data

• In a traditional BI environment, data resides on a central server, whereas
in a big data environment, data resides in a distributed file system.

• Traditional BI → move data to code

• Big data environment → move code to data

• In a traditional BI environment, data is analyzed in offline mode, whereas
in a big data environment, data is analyzed in both real-time and offline mode.


A Typical Data Warehouse Environment

• In a typical data warehouse (DW) environment, data is collected from multiple disparate
sources, then integrated, cleansed, and transformed before being loaded into the data warehouse.
• A host of market-leading BI tools can then be used on top of the data warehouse for
reporting/dashboarding, ad hoc querying, and modelling.
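
The collect, integrate, cleanse, transform, and load steps can be sketched in a few lines. The following Python sketch uses two hypothetical sources (a CSV export and a list of dictionaries standing in for a web-store API), an in-memory SQLite database standing in for the data warehouse, and one GROUP BY query standing in for a BI report. None of these names or formats come from a specific product.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a CSV export from a billing system
billing_csv = io.StringIO("region,amount\nNorth, 1200\nsouth,800\n,300\n")

# Source 2 (hypothetical): web-store records with different field names
web_orders = [{"Region": "NORTH", "Total": "450"}, {"Region": "East", "Total": "700"}]

# Collect and integrate: map both sources onto one common (region, amount) shape
combined = [(row["region"], row["amount"]) for row in csv.DictReader(billing_csv)]
combined += [(order["Region"], order["Total"]) for order in web_orders]

# Cleanse and transform: drop rows without a region, normalize case and types
clean = [
    (region.strip().title(), float(amount))
    for region, amount in combined
    if region and region.strip()
]

# Load into the "warehouse" (an in-memory SQLite table here)
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales (region TEXT, amount REAL)")
dw.executemany("INSERT INTO sales VALUES (?, ?)", clean)

# A BI-style report on top of the warehouse
report = dw.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

Note that "North" and "NORTH" from different sources only add up correctly because the cleansing step normalizes them before loading.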

A Typical Hadoop Environment

Hadoop takes care of storage and processing using the following:

a) HDFS (Hadoop Distributed File System): distributed storage
b) MapReduce: distributed processing

ODS: Operational Data Store
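
To show how the work is divided, here is a single-process Python simulation of the classic MapReduce word count. It imitates the map, shuffle, and reduce phases and the idea of running the mapper where each data block lives; it does not use any Hadoop or HDFS API, and the input blocks are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input split into blocks, as HDFS spreads a file across nodes
blocks = [
    "big data needs distributed storage",
    "hadoop stores big data in hdfs",
    "mapreduce processes big data",
]


def map_phase(block: str) -> list[tuple[str, int]]:
    """Mapper: runs locally on one block and emits (word, 1) pairs."""
    return [(word, 1) for word in block.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combines all values collected for one key."""
    return key, sum(values)


# "Move code to data": the mapper is applied to each block where it lives
mapped = [map_phase(block) for block in blocks]
grouped = shuffle(chain.from_iterable(mapped))
counts = dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

print(counts["big"], counts["data"])
```

Only the small (word, 1) pairs travel between phases, not the raw blocks, which is why moving code to data scales better than moving data to code.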



Co-existence of Big Data and Data Warehouse


End
