Big Data

BIG DATA

BLACK BOOK
Unit – I

• Introduction: Big Data Definition, History of Data Management -
  Evolution of Big Data, Structuring Big Data, Elements of Big Data,
  Big Data Analytics, Careers in Big Data, Future of Big Data,
  Use of Big Data in Social Networking, Use of Big Data in Preventing
  Fraudulent Activities, Use of Big Data in Retail Industry
8 Hours
Unit – II

• Hadoop Ecosystem: Understanding Hadoop Ecosystem, Hadoop Distributed
  File System: HDFS Architecture, Concept of Blocks in HDFS
  Architecture, NameNodes and DataNodes, The Command-Line Interface,
  Using HDFS Files, Hadoop-Specific File System Types, HDFS Commands,
  The org.apache.hadoop.io package, HDFS High Availability: Features
  of HDFS.
8 Hours
Unit – III

• Understanding MapReduce: The MapReduce Framework: Exploring the
  Features of MapReduce, Working of MapReduce, Exploring Map and
  Reduce Functions, Uses of MapReduce.
• YARN Architecture: Background; Advantages of YARN; YARN Architecture
8 Hours
Unit – IV

• Apache Spark: Overview - What is Apache Spark? Features of Apache
  Spark, Spark programming languages, Spark's built-in libraries;
  Spark History - Limitations of MapReduce in Hadoop, Creation history
  of Spark; Why Use Spark - Comparison of Spark and MapReduce, Reasons
  for choosing Spark; Spark architecture and its advantages; Data
  sharing using Spark RDD; Iterative operations on Spark RDD;
  Interactive operations on Spark RDD
• Spark installation.
8 Hours
Unit – V

• NoSQL: Introduction to NoSQL: Why NoSQL, Characteristics of NoSQL,
  History of NoSQL, Types of NoSQL Data Models: Key-Value Data Model,
  Column-Oriented Data Model, Document Data Model, Graph Databases,
  Schemaless Databases, Materialized Views, Distribution Models:
  CAP Theorem, Sharding
8 Hours
BOOKS
• Text Book:
• DT Editorial Services, "Big Data: Black Book, Comprehensive Problem
  Solver", Dreamtech Press, 2016 Edition [Chapters 1, 2, 4, 5, 11, 12, 13, 15]
• Reference Books:
• Paul C. Zikopoulos, Chris Eaton, Dirk deRoos, Thomas Deutsch, George
  Lapis, "Understanding Big Data: Analytics for Enterprise Class Hadoop
  and Streaming Data", McGraw Hill, 2012.
• P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the
  Emerging World of Polyglot Persistence", Addison-Wesley Professional, 2012.
• Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly, 2012.
What is Data
“Collection of raw facts from which conclusions may be drawn”

• Data is converted into a more convenient form, i.e. digital data
– Increase in data processing capabilities
– Lower cost of digital storage
– Affordable and faster communication technology
• Who creates data?
– Individuals
– Businesses

[Figure: video, photo, book and letter converted into digital data (binary)]
A Vocabulary for Measuring Information
If a Grain of Sand were One Byte of Information . . .

• 1 Megabyte = 1 million bytes: a tablespoon of sand
• 1 Gigabyte = 1 billion bytes: a patch of sand, 9" square, 1' deep
• 1 Terabyte = 1 trillion bytes: a sandbox, 24' square, 1' deep
• 1 Petabyte = 1,000 terabytes: a mile-long beach, 100' wide, 1' deep
A New Vocabulary for Measuring Information
If a Grain of Sand were One Byte of Information . . .

• 1 Exabyte = 1,000 petabytes: the same beach, from Maine to North Carolina
• 1 Zettabyte = 1,000 exabytes: the same beach, along the entire US coast
• 1 Yottabyte = 1,000 zettabytes: enough info to bury the entire US
  under 296 feet of sand
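
The unit ladder above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the decimal (powers of 1,000) definitions used on the slide, and the names UNITS, to_bytes and humanize are hypothetical helpers, not part of any library.

```python
# Decimal units as on the slide: each step up is a factor of 1,000.
# (Assumption: binary units such as 1 KiB = 1,024 bytes are not used here.)
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(amount, unit):
    """Convert an amount expressed in `unit` into a number of bytes."""
    return amount * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = num_bytes
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"


if __name__ == "__main__":
    print(to_bytes(1, "MB"))          # 1 million bytes
    print(humanize(to_bytes(1000, "TB")))  # the same amount as 1 petabyte
```

Integer arithmetic is used in to_bytes so that even a yottabyte (10^24 bytes) is represented exactly.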
Define Information
• What do individuals/businesses do with the data they collect?
– They turn it into “information”
– “Information is the intelligence and knowledge derived from data”
• Businesses analyze raw data in order to identify meaningful trends
– For example:
  • Buying habits and patterns of customers
  • Health history of patients

[Figure: Virtuous cycle of information. Creators of information upload it and users of information access it over wired and wireless networks, through centralized information storage and processing; this feeds the demand for more information.]


Difference between RDBMS and DBMS

• A Database Management System (DBMS) is software that is used to
  define, create and maintain a database and that provides controlled
  access to the data.
• A Relational Database Management System (RDBMS) is an advanced
  version of a DBMS.
Difference between RDBMS and DBMS

DBMS
• DBMS stores data as files.
• Data elements need to be accessed individually.
• There is no relationship between data.
• DBMS does not support distributed databases.

RDBMS
• RDBMS stores data in tabular form.
• Multiple data elements can be accessed at the same time.
• There are relationships between data.
• RDBMS supports distributed databases.
Difference between RDBMS and DBMS

DBMS
• It supports a single user.
• Data redundancy is common in this model.
• The data in a DBMS is subject to low security levels with regard to
  data manipulation.
• Low software and hardware requirements.
• Examples: XML, Microsoft Access, etc.

RDBMS
• It supports multiple users.
• Keys and indexes do not allow data redundancy.
• There are multiple levels of data security in an RDBMS.
• Higher software and hardware requirements.
• Examples: MySQL, PostgreSQL, SQL Server, Oracle, etc.
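
Two of the RDBMS points above (relationships between data, and keys preventing redundancy) can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in for an RDBMS; the customers and orders tables and their contents are made-up examples, not taken from the slides.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Data is stored in tabular form; the two tables are related by a key.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           item TEXT NOT NULL)"""
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "book"), (1, "pen")],
)

# The primary key rejects a second row for the same customer id.
try:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# One query reads multiple related data elements at the same time.
rows = conn.execute(
    """SELECT c.name, o.item
       FROM customers AS c
       JOIN orders AS o ON o.customer_id = c.id
       ORDER BY o.id"""
).fetchall()

print(duplicate_rejected)
print(rows)
```

The customer's name is stored once and referenced by id from each order, which is how the key avoids repeating the same data in every row.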
