Big Data MHE - CH 1 - Wholeness


BIG DATA, 1E

CHAPTER 1
By: Dr. Anil Maheshwari

Copyright © 2017 McGraw Hill Education, All Rights Reserved.

PROPRIETARY MATERIAL © 2017 The McGraw Hill Education, Inc. All rights reserved. No part of this PowerPoint slide may be displayed, reproduced or
distributed in any form or by any means, without the prior written permission of the publisher, or used beyond the limited distribution to teachers and
educators permitted by McGraw Hill for their individual course preparation. If you are a student using this PowerPoint slide, you are using it without
permission.
| Challenge | Description | Solution | Technology |
|---|---|---|---|
| Volume | Avoid risk of data loss from machine failure in clusters of commodity machines | Replicate segments of data on multiple machines; master node keeps track of segment locations | HDFS |
| Volume & Velocity | Avoid choking of network bandwidth by moving large volumes of data | Move processing logic to where the data is stored; manage using parallel processing algorithms | Map-Reduce |
| Variety | Efficient storage of large and small data objects | Columnar databases using key-value pair format | HBase, Cassandra |
| Velocity | Monitoring streams too large to store | Fork-shaped architecture to process data as stream and as batch | Spark |

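
The "move processing logic to where the data is stored" row can be illustrated with a minimal sketch. This is not Hadoop code: the `segments` dictionary, the node names, and both function names are hypothetical, and the map step runs sequentially in one process here, whereas a real Map-Reduce cluster would run it in parallel on the machines that hold each segment. The point is only that each segment is summarized locally and just the small partial results are combined.

```python
from collections import Counter
from functools import reduce

# Hypothetical data segments, each imagined to live on a different machine.
segments = {
    "node-1": ["big data is big", "data flows like a river"],
    "node-2": ["a river of data"],
}


def map_segment(lines):
    """Map step: count words locally, where the segment is stored."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


# Only the small per-segment counts are moved, not the raw lines.
partial_counts = [map_segment(lines) for lines in segments.values()]
total = reduce(reduce_counts, partial_counts, Counter())

print(total.most_common(3))
```

The raw text never leaves its segment in this sketch; only the compact counters are merged, which is the bandwidth-saving idea the table describes.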

| Feature | Traditional Data | Big Data |
|---|---|---|
| Representative Structure | Lake / pool | Flowing stream / river |
| Primary Purpose | Manage business activities | Communicate, monitor |
| Source of data | Business transactions, documents | Social media, Web access logs, machine generated |
| Volume of data | Gigabytes, Terabytes | Petabytes, Exabytes |
| Velocity of data | Ingest level is controlled | Real-time, unpredictable ingest |
| Variety of data | Alphanumeric | Audio, Video, Graphs, Text |
| Veracity of data | Clean, more trustworthy | Varies depending on source |
| Structure of data | Well-structured | Semi- or unstructured |
| Physical Storage | In a Storage Area Network | Clusters of commodity computers |
| Data organization | Relational databases | NoSQL databases |
| Data Access | SQL | NoSQL such as Pig |
| Data Manipulation | Conventional data processing | Parallel processing |
| Data Visualization | Variety of tools | Dynamic dashboards with simple measures |
| Database Tools | Commercial systems | Open-source (Hadoop, Spark) |
| Cost of System | Medium to High | High |
