Big Data Distributed Computing


Big data and effective distributed computing
What is Bigdata?

3 V Definition

Volume Velocity Variety


How much data is generated in 1 minute?
➢ 4,166,667 posts on Facebook

➢ 347,222 tweets on Twitter

➢ 1,736,111 posts on Instagram

➢ 300 hours of video on YouTube

➢ 18,327 votes cast on Reddit


What are the Types of Data?
Structured Data:

➢ There are 3 aspects defined:


➢ Name
➢ Type
➢ Length
First_name Hire_date Salary
Gopichand 36/02/2013 $6000
Rahul 28/01/2013 10000

➢ Example: Hire_date date


➢ Example: Salary number(6,0)
➢ e.g. Oracle, SQL Server, DB2, Sybase, Teradata, MySQL (all RDBMSs)
Semi-Structured Data:
➢Is this valid in Excel?
First_name Hire_date Salary
Gopichand 36/02/2013 $6000
Rahul 28/01/2013 10000

➢Is this valid in XML?


<row>
<name>Gopichand</name>
<hire_date>36/02/2013</hire_date>
<salary>$6000</salary>
</row>
<row>
<name>Rahul</name>
<hire_date>28/01/2013</hire_date>
<salary>10000</salary>
</row>

➢ Example: Excel, XML, JSON, CSV
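
The questions above contrast a typed table with a looser format. A minimal sketch of that contrast, assuming a hypothetical `parse_structured` helper that mimics the column rules `Hire_date date` and `Salary number(6,0)`: the XML parser accepts the first row because it is well-formed text, while the typed check rejects it.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# The first row from the slide, as well-formed XML.
ROW_XML = """
<row>
  <name>Gopichand</name>
  <hire_date>36/02/2013</hire_date>
  <salary>$6000</salary>
</row>
"""


def parse_structured(name, hire_date, salary):
    """Hypothetical stand-in for RDBMS column checks (not a real database API)."""
    parsed_date = datetime.strptime(hire_date, "%d/%m/%Y").date()  # must be a real date
    parsed_salary = int(salary)  # must be a whole number, no currency symbol
    if len(str(abs(parsed_salary))) > 6:  # number(6,0): at most 6 digits
        raise ValueError("salary exceeds 6 digits")
    return name, parsed_date, parsed_salary


def is_valid_structured(name, hire_date, salary):
    """Return True when the row passes the typed checks, False otherwise."""
    try:
        parse_structured(name, hire_date, salary)
        return True
    except ValueError:
        return False


# Semi-structured: the parser only checks that tags are well-formed.
row = ET.fromstring(ROW_XML)
xml_values = {child.tag: child.text for child in row}
print("XML accepted:", xml_values)

# Structured: the same values fail the type checks.
print("Typed check, Gopichand:", is_valid_structured("Gopichand", "36/02/2013", "$6000"))
print("Typed check, Rahul:", is_valid_structured("Rahul", "28/01/2013", "10000"))
```

The XML values stay plain strings, so nothing stops an impossible date or a stray currency symbol from being stored.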
Unstructured Data

➢ Free-flowing text (plain text)


➢ Tweets on Twitter
➢ Messages on WhatsApp
➢ Emails on Gmail, Ymail etc.
➢ Word docs, PDFs, comments on Facebook posts
➢ Videos on YouTube
➢ Log files from web servers, log files from app servers, audit logs etc.
Can this be Processed by the Legacy Frameworks Available?
➢ Like ETL tools (Ab Initio, Informatica, DataStage etc.)
➢ Like RDBMSs (Oracle, SQL Server, DB2, Sybase, Teradata, MySQL)
➢ Mainframes (IBM mainframe, AS400 etc.)

The problem is: Client-Server Architecture


How do Legacy Frameworks Work?

1. Client writes a select query:

   Select *
   From emp
   Where salary > 5000;

2. Client sends the query to the server
3. Server processes the data
4. Server sends the result to the client
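
The four steps can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the server, and the rows in `emp` are hypothetical sample data, not taken from any real system.

```python
import sqlite3

# Stand-in for the server: a single database that holds all the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (first_name TEXT, hire_date TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("Gopichand", "26/02/2013", 6000),   # hypothetical sample row
        ("Rahul", "28/01/2013", 10000),      # hypothetical sample row
        ("Asha", "10/05/2014", 4500),        # hypothetical sample row
    ],
)

# Step 1: the client writes a select query.
query = "SELECT * FROM emp WHERE salary > 5000"

# Steps 2 and 3: the query is sent to the server, which scans and filters the data.
cursor = conn.execute(query)

# Step 4: the server sends the result back to the client.
rows = cursor.fetchall()
for row in rows:
    print(row)

conn.close()
```

All the filtering happens on the one machine holding the data, which is why that machine becomes the bottleneck as the table grows.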


What is the Scaling Problem?

Vertical Scaling Horizontal Scaling


What Happens to these Databases when Data increases?

➢ Select queries run very slowly, which increases the time taken to generate reports
➢ Buffer/cache overflow can lead to a server crash
Bigdata in Telecom
Data:
➢ Incoming Calls
➢ Outgoing Calls
➢ Data Usage
➢ SMS
➢ VAS
➢ Call Detail Records (CDR)
Think About!
➢ Total Activity
➢ Millions of Users
➢ Huge number of CDR Logs Generated
➢ Million Logs per Tower
➢ So many Towers in a City
➢ Multiple Towers in the Country
[Figure: city map labeled Pimple Saudagar, Deccan, Kharadi, Hinjawadi and Nal Stop; caption: Incoming Calls]
Bigdata
Bigdata in Banking

➢ Cheque
➢ Net Banking
➢ Loans: Home Loans, Car Loans, Gold Loans
➢ Swipes: Credit Card, Debit Cards, POS
➢ Social Media
➢ Spending Patterns
➢ Investment Banking
➢ Retail Banking
➢ Corporate Banking
➢ Mutual Funds
➢ SIP
➢ EMIs

Millions of Users…
Millions of Logs…
Millions of Rows…
Gigabytes and Terabytes of Data
Bigdata in E-commerce

➢ Advertising and SEM (Search Engine Marketing)

Millions of Customers…
Millions of Clicks…

Cost = Clicks x Cost Per Click (CPC)

1 click = Rs. 50-500
Crores of Money
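
A short worked example of the clicks x CPC arithmetic. The click count of one million is a hypothetical figure chosen for illustration; only the Rs. 50-500 range comes from the slide.

```python
CRORE = 10_000_000  # 1 crore = 10 million rupees


def ad_spend(clicks, cost_per_click_rs):
    """Total advertising cost in rupees: clicks x cost per click."""
    return clicks * cost_per_click_rs


clicks = 1_000_000  # hypothetical: one million clicks

low = ad_spend(clicks, 50)    # cheapest click on the slide, Rs. 50
high = ad_spend(clicks, 500)  # most expensive click on the slide, Rs. 500

print(f"Spend ranges from Rs. {low / CRORE:.0f} crore to Rs. {high / CRORE:.0f} crore")
```

Under these assumptions a million clicks already cost between 5 and 50 crore rupees, and every one of those clicks also leaves a log entry behind.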

Click Based Logs


Millions and Millions of
Clickstream Logs…
Bigdata in Healthcare

➢ Operator Information
➢ Doctor/Nurse Information
➢ Medical Records
➢ Drug Information
➢ Emergency Work Log Reports
➢ In-Patient Record
➢ Out-Patient Record

[Figure labels: Billing; Medicines; Insurance Claims; Test Reports; Room Booking; Old; New; Hospital; Staff Reports]

Every Hour…
Every Day…
Every Branch…
Every Hospital…

Gigabytes and Terabytes of Data


Android Apps

➢ When you install an Android app, you authorize it to read your messages, mails and phonebook, and to access your media etc.
Temple Run and Advertisement

[Figure: Temple Run screen, with the caption "Advertisements shown here"]


How do we solve this problem?
Distributed Processing

[Figure: a select query, with code written at one place, is sent to three host computers, each serving four terminals]

Unlimited Scaling!!
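
A minimal sketch of the idea, not Hadoop itself: the data is split into partitions, the same code runs on every partition, and the partial results are merged. Worker threads stand in for host computers here, and `partition`, `run_query` and `distributed_select` are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_hosts):
    """Split the rows into n_hosts chunks, one per (simulated) host computer."""
    return [rows[i::n_hosts] for i in range(n_hosts)]


def run_query(chunk):
    """The code written at one place: every host applies the same filter."""
    return [row for row in chunk if row["salary"] > 5000]


def distributed_select(rows, n_hosts=3):
    """Run the filter on every partition in parallel and merge the results."""
    chunks = partition(rows, n_hosts)
    with ThreadPoolExecutor(max_workers=n_hosts) as pool:
        partial_results = list(pool.map(run_query, chunks))
    return [row for part in partial_results for row in part]


# Hypothetical sample data: ten employees with salaries 1000..10000.
employees = [{"name": f"emp{i}", "salary": 1000 * i} for i in range(1, 11)]

result = distributed_select(employees)
print(sorted(row["salary"] for row in result))
```

Adding capacity then means raising `n_hosts` rather than buying a bigger single server, which is the horizontal scaling the slides point to.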
Hadoop
History of Hadoop

2003: Google introduces a file system for storing data, GFS (Google File System).

2004: Google introduces a software framework for processing data, called MapReduce.

2003-2004: Google publishes white papers on GFS (2003) and MapReduce (2004).

Yahoo worked from the white papers published by Google, and the outcome was Hadoop.
Who is the inventor of Hadoop?

Doug Cutting is the inventor of Hadoop.

Doug Cutting's young child played with a toy elephant named Hadoop, so that name was given to the technology, "Hadoop".
