Introduction To Apache Hadoop

Uploaded by

shashwat2010

Just a basic introduction on Hadoop to get started with it. What is hadoop? What is Map Reduce? What is structure of Hadoop?

Copyright:

Attribution Non-Commercial (BY-NC)

Agenda

- Need for a new processing platform (Big Data)
- Origin of Hadoop
- What is Hadoop and what it is not?
- Hadoop architecture
- Hadoop components (Common/HDFS/MapReduce)
- Hadoop ecosystem
- When should we go for Hadoop?
- Real world use cases
- Questions

Need for a new processing platform (Big Data)

What is Big Data?
- Twitter (over ~7 TB/day)
- Facebook (over ~10 TB/day)
- Google (over ~20 PB/day)

Where does it come from?

Why take so much pain?
- Information is everywhere, but where is the knowledge?
- Existing systems scale vertically (vertical scalability)

Why Hadoop? It scales horizontally (horizontal scalability).

Origin of Hadoop

- Seminal whitepapers by Google in 2004 on a new programming paradigm to handle data at internet scale
- Hadoop started as a part of the Nutch project
- In Jan 2006, Doug Cutting started working on Hadoop at Yahoo
- Factored out of Nutch in Feb 2006
- First release of Apache Hadoop in September 2007
- Jan 2008: Hadoop became a top-level Apache project

Hadoop distributions

- Amazon
- Cloudera
- MapR
- Hortonworks
- Microsoft Windows Azure
- IBM InfoSphere BigInsights
- Datameer
- EMC Greenplum HD Hadoop distribution
- Hadapt

What is Hadoop ?
- Flexible infrastructure for large-scale computation and data processing on a network of commodity hardware
- Completely written in Java
- Open source and distributed under the Apache license
- Components: Hadoop Common, HDFS and MapReduce

What Hadoop is not

- A replacement for existing data warehouse systems
- A file system
- An online transaction processing (OLTP) system
- A replacement for all programming logic
- A database

Hadoop architecture

High level view: NameNode (NN), DataNode (DN), JobTracker (JT), TaskTracker (TT)

HDFS (Hadoop Distributed File System)

- Default storage for the Hadoop cluster
- NameNode/DataNode
- The file system namespace (similar to our local file system)
- Master/slave architecture (1 master, 'n' slaves)
- Virtual, not physical
- Provides configurable replication (user specific)
- Data is stored as chunks (64 MB default, but configurable) across all the nodes
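
The chunking and replication points above can be made concrete with a little arithmetic. The following is a minimal sketch in plain Java, not Hadoop code: the class name BlockMath, the 200 MB file size and the replication factor of 3 are assumptions chosen for illustration (3 is the usual HDFS default, and the 64 MB chunk size comes from the slide).

```java
class BlockMath {
    // Number of chunks (blocks) needed for a file: ceiling of fileSize / blockSize.
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Total block copies the cluster holds once every block is replicated.
    static long replicaCount(long fileSizeBytes, long blockSizeBytes, int replication) {
        return blockCount(fileSizeBytes, blockSizeBytes) * replication;
    }

    // Raw bytes consumed across the cluster: each byte is stored once per replica.
    static long rawStorageBytes(long fileSizeBytes, int replication) {
        return fileSizeBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;   // default chunk size quoted on the slide
        long fileSize = 200 * mb;   // hypothetical file, for illustration only
        int replication = 3;        // assumed replication factor

        System.out.println(blockCount(fileSize, blockSize));                 // prints 4
        System.out.println(replicaCount(fileSize, blockSize, replication));  // prints 12
        System.out.println(rawStorageBytes(fileSize, replication) / mb);     // prints 600
    }
}
```

The last chunk of the 200 MB file holds only the remaining 8 MB, so raw usage is 600 MB rather than 4 x 64 MB x 3.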

HDFS architecture

Data replication in HDFS.

Rack awareness

Large Hadoop clusters are typically arranged in racks, and network traffic between nodes within the same rack is much more desirable than traffic across racks. In addition, the NameNode tries to place replicas of a block on multiple racks for improved fault tolerance. A default installation assumes all the nodes belong to the same rack.
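
The trade-off described above (keep traffic inside a rack, but spread replicas across racks) can be sketched as a small placement routine. This is an illustrative simplification in plain Java, not the NameNode's actual code: the class RackAwarePlacement, the node and rack names, and the deterministic "first match" selection are all assumptions. It follows the commonly described pattern for three replicas: one on the writer's node, one on a node in a different rack, and one on another node in that same remote rack.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RackAwarePlacement {
    // Picks up to three nodes: the writer's node, a node on a different rack,
    // then a second node on that same remote rack.
    static List<String> chooseReplicas(Map<String, String> nodeToRack, String writerNode) {
        List<String> chosen = new ArrayList<>();
        chosen.add(writerNode);
        String localRack = nodeToRack.get(writerNode);

        // Second replica: the first node found on any other rack.
        String remoteRack = null;
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (!e.getValue().equals(localRack)) {
                chosen.add(e.getKey());
                remoteRack = e.getValue();
                break;
            }
        }

        // Third replica: another node on the same remote rack, if one exists.
        if (remoteRack != null) {
            for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
                if (e.getValue().equals(remoteRack) && !chosen.contains(e.getKey())) {
                    chosen.add(e.getKey());
                    break;
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical two-rack topology; insertion order is preserved.
        Map<String, String> topology = new LinkedHashMap<>();
        topology.put("node1", "rackA");
        topology.put("node2", "rackA");
        topology.put("node3", "rackB");
        topology.put("node4", "rackB");

        System.out.println(chooseReplicas(topology, "node1")); // prints [node1, node3, node4]
    }
}
```

With every node on a single rack, as in a default installation, the routine returns only the writer's node, because this sketch has no same-rack fallback; a real cluster would still place the remaining replicas on other nodes.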

MapReduce

- Framework provided by Hadoop to process large amounts of data across a cluster of machines in parallel
- A job comprises three classes: a Mapper class, a Reducer class and a Driver class
- Runs on the JobTracker/TaskTracker daemons
- The reducer phase starts only after the mapper phase is done
- Each phase takes (k,v) pairs and emits (k,v) pairs

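import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Nested inside the job's outer class (hence "static").
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (word, 1) for every whitespace-separated token in the input line.
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}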
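
The mapper above covers only the first half of a word count job. To show how the (k,v) pairs travel from the map phase to the reduce phase, here is a minimal sketch that simulates the flow in plain Java with no Hadoop dependency. The class WordCountSimulation and its map, shuffle, reduce and run methods are hypothetical names for illustration; a real job would use Hadoop's Mapper, Reducer and Driver classes and run across many machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordCountSimulation {
    // Map phase: one input line in, a list of (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<String, Integer>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Shuffle: group every emitted value under its key, with keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs all map calls first, then shuffles, then reduces each key.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hadoop stores data", "hadoop processes data");
        System.out.println(run(lines)); // prints {data=2, hadoop=2, processes=1, stores=1}
    }
}
```

Note that run finishes every map call before any reduce call begins, which mirrors the rule that the reducer phase starts only after the mapper phase is done.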

MapReduce job flow

Modes of operation

- Standalone mode
- Pseudo-distributed mode
- Fully-distributed mode

Hadoop ecosystem

When should we go for Hadoop?

- Data is too huge
- Processes are independent
- Online analytical processing (OLAP)
- Better scalability
- Parallelism
- Unstructured data

Real world use cases

- Clickstream analysis
- Sentiment analysis
- Recommendation engines
- Ad Targeting
- Quality

What I have been doing

- Seismic Data Management & Processing
- WITSML Server & Drilling Analytics
- Permission Map management for Orchestra
- SDIS (just started)

Next steps: Get your hands dirty with code in a workshop on

- Hadoop Configuration
- HDFS Data loading
- MapReduce programming
- HBase
- Hive & Pig

QUESTIONS ?
