HDFS and Oracle

HDFS: Hadoop Distributed File System
Introduction
Johan Louwers, Lead Architect Oracle Technology
HDFS: Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity
hardware. It has many similarities with existing distributed file systems. However, the differences from
other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed
on low-cost hardware. HDFS provides high throughput access to application data and is suitable for
applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming
access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search
engine project. HDFS is now an Apache Hadoop subproject. The project URL
is http://hadoop.apache.org/hdfs/.
Hadoop HDFS introduction

Copyright 2014 Capgemini. All rights reserved.
HDFS Simple Cluster Setup

[Figure: Simple HDFS Cluster Setup]

A) An HDFS cluster consisting of a number of commodity servers.
B) A single server containing both a Name Node and a Data Node.
C) Multiple servers, each containing a Data Node.

HDFS Name Node
Primary index of where data is stored within the cluster.
Primary entry point for all clients (applications) that request access to HDFS.
Advisable to size the Name Node server larger than the Data Node servers, because the Name Node keeps the file system metadata in memory.
Option to run a Data Node instance on the same server as the Name Node.
Hadoop 2.0.0 and higher provide the option of a highly available Name Node setup. Prior to 2.0.0, the Name Node was a single point of failure.

HDFS Storage
A (large) file is chopped into blocks.

Blocks are written to the different data nodes
in the cluster.
The name node keeps track of which block is
written to which node.
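To make the split between the Name Node (the index) and the Data Nodes (the storage) concrete, the following Python sketch models the Name Node as a map from file to blocks. It is a simplified model, not how HDFS is implemented: the class NameNodeIndex and its round-robin placement are hypothetical, replication is ignored, and the 128 MB block size is an assumption that mirrors the Hadoop 2.x default for dfs.blocksize.

```python
from dataclasses import dataclass, field
from itertools import cycle

# Assumption: mirrors the Hadoop 2.x default for dfs.blocksize (128 MB).
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class NameNodeIndex:
    """Toy Name Node index: file path -> list of (block_id, data_node, size)."""
    block_map: dict = field(default_factory=dict)
    next_block_id: int = 0

    def add_file(self, path, file_size, data_nodes, block_size=BLOCK_SIZE):
        # Chop the file into full blocks plus one smaller block for the remainder.
        sizes = [block_size] * (file_size // block_size)
        if file_size % block_size:
            sizes.append(file_size % block_size)
        # Simplification: round-robin over the Data Nodes, one copy per block.
        nodes = cycle(data_nodes)
        entries = []
        for size in sizes:
            entries.append((self.next_block_id, next(nodes), size))
            self.next_block_id += 1
        self.block_map[path] = entries
        return entries

    def locate(self, path):
        """What a client asks the Name Node: which node holds each block?"""
        return [(block_id, node) for block_id, node, _ in self.block_map[path]]


if __name__ == "__main__":
    index = NameNodeIndex()
    index.add_file("/data/logs.txt", 300 * 1024 * 1024, ["dn1", "dn2", "dn3"])
    print(index.locate("/data/logs.txt"))  # [(0, 'dn1'), (1, 'dn2'), (2, 'dn3')]
```

A 300 MB file therefore becomes two full 128 MB blocks plus one 44 MB block; HDFS does not pad the last block to the full block size.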

HDFS Storage
Data blocks are replicated over different nodes in the cluster to ensure availability when a node fails.
The default replication factor is 3. It is configured with the dfs.replication property in the HDFS configuration (hdfs-site.xml).
On startup, the NameNode enters a special state called Safemode. Replication of data blocks does not occur while the NameNode is in the Safemode state.
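The replication factor directly drives raw storage consumption, and Safemode ends only once enough blocks have been reported by the Data Nodes. The Python sketch below illustrates both under simplifying assumptions: the function names are hypothetical, the constants mirror the usual defaults of dfs.replication (3), dfs.namenode.replication.min (1) and dfs.namenode.safemode.threshold-pct (0.999), and the extension period the real NameNode waits before leaving Safemode is left out.

```python
# Assumed defaults (see lead-in): dfs.replication, dfs.namenode.replication.min,
# dfs.namenode.safemode.threshold-pct.
REPLICATION = 3
MIN_REPLICATION = 1
SAFEMODE_THRESHOLD = 0.999


def raw_storage_bytes(file_size, replication=REPLICATION):
    """Raw disk space used by a file: every byte is stored `replication` times.

    HDFS does not pad the last block, so there is no rounding up to a block boundary.
    """
    return file_size * replication


def can_leave_safemode(reported_replicas, threshold=SAFEMODE_THRESHOLD,
                       min_replication=MIN_REPLICATION):
    """Simplified Safemode exit check.

    reported_replicas maps block ID -> number of Data Nodes that reported it.
    Returns True once the fraction of blocks with at least min_replication
    reported replicas reaches the threshold.
    """
    if not reported_replicas:
        return True
    safe = sum(1 for count in reported_replicas.values() if count >= min_replication)
    return safe / len(reported_replicas) >= threshold


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(raw_storage_bytes(one_gib) // one_gib)  # 3 (GiB of raw disk)
    print(can_leave_safemode({1: 3, 2: 0}))       # False: only 50% of blocks reported
    print(can_leave_safemode({1: 1, 2: 3}))       # True
```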

HDFS Storage
When operating a large cluster, ensure that you have enabled rack awareness.
Refer to the HADOOP-692 improvement for more details: http://goo.gl/dQ012n
Thanks to ChrisDag for the image
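Rack awareness is typically enabled by pointing the net.topology.script.file.name property (in core-site.xml) at a script that maps host names or IP addresses to rack IDs. Hadoop passes one or more hosts as arguments and reads one rack ID per host from standard output. A minimal Python sketch of such a script follows; the IP addresses and rack names are hypothetical placeholders for your own inventory.

```python
#!/usr/bin/env python3
"""Sketch of a rack topology script for HDFS rack awareness (hypothetical mapping)."""
import sys

# Hypothetical host -> rack mapping; replace with data from your own inventory.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
    "10.0.2.12": "/dc1/rack2",
}
# Rack ID Hadoop itself uses for hosts it cannot map.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack ID for one host name or IP address."""
    return RACKS.get(host, DEFAULT_RACK)


def format_output(hosts):
    """One rack ID per host, in argument order, separated by spaces."""
    return " ".join(resolve_rack(host) for host in hosts)


if __name__ == "__main__":
    # Hadoop passes the hosts to resolve as command-line arguments.
    print(format_output(sys.argv[1:]))
```

Running it with the arguments 10.0.1.11 10.0.2.12 prints /dc1/rack1 /dc1/rack2. Unknown hosts fall back to /default-rack, so a missing entry degrades placement quality rather than breaking the script.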
Typically, large Hadoop clusters are arranged in racks, and network traffic
between different nodes within the same rack is much more desirable than
network traffic across racks. In addition, the NameNode tries to place
replicas of a block on multiple racks for improved fault tolerance.
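The HDFS architecture guide describes the default placement for a replication factor of 3: one replica on the writer's node if the writer runs on a Data Node (otherwise on a random node), one on a node in a different rack, and the last on a different node in that same remote rack. The Python sketch below shows only that rule; place_replicas is a hypothetical helper, and the real NameNode also weighs free space and load and falls back to other placements where this sketch raises an error.

```python
import random
from collections import Counter


def place_replicas(writer, topology, rng=None):
    """Pick three data nodes for one block following the default rack-aware rule.

    topology maps data node name -> rack ID; writer is the node (or client) writing.
    Free space, load and fallback placements are deliberately ignored.
    """
    rng = rng or random.Random()
    nodes = sorted(topology)
    rack_sizes = Counter(topology.values())

    # Replica 1: the writer's own node if it is a data node, else a random node.
    first = writer if writer in topology else rng.choice(nodes)

    # Replica 2: a node on a different rack. Simplification: only racks with at
    # least two nodes qualify, so replica 3 can always share that rack.
    remote = [n for n in nodes
              if topology[n] != topology[first] and rack_sizes[topology[n]] >= 2]
    if not remote:
        raise ValueError("need a remote rack with at least two data nodes")
    second = rng.choice(remote)

    # Replica 3: a different node on the same remote rack as replica 2.
    third = rng.choice([n for n in remote
                        if topology[n] == topology[second] and n != second])
    return [first, second, third]


if __name__ == "__main__":
    topo = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
    print(place_replicas("dn1", topo))  # e.g. ['dn1', 'dn3', 'dn4']
```

This keeps two of the three replicas inside one rack, which limits cross-rack write traffic, while the block still survives the loss of an entire rack.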

HDFS Oracle & Big Data

Oracle Big Data Appliance Introduction
Oracle Big Data Appliance is a high-performance, secure platform for running
diverse workloads on Hadoop and NoSQL systems.


Oracle Big Data Appliance Introduction
Oracle Big Data Appliance includes (it almost goes without saying) an HDFS
storage component for storing data.


Oracle & Hadoop
Oracle XQuery for Hadoop


Oracle & Hadoop
Oracle SQL Connector for HDFS


Oracle & Hadoop
Oracle Loader for Hadoop

Online mode
Offline mode


Oracle & Hadoop
Oracle Loader for Hadoop

Online mode
Offline mode


Oracle & Hadoop
Oracle Big Data SQL


Oracle & Hadoop
Oracle Big Data SQL

Contact me
Johan Louwers
Capgemini Lead Architect Oracle Technology
Mail: Johan.Louwers@capgemini.com
Twitter: @johanlouwers
Blog 1: http://www.capgemini.com/blog/capgemini-oracle-blog
Blog 2: http://johanlouwers.blogspot.com

About Capgemini
With almost 140,000 people in over 40 countries, Capgemini is
one of the world's foremost providers of consulting, technology
and outsourcing services. The Group reported 2013 global
revenues of EUR 10.1 billion.
Together with its clients, Capgemini creates and delivers
business and technology solutions that fit their needs and drive
the results they want. A deeply multicultural organization,
Capgemini has developed its own way of working, the
Collaborative Business Experience, and draws on
Rightshore, its worldwide delivery model.
Learn more about us at www.capgemini.com.
www.capgemini.com
The information contained in this presentation is proprietary.
2014 Capgemini. All rights reserved.
Rightshore is a trademark belonging to Capgemini.