
Hadoop Ecosystems - Quick Reference.

02/03/2016

Master Planning ERP-NonSAP

Microsoft Mobile Oy

Karthika Janardhanan
Hadoop

j.karthika@tcs.com
Confidentiality Statement
Confidentiality and Non-Disclosure Notice
The information contained in this document is confidential and proprietary to TATA
Consultancy Services. This information may not be disclosed, duplicated or used for any
other purposes. The information contained in this document may not be released in
whole or in part outside TCS for any purpose without the express written permission of
TATA Consultancy Services.

Tata Code of Conduct


We, in our dealings, are self-regulated by a Code of Conduct as enshrined in the Tata
Code of Conduct. We request your support in helping us adhere to the Code in letter and
spirit. We request that any violation or potential violation of the Code by any person be
promptly brought to the notice of the Local Ethics Counselor or the Principal Ethics
Counselor or the CEO of TCS. All communication received in this regard will be treated
and kept as confidential.
Table of Contents

1. Introduction
2. Hadoop Ecosystems Quick Reference
1. Introduction
The world of Hadoop comprises a number of associated ecosystems. This document helps readers understand the
various Hadoop ecosystems and when to use each one, with an example for each.

It is intended as a handy reference when designing the overall solution approach for a given problem statement
using Hadoop ecosystems.
2. Hadoop Ecosystems Quick Reference

Each numbered entry below names the ecosystem and then gives what it is, when to use it, and an example.


1. HDFS (Hadoop Distributed File System)
What it is: The underlying data storage architecture used in the Hadoop ecosystem.
When to use:
   1. To store very large volumes of data, with files spread across multiple machines.
   2. To store files redundantly, so that the system is protected from data loss when a machine fails.
   3. To make the data available to applications for parallel processing.
Example: Hadoop cluster.
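The storage idea above (files split across machines, each piece kept more than once) can be sketched in a few lines of Python. This is an illustration only: the function names, node names and the round-robin placement are assumptions made for the example, and real HDFS uses its own rack-aware placement policy.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Return {block_index: [node, ...]} for a file of file_size bytes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        # Hypothetical round-robin placement, NOT the real HDFS policy.
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def readable_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [block for block, replicas in placement.items()
            if any(node != failed_node for node in replicas)]


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(file_size=300, block_size=128, nodes=nodes)
print(placement)
print(readable_blocks(placement, failed_node="node3"))
```

With three replicas per block, losing any single node still leaves every block readable, which is the point of redundant storage.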
2. HBase
What it is: A column-oriented NoSQL database, inspired by Google's Bigtable.
When to use:
   1. Huge volume of data.
   2. Need for high write throughput.
Example: Employee details stored in column families such as personal data, contact details and employment details.
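The employee example can be pictured as a nested mapping: row key, then column family, then column, then value. The sketch below models only that layout in plain Python. The class, its methods and the sample values are hypothetical and are not the HBase client API; real HBase also keeps timestamped versions of each cell, which is left out here.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy model of a table whose columns are grouped into families."""

    def __init__(self, families):
        self.families = set(families)
        # row key -> column family -> column qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family):
        # .get() avoids creating empty rows as a side effect of reading.
        return dict(self.rows.get(row_key, {}).get(family, {}))


employees = ColumnFamilyTable(["personal", "contact", "employment"])
employees.put("emp001", "personal", "name", "Asha")
employees.put("emp001", "contact", "email", "asha@example.com")
employees.put("emp001", "employment", "role", "Engineer")

print(employees.get("emp001", "personal"))
```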

3. MapReduce (MR)
What it is: A programming model for applying logical rules to raw data and deriving the expected output.
When to use:
   1. Huge volume of data.
   2. Complex logic is needed to derive the required output.
Example: Creating a weather report with the maximum average temperature for each city in the country from the raw weather data.
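The weather example can be sketched as a map step, a grouping (shuffle) step and a reduce step. The version below runs in a single Python process purely to show the shape of the computation; it is not Hadoop code. It assumes each raw record is a "city,average temperature" line, and the sample data is made up.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (city, temperature) pairs from raw 'city,temperature' lines."""
    for line in records:
        city, temperature = line.split(",")
        yield city.strip(), float(temperature)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Keep the maximum temperature seen for each city."""
    return {city: max(temps) for city, temps in grouped.items()}


raw_weather = ["Chennai,31.5", "Helsinki,12.0", "Chennai,33.0", "Helsinki,14.5"]
report = reduce_phase(shuffle(map_phase(raw_weather)))
print(report)  # {'Chennai': 33.0, 'Helsinki': 14.5}
```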

4. Pig
What it is: A scripting platform for processing data in a Hadoop cluster. Scripts are written in the Pig Latin language and execute step by step.
When to use:
   1. The raw data can be mapped to a schema.
   2. The expected output can be derived by applying built-in functions to the data.
Example: Deriving the maximum and minimum mark from student academic data.
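The two conditions above (data that fits a schema, output obtained by applying functions) can be illustrated with the student marks example. The sketch below is plain Python rather than Pig Latin, and the schema, field names and sample rows are assumptions made for the illustration.

```python
from typing import NamedTuple


class Mark(NamedTuple):
    """Assumed schema for one raw record."""
    student: str
    subject: str
    mark: int


def load(lines):
    """Step 1: map raw comma-separated lines onto the schema."""
    rows = []
    for line in lines:
        student, subject, mark = line.split(",")
        rows.append(Mark(student, subject, int(mark)))
    return rows


def min_max_by_student(rows):
    """Step 2: group by student and keep (minimum, maximum) mark."""
    result = {}
    for row in rows:
        low, high = result.get(row.student, (row.mark, row.mark))
        result[row.student] = (min(low, row.mark), max(high, row.mark))
    return result


raw = ["ravi,maths,82", "ravi,physics,74", "meena,maths,91", "meena,physics,95"]
print(min_max_by_student(load(raw)))
```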
5. Hive
What it is: An SQL-like interface to Hadoop that builds replicas of relational database tables in the Hadoop cluster. It is a data warehouse infrastructure that provides data summarization and ad hoc querying on top of Hadoop.
When to use: The raw data is present in a relational database table or can be presented in a relational fashion.
Example: Processing Oracle table data in a Hadoop cluster.
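The kind of summarization query meant here can be shown with a small relational example. The sketch uses Python's built-in sqlite3 module as a stand-in, only to show a table and a GROUP BY summary; it does not connect to Hive, and the table name and rows are invented. HiveQL accepts a similar SELECT ... GROUP BY form.

```python
import sqlite3

# In-memory database used as a stand-in for a relational view of the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Ad hoc summarization: total order amount per region.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(summary)  # [('north', 150.0), ('south', 80.0)]
```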
6. Sqoop
What it is: A suite of tools that connect Hadoop and database systems. It has a clear set of instructions to bring structured, static data from a conventional database into the Hadoop cluster.
When to use: The raw data is present in a conventional database and has to be imported into or exported from the Hadoop cluster.
Example: Importing from MySQL databases straight into your Hive data warehouse.
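A MySQL-to-Hive import is normally started from the Sqoop command line. The helper below only assembles such a command as a list of arguments so that its parts are visible; it does not run Sqoop. The host, database, table, user and password-file path are hypothetical placeholders, and the function itself is an illustration rather than part of Sqoop.

```python
import shlex


def sqoop_import_command(jdbc_url, table, username, password_file, mappers=1):
    """Build (but do not run) a 'sqoop import' command that loads into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to import
        "--hive-import",                   # load the result into Hive
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


command = sqoop_import_command(
    jdbc_url="jdbc:mysql://dbhost.example.com/hr",  # hypothetical host and database
    table="employees",                              # hypothetical table
    username="etl_user",                            # hypothetical user
    password_file="/user/etl/.db_password",         # hypothetical path
    mappers=4,
)
print(shlex.join(command))
```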
7. Flume
What it is: One of the Hadoop ecosystems, used to connect to streaming data and load it into the Hadoop cluster.
When to use: To bring streaming data into the Hadoop cluster.
Example: Loading Twitter tweets into the Hadoop cluster.
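A Flume agent moves events from a source, through a buffering channel, to a sink. The sketch below imitates that flow in plain Python with a bounded in-memory channel. The class and function names are illustrative assumptions rather than Flume's API, and the "tweets" are placeholder strings.

```python
from collections import deque


class Channel:
    """Bounded buffer sitting between the source and the sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            return False  # channel full, caller must drain first
        self.events.append(event)
        return True

    def take(self, batch_size):
        batch = []
        while self.events and len(batch) < batch_size:
            batch.append(self.events.popleft())
        return batch


def run_agent(source, channel, sink, batch_size=2):
    """Push every source event through the channel into the sink list."""
    for event in source:
        while not channel.put(event):
            sink.extend(channel.take(batch_size))  # drain when full
    while channel.events:
        sink.extend(channel.take(batch_size))      # flush what is left


tweets = ["tweet-1", "tweet-2", "tweet-3", "tweet-4", "tweet-5"]
loaded = []  # stands in for storage in the Hadoop cluster
run_agent(iter(tweets), Channel(capacity=2), loaded)
print(loaded)
```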
8. Oozie
What it is: A Java web application that acts as a workflow scheduler for Hadoop.
When to use: Scheduled execution of Hadoop data processing steps that involve one or more ecosystems.
Example: Batch processing.
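A workflow of this kind is a set of processing steps with dependencies between them. The sketch below models a three-step batch workflow and derives a valid execution order using the standard-library graphlib module (Python 3.9 or later). The step names are hypothetical, and real Oozie workflows are defined in XML rather than Python; only the ordering idea is shown.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it starts.
# Hypothetical batch workflow: import data, clean it, then summarize it.
workflow = {
    "sqoop_import": set(),
    "pig_clean": {"sqoop_import"},
    "hive_summary": {"pig_clean"},
}

execution_order = list(TopologicalSorter(workflow).static_order())
print(execution_order)  # ['sqoop_import', 'pig_clean', 'hive_summary']
```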
9. Hue (Hadoop User Experience)
What it is: A web UI for Hadoop, developed by Cloudera.
When to use: Enabling users to interact with the Hadoop cluster through a web UI.
Example: Custom applications built with its UI library.
10. Mahout
What it is: A machine-learning tool with distributed and scalable machine learning algorithms on the Hadoop platform.
When to use:
   1. Recommendation mining: takes users' behaviour and finds items that a specified user might like.
   2. Clustering: takes, for example, text documents and groups them based on related document topics.
   3. Classification: learns from existing categorized documents what documents of a specific category look like, and is able to assign unlabeled documents to the appropriate category.
   4. Frequent item set mining: takes a set of item groups and identifies which individual items typically appear together.
Example: Shopping cart content in online shopping sites.
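Frequent item set mining on shopping carts can be illustrated by counting how often pairs of items occur in the same cart. The sketch below is a brute-force, single-machine version in plain Python; it is not Mahout's implementation, and the function name, sample carts and support threshold are made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


carts = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "jam"],
]
print(frequent_pairs(carts, min_support=2))
```

Here bread and milk appear together in three carts and eggs and milk in two, so both pairs pass a support threshold of 2.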
Thank You

Contact

For more information, contact gsl.cdsfiodg@tcs.com

About Tata Consultancy Services (TCS)

Tata Consultancy Services is an IT services, consulting and business solutions
organization that delivers real results to global business, ensuring a level of certainty no
other firm can match. TCS offers a consulting-led, integrated portfolio of IT and IT-enabled
infrastructure, engineering and assurance services. This is delivered through its
unique Global Network Delivery Model™, recognized as the benchmark of excellence in
software development. A part of the Tata Group, India's largest industrial conglomerate,
TCS has a global footprint and is listed on the National Stock Exchange and Bombay
Stock Exchange in India.

For more information, visit us at www.tcs.com.

IT Services
Business Solutions
Consulting

All content / information present here is the exclusive property of Tata Consultancy Services Limited (TCS). The content /
information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced,
republished, uploaded, transmitted, posted or distributed in any form without prior written permission from TCS.
Unauthorized use of the content / information appearing here may violate copyright, trademark and other applicable laws,
and could result in criminal or civil penalties. Copyright 2011 Tata Consultancy Services Limited
