
Big Data

Q1. What is Big Data?


Big Data is a collection of data that is huge in volume and keeps growing exponentially with time. Such datasets are too large and complex to be processed using traditional computing techniques. Big Data is not a single technique or tool; rather, it has become a complete subject that involves various tools, techniques, and frameworks.

Q2. What is Hadoop?


Hadoop is an open-source framework that allows you to store and process big data in
a distributed environment across clusters of computers using simple
programming models. It is designed to scale from a single server to thousands
of machines, each offering local computation and storage.
Hadoop Architecture
At its core, Hadoop has two major layers:
1. Processing/Computation layer (MapReduce)
2. Storage layer (Hadoop Distributed File System, HDFS)
Hadoop Components
1. HDFS: The primary storage layer of Hadoop. It presents a single virtual file
system over the whole cluster and scales out by adding nodes.
2. YARN (Yet Another Resource Negotiator): Responsible for providing and
managing computational resources. It comprises the Resource Manager, Node
Managers, and per-application Application Masters.
3. Hadoop Common: A collection of shared libraries and utilities that the
other Hadoop modules depend on.

Q3. What is MapReduce?


MapReduce is a processing technique and a programming model for distributed
computing, originally implemented in Java. A MapReduce algorithm contains two
important tasks, namely Map and Reduce. The Map task takes a set of data and
converts it into another set of data, where individual elements are broken down
into tuples (key/value pairs). The Reduce task takes the output from a map as
its input and combines those data tuples into a smaller set of tuples. As the
name MapReduce implies, the reduce task is always performed after the map task.
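The map and reduce phases described above can be sketched in plain Python. This is a single-process illustration of the programming model only, not how Hadoop actually distributes work; the word-count task and all names here are illustrative:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: break each document into (key, value) tuples, here (word, 1)
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # The framework groups map output by key (the "shuffle" step);
    # Reduce then combines each group into a smaller set of tuples
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data is big", "data grows with time"]
counts = reduce_phase(map_phase(docs))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

In a real Hadoop job, many mappers run in parallel on different blocks of the input, and the shuffle step routes each key to one of many reducers.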
Q4. What is Hadoop Streaming?
Hadoop streaming is a utility that comes with the Hadoop distribution. This utility
allows you to create and run Map/Reduce jobs with any executable or script as
the mapper and/or the reducer.
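With streaming, the mapper and reducer are ordinary scripts that read lines from stdin and write tab-separated key/value records to stdout. The sketch below shows the logic such a pair of Python scripts would implement for word counting; in a real run each function would live in its own script passed to the streaming jar's -mapper and -reducer options, and Hadoop would sort the mapper output by key before the reducer sees it (simulated here with sorted()):

```python
from itertools import groupby

def mapper(lines):
    # Emit one "word<TAB>1" record per word, as a streaming
    # mapper script would write to stdout
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_records):
    # Records arrive sorted by key; sum the counts for each
    # run of identical keys
    keyed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

records = sorted(mapper(["hello world", "hello hadoop"]))
print(list(reducer(records)))  # ['hadoop\t1', 'hello\t2', 'world\t1']
```

Because the contract is just "lines in, lines out", the same scripts can be written in any language that reads stdin, which is the point of the streaming utility.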

Q5. What is Hadoop Ecosystem?


Below are the Hadoop components that together form the Hadoop ecosystem:
HDFS -> Hadoop Distributed File System
YARN -> Yet Another Resource Negotiator
MapReduce -> Data processing using programming
Spark -> In-memory Data Processing
PIG, HIVE-> Data Processing Services using Query (SQL-like)
HBase -> NoSQL Database
Mahout, Spark MLlib -> Machine Learning
Apache Drill -> SQL on Hadoop
Zookeeper -> Cluster Coordination
Oozie -> Job Scheduling
Flume, Sqoop -> Data Ingestion Services
Solr & Lucene -> Searching & Indexing
Ambari -> Provision, Monitor and Maintain cluster
