
AN OPTIMIZED AND EFFECTIVE BIG DATA HADOOP

PROCESSING ENVIRONMENT USING JOB SCHEDULERS

A mini project report submitted in partial fulfilment of the requirement for the
award of the degree of

BACHELOR OF TECHNOLOGY

in

COMPUTER SCIENCE AND ENGINEERING

Submitted by

ANIRUDH NALLANA 318129210006


CHINTHALAPUDI BHAGYA SREE 318129210011
ADAPA BABY LAHARI 318129210002

Under the Esteemed Guidance of


SRI. G. KALYAN CHAKRAVARTHI, B.Tech, M.Tech, (Ph.D)
Assistant Professor

and

Dr. S. S. V. R. Kumar Addagarla, B.Tech, M.Tech, Ph.D
Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


ENGINEERING AND TECHNOLOGY PROGRAM
GAYATRI VIDYA PARISHAD COLLEGE FOR DEGREE AND PG COURSES (A)
Rushikonda, Visakhapatnam – 45
(Approved by AICTE | Accredited by NBA | Accredited by NAAC | Affiliated to Andhra University)
2019-2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GAYATRI VIDYA PARISHAD COLLEGE FOR DEGREE AND PG COURSES (A)

Rushikonda, Visakhapatnam - 45

CERTIFICATE
This is to certify that the project report entitled “AN OPTIMIZED AND EFFECTIVE BIG
DATA HADOOP PROCESSING ENVIRONMENT USING JOB SCHEDULERS”, being
submitted by ANIRUDH NALLANA (318129210006), CHINTHALAPUDI BHAGYA SREE
(318129210011), and ADAPA BABY LAHARI (318129210002) in partial fulfilment of the
requirements for the award of the Degree of Bachelor of Technology in Computer Science and
Engineering to the Engineering and Technology Program, Gayatri Vidya Parishad College for
Degree and PG Courses (A), Visakhapatnam, is a record of bonafide work carried out under my
guidance and supervision.

Project Guide                                  Head of the Department

Sri. G. Kalyan Chakravarthi                    Dr. N. V. Ramana Murty
B.Tech, M.Tech, (Ph.D)                         M.Tech, Ph.D
Assistant Professor                            Professor

External Examiner
DECLARATION
We hereby declare that the project entitled “AN OPTIMIZED AND EFFECTIVE BIG DATA
HADOOP PROCESSING ENVIRONMENT USING JOB SCHEDULERS” is submitted in
partial fulfilment of the requirements for the award of the Degree of Bachelor of Technology in
Computer Science and Engineering to the Engineering and Technology Program, Gayatri Vidya
Parishad College for Degree and PG Courses (A). We further declare that this project has not
been submitted to any other university or college.

Name & Signature of the Students

ANIRUDH NALLANA 318129210006

CHINTHALAPUDI BHAGYA SREE 318129210011

ADAPA BABY LAHARI 318129210002

ACKNOWLEDGEMENTS
With great pleasure we take this opportunity to express our heartfelt gratitude to all the
people who helped in making this project a grand success.

First of all, we express our deep sense of gratitude to Sri. G. Kalyan Chakravarthi,
Assistant Professor, and Dr. S. S. V. R. Kumar Addagarla, Assistant Professor, for their constant
guidance throughout our project work.

We would like to thank Dr. N. V. Ramana Murty, Professor and Head of the Department of
Computer Science and Engineering, for his moral support throughout the period of our study.

ANIRUDH NALLANA (318129210006)


CHINTHALAPUDI BHAGYA SREE (318129210011)
ADAPA BABY LAHARI (318129210002)

ABSTRACT
MapReduce is emerging as an important programming model for large-scale, data-parallel
applications such as web indexing, data mining, and scientific simulation. Hadoop, an
open-source implementation of MapReduce enjoying wide adoption, is used both for short jobs
where low response time is critical and for large jobs whose resource-intensive analysis
operations must run efficiently and reliably, without failures or compromises. The architecture of
the Hadoop distributed processing environment is closely tied to its task scheduler, since it is the
scheduler's job to manage and dispatch tasks across the loosely coupled nodes of the distributed
computing environment built on the Hadoop Distributed File System (HDFS). It is the job of the
MapReduce daemon to configure and provide an interface to the underlying components of
Hadoop. In this project, we therefore configure the MapReduce daemon as well as the Hadoop
processing environment to create a customised environment that arguably outperforms the
native, pre-packaged Hadoop software.
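
To make the processing model concrete, the following is a minimal, illustrative Java MapReduce
job (the classic word-count pattern) that also tags the job with a scheduler queue through the
standard mapreduce.job.queuename property. The queue name "small_jobs" and the class names
are assumptions made here for illustration; they are not the project's actual configuration.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Submit the job to a (hypothetical) queue managed by the cluster scheduler.
        conf.set("mapreduce.job.queuename", "small_jobs");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Which scheduler actually honours that queue (Capacity, Fair, or a custom one) is decided by the
cluster-side configuration of the resource manager, not by the job itself.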

Most importantly, this project addresses the problem of job scheduling, that is, how to allocate
the resources of a cluster among a number of concurrent jobs, with a focus on Hadoop. Our
solution implements a size-based, preemptive scheduling discipline. We design a new scheduling
protocol that caters to both fair and efficient utilization of cluster resources while striving to
achieve short response times. Our approach satisfies both the interactivity requirements of
“small” jobs and the performance requirements of “large” jobs, which can thus coexist in a
cluster without manual setup and complex tuning.
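
As a rough illustration of what a size-based, preemptive discipline means, the sketch below
simulates a single resource under shortest-remaining-processing-time ordering: the job with the
least remaining work always runs, and a newly submitted smaller job preempts a larger running
one. The class names (SrptSketch, JobInfo) and the single-machine model are simplifying
assumptions for illustration only; the protocol designed in this project targets a full Hadoop
cluster.

import java.util.Comparator;
import java.util.PriorityQueue;

final class JobInfo {
    final String name;
    double remaining;   // remaining "size" of the job, e.g. seconds of work left
    JobInfo(String name, double size) { this.name = name; this.remaining = size; }
}

public class SrptSketch {
    // Waiting jobs ordered by remaining work: the smallest job is served first.
    private final PriorityQueue<JobInfo> queue =
        new PriorityQueue<>(Comparator.comparingDouble(j -> j.remaining));
    private JobInfo running;

    // A newly arrived job preempts the running one if it is strictly smaller.
    void submit(JobInfo job) {
        if (running == null) {
            running = job;
        } else if (job.remaining < running.remaining) {
            queue.add(running);          // preempt: park the larger job
            running = job;
        } else {
            queue.add(job);
        }
    }

    // Advance simulated time by dt, finishing jobs as their work runs out.
    void tick(double dt) {
        while (dt > 0 && running != null) {
            double used = Math.min(dt, running.remaining);
            running.remaining -= used;
            dt -= used;
            if (running.remaining <= 0) {
                System.out.println(running.name + " finished");
                running = queue.poll();  // pick the next smallest waiting job
            }
        }
    }

    public static void main(String[] args) {
        SrptSketch s = new SrptSketch();
        s.submit(new JobInfo("large-analytics", 100));
        s.tick(10);
        s.submit(new JobInfo("small-interactive", 5)); // preempts the large job
        s.tick(200);
    }
}

Running the main method submits a large job and then a small one; the small job finishes first
even though it arrived later, which is exactly the short-response-time behaviour the proposed
protocol targets for “small” jobs.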

TABLE OF CONTENTS
CHAPTER Page No

1. Introduction 1

2. Literature Survey 12
2.1 Survey Paper 1
2.2 Survey Paper 2
2.3 Survey Paper 3

3. Requirements and Analysis 14


3.1 Existing System
3.2 Proposed System
3.3 Functional Requirements
3.4 Non-Functional Requirements
3.5 Domain Requirements

4. System Design 16
4.1 Design Goals
4.2 UML Diagrams
4.3 Data Flow Diagram

5. Implementation 26

6. Testing 32
7. Results 34
8. Sample Code 40
9. Conclusion 48
10. References 49
