BDA Literature Review



ASSOSA UNIVERSITY

COLLEGE OF COMPUTING AND INFORMATICS


DEPARTMENT OF INFORMATION TECHNOLOGY

MSc Program in Information Technology


Literature Review Presentation
Course Title: Big Data

By Dejene Dagim
Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data

Introduction
1 Purpose of the Presentation
The purpose of this presentation is to provide an overview of the research conducted in this area, including the challenges faced and the solutions proposed.

2 Overview of Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data
The paper discusses using the MapReduce model to build a parallel, distributed algorithm that processes large datasets on a cluster.
The paper focuses on evaluating this technique for the efficient retrieval of large volumes of data.
The technique addresses the full range of concerns involved in managing a massive database of information, from storage and indexing techniques to the distribution of queries, scalability, and performance in heterogeneous environments.
Background
1 What is Replication-Based Query Management?
It is a technique for efficiently retrieving large volumes of data. It addresses the full range of concerns involved in managing a massive database of information, from storage and indexing techniques to the distribution of queries, scalability, and performance in heterogeneous environments.

2 Importance of Resource Allocation in Big Data Processing
Resource allocation is a crucial aspect of big data processing. It involves distributing computing resources among tasks in a way that maximizes overall efficiency and throughput.

3 Introduction to Hadoop and MapReduce


MapReduce is a programming model and an associated
implementation for processing and generating big data sets
with a parallel, distributed algorithm on a cluster.
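To make the model concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only imitates the three stages (map, shuffle, reduce) that Hadoop runs across a cluster; the function names are illustrative and are not part of any Hadoop API, and a real job would be written against Hadoop's Java MapReduce API or run through Hadoop Streaming.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper (illustrative): emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort (illustrative): group intermediate values by key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer (illustrative): combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    # On a Hadoop cluster each input split would be mapped on a different node;
    # here the three phases simply run one after another in one process.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["big data needs big clusters", "Hadoop runs MapReduce on clusters"]
    print(run_job(data))
    # {'big': 2, 'clusters': 2, 'data': 1, 'hadoop': 1, 'mapreduce': 1, 'needs': 1, 'on': 1, 'runs': 1}
```

Because each mapper sees only its own records and each reducer sees only one key's values, both phases can be spread across many machines without coordination; only the shuffle moves data between them.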
Replication-Based Query Management
Replication-Based Query Management Approach
The Replication-Based Query Management Approach is a technique for efficiently retrieving large volumes of data. It covers storage and indexing techniques, the distribution of queries, scalability, and performance in heterogeneous environments.

Use Cases and Real-World Applications
One use case of this approach is resource allocation using Hadoop and MapReduce over big data.

Load Balancing and Performance Improvement
Replication allows read queries to be distributed across multiple replicas, reducing the load on the primary database server.
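The load-balancing idea can be sketched as a small read/write router, assuming a single primary and a set of read replicas. The `ReplicaRouter` class, the host names and the round-robin policy are hypothetical illustrations, not the paper's method; treating any statement that starts with `SELECT` as a read is a deliberate simplification.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate over the replicas."""

    primary: str
    replicas: list[str]
    _cycle: itertools.cycle = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if not self.replicas:
            raise ValueError("at least one read replica is required")
        # Round-robin iterator over the replica hosts.
        self._cycle = itertools.cycle(self.replicas)

    def route(self, query: str) -> str:
        # Simplification: only SELECT statements are treated as reads.
        is_read = query.lstrip().upper().startswith("SELECT")
        return next(self._cycle) if is_read else self.primary


if __name__ == "__main__":
    router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
    for q in ["SELECT * FROM jobs", "UPDATE jobs SET state = 'done'", "SELECT COUNT(*) FROM jobs"]:
        print(router.route(q), "<-", q)
    # db-replica-1 <- SELECT * FROM jobs
    # db-primary <- UPDATE jobs SET state = 'done'
    # db-replica-2 <- SELECT COUNT(*) FROM jobs
```

Round-robin is the simplest policy that spreads reads evenly; a production router would also take replica lag and health into account before sending a read to a replica.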
Hadoop and MapReduce for Resource Allocation

Overview of Hadoop and MapReduce Framework
Hadoop is a software framework for the distributed processing of large datasets across clusters of computers using simple programming models.
MapReduce is a programming model for processing and generating large datasets on clusters of computers.

How Hadoop and MapReduce Enable Resource Allocation
Distributed Storage and Processing (HDFS): Hadoop's file system, HDFS, divides large datasets into smaller blocks and distributes them across multiple nodes in a cluster.
Parallel Processing (MapReduce Model): MapReduce breaks large-scale data processing jobs into smaller, independent map and reduce tasks.

Key Features and Advantages of Hadoop
Hadoop is a big data processing framework with multiple advantages. Some of its key features and advantages are:
- Fault tolerance and high availability
- Scalability
- Ease of use
- Fast data processing due to distributed processing
- Cost-effectiveness
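The HDFS behaviour described above (split a file into fixed-size blocks, then store each block on several nodes) can be illustrated with a short Python sketch. The 128 MiB block size and replication factor of 3 are the usual HDFS defaults (`dfs.blocksize`, `dfs.replication`); the helper functions and node names are hypothetical, and the simple rotation used for placement stands in for HDFS's real rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB, the default dfs.blocksize in current Hadoop releases
REPLICATION = 3                 # default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the byte length of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes by rotating the start node.

    Hypothetical simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    sizes = split_into_blocks(300 * 1024 * 1024)  # a 300 MiB file
    print([s // (1024 * 1024) for s in sizes])    # [128, 128, 44]
    print(place_replicas(len(sizes), ["node1", "node2", "node3", "node4"]))
    # {0: ['node1', 'node2', 'node3'], 1: ['node2', 'node3', 'node4'], 2: ['node3', 'node4', 'node1']}
```

Because every block lives on several nodes, MapReduce can usually run each map task on a node that already holds its block, and the loss of one node does not lose any data.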
Implementation and Performance Considerations

1 Factors to Consider when Implementing Replication-Based Query Management with Hadoop and MapReduce
- Data Distribution and Replication: HDFS block size, replication factor
- Query Optimization: MapReduce job design
- Resource Management and Scheduling: YARN configuration
2 Performance Considerations and Best Practices
- Develop and test a backup and restore strategy
- Monitor the replication topology
3 Case Studies and Success Stories
The paper evaluates a technique for the efficient retrieval of large volumes of data.
The technique covers storage and indexing techniques, the distribution of queries, scalability, and performance in heterogeneous environments.
The results show that the proposed work reduces data processing time by 30%.
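The implementation factors listed in this section (HDFS block size, replication factor, YARN configuration) interact in ways that a quick calculation makes visible. The sketch below is a back-of-the-envelope model under stated simplifications: one map task per block-sized input split, and container counts that ignore reducers, the ApplicationMaster and scheduler rounding. The `ClusterPlan` class and all numbers are hypothetical; the Hadoop property names in the comments are real settings that correspond to each field.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical capacity model for one MapReduce job."""

    input_bytes: int
    block_bytes: int       # HDFS block size (dfs.blocksize)
    replication: int       # replication factor (dfs.replication)
    nodes: int             # worker nodes running a NodeManager
    node_memory_mb: int    # memory YARN may allocate per node (yarn.nodemanager.resource.memory-mb)
    map_container_mb: int  # memory requested per map task (mapreduce.map.memory.mb)

    def map_tasks(self) -> int:
        # Roughly one map task per input split; by default a split is one block.
        return math.ceil(self.input_bytes / self.block_bytes)

    def raw_storage_bytes(self) -> int:
        # Every block is stored `replication` times across the cluster.
        return self.input_bytes * self.replication

    def concurrent_maps(self) -> int:
        # Ignores reducers, the ApplicationMaster container and allocation rounding.
        return self.nodes * (self.node_memory_mb // self.map_container_mb)

    def map_waves(self) -> int:
        slots = self.concurrent_maps()
        if slots == 0:
            raise ValueError("map containers do not fit on any node")
        return math.ceil(self.map_tasks() / slots)


if __name__ == "__main__":
    plan = ClusterPlan(input_bytes=1024**4, block_bytes=128 * 1024**2, replication=3,
                       nodes=10, node_memory_mb=48 * 1024, map_container_mb=2048)
    print(plan.map_tasks())                     # 8192
    print(plan.raw_storage_bytes() // 1024**4)  # 3 (TiB of raw disk)
    print(plan.concurrent_maps())               # 240
    print(plan.map_waves())                     # 35
```

Under these assumptions, doubling the block size to 256 MiB halves the map tasks to 4,096 and cuts the number of waves to 18, at the cost of coarser parallelism; raising the replication factor increases raw storage linearly but gives the scheduler more nodes holding a local copy of each block.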
Conclusion
1 Summary of Key Points
The paper proposes a replication-based query management technique for resource allocation using Hadoop and MapReduce over big data.
It uses the MapReduce model and Hadoop's functionality to develop a parallel, distributed algorithm for handling large datasets.
The technique focuses on efficient retrieval of large volumes of data, considering factors such as storage and indexing techniques, query distribution, scalability, and performance in heterogeneous environments.
Results show a 30% reduction in data processing time, indicating that the proposed technique is effective at optimizing resource allocation and improving efficiency in handling big data.
The technique also handles node failures and reduces delays through an improved rescheduling algorithm, keeping execution uninterrupted and minimizing resource wastage.

2 Future Possibilities and Advancements
Further research could focus on enhancing the replication-based query management technique to handle more complex and diverse types of data, such as unstructured or semi-structured data.
Integrating advanced clustering algorithms into Hadoop's heterogeneous clusters could also improve the performance of MapReduce job processing.
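The presentation does not describe the paper's improved rescheduling algorithm, so the following is only a generic sketch of locality-aware rescheduling after a node failure, not the authors' method. Hadoop already re-executes tasks that were running on a lost node; the sketch shows one plausible policy for choosing where they go: prefer a live node that holds a replica of the task's input block, otherwise the least-loaded live node. The function and the task and node names are hypothetical.

```python
def reschedule(
    assignments: dict[str, str],
    replica_nodes: dict[str, list[str]],
    failed_node: str,
    live_nodes: list[str],
) -> dict[str, str]:
    """Hypothetical policy: move every task on `failed_node` to a live node."""
    live = [n for n in live_nodes if n != failed_node]
    if not live:
        raise RuntimeError("no live nodes left to run tasks")
    result = dict(assignments)
    # Current number of tasks on each live node (tasks on the failed node are not counted).
    load = {node: 0 for node in live}
    for node in assignments.values():
        if node in load:
            load[node] += 1
    # Visit the orphaned tasks in a fixed order so the outcome is deterministic.
    for task in sorted(t for t, n in assignments.items() if n == failed_node):
        # Prefer live nodes that already store the task's input block (data locality).
        local = [n for n in replica_nodes.get(task, []) if n in load]
        candidates = local or live
        # Least-loaded candidate wins; ties are broken by node name.
        target = min(candidates, key=lambda n: (load[n], n))
        result[task] = target
        load[target] += 1
    return result


if __name__ == "__main__":
    before = {"t1": "n1", "t2": "n1", "t3": "n2", "t4": "n3"}
    replicas = {"t1": ["n1", "n2"], "t2": ["n1", "n3"]}
    print(reschedule(before, replicas, failed_node="n1", live_nodes=["n2", "n3", "n4"]))
    # {'t1': 'n2', 't2': 'n3', 't3': 'n2', 't4': 'n3'}
```

Here locality takes priority over balance, so `n4` stays idle; a different policy could trade some locality for a more even load.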
Thank You!
Q&A
