
COLLEGE OF COMPUTING AND INFORMATICS

DEPARTMENT OF INFORMATION SYSTEM

Title: Seminar report on Big Data

Name Id /No

Bekele Bayisa ………………………………………………..1633/10

May, 2021

Haramaya- Ethiopia
Abstract

Big data is a collection of data sets so large and complex that it becomes difficult to process
using on-hand database management tools or traditional data processing applications. It is data
whose scale, diversity, and complexity require new architectures, techniques, algorithms, and
analytics to manage it and to extract value and hidden knowledge from it. Big data is
characterized by the amount of data, the types of data, the speed of data, and the reliability of
data. This seminar report clarifies what big data is, what lies behind it, and what its
application areas are. It gives a detailed description of big data, the technologies used, its
applications in different areas, its advantages and disadvantages, and examples. We have tried
to explain how big data works and where it is applied so that all details relevant to the topic
are included in the report.

Acknowledgement

First of all, we would like to thank Almighty God for giving us the strength to carry out our
seminar from start to end. We would also like to thank the department of Information Technology
for giving us such a wonderful opportunity to expand our knowledge and for giving us guidelines
for presenting the seminar report.
Secondly, we express our sincere gratitude to Mr. Tilahun Melese (MSc.), who assisted us
throughout the preparation of this topic. We thank him for providing us with reinforcement,
confidence and, most importantly, direction on the topic whenever we needed it.

Contents
Abstract
Acknowledgement
1 Introduction
2 General Motivation
3 Goal of the paper
4 Overview about the previous technology
5 Literature review
6 Detail description about the big data
6.1 Big Data Technologies
6.1.1 Apache Hadoop
6.1.2 Apache Spark
6.2 Big data Examples
7 How do big data work
7.1 Integration
7.2 Management
7.3 Analysis
8 Architecture of big data
8.1 Big Data Architecture Layers
8.2 Big Data Architecture Processes
9 Advantages and Disadvantage of Big data
9.1 Advantage of big data
9.2 Disadvantage of big data
10 Big data applications area
10.1 Big Data in Healthcare Industry
10.2 Big Data in Banking Sector
10.3 Big Data in Academia
10.4 Applications of Big Data in Tourism
11 Evaluation of the result
12 Conclusions
13 Reference

1 Introduction

Big data refers to data volumes in the range of exabytes and beyond [1]. Such volumes exceed
the capacity of current online storage and processing systems. With characteristics such as
volume, velocity, and variety, big data poses challenges to traditional IT establishments.
Computer-assisted innovation, real-time data analytics, customer-centric business intelligence,
industry-wide decision making, and transparency are a few of the possible advantages of big
data. Big data is a term for massive data sets with a large, varied, and complex structure that
are difficult to store, analyze, and visualize for further processing or results [2].
In simple terms, big data is a vast amount of data so complex and unorganized that it cannot be
stored or processed with traditional database management tools or data processing applications.
As more and more data is generated, it has become a big challenge for traditional architectures
and infrastructures to process large amounts of data within acceptable time and resources. To
extract value from this data efficiently, organizations need new tools and methods specialized
for big data processing. For this reason, big data analytics has become a key factor for
companies to reveal hidden information and gain competitive advantages in the market. Hence,
big data is a broad term for data sets so large or complex that traditional data processing
applications are inadequate. Challenges include analysis, capture, data curation, search,
sharing, storage, transfer, visualization, and information privacy. The term often refers
simply to the use of predictive analytics or other advanced methods to extract value from data,
and seldom to a particular size of data set. Accuracy in big data may lead to more confident
decision making, and better decisions can mean greater operational efficiency, cost reductions,
and reduced risk.

2 General Motivation

The motivation for this seminar report is to gain deep information about big data. Big data
technologies were initiated to solve problems that relational databases cannot handle: they
manage data that traditional database systems would not be able to process. Many world-class
problems are being addressed with big data, since big data processing handles many different
types of data very quickly.

3 Goal of the paper

The general objective of this seminar project is to introduce big data to people who do not yet
know about it, so that they can get detailed information about this technology.

4 Overview about the previous technology


Traditional data is structured data, which is maintained by all types of businesses, from very
small to big organizations. A traditional database system uses a centralized database
architecture to store and maintain the data in a fixed format or in fixed fields in a file.
Structured Query Language (SQL) is used for managing and accessing the data.
Traditional data systems have the following limitations:
• It is difficult to maintain accuracy and confidentiality when the quantity of data is high,
and storing such a massive quantity of data is expensive. This affects data analysis, which in
turn decreases the accuracy and confidentiality of the end result.
• It is not possible to store a very large amount of data; only a certain amount can be stored.
• A traditional database is mainly meant for structured data in a fixed format, not for data
stored in different or mixed formats in a file.
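
The fixed-format storage and SQL access described in this section can be illustrated with a
minimal sketch. It uses Python's built-in sqlite3 module; the table name, column names, and
sample rows are hypothetical and are not taken from any system discussed in this report.

```python
import sqlite3


def build_customer_db():
    """Create an in-memory relational table with a fixed schema (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    # Every row must fit this fixed format: three named, typed fields.
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
        [(1, "Abebe", "Harar"), (2, "Sara", "Dire Dawa"), (3, "Lensa", "Harar")],
    )
    conn.commit()
    return conn


def customers_in_city(conn, city):
    """Access the structured data with an SQL query; names are returned in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    return [name for (name,) in rows]


conn = build_customer_db()
harar_customers = customers_in_city(conn, "Harar")
print(harar_customers)
```

A record that does not match the declared columns (for example a photo or free-form text)
has no natural place in such a table, which is the limitation the list above points to.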

5 Literature review

A systematic literature review of papers on big data in healthcare published between 2010 and
2015 was conducted. The paper reviews the definition, process, and use of big data in healthcare
management. Unstructured data is growing much faster than semi-structured and structured data:
90 percent of big data is in the form of unstructured data. The major steps of big data
management in the healthcare industry are data acquisition, data storage, data management,
data analysis, and data visualization. Recent research targets big data visualization tools. In
this paper the authors analysed the effective tools used for visualization of big data and
suggested new visualization tools for managing big data in the healthcare industry. The article
is helpful for understanding the processes and use of big data in healthcare management.
Keywords: Big Data, Data Acquisition, Data Storage, Data Analytics, Data Visualization,
Healthcare Management

6 Detail description about the big data

Big data is a collection of data that is huge in volume and still growing exponentially with
time. Its size and complexity are so large that no traditional data management tool can store
or process it efficiently. Big data is still data, only of a huge size.

It is a term that describes the large volume of data both structured and unstructured that
inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s
what organizations do with the data that matters. Big data can be analyzed for insights that lead
to better decisions and strategic business moves.

Companies increasingly use big data to outperform their peers. In most industries, existing
competitors and new entrants alike will use strategies derived from the analyzed data to
compete, innovate, and capture value.

Big data helps organizations create new growth opportunities, and it gives rise to entirely new
categories of companies that combine and analyze industry data. These companies have ample
information about products and services, buyers and suppliers, and consumer preferences that
can be captured and analyzed.
Big data refers to massive, complex structured and unstructured data sets that are rapidly
generated and transmitted from a wide variety of sources. These attributes make up the three Vs
of big data:
Volume: The huge amounts of data being stored.
Velocity: The lightning speed at which data streams must be processed and analyzed.
Variety: The different sources and forms from which data is collected, such as numbers, text,
video, images, and audio.
Together, these attributes describe massive collections of valuable information that companies
and organizations need to manage, store, visualize, and analyze.

Traditional data tools aren't equipped to handle this kind of complexity and volume.

6.1 Big Data Technologies

Big data technologies are software utilities designed for analyzing, processing, and extracting
information from large unstructured data sets that cannot be handled with traditional data
processing software. Companies require big data processing technologies to analyze massive
amounts of real-time data, and they use these technologies to come up with predictions that
reduce the risk of failure. There are many technologies that address the problem of big data
storage and processing, such as Apache Hadoop and Apache Spark.

6.1.1 Apache Hadoop


Apache Hadoop is one of the most widely used big data tools. It is an open-source software
framework developed by the Apache Software Foundation for storing and processing big data.
Hadoop stores and processes data in a distributed computing environment across a cluster of
commodity hardware. It is an inexpensive, fault-tolerant, and highly available framework that
can process data of any size and format. It is written in Java, and at the time of writing the
current stable version is Hadoop 3.1.3. The Hadoop Distributed File System (HDFS) provides
reliable storage by keeping replicated copies of the data on several nodes of the cluster.
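
To give a feel for how Hadoop-style processing splits work into independent steps, the
following is a minimal single-machine sketch of the MapReduce programming model (map, shuffle,
reduce) applied to word counting. It is plain Python and does not use the Hadoop API; the
function names and the two sample documents are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents):
    pairs = []
    # On a real cluster each document (or file block) could be mapped on a different node;
    # here the documents are simply processed one after another.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data is big", "data is valuable"])
print(counts)
```

Because each map call looks at only one document and each reduce call at only one word, the
work can be spread over many machines, which is the idea behind distributed processing on
commodity hardware.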

6.1.2. Apache Spark


Apache Spark is another popular open-source big data tool, designed with the goal of speeding up
Hadoop big data processing. The main objective of the Apache Spark project was to keep the
advantages of MapReduce's distributed, scalable, fault-tolerant processing framework while
making it more efficient and easier to use. Spark delivers speed through in-memory computing,
supports both real-time and batch processing, and provides high-level APIs in Java, Scala, and
Python.

6.2 Big data Examples
Social media: Statistics show that more than 500 terabytes of new data are ingested into the
databases of the social media site Facebook every day. This data is mainly generated by photo
and video uploads, message exchanges, comments, etc.

Jet engines: A single jet engine can generate more than 10 terabytes of data in 30 minutes of
flight time. With many thousands of flights per day, the data generated reaches many petabytes.
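
The jet engine figure can be turned into a rough daily estimate. In the sketch below only the
rate of 10 terabytes per 30 minutes comes from the text; the average flight duration, the
number of flights per day, and the choice of one engine per flight are assumptions chosen purely
to make the arithmetic concrete.

```python
TB_PER_HALF_HOUR = 10      # from the text: 10+ terabytes per 30 minutes of flight
FLIGHT_HOURS = 2           # assumption: average flight duration in hours
FLIGHTS_PER_DAY = 5_000    # assumption: stands in for "many thousand flights per day"
TB_PER_PB = 1_000          # decimal units: 1 petabyte = 1,000 terabytes


def daily_volume_pb(tb_per_half_hour, flight_hours, flights_per_day):
    """Estimate the data generated per day in petabytes, counting one engine per flight."""
    tb_per_flight = tb_per_half_hour * 2 * flight_hours  # two half-hours per flight hour
    return tb_per_flight * flights_per_day / TB_PER_PB


daily_pb = daily_volume_pb(TB_PER_HALF_HOUR, FLIGHT_HOURS, FLIGHTS_PER_DAY)
print(f"{daily_pb:.0f} PB per day")
```

Under these assumptions one flight produces 40 terabytes and the daily total is 200
petabytes, which is consistent with the "many petabytes" stated above.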

Figure 6.1

7 How do big data work

Handling so much data requires a stable and well-structured infrastructure. The infrastructure
must quickly process huge volumes and different types of data, which can overload a single
server or cluster. This is why a well-thought-out system is needed behind big data.

All the processes should be planned according to the capacity of the system, which for larger
companies can potentially demand hundreds or thousands of servers.

To understand how big data works, we need to know the following three concepts.

7.1 Integration
Big data is always collected from many sources, and because we are speaking of enormous loads
of information, new strategies and technologies to handle it need to be found. In some cases
petabytes of information flow into the system, so it is a challenge to integrate such a volume
of information. We have to receive the data, process it, and format it in the form that our
business needs and that our customers can understand.

7.2 Management
We need a place to store the data. The storage solution can be in the cloud, on-premises, or
both. We can also choose the form in which the data is stored, so that it is available in real
time on demand.

7.3 Analysis
Once the data has been received and stored, we need to analyze it so that we can use it. We can
explore the data and use it to make important decisions, such as finding out which features our
customers search for most, or use it to share research. Because setting up this infrastructure
is a big investment, the data should be put to work.
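
The three steps of this chapter (integration, management, analysis) can be sketched end to end
on a toy scale. The two input sources, their field names, and the use of a Python list as the
"storage" are all hypothetical stand-ins; a real system would read from many large sources and
write to cloud or on-premises storage.

```python
import csv
import io
import json
from collections import Counter

# Two hypothetical sources that describe the same kind of event in different formats.
CSV_SOURCE = "customer,feature\nabebe,search\nsara,checkout\n"
JSON_SOURCE = '[{"user": "lensa", "action": "search"}, {"user": "abebe", "action": "search"}]'


def integrate(csv_text, json_text):
    """Integration: receive data from both sources and convert it to one common format."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({"customer": row["customer"], "feature": row["feature"]})
    for item in json.loads(json_text):
        records.append({"customer": item["user"], "feature": item["action"]})
    return records


def store(records, storage):
    """Management: keep the integrated records (a list stands in for real storage)."""
    storage.extend(records)
    return storage


def analyze(storage):
    """Analysis: count how often each feature was used by customers."""
    return Counter(record["feature"] for record in storage)


storage = store(integrate(CSV_SOURCE, JSON_SOURCE), [])
feature_counts = analyze(storage)
print(feature_counts.most_common(1))
```

The analysis step here answers the question raised in the text, namely which feature
customers use most, but only after the two differently shaped sources have been brought into a
single record format.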

8 Architecture of big data

Big data architecture refers to the logical and physical structure that dictates how high volumes
of data are ingested, processed, stored, managed, and accessed.
A big data architecture is designed to handle the ingestion, processing, and analysis of data that
is too large or complex for traditional database systems.

Big data architecture is the overarching system used to ingest and process enormous amounts of
data (often referred to as "big data") so that it can be analyzed for business purposes. The
architecture can be considered the blueprint for a big data solution based on the business needs
of an organization.

Figure 8.1 Big data architecture
8.1 Big Data Architecture Layers
Most big data architectures include some or all of the following components:
Data sources: All big data solutions start with one or more data sources. Examples include:
• Application data stores, such as relational databases.
• Static files produced by applications, such as web server log files.
• Real-time data sources, such as IoT devices.
Data storage: Data for batch processing operations is typically stored in a distributed file store
that can hold high volumes of large files in various formats. This kind of store is often called a
data lake. Options for implementing this storage include Azure Data Lake Store or blob
containers in Azure Storage.
Batch processing: Because the data sets are so large, often a big data solution must process data
files using long-running batch jobs to filter, aggregate, and otherwise prepare the data for
analysis. Usually, these jobs involve reading source files, processing them, and writing the output
to new files.

Real-time message ingestion: If the solution includes real-time sources, the architecture must
include a way to capture and store real-time messages for stream processing. This might be a
simple data store, where incoming messages are dropped into a folder for processing. However,
many solutions need a message ingestion store to act as a buffer for messages, and to support
scale-out processing, reliable delivery, and other message queuing semantics. Options include
Azure Event Hubs, Azure IoT Hubs, and Kafka.

Stream processing: After capturing real-time messages, the solution must process them by
filtering, aggregating, and otherwise preparing the data for analysis. The processed stream data is
then written to an output sink. Azure Stream Analytics provides a managed stream processing
service based on perpetually running SQL queries that operate on unbounded streams. You can
also use open-source Apache streaming technologies like Storm and Spark Streaming in an
HDInsight cluster.

Analytical data store: Many big data solutions prepare data for analysis and then serve the
processed data in a structured format that can be queried using analytical tools. The analytical
data store used to serve these queries can be a Kimball-style relational data warehouse, as seen in
most traditional business intelligence (BI) solutions. Azure Synapse Analytics provides a
managed service for large-scale, cloud-based data warehousing. HDInsight supports Interactive
Hive, HBase, and Spark SQL, which can also be used to serve data for analysis.

Analysis and reporting: The goal of most big data solutions is to provide insights into the data
through analysis and reporting. To empower users to analyze the data, the architecture may
include a data modeling layer, such as a multidimensional OLAP cube or tabular data model in
Azure Analysis Services. It might also support self-service BI, using the modeling and
visualization technologies in Microsoft Power BI or Microsoft Excel. Analysis and reporting can
also take the form of interactive data exploration by data scientists or data analysts.

Orchestration: Most big data solutions consist of repeated data processing operations,
encapsulated in workflows, that transform source data, move data between multiple sources and
sinks, load the processed data into an analytical data store, or push the results straight to a
report or dashboard. To automate these workflows, you can use an orchestration technology such
as Azure Data Factory or Apache Oozie and Sqoop.
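
The stream processing component described above filters and aggregates real-time messages
before writing them to an output sink. A minimal sketch of one common aggregation, counting
events in fixed non-overlapping time windows, is given below. It is plain Python rather than
any of the named services; the event format (timestamp in seconds, sensor id) and the sample
events are assumptions, and a finite list is processed for simplicity, whereas a real stream
processor would emit each window as soon as it closes.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per sensor in fixed, non-overlapping windows of window_seconds.

    events is an iterable of (timestamp_in_seconds, sensor_id) pairs.
    Returns a dict mapping each window start time to {sensor_id: count}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, sensor_id in events:
        # Integer division assigns every event to the window that contains its timestamp.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][sensor_id] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


events = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (14, "b"), (21, "a")]
window_counts = tumbling_window_counts(events, window_seconds=10)
print(window_counts)
```

Each window's result is small compared with the raw messages, which is why the aggregated
output, rather than the raw stream, is usually what gets written to the analytical data store.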

8.2 Big Data Architecture Processes

• Connecting to Data Sources: connectors and adapters are capable of efficiently connecting
any format of data and can connect to a variety of different storage systems, protocols, and
networks.
• Data Governance: includes provisions for privacy and security, operating from the moment
of ingestion through processing, analysis, storage, and deletion.
• Systems Management: highly scalable, large-scale distributed clusters are typically the
foundation for modern big data architectures, which must be monitored continually via
central management consoles.
• Protecting Quality of Service: The Quality-of-Service framework supports the defining of
data quality, compliance policies, and ingestion frequency and sizes.

9 Advantages and Disadvantage of Big data

9.1. Advantage of big data


The biggest advantage of big data is that it opens up new possibilities for organizations.
Improved operational efficiency, improved customer satisfaction, a drive for innovation, and
maximized profits are only a few among the many benefits of big data.
❖ Cost savings: Although implementing real-time analytics tools may be expensive, it will
eventually save a lot of money. Tools such as Hadoop and cloud-based analytics can bring cost
advantages to a business when large amounts of data are to be stored, and these tools also
help in identifying more efficient ways of doing business.
❖ Time reductions: The high speed of tools like Hadoop and in-memory analytics makes it easy
to identify new sources of data, which helps businesses analyze data immediately and make
quick decisions based on what they learn.
❖ Better sales insights: These could lead to additional revenue. Real-time analytics tell a
business exactly how its sales are doing, and if an internet retailer sees that a product is
doing extremely well, it can take action to avoid missing out on or losing revenue.
❖ Control of online reputation: Big data tools can do sentiment analysis, so you can get
feedback about who is saying what about your company. If you want to monitor and improve the
online presence of your business, these tools can help.
❖ Understanding market conditions: By analyzing big data you can get a better understanding of
current market conditions. For example, by analyzing customers' purchasing behavior, a company
can find out which products sell the most and produce products according to this trend. In
this way it can get ahead of its competitors.
❖ Increased productivity: Modern tools allow analysts to analyze more data more quickly, which
increases their personal productivity.
❖ Fraud detection: One of the big advantages of analytics systems that rely on machine learning
is that they are excellent at detecting patterns and anomalies. These abilities enable banks
and credit card companies to spot stolen credit cards or fraudulent purchases, often before
the cardholder even knows that something is wrong.
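
The anomaly detection mentioned under fraud detection can be illustrated with a very small
statistical stand-in. The sketch below does not use machine learning; it flags purchase amounts
that lie far from the median, using a modified z-score based on the median absolute deviation.
The threshold of 3.5, the scaling constant 0.6745, and the sample amounts are assumptions for
illustration, and a production fraud system would use far richer features and models.

```python
import statistics


def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold.

    The score is 0.6745 * |amount - median| / MAD, where MAD is the median
    absolute deviation. An empty list is returned when MAD is zero.
    """
    median = statistics.median(amounts)
    deviations = [abs(amount - median) for amount in amounts]
    mad = statistics.median(deviations)
    if mad == 0:
        return []
    return [a for a in amounts if 0.6745 * abs(a - median) / mad > threshold]


# Hypothetical card purchases: six ordinary amounts and one unusually large one.
purchases = [25.0, 30.0, 27.5, 32.0, 29.0, 31.0, 950.0]
suspicious = flag_anomalies(purchases)
print(suspicious)
```

The median is used instead of the mean so that the single extreme purchase does not distort
the baseline against which it is being judged.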

9.2 Disadvantage of big data

✓ The quality of data: Big data is usually semi-structured or unstructured, and its quality is
not always up to the mark. Organizations need to ensure that the collected data is accurate and
precise and that its format is appropriate. If quality problems prevail, the insights drawn
from the data are worthless.
✓ Rapid change: Technology improves every month and replaces previous versions, so many big
companies cannot keep up with the requirements of deploying these tools. Sometimes this rapid
change can lead to a mess in the business.
✓ Lack of professionals: People who analyze big data to find valuable insights for increasing
the productivity of a business are called big data analysts, but people with these skills are
not always available. Few people know how to analyze such data, and the shortage of talented
people who can work on big data analytics is one of the big disadvantages of big data.
✓ Compliance: Keeping big data in data stores requires compliance with government regulations.
Almost all of the information in companies' big data stores is sensitive or personal, so firms
may need to meet industry standards or government requirements while maintaining, handling,
and storing the data.
✓ Hardware needs: Some organizations use the cloud for big data analytics, but this does not
fully solve the infrastructure problem. IT infrastructure is needed to support big data
analytics in large organizations. Storage space for housing the data, networking bandwidth for
transferring it to and from analytics systems, and computing resources to perform the
analytics are costly to buy and costly to maintain.
✓ Cost factor: Big data analytics is an expensive process with many associated costs, including
hardware, technology, storage and maintenance, tool deployment, and hiring talented staff.
Working on the analysis of big data requires a high investment.

10 Big data applications area

10.1 Big Data in Healthcare Industry

Healthcare is yet another industry which is bound to generate a huge amount of data. Following
are some of the ways in which big data has contributed to healthcare:

• Big data reduces the cost of treatment, since there is less chance of having to perform
unnecessary diagnoses.
• It helps in predicting outbreaks of epidemics and in deciding what preventive measures can be
taken to minimize their effects.
• It helps avoid preventable diseases by detecting them in their early stages. This keeps them
from getting worse, which in turn makes their treatment easier and more effective.
• Patients can be provided with evidence-based medicine, which is identified and prescribed
after research on past medical results.

Example: Wearable devices and sensors have been introduced in the healthcare industry that can
provide a real-time feed to the electronic health record of a patient. One such technology is
from Apple.

Figure 10.1
10.2. Big Data in Banking Sector
We keep our valuables in the bank for security, and a bank has to apply many strategies to keep
its customers' wealth safe and well maintained. Banks have been using big data for many years.
From cash collection to financial management, big data is making banks more efficient in every
area. Big data applications in the banking sector have reduced hassle for customers and
generated revenue for the banks.

Figure 10.2
10.3 Big Data in Academia
Big data is also helping to enhance education today. Education is no longer limited to the
physical bounds of the classroom; there are numerous online educational courses to learn from.
Academic institutions are investing in digital courses powered by big data technologies to aid
the all-round development of budding learners.

Figure 10.3

10.4 Applications of Big Data in Tourism
Big data helps to gather knowledge from tourists all around the world about places and people,
which can be enormously helpful for tourism companies.

Figure 10.4

11 Evaluation of the result

The vast majority of businesses agree that big data is an excellent opportunity to obtain
customer insights, monitor business operations in real time, predict business outcomes, and so
forth. However, this is so only if they seize the opportunity. Therefore, we recommend four
points that will allow organizations to exploit the available opportunity. First, use big data
to deep-dive with advanced analytics. Second, integrate new big data with legacy enterprise data
to extend views of customers and other business entities. Third, enable the ability to capture,
ingest, analyze, and inform business processes in real time based on streaming big data.
Fourth, build strong collaboration within the organization around big data management, so that
big data can be accessed and leveraged by multiple business units and users and thus be fully
utilized.

12 Conclusions

Big data is a term that describes the large volume of data, both structured and unstructured,
that inundates a business on a day-to-day basis. It is what organizations do with the data that
matters. Big data can be analyzed for insights that lead to better decisions and strategic
business moves. Nowadays companies use big data to make their business better informed and to
take business decisions by enabling data scientists, analytical modelers, and other
professionals to analyze large volumes of transactional data. Big data is the valuable and
powerful fuel that drives the large IT industries of the 21st century. It is a spreading
technology used in every business sector.

13 Reference

[1] https://www.omnisci.com, May 25, 2021
[2] International Conference on Collaboration Technologies and Systems (CTS), 2013
[3] Big Data: Promise, Application and Pitfalls, edited by John Storm Pedersen, Adrian, 2019
[4] Big Data: Using SMART Big Data, Analytics and Metrics to Make Better, by Bernard Marr, 2014
[5] Big Data in Education: The digital future of learning, policy and practice, Aug 7, 2017
[6] Big Data at Work: Dispelling the Myths, Uncovering the Opportunities, by Thomas Davenport,
2014
[7] Big Data: Principles and Paradigms, by Rajkumar Buyya, Rodrigo N. Calheiros, Amir Vahid
Dastjerdi, Elsevier Science, Jun 7, 2016
[8] Big Data Analytics for Internet of Things, edited by Tausifa Jan Saleem, Mohammad Ahsan
Chishti, 2021
