Big Data Seminar Report Rahul Jain
SEMINAR REPORT
ON
“ BIG DATA ”
Submitted To
Computer Science and Technology, Department of Technology
Shivaji University, Kolhapur
Submitted By
Mr. Rahul Rajkumar Jain
DEPARTMENT OF TECHNOLOGY, SHIVAJI UNIVERSITY,
KOLHAPUR.
CERTIFICATE
This is to certify that the Seminar report on “BIG DATA” has been submitted
by Mr. Rahul Rajkumar Jain
of T.Y. (Computer Science and Technology) class in partial fulfillment for the
award of B.Tech in Computer Science and Technology Degree as per curriculum
laid by the Shivaji University, Kolhapur during the academic year 2023-2024.
Abstract
Big data is a broad term for data sets so large or complex that traditional data
processing applications are inadequate. Challenges include analysis, capture, data
curation, search, sharing, storage, transfer, visualization, and information
privacy. The term often refers simply to the use of predictive analytics or certain
other advanced methods to extract value from data, and seldom to a particular size
of data set. Accuracy in big data may lead to more confident decision making, and
better decisions can mean greater operational efficiency, cost reductions and
reduced risk. Analysis of data sets can find new correlations, to "spot business
trends, prevent diseases, combat crime and so on." Scientists, practitioners of media
and advertising and governments alike regularly meet difficulties with large data
sets in areas including Internet search, finance and business informatics. Scientists
encounter limitations in e-Science work, including meteorology, genomics,
connectomics, complex physics simulations, and biological and environmental
research. Data sets grow in size in part because they are increasingly being gathered
by cheap and numerous information-sensing mobile devices, aerial (remote
sensing), software logs, cameras, microphones, radio-frequency identification
(RFID) readers, and wireless sensor networks. The world's technological per-capita
capacity to store information has roughly doubled every 40 months since the 1980s;
as of 2012, 2.5 exabytes (2.5×10^18 bytes) of data were created every day. The
challenge for large enterprises is determining who should own big data initiatives
that straddle the entire organization. Work with big data is necessarily uncommon; most
analysis is of "PC size" data, on a desktop PC or notebook that can handle the
available data set. Relational database management systems and desktop statistics
and visualization packages often have difficulty handling big data. The work
instead requires "massively parallel software running on tens, hundreds, or even
thousands of servers".
01 INTRODUCTION
03 METHODOLOGY
04 DATA COLLECTION
05 ADVANTAGES & DISADVANTAGES
06 CONCLUSION
07 REFERENCES
08 BIBLIOGRAPHY
REPORT DETAILS
1. Introduction
Big data is a collection of massive and complex data sets, characterized by
huge quantities of data, data management capabilities, social media analytics
and real-time data. Big data analytics is the process of examining large
amounts of data.
There exist large amounts of heterogeneous digital data. Big data is about data
volume: large data sets measured in terms of terabytes or petabytes. This
phenomenon is called Big Data.
The examination of such data is known as Big Data analytics. This report
presents the 5 Vs characteristics of big data and the techniques and
technologies used to handle it.
The term Big Data refers to all the data that is being generated across the globe
at an unprecedented rate. This data could be either structured or unstructured.
Today’s business enterprises owe a huge part of their success to an economy
that is firmly knowledge-oriented.
There is a need to convert Big Data into Business Intelligence that enterprises
can readily deploy. Better data leads to better decision-making and an improved
way to strategize for organizations regardless of their size, geography, market
share, customer segmentation, and other such categorizations. Hadoop is a
widely used platform for working with extremely large volumes of data.
1. Volume:
To determine the value of data, the size of the data plays a very crucial
role. If the volume of data is very large, it is considered ‘Big Data’. In
other words, whether a particular data set can be considered Big Data
depends on its volume.
Example: In the year 2016, the estimated global mobile traffic was 6.2
exabytes (6.2 billion GB) per month. It was also projected that by the year
2020 there would be almost 40,000 exabytes of data.
2. Velocity:
Velocity refers to the speed at which data flows in from sources like
machines, networks, social media, mobile phones etc.
Sampling the data can help in dealing with issues like ‘velocity’.
Example: More than 3.5 billion searches are made on Google per day. Also,
the number of Facebook users is increasing by approximately 22% year over
year.
3. Veracity:
4. Value:
After taking the 3 V’s into account, there comes one more V, which stands
for Value. A bulk of data with no value is of no good to the company
unless you turn it into something useful.
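The sampling idea mentioned under Velocity can be illustrated with reservoir
sampling, one common way to keep a fixed-size random sample of a stream whose
length is not known in advance. The report does not name a specific sampling
method, so this choice is an assumption, as are the function names and the
simulated event stream below.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir


def event_stream(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a fast feed of events, e.g. search queries."""
    for event_id in range(n):
        yield event_id


# Only k items are ever held in memory, however long the stream is.
sample = reservoir_sample(event_stream(100_000), k=10, seed=42)
print(len(sample), sample)
```

Because the sample size is fixed, memory use stays constant no matter how
quickly or for how long the data arrives.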
2a. Problem Statement
The benefits of big data and analytics prove how powerful a tool it has
become for businesses, irrespective of size and industry. Big data
has become crucial both for organizations and for professionals skilled
in analytics.
Big data skills are in high demand now because data is useless unless
there is the skill to analyze it.
Job opportunities in big data and analytics careers are at an all-time high,
and companies are looking for qualified data professionals to help tap the
true potential of big data to positively influence their business decisions.
Skilled big data analysts are getting big pay packages, and the salary
trend indicates strong growth.
Data drives the modern organizations of the world. Hence, making sense
of this data, unraveling its various patterns, and revealing unseen
connections within the vast sea of data becomes a critical and hugely
rewarding endeavor.
2b. Technology identified
Big Data Technologies are the software utilities that incorporate data
mining, data storage, data sharing, and data visualization. The comprehensive
term embraces data and data frameworks, including the tools and techniques
used to investigate and transform data.
Big Data Technologies can be split into two categories:
Data from social media sites like Facebook, Instagram, WhatsApp and a
lot more.
2. Analytical Big Data Technologies:
Stock marketing
Carrying out space missions, where every single bit of information is
crucial.
Foremost Big Data Technologies Trending in 2020
Now, we shall discuss the leading-edge technologies (in no particular order) that
have influenced the market and IT industries in recent times:
1. Artificial Intelligence
2. NoSQL Database
3. R Programming
Experts regard R as one of the most prominent languages in the world. Besides
being used by data miners and statisticians, it is widely used for
developing statistical software and, above all, for data analytics.
4. Data Lakes
Organizations that use data lakes may be able to outperform their peers: new
types of analytics, such as machine learning, can be conducted across new
sources such as log files, social media data, click-streams and even data from
IoT devices stored in data lakes.
5. Predictive Analytics
A subset of big data analytics, it endeavors to predict future behavior from
prior data. It uses machine learning, data mining, statistical modeling and
other mathematical models to forecast future events.
6. Apache Spark
With built-in features for streaming, SQL, machine learning and graph
processing, Apache Spark is regarded as one of the fastest and most common
engines for big data transformation. It supports the major languages of big
data, including Python, R, Scala, and Java.
Spark was introduced after Hadoop, with speed of data processing as its main
objective. It reduces the waiting time between submitting a query and
getting the result of program execution. Spark is often used alongside Hadoop,
with Hadoop providing storage and Spark handling processing. It is reported to
be up to a hundred times faster than MapReduce for in-memory workloads.
7. Prescriptive Analytics
8. In-memory Database
9. Blockchain
10. Hadoop Ecosystem
The Hadoop ecosystem comprises both Apache open source projects and a wide
variety of commercial tools and solutions. A few of the well-known open
source examples include Spark, Hive, Pig, Sqoop and Oozie.
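The MapReduce model that Spark is compared against above can be sketched in a
few lines of plain Python. This is only a single-process illustration of the
map, shuffle and reduce steps on a word-count task. It does not use the Hadoop
or Spark APIs, and the function names are chosen here for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


# Made-up input lines; on a cluster each line could be mapped on a different node.
lines = ["big data needs big tools", "Spark and Hadoop process big data"]
print(word_count(lines))
```

In a real cluster the map and reduce steps run in parallel on many machines,
which is what makes the model suitable for very large inputs.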
2. Methodology
In traditional statistics, an experiment is designed before the data are
collected. This allows data to be generated in a way that can be used by a
statistical model, where certain assumptions hold, such as independence,
normality, and randomization.
In big data analytics, we are presented with the data. We cannot design an
experiment that fulfills our favorite statistical model. In large-scale
applications of analytics, a large amount of work (often cited as around 80% of
the effort) is needed just for cleaning the data, so it can be used by a
machine learning model.
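As a rough illustration of the cleaning work described above, the following
sketch trims fields, drops records with missing or implausible values, and
removes duplicates. The record layout (a name and an age), the validity rules
and the function names are all assumptions made for this example. Real
pipelines apply rules specific to their own data.

```python
from typing import Dict, List, Optional, Set, Tuple

RawRecord = Dict[str, str]
CleanRecord = Dict[str, object]


def clean_record(raw: RawRecord) -> Optional[CleanRecord]:
    """Normalize one record, or return None if it cannot be repaired."""
    name = raw.get("name", "").strip()
    age_text = raw.get("age", "").strip()
    if not name or not age_text.isdecimal():
        return None  # missing name, or age is not a whole number
    age = int(age_text)
    if not 0 < age < 120:
        return None  # implausible value, treated as an error
    return {"name": name.title(), "age": age}


def clean_dataset(rows: List[RawRecord]) -> List[CleanRecord]:
    """Clean every record and drop exact duplicates, keeping first occurrences."""
    seen: Set[Tuple[object, object]] = set()
    cleaned: List[CleanRecord] = []
    for raw in rows:
        record = clean_record(raw)
        if record is None:
            continue
        key = (record["name"], record["age"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Made-up raw input with typical problems: stray spaces, duplicates, bad values.
raw_rows = [
    {"name": " alice ", "age": "30"},
    {"name": "ALICE", "age": "30"},
    {"name": "bob", "age": ""},
    {"name": "carol", "age": "abc"},
    {"name": "dave", "age": "250"},
]
print(clean_dataset(raw_rows))
```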
We don’t have a unique methodology to follow in real large-scale applications.
Normally once the business problem is defined, a research stage is needed to
design the methodology to be used.
However, some general guidelines are worth mentioning and apply to almost all
problems.
One of the most important tasks in big data analytics is statistical modeling,
meaning supervised and unsupervised classification or regression problems.
Once the data is cleaned, preprocessed and available for modeling, care should
be taken to evaluate different models with reasonable loss metrics. Once the
chosen model is implemented, it should be evaluated further and the results
reported.
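The evaluation step described above can be sketched as follows: hold out part
of the data, fit competing models on the rest, and compare them with a loss
metric on the held-out part. The two models (a mean baseline and a
least-squares line), the mean squared error metric and the synthetic data are
assumptions chosen for illustration. The report does not prescribe them.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Model = Callable[[float], float]


def train_test_split(points: Sequence[Point], test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Shuffle the data and hold out a fraction of it for evaluation."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(train: Sequence[Point]) -> Model:
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_linear(train: Sequence[Point]) -> Model:
    """Ordinary least-squares line through the training points."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model: Model, data: Sequence[Point]) -> float:
    """Loss metric: average squared difference between prediction and target."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Synthetic data: a noisy straight line (made up for this example).
rng = random.Random(1)
points = [(float(x), 3.0 * x + 2.0 + rng.gauss(0, 1)) for x in range(40)]
train, test = train_test_split(points)

for name, fit in (("mean baseline", fit_mean), ("linear model", fit_linear)):
    model = fit(train)
    print(name, round(mean_squared_error(model, test), 2))
```

Measuring the loss on data that was not used for fitting gives a fairer
picture of how each model would behave on new data.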
3. Data Collection
Big Data is a term used to describe the massive growth and availability of
structured and unstructured data. While the term may seem to refer to the
volume of data, it also refers to the technology (tools and processes) that an
organization requires to handle these data volumes and storage facilities. Big
Data spans three dimensions: Volume, Velocity and Variety.
Everyone is talking about Big Data trends, from challenges to the tools required
for Big Data projects. Businesses understand that Big Data infrastructure will
help them make better decisions.
Every day we create 2.5 quintillion bytes of data - so much that 90% of the data
in the world today has been created in the last two years alone.
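To put the daily figure above into more familiar units, a small conversion
helper can be used. The sketch assumes decimal (SI) units, where 1 exabyte is
10^18 bytes. The helper name and the unit list are chosen here for
illustration.

```python
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]

BYTES_PER_DAY = 2_500_000_000_000_000_000  # 2.5 quintillion bytes, from the text
SECONDS_PER_DAY = 24 * 60 * 60


def humanize(num_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps it >= 1."""
    value = float(num_bytes)
    unit_index = 0
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:.1f} {UNITS[unit_index]}"


print("per day:   ", humanize(BYTES_PER_DAY))
print("per second:", humanize(BYTES_PER_DAY / SECONDS_PER_DAY))
```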
Similar to the complexity aspect of Big Data use, its growth rate is mostly due
to the ubiquitous nature of real time big data processing, capture devices,
systems and networks. We can expect that this growth rate will continue to
increase in the future.
Big Data solutions help detect customer sentiment about products or services of
an organization and gain a deeper, visual understanding of the multichannel
customer journey and then act on these insights to improve the customer
experience.
4. Advantages & Disadvantages
Businesses, big or small, across industries can benefit from using big data
effectively. The benefits of big data and analytics include better decision-
making, bigger innovations, and product price optimization, among others. Let's
look at the top benefits closely:
Amazon has utilized this big data benefit by offering the ultimate
personalized shopping experience, wherein suggestions pop up based on
previous purchases as well as products that other customers have bought,
browsing patterns, and other factors.
3. Potential Risks Identification
4. Innovate
The insights you gain using big data analytics are the key to innovation.
Big data allows you to update existing products/services while
innovating new ones.
The large volume of data collected helps businesses identify what fits
their customer base. Information on what others think of your
products/services can help in product development.
5. Complex Supplier Networks
6. Cost optimization
One of the most compelling benefits that big data tools like Hadoop
Typically, the cost of returns is 1.5 times higher than normal shipping
costs. Companies use big data and analytics to minimize product return
costs by calculating the chances of product returns. Thus they can take
suitable measures to minimize product-return losses.
7. Improve Efficiency
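Two of the benefits described above, personalized suggestions based on what
other customers have bought and the estimation of product-return costs, can be
sketched in a few lines. This is only a minimal illustration: the scoring rule,
the customer data and the return probabilities are made up, and reading
"1.5 times higher" as a plain 1.5x multiplier on the shipping cost is an
assumption.

```python
from collections import Counter
from typing import Dict, List, Set

# From the text: a return costs 1.5 times the normal shipping cost.
# Treating this as a simple multiplier is an assumption.
RETURN_COST_MULTIPLIER = 1.5


def recommend(target: str, purchases: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Suggest products bought by customers whose purchases overlap the target's."""
    owned = purchases[target]
    scores: Counter = Counter()
    for customer, basket in purchases.items():
        if customer == target:
            continue
        overlap = len(owned & basket)
        if overlap == 0:
            continue  # no shared purchases, so no evidence of similar taste
        for product in basket - owned:
            scores[product] += overlap
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:top_n]]


def expected_return_cost(shipping_cost: float, return_probability: float) -> float:
    """Expected cost of a possible return for one order."""
    if not 0.0 <= return_probability <= 1.0:
        raise ValueError("return_probability must be between 0 and 1")
    return return_probability * RETURN_COST_MULTIPLIER * shipping_cost


# Made-up purchase histories.
purchases = {
    "ana": {"laptop", "mouse"},
    "ben": {"laptop", "mouse", "keyboard"},
    "cy": {"laptop", "monitor"},
    "dee": {"novel"},
}
print(recommend("ana", purchases))

# Made-up orders as (shipping cost, estimated probability of return).
orders = [(5.0, 0.10), (8.0, 0.40), (4.0, 0.05)]
total = sum(expected_return_cost(cost, prob) for cost, prob in orders)
print(round(total, 2))
```

Orders with a high expected return cost are the ones where measures such as
clearer product information are most likely to pay off.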
Disadvantages
Anything that has advantages may also have disadvantages. The following are
the disadvantages of big data:
1. Lack of talent
According to a survey by AtScale, the lack of big data experts and data
scientists has been the biggest challenge in this field for the past three
years. Currently, many IT professionals don’t know how to carry out
big data analytics as it requires a different skill set.
Thus, finding data scientists who are also experts in big data can be
challenging.
Big data experts and data scientists are two highly paid careers in the
data science field. Therefore, hiring big data analysts can be very
expensive for companies, especially for startups.
Some companies have to wait for a long time to hire the required staff to
continue their big data analytics tasks.
2. Security risks
Most of the time, companies collect sensitive information for big data
analytics. That data needs protection, and without proper maintenance
the resulting security risks become a drawback.
Besides, holding huge data sets can attract unwanted attention
from hackers, and your business may become the target of a potential
cyber-attack. Data breaches have become one of the biggest threats to
many companies today.
3. Compliance
So, data governance tasks, transmission, and storage will become more
difficult to manage as the big data volumes increase.
5. Rapid change
6. Lack of professionals
People who analyze big data to find valuable insights for increasing
the productivity of a business are called big data analysts, but
people who possess these skills are not always available.
5. Application Areas
1. Government:
For example, in the United States of America in 2012, the Obama
administration announced the Big Data Research and Development Initiative
to explore how big data could be used to address many issues faced by the
government. Big data is also utilized by the Indian government.
2. International development:
3. Manufacturing:
4. Cyber-physical models:
Present PHM (prognostics and health management) implementations make use of
data gathered during actual usage, and the analytical algorithms can perform
more accurately when more data is included.
5. Media:
In the media, it is used together with the Internet of Things for activities
such as targeting of consumers and data capture.
6. Technology:
7. Private sector:
The application of big data in the private sector includes retail, retail
banking, and real estate.
8. Science:
The best example of its application in science is the Large Hadron
Collider, whose experiments use about 150 million sensors transmitting
information 40 million times per second.
9. Healthcare:
With the added adoption of mHealth, eHealth and wearable technologies the
volume of data is increasing at an exponential rate. This includes electronic
health record data, imaging data, patient generated data, sensor data, and
other forms of data.
10. IoT:
Big data also has applications in science and research.
Big data is expected to advance further in the future, as $15 billion has been
invested in software firms that specialize in data management and data
analytics.
6. Conclusion
The Age of Big Data is here, and these are truly revolutionary times if both
business and technology professionals continue to work together and deliver
on the promise.
7. References (in IEEE format)
www.google.com
www.wikipedia.com
www.studymafia.org
8. Bibliography
Big data examples use cases. (2011). Retrieved from Tableau:
https://www.tableau.com/learn/articles/big-data-examples-use-cases
Big Data Use Cases. (n.d.). Retrieved from Big data Analytics
News: https://bigdataanalyticsnews.com/big-data-use-cases/
Carsten Bange, T. G. (2015). big data use cases. Retrieved from BARC :
http://barc-research.com/research/big-data-use-cases-2015/
Zeng, J. &. (2013). Looking inside the black box. Strategic Organization