Big Data Introduction PDF
Dr Mouhim Sanaa
WHAT IS IN IT?
What is BigData
BigData challenges
BigData: 5Vs
Big Data use cases
Limitations of the traditional approach
WHAT IS BIG DATA
• Big data is a term used to describe large collections of data (also known as data sets).
Big data may be unstructured and may grow so large and so quickly that it is difficult to manage with
regular database or statistics tools.
BigData challenges
• Process and analyze data in a distributed way to extract knowledge and support decision making
Big data processing: Batch processing (MapReduce, Spark)
Stream processing (Spark, Kafka, Flink, Storm …)
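The difference between batch and stream processing can be sketched in plain Python, without any of the frameworks named above. The function names and the sample readings are hypothetical; the sketch only illustrates that a batch job sees the whole data set at once, while a streaming job updates its result as each record arrives.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch style: the full data set is available before processing starts."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated average after every incoming record."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings, used only for illustration.
sample = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_average(sample)
stream_results = list(streaming_average(iter(sample)))

print(batch_result)
print(stream_results)
```

A real engine distributes this work across many machines; the sketch runs in a single process and shows only the two processing styles.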
Social data comes from the Likes, Tweets & Retweets, Comments, Video Uploads, and general media that
are uploaded and shared via the world’s favorite social media platforms.
Machine data is defined as information which is generated by industrial equipment, sensors that are
installed in machinery, and even web logs which track user behavior.
Transactional data is generated from all the daily transactions that take place both online and offline.
Invoices, payment orders, storage records, delivery receipts: all are characterized as transactional data.
Yet data alone is almost meaningless, and most organizations struggle to make sense of the data that
they are generating and to put it to good use.
5V’S OF BIG DATA Volume
Every minute, more than 204 million emails are sent worldwide. Every minute, Google
records 2 million different queries on its search engine, and Facebook records as many
"likes". Every minute, 85,000 dollars' worth of orders are placed on Amazon.
5V’S OF BIG DATA Veracity
When scoping out your big data strategy, have your team and partners
work to keep your data clean, and put processes in place to keep ‘dirty data’ from
accumulating in your systems.
5V’S OF BIG DATA Value
USE CASES OF BIG DATA
• Big data analytics enables organizations to anticipate their customers' wants and needs, and thereby
improve all marketing efforts, all sales communications, and all customer service engagements.
• It helps with predicting potential demand, upselling and cross-selling, increasing loyalty,
retention and satisfaction, and with designing personalized customer journeys that enhance the
whole customer experience.
• It helps reduce costs: the more intelligence an organization has about its customers, the more
targeted campaigns can be, and the less money is wasted developing and executing campaigns
that never perform.
• Some tools can even analyze customers’ language to detect their current emotions and suggest
appropriate responses to sales or customer service agents.
Security intelligence
Organizations are also using big data analytics to help them stop hackers and cyberattackers.
A company's servers generate log files, and other log files are available for sale.
Many organizations analyze all of the internal and external log information to prevent and detect
attacks in real time.
Recommendation Engines
• When you are watching a movie on Netflix or shopping for products on Amazon, you probably
now take it for granted that the website will suggest similar items that you might enjoy.
• Of course, the ability to offer those recommendations arises from the use of big data analytics to
analyze historical data.
• These recommendation engines have become so commonplace on the Web that many customers
now expect them when they are shopping online.
• And organizations that haven't taken advantage of their big data in this way may lose customers to
competitors or may lose out on upsell or cross-sell opportunities (association rules).
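The association rules mentioned above can be illustrated with a minimal sketch that counts how often items are bought together and derives the confidence of a rule "customers who bought X also bought Y". The baskets, the `confidence` and `recommend` helpers, and the 0.5 threshold are all hypothetical; production recommendation engines use far richer models and data.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set

# Hypothetical purchase histories; each set is one customer's basket.
baskets: List[Set[str]] = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"mouse", "keyboard"},
]

# Count how many baskets contain each item and each pair of items.
item_counts: Counter = Counter()
pair_counts: Counter = Counter()
for basket in baskets:
    item_counts.update(basket)
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def confidence(antecedent: str, consequent: str) -> float:
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


def recommend(item: str, min_confidence: float = 0.5) -> List[str]:
    """Items whose rule `item -> other` reaches the (assumed) confidence threshold."""
    scored = [(confidence(item, other), other) for other in item_counts if other != item]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in ranked if score >= min_confidence]


print(recommend("laptop"))
```

Here two of the three baskets containing a laptop also contain a mouse, and two also contain a bag, so both are suggested to a customer viewing a laptop.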
Internet Of Things
The Internet of Things refers to the rapidly growing network of connected objects that are able to
collect and exchange data using embedded sensors.
• Smart Home: The smart home is likely the most popular IoT application at the moment. It allows
doors and windows to be controlled over the Internet (home automation).
• Wearables: The Apple Watch and other smartwatches on the market have turned our wrists into
smartphone holsters by enabling text messaging, phone calls, and more.
• Smart Cities: The Internet of Things can solve traffic congestion issues and reduce noise, crime, and
pollution.
• Connected Car: These vehicles are equipped with Internet access and can share that access with
others, just like connecting to a wireless network in a home or office.
TRADITIONAL APPROACH
• Here, data is stored in an RDBMS such as Oracle Database or MS SQL Server.
• Sophisticated software can be written to
interact with the database,
process the required data
and present it to the users for analysis purposes.
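The three steps above (interact with the database, process the data, present it) can be sketched as follows. This is only an illustration: SQLite from the Python standard library stands in for a server RDBMS such as Oracle Database or MS SQL Server, and the `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a full RDBMS server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 75.5), ("alice", 30.0)],
)

# Interact with the database and process the required data in one query.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

# Present the result to the user.
for customer, total in rows:
    print(f"{customer}: {total:.2f}")

conn.close()
```

All storage and computation happen on a single server here, which is the property the limitations discussed next are about.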
LIMITATIONS OF THE TRADITIONAL APPROACH
This approach works well when the volume of data is small enough to be accommodated by a standard
database server, or stays within the limit of the processor that is processing the data.
But when it comes to dealing with huge amounts of data, it is really a difficult task to
process such data through a traditional database server.
• MapReduce divides the task into small parts and assigns those parts to
many computers connected over the network, and collects the results to
form the final result dataset.
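The split, assign, and collect pattern described above can be sketched as a word count in plain Python. This is not Hadoop's API: the `map_phase`, `shuffle`, and `reduce_phase` names and the input chunks are hypothetical, and a local thread pool stands in for the many computers connected over the network.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: turn one small part of the input into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group the values emitted by all mappers by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into the final result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input, already divided into small parts.
chunks = ["big data is big", "data is everywhere"]

# Each worker thread plays the role of one machine in the cluster.
with ThreadPoolExecutor(max_workers=2) as pool:
    mapped = list(pool.map(map_phase, chunks))

word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each chunk is mapped independently, adding machines lets the same job handle more data, which is what a single database server cannot offer.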
WHICH SOLUTION FOR BIGDATA
• The solution is a framework that allows distributed storage and parallel processing of
“BigData”: Hadoop