Big Data and Storage Systems
1. Scalable Problems and Memory-Bounded Speedup, Sun and Ni, JPDC, 1993
2. The Google File System, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, ACM SOSP, 2003
Big Data Concepts
What, Where, Why?
What is Big Data?
Image credits: http://www.seekbig.in/1128-tnpsc-economics-questions/
The term is fuzzy … Handle with care!
“Big data refers to the approach to data of
‘collect now, sort out later’…The low cost of
storage and better methods of analysis mean
that you generally don’t need to have a specific
purpose for the data in mind before you collect
it.”
“Big data is when your business wants to use data to
solve a problem, answer a question, produce a
product, etc., but the standard, simple methods
break down on the size of the data set, causing time,
effort, creativity, and money to be spent crafting a
solution to the problem that leverages the data
without simply sampling or tossing out records.”
John Foreman, Chief Data Scientist, MailChimp
“While the use of the term is quite nebulous …
I’ve understood ‘big data’ to be about analysis
for data that’s really messy or where you don’t
know the right questions or queries to make —
analysis that can help you find patterns,
anomalies, or new structures amidst otherwise
chaotic or complex data points.”
Image Credits: https://community.uservoice.com/wp-content/uploads/benefits-of-effective-questions-800x448-300x168.jpg
So, where does Big Data
come from?
Desktop & Mobile Web Users
https://gs.statcounter.com/platform-market-share/desktop-mobile-tablet/worldwide/#monthly-200901-202111
Facebook Active Users
July 2020
https://www.statista.com/statistics/578364/countries-with-most-instagram-users/
Internet Activity during COVID
https://www.sciencemag.org/news/2017/07/ai-changing-how-we-do-science-get-glimpse
Monthly UPI Transactions (chart)
https://www.gstn.org/
IoT and Big Data
Data Science
Inter-disciplinary domain at the intersection of data analysis methods, Big Data Systems, and data-driven applications.
Diagram labels: Methods (AI, ML); Big Data Applications; Systems, Platforms
Data Analysis Lifecycle
• Acquire Data
  • Sensors, Web logs & crawls, Transactions
OPEN SOURCE PLATFORMS
Top Paying Technologies ☺
Topics you are learning in this module are the top-3
paying Databases and Frameworks globally!
https://survey.stackoverflow.co/2022/#top-paying-technologies-other-frameworks-and-libraries
Big Data Platform Stack, Spark Flavor
Different Abstractions
Process Data
Manage System
Store Data
https://www.oreilly.com/library/view/data-analytics-with/9781491913734/ch04.html
Additional Reading
▷ A Survey of Big Data Research, H. Fang, et al., IEEE Network,
September/October 2015,
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4617656/
▷ Beyond the hype: Big data concepts, methods, and analytics, A. Gandomi
and M. Haider, International Journal of Information Management, Volume 35,
Issue 2, 2015, https://doi.org/10.1016/j.ijinfomgt.2014.10.007
▷ Uncertainty in big data analytics: survey, opportunities, and
challenges. R.H. Hariri, et al. J Big Data 6, 44 (2019).
https://doi.org/10.1186/s40537-019-0206-3
Distributed Systems
Helping Data Science Scale
Vertical and Horizontal Scaling
https://www.nextplatform.com/2017/06/20/competition-returns-x86-servers-epyc-fashion/
https://www.macrumors.com/guide/m1/
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-dynamiq-technology-for-the-next-era-of-compute
Apple M1
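One way to picture the difference between the two scaling styles is the minimal Python sketch below. It is an illustration under stated assumptions, not something taken from the slides: the function names (`scale_up`, `scale_out`, `partition`, `count_words`) are hypothetical, the word-count task is chosen only for simplicity, and threads in a single process stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Work done by one worker: count the words in its own lines."""
    return sum(len(line.split()) for line in lines)


def partition(lines, num_workers):
    """Split the input into num_workers round-robin partitions."""
    return [lines[i::num_workers] for i in range(num_workers)]


def scale_up(lines):
    """Vertical scaling (assumed model): one bigger worker handles all the data."""
    return count_words(lines)


def scale_out(lines, num_workers=4):
    """Horizontal scaling (assumed model): several workers each handle one
    partition, and their partial results are combined at the end.
    Threads here only stand in for separate machines."""
    parts = partition(lines, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_words, parts))
    return sum(partial_counts)


# Hypothetical sample input, repeated to simulate a larger data set.
lines = ["big data and storage systems", "helping data science scale"] * 1000

# Both approaches must agree on the answer; only the work layout differs.
assert scale_up(lines) == scale_out(lines, num_workers=4)
```

Scaling up keeps the program simple but is limited by the largest single machine available. Scaling out requires the work to be partitioned and the partial results to be merged, which is the pattern that distributed platforms automate.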