
In this course, you'll learn how to map some of the functional abstractions that you've learned in previous Scala courses to computations on multiple machines over massive data sets.

That is, we will see first-hand how the functional abstractions that we've covered in the previous Scala courses make it easier and more user-friendly to scale computations over large clusters, or at least easier than scaling computations on imperative frameworks and imperative systems for distributed computation.

We're always going to focus on analyzing large data sets. That is, you'll be challenged to think about common data science tasks like K-means functionally, so that they can be adapted to and implemented in the context of Spark, a functionally oriented framework for large-scale data processing that's implemented in Scala.
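To make that concrete, here's a minimal sketch of what one K-means iteration might look like when expressed functionally on a Spark RDD. The Point case class and the distance, average, and closest helpers are hypothetical stand-ins introduced just for illustration; they are not part of Spark, and this is not the course's own assignment code.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical 2D point and helpers, just for this sketch.
case class Point(x: Double, y: Double)

def distance(a: Point, b: Point): Double =
  math.sqrt(math.pow(a.x - b.x, 2) + math.pow(a.y - b.y, 2))

def average(points: Iterable[Point]): Point =
  Point(points.map(_.x).sum / points.size, points.map(_.y).sum / points.size)

def closest(p: Point, centroids: Seq[Point]): Point =
  centroids.minBy(c => distance(p, c))

// One K-means iteration: assign each point to its nearest centroid,
// then recompute each centroid as the average of its assigned points.
def step(points: RDD[Point], centroids: Seq[Point]): Seq[Point] =
  points
    .map(p => (closest(p, centroids), p)) // assignment step: just a map
    .groupByKey()                         // group points by their centroid
    .mapValues(average)                   // update step: average per cluster
    .values
    .collect()
    .toSeq
```

The assignment and update steps come out as a map, a grouping, and an average per group, the same kind of operations you'd write on ordinary Scala collections, which is exactly the functional way of thinking this course asks for.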

You might be asking, well, if we're going to be focusing on a lightweight, data-science flavor of processing tasks, then why are we bothering with Scala, and why are we bothering with Spark? After all, if you wanted to learn data science in the classroom, you'd likely use a statistics professor's favorite languages or frameworks, like R, Python, Octave, or MATLAB.

So then why should one bother with Scala or Spark, which are both arguably very unlike R, Python, Octave, and MATLAB? The answer is that those languages and frameworks are good for data science in the small.

That is, algorithms on data sets that are perhaps just a few hundred megabytes or even a few gigabytes in size. However, once the data set becomes too large to fit into main memory on one computer, it suddenly becomes much more difficult to use one of these languages or frameworks alone.

If your small data set grows into a much larger data set, these languages and frameworks like R, Python, MATLAB, etc. won't allow you to scale. You'll need to start completely from scratch, reimplementing all of your algorithms using a system like Hadoop or Spark anyway, or you'll need to manually figure out how to distribute your problem over many machines without the help of such a framework, which is kind of a bad idea if you're not already an expert in building distributed systems.
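To give a feel for why the functional abstractions help here, this is a minimal, hypothetical sketch contrasting a computation on an ordinary Scala collection with the same computation on a Spark RDD. The SparkContext sc, the log file path, and the sample data are assumptions made up for illustration, not examples from the course.

```scala
import org.apache.spark.SparkContext

// Small data: ordinary Scala collections, everything fits in memory on one machine.
val localLines: List[String] = List("error: disk full", "ok", "error: timeout")
val localErrorLengths: List[Int] =
  localLines.filter(_.startsWith("error")).map(_.length)

// Large data: the same functional operations, but distributed over a cluster.
// `sc` and `path` are assumed to be provided elsewhere.
def errorLengths(sc: SparkContext, path: String): Array[Int] =
  sc.textFile(path)                 // lines of the file, spread across many machines
    .filter(_.startsWith("error"))  // same filter as above
    .map(_.length)                  // same map as above
    .collect()                      // gather the results back to the driver
```

The shape of the code barely changes; what changes is where it runs, which is the point this course keeps coming back to.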

There's also this whole huge, massive industry shift towards data-oriented decision making. Nowadays, many companies across many different industries have realized that by looking more closely at the data they're collecting, from device logs to health or genetic data, they can innovate in ways that were impossible before. For example, now we have all of these devices surrounding us, collecting information and attempting to provide all kinds of insights to enrich our day-to-day lives.

Instead, imagine hundreds of thousands of users of some device, say a smartphone or some wearable or something. And imagine that, as part of your job, you're responsible for providing some analysis or insight behind all of the data that's collected.
