Spark

What is Spark

• Apache Spark is a lightning-fast cluster computing technology, designed for fast computation. It builds on the Hadoop MapReduce model and extends it to efficiently support more types of computations, including interactive queries and stream processing. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application.
• Spark is designed to cover a wide range of workloads such as batch applications, iterative algorithms, interactive queries and streaming. Apart from supporting all these workloads in a single system, it reduces the management burden of maintaining separate tools.
Features of Apache Spark

Apache Spark has the following features.
• Speed − Spark helps to run an application in a Hadoop cluster, up to 100 times faster in memory, and 10 times faster when running on disk. This is possible by reducing the number of read/write operations to disk; it stores the intermediate processing data in memory.
• Supports multiple languages − Spark provides built-in APIs in Java, Scala, and Python. Therefore, you can write applications in different languages. Spark comes with 80 high-level operators for interactive querying.
• Advanced analytics − Spark not only supports ‘Map’ and ‘Reduce’. It also supports SQL queries, streaming data, machine learning (ML), and graph algorithms.
Components of Spark

• Apache Spark Core
• Spark SQL
• Spark Streaming
• MLlib (Machine Learning Library)
• GraphX
Cont.

Apache Spark Core
• Spark Core is the underlying general execution engine for the Spark platform, upon which all other functionality is built. It provides in-memory computing and the ability to reference datasets in external storage systems.
Spark SQL
• Spark SQL is a component on top of Spark Core that introduces a new data abstraction called SchemaRDD, which provides support for structured and semi-structured data.
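The idea behind SchemaRDD can be sketched in plain Python (a conceptual illustration, not Spark's API; the row data and helper names here are made up): every record shares a schema of named columns, so structured queries can filter and project by column name.

```python
# Conceptual sketch of schema-aware rows (NOT actual Spark code).
schema = ("name", "age")                          # column names shared by every row
rows = [("Alice", 34), ("Bob", 23), ("Cara", 41)]

def where(rows, schema, column, predicate):
    """Keep rows whose value in `column` satisfies `predicate`."""
    i = schema.index(column)
    return [r for r in rows if predicate(r[i])]

def select(rows, schema, columns):
    """Project each row down to the named columns."""
    idx = [schema.index(c) for c in columns]
    return [tuple(r[i] for i in idx) for r in rows]

# Roughly: SELECT name FROM people WHERE age > 30
adults = select(where(rows, schema, "age", lambda a: a > 30), schema, ["name"])
# adults == [("Alice",), ("Cara",)]
```

In real Spark SQL the same query would be expressed in SQL or the DataFrame API and executed across the cluster; the sketch only shows the schema-plus-rows abstraction.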
Spark Streaming
• Spark Streaming leverages Spark Core's fast scheduling
capability to perform streaming analytics. It ingests
data in mini-batches and performs RDD (Resilient
Distributed Datasets) transformations on those mini-
batches of data.
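The mini-batch idea can be sketched in plain Python (a conceptual illustration, not Spark's streaming API): records from an unbounded source are grouped into small batches, and a batch-level operation runs on each batch as it fills.

```python
# Conceptual sketch of mini-batch streaming (NOT actual Spark code).
stream = iter(range(10))  # stands in for an unbounded data source

def mini_batches(source, batch_size):
    """Group incoming records into fixed-size batches."""
    batch = []
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                 # flush the final partial batch
        yield batch

# Apply a per-batch aggregation, as Spark Streaming applies RDD
# transformations to each mini-batch.
results = [sum(b) for b in mini_batches(stream, 3)]
# results == [3, 12, 21, 9]
```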
Cont.

MLlib (Machine Learning Library)
• MLlib is a distributed machine learning framework above Spark, enabled by the distributed memory-based Spark architecture. According to benchmarks done by the MLlib developers against the Alternating Least Squares (ALS) implementations, Spark MLlib is nine times as fast as the Hadoop disk-based version of Apache Mahout (before Mahout gained a Spark interface).
GraphX
• GraphX is a distributed graph-processing framework on top of Spark. It provides an API for expressing graph computation that can model user-defined graphs by using the Pregel abstraction API. It also provides an optimized runtime for this abstraction.
Main abstractions of Spark

• RDD – Resilient Distributed Dataset
• DAG – Directed Acyclic Graph


RDD

• Resilient Distributed Datasets (RDD) is a fundamental data structure of Spark. It is an immutable distributed collection of objects. Each dataset in an RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes.
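The two key RDD properties above, logical partitioning and immutability, can be sketched in plain Python (a conceptual illustration, not Spark's API):

```python
# Conceptual sketch of an RDD (NOT the Spark API): an immutable collection
# split into logical partitions; a transformation builds a NEW set of
# partitions instead of mutating the old one, and in a real cluster each
# partition could be processed on a different node.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8]]   # logical partitions

def map_partitions(parts, fn):
    """Apply fn to every element, partition by partition."""
    # In Spark each partition would be shipped to a worker; here we just
    # process them one after another on a single machine.
    return [[fn(x) for x in p] for p in parts]

squared = map_partitions(partitions, lambda x: x * x)
# squared == [[1, 4, 9], [16, 25], [36, 49, 64]]
# `partitions` is untouched (immutability); `squared` is a new collection.
```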
DAG

• A DAG is a finite directed graph with no directed cycles. There are finitely many vertices and edges, where each edge is directed from one vertex to another. The vertices can be arranged in a sequence such that every edge is directed from earlier to later in the sequence. The DAG model is a strict generalization of the MapReduce model: because the whole graph of operations is known up front, DAG-based execution can do better global optimization than systems like MapReduce. The benefit of the DAG becomes clearer in more complex jobs.
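The description above can be sketched as a tiny DAG of transformations evaluated in a topologically valid order (a conceptual illustration, not Spark's scheduler; the node names and functions are made up):

```python
# Conceptual sketch of DAG execution (NOT actual Spark code).
# Each node maps to (list of dependencies, function over their results).
dag = {
    "load":   ([],                   lambda: list(range(5))),
    "square": (["load"],             lambda xs: [x * x for x in xs]),
    "double": (["load"],             lambda xs: [2 * x for x in xs]),
    "merge":  (["square", "double"], lambda a, b: a + b),
}

def run(dag):
    """Evaluate every node once all of its dependencies are done."""
    done, pending = {}, list(dag)
    while pending:
        for name in list(pending):
            deps, fn = dag[name]
            if all(d in done for d in deps):      # topological readiness check
                done[name] = fn(*(done[d] for d in deps))
                pending.remove(name)
    return done

out = run(dag)["merge"]
# out == [0, 1, 4, 9, 16, 0, 2, 4, 6, 8]
```

Because the scheduler sees the whole graph before running anything, it could in principle reorder, fuse, or parallelize independent branches ("square" and "double" here), which is the global-optimization advantage over a fixed map-then-reduce pipeline.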