
Spark & its Features

Apache Spark is an open-source cluster computing framework for large-scale data processing, including real-time workloads.
The main feature of Apache Spark is its in-memory cluster computing, which increases the
processing speed of applications.

Spark provides an interface for programming entire clusters with implicit data parallelism and
fault tolerance.

It is designed to cover a wide range of workloads such as batch applications, iterative
algorithms, interactive queries, and streaming.

Features of Apache Spark:

Fig: Features of Spark


1. Speed
Spark runs up to 100 times faster than Hadoop MapReduce for large-scale data
processing. It achieves this speed through controlled partitioning and in-memory computation.
2. Powerful Caching
A simple programming layer provides powerful caching and disk persistence capabilities.
3. Deployment
It can be deployed through Mesos, Hadoop via YARN, or Spark’s own cluster manager.
4. Real-Time
It offers real-time computation and low latency because of in-memory computation.
5. Polyglot
Spark provides high-level APIs in Java, Scala, Python, and R. Spark code can be written
in any of these four languages. It also provides a shell in Scala and Python.
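To make the Caching and Polyglot features concrete, here is a minimal, illustrative PySpark sketch (Python being one of the four supported languages); the application name and the file path data.csv are placeholders, not part of these notes.

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; the application name is arbitrary.
spark = SparkSession.builder.appName("FeatureDemo").getOrCreate()

# The path "data.csv" is a placeholder for any CSV input.
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# cache() keeps the DataFrame in memory after the first action,
# so later queries avoid re-reading the file from disk.
df.cache()

print(df.count())   # first action: reads the file and fills the cache
print(df.count())   # second action: served from the in-memory cache

spark.stop()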
Spark Eco-System
Spark Core is the base engine for large-scale parallel and distributed data processing.
Additional libraries built on top of the core allow diverse workloads such as
streaming, SQL, and machine learning.
Spark Core is responsible for memory management and fault recovery; for scheduling, distributing, and
monitoring jobs on a cluster; and for interacting with storage systems.

Spark supports multiple components such as Spark SQL, Spark Streaming, MLlib, GraphX, and the
Core API.

Fig: Spark Eco-System
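As an illustrative (not canonical) example of a library built on top of Spark Core, the following sketch runs a Spark SQL query; the sample rows are invented for the example.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("EcosystemDemo").getOrCreate()

# A tiny DataFrame with made-up rows, used only for illustration.
people = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Cathy", 29)],
    ["name", "age"],
)

# Register the DataFrame as a temporary view and query it with Spark SQL.
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()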

Resilient Distributed Dataset (RDD)


RDDs are the building blocks of any Spark application. RDD stands for:

● Resilient: Fault-tolerant and capable of rebuilding data on failure
● Distributed: Data is distributed among multiple nodes in a cluster
● Dataset: A collection of partitioned data with values
Workflow of RDD

With RDDs, you can perform two types of operations:

1. Transformations: Operations that are applied to an RDD to create a new RDD.
2. Actions: Operations applied on an RDD that instruct Apache Spark to perform the computation and
send the result back to the driver (a short sketch follows the list).
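The sketch below, using arbitrary example numbers, shows the difference: transformations such as map and filter only define a new RDD, while actions such as collect and count trigger the computation and return results to the driver.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RDDDemo").getOrCreate()
sc = spark.sparkContext

# Example data distributed as an RDD.
numbers = sc.parallelize([1, 2, 3, 4, 5])

# Transformations: each call returns a new RDD; nothing executes yet (lazy).
squares = numbers.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

# Actions: trigger the computation and return results to the driver.
print(evens.collect())   # [4, 16]
print(squares.count())   # 5

spark.stop()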

DAG - Directed Acyclic Graph

1. It represents the flow chart of your Spark application.
2. It decides the flow of processing of your Spark application.
3. According to this flow, the Spark driver creates an execution plan.
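As a hedged illustration, the lineage (DAG) that Spark records for an RDD can be inspected with toDebugString(); the word list below is made up for the example.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DAGDemo").getOrCreate()
sc = spark.sparkContext

words = sc.parallelize(["spark", "dag", "spark", "driver"])

# A chain of transformations builds up the DAG; it only runs
# when an action (collect) is called.
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

# toDebugString() prints the recorded lineage, i.e. the DAG for this RDD.
lineage = counts.toDebugString()
print(lineage.decode("utf-8") if isinstance(lineage, bytes) else lineage)

print(counts.collect())

spark.stop()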

Spark Architecture
Working of Spark -

STEP 1: The client submits the Spark user application code. When the application code is submitted,
the driver implicitly converts the user code, which contains transformations and actions, into a logical
directed acyclic graph (DAG). At this stage, it also performs optimizations such as
pipelining transformations.

STEP 2: After that, it converts the logical graph (DAG) into a physical execution plan with
many stages. After converting to the physical execution plan, it creates physical execution units
called tasks under each stage. The tasks are then bundled and sent to the cluster.

STEP 3: Now the driver talks to the cluster manager and negotiates resources. The cluster
manager launches executors on worker nodes on behalf of the driver. At this point, the driver
sends the tasks to the executors based on data placement. When executors start, they register
themselves with the driver, so the driver has a complete view of the executors that are executing
the tasks.

STEP 4: During the course of task execution, the driver program monitors the set of executors
that run the tasks. The driver node also schedules future tasks based on data placement.
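As an end-to-end sketch of these steps (with an assumed local master URL for testing, not a real cluster), the word-count below is turned by the driver into a DAG, split into two stages at the reduceByKey shuffle, and executed as tasks on the executors.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ArchitectureDemo")
    .master("local[2]")   # assumption: local mode; on a cluster this would be a YARN, Mesos, or standalone master URL
    .getOrCreate()
)
sc = spark.sparkContext

# Two partitions, so each stage runs two tasks.
lines = sc.parallelize(["spark driver executor", "driver schedules tasks"], numSlices=2)

# Stage 1: map-side work (split lines and emit (word, 1) pairs).
pairs = lines.flatMap(lambda line: line.split()).map(lambda w: (w, 1))

# The shuffle required by reduceByKey marks the stage boundary; stage 2 aggregates.
counts = pairs.reduceByKey(lambda a, b: a + b)

# The action triggers the whole plan and returns the result to the driver.
print(counts.collect())

spark.stop()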

This was all about Spark Architecture.
