Bigdata Bits

1) A resilient distributed dataset (RDD) is a read-only collection of objects partitioned across machines that can be rebuilt if lost. 2) The join operation on RDDs of type (K,V) and (K,W) returns an RDD of type (K,(V,W)) pairs with all pairs of elements for each key. 3) Both statements about RDDs are true - you can control partitioning and choose to persist RDDs to disk.

Uploaded by Shreyansh Diwan

1) In Spark, a ______________________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.

A) Spark Streaming
B) Resilient Distributed Dataset (RDD)
C) FlatMap
D) Driver

2) Given the following definition of the join transformation in Apache Spark:

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

Here, join combines two datasets. When called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.

Select the result of joinrdd.collect when the following code is run.

val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))

val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect
A) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
B) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))
C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
D) None of the mentioned
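
The join semantics described in this question can be tried without a cluster. The following is a minimal plain-Scala sketch, assuming ordinary in-memory sequences stand in for RDDs; `JoinSketch` and `innerJoin` are hypothetical names used only for illustration, and the sample data is deliberately different from the question's. Unlike this sketch, Spark does not guarantee the order in which `collect` returns joined elements.

```scala
// Plain-Scala sketch of inner-join semantics on key-value pairs.
// No Spark dependency; JoinSketch and innerJoin are illustrative names.
object JoinSketch {
  // For every key present on BOTH sides, emit each (v, w) combination.
  def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    // Group the right side by key so each left element can look up its matches.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
    for {
      (k, v) <- left
      w <- rightByKey.getOrElse(k, Seq.empty) // keys missing on the right yield nothing
    } yield (k, (v, w))
  }

  def main(args: Array[String]): Unit = {
    val left = Seq(("a", 1), ("a", 2), ("b", 3))
    val right = Seq(("a", 10), ("a", 20), ("c", 30))
    // Keys "b" and "c" appear on one side only, so they are dropped.
    innerJoin(left, right).foreach(println)
    // prints (a,(1,10)), (a,(1,20)), (a,(2,10)), (a,(2,20)), one per line
  }
}
```

A key that occurs m times on the left and n times on the right contributes m × n pairs, and a key that occurs on only one side contributes none.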

3) Consider the following statements in the context of Spark:

Statement 1: Spark also gives you control over how you can partition your Resilient Distributed Datasets (RDDs).

Statement 2: Spark allows you to choose whether you want to persist a Resilient Distributed Dataset (RDD) onto disk or not.

A) Only statement 1 is true
B) Only statement 2 is true
C) Both statements are true
D) Both statements are false
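
As a rough illustration of what a partitioning scheme does, the plain-Scala sketch below assigns keys to partitions by taking `hashCode` modulo the partition count. It does not use Spark, and `PartitionSketch` and `partitionFor` are hypothetical names; in Spark itself, partitioning and persistence are requested through RDD methods such as `partitionBy` (on pair RDDs) and `persist`.

```scala
// Plain-Scala sketch of hash-based partition assignment.
// No Spark dependency; PartitionSketch and partitionFor are illustrative names.
object PartitionSketch {
  // Map a key to a partition index in [0, numPartitions).
  def partitionFor(key: Any, numPartitions: Int): Int = {
    require(numPartitions > 0, "numPartitions must be positive")
    val raw = key.hashCode % numPartitions
    // hashCode can be negative, so shift negative remainders into range.
    if (raw < 0) raw + numPartitions else raw
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq("m", "e", "s", "h")
    // Show which of 2 partitions each key would land in.
    keys.foreach(k => println(s"$k -> partition ${partitionFor(k, 2)}"))
  }
}
```

Because equal keys always map to the same index, all records for one key end up in the same partition, which is what lets key-based operations avoid moving data between machines.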

4) ______________ leverages Spark Core's fast scheduling capability to perform streaming analytics.
A) MLlib
B) Spark Streaming
C) GraphX
D) RDDs

5) ____________________ is a distributed graph processing framework on top of Spark.

A) MLlib
B) Spark Streaming
C) GraphX
D) All of the mentioned

6) Consider the following statements:

Statement 1: Scale out means growing your cluster capacity by replacing machines with more powerful ones.

Statement 2: Scale up means incrementally growing your cluster capacity by adding more COTS (Commercial Off-The-Shelf) machines.

A) Only statement 1 is true
B) Only statement 2 is true
C) Both statements are true
D) Both statements are false

7) Which of the following is not a NoSQL database?

A) HBase
B) SQL Server
C) Cassandra
D) None of the mentioned

8) Which of the following are the simplest NoSQL databases?

A) Key-value
B) Wide-column
C) Document
D) All of the mentioned

9) Point out the incorrect statement in the context of Cassandra:

A) It was originally designed at Facebook
B) It is a centralized key-value store
C) It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
D) It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing
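
The ring-based lookup mentioned in option D can be sketched in a few lines. This is a simplified plain-Scala illustration, assuming integer tokens and a node that already knows every token on the ring; `RingSketch`, `ownerOf`, the token values, and the node names are all hypothetical and are not Cassandra's actual API or token scheme.

```scala
import scala.collection.immutable.TreeMap

// Plain-Scala sketch of key placement on a token ring.
// RingSketch, ownerOf, and the sample tokens are illustrative only.
object RingSketch {
  // ring maps each node's token to the node name, sorted by token.
  // A key belongs to the first node whose token is >= the key's token,
  // wrapping around to the smallest token when none is found.
  def ownerOf(ring: TreeMap[Int, String], keyToken: Int): String = {
    require(ring.nonEmpty, "ring must contain at least one node")
    val clockwise = ring.iteratorFrom(keyToken)
    if (clockwise.hasNext) clockwise.next()._2 else ring.head._2
  }

  def main(args: Array[String]): Unit = {
    val ring = TreeMap(10 -> "n1", 50 -> "n2", 90 -> "n3")
    println(ownerOf(ring, 30)) // prints n2
    println(ownerOf(ring, 95)) // prints n1 (wraps around the ring)
  }
}
```

Because the lookup consults a full local view of the ring, it resolves the owner directly instead of forwarding the request hop by hop, which is the sense in which no finger tables or routing are needed.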
