Welcome to Scribd!

Sheet 08

Uploaded by

0% found this document useful (0 votes)

6 views2 pages

This document contains exercises for a course on foundations of data engineering. Exercise 1 asks to order different workloads by how well they are suited for cluster computing. Exercise 2 asks to formulate MapReduce programs to handle tasks involving sets, documents, and matrices. Exercise 3 involves loading a song dataset into Spark and then answering questions by using Spark Dataset API functions, including finding songs from a year, the earliest song, hottest artist, most prolific album, and pairs of songs with similarly familiar artists.

Original Description:

TUM - FDE - Sheet8 - MapReduce

Original Title

Sheet08

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

6 views2 pages

Sheet 08

Uploaded by

eir.gn

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 2

Search inside document

TU München, Fakultät für Informatik

Lehrstuhl III: Datenbanksysteme

Prof. Dr. Thomas Neumann

Exercises for Foundations in Data Engineering, WiSe 22/23

Alexander Beischl, Maximilian Reif (i3fde@in.tum.de)
http://db.in.tum.de/teaching/ws2223/foundationsde
Sheet Nr. 08

Exercise 1 Order the following workloads by how well they are suited for cluster computing.
Assume a cluster of machines which are connected via ethernet and that the dataset is
distributed onto the machines when the workload starts. A workload that is well suited
for cluster computing decreases in execution time proportional to the number of machines
in the cluster.
• Search in text documents
• The toy example from the lecture (Basic Building Blocks, slide 26-35)
• Sorting records
• Breadth-first search
• Join of two relations (both relations to big for one machine)
• Shuffling a dataset

Exercise 2
Formulate map-reduce programs to handle the following tasks:
1. For the multi-sets X : [(a, b)], Y : [(c, d)] find all combinations where b = c.
2. For the documents D : [(name, [w])] (where D is a list of documents, in which each
document is a list of words w), find the words which all documents have in common.
3. Compute AB for the two matrices A and B.

Exercise 3 This exercise is about getting familiar with the Spark Dataset API.
To get started, open a spark shell and load the song dataset into a dataset. The data is an
excerpt from the Million Song Dataset. Make sure that appropriate data types are used
for each column. You can check the types e.g. with the printSchema function.
For a list of functions offered by the dataset API please refer to the dataset documentation.
To get started with importing a csv file, have a look at this tutorial.
Once the dataset is loaded, use the API to answer the following questions:
1. List all songs from the year 2000.
2. In which year is the first song from this dataset published? (For some songs in the
dataset, the year is set to 0. In this case, the year is unknown.)
3. Which artist is the hottest, according to the artist_hotttnesss column?
4. Which album has most songs on it? (Use release text and artist to identify album.)
5. Find song pairs with equally familiar artists. That is: a.artist_familiarity =
b.artist_familiarity How many pairs are there?

1
6. Find song pairs with similarly familiar artists.
That is: |a.artist_familiarity − b.artist_familiarity| < 0.001
How many pairs are there? You may notice, that when trying to run this query, it
does not terminate (in reasonable time). Call explain on the final dataset. This will
show the execution plan which will be used to retrieve the result. Most likely, the
root node of the plan you are seeing is a CarthesianProduct operator. It produces
a cross product of all its input and applies a filter operation on the result. Thus,
it is quite slow for larger amounts of data. Try to reformulate the query so that no
CarthesianProduct is used.

PJ 1
Document6 pages
PJ 1
Long Trương
No ratings yet
CS593: Data Structure and Database Lab Take Home Assignment - 4 (10 Questions, 100 Points)
Document3 pages
CS593: Data Structure and Database Lab Take Home Assignment - 4 (10 Questions, 100 Points)
Lafoot Babu
No ratings yet
K-Menas Problem
Document2 pages
K-Menas Problem
Gentian Strana
No ratings yet
Adobe Interview
Document16 pages
Adobe Interview
Subhash Jha
No ratings yet
MixSIR Manual 0.5
Document20 pages
MixSIR Manual 0.5
ennnetsw
No ratings yet
Standard Welded Wire Mesh For Concrete Reinforcement
Document3 pages
Standard Welded Wire Mesh For Concrete Reinforcement
marksantana
100% (2)
Promiment Sobriquets (World) : S.No. Place Sobriquet
Document6 pages
Promiment Sobriquets (World) : S.No. Place Sobriquet
kausik nath
100% (1)
CSE 5311: Design and Analysis of Algorithms Programming Project Topics
Document3 pages
CSE 5311: Design and Analysis of Algorithms Programming Project Topics
Fgv
No ratings yet
Adt Perf
Document42 pages
Adt Perf
Test
No ratings yet
Homework File: March 23, 2012
Document8 pages
Homework File: March 23, 2012
taemini9214
No ratings yet
List of Lab Exercises
Document3 pages
List of Lab Exercises
Ajay Raj Srivastava
No ratings yet
cs301 FAQ
Document9 pages
cs301 FAQ
pirah chhachhar
No ratings yet
Ann Tutorial
Document12 pages
Ann Tutorial
Jaam Jam
No ratings yet
CS5785 Homework 4: .PDF .Py .Ipynb
Document5 pages
CS5785 Homework 4: .PDF .Py .Ipynb
Al Tarino
No ratings yet
ps1 PDF
Document5 pages
ps1 PDF
Anil
No ratings yet
Data Structures and Algorithms
Document6 pages
Data Structures and Algorithms
suryaniraj79
No ratings yet
CSC263 Assignment 1 2014
Document5 pages
CSC263 Assignment 1 2014
iasi2050
No ratings yet
MCA 312 Design&Analysis of Algorithm QuestionBank
Document7 pages
MCA 312 Design&Analysis of Algorithm QuestionBank
nbpr
No ratings yet
Week 2 Project - Search Algorithms - CSMM
Document8 pages
Week 2 Project - Search Algorithms - CSMM
Raj kumar manepally
No ratings yet
Programming Pearls - Self-Describing Data
Document5 pages
Programming Pearls - Self-Describing Data
Vikash Singh
No ratings yet
MDA File
Document37 pages
MDA File
Akshat
No ratings yet
Assignment 2 - Data Structure Comparison
Document5 pages
Assignment 2 - Data Structure Comparison
Anonymous aiVnyoJb
No ratings yet
Mca1002 - Problem Solving Using Data Structures and Algorithms - LTP - 1.0 - 1 - Mca1002
Document4 pages
Mca1002 - Problem Solving Using Data Structures and Algorithms - LTP - 1.0 - 1 - Mca1002
Prateek Balchandani
No ratings yet
9nm4alc: CS 213 M 2023 Data Structures
Document21 pages
9nm4alc: CS 213 M 2023 Data Structures
sahil Barbade
No ratings yet
A Scalable Parallel Union-Find Algorithm For Distributed Memory Computers
Document10 pages
A Scalable Parallel Union-Find Algorithm For Distributed Memory Computers
lokesh guru
No ratings yet
Background: 1.1 General Introduction
Document19 pages
Background: 1.1 General Introduction
dimtry
No ratings yet
ELEC-278: Assignment 3
Document2 pages
ELEC-278: Assignment 3
jsy2008
No ratings yet
Assignment 1
Document2 pages
Assignment 1
ashish
No ratings yet
Content of Lab. Manual Prepared by Prof. Anil Kumar Swain
Document6 pages
Content of Lab. Manual Prepared by Prof. Anil Kumar Swain
Rudrasish Mishra
No ratings yet
++probleme Tot
Document22 pages
++probleme Tot
Rochelle Byers
No ratings yet
4.) Define Algorithm With Its Function ? ANS:-: A) Input
Document2 pages
4.) Define Algorithm With Its Function ? ANS:-: A) Input
shivam kachole
No ratings yet
Csci 5454 HW
Document10 pages
Csci 5454 HW
Ramnarayan Shreyas
No ratings yet
Lab 3: Unix and Linux
Document3 pages
Lab 3: Unix and Linux
Michael Fiorelli
No ratings yet
Instructions:: CS593: Data Structure and Database Lab Take Home Assignment - 5 (10 Questions, 100 Points)
Document3 pages
Instructions:: CS593: Data Structure and Database Lab Take Home Assignment - 5 (10 Questions, 100 Points)
Lafoot Babu
No ratings yet
Questions Collected From Web - Amazon
Document13 pages
Questions Collected From Web - Amazon
Nishtha Patel
100% (1)
Algorithms Examples Sheet
Document5 pages
Algorithms Examples Sheet
funtime_in_life2598
No ratings yet
HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing
Document11 pages
HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing
Mahroosh Banday
No ratings yet
Vector Diagram Simulation Package: A Matlab Toolbox: Ohio Advanced Epr Laboratory Robert Mccarrick - 1/23/2012
Document5 pages
Vector Diagram Simulation Package: A Matlab Toolbox: Ohio Advanced Epr Laboratory Robert Mccarrick - 1/23/2012
Azhar Mahmood
No ratings yet
Assignment 1
Document5 pages
Assignment 1
Changwook Jung
No ratings yet
Case Studies C++
Document5 pages
Case Studies C++
Vaibhav Chitransh
No ratings yet
Intro To DS and Algo Analysis
Document51 pages
Intro To DS and Algo Analysis
sachinkumardbg766
No ratings yet
Unix For Poets
Document25 pages
Unix For Poets
Sergei
No ratings yet
Sheet 07
Document3 pages
Sheet 07
eir.gn
No ratings yet
Assignment 2
Document5 pages
Assignment 2
Nguyễn Văn Nhật
No ratings yet
Ad3271 - Data Structure Design Lab - Final
Document68 pages
Ad3271 - Data Structure Design Lab - Final
suji.anand21
No ratings yet
Algo Assign
Document5 pages
Algo Assign
Muhammad Usman
No ratings yet
CS341L - Lab - 9 - Signals and System Techniques On Real-Time Data
Document4 pages
CS341L - Lab - 9 - Signals and System Techniques On Real-Time Data
arslan
No ratings yet
Assignment 22778 19835 612cbc1d69fbb
Document7 pages
Assignment 22778 19835 612cbc1d69fbb
Minakshi Patil
No ratings yet
Data Structure Syllabus
Document4 pages
Data Structure Syllabus
VaradharajanArul
No ratings yet
Sheet 03
Document3 pages
Sheet 03
eir.gn
No ratings yet
Data Structure
Document56 pages
Data Structure
dineshgomber
No ratings yet
Rahul Pandey
Document10 pages
Rahul Pandey
barunlucky
No ratings yet
Data Structure Notes
Document171 pages
Data Structure Notes
kavirajee
No ratings yet
Introduction To Deep Learning Assignment 0: September 2023
Document3 pages
Introduction To Deep Learning Assignment 0: September 2023
christiaanbergsma03
No ratings yet
DSAL Lab Manual
Document61 pages
DSAL Lab Manual
r.bunny.0022
No ratings yet
COSC-160 Fall2019 P2
Document4 pages
COSC-160 Fall2019 P2
Funmath
No ratings yet
Daa S
Document6 pages
Daa S
kolasandeep555
No ratings yet
A Complete Note On Data Structure and Algorithms
Document160 pages
A Complete Note On Data Structure and Algorithms
bishwas pokharel
No ratings yet
Amazon Interview Ques
Document10 pages
Amazon Interview Ques
Jyapriya
No ratings yet
Untitled
Document66 pages
Untitled
Andrea Catalan
No ratings yet
Introduction to Algorithms
From Everand
Introduction to Algorithms
S VASIST
No ratings yet
Sets, Numbers and Flowcharts
From Everand
Sets, Numbers and Flowcharts
William R. Parks
No ratings yet
Sheet 07
Document3 pages
Sheet 07
eir.gn
No ratings yet
Sheet 06
Document4 pages
Sheet 06
eir.gn
No ratings yet
Sheet 09
Document2 pages
Sheet 09
eir.gn
No ratings yet
Sheet 05
Document2 pages
Sheet 05
eir.gn
No ratings yet
Sheet 10
Document1 page
Sheet 10
eir.gn
No ratings yet
Sheet 02
Document2 pages
Sheet 02
eir.gn
No ratings yet
Sheet 01
Document2 pages
Sheet 01
eir.gn
No ratings yet
Sheet 04
Document3 pages
Sheet 04
eir.gn
No ratings yet
Sheet 03
Document3 pages
Sheet 03
eir.gn
No ratings yet
X1D - USER GUIDE English PDF
Document157 pages
X1D - USER GUIDE English PDF
Steve
No ratings yet
Revised-13.8kV Cable Sizing Calc-Arar 03.03.2005
Document10 pages
Revised-13.8kV Cable Sizing Calc-Arar 03.03.2005
srigirisetty208
No ratings yet
Gaskets
Document34 pages
Gaskets
Aslam Asiff
No ratings yet
HCI
Document15 pages
HCI
Abrivylle Cerise
No ratings yet
1986 GT Catalog
Document16 pages
1986 GT Catalog
tspinner
No ratings yet
A Survey of One of The Most Important Pressure Pipe Codes
Document3 pages
A Survey of One of The Most Important Pressure Pipe Codes
nerioalfonso
No ratings yet
AdaptiveAndLearningSystems TheoryAndApplications KumpatiSNarendra
Document410 pages
AdaptiveAndLearningSystems TheoryAndApplications KumpatiSNarendra
SooperAktif
No ratings yet
Accusine SWP 20 - 480 A Active Harmonic Filter: Receiving
Document2 pages
Accusine SWP 20 - 480 A Active Harmonic Filter: Receiving
Cata Catalin
No ratings yet
Pre Commiss. Check List Chilled Water Pumps
Document3 pages
Pre Commiss. Check List Chilled Water Pumps
ARUL SANKARAN
100% (1)
What Is The Production Process of Inorganic Pigment Powder
Document3 pages
What Is The Production Process of Inorganic Pigment Powder
kinley dorjee
No ratings yet
Department of Education: Learning Activity Worksheets
Document4 pages
Department of Education: Learning Activity Worksheets
ARLENE GRACE AVENUE
No ratings yet
Grade 5 DLL SCIENCE 5 Q3 Week 10
Document4 pages
Grade 5 DLL SCIENCE 5 Q3 Week 10
Leonardo Bruno Jr
100% (4)
Lesson 1 - Different Philosophical Thoughts On Education
Document8 pages
Lesson 1 - Different Philosophical Thoughts On Education
Glenn F. Jove
No ratings yet
Unit 14 Design of Slender Columns
Document32 pages
Unit 14 Design of Slender Columns
Sh Jvon Sh Jvon
No ratings yet
Conceptualisation and Pilot Study of Shelled Compressed Earth Block For Sustainable Housing in Nigeria
Document15 pages
Conceptualisation and Pilot Study of Shelled Compressed Earth Block For Sustainable Housing in Nigeria
MMMFIAZ
No ratings yet
Lipid Microencapsulation Allows Slow Release of Organic Acids and Natural Identical Flavors Along The Swine Intestine1,2
Document8 pages
Lipid Microencapsulation Allows Slow Release of Organic Acids and Natural Identical Flavors Along The Swine Intestine1,2
ĐêmTrắng
No ratings yet
Do Dice Play God
Document6 pages
Do Dice Play God
ninafatima allam
No ratings yet
6.2 MEGA Workshop
Document3 pages
6.2 MEGA Workshop
progress dube
No ratings yet
Cell Division: Mitosis & Meiosis
Document50 pages
Cell Division: Mitosis & Meiosis
Lyca Gunay
No ratings yet
Welding Consumables-Cast Iron
Document9 pages
Welding Consumables-Cast Iron
shabbir626
No ratings yet
The Tangents Drawn at The Ends of
Document6 pages
The Tangents Drawn at The Ends of
pjojibabu
100% (2)
High Voltage Engineering Assignment No 2: Ma'Am DR Sidra Mumtaz
Document7 pages
High Voltage Engineering Assignment No 2: Ma'Am DR Sidra Mumtaz
Zabeehullahmiakhail
No ratings yet
Assignment 3
Document2 pages
Assignment 3
Gaurav Jain
No ratings yet
Chemical Energertics
Document24 pages
Chemical Energertics
Anotidaishe Chakanetsa
No ratings yet
Lincoln, Bruce. How To Read A Religious Text - Reflections On Some Passages of The Chandogya Upanisad
Document14 pages
Lincoln, Bruce. How To Read A Religious Text - Reflections On Some Passages of The Chandogya Upanisad
vkas
No ratings yet
Asthma Action Plan
Document1 page
Asthma Action Plan
Akos
No ratings yet
Analyser Shelter ATEC Italy - WWW - Atecsrl
Document3 pages
Analyser Shelter ATEC Italy - WWW - Atecsrl
MalyshAV
No ratings yet
P2P Paths With Djikstra's Algorithm
Document8 pages
P2P Paths With Djikstra's Algorithm
Nouman Afzal
No ratings yet