Welcome to Scribd!

Mapreduce Algorithm Design

Uploaded by

100% found this document useful (1 vote)

26 views1 page

This homework assignment involves: 1) Implementing "stripes" and "pair" approaches in Hadoop to compute item co-occurrences and conditional probabilities from transactional data. 2) Revising the algorithms to count word co-occurrences and conditional probabilities from text files, considering lines, sentences, or paragraphs as analysis units. 3) Running the algorithms on the Enron email dataset as a bonus task. The submission should include source code for the four algorithms, instructions to run them, and screenshots of running each on Amazon EC2.

Original Description:

HEWWE

Original Title

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

100% found this document useful (1 vote)

26 views1 page

Mapreduce Algorithm Design

Uploaded by

Indra Sar

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 1

Search inside document

Homework 2 (Due Date: Feb.

14)

1. Your input datasets are transactional data with the following format:

25 52 164 240 274 328 368 448 538 561 630 687 730 775 825 834
39 120 124 205 401 581 704 814 825 834
35 249 674 712 733 759 854 950
39 422 449 704 825 857 895 937 954 964
15 229 262 283 294 352 381 708 738 766 853 883 966 978
26 104 143 320 569 620 798
7 185 214 350 529 658 682 782 809 849 883 947 970 979
227 390
71 192 208 272 279 280 300 333 496 529 530 597 618 674 675 720 855 914 932

Each line records a transaction (a set of items being purchased together).

Your goal is to compute the co-occurrences of each pair of items, Count(A,B)=# of transactions
contains both A and B, and the conditional probability Prob(B|A)=Count(A,B)/Count(A).

You need implement both the “stripes” approach and the “pair” approach om Hadoop (See
slides on MapReduce Algorithm Design and Chapter 3 in Jimmy Lin's textbook).

The sample datasets can be found at: http://fimi.ua.ac.be/data/

2. Revising your algorithms (in 1) to count the co-occurrences of pairs of words Count(A,B) and
the conditional probability Prob(B|A)=Count(A,B)/Count(A).
Your inputs are text files and considering using the following criteria for a unit (corresponding a
transaction in 1):
a. each line is a unit
b. each sentence is a unit
c. each paragraph is a unit (Bonus)

Bonus: Run your algorithms on the Enrone email datasets:

http://www.cs.cmu.edu/~enron/enron_mail_20110402.tgz

Submission: The source code of four algorithms (stripes and pairs for both problems); a simple
description on how to run your programs; and a screen shot on running each algorithm. Finally, all your
programs shall be tested and run on Amazon EC2.

Case Study
Document8 pages
Case Study
Kotwal Mohit Kotwal
50% (2)
501 GMAT Questions PDF
Document543 pages
501 GMAT Questions PDF
Indra Sar
100% (2)
System Identification Toolbox - Reference Matlab
Document1,249 pages
System Identification Toolbox - Reference Matlab
Willian Leão
100% (1)
ISU Billing Master Data
Document6 pages
ISU Billing Master Data
soumyajit1986
100% (3)
Assignment 2 PDF
Document25 pages
Assignment 2 PDF
Boni Halder
No ratings yet
CS1352 May09
Document14 pages
CS1352 May09
sridharanc23
100% (1)
COMPUTERISED ACCOUNTING - Ncert Summary + Important Questions - Jitin KR On YouTube
Document25 pages
COMPUTERISED ACCOUNTING - Ncert Summary + Important Questions - Jitin KR On YouTube
utkarshp2459
No ratings yet
Tsinghua Introduction To Big Data Systems hw8 Question
Document5 pages
Tsinghua Introduction To Big Data Systems hw8 Question
jovanhuang
No ratings yet
W3 TechAssignmentB SpreadsheetComputerComparision
Document6 pages
W3 TechAssignmentB SpreadsheetComputerComparision
Logan Major
No ratings yet
Computer Questions and Answers
Document10 pages
Computer Questions and Answers
api-3832224
No ratings yet
CRC of BCA (2) Assignment (Revised Syllabus)
Document17 pages
CRC of BCA (2) Assignment (Revised Syllabus)
Bshrinivas
No ratings yet
t4 m2
Document49 pages
t4 m2
Amazon Mío
No ratings yet
Csci 5454 HW
Document10 pages
Csci 5454 HW
Ramnarayan Shreyas
No ratings yet
ECEE2 - Lecture 3 - Flowcharting
Document3 pages
ECEE2 - Lecture 3 - Flowcharting
JheZz Masinsin
No ratings yet
M SC Syllabus
Document31 pages
M SC Syllabus
Vishal Upadhayay
No ratings yet
Assoc Parallel Journal
Document16 pages
Assoc Parallel Journal
Anonymous RrGVQj
No ratings yet
Assignment 22778 19835 612cbc1d69fbb
Document7 pages
Assignment 22778 19835 612cbc1d69fbb
Minakshi Patil
No ratings yet
Wa0000
Document38 pages
Wa0000
Aurobinda Mohanty
No ratings yet
Practical No.-01
Document25 pages
Practical No.-01
vaibhavibhurane
No ratings yet
Module 2
Document19 pages
Module 2
a45 as
No ratings yet
MCS 032 Notes
Document4 pages
MCS 032 Notes
TECH IS FUN
No ratings yet
Assignment 2: Use Arrays To Structure The Raw Data and To Perform Data Comparison & Operations
Document6 pages
Assignment 2: Use Arrays To Structure The Raw Data and To Perform Data Comparison & Operations
BB ki Vince
No ratings yet
Assignment 05
Document2 pages
Assignment 05
Gabriel Jau
No ratings yet
NanoRoughness Project
Document12 pages
NanoRoughness Project
Borad Madan Barkachary
No ratings yet
Introduction To Matlab - 1
Document190 pages
Introduction To Matlab - 1
Omer Uctu
No ratings yet
Theano: A CPU and GPU Math Compiler in Python
Document7 pages
Theano: A CPU and GPU Math Compiler in Python
hachan
No ratings yet
List of Lab Exercises
Document3 pages
List of Lab Exercises
Ajay Raj Srivastava
No ratings yet
Assignment 8 AWK
Document4 pages
Assignment 8 AWK
nitin10685
No ratings yet
5 Sess Prac MT
Document5 pages
5 Sess Prac MT
blaze
No ratings yet
Sample Exam Problems
Document9 pages
Sample Exam Problems
SherelleJiaxinLi
100% (1)
Exploratory Data Analysis (EDA) Using Python
Document21 pages
Exploratory Data Analysis (EDA) Using Python
bpjstk vc
No ratings yet
CS MSC Entrance Exam 2006 PDF
Document12 pages
CS MSC Entrance Exam 2006 PDF
Rahel Eshetu
No ratings yet
SPAO - Korisni Prashanja I Odgovori
Document6 pages
SPAO - Korisni Prashanja I Odgovori
Viktorija Krsteska
No ratings yet
CS1352 May07
Document19 pages
CS1352 May07
sridharanc23
No ratings yet
Scil 4 B
Document48 pages
Scil 4 B
Skye Jaba
No ratings yet
Interview Question
Document48 pages
Interview Question
s_praveenkumar
No ratings yet
9A05403 Design & Analysis of Algorithms
Document4 pages
9A05403 Design & Analysis of Algorithms
sivabharathamurthy
No ratings yet
4.1.2.4 Lab - Simple Linear Regression in Python
Document6 pages
4.1.2.4 Lab - Simple Linear Regression in Python
Nurul Fadillah Jannah
No ratings yet
CL2 Final Assignment
Document7 pages
CL2 Final Assignment
Mohit Gupta
No ratings yet
What Is Compiler in Datastage - Compilation Process in Datastage
Document14 pages
What Is Compiler in Datastage - Compilation Process in Datastage
shivnat
No ratings yet
Unit Iii & Iv
Document4 pages
Unit Iii & Iv
SangeethRaj PS
No ratings yet
What Is The Process To Create ISU Master Data?
Document4 pages
What Is The Process To Create ISU Master Data?
Suhas Misal
No ratings yet
1.3 PPT - Measure of Query Cost
Document42 pages
1.3 PPT - Measure of Query Cost
amaan
100% (1)
MATLAB Tutorials: 1.1 Environment
Document4 pages
MATLAB Tutorials: 1.1 Environment
gri_t
No ratings yet
CS246 Hw1
Document5 pages
CS246 Hw1
sudh123456
No ratings yet
Screenshot 2024-05-05 at 18.55.12
Document15 pages
Screenshot 2024-05-05 at 18.55.12
reagan omondi
No ratings yet
Assignment3 A20
Document3 pages
Assignment3 A20
April Ding
No ratings yet
4.3.1.4 Lab - Internet Traffic Data Linear Regression
Document14 pages
4.3.1.4 Lab - Internet Traffic Data Linear Regression
Nurul Fadillah Jannah
No ratings yet
BCA 2nd Sem Ass.2018-19
Document15 pages
BCA 2nd Sem Ass.2018-19
Kumar Info
No ratings yet
Machine (RAM), Instructions Are Executed One After Another (The Central
Document3 pages
Machine (RAM), Instructions Are Executed One After Another (The Central
sankarwinning
No ratings yet
9691 Y11 SP 3
Document4 pages
9691 Y11 SP 3
kervin
No ratings yet
19cs2205a Key
Document8 pages
19cs2205a Key
Suryateja Koka
No ratings yet
R Advbeginner v5
Document73 pages
R Advbeginner v5
LikonStruptel
No ratings yet
Theoryassignment PDF
Document11 pages
Theoryassignment PDF
Karthik Reddy
No ratings yet
5.1 Types: Edsger Dijkstra
Document30 pages
5.1 Types: Edsger Dijkstra
philip resuello
No ratings yet
Erer
Document52 pages
Erer
Richard Baker
No ratings yet
Data Understanding and Preparation
Document48 pages
Data Understanding and Preparation
MohamedYounes
No ratings yet
DM Lab Cycle 1
Document12 pages
DM Lab Cycle 1
ispclx
No ratings yet
End Semester Examination, May 2 EC Microprocesso: 008 /COEIIC-311: RS
Document10 pages
End Semester Examination, May 2 EC Microprocesso: 008 /COEIIC-311: RS
Dheeraj Anand
No ratings yet
Introduction to Algorithms
From Everand
Introduction to Algorithms
S VASIST
No ratings yet
Enterprise Artificial Intelligence Transformation
From Everand
Enterprise Artificial Intelligence Transformation
Rashed Haq
No ratings yet
Senior Solution Architect in Amazon CV Samle - Kickresume
Document4 pages
Senior Solution Architect in Amazon CV Samle - Kickresume
Indra Sar
No ratings yet
Human Resources Intern Resume Example - Kickresume
Document5 pages
Human Resources Intern Resume Example - Kickresume
Indra Sar
No ratings yet
Qs Mba Roi Report 2018 v2 PDF
Document26 pages
Qs Mba Roi Report 2018 v2 PDF
Indra Sar
No ratings yet
Idioms To Know For The GMAT
Document6 pages
Idioms To Know For The GMAT
Indra Sar
No ratings yet
Mla 5
Document4 pages
Mla 5
Indra Sar
No ratings yet
MLA Compilation 1-14
Document40 pages
MLA Compilation 1-14
Indra Sar
No ratings yet
Harvard Duke Fuqua Columbia: Round 1 vs. Round 2 vs. Round 3 - What Do These Mean?
Document3 pages
Harvard Duke Fuqua Columbia: Round 1 vs. Round 2 vs. Round 3 - What Do These Mean?
Indra Sar
No ratings yet
Wi-Fi Certified 6
Document2 pages
Wi-Fi Certified 6
Indra Sar
No ratings yet
Pax-Dbscan: A Proposed Algorithm For Improved Clustering: Grace L. Samson Joan Lu
Document36 pages
Pax-Dbscan: A Proposed Algorithm For Improved Clustering: Grace L. Samson Joan Lu
Indra Sar
No ratings yet
Predictive Modeling Machine Learning
Document16 pages
Predictive Modeling Machine Learning
Indra Sar
No ratings yet
7 Backtracking - N Queen Problem
Document23 pages
7 Backtracking - N Queen Problem
ctxxfx5ykz
No ratings yet
Artificial Intelligence Questions Solved
Document17 pages
Artificial Intelligence Questions Solved
Alima Nasa
No ratings yet
DSP Mod 1
Document41 pages
DSP Mod 1
Abhishek Pandey
No ratings yet
Worksheet Algebraic Expression
Document5 pages
Worksheet Algebraic Expression
Catherine Galotera
No ratings yet
Thapar University, Patiala
Document2 pages
Thapar University, Patiala
Mridul Mahindra
No ratings yet
Particle Swarm Optimization
Document13 pages
Particle Swarm Optimization
Saumil Shah
No ratings yet
Midterm Exam: ENGR 391 Numerical Methods
Document7 pages
Midterm Exam: ENGR 391 Numerical Methods
amyse56
No ratings yet
Decision Modeling Using Spreadsheet
Document36 pages
Decision Modeling Using Spreadsheet
amrita
No ratings yet
Irections: Read Each Problem Carefully. Choose The Letter That Corresponds To The Correct Answer
Document2 pages
Irections: Read Each Problem Carefully. Choose The Letter That Corresponds To The Correct Answer
Just Some Random Commenter
No ratings yet
A Novel Method For MP3 Steganalysis
Document7 pages
A Novel Method For MP3 Steganalysis
m1.nourian
No ratings yet
Chapter 24: Single-Source Shortest Paths: Given
Document48 pages
Chapter 24: Single-Source Shortest Paths: Given
Ananya Lakshmi P
No ratings yet
Algorithms, Fall 2005. (Massachusetts Institute of Technology: MIT
Document14 pages
Algorithms, Fall 2005. (Massachusetts Institute of Technology: MIT
ankitdocs
No ratings yet
Digital Signal Processing Multiple Choice Questions and Answers - Sanfoundry
Document13 pages
Digital Signal Processing Multiple Choice Questions and Answers - Sanfoundry
OMSURYACHANDRAN
0% (1)
Inverse Power Method, Shifted Power Method and Deflation
Document4 pages
Inverse Power Method, Shifted Power Method and Deflation
M.Y M.A
No ratings yet
1 s2.0 S0957417420303869 Main
Document13 pages
1 s2.0 S0957417420303869 Main
daytdeen
No ratings yet
Reviewer 2 - Numerical Solutions To CE Problems
Document6 pages
Reviewer 2 - Numerical Solutions To CE Problems
stnicog
No ratings yet
Polynomials Hand in Assignment #1
Document5 pages
Polynomials Hand in Assignment #1
cuteangel_103
No ratings yet
Automata Solution
Document4 pages
Automata Solution
Rapid fire
No ratings yet
Elec9123 DSP Design
Document7 pages
Elec9123 DSP Design
Sydney Finest
No ratings yet
Applied Numerical Methods Project
Document18 pages
Applied Numerical Methods Project
Daffa Sudiana
No ratings yet
Chap9 Anomaly Detection
Document46 pages
Chap9 Anomaly Detection
محمد إحسن
No ratings yet
5-LP Simplex (CJ-ZJ Tableau)
Document7 pages
5-LP Simplex (CJ-ZJ Tableau)
zuluagaga
No ratings yet
Ec2301 Digital Communication
Document19 pages
Ec2301 Digital Communication
jeevitha86
No ratings yet
Mason's Rule
Document1 page
Mason's Rule
Adarsh Bandi
100% (1)
AI Fellowship Syllabus LATAM
Document17 pages
AI Fellowship Syllabus LATAM
Oscar Misael Peña Victorino
No ratings yet
Section 15 Plomberie 2 Mai 2016
Document17 pages
Section 15 Plomberie 2 Mai 2016
Romi L'amiral
No ratings yet
Nemo, Czriss Paulimer C - Problem Set 1
Document7 pages
Nemo, Czriss Paulimer C - Problem Set 1
pauli
No ratings yet
Banana Leaf Disease Detection Using Deep Learning Approach
Document5 pages
Banana Leaf Disease Detection Using Deep Learning Approach
IJRASETPublications
No ratings yet