Homework Assignment: Project 1

This homework assignment requires students to: 1) Apply knowledge of association rule learning to analyze a large dataset and learn 50 strong rules using an algorithm like Apriori. 2) Write a program to generate an output file listing the support, confidence, and product of support and confidence for each rule. 3) Submit a report discussing their approach, the rules learned from the data, limitations of their method, and a suggested alternative approach.

Original Title

robt407_fall2014_project1

Uploaded by

Bauyrzhan Du Fromage

HOMEWORK ASSIGNMENT

NAZARBAYEV UNIVERSITY | SCHOOL OF SCIENCE AND TECHNOLOGY

PROJECT 1
In this project, students are required to apply the knowledge of association rules acquired in the
Statistical Methods and Machine Learning class to learn rules from the large dataset provided. Data preprocessing
skills, programming ability, and reporting of the completed work are also evaluated as part of the assignment.

DUE DATE
Thursday, 4th of September

METHOD OF DELIVERY
Assignment deliverables should be submitted via Moodle to the course instructor before the due date.

LEVEL OF COLLABORATION ALLOWED
Collaboration is not allowed on this assignment; each student should perform the assignment individually.

ESTIMATED TIME FOR COMPLETION
20 hours

ADDITIONAL SUPPORT
Please contact the course instructor if you need any assistance or have any concerns about this assignment.

ASSIGNMENT DELIVERABLES
- A Matlab (or C++) program written by the student to accomplish the task
- A report that describes in detail how the student approached and solved the problem. The association rules
learned from the data must be discussed.

GRADING CRITERIA
- 60% - implementation, functionality and documentation of the work
- 30% - student ranking, as determined by an evaluation program implementing a metric
- 10% - discussion of the limitations of the implemented approach and a suggestion of an alternative with
proper justification

ASSIGNMENT DETAILS
The Computer Age has provided people with an enormous amount of digital information and introduced the concept
of Big Data, which is associated with the difficulties that traditional data processing applications have in handling it.
Association rule learning is a popular and widespread machine learning technique that allows the extraction of
important and/or interesting relations between elements in large datasets. The concept provides tools to
identify strong rules using different measures, for example support and/or confidence. These strong rules then
describe regularities between variables in transaction datasets, which are very important in applications such
as market basket analysis, web data mining, bioinformatics, etc.
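As an illustration of the two measures named above, the following is a minimal sketch, assuming each transaction is stored as a set of integer items. The names Itemset, support and confidence are hypothetical and chosen for this example only. Support is taken here as the fraction of transactions containing all items of a set, and the confidence of a rule A -> c as the support of A together with c divided by the support of A.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical representation: one transaction = one set of item codes.
using Itemset = std::set<int>;

// Fraction of transactions that contain every item of `items`.
double support(const std::vector<Itemset>& transactions, const Itemset& items) {
    if (transactions.empty()) return 0.0;
    std::size_t hits = 0;
    for (const Itemset& t : transactions) {
        // std::includes needs sorted ranges; std::set iterates in sorted order.
        if (std::includes(t.begin(), t.end(), items.begin(), items.end())) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(transactions.size());
}

// confidence(A -> c) = support(A with c added) / support(A).
double confidence(const std::vector<Itemset>& transactions,
                  const Itemset& antecedent, int consequent) {
    // The consequent is assumed not to appear in the antecedent.
    assert(antecedent.count(consequent) == 0);
    Itemset both = antecedent;
    both.insert(consequent);
    const double base = support(transactions, antecedent);
    return base > 0.0 ? support(transactions, both) / base : 0.0;
}
```

For the four toy transactions {1,2}, {1}, {3}, {2,3}, the set {1} has support 0.5, the set {1,2} has support 0.25, and the rule 1 -> 2 therefore has confidence 0.5.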

In this assignment you are provided with a large dataset of 989818 transactions. The data comes from Internet
Information Server (IIS) logs of msnbc.com and news-related portions of msn.com for the entire day of
September 28, 1999 (http://archive.ics.uci.edu/ml/datasets/MSNBC.com+Anonymous+Web+Data). Each
transaction (row sequence) represents the page views of a user during that 24-hour period. Sequences consist of
numbers which represent web page categories as follows:
1 = Frontpage
2 = News
3 = Tech
4 = Local
5 = Opinion
6 = On Air
7 = Misc
8 = Weather
9 = MSN-News
10 = Health
11 = Living
12 = Business
13 = MSN-Sports
14 = Sports
15 = Summary
16 = BBS
17 = Travel
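
The category codes above can be kept in a small lookup table, and each row of the dataset can be reduced to the set of distinct categories it contains. The following is a minimal sketch; the names kCategoryNames, categoryName and parseTransaction are hypothetical, and the separator handling (commas and/or spaces) is an assumption about the file layout that should be checked against the actual data file. Reducing a sequence to a set discards visit order and repeated visits, which is a design choice suited to "accessed together" rules rather than a requirement of the assignment.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Category names indexed by (code - 1), in the order of the list above.
const std::array<std::string, 17> kCategoryNames = {
    "Frontpage", "News", "Tech", "Local", "Opinion", "On Air", "Misc",
    "Weather", "MSN-News", "Health", "Living", "Business", "MSN-Sports",
    "Sports", "Summary", "BBS", "Travel"};

// Name of a category code; codes outside 1..17 are treated as a programming error.
std::string categoryName(int code) {
    assert(code >= 1 && code <= 17);
    return kCategoryNames[code - 1];
}

// Parse one row into the set of distinct category codes it contains.
// Assumption: codes are separated by commas and/or whitespace.
std::set<int> parseTransaction(const std::string& line) {
    std::string cleaned = line;
    for (char& ch : cleaned) {
        if (ch == ',') ch = ' ';
    }
    std::istringstream in(cleaned);
    std::set<int> items;
    int code = 0;
    while (in >> code) items.insert(code);
    return items;
}
```

With this sketch, a row such as 2,2,14 yields the set {2, 14}, and categoryName(14) returns "Sports".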

For instance, the sequence 4,3,5,1,10,1 means that the person visited the Local, Tech, Opinion, Frontpage, Health and
then again Frontpage pages. You are required to implement an association rule algorithm (for example the Apriori
algorithm, but don't forget that there might be better ones) to learn 50 strong rules from your dataset. This will
involve tuning the support and confidence thresholds and analyzing the results. Rules should depict which
web page categories are commonly accessed together, for example
Frontpage -> News : (support, confidence)
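
To make the Apriori idea concrete, here is a minimal sketch of only the first two levels of an Apriori-style search: frequent single items are found first, and pairs are then counted only among items that are themselves frequent, since a pair cannot be frequent if one of its members is not. This is not a full implementation (it stops at pairs and generates no rules). The names Itemset, frequentItems and frequentPairs are hypothetical, and minCount is an absolute transaction count chosen by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Itemset = std::set<int>;
using PairCounts = std::map<std::pair<int, int>, std::size_t>;

// Level 1: items that occur in at least minCount transactions.
std::set<int> frequentItems(const std::vector<Itemset>& transactions,
                            std::size_t minCount) {
    std::map<int, std::size_t> counts;
    for (const Itemset& t : transactions)
        for (int item : t) ++counts[item];
    std::set<int> frequent;
    for (const auto& entry : counts)
        if (entry.second >= minCount) frequent.insert(entry.first);
    return frequent;
}

// Level 2: pairs (a, b) with a < b that occur in at least minCount transactions.
// Only pairs built from frequent single items are ever counted (Apriori pruning).
PairCounts frequentPairs(const std::vector<Itemset>& transactions,
                         std::size_t minCount) {
    assert(minCount >= 1);
    const std::set<int> frequent = frequentItems(transactions, minCount);
    PairCounts counts;
    for (const Itemset& t : transactions) {
        std::vector<int> kept;  // frequent items of this transaction, ascending
        for (int item : t)
            if (frequent.count(item) > 0) kept.push_back(item);
        for (std::size_t i = 0; i < kept.size(); ++i)
            for (std::size_t j = i + 1; j < kept.size(); ++j)
                ++counts[std::make_pair(kept[i], kept[j])];
    }
    // Drop pairs that did not reach the threshold.
    for (auto it = counts.begin(); it != counts.end();) {
        if (it->second < minCount) it = counts.erase(it);
        else ++it;
    }
    return counts;
}
```

Dividing a pair's count by the number of transactions gives its support, and dividing it by the count of one member gives the confidence of the rule from that member to the other.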

Your program must generate an output file in txt format with the following values listed on each line,
separated by commas:

Support (float), Confidence (float), Product of Support and Confidence values (float), Consequent (integer), list
of Antecedents (integer)
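
One way to produce a line in this layout is sketched below. The Rule struct and the formatRule function are hypothetical names introduced for illustration. The sketch assumes a plain comma with no surrounding spaces as the separator and relies on the default stream formatting for floats (six significant digits); both choices should be checked against what the evaluation program expects.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one learned rule: antecedents -> consequent.
struct Rule {
    double support;
    double confidence;
    int consequent;
    std::vector<int> antecedents;
};

// Build one output line:
// support,confidence,support*confidence,consequent,antecedent1,antecedent2,...
std::string formatRule(const Rule& r) {
    assert(!r.antecedents.empty());  // a rule needs at least one antecedent
    std::ostringstream out;
    out << r.support << ',' << r.confidence << ','
        << r.support * r.confidence << ',' << r.consequent;
    for (int a : r.antecedents) out << ',' << a;
    return out.str();
}
```

For example, a rule 1 -> 2 with support 0.25 and confidence 0.5 is written as 0.25,0.5,0.125,2,1.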

Note that lines may have different numbers of entries because rules may have different numbers of Antecedents. The rules
should be listed in sorted order with the strongest rule first. The strength of a rule is determined by the
product of its support and confidence.
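
The ordering described above can be sketched as follows. The Rule struct and the strength and keepStrongest functions are hypothetical names. The limit parameter (50 in this assignment) and the use of a stable sort, which keeps rules of equal strength in their original order, are choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one learned rule: antecedents -> consequent.
struct Rule {
    double support;
    double confidence;
    int consequent;
    std::vector<int> antecedents;
};

// Strength as defined in the assignment: support times confidence.
double strength(const Rule& r) { return r.support * r.confidence; }

// Sort rules strongest-first and keep at most `limit` of them.
void keepStrongest(std::vector<Rule>& rules, std::size_t limit) {
    assert(limit > 0);
    std::stable_sort(rules.begin(), rules.end(),
                     [](const Rule& a, const Rule& b) {
                         return strength(a) > strength(b);
                     });
    if (rules.size() > limit) rules.resize(limit);
}
```

Calling keepStrongest(rules, 50) on the full list of candidate rules leaves the 50 strongest, ready to be written out line by line.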
