Welcome to Scribd!

Perform Data Pre-Processing On Sample Data Set (Student - Arff)

Uploaded by

0% found this document useful (0 votes)

13 views4 pages

This document describes steps to perform data pre-processing on a student data set using WEKA. It includes loading the data, viewing attribute statistics, filtering attributes by removing ones that are not needed, and discretizing continuous attributes like age into categorical bins. Discretizing prepares the data for association rule mining by making all attributes categorical. The steps are demonstrated on a sample student data set with attributes like age, income level, student status, credit rating and whether they purchase a PC.

Original Description:

Original Title

weka exp-2.docx

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

0% found this document useful (0 votes)

13 views4 pages

Perform Data Pre-Processing On Sample Data Set (Student - Arff)

Uploaded by

aditya

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

Jump to Page

You are on page 1of 4

Search inside document

Perform Data Pre-processing on sample data set(student.

arff)

This experiment illustrates some of the basic data preprocessing operations that can be
performed using WEKA-Explorer. The sample dataset used for this example is the student
data available in arff format.

Step1: Loading the data. We can load the dataset into weka by clicking on open button in
preprocessing interface and selecting the appropriate file.

Step2: Once the data is loaded, weka will recognize the attributes and during the scan of the
data weka will compute some basic strategies on each attribute. The left panel in the above
figure shows the list of recognized attributes while the top panel indicates the names of the
base relation or table and the current working relation (which are same initially).

Step3:Clicking on an attribute in the left panel will show the basic statistics on the attributes
for the categorical attributes the frequency of each attribute value is shown, while for
continuous attributes we can obtain min, max, mean, standard deviation and deviation etc.,

Step4:The visualization in the right button panel in the form of cross-tabulation across two
attributes.
Note:we can select another attribute using the dropdown list.

Step5:Selecting or filtering attributes

Removing an attribute-When we need to remove an attribute,we can do this by using the

attribute filters in weka.In the filter model panel,click on choose button,This will show a
popup window with a list of available filters.

Scroll down the list and select the “weka.filters.unsupervised.attribute.remove” filters.

Step 6:a)Next click the textbox immediately to the right of the choose button.In the resulting
dialog box enter the index of the attribute to be filtered out.

b)Make sure that invert selection option is set to false.The click OK now in the filter box.you
will see “Remove-R-7”.
c)Click the apply button to apply filter to this data.This will remove the attribute and create
new working relation.

d)Save the new working relation as an arff file by clicking save button on the
top(button)panel.(student.arff)

1
Discretization

1)Sometimes association rule mining can only be performed on categorical data. This
requires performing discretization on numeric or continuous attributes.In the following
example let us discretize age attribute.

 Let us divide the values of age attribute into three bins(intervals).

 First load the dataset into weka(student.arff)

 Select the age attribute.

 Activate filter-dialog box and select WEKA.filters.unsupervised.attribute.discretize”

from the list.

 To change the defaults for the filters,click on the box immediately to the right of the
choose button.

 We enter the index for the attribute to be discretized.In this case the attribute is age.So
we must enter ‘1’ corresponding to the age attribute.

 Enter ‘3’ as the number of bins.Leave the remaining field values as they are.

 Click OK button.

 Click apply in the filter panel.This will result in a new working relation with the
selected attribute partition into 3 bins.

 Save the new working relation in a file called student-data-discretized.arff

Dataset student .arff

@relation student

@attribute age {<30,30-40,>40}

@attribute income {low, medium, high}

@attribute student {yes, no}

@attribute credit-rating {fair, excellent}

@attribute buyspc {yes, no}

@data

2
<30, high, no, fair, no

<30, high, no, excellent, no 30-40, high, no, fair, yes

>40, medium, no, fair, yes

>40, low, yes, fair, yes

>40, low, yes, excellent, no

30-40, low, yes, excellent, yes

<30, medium, no, fair, no

<30, low, yes, fair, no

>40, medium, yes, fair, yes

<30, medium, yes, excellent, yes

30-40, medium, no, excellent, yes

30-40, high, yes, fair, yes

>40, medium, no, excellent, no

%
The following screenshot shows the effect of discretization.

3
4

HW4 Text-1
Document8 pages
HW4 Text-1
Utkarsh Shrivatava
No ratings yet
Practical Engineering, Process, and Reliability Statistics
From Everand
Practical Engineering, Process, and Reliability Statistics
Mark Allen Durivage
No ratings yet
Data Analysis Using SPSS: Research Workshop Series
Document86 pages
Data Analysis Using SPSS: Research Workshop Series
Muhammad Asad Ali
No ratings yet
IS328 Assignment1-Unlocked
Document10 pages
IS328 Assignment1-Unlocked
Tetz
No ratings yet
Clustering Iris Data With Weka
Document6 pages
Clustering Iris Data With Weka
Cheng Lin
No ratings yet
Stress Strain Apparatus Lab
Document4 pages
Stress Strain Apparatus Lab
api-253978194
No ratings yet
Demonstration of Preprocessing On Dataset Student - Arff Aim: This Experiment Illustrates Some of The Basic Data Preprocessing Operations That Can Be
Document4 pages
Demonstration of Preprocessing On Dataset Student - Arff Aim: This Experiment Illustrates Some of The Basic Data Preprocessing Operations That Can Be
Pavan Sankar K
No ratings yet
MC0717 Lab Manual
Document42 pages
MC0717 Lab Manual
Arun Reddy
No ratings yet
DWDM Record With Alignment
Document69 pages
DWDM Record With Alignment
navya
No ratings yet
Data Mining 2
Document40 pages
Data Mining 2
Piyush Rajput
No ratings yet
Data Mining Lab Manual
Document44 pages
Data Mining Lab Manual
Amanpreet Kaur
33% (3)
Data Mining Lab Manual
Document40 pages
Data Mining Lab Manual
xamogoy396
No ratings yet
Data Mining - Lab - Manual
Document20 pages
Data Mining - Lab - Manual
varmam
No ratings yet
Data Mining Lab Manual
Document42 pages
Data Mining Lab Manual
bahubali king
No ratings yet
DWDM Lab Manual Using Weka-For MIC
Document42 pages
DWDM Lab Manual Using Weka-For MIC
Krishnamohan Yerrabilli
No ratings yet
Data Mining Lab Manual: Aurora's PG College Moosarambagh Mca Department
Document42 pages
Data Mining Lab Manual: Aurora's PG College Moosarambagh Mca Department
ladooroy
No ratings yet
Data-Mining-Lab-Manual Cs 703b
Document41 pages
Data-Mining-Lab-Manual Cs 703b
Amit Kumar Sahu
No ratings yet
DWDM Lab Manual
Document47 pages
DWDM Lab Manual
Krishna Chowdary Challa
No ratings yet
Data Warehousing and Data Mining Lab
Document53 pages
Data Warehousing and Data Mining Lab
Aman Jolly
No ratings yet
Exp 5
Document5 pages
Exp 5
Pavan Sankar K
No ratings yet
Objective: Classification Using ID3 and C4.5 Algorithms Tasks
Document8 pages
Objective: Classification Using ID3 and C4.5 Algorithms Tasks
Shivam Shukla
No ratings yet
Lab2 DataPreprocessing A1.2
Document15 pages
Lab2 DataPreprocessing A1.2
Deepak Dahiya
No ratings yet
Assignment 1-Preprocessing Handon
Document13 pages
Assignment 1-Preprocessing Handon
suleman045
No ratings yet
Data Extraction Aditya Agrawal Exp-2
Document7 pages
Data Extraction Aditya Agrawal Exp-2
Aditya Agrawal
No ratings yet
Assignment 1-Preprocessing Handon
Document6 pages
Assignment 1-Preprocessing Handon
Ch Ubaid Warraich
No ratings yet
Data Mining Lab File
Document20 pages
Data Mining Lab File
DanishKhan
No ratings yet
hw2 Datapreproc PDF
Document15 pages
hw2 Datapreproc PDF
Tetz
No ratings yet
COC131 Tutorial w6
Document4 pages
COC131 Tutorial w6
Krishan Acharya
No ratings yet
Assignment
Document5 pages
Assignment
Chandra Bhushan Sah
No ratings yet
Step1. Open The Data/bank Data - CSV Dataset
Document3 pages
Step1. Open The Data/bank Data - CSV Dataset
NitSal Nand
No ratings yet
Factor Analysis Spss Notes
Document4 pages
Factor Analysis Spss Notes
gauravibs
No ratings yet
Data Mining Questions Q&A
Document11 pages
Data Mining Questions Q&A
aaakandoh
No ratings yet
Journal Data Mining
Document31 pages
Journal Data Mining
Mukesh
No ratings yet
WEKA Manual
Document25 pages
WEKA Manual
sagar
No ratings yet
Wa0002.
Document21 pages
Wa0002.
thabeswar2003
No ratings yet
Tut2 Weka
Document8 pages
Tut2 Weka
borjaunda
No ratings yet
Using Weka 3 For Classification and Prediction: Preparing A Test File
Document6 pages
Using Weka 3 For Classification and Prediction: Preparing A Test File
NurcahyoDwiNugroho
No ratings yet
WEKA Manual
Document11 pages
WEKA Manual
prabumn
No ratings yet
CAP3770 Lab#4 DecsionTree Sp2017
Document4 pages
CAP3770 Lab#4 DecsionTree Sp2017
Melving
No ratings yet
hw2 Datapreproc
Document15 pages
hw2 Datapreproc
Nau Gamero
No ratings yet
DWDM - Case Study On Weka - Ceb624
Document13 pages
DWDM - Case Study On Weka - Ceb624
CEB524SreejitGNair
No ratings yet
AI32 Guide To Weka PDF
Document6 pages
AI32 Guide To Weka PDF
datruccone
No ratings yet
Exp 2
Document6 pages
Exp 2
abhatnagar1be21
No ratings yet
Exercise 5: 5. Queries
Document15 pages
Exercise 5: 5. Queries
biruk
No ratings yet
7 Data Pre-Processing in Clementine
Document7 pages
7 Data Pre-Processing in Clementine
Vũ Tuấn Hưng
No ratings yet
6.034 Design Assignment 2: 1 Data Sets
Document6 pages
6.034 Design Assignment 2: 1 Data Sets
upender_kalwa
No ratings yet
Using Weka
Document6 pages
Using Weka
Arunima
No ratings yet
Wekappt
Document58 pages
Wekappt
bahubali king
No ratings yet
Chapter 3 mgsc
Document28 pages
Chapter 3 mgsc
stacey.githinji
No ratings yet
Query Wizard: Step 1 - Select Fields
Document3 pages
Query Wizard: Step 1 - Select Fields
Clerenda Mcgrady
No ratings yet
DWDM LAB Manual SVEC-16
Document8 pages
DWDM LAB Manual SVEC-16
Pottli Siddhu
No ratings yet
A. What Are The Coordinates of The Centroids For The Good Students and The Weak Students?
Document18 pages
A. What Are The Coordinates of The Centroids For The Good Students and The Weak Students?
rbrzakovic
No ratings yet
Classification With WEKA: Data Mining Lab 2
Document8 pages
Classification With WEKA: Data Mining Lab 2
haristimuño
No ratings yet
Data Mining Record
Document34 pages
Data Mining Record
Mohan Chandaluru
No ratings yet
Performing Cluster Analysis
Document3 pages
Performing Cluster Analysis
Tamanna Bavishi Shah
No ratings yet
How To Perform and Interpret Factor Analysis Using SPSS
Document9 pages
How To Perform and Interpret Factor Analysis Using SPSS
Jyoti Prakash
100% (2)
Data Analysis Guide Spss
Document58 pages
Data Analysis Guide Spss
Urvashi Khedoo
No ratings yet
Practical Control Charts: Control Charts Made Easy!
From Everand
Practical Control Charts: Control Charts Made Easy!
Colin Hardwick
No ratings yet
Introduction To Business Statistics Through R Software: Software
From Everand
Introduction To Business Statistics Through R Software: Software
Editor IJSMI
No ratings yet
Business Statistics I Essentials
From Everand
Business Statistics I Essentials
Louise Clark
Rating: 5 out of 5 stars
5/5 (5)
Excel Techniques
From Everand
Excel Techniques
Online Trainees
Rating: 2 out of 5 stars
2/5 (1)
WinUSB Tutorial 5
Document12 pages
WinUSB Tutorial 5
quim758
No ratings yet
4.4c DAFTAR KEMAMPUAN PP64
Document1 page
4.4c DAFTAR KEMAMPUAN PP64
kemalprima
No ratings yet
2010 Ross Catalog
Document32 pages
2010 Ross Catalog
medphoto
No ratings yet
Steelworkers Face Challenges All Around: Local 2958's Rex Ambrose Talks About Haynes, Union Issues and Buying American
Document8 pages
Steelworkers Face Challenges All Around: Local 2958's Rex Ambrose Talks About Haynes, Union Issues and Buying American
trturnerjr
No ratings yet
IADC Specifications and Equipments TD-01
Document25 pages
IADC Specifications and Equipments TD-01
Them Bui Xuan
No ratings yet
I0qm Balun
Document2 pages
I0qm Balun
Dundo Sakić
No ratings yet
Extremeswitching X620: Product Overview
Document7 pages
Extremeswitching X620: Product Overview
Alejandro Gutierrez
No ratings yet
Class XI Practical Assignment Mysql
Document6 pages
Class XI Practical Assignment Mysql
harsh
0% (2)
Amarujala
Document59 pages
Amarujala
royal.faraz4u9608
100% (2)
Unit Test 3 Paper 1 Compile - Key
Document1 page
Unit Test 3 Paper 1 Compile - Key
yashi84480
No ratings yet
Edtpa Lesson Plan 1
Document2 pages
Edtpa Lesson Plan 1
api-480117695
No ratings yet
Saurabh SIP
Document53 pages
Saurabh SIP
Progna Banerjee
No ratings yet
DBX 586H Preamp
Document17 pages
DBX 586H Preamp
energicul
100% (1)
Chapter6 TransportLayer
Document81 pages
Chapter6 TransportLayer
onmcv
No ratings yet
DIN 11864-3/DIN 11853-3 Aseptic Clamp Unions For DIN 11866/11850 Table A
Document7 pages
DIN 11864-3/DIN 11853-3 Aseptic Clamp Unions For DIN 11866/11850 Table A
Dinh Nguyen Dao
No ratings yet
Maximo7 5 v4
Document73 pages
Maximo7 5 v4
Chinna Bhupal
No ratings yet
Novotechnik LWH 100
Document2 pages
Novotechnik LWH 100
Jorge Agustin González Saraiba
No ratings yet
Onboard Me M CFG Ref Proj
Document28 pages
Onboard Me M CFG Ref Proj
Wali Ahsan
No ratings yet
Debre Markos Universit
Document26 pages
Debre Markos Universit
kidanemariam tesera
No ratings yet
CBB81 High Voltage Metallized Polypropylene Film Capacitor: Features
Document2 pages
CBB81 High Voltage Metallized Polypropylene Film Capacitor: Features
saran gul
No ratings yet
FDMC 8884
Document7 pages
FDMC 8884
Kholit Lit
No ratings yet
Civic Center Design Guidelines
Document18 pages
Civic Center Design Guidelines
aldridge
No ratings yet
HUAWEI S5720-SI Series Switches Datasheet (Detailed Version)
Document24 pages
HUAWEI S5720-SI Series Switches Datasheet (Detailed Version)
Ernesto Escarza Rios
No ratings yet
1.stresses On Propeller Shaft: 2.excessive Stern Tube Bearing Wear Down Reasons
Document10 pages
1.stresses On Propeller Shaft: 2.excessive Stern Tube Bearing Wear Down Reasons
Subham Samantaray
No ratings yet
BahasaInggris B
Document20 pages
BahasaInggris B
fin99
No ratings yet
Indian Standard - Industrial Plant Layout-Code of Safe Practice
Document29 pages
Indian Standard - Industrial Plant Layout-Code of Safe Practice
DineshKumar
No ratings yet
Water
Document346 pages
Water
LTE002
No ratings yet
Accelerated Bridge Construction (ABC) Guidelines: 1.1 General
Document6 pages
Accelerated Bridge Construction (ABC) Guidelines: 1.1 General
Kartik
No ratings yet
Chem Office Enterprise 2006
Document402 pages
Chem Office Enterprise 2006
HalimatulJulkapli
No ratings yet