Practical Project

Using the WEKA Explorer interface, work through the following tasks.

TASK A

Step 1: Load the glass.arff dataset.

Step 2: Descriptive data study.

(i) Determine how many attributes and instances the dataset contains. Count the number
of nominal attributes and numeric attributes.
(ii) How many data labels are involved?
(iii) Is there any missing data?
(iv) What is the range of Al, Si and K?
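
The figures asked for in Step 2 are shown directly in the Explorer's Preprocess panel, but the same summary can be sketched in a few lines of Python to make clear what is being counted. This is a minimal sketch, not WEKA's own parser: the function name summarize_arff and the toy ARFF text are assumptions made for illustration, the toy values are not taken from glass.arff, and the parser treats every attribute that is not declared with {...} as numeric.

```python
# Hypothetical ARFF snippet for illustration only; NOT the real glass.arff data.
SAMPLE_ARFF = """\
@relation toy
@attribute Al numeric
@attribute Si numeric
@attribute Type {a,b}
@data
1.0,70.5,a
2.5,?,b
0.5,72.0,a
"""


def summarize_arff(text):
    """Count attributes, instances and missing values; compute numeric ranges.

    Simplified: unquoted attribute names, comma-separated data rows, and any
    attribute not declared as {v1,v2,...} is treated as numeric.
    """
    attributes = []  # list of (name, kind), kind is "numeric" or "nominal"
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            kind = "nominal" if spec.startswith("{") else "numeric"
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True

    # Range (min, max) of each numeric attribute, ignoring missing values "?".
    ranges = {}
    for i, (name, kind) in enumerate(attributes):
        if kind == "numeric":
            values = [float(r[i]) for r in rows if r[i] != "?"]
            if values:
                ranges[name] = (min(values), max(values))

    return {
        "attributes": len(attributes),
        "instances": len(rows),
        "numeric": sum(kind == "numeric" for _, kind in attributes),
        "nominal": sum(kind == "nominal" for _, kind in attributes),
        "missing": sum(v == "?" for r in rows for v in r),
        "ranges": ranges,
    }


summary = summarize_arff(SAMPLE_ARFF)
print(summary)
```

Replacing SAMPLE_ARFF with the text of a real ARFF file gives numbers that can be checked against the Preprocess panel.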

Step 3: Use the J48 algorithm with its default settings to classify the data. What is the
classification accuracy? Display and explain the results from the confusion matrix.
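
To help with reading the confusion matrix: in WEKA's output each row is the actual class and each column is the class the instance was "classified as", so correct predictions lie on the diagonal. The sketch below shows how accuracy and per-class recall follow from such a matrix. The 3x3 matrix is a hypothetical example, not a J48 result on glass.arff, and the function names are chosen for illustration.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each actual class (row), the share of its instances predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
toy_matrix = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]

print(accuracy(toy_matrix))
print(per_class_recall(toy_matrix))
```

Off-diagonal cells show which classes are confused with each other, for example the two instances of the first class that were assigned to the second.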

Step 4: Identify the number of leaves and the size of the tree.

Step 5: Visualize the J48 results as a tree diagram.

Step 6: Configure J48 to build an unpruned tree and run the classification again. Does the
classification accuracy improve? Explain the effect of leaving the tree unpruned.

Step 7: Set the minNumObj option of J48 to 15 and run the classification. Compare the tree
diagram with the one from Step 6.

TASK B

Step 1: Load the glass.arff dataset.

Step 2: Apply the unsupervised Remove filter to remove three attributes: Ca, Ba and Fe.

Step 3: Undo all changes to restore the original glass.arff dataset. Now remove the three attributes
Ca, Ba and Fe by selecting them manually in the attribute list and removing them. Are the results
the same as in Step 2?

Step 3: Use the J48 algorithm with its default settings to classify the data. What is the
classification accuracy?

Step 4: Undo all changes to restore the original glass.arff dataset again. Now determine which of the
following attribute selections gives the highest classification accuracy using J48.
(i) Removing Fe, Si, Al, K
(ii) Removing Fe, Mg, RI
(iii) Removing Fe, Si, Mg, K

Step 5: Undo all changes to restore the original glass.arff dataset. Apply the unsupervised
Normalize filter to the attributes.
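
With its default settings, WEKA's unsupervised Normalize filter rescales each numeric attribute to the interval [0, 1]. The following is a minimal sketch of that min-max rescaling for a single attribute. The function name and the sample values are hypothetical, and mapping a constant attribute to 0.0 is a choice made here for the sketch.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: no spread to rescale (choice made for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute values, not taken from glass.arff.
raw = [1.0, 2.0, 3.0, 5.0]
scaled = min_max_normalize(raw)
print(scaled)
```

The ordering and relative spacing of the values are preserved; only the minimum, maximum, mean and standard deviation reported for the attribute change.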

Step 6: What changes do you observe in the attributes after Step 5?

TASK C

Step 1: Retrieve the Balloons dataset from the UCI Machine Learning Repository.

Step 2: Transform the data into a format readable by WEKA and open the file in WEKA.
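
One way to do the conversion is to write an ARFF file by hand or with a short script. The sketch below assumes the downloaded file is comma-separated text with one instance per line, no header row, and only nominal values. The attribute names (color, size, act, age, inflated) and the two sample rows are assumptions for illustration and should be checked against the dataset description that ships with the download.

```python
import csv
import io


def csv_to_arff(relation, attribute_names, rows):
    """Build ARFF text for all-nominal data; value sets are taken from the rows."""
    lines = ["@relation " + relation, ""]
    columns = list(zip(*rows))
    for name, column in zip(attribute_names, columns):
        # Distinct values in first-seen order become the nominal value set.
        values = list(dict.fromkeys(column))
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Illustrative rows in the assumed layout; NOT the actual file contents.
sample_text = "YELLOW,SMALL,STRETCH,ADULT,T\nPURPLE,LARGE,DIP,CHILD,F\n"
sample_rows = [row for row in csv.reader(io.StringIO(sample_text)) if row]

# Assumed attribute names; the last attribute is treated as the class.
names = ["color", "size", "act", "age", "inflated"]
arff_text = csv_to_arff("balloons", names, sample_rows)
print(arff_text)
```

Saving arff_text to a file with the .arff extension gives something the Explorer's "Open file..." button can load. WEKA can also open CSV files directly if a header row with attribute names is added first.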

Step 3: Perform a descriptive data analysis,

e.g. the number and type of attributes, the number of instances, … Are there any missing values? What are the data labels?

Step 4: Consider whether data filtering is necessary.

Step 5: Run classification analysis using the following WEKA algorithms.

- NaiveBayes
- J48
- IBk (for each value of K=2, 3, 4)
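
IBk is WEKA's k-nearest-neighbour classifier, and its K parameter is the number of neighbours that vote on the predicted class. The sketch below illustrates the idea for nominal attributes only. It assumes a simple mismatch count as the distance and an unweighted majority vote, with ties resolved by whichever label is encountered first, so it is not a reimplementation of IBk. The training instances are hypothetical.

```python
from collections import Counter


def mismatch_distance(a, b):
    """Number of attribute positions where two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k):
    """Majority vote among the k training instances closest to the query.

    train is a list of (features, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: mismatch_distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (features, class label).
toy_train = [
    (("YELLOW", "SMALL"), "T"),
    (("YELLOW", "LARGE"), "T"),
    (("PURPLE", "SMALL"), "F"),
    (("PURPLE", "LARGE"), "F"),
]

print(knn_predict(toy_train, ("YELLOW", "SMALL"), k=3))
print(knn_predict(toy_train, ("PURPLE", "LARGE"), k=3))
```

With an even K and two classes, votes can tie, which is one reason the accuracies for K=2 and K=4 may behave differently from K=3.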

Step 6: Compare the classification accuracies from Step 5.

Step 7: Identify the classifier errors using the plot matrix of Visualize. Do the prediction
errors of the three algorithms fall on the same data instances?
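
If the predictions of each classifier are exported, the comparison in Step 7 comes down to comparing sets of misclassified instance indices. The sketch below shows that comparison. The actual labels and the three prediction lists are hypothetical placeholders, not results from the Balloons data.

```python
def misclassified(actual, predicted):
    """Indices of instances whose predicted label differs from the actual label."""
    return {i for i, (a, p) in enumerate(zip(actual, predicted)) if a != p}


# Hypothetical labels and predictions for five instances.
actual = ["T", "F", "T", "F", "T"]
predictions = {
    "NaiveBayes": ["T", "F", "F", "F", "F"],
    "J48":        ["T", "T", "F", "F", "T"],
    "IBk":        ["T", "F", "F", "T", "T"],
}

errors = {name: misclassified(actual, preds) for name, preds in predictions.items()}
shared_by_all = set.intersection(*errors.values())

for name, indices in errors.items():
    print(name, sorted(indices))
print("misclassified by all three:", sorted(shared_by_all))
```

Instances that appear in every error set are hard for all three algorithms, while instances that appear in only one set point to a weakness of that particular classifier.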
