Practical Project
TASK A
(i) Determine how many attributes and instances the dataset contains. Count the number
of nominal attributes and numeric attributes.
(ii) How many data labels are involved?
(iii) Are there any missing values?
(iv) What are the ranges of Al, Si and K?
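The questions above are normally answered from the WEKA Preprocess panel. As a cross-check, the same summary can be computed directly from the ARFF text. The following is only a minimal sketch: `summarize_arff` is a hypothetical helper (not part of WEKA), the sample data is made up and is not taken from glass.arff, and the parser assumes unquoted attribute names and dense comma-separated rows.

```python
# Tiny made-up ARFF sample for illustration only; these are NOT values from glass.arff.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation demo
@attribute Al numeric
@attribute Si numeric
@attribute K numeric
@attribute Type {A,B}
@data
1.10,71.78,0.06,A
1.36,72.73,0.48,B
?,72.99,0.39,A
"""


def summarize_arff(text):
    """Summarize a simple ARFF file (hypothetical helper, not part of WEKA).

    Assumes unquoted attribute names and comma-separated dense data rows.
    """
    attributes = []  # (name, kind) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif line.lower().startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = "nominal"
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                kind = "other"  # e.g. string or date attributes
            attributes.append((name, kind))
        elif line.lower().startswith("@data"):
            in_data = True

    # Minimum and maximum of every numeric attribute, ignoring missing values ("?").
    ranges = {}
    for index, (name, kind) in enumerate(attributes):
        if kind != "numeric":
            continue
        present = [float(row[index]) for row in rows if row[index] != "?"]
        if present:
            ranges[name] = (min(present), max(present))

    return {
        "n_attributes": len(attributes),
        "n_instances": len(rows),
        "n_nominal": sum(kind == "nominal" for _, kind in attributes),
        "n_numeric": sum(kind == "numeric" for _, kind in attributes),
        "n_missing": sum(value == "?" for row in rows for value in row),
        "ranges": ranges,
    }


summary = summarize_arff(SAMPLE_ARFF)
print(summary)
```

To apply the sketch to the real dataset, read the contents of glass.arff into a string and pass it to `summarize_arff` in place of `SAMPLE_ARFF`.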
Step 3: Use the J48 algorithm with its default settings to classify the data. What is the classification
accuracy? Display and explain the results from the confusion matrix.
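When explaining the confusion matrix, it may help to recall how accuracy follows from it. In WEKA's output each row is an actual class and each column is a predicted class, so correct predictions lie on the diagonal. The sketch below uses a hypothetical 3-class matrix (the numbers are invented, not results for glass.arff), and the function names are illustrative only.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each actual class (row), the fraction predicted correctly."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]


# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]

print(accuracy(matrix))          # 23 correct out of 30 instances
print(per_class_recall(matrix))  # one recall value per actual class
```

Off-diagonal cells show which classes are confused with each other: for example, the entry in row 2, column 1 counts instances of the second class that were predicted as the first.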
Step 6: Configure J48 to build an unpruned tree and execute the J48 classification again. Does the
classification accuracy improve? Explain the effect of the unpruned tree.
Step 7: Set the minNumObj option of J48 to 15 and execute the classification. Compare the tree
diagram with that of Step 6.
TASK B
Step 2: Apply the unsupervised Remove filter to remove three attributes: Ca, Ba and Fe.
Step 3: Undo all changes to restore the original glass.arff dataset. Now remove the three attributes Ca, Ba
and Fe by selecting and removing them manually. Are the results the same as in Step 2?
Step 3: Use the J48 algorithm with its default settings to classify the data. What is the classification
accuracy?
Step 4: Undo all changes to restore the original glass.arff dataset again. Now determine which of the
following attribute selections gives the highest classification accuracy using J48.
(i) Removing Fe, Si, Al, K
(ii) Removing Fe, Mg, RI
(iii) Removing Fe, Si, Mg, K
Step 5: Undo all changes to restore the original glass.arff dataset. Apply the unsupervised Normalize
filter to the attributes.
Step 6: What changes are observed in the attributes after Step 5?
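For reference, WEKA's unsupervised Normalize filter, with its default settings, rescales each numeric attribute to the interval [0, 1] using min-max scaling. The sketch below shows that arithmetic on made-up values; `min_max_normalize` is a hypothetical helper, and returning 0.0 for a constant attribute is an assumption of this sketch rather than documented WEKA behaviour.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: no spread to rescale (assumption: map everything to 0.0).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up attribute values, not taken from glass.arff.
original = [1.0, 2.0, 3.0]
normalized = min_max_normalize(original)
print(normalized)
```

After this transformation the minimum and maximum of the attribute become 0 and 1, while the relative ordering and spacing of the values is preserved.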
TASK C
Step 1: Retrieve the Balloons dataset from the UCI Machine Learning Repository.
Step 2: Transform the data into a format readable by the WEKA tool and open the file using WEKA.
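One possible way to carry out the transformation is to wrap the comma-separated data in an ARFF header. The following is a sketch under stated assumptions: `csv_to_arff` is a hypothetical helper, the attribute names (color, size, act, age, inflated) are assumed from the dataset description and should be checked against the downloaded files, the two sample rows are illustrative only, and every attribute is treated as nominal.

```python
import csv
import io


def csv_to_arff(csv_text, relation, attribute_names):
    """Build ARFF text from header-less CSV text, treating every column as nominal."""
    rows = [
        [value.strip() for value in row]
        for row in csv.reader(io.StringIO(csv_text))
        if row  # skip blank lines
    ]
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(attribute_names):
        # A nominal attribute lists every distinct value that occurs in its column.
        values = sorted({row[index] for row in rows})
        lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Illustrative rows only; replace with the contents of the downloaded data file.
SAMPLE_CSV = "YELLOW,SMALL,STRETCH,ADULT,T\nPURPLE,LARGE,DIP,CHILD,F\n"
# Assumed attribute names; verify them against the dataset description.
NAMES = ["color", "size", "act", "age", "inflated"]

arff_text = csv_to_arff(SAMPLE_CSV, "balloons", NAMES)
print(arff_text)
```

Saving the printed text with an .arff extension gives a file that can be opened from the Preprocess panel.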
e.g. number and type of attributes, instances, … Any missing values? What are the data labels?
NaiveBayes
J48
IBk (for each value of K=2, 3, 4)
Step 7: Identify the classifier errors using the plot matrix of Visualize. Do the prediction
errors from the three algorithms point to the same data instances?
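Once the misclassified instances have been read off for each classifier, checking whether they coincide is a set comparison. The sketch below illustrates the idea with invented labels and predictions (not actual WEKA output for the Balloons data); `misclassified` is a hypothetical helper.

```python
def misclassified(actual, predicted):
    """Return the set of instance indices where the prediction differs from the label."""
    return {i for i, (a, p) in enumerate(zip(actual, predicted)) if a != p}


# Invented labels and predictions for five instances, for illustration only.
actual = ["T", "F", "T", "F", "T"]
predictions = {
    "NaiveBayes": ["T", "T", "T", "F", "F"],
    "J48":        ["T", "F", "T", "F", "F"],
    "IBk":        ["F", "F", "T", "F", "F"],
}

errors = {name: misclassified(actual, preds) for name, preds in predictions.items()}
# Instances that every classifier gets wrong.
common = set.intersection(*errors.values())

print(errors)
print(common)
```

If `common` equals every individual error set, the three algorithms fail on exactly the same instances; otherwise the differences show which instances are hard only for particular classifiers.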