Nayana Paper GUCON
1. Data Preprocessing
The clustered data obtained with the EM clustering algorithm were partitioned into separate datasets for training and evaluating the model. We construct training set 1 by randomly choosing 70% of the entire dataset. The remaining 30% is used as a test set to estimate the performance of the decision-tree algorithm. Because the model is evaluated multiple times during development, evaluating it repeatedly on the same dataset makes it easy to overfit. To evaluate the model more reliably, training set 1 is further split into two subsets containing 90% and 10% of its samples. The 90% subset, called training set 2, is used to train each model, and the remaining 10% is used as a cross-validation set (CV set) to evaluate it. As shown in Figure 3, the entire dataset is thus split into three subsets: training set 2, the CV set, and the test set.
V. Confusion matrix
The confusion matrix is a useful tool for analyzing how well a classifier recognizes samples of different classes. True positives (TP) and true negatives (TN) indicate when the classifier is getting things right, while false positives (FP) and false negatives (FN) indicate when it is getting things wrong. For a classifier with good accuracy, ideally most samples lie along the diagonal of the confusion matrix, with the remaining entries being 0 or close to 0. The confusion matrix of actual versus predicted classes is shown in Figure 4.
Figure 4. Confusion matrix
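The evaluation protocol described in the Data Preprocessing subsection and in this section can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code. It assumes the splits are made by shuffling the samples with a fixed seed and without stratification. The helper names (`split_dataset`, `confusion_matrix`, `binary_counts`, `accuracy`) are hypothetical. The decision-tree classifier itself is omitted, and `predicted` simply stands in for its outputs on the test set.

```python
import random


def split_dataset(samples, seed=0):
    """Hypothetical helper: 70/30 split into training set 1 and the test set,
    then a 90/10 split of training set 1 into training set 2 and the CV set."""
    rng = random.Random(seed)          # fixed seed for a reproducible shuffle
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train1 = len(shuffled) * 70 // 100   # integer arithmetic avoids float rounding
    train1, test = shuffled[:n_train1], shuffled[n_train1:]
    n_train2 = len(train1) * 90 // 100
    train2, cv = train1[:n_train2], train1[n_train2:]
    return train2, cv, test


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def binary_counts(matrix):
    """For a 2x2 matrix with labels ordered (negative, positive),
    return (TP, TN, FP, FN)."""
    tn, fp = matrix[0]
    fn, tp = matrix[1]
    return tp, tn, fp, fn


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    train2, cv, test = split_dataset(range(1000))
    print(len(train2), len(cv), len(test))

    # Placeholder labels: `predicted` stands in for decision-tree outputs.
    actual = [0, 0, 1, 1, 1, 0]
    predicted = [0, 1, 1, 1, 0, 0]
    m = confusion_matrix(actual, predicted, labels=[0, 1])
    print(m, binary_counts(m), accuracy(m))
```

For example, 1,000 samples yield 630 in training set 2, 70 in the CV set, and 300 in the test set. Because the matrix is indexed as actual-by-predicted, correct predictions accumulate on the diagonal, which is why accuracy is the diagonal sum divided by the total.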
REFERENCES