Our Desired Tool


Our desired tool, WEKA, is named after a flightless bird. Weka is a collection of machine learning algorithms that can be applied to a dataset directly or called from your own Java code. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization, and it offers several conveniences: it is platform independent, open source and free, and it ships with many different machine learning algorithms. The name WEKA stands for Waikato Environment for Knowledge Analysis.

When I was installing WEKA on my Windows operating system, I faced some unavoidable problems that made me hesitate between pushing forward and going back to look for a solution. After downloading the installer from the official site, I followed the basic instructions for installing the software, picking the default settings. The accompanying Oracle Java installation started automatically after WEKA itself was installed, and I again accepted the defaults, until a pop-up window appeared saying "successfully installed". A common problem during the installation was that WEKA could not fetch all of its modules: it reported an error message when several packages failed to download and install. I simply skipped those steps without installing them. I have not yet tried to install them again, but I am preparing for another session in which I can attempt a fresh installation of WEKA after uninstalling the current one.

By default, WEKA attempts to load all installed packages; if a package cannot be loaded for some reason, a message is displayed in the Loaded column of the package list. We can prevent a particular package from being loaded by selecting it and clicking the Toggle load button, which marks the package as one that should not be loaded the next time WEKA starts. This can be useful if an unstable package is generating errors, conflicting with another package (perhaps due to third-party libraries), or otherwise preventing WEKA from operating properly.

With WEKA we can preprocess a dataset, feed it into a learning scheme, and analyze the resulting classifier and its performance. Since the data is an integral part of the work, many data visualization facilities and data preprocessing tools are provided. All algorithms take their input in the form of a single relational table that can be read from a file or generated by a database query. We can use WEKA in several ways. One is to apply a learning method to a dataset and analyze its output to learn more about the data. Another is to use learned models to generate predictions on new instances. A third is to apply several different learners and compare their performance in order to choose one for prediction. In WEKA's interactive interface we select the learning method we want from a menu. Many methods have tunable parameters, which are accessed through a property sheet or object editor. A common evaluation module is used to measure the performance of all classifiers. The implementations of the actual learning schemes are the most valuable resource that WEKA provides.
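As a toy sketch of this common-evaluation idea, the following Python code runs 10-fold cross-validation around a trivial majority-class "learner". It illustrates only the shared evaluation loop; it is not WEKA's Evaluation module (which also stratifies the folds), and the majority predictor merely stands in for a real learning scheme.

```python
# Toy illustration of k-fold cross-validation with one shared evaluation
# loop, in the spirit of WEKA's common evaluation module. The learner is
# a trivial majority-class predictor, not a WEKA scheme.

def majority_learner(train_labels):
    """'Train' by memorizing the most frequent class label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda _instance: majority

def cross_val_accuracy(instances, labels, k=10):
    """Hold out every k-th instance in turn and average the accuracy."""
    n = len(instances)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))   # indices held out this fold
        train_labels = [labels[i] for i in range(n) if i not in test_idx]
        model = majority_learner(train_labels)
        for i in test_idx:
            if model(instances[i]) == labels[i]:
                correct += 1
    return correct / n

labels = ["yes"] * 12 + ["no"] * 8      # e.g. 12 approved, 8 rejected loans
instances = [[i] for i in range(20)]    # dummy feature vectors
print(cross_val_accuracy(instances, labels, k=10))
```

The point of the design is that any learner exposing the same "train, then predict" shape can be dropped into the same evaluation loop, which is exactly how WEKA can report comparable figures for all of its classifiers.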

Here, we have some data in a file called the loan data file. First of all, I have to prepare the data in ARFF format, and then we have to fire up the Explorer and load in the data, because ARFF is WEKA's native data storage format. The bulk of an ARFF file consists of a list of the instances, with the attribute values for each instance separated by commas. Most spreadsheet and database programs allow you to export data into a file in comma-separated value (CSV) format, as a list of records with commas between items. After doing this, I loaded the file into a text editor, added the dataset's name using the @relation tag, the attribute information using @attribute declarations, and a @data line, and saved the file as raw text.

To load the data, we first fire up the Explorer and click the button named "Open file". A new window pops up on the screen, from which we can locate the file in any directory of the PC. After choosing the file, we can observe the outcome of selecting the loan data file. In the Current relation section we can see the relation name along with the number of instances, the number of attributes, and the sum of weights. In the Selected attribute section we can find out more about the dataset. By defining the class, we get a graphical chart of the dataset, and for each specific class the chart is visualized differently according to its label. From the Filter panel we can select the filter we want to apply to the dataset: clicking the Choose button opens a tree of filters, where under "unsupervised" we find the option named Discretize. This is an instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes, and pressing the Apply button applies this specific filter to the dataset. One more thing about filters is their configurable settings: we can adjust them for the various types of attributes. Our desired "loan_data" file looks like this:

@relation loan_history

@attribute Client{s1,s2,s3,s4,s5,s6,s7,s8,s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s20}
@attribute unclient{yes,no}
@attribute LforPC{taken,nottaken}
@attribute LforCar{taken,nottaken}
@attribute MaleCli{yes,no}
@attribute married{yes,no}
@attribute LinprobA{yes,no}
@attribute Age real
@attribute MinB real
@attribute MontPay real
@attribute MoLoan real
@attribute ywtlemp real
@attribute apprloan{yes,no}
@attribute rejeloan{yes,no}

@data
s1,no,taken,nottaken,no,no,no,18,20,2,15,1,yes,no
s2,no,taken,nottaken,no,no,no,20,10,2,20,2,yes,no
s3,yes,taken,nottaken,no,yes,yes,25,5,4,12,0,no,yes
s4,no,taken,nottaken,no,yes,no,40,5,7,12,4,yes,no
s5,no,taken,nottaken,yes,no,yes,50,5,4,12,25,yes,no
s6,no,taken,nottaken,yes,no,no,18,10,5,8,1,yes,no
s7,no,taken,nottaken,yes,no,no,22,10,3,8,4,yes,no
s8,no,taken,nottaken,yes,yes,no,28,15,4,10,5,yes,no
s9,no,taken,nottaken,yes,yes,no,40,20,2,20,15,yes,no
s10,yes,nottaken,taken,no,yes,no,50,5,4,12,10,no,yes
s11,no,nottaken,taken,no,no,no,18,50,8,20,1,no,yes
s12,yes,nottaken,taken,no,yes,no,20,50,10,20,2,no,yes
s13,no,nottaken,taken,no,no,no,25,50,5,20,5,no,yes
s14,no,nottaken,taken,no,no,no,38,150,10,20,15,yes,no
s15,no,nottaken,taken,yes,yes,no,50,50,15,20,8,yes,no
s16,no,nottaken,taken,yes,no,no,19,50,7,20,2,no,yes
s17,no,nottaken,taken,yes,yes,no,21,150,3,20,3,yes,no
s18,no,nottaken,taken,yes,no,no,25,150,10,20,2,yes,no
s19,no,nottaken,taken,yes,yes,no,38,100,10,20,15,yes,no
s20,no,nottaken,taken,yes,yes,no,50,50,10,30,2,no,yes
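The ARFF preparation described above, wrapping comma-separated records in @relation, @attribute, and @data headers, can be sketched in a few lines of Python. This is an illustrative helper, not a WEKA tool; the attribute names and types must still be written by hand, and the toy two-attribute relation below is only an example.

```python
# Minimal sketch: wrap comma-separated records in ARFF headers.
# Attribute declarations are supplied explicitly, as when editing the
# exported CSV file by hand in a text editor.

def to_arff(relation, attributes, rows):
    """attributes: list of (name, type) pairs, where type is 'real' or a
    '{a,b,...}' nominal specification; rows: list of value lists."""
    lines = ["@relation " + relation, ""]
    for name, typ in attributes:
        lines.append("@attribute " + name + " " + typ)
    lines.append("")
    lines.append("@data")
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines)

arff = to_arff(
    "loan_sample",
    [("Age", "real"), ("apprloan", "{yes,no}")],
    [[18, "yes"], [25, "no"]],
)
print(arff)
```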
Discretizing means transforming a numeric attribute with a certain range into a nominal one. We can take that range and chop it into a certain number of equal parts, or bins; wherever a numeric value falls, we use the identity of its bin as the discretized version of the value. Instead of using equal-sized bins, we can adjust the bin boundaries so that the number of instances falling into each bin is approximately the same: this is equal-frequency binning. On the Preprocess tab, we first choose the Discretize filter; by applying it to the dataset we can see more specific information about its attributes.

If we compare the benefits of nominal and real-valued data for this task, the nominal representation has a slight advantage. A real-valued attribute stores the raw measurements, while its discretized nominal version groups those measurements into intervals, and it is these intervals, rather than individual numbers, that a classifier can relate directly to the class. Because of this, discretization experiments help us understand the loan data better. Discretization algorithms for real-valued attributes are of great importance in many areas such as artificial intelligence and machine learning. Discretization of real-valued attributes is an important method of compressing data and simplifying analysis, and it is indispensable in pattern recognition, machine learning, and rough set analysis. The key to discretization lies in choosing the cut points. As we know, there are two types of discretization techniques: unsupervised ones, which are "class blind", and supervised ones, which take the class value of the instances into account when creating intervals. Weka's main unsupervised method for discretizing numeric attributes is weka.filters.unsupervised.attribute.Discretize. It implements these two methods: equal-width (the default) and equal-frequency discretization.

To demonstrate the benefit of discretization we can run the same classifier before and after filtering. Before applying any filter to the dataset, we click the Classify tab beside the Preprocess tab, where the Choose button lets us select an algorithm, for example the "Naïve Bayes" classifier. We select Cross-validation with 10 folds, leave the rest of the options as they are, and click the Start button to get the classifier output, which shows brief information about the dataset. To pinpoint the changes, look at the section called Stratified cross-validation: there we can see 16 out of 20 Correctly Classified Instances, a correctness of 80%, and 4 Incorrectly Classified Instances, an error rate of 20%. Then we return to the Preprocess tab, select the Discretize filter by clicking the Choose button, and click Apply to obtain the discretized version of the dataset. Going back to the Classify section, we run "Naïve Bayes" again with the same settings as before discretization. Now there are no Incorrectly Classified Instances and the percentage of correctness is 100%, whereas previously it was 80%; several attributes also look quite different before and after discretization. Instead of "Naïve Bayes" we can use many other algorithms on our dataset. Finally, we can say that discretization helps us to understand the loan data better.
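The two binning strategies just described can be sketched in plain Python. This is an illustration of the idea only, not WEKA's implementation (weka.filters.unsupervised.attribute.Discretize); the sample values are the Age column of the first ten loan instances.

```python
# Sketch of the two unsupervised discretization strategies:
# equal-width binning and equal-frequency binning.

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # Values at the upper edge fall into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1) if width else 0
        labels.append(idx)
    return labels

def equal_frequency_bins(values, n_bins):
    """Cut so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    labels = []
    for v in values:
        # Rank-based bin index: position of v in the sorted order.
        rank = ordered.index(v)
        labels.append(min(rank * n_bins // n, n_bins - 1))
    return labels

ages = [18, 20, 25, 40, 50, 18, 22, 28, 40, 50]   # Age of s1..s10
print(equal_width_bins(ages, 3))       # bins of width (50-18)/3
print(equal_frequency_bins(ages, 3))   # bins of roughly equal size
```

Note how the two strategies disagree: with equal-width bins the middle interval can stay empty, while equal-frequency binning spreads the instances more evenly across the three bins.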
There is a question asking us: what does it mean if an attribute gets a single value (All) after discretization?

About this query, we can say that after discretization the numeric attributes of the loan dataset become nominal attributes, and the ones that are not significant with respect to the class get a single value "All", which may be interpreted as "don't care". If we find a row that has a value where all the other rows are empty, we should replace the empty values with 0 for the sake of an appropriate dataset and analysis; otherwise the gaps will affect the dataset and any analysis we perform on it. So we should set the null (empty) values to zero to avoid any effect on our activities related to the dataset. If an attribute gets a single value (All) after discretization, it means this attribute is not taken into consideration when making the decision. The ID attribute is only the serial number of the client, so it shows "All" after discretization, as I expected.
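The intuition can be sketched with a toy entropy calculation in Python. This is a simplified illustration of supervised, entropy-based discretization, not WEKA's actual MDL-based implementation: when no cut point on a numeric attribute reduces the class entropy by much, the method makes no cut and the whole range becomes the single interval "All". The age/label examples below are hypothetical.

```python
# Toy illustration (not WEKA's code): a supervised discretizer keeps a
# numeric attribute as one interval ("All") when no cut point on it
# reduces class entropy enough to justify a split.

from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def best_info_gain(values, labels):
    """Largest entropy reduction over all candidate cut positions."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, len(pairs)):
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        mixed = (len(left) * entropy(left) +
                 len(right) * entropy(right)) / len(pairs)
        best = max(best, base - mixed)
    return best

ages = [18, 20, 22, 40, 45, 50]
informative = ["yes", "yes", "yes", "no", "no", "no"]    # class follows age
uninformative = ["yes", "no", "yes", "no", "yes", "no"]  # class ignores age

print(best_info_gain(ages, informative))    # high gain: worth cutting
print(best_info_gain(ages, uninformative))  # low gain: leave as "All"
```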

For any kind of dataset, an ID attribute is necessary, because without an ID attribute the results would be incomplete. The ID attribute works like a unique key in the dataset, and with its help individual results can be traced back to specific records.

The original data file, created directly from the loan data set:

@relation loan_history

@attribute Client{s1,s2,s3,s4,s5,s6,s7,s8,s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s20}
@attribute unclient{yes,no}
@attribute LforPC{taken,nottaken}
@attribute LforCar{taken,nottaken}
@attribute MaleCli{yes,no}
@attribute married{yes,no}
@attribute LinprobA{yes,no}
@attribute Age real
@attribute MinB real
@attribute MontPay real
@attribute MoLoan real
@attribute ywtlemp real
@attribute apprloan{yes,no}
@attribute rejeloan{yes,no}

@data
s1,no,taken,nottaken,no,no,no,18,20,2,15,1,yes,no
s2,no,taken,nottaken,no,no,no,20,10,2,20,2,yes,no
s3,yes,taken,nottaken,no,yes,yes,25,5,4,12,0,no,yes
s4,no,taken,nottaken,no,yes,no,40,5,7,12,4,yes,no
s5,no,taken,nottaken,yes,no,yes,50,5,4,12,25,yes,no
s6,no,taken,nottaken,yes,no,no,18,10,5,8,1,yes,no
s7,no,taken,nottaken,yes,no,no,22,10,3,8,4,yes,no
s8,no,taken,nottaken,yes,yes,no,28,15,4,10,5,yes,no
s9,no,taken,nottaken,yes,yes,no,40,20,2,20,15,yes,no
s10,yes,nottaken,taken,no,yes,no,50,5,4,12,10,no,yes
s11,no,nottaken,taken,no,no,no,18,50,8,20,1,no,yes
s12,yes,nottaken,taken,no,yes,no,20,50,10,20,2,no,yes
s13,no,nottaken,taken,no,no,no,25,50,5,20,5,no,yes
s14,no,nottaken,taken,no,no,no,38,150,10,20,15,yes,no
s15,no,nottaken,taken,yes,yes,no,50,50,15,20,8,yes,no
s16,no,nottaken,taken,yes,no,no,19,50,7,20,2,no,yes
s17,no,nottaken,taken,yes,yes,no,21,150,3,20,3,yes,no
s18,no,nottaken,taken,yes,no,no,25,150,10,20,2,yes,no
s19,no,nottaken,taken,yes,yes,no,38,100,10,20,15,yes,no
s20,no,nottaken,taken,yes,yes,no,50,50,10,30,2,no,yes

The discretized data file, produced by applying the supervised Discretize filter:

@relation loan_history-weka.filters.supervised.attribute.Discretize-Rfirst-last-precision6-weka.filters.supervised.attribute.Discretize-Rfirst-last-precision6
@attribute Client {s1,s2,s3,s4,s5,s6,s7,s8,s9,s10,s11,s12,s13,s14,s15,s16,s17,s18,s19,s20}
@attribute unclient {yes,no}
@attribute LforPC {taken,nottaken}
@attribute LforCar {taken,nottaken}
@attribute MaleCli {yes,no}
@attribute married {yes,no}
@attribute LinprobA {yes,no}
@attribute Age {'\'All\''}
@attribute MinB {'\'All\''}
@attribute MontPay {'\'All\''}
@attribute MoLoan {'\'All\''}
@attribute ywtlemp {'\'All\''}
@attribute apprloan {yes,no}
@attribute rejeloan {yes,no}
@data
s1,no,taken,nottaken,no,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s2,no,taken,nottaken,no,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s3,yes,taken,nottaken,no,yes,yes,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',no,yes
s4,no,taken,nottaken,no,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s5,no,taken,nottaken,yes,no,yes,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s6,no,taken,nottaken,yes,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s7,no,taken,nottaken,yes,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s8,no,taken,nottaken,yes,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s9,no,taken,nottaken,yes,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s10,yes,nottaken,taken,no,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',no,yes
s11,no,nottaken,taken,no,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',no,yes
s12,yes,nottaken,taken,no,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',no,yes
s13,no,nottaken,taken,no,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',no,yes
s14,no,nottaken,taken,no,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s15,no,nottaken,taken,yes,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s16,no,nottaken,taken,yes,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',no,yes
s17,no,nottaken,taken,yes,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s18,no,nottaken,taken,yes,no,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s19,no,nottaken,taken,yes,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',yes,no
s20,no,nottaken,taken,yes,yes,no,'\'All\'','\'All\'','\'All\'','\'All\'','\'All\'',no,yes

[N.B. There are three ARFF files, including the original ARFF file created directly from the loan data set; we have included one discretized data file above (my choice of discretization parameters).]
