
Using Weka from Java

The power of Weka's data manipulation can also be exploited directly from Java code. This enables the development of data mining applications (for decision support systems) without writing any machine learning code yourself. Weka is written entirely in Java and comes with its documentation (Javadoc style). The basic classes are: Attribute, Instance, Instances and Classifier.

Gianluca Moro - DEIS, University of Bologna


Attribute class
This is the class for handling attributes. It is contained in the weka.core package. Four types of Attribute are supported:
- Numeric
- Nominal (a fixed set of values)
- String
- Date

A Numeric attribute definition example:
Attribute temperatura = new Attribute("Temperatura");

A Nominal attribute definition example:
FastVector tempoValues = new FastVector(3);
tempoValues.addElement("sole");
tempoValues.addElement("coperto");
tempoValues.addElement("pioggia");
Attribute tempo = new Attribute("Tempo", tempoValues);

Instance class
This is the class for handling a single instance. It is contained in the weka.core package. Creating an instance example:
// Create empty instance with three attribute values
// (in Weka 3.7 and later Instance is an interface: use new DenseInstance(3))
Instance inst = new Instance(3);
// Set instance's values for the attributes "temperatura" and "tempo"
inst.setValue(temperatura, 25);
inst.setValue(tempo, "coperto");
// Set instance's dataset to be the dataset "weather"
inst.setDataset(weather);


Instances class
This is the class for handling a set of instances. It is contained in the weka.core package. It can be created directly from an ARFF file:
FileReader reader = new FileReader("myDataset.arff");
Instances set = new Instances(reader);

To add a new instance to the set, just use:
set.add(inst);

To set the class attribute, just use:
// Suppose that the class attribute is the last one
int classAttributeIndex = set.numAttributes() - 1;
set.setClassIndex(classAttributeIndex);
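
The ARFF file mentioned above is a plain-text format: a header that declares the relation and its attributes, followed by an @data section with one comma-separated record per line. As a rough illustration, the sketch below builds the header for the weather attributes used later in these slides, in plain Java without Weka. The class and method names are hypothetical, the humidity attribute is written without its apostrophe to avoid quoting, and the data rows are deliberately left out.

```java
import java.util.List;

/** Hypothetical helper that assembles an ARFF header for the weather dataset. */
class ArffHeaderSketch {

    /** Formats a nominal attribute declaration, e.g. "@attribute Vento {true,false}". */
    static String nominal(String name, List<String> values) {
        return "@attribute " + name + " {" + String.join(",", values) + "}";
    }

    /** Formats a numeric attribute declaration ("real" is one of ARFF's numeric keywords). */
    static String numeric(String name) {
        return "@attribute " + name + " real";
    }

    /** Header only: the records would follow the "@data" line, one per line. */
    static String weatherHeader() {
        return String.join("\n",
            "@relation WeatherSet",
            "",
            nominal("Tempo", List.of("sole", "coperto", "pioggia")),
            numeric("Temperatura"),
            numeric("Umidita"),
            nominal("Vento", List.of("true", "false")),
            nominal("Gioca", List.of("si", "no")),
            "",
            "@data");
    }

    public static void main(String[] args) {
        System.out.println(weatherHeader());
    }
}
```

With the class attribute declared last, as here, the setClassIndex(set.numAttributes() - 1) call shown above selects it.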


Classifier class
An abstract class implemented by the specific algorithms. It is contained in the weka.classifiers package. It is the final model of the system. To build a model from a set of instances:
// For example, using the J48 tree algorithm
Classifier myClassifier = new J48();
myClassifier.buildClassifier(set);

To classify an instance, just:
// Using the J48 classifier that was already built on the training set
double predictedClass = myClassifier.classifyInstance(inst);


Save and load a model


The Classifier class is serializable, so it is very simple to save and load a model. To save a created model in a file:
classifier.buildClassifier(datas);
ObjectOutputStream oos = new ObjectOutputStream(
    new FileOutputStream(modelFile));
oos.writeObject(classifier);
oos.flush();
oos.close();

To load a model from a file:
ObjectInputStream fileIO = new ObjectInputStream(new FileInputStream(modelName));
// Remember to cast: readObject() returns an Object
Classifier model = (Classifier) fileIO.readObject();
fileIO.close();
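
The save and load steps rely only on standard Java serialization, so the round trip can be tried without Weka. The following is a minimal sketch under that assumption: ToyModel is a hypothetical stand-in for a trained classifier, and the helper names save and load are illustrative. It uses try-with-resources so the streams are closed even if an exception is thrown.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ModelStoreSketch {

    /** Hypothetical stand-in for a trained model; any Serializable object is handled the same way. */
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        ToyModel(String name) {
            this.name = name;
        }
    }

    /** Writes the object to the given file. */
    static void save(Serializable model, String fileName) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(fileName))) {
            out.writeObject(model);
        }
    }

    /** Reads the object back; the caller must cast the result to the expected type. */
    static Object load(String fileName) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(fileName))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("toy", ".model");
        file.deleteOnExit();
        save(new ToyModel("weather-tree"), file.getPath());
        ToyModel restored = (ToyModel) load(file.getPath()); // the cast the slide reminds us of
        System.out.println(restored.name); // prints "weather-tree"
    }
}
```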


The weather classifier ... again

This simple example shows how to use Weka from Java code. It is just the weather classifier seen in the first part. We will do basically the same things, but from our own code, without using the Weka GUI:
1. Just import some packages from Weka
import weka.core.*;
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;


2. The constructor: build the data structure (declare the attributes)


public Weather() throws Exception {
    String nameOfDataset = "WeatherSet";
    /* Attribute's vector:
     * @attribute Tempo {sole, coperto, pioggia}
     * @attribute Temperatura real
     * @attribute Umidita' real
     * @attribute Vento {true, false}
     * @attribute Gioca {si, no}
     */
    FastVector attributes = new FastVector(5);
    // Tempo
    FastVector tempoValues = new FastVector(3);
    tempoValues.addElement("sole");
    tempoValues.addElement("coperto");
    tempoValues.addElement("pioggia");
    // Vento
    FastVector ventoValues = new FastVector(2);
    ventoValues.addElement("true");
    ventoValues.addElement("false");
    // Gioca
    FastVector giocaValues = new FastVector(2);
    giocaValues.addElement("si");
    giocaValues.addElement("no");

    attributes.addElement(new Attribute("Tempo", tempoValues));
    attributes.addElement(new Attribute("Temperatura"));
    attributes.addElement(new Attribute("Umidita'"));
    attributes.addElement(new Attribute("Vento", ventoValues));
    attributes.addElement(new Attribute("Gioca", giocaValues));
    // Create a new dataset
    m_Data = new Instances(nameOfDataset, attributes, 100);
    // Set the last attribute as class
    m_Data.setClassIndex(m_Data.numAttributes() - 1);
}


3. Classify a message
public String classifyMessage(String message) throws Exception {
    // Check whether a classifier has been built.
    //if (m_Data.numInstances() == 0) {
    //    throw new Exception("No classifier available.");
    //}
    // Make separate little test set so that message
    // does not get added to string attribute in m_Data.
    Instances testset = m_Data.stringFreeStructure();
    // Make message into test instance.
    Instance instance = makeInstance(message, testset);
    // Get index of predicted class value.
    double predicted = m_Classifier.classifyInstance(instance);
    // Output class value: the prediction is in internal format (an index),
    // so value(..) on the class attribute maps it back to its label.
    /* From the documentation of the Instance class: all values (numeric,
     * nominal, or string) are internally stored as floating-point numbers.
     * If an attribute is nominal (or a string), the stored value is the
     * index of the corresponding nominal (or string) value in the
     * attribute's definition. We have chosen this approach in favor of a
     * more elegant object-oriented approach because it is much faster. */
    String msg = "Weather classified as: "
        + m_Data.classAttribute().value((int) predicted);
    return msg;
}
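
The internal format described in the comment above can be illustrated without Weka. The sketch below is not Weka's implementation: it is a plain-Java illustration, with hypothetical names, of storing a nominal value as the index of its label in the attribute definition (held in a double) and of mapping a predicted index back to its label.

```java
import java.util.List;

class NominalEncodingSketch {

    /** Internal value of a nominal label: its index in the attribute definition, as a double. */
    static double encode(List<String> definition, String label) {
        int index = definition.indexOf(label);
        if (index < 0) {
            throw new IllegalArgumentException("Value not defined for this attribute: " + label);
        }
        return index;
    }

    /** Maps an internal value back to its label, like the (int) cast in classifyMessage. */
    static String decode(List<String> definition, double stored) {
        return definition.get((int) stored);
    }

    public static void main(String[] args) {
        List<String> gioca = List.of("si", "no"); // definition order fixes the indices
        double predicted = encode(gioca, "no");
        System.out.println(predicted + " -> " + decode(gioca, predicted)); // prints "1.0 -> no"
    }
}
```

Because the index depends on the order of the values in the definition, the dataset used for decoding must declare the class values in the same order as the one used for training.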


The entire code

import java.io.*;
import weka.core.*;
import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;

/**
 * Program that classifies a day: depending on the weather conditions,
 * the system must decide whether to play or not.
 *
 * @author gm
 */
public class Weather implements Serializable {
    // Training set
    private Instances m_Data = null;
    // Chosen classifier
    private Classifier m_Classifier = new J48();

    /**
     * Constructor: creates a new, initially empty training dataset.
     *
     * @throws Exception
     */
    public Weather() throws Exception {
        String nameOfDataset = "WeatherSet";

        /* Attribute's vector:
         * @attribute Tempo {sole, coperto, pioggia}
         * @attribute Temperatura real
         * @attribute Umidita' real
         * @attribute Vento {true, false}
         * @attribute Gioca {si, no}
         */
        FastVector attributes = new FastVector(5);

        // Tempo
        FastVector tempoValues = new FastVector(3);
        tempoValues.addElement("sole");
        tempoValues.addElement("coperto");
        tempoValues.addElement("pioggia");

        // Vento
        FastVector ventoValues = new FastVector(2);
        ventoValues.addElement("true");
        ventoValues.addElement("false");

        // Gioca
        FastVector giocaValues = new FastVector(2);
        giocaValues.addElement("si");
        giocaValues.addElement("no");

        attributes.addElement(new Attribute("Tempo", tempoValues));
        attributes.addElement(new Attribute("Temperatura"));
        attributes.addElement(new Attribute("Umidita'"));
        attributes.addElement(new Attribute("Vento", ventoValues));
        attributes.addElement(new Attribute("Gioca", giocaValues));

        // Create a new dataset
        m_Data = new Instances(nameOfDataset, attributes, 100);
        // Set the last attribute as class
        m_Data.setClassIndex(m_Data.numAttributes() - 1);
    }

    /**
     * Configures the classifier from the file that contains it.
     *
     * @param modelName Name of the file that contains the model
     */
    public void setClassifier(String modelName) {
        try {
            Classifier model;
            // Load the model
            ObjectInputStream modelInObjectFile = new ObjectInputStream(
                new FileInputStream(modelName));
            model = (Classifier) modelInObjectFile.readObject();
            modelInObjectFile.close();

            this.m_Classifier = model;
            System.out.println("Model " + modelName + " loaded.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


    /**
     * Converts a string into an instance (record).
     *
     * @param data The string, in the format Tempo,Temperatura,Umidita',Vento,[Gioca]
     * @param dataSet The dataset the instance will belong to
     * @return The created instance
     */
    private Instance makeInstance(String data, Instances dataSet) {
        Instance instance = new Instance(5);
        String[] values = data.split(",");
        Attribute tempo = dataSet.attribute("Tempo");
        Attribute temperatura = dataSet.attribute("Temperatura");
        Attribute umidita = dataSet.attribute("Umidita'");
        Attribute vento = dataSet.attribute("Vento");
        Attribute gioca = dataSet.attribute("Gioca");
        instance.setValue(tempo, values[0]);
        instance.setValue(temperatura, Integer.parseInt(values[1]));
        instance.setValue(umidita, Integer.parseInt(values[2]));
        instance.setValue(vento, values[3]);
        if (values.length > 4) {
            instance.setValue(gioca, values[4]);
        }
        // Give instance access to attribute information from the dataset.
        instance.setDataset(dataSet);
        return instance;
    }

    /**
     * Classifies the message passed as input.
     *
     * @return A textual representation of the message's class
     */
    public String classifyMessage(String message) throws Exception {
        // Check whether classifier has been built.
        // (Left disabled: m_Data holds only the attribute structure and
        // stays empty in this program, so this test would always throw.)
        //if (m_Data.numInstances() == 0) {
        //    throw new Exception("No classifier available.");
        //}
        // Make separate little test set so that message
        // does not get added to string attribute in m_Data.
        Instances testset = m_Data.stringFreeStructure();
        // Make message into test instance.
        Instance instance = makeInstance(message, testset);
        // Get index of predicted class value.
        double predicted = m_Classifier.classifyInstance(instance);
        // Output class value.
        String msg = "Weather classified as: "
            + m_Data.classAttribute().value((int) predicted);
        return msg;
    }

    /**
     * Main method.
     *
     * @option -m Model file's name
     * @option -classify Classifies an instance with the model provided
     * @option -create Creates a model from an ARFF data file and writes it to the model file
     */
    public static void main(String[] options) {
        try {
            Classifier tree;
            if (options.length != 2) {
                String modelFile = Utils.getOption("m", options);
                String dataFile = Utils.getOption("create", options);
                if (dataFile.length() != 0) {
                    // Load data from file
                    Instances datas = new Instances(new BufferedReader(new FileReader(dataFile)));
                    datas.setClassIndex(datas.numAttributes() - 1);
                    // Build classifier
                    tree = new J48();
                    tree.buildClassifier(datas);
                    try {
                        // Serialize model
                        ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(modelFile));
                        oos.writeObject(tree);
                        oos.flush();
                        oos.close();
                        System.out.println("Model saved on: " + modelFile);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                } else {
                    Weather w = new Weather();
                    w.setClassifier(modelFile);
                    String unclassified = Utils.getOption("classify", options);
                    if (unclassified.length() != 0) {
                        // Classify the provided instance
                        System.out.println(w.classifyMessage(unclassified));
                    }
                }
            } else {
                System.out.println("Incorrect parameters");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}


How to use the program


To create a new model from a file, just launch the program with the -m and -create options (weka.jar must be on the classpath):
java Weather -m myModel.model -create Weather.arff


To classify a message, just launch the program with the -m and -classify options:
java Weather -m myModel.model -classify coperto,83,86,false
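
The value passed to -classify is a comma-separated record in the order Tempo,Temperatura,Umidita',Vento, with an optional Gioca label at the end. As a rough sketch of how such a record can be split and checked before it is turned into an instance, the plain-Java class below (hypothetical, not part of the program above and not using Weka) parses the same format and rejects records with the wrong number of fields.

```java
/** Plain-Java sketch of the record format accepted by -classify. */
class WeatherRecordSketch {
    final String tempo;
    final int temperatura;
    final int umidita;
    final String vento;
    final String gioca; // class label; null when the record is unclassified

    private WeatherRecordSketch(String tempo, int temperatura, int umidita,
                                String vento, String gioca) {
        this.tempo = tempo;
        this.temperatura = temperatura;
        this.umidita = umidita;
        this.vento = vento;
        this.gioca = gioca;
    }

    /** Parses "Tempo,Temperatura,Umidita',Vento[,Gioca]"; throws on malformed input. */
    static WeatherRecordSketch parse(String data) {
        String[] values = data.split(",");
        if (values.length < 4 || values.length > 5) {
            throw new IllegalArgumentException(
                "Expected Tempo,Temperatura,Umidita',Vento[,Gioca] but got: " + data);
        }
        return new WeatherRecordSketch(
            values[0].trim(),
            Integer.parseInt(values[1].trim()), // NumberFormatException if not an integer
            Integer.parseInt(values[2].trim()),
            values[3].trim(),
            values.length == 5 ? values[4].trim() : null);
    }

    public static void main(String[] args) {
        WeatherRecordSketch r = parse("coperto,83,86,false");
        System.out.println(r.tempo + " " + r.temperatura + " " + r.umidita + " " + r.vento);
        // prints "coperto 83 86 false"
    }
}
```

Like makeInstance in the program, this sketch reads temperature and humidity as integers, so a value such as 83.5 would be rejected.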
