Machine Learning Lab Manual (15CSL76)
VII SEMESTER
Prepared by:
Ms. RACHANA N B
Asst. Professor
Dept. of CS&E.
GMIT, Davangere
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.
THEORY:
• The concept learning approach in machine learning can be formulated as “Problem of searching
through a predefined space of potential hypotheses for the hypothesis that best fits the training
examples”.
• Find-S algorithm for concept learning is one of the most basic algorithms of machine learning.
Find-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h :
If the constraint ai in h is satisfied by x then do nothing
Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
• It is guaranteed to output the most specific hypothesis within H that is consistent with the
positive training examples.
• Notice also that negative examples are ignored.
Limitations of the Find-S algorithm:
• There is no way to determine whether the final hypothesis (found by Find-S) is the only hypothesis consistent with the data, or whether other consistent hypotheses exist.
• Inconsistent sets of training data can mislead the Find-S algorithm, since it ignores negative training examples.
• A good concept learning algorithm should be able to backtrack the choice of hypothesis found so
that the resulting hypothesis can be improved over time. Unfortunately, Find-S provides no such
method.
INPUT: finds.csv
Sky,Airtemp,Humidity,Wind,Water,Forecast,WaterSport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
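Working through these four rows, the hypothesis evolves as follows: it starts as the first positive example (Sunny, Warm, Normal, Strong, Warm, Same); the second positive example differs only in Humidity, generalizing it to (Sunny, Warm, ?, Strong, Warm, Same); the negative example is ignored; and the last positive example differs in Water and Forecast, giving the final hypothesis (Sunny, Warm, ?, Strong, ?, ?).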
PROGRAM:
import numpy as np
import pandas as pd

data = pd.read_csv('finds.csv')

def train(concepts, target):
    # start with the first positive example as the most specific hypothesis
    for i, val in enumerate(target):
        if val == "Yes":
            specific_h = concepts[i].copy()
            break
    # generalize the hypothesis over the remaining positive examples
    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = "?"   # replace a mismatching attribute with '?'
    return specific_h

concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])
print(train(concepts, target))
OUTPUT:
['Sunny' 'Warm' '?' 'Strong' '?' '?']
LEARNING OUTCOMES :
• Students will be able to apply Find-S algorithm to the real world problem and find the most
specific hypothesis from the training data.
APPLICATION AREAS:
• Classification based problems.
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.
THEORY:
• The key idea in the Candidate-Elimination algorithm is to output a description of the set of all
hypotheses consistent with the training examples.
• It computes the description of this set without explicitly enumerating all of its members.
• This is accomplished by using the more-general-than partial ordering and maintaining a compact
representation of the set of consistent hypotheses.
• The algorithm represents the set of all hypotheses consistent with the observed training
examples. This subset of all hypotheses is called the version space with respect to the hypothesis
space H and the training examples D, because it contains all plausible versions of the target
concept.
• A version space can be represented with its general and specific boundary sets.
• The Candidate-Elimination algorithm represents the version space by storing only its most
general members G and its most specific members S.
• Given only these two sets S and G, it is possible to enumerate all members of a version space by
generating hypotheses that lie between these two sets in general-to-specific partial ordering over
hypotheses. Every member of the version space lies between these boundaries
Algorithm
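In Mitchell's standard formulation, the Candidate-Elimination algorithm proceeds as follows:
• Initialize G to the set of maximally general hypotheses in H, and S to the set of maximally specific hypotheses in H.
• For each positive training example d: remove from G any hypothesis inconsistent with d; for each hypothesis s in S that is not consistent with d, remove s from S and add all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h; then remove from S any hypothesis that is more general than another hypothesis in S.
• For each negative training example d: remove from S any hypothesis inconsistent with d; for each hypothesis g in G that is not consistent with d, remove g from G and add all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h; then remove from G any hypothesis that is less general than another hypothesis in G.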
PROGRAM:
import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('finds.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
    # drop rows of general_h that remained completely general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final S:", s_final, sep="\n")
print("Final G:", g_final, sep="\n")
data.head()
INPUT: finds.csv
Sky,Airtemp,Humidity,Wind,Water,Forecast,WaterSport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
OUTPUT:
Final S:
['Sunny' 'Warm' '?' 'Strong' '?' '?']
Final G:
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
LEARNING OUTCOMES:
The students will be able to apply candidate elimination algorithm and output a description of the set of
all hypotheses consistent with the training examples
APPLICATION AREAS:
Classification based problems.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
THEORY:
• ID3 algorithm is a basic algorithm that learns decision trees by constructing them topdown, beginning
with the question "which attribute should be tested at the root of the tree?".
• To answer this question, each instance attribute is evaluated using a statistical test to determine how
well it alone classifies the training examples. The best attribute is selected and used as the test at the
root node of the tree.
• A descendant of the root node is then created for each possible value of this attribute, and the training
examples are sorted to the appropriate descendant node (i.e., down the branch corresponding to the
example's value for this attribute).
• The entire process is then repeated using the training examples associated with each descendant node
to select the best attribute to test at that point in the tree.
• A simplified version of the algorithm, specialized to learning boolean-valued functions (i.e., concept
learning), is described below.
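The statistical test normally used by ID3 is information gain, defined in terms of entropy. For a collection S with proportion p+ of positive and p- of negative examples:

    Entropy(S) = -p+ log2(p+) - p- log2(p-)
    Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv| / |S|) * Entropy(Sv)

The attribute with the highest information gain is selected as the test at each node.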
PROGRAM:
package id3;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.Id3;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

public class TestID3 {

    public static BufferedReader readDataFile(String filename) {
        BufferedReader inputReader = null;
        try {
            inputReader = new BufferedReader(new FileReader(filename));
        } catch (FileNotFoundException ex) {
            System.err.println("File not found: " + filename);
        }
        return inputReader;
    }

    public static void main(String[] args) throws Exception {
        // Get file
        ArffLoader loader = new ArffLoader();
        loader.setSource(new File("playtennis/tennis11.arff"));
        Instances data = loader.getDataSet();
        // Setting class attribute
        data.setClassIndex(data.numAttributes() - 1);
        // Make tree
        Id3 tree = new Id3();
        tree.buildClassifier(data);
        System.out.println(tree);

        // Classify each instance and print the prediction
        // (loop reconstructed to match the Results output below)
        BufferedReader datafile = readDataFile("playtennis/tennis11.arff");
        Instances test = new Instances(datafile);
        test.setClassIndex(test.numAttributes() - 1);
        System.out.println("Results\n======");
        for (int i = 0; i < test.numInstances(); i++) {
            double pred = tree.classifyInstance(test.instance(i));
            System.out.println(test.instance(i) + " ... Prediction: "
                    + test.classAttribute().value((int) pred));
        }
    }
}
OUTPUT:
Id3
outlook = sunny
| humidity = high: no
| humidity = normal: yes
outlook = overcast: yes
outlook = rain
| wind = weak: yes
| wind = strong: no
Results
======
sunny,hot,high,weak,?............Prediction .......... no
sunny,hot,high,strong,?............Prediction...........no
overcast,hot,high,weak,?............Prediction .......... yes
rain,mild,high,weak,?............Prediction .......... yes
sunny,mild,high,weak,?............Prediction .......... no
overcast,mild,high,strong,?............Prediction...........yes
rain,mild,high,strong,?............Prediction .......... no
rain,cool,normal,weak,?............Prediction ..........yes
rain,cool,normal,strong,?............Prediction .......... no
overcast,cool,normal,strong,?............Prediction .......... yes
sunny,cool,normal,weak,?............Prediction .......... yes
rain,mild,normal,weak,?............Prediction .......... yes
sunny,mild,normal,strong,?............Prediction .......... yes
overcast,hot,normal,weak,?............Prediction .......... yes
LEARNING OUTCOMES:
The student will be able to demonstrate the working of the decision tree based ID3 algorithm, use an appropriate data set for building the decision tree, and apply this knowledge to classify a new sample.
APPLICATION AREAS:
Classification-related problem areas
4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.
THEORY:
• Artificial neural networks (ANNs) provide a general, practical method for learning real-
valued, discrete-valued, and vector-valued functions from examples.
• Algorithms such as BACKPROPAGATION use gradient descent to tune network parameters to best fit a training set of input-output pairs.
• ANN learning is robust to errors in the training data and has been successfully applied to
problems such as interpreting visual scenes, speech recognition, and learning robot control
strategies.
Backpropagation algorithm
1. Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.
2. Initialize each weight w_ji to some small random value (e.g., between -0.05 and 0.05).
3. Until the termination condition is met, do
   For each training example <(x1, ..., xn), t>, do
   // Propagate the input forward through the network:
   a. Input the instance (x1, ..., xn) to the network and compute the output o_u of every unit u.
   // Propagate the errors backward through the network:
   b. For each output unit k, compute its error term: delta_k = o_k (1 - o_k)(t_k - o_k)
   c. For each hidden unit h, compute its error term: delta_h = o_h (1 - o_h) * sum over outputs k of (w_kh delta_k)
   d. Update each network weight: w_ji = w_ji + Δw_ji, where Δw_ji = η delta_j x_ji
PROGRAM:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)  # maximum of X array longitudinally
y = y/100

# Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

# Derivative of the sigmoid (applied to already-activated values)
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000              # Setting training iterations
lr = 0.1                  # Setting learning rate
inputlayer_neurons = 2    # number of features in data set
hiddenlayer_neurons = 3   # number of hidden layer neurons
output_neurons = 1        # number of neurons at output layer

# Weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp = np.dot(X, wh) + bh
    hlayer_act = sigmoid(hinp)
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)
    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    # Weight updates
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    #bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
OUTPUT:
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual Output:
[[ 0.92]
[ 0.86]
[ 0.89]]
Predicted Output:
[[ 0.89565473]
[ 0.87769237]
[ 0.89631707]]
LEARNING OUTCOMES:
The student will be able to build an Artificial Neural Network by implementing the Back
propagation algorithm and test the same using appropriate data sets.
APPLICATION AREAS:
Speech recognition, Character recognition, Human Face recognition
5. Write a program to implement the Naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
THEORY:
Naive Bayes algorithm: Naive Bayes is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as 'Naive'.
Naive Bayes model is easy to build and particularly useful for very large data sets. Along with
simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):

    P(c|x) = P(x|c) P(c) / P(x)

where
P(c|x) is the posterior probability of class c (target) given predictor x (attributes),
P(c) is the prior probability of the class,
P(x|c) is the likelihood, i.e., the probability of the predictor given the class, and
P(x) is the prior probability of the predictor.
The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction
of attribute values and where the target function f (x) can take on any value from some finite set V. A set
of training examples of the target function is provided, and a new instance is presented, described by the
tuple of attribute values (a1, a2, ... ,an). The learner is asked to predict the target value, or classification,
for this new instance.
The Bayesian approach to classifying the new instance is to assign the most probable target value, vMAP, given the attribute values (a1, a2, ..., an) that describe the instance:

    vMAP = argmax over vj in V of P(vj | a1, a2, ..., an)
         = argmax over vj in V of P(a1, a2, ..., an | vj) P(vj)

Now we could attempt to estimate the two terms in this expression from the training data. It is easy to estimate each of the P(vj) simply by counting the frequency with which each target value vj occurs in the training data.
The naive Bayes classifier is based on the simplifying assumption that the attribute values are conditionally independent given the target value. In other words, the assumption is that given the target value of the instance, the probability of observing the conjunction a1, a2, ..., an is just the product of the probabilities for the individual attributes: P(a1, a2, ..., an | vj) = Πi P(ai | vj). Substituting this into the expression above gives the approach used by the naive Bayes classifier:

    vNB = argmax over vj in V of P(vj) Πi P(ai | vj)

where vNB denotes the target value output by the naive Bayes classifier.
When dealing with continuous data, a typical assumption is that the continuous values associated with
each class are distributed according to a Gaussian distribution. For example, suppose the training data
contains a continuous attribute, x. We first segment the data by the class, and then compute the mean
and variance of x in each class.
Let μk be the mean of the values of x associated with class Ck, and let σ²k be the variance of those values. Suppose we have collected some observation value v. Then the probability of v given class Ck, p(x = v | Ck), can be computed by plugging v into the Normal (Gaussian) density with parameters μk and σ²k:

    p(x = v | Ck) = (1 / sqrt(2π σ²k)) · exp(-(v - μk)² / (2 σ²k))

The data set used here (the well-known Pima Indians Diabetes data) contains 768 observations of medical details. The records describe instantaneous measurements taken from the patient such as their age, the number of times pregnant and blood workup. All patients are women aged 21 or older. All attributes are numeric, and their units vary from attribute to attribute. Each record has a class value that indicates whether the patient suffered an onset of diabetes within 5 years of when the measurements were taken (1) or not (0).
Implementation:
1. Handle Data: Load the data from CSV file and split it into training and test datasets.
2. Summarize Data: summarize the properties in the training dataset so that we can calculate
probabilities and make predictions.
3. Make a Prediction: Use the summaries of the dataset to generate a single prediction.
4. Make Predictions: Generate predictions given a test dataset and a summarized training dataset.
5. Evaluate Accuracy: Evaluate the accuracy of predictions made for a test dataset as the
percentage correct out of all predictions made
Input samples:
import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers)/float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x-avg, 2) for x in numbers])/float(len(numbers)-1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries
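# The remaining helpers (train/test split, Gaussian likelihood, prediction,
# accuracy) and the main() driver used by the final call below are a minimal
# sketch; the file name 'pima-indians-diabetes.csv' and splitRatio = 0.67
# are assumptions.
def splitDataset(dataset, splitRatio):
    # randomly split the rows into a training set and a test set
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def calculateProbability(x, mean_, stdev_):
    # Gaussian probability density of x given the class mean and stdev
    exponent = math.exp(-(math.pow(x - mean_, 2) / (2 * math.pow(stdev_, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev_)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean_, stdev_ = classSummaries[i]
            probabilities[classValue] *= calculateProbability(inputVector[i], mean_, stdev_)
    return probabilities

def predict(summaries, inputVector):
    # pick the class with the highest (unnormalized) posterior probability
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    return [predict(summaries, testSet[i]) for i in range(len(testSet))]

def getAccuracy(testSet, predictions):
    correct = sum(1 for i in range(len(testSet)) if testSet[i][-1] == predictions[i])
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = 'pima-indians-diabetes.csv'   # placeholder file name
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(
        len(dataset), len(trainingSet), len(testSet)))
    summaries = summarizeByClass(trainingSet)
    predictions = getPredictions(summaries, testSet)
    print('Accuracy: {0}%'.format(getAccuracy(testSet, predictions)))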
main()
OUTPUT:
Split 768 rows into train=514 and test=254 rows
Accuracy: 0.39370078740157477%
LEARNING OUTCOMES:
The student will be able to apply the naive Bayesian classifier to the relevant problem and analyse the results.
APPLICATION AREAS:
• Real-time Prediction: Naive Bayes is an eager learning classifier and it is fast. Thus, it could be used for making predictions in real time.
• Multi class Prediction: This algorithm is also well known for multi class prediction feature. Here
we can predict the probability of multiple classes of target variable.
• Text classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are widely used in text classification (due to good results on multi-class problems and the independence assumption) and often achieve a higher success rate than other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiments).
• Recommendation System: A Naive Bayes classifier and Collaborative Filtering together build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not.
6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Calculate the accuracy, precision, and recall for your sample data set.
Input dataset:
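A minimal scikit-learn sketch of such a document classifier, assuming a two-column CSV named document.csv with columns (message, label) and labels pos/neg (all of these names are placeholders), producing accuracy, confusion-matrix, recall, and precision figures of the kind shown in the output below:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# 'document.csv' and the pos/neg label values are assumptions about the data set
msg = pd.read_csv('document.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

Xtrain, Xtest, ytrain, ytest = train_test_split(msg.message, msg.labelnum)
print('dimensions of train and test sets')
print(Xtrain.shape, Xtest.shape, ytrain.shape, ytest.shape, sep='\n')

# bag-of-words document-term matrices
count_vect = CountVectorizer()
Xtrain_dtm = count_vect.fit_transform(Xtrain)
Xtest_dtm = count_vect.transform(Xtest)

clf = MultinomialNB().fit(Xtrain_dtm, ytrain)
predicted = clf.predict(Xtest_dtm)

print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))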
OUTPUT:
The dimensions of the dataset (18, 2)
dimensions of train and test sets
(13,)
(5,)
(13,)
(5,)
Accuracy metrics
Accuracy of the classifer is 0.8
Confusion matrix
[[2 0]
[1 2]]
Recall and Precision
0.666666666667
1.0
LEARNING OUTCOMES:
The student will be able to apply the naive Bayesian classifier for document classification and analyse the results.
APPLICATION AREAS:
Applicable in document classification
7. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using a standard Heart Disease Data Set.
THEORY:
• Bayesian networks are very convenient for representing probabilistic relationships between multiple events.
• Bayesian networks as graphs - People usually represent Bayesian networks as directed graphs in which each node is a hypothesis or a random process; in other words, something that takes at least two possible values you can assign probabilities to. For example, there can be a node that represents the state of the dog (barking or not barking at the window), the weather (raining or not raining), etc.
• The arrows between nodes represent the conditional probabilities between them: how information about the state of one node changes the probability distribution of another node it is connected to.
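For instance, with the two-node network Rain -> Dog barks and assumed illustrative probabilities P(Rain) = 0.2, P(Barks | Rain) = 0.9, P(Barks | no Rain) = 0.3, the marginal probability of barking is P(Barks) = 0.9 x 0.2 + 0.3 x 0.8 = 0.42, and observing barking raises the belief in rain to P(Rain | Barks) = (0.9 x 0.2) / 0.42 ≈ 0.43 by Bayes' rule.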
PROGRAM:
package bayesNetwork;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.BayesNet;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.gui.graphvisualizer.GraphVisualizer;
import java.awt.BorderLayout;
import javax.swing.JFrame;

public class TestBayesNet {
    public static void main(String[] args) throws Exception {
        // load the heart disease data set (file path is a placeholder) and
        // mark the last attribute (Diagnose) as the class
        Instances data = DataSource.read("heart/heart.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // build the Bayesian network classifier
        BayesNet cls = new BayesNet();
        cls.buildClassifier(data);

        // evaluate with 10-fold cross-validation
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(cls, data, 10, new java.util.Random(1));
        System.out.println("** Bayes Network Evaluation with Datasets **");
        System.out.println(eval.toSummaryString());

        // display graph
        GraphVisualizer gv = new GraphVisualizer();
        gv.readBIF(cls.graph());
        JFrame jf = new JFrame("BayesNet graph");
        jf.setSize(800, 600);
        jf.getContentPane().setLayout(new BorderLayout());
        jf.getContentPane().add(gv, BorderLayout.CENTER);
        jf.setVisible(true);
        gv.layoutGraph();
    }
}
Input dataset:
age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,Diagnose
63,1,1,145,233,1,2,150,0,2.3,3,0,6,N
67,1,4,160,286,0,2,108,1,1.5,2,3,3,N
67,1,4,120,229,0,2,129,1,2.6,2,2,7,N
37,1,3,130,250,0,0,187,0,3.5,3,0,3,N
41,0,2,130,204,0,2,172,0,1.4,1,0,3,N
56,1,2,120,236,0,0,178,0,0.8,1,0,3,N
62,0,4,140,268,0,2,160,0,3.6,3,2,3,Y
OUTPUT:
** Bayes Network Evaluation with Datasets **
APPLICATION AREAS:
• Prediction and classification
• Gene regulatory networks
• Medicine
• Biomonitoring
• Document classification
• Information retrieval
• Semantic search
8. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering.
THEORY:
Expectation Maximization algorithm
• The basic approach and logic of this clustering method is as follows.
• Suppose we measure a single continuous variable in a large sample of observations. Further,
suppose that the sample consists of two clusters of observations with different means (and
perhaps different standard deviations); within each sample, the distribution of values for the
continuous variable follows the normal distribution.
• The goal of EM clustering is to estimate the means and standard deviations for each cluster so as
to maximize the likelihood of the observed data (distribution).
• Put another way, the EM algorithm attempts to approximate the observed distributions of values
based on mixtures of different distributions in different clusters. The results of EM clustering are
different from those computed by k-means clustering.
• The latter will assign observations to clusters to maximize the distances between clusters. The
EM algorithm does not compute actual assignments of observations to clusters, but classification
probabilities.
• In other words, each observation belongs to each cluster with a certain probability. Of course, as
a final result we can usually review an actual assignment of observations to clusters, based on the
(largest) classification probability.
K means Clustering
• The algorithm will categorize the items into k groups of similarity. To calculate that similarity,
we will use the euclidean distance as measurement.
• The algorithm works as follows:
1. First we initialize k points, called means, randomly.
2. We categorize each item to its closest mean and we update the mean's coordinates, which are the averages of the items categorized in that mean so far.
3. We repeat the process for a given number of iterations and at the end, we have our
clusters.
The “points” mentioned above are called means because they hold the mean values of the items categorized in them. To initialize these means, we have a lot of options. An intuitive method is to initialize the means at random items in the data set. Another method is to initialize the means at random values within the range of the data set.
• Pseudocode:
1. Initialize k means with random values
2. For a given number of iterations:
Iterate through items:
Find the mean closest to the item
Assign item to mean
Update mean
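As an illustration of the two procedures described above (independent of the Weka programs that follow), a minimal scikit-learn sketch that clusters the same small made-up numeric array with k-Means and with EM via a Gaussian mixture; note that EM returns soft membership probabilities where k-Means returns hard assignments:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# a small made-up 2-D data set purely for illustration
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)
print("k-Means labels:", kmeans.labels_)

em = GaussianMixture(n_components=2).fit(X)
print("EM labels:", em.predict(X))
print("EM membership probabilities:")
print(em.predict_proba(X))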
KmeansWeka.java
package kMeansandEM;
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.EM;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class KmeansWeka {
    public static void main(String[] args) {
        try {
            // load the data set (file path is a placeholder); the class attribute
            // can be dropped with the Remove filter before clustering
            Instances data = DataSource.read("playtennis/tennis.arff");
            SimpleKMeans kmeans = new SimpleKMeans();
            kmeans.setNumClusters(2);
            kmeans.buildClusterer(data);
            ClusterEvaluation eval = new ClusterEvaluation();
            eval.setClusterer(kmeans);
            eval.evaluateClusterer(data);
            System.out.println(eval.clusterResultsToString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
TestEM
package kMeansandEM;
import java.io.FileReader;
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.EM;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class TestEM {
    public static void main(String[] args) {
        try {
            Instances data = DataSource.read("playtennis/tennis.arff");  // placeholder path
            EM em = new EM();
            em.buildClusterer(data);
            ClusterEvaluation eval = new ClusterEvaluation();
            eval.setClusterer(em);
            eval.evaluateClusterer(data);
            System.out.println(eval.clusterResultsToString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
INPUT:
preg,plas,pres,skin,insu,mass,pedi,age,class
6,148,72,35,0,33.6,0.627,50,tested_positive
1,85,66,29,0,26.6,0.351,31,tested_negative
8,183,64,0,0,23.3,0.672,32,tested_positive
1,89,66,23,94,28.1,0.167,21,tested_negative
0,137,40,35,168,43.1,2.288,33,tested_positive
5,116,74,0,0,25.6,0.201,30,tested_negative
OUTPUT:
Number of iterations: 4
Within cluster sum of squared errors: 21.000000000000004
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(14) (10) (4)
==============================================
outlook sunny sunny overcast
temperature mild mild cool
humidity high high normal
windy weak weak strong
Clustered Instances
0 10 ( 71%)
1 4 ( 29%)
EM
==
Cluster
Attribute 0
(1)
=================
outlook
sunny 6
overcast 5
rainy 6
0 14 (100%)
LEARNING OUTCOMES:
The students will be able to apply the EM algorithm and the k-Means algorithm for clustering and analyse the results.
APPLICATION AREAS:
• Text mining
• Image analysis
• Pattern recognition
• Web cluster engines
9. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
PROGRAM:
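A minimal scikit-learn sketch of a k-Nearest-Neighbour classifier for the iris data, printing each prediction and the accuracy metrics of the kind reported below; the choice of k = 3 and the 80/20 train/test split are assumptions:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# print each test sample's actual class, predicted class, and whether it was correct
for actual, pred in zip(y_test, y_pred):
    status = 'Correct' if actual == pred else 'Wrong'
    print(iris.target_names[actual], '->', iris.target_names[pred], status)

print(metrics.confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(metrics.classification_report(y_test, y_pred))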
OUTPUT:
[[11 0 0]
[ 0 11 1]
[ 0 0 7]]
Accuracy Metrics
precision recall f1-score support
LEARNING OUTCOMES:
The student will be able to implement the k-Nearest Neighbour algorithm to classify the iris data set and print both correct and wrong predictions.
APPLICATION AREAS:
• Recommender systems
• Classification problems
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def kernel(point, xmat, k):
    # Gaussian weights: rows of xmat close to 'point' get weight near 1
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load the tips data and build the design matrix X = [1, total_bill]
data = pd.read_csv('tips.csv')   # assumed file name for the input data set shown below
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.mat(bill)
mtip = np.mat(tip)
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))

# set k (the kernel bandwidth) here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
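For each query point x0, localWeight solves the weighted normal equation w(x0) = (XᵀWX)⁻¹ XᵀWy, where W is the diagonal matrix of kernel weights Wjj = exp(-(x0 - xj)² / (2k²)) computed by kernel(); the local prediction is then ŷ(x0) = x0 · w(x0). Smaller values of the bandwidth k make the fit follow the data more closely, while larger values smooth it out.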
Input Dataset:
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4