ML Manual 2024 – IV Year
Vision
To be an academic leader for the development of human potential so as to meet the global challenges.
To be a centre of excellence for IT education and research to produce globally competent and skilled IT professionals.
Mission
1: To equip the department with latest facilities to provide need-based quality education.
2: To inculcate real-life problem-solving skills in the students.
3: To strengthen students’ capabilities to match with industry requirements.
4: To provide an environment for continuous learning and applied research.
Acropolis Institute of Technology and Research, Indore
Certificate
(Sub. Code – IT802(A)) has performed experiments as per the syllabus prescribed
by RGPV, Bhopal, and submitted satisfactory work in the institute during the
List of Practicals
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Write a program for linear regression.
4. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
5. Write a program to implement the Random Forest ensemble method on a given dataset.
6. Write a program to implement the Boosting ensemble method on a given dataset.
7. Write a Python program to implement the K-Means clustering algorithm.
8. Write a program to implement dimensionality reduction using the Principal Component Analysis (PCA) method.
9. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API.
10. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
(Each practical addresses one of the course outcomes CO1–CO5.)
CO List
CO 1: Apply knowledge of computing and mathematics to machine learning problems, models and algorithms.
CO 2: Analyze a problem and identify the computing requirements appropriate for its solution using neural networks.
CO 3: Implement and evaluate convolutional neural networks to meet desired needs.
CO 4: Solve real-world problems using recurrent networks and reinforcement learning.
CO 5: Apply mathematical foundations, algorithmic principles, and computer science theory to the modeling and design of computer-based systems in a way that demonstrates comprehension.
CO–PO Mapping (✓ marks per course outcome, against PO1–PO12 and PSO1–PSO3)
CO1  ✓ ✓ ✓ ✓
CO2  ✓ ✓ ✓ ✓
CO3  ✓ ✓
CO4  ✓ ✓
CO5  ✓
Index
(Columns for each entry: Date of Practical · Date of Submission · Marks/Remark · Grade · Faculty Sign with date)
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Write a program for linear regression.
4. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
5. Write a program to implement the Random Forest ensemble method on a given dataset.
6. Write a program to implement the Boosting ensemble method on a given dataset.
7. Write a Python program to implement the K-Means clustering algorithm.
8. Write a program to implement dimensionality reduction using the Principal Component Analysis (PCA) method.
9. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API.
10. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
DO’S
1. Do not enter the laboratory without prior permission.
2. Students should sign the LOGIN REGISTER before entering the laboratory.
3. Students should bring their observation and record notebooks to the laboratory.
4. After completing the laboratory exercise, make sure to shut down the system properly.
DON’TS
1. Do not bring bags inside the laboratory.
Course Objectives
1. To understand the basic theory underlying machine learning and its fundamental issues and challenges.
2. To study and apply basic concepts of artificial neural networks and Deep Learning based models.
4. To apply and implement basic concepts of recurrent networks and reinforcement learning.
5. To evaluate the performance of algorithms and to provide solutions for various real-world problems.
Course Outcomes
4. Create probabilistic and unsupervised learning models for handling unknown patterns.
5. Evaluate frequent patterns, preprocess the data before applying it to any real-world problem, and evaluate its performance.
Pre-requisites -
Experiment No. – 1
Experiment Title – Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
Concept Theory – FIND-S is a basic concept-learning algorithm in machine learning. It finds the most specific
hypothesis that fits all the positive examples; note that the algorithm considers only the positive training
examples. FIND-S starts with the most specific hypothesis and generalizes it each time it fails to cover an
observed positive training example. Hence, FIND-S moves from the most specific hypothesis towards a more
general hypothesis.
Solution (Program/Code/Procedure/Query) –
import csv

# read the training examples; the last column is the class label ("True"/"False")
with open('trainingdata.csv') as f:          # filename assumed
    data = list(csv.reader(f))

h = None   # most specific hypothesis so far
for x in data:
    if x != [] and x[-1] == "True":          # FIND-S uses only the positive examples
        if h is None:
            h = x[:-1]                       # initialise from the first positive example
        else:
            # generalise every attribute that disagrees with this example
            h = [h[j] if h[j] == x[j] else '?' for j in range(len(h))]
print(h)
Output/Result -
'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same',True 'Sunny', 'Warm', 'High', 'Strong',
'Warm', 'Same',True 'Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change',False 'Sunny', 'Warm', 'High',
'Strong', 'Cool', 'Change',True
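For reference, the sample training data shown above can be regenerated as a .CSV file with a short helper. This is a sketch; the filename `trainingdata.csv` is an assumption, so match whatever name the FIND-S program actually reads.

```python
import csv

# the four enjoy-sport examples listed in the output above
rows = [
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'True'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'True'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'False'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'True'],
]
with open('trainingdata.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```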
Experiment No. – 2
Experiment Title – For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
Concept Theory – The candidate elimination algorithm incrementally builds the version space given a hypothesis
space H and a set E of examples. The examples are added one by one; each example possibly shrinks the version
space by removing the hypotheses that are inconsistent with the example. The candidate elimination algorithm does
this by updating the general and specific boundary for each new example.
Solution (Program/Code/Procedure/Query) –
import csv

def load_data(file_path):
    with open(file_path) as f:
        return [row for row in csv.reader(f) if row]

def candidate_elimination(concepts, target, num_attributes):
    # S: most specific boundary, initialised from the first (assumed positive) example
    S = list(concepts[0])
    # G: most general boundary, one all-'?' hypothesis per attribute
    G = [['?'] * num_attributes for _ in range(num_attributes)]
    for i, example in enumerate(concepts):
        if target[i] == "True":              # positive example: generalise S
            for j in range(num_attributes):
                if example[j] != S[j]:
                    S[j] = '?'
                    G[j][j] = '?'
        else:                                # negative example: specialise G
            for j in range(num_attributes):
                if example[j] != S[j]:
                    G[j][j] = S[j]
                else:
                    G[j][j] = '?'
    # discard the G hypotheses that place no constraint at all
    G = [g for g in G if g != ['?'] * num_attributes]
    return S, G

def main():
    data = load_data('trainingdata.csv')     # filename assumed
    concepts = [row[:-1] for row in data]
    target = [row[-1] for row in data]
    num_attributes = len(data[0]) - 1        # number of attributes (excluding label)
    S, G = candidate_elimination(concepts, target, num_attributes)
    print("Final hypothesis:")
    print("S:", S)
    print("G:", G)

if __name__ == "__main__":
    main()
Output/Result –
Final hypothesis:
S: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
G: [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Experiment No. – 3
Experiment Title – Write a program for linear regression.
Concept Theory – Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a
statistical method that is used for predictive analysis. Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more
independent variables (x), hence the name. Since linear regression assumes a linear relationship, it finds how
the value of the dependent variable changes with the value of the independent variable.
Solution (Program/Code/Procedure/Query) –
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # least-squares estimates of intercept b_0 and slope b_1
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    plt.scatter(x, y, color='m', marker='o')
    plt.plot(x, b[0] + b[1] * x, color='g')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])  # sample y values (assumed; omitted in the original)
    b = estimate_coef(x, y)
    print("b_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

main()
Output/Result –
b_0 = -0.0586206896552
b_1 = 1.45747126437
Experiment No. – 4
Experiment Title – Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
Concept Theory – ID3 is a simple decision-tree learning algorithm developed by Ross Quinlan (1983). The basic
idea of ID3 is to construct the decision tree by employing a top-down, greedy search through the given training
set, testing an attribute at every tree node. In order to select the attribute that is most useful for classifying a
given set, we introduce a metric: information gain.
To find an optimal way to classify a training set, we need to minimize the number of questions asked (i.e.
minimize the depth of the tree). Thus, we need a function which can measure which questions provide the
most balanced splitting. The information-gain metric is such a function.
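As a concrete illustration of information gain, here is a minimal sketch; the label counts used below are the 9 yes / 5 no split of the tennis data that appears later in this experiment.

```python
import math

def entropy(pos, neg):
    # Shannon entropy of a two-class label distribution
    total = pos + neg
    e = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            e -= p * math.log2(p)
    return e

# whole tennis set: 9 yes, 5 no
E = entropy(9, 5)   # ≈ 0.940

# splitting on wind: weak -> 6 yes / 2 no, strong -> 3 yes / 3 no
gain = E - (8 / 14) * entropy(6, 2) - (6 / 14) * entropy(3, 3)
print(round(gain, 3))   # ≈ 0.048
```

ID3 computes this gain for every candidate attribute and splits on the one with the largest value.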
Solution (Program/Code/Procedure/Query) –
import math
import csv
import numpy as np

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []   # list of (attribute value, child Node)
        self.answer = ""

    def __str__(self):
        return self.attribute

def read_data(filename):
    with open(filename) as f:
        datareader = csv.reader(f)
        headers = next(datareader)
        metadata = [name for name in headers]
        traindata = [row for row in datareader if row]
    return metadata, traindata

def subtables(data, col, delete):
    # split the rows of `data` by the distinct values in column `col`
    items = np.unique(data[:, col])
    dict_ = {}
    for value in items:
        rows = data[data[:, col] == value]
        if delete:
            rows = np.delete(rows, col, axis=1)
        dict_[value] = rows
    return items, dict_

def entropy(S):
    _, counts = np.unique(S, return_counts=True)
    probs = counts / S.size
    return float(-np.sum(probs * np.log2(probs)))

def gain_ratio(data, col):
    items, dict_ = subtables(data, col, delete=False)
    total = data.shape[0]
    agg_entropy = 0.0
    iv = 0.0   # intrinsic value (split information)
    for value in items:
        ratio = dict_[value].shape[0] / total
        agg_entropy += ratio * entropy(dict_[value][:, -1])
        iv -= ratio * math.log2(ratio)
    info_gain = entropy(data[:, -1]) - agg_entropy
    return info_gain / iv if iv else 0.0

def create_node(data, metadata):
    # pure node: all examples share one label
    if np.unique(data[:, -1]).size == 1:
        node = Node("")
        node.answer = data[0, -1]
        return node
    gains = [gain_ratio(data, col) for col in range(data.shape[1] - 1)]
    split = int(np.argmax(gains))
    node = Node(metadata[split])
    sub_metadata = metadata[:split] + metadata[split + 1:]
    items, dict_ = subtables(data, split, delete=True)
    for value in items:
        node.children.append((value, create_node(dict_[value], sub_metadata)))
    return node

def print_tree(node, level):
    indent = "  " * level
    if node.answer != "":
        print(indent + node.answer)
        return
    print(indent + node.attribute)
    for value, child in node.children:
        print(indent + "  " + value)
        print_tree(child, level + 2)

metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

tennis.csv:
outlook,temperature,humidity,wind,answer
outlook,temperature,humidity,wind, answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no
Output/Result –
outlook
  overcast
    yes
  rain
    wind
      strong
        no
      weak
        yes
  sunny
    humidity
      high
        no
      normal
        yes
Experiment No. – 5
Experiment Title – Write a program to implement Random forest ensemble method on a given dataset.
Concept Theory – Random Forest is one of the most popular and commonly used algorithms by Data Scientists.
Random forest is a Supervised Machine Learning Algorithm that is used widely in Classification and Regression
problems. It builds decision trees on different samples and takes their majority vote for classification and average in
case of regression.
One of the most important features of the Random Forest algorithm is that it can handle data sets containing
continuous variables (as in regression) as well as categorical variables (as in classification). It performs well
on both classification and regression tasks. In this practical, we look at the working of random forest and
implement it on a given dataset.
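The library solution below fits a regression forest; as a minimal classification counterpart, here is a sketch assuming scikit-learn and its bundled iris dataset (not part of the original manual):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# hold out 30% of iris for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# an ensemble of 100 trees; each votes, and the majority wins
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```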
Solution (Program/Code/Procedure/Query) –
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor

data = pd.read_csv('Salaries.csv')
print(data)

x = data.iloc[:, 1:2].values   # position level (column layout assumed)
y = data.iloc[:, 2].values     # salary

regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(x, y)

# visualise the fitted regression curve on a fine grid
X_grid = np.arange(x.min(), x.max(), 0.01).reshape(-1, 1)
plt.scatter(x, y, color='blue')
plt.plot(X_grid, regressor.predict(X_grid), color='green')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

Output/Result –
Experiment No. – 6
Experiment Title – Write a program to implement Boosting ensemble method on a given dataset.
Concept Theory – Boosting is an ensemble learning method that combines a set of weak learners into a strong
learner to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and then
trained sequentially; that is, each model tries to compensate for the weaknesses of its predecessor. With each
iteration, the weak rules from each individual classifier are combined to form one strong prediction rule.
Solution (Program/Code/Procedure/Query) –
1. Cross Validation – evaluate the model with K-fold cross validation; a common choice of K is 10.
2. Implementing AdaBoost
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

data = pd.read_csv("Iris.csv")
print(data.shape)
data = data.drop('Id', axis=1)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
total_classes = y.nunique()
print("Number of classes:", total_classes)
distribution = y.value_counts()
print(distribution)

# 10-fold cross validation of an AdaBoost ensemble (model lines assumed; missing in the original)
ada = AdaBoostClassifier(n_estimators=50)
scores = cross_val_score(ada, X, y, cv=10)
print("Mean CV accuracy:", scores.mean())
Output/Result –
Experiment No. – 7
Experiment Title – Write a Python program to implement the K-Means clustering algorithm.
Concept Theory – The K-means clustering algorithm computes centroids and iterates until the optimal centroids
are found. It presumes that the number of clusters is known in advance. It is also known as the flat clustering
algorithm. The number of clusters the method finds in the data is denoted by the letter ‘K’.
In this method, data points are assigned to clusters in such a way that the sum of the squared distances between
the data points and their centroid is as small as possible. It is essential to note that reduced variation within a
cluster means the data points within that cluster are more similar to each other.
Solution (Program/Code/Procedure/Query) –
import math
import random
from random import shuffle

def ReadData(fileName):
    # each line: comma-separated features; the last column (class label) is dropped
    with open(fileName, 'r') as f:
        lines = f.read().splitlines()
    items = []
    for line in lines:
        parts = line.split(',')
        itemFeatures = [float(v) for v in parts[:-1]]
        items.append(itemFeatures)
    shuffle(items)
    return items

def FindColMinMax(items):
    n = len(items[0])
    minima = [float('inf')] * n
    maxima = [float('-inf')] * n
    for item in items:
        for f in range(len(item)):
            if item[f] < minima[f]:
                minima[f] = item[f]
            if item[f] > maxima[f]:
                maxima[f] = item[f]
    return minima, maxima

def InitializeMeans(items, k, cMin, cMax):
    # place the k initial means uniformly at random inside the feature ranges
    n = len(items[0])
    return [[random.uniform(cMin[i], cMax[i]) for i in range(n)] for _ in range(k)]

def EuclideanDistance(x, y):
    S = 0
    for i in range(len(x)):
        S += math.pow(x[i] - y[i], 2)
    return math.sqrt(S)

def UpdateMean(n, mean, item):
    # incremental mean update after adding `item` as the n-th member
    for i in range(len(mean)):
        m = mean[i]
        mean[i] = (m * (n - 1) + item[i]) / float(n)
    return mean

def Classify(means, item):
    minimum = float('inf')   # sys.maxint is Python 2 only; use inf instead
    index = -1
    for i in range(len(means)):
        dis = EuclideanDistance(item, means[i])
        if dis < minimum:
            minimum = dis
            index = i
    return index

def CalculateMeans(k, items, maxIterations=100000):
    cMin, cMax = FindColMinMax(items)
    means = InitializeMeans(items, k, cMin, cMax)
    clusterSizes = [0] * k
    belongsTo = [-1] * len(items)
    for e in range(maxIterations):
        noChange = True
        for i in range(len(items)):
            item = items[i]
            # classify the item and update the corresponding mean
            index = Classify(means, item)
            clusterSizes[index] += 1
            cSize = clusterSizes[index]
            means[index] = UpdateMean(cSize, means[index], item)
            if index != belongsTo[i]:
                noChange = False
            belongsTo[i] = index
        if noChange:
            break
    return means

def FindClusters(means, items):
    clusters = [[] for _ in range(len(means))]
    for item in items:
        index = Classify(means, item)
        clusters[index].append(item)
    return clusters
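The same clustering can be done in a few lines with scikit-learn, which is a useful cross-check for the from-scratch version above. A minimal sketch (the toy data points are made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# toy 2-D data: two visually separate groups (assumed, for illustration only)
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the two final centroids
print(km.labels_)            # cluster index assigned to each point
```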
Experiment No. – 8
Experiment Title – Write a program to implement dimensionality reduction using the Principal Component
Analysis (PCA) method.
Concept Theory – PCA is a widely covered machine learning method on the web, and there are some great articles
about it, but many spend too much time in the weeds on the topic, when most of us just want to know how it works
in a simplified way.
Principal component analysis can be broken down into five steps. Each step can be given a logical
explanation of what PCA is doing, simplifying mathematical concepts such as standardization, covariance,
eigenvectors and eigenvalues without focusing on how to compute them.
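The five steps can be sketched directly with NumPy before reaching for a library. This is a minimal sketch on made-up data, and mean-centering stands in for full standardization:

```python
import numpy as np

# toy 2-D data (assumed, for illustration only)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

# 1. standardize (here: mean-center each feature)
Xc = X - X.mean(axis=0)
# 2. covariance matrix of the centered features
C = np.cov(Xc, rowvar=False)
# 3-4. eigenvalues/eigenvectors, sorted by decreasing eigenvalue
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
# 5. project onto the leading principal component
Z = Xc @ vecs[:, :1]
print(Z.shape)
```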
Solution (Program/Code/Procedure/Query) –
1. Implementation of PCA
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
%matplotlib inline

data = load_breast_cancer()
data.keys()
print(data['target_names'])    # check the class labels
print(data['feature_names'])
df1 = pd.DataFrame(data['data'], columns=data['feature_names'])

# standardize the features before PCA
scaling = StandardScaler()
scaling.fit(df1)
Scaled_data = scaling.transform(df1)

# keep the first three principal components
principal = PCA(n_components=3)
principal.fit(Scaled_data)
x = principal.transform(Scaled_data)
print(x.shape)
principal.components_

# 2-D scatter of the first two components
plt.figure(figsize=(10, 10))
plt.scatter(x[:, 0], x[:, 1], c=data['target'], cmap='plasma')
plt.xlabel('pc1')
plt.ylabel('pc2')

# 3-D scatter of the first three components
fig = plt.figure(figsize=(10, 10))
axis = fig.add_subplot(111, projection='3d')
axis.scatter(x[:, 0], x[:, 1], x[:, 2], c=data['target'], cmap='plasma')
axis.set_xlabel("PC1", fontsize=10)
axis.set_ylabel("PC2", fontsize=10)
axis.set_zlabel("PC3", fontsize=10)
plt.show()
Experiment No. – 9
Experiment Title – Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
Concept Theory – A Bayesian network (BN) is a probabilistic graphical model for representing knowledge about
an uncertain domain, where each node corresponds to a random variable and each edge represents a conditional
dependency between the corresponding random variables. BNs are also called belief networks or Bayes nets.
Because of these dependencies and conditional probabilities, a BN corresponds to a directed acyclic graph
(DAG), in which no loop or self-connection is allowed.
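The defining property above, that the joint distribution factorizes along the DAG edges, can be checked for a tiny two-node network (asia → tuberculosis) with plain Python. All probability values are illustrative placeholders, not medical estimates:

```python
# P(asia, tb) = P(asia) * P(tb | asia)  -- the DAG factorization
p_asia = {'True': 0.5, 'False': 0.5}
p_tb_given_asia = {('True', 'True'): 0.2, ('True', 'False'): 0.8,
                   ('False', 'True'): 0.01, ('False', 'False'): 0.99}

def joint(asia, tb):
    # joint probability read off the two local tables
    return p_asia[asia] * p_tb_given_asia[(asia, tb)]

print(joint('True', 'True'))   # 0.5 * 0.2 = 0.1
```

Summing the joint over all four assignments gives exactly 1, confirming the factorization defines a valid distribution.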
Solution (Program/Code/Procedure/Query) –
# Reduced two-node sketch using the pomegranate library (0.x API assumed);
# the probability values are illustrative placeholders, not medical estimates.
from pomegranate import (DiscreteDistribution, ConditionalProbabilityTable,
                         State, BayesianNetwork)

asia = DiscreteDistribution({'True': 0.5, 'False': 0.5})
tuberculosis = ConditionalProbabilityTable(
    [['True', 'True', 0.2],
     ['True', 'False', 0.8],
     ['False', 'True', 0.01],
     ['False', 'False', 0.99]], [asia])

s0 = State(asia, name="asia")
s1 = State(tuberculosis, name="tuberculosis")

network = BayesianNetwork("Diagnosis")
network.add_states(s0, s1)
network.add_edge(s0, s1)
network.bake()
print(network.predict_proba({'asia': 'True'}))
Experiment No. – 10
Experiment Title – Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Concept Theory –
The naïve Bayes algorithm is a supervised learning algorithm based on Bayes’ theorem, used for solving
classification problems.
It is mainly used in text classification, which involves high-dimensional training datasets.
The naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in
building fast machine learning models that can make quick predictions.
It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
Some popular applications of the naïve Bayes algorithm are spam filtering, sentiment analysis, and
classifying articles.
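As a cross-check for the from-scratch classifier below, scikit-learn ships the same Gaussian naïve Bayes model. A minimal sketch on the bundled iris data (the dataset choice is an assumption, not the manual's 5data.csv):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# hold out 30% of iris for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

model = GaussianNB().fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", model.score(X_test, y_test))
cm = confusion_matrix(y_test, pred)
print(cm)
```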
Solution (Program/Code/Procedure/Query) –
import csv
import random
import math

def loadCsv(filename):
    with open(filename) as f:
        return [[float(x) for x in row] for row in csv.reader(f) if row]

def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet, copy = [], list(dataset)
    while len(trainSet) < trainSize:
        # pick elements for the training set at random
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return trainSet, copy

def separateByClass(dataset):
    # dictionary mapping each class label to the instances belonging to it
    separated = {}
    for vector in dataset:
        separated.setdefault(vector[-1], []).append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum(pow(x - avg, 2) for x in numbers) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    # per-attribute (mean, stdev); the class column is dropped
    summaries = [(mean(attr), stdev(attr)) for attr in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    return {cls: summarize(rows) for cls, rows in separated.items()}

def calculateProbability(x, mu, sigma):
    # Gaussian likelihood of attribute value x under N(mu, sigma^2)
    exponent = math.exp(-math.pow(x - mu, 2) / (2 * math.pow(sigma, 2)))
    return (1 / (math.sqrt(2 * math.pi) * sigma)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mu, sigma = classSummaries[i]
            probabilities[classValue] *= calculateProbability(inputVector[i], mu, sigma)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb, bestLabel = probability, classValue
    return bestLabel

def getPredictions(summaries, testSet):
    return [predict(summaries, row) for row in testSet]

def getAccuracy(testSet, predictions):
    correct = sum(1 for i in range(len(testSet)) if testSet[i][-1] == predictions[i])
    return correct / float(len(testSet)) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    # prepare model
    summaries = summarizeByClass(trainingSet)
    predictions = getPredictions(summaries, testSet)
    print('Accuracy: {0}%'.format(getAccuracy(testSet, predictions)))

main()
Output/Result –
The confusion matrix is as follows:
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]