
Acropolis Institute of Technology and Research, Indore

Vision and Mission of the Institute

Vision

To be an academic leader for the development of human potential so as to meet global
challenges.

Mission

1. To create an intellectually stimulating learning environment.
2. To impart value-based, innovative, and research-oriented education.
3. To develop a positive attitude along with communication skills.
4. To increase employability and entrepreneurship through collaboration with industries and
professional organizations.

Vision and Mission of Department of Information Technology

Vision

To be a centre of excellence for IT education and research to produce globally competent and skilled IT professionals.

Mission

1. To equip the department with the latest facilities to provide need-based quality education.
2. To inculcate real-life problem-solving skills in the students.
3. To strengthen students' capabilities to match industry requirements.
4. To provide an environment for continuous learning and applied research.

Program Outcome (PO)


The engineering graduate of this institute will demonstrate the ability to:

1. Apply knowledge of mathematics, science, computing, and engineering fundamentals to computer
science engineering problems.
2. Identify, formulate, and solve problems with excellent programming and problem-solving skills.
3. Design solutions for engineering problems, including design of experiments and processes, to meet
desired needs within reasonable constraints of manufacturability, sustainability, ecology,
intellectual property, and health and safety considerations.
4. Propose and develop effective investigational solutions to complex problems using research
methodology, including design of experiments, analysis and interpretation of data, and synthesis
of information to provide suitable conclusions.
5. Create, select, and use modern techniques and various tools to solve engineering
problems, and evaluate solutions with an understanding of their limitations.
6. Acquire knowledge of contemporary issues to assess societal, health and safety, legal,
and cultural issues.
7. Evaluate the impact of engineering solutions on individuals as well as organizations in a
societal and environmental context, recognize sustainable development, and be aware of
emerging technologies and current professional issues.
8. Possess leadership and managerial skills, and understand and commit to professional
ethics and responsibilities.
9. Demonstrate teamwork and function effectively as an individual, with an ability to
design, develop, test, and debug a project, and work with a multi-disciplinary team.
10. Communicate effectively on engineering problems with the community, such as by writing
effective reports and design documentation.
11. Recognize the need for, and engage in, independent and life-long learning through
professional development and quality enhancement programs in the context of
technological change.
12. Practice engineering and management principles and apply them to one's own work, as a
member and leader of a team, to manage projects and entrepreneurship.

Certificate

This is to certify that ………… Enrolment No. …………… of B. Tech.

Information Technology, Year 4th, Semester VIII, has performed the experiments in Machine Learning

(Sub. Code – IT802(A)) as per the syllabus prescribed

by RGPV Bhopal and submitted satisfactory work in the institute during the

academic year 2023 – 2024.

Signature of Head                                   Signature of Faculty



List of Practical
S. No   Title of the Practical   CO1   CO2   CO3   CO4   CO5
1 Implement and demonstrate the FIND-S algorithm for finding ✓
the most specific hypothesis based on a given set of training
data samples. Read the training data from a .CSV file.
2 For a given set of training data examples stored in a .CSV file, ✓
implement and demonstrate the Candidate-Elimination
algorithm to output a description of the set of all hypotheses
consistent with the training examples.
3 Write a program for linear regression. ✓
4 Write a program to demonstrate the working of the decision ✓
tree based ID3 algorithm. Use an appropriate data set for
building the decision tree and apply this knowledge to classify a
new sample.
5 Write a program to implement Random forest ensemble method ✓
on a given dataset.
6 Write a program to implement Boosting ensemble method on a ✓
given dataset.
7 Write a Python program to implement K-Means clustering ✓
algorithm.
8 Write a program to implement Dimensionality reduction using ✓
Principal Component Analysis (PCA) method.
9 Write a program to construct a Bayesian network considering ✓
medical data. Use this model to demonstrate the diagnosis of
heart patients using standard Heart Disease Data Set. You can
use Java/Python ML library classes/API.
10 Write a program to implement the naïve Bayesian classifier for ✓
a sample training data set stored as a .CSV file. Compute the
accuracy of the classifier, considering a few test data sets.
CO List
CO 1 Apply knowledge of computing and mathematics to machine learning problems, models, and
algorithms.
CO 2 Analyze a problem and identify the computing requirements appropriate for its solution
using Neural Networks.
CO 3 Implement and evaluate Convolutional Neural Networks to meet desired needs.

CO 4 Solve real-world problems using recurrent networks and reinforcement learning.

CO 5 Apply mathematical foundations, algorithmic principles, and computer science theory to the modeling
and design of computer-based systems in a way that demonstrates comprehension of the
trade-offs involved in design choices.

CO PO Mapping
PO PSO
CO PO PO PO PO PO PO PO PO PO PO1 PO1 PO1 PSO PSO PSO
1 2 3 4 5 6 7 8 9 0 1 2 1 2 3
CO ✓ ✓ ✓ ✓
1
CO ✓ ✓ ✓ ✓
2
CO ✓ ✓
3
CO ✓ ✓
4
CO ✓
5

Index
S. No   Title of the Practical   Date of Practical   Date of Submission   Submission Remark   Marks / Grade   Faculty Sign with date
1 Implement and demonstrate the FIND-S
algorithm for finding the most specific
hypothesis based on a given set of
training data samples. Read the training
data from a .CSV file.
2 For a given set of training data
examples stored in a .CSV file,
implement and demonstrate the
Candidate-Elimination algorithm to
output a description of the set of all
hypotheses consistent with the training
examples.
3
Write a program for linear
regression.
4 Write a program to demonstrate the
working of the decision tree based ID3
algorithm. Use an appropriate data set
for building the decision tree and apply
this knowledge to classify a new sample.
5 Write a program to implement Random
forest ensemble method on a given
dataset.
6 Write a program to implement
Boosting ensemble method on a
given dataset.
7 Write a Python program to
implement K-Means clustering
algorithm.
8 Write a program to implement
Dimensionality reduction using Principal
Component Analysis (PCA) method.
9 Write a program to construct a Bayesian
network considering medical data. Use
this model to demonstrate the diagnosis
of heart patients using standard Heart
Disease Data Set. You can use
Java/Python ML library classes/API.
10 Write a program to implement the naïve
Bayesian classifier for a sample
training data set stored as a .CSV file.
Compute the accuracy of the classifier,
considering a few test data sets.

Hardware and Software Requirements

Sr. No.   Software Requirements                     Hardware Requirements

1         Windows, Mac, or Linux                    Hard disk: min 1 GB or above

2         Python versions: 2.7.x, 3.6.x, 3.8.x      RAM: 4 GB



General Instructions for Laboratory Classes

DOs
1. Do not enter the laboratory without prior permission.

2. Wear your ID card while entering the lab.

3. Come in proper uniform.

4. Sign the LOGIN REGISTER before entering the laboratory.

5. Bring your observation and record notebooks to the laboratory.

6. Maintain silence inside the laboratory.

7. After completing the laboratory exercise, make sure to shut down the system properly.

DON'Ts
1. Do not bring bags inside the laboratory.

2. Do not use the computers in an improper way.

3. Do not scribble on the desks or mishandle the chairs.

4. Do not use mobile phones inside the laboratory.

5. Do not make noise inside the laboratory.



Course Objectives and Outcomes

Course Objective

1. To understand the basic theory underlying machine learning and the fundamental issues and
challenges of machine learning: data, model selection, model complexity, etc.

2. To study and apply basic concepts of artificial neural networks and deep-learning-based
machine learning algorithms.

3. To implement convolutional neural networks for solving real-world problems.

4. To apply and implement basic concepts of recurrent networks and reinforcement learning.

5. To evaluate the performance of algorithms and to provide solutions for various real-world
applications related to computer vision, speech processing, and natural language processing.

Course Outcomes

At the end of the course student will be able to:

1. Understand the basic characteristics of machine learning strategies.

2. Analyze supervised learning and various applications of neural networks.

3. Apply more than one technique to enhance the performance of learning.

4. Create probabilistic and unsupervised learning models for handling unknown patterns.

5. Identify frequent patterns, preprocess the data before applying it to any real-world problem, and
evaluate the resulting performance.

Department of Information Technology

Pre-requisites -

1. Basic Concepts of Statistics, Probability, Linear Algebra, Calculus.

2. Basic Concepts of Programming Languages.



Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 1

Experiment Title – Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Concept Theory – The Find-S algorithm is a basic concept-learning algorithm in machine learning. It finds the
most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive
training examples. Find-S starts with the most specific hypothesis and generalizes it each time it fails to cover
an observed positive training example. Hence, the Find-S algorithm moves from the most specific hypothesis
toward the most general hypothesis.

Solution(Program/Code/Procedure/Query) –

import csv

with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":
        j = 0
        for x in i:
            if x != "True":
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?'
                j = j + 1

print("Most specific hypothesis is")
print(h)

Output/Result -

['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'True']
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'True']
['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'False']
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'True']
Most specific hypothesis is
[['Sunny', 'Warm', '?', 'Strong', '?', '?']]
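The same Find-S loop can also be seen without any file I/O. The following self-contained sketch hard-codes the EnjoySport training data shown above; the function name `find_s` is illustrative, not part of the program listed earlier:

```python
def find_s(examples):
    """Return the most specific hypothesis covering all positive examples.

    Each example is a list of attribute values followed by a 'True'/'False'
    label; '?' in the hypothesis means any value is acceptable.
    """
    positives = [e[:-1] for e in examples if e[-1] == "True"]
    # Initialise the hypothesis with the first positive example.
    h = list(positives[0])
    for example in positives[1:]:
        for j, value in enumerate(example):
            if h[j] != value:
                h[j] = '?'  # generalize any attribute that disagrees
    return h

data = [
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'True'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'True'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'False'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'True'],
]
print(find_s(data))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

The result matches the output above: only Sky, AirTemp, and Wind stay constrained after all positive examples are processed.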



Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 2

Experiment Title – For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.

Concept Theory – The candidate elimination algorithm incrementally builds the version space given a hypothesis
space H and a set E of examples. The examples are added one by one; each example possibly shrinks the version
space by removing the hypotheses that are inconsistent with it. The candidate elimination algorithm does
this by updating the general and specific boundaries for each new example.

 You can consider this an extended form of the Find-S algorithm.
 It considers both positive and negative examples.
 Positive examples are used, as in Find-S, to generalize the specific boundary.
 Negative examples are used to specialize the general boundary.
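The notion of a hypothesis being consistent with an example is central to this update. A minimal sketch of that check, where `'?'` acts as a wildcard (the helper name `matches_hypothesis` is illustrative):

```python
def matches_hypothesis(hypothesis, example):
    """True if every non-'?' attribute of the hypothesis equals the example's value."""
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

# A hypothesis from the specific boundary covers the first example
# but not the second.
print(matches_hypothesis(['Sunny', 'Warm', '?'], ['Sunny', 'Warm', 'High']))  # True
print(matches_hypothesis(['Sunny', 'Warm', '?'], ['Rainy', 'Cold', 'High']))  # False
```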

Solution (Program/Code/Procedure/Query) –

import csv

class CandidateElimination:
    def __init__(self, num_attributes):
        self.num_attributes = num_attributes
        self.S = [None] * num_attributes  # most specific hypothesis
        self.G = [None] * num_attributes  # most general hypothesis
        for i in range(num_attributes):
            self.S[i] = ('0', '?')
            self.G[i] = ('?', '?')

    def is_consistent(self, example, hypothesis):
        # Each hypothesis entry is a tuple whose first element is the
        # attribute constraint; '?' matches any value.
        for i in range(len(example)):
            if hypothesis[i][0] != '?' and example[i] != hypothesis[i][0]:
                return False
        return True

    def generalize_G(self, example):
        for i in range(self.num_attributes):
            if self.G[i][0] == '?':
                self.G[i] = (example[i], '?')
            elif self.G[i][0] != example[i]:
                self.G[i] = ('?', '?')

    def specialize_S(self, example):
        for i in range(self.num_attributes):
            if self.S[i][0] == '0':
                self.S[i] = (example[i], '?')
            elif self.S[i][0] != example[i]:
                self.S[i] = ('?', self.S[i][1])

    def eliminate_candidates(self, data):
        for example in data:
            if example[-1] == 'Yes':  # positive example
                self.generalize_G(example[:-1])
                if not self.is_consistent(example[:-1], self.S):
                    self.specialize_S(example[:-1])
            else:                     # negative example
                self.specialize_S(example[:-1])
                if not self.is_consistent(example[:-1], self.G):
                    self.generalize_G(example[:-1])

    def print_hypotheses(self):
        print("Final hypothesis:")
        print("S:", self.S)
        print("G:", self.G)


def load_data(file_path):
    data = []
    with open(file_path, 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            # strip stray whitespace around each field
            data.append([field.strip() for field in row])
    return data


def main():
    file_path = "training_data.csv"  # path to your CSV file
    data = load_data(file_path)
    num_attributes = len(data[0]) - 1  # number of attributes (excluding the label)
    ce = CandidateElimination(num_attributes)
    ce.eliminate_candidates(data)
    ce.print_hypotheses()


if __name__ == "__main__":
    main()

Training Data – Sky, AirTemp, Humidity, Wind, Water, Forecast, EnjoySport

Sunny, Warm, Normal, Strong, Warm, Same, Yes
Sunny, Warm, High, Strong, Warm, Same, Yes
Rainy, Cold, High, Strong, Warm, Change, No
Sunny, Warm, High, Strong, Cool, Change, Yes

Output/Result –

Final hypothesis:
S: [('Sunny', '?'), ('Warm', '?'), ('?', '?'), ('Strong', '?'), ('?', '?'), ('?', '?')]
G: [('?', '?'), ('Warm', '?'), ('High', '?'), ('Strong', '?'), ('Warm', '?'), ('?', '?')]

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 3

Experiment Title – Implementation of Linear Regression

Concept Theory – Linear regression is one of the simplest and most popular machine learning algorithms. It is a
statistical method used for predictive analysis. Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.

The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more
independent variables (x), hence the name. Since linear regression captures a linear relationship, it finds how
the value of the dependent variable changes with the value of the independent variable.

Solution (Program/Code/Procedure/Query) –

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations and means of x and y
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)

    # cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1] * x
    plt.plot(x, y_pred, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output/Result –

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
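The closed-form coefficients can be cross-checked without NumPy. A minimal stdlib sketch of the same least-squares formulas, on the same data:

```python
def estimate_coef_plain(x, y):
    """Least-squares slope and intercept: b_1 = SS_xy / SS_xx, b_0 = m_y - b_1 * m_x."""
    n = len(x)
    m_x = sum(x) / n
    m_y = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * m_x * m_y
    ss_xx = sum(xi * xi for xi in x) - n * m_x * m_x
    b_1 = ss_xy / ss_xx
    b_0 = m_y - b_1 * m_x
    return b_0, b_1

x = list(range(10))
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]
b_0, b_1 = estimate_coef_plain(x, y)
print(round(b_0, 4), round(b_1, 4))  # 1.2364 1.1697
```

Here SS_xy = 389 − 10·4.5·6.5 = 96.5 and SS_xx = 285 − 10·4.5² = 82.5, so b_1 = 96.5/82.5 ≈ 1.1697.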

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 4

Experiment Title – Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

Concept Theory – ID3 is a simple decision tree learning algorithm developed by Ross Quinlan (1983). The basic
idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given
set, testing each attribute at every tree node. In order to select the attribute that is most useful for classifying a
given set, we introduce a metric: information gain.

To find an optimal way to classify a learning set, we need to minimize the questions asked (i.e.,
minimize the depth of the tree). Thus, we need some function that can measure which questions provide the
most balanced splitting. The information gain metric is such a function.
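The entropy measure that information gain builds on can be sketched directly with the stdlib. The label counts below match the play-tennis dataset used later in this experiment (9 'yes', 5 'no'):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

labels = ['yes'] * 9 + ['no'] * 5
print(round(entropy(labels), 3))  # 0.94
```

Information gain for an attribute is then this dataset entropy minus the weighted entropies of the subsets produced by splitting on that attribute.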

Solution (Program/Code/Procedure/Query) –

import math
import numpy as np
from data_loader import read_data

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)

    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1

    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)

    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0

    counts = np.zeros((items.shape[0], 1))
    sums = 0

    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)

    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)

    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))

    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)

    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)

    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv

def create_node(data, metadata):
    # If all examples share one label, return a leaf node
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node

    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)

    split = np.argmax(gains)

    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)

    items, dict = subtables(data, split, delete=True)

    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))

    return node

def empty(size):
    s = ""
    for x in range(size):
        s += "   "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

data_loader.py

import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)


Tennis.csv

outlook,temperature,humidity,wind, answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes

sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no

Output/Result –
outlook
overcast

b'yes'
rain

wind
b'strong'
b'no'
b'weak'
b'yes'
sunny

humidity
b'high'
b'no'

b'normal'
b'yes'

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 5

Experiment Title – Write a program to implement Random forest ensemble method on a given dataset.

Concept Theory – Random Forest is one of the most popular and commonly used algorithms among data scientists.
Random forest is a supervised machine learning algorithm that is widely used in classification and regression
problems. It builds decision trees on different samples and takes their majority vote for classification and their
average in the case of regression.

One of the most important features of the random forest algorithm is that it can handle data sets containing
continuous variables, as in the case of regression, and categorical variables, as in the case of classification. It
performs well on both classification and regression tasks. In this experiment, we will understand the working of
random forest and implement it on a given dataset.
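The majority-vote step described above can be sketched on its own with the stdlib; the per-tree predictions below are hypothetical, just to illustrate how the ensemble combines them:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from individual trees by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from five trees for one sample.
tree_predictions = ['spam', 'ham', 'spam', 'spam', 'ham']
print(majority_vote(tree_predictions))  # spam
```

For regression, the same combination step simply averages the per-tree predictions instead of voting.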

Solution (Program/Code/Procedure/Query) –

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('Salaries.csv')
print(data)

# Select the feature (position level) and target (salary);
# the column indices assume a Position, Level, Salary layout
x = data.iloc[:, 1:2].values
y = data.iloc[:, 2].values

# Fitting Random Forest Regression to the dataset
# import the regressor
from sklearn.ensemble import RandomForestRegressor

# create the regressor object
regressor = RandomForestRegressor(n_estimators=100, random_state=0)

# fit the regressor with the x and y data
regressor.fit(x, y)

# test the output by changing values
Y_pred = regressor.predict(np.array([6.5]).reshape(1, 1))

# Visualising the Random Forest Regression results
# arange creates a range of values from min(x) to max(x)
# with a step of 0.01 between consecutive values
X_grid = np.arange(min(x), max(x), 0.01)

# reshape the data into a len(X_grid)*1 array,
# i.e. make a column out of the X_grid values
X_grid = X_grid.reshape((len(X_grid), 1))

# scatter plot for the original data
plt.scatter(x, y, color='blue')

# plot the predicted data
plt.plot(X_grid, regressor.predict(X_grid), color='green')
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

Output/Result –

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 6

Experiment Title – Write a program to implement Boosting ensemble method on a given dataset.

Concept Theory – Boosting is an ensemble learning method that combines a set of weak learners into a strong
learner to minimize training errors. In boosting, a random sample of data is selected, fitted with a model, and then
trained sequentially; that is, each model tries to compensate for the weaknesses of its predecessor. With each
iteration, the weak rules from the individual classifiers are combined to form one strong prediction rule.
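The "compensate for the weaknesses of its predecessor" step works by re-weighting the training samples: each round computes a learner weight alpha from the weak learner's error and then up-weights the samples it misclassified. A minimal sketch of one AdaBoost re-weighting round (illustrative numbers, stdlib only):

```python
import math

def adaboost_round(weights, correct, eps=1e-10):
    """One AdaBoost re-weighting round.

    weights -- current sample weights (summing to 1)
    correct -- per-sample booleans: did the weak learner classify it correctly?
    Returns (alpha, new normalized weights).
    """
    # Weighted error of the weak learner.
    error = sum(w for w, c in zip(weights, correct) if not c)
    alpha = 0.5 * math.log((1 - error + eps) / (error + eps))
    # Misclassified samples are multiplied by e^alpha, correct ones by e^-alpha.
    new_w = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, True, True, False])
print(round(alpha, 3))  # weight of this weak learner in the final vote
```

With a 25% weighted error, alpha = 0.5·ln(3) ≈ 0.549, and the single misclassified sample ends up carrying half of the total weight in the next round.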

Solution (Program/Code/Procedure/Query) –

1. Cross Validation

# importing KFold cross-validation from scikit-learn
# (the old sklearn.cross_validation module has been replaced
# by sklearn.model_selection in current releases)
from sklearn.model_selection import KFold

# value of K is 10
kf = KFold(n_splits=10)

2. Implementing AdaBoost

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
import warnings

warnings.filterwarnings("ignore")

# Reading the dataset from the csv file
data = pd.read_csv("Iris.csv")

# Printing the shape of the dataset
print(data.shape)

data = data.drop('Id', axis=1)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
print("Shape of X is %s and shape of y is %s" % (X.shape, y.shape))

total_classes = y.nunique()
print("Number of unique species in dataset are: ", total_classes)

distribution = y.value_counts()
print(distribution)

X_train, X_val, Y_train, Y_val = train_test_split(X, y, test_size=0.25, random_state=28)

# Train the AdaBoost classifier and evaluate it on the validation set
adb = AdaBoostClassifier()
adb_model = adb.fit(X_train, Y_train)
print("The accuracy of the model on validation set is", adb_model.score(X_val, Y_val))

Output/Result –

The accuracy of the model on validation set is 0.9210526315789473



Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 7

Experiment Title – Write a python program to implement K-Means clustering Algorithm.

Concept Theory – The K-means clustering algorithm computes centroids and repeats the process until the optimal
centroids are found. It assumes that the number of clusters is known in advance. It is also known as the flat
clustering algorithm. The number of clusters identified from data by the method is denoted by the letter 'K' in
K-means.

In this method, data points are assigned to clusters in such a way that the sum of the squared distances between the
data points and the centroids is as small as possible. It is essential to note that reduced diversity within clusters
leads to more similar data points within the same cluster.

Solution (Program/Code/Procedure/Query) –

# Implementing K-means Clustering
import math
import sys
from random import shuffle, uniform

def ReadData(fileName):
    # Read the file, splitting by lines
    f = open(fileName, 'r')
    lines = f.read().splitlines()
    f.close()

    items = []
    for i in range(1, len(lines)):
        line = lines[i].split(',')
        itemFeatures = []
        for j in range(len(line) - 1):
            # Convert feature value to float
            v = float(line[j])
            # Add feature value to the item
            itemFeatures.append(v)
        items.append(itemFeatures)

    shuffle(items)
    return items

def FindColMinMax(items):
    n = len(items[0])
    minima = [sys.maxsize for i in range(n)]
    maxima = [-sys.maxsize - 1 for i in range(n)]

    for item in items:
        for f in range(len(item)):
            if item[f] < minima[f]:
                minima[f] = item[f]
            if item[f] > maxima[f]:
                maxima[f] = item[f]

    return minima, maxima

def InitializeMeans(items, k, cMin, cMax):
    # Initialize means to random numbers between
    # the min and max of each column/feature
    f = len(items[0])  # number of features
    means = [[0 for i in range(f)] for j in range(k)]

    for mean in means:
        for i in range(len(mean)):
            # Set value to a random float
            # (adding +-1 to avoid a wide placement of a mean)
            mean[i] = uniform(cMin[i] + 1, cMax[i] - 1)

    return means

def EuclideanDistance(x, y):
    S = 0
    # The sum of the squared differences of the elements
    for i in range(len(x)):
        S += math.pow(x[i] - y[i], 2)
    # The square root of the sum
    return math.sqrt(S)

def UpdateMean(n, mean, item):
    for i in range(len(mean)):
        m = mean[i]
        m = (m * (n - 1) + item[i]) / float(n)
        mean[i] = round(m, 3)
    return mean

def Classify(means, item):
    # Classify item to the mean with minimum distance
    minimum = sys.maxsize
    index = -1

    for i in range(len(means)):
        # Find distance from item to mean
        dis = EuclideanDistance(item, means[i])
        if dis < minimum:
            minimum = dis
            index = i

    return index

def CalculateMeans(k, items, maxIterations=100000):
    # Find the minima and maxima for columns
    cMin, cMax = FindColMinMax(items)

    # Initialize means at random points
    means = InitializeMeans(items, k, cMin, cMax)

    # Initialize clusters, the array to hold
    # the number of items in a class
    clusterSizes = [0 for i in range(len(means))]

    # An array to hold the cluster an item is in
    belongsTo = [0 for i in range(len(items))]

    # Calculate means
    for e in range(maxIterations):
        # If no change of cluster occurs, halt
        noChange = True
        for i in range(len(items)):
            item = items[i]

            # Classify item into a cluster and update the
            # corresponding means
            index = Classify(means, item)
            clusterSizes[index] += 1
            cSize = clusterSizes[index]
            means[index] = UpdateMean(cSize, means[index], item)

            # Item changed cluster
            if index != belongsTo[i]:
                noChange = False
                belongsTo[i] = index

        # Nothing changed, return
        if noChange:
            break

    return means

def FindClusters(means, items):
    clusters = [[] for i in range(len(means))]  # Init clusters

    for item in items:
        # Classify item into a cluster
        index = Classify(means, item)
        # Add item to cluster
        clusters[index].append(item)

    return clusters
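A short self-contained illustration of the assignment step that the classification functions above implement, using hypothetical 2-D points and fixed centroids (the helper name nearest_centroid is illustrative):

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid with the minimum Euclidean distance to the point."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical 2-D points and two fixed centroids.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 8.0), (0.5, 2.0)]
assignments = [nearest_centroid(p, centroids) for p in points]
print(assignments)  # [0, 1, 0]
```

The full algorithm simply alternates this assignment step with re-computing each centroid as the mean of its assigned points, stopping when no point changes cluster.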

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 8

Experiment Title – Write a program to implement Dimensionality Reduction using the Principal Component
Analysis (PCA) method.

Concept Theory – PCA is a widely covered machine learning method, and while many treatments go deep into the
mathematics, for this experiment it is enough to understand how it works in a simplified way.

Principal component analysis can be broken down into five steps. Each step can be explained logically,
simplifying mathematical concepts such as standardization, covariance, eigenvectors, and eigenvalues without
focusing on how to compute them.
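The standardization step mentioned above (z-scoring each feature before PCA) can be sketched with the stdlib alone; StandardScaler in the solution below performs the same transformation column by column:

```python
import math

def standardize(column):
    """Scale a feature column to zero mean and unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

feature = [2.0, 4.0, 6.0, 8.0]
z = standardize(feature)
print([round(v, 3) for v in z])  # [-1.342, -0.447, 0.447, 1.342]
```

After this step every feature contributes on the same scale, so the covariance matrix (and hence the principal components) is not dominated by features with large units.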

Solution (Program/Code/Procedure/Query) –

1. Implementation of PCA

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# import the breast_cancer dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
data.keys()

# Check the output classes
print(data['target_names'])

# Check the input attributes
print(data['feature_names'])

# construct a dataframe using pandas
df1 = pd.DataFrame(data['data'], columns=data['feature_names'])

# Scale data before applying PCA
scaling = StandardScaler()

# Use fit and transform method
scaling.fit(df1)
Scaled_data = scaling.transform(df1)

# Set the n_components=3
principal = PCA(n_components=3)
principal.fit(Scaled_data)
x = principal.transform(Scaled_data)

# Check the dimensions of data after PCA
print(x.shape)

# Check the eigenvectors produced by the principal components
principal.components_

plt.figure(figsize=(10, 10))
plt.scatter(x[:, 0], x[:, 1], c=data['target'], cmap='plasma')
plt.xlabel('pc1')
plt.ylabel('pc2')

# import relevant libraries for a 3d graph
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(10, 10))

# choose projection='3d' for creating a 3d graph
axis = fig.add_subplot(111, projection='3d')

# x[:,0] is pc1, x[:,1] is pc2, and x[:,2] is pc3
axis.scatter(x[:, 0], x[:, 1], x[:, 2], c=data['target'], cmap='plasma')
axis.set_xlabel("PC1", fontsize=10)
axis.set_ylabel("PC2", fontsize=10)
axis.set_zlabel("PC3", fontsize=10)
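To decide how many components to keep, it is common to inspect PCA's explained_variance_ratio_, which reports the fraction of total variance each retained component captures. A minimal sketch on illustrative synthetic data (the correlated columns are there only so the first components dominate):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# illustrative data: 100 samples, 6 features, first four strongly correlated
rng = np.random.RandomState(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base,
               base + rng.normal(scale=0.1, size=(100, 2)),
               rng.normal(size=(100, 2))])

Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(Xs)

# fraction of total variance captured by each component, in decreasing order
print(pca.explained_variance_ratio_)
```

The ratios always come out sorted in decreasing order and sum to at most 1; a common rule of thumb is to keep enough components to cover most of the cumulative variance.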

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 9

Experiment Title – Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.

Concept Theory – A Bayesian network (BN) is a probabilistic graphical model for representing knowledge about an
uncertain domain: each node corresponds to a random variable and each edge represents a direct conditional
dependence between the connected variables [9]. BNs are also called belief networks or Bayes nets. Because the
dependencies and conditional probabilities must be consistent, a BN is a directed acyclic graph (DAG) in which
no cycles or self-connections are allowed, and the joint distribution factorizes as the product of each node's
probability conditioned on its parents.
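The DAG factorization can be made concrete with a tiny hand-built chain A → B → C. The probability values below are illustrative, not taken from the Heart Disease data:

```python
# P(A), P(B|A), P(C|B) for binary variables, as plain dicts (illustrative numbers)
pA = {True: 0.3, False: 0.7}
pB_given_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
pC_given_B = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}

# Joint distribution factorizes over the DAG: P(A,B,C) = P(A) * P(B|A) * P(C|B)
def joint(a, b, c):
    return pA[a] * pB_given_A[a][b] * pC_given_B[b][c]

# Marginal P(C=True) by summing the joint over all configurations of A and B
pC_true = sum(joint(a, b, True) for a in (True, False) for b in (True, False))
print(round(pC_true, 4))  # 0.531
```

This summation over parent configurations is exactly the inference a BN library automates once the network structure and CPTs are declared.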

Solution (Program/Code/Procedure/Query) –

from pomegranate import *

asia = DiscreteDistribution({'True': 0.5, 'False': 0.5})

tuberculosis = ConditionalProbabilityTable(
    [['True', 'True', 0.2],
     ['True', 'False', 0.8],
     ['False', 'True', 0.02],
     ['False', 'False', 0.98]], [asia])  # each CPT row must sum to 1

smoking = DiscreteDistribution({'True': 0.5, 'False': 0.5})

lung = ConditionalProbabilityTable(
    [['True', 'True', 0.75],
     ['True', 'False', 0.25],
     ['False', 'True', 0.02],
     ['False', 'False', 0.98]], [smoking])

bronchitis = ConditionalProbabilityTable(
    [['True', 'True', 0.92],
     ['True', 'False', 0.08],
     ['False', 'True', 0.03],
     ['False', 'False', 0.97]], [smoking])

# deterministic OR of its two parents (tuberculosis, lung)
tuberculosis_or_cancer = ConditionalProbabilityTable(
    [['True', 'True', 'True', 1.0],
     ['True', 'True', 'False', 0.0],
     ['True', 'False', 'True', 1.0],
     ['True', 'False', 'False', 0.0],
     ['False', 'True', 'True', 1.0],
     ['False', 'True', 'False', 0.0],
     ['False', 'False', 'True', 0.0],
     ['False', 'False', 'False', 1.0]], [tuberculosis, lung])

xray = ConditionalProbabilityTable(
    [['True', 'True', 0.885],
     ['True', 'False', 0.115],
     ['False', 'True', 0.04],
     ['False', 'False', 0.96]], [tuberculosis_or_cancer])

dyspnea = ConditionalProbabilityTable(
    [['True', 'True', 'True', 0.96],
     ['True', 'True', 'False', 0.04],
     ['True', 'False', 'True', 0.89],
     ['True', 'False', 'False', 0.11],
     ['False', 'True', 'True', 0.96],
     ['False', 'True', 'False', 0.04],
     ['False', 'False', 'True', 0.89],
     ['False', 'False', 'False', 0.11]],
    [tuberculosis_or_cancer, bronchitis])

s0 = State(asia, name="asia")
s1 = State(tuberculosis, name="tuberculosis")
s2 = State(smoking, name="smoker")

network = BayesianNetwork("asia")
network.add_states(s0, s1, s2)
network.add_edge(s0, s1)
network.add_edge(s1, s2)
network.bake()

print(network.predict_proba({'tuberculosis': 'True'}))
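What predict_proba computes is a posterior update; for a single edge it reduces to Bayes' rule, which can be checked by hand in pure Python (the prior and CPT values below are illustrative):

```python
# prior P(asia) and P(tuberculosis=True | asia), illustrative values
p_asia = {'True': 0.5, 'False': 0.5}
p_tub_given_asia = {'True': 0.2, 'False': 0.02}

# Bayes' rule: P(asia | tub=True) is proportional to P(asia) * P(tub=True | asia)
unnorm = {a: p_asia[a] * p_tub_given_asia[a] for a in p_asia}
z = sum(unnorm.values())                      # normalizing constant P(tub=True)
posterior = {a: v / z for a, v in unnorm.items()}
print(posterior)
```

With these numbers the evidence tuberculosis=True pushes P(asia=True) from the 0.5 prior up to about 0.91, because tuberculosis is ten times more likely given a visit to Asia.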

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 10

Experiment Title – Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

Concept Theory –

 Naïve Bayes is a supervised learning algorithm, based on Bayes' theorem and used for
solving classification problems.
 It is mainly used in text classification, which involves high-dimensional training datasets.
 The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in
building fast machine learning models that can make quick predictions.
 It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
 Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and
classifying articles.
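The Bayes' theorem underlying the classifier can be checked with a tiny hand computation, P(spam | word) = P(word | spam) P(spam) / P(word). The spam-filter numbers below are illustrative:

```python
# illustrative numbers: 40% of mail is spam; the word "offer" appears in 50%
# of spam messages and in 5% of non-spam messages
p_spam = 0.4
p_word_given_spam = 0.5
p_word_given_ham = 0.05

# total probability of seeing the word (law of total probability)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: posterior probability the message is spam given the word
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 4))  # 0.8696
```

The naïve Bayes classifier applies exactly this update, with the "naïve" assumption that the features contribute independent likelihood factors.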

Solution (Program/Code/Procedure/Query) –

import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate random indices into the dataset to pick elements for training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    # creates a dictionary keyed by class (e.g. 1 and 0) whose values are the
    # instances belonging to each class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]  # drop the class column
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of (mean, stdev) tuples for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    # summaries holds each class's attribute information as mean and stdev
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            # mean and stdev of every attribute, for class 0 and 1 separately
            mean, stdev = classSummaries[i]
            x = inputVector[i]  # test vector's i-th attribute
            # multiply in the normal-distribution likelihood of this attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    # assign the class which has the highest probability
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(
        len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
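As a sanity check of the Gaussian likelihood at the heart of calculateProbability, the same formula cleanly separates two well-spread 1-D classes. The toy numbers below are illustrative and independent of the CSV file used above:

```python
import math

def gaussian(x, mean, stdev):
    # same Gaussian likelihood formula used by calculateProbability above
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

# toy 1-D training data: class 0 clustered near 1.0, class 1 near 5.0
train = {0: [0.9, 1.0, 1.2], 1: [4.8, 5.0, 5.3]}

def mean(nums):
    return sum(nums) / len(nums)

def stdev(nums):
    avg = mean(nums)
    return math.sqrt(sum((x - avg) ** 2 for x in nums) / (len(nums) - 1))

# per-class (mean, stdev) summaries, mirroring summarizeByClass
summaries = {c: (mean(v), stdev(v)) for c, v in train.items()}

def predict(x):
    # pick the class whose Gaussian gives the higher likelihood
    return max(summaries, key=lambda c: gaussian(x, *summaries[c]))

print(predict(1.1), predict(4.9))  # 0 1
```

A point near 1.0 gets class 0 and a point near 5.0 gets class 1, confirming the likelihood comparison behaves as the full program expects.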

Output/Result –

confusion matrix is as follows
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]

Accuracy metrics
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        11
 avg / total       1.00      1.00      1.00        45
