
Acropolis Institute of Technology and Research, Indore

Vision and Mission of the Institute

Vision

To be an academic leader for the development of human potential so as to meet global
challenges.

Mission

1. To create an intellectually stimulating learning environment.
2. To impart value-based, innovative, and research-oriented education.
3. To develop a positive attitude along with communication skills.
4. To increase employability and entrepreneurship through collaboration with industries and
professional organizations.

Vision and Mission of Department of Information Technology

Vision

To be a centre of excellence for IT education and research to produce globally competent and skilled IT professionals.

Mission

1. To equip the department with the latest facilities to provide need-based quality education.
2. To inculcate real-life problem-solving skills in the students.
3. To strengthen students' capabilities to match industry requirements.
4. To provide an environment for continuous learning and applied research.

Program Outcome (PO)


The engineering graduate of this institute will demonstrate the ability to:

1. Apply knowledge of mathematics, science, computing, and engineering fundamentals to computer
science engineering problems.
2. Identify, formulate, and solve problems with excellent programming and problem-solving skills.
3. Design solutions for engineering problems, including design of experiments and processes, to meet
desired needs within reasonable constraints of manufacturability, sustainability, ecology,
intellectual property, and health and safety considerations.
4. Propose and develop effective investigational solutions to complex problems using research
methodology, including design of experiments, analysis and interpretation of data, and synthesis
of information to provide suitable conclusions.
5. Create, select, and use modern techniques and various tools to solve engineering
problems, and evaluate solutions with an understanding of their limitations.
6. Acquire knowledge of contemporary issues to assess societal, health and safety, legal,
and cultural issues.
7. Evaluate the impact of engineering solutions on individuals as well as organizations in a
societal and environmental context, recognize sustainable development, and be aware of
emerging technologies and current professional issues.
8. Possess leadership and managerial skills, and understand and commit to professional
ethics and responsibilities.
9. Demonstrate teamwork and function effectively as an individual, with an ability to
design, develop, test, and debug a project, and work with a multi-disciplinary team.
10. Communicate effectively on engineering problems with the community, such as by writing
effective reports and design documentation.
11. Recognize the need for, and engage in, independent and life-long learning through
professional development and quality enhancement programs in the context of
technological change.
12. Practice engineering and management principles and apply them to one's own work, as a
member and leader of a team, to manage projects and entrepreneurship.

Certificate

This is to certify that ………… Enrolment No. …………… of B. Tech.

Information Technology, Year 4th, Semester VIII, has performed the experiments in Machine Learning

(Sub. Code – IT802(A)) as per the syllabus prescribed

by RGPV Bhopal and submitted satisfactory work in the institute during the

academic year 2023 – 2024.

Signature of Head                                   Signature of Faculty



List of Practical
S. No   Title of the Practical   CO1   CO2   CO3   CO4   CO5
1 Implement and demonstrate the FIND-S algorithm for finding ✓
the most specific hypothesis based on a given set of training
data samples. Read the training data from a .CSV file.
2 For a given set of training data examples stored in a .CSV file, ✓
implement and demonstrate the Candidate-Elimination
algorithm to output a description of the set of all hypotheses
consistent with the training examples.
3 Write a program for linear regression. ✓
4 Write a program to demonstrate the working of the decision ✓
tree based ID3 algorithm. Use an appropriate data set for
building the decision tree and apply this knowledge to classify a
new sample.
5 Write a program to implement Random forest ensemble method ✓
on a given dataset.
6 Write a program to implement Boosting ensemble method on a ✓
given dataset.
7 Write a Python program to implement K-Means clustering ✓
algorithm.
8 Write a program to implement Dimensionality reduction using ✓
Principal Component Analysis (PCA) method.
9 Write a program to construct a Bayesian network considering ✓
medical data. Use this model to demonstrate the diagnosis of
heart patients using standard Heart Disease Data Set. You can
use Java/Python ML library classes/API.
10 Write a program to implement the naïve Bayesian classifier for ✓
a sample training data set stored as a .CSV file. Compute the
accuracy of the classifier, considering a few test data sets.
CO List
CO 1 Apply knowledge of computing and mathematics to machine learning problems, models, and
algorithms.
CO 2 Analyze a problem and identify the computing requirements appropriate for its solution
using Neural Networks.
CO 3 Implement and evaluate Convolutional Neural Networks to meet desired needs.

CO 4 Solve real-world problems using recurrent networks and reinforcement learning.

CO 5 Apply mathematical foundations, algorithmic principles, and computer science theory to the modeling
and design of computer-based systems in a way that demonstrates comprehension of the
trade-offs involved in design choices.

CO PO Mapping
PO PSO
CO PO PO PO PO PO PO PO PO PO PO1 PO1 PO1 PSO PSO PSO
1 2 3 4 5 6 7 8 9 0 1 2 1 2 3
CO ✓ ✓ ✓ ✓
1
CO ✓ ✓ ✓ ✓
2
CO ✓ ✓
3
CO ✓ ✓
4
CO ✓
5

Index
S. No   Title of the Practical   Date of Practical   Date of Submission   Submission Remark   Marks / Grade   Faculty Sign with date
1 Implement and demonstrate the FIND-S
algorithm for finding the most specific
hypothesis based on a given set of
training data samples. Read the training
data from a .CSV file.
2 For a given set of training data
examples stored in a .CSV file,
implement and demonstrate the
Candidate-Elimination algorithm to
output a description of the set of all
hypotheses consistent with the training
examples.
3
Write a program for linear
regression.
4 Write a program to demonstrate the
working of the decision tree based ID3
algorithm. Use an appropriate data set
for building the decision tree and apply
this knowledge to classify a new sample.
5 Write a program to implement Random
forest ensemble method on a given
dataset.
6 Write a program to implement
Boosting ensemble method on a
given dataset.
7 Write a Python program to
implement K-Means clustering
algorithm.
8 Write a program to implement
Dimensionality reduction using Principal
Component Analysis (PCA) method.
9 Write a program to construct a Bayesian
network considering medical data. Use
this model to demonstrate the diagnosis
of heart patients using standard Heart
Disease Data Set. You can use
Java/Python ML library classes/API.
10 Write a program to implement the naïve
Bayesian classifier for a sample
training data set stored as a .CSV file.
Compute the accuracy of the classifier,
considering a few test data sets.

Hardware and Software Requirements

Sr. No.   Software Requirements                     Hardware Requirements

1         Windows, Mac, or Linux                    Hard disk: min 1 GB or above

2         Python versions: 2.7.x, 3.6.x, 3.8.x      RAM: 4 GB



General Instructions for Laboratory Classes

DOs
1. Do not enter the laboratory without prior permission.

2. Wear your ID card while entering the lab.

3. Come in proper uniform.

4. Sign the LOGIN REGISTER before entering the laboratory.

5. Bring your observation and record notebooks to the laboratory.

6. Maintain silence inside the laboratory.

7. After completing the laboratory exercise, make sure to shut down the system properly.

DON'Ts
1. Do not bring bags inside the laboratory.

2. Do not use the computers in an improper way.

3. Do not scribble on the desks or mishandle the chairs.

4. Do not use mobile phones inside the laboratory.

5. Do not make noise inside the laboratory.



Course Objectives and Outcomes

Course Objective

1. To understand the basic theory underlying machine learning and the fundamental issues and
challenges of machine learning: data, model selection, model complexity, etc.

2. To study and apply basic concepts of artificial neural networks and deep-learning-based
machine learning algorithms.

3. To implement convolutional neural networks for solving real-world problems.

4. To apply and implement basic concepts of recurrent networks and reinforcement learning.

5. To evaluate the performance of algorithms and to provide solutions for various real-world
applications related to computer vision, speech processing, and natural language processing.

Course Outcomes

At the end of the course student will be able to:

1. Understand the basic characteristics of machine learning strategies.

2. Analyze supervised learning and various applications of neural networks.

3. Apply more than one technique to enhance the performance of learning.

4. Create probabilistic and unsupervised learning models for handling unknown patterns.

5. Identify frequent patterns, preprocess the data before applying it to any real-world problem, and
evaluate the resulting performance.

Department of Information Technology

Pre-requisites -

1. Basic Concepts of Statistics, Probability, Linear Algebra, Calculus.

2. Basic Concepts of Programming Languages.



Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 1

Experiment Title – Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Concept Theory – The Find-S algorithm is a basic concept-learning algorithm in machine learning. It finds the
most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive
training examples. Find-S starts with the most specific hypothesis and generalizes it each time it fails to cover
an observed positive training example. Hence, the Find-S algorithm moves from the most specific hypothesis
toward the most general hypothesis.

Solution(Program/Code/Procedure/Query) –

import csv

with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":
        j = 0
        for x in i:
            if x != "True":
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?'
                j = j + 1

print("Most specific hypothesis is")
print(h)

Output/Result -

['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'True']
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'True']
['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'False']
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'True']
Most specific hypothesis is
[['Sunny', 'Warm', '?', 'Strong', '?', '?']]
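The same Find-S loop can also be seen without any file I/O. The following self-contained sketch hard-codes the EnjoySport training data shown above; the function name `find_s` is illustrative, not part of the program listed earlier:

```python
def find_s(examples):
    """Return the most specific hypothesis covering all positive examples.

    Each example is a list of attribute values followed by a 'True'/'False'
    label; '?' in the hypothesis means any value is acceptable.
    """
    positives = [e[:-1] for e in examples if e[-1] == "True"]
    # Initialise the hypothesis with the first positive example.
    h = list(positives[0])
    for example in positives[1:]:
        for j, value in enumerate(example):
            if h[j] != value:
                h[j] = '?'  # generalize any attribute that disagrees
    return h

data = [
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'True'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'True'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'False'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'True'],
]
print(find_s(data))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

The result matches the output above: only Sky, AirTemp, and Wind stay constrained after all positive examples are processed.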



Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 2

Experiment Title – For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.

Concept Theory – The candidate elimination algorithm incrementally builds the version space given a hypothesis
space H and a set E of examples. The examples are added one by one; each example possibly shrinks the version
space by removing the hypotheses that are inconsistent with it. The candidate elimination algorithm does
this by updating the general and specific boundaries for each new example.

 You can consider this an extended form of the Find-S algorithm.
 It considers both positive and negative examples.
 Positive examples are used, as in Find-S, to generalize the specific boundary.
 Negative examples are used to specialize the general boundary.
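The notion of a hypothesis being consistent with an example is central to this update. A minimal sketch of that check, where `'?'` acts as a wildcard (the helper name `matches_hypothesis` is illustrative):

```python
def matches_hypothesis(hypothesis, example):
    """True if every non-'?' attribute of the hypothesis equals the example's value."""
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

# A hypothesis from the specific boundary covers the first example
# but not the second.
print(matches_hypothesis(['Sunny', 'Warm', '?'], ['Sunny', 'Warm', 'High']))  # True
print(matches_hypothesis(['Sunny', 'Warm', '?'], ['Rainy', 'Cold', 'High']))  # False
```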

Solution (Program/Code/Procedure/Query) –

import csv

class CandidateElimination:
    def __init__(self, num_attributes):
        self.num_attributes = num_attributes
        self.S = [None] * num_attributes  # most specific hypothesis
        self.G = [None] * num_attributes  # most general hypothesis
        for i in range(num_attributes):
            self.S[i] = ('0', '?')
            self.G[i] = ('?', '?')

    def is_consistent(self, example, hypothesis):
        # Each hypothesis entry is a tuple whose first element is the
        # attribute constraint; '?' matches any value.
        for i in range(len(example)):
            if hypothesis[i][0] != '?' and example[i] != hypothesis[i][0]:
                return False
        return True

    def generalize_G(self, example):
        for i in range(self.num_attributes):
            if self.G[i][0] == '?':
                self.G[i] = (example[i], '?')
            elif self.G[i][0] != example[i]:
                self.G[i] = ('?', '?')

    def specialize_S(self, example):
        for i in range(self.num_attributes):
            if self.S[i][0] == '0':
                self.S[i] = (example[i], '?')
            elif self.S[i][0] != example[i]:
                self.S[i] = ('?', self.S[i][1])

    def eliminate_candidates(self, data):
        for example in data:
            if example[-1] == 'Yes':  # positive example
                self.generalize_G(example[:-1])
                if not self.is_consistent(example[:-1], self.S):
                    self.specialize_S(example[:-1])
            else:                     # negative example
                self.specialize_S(example[:-1])
                if not self.is_consistent(example[:-1], self.G):
                    self.generalize_G(example[:-1])

    def print_hypotheses(self):
        print("Final hypothesis:")
        print("S:", self.S)
        print("G:", self.G)


def load_data(file_path):
    data = []
    with open(file_path, 'r') as file:
        reader = csv.reader(file)
        for row in reader:
            # strip stray whitespace around each field
            data.append([field.strip() for field in row])
    return data


def main():
    file_path = "training_data.csv"  # path to your CSV file
    data = load_data(file_path)
    num_attributes = len(data[0]) - 1  # number of attributes (excluding the label)
    ce = CandidateElimination(num_attributes)
    ce.eliminate_candidates(data)
    ce.print_hypotheses()


if __name__ == "__main__":
    main()

Training Data – Sky, AirTemp, Humidity, Wind, Water, Forecast, EnjoySport

Sunny, Warm, Normal, Strong, Warm, Same, Yes
Sunny, Warm, High, Strong, Warm, Same, Yes
Rainy, Cold, High, Strong, Warm, Change, No
Sunny, Warm, High, Strong, Cool, Change, Yes

Output/Result –

Final hypothesis:
S: [('Sunny', '?'), ('Warm', '?'), ('?', '?'), ('Strong', '?'), ('?', '?'), ('?', '?')]
G: [('?', '?'), ('Warm', '?'), ('High', '?'), ('Strong', '?'), ('Warm', '?'), ('?', '?')]

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 3

Experiment Title – Implementation of Linear Regression

Concept Theory – Linear regression is one of the simplest and most popular machine learning algorithms. It is a
statistical method used for predictive analysis. Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.

The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more
independent variables (x), hence the name. Since linear regression captures a linear relationship, it finds how
the value of the dependent variable changes with the value of the independent variable.

Solution (Program/Code/Procedure/Query) –

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations and means of x and y
    n = np.size(x)
    m_x = np.mean(x)
    m_y = np.mean(y)

    # cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1] * x
    plt.plot(x, y_pred, color="g")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output/Result –

Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
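The closed-form coefficients can be cross-checked without NumPy. A minimal stdlib sketch of the same least-squares formulas, on the same data:

```python
def estimate_coef_plain(x, y):
    """Least-squares slope and intercept: b_1 = SS_xy / SS_xx, b_0 = m_y - b_1 * m_x."""
    n = len(x)
    m_x = sum(x) / n
    m_y = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * m_x * m_y
    ss_xx = sum(xi * xi for xi in x) - n * m_x * m_x
    b_1 = ss_xy / ss_xx
    b_0 = m_y - b_1 * m_x
    return b_0, b_1

x = list(range(10))
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12]
b_0, b_1 = estimate_coef_plain(x, y)
print(round(b_0, 4), round(b_1, 4))  # 1.2364 1.1697
```

Here SS_xy = 389 − 10·4.5·6.5 = 96.5 and SS_xx = 285 − 10·4.5² = 82.5, so b_1 = 96.5/82.5 ≈ 1.1697.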

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 4

Experiment Title – Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

Concept Theory – ID3 is a simple decision tree learning algorithm developed by Ross Quinlan (1983). The basic
idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given
set, testing each attribute at every tree node. In order to select the attribute that is most useful for classifying a
given set, we introduce a metric: information gain.

To find an optimal way to classify a learning set, we need to minimize the questions asked (i.e.,
minimize the depth of the tree). Thus, we need some function that can measure which questions provide the
most balanced splitting. The information gain metric is such a function.
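The entropy measure that information gain builds on can be sketched directly with the stdlib. The label counts below match the play-tennis dataset used later in this experiment (9 'yes', 5 'no'):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

labels = ['yes'] * 9 + ['no'] * 5
print(round(entropy(labels), 3))  # 0.94
```

Information gain for an attribute is then this dataset entropy minus the weighted entropies of the subsets produced by splitting on that attribute.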

Solution (Program/Code/Procedure/Query) –

import math
import numpy as np
from data_loader import read_data

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)

    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1

    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)

    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0

    counts = np.zeros((items.shape[0], 1))
    sums = 0

    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)

    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)

    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))

    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)

    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)

    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv

def create_node(data, metadata):
    # If all examples share one label, return a leaf node
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node

    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)

    split = np.argmax(gains)

    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)

    items, dict = subtables(data, split, delete=True)

    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))

    return node

def empty(size):
    s = ""
    for x in range(size):
        s += "   "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

data_loader.py

import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)


Tennis.csv

outlook,temperature,humidity,wind, answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes

sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no

Output/Result –
outlook
overcast

b'yes'
rain

wind
b'strong'
b'no'
b'weak'
b'yes'
sunny

humidity
b'high'
b'no'

b'normal'
b'yes'

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 5

Experiment Title – Write a program to implement Random forest ensemble method on a given dataset.

Concept Theory – Random Forest is one of the most popular and commonly used algorithms among data scientists.
Random forest is a supervised machine learning algorithm that is widely used in classification and regression
problems. It builds decision trees on different samples and takes their majority vote for classification and their
average in the case of regression.

One of the most important features of the random forest algorithm is that it can handle data sets containing
continuous variables, as in the case of regression, and categorical variables, as in the case of classification. It
performs well on both classification and regression tasks. In this experiment, we will understand the working of
random forest and implement it on a given dataset.
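The majority-vote step described above can be sketched on its own with the stdlib; the per-tree predictions below are hypothetical, just to illustrate how the ensemble combines them:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from individual trees by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from five trees for one sample.
tree_predictions = ['spam', 'ham', 'spam', 'spam', 'ham']
print(majority_vote(tree_predictions))  # spam
```

For regression, the same combination step simply averages the per-tree predictions instead of voting.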

Solution (Program/Code/Procedure/Query) –

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('Salaries.csv')
print(data)

# Select the feature (position level) and target (salary);
# the column indices assume a Position, Level, Salary layout
x = data.iloc[:, 1:2].values
y = data.iloc[:, 2].values

# Fitting Random Forest Regression to the dataset
# import the regressor
from sklearn.ensemble import RandomForestRegressor

# create the regressor object
regressor = RandomForestRegressor(n_estimators=100, random_state=0)

# fit the regressor with the x and y data
regressor.fit(x, y)

# test the output by changing values
Y_pred = regressor.predict(np.array([6.5]).reshape(1, 1))

# Visualising the Random Forest Regression results
# arange creates a range of values from min(x) to max(x)
# with a step of 0.01 between consecutive values
X_grid = np.arange(min(x), max(x), 0.01)

# reshape the data into a len(X_grid)*1 array,
# i.e. make a column out of the X_grid values
X_grid = X_grid.reshape((len(X_grid), 1))

# scatter plot for the original data
plt.scatter(x, y, color='blue')

# plot the predicted data
plt.plot(X_grid, regressor.predict(X_grid), color='green')
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

Output/Result –

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 6

Experiment Title – Write a program to implement Boosting ensemble method on a given dataset.

Concept Theory – Boosting is an ensemble learning method that combines a set of weak learners into a strong
learner to minimize training errors. In boosting, a random sample of data is selected, fitted with a model, and then
trained sequentially; that is, each model tries to compensate for the weaknesses of its predecessor. With each
iteration, the weak rules from the individual classifiers are combined to form one strong prediction rule.
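The "compensate for the weaknesses of its predecessor" step works by re-weighting the training samples: each round computes a learner weight alpha from the weak learner's error and then up-weights the samples it misclassified. A minimal sketch of one AdaBoost re-weighting round (illustrative numbers, stdlib only):

```python
import math

def adaboost_round(weights, correct, eps=1e-10):
    """One AdaBoost re-weighting round.

    weights -- current sample weights (summing to 1)
    correct -- per-sample booleans: did the weak learner classify it correctly?
    Returns (alpha, new normalized weights).
    """
    # Weighted error of the weak learner.
    error = sum(w for w, c in zip(weights, correct) if not c)
    alpha = 0.5 * math.log((1 - error + eps) / (error + eps))
    # Misclassified samples are multiplied by e^alpha, correct ones by e^-alpha.
    new_w = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]

weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = adaboost_round(weights, [True, True, True, False])
print(round(alpha, 3))  # weight of this weak learner in the final vote
```

With a 25% weighted error, alpha = 0.5·ln(3) ≈ 0.549, and the single misclassified sample ends up carrying half of the total weight in the next round.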

Solution (Program/Code/Procedure/Query) –

1. Cross Validation

# importing KFold cross-validation from scikit-learn
# (the old sklearn.cross_validation module has been replaced
# by sklearn.model_selection in current releases)
from sklearn.model_selection import KFold

# value of K is 10
kf = KFold(n_splits=10)

2. Implementing AdaBoost

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
import warnings

warnings.filterwarnings("ignore")

# Reading the dataset from the csv file
data = pd.read_csv("Iris.csv")

# Printing the shape of the dataset
print(data.shape)

data = data.drop('Id', axis=1)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
print("Shape of X is %s and shape of y is %s" % (X.shape, y.shape))

total_classes = y.nunique()
print("Number of unique species in dataset are: ", total_classes)

distribution = y.value_counts()
print(distribution)

X_train, X_val, Y_train, Y_val = train_test_split(X, y, test_size=0.25, random_state=28)

# Train the AdaBoost classifier and evaluate it on the validation set
adb = AdaBoostClassifier()
adb_model = adb.fit(X_train, Y_train)
print("The accuracy of the model on validation set is", adb_model.score(X_val, Y_val))

Output/Result –

The accuracy of the model on validation set is 0.9210526315789473



Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 7

Experiment Title – Write a python program to implement K-Means clustering Algorithm.

Concept Theory – The K-means clustering algorithm computes centroids and repeats the process until the optimal
centroids are found. It assumes that the number of clusters is known in advance. It is also known as the flat
clustering algorithm. The number of clusters identified from data by the method is denoted by the letter 'K' in
K-means.

In this method, data points are assigned to clusters in such a way that the sum of the squared distances between the
data points and the centroids is as small as possible. It is essential to note that reduced diversity within clusters
leads to more similar data points within the same cluster.

Solution (Program/Code/Procedure/Query) –

# Implementing K-means Clustering
import math
import sys
from random import shuffle, uniform

def ReadData(fileName):
    # Read the file, splitting by lines
    f = open(fileName, 'r')
    lines = f.read().splitlines()
    f.close()

    items = []
    for i in range(1, len(lines)):
        line = lines[i].split(',')
        itemFeatures = []
        for j in range(len(line) - 1):
            # Convert feature value to float
            v = float(line[j])
            # Add feature value to the item
            itemFeatures.append(v)
        items.append(itemFeatures)

    shuffle(items)
    return items

def FindColMinMax(items):
    n = len(items[0])
    minima = [sys.maxsize for i in range(n)]
    maxima = [-sys.maxsize - 1 for i in range(n)]

    for item in items:
        for f in range(len(item)):
            if item[f] < minima[f]:
                minima[f] = item[f]
            if item[f] > maxima[f]:
                maxima[f] = item[f]

    return minima, maxima

def InitializeMeans(items, k, cMin, cMax):
    # Initialize means to random numbers between
    # the min and max of each column/feature
    f = len(items[0])  # number of features
    means = [[0 for i in range(f)] for j in range(k)]

    for mean in means:
        for i in range(len(mean)):
            # Set value to a random float
            # (adding +-1 to avoid a wide placement of a mean)
            mean[i] = uniform(cMin[i] + 1, cMax[i] - 1)

    return means

def EuclideanDistance(x, y):
    S = 0
    # The sum of the squared differences of the elements
    for i in range(len(x)):
        S += math.pow(x[i] - y[i], 2)
    # The square root of the sum
    return math.sqrt(S)

def UpdateMean(n, mean, item):
    for i in range(len(mean)):
        m = mean[i]
        m = (m * (n - 1) + item[i]) / float(n)
        mean[i] = round(m, 3)
    return mean

def Classify(means, item):
    # Classify item to the mean with minimum distance
    minimum = sys.maxsize
    index = -1

    for i in range(len(means)):
        # Find distance from item to mean
        dis = EuclideanDistance(item, means[i])
        if dis < minimum:
            minimum = dis
            index = i

    return index

def CalculateMeans(k, items, maxIterations=100000):
    # Find the minima and maxima for columns
    cMin, cMax = FindColMinMax(items)

    # Initialize means at random points
    means = InitializeMeans(items, k, cMin, cMax)

    # Initialize clusters, the array to hold
    # the number of items in a class
    clusterSizes = [0 for i in range(len(means))]

    # An array to hold the cluster an item is in
    belongsTo = [0 for i in range(len(items))]

    # Calculate means
    for e in range(maxIterations):
        # If no change of cluster occurs, halt
        noChange = True
        for i in range(len(items)):
            item = items[i]

            # Classify item into a cluster and update the
            # corresponding means
            index = Classify(means, item)
            clusterSizes[index] += 1
            cSize = clusterSizes[index]
            means[index] = UpdateMean(cSize, means[index], item)

            # Item changed cluster
            if index != belongsTo[i]:
                noChange = False
                belongsTo[i] = index

        # Nothing changed, return
        if noChange:
            break

    return means

def FindClusters(means, items):
    clusters = [[] for i in range(len(means))]  # Init clusters

    for item in items:
        # Classify item into a cluster
        index = Classify(means, item)
        # Add item to cluster
        clusters[index].append(item)

    return clusters
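A short self-contained illustration of the assignment step that the classification functions above implement, using hypothetical 2-D points and fixed centroids (the helper name nearest_centroid is illustrative):

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid with the minimum Euclidean distance to the point."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical 2-D points and two fixed centroids.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 8.0), (0.5, 2.0)]
assignments = [nearest_centroid(p, centroids) for p in points]
print(assignments)  # [0, 1, 0]
```

The full algorithm simply alternates this assignment step with re-computing each centroid as the mean of its assigned points, stopping when no point changes cluster.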

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 8

Experiment Title – Write a program to implement Dimensionality Reduction using the Principal Component
Analysis (PCA) method.

Concept Theory – PCA is a widely covered machine learning method, and while many treatments go deep into the
mathematics, for this experiment it is enough to understand how it works in a simplified way.

Principal component analysis can be broken down into five steps. Each step can be explained logically,
simplifying mathematical concepts such as standardization, covariance, eigenvectors, and eigenvalues without
focusing on how to compute them.
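The standardization step mentioned above (z-scoring each feature before PCA) can be sketched with the stdlib alone; StandardScaler in the solution below performs the same transformation column by column:

```python
import math

def standardize(column):
    """Scale a feature column to zero mean and unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

feature = [2.0, 4.0, 6.0, 8.0]
z = standardize(feature)
print([round(v, 3) for v in z])  # [-1.342, -0.447, 0.447, 1.342]
```

After this step every feature contributes on the same scale, so the covariance matrix (and hence the principal components) is not dominated by features with large units.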

Solution (Program/Code/Procedure/Query) –

1. Implementation of PCA

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# import the breast_cancer dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
data.keys()

# Check the output classes
print(data['target_names'])

# Check the input attributes
print(data['feature_names'])

# construct a dataframe using pandas
df1 = pd.DataFrame(data['data'], columns=data['feature_names'])

# Scale data before applying PCA
scaling = StandardScaler()

# Use fit and transform method
scaling.fit(df1)
Scaled_data = scaling.transform(df1)

# Set the n_components=3
principal = PCA(n_components=3)
principal.fit(Scaled_data)
x = principal.transform(Scaled_data)

# Check the dimensions of data after PCA
print(x.shape)

# Check the eigenvectors produced by the principal components
principal.components_

plt.figure(figsize=(10, 10))
plt.scatter(x[:, 0], x[:, 1], c=data['target'], cmap='plasma')
plt.xlabel('pc1')
plt.ylabel('pc2')

# import relevant libraries for a 3d graph
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(10, 10))

# choose projection='3d' for creating a 3d graph
axis = fig.add_subplot(111, projection='3d')

# x[:,0] is pc1, x[:,1] is pc2, and x[:,2] is pc3
axis.scatter(x[:, 0], x[:, 1], x[:, 2], c=data['target'], cmap='plasma')
axis.set_xlabel("PC1", fontsize=10)
axis.set_ylabel("PC2", fontsize=10)
axis.set_zlabel("PC3", fontsize=10)
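To decide how many components to keep, it is common to inspect PCA's explained_variance_ratio_, which reports the fraction of total variance each retained component captures. A minimal sketch on illustrative synthetic data (the correlated columns are there only so the first components dominate):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# illustrative data: 100 samples, 6 features, first four strongly correlated
rng = np.random.RandomState(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base,
               base + rng.normal(scale=0.1, size=(100, 2)),
               rng.normal(size=(100, 2))])

Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(Xs)

# fraction of total variance captured by each component, in decreasing order
print(pca.explained_variance_ratio_)
```

The ratios always come out sorted in decreasing order and sum to at most 1; a common rule of thumb is to keep enough components to cover most of the cumulative variance.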

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 9

Experiment Title – Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.

Concept Theory – A Bayesian network (BN) is a probabilistic graphical model for representing knowledge about an
uncertain domain: each node corresponds to a random variable and each edge represents a direct conditional
dependence between the connected variables [9]. BNs are also called belief networks or Bayes nets. Because the
dependencies and conditional probabilities must be consistent, a BN is a directed acyclic graph (DAG) in which
no cycles or self-connections are allowed, and the joint distribution factorizes as the product of each node's
probability conditioned on its parents.
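The DAG factorization can be made concrete with a tiny hand-built chain A → B → C. The probability values below are illustrative, not taken from the Heart Disease data:

```python
# P(A), P(B|A), P(C|B) for binary variables, as plain dicts (illustrative numbers)
pA = {True: 0.3, False: 0.7}
pB_given_A = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
pC_given_B = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}

# Joint distribution factorizes over the DAG: P(A,B,C) = P(A) * P(B|A) * P(C|B)
def joint(a, b, c):
    return pA[a] * pB_given_A[a][b] * pC_given_B[b][c]

# Marginal P(C=True) by summing the joint over all configurations of A and B
pC_true = sum(joint(a, b, True) for a in (True, False) for b in (True, False))
print(round(pC_true, 4))  # 0.531
```

This summation over parent configurations is exactly the inference a BN library automates once the network structure and CPTs are declared.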

Solution (Program/Code/Procedure/Query) –

from pomegranate import *

asia = DiscreteDistribution({'True': 0.5, 'False': 0.5})

tuberculosis = ConditionalProbabilityTable(
    [['True', 'True', 0.2],
     ['True', 'False', 0.8],
     ['False', 'True', 0.02],
     ['False', 'False', 0.98]], [asia])  # each CPT row must sum to 1

smoking = DiscreteDistribution({'True': 0.5, 'False': 0.5})

lung = ConditionalProbabilityTable(
    [['True', 'True', 0.75],
     ['True', 'False', 0.25],
     ['False', 'True', 0.02],
     ['False', 'False', 0.98]], [smoking])

bronchitis = ConditionalProbabilityTable(
    [['True', 'True', 0.92],
     ['True', 'False', 0.08],
     ['False', 'True', 0.03],
     ['False', 'False', 0.97]], [smoking])

# deterministic OR of its two parents (tuberculosis, lung)
tuberculosis_or_cancer = ConditionalProbabilityTable(
    [['True', 'True', 'True', 1.0],
     ['True', 'True', 'False', 0.0],
     ['True', 'False', 'True', 1.0],
     ['True', 'False', 'False', 0.0],
     ['False', 'True', 'True', 1.0],
     ['False', 'True', 'False', 0.0],
     ['False', 'False', 'True', 0.0],
     ['False', 'False', 'False', 1.0]], [tuberculosis, lung])

xray = ConditionalProbabilityTable(
    [['True', 'True', 0.885],
     ['True', 'False', 0.115],
     ['False', 'True', 0.04],
     ['False', 'False', 0.96]], [tuberculosis_or_cancer])

dyspnea = ConditionalProbabilityTable(
    [['True', 'True', 'True', 0.96],
     ['True', 'True', 'False', 0.04],
     ['True', 'False', 'True', 0.89],
     ['True', 'False', 'False', 0.11],
     ['False', 'True', 'True', 0.96],
     ['False', 'True', 'False', 0.04],
     ['False', 'False', 'True', 0.89],
     ['False', 'False', 'False', 0.11]],
    [tuberculosis_or_cancer, bronchitis])

s0 = State(asia, name="asia")
s1 = State(tuberculosis, name="tuberculosis")
s2 = State(smoking, name="smoker")

network = BayesianNetwork("asia")
network.add_states(s0, s1, s2)
network.add_edge(s0, s1)
network.add_edge(s1, s2)
network.bake()

print(network.predict_proba({'tuberculosis': 'True'}))
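What predict_proba computes is a posterior update; for a single edge it reduces to Bayes' rule, which can be checked by hand in pure Python (the prior and CPT values below are illustrative):

```python
# prior P(asia) and P(tuberculosis=True | asia), illustrative values
p_asia = {'True': 0.5, 'False': 0.5}
p_tub_given_asia = {'True': 0.2, 'False': 0.02}

# Bayes' rule: P(asia | tub=True) is proportional to P(asia) * P(tub=True | asia)
unnorm = {a: p_asia[a] * p_tub_given_asia[a] for a in p_asia}
z = sum(unnorm.values())                      # normalizing constant P(tub=True)
posterior = {a: v / z for a, v in unnorm.items()}
print(posterior)
```

With these numbers the evidence tuberculosis=True pushes P(asia=True) from the 0.5 prior up to about 0.91, because tuberculosis is ten times more likely given a visit to Asia.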

Subject Name – Machine Learning Subject Code – IT802(A)

Experiment No. – 10

Experiment Title – Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

Concept Theory –

 Naïve Bayes is a supervised learning algorithm, based on Bayes' theorem and used for
solving classification problems.
 It is mainly used in text classification, which involves high-dimensional training datasets.
 The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in
building fast machine learning models that can make quick predictions.
 It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
 Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and
classifying articles.
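The Bayes' theorem underlying the classifier can be checked with a tiny hand computation, P(spam | word) = P(word | spam) P(spam) / P(word). The spam-filter numbers below are illustrative:

```python
# illustrative numbers: 40% of mail is spam; the word "offer" appears in 50%
# of spam messages and in 5% of non-spam messages
p_spam = 0.4
p_word_given_spam = 0.5
p_word_given_ham = 0.05

# total probability of seeing the word (law of total probability)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: posterior probability the message is spam given the word
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 4))  # 0.8696
```

The naïve Bayes classifier applies exactly this update, with the "naïve" assumption that the features contribute independent likelihood factors.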

Solution (Program/Code/Procedure/Query) –

import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate random indices into the dataset to pick elements for training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    # creates a dictionary keyed by class (e.g. 1 and 0) whose values are the
    # instances belonging to each class
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]  # drop the class column
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of (mean, stdev) tuples for each class value
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    # summaries holds each class's attribute information as mean and stdev
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            # mean and stdev of every attribute, for class 0 and 1 separately
            mean, stdev = classSummaries[i]
            x = inputVector[i]  # test vector's i-th attribute
            # multiply in the normal-distribution likelihood of this attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    # assign the class which has the highest probability
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(
        len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
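As a sanity check of the Gaussian likelihood at the heart of calculateProbability, the same formula cleanly separates two well-spread 1-D classes. The toy numbers below are illustrative and independent of the CSV file used above:

```python
import math

def gaussian(x, mean, stdev):
    # same Gaussian likelihood formula used by calculateProbability above
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

# toy 1-D training data: class 0 clustered near 1.0, class 1 near 5.0
train = {0: [0.9, 1.0, 1.2], 1: [4.8, 5.0, 5.3]}

def mean(nums):
    return sum(nums) / len(nums)

def stdev(nums):
    avg = mean(nums)
    return math.sqrt(sum((x - avg) ** 2 for x in nums) / (len(nums) - 1))

# per-class (mean, stdev) summaries, mirroring summarizeByClass
summaries = {c: (mean(v), stdev(v)) for c, v in train.items()}

def predict(x):
    # pick the class whose Gaussian gives the higher likelihood
    return max(summaries, key=lambda c: gaussian(x, *summaries[c]))

print(predict(1.1), predict(4.9))  # 0 1
```

A point near 1.0 gets class 0 and a point near 5.0 gets class 1, confirming the likelihood comparison behaves as the full program expects.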

Output/Result –

confusion matrix is as follows
[[17  0  0]
 [ 0 17  0]
 [ 0  0 11]]

Accuracy metrics
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        17
           2       1.00      1.00      1.00        11
 avg / total       1.00      1.00      1.00        45
