ML_LAB
Aim: Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV
file.
Software Required: Python IDE or Google Colab
Theory:
FIND-S Algorithm:
The FIND-S algorithm is a machine learning algorithm used for finding the most specific
hypothesis in the hypothesis space based on a given set of training data. The algorithm belongs
to the category of concept learning algorithms, particularly in the context of learning from
examples.
Hypothesis Representation:
In the FIND-S algorithm, a hypothesis is represented as a conjunction of literals, where each
literal corresponds to an attribute-value pair. The algorithm starts with the most specific
hypothesis, which is initialized to the first positive example in the training data. The
hypothesis is then refined as it iterates over the positive examples, adjusting attribute values to
cover all positive instances.
Algorithm:
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
       For each attribute constraint ai in h:
           If the constraint ai is satisfied by x,
               then do nothing;
           else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Training Example:
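A worked trace on the EnjoySport examples (the same four instances used in the Candidate Elimination program later in this manual), shown here for illustration:

h = ('0', '0', '0', '0', '0', '0')  (initial, most specific)
('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), positive  ->  h = ('sunny', 'warm', 'normal', 'strong', 'warm', 'same')
('sunny', 'warm', 'high', 'strong', 'warm', 'same'), positive    ->  h = ('sunny', 'warm', '?', 'strong', 'warm', 'same')
('rainy', 'cold', 'high', 'strong', 'warm', 'change'), negative  ->  ignored by FIND-S
('sunny', 'warm', 'high', 'strong', 'cool', 'change'), positive  ->  h = ('sunny', 'warm', '?', 'strong', '?', '?')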
Program:
import csv

# Read the training data from a .CSV file (hypothetical file name)
with open('training_data.csv') as csvfile:
    your_list = list(csv.reader(csvfile))

num_attributes = len(your_list[0]) - 1
h = ['0'] * num_attributes  # start from the most specific hypothesis

for i in your_list:
    print(i)
    if i[-1] == "True":  # FIND-S uses only the positive examples
        for j in range(num_attributes):
            if h[j] == '0':     # first positive example: copy the attribute value
                h[j] = i[j]
            elif h[j] != i[j]:  # mismatch: generalise this attribute to '?'
                h[j] = '?'

print("The most specific hypothesis is:", h)
Data Set:
Output:
Conclusion:
Candidate Elimination Algorithm:
Initialization:
Set S to the most specific hypothesis in the hypothesis space (e.g., a hypothesis that covers no examples), and set G to the most general hypothesis (e.g., a hypothesis that covers all possible examples).
Process Training Examples:
For each training example (x, y), where x is the input and y is the label:
If y is positive, generalize S minimally so that it covers x, and remove from G any hypothesis inconsistent with the example.
If y is negative, specialize G minimally so that it excludes x, and remove from S any hypothesis inconsistent with the example.
Output Hypotheses:
The final output consists of the hypothesis space represented by the sets S and G.
S represents the most specific hypothesis consistent with the observed positive examples.
G represents the most general hypothesis consistent with the observed negative examples.
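As an illustration with the EnjoySport data used in the program below, the boundaries converge to (consistent with the output printed at the end of this section):

S = [('sunny', 'warm', '?', 'strong', '?', '?')]
G = [('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]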
Algorithm:
Program:
class Holder:
    factors = {}     # maps each attribute name to its possible values
    attributes = ()  # ordered tuple of attribute names

class CandidateElimination:
    Positive = {}    # positive training examples
    Negative = {}    # negative training examples
    def run_algorithm(self):
        # Initialize the specific and general boundaries
        G = self.initializeG()
        S = self.initializeS()

    def initializeS(self):
        """Initialize the specific boundary."""
        S = tuple(['-' for _ in range(self.num_factors)])  # one constraint per attribute
        return [S]

    def initializeG(self):
        """Initialize the general boundary."""
        G = tuple(['?' for _ in range(self.num_factors)])  # one constraint per attribute
        return [G]
# Example usage: f is a Holder describing the attribute domains (defined elsewhere in the source)
dataset = [(('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), 'Y'),
           (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), 'Y'),
           (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), 'N'),
           (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), 'Y')]
a = CandidateElimination(dataset, f)
a.run_algorithm()
Data Set:
Output:
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
Conclusion:
Aim: Demonstrate the working of the decision tree based ID3 algorithm using an appropriate data set.
Program:
import numpy as np
import math
from data_loader import read_data
class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return str(self.attribute)
def subtables(data, col, delete):
    """Split `data` into sub-tables, one per distinct value of attribute `col`."""
    dictionary = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dictionary[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dictionary[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dictionary[items[x]] = np.delete(dictionary[items[x]], col, 1)
    return items, dictionary
def entropy(S):
    items = np.unique(S)
    if items.size == 1:  # a pure subset has zero entropy
        return 0
    sums = 0
    for item in items:
        ratio = sum(S == item) / (S.size * 1.0)
        sums += -ratio * math.log(ratio, 2)
    return sums
def gain_ratio(data, col):
    """Information gain of attribute `col`, divided by its intrinsic value."""
    items, dictionary = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dictionary[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dictionary[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv
def create_node(data, metadata):
    if (np.unique(data[:, -1])).shape[0] == 1:  # leaf: all labels identical
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dictionary = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dictionary[items[x]], metadata)
        node.children.append((items[x], child))
    return node
def empty(size):
    s = ""
    for x in range(size):
        s += " "
    return s
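The fragments above never invoke create_node; a minimal driver sketch, assuming read_data returns (metadata, traindata) parsed from the CSV shown below (the file name is illustrative):

def print_tree(node, level):
    # Indentation grows with depth; leaves print their answer
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, child in node.children:
        print(empty(level + 1), value)
        print_tree(child, level + 2)

metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)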
Data Set:
outlook,temperature,humidity,wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no
Output:
outlook
   overcast
      yes
   rain
      wind
         strong
            no
         weak
            yes
   sunny
      humidity
         high
            no
         normal
            yes
Conclusion:
Aim: To solve real-world problems using machine learning methods: Linear Regression and Logistic Regression.
Software Required: Python IDE or Google Colab
Theory:
a) Linear Regression:
Exercise: Predicting House Prices
Theory:
Linear Regression models the relationship between features (e.g., square footage, bedrooms) and
continuous target variable (house prices).
Coefficients (θ) are learned to minimize the difference between predicted and actual prices.
Gradient Descent optimizes coefficients.
Evaluation metrics: Mean Squared Error, R-squared.
b) Logistic Regression:
Exercise: Customer Churn Prediction
Theory:
Logistic Regression for binary classification (churn or no churn).
Models probability of churn using the logistic function.
Log Loss cost function measures prediction accuracy.
Gradient Descent adjusts parameters.
Evaluation metrics: Accuracy, precision, recall, F1-score.
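For reference, the logistic function and the Log Loss cost mentioned above (standard definitions, stated here for completeness):

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \right]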
Program:
a) Linear Regression:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the internal/external marks data (hypothetical file name)
data = pd.read_csv('marks.csv')

# Feature selection
x_set = data[['internal']]
print('First 5 rows of the feature set are:')
print(x_set.head())

# Target variable
y_set = data[['external']]
print('First 5 rows of the target set are:')
print(y_set.head())

# Train the model and evaluate it on held-out data
x_train, x_test, y_train, y_test = train_test_split(x_set, y_set, test_size=0.3)
model = linear_model.LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print('Mean squared error:', mean_squared_error(y_test, y_pred))

# Plot the fitted regression line over the test points
plt.scatter(x_test, y_test)
plt.plot(x_test, y_pred, color='red')
plt.show()
Output:
b) Logistic Regression:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, recall_score, precision_score
from sklearn.preprocessing import StandardScaler

data = pd.read_csv(r"E:\sudhakar\heart.csv")
dim = data.shape
class_labels = data['target'].unique().astype(str)
print('Class labels are:')
print(class_labels)
sns.countplot(data['target'])
col_names = data.columns
feature_names = col_names[:-1].tolist()
print('Feature names are:')
print(feature_names)
x_set = data[feature_names]  # feature matrix used for training below
y_set = data[['target']]
print('First 5 rows of target set are:')
print(y_set.head())
scaler = StandardScaler()
x_train, x_test, y_train, y_test = train_test_split(x_set, y_set, test_size=0.3)
scaler.fit(x_train)
x_train = scaler.transform(x_train)
model = LogisticRegression()
model.fit(x_train, y_train)
x_test = scaler.transform(x_test)
y_pred = model.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index=class_labels)  # as in the SVM experiment below
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True, cmap="Blues", fmt='d')
plt.show()
Output:
Conclusion:
Bias, Variance, and Cross-Validation are important concepts in machine learning for
evaluating and improving the performance of models. Let's first discuss the theory behind Bias
and Variance, and then we'll explore the concept of Cross-Validation.
Bias and Variance:
Bias:
Bias refers to the error introduced by approximating a real-world problem, which may be
complex, by a simplified model. High bias can lead to underfitting, where the model is too
simple to capture the underlying patterns in the data.
Variance:
Variance is the amount by which a model's predictions would change if it were trained on a
different dataset. High variance can lead to overfitting, where the model is too complex and
captures noise in the training data.
Bias-Variance Tradeoff:
The goal in machine learning is to find a balance between bias and variance. Increasing the
complexity of a model typically reduces bias but increases variance, and vice versa. The
challenge is to find the optimal level of complexity that minimizes both bias and variance.
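This tradeoff is often summarized by the standard decomposition of the expected squared error at a point x (a textbook result, stated here for reference):

\mathbb{E}\left[(y - \hat{f}(x))^2\right] = \mathrm{Bias}\left[\hat{f}(x)\right]^2 + \mathrm{Var}\left[\hat{f}(x)\right] + \sigma^2

where \sigma^2 is the irreducible noise in the data.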
Cross-Validation:
Cross-validation is a technique used to assess how well a model will generalize to an
independent dataset. The basic idea is to split the dataset into multiple subsets, train the model
on some subsets, and test it on the remaining subsets. This process is repeated multiple times,
and the average performance is used to evaluate the model.
K-Fold Cross-Validation:
One common method is K-Fold Cross-Validation, where the dataset is divided into K subsets
(folds). The model is trained K times, each time using K-1 folds for training and the remaining
fold for testing. This process ensures that the model is evaluated on different subsets of the
data, providing a more robust assessment of its performance.
Program:
import pandas as pd
from statistics import mean, stdev
from sklearn.model_selection import cross_val_score
from sklearn import linear_model

# Load the data set (hypothetical file name)
data = pd.read_csv('marks.csv')

col_names = list(data.columns)
print('Attribute names are:')
print(col_names)
feature_names = col_names[:-1]
print('Feature names are:', feature_names)
x_set = data[feature_names]
y_set = data[col_names[-1]]

model = linear_model.LinearRegression()
scores = cross_val_score(model, x_set, y_set, cv=10)
print('10-fold cross-validation scores:', scores)

# Track the mean and spread of the scores as k varies
# (proxies for bias and variance); the k values are illustrative
bias, variance = [], []
k_list = [2, 4, 6, 8, 10]
for k in k_list:
    model = linear_model.LinearRegression()
    scores = cross_val_score(model, x_set, y_set, cv=k)
    bias.append(mean(scores))
    variance.append(stdev(scores))
print('Mean scores per k:', bias)
print('Score spread per k:', variance)
Output:
Conclusion:
This is a basic example, and in real-world scenarios, you might need to handle more complex datasets
and encoding requirements.
Program:
import pandas as pd

# Sample data
data = {
    'Category': ['A', 'B', 'A', 'C', 'B', 'C', 'A', 'B', 'C']
}

# Create a DataFrame
df = pd.DataFrame(data)
print('Original DataFrame:')
print(df)

# Categorical Encoding: map each category to an integer code
df['Category_Coded'] = df['Category'].astype('category').cat.codes

# One-Hot Encoding: one binary column per category
df_one_hot = pd.get_dummies(df['Category'], prefix='Category')
print(df.join(df_one_hot))
Output:
Original DataFrame:
Category
0 A
1 B
2 A
3 C
4 B
5 C
6 A
7 B
8 C
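For reference, the two encodings produce the following columns (computed from the sample data above; exact dtypes may vary by pandas version):

   Category  Category_Coded  Category_A  Category_B  Category_C
0         A               0           1           0           0
1         B               1           0           1           0
2         A               0           1           0           0
3         C               2           0           0           1
4         B               1           0           1           0
5         C               2           0           0           1
6         A               0           1           0           0
7         B               1           0           1           0
8         C               2           0           0           1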
Conclusion:
Forward Pass: Propagate input data through the network to compute outputs.
Calculate Error: Compare predicted outputs with actual targets using a loss function.
Backward Pass: Compute gradients of the loss function with respect to weights and biases.
Gradient Descent: Update weights and biases to minimize the error.
Repeat: Iterate through steps 1-4 for multiple epochs.
To test the ANN, use datasets like MNIST, CIFAR-10, or IMDB Movie Reviews, splitting data
into training and testing sets for evaluation. This process trains the network efficiently using
Backpropagation and validates its performance on real-world data.
Program:
import numpy as np

# Training data: two input features per sample, normalized to [0, 1]
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)
y = y / 100

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000  # Setting training iterations
lr = 0.1      # Setting learning rate
wh = np.random.uniform(size=(2, 3))    # input-to-hidden weights
bh = np.random.uniform(size=(1, 3))
wout = np.random.uniform(size=(3, 1))  # hidden-to-output weights
bout = np.random.uniform(size=(1, 1))

for i in range(epoch):
    # Forward pass
    hlayer_act = sigmoid(np.dot(X, wh) + bh)
    output = sigmoid(np.dot(hlayer_act, wout) + bout)
    # Backpropagation
    EO = y - output
    d_output = EO * derivatives_sigmoid(output)
    EH = d_output.dot(wout.T)
    d_hiddenlayer = EH * derivatives_sigmoid(hlayer_act)
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input:\n", X)
print("Actual Output:\n", y)
print("Predicted Output:\n", output)
Input:
[[0.66666667 1.        ]
 [0.33333333 0.55555556]
 [1.         0.66666667]]
Actual Output:
[[0.92]
 [0.86]
 [0.89]]
Predicted Output:
[[0.89559591]
 [0.88142869]
 [0.8928407 ]]
Conclusion:
Program:
import csv
import random
import math
import operator

def loadDataset(filename, split, trainingSet=[], testSet=[]):
    # Load the CSV and randomly divide it into train and test sets
    with open(filename) as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
    for x in range(len(dataset) - 1):
        for y in range(4):
            dataset[x][y] = float(dataset[x][y])
        if random.random() < split:
            trainingSet.append(dataset[x])
        else:
            testSet.append(dataset[x])

def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow(instance1[x] - instance2[x], 2)
    return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):
    # Return the k training instances closest to testInstance
    distances = []
    length = len(testInstance) - 1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    return [distances[x][0] for x in range(k)]

def getResponse(neighbors):
    # Majority vote over the neighbours' class labels
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]

def main():
    # prepare data (hypothetical file name for a 4-feature data set such as iris)
    trainingSet = []
    testSet = []
    split = 0.67
    loadDataset('iris.data', split, trainingSet, testSet)
    # generate predictions
    predictions = []
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))

main()
Output:
Conclusion:
Locally Weighted Regression is a non-parametric learning algorithm that fits data points by
giving more weight to nearby points during prediction. Here are key points about this
algorithm:
Non-Parametric Algorithm: Locally Weighted Regression does not assume a fixed
functional form for the relationship between inputs and outputs.
Working Principle: It predicts the output for a new data point by giving more weight to
nearby training points and less weight to distant ones.
No Training Phase: Unlike parametric models that require a training phase to learn
parameters, Locally Weighted Regression retains the training data during prediction.
Weight Calculation: The weight assigned to each training point is determined by a kernel
function that decreases with distance from the test point.
Tau Parameter: The bandwidth parameter (tau) controls the width of the kernel and
influences how much each training point contributes to the prediction.
By implementing Locally Weighted Regression and fitting data points using this algorithm,
you can observe how it adapts to local patterns in the data and provides flexible predictions
based on neighboring points.
Program:
import numpy as np

m = np.shape(mbill)[1]           # number of samples
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))  # prepend a bias column of ones

# Set k (the bandwidth tau) here
ypred = localWeightRegression(X, mtip, 2)
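The fragment above assumes row matrices mbill and mtip (the bill and tip columns of the data) and a localWeightRegression helper that the source does not show. A minimal sketch of the missing functions, assuming the standard Gaussian-kernel formulation:

import numpy as np

def kernel(point, xmat, tau):
    # Gaussian weights: points near the query get weights close to 1
    m = np.shape(xmat)[0]
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * tau ** 2))
    return weights

def localWeight(point, xmat, ymat, tau):
    # Weighted least-squares fit centred on the query point
    wei = kernel(point, xmat, tau)
    return (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))

def localWeightRegression(xmat, ymat, tau):
    m = np.shape(xmat)[0]
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = float(xmat[i] * localWeight(xmat[i], xmat, ymat, tau))
    return ypred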
Output:
Conclusion:
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Load labelled sentences (hypothetical file name): text, then a pos/neg label
msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
xtrain, xtest, ytrain, ytest = train_test_split(msg.message, msg.labelnum)

# Bag-of-words document-term matrices
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print(count_vect.get_feature_names())

df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
print(df)
# Tabular (sparse) representation
print(xtrain_dtm)

# Train the classifier and score the held-out documents
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
print('Accuracy:', metrics.accuracy_score(ytest, predicted))

# Example predictions
docs_new = ['I love this sandwich', 'This is an amazing place']
X_new_counts = count_vect.transform(docs_new)
predicted_new = clf.predict(X_new_counts)
print(predicted_new)
Output:
['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal',
'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my',
'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to',
'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']

    about  am  amazing  an  and  awesome  beers  best  boss  can  ...  today
0       1   0        0   0    0        0      1     0     0    0  ...      0
1       0   0        0   0    0        0      0     1     0    0  ...      0
2       0   0        1   1    0        0      0     0     0    0  ...      0
3       0   0        0   0    0        0      0     0     0    0  ...      1
4       0   0        0   0    0        0      0     0     0    0  ...      0
5       0   1        0   0    1        0      0     0     0    0  ...      0
6       0   0        0   0    0        0      0     0     0    1  ...      0
7       0   0        0   0    0        0      0     0     0    0  ...      0
8       0   1        0   0    0        0      0     0     0    0  ...      0
9       0   0        0   1    0        1      0     0     0    0  ...      0
10      0   0        0   0    0        0      0     0     0    0  ...      0
11      0   0        0   0    0        0      0     0     1    0  ...      0
12      0   0        0   1    0        1      0     0     0    0  ...      0
Conclusion:
The Expectation-Maximization (EM) algorithm and k-Means algorithm are commonly used for
clustering data. EM is a probabilistic algorithm that iteratively estimates parameters with latent
variables, often applied with Gaussian Mixture Models for probabilistic clustering. On the other hand,
k-Means is a centroid-based algorithm that partitions data into k clusters by minimizing the sum of
squared distances to cluster centroids.
When comparing the results of EM and k-Means clustering on the Heart Disease Data Set, it's
essential to consider several factors:
Algorithm Differences:
EM Algorithm: Capable of capturing complex cluster shapes through probabilistic modeling but
sensitive to initialization.
k-Means Algorithm: Efficient and straightforward, but may struggle with non-linear cluster
boundaries and requires careful initialization.
Cluster Quality Evaluation:
Metrics like Davies-Bouldin Index, Rand Index, Purity, Sensitivity, Precision, F1 Score, and
Accuracy can be used to evaluate clustering quality.
These metrics help assess how well the clusters represent underlying patterns in the data and how
distinct they are from each other.
Comment on Clustering Quality:
Compare the results of EM and k-Means based on cluster shape, initialization sensitivity, and overall
clustering performance.
Consider factors like cluster compactness, separation, and how well they align with the underlying
structure of the data.
By analyzing the clustering results using both algorithms on the Heart Disease Data Set and
evaluating them using appropriate metrics, you can provide insights into the strengths and limitations
of each algorithm in terms of clustering quality and performance.
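A minimal sketch of scoring both clusterings with scikit-learn metrics (illustrative; assumes a feature matrix X and reference labels y from the data set):

from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import davies_bouldin_score, adjusted_rand_score

km_labels = KMeans(n_clusters=2).fit_predict(X)
em_labels = GaussianMixture(n_components=2).fit(X).predict(X)
print('Davies-Bouldin (k-Means):', davies_bouldin_score(X, km_labels))
print('Davies-Bouldin (EM):', davies_bouldin_score(X, em_labels))
print('Adjusted Rand (k-Means):', adjusted_rand_score(y, km_labels))
print('Adjusted Rand (EM):', adjusted_rand_score(y, em_labels))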
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from matplotlib.patches import Ellipse

# Generate sample data with four clusters
X, y_true = make_blobs(n_samples=400, centers=4, cluster_std=0.60, random_state=0)

gmm = GaussianMixture(n_components=4).fit(X)
labels = gmm.predict(X)
probs = gmm.predict_proba(X)
print(probs[:5].round(3))

# Marker size reflects how confident the model is about each point
size = 50 * probs.max(1) ** 2
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=size)
plt.show()
Output:
[[1 ,0, 0, 0]
[0 ,0, 1, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]
[1,0,0,]]
K-Means:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("kmeansdata.csv")
df1 = pd.DataFrame(data)
print(df1)
f1 = df1['Distance_Feature'].values
f2 = df1['Speeding_Feature'].values
X = np.matrix(list(zip(f1, f2)))
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.ylabel('Speeding_Feature')
plt.xlabel('Distance_Feature')
plt.scatter(f1, f2)
plt.show()
# KMeans algorithm
#K=3
kmeans_model = KMeans(n_clusters=3).fit(X)

plt.plot()
colors = ['b', 'g', 'r']   # one colour per cluster
markers = ['o', 'v', 's']  # one marker per cluster
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(f1[i], f2[i], color=colors[l], marker=markers[l], ls='None')
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.show()
Data Set:
Driver_ID,Distance_Feature,Speeding_Feature
3423311935,71.24,28
3423313212,52.53,25
3423313724,64.54,27
3423311373,55.69,22
3423310999,54.58,25
3423313857,41.91,10
3423312432,58.64,20
3423311434,52.02,8
3423311328,31.25,34
3423312488,44.31,19
3423311254,49.35,40
3423312943,58.07,45
3423312536,44.22,22
3423311542,55.73,19
3423312176,46.63,43
3423314176,52.97,32
3423314202,46.25,35
3423311346,51.55,27
3423310666,57.05,26
3423313527,58.45,30
3423312182,43.42,23
3423313590,55.68,37
3423312268,55.15,18
Conclusion:
Aim: Exploratory data analysis for classification using pandas and Matplotlib.
Program:
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('scores.csv')  # hypothetical file name
print(data['score'].value_counts())
data['score'].value_counts().plot(kind='bar')  # class distribution
plt.show()
Output:
Conclusion:
Aim: Demonstrate the diagnosis of heart patients using standard Heart Disease Data
Set.
Software Required: Python IDE or Google Colab
Theory:
Bayesian Network:
A Bayesian network is a probabilistic graphical model that represents probabilistic relationships
between variables.
It consists of nodes representing variables and directed edges representing dependencies.
Medical Data and Diagnosis:
In medical diagnosis, Bayesian networks can model relationships between symptoms, test results, and
diseases.
Variables like age, sex, cholesterol levels, and ECG results can be crucial in diagnosing heart disease.
Heart Disease Data Set:
The Heart Disease Data Set contains attributes like age, sex, cholesterol levels, and ECG results for
patients.
By constructing a Bayesian network using this data, we can model dependencies between these
variables and heart disease.
Program Implementation:
Load the Heart Disease Data Set and define the structure of the Bayesian network.
Use Maximum Likelihood Estimation to estimate parameters based on the data.
Perform inference with the model to predict heart disease diagnosis based on observed data.
By constructing a Bayesian network with medical data like the Heart Disease Data Set, we can create
a model that aids in diagnosing heart patients by capturing dependencies between variables and
predicting outcomes based on observed information.
Program:
import bayespy as bp
import numpy as np
import csv
from colorama import init, Fore, Back, Style
init()

# Integer encodings for each attribute
ageEnum = {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}
genderEnum = {'Male': 0, 'Female': 1}
familyHistoryEnum = {'Yes': 0, 'No': 1}
dietEnum = {'High': 0, 'Medium': 1, 'Low': 2}
lifeStyleEnum = {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedetary': 3}
cholesterolEnum = {'High': 0, 'BorderLine': 1, 'Normal': 2}
heartDiseaseEnum = {'Yes': 0, 'No': 1}

# Training data for machine learning, imported from csv (hypothetical file name)
with open('heart_disease_data.csv') as csvfile:
    dataset = list(csv.reader(csvfile))
data = []
for x in dataset:
    data.append([ageEnum[x[0]], genderEnum[x[1]], familyHistoryEnum[x[2]], dietEnum[x[3]],
                 lifeStyleEnum[x[4]], cholesterolEnum[x[5]], heartDiseaseEnum[x[6]]])
data = np.array(data)
N = len(data)

# Observed parent variables, each with a Dirichlet prior
p_age = bp.nodes.Dirichlet(np.ones(5))
age = bp.nodes.Categorical(p_age, plates=(N,)); age.observe(data[:, 0])
p_gender = bp.nodes.Dirichlet(np.ones(2))
gender = bp.nodes.Categorical(p_gender, plates=(N,)); gender.observe(data[:, 1])
p_familyhistory = bp.nodes.Dirichlet(np.ones(2))
familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,)); familyhistory.observe(data[:, 2])
p_diet = bp.nodes.Dirichlet(np.ones(3))
diet = bp.nodes.Categorical(p_diet, plates=(N,)); diet.observe(data[:, 3])
p_lifestyle = bp.nodes.Dirichlet(np.ones(4))
lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,)); lifestyle.observe(data[:, 4])
p_cholesterol = bp.nodes.Dirichlet(np.ones(3))
cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,)); cholesterol.observe(data[:, 5])

# Heart disease depends on all six parents
p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))
heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
                                     bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:, 6])
p_heartdisease.update()

# Interactive Test
m = 0
while m == 0:
    print("\n")
    res = bp.nodes.MultiMixture([int(input('Enter Age: ' + str(ageEnum))),
                                 int(input('Enter Gender: ' + str(genderEnum))),
                                 int(input('Enter FamilyHistory: ' + str(familyHistoryEnum))),
                                 int(input('Enter Diet: ' + str(dietEnum))),
                                 int(input('Enter LifeStyle: ' + str(lifeStyleEnum))),
                                 int(input('Enter Cholesterol: ' + str(cholesterolEnum)))],
                                bp.nodes.Categorical, p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
    print("Probability(HeartDisease) = " + str(res))
    m = int(input("Enter for Continue: 0, Exit: 1 "))
Output:
Conclusion:
Theory:
Support Vector Machines:
SVM is a supervised learning algorithm used for classification and regression tasks.
It finds the optimal hyperplane that best separates classes in a high-dimensional space.
Kernel Trick:
SVM can handle non-linearly separable data by mapping it to a higher-dimensional space using kernel
functions like linear, polynomial, or radial basis function (RBF).
Margin:
SVM aims to maximize the margin between the decision boundary and the support vectors, which are
data points closest to the boundary.
Regularization:
SVM uses regularization parameters to control the trade-off between maximizing margin and
minimizing classification errors.
Kernel Selection:
Choosing an appropriate kernel is crucial in SVM to handle different types of data distributions
effectively.
Prediction:
Once trained, SVM can predict the class of new data points based on their position relative to the
decision boundary.
By implementing SVM in Python using libraries like scikit-learn, you can leverage its robustness in
handling complex classification tasks by finding optimal decision boundaries in high-dimensional
spaces.
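The program below trains SVC() with its default RBF kernel. A minimal sketch of comparing kernels on the same scaled splits (x_train, x_test, y_train, y_test as produced in the program; the kernel names are scikit-learn's standard options):

from sklearn.svm import SVC
for kernel in ('linear', 'poly', 'rbf'):
    clf = SVC(kernel=kernel)
    clf.fit(x_train, y_train)
    print(kernel, 'accuracy:', clf.score(x_test, y_test))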
Program:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, recall_score, precision_score
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

# Load the survival data set (hypothetical file name)
data = pd.read_csv('haberman.csv')

dim = data.shape
print('Dimensions of the data set are', dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
class_lbls = data['survival_status'].unique()
class_labels = [str(x) for x in class_lbls]
print('Class labels are:')
print(class_labels)
sns.countplot(data['survival_status'])
col_names = data.columns
feature_names = col_names[:-1]
feature_names = list(feature_names)
print('Feature names are:')
print(feature_names)
x_set = data[feature_names]  # feature matrix used for training below
y_set = data['survival_status']
print('First 5 rows of target variable are:')
print(y_set.head())
scaler = StandardScaler()
x_train, x_test, y_train, y_test = train_test_split(x_set, y_set, test_size=0.3)
scaler.fit(x_train)
x_train = scaler.transform(x_train)
model = SVC()
print("Training the model with the train data set")
model.fit(x_train, y_train)
x_test = scaler.transform(x_test)
y_pred = model.predict(x_test)
print('Predicted class labels for test data are:')
print(y_pred)
cm = confusion_matrix(y_test, y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index=class_labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True, cmap="Blues", fmt='d')
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
plt.show()
Output:
Conclusion:
Projection:
Project the original data onto the selected principal components to obtain the reduced-dimensional representation.
Program:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Load the data set (hypothetical file name)
DS = pd.read_csv('heart.csv')

# Separate the dataset into features "X" and the target variable "Y"
X = DS.iloc[:, 0:13].values
Y = DS.iloc[:, 13].values
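The fragment stops after separating features and target. A minimal sketch of the remaining steps, assuming a train/test split, standardization, and a two-component projection (the split ratio and component count are illustrative):

# Split, standardize, and project onto the first two principal components
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
print('Explained variance ratio:', pca.explained_variance_ratio_)

# Fit a classifier on the reduced representation
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, Y_train)
print('Test accuracy:', classifier.score(X_test, Y_test))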
Output:
LogisticRegression(random_state=0)
Conclusion: