AIML Report.



Diabetes prediction
Artificial Intelligence and Machine Learning

Date of Submission: 20/01/2024

Marks Obtained: /10

Submitted By:

Shreya - 4PM21CS085
Suma T K - 4PM21CS096

Signature of the Students

Diabetes Prediction Using Machine Learning Algorithms


Diabetes is a chronic disease associated with a high amount of glucose
present in the human body.

There are two main types of diabetes, Type 1 and Type 2; another form is
gestational diabetes, which occurs during pregnancy.

The disease can be controlled if it is caught in its early stages. According
to the International Diabetes Federation (IDF), 382 million people are
living with diabetes.

To help with this, in this project we attempt early prediction of diabetes
in patients with good accuracy, using a KNN classifier model.

Submitted To:

Mr. Sunil Kumar H R.,
Assistant Professor,
Department of CSE, PESITM,
Shivamogga.

Problem Statement:-

Using patient records, we will try to build a machine learning model that
accurately predicts whether or not the patients in the dataset have
diabetes.

Description of Dataset:-

The dataset is a list of records from different patients, each of which is
classified as either diabetic or non-diabetic.

For this coursework we use this data and adopt a KNN algorithm to test
some given patient data and see whether each patient falls into the
diabetic or the non-diabetic category.

The dataset contains 768 records of diabetic and non-diabetic patients,
which we will manipulate, scrub and clean for use in our KNN predictive
model.
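
One cleaning step of the kind mentioned above could be sketched as follows. It assumes, which the report does not state, that a zero in a measurement column such as glucose marks a missing reading, and it replaces such zeros with the median of the observed values. The sample readings and the helper name `fill_zeros_with_median` are hypothetical.

```python
from statistics import median


def fill_zeros_with_median(values):
    """Replace zeros (assumed to mean 'missing') with the median of the non-zero values."""
    observed = [v for v in values if v != 0]
    if not observed:  # nothing to impute from, so return an unchanged copy
        return list(values)
    fill = median(observed)
    return [fill if v == 0 else v for v in values]


# Hypothetical glucose readings; 0 stands for a missing measurement (assumption).
glucose = [148, 85, 0, 89, 137, 0, 116]
cleaned = fill_zeros_with_median(glucose)
print(cleaned)
```

The median is used here rather than the mean only because it is less sensitive to a few extreme readings; either choice is a judgment call.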

The dataset consists of several medical predictor values and one target
variable, Outcome.

• Predictor variables include the number of pregnancies the patient has
  had and their glucose level…
• Blood pressure (mm Hg)
• BMI (weight in kg / (height in m)^2)
• Insulin level (mu U/ml)
• Age (years)

Outcome: class variable (0 or 1)

Implementation of Code:-

import pandas as pd              # data manipulation
import numpy as np               # numerical operations
import matplotlib.pyplot as plt  # basic plotting
import seaborn as sns            # statistical plots such as bar and scatter plots

data = pd.read_csv("/content/diabetes.csv")

x = data.drop(['Outcome'], axis=1)   # predictor columns
y = data['Outcome']                  # target column

0 1
1 0
2 1
3 0
4 1
763 0
764 0
765 0
766 1
767 0
Name: Outcome, Length: 768, dtype: int64

# Scale the data so that each feature's values fall within the range [0, 1]
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
x = scaler.fit_transform(x)

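
The scaling step can be illustrated by hand. Min-max scaling maps each value x in a column to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The sketch below applies that formula to one column; the function name `min_max_scale` and the sample ages are hypothetical, and it is an illustration of the formula rather than scikit-learn's own code.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid dividing by zero, map everything to 0.0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


ages = [21, 33, 50, 81]  # hypothetical ages in years
scaled = min_max_scale(ages)
print(scaled)
```

Scaling matters for KNN because the classifier compares patients by distance: without it, a feature with a large numeric range would dominate features with a small one.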

# Split the dataset into training and testing sets
from sklearn.model_selection import train_test_split

# 30% of the data for testing, and the remaining 70% for training
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3,
                                                random_state=1)
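
The arithmetic behind the 70/30 split can be sketched with the standard library alone. The helper `split_indices` below is hypothetical: it shuffles the row indices with a fixed seed and rounds the test part up, which gives the 231 test rows seen later in this report. Its shuffle order will not match the one scikit-learn produces for `random_state=1`.

```python
import math
import random


def split_indices(n_samples, test_size, seed):
    """Shuffle row indices with a fixed seed and cut them into train and test parts."""
    n_test = math.ceil(n_samples * test_size)  # test part is rounded up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # seeded, so the split is repeatable
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(768, 0.3, seed=1)
print(len(train_idx), len(test_idx))
```

Fixing the seed is what makes the experiment repeatable: the same rows land in the test set on every run.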

# Classification based on the nearest neighbors
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(xtrain, ytrain)

KNeighborsClassifier(n_neighbors=1)

ypred = knn.predict(xtest)


array([1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1,
0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0,
1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0])


285 0
101 0
581 0
352 0
726 0
241 0
599 0
650 0
11 1
214 1
Name: Outcome, Length: 231, dtype: int64
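
What the classifier does at prediction time can be sketched without scikit-learn: measure the distance from the new patient to every training patient, keep the k closest, and take a majority vote over their labels. The sketch uses Euclidean distance, which is also scikit-learn's default metric. The function `knn_predict` and the tiny two-feature dataset are hypothetical, and ties are resolved by whichever label was seen first, which is a simplification.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(row, query), label) for row, label in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical, already scaled (glucose, BMI) pairs with their outcomes.
train_x = [(0.2, 0.3), (0.25, 0.35), (0.8, 0.7), (0.9, 0.8), (0.85, 0.75)]
train_y = [0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (0.3, 0.3), k=1))
print(knn_predict(train_x, train_y, (0.7, 0.7), k=3))
```

With k=1 the prediction is simply the label of the single closest training patient, which makes the model sensitive to noisy records; larger values of k smooth this out.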

# Summarise the scores (positive and negative predictions)
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(ytest, ypred))
print(classification_report(ytest, ypred))

error_rate = []

# Fit a model for each k from 1 to 39 and record its test error rate
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(xtrain, ytrain)
    pred_i = knn.predict(xtest)
    error_rate.append(np.mean(pred_i != ytest))

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue', linestyle='--',
         markersize=10, markerfacecolor='red', marker='o')
plt.title('K versus Error rate')
plt.ylabel('Error rate')

Text(0, 0.5, 'Error rate')
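
Rather than reading the best k off the plot by eye, it can be taken from the `error_rate` list directly. The helper `best_k` below is a hypothetical sketch, and the error values are made up for illustration; they are not the values behind the plot in this report.

```python
def best_k(error_rate, first_k=1):
    """Return the k with the smallest error; ties go to the smallest k."""
    best_index = min(range(len(error_rate)), key=lambda i: error_rate[i])
    return first_k + best_index


# Hypothetical error rates for k = 1..6 (not the values from the report's plot).
error_rate = [0.29, 0.27, 0.25, 0.26, 0.24, 0.24]
print(best_k(error_rate))
```

Because the error here is measured on the test set, the chosen k is likely to look somewhat better than it would on unseen patients.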

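# The lowest error rate appears to be at around k = 11; the model below uses k = 13
knn = KNeighborsClassifier(n_neighbors=13)
knn.fit(xtrain, ytrain)
predictions = knn.predict(xtest)

# Note: as written, ypred still holds the n_neighbors=1 predictions from earlier;
# pass `predictions` instead to score the n_neighbors=13 model.
print(confusion_matrix(ytest, ypred))
print(classification_report(ytest, ypred))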

[[119  27]
 [ 40  45]]
              precision    recall  f1-score   support

           0       0.75      0.82      0.78       146
           1       0.62      0.53      0.57        85

    accuracy                           0.71       231
   macro avg       0.69      0.67      0.68       231
weighted avg       0.70      0.71      0.70       231
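
The figures in the report can be checked by hand from the confusion matrix printed above. In scikit-learn's layout the rows are the actual classes and the columns the predicted ones, so for class 1 (diabetic) the four cells are true negatives, false positives, false negatives and true positives. The variable names below are chosen for illustration.

```python
# Confusion matrix as printed above: rows are actual classes, columns are predicted.
cm = [[119, 27],
      [40, 45]]

tn, fp = cm[0]  # actual 0: correctly rejected, wrongly flagged
fn, tp = cm[1]  # actual 1: missed, correctly flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_1 = tp / (tp + fp)  # of predicted diabetics, how many were diabetic
recall_1 = tp / (tp + fn)     # of actual diabetics, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 2), round(recall_1, 2), round(f1_1, 2))
print(precision_1)
```

The recall for class 1 is the figure to watch in a screening setting: 40 of the 85 diabetic patients in the test set were missed.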

# Check the balance of the data by plotting the count of outcomes by their value
color_wheel = {1: "#0392cf",
               2: "#1020cf"}
colors = data["Outcome"].map(lambda x: color_wheel.get(x + 1))
p = data.hist(figsize=(20, 20))
y_pred = knn.predict(xtest)
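
The histogram call above draws one panel per column, so the class balance itself is easier to see from a direct count. The sketch below counts the labels with the standard library. The label list is rebuilt from the support column of the classification report (146 non-diabetic and 85 diabetic patients in the test set), so it describes the test split only, not all 768 rows.

```python
from collections import Counter

# Test-set labels rebuilt from the support column of the report above:
# 146 non-diabetic (0) and 85 diabetic (1) patients.
labels = [0] * 146 + [1] * 85

counts = Counter(labels)
total = sum(counts.values())
for outcome in sorted(counts):
    share = counts[outcome] / total
    print(f"Outcome {outcome}: {counts[outcome]} rows ({share:.1%})")
```

On the full DataFrame the same count is available from `data["Outcome"].value_counts()`.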

from sklearn import metrics

# Plot the confusion matrix for the predictions stored in y_pred
cnf_matrix = metrics.confusion_matrix(ytest, y_pred)
p = sns.heatmap(pd.DataFrame(cnf_matrix), annot=True,
                cmap="YlGnBu", fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

For this coursework we used this data and adopted a KNN algorithm to test
some given patient data and see whether each patient falls into the
diabetic or the non-diabetic category.

We obtained the dataset from Kaggle, YouTube…
