
PRACTICAL 1

Numerical Computing with Python (NumPy, Matplotlib)

In [1]: n=int(input("enter thr number:"))


m=int(input("enter the number:"))
sum=n+m
print(sum)
enter thr number:34
enter the number:35
69

In [2]: import numpy as np
a=np.array(['d','h','r','u','v','i'])
print("numpy array in python:",a)

numpy array in python: ['d' 'h' 'r' 'u' 'v' 'i']

In [6]: import numpy as np
n=int(input("square root number:"))
print(n,n**2)
arr=np.array([1,36,49,4,16])
sqrt_arr=np.sqrt(arr)
print(sqrt_arr)

square root number:45
45 2025
[1. 6. 7. 2. 4.]

In [7]: import numpy as np


num1=12
num2=14
print("1st number:",num1)
print("2nd number:",num2)

num=np.add(num1,num2)
print("output number after addition:",num)

1st number: 12
2nd number: 14
output number after addition: 26

In [8]: import numpy as np

M1=np.array([12,17,77])
M2=np.array([77,17,7])
a=np.dot(M1,M2)
print(a)

1752
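For 2-D arrays, np.dot performs matrix multiplication rather than the element-wise inner product used above. A minimal sketch (the matrices here are illustrative, not from the original practical):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# For 2-D inputs, np.dot is matrix multiplication; A @ B is equivalent
print(np.dot(A, B))   # [[19 22]
                      #  [43 50]]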
In [10]: pip install matplotlib

Requirement already satisfied: matplotlib in d:\anaconda\lib\site-packages (3.5.2)
Requirement already satisfied: cycler>=0.10 in d:\anaconda\lib\site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: pillow>=6.2.0 in d:\anaconda\lib\site-packages (from matplotlib) (9.2.0)
Requirement already satisfied: python-dateutil>=2.7 in d:\anaconda\lib\site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: kiwisolver>=1.0.1 in d:\anaconda\lib\site-packages (from matplotlib) (1.4.2)
Requirement already satisfied: numpy>=1.17 in d:\anaconda\lib\site-packages (from matplotlib) (1.21.5)
Requirement already satisfied: pyparsing>=2.2.1 in d:\anaconda\lib\site-packages (from matplotlib) (3.0.9)
Requirement already satisfied: fonttools>=4.22.0 in d:\anaconda\lib\site-packages (from matplotlib) (4.25.0)
Requirement already satisfied: packaging>=20.0 in d:\anaconda\lib\site-packages (from matplotlib) (21.3)
Requirement already satisfied: six>=1.5 in d:\anaconda\lib\site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

In [11]: import matplotlib.pyplot as plt


X=('A','B','C')
Y=(20,30,40)

fig = plt.figure(figsize=(5,2))
plt.bar(X,Y, color="green")

plt.xlabel("CLASS")
plt.ylabel("NO OF STUDENT")
plt.title("STUDENT OF CLASS")
plt.show()
In [14]: X=('maths','science','english','ss')
Y=(20,12,15,14)

fig = plt.figure(figsize=(5,3))
plt.plot(X,Y,color="green")

plt.xlabel("subject")
plt.ylabel("marks")
plt.title("markes of student")
plt.show()

In [18]: import matplotlib.pyplot as plt

X=('A','B','C','D','F','G')
Y=(15,12,45,55,34,43)

fig=plt.figure(figsize=(10,2))
plt.scatter(X,Y, color="red")

plt.xlabel("class")
plt.ylabel("no of students")
plt.title("student of class")
plt.show()

In [22]: import numpy as np
x=np.random.randn(200)
y=2*x + np.random.randn(200)
plt.scatter(x,y)
plt.show()

In [30]: x1=[89,43,36,36,95,10,66,34,38,20]
y1=[21,46,3,35,67,95,53,72,58,10]
x2=[26,29,48,64,6,5,36,66,72,40]
y2=[26,34,90,33,38,20,56,2,47,15]

plt.scatter(x1,y1,c ="grey",linewidths=2,marker="x",edgecolor="red",s=150)

plt.scatter(x2,y2,c="yellow",linewidths=2,marker="*",edgecolor="blue",s=300)

plt.xlabel("x")
plt.ylabel("y")
plt.show()

C:\Users\DHRUVI\AppData\Local\Temp\ipykernel_21880\3777407254.py:6: UserWarning: You passed a edgecolor/edgecolors ('red') for an unfilled marker ('x'). Matplotlib is ignoring t
plt.scatter(x1,y1,c ="grey",linewidths=2,marker="x",edgecolor="red",s=150)

PRACTICAL 2

Introduction to pandas for data import and export (Excel, CSV, etc.)

In [1]: import pandas as pd
df=pd.read_csv("PMData.csv")

In [5]: df.head()

Out[5]:
Excel Sample Data Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6
0 NaN NaN NaN NaN NaN NaN NaN
1 Project Management Data NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN
3 Project Name Task Name Assigned to Start Date Days Required End Date Progress
4 Marketing Market Research Alice 01-01-2024 13 14-01-2024 78%

In [6]: df.head(15)

Out[6]:
Excel Sample Data Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6
0 NaN NaN NaN NaN NaN NaN NaN
1 Project Management Data NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN
3 Project Name Task Name Assigned to Start Date Days Required End Date Progress
4 Marketing Market Research Alice 01-01-2024 13 14-01-2024 78%
5 Marketing Content Creation Bob 14-01-2024 14 28-01-2024 100%
6 Marketing Social Media Planning Charlie 28-01-2024 22 19-02-2024 45%
7 Marketing Campaign Analysis Daisy 18-02-2024 25 14-03-2024 0%
8 Product Dev Prototype Development Ethan 02-01-2024 18 20-01-2024 100%
9 Product Dev Quality Assurance Fiona 20-01-2024 10 30-01-2024 78%
10 Product Dev User Interface Design Gabriel 04-02-2024 25 29-02-2024 0%
11 Customer Svc Service Improvement Hannah 01-02-2024 22 23-02-2024 100%
12 Customer Svc Ticket Resolution Ian 24-02-2024 25 20-03-2024 100%
13 Customer Svc Customer Feedback Julia 21-03-2024 30 20-04-2024 0%
14 Financial Budget Analysis Kevin 02-02-2024 22 24-02-2024 10%

In [7]: df.tail()

Out[7]:
Excel Sample Data Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6
45 Logistics Transportation Planning Patricia 29-01-2024 30 28-02-2024 100%
46 Logistics Inventory Optimization Quentin 29-03-2024 20 18-04-2024 0%
47 Engineering Product Design Rachel 02-01-2024 25 27-01-2024 20%
48 Engineering System Integration Sam 02-02-2024 22 24-02-2024 0%
49 Engineering Prototype Testing Tom 23-02-2024 27 21-03-2024 0%

In [8]: df.tail(5)

Out[8]:
Excel Sample Data Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6
45 Logistics Transportation Planning Patricia 29-01-2024 30 28-02-2024 100%
46 Logistics Inventory Optimization Quentin 29-03-2024 20 18-04-2024 0%
47 Engineering Product Design Rachel 02-01-2024 25 27-01-2024 20%
48 Engineering System Integration Sam 02-02-2024 22 24-02-2024 0%
49 Engineering Prototype Testing Tom 23-02-2024 27 21-03-2024 0%

In [9]: df.shape

Out[9]: (50, 7)
Out[9]:
In [12]: name=['dhruvi','yamini','vishal','jay','meet']
dep=['IT','CSE','IT-D','BIO','CS']
scr=[40,23,48,34,45]
data={'name':name,'diploma':dep,"score":scr}
df=pd.DataFrame(data)
print(df)

     name diploma  score
0  dhruvi      IT     40
1  yamini     CSE     23
2  vishal    IT-D     48
3     jay     BIO     34
4    meet      CS     45

In [13]: df.to_csv("Dhruvi.csv")
df.to_excel("dhruvi.xlsx")
df.head()

Out[13]:
     name diploma  score
0  dhruvi      IT     40
1  yamini     CSE     23
2  vishal    IT-D     48
3     jay     BIO     34
4    meet      CS     45

In [14]: df.tail()

Out[14]:
     name diploma  score
0  dhruvi      IT     40
1  yamini     CSE     23
2  vishal    IT-D     48
3     jay     BIO     34
4    meet      CS     45

In [15]: df.shape

Out[15]: (5, 3)

In [16]: df.values

Out[16]: array([['dhruvi', 'IT', 40],
       ['yamini', 'CSE', 23],
       ['vishal', 'IT-D', 48],
       ['jay', 'BIO', 34],
       ['meet', 'CS', 45]], dtype=object)

In [17]: df.describe()

Out[17]:
           score
count   5.000000
mean   38.000000
std     9.924717
min    23.000000
25%    34.000000
50%    40.000000
75%    45.000000
max    48.000000

In [18]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   name     5 non-null      object
 1   diploma  5 non-null      object
 2   score    5 non-null      int64
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes
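To confirm that the export round-trips, the files written above can be read back. A minimal sketch, assuming the openpyxl package is installed for the Excel read; index_col=0 skips the unnamed index column that to_csv wrote:

import pandas as pd

df_csv = pd.read_csv("Dhruvi.csv", index_col=0)
df_xlsx = pd.read_excel("dhruvi.xlsx", index_col=0)  # requires openpyxl
print(df_csv.equals(df_xlsx))  # True if both files round-tripped identically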
Practical 3

Basic introduction to scikit-learn


In [23]: from sklearn.datasets import load_iris

s1=load_iris()
x,y=s1.data,s1.target
print(x,y)

[[5.1 3.5 1.4 0.2]


[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5.  3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]
[5.4 3.7 1.5 0.2]
[4.8 3.4 1.6 0.2]
[4.8 3. 1.4 0.1]
[4.3 3. 1.1 0.1]
[5.8 4. 1.2 0.2]
[5.7 4.4 1.5 0.4]
[5.4 3.9 1.3 0.4]
[5.1 3.5 1.4 0.3]
[5.7 3.8 1.7 0.3]
[5.1 3.8 1.5 0.3]
[5.4 3.4 1.7 0.2]
[5.1 3.7 1.5 0.4]
[4.6 3.6 1. 0.2]
[5.1 3.3 1.7 0.5]
[4.8 3.4 1.9 0.2]
[5. 3. 1.6 0.2]
[5. 3.4 1.6 0.4]
[5.2 3.5 1.5 0.2]
[5.2 3.4 1.4 0.2]
[4.7 3.2 1.6 0.2]
[4.8 3.1 1.6 0.2]
[5.4 3.4 1.5 0.4]
[5.2 4.1 1.5 0.1]
[5.5 4.2 1.4 0.2]
[4.9 3.1 1.5 0.2]
[5. 3.2 1.2 0.2]
[5.5 3.5 1.3 0.2]
[4.9 3.6 1.4 0.1]
[4.4 3.  1.3 0.2]
[5.1 3.4 1.5 0.2]
[5. 3.5 1.3 0.3]
[4.5 2.3 1.3 0.3]
[4.4 3.2 1.3 0.2]
[5. 3.5 1.6 0.6]
[5.1 3.8 1.9 0.4]
[4.8 3. 1.4 0.3]
[5.1 3.8 1.6 0.2]
[4.6 3.2 1.4 0.2]
[5.3 3.7 1.5 0.2]
[5.  3.3 1.4 0.2]
[7. 3.2 4.7 1.4]
[6.4 3.2 4.5 1.5]
[6.9 3.1 4.9 1.5]
[5.5 2.3 4. 1.3]
[6.5 2.8 4.6 1.5]
[5.7 2.8 4.5 1.3]
[6.3 3.3 4.7 1.6]
[4.9 2.4 3.3 1. ]
[6.6 2.9 4.6 1.3]
[5.2 2.7 3.9 1.4]
[5.  2.  3.5 1. ]
[5.9 3. 4.2 1.5]
[6. 2.2 4. 1. ]
[6.1 2.9 4.7 1.4]
[5.6 2.9 3.6 1.3]
[6.7 3.1 4.4 1.4]
[5.6 3. 4.5 1.5]
[5.8 2.7 4.1 1. ]
[6.2 2.2 4.5 1.5]
[5.6 2.5 3.9 1.1]
[5.9 3.2 4.8 1.8]
[6.1 2.8 4. 1.3]
[6.3 2.5 4.9 1.5]
[6.1 2.8 4.7 1.2]
[6.4 2.9 4.3 1.3]
[6.6 3. 4.4 1.4]
[6.8 2.8 4.8 1.4]
[6.7 3. 5. 1.7]
[6. 2.9 4.5 1.5]
[5.7 2.6 3.5 1. ]
[5.5 2.4 3.8 1.1]
[5.5 2.4 3.7 1. ]
[5.8 2.7 3.9 1.2]
[6. 2.7 5.1 1.6]
[5.4 3. 4.5 1.5]
[6. 3.4 4.5 1.6]
[6.7 3.1 4.7 1.5]
[6.3 2.3 4.4 1.3]
[5.6 3. 4.1 1.3]
[5.5 2.5 4. 1.3]
[5.5 2.6 4.4 1.2]
[6.1 3. 4.6 1.4]
[5.8 2.6 4. 1.2]
[5. 2.3 3.3 1. ]
[5.6 2.7 4.2 1.3]
[5.7 3. 4.2 1.2]
[5.7 2.9 4.2 1.3]
[6.2 2.9 4.3 1.3]
[5.1 2.5 3. 1.1]
[5.7 2.8 4.1 1.3]
[6.3 3.3 6. 2.5]
[5.8 2.7 5.1 1.9]
[7.1 3. 5.9 2.1]
[6.3 2.9 5.6 1.8]
[6.5 3. 5.8 2.2]
[7.6 3. 6.6 2.1]
[4.9 2.5 4.5 1.7]
[7.3 2.9 6.3 1.8]
[6.7 2.5 5.8 1.8]
[7.2 3.6 6.1 2.5]
[6.5 3.2 5.1 2. ]
[6.4 2.7 5.3 1.9]
[6.8 3. 5.5 2.1]
[5.7 2.5 5. 2. ]
[5.8 2.8 5.1 2.4]
[6.4 3.2 5.3 2.3]
[6.5 3. 5.5 1.8]
[7.7 3.8 6.7 2.2]
[7.7 2.6 6.9 2.3]
[6. 2.2 5. 1.5]
[6.9 3.2 5.7 2.3]
[5.6 2.8 4.9 2. ]
[7.7 2.8 6.7 2. ]
[6.3 2.7 4.9 1.8]
[6.7 3.3 5.7 2.1]
[7.2 3.2 6. 1.8]
[6.2 2.8 4.8 1.8]
[6.1 3. 4.9 1.8]
[6.4 2.8 5.6 2.1]
[7.2 3. 5.8 1.6]
[7.4 2.8 6.1 1.9]
[7.9 3.8 6.4 2. ]
[6.4 2.8 5.6 2.2]
[6.3 2.8 5.1 1.5]
[6.1 2.6 5.6 1.4]
[7.7 3. 6.1 2.3]
[6.3 3.4 5.6 2.4]
[6.4 3.1 5.5 1.8]
[6. 3. 4.8 1.8]
[6.9 3.1 5.4 2.1]
[6.7 3.1 5.6 2.4]
[6.9 3.1 5.1 2.3]
[5.8 2.7 5.1 1.9]
[6.8 3.2 5.9 2.3]
[6.7 3.3 5.7 2.5]
[6.7 3. 5.2 2.3]
[6.3 2.5 5. 1.9]
[6.5 3. 5.2 2. ]
[6.2 3.4 5.4 2.3]
[5.9 3. 5.1 1.8]] [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]

In [25]: from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=50)
print(x_train)

[[4.6 3.2 1.4 0.2]


[6.3 2.3 4.4 1.3]
[5.2 4.1 1.5 0.1]
[5.5 2.5 4. 1.3]
[6.9 3.1 4.9 1.5]
[4.7 3.2 1.6 0.2]
[4.9 3.1 1.5 0.1]
[5.9 3. 4.2 1.5]
[4.9 3. 1.4 0.2]
[6. 2.7 5.1 1.6]
[4.8 3. 1.4 0.3]
[5.5 2.6 4.4 1.2]
[6.1 3. 4.9 1.8]
[7.2 3. 5.8 1.6]
[7.7 3. 6.1 2.3]
[6.6 2.9 4.6 1.3]
[6.3 2.7 4.9 1.8]
[5.5 3.5 1.3 0.2]
[5.8 2.7 5.1 1.9]
[4.3 3. 1.1 0.1]
[6. 2.2 4. 1. ]
[5.1 3.8 1.6 0.2]
[6.3 3.4 5.6 2.4]
[4.8 3.4 1.9 0.2]
[5.2 3.4 1.4 0.2]
[6. 3. 4.8 1.8]
[5.9 3. 5.1 1.8]
[6.9 3.2 5.7 2.3]
[6.7 3.3 5.7 2.1]
[4.8 3.4 1.6 0.2]
[6.2 3.4 5.4 2.3]
[5.6 2.7 4.2 1.3]
[6.7 2.5 5.8 1.8]
[5. 2.3 3.3 1. ]
[5.1 3.5 1.4 0.2]
[6.4 3.2 4.5 1.5]
[6.5 3.2 5.1 2. ]
[5.4 3.7 1.5 0.2]
[6.2 2.8 4.8 1.8]
[5.8 2.7 4.1 1. ]
[5.7 2.9 4.2 1.3]
[6.8 2.8 4.8 1.4]
[5.6 3. 4.5 1.5]
[5.6 2.8 4.9 2. ]
[5. 2. 3.5 1. ]
[5. 3.4 1.6 0.4]
[6.4 3.2 5.3 2.3]
[5. 3.2 1.2 0.2]
[7.6 3. 6.6 2.1]
[4.8 3.1 1.6 0.2]
[5.7 2.6 3.5 1. ]
[6.9 3.1 5.1 2.3]
[5.1 3.8 1.5 0.3]
[4.6 3.4 1.4 0.3]
[5.6 2.9 3.6 1.3]
[4.9 2.5 4.5 1.7]
[6. 3.4 4.5 1.6]
[5. 3.3 1.4 0.2]
[5.4 3.4 1.5 0.4]
[5. 3.5 1.6 0.6]
[6.1 2.6 5.6 1.4]
[6.1 3. 4.6 1.4]
[5.8 2.6 4. 1.2]
[6.4 2.7 5.3 1.9]
[6.1 2.8 4. 1.3]
[5.7 3. 4.2 1.2]
[4.7 3.2 1.3 0.2]
[6.3 2.8 5.1 1.5]
[4.6 3.6 1. 0.2]
[6.7 3. 5.2 2.3]
[5.9 3.2 4.8 1.8]
[6.4 2.8 5.6 2.2]
[5.5 4.2 1.4 0.2]
[7.2 3.6 6.1 2.5]
[6.9 3.1 5.4 2.1]]

In [26]: from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(x_train, y_train)

Out[26]: LogisticRegression()
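The fitted model can then be scored on the held-out half from In [25]. A minimal follow-up sketch (the exact values depend on the split):

# Accuracy on the training and held-out halves of the iris data
print("train accuracy:", model.score(x_train, y_train))
print("test accuracy:", model.score(x_test, y_test))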

Practical 5

Import the Pima Indians diabetes data, apply SelectKBest with chi2 for feature selection, and identify the best features

In [28]: import pandas as pd


import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv"
names = ['preg', 'Glucose', 'pres', 'skin', 'Insulin', 'BMI', 'Pedi', 'age', 'Outcome']
dataset = pd.read_csv(url, names=names)
x = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
kbest = SelectKBest(score_func=chi2, k=5)
x_new = kbest.fit_transform(x,y)
mask = kbest.get_support()
best_feature = x.columns[mask]
print(best_feature)

Index(['preg', 'Glucose', 'Insulin', 'BMI', 'age'], dtype='object')
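To see why these five features were selected, the per-feature chi-squared scores of the fitted selector can be inspected. A minimal sketch using the kbest object from above:

# Chi-squared score for each input feature, highest first
scores = pd.Series(kbest.scores_, index=x.columns).sort_values(ascending=False)
print(scores)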

Practical 6

Write a program to learn a decision tree and use it to predict class labels of test data. Training and test data will be explicitly provided by the instructor. Tree pruning should not be performed.

In [30]: from sklearn.tree import DecisionTreeClassifier


from sklearn.metrics import accuracy_score
X_train = [[1, 2], [3, 4], [4, 3], [3, 4], [1, 2], [1, 4], [1, 2]]
y_train = [1, 0, 1, 1, 0, 0, 1]
X_test = [[2, 2], [4, 3], [5, 5], [6, 2]]
y_test = [0, 1, 0, 1]
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.75
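Since no pruning is performed, the learned tree can be dumped as text to verify that it grew until the training data was fit. A minimal sketch using export_text (the feature names are placeholders, since the toy data is unnamed):

from sklearn.tree import export_text

# Text dump of the unpruned tree; f0 and f1 are placeholder feature names
print(export_text(clf, feature_names=["f0", "f1"]))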

Practical 7

ML Project. Use the following dataset as music.csv:
a. Store the file as music.csv and import it into Python using pandas
b. Prepare the data by splitting it into input (age, gender) and output (genre) datasets
c. Use the decision tree model from sklearn to predict the genre of people in various age groups
d. Calculate the accuracy of the model
e. Vary the training and test size to check the different accuracy values the model achieves

In [31]: import pandas as pd


from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
data = [
["age", "genre", "gender"],
[20, "Rock", "M"],
[60, "Jazz", "F"],
[23, "Pop", "F"],
[30, "Classical", "M"],
[34, "Electronic", "F"],
[56, "Rock", "F"],
[45, "Hip-Hop", "M"],
[23, "Classical", "F"],
[56, "Pop", "M"],
[45, "Electronic", "M"]
]
df_music = pd.DataFrame(data)
print(df_music)
0 1 2
0 age genre gender
1 20 Rock M
2 60 Jazz F
3 23 Pop F
4 30 Classical M
5 34 Electronic F
6 56 Rock F
7 45 Hip-Hop M
8 23 Classical F
9 56 Pop M
10 45 Electronic M

In [32]: df_music.to_csv("music.csv")

In [33]:

Out[33]:

df_music.tail()

10 45 Electronic M

In [34]: df_music.head()

Out[34]:

0 1 2

04 30 Classical M
1 2

In [35]: import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
df = pd.read_csv("music.csv")
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
X[:, 1] = pd.factorize(X[:, 1])[0]
X[:, 2] = pd.factorize(X[:, 2])[0]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_decision = accuracy_score(y_test, y_pred)
print("accuracy:", accuracy_decision)

accuracy: 0.25
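The low accuracy above is partly because the header row was stored as a data row and the column actually predicted is gender, not genre. A corrected sketch of parts b-e, assuming the data is rebuilt with proper headers so that age and gender are the inputs and genre is the output; the test sizes in the loop are illustrative:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Rebuild the table with a real header row (part b)
rows = [[20, "M", "Rock"], [60, "F", "Jazz"], [23, "F", "Pop"],
        [30, "M", "Classical"], [34, "F", "Electronic"], [56, "F", "Rock"],
        [45, "M", "Hip-Hop"], [23, "F", "Classical"], [56, "M", "Pop"],
        [45, "M", "Electronic"]]
music = pd.DataFrame(rows, columns=["age", "gender", "genre"])

X = music[["age", "gender"]].copy()
X["gender"] = pd.factorize(X["gender"])[0]  # encode M/F as integers
y = music["genre"]

# Parts c-e: fit a decision tree and vary the test size
for ts in (0.2, 0.3, 0.5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=ts, random_state=42)
    clf = DecisionTreeClassifier(random_state=42)
    clf.fit(X_tr, y_tr)
    print("test_size", ts, "accuracy", accuracy_score(y_te, clf.predict(X_te)))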

Practical 8

Write a program to use a k-nearest neighbor classifier to predict class labels of test data. Training and test data must be provided explicitly.

In [37]: from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import accuracy_score
X_train = [[1, 4], [4, 2], [6, 2], [3, 4], [4, 5], [4, 4], [5, 3]]
y_train = [1, 0, 1, 1, 0, 0, 1]
X_test = [[2, 2], [4, 3], [5, 5], [6, 2]]
y_test = [0, 1, 0, 1]
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.5
D:\anaconda\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode`
mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
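Since k is the main hyperparameter here, it is worth checking how the accuracy on this same explicit data changes with the number of neighbors. A minimal sketch reusing the training and test lists above:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Accuracy for several neighbor counts on the same explicit data
for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print("k =", k, "accuracy:", accuracy_score(y_test, knn.predict(X_test)))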
Practical 9

Import vgsales.csv from the Kaggle platform.
a. Find the rows and columns in the dataset

In [38]: dg_vgsales = pd.read_csv("vgsales.csv")
dg_vgsales.head()

Out[38]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 259 Asteroids 2600 1980 Shooter Atari 4.00 0.26 0.0 0.05 4.31
1 545 Missile Command 2600 1980 Shooter Atari 2.56 0.17 0.0 0.03 2.76
2 1768 Kaboom! 2600 1980 Misc Activision 1.07 0.07 0.0 0.01 1.15
3 1971 Defender 2600 1980 Misc Atari 0.99 0.05 0.0 0.01 1.05
4 2671 Boxing 2600 1980 Fighting Activision 0.72 0.04 0.0 0.01 0.77

In [39]: dg_vgsales.tail()

Out[39]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
16319 16565 Mighty No. 9 XOne 2016 Platform Deep Silver 0.01 0.00 0.00 0.0 0.01
16320 16572 Resident Evil 4 HD XOne 2016 Shooter Capcom 0.01 0.00 0.00 0.0 0.01
16321 16573 Farming 2017 - The Simulation PS4 2016 Simulation UIG Entertainment 0.00 0.01 0.00 0.0 0.01
16322 16579 Rugby Challenge 3 XOne 2016 Sports Alternative Software 0.00 0.01 0.00 0.0 0.01
16323 16592 Chou Ezaru wa Akai Hana: Koi wa Tsuki ni Shiru... PSV 2016 Action dramatic create 0.00 0.00 0.01 0.0 0.01

In [41]: dg_vgsales.shape

Out[41]: (16324, 11)
b. Find basic information regarding the dataset using the describe command.

In [42]: dg_vgsales.describe()

Out[42]:
Rank Year NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
count 16324.000000 16324.000000 16324.000000 16324.000000 16324.000000 16324.000000 16324.000000
mean 8291.508270 2006.404251 0.265464 0.147581 0.078673 0.048334 0.540328
std 4792.043734 5.826744 0.821658 0.508809 0.311584 0.189902 1.565860
min 1.000000 1980.000000 0.000000 0.000000 0.000000 0.000000 0.010000
25% 4135.750000 2003.000000 0.000000 0.000000 0.000000 0.000000 0.060000
50% 8293.500000 2007.000000 0.080000 0.020000 0.000000 0.010000 0.170000
75% 12439.250000 2010.000000 0.240000 0.110000 0.040000 0.040000 0.480000
max 16600.000000 2016.000000 41.490000 29.020000 10.220000 10.570000 82.740000

In [43]: dg_vgsales.values

Out[43]: array([[259, 'Asteroids', '2600', ..., 0.0, 0.05, 4.31],
       [545, 'Missile Command', '2600', ..., 0.0, 0.03, 2.76],
       [1768, 'Kaboom!', '2600', ..., 0.0, 0.01, 1.15],
       ...,
       [16573, 'Farming 2017 - The Simulation', 'PS4', ..., 0.0, 0.0, 0.01],
       [16579, 'Rugby Challenge 3', 'XOne', ..., 0.0, 0.0, 0.01],
       [16592, 'Chou Ezaru wa Akai Hana: Koi wa Tsuki ni Shirube Kareru',
        'PSV', ..., 0.01, 0.0, 0.01]], dtype=object)

Practical 10

Project on regression
a. Import home_data.csv from Kaggle using pandas

In [45]: dh_home = pd.read_csv("home_data.csv")

b. Understand the data by running the head, info and describe commands

In [46]: dh_home.head()

Out[46]:
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront view ... grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15
0 7129300520 20141013T000000 221900 3 1.00 1180 5650 1.0 0 0 ... 7 1180 0 1955 0 98178 47.5112 -122.257 1340 5650
1 6414100192 20141209T000000 538000 3 2.25 2570 7242 2.0 0 0 ... 7 2170 400 1951 1991 98125 47.7210 -122.319 1690 7639
2 5631500400 20150225T000000 180000 2 1.00 770 10000 1.0 0 0 ... 6 770 0 1933 0 98028 47.7379 -122.233 2720 8062
3 2487200875 20141209T000000 604000 4 3.00 1960 5000 1.0 0 0 ... 7 1050 910 1965 0 98136 47.5208 -122.393 1360 5000
4 1954400510 20150218T000000 510000 3 2.00 1680 8080 1.0 0 0 ... 8 1680 0 1987 0 98074 47.6168 -122.045 1800 7503

5 rows × 21 columns

In [47]: dh_home.tail(4)

Out[47]:
id date price bedrooms bathrooms sqft_living sqft_lot floors waterfront view ... grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15
21609 6600060120 20150223T000000 400000 4 2.50 2310 5813 2.0 0 0 ... 8 2310 0 2014 0 98146 47.5107 -122.362 1830 7200
21610 1523300141 20140623T000000 402101 2 0.75 1020 1350 2.0 0 0 ... 7 1020 0 2009 0 98144 47.5944 -122.299 1020 2007
21611 291310100 20150116T000000 400000 3 2.50 1600 2388 2.0 0 0 ... 8 1600 0 2004 0 98027 47.5345 -122.069 1410 1287
21612 1523300157 20141015T000000 325000 2 0.75 1020 1076 2.0 0 0 ... 7 1020 0 2008 0 98144 47.5941 -122.299 1020 1357

4 rows × 21 columns

In [48]: dh_home.shape

Out[48]: (21613, 21)

In [49]: dh_home.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
# Column Non-Null Count Dtype
0 id 21613 non-null int64
1 date 21613 non-null object
2 price 21613 non-null int64
3 bedrooms 21613 non-null int64
4 bathrooms 21613 non-null float64
5 sqft_living 21613 non-null int64
6 sqft_lot 21613 non-null int64
7 floors 21613 non-null float64
8 waterfront 21613 non-null int64
9 view 21613 non-null int64
10 condition 21613 non-null int64
11 grade 21613 non-null int64
12 sqft_above 21613 non-null int64
13 sqft_basement 21613 non-null int64
14 yr_built 21613 non-null int64
15 yr_renovated 21613 non-null int64
16 zipcode 21613 non-null int64
17 lat 21613 non-null float64
18 long 21613 non-null float64
19 sqft_living15 21613 non-null int64
20 sqft_lot15 21613 non-null int64
dtypes: float64(4), int64(16), object(1)
memory usage: 3.5+ MB

In [50]: dh_home.describe().T

Out[50]:
count mean std min 25% 50% 75% max
id 21613.0 4.580302e+09 2.876566e+09 1.000102e+06 2.123049e+09 3.904930e+09 7.308900e+09 9.900000e+09
price 21613.0 5.400881e+05 3.671272e+05 7.500000e+04 3.219500e+05 4.500000e+05 6.450000e+05 7.700000e+06
bedrooms 21613.0 3.370842e+00 9.300618e-01 0.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 3.300000e+01
bathrooms 21613.0 2.114757e+00 7.701632e-01 0.000000e+00 1.750000e+00 2.250000e+00 2.500000e+00 8.000000e+00
sqft_living 21613.0 2.079900e+03 9.184409e+02 2.900000e+02 1.427000e+03 1.910000e+03 2.550000e+03 1.354000e+04
sqft_lot 21613.0 1.510697e+04 4.142051e+04 5.200000e+02 5.040000e+03 7.618000e+03 1.068800e+04 1.651359e+06
floors 21613.0 1.494309e+00 5.399889e-01 1.000000e+00 1.000000e+00 1.500000e+00 2.000000e+00 3.500000e+00
waterfront 21613.0 7.541757e-03 8.651720e-02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00
view 21613.0 2.343034e-01 7.663176e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 4.000000e+00
condition 21613.0 3.409430e+00 6.507430e-01 1.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 5.000000e+00
grade 21613.0 7.656873e+00 1.175459e+00 1.000000e+00 7.000000e+00 7.000000e+00 8.000000e+00 1.300000e+01
sqft_above 21613.0 1.788391e+03 8.280910e+02 2.900000e+02 1.190000e+03 1.560000e+03 2.210000e+03 9.410000e+03
sqft_basement 21613.0 2.915090e+02 4.425750e+02 0.000000e+00 0.000000e+00 0.000000e+00 5.600000e+02 4.820000e+03
yr_built 21613.0 1.971005e+03 2.937341e+01 1.900000e+03 1.951000e+03 1.975000e+03 1.997000e+03 2.015000e+03
yr_renovated 21613.0 8.440226e+01 4.016792e+02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.015000e+03
zipcode 21613.0 9.807794e+04 5.350503e+01 9.800100e+04 9.803300e+04 9.806500e+04 9.811800e+04 9.819900e+04
lat 21613.0 4.756005e+01 1.385637e-01 4.715590e+01 4.747100e+01 4.757180e+01 4.767800e+01 4.777760e+01
long 21613.0 -1.222139e+02 1.408283e-01 -1.225190e+02 -1.223280e+02 -1.222300e+02 -1.221250e+02 -1.213150e+02
sqft_living15 21613.0 1.986552e+03 6.853913e+02 3.990000e+02 1.490000e+03 1.840000e+03 2.360000e+03 6.210000e+03
sqft_lot15 21613.0 1.276846e+04 2.730418e+04 6.510000e+02 5.100000e+03 7.620000e+03 1.008300e+04 8.712000e+05

In [51]: dh_home.values

Out[51]: array([[7129300520, '20141013T000000', 221900, ..., -122.257, 1340, 5650],
       [6414100192, '20141209T000000', 538000, ..., -122.319, 1690, 7639],
       [5631500400, '20150225T000000', 180000, ..., -122.233, 2720, 8062],
       ...,
       [1523300141, '20140623T000000', 402101, ..., -122.299, 1020, 2007],
       [291310100, '20150116T000000', 400000, ..., -122.069, 1410, 1287],
       [1523300157, '20141015T000000', 325000, ..., -122.299, 1020, 1357]], dtype=object)

In [52]: import matplotlib.pyplot as plt

plt.scatter(dh_home['sqft_living'], dh_home['price'])
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()

d. Apply a linear regression model to predict the price

In [53]: from sklearn.linear_model import LinearRegression

m = LinearRegression()
m.fit(dh_home[['sqft_living']], dh_home['price'])
pred_price = m.predict([[2000]])
print(pred_price)

[517666.39294021]
D:\anaconda\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
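Beyond a single point prediction, the fit can be summarized by its coefficients and R² score. A minimal follow-up sketch using the fitted model m:

# Slope, intercept and R-squared of the one-feature fit
print("slope:", m.coef_[0])
print("intercept:", m.intercept_)
print("R^2:", m.score(dh_home[['sqft_living']], dh_home['price']))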

Practical 11

Write a program to cluster a set of points using K-means. Training and test data must be provided explicitly.

In [56]: pip install threadpoolctl==3.1.0

Requirement already satisfied: threadpoolctl==3.1.0 in d:\anaconda\lib\site-packages (3.1.0)
Note: you may need to restart the kernel to use updated packages.

In [57]: import numpy as np

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
X_train = np.array([[2, 4], [4, 5], [5, 2], [6, 4], [5, 5], [4, 2], [5, 2]])
X_test = np.array([[2, 2], [4, 3], [5, 5], [6, 2]])
km = KMeans(n_clusters=3)
km.fit(X_train)
y_pred = km.predict(X_test)
plt.scatter(X_train[:, 0], X_train[:, 1], c=km.labels_)
plt.scatter(X_test[:, 0], X_test[:, 1], marker="x", s=150, linewidths=1, c=y_pred)
plt.show()
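The fitted clustering can be inspected through its centroids and inertia (the within-cluster sum of squared distances). A minimal sketch using the km object from above:

# Cluster centroids, inertia, and the clusters assigned to the test points
print("centroids:\n", km.cluster_centers_)
print("inertia:", km.inertia_)
print("test-point clusters:", y_pred)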

Practical 12

Import the Iris dataset

In [59]: di_iris = pd.read_csv("Iris.csv")

a. Find rows and columns using the shape command

In [60]: di_iris.shape

Out[60]: (150, 6)
b. Print the first 30 instances using the head command

In [61]: di_iris.head(10)

Out[61]:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa

c. Find out the data instances in each class

In [62]: dd = di_iris.groupby('Species').size()
print(dd)

Species
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
dtype: int64

d. Plot the univariate graphs (box plots and histograms)

In [63]: di_iris.boxplot(column='SepalLengthCm', by='Species')
plt.title("Box Plot of Sepal Length")
plt.xlabel("Species")
plt.ylabel("SepalLengthCm")
plt.show()

In [64]: di_iris.hist(column='PetalWidthCm', by='Species')
plt.suptitle("Histogram of Petal Width")
plt.xlabel('PetalWidthCm')
plt.ylabel('counts')
plt.show()

e. Plot the multivariate plot (scatter matrix)

In [65]: from pandas.plotting import scatter_matrix

scatter_matrix(di_iris, alpha=0.5, figsize=(10,10), diagonal='hist')
plt.show()

f. Split the data to train the model using 80% of the data values.

In [66]: from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(di_iris.iloc[:,:-1], di_iris.iloc[:, -1], test_size=10)
print("X Training Shape", x_train.shape)
print("X Testing Shape", x_test.shape)
print("Y Training Shape", y_train.shape)
print("Y Testing Shape", y_test.shape)

X Training Shape (140, 5)
X Testing Shape (10, 5)
Y Training Shape (140,)
Y Testing Shape (10,)

Apply K-NN and K-means clustering to check accuracy and decide which is better

In [67]: from sklearn.neighbors import KNeighborsClassifier

from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
knn_pred = knn.predict(x_test)
knn_acc = accuracy_score(y_test, knn_pred)
kmeans = KMeans(n_clusters=4, random_state=50)
kmeans.fit(x_train)
kmeans_pred = kmeans.predict(x_test)
kmeans_acc = accuracy_score(y_test, knn_pred)  # note: this reuses knn_pred, so it just repeats the KNN score
print("KNN Acc: ", knn_acc)
print("KMEANS Acc: ", kmeans_acc)

KNN Acc: 1.0
KMEANS Acc: 1.0

D:\anaconda\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default beha
mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
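As the comment above notes, the printed KMEANS accuracy is really the KNN accuracy reused, and k-means returns arbitrary integer cluster ids that cannot be compared to species strings directly. A fairer comparison first maps each cluster to the majority species among the training points assigned to it; a corrected sketch under that assumption:

import pandas as pd

# Map each cluster id to the majority species of its training points
cluster_to_species = (
    pd.Series(y_train.values, index=kmeans.labels_)
      .groupby(level=0)
      .agg(lambda s: s.mode()[0])
)
mapped_pred = pd.Series(kmeans_pred).map(cluster_to_species)
print("KMEANS Acc (cluster-to-species mapping):", accuracy_score(y_test, mapped_pred))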

