Machine Learning
In [3]: import numpy as np
        arr=np.array([1,36,49,4,16])
        sqrt_arr=np.sqrt(arr)
        print(sqrt_arr)
[1. 6. 7. 2. 4.]

In [6]: n=int(input("square root number: "))
        print(n, n**2)
num1=int(input("1st number: "))
num2=int(input("2nd number: "))
num=np.add(num1,num2)
print("output number after addition:", num)
1st number: 12
2nd number: 14
output number after addition: 26
pip install matplotlib
In [10]: import matplotlib.pyplot as plt
         # X and Y are not shown in this extract; sample values assumed
         X=('A','B','C','D')
         Y=(20,12,15,14)
         fig = plt.figure(figsize=(5,2))
         plt.bar(X,Y, color="green")
         plt.xlabel("CLASS")
         plt.ylabel("NO OF STUDENT")
         plt.title("STUDENT OF CLASS")
         plt.show()
In [14]:
X=('maths','science','english','ss')
Y=(20,12,15,14)
fig = plt.figure(figsize=(5,3))
plt.plot(X,Y,color="green")
plt.xlabel("subject")
plt.ylabel("marks")
plt.title("marks of student")
plt.show()
In [18]: X=('A','B','C','D','F','G')
         Y=(15,12,45,55,34,43)
         fig=plt.figure(figsize=(10,2))
plt.scatter(X,Y, color="red")
plt.xlabel("class")
plt.ylabel("no of students")
plt.title("student of class")
plt.show()
import numpy as np
x=np.random.randn(200)
y=2*x + np.random.randn(200)
plt.scatter(x,y)
plt.show()
In [30]: x1=[89,43,36,36,95,10,66,34,38,20]
y1=[21,46,3,35,67,95,53,72,58,10]
x2=[26,29,48,64,6,5,36,66,72,40]
y2=[26,34,90,33,38,20,56,2,47,15]
plt.scatter(x1,y1,c ="grey",linewidths=2,marker="x",edgecolor="red",s=150)
plt.scatter(x2,y2,c="yellow",linewidths=2,marker="*",edgecolor="blue",s=300)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
C:\Users\DHRUVI\AppData\Local\Temp\ipykernel_21880\3777407254.py:6: UserWarning: You passed a edgecolor/edgecolors ('red') for an unfilled marker ('x'). Matplotlib is ignoring the edgecolor in favor of the facecolor.
  plt.scatter(x1,y1,c ="grey",linewidths=2,marker="x",edgecolor="red",s=150)

PRACTICAL 2
In [5]: df.head()
Out[5]:
   Project Name         Task Name Assigned to  Start Date  Days Required    End Date Progress
0     Marketing   Market Research       Alice  01-01-2024             13  14-01-2024      78%
1     Marketing  Content Creation         Bob  14-01-2024             14  28-01-2024     100%

In [6]: df.head(15)
Out[6]:
    Project Name               Task Name Assigned to  Start Date  Days Required    End Date Progress
6      Marketing   Social Media Planning     Charlie  28-01-2024             22  19-02-2024      45%
7      Marketing       Campaign Analysis       Daisy  18-02-2024             25  14-03-2024       0%
9    Product Dev       Quality Assurance       Fiona  20-01-2024             10  30-01-2024      78%
12  Customer Svc       Ticket Resolution         Ian  24-02-2024             25  20-03-2024     100%
13  Customer Svc       Customer Feedback       Julia  21-03-2024             30  20-04-2024       0%
14     Financial         Budget Analysis       Kevin  02-02-2024             22  24-02-2024      10%

In [7]: df.tail()
Out[7]:
    Project Name                Task Name Assigned to  Start Date  Days Required    End Date Progress
45     Logistics  Transportation Planning    Patricia  29-01-2024             30  28-02-2024     100%
46     Logistics   Inventory Optimization     Quentin  29-03-2024             20  18-04-2024       0%
47   Engineering           Product Design      Rachel  02-01-2024             25  27-01-2024      20%
48   Engineering       System Integration         Sam  02-02-2024             22  24-02-2024       0%

In [8]: df.tail(5)
Out[8]: (same last rows as Out[7])

In [9]: df.shape
Out[9]: (50, 7)

In [12]: name=['dhruvi','yamini','vishal','jay','meet']
         dep=['IT','CSE','IT-D','BIO','CS']
         scr=[40,23,48,34,45]
         dict={'name':name,'deploma':dep,"score":scr}
         df =pd.DataFrame(dict)
         print(df)
     name deploma  score
0  dhruvi      IT     40
1  yamini     CSE     23
2  vishal    IT-D     48
3     jay     BIO     34
4    meet      CS     45

In [13]: df.to_csv("Dhruvi.csv")
         df.to_excel("dhruvi.xlsx")
         df.head()
Out[13]:
     name deploma  score
0  dhruvi      IT     40
1  yamini     CSE     23
2  vishal    IT-D     48
3     jay     BIO     34
4    meet      CS     45

In [14]: df.head()
Out[14]:
     name deploma  score
0  dhruvi      IT     40
1  yamini     CSE     23
2  vishal    IT-D     48
3     jay     BIO     34
4    meet      CS     45

In [15]: df.shape
Out[15]: (5, 3)

In [16]: df.values
Out[16]:
array([['dhruvi', 'IT', 40],
       ['yamini', 'CSE', 23],
       ['vishal', 'IT-D', 48],
       ['jay', 'BIO', 34],
       ['meet', 'CS', 45]], dtype=object)

In [17]: df.describe()
Out[17]:
           score
count   5.000000
mean   38.000000
std     9.924717
min    23.000000
25%    34.000000
50%    40.000000
75%    45.000000
max    48.000000

In [18]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
from sklearn.datasets import load_iris
s1=load_iris()
x,y=s1.data,s1.target
print(x,y)

Out[26]: LogisticRegression()
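The `Out[26]: LogisticRegression()` line is the repr of a fitted estimator, but the fitting cell itself is not legible in this extract. A minimal sketch of such a cell, reusing `x` and `y` from `load_iris()` (the `max_iter` value is an assumption, added so the solver converges on the unscaled iris features):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the iris features and targets, as in the cell above
s1 = load_iris()
x, y = s1.data, s1.target

# Fit a logistic regression classifier; max_iter raised from the default
# because lbfgs may need more than 100 iterations on raw iris features
model = LogisticRegression(max_iter=500)
model.fit(x, y)
print(model.score(x, y))  # mean training accuracy
```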
Practical 5
Import the Pima Indian diabetes data. Apply SelectKBest with chi2 for feature selection and identify the best features.
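A sketch of the SelectKBest/chi2 workflow. The Pima CSV is not included in this document, so the example uses a synthetic non-negative feature matrix as a stand-in (chi2 requires non-negative features); with the real data, `X` and `y` would come from the diabetes columns and `Outcome` label instead.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in for the Pima data: 8 non-negative features, binary outcome
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(200, 8)).astype(float)
y = (X[:, 2] + X[:, 5] > 100).astype(int)  # outcome driven by features 2 and 5

# Score every feature with chi2 and keep the k best
selector = SelectKBest(score_func=chi2, k=4)
X_best = selector.fit_transform(X, y)

print("chi2 scores:", np.round(selector.scores_, 2))
print("selected columns:", selector.get_support(indices=True))
print("reduced shape:", X_best.shape)
```

`get_support(indices=True)` names the surviving columns, which is how the "best features" are identified.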
Practical 6
Write a program to learn a decision tree and use it to predict class labels of test data. Training and test data will be explicitly provided by the instructor. Tree pruning should not be performed.
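A minimal sketch of the task. The instructor's data is not reproduced here, so toy arrays stand in for it; note that sklearn's `DecisionTreeClassifier` with default settings performs no pruning (`ccp_alpha=0`, no `max_depth`), which matches the requirement.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for the instructor-provided training and test data
X_train = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_train = [1, 0, 1, 0]          # class label follows the first feature
X_test = [[1, 0], [0, 1]]

# Default settings grow the full tree: no pruning is applied
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
print(clf.predict(X_test))      # predicts [1 0] on this toy data
```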
Practical 7
ML Project. Use the following dataset as music.csv:
a. Store the file as music.csv and import it into Python using pandas.
b. Prepare the data by splitting it into input (age, gender) and output (genre) datasets.
c. Use a decision tree model from sklearn to predict the genre of people in various age groups.
d. Calculate the accuracy of the model.
e. Vary the training and test size to check the different accuracy values the model achieves.
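Steps a–e can be sketched end to end. The ages and genre values below are placeholders, since the real music.csv is not reproduced in this document; only the `age`, `gender`, and `genre` column names come from the task description.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# a. Stand-in for music.csv (normally: df_music = pd.read_csv("music.csv"))
df_music = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 31, 33, 37, 20, 21, 25, 26, 27, 30, 31, 34, 35],
    "gender": [1]*9 + [0]*9,      # 1 = male, 0 = female (assumed encoding)
    "genre":  ["HipHop","HipHop","HipHop","Jazz","Jazz","Jazz",
               "Classical","Classical","Classical",
               "Dance","Dance","Dance","Acoustic","Acoustic","Acoustic",
               "Classical","Classical","Classical"],
})

# b. Split into input and output datasets
X = df_music[["age", "gender"]]
y = df_music["genre"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# c. Decision tree model from sklearn
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# d. Accuracy of the model
score = accuracy_score(y_test, model.predict(X_test))
print(score)

# e. Re-run with different test_size values (0.1, 0.3, 0.5, ...) to see
#    how the accuracy changes with the train/test split
```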
In [32]: df_music.to_csv("music.csv")

In [33]: df_music.tail()
Out[33]:
    age       genre gender
 9   56         Pop      M
10   45  Electronic      M

In [34]: df_music.head()
Out[34]:
   age      genre gender
0   30  Classical      M
1   20       Rock      M
3   23        Pop      F

Practical 8
Write a program to use a k-nearest neighbor classifier to predict class labels of test data. Training and test data must be provided explicitly.
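A minimal k-nearest-neighbor sketch with explicitly provided data, as the practical requires. The point values are toy stand-ins; the real training and test data are supplied separately.

```python
from sklearn.neighbors import KNeighborsClassifier

# Explicit training data: two well-separated groups
X_train = [[1.0, 1.1], [1.2, 0.9], [1.1, 1.0],
           [8.0, 8.2], [7.9, 8.1], [8.2, 8.0]]
y_train = [0, 0, 0, 1, 1, 1]
# Explicit test data
X_test = [[1.0, 1.0], [8.0, 8.0]]

# Classify each test point by majority vote among its 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.predict(X_test))      # predicts [0 1] on this toy data
```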
Accuracy: 0.5
D:\anaconda\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode`
  mode, _ = stats.mode(_y[neigh_ind, k], axis=1)

Practical 9
Import vgsales.csv from the Kaggle platform.
a. Find the rows and columns in the dataset.
In [38]: dg_vgsales.head()
Out[38]:
   Rank             Name Platform  Year     Genre   Publisher  NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales
0   259        Asteroids     2600  1980   Shooter       Atari      4.00      0.26       0.0         0.05          4.31
1   545  Missile Command     2600  1980   Shooter       Atari      2.56      0.17       0.0         0.03          2.76
2  1768          Kaboom!     2600  1980      Misc  Activision      1.07      0.07       0.0         0.01          1.15
3  1971         Defender     2600  1980      Misc       Atari      0.99      0.05       0.0         0.01          1.05
4  2671           Boxing     2600  1980  Fighting  Activision      0.72      0.04       0.0         0.01          0.77

In [39]: dg_vgsales.tail()
Out[39]:
        Rank                                               Name Platform  Year       Genre             Publisher  NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales
16319  16565                                       Mighty No. 9     XOne  2016    Platform           Deep Silver      0.01      0.00      0.00          0.0          0.01
16320  16572                                 Resident Evil 4 HD     XOne  2016     Shooter                Capcom      0.01      0.00      0.00          0.0          0.01
16321  16573                      Farming 2017 - The Simulation      PS4  2016  Simulation     UIG Entertainment      0.00      0.01      0.00          0.0          0.01
16322  16579                                  Rugby Challenge 3     XOne  2016      Sports  Alternative Software      0.00      0.01      0.00          0.0          0.01
16323  16592  Chou Ezaru wa Akai Hana: Koi wa Tsuki ni Shiru...      PSV  2016      Action       dramatic create      0.00      0.00      0.01          0.0          0.01

In [41]: dg_vgsales.shape
Out[41]: (16324, 11)
b. Find basic information regarding the dataset using the describe command.
In [42]: dg_vgsales.describe()
Out[42]:
               Rank          Year      NA_Sales      EU_Sales      JP_Sales   Other_Sales  Global_Sales
count  16324.000000  16324.000000  16324.000000  16324.000000  16324.000000  16324.000000  16324.000000
mean    8291.508270   2006.404251      0.265464      0.147581      0.078673      0.048334      0.540328
std     4792.043734      5.826744      0.821658      0.508809      0.311584      0.189902      1.565860
min        1.000000   1980.000000      0.000000      0.000000      0.000000      0.000000      0.010000
50%     8293.500000   2007.000000      0.080000      0.020000      0.000000      0.010000      0.170000

In [43]: dg_vgsales.values
Out[43]:
array([[259, 'Asteroids', '2600', ..., 0.0, 0.05, 4.31],
       [545, 'Missile Command', '2600', ..., 0.0, 0.03, 2.76],
       [1768, 'Kaboom!', '2600', ..., 0.0, 0.01, 1.15],
       ...,
       [16573, 'Farming 2017 - The Simulation', 'PS4', ..., 0.0, 0.0, 0.01],
       [16579, 'Rugby Challenge 3', 'XOne', ..., 0.0, 0.0, 0.01],
       [16592, 'Chou Ezaru wa Akai Hana: Koi wa Tsuki ni Shirube Kareru',
        'PSV', ..., 0.01, 0.0, 0.01]], dtype=object)
Practical 10
In [46]: dh_home = pd.read_csv("home_data.csv")
         dh_home.head(25)
Out[46]:
           id             date   price  bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  ...  grade  sqft_above  sqft_basement  yr_built  yr_renovated  zipcode      lat     long  sqft_living15  sqft_lot15
0  7129300520  20141013T000000  221900         3       1.00         1180      5650     1.0           0     0  ...      7        1180              0      1955             0    98178  47.5112 -122.257           1340        5650
1  6414100192  20141209T000000  538000         3       2.25         2570      7242     2.0           0     0  ...      7        2170            400      1951          1991    98125  47.7210 -122.319           1690        7639
2  5631500400  20150225T000000  180000         2       1.00          770     10000     1.0           0     0  ...      6         770              0      1933             0    98028  47.7379 -122.233           2720        8062
3  2487200875  20141209T000000  604000         4       3.00         1960      5000     1.0           0     0  ...      7        1050            910      1965             0    98136  47.5208 -122.393           1360        5000
4  1954400510  20150218T000000  510000         3       2.00         1680      8080     1.0           0     0  ...      8        1680              0      1987             0    98074  47.6168 -122.045           1800        7503
25 rows × 21 columns

In [47]: dh_home.tail(4)
Out[47]:
               id             date   price  bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  ...  grade  sqft_above  sqft_basement  yr_built  yr_renovated  zipcode      lat     long  sqft_living15  sqft_lot15
21609  6600060120  20150223T000000  400000         4       2.50         2310      5813     2.0           0     0  ...      8        2310              0      2014             0    98146  47.5107 -122.362           1830        7200
21610  1523300141  20140623T000000  402101         2       0.75         1020      1350     2.0           0     0  ...      7        1020              0      2009             0    98144  47.5944 -122.299           1020        2007
21611   291310100  20150116T000000  400000         3       2.50         1600      2388     2.0           0     0  ...      8        1600              0      2004             0    98027  47.5345 -122.069           1410        1287
21612  1523300157  20141015T000000  325000         2       0.75         1020      1076     2.0           0     0  ...      7        1020              0      2008             0    98144  47.5941 -122.299           1020        1357
4 rows × 21 columns

In [48]: dh_home.shape
Out[48]: (21613, 21)

In [49]: dh_home.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 21 columns):
 #   Column  Non-Null Count  Dtype

In [50]: dh_home.describe().T
Out[50]:
                 count          mean           std           min           25%           50%           75%           max
bedrooms       21613.0  3.370842e+00  9.300618e-01  0.000000e+00  3.000000e+00  3.000000e+00  4.000000e+00  3.300000e+01
floors         21613.0  1.494309e+00  5.399889e-01  1.000000e+00  1.000000e+00  1.500000e+00  2.000000e+00  3.500000e+00
sqft_basement  21613.0  2.915090e+02  4.425750e+02  0.000000e+00  0.000000e+00  0.000000e+00  5.600000e+02  4.820000e+03

In [51]: dh_home.values
Out[51]:
array([[7129300520, '20141013T000000', 221900, ..., -122.257, 1340, 5650],
       [6414100192, '20141209T000000', 538000, ..., -122.319, 1690, 7639],
       [5631500400, '20150225T000000', 180000, ..., -122.233, 2720, 8062],
       ...,
       [1523300141, '20140623T000000', 402101, ..., -122.299, 1020, 2007],
       [291310100, '20150116T000000', 400000, ..., -122.069, 1410, 1287],
       [1523300157, '20141015T000000', 325000, ..., -122.299, 1020, 1357]],
      dtype=object)

plt.scatter(dh_home['sqft_living'], dh_home['price'])
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()

d. Apply Linear Regression model to predict the price
D:\anaconda\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  warnings.warn(
Out[53]: [517666.39294021]
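The `[517666.39294021]` output above is a single price prediction, but the regression cell itself is garbled in this extract. A minimal sketch of fitting price against `sqft_living`, using synthetic numbers in place of home_data.csv (the slope, intercept, and noise level are assumptions, not the dataset's values):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for home_data.csv: price roughly linear in living area
rng = np.random.default_rng(0)
area = rng.uniform(500, 4000, size=(100, 1))
price = 280 * area[:, 0] + 20000 + rng.normal(0, 10000, size=100)

# Fit on a plain ndarray, so no feature-name warning is raised at predict time
reg = LinearRegression()
reg.fit(area, price)
print(reg.predict([[2000]]))   # predicted price for a 2000 sq ft home
```

Fitting on a DataFrame but predicting on a bare list is what triggers the "X does not have valid feature names" warning seen above.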
Practical 11
Write a program to cluster a set of points using K-means. Training and test data must be provided explicitly.
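A minimal K-means sketch with explicitly provided points. The coordinates are toy stand-ins forming two visible clusters; `n_clusters=2` and the seed are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Explicit training points (two groups) and explicit test points
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
X_test = np.array([[0.5, 1.0], [9.0, 9.0]])

# Fit two clusters; random_state fixes the initialization for reproducibility
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X_train)
print("centers:", km.cluster_centers_)
print("test labels:", km.predict(X_test))  # the two test points land in different clusters
```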
Practical 12
Import the Iris dataset.
a. Find rows and columns using the shape command.
In [60]: di_iris.shape
Out[60]: (150, 6)

b. Print the first 30 instances using the head command.
In [61]: di_iris.head(10)
Out[61]:

Species
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
dtype: int64

di_iris.hist(column='PetalWidthCm', by='Species')
plt.suptitle("Histogram of Petal Width")
plt.xlabel('PetalWidthCm')
plt.ylabel('counts')
plt.show()
Apply K-NN and K-means clustering to check accuracy and decide which is better.
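One way to sketch the comparison: supervised K-NN is scored directly with accuracy, while K-means clusters (which carry no class labels) are first mapped to the majority true label of their training members. The split size, `k` values, and seeds below are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Supervised: K-NN scored directly on held-out labels
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
knn_acc = accuracy_score(y_test, knn.predict(X_test))

# Unsupervised: map each K-means cluster to the majority training label it contains
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
cluster_to_label = {c: np.bincount(y_train[km.labels_ == c]).argmax() for c in range(3)}
km_acc = accuracy_score(y_test, [cluster_to_label[c] for c in km.predict(X_test)])

print("K-NN accuracy:   ", knn_acc)
print("K-means accuracy:", km_acc)
```

On iris, the supervised K-NN typically scores higher, since K-means never sees the class labels during fitting.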
In [ ]: