IDP Assignment - 4 _ 5 [Saswat Mohanty_ 1941012407_ CSE-D]

SIKSHA ‘O’ ANUSANDHAN
DEEMED TO BE UNIVERSITY
Admission Batch: 2019 Session: 2021-2022
Laboratory Record
Introduction to Data Science using Python (CSE 3054)
Submitted by
Name: SASWAT MOHANTY
Registration No.: 1941012407
Branch: Computer Science & Engineering (C.S.E.)
Semester: 6th Semester Section: CSE-‘D’
Department of Computer Science & Engineering

Faculty of Engineering & Technology (ITER)
Jagamohan Nagar, Jagamara, Bhubaneswar, Odisha - 751030
INDEX
Sl. No.   Name of Program
1         Minor Assignment - 4
2         Minor Assignment - 5

MINOR ASSIGNMENT 4
1. Write a python program to randomly generate data with 100 numbers and form a
histogram of it. Range of numbers should be between 1 to 100(both included), bucket
size=10.
Ans:
Program:
import random
import matplotlib.pyplot as plt
def generateRandomNumbers(number,start,end):
    # random.randint includes both end points
    random_numbers=[]
    for i in range(number):
        random_numbers.append(random.randint(start,end))
    return random_numbers
def main():
    number,start,end=100,1,100
    random_numbers=generateRandomNumbers(number,start,end)
    print('Generated Random Numbers:',random_numbers)
    # 10 equal-width bins over the generated values
    plt.hist(random_numbers,bins=10,color='pink',edgecolor='red')
    plt.xlabel('Generated Random Numbers Range 1 to 100 Including')
    plt.ylabel('Frequency Count')
    plt.title('Generated Random Numbers Range 1 to 100 Including Vs Frequency Count')
    plt.show()
if __name__=='__main__':
    main()
Output:
Generated Random Numbers: [14, 86, 20, 79, 75, 35, 78, 75, 43, 46, 75, 100, 4, 37, 72, 86, 9, 6,
95, 24, 60, 22, 20, 15, 70, 76, 11, 53, 5, 59, 27, 55, 70, 33, 9, 89, 30, 84, 51, 36, 45, 92, 34, 78, 91,
14, 42, 15, 25, 90, 13, 83, 15, 91, 62, 28, 68, 21, 37, 70, 76, 13, 67, 26, 65, 78, 17, 86, 97, 43, 12,
88, 83, 91, 1, 4, 12, 84, 81, 23, 9, 89, 83, 81, 79, 9, 14, 38, 98, 66, 86, 60, 57, 8, 80, 18, 21, 10, 78,
36]
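With `bins=10`, Matplotlib derives ten equal-width bins from the smallest and largest values actually drawn, so the bucket edges can shift slightly between runs. The following is a minimal sketch, assuming fixed buckets 1-10, 11-20, ..., 91-100 are wanted; the helper name `bucket_counts` and the seed are hypothetical and not part of the program above.

```python
import random
from collections import Counter

def bucket_counts(numbers, bucket_size=10):
    """Count values per bucket, keyed by the bucket's lowest value (1, 11, ..., 91)."""
    return Counter(((n - 1) // bucket_size) * bucket_size + 1 for n in numbers)

random.seed(0)  # arbitrary seed, only to make the sketch repeatable
numbers = [random.randint(1, 100) for _ in range(100)]
counts = bucket_counts(numbers)

# Print one line per bucket, e.g. "  1- 10: <count>"
for low in range(1, 101, 10):
    print(f"{low:3d}-{low + 9:3d}: {counts[low]}")
```

The same edges can be handed to Matplotlib as `plt.hist(numbers, bins=range(1, 102, 10))`.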
2. Write a python program to create a NamedTuple with following details (Roll
No.,[Name,Branch,Year of Admission]).
Ans:
Program:
from collections import namedtuple
def generateNamedTuple():
    # one record type with three named fields
    Roll_No=namedtuple("Roll_No",["Name","Branch","Year_Of_Admission"])
    Roll=[]
    R1=Roll_No('Rahul','C.S.E.',2017)
    Roll.append(R1)
    R2=Roll_No('Dabba','C.S.E.',2018)
    Roll.append(R2)
    R3=Roll_No('Soumya','C.S.E.',2019)
    Roll.append(R3)
    R4=Roll_No('Rituraj','C.S.E.',2020)
    Roll.append(R4)
    return Roll
def main():
    Roll_No=generateNamedTuple()
    print('Named Tuple: Roll_No')
    for Roll in Roll_No:
        print(Roll)
if __name__=='__main__':
    main()
Output:
Named Tuple: Roll_No
Roll_No(Name='Rahul', Branch='C.S.E.', Year_Of_Admission=2017)
Roll_No(Name='Dabba', Branch='C.S.E.', Year_Of_Admission=2018)
Roll_No(Name='Soumya', Branch='C.S.E.', Year_Of_Admission=2019)
Roll_No(Name='Rituraj', Branch='C.S.E.', Year_Of_Admission=2020)
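The question pairs each roll number with a [Name, Branch, Year of Admission] record, while the program above keeps the records in a plain list. One possible way to make the roll number explicit is a dictionary keyed by roll number. This is only a sketch: the type name `Student` and the roll numbers 1 and 2 are made up for illustration.

```python
from collections import namedtuple

# Hypothetical record type; the field names follow the program above.
Student = namedtuple("Student", ["Name", "Branch", "Year_Of_Admission"])

# Roll numbers 1 and 2 are made-up keys for illustration only.
students_by_roll = {
    1: Student("Rahul", "C.S.E.", 2017),
    2: Student("Soumya", "C.S.E.", 2019),
}

# Fields can be read by name instead of by position.
for roll_no, student in students_by_roll.items():
    print(roll_no, student.Name, student.Branch, student.Year_Of_Admission)
```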
3. Write a python program to create a Dataclass with following details (Roll
No.,[Name,Branch,Year of Admission]).
Ans:
Program:
from dataclasses import dataclass
@dataclass
class Roll_No:
    name:str
    branch:str
    year_of_admission:int
def generateDataclass():
    Roll=[]
    R1=Roll_No('Rahul','C.S.E.',2017)
    Roll.append(R1)
    R2=Roll_No('Dabba','C.S.E.',2018)
    Roll.append(R2)
    R3=Roll_No('Soumya','C.S.E.',2019)
    Roll.append(R3)
    R4=Roll_No('Rituraj','C.S.E.',2020)
    Roll.append(R4)
    return Roll
def main():
    print('Dataclass:Roll_No')
    Roll=generateDataclass()
    for R in Roll:
        print(R)
if __name__=='__main__':
    main()
Output:
Dataclass:Roll_No
Roll_No(name='Rahul', branch='C.S.E.', year_of_admission=2017)
Roll_No(name='Dabba', branch='C.S.E.', year_of_admission=2018)
Roll_No(name='Soumya', branch='C.S.E.', year_of_admission=2019)
Roll_No(name='Rituraj', branch='C.S.E.', year_of_admission=2020)
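The NamedTuple and Dataclass versions print almost identically, but they behave differently once a field is updated. A minimal sketch of that difference, assuming default (non-frozen) dataclass settings; the names `RollTuple` and `RollRecord` are hypothetical:

```python
from collections import namedtuple
from dataclasses import dataclass, asdict

RollTuple = namedtuple("RollTuple", ["name", "branch", "year_of_admission"])

@dataclass
class RollRecord:
    name: str
    branch: str
    year_of_admission: int

t = RollTuple("Rahul", "C.S.E.", 2017)
d = RollRecord("Rahul", "C.S.E.", 2017)

d.year_of_admission = 2018        # allowed: dataclass instances are mutable by default
try:
    t.year_of_admission = 2018    # namedtuple fields are read-only
except AttributeError:
    tuple_is_immutable = True
else:
    tuple_is_immutable = False

print(asdict(d), tuple_is_immutable)
```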
4. Write, execute and visualise the progress of a python code using tqdm module with
setting proper description.
Ans:
Program:
import tqdm
def primes_up_to(n):
    primes=[]
    with tqdm.trange(2,n) as t:
        for i in t:
            # i is prime if no smaller prime divides it
            i_is_prime=not any(i%p==0 for p in primes)
            if i_is_prime:
                primes.append(i)
            # show the running prime count in the progress bar
            t.set_description(f'{len(primes)} Primes')
    return primes
def main():
    primes=primes_up_to(1000)
    print(primes)
if __name__=='__main__':
    main()
Output:
168 Primes: 100%|██████████| 998/998 [00:03<00:00, 275.09it/s]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103,
107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199,
211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433,
439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563,
569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673,
677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,
821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941,
947, 953, 967, 971, 977, 983, 991, 997]
5. Write a python program to split the data ’X’ into test and train dataset and print them,
use 70-30 split criteria.
Ans:
Program:

import csv
from sklearn.model_selection import train_test_split
def loadDataset():
    with open('Advertising.csv') as csvfile:
        csv_reader=csv.reader(csvfile)
        # the first line holds the column names
        header=next(csv_reader)
        print('Header:',header)
        rows=[]
        for row in csv_reader:
            rows.append(row)
    return rows
def main():
    data_set=loadDataset()
    print('No. Of Rows Before Split In A Data Set:',len(data_set))
    # 70% of the rows go to the train set and 30% to the test set
    Advertising_train,Advertising_test=train_test_split(data_set,test_size=0.30)
    print('No. Of Rows After Split In A Train Data Set:',len(Advertising_train))
    print('No. Of Rows After Split In A Test Data Set:',len(Advertising_test))
if __name__=='__main__':
    main()
Output:
Header: ['SlNo', 'TV', 'Radio', 'Newspaper', 'Sales']
No. Of Rows Before Split In A Data Set: 200
No. Of Rows After Split In A Train Data Set: 140
No. Of Rows After Split In A Test Data Set: 60
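What the 70-30 split does can be seen without scikit-learn: shuffle the rows, then cut the list at the 70% mark. This is a rough sketch of the idea, not how `train_test_split` is implemented; the function name `split_data`, the seed, and the 200 stand-in rows are assumptions made for illustration.

```python
import random

def split_data(data, test_fraction, seed=0):
    """Shuffle a copy of data and cut it into (train, test)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)          # private RNG, leaves global state alone
    test_size = round(len(rows) * test_fraction)
    cut = len(rows) - test_size
    return rows[:cut], rows[cut:]

data = list(range(200))                        # stand-in for 200 data rows
train, test = split_data(data, 0.30)
print(len(train), len(test))
```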
6. Given below is a confusion matrix, compute Precision, Recall, Accuracy and f1 score.
Write a python program using a generic function to compute above given parameters by
taking values of TP, FP, TN, FN as user input.
Ans:
Program:
def precision(TP,FP,FN,TN):
    return TP/(TP+FP)
def recall(TP,FP,FN,TN):
    return TP/(TP+FN)
def accuracy(TP,FP,FN,TN):
    correct=TP+TN
    total=TP+FP+FN+TN
    return correct/total
def f1_score(precision,recall):
    # harmonic mean of precision and recall
    return 2*precision*recall/(precision+recall)
def genericFunction(TP,FP,FN,TN):
    p=precision(TP,FP,FN,TN)
    r=recall(TP,FP,FN,TN)
    a=accuracy(TP,FP,FN,TN)
    f1=f1_score(p,r)
    print('Precision:',p)
    print('Recall:',r)
    print('Accuracy:',a)
    print('F1 Score:',f1)
def main():
    TP,FP,FN,TN=250,750,500,250
    print('From Given Confusion Matrix Data,')
    genericFunction(TP,FP,FN,TN)
    TP=int(input('Enter TP Value: '))
    FP=int(input('Enter FP Value: '))
    FN=int(input('Enter FN Value: '))
    TN=int(input('Enter TN Value: '))
    print('From Given User Input Data,')
    genericFunction(TP,FP,FN,TN)
if __name__=='__main__':
    main()
Output:
From Given Confusion Matrix Data,
Precision: 0.25
Recall: 0.3333333333333333
Accuracy: 0.2857142857142857
F1 Score: 0.28571428571428575
Enter TP Value: 70
Enter FP Value: 4930
Enter FN Value: 13930
Enter TN Value: 981070
From Given User Input Data,
Precision: 0.014
Recall: 0.005
Accuracy: 0.98114
F1 Score: 0.00736842105263158
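For reference, the four functions correspond to the ratios below. The worked values use the first set of counts from the program (TP = 250, FP = 750, FN = 500, TN = 250); the closed form for the F1 score in terms of counts follows from substituting the precision and recall ratios.

```latex
\begin{align*}
\text{Precision} &= \frac{TP}{TP + FP} = \frac{250}{1000} = 0.25 \\
\text{Recall}    &= \frac{TP}{TP + FN} = \frac{250}{750} \approx 0.333 \\
\text{Accuracy}  &= \frac{TP + TN}{TP + FP + FN + TN} = \frac{500}{1750} \approx 0.286 \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN} = \frac{500}{1750} \approx 0.286
\end{align*}
```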
7. Write the python program for k-NN model implemented on iris dataset and print
accuracy of the same.
Ans:
Program:
import requests
import csv
import random
import math
from typing import NamedTuple,List,Tuple,Dict
from sklearn.model_selection import train_test_split
from collections import defaultdict,Counter
Vector=List[float]
class LabeledPoint(NamedTuple):
    point:Vector
    label:str
def majority_vote(labels):
    # labels are assumed to be ordered from nearest to farthest
    vote_counts=Counter(labels)
    winner,winner_count=vote_counts.most_common(1)[0]
    num_winners=len([count for count in vote_counts.values() if count==winner_count])
    if num_winners==1:
        return winner
    else:
        # tie: drop the farthest label and vote again
        return majority_vote(labels[:-1])
def knn_classify(k,labeled_points,new_points):
    # sort the labelled points from nearest to farthest
    by_distance=sorted(labeled_points,key=lambda lp:distance(lp.point,new_points))
    k_nearest_labels=[lp.label for lp in by_distance[:k]]
    return majority_vote(k_nearest_labels)
def dot(v, w):
    assert len(v) == len(w)
    return sum(v_i * w_i for v_i, w_i in zip(v, w))
def sum_of_squares(v):
    return dot(v, v)
def magnitude(v):
    return math.sqrt(sum_of_squares(v))
def subtract(v,w):
    assert len(v)==len(w)
    return [v_i-w_i for v_i,w_i in zip(v, w)]
def distance(v,w):
    return magnitude(subtract(v, w))
def parse_iris_row(row:List[str]):
    measurements=[float(value) for value in row[:-1]]
    # e.g. "Iris-virginica" -> "virginica"
    label=row[-1].split("-")[-1]
    return LabeledPoint(measurements,label)
def main():
    data=requests.get("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data")
    # write and read the same file name
    with open('iris.data','w') as f:
        f.write(data.text)
    with open('iris.data') as f:
        reader=csv.reader(f)
        iris_data=[parse_iris_row(row) for row in reader if row!=[]]
    random.seed(12)
    # note: train_test_split draws from NumPy's random state, so random.seed
    # does not fix this split; pass random_state=... for a repeatable one
    iris_train,iris_test=train_test_split(iris_data,test_size=0.30)
    assert len(iris_train)==105
    assert len(iris_test)==45
    confusion_matrix:Dict[Tuple[str,str],int]=defaultdict(int)
    num_correct=0
    for iris in iris_test:
        predict=knn_classify(5,iris_train,iris.point)
        actual=iris.label
        if predict==actual:
            num_correct+=1
        confusion_matrix[(predict,actual)]+=1
    pct_correct=num_correct/len(iris_test)
    print(pct_correct,confusion_matrix)
if __name__=='__main__':
    main()
Output:
0.9111111111111111 defaultdict(<class 'int'>, {('virginica', 'virginica'): 10, ('versicolor',
'versicolor'): 16, ('setosa', 'setosa'): 15, ('versicolor', 'virginica'): 4})
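The tie-breaking rule inside `majority_vote` is easy to miss in the full program: when two labels share the top count, the farthest neighbour is dropped and the vote is repeated. A small standalone sketch of that behaviour, with a made-up list of five neighbour labels ordered nearest to farthest:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label; on a tie, drop the farthest neighbour and retry."""
    vote_counts = Counter(labels)
    winner, winner_count = vote_counts.most_common(1)[0]
    num_winners = sum(1 for count in vote_counts.values() if count == winner_count)
    if num_winners == 1:
        return winner
    return majority_vote(labels[:-1])  # labels are ordered nearest to farthest

# "setosa" and "virginica" tie at 2 votes each, so the last label is discarded
# and "virginica" wins the second round.
neighbours = ["setosa", "virginica", "versicolor", "virginica", "setosa"]
result = majority_vote(neighbours)
print(result)
```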
8. What is curse of dimensionality? Write a python program to show it by calculating
minimum distances between points when dimensions increase.
Ans: The curse of dimensionality refers to phenomena that arise when classifying,
organizing, and analyzing high-dimensional data but do not occur in low-dimensional
spaces, chiefly data sparsity and the loss of meaningful “closeness” between points.
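A short calculation shows why distances grow with dimension. It assumes two independent points x and y drawn uniformly from the unit cube [0, 1]^d, which is the setting of the random points generated in the program for this question.

```latex
\begin{align*}
\mathbb{E}\big[(x_i - y_i)^2\big] &= \operatorname{Var}(x_i) + \operatorname{Var}(y_i)
   = \tfrac{1}{12} + \tfrac{1}{12} = \tfrac{1}{6} \\
\mathbb{E}\big[\lVert x - y \rVert^2\big] &= \sum_{i=1}^{d} \mathbb{E}\big[(x_i - y_i)^2\big] = \frac{d}{6} \\
\sqrt{\mathbb{E}\big[\lVert x - y \rVert^2\big]} &= \sqrt{d/6} \approx 4.08 \quad \text{for } d = 100
\end{align*}
```

So the typical distance grows like the square root of the dimension, and a point that looks "close" in a few coordinates is unlikely to be close in all of them.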
Program:

import tqdm
import random
import math
import matplotlib.pyplot as plt
def dot(v, w):
    assert len(v)==len(w)
    return sum(v_i*w_i for v_i,w_i in zip(v, w))
def sum_of_squares(v):
    return dot(v, v)
def magnitude(v):
    return math.sqrt(sum_of_squares(v))
def subtract(v,w):
    assert len(v)==len(w)
    return [v_i-w_i for v_i,w_i in zip(v, w)]
def distance(v,w):
    return magnitude(subtract(v, w))
def random_point(dim):
    # a point drawn uniformly from the unit cube
    return [random.random() for _ in range(dim)]
def random_distance(dim,num_pairs):
    return [distance(random_point(dim),random_point(dim)) for _ in range(num_pairs)]
def main():
    dimensions=range(1,101)
    avg_distances,min_distances=[],[]
    random.seed(0)
    for dim in tqdm.tqdm(dimensions,desc='Curse Of Dimensionality'):
        distances=random_distance(dim,10000)
        avg_distances.append(sum(distances)/10000)
        min_distances.append(min(distances))
    # first plot: average and minimum distance against dimension
    plt.xlabel('No. Of Dimensions')
    plt.ylabel('Distance')
    plt.plot(dimensions,avg_distances,label='Average Distance')
    plt.plot(dimensions,min_distances,label='Minimum Distance')
    plt.legend()
    plt.show()
    # second plot: ratio of minimum to average distance
    min_avg_ratio=[min_dist/avg_dist for min_dist,avg_dist in zip(min_distances,avg_distances)]
    plt.xlabel('No. Of Dimensions')
    plt.ylabel('Ratio')
    plt.plot(dimensions,min_avg_ratio,label='Minimum Distance/Average Distance')
    plt.legend()
    plt.show()
if __name__=='__main__':
    main()
Output:



MINOR ASSIGNMENT-5: SIMPLE LINEAR REGRESSION
1. Write a python code to estimate the parameters when we regress Height on Weight
using least-square estimation method. Write a python code to calculate the value of
r-squared. Write down the output of the given programs. The data set is as follows:
Write a python program to plot the scatter plot and regression line between height and
weight when we regress Height on Weight.
Ans:
Program:
import math
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
def dot(v, w):
    assert len(v)==len(w)
    return sum(v_i*w_i for v_i,w_i in zip(v, w))
def sum_of_squares(v):
    return dot(v, v)
def mean(xs):
    return sum(xs)/len(xs)
def de_mean(xs):
    x_bar=mean(xs)
    return [x-x_bar for x in xs]
def variance(xs):
    assert len(xs)>=2
    n=len(xs)
    deviations=de_mean(xs)
    return sum_of_squares(deviations)/(n-1)
def standard_deviation(xs):
    return math.sqrt(variance(xs))
def covariance(xs,ys):
    assert len(xs)==len(ys)
    return dot(de_mean(xs),de_mean(ys))/(len(xs)-1)
def correlation(xs,ys):
    stdev_x=standard_deviation(xs)
    stdev_y=standard_deviation(ys)
    if stdev_x>0 and stdev_y>0:
        return covariance(xs,ys)/stdev_x/stdev_y
    else:
        return 0
def least_square_fit(x,y):
    # slope and intercept of the line y=alpha+beta*x
    beta=correlation(x,y)*standard_deviation(y)/standard_deviation(x)
    alpha=mean(y)-beta*mean(x)
    return alpha,beta
def total_sum_of_squares(y):
    return sum(v**2 for v in de_mean(y))
def predict(alpha,beta,x_i):
    return beta*x_i+alpha
def error(alpha,beta,x_i,y_i):
    return predict(alpha,beta,x_i)-y_i
def sum_of_sqerrors(alpha,beta,x,y):
    return sum(error(alpha,beta,x_i,y_i)**2 for x_i,y_i in zip(x,y))
def r_squared(alpha,beta,x,y):
    return 1.0-(sum_of_sqerrors(alpha,beta,x,y)/total_sum_of_squares(y))
def main():
    height=np.array([65.8,71.51,69.39,68.21,67.78,68.69,69.80,70.01,67.90])
    weight=np.array([112.99,136.48,153.02,142.33,144.29,123.30,141.29,136.46,112.37])
    # x=height, y=weight: this call fits weight as a function of height
    alpha,beta=least_square_fit(height,weight)
    print('(Alpha,Beta): (',alpha,',',beta,')')
    print('R-Squared:',r_squared(alpha,beta,height,weight))
    plt.figure()
    sns.regplot(x=height,y=weight,fit_reg=True,color='green')
    # mark the point of means, which the fitted line passes through
    plt.scatter(np.mean(height),np.mean(weight),color='blue')
    plt.xlabel('Height')
    plt.ylabel('Weight')
    plt.title('The Scatter Plot And Regression Line Between Height And Weight When We Regress Height On Weight.')
    plt.show()
if __name__=='__main__':
    main()
Output:
(Alpha,Beta): ( -176.84906144885235 , 4.513352748452845 )
R-Squared: 0.2678616444043238
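The correlation-based slope used in `least_square_fit` is algebraically the same as the ratio of summed cross-deviations to summed squared deviations, because the (n - 1) factors cancel. A minimal sketch of that form on the same nine data points; `fit_line` is a hypothetical helper and not part of the program above.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = alpha + beta * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta

height = [65.8, 71.51, 69.39, 68.21, 67.78, 68.69, 69.80, 70.01, 67.90]
weight = [112.99, 136.48, 153.02, 142.33, 144.29, 123.30, 141.29, 136.46, 112.37]

alpha, beta = fit_line(height, weight)
print(alpha, beta)
```

Up to floating-point rounding, this should agree with the (Alpha,Beta) pair printed in the output above.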
2. Write a python program to estimate the parameters when we regress Height on
Weight using Gradient Descent estimation method. Write a python code to calculate the
value of r-squared. Compare the results obtained using least square method in Q2. The
data set is as follows:
Ans:
Program:
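import numpy as np
import random
import tqdm
def scalar_multiply(c,v):
    return [c*v_i for v_i in v]
def add(v,w):
    return [v_i+ w_i for v_i,w_i in zip(v,w)]
def gradient_step(v, gradient,step_size):
    # move step_size along the gradient; a negative step_size descends
    assert len(v) == len(gradient)
    step = scalar_multiply(step_size, gradient)
    return add(v, step)
def mean(xs):
    return sum(xs)/len(xs)
def de_mean(xs):
    x_bar=mean(xs)
    return [x-x_bar for x in xs]
# helpers repeated from Question 1 so that this program runs on its own
def total_sum_of_squares(y):
    return sum(v**2 for v in de_mean(y))
def predict(alpha,beta,x_i):
    return beta*x_i+alpha
def error(alpha,beta,x_i,y_i):
    return predict(alpha,beta,x_i)-y_i
def sum_of_sqerrors(alpha,beta,x,y):
    return sum(error(alpha,beta,x_i,y_i)**2 for x_i,y_i in zip(x,y))
def r_squared(alpha,beta,x,y):
    return 1.0-(sum_of_sqerrors(alpha,beta,x,y)/total_sum_of_squares(y))
def main():
    height=np.array([65.8,71.51,69.39,68.21,67.78,68.69,69.80,70.01,67.90])
    weight=np.array([112.99,136.48,153.02,142.33,144.29,123.30,141.29,136.46,112.37])
    nums_epochs=10000
    random.seed(0)
    # random starting point for (alpha, beta)
    guess=[random.random(),random.random()]
    learning_rate=0.00001
    with tqdm.trange(nums_epochs) as t:
        for _ in t:
            alpha,beta=guess
            # partial derivatives of the squared-error loss
            grad_a=sum(2*error(alpha,beta,x_i,y_i) for x_i,y_i in zip(height,weight))
            grad_b=sum(2*error(alpha,beta,x_i,y_i)*x_i for x_i,y_i in zip(height,weight))
            loss=sum_of_sqerrors(alpha,beta,height,weight)
            t.set_description(f"Loss: {loss:.3f}")
            guess=gradient_step(guess,[grad_a,grad_b],-learning_rate)
    alpha,beta=guess
    print('(Alpha,Beta):',guess)
    print('R-Squared:',r_squared(alpha,beta,height,weight))
if __name__=='__main__':
    main()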
Output:
Loss: 60004.613: 100%|██████████| 10000/10000 [02:01<00:00, 82.46it/s]
(Alpha,Beta): [0.8589356157654474, 1.7579287156676298]
R-Squared: -0.6073641012363178
Comparison:
Using least-square estimation method:
(Alpha,Beta): ( -176.84906144885235 , 4.513352748452845 )
R-Squared: 0.2678616444043238
Using Gradient Descent estimation method:
(Alpha,Beta): [0.8589356157654474, 1.7579287156676298]
R-Squared: -0.6073641012363178
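The negative R-squared is consistent with gradient descent not having converged rather than with a different optimum: with raw heights near 68, the loss surface is very flat in the intercept direction, so 10000 steps at a learning rate of 0.00001 barely move alpha away from its random start. The sketch below is one possible remedy and a change to the method above, not the program that produced the reported output. It assumes the heights are centred before descent; the learning rate 0.01, the 1000 steps, and the names `xc` and `alpha_c` are choices made for illustration.

```python
height = [65.8, 71.51, 69.39, 68.21, 67.78, 68.69, 69.80, 70.01, 67.90]
weight = [112.99, 136.48, 153.02, 142.33, 144.29, 123.30, 141.29, 136.46, 112.37]

x_bar = sum(height) / len(height)
xc = [x - x_bar for x in height]        # centred heights (assumption of this sketch)

alpha_c, beta = 0.0, 0.0                # intercept in centred coordinates, slope
learning_rate = 0.01                    # illustrative value
for _ in range(1000):
    errors = [alpha_c + beta * x - y for x, y in zip(xc, weight)]
    grad_a = sum(2 * e for e in errors)                      # d(loss)/d(alpha_c)
    grad_b = sum(2 * e * x for e, x in zip(errors, xc))      # d(loss)/d(beta)
    alpha_c -= learning_rate * grad_a
    beta -= learning_rate * grad_b

alpha = alpha_c - beta * x_bar          # intercept for the uncentred heights
print(alpha, beta)
```

Under these assumptions the descent should land close to the least-squares parameters, which suggests the two methods differ here only in how far the optimisation was allowed to progress.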

3. Consider the "Advertising" data. Import the data (the advertising data is stored as
"Advertizing.csv"). It contains four columns: "Sales", "TV", "Radio", and
"Newspaper". Consider "Sales" as the response variable and "TV", "Radio", and
"Newspaper" as predictor variables. Write a python program to fit the simple linear
regression between "Sales" and "TV", "Sales" and "Radio", and "Sales" and
"Newspaper", and estimate the parameters in each case. Write a python code to
calculate the R-squared value and compare. Show the output of the programs.
Ans:
Program:
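import csv
import math
def mean(xs):
    return sum(xs)/len(xs)
def de_mean(xs):
    x_bar=mean(xs)
    return [x-x_bar for x in xs]
def dot(v, w):
    assert len(v)==len(w)
    return sum(v_i*w_i for v_i,w_i in zip(v, w))
def sum_of_squares(v):
    return dot(v, v)
def variance(xs):
    assert len(xs)>=2
    n=len(xs)
    deviations=de_mean(xs)
    return sum_of_squares(deviations)/(n-1)
def standard_deviation(xs):
    return math.sqrt(variance(xs))
def covariance(xs,ys):
    assert len(xs)==len(ys)
    return dot(de_mean(xs),de_mean(ys))/(len(xs)-1)
def correlation(xs,ys):
    stdev_x=standard_deviation(xs)
    stdev_y=standard_deviation(ys)
    if stdev_x>0 and stdev_y>0:
        return covariance(xs,ys)/stdev_x/stdev_y
    else:
        return 0
def least_square_fit(x,y):
    beta=correlation(x,y)*standard_deviation(y)/standard_deviation(x)
    alpha=mean(y)-beta*mean(x)
    return alpha,beta
# helpers repeated from Question 1 so that this program runs on its own
def total_sum_of_squares(y):
    return sum(v**2 for v in de_mean(y))
def predict(alpha,beta,x_i):
    return beta*x_i+alpha
def error(alpha,beta,x_i,y_i):
    return predict(alpha,beta,x_i)-y_i
def sum_of_sqerrors(alpha,beta,x,y):
    return sum(error(alpha,beta,x_i,y_i)**2 for x_i,y_i in zip(x,y))
def r_squared(alpha,beta,x,y):
    return 1.0-(sum_of_sqerrors(alpha,beta,x,y)/total_sum_of_squares(y))
def loadDataset():
    with open('Advertising.csv') as csvfile:
        csv_reader=csv.reader(csvfile)
        # the first line holds the column names
        header=next(csv_reader)
        print('Header:',header)
        rows=[]
        for row in csv_reader:
            rows.append(row)
    return rows
def main():
    data_set=loadDataset()
    # columns: SlNo, TV, Radio, Newspaper, Sales
    Tv=[float(x[1]) for x in data_set]
    Radio=[float(x[2]) for x in data_set]
    Newspaper=[float(x[3]) for x in data_set]
    Sales=[float(x[4]) for x in data_set]
    # each fit below passes x=Sales and y=the advertising column
    print('Sales And Tv:')
    alpha,beta=least_square_fit(Sales,Tv)
    print('(Alpha,Beta): (',alpha,',',beta,')')
    print('R-Squared:',r_squared(alpha,beta,Sales,Tv))
    print('Sales And Radio:')
    alpha,beta=least_square_fit(Sales,Radio)
    print('(Alpha,Beta): (',alpha,',',beta,')')
    print('R-Squared:',r_squared(alpha,beta,Sales,Radio))
    print('Sales & Newspaper:')
    alpha,beta=least_square_fit(Sales,Newspaper)
    print('(Alpha,Beta): (',alpha,',',beta,')')
    print('R-Squared:',r_squared(alpha,beta,Sales,Newspaper))
if __name__=='__main__':
    main()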
Output:
Header: ['SlNo', 'TV', 'Radio', 'Newspaper', 'Sales']
Sales And Tv:
(Alpha,Beta): ( -33.450227765113596 , 12.871651115358429 )
R-Squared: 0.6118750508500714
Sales And Radio:
(Alpha,Beta): ( 0.2712984805890848 , 1.639700589724438 )
R-Squared: 0.3320324554452946
Sales & Newspaper:
(Alpha,Beta): ( 17.19109011451826 , 0.9529620171497046 )
R-Squared: 0.05212044544430583
