Ilovepdf Merged

IU2141220140 DATA SCIENCE
Practical – 1
Aim – Introduction to Python
• What is Python
Python is a general-purpose, high-level programming language.
• It can be used for :
Console app
Desktop Application
Web app
Mobile app
Machine Learning
IOT applications
• Popular apps developed :
• On GitHub :
• About Python :
Very simple and straightforward syntax.
It can be your first programming language too.
Python is case sensitive
It is an Object Oriented Language
Dynamically typed
Indentation is used in place of curly braces
Variables can be used without declaration
Interpreted Language
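A few of the points above (dynamic typing, no declarations, case sensitivity, indentation instead of braces) can be seen in a short sketch. The variable and function names here are illustrative assumptions, not part of the original practical.

```python
# Dynamic typing: the same name can be bound to values of different types,
# and no declaration is needed before first use.
value = 10
first_type = type(value).__name__
value = "ten"
second_type = type(value).__name__

# Case sensitivity: `total` and `Total` are two separate variables.
total = 1
Total = 2


# Indentation, not curly braces, marks the body of a block.
def describe(number):
    if number % 2 == 0:
        return "even"
    return "odd"


print(first_type, second_type)
print(total, Total)
print(describe(7))
```

Running the lines one at a time in the interactive interpreter is an easy way to watch the type of `value` change.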
• Features of Python :
Emphasis on code readability
Automatic memory management
Dynamically typed
Large Library
Multi-paradigm programming language
With the Python interactive interpreter, it is easy to check Python commands
Platform Independent
• Library for :
Graphical user interfaces
Web frameworks
Multimedia
Databases
Networking
Test frameworks
Automation
Web scraping (Like crawler)
Documentation
System administration
Scientific computing
Text processing
Image processing
IOT
• Download and Installation :
Practical – 2
Aim – Introduction to Google Colab.
• Google is quite aggressive in AI research. Over many years, Google developed
an AI framework called TensorFlow and a development tool called Colaboratory.
Today TensorFlow is open-sourced, and since 2017 Google has made
Colaboratory free for public use. Colaboratory is now known as Google Colab
or simply Colab.
• Another attractive feature that Google offers to developers is the use of
GPUs. Colab supports GPUs at no charge. The reason for making it free
to the public could be to make its software a standard in academia for
teaching machine learning and data science. It may also have the long-term
aim of building a customer base for Google Cloud APIs, which are sold
on a per-use basis.
• Irrespective of the reasons, the introduction of Colab has eased the learning
and development of machine learning applications.
• What Colab Offers You?
• As a programmer, you can perform the following using Google Colab.
• Write and execute code in Python
• Document your code, with support for mathematical equations
• Create/Upload/Share notebooks
• Import/Save notebooks from/to Google Drive
• Import/Publish notebooks from GitHub
• Import external datasets e.g. from Kaggle
• Integrate PyTorch, TensorFlow, Keras, OpenCV
• Free Cloud service with free GPU
Using Guidance :
• Open the following URL in your browser –
https://colab.research.google.com
Your browser would display the following screen (assuming that you are
logged into your Google Drive) –
• Click on the NEW PYTHON 3 NOTEBOOK link at the bottom of the screen.
A new notebook would open up as shown in the screen below.
• Setting Notebook Name

By default, the notebook uses the naming convention UntitledXX.ipynb. To
rename the notebook, click on this name and type in the desired name in the
edit box as shown here –
• Executing Code
To execute the code, click on the arrow on the left side of the code window.
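As a minimal sketch of something to type into the first code cell before clicking the arrow, the snippet below prints the current date and time of the machine running the notebook. The snippet and the variable name `current_time` are illustrative assumptions, not taken from the original practical.

```python
import time

# Ask the machine that runs the notebook for its current date and time.
current_time = time.ctime()
print(current_time)
```

The printed text appears directly below the cell once execution finishes.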
• Adding Code Cells

• To add more code to your notebook, select the following menu options −
• Insert / Code Cell

• Alternatively, just hover the mouse at the bottom center of the Code cell.
When the CODE and TEXT buttons appear, click on the CODE to add a
new cell. This is shown in the screenshot below –
• Changing Cell Order

When your notebook contains a large number of code cells, you may come
across situations where you would like to change the order of execution of
these cells. You can do so by selecting the cell that you want to move and
clicking the UP CELL or DOWN CELL buttons shown in the following
screenshot –
Practical – 3
Aim : Study of various Machine Learning Libraries
Practical – 4
Aim : Introduction to Github Repository
Practical – 5
Aim : Write a program to implement Linear Regression
Practical – 6
Aim : Bank Churning using ANN
Practical – 7
Aim : Binary Classification using CNN.
Binary classification in a CNN (Convolutional Neural Network) refers
to a specific task where the network is trained to classify input data
into one of two categories or classes. CNNs are a type of deep
neural network commonly used for processing and analyzing visual
data, such as images.
Here's an explanation of how binary classification works in CNNs:
1. Input Data: The input data for binary classification in CNNs is
typically images, although it can be applied to other types of data
as well. Each input image is represented as a grid of pixel values.
2. Convolutional Layers: The convolutional layers in a CNN are
responsible for learning features from the input data. These layers
consist of filters (also known as kernels) that convolve across the
input image, extracting relevant features such as edges, textures,
and patterns.
3. Pooling Layers: After each convolutional layer, pooling layers are
often used to reduce the spatial dimensions of the feature maps while
retaining important information. Max pooling is a commonly used
pooling operation where the maximum value within a window is
selected as the output.
4. Flattening: Once the convolutional and pooling layers have been
applied, the resulting feature maps are flattened into a
one-dimensional vector. This flattening process converts the spatial
information into a format that can be fed into a traditional neural
network.
5. Fully Connected Layers: The flattened feature vector is then
passed through one or more fully connected layers, also known as
dense layers. These layers are responsible for learning the
high-level representations of the input data and making predictions. In
binary classification, the output layer typically consists of a single
neuron with a sigmoid activation function, which squashes the
output into a range between 0 and 1, representing the probability
of belonging to one of the two classes.
6. Output: The output of the network is a single value between 0 and
1, which represents the predicted probability that the input
belongs to the positive class (class 1). A threshold (commonly 0.5)
is then applied to this probability to make the final classification
decision. If the predicted probability is above the threshold, the
input is classified as belonging to the positive class; otherwise, it is
classified as belonging to the negative class (class 0).
7. Training: During the training phase, the parameters of the CNN,
including the weights and biases of the convolutional and fully
connected layers, are optimized using an algorithm such as
gradient descent and backpropagation. The network learns to
minimize a loss function, such as binary cross-entropy, which
measures the difference between the predicted probabilities and
the true labels of the training data.
8. Evaluation: Once trained, the performance of the CNN is
evaluated on a separate test dataset to assess its ability to
generalize to unseen data. Metrics such as accuracy, precision,
recall, and F1 score are commonly used to evaluate the
performance of a binary classifier.
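Steps 2 to 7 above can be traced by hand on a tiny input. The following is a rough pure-Python sketch, not how Keras implements these layers: the 5x5 "image", the edge-detecting kernel, and the dense-layer weights and bias are all made-up, untrained values chosen only to keep the arithmetic visible.

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Replace negative values with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]


def flatten(feature_map):
    """Turn the 2-D feature map into a 1-D list."""
    return [v for row in feature_map for v in row]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p):
    """Loss for one example with true label y_true and predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# Toy 5x5 "image" containing a vertical edge, and a made-up kernel.
image = [[0, 0, 1, 1, 1]] * 5
kernel = [[-1, 0, 1]] * 3

features = flatten(max_pool(relu(conv2d_valid(image, kernel))))
weights, bias = [0.5] * len(features), -1.0  # hypothetical, untrained values
probability = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
predicted_class = 1 if probability >= 0.5 else 0
loss = binary_cross_entropy(1, probability)  # assuming the true label is 1

print(features, round(probability, 3), predicted_class, round(loss, 3))
```

Training would repeat this forward pass over many images and adjust the kernel, weights, and bias to lower the loss.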
Example :
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from PIL import Image
from keras.utils.vis_utils import plot_model
from keras.callbacks import ModelCheckpoint

print(os.listdir("../input"))
# Any results you write to the current directory are saved as output.
Using TensorFlow backend.
['training_set', 'test_set']
# Initialising the CNN
classifier = Sequential()

# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a second convolutional layer
classifier.add(Conv2D(64, (3, 3), activation = 'relu'))

# Adding a third convolutional layer
classifier.add(Conv2D(128, (3, 3), activation = 'relu'))

# Adding a fourth convolutional layer
classifier.add(Conv2D(128, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
classifier.add(Dense(units = 64, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))

# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
plot_model(classifier, to_file='cnn_model.png', show_shapes=True, show_layer_names=True)
display(Image.open('cnn_model.png'))
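The layer shapes that plot_model draws can also be worked out by hand. The sketch below is plain arithmetic, not a Keras call. It assumes the Keras defaults for the layers above: 'valid' padding with stride 1 for Conv2D, and a pooling stride equal to the pool size. The helper names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after non-overlapping pooling (any remainder is dropped)."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


size, channels, total_params = 64, 3, 0
# (layer type, number of filters) in the order the layers are added above
layers = [("conv", 32), ("pool", None), ("conv", 64),
          ("conv", 128), ("conv", 128), ("pool", None)]

for kind, filters in layers:
    if kind == "conv":
        total_params += conv_params(channels, filters)
        size, channels = conv_output(size), filters
    else:
        size = pool_output(size)
    print(f"{kind:>4}: {size} x {size} x {channels}")

flat_units = size * size * channels
total_params += flat_units * 64 + 64  # Dense(units = 64)
total_params += 64 * 1 + 1            # Dense(units = 1)
print("flattened units:", flat_units)
print("trainable parameters:", total_params)
```

Most of the parameters sit in the first dense layer, because it connects every flattened unit to each of its 64 neurons.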
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('../input/training_set/training_set/',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('../input/test_set/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
Found 8005 images belonging to 2 classes.
Found 2023 images belonging to 2 classes.
filepath = "best_model.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')
history = classifier.fit_generator(training_set,
                                   steps_per_epoch = 8000,
                                   epochs = 15,
                                   validation_data = test_set,
                                   validation_steps = 2000,
                                   callbacks = [checkpoint])
print(history.history.keys())
Epoch 1/15
8000/8000 [==============================] - 1257s 157ms/step - loss: 0.3255 - acc: 0.8462 - val_loss: 0.4854 - val_acc: 0.8469
Epoch 00001: val_acc improved from -inf to 0.84690, saving model to best_model.hdf5
Epoch 2/15
2533/8000 [========>.....................] - ETA: 12:04 - loss: 0.1356 - acc: 0.9456
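In fit_generator, steps_per_epoch and validation_steps count batches, not images. The sketch below uses only the image counts and batch size reported above to show what the chosen values amount to. It is an observation about the settings, not a change to the original run, and the variable names are illustrative.

```python
import math

train_images, test_images, batch_size = 8005, 2023, 32

# Batches needed to see every image exactly once.
steps_for_one_pass = math.ceil(train_images / batch_size)
validation_steps_for_one_pass = math.ceil(test_images / batch_size)

# What steps_per_epoch = 8000 amounts to with batches of 32 images.
images_per_epoch = 8000 * batch_size
passes_per_epoch = images_per_epoch / train_images

print(steps_for_one_pass, validation_steps_for_one_pass)
print(images_per_epoch, round(passes_per_epoch, 1))
```

With these settings, each reported epoch draws roughly 32 passes' worth of augmented images from the training set, which helps explain the long time per epoch in the log.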
# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()
import numpy as np
from keras.preprocessing import image

test_image = image.load_img('../input/test_set/test_set/cats/cat.4009.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)
result = classifier.predict(test_image)
print(result)
print(training_set.class_indices)
# The sigmoid output is a probability, so compare it with the 0.5 threshold
# instead of testing for exact equality with 1.
if result[0][0] >= 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'
print(prediction)
[[4.2898555e-30]]
{'cats': 0, 'dogs': 1}
cat
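A single test image says little about overall performance. The metrics named in the Evaluation step can be computed from a batch of predicted probabilities, as in the minimal sketch below. The labels and probabilities are made-up values for illustration, not results from the model above, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives after thresholding."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Return accuracy, precision, recall and F1 score as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels (1 = dog, 0 = cat) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]

metrics = classification_metrics(y_true, y_prob)
print(metrics)
```

Precision and recall matter most when the two classes are unbalanced, where accuracy alone can look better than the classifier really is.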
Practical – 8
AIM: Mini Project (Music Recommendation System)
Code:
import numpy as np
import pandas as pd
from typing import List, Dict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
#This dataset contains name, artist, and lyrics for 57650 songs in English.
songs = pd.read_csv('songdata.csv')
songs.head()
songs.shape
(57650, 4)
#We are going to sample only 5000 random songs.
songs = songs.sample(n=5000).drop('link', axis=1).reset_index(drop=True)
#We can also notice the presence of \n in the text, so we are going to remove it.
#regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.
songs['text'] = songs['text'].str.replace(r'\n', '', regex=True)
tfidf = TfidfVectorizer(analyzer='word', stop_words='english')

lyrics_matrix = tfidf.fit_transform(songs['text'])
#We now need to calculate the similarity of one lyric to another. We are going to use cosine
#similarity.
#We want to calculate the cosine similarity of each item with every other item in the
#dataset. So we just pass the lyrics_matrix as argument.
cosine_similarities = cosine_similarity(lyrics_matrix)
#Once we get the similarities, we'll store in a dictionary the names of the most similar
#songs for each song in our dataset.
similarities = {}
for i in range(len(cosine_similarities)):
    # Sort each row of cosine_similarities and take the indexes of the 49 highest scores
    # (the slice [:-50:-1] walks backwards from the last element and stops before the 50th).
    similar_indices = cosine_similarities[i].argsort()[:-50:-1]
    # Store the score, name and artist of each similar song,
    # except the first one, which is the same song. This leaves 48 songs per entry.
    similarities[songs['song'].iloc[i]] = [(cosine_similarities[i][x], songs['song'][x],
                                            songs['artist'][x]) for x in similar_indices][1:]
#Define content-based recommender class.
class ContentBasedRecommender:
    def __init__(self, matrix):
        self.matrix_similar = matrix

    def _print_message(self, song, recom_song):
        rec_items = len(recom_song)
        print(f'The {rec_items} recommended songs for {song} are:')
        for i in range(rec_items):
            print(f"Number {i+1}:")
            print(f"{recom_song[i][1]} by {recom_song[i][2]} with {round(recom_song[i][0], 3)} similarity score")
            print("--------------------")

    def recommend(self, recommendation):
        # Get song to find recommendations for
        song = recommendation['song']
        # Get number of songs to recommend
        number_songs = recommendation['number_songs']
        # Get the most similar songs from the similarities dictionary
        recom_song = self.matrix_similar[song][:number_songs]
        # Print each item
        self._print_message(song=song, recom_song=recom_song)
#Now, instantiate class
recommedations = ContentBasedRecommender(similarities)
recommendation = {
"song": songs['song'].iloc[10],
"number_songs": 4
}
recommedations.recommend(recommendation)
The 4 recommended songs for The Little Drummer Boy are:

Number 1:
Kiss by Rainbow with 0.123 similarity score
--------------------
Number 2:
Tecumseh Valley by Townes Van Zandt with 0.037 similarity score
--------------------
Number 3:
Ikaw Lamang by Carol Banawa with 0.033 similarity score
--------------------
Number 4:
Maging Sino Ka Man by Erik Santos with 0.028 similarity score
--------------------
recommendation2 = {
"song": songs['song'].iloc[120],
"number_songs": 4
}
recommedations.recommend(recommendation2)
The 4 recommended songs for Cherche Encore are:

Number 1:
Lolita by Celine Dion with 0.379 similarity score
--------------------
Number 2:
Nous Vivons Ensemble by Gordon Lightfoot with 0.303 similarity score
--------------------
Number 3:
Les Yeux Ouverts by Beautiful South with 0.261 similarity score
--------------------
Number 4:
Ananas by James Taylor with 0.172 similarity score
--------------------
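The similarity scores printed above come from the cosine similarity between TF-IDF vectors. As a rough pure-Python sketch of that one step, the example below ranks three songs by cosine similarity. The song names and weight vectors are hypothetical stand-ins for rows of lyrics_matrix, and the helper functions are not part of scikit-learn.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical weight vectors for three songs over a four-word vocabulary.
vectors = {
    "song_a": [1.0, 1.0, 0.0, 0.0],
    "song_b": [1.0, 0.0, 1.0, 0.0],
    "song_c": [0.0, 0.0, 0.0, 2.0],
}


def most_similar(name, k):
    """Return the k highest (score, song) pairs, excluding the song itself."""
    scores = [(cosine(vectors[name], vec), other)
              for other, vec in vectors.items() if other != name]
    return sorted(scores, reverse=True)[:k]


ranking = most_similar("song_a", 2)
print(ranking)
```

Unlike the listing above, this sketch excludes the query song by name instead of dropping the first sorted entry, which avoids relying on the song always ranking first against itself.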
