Ilovepdf Merged


IU2141220140 DATA SCIENCE

Practical – 1
Aim – Introduction to Python

• What is Python
Python is a general-purpose, high-level programming language.

• It can be used for :
Console applications
Desktop applications
Web applications
Mobile applications
Machine learning
IoT applications

• Popular apps developed :

• On GitHub :

• About Python :
Very simple and straightforward syntax.
It can be your first programming language too.
Python is case sensitive.
It is an object-oriented language.
It is dynamically typed.
Indentation is used in place of curly braces.
Variables can be used without declaration.
It is an interpreted language.
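
Several of the points above (dynamic typing, case sensitivity, indentation, no declarations) can be seen in a few lines. This is a minimal sketch; the names value, total, Total and describe are illustrative only.

```python
# Dynamic typing: a name can be rebound to values of different types,
# and no declaration is needed before first use.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# Case sensitivity: `total` and `Total` are two distinct names.
total = 1
Total = 2

# Indentation, not curly braces, delimits the body of a block.
def describe(n):
    if n % 2 == 0:
        return "even"
    return "odd"

print(first_type, second_type)
print(total, Total)
print(describe(7))
```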
• Features of Python :
Emphasis on code readability
Automatic memory management
Dynamically typed
Large Library
Multi-paradigm programming language
With the Python interactive interpreter, it is easy to try out Python commands
Platform Independent

• Library for :
Graphical user interfaces
Web frameworks
Multimedia
Databases
Networking
Test frameworks
Automation
Web scraping (e.g. crawlers)
Documentation
System administration
Scientific computing
Text processing
Image processing
IoT
• Download and Installation :

Practical – 2
Aim – Introduction to Google Colab.
• Google is very active in AI research. Over the years it has developed an AI
framework called TensorFlow and a development tool called Colaboratory.
TensorFlow is now open source, and since 2017 Google has made
Colaboratory free for public use. Colaboratory is now known as Google Colab,
or simply Colab.
• Another attractive feature for developers is GPU access: Colab supports GPUs
at no cost. One likely reason for making it free is to establish Google's software
as a standard for teaching machine learning and data science in academia.
Another may be the long-term goal of building a customer base for Google Cloud
APIs, which are sold on a per-use basis.
• Whatever the reasons, Colab has made it easier to learn and develop machine
learning applications.

• What Does Colab Offer You?

• As a programmer, you can perform the following using Google Colab.

• Write and execute code in Python

• Document your code with text that supports mathematical equations

• Create/Upload/Share notebooks

• Import/Save notebooks from/to Google Drive

• Import/Publish notebooks from GitHub

• Import external datasets e.g. from Kaggle

• Integrate PyTorch, TensorFlow, Keras, OpenCV

• Free Cloud service with free GPU

Usage Guide :

• Open the following URL in your browser –
https://colab.research.google.com
Your browser will display the following screen (assuming that you are logged
into your Google Drive) –
• Click on the NEW PYTHON 3 NOTEBOOK link at the bottom of the screen.
A new notebook will open, as shown in the screen below.

• Setting Notebook Name


By default, the notebook uses the naming convention UntitledXX.ipynb. To
rename the notebook, click on this name and type in the desired name in the
edit box as shown here –

• Executing Code
To execute the code, click on the arrow on the left side of the code window.

• Adding Code Cells



• To add more code to your notebook, select the following menu options −

• Insert / Code Cell


• Alternatively, hover the mouse over the bottom center of the code cell.
When the CODE and TEXT buttons appear, click CODE to add a
new cell. This is shown in the screenshot below –

• Changing Cell Order


When your notebook contains a large number of code cells, you may come
across situations where you would like to change the order of execution of
these cells. You can do so by selecting the cell that you want to move and
clicking the UP CELL or DOWN CELL buttons shown in the following
screenshot –

Practical – 3
Aim : Study of various Machine Learning Libraries


Practical – 4
Aim : Introduction to Github Repository


Practical – 5
Aim : Write a program to implement Linear Regression


Practical – 6
Aim : Bank Churning using ANN


Practical – 7
Aim : Binary Classification using CNN.

Binary classification in a CNN (Convolutional Neural Network) refers
to a specific task where the network is trained to classify input data
into one of two categories or classes. CNNs are a type of deep
neural network commonly used for processing and analyzing visual
data, such as images.

Here's an explanation of how binary classification works in CNNs:

1. Input Data: The input data for binary classification in CNNs is
typically images, although it can be applied to other types of data
as well. Each input image is represented as a grid of pixel values.

2. Convolutional Layers: The convolutional layers in a CNN are
responsible for learning features from the input data. These layers
consist of filters (also known as kernels) that convolve across the
input image, extracting relevant features such as edges, textures,
and patterns.

3. Pooling Layers: After each convolutional layer, pooling layers are
often used to reduce the spatial dimensions of the feature maps while
retaining important information. Max pooling is a commonly used
pooling operation where the maximum value within a window is
selected as the output.

4. Flattening: Once the convolutional and pooling layers have been
applied, the resulting feature maps are flattened into a one-dimensional
vector. This flattening process converts the spatial
information into a format that can be fed into a traditional neural
network.

5. Fully Connected Layers: The flattened feature vector is then
passed through one or more fully connected layers, also known as
dense layers. These layers are responsible for learning the high-level
representations of the input data and making predictions. In
binary classification, the output layer typically consists of a single
neuron with a sigmoid activation function, which squashes the
output into a range between 0 and 1, representing the probability
of belonging to one of the two classes.

6. Output: The output of the network is a single value between 0 and
1, which represents the predicted probability that the input
belongs to the positive class (class 1). A threshold (commonly 0.5)
is then applied to this probability to make the final classification
decision. If the predicted probability is above the threshold, the
input is classified as belonging to the positive class; otherwise, it is
classified as belonging to the negative class (class 0).
7. Training: During the training phase, the parameters of the CNN,
including the weights and biases of the convolutional and fully
connected layers, are optimized using an algorithm such as
gradient descent, with the gradients computed by backpropagation.
The network learns to minimize a loss function, such as binary
cross-entropy, which measures the difference between the predicted
probabilities and the true labels of the training data.

8. Evaluation: Once trained, the performance of the CNN is
evaluated on a separate test dataset to assess its ability to
generalize to unseen data. Metrics such as accuracy, precision,
recall, and F1 score are commonly used to evaluate the
performance of a binary classifier.
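
Some of the steps above can be sketched in plain Python to make the arithmetic concrete. This is an illustrative sketch only, not the Keras implementation: the helper names (max_pool_2x2, sigmoid, binary_cross_entropy, classify, precision_recall_f1) and the sample numbers are assumptions made for this example.

```python
import math

def max_pool_2x2(feature_map):
    """Max pooling with a 2x2 window and stride 2 on a 2-D list (step 3)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

def sigmoid(z):
    """Squash a raw score into the range (0, 1) (step 5)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Class 1 if the probability is above the threshold, else class 0 (step 6)."""
    return 1 if probability > threshold else 0

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over a batch (step 7); eps avoids log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (step 8)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up 4x4 feature map: each 2x2 window is reduced to its maximum.
feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 1]]
pooled = max_pool_2x2(feature_map)

probability = sigmoid(2.0)       # raw score 2.0 from a hypothetical output neuron
label = classify(probability)

loss = binary_cross_entropy([1, 0], [0.9, 0.1])
scores = precision_recall_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])

print(pooled, label, round(loss, 4), scores)
```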

Example :

import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from PIL import Image
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
# Import path as used in this run; newer Keras releases expose it as keras.utils.plot_model
from keras.utils.vis_utils import plot_model

print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.
Using TensorFlow backend.

['training_set', 'test_set']

# Initialising the CNN
classifier = Sequential()

# Step 1 - Convolution
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a second convolutional layer
classifier.add(Conv2D(64, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a third convolutional layer
classifier.add(Conv2D(128, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a fourth convolutional layer
classifier.add(Conv2D(128, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
classifier.add(Dense(units = 64, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))

# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

plot_model(classifier, to_file='cnn_model.png', show_shapes=True, show_layer_names=True)
display(Image.open('cnn_model.png'))
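
The feature-map sizes produced by the four convolution and pooling blocks can be checked by hand. The sketch below assumes the Keras defaults used in the model (3x3 kernels with 'valid' padding and stride 1, 2x2 pooling with stride 2); the helper names conv_output and pool_output are illustrative and not part of Keras.

```python
def conv_output(size, kernel=3):
    # 'valid' padding with stride 1: output = size - kernel + 1
    return size - kernel + 1

def pool_output(size, pool=2):
    # 2x2 max pooling with stride 2 halves the size, rounding down
    return size // pool

size = 64                                 # input images are 64 x 64 x 3
filters_per_block = [32, 64, 128, 128]    # filters in the four Conv2D layers
shapes = []
for filters in filters_per_block:
    size = pool_output(conv_output(size))
    shapes.append((size, size, filters))

# Flatten() turns the last feature map into a single vector.
height, width, channels = shapes[-1]
flattened = height * width * channels

print(shapes)     # [(31, 31, 32), (14, 14, 64), (6, 6, 128), (2, 2, 128)]
print(flattened)  # 512
```

Under these assumptions the first Dense layer receives a vector of 512 values.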
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('../input/training_set/training_set/',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')
test_set = test_datagen.flow_from_directory('../input/test_set/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
Found 8005 images belonging to 2 classes.
Found 2023 images belonging to 2 classes.

filepath = "best_model.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
# fit_generator and the 'acc'/'val_acc' metric names match the Keras version used for this run;
# newer releases use classifier.fit(...) and 'accuracy'/'val_accuracy' instead.
history = classifier.fit_generator(training_set,
                                   steps_per_epoch = 8000,
                                   epochs = 15,
                                   validation_data = test_set,
                                   validation_steps = 2000,
                                   callbacks = [checkpoint])
print(history.history.keys())
Epoch 1/15
8000/8000 [==============================] - 1257s 157ms/step - loss: 0.3255 - acc: 0.8462 - val_loss: 0.4854 - val_acc: 0.8469

Epoch 00001: val_acc improved from -inf to 0.84690, saving model to best_model.hdf5
Epoch 2/15
2533/8000 [========>.....................] - ETA: 12:04 - loss: 0.1356 - acc: 0.9456
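
A note on the step counts used above: in Keras, steps_per_epoch and validation_steps count batches, not images. The sketch below works out how many batches one full pass over each dataset needs with the batch size of 32 used here; the helper name steps_for_one_pass is hypothetical.

```python
import math

def steps_for_one_pass(num_images, batch_size):
    # Number of batches needed to see every image once; the last batch may be smaller.
    return math.ceil(num_images / batch_size)

batch_size = 32
train_steps = steps_for_one_pass(8005, batch_size)   # training images found above
val_steps = steps_for_one_pass(2023, batch_size)     # test images found above

# With steps_per_epoch = 8000, each epoch draws this many augmented images:
images_per_epoch = 8000 * batch_size

print(train_steps, val_steps, images_per_epoch)  # 251 64 256000
```

So 8000 steps per epoch corresponds to roughly 32 passes over the 8005 training images, which helps explain the long epoch time in the log. Setting steps_per_epoch to about 251 and validation_steps to about 64 would give one pass per epoch.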

# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()


import numpy as np
from keras.preprocessing import image

test_image = image.load_img('../input/test_set/test_set/cats/cat.4009.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
# Note: the generators above rescale pixels by 1/255, but this run passes unscaled pixel values.
test_image = np.expand_dims(test_image, axis = 0)
result = classifier.predict(test_image)
print(result)
print(training_set.class_indices)
# The sigmoid output is a probability, so compare it with the 0.5 threshold
# instead of testing for exact equality with 1.
if result[0][0] > 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'
print(prediction)
[[4.2898555e-30]]
{'cats': 0, 'dogs': 1}
cat


Practical – 8
AIM: Mini Project (Music Recommendation System)
Code:
import numpy as np
import pandas as pd
from typing import List, Dict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

#This dataset contains name, artist, and lyrics for 57650 songs in English.
songs = pd.read_csv('songdata.csv')
songs.head()

songs.shape
(57650, 4)
# We are going to sample only 5000 random songs.
songs = songs.sample(n=5000).drop('link', axis=1).reset_index(drop=True)
# The lyrics contain newline characters (\n), so we remove them.
# regex=False makes this a literal replacement regardless of the pandas version.
songs['text'] = songs['text'].str.replace('\n', '', regex=False)

# Build a TF-IDF matrix of the lyrics, ignoring common English stop words.
tfidf = TfidfVectorizer(analyzer='word', stop_words='english')
lyrics_matrix = tfidf.fit_transform(songs['text'])

# We now need to calculate the similarity of one lyric to another, using cosine similarity.
# We want the cosine similarity of each item with every other item in the dataset,
# so we pass lyrics_matrix as the only argument.
cosine_similarities = cosine_similarity(lyrics_matrix)
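
To show what the TF-IDF and cosine similarity steps compute, here is a small pure-Python sketch on three made-up one-line "lyrics". It is a simplified illustration and not the scikit-learn implementation: it splits on whitespace, removes no stop words, and uses a smoothed IDF with unit-length rows. The function names and sample texts are assumptions made for this example.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one L2-normalised TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors

def cosine_similarity_matrix(vectors):
    """Rows are unit length, so the dot product equals the cosine similarity."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

# Hypothetical sample texts, not taken from the dataset.
lyrics = [
    "dance all night under the moon",
    "we dance under the silver moon tonight",
    "rain falls on a cold grey city",
]
sims = cosine_similarity_matrix(tfidf_vectors(lyrics))

for row in sims:
    print([round(value, 3) for value in row])
```

The first two texts share several words and get a clearly positive score, while the third shares no words with the first and scores 0. Each text has similarity 1 with itself, which is why the song itself has to be excluded when picking recommendations.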

# Once we have the similarities, we store in a dictionary the 50 most similar
# songs for each song in our dataset.
similarities = {}
for i in range(len(cosine_similarities)):
    # Sort the row in descending order of similarity. Position 0 is normally the song
    # itself (similarity 1), so take positions 1 to 50.
    similar_indices = cosine_similarities[i].argsort()[::-1][1:51]
    # Store (score, song name, artist) for each of the 50 most similar songs.
    similarities[songs['song'].iloc[i]] = [(cosine_similarities[i][x], songs['song'][x], songs['artist'][x]) for x in similar_indices]
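
The ranking step can also be written so that the song itself is excluded by index, rather than by assuming it sorts first. This is a minimal sketch on a made-up similarity row; the function name top_similar and the numbers are hypothetical.

```python
def top_similar(similarity_row, own_index, n):
    """Return (score, index) pairs for the n most similar items, excluding own_index."""
    ranked = sorted(
        (i for i in range(len(similarity_row)) if i != own_index),
        key=lambda i: similarity_row[i],
        reverse=True,
    )
    return [(similarity_row[i], i) for i in ranked[:n]]

# Made-up similarities of song 0 to five songs (including itself at index 0).
row = [1.0, 0.12, 0.40, 0.03, 0.27]
best = top_similar(row, own_index=0, n=3)
print(best)  # [(0.4, 2), (0.27, 4), (0.12, 1)]
```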

# Define the content-based recommender class.
class ContentBasedRecommender:
    def __init__(self, matrix):
        self.matrix_similar = matrix

    def _print_message(self, song, recom_song):
        rec_items = len(recom_song)

        print(f'The {rec_items} recommended songs for {song} are:')
        for i in range(rec_items):
            print(f"Number {i+1}:")
            print(f"{recom_song[i][1]} by {recom_song[i][2]} with {round(recom_song[i][0], 3)} similarity score")
            print("--------------------")

    def recommend(self, recommendation):
        # Get song to find recommendations for
        song = recommendation['song']
        # Get number of songs to recommend
        number_songs = recommendation['number_songs']
        # Get the most similar songs from the similarities dictionary
        recom_song = self.matrix_similar[song][:number_songs]
        # Print each item
        self._print_message(song=song, recom_song=recom_song)

# Now, instantiate the class
recommedations = ContentBasedRecommender(similarities)

recommendation = {
    "song": songs['song'].iloc[10],
    "number_songs": 4
}
recommedations.recommend(recommendation)

The 4 recommended songs for The Little Drummer Boy are:


Number 1:
Kiss by Rainbow with 0.123 similarity score
--------------------
Number 2:
Tecumseh Valley by Townes Van Zandt with 0.037 similarity score
--------------------
Number 3:
Ikaw Lamang by Carol Banawa with 0.033 similarity score
--------------------
Number 4:
Maging Sino Ka Man by Erik Santos with 0.028 similarity score
--------------------

recommendation2 = {
"song": songs['song'].iloc[120],
"number_songs": 4
}
recommedations.recommend(recommendation2)

The 4 recommended songs for Cherche Encore are:


Number 1:
Lolita by Celine Dion with 0.379 similarity score
--------------------
Number 2:
Nous Vivons Ensemble by Gordon Lightfoot with 0.303 similarity score
--------------------
Number 3:
Les Yeux Ouverts by Beautiful South with 0.261 similarity score
--------------------
Number 4:
Ananas by James Taylor with 0.172 similarity score
--------------------
