Internship Report


INTERNSHIP REPORT

on
MACHINE LEARNING IN PYTHON

Submitted by
NIMISHA A : 20320055

in partial fulfillment of the requirements for the award of the degree

of

BACHELOR OF TECHNOLOGY
in
ELECTRONICS AND COMMUNICATION

Under the supervision of Dr. Ameer P M, Assistant Professor, Electronics and Communication
Department, NIT Calicut
(Duration: 1st June, 2023 to 30th June, 2023)

DIVISION OF ELECTRONICS ENGINEERING


SCHOOL OF ENGINEERING
COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY
KOCHI-682022

APRIL 2023

DIVISION OF ELECTRONICS ENGINEERING
SCHOOL OF ENGINEERING
COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY
KOCHI-682022

CERTIFICATE

This is to certify that the "Internship Report" submitted by NIMISHA A (Reg. No.: 20320055) is the
work done by her and submitted during the 2023-2024 academic year, in partial fulfillment of the
requirements for the award of the degree of BACHELOR OF TECHNOLOGY in ELECTRONICS
AND COMMUNICATION ENGINEERING, at NIT CALICUT.

College Internship Coordinator Department Internship Coordinator

ACKNOWLEDGEMENT

I would like to express my sincere gratitude and appreciation to all those who have supported and
contributed to the successful completion of my internship report. This internship has been an invaluable
experience, and I am grateful for the guidance, mentorship, and assistance I have received throughout this
journey.

First and foremost, I extend my heartfelt thanks to Dr. Ameer P M, my internship supervisor, for their
unwavering support, guidance, and expertise.

I would like to thank Mr. Nirmal Joseph of Rajiv Gandhi Institute of Technology, Kottayam, for his
guidance and advice in securing and completing this internship.

I am indebted to my professors and faculty members at SOE, CUSAT for their academic guidance, which
laid the foundation for my internship.

Lastly, I want to express my deep appreciation to my family and friends for their unwavering support,
understanding, and encouragement throughout my internship journey.

Nimisha A

ABSTRACT

Diabetic retinopathy is a progressive condition that damages the blood vessels in the retina, leading to
vision impairment and, if left untreated, even blindness. Early detection and timely intervention are
crucial for preventing irreversible vision loss and providing appropriate medical care to patients.
Transfer learning, a technique in machine learning, has proven to be highly relevant in improving the
accuracy of diabetic retinopathy detection systems: it enables the transfer of knowledge and learned
representations from pre-trained models to new tasks or datasets. The objective is to increase the
accuracy of transfer learning for detecting diabetic retinopathy using the DenseNet and Xception
models and to compare their efficiencies.

Methodologies: A dataset of retinal images containing both diabetic and non-diabetic cases was
collected and split into training and validation sets, ensuring a balanced representation of both classes.
Data augmentation techniques such as rotation, flipping, and zooming were applied to increase the
diversity of the training set and improve generalization. The pre-trained models DenseNet and
Xception, which have demonstrated strong performance in computer vision tasks, were chosen. The
initial layers of the pre-trained models were frozen to retain the learned features while preventing
excessive modification. Different hyperparameters, such as learning rate, batch size, and number of
epochs, were explored to find the configuration that maximizes validation accuracy.

Learning Objectives/Internship Objectives

Internships are generally thought of as reserved for college students looking to gain experience in a
particular field. However, a wide array of people can benefit from training internships in order to
receive real-world experience and develop their skills.

An objective for this position should emphasize the skills you already possess in the area and your
interest in learning more. Internships are utilized in a number of different career fields, including
architecture, engineering, healthcare, economics, advertising, and many more. Some internships
allow individuals to perform scientific research, while others are specifically designed to allow
people to gain first-hand working experience.

Utilizing internships is a great way to build your resume and develop skills that can be emphasized in
your resume for future jobs. When you are applying for a training internship, make sure to highlight
any special skills or talents that can make you stand apart from the rest of the applicants so that you
have an improved chance of landing the position.

CONTENTS

About NIT Calicut
1. Introduction
   1.1 Objective
   1.2 Overview
2. Technology
   2.1 Google Colab
   2.2 Machine Learning using Python
   2.3 Neural Network and Deep Learning
3. Methodologies
4. Code
   4.1 Using DenseNet
   4.2 Using Xception
5. Result
   5.1 Using DenseNet
   5.2 Using Xception
6. Conclusion
7. Bibliography

ABOUT NIT CALICUT

National Institute of Technology Calicut (NITC) is a prestigious public technical university in
Kozhikode, Kerala, India. Established in 1961, it is one of the National Institutes of Technology
(NITs) recognized as Institutes of National Importance by the Government of India. NITC consistently
ranks among the top universities in India for engineering and technology education.

NITC offers undergraduate and postgraduate programs in various disciplines of engineering, science,
technology, and management. The curriculum is updated regularly to meet industry needs and
international standards. The institute is renowned for its highly qualified and experienced faculty
members with expertise in their respective fields. NITC boasts an excellent placement record, with
over 90% of graduates securing jobs in top companies, and a dedicated placement cell that provides
career guidance and support to students. Campus life is vibrant, with numerous student clubs,
societies, and cultural events, and NITC fosters a diverse and inclusive environment for students from
all over India.

1. INTRODUCTION

1.1 Objective
The objective of this project is to improve the validation accuracy of a transfer learning model for
detecting diabetic retinopathy using the DenseNet and Xception models on a dataset of 10,000 retinal
images with two classes (diabetic and non-diabetic). The primary focus is to enhance the accuracy of
the model's predictions and provide a reliable tool for early detection of diabetic retinopathy.

1.2 Overview
Diabetic retinopathy is a severe complication of diabetes that affects the blood vessels in the retina
and can lead to vision impairment or blindness if not detected and treated early. Transfer learning, a
technique in machine learning, allows us to leverage pre-trained models like DenseNet and Xception
to enhance the accuracy of diabetic retinopathy detection systems. These models have been trained on
large-scale datasets and possess a deep understanding of visual patterns and structures.
In this project, we aim to utilize the power of transfer learning to develop a robust and accurate model
for diabetic retinopathy detection. We will start by preparing the dataset, which consists of 10,000
retinal images with binary classification. The dataset will be split into training and validation sets,
ensuring a balanced representation of both classes. We will select and fine-tune the DenseNet and
Xception models, adapting them to the specific characteristics of the retinal image dataset. Custom
layers will be added on top of the pre-trained models for binary classification.
Throughout the project, we will employ various preprocessing techniques, such as image resizing and
data augmentation, to enhance the quality and diversity of the dataset. Hyperparameter tuning will be
conducted to optimize the model's performance, and a comparative analysis will be carried out to
determine the superior model between DenseNet and Xception for diabetic retinopathy detection. By
the end of this project, we expect to achieve higher validation accuracy in detecting diabetic
retinopathy, providing a reliable and efficient tool for early diagnosis.

2. TECHNOLOGY

2.1 Google Colab

Google Colab (short for "Colaboratory") is a cloud-based platform provided by Google that allows
users to write, execute, and collaborate on Python code using Jupyter notebooks. It offers a
convenient and interactive environment for data analysis, machine learning, and general programming
tasks.
Google Colab notebooks are similar to Jupyter notebooks. They are quite practical because Google
hosts them, allowing us to run a notebook without using any of our own computing resources. These
notebooks can also be shared, allowing others to run our programs with ease in a standard
environment without relying on our personal computers. Nevertheless, when initializing, we might
need to install a few libraries in the environment.
Colab provides access to powerful hardware resources, including CPUs, GPUs, and TPUs (Tensor
Processing Units). It also comes pre-installed with popular Python libraries and packages commonly
used in data analysis and machine learning, such as TensorFlow, PyTorch, NumPy, Pandas, and
Matplotlib, which saves time and effort in setting up the development environment.

2.2 Machine Learning using Python


Machine learning is a subfield of artificial intelligence (AI) that focuses on developing algorithms and
models that allow computers to learn from data and make predictions or decisions without explicit
programming. Machine learning relies on data as its primary source of information: models are
trained by exposing them to labeled or unlabeled data, and the quality, quantity, and relevance of the
data play a significant role in the performance and accuracy of the models. The performance of
machine learning models is evaluated using various metrics, such as accuracy, precision, recall,
F1 score, and area under the curve (AUC).
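As a brief illustration (a minimal sketch with made-up labels, using scikit-learn), these metrics can be
computed as follows:

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hypothetical ground-truth labels, hard predictions, and class-1 probabilities
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]

print('Accuracy :', accuracy_score(y_true, y_pred))
print('Precision:', precision_score(y_true, y_pred))
print('Recall   :', recall_score(y_true, y_pred))
print('F1 score :', f1_score(y_true, y_pred))
print('AUC      :', roc_auc_score(y_true, y_prob))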

Machine learning using Python has gained significant popularity due to the rich ecosystem of
libraries and frameworks available for data manipulation, modeling, and evaluation. Some key Python
libraries commonly used in machine learning are NumPy, Pandas, scikit-learn, TensorFlow, Keras,
PyTorch, Matplotlib, and Seaborn:
NumPy: NumPy is a fundamental library for numerical computations in Python. It provides support
for large, multi-dimensional arrays and a collection of mathematical functions, which are essential for
data preprocessing and manipulation.
TensorFlow: TensorFlow is a popular open-source library for deep learning. It provides a flexible
framework for building and training neural networks.
Keras: Keras is a high-level neural network API that runs on top of TensorFlow. It simplifies the
process of building and training deep learning models by providing a user-friendly interface and a
collection of pre-built neural network layers and architectures.
Matplotlib and Seaborn: Matplotlib and Seaborn are visualization libraries that enable the creation
of various types of plots and charts. They help in visualizing data, understanding patterns, and
presenting results.
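As a small illustration of how these libraries fit together (a minimal sketch with synthetic data):

import numpy as np
from matplotlib import pyplot as plt

# Generate a synthetic signal with NumPy
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Visualize it with Matplotlib
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('A simple NumPy/Matplotlib example')
plt.show()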

2.3 Neural Network and Deep Learning

Neural networks and deep learning are subfields of machine learning that focus on modeling and
simulating the behavior of the human brain to solve complex problems.
Neural Network: A neural network is a computational model inspired by the structure and
functioning of biological neural networks. It consists of interconnected artificial neurons, also known
as nodes or units, organized into layers. The nodes in each layer receive inputs, perform computations
using activation functions, and pass the results to the next layer until the final output is produced.
Neural networks can have multiple hidden layers between the input and output layers, enabling them
to capture complex patterns and relationships in the data.
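To make this concrete, a minimal sketch of such a network in Keras (the layer widths and the
8-feature input are arbitrary choices for illustration) is:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# A small feed-forward network: input -> one hidden layer -> output
model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),  # hidden layer with ReLU activation
    Dense(1, activation='sigmoid')                   # output layer for binary classification
])
model.summary()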
Deep Learning: Deep learning is a subset of machine learning that utilizes deep neural networks with
multiple layers to learn hierarchical representations of data. It leverages the power of computational
resources, large datasets, and sophisticated optimization algorithms to train deep neural networks.
Deep learning models can automatically learn and extract high-level features from raw data,
eliminating the need for manual feature engineering. This ability to learn hierarchical representations
makes deep learning effective in solving complex tasks, such as image and speech recognition.

Neural networks and deep learning have been successfully applied in various fields, including:
● Image and object recognition
● Natural language processing and machine translation
● Speech and audio recognition
● Recommendation systems
● Autonomous vehicles
● Medical diagnosis and image analysis
● Financial market analysis
● Gaming and reinforcement learning

3. METHODOLOGIES

The methodology for increasing the validation accuracy of a transfer learning model for detecting
diabetic retinopathy using the DenseNet and Xception models, on a dataset of 10,000 images with two
classes (diabetic and non-diabetic), is outlined as follows:

Dataset Preparation:
● Collected a dataset of retinal images containing both diabetic and non-diabetic cases.
● Split the dataset into training (80%) and validation (20%) sets, ensuring a balanced
representation of both classes.

Transfer Learning Model Selection:
● Chose the pre-trained models DenseNet and Xception, which have demonstrated strong
performance in computer vision tasks.
● Utilized pre-trained weights obtained from large-scale datasets like ImageNet, as these models
have learned generic features applicable to various visual recognition tasks.

Model Architecture:
● Removed the fully connected layers at the top of the pre-trained models and added custom
layers appropriate for the binary classification task of diabetic retinopathy detection:
x = GlobalAveragePooling2D()(x): global average pooling is applied to reduce the spatial
dimensions of the tensor.
x = Dropout(0.2)(x): dropout regularization with a rate of 0.2 (20%) is applied to prevent
overfitting.
x = Dense(4096, activation='relu', kernel_regularizer=regularizers.l2(0.01))(x): a new dense
layer with 4096 units, ReLU activation, and L2 regularization is added.
output = Dense(2, activation='softmax')(x): the final dense layer is added with 2 units
representing the binary classification task, and softmax activation is used to obtain class
probabilities.

Fine-tuning and Training:
● Froze the initial layers of the pre-trained models to retain the learned features while preventing
excessive modification.

for layer in densenet.layers:
    layer.trainable = False

for layer in base_model.layers:
    layer.trainable = False

● Unfroze the later layers so that training resumes from layer 175 onwards:

for layer in densenet.layers[175:]:
    layer.trainable = True

● Defined an appropriate learning rate for training the newly added custom layers and gradually
fine-tuned the earlier layers to adapt to the specific retinal image dataset.

adam = Adam(learning_rate=0.00001)

● Employed the Adam optimizer and a suitable loss function (e.g., binary cross-entropy) for the
binary classification task. Note that the Adam instance, not the string 'adam', must be passed to
compile() for the custom learning rate to take effect:

model.compile(optimizer=adam, loss='binary_crossentropy',
              metrics=['accuracy'])

● Trained the model on the training set, iteratively adjusting the weights to minimize the loss and
maximize accuracy.

Model Evaluation and Validation:
● Evaluated the model's performance on the validation set at regular intervals during training to
monitor its progress and detect overfitting.
● Calculated the accuracy to assess the model's effectiveness in detecting diabetic retinopathy.
● Used techniques like early stopping to prevent overfitting and ensure optimal model performance.

Hyperparameter Tuning:
● Experimented with different hyperparameters, such as learning rate, batch size, and number of
epochs, to find the optimal configuration that maximizes validation accuracy (see the sketch at the
end of this section).

Comparative Analysis:
● Compared the validation accuracy and performance of the DenseNet and Xception models to
identify which model achieves better results for diabetic retinopathy detection.
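A minimal sketch of such a hyperparameter sweep (the candidate values are illustrative, and
build_model() and make_generators() are hypothetical stand-ins for the model and data code in
Section 4) might look like:

from tensorflow.keras.optimizers import Adam

best_val_acc, best_config = 0.0, None
# Try a few candidate learning rates and batch sizes (values are illustrative)
for lr in [1e-3, 1e-4, 1e-5]:
    for batch_size in [32, 64]:
        model = build_model()                               # hypothetical helper that rebuilds the network
        train_gen, valid_gen = make_generators(batch_size)  # hypothetical data-loading helper
        model.compile(optimizer=Adam(learning_rate=lr),
                      loss='binary_crossentropy', metrics=['accuracy'])
        history = model.fit(train_gen, validation_data=valid_gen,
                            epochs=10, verbose=0)
        val_acc = max(history.history['val_accuracy'])
        if val_acc > best_val_acc:
            best_val_acc, best_config = val_acc, (lr, batch_size)

print('Best (learning rate, batch size):', best_config, 'with val accuracy:', best_val_acc)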

4. CODE

4.1 Using DenseNet model

import os
os.chdir('/content/drive/MyDrive')
import tensorflow as tf
print('Version of tensorflow :', tf.__version__)
import numpy as np
from tensorflow.keras.applications.densenet import DenseNet121
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import glob
from matplotlib import pyplot as plt
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
from datetime import datetime

target_accuracy = 0.9
target_validation_accuracy = 0.9

# Load DenseNet121 pre-trained on ImageNet, without its classification head
densenet = DenseNet121(weights='imagenet',
                       include_top=False,
                       input_shape=(224, 224, 3))

# Visualise the model
densenet.summary()

# Freeze all pre-trained layers initially
for layer in densenet.layers:
    layer.trainable = False

from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense, Dropout

# Custom classification head on top of the DenseNet feature extractor
x = densenet.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.2)(x)
x = Dense(4096, activation='relu',
          kernel_regularizer=regularizers.l2(0.01))(x)
output = Dense(2, activation='softmax')(x)

model = Model(inputs=densenet.input, outputs=output)
model.summary()

# Unfreeze the layers from index 175 onwards for fine-tuning
for layer in densenet.layers[175:]:
    layer.trainable = True

# Define Adam with a low learning rate; pass the instance (not the string 'adam')
# to compile() so that the custom learning rate is actually used
adam = Adam(learning_rate=0.00001)
model.compile(optimizer=adam, loss='binary_crossentropy',
              metrics=['accuracy'])
train_dir = '/content/drive/MyDrive/Internship neural/'
valid_dir = '/content/drive/MyDrive/Internship neural/'

# Rescale pixel values and reserve 20% of the images for validation
train_datagen = ImageDataGenerator(rescale=1./255,
                                   validation_split=0.2)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=64,
    class_mode='categorical',
    subset='training')

# Load validation data; using subset='validation' on the same generator keeps
# the 80/20 split described in the methodology (the original code created a
# separate generator over the full directory, which would overlap the training set)
valid_generator = train_datagen.flow_from_directory(
    valid_dir,
    target_size=(224, 224),
    batch_size=64,
    class_mode='categorical',
    subset='validation')

# Define early stopping based on validation accuracy
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    patience=5,
    restore_best_weights=True
)

from matplotlib import image as mpimg

# Display one sample image from each class directory
for link in os.listdir(train_dir):
    lk = train_dir + '/' + link
    for lk1 in os.listdir(lk + '/'):
        path = lk + '/' + lk1
        img = mpimg.imread(path)
        plt.imshow(img)
        plt.title(link)
        plt.show()
        break
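The script above defines the early-stopping callback, but the training call itself does not appear in the
report; a minimal sketch of the missing fit step, consistent with the Xception section below and the
50-epoch experiments in Section 5, would be:

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=50,
    validation_data=valid_generator,
    validation_steps=valid_generator.samples // valid_generator.batch_size,
    callbacks=[early_stopping]  # stop once validation accuracy stops improving
)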

4.2 Using Xception model

import os
os.chdir('/content/drive/MyDrive')
import tensorflow as tf
from tensorflow.keras.applications.xception import Xception
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot as plt
target_accuracy = 0.9
target_validation_accuracy = 0.9

# Load Xception pre-trained on ImageNet, without its classification head
base_model = Xception(weights='imagenet', include_top=False,
                      input_shape=(224, 224, 3))

# Visualise the model
base_model.summary()

# Freeze all pre-trained layers
for layer in base_model.layers:
    layer.trainable = False

from keras.layers import Dropout

# Custom classification head
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(4096, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(2, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_dir = '/content/drive/MyDrive/Internship neural/TRAIN'
validation_dir = '/content/drive/MyDrive/Internship neural/TEST'

# TRAIN and TEST are separate directories, so no validation_split/subset is
# needed here (the original code passed subset='training' without setting
# validation_split, which has no effect, and read the validation images
# through train_datagen instead of test_datagen)
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
from matplotlib import image as mpimg

# Display one sample image from each class directory
for link in os.listdir(train_dir):
    lk = train_dir + '/' + link
    for lk1 in os.listdir(lk + '/'):
        path = lk + '/' + lk1
        img = mpimg.imread(path)
        plt.imshow(img)
        plt.title(link)
        plt.show()
        break

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=40,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size,
)
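To visualize the training behaviour discussed in Section 5, the accuracy curves recorded in the
returned history object can be plotted (a short sketch, assuming the fit call above has completed):

plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()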

5. RESULT

5.1 Using DenseNet model

1: No Augmentation, No Dropout/Regularization, Early Stopping (Patience 5), No Learning Rate
Specification, 50 Epochs
● Training Accuracy: 98%
● Validation Accuracy: 61%
● Observation: The model achieved high training accuracy but much lower validation accuracy,
indicating potential overfitting. The model might have memorized the training data without
generalizing well to unseen data.

2: No Augmentation, Dropout (0.5), Regularization, 50 Epochs, Early Stopping (Patience 5), No
Learning Rate Specification
● Training Accuracy: 98%
● Validation Accuracy: 75%
● Observation: Introducing dropout (0.5) and regularization resulted in improvements. The model's
training accuracy did not change while the validation accuracy increased, suggesting that the
regularization techniques were appropriate for the given task and reduced the overfitting.
3: No Augmentation, Dropout (0.2), Regularization, 50 Epochs, Early Stopping (Patience 5)
● Training Accuracy: 98%
● Validation Accuracy: 80%
● Observation: This experiment showed better results than the previous ones. The introduction of
dropout (0.2) and regularization, along with extended training (50 epochs), led to improved
validation accuracy. The model achieved 80% validation accuracy, suggesting better generalization
and performance on unseen data.

4: With Augmentation, Dropout (0.2), Regularization, Early Stopping (Patience 5), 50 Epochs,
Learning Rate 0.00001
● Training Accuracy: 70%
● Validation Accuracy: 68%

● Observation: This experiment did not improve on the previous one. Both the training accuracy
and the validation accuracy decreased; augmentation did not lead to significant improvements in
validation accuracy.

Overall, these observations highlight the importance of hyperparameter tuning and regularization
techniques in transfer learning tasks. Experimenting with dropout, regularization, learning rate, and
early stopping can help improve the model's performance, prevent overfitting, and achieve better
generalization. The results obtained in Experiment 3, with appropriate regularization and dropout,
demonstrated the most promising performance in terms of validation accuracy.

5.2 Using Xception model

1: No Dropout, No Augmentation
● Training Accuracy: 80%
● Validation Accuracy: 65%
● Observation: The model seems to be overfitting the training data. Overfitting occurs when a model
performs well on the training data but fails to generalize to unseen data. In this case, the model
achieves a higher accuracy on the training data (80%) than on the validation data (65%).

2: No Augmentation, Dropout (0.5), Added Dense Layer
● Training Accuracy: 90%
● Validation Accuracy: 68%
● Observation: The model achieved a training accuracy of 90% and a validation accuracy of 68%,
indicating that it was able to effectively learn and make accurate predictions on the training data.
However, the relatively lower validation accuracy suggests that the model may not be generalizing
well to unseen data, indicating a potential issue with overfitting.

3: No Augmentation, Dropout (0.5), Added Dense Layer, 40 Epochs
● Training Accuracy: 95%
● Validation Accuracy: 70%
● Observation: The model achieved a high training accuracy of 95% and a validation accuracy of
70%. This suggests that the model has learned the patterns and features present in the training data
and can make accurate predictions for most of the samples. However, the relatively lower
validation accuracy suggests that the model may still be overfitting to some extent.
To enhance the model's generalization capabilities, further adjustments and experimentation are
needed. This may involve fine-tuning hyperparameters, exploring different regularization techniques,
or considering other architectural modifications. By finding the right balance between model
complexity and generalization, we can aim to improve the overall performance and ensure accurate
predictions on unseen data.

6. CONCLUSION

In conclusion, the project involved the use of transfer learning with DenseNet and Xception models
for the detection of diabetic retinopathy using a dataset of 10,000 images with two classes. Several
experiments were conducted to observe the impact of different configurations and techniques on the
model's performance. The conclusions drawn are:
● Transfer learning proved to be effective in leveraging pre-trained models such as DenseNet and
Xception for diabetic retinopathy detection. These models, which were trained on large-scale
datasets, provided a strong foundation for extracting relevant features and patterns from retinal
images.
● The initial experiments highlighted the challenge of overfitting, where the model performed well on
the training set but struggled to generalize to unseen data, resulting in lower validation accuracy. This
indicated the need for techniques to prevent overfitting and improve generalization.
● Hyperparameter tuning, involving dropout, regularization, learning rate adjustments, and early
stopping, played a crucial role in improving the model's performance. In the case of DenseNet,
Experiment 3, with dropout (0.2), regularization, and extended training (50 epochs), achieved the
highest validation accuracy of 80%; for Xception, dropout (0.5) with an added dense layer gave a
validation accuracy of 70%, indicating improved generalization and better performance on unseen
data.
● The results showed that the use of data augmentation techniques did not lead to significant
improvements in validation accuracy. This could be due to the nature of the dataset or the specific
augmentation techniques applied. Further exploration of different augmentation strategies might be
beneficial for improving the model's performance.
● The evaluation of the model's performance using metrics like training accuracy and validation
accuracy helped assess the effectiveness of different configurations.
● Overall, the project demonstrated the significance of hyperparameter tuning, regularization
techniques, and appropriate model evaluation in improving the performance of transfer learning
models for diabetic retinopathy detection. The findings can contribute to further advancements in
medical image analysis and enhance the detection and diagnosis of diabetes-related complications.

7. BIBLIOGRAPHY

https://youtu.be/WWcgHjuKVqA - What is Transfer Learning? Transfer Learning in Keras | Fine
Tuning Vs Feature Extraction

https://youtu.be/JcU72smpLJk - Transfer Learning Using Keras (ResNet-50) | Complete Python
Tutorial

https://youtu.be/84J1fMklQWE - Transfer Learning - Image Classification using Tensorflow

https://youtu.be/chQNuV9B-Rw - How To Train Deep Learning Models In Google Colab - Must For
Everyone

https://www.techtarget.com/searchenterpriseai/definition/deep-learning-deep-neural-network -
Covering concepts on Deep Learning
