
CSL604

RIZVI EDUCATION SOCIETY’S

Rizvi College of Engineering


Department of Artificial Intelligence & Data Science Engineering

Machine Learning Journal

NAME:

CLASS:

ROLL NO. / UIN:


Rizvi College of Engineering CSL604 (Machine Learning)


Lab File

Class & Semester: TE (Sem VI)


Subject: Machine Learning Lab

Name: …………………………………………………………………

Roll No. : ……………..

UIN: …………………………….


Certificate

This is to certify that ......................................................................................................................... of


Third year Artificial Intelligence & Data Science Engineering Department has performed all the experiments of
Machine Learning with satisfactory results.

………………………………..

Prof. Tushar J. Surwadkar

Subject In-charge


INSTITUTE VISION
To become a leading entity in transforming the diverse class of learners into innovators, analyzers and entrepreneurs competent to develop eco-friendly sustainable solutions and work in multi-disciplinary environment to meet the global challenges and contribute towards nation building.

INSTITUTE MISSION
IM1. Impart Core Fundamental Principles: To impart core fundamental principles of engineering and science that will enable the learner to develop solutions to complex engineering problems, through conventional and innovative teaching learning methods and mentoring.
IM2. Bridge the Technical Skill Gap: To bridge the technical skill gap through curriculum enrichment activities for industry readiness.
IM3. Inculcate Professional Etiquettes and Ethics: To groom the learner through dedicated training, placement and extension activities and to inculcate professional etiquettes and ethics aimed at holistic development of the learners enabling them to acquire distinguished positions in the leading industries or be eligible for higher studies in globally recognized universities.
IM4. Research and Development: To provide modern infrastructure and the necessary resources, for planning and implementing innovative ideas, leading to meaningful research and development and entrepreneurship.

DEPARTMENT VISION
To become a center of excellence in the field of Artificial Intelligence & Data Science Engineering to transform diverse class of learners into skilled professionals with ethical values capable of …

DEPARTMENT MISSION
DM1. Quality Technical Education: To provide quality technical education with the help of modern resources for finding solutions to complex problems of Artificial Intelligence & Data Science engineering.
DM2. Leadership & Entrepreneurial skills: To inculcate moral and ethical values while acquiring leadership, entrepreneurial skills and overall personality development.
DM3. Professional Skills and Lifelong Learning: To inculcate professional skills and lifelong learning for further education as well as acquiring distinguished positions in leading software industries.
DM4. Research and Development: To encourage creative thinking for planning and implementing innovative ideas leading to meaningful research and development considering economical, …

PROGRAM OUTCOMES
PO1. Engineering knowledge
PO2. Problem analysis
PO3. Design/development of solutions
PO4. Conduct investigations of complex problems
PO5. Modern tool usage
PO6. The engineer and society
PO7. Environment and sustainability
PO8. Ethics
PO9. Individual and team work
PO10. Communication
PO11. Project management and finance
PO12. Life-long learning

PROGRAM EDUCATIONAL OBJECTIVES
PEO1. Successful Career: To build a successful career in leading industries related to the field of Artificial Intelligence & Data Science Engineering wherein the engineer will be able to provide the necessary solutions to the challenges witnessed in existing and new business models.
PEO2. Leadership Qualities: To exhibit the qualities of team spirit, leadership, and problem-solving skills to achieve top positions in the organization or to enhance entrepreneurial skills.
PEO3. Adaptability to New Technology: To be able to adapt to new technologies and platforms and share their knowledge with their peers in the allied fields.
PEO4. Research & Higher Studies: To develop proficiency in Artificial Intelligence & Data Science engineering and related fields, able to work in multi-disciplinary areas with a strong focus on …

PROGRAM SPECIFIC OUTCOMES
PSO1. Open Source Tools: To encourage the students to work using open-source software & tools in diversified areas of Artificial Intelligence & Data Science.
PSO2. Industry Readiness: To enable the students to acquire the necessary skill set required to develop, test, install, deploy, and maintain a complete software system for business and other applications, that makes them industry …

Course Objectives
1. To introduce platforms such as Anaconda and COLAB that are suitable for machine learning

2. To implement various Regression techniques

3. To develop Neural Network based learning models

4. To implement Clustering techniques

Course Outcomes
Learner will be able to…

CO1. Implement various Machine learning models.


CO2. Apply suitable Machine learning models for a given problem.
CO3. Implement Neural Network based models.
CO4. Apply Dimensionality Reduction techniques.


Rubrics
The following rubrics will be used to assess the work submitted by the students.

1. For Experiments 1 to 10

Criterion 1: Understanding
3 (Excellent): Well understood & written
2 (Good): Understood after clearing doubts
1 (Average): Understood after repeated explanations
0 (Poor): Did not understand

Criterion 2: Code
3 (Excellent): Unique code
2 (Good): Referred code but code written by student
1 (Average): Wrote code from reference
0 (Poor): Not able to write code

Criterion 3: Output
3 (Excellent): In 1st attempt (in least time)
2 (Good): In 2nd attempt
1 (Average): In 3rd attempt (in more time)
0 (Poor): No output

2. For Assignments 1 to 2

Criterion 1: Timely submissions
3 (Excellent): Before deadline
2 (Good): On the day of the given deadline
1 (Average): After deadline
0 (Poor): A week after the deadline

Criterion 2: Presentation
3 (Excellent): Very neatly presented
2 (Good): Neat
1 (Average): Not neat
0 (Poor): Not able to read

Criterion 3: Understanding
3 (Excellent): Answered all questions
2 (Good): Did not answer all questions
1 (Average): Answered few questions
0 (Poor): Did not answer any questions


Lab Guidelines
Guidelines for performing the Experiments

1. Students are advised to come to the laboratory at least 5 minutes before the starting time; those who arrive more than 5 minutes late will not be allowed into the lab.

2. Plan your task properly, well before the session begins, and come to the lab prepared with the synopsis/program/experiment details.

3. Students should enter the laboratory with:
a. Laboratory observation notes with all the details (Problem statement, Aim, Algorithm, Procedure, Program, Expected Output, etc.) filled in for the lab session.
b. The Laboratory Record updated up to the last session's experiments, and any other materials needed in the lab.
c. Proper dress code and identity card.

4. Sign in the laboratory login register, write the TIME-IN, and occupy the computer system allotted to you by the faculty.

5. Execute your task in the laboratory, record the results/output in the lab observation notebook, and get it certified by the concerned faculty.

6. All students should be polite and cooperative with the laboratory staff and must maintain discipline and decency in the laboratory.

7. The computer labs are equipped with sophisticated, high-end branded systems, which should be used properly.

8. Students and faculty must keep their mobile phones SWITCHED OFF during lab sessions. Misuse of the equipment, misbehavior with the staff or the systems, etc. will attract severe punishment.

9. Students must take the permission of the faculty in case of any urgency to go out; anyone found loitering outside the lab/class without permission during working hours will be treated seriously and punished appropriately.

10. Students should LOG OFF/SHUT DOWN the computer system before leaving the lab after completing the task (experiment) in all aspects, and must ensure the system/seat is left in proper order.


INDEX
Sr. No. | Title | Page No. | Performed On | Sign & Date | Rubrics Points
1 | Introduction to platforms such as Anaconda, COLAB | | | |
2 | Study of machine learning libraries and tools in python. | | | |
3 | To implement linear regression in python. | | | |
4 | To implement confusion matrix in python. | | | |
5 | To implement logistic regression in python. | | | |
6 | To implement SVM in python. | | | |
7 | To implement Hebbian learning rule in python. | | | |
8 | To implement Logic Gate using McCulloch-Pitts model. | | | |
9 | To implement Error Backpropagation Perceptron Training Algorithm. | | | |
10 | To implement Principal Component Analysis. | | | |
11 | Assignment 1 | | | |
12 | Assignment 2 | | | |
 | TOTAL | | | |


EXPERIMENTS


Expt. No. 1

Title: Introduction to platforms such as Anaconda, COLAB

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 1
ML (Machine Learning)

Aim: Introduction to platforms such as Anaconda, COLAB

Software: Anaconda, Google COLAB

Pre-Lab Questions:

1. How do I read a CSV file in Colab?

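Ans: - One way is to keep the CSV file in Google Drive and give the notebook access to it. When Colab asks for permission, allow access to your Google account (older versions of this flow asked you to copy a verification code and paste it into a box in Colab). Once verification is complete, go to the CSV file in Google Drive, right-click on it and select “Get shareable link”. The link will be copied to your clipboard. Paste this link into a string variable in Colab; the file ID in the link identifies the file to download into the Colab runtime, after which it can be read, for example with pandas.read_csv(). Alternatively, mount Google Drive in the notebook and read the file directly from its path under the mounted folder.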
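A minimal sketch of the Drive-mount alternative, assuming the file has been saved to Google Drive. It uses `google.colab.drive.mount()` to give the notebook access to Drive and falls back to a local path when not running in Colab. The file name `data.csv` and the helper names are hypothetical; in practice a notebook would usually finish with `pandas.read_csv(path)`, but the standard-library `csv` module is used here so the sketch runs without extra packages.

```python
import csv
import os

DRIVE_MOUNT_POINT = "/content/drive"


def mount_drive_if_in_colab():
    """Mount Google Drive when running inside Colab; return True on success."""
    try:
        from google.colab import drive  # this module exists only inside Colab
    except ImportError:
        return False
    drive.mount(DRIVE_MOUNT_POINT)  # Colab asks you to authorize access to your account
    return True


def read_csv_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if mount_drive_if_in_colab():
    # Hypothetical file location inside "My Drive"
    path = os.path.join(DRIVE_MOUNT_POINT, "MyDrive", "data.csv")
else:
    path = "data.csv"  # hypothetical local copy when running outside Colab

if os.path.exists(path):
    rows = read_csv_rows(path)
    print(f"Read {len(rows)} rows from {path}")
else:
    print(f"{path} not found; update the path to point at your CSV file")
```

Every value read by `csv.DictReader` is a string, so numeric columns still need converting; pandas does that conversion automatically.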

Post-Lab Questions:

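1. What is the difference between Jupyter Lab and Colab?

Ans: - Google Colab and Jupyter are similar tools for writing and running Python code in notebooks made of code cells. Jupyter is an open-source, web-based interface that you install and run on your own machine (for example through Anaconda), so you manage the Python installation and the packages yourself; it allows editing, sharing and executing documents with code cells. Colab is a hosted Jupyter notebook service from Google: notebooks run on Google's cloud machines, need no local installation, and come with many common packages preinstalled. Additional modules can still be installed from a notebook cell, for example with !pip install.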

Theory:
To learn machine learning, we will use the Python programming language. To use Python for machine learning, we need to install it on our computer together with a compatible IDE (Integrated Development Environment).
In this experiment, we will learn to install Python and an IDE with the help of the Anaconda distribution.
The Anaconda distribution is a free and open-source platform for the Python and R programming languages. It can be easily installed on Windows, Linux and macOS. It provides more than 1500 Python/R data science packages suitable for developing machine learning and deep learning models.
Anaconda installs Python along with tools such as Jupyter Notebook, Spyder and the Anaconda Prompt, so it is a convenient packaged solution that you can easily download and install on your computer. It automatically installs Python and some basic IDEs and libraries with it. Here we are going to work with Jupyter.

How to Install Anaconda Python:

Step-1: Download Anaconda Python by clicking on this link:

https://www.anaconda.com/distribution/#download-section.

o After clicking on the link, you will reach the download page of Anaconda.

o Since Anaconda is available for Windows, Linux and Mac OS, you can download the installer for your OS type by clicking on the available options shown in the image below. At the time of writing, the page offered Python 2.7 and Python 3.7 versions; since 3.7 was the latest, we will download the Python 3.7 version. After you click on the download option, the download will start on your computer.

Step-2: Install Anaconda Python (Python 3.7 version):

Once the download is complete, go to Downloads and double-click the ".exe" file (Anaconda3-2019.03-Windows-x86_64.exe) of Anaconda. It will open a setup window for the Anaconda installation, as shown in the image below; click on Next.

o A License Agreement window will open; click on "I Agree" to move further.
o In the next window, you will get two installation options, as shown in the image below. Select the first option (Just Me) and click on Next.
o You will then get a window for the installation location; you can leave it as default or change it by browsing to another location, and then click on Next.
o Now select the second option, and click on Install.
o Once the installation is complete, click on Next.
o The installation is now complete. Tick the checkboxes if you want to learn more about Anaconda and Anaconda Cloud, then click on Finish to end the process.

Step-3: Open Jupyter Notebook from the Anaconda installation.

Step-4: Write your code and press SHIFT+ENTER or click the Run button.
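A useful first cell after installation (in Jupyter or Colab) is a quick environment check. This sketch uses only the standard library: it prints the Python version and reports whether a few common ML packages can be imported. The package list is an assumption and can be edited; note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys


def check_environment(packages=("numpy", "pandas", "matplotlib", "sklearn")):
    """Return {package_name: True/False} depending on whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Python version:", sys.version.split()[0])
for name, installed in check_environment().items():
    print(f"{name:<12} {'installed' if installed else 'missing'}")
```

`find_spec` only locates the package without importing it, so the check stays fast even for large libraries.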

Google COLAB

Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education.

How to Write Codes in Google COLAB


1. Open Google Colab.
2. Click on ‘New Notebook’. New notebooks run Python 3.

Running a Cell
Make sure the runtime is connected: the notebook shows a green check and ‘Connected’ in the top-right corner.
You can run cells using the options in the ‘Runtime’ menu, or press SHIFT + ENTER to run the current cell.


Program & Output:

Conclusion:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

CO’s Covered:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..


Expt. No. 2

Title: Study of machine learning libraries and tools in python.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 2
ML (Machine Learning)

Aim : Study of machine learning libraries and tools in python.

Software : Anaconda, Google COLAB

Pre-Lab Questions:

1. Name a few libraries in Python used for Data Analysis and Scientific Computations.

Ans: - Here is a list of Python libraries mainly used for Data Analysis:

NumPy
SciPy
Pandas
Scikit-learn
Matplotlib
Seaborn
Bokeh

Post-Lab Questions:

1. Which library would you prefer for plotting in Python language: Seaborn or Matplotlib or Bokeh?

Ans: - It depends on the visualization you’re trying to achieve. Each of these libraries serves a specific purpose:

Matplotlib: Used for basic plotting such as bar, pie, line and scatter plots.

Seaborn: Built on top of Matplotlib and works closely with Pandas data structures to make data plotting easier. It is used for statistical visualizations such as heatmaps or plots showing the distribution of your data.

Bokeh: Used for interactive visualization. If your data is complex and you haven’t yet found any “message” in it, Bokeh can create interactive visualizations that let your viewers explore the data themselves.

2. How are NumPy and SciPy related?

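Ans: - NumPy is a separate library on which SciPy is built; both are core parts of the scientific Python (SciPy) ecosystem.

NumPy defines arrays along with some basic numerical functions such as indexing, sorting and reshaping.

SciPy implements computations such as numerical integration, optimization, interpolation, signal processing and statistics using NumPy’s functionality.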


Theory :
Machine learning libraries (Pandas, NumPy, Matplotlib, OpenCV, Flask, Seaborn, etc.) are collections of pre-written, optimized functions in a given language that perform repetitive work such as arithmetic computation, visualizing datasets and reading images. They save the developer a lot of time and make development easier, because the library functions can be used directly without implementing the underlying algorithms.

Libraries of Machine Learning

The following are some of the most popular machine learning libraries:

(1) Pandas

(2) NumPy

(3) Matplotlib

(4) Scikit-learn

(5) Seaborn

(6) TensorFlow

(7) Theano

(8) Keras

(9) PyTorch

(10) OpenCV

(11) Flask

1. Pandas

Pandas is an open-source Python library that provides flexible, high-performance and easy-to-use data structures such as Series and DataFrames. Python is a helpful language for data preparation, but on its own it lags behind when it comes to data analysis and modeling. Pandas fills this gap, letting the user complete the entire data analysis workflow in Python without switching to a more domain-specific language like R. Pandas can read and write datasets in various formats such as text, CSV, XLS, JSON, SQL, HTML and many more. It performs well for data mining, reshaping, subsetting, data alignment, slicing, indexing and merging/joining datasets. However, pandas is inefficient in its memory use: it creates many objects to make data manipulation easy, which consumes a lot of memory.

2. NumPy

NumPy is the most fundamental data handling library and is widely used for scientific computing with Python. It allows the user to handle large N-dimensional arrays and perform mathematical operations on them. NumPy is known for its execution speed, which comes from vectorized operations that run in compiled code instead of Python loops. It is useful for matrix manipulation such as reshaping and transposing, for fast mathematical/logical operations, and for other operations such as sorting, selecting, basic linear algebra, the discrete Fourier transform and much more. NumPy arrays consume less memory than equivalent Python lists and provide better runtime behaviour. Its core is written in C, and NumPy arrays can be shared with C/C++ code through the NumPy C API or tools such as Cython.

3. Matplotlib

Matplotlib is a data visualization library that works with NumPy, Pandas and other interactive environments across platforms. It produces high-quality visualizations of data. Matplotlib can be customized to produce charts, axes and figures for publications, and it is easy to use in Jupyter notebooks. The code for Matplotlib may look daunting to some, but it is fairly easy to write once the user gets used to it, although using it efficiently takes a lot of practice.

4. Scikit-learn

Scikit-learn can be considered the heart of classical machine learning in Python. It focuses on modeling the data rather than on loading, manipulating or summarizing it, and it covers most classical machine learning tasks through a consistent interface. One of the simplest and most efficient libraries for data mining and data analysis, scikit-learn is an open-source library built on NumPy, SciPy and Matplotlib. It started as a Google Summer of Code project and has since become a widely accepted library for machine learning tasks. Scikit-learn can be used for classification, regression, clustering, dimensionality reduction, model selection, feature extraction, normalization and much more. One drawback is that most scikit-learn estimators expect numeric input, so categorical data must first be encoded (for example with OneHotEncoder).

5. Seaborn

The Seaborn library is built on top of Matplotlib and makes it easy to create data visualizations. It draws attractive, informative statistical graphs with fewer lines of code, and has special support for categorical and multivariate data to show aggregate statistics.

6. TensorFlow
Developed by the Google Brain team for internal use, TensorFlow is an open-source platform for developing and training machine learning models. It is widely used by ML researchers and developers and in production environments. TensorFlow supports various tasks, including model optimization, graphical representation, probabilistic reasoning and statistical analysis. Tensors are the basic concept of this library; they generalize vectors and matrices to higher-dimensional data. TensorFlow can handle numerous ML tasks but is most widely used to build deep neural networks.

7. Theano
Developed by the Montreal Institute for Learning Algorithms (MILA), Theano is a Python library that enables the user to define and evaluate mathematical expressions involving N-dimensional arrays. In this respect it is similar to the NumPy library; the difference is that NumPy is a general-purpose numerical library, while Theano was designed to work well for deep learning. In addition, Theano can run computations on a GPU, which can be much faster than a CPU, and it detects and resolves many errors.

8. Keras
'Deep neural networks made easy' could well be the tagline of this library. Keras is a user-friendly API designed for humans that aims to reduce cognitive load, and it enables easy and fast prototyping. It is a high-level neural networks API written in Python that runs on top of a backend engine (historically CNTK, TensorFlow or MXNet; Keras 3 supports TensorFlow, JAX and PyTorch). Keras provides a large number of pre-trained models. It supports recurrent and convolutional networks as well as combinations of the two. Users can add new modules easily, which makes Keras suitable for advanced research.

The performance of Keras depends entirely on the underlying backend.

9. PyTorch
PyTorch was initially developed by Facebook's artificial intelligence research team and was later merged with Caffe2. Together with TensorFlow, it is one of the most widely used deep learning frameworks. It is tightly integrated with Python, so it works well with other popular libraries such as NumPy. Furthermore, PyTorch allows the user to export models in the standard ONNX (Open Neural Network Exchange) format to get direct access to ONNX-compatible platforms, runtimes and more.

10. OpenCV
OpenCV is a computer vision library built to provide a common infrastructure for computer vision applications and to improve machine perception. The library is free for commercial use. Algorithms provided by OpenCV can be used to detect faces, identify objects, and track moving objects and camera movements. OpenCV can also stitch images together to produce a high-resolution image of a whole scene, follow eye movements, extract 3D models of objects, and much more. It runs on different platforms: its C++, Java and Python interfaces support Windows, macOS, iOS, Linux and Android.

11. Flask
Flask was created by Armin Ronacher of Pocoo, an international group of Python enthusiasts formed in 2004. If you want to develop web applications, Flask is a popular, lightweight Python web application framework. It relies on the Jinja template engine and the Werkzeug WSGI toolkit. It is compatible with Google App Engine and includes a development server and debugger. Some other libraries: Scrapy, Plotly, Bokeh, spaCy, Dask, Gensim, data.table, Caffe, NLTK, fastai, Gluon, and the list goes on.

Conclusion:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

CO’s Covered:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..


Expt. No. 3

Title: To implement linear regression in python.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 3
ML (Machine Learning)

Aim : To implement linear regression in python.

Software : Anaconda, Google COLAB

Pre-Lab Questions:

1. What is Linear Regression Algorithm?

Ans: - It is a method of finding the straight line that best fits the given dataset, i.e., it tries to find the best linear relationship between the independent and dependent variables.

Post-Lab Questions:

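1. What can you say about outliers in linear regression?

Ans:- Linear regression is sensitive to outliers. The least-squares method minimizes the sum of squared errors, so a point far from the overall trend contributes a very large squared error, and a single outlier can pull the fitted line noticeably towards itself.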
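The effect is easy to see numerically. This sketch, using hypothetical toy data, fits an ordinary least-squares line in plain Python twice: once to points lying exactly on y = 2x + 1, and once with a single extra point far above that line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Points lying exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]
print("clean fit:        b0=%.2f, b1=%.2f" % fit_line(xs, ys))

# Same data plus one outlier far above the trend
xs_out = xs + [6]
ys_out = ys + [40]
print("with one outlier: b0=%.2f, b1=%.2f" % fit_line(xs_out, ys_out))
```

The clean data give an intercept of 1 and a slope of 2; adding the one outlier moves the intercept to -8 and the slope to about 5.86, even though the other five points are unchanged.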

Theory:

Linear regression is one of the easiest and most popular machine learning algorithms. It is a statistical method used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.

The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes as the value of the independent variable changes.
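For a single independent variable, this relationship is commonly written as the first equation below, where a_0 is the intercept, a_1 the slope (regression coefficient) and \varepsilon the random error. The ordinary least-squares method chooses a_0 and a_1 to minimize the sum of squared differences between observed and predicted y values, which gives the standard closed-form estimates in the second line. The symbol names are chosen here for illustration; with k independent variables the model extends to the third form.

$$ y = a_0 + a_1 x + \varepsilon $$

$$ a_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a_0 = \bar{y} - a_1 \bar{x} $$

$$ y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_k x_k + \varepsilon $$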

The linear regression model provides a sloped straight line representing the relationship between the variables, as shown in the image below:


Types of Linear Regression

Linear regression algorithms can be further divided into two types:

Simple Linear Regression:


If a single independent variable is used to predict the value of a numerical dependent variable, then such a
Linear Regression algorithm is called Simple Linear Regression.
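A minimal sketch of simple linear regression in plain Python, assuming hypothetical experience/salary data. It applies the closed-form least-squares estimates for the intercept and slope and reports R² as a goodness-of-fit measure. In the lab one would more typically use scikit-learn's `LinearRegression`, which fits the same least-squares model.

```python
def simple_linear_regression(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares and return (b0, b1).

    b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: years of experience (x) vs salary in thousands (y)
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 41, 44, 50]
b0, b1 = simple_linear_regression(experience, salary)
print(f"salary = {b0:.2f} + {b1:.2f} * experience")
print(f"R^2 = {r_squared(experience, salary, b0, b1):.3f}")
print(f"predicted salary at 6 years: {b0 + b1 * 6:.2f}")
```

On this data the fit gives an intercept of 25.3 and a slope of 4.9, so the predicted salary for 6 years of experience is 54.7 (in the same hypothetical units), with R² of about 0.992.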

Multiple Linear regression:


If more than one independent variable is used to predict the value of a numerical dependent variable, then
such a Linear Regression algorithm is called Multiple Linear Regression.
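For multiple linear regression, the least-squares fit extends to several features by solving the normal equations (A^T A) b = A^T y, where A is the feature matrix with a leading column of ones for the intercept. The sketch below does this in plain Python with Gaussian elimination so it needs no extra packages. The data are hypothetical and noise-free, generated from y = 3 + 2*x1 - x2, so the fit should recover coefficients close to [3, 2, -1]. In practice `numpy.linalg.lstsq` or scikit-learn's `LinearRegression` is numerically more robust.

```python
def fit_multiple_linear_regression(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; returns [b0, ..., bk].

    X is a list of rows (each row holds k feature values), y a list of targets.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    # Normal-equation system M b = v with M = A^T A and v = A^T y
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    v = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    aug = [M[i] + [v[i]] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, p):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = sum(aug[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (aug[i][p] - s) / aug[i][i]
    return b


def predict(b, row):
    """Predicted y for one row of feature values."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))


# Hypothetical noise-free data generated from y = 3 + 2*x1 - x2
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3 + 2 * x1 - x2 for x1, x2 in X]
coefs = fit_multiple_linear_regression(X, y)
print("coefficients:", [round(c, 4) for c in coefs])
print("prediction for x1=6, x2=2:", round(predict(coefs, [6, 2]), 4))
```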

Linear Regression Line

A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:

Positive Linear Relationship:


If the dependent variable on the Y-axis increases as the independent variable on the X-axis increases, the relationship is termed a positive linear relationship.


Negative Linear Relationship:


If the dependent variable on the Y-axis decreases as the independent variable on the X-axis increases, the relationship is called a negative linear relationship.
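Whether a relationship is positive or negative can be read from the sign of the fitted slope, which always has the same sign as the Pearson correlation coefficient r (both share the numerator sum((x - x_mean)(y - y_mean))). A small plain-Python sketch with hypothetical data:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def relationship(xs, ys):
    """Classify the linear relationship by the sign of r (same sign as the OLS slope)."""
    r = pearson_r(xs, ys)
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "no linear relationship"


# Hypothetical data
hours_studied = [1, 2, 3, 4, 5]
marks = [35, 45, 50, 62, 70]        # rises as hours rise
price = [10, 20, 30, 40, 50]
units_sold = [95, 80, 70, 52, 40]   # falls as price rises

print("hours vs marks:", relationship(hours_studied, marks))
print("price vs units sold:", relationship(price, units_sold))
```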


Program & Output:


Conclusion:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

CO’s Covered:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..


Expt. No. 4

Title: To implement confusion matrix in python.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 4
ML (Machine Learning)

Aim : To implement confusion matrix in python.

Software : Anaconda, Google COLAB

Pre-Lab Questions:

1. What Is a Confusion Matrix?

Ans: - A confusion matrix is an N x N matrix used to evaluate the performance of a classification model, where N is the total number of target classes. The matrix compares the actual target values with those predicted by the machine learning model.
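A minimal sketch of building an N x N confusion matrix in plain Python, using hypothetical 3-class labels. Rows correspond to actual classes and columns to predicted classes, the same orientation as scikit-learn's `sklearn.metrics.confusion_matrix`; some textbooks transpose the table, so check which convention a given table uses.

```python
def confusion_matrix(actual, predicted, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts samples of actual
    class labels[i] that were predicted as class labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if labels is None:
        labels = sorted(set(actual) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return labels, matrix


# Hypothetical 3-class example
actual = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
predicted = ["cat", "cat", "bird", "cat", "dog", "dog", "dog"]
labels, cm = confusion_matrix(actual, predicted)
print("rows = actual, columns = predicted:", labels)
for label, row in zip(labels, cm):
    print(f"{label:>6}", row)
```

The diagonal holds the correct predictions; every off-diagonal cell counts one specific kind of mistake, for example an actual "dog" predicted as "cat".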

Post-Lab Questions:

1. What are the important terms in a confusion matrix?

Ans:- True Positive (TP)

The predicted value matches the actual value, i.e., the predicted class matches the actual class. The actual value was positive, and the model predicted a positive value.

True Negative (TN)

The predicted value matches the actual value, i.e., the predicted class matches the actual class. The actual value was negative, and the model predicted a negative value.

False Positive (FP) – Type I Error

The prediction was wrong: the actual value was negative, but the model predicted a positive value. This is also known as a Type I error.

False Negative (FN) – Type II Error

The prediction was wrong: the actual value was positive, but the model predicted a negative value. This is also known as a Type II error.
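The four terms can be counted directly from paired lists of actual and predicted labels. This plain-Python sketch assumes a binary problem with hypothetical 0/1 labels where 1 is the positive class; the `positive` parameter is an illustrative choice.

```python
def binary_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative (Type I error)
        else:
            fn += 1  # predicted negative, actually positive (Type II error)
    return tp, tn, fp, fn


# Hypothetical labels: 1 = positive, 0 = negative
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
tp, tn, fp, fn = binary_counts(actual, predicted)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

For these labels the counts are TP=3, TN=3, FP=1 and FN=1.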

Theory :

The confusion matrix is a matrix used to determine the performance of classification models on a given set of test data. It can only be determined if the true values for the test data are known. The matrix itself is easy to understand, but the related terminology may be confusing. Since it shows the errors in the model's performance in the form of a matrix, it is also known as an error matrix. Some features of the confusion matrix are given below:


o For a classifier with 2 prediction classes, the matrix is a 2 x 2 table; for 3 classes it is a 3 x 3 table, and so on.
o The matrix has two dimensions, predicted values and actual values, along with the total number of predictions.
o Predicted values are the values predicted by the model, and actual values are the true values for the given observations.
o It looks like the table below:

The above table has the following cases:

o True Negative: The model predicted No, and the actual value was also No.
o True Positive: The model predicted Yes, and the actual value was also Yes.
o False Negative: The model predicted No, but the actual value was Yes. It is also called a Type-II error.
o False Positive: The model predicted Yes, but the actual value was No. It is also called a Type-I error.

Need for Confusion Matrix in Machine learning


o It evaluates the performance of classification models when they make predictions on test data, and tells us how good the classification model is.
o It shows not only the errors made by the classifier but also the type of each error, i.e., whether it is a Type-I or a Type-II error.
o With the help of the confusion matrix, we can calculate different metrics for the model, such as accuracy and precision.

Calculations using Confusion Matrix:


Classification Accuracy: It is one of the most important metrics for classification problems. It defines how often the model predicts the correct output. It is calculated as the ratio of the number of correct predictions made by the classifier to the total number of predictions made by the classifier. The formula is given below:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

Misclassification rate: It is also termed the error rate, and it defines how often the model gives wrong
predictions. It is calculated as the number of incorrect predictions divided by the total number of
predictions made by the classifier. The formula is given below:

$$\text{Error rate} = \frac{FP + FN}{TP + TN + FP + FN}$$

Precision: It is the fraction of the model's positive predictions that are actually positive: out of all the
observations the model predicted as positive, how many were truly positive. It can be calculated using the
formula below:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall: Out of all the actual positive observations, recall measures how many the model predicted
correctly. The recall should be as high as possible. It is calculated as:

$$\text{Recall} = \frac{TP}{TP + FN}$$

F-measure: If one model has low precision and high recall and another has the opposite, it is difficult to
compare the two models. For this purpose, we can use the F-score, which evaluates recall and precision at
the same time. For a given average of precision and recall, the F-score is highest when the recall is equal
to the precision. It can be calculated using the formula below:

$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
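The metrics described above can all be derived from the four confusion-matrix counts. This is a minimal sketch with made-up labels. The helper metrics_from_counts is a hypothetical name introduced here, and the results are cross-checked against scikit-learn's accuracy_score, precision_score, recall_score and f1_score.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 1 = "Yes" (positive class), 0 = "No" (negative class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

def metrics_from_counts(tn, fp, fn, tp):
    """Hypothetical helper: evaluation metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total          # correct predictions / all predictions
    error_rate = (fp + fn) / total        # wrong predictions / all predictions
    precision = tp / (tp + fp)            # correct among predicted positives
    recall = tp / (tp + fn)               # found among actual positives
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, error_rate, precision, recall, f_measure

# For labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
acc, err, prec, rec, f1 = metrics_from_counts(tn, fp, fn, tp)
print(f"accuracy={acc:.3f} error rate={err:.3f} precision={prec:.3f} "
      f"recall={rec:.3f} F-measure={f1:.3f}")

# Cross-check against scikit-learn's own implementations.
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```

Here TP = 4, TN = 3, FP = 1 and FN = 2. This gives an accuracy of 0.7, a precision of 0.8, a recall of about 0.667 and an F-measure of 8/11 (about 0.727).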


Program & Output:


Conclusion:

………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
CO’s Covered:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..


Expt No. 5

Title: To implement Logistic Regression in Python.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 5
ML (Machine Learning)

Aim : To implement logistic regression in Python.

Software : Anaconda, Google COLAB

Pre-Lab Questions:

1. Is Logistic regression a supervised machine learning algorithm?

Ans: - Yes. Logistic regression is a supervised learning algorithm because it uses true
labels for training. A supervised learning algorithm requires input variables (x) and a
target variable (Y) when the model is trained.

Post-Lab Questions:

1. Is Logistic regression mainly used for Regression?

Ans: - No. Logistic regression is a classification algorithm; do not be misled by the word
“regression” in its name.

Theory:

Logistic regression is one of the most popular machine learning algorithms and comes under the
supervised learning technique. It is used for predicting a categorical dependent variable using a given
set of independent variables. Because logistic regression predicts the output of a categorical dependent
variable, the outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, or True or
False. However, instead of giving the exact value 0 or 1, it gives probabilistic values that lie
between 0 and 1. Logistic regression is very similar to linear regression except in how it is
used: linear regression is used for solving regression problems, whereas logistic regression is used for
solving classification problems. In logistic regression, instead of fitting a regression line, we fit an
"S"-shaped logistic function whose output approaches two limiting values, 0 and 1. The curve from the logistic
function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is
obese or not based on its weight. Logistic regression is a significant machine learning algorithm
because it can provide probabilities and classify new data using both continuous and discrete
features. It can classify observations using different types of data and can
easily determine the most effective variables used for the classification. The image below shows the
logistic function:


Logistic Function (Sigmoid Function):

o The sigmoid function is a mathematical function used to map the predicted values to probabilities.
o It maps any real value z into a value within the range 0 to 1: $\sigma(z) = \frac{1}{1 + e^{-z}}$.
o The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so
it forms an "S"-shaped curve. This S-shaped curve is called the sigmoid function or the logistic
function.
o In logistic regression, we use the concept of a threshold value, which decides whether a probability is
mapped to class 0 or class 1: values above the threshold are mapped to 1, and values below the threshold
are mapped to 0.
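This is a minimal NumPy sketch of the sigmoid function and the threshold rule described above. The threshold of 0.5 is an assumption, since the theory does not fix it. Sending a probability exactly equal to the threshold to class 1 is also an assumed convention.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def classify(z, threshold=0.5):
    """Map sigmoid outputs to class labels.
    Assumption: a probability equal to the threshold goes to class 1."""
    return (sigmoid(z) >= threshold).astype(int)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])   # example linear scores
print("probabilities:", sigmoid(z).round(3))
print("classes:      ", classify(z))
```

Large negative scores give probabilities close to 0 and large positive scores give probabilities close to 1. A score of 0 gives exactly 0.5.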

Assumptions for Logistic Regression:


o The dependent variable must be categorical in nature.
o The independent variables should not be highly correlated with one another (no multicollinearity).

Logistic Regression Equation:

The Logistic regression equation can be obtained from the Linear Regression equation. The mathematical
steps to get Logistic Regression equations are given below:

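o We know the equation of a straight line can be written as:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

o In logistic regression, y is a probability and can only lie between 0 and 1, so instead of y we consider
the odds y/(1-y), which range from 0 (for y = 0) to infinity (for y = 1):

$$\frac{y}{1 - y}, \qquad 0 \text{ for } y = 0, \quad \infty \text{ for } y = 1$$

o But we need a range from -[infinity] to +[infinity] to match the right-hand side of the straight-line
equation, so we take the logarithm of the odds, and the equation becomes:

$$\log\left[\frac{y}{1 - y}\right] = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$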

The above equation is the final equation for Logistic Regression.
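This relationship can be checked numerically: a fitted logistic regression is linear in the log-odds. The sketch fits scikit-learn's LogisticRegression with default settings to a small made-up dataset (hours studied versus pass/fail). It then verifies that log[p/(1-p)], computed from the predicted probabilities, equals b0 + b1·x built from the fitted intercept_ and coef_. The data and settings are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: hours studied (one feature) and pass (1) / fail (0).
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()            # default settings
model.fit(X, y)
b0 = model.intercept_[0]
b1 = model.coef_[0, 0]

p = model.predict_proba(X)[:, 1]        # estimated P(y = 1 | x)
log_odds = np.log(p / (1 - p))          # log[y / (1 - y)] with y = p
linear_part = b0 + b1 * X[:, 0]         # b0 + b1 * x

print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("log-odds equal the linear part:", np.allclose(log_odds, linear_part))
print("predictions for 0.5 h and 9.5 h:", model.predict([[0.5], [9.5]]))
```

The model outputs probabilities through the sigmoid. The log-odds, however, are a straight-line function of x, which is exactly what the equation expresses.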

Types of Logistic Regression:

On the basis of the categories, logistic regression can be classified into three types:

o Binomial: In binomial logistic regression, the dependent variable can have only two possible types,
such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial logistic regression, the dependent variable can have 3 or more possible
unordered types, such as "cat", "dog", or "sheep".
o Ordinal: In ordinal logistic regression, the dependent variable can have 3 or more possible ordered
types, such as "Low", "Medium", or "High".
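For the multinomial case, this minimal sketch fits scikit-learn's LogisticRegression on the Iris dataset, which has three unordered classes. With the default lbfgs solver, recent scikit-learn versions fit a multinomial (softmax) model when there are more than two classes. The setting max_iter=1000 is an assumption, used only to avoid convergence warnings. Ordinal logistic regression is not covered by this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)        # 3 unordered classes: setosa, versicolor, virginica
clf = LogisticRegression(max_iter=1000)  # max_iter raised to avoid convergence warnings
clf.fit(X, y)

proba = clf.predict_proba(X[:5])         # one probability per class for each sample
print(proba.round(3))
print("row sums:", proba.sum(axis=1))    # each row is a probability distribution
print("predicted classes:", clf.predict(X[:5]))
```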


Program & Output:


Conclusion:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
CO’s Covered:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..


Expt No. 6

Title: To implement SVM in Python.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 6
ML (Machine Learning)

Aim : To implement SVM in Python.

Software : Anaconda, Google COLAB

Pre-Lab Questions:

1. What are Support Vectors in SVMs?

Ans: - Support vectors are the instances located on the margin boundaries (in a soft-margin SVM,
also the instances that fall inside the margin or on the wrong side of it). For SVMs, the decision
boundary is determined entirely by the support vectors.

Any instance that is not a support vector (i.e., one lying off the margin on the correct side) has no
influence whatsoever: you could remove it, add more such instances, or move them around, and as
long as they stay off the margin, they won’t affect the decision boundary.

Computing the predictions involves only the support vectors, not the whole
training set.

Post-Lab Questions:

1. What does the cost parameter in the SVM mean?

Ans: - The cost parameter C decides how much an SVM should be allowed to “bend” with
the data. With a low cost, you aim for a smooth decision surface with a wider margin that tolerates
some misclassified training points; with a higher cost, you aim to classify more training points
correctly, at the price of a narrower margin. It is also simply referred to as the cost of
misclassification.
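The effect of the cost parameter can be sketched with scikit-learn's SVC using a linear kernel on made-up overlapping data. The data, the random seed and the three C values are assumptions. The margin width is 2/||w||, where w is the fitted coef_. A small C should give a wider, smoother margin, and a large C a narrower one that fits the training points more tightly.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic 2-D data with two overlapping classes (fixed seed for repeatability).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

margins = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins[C] = 2 / np.linalg.norm(clf.coef_[0])    # margin width = 2 / ||w||
    print(f"C={C:>6}: margin width={margins[C]:.2f}, "
          f"support vectors={len(clf.support_)}, "
          f"training accuracy={clf.score(X, y):.2f}")
```

In practice C is usually chosen by cross-validation rather than by hand.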

Theory:

Introduction

By now you should be familiar with the linear regression and logistic regression algorithms. If not, I
suggest you have a look at them before moving on to the support vector machine. The support vector
machine is another simple algorithm that every machine learning practitioner should have in their toolkit.
It is widely used because it often achieves high accuracy with relatively little computational power. The
Support Vector Machine, abbreviated as SVM, can be used for both regression and classification tasks,
but it is most widely used for classification.

Rizvi College of Engineering CSC603 (Machine Learning)

What is Support Vector Machine?

The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space
(N being the number of features) that distinctly classifies the data points.

Hyperplanes and Support Vectors


Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of
the hyperplane can be attributed to different classes. Also, the dimension of the hyperplane depends upon
the number of features. If the number of input features is 2, then the hyperplane is just a line. If the number
of input features is 3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine
when the number of features exceeds 3.
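With two input features, the hyperplane is a line, w1·x1 + w2·x2 + b = 0. This sketch uses a small made-up separable dataset and fits a linear SVC. The large C is an assumption, used to approximate a hard margin. The sketch then shows that the predicted class is simply the side of the line a point falls on.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable data with 2 input features.
X = np.array([[0, 0], [1, 1], [2, 1], [1, 2],
              [4, 4], [5, 4], [4, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0)   # large C: close to a hard margin
clf.fit(X, y)

w = clf.coef_[0]          # normal vector of the hyperplane (w1, w2)
b = clf.intercept_[0]     # offset b
print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))

# The class is decided by which side of the line a point lies on.
new_points = np.array([[1.0, 0.5], [5.0, 5.0]])
side = new_points @ w + b
print("w.x + b:", side.round(2), "predicted:", clf.predict(new_points))
```

For this toy data, the fitted line should be close to x1 + x2 = 5.5. That line lies halfway between the closest points of the two classes.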


Support vectors

Support vectors are the data points closest to the hyperplane, and they influence its position and orientation.
Using these support vectors, we maximize the margin of the classifier. Deleting a support vector can
change the position of the hyperplane. These are the points that help us build our SVM.
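The role of support vectors can be checked on a made-up toy dataset. Refitting on the support vectors alone (read from scikit-learn's support_ attribute) should give essentially the same hyperplane. Deleting a support vector, by contrast, moves the hyperplane. The data, the value C = 1000 and the helper names fit_svm and margin_width are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [2, 1], [1, 2],
              [4, 4], [5, 4], [4, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def fit_svm(X, y):
    """Linear SVM with a large C (close to a hard margin)."""
    return SVC(kernel="linear", C=1000.0).fit(X, y)

def margin_width(model):
    return 2 / np.linalg.norm(model.coef_[0])

full = fit_svm(X, y)
sv = full.support_                          # indices of the support vectors
print("support vectors:\n", X[sv])

# 1) Keep only the support vectors: the hyperplane should stay the same.
only_sv = fit_svm(X[sv], y[sv])
print("w (all points):     ", full.coef_[0].round(3))
print("w (support vectors):", only_sv.coef_[0].round(3))

# 2) Delete the class-1 support vector(s): the hyperplane moves.
keep = [i for i in range(len(X)) if not (i in sv and y[i] == 1)]
dropped = fit_svm(X[keep], y[keep])
print("margin before: %.3f, after deleting: %.3f"
      % (margin_width(full), margin_width(dropped)))
```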


Program & Output:


Conclusion:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
CO’s Covered:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..


Expt No. 7

Title: To implement Hebbian Learning Rule in Python.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 7
ML (Machine Learning)

Aim : To implement the Hebbian Learning Rule in Python.

Software : Anaconda, Google COLAB

Pre-Lab Questions:

1. Mention important ANN Terminology

Ans: - Before we classify the various learning rules in ANN, let us understand some
important terminologies related to ANN.

#1) Weights: In an ANN, each neuron is connected to the other neurons through
connection links. These links carry a weight. The weight has information about the
input signal to the neuron. The weights and input signal are used to get an output.
The weights can be denoted in a matrix form that is also called a Connection matrix.

Each neuron is connected to every neuron of the next layer through connection
weights. Hence, if there are “n” nodes and each node has “m” weights, the
weight matrix is:

$$W = \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_n \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}$$

W1 represents the weight vector starting from node 1. w11 represents the weight
from the 1st node of the preceding layer to the 1st node of the next layer.
Similarly, wij represents the weight from the “ith” processing element
(neuron) to the “jth” processing element of the next layer.

#2) Bias: The bias is added to the network by adding an input element x (b) = 1 into
the input vector. The bias also carries a weight denoted by w (b).


The bias plays an important role in calculating the output of the neuron. The bias can
be either positive or negative. The net input is $y_{in} = b + \sum_i x_i w_i$, so a
positive bias increases the net input, while a negative bias reduces it.

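#3) Threshold: A threshold value θ is used in the activation function. The net input is
compared with the threshold to get the output. In a neural network, the activation function is
defined based on the threshold value, and the output is calculated from it.

For example, a binary step activation with threshold θ is:

$$y = f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} \ge \theta \\ 0 & \text{if } y_{in} < \theta \end{cases}$$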

#4) Learning Rate: It is denoted by α. The learning rate usually ranges from 0 to 1. It is
used for weight adjustment during the learning process of the NN.

#5) Momentum Factor: It is added for faster convergence of results. A fraction of the
previous weight change, set by the momentum factor, is added to the current weight update.
It is generally used in backpropagation networks.
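The weights, bias and threshold fit together in a single forward computation. In this minimal NumPy sketch, the weight matrix, bias values, input and threshold are made-up numbers. The net input of each output neuron is b_j + Σ x_i·w_ij, and a step activation compares that net input with the threshold.

```python
import numpy as np

# Made-up layer with n = 3 input nodes and m = 2 output nodes.
# Row i of W holds the weights leaving input node i; W[i, j] is w_ij.
W = np.array([[0.5, -0.5],
              [0.4,  0.1],
              [-0.3, 0.8]])
b = np.array([0.1, -0.6])         # bias weights w(b); their input x(b) is always 1
x = np.array([1.0, 0.0, 1.0])     # input signal

net = b + x @ W                   # net input y_in_j = b_j + sum_i x_i * w_ij
theta = 0.0                       # assumed threshold
y = np.where(net >= theta, 1, 0)  # step activation: fire if net input reaches theta
print("net input:", net.round(2), "output:", y)
```

The first output neuron receives a net input of 0.3 and fires. The second receives -0.3 and does not fire.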


Post-Lab Questions:

1. What is the training algorithm for the Hebbian learning rule?

Ans: -

The training steps of the algorithm are as follows:

1. Initially, the weights are set to zero, i.e., w_i = 0 for all inputs i = 1 to n, where n is the total
number of input neurons. The bias is also set to zero.
2. For each training pair s:t (input vector s, target output t), the activation function for the inputs is
generally set as an identity function, i.e., x_i = s_i for i = 1 to n.
3. The activation function for the output is also set to the target: y = t.
4. The weights and bias are adjusted as:

$$w_i(\text{new}) = w_i(\text{old}) + x_i\, y, \qquad b(\text{new}) = b(\text{old}) + y$$

Steps 2 to 4 are repeated for each input vector and its target output.
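The training steps above can be sketched for the AND gate. As is common for Hebb nets, this sketch assumes bipolar inputs and targets (+1/-1). With binary 0/1 targets, the weights would never change for a target of 0, because each update is proportional to x·y. The helper name train_hebb is introduced here for illustration.

```python
import numpy as np

# Bipolar AND gate: inputs and targets use +1 / -1.
S = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])   # input vectors s
T = np.array([1, -1, -1, -1])                         # target outputs t

def train_hebb(S, T):
    """Hebb rule: one pass over the training pairs s:t."""
    w = np.zeros(S.shape[1])     # step 1: weights start at zero
    b = 0.0                      # the bias also starts at zero
    for s, t in zip(S, T):
        x = s                    # step 2: identity activation, x_i = s_i
        y = t                    # step 3: output activation set to the target, y = t
        w = w + x * y            # step 4: w_i(new) = w_i(old) + x_i * y
        b = b + y                #         b(new) = b(old) + y
    return w, b

w, b = train_hebb(S, T)
print("weights:", w, "bias:", b)

# Check: bipolar step activation on the learned net input.
pred = np.where(S @ w + b >= 0, 1, -1)
print("predictions:", pred, "targets:", T)
```

Starting from zero, the four updates end with w = (2, 2) and b = -2, which reproduces the bipolar AND truth table.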

Theory:

The simplest neural network (the threshold neuron) lacks the capability of learning, which is its major
drawback. In the book “The Organization of Behavior”, Donald O. Hebb proposed a mechanism to
update the weights between neurons in a neural network. This method of updating weights enabled neurons to
learn and was named Hebbian learning.

Three major points were stated as part of this learning mechanism:

o Information is stored in the connections between neurons in neural networks, in the form of
weights.
o The weight change between neurons is proportional to the product of the activation values of the
neurons.
o As learning takes place, simultaneous or repeated activation of weakly connected neurons
incrementally changes the strength and pattern of weights, leading to stronger connections.

Neuron Assembly Theory

The repeated stimulus of weak connections between neurons leads to their incremental strengthening.

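The new weights are calculated by the equation:

$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\, x_i\, y_j$$

where α is the learning rate, x_i is the activation of the sending neuron i, and y_j is the activation of the
receiving neuron j.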

Inhibitory Connections

This is another kind of connection, which has the opposite response to a stimulus: here, the connection
strength decreases with repeated or simultaneous stimuli.

Implementation of Hebbian Learning in a Perceptron

In the late 1950s, Frank Rosenblatt argued that the threshold neuron cannot be used for modeling cognition,
as it cannot learn or adapt from the environment or develop capabilities such as classification and
recognition.

A perceptron draws inspiration from a biological visual neural model with three layers, illustrated as
follows:

o The input layer corresponds to the sensory cells in the retina, with random connections to neurons of
the succeeding layer.
o The association layer has threshold neurons with bi-directional connections to the response layer.
o The response layer has threshold neurons that are interconnected with each other for competitive
inhibitory signaling.

Response layer neurons compete with each other by sending inhibitory signals to produce the output.
Threshold functions are set at the origin for the association and response layers. This forms the basis of
learning between these layers. The goal of the perceptron is to activate the correct response neurons for each
input pattern.


Program & Output:


Conclusion:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

CO’s Covered:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..


Expt No. 8

Title: To implement Logic gates using the McCulloch-Pitts Model.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 8
ML (Machine Learning)

Aim: To implement logic gates using the McCulloch-Pitts model.

Software: Anaconda, Google COLAB

Pre-Lab Questions:

1. What is the mathematical definition of the McCulloch-Pitts model?

Ans: - McCulloch and Pitts developed a mathematical formulation known as the linear
threshold gate, which describes the activity of a single neuron with two states, firing
or not firing. In its simplest form, the mathematical formulation is as follows:

$$Sum = \sum_{i=1}^{N} I_i W_i, \qquad y(Sum) = \begin{cases} 1 & \text{if } Sum \ge T \\ 0 & \text{otherwise} \end{cases}$$

where I1, I2, …, IN are binary input values ∈ {0, 1}; W1, W2, …, WN are weights associated with
each input ∈ {−1, 1}; Sum is the weighted sum of inputs; and T is a predefined threshold value for the
neuron activation (i.e., firing).
Figure 3 shows a graphical representation of the McCulloch-Pitts artificial neuron.
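This is a minimal sketch of the linear threshold gate, with binary inputs and weights restricted to -1 or 1. The weight and threshold values chosen for AND, OR and NOT are one possible setting, picked here for illustration. The function name linear_threshold_gate is hypothetical.

```python
from itertools import product

def linear_threshold_gate(inputs, weights, T):
    """Fire (return 1) when the weighted sum of binary inputs reaches threshold T."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= T else 0

pairs = list(product((0, 1), repeat=2))            # (0,0), (0,1), (1,0), (1,1)
AND = [linear_threshold_gate(p, (1, 1), T=2) for p in pairs]
OR = [linear_threshold_gate(p, (1, 1), T=1) for p in pairs]
NOT = [linear_threshold_gate((i,), (-1,), T=0) for i in (0, 1)]
print("AND:", AND)
print("OR: ", OR)
print("NOT:", NOT)
```

Both AND and OR use the same weights. Only the threshold T changes, and it decides how many active inputs are needed before the neuron fires.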

Post-Lab Questions:

1. What are the limitations of the McCulloch-Pitts artificial neuron?

Ans: - Many Boolean functions can be emulated with this simple yet versatile
model. Nonetheless, it has many limitations. Among the main ones are:

o Only binary inputs and outputs are allowed: this is a significant limitation, since many of the
features that are useful for making decisions are continuous rather than
binary. The same goes for the decisions themselves, where in many instances you may want
to attach a continuous value (e.g., a probability value) to a decision instead of a yes or no
label.
o No learning is possible: as you may have realized, you have to work out the solution to
your problem beforehand. In this sense, the model has no autonomy whatsoever, restricting
the problems that can be solved to the ones that you already know how to solve.


o Manual adjustment of the weights and threshold: connected to the lack of a learning
procedure, once you figure out the solution, you will have to adjust all the parameters by
hand.

Theory:

McCulloch-Pitts Neuron
The most fundamental unit of deep neural networks is the artificial neuron (perceptron). The very first
step towards the perceptron we use today was taken in 1943 by McCulloch and Pitts, who mimicked the
functionality of a biological neuron.

Dendrite: Receives signals from other neurons

Soma: Processes the information

Axon: Transmits the output of this neuron

Synapse: Point of connection to other neurons

Basically, a neuron takes an input signal (dendrite), processes it like a CPU (soma), and passes the output
through a cable-like structure to other connected neurons (axon to synapse to the other neuron’s dendrite).


Now, this might be biologically inaccurate, as a lot more goes on in a real neuron, but at a high level
this is what a neuron in our brain does: it takes an input, processes it, and throws out an output.

McCulloch-Pitts Neuron

The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and
Walter Pitts (logician) in 1943.


AND Function

An AND function neuron would fire only when ALL the inputs are ON, i.e., g(x) ≥ 3 here (three inputs, where g(x) is the sum of the inputs).

OR Function

An OR function neuron fires if ANY of the inputs is ON, i.e., g(x) ≥ 1 here.

NOR Function

For a NOR neuron to fire, we want ALL the inputs to be 0, so the thresholding parameter should also be 0,
and we take all the inputs as inhibitory inputs.


NOT Function

For a NOT neuron, an input of 1 outputs 0 and an input of 0 outputs 1. So we take the input as an inhibitory
input and set the thresholding parameter to 0. It works!
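The gates above can be sketched in the g(x) ≥ θ form. The sketch assumes that g(x) is the sum of the excitatory inputs and that any inhibitory input that is ON stops the neuron from firing. The function name mp_neuron is hypothetical. AND, OR and NOR use three inputs, as in the text.

```python
from itertools import product

def mp_neuron(excitatory, inhibitory, theta):
    """Hypothetical M-P neuron helper.
    Any inhibitory input that is ON blocks firing; otherwise the neuron fires
    when g(x), the sum of the excitatory inputs, is at least theta."""
    if any(inhibitory):
        return 0
    g = sum(excitatory)
    return 1 if g >= theta else 0

inputs3 = list(product((0, 1), repeat=3))             # all 8 three-input patterns
AND = [mp_neuron(x, (), theta=3) for x in inputs3]    # fires only for (1, 1, 1)
OR = [mp_neuron(x, (), theta=1) for x in inputs3]     # fires if any input is ON
NOR = [mp_neuron((), x, theta=0) for x in inputs3]    # all inputs inhibitory
NOT = [mp_neuron((), (x,), theta=0) for x in (0, 1)]  # single inhibitory input
print("AND:", AND)
print("OR: ", OR)
print("NOR:", NOR)
print("NOT:", NOT)
```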

Can any Boolean function be represented using the M-P neuron? Before you answer that, let’s understand
what the M-P neuron is doing geometrically.


Program & Output:


Conclusion:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

CO’s Covered:

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..

………………………………………………………………………………………………..........................

………………………………………………………………………………………………………………..


Expt No. 9

Title: To Implement Error Backpropagation Perceptron Training Algorithm.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 9
ML (Machine Learning)

Aim : To implement Error Backpropagation Perceptron Training Algorithm.

Software : Anaconda, Google COLAB

Pre-Lab Questions:

1. What is backpropagation in neural networks?

Ans: - The principle behind the backpropagation algorithm is to reduce the error caused
by the initially random weights and biases, adjusting them until the network produces the
correct output. The system is trained with supervised learning: the error between the
system’s output and a known expected output is presented to the system and used to modify
its internal state. We update the weights so that the loss is minimized (ideally reaching its
global minimum). This is how backpropagation in neural networks works.

When the gradient is negative, an increase in weight decreases the error.

When the gradient is positive, a decrease in weight decreases the error.
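Both sign rules follow from the gradient-descent update w_new = w - η·dE/dw. This sketch assumes a made-up one-weight error function E(w) = (w - 3)² and a learning rate η = 0.1. It takes one step from a point where the gradient is negative and one step from a point where it is positive.

```python
def error(w):
    """Hypothetical error function with its minimum at w = 3."""
    return (w - 3) ** 2

def gradient(w):
    """dE/dw for the error function above."""
    return 2 * (w - 3)

eta = 0.1  # assumed learning rate
results = []
for w in (0.0, 5.0):
    g = gradient(w)
    w_new = w - eta * g          # step against the gradient
    results.append((w, g, w_new, error(w), error(w_new)))
    print(f"w={w}: gradient={g:+.1f} -> w_new={w_new:.2f}, "
          f"error {error(w):.2f} -> {error(w_new):.2f}")
```

At w = 0 the gradient is negative, so the weight increases. At w = 5 the gradient is positive, so the weight decreases. In both cases the error goes down.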

Post-Lab Questions:

1. What are the advantages of the backpropagation algorithm?

Ans:- Here are some of the advantages of the backpropagation algorithm:

1. It is memory-efficient in calculating the derivatives, as it uses less memory compared
to other optimization algorithms, like the genetic algorithm. This is a very important
feature, especially with large networks.

2. The backpropagation algorithm is fast, especially for small and medium-sized
networks. As more layers and neurons are added, it starts to get slower as more
derivatives are calculated.

3. This algorithm is generic enough to work with different network architectures, like
convolutional neural networks, generative adversarial networks, fully-connected
networks, and more.

4. The backpropagation algorithm itself has no parameters to tune, so there is less
overhead. The only parameters in the process are those of the gradient descent
algorithm, such as the learning rate.

Theory :

The error backpropagation learning algorithm is a tool used during the training of neural networks. Its main
goal is to compute the gradient of the loss function (also known as the error function or cost
function) with respect to the network’s parameters. These gradients are required by many optimization
routines, such as stochastic gradient descent and its many variants.

How does Error Backpropagation Work?

Essentially, calculating the gradients relies entirely on the rules of differential calculus. Since a neural network
is a series of layers, for each data point the loss is computed by passing a labeled data point through
the network (feed-forward). Next, the gradients are calculated starting from the final layer, and, using
the chain rule, they are passed backwards to calculate the gradients in the previous layers. The goal
is to get the gradient of the loss function with respect to each model parameter (the weight of each
connection between nodes as well as the bias weights). The point of this backward method is to
calculate the gradient at each layer more efficiently than the traditional approach of calculating each
layer’s gradient separately, because the gradients already computed for later layers are reused for earlier ones.
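This is a minimal NumPy sketch of error backpropagation for a network with one hidden layer of sigmoid units and a mean-squared-error loss. The architecture, the XOR data, the random seed and the learning rate are all assumptions. The backward pass applies the chain rule from the output layer to the hidden layer. A finite-difference check confirms the gradients before they are used for plain gradient-descent training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    W1, b1, W2, b2 = params
    h = sigmoid(X @ W1 + b1)          # hidden-layer activations
    out = sigmoid(h @ W2 + b2)        # output-layer activation
    return h, out

def loss(params, X, y):
    _, out = forward(params, X)
    return 0.5 * np.mean((out - y) ** 2)   # mean squared error

def backprop(params, X, y):
    """Gradients of the loss w.r.t. W1, b1, W2, b2 via the chain rule."""
    W1, b1, W2, b2 = params
    h, out = forward(params, X)
    n = X.shape[0]
    delta_out = (out - y) / n * out * (1 - out)   # dLoss/d(output pre-activation)
    grad_W2 = h.T @ delta_out
    grad_b2 = delta_out.sum(axis=0)
    delta_h = (delta_out @ W2.T) * h * (1 - h)    # error sent back to the hidden layer
    grad_W1 = X.T @ delta_h
    grad_b1 = delta_h.sum(axis=0)
    return [grad_W1, grad_b1, grad_W2, grad_b2]

def numerical_grads(params, X, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for k, p in enumerate(params):
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            plus = [q.copy() for q in params]
            minus = [q.copy() for q in params]
            plus[k][idx] += eps
            minus[k][idx] -= eps
            g[idx] = (loss(plus, X, y) - loss(minus, X, y)) / (2 * eps)
        grads.append(g)
    return grads

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

rng = np.random.default_rng(0)
params = [rng.normal(scale=0.5, size=(2, 3)), np.zeros(3),
          rng.normal(scale=0.5, size=(3, 1)), np.zeros(1)]

max_diff = max(np.max(np.abs(a - b))
               for a, b in zip(backprop(params, X, y), numerical_grads(params, X, y)))
print("max |backprop - finite difference| =", max_diff)

eta = 1.0                                  # assumed learning rate
initial_loss = loss(params, X, y)
for _ in range(5000):
    grads = backprop(params, X, y)
    params = [p - eta * g for p, g in zip(params, grads)]
final_loss = loss(params, X, y)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
print("outputs:", forward(params, X)[1].ravel().round(2))
```

Whether this particular run fully solves XOR depends on the initialization. The gradient check, however, is independent of that.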

What are the Uses of Error Backpropagation?

Backpropagation is especially useful for training deep neural networks on complex tasks such as
image or speech recognition. Because it relies on the chain rule of differentiation, backpropagation works
with any number of outputs and can train many kinds of neural networks.


Program & Output:


Conclusion:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
CO’s Covered:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..


Expt No. 10

Title: To Implement Principal Component Analysis.

Rubric Score (0 to 3)
Understanding
Analysis
Program Logic & Code
Performance & Output
Timely Submission
Total

Performed On:

Sign:


Experiment No. 10
ML (Machine Learning)

Aim : To implement Principal Component Analysis.

Software : Anaconda, Google COLAB

Pre-Lab Questions:

1. What is Curse of Dimensionality?

Ans: When working with data in higher dimensions, issues arise. As the number of features
increases, the number of samples needed to generalize well grows rapidly, and the model becomes more
complex. This is known as the curse of dimensionality. Because of the large number of features, there is a
risk that the model will overfit; as a result, it performs badly on the test data because it is overly reliant
on the training data.

Post-Lab Questions:

1. Can we implement Principal Component Analysis for Regression?

Ans:- Yes, we can use principal components to set up a regression (principal component regression).
This works well when the first few principal components capture most of the variation
in the predictors as well as the relationship with the response. The main disadvantage of this
approach is that PCA builds the new, reduced set of features while ignoring the response
variable Y. While these features may do a good overall job of explaining the variation in X, the
model will perform poorly if they do not explain the variation in Y.
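Principal component regression can be sketched with a scikit-learn pipeline. The pipeline standardizes the predictors, keeps two principal components and fits a linear regression on them. The data are synthetic: five correlated predictors driven by two hidden factors, chosen as an assumption so that the first few components capture most of the variation. PCA is fitted on X alone, without looking at y, which is exactly the limitation described above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 5 correlated predictors generated from 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
loadings = np.array([[1.0, 0.8, 0.5, 0.0, 0.3],
                     [0.0, 0.5, 1.0, 1.0, -0.7]])
X = factors @ loadings + 0.1 * rng.normal(size=(200, 5))
y = factors[:, 0] - 2 * factors[:, 1] + 0.1 * rng.normal(size=200)

# PCA sees only X; the regression step then uses the 2 component scores.
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)

print("variance explained by each component:",
      pcr.named_steps["pca"].explained_variance_ratio_.round(3))
print("R^2 on the training data:", round(pcr.score(X, y), 3))
```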

Theory :

Principal Component Analysis is an unsupervised learning algorithm used for dimensionality
reduction in machine learning. It is a statistical process that converts observations of correlated features
into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new
transformed features are called the principal components. PCA is a popular tool for exploratory data
analysis and predictive modeling. It draws out strong patterns in a dataset by reducing the number of
dimensions while preserving as much of the variance as possible.

PCA tries to find a lower-dimensional surface onto which the high-dimensional data can be projected.

PCA works by considering the variance along each direction in the data: directions with high variance are
assumed to carry the most information (and often give a good split between the classes), so keeping only
those directions reduces the dimensionality. Some real-world applications of PCA are image processing,
movie recommendation systems, and optimizing the power allocation in various communication channels. It
is a feature extraction technique: the new components it creates retain the most important information,
and the components that capture the least variance are dropped.
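The steps behind PCA can be sketched in NumPy: centre the data, build the covariance matrix, take its eigenvectors, and project onto the directions with the largest eigenvalues (variances). The pca helper and the synthetic data are assumptions for illustration. scikit-learn's PCA class performs the same kind of projection using a singular value decomposition.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components (eigen-decomposition sketch)."""
    X_centered = X - X.mean(axis=0)                  # 1. centre every feature
    cov = np.cov(X_centered, rowvar=False)           # 2. covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)           # 3. eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]                # 4. sort by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]                      # 5. keep the top-k directions
    return X_centered @ components, eigvals, components

# Synthetic data: the first two features are strongly correlated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
X = np.column_stack([x1,
                     2 * x1 + 0.3 * rng.normal(size=300),
                     rng.normal(size=300)])

Z, eigvals, components = pca(X, k=2)
print("explained variance ratio:", (eigvals[:2] / eigvals.sum()).round(3))
print("covariance of the projected data:\n", np.cov(Z, rowvar=False).round(6))
```

The first two features are strongly correlated, so the first component carries most of the variance. The projected features come out uncorrelated: the off-diagonal covariances are essentially zero.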


Some common terms used in PCA algorithm:

o Dimensionality: It is the number of features or variables present in the given dataset. More simply,
it is the number of columns present in the dataset.
o Correlation: It signifies how strongly two variables are related to each other, i.e., if one
changes, how the other variable changes. The correlation value ranges from -1 to +1: -1 indicates
a perfect negative linear relationship (one increases as the other decreases), and +1 indicates a perfect
positive linear relationship.
o Orthogonal: It means that the variables are not correlated with each other, and hence the correlation
between the pair of variables is zero.
o Eigenvectors: Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a
scalar multiple of v, i.e., Mv = λv for some scalar λ (the eigenvalue).
o Covariance Matrix: A matrix containing the covariance between each pair of variables is called the
covariance matrix.

Applications of Principal Component Analysis

o PCA is mainly used as a dimensionality reduction technique in various AI applications such as
computer vision and image compression.
o It can also be used for finding hidden patterns in high-dimensional data. Some fields where PCA
is used are finance, data mining, and psychology.


Program & Output:


Conclusion:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
CO’s Covered:
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..
………………………………………………………………………………………………..........................
………………………………………………………………………………………………………………..


ASSIGNMENTS
