
Artificial Intelligence
Introduction to Artificial Intelligence


Agenda

01 Importance of AI
02 What Is AI?
03 What Is Intelligence?
04 Difference Between AI, ML, and DL
05 Basics of Machine Learning
06 Basics of Deep Learning


Let us understand why we should
study Artificial Intelligence!



Why Artificial Intelligence?

• Widely used in the banking and finance industries
• An important feature of medical science
• Perfect for heavy industries
• Efficiently used in air transport
• Changed the face of gaming
• Reinvented the world
• A great help for humans
Want to know more about Artificial
Intelligence?



Artificial Intelligence

'Artificial intelligence (AI) is a field of computer science that emphasizes the creation of intelligent machines which can work and react like humans'


What Is Intelligence?

'Intelligence can be defined as one's capacity for understanding, self-awareness, learning, emotional knowledge, planning, creativity, and problem solving'

▪ Artificial Intelligence is intelligence in machines
▪ It is commonly implemented in computer systems using software programs
▪ Accordingly, there are two possibilities:
  • A system with intelligence is expected to behave as intelligently as a human
  • A system with intelligence is expected to behave in the best possible manner


What Makes Humans Intelligent?

The core problem of Artificial Intelligence involves programming computers for certain traits such as knowledge, reasoning, perception, learning, planning, and problem solving


Growth of Artificial Intelligence



A lot of people think that Artificial Intelligence, Machine Learning, and Deep Learning are all the same. Let me tell you some real facts then!


AI and ML and DL

Artificial Intelligence

Machine Learning (ML)
• An approach to achieve Artificial Intelligence
• A subfield of AI that aims to teach computers the ability to do tasks with data, without explicit programming
• It uses numerical and statistical approaches, including artificial neural networks, to encode learning in models

Deep Learning (DL)
• A technique for implementing Machine Learning
• A subfield of AI that uses specialized techniques involving multi-layer (2+) artificial neural networks
• Layering allows cascaded learning and abstraction levels (e.g., line -> shape -> object -> scene)


AI in a Bigger Set



Let us understand Machine Learning in
detail!



Machine Learning Around YOU!

• Products Recommendation
• Amazon Alexa
• Movie Recommendation
• Google Traffic Prediction


Introduction to Machine Learning



What Is Machine Learning?

Machine Learning is a subset of Artificial Intelligence which gives a machine the ability to learn without being explicitly programmed. Data, not algorithms, is key to Machine Learning success


How Does a Machine Learn?

▪ A Machine Learning algorithm is trained using a training dataset to create a model
▪ When new input data is introduced to the ML algorithm, it makes a prediction on the basis of the model
▪ The prediction is evaluated for accuracy; if the accuracy is acceptable, the Machine Learning algorithm is deployed
▪ If the accuracy is not acceptable, the Machine Learning algorithm is trained again with an augmented training dataset
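A minimal sketch of this train-predict-evaluate loop (my own illustration using scikit-learn; any model and dataset would do):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # train on the training dataset
acc = accuracy_score(y_test, model.predict(X_test))      # evaluate predictions for accuracy
print("deploy" if acc >= 0.9 else "retrain with more data")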


Machine Learning Types

Machine Learning is categorized into three types:

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning


Machine Learning Types: Supervised Learning

In Supervised Learning, you can consider that the learning is guided by a teacher. We have a dataset which acts as a teacher, and its role is to train the model or the machine. Once the model gets trained, it can start making a prediction or decision whenever new data is given to it

[Figure: training data of labeled images ('This is an apple'), which the model acknowledges and learns from]
[Figure: after training, the model is asked 'What does this image represent?' and answers '97% it's an apple!']
Use Case: Spam Classifier

Most spam filtering techniques are based on text categorization methods. Thus, filtering spam turns out to be a classification problem. We employ supervised Machine Learning techniques to filter email spam messages
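A hedged sketch of such a spam classifier (scikit-learn, with a tiny made-up email set purely for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at 10am tomorrow",
          "free money claim now", "project status update"]
labels = [1, 0, 1, 0]                       # 1 = spam, 0 = ham

vec = CountVectorizer()                     # text categorization features (word counts)
X = vec.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)        # learns from the labeled examples

print(clf.predict(vec.transform(["claim your free prize"])))  # likely [1], i.e., spam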
Machine Learning Types: Unsupervised Learning

Here, the model learns through observation and finds structures in data. Once the model is given a dataset, it automatically finds patterns and relationships in the dataset by creating clusters in it. What it cannot do is add labels to these clusters. For example, it cannot say if this is a group of apples, mangoes, or oranges, but it will separate all apples from mangoes and oranges
[Figure: the unlabeled data is separated into three similar clusters]
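A minimal clustering sketch along these lines (scikit-learn KMeans; the three loose point groups are an illustrative stand-in for unlabeled fruit data):

import numpy as np
from sklearn.cluster import KMeans

# three loose groups of unlabeled points
X = np.vstack([np.random.randn(20, 2) + c for c in ([0, 0], [5, 5], [0, 5])])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])   # cluster ids 0/1/2; no names like 'apple' are attached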
Use Case: Netflix Recommendation

Netflix uses Machine Learning algorithms to help break viewers' preconceived notions and find shows that they might not have initially chosen
Machine Learning Types: Reinforcement Learning

It is the ability of an agent to interact with the environment and find out what the best outcome is. It follows the concept of the hit-and-trial method. The agent is rewarded or penalized with a point for a correct or a wrong answer, and on the basis of the positive reward points gained, the model trains itself

[Figure: the agent (1) observes the environment, (2) selects an action using its policy, (3) acts and receives a reward or penalty (e.g., -50 points), (4-5) updates its policy for next time, and (6) iterates the process]
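A toy sketch of this observe-act-reward-update loop (a one-state example of my own, not from the slides):

import random

q = {"left": 0.0, "right": 0.0}        # the agent's policy (action values)
reward = {"left": -50, "right": +10}   # environment: 'right' is the correct action
lr = 0.1

for _ in range(200):
    action = random.choice(list(q))    # steps 1-2: observe and select an action
    r = reward[action]                 # steps 3-4: act, receive a reward or penalty
    q[action] += lr * (r - q[action])  # step 5: update the policy
                                       # step 6: iterate the process
print(max(q, key=q.get))               # "right": learned through hit and trial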
Use Case: Self-driving Cars

Companies such as Tesla (you've heard of them), Google, Wayve, and more are working on such machines. These cars are powered by Reinforcement Learning, which allows machines (known as agents) to learn by experimentation
Machine Learning Algorithms



Machine Learning for You!

Cool Machine Learning projects you can use:

➢ https://www.autodraw.com/

➢ https://quickdraw.withgoogle.com/

➢ https://opensource.google.com/projects/explore/machine-learning

➢ https://experiments.withgoogle.com/collection/ai

➢ https://toolbox.google.com/datasetsearch



Limitations of Machine Learning

• Machine Learning algorithms require massive amounts of training data
• Error diagnosis and correction can be difficult
• Time constraints in learning, as it learns through historical data
• Lack of creativity


Introduction to Deep Learning



Deep Learning

Deep Learning is a class of Machine Learning methods based on learning data representations, as opposed to task-specific algorithms. It teaches computers to do what comes naturally to humans: learning by example


Deep Learning

▪ Deep Learning architectures such as deep neural networks, deep belief networks, and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, and audio recognition, where they have produced results comparable to, and in some cases superior to, human experts
▪ Most modern Deep Learning models are based on artificial neural networks


Applications of Deep Learning

• Speech Recognition
• Self-driving Cars
• Automatic Machine Translation
• Visual Translation


How Does Deep Learning Work?

Most Deep Learning methods use neural network architectures, which is why Deep Learning models are often referred to as deep neural networks

▪ The term 'deep' usually refers to the number of hidden layers in the neural network
▪ Traditional neural networks contain only 2-3 hidden layers, while deep networks can have as many as 150
▪ Deep Learning models are trained using large sets of labeled data and neural network architectures that learn features directly from data without the need for manual feature extraction


What Is a Neural Network?

A neural network is a computing model whose layered structure resembles the networked structure of neurons in the brain, with layers of connected nodes. It can learn from data, so it can be trained to recognize patterns, classify data, and forecast future events

▪ A neural network breaks down your input into layers of abstraction
▪ It consists of an input layer, one or more hidden layers, and an output layer
▪ These layers are interconnected via nodes, or neurons, with each layer using the output of the previous layer as its input
▪ Its main function is to receive a set of inputs, perform calculations, and then use the output to solve the problem


Artificial Neural Networks (ANN)

Artificial neural networks are computing systems inspired by the biological neural networks that constitute animal brains. Such systems learn (progressively improve their ability) to do tasks by considering examples, generally without task-specific programming

▪ For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as 'cat' or 'no cat', and by using these analytic results they can identify cats in other images
▪ They have been found to be most useful in applications difficult to express with a traditional computer algorithm using rule-based programming
Quiz



Quiz 1

Deep Learning is not a subset of ML.
A True
B False

Answer 1: B False (Deep Learning is a subset of ML)


Quiz 2

Self-driving cars are a use case of…
A Classification Algorithm
B Reinforcement Learning
C Unsupervised Learning
D Supervised Learning

Answer 2: B Reinforcement Learning


Quiz 3

Is having perception a kind of intelligence?
A Yes
B No

Answer 3: A Yes


Thank you!


Artificial Intelligence
Introduction to Neural Networks and Deep Learning Frameworks


Agenda

01 Topology of Neural Networks
02 Perceptrons
03 Activation Functions and Their Types
04 Perceptron Training Algorithm
05 Deep Learning Frameworks
06 What Are Tensors?
07 Computational Graph
08 Program Elements in TensorFlow


Topology of a Neural Network

Typically, artificial neural networks have a layered structure. The input layer picks up the input signals and passes them on to the next layer, known as the 'hidden' layer (there may be more than one hidden layer in a neural network). Last comes the output layer that delivers the result

[Figure: input layer -> hidden layer -> output layer]
Well, everyone has heard about AI, but how
many of you know that the inspiration behind
artificial neural networks came from the
biological neurons that are found within
human brains?



Let us first understand the architecture of
our biological neurons which is very similar to
that of artificial neurons



Neurons: How Do They Work?

A neural network is a computer simulation of the way biological neurons work within a human brain

• Dendrites: These branch-like structures extending away from the cell body receive messages from other neurons and allow the messages to travel to the cell body
• Cell Body: It contains a nucleus, smooth and rough endoplasmic reticulum, Golgi apparatus, mitochondria, and other cellular components
• Axon: An axon carries an electrical impulse from the cell body to another neuron


Now, let us understand about artificial
neurons in detail!



Artificial Neurons

▪ The most fundamental unit of a deep neural network is called an artificial neuron
▪ It takes an input, processes it, passes it through an activation function, and returns the output
▪ Such artificial neurons are called perceptrons
▪ A perceptron is a linear model used for binary classification

[Figure: schematic representation of a neuron in a neural network]


Perceptron: How Does It Work?

▪ The three arrows correspond to the three inputs coming into the network
▪ Values [0.7, 0.6, and 1.4] are weights assigned to the corresponding inputs
▪ Inputs get multiplied with their respective weights, and their sum is taken
▪ Consider the three inputs as x1, x2, and x3, and let the three weights be w1, w2, and w3

Sum = x1*w1 + x2*w2 + x3*w3
Sum = x1*(0.7) + x2*(0.6) + x3*(1.4)

▪ An offset is added to this sum. This offset is called bias
▪ It is just a constant number, say 1, which is added for scaling purposes

New_Sum = x1*(0.7) + x2*(0.6) + x3*(1.4) + bias
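A minimal NumPy sketch of this computation, using the slide's weights and an assumed input vector:

import numpy as np

x = np.array([1.0, 2.0, 3.0])        # x1, x2, x3 (illustrative inputs)
w = np.array([0.7, 0.6, 1.4])        # w1, w2, w3 from the slide
bias = 1.0                           # constant offset for scaling

new_sum = np.dot(x, w) + bias        # x1*w1 + x2*w2 + x3*w3 + bias
print(new_sum)                       # 0.7 + 1.2 + 4.2 + 1 = 7.1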



Why Do We Need Weights?

▪ Statistically, weights determine the relative importance of an input
▪ Mathematically, they are just the slope of the line


Why Do We Need Weights?

[Figure: two inputs, Humidity (x1) and Blue shirt (x2, 'Will it rain, if I wear a blue shirt?'), feed a perceptron whose output answers 'Will it rain? (0/1)']

w2 is assigned a lower value because the significance of the input 'blue shirt' is less than that of 'humidity'


Why Do We Need Activation Functions?

• We have two classes: one set is represented with triangles and the other with circles
• Try to draw a linear decision boundary which can separate these two classes
• If no straight line can separate them, we will have to add a third dimension to create a linearly separable model, which is easy to deal with


Activation Functions

▪ They are used to convert the input signal of a node in an artificial neural network into an output signal
▪ That output signal is then used as an input in the next layer in the stack
▪ Activation functions introduce non-linear properties to our network
▪ A neural network without an activation function is essentially just a linear regression model
▪ The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks


Types of Activation Functions

• Identity
• Binary Step
• Sigmoid
• Tanh
• ReLU
• Leaky ReLU
• Softmax
Identity Function

• A straight-line function where activation is proportional to input
• No matter how many layers we have, if all of them are linear in nature, the final activation function of the last layer will be nothing but a linear function of the input of the first layer
• We use a linear function to solve a linear regression problem
• Range: (−∞, ∞)

f(x) = x


Binary Step Function

• Also known as the Heaviside step function or the unit step function (usually denoted by H or θ), it is a discontinuous function
• Its value is 0 for a negative argument and 1 for a positive argument
• It depends on the threshold value we define
• We use the binary step function to solve a binary classification problem
• Range: {0, 1}

f(x) = 0 for x < 0; 1 for x >= 0


Sigmoid Function

• The sigmoid function is an activation function that scales values between 0 and 1
• When we apply the weighted sum in the place of x, the values are scaled between 0 and 1
• Large negative numbers are scaled toward 0, and large positive numbers are scaled toward 1
• Range: (0, 1)

f(x) = 1 / (1 + e^(−x))


Tanh Function

• It is a hyperbolic trigonometric function
• The Tanh activation almost always works better than the sigmoid function, as optimization is easier with it
• The advantage of Tanh is that it can deal more easily with negative numbers
• It is actually a mathematically shifted version of the sigmoid function
• Range: (−1, 1)

f(x) = tanh(x) = 2 / (1 + e^(−2x)) − 1


ReLU Function

• ReLU stands for rectified linear unit
• It is the most widely used activation function
• It is primarily implemented in the hidden layers of a neural network
• This function passes positive values through unchanged during forward propagation and outputs 0 for negative values
• Range: [0, ∞)

f(x) = 0 for x < 0; x for x >= 0


Leaky ReLU Function

• Leaky ReLU allows a small negative value to pass during backpropagation if we have a dead ReLU problem
• This keeps a small gradient flowing, which eventually reactivates the neuron
• Range: (−∞, ∞)

f(x) = 0.01x for x < 0; x for x >= 0


Softmax Function

• The Softmax function is used when we have multiple classes
• It is useful for finding out the class which has the maximum probability
• The Softmax function is ideally used in the output layer of the classifier, where we are actually trying to attain the probabilities to define the class of each input
• Range: (0, 1)

σ(z)_j = e^(z_j) / Σ_{k=1..K} e^(z_k),  j = 1, 2, …, K
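Minimal NumPy versions of the activation functions above, following the formulas as given on the slides:

import numpy as np

def identity(x):    return x
def binary_step(x): return np.where(x < 0, 0, 1)
def sigmoid(x):     return 1 / (1 + np.exp(-x))
def tanh(x):        return 2 / (1 + np.exp(-2 * x)) - 1   # shifted sigmoid
def relu(x):        return np.where(x < 0, 0.0, x)
def leaky_relu(x):  return np.where(x < 0, 0.01 * x, x)
def softmax(z):
    e = np.exp(z - np.max(z))      # subtract the max for numerical stability
    return e / e.sum()             # outputs sum to 1 (class probabilities)

print(softmax(np.array([2.0, 1.0, 0.1])))   # e.g. [0.66, 0.24, 0.10]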


Got bored again? Let us get back to
perceptrons and try to understand them in a
better way!



Training a Perceptron

By training a perceptron, we try to find a line, plane, or some hyperplane which can accurately separate two classes by adjusting weights and biases

[Figure: the boundary improving over training: Error = 2 -> Error = 1 -> Error = 0]


Perceptron Training Algorithm

1. Initialize the weights (w1 … wn), bias, and threshold
2. For inputs x1 … xn, calculate the weighted sum and pass it through an activation function to produce the output
3. If the output is correct, stop
4. If there is an error, update the weights, W_new = W_old − LR * (∂E/∂w), and repeat from step 2
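A minimal sketch of this loop: a perceptron learning the (linearly separable) Boolean AND function. The update rule here is the classic perceptron rule, a simplified stand-in for the gradient form shown in the figure:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                # AND is linearly separable
w, b, lr = np.zeros(2), 0.0, 0.1          # initialize weights and bias

for _ in range(20):                       # iterate until the outputs are correct
    for xi, target in zip(X, y):
        out = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
        err = target - out
        w += lr * err * xi                        # update weights on error
        b += lr * err

print([(1 if np.dot(w, xi) + b > 0 else 0) for xi in X])  # [0, 0, 0, 1]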


Benefits of Using Artificial Neural Networks

• Organic learning
• Non-linear data processing
• Fault tolerance
• Self-repairing


Let us now move toward Deep
Learning frameworks!



Deep Learning Frameworks

These Deep Learning libraries help in implementing artificial neural networks



TensorFlow

• An open-source software library for high-performance numerical computations
• Developed by Google


• Used for forecasting, natural language processing, text classification, and tagging (e.g., Google Translate)


• TensorBoard: used for visualizing TensorFlow computations and graphs
• TensorFlow Serving: used for rapid deployment of new algorithms/experiments while retaining the same server architecture and APIs


Keras

• A high-level API which can run on top of TensorFlow, Theano, or CNTK


• Supports both recurrent and convolutional neural networks
• Builds models by stacking layers on top of one another


PyTorch

• A scientific computing framework developed by Facebook
• 'Pythonic' in nature
• Offers dynamic computational graphs


DL4J

• A Deep Learning programming library written for Java
• Used for image recognition, fraud detection, text mining, parts-of-speech tagging, and natural language processing


MXNet

• Developed by the Apache Software Foundation
• Used for imaging, speech recognition, forecasting, and NLP


What Are Tensors?

A tensor is a multi-dimensional array in which data is stored. A tensor is given as an input to a neural network


Tensor Rank

Tensor rank represents the dimension of the n-dimensional array

Rank | Math Entity                      | Example
0    | Scalar (magnitude only)          | s = 483
1    | Vector (magnitude and direction) | v = [1.1, 2.2, 3.3]
2    | Matrix (table of numbers)        | m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
3    | 3-Tensor (cube of numbers)       | t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]]
n    | n-Tensor                         | …
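A small sketch of these ranks using NumPy, where the rank corresponds to the number of array dimensions (ndim):

import numpy as np

s = np.array(483)                                  # rank 0: scalar
v = np.array([1.1, 2.2, 3.3])                      # rank 1: vector
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])    # rank 2: matrix
t = np.array([[[2], [4], [6]],
              [[8], [10], [12]],
              [[14], [16], [18]]])                 # rank 3: cube of numbers

print(s.ndim, v.ndim, m.ndim, t.ndim)              # 0 1 2 3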


Computational Graph

Computation is done in the form of a graph

a = 10; b = 20; c = 30
h = (a * b) + c

[Figure: a multiplication node combines a and b; an addition node combines the product with c to produce h]


Computational Graph

The computational graph is executed inside a session

[Figure: the same graph shown standalone and then placed inside a session for execution]


Computational Graph

• Node -> mathematical operation
• Edge -> tensor


Program Elements in TensorFlow

• Constant: Constants are program elements whose values do not change
  a = tf.constant(10); b = tf.constant(20)

• Placeholder: A placeholder is a program element to which we can assign data at a later time
  x = tf.placeholder(tf.float32); y = tf.placeholder(tf.string)

• Variable: A variable is a program element which allows us to add new trainable parameters to the graph
  W = tf.Variable([3], tf.float32); b = tf.Variable([0.4], tf.float32)
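A minimal end-to-end sketch combining these three elements (assumes the TensorFlow 1.x API, where placeholders and sessions exist):

import tensorflow as tf   # TensorFlow 1.x assumed

a = tf.constant(10)
b = tf.constant(20)
c = tf.constant(30)
h = (a * b) + c                          # builds the graph; nothing runs yet

x = tf.placeholder(tf.float32)
W = tf.Variable([3.0], dtype=tf.float32)
bias = tf.Variable([0.4], dtype=tf.float32)
y = W * x + bias

with tf.Session() as sess:               # the graph executes inside a session
    sess.run(tf.global_variables_initializer())
    print(sess.run(h))                              # 230
    print(sess.run(y, feed_dict={x: [1.0, 2.0]}))   # [3.4 6.4]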


Quiz



Quiz 1

A tensor is a single-dimensional array in which data is stored.
A True
B False

Answer 1: B False (a tensor is a multi-dimensional array)


Quiz 2

How many layers does a standard neural network have?
A 1
B 2
C 3
D 4 or more

Answer 2: C 3 (an input layer, a hidden layer, and an output layer)


Quiz 3

Is 'perceptron' another name for an artificial neuron?
A Yes
B No

Answer 3: A Yes


Thank you!


Artificial Intelligence
Deep Dive into Neural Networks


Agenda

01 Limitations of a Single-layer Perceptron
02 Use Case 1
03 Feedforward Neural Network
04 Multi-layer Perceptron
05 Use Case 2
06 Backpropagation Algorithm
07 Gradient Descent
08 Stochastic Gradient Descent
09 Adam Optimization Algorithm
10 Demo
Limitations of a Single-layer Perceptron

▪ A single-layer perceptron can only learn linearly separable problems
▪ If the problem is not linearly separable, the learning process of a perceptron will never reach a point where all points are classified correctly
▪ Boolean 'AND' and 'OR' functions are linearly separable, whereas Boolean 'XOR' is not

[Figure: Boolean 'AND' is linearly separable; Boolean 'XOR' is not linearly separable]


For solving this problem, we can use a multi-layer perceptron


▪ Moreover, a single-layer perceptron won't be able to solve complex problems such as image classification
▪ In such problems, the dimensionality and complexity of the classification are very high
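A quick sketch of the XOR limitation (my own illustration using scikit-learn, not code from the slides):

import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])            # XOR: not linearly separable

p = Perceptron(max_iter=1000).fit(X, y_xor)
print(p.score(X, y_xor))                  # well below 1.0: no separating line exists

mlp = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                    max_iter=5000, random_state=0).fit(X, y_xor)
print(mlp.score(X, y_xor))                # typically 1.0: the hidden layer bends the boundary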


Let us see some real-life problems which cannot be solved by a single-layer perceptron


Use Case 1

Complex problems that involve a lot of parameters cannot be solved by a single-layer perceptron

▪ Consider a case where you own an e-commerce firm. You have planned to increase traffic on your site by providing a special discount on products and services. Now, you want to create awareness among people regarding this end-of-season sale by marketing on different portals, like:
  • Google ads
  • Personal emails
  • Sales advertisements on relevant sites
  • YouTube ads
  • Ads on different sites
  • LinkedIn
  • Blogs, and so on
▪ This task is too complex for a human to analyze, as you can see that the number of parameters is quite high
▪ Let us try to solve it using Deep Learning


Use Case 1

▪ You can either use just one platform for publicity or use a variety of them
▪ Each of them has its own advantages and disadvantages, and lots of factors would have to be considered
▪ The increased traffic on your portal, or the number of sales that would happen, depends on different categorical inputs, their sub-categories, and their parameters

Computing and calculating profit, in terms of popularity and sales, from so many inputs and their sub-categories is not possible with just one perceptron


So now, you know why a single perceptron cannot be used for complex non-linear problems: there are just too many inputs!


Before getting into the actual solution to our
problem, let us recall one of the previously
discussed topics: Feedforward Neural Network



Feedforward Neural Network

A feedforward neural network is the simplest artificial neural network, containing multiple nodes arranged in multiple layers. Nodes in adjacent layers have connections, or edges, and all connections are weighted


A feedforward neural network can be of two kinds:

• Single-layer perceptron: This is the simplest feedforward neural network; it does not contain any hidden layers
• Multi-layer perceptron (MLP): It includes at least one hidden layer (besides one input layer and one output layer)


Let us now discuss the ultimate solution to our
previous problem, i.e., MLP



Multi-layer Perceptron

▪ A multi-layer perceptron (MLP) is a deep artificial neural network
▪ It is composed of more than one perceptron
▪ An MLP comprises:
  • An input layer to receive the signal
  • An output layer that makes a decision or prediction about the input
  • An arbitrary number of hidden layers
▪ Each node, apart from the input nodes, has a nonlinear activation function
▪ An MLP uses backpropagation as a supervised learning technique

MLP is widely used for solving problems that require supervised learning, as well as in research into computational neuroscience and parallel distributed processing. Applications include speech recognition, image recognition, and machine translation


Multi-layer Perceptron

[Figure: a multi-layer perceptron with a single hidden layer. All connections have weights associated with them, but only three weights (w0, w1, and w2) are shown]


Input Layer:
• It has three nodes
• The Bias (offset) node has a value of 1
• The other two nodes take X1 and X2 as external inputs
• Outputs from nodes in the input layer are 1, X1, and X2, respectively, which are fed into the hidden layer


Hidden Layer:
• It also has three nodes, with the Bias node having an output of 1
• The output of the other two nodes in the hidden layer depends on the outputs from the input layer (1, X1, and X2) as well as the weights associated with the connections (edges)
• The figure shows the output calculation for one of the hidden nodes; the output from the other hidden node can be calculated similarly
• Here, 'f' refers to the activation function. These outputs are then fed to the nodes in the output layer


Output Layer:
• The output layer has two nodes, which take inputs from the hidden layer and perform computations similar to those shown for the highlighted hidden node
• The values calculated (Y1 and Y2) as a result of these computations act as outputs of the multi-layer perceptron


Let us now see how MLP helps us in providing a
solution to our Use Case 1



Use Case 1: Solution

▪ Every source behaves as an input to the neural network
▪ Once all sources are fed into the system, the neural network calculates the output after the computation is done


Let's take another example to understand a multi-layer perceptron better!


Use Case 2

Suppose we have the following student-marks dataset:

Hours Studied | Mid-term Marks | Final Results
35            | 67             | 1
12            | 75             | 0
16            | 89             | 1
45            | 56             | 1
10            | 90             | 0

• The two input columns show the number of hours each student studied and the mid-term marks obtained by the student, respectively
• The Final Results column can have two values, 1 or 0, indicating whether the student passed (1) or failed (0) the final term


Now, suppose we want to predict whether a student studying 25 hours and having 70 marks in the mid-term will pass the final term


Use Case 2

Hours Studied | Mid-term Marks | Final Results
25            | 70             | ?

This is a binary classification problem where a multi-layer perceptron can learn from the given examples (the training data) and make an informed prediction when given a new data point. We will now see how a multi-layer perceptron learns such relationships


The process by which a multi-layer perceptron learns is called the backpropagation algorithm. We will discuss this in detail after completing MLP!


Use Case 2: Solution

• The figure has two nodes in the input layer (apart from the Bias node) which take the inputs Hours Studied and Mid-term Marks
• It also has a hidden layer with two nodes (apart from the Bias node)
• The output layer has two nodes as well: the upper node outputs the probability of 'Pass' while the lower node outputs the probability of 'Fail'


In classification, we generally use a Softmax function
as the activation function to ensure that the outputs
are probabilities and they add up to 1. So, in this case,
Probability (Pass) + Probability (Fail) = 1



Use Case 2: Solution

▪ Step 1: Forward Propagation

• Consider the hidden layer node marked V in the figure
• Assume that the weights of the connections from the inputs to that node are w1, w2, and w3 (as shown)
• The first training example as input:
  - Input to the network = [35, 67]
  - Desired output from the network (target) = [1, 0]
  - The output V from the node can be calculated as follows (where 'f' is an activation function): V = f(1*w1 + 35*w2 + 67*w3)


Suppose, the output probabilities from the two nodes
in the output layer are 0.4 and 0.6, respectively
(since the weights are randomly assigned, outputs
will also be random)



We can see that the calculated probabilities (0.4 and
0.6) are very far from the desired probabilities (1
and 0, respectively); hence, the network in the
figure is said to have an ‘Incorrect Output’



Use Case 2: Solution

▪ Step 2: Backpropagation and Weight Updates

• We calculate the total error at the output nodes and propagate these errors back through the network using backpropagation to calculate the gradients
• Then, we use an optimization method such as gradient descent to 'adjust' all weights in the network with the aim of reducing the error at the output layer
• This is shown in the next figure


• Suppose that the new weights associated with the node in consideration are w4, w5, and w6 (after backpropagation and adjusting of weights)


If we now input the same example to the network
again, the network should perform better than
before since the weights have now been adjusted to
minimize errors in prediction



Use Case 2: Solution

▪ As shown in the figure, the errors at the output nodes have now reduced to [0.2, −0.2], as compared to [0.6, −0.4] earlier
▪ This means that our network has learned to classify our first training example more accurately
▪ We repeat this process with all other training examples in our dataset. Then, our network will learn those examples as well


If we now want to predict whether a student
studying 25 hours and having 70 marks in the mid
term will pass the final term, we go through the
forward propagation step and find the output
probabilities for Pass and Fail



So now, let us understand backpropagation in
detail as you have already heard a lot about it!



Backpropagation Algorithm

The backpropagation algorithm is a supervised learning method for multi-layer feedforward networks from the field of Artificial
Neural Networks



Backpropagation Algorithm

• The principle of this approach is to model a given function by modifying the internal weightings of input signals to produce an expected output signal
• The system is trained using a supervised learning method, where the error between the system's output and a known expected output is presented to the system and used to modify its internal state


Let us understand its working with the help
of an example!



Backpropagation Algorithm: How Does It Work?

Consider the following table:

Input | Desired Output
0     | 0
1     | 2
2     | 4


Consider the initial value of the weight as 3:

Input | Desired Output | Model Output (W=3)
0     | 0              | 0
1     | 2              | 3
2     | 4              | 6


Observe the difference between the actual output and the desired output:

Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error
0     | 0              | 0                  | 0              | 0
1     | 2              | 3                  | 1              | 1
2     | 4              | 6                  | 2              | 4


Observe the error when changing the value of W to 4:

Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error | Model Output (W=4) | Square Error
0     | 0              | 0                  | 0              | 0            | 0                  | 0
1     | 2              | 3                  | 1              | 1            | 4                  | 4
2     | 4              | 6                  | 2              | 4            | 8                  | 16


What if we decrease the value of W?



Consider the value of the weight as 2:

Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error | Model Output (W=2) | Square Error
0     | 0              | 0                  | 0              | 0            | 0                  | 0
1     | 2              | 3                  | 1              | 1            | 2                  | 0
2     | 4              | 6                  | 2              | 4            | 4                  | 0

We see that when the weight is reduced, the error also decreases


Backpropagation Algorithm: How Does It Work?

Consider the following graph:

[Figure: a U-shaped curve of error versus weight, with a global loss minimum at the bottom]


We need to reach the global loss minimum. This is nothing but backpropagation:

• When the gradient is negative, an increase in weight leads to a decrease in error
• When the gradient is positive, a decrease in weight leads to a decrease in error


Let us now understand the math behind
backpropagation



Backpropagation Algorithm: How Does It Work?

To have some numbers to work with, here are the initial weights, biases, and training inputs/outputs:

• The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs
• We are going to work with a single training set: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99


Steps involved in backpropagation:

Step 1: The Forward Pass
Step 2: The Backward Pass


Step 1: The Forward Pass

The total net input for h1:
net_h1 = w1*i1 + w2*i2 + b1*1
net_h1 = 0.15*0.05 + 0.2*0.1 + 0.35*1 = 0.3775

The output for h1:
out_h1 = 1/(1 + e^(−net_h1)) = 1/(1 + e^(−0.3775)) = 0.593269992

Carrying out the same process for h2:
out_h2 = 0.596884378

**We repeat this process for the output layer neurons, using the outputs from the hidden layer neurons as their inputs**


The output for o1:
net_o1 = w5*out_h1 + w6*out_h2 + b2*1
net_o1 = 0.4*0.593269992 + 0.45*0.596884378 + 0.6*1 = 1.105905967
out_o1 = 1/(1 + e^(−net_o1)) = 1/(1 + e^(−1.105905967)) = 0.75136507

Carrying out the same process for o2:
out_o2 = 0.772928465


Calculating the Total Error

We can now calculate the error for each output neuron using the squared error function and sum them to get the total error: E_total = Σ ½(target − output)²

The target output for o1 is 0.01, but the neural network output is 0.75136507; therefore, its error is:
E_o1 = ½(target_o1 − out_o1)² = ½(0.01 − 0.75136507)² = 0.274811083

Repeating this process for o2 (remembering that its target is 0.99), we get:
E_o2 = 0.023560026

The total error for the neural network is the sum of these errors:
E_total = E_o1 + E_o2 = 0.274811083 + 0.023560026 = 0.298371109


Step 2: The Backward Pass

Our goal with backpropagation is to update each of the weights in the network so that the actual output is closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole


Consider w5; we will calculate the rate of change of the error w.r.t. the change in weight w5:
∂E_total/∂w5 = (∂E_total/∂out_o1) * (∂out_o1/∂net_o1) * (∂net_o1/∂w5)

Since we are propagating backwards, the first thing we need to do is calculate the change in the total error w.r.t. the outputs o1 and o2:
E_total = ½(target_o1 − out_o1)² + ½(target_o2 − out_o2)²
∂E_total/∂out_o1 = -(target_o1 − out_o1) = -(0.01 − 0.75136507) = 0.74136507

Now, we propagate further backward and calculate the change in the output o1 w.r.t. its total net input:
out_o1 = 1/(1 + e^(−net_o1))
∂out_o1/∂net_o1 = out_o1(1 − out_o1) = 0.75136507(1 − 0.75136507) = 0.186815602

How much does the total net input of o1 change w.r.t. w5?
net_o1 = w5*out_h1 + w6*out_h2 + b2*1
∂net_o1/∂w5 = out_h1 = 0.593269992


Step 2: The Backward Pass (putting all the values together and calculating the updated weight value)

Let's put all the values together:
∂E_total/∂w5 = 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041

Calculate the updated value of w5 (with learning rate η = 0.5):
w5+ = w5 − η * ∂E_total/∂w5 = 0.4 − 0.5 * 0.082167041 = 0.35891648

We can repeat this process to get the new weights w6, w7, and w8:
w6+ = 0.408666186
w7+ = 0.511301270
w8+ = 0.561370121

We perform the actual updates in the neural network only after we have the new weights leading into the hidden layer neurons


Step 2: The Backward Pass (Hidden Layer)

We'll continue the backward pass by calculating new values for w1, w2, w3, and w4

Start with w1:
∂E_total/∂w1 = (∂E_total/∂out_h1) * (∂out_h1/∂net_h1) * (∂net_h1/∂w1)
∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1

We're going to use a process similar to the one we used for the output layer, but slightly different, to account for the fact that the output of each hidden layer neuron contributes to the output (and therefore the error) of multiple output neurons. Thus, we need to take both E_o1 and E_o2 into consideration


We can visualize the contribution of h1 to both output errors as:

[Figure: E_o1 and E_o2 both flow back into hidden neuron h1]


Starting with:
∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1
∂E_o1/∂out_h1 = (∂E_o1/∂net_o1) * (∂net_o1/∂out_h1)

We can calculate ∂E_o1/∂net_o1 using values calculated earlier:
∂E_o1/∂net_o1 = (∂E_o1/∂out_o1) * (∂out_o1/∂net_o1) = 0.74136507 * 0.186815602 = 0.138498562

net_o1 = w5*out_h1 + w6*out_h2 + b2*1
∂net_o1/∂out_h1 = w5 = 0.40


Putting the values into the equation:
∂E_o1/∂out_h1 = (∂E_o1/∂net_o1) * (∂net_o1/∂out_h1) = 0.138498562 * 0.40 = 0.055399425

Following the same process for ∂E_o2/∂out_h1, we get:
∂E_o2/∂out_h1 = -0.019049119


We can now calculate:
∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1 = 0.055399425 + (-0.019049119) = 0.036350306

Now that we have ∂E_total/∂out_h1, we need to figure out ∂out_h1/∂net_h1 and ∂net_h1/∂w for each weight:
out_h1 = 1/(1 + e^(−net_h1))
∂out_h1/∂net_h1 = out_h1(1 − out_h1) = 0.59326999(1 − 0.59326999) = 0.241300709

We calculate the partial derivative of the total net input to h1 with respect to w1 the same way as we did for the output neuron:
net_h1 = w1*i1 + w2*i2 + b1*1
∂net_h1/∂w1 = i1 = 0.05


Putting it all together:
∂E_total/∂w1 = (∂E_total/∂out_h1) * (∂out_h1/∂net_h1) * (∂net_h1/∂w1)
∂E_total/∂w1 = 0.036350306 * 0.241300709 * 0.05 = 0.000438568

We can now update w1:
w1+ = w1 − η * ∂E_total/∂w1 = 0.15 − 0.5 * 0.000438568 = 0.149780716

Updating the other weights similarly:
w2+ = 0.19956143
w3+ = 0.24975114
w4+ = 0.29950229

• When we fed forward the 0.05 and 0.1 inputs originally, the error on the network was 0.298371109
• After this first round of backpropagation, the total error is down to 0.291027924
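The whole walkthrough fits in a short NumPy sketch; this is my own implementation of the slides' 2-2-2 network, and the step-0 values match the numbers computed above:

import numpy as np

i = np.array([0.05, 0.10])                      # inputs
target = np.array([0.01, 0.99])                 # desired outputs
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])     # w1..w4 (hidden layer)
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])     # w5..w8 (output layer)
b1, b2, lr = 0.35, 0.60, 0.5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(10000):
    # forward pass
    out_h = sigmoid(W1 @ i + b1)                     # [0.59326999, 0.59688438] on step 0
    out_o = sigmoid(W2 @ out_h + b2)                 # [0.75136507, 0.77292847] on step 0
    E_total = 0.5 * np.sum((target - out_o) ** 2)    # 0.298371109 on step 0

    # backward pass (chain rule, as derived above; deltas use the old weights)
    delta_o = -(target - out_o) * out_o * (1 - out_o)
    delta_h = (W2.T @ delta_o) * out_h * (1 - out_h)
    W2 -= lr * np.outer(delta_o, out_h)              # w5+ = 0.35891648 after step 0
    W1 -= lr * np.outer(delta_h, i)                  # w1+ = 0.14978072 after step 0

print(E_total)   # the error plummets after 10,000 iterations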


It might not seem like much, but after repeating this process
10,000 times, for example, the error plummets to
0.0000351085. At this point, when we feed forward 0.05 and
0.1, the two output neurons generate 0.015912196 (vs. 0.01
target) and 0.984065734 (vs. 0.99 target)



Optimization is a big part of Machine Learning.
Almost every Machine Learning algorithm has
an optimization algorithm at its core



Gradient Descent

▪ Gradient descent is by far the most popular optimization strategy used in Machine Learning and Deep Learning at the moment
▪ It is used while training your model, can be combined with every algorithm, and is easy to understand and implement

Gradient measures how much the output of a function changes if you change the inputs a little bit

▪ You can also think of a gradient as the slope of a function: the higher the gradient, the steeper the slope and the faster the model learns


How does it work?

b = a − γ∇f(a)

• b = the next position; a = the current position
• The minus sign refers to the minimization part of gradient descent
• γ is the learning rate, and the gradient term ∇f(a) points in the direction of the steepest ascent, so subtracting it takes a step in the direction of the steepest descent
Gradient Descent

b = a − γ∇f(a)

▪ This formula basically tells you the next position you need to go to, following the direction of the steepest descent
▪ Gradient descent can be thought of as climbing down to the bottom of a valley, instead of climbing up a hill, because it is a minimization algorithm that minimizes a given function
▪ Consider the graph below, where we need to find the values of w and b that correspond to the minimum of the cost function (marked with the red arrow)


Gradient Descent

▪ To start finding the right values, we initialize w and b with some random numbers, and gradient descent starts at that point (somewhere around the top)
▪ Then, it takes one step after another in the steepest downward direction (e.g., from top to bottom) till it reaches the point where the cost function is as small as possible
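A minimal sketch of this update rule on a toy cost function f(w) = (w − 2)², which is my own example:

def grad_f(w):
    return 2.0 * (w - 2.0)   # derivative of (w - 2)^2

w = 10.0        # random starting point ("somewhere around the top")
gamma = 0.1     # learning rate
for step in range(100):
    w = w - gamma * grad_f(w)   # one step in the steepest-descent direction

print(w)        # converges toward the minimum at w = 2.0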


Importance of the Learning Rate

▪ The learning rate determines how fast or slow we move toward the optimal weights
▪ For gradient descent to reach the local minimum, we have to set the learning rate to an appropriate value, which is neither too low nor too high

• If the steps it takes are too big, it might not reach the local minimum, because it just bounces back and forth within the convex valley of the cost function
• If you set the learning rate to a very small value, gradient descent will eventually reach the local minimum, but it might take too much time
Understanding Epoch

▪ One epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly ONCE
▪ Training for only one epoch leads to underfitting of the curve in the graph
▪ As the number of epochs increases, the weights are updated more times in the neural network, and the curve goes from underfitting to optimal to overfitting


Batches and Iterations

Batch Size
▪ The total number of training examples present in a single batch is referred to as the batch size
▪ Since we can't pass the entire dataset into the neural net at once, we divide the dataset into a number of batches, or sets, or parts

Iterations
▪ The number of iterations is the number of batches needed to complete one epoch

Let's say we have 2,000 training examples that we are going to use. We can divide the dataset of 2,000 examples into batches of 500, and then it will take four iterations to complete one epoch


Gradient Descent Variants

• Batch Gradient Descent
• Stochastic Gradient Descent
• Mini-batch Gradient Descent


Assume you have a dataset of six images


Batch Gradient Descent

• Takes the entire dataset at a time
• Back-propagates the loss for all six images at once


Stochastic Gradient Descent

• Back-propagates the loss for each image and updates the gradient one example at a time


Mini-batch Gradient Descent

• Back-propagates the loss for each mini-batch
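A small sketch (my own illustration) of how many examples each variant's weight update sees on the six-image dataset:

import numpy as np

X = np.arange(6)                                   # stand-in for six images
batch_gd   = [X]                                   # 1 update per epoch, all 6 at once
stochastic = [X[i:i+1] for i in range(6)]          # 6 updates per epoch, 1 at a time
mini_batch = [X[i:i+2] for i in range(0, 6, 2)]    # 3 updates per epoch, batches of 2

for name, batches in [("batch", batch_gd), ("SGD", stochastic), ("mini-batch", mini_batch)]:
    print(name, "->", len(batches), "weight update(s) per epoch")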


Tips for Gradient Descent

• Plot cost versus time: Collect and plot the cost values calculated by the algorithm at each iteration. The expectation for a well-performing gradient descent run is a decrease in cost at every iteration. If it does not decrease, try reducing your learning rate
• Learning rate: The learning rate value is a small real value such as 0.1, 0.001, or 0.0001. Try different values for your problem and see which works best
• Rescale inputs: The algorithm will reach the minimum cost faster if the shape of the cost function is not skewed and distorted. You can achieve this by rescaling all of the input variables (X) to the same range, such as [0, 1] or [−1, 1]


Adam Optimization Algorithm

▪ The Adaptive Moment Estimation (Adam) optimization algorithm is a combination of gradient descent with momentum and the RMSprop algorithm
▪ Adam is an adaptive learning rate method, which means that it computes individual learning rates for different parameters

Adaptive learning rate + momentum
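A minimal sketch of Adam's standard per-parameter update (the textbook formulation; not code from the slides):

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad           # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment (adaptive rate)
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return w, m, v

# toy usage: minimizing (w - 3)^2
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 101):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
print(w)   # approaches the minimum at w = 3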


Implementing a Simple Neural Network


[Figure: two inputs, x1 and x2, feeding a single output y]

Let x1 = 2, x2 = 5, and y = 31

Steps (shown as code screenshots on the original slides):
• Initializing the values of x1, x2, and y
• Implementing forward propagation and backpropagation


Multi-variable Equation

Copyright Intellipaat. All rights reserved.


Multi-variable Equation

Loading the NumPy package:

Setting initial values for x1, x2, x3, and y:

Having a glance at x1, x2, x3, and y:

Copyright Intellipaat. All rights reserved.


Multi-variable Equation

Creating a NumPy array from ‘x1’, ‘x2’, and ‘x3’:

Transposing ‘X’:

Reshaping ‘y’:

Copyright Intellipaat. All rights reserved.


Multi-variable Equation

Setting the learning rate value and initializing random values:

Implementing forward propagation and backpropagation:
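
Again, the slide code is a screenshot; a compact NumPy sketch consistent with the captions above, assuming y is a linear combination of the three inputs (all concrete numbers here are illustrative):

import numpy as np

np.random.seed(0)
x1, x2, x3 = np.random.rand(3, 100)      # three input variables, 100 samples each
y = 2*x1 + 3*x2 + 4*x3                   # illustrative target relationship

X = np.array([x1, x2, x3]).T             # stack and transpose: shape (100, 3)
y = y.reshape(-1, 1)                     # reshape y into a column: shape (100, 1)

lr = 0.1
W = np.random.rand(3, 1)                 # random initial weights

for _ in range(5000):
    y_pred = X @ W                       # forward propagation
    error = y_pred - y
    W -= lr * (X.T @ error) / len(X)     # backpropagation on the mean squared error
print(W.ravel())                         # approaches [2, 3, 4]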

Copyright Intellipaat. All rights reserved.


Quiz

Copyright Intellipaat. All rights reserved.


Quiz 1

A Single-Layer Perceptron is easier to use than a Multi-Layer Perceptron

A True

B False

Copyright Intellipaat. All rights reserved.


Answer 1

A Single-Layer Perceptron is easier to use than a Multi-Layer Perceptron

A True

B False

Copyright Intellipaat. All rights reserved.


Quiz 2

The variants of Gradient Descent are...

A Batch Gradient Descent

B Adam Optimizer

C Ada-Delta Optimizer

D All of these

Copyright Intellipaat. All rights reserved.


Answer 2

The variants of Gradient Descent are...

A Batch Gradient Descent

B Adam Optimizer

C Ada-Delta Optimizer

D All of these

Copyright Intellipaat. All rights reserved.


Quiz 3

The backpropagation algorithm is a supervised learning


method for multi-layer feedforward networks

A Yes

B No

Copyright Intellipaat. All rights reserved.


Answer 3

The backpropagation algorithm is a supervised learning


method for multi-layer feedforward networks

A Yes

B No

Copyright Intellipaat. All rights reserved.


Thank you!

Copyright Intellipaat. All rights reserved.


India: +91-7847955955

US: 1-800-216-8930 (TOLL FREE)

sales@intellipaat.com

24/7 Chat with Our Course Advisor

Copyright Intellipaat. All rights reserved.


Artificial
Intelligence
Model Building Using Keras

Copyright Intellipaat. All rights reserved.


Agenda
01 Why Keras? 02 What Is Keras?

03 Models in Keras 04 Layers in Keras

05 Regularization Techniques 06 Batch Normalization

07 Keras Workflow 08 Use Case 1: Sequential Model

09 Use Case 2: Functional Model


Copyright Intellipaat. All rights reserved.
There are countless Deep Learning
frameworks available today. Why do
we prefer Keras the most?

Copyright Intellipaat. All rights reserved.


Why Keras?

Keras prioritizes
developer experience

Copyright Intellipaat. All rights reserved.


Why Keras?

Keras is broadly
adopted in the
industry and among
the research
community

Copyright Intellipaat. All rights reserved.


Why Keras?

Keras makes it easy to turn models into products

Copyright Intellipaat. All rights reserved.


Why Keras?

Keras supports
multiple backend
engines and does not
lock you into one
ecosystem

Copyright Intellipaat. All rights reserved.


Why Keras?

Keras has strong multi-GPU support and distributed training support

Copyright Intellipaat. All rights reserved.


Why Keras?

Keras development is
backed by key
companies in the Deep
Learning ecosystem

Copyright Intellipaat. All rights reserved.


What Is Keras?

Keras is a high-level neural networks API. It is written in Python and can run on top of Theano, TensorFlow, or CNTK. It is designed
to be modular, fast, and easy to use

• It was developed by François Chollet, a Google engineer

• It was developed with the concept of:

‘Being able to go from idea to result with the least possible delay is key to

doing good research’

• Keras doesn't handle low-level computation. Instead, it relies on a specialized, well-optimized tensor manipulation library to do so, serving as the "backend engine" of Keras

• So, Keras is the high-level API wrapper for the low-level API

Copyright Intellipaat. All rights reserved.


Let us understand the different
types of models in Keras

Copyright Intellipaat. All rights reserved.


Composing Models in Keras

You can create two types of models available in Keras, i.e., the Sequential model and the Functional model

Copyright Intellipaat. All rights reserved.


Sequential Models

▪ The Sequential model is a linear stack of layers

▪ You can create a Sequential model by passing a list of layer instances to the constructor

▪ Stacking convolutional layers one above the other can be an example of a sequential model

Copyright Intellipaat. All rights reserved.


Sequential Models

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

You can also simply add layers via the .add() method:

model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

Copyright Intellipaat. All rights reserved.


Functional Models

The Keras functional API is used for defining complex models, such as multi-output models, directed acyclic graphs, or models with
shared layers

Models are defined by creating instances of layers and connecting them directly to each other in pairs, then specifying the layers to act as the input and the output of the model

Three unique aspects of the Keras Functional API are as follows:

• Defining the Input
• Connecting Layers
• Creating the Model

Copyright Intellipaat. All rights reserved.


Defining the Input

• Unlike in the Sequential model, you must create and define a

standalone Input Layer that specifies the shape of the input data

• The Input Layer takes a shape argument which is a tuple that

indicates the dimensionality of the input data

• In the case of one-dimensional input data, such as for a from keras.layers import Input
visible = Input(shape=(2,))
multilayer perceptron, the shape must explicitly leave room for

the shape of the mini-batch size used when splitting the data

while training the network

• Therefore, the shape tuple is always defined with a hanging last

dimension when the input is one-dimensional

Copyright Intellipaat. All rights reserved.


Connecting Layers

• Layers in the model are connected pairwise

• This is achieved by specifying where the input comes from while defining each new layer

• A bracket notation is used to specify the layer from which the input is received to the current layer, after the layer is created

• Example: We can create the Input Layer as above, and then create a Hidden Layer as a Dense Layer that receives input only from the Input Layer

from keras.layers import Input
from keras.layers import Dense

visible = Input(shape=(2,))
hidden = Dense(2)(visible)

Copyright Intellipaat. All rights reserved.


Creating the Model

• After creating all of your model layers and connecting them together, you must define the model

• Keras provides a Model class that you can use to create a model from your created layers. It requires you to only specify the Input and Output Layers

• Example:

from keras.models import Model
from keras.layers import Input
from keras.layers import Dense

visible = Input(shape=(2,))
hidden = Dense(2)(visible)
model = Model(inputs=visible, outputs=hidden)

Copyright Intellipaat. All rights reserved.


Predefined Neural Network Layers

• Keras has a number of predefined layers, such as:

1. Core Layers 6. Normalization Layers

2. Convolutional Layers 7. Noise Layers

3. Pooling Layers 8. Embedding Layers

4. Locally-connected Layers 9. Merge Layers

5. Recurrent Layers 10. Advanced Activation Layers

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

Take a look at this graph:

• As we move toward the right in this graph, our model tries to learn the details and the noise in the training data too well, which results in poor performance on the unseen data

• In other words, while going toward the right, the complexity of the model increases such that the training error reduces but the testing error doesn’t. This is shown in the graph on the next slide

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

• Have you come across a situation where your model performed exceptionally well on train data but was not able to predict test data?

• Or, were you ever on the top of a competition in public leaderboard only to fall hundreds of places in the final ranking?

• Do you know how complex neural networks are and how that makes them prone to overfitting? This is one of the most common problems Data Science professionals face these days

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

Regularization is a technique which makes slight modifications to the learning algorithm such that the model generalizes better. This in turn improves the

model’s performance on the unseen data as well

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

How does Regularization help reduce overfitting?

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

How does Regularization help reduce overfitting?

Let’s consider a neural network which is overfitting on the training data as shown in the above image

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

How does Regularization help reduce overfitting?

Regularization penalizes the weight matrices of the nodes

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

Assume that our regularization coefficient is so high that some of the weight matrices are nearly equal to zero

This will result in a much simpler linear network and slight underfitting of the training data

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

• Such a large value of the regularization coefficient is not that useful

• We need to optimize the value of the regularization coefficient in order to obtain a well-fitted model as shown in the image below

Copyright Intellipaat. All rights reserved.


Performing Regularization Using Keras

• Dropout produces very good results and is consequently the most frequently used Regularization technique in the field of Deep Learning

• Let’s say our neural network structure is akin to the one shown below:

So, what does dropout do?

Copyright Intellipaat. All rights reserved.


Dropout

• At every iteration, it randomly selects some nodes and removes them, along with all of their incoming and outgoing connections as shown below:

• So, each iteration has a different set of nodes, and this results in a different set of outputs. It can also be thought of as an ensemble technique in

Machine Learning

Copyright Intellipaat. All rights reserved.


Dropout

• Ensemble models usually perform better than a single model as they capture more randomness. Similarly, dropout also performs better than a

normal neural network model

• The dropout probability, i.e., the fraction of nodes to be dropped, is the hyperparameter of the dropout function. As seen in the image below, dropout can be applied to both the Hidden Layers as well as the Input Layers

• Due to these reasons, dropout is usually preferred when we have a large neural network structure in order to introduce more randomness

Copyright Intellipaat. All rights reserved.


Dropout

In Keras, we can implement dropout using the Keras core layer

from keras.layers.core import Dropout

model = Sequential([
    Dense(hidden1_num_units, input_dim=input_num_units, activation='relu'),
    Dropout(0.25),
    Dense(output_num_units, activation='softmax'),
])

Copyright Intellipaat. All rights reserved.


Data Augmentation

• The simplest way to reduce overfitting is to increase the size of the training data

• Let’s consider, we are dealing with images

• There are a few ways of increasing the size of the training data—rotating the image, flipping, scaling, shifting, etc.

• In the below image, some transformation has been done on the handwritten digits dataset

• This technique is known as Data Augmentation

• This usually provides a big leap in improving the accuracy of the model

• It can be considered as a mandatory trick to improve our predictions

Copyright Intellipaat. All rights reserved.


Data Augmentation

• In Keras, we can perform all these transformations using ImageDataGenerator

• It has a big list of arguments which you can use to pre-process your training data

• Example:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(horizontal_flip=True)
datagen.fit(train)
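
A slightly fuller sketch of common arguments (the values are illustrative; see the Keras documentation for the complete list):

datagen = ImageDataGenerator(
    rotation_range=20,        # randomly rotate images by up to 20 degrees
    width_shift_range=0.1,    # randomly shift images horizontally
    height_shift_range=0.1,   # randomly shift images vertically
    horizontal_flip=True,     # randomly flip images left-right
    zoom_range=0.1)           # randomly zoom in on images
datagen.fit(train)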

Copyright Intellipaat. All rights reserved.


Batch Normalization

• When data is fed through a deep neural network, the weights and parameters adjust the values flowing through it, sometimes making them too big or too small. By normalizing the data in each mini-batch, this problem is largely avoided

• Batch Normalization normalizes each batch using the batch mean and variance as reference

• It is just another layer that you can use to create your desired network architecture

• It is generally used between the linear and the non-linear layers in your network, because it normalizes the input to your activation function so that you're centered in the linear section of the activation function (such as Sigmoid)

Copyright Intellipaat. All rights reserved.


Batch Normalization

A normal Dense fully connected layer looks like this:

model.add(layers.Dense(64, activation='relu'))
Copyright Intellipaat. All rights reserved.


Batch Normalization

To make it Batch Normalization enabled, we have to tell the Dense Layer not to use bias, since it is not needed, and thus it can save some calculation. Also, put the Activation Layer after the BatchNormalization() layer:

model.add(layers.Dense(64, use_bias=False))
model.add(layers.BatchNormalization())
model.add(Activation("relu"))

Copyright Intellipaat. All rights reserved.


Let us see how to build models in
Keras!

Copyright Intellipaat. All rights reserved.


Building Models in Keras

Let us see the 4-step workflow in developing neural networks with Keras:

1. Define the Training Data
2. Define a Neural Network Model
3. Configure the Learning Process
4. Train the Model

Copyright Intellipaat. All rights reserved.


Building a Simple Sequential Model

Copyright Intellipaat. All rights reserved.


Use Case 1

Here, we will try to build a Sequential Network of Dense Layers, and the dataset used is MNIST. MNIST is a classic dataset of
handwritten images, released in 1999, and has served as the basis for benchmarking classification algorithms
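
The notebook itself appears as screenshots in the original deck; a minimal, self-contained sketch of the kind of Sequential Dense model described (the layer sizes, epochs, and batch size are assumptions, not the deck's exact code):

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255   # flatten and scale
x_test = x_test.reshape(10000, 784).astype('float32') / 255
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = Sequential([
    Dense(512, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_data=(x_test, y_test))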

Copyright Intellipaat. All rights reserved.


Building a Keras Model Using Functional
Model API

Copyright Intellipaat. All rights reserved.


Use Case 2

Here, we will be using Keras to create a simple neural network to predict, as accurately as we can, digits from handwritten images. In
particular, we will be calling the Functional Model API of Keras and creating a 4-layered and 5-layered neural network.
Also, we will be experimenting with various optimizers: the plain Vanilla Stochastic Gradient Descent optimizer and the Adam’s optimizer.
We will also introduce dropout, a form of regularization technique, in our neural networks to prevent overfitting
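
Again, the full notebook is a screenshot; a compact sketch of a functional-API network with dropout, compiled once with plain SGD and once with Adam (the hidden sizes and dropout rate are assumptions):

from keras.models import Model
from keras.layers import Input, Dense, Dropout
from keras.optimizers import SGD, Adam

inputs = Input(shape=(784,))
x = Dense(256, activation='relu')(inputs)
x = Dropout(0.25)(x)                     # dropout to prevent overfitting
x = Dense(128, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)

model = Model(inputs=inputs, outputs=outputs)

# plain vanilla Stochastic Gradient Descent...
model.compile(optimizer=SGD(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])
# ...or, alternatively, the Adam optimizer
model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])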

Copyright Intellipaat. All rights reserved.


Quiz

Copyright Intellipaat. All rights reserved.


Quiz 1

Does Keras use TensorFlow as its backend?

A True

B False

Copyright Intellipaat. All rights reserved.


Answer 1

Does Keras use TensorFlow as its backend?

A True

B False

Copyright Intellipaat. All rights reserved.


Quiz 2

Models in Keras are of the type...

A Sequential and Functional API

B Linear and Functional API

C Sequential and Batch

D None of these

Copyright Intellipaat. All rights reserved.


Answer 2

Models in Keras are of the type...

A Sequential and Functional API

B Linear and Functional API

C Sequential and Batch

D None of these

Copyright Intellipaat. All rights reserved.


Quiz 3

Is dropout another name for Data Augmentation?

A Yes

B No

Copyright Intellipaat. All rights reserved.


Answer 3

Is dropout another name for Data Augmentation?

A Yes

B No

Copyright Intellipaat. All rights reserved.


Thank you!

Copyright Intellipaat. All rights reserved.


India: +91-7847955955

US: 1-800-216-8930 (TOLL FREE)

sales@intellipaat.com

24/7 Chat with Our Course Advisor

Copyright Intellipaat. All rights reserved.


Artificial
Intelligence
Convolutional Neural Networks

Copyright Intellipaat. All rights reserved.


Agenda

01 Disadvantages of fully connected network 02 What is CNN?

03 Different Layers in CNN 04 Build a model with CNN using Keras

Copyright Intellipaat. All rights reserved.


Let us understand how our
computer reads the images that we
provide to it!

Copyright Intellipaat. All rights reserved.


Image Input

What We See vs. What the Computer Sees


▪ When a computer sees an image (takes an image as input), it will see an array of pixel values

▪ Depending on the resolution and size of the image, it will see a 32 x 32 x 3 array of numbers (3 refers to RGB values)

▪ Each of these numbers is given a value from 0 to 255 which describes the pixel intensity at that point

▪ These numbers are the only inputs available to the computer


Copyright Intellipaat. All rights reserved.
Problem with a Fully Connected Network

Ok, got it!


A single fully connected neuron in the first
Hidden Layer of a regular neural network
would have 28*28*3 = 2,352 weights

Copyright Intellipaat. All rights reserved.


Problem with a Fully Connected Network

An image of size 200*200*3 would mean 120,000 weights for a single neuron at the Hidden Layer

How will I manage that?

Copyright Intellipaat. All rights reserved.


Problem with a Fully Connected Network

An image of size 200*200*3 would mean 120,000 weights for a single neuron at the Hidden Layer

How will I manage that?

Also, we will have several such layers of neurons leading to several parameters. Thus, this full connectivity would be wasteful, as the huge number of parameters would lead to overfitting!

Copyright Intellipaat. All rights reserved.


Let us understand what CNN is and
how it helps overcome this
problem!

Copyright Intellipaat. All rights reserved.


Convolutional Neural Networks (CNNs/ConvNets)

CNNs are like neural networks (that we have already studied before) and are made up of neurons with learnable weights and biases.
Each neuron receives several inputs, takes a weighted sum of them, passes it through an activation function, and responds with an output

• The whole network has a loss function, and all the tips and tricks that we developed for neural networks still apply to CNNs

So, how are Convolutional Neural Networks different from Neural Networks?

• A ConvNet perceives images as volumes, i.e., 3-dimensional objects, rather than as flat canvases to be measured only by width and height

Copyright Intellipaat. All rights reserved.


Convolution: Definition

“to convolve” means to roll together

• From mathematical perspective, a convolution is the integral

measuring of how much two functions overlap as one passes

over the other

• Think of convolution as a way of mixing two functions by

multiplying them

Copyright Intellipaat. All rights reserved.


More About CNNs

▪ ConvNets pass many filters over a single image, each one picking up a

different signal

▪ At a fairly early layer, you could imagine them as passing a horizontal line

filter, a vertical line filter, and a diagonal line filter to create a map of the

edges in the image

▪ CNNs take those filters, slices of the image’s feature space, and map them one by one; i.e., they create a map of each place where the feature occurs

▪ By learning different portions of a feature space, convolutional nets allow for

easily scalable and robust feature engineering

Copyright Intellipaat. All rights reserved.


More About CNNs

▪ A simple ConvNet is a sequence of layers, and every layer of a ConvNet

transforms one volume of activations to another through a differentiable

function

▪ There are four layered concepts we should understand in Convolutional

Neural Networks:

• Convolution

• ReLU

• Pooling

• Full Connectedness (Fully Connected Layer)

Copyright Intellipaat. All rights reserved.


Before going deep into this concept,
let us see an example!

Copyright Intellipaat. All rights reserved.


Use Case 1

In this simple example, we will determine whether the image is of an O or of an X

Copyright Intellipaat. All rights reserved.


Use Case 1

▪ Here, there are multiple renditions of Xs and Os

▪ This makes it tricky for a computer to recognize them

▪ But the goal is that if the input signal looks

like images it has seen before, the ‘image’

reference signal will be mixed into, or convolved with,

the input signal

▪ The resulting output signal is then passed on to

the next layer

Copyright Intellipaat. All rights reserved.


Use Case 1

▪ A naïve approach to solve this problem is to save images of

X and O and compare every new image with our examples

to check which is a better match

▪ To a computer, an image looks like a 2-dimensional array of

pixels (think of giant checkerboard) with a number in each

position

▪ When comparing two images, if any pixel value doesn’t

match, then these images don’t match, at least according to

the computer

▪ Ideally, we would like to be able to see Xs and Os even if

they’re shifted, shrunken, rotated, or deformed. This is

where CNNs come in

Copyright Intellipaat. All rights reserved.


Probably, I think we need a
classifier which can be used with
images and correctly predict what it
is! Agree or not?

Copyright Intellipaat. All rights reserved.


Use Case 1

▪ A computer understands an image using numbers at each pixel as shown in the figure

▪ Here, blue pixels have −1 value, while the white pixels have a value of 1

Copyright Intellipaat. All rights reserved.


Use Case 1

If we just normally search and compare the values between a normal image and another ‘X’ rendition, we would get a lot of missing pixels, which
means, this is not an optimal way of image classification since it requires exactly the same images to classify

Copyright Intellipaat. All rights reserved.


Let’s see how CNN solves this
problem

Copyright Intellipaat. All rights reserved.


Feature Matching

▪ CNNs compare images piece by piece

▪ The pieces that it looks for are called features

▪ By finding rough feature matches in roughly the same

positions in the two images, CNNs get a lot better at seeing

similarity, than the whole-image matching schemes

Copyright Intellipaat. All rights reserved.


How Do CNNs Work?

▪ Each feature is like a mini-image—a small 2-dimensional

array of values

▪ Features match common aspects of the images

▪ In the case of X images, features consisting of diagonal lines

and a crossing capture all the important characteristics of

most Xs

▪ These features will probably match the arms and the center

of any image of an X

Copyright Intellipaat. All rights reserved.


Let’s now understand the concept
of the Convolutional Layer

Copyright Intellipaat. All rights reserved.


Convolutional Layer

▪ It is the first layer of a CNN

▪ When presented with a new image, the CNN doesn’t know exactly where these features will match,

so it tries them everywhere, in every possible position

▪ In calculating the match of a feature across the whole image, they (ConvNet) act as filters

▪ The math used to perform this is called convolution, from which Convolutional Neural Networks get

their name

▪ We have four steps in convolution (see the sketch after this list):


Symbol for Convolution
• Line up the feature and the image

• Multiply each image pixel by the corresponding feature pixel

• Add the values and find the sum

• Divide the sum by the total number of pixels in the feature
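
A tiny NumPy sketch of those four steps for one window position (the 3×3 feature and patch values are illustrative, using the 1/−1 pixel convention from this example):

import numpy as np

feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])   # a diagonal-line feature
patch = np.array([[ 1, -1, -1],
                  [-1,  1, -1],
                  [-1, -1,  1]])     # the image patch it is lined up with

match = (feature * patch).sum() / feature.size   # multiply, add, divide by pixel count
print(match)   # 1.0 here, since every pixel matches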

Copyright Intellipaat. All rights reserved.


Convolutional Layer

▪ In our example, consider a feature image and one pixel from it

▪ Multiply this with the existing image, and the product will be stored in another buffer feature image
Copyright Intellipaat. All rights reserved.
Convolutional Layer

▪ Then, add up the answers and divide by the total

number of pixels in the feature

▪ If both pixels are white (with a value of 1) then 1 * 1 = 1

▪ If both are black (−1), then (−1) * (−1) = 1

▪ Either way, every matching pixel results in 1

▪ Similarly, any mismatch results in −1

▪ If all pixels in a feature match, then adding them up and

dividing by the total number of pixels gives a value, 1

▪ Similarly, if none of the pixels in a feature match the

image patch, then the answer is −1

Copyright Intellipaat. All rights reserved.


Convolutional Layer

▪ The final value obtained from the math that is performed in the last step is placed at the center of the filtered image as shown above

Copyright Intellipaat. All rights reserved.


Convolutional Layer

Now, move this filter around and do the same at any pixel in the image. Consider the below example:

Copyright Intellipaat. All rights reserved.


Convolutional Layer

▪ As you can see, here after performing all the steps, the value is 0.55!

▪ Take this value and place it in the image as explained before


Copyright Intellipaat. All rights reserved.
Convolutional Layer

▪ To complete the convolution, repeat this process, lining up the feature with every possible image patch

▪ Take the answer from each convolution and make a new 2-dimensional array from it, based on where in the image each patch is located

▪ This map of matches is also a filtered version of the original image

▪ It’s a map showing where in the image the feature can be found

▪ Values close to 1 show strong matches, values close to −1 show strong matches for the photographic negative of our feature, and values near 0 show no

match of any sort


Copyright Intellipaat. All rights reserved.
Convolutional Layer

▪ The next step is to repeat the convolution process in its

entirety for each of the other features. The result is a set of

filtered images, one for each of the filters

▪ It’s convenient to think of this whole collection of convolution

operations as a single processing step

Copyright Intellipaat. All rights reserved.


Let’s understand the concept of the
ReLU Layer

Copyright Intellipaat. All rights reserved.


Rectified Linear Units

▪ ReLU is an activation function (as discussed earlier)

▪ This function activates a node only if the input is above a certain quantity.

▪ When the input is below zero, the output is zero, but when the input rises above a certain threshold, it has a linear relationship with the dependent variable

x      f(x) = max(0, x)      F(x)

−2     f(−2) = 0             0

−6     f(−6) = 0             0

2      f(2) = 2              2

6      f(6) = 6              6

▪ We have considered a simple function with the values mentioned above. The function passes a positive input through unchanged and outputs zero for any negative input

Copyright Intellipaat. All rights reserved.


ReLU Layer

▪ A small but important player in CNNs is the ReLU layer

▪ Its math is also very simple—wherever a negative number occurs, swap it out for a 0

▪ This helps the CNN stay mathematically healthy by keeping learned values from getting stuck near 0 or blowing them up toward infinity

Symbol for ReLU
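
The same rule in a couple of lines of NumPy (the values are illustrative):

import numpy as np

feature_map = np.array([[0.77, -0.11],
                        [-0.11, 1.00]])
print(np.maximum(0, feature_map))   # negatives become 0, positives pass through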

Copyright Intellipaat. All rights reserved.


ReLU Layer

Copyright Intellipaat. All rights reserved.


ReLU Layer
Similarly, we do the same process to all other feature images. The output of a ReLU layer is of the same size as
whatever is put into it, just with all negative values removed

Copyright Intellipaat. All rights reserved.


Let’s understand the concept of the
Pooling Layer

Copyright Intellipaat. All rights reserved.


Pooling Layer

▪ Another powerful tool that CNNs use is called pooling

▪ Pooling is a way to take large images and shrink them down while preserving the most important information in them

▪ It consists of stepping a small window across an image and taking the maximum value from the window at each step

▪ In practice, a 2×2 or 3×3 window with a step of 2 works well

Symbol for Pooling
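
A short NumPy sketch of 2×2 max pooling with a stride of 2 (the input values are made up):

import numpy as np

img = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 1],
                [3, 4, 2, 8]], dtype=float)

pooled = np.zeros((2, 2))
for i in range(0, 4, 2):            # step the 2x2 window with a stride of 2
    for j in range(0, 4, 2):
        pooled[i // 2, j // 2] = img[i:i+2, j:j+2].max()
print(pooled)                        # [[6., 4.], [7., 9.]]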

Copyright Intellipaat. All rights reserved.


Pooling Layer

▪ In this case, we took the window size to be 2 and we got four values to choose from

▪ In those four values, the maximum value is 1, so we pick 1. Also, note that we started with a 7×7 matrix, but the same matrix after pooling came down to be a 4×4 matrix

Copyright Intellipaat. All rights reserved.


Pooling Layer

▪ We need to move the window across the entire image

▪ The procedure is exactly as same as the above, and we need to repeat it for the entire image

Copyright Intellipaat. All rights reserved.


Pooling Layer

▪ We need to do it for two other filters as well. Once this is

done, we get the adjacent result

▪ The output will have the same number of images, but they will

have fewer pixels

▪ This is helpful in managing the computational load

▪ Shrinking an 8-megapixel image down to a 2-megapixel image

makes life a lot easier for everything downstream

Copyright Intellipaat. All rights reserved.


Let’s combine all layers together

Copyright Intellipaat. All rights reserved.


All Layers Together!

▪ You’ve probably noticed that the output of one layer is taken as the input to the other. Because of this, we can stack them like Lego bricks

Copyright Intellipaat. All rights reserved.


All Layers Together!

▪ Raw images get filtered, rectified, and pooled to create a set of shrunken, feature-filtered images. These can be filtered and shrunken again and again

Copyright Intellipaat. All rights reserved.


Fully Connected Layer

▪ It is the final layer where the classification actually happens

▪ Fully connected layers take the high-level filtered images and

translate them into votes

▪ Here, we take our filtered and shrunk images and put them into

one single list as shown in the figure

▪ Instead of treating inputs as a 2-dimensional array, they are

treated as a single list, and all are treated identically

▪ Every value gets its own vote on whether the current image is

an X or an O

Copyright Intellipaat. All rights reserved.


Fully Connected Layer

▪ Similarly, we will feed in an image of O, where certain values will be higher than others

▪ Some values are much better than the others at knowing when the image is an X, and some are particularly good at knowing when the image is an O

▪ These get larger votes than the others. These votes are expressed as weights, or connection strengths, between each value and each category

▪ We’re done with training the network, and now we can begin to predict and check the working of the classifier
Copyright Intellipaat. All rights reserved.
Fully Connected Layer

▪ Consider getting a new input where we have a 12-element

vector obtained after passing the input through all layers of

our network

▪ We make predictions based on the output data by comparing

the obtained values with the list of ‘Xs’ and ‘Os’ to check

what we’ve obtained is right or wrong

Copyright Intellipaat. All rights reserved.


Fully Connected Layer

▪ We added the values that we found to be high (1st, 4th, 5th, 10th, and 11th) from the vector table of X, and we got the sum as 5

▪ We did exactly the same thing with the input image and got a value of 4.56

▪ When we divide the values, we have a probability match of 0.91!

Copyright Intellipaat. All rights reserved.
Fully Connected Layer

▪ Doing the same with the vector table of O, we have an output of 0.51

Copyright Intellipaat. All rights reserved.


Final Output


▪ "Well, since 0.51 is less than 0.91, the probability for the input image to be of an O is less, isn't it?“

▪ So, we can conclude that the resulting input image is of an ‘X’!

▪ In practice, several fully connected layers are often stacked together, with each intermediate layer voting on phantom ‘hidden’ categories

▪ In effect, each additional layer lets the network learn ever more sophisticated combinations of features that help it make better decisions
Copyright Intellipaat. All rights reserved.
Object Recognition with Convolutional
Neural Networks in the Keras Deep
Learning Library

Copyright Intellipaat. All rights reserved.


Use Case 1

CIFAR-10 is an established computer-vision dataset used for object recognition. The CIFAR-10 data consists of 60,000 (32×32) color
images in 10 classes, with 6000 images per class. There are 50,000 training images and 10,000 test images in the official data. The
label classes in the dataset are:
airplane
automobile
bird
cat
deer
dog
frog
horse
ship
truck

Let’s look into the full Python implementation of the object recognition task on the CIFAR-10 dataset
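
The implementation itself appears as screenshots in the deck; a compact sketch of a typical Keras ConvNet for this task (the architecture and hyperparameters here are assumptions, not the deck's exact code):

from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0        # scale pixel values to [0, 1]
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),                                        # regularization
    Dense(10, activation='softmax'),                     # one output per class
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_data=(x_test, y_test))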

Copyright Intellipaat. All rights reserved.


Quiz

Copyright Intellipaat. All rights reserved.


Quiz 1

Which of the following activation functions can’t be used at the output layer to classify an image?

A sigmoid

B Tanh

C ReLU

D None of the above

Copyright Intellipaat. All rights reserved.


Answer 1

Which of the following activation functions can’t be used at the output layer to classify an image?

A sigmoid

B Tanh

C ReLU

D None of the above

Copyright Intellipaat. All rights reserved.


Quiz 2

Which of the following statements is true when you use 1×1 convolutions in a CNN?

A It can help in dimensionality reduction

B It can be used for feature pooling

C It suffers less overfitting due to small kernel size

D All of the above

Copyright Intellipaat. All rights reserved.


Answer 2

Which of the following statements is true when you use 1×1 convolutions in a CNN?

A It can help in dimensionality reduction

B It can be used for feature pooling

C It suffers less overfitting due to small kernel size

D All of the above

Copyright Intellipaat. All rights reserved.


Quiz 3

Consider a simple MLP model with 8 neurons in the input layer, 5 neurons in the hidden layer, and 1 neuron in the output layer. What are the sizes of the weight matrices between the hidden and output layers and between the input and hidden layers?

A [1 X 5] , [5 X 8]

B [8 X 5] , [ 1 X 5]

C [8 X 5] , [5 X 1]

D [5 x 1] , [8 X 5]

Copyright Intellipaat. All rights reserved.


Answer 3

Consider a simple MLP model with 8 neurons in the input layer, 5 neurons in the hidden layer, and 1 neuron in the output layer. What are the sizes of the weight matrices between the hidden and output layers and between the input and hidden layers?

A [1 X 5] , [5 X 8]

B [8 X 5] , [ 1 X 5]

C [8 X 5] , [5 X 1]

D [5 x 1] , [8 X 5]

Copyright Intellipaat. All rights reserved.


Thank you!

Copyright Intellipaat. All rights reserved.


India: +91-7847955955

US: 1-800-216-8930 (TOLL FREE)

sales@intellipaat.com

24/7 Chat with Our Course Advisor

Copyright Intellipaat. All rights reserved.


Artificial
Intelligence
Recurrent Neural Networks

Copyright IntelliPaat. All rights reserved


Agenda
01 Issues with Feed-Forward Networks 02 Understanding Recurrent Neural Networks

03 Types of RNN 04 Issues with RNN

05 Vanishing Gradient Problem 06 Long Short Term Networks

07 Demo on LSTM with Keras

Copyright Intellipaat. All rights reserved.


Issues with Feed Forward Network

▪ Outputs are independent of each other: the output at ‘t’ has no relation to the output at ‘t+1’

▪ Cannot handle sequential data

▪ Cannot memorize previous inputs

Feed Forward Network

Copyright IntelliPaat. All rights reserved


Issues with Feed Forward Network

Would this feed-forward network be able to predict the next word?

Input: “Recurrent Neural …………………….” → FFN → Output: ?

This feed-forward network wouldn’t be able to predict the next word because it cannot memorize the previous inputs

Copyright IntelliPaat. All rights reserved


Solution with Recurrent Neural Network

I only cook these three items and in the same sequence

Day 1 → Day 2 → Day 3

Copyright IntelliPaat. All rights reserved


Solution with Recurrent Neural Network

▪ Outputs are dependent on each other

▪ Can handle sequential data (Day 1 → Day 2 → Day 3)

▪ Can memorize previous inputs

Recurrent Neural Network

Copyright IntelliPaat. All rights reserved


Understanding Recurrent Neural Networks

▪ RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous

computations

▪ Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far

The unrolled network receives one input per time step: input at ‘t-1’, input at ‘t’, input at ‘t+1’, …

Copyright IntelliPaat. All rights reserved


Understanding Recurrent Neural Networks

• Xt is the input at time step ‘t’

• St is the hidden state at time step ‘t’. It’s the memory of the network. St is calculated based on the previous hidden state and the input at the current step: st = f(Uxt + Wst−1). The function f is usually a non-linearity such as tanh or ReLU
• Ot is the output at step ‘t’: Ot = softmax(Vst)

Copyright IntelliPaat. All rights reserved


Back-Propagation through Time

▪ Backpropagation Through Time (BPTT) is used to update the weights in the

recurrent neural network

▪ RNN typically predicts one output per each time step. Conceptually,

Backpropagation through Time works by unrolling the network to get each of

these individual time steps.

▪ Then, it calculates the error across each time step and adds up all of the

individual errors to get the final accumulated error.

▪ Following which the network is rolled back up and the weights are updated

Copyright IntelliPaat. All rights reserved


Types of RNN

▪ Single images (or words, ...) are classified in a single class (binary classification), i.e., is this a bird or not
▪ Single images (or words, ...) are classified in multiple classes

Copyright IntelliPaat. All rights reserved


Types of RNN

▪ A sequence of images (or words, ...) is classified in a single class (binary classification of a sequence)
▪ A sequence of images (or words, ...) is classified in multiple classes

Copyright IntelliPaat. All rights reserved


Issues with RNN

Suppose we try to predict the last word in this text:

Input: “Recurrent Neural ……” → RNN → Output: “Network”

Here, the RNN does not need any further context. It can easily predict that the last word would be ‘Network’

Copyright IntelliPaat. All rights reserved


Issues with RNN

Now, let’s predict the last word in this text:

Input: “I’ve been staying in Spain for the last 10 years. I can speak fluent …………..” → RNN → Output: ?

Regular RNNs have difficulty in learning long-range dependencies

Copyright IntelliPaat. All rights reserved


Issues with RNN

I’ve been staying in Spain for the last 10 years. I can speak fluent …………..

• In this case, the network needs the context of ‘Spain’ to predict the last word in this text, which is “Spanish”

• The gap between the word which we want to predict and the relevant information is very large and this is known as
long term dependency

∂E/∂W = ∂E/∂O3 * ∂O3/∂s3 * ∂s3/∂s2 * ∂s2/∂s1 * …

• There arises a long dependency while backpropagating the error

Copyright IntelliPaat. All rights reserved


Vanishing Gradient Problem

▪ Now, if there is a really long dependency, there’s a good probability that one

of the gradients might approach zero and this would lead to all the gradients

rushing to zero exponentially fast due to multiplication

∂E/∂W=0

▪ Such states would no longer help the network to learn anything. This is

known as vanishing gradient problem

Copyright IntelliPaat. All rights reserved


Long Short Term Networks

Long Short-Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem

Standard RNN

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will

have a very simple structure, such as a single tanh layer

Copyright IntelliPaat. All rights reserved


Long Short Term Networks

Long Short-Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem


LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way

Copyright IntelliPaat. All rights reserved


Core Idea behind LSTMs

The key to LSTMs is the cell state. The cell state is kind of like a conveyor belt. It runs straight down the entire
chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged


The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures
called gates

Copyright IntelliPaat. All rights reserved


Core Idea behind LSTMs

Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and
a pointwise multiplication operation

The sigmoid layer outputs numbers between zero and one, describing how much of each component should be
let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”

Copyright IntelliPaat. All rights reserved


Working of LSTMs

Step 1

The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This
decision is made by a sigmoid layer called the “forget gate layer”

Copyright IntelliPaat. All rights reserved


Working of LSTMs

Step 2

The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a
sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of
new candidate values, that could be added to the state

Copyright IntelliPaat. All rights reserved


Working of LSTMs

Step 3

Then, we have to update the old cell state, Ct-1, into the new cell state, Ct. So, we multiply the old state (Ct-1) by ft, forgetting the things we decided to forget earlier. Then, we add (it * C~t), the new candidate values, scaled by how much we decided to update each state value


Copyright IntelliPaat. All rights reserved


Working of LSTMs

Step 4

Finally, we’ll run a sigmoid layer which decides what part of the cell state we’re going to output. Then, we put the
cell state through tanh and multiply it by the output of the sigmoid gate, so that we only output the parts we
decided to


Copyright IntelliPaat. All rights reserved


Implementing a Simple RNN
Loading the required packages:

Preparing the input data:

Creating 100 vectors with 5 consecutive numbers

Copyright IntelliPaat. All rights reserved


Implementing a Simple RNN

Preparing the output data:

Converting the data & target into numpy arrays:

Having a glance at the shape:

Copyright IntelliPaat. All rights reserved


Implementing a Simple RNN

Dividing the data into train & test sets:

Creating a sequential model:

Adding the LSTM layer with the output and input shape:
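
The code on these slides is shown as screenshots. A minimal sketch of the pipeline described so far, assuming scikit-learn's train_test_split for the split and that the target for each 5-number window is the next consecutive number (the deck's exact target isn't visible):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.model_selection import train_test_split

# 100 vectors of 5 consecutive numbers, e.g., [0, 1, 2, 3, 4] -> target 5
data = np.array([np.arange(i, i + 5) for i in range(100)])
target = np.array([i + 5 for i in range(100)])

X = data.reshape(100, 5, 1)       # LSTM expects (samples, time steps, features)
y = target.reshape(100, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = Sequential()
model.add(LSTM(32, input_shape=(5, 1)))      # output size 32, input shape (time steps, features)
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')  # compiled with Adam, as on the next slide
model.summary()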

Copyright IntelliPaat. All rights reserved


Implementing a Simple RNN

Compiling the model with ‘Adam’ optimizer:

Having a glance at the model summary:


Copyright IntelliPaat. All rights reserved


Implementing a Simple RNN

Fitting a model on the train set:


Copyright IntelliPaat. All rights reserved


Implementing a Simple RNN

Predicting the values on the test set:


Making a scatter plot for actual values and predicted values:


Copyright IntelliPaat. All rights reserved


We see that the model
fails miserably and none
of the predictions are
correct

Copyright IntelliPaat. All rights reserved


We’d have to normalize
the data before we
build the model

Raw Data → Normalizing → Normalized Data

Copyright IntelliPaat. All rights reserved


Implementing a Simple RNN

Normalizing the input data:


Normalizing the output data:


Fitting the model with normalized values and number of epochs to be 500:


Copyright IntelliPaat. All rights reserved


Implementing a Simple RNN

Predicting the values on test set:


Making a scatter plot for actual values & predicted values:


Copyright IntelliPaat. All rights reserved


We see that the loss has
reduced after
normalizing the data
and increasing the
epochs

Copyright IntelliPaat. All rights reserved


Quiz

Copyright IntelliPaat. All rights reserved


Quiz 1

Gated Recurrent Units can help prevent the vanishing gradient problem in RNNs.

A True

B False

Copyright IntelliPaat. All rights reserved


Answer 1

Gated Recurrent Units can help prevent the vanishing gradient problem in RNNs.

A True

B False

Copyright IntelliPaat. All rights reserved


Quiz 2

How many types of RNN exist?

A 4

B 2

C 3

D None of these

Copyright IntelliPaat. All rights reserved


Answer 2

How many types of RNN exist?

A 4

B 2

C 3

D None of these

Copyright IntelliPaat. All rights reserved


Quiz 3

How many gates are there in LSTM?

A 1

B 2

C 3

D 4

Copyright IntelliPaat. All rights reserved


Answer 3

How many gates are there in LSTM?

A 1

B 2

C 3

D 4

Copyright IntelliPaat. All rights reserved


India: +91-7847955955

US: 1-800-216-8930 (TOLL FREE)

sales@intellipaat.com

24/7 Chat with Our Course Advisor

Copyright IntelliPaat. All rights reserved


Artificial
Intelligence
Autoencoders and
Restricted Boltzmann Machine

Copyright Intellipaat. All rights reserved.


Agenda
01 Autoencoders 02 Autoencoders Vs. PCA

03 Architecture of Autoencoders 04 Hyperparameters of Autoencoders

05 Use Case 1 06 Types of Autoencoders

07 What Is an RBM? 08 Working of RBMs

09 Collaborative Filtering Using RBMs
Copyright Intellipaat. All rights reserved.
For Your Information!

Artificial Intelligence encompasses a wide range of technologies and techniques that enable computer systems to solve problems, like data compression, which is used in computer vision, computer networks, computer architecture, and many other fields

Copyright Intellipaat. All rights reserved.


You might be thinking why we are
discussing about Autoencoders.
What is a big deal about them?

Copyright Intellipaat. All rights reserved.


What Are Autoencoders?

An autoencoder is an Unsupervised Machine Learning algorithm that takes an image as input and tries to reconstruct it using fewer bits from the bottleneck, also known as the latent space

Copyright Intellipaat. All rights reserved.


What Are Autoencoders?

Compression and decompression functions in autoencoders are:

Data-specific, which means that they will only be able to compress data
similar to what they have been trained on. An autoencoder trained on
pictures of faces would do a rather poor job on compressing pictures of
trees, because the features it would have learned would be face-
specific

Lossy, which means that the decompressed outputs will be degraded


compared to the original inputs (similar to MP3 or JPEG compression)

Learned automatically from examples, rather than engineered by a


human

Copyright Intellipaat. All rights reserved.


But then, Autoencoders and
Dimensionality Reduction are same
techniques. Are they?

Copyright Intellipaat. All rights reserved.


Autoencoders and Dimensionality Reduction

• Autoencoders are similar to dimensionality reduction techniques like Principal Component Analysis (PCA)

• PCA: It projects the data from a higher dimension to a lower dimension using linear transformation

• Both techniques try to preserve the important features of data while removing the non-essential parts

Copyright Intellipaat. All rights reserved.


Then, why can’t we just learn any one of them?

Copyright Intellipaat. All rights reserved.


Autoencoders Vs. PCA

‘The major difference between autoencoders and PCA lies in the transformation part: PCA uses linear transformations

whereas autoencoders use non-linear transformations’

Copyright Intellipaat. All rights reserved.


Let us see the architecture of
Autoencoders

Copyright Intellipaat. All rights reserved.


Architecture of Autoencoders

An autoencoder consists of three parts:

Encoder Code Decoder

Copyright Intellipaat. All rights reserved.


Architecture of Autoencoders

Encoder

• This part of the network compresses or down-samples the input into fewer bits

• The space represented by these fewer number of bits is often called the latent

space or bottleneck

• The bottleneck is also called the ‘maximum point of compression’ since, at this point, the input is compressed the most

• These compressed bits that represent the original input are together called an

‘encoding’ of the input

Copyright Intellipaat. All rights reserved.


Architecture of Autoencoders

Code

• This part of the network represents the compressed input which is fed to the decoder

• It is also known as the bottleneck

• The code decides which aspects of the observed data are relevant information and which

aspects can be discarded. It does this by balancing:

o Compactness of representation, measured as the compressibility

o Retention of some behaviorally relevant variables from the input

Copyright Intellipaat. All rights reserved.


Architecture of Autoencoders

Decoder

• This part of the network tries to reconstruct the input using the encoded input

• The decoded image is a lossy reconstruction of the original image, and it is

reconstructed from the latent space representation

• When the decoder is able to reconstruct the input exactly as it was fed to the

encoder, you can say that the encoder is able to produce the best encodings for

the input

Copyright Intellipaat. All rights reserved.


Let us understand some of the
important Hyperparameters of
Autoencoders

Copyright Intellipaat. All rights reserved.


Hyperparameters of Autoencoders

1. Code Size: It represents the number of nodes in the middle layer. A smaller size results in more compression

2. Number of Layers: An autoencoder can consist of as many layers as we want

3. Number of Nodes per Layer: The number of nodes per layer decreases with each subsequent layer of the encoder and increases back in the decoder. The decoder is symmetric to the encoder in terms of the layer structure

4. Loss Function: We either use mean squared error or binary cross-entropy. If the input values are in the range [0, 1], then we typically use cross-entropy; otherwise, we use the mean squared error

Copyright Intellipaat. All rights reserved.


Let’s Build a Simple Autoencoder on the
MNIST Dataset

Copyright Intellipaat. All rights reserved.


Use Case 1

We will start with a single fully-connected neural layer as the encoder and as the decoder:
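
The slide shows the code as a screenshot; a sketch of the single-layer autoencoder described (an encoding size of 32 is assumed here, compressing the 784-pixel input):

from keras.layers import Input, Dense
from keras.models import Model

encoding_dim = 32                 # size of the encoded representation (assumed)

input_img = Input(shape=(784,))
encoded = Dense(encoding_dim, activation='relu')(input_img)   # the encoder layer
decoded = Dense(784, activation='sigmoid')(encoded)           # the decoder layer

autoencoder = Model(input_img, decoded)   # maps an input to its reconstruction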

Copyright Intellipaat. All rights reserved.


Use Case 1

Let's create a separate encoder model…

Copyright Intellipaat. All rights reserved.


Use Case 1

…and a decoder model:

Copyright Intellipaat. All rights reserved.


Use Case 1

Now, let's train our autoencoder to reconstruct MNIST digits

First, we'll configure our model to use a per-pixel binary cross-entropy loss and the Adadelta optimizer:
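
In code, that configuration is a single line (sketch):

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')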

Copyright Intellipaat. All rights reserved.


Use Case 1

Let's prepare our input data. We're using MNIST digits, and we're discarding the labels (since we're only interested in
encoding/decoding the input images)

Copyright Intellipaat. All rights reserved.


Use Case 1

We will normalize all values between 0 and 1, and we will flatten 28x28 images into vectors of size 784
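
A sketch of that preparation step (the labels from mnist.load_data() are discarded, as described):

from keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()     # the labels are discarded
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), 784))    # flatten 28x28 images to 784-vectors
x_test = x_test.reshape((len(x_test), 784))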

Copyright Intellipaat. All rights reserved.


Use Case 1

Now, let's train our autoencoder for 50 epochs:
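
A sketch of the training call described (the batch size and shuffling are assumptions):

autoencoder.fit(x_train, x_train,     # the input is also the target
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))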

Copyright Intellipaat. All rights reserved.


Use Case 1

After 50 epochs, the autoencoder seems to reach a stable train/test loss value of about 0.11. We can try to visualize the
reconstructed inputs and the encoded representations

Copyright Intellipaat. All rights reserved.


Use Case 1

Here's what we get as an output. The top row is the original digits and the bottom row is the reconstructed digits. We are
losing quite a bit of details with this basic approach

Copyright Intellipaat. All rights reserved.


Adding a Sparsity Constraint on the Encoded
Representations

▪ In the previous example, representations were (only) constrained by the size of the Hidden Layer (32)

▪ What typically happens in such situations is that the Hidden Layer is learning an approximation of PCA

▪ Another way to constrain the representations is to add a sparsity constraint on the activity of the hidden representations so that only fewer units would ‘fire’ at a given time

▪ In Keras, this can be done by adding an activity_regularizer to our Dense Layer:

from keras import regularizers

encoding_dim = 32

input_img = Input(shape=(784,))
# add a Dense layer with a L1 activity regularizer
encoded = Dense(encoding_dim, activation='relu',
                activity_regularizer=regularizers.l1(10e-5))(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(input_img, decoded)

Copyright Intellipaat. All rights reserved.


Adding a Sparsity Constraint on the Encoded
Representations

▪ Train this model for 100 epochs (with the added regularization, the model is less likely to overfit and can be trained longer)

▪ The model ends with a train loss of 0.11 and a test loss of 0.10

▪ The difference between the two is mostly due to the regularization term being added to the loss during training (worth about 0.01)

▪ Here's a visualization of our new results:

They look pretty similar to the previous model, the only significant difference being the sparsity of the encoded representations.
encoded_imgs.mean() yields a value 3.33 (over our 10,000 test images); whereas with the previous model, the same quantity was 7.30.
So, our new model yields encoded representations that are twice sparser

Copyright Intellipaat. All rights reserved.


Quiz

Copyright Intellipaat. All rights reserved.


Quiz 1

Autoencoders and RBMs are the same, with just a little change in how they work.

A True

B False

Copyright Intellipaat. All rights reserved.


Answer 1

Autoencoders and RBMs are the same, with just a little change in how they work.

A True

B False



Quiz 2

What should be the aim of the training procedure in a Boltzmann machine of feedback networks?

A To capture inputs

B To feed back the captured outputs

C To capture the behaviour of the system

D None of the mentioned



Answer 2

What should be the aim of the training procedure in a Boltzmann machine of feedback networks?

A To capture inputs

B To feed back the captured outputs

C To capture the behaviour of the system ✓

D None of the mentioned



Quiz 3

Can we build an autoencoder in TensorFlow that surpasses PCA?

A Yes

B No



Answer 3

Can we build an autoencoder in TensorFlow that surpasses PCA?

A Yes ✓

B No



Thank you!



India: +91-7847955955

US: 1-800-216-8930 (TOLL FREE)

sales@intellipaat.com

24/7 Chat with Our Course Advisor



Artificial
Intelligence
TFLearn: A Deep Learning Library



Agenda

01 What Is TFLearn?

02 Features of TFLearn

03 Layers in TFLearn

04 Built-in Operations

05 Saving/Restoring Using TFLearn

06 Fine Tuning

07 Data Management Using TFLearn

08 Use Case 1



Let us learn about a Deep Learning
library, featuring a high-level API
for TensorFlow!



TFLearn

TFLearn is a modular and transparent Deep Learning library built on top of TensorFlow. It was designed to provide a high-level API to TensorFlow in order to facilitate and speed up experimentation, while remaining fully transparent and compatible with it.



Let us see some of the features of
TFLearn!



TFLearn: Features

01  An easy-to-use and easy-to-understand high-level API for implementing Deep Neural Networks

02  Fast prototyping through highly modular built-in neural network layers, regularizers, optimizers, metrics, etc.

03  Full transparency over TensorFlow: all functions are built over tensors and can be used independently of TFLearn

04  Powerful helper functions to train any TensorFlow graph, with support for multiple inputs, outputs, and optimizers

05  Easy and beautiful graph visualizations, with details about weights, gradients, activations, and more

06  Effortless device placement for using multiple CPUs/GPUs


It’s time to take a deep dive into
TFLearn concepts



Layers in TFLearn

Layers are a core feature of TFLearn. Here is a list of all currently available layers:

File            Layers
core            input_data, fully_connected, dropout, custom_layer, reshape, flatten, activation,
                single_unit, highway, one_hot_encoding, and time_distributed
conv            conv_2d, conv_2d_transpose, max_pool_2d, avg_pool_2d, upsample_2d, conv_1d,
                max_pool_1d, avg_pool_1d, residual_block, residual_bottleneck, conv_3d, max_pool_3d,
                avg_pool_3d, highway_conv_1d, highway_conv_2d, global_avg_pool, and global_max_pool
recurrent       simple_rnn, lstm, gru, bidirectional_rnn, and dynamic_rnn
embedding       embedding
normalization   batch_normalization, local_response_normalization, and l2_normalize
merge           merge and merge_outputs
estimator       regression

Example:

tflearn.conv_2d(x, 32, 5, activation='relu', name='conv1')



Built-in Operations

▪ Besides the layers concept, TFLearn also provides many different operations to be used when building a neural network

▪ These operations are mainly meant to be part of the above 'layers' argument, but they can also be used independently in any other TensorFlow graph for convenience

▪ In practice, just providing the operation's name as an argument is enough (such as activation='relu' or regularizer='L2' for conv_2d), but a function can also be provided for further customization

File            Operations
activations     linear, tanh, sigmoid, softmax, softplus, softsign, relu, relu6, leaky_relu, prelu, and elu
objectives      softmax_categorical_crossentropy, categorical_crossentropy, binary_crossentropy,
                mean_square, hinge_loss, roc_auc_score, and weak_cross_entropy_2d
optimizers      SGD, RMSProp, Adam, Momentum, AdaGrad, Ftrl, and AdaDelta
metrics         Accuracy, Top_k, and R2
initializations zeros, uniform, uniform_scaling, normal, truncated_normal, xavier, and variance_scaling
losses          l1 and l2



Built-in Operations

Example:

# Activation and Regularization inside a layer:
fc2 = tflearn.fully_connected(fc1, 32, activation='tanh', regularizer='L2')

# Equivalent to:
fc2 = tflearn.fully_connected(fc1, 32)
tflearn.add_weights_regularization(fc2, loss='L2')
fc2 = tflearn.tanh(fc2)

# Optimizer, Objective and Metric:
reg = tflearn.regression(fc4, optimizer='rmsprop', metric='accuracy', loss='categorical_crossentropy')

# Ops can also be defined outside, for deeper customization:
momentum = tflearn.optimizers.Momentum(learning_rate=0.1, weight_decay=0.96, decay_step=200)
top5 = tflearn.metrics.Top_k(k=5)
reg = tflearn.regression(fc4, optimizer=momentum, metric=top5, loss='categorical_crossentropy')



Saving/Restoring Using TFLearn

▪ To save or restore a model, you can simply invoke the save or load method of the Deep Neural Network model class

# Save a model
model.save('my_model.tflearn')
# Load a model
model.load('my_model.tflearn')

▪ Retrieving layer variables can either be done using the layer name or, directly, by using 'W' or 'b' attributes that are supercharged to the layer's returned Tensor

# Let's create a layer
fc1 = fully_connected(input_layer, 64, name="fc_layer_1")
# Using Tensor attributes (Layer will supercharge the returned Tensor with weights attributes)
fc1_weights_var = fc1.W
fc1_biases_var = fc1.b
# Using Tensor name
fc1_vars = tflearn.get_layer_variables_by_name("fc_layer_1")
fc1_weights_var = fc1_vars[0]
fc1_biases_var = fc1_vars[1]



Saving/Restoring Using TFLearn

▪ To get or set the value of variables, TFLearn model classes implement the get_weights and set_weights methods

input_data = tflearn.input_data(shape=[None, 784])
fc1 = tflearn.fully_connected(input_data, 64)
fc2 = tflearn.fully_connected(fc1, 10, activation='softmax')
net = tflearn.regression(fc2)
model = DNN(net)
# Get weights values of fc2
model.get_weights(fc2.W)
# Assign new random weights to fc2
model.set_weights(fc2.W, numpy.random.rand(64, 10))

Note: You can also directly use TensorFlow eval or assign ops to get or set the value of these variables



Fine Tuning

Fine-tuning is the process of taking a network model that has already been trained for a given task and making it perform a second, similar task

• Assuming that the original task is similar to the new task, using a network that is

already designed and trained allows us to take advantage of the feature

extraction that happens in the front layers of the network without developing

that feature extraction network from scratch

• It replaces the output layer, originally trained to recognize (in the case of

imagenet models) 1,000 classes, with a layer that recognizes the number of

classes you require



Fine Tuning

• The new output layer that is attached to the model is then trained to take the lower level features from the front of the network and map them to the

desired output classes, using Stochastic Gradient Descent (SGD)

• Once this has been done, other late layers in the model can be set as 'trainable=True' so that in further SGD epochs their weights can be fine-tuned for the

new task

• So, when defining a model in TFLearn, you can specify which layer's weights you want to be restored or not (when loading the pre-trained model)

• This can be handled with the 'restore' argument of layer functions (only available for layers with weights)

# Weights will be restored by default.
fc_layer = tflearn.fully_connected(input_layer, 32)
# Weights will not be restored, if specified so (note: restore takes a boolean, not a string).
fc_layer = tflearn.fully_connected(input_layer, 32, restore=False)

All weights that do not need to be restored will be added to the tf.GraphKeys.EXCL_RESTORE_VARS collection, and when loading a pre-trained model, the restoration of these variables will simply be ignored



Data Management

• TFLearn supports NumPy array data

• Additionally, it also supports HDF5 for handling large datasets

• HDF5 is a data model, a library, and a file format for storing and managing data

• It supports an unlimited variety of data types and is designed for flexible and efficient I/O, as well as for high-volume and complex data

• TFLearn can directly use HDF5 formatted data

# Load hdf5 dataset
import h5py
h5f = h5py.File('data.h5', 'r')
X, Y = h5f['MyLargeData']

... define network ...

# Use HDF5 data model to train model
model = DNN(network)
model.fit(X, Y)



Scopes and Weights Sharing

• All layers are built over 'variable_op_scope' that makes it easy to share variables among multiple layers and make TFLearn suitable for a

distributed training

• All layers with inner variables support a 'scope' argument to place variables under; layers with same scope name will then share the

same weights
# Define a model builder
def my_model(x):
    x = tflearn.fully_connected(x, 32, scope='fc1')
    x = tflearn.fully_connected(x, 32, scope='fc2')
    x = tflearn.fully_connected(x, 2, scope='out')
    return x  # the builder must return the output tensor

# 2 different computation graphs but sharing the same weights
with tf.device('/gpu:0'):
    # Force all Variables to reside on the CPU.
    with tf.arg_scope([tflearn.variables.variable], device='/cpu:0'):
        model1 = my_model(placeholder_X)

# Reuse Variables for the next model
tf.get_variable_scope().reuse_variables()

with tf.device('/gpu:1'):
    with tf.arg_scope([tflearn.variables.variable], device='/cpu:0'):
        model2 = my_model(placeholder_X)



Demo on Titanic Survival Predictor



Use Case 1

In this demo, we will learn to use TFLearn and TensorFlow to model the survival chance of Titanic passengers using their personal information
(such as gender, age, and so on). To tackle this classic Machine Learning task, we are going to build a Deep Neural Network classifier



Use Case 1

• Let's take a look at the dataset (TFLearn will automatically download it for you)
• For each passenger, the following information is provided:

survived   Survived (0 = No; 1 = Yes)
pclass     Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name       Name of the passenger
sex        Sex of the passenger
age        Age of the passenger
sibsp      Number of Siblings/Spouses Aboard
parch      Number of Parents/Children Aboard
ticket     Ticket Number
fare       Passenger Fare



Use Case 1

• There are two classes in our task: not survived (class = 0) and survived (class = 1), and the passenger data has eight features

• The Titanic dataset is stored in a CSV file, so we can use the TFLearn load_csv() function to load the data from the file into a Python list

• We specify the target_column argument to indicate that our labels (survived or not) are located in the first column (ID is 0)

• The function will return a tuple: (data, labels)
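A condensed sketch of these steps, following TFLearn's public Titanic quickstart (the file name titanic_dataset.csv, the dropped columns, and the network sizes are assumptions based on that tutorial):

import numpy as np
import tflearn
from tflearn.data_utils import load_csv
from tflearn.datasets import titanic

# Download the dataset; labels (survived) are in column 0
titanic.download_dataset('titanic_dataset.csv')
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)

# Preprocessing: drop the text fields ('name' and 'ticket') and encode 'sex' numerically
def preprocess(passengers, columns_to_delete):
    for column_to_delete in sorted(columns_to_delete, reverse=True):
        [passenger.pop(column_to_delete) for passenger in passengers]
    for i in range(len(passengers)):
        # female -> 1.0, male -> 0.0
        passengers[i][1] = 1. if passengers[i][1] == 'female' else 0.
    return np.array(passengers, dtype=np.float32)

# 'name' and 'ticket' are fields 1 and 6 of the loaded rows
data = preprocess(data, [1, 6])

# Build a 3-layer Deep Neural Network classifier on the remaining 6 features
net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)

model = tflearn.DNN(net)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)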



Implementing RNN on MNIST data with
TFLearn
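A minimal sketch of what this demo could look like (layer sizes and the single training epoch are assumptions; each 28x28 image is fed to the LSTM as a sequence of 28 rows of 28 pixels):

import tflearn
from tflearn.datasets import mnist

# Load MNIST and reshape each flat 784-vector into a 28-step sequence of 28 features
X, Y, testX, testY = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28])
testX = testX.reshape([-1, 28, 28])

# LSTM network: 28 timesteps of 28 inputs -> 128 hidden units -> 10-way softmax
net = tflearn.input_data(shape=[None, 28, 28])
net = tflearn.lstm(net, 128)
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net, optimizer='adam',
                         loss='categorical_crossentropy')

model = tflearn.DNN(net, tensorboard_verbose=0)
model.fit(X, Y, validation_set=(testX, testY),
          n_epoch=1, show_metric=True)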



Quiz



Quiz 1

TFLearn provides easy device placement for


using multiple CPUs/GPUs

A True

B False



Answer 1

TFLearn provides easy device placement for using multiple CPUs/GPUs.

A True ✓

B False



Quiz 2

Which of the following is correct from TFLearn’s


Data Management point of view?

A Supports NumPy array data

B Supports HDF5

C Both A and B

D None of the mentioned



Answer 2

Which of the following is correct from TFLearn’s


Data Management point of view?

A Supports NumPy array data

B Supports HDF5

C Both A and B ✓

D None of the mentioned



Quiz 3

TFLearn is an abstraction framework for TensorFlow.

A Yes

B No



Answer 3

TFLearn is an abstraction framework for TensorFlow.

A Yes ✓

B No



Thank you!



India: +91-7847955955

US: 1-800-216-8930 (TOLL FREE)

sales@intellipaat.com

24/7 Chat with Our Course Advisor

