
Before we can start building deep

learning networks, we will spend some

time learning about the different deep

learning libraries and frameworks that

are out there. In this video, I will

briefly cover the libraries that we'll

be teaching in this specialization. The

most popular libraries, in descending

order, are TensorFlow, Keras, and PyTorch.

There is also Theano, a library developed

by the Montreal Institute for Learning

Algorithms, which was the major library for

deep learning development even before

TensorFlow and PyTorch.

However, its founders could no longer afford to

continuously support and maintain it,

and therefore, the library lost its

popularity. Because of that, in this

specialization, we will focus on the

three other popular libraries. Among the

three libraries, TensorFlow is the most

popular one. It is the library that is

most commonly used in production of deep

learning models. It has a very large

community. Just a quick look at the

number of forks on the library's Github

repository as well as the number of

commits and pull requests should suffice

in giving you an idea of how popular the

library is. TensorFlow was developed by

Google and released to the public in


2015, and is still being actively used at

Google for both research and production

needs. PyTorch, on the other hand, is the

cousin of the Torch framework, which is

written in Lua, and supports machine learning

algorithms running on GPUs in particular.

However, despite being derived from the Torch

framework, PyTorch isn't just a set of

wrappers to support a popular language

like Python. It was actually rewritten

and tailored to be fast and feel native.

PyTorch was released in 2016 and has

gained immense interest lately and is

becoming the preferred framework over

TensorFlow,

especially in academic research settings

and applications of deep learning

requiring optimizing custom expressions.

PyTorch is supported and being actively

used at Facebook. However, despite their

popularity, both

PyTorch and TensorFlow are not easy to

use, and have a steep learning curve. So

for people who are just starting to

learn deep learning, there is no better

library to use than the Keras

library.

Keras is a high level API for building

deep learning models. It has gained favor

for its ease of use and syntactic

simplicity, facilitating fast development.


As you'll see in the next couple of

videos, building a very complex deep

learning network can be achieved with

Keras with only a few lines of code.

Keras normally runs on top of a

low-level library such as TensorFlow.

This means that to be able to use the

Keras library, you will have to install

TensorFlow first, and when you import

Keras, it will explicitly display

which backend it is running on. Keras is also supported

by Google. I won't go into more details

about the different libraries,

but the take-home message here is: if

you're interested in building something

quickly, go with the Keras library; you

won't be disappointed. However, if you

want to have more control over the

different nodes and layers in the

network, and want to watch closely what

happens with the network over time, then

PyTorch or TensorFlow would be the

right choice. It will really boil

down to your personal preference.

With that, in the next videos, we will start

learning how to use the Keras library

to build models for regression and

classification problems.
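As a quick, hedged illustration of that backend behavior (assuming a standalone Keras 2.x setup with TensorFlow already installed), importing Keras reports the backend it is running on:

# A hedged sketch: assumes standalone Keras 2.x with TensorFlow already installed.
import keras                     # prints a message such as "Using TensorFlow backend."
from keras import backend as K

print(K.backend())               # e.g. 'tensorflow'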

As we discussed earlier, activation

functions play a major role in the

learning process of a neural network. So

far, we have used only the sigmoid

function as the activation function in

our networks, but we saw how the sigmoid

function has its shortcomings since it

can lead to the vanishing gradient

problem for the earlier layers. In this

video, we will discuss other activation

functions; ones that are more efficient

to use and are more applicable to deep

learning applications. There are seven

types of activation functions that you

can use when building a neural network.

There is the binary step function, the

linear or identity function, there is our

old friend the sigmoid or logistic

function, there is the hyperbolic tangent,

or tanh, function, the rectified linear

unit (ReLU) function, the leaky ReLU

function, and the softmax function. In

this video, we will discuss the popular

ones, which are the sigmoid, the

hyperbolic tangent, ReLU, and the softmax

functions. This is the sigmoid function.

At z = 0, a is equal to 0.5 and when z

is a very large positive number, a is

close to 1, and when z is a very large

negative number, a is close to zero.

Sigmoid functions used to be widely used


as activation functions in the hidden

layers of a neural network. However, as

you can see, the function is pretty flat

beyond the +3 and -3

region. This means that once the input

falls in that region, the gradients

become very small. This results in the

vanishing gradient problem that we

discussed, and as the gradients approach

0, the network doesn't really learn.

Another problem with the sigmoid

function is that the values only range

from 0 to 1. This means that the sigmoid

function is not symmetric around the

origin.

The values received are all positive.

Well, we would not always want the

values going to the next neuron to be

all of the same sign. This can be

addressed by scaling the sigmoid

function, and this brings us to the next

activation function: the hyperbolic

tangent function. This is the hyperbolic

tangent, or tanh, function. It is very

similar to the sigmoid function. It is

actually just a scaled version of the

sigmoid function, but unlike the sigmoid

function, it's symmetric over the origin.

It ranges from -1 to +1.

However, although it overcomes the lack

of symmetry of the sigmoid function, it


also leads to the vanishing gradient

problem in very deep neural networks. The

rectified linear unit, or ReLU, function

is the most widely used activation

function when designing networks today.

In addition to it being nonlinear, the

main advantage of using the ReLU

function over the other activation

functions is that it does not activate

all the neurons at the same time.

According to the plot here, if the input

is negative it will be converted to 0,

and the neuron does not get activated.

This means that, at any given time, only a few

neurons are activated, making the network

sparse and very efficient. Also, the ReLU

function was one of the main

advancements in the field of deep

learning that led to overcoming the

vanishing gradient problem. One last

activation function that we will discuss

here is the softmax function. The softmax

function is also a type of sigmoid

function, but it is handy when we are

trying to handle classification problems.

The softmax function is ideally used in

the output layer of the classifier where

we are actually trying to get the

probabilities to define the class of

each input. So, if a network with 3

neurons in the output layer outputs [1.6, 0.55, 0.98]


then with a softmax activation

function, the outputs get converted to

approximately [0.53, 0.19, 0.28]. This way, it is

easier for us to classify a given data

point and determine to which category it

belongs. In conclusion, the sigmoid and

the tanh functions are avoided in many

applications nowadays since they can

lead to the vanishing gradient problem.

The ReLU function is the function

that's widely used nowadays, and it's

important to note that it is only used in

the hidden layers. Finally, when building

a model, you can begin with using the

ReLU function and then you can switch to

other activation functions if the

ReLU function does not yield a good

performance. And this concludes this

video on activation functions. I'll see

you in the next video.
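To make these functions concrete, here is a small NumPy sketch (my own illustration, not code from the course) of the sigmoid, tanh, ReLU, and softmax functions, applied to the example output values above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # flat beyond roughly +3 / -3

def relu(z):
    return np.maximum(0.0, z)                # negative inputs become 0

def softmax(z):
    e = np.exp(z - np.max(z))                # shift for numerical stability
    return e / e.sum()                       # positive values that sum to 1

z = np.array([1.6, 0.55, 0.98])
print(sigmoid(z))                            # elementwise sigmoid
print(np.tanh(z))                            # tanh: scaled sigmoid, symmetric around the origin
print(relu(np.array([-2.0, 0.5])))           # -> [0.  0.5]
print(softmax(z).round(2))                   # -> approximately [0.53 0.19 0.28]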

In this video, we will start learning how

to use the Keras library to build deep

learning models. We will start with

building models for regression problems.

In this course, we will be using

Cognitive Class Labs, or CC Labs for

short, as our platform to run the lab

sessions. So if you go to labs.cognitiveclass.ai,

you can either sign in if

you already have a Cognitive Class


account, or sign up if this is your first

visit to Cognitive Class or its Labs

platform. Once you sign in, you should get

to this landing page, where you can

select different environments. Let's

click on JupyterLab to start a new

JupyterLab Notebook.

Once JupyterLab loads, click on the

Python 3 icon to start a new notebook. In

order to avoid any technical hiccups, or

to make sure that you have the smoothest

experience, we have pre-installed the

Keras library in CC Labs so you can

import it directly by running "import

keras". Once the code executes, it will

print which backend Keras is running on.

Here, we used the

TensorFlow backend. Let's take a look at

a regression example. Here is a data set

of the compressive strength of different

samples of concrete based on the volumes

of the different materials that were

used to make them. So the first concrete

sample has 540 kilograms per cubic meter of

cement, 0 kilograms per cubic meter of blast furnace

slag, 0 kilograms per cubic meter of fly ash, 162 kilograms per cubic

meter of water, 2.5 kilograms per cubic meter of superplasticizer,

1040 kilograms per cubic meter of

coarse aggregate, and 676 kilograms per cubic meter of

fine aggregate. Such a concrete mix, which

is 28 days old, has a compressive


strength of 79.99 MPa. The data is in a pandas

dataframe and named concrete_data. So

let's say we would like to use the Keras

library to quickly build

a deep neural network to model this

dataset, and so we can automatically

determine the compressive strength of a

given concrete sample based on its

ingredients. So let's say that the deep

neural network that we would like to

create takes all eight features

as input, feeds them into a hidden layer

of five nodes, which is connected to

another hidden layer of five nodes, which

is then connected to the output layer

with one node that is supposed to output

the compressive strength of a given

concrete sample. Note that usually you

would go with a much larger number of

neurons in each hidden layer like 50 or

even 100, but we're just using a small

network for simplicity. Notice how all

the nodes in one layer are connected to

all the other nodes in the next layer.

Such a network is called a dense network.

Before we begin using the Keras library,

let's prepare our data and have it in

the right format. The only thing we would

need to do is to split the dataframe

into two dataframes, one that has the

predictors columns and another one that


has the target column. We will also name

them predictors and target. Now prepare

to see the magic of Keras and how

building such a network and training it

and using it to predict new samples can

be achieved with only a few lines of code.

The first thing you will need to do is

import Keras and the Sequential model

from "keras.models". Because our

network consists of a linear stack of

layers, the Sequential model is what

you would want to use. This is the case

most of the time unless you are building

something out of the ordinary. There are

two models in the Keras library. One of

them is the Sequential model and the

other one is the model class used with

the functional API. So to create your

model, you simply call the Sequential

constructor. Now building your layers is

pretty straightforward as well.

For that, we would need to import the "Dense"

type of layers from "keras.layers".

Then we use the add method to add each

dense layer. We specify the number of

neurons in each layer and the activation

function that we want to use.

As per our discussion in the video on activation

functions, ReLU is one of the

recommended activation functions for

hidden layers, so we will use that. And


for the first hidden layer we need to

pass in the "input_shape" parameter, which

is the number of columns or predictors

in our dataset.

Then we repeat the same thing for the other hidden layer,

of course without the input_shape parameter,

and we create our output layer with one

node. Now for training, we need to define

an optimizer and the error metric. In the

previous module, we used gradient descent

as our minimization or optimization

algorithm, and the mean squared error as

our loss measure between the predicted

value and the ground truth. So we will

stick with that and use the mean squared

error as our loss measure. As for the

minimization algorithm, there are actually

other, more efficient algorithms than

gradient descent for deep learning

applications. One of them is "adam". One of

the main advantages of the "adam"

optimizer is that you don't need to

specify the learning rate that we saw in

the gradient descent video. So this saves

us the task of optimizing the learning

rate for our model. Then we use the fit

method to train our model. Once training

is complete, we can start making

predictions using the predict method.
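Putting those steps together, here is a minimal sketch of the regression model just described. It assumes the concrete_data dataframe from earlier is already loaded; the target column name 'Strength' and the test_data variable are assumptions made for illustration, and the exact code shown in the video may differ:

import keras
from keras.models import Sequential
from keras.layers import Dense

# split the dataframe into predictors and target; the column name 'Strength'
# and the variable test_data are assumptions made for this illustration
predictors = concrete_data.drop(columns=['Strength'])
target = concrete_data['Strength']
n_cols = predictors.shape[1]                                    # number of predictors (8 here)

model = Sequential()
model.add(Dense(5, activation='relu', input_shape=(n_cols,)))   # first hidden layer, 5 nodes
model.add(Dense(5, activation='relu'))                          # second hidden layer, 5 nodes
model.add(Dense(1))                                             # output layer: compressive strength

model.compile(optimizer='adam', loss='mean_squared_error')      # adam optimizer, MSE loss
model.fit(predictors, target)                                   # train the model
predictions = model.predict(test_data)                          # predict strength of new samples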

Below the video, you will find a document

with links to different sections in the


Keras library that you can refer to to

learn more about optimizers, models, and

other methods that you can use in the

Keras library. But this code snippet

here is typically all you need to know

to build a regression model in Keras. In

the next video, we will learn how to

build a classification model using the

Keras library.

In this video, we will learn how to use the Keras library to build models for

classification problems. Let's say that we would like to build a model that

would inform someone whether purchasing a certain car would be a good choice

based on the price of the car, the cost to maintain it, and whether it can

accommodate two or more people. So, here is a dataset that we are calling "car_data".

I already cleaned the data, as you can see, where I used one-hot encoding to

transform each category of price, maintenance, and how many people the car

can accommodate, into separate columns. So the price of the car can be either

high, medium, or low. Similarly, the cost of maintaining the

car can also be high, medium, or low, and the car can either fit two people or

more. If you take the first car in the dataset, it is considered an expensive

car, has high maintenance cost, and can fit only two people. The decision is 0,

meaning that buying this car would be a bad choice. A decision of 1 means that

buying the car is acceptable, a decision of 2 means that buying the car would

be a good decision, and a decision of 3 means that buying the car would be

a very good decision. Let's use the same neural network as the one we used for

the regression problem that we discussed in the previous video. So a network that

still takes eight inputs or predictors, consists of two hidden layers, each of

five neurons, and an output layer. Next, let's divide our dataset into

predictors and target. However, with Keras, for classification problems, we

can't use the target column as is; we actually need to transform the column
into an array with binary values, similar to one-hot encoding, like the output

shown here. We easily achieve that using the "to_categorical"

function from the Keras utilities package. In other words, instead

of having just one neuron in the output layer, our model would have four neurons,

since our target variable consists of four categories. In terms of code, the

structure of our code is pretty similar to the one we used to build the model for

our regression problem. We start by importing the Keras library and the

Sequential model and we use it to construct our model. We also import the

"Dense" layer since we will be using it to build our network. The additional import

statement here is the "to_categorical" function in order to transform our

target column into an array of binary numbers for classification. Then, we

proceed to constructing our layers. We use the add method to create two hidden

layers, each with five neurons and the neurons are activated using the ReLU

activation function. Notice how here we also specify the softmax function as the

activation function for the output layer, so that the predicted values

from all the neurons in the output layer sum nicely to 1. Then, in defining our

compiler, here we will use the categorical cross-entropy as our loss

measure instead of the mean squared error that we used for regression, and we

will specify the evaluation metric to be "accuracy". "accuracy" is a built-in

evaluation metric in Keras, but you can actually define your own evaluation

metric and pass it in the metrics parameter. Then we fit the model.

Notice how this time we're specifying the number of epochs for training the

model. Although we didn't specify the number of epochs when we built the

regression model, we could have done that. Finally, we use the predict method

to make predictions. Now the output of the Keras predict method would be

something like what's shown here. For each data point, the output is the

probability that the decision of purchasing a given car belongs to one of

the four classes. For each data point, the probabilities should sum to 1,

and the higher the probability, the more confident the algorithm is that

a datapoint belongs to the respective class. So for the first data point or the

first car in the test set, the decision would be 0 meaning not acceptable,
since the first probability is the highest, with a value of 0.99 or close to

1, in this case. Similarly, for the second datapoint, the decision is also 0 or

not acceptable, since the probability for this class is the highest, again with a

value of 0.99 or almost 1. For the first three datapoints, the model is very

confident that purchasing these cars is not acceptable. As for the last three

datapoints, the decision would be 1 or acceptable, since the probabilities for

the second class are higher than the rest of the classes. But notice how the

probabilities for decision 0 and decision 1 are very close. Therefore,

the model is not very confident but it would lean towards accepting purchasing

these cars. In the lab part, you will get the chance to build your own

regression and classification models using the Keras library, so make sure to

complete this module's lab components.
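For reference, here is a minimal sketch of the classification model described above. The dataframe name car_data is from the video, while the column name 'decision', the test_data variable, and the choice of 10 epochs are assumptions made for illustration:

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# the column name 'decision' and the variable test_data are assumptions for illustration
predictors = car_data.drop(columns=['decision'])
target = to_categorical(car_data['decision'])    # 4 categories -> array of binary values
n_cols = predictors.shape[1]                     # 8 predictors in this example

model = Sequential()
model.add(Dense(5, activation='relu', input_shape=(n_cols,)))
model.add(Dense(5, activation='relu'))
model.add(Dense(4, activation='softmax'))        # one neuron per class; outputs sum to 1

model.compile(optimizer='adam',
              loss='categorical_crossentropy',   # loss measure for classification
              metrics=['accuracy'])

model.fit(predictors, target, epochs=10)         # number of epochs is illustrative
predictions = model.predict(test_data)           # each row: class probabilities summing to 1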

So far, we discussed two supervised deep learning models, which are the

convolutional neural network and the recurrent neural network. In this video, we

will switch to an unsupervised deep learning model which is the autoencoder.

So what are autoencoders? Autoencoding is a data compression algorithm where

the compression and the decompression functions are learned automatically from

data, instead of being engineered by a human. Such autoencoders are built using

neural networks. Autoencoders are data specific, which means that they will only

be able to compress data similar to what they have been trained on. Therefore, an

autoencoder trained on pictures of cars would do a

rather poor job of compressing pictures of buildings, because the features it

would learn would be vehicle or car specific. Some interesting applications

of autoencoders are data denoising and dimensionality reduction for data

visualization. Here is the architecture of an autoencoder. It takes an image, for

example, as an input and uses an encoder to find the optimal compressed

representation of the input image. Then, using a decoder the original image is

restored. So an autoencoder is an unsupervised neural network model. It

uses backpropagation by setting the target variable to be the same as the

input. In other words, it tries to learn an approximation of an identity function.


Because of non-linear activation functions in neural networks,

autoencoders can learn data projections that are more interesting than a

principal component analysis (PCA) or other basic techniques, which can handle

only linear transformations. A very popular type of autoencoder is the

Restricted Boltzmann Machine, or RBM for short. RBMs have been successfully

used for various applications, including fixing imbalanced datasets. Because

RBMs learn the input in order to be able to

regenerate it, they can learn the distribution of the minority class in an

imbalanced dataset, and then generate more data points of that class, transforming

the imbalanced dataset into a balanced dataset.

Similarly, RBMs can also be used to estimate missing values in different

features of a data set. Another popular application of Restricted Boltzmann

Machines is automatic feature extraction, especially from unstructured data. And this

concludes our high-level introduction to autoencoders and Restricted Boltzmann

Machines.
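As a high-level illustration only (the course does not show code for autoencoders), here is one way a small, dense autoencoder could be sketched with Keras; the layer sizes, the 784-dimensional input, and the x_train variable are assumptions made for illustration:

from keras.models import Sequential
from keras.layers import Dense

input_dim = 784                  # e.g. a flattened 28 x 28 image; size chosen for illustration

autoencoder = Sequential()
autoencoder.add(Dense(32, activation='relu', input_shape=(input_dim,)))  # encoder: compressed representation
autoencoder.add(Dense(input_dim, activation='sigmoid'))                  # decoder: restore the input

autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# unsupervised training: the target is set to be the same as the input
autoencoder.fit(x_train, x_train, epochs=10)     # x_train (the training images) is assumed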


All the algorithms that are used in deep

learning are largely inspired by the way

neurons and neural networks function and

process data in the brain. This image is

one of the very first pictures of a

neuron. It was drawn by Santiago Ramon

y Cajal,

back in 1899 based on what he saw after

placing a pigeon's brain under the

microscope. He is now known as the father

of modern neuroscience, but based on his

drawing, the neurons, one of them labeled

A, have big bodies in the middle and long

arms that stretch out and branch off to

connect with other neurons. This other


image here is that of a neural network

and has a bunch, or thousands, of neurons

in what looks like brain tissue. It

gives you a sense of how tightly they

are packed together and how many of them

are in a small brain tissue. Going back

to the drawing of neurons by Ramon y Cajal,

let's rotate it 90 degrees to the

left.

I bet this way it is starting to look a

little familiar since it slightly

resembles drawings of artificial neural

networks that you must have seen. Here is

a cartoon drawing of the neuron. The main

body of the neuron is called the soma,

which contains the nucleus of the neuron.

The big network of arms sticking out of

the body is called the dendrites, and

then the long arm that sticks out of the

soma in the other direction is called

the axon. The whiskers at the end of the

axon are called the terminal buttons or

synapses. So the dendrites receive

electrical impulses which carry

information, or data, from sensors or

terminal buttons of other adjoining

neurons. The dendrites then carry the

impulses or data to the soma. In the

nucleus, electrical impulses, or the data,

are processed by combining them together,

and then they are passed on to the axon.


The axon then carries the processed

information to the terminal button or

synapse, and the output

of this neuron becomes the input to

thousands of other neurons. Learning in

the brain occurs by repeatedly

activating certain neural connections

over others, and this reinforces those

connections. This makes them more likely

to produce a desired outcome given a

specified input. Once the desired outcome

occurs, the neural connections causing

that outcome become strengthened. An

artificial neuron behaves in the same

way as a biological neuron. So it

consists of a soma, dendrites, and an axon

to pass on the output of this neuron to

other neurons. The end of the axon can

branch off to connect to many other

neurons, but for simplicity we are just

showing one branch here. The learning

process also very much resembles the way

learning occurs in the brain as you will

see in the next couple of videos. Now

that we understand the different parts

of an artificial neuron, let's learn how

we formulate the way artificial neural

networks process information.
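As a minimal sketch of that formulation (my own illustration, with made-up values and ReLU chosen as the activation), an artificial neuron computes a weighted sum of its inputs plus a bias and passes the total through an activation function:

import numpy as np

def artificial_neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias   # combine the inputs: weighted sum plus a bias
    return max(0.0, z)                   # activation function (ReLU chosen for illustration)

x = np.array([0.5, 0.2, 0.9])            # data arriving at the "dendrites"
w = np.array([0.1, 0.4, 0.3])            # weights on the connections
b = 0.05                                 # bias

print(artificial_neuron(x, w, b))        # output passed along the "axon" to other neurons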

So far, we have mostly been dealing with

not very deep, or shallow, neural networks.


And the main reason is that they really

do serve as the building blocks of deep

neural networks and are easier to

understand due to their simplicity. There

isn't really a consensus on the

definition of a shallow neural network,

but a neural network with one hidden

layer is considered a shallow neural

network, whereas a network with many

hidden layers and a large number of

neurons in each layer is considered a

deep neural network. Also, unlike a

shallow neural network, which takes only

vectors as input, deep neural networks

are able to take raw data such as images

and text and automatically extract the

necessary features to learn the data

better. We will start learning about deep

learning algorithms in the next videos.

But if neural networks have been around

for quite some time, how come only

recently did they turn deep and start

taking off resulting in a plethora of

cool and exciting applications? The

sudden boom in the deep learning field

can be attributed to three main factors.

Number one, advancement in the field

itself. We talked about this briefly in

the activation functions video, where we

mentioned that the ReLU activation

function helped overcome the challenge


of the vanishing gradient problem, and

therefore, opened the door to the creation

of very deep networks.

Therefore, advancement in the field

itself is one factor that helped deep

learning take off. Another main reason is

the availability of data. Deep neural

networks work best when trained with

larger and larger amounts of data,

since neural networks learn the

training data so well, large amounts

of data have to be used in order to

avoid overfitting of the training data.

Now that large amounts of data are

readily available and easy to acquire

like never before, deep learning

algorithms are being tried and tested

like never before. This is especially true since

other conventional machine

learning algorithms, while they do

improve with more data, only do so up to a

certain point. After that, no significant

improvement would be observed with more

data. That is definitely not the case

with deep learning. The more data you

feed it the better it performs. Finally,

and this goes hand-in-hand with point

number 2, is computational power. With

NVIDIA's super powerful GPUs, we are now

able to train very deep neural networks

on tremendous amounts of data in a matter


of hours as opposed to days or weeks,

which is how long it used to take to

train very deep neural networks.

Therefore, users are able to experiment

with different deep neural networks and

test different prototypes in much

shorter periods of time. These three

factors are the main reasons behind the

boom of deep learning. In the next video,

we will start learning about deep

learning algorithms. We will start with

supervised deep learning algorithms, and

in the next video, we will learn about

convolutional neural networks.

In this video, we will learn about deep

learning algorithms. We will start with

supervised deep learning algorithms, and

in this video, we will learn about

convolutional neural networks.

Convolutional neural networks are very

similar to the neural networks that we

have seen so far in this course. They are

made up of neurons, which need to have

their weights and biases optimized. Each

neuron combines the inputs that it

receives by computing the dot product

between each input and the corresponding

weight before it feeds the resulting

total input into an activation function,

most likely ReLU. So then, what is

different about these networks, and why

are they called convolutional neural

networks?

Well convolutional neural networks, or

CNNs for short, make the explicit

assumption that the inputs are images,

which allows us to incorporate certain

properties into their architecture. These

properties make the forward propagation

step much more efficient and vastly

reduce the number of parameters in the

network. Therefore, CNNs are best for

solving problems related to image

recognition, object detection, and other

computer vision applications. Here is a

typical architecture of a convolutional

neural network. As you can see, the

network consists of a series of

convolutional, ReLU, and pooling layers

as well as a number of fully connected

layers which are necessary before the

output is generated. Now, let's study what

happens in each layer. So far, we have dealt

only with conventional neural networks

that take an (n x 1) vector as their

input. The input to a convolutional

neural network, on the other hand, is

mostly an (n x m x 1) array for grayscale images

or an (n x m x 3) array for colored images,

where the number 3 represents the

red,
green, and blue components of each pixel

in the image. In the convolutional layer,

we basically define filters and we

compute the convolution between the

defined filters and each of the three

images. If we take the red image for

example, let's assume these are the pixel

values. Now for a (2 x 2) filter with

these values, let's create an empty

matrix to save the results of the

convolution process.

We start by sliding the filter over the

image and computing the dot product

between the filter and the overlapping

pixel values and storing the result in

the empty matrix. We repeat this step

moving our filter one cell, or one stride

is the proper terminology, at a time, and

we repeat this until we cover the entire

image and fill the empty matrix. Here, I

just showed one filter and only one of

the three images. The same thing would be

applied to the green and blue images and

you can apply more than one filter. The

more filters we use, the better we are able

to preserve the spatial dimensions.

But one question you must be

asking yourself at this point is, why

would we need to use convolution? Why not

flatten the input image into an (n x m) x 1

vector and use that as our input?


Well, if we do that, we will end up with a

massive number of parameters that will

need to be optimized, and it will be

super computationally expensive. Also,

decreasing the number of parameters

would definitely help in preventing the

model from overfitting the training data.
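To make the sliding-filter computation concrete, here is a small NumPy sketch (not code from the course; the pixel and filter values are made up) of convolving a (2 x 2) filter over a single channel with a stride of 1:

import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the filter over the image, taking the dot product at each position."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    result = np.empty((out_h, out_w))              # the "empty matrix" for the results
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            result[i, j] = np.sum(patch * kernel)  # dot product of filter and overlapping pixels
    return result

# illustrative values only; the pixel and filter values in the video differ
red_channel = np.array([[1, 0, 2, 3],
                        [4, 6, 6, 8],
                        [3, 1, 1, 0],
                        [1, 2, 2, 4]])
kernel = np.array([[1, 0],
                   [0, 1]])

print(convolve2d(red_channel, kernel))             # a 3 x 3 result for a 4 x 4 image and 2 x 2 filter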

It is worth mentioning that a

convolutional layer also consists of

ReLUs, which filter the output of the

convolutional step passing only positive

values and turning any negative values

to 0. The next layer in our

convolutional neural network is the

pooling layer. The pooling layer's main

objective is to reduce the spatial

dimensions of the data propagating

through the network. There are two types

of pooling that are widely used in

convolutional neural networks: max-pooling

and average pooling. In max-pooling,

which is the most common of the

two, for each section of the image we scan,

we keep the highest value, like so.

Here our filter is moving two strides at

a time.

Similarly, with average pooling, we

compute the average of each area we scan.
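Likewise, here is a hedged sketch of max- and average pooling with a (2 x 2) window moving two strides at a time, using made-up values:

import numpy as np

def pool2d(feature_map, size=2, stride=2, mode='max'):
    """Keep the max (or average) of each section the window scans."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    result = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            section = feature_map[i * stride:i * stride + size,
                                  j * stride:j * stride + size]
            result[i, j] = section.max() if mode == 'max' else section.mean()
    return result

feature_map = np.array([[1, 0, 2, 3],       # illustrative values only
                        [4, 6, 6, 8],
                        [3, 1, 1, 0],
                        [1, 2, 2, 4]])

print(pool2d(feature_map, mode='max'))      # max-pooling keeps the highest value of each section
print(pool2d(feature_map, mode='average'))  # average pooling averages the same sections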

In addition to reducing the dimension of

the data, pooling, or max pooling in

particular, provides spatial invariance,


which enables the neural network to

recognize objects in an image even if

the object does not exactly resemble the

original object. Finally, in the fully

connected layer, we flatten the output

of the last convolutional layer and

connect every node of the current layer

with every other node of the next layer.

This layer basically takes as input the

output from the preceding layer, whether

it is a convolutional layer, ReLU, or

pooling layer, and outputs an n-dimensional

vector, where n is the number

of classes pertaining to the problem at

hand. For example, if you are building a

network to classify images of digits, the

dimension n would be 10, since there are

10 digits. You will be covering

convolutional neural networks in much

more detail in the other courses in this

specialization, but this information is

more than enough to give you a general

understanding of convolutional neural

networks. Now let's see how we can use

the Keras library to build a

convolutional neural network.

Training and testing of a

convolutional neural network are the

same as what we have seen so far. So to


begin with, we use the Sequential

constructor to create our model. Then, we

define our input to be the size of the

input images. Assuming the input images

are 128 by 128 color images, we define

the input shape to be a tuple of (128, 128, 3).

Next, we start adding layers to the

network. We start with a convolutional

layer, with 16 filters, each filter being

of size 2 x 2 and sliding through the image

with a stride of magnitude 1 in the

horizontal direction, and of magnitude 1

in the vertical direction. And the layer

uses the ReLU activation function. Then,

we add a pooling layer and we're using

max-pooling here with a filter or

pooling size of 2 and the filter slides

through the image with a stride of

magnitude 2. Next, we add another set of

convolutional and pooling layers. The

only difference here is we are using more

filters in the convolutional layer,

actually twice as many filters as the

first convolutional layer. Finally, we

flatten the output from these layers so

that the data can proceed to the fully

connected layers. We add another dense

layer with 100 nodes and an output layer

that has nodes equal to the number of

classes in the problem at hand. And we

use the softmax activation function in


order to convert the outputs into

probabilities. With this, we conclude this

video on convolutional neural networks.
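The layers just described correspond roughly to the following sketch; this is a hedged reconstruction rather than the exact code shown in the video, and num_classes is an assumed variable:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

num_classes = 10                 # number of classes in the problem at hand (assumed)

model = Sequential()
# 16 filters of size 2 x 2, sliding with a stride of 1 in each direction
model.add(Conv2D(16, kernel_size=(2, 2), strides=(1, 1), activation='relu',
                 input_shape=(128, 128, 3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# a second set of convolutional and pooling layers, with twice as many filters
model.add(Conv2D(32, kernel_size=(2, 2), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

model.add(Flatten())                                   # flatten before the fully connected layers
model.add(Dense(100, activation='relu'))               # dense layer with 100 nodes
model.add(Dense(num_classes, activation='softmax'))    # convert outputs into probabilities

# compiling, training, and prediction follow the same pattern as the earlier models
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])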

In the lab, we will implement a complete

convolutional neural network, where we

will use the Keras library to build the

network, train it, and then validate it. So

make sure to complete this module's lab

on convolutional neural networks.

In the previous video, we learned about convolutional neural networks, which are

supervised deep learning models that have revolutionized the field of

computer vision, especially object detection in images. In this video, we

will learn about another supervised deep learning model, which is the recurrent

neural network. So far, we have seen neural networks and deep learning models

that see datapoints as independent instances. However, let's say you want to

build a model that can analyze scenes in a movie.

Well, you cannot assume that scenes in a movie are independent, and therefore,

traditional deep learning models are not suitable for this application. Recurrent

neural networks overcome this issue. Recurrent neural networks, or RNNs

for short, are networks with loops that don't just take a new input at a time,

but also take in as input the output from the previous data point that was fed

into the network. Accordingly, this is what the architecture of a recurrent neural

network would look like. Essentially, we can start with a normal neural network.

At time t = 0, the network takes in input x0 and outputs a0. Then, at time t = 1,

in addition to the input x1, the network also takes a0 as input,

weighted with weight w0,1, and so on and so forth. As a result, recurrent neural

networks are very good at modelling patterns and sequences of data, such as

texts, genomes, handwriting, and stock markets. These algorithms take time and

sequence into account, which means that they have a temporal dimension. A very

popular type of recurrent neural network is the long short-term memory model or
the LSTM model for short. It has been successfully used for many applications

including image generation, where a model trained on many

images is used to generate novel images. Another application is

handwriting generation, which I described in the welcome video of this course. Also

LSTM models have been successfully used to build algorithms that can

automatically describe images as well as streams of videos. I think this is a good

overview of recurrent neural networks. Given that this is just an introductory

course, I will leave it here. This concludes this video

on recurrent neural networks. I will see you in the next video,

where we will switch to unsupervised deep learning models and talk about

autoencoders.
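Although the course shows no code for recurrent networks, here is a minimal, hedged sketch of the recurrence described above, where each time step combines the new input with the previous output; all weights and inputs are made up for illustration:

import numpy as np

def rnn_step(x_t, a_prev, w_x, w_a, b):
    """One recurrent step: combine the new input with the previous output."""
    return np.tanh(w_x * x_t + w_a * a_prev + b)

w_x, w_a, b = 0.8, 0.5, 0.0      # illustrative scalar weights
inputs = [1.0, 0.2, -0.4]        # a short input sequence x0, x1, x2

a = 0.0                          # there is no previous output at time t = 0
for t, x_t in enumerate(inputs):
    a = rnn_step(x_t, a, w_x, w_a, b)
    print(f"t={t}: a{t} = {a:.3f}")   # this output is fed back in at the next time step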
