
Before we can start building deep

learning networks, we will spend some

time learning about the different deep

learning libraries and frameworks that

are out there. In this video, I will

briefly cover the libraries that we'll

be teaching in this specialization. The

most popular libraries, in descending

order, are TensorFlow, Keras, and PyTorch.

There is also Theano, a library developed

by the Montreal Institute for Learning

Algorithms, which was the major library for

deep learning development even before

TensorFlow and PyTorch.

However, its founders could no longer afford to

continuously support and maintain it,

and therefore, the library lost its

popularity. Because of that, in this

specialization, we will focus on the

three other popular libraries. Among the

three libraries, TensorFlow is the most

popular one. It is the library that is

most commonly used in production of deep

learning models. It has a very large

community. Just a quick look at the

number of forks on the library's Github

repository as well as the number of

commits and pull requests should suffice

in giving you an idea of how popular the

library is. TensorFlow was developed by

Google and released to the public in


2015, and is still being actively used at

Google for both research and production

needs. PyTorch, on the other hand, is the

cousin of the Torch framework, which is

written in Lua, and supports machine learning

algorithms running on GPUs in particular.

However, despite being derived from the Torch

framework, PyTorch isn't just a set of

wrappers to support a popular language

like Python. It was actually rewritten

and tailored to be fast and feel native.

PyTorch was released in 2016 and has

gained immense interest lately and is

becoming the preferred framework over

TensorFlow,

especially in academic research settings

and applications of deep learning

requiring optimizing custom expressions.

PyTorch is supported and being actively

used at Facebook. However, despite their

popularity, both

PyTorch and TensorFlow are not easy to

use, and have a steep learning curve. So

for people who are just starting to

learn deep learning, there is no better

library to use than the Keras

library.

Keras is a high level API for building

deep learning models. It has gained favor

for its ease of use and syntactic

simplicity, facilitating fast development.


As you'll see in the next couple of

videos, building a very complex deep

learning network can be achieved with

Keras with only a few lines of code.

Keras normally runs on top of a

low-level library such as TensorFlow.

This means that to be able to use the

Keras library, you will have to install

TensorFlow first, and when you import

Keras, it will explicitly display

which backend it is running on. Keras is also supported

by Google. I won't go into more details

about the different libraries,

but the take-home message here is: if

you're interested in building something

quickly, go with the Keras library; you

won't be disappointed. However, if you

want to have more control over the

different nodes and layers in the

network, and want to watch closely what

happens with the network over time, then

PyTorch or TensorFlow would be the

right choice. It will really boil

down to your personal preference.

With that, in the next videos, we will start

learning how to use the Keras library

to build models for regression and

classification problems.
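As a quick, hedged illustration of that backend behavior (assuming a standalone Keras 2.x setup with TensorFlow already installed), importing Keras reports the backend it is running on:

# A hedged sketch: assumes standalone Keras 2.x with TensorFlow already installed.
import keras                     # prints a message such as "Using TensorFlow backend."
from keras import backend as K

print(K.backend())               # e.g. 'tensorflow'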

As we discussed earlier, activation

functions play a major role in the

learning process of a neural network. So

far, we have used only the sigmoid

function as the activation function in

our networks, but we saw how the sigmoid

function has its shortcomings since it

can lead to the vanishing gradient

problem for the earlier layers. In this

video, we will discuss other activation

functions; ones that are more efficient

to use and are more applicable to deep

learning applications. There are seven

types of activation functions that you

can use when building a neural network.

There is the binary step function, the

linear or identity function, there is our

old friend the sigmoid or logistic

function, there is the hyperbolic tangent,

or tanh, function, the rectified linear

unit (ReLU) function, the leaky ReLU

function, and the softmax function. In

this video, we will discuss the popular

ones, which are the sigmoid, the

hyperbolic tangent, ReLU, and the softmax

functions. This is the sigmoid function.

At z = 0, a is equal to 0.5 and when z

is a very large positive number, a is

close to 1, and when z is a very large

negative number, a is close to zero.

Sigmoid functions used to be widely used


as activation functions in the hidden

layers of a neural network. However, as

you can see, the function is pretty flat

beyond the +3 and -3

region. This means that once the input

falls in that region, the gradients

become very small. This results in the

vanishing gradient problem that we

discussed, and as the gradients approach

0, the network doesn't really learn.

Another problem with the sigmoid

function is that the values only range

from 0 to 1. This means that the sigmoid

function is not symmetric around the

origin.

The values received are all positive.

Well, we would not always want the

values going to the next neuron to be

all of the same sign. This can be

addressed by scaling the sigmoid

function, and this brings us to the next

activation function: the hyperbolic

tangent function. This is the hyperbolic

tangent, or tanh, function. It is very

similar to the sigmoid function. It is

actually just a scaled version of the

sigmoid function, but unlike the sigmoid

function, it's symmetric over the origin.

It ranges from -1 to +1.

However, although it overcomes the lack

of symmetry of the sigmoid function, it


also leads to the vanishing gradient

problem in very deep neural networks. The

rectified linear unit, or ReLU, function

is the most widely used activation

function when designing networks today.

In addition to it being nonlinear, the

main advantage of using the ReLU

function over the other activation

functions is that it does not activate

all the neurons at the same time.

According to the plot here, if the input

is negative it will be converted to 0,

and the neuron does not get activated.

This means that, at any given time, only a few

neurons are activated, making the network

sparse and very efficient. Also, the ReLU

function was one of the main

advancements in the field of deep

learning that led to overcoming the

vanishing gradient problem. One last

activation function that we will discuss

here is the softmax function. The softmax

function is also a type of sigmoid

function, but it is handy when we are

trying to handle classification problems.

The softmax function is ideally used in

the output layer of the classifier where

we are actually trying to get the

probabilities to define the class of

each input. So, if a network with 3

neurons in the output layer outputs [1.6, 0.55, 0.98]


then with a softmax activation

function, the outputs get converted to

approximately [0.53, 0.19, 0.28]. This way, it is

easier for us to classify a given data

point and determine to which category it

belongs. In conclusion, the sigmoid and

the tanh functions are avoided in many

applications nowadays since they can

lead to the vanishing gradient problem.

The ReLU function is the function

that's widely used nowadays, and it's

important to note that it is only used in

the hidden layers. Finally, when building

a model, you can begin with using the

ReLU function and then you can switch to

other activation functions if the

ReLU function does not yield a good

performance. And this concludes this

video on activation functions. I'll see

you in the next video.
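To make these functions concrete, here is a small NumPy sketch (my own illustration, not code from the course) of the sigmoid, tanh, ReLU, and softmax functions, applied to the example output values above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # flat beyond roughly +3 / -3

def relu(z):
    return np.maximum(0.0, z)                # negative inputs become 0

def softmax(z):
    e = np.exp(z - np.max(z))                # shift for numerical stability
    return e / e.sum()                       # positive values that sum to 1

z = np.array([1.6, 0.55, 0.98])
print(sigmoid(z))                            # elementwise sigmoid
print(np.tanh(z))                            # tanh: scaled sigmoid, symmetric around the origin
print(relu(np.array([-2.0, 0.5])))           # -> [0.  0.5]
print(softmax(z).round(2))                   # -> approximately [0.53 0.19 0.28]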

In this video, we will start learning how

to use the Keras library to build deep

learning models. We will start with

building models for regression problems.

In this course, we will be using

Cognitive Class Labs, or CC Labs for

short, as our platform to run the lab

sessions. So if you go to labs.cognitiveclass.ai,

you can either sign in if

you already have a Cognitive Class


account, or sign up if this is your first

visit to Cognitive Class or its Labs

platform. Once you sign in, you should get

to this landing page, where you can

select different environments. Let's

click on JupyterLab to start a new

JupyterLab Notebook.

Once JupyterLab loads, click on the

Python 3 icon to start a new notebook. In

order to avoid any technical hiccups, or

to make sure that you have the smoothest

experience, we have pre-installed the

Keras library in CC Labs so you can

import it directly by running "import

keras". Once the code executes, it will

print which backend Keras is running on.

Here, we used the

TensorFlow backend. Let's take a look at

a regression example. Here is a data set

of the compressive strength of different

samples of concrete based on the volumes

of the different materials that were

used to make them. So the first concrete

sample has 540 kilograms per cubic meter of

cement, 0 kilograms per cubic meter of blast furnace

slag, 0 kilograms per cubic meter of fly ash, 162 kilograms per cubic

meter of water, 2.5 kilograms per cubic meter of superplasticizer,

1040 kilograms per cubic meter of

coarse aggregate, and 676 kilograms per cubic meter of

fine aggregate. Such a concrete mix, which

is 28 days old, has a compressive


strength of 79.99 MPa. The data is in a pandas

dataframe and named concrete_data. So

let's say we would like to use the Keras

library to quickly build

a deep neural network to model this

dataset, and so we can automatically

determine the compressive strength of a

given concrete sample based on its

ingredients. So let's say that the deep

neural network that we would like to

create takes all eight features

as input, feeds them into a hidden layer

of five nodes, which is connected to

another hidden layer of five nodes, which

is then connected to the output layer

with one node that is supposed to output

the compressive strength of a given

concrete sample. Note that usually you

would go with a much larger number of

neurons in each hidden layer like 50 or

even 100, but we're just using a small

network for simplicity. Notice how all

the nodes in one layer are connected to

all the other nodes in the next layer.

Such a network is called a dense network.

Before we begin using the Keras library,

let's prepare our data and have it in

the right format. The only thing we would

need to do is to split the dataframe

into two dataframes, one that has the

predictors columns and another one that


has the target column. We will also name

them predictors and target. Now prepare

to see the magic of Keras and how

building such a network and training it

and using it to predict new samples can

be achieved with only a few lines of code.

The first thing you will need to do is

import Keras and the Sequential model

from "keras.models". Because our

network consists of a linear stack of

layers, the Sequential model is what

you would want to use. This is the case

most of the time unless you are building

something out of the ordinary. There are

two models in the Keras library. One of

them is the Sequential model and the

other one is the model class used with

the functional API. So to create your

model, you simply call the Sequential

constructor. Now building your layers is

pretty straightforward as well.

For that, we would need to import the "Dense"

type of layers from "keras.layers".

Then we use the add method to add each

dense layer. We specify the number of

neurons in each layer and the activation

function that we want to use.

As per our discussion in the video on activation

functions, ReLU is one of the

recommended activation functions for

hidden layers, so we will use that. And


for the first hidden layer we need to

pass in the "input_shape" parameter, which

is the number of columns or predictors

in our dataset.

Then we repeat the same thing for the other hidden layer,

of course without the input_shape parameter,

and we create our output layer with one

node. Now for training, we need to define

an optimizer and the error metric. In the

previous module, we used gradient descent

as our minimization or optimization

algorithm, and the mean squared error as

our loss measure between the predicted

value and the ground truth. So we will

stick with that and use the mean squared

error as our loss measure. As for the

minimization algorithm, there are actually

other, more efficient algorithms than

gradient descent for deep learning

applications. One of them is "adam". One of

the main advantages of the "adam"

optimizer is that you don't need to

specify the learning rate that we saw in

the gradient descent video. So this saves

us the task of optimizing the learning

rate for our model. Then we use the fit

method to train our model. Once training

is complete, we can start making

predictions using the predict method.
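Putting those steps together, here is a minimal sketch of the regression model just described. It assumes the concrete_data dataframe from earlier is already loaded; the target column name 'Strength' and the test_data variable are assumptions made for illustration, and the exact code shown in the video may differ:

import keras
from keras.models import Sequential
from keras.layers import Dense

# split the dataframe into predictors and target; the column name 'Strength'
# and the variable test_data are assumptions made for this illustration
predictors = concrete_data.drop(columns=['Strength'])
target = concrete_data['Strength']
n_cols = predictors.shape[1]                                    # number of predictors (8 here)

model = Sequential()
model.add(Dense(5, activation='relu', input_shape=(n_cols,)))   # first hidden layer, 5 nodes
model.add(Dense(5, activation='relu'))                          # second hidden layer, 5 nodes
model.add(Dense(1))                                             # output layer: compressive strength

model.compile(optimizer='adam', loss='mean_squared_error')      # adam optimizer, MSE loss
model.fit(predictors, target)                                   # train the model
predictions = model.predict(test_data)                          # predict strength of new samples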

Below the video, you will find a document

with links to different sections in the


Keras library that you can refer to to

learn more about optimizers, models, and

other methods that you can use in the

Keras library. But this code snippet

here is typically all you need to know

to build a regression model in Keras. In

the next video, we will learn how to

build a classification model using the

Keras library.

In this video, we will learn how to use the Keras library to build models for

classification problems. Let's say that we would like to build a model that

would inform someone whether purchasing a certain car would be a good choice

based on the price of the car, the cost to maintain it, and whether it can

accommodate two or more people. So, here is a dataset that we are calling "car_data".

I already cleaned the data, as you can see, where I used one-hot encoding to

transform each category of price, maintenance, and how many people the car

can accommodate, into separate columns. So the price of the car can be either

high, medium, or low. Similarly, the cost of maintaining the

car can also be high, medium, or low, and the car can either fit two people or

more. If you take the first car in the dataset, it is considered an expensive

car, has high maintenance cost, and can fit only two people. The decision is 0,

meaning that buying this car would be a bad choice. A decision of 1 means that

buying the car is acceptable, a decision of 2 means that buying the car would

be a good decision, and a decision of 3 means that buying the car would be

a very good decision. Let's use the same neural network as the one we used for

the regression problem that we discussed in the previous video. So a network that

still takes eight inputs or predictors, consists of two hidden layers, each of

five neurons, and an output layer. Next, let's divide our dataset into

predictors and target. However, with Keras, for classification problems, we

can't use the target column as is; we actually need to transform the column
into an array with binary values, similar to one-hot encoding, like the output

shown here. We easily achieve that using the "to_categorical"

function from the Keras utilities package. In other words, instead

of having just one neuron in the output layer, our model would have four neurons,

since our target variable consists of four categories. In terms of code, the

structure of our code is pretty similar to the one we used to build the model for

our regression problem. We start by importing the Keras library and the

Sequential model and we use it to construct our model. We also import the

"Dense" layer since we will be using it to build our network. The additional import

statement here is the "to_categorical" function in order to transform our

target column into an array of binary numbers for classification. Then, we

proceed to constructing our layers. We use the add method to create two hidden

layers, each with five neurons and the neurons are activated using the ReLU

activation function. Notice how here we also specify the softmax function as the

activation function for the output layer, so that the predicted values

from all the neurons in the output layer sum nicely to 1. Then, in defining our

compiler, here we will use the categorical cross-entropy as our loss

measure instead of the mean squared error that we used for regression, and we

will specify the evaluation metric to be "accuracy". "accuracy" is a built-in

evaluation metric in Keras, but you can actually define your own evaluation

metric and pass it in the metrics parameter. Then we fit the model.

Notice how this time we're specifying the number of epochs for training the

model. Although we didn't specify the number of epochs when we built the

regression model, we could have done that. Finally, we use the predict method

to make predictions. Now the output of the Keras predict method would be

something like what's shown here. For each data point, the output is the

probability that the decision of purchasing a given car belongs to one of

the four classes. For each data point, the probabilities should sum to 1,

and the higher the probability, the more confident the algorithm is that

a datapoint belongs to the respective class. So for the first data point or the

first car in the test set, the decision would be 0 meaning not acceptable,
since the first probability is the highest, with a value of 0.99 or close to

1, in this case. Similarly, for the second datapoint, the decision is also 0 or

not acceptable, since the probability for this class is the highest, again with a

value of 0.99 or almost 1. For the first three datapoints, the model is very

confident that purchasing these cars is not acceptable. As for the last three

datapoints, the decision would be 1 or acceptable, since the probabilities for

the second class are higher than the rest of the classes. But notice how the

probabilities for decision 0 and decision 1 are very close. Therefore,

the model is not very confident but it would lean towards accepting purchasing

these cars. In the lab part, you will get the chance to build your own

regression and classification models using the Keras library, so make sure to

complete this module's lab components.
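For reference, here is a minimal sketch of the classification model described above. The dataframe name car_data is from the video, while the column name 'decision', the test_data variable, and the choice of 10 epochs are assumptions made for illustration:

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# the column name 'decision' and the variable test_data are assumptions for illustration
predictors = car_data.drop(columns=['decision'])
target = to_categorical(car_data['decision'])    # 4 categories -> array of binary values
n_cols = predictors.shape[1]                     # 8 predictors in this example

model = Sequential()
model.add(Dense(5, activation='relu', input_shape=(n_cols,)))
model.add(Dense(5, activation='relu'))
model.add(Dense(4, activation='softmax'))        # one neuron per class; outputs sum to 1

model.compile(optimizer='adam',
              loss='categorical_crossentropy',   # loss measure for classification
              metrics=['accuracy'])

model.fit(predictors, target, epochs=10)         # number of epochs is illustrative
predictions = model.predict(test_data)           # each row: class probabilities summing to 1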

So far, we discussed two supervised deep learning models, which are the

convolutional neural network and the recurrent neural network. In this video, we

will switch to an unsupervised deep learning model which is the autoencoder.

So what are autoencoders? Autoencoding is a data compression algorithm where

the compression and the decompression functions are learned automatically from

data, instead of being engineered by a human. Such autoencoders are built using

neural networks. Autoencoders are data specific, which means that they will only

be able to compress data similar to what they have been trained on. Therefore, an

autoencoder trained on pictures of cars would do a

rather poor job of compressing pictures of buildings, because the features it

would learn would be vehicle or car specific. Some interesting applications

of autoencoders are data denoising and dimensionality reduction for data

visualization. Here is the architecture of an autoencoder. It takes an image, for

example, as an input and uses an encoder to find the optimal compressed

representation of the input image. Then, using a decoder the original image is

restored. So an autoencoder is an unsupervised neural network model. It

uses backpropagation by setting the target variable to be the same as the

input. In other words, it tries to learn an approximation of an identity function.


Because of non-linear activation functions in neural networks,

autoencoders can learn data projections that are more interesting than a

principal component analysis (PCA) or other basic techniques, which can handle

only linear transformations. A very popular type of autoencoder is the

Restricted Boltzmann Machine, or RBM for short. RBMs have been successfully

used for various applications, including fixing imbalanced datasets. Because

RBMs learn the input in order to be able to

regenerate it, they can learn the distribution of the minority class in an

imbalanced dataset, and then generate more data points of that class, transforming

the imbalanced dataset into a balanced dataset.

Similarly, RBMs can also be used to estimate missing values in different

features of a data set. Another popular application of Restricted Boltzmann

Machines is automatic feature extraction, especially from unstructured data. And this

concludes our high-level introduction to autoencoders and Restricted Boltzmann

Machines.
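As a high-level illustration only (the course does not show code for autoencoders), here is one way a small, dense autoencoder could be sketched with Keras; the layer sizes, the 784-dimensional input, and the x_train variable are assumptions made for illustration:

from keras.models import Sequential
from keras.layers import Dense

input_dim = 784                  # e.g. a flattened 28 x 28 image; size chosen for illustration

autoencoder = Sequential()
autoencoder.add(Dense(32, activation='relu', input_shape=(input_dim,)))  # encoder: compressed representation
autoencoder.add(Dense(input_dim, activation='sigmoid'))                  # decoder: restore the input

autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# unsupervised training: the target is set to be the same as the input
autoencoder.fit(x_train, x_train, epochs=10)     # x_train (the training images) is assumed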


All the algorithms that are used in deep

learning are largely inspired by the way

neurons and neural networks function and

process data in the brain. This image is

one of the very first pictures of a

neuron. It was drawn by Santiago Ramon

y Cajal,

back in 1899 based on what he saw after

placing a pigeon's brain under the

microscope. He is now known as the father

of modern neuroscience, but based on his

drawing, the neurons, one of them labeled

A, have big bodies in the middle and long

arms that stretch out and branch off to

connect with other neurons. This other


image here is that of a neural network

and has a bunch, or thousands, of neurons

in what looks like brain tissue. It

gives you a sense of how tightly they

are packed together and how many of them

are in a small brain tissue. Going back

to the drawing of neurons by Ramon y Cajal,

let's rotate it 90 degrees to the

left.

I bet this way it is starting to look a

little familiar since it slightly

resembles drawings of artificial neural

networks that you must have seen. Here is

a cartoon drawing of the neuron. The main

body of the neuron is called the soma,

which contains the nucleus of the neuron.

The big network of arms sticking out of

the body is called the dendrites, and

then the long arm that sticks out of the

soma in the other direction is called

the axon. The whiskers at the end of the

axon are called the terminal buttons or

synapses. So the dendrites receive

electrical impulses which carry

information, or data, from sensors or

terminal buttons of other adjoining

neurons. The dendrites then carry the

impulses or data to the soma. In the

nucleus, electrical impulses, or the data,

are processed by combining them together,

and then they are passed on to the axon.


The axon then carries the processed

information to the terminal button or

synapse, and the output

of this neuron becomes the input to

thousands of other neurons. Learning in

the brain occurs by repeatedly

activating certain neural connections

over others, and this reinforces those

connections. This makes them more likely

to produce a desired outcome given a

specified input. Once the desired outcome

occurs, the neural connections causing

that outcome become strengthened. An

artificial neuron behaves in the same

way as a biological neuron. So it

consists of a soma, dendrites, and an axon

to pass on the output of this neuron to

other neurons. The end of the axon can

branch off to connect to many other

neurons, but for simplicity we are just

showing one branch here. The learning

process also very much resembles the way

learning occurs in the brain as you will

see in the next couple of videos. Now

that we understand the different parts

of an artificial neuron, let's learn how

we formulate the way artificial neural

networks process information.
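As a minimal sketch of that formulation (my own illustration, with made-up values and ReLU chosen as the activation), an artificial neuron computes a weighted sum of its inputs plus a bias and passes the total through an activation function:

import numpy as np

def artificial_neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias   # combine the inputs: weighted sum plus a bias
    return max(0.0, z)                   # activation function (ReLU chosen for illustration)

x = np.array([0.5, 0.2, 0.9])            # data arriving at the "dendrites"
w = np.array([0.1, 0.4, 0.3])            # weights on the connections
b = 0.05                                 # bias

print(artificial_neuron(x, w, b))        # output passed along the "axon" to other neurons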

So far, we have mostly been dealing with

not very deep, or shallow, neural networks.


And the main reason is that they really

do serve as the building blocks of deep

neural networks and are easier to

understand due to their simplicity. There

isn't really a consensus on the

definition of a shallow neural network,

but a neural network with one hidden

layer is considered a shallow neural

network, whereas a network with many

hidden layers and a large number of

neurons in each layer is considered a

deep neural network. Also, unlike a

shallow neural network, which takes only

vectors as input, deep neural networks

are able to take raw data such as images

and text and automatically extract the

necessary features to learn the data

better. We will start learning about deep

learning algorithms in the next videos.

But if neural networks have been around

for quite some time, how come only

recently did they turn deep and start

taking off resulting in a plethora of

cool and exciting applications? The

sudden boom in the deep learning field

can be attributed to three main factors.

Number one, advancement in the field

itself. We talked about this briefly in

the activation functions video, where we

mentioned that the ReLU activation

function helped overcome the challenge


of the vanishing gradient problem, and

therefore, opened the door to the creation

of very deep networks.

Therefore, advancement in the field

itself is one factor that helped deep

learning take off. Another main reason is

the availability of data. Deep neural

networks work best when trained with

larger and larger amounts of data,

since neural networks learn the

training data so well, large amounts

of data have to be used in order to

avoid overfitting of the training data.

Now that large amounts of data are

readily available and easy to acquire

like never before, deep learning

algorithms are being tried and tested

like never before. This is especially true since

other conventional machine

learning algorithms, while they do

improve with more data, only do so up to a

certain point. After that, no significant

improvement would be observed with more

data. That is definitely not the case

with deep learning. The more data you

feed it the better it performs. Finally,

and this goes hand-in-hand with point

number 2, is computational power. With

NVIDIA's super powerful GPUs, we are now

able to train very deep neural networks

on tremendous amounts of data in a matter


of hours as opposed to days or weeks,

which is how long it used to take to

train very deep neural networks.

Therefore, users are able to experiment

with different deep neural networks and

test different prototypes in much

shorter periods of time. These three

factors are the main reasons behind the

boom of deep learning. In the next video,

we will start learning about deep

learning algorithms. We will start with

supervised deep learning algorithms, and

in the next video, we will learn about

convolutional neural networks.

In this video, we will learn about deep

learning algorithms. We will start with

supervised deep learning algorithms, and

in this video, we will learn about

convolutional neural networks.

Convolutional neural networks are very

similar to the neural networks that we

have seen so far in this course. They are

made up of neurons, which need to have

their weights and biases optimized. Each

neuron combines the inputs that it

receives by computing the dot product

between each input and the corresponding

weight before it feeds the resulting

total input into an activation function,

most likely ReLU. So then, what is

different about these networks, and why

are they called convolutional neural

networks?

Well convolutional neural networks, or

CNNs for short, make the explicit

assumption that the inputs are images,

which allows us to incorporate certain

properties into their architecture. These

properties make the forward propagation

step much more efficient and vastly

reduce the number of parameters in the

network. Therefore, CNNs are best for

solving problems related to image

recognition, object detection, and other

computer vision applications. Here is a

typical architecture of a convolutional

neural network. As you can see, the

network consists of a series of

convolutional, ReLU, and pooling layers

as well as a number of fully connected

layers which are necessary before the

output is generated. Now, let's study what

happens in each layer. So far, we have dealt

only with conventional neural networks

that take an (n x 1) vector as their

input. The input to a convolutional

neural network, on the other hand, is

mostly an (n x m x 1) array for grayscale images

or an (n x m x 3) array for colored images,

where the number 3 represents the

red,
green, and blue components of each pixel

in the image. In the convolutional layer,

we basically define filters and we

compute the convolution between the

defined filters and each of the three

images. If we take the red image for

example, let's assume these are the pixel

values. Now for a (2 x 2) filter with

these values, let's create an empty

matrix to save the results of the

convolution process.

We start by sliding the filter over the

image and computing the dot product

between the filter and the overlapping

pixel values and storing the result in

the empty matrix. We repeat this step

moving our filter one cell, or one stride

is the proper terminology, at a time, and

we repeat this until we cover the entire

image and fill the empty matrix. Here, I

just showed one filter and only one of

the three images. The same thing would be

applied to the green and blue images and

you can apply more than one filter. The

more filters we use, the better we are able

to preserve the spatial dimensions.

But one question you must be

asking yourself at this point is, why

would we need to use convolution? Why not

flatten the input image into an (n x m) x 1

vector and use that as our input?


Well, if we do that, we will end up with a

massive number of parameters that will

need to be optimized, and it will be

super computationally expensive. Also,

decreasing the number of parameters

would definitely help in preventing the

model from overfitting the training data.
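To make the sliding-filter computation concrete, here is a small NumPy sketch (not code from the course; the pixel and filter values are made up) of convolving a (2 x 2) filter over a single channel with a stride of 1:

import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the filter over the image, taking the dot product at each position."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    result = np.empty((out_h, out_w))              # the "empty matrix" for the results
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            result[i, j] = np.sum(patch * kernel)  # dot product of filter and overlapping pixels
    return result

# illustrative values only; the pixel and filter values in the video differ
red_channel = np.array([[1, 0, 2, 3],
                        [4, 6, 6, 8],
                        [3, 1, 1, 0],
                        [1, 2, 2, 4]])
kernel = np.array([[1, 0],
                   [0, 1]])

print(convolve2d(red_channel, kernel))             # a 3 x 3 result for a 4 x 4 image and 2 x 2 filter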

It is worth mentioning that a

convolutional layer also consists of

ReLUs, which filter the output of the

convolutional step passing only positive

values and turning any negative values

to 0. The next layer in our

convolutional neural network is the

pooling layer. The pooling layer's main

objective is to reduce the spatial

dimensions of the data propagating

through the network. There are two types

of pooling that are widely used in

convolutional neural networks: max-pooling

and average pooling. In max-pooling,

which is the most common of the

two, for each section of the image we scan,

we keep the highest value, like so.

Here our filter is moving two strides at

a time.

Similarly, with average pooling, we

compute the average of each area we scan.
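Likewise, here is a hedged sketch of max- and average pooling with a (2 x 2) window moving two strides at a time, using made-up values:

import numpy as np

def pool2d(feature_map, size=2, stride=2, mode='max'):
    """Keep the max (or average) of each section the window scans."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    result = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            section = feature_map[i * stride:i * stride + size,
                                  j * stride:j * stride + size]
            result[i, j] = section.max() if mode == 'max' else section.mean()
    return result

feature_map = np.array([[1, 0, 2, 3],       # illustrative values only
                        [4, 6, 6, 8],
                        [3, 1, 1, 0],
                        [1, 2, 2, 4]])

print(pool2d(feature_map, mode='max'))      # max-pooling keeps the highest value of each section
print(pool2d(feature_map, mode='average'))  # average pooling averages the same sections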

In addition to reducing the dimension of

the data, pooling, or max pooling in

particular, provides spatial invariance,


which enables the neural network to

recognize objects in an image even if

the object does not exactly resemble the

original object. Finally, in the fully

connected layer, we flatten the output

of the last convolutional layer and

connect every node of the current layer

with every other node of the next layer.

This layer basically takes as input the

output from the preceding layer, whether

it is a convolutional layer, ReLU, or

pooling layer, and outputs an n-dimensional

vector, where n is the number

of classes pertaining to the problem at

hand. For example, if you are building a

network to classify images of digits, the

dimension n would be 10, since there are

10 digits. You will be covering

convolutional neural networks in much

more detail in the other courses in this

specialization, but this information is

more than enough to give you a general

understanding of convolutional neural

networks. Now let's see how we can use

the Keras library to build a

convolutional neural network.

Training and testing of a

convolutional neural network are the

same as what we have seen so far. So to


begin with, we use the Sequential

constructor to create our model. Then, we

define our input to be the size of the

input images. Assuming the input images

are 128 by 128 color images, we define

the input shape to be a tuple of (128, 128, 3).

Next, we start adding layers to the

network. We start with a convolutional

layer, with 16 filters, each filter being

of size 2 x 2 and sliding through the image

with a stride of magnitude 1 in the

horizontal direction, and of magnitude 1

in the vertical direction. And the layer

uses the ReLU activation function. Then,

we add a pooling layer and we're using

max-pooling here with a filter or

pooling size of 2 and the filter slides

through the image with a stride of

magnitude 2. Next, we add another set of

convolutional and pooling layers. The

only difference here is we are using more

filters in the convolutional layer,

actually twice as many filters as the

first convolutional layer. Finally, we

flatten the output from these layers so

that the data can proceed to the fully

connected layers. We add another dense

layer with 100 nodes and an output layer

that has nodes equal to the number of

classes in the problem at hand. And we

use the softmax activation function in


order to convert the outputs into

probabilities. With this, we conclude this

video on convolutional neural networks.
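The layers just described correspond roughly to the following sketch; this is a hedged reconstruction rather than the exact code shown in the video, and num_classes is an assumed variable:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

num_classes = 10                 # number of classes in the problem at hand (assumed)

model = Sequential()
# 16 filters of size 2 x 2, sliding with a stride of 1 in each direction
model.add(Conv2D(16, kernel_size=(2, 2), strides=(1, 1), activation='relu',
                 input_shape=(128, 128, 3)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

# a second set of convolutional and pooling layers, with twice as many filters
model.add(Conv2D(32, kernel_size=(2, 2), strides=(1, 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

model.add(Flatten())                                   # flatten before the fully connected layers
model.add(Dense(100, activation='relu'))               # dense layer with 100 nodes
model.add(Dense(num_classes, activation='softmax'))    # convert outputs into probabilities

# compiling, training, and prediction follow the same pattern as the earlier models
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])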

In the lab, we will implement a complete

convolutional neural network, where we

will use the Keras library to build the

network, train it, and then validate it. So

make sure to complete this module's lab

on convolutional neural networks.

In the previous video, we learned about convolutional neural networks, which are

supervised deep learning models that have revolutionized the field of

computer vision, especially object detection in images. In this video, we

will learn about another supervised deep learning model, which is the recurrent

neural network. So far, we have seen neural networks and deep learning models

that see datapoints as independent instances. However, let's say you want to

build a model that can analyze scenes in a movie.

Well, you cannot assume that scenes in a movie are independent, and therefore,

traditional deep learning models are not suitable for this application. Recurrent

neural networks overcome this issue. Recurrent neural networks, or RNNs

for short, are networks with loops that don't just take a new input at a time,

but also take in as input the output from the previous data point that was fed

into the network. Accordingly, this is what the architecture of a recurrent neural

network would look like. Essentially, we can start with a normal neural network.

At time t = 0, the network takes in input x0 and outputs a0. Then, at time t = 1,

in addition to the input x1, the network also takes a0 as input,

weighted with weight w0,1, and so on and so forth. As a result, recurrent neural

networks are very good at modelling patterns and sequences of data, such as

texts, genomes, handwriting, and stock markets. These algorithms take time and

sequence into account, which means that they have a temporal dimension. A very

popular type of recurrent neural network is the long short-term memory model or
the LSTM model for short. It has been successfully used for many applications

including image generation, where a model trained on many

images is used to generate novel images. Another application is

handwriting generation, which I described in the welcome video of this course. Also

LSTM models have been successfully used to build algorithms that can

automatically describe images as well as streams of videos. I think this is a good

overview of recurrent neural networks. Given that this is just an introductory

course, I will leave it here. This concludes this video

on recurrent neural networks. I will see you in the next video,

where we will switch to unsupervised deep learning models and talk about

autoencoders.
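Although the course shows no code for recurrent networks, here is a minimal, hedged sketch of the recurrence described above, where each time step combines the new input with the previous output; all weights and inputs are made up for illustration:

import numpy as np

def rnn_step(x_t, a_prev, w_x, w_a, b):
    """One recurrent step: combine the new input with the previous output."""
    return np.tanh(w_x * x_t + w_a * a_prev + b)

w_x, w_a, b = 0.8, 0.5, 0.0      # illustrative scalar weights
inputs = [1.0, 0.2, -0.4]        # a short input sequence x0, x1, x2

a = 0.0                          # there is no previous output at time t = 0
for t, x_t in enumerate(inputs):
    a = rnn_step(x_t, a, w_x, w_a, b)
    print(f"t={t}: a{t} = {a:.3f}")   # this output is fed back in at the next time step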
