AI Chapter 4
Institute of Technology
University of Gondar
Biomedical Engineering Department
Outline:
» Deep learning
Deep Learning
» Machine learning is using algorithms to extract information from raw data and represent it in some
type of model
» The model is then used to infer things about other data we have not yet modeled.
» A computer learns something about the structures that represent the information in the raw data.
» Structural descriptions are another term for the models we build to contain the information extracted
from the raw data, and we can use those structures or models to predict unknown data.
» Structural descriptions (or models) can take many forms, including the following:
o Decision trees
o Linear regression
o Neural network weights
Deep Learning(DL)
» Deep learning is a subset of the field of machine learning, which is a subfield of AI.
» Deep learning is an area of machine learning that emerged from the intersection of neural networks,
artificial intelligence, graphical modeling, optimization, pattern recognition and signal processing.
» Deep learning is about supervised or unsupervised learning from data using multiple layered machine
learning models.
» The development of deep learning was motivated in part by the failure of traditional algorithms to
generalize well on such AI tasks
» Deep multi-layer neural networks contain many levels of nonlinearities which allow them to compactly
represent highly non-linear and/ or highly-varying functions.
Deep Learning(DL)
» The power of deep learning models comes from their ability to classify or predict nonlinear
data using a modest number of parallel nonlinear steps.
» A deep learning model learns the input data features hierarchy all the way from raw data input
to the actual classification of the data.
» Each layer extracts features from the output of the previous layer.
Why deep learning? Why now?
o Datasets
o Algorithmic advances
» Machine learning isn’t mathematics or physics, where major advances can be made
with a pen and a piece of paper. It’s an engineering science.
What makes deep learning different?
» A feedforward network defines a mapping y = f (x; θ) and learns the value of the
parameters θ that result in the best function approximation.
Deep Feedforward Networks
» For example, we may have three functions f(1), f(2) and f(3) connected in a
chain to form f(x) = f(3)(f(2)(f(1)(x))).
» 𝑓 (1) is called the first layer of the network, 𝑓 (2) is the second layer and so on.
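As a minimal sketch of such a chain (the layer functions and weights below are made-up illustrative values, not from the text):

```python
import numpy as np

# Toy 3-layer chain f(x) = f3(f2(f1(x))); the weights are invented
# for illustration only.
def f1(x):
    return np.maximum(0, 2.0 * x + 1.0)   # first layer: affine map + ReLU

def f2(x):
    return np.maximum(0, 0.5 * x - 1.0)   # second layer

def f3(x):
    return 3.0 * x                        # final layer: linear output

def f(x):
    return f3(f2(f1(x)))                  # the composed network

print(f(1.0))  # f1(1)=3.0, f2(3.0)=0.5, f3(0.5)=1.5
```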
» To extend linear models to represent nonlinear functions of x, we can apply the linear
model not to x itself but to a transformed input φ(x), where φ is a nonlinear
transformation.
» y = f(x; θ, w) = φ(x; θ)ᵀw.
» We now have parameters θ that we use to learn φ from a broad class of functions, and
parameters w that map from φ(x) to the desired output.
» Parameter learning means finding a set of values for the weights of all
layers in a network.
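A hypothetical one-hidden-layer network makes this concrete: φ(x; θ) is a ReLU hidden layer with parameters θ, and w is the linear readout (all sizes and values below are assumptions for illustration):

```python
import numpy as np

# Hypothetical one-hidden-layer network: phi(x; theta) is the learned
# nonlinear feature map, w maps the features to the output.
rng = np.random.default_rng(0)
theta_W = rng.normal(size=(4, 3))     # theta: hidden weights (4 units, 3 inputs)
theta_b = np.zeros(4)                 # theta: hidden biases
w = rng.normal(size=4)                # output weights

def phi(x):
    # nonlinear transformation of the input, phi(x; theta)
    return np.maximum(0, theta_W @ x + theta_b)

def predict(x):
    # y = f(x; theta, w) = phi(x; theta)^T w
    return phi(x) @ w

x = np.array([1.0, -2.0, 0.5])
print(predict(x))                     # a scalar prediction
```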
Understanding how deep learning works
The loss function takes the predictions of the network and the true target (what function you wanted
the network to output) and computes a distance score, capturing how well the network has done on
this specific example. A loss function measures the quality of the network’s output.
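For instance, mean squared error is one such distance score (a minimal sketch; the toy predictions and targets are invented):

```python
import numpy as np

# Mean squared error: a simple loss measuring how far the network's
# predictions are from the true targets.
def mse_loss(predictions, targets):
    return np.mean((predictions - targets) ** 2)

y_pred = np.array([0.9, 0.2, 0.8])   # invented network outputs
y_true = np.array([1.0, 0.0, 1.0])   # invented true targets
print(mse_loss(y_pred, y_true))      # small score -> predictions close to targets
```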
Understanding how deep learning works
» Repeating the training loop a sufficient number of times yields weight values that minimize the loss
function
» Once the network architecture is defined, you still have to choose two more things:
o Loss function (objective function)—The quantity that will be minimized during training.
o Optimizer—Determines how the network will be updated based on the loss function. It
implements a specific variant of stochastic gradient descent (SGD)
» Choosing the right objective function for the right problem is extremely important
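The two choices can be sketched together in a toy training loop: the loss is mean squared error and the optimizer is plain stochastic gradient descent on one example at a time (the data, learning rate, and step count are assumptions for illustration):

```python
import numpy as np

# Fit y = w*x + b to toy data with single-example SGD on the MSE loss.
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 3.0 * x + 1.0                  # true relation (noiseless, for clarity)

w, b, lr = 0.0, 0.0, 0.1           # initial weights and learning rate
for step in range(500):
    i = rng.integers(len(x))       # pick one training example (stochastic)
    err = (w * x[i] + b) - y[i]    # gradient of 0.5*err**2 w.r.t. the prediction
    w -= lr * err * x[i]           # descend along the gradient for w
    b -= lr * err                  # ... and for b

print(round(w, 2), round(b, 2))    # moves toward w = 3.0, b = 1.0
```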
Convolutional Neural Networks
o Speech recognition
o Translation
o Self-driving etc.
» Convolution leverages three important ideas that can help improve a machine learning
system:
o Sparse interactions
o parameter sharing and
o equivariant representation
» Moreover, convolution provides a means for working with inputs of variable size.
Sparse interactions:
» For example:
o when processing an image, the input image might have thousands or millions of pixels, but
we can detect small, meaningful features such as edges with kernels that occupy only tens or
hundreds of pixels.
o This means that we need to store fewer parameters, which both reduces the memory
requirements of the model and improves its statistical efficiency
CNN: Motivation
» If there are m inputs and n outputs, then matrix multiplication requires m × n
parameters, and the algorithms used in practice have O(m × n) runtime (per example).
» If we limit the number of connections each output may have to k, then the sparsely
connected approach requires only k × n parameters and O(k × n) runtime.
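Plugging in example sizes shows the scale of the saving (the particular m, n, and k below are assumptions):

```python
# Parameter-count comparison: dense vs. sparse connectivity.
m, n, k = 1_000_000, 1_000, 9   # e.g. a megapixel input, 1000 outputs, a 3x3 kernel

dense_params = m * n            # fully connected: m x n parameters
sparse_params = k * n           # each output connects to only k inputs

print(dense_params)             # 1000000000
print(sparse_params)            # 9000
```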
Parameter sharing:
» It refers to using the same parameter for more than one function in a
model.
» In the case of convolution, the particular form of parameter sharing causes the
layer to have a property called equivariance to translation.
» Equivariant means that if the input changes, the output changes in the same way.
» In the case of convolution, if we let g be any function that translates the input,
i.e., shifts it, then the convolution function is equivariant to g.
CNN: Motivation
» Let g be a function mapping one image function to another image function, such that
I′ = g(I) is the image function with I′(x, y) = I(x − 1, y).
» If we apply this transformation to I and then apply convolution, the result will be the
same as if we applied convolution to I, then applied the transformation g to the
output.
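This equivariance can be checked numerically. The sketch below uses a 1-D circular convolution and circular shifts (a toy signal and kernel of my own choosing) so the comparison is exact at the boundaries:

```python
import numpy as np

# Equivariance to translation: shifting the input and then convolving
# gives the same result as convolving first and then shifting the output.
def circ_conv(signal, kernel):
    n = len(signal)
    return np.array([sum(kernel[j] * signal[(i - j) % n]
                         for j in range(len(kernel)))
                     for i in range(n)])

I = np.array([0., 1., 4., 2., 0., 3.])   # toy 1-D "image"
K = np.array([1., -1.])                  # simple edge-detecting kernel

shift_then_conv = circ_conv(np.roll(I, 1), K)
conv_then_shift = np.roll(circ_conv(I, K), 1)
print(np.allclose(shift_then_conv, conv_then_shift))  # True
```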
CNN: Motivation
» For example, when processing images, it is useful to detect edges in the first
layer of a convolutional network.
» This implies that the inputs of hidden units in layer l belong to a subset of units in
layer l-1 which have “spatially contiguous” receptive fields.
» Each unit does not respond to the variations which are outer to its receptive field
with respect to the input.
» This ensures that the learnt “filters” result in the strongest response to a spatially
local input pattern.
The Convolution Operation
If we have an n × n image matrix and an f × f filter, the general formula for the dimension
of the output image matrix is (n − f + 1) × (n − f + 1)
The Convolution Operation
For example:
» We will consider a 6 × 6 matrix and apply a 3 × 3 filter, which is a hyperparameter, over the
matrix and interpret the results.
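A minimal sketch of this computation in Python (numpy, the toy image values, and the vertical-edge filter are my assumptions; like most deep-learning libraries, it applies the filter as cross-correlation rather than flipped convolution):

```python
import numpy as np

# Valid convolution of a 6x6 input with a 3x3 filter: the output is
# (n - f + 1) x (n - f + 1) = 4 x 4, matching the formula above.
def conv2d_valid(image, kernel):
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # elementwise product of the kernel with each f x f window
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1., 0., -1.]] * 3)             # vertical-edge filter
print(conv2d_valid(image, kernel).shape)           # (4, 4)
```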
» In the 1st stage, the layer performs several convolutions in parallel to produce a set of
linear activations.
» In the 2nd stage, each linear activation is run through a nonlinear activation function,
such as the rectified linear activation function. This stage is sometimes called the
detector stage.
» In the 3rd stage, we use a pooling function to modify the output of the layer further.
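The three stages can be sketched on a 1-D signal (the signal, kernel, and pooling width below are invented for illustration):

```python
import numpy as np

# The three stages of a convolutional layer on a 1-D signal:
# (1) convolution, (2) detector (ReLU), (3) max pooling.
def conv1d_valid(x, k):
    return np.array([np.dot(x[i:i+len(k)], k)
                     for i in range(len(x) - len(k) + 1)])

x = np.array([0., 2., 1., 5., 0., 1., 3., 2., 4.])  # toy input signal
k = np.array([1., -1.])                             # toy kernel

a = conv1d_valid(x, k)                   # stage 1: linear activations
h = np.maximum(0, a)                     # stage 2: rectified linear (detector)
pooled = h.reshape(-1, 2).max(axis=1)    # stage 3: max pooling, width 2
print(pooled)                            # [1. 5. 0. 1.]
```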
Pooling
» A pooling function replaces the output of the net at a certain location with a summary
statistic of the nearby outputs.
» For example, the max pooling operation reports the maximum output within a rectangular
neighborhood.
» In all cases, pooling helps make the representation approximately invariant to
small translations of the input.
» Invariance to translation means that if we translate the input by a small amount, the values
of most of the pooled outputs do not change.
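A small numeric sketch of this invariance (toy detector outputs, max pooling over windows of width 3):

```python
import numpy as np

# Max pooling invariance: translate the detector outputs by one position
# and most of the pooled values stay the same.
def max_pool(x, width=3, stride=1):
    return np.array([x[i:i+width].max()
                     for i in range(0, len(x) - width + 1, stride)])

h = np.array([0., 0., 7., 0., 0., 5., 0., 0.])   # toy detector outputs
h_shift = np.roll(h, 1)                          # input translated by one step

print(max_pool(h))        # [7. 7. 7. 5. 5. 5.]
print(max_pool(h_shift))  # most pooled values unchanged
```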
Pooling
» The pooling (POOL) layer reduces the dimensions of the input, reduces computation,
and makes feature detectors more invariant (invariance implies that we can recognize an
object even when its appearance varies) to their position in the input.
» ‘Translation invariance’ means that the system produces exactly the same response,
regardless of how the input is translated.
» For example, a dog detector might report “dog identified” wherever the dog appears in the input image.
Pooling
o Average-pooling layer: slides the filter over the input with a stride s and stores the average
value of the region of the input covered by the filter in the output.
[Figure: a typical CNN architecture — the input image passes through alternating Convolution and Max Pooling layers (which can repeat many times), is then flattened, and fed into a fully connected feedforward network that outputs class scores such as “cat” and “dog”.]
Sequence Modeling: Recurrent and Recursive Nets
» Parameter sharing makes it possible to extend and apply the model to examples of
different forms (different lengths, here) and generalize across them.
» RNNs work on sequences of data, as in natural language processing (e.g. deciding whether a given
sentence is positive or negative) and time-series data (e.g. sales forecasting).
Sequence Modeling: Recurrent and Recursive Nets
o Each member of the output is a function of the previous members of the output.
o Each member of the output is produced using the same update rule applied to the previous
outputs.
» We can view RNNs as operating on a sequence that contains vectors x(t), with the time step index t
ranging from 1 to τ.
Unfolding Computational Graphs
» The idea is to transform a recurrent computation into an unfolded computational graph that
has a repetitive structure (i.e. a chain of events).
» Unfolding this graph results in the sharing of parameters across a deep network structure.
Unfolding Computational Graphs
» For example:
» For a finite number of time steps τ, the graph can be unfolded by applying
the definition τ − 1 times. For example, with τ = 3:
s(3) = f(s(2); θ) = f(f(s(1); θ); θ)
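Numerically, unfolding just means applying the same update function repeatedly (the state-update rule and values below are made up for illustration):

```python
# Unfolding s(t) = f(s(t-1); theta) for tau = 3 time steps:
# s(3) = f(s(2); theta) = f(f(s(1); theta); theta).
theta = 0.5

def f(s, theta):
    return theta * s + 1.0   # an invented state-update function

s1 = 4.0
s2 = f(s1, theta)            # t = 2
s3 = f(s2, theta)            # t = 3: same f and same theta at every step

print(s3)                    # equals f(f(s1, theta), theta)
```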
Unfolding Computational Graphs
» Each node represents the state at some time t and the function f maps the
state at t to the state at t + 1.
» The same parameters (the same value of θ used to parametrize f) are used for
all time steps.
Unfolding Computational Graphs
» Many recurrent neural networks use the following equation to define the values of their
hidden units:
h(t) = f(h(t−1), x(t); θ),
where we see that the state now contains information about the whole past sequence.
» The network typically adds extra architecture, such as output layers that read information
out of the state h to make predictions.
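A minimal forward pass implementing this update (the weight shapes, the tanh choice of f, and the toy inputs are assumptions):

```python
import numpy as np

# Minimal RNN forward pass: the same parameters (W, U, b) are reused at
# every time step, so h accumulates information about the whole past.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 3))   # hidden-to-hidden weights
U = rng.normal(scale=0.5, size=(3, 2))   # input-to-hidden weights
b = np.zeros(3)                          # hidden bias

def step(h_prev, x_t):
    # h(t) = f(h(t-1), x(t); theta), with f chosen here as tanh(affine)
    return np.tanh(W @ h_prev + U @ x_t + b)

xs = [np.array([1.0, 0.0]),              # a toy input sequence, tau = 3
      np.array([0.0, 1.0]),
      np.array([1.0, 1.0])]
h = np.zeros(3)                          # h(0)
for x_t in xs:                           # unfold over the sequence
    h = step(h, x_t)
print(h.shape)                           # (3,)
```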
Unfolding Computational Graphs
» When the recurrent network is trained to perform a task that requires predicting the future
from the past, the network typically learns to use h(t) as a lossy summary of the task-relevant
aspects of the past sequence of inputs up to t.
» In the left diagram, the black square indicates a delay of 1 time step.
» The right diagram shows the same network as an unfolded computational graph, where each node
is now associated with one particular time instance.
» We call unfolding the operation that maps a circuit, as in the left side
of the figure, to a computational graph with repeated pieces, as in the right side.
» The unfolded graph now has a size that depends on the sequence length.
Unfolding Computational Graphs
» We can represent the unfolded recurrence after t steps with a function g(t):
h(t) = g(t)(x(t), x(t−1), …, x(2), x(1)) = f(h(t−1), x(t); θ)
» The unfolded graph also helps to illustrate the idea of information flow forward in
time (computing outputs and losses) and backward in time (computing gradients)
by explicitly showing the path along which this information flows
Recurrent Neural Networks
» The computational graph to compute the training loss of a recurrent network that
maps an input sequence of x values to a corresponding sequence of output o values.
» A loss L measures how far each o is from the corresponding training target y.
Recurrent networks that produce an output at each time step and have recurrent
connections between hidden units.
Recurrent Neural Networks
» On the left, the RNN and its loss are drawn with recurrent connections.
» Keras is a deep-learning framework that provides a convenient way to define and train
almost any kind of deep-learning model.
» Keras was initially developed for researchers, with the aim of enabling fast experimentation.
o It has a user-friendly API that makes it easy to quickly prototype deep-learning models.
o It has built-in support for convolutional networks (for computer vision), recurrent networks (for
sequence processing), and any combination of both.
Introduction to Keras
» Keras is used at Google, Netflix, Uber, CERN, Yelp, Square, and hundreds of startups
working on a wide range of problems
» The TensorFlow backend is the default for most of your deep-learning needs
o It is the most widely adopted
» Via TensorFlow (or Theano, or CNTK), Keras is able to run seamlessly on both CPUs and
GPUs.
» When running on CPU, TensorFlow is itself wrapping a low-level library for tensor
operations called Eigen.
» To get started with Keras, you need to install the keras R package, the core Keras
library, as well as a backend tensor engine (e.g. TensorFlow):
o install.packages("keras") # install the keras R package
o library(keras)
o install_keras()
Developing with Keras: a quick overview
» You’ve already seen one example of a Keras model: the MNIST example.
The typical Keras workflow looks just like that example:
o Define your training data: input tensors and target tensors.
o Define a network of layers (or model) that maps your inputs to your targets.
1. Develop an ECG signal feature extractor using deep learning (the ECG features
will be P-peak, P-duration, Q-peak, R-peak, S-peak, QRS-duration, T-peak and
T-duration)