
Unit 5

 Deep generative models:

Deep generative models are neural networks with many hidden layers trained to approximate
complicated, high-dimensional probability distributions. In short, the ambitious goal in DGM
training is to learn an unknown or intractable probability distribution from a typically small
number of independent and identically distributed samples. When trained successfully, we can use
the DGM to estimate the likelihood of a given sample and to create new samples that are similar
to samples from the unknown distribution. These problems have been at the core of probability
and statistics for decades but remain computationally challenging to solve, particularly in high
dimensions. Despite many recent advances and success stories, several open challenges remain in
the field of generative modeling. This section focuses on three key mathematical challenges.

1. DGM training is an ill-posed problem since uniquely identifying a probability distribution from
a finite number of samples is impossible. Hence, the performance of the DGM will depend
heavily on so-called hyperparameters, which include the design of the network, the choice of
training objective, regularization, and training algorithms.

2. Training the generator requires a way to quantify its samples’ similarity to those from the
intractable distribution. In the approaches considered here, this either requires the inversion of the
generator or comparing the distribution of generated samples to the given dataset. Both of these
avenues have their distinct challenges. Inverting the generator is complicated in most cases,
particularly when it is modeled by a neural network that is nonlinear by design. Quantifying the
similarity of two probability distributions from samples leads to two-sample test problems, which
are especially difficult without prior assumptions on the distributions.

3. Most common approaches for training DGMs assume that we can approximate the intractable
distribution by transforming a known and much simpler probability distribution (for instance, a
Gaussian) in a latent space of known dimension. In most practical cases, determining the latent
space dimension is impossible and is left as a hyperparameter that the user needs to choose. This
choice is both difficult and important. With an overly conservative estimate, the generator may
not approximate the data well enough, and an overestimate can render the generator non-injective,
which complicates the training.
Deep generative models are neural network models that can replicate the data distribution that you
give it. This allows you to generate “fake-but-realistic” data points from real data points.

There are two major families of generative models: Variational Autoencoders and Generative
Adversarial Networks.

Variational Autoencoders
Autoencoders (without the “Variational”) take in data, transform it through a hidden layer, and
attempt to replicate the data. Essentially, an autoencoder is a neural network that replicates exactly
what you put in. This seems really simple, but there is a catch: the hidden layer has fewer neurons
than the input layer. This means the network can’t just copy and paste: it needs to “compress” the
input data into fewer nodes, and then reconstruct it. Normally, we aren’t interested in the
reconstruction itself—it is simply a way for us to track the progress of the network, and a tractable
way of giving it feedback (we can’t tell how close the network is to the right answer from the
hidden, latent nodes, but we can if it reconstructs the image). The idea is that, if the network can
successfully replicate the image (roughly), then we know that the hidden layer has properly
compressed it. This matters because the hidden layer must then capture only the important features
of the image, and so autoencoders are really useful for feature extraction, or as a form of
pre-training.
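To make this concrete, here is a minimal sketch of a vanilla autoencoder in PyTorch. The layer sizes (784 inputs, a 32-unit bottleneck) are illustrative assumptions, not from the text; the point is only that the hidden layer is smaller than the input and the reconstruction error is the feedback signal.

```python
# A minimal autoencoder sketch (assumed sizes: 784-dim inputs, 32-unit bottleneck).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        # Encoder: compress the input into fewer nodes than the input layer.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # Decoder: reconstruct the input from the compressed code.
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)                      # a batch of flattened 28x28 images
loss = nn.functional.mse_loss(model(x), x)   # reconstruction error as feedback
loss.backward()                              # gradients for a training step
```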

Variational autoencoders are basically the same thing, but with a probabilistic twist. They take the
input, but the hidden layer generates the parameters of a distribution—for example, a mean. So if
the latent distribution is a Gaussian, the hidden layer outputs represent the parameters of that
Gaussian. We then train the hidden layer to keep this distribution as close as possible to a unit
Gaussian (mean 0, variance 1). This is good because we now know the exact distribution that the
hidden layer will follow (a Gaussian), so if we want to generate our own images, we just sample
from the Gaussian distribution and feed the sample to the output layer.
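A sketch of the probabilistic twist follows. One assumption to flag: the standard VAE encoder outputs both a mean and a log-variance (slightly more general than the mean-only description above), and a KL term pulls the latent code toward a unit Gaussian so that new samples can be drawn from N(0, I) at generation time.

```python
# A minimal VAE sketch (assumed standard form: mean + log-variance heads).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=8):
        super().__init__()
        self.enc_mu = nn.Linear(input_dim, latent_dim)      # mean head
        self.enc_logvar = nn.Linear(input_dim, latent_dim)  # log-variance head
        self.decoder = nn.Sequential(nn.Linear(latent_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        # Reparameterization: sample z while keeping gradients flowing.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

vae = VAE()
x = torch.rand(16, 784)
recon, mu, logvar = vae(x)
# KL divergence to a unit Gaussian keeps the latent distribution known.
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, x) + kl
```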

Generative Adversarial Networks

This is really cool. A GAN actually consists of two networks: a generator and a discriminator. We
start with a data distribution, and a random number generator. We feed the generator network
random numbers, and the network maps those numbers to data (or images). The discriminator
then takes the generator-made data point and an actual data point from the data distribution, and
outputs, for each, the probability that it is real.

So, the goal of the generator network is to make the discriminator network output a 1 when fed the
generator’s image (this means that the discriminator believes the generator’s image is real, which
is good for the generator), and the goal of the discriminator is to output a 0 for the generator’s
image, and a 1 for the real image.

Once we train the networks like this for a long time, the generator starts to generate images
similar to the real data distribution. A perfect imitation is when the generator outputs images so
real, that the discriminator always outputs a 0.5 as the probability, because the chance that the
image is real is 50/50 (this is the optimal distribution for the discriminator network).

 Restricted Boltzmann machines:

It is a network of neurons in which all the neurons are connected to each other. In this machine,
there are two layers, named the visible (or input) layer and the hidden layer. The visible layer is
denoted as v and the hidden layer as h. In a Boltzmann machine, there is no output layer.
Boltzmann machines are stochastic and generative neural networks capable of learning internal
representations and are able to represent and (given enough time) solve tough combinatoric
problems.
The name comes from the Boltzmann distribution (also known as the Gibbs distribution), which is
an integral part of statistical mechanics and explains the impact of parameters like entropy and
temperature on quantum states in thermodynamics. For this reason, such networks are also known
as Energy-Based Models (EBMs). The Boltzmann machine was invented in 1985 by Geoffrey
Hinton, then a professor at Carnegie Mellon University, and Terry Sejnowski, then a professor at
Johns Hopkins University.

 Restricted Boltzmann Machines (RBM)


The term “restricted” refers to the fact that we are not allowed to connect neurons of the same
layer to each other. In other words, two neurons of the input layer or of the hidden layer cannot
connect to each other, although the hidden layer and visible layer can be connected to each other.

As there is no output layer in this machine, the question arises how we are going to adjust the
weights and how to measure whether our prediction is accurate or not. Both questions are
answered by the way the Restricted Boltzmann Machine is trained, described below.
The RBM training algorithm popularized by Geoffrey Hinton learns a probability distribution
over its training data inputs. RBMs have seen wide application in different areas of supervised
and unsupervised machine learning, such as feature learning, dimensionality reduction,
classification, collaborative filtering, and topic modeling.
Consider the movie-rating example discussed in the recommender-system section.

Movies like Avengers, Avatar, and Interstellar have strong associations with a latent fantasy and
science-fiction factor. Based on user ratings, the RBM will discover latent factors that can explain
the activation of movie choices. In short, the RBM describes the variability among the correlated
variables of the input dataset in terms of a potentially lower number of unobserved variables.
The energy function is given by

$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$

where $a_i$ and $b_j$ are the biases of the visible and hidden units and $w_{ij}$ are the weights
connecting them.
Restricted Boltzmann Machines working
In RBM there are two phases through which the entire RBM works:

1st Phase: In this phase, we take the input layer and, using the weights and biases, activate the
hidden layer. This process is called the Feed Forward Pass. In the Feed Forward Pass we identify
the positive and negative associations.

Feed Forward Equation:

● Positive Association — When the association between the visible unit and the hidden
unit is positive.
● Negative Association — When the association between the visible unit and the hidden
unit is negative.

2nd Phase: As we don’t have an output layer, instead of calculating an output we reconstruct the
input layer through the activated hidden state. This process is called the Feed Backward Pass: we
are simply backtracking from the activated hidden neurons to the input layer. After performing
this we have a reconstructed input, so we can calculate the error and adjust the weights in this
way:

Feed Backward Equation:

● Error = Reconstructed Input Layer − Actual Input Layer


● Weight adjustment = Input × Error × Learning rate (e.g., 0.1)
After doing all the steps we get the pattern that is responsible for activating the hidden neurons.
To understand how this works:
Let us consider an example in which we assume that visible unit V1 activates hidden units h1 and
h2, and visible unit V2 activates h2 and h3. Now suppose a new visible unit V5 comes into the
machine and also activates h1 and h2. We can then backtrace through the hidden units and
identify that the characteristics of the new V5 match those of V1, because V1 activated the same
hidden units earlier.
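These two phases can be sketched in a few lines of NumPy. This is an illustrative CD-1 (one-step contrastive divergence) update for a binary RBM with assumed layer sizes; biases are omitted for brevity, and the 0.1 learning rate matches the text.

```python
# One contrastive-divergence (CD-1) step for a binary RBM (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(0, 0.1, (n_visible, n_hidden))    # visible-hidden weights only
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

v0 = rng.integers(0, 2, n_visible).astype(float) # an input (visible) vector

# 1st Phase (feed-forward): activate the hidden layer from the visible layer.
h0 = sigmoid(v0 @ W)
# 2nd Phase (feed-backward): reconstruct the input through the hidden state.
v1 = sigmoid(W @ h0)
h1 = sigmoid(v1 @ W)

# Positive association minus negative association gives the weight adjustment.
W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
error = np.sum((v1 - v0) ** 2)  # reconstructed input minus actual input
```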

 Autoregressive Model

Future prediction often requires a technical base. In the practical world, analysts predict future
values based on the past values of a commodity or a trend in the market. A statistical model is
termed autoregressive if it is capable of predicting future values from a series of factual past
values.

Therefore, autoregression (AR) is a time series model. The autoregressive model is meant to
predict future values based on values from past events. It uses input data from observations of
previous steps and, using a regression equation, predicts the value at the next time step. This
model can produce accurate forecasts on a range of time series problems.

It commonly relies on correlations (serial correlation) between the values in a given time series
and the values that precede and succeed them. The hypothesis that past values affect current
values makes this statistical technique useful for analyzing natural processes such as weather,
finance and economics, and other processes that vary over time.
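As a concrete sketch (an assumed AR(2) model on synthetic data, not an example from the text), the regression equation $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t$ can be fitted by ordinary least squares:

```python
# Fit an AR(2) model by least squares on a simulated series (illustrative).
import numpy as np

rng = np.random.default_rng(1)
# Simulate a series that truly follows x_t = 0.6*x_{t-1} + 0.3*x_{t-2} + noise.
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + rng.normal(0, 0.1)

# Build the regression: past values (lags 1 and 2) predict the next step.
X = np.column_stack([x[1:-1], x[:-2]])
y = x[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead forecast from the last two observed values.
forecast = coef @ np.array([x[-1], x[-2]])
print(coef, forecast)
```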

Features of Autoregressive Model


 Autoregressive models help predict future values based on past values.
 Autoregressive models are used in technical analysis to forecast future trends.
 Autoregressive models are based on the theory that the future will resemble the past.
 Time series data are data collected on the same observational unit at multiple periods.

Examples of Autoregressive Models

• Fully Visible Sigmoid Belief Network (FVSBN)


• Neural Autoregressive Density Estimation (NADE)
• Masked Autoencoder for Distribution Estimation (MADE)
• PixelRNN, PixelCNN, WaveNet….

 Neural Autoregressive Density Estimation (NADE)

Estimating the probability density of high-dimensional, complex data distributions is a
long-standing challenge in many different fields, including machine learning. Neural
Autoregressive Density Estimation (NADE) is a method that estimates data distributions by
modeling each term in the probability chain rule with a parameterized function:

$p(x) = \prod_{d=1}^{D} p(x_d \mid x_{1:d-1})$   (1)

The indices in (1) represent an arbitrary ordering of the dimensions, i.e., $x_1$ does not need to be
the first dimension of the data. In this approach, we decompose the joint density into
one-dimensional conditional densities, where each conditional is a function of all the previous
dimensions, e.g., a univariate Gaussian density whose mean and variance are computed by deep
neural networks (DNNs).

Here $x_{1:0}$ is an empty sequence, so the first conditional $p(x_1 \mid x_{1:0})$ reduces to the
unconditional density $p(x_1)$.


The main contribution of NADE is its architecture, which heavily ties the parameters of the
conditionals together, resulting in better learning and sample efficiency. The parameters, denoted
by the vector θ, are learned from data samples using Maximum Likelihood Estimation (MLE) or
by maximizing the Evidence Lower BOund (ELBO) for Variational Inference (VI). Unfortunately,
likelihood evaluation in NADE from (1) is O(D), which is slow in high dimensions.
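A toy NumPy sketch of this tied-parameter evaluation for binary data is shown below (illustrative sizes; a real NADE trains all of W, V, b, c by gradient-based MLE). The shared weight matrix W feeds every conditional, and the hidden activation is updated incrementally, one dimension at a time, giving the sequential chain-rule evaluation noted above.

```python
# NADE-style log-likelihood for a binary vector with tied parameters.
import numpy as np

rng = np.random.default_rng(2)
D, H = 5, 4                                  # data and hidden dimensions
W = rng.normal(0, 0.1, (H, D))               # shared input-to-hidden weights
V = rng.normal(0, 0.1, (D, H))               # hidden-to-output weights
b, c = np.zeros(D), np.zeros(H)              # output and hidden biases
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

x = rng.integers(0, 2, D).astype(float)
log_lik, a = 0.0, c.copy()                   # 'a' accumulates columns of W
for d in range(D):
    h = sigmoid(a)                           # hidden state for conditional d
    p = sigmoid(b[d] + V[d] @ h)             # p(x_d = 1 | x_{<d})
    log_lik += x[d] * np.log(p) + (1 - x[d]) * np.log(1 - p)
    a += W[:, d] * x[d]                      # incremental update: ties parameters
print(log_lik)
```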

 Masked Auto-encoder for Density Estimation


Masked Auto-encoder for Density Estimation (MADE) is a popular architecture which
implements the autoregressive dependencies through the connections between layers of a DNN.
In MADE, an auto-encoder architecture is converted into an autoregressive model by applying a
mask that cuts, in each layer, the connections linking higher indices of the ordering to the
conditionals of lower indices (Figure 2). Although the forward pass is still O(D), we enjoy
efficient parameter and computation sharing and a speed boost from computing all the
conditionals in parallel with a single forward pass of the network [2].

Figure (2): Masking connections in a 3-layer auto-encoder to construct an autoregressive model.


The ordering indices correspond to the input node numbers. To ensure the autoregressive
property, the nodes in each layer are randomly indexed, and the connections coming from larger
indices in the previous layer are removed from the fully connected layers. The output nodes
evaluate the parameters of the conditionals, which are then used to evaluate the joint likelihood
of $x_{1:D}$ [2].
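A sketch of the mask construction follows (NumPy, illustrative sizes; a real MADE stacks more layers and may re-sample orderings during training). Each hidden unit is given a random index, and a connection survives only if it cannot leak information from an input to its own or an earlier conditional.

```python
# MADE-style masks for one hidden layer (illustrative sizes).
import numpy as np

rng = np.random.default_rng(3)
D, H = 4, 6
m_in = np.arange(1, D + 1)                 # input/output nodes indexed 1..D
m_hid = rng.integers(1, D, H)              # random hidden indices in 1..D-1

# Hidden unit k may see input d only if m_hid[k] >= d.
mask_in_to_hid = (m_hid[:, None] >= m_in[None, :]).astype(float)   # (H, D)
# Output d may depend only on hidden units with index strictly below d,
# so output d sees only inputs with indices < d (autoregressive property).
mask_hid_to_out = (m_in[:, None] > m_hid[None, :]).astype(float)   # (D, H)

# Apply the masks to ordinary fully connected weight matrices.
W1 = rng.normal(size=(H, D)) * mask_in_to_hid
W2 = rng.normal(size=(D, H)) * mask_hid_to_out
```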
 PixelRNN

PixelRNN belongs to the class of autoregressive models that are tractable and scalable. An
effective approach to modeling images this way is to use probabilistic density models (like a
Gaussian or Normal distribution) to quantify the pixels of an image as a product of conditional
distributions. This turns the modeling problem into a sequence problem, wherein the next pixel
value is determined by all the previously generated pixel values.

To capture these non-linear and long-term dependencies between pixel values, we need an
expressive sequence model like a Recurrent Neural Network (RNN). RNNs have been shown to
be extremely effective at handling sequence problems.

The network scans the image one row at a time and one pixel at a time within each row. It then
predicts conditional distributions over the possible pixel values. The distribution of image pixels
is written as a product of conditional distributions, and the parameters are shared across all pixels
of the image.

The objective here is to assign a probability p(x) to every (n × n) image x. This can be done by
writing the probability of the image as a product over its pixels $x_i$:

$p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})$

Four different architectures can be used by PixelRNN, namely: Row LSTM, Diagonal BiLSTM, a
fully convolutional network, and a Multi-Scale network.
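The raster-scan idea can be sketched with a plain LSTM (PyTorch; a toy stand-in with assumed sizes, not the actual Row LSTM or Diagonal BiLSTM architectures). Pixels are read in order and the network outputs a 256-way conditional distribution for each next pixel.

```python
# Toy autoregressive pixel model: an LSTM over a flattened image.
import torch
import torch.nn as nn

n, levels, hidden = 8, 256, 64                  # 8x8 image, 256 pixel values
embed = nn.Embedding(levels, hidden)
lstm = nn.LSTM(hidden, hidden, batch_first=True)
head = nn.Linear(hidden, levels)

x = torch.randint(0, levels, (1, n * n))        # one image in raster order
inputs = x[:, :-1]                              # pixels x_1 .. x_{D-1}
targets = x[:, 1:]                              # predict x_2 .. x_D
# (The unconditional distribution of the first pixel is omitted here.)
h, _ = lstm(embed(inputs))
logits = head(h)                                # conditional over the next pixel
loss = nn.functional.cross_entropy(
    logits.reshape(-1, levels), targets.reshape(-1)
)
```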

 Generative Adversarial Network (GAN)

Generative Adversarial Networks (GANs) are a powerful class of neural networks used for
unsupervised learning. They were developed and introduced by Ian J. Goodfellow in 2014. GANs
are basically a system of two neural network models which compete with each other and are able
to analyze, capture, and copy the variations within a dataset.
Working of GANs
Generative Adversarial Networks (GANs) can be broken down into three parts:
Generative: To learn a generative model, which describes how data is generated in terms of a
probabilistic model.
Adversarial: The training of a model is done in an adversarial setting.
Networks: Use deep neural networks as the artificial intelligence (AI) algorithms for training
purposes.

In GANs, there is a generator and a discriminator. The Generator generates fake samples of
data(be it an image, audio, etc.) and tries to fool the Discriminator. The Discriminator, on the
other hand, tries to distinguish between the real and fake samples. The Generator and the
Discriminator are both Neural Networks and they both run in competition with each other in the
training phase. The steps are repeated several times and in this, the Generator and Discriminator
get better and better in their respective jobs after each repetition. The working can be visualized
by the diagram given below:

Here, the generative model captures the distribution of data and is trained in such a manner that it
tries to maximize the probability of the Discriminator in making a mistake. The Discriminator, on
the other hand, is based on a model that estimates the probability that the sample that it got is
received from the training data and not from the Generator.

The GAN is formulated as a minimax game, where the Discriminator is trying to maximize its
reward V(D, G) and the Generator is trying to minimize the Discriminator’s reward or, in other
words, maximize the Discriminator’s loss. It can be described mathematically by the formula
below:

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$

where,
G = Generator
D = Discriminator
Pdata(x) = distribution of real data
P(z) = distribution of the noise fed to the Generator
x = sample from Pdata(x)
z = sample from P(z)
D(x) = probability the Discriminator assigns to x being real
G(z) = Generator’s output for noise z
So, basically, training a GAN has two parts:

Part 1: The Discriminator is trained while the Generator is idle. In this phase, the Generator is
only forward propagated and no back-propagation through it is done. The Discriminator is trained
on real data for n epochs to see if it can correctly predict them as real. In the same phase, the
Discriminator is also trained on the fake data generated by the Generator, to see if it can correctly
predict them as fake.

Part 2: The Generator is trained while the Discriminator is idle. After the Discriminator has been
trained on the Generator’s fake data, we can get its predictions and use them to train the
Generator, improving on its previous state in trying to fool the Discriminator.
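A compact sketch of this two-part loop in PyTorch is shown below (illustrative MLP generator and discriminator with assumed sizes; one discriminator step and one generator step per iteration rather than n epochs each).

```python
# One iteration of the two-part GAN training loop (illustrative sizes).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(32, 784)                 # stand-in for a batch of real data

# Part 1: train the Discriminator (Generator idle).
z = torch.randn(32, 16)
fake = G(z).detach()                       # detach: no gradient flows to G here
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Part 2: train the Generator (Discriminator idle).
z = torch.randn(32, 16)
loss_g = bce(D(G(z)), torch.ones(32, 1))   # push D to output "real" for fakes
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```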
Some of the advantages of Auto-regressive models over GANs are:

1. They provide a way to calculate likelihood: these models have the advantage of returning
explicit probability densities (unlike GANs), making them straightforward to apply in domains
such as compression and probabilistic planning and exploration.

2. The training is more stable than for GANs: training a GAN requires finding the Nash
equilibrium of a game. Since no algorithm is known that does this reliably, training a GAN is
unstable compared to PixelRNN or PixelCNN.

3. They work for both discrete and continuous data: it is hard for a GAN to learn to generate
discrete data, like text.
GANs, for their part, are known to produce higher-quality images and are faster to train. Efforts
are being made to incorporate the advantages of both classes in a single model, but this is still an
open research area.

 Application Of Deep Learning In Object Detection:

Object detection is the procedure of determining the instance of the class to which an object
belongs and estimating its location by outputting a bounding box around the object. Detecting a
single instance of a class in an image is called single-class object detection, whereas detecting the
classes of all objects present in the image is known as multi-class object detection. Different
challenges, such as partial or full occlusion, varying illumination conditions, poses, and scale,
need to be handled while performing object detection. As shown in Figure 3, object detection is
the foremost step in any visual recognition activity.
Deep CNNs have been extensively used for object detection. A CNN is a type of feed-forward
neural network that works on the principle of weight sharing. Convolution is an integral
expressing how much one function overlaps with another as one is shifted over the other: a blend
of two functions being multiplied. Fig. 4 shows the layered architecture of a CNN for object
detection. The image is convolved with learned filters, followed by an activation function, to
produce feature maps. To reduce the spatial complexity of the network, the feature maps are
passed through pooling layers to obtain abstracted feature maps. This process is repeated for the
desired number of filters, and the corresponding feature maps are created. Eventually, these
feature maps are processed with fully connected layers to produce the image-recognition output:
confidence scores for the predicted class labels. To manage the complexity of the network and
reduce the number of parameters, CNNs employ different kinds of pooling layers, as shown in
Table 1. Pooling layers are translation-invariant. Activation maps are fed as input to the pooling
layers, which operate on each patch of the selected map.
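A minimal sketch of this convolution → pooling → fully connected pipeline in PyTorch (illustrative filter counts, input size, and class count, not from the text):

```python
# Conv -> pool -> conv -> pool -> fully connected, ending in class scores.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution -> feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling abstracts feature maps
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # fully connected -> class scores
)

x = torch.rand(1, 3, 32, 32)                     # one 32x32 RGB image
scores = model(x)                                # confidence per class label
```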
Frameworks and Services of Object Detection
The list of deep learning frameworks available to date is exhaustive. Some significant deep
learning frameworks are listed in Table 2. The frameworks are compared in terms of the features
exhibited, the interface, support for deep learning models (convolutional neural network,
recurrent neural network (RNN), Restricted Boltzmann Machine (RBM), and Deep Belief
Network (DBN)), support for multi-node parallel execution, the developer of the framework, and
the license. Table 3 shows the list of services which can be used for object detection.
 Application Of Deep Learning In Image Recognition:

Deep learning is widely used in image recognition due to its advantages, such as strong
feature-extraction ability and high recognition accuracy. Compared with some standard networks,
such as RNNs, Convolutional Neural Networks (CNNs) have a noticeable effect on image
recognition. The convolutional neural network is a kind of deep learning model specially
designed for image classification and recognition, developed on the basis of multi-layer neural
networks. A typical deep learning model is demonstrated below.

A typical deep learning model is a multi-layer neural network consisting of an input layer and an
output layer, with several hidden layers in the middle. There are several neurons in each layer,
and each neuron is connected to the next layer with a specific weight as a parameter. For each
neuron in the hidden layers, a specific bias is added for further adjustment. In a general
recognition problem, the image data is passed through the input layer and, after a series of
multiplications and additions, the final result is returned in the output layer. In the image
recognition problem, each neuron of the input layer may represent a feature of the input image.
However, this plain neural network model has several problems for image recognition. One is
that the spatial structure of the image is not considered, so recognition performance is limited.
Second, the neurons in every two adjacent layers are fully connected, so training is slow due to
the excess of parameters. Unlike plain deep learning models, convolutional neural networks can
solve these problems. They use a unique structure for data analysis and can be trained more
quickly than a conventional deep learning model. Because of this speed, it is possible to have a
large number of network layers, which gives a significant advantage in recognition accuracy.
There are three basic concepts in a convolutional neural network: local receptive fields, shared
weights, and pooling. Each idea is demonstrated below for the case of image recognition.

The most critical development of deep learning in image recognition is the image classification
task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Convolutional
networks are widely used for ImageNet classification. In many cases, results obtained by
traditional machine learning methods have large errors, and the error rate on the test set is
relatively high. Deep learning can effectively address these problems. Two popular convolutional
network models that achieved good rankings in ILSVRC are AlexNet and VGG.

 Application of Deep Learning in Speech Recognition:


Speech recognition is the ability of a machine or program to identify words and phrases in spoken
language and convert them to a machine-readable format. Many speech recognition applications,
such as voice dialing, simple data entry and speech-to-text are in existence today.

Deep learning is becoming a mainstream technology for speech recognition and has successfully
replaced Gaussian mixtures for speech recognition and feature coding at an increasingly larger
scale.

Conventional speech recognition systems utilize Gaussian mixture model (GMM) based hidden
Markov models (HMMs) to represent the sequential structure of speech signals. HMMs are used
in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a
short-time stationary signal. In a short time-scale, speech can be approximated as a stationary
process. Speech can be thought of as a Markov model for many stochastic purposes. Typically,
each HMM state uses a mixture of Gaussians to model a spectral representation of the sound
wave. HMM-based speech recognition systems can be trained automatically and are simple and
computationally feasible to use. However, one of the main drawbacks of Gaussian mixture models
is that they are statistically inefficient for modeling data that lie on or near a non-linear manifold
in the data space. Neural networks trained by back-propagation error derivatives emerged as an
attractive acoustic modeling approach for speech recognition in the late 1980s. In contrast to
HMMs, neural networks make no assumptions about feature statistical properties. When used to
estimate the probabilities of a speech feature segment, neural networks allow discriminative
training in a natural and efficient manner. However, in spite of their effectiveness in classifying
short-time units such as individual phones and isolated words, neural networks were rarely
successful at continuous recognition tasks, largely because of their limited ability to model
temporal dependencies. Thus, one alternative approach is to use neural networks as a
pre-processing step, e.g., feature transformation or dimensionality reduction, for HMM-based
recognition. Various neural network models, such as deep neural networks, RNNs, and LSTMs,
are used in speech recognition.

HMM (Hidden Markov Model): This is the most widely used method for recognizing patterns in
speech. It is more reliable and possesses a more secure mathematical foundation than the
template-based and knowledge-based approaches. In this method, the system being modeled is
assumed to be a Markov process with hidden states. The speech is divided into smaller sound
units, each of which represents a state. In a simple Markov model, the states are clearly visible to
the observer, and thus the state transition probabilities are the only parameters. In a hidden
Markov model, on the other hand, the state is not directly visible, but the output, which depends
on the state, is evident. HMMs are specifically known for their application in reinforcement
learning and pattern recognition tasks such as speech, handwriting, and bioinformatics.
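As a sketch of the underlying computation, the forward algorithm below (NumPy, a toy two-state HMM with discrete emissions; real speech recognizers use Gaussian-mixture emissions over spectral features) sums over all hidden state paths to score an observation sequence.

```python
# Forward algorithm for a toy 2-state HMM with discrete observations.
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])    # state transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])    # emission probabilities per state
pi = np.array([0.5, 0.5])                  # initial state distribution
obs = [0, 1, 1, 0]                         # observed symbol sequence

alpha = pi * B[:, obs[0]]                  # initialize with the first symbol
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]          # propagate belief one time step
print(alpha.sum())                         # P(observation sequence | model)
```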
 Application Of Deep Learning In Video Analysis:
Video scene analysis is the automatic process of recognizing humans and objects from live video
sequences. In the last several decades, the computer vision and artificial intelligence areas have
been active research domains for the development of automatic applications. In particular, the
area of video analysis covers human action recognition, activity classification, scene
interpretation, and video description or captioning.
Deep learning is a subfield of machine learning and an emerging approach in the domain of video
scene analysis. In particular, deep learning algorithms have many variants for representing visual
features, such as the convolutional neural network (CNN), recurrent neural network (RNN), deep
belief network (DBN), restricted Boltzmann machine (RBM), and autoencoder. This removes the
need for the handcrafted feature approaches that regular methods require for action
representation. Unlike the traditional handcrafted approach, it uses a trainable feature extractor
followed by a trainable classifier, introducing the idea of end-to-end learning. In practice, this
learning-based representation technique employs computational models with many hidden layers
to represent numerous levels of abstraction.
Learning in a deep neural network is built through a group of techniques that take the data in raw
form and automatically convert it into an appropriate representation. This procedure is performed
through a multi-layer architecture. In the first layer, groups of pixels are extracted from each
image. In the second layer, these pixels are gathered into parts by identifying specific edges in the
image. The third layer combines the parts into small segments. Finally, the subsequent layers
assemble these segments into recognizable objects. These layers are learned from the raw data
using a general learning procedure that does not need to be designed manually by experts.
The main objective of deep learning is to extract information from large-scale data (e.g., images
and videos) through deep architectures with numerous hidden layers. With this type of technique,
it is easier to attain good results than by using raw pixel values or hand-crafted features. The key
reason is that deep learning is capable of extracting diverse levels of abstraction from the
observed data.

 Interesting Deep Learning Applications for NLP:


The field of language modeling is rapidly shifting from statistical language modeling to deep
learning methods and neural networks. This is because DL models and methods have delivered
superior performance on complex NLP tasks. Thus, deep learning models seem like a good
approach for accomplishing NLP tasks that require a deep understanding of the text, namely text
classification, machine translation, question answering, summarization, and natural language
inference, among others.
1. Tokenization and Text Classification
Tokenization involves chopping text into pieces (or tokens) that machines can comprehend.
English-language documents are easy to tokenize, as they have clear spaces between words and
paragraphs. However, many other languages present novel challenges. For instance, logographic
scripts such as the Chinese characters used for Cantonese and Mandarin, or Japanese kanji, can be
challenging, as they have no spaces between words or even sentences.
But all languages follow certain rules and patterns, and through deep learning we can train
models to perform tokenization. Therefore, most AI and deep learning courses encourage aspiring
DL professionals to experiment with training DL models to identify and understand these
patterns in text.
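As a toy illustration (plain Python, not a learned model), a rule-based tokenizer for a spaced language can be a single regular expression; for space-free scripts no such simple rule exists, which is why learned subword tokenizers are used instead.

```python
# A naive rule-based tokenizer: words and punctuation as separate tokens.
import re

text = "Deep learning models can't read raw strings."
tokens = re.findall(r"\w+|[^\w\s]", text)
print(tokens)  # ['Deep', 'learning', 'models', 'can', "'", 't', ...]
```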
Also, DL models can classify and predict the theme of a document. For instance, deep
convolutional neural networks (CNNs) and recurrent neural networks (RNNs) can automatically
classify the tone and sentiment of a source text using word embeddings that capture the vector
representation of words. Most social media platforms deploy CNN- and RNN-based analysis
systems to flag and identify spam content on their platforms. Text classification is also applied in
web searching, language identification, and readability assessment.

2. Generating Captions for Images


Automatically describing the content of an image using natural sentences is a challenging task.
The caption should not only name the objects contained in the image but also express how they
are related to each other, along with their attributes (a visual recognition model). This semantic
knowledge then has to be expressed in natural language, which requires a language model too.
Aligning the visual and semantic elements is core to generating perfect image captions. DL
models can help automatically describe the content of an image using correct English sentences.
This can help visually impaired people easily access online content.

Google’s Neural Image Caption Generator (NIC) is based on a network consisting of a vision
CNN followed by a language-generating RNN. The model automatically views images and
generates descriptions in plain English.


3. Speech Recognition
DL is being increasingly used to build and train neural networks to transcribe audio inputs and
perform complex vocabulary speech recognition and separation tasks. In fact, these models and
methods are used in signal processing, phonetics, and word recognition, the core areas of speech
recognition.
For instance, DL models can be trained to attribute each voice to the corresponding speaker and
answer each of the speakers separately. Further, CNN-based speech recognition systems can
translate raw speech into a text message that offers interesting insights pertaining to the speaker.
4. Machine Translation
Machine translation (MT) is a core task in natural language processing that investigates the use of
computers to translate languages without human intervention. Only recently have deep learning
models begun to be used for neural machine translation. Unlike traditional MT, deep neural
networks (DNNs) offer accurate translation and better performance. RNNs, feed-forward neural
networks (FNNs), recursive auto-encoders (RAEs), and long short-term memory (LSTM)
networks are used to train the machine to convert sentences from the source language to the
target language with accuracy.
Suitable DNN solutions are used for processes such as word alignment, reordering rules,
language modeling, and joint translation prediction to translate sentences without using a large
database of rules.
5. Question Answering (QA)
Question answering systems try to answer a query that is put across in the form of a question. So,
definition questions, biographical questions, and multilingual questions among other types of
questions asked in natural languages are answered by such systems.
Creating a fully functional question answering system has been one of the popular challenges
faced by researchers in the DL segment. Though deep learning algorithms have made decent
progress in text and image classification in the past, they were not able to solve tasks that involve
logical reasoning (like the question answering problem). However, in recent times, deep learning
models are improving the performance and accuracy of these QA systems.
Recurrent neural network models, for instance, are able to correctly answer paragraph-length
questions where traditional approaches fail. More importantly, the DL model is trained in such a
way that there’s no need to build the system using linguistic knowledge like creating a semantic
parser.
6. Document Summarization
The increasing volume of data available today is making the role of document summarization
critical. The latest advances in sequence-to-sequence models have made it easy for DL experts to
develop good text summarization models. The two types of document summarization, namely
extractive and abstractive summarization, can be achieved through the sequence-to-sequence
model with attention. Refer to the diagram below from the Pointer Generator blog by Abigail See.
Here, the encoder RNN reads the source text, producing a sequence of encoder hidden states.
Next, the decoder RNN receives the previous word of the summary as input and uses it to update
the decoder hidden state. An attention distribution over the encoder states then yields a context
vector. Finally, the context vector and the decoder hidden state together produce the output word.
This sequence-to-sequence model, where the decoder is able to freely generate words in any
order, is a powerful solution to abstractive summarization.
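One decoding step of this encoder-decoder-with-attention setup can be sketched as follows (PyTorch, illustrative sizes; a real summarizer adds word embeddings, training, beam search, and, for pointer-generator models, a copy mechanism).

```python
# One attention-based decoding step of a seq2seq summarizer (illustrative).
import torch
import torch.nn as nn

hidden, vocab = 32, 1000
encoder = nn.LSTM(hidden, hidden, batch_first=True)
decoder_cell = nn.LSTMCell(hidden, hidden)
out_proj = nn.Linear(2 * hidden, vocab)

src = torch.rand(1, 12, hidden)                 # encoded source text (12 steps)
enc_states, _ = encoder(src)                    # sequence of encoder hidden states

prev_word = torch.rand(1, hidden)               # embedding of previous summary word
h, c = torch.zeros(1, hidden), torch.zeros(1, hidden)
h, c = decoder_cell(prev_word, (h, c))          # update the decoder hidden state

# Dot-product attention: weight encoder states by relevance to the decoder state.
attn = torch.softmax(enc_states @ h.unsqueeze(-1), dim=1)   # (1, 12, 1)
context = (attn * enc_states).sum(dim=1)                    # context vector

# Context vector + decoder state together produce the next-word distribution.
logits = out_proj(torch.cat([context, h], dim=-1))
```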

 Application Of Deep Learning In Medical Science:

Medical Image
The medical image plays a key role in medical diagnosis and treatment, providing an important
basis for understanding a patient’s disease and helping physicians make decisions. As medical
devices become more advanced and the medical and health field grows rapidly, more and more
medical image data are generated, such as magnetic resonance imaging (MRI), computed
tomography (CT), and so on.
Applying artificial intelligence to these images has become a trend in medical research in recent
years. Researchers have used artificial intelligence methods to help physicians make accurate
diagnoses and decisions. Many tasks are involved, such as detecting retinopathy, estimating bone
age, and identifying skin cancer. Deep learning achieves expert level in these tasks. The
convolutional neural network is a powerful deep learning method; it follows the principles of
translational invariance and parameter sharing, which makes it very suitable for automatically
extracting image features from the original image.
Electronic Health Record
One-dimensional convolutional neural networks, recurrent neural networks, LSTMs, GRUs, and
other deep learning networks have been widely used in the natural language processing
community and have achieved great success. These networks are very suitable for processing
sequence data, such as sentences, voice, and time series. Similarly, natural language processing
technology is used in the field of computational medicine, where these neural networks process
electronic medical records.

Drug Development
In recent years, with the rapid growth of biomedical data, deep learning technology has become a
new method in drug development. The application of deep learning in the field of drug
development can help researchers effectively carry out drug development and disease treatment
research and greatly promote the development of precision medicine.
Deep Learning Research on Longitudinal Datasets
In this section, we introduce the application of deep learning in longitudinal datasets. Longitudinal
data track and record the patient’s long-term condition. Using longitudinal datasets, we can
perform tasks such as predicting disease-related risks, predicting the trajectory of relevant
biomarkers at different disease development stages, or conducting survival analysis. Some
longitudinal data studies, such as Framingham Heart Study (Araki et al., 2016) or the UK Cystic
Fibrosis Registry (Taylor-Robinson et al., 2018), provide useful datasets for longitudinal data
research.
The patient’s medical history may contain some information about the future disease, so it is very
useful to study the history and infer the future disease development from the past information. It
requires doctors to make a prediction as soon as possible so that doctors can take measures to
prevent disease onset or deterioration. However, using longitudinal data for research also poses
difficulties. For example, when predicting a disease trajectory, a patient’s disease status may
develop slowly, which increases the difficulty of the research; a patient with a chronic disease
such as diabetes may present different conditions over time. How to apply deep learning
appropriately to longitudinal datasets has become a research direction.
Genomics
Genomics studies the function, structure, editing, and expression of genes. Because of deep
learning’s powerful ability to process data and extract features automatically, many researchers
have applied it to the field of genomics to discover deeper patterns.
Compared with traditional machine learning methods, deep learning methods can extract
higher-dimensional features, richer information, and more complex structure from biological
data. In recent years, deep learning has been widely used in genomics for tasks such as gene
expression, gene splicing, and RNA measurement. Deep learning brings new methods to
bioinformatics and helps to further the understanding of the principles of human diseases.
