
A Guide to Deep Learning and Neural Networks

Article by Yulia Gavrilova


October 8th, 2020
13 min read


As a subset of artificial intelligence, deep learning lies at the heart of various
innovations: self-driving cars, natural language processing, image recognition and so
on. Companies that deliver DL solutions (such as Amazon, Tesla, Salesforce) are at the
forefront of stock markets and attract impressive investments. According to Statista, the
total funding of artificial intelligence startup companies worldwide in 2014–2019 amounted
to more than $26 billion. This high interest can be explained by the amazing benefits of
deep learning and of the architectures behind it: artificial neural networks.

What is deep learning?


Deep learning is a subset of machine learning that uses multi-layered models to
draw conclusions from input data without being given explicit rules.

Deep learning can be supervised, semi-supervised, or unsupervised. It is based
on representation learning: instead of using task-specific algorithms, it learns from
representative examples. For example, if you want to build a model that recognizes cats
by breed, you need to prepare a database that includes a lot of different cat images.

The main architectures of deep learning are:

• Convolutional neural networks
• Recurrent neural networks
• Generative adversarial networks
• Recursive neural networks

We are going to talk about them in more detail later in this text.

Difference between machine learning and deep learning

Machine learning attempts to extract new knowledge from a large set of pre-processed
data loaded into the system. Programmers need to formulate the rules for the machine,
and it learns based on them. Sometimes, a human might intervene to correct its errors.

However, deep learning is a bit different:

Deep learning                                 | Machine learning
large amounts of data                         | small datasets as long as they are high-quality
computation-heavy                             | not always
can draw accurate conclusions from raw data   | carefully pre-processed data
take much longer to train                     | can be trained in a reduced amount of time
you can't know what are the particular        | the logic behind the machine’s decision
features that the neurons represent           | is clear
can be used in unexpected ways                | algorithm is built to solve a specific problem

Advantages of deep learning

Now that you know what the difference between DL and ML is, let us look at some
advantages of deep learning.

• In 2015, a group of Google engineers was conducting research about how neural networks
carry out classification tasks. By chance, they also noticed that neural networks can
hallucinate and produce rather interesting art.
• The ability to identify patterns and anomalies in large volumes of raw data
enables deep learning to efficiently deliver accurate and reliable analysis results to
professionals. For example, Amazon has more than 560 million items on the website
and 300+ million users. No human accountant or even a whole army of accountants
would be able to track that many transactions without an AI tool.
• Deep learning doesn’t rely on human expertise as much as traditional machine
learning. DL allows us to make discoveries in data even when the developers are not
sure what they are trying to find. For example, you want your algorithms to be able
to predict customer retention, but you’re not sure which characteristics of a customer
will enable the system to make this prediction.

Problems of deep learning


• Large amounts of quality data are resource-consuming to collect. For many
years, the largest and best-prepared collection of samples was ImageNet with 14 million
different images and more than 20,000 categories. It was introduced in 2009, and only last
year, Tencent released a database that is larger and more versatile.
• Another difficulty with deep learning technology is that it cannot provide reasons
for its conclusions. Therefore, it is difficult to assess the performance of the model if you
are not aware of what the output is supposed to be. Unlike in traditional machine
learning, you will not be able to test the algorithm and find out why your system decided
that, for example, it is a cat in the picture and not a dog.
• It is very costly to build deep learning algorithms. It is impossible without qualified
staff who are trained to work with sophisticated maths. Moreover, deep learning is a
resource-intensive technology. It requires powerful GPUs and a lot of memory to train
the models. A lot of memory is needed to store input data, weight parameters, and
activations as an input propagates through the network. Sometimes deep
learning algorithms become so power-hungry that researchers prefer to use other
algorithms, even sacrificing the accuracy of predictions.

However, in many cases, deep learning cannot be substituted.

How can you apply DL to real-life problems?

Today, deep learning is applied across different industries for various use cases:

• Speech recognition. All major commercial speech recognition systems (like
Microsoft Cortana, Alexa, Google Assistant, Apple Siri) are based on deep learning.
• Pattern recognition. Pattern recognition systems are already able to give more
accurate results than the human eye in medical diagnosis.
• Natural language processing. Neural networks have been used to implement
language models since the early 2000s. The invention of LSTM helped improve
machine translation and language modeling.
• Discovery of new drugs. For example, the AtomNet neural network has been
used to predict new biomolecules that can potentially cure diseases such as Ebola and
multiple sclerosis.
• Recommender systems. Today, deep learning is being used to study user
preferences across many domains. Netflix is one of the brightest examples in this field.

What are artificial neural networks?

“Artificial neural networks” and “deep learning” are often used interchangeably, which
isn’t really correct. Not all neural networks are “deep”, meaning “with many hidden
layers”, and not all deep learning architectures are neural networks. There are
also deep belief networks, for example.
However, since neural networks are the most hyped algorithms right now and are, in
fact, very useful for solving complex tasks, we are going to talk about them in this post.

Definition of an ANN

An artificial neural network is a computational model loosely inspired by the structure of
the human brain. It consists of neurons and synapses organized into layers.

An ANN can have millions of neurons connected into one system, which makes it extremely
successful at analyzing and even memorizing various information.


Components of Neural Networks

There are different types of neural networks but they always consist of the same
components: neurons, synapses, weights, biases, and functions.

Neurons
A neuron or a node is a basic unit of neural networks that receives information,
performs simple calculations, and passes it further.

All neurons in a net are divided into three groups:

• Input neurons that receive information from the outside world;
• Hidden neurons that process that information;
• Output neurons that produce a conclusion.

In a large neural network with many neurons and connections between them, neurons
are organized in layers. There is an input layer that receives information, a number of
hidden layers, and the output layer that provides valuable results. Every neuron
performs transformation on the input information.

Neurons work best with numbers in a fixed range such as [0,1] or [-1,1]. In order to turn
data into something that a neuron can work with, we need normalization. We talked about
what it is in the post about regression analysis.
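As a rough illustration of the normalization step, here is a minimal sketch of min-max scaling in plain Python. The function name `min_max_normalize` and the sample `ages` list are made up for this example; real projects usually rely on a library scaler instead.

```python
def min_max_normalize(values, low=0.0, high=1.0):
    """Rescale a list of numbers linearly into the range [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values are identical: map them to the middle of the range.
        return [(low + high) / 2 for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


# Hypothetical raw feature: ages of bank clients.
ages = [18, 30, 42, 66]
unit_range = min_max_normalize(ages)           # scaled into [0, 1]
signed_range = min_max_normalize(ages, -1, 1)  # scaled into [-1, 1]
```

The smallest value maps to the lower bound, the largest to the upper bound, and everything else lands proportionally in between.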

Wait, but how do neurons communicate? Through synapses.

Synapses and weights

A synapse is what connects the neurons like an electricity cable. Every synapse has a
weight, and each weight scales the information that passes through its synapse. The output
of a neuron connected through a greater weight will be dominant in the next neuron, while
information arriving through smaller weights will contribute less. One can say that the
matrix of weights governs the whole neural system.

How do you know which neuron has the biggest weight? During the initialization (first
launch of the NN), the weights are randomly assigned, and then you have to optimize
them through training.

Bias

A bias is an extra constant term that is added to a neuron's weighted input. It lets the
neuron shift its output independently of the inputs, which adds a richer representation of
the input space to the model’s weights.

In the case of neural networks, a bias neuron is added to every layer. It plays a vital role
by making it possible to move the activation function to the left or right on the graph.

It is true that ANNs can work without bias neurons. However, they are almost always
added and counted as an indispensable part of the overall model.
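To make the "move the activation function to the left or right" idea concrete, here is a small sketch with a single input, a single weight, and a sigmoid activation. The specific numbers are arbitrary assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def activation(x, weight, bias):
    """Output of one neuron with a single input."""
    return sigmoid(weight * x + bias)


weight = 2.0
# Without a bias, the sigmoid's midpoint (output 0.5) sits at x = 0.
no_bias = activation(0.0, weight, bias=0.0)
# A bias of -4 moves the midpoint to x = -bias / weight = 2.
shifted = activation(0.0, weight, bias=-4.0)
at_new_midpoint = activation(2.0, weight, bias=-4.0)
```

With the bias in place, the same input `x = 0` now produces an output close to 0, and the curve reaches 0.5 only at `x = 2`: the whole activation curve has moved to the right.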

How ANNs work

Every neuron processes input data to extract a feature. Let’s imagine that we have
three features and three neurons, each of which is connected with all these features.

Each of the neurons has its own weights that are used to weight the features. During
the training of the network, you need to select weights for each of the neurons such that
the output provided by the whole network matches the expected results.
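The three-features, three-neurons setup described above can be sketched as follows. The weights here are hand-picked, hypothetical values (in a real network they would be learned), and the activation function is left out to keep the weighting step visible.

```python
def weighted_sum(features, weights, bias=0.0):
    """Weight each feature and add everything up."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def layer(features, weight_rows):
    """Apply every neuron (one row of weights each) to the same features."""
    return [weighted_sum(features, row) for row in weight_rows]


features = [1.0, 2.0, 3.0]
weight_rows = [
    [1.0, 0.0, 0.0],   # neuron 1 only looks at the first feature
    [0.5, 0.5, 0.0],   # neuron 2 mixes the first two features
    [0.0, -1.0, 1.0],  # neuron 3 compares the third feature with the second
]
outputs = layer(features, weight_rows)
print(outputs)
```

Each neuron sees the same three features but, because its weights differ, extracts a different signal from them.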

To perform transformations and get an output, every neuron has an activation function.
Together, these functions perform a transformation that can be described by a common
function F, the formula behind the NN’s magic.

There are a lot of activation functions. The most common ones are linear, sigmoid, and
hyperbolic tangent. Their main difference is the range of values they output.
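A quick way to see how the three activation functions named above differ is to evaluate them on the same inputs. This is only an illustrative sketch using Python's standard `math` module; the sample inputs are arbitrary.

```python
import math


def linear(x):
    return x                            # unbounded output


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # output in (0, 1)


def tanh(x):
    return math.tanh(x)                 # output in (-1, 1)


samples = [-5.0, -1.0, 0.0, 1.0, 5.0]
for name, fn in [("linear", linear), ("sigmoid", sigmoid), ("tanh", tanh)]:
    print(name, [round(fn(x), 3) for x in samples])
```

The linear function passes values through unchanged, the sigmoid squeezes them between 0 and 1, and the hyperbolic tangent squeezes them between -1 and 1.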

How do you train an algorithm?

Neural networks are trained like any other algorithm. You want to get some results and
provide information to the network to learn from. For example, we want our neural
network to distinguish between photos of cats and dogs and provide plenty of examples.

Delta is the difference between the expected output and the actual output of the neural
network. We use calculus magic (the gradient of the error with respect to each weight) and
repeatedly adjust the weights of the network until the delta is close to zero. Once it is,
our model is able to correctly predict our example data.
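The loop of "measure the delta, nudge the weights, repeat" can be sketched on the smallest possible model: a single linear neuron with one weight and one bias. This is a toy under stated assumptions (made-up data, a fixed learning rate, plain gradient descent), not the procedure used for a full deep network.

```python
def train(examples, learning_rate=0.1, epochs=1000):
    """Fit output = w * x + b by gradient descent on the squared delta."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, expected in examples:
            output = w * x + b
            delta = expected - output
            # The gradient of 0.5 * delta**2 is -delta * x for w and -delta for b,
            # so stepping against it means adding learning_rate * delta * (x or 1).
            w += learning_rate * delta * x
            b += learning_rate * delta
    return w, b


# Hypothetical training data generated by the rule y = 2x + 1.
examples = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
w, b = train(examples)
largest_delta = max(abs(y - (w * x + b)) for x, y in examples)
```

After training, `w` and `b` end up very close to 2 and 1, and the largest remaining delta on the example data is practically zero.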

Iteration

This is a kind of counter that increases every time the neural network goes through one
batch of training examples. In other words, this is the total number of batches processed
by the neural network.

Epoch

The epoch counter increases each time we go through the entire training set. More epochs
generally improve the training of the model, up to the point where it starts to overfit the
training data.

Batch
Batch size is equal to the number of training examples in one forward/backward pass.
The higher the batch size, the more memory space you’ll need.

What is the difference between an iteration and an epoch?

• one epoch is one forward pass and one backward pass of all the training
examples;
• number of iterations is a number of passes, each pass using [batch size] number
of examples. To be clear, one pass equals one forward pass + one backward pass (we
do not count the forward pass and backward pass as two different passes).
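The relationship between batch size, iterations, and epochs is simple arithmetic. The numbers below (1,000 examples, batches of 100, 5 epochs) are assumptions picked for illustration.

```python
import math


def iterations_per_epoch(num_examples, batch_size):
    """Number of forward/backward passes needed to see every example once."""
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(num_examples / batch_size)


num_examples = 1000
batch_size = 100
epochs = 5

per_epoch = iterations_per_epoch(num_examples, batch_size)
total_iterations = per_epoch * epochs
```

With these values, one epoch takes 10 iterations, so 5 epochs add up to 50 iterations in total.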

And what about errors?

Error is a deviation that reflects the discrepancy between expected and received output.
The error should generally become smaller from epoch to epoch. If it does not, something
is probably wrong with the training setup.

The error can be calculated in different ways, but we will consider only two main ways:
Arctan and Mean Squared Error.

There is no restriction on which one to use and you are free to choose whichever
method gives you the best results. But each method counts errors in different ways:

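• With Arctan, each deviation is squashed by the arctangent before it is squared, so
large deviations count for less and the error never exceeds the MSE computed on the
same data.

$\frac{\arctan^2(i_1-a_1)+\dots+\arctan^2(i_n-a_n)}{n}$

• MSE is more balanced and is used more often.

$\frac{(i_1-a_1)^2+(i_2-a_2)^2+\dots+(i_n-a_n)^2}{n}$

Here $i_k$ stands for the expected (ideal) output and $a_k$ for the actual output of the
$k$-th example.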
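Both formulas translate directly into a few lines of Python. In this sketch, `ideal` and `actual` are hypothetical lists of expected and received outputs, matching the symbols i and a in the formulas.

```python
import math


def mse(ideal, actual):
    """Mean of the squared differences between expected and actual outputs."""
    return sum((i - a) ** 2 for i, a in zip(ideal, actual)) / len(ideal)


def arctan_error(ideal, actual):
    """Same idea, but each difference goes through arctan before squaring."""
    return sum(math.atan(i - a) ** 2 for i, a in zip(ideal, actual)) / len(ideal)


ideal = [1.0, 0.0, 1.0]
actual = [0.8, 0.3, 0.4]

print("MSE:   ", mse(ideal, actual))
print("Arctan:", arctan_error(ideal, actual))
```

Printing both values for your own predictions is a quick way to see how the two measures compare on the same data.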

What kinds of neural networks exist?


There are so many different neural networks out there that it is simply impossible to
mention them all. If you want to learn more about this variety, visit the neural network
zoo where you can see them all represented graphically.

Feed-forward neural networks

This is the simplest type of neural network. A feed-forward network doesn’t have any
memory: information flows in one direction only, from input to output, and there is no
going back. In many tasks, this approach is not very applicable. For example, when we work
with text, the words form a certain sequence, and we want the machine to understand it.

Feedforward neural networks can be applied in supervised learning when the data that
you work with is not sequential or time-dependent. You can also use it if you don’t know
how the output should be structured but want to build a relatively fast and easy NN.

Recurrent neural networks

A recurrent neural network can process sequential data such as texts, videos, or sets of
images. It keeps an internal state that carries the results of previous steps forward, and
it can use that information to make better decisions about the current input.

Recurrent neural networks are widely used in natural language processing and speech
recognition.
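The "memory" of a recurrent network can be sketched with a single recurrent neuron whose output is fed back in at the next step. The weights below are arbitrary assumptions, and real recurrent networks (LSTMs, for instance) are considerably more elaborate.

```python
import math


def rnn_step(x, hidden, w_input, w_hidden, bias):
    """One recurrent step: mix the new input with the remembered state."""
    return math.tanh(w_input * x + w_hidden * hidden + bias)


def run_sequence(inputs, w_input=0.5, w_hidden=0.8, bias=0.0):
    hidden = 0.0  # empty memory before the first element
    states = []
    for x in inputs:
        hidden = rnn_step(x, hidden, w_input, w_hidden, bias)
        states.append(hidden)
    return states


# Two sequences that differ only in their first element.
states_a = run_sequence([1.0, 0.0, 0.0])
states_b = run_sequence([0.0, 0.0, 0.0])
```

Although the last two inputs of both sequences are identical, the final state of `states_a` is still non-zero: the network "remembers" that it saw a 1 at the start.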
Convolutional neural networks

Convolutional neural networks are the standard of today’s deep machine learning,
especially for image-related problems. Convolutional neural networks can be either
feed-forward or recurrent.

Let’s see how they work. Imagine we have an image of Albert Einstein. We can connect every
neuron to all pixels in the input image.

But there is a big problem here. Firstly, if you connect each neuron to all pixels,
you will get a lot of weights, so it will be a very computationally intensive operation
and take a very long time. Secondly, there will be so many weights that this method will be
very prone to overfitting: it will predict everything well on the training examples but
work badly on other images.

Therefore, programmers came up with a different architecture where each of the
neurons is connected only to a small square in the image. All these neurons will have
the same weights, and this design is called image convolution. We can say that we
have transformed the picture, walked through it with a filter simplifying the process.
Fewer weights, faster to count, less prone to overfitting.

For an awesome explanation of how convolutional neural networks work, watch this
video by Luis Serrano.
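The shared-weights idea described above can be sketched in plain Python: one small filter (kernel) slides over the image, and the same four weights are reused at every position. The tiny 4x4 "image" and the edge-detecting kernel are made-up examples. Strictly speaking, the operation below is cross-correlation, which is what deep learning libraries usually call convolution.

```python
def convolve2d(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            # Each output value only looks at a small k x k square of the image.
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
```

The resulting 3x3 feature map is non-zero only in the middle column, where the edge is. It took 4 shared weights to compute, whereas connecting each of the 9 outputs to all 16 pixels would have required 144.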

Generative adversarial neural networks

A generative adversarial network is an unsupervised machine learning algorithm that is
a combination of two neural networks, one of which (network G) generates samples and
the other (network A) tries to distinguish genuine samples from the fake ones. Since the
networks have opposite goals, to create samples and to reject samples, they start an
antagonistic game that turns out to be quite effective.

GANs are used, for example, to generate photographs that are perceived by the human
eye as natural images or deepfakes (videos where real people say and do things they
have never done in real life).
What kind of problems do NNs solve?

Neural networks are used to solve complex problems that require analytical calculations
similar to those of the human brain. The most common uses for neural networks are:

• Classification. NNs label the data into classes by implicitly analyzing its
parameters. For example, a neural network can analyze the parameters of a bank client
such as age, solvency, credit history and decide whether to loan them money.
• Prediction. The algorithm has the ability to make predictions. For example, it can
foresee the rise or fall of a stock based on the situation in the stock market.
• Recognition. This is currently the widest application of neural networks. For
example, a security system can use face recognition to only let authorized people into
the building.

Summary

Deep learning and neural networks are useful technologies that expand human
intelligence and skills. Neural networks are just one type of deep learning architecture.
However, they have become widely known because NNs can effectively solve a huge
variety of tasks and cope with them better than other algorithms.

If you want to learn more about applications of machine learning in real life and
business, continue reading our blog:

• In our post about best ML applications, you can discover the most stunning use
cases of machine learning algorithms.
• Read this Medium post if you want to learn more about GPT-3 and creative
computers.
• If you want to know how to choose ML techniques for your project, you will find
the answer in our blog post.
