Linköpings universitet
SE–581 83 Linköping
+46 13 28 10 00, www.liu.se
Copyright
The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.
Abstract
Computational power is constantly on the rise and opens up new possibilities in many areas. Two areas that have made great progress thanks to this development are control theory and artificial intelligence, with machine learning as the most prominent subfield of the latter. The difference between an environment controlled by classic control theory and one controlled by machine learning is that the machine learning model adapts in order to achieve a goal, while the classic model needs preset parameters. This supposedly makes the machine learning model better suited for an environment that changes over time. In this paper the theory is tested on a model of an inverted pendulum. Three different machine learning algorithms are compared to a classic model based on control theory. Changes are then made to the model and the adaptability of the machine learning algorithms is tested. As a result, one of the algorithms was able to mimic the classic model, although with varying accuracy. When changes were made to the environments, the results showed that only one of the algorithms was able to adapt and achieve balance.
Acknowledgments
We would like to express our very great appreciation to our examiner Peter Fritzson and our supervisor Lennart Ochel.
Contents
Abstract iii
Acknowledgments iv
Contents v
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Delimitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Theory 3
2.1 Control theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Open-loop and closed-loop control systems . . . . . . . . . . . . . . . . . 4
2.1.2 Controllers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Physics of the pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Equations of motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Artificial intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.1 Regression Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.2 Linear regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5.3 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.4 Backpropagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.5 Activation functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.6 Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Modelica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7 FMI - Functional Mock-up Interface . . . . . . . . . . . . . . . . . . . . . . . . 17
2.8 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Method 18
3.1 Virtual model of pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Implementation in OpenModelica . . . . . . . . . . . . . . . . . . . . . . 18
3.1.2 Altering of the environments . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 PID Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Choosing algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.4 Training data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Implementing the learning algorithms . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5.1 Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.5.2 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5.3 Regression Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6 Selection of hyperparameters for the algorithms . . . . . . . . . . . . . . . . . 26
3.7 Comparing the algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Results 27
4.1 Base pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Data from altered environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2.1 Altered environment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2.2 Altered environment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2.3 Altered environment 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.4 Altered environment 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.5 Altered environment 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Discussion 33
5.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.1 Source criticism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.3 The work in a wider context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6 Conclusion 36
6.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Bibliography 37
List of Figures
2.1 A schematic of a simple feedback control system with exogenous signals. . . . . 5
2.2 A schematic of a PI-controller. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 A schematic of a PID-controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Diagram of the pendulum environment with actuating forces . . . . . . . . . . . . 8
2.5 A diagram of a classic decision tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.6 A graph showing an example of linear regression . . . . . . . . . . . . . . . . . . . 12
2.7 Simplified image of a biological neuron . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.8 Schematic of Artificial Neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.9 A graph showing weight altering by using stochastic gradient descent . . . . . . . 14
2.10 Graphs of hyperplane decision surface for a single layer perceptron network . . . 15
2.11 Graph showing a plot of the sigmoid activation function . . . . . . . . . . . . . . . 16
2.12 Graph showing the tanh activation function . . . . . . . . . . . . . . . . . . . . . . . 16
4.1 Graphs of simulation results for the base pendulum with different start angles. . . 27
4.2 Graphs containing the different simulation results in altered environment 1 with
changed pendulum mass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Graphs containing the different simulation results in altered environment 2 with
changed pendulum mass and pendulum length . . . . . . . . . . . . . . . . . . . . 29
4.4 Graphs containing the different simulation results in altered environment 3 with
changed cart mass and pendulum length . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5 Graphs containing the different simulation results in altered environment 4 with
changed cart mass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.6 Graphs containing the different simulation results in altered environment 5 with
changed cart mass, pendulum mass and pendulum length . . . . . . . . . . . . . . 32
1 Introduction
This chapter introduces the main motivation of the project. The aim is presented, followed by the research questions. Lastly, the delimitations of the project are given.
1.1 Motivation
Control theory is a field of engineering that has seen a great deal of development during the 20th century. Technological advances and the increase in computational power are two of the reasons for this development. Most of the present methods for controller design within the linear control theory branch are static: as soon as the parameters for the system are set, it will work that way until they are changed. This is a good approach in most cases, but if the environment changes, or the user wants to make variations to the environment, the parameters need to be reset, which can be a complicated process.
A very common environment when studying control theory is the inverted pendulum. The inverted pendulum controller is a classic example in control engineering because of the instability of the system. The inverted pendulum is placed on a cart, and both the pendulum and the cart are constrained to move within the vertical plane. The goal is for the cart to balance the pendulum straight up, at 90 degrees, by moving in the horizontal direction.
Just as the development of computational power has made an impact on the control theory branch, it has also made artificial intelligence (AI) a hot topic in the 2010s. AI becomes more relevant in our society every day, and research in this field has made some big breakthroughs over the last ten years. Most of those breakthroughs have been in the area of machine learning, with neural networks as one of the big buzzwords of the area.
The difference between a classic controller for the pendulum and a pendulum operated with machine learning is that the machine learning model will learn how to operate in order to balance the pendulum, while the classic model needs preset parameters. This supposedly makes the machine learning model better suited for an environment that may deviate over time, since there would be no need to change any parameters.
1.2 Aim
The aim of the project is to compare and study the data produced by different machine learning algorithms to map their behavior and ability to adapt. The research environment, a simulated inverted pendulum, will be controlled by a PID-controller. From these simulations, training data will be extracted to train machine learning algorithms to balance the pendulum. Different changes will then be made to the environment, and the algorithms will be run in the new environment to compare how control systems based on different machine learning algorithms adapt when placed in dynamic environments.
1.3 Research Questions
With the previously mentioned problem as a base, the questions that will be answered in this report are:
• How does a pendulum based on classic control theory compare to a pendulum based on different machine learning algorithms?
• How will the machine learning algorithms behave when the environment is altered by changing, for example, weight, length and size?
The necessary theory and method will be described in this report before presenting the results and discussion.
1.4 Delimitations
Considering the time frame of the project, a number of delimitations have been made. Many different machine learning algorithms exist today; given the mentioned time frame, we had to pick three of them, since comparing them all would take too much time. This work should therefore not be considered a complete comparison, since many algorithms have not been considered. Also due to the time frame, the only problem that will be studied is the inverted pendulum. With more time it would be possible to research different types of problems. For the same reason, the pendulum will only be virtual.
2 Theory
This chapter explains the theory behind the project. It starts off by presenting the main control theory that has been used to build the controller and the classic model of the pendulum. Then artificial intelligence is presented as an area in the field of computer engineering. The area is wide and has several subareas; the one that will be used in this thesis is machine learning. After this, the selected algorithms are thoroughly explained. The modeling language that has been used in this project to build the pendulum, Modelica, is explained last.
2.1 Control theory
To be able to construct a control system, thorough knowledge about the processes that are to be regulated is required. The most important factors are the outputs of the system and how they react to changes in the inputs. The input is some kind of command or stimulus that is applied to the system. The form of the inputs and outputs can vary, and both need to be identified in order to identify the components of the system. A control system can have more than one input and output [8]. A process has static and dynamic characteristics. The static characteristics are those of the process in its static condition, meaning that the static amplification is the same throughout the operating range of the process. The dynamic characteristics of a process take inertia, time delays and transients into account. For example, when you press the pedal of a car, it takes some time before the wanted speed is reached. The static characteristics of different systems are often very alike, while the dynamic characteristics may differ. Because of this, processes can be divided into different types: there are processes with dead time, processes with overshoot, and so forth. The inverted pendulum is an unstable process. This means that feedback is required to keep the output close to the wanted reference signal. [27]
When designing a control system there are a number of steps to go through. First the control system must be studied: to be able to build the system, one must know what types of sensors are necessary, what actuators are to be used, and where they should be placed. After identifying all the necessities of the system, it can be modeled. Once the system is modeled, the resulting model's properties can be determined and the performance specifications can be set. Based on these conclusions, the type of controller to be used is chosen, and the controller is designed to meet the measured properties. Different types of controllers can be used on different types of systems; some of them will be explained more closely in the following sections. After identifying the appropriate controller, the system can be simulated. [9]
Figure 2.1: A schematic of a simple feedback control system with exogenous signals.
As seen in figure 2.1, the output signal is fed back into the object that is to be controlled; this is called feedback. Apart from the feedback there are some signals coming from the outside; the external disturbance and the sensor noise are examples of this. These signals are called exogenous signals [9]. The controller oversees the process and has an input variable, the command input, which is compared to a set-point, the wanted outcome. If the input is the same as the set-point, the control system has reached its goal; if they differ, the controller sends the object input to the object to tell it how to act to reach the set-point. [1]
2.1.2 Controllers
The controller is the heart of the control system. The controller's task is to use the information from the feedback to create the control signal that will try to decrease the error. The simplest form of controller is the on-off controller, in which the control signal can only take two different values; the value depends on whether the output is positive or negative. The on-off controller is simple but not always accurate enough. A more advanced variant of the on-off controller is the multi-stage controller, where the difference is basically that there are more than two stages. The on-off and multi-stage controllers are not very common; in many cases a P-, PI- or PID-controller is used instead. [27]
2.1.2.1 P-controller
With proportional control, the variations of the control signal are proportional to the control error signal, the input to the controller. The relationship between the input and output can be described with the following formula:

$$u = u_0 + Ke \qquad (2.1)$$

where u is the control signal, e is the control error signal, and u_0 is the control signal's base level, i.e. its value when the error is zero. The parameter K is the controller's amplification (gain), which decides how strongly the controller acts to correct the error.

A P-controller is often used as a basic function in most controllers but is often combined with an I-controller and a D-controller, which will be brought up in the following sections. The P-controller gives a softer control compared to the previously mentioned on-off controllers. There is no K-value that gives both good speed and high stability; if both speed and stability are wanted, the controller must be supplemented with a D-part. The P-controller might be enough if the requirements are not that high. [27]
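To make the proportional law of equation 2.1 concrete, here is a minimal Python sketch (purely illustrative — the thesis's actual implementation was in C++, and the toy first-order plant and function names below are our assumptions, not the thesis's):

```python
def p_controller(setpoint, measurement, K, u0=0.0):
    """Proportional control law u = u0 + K*e (equation 2.1)."""
    e = setpoint - measurement
    return u0 + K * e

def simulate(K, steps=5000, dt=0.01, setpoint=1.0):
    """Euler simulation of a toy first-order plant x' = -x + u
    under pure P-control, starting from x = 0."""
    x = 0.0
    for _ in range(steps):
        u = p_controller(setpoint, x, K)
        x += dt * (-x + u)
    return x
```

Because u is proportional only to the remaining error, this loop settles where -x + K(1 - x) = 0, i.e. at x = K/(1 + K): a steady-state error that shrinks as K grows but never vanishes, which is exactly why a P-controller is usually supplemented with an I-part.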
2.1.2.2 I-controller
An I-controller is a controller where the output is the integral of the error:

$$u(t) = \frac{1}{T_I} \int_0^t e(t)\,dt \qquad (2.2)$$

where T_I is the integration time, which decides the speed of the integration, e is the input and u is the output. The I-controller's output at a certain time depends on the error accumulated up to that time. In an I-controller, the control signal's initial value is set to the set-point's value. As long as the error is 0, the control signal will stay at its initial value. If there is an error, the control signal will increase or decrease depending on whether the error signal is positive or negative. When the error has been corrected, the control signal settles at a constant value. [27]
2.1.2.3 PI-controllers
Often a P-controller and an I-controller are combined into a PI-controller. In this way the advantages of both types can be used:

$$u(t) = K \left[ e(t) + \frac{1}{T_I} \int_0^t e(t)\,dt \right] \qquad (2.3)$$

The amplification K now affects both terms of the PI-controller. The integration time T_I in a PI-controller is consciously chosen to be large, which results in a slower change in the I-part of the controller than in the P-part when an error occurs. The integration time in the PI-controller corresponds to the time it takes for the I-part's output to match the P-part's output. The constants K and T_I need to be set to suitable values for the PI-controller to work as intended. [27]
The derivative (D) part of a controller produces an output proportional to the derivative of the error:

$$u(t) = T_D\,\dot{e}(t) = T_D \frac{de(t)}{dt} \qquad (2.4)$$

where the derivation time T_D is a constant.
The output of the D-part differs from 0 only when the input's value changes, i.e. when the derivative of the input differs from 0. If the change of the input is fast, the derivative will be big, and when the input takes a constant value the derivative will be 0. The derivative part is never present on its own in a control system; it is always used together with some of the previously mentioned parts, for example in a PD-controller or PID-controller. [27]
The output of a PID-controller is made up of the outputs from the three different parts: the P-, I- and D-parts. The relationship between the input e and the output u in a PID-controller can be described in the following form:

$$u(t) = K \left[ e(t) + \frac{1}{T_I} \int_0^t e(t)\,dt + T_D\,\dot{e}(t) \right] \qquad (2.5)$$

The derivative part of the PID-controller can improve the stability, the speed and the disturbance rejection. For the PID-controller to function as intended, the parameters K, T_I and T_D need to be tuned to suitable values. [27]
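A direct discrete-time reading of equation 2.5 can be sketched as follows. This is an illustrative Python sketch only — the thesis's reference controller was implemented in C++/OpenModelica, and the class and parameter names here are our own:

```python
class PID:
    """Discrete-time PID controller, a direct discretization of equation 2.5:
    u(t) = K * (e(t) + (1/Ti) * integral(e) + Td * de/dt)."""

    def __init__(self, K, Ti, Td, dt):
        self.K, self.Ti, self.Td, self.dt = K, Ti, Td, dt
        self.integral = 0.0      # running approximation of the integral of e
        self.prev_error = 0.0    # previous error, for the finite difference

    def step(self, setpoint, measurement):
        e = setpoint - measurement
        self.integral += e * self.dt                  # I-part: accumulate error
        derivative = (e - self.prev_error) / self.dt  # D-part: finite difference
        self.prev_error = e
        return self.K * (e + self.integral / self.Ti + self.Td * derivative)
```

With K = 2, T_I = 1, T_D = 0 and dt = 0.1, the first step against a unit error gives u = 2·(1 + 0.1) = 2.2, and a second step at the same error adds another 0.1 of accumulated integral.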
2.2 Physics of the pendulum
The system has two degrees of freedom: the cart can move horizontally along the x-axis, and the pendulum can rotate 360 degrees about its pivot point. The equations of motion can be derived from Lagrange's equations:

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) - \frac{\partial L}{\partial q_i} = Q_i \qquad (2.6)$$

where L is the difference between the kinetic energy (T) and the potential energy (V):

$$L = T - V \qquad (2.7)$$

The potential energy of the system is the potential energy of the pendulum, since the cart never has any stored potential energy:

$$V = mgl\cos\theta \qquad (2.8)$$
Finding the kinetic energy is a little more complicated, since it involves both the pendulum and the cart:

$$\begin{aligned}
T &= \frac{1}{2} M \dot{x}^2 + \frac{1}{2} m V_p^2 \\
  &= \frac{1}{2}\left( M \dot{x}^2 + m \left( V_x^2 + V_y^2 \right) \right) \\
  &= \frac{1}{2}\left( M \dot{x}^2 + m \left( (\dot{x} - l\dot{\theta}\cos\theta)^2 + (l\dot{\theta}\sin\theta)^2 \right) \right) \\
  &= \frac{1}{2}\left( (M+m)\dot{x}^2 - m \left( 2\dot{x} l \dot{\theta}\cos\theta - l^2\dot{\theta}^2(\cos^2\theta + \sin^2\theta) \right) \right) \\
  &= \frac{1}{2}\left( (M+m)\dot{x}^2 - m \left( 2\dot{x} l \dot{\theta}\cos\theta - l^2\dot{\theta}^2 \right) \right)
\end{aligned} \qquad (2.9)$$

By combining formulas 2.8 and 2.9, equation 2.7 can be solved for L:

$$L = \frac{1}{2}\left( (M+m)\dot{x}^2 - 2m\dot{x} l \dot{\theta}\cos\theta + m l^2 \dot{\theta}^2 - 2mgl\cos\theta \right) \qquad (2.10)$$
Using Lagrange's equations (2.6) and calculating the equations of motion for the state variables x and θ, we get the equations of motion for the system. Equation 2.12 is equal to 0 because there is no external force actuating on the state variable θ.
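Equations 2.11 and 2.12 themselves do not survive in the text above. Applying (2.6) to the Lagrangian (2.10), with F denoting the external force on the cart, they plausibly take the following form — this is our reconstruction from the surrounding derivation, not the thesis's own typesetting:

```latex
% Euler-Lagrange equation for x (generalized force F acting on the cart):
(M + m)\ddot{x} - m l \ddot{\theta}\cos\theta + m l \dot{\theta}^2 \sin\theta = F \tag{2.11}

% Euler-Lagrange equation for \theta (no external generalized force, so the
% right-hand side is 0, as stated in the text):
m l^2 \ddot{\theta} - m l \ddot{x}\cos\theta - m g l \sin\theta = 0 \tag{2.12}
```

Both lines follow term by term from (2.10): differentiating ∂L/∂ẋ = (M+m)ẋ − mlθ̇cosθ in time gives (2.11), and differentiating ∂L/∂θ̇ = ml²θ̇ − mlẋcosθ, then subtracting ∂L/∂θ, gives (2.12).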
2.3 Artificial intelligence
AI can be classified into two different categories: weak and strong AI. Strong AI is considered to have human-like, high-level cognitive ability; included in this behaviour are, for example, common sense and self-awareness. Weak AI, on the other hand, simulates intelligent human processes without real understanding. Modern AI systems are all at the stage of weak AI, and as of today strong AI does not exist. [29]
2.4 Machine learning
The main motivation for using this type of system is that if a system can learn and adapt to changes in an environment, the designer does not need to foresee and provide solutions for all possible situations. Machine learning is used in many areas today, including pattern recognition, speech recognition and robotics. One of the main themes of pattern recognition is recognizing faces; it is an easy task for the human brain, which a human performs without effort every day. This is, however, done unconsciously, which means without awareness, sensation, or cognition. A computer program cannot be built with awareness or cognition; the program will never make its own decisions, it will take action based on the code that the programmer has written. To mimic the brain, the algorithm is programmed to look at the known facts: a face has a pattern; it has a nose, eyes and a mouth, all placed at certain positions in the face; there is a structure. With a dataset of photos of different faces, a learning program can analyze the face pattern and recognize it by checking for the pattern in each image. [2]
2.5 Supervised Learning
2.5.1 Regression Tree
The output, a leaf, of a regression tree is obtained by making a series of comparisons rather than asking yes-or-no questions. To train a tree, a dataset is needed. The dataset is divided into a training set and a testing set. The training set is usually composed of complete data, which determines the structure of the tree, while the testing set has missing values for certain attributes. The model searches every distinct value of the input data to find the split value that separates the data into two regions. After finding the best split, the splitting process is repeated on each of the two new regions. This is repeated until a stopping point is reached. [14, 12]
A regression tree has a lot of advantages. It is an excellent way for the user to visualize each step of a decision-making process, which can help with making rational decisions. There is a possibility to give priority to a decision criterion. A lot of the undesired data is filtered out in each step, which makes for a manageable amount of data. It is a very presentable algorithm that is easy to explain.
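The split search described above — trying every distinct input value and keeping the split that best separates the data into two regions — can be sketched in a few lines of Python. This is an illustrative sketch for one input variable with a squared-error criterion (a common choice, though the thesis does not spell out its criterion), not the thesis's implementation:

```python
def best_split(xs, ys):
    """Search every distinct value of the input for the threshold that
    minimizes the summed squared error of the two resulting regions."""

    def sse(vals):
        # Squared error of a region around its own mean prediction.
        if not vals:
            return 0.0
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    best_threshold, best_cost = None, float("inf")
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must produce two non-empty regions
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold
```

Growing a full tree then amounts to applying `best_split` recursively to each resulting region until a stopping point is reached, as the text describes.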
2.5.2 Linear regression
With two variables, one response and one explanatory variable, the method is referred to as simple linear regression. If there is more than one explanatory variable, it is referred to as multiple linear regression. It is a fairly simple algorithm with well-known properties, which makes it popular not only in machine learning but also in areas such as finance, economics and epidemiology. [18, 15]
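For the simple (one explanatory variable) case, the ordinary least-squares estimates have a closed form, which can be sketched directly — an illustrative Python sketch, not the thesis's C++ implementation:

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for one explanatory variable:
    returns (intercept, slope) minimizing sum((y - (a + b*x))^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form OLS estimates: b = cov(x, y) / var(x), a = mean_y - b*mean_x
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

On data generated exactly by y = 1 + 2x, the fit recovers intercept 1 and slope 2; with noisy data it returns the least-squares compromise instead.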
2.5.3 Artificial Neural Networks
Two major families of neural networks are Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). The difference is that everything in a CNN flows sequentially forward through the layers, whereas an RNN has loops inside of it. Every time Neural Network is mentioned in this thesis, it refers to a CNN if nothing else is mentioned.
2.5.4 Backpropagation
To train a network, its output must be compared to the desired output; the comparison gives an error value (also referred to as loss). This is done using an error function; the most common is the sum of squared errors (SSE) [31]:

$$E_{sse} = \frac{1}{2} \sum_{l=1}^{L} \sum_{h=1}^{H} (O_{lh} - Y_{lh})^2 \qquad (2.13)$$

where l = 1, 2, ..., L indexes the observations and h = 1, 2, ..., H indexes the output nodes. O is the desired output and Y is the observed output.
As mentioned earlier, each individual neuron has weights associated with it. The behaviour of the network is altered by adjusting these weights: they regulate the importance of certain input values and change the output from the neurons, and can therefore be used to alter the functionality of the network. The weights are adjusted through backpropagation. Backpropagation works by using stochastic gradient descent; the idea is to minimize the error value generated by the error function. Calculating the partial derivatives of the error function with respect to the weights (∂E/∂W) makes it possible to move towards a smaller error value, as displayed in figure 2.9. In each iteration the gradient is calculated and the error is reduced by adjusting the weights in the appropriate direction [16].
Figure 2.9: A graph showing weight altering by using stochastic gradient descent
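The loss of equation 2.13 and the gradient-descent update can be illustrated on the smallest possible "network": a single linear neuron y = w·x. This is a deliberately tiny Python stand-in for full backpropagation, with names of our own choosing:

```python
def sse(desired, observed):
    """Sum of squared errors over all outputs, as in equation 2.13."""
    return 0.5 * sum((o - y) ** 2 for o, y in zip(desired, observed))

def gradient_step(w, xs, targets, lr=0.1):
    """One gradient-descent update for a single linear neuron y = w*x.
    For E = 0.5 * sum((o - w*x)^2), dE/dw = -sum((o - w*x) * x),
    so we step against the gradient to reduce the error."""
    grad = -sum((o - w * x) * x for x, o in zip(xs, targets))
    return w - lr * grad
```

Iterating `gradient_step` on data generated by the true weight w = 2 drives w toward 2 and the SSE toward 0, which is exactly the descent behaviour sketched in figure 2.9; a full network repeats this per weight, with the chain rule propagating ∂E/∂W backwards through the layers.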
2.5.5 Activation functions
2.5.5.1 Perceptron
Old ANN systems used an activation function called the perceptron. If the weighted sum of the inputs is bigger than a threshold, the perceptron outputs a binary value (1 or -1). Since the output is binary, it can be considered a Boolean value. Given inputs x_1 to x_n and assuming x_0 = 1, the output o(x_1, ..., x_n) can be written as:

$$o(x_1, ..., x_n) = \begin{cases} 1 & \text{if } \sum_{k=0}^{n} w_k x_k > 0 \\ -1 & \text{otherwise} \end{cases} \qquad (2.14)$$
where w_k is a real-valued constant (weight) that decides how much each input contributes. As mentioned earlier, the output of a perceptron can be interpreted as a Boolean value depending on the value of the weighted sum of the inputs; figure 2.10a shows a representation of this. Figure 2.10b shows a limitation of the perceptron: there is no way to model an XOR function with a single perceptron, since the XOR function is not linearly separable. This has led to the development of more complex functions that can be used when modeling neural networks [21].
Figure 2.10: Graphs of hyperplane decision surface for a single layer perceptron network
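Equation 2.14 translates almost directly into code. The following is an illustrative Python sketch (the weight values below are our own example, chosen to realize a Boolean AND):

```python
def perceptron(inputs, weights):
    """Perceptron activation (equation 2.14). `inputs` are (x1, ..., xn);
    x0 = 1 is prepended as the bias input, so `weights` holds (w0, ..., wn)."""
    xs = [1.0] + list(inputs)
    s = sum(w * x for w, x in zip(weights, xs))
    return 1 if s > 0 else -1
```

With weights (-1.5, 1, 1) this single unit implements a Boolean AND over 0/1 inputs — a linearly separable function — while no single weight vector can reproduce XOR, which is the limitation figure 2.10b illustrates.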
2.5.5.2 Sigmoid
The sigmoid receives a real-valued input and produces a real-valued output, in contrast to the perceptron activation function, which outputs a binary value [19]. The sigmoid function is described in equation 2.15 and plotted in figure 2.11:

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2.15)$$
As seen in the plot, the function squashes the input to a value between 0 and 1. Looking at figure 2.11 and recalling how backpropagation works (calculating gradients), it is clear that this function will have a hard time converging for large input values, since the gradient vanishes the closer the input comes to -∞ or ∞. Computing exponential functions is also computationally expensive, which is important to avoid when training large networks [25].
15
2.5. Supervised Learning
2.5.5.3 Tanh
The tanh activation function is a spin-off from the sigmoid, where the gradients are larger when the data is centered around 0. This can be visualised by plotting the tanh function: in figure 2.12 the output ranges over [-1, 1], whereas for the sigmoid the possible output lies in [0, 1]. [17] Equation 2.16 is the tanh function, where the sigmoid function 2.15 is denoted σ(x):

$$\tanh(x) = 2\sigma(2x) - 1 \qquad (2.16)$$
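Both activations, and the identity between them in equation 2.16, can be sketched and checked in a few lines of Python (illustrative only; function names are ours):

```python
import math

def sigmoid(x):
    """Logistic activation (equation 2.15): squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_sigmoid(x):
    """Tanh expressed through the sigmoid (equation 2.16): output in (-1, 1),
    centered around 0, which is what gives tanh its larger gradients there."""
    return 2.0 * sigmoid(2.0 * x) - 1.0
```

The identity tanh(x) = 2σ(2x) − 1 lets the second function be checked directly against `math.tanh`; the saturation discussed above is also visible numerically, since σ(20) is already within 10⁻⁶ of 1.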
2.5.6 Hyperparameters
A number of different learning algorithms exist today. Oftentimes these algorithms have sets of hyperparameters. A hyperparameter is a parameter that has to be set appropriately by the user for the algorithm to be able to perform to its full extent. These hyperparameters greatly influence the algorithm and its performance, and are therefore used to configure different aspects of the algorithm. The tuning of hyperparameters is generally done manually, which can be time-consuming work that is hard for others to reproduce. The number of hyperparameters can vary substantially, but usually only a few of them impact the performance, and identifying in advance which parameters have this kind of impact is hard. [5]
There are a number of different methods for optimizing hyperparameters, such as grid search, random search, Bayesian optimization and evolutionary optimization. The one used in this paper is grid search, which is one of the most popular methods. The method searches through a user-specified subset of the hyperparameter space of the learning algorithm. This space may contain real-valued parameters, which makes it necessary to manually set bounds for the search. [4]
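The grid search described above amounts to evaluating every combination in the user-specified grid and keeping the best. A generic Python sketch (illustrative; the thesis does not publish its search code, and the names here are ours):

```python
from itertools import product

def grid_search(train_and_score, grid):
    """Exhaustive search over a user-specified hyperparameter grid.
    `grid` maps each hyperparameter name to the list of values to try;
    `train_and_score` trains with one combination and returns a score
    to maximize (e.g. validation accuracy)."""
    best_params, best_score = None, float("-inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params
```

Note that the cost is the product of the list lengths, which is why the user must bound each real-valued parameter to a small, manually chosen set of candidate values.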
2.6 Modelica
Modelica is an openly available language which is used for modeling different systems. The development of the language has been ongoing since 1996. The language is object-oriented and is suited for a number of different multi-domain models. Models in Modelica are described mathematically by algebraic, differential and discrete equations, and from a user's point of view they are described by schematics. These schematics consist of components that have connectors describing the possible interactions. A diagram model is made by drawing connection lines between connectors on different components. To graphically edit and browse a Modelica model, a Modelica simulation environment is needed. This environment is used to perform model simulations and other analyses.
3 Method
In response to the research questions in 1.3, creating a virtual model of the pendulum allowed for repeated runs with different parameters for different algorithms. For all experiments a PID controller was used as the reference point. To allow the algorithms to run on a real-time hardware system, they were implemented in C++ to achieve fast execution times.
3.2. PID Controller
The angle offset was initialized with the values {90, 60, 45, 30, 20, 14, 9, 4, 0} to cover the possible scenarios a controller could face in a varying environment. We did not consider the opposite angles {-90, -60, ..., 0}, since the target function for those values is expected to be a horizontal reflection of the one for the values chosen.
3.4. Training data
As a contrast to the complex neural network algorithm we wanted a simple, well-established algorithm. We considered different ones but settled on linear regression. This algorithm is very common, easy to use and implement, and we thought that the contrast to the neural network would make for an interesting discussion. We wanted three algorithms in order to make a deeper comparison, and for the last algorithm we wanted a middle ground between the neural network and linear regression. Here we settled on a regression tree algorithm. This algorithm is fairly complex but easy to understand and explain. We thought the tree structure seemed interesting and found it intriguing to see how this algorithm would compare to the first two we picked. We felt that all three algorithms were dissimilar enough to give us a bigger picture of how different kinds of algorithms work on the same kind of problem.
For simplicity, the Mean Squared Error (MSE) was used as the error function for all algorithms. The MSE is equal to the mean of the sum of squared errors that was mentioned briefly in section 2.5.4. The MSE was selected because its derivative is easy to calculate for the linear regression. The MSE is:
\frac{1}{N} \sum_{i=1}^{n} \bigl( y_i - \mathrm{predict}(x_i) \bigr)^2 \qquad (3.1)
3.5. Implementing the learning algorithms
The network was chosen to have one hidden layer with 25 neurons, as in [22], because we used the same input variables and derived our training data in the same way. The tanh function 2.16 was chosen as activation function over the sigmoid 2.15, since our output data was centred around 0 and tanh would then provide better gradients. The computation saved by using ReLU was not a priority, since the amount of data was not large enough to motivate its use. ReLU would give derivatives of 1 or 0 since it is piecewise linear; using tanh was thought to make the gradients more dependent on the target output and therefore yield better convergence.
Since this was not a classification problem there was no need for more than one output neuron; by giving the last neuron a linear output function the output was as expected. During training, backpropagation was done from the back to the front to make sure all gradients were calculated for previous neurons and did not affect each other. Conversely, during prediction the layers were traversed from front to back. The weights were updated only after all changes had been calculated, so that no change would affect other nodes during the same iteration.
The derivative of tanh was used in the backpropagation and was equal to:
\frac{\partial}{\partial x} \tanh x = 1 - \tanh^2 x \qquad (3.2)
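Equation 3.2 is cheap to evaluate, since the derivative only needs the tanh of x, which is already available from the forward pass. A small standalone sketch (not the thesis code):

```cpp
#include <cmath>

// Derivative of tanh (equation 3.2), used in the backpropagation:
// d/dx tanh(x) = 1 - tanh(x)^2.
double tanh_derivative(double x) {
    double t = std::tanh(x);  // in a network, reuse the forward-pass output
    return 1.0 - t * t;
}
```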
For the neural network we used three different hyperparameters: learning rate, momentum, and epochs. The learning rate was a factor applied so as not to overshoot when decreasing or increasing the weights. Momentum was used to converge more quickly towards the goal output when multiple training points indicated the same gradients. Epochs determined how many times the training data was reused during a training session. Their respective settings are discussed in section 3.6.
output: float
Function Predict(data_point)
    // Input layer
    foreach neuron n in layer[0] do
        n.output = data_point_n * n.weight
    end
    // Hidden layers and output layer
    foreach layer l where l.index > 0 do
        foreach neuron n in l do
            sum = 0
            foreach neuron m in layer[l-1] do
                sum += m.output * m.weight_n
            end
            if n == output_neuron then
                n.output = sum
            else
                n.output = TransferFunction(sum)
            end
        end
    end
    return output_neuron.output
end
Algorithm 3: Pseudo code of the prediction function for the neural network
\frac{1}{N} \sum_{i=1}^{n} \Bigl( y_i - m - \sum_{j=1}^{f} k_j x_{ij} \Bigr)^2 \qquad (3.3)
where m represents the bias, k the regression coefficients and x the input data point. f is the number of features, which in our case is two, and N is the total number of data points. Equation 3.3 allows us to derive the gradients for a specific coefficient k_f and for m.
" # 1 řn řf
Bf ´2k ( y ´ m ´ k x ))
Bk = N i =1 f i j=1 j ij
Bf 1 řn řf (3.4)
Bm N i =1 ´2 ( y i ´ m ´ j=1 k j xij ))
The start value of m (the bias) was initialized so as to achieve a straight line between the maximum and minimum output values.
output: float
Function Predict(data_point)
    value = m
    foreach feature f do
        value += data_point_f * k_f
    end
    return value
end
Algorithm 5: Pseudo code of the prediction function for Linear Regression
output: float
Function PredictRecursive(data_point)
    if has children then
        if data_point_f < value then
            return leftchild.PredictRecursive(data_point)
        else
            return rightchild.PredictRecursive(data_point)
        end
    else
        return value
    end
end
Algorithm 7: Pseudo code of the recursive prediction function for regression tree
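Algorithm 7 maps naturally onto a small recursive node structure. The following is a hypothetical sketch of such a node, not the thesis implementation:

```cpp
#include <memory>
#include <vector>

// A minimal regression-tree node matching Algorithm 7: internal nodes test
// one feature against a threshold, leaves store the predicted value.
struct TreeNode {
    int feature = 0;                        // feature index tested here
    double value = 0.0;                     // split threshold, or leaf output
    std::unique_ptr<TreeNode> left, right;  // both null at a leaf

    double predict(const std::vector<double>& data_point) const {
        if (!left)                          // no children: leaf reached
            return value;
        if (data_point[feature] < value)
            return left->predict(data_point);
        return right->predict(data_point);
    }
};
```

Because every prediction ends at a leaf, the set of possible outputs is exactly the set of leaf values, which is the discreteness discussed later in section 5.1.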
3.6. Selection of hyper parameters for the algorithms
4 Results
In this chapter the results of the different simulations are presented in graphs. The starting angle differs between graphs and is shown above each graph. The first section presents the results for the base pendulum, where no changes have been made. After this, 5 different environments in which different changes have been made are presented in the same way.
Figure 4.1: Graphs of simulation results for the base pendulum with different start angles.
4.2. Data from altered environments
Figure 4.2: Graphs containing the different simulation results in altered environment 1 with
changed pendulum mass
Figure 4.3: Graphs containing the different simulation results in altered environment 2 with
changed pendulum mass and pendulum length
Figure 4.4: Graphs containing the different simulation results in altered environment 3 with
changed cart mass and pendulum length
Figure 4.5: Graphs containing the different simulation results in altered environment 4 with
changed cart mass
Figure 4.6: Graphs containing the different simulation results in altered environment 5 with
changed cart mass, pendulum mass and pendulum length
5 Discussion
The discussion is grounded in the theory and results of the paper. First the results are discussed by going over the different algorithms one by one. Speculations about the outcome are made in connection to the theory. The method is also discussed, with thoughts about what improvements could have been made and the assumed consequences of these improvements. In connection with the method, the sources are discussed. Finally, the work is discussed in a wider context.
5.1 Results
When studying and comparing the results we could see that one of the algorithms, the neural network, was able to mimic the PID controller fully. This is clear in figure 4.1, where the lines of the PID controller and the neural network follow each other. The controller using the neural network is also the only one of our trained algorithms that manages to balance the pendulum in all simulations with starting offsets larger than 9 degrees. The linear regression never manages to balance the pendulum when not starting in a balanced state and therefore does not perform significantly better than using no controller at all. The controller using the regression tree algorithm mostly performs better than no controller only for two starting offsets, 9 and 4 degrees, while in all simulations except one it performs worse than no controller when starting in a balanced state.
As seen in 2.5.3, the neural network algorithm can do non-linear mapping. This, together with a small bias and low variance, is what allows it to mimic the reference data fully. The regression tree has the freedom to map in hypercubes; it makes no assumption about the target function and therefore gives us a low bias and high variance. The tree's output values are limited to the number of leaves, compared to the linear regression and the neural network, which have continuous output functions that suit this problem better. There are more advanced tree algorithms, such as random forests and GBM, which would probably have yielded a better result, but due to the linearity of the target function they would not be suitable for this problem, because such algorithms make too few assumptions about the target function. As for the linear regression, it did not mimic the reference data at all and performed worst of all the algorithms. Based on 2.2, balancing the pendulum approaches a linear target function as the angle approaches the upright position. We believe that, because of this, the linear regression should be able to balance the pendulum for more angles if the training data were tweaked: since it assumes that the target function is linear, it is reasonable that it does not manage to balance on training data that requires non-linear mapping, and removing those training data points would probably make the linear controller better.
When we realized that not all the algorithms were able to achieve balance, we started reasoning about what type of algorithm we would use if we were to build a pendulum, and more generally how to decide which algorithm to use on different types of problems. The most obvious conclusion we drew was that it is very important to determine what type of problem is to be solved. Different algorithms are designed to solve different problems. In our case we believe that all the algorithms could have been used to solve the problem, but due to the previously mentioned factors they did not, and might not be the most suitable algorithms to use. In particular it is important to consider what assumptions the algorithm makes about the target function. A linear regression controller might perform equally as well as a neural network while being much more understandable; this also adds a factor of reliability to the controller, which makes it much more suitable. One also needs to reason carefully about what training data to use for each controller: it does not make sense to train the linear regression on training data that it will never be able to fit, especially since trying to fit it will make the fit on other training data worse.
The most interesting result we received was that the neural network was able to recover the inverted pendulum in some cases where the PID was not, with an even clearer difference in the altered environments. This performance difference was clearest when altering the length of the pendulum; we believe that this is because the length is the only altered parameter that directly affects our state variable θs in our equations of motion 2.12. Why the neural network performs better than the reference data from the PID controller is very hard to reason about, since neural networks are not intuitively easy to understand. The advantage that the neural network has over the PID is that it works well for non-linear mapping. This could be the reason for the difference in performance.
When altering the different environments, the linear regression performs very badly in all cases. The changes in the parameters do not affect the algorithm very much. We believe that this is because the linear regression algorithm behaves as two P-controllers, or as a single PD controller. If we had added another parameter to the linear controller with the purpose of mimicking the integral part of the PID controller, we could have achieved a better fit to the reference data. As for the regression tree, it was not able to balance in our reference simulation, and this is a continuous trend throughout our tests. As previously stated, this is believed to depend on the properties of the algorithm. The neural network constantly adapts to the PID controller. This should in some sense not be seen as a failure, since by the definition of machine learning in 2.4 it has learned exactly what it needs in order to mimic the reference data. For the neural network to be able to balance even when the parameters of the pendulum change, some kind of reinforcement learning could be tested. This would give the neural network the ability to adapt to changes in an environment.
5.2 Method
With the realizations that have come from this project, our choices of algorithms are questionable, especially the regression tree. As previously stated in 3.3, we made our choices based on the topicality of different algorithms. We wanted to test newer, more relevant algorithms and compare them with older, more well-established ones. We also wanted to test algorithms of different complexity. Looking at the problem at the end stage of the project, it is clear that more consideration and time should have been put into choosing algorithms better suited to the problem and to the factors brought up in 5.1. We should also have spent more time reasoning about what training data would be best for the different controllers.
5.3. The work in a wider context
The same could be said for the choice of test environment. We feel that a more considered problem would have made for a better study. The pendulum in our case is a very well-established problem, which was one of the main reasons we chose it. Conducting our study on a different problem, or on several different kinds of problems, might have made for a more interesting result. We are however pleased with our choice, since the pendulum makes for an easy environment to alter, which was one of the main questions of our report. We do feel, though, that more time could have been spent considering and discussing different environments and problems to study, and that this might have resulted in a more diverse study.
With more time and resources, we feel it would have been interesting to test the algorithms not only on a virtual model but also on a real one. The comparison would also have been more complete if we could have used more algorithms. More time would also have given us the opportunity to try different kinds of problems, for example a fully linear and a non-linear problem. We believe that this would have made for a richer and more diverse discussion of the differences between algorithms on different problems.
6 Conclusion
The aim of the project was to compare different machine learning algorithms when used as a substitute for classical control theory, and to map their behaviour and ability to adapt. To be able to do this we phrased two questions: "How does a pendulum based on classic control theory compare to a pendulum based on different machine learning algorithms?" and "How will the machine learning algorithms behave when altering the environment by changing for example weight, length and size?".
The aim of the paper has been achieved to some extent. The neural network slightly outperformed the PID controller in the altered environments; even though this suggests that the neural network should be more stable, it adds a complexity to the controller that makes it harder to understand. As one of the advantages of using a PID is its simplicity, there is very little motivation to replace a PID controller with a neural network controller. We also feel that the result could have been more comprehensive and less predictable if more algorithms, different models and more diverse environments had been studied. Most importantly, the algorithms' assumptions about the target function should have been evaluated more against the target function that we were using as reference data. This is in our opinion one of the most important aspects when choosing a machine learning algorithm. The comprehensibility of the algorithm should also be one of the main aspects when using it in control theory; this is important since it makes the limitations and functionality clearer to the user. It is our belief that this paper could be used as a basis for considering and choosing a type of algorithm, or a specific one. It could also be used as a basis for a bigger research project.
Another thing that would need to be done to create a large-scale comparison is to use the algorithms on different models and environments. An algorithm that is good on one model could be really bad on another. All the properties of the algorithm and the model need to be taken into account to provide a comprehensive result. With different models, different factors in the environment could be changed to examine how, and if, the different algorithms can adapt to changes.
Bibliography
[1] T. Abdelzaher, Y. Diao, J. L. Hellerstein, C. Lu, and X. Zhu. Introduction to control theory and its application to computing systems. 2008.
[2] Ethem Alpaydin. “Introduction to Machine Learning, Second Edition”. In: Massachusetts Institute of Technology (2010).
[3] Sara Ayoubi, Noura Limam, Mohammad A. Salahuddin, Nashid Shahriar, Raouf Boutaba, Felipe Estrada-Solano, and Oscar M. Caicedo. “Machine Learning for Cognitive Network Management”. In: IEEE Communications Magazine 56.1 (Jan. 2018), pp. 158–165. ISSN: 0163-6804. DOI: 10.1109/MCOM.2018.1700560. URL: http://ieeexplore.ieee.org/document/8255757/.
[4] James Bergstra and Yoshua Bengio. “Random search for hyper-parameter optimization”. In: Journal of Machine Learning Research 13 (2012), pp. 281–305. ISSN: 15324435.
[5] Marc Claesen and Bart De Moor. “Hyperparameter Search in Machine Learning”. In: (2015), pp. 10–14. arXiv: 1502.02127. URL: http://arxiv.org/abs/1502.02127.
[6] “Dartmouth conference: McCorduck 2004, pp. 111–136, Crevier 1993, pp. 47–49, who writes "the conference is generally recognized as the official birthdate of the new science.", Russell and Norvig 2003, p. 17, who call the conference "the birth of artificial intelligence.", NRC 1999, pp. 200–201”. In: Dartmouth conference: McCorduck (1956).
[7] “Design of reinforce learning control algorithm and verified in inverted pendulum”. In: Chinese Control Conference, CCC 2015-Septe. Grant 61362002 (2015), pp. 3164–3168. ISSN: 21612927. DOI: 10.1109/ChiCC.2015.7260128.
[8] Joseph Distefano, Allen R. Stubberud, and Ivan J. Williams. Feedback and Control Systems. Vol. 2. McGraw-Hill Education, 2013, pp. 1–6.
[9] John Doyle, Bruce Francis, and Allen Tannenbaum. “Feedback Control Theory”. In: Design 134.6 (1990), p. 219. ISSN: 00223395.
[10] David A. Drachman. “Do we have brain to spare?” In: Neurology 64.12 (2005), pp. 2004–2005. ISSN: 0028-3878. DOI: 10.1212/01.WNL.0000166914.38327.BB. eprint: http://n.neurology.org/content/64/12/2004.full.pdf. URL: http://n.neurology.org/content/64/12/2004.
[11] FMI Framework - FMI4cpp. URL: https://github.com/NTNU-IHB/FMI4cpp (visited on 02/18/2020).
[12] Jake Gunther and Todd Moon. “Digital signal processing without arithmetic using regression trees”. In: 2009 IEEE 13th Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop, DSP/SPE 2009, Proceedings (2009), pp. 524–529. DOI: 10.1109/DSP.2009.4785979.
[13] Angela Harmon. “Neurons (nerve cells).” In: Salem Press Encyclopedia of Science (2014).
[14] Yuka Higashijima, Atsushi Yamamoto, Takayuki Nakamura, Motonori Nakamura, and Masato Matsuo. “Missing data imputation using regression tree model for sparse data collected via wide area ubiquitous network”. In: Proceedings - 2010 10th Annual International Symposium on Applications and the Internet, SAINT 2010 (2010), pp. 189–192. DOI: 10.1109/SAINT.2010.18.
[30] Zhongheng Zhang. “A gentle introduction to artificial neural networks”. In: Annals of Translational Medicine 4.19 (2016). ISSN: 2305-5847. DOI: 10.21037/10805. URL: http://atm.amegroups.com/article/view/10805/11398.
[31] Zhongheng Zhang. “Neural networks: further insights into error function, generalized weights and others”. In: Annals of Translational Medicine 4.16 (2016). ISSN: 2305-5847. DOI: 10.21037/10492. URL: http://atm.amegroups.com/article/view/10492/pdf.
39