Linköping University | Department of Computer and Information Science

Bachelor thesis, 16 ECTS | Datateknik

2020 | LIU-IDA/LITH-EX-G--20/046--SE

Using machine learning for control systems in transforming environments

Felicia Barkrot, Mathias Berggren

Supervisor : Lennart Ochel

Examiner : Peter Fritzson

Linköpings universitet
SE–581 83 Linköping
+46 13 28 10 00 , www.liu.se

The publishers will keep this document online on the Internet – or its possible replacement
– for a period of 25 years starting from the date of publication barring exceptional circum-
stances. The online availability of the document implies permanent permission for anyone to
read, to download, or to print out single copies for his/her own use and to use it unchanged
for non-commercial research and educational purpose. Subsequent transfers of copyright
cannot revoke this permission. All other uses of the document are conditional upon the con-
sent of the copyright owner. The publisher has taken technical and administrative measures
to assure authenticity, security and accessibility. According to intellectual property law the
author has the right to be mentioned when his/her work is accessed as described above and
to be protected against infringement. For additional information about the Linköping Uni-
versity Electronic Press and its procedures for publication and for assurance of document
integrity, please refer to its www home page: http://www.ep.liu.se/.

© Felicia Barkrot, Mathias Berggren


The development of computational power is constantly on the rise and opens up new possibilities
in many areas. Two of the areas that have made great progress thanks to this development are
control theory and artificial intelligence. The most prominent area of artificial intelligence
is machine learning. The difference between an environment controlled by control theory and an
environment controlled by machine learning is that the machine learning model will adapt in
order to achieve a goal, while the classic model needs preset parameters. This supposedly makes
the machine learning model better suited to an environment which changes over time. This theory
is tested in this paper on a model of an inverted pendulum. Three different machine learning
algorithms are compared to a classic model based on control theory. Changes are made to the
model and the adaptability of the machine learning algorithms is tested. As a result, one of
the algorithms was able to mimic the classic model, but with different accuracy. When changes
were made to the environments, the results showed that only one of the algorithms was able to
adapt and achieve balance.

We would like to express our very great appreciation to our examiner Peter Fritzson and our
supervisor Lennart Ochel.


Abstract

Acknowledgments

Contents

List of Figures

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research Questions
1.4 Delimitations

2 Theory
2.1 Control theory
2.1.1 Open-loop and closed-loop control systems
2.1.2 Controllers
2.2 Physics of the pendulum
2.2.1 Equations of motion
2.3 Artificial intelligence
2.4 Machine learning
2.5 Supervised Learning
2.5.1 Regression Tree
2.5.2 Linear regression
2.5.3 Artificial Neural Networks
2.5.4 Backpropagation
2.5.5 Activation functions
2.5.6 Hyperparameters
2.6 Modelica
2.7 FMI - Functional Mock-up Interface
2.8 Related Works

3 Method
3.1 Virtual model of pendulum
3.1.1 Implementation in OpenModelica
3.1.2 Altering of the environments
3.2 PID Controller
3.3 Choosing algorithms
3.4 Training data
3.5 Implementing the learning algorithms
3.5.1 Neural Network
3.5.2 Linear Regression
3.5.3 Regression Tree
3.6 Selection of hyper parameters for the algorithms
3.7 Comparing the algorithms

4 Results
4.1 Base pendulum
4.2 Data from altered environments
4.2.1 Altered environment 1
4.2.2 Altered environment 2
4.2.3 Altered environment 3
4.2.4 Altered environment 4
4.2.5 Altered environment 5

5 Discussion
5.1 Results
5.2 Method
5.2.1 Source criticism
5.3 The work in a wider context

6 Conclusion
6.1 Future work

Bibliography

List of Figures

1.1 An inverted pendulum balancing on a cart

2.1 A schematic over a simple feedback control system with exogenous signals
2.2 A schematic over a PI-controller
2.3 A schematic over a PID-controller
2.4 Diagram of the pendulum environment with actuating forces
2.5 A diagram of a classic decision tree
2.6 A graph showing an example of linear regression
2.7 Simplified image of a biological neuron
2.8 Schematic of Artificial Neuron
2.9 A graph showing weight altering by using stochastic gradient descent
2.10 Graphs of hyperplane decision surface for a single layer perceptron network
2.11 Graph showing a plot of the sigmoid activation function
2.12 Graph showing the tanh activation function

3.1 Schematic of the neural network structure

4.1 Graphs of simulation results for the base pendulum with different start angles
4.2 Graphs containing the different simulation results in altered environment 1 with changed pendulum mass
4.3 Graphs containing the different simulation results in altered environment 2 with changed pendulum mass and pendulum length
4.4 Graphs containing the different simulation results in altered environment 3 with changed cart mass and pendulum length
4.5 Graphs containing the different simulation results in altered environment 4 with changed cart mass
4.6 Graphs containing the different simulation results in altered environment 5 with changed cart mass, pendulum mass and pendulum length

1 Introduction

This chapter introduces the main motivation of the project. The aim is presented, followed by
the research questions. Lastly, the delimitations of the project are given.

1.1 Motivation
Control theory is a field of engineering which saw a lot of development during the 20th
century. Technological advances and the increase in computational power are two of the reasons
for this development. Most of the present methods for controller design within the linear
control theory branch are static: once the parameters of the system are set, it will keep
working that way until they are changed. This is a good approach in most cases, but if the
environment changes or the user wants to vary the environment, the parameters need to be
reset, which can be a complicated process.

A frequently studied environment in control theory is the inverted pendulum. The inverted
pendulum controller is a classic example in control engineering because of the instability of
the system. The inverted pendulum is placed on a cart, and both the pendulum and the cart are
constrained to move within a vertical plane. The goal is for the cart to balance the pendulum
straight up, at 90 degrees to the ground, by moving in the horizontal direction.

Figure 1.1: An inverted pendulum balancing on a cart

Just as the development of computational power has made an impact on control theory, it has
also made artificial intelligence (AI) a hot topic in the 2010s. AI becomes more relevant in
our society every day, and research in this field has made some big breakthroughs over the
last ten years. Most of those breakthroughs have been in the area of machine learning, with
neural networks as one of the big buzzwords of the area.

The difference between a classic controller of the pendulum and a pendulum operated with
machine learning is that the machine learning model will learn how to operate in order to
balance the pendulum, while the classic model will need preset parameters. This supposedly
makes the machine learning model better suited to an environment which may deviate over time,
since there would be no need to change any parameters.

1.2 Aim
The aim of the project is to compare and study the data produced by different machine learning
algorithms in order to map their behavior and ability to adapt. The research environment, a
simulated inverted pendulum, will be controlled by a PID-controller. From these simulations,
training data will be extracted to train machine learning algorithms to balance the pendulum.
Different changes will then be made to the environment, and the algorithms will be run on the
new environment to compare how control systems based on different machine learning algorithms
adapt when placed in dynamic environments.

1.3 Research Questions

AI is one of the biggest fields within computer science today. It is used in numerous areas,
and one of these is control theory. The inverted pendulum is a classic experiment within this
area, and implementing it with machine learning makes it possible to examine how a
well-studied environment functions with more modern techniques. It is also a model in which
different changes can be applied to the environment, such as different weights, lengths and
start angles, to see how an algorithm adapts.

With the previously mentioned problem as a basis, the questions that will be answered in this
report are:

• How does a pendulum based on classic control theory compare to a pendulum based
on different machine learning algorithms?

• How will the machine learning algorithms behave when the environment is altered, for
example by changing weight, length and size?

The necessary theory and method will be described in this report before presenting the result
and discussion.

1.4 Delimitations
Considering the time frame of the project, a number of delimitations have been made. Many
different machine learning algorithms exist today; given the time frame, we had to pick three
of them, since comparing them all would take too much time. This work should therefore not be
considered a complete comparison, since many algorithms have not been considered. Also due to
the time frame of the project, the only problem studied is the inverted pendulum. With more
time it would be possible to research different types of problems. For the same reason, the
pendulum is only virtual.

2 Theory

This chapter explains the theory behind the project. It starts off by presenting the main
control theory which has been used to build the controller and the classic model of the
pendulum. Then artificial intelligence is presented as an area in the field of computer
engineering. The area is wide and has several subareas; the one used in this report is machine
learning. After this, the selected algorithms are explained thoroughly. The modeling language
used in this project to build the pendulum, Modelica, is explained last.

2.1 Control theory

Control theory deals with automatic systems and with open-loop and closed-loop systems.
Automatic systems are systems that can work without human supervision. Control theory is an
interdisciplinary subject, which means that it is used within a number of different branches,
for example vehicles, robots and space technology [27]. Without control systems, our
technology today would probably be very different. Control systems are what make many of our
machines work as intended. A control system is often based on a feedback principle: there is
an input signal that is compared to a reference signal, which represents the goal that is to
be reached [9].

To be able to construct a control system, good knowledge of the processes that are to be
regulated is required. The most important factor is the outputs of the system and how they
react to changes in the inputs. The input is some kind of command or stimulus that is applied
to the system. The form of the inputs and outputs can vary. The inputs and outputs need to be
given in order to identify the components of the system. A control system can have more than
one input and output [8]. A process has static and dynamic characteristics. The static
characteristics are the ones that hold in the process's steady state, which means that the
static amplification is the same throughout the operating range of the process. The dynamic
characteristics of a process take into account inertia, time delays and transients. For
example, when you press the pedal of a car it will take some time before the wanted speed is
reached. The static characteristics of different systems are often very alike, while the
dynamic characteristics may differ. Because of this, processes can be divided into different
types. There are processes with dead time, processes with overshoot and so forth. The inverted
pendulum is an unstable process. This means that feedback is required to keep the output close
to the wanted reference signal. [27]

When designing a control system there are a number of steps to go through. First the system to
be controlled must be studied: to be able to build the system one must know what type of
sensors are necessary, what actuators are to be used and where they should be placed. After
identifying all the necessities of the system, it can be modeled. Once the system is modeled,
the properties of the resulting model can be determined and the performance specifications can
be set. Based on these conclusions, the type of controller to be used is chosen and the
controller is designed to meet the specifications. There are different types of controllers
that can be used on different types of systems. Some of these controllers will be explained
more closely in the following sections. After identifying the appropriate controller, the
system can be simulated. [9]

2.1.1 Open-loop and closed-loop control systems

Control systems are generally divided into two categories: open-loop and closed-loop systems.
These systems are distinguished by the control action, which is responsible for activating the
system and producing the output. The word action in the term control action does not
necessarily mean a change, motion or activity. In a system designed to have an object hit a
target, the control action is the distance between the object and the target. A distance is
not an action, but motion is implied in this case by the goal of the object hitting the
target. In an open-loop control system the control action is independent of the output, in
contrast to the closed-loop control system, where the control action depends on the output.
The open-loop control system's ability to perform accurately is entirely determined by its
calibration. When a system is calibrated, the input-output relation is established to obtain a
desired system accuracy. Closed-loop control systems are also known as feedback control
systems. This is the kind of system used in this report, and it is explained more thoroughly
in the following section. [27]

Feedback control systems

Feedback is the main difference that separates the closed-loop control system from the
open-loop one. The feedback permits the output to be compared with the input to the system, so
that the appropriate control action can be formed as a function of the output and input. The
presence of feedback gives the system a number of different properties. For example, a
feedback control system has increased accuracy, and the feedback reduces the effects of
external disturbances or noise.
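The disturbance-reducing property can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the first-order plant dy/dt = -y + u + d, the gain of 9, the disturbance of 0.5 and the function names are all hypothetical.

```python
def simulate(controller, disturbance, reference=1.0, dt=0.01, steps=2000):
    """Euler simulation of a hypothetical plant dy/dt = -y + u + d."""
    y = 0.0
    for _ in range(steps):
        u = controller(reference, y)
        y += dt * (-y + u + disturbance)
    return y


def open_loop(reference, y):
    # Calibrated for the undisturbed plant (static gain 1); ignores the output y.
    return reference


def closed_loop(reference, y, gain=9.0):
    # Same calibration plus a correction proportional to the measured error.
    return reference + gain * (reference - y)


y_open = simulate(open_loop, disturbance=0.5)
y_closed = simulate(closed_loop, disturbance=0.5)
print(f"open loop:   y = {y_open:.3f}")
print(f"closed loop: y = {y_closed:.3f}")
```

For this assumed plant the open-loop output settles at r + d, while the closed-loop output settles at r + d / (1 + gain), so the same disturbance moves the output ten times less when the output is fed back.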

The simplest kind of feedback control system has three components:

1. The object that is to be controlled.

2. A sensor to measure the output from the object.
3. A controller to generate the input to the object.

Figure 2.1: A schematic over a simple feedback control system with exogeneous signals.

As seen in figure 2.1, the output signal is fed back into the system; this is called the
feedback. Apart from the feedback there are some signals coming from the outside; the external
disturbance and the sensor noise are examples of this. These signals are called exogenous
signals [9]. The controller oversees the process and has an input variable, the command input,
which is compared to a set-point that is the wanted outcome. If the input is the same as the
set-point, the control system has reached the goal; if they differ, the controller sends the
object input to the object to tell it how to perform to reach the set-point. [1]

2.1.2 Controllers
The controller is the heart of the control system. Its task is to use the information from the
feedback to create the control signal that will try to decrease the error. The simplest form
of controller is the on-off controller, in which the control signal can only take two
different values. The value of the signal depends on whether the control error is positive or
negative. The on-off controller is simple but not always accurate enough. A more advanced
variant of the on-off controller is the multi-stage controller, which has more than two
stages. On-off and multi-stage controllers are not very common; in many cases a P-, PI- or
PID-controller is used. [27]

P-controller
With proportional control, the variation of the control signal is proportional to the control
error signal, which is the input to the controller. The relationship between the input and
output can be described with the following formula:
\[ u = u_0 + K e \tag{2.1} \]
where u is the control signal, e is the control error signal and u_0 is the set point of the
control signal, i.e. its value when the error is zero. The parameter K is the controller's
amplification, which decides how strongly the controller acts to correct the error.

A P-controller is used as a basic function in most controllers, but is often combined with an
I-controller and a D-controller, which are described in the following sections. The
P-controller gives a softer control than the previously mentioned on-off controllers. There is
no K-value that gives both good speed and high stability; if both speed and stability are
wanted, the controller must be supplemented with a D-controller. The P-controller might be
enough if the requirements are not that high. [27]

I-controller
An I-controller is a controller where the output is an integral of the error.
\[ u(t) = \frac{1}{T_I} \int_0^t e(\tau)\,d\tau \tag{2.2} \]
where T_I is the integration time, which decides the speed of the integration, e is the input
and u is the output. The rate of change of the I-controller's output at a certain time depends
on the size of the error at that time. In an I-controller the control signal's initial value
is set to the set point's value. As long as the error is 0, the control signal stays at its
initial value. If there is an error, the control signal will increase or decrease depending on
whether the error signal is positive or negative. Once the error has been eliminated, the
control signal stops changing. [27]

PI-controllers
Often a P-controller and an I-controller are combined into a PI-controller. In this way the
advantages of both types can be used.
\[ u(t) = K \left[ e(t) + \frac{1}{T_I} \int_0^t e(\tau)\,d\tau \right] \tag{2.3} \]

Figure 2.2: A schematic over a PI-controller.

The amplification now affects both terms in the PI-controller. The integration time T_I in a
PI-controller is deliberately chosen to be large, which results in a slow change in the I-part
of the controller, compared to the P-part, when an error occurs. For a constant error, the
integration time in the PI-controller corresponds to the time it takes for the I-part's output
to match the P-part's output. The constants K and T_I need to be set to suitable values for
the PI-controller to work as intended. [27]

Derivatives and PID-controller

The third part of the PID-controller is the derivative part, the D-part.

\[ u(t) = T_D\, e'(t) = T_D \frac{de(t)}{dt} \tag{2.4} \]
where the derivative time T_D is a constant.

Figure 2.3: A schematic over a PID-controller

The output of the D-part differs from 0 only when the value of the input is changing, that is,
when the derivative of the input differs from 0. If the input changes quickly, the derivative
will be large, and when the input takes a constant value the derivative will be 0. The
derivative part is never present as a single entity in a control system; it is always used
together with some of the previously mentioned parts, for example in a PD-controller or a
PID-controller. [27]

The output of a PID-controller is made up of the outputs from the three different parts, the
P-, I- and D-parts. The relationship between the input e and the output u in a PID-controller
can be described in the following form:
\[ u(t) = K \left[ e(t) + \frac{1}{T_I} \int_0^t e(\tau)\,d\tau + T_D\, e'(t) \right] \tag{2.5} \]

The derivative part of the PID-controller can improve the stability, speed and disturbance
rejection. For the PID-controller to function as intended, the parameters K, T_I and T_D need
to be tuned to suitable values. [27]
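A discrete-time version of equation 2.5 can be sketched as follows. This is a minimal illustration and not the controller used in the thesis: the class name, the rectangular integration, the backward-difference derivative, the test plant dy/dt = -y + u and all parameter values are assumptions chosen only to make the example runnable.

```python
class PIDController:
    """Discrete approximation of u = K * (e + (1/T_I) * integral(e) + T_D * de/dt)."""

    def __init__(self, K, T_I=float("inf"), T_D=0.0):
        # T_I = infinity switches off the I-part, T_D = 0 switches off the D-part.
        self.K, self.T_I, self.T_D = K, T_I, T_D
        self.integral = 0.0
        self.previous_error = None

    def update(self, error, dt):
        self.integral += error * dt  # rectangular approximation of the integral
        if self.previous_error is None:
            derivative = 0.0  # no earlier sample to difference against yet
        else:
            derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.K * (error + self.integral / self.T_I + self.T_D * derivative)


def run(controller, reference=1.0, dt=0.01, steps=2000):
    """Closed-loop Euler simulation of a hypothetical plant dy/dt = -y + u."""
    y = 0.0
    for _ in range(steps):
        u = controller.update(reference - y, dt)
        y += dt * (-y + u)
    return y


y_p = run(PIDController(K=2.0))
y_pi = run(PIDController(K=2.0, T_I=1.0))
y_pid = run(PIDController(K=2.0, T_I=1.0, T_D=0.1))
print(f"P: {y_p:.3f}  PI: {y_pi:.3f}  PID: {y_pid:.3f}")
```

On this assumed plant the P-controller alone settles at K / (1 + K) = 2/3 of the reference, leaving a constant error, while adding the I-part brings the output to the reference.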

2.2 Physics of the pendulum

The inverted pendulum operates in an environment with the following parameters. The cart has a
mass M, and an external force F is applied to its sides. The pendulum itself has a mass m and
is connected to the cart through a rigid massless rod of length l. The pendulum is rotated
from the vertical line by an angle θ in the counterclockwise direction. There is also a
friction force f that works in the opposite direction of the external force, and a
gravitational acceleration g. Figure 2.4 describes the environment.

Figure 2.4: Diagram of the pendulum environment with actuating forces

The system has the freedom to move in two different ways: the cart can move horizontally along
the x-axis and the pendulum can rotate 360 degrees around its pivot point.

2.2.1 Equations of motion

In section 2.2 it was stated that the system has the freedom to move in two different ways.
This leads to two state variables.

x_s = Displacement of the cart on the x-axis relative to the starting position.

θ_s = Angular displacement of the pendulum about the pivot relative to the upright position.

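To derive the equations of motion we used Lagrange's equations (2.6).

\[ \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) - \frac{\partial L}{\partial q_i} = Q_i \tag{2.6} \]

where q_i is a state variable, Q_i is the external force acting on it, and L is the difference
between the kinetic energy (T) and the potential energy (V).

\[ L = T - V \tag{2.7} \]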

The potential energy of the system is the potential energy of the pendulum, since the cart
only moves horizontally and therefore never stores any potential energy.

\[ V = mgl\cos\theta \tag{2.8} \]

Finding the kinetic energy is a little more complicated since it involves both the pendulum
and the cart.
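\[
\begin{aligned}
T &= \tfrac{1}{2} M\dot{x}^2 + \tfrac{1}{2} m V_p^2 \\
  &= \tfrac{1}{2}\left( M\dot{x}^2 + m\left(V_x^2 + V_y^2\right) \right) \\
  &= \tfrac{1}{2}\left( M\dot{x}^2 + m\left( (\dot{x} - l\dot{\theta}\cos\theta)^2 + (l\dot{\theta}\sin\theta)^2 \right) \right) \\
  &= \tfrac{1}{2}\left( (M+m)\dot{x}^2 - m\left( 2\dot{x}l\dot{\theta}\cos\theta - l^2\dot{\theta}^2(\cos^2\theta + \sin^2\theta) \right) \right) \\
  &= \tfrac{1}{2}\left( (M+m)\dot{x}^2 - m\left( 2\dot{x}l\dot{\theta}\cos\theta - l^2\dot{\theta}^2 \right) \right)
\end{aligned}
\tag{2.9}
\]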
Inserting equations 2.8 and 2.9 into equation 2.7 gives L.

\[ L = \tfrac{1}{2}\left( (M+m)\dot{x}^2 - 2m\dot{x}l\dot{\theta}\cos\theta + ml^2\dot{\theta}^2 - 2mgl\cos\theta \right) \tag{2.10} \]

Applying Lagrange's equations (2.6) to the state variables x_s and θ_s gives the equations of
motion for the system.

\[ x_s:\quad (M+m)\ddot{x} - ml\ddot{\theta}\cos\theta + ml\dot{\theta}^2\sin\theta = F \tag{2.11} \]

where F is the external force applied to the state variable x_s.
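The intermediate steps behind equation 2.11 are not written out in the text. Assuming L as given in equation 2.10 and q_i = x in equation 2.6, they can be reconstructed as:

```latex
\begin{aligned}
\frac{\partial L}{\partial \dot{x}} &= (M+m)\dot{x} - ml\dot{\theta}\cos\theta \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) &= (M+m)\ddot{x} - ml\ddot{\theta}\cos\theta + ml\dot{\theta}^2\sin\theta \\
\frac{\partial L}{\partial x} &= 0
\end{aligned}
```

Since L does not depend on x itself, the left-hand side of equation 2.6 reduces to the time derivative on the second line, which is set equal to the external force F.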

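\[
\begin{aligned}
\theta_s:\quad & m\dot{x}l\dot{\theta}\sin\theta + ml^2\ddot{\theta} - m\ddot{x}l\cos\theta - m\dot{x}l\dot{\theta}\sin\theta - mgl\sin\theta \\
&= ml^2\ddot{\theta} - m\ddot{x}l\cos\theta - mgl\sin\theta = 0 \\
&\Rightarrow\quad l\ddot{\theta} - \ddot{x}\cos\theta - g\sin\theta = 0
\end{aligned}
\tag{2.12}
\]

Equation 2.12 is equal to 0 because no external force acts on the state variable θ_s.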
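Equations 2.11 and 2.12 can be solved for the two accelerations and integrated numerically. The sketch below is only an illustration under stated assumptions: the thesis builds its model in OpenModelica, whereas here a hand-written fourth-order Runge-Kutta step is used, friction is left out (as in equations 2.11 and 2.12), and the parameter values and function names are hypothetical.

```python
import math


def accelerations(theta, theta_dot, force, M=1.0, m=0.1, l=0.5, g=9.81):
    """Solve equations 2.11 and 2.12 for the cart and pendulum accelerations."""
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Substituting l*theta_ddot = x_ddot*cos + g*sin (2.12) into (2.11):
    x_ddot = (force + m * g * sin_t * cos_t - m * l * theta_dot**2 * sin_t) / (
        M + m * sin_t**2
    )
    theta_ddot = (x_ddot * cos_t + g * sin_t) / l
    return x_ddot, theta_ddot


def derivative(state, force):
    """Time derivative of the state (x, x_dot, theta, theta_dot)."""
    _, x_dot, theta, theta_dot = state
    x_ddot, theta_ddot = accelerations(theta, theta_dot, force)
    return (x_dot, x_ddot, theta_dot, theta_ddot)


def rk4_step(state, force, dt):
    """One classical Runge-Kutta step with the force held constant."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = derivative(state, force)
    k2 = derivative(shifted(k1, dt / 2), force)
    k3 = derivative(shifted(k2, dt / 2), force)
    k4 = derivative(shifted(k3, dt), force)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(theta0, force=0.0, dt=0.001, steps=500):
    """Start at rest with angle theta0 and integrate for steps * dt seconds."""
    state = (0.0, 0.0, theta0, 0.0)
    for _ in range(steps):
        state = rk4_step(state, force, dt)
    return state


x, x_dot, theta, theta_dot = simulate(theta0=0.05)
print(f"angle after 0.5 s without control: {theta:.3f} rad")
```

With no force applied, the upright position is an equilibrium, but any small initial angle grows, which is the instability that the controller has to counteract.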

2.3 Artificial intelligence

The term artificial intelligence (AI) was first proposed at a conference held at Dartmouth
College in 1956 [6]. Since then the area has gone through a remarkable development. The
cognitive system Watson, developed by IBM, beat the reigning master in Jeopardy in 2011. In
2016 Google's AI system AlphaGo achieved great success in a challenge against Lee Se-dol, one
of the best Go players in the world. Simply put, AI can be defined as the study of intelligent
agents. This includes devices that observe their environment and, based on these observations,
make decisions that maximize the likelihood of achieving a goal. The device should be
programmed to mimic cognitive behaviour of the human brain, such as learning and problem
solving. [29]

AI can be classified into two categories, weak and strong AI. Strong AI is considered to have
human-like, high-level cognitive ability, including for example common sense and
self-awareness. Weak AI, on the other hand, simulates human intelligent processes without real
understanding. Modern AI systems are all at the stage of weak AI; as of today, strong AI does
not exist. [29]

2.4 Machine learning

Machine learning is an area of computer science and AI. It gives a computer the ability to
learn without being explicitly programmed [6]. The area evolved from the study of pattern
recognition and computational learning theory in AI. Machine learning is used for problems
that can be solved using inference and that have large, representative training data [3]. This
is done with different algorithms. Simply explained, an algorithm is a sequence of
instructions that interpret an input and transform it into an output. There are many different
types of algorithms, and the challenge is to find the most efficient one for a specific task.
For some tasks, however, no explicit algorithm is known, for example classifying emails into
spam and not spam. Here the input is the email and the output is a simple yes/no depending on
whether the email is spam or not. The person using the email service can change or affect what
counts as spam, which means that it will change over time [2].

The main motivation for using this type of system is that if a system can learn and adapt to
changes in an environment, the designer of the system does not need to foresee and provide
solutions for all possible situations. Machine learning is used today in many areas, including
pattern recognition, speech recognition and robotics. One of the main themes of pattern
recognition is recognizing faces. It is an easy task for the human brain; a human does it
without effort every day. This is however done unconsciously, which means without awareness,
sensation or cognition. Building a computer program that works with awareness or cognition is
not possible: the program will never make its own decisions, it will take action based on the
code that the programmer has written. To mimic the brain, the algorithm is programmed to look
at the known facts: a face has a pattern, with a nose, eyes and a mouth. These are all placed
at certain positions in the face; there is a structure. Given photos of different faces, a
learning program can analyze the face pattern and recognize a face by checking for the pattern
in each image. [2]

2.5 Supervised Learning

Supervised learning uses models to map inputs to corresponding outputs with the help of
designated training data sets. This can be used to solve classification and regression
problems, which concern predicting discrete and continuous valued outputs, respectively. To
solve a supervised learning problem, one first needs to determine what type of data is going
to be used as the training set. Different types of data require different actions. After the
type of data has been determined, the data needs to be collected; without it the requested
result cannot be achieved. When the data is gathered, the input feature representation of the
learned function needs to be determined. It is important that the structure of the learned
function and the corresponding learning algorithm are thought through and compatible. The
training sets consist of input objects and corresponding outputs and are gathered from human
experts or from measurements. The most common way to decide the input feature representation
of the learned function is to transform the input object into a vector containing features
that are descriptive of the object. [3]

2.5.1 Regression Tree

The regression tree is a long-established machine learning algorithm, also used for imputation of
missing values. The algorithm forms a binary tree model which outputs a representative value by
conditional branching. The tree begins at a root, and the terminal nodes are called leaves. Between
the root and the leaves are the regular nodes. Starting from the root, questions are asked about the
features, and the branches correspond to the answers. The next question is determined by the
previous answer. In the classic version, each question refers to only a single attribute and has a
yes or no answer [14, 12].

Figure 2.5: A diagram of a classic decision tree

The output of a regression tree, the value stored in a leaf, is obtained by making a series of
numerical comparisons rather than asking yes-or-no questions. To train a tree a dataset is needed,
which is divided into a training set and a testing set. The training set is usually composed of
complete data, which determine the structure of the tree, while the testing set has missing values
for certain attributes. The model searches every distinct value of the input data to find the split
value that best separates the data into two regions. After the best split has been found, the
splitting process is repeated on each of the two new regions until a stopping criterion is reached
[14, 12].

A regression tree has a lot of advantages. It is an excellent way for the user to visualize each
step of a decision-making process, which can help with making rational decisions. It is possible
to give priority to a decision criterion. A lot of the undesired data is filtered out in each step,
which leaves a manageable amount of data. It is a very presentable algorithm that is easy to explain.

2.5.2 Linear regression

This method is a linear approach to modeling the relationship between a dependent variable and one
or more independent variables. This is done by fitting a linear equation to the data. Take, for
example, a dataset with two variables X and Y whose data points are plotted in a graph. Linear
regression strives to find the optimal straight line through these data points. The line, which is
called a regression line, contains the predicted value of Y for each possible value of X. Because
the line summarizes the trend of all the data points, the predictions will often not be exact and
will have some errors of prediction [18, 15].

Figure 2.6: A graph showing an example of linear regression

With only one explanatory variable the method is referred to as simple linear regression. If there
is more than one explanatory variable it is referred to as multiple linear regression. It is a
fairly simple algorithm with well-known properties, which makes it popular not only in machine
learning but also in areas such as finance, economics and epidemiology [18, 15].
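
As an illustration of simple linear regression, the regression line can be computed in closed form with ordinary least squares. The following is a minimal sketch, not the implementation used in this thesis; the function name `fit_line` and the four data points are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable.

    Returns (slope, intercept) of the regression line y = intercept + slope * x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data points, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_line(xs, ys)

# The residuals are the "errors of prediction": the line is not exact for any
# single point, but the residuals of a least-squares line sum to zero.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

The residuals make the remark about prediction errors concrete: each point lies slightly above or below the fitted line.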

2.5.3 Artificial Neural Networks

An Artificial Neural Network (ANN) is an attempt to model how a biological brain operates.
There are several different types of ANNs. This section covers the ANN models relevant for the
report, from single-layer networks with perceptron activation functions to more modern networks
built from multiple layers of neurons with ReLU functions. ANNs are considered a viable solution
for problems which have noisy and complex sensor data [21].

Neural networks can be broadly divided into feedforward networks, of which Convolutional Neural
Networks (CNNs) are a well-known example, and Recurrent Neural Networks (RNNs). The difference is
that in a feedforward network the signals flow strictly from input to output, whereas an RNN has
loops inside of it. Every time a neural network is mentioned in this thesis it refers to a
feedforward network unless stated otherwise.

Biological neurons

A human brain consists of a large set of connected nodes (neurons), sometimes also referred to as
nerve cells. There are approximately 20 billion neurons in a human brain, and each neuron is on
average connected to 7 000 other neurons [10]. The neurons transmit and receive signals through an
electrochemical process: first an electrical pulse is sent, and the pulse is then converted into a
chemical message which is transported to other neurons [13]. A neuron receives signals through its
dendrites. The connections through which the signals arrive, the synapses, determine how much
significance each input signal is given. The part that sums up all the input signals and evaluates
whether they reach a certain threshold is the soma. Depending on the outcome, the neuron sends
messages on to other neurons or organs; the part of the neuron that handles this transmission is
the axon [30]. The time a human neuron needs to perform a switch is estimated at $10^{-3}$ seconds,
compared to a computer which has switching times of $10^{-10}$ seconds. Nevertheless, a human brain
can recognize a person in approximately $10^{-1}$ seconds because of the neurons' ability to operate
in parallel [21].

Figure 2.7: Simplified image of a biological neuron

Artificial Neurons

The first mathematical model of a neural network was presented as early as 1943 [20]. Since then
there have been significant improvements and major breakthroughs, but the basic model of a neuron
remains the same. There is an input vector $\{x_1, \ldots, x_n\}$ that works like the dendrites
described in the previous section. There is also a weight vector $\{w_0, \ldots, w_n\}$ which works
like the synapses; each weight is a decimal number between 0 and 1. The weighted inputs are summed,
and the sum is checked against the threshold of the activation function. Depending on whether the
activation function is triggered, different values are sent to the output, represented by an output
vector.

Figure 2.8: Schematic of Artificial Neuron

2.5.4 Backpropagation
To train a network it is necessary to quantify how far its behaviour is from the desired
functionality. Comparing the output of the network to the desired output gives an error value
(also referred to as loss). This is done with an error function, the most common being the sum of
squared errors (SSE):

$$E_{sse} = \sum_{l=1}^{L} \sum_{h=1}^{H} (O_{lh} - Y_{lh})^2 \tag{2.13}$$

where $l = 1, 2, 3, \ldots, L$ indexes the observations and $h = 1, 2, 3, \ldots, H$ indexes the
output nodes. $O$ is the desired output and $Y$ is the observed output.

As mentioned earlier, each neuron has weights associated with it. The behaviour of the network is
altered by adjusting these weights. The weights adjust the importance of certain input values and
change the output of the neurons, and can therefore be used to alter the functionality of the
network. The weights are altered by something called backpropagation. Backpropagation works by
using stochastic gradient descent; the idea is to minimize the error value generated by the error
function. Calculating the partial derivatives of the error function with respect to the weights
($\partial E / \partial w$) makes it possible to move towards a smaller error value, as displayed
in figure 2.9. In each iteration the gradient is calculated and the weights are adjusted in the
direction that reduces the error [16].

Figure 2.9: A graph showing weight altering by using stochastic gradient descent
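
The weight adjustment can be illustrated with a single weight. The sketch below assumes a hypothetical one-weight model y = w * x and made-up observations; it updates the weight after each observation using the derivative of that observation's squared error, in the spirit of stochastic gradient descent. The observations are visited in a fixed order to keep the example reproducible, whereas a stochastic variant would typically shuffle them.

```python
def sse(w, xs, targets):
    """Sum of squared errors of the one-weight model y = w * x."""
    return sum((o - w * x) ** 2 for x, o in zip(xs, targets))


def descend(w, xs, targets, learning_rate, epochs):
    """Adjust w one observation at a time; returns (w, SSE after each epoch)."""
    history = [sse(w, xs, targets)]
    for _ in range(epochs):
        for x, o in zip(xs, targets):
            # d/dw (o - w*x)^2 = -2 * x * (o - w*x)
            gradient = -2.0 * x * (o - w * x)
            # Step against the gradient, towards a smaller error.
            w -= learning_rate * gradient
        history.append(sse(w, xs, targets))
    return w, history


# Hypothetical observations generated by the "true" weight w = 2.
xs = [0.5, 1.0, 1.5, 2.0]
targets = [1.0, 2.0, 3.0, 4.0]
w, history = descend(0.0, xs, targets, learning_rate=0.05, epochs=50)
```

Starting from w = 0, the error in `history` shrinks each epoch as w approaches 2. A learning rate that is too large would instead make the updates overshoot.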

2.5.5 Activation functions

As mentioned in the previous section, there is an activation function that decides whether or
not the input was larger than a certain threshold. In recent years there has been a lot of research
in this area, which has resulted in numerous different functions. This section briefly describes
the functions that have been used historically, the ones in use today and some of the drawbacks of
the different activation functions. There have also been experiments with using different
activation functions for different layers [28].

Perceptron
Old ANN systems used an activation function called the perceptron. It outputs a binary value: 1 if
the result is bigger than a threshold and -1 otherwise. Since the output is binary it can be
considered a Boolean value. Given inputs $x_1$ to $x_n$ and assuming $x_0 = 1$, the output
$o(x_1, \ldots, x_n)$ can be written as:

$$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{k=0}^{n} w_k x_k > 0 \\ -1 & \text{otherwise} \end{cases} \tag{2.14}$$

where $w_k$ is a real-valued constant (weight) that determines how much each input contributes.

As mentioned earlier, the output of a perceptron can be interpreted as a Boolean value depending
on the value of the weighted sum of the inputs. Figure 2.10a shows a representation of this.
Figure 2.10b shows a limitation of the perceptron: there is no way to model an XOR function with a
single perceptron, since the XOR function is not linearly separable. This has led to the evolution
of more complex functions that can be used when modeling neural networks [21].
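
The perceptron of equation 2.14 and its limitation can be sketched in a few lines. The weights for AND below are hand-picked for illustration, and the truth tables use -1 for false to match the perceptron output. The search over a small weight grid is only a demonstration that no candidate reproduces XOR; it is not a proof, although the result agrees with the fact that XOR is not linearly separable.

```python
from itertools import product


def perceptron(weights, inputs):
    """Equation 2.14: weights[0] multiplies the constant input x0 = 1."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else -1


def matches(weights, truth_table):
    """True if the perceptron reproduces every row of the truth table."""
    return all(perceptron(weights, x) == y for x, y in truth_table.items())


AND = {(0, 0): -1, (0, 1): -1, (1, 0): -1, (1, 1): 1}
XOR = {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1}

# Hand-picked weights: the sum only exceeds 0 when both inputs are 1.
and_weights = (-1.5, 1.0, 1.0)

# Try every weight triple on a coarse grid from -3 to 3 in steps of 0.5.
grid = [i / 2 for i in range(-6, 7)]
xor_solutions = [w for w in product(grid, repeat=3) if matches(w, XOR)]
```

Here `matches(and_weights, AND)` holds, while `xor_solutions` stays empty for the grid that was searched.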

(a) (b)

Figure 2.10: Graphs of hyperplane decision surface for a single layer perceptron network

Sigmoid
The sigmoid receives a real-valued input and produces a real-valued output, in contrast to the
perceptron activation function, which outputs a binary value [19]. The sigmoid function is
described in equation 2.15 and plotted in figure 2.11.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{2.15}$$

As seen in the plot, the function squashes the input value to a value between 0 and 1. Looking at
figure 2.11 and recalling how backpropagation works (by calculating gradients), it is clear that
this function will have a hard time converging for inputs of large magnitude, since the gradient
vanishes as the input approaches $-\infty$ or $\infty$. Computing exponential functions is also
computationally expensive, which is important to avoid when training large networks.

Figure 2.11: Graph showing a plot of the sigmoid activation function

Tanh
The tanh activation function is a rescaled version of the sigmoid that gives larger gradients when
the data is centered around 0. This can be visualised by plotting the tanh function: in figure 2.12
the output lies in the range [-1, 1], whereas the output of the sigmoid lies in [0, 1] [17].
Equation 2.16 expresses the tanh function in terms of the sigmoid function 2.15, denoted $\sigma(x)$:

$$\tanh(x) = 2\sigma(2x) - 1 \tag{2.16}$$

Figure 2.12: Graph showing the tanh activation function
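
The relation between the two functions and their gradients can be checked numerically. The following is a small sketch using only the Python standard library; the function names are chosen for this example.

```python
import math


def sigmoid(x):
    """Equation 2.15."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_gradient(x):
    """Derivative of the sigmoid: f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_from_sigmoid(x):
    """Equation 2.16: tanh expressed through the sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def tanh_gradient(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# Compare gradients at the origin and far out on the positive axis.
gradients = {x: (sigmoid_gradient(x), tanh_gradient(x)) for x in (0.0, 10.0)}
```

At x = 0 the sigmoid gradient is 0.25 while the tanh gradient is 1, which is the sense in which tanh gives larger gradients for data centered around 0. At x = 10 both gradients are close to zero, illustrating the vanishing gradient for inputs of large magnitude.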

2.5.6 Hyperparameters
A number of different learning algorithms exist today, and these algorithms often have sets of
hyperparameters. A hyperparameter is a parameter that has to be set appropriately by the user for
the algorithm to perform to its full extent. Hyperparameters influence the algorithm and its
performance a great deal and are therefore used to configure different aspects of the algorithm.
The tuning of hyperparameters is generally done manually, which can be time-consuming work and
hard for others to reproduce. The number of hyperparameters can vary substantially, but usually
only a few of them impact the performance. Identifying in advance which of the parameters have
this kind of impact is hard [5].

There are a number of different methods for optimizing hyperparameters, for example grid search,
random search, Bayesian optimization and evolutionary optimization. The one used in this paper is
grid search, which is one of the most popular methods. The method searches through a
user-specified subset of the hyperparameter space of the learning algorithm. This space may
include real-valued parameters, which makes it necessary to manually set bounds for the search [4].

2.6 Modelica
Modelica is a freely available language which is used for modeling different systems. The
development of the language has been ongoing since 1996. The language is object-oriented and is
suited for a number of different multi-domain models. The models in Modelica are described
mathematically by algebraic, differential and discrete equations, and from a user's point of view
they are described by schematics. These schematics consist of components that have connectors
which describe the possible interactions. A diagram model is made by drawing connection lines
between connectors on different components. To graphically edit and browse a Modelica model, a
Modelica simulation environment is needed. This environment is also used to perform model
simulations and other analyses.

2.7 FMI - Functional Mock-up Interface

Functional Mock-up Interface (FMI) is a tool-independent standard that supports both model
exchange and co-simulation of dynamic models. This is done by using a combination of XML files and
compiled C code. The work on FMI was started by the European project MODELISAR in July 2008. The
goal was to improve the design of systems and of embedded software in vehicles. The purpose has
broadened since 2008, and the intention is now that dynamic system models from different software
systems can be used together for different simulation models in both cyber-physical systems and
other applications. FMI functions are called by a simulation environment to create and simulate
one or more executables called Functional Mock-up Units (FMUs). An FMU can require the simulation
environment to perform numerical integration or have its own solvers. The goal is that calling an
FMU from a simulation environment should be fairly simple [26].

2.8 Related Works

There have been numerous experiments on balancing a pendulum with machine learning,
especially with neural networks. In [24], [22], [23] and [7] this was performed successfully.

3 Method

To address the research questions in section 1.3, a virtual model of the pendulum was created,
which allowed for repeated runs with different parameters for the different algorithms. For all
experiments a PID-controller was used as the reference point. To allow the algorithms to run on a
real-time hardware system, they were implemented in C++ to achieve fast execution.

3.1 Virtual model of pendulum

The model was written by deriving the equations of motion mathematically, which gave a strong
understanding of the physics of the pendulum. The C++ library used to access the FMU model is
FMI4cpp, chosen because of its focus on being easy to set up and use [11]. The virtual pendulum
was simulated with a step size of 0.001 seconds, and the total simulation time for one simulation
was 10 seconds. Both of these parameters were selected after trial-and-error investigations.

3.1.1 Implementation in OpenModelica

Equations 2.11 and 2.12 are used to simulate the behaviour of the pendulum and the cart
where the input force is F in equation 2.11. The implementation of the inverted pendulum
can be seen below.
model InvertedPendulum
  import SI = Modelica.SIunits;

  parameter SI.Mass M = 1;            // mass of the cart
  parameter SI.Mass m = 1.5;          // mass of the pendulum
  parameter SI.Length L = 1.0;        // length of the pendulum axis
  parameter SI.Acceleration g = 9.82;

  SI.Angle theta(start=3.14);
  SI.AngularVelocity theta_velo = der(theta);
  SI.AngularAcceleration theta_accel = der(theta_velo);
  SI.Position x_pos(start=0);
  SI.Velocity x_vel = der(x_pos);
  SI.Acceleration x_acc = der(x_vel);

  input SI.Force F(start=0);          // external force applied to the cart
  output SI.Angle Y = theta;
equation
  (M + m) * x_acc - m * L * theta_accel * cos(theta) + m * L * theta_velo^2 * sin(theta) = F;
  L * theta_accel - x_acc * cos(theta) - g * sin(theta) = 0;

end InvertedPendulum;
Listing 3.1: Modelica code for pendulum

3.1.2 Altering of the environments

When considering which variables to change, we decided that the mass of the cart (M), the mass of
the pendulum (m), the length of the pendulum axis (L) and the displacement of the angle at the
start were the most suitable. Our result is measured by the displacement of θs , and the PID
controller we are using operates on the offset of the angle. Because of this we do not make any
changes to the initial values of xs or take measurements from it.

The angle offset was initialized with the values {90, 60, 45, 30, 20, 14, 9, 4, 0} degrees to
cover the possible scenarios a controller could face in a varying environment. We did not consider
the opposite angles {-90, -60, ..., 0} since the target function for those values is expected to
be a mirror image of that for the values chosen.

3.2 PID Controller

The integral part of the PID controller was achieved by having an internal state that accumulates
the error, referred to as ITerm in the pseudo code below. The derivative is obtained by saving the
last error input and calculating the change over the last time step; as mentioned earlier, our
step size (dt in the code below) is 0.001. There are also two hyperparameters used for limiting
the output of the controller, out_min and out_max. We set these to -100 and 100 respectively, as
these were seen as good thresholds after initial test runs. All the coefficient values are set
through a grid search over all parameters.

hyperparameters: Kp, Ki, Kd, out_max, out_min

output : External force F for xs
Function PID(error)
    ITerm += error
    output = Kp * error + Ki * ITerm - Kd * (error - old_error) / dt
    old_error = error    // remember the error for the next derivative
    if output > out_max then
        output = out_max
    else if output < out_min then
        output = out_min
    end
    return output
end
Algorithm 1: Pseudo code of the PID controller
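
The controller can also be sketched as a small runnable class. This is a hedged illustration in Python rather than the C++ implementation used in the project; the class name and its defaults (dt = 0.001, output limits of -100 and 100) are taken from the description above, and the gains in the usage lines are arbitrary. Note that the sketch adds the derivative term, the common textbook convention, whereas Algorithm 1 subtracts it; the two are equivalent when Kd is chosen with the opposite sign.

```python
class PID:
    """Discrete PID controller with a clamped output (illustrative sketch)."""

    def __init__(self, kp, ki, kd, dt=0.001, out_min=-100.0, out_max=100.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.out_min, self.out_max = out_min, out_max
        self.i_term = 0.0      # accumulated error (the ITerm of Algorithm 1)
        self.old_error = 0.0   # error from the previous time step

    def step(self, error):
        # Integral part: running sum of the error, as in Algorithm 1.
        self.i_term += error
        # Derivative part: change of the error over the last time step.
        derivative = (error - self.old_error) / self.dt
        output = self.kp * error + self.ki * self.i_term + self.kd * derivative
        self.old_error = error
        # Clamp the force to the allowed range.
        return max(self.out_min, min(self.out_max, output))


# Usage with arbitrary example gains: a proportional-only controller.
controller = PID(kp=2.0, ki=0.0, kd=0.0)
force = controller.step(3.0)          # proportional response
saturated = controller.step(1000.0)   # limited by out_max
```

Keeping the accumulated error and the previous error as object state mirrors the internal state described for the integral and derivative parts.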

3.3 Choosing algorithms

When choosing which algorithms to use in the experiment we looked at a number of different
factors. The first aspect discussed was future relevance. We wanted an algorithm that is modern
and in current use, for the sake of the usability and further development of the study. As a
contrast we also wanted an older, more well-researched and proven algorithm. We also considered
using algorithms of different complexity, since we found it interesting to see how algorithms of
different complexity would adapt and learn in different environments. As the current and modern
algorithm we settled on a neural network. Even though neural networks have been discussed for a
long time, they are among the most prominent and fastest-growing techniques in AI today. Further
reasons for choosing the neural network are that it can learn and model complex and non-linear
relationships, that it can generalize and predict on unseen data, and that it does not impose
restrictions on the input variables.

As a contrast to the complex neural network algorithm we wanted a simple and well-established
algorithm. We considered different ones but settled on linear regression. This algorithm is very
common and easy to use and implement, and we thought that the contrast to the neural network
would make for interesting discussion. We wanted three algorithms in order to make a deeper
comparison, and for the last algorithm we wanted a middle ground between the neural network and
linear regression. Here we settled on a regression tree. This algorithm is fairly complex but easy
to understand and explain. We thought that the tree structure seemed interesting and found it
intriguing to see how this algorithm would compare to the first two algorithms we picked. We felt
that all three algorithms were dissimilar enough to give us a bigger picture of how different
kinds of algorithms work on the same kind of problem.

3.4 Training data

To train our algorithms, training data was needed. The same sets of training data were used for
all three machine learning algorithms. The training data was extracted by simulating the virtual
model with the PID-controller acting as the point of reference. We ran 6 simulations with starting
values for θ of -60, -40, -30, 30, 40 and 60 degrees offset from vertical upright. For each step
of these simulations we took the values of θ and θ̇ as the input and the output from the PID
controller as the target of a data point. This gave us 60 000 data points. Since a majority of
these training points were for a pendulum balanced upright with nothing affecting it, we decided
to remove all data points gathered after 4.5 seconds into the simulation. This was done to avoid
overfitting issues and left us with 27 500 data points to be used for fitting and testing of the
algorithms.

3.5 Implementing the learning algorithms

This section presents the implementations of the different algorithms used throughout the
project. Both pseudo code and text are used to describe the implementations. When implementing
the algorithms the focus was on creating algorithms that are easy to understand; since the
training phase was not run during the simulation, it did not need to be optimized with regard to
time. All pseudo code was derived from the theory part.

For simplicity, the Mean Squared Error (MSE) loss function was used as the error function for all
algorithms. MSE is equal to the mean of the sum of squared errors that was mentioned briefly in
section 2.5.4. The MSE was selected because its derivative is easy to calculate for the linear
regression. The MSE is:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mathrm{predict}(x_i))^2 \tag{3.1}$$

where predict is the function used by the specific algorithm and $N$ is the number of data points.
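
As a small illustration, the MSE can be computed as follows. This is a sketch with a made-up function name and made-up numbers, where the predictions are assumed to have been produced already by one of the algorithms.

```python
def mse(targets, predictions):
    """Mean of the squared differences between targets and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    squared_errors = [(y - p) ** 2 for y, p in zip(targets, predictions)]
    return sum(squared_errors) / len(targets)


# Two exact predictions and one that is off by 2: (0 + 0 + 4) / 3.
example = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the errors are squared, a single large miss contributes more to the loss than several small ones.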

3.5.1 Neural Network

Neural networks (NN) are a large field and there is a lot that can be done for optimization. In
[22] a pendulum was stabilized virtually with a neural network using an energy swing-up and a PID
controller for stabilizing. The length of the cart rail was, however, assumed to be infinite in
order to reduce the input variables to only θ and θ̇. One hidden layer with 25 neurons was used,
together with the sigmoid activation function. In [23] an NN was also used for stabilizing a
pendulum virtually, but there the length of the cart rail was considered. There were therefore
four input variables to the system: x, ẋ, θ and θ̇. The pendulum was successfully controlled
using only 4 neurons in the hidden layer of the NN, although using up to 20 neurons in the hidden
layer did generate better results.

The network was chosen to have one hidden layer with 25 neurons, as in [22], because we used the
same input variables and derived our training data in the same way. The activation function
chosen was the tanh function 2.16 rather than the sigmoid function 2.15: since our output data was
centered around 0, tanh would provide better gradients. The computation saved by using ReLU was
not prioritized, since the amount of data was not large enough to motivate its use. ReLU is
piecewise linear, so it would give derivatives of either 0 or 1; using tanh was thought to make
the gradients more dependent on the target output and therefore yield better convergence.

Figure 3.1: Schematic of the neural network structure

Since this was not a classification problem there was no need for more than one output neuron;
giving this last neuron a linear output function produced the expected output. During training,
the backpropagation was done from the back to the front, so that the gradients of the later layers
were available when the gradients of the previous neurons were calculated. In the prediction,
correspondingly, the layers were traversed from front to back. The weights were updated only after
all changes had been calculated, so that no change affected other nodes during the same iteration.

The derivative of tanh was used in the backpropagation and is equal to:

$$\frac{d}{dx} \tanh x = 1 - \tanh^2 x \tag{3.2}$$

For the neural network we used three hyperparameters: learning rate, momentum and epochs. The
learning rate is a factor that scales each weight change so that the update does not overshoot.
Momentum was used to converge more quickly towards the goal output when multiple training points
indicate the same gradients. Epochs is the number of times the training data is re-used during a
training session. Their respective settings are discussed in section 3.6.

hyperparameters: learning_rate, momentum, epochs

output : trained NN
Function Train(data_points)
    foreach data_points i do
        error = mse(y, Predict(i))
        // Calculate gradients, from the last layer to the first
        foreach layer l do
            foreach neuron n in l do
                if n == output_neuron then
                    if error > 0 then
                        n.gradient = 1
                    else if error < 0 then
                        n.gradient = -1
                    else
                        n.gradient = 0
                    end
                else
                    sum = 0
                    foreach neuron m in layer [l+1] do
                        sum += m.gradient * n.weight[m]
                    end
                    n.gradient = Derivative(n.output) * sum
                end
            end
        end
        // Update weights
        foreach layer l do
            foreach neuron n in l do
                foreach neuron m in layer [l+1] do
                    old = delta_w
                    delta_w = learning_rate * m.output * n.gradient + momentum * old
                    n.weight[m] += delta_w
                end
            end
        end
    end
end
Algorithm 2: Pseudo code of the training function for Neural Network

output: float
Function Predict(data_point)
    foreach neuron n in layer [0] do
        n.output = data_point[n] * n.weight
    end
    // Hidden and output layers
    foreach layer l where l.index > 0 do
        foreach neuron n in l do
            sum = 0
            foreach neuron m in layer [l-1] do
                sum += m.output * m.weight[n]
            end
            if n == output_neuron then
                n.output = sum
            else
                n.output = TransferFunction(sum)
            end
        end
    end
    return output_neuron.output
end
Algorithm 3: Pseudo code of the prediction function for the neural network
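
To make the structure concrete, the following is a minimal runnable sketch of a network with one hidden tanh layer and one linear output neuron, trained per data point with momentum. It is an illustration under several assumptions and not the C++ implementation used in the project: the class `TinyNet` is hypothetical, each neuron has a bias term, the output gradient is the prediction error itself rather than its sign, and the training data is a made-up linear target instead of the pendulum data.

```python
import math
import random


class TinyNet:
    """One hidden tanh layer and one linear output neuron (illustrative)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last entry of every weight list is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, needed for the momentum term.
        self.v_hidden = [[0.0] * (n_inputs + 1) for _ in range(n_hidden)]
        self.v_out = [0.0] * (n_hidden + 1)

    def forward(self, x):
        # zip stops at the shorter sequence, so the bias ws[-1] is added separately.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + ws[-1])
                  for ws in self.w_hidden]
        out = sum(w * h for w, h in zip(self.w_out, hidden)) + self.w_out[-1]
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, y, learning_rate, momentum):
        hidden, out = self.forward(x)
        err = out - y  # derivative of 0.5 * (out - y)^2 with respect to out
        # Compute all gradients first, back to front, before changing any weight.
        grad_out = [err * h for h in hidden] + [err]
        grad_hidden = []
        for j, h in enumerate(hidden):
            delta = err * self.w_out[j] * (1.0 - h * h)  # tanh' = 1 - tanh^2
            grad_hidden.append([delta * xi for xi in x] + [delta])
        # Update the weights, re-using part of the previous change (momentum).
        for j, g in enumerate(grad_out):
            self.v_out[j] = momentum * self.v_out[j] - learning_rate * g
            self.w_out[j] += self.v_out[j]
        for j, grads in enumerate(grad_hidden):
            for k, g in enumerate(grads):
                self.v_hidden[j][k] = momentum * self.v_hidden[j][k] - learning_rate * g
                self.w_hidden[j][k] += self.v_hidden[j][k]


def mean_squared_error(net, rows, targets):
    return sum((net.predict(x) - y) ** 2 for x, y in zip(rows, targets)) / len(rows)


# Made-up training data: two inputs and a linear target.
rng = random.Random(1)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
targets = [2.0 * a - b for a, b in rows]

net = TinyNet(n_inputs=2, n_hidden=5)
before = mean_squared_error(net, rows, targets)
for _ in range(50):  # epochs
    for x, y in zip(rows, targets):
        net.train_step(x, y, learning_rate=0.02, momentum=0.01)
after = mean_squared_error(net, rows, targets)
```

As in the description above, all gradients are calculated before any weight is changed, so no update affects another gradient within the same iteration.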

3.5.2 Linear Regression

When implementing the linear regression (LR) there was not as much flexibility as in the case of
the NN. The gradient for each feature was derived from equation 3.1 by writing out the MSE with
the prediction function for linear regression expanded:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \Big( y_i - m - \sum_{j=1}^{F} k_j x_{ij} \Big)^2 \tag{3.3}$$

where $m$ represents the bias, $k_j$ is the regression coefficient of feature $j$ and $x_{ij}$ is
the value of feature $j$ in input data point $i$. $F$ is the number of features, which in our case
is two, and $N$ is the total number of data points. Equation 3.3 allows us to derive the gradients
for a specific feature $f$ and for $m$:

$$\begin{bmatrix} \dfrac{\partial \mathrm{MSE}}{\partial k_f} \\[2ex] \dfrac{\partial \mathrm{MSE}}{\partial m} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{N} \sum_{i=1}^{N} -2 x_{if} \Big( y_i - m - \sum_{j=1}^{F} k_j x_{ij} \Big) \\[2ex] \dfrac{1}{N} \sum_{i=1}^{N} -2 \Big( y_i - m - \sum_{j=1}^{F} k_j x_{ij} \Big) \end{bmatrix} \tag{3.4}$$

The start value of m (bias) was initialized to the midpoint between the maximum and minimum
output value.

hyperparameters: learning_rate, epochs

output : trained linear regression
Function Train(data_points)
    m = (max(data_points_y) + min(data_points_y)) / 2
    for i = epochs; i > 0; i-- do
        change_m = 0
        foreach feature f do
            change_f = 0
        end
        foreach data_points d do
            foreach feature f do
                change_f += learning_rate * DerivateLoss_f(d)
            end
            change_m += learning_rate * DerivateLoss_m(d)
        end
        foreach feature f do
            k_f = k_f - change_f
        end
        m = m - change_m
    end
end
Algorithm 4: Pseudo code of the training function for Linear Regression

output: float
Function Predict(data_point)
    value = m
    foreach feature f do
        value += data_point_f * k_f
    end
    return value
end
Algorithm 5: Pseudo code of the prediction function for Linear Regression
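
The training and prediction functions can be sketched in runnable form as follows. This is a hedged Python illustration, not the C++ code of the project. It applies the standard MSE gradients, averaged over all data points once per epoch, and starts the bias at the midpoint of the target range as described above. The data set is a made-up noise-free plane, so the fitted parameters can be compared against known values.

```python
def predict(m, k, x):
    """Linear prediction: bias plus weighted sum of the features."""
    return m + sum(kj * xj for kj, xj in zip(k, x))


def train(rows, targets, learning_rate, epochs):
    """Batch gradient descent on the MSE; returns (m, k)."""
    n = len(rows)
    # Start the bias halfway between the largest and smallest target.
    m = (max(targets) + min(targets)) / 2.0
    k = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad_m = 0.0
        grad_k = [0.0] * len(k)  # reset the accumulated gradients every epoch
        for x, y in zip(rows, targets):
            residual = y - predict(m, k, x)
            grad_m += -2.0 * residual / n
            for f, xf in enumerate(x):
                grad_k[f] += -2.0 * xf * residual / n
        m -= learning_rate * grad_m
        k = [kf - learning_rate * g for kf, g in zip(k, grad_k)]
    return m, k


# Made-up data on the plane y = 3 + 2*a - b, sampled on a 5 x 5 grid.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
rows = [[a, b] for a in grid for b in grid]
targets = [3.0 + 2.0 * a - 1.0 * b for a, b in rows]
m, k = train(rows, targets, learning_rate=0.1, epochs=500)
```

With two features, as in this thesis, `k` holds two coefficients. On this toy data the fit recovers the bias and coefficients of the generating plane closely.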

3.5.3 Regression Tree

Both the training and the prediction algorithm were implemented recursively, since binary tree
structures are well suited to recursive functions and time efficiency was not a priority in the
training phase. Right and left are the children of a specific node.

hyperparameters: max depth, leaf k

output : trained tree
Function TrainRecursive(data_points, element, parent = nullptr)
    if (parent != nullptr) and (parent.depth + 1 >= max depth) then
        return
    end
    foreach feature f do
        foreach data_points[f] i do
            // Candidate split value: mean of two consecutive values
            node[i].value = mean(data_points_f[i], data_points_f[i + 1])
            foreach data_points[f] j do
                node[i].residual += residual(node[i].value, j)
            end
        end
        best_node_f = min(node.residual)
    end
    element = min(best_node)
    foreach data_points i do
        if i < element then
            leftdata.append(i)
        else
            rightdata.append(i)
        end
    end
    if (size(leftdata) > leaf k) and (size(rightdata) > leaf k) then
        TrainRecursive(leftdata, element.left, element)
        TrainRecursive(rightdata, element.right, element)
    end
end
Algorithm 6: Pseudo code of the recursive training function for Regression Tree

output: float
Function PredictRecursive(data_point)
    if has children then
        if data_point_f < value then
            return leftchild.PredictRecursive(data_point)
        else
            return rightchild.PredictRecursive(data_point)
        end
    else
        return value
    end
end
Algorithm 7: Pseudo code of the recursive prediction function for regression tree
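
A compact runnable sketch of the same idea is given below. It is an illustration in Python with hypothetical function names and a made-up data set, and it makes some assumptions of its own: split candidates are the midpoints between consecutive distinct feature values, the quality of a split is the summed squared residual around the mean of each side, and every node stores the mean target of its data as its prediction value.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Summed squared residuals around the mean."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(rows, targets):
    """Return the (feature, threshold) with the smallest residual, or None."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds lie halfway between consecutive values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def train(rows, targets, max_depth, leaf_k, depth=0):
    node = {"value": mean(targets)}
    split = best_split(rows, targets) if depth < max_depth else None
    if split is None:
        return node
    f, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[f] < threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] >= threshold]
    # Only split when both sides keep more than leaf_k data points.
    if len(left) > leaf_k and len(right) > leaf_k:
        node["feature"], node["threshold"] = f, threshold
        node["left"] = train([r for r, _ in left], [t for _, t in left],
                             max_depth, leaf_k, depth + 1)
        node["right"] = train([r for r, _ in right], [t for _, t in right],
                              max_depth, leaf_k, depth + 1)
    return node


def predict(node, row):
    # Follow the comparisons until a leaf is reached.
    while "feature" in node:
        side = "left" if row[node["feature"]] < node["threshold"] else "right"
        node = node[side]
    return node["value"]


# Made-up step function: target 0 for x < 5 and 10 for x >= 5.
rows = [[float(x)] for x in range(10)]
targets = [0.0] * 5 + [10.0] * 5
tree = train(rows, targets, max_depth=3, leaf_k=1)
```

On this step-shaped data the first split falls at 4.5, and both halves become leaves because any further split would leave fewer than the required number of points on one side.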

3.6 Selection of hyper parameters for the algorithms

In order to select the hyper parameters for the different machine learning algorithms we
performed a grid search through the following values and selected the value for which the
mean squared error was smallest.

For Neural Network:

Epochs = {1, 5, 10, 15, 20, 25, 30, 35, 40}

Momentum = {0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.03}
Learning rate = {0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.03}

For Linear Regression:

Epochs = {1, 5, 10, 15, 20, 25, 30, 35, 40}

Learning rate = {0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.03}

For Regression tree:

Depth = {10, 20, 50}

Leaf k = {10, 50, 100}
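
The grid search itself can be sketched as an exhaustive loop over all combinations. The example below uses the linear regression grid listed above. The scoring function `toy_score` is a hypothetical stand-in: in the actual procedure the score would be the mean squared error obtained after training with the given settings.

```python
from itertools import product


def grid_search(grid, score):
    """Evaluate every combination in the grid and keep the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


# The grid used for linear regression in this section.
LR_GRID = {
    "epochs": [1, 5, 10, 15, 20, 25, 30, 35, 40],
    "learning_rate": [0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.03],
}


def toy_score(params):
    # Hypothetical stand-in for "train the model and return its MSE".
    return (params["epochs"] - 20) ** 2 + (params["learning_rate"] - 0.008) ** 2


best_params, best_score = grid_search(LR_GRID, toy_score)
```

The number of evaluations is the product of the grid sizes, here 9 * 7 = 63 combinations, which is why the grids are kept small.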

3.7 Comparing the algorithms

To answer our first research question, the data from the pendulum simulation based on the
PID-controller (classic control theory) was compared to the data from the simulations with the
different machine learning algorithms. The data was plotted in graphs so that the results could be
compared. We compared how long it took each algorithm to balance the pendulum (if it ever did) in
an upright position with less than 1 degree of error. The algorithms were then compared with
regard to how much bias and variance each of them has and how this could have affected the
respective results.
When different weights were added to the pendulum, the same comparisons were made to determine
how the different algorithms performed when the environment was altered.

4 Results

In this chapter the results of the different simulations are presented in graphs. The starting
angle is different in each graph and is shown above the graph. The first section presents the
result for the base pendulum, where no changes have been made. After this, 5 different
environments in which different changes have been made are presented in the same way.

4.1 Base pendulum

The results of the simulations are presented in the graphs below. The parameters of the pendulum
can be seen at the top of the image. In the simulation of the base pendulum the cart mass is 1.0,
the pendulum mass is 0.3 and the pendulum length is 0.5. The graphs show that the linear
regression algorithm, together with the tree algorithm, has the worst performance. The tree
algorithm does not succeed in balancing the pendulum even when starting in an upright position
(balanced starting state). The controller that comes closest to the PID controller is the neural
network, which mimics the PID-controller for all starting angles.

Figure 4.1: Graphs of simulation results for the base pendulum with different start angles.

4.2 Data from altered environments

Here the data from our altered environments is presented. In total, 5 altered environments
have been studied. In each environment one or more of the three parameters shown at the top
of each image have been changed: cart mass, pendulum mass and pendulum length.
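
The base pendulum and the five altered environments can be summarised as parameter sets. This is a minimal sketch: the values are the ones stated in sections 4.1 and 4.2.1 to 4.2.5, while the names `PendulumParams`, `BASE`, `ALTERED` and `changed_parameters` are illustrative. Units are not stated in this chapter, and parameters that a section does not mention are assumed to keep their base values.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PendulumParams:
    cart_mass: float
    pendulum_mass: float
    pendulum_length: float


BASE = PendulumParams(cart_mass=1.0, pendulum_mass=0.3, pendulum_length=0.5)

# Altered environments 1 to 5, as (cart mass, pendulum mass, pendulum length).
ALTERED = {
    1: PendulumParams(1.0, 1.0, 0.5),
    2: PendulumParams(1.0, 1.0, 1.0),
    3: PendulumParams(2.0, 0.3, 2.0),
    4: PendulumParams(2.0, 0.3, 0.5),
    5: PendulumParams(2.0, 1.0, 2.0),
}


def changed_parameters(env, base=BASE):
    """List the parameter names whose values differ from the base pendulum."""
    return [f.name for f in fields(env)
            if getattr(env, f.name) != getattr(base, f.name)]


if __name__ == "__main__":
    for number, env in ALTERED.items():
        print(number, changed_parameters(env))
```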

4.2.1 Altered environment 1

In this environment we have changed the pendulum mass from 0.3 to 1.0. The linear
regression algorithm and the tree algorithm again have the worst performance. The tree
algorithm manages to balance the pendulum for starting offsets smaller than 9 degrees, but not
when starting in a balanced state. The controller using the linear model only manages to
balance the pendulum when starting in a balanced state. The neural network controller performs
almost identically to the PID-controller: a little better when starting with a 90 degree offset
and a little worse when starting with a 60 degree offset.

Figure 4.2: Graphs containing the different simulation results in altered environment 1 with
changed pendulum mass

4.2.2 Altered environment 2

In this environment we have changed the pendulum mass from 0.3 to 1.0 and the pendulum
length from 0.5 to 1.0. The controllers' performance is very close to that in the first
environment, with the exception that the controller using a neural network performs slightly better
than the PID-controller. This is most visible when starting with large offset values on the

Figure 4.3: Graphs containing the different simulation results in altered environment 2 with
changed pendulum mass and pendulum length

4.2.3 Altered environment 3

In this environment the cart mass has been changed from 1.0 to 2.0 and the pendulum length
has been changed from 0.5 to 2.0 compared to the base environment. Here we can see that
for the simulations that started with large offset values the neural network significantly
outperformed the PID-controller: the neural network managed to balance the pendulum when
starting with a 90 degree offset, while the PID-controller failed for the same angle. For the
first time, the regression tree algorithm managed to balance the pendulum when starting in an
upright position, and it also succeeded for the 9 and 4 degree offsets. The linear controller
did not succeed in any simulation except when starting in a balanced state.

Figure 4.4: Graphs containing the different simulation results in altered environment 3 with
changed cart mass and pendulum length

4.2.4 Altered environment 4

In this environment the mass of the cart is the only thing that has been changed, from 1.0
to 2.0. The graphs show a performance very similar to that in altered environment 2. The
regression tree algorithm is again unable to balance the pendulum when starting in an upright
position but succeeds with an offset of 9 or 4 degrees. The linear controller is not able to
balance the pendulum in any simulation other than the one starting in an upright position. The
PID-controller and the neural network follow each other closely, with a slight advantage for
the neural network.

Figure 4.5: Graphs containing the different simulation results in altered environment 4 with
changed cart mass

4.2.5 Altered environment 5

For the last environment we have changed the mass of the cart to 2.0, the length of the
pendulum to 2.0 and the mass of the pendulum to 1.0. The neural network controller deviates
from the PID-controller but does not perform better. The tree algorithm only manages to
balance the pendulum when the starting offset is 4 degrees, while the linear regression
controller only manages to balance the pendulum when starting in a balanced state.

Figure 4.6: Graphs containing the different simulation results in altered environment 5 with
changed cart mass, pendulum mass and pendulum length

5 Discussion

The discussion is grounded in the theory and results of the paper. First the results are
discussed by going over the different algorithms one by one. Speculations about the outcome
are made with connection to the theory. The method is then discussed, with thoughts on what
improvements could have been made and the assumed consequences of these improvements. In
connection with the method, the sources are discussed. Finally, the work is discussed in a
wider context.

5.1 Results

When studying and comparing the results we could see that one of the algorithms, the neural
network, was able to mimic the PID-controller fully. This is clear in figure 4.1, where the
lines of the PID-controller and the neural network follow each other. The controller using
the neural network is also the only one of our trained algorithms that manages to balance the
pendulum for all simulations with starting offsets larger than 9 degrees. The linear regression
never manages to balance the pendulum when not starting in a balanced state and therefore
does not perform significantly better than using no controller at all. The controller using the
regression tree algorithm in most cases only performs better than using no controller
for two starting offsets, 9 and 4 degrees, while in all simulations except one it performs
worse than no controller when starting in a balanced state.

As seen in 2.5.3, the neural network algorithm can do non-linear mapping. This, together with
a small bias and low variance, is what allows it to mimic the reference data fully. The
regression tree has the freedom to map in hypercubes; it makes no assumption about the target
function and therefore gives us a low bias and high variance. The output values of the tree are
limited to the number of leaves, whereas the linear regression and the neural network have
a continuous output function, which suits this problem better. There are more advanced
tree algorithms, such as random forest and GBM, which would probably have yielded a better
result, but due to the linearity of the target function they would not be suitable for this
problem, because these algorithms make too few assumptions about the target function. As for
the linear regression, it did not mimic the reference data at all and performed worst of all
the algorithms. Based on 2.2, the balancing of the pendulum becomes a linear target function as
the angle approaches the upright position. We therefore believe that the linear regression
should be able to balance the pendulum for more angles if the training data were adjusted.
Because it assumes that the target function is linear, it is reasonable that it does
not manage to balance when trained on data that requires non-linear mapping, and removing these
training data points would probably make the linear controller better.
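
The limitation that a tree can only output as many distinct values as it has leaves can be illustrated with a small one-dimensional regression tree. This is a minimal sketch and not the implementation used in our simulations: the functions `sse`, `fit_tree` and `predict` are written here for illustration only, and the linear target u = -2 * theta is a hypothetical stand-in for the reference controller.

```python
def sse(values):
    """Sum of squared errors around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(points, depth):
    """Fit a 1-D regression tree on (x, y) pairs that are sorted by x.

    A leaf is a float (the mean of y); an inner node is a tuple
    (threshold, left_subtree, right_subtree).
    """
    ys = [y for _, y in points]
    if depth == 0 or len(points) < 2:
        return sum(ys) / len(ys)
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return sum(ys) / len(ys)
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold,
            fit_tree(points[:i], depth - 1),
            fit_tree(points[i:], depth - 1))


def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


# Hypothetical linear target u = -2 * theta sampled on [-1, 1].
points = sorted((i / 20, -2 * i / 20) for i in range(-20, 21))
tree = fit_tree(points, depth=3)
outputs = {predict(tree, x) for x, _ in points}
print(len(points), "inputs map to", len(outputs), "distinct outputs")
```

A tree of depth 3 has at most 8 leaves, so the 41 distinct inputs are mapped to at most 8 output levels, whereas a linear model or a neural network can return a different value for every input.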

When we realized that not all the algorithms were able to achieve balance, we started reasoning
about what type of algorithm we would use if we were to build a pendulum, and more generally
how to decide which algorithm to use for different types of problems. The most obvious
conclusion we drew was that it is very important to determine what type of problem is to be
solved. Different algorithms are designed to solve different problems. In our case we believe
that all the algorithms could have been used to solve the problem, but due to the previously
mentioned factors they did not solve it and might not be the most suitable algorithms to use.
In particular, it is important to consider what assumptions the algorithm makes about the
target function. A linear regression controller might perform as well as a neural network
and be a lot more understandable; this also adds a factor of reliability to the controller,
which makes it a lot more suitable. One also needs to reason carefully about what training data
to use for each controller. It does not make sense to train the linear regression on training
data that it will never be able to fit, especially since trying to fit it makes the fit on the
other training data worse.
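
How quickly the problem leaves the linear region can be estimated with the small-angle approximation. This is a sketch under an assumption: that the non-linearity of the target function enters through sin(theta), as in the usual linearisation of an inverted pendulum. The helper names `linearization_error` and `near_upright` are hypothetical, and the cut-off angle used for filtering is an arbitrary example value.

```python
import math


def linearization_error(angle_deg):
    """Relative error of approximating sin(theta) by theta."""
    theta = math.radians(angle_deg)
    return abs(theta - math.sin(theta)) / abs(math.sin(theta))


def near_upright(samples, max_angle_deg):
    """Keep only (angle_deg, force) samples close to the upright position."""
    return [s for s in samples if abs(s[0]) <= max_angle_deg]


if __name__ == "__main__":
    # Non-zero starting offsets used in the simulations.
    for angle in (4, 9, 60, 90):
        print(f"{angle:>2} deg: {linearization_error(angle):.4f}")
```

The error stays below half a percent at 4 and 9 degrees but exceeds 20 percent at 60 degrees and 50 percent at 90 degrees, which is consistent with the idea of training the linear controller only on samples near the upright position.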

The most interesting result was that the neural network was able to recover the inverted
pendulum in some cases where the PID-controller was not, with an even clearer difference
in the altered environments. This performance difference was clearest when the length of the
pendulum was altered. We believe that this is because the length is the only changed parameter
that directly affects our state variable θ in our equations of motion (2.12).
Why the neural network performs better than the reference data from the PID controller is
very hard to reason about, since neural networks are not intuitively easy to understand. The
advantage that the neural network has over the PID-controller is that it works well for
non-linear mapping. This could be the reason for the difference in performance.

When altering the environments, the linear regression performs very badly in all simulations.
The changes in the parameters do not affect the algorithm very much. We believe that this is
because the linear regression algorithm behaves as two P-controllers or a single PD
controller. If we had added another parameter to the linear controller with the purpose of
mimicking the integral part of the PID-controller, we could have achieved a better fit to the
reference data. As for the regression tree, it was not able to balance in our reference
simulation, and this is a continuing trend throughout our tests. As previously stated, this is
believed to depend on the properties of the algorithm. The neural network consistently
follows the PID-controller. This should not be seen as a failure, since by the definition of
machine learning in 2.4 it has learned exactly what it needs to be able to mimic the reference
data. For the neural network to be able to balance even when the parameters of the pendulum
change, some kind of reinforcement learning could be tested. This would give the neural
network the ability to adapt to changes in an environment.
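
The claim that the linear model acts like a PD controller can be checked on synthetic data. This is a minimal sketch under assumptions: the inputs of the linear model are taken to be the angle and the angular velocity, which this section does not state, and the gains `KP` and `KD` as well as the function `fit_two_weights` are hypothetical.

```python
def fit_two_weights(samples):
    """Least-squares fit of u = w_theta * theta + w_omega * omega.

    samples holds (theta, omega, u) tuples. The 2x2 normal equations
    are solved with Cramer's rule; no intercept is fitted.
    """
    s_tt = sum(t * t for t, _, _ in samples)
    s_to = sum(t * o for t, o, _ in samples)
    s_oo = sum(o * o for _, o, _ in samples)
    s_tu = sum(t * u for t, _, u in samples)
    s_ou = sum(o * u for _, o, u in samples)
    det = s_tt * s_oo - s_to * s_to
    w_theta = (s_tu * s_oo - s_ou * s_to) / det
    w_omega = (s_tt * s_ou - s_to * s_tu) / det
    return w_theta, w_omega


# Hypothetical PD gains used to generate reference data.
KP, KD = 50.0, 5.0
samples = [(t, o, KP * t + KD * o)
           for t in (-0.2, -0.1, 0.0, 0.1, 0.2)
           for o in (-1.0, 0.0, 1.0)]

w_theta, w_omega = fit_two_weights(samples)
print(w_theta, w_omega)
```

The fitted weights play the roles of the proportional and derivative gains. An integral term is a weight on the accumulated error, so the linear model can only reproduce it if the accumulated error is supplied as an additional input.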

5.2 Method
With the realizations that have come from this project, our choices of algorithms are
questionable, especially the regression tree. As previously stated in 3.3, we made our choices
based on how current the different algorithms are. We wanted to test newer, more relevant
algorithms and compare these with older, more well-established algorithms. We also wanted to
test algorithms of different complexity. Looking at the problem at the end of the project,
it is clear that more consideration and time should have been put into choosing algorithms
that are better suited to the problem and to the factors brought up in 5.1. We should also
have spent more time reasoning about what training data would be best for the different
controllers.

The same could be said for the choice of test environment. We feel that a more carefully
considered problem would have made for a better study. The pendulum is a very well-established
problem, which was one of the main reasons we chose it. Making our study on a different
problem, or on several different kinds of problems, might have given a more interesting
result. We are however pleased with our choice, since the pendulum makes for an environment
that is easy to alter, which was one of the main questions of our report. We do feel, though,
that more time could have been spent considering and discussing different environments and
problems to study, and that this may have resulted in a more diverse study.

With more time and resources, it would have been interesting to test the algorithms not only
on a virtual model but also on a real one. The comparison would also have been more complete
if we could have used more algorithms. More time would also have given us the opportunity to
try different kinds of problems, for example a fully linear and a non-linear problem. We
believe that this would have made for a richer and more diverse discussion of the differences
between different algorithms on different problems.

5.2.1 Source criticism

The sources of this paper were carefully selected to suit the subject and have all been
checked for reliability. The majority of the sources are accepted conference papers.
It is our opinion that these papers are good sources of information. Since the subject of
the paper covers both older and newer theories and techniques, the sources range from as old
as 1943 to as new as 2019. When handling older sources we have always been careful to check
their relevance and whether the information is outdated. Some books have been used as
sources; these books are educational and written in an unbiased way. When explaining
different systems and processes, the sources are the official documentation and the official
websites for the systems. There is a possibility that these documents and websites are
biased and present favorable information, but this was a conscious choice since we feel that
this source material has the highest accuracy.

5.3 The work in a wider context

In our society today artificial intelligence is a hot topic, and it is impossible to do
research about it without making an impact. People can be sceptical of the subject and see it
as a bad thing. Ethically there is always the question of how much data different algorithms
should be able to obtain and use, as well as the ongoing discussion of self-driving cars. In
the context of all the information that exists, this paper might not impact the discussion
currently going on about AI and data mining, but simpler and more grounded facts are important
in order to have a professional and mature discussion.

6 Conclusion

The aim of the project was to compare different machine learning algorithms when used as a
substitute for classical control theory and to map their behaviour and ability to adapt. To
be able to do this we phrased two questions: "How does a pendulum based on classic
control theory compare to a pendulum based on different machine learning algorithms?" and "How
will the machine learning algorithms behave when altering the environment by changing for example
weight, length and size?"

The aim of the paper has been achieved to some extent. The neural network slightly
outperformed the PID controller in altered environments. Even though this suggests that the
neural network should be more stable, it adds a complexity to the controller which makes it
harder to understand. As one of the advantages of using a PID-controller is its simplicity,
there is very little motivation to replace a PID-controller with a neural network controller.
We also feel that the result could have been more comprehensive and less predictable if more
algorithms, different models and more diverse environments could have been studied. Most
importantly, the algorithms' assumptions about the target function should have been evaluated
more carefully against the target function that we were using as reference data. This is in
our opinion one of the most important aspects when choosing a machine learning algorithm.
Also, how well the algorithm can be understood should be one of the main considerations when
using it in control theory, since this makes its limitations and functionality clearer to the
user. It is our belief that this paper could be used as a basis for considering and choosing
a type of algorithm or a specific algorithm. It could also be used as a basis for a bigger
research project.

6.1 Future work

Something that we think could be interesting to investigate further after this study would be
to tune PID parameters using machine learning algorithms. To be able to provide a more
extensive comparison, as has been described before, more algorithms could be studied. We
believe that it would be beneficial to start by sorting different algorithms and grouping them
by labels such as complexity, accuracy, what type of problem they solve most efficiently and
how they handle different amounts of training data. We strongly believe that this would make
for a more interesting result.

Another thing that needs to be done to be able to create a large-scale comparison would be
to use the algorithms on different models and environments. An algorithm that is good on
one model could be really bad on another. All the factors of the algorithm and model need to
be taken into account to be able to provide a comprehensive result. With the different
models, different factors in the environment could be changed to examine whether and how the
different algorithms can adapt to changes.


[1] T Abdelzaher, Y Diao, JL Hellerstein, C Lu, and X Zhu. Introduction to control theory and
its application to computing systems. 2008.
[2] Ethem Alpaydin. “Introduction to Machine Learning, Second Edition”. In: Mas-
sachusetts Institute of Technology (2010).
[3] Sara Ayoubi, Noura Limam, Mohammad A. Salahuddin, Nashid Shahriar, Raouf
Boutaba, Felipe Estrada-Solano, and Oscar M. Caicedo. “Machine Learning for Cog-
nitive Network Management”. In: IEEE Communications Magazine 56.1 (Jan. 2018),
pp. 158–165. ISSN: 0163-6804. DOI: 10 . 1109 / MCOM . 2018 . 1700560. URL: http :
[4] James Bergstra and Yoshua Bengio. “Random search for hyper-parameter optimiza-
tion”. In: Journal of Machine Learning Research 13 (2012), pp. 281–305. ISSN: 15324435.
[5] Marc Claesen and Bart De Moor. “Hyperparameter Search in Machine Learning”. In:
(2015), pp. 10–14. arXiv: 1502.02127. URL: http://arxiv.org/abs/1502.02127.
[6] “Dartmouth conference: McCorduck 2004, pp. 111–136, Crevier 1993, pp. 47–49, who
writes "the conference is generally recognized as the official birthdate of the new sci-
ence.", Russell and Norvig 2003, p. 17, who call the conference "the birth of artificial
intelligence.", NRC 1999, pp. 200–201”. In: Dartmouth conference: McCorduck (1956).
[7] “Design of reinforce learning control algorithm and verified in inverted pendulum”. In:
Chinese Control Conference, CCC 2015-Septe.Grant 61362002 (2015), pp. 3164–3168. ISSN:
21612927. DOI: 10.1109/ChiCC.2015.7260128.
[8] Joseph Distefano, Allen R. Stubberud, and Ivan J. Williams. Feedback and Control Sys-
tems. Vol. 2. McGraw-Hill Education, 2013, pp. 1–6.
[9] John Doyle, Bruce Francis, and Allen Tannenbaum. “Feedback Control Theory”. In:
Design 134.6 (1990), p. 219. ISSN: 00223395.
[10] David A. Drachman. “Do we have brain to spare?” In: Neurology 64.12 (2005), pp. 2004–
2005. ISSN: 0028-3878. DOI: 10 . 1212 / 01 . WNL . 0000166914 . 38327 . BB. eprint:
http : / / n . neurology . org / content / 64 / 12 / 2004 . full . pdf. URL: http :
[11] FMI Framework - FMI4cpp. URL: https://github.com/NTNU- IHB/FMI4cpp (vis-
ited on 02/18/2020).
[12] Jake Gunther and Todd Moon. “Digital signal processing without arithmetic using re-
gression trees”. In: 2009 IEEE 13th Digital Signal Processing Workshop and 5th IEEE Sig-
nal Processing Education Workshop, DSP/SPE 2009, Proceedings (2009), pp. 524–529. DOI:
[13] Angela Harmon. “Neurons (nerve cells).” In: Salem Press Encyclopedia of Science (2014).
[14] Yuka Higashijima, Atsushi Yamamoto, Takayuki Nakamura, Motonori Nakamura, and
Masato Matsuo. “Missing data imputation using regression tree model for sparse data
collected via wide area ubiquitous network”. In: Proceedings - 2010 10th Annual Interna-
tional Symposium on Applications and the Internet, SAINT 2010 (2010), pp. 189–192. DOI:


[15] S. Kavitha, S. Varuna, and R. Ramya. “A comparative analysis on linear regression

and support vector regression”. In: Proceedings of 2016 Online International Conference on
Green Engineering and Technologies, IC-GET 2016 (2017), pp. 1–5. DOI: 10.1109/GET.
[16] Yann Lecun, Yoshua Bengio, and Geoffrey Hinton. “Deep learning”. In: Nature 521.7553
(2015), pp. 436–444. ISSN: 14764687.
[17] Yann. Lecun, Leon. Bottoun, Geneview. Orr, and Klaus-Robert. Müller. “Efficient Back-
Prop”. In: Neural Networks: tricks of the trade (1998).
[18] Hyun Il Lim. “A linear regression approach to modeling software characteristics for
classifying similar software”. In: Proceedings - International Computer Software and Ap-
plications Conference 1 (2019), pp. 942–943. ISSN: 07303157. DOI: 10.1109/COMPSAC.
[19] Weibo Liu, Zidong Wang, Xiaohui Liu, Nianyin Zeng, Yurong Liu, and Fuad E. Alsaadi.
“A survey of deep neural network architectures and their applications”. In: Neurocom-
puting 234 (Apr. 2017), pp. 11–26. ISSN: 0925-2312. DOI: 10.1016/J.NEUCOM.2016.
12.038. URL: https://www.sciencedirect.com/science/article/pii/
[20] Warren S Mcculloch and Walter Pitts. “A logical calculus of the ideas immanent in
nervous activity”. In: Bulletin of mathematical biophysics 5 (1943). URL: https://pdfs.
semanticscholar . org / 5272 / 8a99829792c3272043842455f3a110e841b1 .
[21] Tom M Mitchell. Machine Learning. McGraw-Hill, 1997. ISBN: 0070428077.
[22] Valeri Mladenov. “Application of neural networks for control of inverted pendulum”.
In: WSEAS Transactions on Circuits and Systems 10.2 (2011), pp. 49–58. ISSN: 11092734.
DOI : 10 . 1109 / NEUREL . 2014 . 7011468. URL : http : / / www . wseas . us / e -
[23] Zhang Pengpeng and Zhang Lei. “BP Neural Network Control of Single Inverted Pen-
dulum”. In: Computer Science and Network Technology (ICCSNT). 2013, pp. 1259–1262.
ISBN : 9781479905614.

[24] S. Kawaji, T. Mclcda and N. Matsiinaga. “Learning Control of an Inverted Pendulum

Using Neural Networks”. In: (1992).
[25] Jürgen Schmidhuber. “Deep learning in neural networks: An overview”. In: Neural Net-
works 61 (Jan. 2015), pp. 85–117. ISSN: 0893-6080. DOI: 10.1016/J.NEUNET.2014.
09.003. URL: https://www.sciencedirect.com/science/article/pii/
[26] FMI Standard. “{FMI} for {ModelExchange} and {CoSimulation} v2.0”. In: (2014), pp. 1–
[27] Bertil Thomas. "Modern Reglerteknik". Vol. 5. Liber, 2016, pp. 5–70. ISBN: 9789147112128.
[28] Bin Wang, Tianrui Li, Yanyong Huang, Huaishao Luo, Dongming Guo, and Shi-Jinn
Horng. “Diverse activation functions in deep learning”. In: 2017 12th International Con-
ference on Intelligent Systems and Knowledge Engineering (ISKE) (2017), pp. 1–6. DOI: 10.
1109/ISKE.2017.8258768. URL: http://ieeexplore.ieee.org/document/
[29] Xiuquan Li and Tao Zhang. “An exploration on artificial intelligence application: From
security, privacy and ethic perspective”. In: 2017 IEEE 2nd International Conference
on Cloud Computing and Big Data Analysis (ICCCBDA). IEEE, Apr. 2017, pp. 416–420.
ISBN : 978-1-5090-4498-6. DOI : 10.1109/ICCCBDA.2017.7951949. URL : http://


[30] Zhang and Zhongheng. “A gentle introduction to artificial neural networks”. In: Annals
of Translational Medicine 4.19 (2016). ISSN: 2305-5847. DOI: 10 . 21037 / 10805. URL:
[31] Zhang and Zhongheng. “Neural networks: further insights into error function, gener-
alized weights and others”. In: Annals of Translational Medicine 4.16 (2016). ISSN: 2305-
5847. DOI: 10 . 21037 / 10492. URL: http : / / atm . amegroups . com / article /

