Neural Knight

F INAL P ROJECT R EPORT
Neural Knight: Novice Human Level

Performance in a Multiplayer 3D Fighting
Game using Computer Vision and a Mixed
Input Convolutional and Recurrent Neural
Network
Author: Supervisor:
Taha N ASIR Dr. Matthew Y EE -K ING
Student No: 33497611
May 16, 2019

iii
UNIVERSITY OF LONDON
Abstract
Computing Department
BSc Computer Science Degree
Neural Knight: Novice Human Level Performance in a Multiplayer 3D Fighting Game

using Computer Vision and a Mixed Input Convolutional and Recurrent Neural
Network
by Taha N ASIR
Neural Knight is an artificial neural network that has learned to play the 2017 computer
game "For Honor". It can fight against CPUs and humans to the level of a novice human
player, and is extremely challenging to tell apart from a human when observed. The neural
network consists of recurrent branches as well as convolutional branches. It was achieved
using OpenCV, Tensorflow and Keras.
iv
Acknowledgements
I would like to thank my family and friends for their support throughout the last four
months of developing Neural Knight. A special thanks must be given to my supervisor,
Matthew Yee-King, for his time and help in supervising me. I would especially like to thank
Daniel Kukiela for his invaluable insight into the workings of Neural Networks, and for
teaching me much. Finally, thank you to Yijie and Hasan, who were my human test subjects
within the game and spent hours fighting against different iterations of the bot.
v
Contents
1 Introduction 1
1.1 Project Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Baseline Target Specification For The Project . . . . . . . . . . . . . . . . . . . 1
1.3 Extended Target Specification For The Project . . . . . . . . . . . . . . . . . . . 2
1.4 Structure of Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Literature Survey 3
2.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Neural Networks in Self-Playing AI . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Design and Implementation 5

3.1 For Honor Explication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.1 Art of Battle System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1.2 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.3 Bots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.1 Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.2 Player Control Neural Network . . . . . . . . . . . . . . . . . . . . . . 10
3.2.3 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3.1 Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3.2 Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Capturing the Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Developing the Guard Detection Algorithm using Line Detection . . . 13
Capturing Attack Indicators . . . . . . . . . . . . . . . . . . . . . . . . . 14
Auto-parry Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Output of Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . 15
Developing Remaining Feature Extraction Utilising Colour Presence . 16
Capturing Gamepad Input . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3.3 Player Control Neural Network . . . . . . . . . . . . . . . . . . . . . . 17
Feed Forward Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Recurrent Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Attempts to improve the Recurrent model . . . . . . . . . . . . . . . . 22
Mixed Input Data Network . . . . . . . . . . . . . . . . . . . . . . . . . 22
Class Imbalance with Multi-Label Classification . . . . . . . . . . . . . 25
Change from Accuracy to F Score as Metric . . . . . . . . . . . . . . . . 25
Custom Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Temporal Convolutional Layer . . . . . . . . . . . . . . . . . . . . . . . 27
Model Optimisation Utilising Tensorboard . . . . . . . . . . . . . . . . 27
Addition of Third Branch . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Network Early Stopping During Training . . . . . . . . . . . . . . . . . 28
Completion of Third Branch . . . . . . . . . . . . . . . . . . . . . . . . . 28
vi
4 Testing and Evaluation 31

4.1 Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 Infeasibility of Evaluating the State Detector . . . . . . . . . . . . . . . 31
4.1.2 Evaluating the Neural Networks . . . . . . . . . . . . . . . . . . . . . . 32
Random Fighter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Feed Forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Recurrent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Finalised Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Tensorboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1.3 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Specification Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
End Specification Review . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5 Conclusions and Future Work 41

5.1 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.1.1 Project Successes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.1.2 Project Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Project Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3 Potential Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.4 Self Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
A Turing Test Responses 45
B Figures & Tables 49
C Project Proposal 53
C.1 Proposal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
C.2 Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
C.3 Extended Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
C.4 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
C.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
D Preliminary Project Report 57

D.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
D.2 Aims and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
D.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
D.4 Project Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
D.5 Progress to Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
E Source Code 61
F Weekly Logs 107

1
Chapter 1
Introduction
1.1 Project Introduction

For Honor is a 3D multiplayer fighting game developed and published in 2017 by Ubisoft.
It consists of fantasy characters such as Knights, Vikings and Samurai, who duel each other
using the unique standardised combat system known as the "Art of Battle" system. Due
to the author’s extensive domain knowledge and personal passion for the game, it was
considered an engaging and compelling project to attempt to develop a neural network that
learned to play the game. This is made more compelling and ambitious when considering
the very small number of self-learned, self-playing neural networks developed for modern
multiplayer titles.
The intention of the project was to develop a bot that learns to play a particular character in
order to win against opponents both human and in-game CPUs, and aims to mimic game-
play in such a way that it is indistinguishable from a skilled human. This project will detail
the undertaking of this project, about how computer vision techniques were employed along
with many modern machine learning technologies in order to implement the bot, as well as
the issues that arose and the modern solutions that were applied to tackle them. Following
this section is a baseline specification for the project, along with a secondary specification
detailing potential for extension beyond the baseline.
1.2 Baseline Target Specification For The Project

The end specifications of this project was to be able to have a fully functioning and com-
prehensively skilled artificial intelligence that could play For Honor. This could, in turn, be
formalised into a series of explicit requirements, which will now be explored.
The bot must be able to competently make fighting decisions against an enemy. Ideally it
should fight and win matches with a regularity of over 50% against Level 2 difficulty bots
and intermediately-skilled players (this level of gameplay quality is hereafter referred to in
this chapter as "playing"). It must be able to play without any manual assistance from the
time when initially locking on until the round is over.
The bot must be able to play in real-time using information discerned entirely from the
screen. It should not require any hooks to be connected to the opponent’s client, and thus
should be able to fight any opponent, including in-game bots. This requires computer vision
solutions. This specification is required due to the lack of any API to access information
within the game directly.
2 Chapter 1. Introduction
1.3 Extended Target Specification For The Project

Once the previous specifications have been met, there is ample opportunity for extended
functionality. In addition to being competent at playing as previously defined, it would
be ideal for the bot’s gameplay to be indistinguishable from that of a humans’. This can
be tested for by surveying subjects who are both familiar and unfamiliar with the game,
showing them two different pieces of recorded footage of duels, and asking them to guess
as to which was performed by a human. This process could be repeated a number of times
on the same subject. The intended result would be for those familiar with the game to be
unable to consistently guess correctly.
In addition, if the AI reaches a point where its performance cannot be improved beyond
any qualitatively observable measure, it would be useful if it was able to learn to play more
than one kind of character, as each has a different moveset and different styles with which
to win fights. However, it was noted from the beginning that such a task was unlikely to
be achievable for every character given the time constraint of the project and the quantity
of data required to be captured. A comparison of this specification and the end result is
detailed in Chapter 5.
1.4 Structure of Report

Chapter 2 on page 3 introduces the Literature Survey, which discusses the problem being
solved along with precedent on how similar problems have been solved in the past, and the
underlying methods behind them.
Chapter 3 on page 5 explores in great detail the actual process by which the project was
designed and implemented by splitting it into three distinct and interlocking problems.
Chapter 4 on page 31 details the methods by which the different aspects of the project -
including different neural network models - were tested, and the results are analysed and
evaluated. It also compares the final result to the specification.
Finally, Chapter 5 on page 41 lays out the final conclusion and details potential future ex-
pansion on the network in order to improve or modify it, or generalise it.
3
Chapter 2
Literature Survey
2.1 The Problem

Computer games bots are pieces of software that are able to play computer games au-
tonomously. Researchers have applied many different techniques to the design of bots, and
one such modern technique is the development and training of neural networks (and other
learning machines) to play single-player computer games, as training is made easy. Neu-
ral networks require many thousands upon thousands of data points, which also assumes
the behaviour being replicated must be able to be encoded (for example, for a feed-forward
neural network with a single layer of hidden neurons, the behaviour must be able to be
represented as a continuous bounded function. Fortunately, virtually every aspect of a com-
puter game is already encoded and accessible, and often data can be very easily generated
to provide a large enough training set.
However, most neural networks of this nature are done for single-player computer games,
where all other agents are programmed. Unsupervised neuro-evolutionary computing is a
popular method in this instance, where the goal is unchanging, but the solution is unknown.
Neuro-evolutionary computing is a kind of reinforcement learning algorithm first proposed
by Kenneth O. Stanley [17], where a generation of random neural networks is generated,
and is then tested and rated against some reward function. Poor models are killed off,
whilst fitter models are ”bred” to make a new generation. This has been done for Super
Mario World [16] as well as for a host of games for the Atari 2600 [5], to name just a few
examples.
The concept of using machine learning techniques with online multiplayer games, however
is still fairly new. In addition, this kind of technique is extremely challenging to achieve for
an online fighting game, where there is no single hard solution to winning, as the human op-
ponent can be considered unpredictable. A key point to this problem is that a large amount
of For Honor’s difficulty lies in quick attacks that are difficult to react to. However, with a
fast enough processor a computer is vastly superior at reacting to actions such as these, and
so theoretically, so long as the decision it makes is correct, a computer could outperform a
human.
Therein lies the main problem that this project has aimed to solve - teach a network to make
the correct decisions moment-to-moment in order to win a duel. It should be able to do so
consistently, and ideally match or even outperform a moderately skilled human player.
4 Chapter 2. Literature Survey
2.2 Neural Networks in Self-Playing AI

With regards to the implementation of the neural network, it was decided early-on that the
project would likely make use of a particular model of neural network known as a recurrent
neural network. As opposed to the traditional feed-forward neural network, a recurrent
neural network not only processes inputs from one layer to the next (i.e. feeding the in-
formation forward), but information is also retained and fed back to other nodes within
the feature space. This allows for information to be retained between frames, creating the
possibility for patterns to be recognised within gameplay, because it allows time to be repre-
sented. Especially promising is the model known as the ”long short-term memory” network,
a form of recurrent network that has a very large memory capacity compared to a normal
RNN.
LSTM networks are comprised of LSTM cells. Each unit is typically comprised of an input
gate, an output gate, a cell that stores the state, and a forget gate. Together, these structures
decide, each time step, what information will be forgotten, what new information will be
stored, then use this and the currently stored information to decide what to output. This
was explained in depth by Colah [3].
An RNN with external memory was used to train multiple agents to play Quake III Arena
[7], but using an LSTM solves the same problem as the utilisation of external memory -
the problem of the limitation of memorisation due to the gradient vanishing and exploding
problem, which is due to ”the temporal evolution of the backpropagated error exponentially
depending on the size of the weights” [6] where back-propagation is a common and efficient
method of training networks. This problem relates to the number of time-steps the network
needs to remember being too high, and this was unlikely to be an issue in this project as a
typical fight in For Honor does not require a player to remember more than approximately
one minute before. Individual fights last only a maximum of three minutes. A Long Short
Term Memory network was also used to train MariFlow, a self-learning Super Mario Kart
AI. [15].
The first segment of this problem, however, was the matter of feature preprocessing - obtain-
ing live information regarding the state of the game from frame to frame. Such information
was not accessible using an API, so the only way to approach this problem was to utilise
computer vision solutions. Research on OpenCV, the primary and most well-documented
computer vision module for Python, was conducted primarily via its own documentation.
In addition, the computer vision section of a course was followed, run by Harrison Kinsley,
AKA "Sentdex" [8].
5
Chapter 3
Design and Implementation
3.1 For Honor Explication

What follows is a simplified explanation as to how the combat system in For Honor func-
tions, followed by a brief glossary providing a definition to simple game-related terms, so
as to make the report easier to understand for a reader unfamiliar with the game and to
better provide a description of the problem being solved. This will further assist to clarify
the following section on the design of the solution.
3.1.1 Art of Battle System

When the player locks onto an opponent, the Art of Battle - the game’s combat system - is
in effect. Designed primarily for 1 vs 1 duels, there are four main areas of the user interface,
displayed and annotated in the Figure below.
F IGURE 3.1: Annotated user interface.

6 Chapter 3. Design and Implementation
As can be seen, the main aspects of the UI are the player’s guard, health and stamina, and
the opponent’s guard, health and stamina. In order to win, the player must reduce the op-
ponent’s health to zero, emptying the bar. If the player’s health reaches zero first then they
will die and lose the duel. Most actions in the game consume stamina. If stamina is com-
pletely consumed then the player will become "exhausted", greying the screen and causing
all attempted moves to become extremely slow and high risk until it has fully replenished.
The guard user interface is an outline of a shield, split into three sections, referred to as "left",
"right", and "top" guard. A hero can have its guard face any one of these three directions.
Should an opponent attack from the same direction that a player is guarding, the attack will
be blocked, meaning it will be mostly negated.
Damaging weapon attacks consist of two main types - light attacks and heavy attacks. Light
attacks are fast, generally between 300-400ms (the timing differs dependent on their posi-
tion in a combination, the direction they come from, and the hero being played). They do
small amounts of damage and often work as opening attacks that chain into more dangerous
combinations. They consume little stamina. Heavy attacks are slower, often between 800-
1000ms. They consume much more stamina than light attacks, and are often the finishing
attack in a chain of moves.
Light and heavy attacks can be countered or "parried". Parrying an incoming attack requires
applying a heavy attack during the incoming attack’s parry window. The parry window be-
gins 300ms before the attack lands, and ends 100ms before the attack lands. This means the
player has only 200ms to react accordingly. Due to the difficulty of this action, successfully
parrying an attack rewards the player with an opportunity to apply certain amount of guar-
anteed damage. Typically, parrying a heavy attack guarantees a light attack, and parrying a
light attack guarantees a heavy attack.
Certain attacks, often heavy attacks positioned at the end of combination sequences, are
attributed the "unblockable" property. Unblockable attacks are accompanied by an orange
effect and sound, and cannot be blocked. They can, however, be parried as normal. Certain
unblockable attacks are not a weapon attack, but actually consist of some kind of blunt
charge, such as an attack with a shield. These are known as "bash" attacks. Bashes often
knock the opponent back and guarantee a light attack.
Weapon attacks can be dodged, if timed correctly. Players can dodge left, right, and back-
wards. In order to prevent an opponent consistently dodging attacks, a player can break
their guard. "Guardbreaks" display an icon briefly, and if the opponent does not also press
the same button when the icon becomes visible (for 400ms), then the opponent becomes
stunned and vulnerable to a large amount of damage. A GB will fail if the opponent is
in the process of throwing an attack. An opponent cannot counter-guardbreak if they are
in the process of dodging. If a player and an opponent attempt to guardbreak each other
simultaneously, they will both bounce off each other and fail.
This information is now repeated in a short-form glossary format for ease of referral for the
reader.
3.1.2 Glossary
• Hero: Playable characters in the game.
• Warden: One of the playable characters in the game, a knight in armour, wielding a
two-handed sword. Neural Knight is taught to fight as a Warden, and to fight against
a Warden.
3.2. Design 7
• Light attack: Fast, low-cost attack that does low damage and is difficult to parry.
• Heavy attack: Slow, high-cost attack that does high damage but is easy to parry.
• Parry: A counter-move performed in a 200ms window during an attack that stuns the
opponent.
• Unblockable: An attack that cannot be blocked by matching the guard direction. Can
still be parried.
• Bash: An unblockable attack that has no direction, often some kind of physical shoving
motion.
• Guardbreak: An opening attack with no direction that requires a quick reaction to
negate. Failure to counter will stun the opponent for 600-800ms.
3.1.3 Bots
For Honor has built-in bots in the game that serve as replacements for other players. They
can be played against or work with the player co-operatively in other game modes. The bots
are split into difficulty levels 1, 2, and 3, in ascending order. Whilst there is no publicly avail-
able data regarding the bots, from domain knowledge of the author, Level 1 bots are bots
that a novice player would find challenging. Level 2 bots a relatively intermediate player
should find challenging, and Level 3 bots a more experienced player should find challeng-
ing. This is clarified as these bots are used as the primary way to test the implementation,
further discussed in the next chapter (4 - Testing and Evaluation).
3.2 Design
The project can be described as the sum of three major components - the input, the player
control network, and the output. The input is the process of capturing the information that
is to be fed into the network. The player control system section is the development of the
neural network itself. The third task is the consideration as to what the output of the neural
network will be, and how it will be processed as inputs into the game itself. This section
details how the solution to these tasks was designed and conceptualised.
3.2.1 Input
Since, as aforementioned in the Specification (1.2 - Baseline Target Specification For The
Project), there is no API which can be used to hook into the game so as to pull live informa-
tion during a fight. As a result, the only way to provide information for the neural network
is by discerning information extracted from each individual on-screen frame.
Theoretically, a neural network may be able to learn to play simply by feeding in the raw
frame data. However, it would be much easier to control and test if more explicit informa-
tion could be fed. This requires manually interpreting information found on screen, using
computer vision techniques. Information such as the enemy’s guard direction, the presence
of an attack, an "unblockable" attack, a "bash", or a "guard-break" would all be required.
Table 3.1 was originally created to display all the game features desired to be extractable.
Health and stamina was not made extractable.
State Images
Guard Up
Guard Left
Guard Right
3.2. Design 9
Attack
Unblockable
Guardbreak
Bash
Health and Stamina
TABLE 3.1: A table of images that demonstrates the appearance of the user
interface during certain game actions, and shows what features were desirable
to extract.
3.2.2 Player Control Neural Network

This was an aspect of the project that could not be explicitly designed before launch, as the
"black box" nature of neural networks made it difficult to discern what kind of model would
perform well without attempting to train it.
Nonetheless, it was possible to make more broad judgements regarding the design of the
network. It was concluded that it would be likely that a Feed-Forward network would not
perform very well, but would be attempted due to its simplicity. This conclusion regarding
its performance was reached under the consideration of one key factor: feed-forward net-
works do not consider any past information; they have no memory. This means that any
decision made based upon data given would be made exclusively with the immediate data
provided at that frame, whereupon that information would be discarded. This would likely
not be sufficient, not only because For Honor both requires many actions to be carried out by
pressing a sequence of buttons, but also because considering an opponent’s previous actions
more than often helps more accurately predict and adapt to their next action.
Considering this need for memory, it was almost certain that a recurrent neural network
would be required - a type of model that feeds some of the information it receives back
and outputs it into itself as input. Specifically, the unit used would be the Long Short-Term
Memory cell [6] network discussed in the Literature Survey. (2 - Literature Survey). With
the ability to store large amounts of its previous processing and be able to strategically "for-
get" information that was not useful, it was far more likely than the feed-forward network
to see success. However, as aforementioned, the specificities of the model would be deter-
mined only when implemented. In fact, a significant portion of the resulting neural network
utilised convolutional layers of both temporal and two-dimensional natures. utilising con-
volutional networks was not considered at all during the design stage.
3.2.3 Output
The method most frequently employed by other Self-Playing AI (e.g. When "Sethbling"
developed a self-racing AI for the SNES title "Super Mario Kart" [15]) is to consider this
3.3. Implementation 11
a classification problem and represent each button that can be pressed as an output node.
By employing this method the output would be a "multi-one-hot" array where each index
represented a button press. The exact number of different buttons and how the array is
processed into direct game input was the first aspect of the project that was considered and
also to be implemented, and this process will now be explored first in the next section.
3.3 Implementation
This section explores the process of how each of the three tasks aforementioned were imple-
mented. Whilst the order of them in the Design was considered in an intuitive manner, this
section of the report details each problem in the order they were implemented. They were
attempted in this order due to the level of difficulty of each task.
3.3.1 Output
The first task was to demonstrate that one could easily control a game automatically using a
script. Since it was intended for the project to utilise Python libraries for the Input and neural
network portion of the project, (namely OpenCV and Tensorflow), research was conducted
regarding methods to simulate input using Python.
Initially, it was concluded that this task was completely trivial. Using Pyautogui, a library
for Cross-platform GUI automation, functions like pyautogui.keyDown() would have been
used to control the game. So initially a script was written such as:
import pyautogui
import time
while True :
pyautogui . keyDown ( "W" )
time . s l e e p ( 1 )
pyautogui . keyUp ( "W" )
However, whilst this worked in the shell, and in any normal text field (in this instance pro-
ducing the result of pressing the W key repeatedly, which is usually assigned in games as
the input to move a character forward), and even worked for regular interaction (pressing
the Windows key also opened the start menu), it would not work at all in any game, includ-
ing For Honor. Upon further research, a StackOverflow thread was discovered [2] in which
user Cas explains why these functions do not work in modern games:
The SendInput function will insert input events into the same queue as a
hardware device but the events are marked with an LLMH_INJECTED flag that
can be detected by hooks.
Modern games actually utilise DirectInput events, which is part of the DirectX development
API, meaning only triggered events are detected as input. After further research, another
thread with a script was found. A modified version was made by Harrison Kinsley [8], co-
incidentally as part of the same course that would be rediscovered and used for the next
subsection, which made it much easier to interact with. This script converts keyboard Hex-
Codes into DirectX input events.
Using this, a script was written that would allow for controlling the fighting system in the
game. A demonstration of this code can be seen here: (https://youtu.be/4Uf7F4eQSR0),
where the bot fights and kills an in-game bot which does not move or change its guard, by
locking on, moving towards it, and executing a series of attacks from two different direc-
tions. After this, it performs an "execution" animation, before unlocking the camera and
ending.
Refactored slightly, it demonstrates that the output can be easily abstracted such that compli-
cated sequences of gameplay interactions can be performed with a single function call. The
version written for use in the final implementation of the project can be found as "processIn-
putToGame.py", which contains the method "inputToGame()". This takes in the multi-one-
hot array, where each index corresponds to a button press. The algorithm iterates through
the array and sends the appropriate DirectX events.
The output array itself is discussed further in the next subsection.
3.3.2 Input
The second task is the matter of input for the AI - providing real-time data about the game
so the AI can use this information to make a decision as to how to act. This proves itself
as a non-trivial problem, as For Honor does not provide any kind of API to extract such
information, presumably because it could be used for cheating quite easily. As a result,
the implementation must, as discussed in the Design, utilise Computer Vision techniques
to extract information. This subsection details the process by which a state detector was
implemented to detect an enemy’s guard. It can be seen as a proof of concept in the following
supplementary video: (https://youtu.be/j0uWr_h7eZc) It follows by exploring how the
other important game states are captured.
The single most important aspect of the user interface that requires interpreting for the AI
is the enemy’s guard direction. Figure 3.1 shows an annotated screenshot of the game to
explain how the guard direction functions. Directional attacks come from one of these di-
rections, and the defender must match the guard in order to block the attack.
Given this information, the first aim was to make a simple program that detected the en-
emy’s guard direction and switched its guard appropriately. This would serve as a proof of
concept. However, the author was completely unfamiliar with OpenCV and computer vi-
sion as a field, and thus research and experimentation was carried out in order to learn how
to apply it to the project. Sentdex’s course on implementing a self-driving car AI in the 2013
title "Grand Theft Auto V" [8] contained an introduction to OpenCV and contained some
rudimentary boiler plate code that would subsequently become useful in the state detector.
Following the first several steps of this course, a rudimentary lane detection algorithm was
developed using OpenCV. It relied primarily on finding the main two lines in a frame, and
consider them the two boundaries of the lane. By analysing the gradients of these lines, the
algorithm would simply try to stay between them. Line detection is a very common method
for feature detection in computer vision. Since the guard user interface in For Honor is
essentially made up of six unique lines, a similar method was employed to extract the guard
direction.
Capturing the Screen

The screen itself could be captured very easily using OpenCV’s ImageGrab function; just
using the following line, the portion of the screen containing the game could be captured:
s c r e e n = np . a r r a y ( ImageGrab . grab ( bbox = ( 0 , 1 3 0 , 1 0 2 4 , 7 0 0 ) ) )
After brief experimentation, it was concluded that it would be best to run the game at a
resolution of 1024x768, as it is the lowest resolution the game allows to be run whilst still
maintaining a 16:9 aspect ratio. A smaller resolution is desirable, as it meant that the image
captured converts to as small a matrix as possible, which would mean operations manipulat-
ing it would perform as quickly as possible. With the image captured, it was ideal to restrict
the image solely to the region of interest, such that only sections of the screen bearing useful
information would appear. This was achieved by creating a mask matrix of entirely zeroes
(0) except for the region of interest, which contained ones (1). Then, by applying a Bitwise-
And operation with the mask and image, all pixels of the image except those covered by the
ones are turned black. This allows for only the guard user interface to be captured, though
the guard direction itself still needed to be filtered from this.
Developing the Guard Detection Algorithm using Line Detection

The next step to implement line detection was to threshold the image. Binary thresholding
is the process of filtering the image only to allow pixels above a certain brightness threshold,
and convert all other pixels to black. This could be done with a single line of code. After
some experimentation with the parameters, the resulting script was able to be left only with
the guard UI (with some residual but minimal noise, to be expected). Whilst in the con-
ventional case, edge detection is applied before any kind of line-finding, such as using the
Canny Edge Detection algorithm, the For Honor guard user interface is designed such that it
was worth attempting to apply line detection immediately, as it was itself mostly comprised
of white lines.
To actually carry out line detection on the thresholded image, a form of the Hough Lines
Transform was used, known as the Probabilistic Hough Lines Transform, provided as a
function by OpenCV. This is a more efficient form of the algorithm, because it only samples
from a random selection of points rather than the entire image. The probabilistic Hough
Lines transform also differs in that it returns the lines as a pair of Cartesian coordinates
that form the line segment, as opposed to the entire line. After experimenting with the
parameters, the resulting lines ended up appearing as intended. Figure 3.2 shows the results
of the lines being drawn to the captured frame.
F IGURE 3.2: The guard direction user interface and corresponding detected
lines.
Once a script had been written that successfully pulled information regarding the two lines
that made up the current guard, the following step was to use this information to detect the
actual guard direction. This was done by analysing the gradients of each line and taking
the average. Since each guard on the interface is made up of two unique lines, the resulting
gradient would lie between three ranges that would correspond to its direction.
After some experimentation logging the gradients of different guard directions repeatedly,
the following results were concluded: the lines making up the right guard had a gradient
m 6 −0.5. For the left guard, m > 1, and for the top guard, −0.5 > m < 1.
When this was implemented as a function switch_guard(), a proof of concept script was
written that would switch the guard direction to the opponent’s detected guard. When
tested, the algorithm successfully matched the opponent’s guard accurately, as seen in the
supplementary video linked at the beginning of this subsection.
It should be noted that, in order for the algorithm to detect the lines cleanly and properly,
some prerequisites were identified. Contrast should be relatively low, finalised at setting
19. The player’s character must maintain only a short distance from the enemy for these
states to be correctly identified, else the enemy becomes outlined in white, generating noise
that bypasses the binary threshold (though it should be noted that the end result is not too
heavily penalised in performance when the bot creates a large distance between it and its
opponent, thanks to other sources of data explained in subsequent sections). Only permu-
tations of maps that are set at dusk or night, and that have clear weather should be used.
The map used for all testing is "The Ring" set at Dusk. In the video, it can be seen that the
AI occasionally makes mistakes, and these are caused by noise generated by the snowfall.
A clear, dark environment helps minimalise noise.
Capturing Attack Indicators

With guard detection achieved, the remaining features needed to be captured. The next
state to detect was the ability to discriminate the different states of attack from the enemy.
Conveniently, each kind of attack has a different combination of colour presences. Table 3.1
shows the different states in the game that require capturing, and their appearance.
When an enemy attacks, the corresponding direction indicator grows larger and turns red.
This provided a new problem, as the red pixels fail to meet the required brightness levels
to be thresholded, and so disappear. The solution to this was to save the red pixels as a
map, then convert the red to white for thresholding and line detection. This way, the algo-
rithm does not go "blind" when the enemy attacks, causing erratic behaviour. Additionally,
a second image could be created where, following the thresholding, the red pixels are su-
perimposed upon the image so colour can be used as a discriminant to distinguish different
states. This will now be explored in greater detail.
For the implementation of the colour state detector with the attack indicator, the first step
was to determine the range of colours to filter in order to capture the attack indicator only.
It was becoming clear that using RGB values was capturing far too much in the way of
erroneous data. So, as a result, research was conducted on alternative representations of
colour, settling on using the HSV (Hue, Saturation, Value) scheme.
The function to filter for the presence of the attack indicator begins by converting the image
into HSV. It then creates a mask based on the colour range values that correspond to the
attack indicator. This was achieved with the cv2.inRange() function:
mask = cv2 . inRange ( hsv , np . a r r a y ( [ 0 , 2 3 4 , 7 4 ] ) , np . a r r a y ( [ 1 , 2 5 5 , 1 6 0 ] ) )
The values for the range were determined by qualitatively analysing screenshots taken of
the attack indicator.
Thus, by simply retrieving the array from the line: np.where(mask! = 0) the algorithm could
detect how many red pixels were in the frame. If the quantity exceeded a certain threshold,
it was very likely that the opponent was attacking.
However, for the attack indicators, as aforementioned, the algorithm also needs to convert
the red pixels into white in order to pass the brightness threshold. The method for the actual
filtering of the image for the attack indicator uses an algorithm made by StackOverflow user
Ray Ryeng [14]. This emulates the copyto function in the C++ edition of the OpenCV library
which the Python library lacks. It was used to copy all instances of the mask (using the locs
numpy array) onto the original image. In the end, this results in two separate images. One
image is all black except for any red pixels during an attack. The other is the original guard
detecting image, which shows the pixels that pass the binary threshold as well as showing
any pixels from an attack indicator after it has been changed to white.
Auto-parry Test
This process was devised after extensive research, but it appears to work very effectively.
Its use was tested by making an "autoparry bot." As explained in the For Honor explication
(3.1.1 - Art of Battle System), when an opponent throws an attack, there is a window of
opportunity in the first half of the attack’s animation to counter their attack. This is more
favourable than simply blocking, because it guarantees a counter-attack to land, and drains
the opponent’s stamina rather than your own. A simple testing function was developed that
checked if the enemy was attacking, and if so, to input the button to parry in the game. A
testing environment within the game was then set up, where the enemy bot would simply
throw a "light" attack in a random direction approximately every three seconds. This would
then theoretically allow the algorithm to automatically parry all light attacks thrown at it.
The result worked mostly as intended. When run, the AI controlled character successfully
blocked approximately 70% of attacks. However, it can be considered even more accurate
than that, since approximately 62% of the attacks it failed to parry were not because it de-
tected the wrong guard, but actually because it was programmed to react the moment it saw
an attack. However, in the game itself even with the quickest attacks the window of oppor-
tunity does not start right away, and the bot was reacting before this window had opened.
This is not a problem for the average human player, for whom, according to Human Bench-
mark’s online test results, the average reaction time is around 284ms [1]. This proves that the
bot is actually capable of reacting much faster than a human, since the window for parrying
starts 200ms into the attack and it reacts well before this.
Output of Feature Extraction

Given that this was all in the pursuit of feature preprocessing, it was important now to
consider how exactly the information would be passed into the neural network. Given that
information should be presented in as minimal a form as possible in order to optimise the
training time, and that the input data needed to be fully normalised for a neural network
to learn, the input data was structured to be represented as a 10 index, 1 dimensional array.
Each index represents a different state, and the value at that index is a boolean 0 or 1 to
represent whether or not the corresponding feature is present at the given frame.
The full array index legend, in order:
• Player’s right guard
• Player’s top guard
• Player’s left guard
• Opponent’s right guard
• Opponent’s top guard
• Opponent’s left guard

• Opponent attacking
• Opponent unblockable attack
• Opponent bashing
• Opponent guard breaking
So, taking an example extracted array such as the following: [0, 1, 0, 1, 0, 0, 1, 1, 0, 0] would
convey that the player is guarding top, whilst the enemy had their guard to the left, and
were currently attacking with an unblockable attack. This way, much information could
be learned from very little actual stored data. Implementing this was simply a series of if
statements, as there is no cleaner way of achieving this in Python due to the lack of switch
statements.
Developing Remaining Feature Extraction Utilising Colour Presence

With colour state detection now achieved, the next steps were to complete feature processing
by adding state detection for the remaining attack types. This was done using the exact
same method as capturing the attack frames: using the cv2.inrange() function and masking
it to capture the number of pixels in that range. The quantity is then thresholded, and if it
passes said threshold then that attack can be considered present. Table 3.2 displays all of the
ranges that the state detector measures for each kind of attack. Note that bash attacks are
not present because an attack can be considered a bash if it is unblockable but does not have
a direction.
Attack Type Lower Boundary Upper Boundary

Attack [0,234,74] [1, 255, 160]
Unblockable [8,250,135] [11, 255, 164]
Guard break [0,247,68] [1, 255, 85]
TABLE 3.2: HSV Colour ranges for each type of feature that is captured by
detecting the presence of said colours.
Additionally, from research conducted on precedent in this field, it appeared beneficial to

also pass in an extremely low-resolution greyscaled version of the frame as information, in
order for the network to have more context-sensitive information with which to form its
own pattern detection, as done in the aforementioned Super Mario Kart AI [15]. This will
be explored further in the next subsection, but the data capture system also stores this.
Capturing Gamepad Input

One last aspect of the state detector remained, hitherto unmentioned. At this stage, the
author’s understanding had reached a point that it was clear that the most trivial approach
to the actual learning stage of the project was to make the network supervised and teach it to
play like a human. A human subject could, whilst capturing data, also have their gamepad
inputs recorded and use them as output data. This was achieved using the Inputs library for
Python. The script produces a multi-one-hot array of size 13 where each index represents a
game action triggered by gamepad input. The list is as follows:
• Move forward
• Move left
• Move backward
• Move right
• Guard left
• Guard top
• Guard right
• Guard break
• Light attack
• Heavy attack
• Dodge
• Feint
• Taunt/Execute
• Idle (Do nothing)
However, the initial script written was changed much later in the project after it was dis-
covered that the analogue sticks would not be detected at all if they were not being actively
moved. This was due to a problem related to threading. As with other buttons, analogue
stick data comes in a buffer that has to be read from, and that buffer does not update if the
sticks do not move. To solve this, a new thread was made only for gamepad input detection.
This way the buffer could be read from constantly within its own loop, and ensured that
when holding an analogue stick in a direction that it would consistently update and detect
properly.
With the state detector completed, a finalised script was implemented that could start,
pause, and stop recording of data and then pickle it to a file, thus completing the feature
preprocessing stage. The lack of testing of the state detector is detailed and justified in a
later section of this report (4.1.1 - Infeasibility of Evaluating the State Detector).
3.3.3 Player Control Neural Network

This subsection explores in detail the process of developing the neural network to control the
player. Several different types of models were attempted - feed forward, recurrent, mixed
data recurrent - and this subsection is split into an explanation of the implementation each of
these different models, followed by each method attempted to improve upon the previous
model. This subsection is detailed in a narrative manner so as to better describe the process
of the implementation itself and help justify why decisions were made and the consequences
thereby invoked. The final results of the implementation discussed here can be seen in the
following chapter (4 - Testing and Evaluation).
Feed Forward Network

The state detector captures a multi-one-hot array describing the presence of important fea-
tures extracted from the frame. With the input stage complete, the next logical step was to
build the neural network model. The initial model was entirely feed forward. Feed forward
models process information and pass it to subsequent layers before reaching the output node
to decide on what to return. Feed-forward models do not retain any information between
passes, and thus each set of inputs passed into a trained feed-forward model will output
the same value irrespective of any passes that occurred before. Concisely, they possess no
"memory" of any kind.
Research discussed earlier in the report (2 - Literature Survey) had suggested that it was
likely that this would not be sufficient, as context related to previous actions made by the
opponent (in the form of combination moves, timed events, and the individual’s preferred
style of play) and the player’s own previous actions and button presses would be required to
make intelligent decisions. However, it was still worth attempting to observe its efficiency,
and to serve as an additional benchmark for the later models.
The Tensorflow library is used to model, train, and run the neural networks, using Keras
as a wrapper in order to abstract and simplify the process. Keras allows one to abstract the
declaration of, for example, a dense layer in a model. This would normally require several
complicated lines describing the nature of the weights, nodes, and activation function, such
as:
h i d d e n _ 1 _ l a y e r = { " weights " : t f . V a r i a b l e ( t f . random_normal ( [ 1 2 8 ,
n_nodes_hl1 ] ) ) , " b i a s e s " : t f . V a r i a b l e ( t f . random_normal ( [
n_nodes_hl1 ] ) ) }
l 1 = t f . add ( t f . matmul ( data , h i d d e n _ 1 _ l a y e r [ " weights " ] ) ,
hidden_1_layer [ " b i a s e s " ] )
l 1 = t f . nn . r e l u ( l 1 )
Using Keras, this simply becomes:

model . add ( k e r a s . l a y e r s . Dense ( 1 2 8 , a c t i v a t i o n = t f . nn . r e l u ) )
Allowing much more convenient and powerful control over making immediate major changes
to the model structure.
This was a simple enough process. Some test data was captured by fighting a dummy CPU
several times. Then, a simple model was developed that takes in the input state, along with
the buttons pressed at each given frame to serve as a training output. The scaled frame was
not yet passed in, so as to observe what the minimum necessary input data required. The
model itself was comprised of three Dense (fully-connected) layers of 128 rectified linear
nodes (hereafter this activation function is referred to as "relu"), followed by a softmax layer
of 14 output nodes, one to represent each button, and a final node to represent idling. Soft-
max activations are used for one-hot encodings, meaning only one button would be pressed
at a time. Figure 3.3 has been generated by Keras visualising the model:
F IGURE 3.3: The feed forward model, made up of three rectified linear activa-
tion layers followed by a softmax output layer.
The initial model was trained on the dummy data. The samples size was low and the data
itself relatively noisy, and so a fairly low initial accuracy was expected. However, after the
model ran, Tensorflow claimed to have achieved a validation accuracy of approximately
70%. The only true means of testing the model was to test its performance in the game,
and so the process of developing a separate script began; a script that would load a trained
feed-forward model and use it and the state detector to "predict" game inputs.
It took an unexpectedly large amount of time to implement the real-time predictor, due
to seemingly running out of video memory when trying to run both For Honor and the
network at the same time. This was solved upon discovering that Keras tries to allocate
as much of the VRAM as possible to itself unless otherwise specified, much of which was
being consumed by the game in this case. By forcing a maximum allocation of 30% to the
neural network, both the program and the game were able to execute properly when run
simultaneously.
When the model was initially run, it was observed that it had been incentivised to almost
exclusively predict "Idle" - i.e. to do nothing at all. The author’s domain knowledge assisted
in identifying the cause of this erroneous behaviour. This is due to the fact that, in any given
frame in a fight, it is unlikely that a button would be being pressed by a player, as duels in
For Honor are carefully paced and designed to avoid "button mashing" behaviour (the act
of aggressively attacking by pressing essentially random buttons that lack any thought or
skill). As a result, most of the data explained that the right answer was indeed to do "noth-
ing", and thus the network had learned that if it did nothing every single time, then it would
get the majority of its behaviour correct, thus explaining the erroneously high accuracy.
This problem of imbalanced data is extremely common in the field of Machine Learning
and as a result a myriad of methods have been developed to tackle it. However, since the
primary solution - obtain more balanced data - was not possible due to the nature of the
game itself as explained above, the next best solution was implemented: modification of the
class weights. This allows certain outputs in the training set to be worth a certain factor
more than other classes. For example, when utilising this method, the weightings could
be set such that every frame containing an output with an actual button press would be 50
times more important than any idling examples. At first, employing this method appeared
to resolve the problem. The trained model, when run in game, appeared to make some in-
formed decisions, such as occasionally parrying attacks. At the very least, it was observably
no longer classifying "idle" exclusively.
However, after providing the network with more robust data comprised of a large amount of
successful fights, it was becoming increasingly apparent that more data was not improving
the performance of the network. It was uncertain as to if it had learned at all. As a result of
this failure to learn, the new goal regressed temporarily to teach the network to simply block
attacks. This task is a trivial mapping, teaching it to match the same guard as the opponent’s
whenever they attacked. It was at this point where the threading issue with the gamepad
detection was identified, discussed earlier in (3.3.2 - Input). In addition to this, the model’s
final output layer was changed from a softmax one-hot output to a sigmoidal layer. The
Sigmoidal activation function outputs an array that represents the probability of the inputs
resulting in each corresponding class. This is then interpreted such that if any probabilities
exceed a certain threshold then to input the corresponding button. This allows the network
to "press" multiple buttons simultaneously. Once these two issues were rectified and the
network was retrained on fresh data, the automatic blocking bot worked very effectively
(supplementary video: https://youtu.be/LR2Tm7F-w80). Results from 100 thrown attacks
can be seen in Table 3.3.
Direction Blocked Not Blocked

Left 22 9
Up 32 3
Right 29 5
TABLE 3.3: The number of blocked and landed opponent attacks before cor-
recting the region of interest. Note that this was a "sanity test" to ensure the
network was making intelligent decisions. Results of final performance tests
are detailed in Chapter 4.
Whilst it was relatively accurate, then with a total accuracy of 83%, the individual accuracy
of attacks that came from the left were significantly less so, with an accuracy of only 71%.
This led to the conclusion that the state detector was, for an unknown reason, not as effective
at detecting the left guard than the others. The root of this problem was not difficult to find;
With some visual testing it could be seen that the Region of Interest cropping, discussed
initially in (3.3.2 - Input), was cutting off the left guard, and even a small part of the right
guard. Increasing the dimensions of the space kept unmasked was enough to fix this prob-
lem. This can be observed in the significant drop in left and right missed blocks, displayed
in Table 3.4.
Direction Blocked Not Blocked

Left 31 3
Up 28 4
Right 34 0
TABLE 3.4: The number of blocked and landed opponent attacks after correct-
ing the region of interest. Results of final performance tests are detailed in
Chapter 4.
However, even with the threading issue resolved, capturing a new set of fights with the cor-
rected data capturing system and training the model yielded no improvement in fighting
performance. The details as to how models were compared are explained in (4 - Testing
and Evaluation). There were two main ways to proceed from here to improve the model.
One method was to add the other type of input data to be fed into the network. At that
stage in the development, only information passed from the state detector and the previous
gamepad inputs from 6-60 of the previous frames had been used; the normalised, low reso-
lution frames stored alongside this data had yet to be utilised. However, additional research
determined that developing models for mixed inputs was a non-trivial task. In the article
"Keras: Multiple Inputs and Mixed Data" [13] Dr. Adrian Rosebrock writes:
Developing machine learning systems capable of handling mixed data can be ex-
tremely challenging as each data type may require separate preprocessing steps,
including scaling, normalization, and feature engineering.
Working with mixed data is still very much an open area of research and is often
heavily dependent on the specific task/end goal.
With this considered, and given the time frame involved, it was more logical to consider the
second option first - converting the network to be recurrent rather than feed-forward, then
consider implementing a mixed input network afterwards if there was still no noticeable
improvement.
Recurrent Network
The structure, advantages, and workings of recurrent networks, specifically long short term
memory networks, are discussed in (2.2 - Neural Networks in Self-Playing AI). The envi-
ronment had been modified prior to allow for GPU training. With CuDNN also installed,
this allowed for the CUDA version of the Tensorflow LSTM cell to be used (which allows
training on the GPU instead of the CPU, which is significantly faster). Using LSTM units,
a recurrent network model with Dropout was developed. Dropout is a method to tackle
overfitting by randomly "switching off" a specified percentage of units by setting them to
zero. The optimum fraction of units to apply Dropout to was subject to experimentation,
but began with 10%. Figure 3.4 depicts the initial model structure.
Capturing training data becomes more difficult with the addition of network memory. With
a feed forward network, specific actions could be captured in a controlled environment be-
cause context did not matter. For example, learning to successfully "counter guard break"
when the opponent attempts to guardbreak the player could be learned by having the oppo-
nent guardbreak repeatedly. However, a recurrent network considers sequences of events,
and in a real fight such a scenario would not ever take place. As a result, the only way to
properly capture data was to play through full, normal fights.
Initial data consisted of 54 rounds worth of fights versus a Warden of Difficulty Level 2,
50 of which were won. Initial results were very promising. The network still failed to do
simple actions like intercept attacks by blocking (though this was considered to be due to
a simple lack of data), but was able to perform complex and skilful evasions and counter-
attacks. For example, the network learned to optimally counter a "shoulder bash" move by
doing the appropriate four action combination attack. It was able to occasionally win against
Difficulty Level 1 bots, and even once defeated a Difficulty Level 2 bots (supplementary
video: https://youtu.be/6I5jLyAHcjY). With this level of success, the next step was to
capture much more data and observe its performance.
F IGURE 3.4: The initial recurrent layer, comprised of a CUDA-enabled LSTM

layer which has Dropout applied and is subsequently flattened to allow the
output to be passed into a Dense layer, and output using sigmoidal activation.
Attempts to improve the Recurrent model

First, data was captured of approximately 15 000 samples, capturing 600 frames of previous
gamepad input history (approximately ten seconds of previous information). Experimenta-
tion was performed with this data and existing network with the network hyperparameters.
This included modifying the learning rate, dropout, and decay rate. None of these made any
noticeable difference. More LSTM layers were added to the network to see if this improved
the performance. However, if a single layer universal approximator fails to learn from the
provided data, it is very unlikely that additional layers will make much improvement, and
this was seen to be the case. Even after much more data was recorded, with 50 000 samples
taken from fighting a Level 3 bot, there was no improvement.
Despite a supposed validation accuracy of over 90% every time, none of the discussed meth-
ods provided any kind of improvement. The high validation accuracy is discussed later in
this chapter (3.3.3 - Class Imbalance with Multi-Label Classification). However, at this point
it was considered that it may be that 10 features alone might not have been rich enough
data to learn such a complicated classification task. The next step taken was to provide the
network with the image data as well. This would require a vast restructuring of the network.
Mixed Input Data Network

In order to construct a neural network that made use of the image data, Keras’ Functional
API was utilised. Up until this point the network had been utilising the Sequential API,
which required each layer added to be subsequently connected to the next layer added. The
Functional API is much more powerful, as it allows much more control over which parts
of the network connect. The new neural network consisted of two branches; one branch
was the existing LSTM layer that received the categorical state data. The other branch was a
convolutional layer that processed the image data. These were both then flattened and con-
catenated so they could be processed together at activation, after being abstracted through
several dense layers. Figure 3.6 depicts this structure. Convolutional layers were not inves-
tigated during the research phase of the project, but are made up of two processes, convolu-
tion and pooling. Convolution is the splitting of the data into "feature maps". For example,
in Figure 3.5, the dataset showing human faces splits the images into features such as eyes,
ears, noses, lips, etc. This is done by moving a window (kernel) along the data and, for
each subset of data, checking the rest of the image for similarity to the pixels in the window.
This is convolving. What results is a map for each feature that represents what areas of the
data sample are likely to contain that feature. The convolution layer is this resulting stack of
feature maps. Pooling is then performed, which walks another smaller window along the
feature map and attempts to condense the captured window into a single data point. This
is most often done in the form of MaxPooling, where the highest value in the window is
chosen.
F IGURE 3.5: Visualisation of convolutional neural networks. Image taken

from [9].
However, this network still did not provide any kind of improvement. In fact, with the new
and much larger dataset, the performance of the network was actually worse than the initial
recurrent’s - it failed to appear that the network was ever making intelligent decisions. It was
at this point that the different datasets were compared. The initial set, though much smaller,
still performed better than the new set. Besides a smaller size - which should theoretically
have an adverse effect on learning - the only difference was the number of previous frames’
outputs being fed as inputs; the initial dataset stored only 6 frames of history, whilst the
newest dataset contained 600. This potentially implied that training and predicting using a
higher quantity of frames of history was less effective than using a lower number, if at all.
Based on this theory, the network was then retrained, but with the input history completely
F IGURE 3.6: The mixed-data network that accepts both the categorical and
image data, fed through an LSTM layer and a convolutional layer respectively,
before being concatenated and fed through Dense layers till activation.
removed, training solely on categorical and image data. Upon training, the network began
to exhibit more intelligent and reactive decision making again. Not enough to be considered
competent, but marginally better than the initial recurrent net. The reason for this improve-
ment is considered to be that by having a large history of previous button inputs, it makes
the problem space much larger, and it is very unlikely that the same extended sequence
of inputs leading to the exact same result happens enough times to learn from. Thus, this
problem may be eliminated if there was significantly more data available.
It was discovered after further research that the categorical data was being passed into the
network incorrectly. This is because LSTM units do not learn sequences of events implicitly,
rather each sample has to be accompanied by a history of previous samples split into indi-
vidual arrays. Until now, it had been provided with a sequence length of 1. By providing it
with a small history, around 6-10 previous samples worth, the performance improved con-
sistently. However, increasing the sequence length yielded the same problems as discussed
with training using output history, meaning until a much larger dataset can be acquired, the
sequence length must be kept short.
Class Imbalance with Multi-Label Classification

During the entire span of training, the purported validation accuracy of the networks had
always been over 80%, with the mixed data network achieving over 90% without fail. How-
ever, this supposed success did not at all translate accurately when tested in-game. After
considerable research, the conclusion was reached that the primary reason for this poor,
disjointed learning was due to data imbalance. Certain labels, particularly those pertain-
ing to movement and guard direction were far more present than others. Table 3.5 depicts
how imbalanced this data can be. This explains the paradoxically high validation accuracy,
which is merely representing the class distribution. The network has been incentivised only
to classify the most over-represented labels, and to consider the total misclassification of all
rarer labels to part of the acceptable error.
Class label
0 1 2 3 4 5 6 7 8 9 10 11 12 13
4999 2997 1095 1997 2814 1862 3468 455 1035 768 353 14 38 4897
TABLE 3.5: Example dataset displaying the extreme class imbalance. Whilst
classes 11 and 12 can be considered too low in sample size to learn from, they
are not very important to overall gameplay performance, and were included
for the sake of completeness.
This problem was originally thought to be solved using the application of class weighting.
As explained previously, class weighting allows for the amplification of the cost of misclas-
sifying certain samples during training. Thus theoretically if one applies a higher weighting
to under-represented labels, then they will be considered more "important" than more reg-
ularly occurring labels. However, this method, and indeed other traditional methods (such
as randomly over/undersampling) do not work with multi-label classification tasks. This
is because when applying an amplified cost to a misclassified sample containing an under-
represented label, because of the multi-label nature of the data, it is very likely that one of
the more over-represented labels would also be present in that sample. Thus that label gets
considered more important as well. Indeed, applying balancing methods such as this to
multi-label data can actually exacerbate the problem. Data balancing for multi-label classi-
fication is still a developing field of research.
Change from Accuracy to F Score as Metric

In order to apply more intelligent and useful improvements to the network, a better metric
for determining the efficacy of each trained model was required. Training via live testing
was nebulous and inconclusive, especially by the time in the development cycle that it was
becoming apparent that there would likely not be enough time to achieve a model that can
win fights consistently, though this was not the case in the end. This nebulousness was
exacerbated, as previously mentioned in this section, because accuracy was proving to be
a poor metric to ascertain the network’s predictive power. This phenomenon is known in
Data Science as the Accuracy Paradox, where a model with a given accuracy predicts more
correctly than a model with purportedly higher accuracy. The reasoning for this has already
been explained earlier in this section - it is due to the over-representation of certain classes
in the system the model is attempting to predict. For example, if Class A appears in 99%
of a sample set, and Class B only 1%, then a simple network may be incentivised to simply
predict Class A every single time, achieving a 99% accuracy value despite being a blatantly
poor predictor due to consistently misclassifying Class B.
As a result, a better metric for measuring was required. F Score was chosen. The F Score is a
balanced compromise between precision and recall, and can be considered relatively harsh
as a result, because often networks have a high value for one metric or the other. Precision
and recall are calculated as follows:
TP
precision =
TP + FP
TP
recall =
TP + FN
F Score is calculated as follows:

−1
recall −1 + precision−1

F1 =
2
Which is computationally faster when simplified to:
precision × recall
F1 = 2 ×
precision + recall
The custom metric implemented was StackOverflow user Paddy’s version of F Score and
can be found at the beginning of all files named "trainModel.py" [12]. When this was used
and tested with prior networks, it was evident that they all suffered extremely low F Scores.
Results are discussed in the following Chapter (4 - Testing and Evaluation). As a result,
optimising for F Score was the ideal route, as it provided a useful and realistic performance
metric which was less susceptible to the Accuracy Paradox.
Custom Loss Function

Further research was conducted regarding the cost function. Up until this stage of the imple-
mentation, the network had been utilising binary cross-entropy as its loss function, which
essentially creates a binary classifier for each class. However, research suggested that the
primary approach to tackling class imbalance in multi-label classification was by way of a
custom loss function, one that applied weightings to the cost of misclassification dependent
on how under or over-represented each class is in the dataset. After reading through P-GN’s
StackOverflow advice [11] in a thread discussing imbalance tackling techniques, and refer-
ring to the Tensorflow documentation linked in the same reply [18], the final loss function
that was decided upon was a custom implementation of Tensorflow’s
weighted_cross_entropy_with_logits method for Keras. It simply returns the expression:
Σ ((y × −log( X + e) × W + (1 − y) × −log(1 − y + e) × (1 − y))
Where y is the set of the batch’s targets, X is the input batch, W is the set of class weightings,
and e is a small constant, 1e − 10. By applying the weightings such that each weight is
inversely proportional to each class’s presence in the dataset, the loss function decreases
the false positive count. This has the effect of increasing the precision at the cost of the
recall. When tested, the network showed considerable improvement, and the in-game tests
displayed a much more intelligent level of decision making, despite still not being able to
win fights consistently.
Temporal Convolutional Layer

Further research on the nature of multi-label learning using sequences revealed that one
other potential option instead of LSTM units was instead to use temporal convolutional
layers. Temporal convolutional layers are the same as standard convolutional layers, but
are only 1-dimensional. This means they are often used to split up time sequences rather
than images or other two-dimensional matrices of data. This would theoretically perform
better at extracting the important features from each set of sequences than the LSTM layer
as it is specifically designed for time sequence data of this nature.
Replacing the LSTM layer with a Convolutional layer did indeed have positive results, and
so the LSTM unit was, at this stage, switched permanently to a convolutional net.
Model Optimisation Utilising Tensorboard

With the basic structure of the model finalised, the next key part to improving performance
of the network was to optimise the model’s parameters. The number of layers for each
branch and the number of nodes were all major features that could have significant impact
on the final performance after training. In order to discern the optimal model, the typical
solution is simply to train every single permutation of a model within a specific set of ranges.
Each parameter’s range was quite small due to the sheer amount of time it would take to
train every combination of model. Below is the list of all of the parameters that were tested
and their ranges:
• Layer sizes in nodes (32, 64, 128)
• Temporal Convolutional layers (1, 2, 3)
• 2D Convolutional layers (1, 2, 3)
• Dense layers following flattening (1, 2, 3)
In order to more easily compare models, one particular callback function was used during
training known as Tensorboard. Tensorboard allows real time visualisation of the training
process by graphing the metrics as the training occurs. Of all the metrics graphed, the vali-
dation F Score could be considered the most important metric to observe, as training F Score
may be corrupted by data overfitting. Each network was trained for 20 epochs, and Graph
4.1 depicts some of the key results.
According to the graph, the best model of all the models tested contained 128 nodes, 2
temporal convolutional layers, 3 2D convolutional layers, and 2 dense layers. It should
be noted that all models with 128 nodes were better than all those with 64 nodes, and the
same applies between 64 nodes and 32. Therefore, theoretically this suggests that using a
model with 256 nodes per layer would bring further improvement. However, the machine
being used to perform training possessed too little memory to handle such a large network.
Further analysis of results are detailed in the next chapter (4.1.2 - Tensorboard).
Addition of Third Branch

With the addition of a useful loss function, and a newly finalised model structure, one major
final improvement was attempted - back during the initial phases of the implementation, the
history of previously pressed gamepad inputs was being used as an input, and initially im-
proved performance. However, it degraded when the sequence length was any higher than
1. This was due to two reasons: the primary problem was that, as discussed, the sequences
were being fed incorrectly - as one long string rather than being split into separate lists. The
second issue is with longer sequences, more data is exponentially required for the network
to learn. This is because an output produced given some input data with a given state is
much less likely to occur multiple times when the input data is a much longer sequence of
previous inputs, as the entire sequence needs to be the same every time. As a result of this,
this input data was removed from the network. At this stage in development, it was now
reintroduced in its own separate branch.
The input history branch was initially using a temporal convolutional layer due to the im-
provement it demonstrated for the other categorical data. The data capturing system was
set up to capture singular frames’ previous input so that different size sequences could be
constructed after the fact, during preprocessing. A new dataset was captured with this sys-
tem in place - 50 rounds of gameplay, approximately an hour’s worth of duels versus a level
2 bot. With this captured, the next stage was to analyse and experiment with the sequence
length. It quickly became apparent with minimal experimentation, however, that the model
still suffered the problem of lack of comprehensive data with long sequence lengths. In
fact, a sequence length of 1 was found to provide the highest predictive power. This would
be re-investigated at the end of the implementation. At this time, the implementation was
interrupted by a new issue that had become apparent.
Network Early Stopping During Training

In order to train effectively, at this stage of the implementation a key problem that presented
itself was potential for overfitting. Even with a high F Score during training, the validation
F Score was significantly lower, which suggested that the network had overtrained, being
trained for 60 epochs. However, with too few epochs trained the network would risk un-
derfitting. As a result, a common tactic for tackling overfitting was utilised - early stopping.
Early stopping is the method of ceasing training if after a certain number of epochs there has
been minimal or negative change to a certain metric. In this instance, training was ceased
if there was no change to the validation F Score after 5 epochs. At the end of training, the
network would also revert back to the model that achieved the best performance. This was
achieved utilising a built-in Keras callback that allows the definition and customisation of
an "EarlyStopper". After some experimentation, this too improved the model somewhat,
achieving a 66.15% validation F Score. However, despite the apparent improvement, the bot
was still not even adequate to complete any of the end specification.
Completion of Third Branch

The final major change resulted in breakthrough results. Originally, when the information
being passed into the third branch was first added at the beginning of the implementation, it
was passed through an LSTM layer, but the third branch was convolutional. During exper-
imentation, this was switched to an LSTM layer, to observe if it would perform any better.
The results were profoundly significant. The network learned extremely well, and the per-
formance in-game was suddenly extraordinarily competent compared to previous results.
The AI demonstrated a clear ability to perform difficult game actions, such as parrying and
even counter-guard breaking, both of which require decision making and actions to be taken
within a window of mere hundreds of milliseconds.
With this significant improvement in performance, it was now observed that there were
still certain areas of the bot’s performance that were poorly utilised or under-represented
- dodging, guarding top and guard-breaking were all used too rarely. At this point, the
weights for the corresponding classes were increased during training such that they would
be classified positive more often. Over the course of several iterations, this was fine-tuned
to an observably optimal level.
At this stage, time for development was ending, and the bot had become rapidly more effec-
tive at fighting. The areas where it lacked good decision making were mitigated by provid-
ing more data of just that problem area, such as dealing with attacks coming from the top
direction. The bot was then tested against other bots, as well as humans. Whilst it does not
win often, it plays far better than a new player to the game, and can even occasionally de-
feat a human opponent. With this achieved, the implementation was considered complete.
Figure 3.7 displays the finalised network structure.
F IGURE 3.7: The finalised neural network model. Input 1 is the categorical
state data. Input 2 is the low resolution, RGB image data. Input 3 is the pre-
vious gamepad input, either provided by the training data, or in the case of a
real-time test it receives its own previous prediction.
31
Chapter 4
Testing and Evaluation
4.1 Verification
4.1.1 Infeasibility of Evaluating the State Detector
Evaluating the state detector is a very non trivial task. There are only two empirical ways
to observe the accuracy of the state detector. Both require testing its classifications against
labelled data. The first method is to label footage completely manually. Given the time
frame available, this is unreasonable due to the sheer volume of labels required. Each frame
could potentially be a different classification, meaning an average two-minute fight at 60
FPS would require 7200 manual labels.
The other method is to record the opponent’s button inputs, and convert them automatically
to the appropriate labels. However, this is again infeasible due to the time constraint. The
states being recorded are context sensitive, and also combination sensitive, meaning certain
sequences of inputs lead to different states. To explicitly depict this, a single character, or
"Hero" has had its moveset described in terms of a finite state transducer:
Q = { Idle, SingleLightAttack, ChainAttack}
Σ = { R1, R2, LS ↑, X, O, , Wait, HoldPrevious/Release, HitOrBlockedPrevious, HoldPrevious}
Γ = { LightAttack, HeavyAttack, GuardBreak, ChargedHeavy, Lion0 sClaws, Lion0 sFangs, Lion0 sBite,
Lion0 sJaws, Eagle0 sFury, Eagle0 sFuryAlternate, LegionKick, LegionKickCombo, Jab, ChargedJab,
JabCombo, QuickThrow, Lion0 sRoar, Feint}
I = { Idle}
32 Chapter 4. Testing and Evaluation
F = { Idle, SingleLightAttack, SingleHeavyAttack, ChainAttack,

GuardBreaking, UninterruptibleAttack, UnblockableAttack, UnblockableStunningAttack,
UnblockableStunningProningAttack}
σ = {( Q0 , Σ0 , Γ0 , F1 ), ( Q1 , Σ1 , Γ0 , F3 ), ( Q1 , Σ5 , Γ15 , F0 ), ( Q2 , Σ0 , Γ4 , F0 ), ( Q0 , Σ0 , Γ0 , F1 ), ( Q1 , Σ0 , Γ0 , F3 ),
( Q2 , Σ1 , Γ5 , F0 ), ( Q0 , Σ0 Γ0 , F1 ), ( Q0 Σ1 , Γ7 , F3 ), ( Q0 , Σ1 , Γ1 , F2 ), ( Q0 , Σ1 , Γ6 , F3 ), ( Q0 , Σ5 , Γ2 , F4 )
( Q0 , Σ2 + Σ1 , Γ8 , F5 ), ( Q0 , Σ2 + Σ3 + Σ1 , Γ9 , F5 ), ( Q0 , Σ1 + Σ8 + Σ5 + Σ7 , Γ12 , F7 ),
( Q0 , Σ1 + Σ8 + Σ5 + Σ9 , Γ13 , F8 ), ( Q0 , Σ1 + Σ8 + Σ5 + Σ0 , Γ14 , F7 ),
( Q3 , Σ0 + Σ6 + Σ0 + Σ6 + Σ0 , Γ16 , F0 ), ( Q0 , Σ1 + Σ4 , Γ17 , F0 )}
The other important aspect of this to observe is how many items in σ result in the same
output, adding an additional layer of complexity to trying to output this. Coupled with
the fact that each character has a different move-set, evaluation by this method becomes
infeasible under the project time constraint.
4.1.2 Evaluating the Neural Networks

This section will describe intricately the efficacy of both types of neural networks under
their appropriate headings by measuring the number of wins against certain difficulty bots,
and comparing their performance with humans. In addition, the validation F Score is pro-
vided for each model after retraining. The function of the F Score metric is described in the
previous chapter (3.3.3 - Change from Accuracy to F Score as Metric).
Random Fighter
To serve as a benchmark for lower-boundary comparison, a naive bot was implemented that
simply output a random array of gamepad inputs every 0.1 seconds. 100 rounds of fights
were recorded in wins and losses, and the results are displayed in Table 4.1. It achieves a
Win/Loss ratio of 0.04.
Random Fighter
Wins Losses
4 96
TABLE 4.1: Performance results of the random fighter after 100 rounds of du-
elling a Level 1 Warden.
Feed Forward
The feed forward network was the first network to be tested. It exhibited some intelligent be-
haviour, such as blocking attacks, occasional parrying, but failed to play consistently enough
to win often. Results are displayed in Table 4.2. It achieves a Win/Loss ratio of 0.15. It is
decisively more effective than a random fighter.
4.1. Verification 33
Feed Forward
Wins Losses F Score
15 85 0.4010
TABLE 4.2: Performance results of the feed forward network after 100 rounds
of duelling a Level 1 Warden.
Recurrent
The initial recurrent network was tested similarly. Results are displayed in Table 4.3. It
achieved a Win/Loss ratio of 0.23. The network evidently performed significantly better
than the feed-forward network.
Initial Recurrent
Wins Losses F Score
23 77 0.4827
TABLE 4.3: Performance results of the initial LSTM network after 100 rounds
of duelling a Level 1 Warden.
Subsequent tests were performed of a smaller scale upon attempting to improve the recur-
rent net. The tests were of a smaller sample size simply due to the time constraint of the
project. They do, at the very least, suggest a general tendency towards improved effec-
tiveness, and can be seen in Tables 4.4, 4.5, 4.6, and 4.7. As discussed in the Implementation
(3.3.3 - Attempts to improve the Recurrent model) the new dataset saw a severe degradation
in performance, and several methods were attempted in order to fix this degradation.
Pre-Improvement
Wins Losses
2 18
TABLE 4.4: Initial results of 20 rounds of duelling a Level 1 Warden before

improvements were attempted. Note that no F score was recorded for this
model as it was discarded before the switch to this metric. Win/Loss ratio: 0.1
Prediction History Removed

Wins Losses
4 16
TABLE 4.5: Results of 20 rounds of duelling a Level 1 Warden after the net-
work was trained without the previous frames’ gamepad outputs as inputs.
Note that no F score was recorded for this model as it was discarded before
the switch to this metric. Win/Loss ratio: 0.2
Mixed-Input Model
Wins Losses F Score
4 16 0.3247
TABLE 4.6: Results of 20 rounds of duelling a Level 1 Warden after the net-
work was changed to accept multiple inputs. Win/Loss ratio: 0.2
Multi-Input Model
Wins Losses F Score
6 14 0.6617
TABLE 4.7: Results of 20 rounds of duelling a Level 1 Warden after the data
being passed into the LSTM layer was fixed by converting it into a series of
sequences. Win/Loss ratio: 0.3
It should be clarified that 20 rounds of duelling is not at all a large enough sample size to
make any conclusive observations about the efficacy of each model. The testing data above
is primarily used as a means to show that none of the model modifications displayed any
significant improvement. A larger sample size would be desirable, but each round can take
from 30-180 seconds to complete, and due to the time frame of the project, obtaining a larger
sample to test each and every attempted improvement was simply infeasible.
Finalised Model
Once the final model had been determined, and the only remaining major improvements
were to qualitatively modify the misclassification weightings and provide more data, the
model was tested rigorously against both the Level 1 and Level 2 difficulties of bot - a test
not previously performed on other models, as initial testing demonstrated that they could
almost never defeat a level 2 opponent. The final model achieved an F Score of 0.8091, a
significant improvement over every other model, and this is evident in the test results:
Multi-Input Model
Wins Losses
74 26
TABLE 4.8: Performance after 100 rounds of duelling a level 1 Warden. A

remarkable improvement over other models in a relatively small step, switch-
ing the Conv1D branch that processed the previous gamepad inputs to using
LSTM. Win/Loss ratio: 0.74
Multi-Input Model
Wins Losses
23 77
TABLE 4.9: Performance after 100 rounds of duelling a level 2 Warden using
the same finalised model as above. No other previous model was able to
intelligently defeat a Level 2 opponent. Win/Loss ratio: 0.23
In both the training process and live testing, the finalised model performs significantly bet-
ter than every other model, and can be considered a genuine success. When compared to
an example novice player, who had volunteered as a test subject, it can be seen that their
performance in terms of win ratio are very comparable, shown in Table 4.10.
Multi-Input Model
Wins Losses
76 24
TABLE 4.10: Performance after 100 rounds of duelling a level 1 Warden,

fought by a novice-level human player. The performance of this player and
the bot are comparable. Win/Loss ratio: 0.76
Though there was no set data recorded, the bot did fight humans numerous times. Whilst
it is evidently not good enough to defeat experienced humans consistently, it does make it
a challenge for humans to defeat, and even managed to defeat both of the two human test
subjects on more than one occasion. Table 4.11 below displays all of the prior testing table
results, congregated for comparison.
Model Performance Results Comparison

Model F Score Rounds Played Wins Losses Win/Loss Ratio
Random Fighter n/a 100 4 96 0.04
Feed Forward 0.4010 100 15 85 0.15
Initial Recurrent 0.4827 100 23 77 0.23
Pre-Improvement - 20 2 18 0.1
Prediction History Removed - 20 4 16 0.2
Mixed-Input Model 0.3247 20 4 16 0.2
Multi-Input Model 0.6667 20 6 14 0.3
Final Model 0.8091 100 74 26 0.74
Novice Human n/a 100 76 24 0.76
TABLE 4.11: Performance results of all models tested. The degraded model
Pre-improvement and with the Prediction History Removed did have F Scores
recorded. The improvement in Win/Loss ratio is observable, and by the end
is comparable to the results of a real novice human’s results. All duels were
versus a Level 1 Warden.
Tensorboard
As discussed in (3.3.3 - Model Optimisation Utilising Tensorboard) once a basic model struc-
ture had been finalised (before adding the LSTM branch, there was too little time to perform
re-testing after the addition of the third branch), utilising a temporal convolutional layer and
2D convolutional layer, several values for different parameters were tested. This included
the number of neurons per layer and the number of layers for each branch. Every permu-
tation of model for these parameters (within a small range) was trained and compared with
each other. They were compared utilising Tensorboard, a visualisation tool for Tensorflow,
used within Keras as a callback.
Figure 4.1 depicts the performance of 15 of the 98 models tested. Many models experienced
extremely similar training cycles, and so have been omitted for the ease of readability of the
graph. It is visible that all models with a higher quantity of neurons per layer outperform
those with lower amounts. 128 neurons was the highest amount that memory limitations
allowed for. Figure 4.2 displays the generated graph of the network structure in far greater
detail than in Figure 3.7.
F IGURE 4.1: Results of 15 models trained with the same network structure,
but with different values for the number of layers for each branch, and the
number of neurons per layer. The name of the model details the values of
these parameters.
F IGURE 4.2: Tensorboard visualisation of the final neural network structure.
4.1.3 Validation
This subsection will review and reflect upon the specifications set out initially (1.2 - Baseline
Target Specification For The Project), and determine how much of what was set out to be
achieved was fully realised. It will also evaluate whether the specifications were themselves
appropriate and realistically feasible. Each stated specification is listed and explored.
Specification Review
The bot must be able to competently make fighting decisions against an enemy.
Whilst this is a fairly nebulous requirement, it nonetheless is of key importance, and is better
observed qualitatively than quantitatively. The bot consistently performs important deci-
sions, such as switching its guard direction to match an opponent’s in order to block incom-
ing attacks, parrying attacks, following successful guard breaks and parries with optimal
counter-attacks, and more. Counter-attacks are performed less consistently than standard
defensive moves, but these successful actions require a window of decision of less than half
a second. It can be confidently stated that the bot does competently and consistently make
fighting decisions against an opponent. It can be argued however that it does not do so
consistently enough to be adequate, but this is explored later in the subsection.
Ideally it should fight and win matches with a regularity of over 50% against
Level 2 difficulty bots and intermediately-skilled players.
This can be considered a failed specification. As evidenced by the results listed in the pre-
vious subsection, in Table 4.9, it can be observed that the win percentage is only 23%. The
bot simply does not make intelligent decisions consistently enough to win against Level 2
difficulty bots in most test cases. It should be noted, however, that win ratio is a particularly
harsh metric. For Honor is a game where an extremely small number of mistakes made in
the span of milliseconds can cost you an entire match. Whilst a player does not need to play
perfectly, they also cannot afford more than a small number of mistakes, and so even a 23%
win percentage can be considered an achievement. Upon reflection, this specification could
be considered itself to be poor and overly harsh given the time frame. It was born out of,
at the time, a lack of knowledge on the part of the author. It was not known at the time the
sheer complexity of multi-label, multi-class classification tasks such as this, especially when
dealing with imbalance.
It must be able to play without any manual assistance from the time when ini-
tially locking on until the round is over.
This was successful. Once the program is running and the model has loaded, the AI captures
several frames to use as a frame of reference to begin with. Then it can be activated and
ceased with the push of a button. Nothing else is required of the user.
The bot must be able to play in real-time using information entirely gleaned from
the screen. It should not require any hooks to be connected to the opponent’s
client, and thus should be able to fight any opponent, including in-game bots.
This was also successful. Explained in detail in the Implementation (3.3.2 - Input), the bot
gains all of its information via complex computer vision techniques utilised in a state detec-
tor. It gains the rest of its input data from the frame itself and its own previous predictions.
End Specification Review

In addition to being competent at "fighting", it would be ideal for the bot’s game-
play to be indistinguishable from that of humans’. This can be tested for by sur-
veying people both familiar and unfamiliar with the game, showing them two
different pieces of recorded footage of duels, and asking them to guess as to
which was performed by a human. This process could be repeated a number of
times on the same subject. The intended result would be for those familiar with
the game to be unable to consistently guess correctly.
A kind of "Turing Test" was performed. Footage was recorded from a novice human test
subject, playing in the exact same circumstances as the bot, in the same map against the
same difficulty opponent (Level 1 Warden). 5 fights were recorded. 15 fights were then
recorded of the bot playing. These are then compared in a survey, with four fights being
displayed per question, only one of which was controlled by a human. Respondents were
asked to watch the four fights, then answer as to which they believed to be the human.
Theoretically, if the bot was extremely poor or exhibited erratic behaviour, then it would
have been extremely trivial to discern the human. The results will now be explored. Charts
for all five questions can be seen in the Appendix (A - Turing Test Responses) along with the
answers. The video can be seen at the following link: https://youtu.be/Y2cir2THk3E
Each respondent was able to obtain a maximum score of 5, if they managed to discern the
human in all five scenarios. As it was, of the 20 people that responded to the survey, none
of them were able to obtain the maximum score. Nor were any able to obtain a score of 4/5.
Scores beyond that were fairly evenly distributed though tending towards the lower end,
with 5 scoring 3/5, 5 scoring 2/5, 6 scoring 1/5, and 6 scoring 0/5. The mean score thus
was just 1.41/5, which is barely over the most likely score obtained by randomly choosing
answers of 1.25 (0.25 × 5). Given the generally poor performance from respondents, it can
be concluded that the AI observably performs similarly or better than a novice human. Full
results can be seen in Figure 4.3.
F IGURE 4.3: Results from the "Turing Test". No respondents scored a maxi-
mum score of 5, nor even 4. This is a promising result, as it implies that it is
extremely difficult to tell the AI apart from a novice human.
It would be useful if it was able to learn to play more than one kind of character,
as each has a different moveset and different styles with which to win fights. (it
is unlikely it will be completely achievable for every character due to the time
limit of the project and the quantity of data required to be captured).
This final extension specification can be considered not met. The AI can only play as the War-
den Hero, and has only been trained to fight against Wardens. However, theoretically the
network should be able to learn to play as any Hero and against any Hero, provided there is
a dataset tailored specifically for that combination. This combination was chosen due to the
Hero’s comparative simplicity, and due to the fact that supervised data capture required the
skill level of the author to be high in order to provide useful data. More complicated fight
matchups were more likely to lead to more erroneous data capture and subsequently worse
results.
41
Chapter 5
Conclusions and Future Work
5.1 Evaluation
This section will reflect and review upon the entirety of the project and provide a more
general analysis than Chapter 4 as to how successful the end result was, and how many of
the original intentions were realised.
5.1.1 Project Successes

The end result of the project at this time can be considered a success. Many initially large-
scope problems were solved. The game can be controlled easily via script; in fact the code is
generalised in such a way that any game could be controlled using it.
Beginning from a standpoint of absolutely no experience by the author in computer vision,
the state detector developed implements fairly complex algorithms developed specifically
for this project in order to achieve an accurate and detailed extraction of the current game
state, a feat made even more challenging by the fact that the user interface moves around the
screen constantly. The state detector is accurate enough to be able to build an auto-parrying
tool with extremely minimal effort, that works against the fastest opening attack in the game
with 70% accuracy (3.3.2 - Input). Coupled with the multi-thread gamepad input detector,
it serves as an effective data capture tool, storing information in such a way that this section
alone can be extended into other tools.
When used in conjunction with the final neural network model (See Figure 3.7) the AI,
dubbed "Neural Knight", is able to effectively make real intelligent decisions in a real-time
fight. It is able to achieve a 74% win rate against level 1 bots, and can fight with a semblance
of efficacy against humans, often beating novice humans, if not experienced players.
5.1.2 Project Failures

With no perspective or frame of reference by the author as to the difficulty of such a project
besides an approximation, the end result can be considered not as effective as it was in-
tended to be. The AI cannot be considered to be on par with an intermediate player. For
Honor is a game that punishes small mistakes harshly, and so due to the fact that, at the
end of the development time, Neural Knight occasionally fails to perform some more basic
tasks like blocking attacks (which it performs successfully most of the time), these mistakes
lead to severe punishment from an experienced opponent meaning Neural Knight rarely
wins against them. With additional data this should be able to be resolved. Most of Neural
Knights performance failings at this stage can likely be fixed with much, much more data, as
the current dataset consists of only approximately two hours worth of constant gameplay.
It can also be considered a failing that Neural Knight requires the prerequisite of disabling
42 Chapter 5. Conclusions and Future Work
the stamina system. Though it does not exhaust itself often, the network has been given no
real frame of reference as to the stamina level and thus it cannot rectify that. Extracting the
health and stamina level was deemed too difficult a task given the time allowed, however a
potential solution has been devised that could solve this problem, discussed further in 5.3.
A more significant failing, but another that can be fixed with data, is that Neural Knight
thus-far has only been made functional for one specific matchup i.e. Warden vs Warden.
The network does not and cannot generalise to other character matchups, and the dataset
provided does not support that. This could be theoretically fixed by recording a new dataset
with a new matchup, however this would mean that each trained model would be designed
to perform in one kind of matchup. In order to be a generalised system, there would need
to be 2424 models and datasets, which is an infeasibly large number.
5.2 Project Comparison

There are seemingly no other projects that attempt to create a self-learning AI for For Honor.
Indeed, games such as these are rarely chosen due for tasks like this due to the sheer diffi-
culty of extracting information via the development of an external API. There are also very
few projects that focus on teaching an AI to control a single complicated agent in a mul-
tiplayer game. The most prominent example of this is OpenAI’s Dota 2 1v1 AI [10]. This
AI was successful enough to beat the world’s top professional Dota 2 players. The AI was
trained using unsupervised techniques, and was able to utilise both an API to get perfect
game information as well as the ability to run simulations of the game far faster than real
time. As a result, the AI played against itself for two weeks, able to play over 4000 games and
learn over time, before performing publicly during a broadcasted tournament. Given these
similar resources, it is certainly possible Neural Knight too would succeed to such a degree.
Using unsupervised learning too would be a key strategy to overcoming the problem that
the AI can only be as effective as the player that provides it the data. This is discussed more
in the next section.
5.3 Potential Extension

There are many ways to extend this project beyond its current status in order to improve the
AI and make it more versatile. As discussed in earlier sections, taking the time to capture far
more data would be of great help as currently the dataset is relatively small. Datasets could
also be captured with the player playing as a different hero, or playing against a different
hero, so models could be trained that were able to play different matchups. Given that it is a
supervised task, the better the player is, the better and less erroneous the data captured will
be.
Another way this could be circumvented is by performing unsupervised learning. This
would require no data capture, where instead the AI would learn by itself, most likely using
some form of genetic algorithm such as the Neuroevolution of Augmenting Topologies algo-
rithm, also known as NEAT, described in Chapter 2. Briefly, the algorithm generates many
random neural networks, then rates them using some reward function, usually dubbed their
"fitness" score. The best networks are kept and "bred" together to form new networks, with
occasional mutations. Over time, the highest fitness networks end up gaining high pre-
dictive power. For this project, one possible reward function, which was used by Adam
Fletcher and Jonathan Mortensen to make an AI for the fighting game "Street Fighter II" [4],
was to take the difference in remaining health between the player character and the oppo-
nent. This is significantly more difficult a value to obtain in For Honor, however, due to
5.4. Self Evaluation 43
the user interface constantly moving. However, one method of obtaining the health is to
use more precise pixel-by-pixel analysis. The health bars only move around within a certain
range of the screen. If these two sub-windows were captured, then pixels could be analysed
along a row to monitor the colour, from left to right. When pure white was reached, this
would signify the beginning of the health bar. When a black pixel was reached, then the dis-
tance travelled would constitute the current health. The same could be done with stamina.
Figure 5.1 further explains this concept.
F IGURE 5.1: Concept for a potential reward function. In order to determine

the health, capture the general window where the health bar resides. Travel
from the left till pure white pixels are met. Begin counting. When more than
a certain number of black pixels are reached (so as not to consider each small
black division bar), the distance travelled would be health. Taking the differ-
ence would be able to provide a workable reward function. The same algo-
rithm could be applied to stamina for observation purposes.
In addition to all of this, once Neural Knight has successfully reached a point where it could
receive no further meaningful improvement in its performance of a duel, For Honor also has
a "Brawl" mode, which works exactly the same, but is instead a duel of two teams of two.
This would require taking into consideration a second enemy, as well as co-operation with
another player. It would require either a far more robust API to access game information,
or a way to learn without utilising precise information. Learning to self-play Brawl modes,
as well as the other 4 vs 4 modes, are of such a larger scope that they would be considered
completely different projects.
5.4 Self Evaluation

I, the author, consider the project to be a success. Beginning with no knowledge in machine
learning, nor computer vision, many of the techniques used were self-taught through per-
sonal research. Whilst Neural Knight could currently never beat me in a fight, it can beat my
more novice peers, and the fact that it can defeat Level 1 bots consistently makes it better
than most every new player.
45
Appendix A
Turing Test Responses
Correct answer: 3
46 Appendix A. Turing Test Responses
Correct answer: 4
Correct answer: 1
Appendix A. Turing Test Responses 47
Correct answer: 1
Correct answer: 2
49
Appendix B
Figures & Tables
List of Tables
3.1 A table of images that demonstrates the appearance of the user interface dur-
ing certain game actions, and shows what features were desirable to extract. . 10
3.2 HSV Colour ranges for each type of feature that is captured by detecting the
presence of said colours. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 The number of blocked and landed opponent attacks before correcting the
region of interest. Note that this was a "sanity test" to ensure the network was
making intelligent decisions. Results of final performance tests are detailed
in Chapter 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 The number of blocked and landed opponent attacks after correcting the re-
gion of interest. Results of final performance tests are detailed in Chapter
4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Example dataset displaying the extreme class imbalance. Whilst classes 11
and 12 can be considered too low in sample size to learn from, they are not
very important to overall gameplay performance, and were included for the
sake of completeness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1 Performance results of the random fighter after 100 rounds of duelling a Level
1 Warden. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Performance results of the feed forward network after 100 rounds of duelling
a Level 1 Warden. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 Performance results of the initial LSTM network after 100 rounds of duelling
a Level 1 Warden. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4 Initial results of 20 rounds of duelling a Level 1 Warden before improvements
were attempted. Note that no F score was recorded for this model as it was
discarded before the switch to this metric. Win/Loss ratio: 0.1 . . . . . . . . . 33
4.5 Results of 20 rounds of duelling a Level 1 Warden after the network was
trained without the previous frames’ gamepad outputs as inputs. Note that
no F score was recorded for this model as it was discarded before the switch
to this metric. Win/Loss ratio: 0.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.6 Results of 20 rounds of duelling a Level 1 Warden after the network was
changed to accept multiple inputs. Win/Loss ratio: 0.2 . . . . . . . . . . . . . 33
4.7 Results of 20 rounds of duelling a Level 1 Warden after the data being passed
into the LSTM layer was fixed by converting it into a series of sequences.
Win/Loss ratio: 0.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.8 Performance after 100 rounds of duelling a level 1 Warden. A remarkable im-
provement over other models in a relatively small step, switching the Conv1D
branch that processed the previous gamepad inputs to using LSTM. Win/Loss
ratio: 0.74 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.9 Performance after 100 rounds of duelling a level 2 Warden using the same
finalised model as above. No other previous model was able to intelligently
defeat a Level 2 opponent. Win/Loss ratio: 0.23 . . . . . . . . . . . . . . . . . 34
4.10 Performance after 100 rounds of duelling a level 1 Warden, fought by a novice-
level human player. The performance of this player and the bot are compara-
ble. Win/Loss ratio: 0.76 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.11 Performance results of all models tested. The degraded model Pre-improvement
and with the Prediction History Removed did have F Scores recorded. The
improvement in Win/Loss ratio is observable, and by the end is comparable
to the results of a real novice human’s results. All duels were versus a Level
1 Warden. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
51
List of Figures
3.1 Annotated user interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3.2 The guard direction user interface and corresponding detected lines. . . . . . 13
3.3 The feed forward model, made up of three rectified linear activation layers
followed by a softmax output layer. . . . . . . . . . . . . . . . . . . . . . . . . 19
3.4 The initial recurrent layer, comprised of a CUDA-enabled LSTM layer which
has Dropout applied and is subsequently flattened to allow the output to be
passed into a Dense layer, and output using sigmoidal activation. . . . . . . . 22
3.5 Visualisation of convolutional neural networks. Image taken from [9]. . . . . 23
3.6 The mixed-data network that accepts both the categorical and image data, fed
through an LSTM layer and a convolutional layer respectively, before being
concatenated and fed through Dense layers till activation. . . . . . . . . . . . 24
3.7 The finalised neural network model. Input 1 is the categorical state data. In-
put 2 is the low resolution, RGB image data. Input 3 is the previous gamepad
input, either provided by the training data, or in the case of a real-time test it
receives its own previous prediction. . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1 Results of 15 models trained with the same network structure, but with dif-
ferent values for the number of layers for each branch, and the number of
neurons per layer. The name of the model details the values of these parame-
ters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2 Tensorboard visualisation of the final neural network structure. . . . . . . . . 37
4.3 Results from the "Turing Test". No respondents scored a maximum score of 5,
nor even 4. This is a promising result, as it implies that it is extremely difficult
to tell the AI apart from a novice human. . . . . . . . . . . . . . . . . . . . . . 39
5.1 Concept for a potential reward function. In order to determine the health,
capture the general window where the health bar resides. Travel from the
left till pure white pixels are met. Begin counting. When more than a certain
number of black pixels are reached (so as not to consider each small black
division bar), the distance travelled would be health. Taking the difference
would be able to provide a workable reward function. The same algorithm
could be applied to stamina for observation purposes. . . . . . . . . . . . . . . 43
53
Appendix C
Project Proposal
Below is the original project proposal in its entirety, followed by the list of intended aims
and objectives, and a rudimentary bibliography of the background research conducted.
C.1 Proposal
It is a popular project in the field of neural networks to develop and train neural networks
(and other learning machines) to play single-player computer games, as training is made
easy. Neural networks require many thousands upon thousands of data points, which also
assumes the behaviour being replicated must be able to be encoded (for example, for a feed-
forward neural network with a single layer of hidden neurons, the behaviour must be able
to be represented as a continuous bounded function). Virtually every aspect of a computer
game is already encoded and accessible, and data points can be very easily generated to
provide a large enough training set.
where all other agents are programmed. The concept of doing the same with online multi-
player games is still fairly new, and multiple agents cooperating together to make decisions
even more so.
My project will be the development of a neural network that will learn to play an online
game, and, time-dependent, to develop other neural networks that all learn to work in a
team. The game that will be used is as of yet undecided, though likely it will either be
Ubisoft’s “For Honor” (2017) or Valve’s “Dota 2” (2011).
There are advantages and disadvantages to both games. For Honor, a fantasy 3D melee com-
bat game, utilises its combat engine dubbed the “Art of Battle” system. It is fundamentally
quite simple, and as a result would be able to be easily encoded. There is also specific modes
for 1v1 duelling, 2v2 and 4v4 fights, which would make scaling the project into dealing with
multiple agents very convenient. However, there is no API or other way of easily accessing
real-time information of a fight. As a result, some kind of visual encoding would need to be
done, so computer vision would add an additional layer of complexity to the project.
Dota 2, however, would be significantly more difficult to model. The game’s mechanics are
extremely complicated, and so forming an intelligent AI would be equally difficult, espe-
cially multiple agents working cooperatively, since different agents would have very differ-
ent roles. One additional note is that 1v1 has already been achieved by Open AI, and they
have since begun to work on a 5v5 project which is making progress, so using Dota would
also not be wholly very unique.It would be much easier to encode however, as real-time in-
formation is accessible outside of the game for anyone to use. This, along with my existing
knowledge and experience with the game are the two main advantages of using Dota.
54 Appendix C. Project Proposal
The only hardware needed is a computer powerful enough to run both the network and the
game, as well as presumably another computer to use to fight the AI and give it training
data as well as to test it against. In terms of software, an additional copy of the game for the
AI to play on. The tools and systems I will be using are Tensorflow, and OpenAI’s Gym.
C.2 Aims
• Create an AI that can play a game, most likely For Honor
• Recognise the screen and convert it into inputs that can be fed to a neural network
• Develop a neural network that learns to produce game actions that can win fights
C.3 Extended Aims

• Develop a pair of neural networks that can fight each other
• Develop a pair of neural networks that can fight cooperatively in Brawl mode (2v2)
• Develop four neural networks that can fight cooperatively in Elimination mode (4v4)
• Develop four neural networks that can fight cooperatively in Dominion mode (4v4
objective capture mode)
C.4 Objectives
• Be able to control the game through code
• Regularly capture the screen of the game
• Process each frame into a visualiser that can recognise guard direction
• Process guard direction into normalised data to be fed as inputs into a neural network
• Develop a recurrent neural network that can output the correct response to the en-
emy’s actions
• Develop the AI to be proficient enough to beat Level 1 Bots.
• Develop the AI to be proficient enough to beat competent humans.
C.5 References
G. Cybenko, 1989, “Approximation by Superpositions of a Sigmoidal Function”
John Laird, Michael van Lent, 2001, “Human-Level AI’s Killer Application - Interactive
Computer Games”
Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio
Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman,
Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis,
C.5. References 55
Koray Kavukcuoglu, Thore Graepel, 2018, “Human-level performance in first-person mul-

tiplayer games with population-based deep reinforcement learning”
Jose Font, Tobwas Mahlmann, Unpublished, “The Dota 2 Bot Competition”
57
Appendix D
Preliminary Project Report
D.1 Introduction
Computer games bots are pieces of software that are able to play computer games au-
tonomously. Researchers have applied many different techniques to the design of bots, and
one such modern technique is the use of of neural networks to develop and train neural
networks (and other learning machines) to play single-player computer games, as training
is made easy.
Neural networks require many thousands upon thousands of data points, which also as-
sumes the behaviour being replicated must be able to be encoded (for example, for a feed-
forward neural network with a single layer of hidden neurons, the behaviour must be able to
be represented as a continuous bounded function. Luckily, virtually every aspect of a com-
puter game is already encoded and accessible, and data points can be very easily generated
to provide a large enough training set.
where all other agents are programmed. Neuro-evolutionary computing is a popular method
in this instance, where the goal is unchanging, but the solution is unknown. Neuro-evolutionary
computing is a kind of reinforcement learning, where a generation of random neural net-
works is generated, then tested and rated against some reward function. Poor models are
killed off, whilst fitter models are "bred" to make a new generation. This has been done for
Super Mario World as well as for a host of games for the Atari 2600.
The concept of using machine learning techniques with online multiplayer games, however
is still fairly new. In addition, this kind of technique is not feasible for an online fighting
game, where there is no single hard solution to winning, as the human opponent can be
considered completely unpredictable.
This project aims to develop an AI that learns to play the game For Honor (2017), developed
by Ubisoft. It can be considered a 3D melee dueling game, with additional modes that
support 2v2 and 4v4 gameplay. Since there is no API or hook available with which live
match information may be accessed, a significant portion of this project must be dedicated
to processing features directly from the screen using computer vision. This must be done in
an efficient enough manner that frames can be processed from a minimum of 10 frames per
second, due to this being the time taken for the quickest possible action to take place (guard
switching).
D.2 Aims and Objectives

Aims
58 Appendix D. Preliminary Project Report
• Create an AI that can play For Honor

• Recognise the screen and convert it into inputs that can be fed to a neural network
• Develop a neural network that learns to produce game actions that can win fights
Extended Aims
• Develop a pair of neural networks that can fight each other
• Develop a pair of neural networks that can fight cooperatively in Brawl mode (2v2)
• Develop four neural networks that can fight cooperatively in Elimination mode (4v4)
• Develop four neural networks that can fight cooperatively in Dominion mode (4v4
objective capture mode)
Objectives
• Be able to control the game through code (complete)
• Regularly capture the screen of the game (complete)
• Process each frame into a visualiser that can recognise guard direction (complete)
• Process guard direction into normalised data to be fed as inputs into a neural network
(complete)
• Process attack indicators, unblockable attacks, and guard breaking as normalised data
(complete)
• Process enemy health as normalised data (complete)
• Process player input for use as output data for training and testing (complete)
• Develop a recurrent neural network that can output the correct response to the en-
emy’s actions
• Develop the AI to be proficient enough to beat "Level 1" difficulty Bots.
• Develop the AI to be proficient enough to beat competent humans.
It is worth nothing that Level 3 difficulty bots are often more skilled than humans.
D.3 Methods
With regards to the actual learning, I decided early on that I would utilise a neural network,
as has been done in many observable examples previously. Specifically, I would make use of
a particular model of neural network called a recurrent neural network. As opposed to the
traditional feed-forward neural network, a recurrent neural network not only processes in-
puts from one layer to the next, but information is also retained and fed back within the fea-
ture space. This allows for information to be retained between frames, creating the possibil-
ity for patterns to be recognised within gameplay, because it allows time to be represented.
Especially promising is the model known as the "long short-term memory network", a form
of recurrent network that has a very large memory capacity compared to a normal RNN. An
RNN with external memory was used to train multiple agents to play Quake III Arena, but
using an LSTM solves the same problem as the utilisation of external memory - the problem
D.4. Project Plan 59
of the limitation of memorisation due to the the gradient vanishing and exploding problem
because "the temporal evolution of the backpropagated error exponentially depending on
the size of the weights" , back-propagation being a common and efficient method of training
networks. It was also used to train MariFlow, a self-learning Super Mario Kart AI. However,
my knowledge regarding recurrent neural networks is not yet sufficient, and more research
regarding it will be conducted over Reading Week.
The first segment of my project, however, is the matter of feature preprocessing - obtaining
live information regarding the state of the game from frame to frame. Such information is
not accessible using an API, so the only way to approach this problem is using computer
vision solutions. I have primarily been using the OpenCV documentation to learn, and
initially followed the computer vision section of a course by Sentdex [8].
D.4 Project Plan

• Complete feature preprocessing and run supervised state detector test and construct
confusion matrix - 22/02/19
• Preliminary Report - 22/02/19
• Decide on network type and model - 28/02/19
• Have a successfully trained network - 15/03/19
• Network produces usable, continuous and coherent game input - 29/03/10
• Full draft of project - 29/03/19
• Network able to win a fight against a poorly-skilled player - 7/04/19
• Testing/improvement - 10/04/19
• Space for unforeseen issues/Extension
• Final deliverable - 17/05/19
D.5 Progress to Date

The project can be split into three major problems. The first is the output - controlling the
game through script. This was already solved soon after proposing the project, utilising a
script made by StackOverflow user "hodka" and modified by Harrison Kinsley (A.K.A "Sent-
dex") that converts keyboard input to DirectX events. More details can be read in Weekly
Log 00.
The second problem was the network itself. However, before this can be tackled, it is now
known that large amounts of data will need to be captured, and the network will need real-
time data to be pulled from the screen in order to predict the correct input. This leads to
the third problem - detecting game information directly from the screen itself. This is the
problem I have focussed on after completing the first problem.
I am able to capture the screen and process each frame in real-time, at an approximate speed
of 20 frames per second. Feature preprocessing has nearly been completed. Utilising the
OpenCV library, I have constructed a state detector that is almost able to completely produce
a simple array where each index contains the presence of each feature.
60 Appendix D. Preliminary Project Report
The initial task of detecting guard direction was done by binary thresholding. Binary thresh-
olding is the process of filtering the image only to allow pixels above a certain brightness
value, and convert the rest to black. This could be done with a single line of code. After some
experimentation with the parameters, I was able to quite cleanly be left only with the guard
UI (and some miscellaneous noise, to be expected). Whilst in the conventional case, edge
detection is applied before any kind of line-finding, such as using the Canny Edge Detec-
tion algorithm, the UI is designed such that it was worth attempting to apply line detection
immediately, as it was mostly made of lines.
I used a form of the Hough Lines Transform known as the Probabilistic Hough Lines Trans-
form, provided by OpenCV. This is a more efficient form of the algorithm that also conve-
niently returns the lines as a pair of Cartesian coordinates. Again, this could be done with
only a single line of code. I decided that, in order to determine guard direction, since all
six lines were unique, that direction could be determined by the gradient. That required
writing my own algorithm for retrieving a list of the gradients. After experimenting with
the parameters, the guard detection was tested and successfully labelled the correct guard
most times. A more accurate test will be done of the whole state detector (see Project Plan).
The other kinds of feature were detected using a separate system. By using HSV values,
shades of a certain colour could also be thresholded. Utilising this, verifying the presence of
a colour in enough pixels could be used to detect the state of a feature. This is all stored in an
image separate from the guard direction thresholded image. In the case of normal weapon
attacks, where direction is also required, the attack indicator is converted to white in the
guard direction image before processing, so as to pass the threshold. Unblockable attacks
and guardbreaking is captured in much the same way, each looking for the presence of
certain colours within the frame. A "bash" attack is detected by deduction: if an unblockable
attack is taking place, yet the enemy has no guard up, then it is considered a bash.
The enemy’s health should be captured for the network to potentially use as some kind of
heuristic for progress. This is done by processing a separate copy of the frame and line de-
tecting the health bar. Since the health bar stays the same, and has a length of approximately
60 pixels when full (at 1024x768), finding the ratio of this by dividing the current length by
60 allows the program to get a percentage of health remaining.
In addition I finally generate a low-resolution, greyscaled version of the frame to pass in as
information.
With feature processing virtually complete aside from tweaking and experimenting with pa-
rameters, the next step is for a formal supervised test to be composed to assess the accuracy
and precision of this classifier.
61
Appendix E
Source Code
Experimentation program designed to match the opponent’s guard:

import numpy as np
from PIL import ImageGrab
import cv2
import time
from d i r e c t k e y s import ReleaseKey , PressKey , W, A, S , D
from s t a t i s t i c s import mean
from numpy import ones , vstack , i s f i n i t e
from numpy . l i n a l g import l s t s q
# from g r a b s c r e e n import g r a b _ s c r e e n
# import pyautogui
def draw_ li nes ( image , l i n e s ) :

try :
f o r l i n e in l i n e s :
coords = l i n e [ 0 ]
cv2 . l i n e ( image , ( coords [ 0 ] , coords [ 1 ] ) , ( coords [ 2 ] ,
coords [ 3 ] ) , [ 2 5 5 , 0 , 0 ] , 1 0 )
except :
pass
def r o i ( image , v e r t i c e s ) :
mask = np . z e r o s _ l i k e ( image )
cv2 . f i l l P o l y ( mask , v e r t i c e s , 2 5 5 )
masked = cv2 . b i t w i s e _ a n d ( image , mask )
r e t u r n masked
def process_image ( o r i g i n a l I m a g e ) :
processedImage = cv2 . c v t C o l o r ( o r i g i n a l I m a g e , cv2 . COLOR_BGR2GRAY
)
# p r o c e s s e d I m a g e = cv2 . G a u s s i a n B l u r ( p r o c e s s e d I m a g e , ( 1 , 1 ) , 0 )
v e r t i c e s = np . a r r a y ( [ [ 4 6 9 , 1 7 9 ] , [ 5 6 7 , 1 7 9 ] , [ 5 6 7 , 2 7 8 ] , [ 4 6 9 , 2 7 8 ] ] )
processedImage = r o i ( processedImage , [ v e r t i c e s ] )
_ , processedImage = cv2 . t h r e s h o l d ( processedImage , 1 2 5 , 2 5 5 , cv2 .
THRESH_BINARY)
# p r o c e s s e d I m a g e = cv2 . Canny ( p r o c e s s e d I m a g e , t h r e s h o l d 1 =200 ,
t h r e s h o l d 2 =300)
62 Appendix E. Source Code
l i n e s = cv2 . HoughLinesP ( processedImage , rho = 1 , t h e t a = np . p i

/180 , t h r e s h o l d = 1 7 , minLineLength = 1 5 , maxLineGap = 5 )
gradients = [ ]
try :
p r i n t ( len ( l i n e s ) )
print ( l in es )
# find gradients
actualLine = line [0]
gradient = ( actualLine [ 3 ] − actualLine [ 1 ] ) / (
actualLine [2] − actualLine [ 0 ] )
i f i s f i n i t e ( gradient ) :
g r a d i e n t s . append ( g r a d i e n t )
print ( gradients )
except :
pass
draw _li nes ( processedImage , l i n e s )
r e t u r n processedImage , o r i g i n a l I m a g e , g r a d i e n t s
def find_guard ( g r a d i e n t s ) :
avgM = i n t ( mean ( g r a d i e n t s ) )
i f avgM < − 0.5:
r e t u r n RIGHT
e l i f avgM > − 0.5 and avgM < 1 :
r e t u r n UP
else :
r e t u r n LEFT
f o r i in l i s t ( range ( 4 ) ) [ : : − 1 ] :
print ( i + 1)
def main ( ) :
l a s t T i m e = time . time ( )
while True :
# grab_screen ( region =(0 ,40 ,800 ,640) )
newScreen , o r i g i n a l I m a g e , g r a d i e n t s = process_image ( s c r e e n )
# cv2 . imshow ( " window " , n e w S c r e e n )
if gradients :
g u a r d D i r e c t i o n = find_guard ( g r a d i e n t s )
switch_guard ( g u a r d D i r e c t i o n )
print ( guardDirection )
cv2 . imshow ( " window2 " , newScreen )
i f cv2 . waitKey ( 2 5 ) & 0xFF == ord ( " q " ) :

Appendix E. Source Code 63
cv2 . destroyAllWindows ( )
break
p r i n t ( " Loop took { } seconds " . format ( time . time ( ) − l a s t T i m e ) )
main ( )
Program to execute in-game combinations:

from d i r e c t k e y s import PressKey , ReleaseKey , W, A, S , D, J , K, E , Q
, CTRL , UP, LEFT , RIGHT
import time
def tap_key ( key ) :

PressKey ( key )
time . s l e e p ( 0 . 1 )
ReleaseKey ( key )
def lock_on ( ) :
PressKey (CTRL)
def l o c k _ o f f ( ) :
ReleaseKey (CTRL)
def switch_guard ( d i r ) :
tap_key ( d i r )
def warden_heavy_attack ( ) :
tap_key (K)
def w a r d e n _ l i g h t _ a t t a c k ( ) :
tap_key ( J )
def p a r r y _ t h e n _ c o u n t e r ( ) :
warden_heavy_attack ( )
time . s l e e p ( 0 . 1 )
switch_guard ( LEFT )
time . s l e e p ( 0 . 5 )
warden_light_attack ( )
time . s l e e p ( 0 . 0 5 )
warden_light_attack ( )
def e x e c u t e ( o p t i o n ) :
tap_key ( o p t i o n )
"""
time . s l e e p (4)
LockOn ( )
P r e s s K e y (W)
f o r i in range ( 2 ) :
Swit ch Gua rd ( LEFT )
WardenHeavyAttack ( )
time . s l e e p ( 0 . 8 )
Sw it ch Gua rd (UP)
WardenHeavyAttack ( )
time . s l e e p ( 1 . 6 )
E x e c u t e (Q)
R e l e a s e K e y (W)
LockOff ( )
"""
Main Data Capture script:

import numpy as np
import cv2
import time
from ou t p u t C o n t r o l import switch_guard , LEFT , RIGHT , UP

from multithread_gamepadInput import C o n t r o l l e r
from s t a t e D e t e c t o r import g e n e r a t e _ f e a t u r e _ l i s t ,
checkCurrentGuardInput
# for data c o l l e c t i o n
import keyboard
import p i c k l e
import os
capturing = False
def t o g g l e _ c a p t u r e ( key ) :
global capturing
i f capturing :
p r i n t ( " Ceasing c a p t u r e . " )
capturing = False
else :
p r i n t ( " Beginning c a p t u r e . . . " )
c a p t u r i n g = True
def read_data ( s t a t e , gamepad ) :

i f s t a t e [ 3 ] == 1 : # r i g h t
p r i n t ( " The enemy ’ s guard i s r i g h t ! " )
e l i f s t a t e [ 4 ] == 1 : # t o p
p r i n t ( " The enemy ’ s guard i s top ! " )
e l i f s t a t e [ 5 ] == 1 : # l e f t
p r i n t ( " The enemy ’ s guard i s l e f t ! " )
i f gamepad [ 4 ] == 1 : # l e f t
p r i n t ( " The p l a y e r s e t guard t o l e f t ! " )
e l i f gamepad [ 5 ] == 1 : # t o p
p r i n t ( " The p l a y e r s e t guard t o top ! " )
e l i f gamepad [ 6 ] == 1 :
p r i n t ( " The p l a y e r s e t guard t o r i g h t ! " )
i f gamepad [ 9 ] == 1 :
p r i n t ( " The p l a y e r heavy a t t a c k e d ! " )
p r i n t ( "−−−−−−−−−" )
#RUNTIME
def main ( ) :
global capturing
keyboard . o n _ r e l e a s e _ k e y ( " v " , t o g g l e _ c a p t u r e )
print ( i + 1)
p r i n t ( " Neural Knight Data Capture i s now a c t i v e . . . " )
i = 0
i f os . path . e x i s t s ( " i n / s t a t e D a t a 0 . p i c k l e " ) :
foundFile = False
while not f o u n d F i l e :
i f not os . path . e x i s t s ( " i n / s t a t e D a t a " + s t r ( i ) + " .
pickle " ) :
f o u n d F i l e = True
else :
i += 1
f = open ( " i n / s t a t e D a t a " + s t r ( i ) + " . p i c k l e " , "wb" )

g = open ( " out/outputData " + s t r ( i ) + " . p i c k l e " , "wb" )
switch_guard (UP)
c u r r e n t G u a r d D i r e c t i o n = UP
gd = C o n t r o l l e r ( )
gamepadInput = [ 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 1 ]
# prevGamepadInput = [ ]
prevGamepadInput = gamepadInput
"""
f o r i in range ( prevFrames ) :
prevGamepadInput . append ( gamepadInput )
"""
while True :
i f capturing :
s c r e e n = np . a r r a y ( ImageGrab . grab ( bbox = ( 0 , 1 3 0 , 1 0 2 4 ,
700) ) ) # g r a b _ s c r e e n ( r e g i o n =(0 ,40 ,800 ,640) )
# prevGamepadInput = prevGamepadInput [ 1 : ]
# prevGamepadInput . append ( gamepadInput )
prevGamepadInput = gamepadInput
pcg = c u r r e n t G u a r d D i r e c t i o n
c u r r e n t G u a r d D i r e c t i o n = checkCurrentGuardInput (
prevGamepadInput , c u r r e n t G u a r d D i r e c t i o n )
s t a t e = g e n e r a t e _ f e a t u r e _ l i s t ( s c re en ,
currentGuardDirection )
# gamepadInputHistory = [ y f o r x in prevGamepadInput f o r
y in x ]
# print (" History : " , gamepadInputHistory )
s t a t e . append ( prevGamepadInput )
p i c k l e . dump( s t a t e , f )
gamepadInput = gd . get_true_game_input ( )
p i c k l e . dump( gamepadInput , g )
# r e a d _ d a t a ( s t a t e [ 0 ] , gamepadInput )
i f cv2 . waitKey ( 2 5 ) & 0xFF == ord ( " q " ) : # e y b o a r d . i s _ p r e s s e d ( "
q ") :
break
f . close ( )
g . close ( )
main ( )
Script used for capturing gamepad inputs:

from i n p u t s import get_gamepad
import t h r e a d i n g
def l i s t e n ( c ) :
while True :
e v e n t s = get_gamepad ( )
f o r event in e v e n t s :
i f event . ev_type == " Key " or event . code == "ABS_RZ" :
c . b u t t o n s _ p r e s s e d . add ( event . code )
e l i f event . ev_type == " Absolute " :
i f event . code == "ABS_RX" :
c . rx = i n t ( event . s t a t e ) / 32768
i f event . code == "ABS_RY" :
c . ry = i n t ( event . s t a t e ) / 32768
i f event . code == " ABS_X " :
c . x = i n t ( event . s t a t e ) / 32768
i f event . code == " ABS_Y " :
c . y = i n t ( event . s t a t e ) / 32768
class Controller :
def _ _ i n i t _ _ ( s e l f ) :
s e l f . rx = 0
s e l f . ry = 0
self .x = 0
self .y = 0
s e l f . p r e s s e d _ s i n c e _ l a s t _ p o l l = t h r e a d i n g . Thread ( t a r g e t =
l i s t e n , a r g s =( s e l f , ) )
s e l f . p r e s s e d _ s i n c e _ l a s t _ p o l l . setDaemon ( True )
self . pressed_since_last_poll . start ()
s e l f . buttons_pressed = set ( )
def g e t _ b u t t o n ( s e l f , button ) :
pressed = button in s e l f . b u t t o n s _ p r e s s e d
r e t u r n pressed
def g e t _ l e f t _ s t i c k ( s e l f ) :
return ( s e l f . x , s e l f . y )
def g e t _ r i g h t _ s t i c k ( s e l f ) :
r e t u r n ( s e l f . rx , s e l f . ry )
def get_true_game_input ( s e l f ) :
inputArray = [ 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 ]
# W, A, S , D, l e f t , up , r i g h t , GB , l i g h t , heavy , dodge ,
f e i n t , taunt , i d l e
i f s e l f . g e t _ b u t t o n ( "BTN_WEST" ) :
# p r i n t ( " Guard b r e a k " )
inputArray [ 7 ] = 1
i f s e l f . g e t _ b u t t o n ( "BTN_TR" ) :
# print (" Light Attack ")
i f s e l f . g e t _ b u t t o n ( "ABS_RZ" ) :
# p r i n t ( " Heavy A t t a c k " )
i f s e l f . g e t _ b u t t o n ( "BTN_SOUTH" ) :
# p r i n t ( " Dodge " )
inputArray [ 1 0 ] = 1
i f s e l f . g e t _ b u t t o n ( "BTN_EAST" ) :
# print (" Feint ")
i f s e l f . g e t _ b u t t o n ( "BTN_NORTH" ) :
# p r i n t ( " Taunt " )
s e l f . buttons_pressed . c l e a r ( )
movement = s e l f . g e t _ l e f t _ s t i c k ( )
i f movement [ 1 ] > 0 . 7 :
e l i f movement [ 1 ] < − 0.7:
i f movement [ 0 ] < − 0.7:
e l i f movement [ 0 ] > 0 . 7 :
guard = s e l f . g e t _ r i g h t _ s t i c k ( )
i f guard [ 0 ] < − 0.7 and guard [ 1 ] < 0 . 7 :
# p r i n t ( " Guard l e f t ! " )
e l i f guard [ 0 ] > 0 . 7 and guard [ 1 ] < 0 . 7 :
# p r i n t ( " Guard r i g h t ! " )
e l i f guard [ 1 ] > 0 . 7 and guard [ 0 ] > − 0.3 and guard [ 0 ] < 0 . 3 :
# p r i n t ( " Guard t o p ! " )
i f 1 not in inputArray :
p r i n t ( inputArray )
r e t u r n inputArray
i f __name__ == " __main__ " :

a = Controller ( )
while True :
a . get_true_game_input ( )
State Detector, used for data capture and real-time predicting:

import numpy as np
import cv2
from s t a t i s t i c s import mean , mode
from numpy import ones , vstack , i s f i n i t e , where , copyto , mean
from ou t p u t C o n t r o l import LEFT , RIGHT , UP
c u r r e n t G u a r d D i r e c t i o n = None

try :
coords [ 3 ] ) , [ 2 5 5 , 0 , 0 ] , 1 )
except :
pass
r e t u r n masked
def r o i 3 ( image , v e r t i c e s ) : # r o i f o r 3 c h a n n e l i m a g e
cv2 . f i l l P o l y ( mask , np . i n t 3 2 ( [ v e r t i c e s ] ) , ( 2 5 5 , 2 5 5 , 2 5 5 ) )
r e t u r n masked
def f i l t e r _ f o r _ a t t a c k _ i n d i c a t o r ( image ) :
hsv = cv2 . c v t C o l o r ( image , cv2 . COLOR_RGB2HSV)
mask = cv2 . inRange ( hsv , np . a r r a y ( [ 0 , 2 3 4 , 7 4 ] ) , np . a r r a y ( [ 1 , 2 5 5 ,
160]) )
whiteImage = np . z e r o s _ l i k e ( image )
whiteImage [ : ] = 255
l o c s = np . where ( mask ! = 0 )
# h t t p s : / / s t a c k o v e r f l o w . com / q u e s t i o n s / 4 1 5 7 2 8 8 7 / e q u i v a l e n t −o f −
c o p y t o −in −python −o p e n c v − b i n d i n g s
# C a s e #1 − O t h e r i m a g e i s g r a y s c a l e and s o u r c e i m a g e i s c o l o u r
i f len ( image . shape ) == 3 and len ( whiteImage . shape ) ! = 3 :
image [ l o c s [ 0 ] , l o c s [ 1 ] ] = whiteImage [ l o c s [ 0 ] , l o c s [ 1 ] , None
]
# C a s e #2 − B o t h i m a g e s a r e c o l o u r o r g r a y s c a l e
e l i f ( len ( image . shape ) == 3 and len ( whiteImage . shape ) == 3 ) or
\
( len ( image . shape ) == 1 and len ( whiteImage . shape ) == 1 ) :
image [ l o c s [ 0 ] , l o c s [ 1 ] ] = whiteImage [ l o c s [ 0 ] , l o c s [ 1 ] ]
# O t h e r w i s e , we can ’ t do t h i s
r e t u r n mask , l o c s
def f i l t e r _ f o r _ u n b l o c k a b l e ( image ) :
mask = cv2 . inRange ( hsv , np . a r r a y ( [ 8 , 2 5 0 , 1 3 5 ] ) , np . a r r a y ( [ 1 1 , 2 5 5 ,
164]) )
return locs
def f i l t e r _ f o r _ g u a r d b r e a k ( image ) :
mask = cv2 . inRange ( hsv , np . a r r a y ( [ 0 , 2 4 7 , 6 8 ] ) , np . a r r a y ( [ 1 , 2 5 5 ,
85]) )
return locs
"""
we g e n e r a t e 2 d i f f e r e n t i m a g e s . One b l a c k and w h i t e i m a g e f o r
finding the
l i n e s f o r g u a r d d i r e c t i o n , and o n e f o r f i n d i n g t h e a t t a c k i n g s t a t e s
. ProcessedImage finds the l i n e s ( including attack indicators ,
converted to white ) .
outImage turns the a t t a c k i n d i c a t o r s back i n t o red f o r viewing
c o n v e n i e n c e , b u t m o s t l y u s e s l o c s t o f i n d i f an a t t a c k i s
h a p p e n i n g . I t a l s o u s e s t h i s same l o c s method
t o f i n d t h e s t a t e o f u n b l o c k a b l e s and g u a r d b r e a k i n g
As w e l l a s t h i s , i t c r e a t e s a s e p a r a t e i m a g e t o f i n d o u t t h e h e a l t h
o f t h e enemy and p l a y e r
"""
processedImage = o r i g i n a l I m a g e . copy ( )
v e r t i c e s 2 = np . a r r a y ( [ [ 4 0 0 , 1 3 0 ] , [ 6 2 0 , 1 3 0 ] , [ 6 2 0 , 4 0 0 ] , [ 4 0 0 , 4 0 0 ] ] )
processedImage = r o i 3 ( processedImage , v e r t i c e s 2 )
gbLocs = f i l t e r _ f o r _ g u a r d b r e a k ( processedImage )
uLocs = f i l t e r _ f o r _ u n b l o c k a b l e ( processedImage ) # u n b l o c k a b l e
presence locations
mask , l o c s = f i l t e r _ f o r _ a t t a c k _ i n d i c a t o r ( processedImage )
processedImage = cv2 . c v t C o l o r ( processedImage , cv2 .

COLOR_BGR2GRAY)
# v e r t i c e s = np . a r r a y ( [ [ 4 2 0 , 1 6 0 ] , [ 6 2 0 , 1 6 0 ] , [ 6 2 0 , 3 5 0 ] , [ 4 2 0 , 3 5 0 ] ] )
THRESH_BINARY)
# r e p l a c e white a t t a c k i n d i c a t o r with red

outImage = processedImage . copy ( )
outImage = cv2 . c v t C o l o r ( outImage , cv2 . COLOR_GRAY2BGR)
redImage = np . z e r o s _ l i k e ( outImage )
redImage [ : ] = ( 0 , 0 , 2 5 5 )
i f len ( outImage . shape ) == 3 and len ( redImage . shape ) ! = 3 :

outImage [ l o c s [ 0 ] , l o c s [ 1 ] ] = redImage [ l o c s [ 0 ] , l o c s [ 1 ] ,
None ]
# C a s e #2 − B o t h i m a g e s a r e c o l o u r o r g r a y s c a l e
e l i f ( len ( outImage . shape ) == 3 and len ( redImage . shape ) == 3 ) or
\
( len ( outImage . shape ) == 1 and len ( redImage . shape ) == 1 ) :
outImage [ l o c s [ 0 ] , l o c s [ 1 ] ] = redImage [ l o c s [ 0 ] , l o c s [ 1 ] ]
# O t h e r w i s e , we can ’ t do t h i s
# Detect l i n e s
gradients = [ ]
try :
# find gradients
i f ( actualLine [ 2 ] − actualLine [ 0 ] ) != 0 :
# print ( gradients )
except :
pass
#now do h e a l t h s t u f f
healthImage = o r i g i n a l I m a g e . copy ( )
healthImage = cv2 . c v t C o l o r ( healthImage , cv2 . COLOR_BGR2GRAY)
v e r t i c e s = np . a r r a y ( [ [ 4 9 0 , 3 0 ] , [ 6 1 0 , 3 0 ] , [ 6 1 0 , 1 2 0 ] , [ 4 9 0 , 1 2 0 ] ] )
healthImage = r o i ( healthImage , [ v e r t i c e s ] )
_ , healthImage = cv2 . t h r e s h o l d ( healthImage , 1 3 0 , 2 5 5 , cv2 .
THRESH_BINARY)
h e a l t h L i n e s = cv2 . HoughLinesP ( healthImage , rho = 1 , t h e t a = np .
p i /180 , t h r e s h o l d = 2 0 , minLineLength = 3 , maxLineGap = 1 0 )
draw _li nes ( healthImage , h e a l t h L i n e s )
h e a l t h L i n e = None # T h i s i s t h e a c t u a l l i n e r e p r e s e n t i n g t h e
health
healths = [ ]
h e a l t h = None
try :
i = 0
f o r l i n e in h e a l t h L i n e s :
i += 1
y D i f f = abs ( a c t u a l L i n e [ 3 ] − a c t u a l L i n e [ 1 ] ) # N o i s e
filtering
i f y D i f f == 0 :
healthLine = actualLine
h e a l t h s . append ( ( h e a l t h L i n e [ 2 ] − h e a l t h L i n e [ 0 ] ) /
60) # Health i s j u s t x d i f f e r e n c e o f the l i n e
ends
h e a l t h = mode ( h e a l t h s )
except :
pass
# c r e a t e l o w r e s v e r s i o n o f f r a m e t o p a s s i n a s c o n t e x t −s e n s i t i v e
information
lowRes = o r i g i n a l I m a g e . copy ( )
lowRes = cv2 . c v t C o l o r ( lowRes , cv2 . COLOR_BGR2RGB)
s c a l e P e r c e n t = 20
newWidth = i n t ( lowRes . shape [ 1 ] ∗ s c a l e P e r c e n t / 1 0 0 )
newHeight = i n t ( lowRes . shape [ 0 ] ∗ s c a l e P e r c e n t / 1 0 0 )
newDim = ( newWidth , newHeight )
lowRes = cv2 . r e s i z e ( lowRes , newDim , i n t e r p o l a t i o n = cv2 .
INTER_AREA)
cv2 . imshow ( " img " , lowRes )
r e t u r n outImage , o r i g i n a l I m a g e , g r a d i e n t s , l o c s , uLocs , gbLocs ,
h e a l t h , lowRes
# 0 = right 1 = top 2 = l e f t
avgM = np . mean ( g r a d i e n t s )
i f avgM < − 0.5:
# p r i n t ( " RIGHT " )
r e t u r n RIGHT
e l i f avgM > − 0.5 and avgM < 0 . 9 :
# p r i n t ( "TOP" )
r e t u r n UP
else :
# p r i n t ( " LEFT " )
r e t u r n LEFT
def g e n e r a t e _ f e a t u r e _ l i s t ( s cr e en , c u r r e n t G u a r d D i r e c t i o n ) :
# p r i n t("−−−−−−−−−−−−−−−")
newScreen , o r i g i n a l I m a g e , g r a d i e n t s , l o c s , uLocs , gbLocs ,
h e a l t h , lowRes = process_image ( s c r e e n )
enemyGuardDirection = None
attacking = False
unb l o c k a b l e = F a l s e
i f len ( uLocs [ 0 ] > 5 ) : # t h i s number i s >0 j u s t t o f i l t e r o u t
noise
u n b l o c k a b l e = True
i f len ( l o c s [ 0 ] ) > 3 0 0 :
a t t a c k i n g = True
i f len ( g r a d i e n t s ) > 0 :
pass
i f g r a d i e n t s and len ( g r a d i e n t s ) < 1 0 : # o n l y c h a n g e g u a r d i f
t h e r e ’ s n o t much i n t h e way o f n o i s e
enemyGuardDirection = find_guard ( g r a d i e n t s )
else :
enemyGuardDirection = None
# i n o r d e r : p l a y e r r i g h t guard , p l a y e r up guard , p l a y e r l e f t
guard , enemy r i g h t guard , enemy up guard , enemy l e f t guard ,
enemy a t t a c k i n g , enemy u n b l o c k a b l e , enemy b a s h i n g , enemy
guard b r e a k i n g ]
features = [0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0]
i f c u r r e n t G u a r d D i r e c t i o n == RIGHT :
features [0] = 1
e l i f c u r r e n t G u a r d D i r e c t i o n == UP :
features [1] = 1
e l i f c u r r e n t G u a r d D i r e c t i o n == LEFT :
features [2] = 1
i f enemyGuardDirection == RIGHT :
features [3] = 1
# p r i n t (" Right ")
i f enemyGuardDirection == UP :
features [4] = 1
# p r i n t ( " Top " )
i f enemyGuardDirection == LEFT :
features [5] = 1
# print (" L e f t ")
if attacking :
features [6] = 1
tempStr = " l e f t "
i f enemyGuardDirection == UP :
tempStr = " top "
e l i f enemyGuardDirection == RIGHT :
tempStr = " r i g h t "
# p r i n t ( " The enemy i s t h r o w i n g o u t an a t t a c k f r o m t h e " ,
tempStr , " ! " )
i f un b l o c k a b l e :
features [7] = 1
# p r i n t ( " Enemy i s t h r o w i n g some k i n d o f u n b l o c k a b l e a t t a c k
!")
i f f e a t u r e s [ 7 ] == 1 and enemyGuardDirection == None : # f e a t u r e s
[ 3 ] == 0 and f e a t u r e s [ 4 ] == 0 and f e a t u r e s [ 5 ] == 0 : # i f
u n b l o c k a b l e and t h e r e ’ s no a t t a c k d i r e c t i o n , i t ’ s a b a s h
features [8] = 1
# p r i n t (" I t ’ s a bash ! " )
i f len ( gbLocs [ 0 ] ) > 300 and enemyGuardDirection == None :
# p r i n t ( " Enemy i s g u a r d b r e a k i n g ! " )
features [9] = 1
f i n a l F e a t u r e L i s t = [ f e a t u r e s , lowRes ]
return f i n a l F e a t u r e L i s t
def checkCurrentGuardInput ( inp , c u r r e n t ) : # u p d a t e c u r r e n t g u a r d

d i r e c t i o n i f i t needs i t
i f inp [ 4 ] == 1 : # l e f t
r e t u r n LEFT
e l i f inp [ 5 ] == 1 : # t o p
r e t u r n UP
e l i f inp [ 6 ] == 1 : # r i g h t
r e t u r n RIGHT
return current
i f __name__ == " __main__ " :

while True :
g e n e r a t e _ f e a t u r e _ l i s t ( s cr e en , UP)
break
Script used to pass prediction into game as keyboard input:

import numpy as np
import cv2
import time
from d i r e c t k e y s import ReleaseKey , PressKey , W, A, S , D
from s t a t i s t i c s import mean
from numpy import ones , vstack , i s f i n i t e
# import pyautogui

try :
coords [ 3 ] ) , [ 2 5 5 , 0 , 0 ] , 1 0 )
except :
pass
r e t u r n masked
processedImage = cv2 . c v t C o l o r ( o r i g i n a l I m a g e , cv2 . COLOR_BGR2GRAY
)
# p r o c e s s e d I m a g e = cv2 . G a u s s i a n B l u r ( p r o c e s s e d I m a g e , ( 1 , 1 ) , 0 )
THRESH_BINARY)
# p r o c e s s e d I m a g e = cv2 . Canny ( p r o c e s s e d I m a g e , t h r e s h o l d 1 =200 ,
t h r e s h o l d 2 =300)
gradients = [ ]
try :
p r i n t ( len ( l i n e s ) )
print ( l in es )
# find gradients
print ( gradients )
except :
pass
draw_ li nes ( processedImage , l i n e s )
r e t u r n processedImage , o r i g i n a l I m a g e , g r a d i e n t s
avgM = i n t ( mean ( g r a d i e n t s ) )
i f avgM < − 0.5:
r e t u r n RIGHT
e l i f avgM > − 0.5 and avgM < 1 :
r e t u r n UP
else :
r e t u r n LEFT
print ( i + 1)
def main ( ) :
while True :
# grab_screen ( region =(0 ,40 ,800 ,640) )
newScreen , o r i g i n a l I m a g e , g r a d i e n t s = process_image ( s c r e e n )
# cv2 . imshow ( " window " , n e w S c r e e n )
if gradients :
g u a r d D i r e c t i o n = find_guard ( g r a d i e n t s )
switch_guard ( g u a r d D i r e c t i o n )
print ( guardDirection )
cv2 . imshow ( " window2 " , newScreen )

break
p r i n t ( " Loop took { } seconds " . format ( time . time ( ) − l a s t T i m e ) )
main ( )
Feed forward model training:

import pickle
import t e n s o r f l o w as t f
import m a t p l o t l i b . pyplot as p l t
import numpy as np
import keras
import keras_metrics
import k e r a s . backend as K
def f 1 ( y_true , y_pred ) :
def r e c a l l ( y_true , y_pred ) :
""" R e c a l l m e t r i c .
Only c o m p u t e s a b a t c h −w i s e a v e r a g e o f r e c a l l .
Computes t h e r e c a l l , a m e t r i c f o r m u l t i − l a b e l
c l a s s i f i c a t i o n of
how many r e l e v a n t i t e m s a r e s e l e c t e d .
"""
t r u e _ p o s i t i v e s = K . sum(K . round (K . c l i p ( y _ t r u e ∗ y_pred , 0 ,
1) ) )
p o s s i b l e _ p o s i t i v e s = K . sum(K . round (K . c l i p ( y_true , 0 , 1 ) ) )
r e c a l l = true_positives / ( possible_positives + K. epsilon ( )
)
return r e c a l l
def p r e c i s i o n ( y_true , y_pred ) :

""" P r e c i s i o n m e t r i c .
Only c o m p u t e s a b a t c h −w i s e a v e r a g e o f p r e c i s i o n .
Computes t h e p r e c i s i o n , a m e t r i c f o r m u l t i − l a b e l
how many s e l e c t e d i t e m s a r e r e l e v a n t .
"""
1) ) )
p r e d i c t e d _ p o s i t i v e s = K . sum(K . round (K . c l i p ( y_pred , 0 , 1 ) ) )
precision = true_positives / ( predicted_positives + K.
epsilon ( ) )
return precision
p r e c i s i o n = p r e c i s i o n ( y_true , y_pred )
r e c a l l = r e c a l l ( y_true , y_pred )
r e t u r n 2 ∗ ( ( p r e c i s i o n ∗ r e c a l l ) /( p r e c i s i o n + r e c a l l +K . e p s i l o n ( ) ) )
f = open ( " i n / i n F i g h t 3 . p i c k l e " , " rb " )

g = open ( " out/ o u t F i g h t 3 . p i c k l e " , " rb " )
# l oa d data , only use f e a t u r e s

X = []
y = []
try :
while True :
example = p i c k l e . load ( f )
a = example [ 0 ]
a += example [ 2 ]
X . append ( a )
y . append ( p i c k l e . load ( g ) )
except :
pass
p r i n t ( len ( X ) )
p r i n t ( len ( y ) )
f . close ( )
g . close ( )
X = np . a s a r r a y ( X , dtype=np . f l o a t 3 2 )
y = np . a s a r r a y ( y , dtype=np . f l o a t 3 2 )
t r a i n i n g S i z e = 3000
x_train = X[ : trainingSize ]
y_train = y [ : trainingSize ]
x _ t e s t = X[ t r a i n i n g S i z e : ]
y_test = y[ trainingSize : ]
a = 0
f o r i in range ( len ( y _ t r a i n ) ) :
i f y _ t r a i n [ i ] [ 1 3 ] == 1 :
a += 1
p r i n t ( " node imbalance : " , a , " / " , len ( y _ t r a i n ) )
"""
x _ t r a i n = t f . k e r a s . u t i l s . n o r m a l i z e ( x _ t r a i n , a x i s =1)
x _ t e s t = t f . k e r a s . u t i l s . n o r m a l i z e ( x _ t e s t , a x i s =1)
f o r a in x_train :
p l t . imshow ( a , cmap = p l t . cm . b i n a r y )
p l t . show ( )
"""
c o n f i g = t f . ConfigProto ( )
c o n f i g . gpu_options . allow_growth = True
session = t f . Session ( config=config )
model = k e r a s . models . S e q u e n t i a l ( )
model . add ( k e r a s . l a y e r s . Dense ( 1 4 , a c t i v a t i o n = t f . nn . sigmoid ) )
model . compile ( o p t i m i z e r = "adam" , l o s s = " c a t e g o r i c a l _ c r o s s e n t r o p y " ,

m e t r i c s =[ f 1 ] )
c l a s s _ w e i g h t s = { 0 : 1 , #w
1 : 1 , #A
2 : 1 , #S
3 : 1 , #D
4 : 3 , # g u a r d l e f t 10
5 : 3 , # g u a r d t o p 10
6 : 3 , # g u a r d r i g h t 10
7:2 , # guardbreak
8 : 2 , # l i g h t 60
9 : 2 , # h e a v y 60
10:2 , # dodge
11:1 , # f e i n t
12:1 , # taunt
13:1} # idle
model . f i t ( x _ t r a i n , y _ t r a i n , epochs =10 , b a t c h _ s i z e =16 , c l a s s _ w e i g h t =

c l a s s _ w e i g h t s , v a l i d a t i o n _ d a t a =[ x _ t e s t , y _ t e s t ] )
v a l _ l o s s , v a l _ p r e c = model . e v a l u a t e ( x _ t e s t , y _ t e s t )
print ( " Validation l o s s : " , val_loss , " Validation accuracy : " ,
val_prec )
model . save ( " n k _ f f _ f i g h t e r . model " )

p r i n t ( " Model saved . " )
Feed forward main predictor:

import numpy as np
import cv2
import time

import keyboard
import os
import k e r a s
from k e r a s . u t i l s import plot_model
import k e r a s _ m e t r i c s
from processInputToGame import input_to_game
maxHealth = 60
running = F a l s e
gpu_options = t f . GPUOptions ( per_process_gpu_memory_fraction = 0 . 3 0 )

s e s s = t f . S e s s i o n ( c o n f i g = t f . ConfigProto ( gpu_options=gpu_options ) )
def t o g g l e _ r u n n i n g ( key ) :
g l o b a l running
i f running :
p r i n t ( " Ceasing Neural Knight ’ s mind . " )
running = F a l s e
else :
p r i n t ( " Continuing Neural Knight ’ s mind . . . " )
running = True
def read_data ( s t a t e ) :
i f s t a t e [ 3 ] == 1 : # r i g h t
e l i f s t a t e [ 4 ] == 1 : # t o p
e l i f s t a t e [ 5 ] == 1 : # l e f t

"""
1) ) )
)
return r e c a l l

"""
1) ) )
epsilon ( ) )
return precision
#RUNTIME
def main ( ) :
g l o b a l running
keyboard . o n _ r e l e a s e _ k e y ( " t " , t o g g l e _ r u n n i n g )
print ( i + 1)
p r i n t ( " Loading model . . . " )
switch_guard (UP)
# model = k e r a s . models . l oa d_m ode l (" n k _ f f _ a u t o b l o c k e r . model " ,

c u s t o m _ o b j e c t s ={" b i n a r y _ p r e c i s i o n " : k e r a s _ m e t r i c s . p r e c i s i o n ( )
})
model = k e r a s . models . load_model ( " n k _ f f _ f i g h t e r . model " ,
c u s t o m _ o b j e c t s ={ " f 1 " : f 1 } )
plot_model ( model , t o _ f i l e = ’ model . png ’ )
p r i n t ( " Neural Knight i s now a c t i v e . " )
p r e d i c t i o n = np . a r r a y ( [ 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 1 ] )
prevFrames = 600
prevPrediction = [ ]
p r e v P r e d i c t i o n . append ( p r e d i c t i o n )
while True :
i f running :
prediction , currentGuardDirection )
# e x t r a c t e d S t a t e = np . a s a r r a y ( s t a t e [ 0 ] , d t y p e =np . f l o a t 3 2
) # State data
e x t r a c t e d S t a t e = s t a t e [ 0 ] # Frame d a t a
# p r i n t ( " The i n p u t i s : " , e x t r a c t e d S t a t e )
# p r i n t ( " The s h a p e o f t h e i n p u t i s : " , e x t r a c t e d S t a t e .
shape )
prevPrediction = prevPrediction [ 1 : ]
predHistory = [ y f o r x in p r e v P r e d i c t i o n f o r y in x ]
e x t r a c t e d S t a t e = np . expand_dims ( e x t r a c t e d S t a t e , a x i s =0)
e x t r a c t e d S t a t e = np . c o n c a t e n a t e ( ( e x t r a c t e d S t a t e [ 0 ] ,
predHistory ) , a x i s =0) # add p r e v i o u s f r a m e
gamepadinput to input s
p r e d i c t i o n = model . p r e d i c t ( e x t r a c t e d S t a t e ) [ 0 ]
# print (" pred : " , p r e d i c t i o n )
b = np . z e r o s _ l i k e ( p r e d i c t i o n )
# S e p a r a t e l y c h e c k t o s e e i f any o f t h e g u a r d d i r e c t i o n s
n e e d c h a n g i n g . I f so , o n l y c h a n g e t h e g u a r d most
l i k e l y to need changing
biggest = 0
b i g g e s t I n d e x = −1
f o r i in range ( 4 , 7 ) :
if prediction [ i ] > 0 . 2 :
if prediction [ i ] > biggest :
biggest = prediction [ i ]
biggestIndex = i
i f b i g g e s t I n d e x > − 1:
b [ biggestIndex ] = 1
f o r i in range ( 0 , 4 ) :
b[ i ] = 1
f o r i in range ( 7 , len ( b ) ) :
b[ i ] = 1
# b [ np . argmax ( p r e d i c t i o n ) ] = 1
# p r i n t (" out : " , b )
# read_data ( state [0])

input_to_game ( b )
i f keyboard . i s _ p r e s s e d ( " . " ) :

break
main ( )
Initial recurrent model training:

import p i c k l e
import numpy as np
import k e r a s
from k e r a s . l a y e r s import Dense , Dropout , CuDNNLSTM, F l a t t e n
# metric s t u f f

"""
1) ) )
)
return r e c a l l

"""
1) ) )

epsilon ( ) )
return precision
f = open ( " i n / s t a t e D a t a 6 . p i c k l e " , " rb " )

g = open ( " out/outputData6 . p i c k l e " , " rb " )
# l oa d data , only use f e a t u r e s

X = []
y = []
try :
while True :
a = example [ 0 ]
a += example [ 2 ]
X . append ( a )
except :
pass
f . close ( )
g . close ( )
"""
f = open (" in / s t a t e D a t a 3 . p i c k l e " , " rb ")
g = open (" out / outputData3 . p i c k l e " ," rb ")
try :
w h i l e True :
a = example [ 0 ]
a += e x a m p l e [ 2 ]
X. append ( a )
y . append ( p i c k l e . l o a d ( g ) )
except :
pass
f . close ()
g. close ()
"""
p r i n t ( len ( X ) )
p r i n t ( len ( y ) )
X = np . a s a r r a y ( X , dtype=np . f l o a t 3 2 )
t r a i n i n g S i z e = 15000
print ( x_train [ 0 ] )
a = 0
f o r i in range ( len ( y _ t r a i n ) ) :
i f y _ t r a i n [ i ] [ 1 3 ] == 1 :
a += 1
p r i n t ( " node imbalance : " , a , " / " , len ( y _ t r a i n ) )
"""
x _ t r a i n = t f . k e r a s . u t i l s . n o r m a l i z e ( x _ t r a i n , a x i s =1)
x _ t e s t = t f . k e r a s . u t i l s . n o r m a l i z e ( x _ t e s t , a x i s =1)
f o r a in x_train :
p l t . show ( )
"""
x _ t r a i n . shape = ( x _ t r a i n . shape [ 0 ] , 1 , x _ t r a i n . shape [ 1 ] )

x _ t e s t . shape = ( x _ t e s t . shape [ 0 ] , 1 , x _ t e s t . shape [ 1 ] )
p r i n t ( " x _ t r a i n [ 0 ] . shape = " , x _ t r a i n [ 0 ] . shape )
# Model p a r a m e t e r s
numLSTM = 1
dropout = 0 . 3
numDense = 0
c o n f i g = t f . ConfigProto ( )
c o n f i g . gpu_options . allow_growth = True
model = k e r a s . models . S e q u e n t i a l ( )
model . add (CuDNNLSTM( 1 2 8 , input_shape= x _ t r a i n . shape [ 1 : ] ,

r e t u r n _ s e q u e n c e s =True ) )
model . add ( Dropout ( dropout ) )
f o r i in range (numLSTM− 1) :
model . add (CuDNNLSTM( 1 2 8 , r e t u r n _ s e q u e n c e s =True ) )
model . add ( Dropout ( dropout ) )
model . add ( F l a t t e n ( ) )
"""
f o r i i n r a n g e ( numDense ) :
m o d e l . add ( k e r a s . l a y e r s . Dense ( 3 2 , a c t i v a t i o n =" r e l u " ) )
m o d e l . add ( Dropout ( d r o p o u t ) )
"""
model . add ( k e r a s . l a y e r s . Dense ( 1 4 , a c t i v a t i o n = " sigmoid " ) )
opt = k e r a s . o p t i m i z e r s . Adam( l r =1e − 2, decay=1e − 5)

model . compile ( o p t i m i z e r =opt , l o s s = " b i n a r y _ c r o s s e n t r o p y " , m e t r i c s =[ "
accuracy " ] )
idleBalancer = 2
c l a s s _ w e i g h t s = { 0 : i d l e B a l a n c e r , #w
1 : i d l e B a l a n c e r , #A
2 : i d l e B a l a n c e r , #S
3 : i d l e B a l a n c e r , #D
4 : i d l e B a l a n c e r +1 , # g u a r d l e f t 10
5 : i d l e B a l a n c e r +1 , # g u a r d t o p 10
6 : i d l e B a l a n c e r +1 , # g u a r d r i g h t 10
7: idleBalancer + 5 , # guardbreak
8 : i d l e B a l a n c e r + 2 0 , # l i g h t 60
9 : i d l e B a l a n c e r + 2 0 , # h e a v y 60
10: idleBalancer + 10 , # dodge
11: idleBalancer , # f e i n t
12: idleBalancer , # taunt
13:1} # idle
model . f i t ( x _ t r a i n , y _ t r a i n , epochs =15 , b a t c h _ s i z e =128 , c l a s s _ w e i g h t

= c l a s s _ w e i g h t s , v a l i d a t i o n _ d a t a =[ x _ t e s t , y _ t e s t ] )
v a l _ l o s s , v a l _ a c c = model . e v a l u a t e ( x _ t e s t , y _ t e s t )
val_acc )
model . save ( " nk_lstm_6 . model " )

Initial recurrent main predictor:

import numpy as np
import cv2
import time
from outp u t C o n t r o l import switch_guard , LEFT , RIGHT , UP

import keyboard
import os
import k e r a s
"""
1) ) )
)
return r e c a l l

"""
1) ) )
epsilon ( ) )
return precision
maxHealth = 60
running = F a l s e

g l o b a l running
i f running :
running = F a l s e
else :
running = True
i f s t a t e [ 3 ] == 1 : # r i g h t
e l i f s t a t e [ 4 ] == 1 : # t o p
e l i f s t a t e [ 5 ] == 1 : # l e f t
#RUNTIME
def main ( ) :
g l o b a l running
print ( i + 1)
switch_guard (UP)
# model = k e r a s . models . l oa d_m ode l (" n k _ f f _ a u t o b l o c k e r . model " ,

c u s t o m _ o b j e c t s ={" b i n a r y _ p r e c i s i o n " : k e r a s _ m e t r i c s . p r e c i s i o n ( )
})
model = k e r a s . models . load_model ( " nk_lstm_6 . model " )
prevFrames = 60
while True :
i f running :
# e x t r a c t e d S t a t e = np . a s a r r a y ( s t a t e [ 0 ] , d t y p e =np . f l o a t 3 2
) # State data
e x t r a c t e d S t a t e = s t a t e [ 0 ] # Frame d a t a
# p r i n t ( " The i n p u t i s : " , e x t r a c t e d S t a t e )
# p r i n t ( " The s h a p e o f t h e i n p u t i s : " , e x t r a c t e d S t a t e .
shape )
prevPrediction = prevPrediction [ 1 : ]
predHistory = [ y f o r x in p r e v P r e d i c t i o n f o r y in x ]
e x t r a c t e d S t a t e = np . c o n c a t e n a t e ( ( e x t r a c t e d S t a t e [0] ,
predHistory ) , a x i s =0) # add p r e v i o u s f r a m e
gamepadinput to input s
e x t r a c t e d S t a t e . shape = ( e x t r a c t e d S t a t e . shape [ 0 ] ,1 ,
e x t r a c t e d S t a t e . shape [ 1 ] )
p r e d i c t i o n = model . p r e d i c t ( e x t r a c t e d S t a t e ) [ 0 ]
# print (" pred : " , p r e d i c t i o n )
# S e p a r a t e l y c h e c k t o s e e i f any o f t h e g u a r d d i r e c t i o n s
n e e d c h a n g i n g . I f so , o n l y c h a n g e t h e g u a r d most
l i k e l y to need changing
biggest = 0
f o r i in range ( 4 , 7 ) :
biggestIndex = i
b[ i ] = 1
f o r i in range ( 0 , 4 ) :
b[ i ] = 1
# p r i n t (" out : " , b )

i f np . a r r a y _ e q u a l ( b , np . a r r a y
([0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,1]) ) :
print ( " Idle " )
else :
input_to_game ( b )

break
main ( )
Mixed data recurrent model training:

import p i c k l e
import numpy as np
import k e r a s
from k e r a s . l a y e r s import Dense , Dropout , CuDNNLSTM, F l a t t e n , Input ,
A c t i v a t i o n , Conv2D , MaxPooling2D , c o n c a t e n a t e
from k e r a s . models import Model
from s k l e a r n . u t i l s import c l a s s _ w e i g h t
# metric s t u f f

"""

1) ) )
)
return r e c a l l

"""
1) ) )
epsilon ( ) )
return precision
f = open ( " i n / s t a t e D a t a 1 . p i c k l e " , " rb " )

g = open ( " out/outputData1 . p i c k l e " , " rb " )
# load data
X = []
X2 = [ ]
y = []
try :
while True :
sample = p i c k l e . load ( f )
X . append ( sample [ 0 ] )
X2 . append ( sample [ 1 ] )
except :
pass
f . close ( )
g . close ( )
p r i n t ( len ( X ) )
p r i n t ( len ( y ) )
# Create sequences
XS = [ ]
sequenceLength = 8
t r a i n i n g S i z e = 15000 # 15000 19 #38000 16

f o r i in range ( len ( X )−sequenceLength ) :
sample = [ ]
f o r j in range ( sequenceLength ) :
sample . append ( X [ i + j ] )
XS . append ( sample )
X = np . a r r a y ( XS )
X2 = X2[: − sequenceLength ]
X2 = np . a s a r r a y ( X2 , dtype=np . f l o a t 3 2 )
X2 = X2 / 255 # N o r m a l i s a t i o n o f i m a g e d a t a
y = y[: − sequenceLength ]
x 2 _ t r a i n = X2 [ : t r a i n i n g S i z e ]
x 2 _ t e s t = X2 [ t r a i n i n g S i z e : ]
p r i n t ( " Samples 0 − 1:\n " , x _ t r a i n [ 0 ] , " \n\n " , x _ t r a i n [ 1 ] )
p r i n t ( "X l e n g t h : " , len ( X ) , " X2 l e n g t h : " , len ( X2 ) )

p r i n t ( " x _ t r a i n l e n g t h : " , len ( x _ t r a i n ) , " x 2 _ t r a i n l e n g t h : " , len (
x2_train ) )
x 2 _ t r a i n . shape = ( x 2 _ t r a i n . shape [ 0 ] , x 2 _ t r a i n . shape [ 1 ] , x 2 _ t r a i n .

shape [ 2 ] , 1 )
x 2 _ t e s t . shape = ( x 2 _ t e s t . shape [ 0 ] , x 2 _ t e s t . shape [ 1 ] , x 2 _ t e s t . shape
[2] ,1)
"""
# dims : 204 x114
f o r a in x2_train :
p l t . show ( )
"""
# Model h y p e r p a r a m e t e r s
numLSTM = 1
dropout = 0 . 4
numDense = 0
"""
config = t f . ConfigProto ()
c o n f i g . g p u _ o p t i o n s . a l l o w _ g r o w t h = True
"""
i n p u t S t a t e = Input ( shape= x _ t r a i n [ 0 ] . shape )

inputImage = Input ( shape= x 2 _ t r a i n [ 0 ] . shape )
# S t a t e branch
x = CuDNNLSTM( 1 2 8 , input_shape = ( 1 , x _ t r a i n . shape [ 1 ] ) ,
r e t u r n _ s e q u e n c e s =True ) ( i n p u t S t a t e )
x = Dropout ( dropout ) ( x )
x = Flatten () (x)
x = Model ( i n p u t s = i n p u t S t a t e , outputs=x )
# Image b r a n c h
y = Conv2D ( 6 4 , ( 3 , 3 ) ) ( inputImage )
y = Activation ( " relu " ) (y)
y = MaxPooling2D ( p o o l _ s i z e = ( 2 , 2 ) ) ( y )
y = Flatten () (y)
y = Model ( i n p u t s =inputImage , outputs=y )
# combine inputs
combined = c o n c a t e n a t e ( [ x . output , y . output ] )
# combined branch
z = Dense ( 2 5 6 , a c t i v a t i o n = " r e l u " ) ( combined )
z = Dropout ( dropout ) ( z )
z = Dense ( 1 2 8 , a c t i v a t i o n = " r e l u " ) ( z )
z = Dense ( 1 4 , a c t i v a t i o n = " sigmoid " ) ( z )
model = Model ( i n p u t s =[ x . input , y . input ] , outputs=z )
c l a s s _ w e i g h t s = { 0 : 1 , #w
1 : 1 , #A
2 : 1 , #S
3 : 1 , #D
4 : 1 , # g u a r d l e f t 10
5 : 1 , # g u a r d t o p 10
6 : 1 , # g u a r d r i g h t 10
7:3 , # guardbreak
8 : 2 , # l i g h t 60
9 : 1 , # h e a v y 60
10:3 , # dodge
11:1 , # f e i n t
12:1 , # taunt
13:1} # idle

model . compile ( o p t i m i z e r =opt , l o s s = " b i n a r y _ c r o s s e n t r o p y " , m e t r i c s =[
f1 ] )
model . f i t ( [ x _ t r a i n , x 2 _ t r a i n ] , y _ t r a i n , epochs =15 , b a t c h _ s i z e =64 ,

validation_data =([ x_test , x2_test ] , y_test ) )
v a l _ l o s s , v a l _ a c c = model . e v a l u a t e ( [ x _ t e s t , x 2 _ t e s t ] , y _ t e s t )
val_acc )
model . save ( " nk_lstm_conv_sequences_4 . model " )

Mixed data recurrent main predictor:

import numpy as np
import cv2
import time
from outp u t C o n t r o l import switch_guard , LEFT , RIGHT , UP

import keyboard
import os
import k e r a s
"""
1) ) )
)
return r e c a l l

"""
1) ) )
epsilon ( ) )
return precision
running = F a l s e

g l o b a l running
i f running :
running = F a l s e
else :
running = True
i f s t a t e [ 3 ] == 1 : # r i g h t
e l i f s t a t e [ 4 ] == 1 : # t o p
e l i f s t a t e [ 5 ] == 1 : # l e f t
#RUNTIME
def main ( ) :
g l o b a l running
print ( i + 1)
switch_guard (UP)
model = k e r a s . models . load_model ( " nk_lstm_conv_sequences_4 . model

" , c u s t o m _ o b j e c t s ={ " f 1 " : f 1 } )
prevFrames = 6
sequenceLength = 8
stateSequence = [ ]
while True :
i f running :
extractedState = state [0] # State data
frame = np . a s a r r a y ( s t a t e [ 1 ] ) # i m a g e d a t a
frame . shape = ( 1 , frame . shape [ 0 ] , frame . shape [ 1 ] , 1 )
s t a t e S e q u e n c e . append ( e x t r a c t e d S t a t e )
i f len ( s t a t e S e q u e n c e ) > sequenceLength :
stateSequence = stateSequence [ 1 : ]
stateSequenceNP = np . a s a r r a y ( s t a t e S e q u e n c e )
stateSequenceNP . shape = ( 1 , stateSequenceNP . shape
[ 0 ] , stateSequenceNP . shape [ 1 ] )
p r e d i c t i o n = model . p r e d i c t ( [ stateSequenceNP , frame ] )

[0]
# p r i n t (" Pred : " , p r e d i c t i o n )
# S e p a r a t e l y c h e c k t o s e e i f any o f t h e g u a r d
d i r e c t i o n s n e e d c h a n g i n g . I f so , o n l y c h a n g e t h e
g u a r d most l i k e l y t o n e e d c h a n g i n g
biggest = 0
f o r i in range ( 4 , 7 ) :
if prediction [ i ] > 0.05:
biggestIndex = i
# i n c r e a s e w e i g h t i n g o f GB!
if prediction [ i ] > 0.05:
b[ i ] = 1
f o r i in range ( 0 , 4 ) :
b[ i ] = 1

( [ 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 1 ] ) ) or np .
a r r a y _ e q u a l ( p r e d i c t i o n , np . a r r a y
([0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0]) ) :
print ( " Idle " )
else :
input_to_game ( b )
# print ( stateSequenceNP )

break
main ( )
Final model training:

import p i c k l e
import time , d a t e t i m e
import numpy as np
import k e r a s
from k e r a s . l a y e r s import Dense , Dropout , CuDNNLSTM, F l a t t e n , Input ,
A c t i v a t i o n , Conv2D , MaxPooling2D , Conv1D , MaxPooling1D ,
concatenate
from k e r a s . models import Model
from k e r a s . c a l l b a c k s import TensorBoard , E a r l y S t o p p i n g
# metric s t u f f
# T h i s m e t r i c was t a k e n f r o m h e r e : h t t p s : / / s t a c k o v e r f l o w . com / a
/ 4 5 3 0 5 3 8 4 / 5 6 3 4 6 1 0 by u s e r " Paddy "
"""
1) ) )
)
return r e c a l l

"""
1) ) )
epsilon ( ) )
return precision
# load data
X = []
X2 = [ ]
X3 = [ ]
y = []
inputFiles = [ " stateData21 " , " stateData20 " , " stateData23 " ] # Enter the
filenames of a l l input data here
o u t p u t F i l e s = [ " outputData21 " , " outputData20 " , " outputData23 " ] # E n t e r
the filenames of a l l output data here
f o r i in range ( len ( i n p u t F i l e s ) ) :
i n S t r i n g = " in/" + i n p u t F i l e s [ i ] + " . pickle "
o u t S t r i n g = " out/ " + o u t p u t F i l e s [ i ] + " . p i c k l e "
f = open ( i n S t r i n g , " rb " )
g = open ( o u t S t r i n g , " rb " )
try :
while True :
sample = p i c k l e . load ( f )
X . append ( sample [ 0 ] )
except :
pass
f . close ( )
g . close ( )
p r i n t ( len ( X ) )
p r i n t ( len ( X3 ) )
p r i n t ( len ( y ) )
# Create sequences for s t a t e data

XS = [ ]
stateSequenceLength = 4
t r a i n i n g S i z e = 21000 # 15000 19 #38000 16
f o r i in range ( len ( X )− s t a t e S e q u e n c e L e n g t h ) :
sample = [ ]
f o r j in range ( s t a t e S e q u e n c e L e n g t h ) :
sample . append ( X [ i + j ] )
XS . append ( sample )
X = np . a r r a y ( XS )
X2 = X2[: − s t a t e S e q u e n c e L e n g t h ]
X2 = np . a s a r r a y ( X2 , dtype=np . f l o a t 3 2 )
X2 = X2 / 255 # N o r m a l i s a t i o n o f i m a g e d a t a
# Create sequences f o r previous input data

historySe q u en c eL e ng t h = 1
X3S = [ ]
f o r i in range ( len ( X3 )− s t a t e S e q u e n c e L e n g t h ) :
sample = [ ]
f o r j in range ( h i st o ry S eq u en c eL e n gt h ) :
sample . append ( X3 [ i + j ] )
X3S . append ( sample )
X3 = np . a r r a y ( X3S )
#X3 = np . a s a r r a y ( X3 , d t y p e =np . f l o a t 3 2 )
p r i n t ( " X3 0 : " , X [ 0 ] )
p r i n t ( " X3 0 : \ n " , X3 [ 0 ] )
p r i n t ( " X3 shape " , X3 [ 0 ] . shape )
y = y[: − s t a t e S e q u e n c e L e n g t h ]
# x2_train . shape = ( x2_train . shape [0] , x2_train . shape [1] , x2_train .

shape [ 2 ] , 1 )
# x2_test . shape = ( x2_test . shape [0] , x2_test . shape [1] , x2_test . shape
[2] ,1)
"""
# dims : 204 x114
f o r a in x2_train :
p l t . show ( )
"""
# Model h y p e r p a r a m e t e r s
layerSizes = [128]
conv1Layers = [ 2 ]
conv2Layers = [ 3 ]
denseLayers = [ 2 ]
dropout = 0 . 3
"""
config = t f . ConfigProto ()
c o n f i g . g p u _ o p t i o n s . a l l o w _ g r o w t h = True
"""
patience = 6
e a r l y s t o p p e r = E a r l y S t o p p i n g ( monitor= " v a l _ f 1 " , p a t i e n c e = p a t i e n c e ,
verbose =1 , mode= "max" , r e s t o r e _ b e s t _ w e i g h t s =True )
f o r l a y e r S i z e in l a y e r S i z e s :
f o r conv1Layer in conv1Layers :
f o r conv2Layer in conv2Layers :
f o r denseLayer in denseLayers :
NAME = " {} − nodes −{} − conv1 −{} − conv2 −{} − dense−

e a r l y S t o p p i n g −p a t i e n c e −{} − time − {} " . format (
l a y e r S i z e , conv1Layer , conv2Layer , denseLayer + 1 ,
p a t i e n c e , s t r ( d a t e t i m e . d a t e t i m e . now ( ) ) . r e p l a c e ( "
: " , " _ " ) ) # int ( time . time ( ) )
p r i n t (NAME)
tensorBoard = TensorBoard ( l o g _ d i r = " l o g s / { } " . format (
NAME) )
i n p u t S t a t e = Input ( shape= x _ t r a i n [ 0 ] . shape )

inputImage = Input ( shape= x 2 _ t r a i n [ 0 ] . shape )
i n p u t H i s t o r y = Input ( shape= x 3 _ t r a i n [ 0 ] . shape )
# S t a t e branch
x = Conv1D ( l a y e r S i z e , 4 , padding= ’ same ’ ) ( i n p u t S t a t e )
x = Activation ( " relu " ) ( x )
f o r l in range ( conv1Layer − 1) :
x = Conv1D ( l a y e r S i z e , 4 , padding= ’ same ’ ) ( x )
x = Activation ( " relu " ) ( x )
x = MaxPooling1D ( p o o l _ s i z e =1) ( x )
x = Flatten () (x)
x = Model ( i n p u t s = i n p u t S t a t e , outputs=x )
# Image b r a n c h
y = Conv2D ( l a y e r S i z e , ( 3 , 3 ) ) ( inputImage )
for l in conv2Layers :
y = Conv2D ( l a y e r S i z e , ( 3 , 3 ) ) ( y )
y = Flatten () (y)
y = Model ( i n p u t s =inputImage , outputs=y )
# History branch
w = CuDNNLSTM( l a y e r S i z e , input_shape = ( 1 , x 3 _ t r a i n .
shape [ 1 ] ) , r e t u r n _ s e q u e n c e s =True ) ( i n p u t H i s t o r y )
w = Dropout ( dropout ) (w)
w = F l a t t e n ( ) (w)
w = Model ( i n p u t s =i n p u t H i s t o r y , outputs=w)
# combine inputs
combined = c o n c a t e n a t e ( [ x . output , y . output , w.
output ] )
# combined branch
z = Dense ( 2 5 6 , a c t i v a t i o n = " r e l u " ) ( combined )
f o r l in denseLayers :
z = Dense ( l a y e r S i z e , a c t i v a t i o n = " r e l u " ) ( z )
# Output l a y e r
z = Dense ( 1 4 , a c t i v a t i o n = " sigmoid " ) ( z )
model = Model ( i n p u t s =[ x . input , y . input , w. input ] ,
outputs=z )
# "Move f o r w a r d " , " Move l e f t " , "Move b a c k w a r d " , "Move

r i g h t " , " Guard l e f t " , " Guard t o p " , " Guard r i g h t
" , " Guard b r e a k " , " L i g h t a t t a c k " , " Heavy a t t a c k
" , " Dodge " , " F e i n t " , " Taunt " , " I d l e "
# w e i g h t s = np . a r r a y ( [ 0 . 5 9 6 6 7 5 8 1 5 7 9 7 4 9 9 2 ,
0.7981600081325607 , 0.925282098200671 ,
0.9579139981701739 , 0.8746060790891532 ,
0.9140489986784589 , 0.7273050726847616 ,
0.9701636677848937 , 0.951865406119752 ,
0.931547219680797 , 0.9761106028260649 ,
0.9806983836535529 , 0.9880176883196097 ,
0.8289620819355494])
def my_loss ( t a r g e t s , l o g i t s ) :
weights = np . a r r a y ( [ 0 . 5 9 6 6 7 5 8 1 5 7 9 7 4 9 9 2 ,
0.7981600081325607 , 0.925282098200671 ,
0.9579139981701739 , 0.9246060790891532 ,
0.9880489986784589 , 0.9473050726847616 ,
0.9771636677848937 , 0.951865406119752 ,
0.971547219680797 , 0.9881106028260649 ,
0.9806983836535529 , 0.9880176883196097 ,
0.8289620819355494]) # optimal3
r e t u r n K . sum( t a r g e t s ∗ −K . l o g ( l o g i t s + 1e − 10) ∗
weights +
( 1 − t a r g e t s ) ∗ −K . l o g ( 1 − l o g i t s + 1e − 10)
∗ (1 − weights ) , a x i s = − 1)

model . compile ( o p t i m i z e r =opt , l o s s =my_loss , m e t r i c s
=[ f 1 ] )
model . f i t ( [ x _ t r a i n , x 2 _ t r a i n , x 3 _ t r a i n ] , y _ t r a i n ,
epochs =40 , b a t c h _ s i z e =64 , v a l i d a t i o n _ d a t a = ( [
x _ t e s t , x 2 _ t e s t , x 3 _ t e s t ] , y _ t e s t ) , c a l l b a c k s =[
tensorBoard , e a r l y s t o p p e r ] )
v a l _ l o s s , v a l _ a c c = model . e v a l u a t e ( [ x _ t e s t , x 2 _ t e s t
, x3_test ] , y_test )
print ( " Validation loss : " , val_loss , " Validation
accuracy : " , val_acc )
model . save (NAME + " . model " )

Final model main predictor:

import numpy as np
import cv2
import time

import keyboard
import os
import k e r a s
from multithread_gamepadInput import C o n t r o l l e r
running = F a l s e

g l o b a l running
i f running :
running = F a l s e
else :
running = True
i f s t a t e [ 3 ] == 1 : # r i g h t
e l i f s t a t e [ 4 ] == 1 : # t o p
e l i f s t a t e [ 5 ] == 1 : # l e f t
# T h i s m e t r i c was t a k e n f r o m h e r e : h t t p s : / / s t a c k o v e r f l o w . com / a
/ 4 5 3 0 5 3 8 4 / 5 6 3 4 6 1 0 by u s e r " Paddy "
"""
1) ) )
)
return r e c a l l

"""
1) ) )
epsilon ( ) )
return precision
#RUNTIME
def main ( ) :
g l o b a l running
print ( i + 1)
switch_guard (UP)
def my_loss ( t a r g e t s , l o g i t s ) :
# w e i g h t s = np . a r r a y ( [ 0 . 5 9 6 6 7 5 8 1 5 7 9 7 4 9 9 2 ,
0.7981600081325607 , 0.925282098200671 ,
0.9579139981701739 , 0.9246060790891532 ,
0.9940489986784589 , 0.9473050726847616 ,
0.9771636677848937 , 0.951865406119752 ,
0.951547219680797 , 0.9911106028260649 ,
0.9806983836535529 , 0.9880176883196097 ,
0.8289620819355494]) # optimal
weights = np . a r r a y ( [ 0 . 5 9 6 6 7 5 8 1 5 7 9 7 4 9 9 2 , 0 . 7 9 8 1 6 0 0 0 8 1 3 2 5 6 0 7 ,
0.925282098200671 , 0.9579139981701739 ,
0.9246060790891532 , 0.9880489986784589 ,
0.9473050726847616 , 0.9771636677848937 ,
0.951865406119752 , 0.971547219680797 ,
0.9881106028260649 , 0.9806983836535529 ,
0.9880176883196097 , 0.8289620819355494]) # optimal3
r e t u r n K . sum( t a r g e t s ∗ −K . l o g ( l o g i t s + 1e − 10) ∗ weights +
( 1 − t a r g e t s ) ∗ −K . l o g ( 1 − l o g i t s + 1e − 10) ∗ (1 − weights
) , a x i s = − 1)
modelName = " optimal2 . model "

p r i n t ( " Loading model ’ " , modelName , " ’ . . . " )
model = k e r a s . models . load_model ( modelName , c u s t o m _ o b j e c t s ={ " f 1 "
: f1 , " my_loss " : my_loss } )
predictionHistory = [ ]
sequenceLength = 4
his t or y Se q u en c eL e ng t h = 1
stateSequence = [ ]
# gd = C o n t r o l l e r ( )
f o r i in range ( h i st o ry S eq u en c eL e n gt h ) :
p r e d i c t i o n H i s t o r y . append ( [ 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 1 ] )
# p r e d i c t i o n H i s t o r y . a p p e n d ( gd . g e t _ t r u e _ g a m e _ i n p u t ( ) )
time . s l e e p ( 0 . 1 )
while True :
i f running :
p r e d i c t i o n H i s t o r y [ − 1] , c u r r e n t G u a r d D i r e c t i o n )
extractedState = state [0] # State data
frame = np . a s a r r a y ( s t a t e [ 1 ] ) # i m a g e d a t a
frame . shape = ( 1 , frame . shape [ 0 ] , frame . shape [ 1 ] , frame .
shape [ 2 ] )
s t a t e S e q u e n c e . append ( e x t r a c t e d S t a t e )
i f len ( s t a t e S e q u e n c e ) > sequenceLength :

stateSequence = stateSequence [ 1 : ]
stateSequenceNP = np . a s a r r a y ( s t a t e S e q u e n c e )
stateSequenceNP . shape = ( 1 , stateSequenceNP . shape
[ 0 ] , stateSequenceNP . shape [ 1 ] )
p r e d i c t i o n H i s t o r y N P = np . a s a r r a y ( p r e d i c t i o n H i s t o r y )
p r e d i c t i o n H i s t o r y N P . shape = ( 1 , p r e d i c t i o n H i s t o r y N P
. shape [ 0 ] , p r e d i c t i o n H i s t o r y N P . shape [ 1 ] )
p r e d i c t i o n = model . p r e d i c t ( [ stateSequenceNP , frame ,

predictionHistoryNP ] ) [ 0 ]
# p r i n t (" Pred : " , p r e d i c t i o n )
# S e p a r a t e l y c h e c k t o s e e i f any o f t h e g u a r d
d i r e c t i o n s n e e d c h a n g i n g . I f so , o n l y c h a n g e t h e
g u a r d most l i k e l y t o n e e d c h a n g i n g
biggest = 0
f o r i in range ( 4 , 7 ) :
biggestIndex = i
b[ i ] = 1
f o r i in range ( 0 , 4 ) :
b[ i ] = 1
# b [ np . argmax ( p r e d i c t i o n ) ] = 1 # Only f o r one −h o t
d e c i s i o n making
p r e d i c t i o n H i s t o r y . append ( b )
predictionHistory = predictionHistory [ 1 : ]
( [ 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 1 ] ) ) or np .
a r r a y _ e q u a l ( p r e d i c t i o n , np . a r r a y
([0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0 ,0]) ) :
i f b [ 1 3 ] == 1 :
print ( " Idle " )
else :
p r i n t ( " Zero I d l e " )
else :
input_to_game ( b )
# print ( stateSequenceNP )

break
main ( )
107
Appendix F
Weekly Logs
The following documents are the logs written to track progress across the term.
02 - 23/01/2019
The project can be split into three major problems. Approaching them in order of difficulty, the first is the output - controlling the
game through script. This w as already solved soon after proposing the project, and can be read about in Log 00 - controlling
output.
The second problem is the matter of input for the AI - providing real-time data about the game so the AI can use this information to
make a decision as to how to act. This proves challenging, as For Honor does not provide any kind of API to extract such
information, presumably because it could be used for cheating quite easily. As a result, I w ill have to use Computer Vision to try to
extract information. This is the problem I am currently in the process of solving. This log details the process by w hich I created a
visualiser to detect an enemy's guard, and develop the GuardMatcher, w hich can be seen here.
The single most important part of the UI that requires interpreting for the AI is the enemy's guard direction. Below is an anntotated
screenshot of the game to explain how the guard direction functions:
As a result, my first aim w as simply to make a simple program that detected the enemy's guard direction and sw itched its guard
appropriately. How ever, w ith no know ledge as to how to use openCV, I needed to spend time learning and using it in a w ay that
w ould allow me to make progress. I came across this course, w hich w as the same resource I had previously used to discover
how to control the game (w hich can again be read in Log 00 - controlling output), w hich w as a pseudo-improvised course made
by "Sentdex", w here he w ould show his process of making an AI that could play the game Grand Theft Auto V - specifically, it
begins first w ith the making of a self-driving car AI.
Follow ing the first several steps of this course, I developed a rudimentary AI using openCV. It relied primarily on finding the main
tw o lines in a frame, and consider them the tw o boundaries of the lane. By analysing the gradients of these lines, the AI w ould
simply try to stay betw een them. It seemed a lot of this style of AI relied on line detection, and since the guard UI in For Honor is
essentially made up of six unique lines, I decided to try a similar method to detect the guard direction.
The screen itself could be captured very easily using opencv's ImageGrab function; just using the follow ing line, the portion of the
screen containing the game could be captured:
screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))
After brief experimentation, I concluded that it w ould be best to run the game at a resolution of 1024x768, as it is the low est
resolution the game allow s to be run w hilst still maintaining a 16:9 aspect ratio. A smaller resolution is desirable, as it means that
the image captured converts to as small a matrix as possible, w hich w ill mean operations manipulating it w ill perform as quickly as
possible. With the image captured, it is ideal to restrict the image solely to the region of interest, so that only areas bearing useful
information w ill appear. Below is a diagram draw n to demonstrate the concept. Though the specific values w ill need ot be
changed as more information requires capturing, the concept remains the same:
This is achieved by creating a mask matrix of entirely zeroes except for the region of interest, w hich are ones. Then, by applying
a Bitw ise And operation w ith the mask and image, all pixels of the image except those covered by the ones are turned black. This
allow s for only the UI to be show ing:
def roi(image, vertices):

mask = np.zeros_like(image)
cv2.fillPoly(mask, vertices, 255)
masked = cv2.bitwise_and(image, mask)
return masked
The next step to get line detection w orking w as to threshold the image. Binary thresholding is the process of filtering the image
only to allow pixels above a certain brightness threshold, and convert the rest to black. This could be done w ith a single line of
code:
_,processedImage = cv2.threshold(processedImage, 125, 255, cv2.THRESH_BINARY)
After some experimentation w ith the parameters, I w as able to quite cleanly be left only w ith the guard UI (and some
miscellaneous noise, to be expected). Whilst in the conventional case, edge detection is applied before any kind of line-finding,
such as using the Canny Edge Detection algorithm, the UI is designed such that it w as w orth attempting to apply line detection
immediately, as it w as mostly made of lines.
I used a form of the Hough Lines Transform know n as the Probabilistic Hough Lines Transform, provided by openCV. This is a
more efficient form of the algorithm that also conveniently returns the lines as a pair of cartesian coordinates. Again, this could be
done w ith only a single line of code:
lines = cv2.HoughLinesP(processedImage, rho = 1, theta = np.pi/180, threshold = 17, minLineLength = 15, maxLineGap = 5)
After experimenting w ith the parameters, the resulting lines ended up essentially perfect. I used a function provided by Sentdex
to draw them:
With this done, I had to use the information from the lines to somehow discriminate and discern guard direction. When follow ing
the Self Driving AI course, the gradients of the lines w ere used to determine w hich side of the rode w as left or right. I decided to
use a similar method of analysing gradient. That required w riting my ow n algorithm for retrieving a list of the gradients:
gradients = []
try:
print(len(lines))
print(lines)
#find gradients
for line in lines:
actualLine = line[0]
gradient = (actualLine[3] - actualLine[1]) / (actualLine[2] - actualLine[0])
if isfinite(gradient):
gradients.append(gradient)
print(gradients)
except:
pass
I then started printing every gradient to the console output to analyse to see if any patterns emerged to use to distinguish. Quite
conveniently, I discovered that the lines had very simple heuristic borders; The lines making up the right guard had a gradient m <
-0.5. For the left guard, m > 1, and for the top guard, -0.5 > m < 1.
Given these simple rules, I w as able to quite easily w rite a function that w ould return the guard direction:
def find_guard(gradients):
avgM = int(mean(gradients))
if avgM < -0.5:
return RIGHT
elif avgM > -0.5 and avgM < 1:
return UP
else:
return LEFT
Where the constants returned are hexcodes that correspond to the inputs used to sw itch guard. This meant I could simply w rite
functions such that I could simply call switch_guard(find_guard(gradients)) . Doing this resulted in the AI successfully sw itching
to the correct guard. This can be seen in the video demo linked at the beginning of this log.
It should be noted that, in order for the AI to detect the lines cleanly and properly, some prerequisites w ere identified. Contrast
should be relatively low . The player character must maintain only a short distance from the enemy. Only permutations of maps that
are set at dusk or night, and that have clear w eather should be used. In the video, it can be seen that the AI occasionally makes
mistakes, and these are caused by noise generated by the snow fall. A clear, dark environment helps minimalise noise.
With this achieved, next steps w ere discussed. The key next step is the ability to discriminate the different states of attack from
the enemy. When an enemy attacks, the corresponding direction indicator grow s larger and turns red. This provides a new
problem, as red does not meet the required brightness levels to be thresholded. My proposed plan is to identify the red guard,
save the pixels as a map, then convert the red to w hite for thresholding and line detection. This w ay, the AI does not go "blind"
w hen the enemy attacks, causing erratic behavioure. Additionally, a second image can be created w here, follow ing thresholding,
the red pixels are reimposed upon the image so colour can be used as a discriminant to distinguish different states. Achieving this
w ill mean it w ill be possible to do for all states by their ow n unique colour.
It w as agreed that creating a table of different states requiring real-time identification w ould be useful. These w ill both be
attempted to be completed by the end of next w eek.
Due to extenuating circumstances regarding family, I w as unable to make any progress on the project this w eek. Last w eek's
goals roll forw ard to the next w eek.
04 - 06/02/2019
As discussed in the previous log, the next step of feature preprocessing w as to be able to filter the different colours (w hich
have been designed by Ubisoft to represent attack types) and use detection of the presence of those colours to be able to tell the
state.
This w ould require know ing exactly w hat information needed to be captured. The supplementary state table document in [log 02a]
() show s the possible information that might be important. Attack indicators w ould also need to reflect their direction as w ell.
Whilst it w ould be helpful to determine the player's stamina, and the enemy's health, it w ould require capturing the quantity of
health and stamina bars, w hich change size and move around. As a result, I have concluded it is not w orth consuming any time
attempting to capture these features as they w ould require a completely different method.
I concluded that the best w ay to achieve this colour-filtering capture w ould be to convert all but a range of colour values into
black. With regards to the attack indicators, since they also need to reveal guard direction, I w ould first convert them to w hite so
that they could pass the luminance threshold for the guard detection algorithm described in log 02.
How ever, before I tackled w ith the implementation of this, I had been encountering in my testing an unforeseen level of
inconsistency w ith the accuracy w ith w hich the lines w ere detected for the guard direction. To the point that even testing the
program on a static frame w ould yield constantly flickering results. After researching, I realised, by w ay of [this]() StackOverflow
thread, that the HoughLinesP function I w as using as part of the OpenCV library is a Probabilistic, meaning it does not test every
single point of data for lines, rather randomly selecting a subset of points. Making this more consistent w ould drastically increase
the accuracy of guard detection. Unfortunately, this w as not a task I w as able to achieve. I discovered tw o potential options: use
a function HoughLines , or use a completely different system called the linesegmentdetector . They both suffered crucial
problems. The former only returned the line equation. Not the line segment at all. This made the information useless. The line
segment detector suffered a similar flaw in that it does not allow for any kind of filtration during line detection - I cannot set a
minimum length of a line, or the maximum line gap. Both of these are crucial information, so I had to resort to remaining w ith the
HoughLinesP algorithm.
I began w ith the colour state detector w ith the attack indicator. The first step w as to determine the range of colours I could filter in
order to capture the attack indicator only. It w as becoming clear that using RGB values w as capturing far too much erroneous
noise. So as a result I researched alternative representations of colour, settling on using the HSV (Hue, Value, Saturation)
scheme. The follow ing diagram describes how these three variables map to a colour:
From Wikipedia, used under a CC BY-SA 3.0 license
Which means that I can very easily capture different shades of red and only shades of red. The first line of the filtering function
sees the image converted to HSV as a result:
hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
This is achieved using the numpy.inrange() function in a single line:
mask = cv2.inRange(hsv,np.array([0,234,74]),np.array([1, 255, 160]))
With this alone, technically w ith only a single line I can then retrieve all instances of this colour in the image, and w ill w ork w ith all
other colour state detection:
locs = np.where(mask != 0)
But for the attack indicators, as aforementioned I also need to convert the red into w hite in order to pass the brightness threshold
and The method for the actual filtering of the image for the attack indicator uses an algorithm made by StackOverflow user Ray
Ryeng, w hich I found on this thread. This emulates the copyto function in the C++ edition of the openCV library w hich the Python
library lacks. I used this function to copy all instances of the mask (using the locs numpy array) onto the original image.
Discovering this process took many hours of research, but it all w orks very effectively. By making a simple testing function, I
checked if the enemy w as attacking. If so, to do a "heavy attack" in the game. I then set up a testing environment w ithin the game
w hereupon the enemy bot w ould simply light attack in a random direction approximately every three seconds. When one heavy
attacks almost immediately w hen seing an enemy light attack, it has the effect of "parrying" the enemy's attack, negating any
potential damage to the player and opening up the enemy for counter attacking. Theoretically, then, this w ould allow the AI to
automatically parry all light attacks throw n at it.
At the time of w riting, I have had no access to the usual computer w ith w hich I run everything, due to the university failing to
provide my internet. As a result, I w as forced to run it on a suboptimal setup, on a significantly w orse CPU, meaning less frames
are able to be processed in a given time. This has the effect of w orsening the AI's "reaction time". Still, I tested the AI's automatic
parrying effectiveness to see if it w ould w ork, and how w ell if so. It did indeed w ork, though not every time. As can be seen
from [this placeholder video], the AI successfully parries many attacks, and usually blocks those it fails to parry. This is because it
captures the frames in w hich the guard direction has changed in time, but not the much smaller number of frames in w hich the
attack indicator is present. Hypothetically, the performance w ould improve w hen run on a more pow erful computer.
Given that this is all in the pursuit of feature preprocessing, it w as important now to consider how exactly the information w ould
be passed into the neural netw ork. Whilst much more research w ill now need to be done, given that information as simple as
possible w ould optimise the performance, I decided on representing this side of the game state on a 10 index, 1 dimensional
array. Each index w ould represent a different state, and the value at that index w ould be a boolean 0 or 1.
The full array index legend:
player right guard

player up guard
player left guard
enemy right guard
enemy up guard
enemy left guard
enemy attacking
enemy unblockable
enemy bashing
enemy guard breaking
So, for example, an array that looked like this: [0,1,0,1,0,0,1,1,0,0] w ould mean that the player is guarding top, w hilst the
enemy had their guard to the left, and w ere currently attacking w ith an unblockable attack. This w ay, much information could be
learned from very little actual stored data. Implementing this w as simply a series of if statements, as there is no cleaner w ay of
achieving this.
With colour state detection now achieved, the next steps w ill be to complete feature processing - add the colour detection for
unblockable attacks, and for bashing. Guard breaking w ill be a different, more difficult task as that requires detection of a
transparent image. Additionally, from my research it seems beneficial to also pass in an extremely low -resolution greyscaled
version of the frame as information, in order for the netw ork to have more contex-sensitive information w ith w hich to form its
ow n pattern detection. This w ill also require further research.
Log 05 - 14/02/2019
Feature processing is now nearly complete. Unblockable states are now accounted for by w ay of the same method described in
log 04. The state detector is now able to process all useful states but guard breaks, w hich are being investigated.
With this done, I printed w hat the detector could compute, and it w as able to give a coherent and seemingly accurate
representation of the opponent's actions:
At this point, I considered the outputs that the neural netw ork w ould produce after processing. It w ould be precedented for each
node to represent a button that can be pressed, and thus output an array w here each index represents a button, and its value
there a 0 or 1 describing w hether or not it should be pressed.
As such, a program w ould need to be able to conver this information into the DirectX button presses using the method described
in log 01. The follow ing simple script w as w ritten, and can be found as the file processInputToGame.py :
from directkeys import ReleaseKey, PressKey, W, A, S, D, LEFT, UP, RIGHT, Q, J, K, SPACE

import random
import time
inputArray = [0,0,0,0,0,0,0,0,0,0,0]
correspondingKeys = [W,A,S,D,LEFT,UP,RIGHT,Q,J,K,SPACE]
# W, A, S, D, left, up, right, GB, light, heavy, dodge
def processInput(val, key):

if val == 1:
PressKey(key)
else:
ReleaseKey(key)
def generateRandomInputArray(arr):
for i in range(len(arr)):
arr[i] = random.randint(0,1)
return arr
def inputToGame(arr):
for i in range(len(arr)):
processInput(arr[i],correspondingKeys[i])
for i in list(range(10))[::-1]:
print(i + 1)
time.sleep(1)
print("active...")
while True:
time.sleep(1)
inputArray = generateRandomInputArray(inputArray)
print(inputArray)
inputToGame(inputArray)
The function generateRandomInputArray() w as added as a testing placeholder, and the bottom program w as set up to simply
generate a new random state and execute it every 0.1 seconds. It w orked as expected.
After meeting w ith my supervisor, it w as agreed that an explicit, formal test w ould be required for the state detector before
moving on to the development of the neural netw ork. I w ill record footage of a fight. Every discrete time step, I w ill manually
record the true state log. Then, the state detector w ill w atch the footage of the fight as w ell and log its observed game state. A
confusion matrix w ill be draw n from a comparison of these results, w ith the accuracy and precision calculated.
Log 06 - Reading Week
Completion of State Detector (Unblockable, Low res frame)
Over reading w eek, I mainly focussed on the completion of the state detector, and prepared it for everything I w ould need once
the netw ork w as outpu
tting.
The final elements of the state detector w as to add guard breaking detection, w hich w as done in the exact same w ay as before.
I also w anted to pass in a low resolution version of the frame, and this w as done very simply w ith the cv2.resize() dfunction.
After doing some research on how to actually structure neural netw orks, I decided that it w ould be best to begin the learning
process using a supervised method, by capturing the buttons I am pressing at any frame, so as to serve as an output. This meant
I w ould need to be able to capture inputs from the gamepad. Using the inputs library I w rote a test script to do just that. This can
be seen in Experimentation/Gamepad Capture/inputs Test.py of the repository.
In addition to this, the neural netw ork w ould be outputting a one hot array w here each index represents a button press. This
w ould need to be converted to inputs into the game itself so the netw ork may actually "play" the game. Another script w as used
utilising the DirectKeys.py script and OutputControl.py script discussed in the first logs. It can be seen in
Experimentation/GuardMatcherAttacker/processInputToGame.py .
With these both completed, I w anted to spend time looking at actually implementing neural netw orks. I w atched a series of videos
by Harrison Kinsley, or "Sentdex", explaining how to use Keras and Tensorflow . Follow ing the course, I implemented a
feedforw ard netw ork that trained on the mnist dataset. Mnist is a classically used dataset for learning to implement neural
netw orks, as it contains an extremely large number of samples. They are all greyscale images of handw ritten digits, labelled w ith
their correct answ er. Here, the netw ork learns to identify new handw ritten digits from this training set.
Afterw ard, I follow ed the course into implementing a convolutional neural netw ork that learned on the Kaggle dataset,
approximately 25 000 images of cats and dogs, learning to get it w orking and learning about tensorboard and experimentation of
hyperparameters in the process. It w as at this point I realised my tensorflow w as training extremely slow , and discovered i
should be training on the GPU, rather than the processor. After installing tensorflow -gpu, CUDA, CUDNN and upgrading to python
3.6, the training became drastically faster; w here before a single epoch took approximately 63 seconds, they now took 9.
At the end of Reading w eek I briefly explored implementing a recurrent netw ork on MNIST. This w as extremely fast and accurate,
though more difficutl and complicated to set up. It w ill likely be a recurrent LSTM netw ork I use for the final implementation of
Neural Knight, but after speaking w ith my Supervisor, it w as determined that it w ould be good simply to start capturing data and
explore the results acquired by a simple Feed-Forw ard netw ork.
Log 07 - 03/03/2019
With the intention of being in a position to begin developing a simple feed forw ard netw ork, the final step w as to actually be able
to capture training data.
This w as a simple enough process. Essentially, for each frame, the data captured from the state detector, as w ell as a
greyscaled, low resoltution version of the frame are both stored into a single file w hich is then pickled. For the same frame, the
buttons I am pressing at the time (though only one is stored to keep the netw ork one hot for now ) are stored in a separate file.
This w as set up in a w ay that I could toggle recording by pushing a button, so I could avoid capturing useless frames.
Some test data w as captured by fighting a dummy CPU several times. The implementation and test data can be found in
Implementation/Data Capture .
Then, using a similar version of the feed forw ard netw ork I trained on the mnit dataset, I passed it in the data captured in the
game. The data w as small and noisy, and so I did not expect much. After some tw eaking, it claimed how ever to have achieved a
validation accuracy of ~ 70%. Sceptical, I had the program save hte model, and set about building the part of the program that
w ould load this model and use it and the stateDetector to "predict" game inputs.
It took an unforesson amount of time to get running, unfortunately, due to apparently running out of video memory w hen trying to
run both For Honor and the netw ork at the same time. How ever, eventually this w as solved as I discovered that Keras tries to
allocate as much of the VRAM as possible to itself, but by using gpu_options =
tf.GPUOptions(per_process_gpu_memory_fraction=0.30) I w as able to restrict the netw ork to consuming only 30% of my VRAM,
leaving plenty for the game itself..
When it finally ran, I discovered that the netw ork w as consistently and almost exclusively deciding to do nothing at all - i.e. output
the final node as hot. The final node is reserved not for a button press, but rather an "idle" action, to do notihing. This makes
sense, as I realised that, in a given fight, most frames I w ould not be pressing a button, as duels in For Honor are carefully paced
and designed to avoid "button mashing" behaviour. As a result, most of the data explained that the right answ er w as indeed to do
"nothing", and thus the netw ork had learned that if it just did nothing every single time, then it w ould get the majority of its
behaviour correct.
This problem of imbalanced data can be fixed in a myriad of w ays, but since the primary solution - obtain more balanced data - is
not possible due to the nature of the game itself, the next solution w ill be attmempted: modification of the class w eights. This
allow s certain outputs in the training set to be w orth a certain factor more than other classes. This w ay I could set every frame
containing an output w ith an actual button press, to be, for example, 50 times any idling examples.
This is w hat iw ll be attempted over the w eekend.

Log 08 - 09/03/2019
The netw ork w as not learning to make accurate decisions w hen trained to fight, so it w as decided that it w ould first be taught a
much simpler behaviour - blocking attacks. Data w as fed in of just blocking attacks from random directions, w hich w as a simple
mapping. How ever, it w ould not learn this either. Every time, it w ould only stand still, yet claimed to have achieved 70% accuracy.
Due to possessed domain know ledge, the conclusion w as draw n that the data w as alrgely imbalanced as, in most frames
captured (w hich occurs betw een attacks) no buttons w ould be pressed. As such, the netw ork w ould be incentivised to do
nothing all the time. This w as fixed w ith the utilisation of class w eights, adding proportional value to certain data samples based
on their end classification. Proceeding this, the bot began to make decisions.
Log 09 - 14/03/2019
With the w ork of the previous w eek, the bot began to block attacks, but inonsistently and often it w ould sw itch its guard
incorrectly. As a result, a deeper ivnestigation of the data capturing system w as initiated. After some time, it w as discovered that
the syste mfor detecting the end gamepad's inputs w as flaw ed - the inputs library only detected thumbstick input if the thumbstick
w as being actively moved. This eant that if the thumbstick w as being held in a specific position (w hich is needed to keep a guard
up), then the algorithm w ould report thatr it w as not being moved at all.
This w as due to a problem related to threading. As w ith other buttons, thumbstick data comes in a buffer that has to be read
from, and that buffer does not update if the sticks do not move. To solve this, a new thread w as made only for gamepad input
detection. This w ay the buffer could be read from constantly w ithin its ow n loop, and ensured that w hen holding a thumbstick in
a direction that it w ould consistently detect the correct w ay.
With these tw o additions made, the autoblocker now w orked exactly as intended, w hich can be seen in this video:
https://w w w .youtube.com/w atch?v=LR2Tm7F-w 80
Results from 100 throw n attacks can be seen here:
Blocked Not Blocked
Left 22 9
Up 32 3
Right 29 5
Whilst it is relatively accurate, then w ith a total accuracy of 83%, the individual accuracy of attacks that came from the left w ere
significantly less so, w ith an accuracy of only 71%. This led to the conclusion that the state detector w as, for some reason, not
as effective at detecting the left guard than the others. The root of this problem w as not difficult to find; With some visual testing it
could be seen that the Region of Interest cropping (discussed initially in Log 02) w as cutting off the left guard. Increasing the
dimensions some afterw ard w as enough to fix this problem, as demonstrated:
Blocked Not Blocked
Left 31 3
Up 28 4
Right 34 0
Weekly Log 24/03/2019
Not much could be done this w eek as I w as ill, and had to focus on tw o other module coursew ork projects and their subsequent
reports.
Weekly Log - 30/03/2019
All of my time w as spent on the draft report. I tested myself against a level 3 w arden in 50 rounds to use as a human benchmark
for testing. I w on 35 rounds and lost 15. I w ill now w ork on capturing large amounts of data.
125
Bibliography
[1] Human Benchmark. Reaction Time Statistics. 2019. URL:

https://www.humanbenchmark.com/tests/reactiontime/statistics.
[2] Cas. Why Pyautogui Doesn’t Work in Game Window. 2017. URL:
https://stackoverflow.com/revisions/45369762/3.
[3] Colah. Understanding LSTM Networks. URL:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
[4] Adam Fletcher and Jonathan Mortensen. Using Python to build an AI to play and win
SNES StreetFighter II with machine learning. 2019. URL: https:
//www.theengineeringofconsciousexperience.com/using-python-to-build-an-
ai-to-play-and-win-snes-streetfighter-ii-with-machine-learning/.
[5] Matthew Hausknecht et al. “A neuroevolution approach to general atari game
playing”. In: IEEE Transactions on Computational Intelligence and AI in Games 6.4 (2014),
pp. 355–366.
[6] Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. In: Neural
computation 9.8 (1997), pp. 1735–1780.
[7] Max Jaderberg et al. “Human-level performance in first-person multiplayer games
with population-based deep reinforcement learning”. In: arXiv preprint
arXiv:1807.01281 (2018).
[8] Harrison Kinsley. Python Plays GTA V. 2017. URL:
https://pythonprogramming.net/game-frames-open-cv-python-plays-gta-v/.
[9] Honglak Lee et al. “Convolutional deep belief networks for scalable unsupervised
learning of hierarchical representations”. In: Proceedings of the 26th annual international
conference on machine learning. ACM. 2009, pp. 609–616.
[10] OpenAI. OpenAI Five. URL: https://openai.com/five/#how-openai-five-works.
[11] P-GN. Unbalanced data and weighted cross entropy. 2017. URL:
[12] Kev1n91 Paddy. How to calculate F1 Macro in Keras? 2017. URL:
https://stackoverflow.com/a/45305384/5634610.
[13] Adrian Rosebrock. Keras: Multiple Inputs and Mixed Data. 2019. URL:
https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-
mixed-data/.
[14] Ray Ryeng. Equivalent of copyTo in Python OpenCV bindings? 2017. URL:
[15] SethBling. MariFlow - Self-Driving Mario Kart w/Recurrent Neural Network. 2017. URL:
https://www.youtube.com/watch?v=Ipi40cb_RsI.
[16] SethBling. MarI/O - Machine Learning for Video Games. 2015. URL:
https://www.youtube.com/watch?v=qv6UVOQ0F44.
[17] Kenneth O Stanley and Risto Miikkulainen. “Evolving neural networks through
augmenting topologies”. In: Evolutionary computation 10.2 (2002), pp. 99–127.
[18] Weighted Cross Entropy With Logits - TensorFlow. URL: https://www.tensorflow.org/
api\_docs/python/tf/nn/weighted\_cross\_entropy\_with\_logits.

Neural Knight

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Neural Knight

Uploaded by

Copyright:

Available Formats

F INAL P ROJECT R EPORT

Neural Knight: Novice Human Level

Student No: 33497611

May 16, 2019

BSc Computer Science Degree

Neural Knight: Novice Human Level Performance in a Multiplayer 3D Fighting Game

3 Design and Implementation 5

4 Testing and Evaluation 31

5 Conclusions and Future Work 41

A Turing Test Responses 45

B Figures & Tables 49

D Preliminary Project Report 57

F Weekly Logs 107

1.1 Project Introduction

1.2 Baseline Target Specification For The Project

1.3 Extended Target Specification For The Project

1.4 Structure of Report

2.1 The Problem

2.2 Neural Networks in Self-Playing AI

Design and Implementation

3.1 For Honor Explication

3.1.1 Art of Battle System

F IGURE 3.1: Annotated user interface.

Health and Stamina

3.2.2 Player Control Neural Network

Capturing the Screen

Developing the Guard Detection Algorithm using Line Detection

Capturing Attack Indicators

Output of Feature Extraction

• Opponent’s left guard

Developing Remaining Feature Extraction Utilising Colour Presence

Attack Type Lower Boundary Upper Boundary

Additionally, from research conducted on precedent in this field, it appeared beneficial to

Capturing Gamepad Input

3.3.3 Player Control Neural Network

Feed Forward Network

Using Keras, this simply becomes:

Direction Blocked Not Blocked

Direction Blocked Not Blocked

F IGURE 3.4: The initial recurrent layer, comprised of a CUDA-enabled LSTM

Attempts to improve the Recurrent model

Mixed Input Data Network

F IGURE 3.5: Visualisation of convolutional neural networks. Image taken

Class Imbalance with Multi-Label Classification

Change from Accuracy to F Score as Metric

F Score is calculated as follows:

Which is computationally faster when simplified to:

Custom Loss Function

Σ ((y × −log( X + e) × W + (1 − y) × −log(1 − y + e) × (1 − y))

Temporal Convolutional Layer

Model Optimisation Utilising Tensorboard

Addition of Third Branch

Network Early Stopping During Training

Completion of Third Branch

Testing and Evaluation

Q = { Idle, SingleLightAttack, ChainAttack}

Σ = { R1, R2, LS ↑, X, O, , Wait, HoldPrevious/Release, HitOrBlockedPrevious, HoldPrevious}

F = { Idle, SingleLightAttack, SingleHeavyAttack, ChainAttack,

4.1.2 Evaluating the Neural Networks

TABLE 4.4: Initial results of 20 rounds of duelling a Level 1 Warden before

Prediction History Removed

TABLE 4.8: Performance after 100 rounds of duelling a level 1 Warden. A

TABLE 4.10: Performance after 100 rounds of duelling a level 1 Warden,

Model Performance Results Comparison

i f name == " main " :

i f name == " main " :