Neural Knight
Author: Taha Nasir
Supervisor: Dr. Matthew Yee-King
UNIVERSITY OF LONDON
Abstract
Computing Department
by Taha Nasir
Neural Knight is an artificial neural network that has learned to play the 2017 computer game "For Honor". It can fight against CPUs and humans at the level of a novice human player, and is extremely difficult to tell apart from a human when observed. The network consists of recurrent branches as well as convolutional branches, and was built using OpenCV, Tensorflow and Keras.
Acknowledgements
I would like to thank my family and friends for their support throughout the last four
months of developing Neural Knight. A special thanks must be given to my supervisor,
Matthew Yee-King, for his time and help in supervising me. I would especially like to thank
Daniel Kukiela for his invaluable insight into the workings of Neural Networks, and for
teaching me much. Finally, thank you to Yijie and Hasan, who were my human test subjects
within the game and spent hours fighting against different iterations of the bot.
Contents
1 Introduction
1.1 Project Introduction
1.2 Baseline Target Specification For The Project
1.3 Extended Target Specification For The Project
1.4 Structure of Report
2 Literature Survey
2.1 The Problem
2.2 Neural Networks in Self-Playing AI
C Project Proposal
C.1 Proposal
C.2 Aims
C.3 Extended Aims
C.4 Objectives
C.5 References
E Source Code
Chapter 1
Introduction
Chapter 2
Literature Survey
Chapter 3
Design and Implementation
As can be seen, the main aspects of the UI are the player's guard, health and stamina, and the opponent's guard, health and stamina. In order to win, the player must reduce the opponent's health to zero, emptying the bar. If the player's health reaches zero first then they will die and lose the duel. Most actions in the game consume stamina. If stamina is completely consumed then the player will become "exhausted", greying the screen and causing all attempted moves to become extremely slow and high risk until it has fully replenished.
The guard user interface is an outline of a shield, split into three sections, referred to as "left", "right", and "top" guard. A hero can have its guard face any one of these three directions. Should an opponent attack from the same direction that a player is guarding, the attack will be blocked, meaning it will be mostly negated.
Damaging weapon attacks consist of two main types - light attacks and heavy attacks. Light attacks are fast, generally between 300-400ms (the timing differs depending on their position in a combination, the direction they come from, and the hero being played). They do small amounts of damage and often work as opening attacks that chain into more dangerous combinations. They consume little stamina. Heavy attacks are slower, often between 800-1000ms. They consume much more stamina than light attacks, and are often the finishing attack in a chain of moves.
Light and heavy attacks can be countered or "parried". Parrying an incoming attack requires applying a heavy attack during the incoming attack's parry window. The parry window begins 300ms before the attack lands, and ends 100ms before the attack lands, meaning the player has only 200ms to react. Due to the difficulty of this action, successfully parrying an attack rewards the player with an opportunity to apply a certain amount of guaranteed damage. Typically, parrying a heavy attack guarantees a light attack, and parrying a light attack guarantees a heavy attack.
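The parry timing just described can be sketched in a few lines of Python. This is purely illustrative - the function names are invented here, and the window is simply the interval from 300ms before to 100ms before the attack lands, measured from the start of the attack animation:

```python
# Hypothetical sketch of the parry timing described above. The window opens
# 300 ms before the attack lands and closes 100 ms before it lands, leaving
# a 200 ms reaction window.

def parry_window_ms(attack_duration_ms: int) -> tuple:
    """Return (open, close) of the parry window, measured in ms from the
    start of the attack animation."""
    return attack_duration_ms - 300, attack_duration_ms - 100

def can_parry(attack_duration_ms: int, reaction_ms: int) -> bool:
    """True if a parry input at `reaction_ms` after the attack starts falls
    inside the window."""
    open_ms, close_ms = parry_window_ms(attack_duration_ms)
    return open_ms <= reaction_ms <= close_ms

# A 400 ms light attack is parryable between 100 ms and 300 ms in:
# can_parry(400, 200) -> True; can_parry(400, 350) -> False
```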
Certain attacks, often heavy attacks positioned at the end of combination sequences, have the "unblockable" property. Unblockable attacks are accompanied by an orange effect and sound, and cannot be blocked. They can, however, be parried as normal. Certain unblockable attacks are not weapon attacks, but instead consist of some kind of blunt charge, such as an attack with a shield. These are known as "bash" attacks. Bashes often knock the opponent back and guarantee a light attack.
Weapon attacks can be dodged, if timed correctly. Players can dodge left, right, and backwards. In order to prevent an opponent consistently dodging attacks, a player can break their guard. "Guardbreaks" display an icon briefly; if the opponent does not press the same button within the 400ms the icon is visible, they become stunned and vulnerable to a large amount of damage. A guardbreak will fail if the opponent is in the process of throwing an attack, and an opponent cannot counter-guardbreak while dodging. If a player and an opponent attempt to guardbreak each other simultaneously, they will both bounce off each other and fail.
This information is now repeated in a short-form glossary for ease of reference.
3.1.2 Glossary
• Hero: Playable characters in the game.
• Warden: One of the playable characters in the game, a knight in armour, wielding a
two-handed sword. Neural Knight is taught to fight as a Warden, and to fight against
a Warden.
• Light attack: Fast, low-cost attack that does low damage and is difficult to parry.
• Heavy attack: Slow, high-cost attack that does high damage but is easy to parry.
• Parry: A counter-move performed in a 200ms window during an attack that stuns the
opponent.
• Unblockable: An attack that cannot be blocked by matching the guard direction. Can
still be parried.
• Bash: An unblockable attack that has no direction, often some kind of physical shoving
motion.
• Guardbreak: An opening attack with no direction that requires a quick reaction to
negate. Failure to counter will stun the opponent for 600-800ms.
3.1.3 Bots
For Honor has built-in bots that serve as replacements for other players. They can be played against, or can work with the player co-operatively in other game modes. The bots are split into difficulty levels 1, 2, and 3, in ascending order. Whilst there is no publicly available data regarding the bots, in the author's experience Level 1 bots are challenging for a novice player, Level 2 bots for a relatively intermediate player, and Level 3 bots for a more experienced player. This is clarified because these bots are used as the primary way to test the implementation, as discussed further in the next chapter (4 - Testing and Evaluation).
3.2 Design
The project can be described as the sum of three major components - the input, the player control network, and the output. The input is the process of capturing the information that is to be fed into the network. The player control network is the neural network itself. The third component is the output of the neural network, and how it is processed into inputs to the game itself. This section details how the solution to these tasks was designed and conceptualised.
3.2.1 Input
As mentioned in the Specification (1.2 - Baseline Target Specification For The Project), there is no API which can be used to hook into the game to pull live information during a fight. As a result, the only way to provide information to the neural network is by discerning information extracted from each individual on-screen frame.
Theoretically, a neural network may be able to learn to play simply from raw frame data. However, the system would be much easier to control and test if more explicit information could be fed in. This requires manually interpreting information found on screen using computer vision techniques. Information such as the enemy's guard direction and the presence of an attack, an "unblockable" attack, a "bash", or a "guard-break" would all be required. Table 3.1 was originally created to display all the game features desired to be extractable. Health and stamina were not made extractable.
TABLE 3.1: A table of images demonstrating the appearance of the user interface during certain game actions - Guard Up, Guard Left, Guard Right, Attack, Unblockable, Guardbreak, and Bash - and showing what features were desirable to extract.
3.2.3 Output
The method most frequently employed by other self-playing AI (e.g. when "Sethbling" developed a self-racing AI for the SNES title "Super Mario Kart" [15]) is to consider this a classification problem and represent each button that can be pressed as an output node. By employing this method the output would be a "multi-one-hot" array where each index represents a button press. The exact number of buttons, and how the array is processed into direct game input, was the first aspect of the project to be considered and implemented, and this process is explored first in the next section.
3.3 Implementation
This section explores how each of the three aforementioned tasks was implemented. Whilst the Design section ordered them intuitively, this section details each problem in the order it was implemented; they were attempted in this order due to the relative difficulty of each task.
3.3.1 Output
The first task was to demonstrate that a game could easily be controlled automatically using a script. Since the project was intended to utilise Python libraries for the Input and neural network portions (namely OpenCV and Tensorflow), research was conducted into methods of simulating input using Python.
Initially, this task appeared completely trivial. Using Pyautogui, a library for cross-platform GUI automation, functions like pyautogui.keyDown() could be used to control the game. So initially a script was written such as:
import pyautogui
import time

while True:
    pyautogui.keyDown("W")
    time.sleep(1)
    pyautogui.keyUp("W")
However, whilst this worked in the shell and in any normal text field (in this instance producing the result of pressing the W key repeatedly, which is usually assigned in games as the input to move a character forward), and even worked for regular interaction (pressing the Windows key also opened the start menu), it would not work at all in any game, including For Honor. Upon further research, a StackOverflow thread was discovered [2] in which user Cas explains why these functions do not work in modern games:
The SendInput function will insert input events into the same queue as a hardware device but the events are marked with an LLMHF_INJECTED flag that can be detected by hooks.
Modern games instead utilise DirectInput events, part of the DirectX development API, meaning only events triggered at that level are detected as input. After further research, another thread containing a script was found. A modified version by Harrison Kinsley [8] - coincidentally part of the same course that would be rediscovered and used for the next subsection - made it much easier to interact with. This script converts keyboard HexCodes into DirectX input events.
Using this, a script was written that would allow for controlling the fighting system in the game. A demonstration of this code can be seen here: (https://youtu.be/4Uf7F4eQSR0), where the bot fights and kills an in-game bot which does not move or change its guard, by locking on, moving towards it, and executing a series of attacks from two different directions. After this, it performs an "execution" animation, before unlocking the camera and ending.
Refactored slightly, it demonstrates that the output can be easily abstracted such that complicated sequences of gameplay interactions can be performed with a single function call. The version written for use in the final implementation of the project can be found as "processInputToGame.py", which contains the method "inputToGame()". This takes in the multi-one-hot array, where each index corresponds to a button press. The algorithm iterates through the array and sends the appropriate DirectX events.
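The iteration logic can be sketched as follows. This is an illustrative stand-in, not the project's actual code: the index-to-key mapping is invented, and the press_key stub records keys rather than raising DirectX scan-code events as the real script does:

```python
# Sketch of the "inputToGame()" logic described above: walk the multi-one-hot
# array and press the key mapped to each active index. BUTTON_KEYS and
# press_key are hypothetical stand-ins for the real DirectX event sender.

BUTTON_KEYS = ["W", "A", "S", "D", "Q", "E"]  # illustrative mapping only

def press_key(key: str, pressed: list) -> None:
    pressed.append(key)  # the real implementation sends a DirectX event here

def input_to_game(multi_one_hot: list) -> list:
    pressed = []
    for index, active in enumerate(multi_one_hot):
        if active:
            press_key(BUTTON_KEYS[index], pressed)
    return pressed

# input_to_game([1, 0, 0, 1, 0, 0]) -> ["W", "D"]
```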
The output array itself is discussed further in the next subsection.
3.3.2 Input
The second task is the matter of input for the AI - providing real-time data about the game so the AI can use this information to decide how to act. This is a non-trivial problem, as For Honor does not provide any kind of API to extract such information, presumably because it could be used for cheating quite easily. As a result, the implementation must, as discussed in the Design, utilise computer vision techniques to extract information. This subsection details the process by which a state detector was implemented to detect an enemy's guard, which can be seen as a proof of concept in the following supplementary video: (https://youtu.be/j0uWr_h7eZc). It then explores how the other important game states are captured.
The single most important aspect of the user interface that requires interpreting for the AI is the enemy's guard direction. Figure 3.1 shows an annotated screenshot of the game to explain how the guard direction functions. Directional attacks come from one of these directions, and the defender must match the guard in order to block the attack.
Given this information, the first aim was to make a simple program that detected the enemy's guard direction and switched its own guard appropriately. This would serve as a proof of concept. However, the author was completely unfamiliar with OpenCV and computer vision as a field, and thus research and experimentation were carried out in order to learn how to apply it to the project. Sentdex's course on implementing a self-driving car AI in the 2013 title "Grand Theft Auto V" [8] contained an introduction to OpenCV and some rudimentary boilerplate code that would subsequently become useful in the state detector.
Following the first several steps of this course, a rudimentary lane detection algorithm was developed using OpenCV. It relied primarily on finding the two main lines in a frame and considering them the two boundaries of the lane. By analysing the gradients of these lines, the algorithm would simply try to stay between them. Line detection is a very common method for feature detection in computer vision. Since the guard user interface in For Honor is essentially made up of six unique lines, a similar method was employed to extract the guard direction.
After brief experimentation, it was concluded that it would be best to run the game at a resolution of 1024x768, as it is the lowest resolution the game allows to be run whilst still maintaining a 16:9 aspect ratio. A smaller resolution is desirable, as the captured image converts to as small a matrix as possible, meaning operations manipulating it perform as quickly as possible. With the image captured, it was ideal to restrict it solely to the region of interest, such that only sections of the screen bearing useful information would appear. This was achieved by creating a mask matrix of entirely zeroes (0) except for the region of interest, which contained ones (1). Then, by applying a bitwise-AND operation with the mask and image, all pixels of the image except those covered by the ones are turned black. This allows only the guard user interface to be captured, though the guard direction itself still needed to be filtered from this.
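The masking step can be sketched in NumPy. This is a minimal illustration of the technique, not the project's code: the project uses OpenCV's bitwise AND, but np.bitwise_and behaves identically on uint8 arrays, and the region coordinates below are invented:

```python
import numpy as np

# Region-of-interest masking as described above: a mask of zeroes with the
# region of interest set, combined with the frame via bitwise AND, blacks out
# everything outside the region. Greyscale frame assumed for simplicity.

def apply_roi(frame: np.ndarray, top: int, bottom: int,
              left: int, right: int) -> np.ndarray:
    mask = np.zeros_like(frame)
    mask[top:bottom, left:right] = 255  # region of interest is kept
    return np.bitwise_and(frame, mask)

frame = np.full((768, 1024), 200, dtype=np.uint8)  # dummy 1024x768 frame
roi = apply_roi(frame, 200, 500, 300, 700)         # illustrative coordinates
# Pixels inside the region survive; everything else becomes black (0).
```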
FIGURE 3.2: The guard direction user interface and corresponding detected lines.
Once a script had been written that successfully pulled information about the two lines making up the current guard, the following step was to use this information to detect the actual guard direction. This was done by analysing the gradients of each line and taking the average. Since each guard on the interface is made up of two unique lines, the resulting gradient would lie within one of three ranges corresponding to its direction.
After some experimentation logging the gradients of different guard directions repeatedly, the following results were concluded: the lines making up the right guard had a gradient m ≤ −0.5; for the left guard, m > 1; and for the top guard, −0.5 < m ≤ 1.
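These thresholds amount to a three-way classification of the averaged gradient. A sketch of that mapping (the handling of values exactly on the boundaries is an assumption, as the report only gives the ranges):

```python
# Classify the average line gradient m into a guard direction using the
# thresholds described above. Boundary behaviour is assumed.

def classify_guard(m: float) -> str:
    if m <= -0.5:
        return "right"
    if m > 1:
        return "left"
    return "top"  # -0.5 < m <= 1

# classify_guard(-2.0) -> "right"; classify_guard(3.0) -> "left";
# classify_guard(0.1) -> "top"
```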
When this was implemented as a function switch_guard(), a proof of concept script was
written that would switch the guard direction to the opponent’s detected guard. When
tested, the algorithm successfully matched the opponent’s guard accurately, as seen in the
supplementary video linked at the beginning of this subsection.
It should be noted that, in order for the algorithm to detect the lines cleanly and properly, some prerequisites were identified. Contrast should be relatively low, finalised at setting 19. The player's character must stay within a short distance of the enemy for these states to be correctly identified, else the enemy becomes outlined in white, generating noise that bypasses the binary threshold (though the end result is not too heavily penalised in performance when the bot creates a large distance between itself and its opponent, thanks to other sources of data explained in subsequent sections). Only permutations of maps that are set at dusk or night with clear weather should be used. The map used for all testing is "The Ring" set at Dusk. In the video, it can be seen that the AI occasionally makes mistakes; these are caused by noise generated by the snowfall. A clear, dark environment helps minimise noise.
The values for the range were determined by qualitatively analysing screenshots taken of the attack indicator. Thus, by simply retrieving the array from the line np.where(mask != 0), the algorithm could count how many red pixels were in the frame. If the quantity exceeded a certain threshold, it was very likely that the opponent was attacking.
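The check reduces to counting non-zero pixels in the colour mask. A sketch, with an invented threshold value (the report does not state the one actually used):

```python
import numpy as np

# Attack-indicator check as described above: count non-zero pixels in the
# colour mask and flag an attack when the count exceeds a threshold.

ATTACK_PIXEL_THRESHOLD = 50  # illustrative value, not the project's

def attack_detected(mask: np.ndarray) -> bool:
    red_pixel_count = len(np.where(mask != 0)[0])  # as in the report's line
    return red_pixel_count > ATTACK_PIXEL_THRESHOLD

mask = np.zeros((100, 100), dtype=np.uint8)
mask[10:20, 10:20] = 255  # 100 "red" pixels, as from an attack indicator
# attack_detected(mask) -> True
```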
However, for the attack indicators, as aforementioned, the algorithm also needs to convert the red pixels into white in order to pass the brightness threshold. The filtering of the image for the attack indicator uses an algorithm by StackOverflow user Ray Ryeng [14], which emulates the copyto function of the C++ edition of the OpenCV library, which the Python library lacks. It was used to copy all instances of the mask (using the locs numpy array) onto the original image. In the end, this results in two separate images. One image is all black except for any red pixels during an attack. The other is the original guard detecting image, which shows the pixels that pass the binary threshold, as well as any pixels from an attack indicator after they have been changed to white.
Auto-parry Test
This process was devised after extensive research, and it appears to work very effectively. Its use was tested by making an "autoparry bot". As explained in the For Honor explication (3.1.1 - Art of Battle System), when an opponent throws an attack there is a window of opportunity in the first half of the attack's animation to counter it. This is more favourable than simply blocking, because it guarantees a counter-attack will land, and drains the opponent's stamina rather than your own. A simple testing function was developed that checked if the enemy was attacking and, if so, input the button to parry in the game. A testing environment within the game was then set up, where the enemy bot would simply throw a "light" attack in a random direction approximately every three seconds. This would theoretically allow the algorithm to automatically parry all light attacks thrown at it.
The result worked mostly as intended. When run, the AI-controlled character successfully blocked approximately 70% of attacks. However, it can be considered even more accurate than that: approximately 62% of the attacks it failed to parry were missed not because it detected the wrong guard, but because it was programmed to react the moment it saw an attack, whereas in the game itself, even with the quickest attacks, the window of opportunity does not start right away - the bot was reacting before the window had opened. This is not a problem for the average human player, whose average reaction time, according to Human Benchmark's online test results, is around 284ms [1]. This shows that the bot is capable of reacting much faster than a human, since the window for parrying starts 200ms into the attack and it reacts well before this.
TABLE 3.2: HSV Colour ranges for each type of feature that is captured by
detecting the presence of said colours.
• Move backward
• Move right
• Guard left
• Guard top
• Guard right
• Guard break
• Light attack
• Heavy attack
• Dodge
• Feint
• Taunt/Execute
• Idle (Do nothing)
However, the initial script written was changed much later in the project, after it was discovered that the analogue sticks would not be detected at all if they were not being actively moved. This was due to a problem related to threading. As with other buttons, analogue stick data comes in a buffer that has to be read from, and that buffer does not update if the sticks do not move. To solve this, a new thread was made solely for gamepad input detection. This way the buffer could be read constantly within its own loop, ensuring that an analogue stick held in a direction would consistently update and be detected properly.
With the state detector completed, a finalised script was implemented that could start, pause, and stop the recording of data and then pickle it to a file, thus completing the feature preprocessing stage. The lack of testing of the state detector is detailed and justified in a later section of this report (4.1.1 - Infeasibility of Evaluating the State Detector).
A feed-forward network returns the same value for a given input irrespective of any passes that occurred before. Concisely, such networks possess no "memory" of any kind.
Research discussed earlier in the report (2 - Literature Survey) suggested that this would likely not be sufficient, as context related to the opponent's previous actions (in the form of combination moves, timed events, and the individual's preferred style of play) and the player's own previous actions and button presses would be required to make intelligent decisions. However, it was still worth attempting, both to observe its efficiency and to serve as an additional benchmark for the later models.
The Tensorflow library is used to model, train, and run the neural networks, with Keras as a wrapper to abstract and simplify the process. Keras allows one to abstract the declaration of, for example, a dense layer in a model, which would normally require several complicated lines describing the nature of the weights, nodes, and activation function, such as:
hidden_1_layer = {"weights": tf.Variable(tf.random_normal([128, n_nodes_hl1])),
                  "biases": tf.Variable(tf.random_normal([n_nodes_hl1]))}
l1 = tf.add(tf.matmul(data, hidden_1_layer["weights"]), hidden_1_layer["biases"])
l1 = tf.nn.relu(l1)
Keras reduces this to a single layer declaration, allowing much more convenient and powerful control over making immediate major changes to the model structure.
This was a simple enough process. Some test data was captured by fighting a dummy CPU several times. Then, a simple model was developed that takes in the input state, along with the buttons pressed at each given frame to serve as the training output. The scaled frame was not yet passed in, so as to observe the minimum necessary input data required. The model itself comprised three Dense (fully-connected) layers of 128 rectified linear nodes (hereafter this activation function is referred to as "relu"), followed by a softmax layer of 14 output nodes: one to represent each button, and a final node to represent idling. Softmax activations are used for one-hot encodings, meaning only one button would be pressed at a time. Figure 3.3, generated by Keras, visualises the model:
FIGURE 3.3: The feed-forward model, made up of three rectified linear activation layers followed by a softmax output layer.
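The forward pass through this architecture can be sketched in plain NumPy. This is purely illustrative: the input width of 20 is an invented placeholder (the report does not state the input size), the random weights stand in for trained ones, and the project itself declares the model with Keras Dense layers rather than raw NumPy:

```python
import numpy as np

# Forward pass through the model described above: three fully-connected
# layers of 128 relu units, then a softmax layer of 14 outputs.

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

n_inputs, n_hidden, n_outputs = 20, 128, 14  # input width is illustrative
weights = [rng.normal(size=(n_inputs, n_hidden)),
           rng.normal(size=(n_hidden, n_hidden)),
           rng.normal(size=(n_hidden, n_hidden)),
           rng.normal(size=(n_hidden, n_outputs))]

def forward(state: np.ndarray) -> np.ndarray:
    h = state
    for w in weights[:-1]:
        h = relu(h @ w)
    return softmax(h @ weights[-1])

probs = forward(rng.normal(size=n_inputs))
# probs has 14 entries summing to 1; the argmax is the predicted button.
```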
The initial model was trained on the dummy data. The sample size was low and the data itself relatively noisy, and so a fairly low initial accuracy was expected. However, after the model ran, Tensorflow claimed to have achieved a validation accuracy of approximately 70%. The only true means of testing the model was to test its performance in the game, and so development began on a separate script that would load a trained feed-forward model and use it, together with the state detector, to "predict" game inputs.
It took an unexpectedly large amount of time to implement the real-time predictor, due to seemingly running out of video memory when trying to run both For Honor and the network at the same time. This was solved upon discovering that Keras tries to allocate as much of the VRAM as possible to itself unless otherwise specified, much of which was being consumed by the game in this case. By forcing a maximum allocation of 30% to the neural network, both the program and the game were able to execute properly when run simultaneously.
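The 30% cap can be expressed with the TF 1.x-era API the project's timeframe suggests. A configuration sketch - the session wiring shown here is an assumption, as the report does not give the exact code:

```python
import tensorflow as tf
from keras.backend import set_session

# Cap Tensorflow/Keras at 30% of GPU memory so the game can use the rest
# (TF 1.x API: ConfigProto + per-process memory fraction).
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))
```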
When the model was initially run, it was observed that it had been incentivised to almost exclusively predict "Idle" - i.e. to do nothing at all. The author's domain knowledge assisted in identifying the cause of this erroneous behaviour: in any given frame of a fight, it is unlikely that any button is being pressed, as duels in For Honor are carefully paced and designed to discourage "button mashing" (aggressively attacking by pressing essentially random buttons without thought or skill). As a result, most of the data indicated that the right answer was indeed to do "nothing", and so the network had learned that by doing nothing every single time it would get the majority of its behaviour correct - thus explaining the erroneously high accuracy.
This problem of imbalanced data is extremely common in the field of Machine Learning, and as a result a myriad of methods have been developed to tackle it. However, since the primary solution - obtaining more balanced data - was not possible due to the nature of the game itself, as explained above, the next best solution was implemented: modification of the class weights. This allows certain outputs in the training set to be worth a certain factor more than other classes. For example, the weightings could be set such that every frame containing an actual button press would be worth 50 times more than any idling example. At first, employing this method appeared to resolve the problem. The trained model, when run in game, appeared to make some informed decisions, such as occasionally parrying attacks. At the very least, it was observably no longer classifying "idle" exclusively.
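The class-weighting fix can be sketched as follows. The label data and names here are made up; only the factor of 50 comes from the example in the text:

```python
from collections import Counter

# Weight frames containing real button presses far above "idle" frames so
# the loss no longer rewards constant idling. Labels are illustrative.

labels = ["idle"] * 95 + ["light_attack"] * 3 + ["guard_left"] * 2

counts = Counter(labels)
class_weight = {label: (1.0 if label == "idle" else 50.0) for label in counts}
# The resulting dictionary would then be passed to training, e.g. via the
# class_weight argument of Keras's model.fit().
```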
However, after providing the network with more robust data comprising a large number of successful fights, it became increasingly apparent that more data was not improving the performance of the network; it was uncertain whether it had learned at all. As a result of this failure to learn, the goal temporarily regressed to teaching the network to simply block attacks - a trivial mapping, teaching it to match the same guard as the opponent's whenever they attacked. It was at this point that the threading issue with the gamepad detection, discussed earlier in (3.3.2 - Input), was identified. In addition, the model's final output layer was changed from a softmax one-hot output to a sigmoidal layer. A sigmoid output layer produces an independent probability for each class; this is then interpreted such that if any probability exceeds a certain threshold, the corresponding button is input. This allows the network to "press" multiple buttons simultaneously. Once these two issues were rectified and the network was retrained on fresh data, the automatic blocking bot worked very effectively (supplementary video: https://youtu.be/LR2Tm7F-w80). Results from 100 thrown attacks can be seen in Table 3.3.
TABLE 3.3: The number of blocked and landed opponent attacks before cor-
recting the region of interest. Note that this was a "sanity test" to ensure the
network was making intelligent decisions. Results of final performance tests
are detailed in Chapter 4.
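The sigmoid-output interpretation can be sketched as a simple thresholding step; several buttons can be active in the same frame. The threshold value and button names below are illustrative, not the project's:

```python
import numpy as np

# Any class whose sigmoid probability exceeds the threshold becomes a
# pressed button, allowing simultaneous button presses.

BUTTONS = ["guard_left", "guard_top", "guard_right", "light", "heavy", "idle"]
THRESHOLD = 0.5  # assumed value

def buttons_to_press(sigmoid_out: np.ndarray) -> list:
    return [b for b, p in zip(BUTTONS, sigmoid_out) if p > THRESHOLD]

# buttons_to_press(np.array([0.9, 0.1, 0.2, 0.7, 0.05, 0.1]))
# -> ["guard_left", "light"]
```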
Whilst the bot was relatively accurate, with a total accuracy of 83%, the individual accuracy for attacks that came from the left was significantly lower, at only 71%. This led to the conclusion that the state detector was, for an unknown reason, not as effective at detecting the left guard as the others. The root of this problem was not difficult to find: with some visual testing it could be seen that the region of interest cropping, discussed initially in (3.3.2 - Input), was cutting off the left guard, and even a small part of the right guard. Increasing the dimensions of the unmasked space was enough to fix this problem. This can be observed in the significant drop in left and right missed blocks, displayed in Table 3.4.
TABLE 3.4: The number of blocked and landed opponent attacks after correct-
ing the region of interest. Results of final performance tests are detailed in
Chapter 4.
However, even with the threading issue resolved, capturing a new set of fights with the corrected data capturing system and training the model yielded no improvement in fighting performance. (The details of how models were compared are explained in (4 - Testing and Evaluation).) There were two main ways to proceed from here to improve the model. One was to feed the other type of input data into the network: at that stage in development, only the information from the state detector and the gamepad inputs from 6-60 of the previous frames had been used; the normalised, low-resolution frames stored alongside this data had yet to be utilised. However, additional research determined that developing models for mixed inputs was a non-trivial task. In the article "Keras: Multiple Inputs and Mixed Data" [13], Dr. Adrian Rosebrock writes:
Developing machine learning systems capable of handling mixed data can be ex-
tremely challenging as each data type may require separate preprocessing steps,
including scaling, normalization, and feature engineering.
Working with mixed data is still very much an open area of research and is often
heavily dependent on the specific task/end goal.
With this considered, and given the time frame involved, it was more logical to try the second option first - converting the network to be recurrent rather than feed-forward - and then consider implementing a mixed input network afterwards if there was still no noticeable improvement.
Recurrent Network
The structure, advantages, and workings of recurrent networks, specifically long short-term
memory (LSTM) networks, are discussed in (2.2 - Neural Networks in Self-Playing AI). The
environment had been modified beforehand to allow for GPU training, and with CuDNN
also installed, the CUDA version of the Tensorflow LSTM cell could be used, which trains
on the GPU instead of the CPU and is significantly faster. Using LSTM units, a recurrent
network model with Dropout was developed. Dropout is a method to tackle overfitting
that randomly "switches off" a specified fraction of units by setting them to zero. The
optimum fraction of units to apply Dropout to was subject to experimentation, but began
at 10%. Figure 3.4 depicts the initial model structure.
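The initial recurrent model described above can be sketched in Keras as follows. This is a minimal illustration, not the project's exact architecture: the sequence length, state-vector size, and layer widths are assumed values, and the 14 sigmoid outputs correspond to the multi-label gamepad classes.

```python
# Minimal sketch of the initial recurrent model: LSTM units with Dropout.
# Shapes and layer sizes are illustrative, not the project's exact values.
import tensorflow as tf

SEQ_LEN = 6        # frames of history per sample (assumed)
STATE_DIM = 20     # size of the categorical state vector (assumed)
NUM_CLASSES = 14   # one output per gamepad action class (cf. Table 3.5)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN, STATE_DIM)),
    # On a GPU with CuDNN available, TF 2.x uses the CuDNN LSTM kernel automatically
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dropout(0.1),   # begin at 10%, tuned by experimentation
    tf.keras.layers.Dense(64, activation='relu'),
    # sigmoid output: multi-label, as several buttons may be pressed at once
    tf.keras.layers.Dense(NUM_CLASSES, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```

Binary cross-entropy with a sigmoid output layer fits the multi-label nature of gamepad presses, where any subset of buttons can be active on a given frame.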
Capturing training data becomes more difficult with the addition of network memory. With
a feed forward network, specific actions could be captured in a controlled environment be-
cause context did not matter. For example, learning to successfully "counter guard break"
when the opponent attempts to guardbreak the player could be learned by having the oppo-
nent guardbreak repeatedly. However, a recurrent network considers sequences of events,
and in a real fight such a scenario would not ever take place. As a result, the only way to
properly capture data was to play through full, normal fights.
Initial data consisted of 54 rounds worth of fights versus a Warden of Difficulty Level 2,
50 of which were won. Initial results were very promising. The network still failed to do
simple actions like intercept attacks by blocking (though this was considered to be due to
a simple lack of data), but was able to perform complex and skilful evasions and counter-
attacks. For example, the network learned to optimally counter a "shoulder bash" move by
performing the appropriate four-action combination attack. It was able to occasionally win
against Difficulty Level 1 bots, and even once defeated a Difficulty Level 2 bot (supplementary
video: https://youtu.be/6I5jLyAHcjY). With this level of success, the next step was to
capture much more data and observe its performance.
The model was converted from the Keras Sequential API, which required each layer added
to be subsequently connected to the next layer added. The
Functional API is much more powerful, as it allows much more control over which parts
of the network connect. The new neural network consisted of two branches; one branch
was the existing LSTM layer that received the categorical state data. The other branch was a
convolutional layer that processed the image data. These were both then flattened and con-
catenated so they could be processed together at activation, after being abstracted through
several dense layers. Figure 3.6 depicts this structure. Convolutional layers were not inves-
tigated during the research phase of the project, but are made up of two processes, convolu-
tion and pooling. Convolution is the splitting of the data into "feature maps". For example,
in Figure 3.5, the dataset showing human faces splits the images into features such as eyes,
ears, noses, lips, etc. This is done by moving a window (kernel) along the data and, for
each subset of data, checking the rest of the image for similarity to the pixels in the window.
This is convolving. What results is a map for each feature that represents what areas of the
data sample are likely to contain that feature. The convolution layer is this resulting stack of
feature maps. Pooling is then performed, which walks another smaller window along the
feature map and attempts to condense the captured window into a single data point. This
is most often done in the form of MaxPooling, where the highest value in the window is
chosen.
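The two-branch structure described can be sketched with the Keras Functional API. Input shapes and layer sizes here are illustrative assumptions, not the project's exact values:

```python
# Sketch of the two-branch mixed-input network (Figure 3.6), Functional API.
# All shapes and widths are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

state_in = tf.keras.Input(shape=(6, 20), name='categorical_state')  # LSTM branch
image_in = tf.keras.Input(shape=(48, 64, 3), name='frame')          # conv branch

x1 = layers.LSTM(128)(state_in)

x2 = layers.Conv2D(32, (3, 3), activation='relu')(image_in)  # convolution: feature maps
x2 = layers.MaxPooling2D((2, 2))(x2)                         # pooling: condense each window
x2 = layers.Flatten()(x2)

# Concatenate the branches, then abstract through Dense layers to the output
merged = layers.concatenate([x1, x2])
merged = layers.Dense(128, activation='relu')(merged)
out = layers.Dense(14, activation='sigmoid')(merged)  # multi-label output

model = Model(inputs=[state_in, image_in], outputs=out)
```

The Functional API's explicit wiring is what allows the two branches to be processed separately before being concatenated, which the Sequential API cannot express.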
However, this network still did not provide any kind of improvement. In fact, with the new
and much larger dataset, the performance was actually worse than the initial recurrent
network's: the network never appeared to be making intelligent decisions. It was at this
point that the different datasets were compared. The initial set, though much smaller, still
performed better than the new set. Besides the smaller size - which should theoretically
have an adverse effect on learning - the only difference was the number of previous frames'
outputs being fed as inputs; the initial dataset stored only 6 frames of history, whilst the
newest contained 600. This implied that training and predicting with a longer input history
was less effective than with a shorter one, if effective at all.
Based on this theory, the network was then retrained with the input history completely
removed, training solely on the categorical and image data.

F IGURE 3.6: The mixed-data network that accepts both the categorical and
image data, fed through an LSTM layer and a convolutional layer respectively,
before being concatenated and fed through Dense layers till activation.

Upon training, the network began
to exhibit more intelligent and reactive decision making again. Not enough to be considered
competent, but marginally better than the initial recurrent net. The reason for this
improvement is considered to be that a large history of previous button inputs makes the
problem space much larger; it is very unlikely that the same extended sequence of inputs
leading to the exact same result occurs often enough to learn from. This problem might
therefore be eliminated if significantly more data were available.
It was discovered after further research that the categorical data was being passed into the
network incorrectly. LSTM units do not learn sequences of events implicitly; rather, each
sample has to be accompanied by a history of previous samples, split into individual arrays.
Until now, the network had been provided with a sequence length of 1. By providing it
with a small history, around 6-10 previous samples, performance improved consistently.
However, increasing the sequence length further yielded the same problems as discussed
with training using output history, meaning that until a much larger dataset can be
acquired, the sequence length must be kept short.
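The reshaping required by the LSTM layer - accompanying each sample with a short history of previous samples - can be sketched as a sliding window (the history length and feature size here are illustrative):

```python
import numpy as np

def to_sequences(samples, seq_len):
    """Convert a flat array of per-frame state vectors into overlapping
    sequences, so each LSTM input carries its own short history."""
    # samples: (num_frames, state_dim) -> (num_frames - seq_len + 1, seq_len, state_dim)
    return np.stack([samples[i:i + seq_len]
                     for i in range(len(samples) - seq_len + 1)])

# e.g. 100 frames of a 20-value state vector, with 6 frames of history each
frames = np.zeros((100, 20))
X = to_sequences(frames, seq_len=6)
print(X.shape)  # (95, 6, 20)
```

Feeding the flat `(num_frames, state_dim)` array directly is equivalent to a sequence length of 1, which is the mistake described above.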
Class label: 0     1     2     3     4     5     6     7    8     9    10   11  12  13
Samples:     4999  2997  1095  1997  2814  1862  3468  455  1035  768  353  14  38  4897
TABLE 3.5: Example dataset displaying the extreme class imbalance. Whilst
classes 11 and 12 can be considered too low in sample size to learn from, they
are not very important to overall gameplay performance, and were included
for the sake of completeness.
This problem was originally thought to be solved using the application of class weighting.
As explained previously, class weighting allows for the amplification of the cost of misclas-
sifying certain samples during training. Thus theoretically if one applies a higher weighting
to under-represented labels, then they will be considered more "important" than more reg-
ularly occurring labels. However, this method, and indeed other traditional methods (such
as random over- and under-sampling), do not work for multi-label classification tasks.
When an amplified cost is applied to a misclassified sample containing an under-represented
label, the multi-label nature of the data makes it very likely that one of the over-represented
labels is also present in that sample, so that label is treated as more important as well.
Indeed, applying balancing methods such as these to multi-label data can actually exacerbate
the problem. Data balancing for multi-label classification is still a developing field of research.
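The inverse-frequency weighting idea can be illustrated with the class counts from Table 3.5. The weighting formula below is a common convention, not necessarily the report's exact scheme:

```python
# Sketch of inverse-frequency class weighting, using the counts from Table 3.5.
# With multi-label samples, a rare label (e.g. class 11) almost always co-occurs
# with common ones, so its amplified cost leaks onto those labels too.
counts = [4999, 2997, 1095, 1997, 2814, 1862, 3468, 455, 1035, 768, 353, 14, 38, 4897]

total = sum(counts)
weights = {label: total / count for label, count in enumerate(counts)}

# Rare classes receive vastly larger weights than common ones:
print(round(weights[11]))  # 1914
print(round(weights[0]))   # 5
```

The several-hundredfold gap between the rarest and most common classes shows why any misapplied amplification has such a large distorting effect.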
As a result, a better performance metric was required, and the F Score was chosen. The F
Score is a balanced compromise between precision and recall, and can be considered
relatively harsh as a result, because networks often score highly on one of the two metrics
but not the other. Precision, recall, and the F1 score are calculated as follows:

precision = TP / (TP + FP)

recall = TP / (TP + FN)

F1 = 2 × (precision × recall) / (precision + recall)
The custom metric implemented was StackOverflow user Paddy’s version of F Score and
can be found at the beginning of all files named "trainModel.py" [12]. When this was used
and tested with prior networks, it was evident that they all suffered extremely low F Scores.
Results are discussed in the following Chapter (4 - Testing and Evaluation). As a result,
optimising for F Score was the ideal route, as it provided a useful and realistic performance
metric which was less susceptible to the Accuracy Paradox.
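A batch-level F Score along these lines can be sketched with NumPy. This is an illustration of the formulas above, not the exact Keras-backend implementation from [12]; the threshold and epsilon are assumed values:

```python
import numpy as np

def f1_score(y_true, y_pred, threshold=0.5, eps=1e-10):
    """Batch F1 for multi-label predictions, following the formulas above.
    eps guards against division by zero when there are no positives."""
    pred = (y_pred >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return 2 * precision * recall / (precision + recall + eps)

y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_pred = np.array([[0.9, 0.2, 0.4], [0.1, 0.8, 0.7]])
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```

Because F1 collapses to zero whenever either precision or recall is zero, it cannot be inflated by always predicting the majority classes, which is exactly the Accuracy Paradox behaviour described above.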
Where y is the set of the batch’s targets, X is the input batch, W is the set of class weightings,
and e is a small constant, 1e − 10. By applying the weightings such that each weight is
inversely proportional to each class’s presence in the dataset, the loss function decreases
the false positive count. This has the effect of increasing the precision at the cost of the
recall. When tested, the network showed considerable improvement, and the in-game tests
displayed a much more intelligent level of decision making, despite still not being able to
win fights consistently.
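The report's weighted loss equation itself is not reproduced in this excerpt, but a plausible sketch of a class-weighted binary cross-entropy using the stated constant e = 1e-10 is:

```python
import numpy as np

def weighted_bce(y_true, y_pred, class_weights, eps=1e-10):
    """Illustrative class-weighted binary cross-entropy: each class's loss term
    is scaled by its weight W, and eps avoids log(0). This is a sketch of the
    idea, not the report's exact loss function."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_class = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_class * class_weights))

y_true = np.array([[1.0, 0.0]])
y_pred = np.array([[0.8, 0.3]])
w = np.array([1.0, 2.0])  # rarer class weighted higher (inverse-frequency)
loss = weighted_bce(y_true, y_pred, w)
```

Upweighting a class makes confident wrong predictions on it more expensive, which pushes the network towards fewer false positives on that class, at some cost to recall, as described above.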
Earlier in development, a history of previously pressed gamepad inputs was used as an
input, and this initially improved performance. However, performance degraded when the
sequence length was any higher than 1, for two reasons. The primary problem was that, as
discussed, the sequences were being fed incorrectly - as one long string rather than split
into separate lists. The second issue is that, with longer sequences, exponentially more
data is required for the network to learn: an output produced from a given input state is
much less likely to occur multiple times when the input is a long sequence of previous
inputs, as the entire sequence needs to match every time. As a result, this input data was
removed from the network. At this stage in development, it was reintroduced in its own
separate branch.
The input history branch was initially using a temporal convolutional layer due to the im-
provement it demonstrated for the other categorical data. The data capturing system was
set up to capture each frame's previous input individually, so that sequences of different
sizes could be constructed after the fact, during preprocessing. A new dataset was captured
with this system in place - 50 rounds of gameplay, approximately an hour's worth of duels
versus a Level 2 bot.
2 bot. With this captured, the next stage was to analyse and experiment with the sequence
length. It quickly became apparent with minimal experimentation, however, that the model
still suffered the problem of lack of comprehensive data with long sequence lengths. In
fact, a sequence length of 1 was found to provide the highest predictive power. This would
be re-investigated at the end of the implementation. At this time, the implementation was
interrupted by a new issue that had become apparent.
With this significant improvement in performance, it was observed that certain areas of the
bot's play were still under-utilised or under-represented - dodging, guarding top, and
guard-breaking were all used too rarely. At this point, the weights for the corresponding
classes were increased during training so that they would be classified positive more often.
Over the course of several iterations, this was fine-tuned to an observably optimal level.
By this stage, development time was running out, and the bot had rapidly become more
effective at fighting. The areas where it lacked good decision making were mitigated by
providing more data covering just those problem areas, such as dealing with attacks
coming from the top direction. The bot was then tested against other bots, as well as
humans. Whilst it does not win often, it plays far better than a new player to the game, and
can even occasionally defeat a human opponent. With this achieved, the implementation
was considered complete.
Figure 3.7 displays the finalised network structure.
F IGURE 3.7: The finalised neural network model. Input 1 is the categorical
state data. Input 2 is the low resolution, RGB image data. Input 3 is the pre-
vious gamepad input, either provided by the training data, or in the case of a
real-time test it receives its own previous prediction.
Chapter 4

Testing and Evaluation
4.1 Verification
4.1.1 Infeasibility of Evaluating the State Detector
Evaluating the state detector is a non-trivial task. There are only two empirical ways to
observe its accuracy, both of which require testing its classifications against labelled data.
The first method is to label footage completely manually. Given the available time frame,
this is unreasonable due to the sheer volume of labels required: each frame could potentially
be a different classification, meaning an average two-minute fight at 60 FPS would require
7200 manual labels.
The other method is to record the opponent's button inputs and convert them automatically
to the appropriate labels. However, this is again infeasible under the time constraint. The
states being recorded are context-sensitive and combination-sensitive, meaning certain
sequences of inputs lead to different states. To depict this explicitly, a single character, or
"Hero", has had its moveset described in terms of a finite state transducer:
Γ = {LightAttack, HeavyAttack, GuardBreak, ChargedHeavy, Lion'sClaws, Lion'sFangs, Lion'sBite,
Lion'sJaws, Eagle'sFury, Eagle'sFuryAlternate, LegionKick, LegionKickCombo, Jab, ChargedJab,
JabCombo, QuickThrow, Lion'sRoar, Feint}

I = {Idle}
σ = {(Q0, Σ0, Γ0, F1), (Q1, Σ1, Γ0, F3), (Q1, Σ5, Γ15, F0), (Q2, Σ0, Γ4, F0), (Q0, Σ0, Γ0, F1), (Q1, Σ0, Γ0, F3),
(Q2, Σ1, Γ5, F0), (Q0, Σ0, Γ0, F1), (Q0, Σ1, Γ7, F3), (Q0, Σ1, Γ1, F2), (Q0, Σ1, Γ6, F3), (Q0, Σ5, Γ2, F4),
(Q0, Σ2 + Σ1, Γ8, F5), (Q0, Σ2 + Σ3 + Σ1, Γ9, F5), (Q0, Σ1 + Σ8 + Σ5 + Σ7, Γ12, F7),
(Q0, Σ1 + Σ8 + Σ5 + Σ9, Γ13, F8), (Q0, Σ1 + Σ8 + Σ5 + Σ0, Γ14, F7),
(Q3, Σ0 + Σ6 + Σ0 + Σ6 + Σ0, Γ16, F0), (Q0, Σ1 + Σ4, Γ17, F0)}
The other important aspect to observe is how many items in σ result in the same output,
which adds a further layer of complexity to any attempt to derive labels automatically.
Coupled with the fact that each character has a different move-set, evaluation by this
method becomes infeasible under the project time constraint.
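The context-sensitivity can be illustrated by encoding a tiny, hypothetical fragment of such a transducer as a transition table; the state, input, and output names below are illustrative stand-ins for the Q/Σ/Γ symbols above:

```python
# Tiny illustrative fragment of a moveset transducer:
# (state, input) -> (output, next state). Real Heroes have far larger
# tables, and each Hero's table differs, which is what makes automatic
# labelling of opponent inputs infeasible here.
TRANSITIONS = {
    ('Idle',   'LightButton'): ('LightAttack', 'Chain1'),
    ('Chain1', 'LightButton'): ('JabCombo',    'Idle'),   # same button, different state
    ('Idle',   'HeavyButton'): ('HeavyAttack', 'Chain1'),
}

def run(inputs, state='Idle'):
    """Run a sequence of button inputs through the transducer."""
    outputs = []
    for symbol in inputs:
        output, state = TRANSITIONS.get((state, symbol), ('Idle', 'Idle'))
        outputs.append(output)
    return outputs

print(run(['LightButton', 'LightButton']))  # ['LightAttack', 'JabCombo']
```

The same button press yields two different states depending on context, which is exactly why raw button logs cannot be converted to labels with a simple one-to-one mapping.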
Random Fighter
To serve as a benchmark for lower-boundary comparison, a naive bot was implemented that
simply output a random array of gamepad inputs every 0.1 seconds. 100 rounds of fights
were recorded in wins and losses, and the results are displayed in Table 4.1. It achieves a
Win/Loss ratio of 0.04.
Random Fighter
Wins Losses
4 96
TABLE 4.1: Performance results of the random fighter after 100 rounds of du-
elling a Level 1 Warden.
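The random fighter can be sketched as follows; the actual gamepad emission is stubbed out, since it depends on a virtual-controller library, and the button count is an assumption:

```python
import random
import time

NUM_BUTTONS = 14  # one slot per gamepad action class (illustrative)

def random_inputs(rng=random):
    """One random multi-label button array, as the naive bot emits every 0.1 s."""
    return [rng.randint(0, 1) for _ in range(NUM_BUTTONS)]

def fight_loop(send_to_gamepad, rounds=10, delay=0.1):
    """Emit a fresh random input array every `delay` seconds.
    send_to_gamepad would wrap a virtual-controller library; stubbed here."""
    for _ in range(rounds):
        send_to_gamepad(random_inputs())
        time.sleep(delay)
```

This baseline grounds the later comparisons: any model scoring near its 0.04 Win/Loss ratio is doing no better than noise.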
Feed Forward
The feed forward network was the first to be tested. It exhibited some intelligent behaviour,
such as blocking attacks and occasionally parrying, but failed to play consistently enough
to win often. Results are displayed in Table 4.2. It achieves a Win/Loss ratio of 0.15, making
it decisively more effective than the random fighter.
Feed Forward
Wins Losses F Score
15 85 0.4010
TABLE 4.2: Performance results of the feed forward network after 100 rounds
of duelling a Level 1 Warden.
Recurrent
The initial recurrent network was tested similarly. Results are displayed in Table 4.3. It
achieved a Win/Loss ratio of 0.23. The network evidently performed significantly better
than the feed-forward network.
Initial Recurrent
Wins Losses F Score
23 77 0.4827
TABLE 4.3: Performance results of the initial LSTM network after 100 rounds
of duelling a Level 1 Warden.
Subsequent, smaller-scale tests were performed while attempting to improve the recurrent
net; the smaller sample sizes were simply due to the project's time constraint. They do, at
the very least, suggest a general tendency towards improved effectiveness, and can be seen
in Tables 4.4, 4.5, 4.6, and 4.7. As discussed in the Implementation (3.3.3 - Attempts to
improve the Recurrent model), the new dataset saw a severe degradation in performance,
and several methods were attempted in order to fix it.
Pre-Improvement
Wins Losses
2 18
TABLE 4.4: Initial results of 20 rounds of duelling a Level 1 Warden before
improvements were attempted. Note that no F score was recorded for this
model as it was discarded before the switch to this metric. Win/Loss ratio: 0.1
TABLE 4.5: Results of 20 rounds of duelling a Level 1 Warden after the net-
work was trained without the previous frames' gamepad outputs as inputs.
Note that no F score was recorded for this model as it was discarded before
the switch to this metric. Win/Loss ratio: 0.2
Mixed-Input Model
Wins Losses F Score
4 16 0.3247
TABLE 4.6: Results of 20 rounds of duelling a Level 1 Warden after the net-
work was changed to accept multiple inputs. Win/Loss ratio: 0.2
Multi-Input Model
Wins Losses F Score
6 14 0.6617
TABLE 4.7: Results of 20 rounds of duelling a Level 1 Warden after the data
being passed into the LSTM layer was fixed by converting it into a series of
sequences. Win/Loss ratio: 0.3
It should be clarified that 20 rounds of duelling is not at all a large enough sample size to
make any conclusive observations about the efficacy of each model. The testing data above
is primarily used as a means to show that none of the model modifications displayed any
significant improvement. A larger sample size would be desirable, but each round can take
from 30-180 seconds to complete, and due to the time frame of the project, obtaining a larger
sample to test each and every attempted improvement was simply infeasible.
Finalised Model
Once the final model had been determined, and the only remaining major improvements
were to qualitatively modify the misclassification weightings and provide more data, the
model was tested rigorously against both the Level 1 and Level 2 difficulties of bot - a test
not previously performed on other models, as initial testing demonstrated that they could
almost never defeat a level 2 opponent. The final model achieved an F Score of 0.8091, a
significant improvement over every other model, and this is evident in the test results:
Multi-Input Model
Wins Losses
74 26
TABLE 4.8: Performance after 100 rounds of duelling a Level 1 Warden. A
remarkable improvement over other models in a relatively small step: switch-
ing the Conv1D branch that processed the previous gamepad inputs to using
LSTM. Win/Loss ratio: 0.74
Multi-Input Model
Wins Losses
23 77
TABLE 4.9: Performance after 100 rounds of duelling a level 2 Warden using
the same finalised model as above. No other previous model was able to
intelligently defeat a Level 2 opponent. Win/Loss ratio: 0.23
In both the training process and live testing, the finalised model performs significantly
better than every other model, and can be considered a genuine success. When compared
to an example novice player, who had volunteered as a test subject, it can be seen that their
performances in terms of win ratio are very comparable, as shown in Table 4.10.
Novice Human Player
Wins Losses
76 24
TABLE 4.10: Performance after 100 rounds of duelling a Level 1 Warden,
fought by a novice-level human player. The performance of this player and
the bot are comparable. Win/Loss ratio: 0.76
Though no formal data was recorded, the bot did fight humans numerous times. Whilst it
is evidently not good enough to defeat experienced humans consistently, it is challenging
for humans to defeat, and even managed to beat both human test subjects on more than
one occasion. Table 4.11 below displays all of the prior testing results, collated for
comparison.
TABLE 4.11: Performance results of all models tested. The degraded model
Pre-improvement and the model with the Prediction History Removed did not
have F Scores recorded. The improvement in Win/Loss ratio is observable,
and by the end is comparable to the results of a real novice human. All duels
were versus a Level 1 Warden.
Tensorboard
As discussed in (3.3.3 - Model Optimisation Utilising Tensorboard), once a basic model
structure utilising a temporal convolutional layer and a 2D convolutional layer had been
finalised, several values for different parameters were tested. (This was done before adding
the LSTM branch; there was too little time to re-test after the addition of the third branch.)
The parameters included the number of neurons per layer and the number of layers in
each branch. Every permutation of the model within a small range of these parameters
was trained, and the models were compared utilising Tensorboard, a visualisation tool for
Tensorflow, used within Keras as a callback.
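The parameter sweep can be sketched as follows. The parameter ranges and run naming are illustrative, and `build_model` is a hypothetical stand-in for the project's model constructor:

```python
# Sketch of the Tensorboard-driven parameter sweep. Ranges and naming are
# illustrative; build_model is a hypothetical project-specific constructor.
import itertools
import tensorflow as tf

layer_counts = [1, 2, 3]
neuron_counts = [32, 64, 128]

configs = list(itertools.product(layer_counts, neuron_counts))

for n_layers, n_neurons in configs:
    # Each run gets a distinct log directory so Tensorboard plots them side by side
    name = f"{n_layers}-layers-{n_neurons}-neurons"
    tb = tf.keras.callbacks.TensorBoard(log_dir=f"logs/{name}")
    # model = build_model(n_layers, n_neurons)    # hypothetical constructor
    # model.fit(X, y, epochs=10, callbacks=[tb])  # each run appears in Tensorboard
```

Launching `tensorboard --logdir logs` then overlays the training curves of every permutation, which is how graphs like Figure 4.1 are produced.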
Figure 4.1 depicts the performance of 15 of the 98 models tested. Many models experienced
extremely similar training cycles, and so have been omitted for the ease of readability of the
graph. It is visible that models with more neurons per layer outperform those with fewer;
128 neurons per layer was the highest number that memory limitations allowed. Figure 4.2
displays the generated graph of the network structure in far greater
detail than in Figure 3.7.
F IGURE 4.1: Results of 15 models trained with the same network structure,
but with different values for the number of layers for each branch, and the
number of neurons per layer. The name of the model details the values of
these parameters.
4.1.3 Validation
This subsection will review and reflect upon the specifications set out initially (1.2 - Baseline
Target Specification For The Project), and determine how much of what was set out to be
achieved was fully realised. It will also evaluate whether the specifications were themselves
appropriate and realistically feasible. Each stated specification is listed and explored.
Specification Review
The bot must be able to competently make fighting decisions against an enemy.
Whilst this is a fairly nebulous requirement, it is nonetheless of key importance, and is
better observed qualitatively than quantitatively. The bot consistently makes important
decisions, such as switching its guard direction to match an opponent's in order to block
incoming attacks, parrying attacks, following successful guard breaks and parries with
optimal counter-attacks, and more. Counter-attacks are performed less consistently than
standard defensive moves, but these successful actions require a window of decision of less than half
a second. It can be confidently stated that the bot does competently and consistently make
fighting decisions against an opponent. It can be argued however that it does not do so
consistently enough to be adequate, but this is explored later in the subsection.
Ideally it should fight and win matches with a regularity of over 50% against
Level 2 difficulty bots and intermediately-skilled players.
This can be considered a failed specification. As evidenced by the results listed in the pre-
vious subsection, in Table 4.9, it can be observed that the win percentage is only 23%. The
bot simply does not make intelligent decisions consistently enough to win against Level 2
difficulty bots in most test cases. It should be noted, however, that win ratio is a particularly
harsh metric: For Honor is a game where a tiny number of mistakes, made in the span of
milliseconds, can cost an entire match. Whilst a player does not need to play perfectly,
they also cannot afford more than a few mistakes, so even a 23% win percentage can be
considered an achievement. Upon reflection, this specification could
be considered poor in itself, and overly harsh given the time frame. It was born out of a
lack of knowledge on the part of the author: the sheer complexity of multi-label, multi-class
classification tasks such as this, especially when dealing with class imbalance, was not
known at the time.
It must be able to play without any manual assistance from the time when ini-
tially locking on until the round is over.
This was successful. Once the program is running and the model has loaded, the AI captures
several frames to use as an initial frame of reference. It can then be activated and deactivated
with the push of a button. Nothing else is required of the user.
The bot must be able to play in real-time using information entirely gleaned from
the screen. It should not require any hooks to be connected to the opponent’s
client, and thus should be able to fight any opponent, including in-game bots.
This was also successful. Explained in detail in the Implementation (3.3.2 - Input), the bot
gains all of its information via complex computer vision techniques utilised in a state detec-
tor. It gains the rest of its input data from the frame itself and its own previous predictions.
for all five questions can be seen in the Appendix (A - Turing Test Responses) along with the
answers. The video can be seen at the following link: https://youtu.be/Y2cir2THk3E
Each respondent was able to obtain a maximum score of 5 if they managed to discern the
human in all five scenarios. As it was, of the 22 people that responded to the survey, none
were able to obtain the maximum score, nor even a score of 4/5. Scores were fairly evenly
distributed though tending towards the lower end, with 5 scoring 3/5, 5 scoring 2/5, 6
scoring 1/5, and 6 scoring 0/5. The mean score was thus just 1.41/5, barely above the
expected score of 1.25 (0.25 × 5) that would be obtained by choosing answers at random.
Given the generally poor performance from respondents, it can be concluded that the AI
observably performs similarly to or better than a novice human. Full results can be seen in
Figure 4.3.
F IGURE 4.3: Results from the "Turing Test". No respondents scored a maxi-
mum score of 5, nor even 4. This is a promising result, as it implies that it is
extremely difficult to tell the AI apart from a novice human.
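The survey arithmetic can be checked directly from the score counts given above (the random-guess baseline assumes a one-in-four chance per scenario, as the 0.25 × 5 figure implies):

```python
# Distribution of "Turing Test" scores across the survey responses.
score_counts = {3: 5, 2: 5, 1: 6, 0: 6}

respondents = sum(score_counts.values())
mean = sum(score * n for score, n in score_counts.items()) / respondents
expected_random = 0.25 * 5  # 1-in-4 guess per scenario, five scenarios

print(respondents)      # 22
print(round(mean, 2))   # 1.41
print(expected_random)  # 1.25
```

The mean sitting so close to the random-guess baseline is what supports the conclusion that respondents could not reliably distinguish the AI from the human.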
It would be useful if it was able to learn to play more than one kind of character,
as each has a different moveset and different styles with which to win fights. (it
is unlikely it will be completely achievable for every character due to the time
limit of the project and the quantity of data required to be captured).
This final extension specification can be considered not met. The AI can only play as the
Warden Hero, and has only been trained to fight against Wardens. Theoretically, however,
the network should be able to learn to play as any Hero and against any Hero, provided
there is a dataset tailored specifically to that combination. The Warden mirror matchup
was chosen for the Hero's comparative simplicity, and because supervised data capture
required a high skill level from the author in order to provide useful data; more complicated
matchups were more likely to lead to erroneous data capture and subsequently worse
results.
Chapter 5
5.1 Evaluation
This section reflects upon the entirety of the project and provides a more general analysis
than Chapter 4 of how successful the end result was, and how many of the original
intentions were realised.
One remaining shortcoming concerns the stamina system. Though the bot does not exhaust
itself often, the network has been given no real frame of reference for its stamina level and
thus cannot rectify that. Extracting the health and stamina levels was deemed too difficult
a task given the time allowed; however, a potential solution has been devised that could
solve this problem, discussed further in 5.3.
A more significant failing, but another that can be fixed with data, is that Neural Knight
has thus far only been made functional for one specific matchup, i.e. Warden vs Warden.
The network does not and cannot generalise to other character matchups, and the dataset
provided does not support that. This could theoretically be fixed by recording a new
dataset for each new matchup; however, each trained model would then be designed to
perform in only one kind of matchup. In order to be a generalised system, there would
need to be a model and dataset for every pairing of Heroes (24 × 24 of them), which is an
infeasibly large number.
The user interface also moves around constantly, which complicates this extraction.
However, one method of obtaining the health is to use more precise pixel-by-pixel analysis.
The health bars only move within a certain range of the screen. If these two sub-windows
were captured, pixels could be analysed along each row, monitoring the colour from left to
right. Reaching pure white would signify the beginning of the health bar; when a black
pixel was then reached, the distance travelled would constitute the current health. The
same could be done with stamina. Figure 5.1 further explains this concept.
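The proposed scan can be sketched with NumPy on a synthetic greyscale pixel row; the colour thresholds are illustrative, and a real implementation would first crop the health-bar sub-window with OpenCV:

```python
import numpy as np

def measure_bar(row, white_thresh=250, black_thresh=10):
    """Walk one pixel row left to right: a near-white pixel marks the start of
    the health bar, and the first near-black pixel after it marks the end.
    Returns the bar length in pixels, or 0 if no bar is found."""
    start = None
    for x, value in enumerate(row):
        if start is None and value >= white_thresh:
            start = x
        elif start is not None and value <= black_thresh:
            return x - start
    return (len(row) - start) if start is not None else 0

# Synthetic greyscale row: 5 px of background, a 30 px white bar, then black
row = np.array([100] * 5 + [255] * 30 + [0] * 10)
print(measure_bar(row))  # 30
```

Dividing the measured length by the bar's full length would yield the health fraction, and the same scan applied to the stamina bar's sub-window would yield stamina.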
In addition, once Neural Knight reaches a point where no further meaningful improvement
in duel performance is possible, For Honor also has a "Brawl" mode, which works in exactly
the same way but is instead a duel between two teams of two. This would require taking a
second enemy into consideration, as well as co-operation with another player, and would
require either a far more robust API to access game information or a way to learn without
utilising precise information. Learning to self-play Brawl, as well as the other 4 vs 4 modes,
is of such larger scope that it would constitute a completely different project.
Appendix A
Correct answer: 3
46 Appendix A. Turing Test Responses
Correct answer: 4
Correct answer: 1
Correct answer: 1
Correct answer: 2
Appendix B
List of Tables
3.1 A table of images that demonstrates the appearance of the user interface dur-
ing certain game actions, and shows what features were desirable to extract. . 10
3.2 HSV Colour ranges for each type of feature that is captured by detecting the
presence of said colours. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 The number of blocked and landed opponent attacks before correcting the
region of interest. Note that this was a "sanity test" to ensure the network was
making intelligent decisions. Results of final performance tests are detailed
in Chapter 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 The number of blocked and landed opponent attacks after correcting the re-
gion of interest. Results of final performance tests are detailed in Chapter
4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Example dataset displaying the extreme class imbalance. Whilst classes 11
and 12 can be considered too low in sample size to learn from, they are not
very important to overall gameplay performance, and were included for the
sake of completeness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1 Performance results of the random fighter after 100 rounds of duelling a Level
1 Warden. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Performance results of the feed forward network after 100 rounds of duelling
a Level 1 Warden. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 Performance results of the initial LSTM network after 100 rounds of duelling
a Level 1 Warden. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4 Initial results of 20 rounds of duelling a Level 1 Warden before improvements
were attempted. Note that no F score was recorded for this model as it was
discarded before the switch to this metric. Win/Loss ratio: 0.1 . . . . . . . . . 33
4.5 Results of 20 rounds of duelling a Level 1 Warden after the network was
trained without the previous frames’ gamepad outputs as inputs. Note that
no F score was recorded for this model as it was discarded before the switch
to this metric. Win/Loss ratio: 0.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.6 Results of 20 rounds of duelling a Level 1 Warden after the network was
changed to accept multiple inputs. Win/Loss ratio: 0.2 . . . . . . . . . . . . . 33
4.7 Results of 20 rounds of duelling a Level 1 Warden after the data being passed
into the LSTM layer was fixed by converting it into a series of sequences.
Win/Loss ratio: 0.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.8 Performance after 100 rounds of duelling a level 1 Warden. A remarkable
improvement over previous models from a relatively small change: switching the
Conv1D branch that processed the previous gamepad inputs to an LSTM. Win/Loss
ratio: 0.74 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.9 Performance after 100 rounds of duelling a level 2 Warden using the same
finalised model as above. No other previous model was able to intelligently
defeat a Level 2 opponent. Win/Loss ratio: 0.23 . . . . . . . . . . . . . . . . . 34
4.10 Performance after 100 rounds of duelling a level 1 Warden, fought by a novice-
level human player. The performances of this player and the bot are compara-
ble. Win/Loss ratio: 0.76 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.11 Performance results of all models tested. The degraded models (pre-improvement
and with the prediction history removed) did not have F scores recorded. The
improvement in Win/Loss ratio is observable, and by the end is comparable
to the results of a real novice human. All duels were versus a Level
1 Warden. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
List of Figures
4.1 Results of 15 models trained with the same network structure, but with dif-
ferent values for the number of layers for each branch, and the number of
neurons per layer. The name of the model details the values of these parame-
ters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2 Tensorboard visualisation of the final neural network structure. . . . . . . . . 37
4.3 Results from the "Turing Test". No respondent scored the maximum of 5,
nor even 4. This is a promising result, as it implies that it is extremely difficult
to tell the AI apart from a novice human. . . . . . . . . . . . . . . . . . . . . .
5.1 Concept for a potential reward function. In order to determine the health,
capture the general window where the health bar resides. Travel from the
left till pure white pixels are met. Begin counting. When more than a certain
number of black pixels are reached (so as not to consider each small black
division bar), the distance travelled would be health. Taking the difference
would be able to provide a workable reward function. The same algorithm
could be applied to stamina for observation purposes. . . . . . . . . . . . . . . 43
Appendix C
Project Proposal
Below is the original project proposal in its entirety, followed by the list of intended aims
and objectives, and a rudimentary bibliography of the background research conducted.
C.1 Proposal
Developing and training neural networks (and other learning machines) to play single-player
computer games is a popular project in the field, as training is made easy. Neural networks
require many thousands of data points, and also assume that the behaviour being replicated
can be encoded (for example, for a feed-forward neural network with a single layer of
hidden neurons, the behaviour must be representable as a continuous bounded function).
Virtually every aspect of a computer game is already encoded and accessible, so data points
can very easily be generated to provide a large enough training set.
However, most neural networks of this nature are built for single-player computer games,
where all other agents are programmed. The concept of doing the same with online multi-
player games is still fairly new, and multiple agents cooperating to make decisions even
more so.
My project will be the development of a neural network that learns to play an online
game and, time permitting, the development of further neural networks that all learn to
work in a team. The game to be used is as yet undecided, though it will likely be either
Ubisoft's "For Honor" (2017) or Valve's "Dota 2" (2011).
There are advantages and disadvantages to both games. For Honor, a fantasy 3D melee com-
bat game, utilises a combat engine dubbed the "Art of Battle" system. It is fundamentally
quite simple, and as a result would be easy to encode. There are also specific modes for
1v1 duelling, 2v2 and 4v4 fights, which would make scaling the project to multiple agents
very convenient. However, there is no API or other way of easily accessing real-time
information about a fight. As a result, some kind of visual encoding would need to be done,
and computer vision would add an additional layer of complexity to the project.
Dota 2, however, would be significantly more difficult to model. The game's mechanics are
extremely complicated, and so forming an intelligent AI would be equally difficult, espe-
cially with multiple agents working cooperatively, since different agents would have very
different roles. In addition, 1v1 has already been achieved by OpenAI, who have since begun
work on a 5v5 project that is making progress, so using Dota would not be wholly unique.
It would be much easier to encode, however, as real-time information is accessible outside
of the game for anyone to use. This, along with my existing knowledge and experience with
the game, are the two main advantages of using Dota.
The only hardware needed is a computer powerful enough to run both the network and the
game, as well as a second computer from which to fight the AI, providing it with training
data and opposition for testing. In terms of software, an additional copy of the game is
needed for the AI to play on. The tools and systems I will be using are Tensorflow and
OpenAI's Gym.
C.2 Aims
• Create an AI that can play a game, most likely For Honor
• Recognise the screen and convert it into inputs that can be fed to a neural network
• Develop a neural network that learns to produce game actions that can win fights
C.4 Objectives
• Be able to control the game through code
• Regularly capture the screen of the game
• Process each frame into a visualiser that can recognise guard direction
• Process guard direction into normalised data to be fed as inputs into a neural network
• Develop a recurrent neural network that can output the correct response to the en-
emy’s actions
• Develop the AI to be proficient enough to beat Level 1 Bots.
• Develop the AI to be proficient enough to beat Level 2 Bots.
• Develop the AI to be proficient enough to beat Level 3 Bots.
• Develop the AI to be proficient enough to beat competent humans.
C.5 References
G. Cybenko, 1989, “Approximation by Superpositions of a Sigmoidal Function”
John Laird, Michael van Lent, 2001, “Human-Level AI’s Killer Application - Interactive
Computer Games”
Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio
Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman,
Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis,
Appendix D
D.1 Introduction
Computer game bots are pieces of software that are able to play computer games au-
tonomously. Researchers have applied many different techniques to the design of bots, and
one such modern technique is the use of neural networks (and other learning machines),
developed and trained to play single-player computer games, where training is made easy.
Neural networks require many thousands of data points, and also assume that the behaviour
being replicated can be encoded (for example, for a feed-forward neural network with a
single layer of hidden neurons, the behaviour must be representable as a continuous bounded
function). Luckily, virtually every aspect of a computer game is already encoded and
accessible, and data points can very easily be generated to provide a large enough training set.
However, most neural networks of this nature are built for single-player computer games,
where all other agents are programmed. Neuro-evolutionary computing is a popular method
in this instance, where the goal is unchanging but the solution is unknown. It is a kind of
reinforcement learning in which a generation of random neural networks is generated, then
tested and rated against some reward function. Poor models are killed off, whilst fitter
models are "bred" to make a new generation. This has been done for Super Mario World as
well as for a host of games for the Atari 2600.
The concept of using machine learning techniques with online multiplayer games, however,
is still fairly new. In addition, this kind of technique is not feasible for an online fighting
game, where there is no single hard solution to winning, as the human opponent can be
considered completely unpredictable.
This project aims to develop an AI that learns to play the game For Honor (2017), developed
by Ubisoft. It can be considered a 3D melee duelling game, with additional modes that
support 2v2 and 4v4 gameplay. Since there is no API or hook available with which live
match information may be accessed, a significant portion of this project must be dedicated
to extracting features directly from the screen using computer vision. This must be done
efficiently enough that frames can be processed at a minimum of 10 frames per second,
since 100 ms is the time taken for the quickest possible action in the game (guard
switching).
D.3 Methods
With regards to the actual learning, I decided early on that I would utilise a neural network,
as has been done in many observable examples previously. Specifically, I would make use of
a particular model of neural network called a recurrent neural network. As opposed to the
traditional feed-forward neural network, a recurrent neural network not only passes inputs
from one layer to the next, but also retains information and feeds it back within the network.
This allows information to be retained between frames, creating the possibility for patterns
to be recognised within gameplay, because it allows time to be represented.
Especially promising is the model known as the "long short-term memory network" (LSTM), a
form of recurrent network that has a very large memory capacity compared to a normal RNN. An
RNN with external memory was used to train multiple agents to play Quake III Arena, but
using an LSTM solves the same problem as the utilisation of external memory - the limitation
of memorisation due to the vanishing and exploding gradient problem, which arises because
"the temporal evolution of the backpropagated error exponentially depends on the size of the
weights", back-propagation being a common and efficient method of training networks. An
LSTM was also used to train MariFlow, a self-learning Super Mario Kart AI. However, my
knowledge regarding recurrent neural networks is not yet sufficient, and more research
will be conducted over Reading Week.
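The feedback idea can be sketched in a few lines of NumPy: a plain recurrent step mixes the current frame's features with a hidden state carried over from the previous frame. The sizes and weights below are purely illustrative, not those of the network actually trained; an LSTM replaces this single tanh update with gated updates that mitigate the vanishing-gradient problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 10, 8  # illustrative sizes only
W_x = rng.normal(size=(n_hidden, n_features)) * 0.1  # input weights
W_h = rng.normal(size=(n_hidden, n_hidden)) * 0.1    # recurrent (feedback) weights

def rnn_step(x, h):
    """One recurrent update: the new hidden state depends on both the
    current frame's features x and the previous hidden state h."""
    return np.tanh(W_x @ x + W_h @ h)

h = np.zeros(n_hidden)  # no memory before the first frame
for frame_features in rng.normal(size=(5, n_features)):  # 5 fake frames
    h = rnn_step(frame_features, h)  # h now summarises the sequence so far
```

Because h is fed back each step, the same input produces different outputs depending on what came before - this is what lets a recurrent branch recognise patterns across frames.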
The first segment of my project, however, is the matter of feature preprocessing - obtaining
live information regarding the state of the game from frame to frame. Such information is
not accessible using an API, so the only way to approach this problem is using computer
vision solutions. I have primarily been using the OpenCV documentation to learn, and
initially followed the computer vision section of a course by Sentdex [8].
The initial task of detecting guard direction was done by binary thresholding: filtering the
image to keep only pixels above a certain brightness value, converting the rest to black. This
could be done with a single line of code. After some experimentation with the parameters, I
was able to be left quite cleanly with only the guard UI (and some miscellaneous noise, as
expected). Whilst edge detection, such as the Canny Edge Detection algorithm, is conven-
tionally applied before any kind of line-finding, the UI is designed in such a way that it was
worth attempting line detection immediately, as it is mostly made of lines.
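In OpenCV the thresholding step is indeed a single call (cv2.threshold with THRESH_BINARY); its effect can be sketched in plain NumPy as follows. The 125/255 values mirror the parameters used in the Appendix E listings.

```python
import numpy as np

def binary_threshold(gray, thresh=125, maxval=255):
    """Pixels strictly brighter than `thresh` become `maxval`; the rest become black,
    matching the behaviour of cv2.threshold(..., cv2.THRESH_BINARY)."""
    out = np.zeros_like(gray)
    out[gray > thresh] = maxval
    return out

# A tiny 2x2 "frame": only the bright UI pixels survive the filter.
frame = np.array([[10, 200], [126, 125]], dtype=np.uint8)
result = binary_threshold(frame)  # [[0, 255], [255, 0]]
```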
I used a form of the Hough Lines Transform known as the Probabilistic Hough Lines Trans-
form, provided by OpenCV. This is a more efficient form of the algorithm that also conve-
niently returns each line as a pair of Cartesian coordinates. Again, this could be done with
only a single line of code. Since all six guard lines have unique gradients, I decided that
direction could be determined from the gradient, which required writing my own algorithm
for retrieving a list of the gradients. After experimenting with the parameters, the guard
detection was tested and successfully labelled the correct guard most of the time. A more
accurate test will be done of the whole state detector (see Project Plan).
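Since the Probabilistic Hough Lines Transform returns each line as endpoint coordinates (x1, y1, x2, y2), the gradient check can be sketched as below. The cut-off values mirror the find_guard listing in Appendix E (0 = right, 1 = top, 2 = left); vertical segments are skipped to avoid division by zero. The function name classify_guard is illustrative, not the exact code used.

```python
RIGHT, UP, LEFT = 0, 1, 2  # direction labels, as in the Appendix E listings

def line_gradients(lines):
    """Compute gradients for (x1, y1, x2, y2) segments, skipping vertical ones."""
    gradients = []
    for x1, y1, x2, y2 in lines:
        if x2 - x1 != 0:  # a vertical line has no finite gradient
            gradients.append((y2 - y1) / (x2 - x1))
    return gradients

def classify_guard(gradients):
    """Classify guard direction from the average segment gradient."""
    avg = sum(gradients) / len(gradients)
    if avg < -0.5:
        return RIGHT
    elif avg < 1:
        return UP
    return LEFT

# Two shallow upward-sloping segments average out to a "top" guard.
guard = classify_guard(line_gradients([(0, 0, 10, 2), (0, 0, 10, 4)]))
```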
The other kinds of features were detected using a separate system. By using HSV values,
shades of a certain colour could also be thresholded. Using this, verifying the presence of
a colour in enough pixels could be used to detect the state of a feature. This is all stored in an
image separate from the guard-direction thresholded image. In the case of normal weapon
attacks, where direction is also required, the attack indicator is converted to white in the
guard direction image before processing, so as to pass the threshold. Unblockable attacks
and guardbreaks are captured in much the same way, each looking for the presence of
certain colours within the frame. A "bash" attack is detected by deduction: if an unblockable
attack is taking place, yet the enemy has no guard up, then it is considered a bash.
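The colour-presence check reduces to counting pixels whose HSV values fall inside a range, which cv2.inRange does in one call. Below is a NumPy-only sketch; the HSV bounds and pixel threshold here are placeholders, not the exact values used for any particular indicator.

```python
import numpy as np

def colour_present(hsv_frame, lower, upper, min_pixels):
    """Return True if more than `min_pixels` pixels fall inside the HSV range."""
    lower = np.array(lower)
    upper = np.array(upper)
    # Per-pixel test: all three channels inside [lower, upper] (as cv2.inRange does).
    mask = np.all((hsv_frame >= lower) & (hsv_frame <= upper), axis=-1)
    return bool(mask.sum() > min_pixels)

# A tiny 2x2 "frame" in HSV: two pixels match the (hypothetical) indicator range.
frame = np.array([[[0, 240, 100], [0, 250, 120]],
                  [[90, 50, 50], [180, 10, 10]]], dtype=np.uint8)
attacking = colour_present(frame, [0, 234, 74], [1, 255, 160], min_pixels=1)
```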
The enemy's health should be captured for the network to potentially use as some kind of
heuristic for progress. This is done by processing a separate copy of the frame and line-
detecting the health bar. Since the health bar stays in the same place, and has a length of
approximately 60 pixels when full (at 1024x768), dividing the current length by 60 gives
the program the percentage of health remaining.
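Assuming a full bar of roughly 60 pixels, the health estimate is a one-line ratio; here is a minimal sketch, with the line-detection step simplified to taking the detected line's x endpoints directly.

```python
FULL_BAR_PIXELS = 60  # approximate full health bar length at 1024x768

def health_fraction(x_start, x_end, full_length=FULL_BAR_PIXELS):
    """Estimate remaining health as detected bar length over full bar length."""
    return (x_end - x_start) / full_length

remaining = health_fraction(100, 145)  # a 45-pixel bar -> 0.75 (75% health)
```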
In addition, I finally generate a low-resolution, greyscaled version of the frame to pass in as
contextual information.
With feature processing virtually complete aside from tweaking and experimenting with pa-
rameters, the next step is to compose a formal supervised test to assess the accuracy and
precision of this classifier.
Appendix E
Source Code
def roi(image, vertices):
    mask = np.zeros_like(image)
    cv2.fillPoly(mask, vertices, 255)
    masked = cv2.bitwise_and(image, mask)
    return masked

def process_image(originalImage):
    processedImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
    # processedImage = cv2.GaussianBlur(processedImage, (1, 1), 0)
    vertices = np.array([[469, 179], [567, 179], [567, 278], [469, 278]])
    processedImage = roi(processedImage, [vertices])
    _, processedImage = cv2.threshold(processedImage, 125, 255, cv2.THRESH_BINARY)
    # processedImage = cv2.Canny(processedImage, threshold1=200, threshold2=300)
def find_guard(gradients):
    avgM = int(mean(gradients))
    if avgM < -0.5:
        return RIGHT
    elif avgM > -0.5 and avgM < 1:
        return UP
    else:
        return LEFT

for i in list(range(4))[::-1]:
    print(i + 1)
    time.sleep(1)

def main():
    lastTime = time.time()
    while True:
        screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))
        # grab_screen(region=(0, 40, 800, 640))
        newScreen, originalImage, gradients = process_image(screen)
        lastTime = time.time()
        # cv2.imshow("window", newScreen)
        if gradients:
            guardDirection = find_guard(gradients)
            switch_guard(guardDirection)
            print(guardDirection)
            cv2.destroyAllWindows()
            break
        print("Loop took {} seconds".format(time.time() - lastTime))

main()
def lock_on():
    PressKey(CTRL)

def lock_off():
    ReleaseKey(CTRL)

def switch_guard(dir):
    tap_key(dir)

def warden_heavy_attack():
    tap_key(K)

def warden_light_attack():
    tap_key(J)

def parry_then_counter():
    warden_heavy_attack()
    time.sleep(0.1)
    switch_guard(LEFT)
    time.sleep(0.5)
    warden_light_attack()
    time.sleep(0.05)
    warden_light_attack()

def execute(option):
    tap_key(option)

"""
time.sleep(4)
LockOn()
PressKey(W)
time.sleep(1)
for i in range(2):
    SwitchGuard(LEFT)
    WardenHeavyAttack()
    time.sleep(0.8)
    SwitchGuard(UP)
    WardenHeavyAttack()
time.sleep(1.6)
Execute(Q)
time.sleep(2)
ReleaseKey(W)
LockOff()
"""
# from grabscreen import grab_screen
# for data collection
import keyboard
import pickle
import os

capturing = False

def toggle_capture(key):
    global capturing
    if capturing:
        print("Ceasing capture.")
        capturing = False
    else:
        print("Beginning capture...")
        capturing = True

def read_data(state, gamepad):
    if gamepad[4] == 1:  # left
        print("The player set guard to left!")
    elif gamepad[5] == 1:  # top
        print("The player set guard to top!")
    elif gamepad[6] == 1:
        print("The player set guard to right!")
    if gamepad[9] == 1:
        print("The player heavy attacked!")
    print("---------")
# RUNTIME
def main():
    global capturing
    keyboard.on_release_key("v", toggle_capture)
    for i in list(range(3))[::-1]:
        print(i + 1)
        time.sleep(1)
    print("Neural Knight Data Capture is now active...")
    i = 0
    if os.path.exists("in/stateData0.pickle"):
        foundFile = False
        while not foundFile:
            if not os.path.exists("in/stateData" + str(i) + ".pickle"):
                foundFile = True
            else:
                i += 1
    lastTime = time.time()
    switch_guard(UP)
    currentGuardDirection = UP
    gd = Controller()
    gamepadInput = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    # prevGamepadInput = []
    prevGamepadInput = gamepadInput
    """
    for i in range(prevFrames):
        prevGamepadInput.append(gamepadInput)
    """
    while True:
        if capturing:
            screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))
            # grab_screen(region=(0, 40, 800, 640))
            # prevGamepadInput = prevGamepadInput[1:]
            # prevGamepadInput.append(gamepadInput)
            prevGamepadInput = gamepadInput
            pcg = currentGuardDirection
            currentGuardDirection = checkCurrentGuardInput(prevGamepadInput, currentGuardDirection)
            state = generate_feature_list(screen, currentGuardDirection)
            # gamepadInputHistory = [y for x in prevGamepadInput for y in x]
            # print("History:", gamepadInputHistory)
            state.append(prevGamepadInput)
            pickle.dump(state, f)
            gamepadInput = gd.get_true_game_input()
            pickle.dump(gamepadInput, g)
            # read_data(state[0], gamepadInput)
        if cv2.waitKey(25) & 0xFF == ord("q"):  # keyboard.is_pressed("q"):
            cv2.destroyAllWindows()
            break
    f.close()
    g.close()

main()
def listen(c):
    while True:
        events = get_gamepad()
        for event in events:
            if event.ev_type == "Key" or event.code == "ABS_RZ":
                c.buttons_pressed.add(event.code)
            elif event.ev_type == "Absolute":
                if event.code == "ABS_RX":
                    c.rx = int(event.state) / 32768
                if event.code == "ABS_RY":
                    c.ry = int(event.state) / 32768
                if event.code == "ABS_X":
                    c.x = int(event.state) / 32768
                if event.code == "ABS_Y":
                    c.y = int(event.state) / 32768

class Controller:
    def __init__(self):
        self.rx = 0
        self.ry = 0
        self.x = 0
        self.y = 0
        self.buttons_pressed = set()  # initialised before the listener thread starts using it
        self.pressed_since_last_poll = threading.Thread(target=listen, args=(self,))
        self.pressed_since_last_poll.setDaemon(True)
        self.pressed_since_last_poll.start()

    def get_button(self, button):
        pressed = button in self.buttons_pressed
        return pressed

    def get_left_stick(self):
        return (self.x, self.y)

    def get_right_stick(self):
        return (self.rx, self.ry)

    def get_true_game_input(self):
        inputArray = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        # W, A, S, D, left, up, right, GB, light, heavy, dodge, feint, taunt, idle
        if self.get_button("BTN_WEST"):
            # print("Guard break")
            inputArray[7] = 1
        if self.get_button("BTN_TR"):
            # print("Light Attack")
            inputArray[8] = 1
        if self.get_button("ABS_RZ"):
            # print("Heavy Attack")
            inputArray[9] = 1
        if self.get_button("BTN_SOUTH"):
            # print("Dodge")
            inputArray[10] = 1
        if self.get_button("BTN_EAST"):
            # print("Feint")
            inputArray[11] = 1
        if self.get_button("BTN_NORTH"):
            # print("Taunt")
            inputArray[12] = 1
        self.buttons_pressed.clear()
        movement = self.get_left_stick()
        if movement[1] > 0.7:
            inputArray[0] = 1
        elif movement[1] < -0.7:
            inputArray[2] = 1
        if movement[0] < -0.7:
            inputArray[1] = 1
        elif movement[0] > 0.7:
            inputArray[3] = 1
        guard = self.get_right_stick()
        if guard[0] < -0.7 and guard[1] < 0.7:
            inputArray[4] = 1
            # print("Guard left!")
        elif guard[0] > 0.7 and guard[1] < 0.7:
            inputArray[6] = 1
            # print("Guard right!")
        elif guard[1] > 0.7 and guard[0] > -0.3 and guard[0] < 0.3:
            inputArray[5] = 1
            # print("Guard top!")
        if 1 not in inputArray:
            inputArray[13] = 1
        print(inputArray)
        return inputArray
currentGuardDirection = None

def draw_lines(image, lines):
    try:
        for line in lines:
            coords = line[0]
            cv2.line(image, (coords[0], coords[1]),
                     (coords[2], coords[3]), [255, 0, 0], 1)
    except:
        pass

def roi(image, vertices):
    mask = np.zeros_like(image)
    cv2.fillPoly(mask, vertices, 255)
    masked = cv2.bitwise_and(image, mask)
    return masked

def roi3(image, vertices):  # roi for 3-channel image
    mask = np.zeros_like(image)
    cv2.fillPoly(mask, np.int32([vertices]), (255, 255, 255))
    masked = cv2.bitwise_and(image, mask)
    return masked
def filter_for_attack_indicator(image):
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    mask = cv2.inRange(hsv, np.array([0, 234, 74]), np.array([1, 255, 160]))
    whiteImage = np.zeros_like(image)
    whiteImage[:] = 255
    locs = np.where(mask != 0)
    # https://stackoverflow.com/questions/41572887/equivalent-of-copyto-in-python-opencv-bindings
    # Case #1 - Other image is grayscale and source image is colour
    if len(image.shape) == 3 and len(whiteImage.shape) != 3:
        image[locs[0], locs[1]] = whiteImage[locs[0], locs[1], None]
    # Case #2 - Both images are colour or grayscale
    elif (len(image.shape) == 3 and len(whiteImage.shape) == 3) or \
            (len(image.shape) == 1 and len(whiteImage.shape) == 1):
        image[locs[0], locs[1]] = whiteImage[locs[0], locs[1]]
    # Otherwise, we can't do this
    return mask, locs

def filter_for_unblockable(image):
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    mask = cv2.inRange(hsv, np.array([8, 250, 135]), np.array([11, 255, 164]))
    locs = np.where(mask != 0)
    return locs

def filter_for_guardbreak(image):
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    mask = cv2.inRange(hsv, np.array([0, 247, 68]), np.array([1, 255, 85]))
    locs = np.where(mask != 0)
    return locs
"""
we g e n e r a t e 2 d i f f e r e n t i m a g e s . One b l a c k and w h i t e i m a g e f o r
finding the
l i n e s f o r g u a r d d i r e c t i o n , and o n e f o r f i n d i n g t h e a t t a c k i n g s t a t e s
. ProcessedImage finds the l i n e s ( including attack indicators ,
converted to white ) .
outImage turns the a t t a c k i n d i c a t o r s back i n t o red f o r viewing
c o n v e n i e n c e , b u t m o s t l y u s e s l o c s t o f i n d i f an a t t a c k i s
h a p p e n i n g . I t a l s o u s e s t h i s same l o c s method
t o f i n d t h e s t a t e o f u n b l o c k a b l e s and g u a r d b r e a k i n g
As w e l l a s t h i s , i t c r e a t e s a s e p a r a t e i m a g e t o f i n d o u t t h e h e a l t h
o f t h e enemy and p l a y e r
"""
def process_image(originalImage):
    processedImage = originalImage.copy()
    vertices2 = np.array([[400, 130], [620, 130], [620, 400], [400, 400]])
    processedImage = roi3(processedImage, vertices2)
    gbLocs = filter_for_guardbreak(processedImage)
    uLocs = filter_for_unblockable(processedImage)  # unblockable presence locations
    mask, locs = filter_for_attack_indicator(processedImage)
    # Detect lines
    lines = cv2.HoughLinesP(processedImage, rho=1, theta=np.pi / 180,
                            threshold=12, minLineLength=13, maxLineGap=0)
    gradients = []
    try:
        # find gradients
        for line in lines:
            actualLine = line[0]
            if (actualLine[2] - actualLine[0]) != 0:
                gradient = (actualLine[3] - actualLine[1]) / (actualLine[2] - actualLine[0])
                if isfinite(gradient):
                    gradients.append(gradient)
        # print(gradients)
    except:
        pass
    # now do health stuff
    healthImage = originalImage.copy()
    healthImage = cv2.cvtColor(healthImage, cv2.COLOR_BGR2GRAY)
    vertices = np.array([[490, 30], [610, 30], [610, 120], [490, 120]])
    healthImage = roi(healthImage, [vertices])
    _, healthImage = cv2.threshold(healthImage, 130, 255, cv2.THRESH_BINARY)
    healthLines = cv2.HoughLinesP(healthImage, rho=1, theta=np.pi / 180,
                                  threshold=20, minLineLength=3, maxLineGap=10)
    draw_lines(healthImage, healthLines)
    healthLine = None  # This is the actual line representing the health
    healths = []
    health = None
    try:
        i = 0
        for line in healthLines:
            i += 1
            actualLine = line[0]
            yDiff = abs(actualLine[3] - actualLine[1])  # Noise filtering
            if yDiff == 0:
                healthLine = actualLine
                # Health is just the x difference of the line ends
                healths.append((healthLine[2] - healthLine[0]) / 60)
        health = mode(healths)
    except:
        pass
    # create low-res version of frame to pass in as context-sensitive information
    lowRes = originalImage.copy()
    lowRes = cv2.cvtColor(lowRes, cv2.COLOR_BGR2RGB)
    scalePercent = 20
    newWidth = int(lowRes.shape[1] * scalePercent / 100)
    newHeight = int(lowRes.shape[0] * scalePercent / 100)
    newDim = (newWidth, newHeight)
    lowRes = cv2.resize(lowRes, newDim, interpolation=cv2.INTER_AREA)
    cv2.imshow("img", lowRes)
    return outImage, originalImage, gradients, locs, uLocs, gbLocs, health, lowRes
# 0 = right, 1 = top, 2 = left
def find_guard(gradients):
    avgM = np.mean(gradients)
    if avgM < -0.5:
        # print("RIGHT")
        return RIGHT
    elif avgM > -0.5 and avgM < 0.9:
        # print("TOP")
        return UP
    else:
        # print("LEFT")
        return LEFT
def generate_feature_list(screen, currentGuardDirection):
    # print("---------------")
    newScreen, originalImage, gradients, locs, uLocs, gbLocs, health, lowRes = process_image(screen)
    enemyGuardDirection = None
    attacking = False
    unblockable = False
    if len(uLocs[0]) > 5:  # this number is > 0 just to filter out noise
        unblockable = True
    if len(locs[0]) > 300:
        attacking = True
    if len(gradients) > 0:
        pass
    if gradients and len(gradients) < 10:  # only change guard if there's not much in the way of noise
        enemyGuardDirection = find_guard(gradients)
    else:
        enemyGuardDirection = None
    # in order: player right guard, player up guard, player left guard,
    # enemy right guard, enemy up guard, enemy left guard, enemy attacking,
    # enemy unblockable, enemy bashing, enemy guard breaking
    features = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    if currentGuardDirection == RIGHT:
        features[0] = 1
    elif currentGuardDirection == UP:
        features[1] = 1
    elif currentGuardDirection == LEFT:
        features[2] = 1
    if enemyGuardDirection == RIGHT:
        features[3] = 1
        # print("Right")
    if enemyGuardDirection == UP:
        features[4] = 1
        # print("Top")
    if enemyGuardDirection == LEFT:
        features[5] = 1
        # print("Left")
    if attacking:
        features[6] = 1
        tempStr = "left"
        if enemyGuardDirection == UP:
            tempStr = "top"
        elif enemyGuardDirection == RIGHT:
            tempStr = "right"
        # print("The enemy is throwing out an attack from the", tempStr, "!")
    if unblockable:
        features[7] = 1
        # print("Enemy is throwing some kind of unblockable attack!")
    # if unblockable and there's no attack direction, it's a bash
    if features[7] == 1 and enemyGuardDirection == None:  # features[3] == 0 and features[4] == 0 and features[5] == 0:
        features[8] = 1
        # print("It's a bash!")
    if len(gbLocs[0]) > 300 and enemyGuardDirection == None:
        # print("Enemy is guard breaking!")
        features[9] = 1
    finalFeatureList = [features, lowRes]
    return finalFeatureList
def roi(image, vertices):
    mask = np.zeros_like(image)
    cv2.fillPoly(mask, vertices, 255)
    masked = cv2.bitwise_and(image, mask)
    return masked
def process_image(originalImage):
    processedImage = cv2.cvtColor(originalImage, cv2.COLOR_BGR2GRAY)
    # processedImage = cv2.GaussianBlur(processedImage, (1, 1), 0)
    vertices = np.array([[469, 179], [567, 179], [567, 278], [469, 278]])
    processedImage = roi(processedImage, [vertices])
    _, processedImage = cv2.threshold(processedImage, 125, 255, cv2.THRESH_BINARY)
    # processedImage = cv2.Canny(processedImage, threshold1=200, threshold2=300)
    lines = cv2.HoughLinesP(processedImage, rho=1, theta=np.pi / 180,
                            threshold=17, minLineLength=15, maxLineGap=5)
    gradients = []
    try:
        print(len(lines))
        print(lines)
        # Find gradients
        for line in lines:
            actualLine = line[0]
            gradient = (actualLine[3] - actualLine[1]) / (actualLine[2] - actualLine[0])
            if isfinite(gradient):
                gradients.append(gradient)
        print(gradients)
    except:
        pass
    draw_lines(processedImage, lines)
    return processedImage, originalImage, gradients
def find_guard(gradients):
    avgM = int(mean(gradients))
    if avgM < -0.5:
        return RIGHT
    elif avgM > -0.5 and avgM < 1:
        return UP
    else:
        return LEFT

for i in list(range(4))[::-1]:
    print(i + 1)
    time.sleep(1)
def main():
    lastTime = time.time()
    while True:
        screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))  # grab_screen(region=(0, 40, 800, 640))
        newScreen, originalImage, gradients = process_image(screen)
        lastTime = time.time()
        # cv2.imshow("window", newScreen)
        if gradients:
            guardDirection = find_guard(gradients)
            switch_guard(guardDirection)
            print(guardDirection)

main()
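The guard-detection pipeline above reduces to: Hough lines, then per-line gradients, then the mean gradient mapped to a direction. A minimal standalone sketch of that final mapping (the RIGHT/UP/LEFT constants stand in for the listing's, and the gradient values are invented for illustration):

```python
from statistics import mean

# Stand-ins for the listing's guard-direction constants
RIGHT, UP, LEFT = "right", "up", "left"

def find_guard(gradients):
    # As in the listing: the mean line gradient picks the guard orientation
    avgM = int(mean(gradients))
    if avgM < -0.5:
        return RIGHT
    elif avgM > -0.5 and avgM < 1:
        return UP
    else:
        return LEFT

# Roughly horizontal lines map to a top guard; steep slopes to left/right
print(find_guard([0.1, -0.2, 0.05]))  # up
print(find_guard([3.0, 2.5, 3.5]))    # left
print(find_guard([-3.0, -2.5]))       # right
```

Note that the `int()` cast in the listing truncates toward zero, so the -0.5 threshold effectively fires only for means of -1 or steeper.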
import keras.backend as K

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.
        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
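As a sanity check, the same batch-wise F1 formula can be evaluated with plain NumPy outside Keras. This sketch re-implements the clip/round/sum arithmetic; the example vectors are made up:

```python
import numpy as np

def f1_numpy(y_true, y_pred, eps=1e-7):
    # Same formula as the Keras metric: round predictions, count overlaps
    tp = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    possible = np.sum(np.round(np.clip(y_true, 0, 1)))
    predicted = np.sum(np.round(np.clip(y_pred, 0, 1)))
    recall = tp / (possible + eps)
    precision = tp / (predicted + eps)
    return 2 * precision * recall / (precision + recall + eps)

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.8, 0.2, 0.1])
# tp = 1, precision = 1/2, recall = 1/2, so F1 = 0.5
print(float(f1_numpy(y_true, y_pred)))
```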
X = []
y = []
try:
    while True:
        example = pickle.load(f)
        a = example[0]
        a += example[2]
        X.append(a)
        y.append(pickle.load(g))
except:
    pass
print(len(X))
print(len(y))
f.close()
g.close()
X = np.asarray(X, dtype=np.float32)
y = np.asarray(y, dtype=np.float32)
trainingSize = 3000
x_train = X[:trainingSize]
y_train = y[:trainingSize]
x_test = X[trainingSize:]
y_test = y[trainingSize:]
a = 0
for i in range(len(y_train)):
    if y_train[i][13] == 1:
        a += 1
print("node imbalance:", a, "/", len(y_train))
"""
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
for a in x_train:
    plt.imshow(a, cmap=plt.cm.binary)
    plt.show()
"""
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
model = keras.models.Sequential()
model.add(keras.layers.Dense(128, activation=tf.nn.relu))
model.add(keras.layers.Dense(128, activation=tf.nn.relu))
model.add(keras.layers.Dense(128, activation=tf.nn.relu))
model.add(keras.layers.Dense(14, activation=tf.nn.sigmoid))
class_weights = {0: 1,   # W
                 1: 1,   # A
                 2: 1,   # S
                 3: 1,   # D
                 4: 3,   # guard left 10
                 5: 3,   # guard top 10
                 6: 3,   # guard right 10
                 7: 2,   # guard break
                 8: 2,   # light 60
                 9: 2,   # heavy 60
                 10: 2,  # dodge
                 11: 1,  # feint
                 12: 1,  # taunt
                 13: 1}  # idle
val_loss, val_prec = model.evaluate(x_test, y_test)
print("Validation loss:", val_loss, "Validation accuracy:", val_prec)
# from grabscreen import grab_screen
# For data collection
import keyboard
import os
import tensorflow as tf
import keras
from keras.utils import plot_model
import keras_metrics
from processInputToGame import input_to_game
import keras.backend as K

maxHealth = 60
running = False

def toggle_running(key):
    global running
    if running:
        print("Ceasing Neural Knight's mind.")
        running = False
    else:
        print("Continuing Neural Knight's mind...")
        running = True

def read_data(state):
    if state[3] == 1:  # Right
        print("The enemy's guard is right!")
    elif state[4] == 1:  # Top
        print("The enemy's guard is top!")
    elif state[5] == 1:  # Left
        print("The enemy's guard is left!")
def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.
        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
# RUNTIME
def main():
    global running
    keyboard.on_release_key("t", toggle_running)
    for i in list(range(1))[::-1]:
        print(i + 1)
        time.sleep(1)
    print("Loading model...")
    lastTime = time.time()
    switch_guard(UP)
    currentGuardDirection = UP
    prediction = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
    prevFrames = 600
    prevPrediction = []
    for i in range(prevFrames):
        prevPrediction.append(prediction)
    while True:
        if running:
            screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))  # grab_screen(region=(0, 40, 800, 640))
            currentGuardDirection = checkCurrentGuardInput(prediction, currentGuardDirection)
            state = generate_feature_list(screen, currentGuardDirection)
            # extractedState = np.asarray(state[0], dtype=np.float32)  # State data
            extractedState = state[0]  # Frame data
            # print("The input is:", extractedState)
            # print("The shape of the input is:", extractedState.shape)
            prevPrediction = prevPrediction[1:]
            prevPrediction.append(prediction)
            predHistory = [y for x in prevPrediction for y in x]
            extractedState = np.expand_dims(extractedState, axis=0)
            extractedState = np.concatenate((extractedState[0], predHistory), axis=0)  # Add previous frames' gamepad input to inputs
            extractedState = np.expand_dims(extractedState, axis=0)
            prediction = model.predict(extractedState)[0]
            b = np.zeros_like(prediction)
            # Separately check to see if any of the guard directions need changing.
            # If so, only change the guard most likely to need changing
            biggest = 0
            biggestIndex = -1
            for i in range(4, 7):
                if prediction[i] > 0.2:
                    if prediction[i] > biggest:
                        biggest = prediction[i]
                        biggestIndex = i
            if biggestIndex > -1:
                b[biggestIndex] = 1
            for i in range(0, 4):
                if prediction[i] > 0.2:
                    b[i] = 1
            for i in range(7, len(b)):
                if prediction[i] > 0.2:
                    b[i] = 1
            # b[np.argmax(prediction)] = 1
            # print("out:", b)

main()
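The post-processing at the end of the loop above (at most one guard direction, chosen as the strongest output over its threshold, with every other action fired independently) can be exercised on its own. This sketch is a standalone restatement of that logic; the function name and the prediction vector are made up for illustration:

```python
import numpy as np

def prediction_to_actions(prediction, guard_thresh=0.2, action_thresh=0.2):
    """Mirror of the listing's thresholding: indices 4-6 are guard
    directions (pick at most one, the strongest above threshold);
    all other indices fire independently."""
    b = np.zeros_like(prediction)
    biggest, biggestIndex = 0, -1
    for i in range(4, 7):
        if prediction[i] > guard_thresh and prediction[i] > biggest:
            biggest, biggestIndex = prediction[i], i
    if biggestIndex > -1:
        b[biggestIndex] = 1
    for i in list(range(0, 4)) + list(range(7, len(b))):
        if prediction[i] > action_thresh:
            b[i] = 1
    return b

pred = np.array([0.9, 0.1, 0.1, 0.1, 0.3, 0.6, 0.4, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1])
# Only index 5 (the strongest guard output) is set among 4-6;
# indices 0 and 8 fire independently
print(prediction_to_actions(pred))
```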
# Metric stuff
import keras.backend as K

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.
        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
try:
    while True:
        example = pickle.load(f)
        a = example[0]
        a += example[2]
        X.append(a)
        y.append(pickle.load(g))
except:
    pass
f.close()
g.close()
"""
f = open("in/stateData3.pickle", "rb")
g = open("out/outputData3.pickle", "rb")
try:
    while True:
        example = pickle.load(f)
        a = example[0]
        a += example[2]
        X.append(a)
        y.append(pickle.load(g))
except:
    pass
f.close()
g.close()
"""
print(len(X))
print(len(y))
X = np.asarray(X, dtype=np.float32)
y = np.asarray(y, dtype=np.float32)
trainingSize = 15000
x_train = X[:trainingSize]
y_train = y[:trainingSize]
print(x_train[0])
x_test = X[trainingSize:]
y_test = y[trainingSize:]
a = 0
for i in range(len(y_train)):
    if y_train[i][13] == 1:
        a += 1
print("node imbalance:", a, "/", len(y_train))
"""
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
for a in x_train:
    plt.imshow(a, cmap=plt.cm.binary)
    plt.show()
"""
# Model parameters
numLSTM = 1
dropout = 0.3
numDense = 0
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
model = keras.models.Sequential()
for i in range(numLSTM - 1):
    model.add(CuDNNLSTM(128, return_sequences=True))
    model.add(Dropout(dropout))
model.add(Flatten())
"""
for i in range(numDense):
    model.add(keras.layers.Dense(32, activation="relu"))
    model.add(Dropout(dropout))
"""
idleBalancer = 2
class_weights = {0: idleBalancer,        # W
                 1: idleBalancer,        # A
                 2: idleBalancer,        # S
                 3: idleBalancer,        # D
                 4: idleBalancer + 1,    # guard left 10
                 5: idleBalancer + 1,    # guard top 10
                 6: idleBalancer + 1,    # guard right 10
                 7: idleBalancer + 5,    # guard break
                 8: idleBalancer + 20,   # light 60
                 9: idleBalancer + 20,   # heavy 60
                 10: idleBalancer + 10,  # dodge
                 11: idleBalancer,       # feint
                 12: idleBalancer,       # taunt
                 13: 1}                  # idle
val_loss, val_acc = model.evaluate(x_test, y_test)
print("Validation loss:", val_loss, "Validation accuracy:", val_acc)
from stateDetector import generate_feature_list, checkCurrentGuardInput
# from grabscreen import grab_screen
# For data collection
import keyboard
import os
import tensorflow as tf
import keras
import keras_metrics
from keras.utils import plot_model
from processInputToGame import input_to_game
import keras.backend as K
def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.
        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
maxHealth = 60
running = False

def toggle_running(key):
    global running
    if running:
        print("Ceasing Neural Knight's mind.")
        running = False
    else:
        print("Continuing Neural Knight's mind...")
        running = True

def read_data(state):
    if state[3] == 1:  # Right
        print("The enemy's guard is right!")
    elif state[4] == 1:  # Top
        print("The enemy's guard is top!")
    elif state[5] == 1:  # Left
        print("The enemy's guard is left!")
# RUNTIME
def main():
    global running
    keyboard.on_release_key("t", toggle_running)
    for i in list(range(1))[::-1]:
        print(i + 1)
        time.sleep(1)
    print("Loading model...")
    lastTime = time.time()
    switch_guard(UP)
    currentGuardDirection = UP
    prediction = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
    prevFrames = 60
    prevPrediction = []
    for i in range(prevFrames):
        prevPrediction.append(prediction)
    while True:
        if running:
            screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))  # grab_screen(region=(0, 40, 800, 640))
            currentGuardDirection = checkCurrentGuardInput(prediction, currentGuardDirection)
            state = generate_feature_list(screen, currentGuardDirection)
            # extractedState = np.asarray(state[0], dtype=np.float32)  # State data
            extractedState = state[0]  # Frame data
            # print("The input is:", extractedState)
            # print("The shape of the input is:", extractedState.shape)
            prevPrediction = prevPrediction[1:]
            prevPrediction.append(prediction)
            predHistory = [y for x in prevPrediction for y in x]
            extractedState = np.expand_dims(extractedState, axis=0)
            extractedState = np.concatenate((extractedState[0], predHistory), axis=0)  # Add previous frames' gamepad input to inputs
            extractedState = np.expand_dims(extractedState, axis=0)
            extractedState.shape = (extractedState.shape[0], 1, extractedState.shape[1])
            prediction = model.predict(extractedState)[0]
            b = np.zeros_like(prediction)
            # Separately check to see if any of the guard directions need changing.
            # If so, only change the guard most likely to need changing
            biggest = 0
            biggestIndex = -1
            for i in range(4, 7):
                if prediction[i] > 0.2:
                    if prediction[i] > biggest:
                        biggest = prediction[i]
                        biggestIndex = i
            if biggestIndex > -1:
                b[biggestIndex] = 1
            for i in range(7, len(b)):
                if prediction[i] > 0.1:
                    b[i] = 1
            for i in range(0, 4):
                if prediction[i] > 0.1:
                    b[i] = 1
            # b[np.argmax(prediction)] = 1
            # print("out:", b)

main()
from sklearn.utils import class_weight
# Metric stuff
import keras.backend as K

def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.
        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
# Load data
X = []
X2 = []
y = []
try:
    while True:
        sample = pickle.load(f)
        X.append(sample[0])
        X2.append(sample[1])
        y.append(pickle.load(g))
except:
    pass
f.close()
g.close()
print(len(X))
print(len(y))
# Create sequences
XS = []
sequenceLength = 8
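The loop that fills `XS` does not appear in the listing at this point. A plausible sliding-window reconstruction, consistent with the later `X2[:-sequenceLength]` and `y[:-sequenceLength]` trims (so this is an assumption, not the author's verbatim code), is:

```python
# Hypothetical reconstruction: each sample becomes a window of the
# `sequenceLength` consecutive per-frame state vectors starting at that frame.
def make_sequences(X, sequenceLength):
    XS = []
    for i in range(len(X) - sequenceLength):
        XS.append(X[i:i + sequenceLength])
    return XS

frames = [[j] for j in range(10)]  # stand-in per-frame state vectors
XS = make_sequences(frames, 8)
print(len(XS))                 # 2 windows from 10 frames
print(XS[0][0], XS[0][-1])     # first window spans frames 0..7
```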
X = np.array(XS)
X2 = X2[:-sequenceLength]
X2 = np.asarray(X2, dtype=np.float32)
X2 = X2 / 255  # Normalisation of image data
y = y[:-sequenceLength]
y = np.asarray(y, dtype=np.float32)
x_train = X[:trainingSize]
x2_train = X2[:trainingSize]
y_train = y[:trainingSize]
x_test = X[trainingSize:]
x2_test = X2[trainingSize:]
y_test = y[trainingSize:]
"""
# dims: 204x114
for a in x2_train:
    plt.imshow(a, cmap=plt.cm.binary)
    plt.show()
"""
# Model hyperparameters
numLSTM = 1
dropout = 0.4
numDense = 0
"""
config = t f . ConfigProto ()
c o n f i g . g p u _ o p t i o n s . a l l o w _ g r o w t h = True
session = t f . Session ( config=config )
92 Appendix E. Source Code
"""
gpu_options = t f . GPUOptions ( per_process_gpu_memory_fraction = 0 . 5 0 )
s e s s = t f . S e s s i o n ( c o n f i g = t f . ConfigProto ( gpu_options=gpu_options ) )
# State branch
x = CuDNNLSTM(128, input_shape=(1, x_train.shape[1]), return_sequences=True)(inputState)
x = Dropout(dropout)(x)
x = Flatten()(x)
x = Model(inputs=inputState, outputs=x)
# Image branch
y = Conv2D(64, (3, 3))(inputImage)
y = Activation("relu")(y)
y = MaxPooling2D(pool_size=(2, 2))(y)
y = Flatten()(y)
y = Model(inputs=inputImage, outputs=y)
# Combine inputs
combined = concatenate([x.output, y.output])
# Combined branch
z = Dense(256, activation="relu")(combined)
z = Dropout(dropout)(z)
z = Dense(128, activation="relu")(z)
z = Dropout(dropout)(z)
z = Dense(14, activation="sigmoid")(z)
model = Model(inputs=[x.input, y.input], outputs=z)
class_weights = {0: 1,   # W
                 1: 1,   # A
                 2: 1,   # S
                 3: 1,   # D
                 4: 1,   # guard left 10
                 5: 1,   # guard top 10
                 6: 1,   # guard right 10
                 7: 3,   # guard break
                 8: 2,   # light 60
                 9: 1,   # heavy 60
                 10: 3,  # dodge
                 11: 1,  # feint
                 12: 1,  # taunt
                 13: 1}  # idle
val_loss, val_acc = model.evaluate([x_test, x2_test], y_test)
print("Validation loss:", val_loss, "Validation accuracy:", val_acc)
# from grabscreen import grab_screen
# For data collection
import keyboard
import os
import tensorflow as tf
import keras
import keras_metrics
from keras.utils import plot_model
from processInputToGame import input_to_game
import keras.backend as K
def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.
        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
running = False

def toggle_running(key):
    global running
    if running:
        print("Ceasing Neural Knight's mind.")
        running = False
    else:
        print("Continuing Neural Knight's mind...")
        running = True

def read_data(state):
    if state[3] == 1:  # Right
        print("The enemy's guard is right!")
    elif state[4] == 1:  # Top
        print("The enemy's guard is top!")
    elif state[5] == 1:  # Left
        print("The enemy's guard is left!")
# RUNTIME
def main():
    global running
    keyboard.on_release_key("t", toggle_running)
    for i in list(range(1))[::-1]:
        print(i + 1)
        time.sleep(1)
    print("Loading model...")
    lastTime = time.time()
    switch_guard(UP)
    currentGuardDirection = UP
    prediction = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
    prevFrames = 6
    prevPrediction = []
    sequenceLength = 8
    stateSequence = []
    for i in range(prevFrames):
        prevPrediction.append(prediction)
    while True:
        if running:
            screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))  # grab_screen(region=(0, 40, 800, 640))
            currentGuardDirection = checkCurrentGuardInput(prediction, currentGuardDirection)
            state = generate_feature_list(screen, currentGuardDirection)
            extractedState = state[0]  # State data
            frame = np.asarray(state[1])  # Image data
            frame.shape = (1, frame.shape[0], frame.shape[1], 1)
            stateSequence.append(extractedState)
            if len(stateSequence) > sequenceLength:
                stateSequence = stateSequence[1:]
                stateSequenceNP = np.asarray(stateSequence)
                stateSequenceNP.shape = (1, stateSequenceNP.shape[0], stateSequenceNP.shape[1])
                # Separately check to see if any of the guard directions need changing.
                # If so, only change the guard most likely to need changing
                biggest = 0
                biggestIndex = -1
                for i in range(4, 7):
                    if prediction[i] > 0.05:
                        if prediction[i] > biggest:
                            biggest = prediction[i]
                            biggestIndex = i
                if biggestIndex > -1:
                    b[biggestIndex] = 1
                # Increase weighting of GB!
                for i in range(7, len(b)):
                    if prediction[i] > 0.05:
                        b[i] = 1
                for i in range(0, 4):
                    if prediction[i] > 0.1:
                        b[i] = 1
                # b[np.argmax(prediction)] = 1
                # print(stateSequenceNP)

main()
# Metric stuff
import keras.backend as K

# This metric was taken from here: https://stackoverflow.com/a/45305384/5634610 by user "Paddy"
def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.
        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
# Load data
X = []
X2 = []
X3 = []
y = []
inputFiles = ["stateData21", "stateData20", "stateData23"]      # Enter the filenames of all input data here
outputFiles = ["outputData21", "outputData20", "outputData23"]  # Enter the filenames of all output data here
for i in range(len(inputFiles)):
    inString = "in/" + inputFiles[i] + ".pickle"
    outString = "out/" + outputFiles[i] + ".pickle"
    f = open(inString, "rb")
    g = open(outString, "rb")
    try:
        while True:
            sample = pickle.load(f)
            X.append(sample[0])
            X2.append(sample[1])
            X3.append(sample[2])
            y.append(pickle.load(g))
    except:
        pass
    f.close()
    g.close()
print(len(X))
print(len(X3))
print(len(y))
X = np.array(XS)
X2 = X2[:-stateSequenceLength]
X2 = np.asarray(X2, dtype=np.float32)
X2 = X2 / 255  # Normalisation of image data
X3 = np.array(X3S)
# X3 = np.asarray(X3, dtype=np.float32)
print("X3 0:", X[0])
print("X3 0:\n", X3[0])
print("X3 shape", X3[0].shape)
y = y[:-stateSequenceLength]
y = np.asarray(y, dtype=np.float32)
x_train = X[:trainingSize]
x2_train = X2[:trainingSize]
x3_train = X3[:trainingSize]
y_train = y[:trainingSize]
x_test = X[trainingSize:]
x2_test = X2[trainingSize:]
x3_test = X3[trainingSize:]
y_test = y[trainingSize:]
"""
# dims : 204 x114
f o r a in x2_train :
p l t . imshow ( a , cmap = p l t . cm . b i n a r y )
p l t . show ( )
"""
# Model h y p e r p a r a m e t e r s
layerSizes = [128]
conv1Layers = [ 2 ]
conv2Layers = [ 3 ]
denseLayers = [ 2 ]
dropout = 0 . 3
"""
config = t f . ConfigProto ()
c o n f i g . g p u _ o p t i o n s . a l l o w _ g r o w t h = True
session = t f . Session ( config=config )
"""
gpu_options = t f . GPUOptions ( per_process_gpu_memory_fraction = 0 . 5 0 )
s e s s = t f . S e s s i o n ( c o n f i g = t f . ConfigProto ( gpu_options=gpu_options ) )
patience = 6
e a r l y s t o p p e r = E a r l y S t o p p i n g ( monitor= " v a l _ f 1 " , p a t i e n c e = p a t i e n c e ,
verbose =1 , mode= "max" , r e s t o r e _ b e s t _ w e i g h t s =True )
f o r l a y e r S i z e in l a y e r S i z e s :
f o r conv1Layer in conv1Layers :
f o r conv2Layer in conv2Layers :
f o r denseLayer in denseLayers :
                # State branch
                x = Conv1D(layerSize, 4, padding='same')(inputState)
                x = Activation("relu")(x)
                for l in range(conv1Layer - 1):
                    x = Conv1D(layerSize, 4, padding='same')(x)
                    x = Activation("relu")(x)
                    x = MaxPooling1D(pool_size=1)(x)
                x = Flatten()(x)
                x = Model(inputs=inputState, outputs=x)
                # Image branch
                y = Conv2D(layerSize, (3, 3))(inputImage)
                y = Activation("relu")(y)
                y = MaxPooling2D(pool_size=(2, 2))(y)
                for l in range(conv2Layer - 1):  # was "for l in conv2Layers", which iterates the hyperparameter list itself rather than adding conv2Layer - 1 further blocks
                    y = Conv2D(layerSize, (3, 3))(y)
                    y = Activation("relu")(y)
                    y = MaxPooling2D(pool_size=(2, 2))(y)
                y = Flatten()(y)
                y = Model(inputs=inputImage, outputs=y)
                # History branch
                w = CuDNNLSTM(layerSize, input_shape=(1, x3_train.shape[1]),
                              return_sequences=True)(inputHistory)
                w = Dropout(dropout)(w)
                w = Flatten()(w)
                w = Model(inputs=inputHistory, outputs=w)
                # Combine inputs
                combined = concatenate([x.output, y.output, w.output])
                # Combined branch
                z = Dense(256, activation="relu")(combined)
                z = Dropout(dropout)(z)
                for l in range(denseLayer - 1):  # was "for l in denseLayers"
                    z = Dense(layerSize, activation="relu")(z)
                    z = Dropout(dropout)(z)
                # Output layer
                z = Dense(14, activation="sigmoid")(z)
                model = Model(inputs=[x.input, y.input, w.input], outputs=z)
                val_loss, val_acc = model.evaluate([x_test, x2_test, x3_test], y_test)
                print("Validation loss:", val_loss, "Validation accuracy:", val_acc)
import tensorflow as tf
import keras
import keras_metrics
from keras.utils import plot_model
import keras.backend as K
from processInputToGame import input_to_game

running = False

def toggle_running(key):
    global running
    if running:
        print("Ceasing Neural Knight's mind.")
        running = False
    else:
        print("Continuing Neural Knight's mind...")
        running = True

def read_data(state):
    if state[3] == 1:  # right
        print("The enemy's guard is right!")
    elif state[4] == 1:  # top
        print("The enemy's guard is top!")
    elif state[5] == 1:  # left
        print("The enemy's guard is left!")
# This metric was taken from here: https://stackoverflow.com/a/45305384/5634610 by user "Paddy"
def f1(y_true, y_pred):
    def recall(y_true, y_pred):
        """Recall metric.
        Only computes a batch-wise average of recall.
        Computes the recall, a metric for multi-label classification of
        how many relevant items are selected.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
        recall = true_positives / (possible_positives + K.epsilon())
        return recall

    def precision(y_true, y_pred):
        """Precision metric.
        Only computes a batch-wise average of precision.
        Computes the precision, a metric for multi-label classification of
        how many selected items are relevant.
        """
        true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
        predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
        precision = true_positives / (predicted_positives + K.epsilon())
        return precision

    precision = precision(y_true, y_pred)
    recall = recall(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
#RUNTIME
def main():
    global running
    keyboard.on_release_key("t", toggle_running)
    for i in list(range(1))[::-1]:
        print(i + 1)
        time.sleep(1)
    lastTime = time.time()
    switch_guard(UP)
    currentGuardDirection = UP
    def my_loss(targets, logits):
        # weights = np.array([0.5966758157974992, 0.7981600081325607,
        #                     0.925282098200671, 0.9579139981701739,
        #                     0.9246060790891532, 0.9940489986784589,
        #                     0.9473050726847616, 0.9771636677848937,
        #                     0.951865406119752, 0.951547219680797,
        #                     0.9911106028260649, 0.9806983836535529,
        #                     0.9880176883196097, 0.8289620819355494])  # optimal
        weights = np.array([0.5966758157974992, 0.7981600081325607,
                            0.925282098200671, 0.9579139981701739,
                            0.9246060790891532, 0.9880489986784589,
                            0.9473050726847616, 0.9771636677848937,
                            0.951865406119752, 0.971547219680797,
                            0.9881106028260649, 0.9806983836535529,
                            0.9880176883196097, 0.8289620819355494])  # optimal3
        return K.sum(targets * -K.log(logits + 1e-10) * weights +
                     (1 - targets) * -K.log(1 - logits + 1e-10) * (1 - weights),
                     axis=-1)
    predictionHistory = []
    sequenceLength = 4
    historySequenceLength = 1
    stateSequence = []
    # gd = Controller()
    for i in range(historySequenceLength):
        predictionHistory.append([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
        # predictionHistory.append(gd.get_true_game_input())
        time.sleep(0.1)
    while True:
        if running:
            screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))  # grab_screen(region=(0,40,800,640))
            currentGuardDirection = checkCurrentGuardInput(predictionHistory[-1], currentGuardDirection)
            state = generate_feature_list(screen, currentGuardDirection)
            extractedState = state[0]  # State data
            frame = np.asarray(state[1])  # Image data
            frame.shape = (1, frame.shape[0], frame.shape[1], frame.shape[2])
            stateSequence.append(extractedState)
            stateSequenceNP = np.asarray(stateSequence)
            stateSequenceNP.shape = (1, stateSequenceNP.shape[0], stateSequenceNP.shape[1])
            predictionHistoryNP = np.asarray(predictionHistory)
            predictionHistoryNP.shape = (1, predictionHistoryNP.shape[0], predictionHistoryNP.shape[1])
            # NOTE: the prediction step itself was lost in typesetting; the
            # following two lines are reconstructed from context.
            prediction = model.predict([stateSequenceNP, frame, predictionHistoryNP])[0]
            b = [0] * 14
            # Separately check to see if any of the guard directions need changing.
            # If so, only change the guard most likely to need changing
            biggest = 0
            biggestIndex = -1
            for i in range(4, 7):
                if prediction[i] > 0.5:
                    if prediction[i] > biggest:
                        biggest = prediction[i]
                        biggestIndex = i
            if biggestIndex > -1:
                b[biggestIndex] = 1
            for i in range(7, len(b)):
                if prediction[i] > 0.5:
                    b[i] = 1
            for i in range(0, 4):
                if prediction[i] > 0.5:
                    b[i] = 1
            # b[np.argmax(prediction)] = 1  # Only for one-hot decision making
            # read_data(state[0])
            predictionHistory.append(b)
            predictionHistory = predictionHistory[1:]
            if np.array_equal(b, np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1])) or \
               np.array_equal(prediction, np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])):
                if b[13] == 1:
                    print("Idle")
                else:
                    print("Zero Idle")
            else:
                input_to_game(b)
            # print(stateSequenceNP)

main()
Appendix F
Weekly Logs
The following documents are the logs written to track progress across the term.
Log 02 - 23/01/2019
The project can be split into three major problems. Approaching them in order of difficulty, the first is the output - controlling the game through script. This was already solved soon after proposing the project, and can be read about in Log 00 - controlling output.
The second problem is the matter of input for the AI - providing real-time data about the game so the AI can use this information to make a decision as to how to act. This proves challenging, as For Honor does not provide any kind of API to extract such information, presumably because it could be used for cheating quite easily. As a result, I will have to use Computer Vision to extract information. This is the problem I am currently in the process of solving. This log details the process by which I created a visualiser to detect an enemy's guard and developed the GuardMatcher, which can be seen here.
The single most important part of the UI that requires interpreting for the AI is the enemy's guard direction. Below is an annotated screenshot of the game to explain how the guard direction functions:
As a result, my first aim was simply to make a simple program that detected the enemy's guard direction and switched its guard appropriately. However, with no knowledge as to how to use OpenCV, I needed to spend time learning and using it in a way that would allow me to make progress. I came across this course, which was the same resource I had previously used to discover how to control the game (which can again be read in Log 00 - controlling output): a pseudo-improvised course made by "Sentdex", where he shows his process of making an AI that can play the game Grand Theft Auto V - specifically, it begins with the making of a self-driving car AI.
Following the first several steps of this course, I developed a rudimentary AI using OpenCV. It relied primarily on finding the main two lines in a frame and considering them the two boundaries of the lane. By analysing the gradients of these lines, the AI would simply try to stay between them. A lot of this style of AI relied on line detection, and since the guard UI in For Honor is essentially made up of six unique lines, I decided to try a similar method to detect the guard direction.
The screen itself could be captured very easily using the ImageGrab function from the Pillow (PIL) library; using the following line (as it appears in the project source), the portion of the screen containing the game could be captured:
screen = np.array(ImageGrab.grab(bbox=(0, 130, 1024, 700)))
After brief experimentation, I concluded that it would be best to run the game at a resolution of 1024x768, as it is the lowest resolution the game allows to be run whilst still maintaining a 16:9 aspect ratio. A smaller resolution is desirable, as it means that the image captured converts to as small a matrix as possible, which will mean operations manipulating it perform as quickly as possible. With the image captured, it is ideal to restrict the image solely to the region of interest, so that only areas bearing useful information will appear. Below is a diagram drawn to demonstrate the concept. Though the specific values will need to be changed as more information requires capturing, the concept remains the same:
This is achieved by creating a mask matrix of entirely zeroes except for the region of interest, which is ones. Then, by applying a bitwise AND operation with the mask and image, all pixels of the image except those covered by the ones are turned black. This allows only the UI to show:
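The masking step described above can be sketched in a few lines of numpy. This is a sketch only: a rectangular region stands in for the project's polygon mask (the real implementation likely used cv2.fillPoly and cv2.bitwise_and), and note that for a bitwise AND the "ones" must in practice be full-brightness 255 values so that every bit of each pixel is preserved.

```python
import numpy as np

def region_of_interest(img, x0, y0, x1, y1):
    # Build a mask of zeroes with all bits set inside the region of
    # interest, then AND it with the image so everything outside the
    # region turns black.
    mask = np.zeros_like(img)
    mask[y0:y1, x0:x1] = 255
    return np.bitwise_and(img, mask)

frame = np.full((8, 8), 200, dtype=np.uint8)  # a toy "screen capture"
roi = region_of_interest(frame, 2, 2, 6, 6)
```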
The next step to get line detection working was to threshold the image. Binary thresholding is the process of filtering the image to allow only pixels above a certain brightness threshold, converting the rest to black. This could be done with a single line of code:
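With OpenCV that one line would be along the lines of `_, processed = cv2.threshold(img, value, 255, cv2.THRESH_BINARY)`; the threshold value actually used in the project is not recorded. A numpy equivalent of the operation, with an illustrative threshold:

```python
import numpy as np

def binary_threshold(img, thresh=200):
    # Pixels brighter than the threshold become white, the rest black.
    # The value 200 is illustrative, not the project's actual setting.
    return np.where(img > thresh, 255, 0).astype(np.uint8)

img = np.array([[10, 250], [199, 201]], dtype=np.uint8)
out = binary_threshold(img)
```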
After some experimentation with the parameters, I was able to quite cleanly be left with only the guard UI (and some miscellaneous noise, as expected). Whilst in the conventional case edge detection is applied before any kind of line-finding, such as using the Canny Edge Detection algorithm, the UI is designed such that it was worth attempting to apply line detection immediately, as it is mostly made of lines.
I used a form of the Hough Lines Transform known as the Probabilistic Hough Lines Transform, provided by OpenCV. This is a more efficient form of the algorithm that also conveniently returns each line as a pair of Cartesian coordinates. Again, this could be done with only a single line of code:
lines = cv2.HoughLinesP(processedImage, rho = 1, theta = np.pi/180, threshold = 17, minLineLength = 15, maxLineGap = 5)
After experimenting with the parameters, the resulting lines ended up essentially perfect. I used a function provided by Sentdex to draw them:
With this done, I had to use the information from the lines to somehow discern the guard direction. When following the self-driving AI course, the gradients of the lines were used to determine which side of the road was left or right. I decided to use a similar method of analysing gradients. That required writing my own algorithm for retrieving a list of the gradients:
gradients = []
try:
    print(len(lines))
    print(lines)
    # find gradients
    for line in lines:
        actualLine = line[0]
        gradient = (actualLine[3] - actualLine[1]) / (actualLine[2] - actualLine[0])
        if isfinite(gradient):
            gradients.append(gradient)
    print(gradients)
except:
    pass
I then started printing every gradient to the console output to see if any patterns emerged that could be used to distinguish the directions. Quite conveniently, I discovered that the lines had very simple heuristic borders: the lines making up the right guard had a gradient m < -0.5; for the left guard, m > 1; and for the top guard, -0.5 < m < 1.
Given these simple rules, I was able to quite easily write a function that would return the guard direction:
def find_guard(gradients):
    avgM = mean(gradients)
    if avgM < -0.5:
        return RIGHT
    elif avgM > -0.5 and avgM < 1:
        return UP
    else:
        return LEFT
Here the constants returned are hex codes that correspond to the inputs used to switch guard. This meant I could write functions such that I could simply call switch_guard(find_guard(gradients)). Doing this resulted in the AI successfully switching to the correct guard, which can be seen in the video demo linked at the beginning of this log.
It should be noted that, in order for the AI to detect the lines cleanly and properly, some prerequisites were identified. Contrast should be relatively low. The player character must maintain only a short distance from the enemy. Only permutations of maps that are set at dusk or night, and that have clear weather, should be used. In the video, it can be seen that the AI occasionally makes mistakes; these are caused by noise generated by the snowfall. A clear, dark environment helps minimise noise.
With this achieved, the next steps were discussed. The key next step is the ability to discriminate the different states of attack from the enemy. When an enemy attacks, the corresponding direction indicator grows larger and turns red. This presents a new problem, as red does not meet the required brightness levels to be thresholded. My proposed plan is to identify the red guard, save the pixels as a map, then convert the red to white for thresholding and line detection. This way, the AI does not go "blind" when the enemy attacks, causing erratic behaviour. Additionally, a second image can be created where, following thresholding, the red pixels are reimposed upon the image so colour can be used as a discriminant to distinguish different states. Achieving this will mean it will be possible to do the same for all states by their own unique colours.
It was agreed that creating a table of the different states requiring real-time identification would be useful. Both of these tasks will be attempted by the end of next week.
Log 03
Due to extenuating circumstances regarding family, I was unable to make any progress on the project this week. Last week's goals roll forward to the next week.
Log 04 - 06/02/2019
As discussed in the previous log, the next step of feature preprocessing was to be able to filter the different colours (which have been designed by Ubisoft to represent attack types) and use detection of the presence of those colours to tell the state.
This would require knowing exactly what information needed to be captured. The supplementary state table document in log 02a shows the possible information that might be important. Attack indicators would also need to reflect their direction as well.
Whilst it would be helpful to determine the player's stamina and the enemy's health, this would require capturing the quantity of the health and stamina bars, which change size and move around. As a result, I have concluded it is not worth consuming any time attempting to capture these features, as they would require a completely different method.
I concluded that the best way to achieve this colour-filtering capture would be to convert all but a range of colour values into black. With regards to the attack indicators, since they also need to reveal guard direction, I would first convert them to white so that they could pass the luminance threshold for the guard detection algorithm described in Log 02.
However, before I tackled the implementation of this, I had been encountering in my testing an unforeseen level of inconsistency in the accuracy with which the lines were detected for the guard direction - to the point that even testing the program on a static frame would yield constantly flickering results. After researching, I realised, by way of this StackOverflow thread, that the HoughLinesP function I was using as part of the OpenCV library is probabilistic, meaning it does not test every single point of data for lines, but rather randomly selects a subset of points. Making this more consistent would drastically increase the accuracy of guard detection. Unfortunately, this was not a task I was able to achieve. I discovered two potential options: use the HoughLines function, or use a completely different system called the LineSegmentDetector. They both suffered crucial problems. The former only returned the line equation, not the line segment at all, which made the information useless. The line segment detector suffered a similar flaw in that it does not allow for any kind of filtration during line detection - I cannot set a minimum line length or a maximum line gap. Both of these are crucial, so I had to remain with the HoughLinesP algorithm.
I began the colour state detection with the attack indicator. The first step was to determine the range of colours I could filter in order to capture the attack indicator only. It was becoming clear that using RGB values was capturing far too much erroneous noise, so I researched alternative representations of colour, settling on the HSV (Hue, Saturation, Value) scheme. The following diagram describes how these three variables map to a colour:
From Wikipedia, used under a CC BY-SA 3.0 license
This means that I can very easily capture different shades of red, and only shades of red. The first line of the filtering function sees the image converted to HSV as a result:
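That first line would be `hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)`, followed by a range filter such as `mask = cv2.inRange(hsv, lower_red, upper_red)`. A numpy stand-in for the range filter, with illustrative red bounds (the project's measured values are not recorded):

```python
import numpy as np

def in_range(hsv, lower, upper):
    # 255 where every channel lies within [lower, upper], else 0 --
    # the same contract as cv2.inRange.
    lower, upper = np.asarray(lower), np.asarray(upper)
    hit = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return np.where(hit, 255, 0).astype(np.uint8)

lower_red, upper_red = (0, 120, 70), (10, 255, 255)  # assumed bounds
hsv = np.array([[[5, 200, 100], [60, 200, 100]]], dtype=np.uint8)
mask = in_range(hsv, lower_red, upper_red)
```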
With this alone - technically with only a single line - I can then retrieve all instances of this colour in the image, and this will work for all other colour state detection:
locs = np.where(mask != 0)
But for the attack indicators, as aforementioned, I also need to convert the red into white in order to pass the brightness threshold. The method for the actual filtering of the image for the attack indicator uses an algorithm made by StackOverflow user Ray Ryeng, which I found on this thread. This emulates the copyTo function of the C++ edition of the OpenCV library, which the Python library lacks. I used this function to copy all instances of the mask (using the locs numpy array) onto the original image. Discovering this process took many hours of research, but it all works very effectively. By making a simple testing function, I checked if the enemy was attacking; if so, the AI performs a "heavy attack" in the game. I then set up a testing environment within the game whereupon the enemy bot would simply light attack in a random direction approximately every three seconds. When one heavy attacks almost immediately upon seeing an enemy light attack, it has the effect of "parrying" the enemy's attack, negating any potential damage to the player and opening up the enemy for counter-attacking. Theoretically, then, this would allow the AI to automatically parry all light attacks thrown at it.
At the time of writing, I have had no access to the usual computer with which I run everything, due to the university failing to provide my internet. As a result, I was forced to run it on a suboptimal setup with a significantly worse CPU, meaning fewer frames are able to be processed in a given time. This has the effect of worsening the AI's "reaction time". Still, I tested the AI's automatic parrying effectiveness to see if it would work, and how well if so. It did indeed work, though not every time. As can be seen from [this placeholder video], the AI successfully parries many attacks, and usually blocks those it fails to parry. This is because it captures the frames in which the guard direction has changed in time, but not the much smaller number of frames in which the attack indicator is present. Hypothetically, the performance would improve when run on a more powerful computer.
Given that this is all in pursuit of feature preprocessing, it was important now to consider how exactly the information would be passed into the neural network. Whilst much more research will now need to be done, given that information as simple as possible would optimise performance, I decided on representing this side of the game state as a 10-element, one-dimensional array. Each index would represent a different state, and the value at that index would be a boolean 0 or 1.
So, for example, an array that looked like this: [0,1,0,1,0,0,1,1,0,0] would mean that the player is guarding top, whilst the enemy has their guard to the left and is currently attacking with an unblockable attack. This way, much information can be learned from very little actual stored data. Implementing this was simply a series of if statements, as there is no cleaner way of achieving it.
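A compact sketch of that mapping is below. The state names and index order here are illustrative, since the project's exact layout is only partially documented; the project itself used a series of if statements rather than a lookup.

```python
def build_state(flags):
    # One boolean slot per state: a dict of detected flags is flattened
    # into a fixed-order array suitable for feeding to the network.
    # (Names and ordering are assumptions for illustration.)
    STATE_NAMES = ["player_right", "player_top", "player_left",
                   "enemy_right", "enemy_top", "enemy_left",
                   "attacking", "unblockable", "bashing", "guard_break"]
    return [1 if flags.get(name) else 0 for name in STATE_NAMES]

example = build_state({"player_top": True, "enemy_left": True,
                       "attacking": True, "unblockable": True})
```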
With colour state detection now achieved, the next steps will be to complete feature processing - adding the colour detection for unblockable attacks and for bashing. Guard breaking will be a different, more difficult task, as it requires detection of a transparent image. Additionally, from my research it seems beneficial to also pass in an extremely low-resolution greyscale version of the frame, in order for the network to have more context-sensitive information with which to form its own pattern detection. This will also require further research.
Log 05 - 14/02/2019
Feature processing is now nearly complete. Unblockable states are now accounted for by way of the same method described in Log 04. The state detector is now able to process all useful states but guard breaks, which are being investigated.
With this done, I printed what the detector could compute, and it was able to give a coherent and seemingly accurate representation of the opponent's actions:
At this point, I considered the outputs that the neural network would produce after processing. There is precedent for each node to represent a button that can be pressed, and thus to output an array where each index represents a button, and the value there a 0 or 1 describing whether or not it should be pressed.
As such, a program would need to be able to convert this information into the DirectX button presses using the method described in Log 01. The following simple script was written, and can be found as the file processInputToGame.py:
inputArray = [0,0,0,0,0,0,0,0,0,0,0]
correspondingKeys = [W,A,S,D,LEFT,UP,RIGHT,Q,J,K,SPACE]
# W, A, S, D, left, up, right, GB, light, heavy, dodge

def generateRandomInputArray(arr):
    for i in range(len(arr)):
        arr[i] = random.randint(0,1)
    return arr

def inputToGame(arr):
    for i in range(len(arr)):
        processInput(arr[i], correspondingKeys[i])

for i in list(range(10))[::-1]:
    print(i + 1)
    time.sleep(1)
print("active...")
while True:
    time.sleep(1)
    inputArray = generateRandomInputArray(inputArray)
    print(inputArray)
    inputToGame(inputArray)
The function generateRandomInputArray() was added as a testing placeholder, and the bottom program was set up to simply generate a new random state and execute it every 0.1 seconds. It worked as expected.
After meeting with my supervisor, it was agreed that an explicit, formal test would be required for the state detector before moving on to the development of the neural network. I will record footage of a fight. At every discrete time step, I will manually record the true state log. Then, the state detector will watch the footage of the fight as well and log its observed game state. A confusion matrix will be drawn from a comparison of these results, with the accuracy and precision calculated.
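The planned evaluation can be sketched as follows, assuming the comparison yields a confusion matrix cm in which cm[i, j] counts frames whose true state i was detected as state j (the matrix values here are made up for illustration):

```python
import numpy as np

def accuracy_and_precision(cm):
    # Overall accuracy: correctly detected frames over all frames.
    accuracy = np.trace(cm) / cm.sum()
    # Per-state precision: of the frames detected as state j, the
    # fraction that really were state j.
    precision = np.diag(cm) / np.maximum(cm.sum(axis=0), 1)
    return accuracy, precision

cm = np.array([[8, 2],
               [1, 9]])
acc, prec = accuracy_and_precision(cm)
```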
Log 06 - Reading Week
Completion of State Detector (Unblockable, Low-res frame)
Over reading week, I mainly focussed on the completion of the state detector, and prepared it for everything I would need once the network was outputting.
The final element of the state detector was to add guard-breaking detection, which was done in the exact same way as before. I also wanted to pass in a low-resolution version of the frame, and this was done very simply with the cv2.resize() function.
After doing some research on how to actually structure neural networks, I decided that it would be best to begin the learning process using a supervised method, by capturing the buttons I am pressing at any frame to serve as an output. This meant I would need to be able to capture inputs from the gamepad. Using the inputs library I wrote a test script to do just that. This can be seen in Experimentation/Gamepad Capture/inputs Test.py of the repository.
In addition to this, the neural network would be outputting a one-hot array where each index represents a button press. This would need to be converted into inputs to the game itself so the network may actually "play" the game. Another script was used, utilising the DirectKeys.py and OutputControl.py scripts discussed in the first logs. It can be seen in Experimentation/GuardMatcherAttacker/processInputToGame.py.
With these both completed, I wanted to spend time looking at actually implementing neural networks. I watched a series of videos by Harrison Kinsley, or "Sentdex", explaining how to use Keras and Tensorflow. Following the course, I implemented a feedforward network that trained on the MNIST dataset. MNIST is a dataset classically used when learning to implement neural networks, as it contains an extremely large number of samples. They are all greyscale images of handwritten digits, labelled with their correct answer. Here, the network learns to identify new handwritten digits from this training set.
Afterward, I followed the course into implementing a convolutional neural network that learned on the Kaggle dataset of approximately 25,000 images of cats and dogs, getting it working and learning about TensorBoard and the experimentation of hyperparameters in the process. It was at this point I realised my Tensorflow was training extremely slowly, and discovered I should be training on the GPU rather than the processor. After installing tensorflow-gpu, CUDA and cuDNN, and upgrading to Python 3.6, the training became drastically faster; where before a single epoch took approximately 63 seconds, it now took 9.
At the end of reading week I briefly explored implementing a recurrent network on MNIST. This was extremely fast and accurate, though more difficult and complicated to set up. It will likely be a recurrent LSTM network I use for the final implementation of Neural Knight, but after speaking with my supervisor, it was determined that it would be good simply to start capturing data and explore the results acquired by a simple feed-forward network.
Log 07 - 03/03/2019
With the intention of being in a position to begin developing a simple feed-forward network, the final step was to actually be able to capture training data.
This was a simple enough process. Essentially, for each frame, the data captured from the state detector, as well as a greyscaled, low-resolution version of the frame, are both stored into a single file which is then pickled. For the same frame, the buttons I am pressing at the time (though only one is stored, to keep the network one-hot for now) are stored in a separate file. This was set up in a way that I could toggle recording by pushing a button, so I could avoid capturing useless frames.
Some test data was captured by fighting a dummy CPU several times. The implementation and test data can be found in Implementation/Data Capture.
Then, using a similar version of the feed-forward network I trained on the MNIST dataset, I passed in the data captured from the game. The data was small and noisy, and so I did not expect much. After some tweaking, however, it claimed to have achieved a validation accuracy of ~70%. Sceptical, I had the program save the model, and set about building the part of the program that would load this model and use it and the state detector to "predict" game inputs.
It took an unforeseen amount of time to get running, unfortunately, due to apparently running out of video memory when trying to run both For Honor and the network at the same time. However, this was eventually solved: I discovered that Keras tries to allocate as much of the VRAM as possible to itself, but by using gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.30) I was able to restrict the network to consuming only 30% of my VRAM, leaving plenty for the game itself.
When it finally ran, I discovered that the network was consistently and almost exclusively deciding to do nothing at all - i.e. outputting the final node as hot. The final node is reserved not for a button press, but rather for an "idle" action, to do nothing. This makes sense: I realised that, in a given fight, during most frames I would not be pressing a button, as duels in For Honor are carefully paced and designed to avoid "button mashing" behaviour. As a result, most of the data said that the right answer was indeed to do "nothing", and thus the network had learned that if it just did nothing every single time, it would get the majority of its behaviour correct.
This problem of imbalanced data can be fixed in a myriad of ways, but since the primary solution - obtaining more balanced data - is not possible due to the nature of the game itself, the next solution will be attempted: modification of the class weights. This allows certain outputs in the training set to be worth a certain factor more than other classes. This way I could set every frame containing an output with an actual button press to be worth, for example, 50 times any idling example.
Drawing on domain knowledge, the conclusion was reached that the data was largely imbalanced because, in most captured frames (which occur between attacks), no buttons would be pressed. As such, the network was incentivised to do nothing all the time. This was fixed through the use of class weights, adding proportional value to certain data samples based on their end classification. Following this, the bot began to make decisions.
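A minimal sketch of how such class weights could be built from the captured labels and handed to Keras (the 50x factor, the "idle" label name, and the helper function are illustrative, not the exact values used in training):

```python
from collections import Counter

def make_class_weights(labels, boost=50):
    """Weight every non-idle class `boost` times more than the idle class.

    `labels` is the list of per-frame target classes; the "idle" class
    dominates because most captured frames contain no button press.
    """
    counts = Counter(labels)
    return {cls: (1 if cls == "idle" else boost) for cls in counts}

# Example: a heavily idle-skewed capture of 100 frames.
labels = ["idle"] * 95 + ["light_attack"] * 3 + ["guard_left"] * 2
weights = make_class_weights(labels)
print(weights)  # {'idle': 1, 'light_attack': 50, 'guard_left': 50}

# In Keras these weights would be passed (keyed by integer class index)
# as: model.fit(x, y, class_weight=weights)
```

This makes each button-press example contribute 50 times as much to the loss as an idle example, counteracting the network's incentive to always predict "idle".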
Log 09 - 14/03/2019
With the work of the previous week, the bot began to block attacks, but inconsistently, and often it would switch its guard incorrectly. As a result, a deeper investigation of the data capturing system was initiated. After some time, it was discovered that the system for detecting the gamepad's inputs was flawed - the inputs library only detected thumbstick input if the thumbstick was being actively moved. This meant that if the thumbstick was being held in a specific position (which is needed to keep a guard up), then the algorithm would report that it was not being moved at all.
This was due to a problem related to threading. As with other buttons, thumbstick data comes in a buffer that has to be read from, and that buffer does not update if the sticks do not move. To solve this, a new thread was made solely for gamepad input detection. This way the buffer could be read constantly within its own loop, ensuring that when a thumbstick was held in a direction, the correct direction was consistently detected.
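The shape of that fix can be sketched as a background polling thread that remembers the last known stick state, so a held stick still reads correctly even when no new events arrive (`read_events` below is a stand-in for the actual call the inputs library exposes, not its real API):

```python
import threading
import time

class GamepadPoller:
    """Continuously drains the gamepad event buffer on its own thread and
    caches the most recent thumbstick position."""

    def __init__(self, read_events):
        self._read_events = read_events   # callable returning [(axis, value), ...]
        self._state = {"x": 0.0, "y": 0.0}
        self._lock = threading.Lock()
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while self._running:
            # Drain any pending events; if the stick is merely held,
            # no events arrive and the cached state is simply kept.
            for axis, value in self._read_events():
                with self._lock:
                    self._state[axis] = value
            time.sleep(0.001)  # avoid a busy spin

    def stick(self):
        """Return the last known stick position, even if the stick is idle."""
        with self._lock:
            return dict(self._state)

    def stop(self):
        self._running = False
        self._thread.join()
```

The main capture loop can then call `stick()` at any frame and always get the held guard direction, instead of only seeing movement events.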
With these two additions made, the autoblocker now worked exactly as intended, which can be seen in this video:
https://www.youtube.com/watch?v=LR2Tm7F-w80
Direction   Correct   Incorrect
Left        22        9
Up          32        3
Right       29        5
Whilst it is relatively accurate, with a total accuracy of 83%, the individual accuracy for attacks coming from the left was significantly lower, at only 71%. This led to the conclusion that the state detector was, for some reason, less effective at detecting the left guard than the others. The root of this problem was not difficult to find: with some visual testing it could be seen that the Region of Interest cropping (discussed initially in Log 02) was cutting off the left guard. Increasing the dimensions somewhat afterward was enough to fix this problem, as demonstrated:
Direction   Correct   Incorrect
Left        31        3
Up          28        4
Right       34        0
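The fix itself amounts to widening the crop window applied to each captured frame. A sketch using NumPy slicing (all coordinates here are hypothetical, since the real values depend on the screen resolution and where the guard indicator sits in the HUD):

```python
import numpy as np

# A captured 1080p game frame (placeholder: a black image).
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

# Original crop: too tight, clipping the left-guard indicator.
roi_old = frame[400:680, 820:1100]

# Widened crop: the region is expanded so that all three guard
# directions fall fully inside the Region of Interest.
roi_new = frame[380:700, 760:1160]

print(roi_old.shape, roi_new.shape)  # (280, 280, 3) (320, 400, 3)
```

Since OpenCV images are NumPy arrays, the same slice adjustment applies directly to the frames grabbed by the state detector.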
Weekly Log 24/03/2019
Not much could be done this week as I was ill, and had to focus on two other modules' coursework projects and their subsequent reports.
Weekly Log - 30/03/2019
All of my time was spent on the draft report. I tested myself against a level 3 Warden over 50 rounds to use as a human benchmark for testing. I won 35 rounds and lost 15. I will now work on capturing large amounts of data.
Bibliography