SPRINGER BRIEFS IN APPLIED SCIENCES AND TECHNOLOGY: COMPUTATIONAL INTELLIGENCE

Parallel Genetic Algorithms for Financial Pattern Discovery Using GPUs
SpringerBriefs in Applied Sciences
and Technology
Computational Intelligence
Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Systems Research Institute,
Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
João Baúto, Instituto Superior Técnico, Instituto de Telecomunicações, Lisbon, Portugal
Nuno Horta, Instituto Superior Técnico, Instituto de Telecomunicações, Lisbon, Portugal
Rui Neves, Instituto Superior Técnico, Instituto de Telecomunicações, Lisbon, Portugal
© The Author(s), under exclusive licence to Springer International Publishing AG, part of Springer
Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Maria, Manuel and Miguel
João Baúto
The financial markets move vast amounts of capital around the world. This fact, together with the increasingly easy access to manual or automated trading, has attracted the interest of all types of investors, from the “man on the street” to academic researchers. These new investors and the automated trading systems in turn influence market behavior. To adapt to this new reality, the domain of computational finance has received increasing attention from people in both the finance and computational intelligence communities.
The main driving force in the field of computational finance, with application to financial markets, is to define highly profitable, low-risk trading strategies. To accomplish this objective, the defined strategies must process large amounts of data, including financial market time series, fundamental analysis data, and technical analysis data, and produce appropriate buy and sell signals for the selected securities. What may appear, at first glance, to be an easy problem is, in fact, a huge and highly complex optimization problem that cannot be solved analytically. This makes soft computing, and computational intelligence in general, especially appropriate for addressing it.
The use of chart patterns is widespread among traders as an additional tool for decision making. Chartists, as these analysts are known, try to identify known pattern formations and, based on previous appearances, predict future market trends. Visual pattern identification is hard and error-prone, and patterns in real financial time series are rarely as clean as the examples in textbooks, so any solution that helps with this task is welcome. At the same time, the general availability of GPU boards today presents an excellent alternative execution system to traditional CPU architectures, able to cope with high-speed processing requirements at relatively low cost.
This work explores the benefits of combining a low-cost high-performance computing solution, a GPU-based architecture, with a state-of-the-art computational finance approach, SAX/GA, which combines the Symbolic Aggregate approXimation (SAX) technique with an optimization kernel based on genetic algorithms (GA). The SAX representation is used to describe the financial time series so that relevant patterns can be efficiently identified. The evolutionary optimization kernel is used to identify the most relevant patterns and generate investment rules. The SAX technique uses an alphabetic symbolic representation of data defined by adjustable parameters, and a search for the optimal combination of SAX parameters is presented in order to capture and preserve the essence of the financial time series under study. The proposed approach tailors the implementation of the SAX/GA technique to a GPU-based architecture in order to improve its computational efficiency. The approach was tested using real data from the S&P 500. The results show that it outperforms the CPU alternative, with speedups of up to 200×.
The book is organized in seven chapters as follows:
• Chapter 1 presents a brief description of the problem addressed by this book, namely investment optimization based on pattern discovery techniques and high-performance computing on GPU architectures. The main goals of the work and the document’s structure are also highlighted in this chapter.
• Chapter 2 discusses fundamental concepts key to understanding the proposed work, such as pattern recognition and matching, GAs and GPUs.
• Chapter 3 presents a review of the state-of-the-art pattern recognition techniques
with practical application examples.
• Chapter 4 addresses the CPU implementation of the SAX/GA algorithm along with a detailed explanation of the genetic operators involved. A benchmark analysis discusses the performance of SAX/GA and identifies possible locations to accelerate the algorithm.
• Chapter 5 presents the developed solutions along with previous attempts to
accelerate the SAX/GA algorithm. Each solution started as a prototype that
evolved based on the advantages and disadvantages identified.
• Chapter 6 discusses the experimental results obtained for each solution and
compares them to the original implementation. Solutions are evaluated based on
two metrics, the speedup and the ROI indicator.
• Chapter 7 summarizes the book and presents the respective conclusions and future work.
Contents

1 Introduction
  1.1 Motivation
  1.2 Goals
  1.3 Book Outline
  References
2 Background
  2.1 Time Series Analysis
    2.1.1 Euclidean Distance
    2.1.2 Dynamic Time Warping
    2.1.3 Piecewise Linear Approximation
    2.1.4 Piecewise Aggregate Approximation
    2.1.5 Symbolic Aggregate approXimation
  2.2 Genetic Algorithm
    2.2.1 Selection Operator
    2.2.2 Crossover Operator
    2.2.3 Mutation Operator
  2.3 Graphics Processing Units
    2.3.1 NVIDIA’s GPU Architecture Overview
    2.3.2 NVIDIA’s GPU Architectures
    2.3.3 CUDA Architecture
  2.4 Conclusions
  References
3 State-of-the-Art in Pattern Recognition Techniques
  3.1 Middle Curve Piecewise Linear Approximation
  3.2 Perceptually Important Points
  3.3 Turning Points
Acronyms
SP Streaming Processor
SVM Support Vector Machine
TPC Texture/Processor Cluster
TSA Tabu Search Algorithm
Investment Related
B&H Buy & Hold
C/F Ratio Crossover/Fitness Ratio
HSI Hong Kong Hang Seng Index
IL Enter Long
IS Enter Short
NYSE New York Stock Exchange
OL Exit Long
OS Exit Short
ROI Return on Investment
RSI Relative Strength Index
S&P500 Standard & Poor 500
Others
ECG Electrocardiogram
Chapter 1
Introduction
Abstract This chapter presents a brief description of the scope of the problem addressed in this book: the performance and optimization of algorithms based on pattern discovery. Additionally, the main goals of this work are discussed, along with a breakdown of the document’s structure.
The financial system as it is known today does not stem from an idea invented a century ago but from the age-old human practice of trading goods, an idea that evolved into the current stock markets, where goods are traded for a monetary value.
Markets such as the New York Stock Exchange (NYSE) and indices such as Hong Kong’s Hang Seng Index (HSI) are responsible for the movement of tremendously large amounts of capital. They connect investors from different corners of the world around one common objective: trading. Trading must occur in real time, with stock prices displayed without delay and simultaneously to all parties involved. Once presented with the stock prices, investors have two main types of analysis, fundamental and technical, on which to base their decisions. Some investors are interested in a company’s position in relation to social or political ideologies, while others focus on raw numbers.
The author of [1] discusses a question with its fair share of interest: “to what extent can the past history of a common stock’s price be used to make meaningful predictions concerning the future price of the stock?”. The premise of technical analysis depends heavily on this question, and there is evidence supporting the approach. If the past history of a stock can reveal future movements, one can try to identify points in history that reflect those movements and use them for future decisions. These points, or patterns, are one of the most interesting topics of technical analysis, and identifying them has posed a true challenge.
1.1 Motivation
1.2 Goals
The objective of this work is to study whether the Symbolic Aggregate approXimation (SAX)/Genetic Algorithm (GA) algorithm can take advantage of many-core systems such as NVIDIA’s GPUs to reduce the execution time of the sequential CPU implementation. SAX/GA is an algorithm designed to optimize trading strategies for the stock market, implemented so that it can explore a vast search space using small populations of individuals. The authors of SAX/GA [2, 3] found it necessary to use aggressive genetic operators capable of preventing the algorithm from settling into static behaviour and circling around identical solutions.
The first step is to analyse the performance of SAX/GA and understand the causes of its prolonged execution time. Once the bottlenecks are identified, different GPU optimization strategies will be presented and compared to the original CPU algorithm, based on the accuracy of the solution and the speedup (Eq. 1.1).
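Eq. 1.1 itself is not reproduced in this excerpt; the speedup metric referred to is presumably the standard ratio of sequential to parallel execution time:

```latex
\mathrm{speedup} = \frac{T_{\mathrm{CPU}}}{T_{\mathrm{GPU}}}
```

so that a speedup of 200 means the GPU version runs in 1/200 of the CPU's execution time.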
References
1. E.F. Fama, The behavior of stock-market prices. J. Bus. 38(1), 34–105 (1965)
2. A. Canelas, R. Neves, N. Horta, A SAX-GA approach to evolve investment strategies on
financial markets based on pattern discovery techniques. Expert Syst. Appl. 40(5), 1579–1590
(2013), http://www.sciencedirect.com/science/article/pii/S0957417412010561. https://doi.org/
10.1016/j.eswa.2012.09.002
3. A. Canelas, R. Neves, N. Horta, Multi-dimensional pattern discovery in financial time series
using sax-ga with extended robustness, in GECCO (2013). https://doi.org/10.1145/2464576.
2464664
Chapter 2
Background
Abstract This chapter presents fundamental concepts required to fully understand the topics discussed: first, a brief introduction to concepts related to pattern matching and time series dimensionality reduction, followed by a historical and architectural review of GPUs.
Time series analysis is one of the pillars of technical analysis in financial markets. Analysts use variations in a stock’s price and trading volume, in combination with several well-known technical indicators and chart patterns, to forecast the future price of a stock, or at least to speculate whether the price will increase or decrease. However, the widespread use of these indicators and patterns may indirectly influence the direction of the market, causing it to converge toward chart patterns that investors recognize.
Searching for chart patterns may seem to be a simple process in which two patterns or time series from different periods are compared and analysed for similarities, but it is not that trivial, as will be demonstrated later. In the following sections, P denotes a reference time series while Q denotes the time series whose similarity with P is tested.
This procedure is the basis for some of the pattern matching techniques presented later. Starting with two time series, P = (p1, p2, ..., pi, ..., pn) and Q = (q1, q2, ..., qi, ..., qn), the Euclidean Distance (ED) method iterates through both series and accumulates the distance between each pi and qi (Eq. 2.1).
© The Author(s), under exclusive licence to Springer International Publishing AG, part of Springer Nature 2018
J. Baúto et al., Parallel Genetic Algorithms for Financial Pattern Discovery Using GPUs, Computational Intelligence, https://doi.org/10.1007/978-3-319-73329-6_2
$$ED(P, Q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \qquad (2.1)$$
At first sight it is possible to observe some important issues: what if the two time series have different magnitudes, or different alignments? With different magnitudes, applying the ED method would be pointless, as its main feature lies in direct spatial comparison. The same happens with different alignments: the two series may be equal, or at the very least partially similar, but since they are shifted or unaligned, a direct match will not be found.
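As a concrete illustration (a sketch, not code from the book), Eq. 2.1 amounts to a point-wise comparison of the two series:

```python
import math

def euclidean_distance(p, q):
    """Point-wise Euclidean distance between two equal-length series (Eq. 2.1)."""
    assert len(p) == len(q), "ED requires series of equal length"
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Identical series give distance 0; a vertical shift (different magnitude)
# inflates the distance even though the shapes match exactly.
p = [1.0, 2.0, 3.0, 2.0]
q = [2.0, 3.0, 4.0, 3.0]   # same shape, shifted up by 1
print(euclidean_distance(p, p))  # 0.0
print(euclidean_distance(p, q))  # 2.0, i.e. sqrt(4 * 1^2)
```

The second call demonstrates the magnitude problem described above: the shapes are identical, yet the distance is far from zero.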
An alignment technique, Dynamic Time Warping (DTW) [1], can be used to solve the previous problem. This approach aligns two time series, P = (p1, p2, ..., pi, ..., pn) and Q = (q1, q2, ..., qj, ..., qm), using an n × m matrix D. First, for each pair (i, j) in D, the distance (pi − qj)² is calculated. The warping or alignment path (W) is then obtained by minimizing the cumulative distance accumulated along the path.
DTW solves the alignment problem, but at the cost of an increase in computational time. Some optimizations can be applied to the DTW algorithm, but the main issue remains untouched: the dataset itself. Financial time series tend to present small variance in value during a time period and, taking this into consideration, some points of the dataset can be eliminated.
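The cumulative-distance minimization can be sketched with the classic dynamic-programming recurrence, D(i, j) = (p_i − q_j)² + min(D(i−1, j), D(i, j−1), D(i−1, j−1)) (illustrative Python, not the book's implementation):

```python
def dtw(p, q):
    """Classic O(n*m) DTW: fill the cumulative-distance matrix with squared
    point costs; each cell extends the cheapest of its three predecessors."""
    n, m = len(p), len(q)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (p[i - 1] - q[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# A time-shifted copy of the same shape: plain ED would report a large
# error, but DTW realigns the peaks and finds zero cumulative distance.
print(dtw([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))  # 0.0
```

Note the quadratic cost in the series lengths, which is exactly the computational-time penalty mentioned above.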
With a sliding-window Piecewise Linear Approximation (PLA) approach, the time series is condensed into a representation using N breakpoints, where each breakpoint is the last point that satisfied a threshold condition. The series can then be approximated by two methods: linear interpolation, connecting each breakpoint into linear segments, or linear regression over each segment.
Once the dataset is normalized, PAA reduces a time series of dimension N into N/W time windows of size W, where N/W must be an integer value; otherwise Eq. 2.4 is not valid. An implementation of PAA with a non-integer number of windows is presented in [3], where border elements shared by two windows contribute partially to both. For each window, the mean value is calculated (Eq. 2.4) and assigned to represent that time window, as illustrated in Fig. 2.1.
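The windowed averaging of Eq. 2.4 can be sketched as follows (illustrative Python, restricted to the integer N/W case discussed in the text):

```python
def paa(series, window):
    """PAA: mean of each non-overlapping window of size W (Eq. 2.4).
    len(series) must be divisible by W, as required in the text."""
    n = len(series)
    assert n % window == 0, "N/W must be an integer"
    return [sum(series[i:i + window]) / window
            for i in range(0, n, window)]

# An 8-point series reduced to N/W = 2 windows of size W = 4.
print(paa([1, 3, 2, 2, 5, 7, 6, 6], 4))  # [2.0, 6.0]
```

Each output value stands in for a whole window, which is where the dimensionality reduction comes from.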
SAX [4] can be viewed as an improvement to PAA: it still uses this method to obtain a dimensionally reduced time series, but adds a new type of data transformation, from numeric to symbolic representation.
This transformation relies on a Normal distribution, N(0, 1), divided into α_n intervals, where the probability between the z-score of α_{i+1} (β_{i+1}) and the z-score of α_i (β_i) must be equal to 1/α_n; each interval is considered a symbol. For example, with α_n = 3 there are 3 intervals, each with an equal probability of 33.3% and its own symbolic representation.
In Fig. 2.2, frame 3 (c3) has an average value of 1.5; considering an alphabet with 3 letters (α = 3), from Table 2.1 and Eq. 2.6 it is possible to assess that c3 lies between β2 and ∞ and, therefore, the corresponding letter is ’C’. This method ensures that each symbol of the SAX alphabet has equal probability, allowing direct comparison. The z-score values (Table 2.1) were obtained from [4, 5].
Now that the PAA series is normalized and the z-scores of α are known, the SAX representation can easily be obtained: to each PAA segment (ci), a corresponding α interval is assigned so that conditions similar to those in Eq. 2.6 are satisfied. The transformation in Fig. 2.2 compresses a window of size 500 into a SAX sequence of size 10 over an alphabet of 3 letters.
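The symbolization step can be sketched as below; the breakpoints −0.43 and 0.43 are the tabulated z-scores that split N(0, 1) into three equiprobable intervals for a 3-symbol alphabet (illustrative Python, not the book's code):

```python
from bisect import bisect

# z-score breakpoints for a 3-symbol alphabet (rounded to two decimals).
BETA_3 = [-0.43, 0.43]
ALPHABET = "ABC"

def sax_symbols(paa_means, betas=BETA_3, alphabet=ALPHABET):
    """Map each (normalized) PAA mean to the symbol of its interval:
    bisect returns how many breakpoints lie below the value."""
    return "".join(alphabet[bisect(betas, c)] for c in paa_means)

# A frame mean of 1.5 lies above beta_2 = 0.43, so it maps to 'C',
# matching the c3 example in the text.
print(sax_symbols([-1.0, 0.0, 1.5]))  # "ABC"
```

Because the intervals are equiprobable, each letter appears with roughly equal frequency over a normalized series.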
Up to this point there is not much of an improvement, since there is no way to compare two time series, the input and the search pattern. The authors of SAX faced a problem: how to compare two series if they are represented in string format?
Fig. 2.2 Transformation of a PAA series into a SAX series with 3 symbols
It is possible to know whether both series are equal, but not whether they are similar. Lin et al. [4] needed to redefine the distance measure so that two symbolic series could be compared. Similar to the PAA distance, this new distance measure is defined by,
$$MINDIST(\hat{P}, \hat{Q}) = \sqrt{\frac{n}{w}} \cdot \sqrt{\sum_{i=1}^{w} dist(\hat{p}_i, \hat{q}_i)^2} \qquad (2.7)$$
At first sight, Eq. 2.7 is essentially equal to the one used in PAA; however, a new element was added, the dist(·) function. This function (Eq. 2.8) calculates the distance between two symbols based on the z-score values used in the numeric-to-symbolic transformation. For instance, with an alphabet of 4 symbols, the distance between ’A’ and ’C’ is given by the z-score below ’C’ minus the z-score above ’A’, i.e. β2 − β1. For neighbouring symbols, such as ’A’–’B’ or ’C’–’D’, the distance evaluates to zero.
$$dist(\hat{p}_i, \hat{q}_i) = \begin{cases} 0, & |i - j| \le 1 \\ \beta_{j-1} - \beta_i, & i < j - 1 \\ \beta_{i-1} - \beta_j, & i > j + 1 \end{cases} \qquad (2.8)$$
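Eqs. 2.7 and 2.8 can be sketched together as follows; the 4-symbol breakpoints −0.67, 0, 0.67 are the tabulated z-scores, and symbols are handled as 1-based ranks in the alphabet (illustrative Python):

```python
import math

BETA_4 = [-0.67, 0.0, 0.67]  # z-score breakpoints for a 4-symbol alphabet

def sym_dist(i, j, betas=BETA_4):
    """Eq. 2.8: distance between symbols of rank i and j (1-based).
    Adjacent or equal symbols are zero; otherwise the gap between the
    breakpoint just below the higher symbol and the one just above the lower."""
    if abs(i - j) <= 1:
        return 0.0
    lo, hi = min(i, j), max(i, j)
    return betas[hi - 2] - betas[lo - 1]

def mindist(s1, s2, n, alphabet="ABCD", betas=BETA_4):
    """Eq. 2.7: lower-bound distance between two SAX words of length w
    drawn from an original series of length n."""
    w = len(s1)
    ranks = {c: k + 1 for k, c in enumerate(alphabet)}
    total = sum(sym_dist(ranks[a], ranks[b], betas) ** 2
                for a, b in zip(s1, s2))
    return math.sqrt(n / w) * math.sqrt(total)

print(sym_dist(1, 2))            # 0.0  ('A' vs 'B' are neighbours)
print(round(sym_dist(1, 3), 2))  # 0.67 ('A' vs 'C': beta_2 - beta_1)
```

A zero MINDIST therefore does not mean the words are identical, only that every symbol pair is within one interval of its counterpart.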
The SAX symbolic representation can produce a very compact and efficient time series representation; however, it is subject to a particular problem, mainly caused by PAA. Since the symbolic representation of each window is calculated using the average value of the series in that window, it cannot accurately represent a trend, as important points are ignored. An alternative, Extended SAX (eSAX) [6], can be used to fix this issue: instead of considering only the average value of the frame, two additional points are added, the maximum and minimum values of the frame. These values compose a string of ordered triplets, < vmin, vavg, vmax >, that helps capture the behaviour inside each frame.
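The triplet construction can be sketched as follows (illustrative Python; the symbolization of each triplet component is omitted):

```python
def esax_frames(series, window):
    """eSAX: per non-overlapping window, an ordered triplet
    <v_min, v_avg, v_max> instead of the mean alone."""
    assert len(series) % window == 0
    out = []
    for i in range(0, len(series), window):
        frame = series[i:i + window]
        out.append((min(frame), sum(frame) / window, max(frame)))
    return out

# The mean alone (3.0) hides the spike to 9; the triplet preserves it.
print(esax_frames([1, 9, 1, 1], 4))  # [(1, 3.0, 9)]
```

The example shows exactly the failure mode described above: PAA would report only 3.0 for this frame, discarding the spike.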
Algorithms are methods that transform input data, through a set of operations, into an output that solves a specific problem. Sometimes, however, finding a solution is not so straightforward. A particular class of problems, optimization problems, accept an approximate but less time-consuming solution in place of a more accurate but more costly one. To tackle these problems, researchers turned to a different family of algorithms, Evolutionary Algorithms (EAs), taking advantage of innovative data representations. EAs include, but are not limited to, Neural Networks (NN), Particle Swarm (PS) and, the one relevant to this work, the Genetic Algorithm (GA). These algorithms follow the same idea: evolving a population of individuals until a near-optimal solution is achieved, inspired by Darwin’s natural selection and survival of the fittest.
A GA works on a pool of individuals, or chromosomes. Each individual, randomly generated, represents a possible solution to the optimization problem and is initially assigned a score according to an evaluation function, the fitness function. To follow the biological process of evolution, individuals are subject to reproduction, where two individuals are selected from the population and their genetic information is mixed to form two offspring, hopefully with better characteristics. As chromosomes reproduce, there is a risk of mutation, where one or more genes of a chromosome are inadvertently changed, again hoping for more favourable features. At the end of each reproduction cycle, all individuals in the population are evaluated with the fitness function and the worst percentage of the population is discarded (Fig. 2.3).
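The reproduction cycle described above can be sketched as a generic loop. The toy fitness, crossover and mutation operators below (one-max over 8 binary genes) are illustrative stand-ins, not the SAX/GA operators:

```python
import random

def run_ga(fitness, new_individual, crossover, mutate,
           pop_size=20, generations=50, p_mut=0.1, rng=None):
    """Skeleton GA cycle: random initial pool, random parent pairs,
    crossover into two offspring, occasional mutation, then keeping
    only the best individuals (the worst are discarded)."""
    rng = rng or random.Random(0)
    pop = [new_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(pop, 2)           # select two parents
            for c in crossover(a, b, rng):      # mix genes -> two offspring
                children.append(mutate(c, rng) if rng.random() < p_mut else c)
        # evaluate parents + children; discard the worst half
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

# Toy problem: maximize the number of 1-bits in an 8-gene chromosome.
def new_ind(rng): return [rng.randint(0, 1) for _ in range(8)]
def xover(a, b, rng):
    cut = rng.randint(1, 7)                     # single-point crossover
    return [a[:cut] + b[cut:], b[:cut] + a[cut:]]
def mut(c, rng):
    i = rng.randint(0, 7)                       # flip one random gene
    return c[:i] + [1 - c[i]] + c[i + 1:]

best = run_ga(sum, new_ind, xover, mut)
print(sum(best))  # typically converges to the optimum of 8
```

The survivor-selection step here is deliberately elitist; the next section describes the parent-selection schemes (tournament, roulette wheel, rank-based) that a full GA would use instead of purely random pairing.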
The three main selection techniques (Fig. 2.4) are tournament selection, roulette wheel selection and rank-based roulette wheel selection [7]. Tournament selection draws n random individuals, two or more of which compete against each other, and the winner proceeds to the next stage. Roulette wheel selection is based on a probabilistic model in which the best-scoring individuals have the highest probability of being selected to reproduce, while low-scoring individuals have limited, but not null, chances. Rank-based selection tries to prevent highly fit individuals from dominating by mapping fitness scores into ranks.
When searching for a solution, GAs are prone to getting stuck in local optima: points that are optimal within a limited closed region of the search space but not over the whole space. To prevent the algorithm from settling in a local optimum, mutation operators perform small changes to individuals, introducing new possible solutions and increasing population diversity (Fig. 2.6).
Fig. 2.6 Mutation example. Genes at the beginning and end were mutated, causing a change in the genetic information
GPUs, as they are commonly known, were first introduced by NVIDIA in 1999 [8]. The new generation of graphics processors, GeForce 2, shifted vertex transformation and lighting (T&L) from the CPU to the GPU by including dedicated hardware. By 2001 NVIDIA had replaced fixed-function shaders with programmable vertex shaders, units capable of performing custom instructions over the pixels and vertices of a scene [9].
Although shader programming was limited to graphics APIs such as OpenGL and DirectX, researchers tried, with some success, to solve non-graphics problems on GPUs by recasting them as traditional rendering problems. Thompson [10] proposed a GPU implementation of matrix multiplication and 3-SAT using a GeForce Ti4600 and the OpenGL API, obtaining speedups of up to 3.2× over the CPU. Other applications include ray tracing [11] and level-set methods [12]. This was the first step into general-purpose GPU (GPGPU) programming.
The performance of rendering a 3D scene was heavily linked to the type of shader used: a GPU normally processes more pixels than vertices, in roughly a three-to-one ratio [8], so with a predefined number of processors the workload was normally unbalanced across processors. With the release of the Tesla-based GeForce 8, NVIDIA accomplished an important milestone towards what is now known as the GPU architecture. Unifying vertex shaders and Tesla’s new feature, programmable pixel-fragment shaders, into a single shader pipeline opened a new world to programmers and developers, enabling them to balance workload between vertex and pixel shaders [9]. This pipeline behaves similarly to a basic CPU architecture, with its own instruction memory, instruction cache and sequential control logic. Additionally, the Compute Unified Device Architecture (CUDA) framework was released. CUDA provided access to a parallel architecture that can be programmed with high-level languages such as C and C++, removing the need for a graphics API and completing the transition into the GPGPU era.
In the Single-Instruction Multiple-Thread (SIMT) model, every thread executes the same instruction but on different data, which is why it is so well suited to 2D/3D scene rendering: few operations are required, yet thousands of pixels need to be processed.
To implement the SIMT architecture, the GPU must be designed to execute hundreds of threads concurrently [13]. At the top level, a GPU is a combination of multiple Streaming Multiprocessors (SM): independent multi-threaded units responsible for the creation, management, scheduling and launch of threads, which are grouped in sets of 32 called warps. Each SM features an instruction cache, warp schedulers that select warps ready to execute, instruction dispatch units that issue instructions to individual warps, a 32-bit register file, a shared memory, several types of cache and, most importantly, the CUDA cores or Streaming Processors (SP).
On the memory side, GPU memory is organized in a three-level hierarchy. Each level has a defined set of functions, benefits and limitations, and it is the programmer’s responsibility to ensure appropriate use and correct management. All SMs are connected to, and can communicate through, a global memory located off-chip, with a capacity on the order of Gigabytes (GB), linked to the CPU through the Peripheral Component Interconnect Express (PCIe) bus. Being a general-access off-chip memory leads to an important problem: the latency between requesting and retrieving information, which can be as high as 800 clock cycles depending on the device’s compute capability [13]. Accesses to global memory are issued as 32-, 64- or 128-byte memory transactions, which must be aligned to a multiple of the transaction size; e.g. a warp that requests sequential 4-byte words in the address range 116–244 triggers two 128-byte transactions covering addresses 0 to 256. Ideally, a warp’s accesses should be coalesced, meaning that each thread requests a sequential and aligned word, so that the whole warp is served in as few memory transactions as possible given the word and transaction sizes. In more recent architectures, aligned but non-sequential accesses are also treated as coalesced transactions.
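The 116–244 example above can be checked with a small model that maps byte addresses to 128-byte segments; each distinct segment touched costs one transaction (a simplification of the real coalescing rules, which also depend on compute capability):

```python
def transactions_128(addresses):
    """128-byte-aligned segments touched by a set of byte addresses;
    in this simplified model each distinct segment is one transaction."""
    return sorted({a // 128 for a in addresses})

# A warp of 32 threads reading consecutive 4-byte words starting at
# byte 116: addresses 116..240 straddle segments [0,128) and [128,256),
# so two 128-byte transactions are issued.
warp = [116 + 4 * t for t in range(32)]
print(transactions_128(warp))  # [0, 1] -> 2 transactions

# The same warp shifted to an aligned base address needs only one.
print(transactions_128([128 + 4 * t for t in range(32)]))  # [1]
```

Shifting the base address to a 128-byte boundary halves the traffic, which is the essence of the coalescing advice above.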
At the second level, there is a set of caches and an important mechanism for communication between threads: the shared memory. The latter is a fast, high-throughput memory located inside each SM, although only a small capacity is available, on the order of Kilobytes (KB). These advantages come with constraints, mainly on the access pattern of threads. To achieve peak throughput, NVIDIA organized shared memory into a modular structure of equally sized memory modules called banks, with memory lines of either 16 or 32 four-byte banks depending on the compute capability. Maximum memory bandwidth is obtained by performing reads or writes to n addresses that fall in n distinct banks; once m threads execute an instruction whose addresses fall in the same memory bank, an m-way bank conflict is triggered and the conflicting accesses are served serially.
With the exception of the Tesla microarchitecture, two levels of cache, L1 and L2, assist memory transactions between threads and global memory: the L2 cache is mainly used to cache global memory loads, while the L1 cache serves local memory accesses (memory whose size is not known at compile time, such as dynamically sized arrays, or register spills).
At the third and most restricted level, each SM is equipped with a 32-bit register file offering the highest throughput available, dedicated to the private variables of each thread. The limited size of the register file constrains the number of registers used per thread, which can vary from 63 to 255 depending on the microarchitecture. Although threads are allowed to allocate up to this limit, doing so reduces the number of active warps per SM and therefore decreases overall performance.
Over the course of a decade, NVIDIA has been releasing new architectures, improving existing features while providing developers with new techniques to increase parallelism on the GPU. This section presents a brief overview of NVIDIA’s latest GPU generations, with technical aspects related to the GPU architecture and features that enhance parallelism.
With the release of the Tesla microarchitecture in 2006, NVIDIA introduced the world
to a programmable unified architecture. At the top level, Tesla is organized into eight
Texture/Processor Clusters (TPCs), each consisting of one texture unit and two SMs
(later increased to three in the GT200). Each SM is structured with eight CUDA cores;
two Special-Function Units (SFUs) responsible for transcendental functions
(functions that cannot be expressed as a polynomial, such as square
root, exponential and trigonometric operations and their inverses); an instruction
fetch and issue unit with an instruction cache that serves multiple concurrent threads
with zero scheduling overhead; a read-only constant cache; and 16 KB of shared
memory.
The shared memory is split into 16 banks of consecutive four-byte words, yielding
high throughput when each bank is requested by distinct threads in a warp. However,
there is a discrepancy between the number of threads and banks: when a warp accesses
shared memory, the request is divided into independent accesses, one per half-warp,
each of which should be free of bank conflicts. When multiple threads read from the
same bank address, a broadcast mechanism serves all requesting threads
simultaneously [14].
Fermi (2010) brought major changes to both the SM and the memory organization.
The Graphics Processor Cluster (GPC) replaced the TPC as the top-level module:
four dedicated texture units were introduced, removing the now-redundant texture
unit of Tesla, while the number of SMs per cluster was increased from two (three in
the GT200) to four. The SMs now feature 32 CUDA cores and a new configurable cache
with two possible configurations, giving the programmer freedom: for
graphics programs a smaller L1 cache is beneficial, while for compute programs a larger
shared memory allows more cooperation between threads. This cache can be used as
16 KB of L1 cache and 48 KB of shared memory, or as 48 KB of L1 cache and 16 KB
of shared memory. Besides the configurable cache, shared memory underwent internal
changes. Previously, with Tesla, shared memory was organized into 16 four-byte banks
that served a warp in two independent conflict-free transactions; with Fermi, the
number of banks was raised to 32, with one request per warp. Bank conflicts are still
present in Fermi, in addition to the broadcast mechanism introduced with Tesla.
The increase in CUDA cores and the renewed cache were not the only changes
in the SM structure. The number of SFUs was doubled to four, each capable of
one transcendental instruction per thread independently of the other execution units.
This prevents stalls in the GPU pipeline, because the CUDA cores and SFUs are
decoupled from the dispatch unit responsible for serving instructions to each execution
unit, and because Fermi provides two separate dispatch units. The destination
address of a thread's result is now calculated by one of the 16 Load/Store units
available, for a total of 16 thread results per clock. The workload is divided across
two groups of 16 CUDA cores each, and instructions are distributed by two warp
schedulers, allowing two warps to be issued and executed concurrently; this means
that a warp requires two clock cycles to complete execution (transcendental
instructions take eight cycles across the four SFUs).
Table 2.2 Architectural comparison between Fermi, Kepler and Maxwell [13, 15–18]

Specifications                               Fermi - GF100   Kepler - GK104   Maxwell - GM204
Compute capability                           2.0             3.0              5.2
Streaming multiprocessors (SM)               11–16           6–8              13–16
CUDA cores                                   352–512         1152–1536        1664–2048
Theoretical Floating Point Single
Precision (GFLOPS)                           855–1345        2100–3000        3500–4600
Main memory (MB)                             1024–1536       1536–4096        4096
L1 cache (KB)                                48 / 16         16 / 32 / 48     24
Shared memory (KB)                           16 / 48         48 / 32 / 16     96
L1 + shared memory total (KB)                64              64               —
L2 cache (KB)                                768             512              1792–2048
Maximum registers per thread                 63              63               255
Maximum registers per SM                     32768           65536            65536
Threads per warp                             32              32               32
Maximum warps per SM                         48              64               64
Maximum blocks per SM                        8               16               32
Maximum threads per SM                       1536            2048             2048
Maximum threads per block                    1024            1024             1024
In parallel programming, the basic execution unit is the thread. On a CPU, threads
are sub-routines of a main program, scheduled to execute a custom set of instructions
that may include memory accesses to local or shared resources. If necessary, threads
can communicate with one another through a global resource or memory; however,
special attention is required if running threads perform write operations on the same
memory address.
CUDA introduced a general-purpose parallel computing platform and programming
model able to combine well-established programming languages with the highly
parallel architecture of a GPU. Creating a functional CUDA C program for a GPU
is a three-stage process. First, the execution environment must be defined. This
environment consists of a kernel, in which a developer formalizes the routine to be
executed on the GPU and how it should be executed. The kernel definition has four
associated arguments: the number of blocks, the number of threads, the size of dynamic
shared memory per block, and the stream ID. The way a kernel is defined reflects how
the problem is spatially organized; e.g., a parallel sum reduction over an array can be
represented with a 1D kernel, and a multiplication between two matrices with a 2D
kernel. In Fig. 2.7, a kernel is declared with 4 blocks, each with 16 × 16 threads
(256 in total), while the size of dynamic shared memory and the stream ID are
optional, defaulting to 0—example from [17].
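The index space implied by such a declaration can be modelled in plain Python (an illustrative sketch, not CUDA code; the flattening formula mirrors the usual blockIdx/blockDim/threadIdx computation):

```python
# Illustrative model of the index space produced by a launch of 4 blocks
# of 16 x 16 threads each, as in the Fig. 2.7 example.
def enumerate_threads(num_blocks, block_dim):
    """Yield (block, (tx, ty), global_id) for every thread in the grid."""
    threads_per_block = block_dim[0] * block_dim[1]
    for b in range(num_blocks):
        for ty in range(block_dim[1]):
            for tx in range(block_dim[0]):
                # Flat index a kernel would typically derive from
                # blockIdx, blockDim and threadIdx.
                gid = b * threads_per_block + ty * block_dim[0] + tx
                yield b, (tx, ty), gid

grid = list(enumerate_threads(4, (16, 16)))
print(len(grid))    # 1024 threads in total
print(grid[-1])     # (3, (15, 15), 1023)
```

Each thread owns a unique (block, threadIdx) pair, and 4 × 256 = 1024 threads are produced in total.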
Once the kernel is declared, the second stage begins. The program is compiled
through NVIDIA's compiler driver, NVCC, which generates a set of binaries including
the GPU assembly code, Parallel Thread eXecution (PTX), containing the execution
plan for a given thread [19]. Each thread is assigned a unique three-element
identifier (x, y, z coordinates), threadIdx, that locates it in the GPU execution
plan. Through several of the available compilation flags, NVCC can perform
optimizations that increase kernel performance. One of those flags,
-maxrregcount, grants the programmer a way to cap the maximum registers
allowed per thread, which can greatly impact kernel performance. By reducing
register usage per thread, the same register file can effectively accommodate more
blocks per SM, resulting in more warps being dispatched. Another advantage is
preventing register spilling: with complex kernels, NVCC's task of creating PTX code
becomes harder, and eventually there are not enough registers to satisfy a thread's
needs. In those cases, local memory is used to take over the role of registers, and
since this type of memory is addressed in the global memory space, it inherits all
of its characteristics, such as latency. The main issue with the maxrregcount flag
is that it forces the compiler to generate additional instructions, which may not
compensate for the extra one or two blocks per SM.
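The register trade-off can be sketched with a simplified occupancy estimate (our own simplification using the Kepler-like limits of Table 2.2; it ignores shared-memory and warp-granularity constraints that a real occupancy calculator applies):

```python
# Simplified model of register-limited occupancy. The limits below are the
# Kepler-like values from Table 2.2; shared-memory and allocation-granularity
# constraints are ignored for clarity.
def blocks_per_sm(regfile, regs_per_thread, threads_per_block,
                  max_blocks, max_threads):
    by_registers = regfile // (regs_per_thread * threads_per_block)
    by_threads = max_threads // threads_per_block
    return min(by_registers, by_threads, max_blocks)

# 64 registers/thread with 256-thread blocks: the register file allows 4 blocks.
print(blocks_per_sm(65536, 64, 256, 16, 2048))   # 4
# Capping registers at 32 (e.g. -maxrregcount=32) doubles resident blocks.
print(blocks_per_sm(65536, 32, 256, 16, 2048))   # 8
```

The second call shows the intended effect of the flag: halving per-thread registers doubles the number of resident blocks, up to the thread and block limits of the SM.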
Furthermore, NVCC has internal mechanisms able to optimize redundant
code and prevent duplicate operations such as those in Fig. 2.8.
Finally, the program executes on the GPU. At this point, all threads are
organized spatially in a single- or multi-dimensional grid formed by blocks. The
SMs are assigned multiple unique blocks (Fig. 2.9) by a global scheduler, the GigaThread
unit in post-Tesla microarchitectures, from which the SMs schedule and execute smaller
groups of 32 consecutive threads called warps. Threads in a warp execute one common
instruction at a time, which should not invoke conditional branching operations, as
that introduces thread divergence and therefore the serial execution of all threads on
each branching path until they reach a common instruction again (this only applies to
threads in the same warp) [13]. Once a warp finishes executing, a warp scheduler
switches context, with no overhead cost, and replaces the current warp in an SM with a
new one, from the same block or not, that is ready to execute. This mechanism is also used
to mask the latency associated with memory transactions, since it prevents stalling
the pipeline while a warp waits for a transaction to complete.
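The serialization cost of divergence within one warp can be modelled with a toy Python sketch (illustrative only; real hardware re-converges the warp at the first common instruction):

```python
# Toy model of intra-warp divergence: threads of one warp that take
# different branch paths are executed in separate serial passes, so a
# warp's cost grows with the number of distinct paths taken.
def divergent_passes(path_per_thread):
    """Number of serialized passes for one warp's branch outcomes."""
    return len(set(path_per_thread))

print(divergent_passes(['then'] * 32))                   # 1: uniform warp
print(divergent_passes(['then'] * 16 + ['else'] * 16))   # 2: divergent warp
```

A warp whose 32 threads agree on the branch costs a single pass, while an if/else split inside the warp doubles the work; divergence across *different* warps carries no such penalty.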
2.4 Conclusions
This chapter presented an introduction to some basic techniques that are the foundation
for much of the state of the art in pattern matching, the basic concepts of the GA,
and finally a review of NVIDIA's GPUs. The pattern matching techniques can be
divided into two categories, linear and aggregate approximations, which try to create
accurate approximations of a time series using the minimum number of points possible.
The GA belongs to a group of algorithms, EAs, that attempt to solve problems
that do not have a closed-form solution, such as non-convex problems. GPUs are an
alternative execution system to the common multi-core systems that use the CPU
as the main processing unit. The GPU started as a system meant to process 2D
and 3D graphical scenes; however, researchers identified the possibility of using it to
accelerate highly parallel algorithms.
References
1. D.J. Berndt, J. Clifford, Using dynamic time warping to find patterns in time series, in KDD
Workshop (1994), pp. 359–370
2. E. Keogh, S. Chu, D. Hart, M. Pazzani, Segmenting time series: a survey and novel approach.
Data Mining in Time Series Databases (2003), pp. 1–21
3. L. Wei, Sax: N/n not equal an integer case, http://alumni.cs.ucr.edu/wli/
4. J. Lin, E. Keogh, S. Lonardi, B. Chiu, A symbolic representation of time series, with impli-
cations for streaming algorithms, in Proceedings of the 8th ACM SIGMOD Workshop on
Research Issues in Data Mining and Knowledge Discovery, ser. DMKD 2003. (ACM, New
York, NY, USA, 2003), pp. 2–11. https://doi.org/10.1145/882082.882086
5. A. Canelas, R. Neves, N. Horta, A sax-ga approach to evolve investment strategies on financial
markets based on pattern discovery techniques. Expert Syst. Appl. 40(5), 1579–1590 (2013),
http://www.sciencedirect.com/science/article/pii/S0957417412010561
6. B. Lkhagva, Y. Suzuki, K. Kawagoe, Dews2006 4a-i8 extended sax: extension of symbolic
aggregate approximation for financial time series data representation (2006)
7. N. Razali, J. Geraghty, Genetic algorithm performance with different selection strategies in
solving tsp. IEEE Micro 31(2), 50–59 (2011)
8. E. Lindholm, J. Nickolls, S. Oberman, J. Montrym, Nvidia tesla: a unified graphics and com-
puting architecture. IEEE Micro. 28(2), 39–55 (2008)
9. D. Luebke, G. Humphreys, How gpus work. IEEE Comput. Soc. 40(2), 96–100 (2007)
10. C.J. Thompson, S. Hahn, M. Oskin, Using modern graphics architectures for general-purpose
computing: a framework and analysis, in Proceedings 35th Annual IEEE/ACM International
Symposium on Microarchitecture (2002), pp. 306–317
11. T.J. Purcell, I. Buck, W.R. Mark, P. Hanrahan, Ray tracing on programmable graphics hardware,
in Proceedings of ACM SIGGRAPH 2002 ACM Transactions on Graphics (TOG), vol. 21
(2002), pp. 703–712
12. M. Rumpf, R. Strzodka, Level set segmentation in graphics hardware, in Proceedings of Image
Processing, vol. 3 (2001), pp. 1103–1106
13. NVIDIA Corporation, Nvidia cuda compute unified device architecture programming guide
(2015), https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf. Accessed 15
Nov 2015
14. NVIDIA Corporation, Nvidia cuda compute unified device architecture programming guide
(2012), https://www.cs.unc.edu/prins/Classes/633/Readings/CUDA_C_ProgrammingGuide_
4.2.pdf. Accessed 10 Aug 2016
15. NVIDIA Corporation, Whitepaper—nvidia geforce gtx 980 (2014), http://international.download.
nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF
16. C.M. Wittenbrink, E. Kilgariff, A. Prabhu, Fermi gf100 gpu architecture, in Proceedings of the
World Congress on Engineering, vol. 2 (2011)
17. J. Sanders, E. Kandrot, CUDA By Example: An Introduction to General-Purpose GPU Pro-
gramming. (Addison-Wesley, 2012)
18. NVIDIA Corporation, Whitepaper—nvidia geforce gtx 680 (2012), http://people.math.umass.edu/
johnston/M697S12/Nvidia_Kepler_Whitepaper.pdf. Accessed 13 Nov 2015
19. NVIDIA Corporation, Parallel Thread Execution ISA Application Guide v3.2 (2013), http://docs.
nvidia.com/cuda/pdf/Inline_PTX_Assembly.pdf. Accessed 13 Nov 2015
Chapter 3
State-of-the-Art in Pattern Recognition
Techniques
Abstract Pattern recognition, matching or discovery are terms associated with the
comparison of an input query, a pattern, with a time series sequence. These input
queries can be patterns similar to those presented in Chen (Essentials of Technical
Analysis for Financial Markets, 2010 [1]) or user-defined ones. Although the focus
will be on pattern matching techniques applied to financial time series, these techniques
have proved to be very versatile and extendable to different areas, from the medical
sector, with applications in Electrocardiogram (ECG) analysis, Chen et al. (Comput
Methods Programs Biomed 74:11–27, 2004 [2]), to the energy sector, with forecasting
and modelling of buildings' energy profiles, Iglesias and Kastner (Energies 6:579,
2013 [3]).
3.2 Perceptually Important Points
The previous technique tried to find inversion points in the time series in order to
build an equivalent time series. A similar approach is Perceptually Important Points
(PIP). As the name suggests, the PIP technique searches for points that are humanly
identifiable. The process starts with two fixed PIPs, the first point, p1, and the last
point, p2, of the time series P. The next PIP is obtained by maximizing the distance
between the line that unites two consecutive PIPs and a point in the time series
(Fig. 3.1). For instance, p3 is the result of maximizing the distance between the
segment p1p2 and P, while p4 and p5 maximize the distance between P and p1p3 and
p3p2, respectively. This is an iterative process in which each PIP generates two more,
meaning that there is no inherent stopping condition and up to len(dataset) − 1
PIPs are possible.
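The iterative selection just described can be sketched in Python (illustrative names; the vertical distance measure, one of the three discussed next, is used as the point-to-segment distance):

```python
def vd_to_segment(p, a, b):
    """Vertical distance from point p to the line through PIPs a and b."""
    (xa, ya), (xb, yb), (xp, yp) = a, b, p
    if xb == xa:
        return abs(yp - ya)
    return abs(ya + (yb - ya) * (xp - xa) / (xb - xa) - yp)

def find_pips(series, k):
    """Indices of k PIPs of series (values sampled at x = 0, 1, 2, ...)."""
    pts = [(float(i), float(y)) for i, y in enumerate(series)]
    pips = [0, len(pts) - 1]               # p1 and p2 are fixed
    while len(pips) < k:
        best_d, best_i = -1.0, None
        for a, b in zip(pips, pips[1:]):   # every segment between PIPs
            for i in range(a + 1, b):
                d = vd_to_segment(pts[i], pts[a], pts[b])
                if d > best_d:
                    best_d, best_i = d, i
        if best_i is None:                 # no points left to promote
            break
        pips.append(best_i)
        pips.sort()
    return pips

print(find_pips([0, 1, 5, 1, 0], 3))   # [0, 2, 4]: the spike is found first
```

With k = len(series) every point eventually becomes a PIP, matching the observation that the process has no inherent stopping condition.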
Although it was previously mentioned that a distance measure is used, a formal
description has not yet been given. The authors of [5] present three distinct
measures: Euclidean Distance (ED), Perpendicular Distance (PD) and Vertical
Distance (VD). The ED method maximizes the sum of the distances from each pair of
consecutive PIPs (p_i, p_j) to a possible test point, p_test, in the time series (Eq. 3.2).

ED(p_{test}, p_i, p_j) = \sqrt{(x_j - x_{test})^2 + (y_j - y_{test})^2} + \sqrt{(x_i - x_{test})^2 + (y_i - y_{test})^2}   (3.2)
PD instead uses the perpendicular distance from the test point to the line segment
that connects p_i to p_j. The slope of the line segment p_i p_j is given by Eq. 3.3,
while the foot of the perpendicular, p_c, on the line through p_i p_j can be calculated
using Eq. 3.4. Finally, the PIP is found by maximizing Eq. 3.5.

s = Slope(p_i, p_j) = \frac{y_j - y_i}{x_j - x_i}   (3.3)

x_c = \frac{x_{test} + s \cdot y_{test} + s^2 \cdot x_j - s \cdot y_j}{1 + s^2}, \qquad y_c = s \cdot (x_c - x_j) + y_j   (3.4)

PD(p_{test}, p_c) = \sqrt{(x_c - x_{test})^2 + (y_c - y_{test})^2}   (3.5)
The last measure presented is VD: the vertical distance (along the y-axis) between
the test point and the segment p_i p_j, calculated by Eq. 3.6.

VD(p_{test}, p_c) = \left| y_i + (y_j - y_i) \cdot \frac{x_c - x_i}{x_j - x_i} - y_{test} \right|   (3.6)
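Written directly from Eqs. 3.2–3.6, the three measures can be sketched as follows (Python; for VD the foot x_c coincides with the test point's x-coordinate):

```python
import math

def ed(p_test, p_i, p_j):
    """Euclidean distance (Eq. 3.2): sum of distances to both PIPs."""
    (xi, yi), (xj, yj), (xt, yt) = p_i, p_j, p_test
    return math.hypot(xj - xt, yj - yt) + math.hypot(xi - xt, yi - yt)

def pd(p_test, p_i, p_j):
    """Perpendicular distance (Eqs. 3.3-3.5) to the line through p_i, p_j."""
    (xi, yi), (xj, yj), (xt, yt) = p_i, p_j, p_test
    s = (yj - yi) / (xj - xi)                                # Eq. 3.3
    xc = (xt + s * yt + s * s * xj - s * yj) / (1 + s * s)   # Eq. 3.4
    yc = s * (xc - xj) + yj
    return math.hypot(xc - xt, yc - yt)                      # Eq. 3.5

def vd(p_test, p_i, p_j):
    """Vertical distance (Eq. 3.6), evaluated at the test point's x."""
    (xi, yi), (xj, yj), (xt, yt) = p_i, p_j, p_test
    yc = yi + (yj - yi) * (xt - xi) / (xj - xi)
    return abs(yc - yt)

# A point 3 above the midpoint of a flat segment: PD and VD both give 3.
print(pd((2.0, 3.0), (0.0, 0.0), (4.0, 0.0)))   # 3.0
print(vd((2.0, 3.0), (0.0, 0.0), (4.0, 0.0)))   # 3.0
```

On a sloped segment PD and VD diverge: PD measures the shortest distance to the line, VD only the vertical offset, which is why PD tends to be the more accurate (and slightly more expensive) measure.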
So far, only dimensionality reduction techniques for time series have been presented.
Following the same idea as ED, a template of the pattern can be searched for by
minimizing the point-to-point distance (Eq. 3.7) between the time series, P, and the
PIP template, T.

Vertical\ Distance(P, T) = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( p_{k,x} - t_{k,x} \right)^2}   (3.7)
With a vertical similarity measure established, only a horizontal measure is missing.
This measure (Eq. 3.8) must take into consideration possible time distortion
between the template and the time series.

Horizontal\ Distance(P, T) = \sqrt{\frac{1}{n-1} \sum_{k=2}^{n} \left( p_{k,y} - t_{k,y} \right)^2}   (3.8)

To determine whether a template matches the time series, a combination of Eq. 3.7 with
Eq. 3.8 is needed. A weighted method can be used, in which a weight factor is assigned
to both measures. Based on experiments, [5] suggests a weight factor of 0.5, where the
horizontal and vertical measures contribute equally to the final distance (Eq. 3.9).
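A sketch of the combined template distance follows; since Eq. 3.9 itself is not reproduced above, the weighted-sum form used here is an assumption, with w = 0.5 as suggested in [5]:

```python
import math

def vertical_distance(P, T):
    # Eq. 3.7; subscript conventions follow the text, with each point as a
    # (x, y) tuple.
    n = len(P)
    return math.sqrt(sum((P[k][0] - T[k][0]) ** 2 for k in range(n)) / n)

def horizontal_distance(P, T):
    # Eq. 3.8: the first point of each sequence is anchored, so k starts at 2.
    n = len(P)
    return math.sqrt(sum((P[k][1] - T[k][1]) ** 2 for k in range(1, n)) / (n - 1))

def template_distance(P, T, w=0.5):
    # Weighted combination of the two measures; the weighted-sum form is an
    # assumption standing in for Eq. 3.9, which is not reproduced here.
    return w * vertical_distance(P, T) + (1 - w) * horizontal_distance(P, T)

identical = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(template_distance(identical, identical))   # 0.0: perfect match
```

A distance of zero indicates a perfect match; in practice a threshold on this combined distance decides whether the template matches the framed subsequence.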
For evaluation, [5] used a dataset with 2532 points taken from the HSI index.
To select a default distance measure, a comparative test was performed in which PD
presented the highest accuracy of all methods while only being slower than VD. For
benchmarking, both template- and rule-based techniques were matched against PAA.
Of the three methods, the template-based PIP approach presented the best overall
results with 96% accuracy, followed by PAA with approximately 82% of patterns
correct, while the rule-based PIP had the worst results with an accuracy of around
38%.
The work in [6] introduced a hybrid approach combining a rule-based method
with Spearman's rank correlation coefficient to compare the degree of similarity
between two patterns. The authors use PIP with a sliding window and two types of
displacement: if the subsequence being tested matches a pattern, the window slides
W units, where W is the size of the window; if it does not match any pattern, the
window is shifted by one unit. With this, the authors expect to accelerate the overall
process without skipping most patterns. To determine whether the window should be
shifted or displaced, Spearman's rank correlation coefficient is applied to the PIP
values. For both the time series and the search pattern, the PIP values are converted
into ranks according to their value, so that a low PIP value corresponds to a low rank.
It is then possible to determine the level of similarity between the input time series
and the pattern, using Spearman's correlation coefficient (Eq. 3.10),
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}   (3.10)

where n is the number of ranks, in this case the number of PIPs, and d_i is the
difference between the ranks of PIP_i^{time series} and PIP_i^{pattern}. This
coefficient ranges from −1 to 1: if the magnitude of ρ is near 1, the framed time
series and the pattern are identical, while if it is near 0, they are not a match.
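The ranking-plus-correlation test of [6] can be sketched as follows (Python; tie handling is simplified to rank-by-position):

```python
def ranks(values):
    # Rank 1 = lowest value; ties are broken by position (a simplification).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman(series_pips, pattern_pips):
    # Eq. 3.10 applied to the two rank sequences.
    n = len(series_pips)
    d2 = sum((a - b) ** 2
             for a, b in zip(ranks(series_pips), ranks(pattern_pips)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Same shape at different scales: ranks agree, so rho = 1.
print(spearman([1.2, 3.4, 2.0, 5.1], [10, 30, 20, 50]))   # 1.0
# Mirrored shape: ranks are reversed, so rho = -1.
print(spearman([1, 2, 3, 4], [4, 3, 2, 1]))               # -1.0
```

Because only ranks are compared, the test is insensitive to amplitude differences between the framed subsequence and the pattern, which is exactly what makes it a cheap pre-filter for deciding the window displacement.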
Two different datasets were used to test the proposed technique, a synthetic one
and a real one. On the synthetic dataset, the rule-based method with Spearman rank
correlation outperforms the simple template method in finding common patterns—Multiple
Top, Head-and-Shoulders and Spikes. In 7 of the 8 input patterns, the technique has
an overall accuracy of 95%, only dropping to around 85% for the Spike Top pattern.
The real dataset consisted of information extracted from the HSI index over the
previous 21 years, and the results are very similar to those obtained on the synthetic
dataset, with an increase in the accuracy for Spike patterns given their low level of
occurrence [6].
A different approach was introduced in [7], where the authors present an evolutionary
pattern discovery approach using GAs and PIP, resorting to a clustering technique
to group similar patterns into corresponding clusters. This method starts with a
randomly generated initial population of size PSize, in which each individual is a
possible time series solution. The time series in a chromosome is then divided into k
clusters and evaluated with an appropriate fitness function, followed by several genetic
operations and individual selection. The process is iterated until the termination
criterion is met or the maximum number of generations is reached. To validate this
approach, the authors use a normalized and smoothed dataset of Taiwanese companies
dated from January 2005 to December 2006. As for the algorithm parameters, the
population size was set to 100 individuals, each with 6 clusters, with a crossover rate
of 0.8, a mutation rate of 0.3, and a stopping criterion of 300 generations.
The proposed technique shows decent results, detecting 2 known patterns, Double
Top and Double Bottom, and, additionally, Uptrend and Downtrend detection,
although the use of clusters does not seem entirely convincing, as some level of
abstraction is required when matching each cluster to the corresponding pattern.
Continuing the trend of evolutionary algorithms, [8] uses Neural Networks (NN)
to accurately detect the Head-and-Shoulders (HS) pattern. This method is based on
a two-layer feed-forward NN mechanism, the Self-Organizing Map (SOM), where
the output layer is formed by nodes or neurons and the input layer by the
training/validation data, with the objective of minimizing the distance between
a node and the input vector. The authors use a SOM with two nodes, meaning that
two possible “clusters” are allowed: one with sequences matching the HS pattern
and another with irrelevant patterns. The input vectors are created by transforming
the rule-based training patterns into rescaled 64 × 64 binary matrices, later
compressed into 16 × 16 matrices by summing the neighbours of the original matrix.
The authors report a recognition rate of 97.1% for the HS pattern, although this can
be disputed given that the result is highly dependent on the quality of the input
patterns: since this method relies on a dataset to train the network, a lower-quality
training set produces a less efficient network and, therefore, worse results.
3.3 Turning Points
As the name states, this method searches for Turning Points (TPs) in a time series;
considering a sliced time series, TPs represent local minima and maxima, indicating
the trend of the stock [9].
While iterating over a time series, a TP is found at t = i if the time series value, p_i,
is lower or higher than both of its neighbours, p_{i−1} and p_{i+1}, so that

f(p_{i-1}) > f(p_i) \text{ and } f(p_{i+1}) > f(p_i) \Rightarrow \text{Minimum}
f(p_{i-1}) < f(p_i) \text{ and } f(p_{i+1}) < f(p_i) \Rightarrow \text{Maximum}   (3.11)
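Basic TP detection per Eq. 3.11, before any simplification rules are applied, can be sketched as:

```python
# Sketch of raw turning-point detection (Eq. 3.11): a point is a TP when it
# is strictly below or strictly above both of its neighbours.
def turning_points(series):
    tps = []
    for i in range(1, len(series) - 1):
        if series[i - 1] > series[i] < series[i + 1]:
            tps.append((i, 'min'))
        elif series[i - 1] < series[i] > series[i + 1]:
            tps.append((i, 'max'))
    return tps

print(turning_points([3, 1, 2, 4, 2]))   # [(1, 'min'), (3, 'max')]
```

On real price data this raw pass produces many insignificant TPs, which is what the filtering simplifications that follow are designed to suppress.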
When compared to PLA, both methods locate the maximum and minimum values;
however, the TP technique applies a filter in which points with a low contribution
to the overall shape of the time series are suppressed. This is achieved through a set
of simplifications (Fig. 3.3):
• Case 1—If a down-trend time series is interrupted by a small temporary up-trend,
the maximum and minimum values, MAX_2 and MIN_1, created by this reversal
can be ignored as long as the difference between MAX_2 and MIN_1 is smaller than
MAX_1 minus MAX_2 plus the difference between MIN_1 and the next minimum,
MIN_2.
• Case 2—Similar to the first case: here an up-trend time series has a small trend
reversal, where MAX_1 and MIN_2 can be suppressed.
• Case 3—If an up-trend time series suffers a more noticeable trend reversal, MAX_1
and MIN_2 can be ignored if their values are close to the nearest maximum or
minimum, respectively.
• Case 4—Similar to the third case, but for a down-trend, with the same condition
applied.
In [9], the authors present a comparative study of TP against PIP, since both
techniques are based on extracting points from the original time series. The dataset
used was taken from the HSI market, dated from January 2000 to May 2010. The first
test evaluated the approximation error of both methods, defined as the sum of the
differences between each point i in the time series and the approximated series. The
TP method produced a reduced time series with a higher error than PIP, around 105%
higher in the worst case and 13% in the best, easily justified by the simplifications
used. The second test consisted of verifying the number of trends preserved; in this
case, TP performed better with, on average, 30% more trends preserved, mainly
because TP “is designed to extract as many trends as possible from the time series” [9].
A different approach was introduced by [10], where a stack is used to organize the
TPs based on their contribution to the overall shape of the time series. This stack is
then converted into an Optimal Binary Search Tree (OBST) in which the root holds
the TP with the highest weight or contribution, while the lower branches hold TPs
with little effect on the time series shape.
The test market used was the HSI with a timespan of 10 years, from 2000 to 2010,
for which a time series with 2586 TPs was created. The authors use rule- and
template-based pattern detection techniques to search for a Head-and-Shoulders
(HS) pattern and divide the tests into 3 categories depending on the size of the TP
time series used: C1 with 75%, C2 with 50% and C3 with 25% of the original TPs.
The first step of this technique is to reconstruct the reduced time series from the
stack, depending on the test category. Similar to other methods, this one also uses
a sliding window, in which the rule- and template-based methods are applied to
match the TP time series against the normalized HS pattern. At the end, all patterns
found by both methods are retrieved by the algorithm.
Table 3.1 Comparison between TP, PIP and PLA. Based on the results presented in [10], it is not
possible to state the error value accurately, although PLA has the lowest error, followed by PIP and
then TP. The error metric is the same as used by [9]: the sum of the differences between all
points in the time series and the approximated series

        Error   # Preserved trends   Execution time (ms)
TP      <100    1280                 406
PIP     <100    840                  1329
PLA     <100    700                  1515
Identical to the results presented by [9], the authors of [10] reached similar
conclusions when comparing TP with PIP and PLA (Table 3.1). TP preserves a higher
number of trends in a time series; however, when the overall error is calculated, PIP
outperforms TP, only losing to PLA when considering a low to medium number of
points, as expected from the previous conclusions, since PLA can recreate a time series
with higher fidelity. It is possible to correlate these results with the execution time,
where the error is inversely proportional to the execution time.
3.5 Shapelets
A supervised technique was introduced in [11] in which, instead of searching for a
specific pattern in a time series, the algorithm searches for the best pattern or
sub-sequence capable of characterizing a time series class (a group of similar time
series, T_i, e.g., species of leaves as used in [11]) with a unique identifier. All possible
solutions, or pivots, are computed from the original time series with fixed length l.
The objective is to divide the original dataset D into two sub-groups, D_L and D_R,
based on a threshold distance of the pivots, θ,

Distance(Pivot_j, T_i) \le \theta, \; \forall T_i \in D_L
Distance(Pivot_j, T_i) > \theta, \; \forall T_i \in D_R   (3.12)

and to maximize the quality of the dataset split, in other words, maximizing the
distance between elements of different classes. The pivot (j, i) of time series T_{i,C}
that maximizes the dataset split and θ will be used as the shapelet that
“identifies” class C.
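The split can be sketched as follows; [11] scores candidate splits by information gain, and a simple entropy-based gain over class labels is used here (illustrative names):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def split_gain(distances, labels, theta):
    """Information gain of splitting D into D_L (dist <= theta, Eq. 3.12)
    and D_R (dist > theta), given each series' distance to one pivot."""
    left = [c for d, c in zip(distances, labels) if d <= theta]
    right = [c for d, c in zip(distances, labels) if d > theta]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder

# A threshold that separates the classes perfectly recovers the full entropy.
print(split_gain([0.1, 0.2, 0.9, 1.0], ['A', 'A', 'B', 'B'], 0.5))   # 1.0
```

The shapelet search evaluates this gain for every candidate pivot and threshold, keeping the pair that best separates the classes.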
The shapelet rationale is identical to that used in Support Vector Machines
(SVMs), also a supervised classification algorithm, where there are two classes,
A and B, and the objective is to obtain a weight vector w that maximizes the
distance of each point in A and B from a hyperplane. The weight vector gives an
idea of the relative importance of each point in separating the two classes.
A previously unexplored system was used in [12], where, instead of the CPU,
the authors implement the shapelet algorithm on a GPU. The implementation
targeted the Fermi-based GTX 480 GPU (GF100 in Table 2.2), and their approach
relied on using a CUDA core to perform all calculations related to pivot (j, i), while
the thread blocks are organized in such a way that each block is responsible for
one time series T_i.
3.6 Conclusions
Several pattern recognition methods were presented, all with different approaches
to a common objective: a high-fidelity algorithm for pattern discovery or matching.
However, there is a question to be answered: how easily can these algorithms be
ported to a GPU?
The MPLA method with DTW showed good results in discovering and matching
patterns in a financial time series, although it presents some challenges in terms of
GPU optimization. To obtain the reduced time series through parallel execution, the
original series could be segmented into several time windows, each reduced locally
in a thread or block organization. This optimization introduces a new problem,
segments left undetected due to the transitions between windows, which would
require a new analysis at these transition points. On paper, DTW looks like the ideal
method for a GPU implementation, as it relies on a matrix algorithm capable of
exploiting a two-dimensional kernel organization; however, the dependency between
cells limits the overall number of threads or blocks that can be used.
The PIP approaches also demonstrated great success, with high levels of accuracy,
but they come with a serious setback: the PIP representation is highly dependent on
the total number of points in the dataset, which means that even an implementation
making ineffective use of the available GPU resources may still be faster. The effect
of this problem is minimized in the PIP-GA approach, as the evolution of the genetic
algorithm takes the majority of the execution time. A GPU implementation of TP
faces the same issues as PIP, due to the similarities in their data reduction approaches.
Finally, the SAX/GA approach. Considered separately, SAX and the GA present
the best chances of being ported to a GPU. SAX relies on N independent time
windows of size W, ideal for each window to be calculated in a thread organization,
where N threads are responsible for calculating the reduced series, or in a block
organization, in which W threads in a block calculate the distance between two points
in a window, for a total of N blocks. The GA, as previously explained, is also a good
candidate for implementation on a GPU, even though there are partial dependencies
between operators (Table 3.2).
References
1. J. Chen, Essentials of Technical Analysis for Financial Markets, 1st edn (2010)
2. W.-S. Chen, L. Hsieh, S.-Y. Yuan, High performance data compression method with
pattern matching for biomedical ecg and arterial pulse waveforms. Comput. Methods
Programs Biomed. 74(1), 11–27 (2004), http://www.sciencedirect.com/science/article/pii/
S0169260703000221
3. F. Iglesias, W. Kastner, Analysis of similarity measures in times series clustering for the dis-
covery of building energy patterns. Energies 6(2), p. 579 (2013), http://www.mdpi.com/1996-
1073/6/2/579
4. H. Li, C. Guo, W. Qiu, Similarity measure based on piecewise linear approximation and deriva-
tive dynamic time warping for time series mining. Expert Syst. Appl. 38(12), 14732–14743
(2011), http://www.sciencedirect.com/science/article/pii/S0957417411007901
5. T.-C. Fu, F.-l. Chung, R. Luk, C.-M. Ng, Stock time series pattern matching: template-based
versus rule-based approaches. Eng. Appl. Artif. Intell. 20(3), 347–364 (2007). https://doi.org/
10.1016/j.engappai.2006.07.003
6. Z. Zhang, J. Jiang, X. Liu, R. Lau, H. Wang, R. Zhang, A real time hybrid pattern match-
ing scheme for stock time series, in Proceedings of the Twenty-First Australasian Confer-
ence on Database Technologies—Volume 104, ser. ADC 2010, Darlinghurst, Australia, Aus-
tralia: Australian Computer Society, Inc. (2010), pp. 161–170, http://dl.acm.org/citation.cfm?
id=1862242.1862263
7. C.-H. Chen, V. Tseng, H.-H. Yu, T.-P. Hong, Time series pattern discovery by a pip-based evo-
lutionary approach. Soft Comput. 17(9), 1699–1710 (2013). https://doi.org/10.1007/s00500-
013-0985-y
8. A. Zapranis, P. Tsinaslanidis, Identification of the head-and-shoulders technical analysis pattern
with neural networks, in Artificial Neural Networks—ICANN 2010, ed. by K. Diamantaras, W.
Duch, L. Iliadis. Lecture Notes in Computer Science, vol. 6354 (Springer, Berlin Heidelberg,
2010) pp. 130–136. https://doi.org/10.1007/978-3-642-15825-4_17
9. J. Yin, Y. Si, Z. Gong, Financial time series segmentation based on Turning Points, in 2011
International Conference on System Science and Engineering (ICSSE) (2011) pp. 394–399,
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5961935
10. Y.-W. Si, J. Yin, OBST-based segmentation approach to financial time series. Eng. Appl.
Artif. Intell. 26(10), 2581–2596 (2013), http://www.sciencedirect.com/science/article/pii/
S0952197613001723
11. L. Ye, E. Keogh, Time series shapelets, in Proceedings of the 15th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining—KDD 2009 (2009), p. 947, http://
portal.acm.org/citation.cfm?doid=1557019.1557122
12. K.W. Chang, B. Deka, W.M.W. Hwu, D. Roth, Efficient pattern-based time series classification
on GPU, in Proceedings—IEEE International Conference on Data Mining, ICDM (2012), pp.
131–140, http://www.biplabdeka.net/files/icdm2012.pdf
13. A. Canelas, R. Neves, N. Horta, A SAX-GA approach to evolve investment strategies on
financial markets based on pattern discovery techniques. Expert Syst. Appl. 40(5), 1579–1590
(2013). https://doi.org/10.1016/j.eswa.2012.09.002
Chapter 4
SAX/GA CPU Approach
The SAX/GA algorithm [1, 2] was developed to optimize a trading strategy with multiple entry and exit patterns for the Standard & Poor's 500 (S&P 500) index, in either a long or a short position. In a long position, an investor purchases X shares of a stock at the day-i market value, expecting their value to increase. In a short position, an investor borrows X shares of a stock on day i from a brokerage firm and sells them at the day-i market value, thereby becoming indebted to the firm. The short investor expects a decrease in value; once the price has reached the prospected level, the investor purchases X shares of the same stock at the day i + n price and returns them to the brokerage firm, profiting the difference.
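The long/short profit logic described above can be sketched as follows; the function names, share count, and prices are illustrative assumptions, not values from the source:

```python
def long_profit(shares: int, buy_price: float, sell_price: float) -> float:
    """Long position: buy at day i, sell later; profits when the price rises."""
    return shares * (sell_price - buy_price)


def short_profit(shares: int, sell_price: float, buyback_price: float) -> float:
    """Short position: borrow and sell at day i, buy back at day i + n;
    profits when the price falls."""
    return shares * (sell_price - buyback_price)


# Hypothetical example: 100 shares, price moves from 50.0 down to 45.0.
print(long_profit(100, 50.0, 45.0))   # -500.0 (long loses as the price falls)
print(short_profit(100, 50.0, 45.0))  # 500.0 (short gains the difference)
```

Either position profits from the same price move of opposite sign, which is why the strategy searches for both entry (long) and exit (short) patterns.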
The algorithm uses daily historical data extracted from the S&P 500 index, which is then divided in a sliding-window fashion (Fig. 4.1). The initial training dataset, Dtrain with size Dtsize, is succeeded by a test dataset, Dtest with size Dvsize, used to validate the best strategy obtained during the training phase. Once testing is completed, Dtrain is shifted forward by Dvsize days and the process restarts, until the end of the dataset is reached or a stopping criterion is met.
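A minimal sketch of this sliding-window scheme, assuming a plain Python list of daily prices; the function name and the toy sizes (Dtsize = 4, Dvsize = 2) are illustrative, not from the source:

```python
def sliding_windows(data, train_size, test_size):
    """Yield (train, test) pairs; after each test period the training
    window is shifted forward by test_size days, as in Fig. 4.1."""
    start = 0
    while start + train_size + test_size <= len(data):
        train = data[start : start + train_size]
        test = data[start + train_size : start + train_size + test_size]
        yield train, test
        start += test_size


# Toy series of 10 "daily prices": three train/test splits are produced.
for train, test in sliding_windows(list(range(10)), 4, 2):
    print(train, test)
# [0, 1, 2, 3] [4, 5]
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
```

Note that successive training windows overlap by Dtsize − Dvsize days: only the shift size is fixed by the validation length, so each strategy is always validated on data it was never trained on.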
A Sound Offense
We want all of our players and coaches to understand and use our offensive terminology. One or two words either explain the descriptive action we want or identify some segment of the offense or the opposition's defense. The terms are simple, meaningful and descriptive.
The stance for the linemen, with the exception of the center, is basically the same, with allowances made for physical characteristics that vary from individual to individual. The inside foot is forward, the feet staggered in an arch-to-toe relationship. The tackles and ends exaggerate the stagger to heel-to-toe since they are further removed from the center and quarterback.
The feet should not be spread wider than the individual's shoulders, with the weight of the body concentrated on the balls of the feet. The heels should be turned slightly in, with the cleats on the heel of the forward foot almost touching the ground. The ankles should be bent slightly. The knees should be bent slightly more than 90 degrees and turned slightly in. The tail is even with or a little higher than the shoulders, splitting the forward and rear heels. The back is straight, shoulders square, neck relaxed, and eyes open, keeping the defensive linebacker in the line of sight. The hands are placed down slightly outside of the feet, elbows relaxed, and thumbs in and slightly forward of the shoulders.
The center lines up in a left-handed stance with the feet even and
slightly wider than the shoulders. The weight is on the balls of the
feet, heels turned slightly in, with the cleats on the heels of the shoes
almost touching the ground. The knees are slightly in and bent a little
more than 90 degrees. The tail is slightly higher than the shoulders
and about two inches in front of the heels. The center places his left hand inside his legs, down from between his eye and ear, almost directly under the forehead, with the fingers spread and the thumb turned slightly in. The shoulders are square, the back is straight, the neck is relaxed, and the eyes are looking upward. His right hand grasps
the football like a passer. He should reach out as far as possible
without changing his stance. The center is coached to place the ball
on his tail as quickly as possible with a natural turn of the arm. He
should drive out over the ball with his head coming up and tail down,
keeping his shoulders square as he makes his hand-back to the
quarterback.
Quarterback’s Stance
Halfback’s Stance
The feet of the halfback should not be wider than the shoulders, and staggered in a heel-to-toe relationship with each other. The weight should be on the balls of the feet, but will vary slightly depending upon the direction the halfback must move in carrying out his particular assignment. With the snap of the ball he should throw himself in the direction he is going, and he should not use a crossover step.
His knees should be bent a little beyond 90 degrees, with the
knees and heels turned slightly in, and the tail a little higher than the
shoulders. The halfback’s shoulders should be square, with his head
and eyes in a position to see the defensive linebacker on the
opposite side from him. The inside hand should be down, slightly
forward and inside of the knee with the thumb turned a little to the
inside. The body weight should be slightly forward.
Fullback’s Stance
The fullback lines up with the feet even and a little wider than his
shoulders. The cleats on the heels of his shoes should touch the
ground. The heels and knees are turned slightly in with the weight on
the balls of the feet. The head and eyes are in a relaxed position, but
where they can see the second man standing outside of the
offensive end. The hands are directly in front of each foot with the
thumbs turned in. The shoulders are square, the back is straight, the
tail is directly above the heels, with the weight slightly forward, but not to such an extent that he cannot start quickly in a lateral direction to either side.
When the linemen leave the huddle and come up to the line of
scrimmage in a pre-shift position (hands on knees in a semi-upright
stance), the basic split rule for the guards is to split one full man. The
tackles and ends will split slightly more than one full man. As the
linemen go down into their offensive stance, each man (except the
center) will move in, out or remain stationary, depending upon the
particular defensive alignment and the individual’s split rules.
Figure 99a: Even Defense
Figure 99b: Odd Defense
Never emphasize that you are splitting to get good blocking angles; you split in order to isolate a defender. If the defender splits when the offensive man splits, you can isolate him. If his split is static, a good blocking angle will result. Your linemen should never split merely to get the angle, however. It will also help the linemen if they have a clear picture of where the ball crosses the line of scrimmage (the critical point of attack), and of where the ball is being thrown from on a pass play. Then, too, there is no set rule that will cover all defensive situations, and the offensive men must be able to apply the common-sense split rule along with the basic split rule.
Figure 100 illustrates the pre-shift position of the right side of the
offensive line and the application of the guard’s, tackle’s, and end’s
split rules. From the pre-shift stance and position, the offensive men
are allowed to split one-half man either way, according to the
defense. The inside always must be protected. A defensive man
must not be allowed to penetrate or shoot the inside gap as he is
likely to stop the offensive play for a loss.
Figure 100
If the defensive man will move with the offensive man, then the
offense should be able to isolate one man and the point of attack
should be directed toward him. Figures 101a-b illustrate the center’s
man and the offensive right tackle’s man being isolated respectively,
and the critical point of attack being directed at the isolated
defenders.
Figure 101a
Figure 101b
Automatics
On the snap of the ball, the quarterback should dip his hips so his
hands will follow the tail of the center as he charges. This technique
will also help the quarterback push off. The quarterback will take the
ball with his right hand, using the left as a trapper, as was explained
previously. He should make certain he has the ball, without fighting it, before withdrawing his hands from the center's crotch. As
soon as the quarterback has possession of the ball, he should bring
it into his “third hand,” his stomach. Such a procedure will help
prevent a fumble. He then wants to push off and execute his
techniques as quickly as possible.
The quarterback must always be cognizant of the fact he cannot
score without the ball; consequently, he wants to make certain he
has possession of it before pulling out of there. If he gets in a big
hurry, he is likely to drop the ball to the ground. I have seen this
occur many times.
Quarterback Faking
When a ball carrier is in the open field, he should always keep the
tackler guessing. He should not tip off whether he is going to try to
outrun him, run through him, or dodge him, until he is close enough
to the tackler to give him the fake and then get by him. The ball
carrier should never concede he is down, and he should always
keep fighting to gain ground until the whistle stops the play.
The ball carrier should always realize and know exactly where he
is on the field, and just what he must do in order for the play to be
successful. In a majority of cases, a ball carrier should be concerned
only with running for a touchdown.