SPRINGER BRIEFS IN APPLIED SCIENCES AND
TECHNOLOGY  COMPUTATIONAL INTELLIGENCE

João Baúto · Rui Neves · Nuno Horta

Parallel Genetic
Algorithms for
Financial Pattern
Discovery Using
GPUs

SpringerBriefs in Applied Sciences
and Technology

Computational Intelligence

Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Systems Research Institute,
Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new develop-
ments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.

More information about this series at http://www.springer.com/series/10618


João Baúto · Rui Neves · Nuno Horta

Parallel Genetic Algorithms


for Financial Pattern
Discovery Using GPUs

João Baúto
Instituto Superior Técnico
Instituto de Telecomunicações
Lisbon, Portugal

Rui Neves
Instituto Superior Técnico
Instituto de Telecomunicações
Lisbon, Portugal

Nuno Horta
Instituto Superior Técnico
Instituto de Telecomunicações
Lisbon, Portugal

ISSN 2191-530X    ISSN 2191-5318 (electronic)
SpringerBriefs in Applied Sciences and Technology
ISSN 2520-8551    ISSN 2520-856X (electronic)
SpringerBriefs in Computational Intelligence
ISBN 978-3-319-73328-9    ISBN 978-3-319-73329-6 (eBook)
https://doi.org/10.1007/978-3-319-73329-6
Library of Congress Control Number: 2017963986

© The Author(s), under exclusive licence to Springer International Publishing AG, part of Springer
Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Maria, Manuel and Miguel
João Baúto

To Susana and Tiago


Rui Neves

To Carla, João and Tiago


Nuno Horta
Preface

The financial markets move vast amounts of capital around the world. This fact, together with the easy access to manual or automated trading, has attracted the interest of all types of investors, from the “man on the street” to academic researchers. These new investors and the automatic trading systems, in turn, influence market behavior. To adapt to this new reality, the domain of computational finance has received increasing attention from both the finance and the computational intelligence communities.
The main driving force in the field of computational finance, with application to financial markets, is to define highly profitable and less risky trading strategies. To accomplish this objective, the defined strategies must process large amounts of data, including financial market time series, fundamental analysis data and technical analysis data, and must produce appropriate buy and sell signals for the selected financial market securities. What may appear, at first glance, to be an easy problem is, in fact, a huge and highly complex optimization problem, which cannot be solved analytically. This makes soft computing, and the computational intelligence domain in general, especially appropriate for addressing it.
The use of chart patterns is widespread among traders as an additional tool for decision making. Chartists, as these analysts are known, try to identify known pattern formations and, based on previous appearances, predict future market trends. Visual pattern identification is hard and error-prone, and patterns in real financial time series are never as clean as the images in the books, so any solution that helps with this task is welcome. At the same time, the general availability of GPU boards presents an excellent alternative execution system to traditional CPU architectures for coping with high-speed processing requirements at relatively low cost.
This work explores the benefits of putting together a low-cost high-performance computing solution, a GPU-based architecture, and a state-of-the-art computational finance approach, SAX/GA, which combines the Symbolic Aggregate approXimation (SAX) technique with an optimization kernel based on genetic algorithms (GA). The SAX representation is used to describe the financial time series so that relevant patterns can be efficiently identified. The evolutionary optimization kernel is used to identify the most relevant patterns and generate investment rules. The SAX technique uses an alphabetic symbolic representation of the data defined by adjustable parameters; to capture and preserve the essence of the explored financial time series, a search for the optimal combination of SAX parameters is presented. The proposed approach considers a tailored implementation of the SAX/GA technique on a GPU-based architecture in order to improve its computational efficiency. The approach was tested using real data from the S&P500, and the achieved results show that it outperforms the CPU alternative with speed gains of up to 200 times.
The book is organized in seven chapters as follows:
• Chapter 1 presents a brief description of the problem addressed by this book,
namely investment optimization based on pattern discovery techniques and
high-performance computing based on GPU architectures. Additionally, the
main goals of the work presented in this book, as well as the document’s
structure, are also highlighted in this chapter.
• Chapter 2 discusses fundamental concepts, key to understand the proposed
work, such as pattern recognition or matching, GAs and GPUs.
• Chapter 3 presents a review of the state-of-the-art pattern recognition techniques
with practical application examples.
• Chapter 4 addresses the CPU implementation of the SAX/GA algorithm along
with a detailed explanation of the genetic operators involved. A benchmark
analysis discusses the performance of SAX/GA and introduces possible loca-
tions to accelerate the algorithm.
• Chapter 5 presents the developed solutions along with previous attempts to
accelerate the SAX/GA algorithm. Each solution started as a prototype that
evolved based on the advantages and disadvantages identified.
• Chapter 6 discusses the experimental results obtained for each solution and
compares them to the original implementation. Solutions are evaluated based on
two metrics, the speedup and the ROI indicator.
• Chapter 7 summarizes the book and presents the respective conclusions and
future work.

Lisbon, Portugal João Baúto


Rui Neves
Nuno Horta
Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Book Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Time Series Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Euclidean Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 Dynamic Time Warping . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.3 Piecewise Linear Approximation . . . . . . . . . . . . . . . . . . . . 6
2.1.4 Piecewise Aggregate Approximation . . . . . . . . . . . . . . . . . 7
2.1.5 Symbolic Aggregate approXimation . . . . . . . . . . . . . . . . . 8
2.2 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1 Selection Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Crossover Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.3 Mutation Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Graphics Processing Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 NVIDIA’s GPU Architecture Overview . . . . . . . . . . . . . . . 13
2.3.2 NVIDIA’s GPU Architectures . . . . . . . . . . . . . . . . . . . . . . 15
2.3.3 CUDA Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 State-of-the-Art in Pattern Recognition Techniques . . . . . . . . . . . . . 21
3.1 Middle Curve Piecewise Linear Approximation . . . . . . . . . . . . . . 21
3.2 Perceptually Important Points . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Turning Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


3.4 Symbolic Aggregate approXimation . . . . . . . . . . . . . . . . . . . . . . . 28


3.5 Shapelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 SAX/GA CPU Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1 SAX/GA CPU Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.1 Population Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.2 Fitness Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.3 Population Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.1.4 Chromosome Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.1.5 Individual Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 SAX/GA Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5 GPU-Accelerated SAX/GA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1 Parallel SAX Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.1 Prototype 1: SAX Transformation On-Demand . . . . . . . . . 45
5.1.2 Prototype 2: Speculative FSM . . . . . . . . . . . . . . . . . . . . . 47
5.1.3 Solution A: SAX/GA with Speculative GPU
SAX Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2 Parallel Dataset Training . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2.1 Prototype 3: Parallel SAX/GA Training . . . . . . . . . . . . . 55
5.2.2 Solution B: Parallel SAX/GA Training with GPU
Fitness Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3 Fully GPU-Accelerated SAX/GA . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3.1 Population Generation Kernel . . . . . . . . . . . . . . . . . . . . . . 61
5.3.2 Population Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.3 Gene Crossover Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3.4 Gene Mutation Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3.5 Execution Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.1 SAX/GA Initial Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2 Study Case A: Execution Time . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2.1 Solution A: SAX/GA with Speculative FSM . . . . . . . . . . . 68
6.2.2 Solution B: Parallel Dataset Training . . . . . . . . . . . . . . . . 74
6.2.3 Solution C: Fully GPU-Accelerated SAX/GA . . . . . . . . . . 77
6.3 Study Case B: FSM Prediction Rate . . . . . . . . . . . . . . . . . . . . . . 81
6.4 Study Case C: Quality of Solutions . . . . . . . . . . . . . . . . . . . . . . . 84


6.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Acronyms

Computation and GPU Related


ALU Arithmetic Logic Unit
API Application Programming Interface
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
DirectX Collection of Microsoft APIs for 2D and 3D graphics rendering
EA Evolutionary Algorithm
FPU Floating Point Unit
FSM Finite-State Machine
GA Genetic Algorithm
GB Gigabyte
GDDR Graphics Double Data Rate
GPC Graphics Processor Cluster
GPGPU General Purpose Graphic Processing Unit
GPU Graphic Processing Unit
ISA Instruction Set Architecture
KB Kilobyte
MB Megabyte
MM Memory Module
NN Neural Networks
OBST Optimal Binary Search Tree
PCIe Peripheral Component Interconnect Express
PM Processing Module
PS Particle Swarm
PTX Parallel Thread eXecution
SIMT Single-Instruction Multiple-Thread
SFU Special Function Unit
SM Streaming Multiprocessor
SMX Next-Generation Streaming Multiprocessor
SMM Streaming Multiprocessor in Maxwell Architecture


SP Streaming Processor
SVM Support Vector Machine
TPC Texture/Processor Cluster
TSA Tabu Search Algorithm

Time Series Related


SAX Symbolic Aggregate approXimation
eSAX Extended SAX
iSAX Indexable SAX
DFT Discrete Fourier Transform
PLA Piecewise Linear Approximation
PAA Piecewise Aggregate Approximation
SVD Singular Value Decomposition
ED Euclidean Distance
PD Perpendicular Distance
VD Vertical Distance
DTW Dynamic Time Warping
DDTW Derivative Dynamic Time Warping
MPLA Middle Curve Piecewise Linear Approximation
DPLA Divisive Piecewise Linear Approximation
PIP Perceptually Important Points
SOM Self-Organizing Maps
TP Turning Point

Investment Related
B&H Buy & Hold
C/F Ratio Crossover/Fitness Ratio
HSI Hong Kong Hang Seng Index
IL Enter Long
IS Enter Short
NYSE New York Stock Exchange
OL Exit Long
OS Exit Short
ROI Return on Investment
RSI Relative Strength Index
S&P500 Standard & Poor 500

Others
ECG Electrocardiogram
Chapter 1
Introduction

Abstract This chapter presents a brief description of the scope of the problem
addressed in the book: the performance and optimization of algorithms based on
pattern discovery. Additionally, the main goals to be achieved by this work are
discussed, along with a breakdown of the document’s structure.

Keywords Computational finance · High-performance computing · Graphics
Processing Unit

The financial system as it is currently known did not spring from an idea invented
a century ago; it evolved from the age-old human practice of trading goods into the
current stock markets, where goods are traded for a monetary value.
Stock markets like the New York Stock Exchange (NYSE) and the Hong Kong Hang
Seng Index (HSI) are responsible for moving tremendously high amounts of capital.
They connect investors from different corners of the world around one common
objective: trading. Trading must occur in real time, with stock prices displayed
without any delay and simultaneously to all parties involved. Once presented with
the stock prices, investors have two main types of analysis, fundamental or
technical, on which to base their decisions. Some investors are interested in a
company’s position in relation to social or political ideologies, while others
focus on the raw numbers.
The author of [1] discusses a question with its fair share of interest: “to what
extent can the past history of a common stock’s price be used to make meaningful
predictions concerning the future price of the stock?”. The motto of technical analysis
depends heavily on this question, and there is evidence supporting the
approach. If the past history of a stock can reveal future movements, one can try
to identify points in history that reflect those movements and use them for future
decisions. These points, or patterns, are one of the most interesting topics of technical
analysis, and identifying them has posed a true challenge.


1.1 Motivation

The main objective of participating in trading on financial markets is to maximize
the Return on Investment (ROI). An investor can make one of two broad decisions:
play the safe, low-risk game of buying a stock and holding it for a long period of
time, known as the Buy & Hold (B&H) strategy, or enter the high-risk, high-reward
play of designing a trading strategy that involves multiple entries (open positions)
and exits (close positions) in the market.
Creating a custom trading strategy that uses patterns as decision points for entering
or exiting the market can be a long and tedious optimization process in which multiple
candidate decision sets must be tested against large time series. Researchers have
tried to ease the optimization process by reducing datasets while maintaining a
precise representation with minimal data loss; however, this is a trade-off between
lower execution time and a less accurate trading decision set.
Using a different execution system is the main idea behind resolving the previous
trade-off: exploiting the characteristics of current state-of-the-art algorithms on
many-core systems while combining different time series representations. The
Central Processing Unit (CPU) was for a long time the main system used to execute
algorithms with heavy workloads; however, with the increasing demand for
computational resources, an alternative system had to be found. NVIDIA’s Graphic
Processing Unit (GPU) as it is known today did not appear until 2006, but
researchers were already using graphics hardware to accelerate highly parallel
algorithms that resembled graphical programs, such as the rendering of a 2D scene.
The GPU architecture presents itself as an excellent alternative execution system:
not only was it designed to process high volumes of information, but open access
to a high-level Application Programming Interface (API) also allows fine-grained
control of the GPU.

1.2 Goals

The objective of this work is to study and understand whether the Symbolic Aggregate
approXimation (SAX)/Genetic Algorithm (GA) algorithm can take advantage of
many-core systems, such as NVIDIA’s GPUs, to reduce the execution time of the
sequential CPU implementation. SAX/GA is an algorithm designed to optimize trading
strategies for the stock market, and the whole algorithm was implemented so that
it could explore a vast search space using small populations of individuals. The
authors of SAX/GA [2, 3] found the need for aggressive genetic operators capable
of preventing the algorithm from settling into static behaviour and circling
around identical solutions.

The first step is to analyse the performance of SAX/GA and understand where the
causes of the prolonged execution time lie. Once the bottlenecks are identified,
different GPU optimization strategies are presented and compared to the original
CPU algorithm in terms of solution accuracy and speedup (Eq. 1.1).

Speedup = \frac{\text{CPU Exec. Time of SAX/GA}}{\text{CPU + GPU Exec. Time of Solution } x}    (1.1)
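As a minimal illustration of Eq. 1.1 (the timing figures below are invented, not measurements from the book):

```python
def speedup(cpu_time_baseline, hybrid_time):
    """Speedup of a CPU+GPU solution over the sequential CPU SAX/GA (Eq. 1.1)."""
    return cpu_time_baseline / hybrid_time

# Invented numbers for illustration: a 400 s CPU-only run against a
# candidate solution taking 2 s of combined CPU + GPU time.
print(speedup(400.0, 2.0))  # 200.0
```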

1.3 Book Outline

This book is organized as follows:


• Chapter 2 discusses fundamental concepts, key to understand the proposed work,
such as pattern recognition or matching, GAs and GPUs.
• Chapter 3 presents a review of the state-of-the-art pattern recognition techniques
with practical application examples.
• Chapter 4 addresses the CPU implementation of the SAX/GA algorithm along
with a detailed explanation of the genetic operators involved. A benchmark anal-
ysis discusses the performance of SAX/GA and introduces possible locations to
accelerate the algorithm.
• Chapter 5 presents the developed solutions along with previous attempts to accel-
erate the SAX/GA algorithm. Each solution started as a prototype that evolved
based on the advantages and disadvantages identified.
• Chapter 6 discusses the experimental results obtained for each solution and com-
pares them to the original implementation. Solutions are evaluated based on two
metrics, the speedup and the ROI indicator.
• Chapter 7 concludes the developed work and indicates aspects of the SAX/GA
algorithm that can be improved in the near future.

References

1. E.F. Fama, The behavior of stock-market prices. J. Bus. 38(1), 34–105 (1965)
2. A. Canelas, R. Neves, N. Horta, A SAX-GA approach to evolve investment strategies on
financial markets based on pattern discovery techniques. Expert Syst. Appl. 40(5), 1579–1590
(2013). https://doi.org/10.1016/j.eswa.2012.09.002
3. A. Canelas, R. Neves, N. Horta, Multi-dimensional pattern discovery in financial time series
using SAX-GA with extended robustness, in GECCO (2013). https://doi.org/10.1145/2464576.2464664
Chapter 2
Background

Abstract This chapter presents fundamental concepts required to fully understand
the topics discussed: first, a brief introduction to concepts related to pattern
matching and time series dimensionality reduction, followed by a historical and
architectural review of GPUs. Time series analysis is one of the pillars of
technical analysis in financial markets. Analysts use variations in a stock’s price
and trading volume, in combination with several well-known technical indicators
and chart patterns, to forecast the future price of a stock, or at least to speculate
whether the price will increase or decrease. However, the widespread use of these
indicators and patterns may indirectly influence the direction of the market, causing
it to converge onto chart patterns that investors recognize.

Keywords Technical analysis · Time series analysis · Piecewise Aggregate
Approximation · Symbolic Aggregate approXimation · Genetic algorithms ·
GPU · CUDA

2.1 Time Series Analysis

Searching for chart patterns may seem a simple process, in which two patterns or
time series from different periods are compared and analysed for similarities,
but it is not that trivial, as will be demonstrated later. In the following sections,
P denotes the reference time series while Q denotes the time series tested for
similarity with P.

2.1.1 Euclidean Distance

This procedure is the basis for several pattern matching techniques presented later.
Given two time series, P = (p_1, p_2, \ldots, p_i, \ldots, p_n) and Q =
(q_1, q_2, \ldots, q_j, \ldots, q_n), the Euclidean Distance (ED) method iterates
through both series and accumulates the distance between corresponding points (Eq. 2.1).

ED(P, Q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}    (2.1)

At first sight it is possible to observe some important issues. What if the two time
series have different magnitudes, or different alignments? With different magnitudes,
applying the ED method would be pointless, as its main feature is direct
spatial comparison. The same happens with different alignments: the two
series may be equal, or at the very least partially similar, but since they are shifted or
unaligned, a direct match will not be found.
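The point-to-point computation of Eq. 2.1 can be sketched as follows (a minimal Python illustration, not the book’s implementation; the example series are invented to show the alignment problem):

```python
import math

def euclidean_distance(p, q):
    """Point-to-point Euclidean distance between two equal-length series (Eq. 2.1)."""
    if len(p) != len(q):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Identical series are at distance 0, but a shifted copy is not,
# illustrating the alignment issue discussed above.
p = [1.0, 2.0, 3.0, 2.0, 1.0]
q = [2.0, 3.0, 2.0, 1.0, 1.0]  # same shape, shifted left by one step
print(euclidean_distance(p, p))  # 0.0
print(euclidean_distance(p, q))  # 2.0 despite the similar shape
```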

2.1.2 Dynamic Time Warping

An alignment technique, Dynamic Time Warping (DTW) [1], can be used to solve the
previous problem. This approach aligns two time series, P = (p_1, p_2, \ldots, p_i, \ldots, p_n)
and Q = (q_1, q_2, \ldots, q_j, \ldots, q_m), using an n \times m matrix D. First, for each pair (i, j)
in D, the distance (p_i - q_j)^2 is calculated. The warping, or alignment, path W is
obtained by minimizing the cumulative distance defined by

\gamma(i, j) = D(i, j) + \min[\gamma(i-1, j),\ \gamma(i-1, j-1),\ \gamma(i, j-1)]    (2.2)

Enumerating all possible paths through D is a costly problem with exponential complexity.
To reduce the search space, and therefore the number of possible paths, some restrictions
are imposed on the initial problem. Points in the warping path must be monotonically
ordered, i_{k-1} \le i_k and j_{k-1} \le j_k, preventing a sub-sequence from being
matched both a priori and a posteriori. The resulting path must be continuous,
starting at (i = 1, j = 1) and ending at (i = n, j = m), so that the algorithm does
not ignore sub-sequences and lock into partial solutions. A warping path that follows
the diagonal of D indicates that the input query Q is completely aligned with the
time series P and they are therefore similar.
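The recurrence of Eq. 2.2 can be filled by dynamic programming; a minimal Python sketch follows (illustrative only, not the book’s code; the boundary handling reflects the continuity constraint described above):

```python
import math

def dtw_distance(p, q):
    """Cumulative DTW cost per Eq. 2.2, computed by dynamic programming."""
    n, m = len(p), len(q)
    # gamma[i][j] = best cumulative distance aligning p[:i] with q[:j]
    gamma = [[math.inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (p[i - 1] - q[j - 1]) ** 2
            gamma[i][j] = d + min(gamma[i - 1][j],      # step in p only
                                  gamma[i - 1][j - 1],  # diagonal match
                                  gamma[i][j - 1])      # step in q only
    return gamma[n][m]

# A shifted copy that defeats point-to-point ED aligns perfectly here:
p = [1.0, 2.0, 3.0, 2.0, 1.0]
q = [1.0, 1.0, 2.0, 3.0, 2.0, 1.0]
print(dtw_distance(p, q))  # 0.0: the warping path absorbs the shift
```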

2.1.3 Piecewise Linear Approximation

DTW solves the alignment problem, but at the cost of increased computation time.
Some optimizations can be applied to the DTW algorithm, but the main issue would
remain untouched: the dataset itself. Financial time series tend to show small
variations in value over a time period and, taking this into consideration, some
points of the dataset can be eliminated.
With a sliding-window Piecewise Linear Approximation (PLA) approach, the time
series is condensed into a representation using N breakpoints, where each breakpoint
is the last point that satisfied a threshold condition. The series can then be
approximated by one of two methods: linear interpolation, connecting the breakpoints
into linear segments, or linear regression, where the sub-sequence between
breakpoints a and b is approximated by its best-fitting line [2]. A linearly rising
or falling segment does not describe the behaviour of the time series during that
period, only that the starting value was lower or higher than the end value; this
implies that, between two segments, the series can undergo a trend switch that may
not be caught until the next segment.
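One possible sketch of the sliding-window breakpoint search described above (the threshold test and the linear-interpolation error measure are illustrative assumptions; the referenced work [2] may differ in detail):

```python
def pla_breakpoints(series, threshold):
    """Sliding-window PLA sketch: grow a window until the linear
    interpolation between its endpoints deviates from the data by more
    than `threshold`; the last point that still satisfied the condition
    becomes a breakpoint."""
    breakpoints = [0]
    start = 0
    for end in range(2, len(series)):
        x0, y0 = start, series[start]
        slope = (series[end] - y0) / (end - x0)
        # worst deviation of the interpolating segment over the window
        error = max(abs(y0 + slope * (k - x0) - series[k])
                    for k in range(start, end + 1))
        if error > threshold:
            breakpoints.append(end - 1)
            start = end - 1
    breakpoints.append(len(series) - 1)
    return breakpoints

# A flat stretch followed by a linear ramp condenses to two segments.
print(pla_breakpoints([0, 0, 0, 0, 5, 10, 15], 0.5))  # [0, 3, 6]
```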

2.1.4 Piecewise Aggregate Approximation

Piecewise Aggregate Approximation (PAA) is similar to PLA but, instead of linear
segments, it uses the average of N equal-size time series windows, making it a far
less complex and time-consuming algorithm.
To extract meaningful information from PAA, or from any other time series
representation used for sequence comparison, the dataset must fulfil one condition:
it needs to be normalized, otherwise the baseline is off scale. Dataset normalization
grants the ability to compare sub-sequences of the dataset with different magnitudes,
and can be obtained through the standard score (Eq. 2.3):

x'_i = \frac{x_i - \mu}{\sigma}    (2.3)

where x'_i is the normalized value of x_i, \mu is the mean of X and \sigma is the
standard deviation of X.

Once the dataset is normalized, PAA reduces a time series of dimension N into W
time windows of size N/W, where N/W must be an integer value, otherwise Eq. 2.4 is
not valid. An implementation of PAA with a non-integer number of windows is presented
in [3], where border elements of two windows contribute partially to both windows.
For each window, the mean value is calculated (Eq. 2.4) and assigned to represent
that time window, as shown in Fig. 2.1.

Fig. 2.1 PAA method



p̄i = (w/n) · Σ xj ,  j = (n/w)·(i − 1) + 1, …, (n/w)·i    (2.4)

where p̄i is the ith element of the approximated time series, xj is the jth element of
the original time series, n is the size of the original time series and w is the size of
the PAA time series.

In order to discover similarities in time series, PAA uses an ED-based formula
where, instead of the point-to-point calculation, it uses the mean values of the reduced
series (Eq. 2.5):

Distance(P, Q) = √(n/w) · √( Σ_{i=1}^{w} (p̄i − q̄i)² )    (2.5)
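The window averaging of Eq. 2.4 and the lower-bounding distance of Eq. 2.5 can be sketched as follows; the names are illustrative and the sketch assumes n/w is an integer, as required above.

```python
import math

def paa(series, w):
    """Reduce an n-point series to w windows of equal size n/w,
    each represented by its mean (Eq. 2.4)."""
    n = len(series)
    assert n % w == 0, "n/w must be an integer for Eq. 2.4"
    size = n // w
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(w)]

def paa_distance(p_bar, q_bar, n):
    """Distance between two PAA series of length w (Eq. 2.5)."""
    w = len(p_bar)
    return math.sqrt(n / w) * math.sqrt(
        sum((p - q) ** 2 for p, q in zip(p_bar, q_bar)))
```

The √(n/w) factor rescales the w-point distance back to the original series length, which is what makes the measure a lower bound on the Euclidean distance of the full series.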

2.1.5 Symbolic Aggregate approXimation

SAX [4] can be viewed as an improvement over PAA: it still uses this method to
obtain a dimensionally reduced time series but adds a new type of data transformation,
from numeric to symbolic representation.
This transformation relies on a Normal distribution, N ∼ (0, 1), with αn intervals,
where the probability between the z-score of αi+1 (βi+1) minus the z-score of αi (βi)
must be equal to 1/αn, and each interval is considered a symbol. For example, with
αn = 3, there are 3 intervals, all of them with equal probability of 33.3% and with
symbolic representation,

α = 'A' iff −∞ < ci < β1
α = 'B' iff β1 < ci < β2    (2.6)
α = 'C' iff β2 < ci < +∞

In Fig. 2.2, frame 3 (c3) has an average value of 1.5 and, considering an alphabet
with 3 letters (αn = 3), from Table 2.1 and Eq. 2.6 it is possible to assess that c3
lies between β2 and ∞ and, therefore, the corresponding letter is 'C'. This method
ensures that, in an alphabet containing all possible symbols in the SAX representation,
each symbol has equal probability, allowing a direct comparison. The z-score values
(Table 2.1) were obtained from [4, 5].
Now that the PAA series is normalized and the z-scores of αn are known, the SAX
representation can be easily obtained. To each segment of PAA (ci) a corresponding
αn interval is assigned, so that α satisfies conditions similar to those in
Eq. 2.6. The transformation in Fig. 2.2 compressed a window of size 500
into a SAX sequence with size = 10 and an alphabet of 3 letters.
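The numeric-to-symbol mapping for a 3-letter alphabet can be sketched with the breakpoints of Table 2.1 (β1 = −0.431, β2 = 0.431); the function names are illustrative, and the handling of values exactly on a breakpoint is an arbitrary convention of this sketch.

```python
import bisect

# z-score breakpoints of N(0, 1) for a 3-letter alphabet (Table 2.1)
BREAKPOINTS_3 = [-0.431, 0.431]
ALPHABET = "ABC"

def sax_symbol(paa_value, breakpoints=BREAKPOINTS_3, alphabet=ALPHABET):
    """Map one PAA mean to the symbol of the interval it falls in (Eq. 2.6)."""
    return alphabet[bisect.bisect_left(breakpoints, paa_value)]

def sax(paa_series):
    """Symbolize a whole (already normalized) PAA series."""
    return "".join(sax_symbol(v) for v in paa_series)
```

The frame from the text, c3 = 1.5, falls above β2 = 0.431 and maps to 'C', as in Fig. 2.2.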
Until this point, there is not much of an improvement, since there is still no way to
compare two time series, the input and the search pattern. The authors of SAX faced
a problem: how to compare two series if they are represented in a string format?

Fig. 2.2 Transformation of a PAA series into a SAX series with 3 symbols

Table 2.1 Z-scores of Normal(0, 1) for αn = [2, 10]

Z-score (βj)   Alphabet size (αn)
                2       3       4       5       6       7       8       9       10
β1             0.000  −0.431  −0.675  −0.842  −0.968  −1.068  −1.150  −1.221  −1.282
β2             –       0.431   0.000  −0.253  −0.431  −0.566  −0.675  −0.765  −0.842
β3             –       –       0.671   0.255   0.000  −0.180  −0.319  −0.431  −0.524
β4             –       –       –       0.842   0.431   0.180   0.000  −0.140  −0.253
β5             –       –       –       –       0.967   0.566   0.317   0.140   0.000
β6             –       –       –       –       –       1.068   0.671   0.431   0.253
β7             –       –       –       –       –       –       1.150   0.765   0.524
β8             –       –       –       –       –       –       –       1.221   0.842
β9             –       –       –       –       –       –       –       –       1.282

It is possible to know if both series are equal but not if they are similar. Lin et al.
[4] needed to redefine the distance measure so that two symbolic series could be
compared. Similar to the PAA distance, this new distance measure is defined by,

MINDIST(P̂, Q̂) = √(n/w) · √( Σ_{i=1}^{w} dist(p̂i, q̂i)² )    (2.7)

At first sight, Eq. 2.7 is essentially equal to the one used in PAA. However, a new
element was added: the dist(·) function. This function (Eq. 2.8) calculates the
distance between two symbols based on the z-score values used in the transformation from
numeric to symbolic representation. For instance, with an alphabet of 4 symbols, the
distance between 'A' and 'C' is given by the z-score below 'C' (β2) minus the
z-score above 'A' (β1). In the case of neighbouring symbols, such as 'A'–'B' or
'C'–'D', the distance is evaluated as zero.


dist(p̂i, q̂j) = 0,            if |i − j| ≤ 1
             = βj−1 − βi ,   if i < j − 1    (2.8)
             = βi−1 − βj ,   if i > j + 1

where i and j are the alphabet indices of the two symbols being compared.
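Eqs. 2.7 and 2.8 can be sketched as follows, here with the 4-symbol breakpoints of Table 2.1 (−0.675, 0, 0.675); the names are illustrative.

```python
import math

BREAKPOINTS_4 = [-0.675, 0.0, 0.675]  # Table 2.1, alphabet size 4

def symbol_dist(i, j, breakpoints):
    """dist() of Eq. 2.8 on 0-based symbol indices: equal or adjacent
    symbols are distance zero, otherwise the gap between breakpoints."""
    if abs(i - j) <= 1:
        return 0.0
    lo, hi = min(i, j), max(i, j)
    return breakpoints[hi - 1] - breakpoints[lo]

def mindist(s1, s2, n, breakpoints, alphabet="ABCD"):
    """MINDIST of Eq. 2.7 between two SAX strings of length w,
    where n is the length of the original series."""
    w = len(s1)
    return math.sqrt(n / w) * math.sqrt(sum(
        symbol_dist(alphabet.index(a), alphabet.index(b), breakpoints) ** 2
        for a, b in zip(s1, s2)))
```

As in the text, 'A' and 'C' are β2 − β1 = 0.675 apart, while neighbouring symbols contribute nothing to the distance.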

SAX symbolic representation can produce a very compact and efficient time
series; however, it is subject to a particular problem, mainly caused by PAA. Since the
symbolic representation of each window is calculated using the average value of the
series in that window, it cannot accurately represent a trend, as important points will
be ignored. An alternative solution, Extended SAX (eSAX) [6], can be used to fix
this issue. Instead of only considering the average value of the frame, two additional
points are added: the maximum and minimum values of the frame. These values
compose a string of ordered triplets, < vmin , vavg , vmax >, that can help understand
the behaviour inside each frame.
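A minimal sketch of the eSAX frame triplets (the subsequent symbolization of each component is omitted, and the name is illustrative):

```python
def esax_triplets(series, w):
    """eSAX frame representation: for each of the w equal-size frames,
    record (min, mean, max) before symbolization [6]."""
    n = len(series)
    size = n // w
    out = []
    for i in range(w):
        frame = series[i * size:(i + 1) * size]
        out.append((min(frame), sum(frame) / len(frame), max(frame)))
    return out
```

Two frames with the same mean but different extremes, which plain PAA cannot tell apart, produce distinct triplets here.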

2.2 Genetic Algorithm

Algorithms are methods that transform input data, through a set of operations, into
an output that is a solution to a specific problem. However, sometimes, finding a
solution may not be so straightforward. A particular type of problem falls into the class
of optimization problems, where an approximate and less time-consuming solution is acceptable
instead of a more accurate but more time-costly one. To tackle these problems, researchers
switched to a different field of algorithms, Evolutionary Algorithms (EAs), also taking
advantage of innovative data representations. EAs include, but are not limited
to, Neural Networks (NN), Particle Swarm (PS) and the one relevant to this work,
the Genetic Algorithm (GA). These algorithms follow an identical idea, the evolution
of a population of individuals until a near-optimal solution is achieved, inspired by
Darwin's natural selection and survival of the fittest.
A GA works around a pool of individuals or chromosomes. Each individual, randomly
generated, represents a possible solution to the optimization problem and, at
the beginning, is assigned a score according to an evaluation, the fitness function.
To follow the biological process of evolution, individuals must be subject to
reproduction, where two individuals are randomly selected from the population and
their genetic information is mixed to form two offspring, hopefully with better
characteristics. As chromosomes reproduce, there is a risk of mutation, where one or more
genes of a chromosome are inadvertently changed, also hoping for more favourable
features. At the end of each reproduction cycle, all individuals in the population are
evaluated based on the fitness function and the worst percentage of the population is
discarded (Fig. 2.3).
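The cycle described above (evaluate, reproduce, mutate, discard the worst fraction) can be sketched as a generic loop. All names and default parameters are illustrative, and elitist truncation is only one of several possible replacement schemes.

```python
import random

def genetic_algorithm(fitness, random_individual, crossover, mutate,
                      pop_size=50, generations=100,
                      discard_fraction=0.2, mutation_rate=0.01):
    """Skeleton of the GA cycle: the selection, crossover and mutation
    operators are passed in as functions."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < int(pop_size * discard_fraction):
            p1, p2 = random.sample(population, 2)
            c1, c2 = crossover(p1, p2)
            offspring += [c1, c2]
        population += [mutate(c, mutation_rate) for c in offspring]
        # keep the fittest individuals, discarding the worst
        population.sort(key=fitness, reverse=True)
        population = population[:pop_size]
    return max(population, key=fitness)
```

With a toy "count the ones" fitness over bit strings, the loop reliably pushes the population towards the all-ones optimum.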

Fig. 2.3 Pseudo code of a GA execution

2.2.1 Selection Operator

The three main selection techniques (Fig. 2.4) are tournament selection, roulette
wheel and rank-based roulette wheel selection [7]. Tournament selection
uses n random individuals, where two or more individuals compete among themselves
and the winner is allowed to proceed to the next stage. Roulette wheel selection is
based on a probabilistic model where the best-scoring individuals have the highest
probability of being selected to reproduce, while low-scoring individuals have limited,
but not null, chances. Rank-based selection tries to prevent the domination of
highly fit individuals by mapping their fitness scores into ranks.
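Tournament and roulette wheel selection can be sketched as follows; the names are illustrative, and the roulette version assumes non-negative fitness values.

```python
import random

def tournament_select(population, fitness, k=3):
    """Tournament selection: k random individuals compete, the fittest wins."""
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

def roulette_select(population, fitness):
    """Roulette wheel: selection probability proportional to fitness
    (assumes non-negative fitness values)."""
    total = sum(fitness(ind) for ind in population)
    pick = random.uniform(0, total)
    acc = 0.0
    for ind in population:
        acc += fitness(ind)
        if acc >= pick:
            return ind
    return population[-1]
```

Note how the roulette wheel still gives low-scoring individuals a non-null chance, exactly the property described above.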

Fig. 2.4 Types of selection operators

2.2.2 Crossover Operator

The crossover operator replicates reproduction between two individuals, although it is
not applied to all individuals in the population but only to a percentage of it.
The number of individuals selected for crossover is directly related
to the percentage of chromosomes that are discarded between generations. The
transfer of information between two individuals is performed by choosing one or
more breakpoints so that,

• Simple point—the first half of parent 1's genetic information and the second half of
parent 2's is transferred to offspring 1, while offspring 2 receives
the second half of parent 1 and the first half of parent 2.
• N point—an adaptation of simple point crossover where each parent is split into
N equal parts, alternating information between them.
• Uniform—gene-wise transfer where the gene at position i of both parents has a 50%
probability of being sent to either offspring (Fig. 2.5).
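The simple point and uniform variants above can be sketched for list-encoded chromosomes; the names are illustrative.

```python
import random

def single_point_crossover(p1, p2):
    """Simple (single-point) crossover: swap the tails at a random cut."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def uniform_crossover(p1, p2):
    """Uniform crossover: each gene goes to either offspring with 50%."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return c1, c2
```

Both operators only rearrange genes between the two offspring, so the combined gene multiset of the parents is preserved.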

2.2.3 Mutation Operator

When searching for a solution, GAs are prone to getting stuck in local optima, points
that are optimal within a limited closed space but not in the open space.
To prevent the algorithm from settling in a local optimum, mutation operators perform
small changes to individuals, introducing new possible solutions and increasing
population diversity (Fig. 2.6).
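For a binary encoding, the mutation operator can be sketched as an independent per-gene flip; the encoding and the rate parameter are assumptions of this sketch.

```python
import random

def flip_mutation(individual, rate=0.01):
    """Bit-flip mutation: each gene flips independently with probability
    `rate`, injecting the diversity needed to escape local optima."""
    return [1 - g if random.random() < rate else g
            for g in individual]
```

A rate of zero leaves the chromosome untouched, while a rate of one inverts every gene, so typical rates sit close to zero.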

Fig. 2.5 Types of crossover operators



Fig. 2.6 Mutation example. Genes at the beginning and end were mutated, causing a change in the
genetic information

2.3 Graphics Processing Units

GPUs, as they are commonly known, were first introduced by NVIDIA in 1999 [8]. The
new generation of graphics processors, starting with the GeForce 256, shifted vertex
transformation and lighting (T&L) from the CPU to the GPU by including dedicated hardware. By
2001 NVIDIA had replaced fixed-function shaders with programmable vertex shaders,
units capable of performing custom instructions over the pixels and vertices of a
scene [9].
Although shader programming was limited to the use of graphics APIs
such as OpenGL and DirectX, researchers tried, with some success, to solve non-graphics
problems on GPUs by masking them as traditional rendering problems.
Thompson [10] proposed a GPU implementation of matrix multiplication and 3-SAT
using a GeForce Ti4600 and OpenGL's API, obtaining a speed-up of up to 3.2×
when comparing CPU and GPU. Other applications include ray tracing [11] and level
set methods [12]. This was the first step into General-Purpose GPU (GPGPU) programming.
The performance of rendering a 3D scene was heavily linked to the type of
shader used, since a GPU normally processes more pixels than vertices, in a three-to-one
ratio [8], and with a predefined number of processors the workload is normally unbalanced
across all of them. Nonetheless, with the release of the Tesla-based
GeForce 8 architecture, NVIDIA accomplished an important milestone towards what is now known
as the GPU architecture. Unifying vertex shaders and Tesla's new feature, programmable
pixel-fragment shaders, into a single shader pipeline opened a new world
to programmers and developers, enabling them to balance workload between vertex
and pixel shaders [9]. This pipeline now behaves similarly to a basic CPU architecture,
with its own instruction memory, instruction cache and sequential control
logic. Additionally, the Compute Unified Device Architecture (CUDA) framework was
released. CUDA provided access to a parallel architecture capable of being programmed
with high-level languages like C and C++, removing the need for a graphics
API and completing the transition into the GPGPU era.

2.3.1 NVIDIA’s GPU Architecture Overview

NVIDIA’s GPUs follows a unique architecture model, Single-Instruction Multiple-


Thread (SIMT). The foundation of this model leans on multiple threads executing
14 2 Background

the same instruction but in different datasets and that is why it is so useful in a 2D/3D
scene rendering, few operations are required however thousands of pixels need to be
processed.
To obtain a SIMT architecture, the GPU must be designed to execute hundreds
of threads concurrently [13]. At the top level, a GPU is a combination of multiple
Streaming Multiprocessors (SM), independent multi-threaded units responsible for
the creation, management, scheduling and launch of threads, paired in groups of 32
called warps. Each SM features an instruction cache, warp schedulers that select
warps ready to execute, instruction dispatch units that issue instructions to individual
warps, a 32-bit register file, a shared memory, several types of cache and the most
important element, the CUDA core or Streaming Processor (SP).
On the memory side, a GPU's memory organization is divided into a 3-level hierarchical
structure. Each level has a defined set of functions, benefits and limitations, and it
is the programmer's responsibility to ensure appropriate use and correct management.
All SMs are connected and can communicate through a global memory located
off-chip, with a magnitude of Gigabytes (GB), that is linked to the CPU through the
Peripheral Component Interconnect Express (PCIe) bus. Being a "general" access
off-chip memory leads to an important problem, the latency between requesting and
retrieving information, which can be as high as 800 clock cycles depending on the
device capability [13]. Accesses to global memory are done with either 32-,
64- or 128-byte memory transactions, which must be aligned to a multiple of the
transaction size; e.g. a warp that requests sequential 4-byte words in the address range
116–244 triggers two 128-byte transactions covering addresses 0 to 256. Ideally, a warp's
accesses should be coalesced, meaning that each thread requests a sequential and
aligned word that is transferred in one or more memory transactions depending on the
word and transaction size. In more recent architectures, aligned but non-sequential
accesses are also considered coalesced transactions.
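The transaction count from the example above can be checked with a small sketch: each aligned 128-byte segment touched by a warp's byte addresses costs one transaction. This is a simplification that ignores the 32- and 64-byte transaction sizes; the names are illustrative.

```python
def transactions_128(addresses):
    """Number of 128-byte transactions needed for one warp access:
    one per distinct aligned 128-byte segment touched."""
    return len({addr // 128 for addr in addresses})

# the example from the text: 32 threads reading consecutive 4-byte
# words starting at byte 116, crossing one 128-byte boundary
misaligned = [116 + 4 * t for t in range(32)]
aligned = [128 + 4 * t for t in range(32)]
```

The misaligned warp straddles two segments and costs two transactions, while shifting the same access to a 128-byte boundary coalesces it into one.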
On a second level, there is a set of caches and an important mechanism of communication
between threads, the shared memory. The latter consists of a fast, high-throughput
memory located inside each SM, although only a small size is available, around the
Kilobyte (KB) magnitude. Such advantages come with disadvantages, mainly regarding
the access pattern by threads. To achieve peak throughput, NVIDIA organized shared
memory in a modular structure of equally-sized memory modules called banks, with
memory lines of either 16 or 32 four-byte banks, depending on the compute capability.
Maximum memory bandwidth is obtained by performing reads or writes to n addresses
that fall in n unique banks; however, once m threads execute an instruction whose
addresses fall in the same memory bank, an m-way bank conflict is triggered and
each conflicting access is served serially.
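The degree of a bank conflict can be estimated with a sketch of the bank-mapping rule (byte address divided by the 4-byte word size, modulo the number of banks). This simplification ignores the broadcast mechanism for threads reading the same word; the names are illustrative.

```python
from collections import Counter

def max_bank_conflict(addresses, banks=32, word=4):
    """Degree of the worst m-way bank conflict for one warp access:
    bank index = (byte address // word size) mod number of banks."""
    counts = Counter((addr // word) % banks for addr in addresses)
    return max(counts.values())
```

Sequential word accesses hit 32 distinct banks (conflict-free), whereas a stride of two words maps pairs of threads onto the same bank, serializing each pair.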
With the exception of the Tesla microarchitecture, two levels of cache, L1 and L2,
are present to assist memory transactions between threads and global memory, where
the L2 cache is mainly used to cache global memory loads and the L1 cache serves
local memory accesses (memory whose size is not known at compile time, such as
dynamic-size arrays, or register spills).
For the third and most restricted level, each SM is equipped with a 32-bit register file
with the highest throughput available, dedicated to the private variables of each thread.
The limited size of the register file creates a constraint on the number of registers
used per thread, which can vary from 63 to 255 depending on the microarchitecture.
Although threads are allowed to allocate up to this limit, doing so will reduce the number
of active warps per SM and therefore decrease the overall performance.

2.3.2 NVIDIA’s GPU Architectures

Over the course of one decade, NVIDIA has been releasing new architectures,
improving existing features while providing developers with new techniques to
increase parallelism on the GPU. This section presents a brief overview of NVIDIA's
latest GPU generations, with technical aspects related to the GPU architecture and
features that enhance parallelism.

2.3.2.1 Tesla Microarchitecture

With the release of the Tesla microarchitecture in 2006, NVIDIA introduced the world
to a programmable unified architecture. Tesla is organized at the top level into eight
Texture/Processor Clusters (TPC), each consisting of one texture unit and two SMs
(later increased to three in the GT200). The SMs are structured with eight CUDA cores;
two Special-Function Units (SFU) responsible for transcendental functions
(functions that cannot be expressed through a polynomial expression, such as square
root, exponential and trigonometric operations and their inverses); an instruction
fetch and issue unit with instruction cache that serves multiple concurrent threads
with zero scheduling overhead; a read-only constant cache; and a 16 KB shared
memory.
The shared memory is split into 16 banks of consecutive four-byte words, with
high throughput when each bank is requested by a distinct thread in a warp. However,
there is a discrepancy between the number of threads and banks: when a warp tries
to access shared memory banks, the requests are divided into independent accesses, one
per half-warp, which should be free of bank conflicts. In the case of multiple threads reading
from the same bank, a broadcast mechanism is available, serving all requesting threads
simultaneously [14].

2.3.2.2 Fermi Microarchitecture

Fermi (2010) brought major changes to both the SM and the memory organization.
The Graphics Processor Cluster (GPC) replaced the TPC as the top-level module, introducing
four dedicated texture units (removing Tesla's now-redundant texture unit)
while increasing the overall number of SMs from two (three in GT200)
to four. The SMs now feature 32 CUDA cores and a new configurable cache
with two possible configurations, giving freedom to the programmer: for
graphics programs a smaller L1 cache is beneficial, while for compute programs a larger
shared memory allows more cooperation between threads. This cache can be used as
16 KB of L1 cache and 48 KB of shared memory, or 48 KB of L1 cache and 16 KB
of shared memory. Besides a configurable cache, shared memory underwent internal
changes. Previously, with Tesla, shared memory was organized into 16 four-byte banks
that served a warp in two independent transactions without bank conflicts; with
Fermi, the number of banks was raised to 32, with one request per warp. Bank
conflicts are still present in Fermi, in addition to the broadcast mechanism introduced
with Tesla.
The increase in CUDA cores and a renewed cache were not the only changes
to the SM structure. The number of SFUs was doubled to four, each capable of
one transcendental instruction per thread independently of the other execution units;
a stall in the GPU pipeline is prevented because the CUDA cores and SFU
units are decoupled from the dispatch unit responsible for serving instructions to each
execution unit, and because Fermi provides two separate dispatch units. The destination
address of a thread's result is now calculated by one of the 16 Load/Store units
available, for a total of 16 thread results per clock. The workload is divided across
two groups of 16 CUDA cores each, and instructions are distributed by two warp
schedulers, allowing two warps to be issued and executed concurrently, meaning
that for a work group to complete execution, two clock cycles are required (for
transcendental instructions it takes eight cycles for all four SFUs to execute).

2.3.2.3 Kepler Microarchitecture

The Kepler microarchitecture (2012) focused on improving the performance achieved
with Fermi while decreasing the overall power consumption. The top-level structure
remained the same, with the GPC module, but the SM is now called Streaming
Multiprocessor (SMX). Each SMX features 192 CUDA cores, 32 Load/Store
units and 32 SFUs, now capable of serving a transcendental instruction per warp in
one clock cycle. An important change towards NVIDIA's goal of increasing
Kepler's performance was doubling the number of warp schedulers, and with
that also increasing the number of instruction dispatchers per scheduler to two. With this change,
each warp can execute two independent instructions in one clock cycle when possible.

2.3.2.4 Maxwell Microarchitecture

Maxwell continued Kepler’s trend of better power efficiency and performance


improvement. The new SM, now called Streaming Multiprocessor (SMM), suffered a
decrease in CUDA cores, from 192 to 128, keeping the same amount of special execu-
tion units which allowed a new configuration of the SMM. A SMM is now organized
into four smaller groups each with 32 CUDA cores, eight Load/Store units, eight
SFUs, one warp scheduler and two instruction dispatchers. This represents an over-
all decrease of 33% in CUDA cores however a Maxwell Streaming Processor (SP)
2.3 Graphics Processing Units 17

Table 2.2 Architectural comparison between Fermi, Kepler and Maxwell [13, 15–18]

Specifications                                   Fermi GF100   Kepler GK104   Maxwell GM204
Compute capability                               2.0           3.0            5.2
Streaming Multiprocessors (SM)                   11–16         6–8            13–16
CUDA cores                                       353–512       1152–1536      1664–2048
Theoretical single-precision FP (GFLOPS)         855–1345      2100–3000      3500–4600
Main memory (MB)                                 1024–1536     1536–4096      4096
L1 cache (KB)                                    16 or 48      16, 32 or 48   24
Shared memory (KB)                               48 or 16      48, 32 or 16   96
L2 cache (KB)                                    768           –              1792–2048
Maximum registers per thread                     63            255            255
Maximum registers per SM                         32768         65536          65536
Threads per warp                                 32            32             32
Maximum warps per SM                             48            64             64
Maximum blocks per SM                            8             16             32
Maximum threads per SM                           1536          2048           2048
Maximum threads per block                        1024          1024           1024

is equivalent to 1.4 Kepler SPs performance-wise, delivering identical performance
with the advantage of a friendlier, power-of-two organization [15].
Additionally, the shared memory is now a dedicated unit with a maximum capacity
of 96 KB, although limited to 48 KB per block, but with characteristics identical
to those of Kepler and Fermi. The L1 and texture caches were combined into a single unit,
therefore forcing the L2 cache to also be used for caching local loads and possibly
increasing the latency in case of register spilling [13] (Table 2.2).

2.3.3 CUDA Architecture

In parallel programming, the basic execution unit is the thread. On a CPU, threads
are sub-routines of a main program, scheduled to execute a custom set of instructions
that may include memory accesses to local or shared resources. If necessary, threads
can communicate among themselves using a global resource or memory; however, special
attention is required if running threads perform write operations on the same
memory address.
CUDA introduced a general-purpose parallel computing platform and programming
model able to combine well-established programming languages with the highly
parallel architecture that is a GPU. Creating a functional CUDA C program for a GPU
is a three-stage process. First, the execution environment must be defined. This
environment consists of a kernel in which a developer formalizes the routine to be executed
on the GPU and how it should be executed. The kernel definition has four associated
arguments: the number of blocks, the number of threads, the size of dynamic shared
memory per block and the stream ID. The way a kernel is defined reflects how the problem
is spatially organized; e.g., a parallel sum reduction over an array can be represented
with a 1D kernel and a multiplication between two matrices with a 2D kernel. In
Fig. 2.7, a kernel is declared with 4 blocks, each with 16 × 16 threads (256 in total),
while the size of dynamic shared memory and stream ID are optional, defaulting to
0 (example from [17]).
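The launch configuration of Fig. 2.7 can be mirrored on the host side to reason about thread counts and flattened indices. This Python sketch reproduces CUDA's blockDim/threadIdx arithmetic for a 1D grid of 2D blocks; the function and variable names are illustrative.

```python
def global_thread_id(block_idx, thread_idx, block_dim):
    """Flattened global index of a thread in a 1D grid of 2D blocks,
    mirroring CUDA's blockIdx/threadIdx/blockDim arithmetic."""
    threads_per_block = block_dim[0] * block_dim[1]
    # row-major flattening inside the block: y * dim_x + x
    local = thread_idx[1] * block_dim[0] + thread_idx[0]
    return block_idx * threads_per_block + local

# the launch configuration from the text: 4 blocks of 16 x 16 threads
BLOCKS, BLOCK_DIM = 4, (16, 16)
total_threads = BLOCKS * BLOCK_DIM[0] * BLOCK_DIM[1]
```

The configuration yields 1024 threads in total, with global indices running from 0 (first thread of block 0) to 1023 (last thread of block 3).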
Once the kernel is declared, the second stage begins. The program is compiled
through NVIDIA's compiler driver, NVCC, which generates a set of binaries that include
the GPU assembly code, Parallel Thread eXecution (PTX), containing the execution
plan for a given thread [19]. Each thread is assigned a unique three-element
identifier (x, y, z coordinates), threadIdx, that locates it in the GPU execution
plan. Based on several of the available compilation flags, NVCC can perform
optimizations that increase kernel performance. One of those flags,
-maxrregcount, grants the programmer a way to cap the maximum registers
allowed per thread, which can greatly impact kernel performance. By reducing
the register usage per thread, the same register file can effectively
accommodate more blocks per SM, resulting in more warps being dispatched. Another
advantage is preventing register spilling. With complex kernels, NVCC's task of creating
PTX code becomes harder and eventually there are not enough registers to satisfy
a thread's needs. In those cases, local memory is used to replicate a register's function
and, since this type of memory is addressed in the global memory space, it inherits
all its characteristics, such as latency. The main issue with the -maxrregcount flag is that
it forces the compiler to generate additional instructions, which may not compensate
for the extra one or two blocks per SM.
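The trade-off can be illustrated with a back-of-the-envelope occupancy bound using the 64K-register file listed in Table 2.2. This sketch considers only the register limit and deliberately ignores the shared memory, warp and thread-count limits; the names and defaults are illustrative.

```python
def blocks_per_sm(regs_per_thread, threads_per_block,
                  register_file=65536, max_blocks=32):
    """Upper bound on resident blocks per SM imposed by the register
    file alone (shared memory and thread limits are ignored)."""
    regs_per_block = regs_per_thread * threads_per_block
    return min(max_blocks, register_file // regs_per_block)
```

Halving the per-thread register budget from 64 to 32 doubles the register-limited block count, which is exactly the effect a flag like -maxrregcount aims for, at the possible cost of spill instructions.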
Furthermore, NVCC has internal mechanisms able to optimize redundant
code and prevent duplicate operations, such as those in Fig. 2.8.
And finally, the program’s execution in the GPU. At this point, all threads are
organized, spatially, in a single or multi-dimensional grid formed by blocks. The
SMs are assigned multiple unique blocks (Fig. 2.9) by a global scheduler, GigaThread
unit in post Tesla microarchitecture, from which SMs schedule and execute smaller
groups of 32 consecutive threads called warps. Threads in a warp execute a common
instruction at a time that should not invoke conditional branching operations as it
will introduce thread divergence and therefore the serial execution of all threads in
each branching path until they reach a common instruction again (only applied to
threads in the same warp) [13]. Once a warp finishes executing, a warp scheduler
switches context, with no overhead cost, and replaces the current warp in a SM by a

Fig. 2.7 Kernel declaration in CUDA C

Fig. 2.8 Kernel code pre-NVCC and post-NVCC optimization

Fig. 2.9 Block division in SMs

new one, from the same block or not, ready to execute. This mechanism is also used
to mask the latency associated with memory transactions since it prevents stalling
the pipeline while a warp waits for the transaction to be completed.

2.4 Conclusions

This chapter presented an introduction to some basic techniques that are the foundation
of many state-of-the-art pattern matching methods, the basic concepts of the GA
and, finally, a review of NVIDIA's GPUs. The pattern matching techniques can be
divided into two categories, linear and aggregate approximations, which try to create
accurate approximations of a time series using the minimum number of points possible.
The GA is part of a group of algorithms, EAs, that attempt to solve problems
that do not have a closed-form solution, such as non-convex problems. GPUs are an
alternative execution system to the common multi-core systems that use the CPU
as the main processing unit. The GPU started as a system meant to process 2D
and 3D graphical scenes; however, researchers identified the ability to use it to
accelerate highly parallel algorithms.

References

1. D.J. Berndt, J. Clifford, Using dynamic time warping to find patterns in time series, in KDD
Workshop (1994), pp. 359–370
2. E. Keogh, S. Chu, D. Hart, M. Pazzani, Segmenting time series: a survey and novel approach.
Data Mining in Time Series Databases (2003), pp. 1–21
3. L. Wei, Sax: N/n not equal an integer case, http://alumni.cs.ucr.edu/wli/

4. J. Lin, E. Keogh, S. Lonardi, B. Chiu, A symbolic representation of time series, with impli-
cations for streaming algorithms, in Proceedings of the 8th ACM SIGMOD Workshop on 78
Research Issues in Data Mining and Knowledge Discovery, ser. DMKD 2003. (ACM, New
York, NY, USA, 2003), pp. 2–11. https://doi.org/10.1145/882082.882086
5. A. Canelas, R. Neves, N. Horta, A sax-ga approach to evolve investment strategies on financial
markets based on pattern discovery techniques. Expert Syst. Appl. 40(5), 1579–1590 (2013),
http://www.sciencedirect.com/science/article/pii/S0957417412010561
6. B. Lkhagva, Y. Suzuki, K. Kawagoe, Dews2006 4a-i8 extended sax: extension of symbolic
aggregate approximation for financial time series data representation (2006)
7. N. Razali, J. Geraghty, Genetic algorithm performance with different selection strategies in
solving tsp. IEEE Micro 31(2), 50–59 (2011)
8. E. Lindholm, J. Nickolls, S. Oberman, J. Montrym, Nvidia tesla: a unified graphics and com-
puting architecture. IEEE Micro. 28(2), 39–55 (2008)
9. D. Luebke, G. Humphreys, How gpus work. IEEE Comput. Soc. 40(2), 96–100 (2007)
10. C.J. Thompson, S. Hahn, M. Oskin, Using modern graphics architectures for general-purpose
computing: a framework and analysis, in Proceedings 35th Annual IEEE/ACM International
Symposium on Microarchitecture (2002), pp. 306–317
11. T.J. Purcell, I. Buck, W.R. Mark, P. Hanrahan, Ray tracing on programmable graphics hardware,
in Proceedings of ACM SIGGRAPH 2002 ACM Transactions on Graphics (TOG), vol. 21
(2002), pp. 703–712
12. M. Rumpf, R. Strzodka, Level set segmentation in graphics hardware, in Proceedings of Image
Processing, vol. 3 (2001), pp. 1103–1106
13. NVIDIA Corporation, NVIDIA CUDA Compute Unified Device Architecture Programming Guide
(2015), https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf. Accessed 15
Nov 2015
14. NVIDIA Corporation, NVIDIA CUDA Compute Unified Device Architecture Programming Guide
(2012), https://www.cs.unc.edu/prins/Classes/633/Readings/CUDA_C_ProgrammingGuide_
4.2.pdf. Accessed 10 Aug 2016
15. NVIDIA Corporation, Whitepaper—NVIDIA GeForce GTX 980 (2014), http://international.download.
nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF
16. C.M. Wittenbrink, E. Kilgariff, A. Prabhu, Fermi gf100 gpu architecture, in Proceedings of the
World Congress on Engineering, vol. 2 (2011)
17. J. Sanders, E. Kandrot, CUDA By Example: An Introduction to General-Purpose GPU Pro-
gramming. (Addison-Wesley, 2012)
18. N. Corporation, Whitepaper—nvidia geforce gtx 680 (2012), http://people.math.umass.edu/
johnston/M697S12/Nvidia_Kepler_Whitepaper.pdf. Accessed 13 Nov 2015
19. N. Corporation, Parallel Thread Execution ISA Application Guide v3.2 (2013), http://docs.
nvidia.com/cuda/pdf/Inline_PTX_Assembly.pdf. Accessed 13 Nov 2015
Chapter 3
State-of-the-Art in Pattern Recognition
Techniques

Abstract Pattern recognition, matching or discovery are terms associated with the
comparison of an input query, a pattern, with a time series sequence. These input
queries can be patterns similar to those presented in Chen (Essentials of Technical
Analysis for Financial Markets, 2010 [1]) or user-defined ones. Although the focus will
be on pattern matching techniques applied to financial time series, these techniques
have proved to be very versatile and expandable to different areas, from the medical
sector, with applications in Electrocardiogram (ECG) analysis, Chen et al. (Comput
Methods Programs Biomed 74:11–27, 2004 [2]), to the energy sector, with forecasting
and modelling of buildings' energy profiles, Iglesias and Kastner (Energies 6:579,
2013 [3]).

Keywords Pattern discovery · Middle curve Piecewise Linear Approximation ·
Perceptually Important Points · Symbolic Aggregate approXimation · Turning
Points · Shapelets

3.1 Middle Curve Piecewise Linear Approximation

An enhanced version of DTW combined with a PLA-based approach is introduced by
[4] with the purpose of reducing the error associated with time series approximations.
The author uses Derivative Dynamic Time Warping (DDTW) applied to two different
time series dimensional reduction approaches, PAA and Middle curve Piecewise
Linear Approximation (MPLA). The main reason for using DDTW rather than
DTW lies in an alignment weakness of DTW. With two unaligned time series, P and
Q, DTW does not take the current trend into consideration, so eventually, by mistake,
a down-trend sub-sequence of P will be aligned with an up-trend sub-sequence of
Q. DDTW solves this issue by redefining the distance measure. The new approach
does not use the time series point itself but instead a value (Eq. 3.1) that reflects the
current trend.

© The Author(s), under exclusive licence to Springer International Publishing AG, 21


part of Springer Nature 2018
J. Baúto et al., Parallel Genetic Algorithms for Financial Pattern Discovery Using GPUs,
Computational Intelligence, https://doi.org/10.1007/978-3-319-73329-6_3
The Darma Parganah was of some interest to me, as one of the minor routes
into Tibet was along this river. Darma proper was divided into two divisions: the
Malla and the Talla, or “upper” and “lower,” Darma. The Malla Darma is that
portion which comprises the Lissar River and the Dholi Ganga, whereas the
Talla Darma, as its name suggests, lies nearer to the point at which the Dholi
Ganga meets the Kali River.

The Darma Shokas, a tribe somewhat differing from the Shokas of Bias and
Chaudas, carry on the entire trade with Tibet by the Darma route. Gyanema is
the main centre, and the commodities are chiefly borax, salt, wool, skins, cloth,
and utensils, in exchange for which the Tibetans receive silver, wheat, rice,
sattoo, ghur, candied sugar, pepper, beads of all kinds, and a few articles of
Indian manufacture.

It was getting towards the end of September when I was in this region, and the
weather was very cold and stormy. We had plenty of snow every night and the
winds were cutting. It was a great temptation, I must confess, when we reached
the Dholi River, to turn towards the south, which would bring us to lower
elevations and therefore to warmth and comfort; but my work was not finished,
and we had again to go towards the north (N.N.W., to be strictly accurate), for I
wished to solve certain geographical problems and visit some passes into the
Forbidden Land which I had not yet ascended.
A Phantom Lion of Gigantic Proportions

We camped that night at a dreary spot called Gankan (12,295 feet), where we
expected to find some traders, this being one of their temporary stations, but did
not. So we fared rather badly. We could find no fuel, and the supply which they
generally bring up with them was quite exhausted. All the trading with Tibet was
now over from this side, and everybody had retired southward. Two stray sheep
—one dead, the other still alive but with broken legs—were lying near the wall
which marked the favourite spot for a camping-ground. We passed a very chilly
night, and the next day when we woke snow was falling heavily. My men
seemed to be suffering greatly, and I decided to ask for two volunteers to
accompany me and carry my instruments to the glaciers northwards, the
remainder of the expedition proceeding one march southward, to a place where
fuel could be obtained, and awaiting our return there.

We three started off in a fierce wind at six o’clock in the morning, and passed
three small glaciers to the east—the Suiti, Pungrung, and Mangti. To the west
were five other smaller glaciers. We had gradually risen to 15,000 feet, and
farther on, at the foot of the Nui Glacier, at the spot known as the Nui
Encamping-ground, the altitude was 16,950 feet.
It was at this place that, in the mist and snow, we saw the immense image of
what seemed a conventional crouching lion sculptured in the rock. On
approaching it, however, the illusion was explained. The main body—as I have
already explained—was merely a gigantic boulder, while the extended paws and
tail were mani walls with end chokdens built away from the rock. From a certain
point of view it looked exactly like a lion.

This being the last camp before traders attempt the high pass, many chokdens
of all sizes are to be seen all over the valley and on the hill-side. One of the
peculiarities of these chokdens is that they are as much as possible built with
white or light-coloured stones.

The wind had got much worse as we got higher, and the effort of walking was
considerable. We had gone some eleven miles, and my two men were so
exhausted they were unable to continue. They dragged along uncomplainingly,
but I could see that they were on the verge of breaking down. At the foot of the
Nui we had some food, and having laden myself with all the necessary
instruments and cameras—quite a considerable weight—left my two men to
await my return, while I went alone to survey the Nui Glacier and climb the high
pass.

Once alone, I proceeded at a greater speed, but the ground was much broken
by huge boulders, and to cover a short distance involved a lot of labour. About
one mile and a half from where I had left my men I came in for an experience
which I did not quite expect at that moment, although, fortunately, I was
prepared for any emergency.

The Tibetans had had time to prepare a great many snares for me, and to send
soldiers to all the passes, and what they could not do by facing me direct
they attempted as usual to accomplish by treachery. Much to my astonishment
in this desolate region, I came upon a Tibetan comfortably seated upon the
ground, upon which he had spread several coats. I asked him if he were alone,
and he said yes.

“What are you doing here?”

“I am going back to my country. My friends went ahead yesterday.”

“Surely you have some one with you; you cannot carry all those coats and
paraphernalia?”
“No, no, no; I am quite alone.”

As I was standing talking to him I noticed that his eyes were looking at
something behind me, and on turning round found myself confronted with three
Tibetans, who had evidently crawled out from behind rocks where they were
hidden. They made a dash to seize my rifle as I unslung it from my shoulders,
but they were not quick enough. In a second the fourth fellow—the one sitting
down—had jumped up to help his companions.

One fellow got somewhat of a dent in his skull with the butt of my rifle; the
others, unluckily, ran away, and I did not pursue them, as I needed all the
strength I possessed to go up the pass. As the Tibetans disappeared in the
direction I had come, I became rather alarmed for my men, lest they should
be taken by treachery, but I knew they could take care of themselves.

After taking a rest, for the violent exertion had caused me a deal of panting and
blowing, I continued. I soon got out of the débris and boulders, where I
proceeded with great caution, and got upon the snow on the north side of the
glacier. For real majestic beauty the Nui Glacier cannot be surpassed. It has
immense terraces of clear greenish ice, quite regular and well padded with snow
on the surface; gigantic crevasses, down which one was almost afraid to look,
and a background of huge white sharp-edged peaks, like the teeth of a saw, so
white indeed that the stormy sky beyond looked as black as ink. It was truly one
of the most impressive scenes I have ever set eyes upon. I never feel very big
on any occasion, but I do not remember ever feeling quite so small and humble
and insignificant as I did on that particular occasion—a mere speck, a mere
black spot, disturbing the peaceful harmony of the grandiose landscape.
The Nui Glacier

I have attempted to depict the scene in one of the illustrations, but no brush nor
canvas can satisfactorily reproduce the immensity of those white mountains
towering around you, the incalculable masses of snow, the almost terrifying
appearance of the immense cracks in the ice hundreds of feet deep. It gave you
a certain feeling of loneliness and helplessness in case of mishap, and it really
made you think a good deal of how small are human beings and how puerile all
their works, when compared to those accomplished by the hand of Nature.

I think it would be a good thing if a great many other authors—not to speak of
our critics, who need it even more—could have an opportunity of experiencing
the sensation of humility I had upon me that day.

But, humble or not, I went on and on, like a tiny little ant upon the immaculate
and endless white carpet of snow, and higher and higher I gradually rose upon
the mountain side towards the Nui Pass. Panting and blowing, and with a feeling
that I wanted to throw away the rifle and cartridges and cameras and
instruments that I was lugging up with me—oh, they were such a weight! always
in the way, and ever dangling where you did not want them,—I got higher along
a narrow furrow so steep as to be almost vertical. I went up for a time on loose
rocks which gave way under me, and were most trying to the temper. One could
not help constantly falling, and one’s poor fingers and toes got jammed to a
pitiable extent.

Overhead a storm was brewing which promised to be of the very worst kind; but
luckily in the last portion of the ascent I was screened somewhat from the cutting
wind. Getting up to the top was a terrible effort, carrying all my paraphernalia,
but at last, in a desperate struggle, I managed to get there.

The Nui Pass itself, as can be seen by the drawing I give in these pages, is a
very narrow opening, in parts quite free from snow owing to its steepness and to
being so boxed in. Besides, on the south side, by which I ascended, it is rather
more covered, because the snow is generally driven with much fury from the
north. In fact, when I reached the summit and proceeded for some distance on
the Tibetan side (north aspect), the mountain was thickly padded with
uninterrupted snow. The wind was so fierce up there that it knocked me clean off
my feet twice.
Ascent to the Nui Pass

Now came the tedious job of boiling water in the hypsometrical apparatus to
ascertain the altitude, and taking whatever photographs and sketches I found
possible. But I had no sooner begun to unpack my instruments in a sheltered
nook than the storm broke out in all its violence, and the snow, driven with
tremendous force by the wind into my face, felt just like hundreds of needles and
nails thrown at me. I wasted two entire boxes of matches in setting the
hypsometrical stove alight, and to accomplish this I had to protect it with my
coat, of which I had divested myself. I have never envied the Chinese gods with
a hundred arms more than I did on this occasion, for one’s two hands were
required in twenty places at the same time, the wind blowing everything about in
a most reckless manner. The water seemed to take ages to boil, and the storm
was getting worse and worse every moment, almost freezing my poor hands,
nose and ears, and giving me intense pain.

At last the welcome puff of steam began to escape from the apparatus; the
temperature of boiling water (178.1°) and the temperature of the air (30°) were
duly registered, and I repacked everything to make my descent. The corrected
altitude of this pass by hypsometrical apparatus was 19,621 feet, and two
excellent aneroids I also carried registered 19,600 feet.
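As a rough modern sketch of the hypsometric principle at work here (a hypothetical
calculation, not Landor's actual reduction tables: the constants are simplified
textbook values, the reading is converted from Fahrenheit, and no instrument or
air-temperature corrections are applied), the observed boiling point can be turned
into an approximate altitude in two steps: the Clausius–Clapeyron relation gives
the air pressure at which water boils at that temperature, and the barometric
formula converts that pressure into a height.

```python
import math


def boiling_point_to_pressure(t_boil_c):
    """Clausius-Clapeyron: air pressure (Pa) at which water boils at t_boil_c (C)."""
    T0, P0 = 373.15, 101325.0        # boiling point (K) and pressure at sea level
    L, M, R = 2.26e6, 0.018, 8.314   # latent heat (J/kg), molar mass (kg/mol), gas const.
    T = t_boil_c + 273.15
    return P0 * math.exp(-(L * M / R) * (1.0 / T - 1.0 / T0))


def pressure_to_altitude(p_pa):
    """Standard-atmosphere barometric formula: pressure (Pa) to altitude (m)."""
    return 44330.0 * (1.0 - (p_pa / 101325.0) ** (1.0 / 5.255))


def hypsometric_altitude_ft(t_boil_f):
    """Altitude in feet from the observed boiling point in degrees Fahrenheit."""
    t_c = (t_boil_f - 32.0) * 5.0 / 9.0
    return pressure_to_altitude(boiling_point_to_pressure(t_c)) / 0.3048
```

With these simplified constants a reading of 178.1° yields roughly 18,000 feet;
the discrepancy against the 19,621 feet quoted above is unsurprising, since the
period tables applied corrections this sketch omits.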

A great deal is to be said for and against aneroids. In a few words, this is my
experience of them as regards work at great elevations. Unless you can get
aneroids of tested excellence and the very best that money can procure, you
had better go without them. Very small aneroids may be more portable, but they
are never of any real use. Always carry your aneroids yourself, and never let
them go out of your sight if you want to keep them in good order, and never rely
on them too much except when constantly checked by boiling-point
thermometers. Personally, for important elevations, I have relied entirely on
boiling-point thermometers, the only practical and less cumbersome way of
accurately ascertaining heights for an explorer, but I also always carry several
aneroids, two specially constructed for me to measure down to 12 inches—over
25,000 feet—and I have invariably found them accurate. I use them only for
differential altitudes, and for the less important observations.
CHAPTER XIX

And now for the descent. I was quite numbed with cold—you see, a thin shirt
only is not much protection against snow being driven into you with such force,
and even when I put on my coat again my teeth were chattering so that I thought
they would break. Well, I suppose that if I had been more muffled up and
wearing heavy clothing I should have never got up there. My legs and hands
had nearly lost all feeling in them.

I loaded my rifle and all my instruments on my back, also my straw hat which it
was impossible to wear in the right place, bade Tibet a hearty “good-bye,” and
down I strode, with somewhat disjointed steps, by the way I had come. The
descent was rapid—a great deal too rapid—but partly to get away from the
intense cold, and the wind and the snow, partly owing to the anxiety which I still
felt regarding the safety of my two men I had left behind, I really did not try to
control my speed. When I got among the loose rocks again, which started a
regular landslide at each step one took, I came very near having an accident
which might have had disastrous consequences. A stone rolled under my foot—
they nearly all did—and in slipping I got my right foot badly jammed between two
large stones. Before I had time to get it off again, several big stones came rolling
with great force from above, and one hit me so violently in the leg, and on falling
upon the other rocks squeezed my ankle with such pressure that I really thought
my leg had been fractured. Fortunately it was not.

The pain was excruciating, my feet being still half-frozen, and I sat down,
rubbing the one foot to restore some life to it, but it swelled considerably and
hung like dead, which caused me some little apprehension. Violent friction with
snow I tried next, and this seemed to bring some warmth and circulation, but the
pain was intense. One fact was certain, that it was getting late in the afternoon,
and that the Nui Pass was not the kind of place where I should care to be
benighted, so down I struggled, limping badly, and suffering agony every time
the foot got jammed again, which was at an average about every minute.
Observations for Altitude taken under Difficulties on the Nui Pass,
Darma

Thank heavens! I then got to the snow incline, where I could practise some
tobogganing, which saved much time and labour, and down I slid, carefully
regulating my speed upon the snow with my good leg. You see, if one had gone
too fast one might have been shot into one of the big crevasses of the Nui
Glacier down below, and that I particularly wished to avoid.

Partly through the strain of carrying up such a heavy load, partly through the
very little sleep I had obtained of late, partly owing to the great glare of the
immense white mountains before me during the day, and also in a measure to
the biting wind and snow—not to speak of the pain I was undergoing—the vision
of the only good eye I possessed became affected and caused me additional
trouble. At moments my sight became obscured altogether.

There is no doubt that it is well worth going up any high mountain for the sake of
the relief and satisfaction one experiences on coming down again, and on no
occasion did I feel this more strongly than upon that day. When I got down to the
glacier again—which spreads from east to west—I felt much better, and although
still quite lame could proceed at a fair pace.

I hastened down to rejoin my two men, for the evening was drawing in. I took
special care not to fall into another snare—as surely the Tibetans might attempt
some of their games again—but nothing happened. Nothing ever does when
you are on your guard.

It was getting dark when I arrived at the spot where my two followers had
remained, and I shouted myself hoarse, but got no reply. I looked for them in
several places where they might possibly be, but I could find no signs of them.
Again I shouted and shouted, but no reply. Had they been murdered or had they
gone away? This was particularly tantalising, because not only did I feel for their
loss, but I also wanted badly to get rid of the load I was carrying.

By a mere chance, possibly suggested by my close observation of Shoka ways,


I thought that, before departing to rejoin the main portion of my expedition, I
would inspect some huge boulders some way off, behind which the men in their
long hours of waiting might possibly have taken shelter. Had they been there,
with the howling wind they could not possibly hear my voice. In fact, under the
largest boulder, where the melting snow had formed a hollow, I discovered
my two Shokas wrapped up, head and all, in their blankets, and snoring
hard. They had given me up for lost—although the idea did not disturb their
sleep—and were waiting till the next morning to proceed up the glacier to look
for me.

The four Tibetans had tried to approach them, pretending friendship, but they
wisely had driven them away with stones. Then, for safety, they had at sunset
removed their quarters to a more secluded spot. The distance from this spot to
the Nui Pass and back was six miles.

The storm was still very bad, snow was again falling plentifully, and we decided
that our best plan was to make a night march—long as it would be—and try to
rejoin the others. Relieved of the weight of rifles and all, I was able to get along
pretty well, except that after we had gone a couple of miles it got pitch dark, and
we stumbled against everything and got terribly jerked. It snowed hard, and the
wind blew in all its fury. We eventually came upon the faint trail, now white with
snow, but on this it was considerably easier to proceed. We travelled now on
long stretches of flat country, then upon an undulating, even hilly, portion of the
valley, occasionally resting for breath under the lee of some big rock, and
drawing freely on my supply of chocolate, which one of my men carried.

Towards midnight we reached an open space—one of the camping-grounds—called
Bedang, where the Tibetans have erected three extensive mani or sacred
walls, one with a number of images. We got on the lee side of it, and, taking
bundles of matches, lighted them up to inspect the long row of coloured
Buddhas forming a cornice to the upper part of the wall. There were dozens of
these images, evidently all made in the same mould, and painted in
combinations of yellow, red, and blue. Then there were large stones with the
usual sacred inscription, and flying prayers wherever they could be hung.
We Came Upon a Shrine of Curious Buddha Images

Half-way between Bedang and the Nui camping-ground we had come across a
number of chirams (pyramids, often tombs) and chokdens.

By the time we had reached this sacred spot we were pretty well tired out and
hungry, but we had not sufficient blankets to go round nor food enough to make
us feel really happy again. We rested a while, and before our limbs got numbed
with cold we again started off on our dreary march to rejoin the main body of my
party. As we got lower down we came in for a violent shower of hail—the
pellets being of such a size that they thumped rather too vigorously on our
skulls—then torrents of rain. We were simply soaked. It cleared for a few
moments, and the moon shone for some seconds between two ugly black
clouds—almost, it seemed, only to laugh at us. Indeed, a moment later another
downpour froze us to the marrow of our bones, and it was all we could do to
proceed at all.

There was, fortunately, a narrow trail here, which we followed, and which
frequently overhung precipices of great height. In several places the trail was
actually resting on crowbars thrust into the face of the cliff. We stumbled along,
but we were all so tired out that we really cared little what happened to us. The
hours seemed interminable.
At 3.30 A.M. we at last approached Go village. All the houses were shut up,
everything was as still as death, until we got quite close. Then dogs barked
furiously from every house, and the noise was echoed from mountain to
mountain. The weather had somewhat cleared, but nowhere could I discern my
tents. We shouted and yelled to rouse the head village man, and eventually the
scared figure of a Shoka appeared, lighted by a red blaze from a torch he
carried in his hand. He was jovial, and most anxious to be of assistance.

Although not absolutely, I am, practically, a teetotaller, as I seldom require
stimulants, but on that particular occasion I would have given all the money I
possessed to have a glass—or, better, a bottle—of stout! But, alas! the nearest
bottle of stout was a great many days’ journey from there. Chökti, the native
liquor, was the only stuff procurable, and I more than jumped at the offer when
the chieftain suggested that we must drink some to be revived.

Now, there is nothing a Shoka admires more in a foreigner than appreciation of
the national chökti—an appreciation they seldom get, for chökti is, indeed, the
vilest concoction a human mind can conceive or a human throat swallow.

When the Shoka and his torch disappeared we listened from the door with ever-
increasing attention to noises of dangling keys being tried, one after the other,
into a lock. Then came the snapping sound of the opening padlock, next the
loosing of the iron chain which is ever used in bolting Shoka doors. Reproachful
noises from the household, interrupted in their sleep, and remonstrative cries of
female relatives, could be heard at intervals; then a long silence, some
rattling about, and at last the chieftain reappeared, triumphantly nursing a huge
jug of the “reviver.”

“Will you drink it here or down in your tent?” he inquired, with a twinkle in his
eye.

“In the tent,” I replied; and we all went down to where my camp had been
pitched. My men sprang out from all sides on hearing my voice, especially
several of them who, not expecting me back that night, had thought fit to occupy
my tent.

In a few moments the camp was alight with several blazing fires, there being
plenty of fuel at this place, and from the village a string of figures with torches
were running down, bringing food, more fuel, milk, and vegetables. The natives
of Go were indeed most thoughtful and polite.
I had marched continuously for twenty-two and a half hours, covering over forty
miles, the entire time over most difficult ground and at such great elevation that
when I sat down upon my blankets I felt quite exhausted. Nor did devouring—
the word eating is hardly expressive enough—several pounds of rice and meat
and potatoes and plum-pudding and milk and chökti make me feel any
better. My appetite was insatiable, and no sooner was my head laid on the pillow
than I was fast asleep. Oh, what a lovely sensation to go to sleep when you are
so tired!
CHAPTER XX

We were already getting to lower elevations—the village of Go being
only 10,577 feet. We had to the west of us the great Nanda Devi, the
highest mountain in the British Empire, 25,660 feet; three pyramidal
peaks, with rock exposed in vertical streaks right up to the summit. The
central peak is Nanda Devi itself, the next highest peak being 24,379
feet, according to Trigonometrical Survey measurements.

There are several extensive glaciers to the east watershed of Nanda
Devi, mostly extending from west to east, with a slight tendency
northwards, but on the western side of the watershed Nanda Devi is
practically surrounded by an immense glacier with numerous
ramifications.

There are a great many legends regarding this imposing mountain, the
principal one stating that on the shores of a lake which is supposed to
exist on the very summit of Nanda is the abode and present
residence of Vishnu. The natives state that smoke is often seen rising
from the summit, which, they say, is caused by the god’s kitchen. Some
considerable distance below the summit, as high as one can possibly
climb, a festival is held every twelve years, but so difficult is the ascent
that many pilgrims perish, and only very few can reach the elevated
spot. Those few are held in great respect by their fellow-countrymen.

Nearer us than Nanda Devi and also to our west were two other giants
(21,520 feet and 22,660 feet), showing characteristics very similar to
Nanda Devi. The former had an immense glacier, the Naulphu, on its
eastern side, with huge masses of clear ice of resplendent beauty when
the sun shone on them. The ice terraces were fairly regular, much more
so than in most other glaciers I had inspected. At the bottom of the
glacier, in the centre, was an immense wall of ice, horse-shoe-shaped, a
most impressive sight as it stood out in brilliant relief above the dark-
brown débris of the terminal moraine. The Neo-lak-chan River has its
birth from this glacier.

There were three or four picturesque little Shoka hamlets along the river
—especially near the spot where the Lissar River, fed by a number
of glaciers to the north-west, meets the Dholi River, which we have
followed from the Nui Glacier to the north. The village of Dukti was quite
attractive, with houses painted white, slate roofs, and strongly built
store-houses.

The trail mostly followed the course of the Dholi River, and was often
boxed in between high vertical cliffs of grey rock, along which the road
was constructed on crowbars. Between the villages of Bahling (10,230
feet) and Nagling (9876 feet—h.a.) was another small but interesting
glacier of dirty grey ice mixed with mud and débris, and a central and
two side dunes. A stream rose from it and became a tributary of the
Dholi.

From Nagling to Shobla there is a fair road, and from Sela, about half-
way between the two above villages, it is possible to get over the
mountains to Kuti. The way, however, is extremely bad, and over a good
deal of snow. The journey occupies from three to four days.

From Nagling southward the road was almost an identical replica of the
Nerpani—the waterless trail—I had followed on the way out. In many
places it was supported on crowbars and we had a drop under us of
several hundred feet.

As far as Go from Nui the trail was on the east side of the Dholi River;
from Go it was on the west side. Some two miles from Shobla one got a
charming bird’s-eye view of this village, with the river like a ribbon of
silver winding its way between high mountains covered with luxuriant
vegetation. Perhaps the beautiful deep green of the trees affected us all
the more because we had been so long among barren, desolate, dreary
landscapes, and among snow and ice; so that it was a regular feast for
our eyes to see some signs of vegetation again.

The Darma Shokas, like those of Bias, only inhabit these villages during
the summer months, retiring to warmer regions farther south (mainly to
Dharchula) for the winter. Hence, a great many temporary sheds can be
seen in all their villages, wherein are stored their articles of furniture,
mats, and clothing, which they do not require when busy trading in the
summer months. In some spots the mountain side was simply dotted
with these temporary store-houses.

As we were going towards Khela, where we should complete the circle of our
journey and meet another contingent of my men who had
proceeded there direct from Nepal, we had no further adventures worth
mentioning, except one.

We came to a strange cave, only a few yards deep and some 30 feet
high, in the side of a hill. The natives had told me that no animal could
enter it without dying, and, in fact, when we peeped into it we saw a
number of skeletons of dogs, other small mammals, and birds. On
stooping down, one of my men and I were immediately seized with
giddiness and a fainting sensation, and had we not been quick enough
in jumping out into the open air we might have possibly collapsed, owing
to the noxious gases which emanated from the ground in the cave. A
peculiar sulphurous odour was noticeable, even some little distance
from the cave. It is in its effects very much like the “Grotta dei Cani” of
Naples, only this one seemed more deadly. The gases seem to hang
low upon the ground, not more than about 3 or 3½ feet, although on
entering the cave one felt at once a stifling sensation, even when
standing upright. We two who had stooped suffered from a severe
headache for some hours.

Before we leave the Darma district, a word on the history of the natives
may be of interest. The Darma Shokas are in many ways—and in facial
appearance—very different from the Shokas of Chaudas and Bias,
