Master in Computer Engineering

Computing Project
(Proj-H-402)
A trading system based on
technical indicators and neural
networks
Student:
Michel Halmes
Supervisors:
Mauro Birattari
Michele Pace
May 18, 2012
Contents
1 Introduction
1.1 What is Technical Analysis?
1.2 How does it work?
2 Technical indicators
2.1 Accumulation Distribution Line (ADL)
2.2 Average Directional Index (ADX)
2.3 Bollinger band (%B)
2.4 Relative Strength Index
2.5 Money Flow Index (MFI)
2.6 Chaikin Money Flow (CMF)
2.7 Force Index
2.8 Moving Average Convergence-Divergence (MACD)
2.9 Stochastic Oscillators
3 Implementation and evaluation of basic trading systems
3.1 Stop-loss and take-profit
3.2 Money management
3.3 Performance measures
3.4 Test candidates
3.5 Results of testing
4 A simple approach based on Neural Networks
4.1 A brief explanation of Neural Networks
4.2 How to train a Neural Network?
4.3 The training system
4.4 Results of the neural network system
5 Neural Network training based on genetic algorithms
5.1 Overview of the approach
5.2 The genetic algorithm
5.3 The objective function
5.4 An additional restart feature
6 Performance tests of the Neural Network
6.1 Approach
6.2 Robustness of the trading system
6.3 Potential remedies against the low robustness
6.3.1 Grouped training
6.3.2 Control via the objective function
6.3.3 Complexity of the network
7 Conclusion
7.1 Why technical indicators and neural networks do not work together
7.2 A personal comment on efficient markets
The aim of this project is purely academic. It assesses in how far technical indicators can be used in combination with neural networks in order to set up a trading system. It will be found that such an approach is not straightforward and very unlikely to be useful as the basis for a fully working trading system. Yet, this project has its contribution in the presentation of a well-performing genetic algorithm which can be used to train a neural network. This report explains the main issues which are faced when neural networks and technical indicators are combined.
The report is organized as follows: The first section explains briefly what a trading system is and lays out some common approaches. Thereafter, the technical indicators that are used in this study are presented. A first approach will then attempt to use the back-propagation algorithm to train the neural network on the financial data. As this approach is inconclusive, an approach based on a genetic algorithm is presented subsequently. This approach raises some unexpected issues which will be discussed and which we attempt to resolve. Finally, it will be concluded how successfully neural networks and technical indicators can be used together.
1 Introduction
1.1 What is Technical Analysis?
The use of financial data such as prices, timing and the volume of past transactions to forecast future price evolution is called technical analysis. It can be applied to different kinds of markets such as stock markets, foreign exchange markets and derivatives markets. This analysis is however focused on shares only.
The technical analysis view stands in contrast to the so-called fundamental analysis. Under this view, one uses data about the state of the economy (such as growth and inflation rates), but mostly data about the company itself (such as its financial statement, past dividends, its products, its competitors, and its strategy), in order to predict price evolution.
Both views are in strong opposition to the modern portfolio theory of finance and its Efficient Market Hypothesis. The latter states that prices are essentially unpredictable, and that active portfolio management has no added value. The discussion about which of both schools is right or wrong, or rather in how far each of both is somehow right, is left to the reader himself. Yet, the conclusion provides a personal view on efficient markets. In addition, the most important arguments each school uses to defend its position are presented in this introduction.
The Efficient Market Hypothesis argues that all publicly available information is reflected in the price. Under its strong form it even supposes that all privately available information is immediately reflected in prices, such that even insiders cannot take advantage of their privileged information. This is because rational individuals would immediately buy or sell an asset if information that becomes available would make the current price unjustified.
The arguments technical analysts use to justify their approach are related to behavioral finance. Technicians argue that market participants are not as rational as the Efficient Market Hypothesis supposes them to be. Individuals need time to process the bulk of information that can influence asset prices. Information is therefore not reflected immediately in prices as the Efficient Market Hypothesis states. Moreover, market participants tend to exaggerate their perceptions of good and bad news. This might cause the existence of positive and negative price bubbles, at least in the short run. Alan Greenspan, a former Federal Reserve Chairman, gave this phenomenon the name of Irrational Exuberance. Also feelings and past experience play a big role when individuals trade assets. Individuals tend to hesitate when new information is received and to use it only with a certain delay in order to observe the reactions of other traders first. Technical analysts argue therefore that markets react only slowly to information. Moreover, the tendency of adaptation to new information can be detected early in prices and volumes, and the time of adaptation is sufficiently long to make profitable trades.
The technical-analysis view, and in some ways also the fundamental-analysis view, can be integrated in a trading system. A trading system, or more precisely an algorithmic trading system, uses an electronic platform generating trading orders in financial markets based on algorithms and without human intervention. Those algorithms use as input data such as prices, timing and the volume of past transactions.
The big advantage of trading systems is that they exclude all emotions from trading. Emotions are frequently cited as one of the biggest flaws of individual investors. Decisions are often biased by risk aversion, mood and wrong interpretations of past experiences. By following a pre-defined system, human inefficiencies can be overcome. The disadvantage of trading systems is that they are difficult to develop and require a deep understanding of technical analysis.
1.2 How does it work?
There exist many different approaches to technical analysis. As an example, some look for support and resistance lines. A support line is a price below which a share price rarely goes. If the price approaches such a support from the upside, it means that the price is likely to reverse upwards. The opposite is true for resistance lines, above which a share price rarely goes. Traders can thus make profits while the share fluctuates between its support and its resistance. Such a system executes many trades with small but consistent profits.
Other technical analysts try to identify trends, that is, longer periods during which the share price moves in the same direction. Of course longer trends are relatively rare, as most price movements reverse after a while. A trading system based on trends differs therefore from the previous one by the fact that only a few trades are profitable, but those which are bring large profits.
It is important that a trading strategy is adapted to a specific market state. One distinguishes market states based on two criteria. The first criterion distinguishes stable and trending markets. In contrast to the latter, which has already been explained, a stable market is characterized by a share price which remains within a certain range. The second criterion distinguishes quiet and volatile markets. A quiet market is one in which prices either remain within a small range in the case of a stable market, or trend without any severe retracements or opposite price movements. Volatile markets are the contrary of this.
Figure 1: The evolution of the IBM share price. One can observe support and resistance lines between which the price oscillates. When a trend begins, support and resistance lines become obsolete.
To come back to the previous examples, a trading system based on support and resistance lines, or more generally the class of so-called counter-trend systems, works best in stable and volatile markets, while trend-following systems work best in trending and quiet markets.
So as to make their order decisions, some technical analysts try to detect patterns in charts. One approach which is not presented here makes use of candlesticks, a particular representation of price movements. A candlestick is composed of a rectangle (the real body) and two lines above and below this body (the shadows). The body indicates the open and the close price (the open price is the price at the beginning of the day and the close price is the price at the end of the day). The color of the body indicates which one is the lower. If the body is filled, the stock closed lower than its opening price. When it is hollow the opposite holds. The shadows indicate the level of the highest and the lowest price. The shape of those candlesticks can then be used to make order decisions.
Another approach uses moving averages (MA) in order to detect trends. The moving average shows the average price over a certain period. One generally combines a shorter and a longer moving average. The short MA moves closely with the price, while the longer one has more inertia. If the shorter MA is above the longer, it generally means that the share is in an uptrend. This trend reverses when the fast MA moves back below the slow MA.
There exist two types of moving averages over N days. The Simple Moving Average (SMA) is obtained by computing the average of the price over the last N days. The Exponential Moving Average (EMA) is obtained by attributing decreasing weights to the prices situated far in the past. The first value of the EMA must be initialized to the (simple) moving average over the N previous days.
Figure 2: Candlesticks. (a) Interpretation of candlesticks. (b) Candlestick representation for the PowerShares QQQ trust.
For the following days, the EMA is then updated using some weight \alpha:

EMA_N = \frac{1}{N} \sum_{t=1}^{N} P_t    (1)
EMA_t = \alpha P_t + (1 - \alpha) EMA_{t-1},  t > N    (2)
      = \alpha P_t + \alpha (1 - \alpha) P_{t-1} + \alpha (1 - \alpha)^2 P_{t-2}    (3)
        + \cdots + (1 - \alpha)^{t-N} EMA_N    (4)
This average needs a few days in order to converge to its true value and to decrease the influence of the initialization value. Of course, this average does not depend on a fixed number of days: it depends on by far more than N days, but more recent prices have a higher influence than days situated far in the past. By convention, one uses a value of the weight given by \alpha = 2/(N+1). In this case 86% of the weights are attributed to the last N days.
Both averages move relatively closely together; the only difference being that the EMA is more reactive to recent price movements. Figure 3 below shows a trading system based on moving averages.
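As an illustration, the following is a minimal Python sketch of both averages as described above; it assumes the daily closing prices are available as a pandas Series (the names are illustrative and not part of the original implementation):

import pandas as pd

def sma(close: pd.Series, n: int) -> pd.Series:
    # Simple Moving Average: plain average of the last n closes
    return close.rolling(n).mean()

def ema(close: pd.Series, n: int) -> pd.Series:
    # Exponential Moving Average with alpha = 2/(n+1),
    # initialized with the simple average of the first n days as in equation (1)
    alpha = 2.0 / (n + 1)
    out = close.astype(float).copy()
    out.iloc[:n] = close.iloc[:n].mean()
    for t in range(n, len(close)):
        out.iloc[t] = alpha * close.iloc[t] + (1 - alpha) * out.iloc[t - 1]
    return out

# A moving-average system as in figure 3 would, for example, compare ema(close, 30)
# with ema(close, 100) and buy when the short average is above the long one.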
A somewhat more sophisticated method is based on so-called technical indicators. Those are indicators based on past prices (mostly even different prices for each day, such as open, close, high, and low) and past trading volumes. Some are even based on moving averages.
2 Technical indicators
In this section, ten different technical indicators will be presented. For each of those indicators its interpretation, its computation and a decision rule based on this indicator will be provided. For the sake of brevity of this report, the indicators are not illustrated with real-life examples. The interested reader is referred to http://stockcharts.com/school, which illustrates the indicators presented here and even many more. The decision rules which will be used here are, by the way, those presented on this website.
Figure 3: Trading system based on 30-day and 100-day EMA for Inter-Telecom
2.1 Accumulation Distribution Line (ADL)
The Accumulation Distribution Line is a volume-based indicator designed to measure the cumulative flow of money into and out of a share.
Computation:
1. Money Flow Multiplier = [(Close - Low) - (High - Close)] /(High - Low)
2. Money Flow Volume = Money Flow Multiplier x Volume for the Period
3. ADL = Previous ADL + Current Period's Money Flow Volume
The Money Flow Multiplier fluctuates between +1 and -1. The multiplier is positive when the close is in the upper half of the high-low range and negative when in the lower half. It is a measure of buying and selling pressure. The buying pressure is stronger than the selling pressure when prices close in the upper half of the day's range (and vice versa). Combined with the volume, the measure of buying and selling pressure is even reinforced. A high positive multiplier combined with a high volume shows strong buying pressure that pushes the indicator higher. Conversely, a low negative number combined with a high volume reflects strong selling pressure that pushes the indicator lower.
Decision rule:
IF ADL > 14-day EMA of ADL THEN buy
IF ADL < 14-day EMA of ADL THEN sell
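A minimal sketch of this indicator and its decision rule, assuming the daily data sit in a pandas DataFrame with High, Low, Close and Volume columns (these names are illustrative):

import pandas as pd

def adl(df: pd.DataFrame) -> pd.Series:
    # Money Flow Multiplier, between -1 and +1
    mfm = ((df["Close"] - df["Low"]) - (df["High"] - df["Close"])) / (df["High"] - df["Low"])
    mfv = mfm * df["Volume"]          # Money Flow Volume
    return mfv.cumsum()               # running total = Accumulation Distribution Line

def adl_signal(df: pd.DataFrame) -> pd.Series:
    line = adl(df)
    ema14 = line.ewm(span=14, adjust=False).mean()
    # buy while the ADL is above its 14-day EMA, sell while it is below
    return pd.Series(["buy" if above else "sell" for above in line > ema14], index=df.index)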
2.2 Average Directional Index (ADX)
The Average Directional Index measures the trend strength without regard to the trend direction. Two other indicators, the Plus Directional Indicator (+DI) and the Minus Directional Indicator (-DI), complement ADX by defining the trend direction.
Computation:
1. Calculate the True Range (TR), which is defined as the greatest of the following:
Method 1: Current High less the current Low
Method 2: Current High less the previous Close (absolute value)
Method 3: Current Low less the previous Close (absolute value)
2. Calculate the Plus Directional Movement (+DM) and Minus Directional Movement (-DM) for each period.
Directional movement is positive (plus) when the current high minus the prior high is greater than the prior low minus the current low. This so-called Plus Directional Movement (+DM) then equals the current high minus the prior high, provided it is positive. A negative value would simply be entered as zero.
Directional movement is negative (minus) when the prior low minus the current low is greater than the current high minus the prior high. This so-called Minus Directional Movement (-DM) equals the prior low minus the current low, provided it is positive. A negative value would simply be entered as zero.
3. Smooth these periodic values using Wilder's smoothing technique. For instance, for the True Range this is done as follows:
First TR14 = Sum of first 14 periods of TR1
Subsequent Values = Prior TR14 - (Prior TR14/14) + Current TR1
4. Divide the 14-day smoothed Plus Directional Movement (+DM) by the 14-day smoothed True Range to find the 14-day Plus Directional Indicator (+DI14). Multiply by 100 to move the decimal point two places. Do the same for the 14-day smoothed Minus Directional Movement (-DM).
5. The Directional Movement Index (DX) equals the absolute value of +DI14 less -DI14, divided by the sum of +DI14 and -DI14.
6. After all these steps, it is time to calculate the Average Directional Index (ADX) as follows:
First ADX14 = 14-period average of DX
Subsequent ADX14 = [(Prior ADX14 x 13) + Current DX Value] / 14
The Average Directional Index (ADX) is used to measure the strength or weakness of a trend, not the actual direction. The direction of the trend is defined by +DI and -DI.
Decision rule:
IF ADX > 20 & DI+ > DI- THEN buy
IF ADX > 15 & DI+ < DI- THEN sell
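The sketch below follows the steps above; Wilder's running-sum smoothing is approximated with an exponential average with alpha = 1/14, and the DataFrame column names are again assumptions:

import numpy as np
import pandas as pd

def adx(df: pd.DataFrame, n: int = 14):
    high, low, close = df["High"], df["Low"], df["Close"]
    prev_close = close.shift(1)
    # True Range: greatest of the three methods
    tr = pd.concat([high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1).max(axis=1)
    up, down = high.diff(), -low.diff()
    plus_dm = pd.Series(np.where((up > down) & (up > 0), up, 0.0), index=df.index)
    minus_dm = pd.Series(np.where((down > up) & (down > 0), down, 0.0), index=df.index)
    smooth = lambda s: s.ewm(alpha=1.0 / n, adjust=False).mean()   # Wilder-style smoothing
    plus_di = 100 * smooth(plus_dm) / smooth(tr)
    minus_di = 100 * smooth(minus_dm) / smooth(tr)
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    return smooth(dx), plus_di, minus_di

# adx_, plus_di, minus_di = adx(df)
# buy  when adx_ > 20 and plus_di > minus_di
# sell when adx_ > 15 and plus_di < minus_di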
2.3 Bollinger band (%B)
The %B compares the current price level with an upper and a lower band. Those
bands are the Bollinger bands, which are set 2 standard deviations above and
below the 20-day simple moving average.
Computation:
1. %B = (Price - Lower Band)/(Upper Band - Lower Band)
%B can be used to identify overbought and oversold situations. A share is overbought if its price is too high and is likely to decline. The opposite holds for oversold shares.
Decision rule:
The %B is usually combined with the RSI or the MFI presented below. A share is considered overbought if it is situated in the upper 20% between the bands and oversold if situated in the lower 20%.
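A sketch of the %B computation from the 20-day Bollinger bands, with the Series of close prices as an assumed input:

import pandas as pd

def percent_b(close: pd.Series, n: int = 20, k: float = 2.0) -> pd.Series:
    mid = close.rolling(n).mean()          # 20-day simple moving average
    std = close.rolling(n).std()
    upper, lower = mid + k * std, mid - k * std
    return (close - lower) / (upper - lower)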
2.4 Relative Strength Index
The Relative Strength Index is an indicator that measures the speed and change of price movements.
Computation:
1. Compute Average Gain / Average Loss as follows:
First Average Gain = Sum of Gains over the past 14 periods / 14.
Subsequent Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.
Idem for Average Loss
2. RS = Average Gain / Average Loss
3. RSI = 100 - 100/(1+RS)
The RSI is situated between 0 and 100. It can be combined with the %B in order to form a trading system. It indicates again whether a share is overbought or oversold. It is considered oversold if the indicator goes below 20 and overbought above 80 (here the more cautious thresholds of 25 and 70 are used in the trading rule).
Decision rule:
IF %B < 0.2 & RSI < 25 THEN buy
IF %B > 0.7 & RSI > 70 THEN sell
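A sketch of the RSI with the running averages of step 1, implemented here as a Wilder-style exponential average with alpha = 1/14:

import pandas as pd

def rsi(close: pd.Series, n: int = 14) -> pd.Series:
    change = close.diff()
    gain = change.clip(lower=0)           # positive price changes
    loss = -change.clip(upper=0)          # negative price changes, as positive numbers
    avg_gain = gain.ewm(alpha=1.0 / n, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / n, adjust=False).mean()
    return 100 - 100 / (1 + avg_gain / avg_loss)

# combined rule: buy when %B < 0.2 and RSI < 25, sell when %B > 0.7 and RSI > 70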
2.5 Money Flow Index (MFI)
The Money Flow Index is an indicator that uses both price and volume to measure buying and selling pressure. It can be considered as a volume-weighted form of the RSI.
Computation:
1. Typical Price = (High + Low + Close)/3
2. Raw Money Flow = Typical Price x Volume
3. Positive Money Flow = Sum of positive Raw Money Flow over 14 periods.
4. Negative Money Flow = Sum of negative Raw Money Flow over 14 periods.
5. Money Flow Ratio = (Positive Money Flow)/(Negative Money Flow)
6. Money Flow Index = 100 - 100/(1 + Money Flow Ratio)
The MFI is situated between 0 and 100 and can also be combined with the
%B in order to form a trading system.
Decision rule:
IF %B < 0.2 & MFI < 25 THEN buy
IF %B > 0.7 & MFI > 70 THEN sell
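A sketch of the MFI; the text does not spell out when a raw money flow counts as positive, so the usual convention (the typical price is above the previous day's typical price) is assumed here:

import pandas as pd

def mfi(df: pd.DataFrame, n: int = 14) -> pd.Series:
    tp = (df["High"] + df["Low"] + df["Close"]) / 3       # typical price
    raw = tp * df["Volume"]                                # raw money flow
    up = tp > tp.shift(1)
    positive = raw.where(up, 0.0).rolling(n).sum()
    negative = raw.where(~up, 0.0).rolling(n).sum()
    return 100 - 100 / (1 + positive / negative)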
2.6 Chaikin Money Flow (CMF)
The Chaikin Money Flow combines price and volume to show how money may be flowing into or out of a stock. It is an alternative to the Accumulation Distribution Line, i.e. it is a measure of buying and selling pressure.
Computation:
1. Money Flow Multiplier = [(Close - Low) - (High - Close)] /(High - Low)
2. Money Flow Volume = Money Flow Multiplier x Volume for the Period
3. 20-period CMF = 20-period Sum of Money Flow Volume / 20-period Sum
of Volume
The resulting indicator fluctuates above/below the zero line.
Decision rule:
IF CMF > 0.05 THEN buy
IF CMF < -0.05 THEN sell
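A sketch of the 20-period CMF, using the same Money Flow Volume as in the ADL sketch above:

import pandas as pd

def cmf(df: pd.DataFrame, n: int = 20) -> pd.Series:
    mfm = ((df["Close"] - df["Low"]) - (df["High"] - df["Close"])) / (df["High"] - df["Low"])
    mfv = mfm * df["Volume"]
    return mfv.rolling(n).sum() / df["Volume"].rolling(n).sum()

# buy when cmf > 0.05, sell when cmf < -0.05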
2.7 Force Index
The Force Index uses price and volume to assess the power behind a move and
to identify possible turning points.
Computation:
1. Force Index(1) = [Close (current period) - Close (prior period)] x Volume
2. Force Index(13) = 13-period EMA of Force Index(1)
The Force Index combines three elements into one indicator so as to measure selling and buying pressure. First, there is either a positive or negative price change. A positive price change signals that buyers were stronger than sellers, while a negative price change signals that sellers were stronger than buyers. Second, there is the extent of the price change, which is simply the current close less the prior close. The higher the price change, the higher the corresponding pressure. The third and final element is volume, which measures commitment.
Decision rule:
IF force(13) > 0 THEN buy
IF force(13) < 0 THEN sell
Typical values for the implementation of the Force Index use the 13- or 20-day EMA.
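A sketch of the smoothed Force Index:

import pandas as pd

def force_index(df: pd.DataFrame, n: int = 13) -> pd.Series:
    fi1 = df["Close"].diff() * df["Volume"]        # 1-period Force Index
    return fi1.ewm(span=n, adjust=False).mean()    # n-day EMA (13 or 20 days)

# buy when force_index(df) > 0, sell when force_index(df) < 0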
2.8 Moving Average Convergence-Divergence (MACD)
The MACD is all about convergence and divergence of the two moving averages
(one slow and one fast).
Computation:
1. MACD Line: 12-day EMA - 26-day EMA of prices
2. Signal Line: 9-day EMA of MACD Line
3. MACD Histogram: MACD Line - Signal Line
The MACD Line measures the difference between a short and a long moving average. Such an indicator would be used in a normal moving-average trading system. However, this indicator is too slow to identify the beginnings and reversals of trends early enough. The MACD Histogram measures the convergence and divergence of the MACD Line. It fluctuates above and below the zero line. A positive MACD Histogram indicates that the MACD Line is increasing, which means that one is potentially in an uptrend. An increase of a positive MACD Histogram indicates that the MACD Line increase is accelerating, which indicates a stronger uptrend. The opposite holds for negative, respectively negative and decreasing, MACD Histograms.
Decision rule:
IF MACD-hist > 0 THEN buy
IF MACD-hist < 0 THEN sell
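A sketch of the MACD histogram from the three EMAs listed in the computation:

import pandas as pd

def macd_histogram(close: pd.Series) -> pd.Series:
    macd_line = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    signal_line = macd_line.ewm(span=9, adjust=False).mean()
    return macd_line - signal_line         # > 0: buy, < 0: sell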
2.9 Stochastic Oscillators
The Stochastic Oscillator is an indicator that shows the location of the close
relative to the high-low range over a number of periods.
Computation:
1. %K = (Current Close - Lowest Low)/(Highest High - Lowest Low) * 100
Lowest Low = lowest low for 14 day look-back period
Highest High = highest high for 14 day look-back period
2. fast%D = 3-day SMA of %K
3. slow%D = 3-day SMA of fast%D
The Stochastic Oscillator takes values between 0 and 100. It can be used to identify oversold and overbought shares. A share is overbought if the Stochastic Oscillator is above 80 and oversold if it is below 20. As their names indicate, the slow %D is smoother than the fast %D. One can base a trading system on either of the two indicators.
Decision rule:
IF %D moves from below 20 to above 20 THEN buy
IF %D moves from above 80 to below 80 THEN sell
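A sketch of %K and the two %D lines:

import pandas as pd

def stochastic(df: pd.DataFrame, n: int = 14):
    lowest_low = df["Low"].rolling(n).min()
    highest_high = df["High"].rolling(n).max()
    k = 100 * (df["Close"] - lowest_low) / (highest_high - lowest_low)
    fast_d = k.rolling(3).mean()           # 3-day SMA of %K
    slow_d = fast_d.rolling(3).mean()      # 3-day SMA of fast %D
    return k, fast_d, slow_d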
3 Implementation and evaluation of basic trading systems
This section will analyze the performance of the trading systems based on the technical indicators which have been presented above. All those systems are based on very simple IF-THEN rules. But first, an important additional feature and some performance measures will be introduced.
3.1 Stop-loss and take-profit
Two important aspects of trading systems which have not been presented yet are the stop-loss and take-profit features. As their names indicate, the stop-loss ends a position that has been taken and has generated too many losses so far, and the take-profit ensures that a position which has generated good profits so far is sold at a good moment in order to cash in these profits.
There exist many different types of stop-loss and take-profit, and the right choice of those features is crucial for the success of a trading system. However, the aim of this project is not to discuss different forms of those features. Therefore, a simple but still widely implemented solution is used: the so-called trailing-stop. The trailing-stop assumes the roles of stop-loss and take-profit at the same time. The trailing-stop sells the position when the current price goes below a certain percentage of the maximum price reached during the position. This percentage has been chosen equal to 5% as it gave the most satisfying results. This means that the maximum loss that can be made on a trade is 5%. This is the stop-loss function. The take-profit function assures that the profit is cashed in when the share has decreased by 5% from its peak.
The percentage value that has been chosen for the trailing-stop must balance a trade-off. If its value is too high, the system can incur huge losses and is hence very risky. Moreover, the take-profit is inefficient, as the system loses a lot of its value compared to the peak which has been attained during a trade. On the other hand, if its value is too low, the system will exit a position at the first price fall which is observed, even though prices might continue to increase afterwards. The trades a system makes are then too short and the potential gain too small to recover the trading costs.
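A minimal sketch of the trailing-stop exit logic, assuming prices is a list of daily close prices and entry the index of the day on which the position was opened (both names are illustrative):

def trailing_stop_exit(prices, entry, pct=0.05):
    # Sell when the price falls pct (here 5%) below the highest price seen since the entry.
    peak = prices[entry]
    for t in range(entry + 1, len(prices)):
        peak = max(peak, prices[t])
        if prices[t] <= peak * (1 - pct):
            return t                      # index of the day on which the position is closed
    return len(prices) - 1                # position still open at the end of the data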
3.2 Money management
Another important aspect that has not been mentioned yet is the money management system. How much to buy is a very crucial question for the success of a trading system. But as the aim of this project is purely academic, a simple rule has also been used here. At each buy signal, the share is bought for 100 units. This gives an equal chance to each trading decision and is therefore suited for evaluating the trading system. At the same time, all figures such as profit or loss amounts can immediately be interpreted as a percentage. This rule is however very simplified, for several reasons. First, one cannot simply buy a share for €100 or $100, but only for a multiple of its share price, which could even be above €100 or $100. Second, this assumption does not take into account that the capital available to the system might be insufficient, especially after several successive losses. What is also not taken into account is the question of diversification. It might not be a very smart idea to let a system buy for large amounts shares of companies operating in the same sector. A very important principle in finance is diversification, that is, not to put all eggs in the same basket. However, taking account of those factors would be beyond the scope of this project.
In order to simplify further, and also because €100 or $100 is too small to be realistic, this study does not take into account trading costs either. However, the number of trades will always be reported, such that the computation of trading costs is straightforward.
3.3 Performance measures
In order to compare different trading systems, this section presents some metrics that help in evaluating trading systems. The metrics which are reported here are the following [11]:
Total net profit examines profitability irrespective of the risk taken to achieve the results. It is useful to quickly compare various portfolio component results without additional calculations. As the profit must be relativized with respect to the duration, the annualized profit will be presented here. If a trading system runs over N days and generates a profit of \pi while investing each time 100 units, the annualized profit is given by:

annR = \left(1 + \frac{\pi}{100}\right)^{252/N} - 1    (5)

The fraction in the exponent represents the inverse of the length of the trading period in years, where it has been supposed that one year is composed of 252 working (trading) days.
Number of trades (# Trades) shows the total number of trades taken during the testing period.
Number of days (# Days) shows the average duration of a trade. As with the number of trades, all else being equal, the lower the number of days in a trade while still generating superior results, the better.
Maximum drawdown amount (Max Draw) tells us the maximum peak-to-valley equity drawdown during the testing period. This number defines our absolute minimum capitalization requirements to trade the system.
Maximum drawdown duration (MDD) is the longest duration of a drawdown in equity prior to the achievement of a new equity peak. This number is essential in psychologically preparing us for how long we must wait to experience a new peak in account equity.
Maximum consecutive losses (MCL) is the maximum number of consecutive losses endured throughout the testing period. Just as the MDD is important in dispelling any fantasies regarding a system's ability to jump continuously from equity peak to ever higher peaks, the MCL shows ahead of time exactly how many consecutive losses successful traders would have endured to enjoy the system's total net profit.
Profit to maximum drawdown (P:MD) refers to the average profit to maximum drawdown ratio. The higher this ratio is, the better. This is probably the most important field listed, because it allows one to examine the profit in relation to the risk endured to achieve that profitability.
Profit loss ratio (P:L ratio) refers to the average profit to average loss ratio. As with P:MD, the higher these numbers are, the better. Trend-following systems should have very good P:L ratios because they generally display a low winning percentage of trades. This means that large profits and small losses are key in generating a good P:MD ratio. These ratios will drop for counter-trend systems, but the winning percentage of trades should compensate for this.
Percent winners (%W) is the percentage of winning trades. As stated, trend systems generally will have relatively low %Ws and counter-trend systems typically display high %Ws.
Time percentage (Time %) refers to the amount of time that the system has an open position in the market. If all other fields were equal, then a lower time percentage would be preferable, because it means our available capital is tied up for less time to yield the same rate of return.
3.4 Test candidates
To test the different trading systems that will be presented in this report, 30 shares from different markets and sectors have been selected.
When importing the data, a difficulty has been observed. The price data contained several break points where the price falls considerably (see figure 4). Those break points correspond to financial operations such as dividend distributions, capital increases, etc. They do not, however, affect the return of the share. To overcome this problem, one must use the adjusted close, which eliminates those effects. From the difference between the adjusted close and the close, one could theoretically also adjust the other prices (such as open, high and low) in order to be sure that the computed indicators are also adjusted. However, one also observes that some volatility is missing in the adjusted price before the breakpoint. Therefore, only the last break-free period has been retained to perform the tests.
Figure 4: Evolution of the close price and the adjusted close for Target Corporation. One observes several break points in the close price.
Tables 1 and 2 show the characteristics of the test candidates [4]. Reported are the company code, the complete name, the market and the sector. Reported are also the market capitalization (which gives an indication of the size of the company), the number of days for which price data is available and the annualized return of the share during this period. Finally, the beta is also reported. The beta is a well-known measure of the risk of a share. More precisely, it indicates how volatile the returns of a share are compared to the volatility of the market. A share with a positive beta (which is the case for almost all companies) has its returns depending on the return of the market. A share with a beta higher than one has stronger volatility in its returns than those of the market, while one with a beta between 0 and 1 has lower volatility than the market. Mathematically, the beta is given by the slope of an OLS regression between the share's and the market's returns.
3.5 Results of testing
Table 3 shows the performance metrics of the different trading systems (more details can be found in the attached Excel file Benchmark.xls). It includes one system in addition to those that have been presented in the previous section: a 3-screen system proposed by Dr. Pace.
Table 1: Information about the test candidate shares (1/2). For each share, the table reports its code, complete name, market, sector, market capitalization, beta, annualized return (%) and the number of days of available price data.
Table 2: Information about the test candidate shares (2/2), with the same columns as Table 1.
The 3-screen system buys when the share is oversold and the 40-day moving average is rising. This has been translated into the following decision rule:
IF %B < 0.2 & MFI < 25 & EMA40 > 10-day SMA of EMA40 THEN buy
IF %B > 0.7 & MFI > 70 THEN sell
The trading systems are run over all 30 candidate shares using the trailing-stop system. This means that each trading system has been tested on a total of about 86000 days. The table reports the metrics which have been defined above. Moreover, it includes the standard deviation of the annual returns of the trading systems, which is an important metric for the risk of the trading systems. This standard deviation should be compared to the standard deviation of the annualized market return of the shares, which is at 6.5%.
The annual return of the trading system can easily be compared to the average annual market return of the shares, which is at 2.59% over the studied period. The table states this deviation from the market return. The last column provides the p-value of Welch's student test. This test is used with two samples having potentially different standard deviations. Its test statistic is given by:
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}    (6)

where \bar{X}_i, s_i^2 and N_i are the i-th sample mean, sample variance and sample size respectively. This statistic follows a Student-t distribution with the degrees of freedom given by:

\nu = \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{s_1^4}{N_1^2 (N_1 - 1)} + \frac{s_2^4}{N_2^2 (N_2 - 1)}}    (7)
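For reference, the same test is available off the shelf; the sketch below compares two small, purely hypothetical samples of annualized returns using scipy's Welch variant:

import numpy as np
from scipy import stats

# hypothetical annualized returns (%) of a trading system and of the market on the same shares
system_returns = np.array([1.2, -0.5, 3.1, 0.4, -1.8])
market_returns = np.array([2.6, 1.9, 4.0, 0.1, 3.3])

# equal_var=False selects Welch's test, i.e. equations (6) and (7)
t_stat, p_value = stats.ttest_ind(system_returns, market_returns, equal_var=False)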
One can make the following observations: First, all systems under-perform the market. The p-value of the Student test confirms that this difference is mostly significant. The best performing trading systems are those based on the oscillators %B and MFI, including the 3-screen system. The 3-screen system has been successfully tested by Dr. Pace in combination with more sophisticated stop-loss, take-profit and money management features. We can therefore conclude that, due to our restrictive assumptions on those features, our trading system must not necessarily beat the market in order to bear a potential for being implemented. Improving the cited features will significantly improve the performance of the trading systems that have been and will be presented in this study. With this in mind, one can consider that the trading system based on RSI also delivers good results. The results obtained from ADX and slow %D are also acceptable.
What is also striking is that the standard deviation is lower for the trading systems with good performance. This means that those systems perform relatively well on all shares, which further suggests that their success is not simply based on luck.
Table 3: Average performance metrics over the 30 shares for the trading systems based on simple IF-THEN rules (ADL, ADX, %B & RSI, %B & MFI, Chaikin MF, Force13, Force20, MACD-hist, fast %D, slow %D and the 3-screen system). For each system the table reports the annual return (%), the standard deviation of the annual return (%), the deviation from the market return, # Trades, # Days, Max Draw, MDD, MCL, P:MD, P:L, % Win, % T and the p-value of the Welch test (*** = significant at 1% confidence, ** = at 5%, * = at 10%).
One could of course further improve the performance of those systems by optimizing the intervening parameters and even adapt those parameters to the specific characteristics of each share. A very promising approach would also be to combine the different indicators. In fact, the top three systems implemented so far are those based on two or more indicators. It could in fact be that receiving a signal from one indicator is not sufficient to validate this signal. Potentially, receiving many but weak signals from different indicators has more weight than just one strong signal.
Neural networks could take all those options into account. This is the main field of interest of this study: Can neural networks help improve the performance of trading systems based on technical indicators? The next section presents a simple approach to this question.
4 A simple approach based on Neural Networks
4.1 A brief explanation of Neural Networks
A Neural Network, or more precisely an Artificial Neural Network, is a non-linear classifier inspired by the functioning of the brain. It is composed of nodes, the neurons, which are interconnected. Each node corresponds to a numerical value and each edge corresponds to a weight that is applied to the value of the node from which this edge is coming.
The variables are entered in the input nodes. The nodes in the intermediate layer form the so-called hidden layer. For each hidden node, a weighted sum of the input variables, plus generally a constant corresponding to an input variable equal to 1, is computed. The output of the hidden nodes is given by a Sigmoid function applied to this weighted sum. A Sigmoid function is a function with an S shape. The Sigmoid function which is considered here is f(x) = \frac{1}{1 + e^{-x}}, which is represented in figure 5. Mathematically, the output of a hidden node j is given by h_j = f(\sum_i w^{(h)}_{ij} x_i + w^{(h)}_{0j}), where the x_i are the different inputs, w^{(h)}_{ij} corresponds to the weight attributed to the edge going from input node i to hidden node j, and w^{(h)}_{0j} is the aforementioned constant attributed to the hidden node.
The output of the hidden nodes is hence comprised between 0 and 1. The output layer does actually the same as the hidden layer. It takes a weighted sum of the hidden nodes plus a constant and enters it into the Sigmoid function. Mathematically, output k is given by o_k = f(\sum_j w^{(o)}_{jk} h_j + w^{(o)}_{0k}), where the h_j are the outputs of the different hidden nodes, w^{(o)}_{jk} corresponds to the weight attributed to the edge going from hidden node j to output node k, and w^{(o)}_{0k} is the constant attributed to the output node.
An example of the Neural Network which will be considered here is represented in figure 6. The inputs are given by the technical indicators. We have two outputs, one buy and one sell signal. We will interpret a signal as valid if the corresponding node is above a threshold, which can be arbitrarily defined at 0.5. With this in mind, one can interpret the constant w^{(o)}_{0k} as a kind of threshold. In fact, for either of the two output nodes, the decision rule:

signal if o_k = f\left(\sum_j w^{(o)}_{jk} h_j + w^{(o)}_{0k}\right) > 0.5,    (8)
Figure 5: The Sigmoid function f(x) = 1/(1 + e^{-x}).
can be translated as:

signal if \sum_j w^{(o)}_{jk} h_j > -w^{(o)}_{0k},    (9)

Also for the hidden nodes, the constant w^{(h)}_{0j}, or rather -w^{(h)}_{0j}, can be interpreted as the threshold above which this node is close to 1, and below which it is close to 0.
Figure 6: Layout of the neural network. The input layer receives the technical indicators (e.g. %B, MFI, MACD) plus a constant equal to 1, the hidden layer applies the Sigmoid function f(x) to weighted sums of the inputs, and the output layer produces the Buy and Sell signals.
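A minimal sketch of the forward pass of this network, with the constants handled by prepending a 1 to the input and hidden vectors; the shapes follow figure 6 but the function and variable names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_h, w_o):
    # x:   vector of normalized indicator values
    # w_h: (n_inputs + 1, n_hidden) hidden-layer weights, first row = constants w_0j
    # w_o: (n_hidden + 1, 2) output-layer weights, first row = constants w_0k
    h = sigmoid(np.concatenate(([1.0], x)) @ w_h)    # hidden-layer outputs
    o = sigmoid(np.concatenate(([1.0], h)) @ w_o)    # buy and sell outputs
    return h, o

# a buy signal is valid if o[0] > 0.5, a sell signal if o[1] > 0.5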
4.2 How to train a Neural Network?
Training the neural network means determining the weights of the network.
This is done via a gradient descent method called back-propagation. We consider
here only the case where the output is binary (0/1). As mentioned, an output
value above 0.5 would be interpreted as 1 and below this value as 0. To train the
network we need some dataset with the input variables x_i and the corresponding binary variables to predict, t_k.
The idea of back-propagation is the following. The weights are adapted iteratively. At each iteration one first computes the outputs o_k of the network. Those outputs are then compared to the target values t_k so as to compute the error of each output node. The goal is to minimize the sum of the squared errors, SSE = \sum_k (t_k - o_k)^2. Let w^{(l)}, with l = h or l = o, be the matrix containing the weights of the hidden- respectively output-layer. In order to minimize the SSE, the weights are updated in the opposite direction of the gradient of the error:

w^{(l)} \leftarrow w^{(l)} - \eta \, \frac{\partial \sum_k (t_k - o_k)^2}{\partial w^{(l)}}, \quad l = h, o    (10)

where \eta is a parameter of the optimization called the learning rate.
As the literature about back-propagation is very complete, the development of equation (10) is not presented here. Instead, algorithm 1 presents the complete back-propagation algorithm. In order to have more compact expressions, matrix notations are used. x, h and o represent, under the form of a row vector, the values of the input, hidden and output layers respectively. f'(x) = f(x)(1 - f(x)) is the first derivative of the Sigmoid function. The notation \cdot is used for the usual matrix product, while \circ represents the elementwise product between matrices of identical size. One can also find here the reason for the name of this algorithm: the error is back-propagated from one layer to the previous one to update the weights.
Algorithm 1 The back-propagation algorithm
while local optimum not attained do
    h = f(x \cdot w^{(h)})
    o = f(h \cdot w^{(o)})
    \delta_{out} = f'(h \cdot w^{(o)}) \circ (t - o)
    w^{(o)} \leftarrow w^{(o)} + \eta \, x^T \cdot \delta_{out} substituted with h: w^{(o)} \leftarrow w^{(o)} + \eta \, h^T \cdot \delta_{out}
    \delta_{hid} = f'(x \cdot w^{(h)}) \circ (\delta_{out} \cdot (w^{(o)})^T)
    w^{(h)} \leftarrow w^{(h)} + \eta \, x^T \cdot \delta_{hid}
end while
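A minimal numpy sketch of one iteration of algorithm 1 for a single training example; the constant inputs of section 4.1 are omitted for brevity and, as in the pseudo-code, the output-layer weights are updated before the hidden-layer error is computed:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, w_h, w_o, eta=0.05):
    # x and t are row vectors (indicators and 0/1 targets), w_h and w_o the weight matrices
    h = sigmoid(x @ w_h)                         # hidden-layer outputs
    o = sigmoid(h @ w_o)                         # network outputs
    delta_out = o * (1 - o) * (t - o)            # f'(h.w_o) elementwise times (t - o)
    w_o = w_o + eta * np.outer(h, delta_out)
    delta_hid = h * (1 - h) * (delta_out @ w_o.T)
    w_h = w_h + eta * np.outer(x, delta_hid)
    return w_h, w_o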
4.3 The training system
To train the network, we need an optimal target t_k for the buy and the sell output node for every day. For this, we need an optimum benchmark system which gives good signals. The systems which have been presented before are of course not suited for this.
Instead, a better system must be developed. For this, the best information that could be available is used: future information. This is what will be referred to as the oracle. The oracle gives the 7-day forward simple moving average of the price. We thus have an indicator of the future price evolution.
Based on this, one can define the following decision rule:
IF oracle > close + 1.3*stdDev20 THEN buy
IF oracle < close - 1*stdDev20 THEN sell
This system also needs as input the 20-day standard deviation of the close price. The system buys when the future average price is significantly above the current price and sells if it is significantly below. As the price increase and decrease is significant and located in the near future, it should ideally be reflected in today's indicators.
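A sketch of how the oracle targets can be derived from the close prices (pandas is again assumed; the names are illustrative):

import pandas as pd

def oracle_targets(close: pd.Series) -> pd.DataFrame:
    oracle = close.shift(-7).rolling(7).mean()    # mean of the next 7 closing prices
    std20 = close.rolling(20).std()               # 20-day standard deviation of the close
    buy = oracle > close + 1.3 * std20
    sell = oracle < close - 1.0 * std20
    return pd.DataFrame({"buy": buy.astype(int), "sell": sell.astype(int)})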
4.4 Results of the neural network system
We have developed a system which gives reliable training signals. Not surprisingly, the oracle performs very well on all 30 stocks. It has an average annual return of 22.7% with a standard deviation of 3.4%. It holds the share on average over 40 days and all trades are profitable.
To train the network one randomly chooses a day in the data set, computes the output of the oracle and trains the network on this output. As we want to capture the overall pattern in the data and not the specificities of the last days on which the network was trained, a small value for the learning rate (\eta = 0.05) has been chosen. This assures that each day can only very slightly modify the weights of the network. It avoids overfitting the network to the specificities of each day. The network is then trained over 10000 randomly selected days.
The results of this approach are very disappointing. Most of the time, the network gives no signal over the whole period. On the opposite, during some periods it gives both signals at the same time. As only the buy signal counts when the share is not held, and only the sell signal counts when we are in an open position, this results in alternating buy and sell decisions. Very frequently it is even observed that the network gives some signals, which are even profitable, but as learning continues the profit degrades and sometimes the system even stops giving signals. This holds even for very short periods of only 100 days.
The problem of signal alternation has been partly addressed by an additional feature. A buy signal is only considered if at the same time the sell signal is sufficiently weak, and vice versa for the sell signal. This decreases the likelihood that a signal is reversed the next day. More precisely, a signal is only considered if it is higher than 0.55, while the other signal is below 0.45.
One can give several reasons why an approach based on back-propagation can actually not work:
1. The weights of the net are initialized randomly. The problem of back-propagation is that it is a local search algorithm which gets stuck in local optima. The neighborhood of the global optimum is probably a very narrow place in the whole search space. It is very unlikely that random initialization will bring us close to this global optimum. At the same time, a network which gives no signal seems to be a local optimum towards which the network converges easily. Making no profit, but also no loss, seems very attractive. This explains why states in which some signals are produced are frequently abandoned again. It has in fact been observed that, when the network is trained on data which includes mostly upturn phases, the fact that the net gives no signals is less of a problem. Giving no signal in an up market is not necessarily optimal. If however the testing period includes up and down phases, giving no signal becomes more attractive.
2. As the oracle gives signals only when the future market is significantly above or below the current price, it indeed gives no signal most of the time. Like many other algorithms in machine learning, back-propagation has a bias towards the instance of the output which is the most frequent. In fact, a good fit of the data is given by a classifier which gives no signal at all, as this prediction is true most of the time. This explains why so often no signal is observed. But even restricting the learning samples to days on which a signal of the oracle was observed does not help to overcome the problem (see figure 7).
3. Not every small trend the oracle is able to predict is truly reflected in the indicators. For example, some beginnings of trends are not preceded by a pattern change in volumes. In this case it makes no sense to train our volume-based indicators on the signals of the oracle. Another example is that indicators are usually designed to detect the beginning of a trend. The network is, however, also trained during a trend, during which it should continue to confirm the signal. However, only the first signal has real importance. Training the network during the trend hence has only little added value.
Figure 7: Results of back-propagation on 500 days of the Accenture share. (a) Trading signals at the end of the training phase. (b) Evolution of profitability during the learning phase. The inputs to the network are ADX, +DI, -DI, %B, MFI and MACD. The network is trained only on dates on which the oracle gives a signal. The network gives signals, which is due to the fact that the share is mostly in an upturn phase. However, the signals are useless, as they are very grouped and alternate between buy and sell.
One could try to overcome some of these problems, but it is very unlikely that all of them can be overcome. The next section presents a more promising method.
5 Neural Network training based on genetic algorithms
5.1 Overview of the approach
This section presents an approach which is likely to overcome the problems of back-propagation mentioned above. It is based on genetic algorithms.
In order to overcome the problem of local convergence (reason 1), a global optimization algorithm must be used. Genetic algorithms are such a global optimizer.
We would also like the training method not to depend on the oracle anymore. To achieve this, a global objective function will be defined. It clearly targets what we finally want: we want an overall profitable trading system. We do not necessarily want a trading system that absolutely fits the predictions of the oracle. The system should give reliable signals indicating when an upturn begins and when it ends. As mentioned in reason 3, signals during the trend are less important. As the objective function depends only on the first buy signal and the first sell signal related to each trade, this problem is also solved.
As will be shown, there is still a problem with the number of signals (reason 2). But defining an overall objective function can also help to easily address this problem (see section 5.3 below).
5.2 The genetic algorithm
The genetic algorithm is represented in pseudo-code below in algorithm 2. The population size has been selected equal to 100. Each chromosome represents the weights of the neural network. It takes the form of the two weight matrices w^{(h)} and w^{(o)}.
The indicators in the input have been normalized, that is, their mean has been subtracted and they have been divided by their standard deviation. This has been done for several reasons. The indicators that have been defined have very different ranges. Some are positive while others can also take negative values. Some are directly proportional to the daily volume and thus take very high values (e.g. ADL). Others take values between 0 and 100 (e.g. ADX, MFI or RSI), while others only between 0 and 1 (e.g. %B or the Chaikin Money Flow). Normalizing the indicators makes sure that all indicators have the same range. They are situated around zero with a unitary standard deviation. The advantage of this is that it also provides us with an idea of the range of the weights. In case this normalization had not been done, indicators which take very high values would have very low weights and vice-versa. First, it would be difficult for an optimization algorithm to obtain a sufficient level of precision at the same time for small and big weights. Moreover, initializing the weights would be somehow difficult. Another reason relates to the objective function, which will be defined in the following subsection. As is always done when neural networks are trained by optimizing an objective function, one punishes large weights. If the indicators are not normalized, this would simply result in diminishing the importance of indicators which take low values and which need, as a consequence, larger weights to influence the output of the network.
Algorithm 2 The genetic algorithm
1: Input: learning data (indicators and prices), # indicators (N_i), # hidden nodes (N_h), population size (N_c)
2: Output: best of population
3: Randomly initialize population
4: for chromosome in population do
5:     Evaluate fitness of chromosome
6: end for
7: while last improvement < 75 steps ago && # iterations < 250 do
8:     mom = select randomly from top p% of population (p increasing linearly from 30% to 90%)
9:     dad = select randomly from population, different from mom
10:    offspring = getCrossoverPopulation(mom, dad)
11:    With probability 85%:
           Evaluate fitness of offspring
           candidate = best of offspring
       Otherwise:
           candidate = select randomly from offspring
12:    offspring = getMutationPopulation(candidate)
13:    Evaluate fitness of offspring
14:    Replace the worst chromosome in the population by the best chromosome in the offspring
15:    Select randomly one chromosome from the worse 50% of the population and one offspring
16:    if the chromosome from the offspring is not much worse than that of the population then
17:        Replace the chromosome in the population by the chromosome from the offspring
18:    end if
19: end while
Finally, the fact that we normalize is helpful for putting a limit on the absolute value of the weights. This will be beneficial when it comes to initializing the chromosomes on the one hand, and for the crossover operator on the other hand. Consider the decision rule from equation (9). This decision rule is not modified if one multiplies all weights (including the threshold) by a constant. The interpretation of the threshold for the hidden node will also not be modified in this case. One can therefore put an arbitrary limit on the weights. This limit has been chosen here equal to one, which is also the order of magnitude that has been given to the inputs. (The size of the weights is still important for the output of the hidden nodes. In any case, the limit is not respected strictly: nothing in the implementation forbids a weight from taking values above 1. As mentioned, it is only used for initialization and for the crossover operator.) For the thresholds however, the weight must be somewhat larger. Suppose that all weights are close to 1. In this case, limiting the threshold to 1 would be too restrictive. A properly defined threshold must be allowed to go close to the maximum of the sums \sum_{i=1}^{N_i} w^{(h)}_{ij} x_i and \sum_{j=1}^{N_h} w^{(o)}_{jk} h_j. Theoretically, the absolute value of the hidden-layer thresholds should be bounded by the number of inputs, and those of the output layer by the number of hidden nodes; in other words, by the number of terms of the sums. In practice this value can however be set lower, as not all weights are systematically close to one. The maximum is hence also set equal to one for the thresholds, as this gave the best results.
To initializes the chromosomes (line 3), the corresponding weights simply
take a random value which is uniformly distributed between -1 and 1. The
maximum and minimum bounds dened this way will henceforth be referred as
w
min
and w
max
.
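A minimal sketch of the indicator normalization and of the chromosome initialization of line 3 could look as follows. The (N_i + 1) x N_h and (N_h + 1) x N_o matrix shapes follow section 6.1; storing the constant (threshold) in the last row, the default of two output nodes and all function names are assumptions made for illustration.

    import numpy as np

    W_MIN, W_MAX = -1.0, 1.0   # bounds used for initialization and by the crossover

    def normalize_indicators(indicators):
        """Center each indicator on zero and scale it to unit standard deviation.
        `indicators` is a (#days x N_i) array of raw indicator values."""
        return (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)

    def init_chromosome(rng, n_inputs, n_hidden, n_outputs=2):
        """Draw both weight matrices uniformly in [w_min, w_max]; the extra row
        of each matrix holds the threshold (constant) of the nodes."""
        w_hidden = rng.uniform(W_MIN, W_MAX, size=(n_inputs + 1, n_hidden))
        w_output = rng.uniform(W_MIN, W_MAX, size=(n_hidden + 1, n_outputs))
        return w_hidden, w_output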
The genetic algorithm uses a double stop criterion (line 7): one limits the maximum number of iterations, the other the number of iterations without any improvement of the best chromosome.
Of the two parents which are chosen as candidates, one is taken from among the best percent of the population (line 8). The percentage varies linearly from 30% to 90% over the 250 iterations. This ensures that at least one parent is sufficiently good, while giving more importance to exploration at the beginning and, as the iterations continue, more importance to the exploitation of good solutions.
The crossover operator (line 10) returns 5 different offspring. These operators have been inspired by [9]. Let w_1 and w_2 denote the two parents to cross over. The following offspring w_os^i have been defined. The first offspring is simply the average of the weights of the two parents, w_os^1 = (w_1 + w_2)/2; this operation looks for a solution located in the search space just between the two parents. The second offspring is given by w_os^2 = (1 − ω)·w_max + ω·max(w_1, w_2), where ω is a constant set equal to 0.8; this operator explores the weights located above the two parents in the search space. The values below are explored by the third operator, w_os^3 = (1 − ω)·w_min + ω·min(w_1, w_2). Operators four and five perform a one-point crossover for each column of the two matrices defining the weights. As a reminder, the first matrix represents the weights associated with the hidden nodes; it is of size (N_i + 1) × N_h, its columns are associated with the hidden nodes and its rows with an input variable (or the constant). The second matrix represents the output nodes; its columns are associated with an output, while its rows are associated with the hidden nodes (or the constant). The cut points are chosen randomly. The example below represents the crossover of the hidden-layer matrices of a network with three indicators and three hidden nodes (N_i = 3, N_h = 3), where a_ij and b_ij denote the hidden-layer weights of the two parents:
    ( a11  a12  a13 )     ( b11  b12  b13 )     ( a11  b12  a13 )
    ( a21  a22  a23 )  x  ( b21  b22  b23 )  =  ( a21  b22  a23 )
    ( a31  a32  a33 )     ( b31  b32  b33 )     ( b31  b32  a33 )
    ( a41  a42  a43 )     ( b41  b42  b43 )     ( b41  b42  b43 )        (11)
This operator generates a random combination of the two parents.
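The five crossover operators could be sketched as follows. The element-wise operators and ω = 0.8 follow the description above; how exactly operators four and five differ from each other is not spelled out here, so taking the two parent orderings for the column-wise crossover is an assumption, as are the function names.

    import numpy as np

    OMEGA = 0.8
    W_MIN, W_MAX = -1.0, 1.0

    def column_crossover(w1, w2, rng):
        """One-point crossover per column: above the (random) cut the entries
        come from the first parent, below it from the second, cf. equation (11)."""
        child = w1.copy()
        for j in range(w1.shape[1]):
            cut = int(rng.integers(1, w1.shape[0]))
            child[cut:, j] = w2[cut:, j]
        return child

    def crossover_offspring(parent1, parent2, rng):
        """Return the five offspring of line 10; each parent is a tuple of the
        two weight matrices (w_hidden, w_output)."""
        pairs = list(zip(parent1, parent2))
        return [
            tuple((a + b) / 2.0 for a, b in pairs),                                   # average
            tuple((1 - OMEGA) * W_MAX + OMEGA * np.maximum(a, b) for a, b in pairs),  # above
            tuple((1 - OMEGA) * W_MIN + OMEGA * np.minimum(a, b) for a, b in pairs),  # below
            tuple(column_crossover(a, b, rng) for a, b in pairs),                     # column-wise
            tuple(column_crossover(b, a, rng) for a, b in pairs),                     # column-wise, swapped
        ]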
In 85% of the cases, the best chromosome which has been generated is chosen as the candidate w_c for mutation (line 11). In the other 15% of the cases, the candidate is chosen randomly among the offspring.
The mutation operator adds some small random perturbations to the weights of the candidate w_c (line 12). Only a few, randomly selected weights are modified, by adding a small random constant which might be positive or negative. Seven mutation offspring are defined, which all differ by the share of weights which are modified on the one hand, and by the size of the random perturbations on the other hand. Coming back to the previous example, after the mutation the crossover candidate from above might be affected like this:
    ( a11        b12 + ε2   a13       )
    ( a21        b22        a23       )
    ( b31 + ε1   b32        a33       )
    ( b41        b42        b43 + ε3  )                                  (12)
The best mutant obtained then enters the population by replacing the chromosome with the lowest fitness (line 14). A second, randomly chosen mutant replaces a randomly chosen chromosome belonging to the 50% worst chromosomes of the population (line 15). This has two aims: at the beginning of the iterations it helps to rapidly replace the worst chromosomes that have been initialized; later on, when the average quality of the population increases, it helps to keep the population heterogeneous and to prevent the algorithm from being too greedy. This is important to avoid the algorithm getting stuck in a local optimum. Getting stuck in a local optimum is a serious problem, as already noticed with back-propagation, and even with the genetic algorithm it remains an issue. One can therefore confirm that this is a very complicated optimization problem.
5.3 The objective function
In this subsection the objective function Φ(·) will be defined. This function evaluates the fitness of the different chromosomes w over a certain learning period. For this it requires as input the (normalized) values of the indicators for each day and the close prices.
A very basic approach would simply reward the profit and punish large weights; we punish the sum of the squared weights, noted ||w||₂². This approach yields, however, unacceptable results. The network generates only a few trades. Generally there is not even a sell signal, and the end of the period acts as the stop, since an open position is automatically sold there. Of course, a good profit generated this way is due only to over-fitting and is not reproducible.
Different solutions had to be considered in order to control the pattern and especially the number of the signals that were generated. Eventually, all of those solutions had to be combined to give the best results.
It is also important to note that the terms entering the objective function also had to be normalized in order to have a predictable order of magnitude. Consider for instance the profit: longer periods are more likely to generate higher profits than shorter ones. As the learning phase has different lengths for the various shares, the profit is normalized by the length of the period.
In order to force the network to produce a sufficiently high number of trading signals, we punish a system which generates too few signals. At the same time, it will be observed that the network might also generate too many signals, mostly in the form of alternating buy and sell signals. It has been tried to determine an optimal ratio of the number of signals per day, which should be targeted in the objective function. But such attempts failed, as this optimal ratio varies a lot from one share to another and from one market condition to another. A better approach defines a minimum and a maximum ratio of signals per day, below, respectively above, which the deviation from those thresholds is punished. Those threshold ratios have been set equal to 1.5, respectively 4, signals per 100 days. The analytic expression of the punishment function takes the form

    P = max( 0.015 − #signals/#days , #signals/#days − 0.04 , 0 )^6        (13)

and is represented in figure 8.
Figure 8: Punishment function of equation (13), plotted as a function of signals per day.
This does not, however, take into account that we would like to generate good profit with the fewest signals possible. Therefore the number of signals per day is also punished linearly, which amounts to taking trading costs into account.
With all this, it still frequently occurs that buy and sell signals alternate. To make sure that such chromosomes are more efficiently improved or even rejected, the number of times a buy and a sell signal are less than 4 days apart is also punished linearly.
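The three signal-related penalties described above can be sketched as follows; the band penalty implements equation (13), while the coefficients weighting the two linear terms only enter later, in equation (14). The function names are illustrative.

    def band_penalty(n_signals, n_days):
        """Punishment of too few or too many signals, equation (13): zero between
        1.5 and 4 signals per 100 days, growing steeply outside that band."""
        ratio = n_signals / n_days
        return max(0.015 - ratio, ratio - 0.04, 0.0) ** 6

    def linear_signal_cost(n_signals, n_days):
        """Each signal is also punished linearly, akin to a trading cost."""
        return n_signals / n_days

    def alternating_penalty(buy_days, sell_days, min_gap=4):
        """Count buy/sell pairs that are less than `min_gap` days apart; this
        count is punished linearly in the objective function."""
        return sum(1 for b in buy_days for s in sell_days if 0 < abs(s - b) < min_gap)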
One potential way to further reduce alternating signals and simply control the number of signals would be to increase the threshold for considering a signal as valid. As mentioned above, a signal is only validated if the corresponding output node is above 0.55 while the other is below 0.45. Those same thresholds are also used when it comes to evaluating the objective function. One could imagine increasing the former while decreasing the latter threshold when it comes to validating the system. This way only the strongest, hence most promising, signals would be retained. However, such an approach resulted only in decreased profits, as the buy signals were simply delayed by a few days while the number of (alternating) signals was not substantially reduced. This solution has hence not been retained.
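For reference, a sketch of how the two output nodes are turned into validated signals using the 0.55/0.45 thresholds is given below. The sigmoid activation, the position of the constant in the last row and the ordering of the buy/sell nodes are assumptions made for illustration, not a statement of the original implementation.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def network_signal(w_hidden, w_output, x, upper=0.55, lower=0.45):
        """Evaluate the network for one day of (normalized) indicators `x` and
        validate a signal only if one output node is above `upper` while the
        other is below `lower`."""
        h = sigmoid(np.append(x, 1.0) @ w_hidden)            # hidden nodes (+ constant)
        buy, sell = sigmoid(np.append(h, 1.0) @ w_output)    # two output nodes (assumed order)
        if buy > upper and sell < lower:
            return "buy"
        if sell > upper and buy < lower:
            return "sell"
        return None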
Figure 9: Results of the system on 1000 days of the Accenture (ACN) share over the training period. The inputs to the network are ADX, DI+ − DI−, %B, MFI and MACD. The objective function rewards profit and punishes too few and too many signals, alternating signals, and the sum of the squared weights.
Figure 9 shows the result of the training method using the objective function as presented so far. The results are still not satisfying, mainly because the results obtained from the training are very diverse. Some systems perform very well on the number of signals they generate but achieve only modest profit; others have good profit but still very few signals. To get a system which performs well on both, we reward a term which is directly proportional to the profit and inversely proportional to the punishment related to the signals.
It is also observed that most training results include very long holding periods, which also contain small phases of downturn. It would hence be more profitable to split those open positions into smaller pieces which hold the shares only in the up-phases. The trailing stop defined above is ideal for this purpose. But when the network is trained using the trailing stop to evaluate the objective function, it is observed that the network itself no longer produces any sell signals. However, the sell signals of the system are very useful, as they might avoid losing each time the 5% from the peak that the trailing system requires before taking the profit. The system will therefore be trained using a less strict trailing system, which sells only after a 10% loss.
The final objective function hence takes the form:

    Φ(w, p, I) = profit/#days − α · #signals/#days − β · #alternating signals − γ · ||w||₂²
                 + δ · (profit/#days + ε) / (ζ · P + #signals/#days + η)                      (14)

where p is the vector of prices for the learning period, I is the matrix of the corresponding indicators, P is the signal-count punishment of equation (13), and

    α = 20,   β = 10,   γ = 0.25,   δ = 0.5,   ε = 0.25,   ζ = 250,   η = 0.02.

The values of the parameters have been selected such that all terms of the function have similar orders of magnitude. Furthermore, they have been optimized experimentally. For instance, the term relating to profit should have more importance than the control of the number of signals or the size of the weights.
5.4 An additional restart feature
It has been observed that the genetic algorithm (algorithm 2) finds good solutions very quickly. The large population of 100 individuals is very useful for two reasons. First, it ensures that the population includes at least a few good chromosomes from the beginning. At the same time, the population stays very heterogeneous for a long time; this way, the crossover operator can further explore the very large search space. As the chromosomes in the population get better and more homogeneous, it takes more iterations until further improvements are found. Keeping the population heterogeneous is important here for the performance of the algorithm. As mentioned, this is what the random insertion at line 15 is good for.
It has been observed that going beyond 250 iteration steps is not useful, as the algorithm seems to stagnate at the best solution it has found so far. Instead, it is more useful to completely restart the algorithm. Finally, the following solution is retained: algorithm 2 is run 3 times with a completely randomly initialized population of size 100. Those three trials generally give very similar results, while there are still some slight differences across the outcomes. As a matter of fact, the objective function defined above has its maximum at a value between 0 and 10, depending on the share and the period. Badly initialized solutions have fitness values which can easily go down to -3000. Of course, those values have no units and no specific interpretation, and any monotonic transformation would not modify the outcome of the algorithm. But as the objective function crosses zero during its convergence, one cannot use relative or percentage terms to compare the outcomes of several runs of the training algorithm. For comparison, one hence has to use absolute terms and compare them to the total range of the objective function. With this in mind, the outcomes of the different runs still differ by a value of more than 1 between the best and the worst outcome.
Each of the best chromosomes obtained during the three runs is then injected three times into a smaller population of size 25, on which the genetic algorithm is run again. The goal of this last run is rather to make a local optimization of the best solutions obtained so far. The population should hence be more homogeneous, which explains the smaller population size and the fact that each good chromosome is represented three times from the beginning. It is observed that the already very good chromosomes obtained during the previous runs can still be slightly improved.
Algorithm 3 The restart algorithm
1: Output: best
2: for all i in {1, 2, 3} do
3:     best_i = runGA() with N_c = 100 (algorithm 2)
4: end for
5: best = runGA() with N_c = 25, the population containing each best_i three times (algorithm 2)
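Algorithm 3 is then a thin wrapper around the genetic algorithm. In the sketch below, run_ga and init_chromosome are the illustrative helpers from the earlier sketches, and filling the remaining slots of the final population of 25 with random chromosomes is an assumption.

    def run_with_restarts(fitness, init_chromosome, crossover_offspring, mutation_offspring):
        """Sketch of algorithm 3: three independent runs of size 100, then a final
        local-optimization run of size 25 seeded with each best chromosome 3 times."""
        bests = [run_ga(fitness, init_chromosome, crossover_offspring,
                        mutation_offspring, pop_size=100) for _ in range(3)]
        # Final population: each of the three best chromosomes three times,
        # completed with random chromosomes up to a population of 25 (assumed).
        seeds = [b for b in bests for _ in range(3)]
        seeded_init = lambda rng: seeds.pop() if seeds else init_chromosome(rng)
        return run_ga(fitness, seeded_init, crossover_offspring,
                      mutation_offspring, pop_size=25)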
This restart algorithm reliably yields very good solutions. A typical convergence curve is shown in figure 10.
Figure 10: Convergence of the genetic algorithm using the restart feature (objective function vs. iteration). The red line represents the fitness of the best chromosome in the population, the blue line the average fitness of the population, and the black vertical lines represent a restart. It corresponds to the training on the Delhaize share over 1222 days.
6 Performance tests of the Neural Network
6.1 Approach
In order to test the approach defined in the previous section, the financial data available for our test candidates is split into two different sets. The first set is the learning or training set, which is used for training the network; more precisely, it is used to evaluate the objective function. The second set is the testing or validation set, which is used to evaluate the performance of the network. By splitting the data into two (time-)independent sets, we avoid any form of overfitting. The testing phase is simply made of the last 1000 days of the data set.
Concerning the structure and the inputs to the network, one must make a compromise between the number of hidden nodes and variables to include on the one hand, and the complexity of the optimization problem on the other hand. The size of the hidden-weight matrix w^(h) is (N_i + 1) × N_h and that of the output-weight matrix is (N_h + 1) × N_o, where N_i, N_h and N_o are the numbers of input, hidden and output nodes respectively. For a network with 10 input variables, 7 hidden nodes and the two output nodes, (10 + 1) · 7 + (7 + 1) · 2 = 93 different weights must be estimated. Increasing the search space increases the likelihood that the optimization algorithm finds a solution which is further away from the optimum.
For this reason, not all indicators that have been defined so far can be included at the same time. As the optimization problem is computationally heavy and executing the performance measures on the 30 shares takes about 4 hours, it would be too time consuming to test all different combinations of indicators. Therefore it is useful to make a pre-selection of the indicators. It was decided to include the indicators which gave good results when used alone with their corresponding decision rule (see table 3). Those indicators are the ADX, the difference of DI+ and DI-, the %B, the MFI, the MACD histogram, the CMF, the slow %D and the force-20. Moreover, the 20-day standard deviation of the price was included as a measure of risk, as it was observed that this increases performance. The corresponding structure of the network that was analyzed included between 5 and 7 hidden nodes. This way the size of the search space was always below 100 weights. Of course, the performance of the indicators alone must only be seen as a proxy for their potential to perform well within the network. The standard deviation of the price, for its part, cannot be used alone; yet it adds some value to the network. The same might happen for an indicator which is only useful in combination with other indicators. But as mentioned, testing all different combinations of indicators would be too time consuming, so that the individual performance is the best information at our disposal to make the pre-selection.
In order to determine which of the pre-selected indicators should eventually be retained, each of the variables was excluded individually. Using this approach it was observed that the best model has 6 hidden nodes. Leaving an indicator out generally resulted in a (however insignificant) decline of the annual return and mostly even in an increase of the standard deviation of the returns across the different shares, hence in the riskiness of the system.
Unfortunately, it has been established that interpreting these results actually makes no sense. The following section explains why an approach based on technical indicators and neural networks (such as the one presented here) does not lead to a usable trading system.
            Objective function                     Learning phase        Testing phase
Run    Φ      profit   #trades   ||w||₂²        profit   #trades       profit   #trades
1      2.81   156.3    10        0.92           139.5    25            1.0      22
2      3.18   185.2    9         1.20           101.9    17            -7.1     17
3      2.51   154.4    11        0.93           136.2    25            7.2      26
4      3.01   224.2    9         1.65           126.2    15            -6.4     12
5      2.38   134.0    10        0.81           112.9    23            -12.1    23
6      2.80   158.7    11        0.95           168.1    26            -43.4    36

Table 4: Objective function components, and learning and testing results over 6 independent runs on the Delhaize (DELB) share. The training period includes 1222 days, over which the share makes a profit of 275.5 (for 100 invested initially), which corresponds to an annualized return of 31.4%. The testing period includes 1001 days, over which the share makes a loss of 18.4 (for 100 invested initially), which corresponds to an annualized return of -5.0%.
For the sake of completeness of this study, an Excel file (NN.xls) is attached to this report which includes the detailed results of the different experiments. The following section also refers to this file when numerical results are cited; the corresponding sheet is mentioned in parentheses.
6.2 Robustness of the trading system
A good trading system is of course characterized by the fact that its results are reproducible. In our case, this means that the results of the trading system on the testing set on the one hand, and on the learning set on the other hand, are identical over different runs. No trader would use a system whose performance differs across different trials under exactly the same conditions. Given the complexity of our optimization problem, one issue that might arise is that the genetic algorithm is incapable of finding (a network close to) the global optimum at each run.
Table 4 shows the results of 6 independent runs of the training on the Delhaize share. The table suggests that the concerns about the robustness of the genetic algorithm were not founded. The restart feature ensures that the values taken by the objective function differ only slightly compared to the whole range of the function's image. Also on a more detailed level the outcomes of the training are very similar. In fact, the table shows the three important components of the objective function: the profit, the number of trades (each corresponding to one buy and one subsequent sell signal) and the sum of the squares of the weights. All these measures are very close across the different runs.
The results begin to differ when introducing the complete trailing system, i.e. stopping at a 5% loss instead of the 10% used to evaluate the objective function. This is due to the fact that the trailing stop ends an open position earlier than in the objective function. The system might now produce new signals during a period on which it was, in a sense, not trained, as any further buy signal was ignored while the position on the share was open. This, by the way, is what explains the increase in the number of trades when the trailing system is fully activated.
Figure 11: Buy and sell signals over the training period of two independent runs on the Delhaize share ((a) run 1, (b) run 6).
However, those differences are only marginal. Figure 11 shows the trades of the system on the training period for run 1 and run 6 from table 4. Those two runs had very similar objective function values. One observes only very slight differences despite the activation of the trailing system.
When, however, the system is applied on the testing phase, the results differ greatly. This can be observed in table 4 and in figure 12, where again runs 1 and 6 are compared. The differences are now clearly visible notwithstanding the very similar performance on the learning set.
We are hence facing the problem that the system is not robust on the testing set. The neural network performs badly on unseen data; in other words, it cannot be generalized.
Figure 12: Buy and sell signals over the testing period of two independent runs on the Delhaize share ((a) run 1, (b) run 6).
This lack of robustness of the trading system, which has so far been exemplified on the Delhaize share, is of course also observed for the other test candidates. An intuitive way to express the lack of reproducible results is the correlation between the profits on the shares over different runs. The average of the pairwise correlations of the annualized profit over three independent runs on all 30 shares on the learning set is at about 97% when the reduced trailing system of the objective function is used. Introducing the complete trailing system lets this correlation drop slightly to about 93%. The correlation can intuitively be interpreted as a measure of how confident one can be that the performance measures of the system (here the profit) will reproduce over different runs. When the system is then applied on the testing set, this confidence measure drops to about 60% (see sheet full.Nh6 in NN.xlsx).
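The robustness measure used here, the average pairwise correlation of the annualized profits across runs, can be computed as in the following sketch, where profits is a runs × shares array and the names are illustrative:

    import numpy as np
    from itertools import combinations

    def average_pairwise_correlation(profits):
        """`profits[r, s]` is the annualized profit of run r on share s.
        Returns the mean Pearson correlation over all pairs of runs."""
        pairs = combinations(range(profits.shape[0]), 2)
        corrs = [np.corrcoef(profits[i], profits[j])[0, 1] for i, j in pairs]
        return float(np.mean(corrs))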
This now illustrates the problem of this approach on a global level. And even though a correlation of 60% might still sound reasonable, it makes the trading system completely useless in practice. Across the different runs that have been performed, it was always observed that a few shares perform very badly, with annual returns close to -20%. However, these bad results are not observed on the same shares each time. Hence one cannot conclude that the system, or more generally the approach, is not adapted to some of the shares, which could then simply be excluded from the system. Instead, the shares on which the bad performance will be observed cannot be predicted in advance. A trader using this system will know in advance that the system will generate big losses on some of the shares. This makes the system unusable from a psychological point of view. The trader would stop the system on shares on which he observes that the system starts generating losses. By doing so, he might in fact exclude a share on which the system would have generated a profit in the end.
The absence of reproducible results on the testing set can be explained as follows. There are actually many combinations of weights which correspond to (a configuration close to) the global optimum. In fact, as observed in table 4, on average only 10 trades, that is 10 buy and 10 sell signals, were generated over the training period. One can understand that in a neural net with close to 100 weights, many different configurations can generate the same signals. As the signals are generated by passing the thresholds defined for the two output nodes, slightly modifying the weights affects the values of the two output nodes, but not necessarily the signals that are generated. Put differently, the global optimum does not correspond to some precise values, but rather to some (interdependent) intervals in which the weights must be located. This means that the search space must be very flat around this global optimum. As the signals which are generated are not affected when the weights are slightly modified, the objective function is not affected either.
Unfortunately it is not possible to represent the whole search space, as it has a dimension close to 100. However, figure 13 illustrates very well the flatness of the search space around the global optimum. It shows the value of the objective function on the training set when two of its weights are modified around the values obtained by the optimization algorithm. More precisely, it corresponds to the weights returned by run 2 of table 4, the run with the highest objective function value.
Figure 13: Search space around the optimum of run 2: the objective function on the training set as a function of the variations Δw_%B and Δw_MFI of the two selected weights.
The weights that are modified should ideally be weights that have indeed an impact on the output of the network. Many weights in the network in fact take very small values, which simply indicates that the corresponding input is not actually taken into consideration. Slightly modifying those weights would indeed not affect the signals generated by the network. To avoid this, two inputs have been chosen which should determine the output of the net. Those inputs are the %B and the MFI, the indicators which performed best individually. The precise weights which are modified are those with the highest absolute value among the hidden-layer weights attached to those inputs. The global optimum which has been returned by the learning algorithm corresponds of course to no variation of the weights (Δw_%B = Δw_MFI = 0).
It can indeed be observed that the search space is completely flat around the global optimum. The objective function takes the form of stairs: on each plateau, the signals which are generated are identical. When the objective function drops, it simply means that the signals are modified.
As the global optimum does not correspond to an exact configuration of the weights, the network obtained over the different runs was indeed not the same. This is what explains why the networks resulting from different learning runs had a similar behavior on the training data and still behaved differently on the testing data.
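A scan of the kind shown in figure 13 can be sketched as follows: two selected weights of a trained chromosome are perturbed on a grid and the objective function is re-evaluated at each point. The names objective, chromosome and the index pairs are illustrative placeholders.

    import numpy as np

    def scan_search_space(objective, chromosome, idx_a, idx_b, radius=0.1, steps=21):
        """Evaluate the objective on a grid of perturbations of two hidden-layer
        weights (e.g. the largest weights attached to %B and MFI)."""
        w_hidden, w_output = chromosome
        deltas = np.linspace(-radius, radius, steps)
        grid = np.empty((steps, steps))
        for i, da in enumerate(deltas):
            for j, db in enumerate(deltas):
                w = w_hidden.copy()
                w[idx_a] += da        # idx_a and idx_b are (row, column) tuples
                w[idx_b] += db
                grid[i, j] = objective((w, w_output))
        return deltas, grid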
6.3 Potential remedies against the low robustness
This subsection presents several attempts to overcome the problem encountered with the approach presented here.
6.3.1 Grouped training
The major cause of the lack of robustness is, as mentioned, that only a few signals determine the value of the objective function and that the search space is too flat at the optimal configuration. One obvious remedy would be to train the network on a longer period. This way, more signals are generated during the learning phase, which would also make the configurations corresponding to the optimum narrower. In other words, the likelihood that the signals over the learning period are modified when the weights are modified increases with the length of the learning phase. On average, our learning phase includes 1825 trading days, which corresponds to about 7 years of data. The problem with taking longer learning periods is that the micro-structure of the markets changes over such long periods. Over the last decade the liquidity and the volatility of financial markets have changed fundamentally. Training the network on very old data would make it ill-adapted to the testing data or to the current market.
Yet, the length of the training period can be extended in another way: a network can be trained on several shares at the same time. The training period length then becomes the sum of the training period lengths of the different shares. Of course, training the network on several shares makes it less specific to the characteristics of the individual shares. Therefore the shares should themselves have similar characteristics. On the other hand, training on several shares is not necessarily bad. It might reduce the problem of over-fitting the network to the data. It might for instance be that the training period of one of the shares includes a very rare event (e.g. an accident in a factory). Training the network only on that share would adapt the network to the specific pattern of this event, which is not desired, as such a rare event is not likely to be observed again in the future. Training the network on several shares reduces the risk of over-fitting the data to specific patterns in the learning set.
In order to have shares with still some similar characteristics, different groups are formed according to sectors. This is justified by the fact that shares in the same sector are frequently correlated and have the same risk. For instance, the measure of risk introduced above is generally computed on a sector level. The groups are the following:

IT solutions (5): Accenture Plc. (ACN), BMC Software Inc. (BMC), International Business Machines Corp. (IBM), Oracle Corporation (ORCL), Cisco Systems Inc. (CSCO)
(Petro-)Chemicals (4): E. I. Du Pont De Nemours & Co. (DD), The Dow Chemical Company (DOW), Solvay SA. (SOLB), Exxon Mobil Corporation (XOM)
Financial services (4): JPMorgan Chase & Co. (JPM), KBC Groep NV (KBC), PNC Financial Services (PNC), American Express Company (AXP)
Retail & Food (4): Delhaize Group (DELB), Target Corporation (TGT), Kraft Foods Inc. (KFT), McDonald's Corporation (MCD)
Hardware & Technology (3): Dell Inc. (DELL), General Electric Company (GE), Hewlett-Packard Company (HPQ)
Telecommunications (2): Belgacom SA. (BELG), Mobistar SA. (MOBB)
Internet (2): Amazon.com Inc. (AMZN), Google Inc. (GOOG)
Logistics (2): FedEx Corporation (FDX), United Parcel Service Inc. (UPS)
Health Care (2): Abbott Laboratories (ABT), Baxter International Inc. (BAX)
Investment (2): Ackermans & van Haaren NV. (ACKB), Franklin Resources Inc. (BEN)
The global objective function Φ_group is simply composed of the individual objective functions Φ_i as defined in equation (14). On the one hand, each individual objective function should be as high as possible; therefore, we simply reward the sum of all individual objective functions. On the other hand, the network should perform reasonably well on all shares at the same time during the learning phase. It should not be possible to compensate bad performance on some shares by very good performance on others. Usually a product of the objective functions would be included for this. But as the objective functions take values slightly above or below 0, this direct approach would not work. It can be considered that the network starts to perform well from the moment the objective function passes above -10. Therefore a product is included whose factors are the objective functions augmented by 10 when that value is positive; otherwise the factor is taken as zero. Finally, the global objective function is:

    Φ_group = Σ_i Φ_i + ∏_i max(Φ_i + 10, 0)        (15)
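Equation (15) can be written down directly from the per-share objective values; a minimal sketch (the function name is illustrative):

    import numpy as np

    def grouped_objective(phi_values):
        """Global objective of equation (15): the sum of the individual objective
        values plus the product of their values shifted by +10 and floored at 0."""
        phi = np.asarray(phi_values, dtype=float)
        return phi.sum() + np.prod(np.maximum(phi + 10.0, 0.0))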
It was observed that grouping two shares together is not sufficient to have the desired effect. In fact, the correlations are similar to those observed on individual shares (i.e. 96% on the training set with the full trailing system and around 50% on the testing set). But when the groups get larger, the problem of robustness now appears even on the training set. Despite the second term in the objective function (15), the system is incapable of reproducing the same results over the training set. In fact, the values of the individual objective functions differ over the different runs. It is not possible to guarantee a reproducible outcome of the learning algorithm when several shares are included. The correlation over different runs on the shares which belong to groups of at least four shares has dropped to 50% on the learning set. The problem is that it is extremely difficult to find a weight configuration that performs well on all shares. The initialized chromosomes mostly perform very badly on most of the shares. With so many bad chromosomes in the population, the genetic algorithm has a lot of difficulty increasing the average fitness of the population, even when the parent selection was made among the best 20% of the population. Of course, additional features such as increasing the number of iterations or adding more restarts could possibly help to overcome these algorithmic problems. But the problem is more likely located elsewhere. The fact that the training with grouped shares is so difficult indicates that having one network for more than 3 shares is not adequate, despite the fact that those shares are in the same sector. It seems that every share still has its own specific pattern. This approach is hence also misleading (see sheet Grouped in NN.xlsx).
6.3.2 Control via the objective function
It has also been verified that the problem of low robustness is not due to a badly defined objective function. The term ||w||₂², which is the punishment of the sum of the squares of the weights, is important for the resulting configuration of the network. Its role is to keep the values of the weights as small as possible. It also induces that inputs which are not taken into account by a specific (hidden or output) node have a small weight. More precisely, the outputs of the hidden nodes are in fact a non-linear combination of the indicators. But as not all indicators are eventually important for a specific node, some of them should have weights close to zero. One issue that might arise is that the weight of an indicator which is unused during the learning phase is still sufficiently high to affect the signals of the neural net during the testing phase. Such noise signals should indeed be avoided by introducing the punishment on the size of the weights. As a matter of fact, it was observed that when the factor associated with the term ||w||₂² was decreased by only one order of magnitude (10⁻¹), the correlation between the annualized profits over several runs on the testing set dropped down to 0% on average (while the correlation on the learning set was unaffected), which simply means that the performance on the testing set became completely random (see sheet L2reduced in NN.xlsx). The punishment of ||w||₂² is hence indeed important for the robustness of the system, but it does not help to overcome the problem, as it does not address the cause stemming from the flatness of the search space around the optimum.
This raised the idea of another way to control the structure of the network, and possibly its robustness, via the objective function. The punishment of the sum of the squared weights (||w||₂²) is in fact an L2 measure of the size of the weights. The corresponding L1 measure is the sum of the absolute values of the weights, ||w||₁. The difference between those two measures is that the L2 is more influenced by the highest weights in the net, while the L1 treats all orders of magnitude equally. Introducing the L1 measure in the objective function is expected to have the following beneficial effect: as the L1 does not only focus on the highest weights, it is expected to further decrease the aforementioned unused weights of the network, and this way it would further reduce the noise signals appearing during the testing phase (see sheet L1 in NN.xlsx).
The weight-penalty terms of the objective function (14) were thus modified to include both measures:

    Φ(w, p, I) = … − λ₁ · ||w||₁ − λ₂ · ||w||₂²        (16)

where:

    λ₁ = 0.05
    λ₂ = 0.25
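A sketch of the combined weight penalty of equation (16), with λ₁ = 0.05 and λ₂ = 0.25 as default values (the function name and the flattening of the chromosome are illustrative choices):

    import numpy as np

    def weight_penalty(chromosome, lambda1=0.05, lambda2=0.25):
        """Combined L1/L2 punishment of the weights used in equation (16)."""
        flat = np.concatenate([w.ravel() for w in chromosome])
        return lambda1 * np.abs(flat).sum() + lambda2 * np.square(flat).sum()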
As expected, it was observed that this had the effect of bringing the value of the smallest weights closer to zero by about one order of magnitude (10⁻¹). But again the robustness of the system was not affected, as the problem of the flatness of the search space was not solved.
6.3.3 Complexity of the network
A last aspect that has been studied is the complexity of the network. It has been tested whether including fewer indicators in the input and reducing the number of hidden nodes has an impact on the robustness of the system. Two systems have been considered which include the most promising indicators (in order to avoid noise from bad indicators). A first system is based on the %B, the MFI and the MACD histogram (see sheet simple1 in NN.xlsx). The second is based on the ADX, the difference between DI+ and DI-, the %B, the MFI and the slow %D (see sheet simple2 in NN.xlsx). Both use 3 hidden nodes. This means that the number of weights has been reduced from 74 previously to 20, respectively 26. Yet, the problem of robustness, as expressed by the correlations, remains identical.
7 Conclusion
7.1 Why technical indicators and neural networks do not work together
Unfortunately we must conclude that the approach defined here did not lead to a usable trading system. One reason has been explained before. To conclude this report, some further arguments will be added to explain why technical indicators and neural networks will probably not work together:
1. As mentioned before, the system which was obtained is not robust and can hence not be used as such, for psychological reasons. All attempts to overcome this problem failed. Even though, as a consequence, the results on the training set cannot be interpreted, it should be mentioned that none of the systems significantly beat the market.
2. The system is also unusable for another psychological reason. Through the complexity of the system, it is in fact very untransparent. One cannot know on what precisely the trading signals are based. No trader would use a system which he does not understand.
3. Lastly, the approach which had to be used is quite complicated and heavy. On the one hand, the objective function that was necessary to obtain acceptable training results is not a simple expression and includes many parameters. On the other hand, the computational complexity of the genetic algorithm is considerable. It needs many iterations, with many evaluations of the likewise heavy objective function, to converge to the global optimum. As mentioned, testing the system on all 30 shares takes about 4 hours on a modern computer (Intel i5 at 2.3 GHz with 4 GB RAM; coded in Matlab). In addition, at least two runs are needed to verify the robustness. Optimizing the parameters, the structure of the network and the indicators to be considered is hence a very time consuming task. A big problem is also that the usually very efficient back-propagation algorithm cannot be used.
Of course, this report should not be considered as a proof that one cannot base a system on indicators and neural nets; proving such an impossibility is not possible in itself. Rather, this report shows the difficulties of such an approach. It has shown that a simple back-propagation algorithm can probably not be applied. It proposes an idea of an objective function used to overcome the flaws of the back-propagation approach and provides a well performing genetic algorithm to solve the global optimization. It also raised the issue of the robustness such a system can have. The most promising method to overcome this problem seems to be to group several shares together. In order to implement a system based on neural networks and technical indicators, additional research could be done to determine under which conditions one network can be used for several shares. This report indicates that being in the same sector is not sufficient.
Another promising direction is to investigate how combining different networks, more precisely bagging, which is used when a classifier is unstable, might improve the robustness. Possibly the networks obtained over several runs of the algorithm could be aggregated to form one trading system. The goal of further research would then be to determine how those networks should be aggregated and whether this can help to overcome the problem of robustness.
However, the chances of success seem very low. If one is absolutely keen on using neural networks and technical indicators, one should rather refer to an approach which is widely developed in the literature [2, 8, 13]. This approach uses lagged prices as input and has the predicted price(s) over a very short period as output. It has two fundamental advantages over the approach presented here. First, as the predicted prices are observed during the training phase, back-propagation can be used. This makes the training computationally more efficient on the one hand, and it also overcomes the problem of robustness, as the correctness of the predictions can be assessed for each day (which increases the precision of the obtained network) instead of aggregating the signals of the whole period into one very complex and global objective function. Second, such a system would also overcome the psychological issues: the predicted price over a very short period is more tangible than some abstract buy and sell indicators. Technical indicators can be used to complement the input of such a neural network [6, 7].
7.2 A personal comment on efficient markets
One question which still remains open relates to who is right: fundamentalists and technical analysts, or the theory of efficient markets? Are stock prices predictable or purely random?
I think these views are not necessarily mutually exclusive. Before starting this project I did not believe that technical indicators convey information about future prices. What made me put this view into doubt was an observation I made during the development and improvement phase of the genetic algorithm: I observed a correlation of 50% between the performance on the training set and that on the testing set (see sheet badTraining in NN.xlsx). This is very abnormal, as the market performance in one period is not related to the market performance in another period in the long run. The correlation was related to the fact that the genetic algorithm was not yet fully developed and that, as a consequence, not all networks were correctly trained, as they got stuck in local optima for instance. Put differently, badly taking the technical indicators into account resulted in performance which was far worse than the market. Once the genetic algorithm was fully developed, so that the training on all shares was successful and not random anymore, this correlation disappeared naturally. This confirms that there is indeed some information in the indicators.
On the other hand, the Efficient Market Hypothesis should not be over-interpreted. To me, the Efficient Market Hypothesis does not mean that prices are completely random. Rather, it should be reformulated as: there is no free lunch. The question is hence how complicated and how successful the extraction of the underlying pattern is. With this in mind, one should rather speak about the reward of a trading system than about its return, notwithstanding that both terms are frequently used as synonyms. One should also take into account the costs and the effort of developing a trading system. Developing a trading system is very complicated and requires extensive knowledge of the tools which are used for it. In my opinion, a trading system might beat the return of the market. But will it also reward its developer more than if he had put his effort or applied his knowledge elsewhere? One can cite here a study published in the Journal of Finance in 2000 [12]. In this article, Wermers shows that mutual funds are able to beat the stock market by 1.3%. However, 1.6% must be subtracted from this for trading costs and other expenses. Put differently, the experts are able to beat the returns of the market, but in a way which barely rewards them for their costs.
References
[1] John Ehlers and Ric Way. Evaluating trading systems. www.mesasoftware.com.
[2] Fernando Fernandez-Rodriguez, Christian Gonzalez-Martel, and Simon Sosvilla-Rivero. On the profitability of technical trading rules based on artificial neural networks: Evidence from the Madrid stock market. Economics Letters, 2000.
[3] Lonnie Hamm, B. Wade Brorsen, and Martin T. Hagan. Comparison of stochastic global optimization methods to estimate neural network weights. Neural Processing Letters, 26(3), 2007.
[4] Google Inc. Google Finance. www.google.com/finance.
[5] Investopedia. Basics of trading systems, 2006.
[6] Kyoung-jae Kim and Ingoo Han. Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications, 19(2), August 2000.
[7] Monica Lam. Neural network techniques for financial performance prediction: integrating fundamental and technical analysis. Decision Support Systems, 37(4), 2004.
[8] William Leigh, Russell Purvis, and James M. Ragusa. Forecasting the NYSE composite index with technical analysis, pattern recognizer, neural network, and genetic algorithm: a case study in romantic decision support. Decision Support Systems, 32, 2002.
[9] Frank H. F. Leung, H. K. Lam, S. H. Ling, and Peter K. S. Tam. Tuning of the structure and parameters of a neural network using an improved genetic algorithm. IEEE Transactions on Neural Networks, 14(1), January 2003.
[10] David J. Montana and Lawrence Davis. Training feedforward neural networks using genetic algorithms. Proceedings of the 11th International Joint Conference on Artificial Intelligence, 1989.
[11] Michele Pace. Algorithmic trading project notes & guidelines, 2012.
[12] Russ Wermers. Mutual fund performance: An empirical decomposition into stock-picking talent, style, transactions costs, and expenses. The Journal of Finance, 55(4), 2000.
[13] Jingtao Yao, Chew Lim Tan, and Hean-Lee Poh. Neural networks for technical analysis: A study on KLCI. International Journal of Theoretical and Applied Finance, 2(2), May 1999.