The main contribution of behavioral finance so far has been in providing simple and intuitive explanations
for many irregular phenomena observed in financial markets. Still, many of the dictates of behavioral
finance cannot be easily accommodated by current econometric models, which makes it hard to
empirically validate not only the statistical but also the economic significance of behavioral hypotheses.
In this paper, the authors discuss new trends in econometric research, namely cointegration and
equilibrium-correction models, and show how these can effectively incorporate many ideas of
behavioral finance. This applies, in particular, to the interaction of the price and the fundamental value.
The authors provide an up-to-date review of econometric and computational intelligence models that have
been proposed for cointegration analysis and equilibrium-correction modeling, and show exactly which
“behavioral” elements these models are able to capture. The authors, in particular, favor the use of
intelligent learning algorithms both in the detection of complicated cointegration relations and the
representation of the equilibrium restoration dynamics.

Most financial and economic theories suggest long-run equilibrium relationships between
financial or economic variables, which take the form of a linear or non-linear functional
relationship between variables of interest. Much of the theoretical work, however, is
exclusively focused on long-run relationships in ideal markets, by simplistically assuming
that equilibrium is almost instantly restored and various market irregularities are
insignificant so far as the final formation of prices is concerned. There is, thus, little
guidance on what to expect in transient states of adjustment to equilibrium in an
“imperfect” market setting with irrational traders and frictions. Behavioral finance has
made an important breakthrough by stressing the importance and persistence of
non-equilibrium states caused by market frictions (or else limits to arbitrage) and irrational
trading. As we show later in this paper, it has also taken one step further by giving
indications on what patterns of irrational trading and equilibrium restoration are likely
to hold in practice. However, many of these ideas have not yet been fully accommodated
in current econometric practice, which makes it hard to empirically validate both the
statistical and the economic significance of the various hypotheses.
Many real-world economic and financial time-series drift in a stochastic or
unpredictable manner, meaning that they cannot be made stationary by simply
subtracting a deterministic function of time. However, by taking first differences these
series become mean-reverting and stationary. When two or more trending series tend to
move together satisfying a long-run relationship, they are said to be cointegrated.
Cointegration is the statistical counterpart of equilibrium in time-series analysis. In a
stochastic environment, where economic time-series are constantly hit by unpredictable
shocks, one (almost) always expects deviations from equilibrium. However, if two or more
economic variables are to satisfy an equilibrium, then these cannot move apart for long.
The deviation from this relationship at any time must follow a fixed distribution, centered
on zero which remains unchanged with time. Otherwise, there wouldn’t be a tendency to
satisfy the equilibrium. The time-series which represents the discrepancy between the
observed outcome and the postulated equilibrium is called the disequilibrium error.
If theory or empirical evidence supports a long-run equilibrium relationship, then
knowing this relationship, and in particular the deviation from the equilibrium, could
help in forecasting future movements of individual time series. Although individual series
may be hard to predict, the cointegrating combination tends to be more predictable on
average as it removes much of the idiosyncratic characteristics of each series. Equilibrium-
or Error-Correction Models (ECM) essentially implement this idea and describe how future
prices respond to deviations from the equilibrium in previous periods. ECMs provide a
richer representation of the underlying short-run dynamics by describing the mechanism
by which individual time-series adjust themselves towards the long-run equilibrium.
Cointegration and error-correction models have been applied to examine price
relationships in various markets with rather promising results. Examples include
the studies of Ghosh (1993), Koutmos et al. (1996), Wahab et al. (1993) on index futures;
Anderson (1997), Hall et al. (1992) on the term structure of interest rates; Campbell et al.
(1987, 1988), Lee et al. (1999) on stock prices; Culver et al. (1999), MacDonald et al. (1993)
on foreign exchange rates. Cointegration and equilibrium-correction analysis has also
been applied in the framework of Computational Intelligence (CI) models, and in particular
with neural networks (Haefke et al. (1996, 2002), Harm et al. (1996), Markellos (2004),
Refenes et al. (1997), Xu et al. (1998)).
The goal of this paper is two-fold: to provide an up-to-date review of the current
practice in cointegration analysis and equilibrium-correction modeling, and show how
various behavioral elements can be effectively fitted into this framework. We argue that these
new advances in econometric modeling may eventually provide the econometric device
for empirical validation of behavioral hypotheses. Of the various dictates of behavioral
finance, we restrict our attention to results on limits to arbitrage and irrational deviations
from fundamentals. In our exposition of the topic we adopt a descriptive style and attempt
to make ideas as transparent as possible by leaving out unnecessary mathematical
technicalities. The rest of the paper is organized as follows: In Section 2 we discuss a key
concept in modern finance, the concept of arbitrage, and show how the (no) arbitrage
methodology can be used for common pricing and forecasting tasks to derive long-run
equilibrium relationships for the value of stocks, derivatives or between short- and
long-term interest rates. Since the power of arbitrage is critical to the elimination of
disequilibrium errors, it is worth examining how effectively arbitrage strategies work
in real markets. Such a discussion is given in Section 2.1. The findings of this section are
very important as they suggest that (a) equilibrium is not instantly restored and almost
never attained; and (b) certain types of equilibrium restoration dynamics are more likely
to hold in reality. In Section 3 we introduce the reader to common terminology in
cointegration analysis and equilibrium correction modeling, and in Section 3.1, we review
the standard practice for specifying linear error-correction models which have been
traditionally used in studies of financial prices. Section 3.2 presents non-linear approaches
to cointegration analysis such as threshold, smooth transition and computational
intelligence-based equilibrium-correction models, while Section 3.3 discusses other possible
extensions to the concept of cointegration. Section 4 reviews various issues in
cointegration modeling and concludes the paper.

Arbitrage is the financial term for “free-lunch” or “making money out of nothing”. An
opportunity for arbitrage arises when two “similar”, in terms of final pay-offs, assets sell
at different prices in the market. By selling the expensive and buying the cheap one, one
holds a riskless portfolio, which costs nothing to the investor but still produces
non-negative profits. Clearly, such a portfolio constitutes a rather attractive
investment which many rational profit-hunting investors will seek to buy. Hence, in the
long run, market activity will bring the prices of the two assets so close that arbitrage is
no longer feasible. The necessity for the near-equality of the two prices is often referred
to as the Law of One Price.
Nowadays, it is common practice in finance to use (no) arbitrage-type arguments to
derive equilibrium or “fair” prices for various financial products. This approach to pricing
has the advantage of not making strong assumptions on the interaction of agents or their
attitude towards risk. Popular applications of the concept of arbitrage are in the pricing
of stocks and derivatives (futures, options, swaps, etc.), as well as in explaining the term
structure of interest rates. We attempt a brief review of the rationale of these no-arbitrage
arguments.
One of the most popular conceptual frameworks for pricing common stocks is the
Arbitrage Pricing Theory (APT), which goes back to the seminal works of Ross (1996, 1997).
One- or many-factor arbitrage models pose an equilibrium relationship for the price of a
traded stock which excludes any arbitrage opportunity. To derive the equilibrium model,
APT rests on two assumptions. The first one concerns the return-generating process and
reflects the intuition that the return on each security is driven by certain economic or
market factors and the sensitivity of the return to these factors. If $r_i$ is the ith security’s
return at the end of the investment period then in the case of a K-factor model the
return-generating process is assumed to follow the linear specification

$r_i = \bar{r}_i + \beta_{i1} r_{f_1} + \beta_{i2} r_{f_2} + \ldots + \beta_{iK} r_{f_K} + \epsilon_i$ ...(1)

where

• $\bar{r}_i$ is the mean return on i.

• $r_{f_k}$ is the excess return (realized minus mean return) on factor k at the end of the
investment period ($\bar{r}_{f_k} = 0$).

• $\beta_{ik}$ is the “beta” or sensitivity of the security to fluctuations in $f_k$.

• $\epsilon_i$ is the idiosyncratic or company-specific risk of i, which by assumption cannot
be explained by any of the factors (i.e., $E(\epsilon_i | f_1, f_2, \ldots, f_K) = 0$). We also assume that
individual idiosyncratic risks are uncorrelated with each other and zero in mean.
The second critical assumption concerns the existence of ‘factor-mimicking’ portfolios.
In a highly developed market with a great variety of assets, one can form portfolios $P_k$, k
= 1, 2,..., K such that (a) each is perfectly correlated with factor $f_k$ and
uncorrelated with the others (i.e., $\beta_{P_k j} = 1$ for j = k and 0 for j ≠ k); and (b) its idiosyncratic
risk is approximately zero ($\epsilon_{P_k} = \sum_i w_{ik} \epsilon_i \approx 0$), meaning that portfolios are well-diversified.
The important pricing result states that in the absence of arbitrage, the expected
return of each security i, whose return-generating process is given by (1), should be

$\bar{r}_i = r_f + \beta_{i1} (\bar{r}_{P_1} - r_f) + \beta_{i2} (\bar{r}_{P_2} - r_f) + \ldots + \beta_{iK} (\bar{r}_{P_K} - r_f)$ ...(2)

i.e., a linear combination of the risk premia on each factor, where $r_f$ is the risk-free rate. Equation (2) is, in fact, an
equilibrium relationship for the return of a stock i. This geometrically corresponds to a
hyperplane in the space $(\beta_1, \beta_2, \ldots, \beta_K, \bar{r})$ of betas and expected return. To illustrate the
point, let us assume for simplicity that only two factors determine the returns on the
securities traded in the market, so we are in the situation of Figure 1. Under no-arbitrage,
the set of all expected returns on the market should coincide with the hyperplane which
is spanned by the returns on portfolios $P_k$, k = 1, 2,..., K. Suppose that there exists an asset i
(point A in the figure) with betas $\beta_{i1}$ and $\beta_{i2}$ whose expected return $\bar{r}_i$ is greater than the
return implied by (2) (point B on the hyperplane). If one forms a portfolio P by buying
1 Euro of i and selling short $\beta_{i1}$ Euros of $P_1$ and $\beta_{i2}$ Euros of $P_2$, then one virtually eliminates
all the systematic risk of the stock and also attains on average positive returns. Since this
portfolio costs nothing to form, it is clearly an arbitrage opportunity. As many rational
investors will rush to exploit it, the demand for i will make its price rise until its average
return equals that of point B. The central point illustrated here is that market
activity “exerts pressure” on point A, and also on any point which doesn’t lie on the
hyperplane, to return to it.
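The two-factor argument can be checked with a small numeric sketch. All numbers below are hypothetical and chosen only for illustration. The sketch also assumes that the balance of the zero-cost portfolio is placed in the risk-free asset, a detail the text does not spell out.

```python
# Two-factor APT example with hypothetical numbers (not taken from the paper).
r_f = 0.02                  # risk-free rate (assumed)
premia = [0.04, 0.03]       # expected excess returns of factor portfolios P1, P2 (assumed)
betas = [1.2, 0.5]          # betas of asset i on the two factors (assumed)
expected_i = 0.10           # expected return of asset i, point A (assumed)


def apt_expected_return(r_f, betas, premia):
    """Expected return implied by the linear pricing relation (2)."""
    return r_f + sum(b * p for b, p in zip(betas, premia))


# Point B: the expected return that rules out arbitrage for these betas.
implied = apt_expected_return(r_f, betas, premia)
mispricing = expected_i - implied

# Zero-cost portfolio: long 1 unit of i, short beta_k units of each P_k,
# with the balance placed in the risk-free asset (an assumption of this sketch).
weights = {"asset_i": 1.0, "P1": -betas[0], "P2": -betas[1]}
weights["risk_free"] = -sum(weights.values())

expected_returns = {
    "asset_i": expected_i,
    "P1": r_f + premia[0],
    "P2": r_f + premia[1],
    "risk_free": r_f,
}
factor_betas = {
    "asset_i": betas,
    "P1": [1.0, 0.0],
    "P2": [0.0, 1.0],
    "risk_free": [0.0, 0.0],
}

net_cost = sum(weights.values())
expected_profit = sum(weights[a] * expected_returns[a] for a in weights)
exposures = [sum(weights[a] * factor_betas[a][k] for a in weights) for k in range(2)]

print(f"implied return (point B): {implied:.4f}")
print(f"expected profit of the hedged portfolio: {expected_profit:.4f}")
print(f"net cost: {net_cost:.4f}, factor exposures: {exposures}")
```

Under these assumptions the hedged portfolio has no factor exposure and costs nothing, and its expected profit equals the gap between points A and B. Its idiosyncratic risk is not removed, which is why the text speaks of "virtually" eliminating risk.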

Figure 1: A Geometric Interpretation of the Arbitrage Pricing Theory
[Figure labels: E(r_B); point A]

Apart from stocks, no-arbitrage arguments are used in particular in the pricing of
derivative securities such as futures, options and swaps (see Cuthbertson et al. (2001), Hull
(1998) for more details). The standard practice is to form a portfolio, typically dynamically
adjusted, that perfectly replicates the derivative under consideration, i.e., it has the same final
value as the asset in a given investment horizon. If this portfolio is self-financing¹ then
at any time prior to maturity the price of the security must equal the cost of setting up
the replicating portfolio. Otherwise an arbitrage opportunity exists between the two
investments, derivative and replicating portfolio, if one goes long the cheap one and shorts the
expensive one.
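A minimal sketch of the replication argument follows. It assumes a one-period market with only two possible end-of-period prices, which is far simpler than the dynamically adjusted portfolios mentioned above. The function name and all numbers are hypothetical.

```python
def replicate_call_one_period(s0, s_up, s_down, strike, r):
    """Replicate a call option in a one-period, two-state market.

    Returns (shares, bond, cost):
      shares: units of the underlying held
      bond:   amount invested at the risk-free rate r (negative means borrowing)
      cost:   set-up cost today, i.e. the no-arbitrage price of the call
    """
    payoff_up = max(s_up - strike, 0.0)
    payoff_down = max(s_down - strike, 0.0)
    # Choose shares and bond so that the portfolio matches the payoff in both states.
    shares = (payoff_up - payoff_down) / (s_up - s_down)
    bond = (payoff_down - shares * s_down) / (1.0 + r)
    cost = shares * s0 + bond
    return shares, bond, cost


# Hypothetical inputs: spot 100, next-period price 120 or 90, strike 100, rate 5%.
shares, bond, cost = replicate_call_one_period(100.0, 120.0, 90.0, 100.0, 0.05)

# The portfolio reproduces the option payoff in both states.
value_up = shares * 120.0 + bond * 1.05
value_down = shares * 90.0 + bond * 1.05

print(f"shares: {shares:.4f}, bond: {bond:.4f}, set-up cost: {cost:.4f}")
print(f"portfolio value if up: {value_up:.4f}, if down: {value_down:.4f}")
```

If the option traded at any price other than the set-up cost, going long the cheaper of the two and short the dearer one would lock in the difference.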
Arbitrage arguments are also common in studies of the term structure of interest rates
as a means to explain the comovement of yields associated with bills of different maturity
(Anderson (1997), Hall et al. (1992)). The postulated no-arbitrage relationship “ties up” the
k-period yield with the expected one-period successive yields and reflects the intuition that
investors will be indifferent between holding a bill which has k periods left to maturity
and a sequence of one-period bills for k successive periods. Certain arbitrage arguments
assert that arbitrageurs will buy and sell bills in an attempt to profit on any yield spread
that is not justified by risk or liquidity. The implied equilibrium is attained when yield
spreads allow no opportunity for arbitrage; if yield spreads are inconsistent with this
equilibrium then arbitrage causes yields to adjust (see Anderson (1997) for more details).
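The indifference condition can be illustrated with a short sketch. It assumes no risk or liquidity premia and treats the expected one-period yields as known numbers. The function name and the yields are hypothetical.

```python
def k_period_yield(one_period_yields):
    """Per-period yield of a k-period bill that matches rolling over one-period bills."""
    gross = 1.0
    for y in one_period_yields:
        gross *= 1.0 + y            # compound the successive one-period yields
    k = len(one_period_yields)
    return gross ** (1.0 / k) - 1.0


# Hypothetical expected one-period yields for the next three periods.
expected_short_yields = [0.03, 0.04, 0.05]
y_k = k_period_yield(expected_short_yields)

print(f"three-period yield consistent with no arbitrage: {y_k:.4%}")
```

A quoted three-period yield above this level would make the long bill more attractive than rolling over short bills, and the resulting trades would push the spread back.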
2.1. Arbitrage in Real Markets
Most arbitrage-based pricing theories conceive arbitrage as an investment opportunity
that offers riskless profits at no cost. They presume that most securities in the market have
“substitutes” or perfectly replicating portfolios, and also a market environment with
enough rational traders to detect the mispricing and bring the price back to the
no-arbitrage or fundamental value.²
In essence, most arbitrage arguments are based on two assertions. First, as soon as there is
a deviation from the fundamental value—in short, a mispricing—an attractive investment
opportunity arises. Second, rational traders will immediately snap up the opportunity, thereby
correcting the mispricing. In fact, it is difficult to argue against the second point: when
attractive investment opportunities come to light, it is hard to believe that they are not quickly
exploited. However, in practice even when an asset is wildly mispriced, strategies designed to
correct the mispricing can be both risky and costly, and hence not always attractive.³ There
are actually three important reasons why this might be true.
• Implementation costs: Most arbitrage arguments ignore transaction costs and other
market “frictions”. However, transaction costs, margin payments, limitations to short
selling, etc., prevent investors from forming trading strategies that perfectly replicate
an asset and hence take advantage of any mispricing.⁴ The importance of these factors
was also noted in an early study by Cootner (1962), who talked about upper and lower
“reflecting barriers” around the equilibrium value, imposed by market frictions.
He argued that the position of such barriers is likely to depend upon the size of market
frictions and other transaction costs, including the bid-ask spread, short-selling and
borrowing constraints, giving rise to a band of price movements around the
equilibrium price where arbitrage activity is unprofitable.

¹ Meaning that investment in certain assets is solely financed by selling others short.
² It is remarkable that common finance terminology refers to rational traders as “arbitrageurs”.
³ The views expressed in this section are generally in line with the behavioral finance literature on
limits to arbitrage. See for example Barberis et al. (2001), Shleifer (2000).
⁴ Empirical research conducted in various markets suggests that “frictions” and market imperfections are often
responsible for statistically significant mispricings and deviations from the no-arbitrage situation. Some studies
concluded, for example, that the futures contract is selling at a discount relative to its theoretical price (Cornell
et al. (1983), Figlewski (1984)). Jawadi (2004) reports similar results for stock prices.

The way market frictions are conceived in the literature misses two other less tractable
though important components of implementation costs: the cost of finding and learning
about a mispricing (information cost) and the opportunity cost of the resources needed
to exploit it (processing cost). The information cost, in particular, becomes more
significant in a noisy market environment, where estimating the intrinsic value of an
asset requires a considerable amount of data, effort, time and expertise. Clearly, due
to implementation costs, arbitrage strategies are profitable only when the benefits from
arbitrage well exceed implementation costs (He et al. (1995)).
• Non-perfect Substitutes: Most securities traded in the market hardly have perfect
‘substitutes’, which makes it difficult to eliminate all the fundamental or non-company-specific
risk of a security. In this case, strategies designed to correct the mispricing are
not completely riskless but are partially exposed to fundamental risk.
• Noise Trading: Even when fundamental risk is insignificant, arbitrage strategies may
still be unattractive due to noise trading risk.⁵ Since the activity of noise traders leads
prices away from fundamentals, arbitrageurs run the risk that the trading of noise
investors will cause a further “mis-price deepening”, as noted by Shleifer (2000). If
noise traders are pessimistic today about an asset and have thereby driven its return
down, an arbitrageur buying this asset needs to recognize that noise traders might
become even more pessimistic and drive returns down even further in the near future;
if the arbitrageur has to liquidate their position before returns recover, they will suffer
a loss, and loss aversion could limit the original arbitrage position (Hirshleifer (2001)).
Conversely, an arbitrageur selling an asset short when bullish noise traders have driven
its return up needs to be aware that noise traders might become even more bullish
tomorrow, and so must take a position that accounts for the risk of a further rise when
they have to buy back the asset. Noise traders’ beliefs may thus become even more
extreme before they are subject to correction, which is a source of risk for
arbitrageurs, particularly if they are subject to a short horizon.⁶
There is extensive evidence that noise trading is not always a ‘bad signal’ for rational
investors but also a profitable opportunity. Occasionally, arbitrageurs may find it more
profitable to trade in the same direction as the noise traders rather than in the direction
that corrects the mispricing. This is the well-known “feeding the bubble” strategy.⁷
⁵ Noise traders are particularly uninformed traders who typically base their trading strategies on information other
than value-relevant news (word-of-mouth, advice of financial “gurus”, technical analysis, etc.).
⁶ Shleifer (2000) justifies the assumption that arbitrageurs may be faced with short horizons on the basis that
arbitrageurs typically do not manage their own money but are agents for investors who evaluate their performance
at regular intervals and reward them accordingly. Mispricings that take longer to correct than the evaluation
horizon may therefore reduce arbitrageurs’ remuneration. Further, many arbitrageurs borrow money and securities
from intermediaries to put on their trades and, whilst they have to pay interest, they also face the risk of
liquidation by lenders if prices move against them and the value of collateral falls.
⁷ This type of strategy is particularly profitable against a group of noise traders, the positive-feedback traders or technical
chartists, who buy more of an asset in one period if it performed well in the last period and vice versa. If chartists push an
asset’s price above fundamental value, arbitrageurs are better off holding the asset, knowing that the
earlier price rise will attract more feedback traders next period, leading to still higher prices, at which point they
can exit at a profit (see Samuels et al. (1998), ch. 8).

As the analysis above suggests, there are good reasons to believe that arbitrage can be
of limited effectiveness in correcting any mispricing. In the short run, the costs and risks
of arbitrage may be sufficiently high to prevent instantaneous convergence to the
no-arbitrage equilibrium but in the long-run arbitrage forces will (hopefully) correct
irrational deviations and bring prices back to fundamentals. Still, it may take considerable
time for a mispricing to be corrected. Also, due to non-perfect replication and
implementation costs, which seem to be persistent sources of limitation to arbitrage and
consequently of mispricing, prices may (rationally) fluctuate in a band around a theoretical
(no arbitrage) price without a tendency to revert to an equilibrium. Figure 2 schematically
summarizes these ideas. For simplicity, let us assume that the no-arbitrage value of the
security evolves according to a smooth exponential curve. The dotted lines represent
what we call “rational fluctuation bands”. Once noise trading leads the price well outside the
fluctuation bands, arbitrage strategies put a force on the price to return to the fundamental
value. Hence, fundamental value acts as a kind of attractor to the price. However, inside
the bands the price may wander with no tendency to revert to the equilibrium.
Figure 2: A Schematic Representation of the Equilibrium Correction in Securities
[Figure labels: Fundamental value; Irrational; Time]
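The band behavior described for Figure 2 can be mimicked with a small simulation of the deviation of price from fundamental value. This is only a sketch under assumed dynamics: no pull inside the band, a proportional pull towards zero outside it, and Gaussian shocks. The parameter names `band`, `kappa` and `sigma` and their values are hypothetical.

```python
import random


def step(deviation, shock, band, kappa):
    """One-period update of the price deviation from fundamental value.

    Inside the band (|deviation| <= band) there is no pull towards zero;
    outside it, a fraction kappa of the deviation is removed by arbitrage.
    """
    pull = -kappa * deviation if abs(deviation) > band else 0.0
    return deviation + pull + shock


def simulate(n_steps, band=1.0, kappa=0.5, sigma=0.3, seed=0):
    """Simulate a path of deviations driven by Gaussian noise-trading shocks."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        path.append(step(path[-1], rng.gauss(0.0, sigma), band, kappa))
    return path


path = simulate(5000)
share_outside = sum(abs(z) > 1.0 for z in path) / len(path)
print(f"largest absolute deviation: {max(abs(z) for z in path):.2f}")
print(f"share of periods outside the band: {share_outside:.2%}")
```

Inside the band the deviation behaves like a random walk, while excursions beyond it are pulled back, so the fundamental value acts as an attractor only for large mispricings.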

Figure 2 gives a rough approximation of how real-world prices behave. In some markets
irrational deviations from the fundamental value may appear less often than in others, or
may not last for long. This results in coarser price patterns with occasional spikes. The
area of inaction around the fundamental value also has to do with the nature and the
characteristics of the market and the security under study (liquidity, magnitude of
transaction costs, limitations to short-selling). One expects that the size of irrational
deviations and the time that passes until they are eliminated may change with the market or
the security. Still, the analysis above can serve as a general methodological tool for
studying the dynamics of financial prices, which mainly dictates that fundamental indices
are inadequate to explain much of the short-term variation of security prices. The next sections
show how these heuristic ideas can be implemented into a solid econometric framework.

3. Cointegration, Equilibrium Correction and the Specification of Dynamic Models
In econometrics a substantial body of research has been devoted to studying equilibrium
relationships among time-series of economic variables. From a statistical point of view,
two time-series $x_{1t}$ and $x_{2t}$ are said to satisfy an equilibrium relationship
$f(x_{1t}, x_{2t}) = 0$ ...(3)

if the amount $\epsilon_t = f(x_{1t}, x_{2t})$ by which actual observations deviate from this equilibrium
is a stationary process with zero median; i.e., the “error” or discrepancy between outcome
and postulated equilibrium has a fixed distribution, centered on zero, which does not
change over time.⁸ Intuitively, the concept of stationarity is a natural requirement that the
error cannot grow indefinitely; if it did, the two variables would not have a tendency to
satisfy (3) and hence it could not be an equilibrium relationship.
Most economic and financial time-series exhibit a trending behavior and are thus
non-stationary. However, econometricians have noticed that by taking their first
difference $\Delta x_t = x_t - x_{t-1}$, the resulting time-series normally become stationary. This
implies that the original time-series trend like random walks, where past shocks to the
series do not die out but continuously accumulate. A time-series which can be made
stationary by differencing it once is said to be integrated of order one, denoted by I(1), or
to have a unit root.⁹ For a long time it was common practice in econometrics to estimate
equilibrium equations involving unit-root variables by straightforward linear regression.
However, it has been shown that testing hypotheses about the coefficients of the
regression using standard statistical inference might lead to completely spurious results.
In an influential paper, Granger et al. (1974) demonstrated that tests of such a regression may
often suggest a statistically significant relationship between variables where none in fact exists.
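The spurious regression problem is easy to reproduce by simulation. The sketch below regresses one random walk on another, independent one and counts how often the usual t-test (|t| > 1.96) declares the slope significant. It then repeats the exercise on first differences. Sample size, number of replications and seed are arbitrary choices made for illustration.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS regression of y on x with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    resid_var = (syy - slope * sxy) / (n - 2)   # residual variance estimate
    return slope / math.sqrt(resid_var / sxx)


def random_walk(n, rng):
    """Cumulated Gaussian shocks: an I(1) series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def diff(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def rejection_rates(n_obs=100, n_reps=300, seed=1):
    """Share of replications with |t| > 1.96, in levels and in first differences."""
    rng = random.Random(seed)
    rej_levels = rej_diffs = 0
    for _ in range(n_reps):
        x, y = random_walk(n_obs, rng), random_walk(n_obs, rng)
        rej_levels += abs(slope_t_stat(x, y)) > 1.96
        rej_diffs += abs(slope_t_stat(diff(x), diff(y))) > 1.96
    return rej_levels / n_reps, rej_diffs / n_reps


levels_rate, diffs_rate = rejection_rates()
print(f"'significant' slopes in levels: {levels_rate:.0%}")
print(f"'significant' slopes in differences: {diffs_rate:.0%}")
```

Although the two series are unrelated by construction, the regression in levels rejects the null of no relationship far more often than the nominal 5%, whereas the regression in differences behaves as standard theory predicts.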
Over time, econometricians have suggested many simple solutions to the ‘spurious
regression’ problem. If econometric relationships are specified after proper transformation
of the original data, e.g. taking first differences, logs or detrending, the statistical
difficulties due to non-stationarity are often avoided because transformed variables
become near stationary (this is actually the standard procedure suggested in many
econometric textbooks, see e.g. Box et al. (1994)). Although data transformations provide
an ad hoc technical solution to the problem of spurious regression, they cause other
problems in the modeling procedure. If an economic or financial theory is stated in terms
of levels of variables, then in estimating models relating transformed variables one
typically loses much of the theoretical guidance. Moreover, with such models
it may no longer be feasible to empirically check the validity of hypotheses postulated by
the theory (e.g., certain restrictions on the variables’ coefficients).

⁸ In the rest of the paper, we adopt a more practical definition of stationarity which requires that the main moments
of a process (mean, variance and autocovariance) do not change over time. This is often termed weak
stationarity.
⁹ If a discrete-time stochastic process can be made weakly stationary by differencing it d times, it is called integrated
of order d, or I(d). Weakly stationary stochastic processes are thus I(0).

To overcome the problem of spurious regressions, modern econometric science focuses
on the concept of cointegration and the detection of long-run relationships among
cointegrated variables. Two variables $x_t$ and $y_t$ which are both I(1) are said to be
cointegrated if there exists a linear combination of these two which is I(0).¹⁰ The concept
of cointegration is the link between the statistical theory of integrated variables and
economic long-run equilibrium relationships. Instead of using variables that require first
differencing one can obtain stationarity in levels by considering a composite time-series,
constructed by taking a linear combination of the original series. In addition, this
composite stationary time-series may be said to characterize the long-run equilibrium
relationship linking the series.
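A simulated example makes the definition concrete. The sketch assumes a simple data-generating process chosen for illustration: x is a random walk and y equals a multiple of x plus a stationary AR(1) error. The coefficient `BETA`, the AR parameter `phi` and the seed are hypothetical.

```python
import random

BETA = 2.0   # hypothetical cointegrating coefficient


def sample_variance(series):
    """Unbiased sample variance."""
    m = sum(series) / len(series)
    return sum((v - m) ** 2 for v in series) / (len(series) - 1)


def simulate_cointegrated_pair(n_obs=2000, beta=BETA, phi=0.5, seed=2):
    """x is a random walk; y = beta * x + u, where u is a stationary AR(1)."""
    rng = random.Random(seed)
    x, y = [], []
    level, u = 0.0, 0.0
    for _ in range(n_obs):
        level += rng.gauss(0.0, 1.0)          # I(1) component shared by both series
        u = phi * u + rng.gauss(0.0, 1.0)     # stationary disequilibrium error
        x.append(level)
        y.append(beta * level + u)
    return x, y


x, y = simulate_cointegrated_pair()
spread = [b - BETA * a for a, b in zip(x, y)]   # the linear combination y - beta * x

print(f"sample variance of x:      {sample_variance(x):.1f}")
print(f"sample variance of y:      {sample_variance(y):.1f}")
print(f"sample variance of spread: {sample_variance(spread):.2f}")
```

Both series wander widely, yet the composite series y - beta * x stays close to zero with a stable spread, which is what makes it usable as a measure of the disequilibrium.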
Systems with cointegrated variables can be equivalently represented by an error- or
equilibrium-correction model (ECM). The ECM describes how future prices respond to any
deviation from the equilibrium that occurred in previous periods, the latter being known as
the disequilibrium error. In other words, the ECM offers a representation of the adjustment
process through which the long-run equilibrium is maintained.
An important theoretical result, proven in Engle et al. (1987), is the Granger
Representation Theorem which simply states that if a set of cointegrated variables exists,
then there is a valid error-correction representation which describes the short-run
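dynamic adjustment of cointegrated variables to equilibrium. In order to illustrate the
result, let us assume that two I(1) variables $x_t$, $y_t$ are generated by the following
autoregressive system of order p:

$x_t = \sum_{j=1}^{p} \phi_{1j} x_{t-j} + \sum_{j=1}^{p} \theta_{1j} y_{t-j} + \epsilon_{1t}$ ...(4a)

$y_t = \sum_{j=1}^{p} \phi_{2j} x_{t-j} + \sum_{j=1}^{p} \theta_{2j} y_{t-j} + \epsilon_{2t}$ ...(4b)

where $\epsilon_{1t}$, $\epsilon_{2t}$ are white noise. We also assume that the two variables are tied to
the long-run relationship $y_t = \beta x_t$, so that $y_t - \beta x_t \sim I(0)$. The Granger Representation
Theorem states that in this case the system can be equivalently written as:

$\Delta x_t = \alpha_1 (y_{t-1} - \beta x_{t-1}) + \sum_{j=1}^{p-1} \gamma_{1j} \Delta x_{t-j} + \sum_{j=1}^{p-1} \delta_{1j} \Delta y_{t-j} + \epsilon_{1t}$ ...(5a)

$\Delta y_t = \alpha_2 (y_{t-1} - \beta x_{t-1}) + \sum_{j=1}^{p-1} \gamma_{2j} \Delta x_{t-j} + \sum_{j=1}^{p-1} \delta_{2j} \Delta y_{t-j} + \epsilon_{2t}$ ...(5b)

where at least one of the parameters $\alpha_1$, $\alpha_2$ is significantly different from zero. The term
$y_{t-1} - \beta x_{t-1}$ in (5) is the disequilibrium error and $\alpha_1$, $\alpha_2$ measure the strength or speed of the
disequilibrium correction. A specification such as (5), which includes variable differences
and an error-correction term, is generally called an equilibrium- or error-correction model.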

¹⁰ Strictly speaking, the components of a vector $x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$ are said to be cointegrated of order d, b, denoted
by $x_t \sim CI(d,b)$, if (a) all components of $x_t$ are integrated of order d and (b) there exists a vector $\beta = (\beta_1, \beta_2, \ldots, \beta_n)'$
such that the linear combination $\beta' x_t = \beta_1 x_{1t} + \beta_2 x_{2t} + \ldots + \beta_n x_{nt}$ is integrated of order (d – b), where b > 0. Note
that if $x_t$ has n components there may be as many as n – 1 linearly independent cointegrating vectors. The number
of cointegrating vectors is called the cointegrating rank of $x_t$. Clearly if $x_t$ contains only two variables, there can be
at most one independent cointegrating vector.

There are a few important things to note about ECMs. First, as seen by (5) all variables
in an ECM are stationary and hence this class of models can be safely estimated and tested
using standard procedures. Second, both equations of the system are ‘balanced’ as their
left- and right-hand sides are of the same order of integration, i.e., I(0). This balancing
of equations is an important property of ECMs whose significance will be further discussed
in Section 3.2.3. Third, due to the equivalence between representations (4) and (5), ECMs
are in fact the proper models to use in case of cointegration. Hence, when two
non-stationary variables are suspected to be cointegrated, omitting the
disequilibrium error term may lead to a badly specified model for the short-term dynamics
of the variables. All in all, by means of an error-correction representation we gain in two
ways: (a) we reduce the possibility of getting meaningless or spurious relations due to
the non-stationarity of regressors; and (b) we obtain a richer description of the short-run
dynamics of the system, i.e., the temporal adjustment to equilibrium.
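The adjustment mechanism can be traced in a noise-free special case. The sketch below assumes an error-correction system without lagged differences (p = 1) and without shocks. The names `alpha1` and `alpha2` stand for the two adjustment speeds and `beta` for the cointegrating coefficient, and all numerical values are hypothetical. In this case the disequilibrium error shrinks by the constant factor 1 + alpha2 - beta * alpha1 each period.

```python
def ecm_errors(x0, y0, alpha1, alpha2, beta, n_steps):
    """Iterate a noise-free error-correction system with p = 1.

    dx_t = alpha1 * (y_{t-1} - beta * x_{t-1})
    dy_t = alpha2 * (y_{t-1} - beta * x_{t-1})
    Returns the disequilibrium errors z_t = y_t - beta * x_t for t = 0..n_steps.
    """
    x, y = x0, y0
    errors = [y - beta * x]
    for _ in range(n_steps):
        z = y - beta * x                       # last period's disequilibrium error
        x, y = x + alpha1 * z, y + alpha2 * z  # both variables respond to it
        errors.append(y - beta * x)
    return errors


# Hypothetical start: y lies one unit above its equilibrium level beta * x.
alpha1, alpha2, beta = 0.1, -0.2, 1.0
errors = ecm_errors(x0=10.0, y0=11.0, alpha1=alpha1, alpha2=alpha2, beta=beta, n_steps=10)
decay = 1.0 + alpha2 - beta * alpha1

print(f"per-period decay factor: {decay:.2f}")
print("disequilibrium errors:", [round(z, 4) for z in errors])
```

With these values x rises and y falls after the initial gap, and the error decays geometrically. A decay factor outside the interval (-1, 1) would mean that the assumed adjustment speeds do not restore the equilibrium.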
3.1. Linear Models
For specifying a linear error-correction system such as (5) one typically follows a procedure
which consists of
• Testing for the order of integration of the original variables
• Estimating the cointegrating relationship
• Testing for cointegration and
• Specifying the equilibrium-correction model
Popular tests for the order of integration of a time-series are the Dickey-Fuller (DF),
Augmented Dickey-Fuller (ADF), Phillips-Perron (PP) and others (see Enders (1995)). Testing
for cointegration usually amounts to estimating the coefficients of the static relationship
between the variables by OLS and applying unit-root tests to the residuals of the regression.
Rejecting the hypothesis of a unit root is evidence in favor of cointegration.
If variables are cointegrated, the parameters of the error-correction representation can
be estimated using the two-step procedure proposed by Engle and Granger (1987). In the
first step, the parameters of the cointegrating relationship (e.g., $\beta$) are estimated by
running a regression in the levels of the variables, and in the second step these estimates are
used in the specification of the ECM (5). Both steps require only standard estimation
techniques (i.e., OLS in the first step and maximum likelihood in the second step), and
the whole procedure can be shown to yield consistent estimators of all parameters. Moreover,
the limiting distribution of the estimators of the error-correction form is
the same regardless of whether one uses the estimate $\hat{\beta}$ from the
cointegrating regression or the true value $\beta$, see Engle et al. (1987). One can also show that the estimates of the
parameters of the cointegrating vector converge to the true value at rate $T$, $T$ being the
sample size, while the estimators of the parameters of the error-correction system, in the
second step, converge at the usual asymptotic rate of $\sqrt{T}$. Due to their relatively fast
convergence, the estimators from the cointegrating relationship are said to be ‘super-consistent’.
Details and proofs on the above can be found in Banerjee et al. (1993).11
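The two-step procedure described above can be sketched in a few lines of code. The following is only a minimal illustration under stated assumptions, not the estimation scheme of any particular study: the data are simulated, the ECM is reduced to a constant, one contemporaneous difference and the lagged disequilibrium error, OLS is used in both steps for simplicity, and the helper names (`solve`, `ols`, `engle_granger`) are hypothetical. No unit-root test is carried out on the residuals, since that would require non-standard critical values.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b_ for a, b_ in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(y, rows):
    """Least-squares coefficients of y on the columns of `rows`."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def engle_granger(y, x):
    """Return (beta_hat, rho_hat, z): slope in levels, adjustment speed, residuals."""
    # Step 1: static regression in levels, y_t = alpha + beta * x_t + z_t.
    alpha, beta = ols(y, [[1.0, xt] for xt in x])
    z = [yt - alpha - beta * xt for yt, xt in zip(y, x)]
    # Step 2: simplified ECM, dy_t = c + a * dx_t + rho * z_{t-1} + e_t.
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    rows = [[1.0, x[t] - x[t - 1], z[t - 1]] for t in range(1, len(y))]
    c, a, rho = ols(dy, rows)
    return beta, rho, z


# Simulated data (illustrative only): x is a random walk and y = 2 x + u,
# where u is a stationary AR(1) process, so the true cointegrating slope is 2
# and the true adjustment coefficient on z_{t-1} is -0.5.
rng = random.Random(0)
x, y, xt, u = [], [], 0.0, 0.0
for _ in range(2000):
    xt += rng.gauss(0, 1)
    u = 0.5 * u + rng.gauss(0, 1)
    x.append(xt)
    y.append(2.0 * xt + u)

beta_hat, rho_hat, z = engle_granger(y, x)
print(round(beta_hat, 3), round(rho_hat, 3))
```

In this simulated setting the first-step slope should lie very close to its true value, in line with the super-consistency property, whereas the second-step adjustment coefficient is estimated with the usual sampling error.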

An alternative procedure for testing and estimating cointegrating relationships and error-correction systems,
which is becoming common econometric practice, has been developed by Johansen (1988, 1991). It
builds on simultaneous estimation of the full error-correction system (including the cointegrating vector) using
maximum likelihood and is particularly useful with three or more cointegrated variables. When three or more
cointegrated variables are considered, it is possible that there is more than one cointegrating long-run
relationship. In this case, Johansen’s trace test is used to determine the effective number of cointegrating vectors.
is here used as a notation for a vector and not as an operator on a vector.

3.2. Non-linear Models
Although linear models are easy to understand and implement, they suffer from an important
disadvantage. A linear ECM implies that the arbitrage activities, taking the prices back
to a long-term equilibrium, happen in the period immediately following the occurrence
of a mispricing and that they are independent of the size or the sign of the disequilibrium
error. However, recalling the discussion of section 2.1, these two assumptions seem rather
simplistic. Due to several market imperfections, not every price discrepancy between
financial assets triggers arbitrage activities. Only when the deviation from the equilibrium
exceeds a critical threshold do the benefits of adjustment exceed the costs and, hence,
arbitrage activities become more likely to move the system back towards the equilibrium.
In the light of the above arguments, Engle and Granger’s original notion of
cointegration and equilibrium correction may not be the best way to examine the
relationship between financial prices. In general, the speed of adjustment to equilibrium
may depend on both the sign and the size of the disequilibrium, which implies the presence
of non-linearities in the error-correction component. To tackle these issues, two
approaches are mainly followed: a) threshold and b) smooth-transition error-correction models.

3.2.1. Threshold Error-Correction Models (TECM)
Threshold equilibrium-correction models were proposed to examine more complicated
patterns of arbitrage activation and equilibrium adjustment in the presence of market
frictions (liquidity risk, transaction costs, time-varying interest rates, etc.). They assume
a number of linear models, each of which corresponds to the adjustment process holding
for a specific “regime”. The transition from one regime to the other is determined by the
value of a transition variable, which often coincides with the disequilibrium error. Most TECM
approaches incorporate a three-regime equilibrium-correction system, according to which
the cointegration relationship is inactive inside a given range of disequilibrium (the
central regime) and becomes active once the price deviation exceeds a certain
(positive or negative) threshold. Some studies that use the concept of threshold
cointegration are Anderson (1997) and Clements et al. (2003) for yields on T-Bills, and Dwyer
et al. (1996), Lin (2003), Martens et al. (1998) and Po et al. (2005) for stock index futures.
Suppose that $X_t$ is an $n \times 1$ vector of cointegrated variables, $Z_t$ is the disequilibrium error,
$c_i(L)$ is an $n \times n$ matrix of polynomials in the lag operator $L$ ($L X_t = X_{t-1}$) and $\varepsilon_t$ is an
$n$-dimensional white-noise process. Typically, a TECM follows the general specification

$$\Delta X_t = c_i(L)\,\Delta X_{t-1} + d_i Z_{t-1} + \varepsilon_t, \quad \text{if } \tau_{i-1} < Z_{t-1} \le \tau_i$$ ...(6a)

where $\Delta$ denotes the difference operator and $d_i$ is an $n \times 1$ vector of constants. In the case
of a two-regime model $i = 1, 2$ ($\tau_0 = -\infty$, $\tau_2 = +\infty$) and in the case of three regimes
$i = 1, 2, 3$ ($\tau_0 = -\infty$, $\tau_3 = +\infty$).12 Generally, $\{c_i(L), d_i\}$ depend on the regime $i$ and
each $\tau_i$ represents the threshold or transition boundary between regimes $i$ and $i + 1$. If for
example three regimes are assumed, a typical specification is

$$c_i(L) = \begin{cases} c_1(L) & \text{for } Z_{t-1} \le \tau_1 \\ c_2(L) & \text{for } \tau_1 < Z_{t-1} \le \tau_2 \\ c_3(L) & \text{for } Z_{t-1} > \tau_2 \end{cases}$$

$$d_i = \begin{cases} d_1, & Z_{t-1} \le \tau_1 \\ 0, & \tau_1 < Z_{t-1} \le \tau_2 \\ d_3, & Z_{t-1} > \tau_2 \end{cases}$$ ...(6d)

where in the central regime (surrounding the equilibrium) the coefficient of the error
term is 0, incorporating a ‘band of inaction’ in the region $[\tau_1, \tau_2]$.

One can also assume a delay in the correction of mispricing and thus use $Z_{t-d}$ in the place of $Z_{t-1}$, for some low
integer $d$.

Various specifications for TECMs have been examined by Balke et al. (1997), Enders
et al. (1998, 2001), Hansen et al. (2000, 2002) and Tsay (1998). A comprehensive presentation
and evaluation of many of these models is given by Clements et al. (2003) in the context
of forecasting the spread between short- and long-term US interest rates.
For specifying a TECM one typically starts with testing for cointegration, and especially
the hypothesis of a non-linear threshold adjustment against the linear one. Such tests
have been proposed in the references given above. The thresholds can be determined by
estimating the coefficients of each regime model (6a) over a ‘grid’ of permissible threshold
values, ensuring that a minimum number of observations falls into each regime. Estimates
of thresholds are obtained by minimizing the total sum of squared errors. To determine
the number of regimes in a TECM one naturally turns to theoretical considerations
regarding the specific market or security, although formal statistical procedures, such as
Akaike’s Information Criterion, can also be employed (see e.g., Tsay (1998)).
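The grid-search idea described above can be illustrated with a short sketch. It is a deliberately simplified, univariate version of a three-regime model, written under several assumptions that are not taken from the cited studies: only the disequilibrium error is modeled, the central-regime coefficient is fixed at zero (a band of inaction), candidate thresholds are empirical quantiles of the lagged error, and the function names and simulated data are hypothetical.

```python
import random


def fit_outer(z_lag, dz, idx):
    """OLS slope (no intercept) of dz on z_lag over the observations in idx."""
    sxx = sum(z_lag[t] ** 2 for t in idx)
    sxy = sum(z_lag[t] * dz[t] for t in idx)
    d = sxy / sxx
    sse = sum((dz[t] - d * z_lag[t]) ** 2 for t in idx)
    return d, sse


def grid_search(z, step=0.05, min_frac=0.10):
    """Pick (tau1, tau2) minimizing the total SSE of a three-regime model."""
    z_lag = z[:-1]
    dz = [b - a for a, b in zip(z[:-1], z[1:])]
    n = len(dz)
    srt = sorted(z_lag)
    # Candidate thresholds are empirical quantiles. Lower candidates run from
    # the min_frac quantile up to just below the median and upper candidates
    # mirror them, so every regime keeps a minimum share of the sample.
    k = int(round((0.5 - min_frac) / step))
    lower = [srt[int((min_frac + i * step) * n)] for i in range(k)]
    upper = [srt[int((1 - min_frac - i * step) * n) - 1] for i in range(k)]
    best = None
    for tau1 in lower:
        for tau2 in upper:
            lo = [t for t in range(n) if z_lag[t] <= tau1]
            hi = [t for t in range(n) if z_lag[t] > tau2]
            mid = [t for t in range(n) if tau1 < z_lag[t] <= tau2]
            d1, sse1 = fit_outer(z_lag, dz, lo)
            d3, sse3 = fit_outer(z_lag, dz, hi)
            sse2 = sum(dz[t] ** 2 for t in mid)  # central coefficient fixed at 0
            total = sse1 + sse2 + sse3
            if best is None or total < best[0]:
                best = (total, tau1, tau2, d1, d3)
    return best[1:]


# Simulated disequilibrium error (illustrative only): no correction inside
# the band (-1, 1]; half of the error is removed each period outside it.
rng = random.Random(1)
z, zt = [], 0.0
for _ in range(3000):
    d = 0.0 if -1.0 < zt <= 1.0 else -0.5
    zt = zt + d * zt + rng.gauss(0, 0.5)
    z.append(zt)

tau1, tau2, d1, d3 = grid_search(z)
print(tau1, tau2, d1, d3)
```

Restricting the candidates to quantiles is one simple way of guaranteeing a minimum number of observations per regime; a finer grid or regime-specific lag structures could be substituted without changing the logic.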
3.2.2. Smooth Transition Error-Correction Models (STECM)
While the threshold cointegration model is appealing since it reflects the intuition that
investors only respond to large deviations from equilibrium, the thresholds introduce
difficulties into the modeling process because they implicitly assume that agents respond
in a homogeneous way to disequilibrium errors. In practice, investors are unlikely to have
common “activation thresholds” because they encounter different transaction costs
associated with portfolio adjustments (brokers’ fees, commissions, tax liabilities, etc.), and
also differ in the time needed to detect the deviation and organize the trading. The fact
that different investors might have different thresholds of inaction suggests that the
overall market thresholds may become “blurred” as one aggregates across individual
investors. Thus, the resulting equilibrium adjustment might not simply be an “on/off”
process, as a TECM suggests. Smooth transition and gradual weakening of adjustment as
the market moves closer towards equilibrium might provide a more realistic representation
of the aggregate adjustment process.

One can incorporate the idea of smooth transition by using the simple model
specification (7) of [2], which involves a positive constant and an $n \times 1$ vector-valued
function $G(Z_{t-1})$ whose individual components belong to either of the following parametric families:

$$G(\theta) = 1 - e^{-\gamma\theta^{2}}$$ ...(8a)

$$G(\theta) = 1 - e^{-\gamma\theta^{2} h(\theta)}, \quad h(\theta) = \frac{2}{1 + e^{\lambda\theta}}$$ ...(8b)

where $\gamma > 0$ and $\theta \in \mathbb{R}$. Figures 3 and 4 depict several plots of (8a) and (8b) for different
values of the parameters. Note that $\gamma$ is a measure of the overall tendency to return to equilibrium,
while $\lambda$ in $h(.)$ accounts for asymmetries in the adjustment process. When $\lambda > 0$ ($< 0$) the speed of
correction is slower (faster) for positive deviations than for negative ones. Such asymmetries
in the adjustment process may well hold in practice.

Figure 3: The Symmetric Error Adjustment Function (8a) for Different Values of $\gamma$

The smooth adjustment functions (8) can be embedded in a more general framework to obtain
Xt c(L) X t 1 d(L) X t 1 Zt 1 t 1 t ...(9)

where $d(L)$ is another $n \times n$ matrix of polynomials in the lag operator. This
model can be considered as a generalization of the threshold error-correction
model (6) which smoothly “blends” between two autoregressive structures
$c(L)$ and $c(L) + d(L)$. In that way, a continuum of linear models arises, each
corresponding to a different size and direction of the disequilibrium error.
The relevance of models like (9) to the modeling of economic time-series is
discussed in Granger et al.

Figure 4: The Asymmetric Error Adjustment Function (8b) for $\gamma = 50$ and Different Values of $\lambda$
(curves shown for $\lambda = -20$ and $\lambda = 20$)

For the specification of smooth transition models, one typically starts with testing the
hypothesis of linearity against a smooth non-linear adjustment model. If linearity is
rejected, the model is estimated by non-linear least squares, see Terasvirta (1994). Smooth
transition equilibrium-correction models have been applied to a range of markets,
including foreign exchange Michael et al. (1997), Treasury Bills Anderson (1997), equities
Jawadi (2004) and others. Still, the incremental benefit of smooth transition over
threshold models is not clear-cut; for example, Anderson (1997) finds no improvement
in forecasting accuracy when using a smooth transition model instead of a threshold one,
although the former seems to be more robust in terms of several diagnostic tests.
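To make the shape of the adjustment functions concrete, the sketch below evaluates a symmetric and an asymmetric transition function. It assumes the exponential form G(θ) = 1 − exp(−γθ²) for the symmetric case and G(θ) = 1 − exp(−γθ²h(θ)) with a logistic weight h(θ) = 2/(1 + exp(λθ)) for the asymmetric case; the parameter names `gamma` and `lam`, and in particular the exact form of h, are assumptions made for this illustration only.

```python
import math


def g_symmetric(theta, gamma):
    """Symmetric adjustment: 0 at equilibrium, approaching 1 for large |theta|."""
    return 1.0 - math.exp(-gamma * theta ** 2)


def g_asymmetric(theta, gamma, lam):
    """Asymmetric adjustment: the weight h tilts the response by the sign of theta."""
    h = 2.0 / (1.0 + math.exp(lam * theta))  # h(0) = 1; lam = 0 gives the symmetric case
    return 1.0 - math.exp(-gamma * theta ** 2 * h)


# Evaluate both functions on a small grid of disequilibrium errors.
grid = [-0.5, -0.3, -0.1, 0.0, 0.1, 0.3, 0.5]
symmetric = [g_symmetric(t, 50.0) for t in grid]
asymmetric = [g_asymmetric(t, 50.0, 20.0) for t in grid]
for t, s, a in zip(grid, symmetric, asymmetric):
    print(f"{t:+.1f}  {s:.4f}  {a:.4f}")
```

Under these assumed forms a positive asymmetry parameter dampens the response to positive deviations and sharpens the response to negative ones, while setting it to zero recovers the symmetric function.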
3.2.3. Computational Intelligent Models
By the term computational intelligent (CI) models, we refer to algorithms, or
computational devices in general, which have the inductive-learning or ‘learning-from-data’
property. Popular examples of these methodologies are artificial neural networks Anderson
(1995), Haykin (1991), Adaptive Neuro-Fuzzy Inference Systems (ANFIS) Jang et al.
(1996), genetically-evolved models Koza (1991), and others. Although most of these
intelligent-learning paradigms are nowadays grouped under the umbrella of computational
intelligence, they differ in principle and also come in a great variety. This makes
a detailed exposition of the topic infeasible within the given space constraints.
Intelligent learning methodologies are constantly gaining ground in the financial
literature, as concerns the prediction of stock and interest rate movements, the pricing of
derivatives and the forecasting of foreign exchange rates and volatility (see Abu-Mostafa et
al. (2001), Azoff (1994), Chen (2002), Kingdom et al. (1997), Trippi et al. (1996) for
comprehensive surveys). A great majority of intelligent approaches employ a network
learning technique, such as feed forward, radial basis function or recurrent neural network
Swanson et al. (1997), Zhang et al. (1998), although certain paradigms such as genetically-
evolved regression models Cortez et al. (2001), Farley et al. (1996), Koza (1991), Szpiro (1997)
or inductive fuzzy inference systems Fiordaliso (1998) are also encountered in the literature.
The growing interest in these technologies is justified by the fact that they offer very flexible
modeling specifications which rest on few (if any) assumptions on the data-generating
process. In addition, many intelligent learning methods, such as feed forward NNs or Takagi-
Sugeno FIS Jang et al. (1996), Takagi et al. (1985), possess a universal approximation property,
which means that (under mild assumptions) they are capable of approximating highly
nonlinear mappings with arbitrary accuracy (for a comprehensive review of the universal
approximation properties of common intelligent techniques see Tikk et al. (2003)).
From an econometric point of view, intelligent learning methods employ
semi-parametric nonlinear regressions of the form:
$$Y_t = f(W_t; \theta) + \varepsilon_t$$ ...(10)
where $Y_t$ is the target variable, $W_t = (Y_{t-i}, i = 1, 2, ..., p; X_{t-j}, j = 1, 2, ..., q)$ is the
set of explanatory variables, $X_t$ is a vector of exogenous explanatory variables, $f(.)$ is a
non-linear $\mathbb{R}^{2p} \to \mathbb{R}$ deterministic mapping and $\theta$ is a vector of real parameters. $\varepsilon_t$ is by
construction the unpredictable part of $Y_t$ given the information contained in $W_t$
($E(\varepsilon_t \mid W_t) = 0$).
For some specific methodologies such as single-layer feed forward neural networks, radial
basis function networks and neuro-fuzzy systems, the non-linear mapping $f(.)$ takes a more
specific form and can be written as a linear combination of basis functions:
$$f(W_t; \Theta) = \sum_{i=1}^{m} \theta_i\, g_i(W_t; \varphi_i) + \theta_0$$ ...(11)

where $g_i(W_t; \varphi_i)$, $i = 1, 2, ..., m$ are the basis functions with adjustable parameters $\varphi_i$ and
$\theta_i$, $i = 0, 1, ..., m$ are adjustable coefficients. In the case of a multilayer perceptron, for
example, the parameters $\varphi_i$ and $\theta_i$ are the network’s weights and each basis function is
given by
$$g_i(W_t; \varphi_i) = s\Big(\varphi_{i0} + \sum_{k \ge 1} \varphi_{ik} W_{t,k}\Big)$$

where $s$ is the activation function, usually specified as sigmoidal, $W_{t,k}$ is the $k$-th component of $W_t$, and $\varphi_{i0}$ is the ‘offset’
or ‘bias’ term. See [14] for details and proofs on the above.
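The basis-function representation of a single-hidden-layer network can be written out directly. The sketch below is only an illustration of that functional form, with a logistic activation and hand-picked weights; the function names and the numerical values are hypothetical and no training is performed.

```python
import math


def sigmoid(a):
    """Logistic activation function s(a)."""
    return 1.0 / (1.0 + math.exp(-a))


def network_output(w, theta, phi):
    """Output coefficient theta[0] plus a weighted sum of sigmoidal basis functions.

    w     : list of inputs (the components of W_t)
    theta : [theta_0, theta_1, ..., theta_m], output-layer coefficients
    phi   : m lists [phi_i0, phi_i1, ...], bias and input weights of each hidden unit
    """
    out = theta[0]
    for theta_i, phi_i in zip(theta[1:], phi):
        activation = phi_i[0] + sum(p * x for p, x in zip(phi_i[1:], w))
        out += theta_i * sigmoid(activation)
    return out


# Two inputs and two hidden units with arbitrary illustrative weights.
w = [0.5, -1.0]
theta = [0.1, 2.0, -1.0]
phi = [[0.0, 1.0, 1.0], [0.5, -2.0, 0.0]]
y_hat = network_output(w, theta, phi)
print(y_hat)
```

Here both hidden units happen to receive the same activation of -0.5, so the output reduces to 0.1 plus one sigmoid value, which makes the arithmetic easy to check by hand.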
To illustrate the application of cointegration methodology in CI, let us suppose that
variables $Y_t$, $X_t$ are I(1), in that their changes are stationary, and also cointegrated, so that
there exists a vector of parameters $\beta$ such that $Z_t = Y_t - \beta' X_t$ is a stationary variable.
Let us assume that we possess a sample of observations (or a training sample, as it is often called
in the CI literature) for $\Delta Y_t$ and $\Delta W_t$, where $\Delta W_t = (\Delta Y_{t-i}, i = 1, 2, ..., p; \Delta X_{t-j}, j = 1, 2, ...,
q; Z_{t-1})$.13 Based on the training sample, CI methods are capable of adaptively learning the
underlying relationship between $\Delta Y_t$ and $\Delta W_t$, offering a parametrization of the form

$$\Delta Y_t = f(\Delta W_t; \theta) + \varepsilon_t$$ ...(12)
The main advantage of CI-based equilibrium correction models is that they offer rich
specifications which can possibly accommodate arbitrary nonlinear features of the
underlying data-generating process (especially the non-linearity in the response of Yt to
the disequilibrium error). Theoretically speaking, (12) can be considered as a generalization
of threshold and smooth transition error correction models (6), (7) and (9) presented in
section 3.2.
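In practice, the first step in fitting a CI-based equilibrium-correction model is to assemble the training sample: lagged differences of the variables together with the lagged disequilibrium error as inputs, and the current change of the target as output. A minimal sketch of this bookkeeping follows, assuming a scalar explanatory variable and a known (or previously estimated) cointegrating coefficient; the function name and the toy numbers are hypothetical.

```python
def build_ecm_samples(y, x, beta, p=1, q=1):
    """Return (inputs, targets) for an equilibrium-correction learner.

    Each input row holds p lagged changes of y, q lagged changes of x and
    the lagged disequilibrium error z_{t-1} = y_{t-1} - beta * x_{t-1};
    the matching target is the current change of y.
    """
    dy = [b - a for a, b in zip(y[:-1], y[1:])]  # dy[k] = y[k+1] - y[k]
    dx = [b - a for a, b in zip(x[:-1], x[1:])]
    z = [yt - beta * xt for yt, xt in zip(y, x)]
    inputs, targets = [], []
    for t in range(max(p, q) + 1, len(y)):
        # The change of y at time t is dy[t-1]; its i-th lag is dy[t-1-i].
        row = [dy[t - 1 - i] for i in range(1, p + 1)]
        row += [dx[t - 1 - j] for j in range(1, q + 1)]
        row.append(z[t - 1])
        inputs.append(row)
        targets.append(dy[t - 1])
    return inputs, targets


# Toy series, small enough to verify the alignment by hand.
y = [1.0, 2.0, 4.0, 7.0, 11.0]
x = [0.0, 1.0, 1.0, 3.0, 4.0]
inputs, targets = build_ecm_samples(y, x, beta=1.0)
print(inputs)
print(targets)
```

The resulting rows can be passed to any learner, whether a neural network, a fuzzy system or a simple linear regression; the latter would reproduce a linear ECM.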
Nevertheless, the nonparametric nature of intelligent methods is often the source of
many problems. First of all, nonparametric methods are prone to overfitting:
they can be tuned to fit a given data sample almost arbitrarily well, yet
when applied to unseen data their performance may be poorer than that of a simple parsimonious model.
Several heuristic procedures have been proposed in the literature for making a
nonparametric model more robust and thus avoiding overfitting, but none is uniformly superior to the others.
Another disadvantage of nonparametric methods stems from the fact that they perform
arbitrary non-linear transformations of the original variables which are driven solely by
data and not by theory. One should be concerned about whether these types of nonlinear
transformations make sense from a financial or economic point of view. To illustrate the
point, take for example the discussion on the response of a cointegrating variable to the
disequilibrium error. From a financial point of view, one would reasonably expect that as
the value of Zt becomes larger in absolute terms it leads to more abrupt corrections in future
prices and hence has a larger impact on the target variable Yt. In other words, the partial derivative
of the output signal of an intelligent method with respect to an input variable should resemble an
inverse ‘bell’ function, like the functions (8a) and (8b) depicted in Figures 3 and 4, which grows
rapidly as the value of the input variable becomes larger in absolute terms. It is by
no means guaranteed that the parametrization offered by intelligent models can effectively
capture this property without adding much complexity.

$\Delta W$ is here used as a notation for a vector and not as an operator on a vector.

3.3. Other Advances in Cointegration Analysis
The reader would probably agree that the notion of cointegration and equilibrium-correction
is very flexible and can be extended in many directions. So far, we have treated
a cointegrating relationship as a linear combination of non-stationary variables. However,
there is a widely held belief that nonlinear equilibrium relationships between economic or
financial variables may well hold in practice. This is particularly true, for example, in the
pricing of options and other derivatives.14
If financial theory suggests a nonlinear equilibrium, then the cointegration methodology
has to be generalized in several ways. This applies first of all to the common characterization
of stationary and non-stationary time-series as I(0) and I(1), respectively. When evidence
suggests a nonlinear equilibrium relationship, the theory of unit-root or I(1) time-series
is insufficient to handle all situations, and more general definitions, such as
extended memory in distribution or in mean, seem more appropriate to describe the
‘persistence’ or long-memory property of certain time-series (see Granger et al. (1991,
1995) for a discussion). The generalization to nonlinear cointegration is possible, so that
$Y_t$, $X_t$ are both I(1) but $Z_t = Y_t - g(X_t)$ is a stationary finite-variance process for some
nonlinear function $g(.)$. The analysis can proceed as usual by using the new $Z_t$ terms, with
$X_t$ replaced by $g(X_t)$. Since $Z_t$ is stationary, $Y = g(X)$ can be thought of as a nonlinear
equilibrium relationship or attractor in an (X, Y) plot. Figure 5 illustrates two types of
attractors, a linear one and a nonlinear, 3rd-degree polynomial one. Provided that market forces
occasionally act to restore the equilibrium, it is likely that most realizations of the
two variables fall in an area surrounding the curve (X, g(X)), resulting in a picture similar
to that of Figure 5. The properties of non-stationary processes having a nonlinear attractor
are examined in Granger (1991).

Figure 5: An Example of a Linear and a Nonlinear, 3rd Degree Polynomial, Attractor

The detection of nonlinear attractors poses an interesting theoretical puzzle, which
deserves attention. Since detecting nonlinear cointegrating relationships effectively
means running a regression between non-stationary (trending) variables, one must be sure
that by minimizing a certain error criterion one obtains a ‘genuine’ and not a spurious
long-run relationship. Fortunately, Chang et al. (2001, 2003) and Park et al. (2001) provided an
answer to this problem, by developing the statistical theory of nonlinear cointegrating
relationships between non-stationary time-series. In particular, Chang et al. (2003) show
that for a rather general specification, which includes a single hidden layer NN and a
smooth transition model15 as special cases, the parameter estimates obtained by
minimizing a least squares criterion are asymptotically consistent but generally not
efficient. From a practical point of view, this means that although sample estimates
converge to the true parameter values as the sample size increases, for small samples they are
usually biased and have large variance. In addition, they do not asymptotically follow
standard distributions, which renders classical statistics like the t-ratio unsuitable for statistical
inference. The authors show that in order to guarantee efficiency and good asymptotic
behavior, the usual nonlinear LS estimators have to be modified by a correction term,
which is derived in the paper.

cf. the Black-Scholes type formulas

In this paper, we review current econometric trends in cointegration and equilibrium-correction
modeling and show that these types of models are able to accommodate in their
specification many “behavioral” elements of the price formation mechanism, such as
temporary deviations between price and fundamentals and long-term corrections. In our
presentation of the topic we follow a simple-to-complicated direction, by starting with
linear error-correction models, which capture the main idea of equilibrium restoration,
and moving on to nonlinear ECMs (threshold and smooth transition), which offer
specifications able to model more complicated error-correction patterns. Finally, we discuss
ECMs based on nonparametric computational intelligent methods, which in our opinion
attain the highest level of generalization as they make few (if any) assumptions on the
equilibrium restoration mechanism. By means of our exposition, we hope to show the
reader another possible contribution of cointegration methodology to behavioral finance
research: its use as a device for empirically validating many of the hypotheses posed by
behavioral finance. We argue that through cointegration analysis and ECMs, various
conjectures on limits to arbitrage and irrational exaggeration can be tested not only in terms
of statistical significance but also in terms of the economic value they bring to an investor.
The latter is very important for practitioners who want to base their trading strategy on
behavioral suggestions.
By means of this paper we encourage the use of intelligent methods in cointegration
analysis. Due to their nonparametric nature, intelligent learning algorithms can prove
rather successful in estimating both nonlinear equilibrium relationships and error-correction
models. CI methods offer a good alternative to many nonlinear econometric
specifications, especially in the absence of a priori information on the shape or properties
of the attractor or the adjustment dynamics. However, experience has shown that it is
always good practice to adopt a specific-to-general approach when dealing with
nonparametric methods. This means starting with simple models that capture the salient
features of the underlying relationships and gradually complicating the parametrization
when necessary. Many methodologies for building neural networks are based on this
simple-to-complicated idea (see e.g. Medeiros et al. (2002)). One always has to keep in mind
that standard models like linear or generalized linear regressions offer a solid statistical
theory for testing and estimation, which makes modeling less time consuming than a
trial-and-error procedure to derive the optimal architecture for a neural net.

In the context of cointegrating relationships smooth transition models represent a special type of
equilibrium relationship departing from a long run linear equilibrium relationship and smoothly adjusting
to a new one.

Another possible advantage from the use of intelligent methods in cointegration
modeling is related to the issue of modeling transparency. Many of the CI methods provide
more transparent representations of the underlying relationship, by means e.g., of a system
of expert rules or a decision tree, which are in principle easier to understand and validate
by experts. Intelligent modeling also gives the financial analyst great flexibility in
combining individual algorithms to create hybrid systems that share the advantages and
minimize the disadvantages of individual schemes. A neuro-fuzzy combination, for
example, keeps the powerful approximating capability of NNs while it adds much to the
comprehensibility of the induced model. An evolutionary algorithm, such as genetic programming (GP), is often
useful for performing a more consistent exploration of the space of possible network
topologies (i.e., model specifications).
Still, it is our view that a good deal of research is needed to transform these systems
into econometric methodologies for analyzing time-series. This is mainly because in this
type of application domain there is a strong culture of investigating not only the predictive
power but also the statistical significance of various aspects of the derived model at a specified
level of confidence. In the context of cointegration analysis, in particular, the analyst
would be very keen to know, e.g., how many of the lags of each cointegrated variable are
significant in the error-correction model at a given confidence level, whether the
error-correction term Z is significant in explaining future movements of the target variable,
what the 95% confidence interval for the speed of adjustment to equilibrium is, and so
on. Such issues greatly enhance the modeling procedure and our understanding of the
underlying dynamics. The investigation of the statistical properties of model parameters
has been a very active research direction in the neural networks literature (see e.g.
Medeiros et al. (2002), White et al. (1998, 2001), Zapranis et al. (1999)), where nowadays
standard procedures exist for testing the individual or joint irrelevance of network inputs
or weights.

Acknowledgment: I am grateful to Dr Uzay Kaymak, Erasmus University Rotterdam, and Dr Georg Dorffner,
Austrian Research Institute for Artificial Intelligence (OFAI), for comments and suggestions on earlier versions
of this paper. Remaining errors, omissions and opinions are my responsibility. This research is financially
supported by the Public Benefit Foundation “Alexander S. Onassis”, under the 2003-2004
scholarships program, and by a grant from the “Empeirikion” foundation.

Cointegration and Equilibrium-correction Models:

The IUP Journal of Behavioral Finance, Vol. III, No. 3, 2006


