Claims Reserving, State-Space Models and The Kalman Filter

JIA 110 (1983) 157-181
BY PIET DE JONG, B.ED., PH.D.

AND
B. ZEHNWIRTH, B.Sc., M.Sc., PH.D., A.I.A., A.I.A.A.
1. INTRODUCTION
1.1. THIS paper describes a consistent and justifiable means of establishing
adequate claims provisions in General Insurance. The topic has created
widespread interest amongst actuaries, accountants and regulatory authorities.
The issue of adequate provisions is of utmost importance to policyholders, whose
justifiable claims must be paid, insurance companies who must be able to satisfy
shareholders and make proper assessments of premiums, and regulatory
authorities who must be satisfied that adequate provision has been made for all
liabilities.
1.2. The claims reserving problem has been treated by many authors with
solutions ranging from simple accounting techniques to complex model building.
An extensive and authoritative survey of the multitude of methods and models is
given in a publication by van Eeghen (1981). Covered by this survey is the recent
original work of Reid (1978) which, together with extensive discussion, appeared
in this Journal. Despite the wide-ranging literature on the subject, discussion in
both this and other journals indicates that the problem of determining the
adequate level of claim reserves for general insurers has not been satisfactorily
addressed. Without claiming that the method described in this paper is a panacea
we feel it makes a significant step towards a solution to this problem and opens a
way out of the current impasse.
1.3. The state-space approach to claims reserving has important advantages.
To begin, the framework provides a flexible and unified method of attack,
suitable in a wide variety of circumstances and avoiding the need for the detailed
tailoring of calculations or mode of analysis to specific and often messy
circumstances. The flexibility of the method is such that different, and in
particular changing, circumstances can be accommodated in one and the same
framework. Secondly, the approach focuses on the forecasting nature of the
claims reserving problem and aligns the theory with that of modern time series
analysis, forecasting and control (see e.g. Box and Jenkins (1970)). Thirdly, as is
desirable in all forecasting exercises, the approach yields forecast errors which in
turn can form the basis for confidence statements regarding the adequacy or
otherwise of specified levels of reserves. Next, the state-space approach can
accommodate both objective and subjective information, the latter being
crucially important in times of rapid change, where data is scanty, or where data
is of questionable validity. A final advantage is that the method of estimation, the
Kalman filter, is a ‘real-time’ device: every new set of observations leads to a
relatively simple update of existing estimates and there is no necessity to redo all
calculations anew, or keep track of all previous information.
1.4. Before embarking, two points deserve emphasis. Firstly, the techniques of
state-space or dynamic linear modelling have applications ranging far beyond
the confines of the claims reserving problem. For example, the techniques can be
used to monitor mortality on a continual basis. Moving further afield, business
applications are comprehensively covered by Harrison and Stevens (1976) who
have emphasized the general practicality of the ideas and methods. We highly
recommend the paper for a general overview of dynamic linear modelling.
Claims reserving, on the other hand, presents its own specific problems and
peculiarities, and these are the points of focus of this paper. That the state-space
method can accommodate such a nonstandard forecasting problem emphasizes
the general relevance of the techniques.
1.5. Secondly, it is important to note that the approach developed in this
paper does not constitute a ‘theory’ of how in fact the claims process evolves.
Theories of this kind underlie the works of Reid (1978) and Linneman (1980).
Such approaches attempt to develop functional forms and parametrizations of
the claims reserving problem which seem to be suggested by a priori theory
regarding the phenomenon. The aims are essentially empirical in that one tries to
use theory to circumvent data deficiencies. The methodology is analogous to that
employed in Econometrics. In contrast, the approach of this paper is related to
that of Time Series or Box-Jenkins analysis. Simple and direct dynamic relations
are emphasized and a priori theory takes a back seat. Forecasting under such
circumstances is usually no worse and often better. This, together with the
relative simplicity, is a major appeal of the state-space approach.
1.6. The programme of this paper is as follows. In the next section we give a
brief introduction to the claims reserving problem, the data base and the notation
used in this paper. Section 3 introduces the general state-space framework and
the Kalman filter. Section 4 then goes on to develop a state-space model suitable
for the claims reserving context. This is followed up in §5 with an illustrative and
simplified example. More general remarks regarding the methodology are made
in §6, while § 7 takes up the issue of forecasting future payments. Appendices deal
with various mathematical details.
2. CLAIMS RESERVING: THE PROBLEM AND THE DATA

2.1. The need for claims reserves arises from two sources of delay, namely the
time that elapses between:
(a) the event which gives rise to a claim and the notification of the claim to the
insurer.
(b) the notification and final settlement of the claim.
2.2. These two delays induce a time lapse between the event which logically
implies a liability and the ultimate settlement of the liability. The actual total time
delay is unknown at the time the liability arises. In addition, the actual size of
settlement is unknown until settlement actually takes place. Insurance companies
must make adequate provision for these unpaid liabilities.
2.3. The provision consists of two components corresponding to the two
sources of delay:
(a) I.B.N.R.—incurred but not reported reserves. These relate to liabilities that
have arisen but are reported after the accounting date.
(b) I.B.N.E.R.—incurred but not enough reserved. These relate to claims that
are notified before, but settled after the accounting date.
2.4. In this paper we do not distinguish between these two parts, although the
methods can be modified to accommodate such distinctions. The described
techniques will focus on forecasting the sum of the two components.
2.5. The basic data available in the claims reserving context is usually
arranged in a so-called ‘runoff triangle’. Consecutive rows in this triangle
indicate amounts paid out over time with respect to accidents relating to a given
‘year’ of ‘origin’ or ‘accident’. The actual time frames are of little theoretical
significance and hence for ‘years’ one may substitute, for example, months or
half-years. Figure 1 displays the customary layout of the data, and
simultaneously introduces some notation.
Each row of the table indicates the distribution over time of observed
payments relating to a specific origin or accident year. Classification is according
to the delay or development year of payments. More recent origin years have
fewer observed delays and hence the triangular nature of the data base. Each
additional calendar year results in the filling out of an additional ‘diagonal’ of
numbers and in this sense calendar time moves downwards to the right.
Figure 1. The runoff triangle

2.6. The above picture is idealized. In practice, sufficiently long delays
produce negligible payment streams and hence the triangle may be cut off at the
right to form a trapezium. Moreover, the sharp top edge of the triangle reflects an
assumption that no runoff data is available prior to origin year one. This
assumption is often unrealistic. Despite such potential complications, we restrict
our attention to the idealized setup of Figure 1, since the described techniques
can accommodate numerous variants under apparent and often trite modifica-
tions. This flexibility in the state-space machinery is a strong argument in favour
of the methods.
2.7. In terms of Figure 1 the aim of claims reserving is to forecast the numbers
lying to the right of the last completed diagonal. These entries constitute all
liabilities incurred but not yet paid out. The sum of entries along each ‘future’
diagonal represents the total liabilities that come due in each future year. The sum
of all such future liabilities, suitably discounted, represents the total present value
of all liabilities for which reserves must be made available.
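The bookkeeping described in 2.7 can be sketched as follows. This is only an illustration under stated assumptions: the square of payments, the flat discount rate and the function name `future_liabilities` are hypothetical and not taken from the paper, and each future diagonal is discounted by a whole number of years.

```python
def future_liabilities(square, t_now, rate):
    """square[w-1][d] = payment (observed or forecast) for origin year w, delay d.

    Cells with w + d <= t_now lie on or before the last completed diagonal
    and are ignored. Returns ({calendar year: total due}, present value).
    """
    due = {}
    for w, row in enumerate(square, start=1):
        for d, amount in enumerate(row):
            t = w + d                      # calendar year of payment
            if t > t_now:
                due[t] = due.get(t, 0.0) + amount
    # Discount each future diagonal back to the accounting date t_now.
    pv = sum(a / (1.0 + rate) ** (t - t_now) for t, a in due.items())
    return due, pv


# Hypothetical 3 x 3 square of payments, not real data.
square = [[100.0, 60.0, 20.0],
          [110.0, 66.0, 22.0],
          [120.0, 72.0, 24.0]]
due, pv = future_liabilities(square, t_now=3, rate=0.0)
```

With a zero discount rate the present value is simply the sum of the three cells lying beyond the third diagonal.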
2.8. As indicated in Figure 1, denote by y_d(t) the aggregate payments made in
calendar year t with respect to claims delayed d years (d = 0, 1, . . . , t – 1). In
terms of Figure 1, the numbers y_d(t) for fixed t and varying d lie along a diagonal
extending upwards to the right. For given d and t the implied origin year is
w = t – d.
2.9. To model the payment stream we write y_d(t) as a mean level plus a zero
mean error:
$$y_d(t) = m(t-d, d) + u_d(t) \tag{2.1}$$
where for fixed calendar time t, the delay d runs from 0 to t – 1, and hence the year
of origin w = t – d from t down to 1. In general the quantity m(w, d) is the
expected total payment in year t = w + d with respect to claims originating in year
w and delayed d years. In this paper we model m(w, d) as a function of the origin
year w and delay d. The data is used to estimate the unknown model parameters.
The model is then used to forecast future payments.
2.10. Collateral information, useful for forecasting the y_d(t), is often available.
Obviously, payments relating to a specific accident year are related to the
volume of business transacted in that year. It is also common knowledge that the
nominal size of settlements relates to the price level at the time of payment. Let
n(w) be an index of the volume of business transacted with respect to origin year
w and let λ(t) denote the price index applicable to payments in calendar year t. In
place of (2.1) it may be reasonable to posit
$$y_d(t) = n(t-d)\,\lambda(t)\,m(t-d, d) + u_d(t) \tag{2.2}$$
where now m(w, d) = m(t – d, d) is the mean level per claim after adjusting for
inflation. This alternate specification forms the basis of the empirical example of
§5.
2.11. An economical way of writing both (2.1) and (2.2) is to use the following
vector notation. Let y(t) be the column vector of all observations made at time t:
$$y(t) = (y_0(t), y_1(t), \ldots, y_{t-1}(t))'$$
Thus, y(t) constitutes the t-th diagonal of the runoff triangle with entries ordered
according to increasing delay or equivalently as they appear going upwards to
the right. Let u(t) be similarly defined from the components u_d(t) and f(t) be
the column vector made up of either the entries m(t – d, d) or
n(t – d) λ(t) m(t – d, d), corresponding respectively to (2.1) and (2.2). Then for
each period of time t we can write both these latter equations as
$$y(t) = f(t) + u(t) \tag{2.3}$$
3. STATE-SPACE MODELS AND THE KALMAN FILTER

3.1. This section describes the state-space model framework and the Kalman
filter algorithm. State-space modelling techniques have recently received con-
siderable attention in the statistics literature and the tools and techniques have
found uses in diverse areas ranging from engineering to econometric forecasting
and finance. The Kalman algorithm or ‘filter’ is a set of equations that generate
successive and updated best estimates of the unknown parameters in state-space
models.
3.2. A state-space model consists of two sets of equations. The first of these
embodies the assumption that in each period a vector of observations is made,
each component of which is made up of a known, time varying linear
combination of unknown parameters, and a zero mean error. Writing y(t) as the
observed (column) vector, β(t) as the vector of unknown parameters and u(t) as
the vector of errors then the model for each period t reads
$$y(t) = X(t)\beta(t) + u(t) \tag{3.1}$$
where X(t) is a known ‘design’ matrix specifying the manner in which the
observations are related to the unknown parameters. The formulation (3.1) is
reminiscent of the well-known general linear model (see, e.g., Rao (1973)).
3.3. Differences arise from the fact that the vector of observations y(t) is
definitely associated with some given time point t, and again, u(t) and the design
matrix X(t) are definitely also associated with this time point. In standard
terminology (3.1) is referred to as either the ‘measurement’ or ‘observation’
equation, since it specifies the manner in which observations are generated.
3.4. In parallel with (3.1) one hypothesizes a so-called ‘state’ or ‘system’
equation describing the evolution over time of the ‘state’ or parameter vector β(t):
$$\beta(t) = H(t)\beta(t-1) + G(t)v(t) \tag{3.2}$$
where H(t) and G(t) are known matrices and v(t) is a vector of zero mean errors.
This state equation imparts a dynamic character to the model and brings out
the substantial difference between (3.1), (3.2) and the standard linear model: the
parameters are envisaged as random quantities evolving through time according
to a known mechanism. Although less stringent conditions are possible, vectors
u(t) and v(t) are presumed uncorrelated both with each other and with previous
error terms, while their respective covariance matrices U(t) and V(t) are
presumed known.
3.5. A usual first reaction to the Kalman model is to be struck by the amount
of knowledge that is assumed. Yet, as will be illustrated below and as Harrison
and Stevens (1976) have pointed out, this knowledge often enters in a
surprisingly natural way. Moreover, the equations can be specialized to reduce to
many of the usual models employed in statistics. To illustrate, if for all periods t,
the covariance matrix V(t) is identically zero and H(t) is identically equal to an
identity matrix then the state-space formulation reduces to the general linear
model. Pursuing an opposite direction, the equations may be further generalized
but we shall not take up this point as the formulation given above suffices for
present purposes.
3.6. We now turn to the Kalman filter which is a method of estimating the
vector of unknown parameters β(t) in each period of time t given all observations
up to and including time t. In particular, suppose a sequence of vector
observations y(t), t = 1, 2, . . . is observed through time under the conditions
specified above and we wish both to estimate the current vector of parameters
β(t) and to make a forecast of the next vector of observations y(t + 1). We aim to do
so in a ‘best’ sense and define optimality in the usual minimum error variance
manner. In particular, we pick those two sets of linear combinations of all the
data y(1), y(2), . . . , y(t) which when subtracted from respectively β(t) and
y(t + 1) have minimum variance. Using a slightly asymmetric notation let
$\hat\beta(t)$ and $\hat y(t+1)$ denote these estimates, while $\hat\beta(t-1)$ and $\hat y(t)$ are the corresponding
estimates constructed in the previous period (that is the estimates of β(t – 1) and
y(t) on the basis of y(t – 1), y(t – 2), . . .). If C(t) is the covariance matrix of
$\hat\beta(t) - \beta(t)$ then Kalman’s result states that the estimates and the covariance
matrix C(t) satisfy the following relations:
$$\hat\beta(t) = H(t)\hat\beta(t-1) + K(t)\,[y(t) - \hat y(t)] \tag{3.3}$$
$$\hat y(t+1) = X(t+1)H(t+1)\hat\beta(t) \tag{3.4}$$
$$R(t) = H(t)C(t-1)H(t)' + G(t)V(t)G(t)' \tag{3.5}$$
$$K(t) = R(t)X(t)'\,[X(t)R(t)X(t)' + U(t)]^{-1} \tag{3.6}$$
$$C(t) = R(t) - K(t)X(t)R(t) \tag{3.7}$$
where ’ indicates transposition. Formal proofs of these relations can be found in
Hannan (1970) or Jazwinski (1970). An informal proof is outlined in Appendix
D.
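One step of the recursion can be sketched in code. The sketch below assumes NumPy and uses the standard form of the Kalman recursion for the model (3.1) and (3.2): predict the state, form its covariance R(t), compute the gain K(t), then correct by the forecast error. The function name `kalman_step` and its argument names are ours, chosen for illustration.

```python
import numpy as np


def kalman_step(beta_prev, C_prev, y, X, H, G, U, V):
    """One recursion of the filter for the model
        y(t)    = X(t) beta(t) + u(t),          Cov u(t) = U(t)
        beta(t) = H(t) beta(t-1) + G(t) v(t),   Cov v(t) = V(t)
    Returns the updated estimate of beta(t) and its covariance C(t)."""
    beta_pred = H @ beta_prev                # estimate of beta(t) before y(t) is seen
    R = H @ C_prev @ H.T + G @ V @ G.T       # covariance of that estimate, R(t)
    y_pred = X @ beta_pred                   # one-step forecast of y(t)
    S = X @ R @ X.T + U                      # covariance of the forecast error
    K = np.linalg.solve(S, X @ R).T          # gain K(t) = R X' S^{-1} (R, S symmetric)
    beta = beta_pred + K @ (y - y_pred)      # correct by the forecast error
    C = R - K @ X @ R                        # covariance after using y(t)
    return beta, C


# Hypothetical scalar illustration: prior mean 0 with variance 1, one noisy reading of 2.
one = np.eye(1)
beta, C = kalman_step(np.array([0.0]), one, np.array([2.0]),
                      X=one, H=one, G=one, U=one, V=np.zeros((1, 1)))
```

In this scalar illustration the prior and the observation are equally reliable, so the updated estimate lies halfway between them and the variance is halved.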
3.7. The first important point to note about these formulae is that they are
recursive, providing an explicit method of updating estimates and predictions as
additional information comes to hand, without requiring all previous data to be
retained. The actual updating occurs as follows. The current observed vector y(t)
is compared with its previous period prediction $\hat y(t)$ and the error of prediction
$y(t) - \hat y(t)$ forms the basis for adjusting the best estimate $H(t)\hat\beta(t-1)$ of β(t) made on the basis
of all but the current y(t) vector. Adjustment occurs via the so-called Kalman
gain matrix K(t), which may be interpreted as analogous to the ‘smoothing
constant’ of exponential smoothing or the ‘credibility factor’ of credibility
theory. This gain matrix does not depend on the data and can be computed a
priori. If K(t) is ‘small’, little weight is attached to the current observations. The
matrix K(t) is in turn defined through R(t), X(t) and U(t). A ‘large’ U(t) will
result in a ‘small’ K(t) and, indeed, this accords with intuition: if y(t) is highly
variable then relatively little weight should be accorded to it in the estimation of
β(t). Moving on to the equation for R(t), a straightforward calculation indicates
that it corresponds to the covariance associated with estimating β(t) when y(t) is
not used. The matrix K(t)X(t)R(t) is then the value of the information afforded by
y(t) inasmuch as this term represents the difference between C(t) and R(t). Again
note that both C(t) and R(t) can be calculated without knowledge of the y(t).
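The effect of U(t) on the gain is easiest to see in the scalar case, where the standard gain formula reduces to K = RX / (X²R + U). The following sketch uses hypothetical values of R, X and U chosen only to show the direction of the effect; the function name is ours.

```python
def scalar_gain(R, X, U):
    """Kalman gain for a single observation and a single parameter."""
    return R * X / (X * R * X + U)


# The same prior uncertainty R, with increasingly noisy observations.
gains = [scalar_gain(R=1.0, X=1.0, U=U) for U in (0.1, 1.0, 10.0)]
```

The gains decrease as U grows, so a highly variable observation moves the estimate very little.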
3.8. Equivalent formulae to equations (3.6) and (3.7) which often facilitate
computations are given by:
$$C(t)^{-1} = R(t)^{-1} + X(t)'U(t)^{-1}X(t) \tag{3.8}$$
$$K(t) = C(t)X(t)'U(t)^{-1} \tag{3.9}$$
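The equivalence can be checked numerically. The sketch below assumes NumPy and compares the gain form, K = RX'(XRX' + U)^{-1} with C = R – KXR, against the inverse-covariance form, C^{-1} = R^{-1} + X'U^{-1}X with K = CX'U^{-1}. The matrices are arbitrary test values, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
R = A @ A.T + np.eye(3)          # arbitrary positive definite R(t)
X = rng.normal(size=(2, 3))      # arbitrary design matrix X(t)
U = np.diag([2.0, 0.5])          # arbitrary observation covariance U(t)

# Gain form: invert a matrix of the size of y(t).
K1 = R @ X.T @ np.linalg.inv(X @ R @ X.T + U)
C1 = R - K1 @ X @ R

# Inverse-covariance form: invert matrices of the size of beta(t) and U(t).
C2 = np.linalg.inv(np.linalg.inv(R) + X.T @ np.linalg.inv(U) @ X)
K2 = C2 @ X.T @ np.linalg.inv(U)

same = np.allclose(K1, K2) and np.allclose(C1, C2)
```

Which form is cheaper depends on the relative sizes of y(t) and β(t); when U(t) is diagonal its inverse is trivial and the second form may be convenient.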
3.9. The actual solution of the Kalman filter equations (3.3)–(3.5) and
(3.6)–(3.7) or (3.8)–(3.9) evidently presents few computational problems even for
microcomputers. We also note that the dimensionalities of the matrices used in
the computations are strictly related to the number of components of y(t) in a
given period and the number of parameters in β(t). A more substantive issue
surrounds the problem of having to initiate the recursion with initial estimates of
both the parameter vector and its covariance matrix. This question is pursued in
Appendix C.
4. A STATE-SPACE MODEL FOR THE RUNOFF TRIANGLE

4.1. This section develops a model for the runoff triangle which fits into the
state-space framework. Although a specific model is proposed we stress that this
formulation is only one of a number of possibilities. For example, Appendix E
indicates how the Chain Ladder model can be fitted into the state-space
framework.
4.2. Employing the notation introduced in § 2 we initially aim to express the
mean level f(t) of (2.3) in the observation equation format of the state-space
framework:
$$f(t) = X(t)\beta(t) \tag{4.1}$$
where X(t) is a known time-varying matrix, and β(t) is a vector of unknown
parameters. We consider the case corresponding to (2.1) where
$$f(t) = (m(t, 0), m(t-1, 1), \ldots, m(1, t-1))' \tag{4.2}$$
and leave to final extensions the modifications needed to accommodate (2.2).

4.3. Each entry in the vector on the right-hand side of (4.2) interpolates the
m(w, d) surface above a straight line connecting the points (t, 0) and (1, t – 1) in
the (w, d) plane. Fixing the argument w and considering m (w, d) as a function of
d yields a sequence of constants which is naturally called a delay or lag
distribution. Each year of origin w gives rise to such a distribution and in any one
year we sample one component from each of an array of such distributions.
4.4. In modelling these delay sequences we formalize the notion that they are
likely to be ‘smooth’ as functions of d. In particular, for fixed w, the graph of
m(w, d) as a function of d possibly first rises, then slowly ‘dies out’ with a possible
‘hump’ in the tail reflecting long delayed large settlements. Smooth behaviour of
this type can be modelled using relatively few known ‘basis functions’.
Concretely we suppose that for each w
$$m(w, d) = \sum_{j=1}^{p} \phi_j(d)\, b_j(w) \tag{4.3}$$
where the φ_j(d), j = 1, 2, . . . , p are p known functions in d and the b_j(w),
j = 1, 2, . . . , p are unknown parameters depending on the year of origin w. The
approach taken here is similar to that used in Econometric regression analysis
where lag distributions are modelled in terms of polynomials and the
modelling methodology is known as the ‘Almon lag’ technique (see Almon
(1965)). Examples of both φ_j(d) and b_j(w) are given in §5.
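4.5. In vector form relation (4.3) can be written as
$$m(w, d) = \phi(d)'\, b(w) \tag{4.4}$$
where φ(d) and b(w) are column vectors defined as follows:
$$\phi(d) = (\phi_1(d), \phi_2(d), \ldots, \phi_p(d))', \qquad b(w) = (b_1(w), b_2(w), \ldots, b_p(w))'$$
Substituting expressions of the form (4.4) into (4.2) yields
$$f(t) = \begin{pmatrix} \phi(0)'b(t) \\ \phi(1)'b(t-1) \\ \vdots \\ \phi(t-1)'b(1) \end{pmatrix} = \begin{pmatrix} \phi(0)' & 0 & \cdots & 0 \\ 0 & \phi(1)' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \phi(t-1)' \end{pmatrix} \begin{pmatrix} b(t) \\ b(t-1) \\ \vdots \\ b(1) \end{pmatrix} = X(t)\beta(t) \tag{4.5}$$
the final equality serving to define the matrix X(t) and the vector β(t). This is our
observation or measurement equation connecting observations in each period of
time to a set of unknown parameters. Note that given the basis functions φ_j(d),
the matrix X(t) varies with t and is indeed known.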
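The construction of X(t) can be sketched as follows, assuming NumPy. The row for delay d carries the p basis-function values for that delay, placed in the columns belonging to the parameters of origin year t – d, which gives a block diagonal matrix with t rows and pt columns. The two-function basis used here is hypothetical and chosen only so that the pattern is easy to read; the function names are ours.

```python
import numpy as np


def design_matrix(t, basis):
    """X(t): row d (0-based) holds basis(d) in the block of columns that
    belongs to b(t-d), so X(t) is t by p*t and block diagonal."""
    p = len(basis(0))
    X = np.zeros((t, p * t))
    for d in range(t):
        X[d, p * d:p * (d + 1)] = basis(d)
    return X


def toy_basis(d):
    """Hypothetical p = 2 basis, for illustration only."""
    return np.array([1.0, float(d)])


X3 = design_matrix(3, toy_basis)
```

Multiplying X(t) by the stacked vector of b(t), b(t – 1), . . . , b(1) then returns one mean level per delay, as in the observation equation.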
4.6. We now develop an appropriate state equation of the form (3.2)
connecting the parameters β(t) to β(t – 1). The basic idea is, again, to appeal to
smoothness conditions but now in a slightly different guise. In particular we
return to the m(w, d) and consider the sequences formed by fixing the delay
parameter d and varying the year of origin w. For any d we can write
$$m(w, d) = E[\,m(w, d) \mid m(w-1, d), m(w-2, d), \ldots\,] + \eta(w, d) \tag{4.6}$$
where the first term on the right is a conditional expectation and η(w, d) is a zero
mean error term. Assuming the conditional expectation is a polynomial in w of
order q – 1 passing through the conditioning variables
m(w – 1, d), m(w – 2, d), . . . , m(w – q, d), leads to the specification (see Appendix B)
$$m(w, d) = \sum_{k=1}^{q} a(k)\, m(w-k, d) + \eta(w, d) \tag{4.7}$$
where
$$a(k) = (-1)^{k+1} \binom{q}{k}, \qquad k = 1, 2, \ldots, q \tag{4.8}$$
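The extrapolation weights can be checked with a few lines of code. The sketch assumes the weights take the form a(k) = (–1)^(k+1) C(q, k), which follows from requiring the q-th difference of a polynomial of order q – 1 to vanish and which agrees with the values a(1) = 1 for q = 1 and a(1) = 2, a(2) = –1 for q = 2 quoted in §5. The function name is ours.

```python
from math import comb


def extrapolation_weights(q):
    """Weights a(1), ..., a(q): the polynomial of order q-1 through the q
    previous values, evaluated one step ahead, equals sum_k a(k) m(w-k)."""
    return [(-1) ** (k + 1) * comb(q, k) for k in range(1, q + 1)]


# Check on a quadratic sequence: q = 3 reproduces the next value exactly.
m = [w * w for w in range(1, 6)]               # 1, 4, 9, 16, 25
a = extrapolation_weights(3)
next_value = sum(a[k] * m[-1 - k] for k in range(3))
```

For the quadratic sequence above the extrapolated value is the next square, since a quadratic has a vanishing third difference.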
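4.7. Substituting relation (4.4) into both the left- and right-hand side of (4.7)
for d = d_1, d_2, . . . shows
$$\Phi\, b(w) = \sum_{k=1}^{q} a(k)\, \Phi\, b(w-k) + v(w) \tag{4.9}$$
where the matrix Φ and vector v(w) are defined as
$$\Phi = \begin{pmatrix} \phi(d_1)' \\ \phi(d_2)' \\ \vdots \end{pmatrix}, \qquad v(w) = \begin{pmatrix} \eta(w, d_1) \\ \eta(w, d_2) \\ \vdots \end{pmatrix}$$
4.8. If we regard the b(w) as the basic random variables in the model then, as is
verified in Appendix B, v(w) has a covariance matrix of rank at most p. This rank
is exactly p if Φ is invertible. This shows that equation (4.7) can be independently
argued for at most p delays d_1, d_2, . . . , d_p. Assuming this, premultiplication of
both the left and right side of (4.9) by the inverse of Φ yields
$$b(w) = \sum_{k=1}^{q} a(k)\, b(w-k) + \Phi^{-1} v(w) \tag{4.10}$$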
4.9. We have thus devised a widely applicable argument which formalizes the
idea of the gradual adaption of delay sequences over time and translates into a
concrete description of the manner of evolution of the b(w) parameters. From
(4.10) it follows that
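$$\begin{pmatrix} b(t) \\ b(t-1) \\ b(t-2) \\ \vdots \\ b(1) \end{pmatrix} = \begin{pmatrix} a(1)I & a(2)I & \cdots & a(q)I & 0 & \cdots & 0 \\ I & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & I & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & & & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & I \end{pmatrix} \begin{pmatrix} b(t-1) \\ b(t-2) \\ \vdots \\ b(1) \end{pmatrix} + \begin{pmatrix} \Phi^{-1} \\ 0 \\ \vdots \\ 0 \end{pmatrix} v(t) \tag{4.11}$$
where I and 0 are respectively identity and zero matrices of order p. Identifying
H(t) with the first matrix on the right and G(t) with the matrix which multiplies
v(t), then we can rewrite (4.11) as
$$\beta(t) = H(t)\beta(t-1) + G(t)v(t)$$
which is precisely the state equation (3.2). Note that both H(t) and G(t) are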
known.
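The structure of H(t) and G(t) can be sketched in code, assuming NumPy and assuming t – 1 ≥ q. The top block row of H(t) carries the extrapolation weights, the identity beneath it carries the earlier parameters forward unchanged, and G(t) lets the new error enter the newest parameter block only. The function name, the argument names and the placeholder `Phi_inv` for the matrix that multiplies the error in the top block are ours.

```python
import numpy as np


def transition_matrices(t, p, a, Phi_inv):
    """H(t) (p*t by p*(t-1)) and G(t) (p*t by p) of the state equation,
    for t - 1 >= q, where a = [a(1), ..., a(q)]."""
    q = len(a)
    n_prev = p * (t - 1)
    H = np.zeros((p * t, n_prev))
    for k in range(q):                          # top block row: a(k) I
        H[:p, p * k:p * (k + 1)] = a[k] * np.eye(p)
    H[p:, :] = np.eye(n_prev)                   # old parameters carried forward
    G = np.zeros((p * t, p))
    G[:p, :] = Phi_inv                          # error enters b(t) only
    return H, G


# Random-walk case with one parameter per origin year (q = 1, p = 1).
H3, G3 = transition_matrices(t=3, p=1, a=[1.0], Phi_inv=np.eye(1))
# Linear extrapolation (q = 2) for comparison.
H3_lin, _ = transition_matrices(t=3, p=1, a=[2.0, -1.0], Phi_inv=np.eye(1))
```

For the random-walk case the non-zero entries of H(3) sit in positions (1, 1), (2, 1) and (3, 2), which is the pattern described for the example of §5.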
4.10. With the main framework in place we turn to the volume and inflation
extensions incorporated in (2.2). Analogous to the development above, we
assume m(w, d) = φ(d)'b(w) and tracing the argument that led to (4.5) we see
that this alternate formulation can be accommodated in the main framework by
a slight alteration in the definition of X(t). In particular the jth row of X(t) as
defined in (4.5) is multiplied by n(t – j + 1) λ(t) for j = 1, 2, . . . , t. Turning to the
state equation, the polynomial extrapolation argument culminating in (4.11) is
now applied (without loss of conviction) to the deflated means m(w, d) to yield
precisely the same state equation.
4.11. In retrospect it is clear that a variety of other formulations and
generalizations can be aimed for. We do not pursue this and for the rest of the
paper content ourselves with extracting the ramifications of the above model.
Amongst issues left hanging in the air is the specification of the covariance
matrices U(t) and V(t). This problem is addressed in Appendix C where it is seen
to fit naturally with the problem of initiating the Kalman algorithm.
5. A SIMPLE EXAMPLE
5.1. This section illustrates the state-space technique and the Kalman filter
with an illustrative and simple example. The data is taken from Benjamin (1977)
and relates to the experience of a United Kingdom general insurer for the years
1970–74 inclusive. The runoff triangle is displayed in Table 1. From the data it is
apparent that the triangle is well behaved and accordingly a very simple model
should suffice.
Table 1. Runoff triangle for U.K. general insurer*

Origin    Inflation   Volume                    Delay d
year w    index       index          0        1        2        3        4
1          ·598       1·000      753·5    648·9    311·7    173·5     71·3
2          ·665        ·899      642·3    648·4    249·7    206·5
3          ·748        ·858      715·8    661·1    309·4
4          ·853        ·863      841·6    862·6
5         1·000        ·813      968·8
* Amounts in £1,000.
5.2. We invoke the setup of §2.10 and assume that the amount paid out in a
given year t with respect to a particular year of origin w is proportional to the
volume of business transacted in the year w of origin and the inflation index in the
year t of payment. The tabulated volume index is proportional to the total
number of claims at the end of the year of origin as it is only these figures which
are reported by Benjamin. The inflation index is also taken from Benjamin and
reflects a general cost of living index rather than an index peculiar to claims.
5.3. In terms of the notation introduced in §2.11 the successive y(t)
observation vectors are given by
y(1) = (753·5)
y(2) = (642·3, 648·9)'
y(3) = (715·8, 648·4, 311·7)'
y(4) = (841·6, 661·1, 249·6, 173·5)'
y(5) = (968·8, 862·6, 309·4, 206·5, 71·3)'
these being the successive diagonals of the runoff triangle.
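The passage from the rows of the runoff triangle to its diagonals can be sketched as follows. The amounts are those of Table 1; the list layout and the function name `diagonal` are ours.

```python
# Rows of Table 1 (amounts in £1,000): row w-1 holds origin year w, column d the delay.
triangle = [
    [753.5, 648.9, 311.7, 173.5, 71.3],
    [642.3, 648.4, 249.7, 206.5],
    [715.8, 661.1, 309.4],
    [841.6, 862.6],
    [968.8],
]


def diagonal(triangle, t):
    """y(t): payments made in calendar year t, ordered by increasing delay d,
    so that component d comes from origin year w = t - d."""
    return [triangle[t - d - 1][d] for d in range(t)]


y = {t: diagonal(triangle, t) for t in range(1, 6)}
```

Each y(t) has t components, one more than its predecessor, which is why the dimensions of the model matrices grow with t.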
5.4. To model the deflated means m(w, d) for each origin year w we use a
single basis function
$$\phi(d) = (d + 1)\exp(-d) \tag{5.1}$$
which implies that for each year of origin, claims ‘die out’ monotonically and
eventually exponentially as the delay d increases, i.e. as we move along a given
row of the runoff triangle. Accordingly,
$$m(w, d) = b(w)\,(d + 1)\exp(-d) \tag{5.2}$$
with the single parameter b(w) controlling the ‘level’ of the exponentially decaying
delay sequence for each year of origin w.
5.5. The first observation y(1) has deflated mean m(1, 0) which in turn is
parametrized in terms of b(1). The second observation vector y(2) involves the
mean levels m (2, 0) and m (1, 1) which are respectively parametrized in terms of
b(2) and b(1). In general the t-th observation vector y(t) relates to the mean levels
m(t, 0), m(t – 1, 1), . . . ,m (1, t – 1) which are respectively parametrized in terms
of b(t), b(t – 1), . . . , b(1). Collectively these latter parameters define the vector
β(t) or ‘state’ at time t. Thus the state at time t is simply the collection of all
unknown b(w) parameters involved in the observations made at time t.
5.6. Under the current specification, the X(t) matrices take particularly simple
forms. In general the jth component of y(t) involves the single parameter b(t – j + 1)
and hence X(t) is diagonal with jth diagonal entry n(t – j + 1) λ(t) j exp(–(j – 1)).
5.7. To connect parameters b(w) for different years of origin w, we use the
simple ‘constant’ or ‘random walk’ model:
$$b(w) = b(w-1) + v(w) \tag{5.3}$$
where v(w) is a zero mean error term. This prescription implies that the
conditional mean of m(w, d) given m(w – 1, d), m(w – 2, d), . . . is m(w – 1, d)
for each d. To arrive at the transition equation for the state β(t) corresponding to
(5.3) we note that q as defined in paragraph 4.6 is 1 and a(1) = 1. Hence H(t) has
zeros everywhere except in positions (1, 1), (2, 1), (3,2), . . . , (t, t– 1) where it is
unity. Finally, the matrix G(t) is t by 1 whose only non-zero entry is (1, 1) given
by 1/exp (–d) = exp (d), d corresponding to the explicitly constrained delay.
Without loss of generality we set d = 0.
5.8. Using maximum likelihood methods to be outlined in Appendix C the
initial state β(0) = b(0) corresponding to the unobserved origin year 0 was
estimated to be 1146·3 with an estimate of the associated covariance matrix C(0)
equal to 2667·3. These served as starting values for the Kalman filter
algorithm. Each of the successive observation vectors y(t) was then used to
update the estimate of the state β(t) using the equations (3.1)–(3.5). These
equations have as input the covariance matrices V(t) and U(t). Using methods
again outlined in Appendix C, the 1 by 1 V(t) matrix was estimated to be 626·3
while the t by t matrix U(t) was assumed to be diagonal with estimated diagonal
entries given by the first t values of
16283·6 12204·8 4509·3 2755·7 876·8.
In terms of the runoff table these entries are the assumed variances associated
with observations corresponding to delays 0, 1, . . . , t – 1.
5.9. Table 2 displays the successive state estimates as generated by the Kalman
filter algorithm using the above data. The results confirm the regular nature of
the table and the appropriateness of the ‘constant’ transition model for the b(w).
5.10. The reported standard errors are the square roots of the diagonal entries
of successively generated C(t) matrices. As expected, these standard errors
reduce with time, i.e. as more data comes to hand.
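The calculation can be sketched end to end, assuming NumPy. The data are those of Table 1 and the starting values and variances are those quoted in 5.8. Two points are assumptions of this sketch rather than statements of the paper: the basis function is taken to be (d + 1) exp(–d), the form mentioned in §6.1, and the recursion is written in the standard gain form. Variable names are ours.

```python
import numpy as np

triangle = [[753.5, 648.9, 311.7, 173.5, 71.3],
            [642.3, 648.4, 249.7, 206.5],
            [715.8, 661.1, 309.4],
            [841.6, 862.6],
            [968.8]]
price = [0.598, 0.665, 0.748, 0.853, 1.000]        # inflation index, calendar years 1..5
volume = [1.000, 0.899, 0.858, 0.863, 0.813]       # volume index, origin years 1..5
u_var = [16283.6, 12204.8, 4509.3, 2755.7, 876.8]  # observation variances, delays 0..4
v_var = 626.3                                      # variance of v(w)
beta = np.array([1146.3])                          # starting estimate of b(0)
C = np.array([[2667.3]])                           # its variance C(0)


def phi(d):
    """Basis function, ASSUMED here to be (d + 1) exp(-d)."""
    return (d + 1) * np.exp(-d)


estimates = {}
for t in range(1, 6):
    y = np.array([triangle[t - d - 1][d] for d in range(t)])
    # Diagonal X(t): entry for delay d is n(t-d) * price(t) * phi(d).
    X = np.diag([volume[t - d - 1] * price[t - 1] * phi(d) for d in range(t)])
    U = np.diag(u_var[:t])
    # Random walk: b(t) = b(t-1) + v(t); earlier b(w) are carried forward.
    if t == 1:
        H = np.ones((1, 1))                        # b(0) is replaced by b(1)
    else:
        H = np.vstack([np.eye(t - 1)[:1], np.eye(t - 1)])
    G = np.zeros((t, 1))
    G[0, 0] = 1.0
    beta_pred = H @ beta
    R = H @ C @ H.T + v_var * (G @ G.T)
    S = X @ R @ X.T + U
    K = np.linalg.solve(S, X @ R).T
    beta = beta_pred + K @ (y - X @ beta_pred)
    C = R - K @ X @ R
    # State ordered as b(t), b(t-1), ..., b(1), with standard errors.
    estimates[t] = (beta.copy(), np.sqrt(np.diag(C)))
```

Under these assumptions the first step gives an estimate of about 1154·0 for b(1) with a standard error of about 55·4, in agreement with the first column of Table 2. Later columns can be compared with the table in the same way, bearing in mind that the inputs are quoted to limited precision.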
5.11. To further explicate matters we now consider more complicated
formulations without pursuing the calculations. Instead of (5.1) one could
assume that each delay distribution is modelled by p = 2 basis functions. For
example corresponding to (5.2)
(5.4)
in which case the delay sequence can initially rise but then monotonically declines
to zero. In general, the more basis functions, the greater the diversity of
behaviour that can be encompassed in the model. In practice, one searches for the
simplest model that appears to provide an adequate fit. The reason for the
exponential terms in both (5.1) and (5.4) is that these ensure the eventual decline
to zero of the delay distribution.
5.12. Corresponding to (5.4), the X(t) matrix in each period t is block diagonal
with blocks given by the row vectors
multiplied by the appropriate inflation and volume index if these are incorporated
in the formulation. The state at time t thus consists of 2t parameters b_1(t),
b_2(t), b_1(t – 1), . . . , b_1(1), b_2(1), consecutive sets of parameters relating to
origin years t, t – 1, . . . , 1.
Table 2. Estimated states*

Component of β(t)                         Time t
corresponding to             1         2         3         4         5
b(5)                                                            1147·3
                                                                (53·7)
b(4)                                                  1141·2     114·2
                                                      (55·9)    (48·9)
b(3)                                        1156·4    1141·1    1140·6
                                            (57·2)    (51·2)    (45·3)
b(2)                              1157·5    1157·1    1140·2    1138·5
                                  (57·2)    (52·3)    (47·3)    (42·8)
b(1)                    1154·0    1158·6    1156·9    1142·5    1139·4
                        (55·4)    (52·2)    (48·2)    (44·4)    (41·2)
* Standard errors in brackets.

5.13. Instead of the simple ‘constant’ scheme embodied in (5.3) one could
posit a linear or quadratic evolution mechanism for the b(w) coefficients. These
two formulations lead respectively to the following transition equations
$$b(w) = 2b(w-1) - b(w-2) + v(w)$$
$$b(w) = 3b(w-1) - 3b(w-2) + b(w-3) + v(w)$$
The state equation (4.11) now readily follows and, for example, for the first of
these schemes has a(1) = 2, a(2) = –1. A slight complication arises for the initial
t = 1 state equation since the formulation (4.11) implicitly assumes t – 1 ≥ q, as
evidenced by the terminating zero matrices in the top row block of H(t). If
t – 1 < q then the state β(t – 1) will be defined in terms of b(t – 1),
b(t – 2), . . . , b(t – q) to reflect all the parameters that go on to determine β(t).
6. GENERAL COMMENTS
6.1. This section highlights features of the state-space approach not immediately
apparent from either the equations or the example. We begin by noting that
the state-space equations (3.1) and (3.2) are statements, at each point of time t, of
the following kind:
(a) The relationship of the current observations to the unknown current
parameters.
(b) The relationship of the current unknown parameters to the previous period
unknown parameters.
The state-space framework requires both sets of relationships to be linear in the
parameters. However, non-linearities in the variables or in the evolution
phenomena are not excluded, as evidenced by our formulations using the
functional forms (d + 1) exp (–d) and the polynomial extrapolation arguments.
6.2. The relationships (a) and (b) are not constrained to be time-invariant. At
a basic level this was seen to operate in §4 where the increasing dimension runoff
diagonals forced a time varying framework. At a more sophisticated level, the
setup permits dynamic changes in the parametrization. This feature stresses the
flexibility of the framework and is an important advantage in a changing or
evolving environment. For example, a sudden change in the factors impinging on
the claims process, such as legislation or company policy, may render the existing
parametrization inappropriate. Accordingly, new parameters are introduced
and subsequent observation vectors are parametrized in terms of the new
parameters to yield a different measurement equation (3.1). The link (3.2)
between the new and old parameters is then spelled out and the Kalman
algorithm is invoked to yield estimates of the latest parameters. The model is thus
adaptable and this emphasizes the fact that the state-space approach constitutes
a relatively broad framework for analysis as opposed to a fixed or rigid method of
attack.
6.3. Further evidence of flexibility is the fact that the framework can
assimilate varied subjective input. Subjective information can enter the framework
at two levels. The first is the model formulation stage. Here decisions are
made regarding relevant parametrizations and the incorporation of ancillary
information such as inflation or volume indicators. The subjective import at this
level is comparable with that required in other statistical models. However, as
just noted, and in contrast to most frameworks of analysis, decisions need not be
made once and for all, but can be constantly adapted to reflect changing needs
and circumstances.
6.4. As stressed by Harrison and Stevens (1976), a second level of possible
subjective input is the specification of the covariances U(t) and V(t) of (3.1) and
(3.2), or even the initializing values of β(t) and C(t) in any particular iterate of the
Kalman algorithm. Subjective input is important at this level because in many
circumstances, subjective information is as important as the objective claims
record. For example, times of uncertainty or rapid change point to weak
connections between the current parameters and those of the previous period. A
formal manner of quantifying such weak connections is to specify a ‘large’ value
for V(t). A comparable step is to directly adjust β(t) or C(t) to reflect subjective
information regarding β(t). In this same vein, a particularly unusual year of
claims experience can be subjectively discounted by specifying an associated
‘large’ U(t). The scope for ingenuity is immense. It is remarkable that the results
of this ingenuity can be assimilated in the same framework that simultaneously
integrates the objective claims experience.
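The effect of a ‘large’ V(t) or a ‘large’ U(t) described in paragraph 6.4 can be illustrated with a minimal sketch. It assumes a scalar state and the standard gain form of the Kalman update; the function name kalman_step and all numerical values are hypothetical and are not taken from the paper.

```python
def kalman_step(beta_prev, c_prev, y, x=1.0, h=1.0, u_var=1.0, v_var=0.0):
    """One scalar Kalman update (assumed standard form).

    beta_prev, c_prev : previous state estimate and its variance
    y, x              : current observation and its regressor
    h                 : state transition coefficient
    u_var, v_var      : measurement and state error variances, U(t) and V(t)
    Returns (beta, c, gain).
    """
    beta_pred = h * beta_prev              # forecast of the current state
    c_pred = h * c_prev * h + v_var        # variance of that forecast
    gain = c_pred * x / (x * c_pred * x + u_var)
    beta = beta_pred + gain * (y - x * beta_pred)
    c = (1.0 - gain * x) * c_pred
    return beta, c, gain


# Same prior (100, variance 4) and same observation (130) in each case.
base = kalman_step(100.0, 4.0, 130.0, u_var=4.0, v_var=0.0)
# Weak link to the previous period: large V(t), so the new observation dominates.
loose_link = kalman_step(100.0, 4.0, 130.0, u_var=4.0, v_var=400.0)
# Unusual year: large U(t), so the observation is heavily discounted.
odd_year = kalman_step(100.0, 4.0, 130.0, u_var=400.0, v_var=0.0)

for label, (beta, c, gain) in [("base", base), ("large V", loose_link), ("large U", odd_year)]:
    print(f"{label:8s} estimate={beta:8.3f} variance={c:7.3f} gain={gain:.4f}")
```

In this sketch the gain moves towards one as V(t) grows and towards zero as U(t) grows, which is one way of reading the subjective adjustments described above.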
6.5. Two final features deserve attention. Firstly, suppose β(t) = β(t – 1) for
each t and hence that V(t) is identically zero. In this case each y(t) relates to the
same set of parameters and accordingly the state-space framework reduces to the
usual linear regression model. Secondly, we note that the Kalman algorithm is
tailored to the ‘real-time’ monitoring of a process: there is no need to carry along
all previous data and, as additional information comes to hand, estimates are
updated. The framework is thus eminently suited to environments with large
bodies of data where there is a continual need for immediate and up-to-date
information.
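The first remark of paragraph 6.5 can be checked numerically. The sketch below assumes a scalar state, H(t) = 1 and V(t) = 0, and compares the recursive filter with the corresponding batch regression that treats the starting value as prior information. The function names and the data are hypothetical.

```python
def filter_constant_state(ys, xs, beta0, c0, u_var):
    """Recursive estimate when the state never changes (V(t) = 0, H(t) = 1)."""
    beta, c = beta0, c0
    for y, x in zip(ys, xs):
        gain = c * x / (x * c * x + u_var)   # no state variance is added
        beta = beta + gain * (y - x * beta)
        c = (1.0 - gain * x) * c
    return beta, c


def batch_regression(ys, xs, beta0, c0, u_var):
    """Single-pass weighted regression using the same prior information."""
    precision = 1.0 / c0 + sum(x * x for x in xs) / u_var
    weighted = beta0 / c0 + sum(x * y for x, y in zip(xs, ys)) / u_var
    return weighted / precision, 1.0 / precision


ys = [2.1, 3.9, 6.2, 7.8]
xs = [1.0, 2.0, 3.0, 4.0]
recursive = filter_constant_state(ys, xs, beta0=0.0, c0=1e4, u_var=1.0)
batch = batch_regression(ys, xs, beta0=0.0, c0=1e4, u_var=1.0)
# Ordinary least squares through the origin, for comparison with a vague prior.
ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(recursive, batch, ols)
```

With a vague starting variance the recursive estimate is close to ordinary least squares, and only the latest estimate and variance need to be stored between updates.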
7. FORECASTING FUTURE PAYMENTS

7.1. We turn to the mechanics of forecasting the payment incidence in future
years. An important distinction divides all such future payments into those which
have already been logically incurred and the complement. The former group
defines the outstanding claims and is usually of primary interest. We focus on this
group of payments but note that the Kalman approach can also be used to
forecast the complement. In terms of the runoff table, the aim of forecasting the
group of payments of current interest consists of filling out the cells lying to the
right of the last observed diagonal. Throughout this section we assume the model
(2.2) containing both volume and inflation indices.
7.2. Matters are greatly simplified by thinking of all these unobserved
components in the runoff table as the hypothetical next vector of observations. In
particular, suppose s is the last period of observation and we think of y(s + 1) as
the column vector consisting of all unobserved entries stacked in some arbitrary
but fixed manner. All the elements in this vector associate with origin years
1, 2, . . . , s and accordingly have means parametrized by one of b(1), b(2), . . . ,
b(s), these being the vectors parametrizing the various delay sequences. Hence the
state β(s + 1) associated with y(s + 1) is given by
$$\beta(s+1) = \{b'(1), b'(2), \ldots, b'(s)\}' = \beta(s).$$
In other words, with this conceptual device, the hypothetical next state is equal to
the current state, which shows that H (s+ 1) and V (s+ 1) are respectively the
identity and zero matrix.
7.3. The measurement equation (3.1) under this conceptualization is easily
determined from simple book-keeping operations keeping track of the location
of the unobserved runoff entries in the y(s + 1) vector. For example, if the
unobserved entry in row w, column d of the runoff table occupies the jth position
of y(s + 1) then the jth row of X(s + 1) is all zeros except for positions (w – 1)p + 1
through to wp where it is
7.4. The actual forecast of y(s + 1) now follows from the Kalman filter
equation (3.3) and is equal to
$$\hat{y}(s+1) = X(s+1)\,\hat{\beta}(s) \qquad (7.1)$$
with the covariance matrix of the error of prediction of ŷ(s + 1) given by
$$X(s+1)\,C(s)\,X'(s+1). \qquad (7.2)$$
Both β̂(s) and its covariance matrix C(s) are known at the current time point s.
On the other hand the matrix X(s + 1) is not known since it involves the unknown
inflation index at future points of time t > s. This uncertainty leads to
modifications to both (7.1) and (7.2) in order to make the forecast operational.
7.5. Suppose that the inflation index for future years follows a random process
with known mean and covariance structure, independent of the error terms u(t)
and v(t) in (3.1) and (3.2), and that X(s + 1) in (7.1) and (7.2) incorporates the
expected inflation index in the future years. Then from results quoted in Harrison
and Stevens (1976), the prediction formula (7.1) yields the optimum predictor of
y(s + 1) but the covariance matrix of the prediction error is now given by (7.2) plus
a matrix Z reflecting the uncertainty associated with X(s + 1). As derived in
Appendix B, the matrix Z in the current context is defined as follows: the entry
corresponding to the covariance between the components row w, delay d and row
r, delay e of the runoff table is equal to
(7.3)
where Cov denotes covariance, β̂(w) and β̂(r) are the subvectors of β̂(s)
corresponding to b(w) and b(r), and Cov{β̂(w), β̂(r)} is the appropriate
submatrix of C(s). All terms here are known except for the covariances involving
the λ(t). Again we leave to Appendix B the demonstration that under the
assumption of future percentage changes in the inflation index being independent
with mean µ and variance σ², then for t' ≥ t > s
(7.4)
7.6. Given the covariances of all future forecasts, it is a straightforward matter
to derive variances and covariances associated with subtotals of future payments.
In particular the forecast payment due in year t > s is the sum of all elements lying
on the future diagonal corresponding to t. These sums can be expressed in matrix
terms as Aŷ(s + 1) where A is a matrix of zeros and ones. The covariance matrix
associated with this vector of sums is then
$$A\{X(s+1)\,C(s)\,X'(s+1) + Z\}A' \qquad (7.5)$$
where Z has entries (7.3). A similar argument is applied to derive the variance
associated with the total present value of all future liabilities.
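The matrix steps of paragraphs 7.4 to 7.6 can be sketched in a few lines. The example below assumes that the forecast is X(s + 1)β̂(s), that its covariance as an estimate of the mean level is X(s + 1)C(s)X'(s + 1), and that an optional matrix Z is simply added. All matrices, names and numbers are hypothetical toy values, not the paper's data.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def forecast(x_next, beta_hat, c, z=None):
    """Forecast vector and covariance X C X' (+ Z when supplied)."""
    mean = [sum(xi * bi for xi, bi in zip(row, beta_hat)) for row in x_next]
    cov = matmul(matmul(x_next, c), transpose(x_next))
    if z is not None:
        cov = [[cov[i][j] + z[i][j] for j in range(len(cov))]
               for i in range(len(cov))]
    return mean, cov


def subtotals(a, mean, cov):
    """Sums A y and their covariance A cov A' for a 0/1 matrix A."""
    sums = [sum(ai * mi for ai, mi in zip(row, mean)) for row in a]
    return sums, matmul(matmul(a, cov), transpose(a))


# Toy inputs: three unobserved cells, two state components.
x_next = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]]
beta_hat = [10.0, 20.0]
c = [[4.0, 1.0], [1.0, 9.0]]
mean, cov = forecast(x_next, beta_hat, c)

# Cells 1 and 3 fall due in the first future year, cell 2 in the second.
a = [[1, 0, 1], [0, 1, 0]]
sums, cov_sums = subtotals(a, mean, cov)
total_variance = sum(sum(row) for row in cov_sums)
print(mean, sums, cov_sums, total_variance)
```

Taking A as a single row of ones gives the variance of the grand total directly, which is the same number as the sum of all entries of the covariance matrix of the yearly sums.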
7.7. The example of §5 is now used to illustrate the forecasting procedures.
Suppose we aim to forecast all future payments associated with origin years w = 1
to w = 5, and up to a maximum delay of d = 9. We think of y(s + 1) as all the
associated unobserved entries ordered in increasing t and increasing d,
the latter index moving the most rapidly. Thus y(s + 1) contains 35 entries, the
first entry corresponding to origin year, delay combination (5, 1), and the 35th to
cell (5,9). The X(s+ 1) matrix is thus 35 by 5, successive blocks of rows forming
diagonal matrices.
Table 3. Projected future payments

Origin                                    Delay d
year w                1        2        3        4        5        6        7        8        9
1                                                      55·3     28·5     14·4      7·1      3·5
2                                            112·5     59·6     30·7     15·5      7·7      3·8
3                                   234·0    129·1     68·4     35·2     17·8      8·8      4·3
4                          481·9    283·6    156·5     82·9     42·7     21·5     10·7      5·2
5                 823·5    545·3    321·0    177·1     93·8     48·3     24·4     12·1      5·9
Yearly totals   1,707·1  1,046·1    590·9    317·9    165·5     82·5     39·4     17·4      5·9
  s.e. (1)*      (70·0)   (43·4)   (24·6)   (13·3)    (7·0)    (3·5)    (1·7)    (0·8)    (0·3)
  s.e. (2)*     (158·0)  (130·4)   (88·7)   (54·7)   (31·7)   (17·3)    (8·9)    (4·2)    (1·5)
Grand total*    3,972·8  (164·5)  (430·5)
* Standard errors in brackets (see text).
Yearly totals are sums over future diagonals, listed from left to right for payment years s + 1, . . . , s + 9.

7.8. The projected figures decline smoothly to zero with increasing delay, as
expected in view of the parametrization (d + 1) exp(–d). The overall forecast is
consistent with the figures given by Benjamin (1977), again not surprising in view
of the regular nature of the triangle. The first reported standard error
corresponds to the first term in (7.5), and hence makes no allowance for the
uncertainty associated with future inflation levels. The second standard error
corresponds to the whole of (7.5) under the assumption that future percentage
changes in the price level are independent with mean 20% and standard deviation
10%. Under normal probabilities this translates to a 95% confidence statement
that future percentage changes lie in the interval 0% to 40%. Both standard errors
reported in the table correspond to that of ŷ(s + 1) considered as an estimate of
the mean level of claims X(s + 1)β(s). The actual level of future claims
incorporates the error term u(t) of (3.1) and allowance for this source of variation
further increases the estimated standard error, the increase depending on the
precise assumptions regarding deviations in future claim amounts around their
mean level.
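The book-keeping of paragraphs 7.6 and 7.7 can be sketched using the point forecasts printed in Table 3. The code below orders the 35 unobserved cells by payment year t = w + d and then by delay d, builds a zero-one matrix A with one row per future payment year, and recomputes the yearly totals. The variable names are hypothetical, and the identification t = w + d with s = 5 is an assumption consistent with the ordering described in paragraph 7.7.

```python
S = 5            # last observed period (assumed from the five origin years)
MAX_DELAY = 9

# Point forecasts from Table 3, keyed by origin year w, listed by increasing delay.
TABLE3 = {
    1: [55.3, 28.5, 14.4, 7.1, 3.5],
    2: [112.5, 59.6, 30.7, 15.5, 7.7, 3.8],
    3: [234.0, 129.1, 68.4, 35.2, 17.8, 8.8, 4.3],
    4: [481.9, 283.6, 156.5, 82.9, 42.7, 21.5, 10.7, 5.2],
    5: [823.5, 545.3, 321.0, 177.1, 93.8, 48.3, 24.4, 12.1, 5.9],
}

# Unobserved cells (w, d): for origin year w the first unobserved delay is S - w + 1.
cells = sorted(
    ((w, d) for w in range(1, S + 1) for d in range(S - w + 1, MAX_DELAY + 1)),
    key=lambda wd: (wd[0] + wd[1], wd[1]),   # increasing t, then increasing d
)

# Stack the forecasts in the same order as the cells.
y_next = [TABLE3[w][d - (S - w + 1)] for w, d in cells]

# A has one row per future payment year and a one wherever the cell falls in that year.
years = sorted({w + d for w, d in cells})
A = [[1 if w + d == t else 0 for w, d in cells] for t in years]

yearly_totals = [sum(a * v for a, v in zip(row, y_next)) for row in A]
grand_total = sum(yearly_totals)

for t, total in zip(years, yearly_totals):
    print(t, round(total, 1))
print("grand total", round(grand_total, 1))
```

The recomputed totals agree with the printed yearly totals to within rounding of the individual cells.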
REFERENCES
ALMON, S. (1965). The distributed lag between capital appropriations and expenditures.
Econometrica 33, 178.
BENJAMIN, B. & HAYCOCKS, H. W. (1970). The analysis of mortality and other actuarial statistics.
Cambridge University Press, Cambridge.
BENJAMIN, B. (1977). General insurance. Heinemann, London.
BOX, G. E. P. & JENKINS, G. M. (1970). Time series analysis: Forecasting and control. Holden-Day,
San Francisco.
DUNCAN, D. B. & HORN, S. D. (1972). Linear dynamic recursive estimation from the viewpoint of
regression analysis. Journal of the American Statistical Association 67, 815.
FELDSTEIN, M. S. (1971). The error of forecasting econometric models when the forecast-period
exogenous variables are stochastic. Econometrica 39, 55.
HANNAN, E. J. (1970). Multiple time series. Wiley, New York.
HARRISON, P. J. & STEVENS, C. F. (1976). Bayesian forecasting. J. Royal Statistical Soc. (B) 38, 205.
HARVILLE, D. A. (1977). Maximum likelihood approaches to variance component estimation and to
related problems. Journal of the American Statistical Association 72, 320.
JAZWINSKI, A. H. (1970). Stochastic processes and filtering theory. Academic Press, New York.
KALMAN, R. E. (1960). A new approach to linear filtering and prediction problems. Trans. Amer. Soc.
Mech. Eng., J. Basic Engineering 82, 35.
KRAMREITER, A. & STRAUB, E. (1973). On the calculation of IBNR-Reserves. Swiss actuarial journal
73, 177.
LINNEMAN, P. (1980). A multiplicative model of loss reserves: ‘The stochastic process approach’.
Research Report No. 32, Laboratory of Actuarial Mathematics, Copenhagen.
MEHRA, R. K. (1975). Credibility theory and Kalman filtering with extensions. International institute
for applied systems analysis research memorandum RM-75-64. Schloss Laxenburg, Austria.
REID, D. H. (1978). Claim reserves in general insurance (with discussion). J. Inst. of Actuaries 105,
211.
RAO, C. R. (1973). Linear statistical inference and its applications (2nd ed). Wiley, New York.
STRAUB, E. (1972). On the calculation of IBNR-Reserves. IBNR: The prize winning papers in the
Boleslaw Monic fund competition held in 1971. Nederlandse reassurantie groep N.V., Amsterdam.
TAYLOR, G. C. (1977). Separation of inflation and other effects from the distribution of non-life
insurance claim delays. Astin bulletin 9, 219.
TAYLOR, G. C. (1980). Application of actuarial techniques to non-life insurance establishment of
provisions for outstanding claims. Seminar on the application of Mathematics in Industry
organized by DMS, CSIRO and Pure and Applied Mathematics, ANU.
VAN EEGHEN, J. (1981). Loss reserving methods. Surveys of actuarial studies nr. 1.
Nationale-Nederlanden N.V., The Netherlands.
APPENDIX A
MATHEMATICAL PROOFS ASSOCIATED WITH § 4
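A.1. We first elaborate on the argument leading up to (4.7). Consider a set of
equally spaced points x_1, x_2, . . . where, for each w, x_w lies on a polynomial of
degree less than q. This implies
$$\Delta^q x_w = 0 \qquad (A.1)$$
where Δ is the backward differencing operator, Δx_w = x_w – x_{w–1}. We can write
Δ = I – B where Bx_w = x_{w–1}. Now using the binomial expansion
$$\Delta^q = (I - B)^q = \sum_{j=0}^{q} (-1)^j \binom{q}{j} B^j.$$
Substituting into both sides of (A.1) and rearranging leads to
$$x_w = \sum_{j=1}^{q} (-1)^{j+1} \binom{q}{j}\, x_{w-j}.$$
Identifying x_{w–j} with m(w – j, d) for fixed d and j = 1, 2, . . . , q yields equation (4.7).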
In the actuarial literature, the fact that polynomials of degree less than q are not
affected by linear schemes of the form (4.7), is extensively used in the graduation
of mortality data by ‘summation’ formulae—see for example Benjamin and
Haycocks (1970).
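The argument of A.1 can be checked numerically. The sketch below assumes that the weights of the linear scheme are the signed binomial coefficients obtained by expanding the q-th backward difference, which for q = 2 gives the values a(1) = 2, a(2) = –1 quoted in the text. The function names are hypothetical.

```python
from math import comb


def extrapolation_weights(q):
    """Weights a(1), ..., a(q) with a(j) = (-1)**(j + 1) * C(q, j)."""
    return [(-1) ** (j + 1) * comb(q, j) for j in range(1, q + 1)]


def extrapolate(previous, q):
    """Next value from the last q values; previous[-1] is the most recent."""
    weights = extrapolation_weights(q)
    return sum(a * previous[-j] for j, a in enumerate(weights, start=1))


def quadratic(w):
    return w * w - 3 * w + 2


# A polynomial of degree 2 (less than q = 3) is reproduced exactly.
history = [quadratic(w) for w in range(1, 4)]
print(extrapolation_weights(2))                 # linear scheme
print(extrapolation_weights(3))                 # quadratic scheme
print(extrapolate(history, 3), quadratic(4))    # the two values coincide
```

A polynomial of degree q or higher is not reproduced by the scheme of order q, which is why the order has to be chosen with the expected smoothness across origin years in mind.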
A.2. We now substantiate the argument of paragraph 4.8. Rearrangement of
equation (4.9) yields
If the covariance matrix of the expression in curly brackets is C then the

covariance matrix of v(w) is The rank of this latter matrix cannot exceed
that of the p x p matrix C. The rank will be exactly p if C is of full rank, an is
square and invertible. Hence v(w) can have a nonsingular covariance matrix only
if is square and invertible.
APPENDIX B
FORECASTING WITH UNCERTAIN FUTURE INFLATION RATES
B.1. This Appendix establishes the expression (7.3) and displays the algebra
leading up to equation (7.4). We deal with the former expression first. From
Feldstein (1971) or Harrison and Stevens (1976) we have that the i, jth element of
Z is given by
$$\hat{\beta}'(s)\,D_{ij}\,\hat{\beta}(s) + \mathrm{trace}\{D_{ij}\,C(s)\} \qquad (B.1)$$
where D_ij is the matrix of covariances between the ith and jth row of X(s + 1) and
the trace of a matrix is the sum of its diagonal entries. If i and j correspond
respectively to entries of the runoff triangle, origin year w, delay d and origin year
r, delay e, then Dij under the model (2.2) is given by
(B.2)
where the ith and jth row of X(s + 1) are denoted by x'(w, d) and x'(r, e).
We note that, for example, x'(w, d) is all zeros except in
column positions (w – 1)p + 1 through wp where it is φ'(d), with a similar
statement applying to x'(r, e).
B.2. Substituting equation (B.2) into (B.1) and using the fact that trace
{AB} = trace {BA} for matrices A and B, shows that (B.1) equals
(B.3)
Now on account of the zeros in x(w, d) and x(r, e) we have
and
where the notation follows that of § 7. Substituting these last identities into (B.3)
yields (7.3).
B.3. To establish (7.4) under the assumption that future inflation rates are
independent random variables, we note that for t > s
where the (s + 1), (s + 2), . . . , (t) are the future inflation rates with mean µ and
variance σ². Assuming we have that
Subtracting the product yields equation (7.4).
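The kind of calculation described in B.3 can be illustrated under explicit assumptions, which are not confirmed by the surviving text: the index is taken to evolve as λ(t) = λ(s)(1 + i(s + 1)) · · · (1 + i(t)), with independent rates i(j) of mean µ and variance σ². Under these assumptions the covariance of the index k and k2 ≥ k periods ahead has the closed form coded in index_cov below. The check enumerates every path of a two-point rate distribution with the same mean and variance. All names are hypothetical.

```python
from itertools import product


def index_cov(lam_s, mu, sigma2, k, k2):
    """Closed form for Cov{lam(s + k), lam(s + k2)}, k2 >= k >= 1 (assumed model)."""
    m = 1.0 + mu
    return lam_s ** 2 * m ** (k2 - k) * ((m * m + sigma2) ** k - m ** (2 * k))


def index_cov_by_enumeration(lam_s, mu, sigma2, k, k2):
    """Exact covariance when each rate is mu - sd or mu + sd with equal probability."""
    sd = sigma2 ** 0.5
    paths = list(product((mu - sd, mu + sd), repeat=k2))

    def index(path, n):
        value = lam_s
        for rate in path[:n]:
            value *= 1.0 + rate
        return value

    n_paths = len(paths)
    mean_k = sum(index(p, k) for p in paths) / n_paths
    mean_k2 = sum(index(p, k2) for p in paths) / n_paths
    cross = sum(index(p, k) * index(p, k2) for p in paths) / n_paths
    return cross - mean_k * mean_k2   # expected product minus product of means


# Mean 20% and standard deviation 10%, the values used in the example of section 7.
closed = index_cov(1.0, 0.20, 0.01, k=2, k2=4)
enumerated = index_cov_by_enumeration(1.0, 0.20, 0.01, k=2, k2=4)
print(closed, enumerated)
```

The last line of index_cov_by_enumeration mirrors the final step of B.3, where the product of the means is subtracted from the expected product.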
APPENDIX C
COVARIANCE ESTIMATION AND INITIATING THE KALMAN FILTER
C.1. This Appendix deals with the problem of specifying the covariance
matrices U(t) and V(t) associated with the equations (3.1) and (3.2). Moreover,
we indicate a method for initializing the Kalman filter (3.3)–(3.7) with starting
values β(0) and C(0). These seemingly unrelated issues are simultaneously
tackled in a single development. Before embarking, recall from §6 that all four of
the quantities above are susceptible to subjective determination, the application
of which is necessarily tied to the specific circumstances. The viewpoint of this
Appendix, however, ignores the possibility of a direct role for subjective
information and correspondingly emphasizes the observed data and formal
statistical procedures.
C.2. We suppose that at the start of the investigation, observation vectors
y(1), y(2), .. ., y(s) are available, these being the successive diagonals of the
observed runoff triangle. By repeated back substitution of the state equation
(3.2) we find that
Substituting into the measurement equation (3.1) yields
(C.1)
where
C.3. Now (C.1) holds for t=1,2,. . . , s. Collecting all such equations and
arranging in order yields
(C.2)
where
A =
Note that all the matrices A, B(t), t = 1, 2, . . . , s are known.

C.3. Equations of the form (C.2) constitute the basis for what is known as the
‘mixed’ model of analysis of variance (see, e.g., Rao (1973, p. 302)). Observations
are explained in terms of both ‘fixed’ and ‘random’ effects. In the context of these
mixed models the aim is to estimate both the fixed effects β(0) and the covariance
matrices of the random effects v(1), v(2), . . . , v(s) and u. This is precisely our
problem and hence we can draw on the estimation theory developed for these
models. Methods of estimation are described in, for example, Rao (1973) or
Harville (1977).
C.4. At the expense of generality, we can explicitly derive straightforward
estimators which appear reasonably justified in the claims reserving context.
Assume that the errors in both the measurement and system equations (3.1) and
(3.2) are normal. Further suppose that both U(t) and V(t) are known up to fixed
scalar multiples and This specific context includes the usual ‘regression’
case where all errors are zero correlated with constant variance, or where the
matrices are diagonal. To proceed, note that (C.2) can be written as
where e is the sum of all the random components in the right side of (C.2). Since
the individual random components in (C.2) are assumed mutually uncorrelated
we find that
By hypothesis the V(t) are known up to Similarly, since E[uu´] is block
diagonal with blocks U(t), the second term is known up to Writing
then E[ee´] can be written in the form W where every element of the matrix W
is a known linear function of r.
C.5. Using well-known general linear model results (Rao 1973, p. 263), for
given r and assuming normality, the maximum likelihood estimators of β(0) and
are
(C.3)
(C.4)
where T is the total number of observations. The estimator (C.3) has estimated
covariance matrix
C.6. The above considerations suggest the following procedure. Firstly,
decide on a range of possible values For each of these r solve (C.3) and
compute the log likelihood
(C.5)
the equality following after some manipulation. Pick r so as to minimize (C.5).
Given the minimizing r and the corresponding and the estimated
covariance matrix the former two serve to define the U(t)
and V(t), while the latter two initiate the Kalman filter.
C.6. These suggested procedures were used on the data of § 5. The U(t) and
V(t) matrices of the example were assumed diagonal, the former with diagonal
entries proportional to the approximate deflated mean level. Since all the results
were robust to precise specifications, excessive sophistication was avoided. The
maximum likelihood value of r was determined to be zero and the likelihood was
quite flat in this neighbourhood. Accordingly, a variety of r values near zero were
used and again all results turned out to be insensitive to precise specifications.
The results reported in §6 are based on those for r = 0·5.
APPENDIX D
DERIVING THE KALMAN ALGORITHM
D.1. We briefly indicate an approach to the derivation of the Kalman
algorithm equations (3.3)–(3.7). The approach is from the viewpoint of constant
coefficient regression theory and knowledge of linear regression models including
normal equations is assumed (see, e.g., Rao (1973)). Further details associated
with the derivations below are available in Duncan and Horn (1972).
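D.2. Suppose we reach time t and have estimate β̂(t – 1) with covariance
matrix C(t – 1). Using equation (3.2) our forecast of β(t) is
$$\hat{\beta}(t \mid t-1) = H(t)\,\hat{\beta}(t-1)$$
with covariance matrix
$$H(t)\,C(t-1)\,H'(t) + V(t).$$
D.3. At time t we receive the additional information y(t) which is related to
β(t) via the measurement equation
$$y(t) = X(t)\,\beta(t) + u(t).$$
Combining this ‘present’ information y(t) with the ‘past’ information β̂(t|t – 1)
leads to the grand linear model
$$\begin{pmatrix} \hat{\beta}(t \mid t-1) \\ y(t) \end{pmatrix} = \begin{pmatrix} I \\ X(t) \end{pmatrix} \beta(t) + \begin{pmatrix} \delta \\ u(t) \end{pmatrix} \qquad (D.1)$$
where δ and u(t) are zero mean error vectors with covariance matrix
$$\begin{pmatrix} H(t)\,C(t-1)\,H'(t) + V(t) & 0 \\ 0 & U(t) \end{pmatrix}.$$
D.4. The generalized least-squares estimator of β(t) on the basis of the grand
linear model (D.1) is given by
$$\hat{\beta}(t) = C(t)\left[\{H(t)\,C(t-1)\,H'(t) + V(t)\}^{-1}\,\hat{\beta}(t \mid t-1) + X'(t)\,U^{-1}(t)\,y(t)\right] \qquad (D.2)$$
where C(t), the covariance matrix of β̂(t), is given by
$$C(t) = \left[\{H(t)\,C(t-1)\,H'(t) + V(t)\}^{-1} + X'(t)\,U^{-1}(t)\,X(t)\right]^{-1}. \qquad (D.3)$$
Using a number of matrix manipulations, equations (D.2) and (D.3) may be recast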
into the form (3.3)–(3.5).
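The recasting mentioned in D.4 can be checked numerically in the scalar case. The sketch below compares a generalized least-squares update, which combines the forecast of the state and the new observation with weights equal to their inverse variances, against the familiar gain form of the Kalman update. The gain form is the standard textbook version and is only assumed to correspond to (3.3)–(3.5); function names and numbers are hypothetical.

```python
def gls_update(beta_pred, p, y, x, u_var):
    """Scalar generalized least squares on the stacked 'past' and 'present' data.

    beta_pred, p : forecast of the state and its variance
    y, x, u_var  : observation, regressor and measurement variance
    """
    c = 1.0 / (1.0 / p + x * x / u_var)          # inverse of the summed precisions
    beta = c * (beta_pred / p + x * y / u_var)   # precision-weighted combination
    return beta, c


def gain_update(beta_pred, p, y, x, u_var):
    """Scalar Kalman update in gain form (assumed standard form)."""
    gain = p * x / (x * p * x + u_var)
    beta = beta_pred + gain * (y - x * beta_pred)
    c = (1.0 - gain * x) * p
    return beta, c


gls = gls_update(beta_pred=100.0, p=8.0, y=130.0, x=1.5, u_var=4.0)
kal = gain_update(beta_pred=100.0, p=8.0, y=130.0, x=1.5, u_var=4.0)
print(gls, kal)
```

The gain form avoids inverting the forecast variance, which matters in the matrix case when that variance is singular or the state dimension is large.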
APPENDIX E
THE CHAIN LADDER MODEL AND THE STATE-SPACE FRAMEWORK
E.1. We briefly indicate how the well-known Chain Ladder model may be
embedded into the state-space framework and is, indeed, a special case of the
model of §4.
E.2. As in Taylor (1980) the Chain Ladder model may be expressed as
where L is an inflation rate and n(w) a volume indicator.

E.3. If φ(d) is all zero except for a 1 in the (d + 1)th position and I has in
the (d+ 1)th position then
d=0,1,2,...
Accordingly,
which is analogous to (4.4). We also note that in this conceptualization b(w) has
components which are independent of origin year w and this yields the
appropriate system equation.
E.4. Many of the other models reviewed by van Eeghen (1981) may be
similarly embedded into the state-space framework.
