IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. MI-6, NO. 1, MARCH 1987

On the Iterative Image Space Reconstruction Algorithm for ECT

D. M. TITTERINGTON

Abstract—The ISRA of [1] is shown to be an iterative algorithm that aims to converge to the least-squares estimates of emission densities. Convergence is established in the case where a unique least-squares estimate exists that is, elementwise, strictly positive. It is pointed out that, in terms of asymptotic theory, the resulting estimators are inferior to the maximum likelihood estimators, for which the EM algorithm is a computational procedure. Potential difficulties with the behavior of the ISRA are illustrated using very simple examples.

Manuscript received August 18, 1986; revised November 25, 1986. The author is with the Department of Statistics, University of Glasgow, Glasgow G12 8QQ, Scotland. IEEE Log Number 8612992.

0278-0062/87/0300-0052$01.00 © 1987 IEEE

I. INTRODUCTION

Recently [1], an iterative algorithm, called the image space reconstruction algorithm (ISRA), has been proposed as an alternative to the EM algorithm [2], [3], in the context of estimating emission densities in emission computed tomography (ECT). In [1], the two algorithms are outlined and compared on real data, and certain computational advantages of the ISRA are pointed out. The motivation for the algorithm is expressed in descriptive terms. The main goal of this paper is to point out a more objective interpretation of the ISRA as a procedure aimed at calculating least-squares (LS) estimates of the emission densities. The EM algorithm, on the other hand, is a numerical procedure for calculating maximum likelihood (ML) estimates. Both the ML and LS estimates can be regarded as minimum distance estimates, but based on different measures of "distance"; see Section III. We consider the convergence behavior of the ISRA in Section IV and compare the statistical properties of the ML and LS estimators in Section V.

II. STATEMENTS OF THE ALGORITHMS

The notation is as follows, with terminology as in [1]. There are J source pixels, the jth of which has emission density λ_j. Measurements (n_i*) are observed, where n_i* is the number of coincidences counted in the ith of I pairs of coincident detector elements. The value a_ij is the conditional probability that an event emitted from pixel j is assigned to projection i. Thus, for each j,

    Σ_i a_ij = 1,

and the expected value of n_i* is Σ_j a_ij λ_j. It is assumed that the (a_ij) are known.

Suppose we start from an initial set of positive estimates (λ_j^(0)). The EM algorithm generates a sequence of sets of estimates (λ_j^(k)), according to the following iterative step. For each k = 0, 1, ..., and for each j = 1, ..., J,

    λ_j^(k+1) = λ_j^(k) Σ_i [n_i* a_ij / (Σ_r a_ir λ_r^(k))].    (1)

The ISRA uses the iterative step

    λ_j^(k+1) = λ_j^(k) (Σ_i n_i* a_ij) / [Σ_i a_ij (Σ_r a_ir λ_r^(k))].    (2)

It is clearly important to start with positive estimates because λ_j^(0) = 0 implies λ_j^(k) = 0 for all k. Furthermore, if λ_j^(0) > 0 for all j, then λ_j^(k) > 0 for all j and all k = 1, 2, ....

Although in the above we have called recursion (1) "the" EM algorithm, it is in fact only the special version of a general algorithm for finding ML estimates from incomplete data, as applied to the particular problem of estimating the (λ_j) from the (n_i*). In this context, the incompleteness in the data is derived from the fact that, although we know where each single event was detected, we do not know from which source it originated.

III. REPRESENTATION USING DISTANCE MEASURES

Since it is an example of the EM algorithm, iteration (1) enjoys certain convergence properties [2], [3]. One way of motivating the algorithm is to consider the problem of maximizing, with respect to the (λ_j), the quantity

    Σ_i n_i* log (Σ_j a_ij λ_j)    (3)

subject to the constraint Σ_j λ_j = Σ_i n_i* = n*. Differentiation with respect to λ_j gives

    Σ_i [n_i* a_ij / (Σ_r a_ir λ_r)] = d,

a constant which is the Lagrange multiplier associated with the constraint. Multiplying by λ_j and adding over j

gives d = 1, so that the stationarity equations are

    Σ_i [n_i* a_ij / (Σ_r a_ir λ̂_r)] = 1.    (4)

Multiplication of both sides by λ̂_j gives

    λ̂_j = λ̂_j Σ_i [n_i* a_ij / (Σ_r a_ir λ̂_r)],

a form which clearly motivates (1).

Note that, if we define (r_i*), the relative counts, by r_i* = n_i*/n*, i = 1, ..., I, and (p_j), the relative densities, by p_j = λ_j / Σ_r λ_r, j = 1, ..., J, then maximizing (3) is equivalent to minimizing, with respect to the (p_j),

    Σ_i r_i* log (r_i* / Σ_j a_ij p_j),

which is the Kullback-Leibler distance between (r_i*) and (Σ_j a_ij p_j), the set of expected values of (r_i*).

Other estimates for (p_j), and hence for (λ_j), might be obtained by minimizing other distance measures between (r_i*) and (Σ_j a_ij p_j). Perhaps the simplest is the least-squares distance

    Σ_i (r_i* − Σ_j a_ij p_j)²

or, equivalently,

    Σ_i (n_i* − Σ_j a_ij λ_j)².

If we denote the vector (n_i*) by n*, the vector (λ_j) by λ, and the I × J matrix (a_ij) by A, then this can be written as

    (n* − Aλ)^T (n* − Aλ),

where "T" denotes transpose.

The least-squares (LS) estimates, λ̂, of λ are obtained by solving

    A^T A λ = A^T n*.    (5)

If A^T A is of rank J, then

    λ̂ = (A^T A)^{-1} A^T n*    (6)

explicitly. However, if J is large, A^T A may be either ill-conditioned or singular, and formula (6) is unusable. In such cases, methods based on regularization may well be employed [4] to give estimates of λ. These estimates will no longer be LS estimates, however.

We shall, from now on, assume that A^T A is nonsingular. (This requires that I ≥ J.)

The link with the ISRA can be forged by noting that the jth equation in (5) can be written

    Σ_i a_ij (Σ_r a_ir λ_r) = Σ_i n_i* a_ij,    (7)

or

    1 = (Σ_i n_i* a_ij) / [Σ_i a_ij (Σ_r a_ir λ̂_r)].    (8)

Multiplication of both sides by λ̂_j provides an equation which clearly motivates (2). It follows that a strictly positive vector λ̂ is an LS estimate of λ if and only if it is a fixed point of the iteration (2).

The final steps in establishing the algorithms, the EM algorithm (1) from (4) and the ISRA (2) from (8), accord with the Kuhn-Tucker conditions of optimization theory; see, for instance, [2, Appendix I].

From an algebraic point of view, the ISRA is simply an iterative procedure for trying to solve a set of linear equations without having to invert a matrix, in spite of the fact that the inverse exists. Note that, if (7) is written as

    Σ_r b_jr λ_r = c_j

in an obvious notation, so that A^T A = B and A^T n* = c, then the ISRA is

    λ_j^(k+1) = λ_j^(k) c_j / (Σ_r b_jr λ_r^(k)),    (9)

j = 1, ..., J, and λ̂ satisfies

    λ̂_j = λ̂_j c_j / (Σ_r b_jr λ̂_r).    (10)

Note also that although, in the EM algorithm, Σ_j λ_j^(k+1) = Σ_j λ_j^(k), this is not necessarily the case in the ISRA.

IV. CONVERGENCE PROPERTIES OF THE ISRA

The satisfying convergence properties of (1) are discussed in detail in [3].

Suppose we write (9) in the form

    λ_j^(k+1) = G_j(λ^(k)),

where

    G_j(λ) = λ_j c_j / (Σ_r b_jr λ_r),    (11)

j = 1, ..., J, thus emphasizing the successive-approximations nature of the algorithm. Then an important guide to the convergence properties is provided by the matrix U(λ), whose (j, k) element is

    ∂G_j(λ)/∂λ_k,

for j, k = 1, ..., J. It is then the case that, if λ̂ represents a solution of λ = G(λ), local convergence to λ̂ obtains if the eigenvalues of U(λ̂) are all, in magnitude, less than one [5]. Otherwise, divergence may occur in at least some direction.
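As an illustrative aside (not part of the original paper), the iterative steps (1) and (2) and the convergence factor just described, the largest eigenvalue in magnitude of U(λ̂), can be sketched in Python. NumPy, and the use of the first A^T A matrix of Table I below as test data, are assumptions made here purely for illustration:

```python
import numpy as np

def em_step(A, n, lam):
    """One EM iteration (1): lam_j <- lam_j * sum_i n_i* a_ij / (A lam)_i."""
    return lam * (A.T @ (n / (A @ lam)))

def isra_step(B, c, lam):
    """One ISRA iteration in the form (9), with B = A^T A and c = A^T n*."""
    return lam * c / (B @ lam)

def convergence_factor(B, c, lam_hat):
    """Largest eigenvalue, in magnitude, of U(lam_hat), built from (12)."""
    s = B @ lam_hat                                 # s_j = sum_r b_jr lam_r
    U = np.diag(c / s) - (lam_hat * c / s**2)[:, None] * B
    return max(abs(np.linalg.eigvals(U)))

# First A^T A matrix of Table I, with lam_hat = (1/3, 1/3, 1/3); choosing
# c = B @ lam_hat makes lam_hat a fixed point of (9), in accordance with (10).
B = np.array([[1.10, 0.63, 0.29],
              [0.63, 1.18, 0.63],
              [0.29, 0.63, 1.10]])
lam_hat = np.full(3, 1.0 / 3.0)
c = B @ lam_hat
print(round(convergence_factor(B, c, lam_hat), 2))   # 0.83, matching Table I
```

One ISRA step from the fixed point returns the fixed point itself, and a single EM step maps the total Σ_j λ_j to Σ_i n_i*, so the total is conserved once λ^(0) is scaled to sum to n*, consistent with the remark after (10).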

From (11), we obtain

    ∂G_j(λ)/∂λ_k = δ_jk c_j / (Σ_r b_jr λ_r) − λ_j c_j b_jk / (Σ_r b_jr λ_r)²,    (12)

in which

    δ_jk = 1 if j = k,
         = 0 otherwise.

If λ̂ satisfies (10), then the right-hand side of (12) is

    δ_jk − λ̂_j b_jk / (Σ_r b_jr λ̂_r),

or

    δ_jk − v_jk(λ̂),

say.

In Appendix I the following Theorem is proved about the eigenvalues of V(λ̂).

Theorem: If A^T A is positive definite and if λ̂ > 0 (i.e., λ̂_j > 0 for j = 1, ..., J) is a fixed point of (9), then the eigenvalues of V(λ̂) are all positive and the maximum eigenvalue of V(λ̂) is unity.

The following Corollary follows at once.

Corollary: Under the conditions of the Theorem, it follows that 0 ≤ η < 1, where η is the maximum eigenvalue of U(λ̂), so that local convergence of (9) to λ̂ will hold. It also follows that U(λ̂) is a singular matrix, although this does not affect the convergence of the algorithm.

Since, as remarked in Section III, a positive λ̂ is a fixed point of (2) if and only if it is an LS estimate of λ, it follows that, when the LS estimate is strictly positive and A^T A is nonsingular, the ISRA will be locally convergent to that estimate.

Convergence may, however, be slow, particularly if the matrix A^T A is not close to being diagonal. As indicated above, the crucial quantity is η, the largest eigenvalue of U(λ̂). As an illustration, Table I provides numerical results from a couple of examples with J = 3. In one case A^T A has a reasonably dominant diagonal, but, in the other, the resulting values of η, for a variety of choices of λ̂, are very close to unity.

TABLE I
VALUES OF η FOR TWO CHOICES OF A^T A AND A VARIETY OF λ̂

                            A^T A
                   (1.10 0.63 0.29)   (2.00 2.04 1.96)
                   (0.63 1.18 0.63)   (2.04 2.60 2.08)
                   (0.29 0.63 1.10)   (1.96 2.08 2.60)
λ̂^T
(0.33, 0.33, 0.33)       0.83              0.97
(0.10, 0.50, 0.40)       0.88              0.98
(0.40, 0.40, 0.20)       0.83              0.96
(0.40, 0.20, 0.40)       0.87              0.97
(0.05, 0.05, 0.90)       0.96              0.99
(0.05, 0.90, 0.05)       0.95              0.99
(0.90, 0.05, 0.05)       0.96              0.99

Some further remarks should also be made.

The first backs up the comments in [3, sect. 2.3.2], in which the LS method is discussed; see also [4]. There, it is emphasized that the solution of (5), even if it is unique, may not be nonnegative. As remarked earlier, if λ^(0) > 0, then the same is true of all future iterates λ^(k) from (2) or (9). If the solution λ̂ of (5) has negative components, the limit of the sequence (λ^(k)) (if the sequence converges) represents some projection of the least-squares estimates onto the nonnegative orthant, λ ≥ 0. The sequence cannot converge to a point in λ > 0 because the discussion following the Theorem would then require that the resulting limit point satisfy (5), a contradiction. Convergence, if it occurs, must be to some λ in which some elements are zero. Although we have not established a complete theory of convergence in such cases, a simple illustration of the phenomenon is given in Appendix II.

What can be established is the following. Suppose λ̂ is a fixed point of the iteration in which the first s elements are zero and the rest positive. Suppose (λ^(0)) are chosen with the same zero-structure as λ̂. Then an argument analogous to that in Appendix I forms the basis of a proof that the ISRA will be locally convergent at λ̂.

It is, of course, not a realistic proposition to be able to prespecify which elements of λ^(0) should be set to zero, but in examples like that in Appendix II, it does appear that convergence from a positive λ^(0) occurs to a point on the finite boundary of the set λ ≥ 0. (This empirical remark does not constitute a proof!)

The final remark emphasizes that, in spite of the relationship between (2) [or (9)] and the solution of (5), and of the results about convergence, it does not necessarily follow that an algorithm of the form (9) can be used, as a general rule, to solve any given set of linear equations

    Bλ = c,

where B is J × J and nonsingular. A counterexample for a matrix B that is neither symmetric nor semidefinite is provided in Appendix III.

V. PRECISION OF LEAST-SQUARES ESTIMATES

It is generally the case that, in the context of positron emission tomography, the ML estimates from (1) will be different from the LS, or projected LS, estimates obtained from (2). In comparing them one must either assess them empirically, on particular sets of data or by simulation, or investigate their statistical properties. For the latter, it is appropriate to look at the covariance matrices of the two types of estimator.

The covariance matrix of an estimator λ̃, written cov(λ̃), is a J × J matrix whose diagonal elements are the variances of the elements of λ̃ and whose off-diagonal elements are the covariances between different elements of λ̃. The variances are particularly important because large variance means low precision.

As in [3, sect. 2.2], explicit formulas for the covariance matrices are available only when asymptotic results

are valid. In other words, it has to be assumed that approximations are acceptable to results that are obtained only when a very large amount of data is available. In the present context, the number of pairs of detectors, I, would have to be very large. Under this assumption, and if λ̃ denotes the ML estimator, then (cf. [3]) cov(λ̃) is, approximately, the inverse of the Fisher information matrix, whose (j, k) element is

    Σ_i a_ij a_ik / (Σ_r a_ir λ_r).    (13)

For λ̂, the solution of (5), we have, assuming A^T A is nonsingular,

    cov(λ̂) = (A^T A)^{-1} A^T cov(n*) A (A^T A)^{-1},    (14)

in which cov(n*) is a diagonal matrix with (i, i) element Σ_j a_ij λ_j. In practice, cov(λ̃) and cov(λ̂) could be estimated by substituting λ̃ for λ in (13) and λ̂ for λ in (14).

The "asymptotics" enter into (13) in that the resulting covariance matrix for λ̃ is based on the standard asymptotic theory of maximum likelihood estimation. So far as (14) is concerned, it is exact if λ̂ is indeed the solution of (5). If, however, λ̂ is obtained from (2) or (9), starting from λ^(0) > 0, then, as described in Section IV, the resulting λ̂ will not satisfy (5) if the solution of (5) has negative components. "Asymptotically," however, this phenomenon will not occur because, roughly speaking, if there is a large amount of data, λ̂ is likely to be very close to the true, positive λ. As a result, (14) will be valid, in this sense, for estimates obtained using the ISRA. However, in the context of emission tomography [3, sect. 2.2], these asymptotics may well not apply, because I, the number of detector pairs, may not be large relative to J, and the alternative, empirical methods of assessment may be more reliable.

Standard statistical theory has it [6] that ML estimators are asymptotically optimal so far as precision is concerned, so that, were (13) and (14) valid, the ML estimates obtained from the EM algorithm would be preferable to the LS estimates obtained by the ISRA.

In principle, the least-squares procedure might be improved by using a weighted least-squares criterion rather than ordinary least squares. Whether or not the extra numerical complication is justified in terms of meaningful improvement in performance would be best judged empirically.

VI. CONCLUSIONS

We have demonstrated that the ISRA is a numerical procedure aimed at computing the least-squares estimates of emission densities. Evidence has been presented in support of local convergence of the algorithm, and it is pointed out that, in terms of asymptotic theory, the method provides an estimator which is not as good as ML in terms of precision.

APPENDIX I

Proof of Theorem: The proof is similar in spirit to the argument in [7]. For brevity, we omit mention of the dependence of V(λ̂) on λ̂. First note that

    V = DB,

where D is a diagonal matrix with jth diagonal element d_jj = λ̂_j / (Σ_r b_jr λ̂_r), and B = A^T A, assumed nonsingular. With respect to the inner product ⟨·, ·⟩ defined on R^J by

    ⟨u, v⟩ = u^T D^{-1} v,

V is symmetric and positive definite:

    ⟨u, Vv⟩ = u^T B v = ⟨Vu, v⟩

and

    ⟨u, Vu⟩ = u^T B u > 0, provided u ≠ 0.

The consequence of this argument is that it implies positivity of the eigenvalues of V.

Next, note that

    V = D_1 D_2 B,

where D_1 = diag(λ̂_1, ..., λ̂_J) and D_2 = diag({Σ_r b_jr λ̂_r}^{-1}). Then if μ is any eigenvalue of V, we have

    det(D_1 D_2 B − μI) = 0,

i.e.,

    det(D_2 B D_1 − μI) = 0.

Thus, μ is also an eigenvalue of D_2 B D_1. However, D_2 B D_1 is a Markov matrix, being nonnegative, with row sums equal to unity. It follows [8] that |μ| ≤ 1 and that one eigenvalue is unity.

APPENDIX II

Consider the following example with I = J = 2:

    A = (0.2  0.3)      and      n_1*/n_2* = 7/3.
        (0.8  0.7)

We take a normalized form of the problem in which λ_1 + λ_2 = 1. To do this, scale the data so that n_1* + n_2* = 1. Equations (5) then become

    0.68λ_1 + 0.62λ_2 = 0.38
    0.62λ_1 + 0.58λ_2 = 0.42,

the solution of which has λ̂_1 < 0. Iteration (2) or (9) converges, from a wide variety of starting points, to λ_1 = 0, λ_2 = 21/29. Note that λ_1 + λ_2 ≠ 1, and rescaling would have to be undertaken.

This negativity phenomenon is partly due to a ratio of values for (n_1*, n_2*) that is rather unlikely for this A. Remember, from Section II, that the expected value of n_i* is Σ_j a_ij λ_j. In this example, these values are

    0.2λ_1 + 0.3λ_2   (i = 1)

and

    0.8λ_1 + 0.7λ_2   (i = 2).

The ratio n_1*/n_2* = 7/3 would therefore be unusual. Another contributing factor is the near singularity of A^T A.

APPENDIX III

Consider the pair of linear equations

    3λ_1 + 2λ_2 = 8
    3λ_1 + λ_2 = 7,

for which the solution is λ_1 = 2, λ_2 = 1. Algorithm (9) converges to λ̂ only if λ^(0) = λ̂. Otherwise, empirical results showed convergence to λ^T = (0, 7) or λ^T = (8/3, 0). These are both clearly fixed points of (9). For this example, the eigenvalues of U(λ̂) are 0 and 31/28, which explains the instability at λ̂, and the matrix corresponding to B is nonsymmetric.

ACKNOWLEDGMENT

The referees' comments on the original version of this paper were much appreciated.

REFERENCES

[1] M. E. Daube-Witherspoon and G. Muehllehner, "An iterative image space reconstruction algorithm suitable for volume ECT," IEEE Trans. Med. Imaging, vol. MI-5, pp. 61-66, 1986.
[2] L. A. Shepp and Y. Vardi, "Maximum likelihood reconstruction for emission tomography," IEEE Trans. Med. Imaging, vol. MI-1, pp. 113-122, 1982.
[3] Y. Vardi, L. A. Shepp, and L. Kaufman, "A statistical model for positron emission tomography," J. Amer. Statist. Assoc., vol. 80, pp. 8-37, 1985.
[4] G. T. Herman, Y. Censor, D. Gordon, and R. M. Lewitt, Comment on "A statistical model for positron emission tomography," J. Amer. Statist. Assoc., vol. 80, pp. 22-25, 1985.
[5] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables. New York: Academic, 1970, p. 300.
[6] S. D. Silvey, Statistical Inference. London: Chapman and Hall, 1975, pp. 77-78.
[7] B. C. Peters and H. F. Walker, "The numerical evaluation of the maximum-likelihood estimate of a subset of mixture proportions," SIAM J. Appl. Math., vol. 35, pp. 447-452, 1978.
[8] R. Bellman, Introduction to Matrix Analysis. New York: McGraw-Hill, 1960, pp. 265-270.
