Lecture 6 Merged

H06W8a: Medical image analysis
Class 6: Image registration

Prof. Frederik Maes
frederik.maes@esat.kuleuven.be

In this class

• Problem statement
• Computational strategies
• Theory of mutual information
• Implementation
• Validation
• Applications

Image registration problem

(MR, CT: transformation T unknown)

• Combining information from multiple images acquired using different scanners or at different time points requires their geometric relationship to be known, i.e. the transformation T that maps 3D points in one image onto the anatomically corresponding point in the other.
• The mapping T compensates for differences in patient positioning or scan plane selection.
• Unless prospective measures were taken prior to image acquisition, T is in general unknown and needs to be recovered retrospectively from the image content itself.

Image registration problem

(MR, CT: transformation T known)

• If T is known, the second image can be resampled such that it has the same size and dimensions as the first image (cf. 'basic concepts').
• After resampling, voxels at identical positions in both images correspond to anatomically identical points.
• Information extracted from one image (e.g. object contours) can then simply be projected onto the other image.

Image registration problem

• With traditional radiological examination of images printed as 2D slices on radiological film ('hard copy examination'), this problem is subjectively solved by the radiologist by looking for corresponding anatomical landmarks in corresponding images and by mentally constructing a 3D anatomical interpretation.
• This is however rather tedious and not very accurate, as it is necessarily limited to identifying corresponding slices, without providing a true 3D registration solution.
• For computer-aided interpretation of the images displayed on a computer screen ('soft copy examination') and for image analysis, a more formal 3D registration solution is required, using an algorithm that finds the optimal T according to a suitable criterion.

Applications

• Image registration is ubiquitous in medical image analysis
– As a goal by itself: image fusion = overlay of different images
– As a computational tool: detecting changes over time, atlas-based segmentation, groupwise analysis (e.g. voxel-based morphometry)
• Images of the same subject: same modality (e.g. MR/MR), different modality (e.g. CT/MR, PET/MR), rigid (no deformations) or non-rigid (in case of deformations)
– Follow-up (different imaging sessions)
– Motion compensation (same imaging session)
– Compounding / stitching of partially overlapping images (larger FOV)
– Fusing complementary information
• Images of different subjects: non-rigid registration to compensate for shape differences
– Atlas construction ('mean shape' templates)
– Atlas-based segmentation
– Group analysis (e.g. patients vs controls)
• Rigid / affine registration: more or less solved for many applications
• Non-rigid registration: different algorithms available, very much still research…
Application: brain image registration

MR/PET, MR/CT, MR/MR

Application: multi-spectral analysis

Neuro-imaging using different MRI sequences (T1, PD, T2, FLAIR, DTI): alignment compensates for inter-scan head motion.
Application: multi-temporal analysis

Follow-up of MS lesions with T2-weighted MRI: alignment compensates for differences in positioning.
(Time 1 vs Time 2)

Application: multi-modal diagnosis

Diagnosis of metastases in lung cancer: PET used for detection, CT used for anatomical localisation.
(Localisation in CT, detection in PET; CT, PET transmission and PET-FDG emission images)

Application: multi-modal therapy planning

Radiotherapy planning for treatment of prostate cancer:
MR used for target delineation, CT for dose calculation.
(CT with MR overlaid, MR with CT overlaid)

Impact on treatment plan: (CT, MR)

Application: motion compensation

(Original sequence vs motion-corrected sequence)

Challenges for image registration

• Multimodal: corresponding structures in the images to be registered have different intensities
• Multitemporal: structural changes may have occurred, due to an evolving process (growth, disease) or surgical intervention
• Multisubject: registration of images of different subjects (e.g. for atlas-based segmentation) may be hampered by possibly large morphological and also topological differences between subjects
• Local distortions: these are not modeled if the registration transformation is restricted to a global rigid-body or affine coordinate transformation
• Different resolution: need for interpolation, possibly resulting in interpolation artifacts, and difficulty of estimating the registration criterion reliably if the resolution is low
• Diverse applications: various modality combinations, organs of interest, …, which should preferably all be tackled by a single registration algorithm

Image registration strategies

(Example: MR / CT, T?)
A formal registration solution typically consists of:


1. A model for the registration transformation T, e.g. rigid, affine, non-rigid
2. Selection of registration features in both images that are to be aligned by registration, e.g.
corresponding points, surfaces or regions (voxels)
3. Definition of a registration criterion C that is a function of the transformation T and that
evaluates the proper alignment of the selected registration features, e.g. distance measures,
mutual information, …
4. An optimization strategy that searches for the optimal T that minimizes C (or –C)
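A formal registration pipeline can be sketched in code. The following is a minimal, hypothetical Python sketch (NumPy and SciPy assumed) showing how the four components fit together; the function names and the rigid parameterization are illustrative, not from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

def rigid_matrix(params):
    """Build a 4x4 rigid-body matrix from (tx, ty, tz, rx, ry, rz); angles in radians."""
    tx, ty, tz, rx, ry, rz = params
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx          # 1. transformation model (here: rigid, 6 DOF)
    T[:3, 3] = [tx, ty, tz]
    return T

def register(criterion, x0):
    """4. optimization: search the parameters that minimize the criterion C(T)."""
    res = minimize(criterion, x0, method="Powell")
    return res.x

# 2./3. features + criterion: 'criterion' maps transformation parameters to a cost,
# e.g. -mutual_information(A, resample(B, rigid_matrix(p))) for voxel-based registration.
```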

3D/3D geometric transformations

• Rigid body: x' = f(x)
  3D translation + rotation (#DOF = 6)
  e.g. brain images of same subject
• Affine:
  3D translation, rotation, scaling, skew (#DOF = 12)
  e.g. non-brain images of same subject
  e.g. global alignment of images of different subjects
• Non-rigid:
  'elastic' matching (#DOF = up to 3 x number of voxels)
  e.g. motion/deformation correction, image rectification
  e.g. inter-subject registration, atlas-based segmentation

Non-rigid transformations

General mapping of (x1,y1,z1) to (x2,y2,z2):
  x2 = f1(x1,y1,z1)
  y2 = f2(x1,y1,z1)
  z2 = f3(x1,y1,z1)

Written as a deformation field (u,v,w):
  x2 = x1 + u(x1,y1,z1)
  y2 = y1 + v(x1,y1,z1)
  z2 = z1 + w(x1,y1,z1)

(Original / Deformed: valid / Deformed: not valid)
Impose regularization constraints to assure the deformation field is physically valid (e.g. spatially smooth, no folding or tearing of tissues...)
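As an illustration of the difference in degrees of freedom, a hedged NumPy sketch: a rigid or affine transform is a single matrix applied to all points, while a non-rigid transform stores a displacement (u,v,w) per voxel. Array names and shapes are assumptions for the example.

```python
import numpy as np

def apply_affine(points, A, t):
    """Affine (or rigid) mapping: x2 = A @ x1 + t, the same matrix for every point."""
    return points @ A.T + t          # points: (N, 3)

def apply_deformation(points_vox, field):
    """Non-rigid mapping: x2 = x1 + (u,v,w)(x1), one displacement per voxel.
    field: (X, Y, Z, 3) array; points_vox: integer voxel coordinates (N, 3)."""
    i, j, k = points_vox[:, 0], points_vox[:, 1], points_vox[:, 2]
    return points_vox + field[i, j, k]
```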
Image registration features

• GEOMETRIC FEATURES:
– External marker-based registration: uses specifically designed external markers (= fiducials)
– Point-based registration: aligns corresponding anatomical landmark points
– Surface-based registration: aligns corresponding object surfaces
→ These all require some form of segmentation to identify corresponding objects in the images
• INTENSITY FEATURES:
– Voxel-based registration: maximizes intensity similarity (unimodal, multimodal)
→ These can be applied without the need for segmentation

Marker-based registration

Application: stereotactic surgery planning using CT and MR. A frame with known geometry is fixed to the patient's skull and visualized in the images.

Marker-based registration

(Localizer, external markers)
The position of each image slice in space is determined from the locations of the (centroids of the) markers in the image, relative to the known geometry of the frame.
The location of any image point in 3D space is obtained from its position relative to the markers.

Marker-based registration

• Accuracy:
– Can be highly accurate (< 0.5 mm)
– Caveats:
  • geometric distortion in MRI
  • marker segmentation & centroid localisation
• Invasive and not always practical or applicable
– Acceptable for stereotactic neurosurgery
  • frame is also used for navigation (e.g. electro-stimulation, biopsy)
– Practical consequence: all images acquired on the same day (to avoid refitting the frame...)
  • need for frame-less solutions…
– Alternative: skin markers
  • Skin may deform, hence less accurate...

Point-based registration

• A sufficiently large number of pairs of corresponding points (pi,qi) are identified in each of the images to be registered (e.g. anatomical landmarks)
• Registration criterion: find T that minimizes the average distance C between corresponding points
• T can be solved analytically for rigid-body or similarity transformations

Point-based registration

FRE:
  d_i² = |q_i - T(p_i)|²
  C = (1/N) Σ_{i=1..N} d_i²

C(T) = fiducial registration error (FRE, in mm) = root mean squared error (RMSE)

Optimization problem:
find T for which C(T) is minimal, i.e. for which the sum of squared distances between corresponding points in both sets is smallest
→ Closed-form solution if T is rigid body ('Procrustes')

Note: FRE ≠ TRE = target registration error:
TRE depends on the location of the ROI and the configuration of the landmarks, while FRE does not … → FRE may be small, while TRE is not…
Point-based registration

  d_i² = |q_i - T(p_i)|²
  C = (1/N) Σ_{i=1..N} w_i d_i²

There can be errors in the indication of the points used for registration (e.g. manual mistakes due to image ambiguities)
→ Fiducial localization error (FLE)

Hence, some point pairs may be less reliable landmarks than others. This can be incorporated in the registration criterion C(T) by the weights w_i,
→ e.g. w_i = 1/Var(FLE_i)
→ Closed-form solution if T is rigid body ('Procrustes')

Procrustes rigid-body alignment

Closed-form solution: the translation aligns the (weighted) centroids of both point sets, and the rotation matrix is obtained from the cross-covariance of the centered point sets (see the sketch below).
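A minimal NumPy sketch of the Procrustes-style closed-form rigid alignment referred to above (centroid translation plus SVD-based rotation); the weighting and the function name are illustrative.

```python
import numpy as np

def procrustes_rigid(p, q, w=None):
    """Closed-form rigid-body T minimizing sum_i w_i |q_i - (R p_i + t)|^2.
    p, q: (N, 3) corresponding points; w: optional (N,) weights."""
    w = np.ones(len(p)) if w is None else np.asarray(w, float)
    w = w / w.sum()
    p_bar = w @ p                      # weighted centroids
    q_bar = w @ q
    P, Q = p - p_bar, q - q_bar        # centered point sets
    H = (w[:, None] * P).T @ Q         # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T                 # rotation matrix
    t = q_bar - R @ p_bar              # translation
    return R, t

# FRE (RMSE) of the fitted transformation:
# fre = np.sqrt(np.mean(np.sum((q - (p @ R.T + t))**2, axis=1)))
```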

Point-based registration Surface-based registration

• Indentification of corresponding landmarks usually requires manual intervention, • Extract corresponding object surfaces from the images to be registered
although some strategies for automated extraction of anatomical landmarks or
• Find the transformation T that minimizes the distance between both
geometrically characteristic points have been developed for some applications.
surfaces, using a suitable distance measure
• Errors in the accurate 3-D localization of corresponding points in both images
propagate directly into the registration result. Such errors can be reduced by • In order to evaluate the distance between both surfaces, point
increasing the number of landmark points. correspondences need typically to be established (e.g. ‘closest points’)
Example: automatically extracted Example: automatically extracted
è Actual implementation depends on the surface representation
anatomical landmarks (M. Betke et al.) ‘corner’ points (K. Rohr et al.) • Should be able to deal with:
– partial overlap
– outliers
– local optima

J.M. Fitzpatrick et al.

Iterative closest point

(Surfaces S1 and S2; for each xj on S1, closest point yj on S2 at distance dj)

1) For each point xj on the transformed surface S1, find the closest point yj on S2
2) Find the transformation that minimizes the distances dj = |T(xj) - yj|, excluding outliers
3) Update yj and iterate

Distance transform

(D(xj) = dj)

1) Construct a map D that gives in each point the distance to surface S2
→ S2 is represented in an implicit way as the zero-level set of D
2) Find the transformation such that the mean value of D evaluated along the transformed surface S1 is minimal
→ No need to establish point correspondences explicitly
→ But: spatial quantisation introduces additional local minima
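A compact Python sketch of the ICP loop described above, using SciPy's cKDTree for the closest-point search and the procrustes_rigid helper from the earlier sketch; the outlier threshold is an illustrative choice.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(S1, S2, iters=50, outlier_dist=10.0):
    """Iterative closest point: rigidly align surface points S1 (N,3) to surface points S2 (M,3)."""
    tree = cKDTree(S2)
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = S1 @ R.T + t                    # current transformed surface points
        d, idx = tree.query(moved)              # 1) closest points y_j on S2
        keep = d < outlier_dist                 #    exclude outliers
        # 2) closed-form rigid update minimizing sum |T(x_j) - y_j|^2 (Procrustes)
        R, t = procrustes_rigid(S1[keep], S2[idx[keep]])
        # 3) iterate: correspondences are recomputed in the next pass
    return R, t
```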
Surface-based registration

Automated extraction of corresponding surfaces?
• Skin surface? → deformable and hence unreliable for image registration
• Skull segmentation in MR? Skull segmentation in CT using thresholding?

Voxel-based registration

• Find the transformation T that maximizes intensity similarity between the two images, assuming that similarity is maximal when the images are correctly aligned
• Using original intensity values or derived features (e.g. blurring, image gradients)
• Using all voxels or a subset thereof (depending on the application)
• Much less preprocessing needed than for point- or surface-based registration
→ much more suited for automation

(MR, CT: voxels of the same object have similar intensities within each image, but these intensities are not the same for the two images)
Unimodality similarity measures

• Unimodal registration: same modality, same protocol
→ (more or less) identical contrast, apart from noise
→ e.g. CT/CT registration, serial MR image registration
• Minimize sum of squared differences:
  SSD(T) = (1/N) Σ_p (A(p) - B(T(p)))²
• Maximize correlation:
  CC(T) = Σ_p (A(p) - Ā)(B(T(p)) - B̄) / sqrt( Σ_p (A(p) - Ā)² Σ_p (B(T(p)) - B̄)² )

Unimodality similarity measures

(Time 1 vs Time 2)

Although the images are from the same modality, there may be significant local intensity differences due to changes over time (e.g. lesion evolution) or non-rigid deformations (not compensated for by affine registration), which affect the robustness of the SSD and CC criteria…
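A short NumPy sketch of the two unimodal criteria, evaluated on the overlapping voxels of two already-resampled images; the helper names are purely illustrative.

```python
import numpy as np

def ssd(a, b):
    """Mean squared intensity difference per voxel (to be minimized)."""
    return np.mean((a - b) ** 2)

def cc(a, b):
    """Pearson correlation coefficient of corresponding voxel intensities (to be maximized)."""
    a0, b0 = a - a.mean(), b - b.mean()
    return np.sum(a0 * b0) / np.sqrt(np.sum(a0**2) * np.sum(b0**2))
```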

Multimodality similarity measures

• Multimodal registration: different modality or different protocols
→ different contrast, e.g. CT/MR registration
• Intensities of corresponding objects are different and their relationship is in general unknown and non-linear
→ SSD or CC criterion does not apply…

Multimodality similarity measures

• Variance of intensity ratios:
– ratio-image uniformity → to be minimized
• Variance of intensities:
– partitioned intensity uniformity:
→ minimal intensity variation of B' for each intensity value a of image A
→ histogram-based, assuming a unimodal conditional histogram h(B'|A=a)
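A hedged sketch of the partitioned intensity uniformity idea: for every intensity value a of image A, measure how much the corresponding intensities in B vary; the exact normalisation used in the literature may differ from this simplified version.

```python
import numpy as np

def partitioned_intensity_uniformity(a, b, n_bins=64):
    """Simplified PIU-style measure: intensity variation of b within each intensity bin of a,
    weighted by bin occupancy and normalised by the mean of b (lower = more uniform)."""
    a_bins = np.digitize(a.ravel(), np.linspace(a.min(), a.max(), n_bins))
    b_flat = b.ravel()
    n = b_flat.size
    piu = 0.0
    for k in np.unique(a_bins):
        sel = b_flat[a_bins == k]              # intensities of B where A falls in bin k
        if sel.size > 1 and sel.mean() != 0:
            piu += (sel.size / n) * sel.std() / sel.mean()
    return piu
```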
Joint intensity histogram

h(i1,i2) counts the number of voxel pairs (p,q), with q = T(p), having intensities I1(p) = i1 and I2(q) = i2.

Σ_{i1} Σ_{i2} h(i1,i2) = N    (sum of all entries equals the number of voxel pairs)
Σ_{i2} h(i1,i2) = h1(i1)     (sum over all rows yields the histogram of I1)
Σ_{i1} h(i1,i2) = h2(i2)     (sum over all columns yields the histogram of I2)

Joint intensity histogram: example

h(i1,i2):
            I1 = 0   I1 = 1   h2(i2)
I2 = 1        5        22       27
I2 = 0        4         5        9
h1(i1)        9        27
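A minimal NumPy sketch of building the joint histogram from two images that have already been resampled onto the same grid (so voxel p in I1 pairs with voxel q = T(p) in I2); the bin count of 32 is an arbitrary choice for the example.

```python
import numpy as np

def joint_histogram(i1, i2, bins=32):
    """Joint intensity histogram h(i1, i2) of two images sampled on the same grid."""
    h, _, _ = np.histogram2d(i1.ravel(), i2.ravel(), bins=bins)
    return h

# Marginal histograms follow by summing over rows / columns:
# h1 = h.sum(axis=1)   # histogram of I1
# h2 = h.sum(axis=0)   # histogram of I2
# h.sum() equals the number of voxel pairs N
```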

Joint intensity distribution

• The joint intensity distribution p(a,b) is the fraction of voxel pairs with intensity a in I1 and b in I2, for all a and b:
  p(a,b) = h(a,b) / N
  N = Σ_a Σ_b h(a,b) = the total number of voxel pairs
• Marginal intensity distributions:
  p(a) = Σ_b p(a,b) = fraction of voxels with intensity I1 = a (depends on T!)
  p(b) = Σ_a p(a,b) = fraction of voxels with intensity I2 = b (depends on T!)

Σ_{i1} Σ_{i2} p(i1,i2) = 1    (sum of all entries equals 1)
Σ_{i2} p(i1,i2) = p1(i1)     (sum over all rows yields the marginal distribution of I1)
Σ_{i1} p(i1,i2) = p2(i2)     (sum over all columns yields the marginal distribution of I2)

Joint intensity distribution: example

p(i1,i2):
            I1 = 0   I1 = 1   p2(i2)
I2 = 1      5/36     22/36     3/4
I2 = 0      4/36      5/36     1/4
p1(i1)       1/4       3/4

Joint intensity distribution

(q = Tα(p), a = I1(p), b = I2(q); joint distribution p(a,b), 256 x 256 bins)

Unimodal:
→ intensities a and b of corresponding voxels p and q of registered images I1 and I2 are likely to be similar
→ p(a,b) is clustered around the diagonal

Joint intensity distribution

Multimodal:
→ the relationship between a and b is strongly data dependent

Only voxels in the region of overlap of both images are considered → p(a,b) depends on T through the varying correspondence (p,q) and through the varying region of overlap.
Histogram dispersion

The joint histogram changes with the registration transformation T.
(I1 = MR, I2 = CT; joint histograms of I1 vs I1 and of I1 vs I2 at 0 mm, 2 mm and 5 mm shift; D. Hill et al.)

Histogram dispersion

Registered (correct T) vs not registered (incorrect T)

Observation (Hill et al, 1994): the joint histogram is more clustered at registration than when the images are not properly aligned.

Unimodal: misregistration produces significant non-zero off-diagonal entries.

Histogram dispersion

Registered (correct T) vs not registered (incorrect T)

Observation (Hill et al, 1994): the joint histogram is more clustered at registration than when the images are not properly aligned.

Multimodal: the dispersion is data dependent.

Histogram dispersion

p(a|I2 = b) = likelihood of observing intensity a in I1 given that the intensity of the corresponding voxel in I2 is b
(registered: p(a|I2=b) more clustered; not registered: more dispersed)

The more clustered p(a|I2=b), the less uncertainty there is about I1 given I2 = b, and thus the more information the knowledge of one value (I2 = b) contains about the other (I1).

If p(a|I2=b) = p(a), knowledge of I2 = b does not contain information about a.

Joint entropy

H(A) = - Σ_a p_A(a) log2 p_A(a)
H(A,B) = - Σ_{a,b} p_AB(a,b) log2 p_AB(a,b)
H(B|A) = Σ_a p_A(a) ( - Σ_b p_{B|A}(b|a) log2 p_{B|A}(b|a) )

• The entropy H(A) is a measure for the uncertainty about random variable A.
• H(A) is maximal when all values a are equally likely: p_A(a) = 1/N → H(A) = log2 N.
• H(A) is zero when A has only one possible value a: p_A(a) = 1 → H(A) = 0.
• The joint entropy H(A,B) is a measure of the dependency between A and B:
  H(A,B) = H(A) + H(B|A), with H(B|A) the conditional entropy of B given A
  → H(B|A) = 0 when A and B are one-to-one related

Joint entropy minimization?

• H(A,B) is smaller when the joint histogram p_AB appears more clustered
• Hence: α* = arg min_α H(A,B) ?
• But this ignores that p_AB depends on the overlap of images A and B
• H(A,B) = 0 when the overlap is empty → minimizing H(A,B) tends to minimize the overlap …
Mutual information

• Fundamental concept from information theory:

  I(A,B) = Σ_{a,b} p_AB(a,b) log2 ( p_AB(a,b) / (p_A(a) p_B(b)) )

→ Kullback-Leibler distance between the joint probability distribution and the product of the marginal distributions
→ measure of the statistical dependence of two random variables:
  if A and B are independent → p(a,b) = p(a).p(b) → I(A,B) = 0
→ in general: I(A,B) ≥ 0
• Amount of information that one variable contains about another:

  I(A,B) = H(A) - H(A|B)

→ reduction in the uncertainty about A when knowing B

The mutual information registration criterion

"Mutual information, i.e. the statistical dependence between both images or the information that one contains about the other, is maximal at registration."
Collignon et al., 1995
Viola & Wells, 1995

(Images A and B, transformation Tα, voxel pairs (p,a) and (q,b); α = transformation parameters)
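A small NumPy sketch computing mutual information from a joint histogram, following the definitions above; joint_histogram refers to the earlier sketch.

```python
import numpy as np

def mutual_information(h):
    """Mutual information I(A,B) in bits, computed from a joint histogram h(a,b)."""
    p_ab = h / h.sum()                         # joint distribution p(a,b)
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal p(a)
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal p(b)
    nz = p_ab > 0                              # avoid log(0); empty bins contribute 0
    return np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz]))

# Equivalent formulation: I(A,B) = H(A) + H(B) - H(A,B)
```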

Interpretation

H_A(α), H_B(α): marginal entropy of A and B, respectively
H_AB(α): joint entropy of A and B
I_AB(α): mutual information of A and B

I_AB(α) = H_A(α) + H_B(α) - H_AB(α)

Find as much of the complexity in the separate datasets (maximizing H_A + H_B) such that at the same time they explain each other well (minimizing H_AB).

Example

p(i1,i2):
            I1 = 0   I1 = 1   p2(i2)
I2 = 1      5/36     22/36     3/4
I2 = 0      4/36      5/36     1/4
p1(i1)       1/4       3/4

I = (4/36) log2( (4/36) / (1/4 * 1/4) ) + 2 * (5/36) log2( (5/36) / (3/4 * 1/4) ) + (22/36) log2( (22/36) / (3/4 * 3/4) ) = 0.045

H1 = -( (1/4) log2(1/4) + (3/4) log2(3/4) ) = 0.811
H2 = -( (1/4) log2(1/4) + (3/4) log2(3/4) ) = 0.811
H12 = -( (4/36) log2(4/36) + 2 * (5/36) log2(5/36) + (22/36) log2(22/36) ) = 1.577

I = H1 + H2 - H12 = 0.045
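The worked example can be checked with the mutual_information helper sketched earlier, feeding it the 2x2 joint histogram of counts.

```python
import numpy as np

# Joint histogram of the example (counts of voxel pairs); rows: I2 = 0, 1; columns: I1 = 0, 1
h = np.array([[4.0, 5.0],
              [5.0, 22.0]])

print(mutual_information(h))   # approximately 0.045 bits, as on the slide
```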

Example

p(i1,i2):
            I1 = 0   I1 = 1   p2(i2)
I2 = 1      3/36     24/36     3/4
I2 = 0      6/36      3/36     1/4
p1(i1)       1/4       3/4

I = (6/36) log2( (6/36) / (1/4 * 1/4) ) + 2 * (3/36) log2( (3/36) / (3/4 * 1/4) ) + (24/36) log2( (24/36) / (3/4 * 3/4) ) = 0.204
H1 = H2 = -( (1/4) log2(1/4) + (3/4) log2(3/4) ) = 0.811
H12 = -( (6/36) log2(6/36) + 2 * (3/36) log2(3/36) + (24/36) log2(24/36) ) = 1.418
I = H1 + H2 - H12 = 0.204

Example

p(i1,i2):
            I1 = 0   I1 = 1   p2(i2)
I2 = 1      0/36     27/36     3/4
I2 = 0      9/36      0/36     1/4
p1(i1)       1/4       3/4

I = (9/36) log2( (9/36) / (1/4 * 1/4) ) + (27/36) log2( (27/36) / (3/4 * 3/4) ) = 0.811
H1 = H2 = -( (1/4) log2(1/4) + (3/4) log2(3/4) ) = 0.811
H12 = -( (9/36) log2(9/36) + (27/36) log2(27/36) ) = 0.811
I = H1 + H2 - H12 = 0.811
Example

p(i1,i2):
            I1 = 0   I1 = 1   p2(i2)
I2 = 1      9/36      0/36     1/4
I2 = 0      0/36     27/36     3/4
p1(i1)       1/4       3/4

I = (9/36) log2( (9/36) / (1/4 * 1/4) ) + (27/36) log2( (27/36) / (3/4 * 3/4) ) = 0.811
H1 = H2 = -( (1/4) log2(1/4) + (3/4) log2(3/4) ) = 0.811
H12 = -( (9/36) log2(9/36) + (27/36) log2(27/36) ) = 0.811
I = H1 + H2 - H12 = 0.811

Example

(Images I1, I2, I3)

• The mutual information of I1 and I2 equals the mutual information of I1 and I3!
• For the MMI criterion, both I2 and I3 are equally similar to I1.
• Not the intensity value itself, but its joint histogram is evaluated.

Multimodal character of mutual information

• mathematically well founded, with only few parameters
→ reduces the need for heuristic and application-specific parameter tuning
• histogram-based instead of intensity-based
→ more robust against image degradations (noise, artifacts, local distortions)
• no limitations imposed on the data or on their intensity relationship
→ same algorithm applicable in a variety of applications
• no segmentation required
→ no (or limited) need for user intervention
→ completely automated
• well suited for clinical application
→ quickly adopted as 'standard' method for voxel-based image registration

Robustness of MMI criterion

Example: original image versus itself (original vs identical copy).
(Plot: MI as a function of x translation (mm) over [-10, 10].)

Noise

Original vs original + noise (σ = 50, 100, 500).
(Plot: MI as a function of x translation (mm) over [-10, 10], for each noise level.)

Intensity inhomogeneity

Original vs original x inhomogeneity, with Δ log I = -k ||p - pc||², k = 1e-3, 2e-3, 4e-3.
(Plot: MI as a function of x translation (mm) over [-10, 10], for each k.)
Geometric distortion

Original vs distorted, with Δx = k ||p - pc||², k = 1e-4, 5e-4, 7.5e-4.
(Plot: MI as a function of x translation (mm) over [-10, 10], for each k.)

Limiting assumptions

• Both images should contain similar information…
• If not: insufficient registration clues, many local minima, small capture range
(Example: CT vs PET-FDG)

Limiting assumptions

• The nature of the relationship between image intensities is assumed to be spatially stationary in their region of overlap
• If not: additional joint histogram dispersion, not relevant for registration
• Example: severe intensity inhomogeneity in MRI (use of surface coils)
→ should be rectified prior to registration

Limiting assumptions

• The joint probability density must be estimated reliably …
• This may be problematic if
– the images have low resolution
– the region of overlap at registration is small
• If not: interpolation needed → interpolation artifacts?
• Note: MI(I1(x), I2(T(x))) ≠ MI(I2(y), I1(T⁻¹(y)))
→ sampling of voxels in I1 or I2 respectively
→ interpolation in I2 or I1 respectively
→ different behavior, especially when T is a non-rigid transformation (sampled values from I1/I2 are more or less fixed if the region of overlap is more or less stationary, while interpolated values from I2/I1 may vary significantly)

Alternative measures

• Entropy correlation coefficient:
  ECC = 2 I(A,B) / (H(A) + H(B))
• Normalized mutual information (Studholme, 1998):
  NMI = (H(A) + H(B)) / H(A,B)
→ 0 ≤ ECC ≤ 1
→ NMI = 1 / (1 - ECC/2)
→ Expected to be less sensitive to variations in overlap, i.e. variations in the marginal entropies H(A) and H(B)
• f-information measures:
→ Example: Bhattacharyya distance:
  f = Σ_i q_i sqrt(p_i / q_i) = Σ_i sqrt(p_i q_i) ⇒ Σ_{a,b} sqrt(p_A(a) p_B(b) p_AB(a,b))

Implementation

Pipeline:
• Floating image (A) → sampling (sub/super, multi-resolution) → sample p with a = A(p)
• Transformation Tα → q = Tα(p)
• Reference image (B) → interpolation (NN, TRI, PV) → b = B(q)
• Binning (256 x 256) → joint histogram → I(α)
• Optimization (Powell, gradient) → α*
Sampling

• Start with few samples initially to speed up the criterion evaluation (O(N) time)
• Add more samples as the registration proceeds to improve accuracy
• Alternatively: select a (small) random set of samples at each iteration
→ stochastic optimization procedure

Interpolation

(Floating image I1: sample p with a = I1(p); reference image I2: I2(q) = ?)

Transformed voxel positions q = T(p) of voxels p in the floating image do not coincide with voxel positions in the other image → need for interpolation in the reference image.

Intensity interpolation

• Nearest neighbour (order 0):
  q = T(p); a = A(p), b = B(q1); h(a,b) += 1
• Linear (order 1):
  q = T(p); a = A(p), b_i = B(q_i); b = Σ w_i b_i with Σ w_i = 1; h(a,b) += 1
• Cubic, B-spline, ... (higher order):
  similar to linear, but using more neighbours

Intensity binning

• If the range of image A is [0, N1-1] and the range of B is [0, N2-1], the joint histogram H would have size N1 x N2, e.g. 1024 x 1024 (or more) for medical images.
• If the histogram is large, it will only be sparsely filled, such that small changes in T affect many bins in H.
• Improve robustness by reducing the number of bins, e.g. by remapping all intensities in both images to the range [0, 255] (yielding a 256 x 256 histogram).
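A hedged sketch of the per-sample histogram update with linear interpolation and intensity binning; for simplicity it works in 2D (bilinear) and assumes intensities already rescaled to [0, 255].

```python
import numpy as np

def update_histogram_linear(h, A, B, p, q):
    """One sample update: a = A(p); b = bilinear interpolation of B at q; h[a, b] += 1.
    h: 256x256 joint histogram; A, B: 2D uint8 images; p: integer index pair; q: float position."""
    a = A[p]
    i, j = int(np.floor(q[0])), int(np.floor(q[1]))
    fy, fx = q[0] - i, q[1] - j
    # weights of the 4 neighbours q1..q4, summing to 1
    w = np.array([(1 - fy) * (1 - fx), (1 - fy) * fx, fy * (1 - fx), fy * fx])
    b_i = np.array([B[i, j], B[i, j + 1], B[i + 1, j], B[i + 1, j + 1]], dtype=float)
    b = int(round(w @ b_i))            # interpolated intensity, then binned to 0..255
    h[a, b] += 1
```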

Parzen windowing estimation

• Idea: distribute the contribution of each sample to the joint histogram over multiple bins to make it vary more smoothly with changes in the transformation parameters
• Density estimation using a Gaussian kernel function:
  H(a,b) = (1/N) Σ_i G_σ(a - a_i, b - b_i)
  G_σ = separable Gaussian kernel; N = number of samples i, with intensity pairs (a_i, b_i)
• H(a,b) varies continuously as a function of T
→ H(a,b) and MI can be differentiated analytically
• Computationally more intensive (sum of exponential functions)
→ reduce the number of samples N
• Alternative: partial volume distribution

Partial volume (PV) 'interpolation'

• Instead of computing an intensity value for q, distribute the contribution of this sample to the joint histogram over multiple histogram bins:
  q = T(p); a = A(p), b_i = B(q_i); h(a, b_i) += w_i, with Σ w_i = 1
• Because the fractions w_i vary smoothly with q, the histogram and MI vary smoothly with T (this is not the case with NN and linear interpolation…)
→ histogram and MI analytically differentiable
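A sketch of the PV update, mirroring the linear-interpolation example but spreading the weights over the histogram bins of the four neighbours instead of forming an interpolated intensity (2D, illustrative).

```python
import numpy as np

def update_histogram_pv(h, A, B, p, q):
    """Partial volume update: h[a, B(q_k)] += w_k for the 4 neighbours q_k of q = T(p)."""
    a = A[p]
    i, j = int(np.floor(q[0])), int(np.floor(q[1]))
    fy, fx = q[0] - i, q[1] - j
    neighbours = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
    weights = [(1 - fy) * (1 - fx), (1 - fy) * fx, fy * (1 - fx), fy * fx]
    for (ni, nj), w in zip(neighbours, weights):
        h[a, B[ni, nj]] += w          # no interpolated intensity is ever computed
```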
Optimization of MI

• The histogram H and hence the mutual information MI are a function of the parameters of T,
  e.g. for rigid-body registration: MI = MI(tx, ty, tz, φx, φy, φz)
• Optimization problem: find optimal values for the transformation parameters such that MI is maximal
→ iterative search
→ starting from some initial values
• Problem: avoid local optima …

Initialisation

• Initialise translation to align image centers
• Initialise rotation/scaling to align corresponding image axes relative to the patient (right/left, anterior/posterior, inferior/superior)
• Exploit spatial information in the DICOM header (for images acquired in the same session on the same modality)
• Exploit prior knowledge, e.g. standardized imaging protocols
(Floating image (I1) and reference image (I2) axes X, Y, Z)
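A minimal sketch tying the pieces together: maximize MI over rigid-body parameters with a Powell search, using the helpers sketched earlier (rigid_matrix, joint_histogram, mutual_information); resample_rigid is a hypothetical resampling routine built on scipy.ndimage.affine_transform, and the direction convention of the matrix is glossed over here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.ndimage import affine_transform

def resample_rigid(B, params):
    """Resample image B for the given rigid parameters (illustrative).
    Note: affine_transform maps output voxel coordinates to input coordinates."""
    T = rigid_matrix(params)                       # 4x4 matrix from the earlier sketch
    return affine_transform(B, T[:3, :3], offset=T[:3, 3], order=1)

def neg_mi(params, A, B):
    """Cost = -MI(A, B after transformation): minimizing this maximizes mutual information."""
    h = joint_histogram(A, resample_rigid(B, params), bins=64)
    return -mutual_information(h)

def register_mi(A, B, x0=np.zeros(6)):
    res = minimize(neg_mi, x0, args=(A, B), method="Powell")
    return res.x                                   # optimal (tx, ty, tz, rx, ry, rz)
```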

Behavior of MI

Registration of high-resolution CT and MR (MR: 1 x 1 x 1 mm, CT: 1 x 1 x 1.5 mm).
(Plot: MI traces for in-slice rotation around the registered position, [-180, +180] degrees, for NN, TRI and PV interpolation.)

Behavior of MI

(Plot: MI traces for in-slice rotation around the registered position, [-0.5, +0.5] degrees, shown separately for NN, TRI and PV interpolation.)
Optimization of MI

The presence of local optima deteriorates registration robustness.
(Plot: MI trace with NN interpolation, [-0.5, +0.5] degrees; global and local optima marked.)

Influence of interpolation on optimization

The presence of local optima deteriorates registration robustness more with NN than with TRI or PV interpolation.

Same registration experiment, using different interpolation methods and starting from different initial parameter values; α = parameter vector, α* = optimal value for a particular interpolation type.
(Plot: I(α*) - I(α) versus |α - α*| (mm, degrees) for NN, TRI and PV.)
Optimization strategies

• Non-gradient methods
– heuristic search, Powell, simplex
• Gradient methods
– steepest descent
– conjugate gradient, quasi-Newton
– Levenberg-Marquardt
• Multi-resolution, especially for non-rigid registration
→ Coarse global registration first, finer local registration next
→ Improves convergence to the correct optimum

Gradient of mutual information

With PV interpolation, the weights w_i of the neighbours q1..q4 vary smoothly with q, so their derivatives dw_i/dq (and hence the gradient of the histogram and of MI) can be computed analytically.

For affine registration:
1 gradient evaluation ~ 12 criterion evaluations

Optimization strategies

(Figure: search trajectories in a 2-D parameter subspace of rigid-body registration, for non-gradient methods (Powell, simplex) and gradient methods (steepest descent, conjugate gradient, quasi-Newton, Levenberg-Marquardt).)

Performance

• CJG, SMP, LVM > POW, STD, QSN with the same precision
• Important speedup by the multi-resolution approach
(Bar charts: Ne = number of equivalent full-resolution criterion evaluations, per optimizer (POW, SMP, STD, CJG, QSN, LVM) and for CJG with various multi-resolution strategies.)

Visual inspection of accuracy

Reslice, moving window, linked cursor (CT, MR).

Quantitative validation of accuracy

Retrospective Registration Evaluation Project (RREP) (J.M. Fitzpatrick et al., 1996)
• comparative validation of retrospective registration techniques
• for CT/MR and PET/MR matching of the brain
• using the stereotactic registration solution as the gold standard
• blind study: images were edited to remove markers
• study demonstrated the subvoxel accuracy of the MI matching criterion

J. West, J.M. Fitzpatrick et al., Comparison and evaluation of retrospective intermodality image registration techniques, JCAT, 21(4), 1997.
Accuracy

RREP study 1: CT to MR
- 7 patient datasets:
  CT: 512 x 512, 28-34 slices, 0.65 x 0.65 x 4 mm
  MR: axial PD, T1, T2 and rectified images, 256 x 256, 20-26 slices, 1.25 x 1.25 x 4 mm
- compared with stereotactic reference
- evaluated at 8 points near the brain surface

(Plot "RREP 1: CT / MR": error (mm), 0 to 6, per MR protocol: PD, PDr, T1, T1r, T2, T2r.)

Accuracy

RREP study 1: PET to MR
- 7 patient datasets:
  PET: 128 x 128, 15 slices, 2.59 x 2.59 x 8 mm
  MR: axial PD, T1, T2 and rectified images, 256 x 256, 20-26 slices, 1.25 x 1.25 x 4 mm
- compared with stereotactic reference
- evaluated at 8 points near the brain surface

(Plot "RREP 1: PET / MR": error (mm), 0 to 12, per MR protocol: PD, PDr, T1, T1r, T2, T2r.)

Accuracy

RREP study 2: CT to MR
- 9 patient datasets:
  CT: 512 x 512, 44-50 slices, 0.4 x 0.4 x 3 mm
  MR: axial PD, T1, T2, 256 x 256, 52 slices, 0.8 x 0.8 x 4 mm
  sagittal / coronal MPRAGE, 256 x 256, 128 slices, 1 x 1 x 1.25 mm
- compared with stereotactic reference
- evaluated at 8 points near the brain surface

(Plot "RREP 2: CT / MR": error (mm), 0 to 4, per MR protocol: MP, PD, T1, T2.)
Quality of gold standard?

(Figure: compare the reference solution (wrong) with the MI solution (correct).)

Impact of geometric distortion?

(Figure: compare the reference solution (correct) with the MI solution (wrong).)

Non-rigid transformations

General mapping of (x1,y1,z1) to (x2,y2,z2):
  x2 = f1(x1,y1,z1)
  y2 = f2(x1,y1,z1)
  z2 = f3(x1,y1,z1)

Written as a deformation field (u,v,w):
  x2 = x1 + u(x1,y1,z1)
  y2 = y1 + v(x1,y1,z1)
  z2 = z1 + w(x1,y1,z1)

(Original / Deformed: valid / Deformed: not valid)
Impose regularization constraints to assure the deformation field is physically valid (e.g. spatially smooth, no folding or tearing of tissues...)

Regularization of non-rigid registration

(1) Intrinsic smoothness imposed implicitly by the parameterization, typically using basis functions:
• Poly-affine transformation (global support): locally weighted average of affine transformations Ti (Gaussian weights wi anchored around points ai) (Arsigny et al., 2004)
• B-spline (local support): piecewise polynomials on a regular grid
• Thin-plate spline (global support): arbitrary control point locations
• Radial basis functions (local or global support): e.g. Gaussian, truncated polynomials, …

→ continuous, analytically differentiable
→ degrees of freedom controlled by the parameterization (e.g. grid spacing, number of control points)
→ implicitly smooth at small scale, explicit regularization needed at larger scale

B-splines

(B-spline free-form deformation: the deformation field is a sum of B-spline basis functions defined on a regular control-point grid, in 2D and 3D.)

Thin-plate spline

x'(x) = A x + t + Σ_i c_i U(|x - p_i|), with U the thin-plate radial basis function

Registration: find (A, t, c) such that I1(x) and I2(x'(x)) are most similar
Warping: given N point pairs (pi,qi), find (A, t, c) such that x'(pi) = qi, ∀i
→ Closed-form solution
→ Extrapolates the discrete point correspondences to the whole spatial domain x
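A hedged 2D sketch of thin-plate-spline warping: solve the linear system for (A, t, c) from N point pairs and evaluate x'(x); it uses the 2D kernel U(r) = r² log r and assumes a non-degenerate landmark configuration.

```python
import numpy as np

def tps_kernel(r):
    """2D thin-plate kernel U(r) = r^2 log(r), with U(0) = 0."""
    return np.where(r > 0, r**2 * np.log(r + 1e-20), 0.0)

def tps_fit(p, q):
    """Fit a 2D TPS x' = A x + t + sum_i c_i U(|x - p_i|) mapping landmarks p (N,2) to q (N,2)."""
    n = len(p)
    K = tps_kernel(np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1))  # (N, N)
    P = np.hstack([np.ones((n, 1)), p])                                     # affine part [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = q
    W = np.linalg.solve(L, rhs)       # rows 0..n-1: c_i ; rows n..n+2: affine coefficients
    return W[:n], W[n:]

def tps_warp(x, p, c, affine):
    """Evaluate the fitted TPS at points x (M,2)."""
    U = tps_kernel(np.linalg.norm(x[:, None, :] - p[None, :, :], axis=-1))  # (M, N)
    return np.hstack([np.ones((len(x), 1)), x]) @ affine + U @ c
```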
Regularization of non-rigid registration

(2) Smoothness imposed explicitly through penalty functions:

Cost function: E(u) = -E_similarity(u, I1, I2) + γ1 * E_penalty1(u) + γ2 * E_penalty2(u) + …
(registration criterion = similarity measure; regularization penalty = local smoothness constraint)

R = registration domain, V_R = volume of R, g = geometric transformation, J = Jacobian of g, |J| = determinant of J

→ heuristic, introduces additional parameters, need for tuning, …

Regularization of non-rigid registration

(3) Physics-based regularization:

Elastic: μ ∇²u + (λ + μ) ∇(∇·u) + F(x, u) = 0
Viscous fluid: μ ∇²v + (λ + μ) ∇(∇·v) + F(x, u) = 0

u: deformation field
v: deformation velocity field
F: external force field (similarity measure)
λ, μ: material parameters

→ equation of motion of an elastic / viscous material under the influence of a force F
→ F designed such that the deformation maximizes the registration criterion
→ solved numerically on the discrete voxel grid using finite difference approximations
→ interpolation needed to estimate the transformation at between-voxel locations
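A brief sketch of the penalty-based formulation (2): total cost = negative similarity plus a weighted smoothness penalty, here a simple first-order (gradient magnitude) penalty on a discrete displacement field; the weight and the penalty choice are illustrative.

```python
import numpy as np

def smoothness_penalty(u):
    """First-order smoothness penalty: mean squared spatial gradient of the displacement field.
    u: (X, Y, Z, 3) displacement field on the voxel grid."""
    grads = np.gradient(u, axis=(0, 1, 2))            # du/dx, du/dy, du/dz
    return sum(np.mean(g**2) for g in grads)

def total_cost(u, i1, i2_warped, gamma=0.1):
    """E(u) = -E_similarity + gamma * E_penalty; negative SSD stands in for the similarity term."""
    similarity = -np.mean((i1 - i2_warped) ** 2)      # could be mutual information instead
    return -similarity + gamma * smoothness_penalty(u)
```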

Biomechanical models

• Regularization depends on tissue type
  e.g. different tissues have different elasticity
• Computational approaches:
– Mass-spring models
  • Spring stiffness varies with tissue type
– Finite-element models
  • Elasticity modulus varies spatially
– Voxel-based approaches
  • Penalty term is a function of local tissue type
• Tissue modeling
– Requires segmentation: indication of tissue type in each voxel?
– Appropriate tissue-specific deformation models?

Example: maxillofacial soft-tissue deformations

W. Mollemans et al., MICCAI'04

Statistical deformation model

u = Σi wi ui

Modes of deformation derived from a training set of manually matched image pairs by PCA
(modes: Tx, Ty, sx, sy, rz, 'breathing', higher-order modes 7-12)

D. Loeckx et al., IEEE TMI, 2003

Application: brain image registration

Same patient, different time point; deformation model.
Application: multispectral analysis

(T1, PD, T2, FLAIR, DTI)

Application: multitemporal analysis

(Time 1 vs Time 2)
Application: thorax tumor staging

Localisation in CT, detection in PET.
(CT and PET transmission matched using MMI; PET-FDG emission aligned to PET transmission by acquisition.)

Application: prostate radiotherapy planning

Application: prostate radiotherapy planning

(CT, MR; impact on treatment plan.)

Application: radiotherapy follow-up

Radiotherapy planning CT, radiotherapy follow-up CT, planning matched to follow-up.
• Spline-based representation
• Local volume-preserving constraint depending on tissue type
Application: motion compensation

(Affine vs non-rigid.)

Application: subtraction CTA

Pre-contrast CT, post-contrast CT: without registration vs with non-rigid registration.

Application: assessment of alveolar bone cleft graft from pre- and post-operative CT

9-year-old child with an innate bone defect in the lower jaw.
Pre-operative CT (10 days before intervention…) and post-operative CT (…10 months after intervention): graft?

Application: assessment of alveolar bone cleft graft from pre- and post-operative CT

Post-operative assessment of graft quality: delineate the cleft on the pre-op image… …to identify the graft on the post-op image (cleft → graft!).

Application: geometric accuracy of CT imaging

Hardware phantom with known geometry: ESP (anthropomorphic spine phantom); CT image.

Image to model registration

The geometrical model (known, ideal geometry) is matched to the CT image for comparison.
Histogram dispersion

(Joint histograms: unregistered vs registered.)

Model-guided measurements

• Ideal contours (from the model, by registration)
• Image contours (by segmentation)
• Accuracy validation (by comparison)
(Image intensity profiles for cortical wall thickness 1.5 mm, 1.0 mm, 0.5 mm.)

Inter-subject registration: study-to-template matching

Patient data (MRI 1, MRI 2, …, MRI n, fMRI) → patient-to-template registration → template in Talairach space.

Inter-subject registration: study-to-template matching

Affine registration → coarse, global alignment.
Inter-subject registration: automated brain segmentation

Patient MR scan → tissue segmentation (gray matter, white matter, csf, other), combining image information (intensity model) and a priori knowledge (tissue atlas, brought into correspondence by registration).

Application: inter-subject matching
Inter-subject registration: atlas construction

Rigid registration; non-rigid registration.

Inter-subject registration: atlas-based segmentation

(Original, rigid, non-rigid.)
Strategies for image registration

• External markers (e.g. stereotactic surgery): not retrospective
• Point based (anatomical or geometrical landmarks): interactive; correspondence?
• Surface based (objects): segmentation? correspondence?
• Voxel based:
– intensity differences: unimodal
– intensity correlation: linear relation
– histogram dispersion → mutual information!

Conclusion

• Maximization of mutual information is highly successful for affine image registration
• MMI is very general, robust and accurate
• MMI requires no segmentation or user intervention → completely automated
• Highly suited for routine clinical use (already commercially available)
• Non-rigid matching using MI is still an active area of research ...

Next class

• Iconic shape models
