Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016
19th International Conference
Athens, Greece, October 17–21, 2016
Proceedings, Part I
Lecture Notes in Computer Science 9900
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7412
Editors

Sebastien Ourselin
University College London
London, UK

Leo Joskowicz
The Hebrew University of Jerusalem
Jerusalem, Israel

Mert R. Sabuncu
Harvard Medical School
Boston, MA, USA

Gozde Unal
Istanbul Technical University
Istanbul, Turkey

William Wells
Harvard Medical School
Boston, MA, USA
LNCS Sublibrary: SL6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
as well as the reviewers for their support during the review process. We also thank
Andreas Maier for his support in editorial tasks. Last but not least, we thank our
sponsors for the financial support that made the conference possible.
We look forward to seeing you in Quebec City, Canada, in 2017!
General Chair
Sebastien Ourselin University College London, London, UK
General Co-chair
Aytül Erçil Sabanci University, Istanbul, Turkey
Program Chair
William Wells Harvard Medical School, Boston, MA, USA
Program Co-chairs
Mert R. Sabuncu A.A. Martinos Center for Biomedical Imaging,
Charlestown, MA, USA
Leo Joskowicz The Hebrew University of Jerusalem, Israel
Gozde Unal Istanbul Technical University, Istanbul, Turkey
Industrial Liaison
Tanveer Syeda-Mahmood IBM Almaden Research Center, San Jose, CA, USA
Publication Chair
Andreas Maier Friedrich-Alexander-Universität Erlangen-Nürnberg,
Erlangen, Germany
Executive Officers

President and Board Chair: Wiro Niessen
Executive Director (Managing Educational Affairs): Li Shen
Secretary (Coordinating MICCAI Awards): Gabor Fichtinger
Treasurer: Stephen Aylward
Elections Officer: Rich Robb

Non-Executive Officers

Society Secretariat: Janette Wallace, Canada
Recording Secretary and Web Maintenance: Jackie Williams, Canada
Fellows Nomination Coordinator: Terry Peters, Canada
Program Committee
Arbel, Tal McGill University, Canada
Cardoso, Manuel Jorge University College London, UK
Castellani, Umberto University of Verona, Italy
Cattin, Philippe C. University of Basel, Switzerland
Chung, Albert C.S. Hong Kong University of Science and Technology,
Hong Kong
Cukur, Tolga Bilkent University, Turkey
Delingette, Herve Inria, France
Feragen, Aasa University of Copenhagen, Denmark
Freiman, Moti Philips Healthcare, Israel
Glocker, Ben Imperial College London, UK
Goksel, Orcun ETH Zurich, Switzerland
Gonzalez Ballester, Miguel Angel Universitat Pompeu Fabra, Spain
Grady, Leo HeartFlow, USA
Greenspan, Hayit Tel Aviv University, Israel
Howe, Robert Harvard University, USA
Isgum, Ivana University Medical Center Utrecht, The Netherlands
Jain, Ameet Philips Research North America, USA
Jannin, Pierre University of Rennes, France
Joshi, Sarang University of Utah, USA
Kalpathy-Cramer, Jayashree Harvard Medical School, USA
Kamen, Ali Siemens Corporate Technology, USA
Knutsson, Hans Linkoping University, Sweden
Konukoglu, Ender Harvard Medical School, USA
Landman, Bennett Vanderbilt University, USA
Langs, Georg University of Vienna, Austria
Reviewers
Brain Analysis
Alzheimer Disease
Pareto Front vs. Weighted Sum for Automatic Trajectory Planning of Deep Brain Stimulation
Noura Hamzé, Jimmy Voirin, Pierre Collet, Pierre Jannin, Claire Haegelen, and Caroline Essert
Segmentation
Shape Modeling
Basal Slice Detection Using Long-Axis Segmentation for Cardiac Analysis
Mahsa Paknezhad, Michael S. Brown, and Stephanie Marchesseau
Image Reconstruction
Joint Estimation of Cardiac Motion and T1 Maps for Magnetic Resonance Late Gadolinium Enhancement Imaging
Jens Wetzl, Aurélien F. Stalder, Michaela Schmidt, Yigit H. Akgök, Christoph Tillmanns, Felix Lugauer, Christoph Forman, Joachim Hornegger, and Andreas Maier

Tight Graph Framelets for Sparse Diffusion MRI q-Space Representation
Pew-Thian Yap, Bin Dong, Yong Zhang, and Dinggang Shen
Abstract. Brain connectivity networks have been widely used for diag-
nosis of brain-related diseases, e.g., Alzheimer’s disease (AD), mild cog-
nitive impairment (MCI), and attention deficit hyperactivity disorder
(ADHD). Although several network descriptors have been designed for
representing brain connectivity networks, most of them not only ignore
the important weight information of edges but also fail to capture the
modular local structure of such networks, because they focus only on
individual brain regions. In this paper, we propose a new network
descriptor (called ordinal pattern) for brain connectivity networks, and
apply it for brain disease diagnosis. Specifically, we first define ordinal
patterns that contain sequences of weighted edges based on a functional
connectivity network. A frequent ordinal pattern mining algorithm is
then developed to identify those frequent ordinal patterns in a brain
connectivity network set. We further perform discriminative ordinal pattern
selection, followed by an SVM classification process. Experimental
results on both the ADNI and the ADHD-200 data sets demonstrate
that the proposed method achieves significant improvement compared
with state-of-the-art brain connectivity network based methods.
1 Introduction
As a modern brain mapping technique, functional magnetic resonance imaging
(fMRI) is an efficient as well as non-invasive way to map the patterns of func-
tional connectivity of the human brain [1,2]. In particular, functional networks
derived from task-free (resting-state) fMRI (rs-fMRI) exhibit a small-world
architecture, which reflects a robust functional organization of the brain. Recent
studies [3–6] show the great promise of brain connectivity networks for
understanding the pathology of brain diseases (e.g., AD, MCI, and ADHD) by
exploring anatomical connections or functional interactions among different brain regions, where
brain regions are treated as nodes and anatomical connections or functional
associations are regarded as edges.
Several network descriptors have been developed for representing brain con-
nectivity networks, such as node degrees [3], clustering coefficients [4], and sub-
networks [7]. Most existing descriptors are designed for unweighted brain con-
nectivity networks, where the valuable weight information of edges is ignored.
M. Liu and J. Du contributed equally to this paper.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 1–9, 2016.
DOI: 10.1007/978-3-319-46720-7_1
2 M. Liu et al.
Actually, different edges are usually assigned different weights to measure the
connectivity strength between pairs of nodes (w.r.t. brain regions). However, pre-
vious studies usually simply apply thresholds to transform the original weighted
networks into un-weighted ones [2,5], which may lead to sub-optimal learning
performance. In addition, existing descriptors mainly focus on individual brain
regions rather than on local structures of brain networks, although much evidence
indicates that some brain diseases (e.g., AD and MCI) are highly related to modular
local structures [8]. Unfortunately, it is hard to capture such local structures
using existing network descriptors.
Fig. 1. An overview of ordinal pattern based learning for brain disease diagnosis. (The figure shows patients' and normal controls' weighted networks passing through frequent ordinal pattern mining, discriminative ordinal pattern selection, and SVM classification.)
2 Method
2.1 Data and Preprocessing
The first data set contains rs-fMRI data from the ADNI1 database with 34 AD
patients, 99 MCI patients, and 50 NCs. The rs-fMRI data were pre-processed by
brain skull removal, motion correction, temporal pre-whitening, spatial smooth-
ing, global drift removal, slice time correction, and band-pass filtering. By warping
the automated anatomical labelling (AAL) [9] template, the brain space of each
subject's rs-fMRI scan is parcellated into 90 regions of interest (ROIs). For each
ROI, the rs-fMRI time series of all voxels were averaged to obtain the mean
time series of the ROI. With ROIs as nodes and Pearson correlations between
pairs of ROIs as connectivity weights, a fully connected weighted functional
network is constructed for each subject. The second data set is ADHD-200 with the
Athena-preprocessed rs-fMRI data, including 118 ADHD patients and 98 NCs
(a detailed description of data acquisition and post-processing is given online²).
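The network-construction step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the array shapes and variable names are assumptions.

```python
import numpy as np

def build_functional_network(time_series):
    """Build a fully connected weighted functional network.

    time_series: (T, R) array of mean rs-fMRI time series, one column
    per ROI (R = 90 for the AAL atlas in the paper). Returns an (R, R)
    matrix of Pearson correlations used as edge weights between ROIs.
    """
    # np.corrcoef expects variables in rows, so transpose.
    weights = np.corrcoef(time_series.T)
    np.fill_diagonal(weights, 0.0)  # no self-connections
    return weights

# Toy example: 120 time points, 4 ROIs.
rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 4))
W = build_functional_network(ts)
```

The resulting matrix is symmetric, with each off-diagonal entry giving the connectivity strength between a pair of ROIs.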
Fig. 2. Illustration of (a) ordinal patterns, and (b) frequent ordinal pattern mining method. (Panel (a) shows example ordinal patterns in a weighted network, e.g., op1 = {ea−b, eb−c} and op4 = {ea−b, eb−c, ec−e}; panel (b) shows the level-wise mining tree from the root node, with infrequent patterns discarded.)

1 http://adni.loni.usc.edu/.
2 http://www.nitrc.org/plugins/mwiki/index.php/neurobureau:AthenaPipeline.
ordinal patterns containing three edges, e.g., op4 = {ea−b, eb−c, ec−e}. Hence, the
proposed ordinal pattern can be regarded as the combination of some ordinal
relations between pairs of edges. We only consider connected ordinal patterns
in this study. That is, an ordinal pattern is connected if and only if the edges
it contains can construct a connected sub-network. Different from conventional
methods, the ordinal pattern is defined on a weighted network directly to explic-
itly utilize the weight information of edges. Also, as a special sub-network, an
ordinal pattern can model the ordinal relations conveyed in a weighted network,
and thus, can naturally preserve the local structures of the network.
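A small sketch may make the definition concrete. Based on the examples in Fig. 2, we read an ordinal pattern as a connected edge sequence with strictly decreasing weights; this ordering rule is our interpretation, and the helper below is hypothetical, not the paper's implementation.

```python
def is_ordinal_pattern(edges, weights):
    """Check whether an edge sequence forms a connected ordinal pattern.

    Assumption (from the op examples in Fig. 2): edge weights strictly
    decrease along the sequence, and the edges form a connected path.
    edges: list of (u, v) node pairs, e.g. [('a', 'b'), ('b', 'c')]
    weights: dict mapping frozenset((u, v)) -> edge weight
    """
    # Consecutive edges must share a node, so the edges build a
    # connected sub-network (a path, in this simplified sketch).
    for (u1, v1), (u2, v2) in zip(edges, edges[1:]):
        if not ({u1, v1} & {u2, v2}):
            return False
    # Edge weights must strictly decrease along the sequence.
    ws = [weights[frozenset(e)] for e in edges]
    return all(w1 > w2 for w1, w2 in zip(ws, ws[1:]))

# op4 = {e_a-b, e_b-c, e_c-e} with weights 0.7 > 0.5 > 0.3, as in Fig. 2.
w = {frozenset(('a', 'b')): 0.7,
     frozenset(('b', 'c')): 0.5,
     frozenset(('c', 'e')): 0.3}
print(is_ordinal_pattern([('a', 'b'), ('b', 'c'), ('c', 'e')], w))  # True
```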
where D− denotes the NCs' brain network set, and ε is a small value that prevents
the denominator from being 0. Similarly, for a frequent ordinal pattern opj mined
from the NCs' brain network set (i.e., D−), the ratio score is computed as

$$RS(op_j) = \log\left(\frac{|\{G_n \mid op_j \text{ is an ordinal pattern of } G_n,\ G_n \in D^-\}|}{|\{G_n \mid op_j \text{ is an ordinal pattern of } G_n,\ G_n \in D^+\}| + \epsilon} \times \frac{|D^+|}{|D^-|}\right) \qquad (3)$$
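The ratio score of Eq. (3) can be sketched directly. The function name and arguments below are illustrative; only the formula itself comes from the paper.

```python
import math

def ratio_score(n_neg_with_op, n_pos_with_op, n_pos, n_neg, eps=0.1):
    """Ratio score of Eq. (3) for a pattern mined from the NC set D-.

    n_neg_with_op / n_pos_with_op: how many networks in D- / D+
    contain the ordinal pattern; n_pos, n_neg: |D+| and |D-|.
    eps keeps the denominator away from zero (0.1 in the experiments).
    """
    return math.log(n_neg_with_op / (n_pos_with_op + eps)
                    * (n_pos / n_neg))

# A pattern present in 30 of 50 NC networks but only 5 of 50 patient
# networks gets a clearly positive score.
score = ratio_score(30, 5, n_pos=50, n_neg=50)
```

Patterns that are much more frequent in one class than the other thus receive large scores and are kept as discriminative.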
3 Experiments
Experimental Settings: We perform three classification tasks, i.e., AD vs.
NC, MCI vs. NC and ADHD vs. NC classification, by using a 10-fold cross-
validation strategy. Note that those discriminative ordinal patterns are selected
only from training data. Classification performance is evaluated by accuracy
(ACC), sensitivity (SEN), specificity (SPE) and area under the ROC curve
(AUC). The parameter ε in the ratio scores in Eqs. (2) and (3) is set to 0.1 empirically.
With an inner cross-validation strategy, the level number in our frequent ordinal
pattern mining algorithm is chosen from [2, 6] with step 1, and the number of
discriminative ordinal patterns is chosen from [10, 100] with step 10.
We compare our method with two network descriptors widely used in brain
connectivity network based studies: clustering coefficients [4] and dis-
criminative sub-networks [7]. Since these two descriptors require a threshold-
ing process, we adopt both single-threshold and multi-threshold [5,11] strate-
gies to transform weighted networks into unweighted ones. In summary, there
are four competing methods, including (1) clustering coefficients (CC) with
single-threshold, (2) clustering coefficient using multi-thresholds (CCMT), (3)
discriminative sub-networks (DS) with single-threshold, and (4) discriminative
sub-networks using multi-thresholds (DSMT). The linear SVM with the default
parameter (i.e., C = 1) is used as the classifier in different methods.
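The evaluation protocol (10-fold cross-validation with a linear SVM, C = 1) can be sketched as below. The random feature matrix stands in for the ordinal-pattern features, which are not reproduced here; class sizes mirror the AD vs. NC task.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy stand-in for ordinal-pattern feature vectors: random features
# are used only to demonstrate the evaluation protocol.
rng = np.random.default_rng(0)
X = rng.standard_normal((84, 40))      # e.g. 34 AD + 50 NC subjects
y = np.array([1] * 34 + [0] * 50)      # 1 = patient, 0 = NC

# Linear SVM with the default penalty C = 1, 10-fold cross-validation.
clf = SVC(kernel='linear', C=1.0)
acc = cross_val_score(clf, X, y, cv=10, scoring='accuracy')
print(f'mean ACC: {acc.mean():.3f}')
```

In the paper, the discriminative ordinal patterns are selected inside each training fold only, which avoids leaking test information into feature selection.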
Results: Experimental results are listed in Table 1, from which we can see
that our method consistently achieves the best performance in three tasks. For
instance, the accuracy achieved by our method is 94.05 % in AD vs. NC clas-
sification, which is significantly better than the second best result obtained by
DSMT. This demonstrates that the ordinal patterns are discriminative in dis-
tinguishing AD/MCI/ADHD patients from NCs, compared with conventional
network descriptors.
We further plot those top 2 discriminative ordinal patterns identified by
our method in three tasks in Fig. 3. For instance, the most discriminative
ordinal pattern for AD, shown in the top left of Fig. 3(a), can be recorded as
op = {eDCG.L−ACG.L, eACG.L−ROL.L, eROL.L−PAL.R, ePAL.R−LING.L, ePAL.R−MOG.R}.
Fig. 3. The most discriminative ordinal patterns identified by the proposed method in
three tasks. In each row, the first two columns show those top 2 discriminative ordinal
patterns selected from positive classes (i.e., AD, MCI, and ADHD), while the last two
columns illustrate those selected from the negative class (i.e., NC).
These results imply that the proposed ordinal patterns do reflect some local
structures of original brain networks.
We investigate the influence of frequent ordinal pattern mining level and the
number of selected discriminative ordinal patterns, with results shown in Fig. 4.
From this figure, we can see that our method achieves relatively stable results
when the number of selected ordinal patterns is larger than 40. Also, our method
achieves overall good performance when the level number in the frequent ordinal
pattern mining algorithm is 4 in AD/MCI vs. NC classification and 5 in ADHD
vs. NC classification, respectively.
We perform an additional experiment by using the weights of each edge in ordinal
patterns as raw features, and achieve accuracies of 71.43 %, 67.11 %, and
69.91 % in AD vs. NC, MCI vs. NC, and ADHD vs. NC classification, respectively.
We further utilize a real-valued network descriptor based on ordinal patterns
(taking the product of weights in each ordinal pattern), and obtain the
accuracies of 78.52 %, 72.37 %, and 72.69 % in the three tasks, respectively.
Fig. 4. Influence of the level number in frequent ordinal pattern mining method and the
number of discriminative ordinal patterns in AD vs. NC (left), MCI vs. NC (middle),
and ADHD vs. NC (right) classification.
4 Conclusion
In this paper, we propose a new network descriptor (i.e., ordinal pattern)
for brain connectivity networks. The proposed ordinal patterns are defined on
weighted networks, and thus preserve the weight information of edges and the
local structures of the original brain networks. Then, we develop an ordinal pattern
based brain network classification method for the diagnosis of AD/MCI and
ADHD. Experimental results on both ADNI and ADHD-200 data sets demon-
strate the efficacy of our method.
References
1. Robinson, E.C., Hammers, A., Ericsson, A., Edwards, A.D., Rueckert, D.: Identi-
fying population differences in whole-brain structural networks: a machine learning
approach. NeuroImage 50(3), 910–919 (2010)
2. Sporns, O.: From simple graphs to the connectome: networks in neuroimaging.
NeuroImage 62(2), 881–886 (2012)
3. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: uses
and interpretations. NeuroImage 52(3), 1059–1069 (2010)
4. Wee, C.Y., Yap, P.T., Li, W., Denny, K., Browndyke, J.N., Potter, G.G., Welsh-
Bohmer, K.A., Wang, L., Shen, D.: Enriched white matter connectivity networks
for accurate identification of MCI patients. NeuroImage 54(3), 1812–1822 (2011)
5. Jie, B., Zhang, D., Wee, C.Y., Shen, D.: Topological graph kernel on multiple
thresholded functional connectivity networks for mild cognitive impairment classi-
fication. Hum. Brain Mapp. 35(7), 2876–2897 (2014)
6. Liu, M., Zhang, D., Shen, D.: Relationship induced multi-template learning for
diagnosis of Alzheimer disease and mild cognitive impairment. IEEE Trans. Med.
Imaging 35(6), 1463–1474 (2016)
7. Fei, F., Jie, B., Zhang, D.: Frequent and discriminative subnetwork mining for mild
cognitive impairment classification. Brain Connect. 4(5), 347–360 (2014)
Ordinal Patterns for Brain Connectivity Network
8. Brier, M.R., Thomas, J.B., Fagan, A.M., Hassenstab, J., Holtzman, D.M.,
Benzinger, T.L., Morris, J.C., Ances, B.M.: Functional connectivity and
graph theory in preclinical Alzheimer’s disease. Neurobiol. Aging 35(4), 757–768
(2014)
9. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O.,
Delcroix, N., Mazoyer, B., Joliot, M.: Automated anatomical labeling of activations
in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject
brain. NeuroImage 15(1), 273–289 (2002)
10. Yan, X., Cheng, H., Han, J., Yu, P.S.: Mining significant graph patterns by leap
search. In: Proceedings of ACM SIGMOD International Conference on Manage-
ment of Data, pp. 433–444. ACM (2008)
11. Sanz-Arigita, E.J., Schoonheim, M.M., Damoiseaux, J.S., Rombouts, S.,
Maris, E., Barkhof, F., Scheltens, P., Stam, C.J., et al.: Loss of 'small-
world' networks in Alzheimer's disease: graph analysis of fMRI resting-state func-
tional connectivity. PLoS ONE 5(11), e13788 (2010)
Discovering Cortical Folding Patterns
in Neonatal Cortical Surfaces Using
Large-Scale Dataset
Abstract. The cortical folding of the human brain is highly complex and
variable across individuals. Mining the major patterns of cortical folding from
modern large-scale neuroimaging datasets is of great importance in advancing
techniques for neuroimaging analysis and understanding the inter-individual
variations of cortical folding and its relationship with cognitive function and
disorders. As the primary cortical folding is genetically influenced and already
established at term birth, neonates, with minimal exposure to complicated
postnatal environmental influences, are ideal candidates for understanding the
major patterns of cortical folding. In this paper, for the first time, we propose a
novel method for discovering the major patterns of cortical folding in a
large-scale dataset of neonatal brain MR images (N = 677). In our method, first,
cortical folding is characterized by the distribution of sulcal pits, which are the
locally deepest points in cortical sulci. Because deep sulcal pits are genetically
related, relatively consistent across individuals, and also stable during brain
development, they are well suited to representing and characterizing cortical
folding. Then, the similarities between sulcal pit distributions of any two sub-
jects are measured from spatial, geometrical, and topological points of view.
Next, these different measurements are adaptively fused together using a simi-
larity network fusion technique, to preserve their common information and also
capture their complementary information. Finally, leveraging the fused similarity
measurements, a hierarchical affinity propagation algorithm is used to group
similar sulcal folding patterns together. The proposed method has been applied
to 677 neonatal brains (the largest neonatal dataset to our knowledge) in the
central sulcus, superior temporal sulcus, and cingulate sulcus, and revealed
multiple distinct and meaningful folding patterns in each region.
1 Introduction
The human cerebral cortex is a highly convoluted and complex structure. Its cortical
folding is quite variable across individuals (Fig. 1). However, certain common folding
patterns exist in some specific cortical regions, as shown in the classic textbook [1],
which examined 25 adult autopsy brain specimens. Mining the major representative
patterns of cortical folding from modern large-scale datasets is of great importance in
advancing techniques for neuroimaging analysis and understanding the inter-individual
variations of cortical folding and their relationship with structural connectivity, cog-
nitive function, and brain disorders. For example, in cortical surface registration [2],
typically a single cortical atlas is constructed for a group of brains. Such an atlas may
not be able to reflect some important patterns of cortical folding, due to the averaging
effect, thus leading to poor registration accuracy for some subjects that cannot be well
characterized by the folding patterns in the atlas. Building multiple atlases, with each
representing one major pattern of cortical folding, will lead to boosted accuracy in
cortical surface registration and subsequent group-level analysis.
Fig. 1. Huge inter-individual variability of sulcal folding patterns in neonatal cortical surfaces,
colored by the sulcal depth. Sulcal pits are shown by white spheres.
for discovering the major cortical patterns. This is very important for understanding the
biological relationships between cortical folding and brain functional development or
neurodevelopmental disorders rooted during infancy. The motivation of using a
large-scale dataset is that small datasets may not sufficiently cover all kinds of major
cortical patterns and thus would likely lead to biased results.
In our method, we leveraged the reliable deep sulcal pits to characterize the cortical
folding, thereby eliminating the effects of noisy shallow folding regions that are
extremely heterogeneous and variable. Specifically, first, sulcal pits were extracted
using a watershed algorithm [8] and represented using a sulcal graph. Then, the dif-
ference between sulcal pit distributions of any two cortices was computed based on six
complementary measurements, i.e., sulcal pit position, sulcal pit depth, ridge point
depth, sulcal basin area, sulcal basin boundary, and sulcal pit local connection, thus
resulting in six matrices. Next, these difference matrices were further converted to
similarity matrices, and adaptively fused as one comprehensive similarity matrix using
a similarity network fusion technique [10], to preserve their common information and
also capture their complementary information. Finally, based on the fused similarity
matrix, a hierarchical affinity propagation clustering algorithm was performed to group
sulcal graphs into different clusters. The proposed method was applied to 677 neonatal
brains (the largest neonatal dataset to our knowledge) in the central sulcus, superior
temporal sulcus, and cingulate sulcus, and revealed multiple distinct and meaningful
patterns of cortical folding in each region.
2 Methods
Subjects and Image Acquisition. MR images for N = 677 term-born neonates were
acquired on a Siemens head-only 3T scanner with a circular polarized head coil. Before
scanning, neonates were fed, swaddled, and fitted with ear protection. All neonates were
unsedated during scanning. T1-weighted MR images with 160 axial slices were obtained
using the parameters: TR = 1,820 ms, TE = 4.38 ms, and resolution = 1 × 1 × 1 mm³.
T2-weighted MR images with 70 axial slices were acquired with the parameters:
TR = 7,380 ms, TE = 119 ms, and resolution = 1.25 × 1.25 × 1.95 mm³.
Cortical Surface Mapping. All neonatal MRIs were processed using an infant-
dedicated pipeline [2]. Specifically, it contained the steps of rigid alignment between
T2 and T1 MR images, skull-stripping, intensity inhomogeneity correction, tissue
segmentation, topology correction, cortical surface reconstruction, spherical mapping,
spherical registration onto an infant surface atlas, and cortical surface resampling [2].
All results have been visually checked to ensure the quality.
Sulcal Pits Extraction and Sulcal Graph Construction. To characterize the sulcal
folding patterns in each individual, sulcal pits, the locally deepest points of sulci, were
extracted on each cortical surface (Fig. 1) using the method in [8]. The motivation is
that deep sulcal pits are relatively consistent across individuals and stable during
brain development, as reported in [6], and thus are well suited as reliable landmarks
for characterizing sulcal folding. To extract sulcal pits, each cortical surface was
partitioned into small basins using a watershed method based on the sulcal depth map
[11], and the deepest point of each basin was identified as a sulcal pit, after pruning
noisy basins [8]. Then, a sulcal graph was constructed for each cortical surface as in
[5]. Specifically, each sulcal pit was defined as a node, and two nodes were linked by
an edge, if their corresponding basins were spatially connected.
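The sulcal graph construction (pits as nodes, edges between pits of spatially connected basins) can be sketched as follows. The data structures are illustrative assumptions, not the pipeline's actual representation.

```python
def build_sulcal_graph(pit_of_basin, basin_adjacency):
    """Build a sulcal graph: one node per sulcal pit, with an edge
    between two pits whenever their basins are spatially connected.

    pit_of_basin: dict basin_id -> pit_id (the deepest point per basin)
    basin_adjacency: iterable of frozenset({basin1, basin2}) pairs
    Returns the graph as a dict pit_id -> set of neighboring pit_ids.
    """
    graph = {pit: set() for pit in pit_of_basin.values()}
    for pair in basin_adjacency:
        b1, b2 = tuple(pair)
        p1, p2 = pit_of_basin[b1], pit_of_basin[b2]
        graph[p1].add(p2)
        graph[p2].add(p1)
    return graph

# Three basins in a row: basin 1 touches basins 0 and 2.
basins = {0: 'p0', 1: 'p1', 2: 'p2'}
adjacency = [frozenset({0, 1}), frozenset({1, 2})]
g = build_sulcal_graph(basins, adjacency)
```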
Sulcal Graph Comparison. To compare two sulcal graphs, their similarities were
measured using multiple metrics from spatial, geometrical, and topological points of
view, to capture the multiple aspects of sulcal graphs. Specifically, we computed six
distinct metrics, using sulcal pit position D, sulcal pit depth H, sulcal basin area S,
sulcal basin boundary B, sulcal pit local connection C, and ridge point depth R. Given
N sulcal graphs from N subjects, any two of them were compared using the above six
metrics, so an N × N matrix was constructed for each metric.
The difference between two sulcal graphs can be measured by comparing the
attributes of the corresponding sulcal pits in the two graphs. In general, the difference
between any sulcal-pit-wise attribute of sulcal graphs P and Q can be computed as
$$\mathrm{Diff}(P, Q; \mathrm{diff}_X) = \frac{1}{2}\left(\frac{1}{V_P}\sum_{i \in P} \mathrm{diff}_X(i, Q) + \frac{1}{V_Q}\sum_{j \in Q} \mathrm{diff}_X(j, P)\right) \qquad (1)$$

where V_P and V_Q are respectively the numbers of sulcal pits in P and Q, and diff_X(i, Q)
is the difference of a specific attribute X between sulcal pit i and its corresponding
sulcal pit in graph Q. Note that we treat the closest pit as the corresponding sulcal pit, as
all cortical surfaces have been aligned to a spherical surface atlas.
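Eq. (1) is a symmetric average of pit-wise attribute differences in both directions. A minimal sketch, with a hypothetical closest-pit distance as the attribute difference:

```python
import numpy as np

def diff_graphs(P, Q, diff_x):
    """Symmetric graph difference of Eq. (1): average the attribute
    difference from every pit of P to its match in Q, and vice versa.

    P, Q: collections of sulcal pits (anything diff_x accepts).
    diff_x(i, G): attribute difference between pit i and its
    corresponding (closest) pit in graph G.
    """
    d_pq = np.mean([diff_x(i, Q) for i in P])
    d_qp = np.mean([diff_x(j, P) for j in Q])
    return 0.5 * (d_pq + d_qp)

# Toy pit positions on a line; the attribute difference is simply the
# distance to the nearest pit of the other graph.
P = [0.0, 1.0, 2.0]
Q = [0.1, 1.1]
closest = lambda i, G: min(abs(i - j) for j in G)
print(diff_graphs(P, Q, closest))  # 7/30 ≈ 0.2333
```

Averaging both directions makes the measure symmetric even when P and Q have different numbers of pits.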
(1) Sulcal Pit Position. Based on Eq. 1, the difference between P and Q in terms of
sulcal pit positions is computed as D(P, Q) = Diff(P, Q; diff_D), where diff_D(i, Q) is
the geodesic distance between sulcal pit i and its corresponding sulcal pit in Q on the
spherical surface atlas.
(2) Sulcal Pit Depth. For each subject, the sulcal depth map is normalized by dividing
by the maximum depth value, to reduce the effect of brain size variation. The
difference between P and Q in terms of sulcal pit depth is computed as
H(P, Q) = Diff(P, Q; diff_H), where diff_H(i, Q) is the depth difference between sulcal
pit i and its corresponding sulcal pit in Q.
(3) Sulcal Basin Area. To reduce the effect of surface area variation across subjects,
the area of each basin is normalized by the area of the whole cortical surface. The
difference between P and Q in terms of sulcal basin area is computed as
S(P, Q) = Diff(P, Q; diff_S), where diff_S(i, Q) is the area difference between the
basins of sulcal pit i and its corresponding sulcal pit in Q.
(4) Sulcal Basin Boundary. The difference between P and Q in terms of sulcal basin
boundary is formulated as B(P, Q) = Diff(P, Q; diff_B), where diff_B(i, Q) is the
difference between the sulcal basin boundaries of sulcal pit i and its corresponding
sulcal pit in Q. Specifically, we define a vertex as a boundary vertex of a sulcal basin
if one of its neighboring vertices belongs to a different basin. Given two corresponding
sulcal pits i ∈ P and i′ ∈ Q, their sulcal basin boundary vertices are respectively
denoted as B_i and B_{i′}. For any boundary vertex a ∈ B_i, its closest vertex a′ is
found in B_{i′}; and similarly, for any boundary vertex b′ ∈ B_{i′}, its closest vertex b
is found in B_i. Then,
14 Y. Meng et al.
the difference between the basin boundaries of sulcal pit i and its corresponding pit
i′ ∈ Q is defined as:

$$\mathrm{diff}_B(i, Q) = \frac{1}{2}\left(\frac{1}{N_{B_i}}\sum_{a \in B_i} dis(a, a') + \frac{1}{N_{B_{i'}}}\sum_{b' \in B_{i'}} dis(b', b)\right) \qquad (2)$$

where N_{B_i} and N_{B_{i′}} are respectively the numbers of vertices in B_i and B_{i′},
and dis(·, ·) is the geodesic distance between two vertices on the spherical surface atlas.
(5) Sulcal Pit Local Connection. The difference between the local connections of two
graphs P and Q is computed as C(P, Q) = Diff(P, Q; diff_C), where diff_C(i, Q) is the
difference of local connection after mapping sulcal pit i to graph Q. Specifically, for a
sulcal pit i, assume k is one of its connected sulcal pits. Their corresponding sulcal pits
in graph Q are respectively i′ and k′. The change of local connection after mapping
sulcal pit i to graph Q is measured by:

$$\mathrm{diff}_C(i, Q) = \frac{1}{N_{G_i}}\sum_{k \in G_i} \left| dis(i, k) - dis(i', k') \right| \qquad (3)$$

where G_i is the set of sulcal pits connected to i, and N_{G_i} is the number of pits in G_i.
(6) Ridge Point Depth. Ridge points are the locations where two sulcal basins meet.
As suggested by [5], the depth of the ridge point is an important indicator for distin-
guishing sulcal patterns. Thus, we compute the difference between the average ridge
point depths of sulcal graphs P and Q as:

$$R(P, Q) = \left| \frac{1}{E_P}\sum_{e \in P} r_e - \frac{1}{E_Q}\sum_{e \in Q} r_e \right| \qquad (4)$$

where E_P and E_Q are respectively the numbers of edges in P and Q; e is an edge
connecting two sulcal pits; and r_e is the normalized depth of the ridge point on edge e.
Sulcal Graph Similarity Fusion. The above six metrics measured the inter-individual
differences of sulcal graphs from different points of view, and each provided com-
plementary information to the others. To capture both the common information and the
complementary information, we employed a similarity network fusion (SNF) method
[10] to adaptively integrate all six metrics together. To do this, each difference matrix
was normalized by its maximum element, and then transformed into a similarity
matrix as:

$$W_M(x, y) = \exp\left(-\frac{M^2(x, y)}{\mu \cdot \frac{U_x + U_y + M(x, y)}{3}}\right) \qquad (5)$$

where μ is a scaling parameter; M can be any one of the above six matrices; U_x and
U_y are respectively the average values of the smallest K elements in the x-th row and
y-th row of M. Finally, six similarity matrices WD, WH, WR, WS, WB, and WC were fused
together as a single similarity matrix W by using SNF with t iterations. The parameters
were set as μ = 0.8, K = 30, and t = 20, as suggested in [10].
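The scaled exponential kernel of Eq. (5) can be sketched as follows. This follows the SNF formulation in [10]; the handling of the K smallest row entries (including or excluding the self-entry) is a simplifying assumption here.

```python
import numpy as np

def similarity_from_difference(M, mu=0.8, K=30):
    """Turn a difference matrix M into a similarity matrix W_M via the
    scaled exponential kernel of Eq. (5), after SNF [10].

    U_x is the mean of the K smallest entries of row x of M, so the
    kernel bandwidth adapts to the local density of each subject.
    """
    M = M / M.max()                      # normalize by the max element
    n = M.shape[0]
    k = min(K, n)
    U = np.sort(M, axis=1)[:, :k].mean(axis=1)
    # eps(x, y) = (U_x + U_y + M(x, y)) / 3, as in SNF.
    eps = (U[:, None] + U[None, :] + M) / 3.0
    return np.exp(-M ** 2 / (mu * eps))

# Toy symmetric difference matrix for 6 subjects.
rng = np.random.default_rng(0)
D = np.abs(rng.standard_normal((6, 6)))
D = (D + D.T) / 2
W = similarity_from_difference(D, mu=0.8, K=3)
```

Each of the six difference matrices is transformed this way before SNF iteratively fuses them into a single similarity matrix W.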
Sulcal Pattern Clustering. To cluster sulcal graphs into different groups based on the
fused similarity matrix W, we employed the Affinity Propagation Clustering
(APC) algorithm [12], which can automatically determine the number of clusters
based on the natural characteristics of the data. However, since sulcal folding patterns are
extremely variable across individuals, too many clusters were identified after
performing APC, making it difficult to observe the most important major patterns.
Therefore, we proposed a hierarchical APC framework to further group the clusters.
Specifically, after running APC, (1) the exemplars of all clusters were used to perform a
new level of APC, so that fewer clusters were generated. Since the old clusters were merged,
the old exemplars may no longer be representative of the new clusters. Thus, (2) a new
exemplar was selected for each cluster as the sample with the maximal average similarity
to all the other samples in the cluster. We repeated these steps until the number of clusters
was reduced to an expected level (<5).
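The hierarchical loop above can be sketched with scikit-learn's AffinityPropagation; this is our own minimal reading of steps (1) and (2), not the authors' implementation, and the stopping threshold mirrors the text:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def hierarchical_apc(W, max_clusters=5, max_levels=10):
    """Our sketch of hierarchical APC: cluster on the similarity
    matrix W, and while too many clusters remain, re-cluster the
    exemplars and re-pick each exemplar as the member with maximal
    average within-cluster similarity."""
    idx = np.arange(W.shape[0])          # current exemplar candidates
    labels = None
    for _ in range(max_levels):
        ap = AffinityPropagation(affinity="precomputed", random_state=0)
        sub = ap.fit_predict(W[np.ix_(idx, idx)])
        # propagate: each sample follows its old cluster's exemplar
        labels = sub if labels is None else sub[labels]
        exemplars = idx[ap.cluster_centers_indices_]
        if exemplars.size < max_clusters:
            break
        # step (2): re-pick each exemplar by maximal average similarity
        new_ex = []
        for c in range(exemplars.size):
            members = np.where(labels == c)[0]
            block = W[np.ix_(members, members)]
            new_ex.append(members[np.argmax(block.mean(axis=1))])
        idx = np.asarray(new_ex)
    return labels
```

With a few hundred subjects the refinement loop is exercised repeatedly; on small, well-separated data a single APC pass may already fall below the threshold.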
3 Results
We extracted sulcal pits on cortical surfaces from 677 neonatal brains. To demonstrate
the validity of our methods for discovering the cortical folding patterns, we employed
three representative cortical regions, i.e., the central sulcus, superior temporal sulcus,
and cingulate sulcus. For each cortical region, a 677 × 677 similarity matrix was
computed using SNF, and all subjects were then clustered into different groups by the
hierarchical APC. To better explore the major folding patterns, an average cortical
surface was constructed for each cluster, based on 20 representative cortical surfaces
that are most similar to the exemplar in each cluster. All sulcal pits in each cluster were
mapped onto the average surfaces.
For the central sulcus, three distinct folding patterns were identified, as shown in
Fig. 2. In the pattern (a), two sulcal pit concentration areas can be observed, indicating
two sulcal basins in the central sulcus. This pattern was further confirmed by six
representative examples of individual subjects (second to seventh columns). In the
pattern (b), three distinct sulcal pit concentration areas can be observed, with one extra
area (basin 3) located in the most inferior portion of the central sulcus, compared to the
pattern (a). In the pattern (c), three distinct sulcal pit concentration areas can be
observed, as in the pattern (b), but they are more concentrated. This is also confirmed by
six representative examples of (c). Moreover, compared to the pattern (b), the sulcal
basin 2 is very short, while the sulcal basin 3 is very long in the pattern (c). Such a
phenomenon is likely related to the “hand knob shift” reported in a study of the shape of
the central sulcus in adults [13]. Previously, different studies reported either two [8] or three [7]
sulcal basins in the central sulcus. Herein, we can see that both two-basin and
three-basin patterns are major patterns of sulcal folding.
For the superior temporal sulcus (STS), three distinct folding patterns were
identified, as shown in Fig. 3. In the pattern (a), the distribution of sulcal pits in the
posterior portion of STS is more diffuse and bent, compared to the patterns (b) and
16 Y. Meng et al.
Fig. 2. Sulcal folding patterns in the central sulcus. The first column shows three discovered
sulcal folding patterns, with all sulcal pits (red spheres) mapped onto the average surface of each
cluster. For each pattern, the second to seventh columns show six representative examples of
individual subjects. Different sulcal basins are marked with different colors. The percentage of
each pattern is shown at the top-left corner.
(c), indicating the differences in the folding shape of STS. This is supported by a
previous cortical folding study in adults, which reported that for some brains there was
a Y-shaped STS but for some brains there was a single long STS [4]. In the pattern (b),
compared to (a) and (c), an extra concentration region of sulcal pits is exhibited near
the temporal pole, which is also confirmed by six representative examples in individual
subjects, showing small sulcal basins near the temporal pole. In the pattern (c), the
sulcal basin in the anterior portion of STS is very long and straight, extending to the
temporal pole.
Fig. 3. Sulcal folding patterns in the superior temporal sulcus. The first column shows three
discovered sulcal folding patterns, with all sulcal pits (red spheres) mapped onto the average
surface of each cluster. For each pattern, the second to seventh columns show six representative
examples of individual subjects. Different sulcal basins are marked with different colors.
For the cingulate sulcus, four distinct major folding patterns were identified, as
shown in Fig. 4. In the pattern (a), a single long cingulate sulcus is clearly shown,
while in the pattern (b), two long parallel sulci are observed. This is consistent with a
previous cortical folding pattern study in adults [4], which reported that two cingulate
sulci were observed in some brains. A study of autopsy specimen brains also reported
that 24 % of left hemispheres had a double parallel cingulate sulcus [1]. In the pattern (c),
the cingulate sulcus is interrupted in the anterior region; in contrast, in the pattern (d),
the cingulate sulcus is interrupted in the posterior region. These two types of interruption
were also reported in [1]. In the patterns (c) and (d), some parallel sulci can be
observed, but they are much shorter than those in the pattern (b).
Fig. 4. Sulcal folding patterns in the cingulate sulcus. The first column shows four discovered
folding patterns, with all sulcal pits (red spheres) mapped onto the average surface of each
cluster. The second column shows the schematic drawing of the sulcal curves (blue dashes) on
the average surface of each cluster. For each pattern, the third to seventh columns show five
representative examples of individual subjects. The percentage of each pattern is shown at the
top-left corner.
4 Conclusion
The main contribution of this paper is twofold. First, a novel generic method for
discovering the cortical folding patterns was proposed, by leveraging the reliable sulcal
pits. Specifically, multiple complementary similarity measures of sulcal pit graphs were
first computed and adaptively fused to comprehensively capture inter-individual simi-
larity. Then, based on the fused similarity, sulcal pit graphs were clustered using a
hierarchical affinity propagation algorithm. Second, for the first time, we applied the
proposed method to discover the cortical folding patterns in a large-scale neonatal
dataset with 677 subjects, and revealed multiple distinct and representative patterns.
These results suggest the need to construct multiple representative cortical
folding atlases for each region for better spatial normalization of individuals in
group-level studies. Our future work includes discovering patterns in other cortical
regions, and exploring their relationships with structural connectivity and cognitive
functions.
Acknowledgements. This work was supported in part by UNC BRIC-Radiology start-up fund
and NIH grants (MH107815, MH108914, MH100217, HD053000, and MH070890).
References
1. Ono, M., Kubik, S., Abernathey, C.D.: Atlas of the Cerebral Sulci. Thieme, New York
(1990)
2. Li, G., Wang, L., Shi, F., et al.: Construction of 4D high-definition cortical surface atlases of
infants: methods and applications. Med. Image Anal. 25, 22–36 (2015)
3. Sun, Z.Y., Rivière, D., Poupon, F., Régis, J., Mangin, J.-F.: Automatic inference of sulcus
patterns using 3D moment invariants. In: Ayache, N., Ourselin, S., Maeder, A. (eds.)
MICCAI 2007, Part I. LNCS, vol. 4791, pp. 515–522. Springer, Heidelberg (2007)
4. Sun, Z.Y., Perrot, M., Tucholka, A., Rivière, D., Mangin, J.-F.: Constructing a dictionary of
human brain folding patterns. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C.
(eds.) MICCAI 2009, Part II. LNCS, vol. 5762, pp. 117–124. Springer, Heidelberg (2009)
5. Im, K., Raschle, N.M., Smith, S.A., et al.: Atypical sulcal pattern in children with
developmental dyslexia and at-risk kindergarteners. Cereb. Cortex 26, 1138–1148 (2016)
6. Lohmann, G., von Cramon, D.Y., Colchester, A.C.: Deep sulcal landmarks provide an
organizing framework for human cortical folding. Cereb. Cortex 18, 1415–1420 (2008)
7. Im, K., Jo, H.J., Mangin, J.F., et al.: Spatial distribution of deep sulcal landmarks and
hemispherical asymmetry on the cortical surface. Cereb. Cortex 20, 602–611 (2010)
8. Meng, Y., Li, G., Lin, W., et al.: Spatial distribution and longitudinal development of deep
cortical sulcal landmarks in infants. NeuroImage 100, 206–218 (2014)
9. Li, G., Nie, J., Wang, L., et al.: Mapping region-specific longitudinal cortical surface
expansion from birth to 2 years of age. Cereb. Cortex 23, 2724–2733 (2013)
10. Wang, B., Mezlini, A.M., Demir, F., et al.: Similarity network fusion for aggregating data
types on a genomic scale. Nat. Methods 11, 333–337 (2014)
11. Li, G., Nie, J., Wang, L., et al.: Mapping longitudinal hemispheric structural asymmetries of
the human cerebral cortex from birth to 2 years of age. Cereb. Cortex 24, 1289–1300 (2014)
12. Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315,
972–976 (2007)
13. Sun, Z.Y., Kloppel, S., Riviere, D., et al.: The effect of handedness on the shape of the
central sulcus. NeuroImage 60, 332–339 (2012)
Modeling Functional Dynamics of Cortical
Gyri and Sulci
1 Introduction
Cortical gyrification, by which the cortex becomes highly convoluted into convex gyri
and concave sulci, is one of the most prominent characteristics of the human brain [1]. A
variety of studies have reported specific structural/functional differences between gyral
and sulcal regions. For example, from a structural perspective, it has been reported that the
terminations of streamline white matter fiber bundles derived from diffusion tensor imaging
or high angular resolution diffusion imaging concentrate on gyri in both human fetal and
adult brains, as well as in chimpanzee and macaque brains [2–4]. From a functional per-
spective, a recent study reported that the functional connectivity based on resting-state
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 19–27, 2016.
DOI: 10.1007/978-3-319-46720-7_3
20 X. Jiang et al.
X_i^{w_j} = \{ X_i^q \mid t_j \le q < t_j + l \}, \quad t_j = 1, \ldots, (t - l + 1)   (1)

where X_i^q is the q-th row of X_i, i.e., its value at time point q. There are (t - l + 1) time
windows in total for each subject.
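The sliding-window extraction of Eq. (1) amounts to the following sketch (our illustration, with X as one subject's t × n signal matrix):

```python
import numpy as np

def extract_segments(X, l):
    """Eq. (1): slide a window of length l over a subject's t x n
    signal matrix X, yielding (t - l + 1) temporal segments of
    shape (l, n)."""
    t = X.shape[0]
    return [X[tj:tj + l] for tj in range(t - l + 1)]
```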
Fig. 1. tfMRI signals’ temporal segments extraction. (a) The cortical surface and whole-brain
tfMRI signal matrix Xi of subject i. The tfMRI signal of an example grayordinate (cortical vertex)
is shown and highlighted by the blue frame. (b) Examples of two consecutive extracted temporal
segments X_i^{w_j} and X_i^{w_{j+1}} (highlighted by yellow and blue frames, respectively).
Fig. 2. (a) The illustration of group-wise sparse representation of temporal segments of a group
of subjects. (b) An example of an identified group-wise consistent functional network, obtained by
mapping a specific row (highlighted in red) of p^{w_j} back onto the cortical surface.
D^{w_j} \in \mathbb{R}^{l \times m} (m is the dictionary size, with m > l and m \ll n \cdot I) and a sparse coefficient
weight matrix \alpha^{w_j} \in \mathbb{R}^{m \times (n \cdot I)} using an effective online dictionary learning algorithm [14].
In brief, an empirical cost function considering the average loss of regression over the n \cdot I
temporal segments is defined as
f_{nI}(D^{w_j}) \triangleq \frac{1}{nI} \sum_{k=1}^{nI} \min_{\alpha_k^{w_j} \in \mathbb{R}^m} \frac{1}{2} \| x_k^{w_j} - D^{w_j} \alpha_k^{w_j} \|_2^2 + \lambda \| \alpha_k^{w_j} \|_1   (2)

where the \ell_1-norm regularization and \lambda are adopted to trade off the regression residual and the
sparsity level of \alpha_k^{w_j}; x_k^{w_j} is the k-th column of X^{w_j}. To make the coefficients in \alpha^{w_j}
comparable, we also constrain the k-th column d_k^{w_j} of D^{w_j}, as defined in Eq. (3).
The whole problem is then rewritten as a matrix factorization problem in Eq. (4) and
solved by [14] to obtain D^{w_j} and \alpha^{w_j}:

C \triangleq \left\{ D^{w_j} \in \mathbb{R}^{l \times m} \;\; \mathrm{s.t.} \;\; \forall k = 1, \ldots, m, \; (d_k^{w_j})^T d_k^{w_j} \le 1 \right\}   (3)

\min_{D^{w_j} \in C, \; \alpha^{w_j} \in \mathbb{R}^{m \times (nI)}} \frac{1}{2} \| x_k^{w_j} - D^{w_j} \alpha_k^{w_j} \|_2^2 + \lambda \| \alpha_k^{w_j} \|_1   (4)
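The dictionary learning step described above can be sketched with scikit-learn's implementation of the online algorithm of [14]; the random matrix below is a toy stand-in for X^{w_j}, while m = 50 and λ = 1.5 follow the experimental settings reported later in the paper:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Each column x_k of X is one l-dimensional signal; scikit-learn
# expects samples in rows, so we pass X.T. Atoms are constrained
# to at most unit norm, matching Eq. (3).
rng = np.random.RandomState(0)
l, n_signals, m = 20, 200, 50            # window length, signals, atoms
X = rng.randn(l, n_signals)              # toy stand-in for X^{w_j}

learner = MiniBatchDictionaryLearning(n_components=m, alpha=1.5,
                                      random_state=0)
alpha = learner.fit_transform(X.T)       # (n_signals, m) coefficients
D = learner.components_.T                # (l, m) dictionary, atoms in columns
```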
Since the dictionary learning and sparse representation maintain the organization of
all temporal segments and subjects in X^{w_j}, the obtained \alpha^{w_j} also preserves the spatial
information of the temporal segments across the I subjects. We therefore decompose \alpha^{w_j} into
I sub-matrices \alpha_1^{w_j}, \ldots, \alpha_I^{w_j} \in \mathbb{R}^{m \times n} corresponding to the I subjects (Fig. 2a).
The element (r, s) in each sub-matrix represents the coefficient value of the s-th grayordinate
for the r-th dictionary atom in D^{w_j} for that subject. In order to obtain a common
sparse coefficient weight matrix across the I subjects, we perform a t-test of the null
hypothesis for each element (r, s) across the I subjects (p-value < 0.05), similar to [15], to
obtain the p-value matrix p^{w_j} \in \mathbb{R}^{m \times n} (Fig. 2b), in which element (r, s) represents
the statistically significant coefficient value of the s-th grayordinate for the r-th dictionary
atom across all I subjects. p^{w_j}
is thus the common sparse coefficient weight matrix. From a brain science perspective,
d_k^{w_j} (the k-th column of D^{w_j}) represents the temporal pattern of a specific group-wise
consistent functional network, and its corresponding coefficient vector p_k^{w_j} (the k-th row of
p^{w_j}) can be mapped back onto the cortical surface (color-coded by z-scores transformed from
p-value) (Fig. 2b) to represent the spatial pattern of the network. We then identify the
meaningful group-wise consistent functional networks from p_k^{w_j} (k = 1, …, m), similar
to [10]. Specifically, the GLM-derived activation maps and the intrinsic network
templates provided in [16] are adopted as the network templates. The network from p_k^{w_j}
with the highest spatial pattern similarity to a specific network reference (defined as
J(S, T) = |S \cap T| / |T|, where S and T are the spatial patterns of a specific network and a
template, respectively) is identified as a group-wise consistent functional brain network at w_j.
Once we identify all group-wise consistent functional brain networks at w_j, the
SOPFN at w_j is defined as the set of all common cortical vertices g_i (i = 1, \ldots, 64984)
involved in the spatial patterns of all identified functional networks [9, 10].
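The template-matching rule above, i.e., keeping the candidate map with the highest J(S, T) score against a template, can be sketched as follows (function names are ours, not from the paper):

```python
import numpy as np

def overlap_score(S, T):
    """J(S, T) = |S intersect T| / |T| for binary spatial maps
    (1 = vertex belongs to the network)."""
    S, T = np.asarray(S, bool), np.asarray(T, bool)
    return (S & T).sum() / T.sum()

def match_network(candidates, template):
    """Return the index and score of the candidate map most similar
    to the template."""
    scores = [overlap_score(S, template) for S in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]
```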
3 Experimental Results
For each of the seven tfMRI datasets, we equally divided all 64 subjects into two
groups (32 each) for reproducibility studies. The window length l was experimentally
determined (l = 20) using a method similar to that in [8]. The values of m and \lambda in Eq. (4)
were experimentally determined (m = 50 and \lambda = 1.5) using a method similar to that in [13].
Fig. 3. Two example group-wise consistent functional networks within different time windows
in one subject group of emotion tfMRI data. (a) Task design curves across time windows (TW) of
emotion tfMRI data. 12 example TWs are indexed. Three different TW types are divided by black
dashed lines and labeled. TW type #1 involves task design 1, TW type #2 involves task design 2,
and TW type #3 involves both task designs. The spatial patterns of (b) one example
task-evoked functional network and (c) one example intrinsic connectivity network (ICN) within
the 12 example TWs are shown.
Fig. 4. The mean SOPFN distribution on gyral (G) and sulcal (S) regions across different time
window types in the two subject groups of emotion tfMRI data. The common regions with higher
density are highlighted by red arrows. The two example surfaces illustrate the gyri/sulci and are
color-coded by the principal curvature value.
different TW types in the emotion tfMRI data as an example. We can see that, despite certain
common regions (with relatively higher density, highlighted by red arrows), there is
considerable SOPFN distribution variability between gyral and sulcal regions across
different time windows. Quantitatively, the distribution percentage on gyral regions is
statistically significantly larger than that on sulcal regions across all time windows
(two-sample t-test, p < 0.05) for all seven tfMRI datasets, as reported in Table 1.
Table 1. The mean ratio of SOPFN distribution percentage on gyri vs. that on sulci across all
time windows in the two subject groups of seven tfMRI datasets.
Emotion Gambling Language Motor Relational Social WM
Group 1 1.47 1.60 1.46 1.32 1.59 1.46 1.67
Group 2 1.45 1.55 1.38 1.33 1.49 1.47 1.66
Finally, we calculated and visualized P_gyri and P_sulci, representing the dynamics of the
SOPFN distribution percentage across all time windows on gyri and sulci, respectively,
in Fig. 5. Interestingly, there are considerable peaks/valleys in the distribution
percentage on gyri/sulci that coincide with the specific task designs across the
entire scan, indicating a difference in the temporal dynamics of the SOPFN distribution
between gyral and sulcal regions. These results indicate that gyri might participate
Fig. 5. The temporal dynamics of SOPFN distribution percentage on gyri (green curve) and
sulci (yellow curve) across all time windows in the seven tfMRI datasets shown in (a)–(g),
respectively. The task design curves in each sub-figure are represented by different colors. Y-axis
represents the percentage value (×100 %).
References
1. Rakic, P.: Specification of cerebral cortical areas. Science 241, 170–176 (1988)
2. Nie, J., et al.: Axonal fiber terminations concentrate on gyri. Cereb. Cortex 22(12), 2831–
2839 (2012)
3. Chen, H., et al.: Coevolution of gyral folding and structural connection patterns in primate
brains. Cereb. Cortex 23(5), 1208–1217 (2013)
4. Takahashi, E., et al.: Emerging cerebral connectivity in the human fetal brain: an MR
tractography study. Cereb. Cortex 22(2), 455–464 (2012)
5. Deng, F., et al.: A functional model of cortical gyri and sulci. Brain Struct. Funct. 219(4),
1473–1491 (2014)
6. Jiang, X., et al.: Sparse representation of HCP grayordinate data reveals novel functional
architecture of cerebral cortex. Hum. Brain Mapp. 36(12), 5301–5319 (2015)
7. Gilbert, C.D., Sigman, M.: Brain states: top-down influences in sensory processing. Neuron
54(5), 677–696 (2007)
8. Li, X., et al.: Dynamic functional connectomics signatures for characterization and
differentiation of PTSD patients. Hum. Brain Mapp. 35(4), 1761–1778 (2014)
9. Duncan, J.: The multiple-demand (MD) system of the primate brain: mental programs for
intelligent behaviour. Trends Cogn. Sci. 14(4), 172–179 (2010)
10. Lv, J.: Sparse representation of whole-brain fMRI signals for identification of functional
networks. Med. Image Anal. 20(1), 112–134 (2015)
11. Glasser, M.F., et al.: The minimal preprocessing pipelines for the Human Connectome
Project. Neuroimage 80, 105–124 (2013)
12. Lee, K., et al.: A data-driven sparse GLM for fMRI analysis using sparse dictionary learning
with MDL criterion. IEEE Trans. Med. Imaging 30(5), 1076–1089 (2011)
13. Lv, J., et al.: Holistic atlases of functional networks and interactions reveal reciprocal
organizational architecture of cortical function. IEEE TBME 62(4), 1120–1131 (2015)
14. Mairal, J., et al.: Online learning for matrix factorization and sparse coding. J. Mach. Learn.
Res. 11, 19–60 (2010)
15. Lv, J., et al.: Assessing effects of prenatal alcohol exposure using group-wise sparse
representation of fMRI data. Psychiatry Res. 233, 254–268 (2015)
16. Smith, S.M., et al.: Correspondence of the brain’s functional architecture during activation
and rest. Proc. Natl. Acad. Sci. U.S.A. 106(31), 13040–13045 (2009)
A Multi-stage Sparse Coding Framework
to Explore the Effects of Prenatal Alcohol
Exposure
1 Introduction
TfMRI has been widely used in clinical neuroscience to understand functional brain
disorders [1]. Among all of state-of-the-art tfMRI analysis methodologies, the general
linear model (GLM) is the most popular approach in detecting functional networks
under specific task performance [2]. The basic idea underlying the GLM is that task-evoked
brain activities can be discovered by subtracting the activity of a control condition
[3, 4]. In common practice, experimental and control trials are performed several times
and fMRI signals are averaged to increase the signal-to-noise ratio [3]. Thus task-
dominant brain activities are greatly enhanced and other subtle and concurrent activities
are largely overlooked. An alternative approach is independent component analysis
(ICA) [5]. However, the theoretical foundation of ICA-based methods has been chal-
lenged in recent studies [6]. Therefore, more advanced tfMRI activation detection
methods are still needed.
Recently, dictionary learning and sparse representation methods have been adopted
for fMRI data analysis [6, 7] and attracted a lot of attention. The basic idea is to
factorize the fMRI signal matrix into an over-complete dictionary of basis atoms and a
coefficient matrix via dictionary learning algorithms [8]. Specifically, each dictionary
atom represents the functional activity of a specific brain network and its corresponding
coefficient vector stands for the spatial distribution of this brain network [7]. It should
be noticed that the decomposed coefficient matrix naturally reveals the spatial patterns
of the inferred brain networks. This novel strategy naturally accounts for the various
brain networks that might be involved in concurrent functional processes [9, 10].
However, a notable challenge in current data-driven strategies is how to establish
accurate network correspondence across individuals and characterize the group-wise
consistent activation map in a structured manner. Since each dictionary is learned in a
data-driven way, it is hard to establish the correspondence across subjects. To address
this challenge, in this paper, we propose a novel multi-stage sparse coding framework
to identify diverse group consistent brain activities and characterize the subtle cross
group differences under specific task conditions. Specifically, we first concatenate all
the fMRI dataset temporally and adopt dictionary learning method to identify the
group-level activation maps across all the subjects. After that, we constrain spatial/
temporal features in dictionary learning procedure to identify individualized temporal
pattern and spatial pattern from individual fMRI data. These constrained features
naturally preserve the correspondence across different subjects. Finally, a statistical
mapping method is adopted to identify group-wise consistent maps. In this way, the
group-wise consistent maps are identified in a structured way. By applying the pro-
posed framework on two groups of tfMRI data (healthy control and PAE groups), we
successfully identified diverse group-wise consistent brain networks for each group and
specific brain networks/regions that are affected by PAE under an arithmetic task.
Fig. 1. The computational framework of the proposed methods. (a) Concatenated sparse coding.
t is the number of time points, n is the number of voxels, and k is the number of dictionary
atoms. (b) Supervised dictionary learning with spatial maps fixed. (c) Supervised dictionary
learning with temporal features fixed. (d) Statistical mapping to identify group-wise consistent
maps for each group.
Given the fMRI signal matrix S \in \mathbb{R}^{L \times n}, where L is the number of fMRI time points and
n is the number of voxels, dictionary learning and sparse representation methods aim to
represent each signal in S with a sparse linear combination of dictionary (D) atoms and
the coefficient matrix A, i.e., S = DA. The empirical cost function is defined as
f_n(D) \triangleq \frac{1}{n} \sum_{i=1}^{n} \ell(s_i, D)   (1)
where D is the dictionary, \ell is the loss function, n is the voxel number, and s_i is a
training sample representing the time course of a voxel. The problem of mini-
mizing the empirical cost can be further rewritten as a matrix factorization problem
with a sparsity penalty:
\min_{D \in C, \; A \in \mathbb{R}^{k \times n}} \frac{1}{2} \| S - DA \|_2^2 + \lambda \| A \|_{1,1}   (2)
tive’. It should be noted that the coefficient matrix is updated, except that part of the
elements is kept ‘active’ (nonzero). The coefficient matrix updating procedure can be
represented as follows:
A_i \triangleq \arg\min_{A_i \in \mathbb{R}^m} \frac{1}{2} \| s_i - D^{(t-1)} A_i \|_2^2 + \lambda \| A_i \|_1; \quad
A_i^p = 0.1 \;\; \text{if} \;\; A_i^p = 0 \;\text{and}\; V(i, p) = 1   (5)
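One possible reading of the update in Eq. (5), a standard sparse-coding step followed by re-activation of masked entries, can be sketched as follows; the lasso penalty scaling and all names here are our assumptions, not the authors' code:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_code_with_reactivation(S, D, V, lam=0.1):
    """Lasso step per voxel signal, after which entries that were
    zeroed out but are flagged 'active' in the mask V are reset to
    0.1 so they stay in play for the next dictionary update.
    S: (L, n) signals, D: (L, k) dictionary, V: (k, n) mask."""
    k, n = D.shape[1], S.shape[1]
    A = np.zeros((k, n))
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
    for i in range(n):
        A[:, i] = lasso.fit(D, S[:, i]).coef_
    A[(A == 0) & (V == 1)] = 0.1     # re-activate masked entries
    return A
```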
\min_{A \in \mathbb{R}^{k \times n}} \frac{1}{2} \| S - D_c A \|_2^2 + \lambda \| A \|_{1,1}   (6)

where D_c is the fixed individualized dictionary, k is the dictionary atom number, and
A is the coefficient matrix learned from each individual's fMRI data with the
individualized dictionary held fixed during the dictionary learning procedure.
T(i, j) = \frac{ \overline{A_{G_x}(i, j)} }{ \sqrt{ \mathrm{var}(A_{G_x}(i, j)) } }   (7)
where \overline{A_{G_x}(i, j)} represents the average value of the elements in each group, and x
denotes the patient group or the control group. Specifically, the t-test acceptance
threshold is set as p < 0.05. The derived t-value is further transformed into a standard
z-score. In this way, each group generates a group-consistent Z statistic map, and each
row in Z can be mapped back to the brain volume, standing for the spatial distribution of
the corresponding dictionary atom.
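The statistical mapping step (a per-element t-test across subjects, a p < 0.05 threshold, and conversion of surviving t-values to standard z-scores) can be sketched as follows; array shapes and names are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy import stats

def group_z_map(A_stack, p_thresh=0.05):
    """Per-element one-sample t-test across subjects, thresholded at
    p < p_thresh, with surviving t-values converted to signed standard
    z-scores. A_stack: (n_subjects, n_atoms, n_voxels)."""
    t, p = stats.ttest_1samp(A_stack, popmean=0.0, axis=0)
    z = np.sign(t) * stats.norm.isf(p / 2.0)   # two-sided p -> |z|
    z[p >= p_thresh] = 0.0                     # suppress non-significant entries
    return z
```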
3 Experimental Results
The proposed framework was applied to two groups of tfMRI data: unexposed healthy
controls and PAE patients. In each stage, the dictionary size is 300, the sparsity is
around 0.05, and the optimization method is stochastic approximation. Briefly, we
identified 263 meaningful networks in the concatenated sparse coding stage, 22 of which
were affected by PAE. The detailed experimental results are reported as follows.
R(X, T) = \frac{ |X \cap T| }{ |T| }   (8)
where X is the learned spatial network from Al and T is the RSN template.
Fig. 2. Examples of meaningful networks identified by concatenated sparse coding. The first row is
the template name and the second row is the template spatial map. The third row is the
corresponding component network number in concatenated sparse coding. The last row shows the
corresponding spatial maps from concatenated sparse coding. RSN denotes a common resting-state
network from [13], and the GLM result is computed with the FSL FEAT software.
Table 1. The spatial overlap rates between the identified networks and the corresponding GLM
activation map and resting state templates.
GLM RSN#1 RSN#2 RSN#3 RSN#4 RSN#5 RSN#6 RSN#7 RSN#8 RSN#9
0.47 0.45 0.57 0.45 0.37 0.29 0.36 0.48 0.34 0.34
(a)
(b)
Fig. 3. Identified individualized temporal patterns and correlation matrix between different
subjects. (a) Identified individualized temporal patterns by constraining the same task-evoked
activation map (identified in concatenated sparse coding) in dictionary learning procedure. The
red line is the task paradigm pattern and the other lines are derived individualized temporal
activity patterns from healthy control group subjects for the same task-evoked activation
map. The right figure is the correlation matrix between different subjects. (b) Identified
individualized temporal patterns by constraining resting state activation map (identified in
concatenated sparse coding).
patterns and the correlation matrix between different subjects. Specifically, Fig. 3a
shows the learned temporal patterns from constraining task-evoked group activation
map (Network #175 in Fig. 2). The red line is the task design paradigm which has been
convolved with the hemodynamic response function. It is interesting to see that the
learned individualized temporal patterns from constraining task-evoked activation map
are quite consistent and the average of these learned temporal patterns is similar to the
task paradigm regressor. The correlation matrix between subjects in healthy control
group is visualized in the right map in Fig. 3a and the average value is as high as 0.5.
Another kind of dictionary patterns are learned from constraining resting state net-
works. Figure 3b shows the learned temporal patterns and correlation matrix between
the healthy control group subjects with constraining resting state network (#152 in
Fig. 2). The temporal patterns are quite different among different subjects and the
average correlation value is as low as 0.15. From these results, we can see that the
learned individualized temporal patterns are reasonable according to current neuro-
science knowledge, and that the subtle temporal activation pattern differences among dif-
ferent subjects under the same task condition are recognized by the proposed
framework (Fig. 4).
(a)
(b)
Fig. 4. Examples of identified group-wise activation map in different groups. (a) and (b) are
organized in the same fashion. The first row shows the component number and the second row
shows the concatenated sparse coding results. While the third row shows the reconstructed
statistical activation map in healthy control group, the last row shows the statistical activation
map in PAE group. Blue circles highlight the difference between statistical maps in two groups.
4 Conclusion
We proposed a novel multi-stage sparse coding framework for inferring group con-
sistency maps and characterizing the subtle group response differences under specific
task performance. Specifically, we combined concatenated sparse coding, super-
vised dictionary learning, and statistical mapping to identify
statistical group consistency maps in each group. This novel framework largely
overcomes the lack of correspondence between subjects in
current sparse-coding-based methods and provides a structured way to identify sta-
tistical group consistent maps. Experiments on healthy control and PAE tfMRI data
have demonstrated the great advantage of the proposed framework in identifying
meaningful and diverse group-consistent brain networks. In the future, we will further
investigate the evaluation of subjects' individual maps in the framework, optimize its
parameters, and test our framework on a variety of other tfMRI datasets.
Acknowledgements. J. Han was supported by the National Science Foundation of China under
Grant 61473231 and 61522207. X. Hu was supported by the National Science Foundation of
China under grant 61473234, and the Fundamental Research Funds for the Central Universities
under grant 3102014JCQ01065. T. Liu was supported by the NIH Career Award (NIH
EB006878), NIH R01 DA033393, NSF CAREER Award IIS-1149260, NIH R01 AG-042599,
NSF BME-1302089, and NSF BCS-1439051.
References
1. Matthews, P.M., et al.: Applications of fMRI in translational medicine and clinical practice.
Nat. Rev. Neurosci. 7(9), 732–744 (2006)
2. Fox, M.D., et al.: The human brain is intrinsically organized into dynamic, anticorrelated
functional networks. PNAS 102(27), 9673–9678 (2005)
3. Mastrovito, D.: Interactions between resting-state and task-evoked brain activity suggest a
different approach to fMRI analysis. J. Neurosci. 33(32), 12912–12914 (2013)
4. Friston, K.J., et al.: Statistical parametric maps in functional imaging: a general linear
approach. Hum. Brain Mapp. 2(4), 189–210 (1994)
5. Mckeown, M.J., et al.: Spatially independent activity patterns in functional MRI data during
the stroop color-naming task. PNAS 95(3), 803–810 (1998)
6. Lee, K., et al.: A data-driven sparse GLM for fMRI analysis using sparse dictionary learning
with MDL criterion. IEEE Trans. Med. Imaging 30(5), 1076–1089 (2011)
7. Lv, J., et al.: Sparse representation of whole-brain fMRI signals for identification of
functional networks. Med. Image Anal. 20(1), 112–134 (2015)
8. Mairal, J., et al.: Online learning for matrix factorization and sparse coding. J. Mach. Learn.
Res. 11, 19–60 (2010)
9. Pessoa, L.: Beyond brain regions: network perspective of cognition–emotion interactions.
Behav. Brain Sci. 35(03), 158–159 (2012)
10. Anderson, M.L., Kinnison, J., Pessoa, L.: Describing functional diversity of brain regions
and brain networks. Neuroimage 73, 50–58 (2013)
11. Santhanam, P., et al.: Effects of prenatal alcohol exposure on brain activation during an
arithmetic task: an fMRI study. Alcohol. Clin. Exp. Res. 33(11), 1901–1908 (2009)
12. Jenkinson, M., Smith, S.: A global optimization method for robust affine registration of brain
images. Med. Image Anal. 5(2), 143–156 (2001)
13. Smith, S.M., et al.: Correspondence of the brain’s functional architecture during activation
and rest. PNAS 106(31), 13040–13045 (2009)
Correlation-Weighted Sparse Group
Representation for Brain Network Construction
in MCI Classification
1 Introduction
The study of brain functional connectivity networks (BFCN), based on resting-state
fMRI (rs-fMRI), has shown great potential in understanding brain functions
R. Yu was supported by the Research Fund for the Doctoral Program of Higher
Education of China (RFDP) (No. 20133219110029), the Key Research Foundation
of Henan Province (15A520056) and NFSC (No. 61171165, No. 11431015).
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 37–45, 2016.
DOI: 10.1007/978-3-319-46720-7_5
38 R. Yu et al.
and identifying biomarkers for neurological disorders [1]. Many BFCN modeling
approaches have been proposed and most of them represent the brain network as
a graph by treating brain regions as nodes and the connectivity between a pair
of regions as an edge (or link) [2]. Specifically, the brain can first be parcellated
into different regions-of-interest (ROIs), and then the connectivity between a pair of
ROIs can be estimated by the correlation between the mean blood-oxygen-level
dependent (BOLD) time series of these ROIs.
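This PC-based construction can be sketched in a few lines of numpy; the data and the function name `build_pc_network` are ours, for illustration only:

```python
import numpy as np

def build_pc_network(roi_signals):
    """BFCN as the Pearson correlation matrix of ROI mean BOLD
    time series; roi_signals has shape (T, N), one column per ROI."""
    return np.corrcoef(roi_signals.T)   # corrcoef wants variables in rows

# Toy example: 120 time points, 5 ROIs of synthetic signals.
rng = np.random.default_rng(0)
signals = rng.standard_normal((120, 5))
P = build_pc_network(signals)
```

Each entry `P[i, j]` is then the edge weight between ROIs i and j.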
The most common BFCN modeling approach is based on pairwise Pear-
son’s correlation (PC). However, PC is insufficient to account for the interaction
among multiple brain regions [3], since it only captures pairwise relationship.
Another common modeling approach is based on sparse representation (SR).
For example, the sparse estimation of partial correlation with l1 -regularization
can measure the relationship among certain ROIs while factoring out the effects
of other ROIs [4]. This technique has been applied to construct brain networks in
the studies of Alzheimer’s disease (AD), mild cognitive impairment (MCI) [3],
and autism spectrum disorder [5]. However, human brain inherently contains not
only sparse connections but also group structure [6], with the latter considered
more in recent BFCN modeling methods. A pioneering work [7] proposed
non-overlapping group sparse representation that considers group structures and
supports group selection. The group structure has been utilized in various ways.
For example, Varoquaux et al. [8] used group sparsity prior to constrain all sub-
jects to share the same network topology. Wee et al. [9] used group constrained
sparsity to overcome inter-subject variability in the brain network construction.
To introduce the sparsity within each group, sparse group representation (SGR)
has also been developed by combining l1 -norm and lq,1 -norm constraints. For
example, a recent work [10] defined “group” based on the anatomical connec-
tivity, and then applied SGR to construct BFCN from the whole-brain fMRI
signals.
Note that, in all these existing methods, the l1 -norm constraint in both SR
and SGR penalizes each edge equally. That is, when learning the sparse represen-
tation for a certain ROI, BOLD signals in all other ROIs are treated equally. This
process ignores the similarity between BOLD signals of the considered ROI and
the other ROIs during the network reconstruction. Intuitively, if the BOLD signals of
two ROIs are highly similar, their strong connectivity should be kept or enhanced
during the BFCN construction, while weak connectivity should be suppressed.
In light of this, we introduce a link-strength related penalty in sparse represen-
tation. Moreover, to further make the penalty consistent across all similar links
in the whole brain network, we propose a group structure based constraint on
the similar links, allowing them to share the same penalty during the network
construction. In this way, we can jointly model the whole brain network, instead
of separately modeling a sub-network for each ROI. This is implemented by a
novel weighted sparse group regularization that considers sparsity, link strength,
and group structure in a unified framework.
To validate the effectiveness of our proposed method in constructing brain
functional network, we conduct experiments on a real fMRI dataset for the BFCN
construction and also for BFCN-based brain disorder diagnosis. The experimental
results in distinguishing MCI subjects from normal controls (NCs) confirm
that our proposed method, with a simple t-test for feature selection and a linear
SVM for classification, achieves superior classification performance compared
to the competing methods. The selected features (i.e., network connections) can
serve as potential biomarkers in future studies on early intervention for this
progressive and incurable disease.
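The t-test-plus-linear-SVM pipeline just described can be sketched with scipy and scikit-learn; the data below are synthetic, and training accuracy stands in for the paper's cross-validated evaluation:

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
# Toy data: 40 subjects x 100 vectorized network edges, binary labels
# (0 = NC, 1 = MCI); a few edges are made discriminative on purpose.
X = rng.standard_normal((40, 100))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :5] += 1.0

# Two-sample t-test on each edge; keep edges with p < 0.05.
_, p = ttest_ind(X[y == 0], X[y == 1], axis=0)
selected = p < 0.05

# Linear SVM on the selected edges (training accuracy only, as a sketch;
# a real evaluation would use nested cross-validation).
clf = LinearSVC(dual=False).fit(X[:, selected], y)
acc = clf.score(X[:, selected], y)
```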
The l1 -norm penalty involved in Eq. (1) penalizes each representation coef-
ficient with the same weight. In other words, it treats each ROI equally when
reconstructing a target ROI (xi ). As a result, sparse modeling methods based
on this formulation may reconstruct the target ROI from ROIs whose signals
differ greatly from the target's. Furthermore, the reconstruction of each
ROI is independent of the reconstructions of other ROIs; thus, the estimated
reconstruction coefficients for similar ROIs could vary greatly, which could
lead to an unstable BFCN construction. Hence, the link strength, which indicates
the signal similarity of two ROIs, should be considered in the BFCN construction.
where P_ji is the PC coefficient between the i-th ROI x_i and the j-th ROI x_j,
and σ is a parameter used to adjust the weight decay speed for the link strength
adaptor. Accordingly, the correlation-weighted sparse representation (WSR) can
be formulated as
min_W  (1/2) Σ_{i=1}^{N} || x_i − Σ_{j≠i} x_j W_ji ||_2^2 + λ Σ_{i=1}^{N} Σ_{j≠i} C_ji |W_ji|,   (3)
where C ∈ RN ×N is the link strength adaptor matrix with each element Cji
being inversely proportional to the similarity (i.e., PC coefficient) between the
signals in ROI xj and the signals in the target ROI xi .
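A weighted l1 penalty of this form can be handled with a standard lasso solver by rescaling columns (substituting u_j = C_ji W_ji); the sketch below uses toy data and our own function name to illustrate the trick:

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(A, x, c, lam):
    """Solve 0.5*||x - A w||^2 + lam * sum_j c_j*|w_j| by column rescaling:
    with u_j = c_j * w_j this becomes a standard lasso on A / c."""
    A_scaled = A / c                   # divide column j by c_j
    n = len(x)
    # sklearn's Lasso minimizes (1/(2n))*||x - A u||^2 + alpha*||u||_1,
    # so alpha = lam / n matches our objective after dividing by n.
    model = Lasso(alpha=lam / n, fit_intercept=False, max_iter=10000)
    model.fit(A_scaled, x)
    return model.coef_ / c

# Toy data: only predictor 0 is truly active and lightly penalized.
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 8))
w_true = np.zeros(8)
w_true[0] = 2.0
x = A @ w_true + 0.01 * rng.standard_normal(100)
c = np.full(8, 1.0)
c[1:] = 5.0                            # heavy penalty on the other predictors
w = weighted_lasso(A, x, c, lam=5.0)
```

Heavily penalized coefficients are driven to zero while the lightly penalized one is recovered almost unshrunk, mirroring how C_ji preserves strong links.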
Note that the reconstruction of x_i, i.e., the construction of the i-th sub-network,
is still independent of the reconstructions of sub-networks for other ROIs.
In order to further make this link-strength related penalty consistent across all
links with similar strength in the whole network, we propose a group structure
constraint on the similar links, allowing them to share the same penalty during
the whole BFCN construction. In this way, we can model the whole brain network
jointly, instead of separately modeling sub-networks of all ROIs.
[Figure: (a) color map of Pearson correlation coefficients (color scale −0.8 to 1); (b) histogram of group sizes for groups G1–G10 versus absolute PC value.]
Fig. 1. Illustration of group partition for a typical subject in our data. (a) Pearson
correlation coefficient matrix P . (b) The corresponding group partition (K = 10) of (a).
To identify the group structure, we partition all links, i.e., the pairwise con-
nections among ROIs, into K groups based on the PC coefficients. Specifically,
K non-overlapping groups of links are pre-specified by their corresponding PC
coefficients. Assuming the numerical range of the absolute value of the PC
coefficient |Pij | is [Pmin , Pmax ] with Pmin ≥ 0 and Pmax ≤ 1, we partition
[Pmin , Pmax ] into K uniform and non-overlapping partitions with the same inter-
val Δ = (Pmax − Pmin )/K. The k th group is defined as Gk = {(i, j) | |Pij | ∈
[Pmin + (k − 1)Δ, Pmin + kΔ]}. Figure 1 shows the grouping results by setting
K = 10 for illustration purposes. Most links in the network are weak, while
strong connectivity accounts for only a small fraction of links.
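The uniform partition of links into K groups by |P_ij| can be sketched as follows (toy data, our own function name):

```python
import numpy as np

def partition_links(P, K):
    """Assign each link (i, j), i < j, to one of K uniform bins of
    |P_ij| over [Pmin, Pmax], mirroring the group definition G_k."""
    iu = np.triu_indices_from(P, k=1)          # links = upper triangle
    strengths = np.abs(P[iu])
    pmin, pmax = strengths.min(), strengths.max()
    delta = (pmax - pmin) / K
    # bin index k in {1, ..., K}; clip so strength == Pmax lands in G_K
    k = np.minimum(((strengths - pmin) // delta).astype(int) + 1, K)
    return dict(zip(zip(*iu), k))

rng = np.random.default_rng(3)
P = np.corrcoef(rng.standard_normal((6, 50)))  # 6 ROIs -> 15 links
groups = partition_links(P, K=10)
```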
To integrate constraints on link strength, group structure, as well as the spar-
sity in a unified framework, we propose a novel weighted sparse group regular-
ization formulated as:
min_W  (1/2) Σ_{i=1}^{N} || x_i − Σ_{j≠i} x_j W_ji ||_2^2 + λ1 Σ_{i=1}^{N} Σ_{j≠i} C_ji |W_ji| + λ2 Σ_{k=1}^{K} d_k ||W_{G_k}||_q,   (4)

where ||W_{G_k}||_q = ( Σ_{(i,j)∈G_k} (W_ij)^q )^{1/q} is the l_q-norm (with q = 2 in this work), and d_k = e^{−E_k^2 / σ} is a pre-defined weight for the k-th group with E_k = (1/|G_k|) Σ_{(i,j)∈G_k} P_ij.
σ is the same parameter as in Eq. (2), set as the mean of all subjects' standard
deviations of the absolute PC coefficients. In Eq. (4), the first regularizer (the
l1-norm penalty) controls the overall sparsity of the reconstruction model, while
the second regularizer (the lq,1-norm penalty) induces sparsity at the group level.
3 Experiments
The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset is used in this
study. Specifically, 50 MCI patients and 49 NCs are selected from the ADNI-2
dataset in our experiments. Subjects from both groups were scanned using 3.0T
Philips scanners. SPM8 toolbox (http://www.fil.ion.ucl.ac.uk/spm/) was used
to preprocess the rs-fMRI data according to the well-accepted pipeline [6].
[Figure: (a) bar chart of seven classification performance metrics (ACC, SEN, SPE, AUC, YI, F-score, BAC) for PC, SR, WSR, SGR, and WSGR; (b) ROC curves of the five methods (true positive rate versus false positive rate).]
Fig. 3. Comparison of classification results by five methods, using both seven classifi-
cation performance metrics and ROC curve.
significantly outperforms PC, SR, WSR, and SGR at the 95% confidence level,
with p-values of 1.7 × 10^-7, 3.6 × 10^-6, 0.048, and 0.0017, respectively. The
superior performance of our method suggests the weighted group sparsity is ben-
eficial in constructing brain networks and also able to improve the classification
performance.
As the features selected by the t-test in each validation might differ, we
record all selected features during the training process. The 76 most frequently
selected features are visualized in Fig. 4, where the thickness of an arc indicates
the discriminative power of an edge, which is inversely proportional to the
estimated p-value. The colors of arcs are randomly generated to differentiate ROIs
Fig. 4. The most frequently selected connections for the 90 ROIs of AAL template. The
thickness of an arc indicates the discriminative power of an edge for MCI classification.
and connectivity for clear visualization. We can see that several brain regions
(as highlighted in the figure) are jointly selected as important features for MCI
classification. For example, a set of brain regions in the temporal pole, olfactory
areas and medial orbitofrontal cortex, as well as bilateral fusiform, are found to
have dense connections which are pivotal to MCI classification [14].
4 Conclusion
References
1. Fornito, A., Zalesky, A., Breakspear, M.: The connectomics of brain disorders. Nat.
Rev. Neurosci. 16, 159–172 (2015)
2. Smith, S.M., Miller, K.L., et al.: Network modelling methods for FMRI. NeuroIm-
age 54, 875–891 (2011)
3. Huang, S., Li, J., Sun, L., Ye, J., Fleisher, A., Wu, T.: Alzheimer’s Disease Neu-
roImaging Initiative: learning brain connectivity of Alzheimer’s disease by sparse
inverse covariance estimation. NeuroImage 50, 935–949 (2010)
4. Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection
with the lasso. Ann. Stat. 34(3), 1436–1462 (2006)
5. Lee, H., Lee, D.S., et al.: Sparse brain network recovery under compressed sensing.
IEEE Trans. Med. Imaging 30, 1154–1165 (2011)
6. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: uses
and interpretations. NeuroImage 52, 1059–1069 (2010)
7. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped
variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68, 49–67 (2006)
8. Varoquaux, G., Gramfort, A., Poline, J.B., Thirion, B.: Brain covariance selec-
tion: better individual functional connectivity models using population prior. In:
Advances in Neural Information Processing Systems, pp. 2334–2342 (2010)
9. Wee, C.Y., et al.: Group-constrained sparse fMRI connectivity modeling for mild
cognitive impairment identification. Brain Struct. Funct. 219, 641–656 (2014)
10. Jiang, X., Zhang, T., Zhao, Q., Lu, J., Guo, L., Liu, T.: Fiber connection pattern-
guided structured sparse representation of whole-brain fMRI signals for functional
network inference. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A. (eds.)
MICCAI 2015. LNCS, vol. 9349, pp. 133–141. Springer, Heidelberg (2015)
11. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM
Trans. Intell. Syst. Technol. 2, 27 (2011)
12. Liu, J., Ji, S., Ye, J.: SLEP: sparse learning with efficient projections. Arizona
State Univ. 6, 491 (2009)
13. DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L.: Comparing the areas under
two or more correlated receiver operating characteristic curves: a nonparametric
approach. Biometrics 44(3), 837–845 (1988)
14. Albert, M.S., DeKosky, S.T., Dickson, D., et al.: The diagnosis of mild cognitive
impairment due to Alzheimer's disease: recommendations from the National
Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for
Alzheimer's disease. Alzheimer's Dement. 7, 270–279 (2011)
Temporal Concatenated Sparse Coding
of Resting State fMRI Data Reveal Network
Interaction Changes in mTBI
Abstract. Resting state fMRI (rsfMRI) has been a useful imaging modality for
network level understanding and diagnosis of brain diseases, such as mild
traumatic brain injury (mTBI). However, effective methodologies that can detect
group-wise and longitudinal changes of network interactions in mTBI are still
needed. The major challenges are twofold: (1) there is no individualized yet
common network system that can serve as a reference platform for statistical
analysis; (2) networks and their interactions are usually not modeled in the same
algorithmic structure, which introduces bias and uncertainty. In this paper, we
propose a novel temporal concatenated sparse coding (TCSC) method to address
these challenges. Based on sparse graph theory, the proposed method can
model the commonly shared spatial maps of networks and the local dynamics of
the networks in each subject within one algorithmic structure. Naturally, the local
dynamics are not comparable across subjects in rsfMRI or across groups;
however, based on the correspondence established by the common spatial
profiles, the interactions of these networks can be modeled individually and
statistically assessed in a group-wise fashion. The proposed method has been
applied on an mTBI dataset with acute and sub-acute stages, and experimental
results have revealed meaningful network interaction changes in mTBI.
1 Introduction
Mild traumatic brain injury (mTBI) has received increasing attention as a significant
public health burden worldwide [1, 2]. Microstructural damage can be found in
most cases of mTBI using diffusion MRI [3, 4]. Meanwhile, many studies based on
resting state fMRI (rsfMRI) have reported functional impairments at the network
level in memory, attention, executive function, and processing speed [5–7].
However, effective methodologies that can longitudinally model the changes of
interactions among brain networks, which reflect neural plasticity and functional
compensation during different stages of mTBI, are still lacking. The challenges
are mainly twofold: (1) there is no individualized yet common network system
that can serve as a reference platform for statistical analysis; (2) networks and
their interactions are usually not modeled in the same algorithmic structure,
which introduces bias and uncertainty.
Conventional network analysis methods mainly include three streams: seed-based
network analysis [8], graph theory based quantitative network analysis [9], and
data-driven ICA component analysis [6, 10, 11]. Recently, sparse coding has attracted
intense attention in the fMRI analysis field because the sparsity constraint coincides
with the nature of neural activities, which makes it feasible in modeling the diversity of
brain networks [12–14]. Based on the sparse graph theory, whole brain fMRI signals
can be modeled by a learned dictionary of basis vectors and a sparse parameter
matrix. Each voxel's signal is sparsely and linearly represented by the learned
dictionary with a sparse parameter vector [12–14]. The sparse parameters can be
projected to the brain volume as spatial functional networks. The methodology has been
validated to be effective in reconstructing concurrent brain networks from fMRI data
[12–14]. However, functional interactions among these networks have not been well
explored, especially for group-wise statistics on rsfMRI data. In this paper, we
propose a novel group-wise temporally concatenated sparse coding method for
modeling resting state functional networks and their network-level interactions.
Briefly, a dictionary matrix and a parameter matrix are learned from the
temporally concatenated fMRI data of multiple subjects and groups. Common
network spatial profiles can then be reconstructed from the parameter matrix.
Interestingly, the learned dictionary is also temporally concatenated and can be
decomposed into the dictionary of each subject in each group to represent the
local dynamics of the common networks. Although the local dynamics of each
network are quite individualized, their interactions turn out to be comparable
based on the correspondence built by the common spatial profiles. The proposed
method has been applied to a longitudinal mTBI dataset, and our results show
that network interaction changes can be detected at different stages of mTBI,
suggesting brain recovery and plasticity after injury.
2.1 Overview
Briefly, our method is designed for cross-group analysis and longitudinal modeling.
RsfMRI data from multiple subjects and groups are first pre-processed, and then
spatially and temporally normalized, after which the fMRI signals are temporally
concatenated. There are mainly two steps in our framework. As shown in
Fig. 1, in the first step, based on temporal concatenated sparse coding (TCSC), we
model common spatial profiles of brain networks and local network dynamics at the
same time. In the second step (Fig. 2), based on the local dynamics, functional inter-
actions among networks will be calculated, statistically assessed and compared among
groups. In this research, there are two groups of subjects, which are healthy controls
48 J. Lv et al.
and mTBI patients. For each group, there are two longitudinal stages: stage 1 as
patients at the acute stage and controls at the first visit, and stage 2 as patients at
subacute stage and controls at the second visit.
multiple groups are concatenated as the input of dictionary learning and sparse coding,
S = [s_1, s_2, ..., s_i, ..., s_n] (Fig. 1b). Eventually, the concatenated input matrix is
decomposed into a concatenated dictionary matrix D (Fig. 1c) and a parameter matrix
A = [a_1, a_2, ..., a_i, ..., a_n] (Fig. 1d). Each row of the matrix A is projected to the
brain volume to represent a functional network (Fig. 1e). As the learning is based on
groups of subjects, the networks are group-wise, common spatial profiles.
The dictionary learning and sparse coding problem is a matrix factorization
optimization problem in the machine learning field [15]. The cost function can be
summarized in Eq. (1) as the average loss over single-signal representations.

f_n(D) ≜ (1/n) Σ_{i=1}^{n} ℓ(s_i, D)   (1)
The loss function for each input signal is defined in Eq. (2), in which an ℓ1
regularization term is introduced to yield a sparse solution for a_i.

ℓ(s_i, D) ≜ min_{D ∈ C, a_i ∈ R^m} (1/2) ||s_i − D a_i||_2^2 + λ ||a_i||_1   (2)

C ≜ { D ∈ R^{t×m}  s.t.  ∀ j = 1, ..., m,  d_j^T d_j ≤ 1 }   (3)
For this problem, an established open-source solution is provided by the online
dictionary learning method [15] in the SPArse Modeling Software (SPAMS,
http://spams-devel.gforge.inria.fr/). We adopt SPAMS to solve our temporal
concatenated sparse coding problem.
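As a concrete sketch of the TCSC factorization S ≈ D A, the snippet below uses scikit-learn's `MiniBatchDictionaryLearning` as a stand-in for SPAMS; the shapes and the per-subject split of D follow the description above, while all sizes are toy values:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(4)
n_subj, t, n_vox, m = 3, 20, 200, 10   # toy sizes

# Temporally normalize each subject's (t x n_vox) data, then concatenate
# along time, giving S of shape (n_subj * t, n_vox).
subjects = [rng.standard_normal((t, n_vox)) for _ in range(n_subj)]
S = np.vstack([(s - s.mean(0)) / s.std(0) for s in subjects])

# Factorize S ~ D @ A. MiniBatchDictionaryLearning models
# samples ~ code @ components, so voxel signals (columns of S) are samples.
learner = MiniBatchDictionaryLearning(n_components=m, alpha=1.0,
                                      transform_algorithm='lasso_lars',
                                      transform_alpha=1.0, random_state=0)
codes = learner.fit_transform(S.T)     # (n_vox, m) sparse parameters
D = learner.components_.T              # (n_subj * t, m) concatenated dictionary
A = codes.T                            # (m, n_vox): rows -> spatial networks

# D decomposes back into per-subject blocks of t rows (local dynamics).
D_subj = np.split(D, n_subj, axis=0)
```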
Fig. 2. Statistics on network interactions across two stages and two groups.
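The group-comparison step of Fig. 2 can be sketched as follows; note that taking the interaction between two networks as the Pearson correlation of the corresponding columns of a subject's dictionary block is our reading of the unstated detail, and all data here are synthetic:

```python
import numpy as np
from scipy.stats import ttest_ind

def interaction_matrix(D_block):
    """Pairwise Pearson correlation of a subject's m network time
    courses (columns of its t x m dictionary block)."""
    return np.corrcoef(D_block.T)

# Synthetic stand-in: 15 subjects per group, t = 20, m = 6 networks.
rng = np.random.default_rng(5)
t, m = 20, 6
groupA = [interaction_matrix(rng.standard_normal((t, m))) for _ in range(15)]
groupB = [interaction_matrix(rng.standard_normal((t, m))) for _ in range(15)]

# Vectorize the m*(m-1)/2 interactions and t-test each across groups.
iu = np.triu_indices(m, k=1)
a = np.array([M[iu] for M in groupA])
b = np.array([M[iu] for M in groupB])
tval, pval = ttest_ind(a, b, axis=0)
significant = pval < 0.01              # threshold used in Table 1
```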
3 Results
mTBI has heterogeneous causes, and the micro-damage in brain tissue differs
considerably across subjects. However, based on cognitive tests and literature
reports, patients usually suffer from similar functional deficits, so we group
them together to explore common functional interaction changes.
In this section, we first present meaningful networks from the concatenated sparse
coding and then analyze the statistical interaction differences among the four
groups. Note that there are two scans for each subject, and we use the following
abbreviations: C1: stage 1 of the control group; C2: stage 2 of the control group;
P1: stage 1 of the patient group; and P2: stage 2 of the patient group.
Fig. 3. The networks reconstructed from the matrix A of the TCSC method. Each network is
visualized with volume rendering and surface mapping from the most representative view.
Table 1. T-test design and number of interactions with significant difference (p < 0.01).
T-Test Design C1 C2 P1 P2
C1 Non 0 3 0
C2 0 Non 2 0
P1 2 4 Non 0
P2 2 4 3 Non
T-Test Design C P
C Non 2
P 8 Non
This indicates that, to compensate for the functional loss caused by the
micro-injuries of mTBI, multiple networks and their interactions combine to
generate alternative functional pathways [16]. These could be signs of neural
plasticity [17].
For the longitudinal analysis, we expect P1 (acute stage) and P2 (sub-acute
stage) to differ, so t-tests are performed separately against the control groups
as well as between the two patient stages. For validation, we also treat C1 and
C2 as different groups. First, from Table 1, no difference is detected between C1
and C2, as expected. Interestingly, C1 and C2 have stronger interactions than P1
(Fig. 4c–d), but no interactions stronger than P2. This indicates that patients at
the sub-acute stage are recovering towards normal, and during recovery some
interactions are also strengthened (Fig. 4e) in P2. The interaction of N14 and
N20 is stably decreased in the P1
group, which makes sense because both N14 and N20 are related to memory function.
P1 and P2 both have stronger interactions than the control group, but these
interactions are quite different (Fig. 4f–i). For example, N24 (DMN)-centered
interactions are enhanced in P1, which might suggest strengthened functional
regulation for functional compensation, while N18 (cerebellum)-centered
interactions are enhanced in P2. These findings are interesting, and explicit
interpretations of these interactions will be explored in future work.
4 Conclusion
Acknowledgement. This work was supported by NSF CAREER Award IIS-1149260, NSF
BCS-1439051, NSF CBET-1302089, NIH R21NS090153 and Grant W81XWH-11-1-0493.
References
1. Iraji, A., et al.: The connectivity domain: analyzing resting state fMRI data using
feature-based data-driven and model-based methods. Neuroimage 134, 494–507 (2016)
2. Kou, Z., Iraji, A.: Imaging brain plasticity after trauma. Neural Regen. Res. 9, 693–700
(2014)
3. Kou, Z., VandeVord, P.J.: Traumatic white matter injury and glial activation: from basic
science to clinics. Glia 62, 1831–1855 (2014)
4. Niogi, S.N., Mukherjee, P.: Diffusion tensor imaging of mild traumatic brain injury. J. Head
Trauma Rehabil. 25, 241–255 (2010)
5. Mayer, A.R., et al.: Functional connectivity in mild traumatic brain injury. Hum. Brain
Mapp. 32, 1825–1835 (2011)
6. Iraji, A., et al.: Resting state functional connectivity in mild traumatic brain injury at the
acute stage: independent component and seed based analyses. J. Neurotrauma 32, 1031–
1045 (2014)
7. Stevens, M.C., et al.: Multiple resting state network functional connectivity abnormalities in
mild traumatic brain injury. Brain Imaging Behav. 6, 293–318 (2012)
8. Fox, M., Raichle, M.: Spontaneous fluctuations in brain activity observed with functional
magnetic resonance imaging. Nat. Rev. Neurosci. 8(9), 700 (2007)
9. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural
and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)
10. van de Ven, V., Formisano, E., Prvulovic, D., Roeder, C., Linden, D.: Functional
connectivity as revealed by spatial independent component analysis of fMRI measurements
during rest. Hum. Brain Mapp. 22(3), 165–178 (2004)
11. Iraji, A., et al.: Compensation through functional hyperconnectivity: a longitudinal
connectome assessment of mild traumatic brain injury. Neural Plast. 2016, 4072402 (2016)
12. Lee, Y.B., Lee, J., Tak, S., et al.: Sparse SPM: sparse-dictionary learning for resting-state
functional connectivity MRI analysis. Neuroimage 125 (2015)
13. Lv, J., et al.: Assessing effects of prenatal alcohol exposure using group-wise sparse
representation of fMRI data. Psychiatry Res. Neuroimaging 233(2), 254–268 (2015)
14. Lv, J., et al.: Sparse representation of whole-brain FMRI signals for identification of
functional networks. Med. Image Anal. 20(1), 112–134 (2014)
15. Mairal, J., Bach, F., Ponce, J., et al.: Online learning for matrix factorization and sparse
coding. J. Mach. Learn. Res. 11(1), 19–60 (2010)
16. Chen, H., Iraji, A., Jiang, X., Lv, J., Kou, Z., Liu, T.: Longitudinal analysis of brain recovery
after mild traumatic brain injury based on groupwise consistent brain network clusters. In:
Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, Part II. LNCS,
vol. 9350, pp. 194–201. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24571-3_24
17. Mishina, M.: Neural plasticity and compensation for human brain damage. Nihon Ika
Daigaku Igakkai Zasshi 10(2), 101–105 (2014)
Exploring Brain Networks via Structured
Sparse Representation of fMRI Data
1 Introduction
Functional magnetic resonance imaging (fMRI) signal analysis and functional brain
network investigation using sparse representation have received increasing interest
in the neuroimaging field [1, 10]. The main theoretical assumption is that each
brain fMRI signal can be represented as a sparse linear combination of a set of
basis signals in an over-complete dictionary. This data-driven strategy of
dictionary learning and sparse coding is efficient and effective in reconstructing
concurrent and interactive functional networks from both resting state fMRI
(rsfMRI) and task-based fMRI (tfMRI) data [1, 10]. However, these approaches
leave room for further improvement, because purely data-driven sparse coding
does not integrate brain science domain knowledge
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 55–62, 2016.
DOI: 10.1007/978-3-319-46720-7_7
56 Q. Zhao et al.
2 Method
2.1 Overview
Our computational framework of AGSMR is illustrated in Fig. 1. fMRI images of
each individual brain are first registered into a standard space (MNI) to align
with the AAL template. fMRI signals are then extracted from a whole-brain mask,
and an over-complete signal dictionary is learned via the online dictionary
learning method. With the learned dictionary as a set of features (regressors),
the group-structured multi-task regression employs anatomical structures as group
information to regress whole-brain signals. Finally, the coefficient matrix is
mapped back to the brain volume to represent functional brain networks.
Fig. 1. The flowchart of the proposed AGSMR pipeline. Step 1: data acquisition,
preprocessing, and extraction of whole-brain signals. Step 2: learning the
dictionary D from the whole-brain signals. Step 3: labelling the signals via the
AAL template. Step 4: feature selection based on the AGSMR method. Step 5:
mapping the selected features (coefficient matrix) in the whole brain to identify
meaningful functional networks.
f_n(D) ≜ (1/n) Σ_{i=1}^{n} ℓ(x_i, D)   (1)

ℓ(x_i, D) ≜ min_{a_i ∈ R^m} (1/2) ||x_i − D a_i||_2^2 + λ ||a_i||_1   (2)
where D = [d_1, d_2, ..., d_m] ∈ R^{t×m} (t is the number of fMRI time points and
m is the number of dictionary atoms) is the dictionary, with each column
representing a basis vector. The ℓ1 regularization in Eq. (2) is adopted to
generate a sparse solution; D and a are alternately updated and learned using the
online dictionary learning algorithm [4]. The learned D is adopted as the set of
features (regressors) to perform sparse representation, and the proposed
structured sparse representation of brain fMRI signals is detailed in Sect. 2.5.
â = argmin_a ℓ(a) + λ φ(a)   (3)

where ℓ(a) is the loss function, φ(a) is the regularization term, which regularizes
feature selection while achieving sparsity, and λ > 0 is the regularization
parameter. Given the learned dictionary D = [d_1, d_2, ..., d_m] ∈ R^{t×m}
(Sect. 2.3), the conventional LASSO regression of brain fMRI signals
X = [x_1, x_2, ..., x_n] ∈ R^{t×n} to obtain a sparse coefficient matrix
a = [a_1, a_2, ..., a_n] ∈ R^{m×n} is defined as:

â = argmin_a Σ_{i=1}^{n} ||x_i − D a_i||_2^2 + λ Σ_{i=1}^{n} Σ_{j=1}^{m} |a_ij|   (4)
where ℓ(a) is the least-squares loss, φ(a) is the ℓ1-norm regularization term that
induces sparsity, a_ij is the coefficient element in the i-th column and j-th row,
and m is the dictionary size. Equation (4) can be viewed as a LASSO-penalized
least-squares problem. The conventional LASSO in Eq. (4) is a purely data-driven
approach. However, previous studies [2, 3, 6] have shown that prior structure
information, such as disjoint/overlapping groups, trees, and graphs, may
significantly improve classification/regression performance and help identify
important features [3].
In this paper, we introduce a novel structured sparse representation approach
(group-guided structured multi-task regression) into the regression of fMRI
signals. Specifically, the group information of fMRI signals is defined by the
anatomical structure in Sect. 2.4, i.e., the whole-brain fMRI signals are
separated into V groups {G_1, G_2, ..., G_V} based on the AAL template. The
conventional LASSO adopts the ℓ1-norm regularization term to induce sparsity
(Eq. (4)); here, an ℓ2-norm penalty is additionally introduced into the penalty
term, as shown in Eq. (5), which improves intra-group homogeneity. Using the
ℓ1-norm jointly with ℓ2-norms thus induces both intra-group sparsity and
inter-group sparsity in Eq. (5).
â = argmin_a Σ_{i=1}^{n} ||x_i − D a_i||_2^2 + λ Σ_{i=1}^{n} Σ_{j=1}^{m} |a_ij| + (1 − λ) Σ_{j=1}^{m} Σ_{s=1}^{V} ω_s ||a_j^{G_s}||_2   (5)
Thus, Eq. (5) can also be viewed as a structured-sparsity-penalized multi-task
least-squares problem; for the detailed solution of this structured LASSO-penalized
multi-task least-squares problem with combined ℓ1 and ℓ2 norms, we refer to
[6, 8]. Our final learning problem is summarized in Eq. (5), and the SLEP package
(http://yelab.net/software/SLEP/) is employed to solve it and to learn the
coefficient matrix a. From a brain science perspective, the learned coefficient
matrix a includes the spatial features of functional networks, and each row of a
was mapped back to the brain volume to identify and quantitatively characterize
meaningful functional networks, similar to the methods in [1].
3 Results
S = |A ∩ B| / |B|   (6)

where A is the spatial map of our identified network component and B is that of
the RSN template network, and |A ∩ B| and |B| denote numbers of voxels.
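The similarity measure S = |A ∩ B| / |B| is a one-liner over boolean voxel masks; a toy sketch:

```python
import numpy as np

def spatial_similarity(A_mask, B_mask):
    """S = |A intersect B| / |B| for two boolean voxel masks (Eq. 6)."""
    return np.logical_and(A_mask, B_mask).sum() / B_mask.sum()

# Toy masks over a 10 x 10 x 10 volume: A covers slices 0-4 (500 voxels),
# B covers slices 3-6 (400 voxels), so the overlap is 200 voxels.
A = np.zeros((10, 10, 10), dtype=bool)
A[:5] = True
B = np.zeros((10, 10, 10), dtype=bool)
B[3:7] = True
s = spatial_similarity(A, B)           # 200 / 400 = 0.5
```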
We performed quantitative measurements on the working memory task dataset to
demonstrate the performance of our method. We selected 10 well-known resting
state networks for spatial similarity comparison. The identified networks are
visualized in Fig. 2, where RSNs #1–#10 denote the 10 resting state template
networks (RSNs) and #1–#10 denote our identified networks. As shown, the
networks identified by our method are consistent with the templates. Slices #1,
#2, and #3 are visual networks, corresponding to the medial, occipital pole, and
lateral visual areas; slice #4 is the default mode network (DMN); slices #5 to #8
are the cerebellum, sensorimotor, auditory, and executive control networks,
respectively; and slices #9 and #10 are frontoparietal networks. The activated
areas of all identified networks are consistent with the template networks;
detailed comparison results are given in Table 1.
To validate that our method is effective and robust, we used seven different task
datasets to test our approach. Figure 3 shows the results, and Table 1 reports
the similarity to the templates on the 7 datasets.
60 Q. Zhao et al.
Fig. 2. Comparison of 10 resting state networks (RSNs) with the networks identified by our
method on the working memory task dataset. The figures (RSNs#1–RSNs#10) show the 10 resting
state template networks [11] and (#1–#10) the networks identified by our method.
Table 1. Similarity coefficients between our results and the templates. The first column
lists the 7 tasks. The first row (#1–#10) indexes the 10 networks. The average similarity
across the 7 tasks is 0.623.
Task #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
WM 0.84 0.66 0.72 0.61 0.67 0.74 0.68 0.45 0.63 0.69
Emotion 0.82 0.65 0.61 0.46 0.54 0.84 0.62 0.42 0.65 0.70
Gambling 0.86 0.65 0.61 0.53 0.54 0.57 0.55 0.43 0.66 0.73
Language 0.86 0.66 0.62 0.57 0.74 0.56 0.62 0.45 0.67 0.72
Motor 0.83 0.66 0.62 0.47 0.47 0.53 0.51 0.41 0.60 0.79
Relational 0.81 0.68 0.62 0.47 0.47 0.53 0.51 0.41 0.60 0.79
Social 0.82 0.66 0.67 0.48 0.54 0.56 0.63 0.42 0.71 0.71
Fig. 3. (a) and (b) show the 10 resting state networks of one randomly selected subject from
the HCP Q1 datasets. The first row lists the 7 different tasks; the seven columns correspond
to the 7 tasks, and the last column shows the corresponding resting state network templates.
Fig. 4. Identified visual network (a) and executive control and auditory networks (b) from the
template, LASSO, and our method, on the working memory dataset.
Table 2. Comparison of the two methods by calculating similarities with the templates. The
first row represents the 10 resting state networks (#1–#10). The first column represents the
two methods, and the second column the two datasets, working memory (WM) and gambling (GB).
In general, our method achieves higher similarity than the LASSO method.
Method Task #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
Lasso WM 0.79 0.64 0.71 0.62 0.50 0.48 0.53 0.44 0.58 0.68
GB 0.83 0.64 0.55 0.49 0.52 0.56 0.47 0.43 0.52 0.62
AGSMR WM 0.84 0.66 0.73 0.61 0.71 0.74 0.68 0.45 0.63 0.69
GB 0.86 0.65 0.61 0.53 0.54 0.57 0.55 0.43 0.66 0.73
4 Conclusion
Acknowledgements. This research was supported in part by Jiangsu Natural Science Founda-
tion (Project No. BK20131351), by the Chinese scholarship council (CSC).
References
1. Lv, J., Jiang, X., Li, X., Zhu, D., Chen, H., Zhang, T., Hu, X., Han, J., Huang, H., Zhang, J.:
Sparse representation of whole-brain fMRI signals for identification of functional networks.
Med. Image Anal. 20, 112–134 (2015)
2. Kim, S., Xing, E.P.: Tree-guided group lasso for multi-task regression with structured
sparsity. In: ICML, pp. 543–550 (2010)
3. Ye, J., Liu, J.: Sparse methods for biomedical data. ACM SIGKDD Explor. Newslett. 14,
4–15 (2012)
4. Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online learning for matrix factorization and sparse
coding. J. Mach. Learn. Res. 11, 19–60 (2010)
5. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser.
B (Methodological) 58, 267–288 (1996)
6. Liu, J., Ji, S., Ye, J.: Multi-task feature learning via efficient ℓ2,1-norm minimization. In:
Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence,
pp. 339–348. AUAI Press (2009)
7. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N.,
Mazoyer, B., Joliot, M.: Automated anatomical labeling of activations in SPM using a
macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15,
273–289 (2002)
8. Liu, J., Ji, S., Ye, J.: SLEP: Sparse learning with efficient projections. Arizona State
University (2009)
9. Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K.:
WU-Minn HCP consortium. The WU-Minn human connectome project: an overview.
Neuroimage 80, 62–79 (2013)
10. Lv, J., Jiang, X., Li, X., Zhu, D., Zhang, S., Zhao, S., Chen, H., Zhang, T., Hu, X., Han, J.,
Ye, J.: Holistic atlases of functional networks and interactions reveal reciprocal organiza-
tional architecture of cortical function. IEEE Trans. Biomed. Eng. 62, 1120–1131 (2015)
11. Smith, S., Fox, P., Miller, K., Glahn, D., Fox, P., Mackay, C., Filippini, N., Watkins, K.,
Toro, R., Laird, A., Beckmann, C.: Correspondence of the brain’s functional architecture
during activation and rest. Proc. Natl. Acad. Sci. U.S.A. 106, 13040–13045 (2009)
Discover Mouse Gene Coexpression Landscape
Using Dictionary Learning and Sparse Coding
1 Introduction
2 Methods
2.1 Experimental Setup
We downloaded the 4,345 3D volumes of expression energy of coronal sections and
the Allen Reference Atlas (ARA) from the website of AMBA (http://mouse.brain-map.
org/). The ISH data were collected in tissue sections, then digitally processed, stacked,
Fig. 1. Computational pipeline for constructing GCNs. (a) Input is one slice of 3D expression
grids of all genes. (b) Raw ISH data preprocessing step that removes unreliable genes and voxels
and estimates the remaining missing data. (c) Dictionary learning and sparse coding of ISH
matrix with sparse and non-negative constraints on the coefficient matrix α. (d) Visualization of
spatial distributions of GCNs. (e) Enrichment analysis of GCNs.
⟨D, α⟩ = argmin (1/2) ‖X − Dα‖₂²   s.t. ‖α‖₁ ≤ k; ∀i, α_i ≥ 0   (1)
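For a fixed dictionary D, the sparse non-negative coding step implied by Eq. (1) can be sketched as projected ISTA using the penalized (Lagrangian) form of the ℓ1 constraint; the step size, penalty weight, and toy data below are illustrative assumptions, not the settings of the online solver [8] used in the paper:

```python
import numpy as np

def nonneg_sparse_code(X, D, lam=0.1, n_iter=200):
    """Approximately minimize 1/2*||X - D@a||^2 + lam*||a||_1 with a >= 0,
    for a fixed dictionary D (features x atoms), via projected ISTA."""
    a = np.zeros((D.shape[1], X.shape[1]))
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1/L, L = spectral norm squared
    for _ in range(n_iter):
        grad = D.T @ (D @ a - X)                # gradient of the quadratic term
        a = np.maximum(a - step * (grad + lam), 0.0)  # shrink, project to a >= 0
    return a

rng = np.random.default_rng(0)
D = rng.random((20, 5))                         # toy dictionary
X = D @ np.abs(rng.random((5, 3)))              # toy data with nonnegative codes
a = nonneg_sparse_code(X, D)
```

The full DLSC procedure alternates this coding step with dictionary updates.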
3 Results
DLSC allows readily interpretable results by plotting the spatial distributions of GCNs.
A visual inspection showed a set of spatially contiguous clusters partitioning the slice
(Fig. 2a, e). Many of the formed clusters correspond to one or more canonical anatomical
regions, providing an intuitive validation of the approach.
We demonstrate the effectiveness of DLSC by showing that the GCNs are mathematically
valid and biologically meaningful. Since the grouping of genes is based purely on their
expression patterns, a mathematically sound method will partition the genes so that
expression patterns are similar within a group and dissimilar between groups. One caveat
is that a gene may be expressed in multiple cell types or participate in multiple
functional pathways, so maintaining the dissimilarity between groups may not be
necessary. At the same time, the method should balance this against the biological goal
of finding functionally enriched networks. As examples, slices 27 and 38 are analyzed
and discussed in depth due to their good anatomical coverage of various brain regions.
Using a fixed gene-dictionary ratio of 100, 29 GCNs were identified for slice 27 and
31 GCNs were constructed on slice 38.
Fig. 2. Visualization of the spatial distribution of GCNs and the corresponding raw ISH data.
On the left are the slice ID and GCN ID. The second column shows the spatial maps of two GCNs,
one for each slice, followed by the raw ISH data of 3 representative genes. Gene acronyms and
their weights in the GCN are listed at the bottom. The weights indicate the extent to which a
gene conforms to the GCN.
68 Y. Li et al.
Fig. 3. Visualization of spatial distribution of GCNs enriched for major cell types, particular
brain regions and function/disease related genes. In each panel, top row: Slice ID and GCN ID;
second row: spatial map; third row: sub-category; fourth row: highly weighted genes in the
sub-category.
In addition to cell-type-specific GCNs, we also found some GCNs remarkably selective
for particular brain regions, such as GCN3 (Fig. 3e) in CA1, GCN5 (Fig. 3f) in the
thalamus, GCN11 (Fig. 3g) in the hypothalamus, and GCN16 (Fig. 3h) in the caudoputamen.
Other GCNs with more complex anatomical patterning revealed close associations with
biological functions and brain diseases. The GCNs associated with ubiquitous functions
such as ribosomal (Fig. 3j) and mitochondrial (Fig. 3k) functions have a wide coverage
of the brain. A functional annotation suggested GCN12 of slice 27 is highly enriched
for the ribosome pathway (p = 6.3 × 10⁻⁵). As for GCN21 on the same slice, besides
mitochondrial function (p = 1.5 × 10⁻⁸), it is also enriched in categories including
neuron (p = 5.4 × 10⁻⁸) and postsynaptic proteins (p = 6.3 × 10⁻⁸) when compared with
the literature [10]. One significant GO term, synaptic transmission (p = 1.1 × 10⁻⁵),
may explain the strong signals in the cortex regions. GCN13 of slice 38 (Fig. 3i)
showed strong associations with genes found to be downregulated in Alzheimer's disease.
Comparison with autism-susceptibility genes derived from microarray and high-throughput
RNA-sequencing data [13] indicates an association for GCN24 of slice 27
(p = 1.0 × 10⁻³) (Fig. 3h). Despite slightly lower weights, the three most significant
genes, Met, Pip5k1b, and Avpr1a, have all been reported to be altered in autism
patients [13].
4 Discussion
References
1. Tavazoie, S., Hughes, J.D., et al.: Systematic determination of genetic network architecture.
Nat. Genet. 22, 281–285 (1999)
2. Stuart, J.M.: A gene-coexpression network for global discovery of conserved genetic
modules. Science 302, 249–255 (2003)
3. Gaiteri, C., Ding, Y., et al.: Beyond modules and hubs: the potential of gene coexpression
networks for investigating molecular mechanisms of complex brain disorders. Genes. Brain
Behav. 13, 13–24 (2014)
4. Bohland, J.W., Bokil, H., et al.: Clustering of spatial gene expression patterns in the mouse
brain and comparison with classical neuroanatomy. Methods 50, 105–112 (2010)
5. Eisen, M.B., Spellman, P.T., et al.: Cluster analysis and display of genome-wide expression
patterns. Proc. Natl. Acad. Sci. U. S. A. 95, 12930–12933 (1999)
6. Langfelder, P., Horvath, S.: WGCNA: an R package for weighted correlation network
analysis. BMC Bioinform. 9, 559 (2008)
7. Lein, E.S., Hawrylycz, M.J., Ao, N., et al.: Genome-wide atlas of gene expression in the
adult mouse brain. Nature 445, 168–176 (2007)
8. Mairal, J., Bach, F., et al.: Online learning for matrix factorization and sparse coding.
J. Mach. Learn. Res. 11, 19–60 (2010)
9. Dennis, G., Sherman, B.T., et al.: DAVID: database for annotation, visualization, and
integrated discovery. Genome Biol. 4, P3 (2003)
10. Miller, J.A., Cai, C., et al.: Strategies for aggregating gene expression data: the collapseRows
R function. BMC Bioinform. 12, 322 (2011)
11. Cahoy, J., Emery, B., Kaushal, A., et al.: A transcriptome database for astrocytes, neurons,
and oligodendrocytes: a new resource for understanding brain development and function.
J. Neurosci. 28, 264–278 (2008)
12. Winden, K.D., Oldham, M.C., et al.: The organization of the transcriptional network in
specific neuronal classes. Mol. Syst. Biol. 5, 1–18 (2009)
13. Voineagu, I., Wang, X., et al.: Transcriptomic analysis of autistic brain reveals convergent
molecular pathology. Nature 474(7351), 380–384 (2011)
Integrative Analysis of Cellular Morphometric
Context Reveals Clinically Relevant Signatures
in Lower Grade Glioma
This work was supported by NIH R01 CA184476 carried out at Lawrence Berkeley
National Laboratory.
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 72–80, 2016.
DOI: 10.1007/978-3-319-46720-7 9
1 Introduction
Histology sections provide a wealth of information about the tissue architecture,
which contains multiple cell types at different states of the cell cycle. These
sections are often stained with hematoxylin and eosin (H&E), which label
DNA (e.g., nuclei) and protein contents, respectively, in various shades of color.
Morphometric aberrations in tumor architecture often lead to disease progression,
and it is desirable to quantify indices associated with these aberrations
since they can be tested against clinical outcomes, e.g., survival or response to
therapy.
For the quantitative analysis of the H&E stained sections, several excellent
reviews can be found in [7,8]. Fundamentally, the trend has been based either on
nuclear segmentation and corresponding morphometric representation, or patch-
based representation of the histology sections that aids in clinical association.
The major challenge for tissue morphometric representation is the large amount
of technical and biological variation in the data. To overcome this problem,
recent studies have focused on either fine tuning human engineered features [1,
4,11,12], or applying automatic feature learning [5,9,15,16,19,20], for robust
representation and characterization.
Even though there are inter- and intra-observer variations [6], a trained
pathologist always uses rich content (e.g., various cell types, cellular organiza-
tion, cell state and health), in context, to characterize tumor architecture and
heterogeneity for the assessment of disease state. Motivated by the works of
[13,18], we encode cellular morphometric signatures within the spatial pyramid
matching (SPM) framework for robust representation (i.e., cellular morphomet-
ric context) of WSIs in a large cohort, with an emphasis on tumor architecture
and tumor heterogeneity. On this basis, an integrative analysis pipeline is con-
structed for associating cellular morphometric context with clinical outcomes
and molecular data, with potential for generating hypotheses about imaging
biomarkers for personalized diagnosis or treatment. The proposed approach is
applied to the TCGA LGG cohort, where experimental results (i) reveal several
clinically relevant cellular morphometric types, which enable both perceptual
interpretation/validation and further investigation through gene set enrichment
analysis; and (ii) indicate significantly increased survival rates in one of the
subtypes derived from the cellular morphometric context.
2 Approaches
descriptors are described in [3], and the constructed cellular morphometric con-
text representations are released on our website1 .
1. Construct cellular morphometric types (D), where D = [d1 , ..., dK ] are the
K cellular morphometric types to be learned by the following optimization:
min_{D,Z} Σ_{m=1}^{M} ‖x_m − z_m D‖²   (1)
subject to card(z_m) = 1, |z_m| = 1, z_m ≥ 0, ∀m
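Because card(z_m) = 1, |z_m| = 1, and z_m ≥ 0 force each z_m to select exactly one dictionary element with unit weight, Eq. (1) reduces to k-means clustering of the morphometric descriptors. A minimal sketch of the two alternating steps (toy 2-D data and K = 2 rather than the paper's K = 64):

```python
import numpy as np

def kmeans_types(X, K, n_iter=50):
    """Alternate the two updates implied by Eq. (1): assign each descriptor x_m
    to its nearest type d_k (the z_m update), then recompute each type as the
    mean of its assigned descriptors (the D update)."""
    D = X[np.linspace(0, len(X) - 1, K).astype(int)].copy()  # spread-out init
    for _ in range(n_iter):
        z = np.argmin(((X[:, None, :] - D[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if (z == k).any():
                D[k] = X[z == k].mean(0)
    return D, z

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)),   # toy cluster A
               rng.normal(5.0, 0.1, (30, 2))])  # toy cluster B
D, z = kmeans_types(X, K=2)
```

Each learned row of D plays the role of one cellular morphometric type.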
In our experiment, K is fixed to 64. Meanwhile, given that each patient may
contribute multiple WSIs, SPM is applied at a single scale for convenient
construction of the cellular morphometric context as well as integrative analysis
at the patient level, where both the cellular morphometric types and the subtypes
of cellular morphometric context are associated with clinical outcomes and
molecular information.
The proposed approach has been applied to the TCGA LGG cohort, including
215 WSIs from 209 patients, for 203 of whom clinical annotations are available.
For quality control purposes, background and border portions of each whole slide
image were detected and removed from the analysis.
The TCGA LGG cohort consists of ∼ 80 million segmented nuclear regions, from
which 2 million were randomly selected for construction of cellular morphometric
types. As described in Sect. 2, the cellular morphometric context representation
for each patient is a 64-dimensional vector, where each dimension represents the
normalized frequency of a specific cellular morphometric type appearing in the
WSIs of the patient. Initial integrative analysis is performed by linking individ-
ual cellular morphometric types to clinical outcomes and molecular data. Each
cellular morphometric type is chosen as the predictor variable in the Cox pro-
portional hazards (PH) regression model together with the age of the patient
(implemented through the R survival package). For each cellular morphometric
type, the frequencies are further correlated with the gene expression values across
all patients. The top-ranked genes of positive correlation and negative correla-
tion, respectively, are imported into the MSigDB [17] for gene set enrichment
analysis. Table 1 summarizes the cellular morphometric types that best predict the
survival distribution, and the corresponding enriched gene sets. Figure 1 shows
the top-ranked examples for these cellular morphometric types.
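The per-patient representation described above is a normalized bag-of-types histogram; a schematic version (the per-nucleus type assignments are simulated, and K is reduced from 64 for brevity):

```python
import numpy as np

def morphometric_context(type_ids, K):
    """Normalized frequency of each cellular morphometric type across all
    segmented nuclei from a patient's WSIs (one K-dim vector per patient)."""
    counts = np.bincount(type_ids, minlength=K).astype(float)
    return counts / counts.sum()

rng = np.random.default_rng(0)
patient_types = rng.integers(0, 8, size=1000)   # simulated type of each nucleus
ctx = morphometric_context(patient_types, K=8)
```

Each dimension of `ctx` is then used as a predictor (e.g., in the Cox PH model) or correlated with gene expression across patients.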
As shown in Table 1, 8 out of 64 cellular morphometric types are clinically
relevant to survival with statistical significance (FDR-adjusted p-value < 0.01).
The first four cellular morphometric types in Fig. 1 all have a hazard ratio > 1,
indicating that a higher frequency of these cellular morphometric types may lead
to a worse prognosis.
76 J. Han et al.
Table 1. Top cellular morphometric types for predicting the survival distribution
based on the Cox proportional hazards (PH) regression model, and the corresponding
enriched gene sets with respect to genes that best correlate the frequency of the cellu-
lar morphometric type appearing in the WSIs of the patient, positively or negatively.
Hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions with
a unit difference of an explanatory variable, and higher HR indicates higher hazard of
death.
Fig. 1. Top-ranked examples for cellular morphometric types that best predict the
survival distribution, as shown in Table 1. Each example is an image patch of 101 × 101
pixels centered on the retrieved cell, marked with a green dot. The first four cellular
morphometric types (hazard ratio > 1) indicate a worse prognosis and the last four
(hazard ratio < 1) indicate a protective effect. Note that this figure is best viewed
in color at 400 % zoom-in.
via STAT3, and two cellular morphometric types of better prognosis are enriched
with genes regulated by NF-kB in response to TNF and genes up-regulated in
response to TGFB1, respectively.
Fig. 2. Consensus clustering matrices and corresponding consensus CDFs of 203 TCGA
patients with LGG for cluster number of N = 2 to N = 9 based on cellular morpho-
metric context.
Figure 3(a) shows the Kaplan-Meier survival plot for the three major subtypes of
the five-cluster consensus clustering result. The log-rank test p-value of 2.82e−5
indicates that the difference between the survival times of subtype #5 patients and
subtype #3 patients is statistically significant. The integration of genome-
wide data from multiple platforms uncovered three molecular classes of lower-
grade gliomas that were best represented by IDH and 1p/19q status: wild-type
IDH, IDH mutation with 1p/19q codeletion, and IDH mutation without 1p/19q
codeletion [2]. A further Fisher's exact test reveals no enrichment between the
cellular morphometric subtypes and these molecular subtypes. On the other
hand, differentially expressed genes between subtype #5 and subtype #3
(Fig. 3(b)) indicate enrichment of genes that mediate programmed cell death
(apoptosis) by activation of caspases, and genes defining the epithelial-mesenchymal
transition, as in wound healing, fibrosis and metastasis (via MSigDB).
Fig. 3. (a) Kaplan-Meier plot for three major subtypes associated with patient survival,
where subtypes #3 (53 patients) #4 (65 patients) and #5 (82 patients) correspond to
the three major subtypes from top-left to bottom-right, respectively, in Fig. 2 (N = 5).
(b) Top genes that are differentially expressed between subtype #5 and subtype #3.
References
1. Bhagavatula, R., Fickus, M., Kelly, W., Guo, C., Ozolek, J., Castro, C.,
Kovacevic, J.: Automatic identification and delineation of germ layer components
in H &E stained images of teratomas derived from human and nonhuman primate
embryonic stem cells. In: IEEE ISBI, pp. 1041–1044 (2010)
2. Cancer Genome Atlas Research Network: Comprehensive, integrative genomic
analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372(26), 2481–2498 (2015)
3. Chang, H., Borowsky, A., Spellman, P.T., Parvin, B.: Classification of tumor his-
tology via morphometric context. In: IEEE CVPR, pp. 2203–2210 (2013)
4. Chang, H., Han, J., Borowsky, A., Loss, L., Gray, J.W., Spellman, P.T., Parvin, B.:
Invariant delineation of nuclear architecture in glioblastoma multiforme for clinical
and molecular association. IEEE Trans. Med. Imaging 32(4), 670–682 (2013)
5. Chang, H., Zhou, Y., Borowsky, A., Barner, K.E., Spellman, P.T., Parvin, B.:
Stacked predictive sparse decomposition for classification of histology sections. Int.
J. Comput. Vis. 113(1), 3–18 (2015)
6. Dalton, L., Pinder, S., Elston, C., Ellis, I., Page, D., Dupont, W., Blamey, R.: His-
tolgical gradings of breast cancer: linkage of patient outcome with level of pathol-
ogist agreements. Mod. Pathol. 13(7), 730–735 (2000)
7. Demir, C., Yener, B.: Automated cancer diagnosis based on histopathological
images: a systematic survey (2009)
8. Gurcan, M., Boucheron, L., Can, A., Madabhushi, A., Rajpoot, N., Bulent, Y.:
Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171
(2009)
9. Huang, C.H., Veillard, A., Lomeine, N., Racoceanu, D., Roux, L.: Time efficient
sparse analysis of histopathological whole slide images. Comput. Med. Imaging
Graph. 35(7–8), 579–591 (2011)
10. Kane, A., Yang, I.: Interferon-gamma in brain tumor immunotherapy. Neurosurg.
Clin. N. Am. 21(1), 77–86 (2010)
11. Kong, J., Cooper, L., Sharma, A., Kurk, T., Brat, D., Saltz, J.: Texture based
image recognition in microscopy images of diffuse gliomas with multi-class gentle
boosting mechanism. In: IEEE ICASSP, pp. 457–460 (2010)
12. Kothari, S., Phan, J.H., Osunkoya, A.O., Wang, M.D.: Biological interpretation of
morphological patterns in histopathological whole slide images. In: Proceedings of
the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
(2012)
13. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid
matching for recognizing natural scene categories. In: IEEE CVPR, pp. 2169–2178
(2006)
14. Monti, S., Tamayo, P., Mesirov, J., Golub, T.: Consensus clustering: a resampling-
based method for class discovery and visualization of gene expression microarray
data. Mach. Learn. 52, 91–118 (2003)
15. Romo, D., Garcla-Arteaga, J.D., Arbelez, P., Romero, E.: A discriminant multi-
scale histopathology descriptor using dictionary learning. In: SPIE 9041 Medical
Imaging (2014)
16. Sirinukunwattana, K., Khan, A.M., Rajpoot, N.M.: Cell words: modelling the
visual appearance of cells in histopathology images. Comput. Med. Imaging Graph.
42, 16–24 (2015)
17. Subramanian, A., Tamayo, P., Mootha, V., Mukherjee, S., Ebert, B., Gillette, M.,
Paulovich, A., Pomeroy, S., Golub, T., Lander, E., Mesirov, J.: Gene set enrichment
analysis: a knowledge-based approach for interpreting genome-wide expression pro-
files. Proc. Natl. Acad. Sci. USA 102(43), 15545–15550 (2005)
18. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using
sparse coding for image classification. In: IEEE CVPR, pp. 1794–1801 (2009)
19. Zhou, Y., Chang, H., Barner, K.E., Parvin, B.: Nuclei segmentation via sparsity
constrained convolutional regression. In: IEEE ISBI, pp. 1284–1287 (2015)
20. Zhou, Y., Chang, H., Barner, K.E., Spellman, P.T., Parvin, B.: Classification of
histology sections via multispectral convolutional sparse coding. In: IEEE CVPR,
pp. 3081–3088 (2014)
Mapping Lifetime Brain Volumetry
with Covariate-Adjusted Restricted Cubic
Spline Regression from Cross-Sectional
Multi-site MRI
1 Introduction
Brain volumetry across the lifespan is essential in neurological research and clinical
investigation. Magnetic resonance imaging (MRI) allows for quantification of such
changes, and consequent investigation of specific age ranges or more sparsely sampled
lifetime data [1]. Contemporaneous advancements in data sharing have made consid-
erable quantities of brain images available from normal, healthy populations. However,
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 81–88, 2016.
DOI: 10.1007/978-3-319-46720-7_10
82 Y. Huo et al.
2 Methods
2.1 Extracting Volumetric Information
The complete cohort aggregates 9 datasets with a total of 5111 MR T1w 3D images from
normal healthy subjects (Table 1). 45 atlases are non-rigidly registered [4] to a target
image, and non-local spatial STAPLE (NLSS) label fusion [5] is used to fuse the labels
from each atlas to the target image using the BrainCOLOR protocol [6] (Fig. 1). WBV
and regional volumes are then calculated by multiplying the volume of a single voxel by
the number of labeled voxels in the original image space. In total, 15 NOIs are defined
by structural and functional covariance networks, including visual, frontal, language,
memory, motor, fusiform, basal ganglia (BG) and cerebellum (CB).
The nth degree spline regression with K knots t_1 < … < t_K is

S(x) = Σ_{j=0}^{n} β_{0j} x^j + Σ_{i=1}^{K} β_{in} (x − t_i)^n_+   (1)

where (x − t_i)_+ = x − t_i if x > t_i and (x − t_i)_+ = 0 if x ≤ t_i.
To regress out confound effects, new covariates X'_1, X'_2, …, X'_C (with coefficients
β'_1, β'_2, …, β'_C) are introduced into the nth degree spline regression:

S(x) = Σ_{j=0}^{n} β̇_{0j} x^j + Σ_{i=1}^{K} β̇_{in} (x − t_i)^n_+ + Σ_{u=0}^{C} β'_u X'_u   (2)

The C-RCS restricts the cubic spline (n = 3) to be linear before the first knot and
after the last knot. First,

S(x) = β̇_00 + β̇_01 x + Σ_{i=1}^{K} β̇_{i3} (x − t_i)³_+ + Σ_{u=0}^{C} β'_u X'_u   (3)

where β̇_02 = β̇_03 = 0 ensures the linearity before the first knot (the truncated
terms vanish for x < t_1). Second, for x > t_K,

S(x) = β̇_00 + β̇_01 x + β̇_13 (x − t_1)³_+ + … + β̇_K3 (x − t_K)³_+ + Σ_{u=0}^{C} β'_u X'_u   (4)

To guarantee the linearity of the C-RCS after the last knot, we expand the previous
expression and force the coefficients of x² and x³ to be zero. After expansion,

S(x) = [β̇_00 − (β̇_13 t_1³ + … + β̇_K3 t_K³)] + Σ_{u=0}^{C} β'_u X'_u
     + [β̇_01 + 3(β̇_13 t_1² + … + β̇_K3 t_K²)] x
     − 3(β̇_13 t_1 + … + β̇_K3 t_K) x² + (β̇_13 + … + β̇_K3) x³   (5)

As a result, linearity of S(x) for x > t_K implies that Σ_{i=1}^{K} β̇_{i3} t_i = 0 and
Σ_{i=1}^{K} β̇_{i3} = 0. Following these restrictions, β̇_(K−1)3 and β̇_K3 are derived as

β̇_(K−1)3 = Σ_{i=1}^{K−2} β̇_{i3} (t_i − t_K) / (t_K − t_{K−1})   and
β̇_K3 = Σ_{i=1}^{K−2} β̇_{i3} (t_{K−1} − t_i) / (t_K − t_{K−1})   (6)
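The restriction in Eq. (6) means a restricted cubic spline with K knots has only K − 1 free coefficients beyond the intercept. A sketch of the standard RCS design matrix in Harrell's parameterization [3], here without the covariate columns X'_u (the ages, knots, and toy volume curve are illustrative):

```python
import numpy as np

def rcs_basis(x, knots):
    """Design matrix of a restricted cubic spline: intercept, x, and K-2 cubic
    terms built so the fit is linear beyond the boundary knots."""
    x = np.asarray(x, float)
    t = np.asarray(knots, float)
    K = len(t)
    p = lambda v: np.maximum(v, 0.0) ** 3            # truncated cubic (v)_+^3
    cols = [np.ones_like(x), x]
    for j in range(K - 2):
        cols.append(p(x - t[j])
                    - p(x - t[K - 2]) * (t[K - 1] - t[j]) / (t[K - 1] - t[K - 2])
                    + p(x - t[K - 1]) * (t[K - 2] - t[j]) / (t[K - 1] - t[K - 2]))
    return np.column_stack(cols)

# Fit a C-RCS-style trajectory by least squares on a toy age/volume curve.
ages = np.linspace(5, 90, 200)
knots = [10, 25, 40, 55, 70, 85]
B = rcs_basis(ages, knots)
beta, *_ = np.linalg.lstsq(B, 1200 - 2 * ages + 5 * np.sin(ages / 10), rcond=None)
fit = B @ beta
```

Covariate adjustment amounts to appending the X'_u columns to B before the least-squares fit.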
In this work, gender, field strength and total intracranial volume (TICV) are employed
as covariates X'_u. TICV values are calculated using SIENAX [8]. Field strength and
TICV are used to regress out site effects, rather than using site categories directly,
since the sites are highly correlated with the explanatory variable age.
Fig. 2. Volumetry and growth rate. The left plot in (a) shows the volumetric trajectory of
whole brain volume (WBV) using C-RCS regression on 5111 MR images. The right plot in
(a) shows the growth rate curve, i.e., the volumetric change per year along the trajectory.
In (b), C-RCS regression is deployed on the same dataset while additionally regressing
out TICV. Our growth rate curves are compared with 40 previous longitudinal studies [1] on
smaller cohorts (21 studies in (a) without regressing out TICV and 19 studies in (b)
regressing out TICV). The standard deviations of previous studies are shown as black bars
(where available). The 95 % CIs in all plots are calculated from 10,000 bootstrap samples.
D(i, j) = 1 − corr(Ŝ_i(x), Ŝ_j(x)),   i, j ∈ {1, 2, …, 15}, i ≠ j,

where corr(·) is the Pearson correlation between any two C-RCS fitted piecewise
trajectories Ŝ_i(x) and Ŝ_j(x) in the same age bin.
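A minimal sketch of this distance on simulated trajectories (1 minus the Pearson correlation; the NOI trajectories here are random stand-ins for the fitted curves):

```python
import numpy as np

# Simulated fitted trajectories: 15 NOIs x 50 age samples within one age bin.
rng = np.random.default_rng(0)
S = rng.random((15, 50))

C = np.corrcoef(S)   # 15 x 15 Pearson correlations between trajectories
D = 1.0 - C          # distance matrix used to build the SCN dendrograms
```

Hierarchical clustering (HCA) is then run on D to produce one dendrogram per age bin.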
The stability of the proposed approaches is demonstrated by the CIs of the C-RCS
regression and SCNs using the bootstrap method [10]. First, the 95 % CIs of the
volumetric trajectories of WBV (Fig. 2) and the 15 NOIs (Fig. 3) are derived by
deploying C-RCS regression on 10,000 bootstrap samples. Then, the distances D between
all pairs of clustered NOIs are derived using the 15 (NOIs) × 10,000 (bootstrap) C-RCS
fitted trajectories, and the 95 % CIs are obtained for each pair of clustered NOIs and
shown on the six SCN dendrograms (Fig. 4). The average network distance (AND), the
average distance between the 15 NOIs in a dendrogram, can be calculated 10,000 times
using the bootstrap; the AND reflects the modularity of connections between all NOIs.
We test whether the ANDs differ significantly between brain development periods by
deploying a two-sample t-test on the AND values (10,000 per age bin) between age bins.
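The percentile-bootstrap CIs used throughout can be sketched as follows (the AND values are simulated here; the statistic and sample size are illustrative):

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample with replacement, recompute the
    statistic, and take the alpha/2 and 1-alpha/2 percentiles."""
    rng = np.random.default_rng(seed)
    reps = np.array([stat(rng.choice(values, size=len(values), replace=True))
                     for _ in range(n_boot)])
    return np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])

rng = np.random.default_rng(1)
and_values = rng.normal(0.6, 0.05, size=200)  # simulated ANDs for one age bin
lo, hi = bootstrap_ci(and_values)
```

The same resampling scheme, applied to the fitted trajectories themselves, yields the CI bands in Figs. 2 and 3.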
Fig. 3. Lifespan trajectories of the 15 NOIs, with 95 % CIs from 10,000 bootstrap
samples. The upper 3D figures indicate the definitions of the NOIs (in red). The lower
figures show the trajectories with CIs using the C-RCS regression method, regressing out
gender, field strength and TICV (the same model as Fig. 2b). For each NOI, the piecewise
CIs of the six age bins are shown in different colors. The piecewise volumetric
trajectories and CIs are separated by 7 knots in the lifespan C-RCS regression rather
than fitted independently. The volumetric trajectories for the two sides of each NOI
are derived separately, except for the CB.
3 Results
Figure 2a shows the lifespan volumetric trajectories using C-RCS regression, as well as
the growth rate (volume change in percentage per year) of WBV, when regressing out
gender and field strength effects. Figure 2b shows the C-RCS regression on the same
dataset with TICV added as an additional covariate. The cross-sectional growth rate
Fig. 4. The six structural covariance network (SCN) dendrograms from hierarchical
clustering analysis (HCA) indicate which NOIs develop together during different
developmental periods (age bins). The distance on the x-axis is in log scale and equals
one minus the Pearson correlation between two curves. The correlation between NOIs
becomes stronger from right to left on the x-axis. The horizontal range of each colored
rectangle indicates the 95 % CI of the distance from 10,000 bootstrap samples. Note that
the colors are chosen for visualization purposes and carry no quantitative meaning.
curve using C-RCS regression is compared with 40 previous longitudinal studies (19
of which are TICV corrected) [1], which are typically limited to smaller age ranges.
Using the same C-RCS model as in Fig. 2b, Fig. 3 shows both the lifespan and
piecewise volumetric trajectories of the 15 NOIs. In Fig. 4, the piecewise volumetric
trajectories of the 15 NOIs within each age bin are clustered using HCA and shown in
one SCN dendrogram.
Six SCN dendrograms are then obtained by repeating the HCA on the different age bins,
demonstrating the evolution of SCNs during different developmental periods. The ANDs
between any two age bins in Fig. 4 are statistically significantly different (p < 0.001).
Acknowledgments. This research was supported by NSF CAREER 1452485, NIH 5R21EY
024036, NIH 1R21NS064534, NIH 2R01EB006136, NIH 1R03EB012461, NIH R01NS095291
and also supported by the Intramural Research Program, National Institute on Aging, NIH.
References
1. Hedman, A.M., van Haren, N.E., Schnack, H.G., Kahn, R.S., Hulshoff Pol, H.E.: Human
brain changes across the life span: a review of 56 longitudinal magnetic resonance imaging
studies. Hum. Brain Mapp. 33, 1987–2002 (2012)
2. Durrleman, S., Simon, R.: Flexible regression models with cubic splines. Stat. Med. 8, 551–
561 (1989)
3. Harrell, F.: Regression Modeling Strategies: with Applications to Linear Models, Logistic
and Ordinal Regression, and Survival Analysis. Springer, Switzerland (2015)
4. Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image
registration with cross-correlation: evaluating automated labeling of elderly and neurode-
generative brain. Med. Image Anal. 12, 26–41 (2008)
5. Asman, A.J., Dagley, A.S., Landman, B.A.: Statistical label fusion with hierarchical
performance models. In: Proceedings - Society of Photo-Optical Instrumentation Engineers,
vol. 9034, p. 90341E (2014)
6. Klein, A., Dal Canton, T., Ghosh, S.S., Landman, B., Lee, J., Worth, A.: Open labels: online
feedback for a public resource of manually labeled brain images. In: 16th Annual Meeting
for the Organization of Human Brain Mapping (2010)
7. Stone, C.J., Koo, C.-Y.: Additive splines in statistics, p. 48 (1986)
8. Smith, S.M., Zhang, Y., Jenkinson, M., Chen, J., Matthews, P.M., Federico, A., De
Stefano, N.: Accurate, robust, and automated longitudinal and cross-sectional brain change
analysis. Neuroimage 17, 479–489 (2002)
9. Anderberg, M.R.: Cluster Analysis for Applications: Probability and Mathematical Statistics:
A Series of Monographs and Textbooks. Academic Press, New York (2014)
10. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC Press, Boca Raton (1994)
1 https://masi.vuse.vanderbilt.edu/index.php/C-RCSregression.
Extracting the Core Structural Connectivity
Network: Guaranteeing Network Connectedness
Through a Graph-Theoretical Approach
1 Introduction
difference between the set of edges of G and the set of edges of Gi, modulated by
the parameter λ. In the following, we will refer to fλ(G∗, Gi) as the difference
threshold of a core sub-network G∗ w.r.t. Gi. Note that if λ = 1, we only consider
edges excluded from the core network, |{e ∈ E, e ∉ E(Gi[V∗])}|, and if λ = 0,
we only consider edges included in the core network, |{e ∉ E, e ∈ E(Gi[V∗])}|.
In Definition 1, we formalize the problem of computing the core sub-network as
a combinatorial optimization problem:
Definition 1 (Core Sub-network Problem). Let G1 = (V, E1 ), . . . , Gk =
(V, Ek ) be k ≥ 1 undirected graphs. Let λ be any real such that λ ∈ [0, 1]. Let
n ≥ 0 be any integer. Then, the core sub-network problem consists in computing
a connected graph G∗ = (V∗, E∗) such that |V∗| ≥ n and such that the sum of
the difference thresholds Σ_{i=1}^{k} fλ(G∗, Gi) is minimum.
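Reading the definition literally, fλ can be computed with plain set operations. The sketch below assumes fλ(G∗, Gi) = λ·|edges of G∗ absent from Gi[V∗]| + (1 − λ)·|edges of Gi[V∗] absent from G∗|; the function names are illustrative, not the authors' implementation.

```python
# Sketch of the difference threshold f_lambda from Definition 1.
# Graphs are given as a vertex set plus a set of frozenset edges.

def induced_edges(edges, nodes):
    """Edges of the sub-graph induced by `nodes` (i.e., Gi[V*])."""
    return {e for e in edges if e <= nodes}

def difference_threshold(core_nodes, core_edges, subject_edges, lam):
    """f_lambda(G*, Gi): penalize edges of G* missing from Gi[V*]
    with weight lam, and edges of Gi[V*] missing from G* with 1 - lam."""
    gi_induced = induced_edges(subject_edges, core_nodes)
    excluded = len(core_edges - gi_induced)   # in G*, not in Gi[V*]
    included = len(gi_induced - core_edges)   # in Gi[V*], not in G*
    return lam * excluded + (1 - lam) * included

# Toy example on 5 nodes (cf. Fig. 2)
V = frozenset(range(1, 6))
E = lambda *pairs: {frozenset(p) for p in pairs}
G1 = E((1, 2), (2, 3), (3, 4), (4, 5))
core = E((1, 2), (2, 3), (3, 4))
f = difference_threshold(V, core, G1, 0.5)
```

Summing this quantity over the k subject graphs gives the objective minimized in Definition 1.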
Fig. 2. Instance of the common sub-network problem. (a–c) Brain connectivity of dif-
ferent subjects, namely G1, G2 and G3. (d) Extracted common sub-network G∗ that
is optimal for n = 5 with λ = 1/2: the difference threshold is 7/2.
In the rest of this section we state our main contribution, an optimal poly-
nomial time exact algorithm for the core sub-network problem if the number of
nodes is sufficiently large (optimal means here that there is no exact algorithm
with better complexity). Solving the problem in Definition 1 is hard: it can be
proved that, given an integer n ≥ 0 and a real number δ ≥ 0, the decision
version of the SCN problem is NP-complete even if k = 2. However, focusing on
the problem of minimizing fλ we obtain a polynomial time algorithm for SCN
extraction.
The main point of this work is to present an algorithm for the core graph
extraction and assess its potential for clinical and cognitive studies. Even if the
problem is very difficult to solve in general, we design our polynomial time core
subnetwork extraction algorithm and show that it is optimal when we focus
92 D. Wassermann et al.
on the problem of minimizing the difference threshold and when the number of
nodes of the core sub-network is large.
Theorem 1. Consider k ≥ 1 undirected graphs G1 = (V, E1 ), . . . , Gk = (V, Ek )
and consider any real number λ ∈ [0, 1]. Then, Core-Sum-Alg (Algorithm 1)
is an O(max(k, log |V|) · |V|²)-time exact algorithm for the core sub-
network problem when n = |V|.
Intuitively, w0 (e) represents the cost of not adding the edge e to the solution
and w1 (e) represents the cost of adding the edge e to the solution. From this,
we define the graph induced by the set of edges to keep in the core subnetwork.
This graph may be disconnected, so edges must be added to connect it. To add
such edges, we define a graph representing the fully connected graph where each
node represents a maximal connected component:
Gcc = (Vcc , Ecc ) with Vcc = {u1 , . . . , ut } and Ecc = Vcc × Vcc , (3)
where cc(G1 ) = (cc1 (G1 ), . . . , cct (G1 )) are the t maximal connected components of
G1. Then, to select which maximal connected components to include in our core
subnetwork graph, we define a weight function wcc :
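The contraction above — one super-node of Gcc per maximal connected component of the kept-edge graph — can be sketched with a small union-find. The helper names are illustrative, and self-pairs are omitted from Ecc here even though the definition writes Ecc = Vcc × Vcc.

```python
def connected_components(nodes, edges):
    """Maximal connected components via union-find with path halving."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)

    comps = {}
    for v in nodes:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())

def component_graph(nodes, kept_edges):
    """Complete graph Gcc whose super-nodes are the components (self-pairs omitted)."""
    comps = connected_components(nodes, kept_edges)
    t = len(comps)
    ecc = {(i, j) for i in range(t) for j in range(t) if i != j}
    return comps, ecc

comps, ecc = component_graph({1, 2, 3, 4, 5}, [(1, 2), (4, 5)])
```

A weight on the edges of Gcc (the wcc of the text) then scores which components to merge into the core subnetwork.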
Fig. 4. Performance of the core network as feature selection for a linear model of
gender-specific connectivity. We evaluate model fit (left) and prediction (right):
Gong et al. [5] in green, ours in blue. We show the histograms of both values from
our nested leave-1/3-out experiment. In both measures, our approach yields lower
values more frequently, indicating better performance.
loop 500 times per outer loop. This totals 50,000 experiments. Finally, for each
experiment, we quantify the prediction performance of the linear model at each
inner loop with the mean squared error (MSE) of the prediction and Akaike
Information Criterion (AIC) for model fitting.
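Both scores come directly from a least-squares fit; a minimal sketch under the usual Gaussian-likelihood form of AIC, on synthetic data (this is an illustration, not the paper's pipeline):

```python
import numpy as np

def fit_and_score(X_train, y_train, X_test, y_test):
    """Least-squares linear model; return (AIC on train, MSE on test).
    AIC under a Gaussian likelihood: n * log(RSS / n) + 2k."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    rss = np.sum((X_train @ beta - y_train) ** 2)
    n, k = X_train.shape
    aic = n * np.log(rss / n) + 2 * k
    mse = np.mean((X_test @ beta - y_test) ** 2)
    return aic, mse

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
w = rng.normal(size=5)
y = X @ w + 0.1 * rng.normal(size=300)
aic, mse = fit_and_score(X[:200], y[:200], X[200:], y[200:])
```

Lower values of either score indicate a better model, which is how the histograms in Fig. 4 are read.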
We show the experiment's results in Fig. 4. These results show that our
approach, in blue, performed better than Gong et al. [5], in green, as the
number of cases with lower AIC and MSE is larger for our method.
We present, for the first time, an algorithm to extract the core structural con-
nectivity network of a subject population while guaranteeing connectedness. We
start by formalizing the problem and showing that, although the problem is
very hard (it is NP-complete), we produce a polynomial time exact algorithm
to extract such a network when its number of nodes is large. Finally, we show an
example in which our network constitutes a better feature selection step for
statistical analyses of structural connectivity. For this, we performed a nested
leave-1/3-out experiment on 300 subjects. The results show that perform-
ing feature selection with our technique outperforms the most commonly used
approach.
Acknowledgments. This work has received funding from the European Research
Council (ERC Advanced Grant agreement No. 694665).
References
1. Bassett, D.S., Brown, J.A., Deshpande, V., Carlson, J.M., Grafton, S.T.: Conserved
and variable architecture of human white matter connectivity. Neuroimage 54(2),
1262–1279 (2011)
2. Bassett, D.S., Wymbs, N.F., Rombach, M.P., Porter, M.A., Mucha, P.J.,
Grafton, S.T.: Task-based core-periphery organization of human brain dynamics.
PLoS Comput. Biol. 9(9), e1003171 (2013)
3. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of
structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)
4. Desikan, R.S., Ségonne, F., Fischl, B., Quinn, B.T., Dickerson, B.C., Blacker, D.,
Buckner, R.L., Dale, A.M., Maguire, R.P., Hyman, B.T., Albert, M., Killiany, R.J.:
An automated labeling system for subdividing the human cerebral cortex on MRI
scans into gyral based regions of interest. Neuroimage 31(3), 968–980 (2006)
5. Gong, G., He, Y., Concha, L., Lebel, C., Gross, D.W., Evans, A.C., Beaulieu, C.:
Mapping anatomical connectivity patterns of human cerebral cortex using in vivo
diffusion tensor imaging tractography. Cereb. Cortex 19(3), 524–536 (2009)
6. Sotiropoulos, S.N., Jbabdi, S., Xu, J., Andersson, J.L., Moeller, S., Auerbach, E.J.,
Glasser, M.F., Hernandez, M., Sapiro, G., Jenkinson, M., Feinberg, D.A.,
Yacoub, E., Lenglet, C., Van Essen, D.C., Ugurbil, K., Behrens, T.E.J.: Advances
in diffusion MRI acquisition and processing in the Human Connectome Project.
Neuroimage 80, 125–143 (2013)
Fiber Orientation Estimation Using Nonlocal
and Local Information
Chuyang Ye(B)
1 Introduction
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 97–105, 2016.
DOI: 10.1007/978-3-319-46720-7 12
and ensemble average propagator methods [10]. In particular, to reduce the num-
ber of dMRI acquisitions required for resolving crossing fibers, sparsity assump-
tion has been incorporated in the estimation problem. For example, it has been
used in the multi-tensor framework [1,9,13], leading to dictionary-based FO esti-
mation algorithms that have been shown to reconstruct FOs of good quality
with fewer dMRI acquisitions [1].
Because image noise adversely affects FO estimation, regularization enforcing
spatial consistency has been used in FO estimation problems. For example,
smoothness of diffusion tensors and of FOs has been used as a regularization
term in [12] and [15], respectively, but no sparsity regularization was introduced.
Other methods incorporate both sparsity and smoothness assumptions. For
example, in [11,14] sparsity regularization is used together with
the smoothness of diffusion images in a spherical ridgelets framework, where FO
smoothness is enforced indirectly. More recently, [4,18] manage to directly encode
spatial consistency of FOs between neighbor voxels with sparsity regularization
in the multi-tensor models by using weighted ℓ1-norm regularization, where FOs
that are consistent with their neighbors are encouraged. These methods have focused
on the use of local information for robust FO estimation. However, because fiber
tracts are usually tube-like or sheet-like [19], voxels that are not adjacent to
each other can also share similar FO configurations. Thus, nonlocal information
could further contribute to improved FO reconstruction by providing additional
information.
In this work, we propose an FO estimation algorithm that improves esti-
mation quality by incorporating both nonlocal and local information, which
is named Fiber Orientation Reconstruction using Nonlocal and Local Informa-
tion (FORNLI). We use a dictionary-based FO estimation framework, where
the diffusion signals are represented by a tensor basis so that sparsity regular-
ization can be readily incorporated. We design an objective function that consists
of data fidelity terms and weighted ℓ1-norm regularization. The weights in the
weighted ℓ1-norm encourage spatial consistency of FOs and are here encoded
by both local neighbors and nonlocal reference voxels. To determine the nonlo-
cal reference voxels for each voxel, we compare its patch-based diffusion profile
with those of the voxels in a search range, and select the k nearest neighbors in
terms of diffusion profiles. FOs are estimated by minimizing the objective func-
tion, where weighted ℓ1-norm regularized least squares problems are iteratively
solved.
2 Methods
2.1 Background: A Signal Model with Sparsity and Smoothness
Regularization
Sparsity regularization has been shown to improve FO estimation and reduce
the number of gradient directions required for resolving crossing fibers [1]. A
commonly used strategy to incorporate sparsity is to model the diffusion signals
using a fixed basis. The prolate tensors have been a popular choice because of
their explicit relationship with FOs [1,9,13]. Specifically, let {Di}_{i=1}^{N} be a set of
N fixed prolate tensors. The primary eigenvector (PEV) vi of each Di represents
a possible FO and these PEVs are evenly distributed on the unit sphere. The
eigenvalues of the basis tensors can be determined by examining the diffusion
tensors in noncrossing tracts [9]. Then, the diffusion weighted signal Sm (gk ) at
voxel m associated with the gradient direction gk (k = 1, 2, . . . , K) and b-value
bk can be represented as

    Sm(gk) = Sm(0) Σ_{i=1}^{N} fm,i e^{−bk gkᵀ Di gk} + nm(gk),    (1)
In practice, the constraint ||fm||1 = 1 is usually relaxed, and the sparse recon-
struction can be either solved directly [8] or by approximating the ℓ0-norm with
the ℓ1-norm [1,9,13]. Basis directions corresponding to nonzero mixture fractions
are determined as FOs.
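Under Eq. (1), the dictionary column for basis tensor Di at gradient gk is exp(−bk gkᵀ Di gk). A small sketch of building such a dictionary; the prolate-tensor eigenvalues here are typical illustrative values, not the values calibrated from noncrossing tracts in [9].

```python
import numpy as np

def prolate_tensor(v, lam_par=1.7e-3, lam_perp=3.0e-4):
    """Prolate (cigar-shaped) tensor whose primary eigenvector is v.
    Eigenvalues are illustrative, in mm^2/s."""
    v = v / np.linalg.norm(v)
    return lam_perp * np.eye(3) + (lam_par - lam_perp) * np.outer(v, v)

def build_dictionary(pevs, gradients, b=1000.0):
    """G[k, i] = exp(-b * g_k^T D_i g_k), cf. Eq. (1)."""
    tensors = [prolate_tensor(v) for v in pevs]
    G = np.empty((len(gradients), len(tensors)))
    for k, g in enumerate(gradients):
        g = g / np.linalg.norm(g)
        for i, D in enumerate(tensors):
            G[k, i] = np.exp(-b * g @ D @ g)
    return G

pevs = np.eye(3)                                   # three orthogonal basis PEVs
grads = np.array([[1.0, 0, 0], [0, 1.0, 0], [1.0, 1.0, 0]])
G = build_dictionary(pevs, grads)
```

A gradient aligned with a basis PEV sees the largest diffusivity, so its entry in G is the most attenuated.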
To further incorporate spatial coherence of FOs, weighted ℓ1-norm regulariza-
tion has been introduced into dictionary-based FO estimation [4,18]. For exam-
ple, in [18] FOs in all voxels are jointly estimated by solving

    {f̂m}_{m=1}^{M} = arg min_{f1,f2,...,fM ≥ 0} Σ_{m=1}^{M} ( ||Gfm − ym||₂² + β||Cm fm||₁ ),    (4)
Because fiber tracts are usually tube-shaped (e.g., the cingulum bundle) or sheet-shaped
(e.g., the corpus callosum) [19], voxels that are not adjacent to each other can still have
similar FO patterns, and it is possible to use nonlocal information to improve
the estimation. We choose to use a weighted 1 -norm regularized FO estimation
framework similar to Eq. (4), and encode the weighting matrix Cm using both
nonlocal and local information.
Finding Nonlocal Reference Voxels. For each voxel m, the nonlocal infor-
mation is extracted from a set Rm of voxels, which are called nonlocal reference
voxels and should have diffusion profiles similar to that of m. To identify the
nonlocal reference voxels for m, we compute patch-based dissimilarities between
the voxel m and the voxels in a search range Sm , like the common practice
in nonlocal image processing [3,6]. Specifically, we choose a search range of a
11 × 11 × 11 cube [3] whose center is m. The patch at each voxel n ∈ Sm is
formed by the diffusion tensors of its 6-connected neighbors and the diffusion
tensor at n, which is represented as Δn = (Δn,1 , . . . , Δn,7 ).
We define the following patch-based diffusion dissimilarity between two voxels
m and n:

    dΔ(Δm, Δn) = (1/7) Σ_{j=1}^{7} d(Δm,j, Δn,j),    (5)
For each m we find its k nearest neighbors in terms of the diffusion dissimilarity
in Eq. (5), and define them as the nonlocal reference voxels. k is a parameter to
be specified by users. Note that although we call these reference voxels nonlocal,
it is possible that Rm contains the neighbors of m as well, if they have very
similar diffusion profiles to that of m. We used the implementation of k nearest
neighbors in the scikit-learn toolkit1 based on a ball tree search algorithm.
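The k-nearest-neighbor selection over patch dissimilarities can be sketched with a brute-force search (the paper itself uses scikit-learn's ball-tree implementation); stand-in feature vectors replace the tensor patches of Eq. (5).

```python
import numpy as np

def k_nearest_reference_voxels(features, m, k):
    """Indices of the k voxels most similar to voxel m by mean patch
    dissimilarity (stand-in for Eq. (5)); excludes m itself."""
    d = np.mean(np.abs(features - features[m]), axis=1)
    order = np.argsort(d)
    return [i for i in order if i != m][:k]

rng = np.random.default_rng(1)
feats = rng.normal(size=(50, 7))     # 7 patch entries per voxel (cf. Eq. (5))
feats[7] = feats[3] + 1e-3           # make voxel 7 nearly identical to voxel 3
ref = k_nearest_reference_voxels(feats, 3, 4)
```

For large search ranges, a tree-based index (as in scikit-learn's `NearestNeighbors` with `algorithm='ball_tree'`) avoids the quadratic cost of this brute-force version.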
the one defined in [18]; when n is not adjacent to m, the voxel similarity is
defined using the patches Δm and Δn . Second, suppose the FOs at a voxel n
are {wn,j}_{j=1}^{Wn}, where Wn is the number of FOs at n. For each m we can compute
the similarity between the basis direction vi and the FO configurations of the
voxels in the guiding set Gm:

    Rm(i) = Σ_{n∈Gm} w(m, n) max_{j=1,2,...,Wn} |vi · wn,j|,   i = 1, 2, . . . , N.    (8)
When vi is aligned with the FOs in many voxels in the guiding set Gm and
these voxels are similar to m, large Rm (i) is observed, indicating that vi is
likely to be an FO. Note that Rm (i) is similar to the aggregate basis-neighbor
similarity defined in [18]. Here we have replaced the neighborhood Nm in [18]
with the guiding set Gm containing both local and nonlocal information. These
Rm (i) can then be plotted on the unit sphere according to their associated basis
directions, and the basis directions with local maximal Rm (i) are determined as
likely FOs Um = {um,p}_{p=1}^{Um} (Um is the cardinality of Um) at m [18].
With the likely FOs Um, the diagonal entries of Cm are specified as [18]

    Cm,i = (1 − α max_{p=1,2,...,Um} |vi · um,p|) / min_{q=1,2,...,N} (1 − α max_{p=1,2,...,Um} |vq · um,p|),   i = 1, 2, . . . , N,    (9)
which is a weighted Lasso problem that can be solved using the strategy in [17].
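One standard way to handle such nonnegative weighted ℓ1 problems is projected ISTA: a gradient step on the quadratic term, a weighted soft-threshold, and a nonnegativity clip. The sketch below is an illustration under that scheme (synthetic G and y, step size from the spectral norm), not the solver used in [17].

```python
import numpy as np

def weighted_nonneg_lasso(G, y, c, beta, n_iter=1000):
    """min_{f >= 0} ||G f - y||^2 + beta * sum_i c_i |f_i|
    via projected ISTA: gradient step, soft-threshold, clip at zero."""
    L = 2 * np.linalg.norm(G, 2) ** 2          # Lipschitz constant of the gradient
    f = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = 2 * G.T @ (G @ f - y)
        f = f - grad / L
        f = np.maximum(f - beta * c / L, 0.0)  # weighted shrinkage + f >= 0
    return f

rng = np.random.default_rng(2)
G = rng.normal(size=(30, 10))                  # toy dictionary
f_true = np.zeros(10)
f_true[[2, 5]] = [1.0, 0.5]                    # two "FOs" with mixture fractions
y = G @ f_true
c = np.ones(10)                                # uniform weights (C_m = I)
f_hat = weighted_nonneg_lasso(G, y, c, beta=0.01)
```

Small weights Cm,i (directions close to a likely FO) are shrunk less, which is exactly how Eq. (9) steers the sparse solution toward spatially consistent FOs.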
3 Results
3.1 3D Digital Crossing Phantom
A 3D digital phantom (see Fig. 1) with the same tract geometries and diffu-
sion properties used in [18] was created to simulate five tracts. Thirty gradient
directions (b = 1000 s/mm²) were used to simulate the diffusion weighted images
(DWIs). Rician noise was added to the DWIs. The signal-to-noise ratio (SNR)
is 20 on the b0 image.
FORNLI with k = 4 was applied on the phantom and compared with
CSD [16], CFARI [9], and FORNI [18] using the FO error proposed in [18].
CSD and CFARI are voxelwise FO estimation methods, and FORNI incorpo-
rates neighbor information for FO estimation. We used the CSD implementation
in the Dipy software2 , and implemented CFARI and FORNI using the parame-
ters reported in [9,18], respectively. The errors over the entire phantom and in
the regions with noncrossing or crossing tracts are plotted in Fig. 2(a), where
FORNLI achieves the most accurate result. In addition, we compared the two
best algorithms here, FORNI and FORNLI, using a paired Student’s t-test. In
Fig. 2. FO estimation errors. (a) Means and standard deviations of the FO errors
of CSD, CFARI, FORNI, and FORNLI; (b) mean FORNLI FO errors with different
numbers of nonlocal reference voxels in regions with noncrossing or crossing tracts.
2 http://nipy.org/dipy/examples_built/reconst_csd.html.
all four cases, errors of FORNLI are significantly smaller than those of FORNI
(p < 0.05), and the effect sizes (Cohen’s d) are between 0.5 and 0.6.
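The reported comparison (paired t-test plus Cohen's d) is straightforward to reproduce; the sketch below computes the paired t statistic by hand (mirroring `scipy.stats.ttest_rel`) and the paired form of Cohen's d, on synthetic stand-in error arrays.

```python
import numpy as np

def paired_t_and_cohens_d(a, b):
    """Paired t statistic and paired Cohen's d
    (d = mean of differences / SD of differences)."""
    diff = np.asarray(a) - np.asarray(b)
    n = diff.size
    sd = diff.std(ddof=1)
    t = diff.mean() / (sd / np.sqrt(n))
    d = diff.mean() / sd
    return t, d

rng = np.random.default_rng(3)
err_a = rng.normal(10.0, 2.0, size=100)            # hypothetical FO errors, method A
err_b = err_a - rng.normal(1.0, 1.0, size=100)     # method B: slightly lower errors
t, d = paired_t_and_cohens_d(err_a, err_b)
```

With paired per-case errors, this test is more sensitive than an unpaired comparison because it cancels the per-case difficulty shared by both methods.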
Next, we studied the impact of the number of nonlocal reference voxels. Using
different k, the errors in regions with noncrossing or crossing tracts are shown in
Fig. 2(b). Note that k = 0 represents the case where only the local information from
neighbors is used. Incorporation of nonlocal information improves the estimation
quality, especially in the more complex regions with three crossing tracts. When
k reaches four, the estimation accuracy becomes stable, so we will use k = 4 for
the brain dMRI dataset.
Fig. 3. FO estimation in the crossing regions of SLF and CC overlaid on the fractional
anisotropy map. Note the highlighted region for comparison.
4 Conclusion
We have presented an FO estimation algorithm FORNLI which is guided by
both local and nonlocal information. Results on simulated and real brain dMRI
data demonstrate the benefit of the incorporation of nonlocal information for
FO estimation.
References
1. Aranda, R., Ramirez-Manzanares, A., Rivera, M.: Sparse and adaptive diffusion
dictionary (SADD) for recovering intra-voxel white matter structure. Med. Image
Anal. 26(1), 243–255 (2015)
2. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Log-Euclidean metrics for fast and
simple calculus on diffusion tensors. Magn. Reson. Med. 56(2), 411–421 (2006)
3. Asman, A.J., Landman, B.A.: Non-local statistical label fusion for multi-atlas seg-
mentation. Med. Image Anal. 17(2), 194–208 (2013)
4. Auría, A., Daducci, A., Thiran, J.P., Wiaux, Y.: Structured sparsity for spatially
coherent fibre orientation estimation in diffusion MRI. NeuroImage 115, 245–255
(2015)
5. Basser, P.J., Pajevic, S., Pierpaoli, C., Duda, J., Aldroubi, A.: In vivo fiber trac-
tography using DT-MRI data. Magn. Reson. Med. 44(4), 625–632 (2000)
6. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In:
IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
vol. 2, pp. 60–65. IEEE (2005)
7. Cetin, M.S., Christensen, F., Abbott, C.C., Stephen, J.M., Mayer, A.R.,
Cañive, J.M., Bustillo, J.R., Pearlson, G.D., Calhoun, V.D.: Thalamus and
posterior temporal lobe show greater inter-network connectivity at rest and across
sensory paradigms in schizophrenia. NeuroImage 97, 117–126 (2014)
8. Daducci, A., Van De Ville, D., Thiran, J.P., Wiaux, Y.: Sparse regularization for
fiber ODF reconstruction: from the suboptimality of ℓ2 and ℓ1 priors to ℓ0. Med.
Image Anal. 18(6), 820–833 (2014)
9. Landman, B.A., Bogovic, J.A., Wan, H., ElShahaby, F.E.Z., Bazin, P.L.,
Prince, J.L.: Resolution of crossing fibers with constrained compressed sensing
using diffusion tensor MRI. NeuroImage 59(3), 2175–2186 (2012)
10. Merlet, S.L., Deriche, R.: Continuous diffusion signal, EAP and ODF estimation
via compressive sensing in diffusion MRI. Med. Image Anal. 17(5), 556–572 (2013)
11. Michailovich, O., Rathi, Y., Dolui, S.: Spatially regularized compressed sensing
for high angular resolution diffusion imaging. IEEE Trans. Med. Imaging 30(5),
1100–1115 (2011)
12. Pasternak, O., Assaf, Y., Intrator, N., Sochen, N.: Variational multiple-tensor
fitting of fiber-ambiguous diffusion-weighted magnetic resonance imaging voxels.
Magn. Reson. Imaging 26(8), 1133–1144 (2008)
13. Ramirez-Manzanares, A., Rivera, M., Vemuri, B.C., Carney, P., Mareci, T.: Dif-
fusion basis functions decomposition for estimating white matter intravoxel fiber
geometry. IEEE Trans. Med. Imaging 26(8), 1091–1102 (2007)
14. Rathi, Y., Michailovich, O., Laun, F., Setsompop, K., Grant, P.E., Westin, C.F.:
Multi-shell diffusion signal recovery from sparse measurements. Med. Image Anal.
18(7), 1143–1156 (2014)
15. Reisert, M., Kiselev, V.G.: Fiber continuity: an anisotropic prior for ODF estima-
tion. IEEE Trans. Med. Imaging 30(6), 1274–1283 (2011)
16. Tournier, J.D., Calamante, F., Connelly, A.: Robust determination of the fibre ori-
entation distribution in diffusion MRI: non-negativity constrained super-resolved
spherical deconvolution. NeuroImage 35(4), 1459–1472 (2007)
17. Ye, C., Murano, E., Stone, M., Prince, J.L.: A Bayesian approach to distinguishing
interdigitated tongue muscles from limited diffusion magnetic resonance imaging.
Comput. Med. Imaging Graph. 45, 63–74 (2015)
18. Ye, C., Zhuo, J., Gullapalli, R.P., Prince, J.L.: Estimation of fiber orientations
using neighborhood information. Med. Image Anal. 32, 243–256 (2016)
19. Yushkevich, P.A., Zhang, H., Simon, T.J., Gee, J.C.: Structure-specific statistical
mapping of white matter tracts. NeuroImage 41(2), 448–461 (2008)
Reveal Consistent Spatial-Temporal Patterns
from Dynamic Functional Connectivity
for Autism Spectrum Disorder Identification
1 Introduction
methods have to determine the number of states (clusters), which may work well on
the training data but can generalize poorly to unseen testing subjects.
To address the above issues, we propose a novel data-driven solution to reveal
consistent spatial-temporal FC patterns from resting-state fMRI images. Our work is
two-fold. First, we present a robust learning-based method to optimize FC from the
BOLD signals in a fixed sliding window. In order to avoid the unreliable calculation of
FC based on signal correlations, high level feature representation is of necessity to
guide the optimization of FC. Specifically, we apply singular value decomposition
(SVD) to the tentatively estimated FC matrix and regard the top-ranked eigenvectors
as the high-level network features which characterize the principal connection
patterns across all brain regions. Thus, we can optimize functional connections for each
brain region based on not only the observed region-to-region signal correlations but
also the similarity between high level principal connection patterns. In turn, the refined
FC can lead to a more reasonable estimation of principal connection patterns. Since the
brain network is intrinsically economical and sparse, a sparsity constraint is used to control
the number of connections during the joint estimation of principal connection patterns and
the optimization of FC. Second, we further extend the above FC optimization frame-
work from one sliding window (capturing the static FC patterns) to a set of overlapped
sliding windows (capturing the dynamic FC patterns), as shown in the middle of Fig. 1.
The key idea is that we arrange the FCs along time into a tensor structure (pink cube in
Fig. 1) and employ an additional low-rank constraint to penalize oscillatory
changes of FC in the temporal domain.
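For a symmetric connectivity matrix, the top-ranked eigenvectors described above come from a single symmetric eigendecomposition; a minimal numpy sketch (the value K = 3 and the random matrix are illustrative):

```python
import numpy as np

def principal_connection_patterns(S, K):
    """Top-K eigenvectors of a symmetric connectivity matrix S,
    returned as a K x N feature matrix F = [f_1, ..., f_N]."""
    vals, vecs = np.linalg.eigh(S)                 # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:K]]      # K leading eigenvectors
    return top.T                                   # column i = pattern of region O_i

rng = np.random.default_rng(4)
A = rng.normal(size=(20, 20))
S = (A + A.T) / 2                                  # symmetric stand-in connectivity
F = principal_connection_patterns(S, K=3)
```

Because S is symmetric, its SVD and eigendecomposition coincide up to signs, so `eigh` is the cheaper and numerically safer choice.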
In this paper, we apply our learning-based method to find the spatial-temporal
functional connectivity patterns for identifying childhood autism spectrum disorders
(ASD). Compared with conventional approaches which simply calculate FC based on
signal correlations, more accurate classification results have been achieved in classi-
fying normal control (NC) and ASD subjects by using our learned spatial-temporal FC
patterns.
2 Method
2.1 Construct Robust Functional Connectivity
Let xi ∈ ℝ^{W×1} denote the mean BOLD signal calculated in brain region Oi (i = 1,
. . ., N), where W is the length of the time course within the sliding window. Conven-
tionally, an N × N connectivity matrix S is used to measure the FCs in the whole brain,
where each element sij quantitatively measures the strength of FC between regions Oi
and Oj (i ≠ j). For convenience, we use si ∈ ℝ^{N×1} to denote the i-th column of the
connectivity matrix S, which characterizes the connections of Oi w.r.t. the other brain
regions. Since the signal-to-noise ratio of the observed xi is low, high-level features are
needed to guide the estimation of the connectivity matrix S. To achieve this, we apply
singular value decomposition to S and regard the matrix of top-ranked eigenvectors
F = [fi]_{i=1,...,N} ∈ ℝ^{K×N} as the high-level network features, where each fi ∈ ℝ^{K×1}
denotes the principal connection pattern of region Oi. Thus, instead of calculating the
connectivity sij based on the correlation c(xi, xj) between the observed BOLD signals
xi and xj, we require that the optimal
connectivity sij should (1) be in consensus with the correlation of the low-level signals
xi and xj; and (2) be in line with the similarity of the high-level principal connection
patterns fi and fj. To that end, the objective function is defined as:
    arg min_{sij, fi} Σ_{i=1}^{N} [ Σ_{j=1}^{N} ( (1 − c(xi, xj))² sij + ||fi − fj||₂² sij ) + r1||si||₁ + r2||si||₂² ]    (1)
    s.t. ∀i, si ≥ 0,
where r1 is a scalar controlling the strength of connection sparsity for each con-
nection pattern si. For robustness, an ℓ2-norm penalty is also applied to si. Since the
estimation of sij and fi are coupled, we propose the following procedure to alternately
solve for sij and fi:
(1) Initialize the connectivity matrix by letting sij = c(xi, xj);
(2) Given S, obtain the principal connection pattern fi for each region Oi by applying
    eigenvalue decomposition to S (valid since S is symmetric). After that, we select
    the top K eigenvectors.
(3) Fixing fi, we divide the estimation of sij in Eq. (1) into two sub-tasks: (a) Estimate
    sij without the sparsity constraint. Since the objective function without the ℓ1
    norm can be reformulated into a quadratic form, we can use the Karush-Kuhn-
    Tucker (KKT) [7] conditions to optimize sij. (b) Make the connection pattern si
    sparse. The objective function requires the optimized connection pattern si to be
    not only sparse but also close to the solution of step 3(a). The standard Alternating
    Direction Method of Multipliers (ADMM) [7, 8, 14] can be used to solve this sub-task.
(4) Go to step 2 until convergence.
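Steps (1)–(4) can be condensed into a loop. The sketch below keeps the alternation structure (correlation initialization, eigen-features, similarity-driven update, convergence check) but replaces the KKT/ADMM sub-solvers with a toy clipped-product update, so it illustrates the alternation only, not the authors' solver.

```python
import numpy as np

def optimize_fc(X, K=3, n_outer=10):
    """Alternate between (a) eigen-features of S and (b) an S update that
    blends signal correlation with feature similarity (toy stand-in for
    the KKT/ADMM sub-problems of steps 3(a)-(b))."""
    S = np.corrcoef(X)                       # step 1: s_ij = c(x_i, x_j)
    np.fill_diagonal(S, 0.0)
    for _ in range(n_outer):
        vals, vecs = np.linalg.eigh(S)       # step 2: top-K eigenvectors
        F = vecs[:, np.argsort(vals)[::-1][:K]]
        # step 3 (toy): keep connections whose signals correlate AND whose
        # principal patterns are similar; clip to keep s_ij nonnegative
        feat_sim = F @ F.T
        S_new = np.clip(np.corrcoef(X), 0, None) * np.clip(feat_sim, 0, None)
        np.fill_diagonal(S_new, 0.0)
        if np.allclose(S_new, S, atol=1e-6): # step 4: stop at convergence
            break
        S = S_new
    return S

rng = np.random.default_rng(5)
X = rng.normal(size=(12, 100))               # 12 regions, window length 100
S_hat = optimize_fc(X)
```

The product of the two clipped similarity terms mimics the intent of Eq. (1): a connection survives only if both the low-level and high-level evidence support it.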
A typical optimized connectivity matrix Ŝ is shown in the pink cube in Fig. 1.
Compared to the connectivity matrix obtained by the conventional correlation-based
method, our learned connectivity matrix is much sparser, and it becomes much easier
to construct the brain network since many spurious connections have been removed
by the sparsity constraint during optimization.
window. Then, we extend the objective function in Eq. (1) to the spatial-temporal
domain using tensor analysis:

    arg min_S  C(S) + F(S) + α||S(1)||* + r1||S(2)||₁ + r2||S(2)||F²    (2)
    s.t. ∀i, t: sᵢᵗ ≥ 0,

where C(S) = Σ_{t=1}^{T} (Cᵗ)ᵀ Sᵗ and F(S) = Σ_{t=1}^{T} (Fᵗ)ᵀ Sᵗ. We use S(k) to denote the
unfolding operation of a general tensor S along the k-th mode. In our method, we have
S(1) ∈ ℝ^{N²×T} and S(2) ∈ ℝ^{NT×N}. Since the brain in the resting state generally
traverses a small number of discrete states during a short period of time [4], we require
the change of the connectivity matrix Sᵗ to be smooth along time. Thus, it is reasonable
to apply a low-rank constraint on S(1) such that minimizing ||S(1)||* (the nuclear norm
of S(1)) suppresses overly rapid FC changes in the temporal domain. The ℓ1-norm is
applied to S(2) since the brain network within each sliding window is sparse.
Optimization. In order to make the optimization of Eq. (2) tractable, we introduce two
dummy variables Z1 and Z2 so that we can solve this problem using ADMM [7, 8]:

    arg min_{S,Z1,Z2} Σ_{t=1}^{T} [ (Cᵗ)ᵀ Sᵗ + (Fᵗ)ᵀ Sᵗ + r2||Sᵗ||F² ] + α||Z1||* + r1||Z2||₁    (3)
    s.t. ∀i, t: sᵢᵗ ≥ 0,  S(1) = Z1,  S(2) = Z2.

Using Lagrangian multipliers, we can remove the equality constraints in Eq. (3) and
reformulate Eq. (3) into:

    arg min_{S,Z1,Z2} Σ_{t=1}^{T} [ (Cᵗ)ᵀ Sᵗ + (Fᵗ)ᵀ Sᵗ + r2||Sᵗ||F² ] + α||Z1||* + r1||Z2||₁
        + (λ1/2)||S(1) − Z1||F² + K1ᵀ(S(1) − Z1) + (λ2/2)||S(2) − Z2||F² + K2ᵀ(S(2) − Z2),    (4)

where K1 and K2 are the N²×T Lagrangian multiplier matrices, and λ1 and λ2 are the
penalty parameters. Furthermore, we solve Eq. (4) by alternately optimizing S, Z1 and
Z2 until Eq. (4) converges. The dynamic connectivity matrices Sᵗ can be optimized by
following the Karush-Kuhn-Tucker (KKT) method in [9]. The standard soft-threshold
shrinkage method [7] can be used to solve for Z1 and Z2.
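The two shrinkage sub-steps are standard proximal operators: elementwise soft-thresholding for the ℓ1 term (Z2) and singular-value thresholding for the nuclear-norm term (Z1). A minimal numpy sketch of both:

```python
import numpy as np

def soft_threshold(X, tau):
    """Prox of tau * ||.||_1: elementwise shrinkage toward zero."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Prox of tau * ||.||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

M = np.outer(np.arange(1, 5.0), np.arange(1, 4.0))   # rank-1 test matrix
Z1 = singular_value_threshold(M, 1.0)                # nuclear-norm prox
Z2 = soft_threshold(np.array([-2.0, 0.5, 3.0]), 1.0) # l1 prox
```

In the ADMM iterations, these operators are applied to the shifted unfoldings (e.g., S(1) plus the scaled multiplier) with thresholds α/λ1 and r1/λ2 respectively.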
3 Experiment
Table 1. Accuracy of identifying ASD subjects on the UM dataset w.r.t. sliding window size.

Window size | Pearson correlation (ACC / AUC) | Learned static FC (ACC / AUC)   | Learned dynamic FC (ACC / AUC)
10 %        | 87.37 ± 6.13 / 93.22 ± 7.53     | 89.23 ± 5.27 / 94.74 ± 6.83     | 92.25 ± 7.21 / 97.31 ± 8.63
25 %        | 84.50 ± 6.51 / 89.71 ± 7.45     | 87.83 ± 4.57 / 92.03 ± 5.81     | 90.35 ± 6.72 / 95.46 ± 7.45
45 %        | 80.81 ± 5.55 / 85.02 ± 6.81     | 84.89 ± 3.78 / 88.91 ± 4.57     | 86.45 ± 5.34 / 91.33 ± 5.81
60 %        | 75.71 ± 8.45 / 78.83 ± 9.51     | 81.71 ± 7.17 / 87.12 ± 8.64     | 83.76 ± 6.52 / 88.42 ± 7.46
100 %       | 68.13 ± 12.41 / 74.32 ± 14.36   | 70.52 ± 11.35 / 75.17 ± 13.11   | 77.85 ± 8.66 / 81.31 ± 9.52
Table 2. Accuracy of identifying ASD subjects on the NYU dataset w.r.t. sliding window size.

Window size | Pearson correlation (ACC / AUC) | Learned static FC (ACC / AUC)   | Learned dynamic FC (ACC / AUC)
10 %        | 86.59 ± 5.01 / 91.07 ± 6.92     | 88.37 ± 6.07 / 92.36 ± 7.31     | 91.85 ± 5.11 / 96.23 ± 7.24
25 %        | 83.83 ± 5.56 / 88.61 ± 6.12     | 86.89 ± 3.76 / 91.76 ± 4.57     | 89.07 ± 4.71 / 94.64 ± 5.67
45 %        | 77.72 ± 7.45 / 81.57 ± 9.24     | 84.71 ± 6.18 / 89.43 ± 7.17     | 87.25 ± 5.46 / 91.37 ± 6.16
60 %        | 72.96 ± 12.21 / 77.56 ± 13.73   | 78.22 ± 9.14 / 84.56 ± 10.29    | 82.52 ± 7.82 / 86.71 ± 8.62
100 %       | 65.35 ± 12.13 / 70.28 ± 14.52   | 69.33 ± 10.41 / 74.21 ± 11.35   | 75.73 ± 8.27 / 80.16 ± 9.13
4 Conclusion
In this work, we propose a novel learning-based method to discover both static and
dynamic connectivity patterns from resting-state fMRI data. For static FC estimation,
our method optimizes the functional connectivity based on not only the correlation of
low level BOLD signals but also the similarity of high level principal components from
the link-to-link connectivity patterns. To address the problem of dynamic functional
connectivity, we arrange connectivity matrices along time into a tensor structure and
apply sparsity to suppress spurious functional connectivities and low rank to avoid
unrealistically fast state transitions along time. We use our method to obtain dynamic
connectivity patterns and apply them to identify ASD subjects at the individual level,
where classification using our learned dynamic connectivity patterns improves ASD
identification accuracy by almost 8 % over the conventional correlation-based
framework.
References
1. Greicius, M., Srivastava, G., Reiss, A., Menon, V.: Default-mode network activity
distinguishes Alzheimer’s disease from healthy aging: evidence from functional MRI. PNAS
101, 4637–4642 (2004)
2. Amaral, D.G., Schumann, C.M., Nordahl, C.W.: Neuroanatomy of autism. Trends Neurosci.
31, 137–145 (2008)
3. van den Heuvel, M.P., Pol, H.E.H.: Exploring the brain network: a review on resting-state
fMRI functional connectivity. Eur. Neuropsychopharmacol. 20, 519–534 (2010)
4. Hutchison, R.M., Womelsdorf, T., Allen, E.A., Bandettini, P.A., Calhoun, V.D.,
Corbetta, M., Penna, S.D., Duyn, J.H., Glover, G.H., Gonzalez-Castillo, J.,
Handwerker, D.A., Keilholz, S., Kiviniemi, V., Leopold, D.A., de Pasquale, F.,
Sporns, O., Walter, M., Chang, C.: Dynamic functional connectivity: promise, issues, and
interpretations. Neuroimage 80, 360–378 (2013)
5. Wee, C.-Y., Yap, P.-T., Shen, D.: Diagnosis of autism spectrum disorders using temporally
distinct resting-state functional connectivity networks. CNS Neurosci. Ther. 22, 212–219
(2016)
6. Eavani, H., Satterthwaite, T.D., Gur, R.E., Gur, R.C., Davatzikos, C.: Unsupervised learning
of functional network dynamics in resting state fMRI. In: Gee, J.C., Joshi, S., Pohl, K.M.,
Wells, W.M., Zöllei, L. (eds.) IPMI 2013. LNCS, vol. 7917, pp. 426–437. Springer,
Heidelberg (2013)
7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge
(2004)
8. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and
statistical learning via the alternating direction method of multipliers. Found. Trends Mach.
Learn. 3, 1–122 (2011)
9. Nie, F., Wang, X., Huang, H.: Clustering and projected clustering with adaptive neighbors.
In: The 20th International Conference on Knowledge Discovery and Data Mining (2014)
10. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: uses and
interpretations. Neuroimage 52, 1059–1069 (2010)
114 Y. Zhu et al.
Boundary Mapping Through Manifold Learning
for Connectivity-Based Cortical Parcellation
1 Introduction
Connectome analysis has recently gained a lot of attention due to its potential
to reveal the functional and structural architecture of the human brain, as well
as understand its evolution through development, aging, and neurological disor-
ders [14]. Brain connectivity is typically analyzed via graphical models obtained
by connecting cortical regions to each other with respect to the similarity between
their connectivity profiles, derived from functional MRI (fMRI) or diffusion imag-
ing (dMRI). In a whole-brain connectivity analysis, parcellation of the cortex con-
stitutes an integral part of the pipeline, as the performance of the subsequent
stages depends on the ability of the parcels to reliably represent the underlying
connectivity [6]. Traditionally, parcellations derived from anatomical landmarks
or randomly partitioned subregions have been used for connectome analysis; how-
ever, such parcellations generally fail to fully reflect the function of the cortical
architecture [14]. More recent approaches take into account the connectivity infor-
mation, generally in association with clustering algorithms [1,2,5,12] in order to
group vertices of connectional similarity [16]. Despite promising results, the par-
cellation problem is still open to improvements. This is primarily due to the fact
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 115–122, 2016.
DOI: 10.1007/978-3-319-46720-7_14
116 S. Arslan et al.
that the problem itself is ill-posed, thus, obtaining accurate parcels both depends
on the proposed method’s fidelity to the given data [12] and its capacity to differ-
entiate vertices with different connectivity profiles [6].
To this end, we introduce a new parcellation method, in which we learn
a manifold from local connectivity characteristics of an individual subject and
develop an effective way of computing parcels from this manifold. Our app-
roach rests on the assumption that through dimensionality reduction, we can
capture the underlying connectivity structure that may not be visible in high-
dimensional space [10]. We use the manifold to locate transition points where
connectivity patterns change and interpret them as an abstract delineation of
the parcellation boundaries. After projecting back to the native cortical space,
these boundaries are used to compute non-overlapping and spatially contiguous
parcels. We achieve this with a watershed segmentation technique, originally
utilized to parcellate resting-state correlations [8]. Nonlinear manifold learning
has been formerly used to identify functional networks from fMRI [9,17] and for
surface matching [10], as well as within many other fMRI analysis techniques,
such as [15]. Nevertheless, we propose to use such a technique in association with
dMRI-based structural connectivity and boundary mapping, in order to compute
cortical parcellations for individual subjects, which can be used as the network
nodes in a whole-brain connectome analysis.
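The embedding step can be sketched with Laplacian eigenmaps [3], one standard choice of nonlinear dimensionality reduction. This is a minimal stand-in under stated assumptions: the paper's exact affinity construction is not reproduced, and `W` is a hypothetical vertex affinity matrix.

```python
import numpy as np

def laplacian_eigenmaps(W, d):
    """Embed an affinity graph W (n x n, symmetric, non-negative) into d
    dimensions via the normalized graph Laplacian (Laplacian eigenmaps)."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1:d + 1]             # drop the trivial constant mode
```

The d eigenvectors with the smallest non-trivial eigenvalues give each vertex a d-dimensional coordinate in which transitions between connectivity patterns become easier to localize.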
We assess the parcellation quality based on parcel homogeneity [2,8] and
silhouette analysis [5,6]. Besides the dMRI data, we also evaluate the parcella-
tions with functional connectivity data obtained from resting-state fMRI as a
means of external validation [6]. Our method is compared to the state-of-the-
art connectivity-based parcellation techniques [5,12], as well as two parcellation
schemes which do not take into account any connectivity information [16]. In
addition, we show the extent to which our parcellation boundaries agree with
well-established patterns of cortical myelination and cytoarchitecture.
2 Method
We start with preprocessing the dMRI data using probabilistic tractography
to estimate a structural connectivity network, which is then reduced in dimen-
sionality through manifold learning. Driven by the boundaries identified in the
low-dimensional embedding as points where connectivity patterns change, we
utilize a watershed segmentation to achieve the final parcellation (Fig. 1).
eigenvectors and later show that this method can produce more reliable parcel-
lations compared to spatially constrained spectral clustering.
We discretize the eigenvectors using k-means and partition each eigenvec-
tor into two subregions. The edge between these subregions potentially provides
good separation points towards obtaining a parcellation, as the vertices within
the same subregions tend to have similar connectivity properties, whilst the
points closer to the boundary correspond to the cortical areas where the connec-
tivity is in transition. For example, Fig. 2(a) shows that connectivity profiles of
different vertices may exhibit similar or varying patterns, depending on their
relative location to an edge. In order to show that this tendency holds across
the whole cortex, we randomly selected vertices from one subregion adjacent
to the edge and paired them with their closest neighbors residing in the other
subregion. Keeping the distance between the vertices in pairs approximately the
same, we selected new pairs of vertices, but this time from within the same sub-
regions. We then measured the average correlation between the paired vertices’
connectivity profiles in each set and repeated this for all eigenvectors and sub-
jects. Figure 2(b) shows that the similarity between the connectivity profiles of
vertices drops by at least 20% if they reside on different sides of a boundary.
Fig. 2. (a) Connectivity profiles of vertices from different sides of a boundary. (b) Left:
illustration of the vertex selection procedure. Right: average similarity (correlation)
between paired vertices for each eigenvector. Dotted lines show the standard deviations.
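The pair-wise comparison above reduces to averaging Pearson correlations over pairs of connectivity profiles; a minimal sketch (the vertex-pairing procedure itself is omitted, and the input layout is an assumption):

```python
import numpy as np

def mean_pair_correlation(profiles, pairs):
    """Average Pearson correlation over pairs of vertex connectivity
    profiles. `profiles` is (n_vertices, n_targets); `pairs` lists
    (i, j) vertex index pairs."""
    r = [np.corrcoef(profiles[i], profiles[j])[0, 1] for i, j in pairs]
    return float(np.mean(r))
```

Comparing this value for across-boundary pairs against within-subregion pairs quantifies the similarity drop reported in Fig. 2(b).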
map where each marker corresponds to an estimated parcel position and then
grow these markers until a boundary is reached or two ridges touch each other
in the flooding process of the watershed. The marker definition is typically per-
formed by defining a threshold on the boundary map. We set this threshold to
the 25th percentile of the boundary map intensities, since in many empirically
tested cases, this effectively revealed approximate parcel locations to be used as
ideal markers for a watershed transformation.
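The marker-based watershed can be sketched on a toy 2-D boundary map, with `scipy`'s image-foresting watershed standing in for the surface-based transformation of [8] (the actual method operates on the cortical mesh, not a regular grid):

```python
import numpy as np
from scipy import ndimage

def watershed_parcels(boundary_map, pct=25):
    """Marker-based watershed on a boundary map: markers are the connected
    regions below the given percentile of intensities (approximate parcel
    locations), then flooded until boundaries or other ridges are met."""
    thresh = np.percentile(boundary_map, pct)
    markers, n_markers = ndimage.label(boundary_map <= thresh)
    labels = ndimage.watershed_ift(boundary_map.astype(np.uint8), markers)
    return labels, n_markers
```

With the 25th-percentile default, low-boundary basins become markers and each grows into one parcel.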
3 Experiments
Data. Experiments are conducted on a set of 100 randomly selected adults (54
females, age 22–35) from the Human Connectome Project (HCP) S500 release.
All data have been acquired and preprocessed following the HCP minimal pre-
processing pipelines [7]. For each subject, the gray-matter voxels have been reg-
istered onto the 32k triangulated mesh at 2 mm spatial resolution, yielding a
standard set of cortical vertices per hemisphere.
Fig. 3. Silhouette coefficients and homogeneity scores for d = 10, 15, and 20 (structural connectivity).
Fig. 4. Silhouette coefficients and homogeneity scores for d = 10, 15, and 20 (resting-state functional connectivity).
neighbors for the construction of their affinity matrices. Therefore, they may fail
to fully capture the underlying connectivity.
The difference in performance between our approach and the others becomes
more prominent with the resting-state functional connectivity results (Fig. 4).
Both homogeneity and silhouette analysis indicate that the proposed method
can effectively subdivide the cortical surface into functionally coherent subre-
gions and hence better reflect the underlying function. Although other methods
can generate homogeneous parcels to some degree, they fail to separate vertices
with different signals from each other, as indicated by silhouette coefficients.
Finally, visual assessment of parcellations shows some alignment with
Brodmann’s cytoarchitectural areas and highly myelinated cortical regions (see
Supplementary Material). Dice-based overlapping measures [4] indicate that this
observation is substantially consistent across subjects, especially for the motor
(BA[1,3,4]) and visual cortex (BA17), with average Dice scores of 0.81 (±0.05)
and 0.82 (±0.05), respectively.
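The overlap scores above use the Dice coefficient [4], which reduces to a few lines over binary masks:

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```

A score of 1 means perfect agreement between a parcel and the reference area; the convention of returning 1.0 for two empty masks is a choice made here.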
4 Conclusions
In this paper, we introduced a new connectivity-driven parcellation approach
based on dMRI. The proposed method models the local connectivity character-
istics with manifold learning and describes an effective use of this manifold to
identify locations where connectivity patterns change. Particularly, these tran-
sition locations are interpreted as an abstraction of the parcellation boundaries,
and hence, used to derive distinct parcels at different scales. We showed that
our parcellations can more reliably capture the underlying connectivity of the
brain compared to a set of other approaches. This paper focuses on developing a
complete framework for computing subject-specific parcellations, which can be
used in many application areas, such as for driving a registration process based
on brain connectivity. In addition, planned future work is to explore the vari-
ability across individual parcellations towards generating a connectivity-based
cortical atlas, enabling population-level connectome studies.
Acknowledgments. The authors would like to thank Markus Schirmer for providing the
random parcellations. Data were provided by the Human Connectome Project, WU-
Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54-
MH091657). The research leading to these results received funding from the European
Research Council under the European Union's Seventh Framework Programme
(FP7/2007-2013)/ERC Grant Agreement No. 319456.
References
1. Arslan, S., Parisot, S., Rueckert, D.: Joint spectral decomposition for the par-
cellation of the human cerebral cortex using resting-state fMRI. In: Ourselin, S.,
Alexander, D.C., Westin, C.-F., Cardoso, M.J. (eds.) IPMI 2015. LNCS, vol. 9123,
pp. 85–97. Springer, Heidelberg (2015). doi:10.1007/978-3-319-19992-4_7
2. Arslan, S., Rueckert, D.: Multi-level parcellation of the cerebral cortex using
resting-state fMRI. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.)
MICCAI 2015. LNCS, vol. 9351, pp. 47–54. Springer, Heidelberg (2015)
3. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data
representation. Neural Comput. 15(6), 1373–1396 (2003)
4. Bohland, J.W., Bokil, H., Allen, C.B., Mitra, P.P.: The brain atlas concordance
problem: Quantitative comparison of anatomical parcellations. PLoS ONE 4(9),
e7200 (2009)
5. Craddock, R.C., James, G., Holtzheimer, P.E., Hu, X.P., Mayberg, H.S.: A whole
brain fMRI atlas generated via spatially constrained spectral clustering. Hum.
Brain Mapp. 33(8), 1914–1928 (2012)
6. Eickhoff, S.B., Thirion, B., Varoquaux, G., Bzdok, D.: Connectivity-based parcel-
lation: critique and implications. Hum. Brain Mapp. 36(12), 4771–4792 (2015)
7. Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., Coalson, T.S., Fischl, B.,
Andersson, J.L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J.R., Van Essen, D.C.,
Jenkinson, M.: The minimal preprocessing pipelines for the Human Connectome
Project. NeuroImage 80, 105–124 (2013)
8. Gordon, E.M., Laumann, T.O., Adeyemo, B., Huckins, J.F., Kelley, W.M.,
Petersen, S.E.: Generation and evaluation of a cortical area parcellation from
resting-state correlations. Cereb. Cortex 26(1), 288–303 (2016)
9. Langs, G., Sweet, A., Lashkari, D., Tie, Y., Rigolo, L., Golby, A.J., Golland, P.:
Decoupling function and anatomy in atlases of functional connectivity patterns:
language mapping in tumor patients. NeuroImage 103, 462–475 (2014)
10. Langs, G., Golland, P., Ghosh, S.S.: Predicting activation across individuals with
resting-state functional connectivity based multi-atlas label fusion. In: Navab, N.,
Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9350,
pp. 313–320. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24571-3_38
11. Laumann, T.O., Gordon, E.M., Adeyemo, B., Snyder, A.Z., Joo, S.J.,
Chen, M.Y., Gilmore, A.W., McDermott, K.B., Dosenbach, N.U., Schlaggar, B.L.,
Mumford, J.A., Poldrack, R.A., Petersen, S.E.: Functional system and areal organi-
zation of a highly sampled individual human brain. Neuron 87(3), 657–670 (2015)
12. Parisot, S., Arslan, S., Passerat-Palmbach, J., Wells, W.M., Rueckert, D.:
Tractography-driven groupwise multi-scale parcellation of the cortex. In:
Ourselin, S., Alexander, D.C., Westin, C.F., Cardoso, M.J. (eds.) IPMI 2015.
LNCS, vol. 9123, pp. 600–612. Springer, Heidelberg (2015)
13. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern
Anal. Mach. Intell. 22(8), 888–905 (2000)
14. Sporns, O.: The human connectome: a complex network. Ann. N. Y. Acad. Sci.
1224(1), 109–125 (2011)
15. Thirion, B., Dodel, S., Poline, J.B.: Detection of signal synchronizations in resting-
state fMRI datasets. NeuroImage 29(1), 321–327 (2006)
16. Thirion, B., Varoquaux, G., Dohmatob, E., Poline, J.B.: Which fMRI clustering
gives good brain parcellations? Front. Neurosci. 8, 167 (2014)
17. Wang, D., Buckner, R.L., Fox, M.D., Holt, D.J., Holmes, A.J., Stoecklein, S.,
Langs, G., Pan, R., Qian, T., Li, K., Baker, J.T., Stufflebeam, S.M., Wang, K.,
Wang, X., Hong, B., Liu, H.: Parcellating cortical functional networks in individ-
uals. Nat. Neurosci. 18(12), 1853–1860 (2015)
Species Preserved and Exclusive Structural
Connections Revealed by Sparse CCA
Xiao Li1(✉), Lei Du1, Tuo Zhang1, Xintao Hu1, Xi Jiang2, Lei Guo1,
and Tianming Liu2
1 Brain Decoding Research Center, Northwestern Polytechnical University,
Xi’an, Shaanxi, China
lixiao0827@gmail.com
2 Computer Science Department, The University of Georgia, Athens, GA, USA
Abstract. Brain evolution has been an intriguing research topic for centuries.
Efforts have been devoted to identifying the structural connectome preserved
between macaques and humans and the connections exclusive to one species. However,
recent studies mainly focus on one specific fasciculus or one region. The sim-
ilarity and difference of the global structural connection networks in macaque and
human are still largely unknown. In this work, we used diffusion MRI (dMRI) to
estimate the whole brain large-scale white matter pathways and Brodmann areas
as a test bed to construct a global connectome for the two species. We adopted
sparse canonical correlation analysis (SCCA) algorithm to yield the weights
which can be applied to the connectome to produce the components strongly
correlated between the two species. Joint analysis of the weights helped to
identify the preserved white matter pathways and those exclusive to a specific
species. The results are consistent with reports in the literature, demon-
strating the effectiveness and promise of this framework.
1 Introduction
Brain evolution has been an intriguing research topic for centuries. A comparative
structural connection study among primate brains may help in our understanding of the
structural substrates underlying the development of higher cognitive functions [1].
Recent research indicates that the organization of white matter (WM) bundles has been
preserved between macaques and humans while structural difference has also been
identified [2, 3]. However, these studies mainly focus on one specific fasciculus, e.g.
arcuate fasciculus, or one specific brain region, e.g. dorsal prefrontal lobe [1, 3]. The
similarity and difference between the two species in terms of global connective patterns
are still largely unknown. The lack of such knowledge partly stems from the
methodology used to analyze the connective anatomy.
Diffusion MRI (dMRI) and tractography approaches have given us the opportunity
to study the whole brain large-scale connectome in primate brains in vivo [1]. Recent
Generally, as illustrated in Fig. 1, we used T1-weighted MRI and dMRI data to con-
struct structural connectivity matrices for each species. Then, each matrix was stretched
to a feature vector. Those feature vectors for a species compose a feature matrix. Next,
an improved SCCA algorithm [5] was adopted. Currently, only the canonical com-
ponents with the strongest correlation between the two feature matrices were consid-
ered by this SCCA algorithm [5]. Consequently, two weight vectors u and v were
yielded for the elements of the connectivity matrices, and they can be restored to the matrix
format, U and V. By jointly analyzing U and V, we determined the strongly correlated
connectivities conserved between the two species, and the corresponding dMRI derived
fibers were extracted and were suggested to be the preserved fibers. Connectivities and
fibers exclusive to a specific species were also analyzed.
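The stretch-and-restore bookkeeping described above can be sketched as follows (shapes are illustrative):

```python
import numpy as np

def stretch(matrices):
    """Stack each subject's connectivity matrix into one row of a
    (n_subjects, n_areas**2) feature matrix."""
    return np.stack([m.ravel() for m in matrices])

def restore(weights, n_areas):
    """Reshape a weight vector over matrix elements (e.g. u or v) back
    to its n_areas x n_areas matrix form (U or V)."""
    return np.asarray(weights).reshape(n_areas, n_areas)
```

`restore` is the inverse bookkeeping step: it maps the SCCA weight vector back to the matrix layout so that each weight can be read as the importance of one area-to-area connection.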
Human Brain Imaging. Ten randomly selected human brains from the Q1 release of
the WU-Minn Human Connectome Project (HCP) consortium [6] were used in this study.
The T1-weighted structural MRI was a three-dimensional acquisition with 0.7 mm
isotropic voxels, TI = 1000 ms, TR = 2400 ms, TE = 2.14 ms, flip angle = 8°, and an
image matrix of 260 × 311 × 260. dMRI was acquired with the following parameters:
spin-echo EPI sequence; TR = 5520 ms; TE = 89.5 ms; flip angle = 78°; refocusing flip
angle = 160°; FOV = 210 × 180 mm; matrix = 168 × 144; spatial resolution = 1.25 mm ×
1.25 mm × 1.25 mm; echo spacing = 0.78 ms. A full dMRI session includes 6 runs,
representing 3 different gradient tables, with each table acquired once with right-to-left
and left-to-right phase encoding polarities, respectively. Each gradient table includes
approximately 90 diffusion weighting directions plus 6 b = 0 acquisitions interspersed
throughout each run. Diffusion-weighted data consisted of 3 shells of b = 1000, 2000,
and 3000 s/mm² interspersed with an approximately equal number of acquisitions on
each shell within each run.
Macaque Brain Imaging. The UNC-Wisconsin neurodevelopment rhesus MRI database
(http://www.nitrc.org/projects/uncuw_macdevmri/) was used in this work, consisting
of T1-weighted MRI and dMRI data. This is a longitudinal database, and we only used
the scans of 10 different subjects acquired when they were more than 18 months old. The
released T1-weighted MRI data has been registered to the UNC Primate Brain Atlas space [4].
The resolution of this space is 0.27 × 0.27 × 0.27 mm³ with a matrix of
300 × 350 × 250. The basic parameters for diffusion data acquisition were: resolution
of 0.65 × 0.65 × 1.3 mm³, a matrix of 256 × 256 × 58, diffusion-weighting gradi-
ents applied in 120 directions, and a b value of 1000 s/mm². Ten images without diffu-
sion weighting (b = 0 s/mm²) were also acquired.
Preprocessing. Preprocessing steps on T1-weighted MRI included brain skull removal
and tissue segmentation via FSL [7]. T1-weighted MRI data was nonlinearly warped to the
b0 map of the dMRI data via FSL-fnirt [8], before cortical surface reconstruction was
performed to reconstruct the inner cortical surface of the white matter (WM) [9]. For the dMRI data,
skull stripping and eddy-current correction were applied first, then BedpostX in FSL 5 (http://fsl.
fmrib.ox.ac.uk/fsl/fslwiki/FDT/UserGuide#BEDPOSTX) was adopted to estimate the
axonal orientations for each voxel. In this paper, we used two axonal orientations,
because it was suggested that b-values upwards of 4000 would be required to resolve a
3-fiber orthogonal system robustly [10]. For the sake of convenience in visualization,
DSIstudio [11] was used to reconstruct deterministic fibers from the BedpostX-derived
axon orientations. 5 × 10⁴ fiber tracts were reconstructed for each subject. The FA and
angular thresholds were 0.1 and 60°, respectively; a small FA threshold for primate
brains was suggested in [15].
Structural Connectivity Matrix Construction. The structural connectivity matrices
were constructed from dMRI derived fibers and white matter surfaces with a parcel-
lation scheme. Currently, we used Brodmann areas parcellation scheme as a test bed to
develop and evaluate our framework. All macaque white matter surfaces were warped
to the ‘F99’ macaque atlas space [13] via spherical registration method [12].
126 X. Li et al.
The Brodmann parcellation in the atlas space was mapped back to the surface of each
individual. The Brodmann areas in the ‘Conte69’ human atlas [14] were mapped back
to each human subject’s surface using the same approach. Currently, ipsilateral
structural connectivities were considered, and we used Ms and Hs to denote the con-
nective matrices of macaque and human, respectively. Currently, 28 Brodmann areas
shared by the two atlases and robustly warped to individuals were used as nodes for the
matrices. So the connective matrices Ms and Hs are of the same size. For each indi-
vidual, the element of the matrix, such as mi,j in M, was defined as the connective
strength between Brodmann areas i and j, that is, the number of fiber tracts connecting
areas i and j divided by the total number of fibers.
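The matrix construction can be sketched as follows; `endpoint_pairs` (one Brodmann-area label pair per reconstructed tract) is a hypothetical input format, not the paper's actual data structure:

```python
import numpy as np

def connectivity_matrix(endpoint_pairs, n_areas):
    """m[i, j] = number of tracts linking areas i and j divided by the
    total number of tracts, as defined in the text."""
    M = np.zeros((n_areas, n_areas))
    for i, j in endpoint_pairs:
        M[i, j] += 1.0
        if i != j:
            M[j, i] += 1.0      # keep the matrix symmetric
    return M / max(len(endpoint_pairs), 1)
```

Dividing by the total tract count normalizes for the fixed number of reconstructed fibers per subject.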
L1 and L2 are the Laplacian matrices of the correlation matrices of X and Y,
respectively. The AGN-SCCA method not only discovers a strong relationship
between X and Y, but also recovers the structural information within X and
Y. That is, it can find the correlated features in X. The lasso terms in
both penalties ensure sparsity.
Generally, we solve the AGN-SCCA problem by an alternating iterative procedure,
which is briefly described as follows (please see [5] for details).
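For illustration only, here is a plain l1-penalized sparse CCA solved by the same kind of alternating updates; this is a simplified stand-in, since AGN-SCCA [5] additionally carries the graph-Laplacian (GraphNet) penalties built from L1 and L2, which this sketch omits:

```python
import numpy as np

def soft_threshold(x, lam):
    """l1 shrinkage operator."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_cca(X, Y, lam=0.5, n_iter=50):
    """Alternating updates for one pair of sparse canonical weights:
    each step maximizes u' X' Y v in one block with an l1 shrinkage,
    then rescales so the canonical variate has unit norm."""
    u = np.full(X.shape[1], 1.0 / X.shape[1])
    v = np.full(Y.shape[1], 1.0 / Y.shape[1])
    for _ in range(n_iter):
        u = soft_threshold(X.T @ (Y @ v), lam)
        if np.linalg.norm(X @ u) > 0:
            u /= np.linalg.norm(X @ u)
        v = soft_threshold(Y.T @ (X @ u), lam)
        if np.linalg.norm(Y @ v) > 0:
            v /= np.linalg.norm(Y @ v)
    return u, v
```

The sparsity of u and v is what lets the weights be read off as a small set of connections driving the cross-species correlation.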
3 Results
3.1 Cross-Validation
We used 10 human and macaque subjects in this study. Because only ipsilateral
connections are considered in this work and we assume that contralateral Brodmann areas
with the same label have the same function, we have n = 20 samples for each species.
A five-fold cross-validation scheme was adopted to evaluate whether the framework yields
consistent U and V, and the effectiveness of the AGN-SCCA algorithm in producing a high
correlation coefficient. Specifically, 16 pairs of human and macaque samples were used
as ‘training’ samples to tune the parameters of the AGN-SCCA algorithm until the obtained
weight matrices U and V, applied to the remaining 4 ‘testing’ sample pairs, yielded the
highest correlation coefficient. In Fig. 2, we show the five optimized U and V pairs. Their
consistency demonstrates that the weight matrices yielded by the framework are robust to vari-
ance introduced by subjects. The averaged correlation coefficient values on the
Fig. 2. The five weight matrices Us and Vs yielded by five-fold cross-validation. The standard
errors of the five Us and Vs are shown on the far right.
five-fold test are 0.95 ± 0.01 (illustrated by the scatter chart in Fig. 3(a) on the
five-fold cross-validation) compared to the average intra-species values for human and
macaque, 0.995 and 0.989, demonstrating the effectiveness of the algorithm and sug-
gesting that common global connective patterns between human and macaque do exist.
Fig. 3. (a) The correlation of 1st canonical component between human and macaque. The scatter
chart in the top-right corner shows the positive correlation between human subjects and macaque
subjects in the transformed space. The results of the five-fold cross-validation are shown in
different colors. The weight vectors have been transformed back to matrices; the averaged weight
matrices of the five-fold cross-validation are shown beside the axes. Only the most positive
weights uij’s and vij’s (above the standard deviation) are shown; (b) and (c) show the fibers of a
human subject and a macaque subject corresponding to the positive weights in (a).
clusters on the diagonal and one off-diagonal cluster. Cluster #1 is located on the
somatosensory and motor cortices (BA1–7). The other diagonal one (Cluster #2)
resides on the visual cortices (BA17–19) and temporal lobe (BA20–22). The off-diagonal
cluster (Cluster #3) consists of the fronto-occipital stream (BA9, 10 to BA17–19) and the
fronto-temporal stream (BA9, 10 to BA20–22). The fibers of the three clusters on a
human subject and a macaque subject are shown in Fig. 4(b–d). Those structural
connectivities have been reported to be preserved in human and macaque in many
available works [1, 3]. On the other hand, connectivity differences can be derived by
overlapping the negative and positive matrices in Fig. 3(a). Because only one negative
matrix was produced for the human subjects, it can be directly used to identify the con-
nectivity differences, e.g., the connectivities highlighted by the white arrow (Fig. 4(e)) linking
the inferior frontal lobe to the temporal lobe (see Fig. 4(f) for the corresponding fibers). This
absent frontal projection to the middle and inferior temporal gyrus in macaque has been
validated by literature reports [3].
Fig. 4. (a) The overlapping of the two species’ positive weight matrices; (b)–(d) the fibers
derived from the 3 clusters in (a); (e) the connectivity difference matrix between the two species;
(f) the fibers on a human subject derived from the arrow-highlighted connectivities in (e).
4 Conclusion
In this work, we used dMRI to estimate the whole brain large-scale white matter
pathways and Brodmann areas as a test bed to construct a global connectome for
human and macaque, on which the AGN-SCCA algorithm was adopted to yield the
weights associated with the connectivities, producing the components strongly correlated
between the two species. By analyzing the weights, we identified the preserved white
matter pathways and those exclusive to a specific species. The results are consistent
with reports in the literature, demonstrating the effectiveness and promise of this
framework.
References
1. Rilling, J.K., Glasser, M.F., Preuss, T.M., Ma, X., Zhao, T., Hu, X., Behrens, T.E.: The
evolution of the arcuate fasciculus revealed with comparative DTI. Nat. Neurosci. 11(4),
426–428 (2008)
2. Thiebaut de Schotten, M., Dell’Acqua, F., Valabregue, R., Catani, M.: Monkey to human
comparative anatomy of the frontal lobe association tracts. Cortex 48(1), 82–96 (2012)
3. Jbabdi, S., Lehman, J.F., Haber, S.N., Behrens, T.E.: Human and monkey ventral prefrontal
fibers use the same organizational principles to reach their targets: tracing versus
tractography. J. Neurosci. 33(7), 3190–3201 (2013)
4. Styner, M., Knickmeyer, R., Joshi, S., Coe, C., Short, S.J., Gilmore, J.: Automatic brain
segmentation in rhesus monkeys. In: Proceedings of SPIE on Medical Imaging, vol. 6512,
p. 65122L1-8 (2007)
5. Du, L., Huang, H., Yan, J., Kim, S., Risacher, S.L., Inlow, M., Moore, J.H., Saykin, A.J.,
Shen, L., for the Alzheimer’s Disease Neuroimaging Initiative: Structured sparse canonical
correlation analysis for brain imaging genetics: an improved GraphNet method. Bioinfor-
matics 32, 1544–1551 (2016)
6. Van Essen, D.C., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T.E.J., Bucholz, R.,
Chang, A., Chen, L., Corbetta, M., Curtiss, S.W., Della Penna, S., Feinberg, D.,
Glasser, M.F., Harel, N., Heath, A.C., Larson-Prior, L., Marcus, D., Michalareas, G.,
Moeller, S., Oostenveld, R., Petersen, S.E., Prior, F., Schlaggar, B.L., Smith, S.M.,
Snyder, A.Z., Xu, J., Yacoub, E.: The human connectome project: a data acquisition
perspective. Neuroimage 62, 2222–2231 (2012)
7. Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: FSL.
Neuroimage 62, 782–790 (2012)
8. Andersson, J.L.R., Jenkinson, M., Smith, S.: Non-linear registration, aka spatial normal-
isation. FMRIB technical report TR07JA2 (2010)
9. Liu, T., Nie, J., Tarokh, A., Guo, L., Wong, S.T.C.: Reconstruction of central cortical surface
from brain MRI images: method and application. Neuroimage 40, 991–1002 (2008)
10. Behrens, T.E., Berg, H.J., Jbabdi, S., Rushworth, M.F., Woolrich, M.W.: Probabilistic
diffusion tractography with multiple fibre orientations: what can we gain? Neuroimage 34(1),
144–155 (2007)
11. Basser, P.J., Pajevic, S., Pierpaoli, C., Duda, J., Aldroubi, A.: In vivo fiber tractography
using DT-MRI data. Magn. Reson. Med. 44, 625–632 (2000)
12. Yeo, B.T., Sabuncu, M.R., Vercauteren, T., Ayache, N., Fischl, B., Golland, P.: Spherical
demons: fast diffeomorphic landmark-free surface registration. IEEE Trans. Med. Imaging
29(3), 650–668 (2010)
13. Lewis, J.W., Van Essen, D.C.: Mapping of architectonic subdivisions in the macaque
monkey, with emphasis on parieto-occipital cortex. J. Comp. Neurol. 428(1), 79–111
(2000)
14. Van Essen, D.C., Glasser, M.F., Dierker, D.L., Harwell, J., Coalson, T.: Parcellations and
hemispheric asymmetries of human cerebral cortex analyzed on surface-based atlases. Cereb.
Cortex 22(10), 2241–2262 (2012)
15. Dauguet, J., Peled, S., Berezovskii, V., Delzescaux, T., Warfield, S.K., Born, R., Westin, C.-F.:
Comparison of fiber tracts derived from in-vivo DTI tractography with 3D histological neural
tract tracer reconstruction on a macaque brain. Neuroimage 37, 530–538 (2007). doi:10.1016/j.neuroimage
16. Du, L., et al.: A novel structure-aware sparse learning algorithm for brain imaging genetics.
In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part III.
LNCS, vol. 8675, pp. 329–336. Springer, Heidelberg (2014)
Modularity Reinforcement for Improving Brain
Subnetwork Extraction
1 Introduction
The human brain naturally befits a graphical representation, where brain regions
and their pair-wise interactions constitute graph nodes and weighted edges,
respectively. An important attribute of the brain is its modular structure, in
which specific subnetworks of brain regions work in tandem to execute vari-
ous functions. Functional magnetic resonance imaging (fMRI) is widely used
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 132–139, 2016.
DOI: 10.1007/978-3-319-46720-7_16
for studying this modular structure of the brain. However, reliable subnetwork
extraction from fMRI data remains challenging. First, the brain network topol-
ogy may be obscured by noisy connectivity estimates [1]. Second, confounds,
such as region size bias [2], effects of motion artifacts [3], and signal dropouts
due to susceptibility artifacts (especially in regions like the orbitofrontal cor-
tex and the inferior temporal lobe) [4], introduce region-specific biases to the
connectivity estimates.
The conventional way for dealing with noisy connectivity matrices is to apply
global thresholding (GT) by either keeping only connections with values above
a certain threshold or keeping a certain graph density [1]. Due to region-specific
connectivity biases, e.g. brain regions in signal dropout locations tend to display
lower connectivity, certain regions that do belong to a subnetwork might not
appear as such based on the fMRI measurements, especially after GT, which
prunes weak edges. To mitigate this overlooked problem, a local thresholding
(LT) method based on the minimal spanning tree and k-nearest neighbors (MST-
kNN) has been proposed [5]. The idea in [5] was to build a single connected graph
using the MST and expand the tree by adding edges from each node to its nearest
neighbors until a desired graph density is reached. However, both key steps,
enforcing a single connected graph and adding edges to all nodes when expanding
the tree, lack neuroscientific justification. A few studies have explored spectral
graph wavelet transform for graph de-noising [6], but this approach does not
explicitly handle region-specific connectivity biases. In fact, most existing con-
nectivity estimation and subnetwork extraction techniques [1,7] do not account
for these biases.
In this paper, we propose a modularity reinforcement strategy for improv-
ing brain subnetwork extraction. To deal with noisy edges and region-specific
connectivity biases, we propose a local thresholding scheme that normalizes the
connectivity distribution of each node prior to thresholding (Sect. 2.1). Also,
since node pairs belonging to the same subnetwork presumably connect to a
similar set of brain regions, i.e. have similar connection fingerprints, we derive a
node similarity measure from the thresholded graph by comparing the adjacency
structure of each node pair, and refine the graph with this similarity measure to
reinforce its modularity structure (Sect. 2.2). More reliable subnetwork extrac-
tion is consequently facilitated on the refined graph (Sect. 2.3). To set the number
of subnetworks, we adopt an automated technique based on graph Laplacian [8],
and compare that against the conventional modularity-maximization approach
[9]. We validate our modularity reinforcement strategy on both synthetic data
and real data from the Human Connectome Project (HCP).
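The fingerprint-based refinement summarized above can be sketched as follows. This is an illustrative instantiation using cosine similarity between adjacency rows; the paper's exact similarity measure is defined in Sect. 2.2 and may differ.

```python
import numpy as np

def fingerprint_similarity(A):
    """Similarity between node pairs based on their adjacency structure.

    A is a binary n x n adjacency matrix. Each row is a node's
    "connection fingerprint"; similar fingerprints suggest membership in
    the same subnetwork. Cosine similarity is used here as one plausible
    comparison of fingerprints (an assumption, not the paper's exact measure).
    """
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    norms[norms == 0] = 1.0          # avoid division by zero for isolated nodes
    F = A / norms[:, None]
    return F @ F.T                   # S[i, j] = cosine similarity of rows i and j

# The graph can then be refined by weighting connectivity with similarity,
# e.g. C_refined = S * C_hat (elementwise), reinforcing modular structure.
```

Nodes sharing neighbors within a module receive high mutual similarity even if their direct edge is weak, which is what reinforces the modularity structure.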
2 Methods
2.1 Local Thresholding
Due to region-specific connectivity biases, conventional GT might prune relevant
connections with weak edge strength. To account for these biases, we present here
a LT scheme. The idea is to first normalize the connectivity distribution of each
134 C. Wang et al.
node into a uniform interval to rectify the biases. Subsequent global thresholding
on this normalized graph would have the effect of applying local thresholding on
each node. Specifically, let C be an n × n connectivity matrix, where n is the
number of nodes in the brain graph. We normalize the connectivity distribution
by mapping each row of C from [min (Ci,: ), max (Ci,: )] to [0, 1], where Ci,:
denotes row i of C corresponding to the connectivity between brain region i
and all other regions in the brain. A threshold is then applied to generate a
binary adjacency matrix, G, which we then symmetrize by taking the union of
G and its transpose: Ai,j = Gi,j ∪ Gj,i . This binary adjacency matrix A is used to mask
out the noisy edges from C: Ĉi,j = Ai,j Ci,j , which is equivalent to applying a
local threshold to Ci,: for all i . We note that in the event that noisy nodes are
accidentally included, some of the connections to these noisy nodes (that might
not be kept by GT) would be kept by LT due to the normalization step.
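The thresholding steps above can be sketched as follows. This is a hedged illustration: choosing the cut to reach a target edge density is one common option, whereas the paper sweeps a density range.

```python
import numpy as np

def local_threshold(C, density=0.25):
    """Local thresholding via row-wise normalization (Sect. 2.1 sketch).

    C: n x n connectivity matrix. Each row is min-max normalized to [0, 1]
    to rectify region-specific biases; a single global cut is applied to
    the normalized graph (chosen here to reach the target edge density);
    the resulting binary graph is symmetrized by union; and the mask is
    applied back to the original connectivity values.
    """
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    N = C.copy()
    np.fill_diagonal(N, 0.0)
    lo = N.min(axis=1, keepdims=True)
    hi = N.max(axis=1, keepdims=True)
    N = (N - lo) / np.where(hi > lo, hi - lo, 1.0)   # map each row to [0, 1]
    # Choose the cut so the requested fraction of off-diagonal entries of
    # the normalized graph survives (one way to pick the threshold).
    off = N[~np.eye(n, dtype=bool)]
    tau = np.quantile(off, 1.0 - density)
    G = N > tau
    A = G | G.T                       # symmetrize: A_ij = G_ij OR G_ji
    return A * C                      # mask noisy edges: C_hat = A .* C
```

Because thresholding happens on row-normalized values, regions with globally depressed connectivity (e.g. in signal-dropout areas) keep their strongest edges instead of being isolated.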
3 Materials
3.1 Synthetic Data
To illustrate our strategy, we synthesized a small-scale network consisting of
n = 13 nodes in Fig. 1. We also generated synthetic data that cover 100 random
network configurations with n set to 100 nodes. For each network configuration,
the number of subnetworks, N , was randomly selected from [10, 20]. The number
of regions within each subnetwork was set to round (n/N ) + r, where r was
randomly selected from [−2, 2]. With the resulting configuration, we created the
corresponding adjacency matrix, Σ, and drew time courses with 4,800 samples
(analogous to real data) from N (0, Σ). We then added Gaussian noise to the time
courses with signal-to-noise ratio randomly set between [−6dB, −3dB]. Sample
covariance was then estimated from these time courses with correlation values
associated with q % of the nodes reduced by z %, where q was randomly selected
from [20 %, 30 %] and z was randomly selected from [30 %, 40 %] to simulate
region-specific connectivity biases for smaller brain regions [2].
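The generation pipeline can be sketched as follows, with smaller sizes than in the paper for brevity: a block-diagonal covariance, additive Gaussian noise at a fixed SNR, and row/column attenuation to simulate region-specific biases.

```python
import numpy as np

rng = np.random.default_rng(42)

# Block-diagonal ground-truth covariance: N subnetworks over n nodes
n, N = 20, 4
labels = np.repeat(np.arange(N), n // N)
Sigma = (labels[:, None] == labels[None, :]).astype(float)
Sigma += 0.1 * np.eye(n)                       # keep Sigma positive definite

# Draw time courses and add Gaussian noise at a target SNR (here -4.5 dB)
T = 480                                        # shorter than the paper's 4800 samples
X = rng.multivariate_normal(np.zeros(n), Sigma, size=T)
snr_db = -4.5
noise_power = X.var() / (10 ** (snr_db / 10))
X += rng.normal(0.0, np.sqrt(noise_power), size=X.shape)

# Sample correlation, then reduce values for q% of the nodes by z% to
# simulate region-specific biases (entries between two biased nodes are
# attenuated twice in this simple sketch)
C = np.corrcoef(X.T)
q, z = 0.25, 0.35
biased = rng.choice(n, size=int(q * n), replace=False)
C[biased, :] *= (1 - z)
C[:, biased] *= (1 - z)
```

The fixed q, z, and SNR values stand in for the random draws described above; the paper samples them per configuration.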
We used the resting state fMRI scans of 77 healthy subjects (36 males and
41 females, ages ranging from 22 to 35) from the HCP Q3 dataset [10]. The
data comprised two sessions, each having a 30 min acquisition with a TR of
0.72 s and an isotropic voxel size of 2 mm. Preprocessing already applied to the
data by HCP [11] included gradient distortion correction, motion correction,
spatial normalization to MNI space, and intensity normalization. Additionally,
we regressed out motion artifacts, mean white matter and cerebrospinal fluid
signals, and principal components of high variance voxels [12], followed by band-
pass filtering with cutoff frequencies of 0.01 and 0.1 Hz. We used the Will90fROI
atlas [13] and the Harvard-Oxford (HO) atlas [14] to define regions of interest
(ROIs). The Will90fROI and HO atlas have 90 and 112 ROIs, respectively. Voxel
time courses within ROIs were averaged to generate region time courses. The
region time courses were demeaned, normalized by the standard deviation, and
concatenated across subjects for extracting group subnetworks. The Pearson’s
correlation values between the region time courses were taken as estimates of
connectivity. Negative elements in the connectivity matrix were set to zero due
to the currently unclear interpretation of negative connectivity [15].
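A minimal sketch of the connectivity estimation described above (the function name and the diagonal zeroing are illustrative choices, not stated in the text):

```python
import numpy as np

def connectivity_from_timecourses(ts):
    """ts: T x n matrix of region time courses.

    Returns the Pearson correlation matrix with negative elements set to
    zero, following the preprocessing described above; self-connections
    on the diagonal are zeroed as an illustrative convention.
    """
    C = np.corrcoef(np.asarray(ts, dtype=float).T)   # n x n correlations
    np.fill_diagonal(C, 0.0)                          # ignore self-connections
    return np.clip(C, 0.0, None)                      # zero out negative values
```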
(a) Network structure (b) C (c) C̄ (d) Ĉ (e) C̄S (f) ĈS
Fig. 1. Schematic illustrating our method on a small-scale example having two subnetworks, each with a provincial hub (blue), linked by a connector hub (orange). In (b), warmer color indicates higher connectivity and black dots indicate the ground-truth adjacency matrix. We denote C̄ as the globally thresholded and Ĉ as the locally thresholded connectivity matrix. At a graph density of 0.25, GT isolated node 2 in (c), while our LT preserved two edges linked to node 2 in (d). Refining graphs (c) and (d) suppressed the between-network edges (between nodes 6 and 7, and nodes 6 and 9) to the lowest connectivity values in (e) and (f).
4 Results
On the 100 synthetic datasets with 100 nodes, over a density range of [0.005, 0.5] at an interval of 0.01, LTMR achieved significantly higher accuracy (average DC = 0.6735) than GT (average DC = 0.6216, p = 7.56e-10), LT (average DC = 0.6537, p = 2.89e-7), and MST-kNN (average DC = 0.6327, p = 7.38e-8) based on the Wilcoxon signed rank test. LTMR also achieved higher DC than GTMR (average DC = 0.6610, p = 0.34), though the difference did not reach significance.
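Computing DC between an extracted and a ground-truth partition requires matching subnetwork labels first; the reference list includes the Hungarian algorithm [16], which suggests optimal matching. A brute-force sketch, adequate for small numbers of subnetworks:

```python
import itertools
import numpy as np

def matched_dice(labels_a, labels_b):
    """Mean Dice coefficient between two partitions after optimally
    matching subnetwork labels.

    Brute force over label permutations for illustration; the Hungarian
    algorithm [16] (e.g. scipy.optimize.linear_sum_assignment) scales to
    larger label counts. Assumes comparable numbers of subnetworks.
    """
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    ka, kb = np.unique(labels_a), np.unique(labels_b)
    # Dice overlap between every pair of subnetworks
    D = np.zeros((len(ka), len(kb)))
    for i, a in enumerate(ka):
        for j, b in enumerate(kb):
            sa, sb = labels_a == a, labels_b == b
            D[i, j] = 2 * np.sum(sa & sb) / (sa.sum() + sb.sum())
    k = min(len(ka), len(kb))
    best = 0.0
    for perm in itertools.permutations(range(len(kb)), k):
        best = max(best, sum(D[i, p] for i, p in enumerate(perm)) / k)
    return best
```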
(a) Overlap with established subnetworks (b) Reproducibility over density range
Fig. 2. Subnetwork extraction on real data at graph densities from 0.05 to 0.5 at an interval of 0.05. Blue = GT, green = LT, black = GTMR, cyan = MST-kNN, and red = our proposed LTMR strategy. Dashed lines indicate average values. In (b), the DC at the reference density of 0.2 was left blank, since inclusion of DC = 1 might mislead the reader. In both (a) and (b), local thresholding outperforms global thresholding, and modularity reinforcement further increases DC compared to using connectivity alone. Our proposed strategy attained the highest DC overall.
used as ground truth, Fig. 3a. For this assessment, we only considered connectiv-
ity matrices based on the Will90fROI atlas [13]. Our proposed LTMR achieved
an average DC of 0.6222, which was significantly higher than GT (average
DC = 0.5384, p = 0.002), MST-kNN (average DC = 0.4567, p = 0.002), GTMR
(average DC = 0.5422, p = 0.006), and higher than LT (average DC = 0.5936,
p = 0.063), as shown in Fig. 2a. At a graph density of 0.5041, corresponding to no
thresholding except negative correlation removal, a DC of 0.5667 was attained,
suggesting that some thresholding to remove noisy edges is beneficial. We note
that although some node-wise variations in connectivity distribution might have
a neuronal basis, we postulate that these variations would be overwhelmed by
the various confound-induced connectivity biases, as supported by how local
thresholding outperforms global thresholding. We further note that an average
m of 11 was estimated with the Laplace approach, whereas an average m of 4
was estimated with modularity maximization. This result reflects the resolution limit of modularity maximization [9], i.e. it tends to underestimate the number of subnetworks by favoring network partitions in which groups of modules are combined into larger communities. This suggests the need to explore alternative techniques
for estimating the number of subnetworks.
We next evaluated the subnetwork reproducibility over a range of graph
densities. We used connectivity matrices based on the HO atlas, which has larger
brain coverage than the Will90fROI atlas but does not have subnetwork labels
assigned to the regions. We set subnetworks corresponding to an edge density
of 0.2 as the reference. Based on the Laplace approach, the optimal number of
subnetworks was found to be 11±5 over the range of graph density examined. Our
proposed strategy achieved an average DC of 0.7302, which is significantly higher
than that of GT (DC = 0.6121, p = 0.004), LT (DC = 0.6677, p = 0.027), MST-
kNN (DC = 0.5737, p = 0.003), and higher than GTMR (DC = 0.7004, p = 0.262),
Fig. 2b. The results hold with other densities used as reference.
(a) Will90fROI (b) Global thresholding (c) Local thresholding (d) Proposed
5 Conclusions
We proposed a modularity reinforcement strategy for improving brain subnet-
work extraction. By applying local thresholding in combination with modular-
ity reinforcement based on connection fingerprint similarity, we attained higher
accuracy in subnetwork extraction compared to conventional global thresholding
and local thresholding. Higher overlap with established brain systems and higher
subnetwork reproducibility were also shown on the real data. Our results thus
demonstrate clear benefits of refining conventional connectivity estimates with
our strategy for subnetwork extraction. In fact, our strategy can be extended
to applications beyond subnetwork extraction by deriving features based on the
extracted subnetworks, e.g. within-subnetwork connectivity computed from the
original connectivity estimates, and using those features for group analysis and
behavioural association studies.
References
1. Fornito, A., Zalesky, A., Breakspear, M.: Graph analysis of the human connectome:
promise, progress, and pitfalls. Neuroimage 80, 426–444 (2013)
2. Achard, S., Coeurjolly, J.F., Marcillaud, R., Richiardi, J.: fMRI functional con-
nectivity estimators robust to region size bias. In: Statistical Signal Processing
Workshop, pp. 813–816. IEEE (2011)
3. Spisák, T., Jakab, A., Kis, S.A., Opposits, G., Aranyi, C., Berényi, E., Emri, M.:
Voxel-wise motion artifacts in population-level whole-brain connectivity analysis
of resting-state fMRI. PLoS ONE 9(9), e104947 (2014)
4. Weiskopf, N., Hutton, C., Josephs, O., Turner, R., Deichmann, R.: Optimized EPI
for fMRI studies of the orbitofrontal cortex: compensation of susceptibility-induced
gradients in the readout direction. Magn. Reson. Mater. Phys. Biol. Med. 20(1),
39–49 (2007)
5. Alexander-Bloch, A.F., Gogtay, N., Meunier, D., Birn, R., Clasen, L., Lalonde, F.,
Lenroot, R., Giedd, J., Bullmore, E.T.: Disrupted modularity and local connec-
tivity of brain functional networks in childhood-onset schizophrenia. Front. Syst.
Neurosci. 4, 147 (2010)
6. Hammond, D.K., Vandergheynst, P., Gribonval, R.: Wavelets on graphs via spec-
tral graph theory. Appl. Comput. Harmonic Anal. 30(2), 129–150 (2011)
7. Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3), 75–174 (2010)
8. Niu, J., Fan, J., Stojmenovic, I.: JLMC: a clustering method based on Jordan-Form
of Laplacian-Matrix. In: Performance Computing and Communications Confer-
ence, pp. 1–8. IEEE (2014)
9. Fortunato, S., Barthelemy, M.: Resolution limit in community detection. Proc.
Nat. Acad. Sci. 104(1), 36–41 (2007)
10. Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E.,
Ugurbil, K., Consortium, W.M.H., et al.: The WU-Minn human connectome
project: an overview. Neuroimage 80, 62–79 (2013)
11. Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., Coalson, T.S., Fischl, B.,
Andersson, J.L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J.R., et al.: The min-
imal preprocessing pipelines for the human connectome project. Neuroimage 80,
105–124 (2013)
12. Behzadi, Y., Restom, K., Liau, J., Liu, T.T.: A component based noise correction
method (CompCor) for BOLD and perfusion based fMRI. Neuroimage 37(1), 90–
101 (2007)
13. Shirer, W., Ryali, S., Rykhlevskaia, E., Menon, V., Greicius, M.: Decoding subject-
driven cognitive states with whole-brain connectivity patterns. Cereb. Cortex
22(1), 158–165 (2012)
14. Desikan, R.S., Ségonne, F., Fischl, B., Quinn, B.T., Dickerson, B.C., Blacker, D.,
Buckner, R.L., Dale, A.M., Maguire, R.P., Hyman, B.T., et al.: An automated
labeling system for subdividing the human cerebral cortex on MRI scans into gyral
based regions of interest. Neuroimage 31(3), 968–980 (2006)
15. Skudlarski, P., Jagannathan, K., Calhoun, V.D., Hampson, M., Skudlarska, B.A.,
Pearlson, G.: Measuring brain connectivity: diffusion tensor imaging validates rest-
ing state temporal correlations. Neuroimage 43(3), 554–561 (2008)
16. Munkres, J.: Algorithms for the assignment and transportation problems. J. Soc.
Ind. Appl. Math. 5(1), 32–38 (1957)
Effective Brain Connectivity Through
a Constrained Autoregressive Model
1 Introduction
2 Method
The aim of the proposed method is to reinforce the association between structural brain connectivity and functional brain activation. To this end, we use a multivariate autoregressive (MAR) model, modified so that the estimation of the temporal brain activation is biased by the structural connectivity.
where (·)^T denotes the transpose. To introduce the structural bias into the model, the parameter fitting is constrained by the structural information in the update rule:
Anew = (Aold + η∇E) ⊙ B, (3)
In this manner, the absence of connections between certain regions in the initial matrix Ainit is reinforced at each iteration: connections originally set to zero are kept at zero even when the gradient descent introduces non-null values in them. Thus, B and Ainit encode the prior structural connectivity that reinforces the relationship between functional and structural data. During the gradient descent, some values of the matrix Anew may become negative, which would be meaningless in terms of causality. Therefore, at each iteration those values are set to zero to correct the descent. Setting negative values in Anew to zero at each iteration is a common way to enforce non-negativity during learning. Clearly, this has an effect on accuracy, as the solution is sub-optimal with respect to an unconstrained fitting. On the other hand, the need for non-negative coefficients is motivated by the fact that negative causalities cannot be interpreted.
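A sketch of the constrained fit, assuming the error E of Eq. (1) (not reproduced in this excerpt) is the squared one-step prediction error:

```python
import numpy as np

def cmar_fit(X, A_init, eta=1e-3, n_iter=500):
    """Constrained first-order MAR fit (sketch of the update in Eq. (3)).

    X: T x n time series; A_init: n x n structural prior. B is the binary
    mask of non-zero entries of A_init; each gradient step is masked by B
    (Hadamard product), and negative coefficients are projected to zero.
    The error form is an assumption: 0.5 * ||x_{t+1} - A x_t||^2 summed
    over time.
    """
    X = np.asarray(X, dtype=float)
    B = (np.asarray(A_init) != 0).astype(float)
    A = np.asarray(A_init, dtype=float).copy()
    X_past, X_next = X[:-1], X[1:]
    for _ in range(n_iter):
        resid = X_next - X_past @ A.T     # one-step prediction residuals
        grad = resid.T @ X_past           # descent direction for the squared error
        A = (A + eta * grad) * B          # Eq. (3): masked gradient step
        A[A < 0] = 0.0                    # project negative causalities to zero
    return A
```

The mask B guarantees that structurally absent connections stay exactly zero throughout, while the projection step enforces the non-negativity discussed above.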
The approach can be easily generalized to higher-order MAR models, where a different matrix Ai has to be optimized for each order. Choosing the right model order is a trade-off between optimizing the variance and the model complexity. In our experiments, to allow a direct validation of the method, we limited the model to the first order.
Fig. 1. Example of adjacency matrix for one subject plotted in logscale: (a) initial
matrix obtained from the tractography; (b) effective matrix obtained with the proposed
autoregressive model. The convergence process is shown in subfigure (c) as a decrease of the error defined in Eq. (1), where each colored line is a different subject.
model filtering. The figures highlight that some structural connections which are not “used” by the resting-state functional data are canceled out. This clearly has an effect on the subsequent clustering, which is shown in Fig. 2. Indeed, by analyzing the group-wise eigenvalues resulting from the joint Laplacian diagonalization, a spectral gap was noted at the 4th and 8th eigenvalues for both the structural and effective connectivity matrices, in agreement with previous studies on other datasets [9]. The value k = 8 was chosen for clustering the brain, as it better explains the known brain communities. The resulting clusterings of the brain regions based on the structural connectome (Fig. 2(a)) and on the effective connectome (Fig. 2(b)) are slightly different while preserving the overall organization. Regarding convergence, the gradient descent finds a sub-optimal solution by definition. The zeroing step makes convergence more cumbersome but does not prevent reaching a minimum, as shown in Fig. 1(c).
Fig. 2. Axial view of joint spectral clustering using k = 8 on (a) the original struc-
tural joint eigenspace, and (b) on the joint eigenspace given by effective connectivity
matrices.
146 A. Crimi et al.
Fig. 3. (a) Reconstruction error of CMAR model after converging to the effective
connectivity (green circles) or according to block-wise MAR based on the structural
communities (red squares) and the effective communities (blue crosses). The lower the
better. (b) Functional segregation of clusters using the effective communities (green
dots) or structural communities (black stars). The higher the better.
We also devised an analysis to assess whether the clusters obtained from the autoregressive-filtered data are more meaningful in relation to the fMRI time-series than the clusters obtained from the structural information. We carried out a block-wise definition of the effective connectivity matrices, where one block at a time, defined by the brain regions belonging to a cluster, is used in a CMAR model involving only the corresponding fMRI series. Then, the reconstruction errors of the fitted CMAR models were summed over all clusters and compared with each other. The underlying intuition is that partitioning the brain using effective connectivity information would remove those structural connections which are also meaningless from a functional perspective, at least in the analyzed experimental data.
The reconstruction error per subject in Fig. 3(a) shows, as expected, that the lowest error is obtained by considering the whole network in the CMAR computation. However, when removing some connections according to the clustering results, the communities determined from the effective connectivity matrix prove more self-explanatory in terms of functional activity than the communities obtained from the structural connectivity only. Similar evidence is obtained
when analyzing the cluster functional separation (CFS), defined as the average
ratio between the intra- and inter-cluster cross-correlation as follows:
CFS = (1/k) Σ_{s=1}^{k} [ Σ_{i&lt;j ∈ C_s} w_ij / ( Σ_{i&lt;j ∈ C_s} w_ij + Σ_{i ∈ C_s} Σ_{j ∈ C_t ≠ C_s} w_ij ) ]    (4)
where wij is the functional cross-correlation of the time-series for nodes i and j.
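Eq. (4) translates directly into code:

```python
import numpy as np

def cluster_functional_separation(W, labels):
    """Cluster functional separation (CFS), Eq. (4).

    W: n x n functional cross-correlation matrix; labels: cluster id per
    node. For each cluster, the intra-cluster weight is divided by the
    intra- plus inter-cluster weight, and the ratios are averaged over
    the k clusters. Higher values mean better functional segregation.
    """
    W, labels = np.asarray(W, dtype=float), np.asarray(labels)
    clusters = np.unique(labels)
    total = 0.0
    for c in clusters:
        inside = labels == c
        # sum w_ij over unordered pairs i < j inside the cluster
        intra = W[np.ix_(inside, inside)][np.triu_indices(inside.sum(), k=1)].sum()
        # sum w_ij over pairs crossing the cluster boundary
        inter = W[np.ix_(inside, ~inside)].sum()
        total += intra / (intra + inter)
    return total / len(clusters)
```

A perfectly block-structured W (no cross-cluster correlation) gives CFS = 1, while a partition whose intra- and inter-cluster correlations are equal gives 0.5.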
This index was computed for both the structural and the effective clustering results. Figure 3(b) shows that CFS with clusters determined using our CMAR approach is significantly higher than with the structural clusters (p &lt; 0.001), demonstrating that the effective clusters are also underpinned by the functional connectivity, although experiments on larger datasets are required.
5 Conclusions
The effective connectivity inferred by the proposed CMAR model highlights a
different brain architecture underpinned by both structural and functional con-
nectivity. Thanks to this, the method can lead to new insights into understanding
brain effective connections in healthy and pathological subjects.
References
1. Chen, H., et al.: Optimization of large-scale mouse brain connectome via joint
evaluation of DTI and neuron tracing data. NeuroImage 115, 202–213 (2015)
2. Deligianni, F., et al.: A framework for inter-subject prediction of functional con-
nectivity from structural networks. IEEE TMI 32(12), 2200–2214 (2013)
3. Dodero, L., Gozzi, A., Liska, A., Murino, V., Sona, D.: Group-wise functional
community detection through joint Laplacian diagonalization. In: Golland, P.,
Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol.
8674, pp. 708–715. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10470-6 88
4. Fox, M.D., et al.: The human brain is intrinsically organized into dynamic, anti-
correlated functional networks. PNAS 102(27), 9673–9678 (2005)
5. Friston, K.J.: Functional and effective connectivity: a review. Brain Connect. 1(1),
13–36 (2011)
6. Garyfallidis, E., et al.: Dipy, a library for the analysis of diffusion MRI data. Front.
Neuroinformatics 8, 8 (2014)
7. Goebel, R., et al.: Investigating directed cortical interactions in time-resolved fMRI
data using vector autoregressive modeling and Granger causality mapping. Magn.
Reson. Imaging 21(10), 1251–1261 (2003)
8. Granger, C.: Investigating causal relations by econometric models and cross-
spectral methods. Econometrica 37(3), 424–438 (1969)
9. Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C.J., et al.: Mapping
the structural core of human cerebral cortex. PLoS Biol. 6(7), e159 (2008)
10. Hinne, M., Ambrogioni, L., Janssen, R.J., Heskes, T., et al.: Structurally-informed
Bayesian functional connectivity analysis. NeuroImage 86, 294–305 (2014)
11. Honey, C., et al.: Predicting human resting-state functional connectivity from
structural connectivity. PNAS 106(6), 2035–2040 (2009)
12. Jirsa, V., et al.: Towards the virtual brain: network modeling of the intact and the
damaged brain. Arch. Ital. Biol. 148(3), 189–205 (2010)
13. Li, X., Li, K., Guo, L., Lim, C., Liu, T.: Fiber-centered granger causality analysis.
In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6892,
pp. 251–259. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23629-7 31
14. Liao, W., et al.: Small-world directed networks in the human brain: multivariate
Granger causality analysis of resting-state fMRI. NeuroImage 54, 2683–2694 (2011)
15. Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed
networks: a survey. Phys. Rep. 533(4), 95–142 (2013)
16. Nooner, K., et al.: The NKI-Rockland sample: a model for accelerating the pace
of discovery science in psychiatry. Front. Neurosci. 6, 152 (2012)
17. Power, J.D., et al.: Spurious but systematic correlations in functional connectivity
MRI networks arise from subject motion. NeuroImage 59(3), 2142–2154 (2012)
18. Saad, Z.S., et al.: Correcting brain-wide correlation differences in resting-state
fMRI. Brain Connect. 3(4), 339–352 (2013)
19. Vincent, J., et al.: Intrinsic functional architecture in the anaesthetized monkey
brain. Nature 447(7140), 83–86 (2007)
GraMPa: Graph-Based Multi-modal Parcellation
of the Cortex Using Fusion Moves
1 Introduction
S. Parisot—The research leading to these results has received funding from the
European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant
Agreement No. 319456.
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 148–156, 2016.
DOI: 10.1007/978-3-319-46720-7 18
using myelin [5], diffusion MRI (dMRI) and tractography [10,12] and functional
MRI (fMRI) data [2,3]. Yet, these modalities suffer from important drawbacks
that cannot be addressed in a mono-modal setting. dMRI is prone to false neg-
atives and biased with respect to the location of fibre terminations, fMRI can
be noisy and prone to false positives, while myelin lacks information outside the
motor area and visual cortex.
Exploring parcellations driven by several modalities could provide more
robust and accurate cortical delineations. For instance, strong similarities have
been observed between myelin maps and resting-state fMRI gradients [5], while
functional and structural connectivity are intrinsically linked. Few methods have
attempted to combine different modalities. The majority of efforts have aimed to
construct a more robust fMRI connectivity matrix informed from tractography,
for instance by eliminating functional connections that do not have a struc-
tural support [11]. This kind of approach however assumes a strong reliability of
dMRI data and a global agreement between structural and functional connec-
tivity. Markov Random Field (MRF) models have been applied successfully to
dMRI and fMRI driven cortical parcellation tasks [6,13]. Their main advantage
is their versatility, in the sense that no restriction is made on the data term
driving the parcellation scheme. As a result, the same framework can be used
for parcellation tasks using different kinds of input data.
In this paper, we exploit this idea and extend the mono-modal MRF mod-
els to the multi-modal setting. We propose an iterative approach where each
iteration computes a set of parcellations driven by a single modality. These par-
cellations are subsequently merged based on each modality’s local reliability
using fusion moves [9]. The merged parcellation initialises the next iteration,
forcing the different modalities to converge towards a set of mutually informed
parcellations. The method was tested on the Human Connectome Project (HCP)
database using myelin maps, and fMRI and dMRI data. Focusing on fMRI par-
cellation, our experiments show that the multi-modal setting yields parcels that
are more reproducible and more representative of the underlying connectivity.
2 Methods
Fig. 1. Overview of the proposed iterative method. Each iteration updates an initial
parcellation into a set of modality specific parcellations using an MRF model. The
parcellations are then merged, based on the modalities' relative influences, into a multi-modal parcellation, which initialises the next iteration.
information all around the cortex. However, it suffers from a low SNR which can
significantly impact the obtained parcels and their reproducibility. Introducing
multi-modal information in the parcellation scheme could be a way of address-
ing this issue. We consider combining rs-fMRI, dMRI and myelin maps due to
their expected similarities. We define the merging unary costs based on how
informative the modalities are, and from prior knowledge of their weaknesses:
U_v^m(l_v) = min_{mod ∈ [1,N]} ( 1 − α^mod δ(l_v, l_v^mod) ), where α^mod ∈ [0, 1]^V are costs that describe the local reliability of all modalities, δ(·, ·) is the Kronecker delta function, and l^mod are the labellings obtained from the mono-modal parcellations.
All αmod costs are rescaled between 0 and 1.
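For a single vertex, the merging cost above can be sketched as follows (the helper name is hypothetical, and the per-vertex α values are assumed precomputed):

```python
import numpy as np

def merging_unary_cost(lv_candidates, mono_labels, alphas):
    """Merging unary cost U_v(l_v) for one vertex (sketch).

    mono_labels: the label assigned to vertex v by each of the N
    mono-modal parcellations; alphas: each modality's local reliability
    in [0, 1] at vertex v. For each candidate label, the cost is the
    minimum over modalities of 1 - alpha_mod * delta(l_v, l_v^mod): a
    candidate that agrees with a highly reliable modality is cheap.
    """
    mono_labels, alphas = np.asarray(mono_labels), np.asarray(alphas)
    costs = {}
    for l in lv_candidates:
        agree = (mono_labels == l).astype(float)     # Kronecker delta
        costs[l] = float(np.min(1.0 - alphas * agree))
    return costs

# e.g. three modalities vote [2, 2, 5] with reliabilities [0.5, 0.9, 0.2]:
# label 2 costs 1 - 0.9, label 5 costs 1 - 0.2
```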
Because of rs-fMRI’s low SNR, the joint parcellation should be influenced
by the other modalities when they are reliable. We therefore assign a uniform
reliability αfMRI = 0.5 to rs-fMRI across the whole cortex. Myelin maps should
influence the merged parcellation in regions where strong variations of myeli-
nation are observed. We therefore define the myelin cost as the gradient of the
pre-smoothed myelin maps (see Fig. 3b). Finally, dMRI tractography suffers from
a gyral bias: tractography streamlines tend to terminate preferentially in gyri
[17]. This bias influences the boundaries of dMRI driven parcellations that tend
to align with cortical folding. To evaluate which vertices are impacted by this
bias, we compute for each vertex v the ratio of the number of fibres that ter-
minate at v over the number of connections obtained by sending streamlines
from v. As shown in Fig. 3a, this measurement supports the gyral bias theory
as the resulting map agrees with cortical folding patterns. Using this measure
as a unary cost prevents the vertices affected by the bias from influencing the joint
parcellation. In this setting, dMRI will have little influence on parcel boundaries
and essentially act as a smoothing prior, indicating which vertices should be in
the same parcel. As a result, we expect the converged joint parcellation to be
similar to the rs-fMRI parcellation.
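The dMRI reliability map described above is a simple per-vertex ratio; a sketch (function name hypothetical, and the exact normalization used in the paper may differ):

```python
import numpy as np

def gyral_bias_map(terminations, seeds_sent):
    """Per-vertex dMRI cost flagging the gyral bias.

    terminations[v]: number of streamlines terminating at vertex v;
    seeds_sent[v]: number of streamlines seeded from v (5000 per vertex
    in the paper's setup). High ratios flag gyral vertices whose
    boundaries should not drive the joint parcellation.
    """
    t = np.asarray(terminations, dtype=float)
    s = np.asarray(seeds_sent, dtype=float)
    # guard against vertices with no seeded streamlines
    return np.divide(t, s, out=np.zeros_like(t), where=s > 0)
```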
Fig. 2. Quantitative evaluation measures. From left to right in each figure, we compare
the mono-modal parcellations to the merged and the multi-modality guided rs-fMRI
parcellations. Lower BIC values (d) are better. Paired t-test results are shown as non-
significant (n.s), p < 0.05 (*) and p < 0.001 (**).
3 Results
Evaluation of cortical parcellations is challenging due to the absence of ground
truth. Our proposed evaluation has two main objectives: evaluate (i) whether
multi-modality increases the robustness of the parcellation method, (ii) how well
the parcellations reflect the underlying connectivity. Since our application is tai-
lored to construct more reliable rs-fMRI parcellations, we focus our evaluation
on this modality. We evaluate the impact of multi-modal information by com-
paring the mono-modal rs-fMRI driven parcellation to the joint and individual
rs-fMRI parcellations obtained using our Graph-based Multi-modal Parcella-
tion (GraMPa) method. We tested GraMPa on 50 randomly selected subjects
(left hemisphere) of the HCP database (S500 release) and used the HCP’s pre-
processed fMRI and dMRI data, and myelin maps. dMRI tractography con-
nectivity profiles are obtained using FSL’s bedpostX and probtrackX [1]. 5000
streamlines are sampled from each mesh vertex. We perform rs-fMRI driven
parcellation using timeseries from a 30 min acquisition. Evaluation is performed
on a second independent 30 min acquisition to test the method’s robustness.
The MRF’s smoothness parameter β is set heuristically to 0.3. Modality specific
MRFs were optimised using fastPD [8] due to its speed, while fusion moves were
optimised using QPBO [14] because of asymmetric pairwise costs. Several MRF
optimisation algorithms were tested with very little impact on the obtained par-
cellations. We tested the reproducibility with respect to initialisation using 10
random initialisations constructed using Poisson Disc Sampling. Parcellations
were computed for four different resolutions (50, 100, 150 and 200 labels). All
measures are computed for all initialisations and subjects. Reproducibility is
evaluated using the Adjusted Rand Index (ARI) [4] and the modified Dice Score
Coefficient (DSC) [13] that allows merging very similar parcels. ARI is a mea-
sure from probability theory that assesses the statistical dependence between
Fig. 3. Visual results for randomly selected subjects. (a) dMRI and (b) Myelin relia-
bility maps. (c, d) Overlap between the boundaries of the multi-modal parcellation and
(c) myelin maps and (d) Brodmann areas. (e–g) Comparative overlap between rs-fMRI
parcellations boundaries and t-fMRI activation maps. Top row: mono-modal parcel-
lations, bottom row: GraMPa rs-fMRI parcellations. (e, f) Motor task, (g) Language
task. Coloured arrows indicate striking examples.
two clustering solutions. It takes values between −1 and 1, where 1 means the
clusterings are identical. Figures 2a and b show comparative boxplots of the two
measures between GraMPa and the mono-modal approach. We can see that most
configurations are more reproducible. Results are significant (p < 0.001) for the
two largest resolutions. The lower performance for 50 parcels could indicate that
it is difficult to obtain large smooth parcels while agreeing with all reliability
maps. Our parcellations’ agreement with the underlying structure is evaluated
by (i) computing the average functional coherence (FC) [6] and (ii) evaluating
the agreement with task fMRI activation maps (obtained using FSL’s standard
tools) using the Bayesian Information Criterion (BIC) [16]. FC evaluates the
average correlation between a parcel’s average timeseries and the timeseries of
all vertices in the same parcel. In order to avoid introducing a size bias, very
small parcels are excluded from the computation. For each parcel, BIC evaluates
how well it is possible to fit a probabilistic model of the concatenated task acti-
vation maps of all 50 subjects. As shown in Fig. 2c and d, GraMPa yields better
results for both measures. Results are significant (p < 0.001) for most configu-
rations. Finally, Fig. 3c–g visually compares parcel boundaries with Brodmann
and myelin maps and the average task activation maps over all 50 subjects. We
can see that GraMPa parcellations have a stronger agreement with task
activation boundaries.
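As a concrete illustration of the ARI used in the evaluation above, a pair-counting implementation might look like the following (a minimal numpy sketch; the short label vectors are hypothetical stand-ins for two parcellations of the same vertices):

```python
import numpy as np

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI between two clusterings of the same items."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    n = a.size
    # contingency table between the two label sets
    ua, ub = np.unique(a), np.unique(b)
    C = np.array([[np.sum((a == i) & (b == j)) for j in ub] for i in ua])
    comb2 = lambda m: m * (m - 1) / 2.0          # "choose 2"
    sum_ij = comb2(C).sum()                      # co-clustered pairs
    sum_a = comb2(C.sum(axis=1)).sum()
    sum_b = comb2(C.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(n)          # chance agreement
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)

# Identical parcellations (up to label permutation) give ARI = 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

A value of 1 indicates identical clusterings, as noted in the text; label permutations do not affect the score.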
4 Discussion
In this paper, we proposed a general graph-based framework which provides
modality specific coherent parcellations, as well as a multi-modal parcellation
that merges modalities based on their reliabilities. We propose an application
to the construction of more reliable rs-fMRI parcellations through the introduc-
tion of multi-modal information from structural connectivity and myelin maps.
Our experiments show that GraMPa’s parcellations are more robust and more
representative of the underlying structure. One of the main advantages of the
proposed framework is its flexibility. It can be tailored for a specific set of modal-
ities and issues associated with a particular acquisition process. It is also easy to
integrate prior knowledge both in designing the modalities’ reliability maps and
through the introduction of known reliable boundaries defined as a new locally
reliable modality. Another possibility is to design a fully data-driven fusion move
step. Local segmentation uncertainties could be estimated for each modality after
each MRF optimisation using min-marginal energies [7].
Furthermore, our model alleviates the need to match the different modali-
ties’ unary costs and does not limit the number of modalities considered. The
method could be extended to other multi-modal segmentation tasks. It could
prove particularly well-suited to group-wise parcellation, where each subject
would be assimilated to a modality and the fusion would be driven by group
consistency measures. It would have the potential of handling very large groups,
as subjects do not have to be considered simultaneously. The method could
similarly be used to merge MRF parcellations obtained from a large set of
initialisations. Many challenges remain associated with the multi-modal parcellation
GraMPa: Graph-Based Multi-modal Parcellation of the Cortex 155
task. fMRI and dMRI are currently the best way of measuring in vivo connectiv-
ity, but remain very indirect measurements and can be unreliable. In addition,
multi-modal analyses would benefit from a deeper understanding of the modalities’
interactions and similarities. Finally, using our parcellations in a clinical context
requires the development of robust methods for analysing the obtained connec-
tivity networks, while parcellation of diseased subjects may be associated with
new challenges.
References
1. Behrens, T., Berg, H.J., Jbabdi, S., Rushworth, M., Woolrich, M.: Probabilistic
diffusion tractography with multiple fibre orientations: what can we gain? Neu-
roImage 34(1), 144–155 (2007)
2. Blumensath, T., Jbabdi, S., Glasser, M.F., Van Essen, D.C., Ugurbil, K.,
Behrens, T.E., Smith, S.M.: Spatially constrained hierarchical parcellation of the
brain with resting-state fMRI. NeuroImage 76, 313–324 (2013)
3. Craddock, R.C., James, G.A., Holtzheimer, P.E., Hu, X.P., Mayberg, H.S.: A whole
brain fMRI atlas generated via spatially constrained spectral clustering. Hum.
Brain Mapp. 33, 1914–1928 (2012)
4. Eickhoff, S.B., Thirion, B., Varoquaux, G., Bzdok, D.: Connectivity-based parcel-
lation: critique and implications. Hum. Brain Mapp. 36(12), 4771–4792 (2015)
5. Glasser, M.F., Van Essen, D.C.: Mapping human cortical areas in vivo based
on myelin content as revealed by T1-and T2-weighted MRI. J. Neurosci. 31(32),
11597–11616 (2011)
6. Honnorat, N., Eavani, H., Satterthwaite, T., Gur, R., Gur, R., Davatzikos, C.:
GraSP: geodesic graph-based segmentation with shape priors for the functional
parcellation of the cortex. NeuroImage 106, 207–221 (2015)
7. Kohli, P., Torr, P.H.: Measuring uncertainty in graph cut solutions. Comput. Vis.
Image Underst. 112(1), 30–38 (2008)
8. Komodakis, N., Tziritas, G.: Approximate labeling via graph cuts based on linear
programming. IEEE Trans. Pattern Anal. Mach. Intell. 29(8), 1436–1453 (2007)
9. Lempitsky, V., Rother, C., Roth, S., Blake, A.: Fusion moves for Markov random
field optimization. IEEE Trans. PAMI 32(8), 1392–1405 (2010)
10. Moreno-Dominguez, D., Anwander, A., Knösche, T.R.: A hierarchical method for
whole-brain connectivity-based parcellation. Hum. Brain Mapp. 35, 5000–5025
(2014)
11. Ng, B., Varoquaux, G., Poline, J.B., Thirion, B.: Implications of inconsistencies
between fMRI and dMRI on multimodal connectivity estimation. In: Mori, K.,
Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol.
8151, pp. 652–659. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40760-4_81
12. Parisot, S., Arslan, S., Passerat-Palmbach, J., Wells, W.M., Rueckert, D.:
Tractography-driven groupwise multi-scale parcellation of the cortex. In: Ourselin,
S., Alexander, D.C., Westin, C.-F., Cardoso, M.J. (eds.) IPMI 2015. LNCS, vol.
9123, pp. 600–612. Springer, Heidelberg (2015). doi:10.1007/978-3-319-19992-4_47
13. Parisot, S., Rajchl, M., Passerat-Palmbach, J., Rueckert, D.: A continuous flow-
maximisation approach to connectivity-driven cortical parcellation. In: Navab, N.,
Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351,
pp. 165–172. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24574-4_20
14. Rother, C., Kolmogorov, V., Lempitsky, V., Szummer, M.: Optimizing binary
MRFs via extended roof duality. In: CVPR, pp. 1–8. IEEE (2007)
15. Sporns, O.: The human connectome: a complex network. Ann. N. Y. Acad. Sci.
1224, 109–125 (2011)
16. Thirion, B., Varoquaux, G., Dohmatob, E., Poline, J.B.: Which fMRI clustering
gives good brain parcellations? Front. Neurosci. 8(167), 13 (2014)
17. Van Essen, D.C., Jbabdi, S., Sotiropoulos, S.N., Chen, C., et al.: Mapping connec-
tions in humans and non-human primates: aspirations and challenges for diffusion
imaging. In: Diffusion MRI, pp. 337–358 (2013)
A Continuous Model of Cortical Connectivity
1 Introduction
In recent years, the study of structural and functional brain connectivity has
expanded rapidly. Following the rise of diffusion and functional MRI, connec-
tomics has unlocked a wealth of knowledge to be explored. Almost synonymous
with the connectome is the network-theory based representation of the brain.
In much of the recent literature, the quantitative analysis of connectomes has
focused on region-to-region connectivity. This paradigm equates physical brain
regions with nodes in a graph, and uses observed structural measurements or
functional correlations as a proxy for edge strengths between nodes.
Critical to this representation of connectivity is the delineation of brain
regions, the parcellation. Multiple studies have shown that the choice of parcel-
lation influences the graph statistics of both structural and functional networks
[15,17,18]. It remains an open question which of the proposed parcellations is
the optimal representation, or even if such a parcellation exists [14].
It is thus useful to construct a more general framework for cortical connectiv-
ity, one in which any particular parcellation of the cortex may be expressed and
its connectivity matrix derived, and one in which the variability of connectivity
measures can be modeled and assessed statistically. It is also important that
this framework allow comparisons between parcellations, and representations in
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 157–165, 2016.
DOI: 10.1007/978-3-319-46720-7_19
158 D. Moyer et al.
The key theoretical component of our work is the use of point process theory
to describe estimated cortical tract projections. A point process is a random
process where any realization consists of a collection of discrete points on a
measurable space. The most basic of these processes is the Poisson process, in
which events occur independently at a specific asymptotic intensity (rate) λ over
the chosen domain [12]. λ completely characterizes each particular process, and
is often defined as a function λ : Domain → R+ , which allows the process to
vary in intensity by location. The expected count of any sub-region (subset) of
the domain is its total intensity, the integral of λ over the sub-region. In this
paper, our domain is the connectivity space of the cortex, the set of all pairs of
points on the surface, and the events are estimated tract intersections with the
cortical surface.
¹ It is critical to distinguish between white matter fibers (fascicles) and observed
“tracts.” Here, “tracts” denotes the 3D curves recovered from diffusion-weighted
imaging via tractography algorithms.
Assuming that each event is independent of all other events except for its
symmetric event (i.e., each tract is recovered independently), we model
connectivity as an intensity function λ : Ω × Ω → R⁺, such that for any regions
E₁, E₂ ⊂ Ω, the number of events is Poisson distributed with parameter

$$C(E_1, E_2) = \int_{E_1} \int_{E_2} \lambda(x, y)\, dx\, dy. \qquad (1)$$
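To make Eq. (1) concrete, here is a minimal numerical sketch (the intensity function and the two sub-regions are hypothetical; an interval stands in for Ω so the double integral can be Monte Carlo estimated):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical intensity over [0,1] x [0,1] (a stand-in for Omega x Omega).
def lam(x, y):
    return 10.0 * np.exp(-(x - y) ** 2 / 0.1)

# Monte Carlo estimate of C(E1, E2) for E1 = [0, 0.5], E2 = [0.5, 1].
x = rng.uniform(0.0, 0.5, 200_000)
y = rng.uniform(0.5, 1.0, 200_000)
C = 0.25 * lam(x, y).mean()        # 0.25 = area of E1 x E2

# The endpoint count observed between E1 and E2 is then Poisson(C).
count = rng.poisson(C)
```

The total intensity C plays the role of the expected tract-endpoint count between the two regions.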
Here, $P_h^0$ is the degree-$h$ associated Legendre polynomial of order 0. Note
that the non-zero order polynomials have coefficient zero due to the radial
symmetry of the spherical heat kernel [1]. However, since we are estimating
a function on Ω × Ω, we use the product of two heat kernels as our KDE
kernel κ. For any two points p and q, the kernel value associated with an
endpoint pair (x, y) is $\kappa((p,q) \mid (x,y)) = K_\sigma(x,p)\,K_\sigma(y,q)$. It is easy to show that
$\int_{\Omega \times \Omega} K_\sigma(x,p)\, K_\sigma(y,q)\, dp\, dq = 1$.
The spherical heat kernel has a single shape parameter σ which corre-
sponds to its bandwidth. While in general tuning this parameter requires the
re-estimation of λ̂ at every iteration, by rewriting our kernel we can memoize
$$\kappa((p,q) \mid D) = \sum_{h} \sum_{k} \underbrace{\frac{2h+1}{4\pi}\,\frac{2k+1}{4\pi}\,\exp\{-\sigma(h^2 + h + k^2 + k)\}}_{\text{independent of } D\text{, evaluated every iteration}} \times \underbrace{\sum_{(x_i, y_i) \in D} P_h^0(x_i \cdot p)\, P_k^0(y_i \cdot q)}_{\text{independent of } \sigma\text{, evaluated once}}$$
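In code, this factorization might be exploited as follows (a hedged sketch with synthetic endpoint data; the truncation degree H and sample sizes are arbitrary choices, and numpy's Legendre utilities stand in for the order-0 associated Legendre polynomials):

```python
import numpy as np
from numpy.polynomial.legendre import legval

def legendre(h, t):
    """P_h(t): the degree-h Legendre polynomial (= order-0 associated Legendre)."""
    c = np.zeros(h + 1)
    c[h] = 1.0
    return legval(t, c)

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
X, Y = unit(rng.normal(size=(50, 3))), unit(rng.normal(size=(50, 3)))  # endpoint pairs D
p, q = unit(rng.normal(size=3)), unit(rng.normal(size=3))              # query pair
H = 12  # truncation degree of the expansion

# sigma-independent data sums, evaluated ONCE per query point (p, q):
F = np.stack([legendre(h, X @ p) for h in range(H)], axis=1)  # (|D|, H)
G = np.stack([legendre(k, Y @ q) for k in range(H)], axis=1)
S = F.T @ G   # S[h, k] = sum_i P_h(x_i . p) * P_k(y_i . q)

def kappa(sigma):
    """D-independent weights, re-evaluated cheaply for each candidate sigma."""
    h = np.arange(H)
    w = (2 * h + 1) / (4 * np.pi) * np.exp(-sigma * (h ** 2 + h))
    return float(w @ S @ w)
```

The data sums S are computed once, after which sweeping a sequence of bandwidths σ costs only a small weighted contraction per value.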
Thus, evaluations of the kernel at any point (p, q) can be done quickly for
sequences of values of σ. We are then left with the choice of loss function. Denoting
the true intensity function λ, the estimated intensity λ̂, and the leave-one-out
estimate λ̂ᵢ (leaving out observation i), the Integrated Squared Error (ISE) is
defined as:

$$\mathrm{ISE}(\sigma \mid D) = \int_{\Omega\times\Omega} \big(\hat\lambda(x,y \mid \sigma) - \lambda(x,y)\big)^2\, dx\, dy \approx \int_{\Omega\times\Omega} \hat\lambda(x,y \mid \sigma)^2\, dx\, dy - \frac{2}{|D|} \sum_{(x_i,y_i)\in D} \hat\lambda_i(x_i, y_i) + \text{const.}$$
Hall and Marron [9] suggest tuning bandwidth parameters using ISE. In practice,
we find that replacing each leave-one-out estimate with its logarithm log λ̂i (xi , yi )
yields more consistent and stable results.
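A one-dimensional analogue of this leave-one-out bandwidth selection (using the log variant mentioned above) can be sketched as follows; a Gaussian kernel and synthetic data stand in for the spherical heat kernel and endpoint pairs:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=200)   # synthetic stand-in for the observations

def loo_log_score(sigma):
    """Mean log of the leave-one-out density estimates at the observations."""
    d = data[:, None] - data[None, :]
    K = np.exp(-d ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    np.fill_diagonal(K, 0.0)                  # leave each point out of its own estimate
    loo = K.sum(axis=1) / (len(data) - 1)
    return np.log(loo).mean()

sigmas = np.linspace(0.05, 1.0, 20)
best_sigma = sigmas[np.argmax([loo_log_score(s) for s in sigmas])]
```

Maximizing the mean log leave-one-out estimate selects the bandwidth; the grid of candidate values here is illustrative.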
Here, the independence assumption plays a critical role, allowing pairs of regions
to be evaluated separately. Unfortunately this is biased toward parcellations with
more, smaller regions, as the Poisson distribution ties its variance and mean to a
single parameter. A popular likelihood-based option that somewhat counterbalances
this bias is Akaike’s Information Criterion (AIC):
$$\mathrm{AIC}(P) = -2 \log L(P) + \frac{|P|}{2} \log |D|. \qquad (4)$$
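A sketch of this criterion for a Poisson model of region-pair counts (the counts and rates here are hypothetical; |P| is the number of parcels and |D| the number of observed endpoints):

```python
import numpy as np
from math import lgamma

def poisson_loglik(counts, mu):
    """Sum of Poisson log-likelihoods: counts[i] ~ Poisson(mu[i])."""
    counts = np.asarray(counts, dtype=float)
    mu = np.asarray(mu, dtype=float)
    lfact = np.array([lgamma(c + 1.0) for c in counts])
    return float(np.sum(counts * np.log(mu) - mu - lfact))

def aic(counts, mu, n_parcels, n_events):
    # AIC(P) = -2 log L(P) + (|P| / 2) log |D|, as in Eq. (4)
    return -2.0 * poisson_loglik(counts, mu) + (n_parcels / 2.0) * np.log(n_events)
```

A better-fitting rate vector yields a higher likelihood and hence a lower AIC for the same parcellation size.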
Table 1. This table shows mean ICC scores for each connectome generation method.
The count method (the standard approach) defines edge strength by the fiber endpoint
count. The integrated intensity method is our proposed method; in general it returns a
dense matrix. However, many of the values are extremely low, and so we include results
after thresholding the matrix, with and without elements that are zero for all subjects.
Highest ICC scores for each atlas are bolded.
entries in the adjacency matrices that should be zero but that are subject to a
small amount of noise (a few erroneous tracts) have very low ICC. Our method
in effect smooths tract endpoints into a density; endpoints near the region
boundaries are in effect shared with the adjacent regions. Thus, even without
thresholding we dampen noise effects as measured by ICC. With thresholding,
our method’s performance is further improved, handily beating the counting
method with respect to ICC score. It is important to note that for many graph
statistics, changing graph topology can greatly affect the measured value [18].
While it is important to have consistent non-zero measurements, the difference
between zero and small but non-zero in the graph context is also non-trivial. The
consistency of zero-valued measurements is thus very important in connectomics.
Table 2 suggests that all three measures, while clearly different, are consistent
in their selection at least with respect to these three parcellations. It is somewhat
surprising that the Destrieux atlas has quite low likelihood criteria, but this may
be due to the (quadratically) larger number of region pairs. Both likelihood based
Table 2. This table shows the means over all subjects of three measures of parcellation
“goodness”. The retest versions are the mean of the measure using the parcellation’s
regional connectivity matrix (or the count matrix) from one scan, and the estimated
intensity function from the other scan.
Fig. 2. A visualization of the marginal connectivity $M(x) = \int_{E_i} \hat\lambda(x,y)\, dy$ for the Left
Post-central Gyrus region of the DK atlas (Region 57). The region is shown in blue in
the inset. Red denotes regions with higher connectivity to the blue region.
retest statistics also choose the DK parcellation, while ISE chooses the Destrieux
parcellation by a small margin. It should be noted that these results must be
conditioned on the use of a probabilistic CSD tractography model. Different
models may lead to different intensity functions and resulting matrices. The
biases and merits of the different models and methods (e.g., gray matter dilation
for fiber counting vs. streamline projection) remain important open questions
(Figs. 1 and 2).
4 Conclusion
We have presented a general framework for structural brain connectivity. This
framework provides a representation for cortical connectivity that is independent
of the choice of regions, and thus may be used to compare the accuracy of
a given set of regions’ connectivity matrix. We provide one possible estimation
method for this representation, leveraging spherical harmonics for fast parameter
estimation. We have demonstrated this framework’s viability, as well as provided
a preliminary comparison of regions using several measures of accuracy.
The results presented here lead us to conjecture that our connectome estimates
are more reliable than standard fiber counting, though we stress
that a much larger study is required for strong conclusions to be made. Fur-
ther adaptations of our method are possible, such as using FA-weighted fiber
counting. Our future work will explore these options, conduct tests on larger
datasets, and investigate the relative differences between tracking methods and
parcellations more rigorously.
Acknowledgments. This work was supported by NIH Grant U54 EB020403, as well
as the Rose Hills Fellowship at the University of Southern California. The authors would
like to thank the reviewers as well as Greg Ver Steeg for multiple helpful conversations.
References
1. Chung, M.K.: Heat kernel smoothing on unit sphere. In: 3rd IEEE International
Symposium on Biomedical Imaging: Nano to Macro 2006, pp. 992–995. IEEE
(2006)
2. Desikan, R.S., et al.: An automated labeling system for subdividing the human
cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage
31(3), 968–980 (2006)
3. Diggle, P.: A kernel method for smoothing point process data. Appl. Stat. 34,
138–147 (1985)
4. Fischl, B.: FreeSurfer. NeuroImage 62(2), 774–781 (2012)
5. Fischl, B., et al.: High-resolution intersubject averaging and a coordinate system
for the cortical surface. Hum. Brain Mapp. 8(4), 272–284 (1999)
6. Fischl, B., et al.: Automatically parcellating the human cerebral cortex. Cereb.
Cortex 14(1), 11–22 (2004)
7. Garyfallidis, E., et al.: Dipy, a library for the analysis of diffusion MRI data. Front.
Neuroinform. 8(8), 1–17 (2014)
8. Gutman, B., Leonardo, C., Jahanshad, N., Hibar, D., Eschenburg, K., Nir, T.,
Villalon, J., Thompson, P.: Registering cortical surfaces based on whole-brain
structural connectivity and continuous connectivity analysis. In: Golland, P.,
Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol.
8675, pp. 161–168. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10443-0_21
9. Hall, P., Marron, J.S.: Extent to which least-squares cross-validation minimises
integrated square error in nonparametric density estimation. Probab. Theor. Relat.
Fields 74(4), 567–581 (1987)
10. Jahanshad, N., et al. (the Alzheimer’s Disease Neuroimaging Initiative): Genome-wide
scan of healthy human connectome discovers SPON1 gene variant influencing dementia
severity. Proc. Natl. Acad. Sci. USA 110(12), 4768–4773 (2013)
11. Klein, A., Tourville, J.: 101 labeled brain images and a consistent human
cortical labeling protocol. Front. Neurosci. 6, 171 (2012)
12. Moller, J., Waagepetersen, R.P.: Statistical Inference and Simulation for Spatial
Point Processes. CRC Press, Boca Raton (2003)
13. Portney, L.G., Watkins, M.P.: Statistical measures of reliability. Found. Clin. Res.:
Appl. Pract. 2, 557–586 (2000)
14. de Reus, M.A., Van den Heuvel, M.P.: The parcellation-based connectome: limita-
tions and extensions. NeuroImage 80, 397–404 (2013)
15. Satterthwaite, T.D., Davatzikos, C.: Towards an individualized delineation of func-
tional neuroanatomy. Neuron 87(3), 471–473 (2015)
16. Tournier, J.D., Yeh, C.H., Calamante, F., Cho, K.H., Connelly, A., Lin, C.P.:
Resolving crossing fibres using constrained spherical deconvolution: validation
using diffusion-weighted imaging phantom data. NeuroImage 42(2), 617–625
(2008)
1 Introduction
Substantial evidence suggests that many major psychiatric and neurological disorders
are associated with aberrations in the network structure of the brain [5, 7]. With the
availability of modern neuroimaging modalities such as diffusion tensor (DTI) and
functional (fMRI) imaging, there is currently an exciting potential for researchers to
identify connectivity-based biomarkers of disease states. Since brain networks are
known to exhibit complex interactions, multivariate pattern analysis (MVPA) methods
are particularly suitable here, as they aim to identify the site of the pathology by
examining the data as a whole, accounting for the correlations among the network
features.
2 Method
$$\min_{W,\, P \ge 0} \ \frac{1}{2} \| X - W(PX) \|_F^2. \qquad (1)$$
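A minimal projected-gradient sketch of this projective NMF objective (not the authors' ADM solver; the random data, step size, and iteration count are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(30, 40)))       # hypothetical non-negative data (features x subjects)
r = 5
W = 0.1 * np.abs(rng.normal(size=(30, r)))
P = 0.1 * np.abs(rng.normal(size=(r, 30)))

def loss(W, P):
    R = X - W @ (P @ X)
    return 0.5 * np.sum(R * R)

loss_init = loss(W, P)
eta = 1e-3
for _ in range(300):
    H = P @ X
    R = W @ H - X                 # residual
    gW = R @ H.T                  # grad of 0.5 * ||X - W H||_F^2 w.r.t. W
    gP = (W.T @ R) @ X.T          # chain rule through H = P X
    step = eta
    while True:                   # simple backtracking line search
        Wn = np.maximum(W - step * gW, 0.0)   # project onto the non-negative orthant
        Pn = np.maximum(P - step * gP, 0.0)
        if loss(Wn, Pn) <= loss(W, P) or step < 1e-12:
            break
        step *= 0.5
    W, P = Wn, Pn
```

Backtracking keeps the objective non-increasing; the ADM scheme described later in the paper replaces this with closed-form alternating updates.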
orthogonality implies that the bases representing the subnetworks are non-overlapping,
which enhances interpretability and eliminates redundancy.
The F2 term ensures smoothness of the low dimensional representation with respect
to the manifold structure encoded in the affinity matrix $S \in \mathbb{R}^{n \times n}$. Intuitively, this
regularizer preserves the intrinsic geometric structure in the data by encouraging
representations $Px_i$ and $Px_j$ to be close if $S_{i,j}$ is large, i.e., if subjects $i$ and $j$ are
similar under some notion of similarity. This regularizer can also be expressed in terms
of the trace operator: $F_2(P) = \mathrm{Tr}\big((PX)L(PX)^T\big)$, where $L \in \mathbb{R}^{n \times n}$ is the graph
Laplacian defined by $L = D - S$, and $D$ is a diagonal matrix with $D_{i,i} = \sum_{j=1}^{n} S_{i,j}$ for all $i$.
While the type of inter-subject relationship that can be encoded via the affinity matrix
$S$ is general, in this work we take advantage of the clinical scores used to evaluate
patients and create a “disease-severity graph” to capture the disease-induced variation
in the SCs. Specifically, we assign a higher value to $S_{i,j}$ if subjects $i$ and $j$ share
similar severity scores.
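The regularizer $F_2$ can be checked numerically against its equivalent pairwise form; a small numpy sketch (random affinity matrix, hypothetical embedding):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 3
S = rng.random((n, n))
S = 0.5 * (S + S.T)               # symmetric affinity
np.fill_diagonal(S, 0.0)
D = np.diag(S.sum(axis=1))
L = D - S                         # graph Laplacian

PX = rng.normal(size=(r, n))      # stand-in for the low-dimensional embedding
F2 = np.trace(PX @ L @ PX.T)      # Tr((PX) L (PX)^T)

# Equivalent pairwise form: 0.5 * sum_{i,j} S_ij * ||PX[:, i] - PX[:, j]||^2
diff = PX[:, :, None] - PX[:, None, :]
F2_pairwise = 0.5 * np.sum(S * np.sum(diff ** 2, axis=0))
```

The pairwise form makes the intuition explicit: a large $S_{i,j}$ penalizes distant embeddings of subjects $i$ and $j$.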
Finally, the classification error term F3 enhances the discriminatory power of NMF
by encouraging the label groups in the low dimensional embedding $PX$ to be separated
by a hyperplane $b$ (for clarity, the intercept term is dropped from our presentation
hereafter). Thus, our proposed NMF model seeks to identify subnetwork bases that
are not only reconstructive of the data but also discriminative of the label groups (note
that the squared error is used here to allow the ADM algorithm to admit a closed-form
solution).
Integrating the above constraint terms into the projective NMF Eq. (1) gives us our
final objective function ($\lambda_1, \lambda_2 \ge 0$ below are regularization parameters):

$$\min_{W,\, P \ge 0,\, b} \ \| X - W(PX) \|_F^2 + \lambda_1 \mathrm{Tr}\big((PX)L(PX)^T\big) + \lambda_2 \| y - (PX)^T b \|^2 + I_{\mathcal{X}}(W), \qquad (2)$$

such that $H = PX$, $W = \widetilde{W}_1$, $W = \widetilde{W}_2$, $P = \widetilde{P}$, $H = \widetilde{H}$,
where $I_+(\cdot)$ denotes the indicator function of the non-negative orthant. Although the
auxiliary variables introduced from variable splitting may appear redundant, this
strategy is commonly used in ADM frameworks (see [12] for example), as it allows the
ADM subproblems to be solved in closed form. In the context of our work, the
augmented Lagrangian (AL) function for the above constrained problem is given by:
170 T. Watanabe et al.
$$\begin{aligned}
\mathcal{L}_{AL}\big(W, P, b, \widetilde{P}, H, \widetilde{H}, \widetilde{W}_1, \widetilde{W}_2, \Lambda_{\widetilde{W}_1}, \Lambda_{\widetilde{W}_2}, \Lambda_{\widetilde{P}}, \Lambda_{H}, \Lambda_{\widetilde{H}}\big) = {} & \| X - WH \|_F^2 + \lambda_1 \mathrm{Tr}\big(\widetilde{H} L \widetilde{H}^T\big) + \lambda_2 \| y - H^T b \|^2 \\
& + I_+\big(\widetilde{W}_1\big) + I_{\mathcal{X}}\big(\widetilde{W}_2\big) + I_+\big(\widetilde{P}\big) \\
& + \big\langle \Lambda_{\widetilde{W}_1}, W - \widetilde{W}_1 \big\rangle + \big\langle \Lambda_{\widetilde{W}_2}, W - \widetilde{W}_2 \big\rangle + \big\langle \Lambda_{\widetilde{P}}, P - \widetilde{P} \big\rangle + \big\langle \Lambda_{H}, H - PX \big\rangle + \big\langle \Lambda_{\widetilde{H}}, H - \widetilde{H} \big\rangle \\
& + \frac{\rho}{2} \Big\{ \| W - \widetilde{W}_1 \|_F^2 + \| W - \widetilde{W}_2 \|_F^2 + \| P - \widetilde{P} \|_F^2 + \| H - PX \|_F^2 + \| H - \widetilde{H} \|_F^2 \Big\},
\end{aligned}$$

where $W, P, b, \widetilde{W}_1, \widetilde{W}_2, \widetilde{P}, H, \widetilde{H}$ and $\Lambda_{\widetilde{W}_1}, \Lambda_{\widetilde{W}_2}, \Lambda_{\widetilde{P}}, \Lambda_{H}, \Lambda_{\widetilde{H}}$ are the primal and dual
variables, $\rho > 0$ is the AL penalty parameter, and $\langle \cdot, \cdot \rangle$ denotes the trace inner product.
The ADM algorithm is derived by alternately minimizing $\mathcal{L}_{AL}$ with respect to each
primal variable while holding the others fixed, followed by a gradient ascent step on
the dual variables. The primal updates can all be carried out efficiently in closed form:
$$\begin{aligned}
P &\leftarrow \big( H X^T + \widetilde{P} + (\Lambda_H X^T - \Lambda_{\widetilde{P}})/\rho \big)\big( X X^T + I_p \big)^{-1}
& \widetilde{P} &\leftarrow \max\big(0,\; P + \Lambda_{\widetilde{P}}/\rho\big) \\
W &\leftarrow \big( X H^T + \rho(\widetilde{W}_1 + \widetilde{W}_2) - \Lambda_{\widetilde{W}_1} - \Lambda_{\widetilde{W}_2} \big)\big( H H^T + 2\rho I_r \big)^{-1}
& \widetilde{W}_1 &\leftarrow \max\big(0,\; W + \Lambda_{\widetilde{W}_1}/\rho\big) \\
H &\leftarrow \big( W^T W + 2\rho I_r + \lambda_2 b b^T \big)^{-1}\big( W^T X + \rho P X - \Lambda_H + \rho \widetilde{H} - \Lambda_{\widetilde{H}} + \lambda_2 b y^T \big)
& \widetilde{H} &\leftarrow \big( \rho H + \Lambda_{\widetilde{H}} \big)\big( \lambda_1 L + \rho I_n \big)^{-1} \\
b &\leftarrow \big( H H^T \big)^{-1} H y
& \widetilde{W}_2 &\leftarrow \mathrm{Proj}_{\mathcal{X}}\big( W + \Lambda_{\widetilde{W}_2}/\rho \big)
\end{aligned}$$
Here $URV^H$ represents the SVD of $A$ and $0 \in \mathbb{R}^{(p-r) \times r}$ is a matrix of all zeros;
solution (3) is unique as long as $A$ has full column rank (see Proposition 7 in [11]).
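The projection onto matrices with orthonormal columns can be sketched via the SVD (a minimal illustration; per the cited uniqueness result, the input should have full column rank):

```python
import numpy as np

def proj_orthonormal(A):
    """Nearest matrix with orthonormal columns in Frobenius norm, via SVD."""
    U, _, Vh = np.linalg.svd(A, full_matrices=False)
    return U @ Vh

A = np.random.default_rng(0).normal(size=(10, 4))
Q = proj_orthonormal(A)
```

Dropping the singular values and recombining the orthogonal factors yields the closest orthonormal-column matrix.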
Label-Informed Non-negative Matrix Factorization 171
Dataset. We apply our method to a TBI dataset consisting of 34 TBI patients and 32
age-matched controls. While the control subjects were scanned only once, the TBI
patients were scanned and evaluated at three different time points: 3, 6, and 12 months
post-injury. Of the 34 TBI patients, 18 had all 3 time points, 9 had 2 and 7 had only one
timepoint. The functional outcome of patients was evaluated using the Glasgow Out-
come Scale Extended (GOSE) and Disability Rating Scale (DRS), which are com-
monly used in TBI. GOSE ranges from 1 = dead to 8 = good recovery, whereas DRS
ranges from 0 = normal to 29 = extremely vegetated. In total, the dataset comprises
111 total scans, with 32 labeled control and 79 labeled TBI. All scans are accompanied
with 11 clinical scores that are intended to assess the cognitive functioning of the
subject.
Creating the SCs. DTI data was acquired for each subject (Siemens 3T TrioTim, 8
channel head coil, single shot spin echo sequence, TR/TE = 6500/84 ms, b = 1000
s/mm2, 30 gradient directions). 86 ROIs from the Desikan atlas were extracted to
represent the nodes of the structural network. Probabilistic tractography [3] was per-
formed from each of these regions with 100 streamline fibers sampled per voxel,
resulting in an 86 × 86 matrix of weighted connectivity values, where each element
represents the conditional probability of a pathway between regions, normalized by the
active surface area of the seed ROI. Finally, the 86 × 86 connectivity matrix of each
subject was vectorized to its p = 3655 lower triangular elements, resulting in $x \in \mathbb{R}_+^p$
representing the SC.
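The vectorization step can be sketched as follows (the symmetric matrix here is random; note that 86 · 85 / 2 = 3655):

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((86, 86))
C = 0.5 * (C + C.T)                       # symmetric connectivity matrix
rows, cols = np.tril_indices(86, k=-1)    # strictly lower-triangular indices
x = C[rows, cols]                         # p = 86 * 85 / 2 = 3655 features
```

Since the matrix is symmetric with an uninformative diagonal, the strict lower triangle carries all the connectivity information.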
Implementation Details. We applied our method to SCs computed from the TBI
dataset to compute the subnetwork bases and their corresponding NMF coefficients;
here we let y = +1 indicate TBI and y = −1 indicate control. The disease-severity
graph was created using the functional outcome indices of GOSE/DRS as follows.
First, we constructed a symmetrized k-nearest-neighbor (k-NN) graph with k = 5,
where the distance between scans $i$ and $j$ was measured as
$d_{i,j} = (\mathrm{GOSE}_i - \mathrm{GOSE}_j)^2 + (\mathrm{DRS}_i - \mathrm{DRS}_j)^2$.
Then a binary affinity graph was created by setting $S_{i,j}$ to 1 if and only if scans $i$
and $j$ were connected by the k-NN graph and did not represent the same subject (to
avoid connecting multiple scans of the same TBI patient); controls were left unconnected.
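The graph construction described above might be sketched as follows (synthetic GOSE/DRS scores and subject IDs; k = 5 as in the text):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 12                                        # number of scans (synthetic)
gose = rng.integers(1, 9, size=m).astype(float)
drs = rng.integers(0, 30, size=m).astype(float)
subj = rng.integers(0, 6, size=m)             # some subjects have several scans

d = (gose[:, None] - gose[None, :]) ** 2 + (drs[:, None] - drs[None, :]) ** 2
np.fill_diagonal(d, np.inf)
k = 5
knn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours of each scan

S = np.zeros((m, m))
for i in range(m):
    for j in knn[i]:
        if subj[i] != subj[j]:                # do not link scans of the same subject
            S[i, j] = S[j, i] = 1.0           # symmetrized binary affinity
```

Symmetrizing makes the affinity usable in the Laplacian-based manifold regularizer.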
We identified r = 5 subnetwork bases using this affinity graph, and the regularization
parameters were set at $\lambda_1 = \lambda_2 = 0.25$, as the model became stable around this
value (degradation in classification performance was observed when the parameters
were set at $\lambda_1 = \lambda_2 = 0$, i.e., a setup equivalent to traditional NMF). To initialize
the ADM variables, we used the strategy introduced in [4] to deterministically initialize
W and H, and set all other variables to zero for replicability. The AL parameter value
was set to $\rho = 1000$ based on empirical test runs, and the ADM algorithm was
terminated when the relative change in the objective function value (Eq. 2) at
successive iterations fell below $10^{-4}$ and the following primal residual condition was met:
$$\max\left( \frac{\| W - \widetilde{W}_1 \|_F}{\| W \|_F},\; \frac{\| W - \widetilde{W}_2 \|_F}{\| W \|_F},\; \frac{\| H - PX \|_F}{\| H \|_F},\; \frac{\| H - \widetilde{H} \|_F}{\| H \|_F},\; \frac{\| P - \widetilde{P} \|_F}{\| P \|_F} \right) < 10^{-4}.$$
Classification Results. Table 1 reports the classification results from LOSO-CV for
different methods, showing overall accuracy, specificity (one minus the type I error
rate), sensitivity (one minus the type II error rate), and the balanced score rate
(BSR), which is the mean of specificity and sensitivity.
The results show that the classification performance obtained using the proposed sub-
network features demonstrates a noticeable improvement over using the SC features in
their original form, achieving an accuracy of 82.0 % and a BSR of 81.8 %. The SVM
achieves the next best performance, but the model is hard to interpret since all 1000 edge
features contribute to the classifier. Finally, despite using a weighted loss function, we
see that the sparsity-promoting L1-regularized classifiers suffer from low sensitivity,
which is likely caused by the data label imbalance, as well as the correlated structure
among the features (a case where L1 regularization tends to suffer).
Effect of Manifold Regularization. We next assessed whether the manifold regu-
larizer with the disease-severity graph has successfully preserved the inter-patient
relationship in terms of GOSE/DRS functional outcome indices. To do this, we
computed Spearman’s rank correlation between the subnetwork bases coefficients and
GOSE/DRS indices from the 79 TBI scans. The results reported in Table 2 reveal that
for all basis coefficients, consistently positive and negative correlations (statistically
significant) are obtained for GOSE and DRS, respectively. This result indicates that
subjects with similar level of disease-severity share similar representations in the
embedding space, demonstrating the impact of manifold regularization.
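Spearman's rank correlation used above can be sketched as follows (a minimal version without tie handling, which is a simplification since GOSE/DRS scores contain ties in practice):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation via Pearson correlation of ranks (no tie handling)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))
```

A value near +1 (monotonically increasing) or −1 (monotonically decreasing) indicates a strong rank relationship between basis coefficients and outcome indices.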
Fig. 1. The subnetwork bases obtained with r = 5. The edge color represents the sign of the
corresponding hyperplane coefficient $b \in \mathbb{R}^r$ (blue = negative/control, red = positive/TBI).
that the network structure of the first basis exhibits strong bilateral symmetry with
notable inter-hemispheric connections between the cerebellar, precuneus, and cingulate
regions. Moreover, the second subnetwork basis comprises dense inter-hemispheric
connections among the subcortical regions, with the sign indicating that these edges
tend to be weaker among TBI patients. On the other hand, subnetwork bases 3–5
represent connections associated with TBI. Overall, the subnetworks exhibit a diffuse
connectivity pattern that spans the cortex, suggesting that damage from TBI results
in a widespread disturbance of the brain network. Interestingly, the first two bases
exhibit rich connectivity patterns within the subcortical and medial posterior regions,
which are frequently reported to be vulnerable in TBI.
Conclusions. We have presented a supervised NMF framework for extracting a dis-
joint set of subnetworks that are interpretable and highlight group differences in
structural connectivity. The method is also capable of preserving the manifold structure
in the data encoded by an affinity graph, thereby respecting the intrinsic geometry of
the data. Experiments on a TBI dataset show that the subnetworks identified from our
method can not only be used to reliably discriminate TBI from controls, but also exhibit
tight correlation with TBI-outcome indices, indicating that subjects with similar level of
TBI-severity share similar subnetwork representations due to manifold regularization.
References
1. Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., et al.: Machine
learning for neuroimaging with scikit-learn. Front. Neuroinformatics 8(14) (2014)
2. Allahyar, A., Ridder, J.: FERAL: network-based classifier with application to breast cancer
outcome prediction. Bioinformatics 31(12), i311–i319 (2015)
3. Behrens, T., et al.: Non-invasive mapping of connections between human thalamus and
cortex using diffusion imaging. Nat. Neurosci. 6(7), 750–757 (2003)
4. Boutsidis, C., Gallopoulos, E.: SVD based initialization: a head start for nonnegative matrix
factorization. Pattern Recognit. 41, 1350–1362 (2008)
5. Cheplygina, V., Tax, D.M., Loog, M., Feragen, A.: Network-guided group feature selection
for classification of autism spectrum disorder. In: Wu, G., Zhang, D., Zhou, L. (eds.) MLMI
2014. LNCS, vol. 8679, pp. 190–197. Springer, Heidelberg (2014)
6. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large
linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
7. Ghanbari, Y., Smith, A.R., Schultz, R.T., Verma, R.: Identifying group discriminative and
age regressive sub-networks from DTI-based connectivity via a unified framework of
non-negative matrix factorization and graph embedding. Med. Image Anal. 18(8) (2014)
8. Kasenburg, N., et al.: Supervised hub-detection for brain connectivity. In: Proceedings of the
SPIE, vol. 9784, Medical Imaging 2016: Image Processing, p. 978409 (2016)
9. Lee, D.D., Seung, H.S.: Learning the parts of objects by NMF. Nature 401, 788–791 (1999)
10. Liu, X., et al.: Projective nonnegative graph embedding. IEEE Trans. Image Process. (2010)
11. Manton, J.H.: Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal
Process. 50(3), 635–650 (2002)
12. Xu, Y., Yin, W., Wen, Z., Zhang, Y.: An alternating direction algorithm for matrix
completion with nonnegative factors. Front. Math. China 7(2), 365–384 (2012)
Predictive Subnetwork Extraction
with Structural Priors for Infant Connectomes
1 Introduction
Preterm birth is a world-wide health challenge, affecting millions of children
every year [1]. Very preterm birth (≤ 32 weeks post-menstrual age, PMA) affects
brain development and puts a child at a high risk for delayed, or altered, cognitive
and motor neurodevelopment. It is known from studies of diffusion MR images that the development of white matter plays a critical role in the function of a child's brain, and that white matter injury is associated with poorer outcomes [2–5]. Recently, Ziv et al. and Brown et al. showed that by representing the set of white matter connections as a network (i.e., a connectome), features of network topology could be used to predict abnormal general neurological function and neuromotor function, respectively [4,6].
Representing a diffusion tensor image (DTI) of the brain as a network defined
between regions of interest (ROIs) allows an anatomically informed reduction of
dimensionality from millions of tensor-valued voxels down to thousands of con-
nections (edges). However, for the purposes of prediction, thousands of features
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 175–183, 2016.
DOI: 10.1007/978-3-319-46720-7 21
176 C.J. Brown et al.
may still be too many and cause over-fitting when limited numbers (e.g. only
hundreds) of scans are available [7]. Furthermore, region of interest based studies
suggest that structural abnormalities related to poor neurodevelopmental out-
comes are not spread evenly across the entire brain, but instead are localized to
particular anatomy [3]. Thus, there is motivation to discover which particular
subnetworks (group of connections or edges) in the brain network best predict
different brain functions.
Some previous works have explored the use of brain subnetworks for predicting outcomes [7–10]. For instance, Zhu et al. used t-tests at each edge in a dataset of functional connectomes to assess group discrimination, followed by correlation-based feature selection and training of a support vector machine (SVM), to find subnetworks predictive of schizophrenia [8]. This multi-stage feature selection and model training is not ideal, however, because it precludes simultaneous
optimization of all model parameters. Munsell et al. used an Elastic-Net based
subnetwork selection for predicting the presence of temporal lobe epilepsy and
the success of corrective surgery in adults [7]. This method encourages sparse
selection of stable features, useful for identifying those edges most important for
prediction [11], but fails to leverage the underlying structure of the brain net-
works that might inform the importance or the relationships between edges. In
order to capture dependencies between neighbouring edges, Li et al. employed
a Laplacian-based regularizer (in a framework similar to GraphNet [11]) that
encouraged their subnetwork weights to smoothly vary between neighbouring
edges [10]. However, this smoothing may reduce sparsity by promoting many
small weights and blur discontinuities between the weights of neighbouring edges
that should be preserved. An ideal regularizer would encourage a well con-
nected subnetwork while preserving sparsity and discontinuities. Ghanbari et
al. used non-negative matrix factorization to find a sparse set of non-negative
basis subnetworks in structural connectomes [9]. However, rather than trying
to predict specific outcomes (as we propose below), Ghanbari et al. introduced
age-regressive, group-discriminative, and reconstructive regularization terms on
groups of subnetworks, encouraging each group to covary with a particular factor.
They argued that non-negative subnetwork edge weights are more anatomically
interpretable, especially in the case of structural connectomes which have only
non-negative edge feature values.
In this paper, we present our novel approach to identifying anatomical sub-
networks of the human white-matter connectome that are optimally predictive
of a preterm infant’s cognitive and motor neurodevelopmental scores assessed at
18 months of age, adjusted for prematurity. Similar to Munsell et al., our method
is based on a regularized linear regression on the outcome score of choice. Here,
however, we introduce a constraint that ensures the non-negativity of subnet-
work edge weights. We further propose two novel informed priors designed to find
predictive edges that are both anatomically plausible and well integrated into a
connected subnetwork. We demonstrate that these priors achieve the desired effect on the learned subnetworks and that, consequently, our method outperforms a variety of other competing methods on this very challenging outcome
prediction task. Finally, we discuss the structure of the learned subnetworks in
the context of the underlying neuroanatomy.
Preterm Infant Connectome Subnetworks 177
2 Method
2.1 Preterm Data
Our dataset contains 168 scans taken between 27 and 45 weeks PMA from a
cohort of 115 preterm infants (nearly half of the infants were scanned twice),
born between 24 and 32 weeks PMA. Connectomes were generated for each scan
by aligning an infant atlas of 90 anatomical brain regions with each DTI. Full-
brain streamline tractography was then performed in order to count the number
of tracts (i.e., edge strength) connecting each pair of regions. Our previous works
provide details on the scanning and connectome construction processes [6] and
a discussion on interpreting infant connectomes [5]. Cognitive and neuromotor
function of each infant was assessed at 18 months of age, corrected for prematu-
rity, using the Bayley Scales of Infant and Toddler Development 3rd edition [12].
The scores are normalized to 100 ± 15; adverse outcomes are those with scores
at or below 85 (i.e., ≤ −1 std.).
Our dataset is imbalanced, containing few scans of infants with high and low
outcome scores. In order to flatten this distribution, the number of connectomes
in each training set was doubled by synthesizing instances with high and low
outcome scores, using the synthetic minority over-sampling technique [13].
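The core interpolation step of SMOTE [13] can be sketched as follows; this is a minimal numpy sketch of the published technique, not the implementation used in the paper, and the neighbour count k is illustrative:

```python
import numpy as np

def smote_sketch(X, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority samples X
    (n_samples x n_features) by interpolating each picked sample
    with one of its k nearest minority neighbours (SMOTE's core step)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    k = min(k, len(X) - 1)
    # pairwise squared distances within the minority set
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                  # never pick a sample itself
    nn = np.argsort(d2, axis=1)[:, :k]            # k nearest neighbours per sample
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))                  # a random minority sample
        j = nn[i, rng.integers(k)]                # one of its neighbours
        gap = rng.random()                        # interpolation factor in [0, 1)
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synthetic)
```

Here the "minority" samples would be connectomes with high or low outcome scores, flattened to edge-strength vectors.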
Many of the 4005 possible connectome edges are anatomically unlikely (i.e.,
between regions not connected by white matter fibers) but may be non-zero in
certain scans due to imaging noise and accumulated pipeline error (i.e. due to
atlas registration, tractography, and tract counting) [15]. With many more edges
than training samples, some edges may appear discriminative by pure chance,
when in fact they are just noise. Therefore, we propose a network backbone prior
term that encodes a penalty discouraging the subnetwork from including edges
with a low signal-to-noise ratio (SNR) in the training data. The SNR of the j-th
edge can be computed as the ratio MEAN(X:,j)/SD(X:,j). However, this may falsely declare an edge as noisy when the variability (cf. the denominator) in the edge value is not due to noise but rather due to the edge's values changing in
a manner that correlates with the outcome of the subject. To counteract this
problem, we divide the scans into two classes: scans with normal outcomes, H,
and scans with adverse outcomes, U . The SNR is then computed separately for
each class. Let XΩ represent a matrix with a subset of the rows in X where Ω ∈
{U, H}. The SNR for each edge, j, in each class, Ω, is computed as SNR(XΩ,j) = MEAN(XΩ,j)/SD(XΩ,j). In order not to favour the strongest fiber bundles over weak yet
important bundles, we threshold the SNR at each edge conservatively, to exclude
only the least anatomically likely edges. An edge, j, is only penalized if both
SNR(XU,j ) and SNR(XH,j ) are less than or equal to 1 (i.e., signal is weaker
than noise in both classes). In particular, B is an M × M diagonal matrix, such
that,
Bj,j = 1, if SNR(XH,j) ≤ 1 and SNR(XU,j) ≤ 1; 0, otherwise.    (3)
So w^T Bw penalizes only those edges that fail the SNR threshold in both the normal-outcome and the adverse-outcome instances, and thus are likely noisy.
Figure 1 shows an example of B. Note that, especially for infant connectomes,
even edges with high SNR may not represent white matter fibers but instead
high FA from other causes [5]. Nevertheless, such high-SNR edges are not likely
due to noise but instead to some real effect and thus may aid prediction.
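A minimal numpy sketch of this per-class SNR test and the diagonal of B from Eq. (3); the 85-point adverse-outcome cut-off follows Sect. 2.1, and treating zero-variance edges as high-SNR is an assumption:

```python
import numpy as np

def backbone_diag(X, y, adverse=85.0, snr_cut=1.0):
    """Diagonal of B in Eq. (3): 1 for edges whose SNR is <= snr_cut in
    BOTH outcome classes, 0 otherwise. X is (n_scans, n_edges) edge
    strengths; y holds outcome scores (adverse: <= 85)."""
    H, U = X[y > adverse], X[y <= adverse]    # normal vs. adverse outcomes
    def snr(M):
        sd = M.std(axis=0)
        # zero-variance edges get infinite SNR, i.e. are never penalized
        return np.divide(M.mean(axis=0), sd,
                         out=np.full(M.shape[1], np.inf), where=sd > 0)
    return ((snr(H) <= snr_cut) & (snr(U) <= snr_cut)).astype(float)

# the penalty w^T B w then touches only low-SNR edges:
# b = backbone_diag(X, y);  penalty = w @ (b * w)
```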
Fig. 1. (a) A sample backbone prior network (i.e., all edges where Bj,j = 0) mapped on
to a Circos ideogram (http://circos.ca/). Inter-hemispherical connections are in green
and intra-hemispherical connections are in red (left) and blue (right). Opacity of each
link is computed as SNR(XU,j ) × SNR(XH,j ). (b) Axial, (c) sagittal and (d) coronal
views of the same network rendered as curves representing the mean shape of all tracts
between those connected regions (from one infant’s scan).
C(ei,j, ep,q) = −1, if i = p or i = q or j = p or j = q; 0, otherwise,    (4)

such that the term w^T Cw becomes smaller (i.e., more optimal) for each pair of
non-zero weighted subnetwork edges sharing a node. This term places a priority
on retaining edges in the subnetwork that are connected to hub nodes. This is
desirable since subnetwork hub nodes indicate regions that join many connections
(i.e., edges) predictive of outcome. In contrast to a Laplacian based regularizer
which would encourage subnetwork weights to become locally similar, reducing
sparsity, our proposed term simply rewards subnetworks with stronger hubs.
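Assuming edges are indexed as the upper-triangular node pairs of the 90-region graph, C from Eq. (4) can be sketched as below; the diagonal is excluded here, a choice Eq. (4) as printed leaves ambiguous, and the doubly-nested loop is for clarity only (for 90 ROIs a vectorized construction of the 4005 × 4005 matrix is preferable):

```python
import numpy as np
from itertools import combinations

def connectivity_prior(n_nodes):
    """C[a, b] = -1 when distinct edges a and b share an endpoint node,
    per Eq. (4); edges are the upper-triangular node pairs."""
    edges = list(combinations(range(n_nodes), 2))   # 4005 edges for 90 ROIs
    m = len(edges)
    C = np.zeros((m, m))
    for a, (i, j) in enumerate(edges):
        for b, (p, q) in enumerate(edges):
            if a != b and ({i, j} & {p, q}):
                C[a, b] = -1.0
    return C, edges
```

With this C, a weight vector concentrated on edges meeting at a hub node makes w^T Cw more negative than one spread over disjoint edges, which is exactly the hub reward described above.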
3 Results
For each method (both proposed and competing), coarse grid searches were
performed in powers of two over the method’s hyper-parameters to find the
best performance for both cognitive and motor outcomes independently. For the
proposed method, this search was over λL1, λC, λB ∈ {2^0, ..., 2^9}. A finer grid
search was not performed to avoid over-fitting to the dataset. For each setting of
the parameters, a leave-2-out, 1000-round cross validation test was performed.
If two scans were of the same infant, those scans were not split between test and
training sets. Table 1 shows a comparison of the different methods tested on the
preterm infant connectomes for prediction of motor and cognitive scores.
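The grouping constraint can be sketched as below; this assumes each infant has at most two scans (Sect. 2.1) and that a test fold is either one twice-scanned infant or two singly-scanned infants, which the text does not spell out:

```python
import numpy as np

def leave_two_out_rounds(infant_ids, n_rounds=1000, rng=None):
    """Yield (test, train) index arrays: two scans held out per round,
    never splitting one infant's scans across test and train.
    Assumes at most two scans per infant."""
    rng = np.random.default_rng(rng)
    ids = np.asarray(infant_ids)
    scans = {i: np.flatnonzero(ids == i) for i in np.unique(ids)}
    doubles = [i for i in scans if len(scans[i]) == 2]
    singles = [i for i in scans if len(scans[i]) == 1]
    for _ in range(n_rounds):
        if doubles and (len(singles) < 2 or rng.random() < 0.5):
            # hold out both scans of one twice-scanned infant
            test = scans[doubles[rng.integers(len(doubles))]]
        else:
            # hold out one scan each from two singly-scanned infants
            pick = rng.choice(len(singles), size=2, replace=False)
            test = np.array([scans[singles[p]][0] for p in pick])
        train = np.setdiff1d(np.arange(len(ids)), test)
        yield test, train
```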
Table 1. Correlation (r) between ground-truth and predicted scores, area over REC
curve (AOC) values and classification accuracy of scores at or below 85 (acc.) for
each model, assessed via 1000 rounds of leave-2-out cross validation. Note that Brown
et al.’s method [6] performs binary classification only.
Method                        Motor                    Cognitive
                              r       AOC      acc.    r        AOC      acc.
Zhu et al. [8]                0.1586  27.3904  45.10   0.02055  28.0529  49.65
Elastic-Net [7]               0.2703  24.575   58.75   0.2074   24.8292  54.75
Brown et al. [6]              -       -        62.85   -        -        52.55
Linear regression             0.2696  24.777   58.75   0.2445   24.72    55.15
+ L1 regularization           0.3136  18.5451  64.00   0.2443   24.7514  55.2
+ Non-neg. constraint         0.4327  14.5326  68.80   0.3171   17.7255  57.65
+ Backbone prior              0.4355  14.474   68.55   0.3271   17.8184  58.45
+ Connectivity prior (Ours)   0.4423  14.253   70.80   0.3432   17.3768  59.50
Our proposed method with backbone and connectivity priors achieved the
highest correlations, lowest AOCs and best 2-class classification accuracies
for both motor and cognitive scores (for parameter settings, [λL1, λC, λB] of [2^2, 2^1, 2^6] and [2^5, 2^2, 2^5], respectively). For 2-class classification in particular,
our method outperformed Brown et al.'s method by 7.4 %, Elastic-Net [7] by 8.4 %, and Zhu et al.'s method [8] by 17.6 % in accuracy, on average. Using a
two-proportion z-test, we found all these differences to be statistically significant
(p < 0.05). Also, note that, beginning with standard linear regression, the corre-
lation values improved as each regularization term was added. All tested methods
had statistically significant (p < 0.05) correlations since, for 1000 × 2 = 2000
total predictions, the threshold for 95 % significance is r ≥ 0.0439.
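The two-proportion z-test on these accuracies can be sketched with the standard pooled form (Python stdlib; the pooled estimate and the two-sided alternative are assumptions about the exact test variant used):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(acc1, acc2, n):
    """Two-sided two-proportion z-test for accuracies acc1, acc2 (in %),
    each measured over n predictions (here n = 1000 x 2 = 2000)."""
    p1, p2 = acc1 / 100.0, acc2 / 100.0
    p = (p1 + p2) / 2.0                       # pooled proportion (equal n)
    se = sqrt(2.0 * p * (1.0 - p) / n)        # standard error of p1 - p2
    z = (p1 - p2) / se
    return 2.0 * NormalDist().cdf(-abs(z))    # two-sided p-value
```

For example, 70.80 % vs. 62.85 % over 2000 predictions each gives a p-value far below 0.05.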
Figure 2 displays the predictive subnetworks learned by our proposed method
(averaged over all rounds of cross validation). Subnetworks were stable across
rounds: 93.6 % of all edges were consistently in or out of the subnetwork 95 % of
the time. We examined the structure of the selected subnetworks to analyse the
effect of the proposed regularization terms. By including the L1 regularization
term, the learned subnetworks were very sparse, having an average of 71.6 % and
98.2 % of edge weights set to zero for motor and cognitive scores, respectively,
up from only 6.7 % (for either score) without the L1 term. Adding the back-
bone network prior reduced the number of low-SNR edges (i.e., Bj,j = 1) by 18.6 % for motor score prediction and 11.2 % for cognitive score predic-
tion. Adding the connectivity prior improved subnetwork efficiencies (a measure
of network integration [5]) by a factor of 6.8 (from 0.0059 to 0.0403) and 2.2
(from 0.2807 to 0.6215) for subnetworks predictive of motor and cognitive scores,
respectively.
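Interpreting subnetwork efficiency as global efficiency (mean inverse shortest-path length; the exact variant used in [5] may differ), a stdlib sketch on a binarized subnetwork:

```python
from collections import deque

def global_efficiency(adj):
    """Mean inverse shortest-path length over ordered node pairs of an
    undirected, unweighted graph given as {node: [neighbours]}."""
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for s in nodes:
        dist = {s: 0}                  # BFS shortest paths from s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        # unreachable nodes contribute 0 (1 / infinity)
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))
```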
Fig. 2. (Top) Optimal weighted subnetworks for prediction of (a) motor and (b) cog-
nitive outcomes. Stronger edge weights are represented with more opaque streamlines.
(Bottom) Circos ideograms for the (c) motor and (d) cognitive subnetworks.
4 Conclusions
To better understand neurodevelopment and to allow for early intervention when
poor outcomes are predicted, we proposed a framework for learning subnetworks
of structural connectomes that are predictive of neurodevelopmental outcomes
for infants born very preterm. We found that by introducing our novel network
backbone prior, the learned subnetworks were more robust to noise by includ-
ing fewer edges with low SNR weights. By including our connectivity prior, the
subnetworks became more highly integrated, a property we expect for subnet-
works pertinent to specific functions. Compared to other methods, our approach
achieved the best accuracies for predicting both cognitive and motor scores of
preterm infants, 18 months into the future.
References
1. World Health Organization. Preterm birth fact sheet no. 363. http://www.who.
int/mediacentre/factsheets/fs363/en/. Accessed 03 Mar 2015
2. Back, S.A., Miller, S.P.: Brain injury in premature neonates: a primary cerebral
dysmaturation disorder? Ann. Neurol. 75(4), 469–486 (2014)
3. Chau, V., Synnes, A., Grunau, R.E., Poskitt, K.J., Brant, R., Miller, S.P.: Abnor-
mal brain maturation in preterm neonates associated with adverse developmental
outcomes. Neurology 81(24), 2082–2089 (2013)
4. Ziv, E., Tymofiyeva, O., Ferriero, D.M., Barkovich, A.J., Hess, C.P., Xu, D.: A
machine learning approach to automated structural network analysis: application
to neonatal encephalopathy. PLoS ONE 8(11), e78824 (2013)
5. Brown, C.J., Miller, S.P., Booth, B.G., Andrews, S., Chau, V., Poskitt, K.J.,
Hamarneh, G.: Structural network analysis of brain development in young preterm
neonates. NeuroImage 101, 667–680 (2014)
6. Brown, C.J., et al.: Prediction of motor function in very preterm infants using
connectome features and LSI. In: Navab, N., Hornegger, J., Wells, W.M.,
Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 69–76. Springer, Heidelberg
(2015)
7. Munsell, B.C., Wee, C.-Y., Keller, S.S., Weber, B., Elger, C., da Silva, L.A.T.,
Nesland, T., Styner, M., Shen, D., Bonilha, L.: Evaluation of machine learning
algorithms for treatment outcome prediction in patients with epilepsy based on
structural connectome data. NeuroImage 118, 219–230 (2015)
8. Zhu, D., Shen, D., Jiang, X., Liu, T.: Connectomics signature for characterization
of MCI and schizophrenia. In: ISBI, pp. 325–328. IEEE (2014)
9. Ghanbari, Y., Smith, A.R., Schultz, R.T., Verma, R.: Identifying group discrimina-
tive and age regressive sub-nets from DTI-based connectivity via a unified frame-
work of NMF and graph embedding. MIA 18(8), 1337–1348 (2014)
10. Li, H., Xue, Z., Ellmore, T.M., Frye, R.E., Wong, S.T.: Identification of faulty DTI-
based sub-networks in autism using network regularized SVM. In: Proceedings of
ISBI, vol. 6, pp. 550–553 (2012)
11. Grosenick, L., Klingenberg, B., Katovich, K., Knutson, B., Taylor, J.E.: Inter-
pretable whole-brain prediction analysis with GraphNet. NeuroImage 72(2), 304–
321 (2013)
12. Bayley, N.: Manual for the Bayley Scales of Infant Development, 3rd edn. Harcourt,
San Antonio (2006)
13. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic
minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
14. Schmidt, M.: Graphical model structure learning with l1-regularization. Ph.D. thesis, University of British Columbia, Vancouver (2010)
15. Cheng, H., Wang, Y., Sheng, J., Kronenberger, W.G., Mathews, V.P.,
Hummer, T.A., Saykin, A.J.: Characteristics and variability of structural networks
derived from diffusion tensor imaging. NeuroImage 61(4), 1153–1164 (2012)
16. Honey, C.J., Sporns, O., Cammoun, L., Gigandet, X., Thiran, J.P., Meuli, R.,
Hagmann, P.: Predicting human resting-state functional connectivity from struc-
tural connectivity. Proc. Natl. Acad. Sci. USA 106(6), 2035–2040 (2009)
17. de Reus, M.A., Saenger, V.M., Kahn, R.S., van den Heuvel, M.P.: An edge-centric
perspective on the human connectome: link communities in the brain. Phil. Trans.
R. Soc. B 369(1653), 20130527 (2014)
18. Bi, J., Bennett, K.P.: Regression error characteristic curves. In: Proceedings of
ICML-2003, pp. 43–50 (2003)
19. Zhang, S., Ide, J.S., Li, C.S.R.: Resting-state functional connectivity of the medial
superior frontal cortex. Cereb. Cortex 22(1), 99–111 (2012)
Hierarchical Clustering of Tractography
Streamlines Based on Anatomical Similarity
1 Introduction
Diffusion MRI (dMRI) allows us to estimate the preferential direction of water
molecule diffusion at each voxel in white matter (WM). Tractography algorithms
follow these directions to reconstruct continuous paths of diffusion. The most
common approach to segmenting WM from dMRI data is to use every voxel in the
brain as a seed for tractography and to group the resulting streamlines into bun-
dles. Recent advances in dMRI acquisition hardware and software have increased
both spatial and angular resolution, yielding large tractography datasets that are
difficult to parse manually. This creates a need for computational methods that
can extract anatomically meaningful bundles automatically.
Typical methods for unsupervised clustering of streamlines use similarity
measures based on spatial coordinates [1–3]. This is not consistent with the app-
roach followed by neuroanatomists, who define WM bundles based on the brain
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 184–191, 2016.
DOI: 10.1007/978-3-319-46720-7 22
Anatomical-Based Hierarchical Clustering of Streamlines Tractography 185
structures that they go through or next to, rather than their spatial coordinates
in a template space. Our goal is to develop a similarity measure that mimics
this approach, comparing streamlines based on their anatomical neighborhood.
Previous attempts to incorporate anatomical information in streamline cluster-
ing mostly used the termination regions of the streamlines, either in a post-hoc
manner [3] or in the similarity metric itself [4,5]. The similarity measure that we
propose in this work includes a detailed description of all the regions that form
the anatomical neighborhood of a streamline, everywhere along its trajectory.
Such a description was previously used to incorporate prior information from a
set of training subjects in the tractography step itself [6]. However, that was a
supervised approach, limited to a set of predefined bundles from an atlas.
We incorporate the proposed anatomical similarity measure into a hierarchi-
cal spectral clustering algorithm [1–3]. The benefit of a hierarchical approach
is that it models the structure of large WM tracts, which are known to be
subdivided into multiple smaller bundles. We compare our similarity metric
to one based on Euclidean distance between streamlines, using data from the
MGH/UCLA Human Connectome Project. We show that clustering streamlines
based on their anatomical neighborhood rather than their spatial coordinates
leads to a 20 % improvement in the agreement of the clusters with manual label-
ing by a human rater. Importantly, we achieve this without using prior infor-
mation from manual labels, which allows us to explore whole-brain structure
without being constrained to a predetermined set of bundles.
2 Methods
min_{A,B} s_norm(A, B),   where   s_norm(A, B) = s(A, B)/a(A) + s(A, B)/a(B).
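Given a streamline similarity matrix W and a candidate bipartition, the normalized bisection criterion above can be sketched as follows (numpy; s(A, B) sums similarities across the cut and a(A) is the total association of cluster A):

```python
import numpy as np

def s_norm(W, in_A):
    """Normalized between-cluster similarity for a bipartition (A, B):
    s_norm(A, B) = s(A, B)/a(A) + s(A, B)/a(B)."""
    W = np.asarray(W, dtype=float)
    A = np.asarray(in_A, dtype=bool)
    B = ~A
    s_AB = W[np.ix_(A, B)].sum()      # similarity crossing the cut
    a_A = W[A, :].sum()               # association of A with all streamlines
    a_B = W[B, :].sum()
    return s_AB / a_A + s_AB / a_B
```

A good split (low cross-cut similarity relative to each side's total association) scores lower and is the one chosen by the hierarchical bisection.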
186 V. Siless et al.
where ⟨·, ·⟩ is the inner product, and Li, Lj are the sets of all neighboring labels
for streamlines fi , fj . The normalization term |Li ∩ Lj |, which is the number of
common neighbors between the streamlines, penalizes trivial streamlines with
too few neighbors. The sum in the above equation can be seen as the joint
probability of the anatomical neighborhoods of two streamlines.
3 Results
3.1 Data Analysis
We used dMRI and structural MRI (sMRI) data from 32 healthy sub-
jects, scanned as part of the Human Connectome Project (humanconnectomeproject.org). The data was acquired with the MGH Siemens Connectom,
a Skyra 3T MRI system with a custom gradient capable of maximum strength
300 mT/m and slew rate 200 T/m/s. The sMRI data was acquired with MEMPRAGE [9], TR = 2530 ms, TE = 1.15 ms, TI = 1100 ms, 1 mm isotropic resolution. The dMRI data was acquired with 2D EPI, TR = 8800 ms, TE = 57.0 ms, 1.5 mm isotropic resolution, 512 gradient directions, bmax = 10,000 s/mm².
We reconstructed orientation distribution functions using the generalized
q-sampling imaging model [10] and performed deterministic tractography using
DSI Studio [11]. We obtained a total of 500 k streamlines. As we are interested in
long-range connections, and to make computations tractable, we excluded any
streamlines shorter than 55 mm, leaving on the order of 100 k streamlines per
subject. Streamlines were then downsampled to N = 10 equispaced points.
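The downsampling to N equispaced points can be sketched as arc-length parameterized linear interpolation (a numpy sketch; the resampling scheme actually used is not specified here):

```python
import numpy as np

def resample_streamline(points, n=10):
    """Resample a streamline (k x 3 array of coordinates) to n points
    equispaced along its arc length, interpolating each coordinate."""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
    targets = np.linspace(0.0, arc[-1], n)          # equispaced positions
    return np.column_stack([np.interp(targets, arc, points[:, d])
                            for d in range(points.shape[1])])
```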
For comparison with unsupervised clustering, a trained rater labeled the 18
major WM bundles manually for each subject: corticospinal tract (cst), inferior
longitudinal fasciculus (ilf), uncinate fasciculus (unc), anterior thalamic radia-
tion (atr), cingulum - supracallosal bundle (ccg), cingulum - infracallosal (angu-
lar) bundle (cab), superior longitudinal fasciculus - parietal bundle (slfp), supe-
rior longitudinal fasciculus - temporal bundle (slft), corpus callosum - forceps
major (fmaj), corpus callosum - forceps minor (fmin) [12].
Each subject’s dMRI and sMRI data was co-registered with an affine trans-
formation. The anatomical segmentation was obtained by processing the sMRI
data with the automated cortical parcellation and subcortical segmentation tools
in FreeSurfer [13,14]. In addition, subcortical WM labels were defined by classi-
fying each WM voxel that was within 5 mm from the cortex based on its nearest
cortical label. This resulted in a total of 261 cortical and subcortical labels.
We performed unsupervised clustering with the two similarity measures
described in the previous section. For the anatomical similarity measure we eval-
uated neighborhoods with 6, 14 and 26 elements. Due to space constraints, we
show here results with the 26-element neighborhood only as it performed best.
We iterated the clustering algorithm until a total of 200 clusters were gener-
ated. To evaluate the algorithm for different numbers of clusters, we pruned the
hierarchical clustering tree to keep the first 75, 100, 125, 150 or 200 clusters.
Fig. 1. (a) Average Dice coefficient of clusters and manually labeled tracts over 18
tracts and 32 subjects, as a function of the total number of clusters. (b) Average Dice
coefficient over all subjects by tract, when the total number of clusters is 200.
Fig. 2. Average homogeneity (a) and completeness (b) over 18 tracts and 32 subjects,
as a function of the number of clusters.
Figure 1(a) plots the average Dice coefficient over 18 tracts and 32 subjects as a function of the total number of clusters. Figure 1(b) shows the average Dice
coefficient over all subjects by tract, when the total number of clusters is 200.
The anatomical similarity measure is 20 % better than the Euclidean similarity
measure in terms of its agreement with bundles defined by a human rater. We
also compute homogeneity and completeness, two metrics that are commonly
used to evaluate clustering quality [16]. In Fig. 2 we show homogeneity (a) and
completeness (b) for both the anatomical and Euclidean similarity measure. This
comparison takes into account only streamlines that belong to one of the manu-
ally labeled tracts, as it requires ground truth classes. Our anatomical similarity
measure outperforms the Euclidean similarity measure in both homogeneity and
completeness (p < .0001).
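Homogeneity and completeness [16] are defined through conditional entropies: homogeneity is high when each cluster contains a single ground-truth class, completeness when each class falls into a single cluster. A stdlib sketch:

```python
from collections import Counter
from math import log

def _entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def _cond_entropy(a, b):
    """H(a | b): entropy of labels a within each group of b."""
    n = len(a)
    h = 0.0
    for g, cnt in Counter(b).items():
        sub = [x for x, y in zip(a, b) if y == g]
        h += cnt / n * _entropy(sub)
    return h

def homogeneity_completeness(truth, pred):
    H_t, H_p = _entropy(truth), _entropy(pred)
    h = 1.0 if H_t == 0 else 1.0 - _cond_entropy(truth, pred) / H_t
    c = 1.0 if H_p == 0 else 1.0 - _cond_entropy(pred, truth) / H_p
    return h, c
```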
Fig. 3. Average Euclidean similarity (a) and anatomical similarity (b) for each of the
two clustering methods, as a function of the number of clusters.
In Fig. 4 we show, for four pairs of anatomical ROIs, the clusters for which
at least 5 % of streamlines pass through both ROIs when the number of clusters
is 200. The Euclidean similarity measure produces noisier and less anatomically
consistent clusters than the anatomical similarity measure. For example, stream-
lines that lie on opposite sides of the midline but are close to each other in space
may be erroneously clustered together by w_e, but not by w_a (see Fig. 4(d)).
[Fig. 4 row labels: Euclidean Similarity; Anatomical Similarity.]
4 Conclusion
We present a method for unsupervised hierarchical clustering of dMRI tractog-
raphy data based on anatomical similarity. We compare this to the conventional
approach of using a similarity based on Euclidean distance. We find that the
anatomical similarity yields results more consistent with manual labeling. That
is, without introducing any training data from human raters, we are able to
obtain results that are in closer agreement with such a rater. We achieve this
simply by using a similarity metric that is better at replicating how a human
with neuroanatomical expertise would segment WM tracts, i.e., based on the
anatomical structures that they either intersect or neighbor, everywhere along
the tracts’ trajectory. This allows us to obtain anatomically meaningful WM
References
1. O’Donnell, L., et al.: Automatic tractography segmentation using a high-
dimensional white matter atlas. IEEE Trans. Med. Imaging 26, 1562–1575 (2007)
2. Guevara, P., et al.: Robust clustering of massive tractography datasets. NeuroIm-
age 54(3), 1975–1993 (2010)
3. Wassermann, D., et al.: Unsupervised white matter fiber clustering and tract prob-
ability map generation: applications of a Gaussian process framework for white
matter fibers. NeuroImage 51(1), 228–241 (2010)
4. Wang, Q., et al.: Application of neuroanatomical features to tractography cluster-
ing. Hum. Brain Mapp. 34(9), 2089–2102 (2013)
5. Tunc, B., et al.: Automated tract extraction via atlas based adaptive clustering.
NeuroImage 102(Part 2), 596–607 (2014)
6. Yendiki, A., et al.: Automated probabilistic reconstruction of white-matter path-
ways in health and disease using an atlas of the underlying anatomy. Front. Neu-
roinform. 5(23), 12–23 (2011)
7. Shi, J., et al.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal.
Mach. Intell. 22(8), 888–905 (2000)
8. Golub, G.H., et al.: Matrix Computations. Johns Hopkins University, Baltimore
(1996)
9. van der Kouwe, A., et al.: Brain morphometry with multiecho MPRAGE. Neu-
roImage 40(2), 559–569 (2008)
10. Yeh, F.C., et al.: Generalized q sampling imaging. IEEE Trans. Med. Imaging
29(9), 1626–1635 (2010)
11. Yeh, F.C., et al.: Deterministic diffusion fiber tracking improved by quantitative
anisotropy. PLoS ONE 8(11), 11 (2013)
12. Wakana, S.: Reproducibility of quantitative tractography methods applied to cere-
bral white matter. NeuroImage 36(3), 630–644 (2007)
13. Fischl, B., et al.: Whole brain segmentation: automated labeling of neuroanatom-
ical structures in the human brain. Neuron 33(3), 341–355 (2002)
14. Fischl, B., et al.: Automatically parcellating the human cerebral cortex. Cereb.
Cortex 14(1), 11–22 (2004)
15. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology
26(3), 297–302 (1945)
16. Rosenberg, A., et al.: V-measure: a conditional entropy-based external cluster eval-
uation measure. In: Proceedings of 2007 Joint Conference on Empirical Methods
in Natural Language Processing and Computational Natural Language Learning
(EMNLP-CoNLL), pp. 410–420 (2007)
Unsupervised Identification of Clinically
Relevant Clusters in Routine Imaging Data
1 Introduction
The number of images produced in radiology departments is rising rapidly, gen-
erating thousands of records per day that cover a wide range of diseases and
treatment paths [9]. Identifying diagnostically relevant markers in this data is
a key to improving diagnosis and prognosis. Currently, computational image
analysis typically relies on well annotated and curated training data such as
COPDGene or LTRC1 that have fostered substantial methodological advance.
While these kinds of data sets enable the creation of accurate and sensitive detec-
tors for specific findings, they are limited, since annotation is only feasible on
a relatively small number of cases. Selection or study specific data acquisition
can introduce bias, and limits the range of observations represented in the data.
In contrast, learning from routine data could enable the discovery of relation-
ships and markers beyond those that can be feasibly annotated, sampling a wide
variety of cases. Furthermore, unsupervised learning on such data enables the
search for novel disease phenotypes that better reflect a grouping of patients
with similar prognosis, than current categories do.
G. Langs—This research was supported by teamplay which is a Digital Health Ser-
vice of Siemens Healthineers, by the Austrian Science Fund, FWF I2714-B31, and
WWTF S14-069.
1 www.copdgene.org (COPDGene), ltrcpublic.com (LTRC).
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 192–200, 2016.
DOI: 10.1007/978-3-319-46720-7 23
Unsupervised Identification of Clinically Relevant Clusters 193
Fig. 1. Population clustering and evaluation. (a) All processing steps towards pop-
ulation clustering are performed unsupervised and use anonymized routine images
exported from a PACS system. (b) Findings extracted from radiology reports are used
to evaluate if clusters reflect disease phenotypes in the population.
Relation to Previous Work. Radiomics [11] involving (a) imaging data, (b)
segmentation, (c) feature extraction and (d) analysis [10] has recently gained
significant attention, but approaches that reduce the reliance on annotation to extend the coverage of variability are scarce. Our work is a contribution in this direction. Although applicable to a large number of conditions, radiomics is
mostly applied and developed in oncology [1,3,11]. Aerts et al. use a large number
of routine CT images of cancer patients recorded on multiple sites to discover
prognostic tumor phenotypes [1]. Wibmer et al. differentiate malign from benign
prostate tissue by analysing texture features extracted from MRI images [17].
Shin et al. learn semantic associations between radiology images and reports
from a data set extracted from a PACS [14], but only use pre-selected 2D key slices that were referenced by clinicians.
The proposed radiomics approach differs from previous techniques in several
significant aspects. We do not restrict analysis to a certain disease type or a
small region of interest but implement a general form of population analysis.
The most significant difference to prior work is that human interaction is not a
prerequisite to bring images into processable form. We do not require selection of
key images [14] or manual annotation of regions of interest [1,11,17]. In order to
make this possible, spatial normalization involving localization and registration
is performed. The resulting non-linear mapping to a common reference space
allows coordinates and label-masks to be transferred across the population. We
extract texture and shape features and use Latent Dirichlet Allocation (LDA)
[2] to discover latent topics of co-occurring feature classes that are shared across
the population. Subsequently, these topics are used to build volume descriptors
by encoding the contribution of each topic to a specific subject.
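This descriptor-building step might be sketched with scikit-learn's LatentDirichletAllocation; the per-subject feature-class count matrix and the topic count are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def topic_descriptors(counts, n_topics=20, seed=0):
    """Fit LDA [2] on per-subject counts of quantized texture/shape
    feature classes (subjects x feature-classes) and return each
    subject's topic-proportion vector as its volume descriptor."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(np.asarray(counts))
```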
194 J. Hofmanninger et al.
2 Identification of Clusters
Spatial Normalization. We perform spatial normalization to establish spatial
correspondences of voxels across the population. This allows us to study location-dependent
visual variation without the need for manual definition of regions of
interest or preselection of imaging data showing only a specific organ. For this
purpose, we perform non-linear registrations of all images to a common reference
atlas. For a given image Ii ∈ {I1 , . . . , II } and an atlas A, we seek a non-linear
transformation T such that A ≈ T(Ii ). High variability in the data, such as
absent organs, variation in size and shape, or disease, poses challenges to
such a registration process. To account for part of this variability in the normalization
process, we implement a multi-template approach (Fig. 2). Instead of a
direct mapping to an atlas, images are registered to a set of template candidates
{E1 , . . . , EE } that cover variability in the population. The transformations of the
templates to the atlas are performed in advance, when building the template-set.
They are carefully supervised and supported by manually annotated landmarks
to ensure high quality registrations.
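The template selection and transform composition can be sketched as follows. This is a simplified illustration, not the paper's registration pipeline: transforms are modeled as 4×4 affine matrices (the paper uses non-linear registrations), similarity as normalized cross-correlation, and `register` stands in for an actual registration routine.

```python
# Sketch of the multi-template idea: pick the best-matching template,
# then compose image->template with the precomputed (carefully
# supervised) template->atlas transform.
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two images."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def normalize(image, templates, template_to_atlas, register):
    """Map an image into the atlas space via its best-matching template.
    template_to_atlas[k] is the precomputed transform of template k."""
    scores = [ncc(image, t) for t in templates]
    best = int(np.argmax(scores))
    img_to_template = register(image, templates[best])  # 4x4 affine here
    return template_to_atlas[best] @ img_to_template    # image -> atlas
```

Because the template-to-atlas transforms are built once and checked against manual landmarks, only the image-to-template step runs per image, and composition gives the full mapping to the reference space.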
Fig. 2. Multi-template spatial normalization: PACS images are registered to a set of templates, which are in turn mapped to the common atlas, yielding normalized images (panels a–c).
In most cases, radiology images cover a delimited region rather than the whole
body. To identify the location and extent of these fragments in the templates, we
Unsupervised Identification of Clinically Relevant Clusters 195
3 Evaluation
Data. Experiments are performed on a set of 7812 daily routine CT scans
acquired in the radiology department of a hospital. The dataset includes all
CT scans that were taken during a period of 2.5 years and show the lung. We
only include volumes with a slice thickness of ≤3 mm, where the number of slices
exceeds 100 and a high spatial frequency reconstruction kernel (e.g. B60, B70,
B80, I70, I80,. . . ) was used. For a subset of 5886 cases, the radiology reports in
the form of unstructured text are available.
Term Extraction. We build an NLP framework for the automatic extraction of
terms describing pathological findings in radiology reports. Extracted terms are
mapped to the RadLex2 ontology, which provides a unified vocabulary of clinical
terms, and models relationships by mapping into multiple hierarchies. One of
these hierarchies comprises all words that are related to pathological findings.
We identify pathological terms by searching for words and their synonyms in
the report that are part of this specific hierarchy. The words are then mapped
to their respective RadLex term. Our framework is furthermore able to identify
negations, so that explicitly negated terms are ignored. We define T as the
number of distinct pathological terms and substitute each term with an integer
in {1, . . . , T }. We define Ti as the set of all terms that occur in the radiology
report of subject i. For further analysis we only consider terms that occur more
than 50 times, resulting in a set of T = 69 distinct terms.
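A minimal sketch of this extraction step follows. It is illustrative only: the negation cues, the synonym table, and the window size are hypothetical stand-ins for the actual framework and the RadLex hierarchy.

```python
# Illustrative term extraction: map report words and synonyms to
# canonical pathology terms, skipping findings inside a simple
# negation window, then substitute terms by integer ids 1..T.
NEGATIONS = {"no", "without"}          # example negation cue words
SYNONYMS = {                           # hypothetical synonym -> term map
    "effusion": "pleural effusion",
    "emphysematous": "emphysema",
    "emphysema": "emphysema",
}

def extract_terms(report, window=3):
    words = report.lower().replace(".", " ").split()
    terms = set()
    for i, w in enumerate(words):
        if w in SYNONYMS:
            # ignore the finding if a negation cue closely precedes it
            if any(v in NEGATIONS for v in words[max(0, i - window):i]):
                continue
            terms.add(SYNONYMS[w])
    return terms

def to_ids(term_sets):
    """Replace each distinct term by an integer in {1, ..., T}."""
    vocab = {t: k + 1 for k, t in enumerate(sorted(set().union(*term_sets)))}
    return [{vocab[t] for t in s} for s in term_sets], vocab
```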
Evaluating Associations Between Visual Clusters and Report Terms.
For evaluation, we restrict the area of interest to the lung, so that only features
extracted in the lung are used. Clustering is performed on the full set of images,
while for evaluation only records with a report are considered. The aim of the
evaluation is to test the hypothesis that the clustering reflects pathological subgroups
in the population. To do so, we test whether volume label assignments
(pathology terms) are associated with cluster assignments. A cell-wise χ2 test is
performed for each term t ∈ {1, . . . , T } and each cluster k ∈ {1, . . . , K} to test
whether its cluster frequency V is significantly different from its population frequency
C using a 2 × 2 contingency table:
Here, B denotes the total number of subjects in the population and R the
size of a cluster. Since V is potentially small, we perform Fisher’s exact test. This
results in a p-value that gives the statistical significance of term t being over or
under represented in cluster k. Testing for each cluster independently increases
the Family-Wise Error (FWE) rate and inflates the probability of making a false
discovery of an association between a term and a cluster. We strongly control
the FWE by correcting the p-values with the Holm-Bonferroni approach. We
define ptk as the corrected p-value for term t being associated with cluster k and
ORtk as the corresponding Odds Ratio. As this is an exploratory analysis we do
not correct the p-values on the term level.
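The per-cell test and the step-down correction can be sketched with the standard library alone; this is an assumed reimplementation of the statistics described above, not the authors' code.

```python
# Per-(term, cluster) association test: a two-sided Fisher exact test
# on the 2x2 contingency table, followed by Holm step-down adjustment
# of the p-values across clusters (strong FWE control).
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]:
    sum of hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    def pmf(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    p_obs = pmf(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p for p in map(pmf, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

def holm(pvals):
    """Holm step-down adjusted p-values."""
    m, adj, running = len(pvals), [0.0] * len(pvals), 0.0
    for rank, i in enumerate(sorted(range(m), key=lambda i: pvals[i])):
        running = max(running, (m - rank) * pvals[i])
        adj[i] = min(1.0, running)
    return adj
```

The corresponding odds ratio of the 2×2 table is simply `(a * d) / (b * c)` when all cells are non-zero; values above 1 indicate over-representation of the term in the cluster.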
Quality Criterion of Clusters. We interpret the number of discovered asso-
ciations between cluster and terms as a measure of quality of the population
2 http://www.rsna.org/RadLex.
clustering. This not only allows us to quantify the relative quality of an image
descriptor, but also enables us to find the optimal number of clusters. For a
predefined number of clusters K we define the measure of quality

$$Q_K = \sum_{k=1}^{K} \sum_{t=1}^{T} [\,p_{tk} \le 0.05\,]. \qquad (2)$$
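The criterion amounts to counting the significant (term, cluster) associations after correction, as in this small sketch:

```python
# Cluster-quality criterion: count the corrected p-values that fall
# below the significance level across all terms and clusters.
def quality(p_corrected, alpha=0.05):
    """p_corrected[k][t] = corrected p-value of term t in cluster k."""
    return sum(p <= alpha for row in p_corrected for p in row)
```

The optimal number of clusters is then the K that maximizes `quality` over the candidate clusterings.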
4 Results
Figure 3 shows values of the quality criterion (Eq. 2) for various numbers of K
using the LDA volume descriptor f L for clustering. K-means is based on random
initialization. Thus, to rule out random effects, we perform the experiments with
a set of 5 different random seeds. Graphs are shown for each seed (gray), the
average result (blue) that was used to determine the number of clusters and
the random seed (red) for which the evaluation results are reported. Figure 4
shows a comparison of different feature sets (f H , f S and f L ) with respect to the
clustering quality QK . Concatenating texture and shape features [f H f S ] allows
us to discover more structure in the data than each feature set individually. The
LDA embedding f L further improves the number of associations discovered. For
all further results, the descriptor f L and K = 20 clusters are fixed. Figure 5
illustrates the visual variability of the data by showing a 2D visualization of
the f L descriptors using t-SNE [12]. In addition, exemplary slices of volumes
at different positions in the feature space are shown. Figure 6a illustrates all
associations discovered by population clustering. Positive associations (ORtk >
1) and negative associations (ORtk < 1) are shown for all ptk ≤ 0.05. Figure 6(b–
e) shows a comparison of 3 exemplary clusters, illustrating the raw features (b),
the embedding (c), a set of terms associated with the cluster (d), and
exemplary slices of volumes in the cluster (e).
Fig. 5. 2D visualization of the LDA image descriptors of 7812 volumes using t-SNE.
Exemplary volume slices from different areas in the feature space are given to illustrate
the visual variability in the population.
[Fig. 6 panel data, partially recoverable: (b) raw features include micro/macro SIFT and Haralick features (z-scored); (c) LDA embedding; (d) terms such as Emphysema, Effusion, Cyst, Bulla, Ascites, and Lymphoma; (e) per-cluster term lists, e.g. Cluster 1: Ascites (p < 0.001, OR 6.66), Haemorrhage (< 0.001, 5.99), Haematoma (< 0.001, 5.36), Compression (< 0.001, 5.12); Cluster 20: Bulla (< 0.001, 2.63), Sclerosis (0.004, 1.80), Mass (0.047, 1.75); a third cluster: Lymphoma (< 0.001, 4.25), Lesion (< 0.001, 2.10), Granuloma (0.002, 1.80), Sclerosis (0.004, 1.52). Legend: OR > 1, OR < 1, p > 0.05.]
Fig. 6. (a) Discovered associations between clusters (columns) and terms (rows). Terms
are sorted by decreasing occurrence frequency. Positive associations (OR > 1) are
indicated in red and negative associations (OR < 1) in blue. (b–e) Comparison
of three clusters: (b) raw features, (c) the LDA embedding, and (d) the appearance
of 6 terms that are overrepresented in one of these clusters. (e) Exemplary
volume slices of members and lists of up to 5 significantly overrepresented
terms with p-values and ORs of the respective clusters.
5 Conclusion
We propose a framework for visual population clustering of large clinical routine
imaging data. After spatial normalization, visual features are learned, and a
clustering is performed on the volume level. We evaluate the impact of features
on the clustering, and validate the clinical relevance of the resulting grouping
of patients based on corresponding radiology reports. Results show that the
References
1. Aerts, H.J., Velazquez, E.R., Leijenaar, R.T., et al.: Decoding tumour phenotype
by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 5
(2014)
2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn.
Res. 3, 993–1022 (2003)
3. Gillies, R.J., Kinahan, P.E., Hricak, H.: Radiomics: images are more than pictures,
they are data. Radiology 278(2), 563–577 (2016)
4. Göksel, O., Jiménez-del Toro, O.A., Foncubierta-Rodrı́guez, A., Muller, H.:
Overview of the VISCERAL challenge at ISBI. In: Proceedings of VISCERAL
Challenge at ISBI, New York, NY (2015)
5. Gruslys, A., Acosta-Cabronero, J., Nestor, P.J., et al.: A new fast accurate nonlin-
ear medical image registration program including surface preserving regularization.
IEEE Trans. Med. Imaging 33(11), 2118–2127 (2014)
6. Haralick, R.M., Shanmugam, K., et al.: Textural features for image classification.
IEEE Trans. Syst. Man Cybern. 3(6), 610–621 (1973)
7. Hofmanninger, J., Langs, G.: Mapping visual features to semantic profiles for
retrieval in medical imaging. In: Proceedings of IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pp. 457–465 (2015)
8. Toews, M., Wachinger, C., Estepar, R.S.J., Wells, W.M.: A feature-based approach
to big data analysis of medical images. In: Ourselin, S., Alexander, D.C.,
Westin, C.-F., Cardoso, M.J. (eds.) IPMI 2015. LNCS, vol. 9123, pp. 339–350.
Springer, Heidelberg (2015). doi:10.1007/978-3-319-19992-4 26
9. Kumar, R.S., Senthilmurugan, M.: Content-based image retrieval system in medical
applications. Int. J. Eng. Res. Technol. 2(3) (2013)
10. Kumar, V., Gu, Y., et al.: Radiomics: the process and the challenges. Magn. Reson.
Imaging 30(9), 1234–1248 (2012)
11. Lambin, P., Rios-Velazquez, E., Leijenaar, R., et al.: Radiomics: extracting more
information from medical images using advanced feature analysis. Eur. J. Cancer
48(4), 441–446 (2012)
12. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn.
Res. 9, 2579–2605 (2008)
13. Mondal, P., Mukhopadhyay, J., Sural, S., Bhattacharyya, P.P.: 3D-sift feature
based brain atlas generation: an application to early diagnosis of Alzheimer’s dis-
ease. In: International Conference on Medical Imaging, m-Health and Emerging
Communication Systems, pp. 342–347. IEEE (2014)
14. Shin, H.C., Lu, L., Kim, L., Seff, A., Yao, J., Summers, R.M.: Interleaved
text/image deep mining on a very large-scale radiology database. In: Proceed-
ings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
pp. 1090–1099 (2015)
15. Toews, M., Wells, W.M.: Efficient and robust model-to-image alignment using 3D
scale-invariant features. Med. Image Anal. 17(3), 271–282 (2013)
16. Vogl, W.-D., Prosch, H., Müller-Mang, C., Schmidt-Erfurth, U., Langs, G.: Lon-
gitudinal alignment of disease progression in fibrosing interstitial lung disease.
In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI
2014. LNCS, vol. 8674, pp. 97–104. Springer, Heidelberg (2014). doi:10.1007/
978-3-319-10470-6 13
17. Wibmer, A., et al.: Haralick texture analysis of prostate MRI: utility for differentiating
non-cancerous prostate from prostate cancer and differentiating prostate
cancers with different Gleason scores. Eur. Radiol. 25(10), 2840–2850 (2015)
Probabilistic Tractography for Topographically
Organized Connectomes
1 Introduction
Tractography is a widely used technique for studying brain connectomes with
diffusion MRI (dMRI) and has provided many exciting results in brain imaging
research [1]. The lack of rigorous validation for in vivo human brain studies,
however, has long been a critical obstacle to establishing tractography as a quantitative
tool [2,3]. On the other hand, the regular topographic organization of
many fiber systems in the human brain provides a surprisingly untapped source of
anatomical knowledge for the improvement and validation of tractography techniques. Some
of the well-known examples include the retinotopic organization of the visual
pathway [4], the somatotopic organization of the somatosensory pathway [5],
and the tonotopic organization of the auditory pathway [6]. In this paper, we
incorporate this insight on anatomical regularity to develop a novel probabilistic
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 201–209, 2016.
DOI: 10.1007/978-3-319-46720-7 24
202 D.B. Aydogan and Y. Shi
2 Methods
Fig. 1. (a) At t = 0, without prior information, we select a random curve among the red
candidate curves based on their likelihood; the thicker the curve, the higher its posterior
probability. (b) By solving the Frenet-Serret ODE, we propagate by Δs. (c) At
t = 1, we calculate the prior probability of the candidate curves for a smooth transition
from the previous curve c1p1, shown in green. (d) Propagate to p2 .
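The Frenet-Serret propagation step in panel (b) can be sketched numerically as a simple Euler integration of the frame ODE; this is an illustrative sketch, not the paper's integrator, and the step sizes below are arbitrary.

```python
# Euler integration of the Frenet-Serret equations with constant
# curvature kappa and torsion tau:
#   T' = kappa*N,  N' = -kappa*T + tau*B,  B' = -tau*N,  p' = T.
import numpy as np

def propagate(p, T, N, B, kappa, tau, ds, n_steps):
    """Advance position p and frame (T, N, B) by n_steps of size ds."""
    for _ in range(n_steps):
        p = p + ds * T
        T, N, B = (T + ds * kappa * N,
                   N + ds * (-kappa * T + tau * B),
                   B + ds * (-tau * N))
        # re-orthonormalize to keep the frame numerically valid
        T = T / np.linalg.norm(T)
        N = N - T * (T @ N)
        N = N / np.linalg.norm(N)
        B = np.cross(T, N)
    return p, T, N, B
```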
Bayesian Inference: Given a curve at time t, $c^{t}_{p_t}$, and data D, we estimate the
posterior probability for the next curve $c^{t+1}_{p_t}$ using Bayesian inference as follows:

$$p(c^{t+1}_{p_t} \mid c^{t}_{p_t}) = \underbrace{p(F^{t+1} \mid F^{t}, \sigma_T^2, \sigma_N^2, \sigma_B^2)}_{p(T^{t+1} \mid T^{t}, \sigma_N^2, \sigma_B^2)\, p(N^{t+1} \mid N^{t}, \sigma_T^2, \sigma_B^2)\, p(B^{t+1} \mid B^{t}, \sigma_T^2, \sigma_N^2)} \; p(\kappa^{t+1} \mid \kappa^{t}, \sigma_\kappa^2)\, p(\tau^{t+1} \mid \tau^{t}, \sigma_\tau^2) \qquad (2)$$
The functions used for computing the prior probability are given in Eq. 3.
$$\begin{aligned}
p(T^{t+1} \mid T^{t}, \sigma_N^2, \sigma_B^2) &= (2\pi\sigma_N\sigma_B)^{-1}\, e^{-\mathrm{acos}^2\langle \bar{T}, T^{t}\rangle/2\sigma_N^2 \,-\, \mathrm{acos}^2\langle T^{t+1}, \bar{T}\rangle/2\sigma_B^2} \\
p(N^{t+1} \mid N^{t}, \sigma_T^2, \sigma_B^2) &= (2\pi\sigma_T\sigma_B)^{-1}\, e^{-\mathrm{acos}^2\langle \bar{N}, N^{t}\rangle/2\sigma_T^2 \,-\, \mathrm{acos}^2\langle N^{t+1}, \bar{N}\rangle/2\sigma_B^2} \\
p(B^{t+1} \mid B^{t}, \sigma_T^2, \sigma_N^2) &= (2\pi\sigma_T\sigma_N)^{-1}\, e^{-\mathrm{acos}^2\langle \bar{B}, B^{t}\rangle/2\sigma_T^2 \,-\, \mathrm{acos}^2\langle B^{t+1}, \bar{B}\rangle/2\sigma_N^2} \\
p(\kappa^{t+1} \mid \kappa^{t}, \sigma_\kappa^2) &= (\sqrt{2\pi}\,\sigma_\kappa)^{-1}\, e^{-(\Psi(\kappa^{t+1})-\Psi(\kappa^{t}))^2/(2\Psi(\sigma_\kappa)^2)} \\
p(\tau^{t+1} \mid \tau^{t}, \sigma_\tau^2) &= (\sqrt{2\pi}\,\sigma_\tau)^{-1}\, e^{-(\tau^{t+1}-\tau^{t})^2/(2\sigma_\tau^2)}
\end{aligned} \qquad (3)$$

In Eq. 3, $\langle\cdot,\cdot\rangle$ is the dot product, $\bar{T}$, $\bar{N}$ and $\bar{B}$ are the intermediary rotations,
and $\Psi(\kappa) = \mathrm{asin}(\kappa)$ is used to linearize the change in curvature.
Likelihood Estimation: We estimate the data support for the next curve
using the parallel curve definition and fiber orientation distribution (FOD). The
field of FODs over an image volume can be expressed as a spherical function
D(p, T ) : R3 × S 2 → R, where p ∈ R3 is the position of a point in the dMRI
image and T ∈ S 2 . We define our likelihood expression as follows:
$$p(D \mid c^{t+1}_{p_t}) = \frac{1}{\tfrac{4}{3}\pi r^3} \int_{\forall c_p} \int_{\,|p_t - \gamma_{c_p}(s)| \le r} D\big(\gamma_{c_p}(s),\, T_{c_p}(s)\big)\, ds\, dc_p \qquad (4)$$
Fig. 2. (a) To estimate the likelihood of a curve, we randomly pick a number of points
within the integration radius, r. (b) Parallel curves passing through the random points.
(c) We compute the tangents of parallel curves for each point and obtain the average
FOD. (d, e) Estimated likelihoods are shown in proportion to the thickness of the curves.
then compute the tangents and interpolate the FODs at these points. The final
likelihood is obtained by averaging the data support contributed by
these points. In Fig. 2(d) and (e), the estimated likelihoods of two different sets
of parallel curves are visualized, where the thickness of the fibers is proportional
to their likelihood.
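The Monte Carlo likelihood estimation described above can be sketched as follows. The FOD field `fod` is a toy stand-in function, and the first-order approximation that parallel curves share the candidate's tangent is a simplification of the paper's parallel-curve construction.

```python
# Monte Carlo sketch of Eq. 4: sample points inside the integration
# radius r around the curve, evaluate the FOD field along the tangents
# of the parallel curves through them, and average the data support.
import numpy as np

def likelihood(center, tangent, fod, r=2.0, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        # rejection-sample a point in the ball of radius r around center
        while True:
            q = rng.uniform(-r, r, size=3)
            if q @ q <= r * r:
                break
        # to first order, the parallel curve through center+q has the
        # same tangent as the candidate curve
        total += fod(center + q, tangent)
    return total / n_samples
```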
Table 1. Tractography parameters used for each technique. vs is voxel size, ° is degree.

Method            | Step (vs) | Angle | Cutoff | σT², σN², σB²  | σκ² (vs) | στ² (vs) | r (vs)
Our method        | 0.001     | 60°   | 0.04   | 1.25°, 1.25°   | 0.2      | 0.2      | 2
MRtrix3 iFOD2     | 0.2       | 22°   | 0.04   |                |          |          |
MRtrix3 iFOD1     | 0.1       | 11°   | 0.04   |                |          |          |
MRtrix3 SD STREAM | 0.1       | 60°   | 0.02   |                |          |          |
4 Results
Qualitative Evaluation: Reconstruction results of the left optic radiation of
an HCP subject by our method and MRtrix algorithms are shown in Fig. 3. We
can clearly see that our results are more desirable, as they successfully
capture Meyer's loop while exhibiting highly organized trajectories. As the
tracks approach the V1 cortex, the probabilistic tractography results
from iFOD2 and iFOD1 start to become topographically less organized.
Fig. 3. Qualitative comparison of our method with MRTrix algorithms on the recon-
struction of a left optic radiation of an HCP subject.
Fig. 4. Quantitative comparison of the bundles shown in Fig. 3. The top row shows the labels
for three sub-bundles of the optic radiation. The bottom row shows the eccentricity values
and the quality of the quadratic fit using MSE and R2 . The low MSE and high R2 values
obtained by the proposed technique corroborate the qualitative observation.
relation as plotted in Fig. 4(f), where the black dots are the raw data and the
colored points are the fitted value with quadratic regression. In Fig. 4(g)–(j),
the eccentricity values of each bundle on the cross-section are visualized, where
the color bar for eccentricity is shown on the rightmost image. To quantitatively
assess this relation, we applied quadratic regression to model the relation
between eccentricity and the cross-sectional coordinates of the fiber bundles. We
report both the mean square error (MSE) and coefficient of determination (R2 )
to measure how well the fiber tracks preserve the retinotopy of the fiber bundle.
For each technique, the mean values of these two measures over 56 HCP subjects
are listed in Table 2, where we can see that our method achieves the best
performance in both measures.
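The quadratic fit and its two quality measures can be sketched as follows; this is an assumed reimplementation of the standard regression diagnostics named in the text, not the authors' evaluation code.

```python
# Fit eccentricity as a quadratic function of the cross-sectional
# coordinate and report MSE and the coefficient of determination R^2.
import numpy as np

def quadratic_fit(coord, ecc):
    coef = np.polyfit(coord, ecc, deg=2)
    pred = np.polyval(coef, coord)
    mse = float(np.mean((ecc - pred) ** 2))
    ss_res = float(np.sum((ecc - pred) ** 2))
    ss_tot = float(np.sum((ecc - ecc.mean()) ** 2))
    return mse, 1.0 - ss_res / ss_tot
```

A low MSE and an R² near 1 indicate that the eccentricity values vary smoothly (quadratically) across the bundle cross-section, i.e. that the tracks preserve retinotopy.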
bundles. We also did not use synthetic phantoms or simulated tracks, since
retinotopic maps are available for in vivo validation.
In summary, we developed a novel probabilistic tractography technique that
aims to capture the topographic organization of fiber bundles. A key idea in our
method is the use of parallel curves to examine the local fitting of fiber tracks
to the underlying field of FODs. Using the retinotopic mapping on V1 cortex,
we have conducted quantitative evaluations and demonstrated that our method
generates more organized fiber tracks that follow the known anatomy of
the visual system. For future work, we will conduct more extensive validations
on the visual pathway, its connectivity maps and other bundles that also follow
topographic organizations such as the auditory and somatosensory pathways.
References
1. Fillard, P., Descoteaux, M., Goh, A., Gouttard, S., Jeurissen, B., Malcolm, J.,
Ramirez-Manzanares, A., Reisert, M., Sakaie, K., Tensaouti, F., Yo, T., Mangin,
J.F., Poupon, C.: Quantitative evaluation of 10 tractography algorithms on a real-
istic diffusion MR phantom. NeuroImage 56(1), 220–234 (2011)
2. Côté, M.A., Girard, G., Bor, A., Garyfallidis, E., Houde, J.C., Descoteaux, M.:
Tractometer: towards validation of tractography pipelines. Med. Image Anal.
17(7), 844–857 (2013)
3. Thomas, C., Ye, F.Q., Irfanoglu, M.O., Modi, P., Saleem, K.S., Leopold, D.A.,
Pierpaoli, C.: Anatomical accuracy of brain connections derived from diffusion
MRI tractography is inherently limited. PNAS 111(46), 16574–16579 (2014)
4. Engel, S.A., Glover, G.H., Wandell, B.A.: Retinotopic organization in human visual
cortex and the spatial precision of functional MRI. Cereb. Cortex 7(2), 181–192
(1997)
5. Ruben, J., Schwiemann, J., Deuchert, M., Meyer, R., Krause, T., Curio, G., Vill-
ringer, K., Kurth, R., Villringer, A.: Somatotopic organization of human secondary
somatosensory cortex. Cereb. Cortex 11(5), 463–473 (2001)
6. Morosan, P., Rademacher, J., Schleicher, A., Amunts, K., Schormann, T., Zilles,
K.: Human primary auditory cortex: cytoarchitectonic subdivisions and mapping
into a spatial reference system. NeuroImage 13(4), 684–701 (2001)
7. Tournier, J.D., Calamante, F., Connelly, A.: MRtrix: diffusion tractography in
crossing fiber regions. Int. J. Imaging Syst. Technol. 22(1), 53–66 (2012)
8. Reisert, M., Mader, I., Anastasopoulos, C., Weigel, M., Schnell, S., Kiselev, V.:
Global fiber reconstruction becomes practical. NeuroImage 54(2), 955–962 (2011)
9. Mangin, J.F., Fillard, P., Cointepas, Y., Le Bihan, D., Frouin, V., Poupon, C.:
Toward global tractography. NeuroImage 80, 290–296 (2013)
10. Daducci, A., Dal Palu, A., Lemkaddem, A., Thiran, J.P.: COMMIT: convex opti-
mization modeling for microstructure informed tractography. IEEE Trans. Med.
Imaging 34(1), 246–257 (2015)
11. Smith, R.E., Tournier, J.D., Calamante, F., Connelly, A.: SIFT2: enabling dense
quantitative assessment of brain white matter connectivity using streamlines trac-
tography. NeuroImage 119, 338–351 (2015)
12. Tran, G., Shi, Y.: Fiber orientation and compartment parameter estimation from
multi-shell diffusion imaging. IEEE Trans. Med. Imaging 34(11), 2320–2332 (2015)
13. Van Essen, D., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T., Bucholz, R.,
Chang, A., Chen, L., Corbetta, M., Curtiss, S., Penna, S.D., Feinberg, D., Glasser,
M., Harel, N., Heath, A., Larson-Prior, L., Marcus, D., Michalareas, G., Moeller, S.,
Oostenveld, R., Petersen, S., Prior, F., Schlaggar, B., Smith, S., Snyder, A., Xu,
J., Yacoub, E.: The human connectome project: a data acquisition perspective.
NeuroImage 62(4), 2222–2231 (2012)
14. Kammen, A., Law, M., Tjan, B.S., Toga, A.W., Shi, Y.: Automated retinofugal
visual pathway reconstruction with multi-shell HARDI and FOD-based analysis.
NeuroImage 125, 767–779 (2016)
15. Tournier, J.D., Calamante, F., Connelly, A.: Improved probabilistic streamlines
tractography by 2nd order integration over fibre orientation distributions. In: Pro-
ceedings of 18th Annual Meeting of the International Society for Magnetic Reso-
nance in Medicine (ISMRM), p. 1670 (2010)
16. Benson, N.C., Butt, O.H., Datta, R., Radoeva, P.D., Brainard, D.H., Aguirre,
G.K.: The retinotopic organization of striate cortex is well predicted by surface
topology. Curr. Biol. 22(21), 2081–2085 (2012)
A Hybrid Multishape Learning Framework
for Longitudinal Prediction of Cortical Surfaces
and Fiber Tracts Using Neonatal Data
Islem Rekik, Gang Li, Pew-Thian Yap, Geng Chen, Weili Lin,
and Dinggang Shen(B)
Abstract. Dramatic changes of the human brain during the first year
of postnatal development are poorly understood due to their multifold
complexity. In this paper, we present the first attempt to jointly pre-
dict, using neonatal data, the dynamic growth pattern of brain cortical
surfaces (collection of 3D triangular faces) and fiber tracts (collection of
3D lines). These two entities are modeled jointly as a multishape (a set
of interlinked shapes). We propose a hybrid learning-based multishape
prediction framework that captures both the diffeomorphic evolution of
the cortical surfaces and the non-diffeomorphic growth of fiber tracts. In
particular, we learn a set of geometric and dynamic cortical features and
fiber connectivity features that characterize the relationships between
cortical surfaces and fibers at different timepoints (0, 3, 6, and 9 months
of age). Given a new neonatal multishape at 0 months of age, we hierarchically
predict, at 3, 6 and 9 months, the postnatal cortical surfaces
vertex-by-vertex, along with the fibers connected to the faces adjacent to these
vertices. This is achieved using a new fiber-to-face metric that quantifies
the similarity between multishapes. For validation, we propose several
evaluation metrics to thoroughly assess the performance of our frame-
work. The results confirm that our framework yields good prediction
accuracy of complex neonatal multishape development within a few sec-
onds.
1 Introduction
Knowledge about postnatal brain development fuels our understanding of cog-
nition, actions, sensation, perception, decision, and thought. From a modeling
perspective, one could see the developing brain as characterized by complex and
dynamic interactions of multiple shapes, comprising highly folded cortical sur-
faces and white matter fiber tracts that are evolving rapidly due to myelination.
Developing models that accurately capture the spatiotemporal growth of a spe-
cific multishape (here, tract and cortical surface) can help the investigation of
This work was supported in part by NIH grants (NS093842, EB006733, EB008374,
EB009634, AG041721, MH107815, MH108914, and MH100217).
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 210–218, 2016.
DOI: 10.1007/978-3-319-46720-7 25
Fig. 1. Training steps of hybrid multishape prediction framework for one training sub-
ject. (Top row) Estimate the baseline cortical surface diffeomorphic deformation trajec-
tory through the diffeomorphism φ using [6]. (Middle row) Whole-brain deterministic
tractography to estimate diffusion fiber tracts {Fi } at each acquisition timepoint. The
red box demonstrates the non-diffeomorphic nature of fiber growth. (Bottom row)
Non-diffeomorphic projection, using π Ai , of the training longitudinal fiber tracts onto the
estimated longitudinal mean atlas {Ai }.
the first year [1]. Put together, these facts present key challenges for predicting
subject-specific postnatal brain multishape development solely from the neonatal
multishape. To the best of our knowledge, this problem has not been
addressed before.
Noting the limited works targeting the prediction of subject-specific postnatal
cortical shape development from a single timepoint [5], we propose in this work
the first learning-based multishape prediction framework from neonatal cortex
and fibers. The proposed framework comprises training and testing stages. In
the training stage, for each infant, we learn from the training subjects (1) the
geometric features (surface vertices), (2) the dynamic features of the baseline
cortical surface development (smooth and invertible evolution trajectories), and
(3) the fiber-to-face connectivity features via projections on an empirical longi-
tudinal cortical surface atlas. In the testing stage, for a new neonatal multishape,
we hierarchically select the best learned features that simultaneously predict the
triangular faces on the cortical surface (or meshes) and the fibers traversing
them at all training timepoints (in our case, 3, 6 and 9 months of age) based on
cortical shape topographic properties and a novel fiber-face selection criterion.
Our proposed method has several advantages. First, it is not restricted
to predicting only the cortical surface growth as in [5]. Second, it does not require
the computationally expensive process of registering or regressing out thousands
of fibers to establish tract-to-tract correspondence for prediction, which is less
likely to be achieved using a conventional diffeomorphic multishape registration
setting as in [2]. Third, it relies on the diffeomorphic cortical surface deforma-
tion trajectory, which is less complex and more accurate to estimate than for
developing fibers, to guide fiber prediction. More importantly, this enables us to
account for fiber connectivity changes and the appearance of ‘new’ fibers with
different topologies. Ultimately, we present a new metric for jointly predicting
both diffeomorphic surface evolution and non-diffeomorphic fiber growth within
the multishape, thus making our approach hybrid.
along its nonoriented normal vectors n and principal curvature direction. More
simply, measuring a fiber F as a varifold refers to the mathematical operation of
integrating ω along the fiber's nonoriented tangent vectors τ: $F(\omega) = \int_F \omega(x)^t \tau(x)\,dx$.
In this context, W is defined as a Reproducing Kernel Hilbert Space (RKHS)
with a Gaussian kernel $K_W(x, y) = \exp(-|x - y|^2/\sigma_W^2)$. The kernel decays at a
rate σW , which defines the scale under which geometric details will be overlooked
when converting a shape into a varifold. Hence, any discrete shape embedded
in the varifold space W ∗ is a summation of local discrete measurements, each
encoding the interaction of the shape at a local scale with a vector field ω [2,6].
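One common discrete form of the varifold inner product for polylines can be sketched as follows. This is a generic illustration of the varifold representation under stated simplifications (segment centers as evaluation points, a Gaussian kernel, squared non-oriented tangent alignment), not the specific discretization used in the paper.

```python
# Discrete varifold inner product between two polylines: sum over all
# segment pairs of a Gaussian position kernel times the squared,
# non-oriented alignment of the segment tangents.
import numpy as np

def varifold_inner(curve_a, curve_b, sigma_w=1.0):
    ca, ta = 0.5 * (curve_a[1:] + curve_a[:-1]), np.diff(curve_a, axis=0)
    cb, tb = 0.5 * (curve_b[1:] + curve_b[:-1]), np.diff(curve_b, axis=0)
    total = 0.0
    for ci, ti in zip(ca, ta):
        for cj, tj in zip(cb, tb):
            k = np.exp(-np.sum((ci - cj) ** 2) / sigma_w ** 2)
            total += k * (ti @ tj) ** 2 / (
                np.linalg.norm(ti) * np.linalg.norm(tj) + 1e-12)
    return float(total)

def varifold_norm(curve, sigma_w=1.0):
    return varifold_inner(curve, curve, sigma_w) ** 0.5
```

Because the tangent term is squared, reversing a curve's orientation leaves its norm unchanged, which is exactly the non-oriented behavior the varifold representation is chosen for.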
Diffeomorphic Geodesic Longitudinal Surface Regression for Extract-
ing Geometric and Dynamic Features. To longitudinally deform a source
varifold surface S0 observed at t0 into a set of target varifold surfaces
{S1 , . . . , SN } respectively observed at {t1 , . . . , tN }, we adopt the Hamiltonian
formulation as described in [2,5,6] to estimate a diffeomorphism
φ(x, t), t ∈ [0, 1], which is fully parameterized by a set of control points ck
and their attached initial deformation momenta αk . The initial momenta fully
guide the geodesic shooting of S0 onto subsequent surfaces and are estimated
along with the control points by minimizing the following energy functional
using conjugate gradient descent [5]: $E = \frac{1}{2}\int_0^1 |v_t|_V^2\, dt + \gamma \sum_{j\in\{1,\dots,N\}} \|S_j - \phi(S_0, t_j)\|_{W^*}^2$,
with γ denoting the trade-off between the deformation smoothness term and
the fidelity-to-data term. The velocity field vt belongs to a RKHS V with a
Gaussian kernel KV decaying at rate σV , and is defined at a location x and
timepoint t in terms of convolutions as: $v(x, t) = \sum_{k=1}^{N_c} K_V(x, c_k(t))\, \alpha_k(t)$,
with Nc the number of estimated control points. This allows us to establish
vertex-to-vertex correspondence across subjects and timepoints. For prediction,
we define the set of geometric features V as the set of all vertex positions x
belonging to the baseline training surfaces, and the dynamic features as their
corresponding evolution trajectories φ(x, t).
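The control-point parameterization of the velocity field can be sketched directly from the convolution formula; the kernel width and all inputs below are illustrative.

```python
# v(x, t) = sum_k K_V(x, c_k(t)) * alpha_k(t) with a Gaussian kernel.
import numpy as np

def velocity(x, control_points, momenta, sigma_v=1.0):
    """x: (3,); control_points: (Nc, 3); momenta: (Nc, 3)."""
    d2 = np.sum((control_points - x) ** 2, axis=1)
    weights = np.exp(-d2 / sigma_v ** 2)   # K_V(x, c_k)
    return weights @ momenta               # sum_k K_V(x, c_k) alpha_k
```

Since the same control points and momenta drive the flow everywhere, integrating this field moves every vertex consistently, which is what yields vertex-to-vertex correspondence across subjects and timepoints.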
Estimation of Non-diffeomorphic Longitudinal Fiber-to-Face Connec-
tivity Features Using Multi-projections on Spatiotemporal Atlases.
Since we aim to predict the multishape growth from a single timepoint, we
estimate a set of spatiotemporal surface atlases {A0 , . . . , AN } by averaging the
shapes of the training surfaces at each timepoint to help guide the prediction
process (Fig. 1). Note that all these atlases are in correspondence with all subjects
and across all acquisition timepoints. Then, to define the fiber-to-face connec-
tivity features that capture the non-diffeomorphic growth of neonatal fibers, for
each ensemble of fibers Fi from a training subject at ti , we introduce the sur-
jective projection function π Ai (Fi ) to project it onto the corresponding surface
atlas Ai . Specifically, for a fiber line f ∈ Fi with two extremities f 1 and f 2 , we
perform: f k → π Ai (f k ) = ξ, where k ∈ {1, 2} and ξ denotes a face in Ai . In
turn, this allows us to identify, for each training subject, the connectivity features
for each face in the atlas Ai at a specific timepoint ti as the set of proximal
fibers that hit it or are 'connected' to it (denoted Fi (ξ)) (Fig. 1). To define
the connectivity features from all training subjects, we independently project
the set of fibers for each training subject on the atlas. Hence, each atlas face
stores for each training subject a set of connecting fibers through this process of
multi-projections onto a fixed atlas.
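The projection π can be sketched as a nearest-face lookup over the fiber termini; approximating faces by their centroids is a simplification introduced here for illustration, not the paper's exact surjective projection.

```python
# Snap each fiber endpoint f^1, f^2 to the nearest atlas face (here
# approximated by face centroids) and store, per face, the indices of
# the fibers connected to it.
import numpy as np
from collections import defaultdict

def project_fibers(face_centroids, fibers):
    """face_centroids: (F, 3); fibers: list of (n_points, 3) polylines.
    Returns a mapping face index -> list of connecting fiber indices."""
    connectivity = defaultdict(list)
    for fi, fiber in enumerate(fibers):
        for endpoint in (fiber[0], fiber[-1]):   # the two termini
            face = int(np.argmin(
                np.sum((face_centroids - endpoint) ** 2, axis=1)))
            if fi not in connectivity[face]:
                connectivity[face].append(fi)
    return connectivity
```

Running this once per training subject against the fixed atlas yields, for every atlas face, one fiber set per subject, i.e. the multi-projection structure described above.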
In the prediction stage, we first warp all baseline training surfaces onto the
baseline cortical surface of a testing subject. Then, in the common space, we
estimate the baseline testing fiber tracts using deterministic whole-brain tractography.
Because of the non-diffeomorphic nature of neonatal fiber growth,
we avoid diffeomorphically regressing fibers for prediction, as is done for surfaces;
instead, we exploit the fiber-cortex relationship (or connectivity) to guide the
fiber prediction. Hence, we introduce the following fiber-face selection criterion.
Fiber-face Selection Criterion. We define a distance between two faces ξ
and ξ′, with F(ξ) = {f1 , . . . , fN } and F(ξ′) = {f′1 , . . . , f′N′ } the sets of fibers
connected to them, that combines a shape term dshape (ξ, ξ′), a termini term
dtermini (ξ, ξ′), and a connectivity term dconnectivity (ξ, ξ′). The first term measures the
overall shape difference between the fibers attached to faces ξ and ξ′ using the
varifold metric: $d_{\mathrm{shape}}(\xi, \xi') = \big| \tfrac{1}{N}\sum_{k=1}^{N} \|f_k\|_{W^*} - \tfrac{1}{N'}\sum_{j=1}^{N'} \|f'_j\|_{W^*} \big|$.
The second term quantifies the geometric closeness of the fiber termini positions:
$d_{\mathrm{termini}}(\xi, \xi') = \tfrac{1}{2}\big( \big|\tfrac{1}{N}\sum_{k=1}^{N} f_k^1 - \tfrac{1}{N'}\sum_{j=1}^{N'} f'^{1}_j\big| + \big|\tfrac{1}{N}\sum_{k=1}^{N} f_k^2 - \tfrac{1}{N'}\sum_{j=1}^{N'} f'^{2}_j\big| \big)$.
since each of its faces stores the set of its connecting fibers from all training
subjects. Ultimately, for each marked training face, we trace its diffeomorphic
deformation using φ, while retrieving the set of its connecting fibers at different
acquisition timepoints ti , thereby estimating F̃i .
For a pair of faces both with traversing fibers, we use the varifold metric to measure a face-wise discrepancy between the ground-truth and predicted fibers F and F̃ connected to two surfaces S and S̃:

(1/NS) Σ_{i=1}^{NS} | ||F^{ξi}||_W∗ − ||F̃^{ξi}||_W∗ |,

with NS denoting the number of faces in S, and ξi a face in S. (3) Fiber mismatch per face. This metric represents the average number of mismatched fibers per face across surface faces that are hit by either predicted or ground-truth fibers or both. We also evaluate the joint prediction accuracy for both surface and tracts using a unified varifold difference metric:

(1/NS) Σ_{i=1}^{NS} ( | ||F^{ξi}||_W∗ − ||F̃^{ξi}||_W∗ | + | ||Si||_W∗ − ||S̃i||_W∗ | ).
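A small sketch of these face-wise evaluation metrics (hypothetical helpers; the per-face varifold norms and per-face fiber index sets are assumed to be precomputed):

```python
import numpy as np

def unified_varifold_error(F_norms, Fpred_norms, S_norms, Spred_norms):
    """Unified varifold difference metric: the mean over faces of the fiber-norm
    discrepancy plus the surface-norm discrepancy."""
    F_norms, Fpred_norms = np.asarray(F_norms, float), np.asarray(Fpred_norms, float)
    S_norms, Spred_norms = np.asarray(S_norms, float), np.asarray(Spred_norms, float)
    return float(np.mean(np.abs(F_norms - Fpred_norms) + np.abs(S_norms - Spred_norms)))

def fiber_mismatch_per_face(gt_sets, pred_sets):
    """Average number of mismatched fibers per face, over the faces hit by
    ground-truth fibers, predicted fibers, or both (set symmetric difference)."""
    hit = [i for i in range(len(gt_sets)) if gt_sets[i] or pred_sets[i]]
    return sum(len(gt_sets[i] ^ pred_sets[i]) for i in hit) / len(hit)
```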
Multishape Prediction Evaluation. Despite the small size of our dataset and
its large variability in cortical shape and fiber tracts, our framework led to very
promising results as summarized in Table 1. Since this is the first work to pre-
dict developing cortical fibers, we compared our prediction error with the error
of the observable baseline multishape reconstruction from the baseline ground
truth multishape, which is very low (0 month in Table 1). We notice that the
prediction accuracy generally decreases from 3 to 9 months compared to the
baseline reconstruction from the ground truth, with a slight potential improve-
ment at 6 months. Notably, the global mismatch for the predicted fibers peaks
at 3 months. This is quite expected since the training fibers at around 3 months
are largely variable due to the rapidly developing myelination. Moreover, the
proposed rich fiber-face selection criterion generated better prediction results
compared to using symmetric Euclidean distance as a similarity metric between
fibers for face-fiber selection. Indeed, mean fiber mismatch per face dropped from
1.76 to 1.64 and mean varifold value from 19.98 to 18.83 when using our metric.
Figure 2 shows a good overall overlap between ground truth and predicted fibers
for a representative testing subject. The red-blue fiber mismatch regions can be
explained by a large variability in the training fiber data as well as the use of
inconsistent subject-specific tractography in the temporal domain. Additionally,
we locally evaluated the accuracy of our prediction method in 35 anatomical
cortical regions (Fig. 3), which showed a spatially-varying prediction accuracy
that generally decreased with time. Nonetheless, it still fitted into a promising
range of prediction values for each evaluation metric (e.g., ∼3 mismatched fibers per face).

Table 1. Surface (S) and fiber (F) prediction accuracy evaluation averaged across 10 cortical hemispheres. The baseline multishape reconstruction error (in bold) is considered as a ‘reference’ in assessing the performance of our prediction framework.

Fig. 2. Multishape prediction for a representative subject. The blue multishape represents the ground truth while the one in red represents the predicted multishape. The reconstructed baseline multishape (S̃0, F̃0) is used as guidance for multishape prediction at late timepoints and as a reference for evaluation.

For the cortical surface, the prediction accuracy mainly dropped in highly folded
and buried cortical regions such as the insular cortex. On the other hand, the
prediction error of the overall shape of the predicted fiber tracts compared with
the ground truth tracts, quantified using the varifold distance, reached its apex
in the paracentral lobule, the posterior cingulate cortex and the precentral gyrus.
This can be explained by large variability in the shape of the fibers connected to
these regions. For potentially similar reasons, the mean face-wise mismatch was
below 15 % in most cortical regions, except for the anterior and posterior cingu-
late cortices, and the insular cortex. These regions were also affected by the largest
values of mean fiber mismatch per face (which generally remained below 5).
5 Conclusion
We proposed the first hybrid developing multishape prediction model
that captured well both the diffeomorphic cortical shape deformation and
non-diffeomorphic fiber tract growth. Our method leveraged the fiber-surface relationship through multi-projections of fiber termini onto the corresponding surface. Our prediction results are promising and we hope that in
the light of this work more attention will be drawn to solving this challenging
problem. Eventually, building an accurate and fast multishape prediction model
can also help predict structural brain connectivity of axonal wiring during early
postnatal stages. One way to improve our work is to develop a longitudinally consistent, non-diffeomorphic brain tractography algorithm as a preprocessing step – existing tractography, to our knowledge, is still not tailored to handle developing 3D fiber tracts.
Learning-Based Topological Correction
for Infant Cortical Surfaces
1 Introduction
The human cerebral cortex is a highly convoluted structure of gray matter. Geometri-
cally, its surface is topologically equivalent to a sphere (without holes and handles),
when artificially closing the midline hemispheric connections. Reconstruction of
topologically correct and accurate cortical surfaces from MR images plays a funda-
mental role in neuroimaging studies [1]. However, due to the highly folded nature of the
cortex and limitations in the MRI acquisition process, it is inevitable to have errors in
brain tissue segmentation, which is a prerequisite for cortical surface reconstruction.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 219–227, 2016.
DOI: 10.1007/978-3-319-46720-7_26
220 S. Hao et al.
Topological correction typically involves two sequential tasks, i.e., (1) locating
topologically defected regions and (2) correcting them. For the former task, methods
largely rely on the priori knowledge that each cortical hemisphere has a simple
spherical topology. Based on this, cyclic graph loops [4–6] or overlapping surface
meshes after remapping [7–9] are used as hints to locate regions with topological
defects. The latter task is much more challenging, as the two types of topological errors, i.e., holes and handles, differ essentially only in their inconsistency with cortical anatomy. Typically, holes incorrectly perforate the cortical
surface, while handles erroneously bridge the nonadjacent points in the cortical surface,
as shown in Fig. 1. In this context, topological correction methods have to make a
choice between the two correction types: filling a hole or breaking a handle. However,
since the difference between holes and handles actually lies in the sense of anatomical
correctness, they are hard to distinguish solely using geometric information. So
heuristics were usually made to address this issue. For example, a minimal correction
criterion was adopted by assuming that the change for correction should be as small as
possible [4, 5, 7, 10]. As this criterion is not reliable enough, several ad hoc rules based
on MRI appearance patterns were proposed [6, 8, 11] to help determine the correction
type. Although these methods achieve good performance on adult cortical surfaces,
they have major limitations in processing infant cortical surfaces for two reasons. First,
the minimal correction criterion typically cuts the handles of large topological defects
frequently occurring in infant images, thus leading to missing or inconsistent anatomical regions. Second, the ad hoc rules designed based on adult MRIs (typically with
clear contrast) are invalid for the infant MRIs, which have longitudinally changing and
regionally heterogeneous intensity patterns. Hence, methods for handling infant MR
images at a variety of developmental stages are highly desired.
In this paper, we propose a novel learning-based method for correcting the topo-
logical defects in infant cortical surfaces, without requiring predefined rules as in the
existing methods. Specifically, we first locate topologically defected regions by using a
topology-preserving level set method. Then, by leveraging rich information of the
corresponding patches from anatomical reference images with correct and accurate
topology, we build region-specific dictionaries and infer the correct tissue labels using
sparse representation. Notably, we further integrate these two steps as an iterative
framework to gradually correct large topological errors that frequently occur in infant
MR images and cannot be completely corrected in one-shot sparse representation.
Extensive experiments demonstrate the feasibility and effectiveness of our method.
2 Method
Given a tissue segmentation image V, labeled as white matter (WM), gray matter
(GM), and cerebrospinal fluid (CSF), our method includes two stages: extracting
candidate voxels (Sect. 2.1) and inferring their new tissue labels (Sect. 2.2). Then,
these two stages are further integrated into an iterative framework (Sect. 2.3).
The element w_i in the weight vector w indicates the appearance similarity between c(v) and the i-th atom in D(v). Herein, the center of the i-th atom in D(v) is a voxel r_i in the reference image. Based on the assumption that the appearance similarity w_i also reveals the likelihood that v in the subject shares the same label as r_i, we can infer the new label of v with a weighted nearest-neighbor model. Denoting l_v as the tissue label of v, we can compute the probability of l_v = j, where j ∈ {WM, GM, CSF}:

p(l_v = j) = Σ_{i=1}^{K(d_n)^3} w_i p(l_v = j | r_i)   (2)

p(l_v = j | r_i) = 1 if l_{r_i} = j, and 0 otherwise.   (3)

The new label of v is finally obtained by the MAP criterion, i.e., arg max_j p(l_v = j).
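A minimal sketch of this weighted nearest-neighbor inference (Eqs. 2–3), assuming the sparse-code weights w and the reference labels of the dictionary atoms are given:

```python
import numpy as np

def infer_label(w, ref_labels, classes=("WM", "GM", "CSF")):
    """MAP tissue label: p(l_v = j) = sum_i w_i [label(r_i) == j]; return argmax_j."""
    w = np.asarray(w, dtype=float)
    probs = {j: float(w[[l == j for l in ref_labels]].sum()) for j in classes}
    return max(probs, key=probs.get)

# toy usage: three dictionary atoms with known reference labels
lbl = infer_label([0.5, 0.3, 0.2], ["WM", "GM", "WM"])
```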
This framework brings two benefits. First, large topological defects in infant
cortical surfaces are gradually corrected, as the algorithm updates candidate voxels in
each iteration. Second, the cardinality of the candidate voxels decreases during the
iterations, because successfully fixed defects are no longer included in the next itera-
tion. The computational cost is mainly determined by the dictionary size, the cardi-
nality of VCAN , and the iteration number.
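The iterative framework can be sketched as a simple loop (hypothetical function names `locate_defects` and `sparse_relabel` stand in for the two stages of Sects. 2.1 and 2.2; the toy example treats negative entries as 'defects'):

```python
def iterative_correction(V, locate_defects, sparse_relabel, T=4):
    """Alternate defect localization and sparse-representation relabeling.
    locate_defects(V) -> set of candidate voxel indices V_CAN;
    sparse_relabel(V, candidates) -> volume with updated labels.
    Stops when no candidates remain or after T iterations."""
    for _ in range(T):
        candidates = locate_defects(V)
        if not candidates:
            break  # all defects fixed; later iterations see a shrinking candidate set
        V = sparse_relabel(V, candidates)
    return V

# toy run: "defects" are negative entries; each pass repairs the first one
V0 = [-1, -1, 2]
out = iterative_correction(
    V0,
    locate_defects=lambda V: {i for i, x in enumerate(V) if x < 0},
    sparse_relabel=lambda V, c: [abs(x) if i == min(c) else x for i, x in enumerate(V)],
)
```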
3 Experiments
To validate our method, brain MR images with a resolution of 1 × 1 × 1 mm³ from 100 infants at 6 months of age were used in the experiments. As our method only relies on tissue segmentation results, we note that it is generic and can also be applied to adult brains and infant brains at other developmental stages, such as neonates and 1-year-olds. The main motivation for using 6-month-old infants for validation is that, among all stages of early brain development, MR images at 6 months exhibit the lowest tissue contrast and thus the most severe topological errors in tissue segmentation. Herein, the tissue segmentation was conducted by the state-of-the-art method in [3]. After segmentation, experts manually corrected the topological errors in the cortical surfaces of WM for all subjects using ITK-SNAP. Among the 100 pairs of uncorrected and manually corrected volumes, 20 manually corrected volumes were randomly selected as the references R_k (k = 1, …, 20). One half of the remaining 80 pairs were randomly selected for adjusting parameters and the other half for performance evaluation.
We use the success rate S_c to quantitatively evaluate our method. Here, successfully corrected topological defects are those for which holes are correctly filled or handles are correctly broken. However, S_c is limited in reflecting the anatomical consistency between the resulting surface and the ground truth, so we also adopt the Dice Ratio (DR) and the average Surface Distance (SD) as evaluation measures:

DR = 2 |V′_1 ∩ V′_2| / (|V′_1| + |V′_2|)   (5)

SD(V_1, V_2) = (1/2) [ (1/n_1) Σ_{v_1 ∈ surf(V_1)} d(v_1, surf(V_2)) + (1/n_2) Σ_{v_2 ∈ surf(V_2)} d(v_2, surf(V_1)) ]   (6)

When computing DR, we only use the regions enclosing the candidate voxels and their adjacent voxels in V_1 and V_2, i.e., V′_1 and V′_2, obtained by dilating the set of candidate voxels. In Eq. 6, d(·, ·) is the Euclidean distance, and n_1 and n_2 are the cardinalities of surf(V_1) and surf(V_2), respectively.
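A minimal sketch of the two measures (assuming boolean volumes for DR and explicit surface point sets for SD, with brute-force nearest-point search; an illustration, not the paper's implementation):

```python
import numpy as np

def dice_ratio(V1, V2):
    """DR = 2 |V1' ∩ V2'| / (|V1'| + |V2'|) for boolean volumes (Eq. 5)."""
    V1, V2 = np.asarray(V1, bool), np.asarray(V2, bool)
    return 2.0 * np.logical_and(V1, V2).sum() / (V1.sum() + V2.sum())

def avg_surface_distance(P1, P2):
    """Symmetric average nearest-point distance between two surfaces (Eq. 6)."""
    P1, P2 = np.asarray(P1, float), np.asarray(P2, float)
    D = np.linalg.norm(P1[:, None, :] - P2[None, :, :], axis=-1)
    return 0.5 * (D.min(axis=1).mean() + D.min(axis=0).mean())
```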
Based on the validation set, we found the best S_c was achieved by setting λ_1 = 0.2, λ_2 = 0.01, d_c = 11, d_n = 5, and T = 4, which were then applied to the testing set.
Figure 2 shows an example of topological correction result by our method. We can see
that our method can effectively fix topological defects and meanwhile ensure the
anatomical consistency and correctness. In the iterative framework (visually validated
in Fig. 3), four iterations (T = 4) are empirically enough for all the cases in our
experiments.
As there is no available software specifically designed for correcting infant cortical
surfaces, we compared our method with two popular software packages, BrainSuite [5] and FreeSurfer [8], which are designed for processing the adult brain and achieve the
state-of-the-art performance in the field. We show typical results in Fig. 4 and quan-
titative results in Table 1. Due to the minimal correction criterion, BrainSuite does not
fully remove the handle regions, e.g., the red ellipses in Fig. 4. More importantly, it
erroneously breaks too many holes that should be filled, e.g., the blue ellipses in Fig. 4,
leading to a low S_c. By contrast, our learning-based method and FreeSurfer achieve
much better Sc than BrainSuite. However, FreeSurfer has low accuracy in terms of DR
and SD, indicating poor anatomical consistency and correctness. For example, in
Fig. 4, the gyral structures highlighted by the ellipses in FreeSurfer’s results are
missing, compared with the ground truth. After checking all experimental results, we
found that the similar problem of missing large gyral structures occurred in over half of
the FreeSurfer’s results, resulting in a clear drop in DR and significant increase in SD in
Table 1. In contrast, our method produces more balanced results. Its Sc is generally
comparable with FreeSurfer, and its DR and SD are much better than FreeSurfer,
indicating that our method not only effectively corrects topological defects, but also
better ensures the anatomical accuracy.
4 Conclusion
Acknowledgements. This work was supported in part by NIH grants (MH107815, MH108914,
MH100217, EB006733, EB008374, and EB009634). Dr. Shijie Hao was supported by the National Natural Science Foundation of China grant 61301222.
References
1. Li, G., et al.: Mapping region-specific longitudinal cortical surface expansion from birth to
2 years of age. Cereb. Cortex 23(11), 2724–2733 (2013)
2. Paus, T., et al.: Maturation of white matter in the human brain: a review of magnetic
resonance studies. Brain Res. Bull. 54(3), 255–266 (2001)
3. Wang, L., et al.: LINKS: learning-based multi-source IntegratioN frameworK for
Segmentation of infant brain images. NeuroImage 108, 160–172 (2015)
4. Shattuck, D.W., Leahy, R.M.: Automated graph-based analysis and correction of cortical
volume topology. TMI 20(11), 1167–1177 (2001)
5. Han, X., et al.: Topology correction in brain cortex segmentation using a multiscale,
graph-based algorithm. TMI 21(2), 109–121 (2002)
6. Shi, Y., Lai, R., Toga, A.W.: Cortical surface reconstruction via unified Reeb analysis of
geometric and topological outliers in magnetic resonance images. TMI 32(3), 511–530
(2013)
7. Fischl, B., et al.: Automated manifold surgery: constructing geometrically accurate and
topologically correct models of the human cerebral cortex. TMI 20(1), 70–80 (2001)
8. Segonne, F., Pacheco, J., Fischl, B.: Geometrically accurate topology-correction of cortical
surfaces using nonseparating loops. TMI 26(4), 518–529 (2007)
9. Yotter, R.A., et al.: Topological correction of brain surface meshes using spherical
harmonics. Hum. Brain Mapp. 32(7), 1109–1124 (2011)
10. Bazin, P.-L., Pham, D.: Topology correction of segmented medical images using a fast
marching algorithm. Comput. Methods Prog. Biomed. 88(2), 182–190 (2007)
11. Ségonne, F., Grimson, W.L., Fischl, B.: A genetic algorithm for the topology correction of
cortical surfaces. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565,
pp. 393–405. Springer, Heidelberg (2005)
12. Han, X., Xu, C., Prince, J.: A topology preserving level set method for geometric deformable
models. PAMI 25(6), 755–768 (2003)
13. Vercauteren, T., et al.: Diffeomorphic demons: efficient non-parametric image registration.
NeuroImage 45(S1), 61–72 (2009)
14. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat.
Soc. Ser. B 67(2), 301–320 (2005)
Riemannian Metric Optimization
for Connectivity-Driven Surface Mapping
1 Introduction
we develop in this work a novel computational framework for intrinsic and dif-
feomorphic surface mapping in the Laplace-Beltrami (LB) embedding space via
the optimization of Riemannian metrics.
For intrinsic shape analysis, there has been growing interest in using the spectrum of the LB operator in computer vision and medical image analysis
[6,7]. Graph-based approaches were proposed in [8] for surface mapping with LB
eigenfunctions. An isometry invariant embedding space was proposed in [7] using
the LB spectrum. Based on the equivalence of isometry and the minimization
of a spectral-l2 distance in the LB embedding space, a novel surface mapping
algorithm was developed recently via conformal metric optimization on surfaces
(CMOS) [9]. The CMOS approach, however, only computes conformal maps and
cannot incorporate rich connectivity features.
To overcome this limitation, we propose in this paper a more general com-
putational framework based on the Riemannian metric optimization on surfaces
(RMOS). Given any diffeomorphism between two surfaces, the pullback metric induces an isometry between them. Since the LB eigen-system is com-
pletely determined by the Riemannian metric, we can thus pose the computation
of diffeomorphism as a problem of finding the proper Riemannian metric that
minimizes the spectral-l2 distance in the LB embedding space, which ensures an
isometry is achieved with the resulting diffeomorphism. In this general frame-
work, we can easily incorporate the matching of desirable connectivity features
during the RMOS process. For numerical implementation, it was established that
the Riemannian metrics on triangular meshes are weights defined on the edges
and they fully determine the heat kernel on the triangular meshes [10]. Thus the
goal of our RMOS is to compute the optimal weights on the mesh edges to real-
ize diffeomorphic mapping of connectivity features in the LB embedding space.
In our experimental results, we apply RMOS for connectivity-driven mapping of
the thalamic surfaces, which have well-established rich connectivity to cortical
regions [11]. In comparisons with the CMOS method, we demonstrate that the
proposed RMOS method can achieve better alignment of anatomical features
and improved sensitivity in detecting thalamic atrophy due to normal aging.
The rest of the paper is organized as follows. In Sect. 2, we first introduce the
mathematical background of LB embedding and Riemannian metric optimiza-
tion. After that, we propose the RMOS framework and develop the numerical
algorithms for energy minimization. Experimental results on surface mapping
with connectivity features are presented in Sect. 3. Finally, conclusions will be
made in Sect. 4.
Table 1. Non-zero gradient elements of Q and U w.r.t. the metric (length) of an edge
Vi Vj , i.e., dij . ∂Q/∂dij and ∂U/∂dij are symmetric. Each edge has two neighboring
triangles: Tl1 and Tl2 , where Al1 and Al2 are their areas, and Bl1 and Bl2 are the
product of their edge lengths. The third vertices of these two triangles are Vk1 and Vk2.
1, 2). For any point p ∈ M1 , the metric g1 (p) defines the inner product of vectors
on the tangent plane at p. Let u : M1 → M2 be a diffeomorphic map from M1 to
M2 . Following the definition of diffeomorphism in differential geometry, we can
map two vectors x1 and x2 on the tangent plane of p ∈ M1 onto the tangent
plane of u(p) ∈ M2 and denote them as du(x1 ) and du(x2 ). We can then define
the inner product of x1 and x2 as the inner product of du(x1 ) and du(x2 ) using the
metric g2 (u(p)) at u(p) ∈ M2 , which is called the pullback metric for M1 induced
by the map u. For every possible diffeomorphism from M1 to M2 , we can thus
induce an isometry between M1 and M2 via the pullback metric. Since the LB
spectrum of a surface is completely determined by its Riemannian metric, the LB
spectrum of M1 generated by the pullback metric will match the LB spectrum of
M2 . For any desired diffeomorphic map, this shows mathematically the existence
of a Riemannian metric on M1 that will ensure its perfect alignment with M2 in
the LB embedding space. For the problem of surface mapping, the diffeomorphism
we want to compute is unknown. In the RMOS framework, our goal is to search
for the Riemannian metric that can minimize their distance in the LB embedding
space while matching connectivity features.
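As a toy illustration of how an eigen-system and an embedding follow from edge weights, consider a weighted graph Laplacian standing in for the discrete LB operator (a simplifying assumption, as is the embedding normalization below; the paper's discretization on triangular meshes follows [10]):

```python
import numpy as np

def laplacian_eigs(n, edges, weights, k=3):
    """Eigen-system of a weighted graph Laplacian on n vertices.
    edges   : list of (i, j) vertex pairs;
    weights : the 'metric' value on each edge.
    Returns the k smallest eigenvalues and eigenvectors (lambda_n, f_n)."""
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]

def spectral_embedding(vals, vecs):
    """Embed each vertex as (f_n / sqrt(lambda_n)), skipping the constant mode."""
    return vecs[:, 1:] / np.sqrt(vals[1:])
```

Changing the edge weights changes (λn, fn) and hence the embedding, which is exactly the degree of freedom RMOS optimizes.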
Let W1 and W2 denote their Riemannian metrics, i.e., the edge weights of M1 and M2, respectively. The eigen-systems of M1 and M2 are denoted as (λ1,n, f1,n) and (λ2,n, f2,n) (n = 1, 2, …), respectively. We denote u1 : M1 → M2 as the map from M1 to M2 and u2 : M2 → M1 as the map from M2 to M1. As shown in Fig. 2, we compute the LB eigen-systems and construct their embeddings as M̃1 = I^{Φ1}_{M1}(M1) and M̃2 = I^{Φ2}_{M2}(M2). In the embedding space, the maps are ũ1 : M̃1 → M̃2 and ũ2 : M̃2 → M̃1. The final maps between the two surfaces are obtained via composition of the embeddings and the maps in the embedding space.

Fig. 2. Symmetric RMOS mapping process.
Energy Function for Surface Mapping. Let ξ1^j : M1 → R and ξ2^j : M2 → R (j = 1, 2, …, L) denote L connectivity feature functions on each surface. In our
experiments, we will define each feature as the normalized fiber count to a specific
cortical region for thalamic surfaces, but our framework and numerical algorithm
are general for both geometric and other forms of connectivity features. We
define an energy function for connectivity-driven surface mapping with RMOS:
E = EF + γER , where EF is the data fidelity term for matching given features,
ER is the regularization term, and γ is the weight between the two terms. We
define the data fidelity term with an L2 energy:

EF = Σ_{j=1}^{L} ( ||ξ1^j − ξ2^j ∘ ũ1||² + ||ξ2^j − ξ1^j ∘ ũ2||² ).   (3)
This energy is symmetric w.r.t. both surfaces. It penalizes the mismatch between
the original and mapped features. We define the regularization term as:

ER = Σ_{w1,i ∈ W1} ( w1,i/ŵ1,i − (1/n1,i) Σ_{j ∈ N1,i} w1,j/ŵ1,j )² + Σ_{w2,i ∈ W2} ( w2,i/ŵ2,i − (1/n2,i) Σ_{j ∈ N2,i} w2,j/ŵ2,j )²,   (4)

where N1,i and N2,i are the sets of edges in the neighborhood of (i.e., directly connected to) edge i, n1,i and n2,i are the total numbers of the neighboring edges, and w1,i and w2,i (ŵ1,i and ŵ2,i) are the metrics (the standard metrics), respectively, on an edge of M1 and M2. This term constrains the changes of metric ratios to be smooth.
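A small sketch of this smoothness regularizer (hypothetical: it works directly on the metric ratios w_i/ŵ_i and builds the neighborhood-difference operator D of Eq. 5 as a dense matrix):

```python
import numpy as np

def metric_smoothness(W, W_hat, neighbors):
    """E_R = ||D r||^2 with r_i = w_i / w_hat_i and
    (D r)_i = r_i - mean of r over the edges adjacent to edge i.
    Returns E_R and its gradient 2 D^T D r w.r.t. the ratios r."""
    m = len(W)
    D = np.zeros((m, m))
    for i, nb in enumerate(neighbors):
        D[i, i] = 1.0
        for j in nb:
            D[i, j] -= 1.0 / len(nb)
    r = np.asarray(W, float) / np.asarray(W_hat, float)
    E = float(np.sum((D @ r) ** 2))
    grad = 2.0 * D.T @ D @ r
    return E, grad
```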
Optimization Algorithm. To minimize the energy function using metric optimization, we first construct a coarse correspondence, which we call a β-map, that transforms the energy into distance measurements in the embedding space. Let ũ1^β : M̃1 → M̃2 denote the β-map from M̃1 to M̃2. For each point x ∈ M̃1, ũ1^β(x) can be discretized as a linear combination of vertex positions in M̃2. Thus we can represent the β-maps ũ1^β : M̃1 → M̃2 and ũ2^β : M̃2 → M̃1 as linear operators A and B, respectively, as shown in Fig. 2. To construct the β-maps for the minimization of EF, we start from the nearest-point maps and move the points along the gradient descent direction in the tangent space of the meshes as:

∂EF/∂ũ1^β = −2 Σ_{j=1}^{L} (ξ1^j − ξ2^j ∘ ũ1^β) ∇_{M̃2} ξ2^j(ũ1^β),   ∂EF/∂ũ2^β = −2 Σ_{j=1}^{L} (ξ2^j − ξ1^j ∘ ũ2^β) ∇_{M̃1} ξ1^j(ũ2^β),

where ∇_{M̃1} and ∇_{M̃2} are the intrinsic gradients on the surfaces M̃1 and M̃2.
The β-maps are obtained by updating the maps for a fixed number of time steps. Given the β-map, we convert the data fidelity term EF into the distance energy in the embedding space, which we call ẼF, and compute its gradient descent direction ∂ẼF/∂Wi (i = 1, 2) using Eqs. 11 and 12 in [9].
To minimize the energy ER w.r.t. the metrics W1 and W2, we rewrite Eq. 4 in matrix form, ER = ||D1W1||² + ||D2W2||², and compute the gradients of ER as:

∂ER/∂W1 = 2 D1ᵀD1 W1,   ∂ER/∂W2 = 2 D2ᵀD2 W2,   (5)

where D1 and D2 are used to calculate the differences of the metrics between neighboring edges. They are given initially and fixed, because the mesh connectivity does not change during the optimization process. We redefine the energy function as:

Ẽ = ẼF + γ̃ ER,   (6)

which is directly differentiable w.r.t. the metrics Wi, and finally form the gradient as ∂Ẽ/∂Wi = ∂ẼF/∂Wi + γ̃ (∂ER/∂Wi) (i = 1, 2). By minimizing this energy using gradient descent, we deform the embedding of a surface toward its β-map, thus achieving the goal of minimizing the original energy E. When it
3 Results
In this section, we present experimental results to demonstrate the value of the
RMOS framework in connectivity-based brain mapping. MRI data from 212
subjects of the Q1–Q3 release of the Human Connectome Project (HCP) [3] and
18 subjects from the LifeSpan pilot project of HCP were used in our experi-
ments. We use the left thalamus surfaces from these subjects to compare the
performance of RMOS and CMOS in aligning connectivity features and detect-
ing group differences. All thalamic surfaces are represented as triangular meshes
with 1000 vertices and 2994 edges. For CMOS-based experiments, we use its
implementation in the publicly distributed MOCA software¹.
To define the connectivity features, we use probabilistic tractography with
fiber orientation distributions (FODs) reconstructed from the multi-shell diffu-
sion MRI data from HCP [13]. For each thalamic surface, 100,000 fiber tracts
are generated. For each vertex, we define a neighborhood with a radius of 2 mm.
Given a cortical region, the connectivity from this vertex to the cortical region is
defined as the number of tracts that pass through the vertex neighborhood and
reach the cortical region. By repeating this process for each vertex, we obtain a
connectivity map for this cortical region. After that, we divide the connectivity
map by its maximum value to generate a normalized connectivity map, which
we use in our surface mapping. Overall we compute the connectivity maps to ten
cortical regions: orbital-frontal, superior-frontal, middle/inferior-frontal, motor,
sensory, superior-parietal, inferior-parietal, insular, temporal, and occipital cor-
tices of the same hemisphere.
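A brute-force sketch of this construction (hypothetical input format: each tract is a polyline paired with the label of the cortical region it reaches):

```python
import numpy as np

def connectivity_map(vertices, tracts, tract_regions, region, radius=2.0):
    """Normalized connectivity of each thalamic vertex to one cortical region:
    count the tracts reaching `region` that pass within `radius` mm of the
    vertex neighborhood, then divide by the maximum count.
    vertices      : (V, 3) vertex coordinates;
    tracts        : list of (P, 3) point arrays, one per tract;
    tract_regions : list of region labels, one per tract."""
    vertices = np.asarray(vertices, float)
    counts = np.zeros(len(vertices))
    for pts, reg in zip(tracts, tract_regions):
        if reg != region:
            continue
        d = np.linalg.norm(vertices[:, None, :] - pts[None, :, :], axis=-1)
        counts += (d.min(axis=1) <= radius)  # tract hits this vertex neighborhood
    m = counts.max()
    return counts / m if m > 0 else counts
```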
As a first experiment, we demonstrate a robust approach for selecting the regularization parameter γ̃ in our energy function Eq. 6. Instead of using a fixed value during the whole iterative optimization process, we adaptively change γ̃ in every iteration so that the normalized maximum gradient magnitudes of ẼF and ER have the constant ratio γ̄. For a pair of thalamus surfaces shown in Fig. 3(d)
and (e), the effect of the regularization term can be clearly observed in Fig. 3(f)
and (g), where the source mesh is projected onto the target surface using the
RMOS maps computed with two different γ̄ values. For a wide range of γ̄ values,
we run the RMOS mapping and plot the optimized total energy E, the data fidelity term EF, and the regularization term ER as functions of γ̄ in Fig. 3(a), (b), and (c), respectively. With the increase of γ̄ up to the turning point of the L-shaped curve in (a), we obtain a relatively large decrease of the regularization energy without much increase of the data fidelity term. Thus we consider it the sweet spot of our energy minimization problem and choose the parameter γ̄ = 10^{−0.625} = 0.24 for our large-scale experiments. We follow the multi-scale strategy in [9] that starts with the first 10 eigenfunctions, iteratively increases the number of eigenfunctions by 5 up to 20, and sets a maximum of 500 iterations at the final eigen-order. The optimized metric was plotted on the source surface in Fig. 3(h). The RMOS computational process takes around 2 h on a 16-core 2.6-GHz Intel Xeon CPU (multi-threading enabled) with maximal memory consumption around 900 MB.

¹ https://www.nitrc.org/projects/moca 2015.

234 J.K. Gahm and Y. Shi

Fig. 3. RMOS mapping of two thalamic surfaces. Plots of (a) E, (b) EF and (c) ER over a range of the parameter γ̄ after RMOS on (d) the source and (e) target surfaces (lateral view). Projection of the source to the target surface with γ̄ = (f) 0.01 and (g) 0.24. (h) The final optimized metric on the source surface with γ̄ = 0.24 (lateral and medial views).
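The adaptive rule described above can be sketched as (hypothetical helper; the gradients of ẼF and ER are assumed given as arrays):

```python
import numpy as np

def adaptive_gamma(grad_EF, grad_ER, gamma_bar):
    """Per-iteration regularization weight: pick gamma_t so the maximum gradient
    magnitudes of the data term and the scaled regularizer keep the ratio gamma_bar."""
    gF = float(np.max(np.abs(grad_EF)))
    gR = float(np.max(np.abs(grad_ER)))
    return gamma_bar * gF / gR if gR > 0 else 0.0
```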
Fig. 4. Mapping the connectivity features of thalamic surfaces: (a) source features; (b) target features; (c) pullback by RMOS; (d) pullback by CMOS; (e) RMOS atlas; (f) CMOS atlas. To highlight the differences between RMOS and CMOS, only the connectivity features to two cortical regions, the superior-frontal (left) and sensory (right) cortices, are shown in each subfigure (a)–(f) from the lateral view.

As an illustration, the connectivity features to the superior-frontal and sensory cortices of the source and target surfaces are shown in Fig. 4(a) and (b). Using the maps computed by RMOS and CMOS, we pull back the connectivity features from the target surface onto the source surface, and the results are shown in Fig. 4(c) and (d). Clearly a better match with the source connectivity features is achieved by the RMOS method. This is not surprising but emphasizes the need for integrating connectivity features into diffeomorphic surface mapping. We then apply both RMOS and CMOS to the 212
thalamic surfaces from the HCP data and construct average connectivity maps
to the ten cortical regions. The results of the maps to the superior-frontal and
sensory cortices are shown in Fig. 4(e) and (f), where we can see the atlas from
the RMOS method appears to be more concentrated, i.e., less variable, than the
CMOS atlas. This demonstrates the potential of connectivity-based mapping
with RMOS for the construction of more anatomically meaningful atlases.
In the last experiment, we examine localized thickness changes of the left
thalamus between two groups from the LifeSpan pilot project of HCP. Group
one consists of 9 subjects in the age range 14–35 yrs; group two consists of
9 subjects in the age range 45–75 yrs. The thickness map of each surface is
computed for statistical analysis [2]. Using the surface maps generated by RMOS
and CMOS, we run vertex-wise t-tests, and the p-value maps from the two
methods are shown in Fig. 5. Clearly the RMOS maps yield more significant
results on thalamic atrophy due to normal aging.

Fig. 5. Log-scale p-value (− log p) maps of the thickness for the 9 young (14–35 yrs)
vs. 9 old (45–75 yrs) subjects: (a) RMOS, (b) CMOS. Each subfigure shows the
superior (left) and inferior (right) views.
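The vertex-wise group comparison can be sketched as follows (an illustrative reimplementation with synthetic data, not the authors' code; the simulated atrophy region and noise levels are our own assumptions):

```python
import numpy as np
from scipy import stats

def neg_log_p_map(group1, group2):
    """Vertex-wise two-sample t-test between two groups of thickness maps.
    group1, group2: (n_subjects, n_vertices) arrays of per-vertex thickness.
    Returns the -log10(p) value at every vertex."""
    _, p = stats.ttest_ind(group1, group2, axis=0)
    return -np.log10(p)

# Synthetic example: 9 "young" vs. 9 "old" subjects over 100 vertices,
# with simulated atrophy (thinner cortex) at vertices 40-59 in the old group.
rng = np.random.default_rng(0)
young = rng.normal(3.0, 0.1, size=(9, 100))
old = rng.normal(3.0, 0.1, size=(9, 100))
old[:, 40:60] -= 0.5
logp = neg_log_p_map(young, old)
```

The simulated atrophy region stands out with much larger − log p values than the unchanged vertices; in practice a multiple-comparison correction would be applied on top.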
4 Conclusion
In this paper, we developed a novel method for mapping surface connectivity
based on the optimization of the Riemannian metric in the Laplace-Beltrami
embedding space. We demonstrated the value of our method by applying it
to compute connectivity-driven maps of the thalamic surfaces. In comparison
with a state-of-the-art method, we showed that our method can achieve better
alignment of connectivity features and higher sensitivity in detecting thalamic
atrophy in normal aging. For future work, we will validate our method on more
general anatomical surfaces with both geometric and connectivity features.
References
1. Fischl, B., Sereno, M.I., Dale, A.M.: Cortical surface-based analysis II: infla-
tion, flattening, and a surface-based coordinate system. NeuroImage 9(2), 195–207
(1999)
2. Thompson, P.M., Hayashi, K.M., de Zubicaray, G.I., Janke, A.L., Rose, S.E.,
Semple, J., Hong, M.S., Herman, D.H., Gravano, D., Doddrell, D.M., Toga, A.W.:
Mapping hippocampal and ventricular change in Alzheimer disease. NeuroImage
22(4), 1754–1766 (2004)
236 J.K. Gahm and Y. Shi
3. Essen, D.C.V., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K.:
The WU-Minn human connectome project: an overview. NeuroImage 80, 62–79
(2013)
4. Gutman, B., Leonardo, C., Jahanshad, N., Hibar, D., Eschenburg, K., Nir, T.,
Villalon, J., Thompson, P.: Registering cortical surfaces based on whole-brain
structural connectivity and continuous connectivity analysis. In: Golland, P.,
Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol.
8675, pp. 161–168. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10443-0_21
5. Jiang, X., Zhang, T., Zhu, D., Li, K., Chen, H., Lv, J., Hu, X., Han, J., Shen, D.,
Guo, L., Liu, T.: Anatomy-guided dense individualized and common connectivity-
based cortical landmarks (A-DICCCOL). IEEE Trans. Biomed. Eng. 62(4), 1108–
1119 (2015)
6. Reuter, M., Wolter, F., Peinecke, N.: Laplace-Beltrami spectra as Shape-DNA of
surfaces and solids. Comput. Aided Des. 38, 342–366 (2006)
7. Rustamov, R.M.: Laplace-beltrami eigenfunctions for deformation invariant shape
representation. In: Proceeding of Eurographics Symposium on Geometry Process-
ing, pp. 225–233 (2007)
8. Lombaert, H., Sporring, J., Siddiqi, K.: Diffeomorphic spectral matching of cortical
surfaces. In: Gee, J.C., Joshi, S., Pohl, K.M., Wells, W.M., Zöllei, L. (eds.) IPMI
2013. LNCS, vol. 7917, pp. 376–389. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38868-2_32
9. Shi, Y., Lai, R., Wang, D., Pelletier, D., Mohr, D., Sicotte, N., Toga, A.: Metric
optimization for surface analysis in the Laplace-beltrami embedding space. IEEE
Trans. Med. Imag. 33(7), 1447–1463 (2014)
10. Zeng, W., Guo, R., Luo, F., Gu, X.: Discrete heat kernel determines discrete
Riemannian metric. Graph. Models 74(4), 121–129 (2012)
11. Behrens, T.E., Johansen-Berg, H., Woolrich, M.W., Smith, S.M., Wheeler-
Kingshott, C.A., Boulby, P.A., Barker, G.J., Sillery, E.L., Sheehan, K.,
Ciccarelli, O., Thompson, A.J., Brady, J.M., Matthews, P.M.: Non-invasive map-
ping of connections between human thalamus and cortex using diffusion imaging.
Nat. Neurosci. 7(6), 750–757 (2003)
12. Rosen, J.B.: The gradient projection method for nonlinear programming. Part I:
Linear constraints. J. Soc. Ind. Appl. Math. 8(1), 181–217 (1960)
13. Tran, G., Shi, Y.: Fiber orientation and compartment parameter estimation from
multi-shell diffusion imaging. IEEE Trans. Med. Imaging 34(11), 2320–2332 (2015)
Riemannian Statistical Analysis of Cortical
Geometry with Robustness to Partial Homology
and Misalignment
2 Methods
This section describes the proposed model for the cortical sheet, the robust
descriptor of local cortical geometry, and its use for hypothesis testing on a
Riemannian manifold.
We propose a medial surface model for the cortex, which subsumes models for
cortical folding and thickness. The proposed model comprises (i) the mid-cortical
surface, as the medial surface, and (ii) local cortical thickness values at each point
on the mid-cortical surface. Given the mid-cortical surface M, the value of the
thickness t at each point m on M gives the locations of the inner and outer
(pial) cortical surfaces, at distances t/2 along the inward and outward normals
to the mid-cortical surface at m.
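As a concrete sketch of this medial model (illustrative only; the array shapes and function name are our own), the inner and outer surfaces can be recovered from the mid-cortical points, their unit normals, and the local thickness:

```python
import numpy as np

def inner_outer_surfaces(mid_points, normals, thickness):
    """Given points m on the mid-cortical surface M (n x 3), unit outward
    normals at those points (n x 3), and local thickness t (n,), place the
    inner and outer (pial) surfaces at distance t/2 along the inward and
    outward normals, respectively."""
    offset = 0.5 * thickness[:, None] * normals
    return mid_points - offset, mid_points + offset

# A single vertex at the origin with outward normal +z and thickness 2:
inner, outer = inner_outer_surfaces(
    np.array([[0.0, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 1.0]]),
    np.array([2.0]),
)
```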
We compute cortical thickness based on [7]. We model the geometry of the
mid-cortical surface M through the local surface-patch characteristics at each
point on the surface. At every point m ∈ M, the principal curvatures κmin (m)
and κmax(m) describe the local geometry [3] (up to second order and up to a
translation and rotation). The space (κmin(m), κmax(m)) can be reparametrized,
by a polar transformation, into the orthogonal bases of curvedness
C(m) := [κmin(m)² + κmax(m)²]^{0.5} and shape index
S(m) := (2/π) arctan[(κmin(m) + κmax(m)) / (κmin(m) − κmax(m))],
which meaningfully separate notions of bending and shape [9], leading to easier
interpretation. The shape index S(m) ∈ [−1, 1] is a pure measure of shape,
modulo size, location, and pose. The curvedness C(m) ≥ 0 captures a notion of
surface bending at a particular patch scale/size, and is invariant to location and
pose. We compute principal curvatures at m by fitting a quadratic patch to the
local surface around m [3].
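These two quantities are straightforward to compute once the principal curvatures are known; a minimal sketch (umbilic points, where κmin = κmax, are excluded, since the shape index is degenerate there):

```python
import numpy as np

def curvedness_shape_index(k_min, k_max):
    """Polar reparametrization of the principal curvatures into curvedness
    C = [k_min^2 + k_max^2]^0.5 and shape index
    S = (2/pi) * arctan[(k_min + k_max) / (k_min - k_max)] in [-1, 1].
    Assumes k_min < k_max (non-umbilic points)."""
    C = np.sqrt(k_min**2 + k_max**2)
    S = (2.0 / np.pi) * np.arctan((k_min + k_max) / (k_min - k_max))
    return C, S

# A symmetric saddle (k_min, k_max) = (-1, 1) is pure shape index 0:
C, S = curvedness_shape_index(-1.0, 1.0)
```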
We perform hypothesis testing using the joint histograms Hi(m) as the local
feature descriptor for the cortex at location m for subject i. If the number
of bins in the histogram is B, then Hi(m) ∈ (R≥0)^B, ||Hi(m)||₁ = 1, and
Hi(m) lies on a Riemannian manifold. To measure distance between histograms
H1(m) and H2(m), we use the Fisher-Rao distance metric d(H1(m), H2(m)) :=
dg(F1(m), F2(m)), where Fi(m) := √Hi(m) is the square-root histogram,
with the value in the b-th bin Fi(m, b) := √(Hi(m, b)), and
dg(F1(m), F2(m)) is the geodesic distance between F1(m) and F2(m) on the
unit hypersphere S^{B−1} [19].
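Concretely, the Fisher-Rao distance between two normalized histograms reduces to the arc length between their square-root representations on the unit hypersphere (a minimal sketch):

```python
import numpy as np

def fisher_rao_distance(h1, h2):
    """d(H1, H2) = d_g(F1, F2): map each normalized histogram H to its
    square-root representation F = sqrt(H), a point on the unit hypersphere
    S^{B-1}, and return the great-circle (geodesic) distance arccos(<F1, F2>)."""
    f1, f2 = np.sqrt(np.asarray(h1)), np.sqrt(np.asarray(h2))
    return float(np.arccos(np.clip(np.dot(f1, f2), -1.0, 1.0)))

d_same = fisher_rao_distance([0.25, 0.25, 0.5], [0.25, 0.25, 0.5])
d_far = fisher_rao_distance([1.0, 0.0], [0.0, 1.0])  # disjoint supports
```

Identical histograms are at distance 0; histograms with disjoint support sit at the maximal distance π/2.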
Modeling a probability density function (PDF) on a hypersphere entails fun-
damental trade-offs between model generality and the viability of the underlying
parameter estimation. For instance, although Fisher-Bingham PDFs on Sd are
able to model generic anisotropic distributions using O(d2 ) parameters, their
parameter estimation may be intractable [14]. In contrast, parameter estimation
for the O(d)-parameter von Mises-Fisher PDF is tractable, but that PDF can
only model isotropic distributions. We use a tractable approximation of a Normal
law on a Riemannian manifold [17], modeling anisotropy through its covariance
parameter in the tangent space at the mean.
For a group with I subjects, at each cortical location m, we fit the approx-
imate Normal law to the data {√Hi(m)}, i = 1, ..., I, as follows. We optimize for the
Frechet mean μ ∈ S^{B−1} via iterative gradient descent on the manifold S^{B−1} [2],
where

    μ := arg min_ν Σ_{i=1}^{I} d_g²(ν, √Hi(m)) under the constraint ν ∈ S^{B−1}.    (1)
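A minimal version of this iterative Frechet-mean computation on the hypersphere (our own sketch, using the standard spherical log/exp maps rather than the exact scheme of [2]):

```python
import numpy as np

def log_map(mu, x):
    """Log map on the unit sphere: the tangent vector at mu pointing to x."""
    c = np.clip(np.dot(mu, x), -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros_like(mu)
    v = x - c * mu
    return theta * v / np.linalg.norm(v)

def exp_map(mu, v):
    """Exp map on the unit sphere: walk from mu along tangent vector v."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return mu
    return np.cos(n) * mu + np.sin(n) * v / n

def frechet_mean(points, iters=50):
    """Gradient descent for eq. (1): repeatedly average the log-mapped points
    in the tangent space at the current estimate and step back to the sphere."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        mu = exp_map(mu, np.mean([log_map(mu, p) for p in points], axis=0))
    return mu

# Two square-root histograms at right angles: the mean bisects the arc.
mu = frechet_mean([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
```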
We use the logarithmic map Log_μ(·) to map the square-root histograms
{√Hi(m)}, i = 1, ..., I, to the tangent space at the estimated Frechet mean μ and find
the optimal covariance matrix Σ in closed form [5]. For any histogram H, we
define the squared geodesic Mahalanobis distance between √H and the mean μ,
given covariance Σ, as d_M²(√H; μ, Σ) := Log_μ(√H)^T Σ^{−1} Log_μ(√H). Then, the
proposed PDF evaluated at histogram H is

    P(H|μ, Σ) := exp(−0.5 d_M²(√H; μ, Σ)) / ((2π)^{(B−1)/2} |Σ|^{1/2}).    (2)
    t(m) := d_M²(μ_X(m); μ_Y(m), Σ_Y(m)) + d_M²(μ_Y(m); μ_X(m), Σ_X(m)).    (3)
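Given tangent-space coordinates v = Log_μ(√H) at the Frechet mean, the squared geodesic Mahalanobis distance used in eqs. (2)–(3) is a plain quadratic form (a sketch; computing the log map itself is assumed done elsewhere):

```python
import numpy as np

def mahalanobis_geodesic_sq(v, cov):
    """d_M^2 = v^T Sigma^{-1} v, with v = Log_mu(sqrt(H)) the tangent-space
    coordinate of a square-root histogram at the Frechet mean mu."""
    v = np.asarray(v, dtype=float)
    return float(v @ np.linalg.solve(cov, v))

# With an identity covariance, the distance is just the squared tangent norm:
d2 = mahalanobis_geodesic_sq([1.0, 2.0], np.eye(2))
```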
Fig. 1. (a) shape index, (b) curvedness, (c) thickness, (d) region selected.
values in the thinned-flattened region (Fig. 1(d)) and high p values elsewhere. In
contrast, Riemannian analysis on the marginal histograms for the shape index
(Fig. 2(a)), curvedness (Fig. 2(b)), and thickness (Fig. 2(c)) produces far more
Type-I/Type-II errors.
In comparison, a multiscale shape-index descriptor using a Laplacian scale-
space pyramid was unable to detect any significant differences (all p values
> 0.3; hence, figure not shown), while multiscale descriptors of curvedness (Fig. 3(a)),
thickness (Fig. 3(b)), and joint shape-curvedness-thickness (Fig. 3(c)) led to a
large number of false positives. Furthermore, the joint histogram descriptor with
Euclidean statistical modeling and hypothesis testing (permutation test with
Fig. 6. OASIS, multiscale descriptor. Permutation test p values with the joint
multiscale descriptor for a MCI cohort of (a) 10 subjects, (b) 18 subjects, and (c) 28
subjects.
References
1. Awate, S., Yushkevich, P., Song, Z., Licht, D., Gee, J.: Cerebral cortical folding
analysis with multivariate modeling and testing: studies on gender differences and
neonatal development. NeuroImage 53(2), 450–459 (2010)
2. Buss, S., Fillmore, J.: Spherical averages and applications to spherical splines and
interpolation. ACM Trans. Graph. 20(2), 95–126 (2001)
3. Carmo, M.D.: Differential Geometry of Curves and Surfaces. Prentice Hall, Upper
Saddle River (1976)
4. Fischl, B., Dale, A.: Measuring the thickness of the human cerebral cortex from
magnetic resonance images. Proc. Nat. Acad. Sci. 97(20), 11050–11055 (2000)
5. Fletcher, T., Lu, C., Pizer, S., Joshi, S.: Principal geodesic analysis for the study
of nonlinear statistics of shape. IEEE Trans. Med. Imaging 23(8), 995–1005 (2004)
6. Hardan, A., Muddasani, S., Vemulapalli, M., Keshavan, M., Minshew, N.: An MRI
study of increased cortical thickness in autism. Am. J. Psychiatry 163(7), 1290–
1292 (2006)
7. Jones, S., Buchbinder, B., Aharon, I.: Three-dimensional mapping of cortical thick-
ness using Laplace’s equation. Hum. Brain Mapp. 11(1), 12–32 (2000)
8. Joshi, A.A., Shattuck, D.W., Leahy, R.M.: A method for automated cortical surface
registration and labeling. In: Dawant, B.M., Christensen, G.E., Fitzpatrick, J.M.,
Rueckert, D. (eds.) WBIR 2012. LNCS, vol. 7359, pp. 180–189. Springer,
Heidelberg (2012)
9. Koenderink, J.J.: Solid Shape. MIT Press, Cambridge (1991)
10. Luders, E., Narr, K., Thompson, P., Rex, D., Woods, R., Jancke, L., Toga, A.:
Gender effects on cortical thickness and the influence of scaling. Hum. Brain Mapp.
27, 314–324 (2006)
11. Lyttelton, O., Boucher, M., Robbins, S., Evans, A.: An unbiased iterative group
registration template for cortical surface analysis. NeuroImage 34, 1535–1544
(2007)
12. Mangin, J., Riviere, D., Cachia, A., Duchesnay, E., Cointepas, Y., Papadopoulos-
Orfanos, D., Scifo, P., Ochiai, T., Brunelle, F., Regis, J.: A framework to study
the cortical folding patterns. NeuroImage 23(1), S129–S138 (2004)
13. Marcus, D., Wang, T., Parker, J., Csernansky, J., Morris, J., Buckner, R.: Open
access series of imaging studies (OASIS): cross-sectional MRI data in young, middle
aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19(9), 1498–
1507 (2007)
14. Mardia, K., Jupp, P.: Directional Statistics. Wiley, Hoboken (2000)
15. Nichols, T., Holmes, A.: Nonparametric permutation tests for functional neu-
roimaging: a primer with examples. Hum. Brain Mapp. 15(1), 1–25 (2002)
16. Nordahl, C., Dierker, D., Mostafavi, I., Schumann, C., Rivera, S., Amaral, D.,
Van-Essen, D.: Cortical folding abnormalities in autism revealed by surface-based
morphometry. J. Neurosci. 27(43), 11725–11735 (2007)
246 S.P. Awate et al.
17. Pennec, X.: Intrinsic statistics on Riemannian manifolds: basic tools for geometric
measurements. J. Math. Imaging Vis. 25(1), 127–154 (2006)
18. Redolfi, A., Manset, D., Barkhof, F., Wahlund, L., Glatard, T., Mangin, J.F.,
Frisoni, G.: Head-to-head comparison of two popular cortical thickness extraction
algorithms: a cross-sectional and longitudinal study. PLoS ONE 10(3), e0117692
(2015)
19. Srivastava, A., Jermyn, I., Joshi, S.: Riemannian analysis of probability density
functions with applications in vision. In: Proceedings of International Conference
on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
20. Van-Essen, D., Dierker, D.: Surface-based and probabilistic atlases of primate cere-
bral cortex. Neuron 56, 209–225 (2007)
21. Yeo, B.T.T., Yu, P., Grant, P.E., Fischl, B., Golland, P.: Shape analysis with
overcomplete spherical wavelets. In: Metaxas, D., Axel, L., Fichtinger, G., Székely,
G. (eds.) MICCAI 2008, Part I. LNCS, vol. 5241, pp. 468–476. Springer, Heidelberg
(2008)
22. Yu, P., Grant, P., Qi, Y., Han, X., Segonne, F., Pienaar, R., Busa, E., Pacheco, J.,
Makris, N., Buckner, R., Golland, P., Fischl, B.: Cortical surface shape analysis
based on spherical wavelets. IEEE Trans. Med. Imaging 26(4), 582–597 (2007)
Modeling Fetal Cortical Expansion Using
Graph-Regularized Gompertz Models
1 Introduction
During the second and third trimester of gestation, the fetal brain grows from
a smooth shape to a complex folded structure. Understanding the processes
driving this rapid development is of strong academic and clinical interest [1].
In the last decade, in utero imaging using Magnetic Resonance Imaging (fetal
MRI), together with specialized reconstruction procedures [2] have led to a better
understanding of gross morphological fetal neurodevelopment. Various authors
have reported normative values for the developing brains’ volume, folding and
surface area. However, these studies are either based on premature neonates [3],
report global measurements [4–6], or rely on a priori parcellations of the cortical
surface into lobar regions [7].
In this paper, we aim at computing a continuous model of fetal cortical
expansion. We build on recent advances in structured prediction [8] to regularize
parametric growth models along the cortex. We show that the resulting models
G. Langs—This project was supported by the FWF under KLI 544-B27 and I2714-
B31, the OeNB under 14812 and 15929, and EU FP7 under 2012-PIEF-GA-33003.
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 247–254, 2016.
DOI: 10.1007/978-3-319-46720-7 29
248 E. Schwartz et al.
can be used to precisely model fetal cortical expansion on a surface node level,
predict gestational age with high accuracy, and identify cortical regions that are
predictive for age.
Obtaining surface models of the fetal brain from fetal MRI requires a sequence
of processing steps. Artefacts due to fetal motion during the acquisition are
mitigated by using fast Rapid Acquisition with Refocused Echoes (RARE) T2
sequences [9] at increased (3–4 mm) slice thickness. To avoid the loss of important
anatomical information due to this strong anisotropy, orthogonal views in axial,
coronal and sagittal direction are acquired and fused into an isotropic, high-
resolution (HR) volume [2].
4 Results
Fig. 2. Percentage increase in local cortical surface area from GW 20 to GW 35
(vertical axis: local cortical area; horizontal axis: gestational weeks, GWs):
(a) unregularized model fit, (b) average surface model, (c) regularized model fit.
Due to the design of B, this component is largely unaffected by regularization.
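For reference, fitting a single (unregularized) Gompertz growth curve at one surface node might look like the following sketch; the parametrization (asymptote A, rate k, inflection time t0) and the synthetic data are our own assumptions, not the paper's exact model or regularizer:

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, A, k, t0):
    """Gompertz growth curve: y(t) = A * exp(-exp(-k * (t - t0)))."""
    return A * np.exp(-np.exp(-k * (t - t0)))

# Noisy per-node area measurements over gestational weeks (GW) 20-35.
t = np.linspace(20.0, 35.0, 16)
rng = np.random.default_rng(1)
y = gompertz(t, 8.0, 0.4, 27.0) + rng.normal(0.0, 0.05, t.size)

# Node-wise least-squares fit; a graph regularizer would additionally couple
# the parameters of neighbouring nodes on top of this per-node objective.
(A_hat, k_hat, t0_hat), _ = curve_fit(gompertz, t, y, p0=(7.0, 0.5, 26.0))
```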
Fig. 4. Expansion rate. Distinctive expansion of the left inferior parietal lobule is robust
with respect to regularization, while values in the insula are homogenized.
Fig. 5. Results of LOOCV age prediction from local cortical surface area: estimated
GW plotted against true GW for the left and right hemispheres.
1 + 5, 0.05-th quantile for λ = 5), Fig. 5(b). By averaging the prediction in these
regions, the error is reduced significantly to 4.65 ± 3.58 days (p < 0.05). The accu-
racy of these results is on the order of the underlying uncertainty of the reported
last menstrual cycle and comparable with state-of-the-art results in [15] as well as
manual measurements [4].
5 Conclusion
We have proposed a novel method for fitting spatially regularized growth models
to noisy data. Applying this method in the challenging setting of fetal brain
development enables building accurate, interpretable models of cortical expansion
in utero, and allows for the point-wise estimation of gestational age. We have shown
that the resulting models are in line with published knowledge about fetal brain
growth and are able to predict the age of the fetus with high accuracy. We believe
that the presented method is of significant value in deepening the understanding
of the time-course of neuroanatomical development, as well as in allowing for the
precise localization and characterization of its vulnerabilities.
References
1. Tallinen, T., Chung, J.Y., Rousseau, F., Girard, N., Lefevre, J., Mahadevan, L.:
On the growth and form of cortical convolutions. Nat. Phys., February 2016
2. Rousseau, F., Oubel, E., Pontabry, J., Schweitzer, M., Studholme, C., Koob, M.,
Dietemann, J.L.: BTK: An open-source toolkit for fetal brain MR image processing.
Comput. Methods Prog. Biomed. 191(1) (2012)
3. Dubois, J., Benders, M., Cachia, A., Lazeyras, F., Leuchter, R.H.V.,
Sizonenko, S.V., Borradori-Tolsa, C., Mangin, J.F., Hüppi, P.S.: Mapping the early
cortical folding process in the preterm newborn brain. Cereb. Cortex 18(6), 1444–
1454 (2008)
4. Wu, J., Awate, S.P., Licht, D.J., Clouchoux, C., du Plessis, A.J., Avants, B.B.,
Vossough, A., Gee, J.C., Limperopoulos, C.: Assessment of MRI-based automated
fetal cerebral cortical folding measures in prediction of gestational age in the third
trimester. Am. J. Neuroradiol. 36(7), 1369–1374 (2015)
5. Clouchoux, C., Kudelski, D., Gholipour, A., Warfield, S.K., Viseur, S., Bouyssi-
Kobar, M., Mari, J.L., Evans, A.C., du Plessis, A.J., Limperopoulos, C.: Quantita-
tive in vivo MRI measurement of cortical development in the fetus. Brain Struct.
Funct. 217(1), 127–139 (2011)
6. Rajagopalan, V., Scott, J., Habas, P.A., Kim, K., Corbett-Detig, J., Rousseau, F.,
Barkovich, A.J., Glenn, O.A., Studholme, C.: Local tissue growth patterns under-
lying normal fetal human brain gyrification quantified in utero. J. Neurosci. 31(8),
2878–2887 (2011)
7. Wright, R., Kyriakopoulou, V., Ledig, C., Rutherford, M.A., Hajnal, J.V.,
Rueckert, D., Aljabar, P.: Automatic quantification of normal cortical folding pat-
terns from foetal brain MRI. NeuroImage 91, 1–12 (2014)
8. Grosenick, L., Klingenberg, B., Katovich, K., Knutson, B., Taylor, J.E.: Inter-
pretable whole-brain prediction analysis with GraphNet. NeuroImage 72(C), 304–
321 (2013)
9. Hennig, J., Nauerth, A., Friedburg, H.: RARE imaging: a fast imaging method for
clinical MR. Magn. Reson. Med. 3(6), 823–833 (1986)
10. Serag, A., Aljabar, P., Ball, G., Counsell, S.J., Boardman, J.P., Rutherford, M.A.,
Edwards, A.D., Hajnal, J.V., Rueckert, D.: Construction of a consistent high-
definition spatio-temporal atlas of the developing brain using adaptive kernel
regression. NeuroImage 59(3), 2255–2265 (2012)
11. Rajchl, M., Baxter, J.S.H., McLeod, A.J., Yuan, J., Qiu, W., Peters, T.M.,
Khan, A.R.: Hierarchical max-flow segmentation framework for multi-atlas seg-
mentation with Kohonen self-organizing map based Gaussian mixture modeling.
Med. Image Anal. 1–19, May 2015
12. Crane, K., Pinkall, U., Schröder, P.: Robust fairing via conformal curvature flow.
ACM Trans. Graph. 32(4), 1–10 (2013)
13. Lombaert, H., Sporring, J., Siddiqi, K.: Diffeomorphic spectral matching of cortical
surfaces. In: Gee, J.C., Joshi, S., Pohl, K.M., Wells, W.M., Zöllei, L. (eds.) IPMI
2013. LNCS, vol. 7917, pp. 376–389. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38868-2_32
14. Wright, R., Makropoulos, A., Kyriakopoulou, V., Patkee, P.A., Koch, L.M.,
Rutherford, M.A., Hajnal, J.V., Rueckert, D., Aljabar, P.: Construction of a fetal
spatio-temporal cortical surface atlas from in utero MRI: application of spectral
surface matching. NeuroImage 120(C), 467–480 (2015)
15. Namburete, A.I.L., Stebbing, R.V., Kemp, B., Yaqub, M., Papageorghiou, A.T.,
Noble, J.A.: Learning-based prediction of gestational age from ultrasound images
of the fetal brain. Med. Image Anal. 21(1), 72–86 (2015)
Longitudinal Analysis of the Preterm Cortex
Using Multi-modal Spectral Matching
1 Introduction
Infants born extremely preterm are at high risk of developing cognitive and neu-
rologic impairment from an early age [1]. During the last trimester of pregnancy,
the fetal brain undergoes several changes in size, shape, volume, appearance [2],
as well as changes in connectivity and microstructure. Premature birth implies
that this development of the infant brain will take place under the harsh condi-
tions of the extra-uterine environment. Accurate measurements of the preterm
brain during this early post-natal period may yield predictive biomarkers of
neurological outcome. Furthermore, connecting information given by different
imaging modalities (structural and diffusion), may begin to provide an under-
standing of brain development during the preterm period and how it is affected
by preterm birth.
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 255–263, 2016.
DOI: 10.1007/978-3-319-46720-7 30
256 E. Orasanu et al.
diffusion tensor images and surface information. Combining surface (2D) and
volume (3D) information is not trivial [10]. We tackled this problem by embed-
ding the surface with a level set representation in the 3D image domain, and
reformulating the surface spectral matching problem in this context. We fol-
lowed the previous strategies of spectral decomposition in the case of surfaces
and diffusion tensor images. We then compared the groupwise average of PIMMS
with the results of JSM.
Spectral Components of Surface in Image Domain. To decompose the
cortical surface, but in image space, we used the level set images of the white-
grey matter boundary, ILS . To optimise our decomposition, we considered a
subset of our image, ILSΩ1 , consisting of the voxels around the boundary within
a chosen threshold. Similarly to the surface decomposition, where we need to
have continuous surfaces with no holes to obtain smooth spectra, we chose the
smallest threshold that ensured a continuous surface for all subjects, which was
found to be 3.5 mm in the presented work.
We constructed the connected graph (V, E) with the vertices V being image
voxels and the edges E defined by the neighbourhood structure of these
vertices. We then represented the graph with its adjacency matrix W, where for
each pair of voxels x_i and x_j, x_i ≠ x_j, W_ij is 1 if the voxels are neighbours
and 0 otherwise. The diagonal matrix D gives the total weighting of all
edges connected to each voxel and is computed by D_ii = Σ_j W_ij. The general
graph Laplacian is defined by L = G^{−1}(D − W), with G being the diagonal node
weighting matrix, which we computed according to each voxel i's inverse level
set value, G_ii = 1/x_i. Hence, elements closer to the boundary, with a smaller level
set value, will have a higher weighting when computing the spectra.
The graph spectrum of the level set image at the defined points is given by
the eigen-decomposition of the general graph Laplacian L. The spectral compo-
nents U1LSΩ1 , . . . , UN LSΩ1 represent the fundamental modes of vibrations of the
image, and respectively describe increasing complexity of its geometric features,
from coarse to fine scales.
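The construction above can be sketched for a narrow band of voxels as follows (an illustrative implementation; the dense generalized eigensolver is only suitable for small toy problems, and the function name is ours):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse import csr_matrix, diags

def level_set_spectrum(coords, ls_values, n_modes=4):
    """Spectrum of the general graph Laplacian L = G^{-1}(D - W) on a band of
    voxels: W_ij = 1 for 6-neighbour voxels, D_ii = sum_j W_ij, and node
    weights G_ii = 1/|level-set value| up-weight voxels near the boundary.
    Solved as the generalized problem (D - W) u = lambda * G * u."""
    index = {tuple(c): i for i, c in enumerate(coords)}
    rows, cols = [], []
    for i, c in enumerate(coords):
        for d in np.eye(3, dtype=int):  # +x, +y, +z neighbours
            j = index.get(tuple(c + d))
            if j is not None:
                rows += [i, j]
                cols += [j, i]
    n = len(coords)
    W = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    D = diags(np.asarray(W.sum(axis=1)).ravel())
    G = diags(1.0 / np.maximum(np.abs(np.asarray(ls_values, float)), 1e-6))
    vals, vecs = eigh((D - W).toarray(), G.toarray())
    return vals[:n_modes], vecs[:, :n_modes]

# Toy band: four voxels in a row with unit level-set values (so G = I);
# the spectrum is then that of a path graph, with smallest eigenvalue 0.
coords = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
vals, modes = level_set_spectrum(coords, np.ones(4), n_modes=2)
```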
Mapping the spectra obtained from the level set image decomposition onto
surfaces describes similar patterns of variation as the direct spectral decompo-
sition of surfaces given by [8], as shown in Fig. 1.
Combined Level Set and Diffusion Tensor Spectra. We combined the
level set spectra with the spectra obtained by the decomposition of the diffu-
sion tensor images as described by [4]. Briefly, for obtaining the DTI spectra,
the weights between the graph nodes (also neighbouring voxels) are computed
based on both tensor similarity from the log-Euclidean distance and Euclidean
distance. Our main goal was to optimise the surface correspondence by taking
into account microstructural information inside the white matter. Hence, we sep-
arately compute tensor spectral components U1DTIΩ2, ..., UNDTIΩ2 for a subset
of the image IDTIΩ2 in the deeper white matter structures, i.e. for the voxels
inside the level set boundary (negative level set values) and outside the level
set subset ILSΩ1 (IDTIΩ2 ∩ ILSΩ1 = ∅). The independently computed spectra
were then combined in the same space to obtain the combined spectra, with
Fig. 2. Combined spectral modes for the left hemisphere: shape variation given by the
decomposition a subset of a level set image (edges of the surface) and microstructural
variation given by the decomposition of the diffusion tensor image (inner)
voxels receiving spectral information from either diffusion (inside the WM) or the
surface data (around the boundary) [U1LSΩ1 , U1DT IΩ2 ], . . . , [UN LSΩ1 , UN DT IΩ2 ]
(Fig. 2).
Matching of Multi-modal Spectra. Having the multi-modal spectra of two
subjects R and F , we can now estimate the spatial correspondences between
them by optimising the correspondences between the spectral coordinates defined
by the first k multi-modal components of UR , and UF . We followed the com-
putational scheme introduced in [8]. Briefly, the first k spectral components are
initially corrected for their sign ambiguity by computing the dot product between
the corresponding eigenmodes at similar locations. For this we ran a coherent
point drift (CPD) rigid registration [9] of the respective point clouds, which
we used just to ensure the sign matching of the spectra in both the spectral
and diffusion components, independently. Using the combined spectra and the
thresholded level set distance maps for regularisation of the reference and float-
ing images, we estimate a mapping between the corresponding points using a
nearest neighbour search algorithm.
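A simplified version of this matching step (our own sketch: it assumes the two embeddings are already roughly aligned point-for-point, standing in for the CPD rigid registration, and omits the level-set regularisation):

```python
import numpy as np
from scipy.spatial import cKDTree

def match_spectra(U_ref, U_flt):
    """Correct the sign ambiguity of each eigenmode (eigenvectors are defined
    only up to sign) via the dot product with the reference mode, then find
    for every reference point its nearest neighbour in the shared spectral
    embedding. U_ref, U_flt: (n_points, k) spectral coordinates."""
    signs = np.sign(np.sum(U_ref * U_flt, axis=0))
    signs[signs == 0] = 1.0
    tree = cKDTree(U_flt * signs)
    _, correspondence = tree.query(U_ref)
    return correspondence

# Sanity check: a sign-flipped copy of the embedding maps back to itself.
rng = np.random.default_rng(2)
U = rng.normal(size=(20, 3))
corr = match_spectra(U, -U)
```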
Fig. 3. Groupwise-average standard deviation of mean diffusivity in the cortex at
the early timepoint for the left hemisphere, obtained using the proposed method
and Joint-Spectral Matching of surfaces
Fig. 4. Mean Longitudinal Rates of Change per week in cortical thickness (CT), cortical
fractional anisotropy (FA) and cortical mean diffusivity (MD) in Groupwise Space
5 Discussion
are likely related to increasing dendrification in the cortex. Cortical thick-
ness is increasing in most regions, except the temporal lobe, where it is slightly
decreasing. This result may be connected to the later development of the tem-
poral lobe and the fact that it is the region most affected by preterm birth [3].
We further investigated the interdependency of these multi-modal parameters
of the cortex across the different lobes. We found a positive CT-FA correlation,
while the CT-MD and FA-MD correlations were negative.
Our future research will involve linking grey and white matter properties close
to the surface (e.g. studying cortical laminae in the cortex and closer to the
white matter boundary), as well as linking the cortical surface and deep white
matter connectivity. Furthermore, this method may allow us to also look into the
relationship between cortical folding and fibre-based connectivity.
References
1. Marlow, N., Wolke, D., Bracewell, M.A., Samara, M.: Neurologic and developmen-
tal disability at six years of age after extremely preterm birth. N. Engl. J. Med.
352(1), 9–19 (2005)
2. Kapellou, O., Counsell, S.J., Kennea, N., Dyet, L., Saeed, N., Stark, J.,
Maalouf, E., Duggan, P., Ajayi-obe, M., Hajnal, J., Allsop, J.M., Boardman, J.,
Rutherford, M.A., Cowan, F., Edwards, A.D.: Abnormal cortical development after
premature birth shown by altered allometric scaling of brain growth. PLoS Med.
3(8), e265 (2006)
3. Orasanu, E., Melbourne, A., Cardoso, M.J., Lombaert, H., Kendall, G.S.,
Robertson, N.J., Marlow, N., Ourselin, S.: Cortical folding of the preterm brain: a
longitudinal analysis of extremely preterm born neonates using spectral matching.
Brain Behav. 488, 1–18 (2016)
4. Orasanu, E., Melbourne, A., Lorenzi, M., Modat, M., Eaton-rosen, Z.,
Robertson, N.J., Kendall, G., Ourselin, S.: Tensor spectral matching of diffusion
weighted images. In: SAMI Conference Proceedings, MIDAS Journal, pp. 35–44
(2015)
5. Eaton-Rosen, Z., Melbourne, A., Orasanu, E., Cardoso, M.J., Modat, M., Bain-
bridge, A., Kendall, G.S., Robertson, N.J., Marlow, N., Ourselin, S.: Longitudinal
measurement of the developing grey matter in preterm subjects using multi-modal
MRI. NeuroImage 111, 580–589 (2015)
6. Han, X., Pham, D.L., Tosun, D., Rettmann, M.E., Xu, C., Prince, J.L.: CRUISE:
cortical reconstruction using implicit surface evolution. NeuroImage 23, 997–1012
(2004)
7. Waehnert, M.D., Dinse, J., Weiss, M., Streicher, M.N., Waehnert, P., Geyer, S.,
Turner, R., Bazin, P.: Anatomically motivated modeling of cortical laminae. Neu-
roImage 93, 210–220 (2014)
8. Lombaert, H., Sporring, J., Siddiqi, K.: Diffeomorphic spectral matching of cortical
surfaces. In: Gee, J.C., Joshi, S., Pohl, K.M., Wells, W.M., Zöllei, L. (eds.) IPMI
2013. LNCS, vol. 7917, pp. 376–389. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38868-2_32
9. Myronenko, A., Song, X.: Point set registration: coherent point drift. IEEE Trans.
Pattern Anal. Mach. Intell. 32(12), 2262–2275 (2010)
10. Postelnicu, G., Zollei, L., Fischl, B.: Combined volumetric and surface registration.
IEEE Trans. Med. Imaging 28(4), 508–522 (2009)
Early Diagnosis of Alzheimer’s Disease
by Joint Feature Selection and Classification
on Temporally Structured Support
Vector Machine
1 Introduction
AD introduces both structural and functional loss that is known to have dynami-
cally evolving morphological patterns [1–3, 12–15]. In the past decade, longitudinal
studies have been actively investigated for AD diagnosis with special attention to MCI
[1, 4], which is an intermediate stage between NC and AD. For example, tensor-based
morphometry is used in [4] to reveal brain atrophy patterns from 91 probable AD
patients and 189 MCI subjects scanned at baseline, and after 6, 12, 18, and 24 months.
Moreover, the trend of longitudinal cortical thickness is used as the morphological
patterns in [5] to identify subjects who eventually convert to AD. However, current
longitudinal AD diagnosis methods place very strong restrictions on the longitudinal
image sequence. For example, each subject recruited in [5] must have at least 5 time-points
at six-month intervals, and must develop AD at least 12 months after the baseline
scan. For convenience, many longitudinal approaches assume the number of scans is
equal across subjects, albeit implicitly. In real clinical settings, however, not all patients
have a large or an equal number of imaging scans.
In order to accurately measure the tiny structural changes along time, current
state-of-the-art computer-assisted diagnosis methods have to wait until the patient has
a sufficient number of longitudinal scans. More critically, the prediction is short term, e.g.,
only 6 months before the real onset of AD in [5]. Although promising results have been
achieved in predicting whether a subject has progressed to AD or stays in the MCI stage,
the limitation of short-term prediction substantially hampers deployment in clinical
practice.
In light of this, we propose a flexible solution for early detection of AD by sequentially and consistently recognizing abnormal patterns of structural change from longitudinal MR image sequences. First, we present a novel temporally structured SVM (TS-SVM) which is trained on a set of partial image sequences cut from the complete longitudinal data. Compared to a conventional SVM, our TS-SVM has two major improvements to achieve early alarm and high accuracy in detecting AD progression. (1) Temporal consistency. We enforce a monotonicity constraint to avoid inconsistent detection results over time. Since convergent evidence suggests that AD progression is non-reversible [6, 7], we require that the risk of AD progression increase monotonically within each subject as more and more time points are inspected. (2) Early detection. We employ sequential recognition to achieve the best balance between early alarm and detection accuracy. In the training stage, we specifically train the classifiers by making the classification margin adaptive to the length of the partial image sequence. Given the longitudinal image sequence of a new subject with an arbitrary number of scans, we sequentially examine the longitudinal imaging patterns from baseline onward and raise an alarm for AD conversion as soon as abnormal change is detected with high confidence. Thus, our proposed AD early detection method places no requirement on the number of scans. Second, we further present a joint feature selection and classification framework, so that the selected features are eventually optimal for the learned support vector machine. We have evaluated the performance of AD early detection on more than 150 longitudinal subjects from the ADNI dataset. Our method achieved promising results, alarming AD onset 12 months prior to the clinical diagnosis with at least 82.5 % accuracy.
266 Y. Zhu et al.
2 Methods
2.1 Temporally Structured SVM for Early Detection of AD
The goal of our method is to accurately predict AD conversion as early as possible by longitudinally tracking structural changes. Since magnetic resonance (MR) imaging is non-invasive and widely used in clinical practice, we present a novel temporally structured SVM on longitudinal MR image sequences.
Morphological Features. Suppose we have N training subjects; each subject S_n has an MR image sequence I^n = {I_t^n | t = 1, ..., T_n} (n = 1, ..., N) with T_n longitudinal scans. For each volumetric image I_t^n, we first register the template image (http://qnl.bu.edu/obart/explore/AAL/) with 90 manually labeled ROIs (regions of interest) to the underlying image I_t^n using the HAMMER registration tool, and extract seven morphological features in each ROI: the tissue percentiles (volumetric percentiles of the ROI volume) of white matter (WM), gray matter (GM), cerebro-spinal fluid (CSF), and background, and the averaged voxel-wise Jacobian determinant in the WM, GM, and CSF regions. Therefore, the image feature f_t^n for each volumetric image I_t^n is a 90 × 7 = 630-dimensional feature vector.
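As a concrete sketch of this feature assembly, the per-ROI measurements can be flattened into the 630-dimensional vector f_t; the array layout and helper names below are ours, not the paper's:

```python
import numpy as np

# Hypothetical per-scan feature assembly: for each of the 90 AAL ROIs we
# stack 7 morphological features (tissue percentiles of WM, GM, CSF, and
# background, plus mean Jacobian determinants in WM, GM, and CSF),
# giving 90 * 7 = 630 values per volumetric image.
N_ROI, N_FEAT = 90, 7

def assemble_scan_features(roi_features):
    """roi_features: (90, 7) array -> flat 630-dim feature vector f_t."""
    roi_features = np.asarray(roi_features, dtype=float)
    assert roi_features.shape == (N_ROI, N_FEAT)
    return roi_features.reshape(-1)  # ROI-major ordering

f_t = assemble_scan_features(np.random.rand(N_ROI, N_FEAT))
print(f_t.shape)  # (630,)
```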
Decomposition into Partial Image Sequences. We can decompose the complete longitudinal image sequence I^n into (T_n - 1) partial image sequences P^n = {P^n(b) | b = 2, ..., T_n}, where each P^n(b) = {I_t^n | t = 1, ..., b} is the partial image sequence with b time points from baseline to the (b - 1)-th follow-up. For each P^n(b), we further extract a longitudinal feature representation and form a column vector

$$h(b, n) = \Big[\tfrac{1}{b}\textstyle\sum_{t=1}^{b} \mathbf{f}_t^n;\; \mathbf{f}_1^n - \mathbf{f}_b^n\Big]',$$

where the first half of the elements is the average of the morphological features from baseline to the last time point, and the second half measures the longitudinal difference of the morphological features from baseline to the last follow-up. Each feature representation h(b, n) thus describes both the spatial and the temporal morphological patterns. As we will explain in Sect. 2.2, feature selection is necessary to remove data redundancy from such a high dimension (d = 1,260).
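The decomposition and the construction of h(b, n) can be sketched as follows, a minimal illustration under our own naming, assuming the per-scan features have already been extracted:

```python
import numpy as np

def partial_sequence_features(f):
    """f: (T, d) array of per-scan features f_1..f_T for one subject.
    Returns {b: h(b)} for b = 2..T, where h(b) stacks the average of the
    first b feature vectors with the baseline-to-last difference."""
    T, d = f.shape
    reps = {}
    for b in range(2, T + 1):
        avg = f[:b].mean(axis=0)      # spatial summary over the first b scans
        diff = f[0] - f[b - 1]        # longitudinal change, baseline vs. scan b
        reps[b] = np.concatenate([avg, diff])  # 2d-dim vector (630 -> 1260)
    return reps

f = np.random.rand(5, 630)            # a subject with T = 5 scans
H = partial_sequence_features(f)
print(len(H), H[2].shape)  # 4 partial sequences, each h is 1260-dimensional
```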
Naive Way to Achieve Early Detection by Classic SVM. In our application, the goal of classification is to determine (1) whether we can detect the conversion to AD of a new testing subject based on its MR image sequence Z = {Z_t | t = 1, ..., T_z} up to the current time point T_z; and (2) whether we can detect the AD onset as early as possible, i.e., push T_z as close to baseline as possible. Thus, we regard the early detection of AD as a binary classification problem between MCI non-converters (MCI-NC for short) and MCI converters (MCI-C for short). Without loss of generality, we assume the first M subjects belong to the MCI-NC group and the remaining subjects belong to the MCI-C group. Therefore, we divide all partial image sequences for training into two groups: the MCI-NC group X = {x_{b,p} | x_{b,p} = h(b, p), p = 1, ..., M, b = 1, ..., T_p} and the MCI-C group Y = {y_{b,q} | y_{b,q} = h(b, q), q = M + 1, ..., N, b = 1, ..., T_q}. To achieve the above goal, the naive way is to train an SVM by:
$$\arg\min_{W}\; \|W\|_F^2 + \lambda\,\varepsilon^2, \quad \text{s.t.}\;
\begin{cases}
d_x - (\mathbf{w}_x^T - \mathbf{w}_y^T)\,\mathbf{x}_{b,p} < \varepsilon,\ \varepsilon > 0, & \forall\, \mathbf{x}_{b,p} \in X,\\[2pt]
d_y - (\mathbf{w}_y^T - \mathbf{w}_x^T)\,\mathbf{y}_{b,q} < \varepsilon,\ \varepsilon > 0, & \forall\, \mathbf{y}_{b,q} \in Y,
\end{cases} \tag{1}$$
Fig. 1. Advantages of our TS-SVM (right) over the naive SVM solution (left). In our TS-SVM method, we enforce the temporal monotonicity and consistency constraints on the extracted partial image sequences (shown in the middle).
Compared to the objective function of the naive SVM in Eq. (1), two new constraints (C1 and C2) are used. (1) We first turn the inter-class margins d_x and d_y in Eq. (1) from scalar values into monotonically increasing functions of b (the length of the partial image sequence). Constraint C1 is mainly used to achieve early detection, i.e., we require that the probability of making an accurate classification increase as more time points become available. (2) The second constraint C2 takes advantage of the non-reversible nature of AD progression. Suppose y_{a,q} and y_{b,q} are the morphological features from the same MCI-C subject, with y_{b,q} extracted at a later time point than y_{a,q} (i.e., a < b). Then we require that the probability of the underlying MCI-C subject having converted to AD be higher at the later time point b than at the earlier time point a, i.e., w_y^T y_{b,q} > w_y^T y_{a,q}, since AD conversion is irreversible. Furthermore, the intra-class margin s_y is a monotonically increasing function of l (l = b - a, the length difference between the two partial image sequences). Intuitively, the bigger the gap between two time points, the larger the increase in AD conversion risk. It is worth noting that constraint C2 is not applicable to MCI-NC subjects, since an MCI-NC subject might still convert to AD as more follow-ups are scanned in the future; it is therefore unreasonable to assume that an MCI-NC subject will keep staying in the MCI stage. As shown on the right of Fig. 1, for a particular MCI-C subject, not only the probability score of AD conversion but also the difference between the probability scores of converting to AD and staying in MCI increase monotonically as the partial image sequence becomes longer. Thus, our TS-SVM can detect AD onset at an early stage with high confidence. In all experiments we set d_x(b) = b, d_y(b) = b, and s_y(l) = l.
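The margin schedule and the C2 constraint pairs can be sketched as follows; the helper names (and the `tau` alias for s_y) are ours, not the paper's:

```python
# Sketch of the TS-SVM constraint schedule. The inter-class margins grow
# linearly with the partial-sequence length b, and for each MCI-C subject
# the score at a later time b must exceed the score at an earlier time a
# by at least s_y(l) with l = b - a (constraint C2).
d_x = lambda b: b      # margin for MCI-NC partial sequences, d_x(b) = b
d_y = lambda b: b      # margin for MCI-C partial sequences, d_y(b) = b
tau = lambda l: l      # intra-class margin s_y(l) = l

def c2_pairs(seq_lengths):
    """Enumerate (subject, a, b, margin) tuples for constraint C2, one per
    ordered pair of partial-sequence lengths 2 <= a < b <= T."""
    pairs = []
    for q, T in enumerate(seq_lengths):
        pairs += [(q, a, b, tau(b - a))
                  for a in range(2, T + 1) for b in range(a + 1, T + 1)]
    return pairs

print(c2_pairs([4]))  # [(0, 2, 3, 1), (0, 2, 4, 2), (0, 3, 4, 1)]
```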
The intuition behind using ||W||_{2,1} is twofold: (1) a sparsity constraint on each column of W means that only a small number of features are selected, which suppresses noisy and redundant patterns; and (2) a group-wise constraint on each row of W means that both the MCI-NC and MCI-C classifiers select (or discard) the same morphological features. In this way, W can simultaneously be regarded as a coefficient matrix for feature selection and as a classifier.
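The row-wise structure of the ||W||_{2,1} penalty can be illustrated with a small sketch (our own helper names):

```python
import numpy as np

def l21_norm(W):
    """||W||_{2,1}: sum of the l2 norms of the rows of W. Penalizing it
    drives entire rows to zero, so both classifier columns (MCI-NC and
    MCI-C) end up selecting or discarding the same features."""
    return float(np.linalg.norm(W, axis=1).sum())

def selected_features(W, eps=1e-8):
    """Indices of rows (features) that survive the row-sparsity penalty."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > eps)

W = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 0.0]])
print(l21_norm(W))           # 5.0: only row 1 contributes (norm of [3, 4])
print(selected_features(W))  # [1]
```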
2.3 Optimization
Although Eq. (3) is a convex problem, it is hard to optimize directly due to the large number of linear inequality constraints. To solve it efficiently, we reformulate it as an unconstrained problem following the framework of the Alternating Direction Method of Multipliers (ADMM) [8, 9, 16]. Specifically, we rewrite Eq. (3) as

(4)
where ||·||_h is a hinge loss function which measures the misclassification error with a quadratic loss, ||x||_h = ||max(0, x)||_2^2; μ is the penalty parameter for the constraint W = Z; Λ ∈ R^{d×2} is the Lagrange multiplier matrix for the equality constraint W = Z; Tr(·) denotes the trace operator; and λ is the penalty parameter for the constraints C1 and C2. Equation (4) can be optimized by alternately solving for W and Z until the overall energy function converges.
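The W/Z splitting can be illustrated on a toy problem; the sketch below applies the same ADMM alternation (smooth subproblem, proximal subproblem, dual update) to a lasso objective rather than the paper's Eq. (4), so all names and parameters are stand-ins:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(X, y, lam=0.05, mu=1.0, iters=300):
    """Minimize 0.5||Xw - y||^2 + lam*||z||_1 s.t. w = z via scaled ADMM:
    alternately solve the smooth w-subproblem, the proximal z-subproblem,
    and update the dual variable u until the two copies agree."""
    d = X.shape[1]
    w, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
    A = np.linalg.inv(X.T @ X + mu * np.eye(d))   # cached smooth-step solve
    for _ in range(iters):
        w = A @ (X.T @ y + mu * (z - u))          # W-like subproblem
        z = soft_threshold(w + u, lam / mu)       # Z-like subproblem
        u = u + w - z                             # dual ascent on w = z
    return z

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10); w_true[:3] = [2.0, -1.0, 1.5]
y = X @ w_true
w_hat = admm_lasso(X, y)
print(np.round(w_hat, 1))
```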
3 Experiments
In the following experiments, we select from the latest ADNI dataset 70 MCI-C subjects who have AD onset in the middle of the longitudinal image sequence and 81 MCI-NC subjects who stay in the MCI stage until the last scan. Of all subjects, 95.3 % have 4 follow-ups at 6-month intervals, and the remaining 4.7 % have more than 4 follow-ups. Specifically, of the 70 MCI-C subjects, 11.1 % are diagnosed with AD at 6 months, 31.8 % at 12 months, and 25.3 % at 18 months after the baseline scan, while the remaining 31.8 % are diagnosed with AD more than 24 months after the baseline scan. We compare our proposed TS-SVM based early detection method with the standard SVM based method. Furthermore, we evaluate the importance of feature selection in both the TS-SVM and the standard SVM. Thus, we compare the classification performance of four methods in total, denoted SVM, SVM+FS, TS-SVM, and TS-SVM+FS, respectively. In all experiments, we split the data into 10 non-overlapping folds and report the average classification accuracy after 10-fold cross-validation. The parameters are tuned by grid search on the training data only.
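The evaluation protocol (outer 10-fold cross-validation with grid search confined to the training folds) can be sketched with scikit-learn; the synthetic dataset and the parameter grid below are stand-ins, not the paper's actual features or search space:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Stand-in data: 150 subjects, 30 features (not the real ADNI features).
X, y = make_classification(n_samples=150, n_features=30, random_state=0)

# The inner CV tunes C on the training split only, so the outer 10-fold
# CV never lets test-fold data influence the chosen parameters.
inner = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10]}, cv=5)
acc = cross_val_score(inner, X, y, cv=10)
print(round(acc.mean(), 3))
```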
Performance of AD Early Detection. In each cross-validation fold, we train our TS-SVM on the training data and sequentially apply the trained classifier to the testing subject's image sequence from the first follow-up. Since the month of conversion to AD after the baseline scan varies across MCI-C subjects, we show the detection accuracy for MCI-C subjects converting to AD 12 months, 18 months, and 24 months after the baseline scan in Tables 1, 2 and 3, separately. It is clear that our TS-SVM beats the standard SVM by more than 10 % in classification accuracy, which shows the advantage of the temporal consistency and monotonicity constraints in our proposed method. Feature selection is also very important for improving detection accuracy: SVM+FS and TS-SVM+FS obtain average increases of 3.8 % and 2.9 % over SVM and TS-SVM, respectively. In brief, our full method (TS-SVM+FS) can detect AD 6 months prior to AD onset with 86.8 % accuracy, 12 months prior with 82.5 % accuracy, and 18 months prior with 76.5 % accuracy. Note that the early detection performance in Table 3 is worse than in Tables 1 and 2 at the corresponding pre-diagnosis windows. The reason is that the subjects in Table 3 mostly have 5 time points and have AD onset exactly at the last time point; the unbalanced partial image sequences before and after AD onset challenge the learning of robust classifiers.
Critical Brain Regions Related to AD Progression. Since our method jointly selects morphological features while training the TS-SVM, it is interesting to examine the critical brain regions whose morphological features contribute significantly to detecting AD progression via longitudinal tracking. Figure 2 shows the top 20 regions selected by our TS-SVM+FS method. The selected brain regions are located in AD-related sub-cortical regions (such as the putamen, thalamus, and hippocampus) and cortical areas (such as the orbitofrontal cortex, medial/lateral temporal lobe, and medial/lateral parietal lobe), which is in consensus with neuroimaging observations in the literature [10, 11]. We also compared the top selected ROIs for short-term and long-term detection and found that the cortical regions contribute more to short-term detection, while the sub-cortical regions, such as the putamen, thalamus, and hippocampus, contribute more to detecting long-term converters. This may indicate that changes in the sub-cortical regions are more significant than those in the cortical regions at the earlier AD progression stages. We do not visualize this result due to page limitations.
Table 1. Accuracy of AD early detection at 6 months and 0 months before AD onset for the MCI-C subjects converting to AD 12 months after the baseline scan.

| Method     | 18 mo ACC | 18 mo AUC | 12 mo ACC | 12 mo AUC | 6 mo ACC | 6 mo AUC | 0 mo ACC | 0 mo AUC |
|------------|-----------|-----------|-----------|-----------|----------|----------|----------|----------|
| SVM        | -         | -         | -         | -         | 0.7110   | 0.7612   | 0.7345   | 0.7937   |
| SVM+FS     | -         | -         | -         | -         | 0.7557   | 0.7862   | 0.7735   | 0.8237   |
| TS-SVM     | -         | -         | -         | -         | 0.8816   | 0.9327   | 0.8975   | 0.9431   |
| TS-SVM+FS  | -         | -         | -         | -         | 0.9025   | 0.9649   | 0.9075   | 0.9776   |
Table 2. Accuracy of AD early detection at 12 months, 6 months, and 0 months before AD onset for the MCI-C subjects converting to AD 18 months after the baseline scan.

| Method     | 18 mo ACC | 18 mo AUC | 12 mo ACC | 12 mo AUC | 6 mo ACC | 6 mo AUC | 0 mo ACC | 0 mo AUC |
|------------|-----------|-----------|-----------|-----------|----------|----------|----------|----------|
| SVM        | -         | -         | 0.7325    | 0.7822    | 0.7455   | 0.7917   | 0.7535   | 0.8223   |
| SVM+FS     | -         | -         | 0.7537    | 0.7912    | 0.7685   | 0.8123   | 0.7725   | 0.8314   |
| TS-SVM     | -         | -         | 0.8425    | 0.8851    | 0.8593   | 0.9042   | 0.8635   | 0.9128   |
| TS-SVM+FS  | -         | -         | 0.8475    | 0.8932    | 0.8720   | 0.9277   | 0.8812   | 0.9216   |
Fig. 2. The top 20 critical brain regions which contributed to AD early detection.
4 Conclusion
In this paper, we present a novel early AD diagnosis method using a temporally structured SVM. In order to avoid inconsistent and unrealistic classification results, we impose monotonicity on the output of the SVM, since AD progression is generally non-reversible. In order to achieve an early alarm of AD onset, we adjust the classification margin such that the confidence of detecting AD progression grows as more and more follow-up scans are examined. Furthermore, we jointly perform feature selection and training of the TS-SVM, so that the selected features work well with the trained classifiers.
References
1. Thompson, P.M., Hayashi, K.M., Dutton, R.A., Chiang, M.C., Leow, A.D., Sowell, E.R., De
Zubicaray, G., Becker, J.T., Lopez, O.L., Aizenstein, H.J., Toga, A.W.: Tracking
Alzheimer’s disease. Ann. NY Acad. Sci. 1097, 198–214 (2007)
2. Chételat, G., Baron, J.-C.: Early diagnosis of Alzheimer’s disease: contribution of
structural neuroimaging. NeuroImage 18, 525–541 (2003)
3. Reisberg, B., Ferris, S.H., Kluger, A., Franssen, E., Wegiel, J., de Leon, M.J.: Mild cognitive
impairment (MCI): a historical perspective. Int. Psychogeriatr. 20, 18–31 (2008)
4. Hua, X., Lee, S., Hibar, D.P., Yanovsky, I., Leow, A.D., Toga Jr., A.W., Jack, C.R.,
Bernstein, M.A., Reiman, E.M., Harvey, D.J., Kornak, J., Schuff, N., Alexander, G.E.,
Weiner, M.W., Thompson, P.M.: Mapping Alzheimer’s disease progression in 1309 MRI
scans: power estimates for different inter-scan intervals. NeuroImage 51, 63–75 (2010)
5. Li, Y., Wang, Y., Wu, G., Shi, F., Zhou, L., Lin, W., Shen, D.: Discriminant analysis of
longitudinal cortical thickness changes in Alzheimer’s disease using dynamic and network
features. Neurobiol. Aging 33, 427.e415–430 (2012)
6. Hua, X., Gutman, B., Boyle, C.P., Rajagopalan, P., Leow, A.D., Yanovsky, I., Kumar, A.R.,
Toga Jr., A.W., Jack, C.R., Schuff, N., Alexander, G.E., Chen, K., Reiman, E.M., Weiner,
M.W., Thompson, P.M.: Accurate measurement of brain changes in longitudinal MRI scans
using tensor-based morphometry. NeuroImage 57, 5–14 (2011)
7. Filley, C.: Alzheimer’s disease: it’s irreversible but not untreatable. Geriatrics 50, 18–23
(1995)
8. Boyd, S., et al.: Distributed optimization and statistical learning via the ADMM. Found.
Trends Mach. Learn. 3, 1–122 (2011)
9. Nie, F., Huang, Y., Wang, X., Huang, H.: New primal SVM solver with linear computational
cost for big data classifications. In: ICML (2014)
10. Hoesen, G.W.V., Parvizi, J., Chu, C.-C.: Orbitofrontal cortex pathology in Alzheimer’s
disease. Cereb. Cortex 10, 243–251 (2000)
11. Risacher, S., Saykin, A.: Neuroimaging biomarkers of neurodegenerative diseases and
dementia. Semin. Neurol. 33, 386–416 (2013)
12. Antila, K., Lötjönen, J., Thurfjell, L., et al.: The PredictAD project: development of novel
biomarkers and analysis software for early diagnosis of the Alzheimer’s disease. Interface
Focus 3(2012)
13. Lorenzi, M., Ziegler, G., Alexander, D.C., Ourselin, S.: Efficient Gaussian process-based
modelling and prediction of image time series. In: Ourselin, S., Alexander, D.C.,
Westin, C.-F., Cardoso, M. (eds.) IPMI 2015. LNCS, vol. 9123, pp. 626–637. Springer,
Heidelberg (2015)
14. Young, A.L., et al.: A data-driven model of bio-marker changes in sporadic Alzheimer’s
disease. Brain 25, 64–77 (2014)
15. Fonteijn, H.M., et al.: An event-based model for disease progression and its application in
familial Alzheimer’s disease and Huntington’s disease. NeuroImage 60, 1880–1889 (2012)
16. Zhu, Y., Lucey, S.: Convolutional sparse coding for trajectory reconstruction. IEEE Trans.
Pattern Anal. Mach. Intell. 37(3), 529–540 (2015)
Prediction of Memory Impairment with MRI
Data: A Longitudinal Study of Alzheimer’s
Disease
1 Introduction
Alzheimer’s Disease (AD), the most common form of dementia, is a neurodegenerative disorder which severely impacts patients’ thinking, memory, and behavior. Current consensus has emphasized the demand for early recognition of this disease, with which the goal of stopping or slowing down the disease progression can be achieved [8]. The effectiveness of neuroimaging in predicting the progression of AD or cognitive performance has been studied and reported in plentiful research [4,12]. However, much previous research merely paid attention to prediction using the baseline data, neglecting the correlation among longitudinal cognitive performance. AD is a progressive neurodegenerative disorder, so it is important to discover neuroimaging measures that impact the progression of this disease along the time axis.
X. Wang and H. Huang were supported in part by NSF IIS-1117965, IIS-1302675,
IIS-1344152, DBI-1356628, and NIH AG049371. D. Shen was supported in part by
NIH AG041721.
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 273–281, 2016.
DOI: 10.1007/978-3-319-46720-7 32
274 X. Wang et al.
In our multi-task problem, suppose these mT tasks come from c groups, where the tasks in each group are correlated. We can introduce and optimize a group index matrix set Q = {Q_1, Q_2, ..., Q_c} to discover this group structure. Each Q_i is a diagonal matrix with Q_i ∈ {0, 1}^{mT×mT} showing the assignment of tasks to the i-th group. For the (k, k)-th element of Q_i, (Q_i)_{kk} = 1 means that the k-th task belongs to the i-th group, while (Q_i)_{kk} = 0 means it does not. To avoid overlap of groups, we constrain $\sum_{i=1}^{c} Q_i = I$.
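A minimal sketch of these group index matrices (our own helper, assuming a known task-to-group assignment):

```python
import numpy as np

def group_index_matrices(assign, c):
    """Build diagonal 0/1 matrices Q_1..Q_c from a task -> group assignment.
    assign: length-mT sequence with values in {0, ..., c-1}."""
    assign = np.asarray(assign)
    Q = [np.diag((assign == i).astype(float)) for i in range(c)]
    # Non-overlapping groups partition the tasks: sum_i Q_i = I.
    assert np.allclose(sum(Q), np.eye(len(assign)))
    return Q

Q = group_index_matrices([0, 1, 0, 2, 1], c=3)
print(np.diag(Q[0]))  # [1. 0. 1. 0. 0.]: tasks 0 and 2 are in group 0
```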
Since each group of tasks shares correlative dependence, we can reasonably assume that the latent subspace of each group maintains a low-rank structure. We impose the Schatten p-norm as a low-rank constraint to uncover the common subspace shared by different tasks. According to the discussion below, the Schatten p-norm makes a better approximation of the low-rank constraint than the popular trace norm regularization [7].
For a matrix A ∈ R^{d×n} with i-th singular value σ_i, the rank of A can be written as $\mathrm{rank}(A) = \sum_{i=1}^{\min\{d,n\}} \sigma_i^0$, where $0^0 = 0$. The p-th power of the Schatten p-norm (0 < p < ∞) of A is defined as

$$\|A\|_{S_p}^p = \mathrm{Tr}\big((A^T A)^{\frac{p}{2}}\big) = \sum_{i=1}^{\min\{d,n\}} \sigma_i^p.$$

In particular, when p = 1 the Schatten p-norm of A is exactly its trace norm:

$$\|A\|_{S_1} = \mathrm{Tr}\big((A^T A)^{\frac{1}{2}}\big) = \sum_{i=1}^{\min\{d,n\}} \sigma_i = \|A\|_*.$$

So when 0 < p < 1, the Schatten p-norm is a better low-rank regularization than the trace norm. Accordingly, our longitudinal structured low-rank regression model is:
model is:
T
c
T
min Wt Xt − Yt 2 + γ p
(WQi Sp )l .
c F (1)
W,Qi |ci=1 ∈{0,1}mT ×mT , Qi =I t=1 i=1
i=1
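The Schatten p-norm and its p = 1 special case can be checked numerically (a sketch; `schatten_p` is our own name):

```python
import numpy as np

def schatten_p(A, p):
    """p-th power of the Schatten p-norm: Tr((A^T A)^(p/2)) = sum_i s_i^p,
    computed from the singular values s_i of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s ** p))

A = np.random.default_rng(1).standard_normal((5, 4))
s = np.linalg.svd(A, compute_uv=False)

# p = 1 recovers the trace (nuclear) norm ||A||_* = sum of singular values.
print(np.isclose(schatten_p(A, 1.0), s.sum()))  # True
# As p -> 0, sum_i s_i^p -> rank(A), so small p is a tighter rank surrogate.
print(round(schatten_p(A, 0.01), 1), np.linalg.matrix_rank(A))
```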
The second step is fixing Q and solving W, and then Problem (2) becomes:

$$\min_{W} \; \sum_{t=1}^{T} \big\|W_t^T X_t - Y_t\big\|_F^2 + \gamma \sum_{i=1}^{c} \mathrm{Tr}\big(W^T D_i W Q_i\big). \tag{6}$$
$$\min_{W_t} \; \big\|W_t^T X_t - Y_t\big\|_F^2 + \gamma \sum_{i=1}^{c} \mathrm{Tr}\big(W_t^T D_i W_t Q_{it}\big). \tag{7}$$
Taking the derivative w.r.t. (w_t)_k in Problem (8) and setting it to zero, we get:

$$(w_t)_k = \Big(X_t X_t^T + \gamma \sum_{i=1}^{c} (Q_{it})_{kk}\, D_i\Big)^{-1} X_t \big((y_t)_k\big)^T. \tag{9}$$
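Eq. (9) is a ridge-like closed-form solve for each row of W_t; a hedged sketch follows (shapes and names are ours, and the `Ds` matrices are placeholders):

```python
import numpy as np

def update_wt_row(Xt, yt_k, Ds, q_kk, gamma):
    """Closed-form update in the spirit of Eq. (9).
    Xt: (d, n) features at time t; yt_k: (n,) k-th response row;
    Ds: list of (d, d) matrices D_i; q_kk: the indicators (Q_it)_kk."""
    R = sum(q * D for q, D in zip(q_kk, Ds))     # group-weighted regularizer
    return np.linalg.solve(Xt @ Xt.T + gamma * R, Xt @ yt_k)

rng = np.random.default_rng(0)
d, n = 6, 20
Xt = rng.standard_normal((d, n))
yt_k = rng.standard_normal(n)
Ds = [np.eye(d), 2.0 * np.eye(d)]                # placeholder D_i matrices
w = update_wt_row(Xt, yt_k, Ds, q_kk=[1.0, 0.0], gamma=0.5)
print(w.shape)  # (6,)
```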
4 Experimental Results
Data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu). Each T1-weighted MRI was first anterior commissure (AC)–posterior commissure (PC) corrected using MIPAV, intensity-inhomogeneity corrected using the N3 algorithm [10], skull stripped [16] with manual editing, and cerebellum-removed [15]. We then used FAST [17] in the FSL package to segment the image into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), and used HAMMER [9] to register the images to a common space. GM volumes obtained from 93 ROIs defined in [5], normalized by the total intracranial volume, were extracted as features. Longitudinal scores were downloaded from three independent cognitive assessments: the Fluency Test, Rey's Auditory Verbal Learning Test (RAVLT), and the Trail Making Test (TRAILS). The details of these cognitive assessments can be found in the ADNI procedure manuals. The time points examined in this study, for both imaging markers and cognitive assessments, were baseline (BL), Month 6 (M6), Month 12 (M12), and Month 24 (M24). All participants with no missing BL/M6/M12/M24 MRI measurements and cognitive measures were included in this study: a total of 385 subjects, among which 56 are AD samples, 181 MCI samples, and 148 healthy control (HC) samples. Seven cognitive scores were included: (1) RAVLT TOTAL, RAVLT TOT6, and RAVLT RECOG scores from the RAVLT cognitive assessment; (2) FLU ANIM and FLU VEG scores from the Fluency cognitive assessment; and (3) Trails A and Trails B scores from the Trail Making Test.
We first evaluate the ability of our method to predict a certain set of cognitive scores via neuroimaging markers. We tracked the process along the time axis and aimed to find the set of markers which could influence the cognitive score over the time points. As evaluation metrics, we report the Root Mean Square Error (RMSE) and the Correlation Coefficient (CorCoe) between the predicted score and the ground truth.
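Both metrics are standard; for completeness, a minimal implementation:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between prediction and ground truth."""
    d = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(d ** 2)))

def corcoe(y_true, y_pred):
    """Pearson correlation coefficient between prediction and ground truth."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(round(rmse(y_true, y_pred), 4), round(corcoe(y_true, y_pred), 4))
```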
We compared our method with all the counterparts discussed in the introduction: Multivariate Linear Regression (MLR), Multivariate Ridge Regression (MRR), Longitudinal Trace-norm Regression (LTR), Longitudinal 2,1-norm Regression (L21R), and their combination (L21R + LTR). To illustrate the advantage of simultaneously conducting task correlation and longitudinal feature learning, we also compared with the baseline of using K-means to cluster the tasks first and then applying LTR in each group (K-means + LTR).
We utilized 10-fold cross-validation and ran each method 50 times. The average RMSE and CorCoe over these 500 trials are reported. Since MLR and MRR were not designed for longitudinal tasks, we computed the weight matrix for each time point separately and then merged them into the final weight matrix according to the definition W = [W_1, ..., W_T]. In this experiment, the number of time points T is 4. Our initial analyses indicated that our model performs fairly stably when choosing parameter l from {2, 2.5, ..., 5} and parameter p from {0.1, 0.2, ..., 0.8} (data not shown). In our experiments, we fixed p = 0.1 and l = 3.
The experimental results are summarized in Table 1. From all the results, we notice that our method outperforms all other methods consistently on all data sets. The reasons are as follows. MLR and MRR assume the cognitive measures at different time points to be independent and thus do not consider the correlations along time; their neglect of the longitudinal correlation within the data is detrimental to their prediction ability. As for L21R, LTR, and their combination LTR + L21R, even though they take the longitudinal information into account, they cannot handle the possible group structure within the cognitive
scores. That is why they outperform the standard methods like MLR and MRR in most cases, but are inferior to our proposed method. For K-means + LTR, the clustering step is detached from the longitudinal association study, thus the learned interrelation structure is not optimal for the subsequent longitudinal learning process. As for our proposed method, we not only captured longitudinal correlations among imaging features, but also detected group structure within the cognitive scores. As was discussed in the theoretical sections, our model is able to find features which impact the cognitive result at different stages and meanwhile cluster the cognitive results into groups. Thus, our model can capture features responsible for some, but not necessarily all, cognitive measures along the time axis.

Fig. 1. Heat maps of our learned weight matrices on the RAVLT cognitive assessment via MRI data. The weight matrices at four time points (BL, M6, M12, and M24) are plotted; we draw two matrices for each time point, where the left figure is for the left hemisphere and the right figure for the right hemisphere. For each weight matrix, columns denote neuroimaging features while rows represent three different RAVLT scores, which are RAVLT TOTAL, RAVLT TOT6, and RAVLT RECOG, respectively. Imaging features (columns) with larger weights possess higher correlation with the corresponding cognitive scores.

We further take a special case, the RAVLT assessment, as an example to analyze the results of our model. RAVLT is composed of three cognitive measures, which are: (1) the total number of words kept in mind by the testee in the first five
trials, RAVLT TOTAL; (2) the number of words recalled during the 6th trial, RAVLT TOT6; and (3) the number of words recognized after a gap of 30 min, RAVLT RECOG. Common sense suggests these three measures are interrelated, and thus should be clustered into the same group by our model. The results consistently obey this rule: no matter what the value of c (the number of groups) is, our model invariably puts all three measures in the same group, which is in line with reality. Notably, when c is larger than the real number of groups, the extra groups become empty.
Figure 1 shows the heat maps of the weight matrices learned by our method. The figures demonstrate the capture of a small set of features that are consistently associated with a certain group of cognitive measures (here the group includes all measures). Among the selected features, the top two are the hippocampal formation and the thalamus, whose impacts on AD have already been established in previous papers [2,3]. In summary, our model is able to select a small set of features that consistently correlate with a certain group of cognitive measures along the time axis, and the effectiveness of the selected features is confirmed by previous reports in the literature.
5 Conclusion
References
1. Bezdek, J.C., Hathaway, R.J.: Convergence of alternating optimization. Neural
Parallel Sci. Comput. 11(4), 351–368 (2003)
2. De Jong, L., Van der Hiele, K., Veer, I., Houwing, J., Westendorp, R., Bollen, E.,
De Bruin, P., Middelkoop, H., Van Buchem, M., Van Der Grond, J.: Strongly
reduced volumes of putamen and thalamus in Alzheimer’s disease: an MRI study.
Brain 131(12), 3277–3285 (2008)
3. De Leon, M., George, A., Golomb, J., Tarshish, C., Convit, A., Kluger, A.,
De Santi, S., Mc Rae, T., Ferris, S., Reisberg, B., et al.: Frequency of hippocam-
pal formation atrophy in normal aging and Alzheimer’s disease. Neurobiol. Aging
18(1), 1–11 (1997)
4. Ewers, M., Sperling, R.A., Klunk, W.E., Weiner, M.W., Hampel, H.: Neuroimaging
markers for the prediction and early diagnosis of Alzheimer’s disease dementia.
Trends Neurosci. 34(8), 430–442 (2011)
1 Introduction
A popular way to improve the power of classifiers is to expand from a single data set (Fig. 1(a)) to multiple, independently collected sets of the same disease (Fig. 1(b)) [1]. To analyze across multiple sets, neuroimaging studies generally first harmonize the data by, for example, regressing demographic factors out of the MRI measurements, and then train the classifier to distinguish disease from control samples [2]. However, harmonization might mitigate group differences, making classification difficult (as in Fig. 2). To improve classification accuracy, we propose the first approach to jointly learn how to harmonize MR image data and classify disease. Harmonization relies on both controls and disease
Equal contribution by Dr. Zhang and Dr. Park.
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 282–290, 2016.
DOI: 10.1007/978-3-319-46720-7 33
Joint Data Harmonization and Classification 283
Fig. 1. Examples for classifying data: (a) classification based on a single set, (b) separating healthy and disease based on multiple sets requiring harmonization, and (c) separating two disease groups based on harmonizing the controls of disease-specific data sets.
Let a data set consist of a set SA of controls and samples with disease A, and
an independently collected set SB of controls and samples with disease B. The
284 Y. Zhang et al.
four subsets are matched with respect to demographic scores, such as age. Each
sample s of the joint set is represented by a vector of image features x_s and
a label y_s, where y_s = −1 if s ∈ SA and y_s = +1 if s ∈ SB. The acquisition
differences between SA and SB are assumed to impact the image features linearly.
To extract disease-separating patterns from this joint set, we review
the training of a sequential model for data harmonization and classification, and
then propose to simultaneously parameterize both tasks by minimizing a single
energy function.
which can be solved via penalty decomposition [4,6]. Note that Eq. (3) turns
into a sparsity-constrained problem if each group gi is of size 1. Furthermore,
setting k = nG turns Eq. (3) into a logistic regression problem. Finally, Eq. (3)
can distinguish a single disease from controls by simply replacing y_s in Eq. (2)
with a variable encoding assignment to cohorts instead of data sets.
where λ ∈ (0, 1). Note that the model fails to classify when λ = 1 (v and w are
undefined) or to harmonize when λ = 0 (entries of U are undefined). Motivated by
[4], we simplify optimization by first parameterizing the classifier with respect
to the ‘unconstrained’ vector q before determining the corresponding sparse
solution w. The solution to Eq. (4) is estimated by iteratively increasing the
penalty weight μ of

(U*, v*, w*, q*) := arg min_{U, v, w ∈ S, q} (1 − λ) · l_D(U, v, q) + λ · h_C(U) + μ · ‖w − q‖₂²    (5)
As this minimization is over a convex and smooth function, Eq. (6) is solved via
gradient descent. Note that determining U* is equivalent to increasing the sepa-
ration between the two disease groups by minimizing l_D(·, v*, q*) while reducing
the difference between the two control groups by minimizing h_C(·).
Next, BCD updates v and q by keeping (U*, w*) fixed in Eq. (5), i.e.,

(v*, q*) := arg min_{v, q} (1 − λ) · l_D(U*, v, q) + μ · ‖w* − q‖₂²    (7)
As shown in [4,6], the closed-form solution of Eq. (8) first computes ‖g_i(q*)‖₂
for each group i and then sets w* to those entries of q* that are assigned to
the k groups with the highest norms. The remaining entries of w* are set to 0.
The procedures (6)–(8) are repeated until the relative changes of (U*, v*, w*, q*)
between iterations are smaller than a predefined threshold. (U, v, w, q) is then
updated with the converged (U*, v*, w*, q*), the penalty weight μ is increased,
and another BCD loop is initiated, until w* and q* converge towards each other
(see Algorithm 1 for details).
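The closed-form hard-thresholding step described above (setting w* to the entries of q* in the k groups with the highest norms) can be sketched as follows; the function name and the group-index representation are illustrative, not taken from the paper:

```python
import numpy as np

def group_hard_threshold(q, groups, k):
    """Sketch of the closed-form w-update described in the text:
    keep the entries of q belonging to the k groups with the
    largest l2-norms, and zero out the rest.

    q      : 1-D weight vector
    groups : list of index arrays, one per group g_i
    k      : number of groups to retain
    """
    norms = np.array([np.linalg.norm(q[g]) for g in groups])
    top_k = np.argsort(norms)[-k:]          # groups with the highest norms
    w = np.zeros_like(q)
    for j in top_k:
        w[groups[j]] = q[groups[j]]         # copy the surviving entries
    return w
```

With group size 1 this reduces to ordinary top-k hard thresholding, matching the remark that Eq. (3) becomes a sparsity-constrained problem in that case.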
Figure 2 showcases the differences between sequential and joint harmoniza-
tion and classification. Two synthetic data sets each consist of a control and a
disease cohort, where the raw scores of each cohort are 20 random samples from a
Gaussian distribution whose covariance is the identity matrix multiplied
by 0.01. The mean of the Gaussian for Disease A of Set I (blue) is (1.3, 2), result-
ing in samples that are somewhat separated from those of Disease B of Set II
(mean = (1.5, 2), red). The difference in data acquisition between the two sets is
simulated by an even larger separation of the means of the two control
groups (Set I: mean = (0.9, 1), green; Set II: mean = (1.2, 1), black). The sequen-
tial method (see Sect. 2.1 without sparsity) harmonizes the scores so that the
controls of the two sets are merged into one group, i.e., the separating plane
(black line) is impartial to acquisition differences. This plane fails to perfectly
separate the two disease cohorts, as the cohorts are ‘pushed’ together, with the
mean of Disease B now lying to the right of the mean of Disease A. Higher accuracy
in disease classification is achieved by our joint model (omitting sparsity) with
λ = 0.8. Comparing this plot to the results with λ = 0.5 shows that, as λ decreases,
the emphasis on separating the two diseases increases, as intended by Eq. (6). The
classifier is still impartial to acquisition differences and perfectly labels the
samples of the two disease cohorts. In summary, the joint model enables data
harmonization that preserves group differences, which was not the case for the
sequential approach.
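The synthetic setup just described can be reproduced in a few lines; the random seed, variable names, and the use of NumPy's generator API are our own choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
cov = 0.01 * np.eye(2)                  # identity matrix scaled by 0.01

def cohort(mean, n=20):
    """Draw n raw scores for one cohort (sketch of the Fig. 2 setup)."""
    return rng.multivariate_normal(mean, cov, size=n)

disease_a = cohort([1.3, 2.0])   # Set I, blue
disease_b = cohort([1.5, 2.0])   # Set II, red
control_a = cohort([0.9, 1.0])   # Set I, green
control_b = cohort([1.2, 1.0])   # Set II, black
```

Note that the simulated acquisition shift between the control means (0.3 along the first axis) is larger than the disease-mean separation (0.2), which is exactly what makes naive harmonization 'push' the disease cohorts together.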
The group sparsity constraint aided in separating diseases and identified pat-
terns of regions (i.e., non-zero weights w) impacted by either MCI or HAND (or
HIV). Each column of Fig. 3 shows the largest, unique pattern associated with
a training set. For those training sets that selected multiple patterns (i.e., w
settings), patterns with fewer regions were always included in the largest pattern.
The precentral gyrus, cerebellum VIII, and lateral ventricle were parts of all pat-
terns. HIV is known to impact the cerebellum [10], and accelerated enlargement
of the ventricles is linked to both HIV [11] and MCI [12]. These findings indicate
that the extracted patterns are informative with respect to MCI and HAND
(and HIV), although this requires an in-depth morphometric analysis for confirmation.
Fig. 3. Each column shows the largest, unique pattern extracted by JointGroup on one
of the 5 training sets. Identified regions are impacted by HAND, HIV, or MCI.
4 Conclusion
Acknowledgement. This research was supported in part by the NIH grants U01
AA017347, AA010723, K05-AA017168, K23-AG032872, and P30 AI027767. We thank
Dr. Valcour for giving us access to the UHES data set. With respect to the ADNI data,
collection and sharing for this project was funded by the NIH Grant U01 AG024904 and
DOD Grant W81XWH-12-2-0012. Please see https://adni.loni.usc.edu/wp-content/
uploads/how_to_apply/ADNI_DSP_Policy.pdf for further details.
References
1. Jovicich, J., et al.: Multisite longitudinal reliability of tract-based spatial statistics
in diffusion tensor imaging of healthy elderly subjects. Neuroimage 101, 390–403
(2014)
2. Moradi, E., et al.: Predicting symptom severity in autism spectrum disorder based
on cortical thickness measures in agglomerative data. bioRxiv (2016)
3. Sabuncu, M.R.: A universal and efficient method to compute maps from image-
based prediction models. In: Golland, P., Hata, N., Barillot, C., Hornegger, J.,
Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8675, pp. 353–360. Springer, Heidelberg
(2014). doi:10.1007/978-3-319-10443-0_45
290 Y. Zhang et al.
4. Zhang, Y., et al.: Computing group cardinality constraint solutions for logistic
regression problems. Medical Image Analysis (2016, in press)
5. Sanmarti, M., et al.: HIV-associated neurocognitive disorders. J.M.P. 2(2) (2014)
6. Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM
J. Optim. 23(4), 2448–2478 (2013)
7. Nir, T.M., et al.: Mapping white matter integrity in elderly people with HIV. Hum.
Brain Mapp. 35(3), 975–992 (2014)
8. Pfefferbaum, A., et al.: Variation in longitudinal trajectories of regional brain vol-
umes of healthy men and women (ages 10 to 85 years) measured with atlas-based
parcellation of MRI. Neuroimage 65, 176–193 (2013)
9. Fisher, R.: The logic of inductive inference. J. Roy. Stat. Soc. 1(98), 38–54 (1935)
10. Chang, L., et al.: Impact of apolipoprotein E 4 and HIV on cognition and brain
atrophy: antagonistic pleiotropy and premature brain aging. Neuroimage 4(58),
1017–1027 (2011)
11. Thompson, P.M., et al.: 3D mapping of ventricular and corpus callosum abnormal-
ities in HIV/AIDS. Neuroimage 31(1), 12–23 (2006)
12. Nestor, S.M., et al.: Ventricular enlargement as a possible measure of Alzheimer’s
disease progression validated using the Alzheimer’s disease neuroimaging initiative
database. Brain 131(9), 2443–2454 (2008)
Progressive Graph-Based Transductive
Learning for Multi-modal Classification
of Brain Disorder Disease
1 Introduction
Alzheimer’s disease (AD) is the most common neurological disorder in the older
population. There is overwhelming evidence in the literature that its morphological
patterns are observable by means of structural and diffusion MRI or PET [1–3].
However, abnormal morphological patterns are often subtle compared to the high
inter-subject variation. Hence, sophisticated pattern recognition methods are in high
demand to accurately identify individuals at different stages of AD progression.
Medical imaging applications often deal with high-dimensional data and usually
a small number of samples with ground-truth labels. Thus, it is very challenging to find a
general model that works well for an entire set of data. Hence, graph-based
transductive learning (GTL) methods have been investigated with great success in the
medical imaging area [4, 5], since they can overcome the above difficulties by taking
advantage of the data representation on unlabeled testing subjects. In current
state-of-the-art methods, a graph is used to represent the subject-wise relationships.
Specifically, each subject, whether labeled or unlabeled, is treated as a graph node.
Two subjects are connected by a graph link (i.e., an edge) if they have similar
morphological patterns. Using these connections, the labels can be propagated
throughout the graph until all latent labels are determined. Many label propagation
strategies have been proposed to determine the latent labels of testing subjects based
on the subject-wise relationships encoded in the graph [6].
The assumption of current methods is that the graph constructed in the observed
feature domain represents the real data distribution and can be transferred to guide label
propagation. However, this assumption usually does not hold, since morphological
patterns are often highly complex and heterogeneous. Figure 1(a) shows the affinity
matrix of 51 AD and 52 NC subjects using the ROI-based features extracted from each
MR image, where red and blue dots denote high and low subject-wise similarities,
respectively. Since clinical data (e.g., MMSE and CDR scores [1]) are more related
to the clinical labels, we use these clinical scores to construct another affinity matrix,
shown in Fig. 1(c). It is apparent that the data representations using structural image
features and clinical scores are completely different. Thus, there is no guarantee that the
graph learned from the affinity matrix in Fig. 1(a) can effectively guide the classifi-
cation of AD and NC subjects. More critically, the affinity matrix using observed image
features is not even necessarily optimal in the feature domain, due to possible imaging
noise and outlier subjects. Many studies take advantage of multi-modal information to
improve the discriminative power of transductive learning. However, the graphs from
different modalities might differ too, as shown in the affinity matrices using
structural image features from MR images (Fig. 1(a)) and functional image features
from PET images (Fig. 1(b)). Graph diffusion [5] was recently proposed to find a
common graph. Unfortunately, as shown in Fig. 1, it is hard to find a combination of
the graphs in Fig. 1(a) and (b) that leads to the graph in Fig. 1(c), which is more
related to the final classification task.
Fig. 1. Affinity matrices using structural image features (a), functional image features (b), and
clinical scores (c).
Progressive Graph-Based Transductive Learning 293
To solve these issues, we propose a pGTL method to learn the intrinsic data
representation, which is eventually optimal for label propagation. Specifically,
the intrinsic data representation is required to be (a) close to the subject-wise
relationships constructed from the image features extracted from different modalities,
and (b) verified on the training data and guaranteed to be optimal for label
classification. To that end, we simultaneously (1) refine the data representation
(subject-wise graph) in the feature domain, (2) find the intrinsic data representation
based on the graphs constructed on multi-modal imaging data and the clinical labels
of the entire subject set (including the known labels of training subjects and the
tentatively-determined labels of testing subjects), and (3) propagate the clinical
labels from training subjects to testing subjects, following the latest learned
intrinsic data representation. Promising classification results have been achieved
in classifying 93 AD, 202 MCI, and 101 NC subjects, each with MR and PET images.
2 Methods
As shown in Fig. 1, the affinity matrix S might not be strongly related to the intrinsic
data representation in the label domain. Therefore, it is necessary to further design a
graph based on the label matrix, rather than solely using the graph constructed from
the features. However, the labels of the testing subjects are not determined yet. To
solve this chicken-and-egg dilemma, we propose to construct a dynamic graph which
progressively reflects the intrinsic data representation in the label domain.
294 Z. Wang et al.
where μ is a scalar balancing the data-fitting terms from the two domains (i.e.,
the first and second terms in Eq. (2)). Suppose s_i ∈ R^{N×1} and t_i ∈ R^{N×1} are
vectors whose j-th elements are s_ij and t_ij, respectively. To avoid a trivial
solution, the ℓ2-norm is used as a constraint on each element s_ij of the affinity
matrix S. λ1 and λ2 are two scalars that control the strengths of the last two
terms in Eq. (2).
Progressive Graph-Based Transductive Learning on Multiple Modalities. Sup-
pose we have M modalities. For each subject I_i, we can extract the multi-modal
image features x_i^m, m = 1, …, M. For the m-th modality, we optimize the affinity
matrix S^m. As shown in Fig. 1(a) and (b), the affinity matrices across modalities
can differ. Thus, we require the intrinsic data representation T to be close to all
S^m, m = 1, …, M. It is straightforward to extend our pGTL method to the
multi-modal scenario:

arg min_{S^m, T, F} Σ_{i,j=1}^{N} [ μ ‖f_i − f_j‖₂² t_ij + Σ_{m=1}^{M} ( ‖x_i^m − x_j^m‖₂² s_ij^m + λ1 (s_ij^m)² + λ2 (s_ij^m − t_ij)² ) ]    (3)
s.t.  0 ≤ s_ij^m ≤ 1, (s_i^m)′1 = 1,  0 ≤ t_ij ≤ 1, t_i′1 = 1,  F = [F_P; F_Q]

It is worth noting that, although the multi-modal information leads to multiple affinity
matrices in the feature domain, they all share the same intrinsic data representation T.
2.2 Optimization
Since our proposed energy function in Eq. (3) is convex in each variable, i.e., S, T,
and F, we present the following divide-and-conquer solution, which optimizes one set
of variables at a time while fixing the other sets. We initialize
s_ij = exp(−‖x_i − x_j‖₂² / 2σ²), where σ is an empirical parameter,
T = Σ_{m=1}^{M} S^m / M, and F_Q = {0}^{Q×C}.
Estimation of Affinity Matrix S^m for Each Modality. Removing the terms unrelated
to S^m in Eq. (3), the optimization of S^m reduces to the following objective function:

arg min_{S^m} Σ_{i,j=1}^{N} ‖x_i^m − x_j^m‖₂² s_ij^m + λ1 (s_ij^m)² + λ2 (s_ij^m − t_ij)²    (4)

where 0 ≤ s_ij^m ≤ 1 and (s_i^m)′1 = 1. Since Eq. (4) is independent across the index i,
we can further reformulate Eq. (4) in vector form:

arg min_{s_i^m} ‖ s_i^m + d_i / (2 r_1) ‖₂²    (5)

where s_i^m is the i-th column vector of the affinity matrix S^m, d_i = [d_ij]_{j=1,…,N}
is a vector with each d_ij = ‖x_i^m − x_j^m‖₂² − 2 λ2 t_ij, and r_1 = λ1 + λ2. The
problem in Eq. (5) is equivalent to a projection onto a simplex, which has a
closed-form solution [7]. After solving for each s_i^m, we obtain the affinity
matrix S^m.
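The closed-form simplex projection referenced for Eq. (5) can be sketched with the standard sort-and-threshold algorithm; this is a generic implementation of Euclidean projection onto the probability simplex, not necessarily the exact routine of [7]:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the classic sort-and-threshold rule."""
    u = np.sort(v)[::-1]                                  # sort descending
    css = np.cumsum(u) - 1.0                              # cumulative sums minus 1
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)                        # shift that enforces sum = 1
    return np.maximum(v - theta, 0.0)                     # clip negatives to zero
```

Applying this projection column by column (to −d_i/(2r_1) in the notation above) simultaneously enforces the non-negativity and sum-to-one constraints on each s_i^m.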
where h_i = [h_ij]_{j=1,…,N} is a vector with each element
h_ij = μ ‖f_i − f_j‖₂² − 2 λ2 Σ_{m=1}^{M} s_ij^m, and r_2 = M λ2 is a scalar.
Update the Latent Labels F_Q on the Testing Subjects. Given both S^m and T, the
objective function for the latent label F_Q can be derived from Eq. (3) as:

arg min_F Σ_{i,j=1}^{N} ‖f_i − f_j‖₂² t_ij  ⇒  arg min_F Trace(F′ L F),    (8)

where Trace(·) denotes the matrix trace operator and L = D − (T′ + T)/2 is the
Laplacian matrix of T, with D the corresponding diagonal degree matrix. By
differentiating Eq. (8) w.r.t. F and letting the gradient LF = 0, we obtain the
block equation

[L_PP  L_PQ; L_QP  L_QQ] [F_P; F_Q] = 0,

where L_PP, L_PQ, L_QP, and L_QQ denote the top-left, top-right, bottom-left, and
bottom-right blocks of L. The solution for F_Q is then F̂_Q = −(L_QQ)⁻¹ L_QP F_P.
Discussion. Taking the MRI and PET modalities as an example, Fig. 2(a) illustrates the
optimization of Eq. (3) by alternating the following three steps: (1) estimate each
Fig. 2. (a) The dynamic procedure of the proposed pGTL method, (b) Classification accuracy as
a function of the number of training samples used.
affinity matrix S^m, which depends on the observed image features x^m and the currently
estimated intrinsic data representation T (red arrows); (2) estimate the intrinsic data
representation T, which requires the estimates of both S¹ and S² as well as the
subject-wise relationships in the label domain (purple arrows); (3) update the latent
labels F_Q on the testing subjects, which needs guidance from the learned intrinsic data
representation T (blue arrows). It is apparent that the intrinsic data representation
T links the feature domain and the label domain, which eventually leads to the dynamic
graph learning model.
3 Experiments
Table 2. Comparison with the classification accuracies reported in the literature (%).

Method             | Subject information       | Modality                  | AD/NC | MCI/NC
Random forest [10] | 37 AD + 75 MCI + 35 NC    | MRI + PET + CSF + Genetic | 89.0  | 74.6
Graph fusion [4]   | 35 AD + 75 MCI + 77 NC    | MRI + PET + CSF + Genetic | 91.8  | 79.5
Deep learning [11] | 85 AD + 169 MCI + 77 NC   | MRI + PET                 | 91.4  | 82.1
Our method         | 99 AD + 202 MCI + 101 NC  | MRI + PET                 | 92.6  | 78.6
that supervised methods require a sufficient number of samples to train a reliable
classifier. Since training samples with known labels are expensive to collect in the
medical imaging area, this experiment indicates that our method has high potential to
be deployed in current neuroimaging studies.
Comparison with Recently Published State-of-the-Art Methods. Table 2 summa-
rizes the subject information, imaging modality, and average classification accuracy of
several state-of-the-art methods. These comparison methods represent typical
machine learning techniques. Since the classification between the pMCI and sMCI groups
is not reported in [4, 10, 11], we only show the classification results for the AD vs. NC
and MCI vs. NC tasks. Our method achieves higher classification accuracy than both the
random forest and graph fusion methods, even though those two methods use addi-
tional CSF and genetic information.
Discussion. The deep learning approach in [11] learns feature representations in a
layer-by-layer manner; thus, it is time-consuming to re-train the deep neural network
from scratch. Instead, our proposed method only uses hand-crafted features for clas-
sification. It is noteworthy that we can complete the classification on a new dataset
(including greedy parameter tuning) within three hours on a regular PC (8 CPU cores
and 16 GB memory), which is far more economical than the massive training cost in [11].
Complementary information in multi-modal data can help improve classification
performance; therefore, in order to find the intrinsic data representation, we combine
our proposed pGTL with multi-modal information.
4 Conclusion
In this paper, we present a novel pGTL method to identify individual subjects at dif-
ferent stages of AD progression using multi-modal imaging data. Compared to con-
ventional methods, our method seeks the intrinsic data representation, which can be
learned from the observed imaging features and simultaneously validated on the
existing labels of the training data. Since the learned intrinsic data representation is
more relevant to label propagation, our method achieves promising classification
performance in the AD vs. NC, MCI vs. NC, and pMCI vs. sMCI tasks, after
comprehensive comparison with classic and recent state-of-the-art methods.
References
1. Thompson, P.M., Hayashi, K.M., et al.: Tracking Alzheimer’s disease. Ann. NY Acad. Sci.
1097, 198–214 (2007)
2. Zhu, X., Suk, H.-I., et al.: A novel matrix-similarity based loss function for joint regression
and classification in AD diagnosis. NeuroImage 100, 91–105 (2014)
3. Jin, Y., Shi, Y., et al.: Automated multi-atlas labeling of the fornix and its integrity in
Alzheimer’s disease. In: 2015 IEEE 12th ISBI, pp. 140–143. IEEE (2015)
4. Tong, T., Gray, K., Gao, Q., Chen, L., Rueckert, D.: Nonlinear graph fusion for multi-modal
classification of Alzheimer’s disease. In: Zhou, L., Wang, L., Wang, Q., Shi, Y. (eds.)
MICCAI 2015. LNCS, vol. 9352, pp. 77–84. Springer, Heidelberg (2015)
5. Wang, B., Mezlini, A.M., et al.: Similarity network fusion for aggregating data types on a
genomic scale. Nat. Methods 11, 333–337 (2014)
6. Zhang, Y., Huang, K., et al.: MTC: a fast and robust graph-based transductive learning
method. IEEE Trans. Neural Netw. Learn. Syst. 26, 1979–1991 (2015)
7. Huang, H., Yan, J., Nie, F., Huang, J., Cai, W., Saykin, A.J., Shen, L.: A new sparse simplex
model for brain anatomical and genetic network analysis. In: Mori, K., Sakuma, I., Sato, Y.,
Barillot, C., Navab, N. (eds.) MICCAI 2013, Part II. LNCS, vol. 8150, pp. 625–632.
Springer, Heidelberg (2013)
8. Thompson, B.: Canonical correlation analysis. In: Encyclopedia of Statistics in Behavioral
Science (2005)
9. Gönen, M., Alpaydın, E.: Multiple kernel learning algorithms. J. Mach. Learn. Res. 12,
2211–2268 (2011)
10. Gray, K., Aljabar, P., et al.: Random forest-based similarity measures for multi-modal
classification of Alzheimer’s disease. NeuroImage 65, 167–175 (2013)
11. Liu, S., Liu, S., et al.: Multimodal neuroimaging feature learning for multiclass diagnosis of
Alzheimer’s disease. IEEE Trans. Biomed. Eng. 62, 1132–1141 (2015)
Structured Outlier Detection in Neuroimaging
Studies with Minimal Convex Polytopes
1 Introduction
Mass-univariate and multivariate pattern analysis techniques aim to reveal dis-
ease effects by comparing a patient group to the control population [1,9]. The
latter is commonly assumed to be homogeneous. However, as noted in recent
works [6,13], controls may often include subjects that are outside a normative
range, and this may confound the actual pathological effect when comparing
against the patient group. The confounding effect may be remedied by identify-
ing a normative range and removing outliers that lie outside this range.
There have been two main directions of outlier detection in the context of
neuroimaging. The first class of methods includes parametric models that aim to
select a subset of samples such that the determinant of the covariance matrix
is minimized. This is in contrast to non-parametric methods such as the one-
class support vector machine (OC-SVM) [7,13,14], which attempt to separate a
subset of samples from the origin with maximum margin in the Gaussian radial
basis function (GRBF) kernel space. Another complementary non-parametric
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 300–307, 2016.
DOI: 10.1007/978-3-319-46720-7_35
Structured Outlier Detection in Neuroimaging Studies 301
approach is the support vector data description (SVDD) [15] whose objective is
to solve for the smallest radius hypersphere that encloses a subset of the samples
(Fig. 1b). All of the aforementioned outlier detection methods effectively capture
the main probability mass of a dataset and delineate samples outside this region
as outliers. However, they do not provide further information about whether
there are different types of outliers. In this work, we posit that there may be
a structure by which outliers deviate from the normal population. Capturing
this structure may be instrumental in characterizing and understanding how
pathogenesis originates from those who are healthy. Thus, the overall aim of our
approach is to learn the organization by which samples deviate from the main
probability mass.
We resolve the limitation of prior methods regarding learning the structure
of outliers by containing the high-probability region of a dataset with convex
polytopes [16]. The geometry of our formulation allows us to simultaneously enclose
the normative samples within the convex polytope while excluding outliers with
maximum margin. The assignment of outliers to unique faces of the convex
polytope permits our formulation to be posed as a clustering problem. This
clustering allows us to subtype the directions of deviation from the normal.
The remainder of this paper is organized as follows. In Sect. 2 we detail the
proposed approach, while experimental validation follows in Sect. 3. Section 4
concludes the paper with our final remarks.
2 Method
To learn the organization by which samples deviate from the main probability
mass, we aim to find the minimal convex polytope (MCP) that excludes ρ percent
of the samples with maximum margin. The convex polytope is minimal in
the sense that the radius of the largest hypersphere circumscribed within
the polytope is the minimum possible. Furthermore, the convex polytope is
maximum-margin in the sense that the margin between samples within the polytope
and the outliers surrounding the polytope is maximized (Fig. 1c).
Fig. 1. (a) A simulated dataset with three deviations from normal; (b) the minimum
hypersphere that excludes ρ percent of samples; (c) Proposed solution: minimum convex
polytope (MCP) that excludes ρ percent of samples. Note that the MCP characterizes
the types of deviations by associating outliers to different faces (indicated by colors
orange, green and blue).
302 E. Varol et al.
The previous problem involves two steps. The first step is to find the minimal
hypersphere that excludes ρ percent of the samples, and the second is to find the
convex polytope that circumscribes this hypersphere. Let x_i ∈ R^d for i = 1, …, n
denote the i-th d-dimensional sample in the dataset. The minimal hypersphere
that excludes ρ percent of the samples can be cast as the following optimization
problem:

minimize_{R, x_c}  R² + (1/(nρ)) Σ_{i=1}^{n} max{0, ‖x_i − x_c‖₂² − R²}    (1)

where R denotes the radius and x_c the center of the hypersphere. This
problem is convex [15] and can be solved using LIBSVM.
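For intuition, the trade-off in Eq. (1) can be explored with a naive sketch that fixes the center at the sample mean and searches a grid of candidate radii; the SVDD-style hinge direction is our reading of Eq. (1), the function names are illustrative, and the paper itself solves the convex problem with LIBSVM:

```python
import numpy as np

def svdd_objective(R, xc, X, rho):
    """Eq. (1) objective (SVDD-style sketch): squared radius plus a
    scaled hinge loss on samples falling outside the sphere; rho
    controls the fraction of samples tolerated outside (the outliers)."""
    d2 = np.sum((X - xc) ** 2, axis=1)
    return R**2 + np.maximum(0.0, d2 - R**2).sum() / (len(X) * rho)

def fit_radius(X, rho, radii):
    """Naive 1-D search over candidate radii with the center fixed at
    the sample mean -- for intuition only, not the paper's solver."""
    xc = X.mean(axis=0)
    return min(radii, key=lambda R: svdd_objective(R, xc, X, rho))
```

With a small ρ the hinge term dominates and the optimal sphere grows just large enough to enclose all but the most distant samples, matching the "excludes ρ percent" reading.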
Once the dichotomy between the outliers and the normative samples has been
established, the maximum-margin convex polytope [16] that separates the out-
liers from the normative samples can be cast as the following objective:

minimize_{ {w_j, b_j}_{j=1}^{K}, {a_{i,j}} }
    Σ_{j=1}^{K} ‖w_j‖₁                                                       [regularization/margin]
    + C [ Σ_{i: ‖x_i − x_c‖₂ ≤ R} (1/K) Σ_{j=1}^{K} max{0, 1 + w_j′ x_i + b_j}   [loss for normative samples]
        + Σ_{i: ‖x_i − x_c‖₂ > R} Σ_{j=1}^{K} a_{i,j} max{0, 1 − w_j′ x_i − b_j} ]  [assignment & loss for outliers]
s.t.  Σ_j a_{i,j} = 1,  a_{i,j} ≥ 0.    (2)
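The role of the a_{i,j} variables in Eq. (2) can be sketched as an assignment step: for fixed faces (w_j, b_j), the optimal a_{i,j} is a one-hot row that sends each outlier to the face with the smallest hinge loss. The function name and array layout below are illustrative:

```python
import numpy as np

def assign_outliers(X_out, W, b):
    """Assign each outlier to the polytope face j minimizing the hinge
    loss max{0, 1 - w_j'x - b_j}, which is what the optimal one-hot
    a_{i,j} in Eq. (2) encodes.

    X_out : (n_out, d) outlier samples
    W     : (K, d) face normals w_j
    b     : (K,) face offsets b_j
    """
    scores = X_out @ W.T + b                 # (n_out, K) values of w_j'x + b_j
    losses = np.maximum(0.0, 1.0 - scores)   # hinge loss per outlier/face pair
    return np.argmin(losses, axis=1)         # chosen face index per outlier
```

This is the step that turns the formulation into a clustering of outliers by deviation direction, since outliers on opposite sides of the polytope pick different faces.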
3 Experimental Validation
Due to lack of ground truth in clinical datasets and the need to quantitatively
evaluate performance, we validated our method on two simulated datasets where
the number of directions of deviations from the normal was a priori determined.
Both datasets composed of 1000 samples and 150 features. 130 out of 150 of
the features were drawn from a zero mean, unit variance, multivariate Gaussian
distribution. For the first dataset, the remaining 20 features were replicates of the
univariate random variable that is uniformly distributed within a unit side length
equilateral triangle (as in Fig. 1a). Thus, the number of simulated deviations
from the spherical white noise was three for this dataset. The second dataset was
analogously generated except that the 20 signal-carrying features were replicates
of the univariate random variable that is uniformly distributed within a unit side
length square. Hence, this dataset was designed to yield four types of outliers.
For the triangular dataset, the parameter selection revealed that the most
stable clustering occurs at K = 3, ρ = 0.1, C = 0.01 (Fig. 2a), while for the
square dataset, the most stable clustering occurred at K = 4, ρ = 0.5, C = 0.01
Fig. 2. The parameter selection for (a) triangular simulated dataset, and (b) square
simulated dataset. (a) K = 3, ρ = 0.1, C = 0.01 were selected, (b) K = 4, ρ = 0.5, C =
0.01 were selected. Different solid lines indicate the ARI of MCP at different values
of ρ at the maximum ARI yielding C parameter. Black dashed lines indicate the ARI
of K-means for comparison. Note that MCP yields more stable clusterings that align
with the ground truth.
(Fig. 2b). For both of these datasets, the ARI values for the optimal K were
comparable across varying ρ and C, which indicates that the most important
directions of deviation were captured regardless of the proportion of outliers
searched for. These results demonstrate the ability of MCP to capture the
underlying directions of deviation.
For comparison, K-means clustering was applied to the same datasets (see
Fig. 2a, b, dashed lines). For the triangular and square datasets, K = 2 and
K = 3 yielded the most stable clusterings, respectively. This demonstrates that
K-means was not able to accurately capture the main directions of deviation,
but was most likely grouping outliers with the normative samples.
(c) D1: Cerebellar and brain stem degeneration associated with deviation subtype 1.
(d) D2: Widespread gray matter atrophy patterns associated with deviation subtype 2.
Fig. 3. (a) The parameter selection for ADNI control group, K = 2, ρ = 0.3, C = 1
yielded the highest clustering stability. (b) The projections of all ADNI subjects along
the two faces of the MCP. Normative samples (N) are in the negative orthant while
deviated subtypes are on the upper left (subtype 2) and lower right (subtype 1). (c,
d) The voxel-based group differences between all normative samples and deviation
subtype 1 (c), and deviation subtype 2 (d) are shown. Warmer colors indicate that the
normative group volume is greater, while colder colors indicate that the deviated group
volume is greater.
The method was applied only to the control group. The parameter selection
revealed that K = 2 subtypes, and 30 % outliers with C = 1 yielded the highest
clustering stability (Fig. 3a). Once the MCP that captured the normative controls
was found, it was used to subtype the rest of the ADNI dataset, consisting
of AD and MCI subjects into three groups denoted by normative (N), deviation
subtype 1 (D1) and deviation subtype 2 (D2).
The distribution of the entire ADNI dataset with respect to the MCP is illustrated
in Fig. 3b. Furthermore, the demographic and clinical biomarker information of the
CN, MCI and AD subjects within their respective subgroups is summarized in Table 1.
56 % of AD and 62 % of MCI patients were categorized into the normative group. This
indicated that the main type of AD and MCI neuropathology was dissimilar to the
deviations exhibited by the normal population. However, a non-negligible portion,
37 % of AD and 28 % of MCI patients, was found to deviate
Table 1. Demographic and clinical characteristics of CN, AD, MCI subjects and their
grouping into the normative (N) or deviated subtypes (D1, D2). a – Mini mental
state exam. b – Presence of at least one APOE ε4 allele. c – Cerebrospinal fluid
(CSF) concentrations of Amyloid-beta (Aβ), total tau (t-tau), and phosphorylated tau
(p-tau). d – p-values using ANOVA between three subgroups
Group: CN
Subtype              | CN-N         | CN-D1        | CN-D2        | p-value^d
n (%)                | 125 (70.6)   | 19 (10.7)    | 33 (18.6)    |
Age (years)          | 75.4 ± 5.3   | 78.1 ± 4.2   | 76.2 ± 4.7   | 0.10
Sex (female), n (%)  | 58 (46.4)    | 10 (52.6)    | 19 (57.5)    | 0.49
MMSE^a               | 29.0 ± 1.1   | 29.0 ± 0.7   | 29.4 ± 0.8   | 0.19
APOE ε4^b, n (%)     | 34 (27.2)    | 6 (31.5)     | 8 (24.2)     | 0.85
CSF Aβ (pg/mL)^c     | 203.4 ± 55.7 | 233.7 ± 35.2 | 218.5 ± 52.3 | 0.21
CSF t-tau (pg/mL)^c  | 67.4 ± 23.8  | 63.7 ± 22.6  | 73.9 ± 29.2  | 0.54
CSF p-tau (pg/mL)^c  | 24.5 ± 14.5  | 22.4 ± 11.5  | 24.8 ± 11.4  | 0.90
along the second subtype direction along with 18 % of CN. This suggested that
a sizeable portion of the normal population might have the propensity to deviate
towards AD-like pathology.
To better understand and interpret the neuroanatomical directions of these
deviations from the normative range, voxel-based analysis was performed on all
subjects in the normative group versus either of the two subtypes of deviations
using gray matter tissue density maps. The group differences are visualized in
Fig. 3.
A substantial body of prior research has demonstrated that the normal pattern
of aging consists of prefrontal and motor cortex thinning along with increased
ventricle size [11,12]. Corresponding manifestations of these patterns can be
observed in group D2 (Fig. 3d). The significantly younger ages of the AD and
MCI subjects (Table 1) that fall into this subtype may indicate that their
cognitive decline is driven by early and accelerated aging that follows this
pattern. Furthermore, the relatively lower CSF amyloid-β and t-tau
concentrations (Table 1) of these patients are another strong indicator of AD [3].
On the other hand, the patterns seen in group D1 (Fig. 3c) indicate cerebellar
degeneration which is usually accompanied by brain stem atrophy [10]. Although
cerebellar thinning has been demonstrated to be part of normal aging, our find-
ings suggest that the increased rate of this degenerative pattern may be a type
of deviation. Lastly, it should be mentioned that the majority of the AD and
MCI subjects were not designated to be moving along either of the directions of
deviation.
Structured Outlier Detection in Neuroimaging Studies 307
4 Conclusion
In summary, we have introduced a method that can simultaneously detect a
homogeneous normative group and define subtypes of outliers. This allows a
better understanding of the structure of deviations in control groups in
neuroimaging cohorts. This, in turn, aids in the interpretation of the
pathological processes that occur when subjects diverge from the normative region.
References
1. Ashburner, J., Friston, K.J.: Voxel-based morphometry-the methods. Neuroimage
11(6), 805–821 (2000)
2. Ben-Hur, A., et al.: A stability based method for discovering structure in clustered
data. In: Pacific Symposium on Biocomputing, vol. 7, pp. 6–17 (2001)
3. Blennow, K.: Cerebrospinal fluid protein biomarkers for Alzheimer’s disease. Neu-
roRx 1(2), 213–225 (2004)
4. Doshi, J., et al.: MUSE: multi-atlas region segmentation utilizing ensembles of
registration algorithms and parameters, and locally optimal atlas selection. Neu-
roImage 127, 186–195 (2015)
5. Dukart, J., Schroeter, M.L., Mueller, K., Initiative, A.D.N., et al.: Age correction
in dementia-matching to a healthy brain. PLoS ONE 6(7), e22193 (2011)
6. Fritsch, V., et al.: Detecting outliers in high-dimensional neuroimaging datasets
with robust covariance estimators. Med. Image Anal. 16(7), 1359–1370 (2012)
7. Gardner, A.B., et al.: One-class novelty detection for seizure analysis from intracra-
nial EEG. J. Mach. Learn. Res. 7, 1025–1044 (2006)
8. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)
9. Kawasaki, Y., et al.: Multivariate voxel-based morphometry successfully differen-
tiates schizophrenia patients from healthy controls. Neuroimage 34(1), 235–242
(2007)
10. Luft, A.R., et al.: Patterns of age-related shrinkage in cerebellum and brainstem
observed in vivo using three-dimensional MRI volumetry. Cereb. Cortex 9(7), 712–
721 (1999)
11. Raz, N., Rodrigue, K.M.: Differential aging of the brain: patterns, cognitive corre-
lates and modifiers. Neurosci. Biobehav. Rev. 30(6), 730–748 (2006)
12. Salat, D.H., et al.: Thinning of the cerebral cortex in aging. Cereb. Cortex 14(7),
721–730 (2004)
13. Sato, J.R., et al.: An fMRI normative database for connectivity networks using
one-class support vector machines. Hum. Brain Mapp. 30(4), 1068–1076 (2009)
14. Schölkopf, B., et al.: Estimating the support of a high-dimensional distribution.
Neural Comput. 13(7), 1443–1471 (2001)
15. Tax, D.M., Duin, R.P.: Support vector data description. Mach. Learn. 54(1),
45–66 (2004)
16. Varol, E., Sotiras, A., Davatzikos, C.: HYDRA: revealing heterogeneity of imaging
and genetic patterns through a multiple max-margin discriminative analysis frame-
work. NeuroImage (2016)
Diagnosis of Alzheimer’s Disease Using
View-Aligned Hypergraph Learning
with Incomplete Multi-modality Data
1 Introduction
Alzheimer’s disease (AD) is a neurodegenerative disease that continues to pose
major challenges to global health care systems [1]. Studies have shown that
multi-modality data (e.g., structural magnetic resonance imaging (MRI), flu-
orodeoxyglucose positron emission tomography (PET), and cerebrospinal fluid
(CSF)) provide complementary information that can be harnessed for improv-
ing diagnosis of AD and its prodrome, known as mild cognitive impairment
(MCI) [2–5]. However, collecting multi-modality data is challenging, and the
data are often incomplete due to patient dropout. In the Alzheimer's Disease
Neuroimaging Initiative (ADNI) database, for instance, while baseline MRI
D. Shen—This study was supported in part by NIH grants (EB006733, EB008374,
EB009634, MH100217, AG041721, AG042599, AG010129, AG030514, and
NS093842).
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 308–316, 2016.
DOI: 10.1007/978-3-319-46720-7_36
data were collected for all subjects, only approximately half of the subjects have
baseline PET data and half of the subjects have baseline CSF data.
Various approaches have been developed to deal with the problem of incom-
plete multi-modality data. A straightforward method is to remove subjects with
missing data. This approach, however, significantly reduces the sample size. An
alternative way is to impute the missing data using techniques such as expecta-
tion maximization (EM) [6], singular value decomposition (SVD) [7], and matrix
completion [5]. However, the effectiveness of this method can be affected by impu-
tation artifacts. Several recently introduced multi-view learning based methods
circumvent the need for imputation [3,4]. These methods generally apply specific
learning algorithms to different views of the data, comprising the combinations
of available data from different modalities. However, the coherence among views
is not explicitly considered in these methods. Intuitively, integrating these views
coherently can lead to better diagnostic performance. On the other hand, hyper-
graph learning [8] has attracted increasing attention in neuroimaging analysis,
where complex relationships among vertices can be modeled via hyperedges [9].
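For concreteness, the zero-filling and SVD-style imputation baselines that reappear in the experiments can be sketched as follows. This is a generic illustration on synthetic low-rank data, using a simple iterative SVD fill, not the authors' exact pipeline:

```python
import numpy as np

def svd_impute(X, rank=2, n_iter=50):
    """Iterative low-rank (SVD) imputation: repeatedly replace missing
    entries with their values in a rank-`rank` reconstruction."""
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[mask] = approx[mask]
    return filled

rng = np.random.default_rng(0)
true = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 8))  # exactly rank 2
X = true.copy()
X[rng.random(X.shape) < 0.2] = np.nan                      # 20 % missing
zero_fill = np.nan_to_num(X)                               # "Zero" baseline
svd_fill = svd_impute(X, rank=2)
mask = np.isnan(X)
# Low-rank imputation recovers the rank-2 data far better than zero-filling.
assert np.abs(svd_fill - true)[mask].mean() < np.abs(zero_fill - true)[mask].mean()
```

When the true structure of the data is not low-rank, or the missingness follows the block-wise modality pattern described above, such imputation can introduce exactly the artifacts the multi-view methods try to avoid.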
In this paper, we propose a view-aligned hypergraph learning (VAHL)
method with incomplete multi-modality data for AD/MCI diagnosis. Different
from conventional multi-view based learning methods, VAHL explicitly incor-
porates the coherence among views into the learning model, where the optimal
weights for different views are automatically learned from the data. Figure 1
presents a schematic diagram of our method. We first divide the whole dataset
into M views (M = 6 in Fig. 1) according to the data availability in association
with different combinations of modalities, followed by a sparse representation
based hypergraph construction process in each view space. We then develop a
view-aligned hypergraph classification (VAHC) model to explicitly capture the
coherence among views. To arrive at a final classification decision, we agglomer-
ate the class probability scores via a multi-view label fusion method.
2 Method
Data and Pre-processing: A total of 807 subjects in the baseline ADNI-
1 database [10] with MRI, PET and CSF modalities are used in this study,
which include 186 AD subjects, 226 NCs, and 395 MCI subjects. According
to whether MCI would convert to AD within 24 months, the MCI subjects are
further divided into two categories: (1) stable MCI (sMCI), if diagnosis was
MCI at all available time points (0–96 months); (2) progressive MCI (pMCI),
if diagnosis was MCI at baseline but conversion to AD occurred after baseline
within 24 months. The 395 MCI subjects are separated into 169 pMCI and 226
sMCI subjects.
Image features are extracted from the MR and PET images based on
regions-of-interest (ROIs). Specifically, for each MR image, we perform ante-
rior commissure (AC)-posterior commissure (PC) correction, resampling to size
256 × 256 × 256, and inhomogeneity correction using the N3 algorithm [11].
Skull stripping is then performed using BET [12], followed by manual editing
to ensure that both skull and dura are cleanly removed. Next, we remove the
cerebellum by warping a labeled template to each skull-stripped image. FAST
[12] is applied to segment the human brain into three different tissue types,
i.e., gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF). The
anatomical automatic labeling (AAL) atlas, with 90 pre-defined ROIs in the
cerebrum, is aligned to the native space of each subject using a deformable reg-
istration algorithm. Finally, for each subject, we extract the volumes of GM
tissue inside the 90 ROIs as features, which are normalized by the total intracra-
nial volume (estimated by the summation of GM, WM and CSF volumes from all
ROIs). We align each PET image to its corresponding MR image via affine trans-
formation and compute the mean PET intensity in each ROI as features. We
also employ five CSF biomarkers, including amyloid β (Aβ42), CSF total tau
(t-tau), CSF tau hyperphosphorylated at threonine 181 (p-tau), and two tau
ratios with respect to Aβ42 (i.e., t-tau/Aβ42 and p-tau/Aβ42). Ultimately, we
have a 185-dimensional feature vector for each subject with complete data modal-
ities, including 90 MRI features, 90 PET features, and 5 CSF features.
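The assembly of the 185-dimensional feature vector (90 ICV-normalized GM volumes, 90 mean PET intensities, 5 CSF measures) could be sketched as below; the array names and function are illustrative, not the authors' code:

```python
import numpy as np

def build_features(gm_roi_vol, wm_roi_vol, csf_roi_vol, pet_roi_mean, csf_markers):
    """Assemble one subject's 185-dim feature vector (90 MRI + 90 PET + 5 CSF)."""
    assert gm_roi_vol.shape == pet_roi_mean.shape == (90,) and csf_markers.shape == (5,)
    # Total intracranial volume: sum of GM, WM and CSF volumes over all ROIs.
    icv = gm_roi_vol.sum() + wm_roi_vol.sum() + csf_roi_vol.sum()
    mri = gm_roi_vol / icv  # ICV-normalized GM volumes
    return np.concatenate([mri, pet_roi_mean, csf_markers])

rng = np.random.default_rng(0)
x = build_features(rng.uniform(1, 5, 90), rng.uniform(1, 5, 90),
                   rng.uniform(1, 5, 90), rng.uniform(0, 2, 90),
                   rng.uniform(0, 1, 5))
assert x.shape == (185,)
```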
In the $m$-th view, let $\mathcal{G}^m = (\mathcal{V}^m, E^m, \mathbf{w}^m)$ denote a hypergraph, where $\mathcal{V}^m$ is a vertex set with $N^m$ vertices, each corresponding to a subject; $E^m$ denotes a hyperedge set with $N_e^m$ hyperedges; and $\mathbf{w}^m \in \mathbb{R}^{N_e^m}$ is the corresponding weight vector for hyperedges. Denote $\mathbf{H}^m \in \mathbb{R}^{N^m \times N_e^m}$ as the vertex-edge incidence matrix, with the $(v, e)$-entry indicating whether the vertex $v$ is connected with other vertices in the hyperedge $e$.
In conventional hypergraph based methods [8], the Euclidean distance is typ-
ically used to evaluate similarity between pairs of vertices. We argue that the
Euclidean distance can only model the local structure of the data. To address this, we
propose a sparse representation (SR) based hypergraph construction method to
exploit the global structure of the data. Specifically, we first select each vertex as a
centroid, and then represent each centroid using the other vertices via an SR model
[13]. A hyperedge can then be constructed by connecting each centroid to the
other vertices, with the global sparse representation coefficients as similarity
measurements. Given $N^m$ vertices, we obtain $N_e^m = N^m$ hyperedges. A larger
value of the $\ell_1$ regularization parameter in SR leads to sparser coefficients.
To capture richer data structure information, we employ multiple (e.g., $q$)
parameter values in SR to construct multiple sets of hyperedges, finally yielding
$N_e^m = qN^m$ hyperedges for the hypergraph $\mathcal{G}^m$.
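A minimal sketch of the SR-based hyperedge construction, with scikit-learn's Lasso standing in for the SR model of [13]; the incidence values and parameters here are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sr_hypergraph_incidence(X, alphas=(0.01, 0.1)):
    """Vertex-edge incidence matrix H (N x qN): one hyperedge per centroid
    per l1 parameter; entries are |sparse coefficients|, centroid itself 1."""
    n = X.shape[0]
    H = np.zeros((n, len(alphas) * n))
    for q, a in enumerate(alphas):
        for v in range(n):
            others = np.delete(np.arange(n), v)
            # Represent centroid v by the remaining vertices (columns = vertices).
            coef = Lasso(alpha=a, max_iter=5000).fit(X[others].T, X[v]).coef_
            e = q * n + v
            H[others, e] = np.abs(coef)
            H[v, e] = 1.0  # the centroid belongs to its own hyperedge
    return H

X = np.random.default_rng(0).normal(size=(12, 6))  # 12 subjects, 6 features
H = sr_hypergraph_incidence(X)
assert H.shape == (12, 24)  # q = 2 parameters -> 2N hyperedges
```

A larger `alpha` yields sparser coefficients and hence smaller hyperedges, which matches the role of the $\ell_1$ parameter described above.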
Fig. 2. Illustration of the view-aligned regularizer with PET, MRI, and CSF data.
Using the hypergraph constructed in the m-th view, the objective of hypergraph
based semi-supervised learning is formulated as [8]

$$\min_{f^m} \; R_{emp}(f^m) + R_{reg}(f^m), \qquad (2)$$

where the first term is the empirical loss, and the second term is a hypergraph
regularizer [8] defined as

$$R_{reg}(f^m) = \frac{1}{2}\sum_{e \in E^m}\sum_{u,v \in \mathcal{V}^m} \frac{w_e^m h_{u,e}^m h_{v,e}^m}{\delta_e^m}\left(\frac{f_u^m}{\sqrt{d_u^m}} - \frac{f_v^m}{\sqrt{d_v^m}}\right)^2 = (f^m)^T L^m f^m, \qquad (3)$$

where $d_u^m$ and $\delta_e^m$ denote the degrees of vertex $u$ and hyperedge $e$, respectively, and $L^m$ is the normalized hypergraph Laplacian.
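The pairwise form of the regularizer agrees with the standard normalized hypergraph Laplacian of [8], $L = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$; a small numeric check of this identity:

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    for a 0/1 incidence matrix H (vertices x edges) and edge weights w [8]."""
    dv = H @ w            # vertex degrees
    de = H.sum(axis=0)    # hyperedge degrees
    Dv = np.diag(dv ** -0.5)
    theta = Dv @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ Dv
    return np.eye(H.shape[0]) - theta

rng = np.random.default_rng(0)
H = (rng.random((8, 5)) < 0.5).astype(float)
H[H.sum(axis=1) == 0, 0] = 1.0   # every vertex sits in at least one edge
H[0, H.sum(axis=0) == 0] = 1.0   # every edge contains at least one vertex
w = rng.uniform(0.5, 1.5, 5)
L = hypergraph_laplacian(H, w)
f = rng.normal(size=8)

# Pairwise (sum over edges and vertex pairs) form of the regularizer.
dv, de = H @ w, H.sum(axis=0)
reg = 0.5 * sum(w[e] * H[u, e] * H[v, e] / de[e]
                * (f[u] / np.sqrt(dv[u]) - f[v] / np.sqrt(dv[v])) ** 2
                for e in range(5) for u in range(8) for v in range(8))
assert np.isclose(reg, f @ L @ f)
```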
The proposed view-aligned hypergraph classification (VAHC) model is then formulated as

$$\min_{\mathbf{F},\,\alpha,\,\{\mathbf{W}^m\}_{m=1}^M}\; \sum_{m=1}^{M} \|f^m - y^m\|^2 + \sum_{m=1}^{M} (\alpha^m)^2 (f^m)^T L^m f^m + \mu \sum_{m=1}^{M}\sum_{p=1}^{M} (f^m)^T \Omega^m \Omega^p (f^m - f^p) + \lambda \sum_{m=1}^{M} \|\mathbf{W}^m\|_F^2, \qquad (4)$$

$$\text{s.t.} \quad \sum_{m=1}^{M} \alpha^m = 1,\; \forall\, \alpha^m \ge 0; \qquad \sum_{i=1}^{N_e^m} \mathbf{W}^m_{i,i} = 1,\; \forall\, \mathbf{W}^m_{i,i} \ge 0,$$
where the first term is the square loss, and the second one is the hypergraph
Laplacian regularizer. The regularization coefficient $(\alpha^m)^2$ prevents a
degenerate solution for $\alpha$. The last term and the constraints in Eq. (4)
penalize the complexity of the weights for views (i.e., $\alpha$) and the weights
for hyperedges (i.e., $\mathbf{W}^m$). It is worth noting that the third term in
Eq. (4) is the proposed view-aligned regularizer, which encourages the estimated
labels of one subject represented in different views to be similar. Using Eq. (4),
we can jointly learn the class probability scores $\mathbf{F}$, the optimal weights
for views (i.e., $\alpha$), and the optimal weights for hyperedges (i.e.,
$\{\mathbf{W}^m\}_{m=1}^M$) from the data.
Since the problem in Eq. (4) is not jointly convex w.r.t. $\mathbf{F}$, $\alpha$
and $\{\mathbf{W}^m\}_{m=1}^M$, we adopt an alternating optimization method to
solve the objective function. First, we optimize $\mathbf{F}$ with $\alpha$ and
$\{\mathbf{W}^m\}_{m=1}^M$ fixed. Given fixed $\mathbf{F}$ and $\alpha$, we
optimize $\{\mathbf{W}^m\}_{m=1}^M$ in the second step. In the third step, we
optimize $\alpha$ with $\mathbf{F}$ and $\{\mathbf{W}^m\}_{m=1}^M$ fixed. This
alternating process is repeated until convergence. The overall computational
complexity of our method is $O(N^2)$.
3 Experiments
Experimental Settings: We performed three classification tasks, including
AD vs. NC, MCI vs. NC, and pMCI vs. sMCI classification. The classification
performance was evaluated by accuracy (ACC), sensitivity (SEN), specificity
(SPE), and area under the ROC curve (AUC). We compared VAHL with 4 base-
line methods, including Zero (with missing values as zeros), KNN, EM [6], and
SVD [7]. VAHL was further compared with 4 state-of-the-art methods, includ-
ing an ensemble-based method [2] with weighted mean (Ensemble-1) and mean
(Ensemble-2) strategies, iMSF [3] with square loss (iMSF-1) and logistic loss
(iMSF-2), iSFS [4], and matrix shrinkage and completion (MSC) [5].
A 10-fold cross-validation (CV) strategy was used for performance evalua-
tion. To optimize parameters, we performed an inner 10-fold CV using training
data. The parameters μ and λ in Eq. (4) were chosen from {10−3 , 10−2 , · · · , 104 },
while the iteration number in the alternating optimization algorithm for Eq. (4)
was empirically set to 20. Multiple values of the $\ell_1$ regularization parameter
in the SR model [13] were set to $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}\}$ to construct
multiple sets of hyperedges in each hypergraph of VAHL. The parameter $k$ for
KNN was chosen from $\{3, 5, 7, 9, 11, 15, 20\}$. The rank parameter was chosen
from $\{5, 10, 15, 20, 25, 30\}$ for SVD, and the parameter $\lambda$ for iMSF was
chosen from $\{10^{-5}, 10^{-4}, \ldots, 10^{1}\}$.
Results of iSFS [4] and MSC [5] were taken directly from the authors.
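The nested cross-validation protocol (an outer 10-fold CV for evaluation, an inner 10-fold CV on each training split for parameter selection) can be sketched generically with scikit-learn; the SVM here is only a stand-in classifier on synthetic data, not VAHL:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Outer 10-fold CV estimates performance; the inner 10-fold CV run by
# GridSearchCV on each training split selects the hyper-parameters.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
inner = GridSearchCV(SVC(), {"C": np.logspace(-3, 4, 8)}, cv=10)
scores = cross_val_score(inner, X, y, cv=10)
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

Because the parameters are tuned only on the inner training folds, the outer scores remain an unbiased estimate of generalization performance.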
Results: Experimental results achieved by our method and those baseline meth-
ods are given in Fig. 3. As can be seen from Fig. 3, our method consistently
achieves the best performance in terms of ACC, SEN and AUC in three clas-
sification tasks. We further report the comparison between our method and
state-of-the-art methods in Table 1, with results demonstrating that our method
outperforms those competing methods. For instance, the ACC values achieved
by our method are 93.10 % and 80.00 % in AD vs. NC and MCI vs. NC classi-
fication, respectively, which are significantly better than the second best results
(Fig. 3. Results (%) in terms of ACC, SEN, SPE and AUC for (a) AD vs. NC classification, (b) MCI vs. NC classification, and (c) pMCI vs. sMCI classification.)
(i.e., 88.50 % and 71.61 %, respectively). Similarly, the results in pMCI vs. sMCI
classification show that our method can identify progressive MCI patients from
the whole population more accurately than the state-of-the-art methods.
We also conducted experiments using VAHL on complete data (subjects with all
of the PET, MRI and CSF modalities), achieving accuracies of 89.23 %, 78.50 %
and 78.00 % in AD vs. NC, MCI vs. NC, and pMCI vs. sMCI classification,
respectively. These results are worse than those obtained using all subjects with
incomplete data, implying that subjects with missing data provide useful
information. We then compared VAHL with a variant, VAHL-1 (without the
view-aligned regularizer); the accuracies achieved by VAHL-1 are 85.24 %,
75.16 % and 75.25 % in the three classification tasks, respectively. These
results imply that the view-aligned regularizer plays an important role in VAHL.
We further investigate the influence of the parameters and the weights for
different views learned from Eq. (4), with results shown in Fig. 4. Figure 4(a)
indicates that the best results are achieved by VAHL when $0.1 \le \mu \le 100$
and $0.01 \le \lambda \le 10$ in the three tasks. From Fig. 4(c), we can observe
that the learned weight for the "PET + MRI + CSF" view is much larger than those
of the other five views, implying that this view contributes the most in the
three tasks.
Fig. 4. Influence of parameters (a–b) and learned weights for different views (c).
4 Conclusion
References
1. Brookmeyer, R., Johnson, E., Ziegler-Graham, K., Arrighi, H.M.: Forecasting the
global burden of Alzheimer’s disease. Alzheimer’s Dement. 3(3), 186–191 (2007)
2. Ingalhalikar, M., Parker, W.A., Bloy, L., Roberts, T.P.L., Verma, R.: Using mul-
tiparametric data with missing features for learning patterns of pathology. In:
Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol.
7512, pp. 468–475. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33454-2 58
3. Yuan, L., Wang, Y., Thompson, P.M., Narayan, V.A., Ye, J.: Multi-source feature
learning for joint analysis of incomplete multiple heterogeneous neuroimaging data.
NeuroImage 61(3), 622–632 (2012)
4. Xiang, S., Yuan, L., Fan, W., Wang, Y., Thompson, P.M., Ye, J.: Bi-level multi-
source learning for heterogeneous block-wise missing data. NeuroImage 102, 192–
206 (2014)
5. Thung, K.H., Wee, C.Y., Yap, P.T., Shen, D.: Neurodegenerative disease diag-
nosis using incomplete multi-modality data via matrix shrinkage and completion.
NeuroImage 91, 386–400 (2014)
6. Schneider, T.: Analysis of incomplete climate data: estimation of mean values and
covariance matrices and imputation of missing values. J. Clim. 14(5), 853–871
(2001)
7. Golub, G.H., Reinsch, C.: Singular value decomposition and least squares solutions.
Numer. Math. 14(5), 403–420 (1970)
8. Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: clustering, classi-
fication, and embedding. In: NIPS, pp. 1601–1608 (2006)
9. Gao, Y., Wang, M., Tao, D., Ji, R., Dai, Q.: 3-D object retrieval and recognition
with hypergraph analysis. IEEE Trans. Image Process. 21(9), 4290–4303 (2012)
10. Jack, C.R., Bernstein, M.A., Fox, N.C., Thompson, P., Alexander, G.,
Harvey, D., Borowski, B., Britson, P.J., Whitwell, L., Ward, C.: The
Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. J. Magn.
Reson. Imaging 27(4), 685–691 (2008)
11. Sled, J.G., Zijdenbos, A.P., Evans, A.C.: A nonparametric method for automatic
correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging
17(1), 87–97 (1998)
12. Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: FSL.
NeuroImage 62(2), 782–790 (2012)
13. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition
via sparse representation. IEEE Trans. Pattern Anal. 31(2), 210–227 (2009)
New Multi-task Learning Model to Predict
Alzheimer’s Disease Cognitive Assessment
1 Introduction
Accumulating scientific evidence has demonstrated that neuroimaging tech-
niques, such as magnetic resonance imaging (MRI), are important for the detec-
tion of early Alzheimer’s Disease (AD) [2,4,7,13]. Current American Academy
of Neurology (AAN) guidelines [3] for dementia diagnosis recommend imaging to
identify structural brain diseases that can cause cognitive impairment. Because
AD is a neurodegenerative disorder characterized by progressive impairment of
cognitive functions, it is important to assess the degree of brain impairment
and how much it influences performance on cognitive tests. As a result,
many studies have focused on using regression models to predict cognitive scores
and track AD progression [10,11]. In [10], the voxel-based morphometry (VBM)
features extracted from the entire brain were jointly analyzed by the relevance
Suppose there are $T$ learning tasks, where the $t$-th task has $n_t$ training data points $X_t = [x_1^t, x_2^t, \ldots, x_{n_t}^t] \in \mathbb{R}^{d \times n_t}$. For each data point $x_i^t$, the label $y_i^t$ is given, forming the label matrix $Y_t = [y_1^t, y_2^t, \ldots, y_{n_t}^t] \in \mathbb{R}^{c_t \times n_t}$ for each task $t$. $W_t \in \mathbb{R}^{d \times c_t}$ is the projection matrix to be learned, $W \in \mathbb{R}^{d \times c}$, and $c = \sum_{t=1}^{T} c_t$.
It is interesting to see that when $\gamma$ is large enough, the $k$ smallest
singular values of the optimal solution $W$ to problem (1) will be zero, since all
singular values of a matrix are non-negative. That is, when $\gamma$ is large
enough, this is equivalent to constraining the rank of $W$ to be $r = m - k$ in
problem (1).
where $\|W\|_*$ is the sum of all singular values of $W$ (the trace norm), the
optimal value of the right-hand term is the sum of the $r$ largest singular
values, $F$ consists of the $r$ left singular vectors of $W$, and $G$ consists of
the $r$ right singular vectors of $W$.
According to Eq. (2), the objective $J_{opt}$ in Eq. (1) is equivalent to:

$$\min_{\substack{W = [W_1, \ldots, W_T],\\ F \in \mathbb{R}^{d \times r},\, F^T F = I,\\ G \in \mathbb{R}^{T \times r},\, G^T G = I}} \; \sum_{t=1}^{T} f(W_t^T X_t, Y_t) + \gamma \|W\|_* - \gamma\, \mathrm{Tr}(F^T W G). \qquad (3)$$
The optimal solution F to the problem (4) is formed by r left singular vectors of
W corresponding to the r largest singular values, and the optimal solution G is
formed by r right singular vectors of W corresponding to the r largest singular
values.
When F and G are fixed, we define:
Using the reweighted method [6], we can solve problem (6) by iteratively solving
the following problem:

$$\min_{W = [W_1, \ldots, W_T]} \; \sum_{t=1}^{T} g(W_t) + \gamma \sum_{t=1}^{T} \mathrm{Tr}(W_t W_t^T D), \qquad (7)$$
We take derivatives of Eq. (9) with respect to $b_t$ and $W_t$, and set them to zero.
The optimal solution to problem (9) is as follows:

$$W_t = (X_t H X_t^T + \gamma D)^{-1}\left(X_t H Y_t^T + \tfrac{1}{2}\gamma F G_t^T\right), \qquad H = I - \tfrac{1}{n_t}\mathbf{1}_t \mathbf{1}_t^T, \qquad (10)$$

$$b_t = \tfrac{1}{n_t} Y_t \mathbf{1}_t - \tfrac{1}{n_t} W_t^T X_t \mathbf{1}_t. \qquad (11)$$
We summarize the detailed algorithm to solve the objective Jopt in Algorithm 1.
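A sketch of the per-task closed-form update in Eqs. (10)-(11); the shapes are illustrative, and with $\gamma = 0$ the update should reduce to ordinary least squares on centered data, which the check below confirms:

```python
import numpy as np

def task_update(Xt, Yt, D, F, Gt, gamma):
    """One closed-form update for task t following Eqs. (10)-(11):
    H is the centering matrix; D, F, Gt come from the outer iteration."""
    nt = Xt.shape[1]
    H = np.eye(nt) - np.ones((nt, nt)) / nt  # H = I - (1/n_t) 1 1^T
    Wt = np.linalg.solve(Xt @ H @ Xt.T + gamma * D,
                         Xt @ H @ Yt.T + 0.5 * gamma * F @ Gt.T)
    bt = (Yt @ np.ones(nt) - Wt.T @ Xt @ np.ones(nt)) / nt
    return Wt, bt

rng = np.random.default_rng(0)
d, c, n = 5, 2, 30
Xt = rng.normal(size=(d, n))
Yt = rng.normal(size=(c, n))
# With gamma = 0 the update is exactly ordinary least-squares regression.
Wt, bt = task_update(Xt, Yt, np.eye(d), np.zeros((d, 1)), np.zeros((c, 1)), 0.0)
Xc = Xt - Xt.mean(axis=1, keepdims=True)
Yc = Yt - Yt.mean(axis=1, keepdims=True)
W_ls = np.linalg.lstsq(Xc.T, Yc.T, rcond=None)[0]
assert np.allclose(Wt, W_ls)
```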
When $\tilde{F}$ and $\tilde{G}$ are fixed, the problem becomes Eq. (7). Assuming that $\tilde{W}$ is the solution in each iteration, we have:

$$\sum_{t=1}^{T} g(\tilde{W}_t) + \frac{\gamma}{2}\, \mathrm{Tr}\big(\tilde{W}\tilde{W}^T (WW^T)^{-\frac{1}{2}}\big) \le \sum_{t=1}^{T} g(W_t) + \frac{\gamma}{2}\, \mathrm{Tr}\big(WW^T (WW^T)^{-\frac{1}{2}}\big). \qquad (14)$$

On the other hand, according to Lemma 1, when $p = 1$, we have:

$$\mathrm{Tr}\big((\tilde{W}\tilde{W}^T)^{\frac{1}{2}}\big) - \frac{1}{2}\mathrm{Tr}\big(\tilde{W}\tilde{W}^T (WW^T)^{-\frac{1}{2}}\big) \le \mathrm{Tr}\big((WW^T)^{\frac{1}{2}}\big) - \frac{1}{2}\mathrm{Tr}\big((WW^T)(WW^T)^{-\frac{1}{2}}\big). \qquad (15)$$

Combining (13), (14), and (15), we arrive at:

$$\sum_{t=1}^{T} f(\tilde{W}_t^T X_t, Y_t) + \gamma\|\tilde{W}\|_* - \gamma\,\mathrm{Tr}(\tilde{F}^T \tilde{W} \tilde{G}) \le \sum_{t=1}^{T} f(W_t^T X_t, Y_t) + \gamma\|W\|_* - \gamma\,\mathrm{Tr}(F^T W G). \qquad (16)$$
Thus, Algorithm 1 does not increase the objective function in (3) at each
iteration. Note that the equalities in the above inequalities hold only when the
algorithm converges. Therefore, Algorithm 1 monotonically decreases the
objective value in each iteration until convergence. Because we alternately
solve for $F$, $G$, and $W$, Algorithm 1 converges to a local optimum of
problem (3), which is equivalent to the proposed objective function.
were downloaded, including three scores from RAVLT cognitive assessment; two
scores from Fluency cognitive assessment (FLU); two scores from Trail making
test (TRAIL). A total of 525 subjects are involved in our study, including 78
AD, 260 MCI, and 187 HC participants.
First, we apply the proposed method to the ADNI cohort, and separately pre-
dict each of the following three sets of cognitive scores: RAVLT, TRAILS and
FLUENCY. The morphometric variables are $\{x_i\}_{i=1}^{n} \subset \mathbb{R}^d$, with $d = 93$ in this experiment.
We compare the proposed multi-task learning method, in cognitive performance
prediction, with the three most closely related methods: multivariate regression
(MRV), the multi-task learning model with $\ell_{2,1}$-norm regularization
($\ell_{2,1}$) [11], and the multi-task learning model with trace norm
(LS TRACE) [1]. For each test case, we use
5-fold cross validation and the prediction performance is assessed by the root
mean square error (RMSE). All experimental results are reported in Table 1.
The proposed method consistently outperforms other methods in nearly all the
test cases for all the cognitive tasks.
The heat maps of parameter weights are shown in Fig. 1. Visualizing the
parameter weights can help us locate the features which play important roles in
the corresponding cognitive prediction tasks. In this way, there is much potential
to identify the relevant imaging predictors and explain the effects of morpho-
metric changes in relation to cognitive performance. As we can see, different
coefficient values are represented by different colors in the heat map. The blue polar
To further evaluate the multi-task joint analysis power, we apply the proposed
method to predict all five types of cognitive scores (RAVLT, TRAILS, FLU-
ENCY) jointly. Such experiments will demonstrate how the interrelations among
cognitive assessment tests are utilized to enhance the prediction performance.
Table 2. Prediction performance measured by RMSE (mean ± std) for joint assessment
tests.
4 Conclusion
References
1. Argyriou, A., Evgeniou, T., Pontil, M.: Convex multi-task feature learning. Mach.
Learn. 73(3), 243–272 (2008)
2. Batmanghelich, N., Taskar, B., Davatzikos, C.: A general and unifying framework
for feature construction, in image-based pattern classification. In: Prince, J.L.,
Pham, D.L., Myers, K.J. (eds.) IPMI 2009. LNCS, vol. 5636, pp. 423–434. Springer,
Heidelberg (2009)
3. De Leon, M., George, A., Stylopoulos, L., Smith, G., Miller, D.: Early marker for
Alzheimer’s disease: the atrophic hippocampus. Lancet 334(8664), 672–673 (1989)
4. Hassabis, D., Maguire, E.A.: Deconstructing episodic memory with construction.
Trends Cogn. Sci. 11(7), 299–306 (2007)
5. Kabani, N.J.: 3D anatomical atlas of the human brain. Neuroimage 7, P-0717
(1998)
6. Nie, F., Huang, H., Ding, C.H.: Low-rank matrix recovery via efficient Schatten
p-norm minimization. In: AAAI (2012)
7. Rosen, H.J., Gorno-Tempini, M.L., Goldman, W., Perry, R., Schuff, N., Weiner, M.,
Feiwell, R., Kramer, J., Miller, B.L.: Patterns of brain atrophy in frontotemporal
dementia and semantic dementia. Neurology 58(2), 198–208 (2002)
8. Shen, D., Davatzikos, C.: Hammer: hierarchical attribute matching mechanism for
elastic registration. IEEE Trans. Med. Imaging 21(11), 1421–1439 (2002)
9. Sled, J.G., Zijdenbos, A.P., Evans, A.C.: A nonparametric method for automatic
correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging
17(1), 87–97 (1998)
10. Stonnington, C.M., Chu, C., Klöppel, S., Jack Jr., C.R., Ashburner, J.,
Frackowiak, R.S.: Predicting clinical scores from magnetic resonance scans in
Alzheimer’s disease. Neuroimage 51(4), 1405–1413 (2010)
11. Wang, H., Nie, F., Huang, H., Risacher, S., Ding, C., Saykin, A.J., Shen, L.: Sparse
multi-task regression and feature selection to identify brain imaging predictors for
memory performance. In: 2011 IEEE International Conference on Computer Vision
(ICCV), pp. 557–562. IEEE (2011)
12. Wang, H., Nie, F., Huang, H., Risacher, S., Saykin, A.J., Shen, L., ADNI: joint clas-
sification and regression for identifying ad-sensitive and cognition-relevant imaging
biomarkers. In: 14th International Conference on Medical Image Computing and
Computer Assisted Intervention (MICCAI), pp. 115–123 (2011)
13. Wang, H., Nie, F., Huang, H., Risacher, S.L., Saykin, A.J., Shen, L.: ADNI: iden-
tifying disease sensitive and quantitative trait relevant biomarkers from multi-
dimensional heterogeneous imaging genetics data via sparse multi-modal multi-
task learning. Bioinformatics 28(12), i127–i136 (2012)
14. Wang, Y., Nie, J., Yap, P.T., Li, G., Shi, F., Geng, X., Guo, L., Shen, D.,
Initiative, A.D.N., et al.: Knowledge-guided robust MRI brain extraction for diverse
large-scale neuroimaging studies on humans and non-human primates. PLoS ONE
9(1), e77810 (2014)
15. Wang, Y., Nie, J., Yap, P.-T., Shi, F., Guo, L., Shen, D.: Robust deformable-
surface-based skull-stripping for large-scale studies. In: Fichtinger, G., Martel, A.,
Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6893, pp. 635–642. Springer, Heidelberg
(2011). doi:10.1007/978-3-642-23626-6 78
16. Weiner, M.W., Aisen, P.S., Jack Jr., C.R., Jagust, W.J., Trojanowski, J.Q.,
Shaw, L., Saykin, A.J., Morris, J.C., Cairns, N., Beckett, L.A., et al.: The
Alzheimer’s disease neuroimaging initiative: progress report and future plans.
Alzheimer’s Dement. 6(3), 202–211 (2010)
17. Zhang, Y., Brady, M., Smith, S.: Segmentation of brain MR images through a
hidden Markov random field model and the expectation-maximization algorithm.
IEEE Trans. Med. Imaging 20(1), 45–57 (2001)
Hyperbolic Space Sparse Coding with Its
Application on Prediction of Alzheimer’s Disease
in Mild Cognitive Impairment
Jie Zhang1, Jie Shi1, Cynthia Stonnington2, Qingyang Li1, Boris A. Gutman4,
Kewei Chen3, Eric M. Reiman3, Richard Caselli3, Paul M. Thompson4,
Jieping Ye5, and Yalin Wang1(B)

1 School of Computing, Informatics, and Decision Systems Engineering,
Arizona State University, Tempe, AZ, USA
ylwang@asu.edu
2 Department of Psychiatry and Psychology, Mayo Clinic Arizona,
Scottsdale, AZ, USA
3 Banner Alzheimer’s Institute and Banner Good Samaritan PET Center,
Phoenix, AZ, USA
4 Imaging Genetics Center, Institute for Neuroimaging and Informatics,
University of Southern California, Marina del Rey, CA, USA
5 Department of Computational Medicine and Bioinformatics,
University of Michigan, Ann Arbor, MI, USA
1 Introduction
Mild Cognitive Impairment (MCI) is a transitional stage between normal aging
and Alzheimer’s disease (AD). Many neuroimaging studies aim to identify
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 326–334, 2016.
DOI: 10.1007/978-3-319-46720-7_38
The major computational steps of the proposed system are illustrated in Fig. 1.
The new method can be divided into two stages. In the first stage, we perform
MRI scan segmentation, ventricular surface reconstruction, hyperbolic Ricci flow
based surface registration, and surface TBM statistic computation. In the second
stage, we build ring-shaped patches on the hyperbolic parameter space by FBS
to initialize the dictionary; SCC based sparse coding, dictionary learning, and
max-pooling are then performed for dimensionality reduction. Following that,
AdaBoost is adopted to predict future AD conversion, i.e., classification of the
MCI-converter group versus the MCI-stable group.
We applied the hyperbolic Ricci flow method [11] to ventricular surfaces and
mapped them to the Poincaré disk with conformal mapping. On the Poincaré disk,
we computed a set of consistent geodesics and projected them back to the original
ventricular surface, a procedure termed geodesic curve lifting. Further, we
converted the Poincaré model to the Klein model, where the ventricular surfaces
are registered by the constrained harmonic map. The computation of canonical
hyperbolic spaces for a left ventricular surface is shown in Fig. 2.
In Fig. 2, geodesic curve lifting used to construct a canonical hyperbolic space
for ventricular surface registration. γ1 , γ2 , γ3 are some consistent anchor curves
automatically located on the end points of each horn. On the parameter domain,
τ1 is an arc on the circle which passes one endpoint of γ21 and one endpoint of
γ2 and is orthogonal to |z| = 1. The initial paths τ1 and τ2 can be inconsistent,
but they have to connect consistent endpoints of γ1 , γ2 and γ3 , as to guarantee
the consistency of the geodesic curve computation. After slicing the universal
covering space along the geodesics, we get the canonical fundamental domain
in the Poincaré disk, as shown in Fig. 2(b). All the boundary curves become
geodesics. As the geodesics are unique, they are also consistent when we map
them back to the surface in R3. Furthermore, we convert the Poincaré model
to the Klein model with the complex function [11]: z' = 2z/(1 + z z̄). It converts
the canonical fundamental domains of the ventricular surfaces to Euclidean
octagons, as shown in Fig. 2(c). We then use the Klein disk as the canonical
parameter space for the ventricular surface analysis.
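The Poincaré-to-Klein conversion above is a one-line complex map; a minimal sketch (the function name is ours, and points are assumed to be given as complex coordinates in the unit disk):

```python
def poincare_to_klein(z: complex) -> complex:
    """Map a point z in the Poincare disk (|z| < 1) to the Klein disk
    via z -> 2z / (1 + z * conj(z)) = 2z / (1 + |z|^2)."""
    return 2 * z / (1 + abs(z) ** 2)
```

The origin is fixed, and every point of the open unit disk stays inside the disk, so the conversion maps one canonical fundamental domain onto the other.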
Fig. 2. Modeling a ventricular surface with hyperbolic geometry. (a) shows three iden-
tified open boundaries, γ1, γ2, γ3, on the ends of the three horns. After that, ventricular
surfaces can be conformally mapped to the hyperbolic space. (b) and (c) show the hyperbolic
parameter space, where (b) is the Poincaré disk model and (c) is the Klein model.
After that, we computed the TBM features [11] and smoothed them with the
heat kernel method [3]. Suppose φ : S1 → S2 is a map from surface S1 to
surface S2. The derivative map of φ is the linear map between the tangent
spaces, dφ : TM(p) → TM(φ(p)), induced by the map φ, which also defines
the Jacobian matrix of φ. The derivative map dφ is approximated by the linear
map from one face [v1, v2, v3] to another [w1, w2, w3]. First, we isometrically
embed the triangles [v1, v2, v3] and [w1, w2, w3] onto the Klein disk; the planar
coordinates of the vertices, denoted by vi, wi, i = 1, 2, 3, represent the 3D
positions of the points vi, wi, i = 1, 2, 3. Then, the Jacobian matrix of the derivative
map dφ can be computed as J = dφ = [w3 − w1, w2 − w1][v3 − v1, v2 − v1]^{-1}.
Based on the derivative map J, the deformation tensor S = √(J^T J) was
defined as TBM, which measures the amount of local area change on a surface.
As pointed out in [3], each step in the processing pipeline, including MRI acquisi-
tion, surface registration, etc., is expected to introduce noise into the deformation
measurement. To account for the noise effects, we apply the heat kernel smooth-
ing algorithm proposed in [3] to increase the SNR of the TBM statistical features
and boost the sensitivity of statistical analysis.
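The per-face computation described above can be sketched with numpy; `deformation_tensor` is an illustrative name, the triangles are assumed to be given by their 2D Klein-disk coordinates, and the matrix square root is taken by eigendecomposition of the symmetric matrix J^T J:

```python
import numpy as np

def deformation_tensor(v, w):
    """Given the 2-D embedded vertices v = [v1, v2, v3] and w = [w1, w2, w3]
    of a source and target triangle (each a 3x2 array), compute the Jacobian
    J = [w3-w1, w2-w1] [v3-v1, v2-v1]^{-1} of the linear map between the
    triangles, and the deformation tensor S = sqrt(J^T J)."""
    v1, v2, v3 = np.asarray(v, float)
    w1, w2, w3 = np.asarray(w, float)
    J = np.column_stack([w3 - w1, w2 - w1]) @ np.linalg.inv(
        np.column_stack([v3 - v1, v2 - v1]))
    # Matrix square root of the symmetric positive definite matrix J^T J.
    evals, evecs = np.linalg.eigh(J.T @ J)
    S = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    return J, S
```

For the identity map S is the identity, and det(S) directly measures the local area change on the surface.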
The hyperbolic space differs from the original Euclidean space: its structure
is more complicated, and selecting patches based on its topological structure
demands more effort. The common rectangular patch construction cannot
be applied directly in the hyperbolic space. Therefore, we proposed Farthest
point sampling with Breadth-first Search (FBS) on the hyperbolic space to initialize
the original dictionary for sparse coding. Figure 3 (right) visualizes the patch
selection on the hyperbolic parameter domain, and Fig. 3 (left) projects the selected
patches back onto the original ventricular surface, which maintains the same
topological structure as the parameter domain.
First, we randomly select a patch center on the hyperbolic space, denoted
by px1 ∈ Xr, where Xr is the set of all discrete vertices on the hyperbolic space.
Then, we find all points px1,i (i = 1, 2, ..., n), where n is the maximum number of
points connected with the patch center px1. This procedure is called
breadth-first search (BFS) [8], an algorithm for searching graph data
structures: it starts at the tree root and explores the neighbor nodes first, before
moving to the next-level neighbors. Then, we use the same procedure to find
all points connected with px1,i, namely px1,ij (j = 1, 2, ..., mi). Here, mi
is the maximum number of points connected with each specific point px1,i;
the points px1,ij are connected with px1,i by the same BFS procedure used
between px1 and px1,i. Finally, we obtain the set Px1 below, which is a selected
patch with patch center px1 and contains no duplicate points.
Px1 = {px1 , px1 ,1 , px1 ,11 , · · · , px1 ,1m1 , · · · , px1 ,n , px1 ,n1 , · · · , px1 ,nmn }. (1)
All connected components of the center point px1 are contained in the
set Px1. After that, we reconstruct the topological patches based on hyperbolic
geometry, connecting edges between the points within Px1 according
to the topological structure. We use Φ1 to denote the first selected patch with
root (patch center) px1. Since the randomly selected patches may overlap to
different degrees, we use the radius r = max_{px∈Xr} d(px, px1) to determine
the position of the next patch root px2.
In this way, we find the second patch root px2 ∈ Xr at the farthest
distance r from px1. We apply farthest point sampling [7] because its
principle of repeatedly placing the next sample point in the middle of the
least-known area of the sampling domain guarantees the randomness of the
patch selection. Here, d is the hyperbolic distance in the Klein model.
Given two points p and q, draw the straight line through them; it
intersects the unit circle at points a and b, and d is defined as

d(p, q) = (1/2) log( (|aq| |bp|) / (|ap| |bq|) ),     (2)

Let X denote the set of selected patch centers. We add px2 to X and
iterate the patch selection procedure T = 2000 times, which, according to our
experimental results, covers all vertices. The details of FBS are summa-
rized in Algorithm 1.
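A compact sketch of the two ingredients of FBS, assuming a vertex adjacency list for the BFS patches and 2D Klein-disk coordinates for the distance of Eq. (2); all function names are ours, and `farthest_next_center` uses the standard max-min criterion of farthest point sampling:

```python
import numpy as np
from collections import deque

def klein_distance(p, q):
    """Hyperbolic distance in the Klein disk (Eq. 2): intersect the chord
    through p and q with the unit circle at a, b, and take half the log
    of the cross-ratio |aq||bp| / (|ap||bq|)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.allclose(p, q):
        return 0.0
    d = q - p
    # Solve |p + t d|^2 = 1 for the two chord endpoints a and b.
    A, B, C = d @ d, 2 * p @ d, p @ p - 1
    disc = np.sqrt(B * B - 4 * A * C)
    a = p + (-B - disc) / (2 * A) * d   # endpoint on p's side
    b = p + (-B + disc) / (2 * A) * d   # endpoint on q's side
    aq, bp = np.linalg.norm(a - q), np.linalg.norm(b - p)
    ap, bq = np.linalg.norm(a - p), np.linalg.norm(b - q)
    return 0.5 * np.log((aq * bp) / (ap * bq))

def bfs_patch(adj, center):
    """Two-level breadth-first patch: the center, its neighbours, and
    their neighbours, without duplicate points (the set P_x1 of Eq. 1)."""
    patch, frontier = {center}, deque([center])
    for _ in range(2):                       # two BFS levels
        nxt = deque()
        while frontier:
            for v in adj[frontier.popleft()]:
                if v not in patch:
                    patch.add(v)
                    nxt.append(v)
        frontier = nxt
    return patch

def farthest_next_center(points, centers):
    """Farthest-point step: pick the vertex maximising the minimum
    hyperbolic distance to the already-chosen patch centers."""
    return max(range(len(points)),
               key=lambda i: min(klein_distance(points[i], points[c])
                                 for c in centers))
```

For a point at Klein radius r from the origin, `klein_distance` reduces to artanh(r), the familiar closed form for the distance from the disk center.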
For our problem, the dimension of the surface-based features is usually much larger
than the number of subjects; e.g., we have approximately 150,000 features from
each side of the ventricular surface for each subject. Therefore, we used dictionary
learning [6] with pooling to reduce the dimension before prediction.
The dictionary learning problem is stated as follows.
Given a finite training set of image patches X = (x1, x2, ..., xn) ∈ R^{m×n},
where each image patch xi ∈ Rm, i = 1, 2, ..., n, and m is the dimension
of an image patch, we can incorporate the idea of patch features into the
following optimization problem for each patch xi:

min_{D, zi} (1/2)||D zi − xi||_2^2 + λ||zi||_1.     (4)
Specifically, suppose there are t atoms dj ∈ Rm, j = 1, 2, ..., t, where the
number of atoms is much smaller than n (the number of image patches) but
larger than m (the dimension of the image patches). Then xi can be represented
by xi = Σ_{j=1}^{t} zi,j dj. In this way, the m-dimensional vector xi is represented
by a t-dimensional vector zi = (zi,1, ..., zi,t)^T, which means the learned
feature vector zi is a sparse vector. In Eq. (4), λ is the regularization
parameter, ||·|| is the standard Euclidean norm, ||zi||_1 = Σ_{j=1}^{t} |zi,j|, and
D = (d1, d2, ..., dt) ∈ R^{m×t} is the dictionary, each column representing a basis
vector.
To prevent an arbitrary scaling of the sparse codes, the columns dj are con-
strained by C = {D ∈ R^{m×t} s.t. ∀j = 1, ..., t, dj^T dj ≤ 1}. Thus, the problem of
dictionary learning can be rewritten as a matrix factorization problem:

min_{D∈C, Z∈R^{t×n}} (1/2)||X − DZ||_F^2 + λ||Z||_1.     (5)

This problem is convex when either D or Z is fixed. When the dictionary D is
fixed, solving each sparse code zi is a Lasso problem; when Z is fixed, it
becomes a quadratic problem, which is relatively time consuming.
Thus, we choose the SCC algorithm [6], because it dramatically reduces the
computational cost of the sparse coding while keeping comparable performance.
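The alternating structure of problem (5) can be sketched as follows. This is a simplified alternating scheme (ISTA on the codes with D fixed, then a projected gradient step on the dictionary with Z fixed), not the SCC algorithm of [6]; all names and parameters are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dict_learn(X, t, lam=0.1, iters=50, rng=None):
    """Simplified alternating minimisation for Eq. (5): an ISTA step on the
    sparse codes Z (a Lasso subproblem), then a projected gradient step on
    the dictionary D, keeping each column in the unit ball d_j^T d_j <= 1."""
    rng = np.random.default_rng(rng)
    m, n = X.shape
    D = rng.standard_normal((m, t))
    D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)
    Z = np.zeros((t, n))
    for _ in range(iters):
        # Sparse coding step: gradient + soft-threshold with step 1/L.
        L = np.linalg.norm(D, 2) ** 2 + 1e-12
        Z = soft_threshold(Z - (D.T @ (D @ Z - X)) / L, lam / L)
        # Dictionary step: gradient descent, then project columns onto C.
        Lz = np.linalg.norm(Z, 2) ** 2 + 1e-12
        D -= (D @ Z - X) @ Z.T / Lz
        D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)
    return D, Z
```

Both steps are descent steps with the correct Lipschitz constants, so the objective of Eq. (5) is non-increasing over the iterations.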
classification on the same dataset with our new algorithm. We tested FBS, shape,
volume, and area features on the left, right, and whole ventricle, respectively.
Table 2 shows the classification performance of the four methods in one experiment.
Fig. 4. Classification performance comparison with ROC curves and AUC measures.
Across all the experimental results, the best accuracy (96.7 %), the best
sensitivity (93.3 %), the best specificity (100 %), the best positive predictive
value (100 %), and the best negative predictive value (88.9 %) were achieved
when we used TBM features in the ventricular hyperbolic space on both sides (whole)
for training and testing. The comparison also shows that our new framework
selected better features and produced better, more meaningful classification
results. In Fig. 4, we also generated ROC curves and computed AUC measures for the four
experiments. The FBS algorithm with whole-ventricle TBM features achieved the
best AUC (0.957). The comparison demonstrates that our proposed algorithm
may be useful for AD diagnosis and prognosis research. In the future, we will
conduct more in-depth comparisons against other shape analysis methods, such as
SPHARM-PDM and radial distance, to further improve our algorithm's efficiency
and accuracy.
References
1. Boureau, Y.L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in
visual recognition. In: Proceedings of the ICML-2010, vol. 10, pp. 111–118 (2010)
2. Cardenas, V., Chao, L., Studholme, C., Yaffe, K., Miller, B., Madison, C.,
Buckley, S., Mungas, D., Schuff, N., Weiner, M.: Brain atrophy associated with
baseline and longitudinal measures of cognition. Neurobiol. Aging 32(4), 572–580
(2011)
3. Chung, M.K., Robbins, S.M., Dalton, K.M., Davidson, R.J., Alexander, A.L.,
Evans, A.C.: Cortical thickness analysis in autism with heat kernel smoothing.
NeuroImage 25(4), 1256–1265 (2005)
4. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–
874 (2006)
5. Ferrarini, L., Palm, W.M., Olofsen, H., van der Landen, R., van Buchem, M.A.,
Reiber, J.H., Admiraal-Behloul, F.: Ventricular shape biomarkers for Alzheimer's
disease in clinical MR images. Magn. Reson. Med. 59(2), 260–267 (2008)
6. Lin, B., Li, Q., Sun, Q., Lai, M.J., Davidson, I., Fan, W., Ye, J.: Stochastic coordi-
nate coding and its application for drosophila gene expression pattern annotation
(2014). arXiv preprint arXiv:1407.8147
7. Moenning, C., Dodgson, N.A.: Fast marching farthest point sampling. In: Proceed-
ings of EUROGRAPHICS 2003 (2003)
8. Patel, J.R., Shah, T.R., Shingadiy, V.P., Patel, V.B.: Comparison between breadth
first search and nearest neighbor algorithm for waveguide path planning
9. Patenaude, B., Smith, S.M., Kennedy, D.N., Jenkinson, M.: A Bayesian model
of shape and appearance for subcortical brain segmentation. Neuroimage 56(3),
907–922 (2011)
10. Rojas, R.: AdaBoost and the super bowl of classifiers: a tutorial introduction to
adaptive boosting. Technical report, Freie Universität Berlin (2009)
11. Shi, J., Stonnington, C.M., Thompson, P.M., Chen, K., Gutman, B., Reschke, C.,
Baxter, L.C., Reiman, E.M., Caselli, R.J., Wang, Y.: Studying ventricular abnor-
malities in mild cognitive impairment with hyperbolic Ricci flow and tensor-based
morphometry. NeuroImage 104, 1–20 (2015)
12. Stonnington, C.M., Chu, C., Klöppel, S., Jack, C.R., Ashburner, J., Frackowiak,
R.S.: Predicting clinical scores from magnetic resonance scans in Alzheimer’s dis-
ease. Neuroimage 51(4), 1405–1413 (2010)
13. Styner, M., Lieberman, J.A., McClure, R.K., Weinberger, D.R., Jones, D.W.,
Gerig, G.: Morphometric analysis of lateral ventricles in schizophrenia and healthy
controls regarding genetic and disease-specific factors. Proc. Natl. Acad. Sci. U.S.A.
102(13), 4872–4877 (2005)
14. Thompson, P.M., Hayashi, K.M., de Zubicaray, G.I., Janke, A.L., Rose, S.E.,
Semple, J., Hong, M.S., Herman, D.H., Gravano, D., Doddrell, D.M., Toga, A.W.:
Mapping hippocampal and ventricular change in Alzheimer disease.
Neuroimage 22(4), 1754–1766 (2004)
15. Zhang, J., Stonnington, C., Li, Q., Shi, J., Bauer, R.J., Gutman, B.A., Chen, K.,
Reiman, E.M., Thompson, P.M., Ye, J., Wang, Y.: Applying sparse coding to
surface multivariate tensor-based morphometry to predict future cognitive decline.
In: IEEE International Symposium on Biomedical Imaging (2016)
Large-Scale Collaborative Imaging Genetics
Studies of Risk Genetic Factors for Alzheimer’s
Disease Across Multiple Institutions
Qingyang Li1 , Tao Yang1 , Liang Zhan2 , Derrek Paul Hibar3 , Neda Jahanshad3 ,
Yalin Wang1 , Jieping Ye4 , Paul M. Thompson3 , and Jie Wang4(B)
1 School of Computing, Informatics, and Decision Systems Engineering,
Arizona State University, Tempe, AZ, USA
2 Department of Engineering and Technology, University of Wisconsin-Stout,
Menomonie, WI, USA
3 Imaging Genetics Center, Institute for Neuroimaging and Informatics,
University of Southern California, Marina del Rey, CA, USA
4 Department of Computational Medicine and Bioinformatics,
University of Michigan, Ann Arbor, MI, USA
jwangumi@umich.edu
1 Introduction
Alzheimer’s Disease (AD) is a severe and growing worldwide health problem.
Many techniques have been developed to investigate AD, such as magnetic reso-
nance imaging (MRI) and genome-wide association studies (GWAS), which are
powerful tools for identifying preclinical and clinical AD patients.
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 335–343, 2016.
DOI: 10.1007/978-3-319-46720-7 39
336 Q. Li et al.
GWAS [4] are achieving great success in finding single nucleotide polymorphisms
(SNPs) associated with AD. For example, APOE is a highly prevalent AD risk
gene, and each copy of the adverse variant is associated with a 3-fold increase
in AD risk. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) collects
neuroimaging and genomic data from elderly individuals across North America.
However, processing and integrating genetic data across different institutions is
challenging. Each institution may wish to collaborate with others, but often legal
or ethical regulations restrict access to individual data, to avoid compromising
data privacy.
Some studies, such as ADNI, share genomic data publicly under certain con-
ditions, but more commonly, each participating institution may be required to
keep their genomic data private, so collecting all data together may not be fea-
sible. To deal with this challenge, we proposed a novel distributed framework,
termed Local Query Model (LQM), to perform the Lasso regression analysis
in a distributed manner, learning genetic risk factors without accessing others’
data. However, applying LQM for model selection—such as stability selection—
can be very time consuming on a large-scale data set. To speed up the learning
process, we proposed a family of distributed safe screening rules (D-SAFE and
D-EDPP) to identify irrelevant features and remove them from the optimiza-
tion without sacrificing accuracy. Next, LQM is employed on the reduced data
matrix to train the model so that each institution obtains top risk genes for AD
by stability selection on the learnt model without revealing its own data set. We
evaluate our method on the ADNI GWAS data, which contains 809 subjects with
5,906,152 SNP features, involving an 80 GB data matrix with approximately 42 bil-
lion nonzero elements, distributed across three research institutions. Empirical
evaluations demonstrate a 66-fold speedup gained by D-EDPP, compared to
LQM without D-EDPP. Stability selection results show that the proposed framework
ranked APOE as the top risk SNP among all features.
2 Data Processing
2.1 ADNI GWAS Data
The ADNI GWAS data contains genotype information for each of the 809 ADNI
participants, consisting of 128 patients with AD, 415 with mild cognitive
impairment (MCI), and 266 cognitively normal (CN) subjects. SNPs at approximately 5.9
million specific loci are recorded for each participant. We encode SNPs with
the coding scheme in [7] and apply Minor Allele Frequency (MAF) < 0.05 and
Genotype Quality (GQ) < 45 as two quality-control criteria to filter out low-quality
SNP features; for details, refer to [11].
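The quality-control step can be sketched as a simple column filter, assuming per-SNP MAF and GQ arrays and that SNPs failing either criterion are discarded (function and variable names are ours):

```python
import numpy as np

def qc_filter(genotypes, maf, gq, maf_min=0.05, gq_min=45):
    """Keep only SNP columns passing both quality-control criteria:
    minor allele frequency >= maf_min and genotype quality >= gq_min;
    SNPs with MAF < 0.05 or GQ < 45 are discarded."""
    keep = (maf >= maf_min) & (gq >= gq_min)
    return genotypes[:, keep], keep
```

The boolean mask can be stored so that the same columns are dropped consistently at every institution.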
3 Methods
Figure 1 illustrates the general idea of our distributed framework. Suppose that
each institution maintains the ADNI genome-wide data for a few subjects. We
first apply the distributed Lasso screening rule to pre-identify inactive features
and remove them from the training phase. Next, we employ the LQM on the
reduced data matrices to perform collaborative analyses across different institutions.
Finally, each institution obtains the learnt model and performs stability
selection to rank the SNPs that may collectively affect AD. Stability
selection counts the frequency of nonzero entries in the solution vectors
and selects the most frequent ones as the top risk genes for AD. The whole learn-
ing procedure results in the same model for all institutions and preserves data
privacy at each of them.
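The frequency-counting step of stability selection can be sketched as follows (names are ours; `solutions` holds one learnt coefficient vector per subsampled run):

```python
import numpy as np

def stability_ranking(solutions, top_k=10):
    """Rank features by how often they receive a nonzero coefficient
    across the solution vectors from repeated (subsampled) Lasso runs."""
    S = np.asarray(solutions)            # shape: (runs, features)
    freq = (S != 0).mean(axis=0)         # selection frequency per feature
    order = np.argsort(-freq)            # most frequently selected first
    return order[:top_k], freq
```

Each institution can compute this ranking locally from the shared model sequence, so no raw data needs to be exchanged.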
The inactive features have zero components in the optimal solution vector
x*(λ), so we can remove them from the optimization without sacrificing the
accuracy of the optimal value of the objective function (1). We call this kind of
screening method a Safe Screening Rule. SAFE [3] is one of the highly efficient
safe screening methods. In SAFE, the jth entry of x*(λ) is discarded when

|[A]_j^T y| < λ − ||[A]_j||_2 ||y||_2 (λ_max − λ)/λ_max,     (6)

where λ_max = max_j |[A]_j^T y|. As a result, the optimization can be performed on
the reduced data matrix Ã and the original problem (1) can be reformulated as:

min_{x̃} (1/2)||Ã x̃ − y||_2^2 + λ||x̃||_1,  x̃ ∈ R^p̃, Ã ∈ R^{n×p̃},     (7)

where p̃ is the number of features remaining after applying safe screening rules.
The optimization is performed on a reduced feature matrix, which accelerates the
whole learning process significantly.
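Rule (6) can be sketched in a few lines of numpy; `safe_screen` is an illustrative name, and the returned mask marks the features that survive screening:

```python
import numpy as np

def safe_screen(A, y, lam):
    """SAFE rule (Eq. 6): discard feature j when
    |A_j^T y| < lam - ||A_j||_2 * ||y||_2 * (lam_max - lam) / lam_max,
    with lam_max = max_j |A_j^T y|. Returns a boolean keep-mask."""
    corr = np.abs(A.T @ y)
    lam_max = corr.max()
    thresh = lam - np.linalg.norm(A, axis=0) * np.linalg.norm(y) \
                 * (lam_max - lam) / lam_max
    return corr >= thresh
```

At λ = λ_max only the feature(s) achieving the maximal correlation survive, and the surviving set grows monotonically as λ decreases.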
To compute ||[A]_j||_2 in Step 3, we first compute H_i = ||[A_i]_j||_2^2 locally and
perform LQM to compute H = Σ_{i=1}^{m} H_i; then we have ||[A]_j||_2 = √H. Simi-
larly, we can compute ||y||_2 in Step 3. As the data communication only involves
intermediate results, D-SAFE preserves the data privacy of each institution.
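The distributed norm computation can be sketched as follows, with each entry of `local_blocks` standing for one institution's rows of A; only the per-column scalars H_i would cross institutional boundaries:

```python
import numpy as np

def distributed_column_norms(local_blocks):
    """D-SAFE sketch: each institution i computes H_i = ||[A_i]_j||_2^2 for
    its own rows; only these per-column scalars are aggregated (the LQM
    query), and ||[A]_j||_2 = sqrt(sum_i H_i)."""
    H = sum((Ai ** 2).sum(axis=0) for Ai in local_blocks)
    return np.sqrt(H)
```

The result equals the column norms of the full, never-assembled matrix A, which is what rule (6) needs.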
To tune the value of λ, commonly used methods such as cross validation need
to solve the Lasso problem along a sequence of parameters λ0 > λ1 > ... > λκ,
which can be very time-consuming. Enhanced Dual Polytope Projection (EDPP)
[10] is a highly efficient safe screening rule. Implementation details of EDPP are
available on GitHub: http://dpc-screening.github.io/lasso.html.
To address the problem of data privacy, we propose a distributed Lasso
screening rule, termed Distributed Enhanced Dual Polytope Projection (D-
EDPP), to identify and discard inactive features along a sequence of parameter
values in a distributed manner. The idea of D-EDPP is similar to LQM. Specifically,
to update the global variables, we apply LQM to query each local center for
intermediate results, computed locally, and aggregate them at the global center.
After obtaining the reduced matrix for each institution, we apply LQM to solve
the Lasso problem on the reduced data sets Ãi, i = 1, ..., m. We assume that j
To further accelerate the learning process, we apply FISTA [1] to solve the Lasso
problem in a distributed manner. The convergence rate of FISTA is O(1/k^2),
compared to O(1/k) for ISTA, where k is the iteration number. We integrate
FISTA with LQM (F-LQM) to solve the Lasso problem on the reduced matrices
Ãi. The updating rule of F-LQM in the kth iteration is summarized as follows:
The matrix Ãi denotes the reduced matrix of the ith institution obtained
by the D-EDPP rule. We repeat this procedure until a satisfactory global model is
obtained. Step 1 calculates ∇g_i^k from the local data (Ãi, yi); then each institution
performs LQM to obtain the gradient ∇g^k based on (5). Step 2 updates the auxiliary
variable z^k and step size t_k. Step 3 updates the model x. As with LQM, the
data privacy of the institutions is well preserved by F-LQM.
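The three F-LQM steps correspond, on a single machine, to the standard FISTA iteration for the Lasso; a minimal centralized sketch (the distributed gradient aggregation of Step 1 is omitted, and names are illustrative):

```python
import numpy as np

def fista_lasso(A, y, lam, iters=200):
    """FISTA [1] for min_x 0.5*||Ax - y||_2^2 + lam*||x||_1:
    a gradient step at the auxiliary point z^k, soft-thresholding,
    and the momentum update of z^k and step size t_k (O(1/k^2) rate)."""
    x = z = np.zeros(A.shape[1])
    t = 1.0
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of A^T(Az - y)
    for _ in range(iters):
        g = A.T @ (A @ z - y)              # Step 1: gradient at z^k
        v = z - g / L
        x_new = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2       # Step 2: momentum
        z = x_new + (t - 1) / t_new * (x_new - x)      # Step 3: auxiliary point
        x, t = x_new, t_new
    return x
```

In the distributed setting, only the partial gradients A_i^T(A_i z − y_i) would be aggregated via LQM; the soft-thresholding and momentum updates are identical at every institution.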
4 Experiment
We implement the proposed framework across three institutions on a state-of-
the-art distributed platform, Apache Spark, a fast and efficient platform
for large-scale data computing. The experiments show the efficiency and
effectiveness of the proposed models.
Fig. 2. Running time comparison of Lasso with and without D-EDPP rules.
Table 1. Top 5 selected risk SNPs associated with diagnosis and with the baseline
volumes of the hippocampus, entorhinal cortex, and lateral ventricle, based on ADNI.
Acknowledgments. This work was supported in part by NIH Big Data to Knowledge
(BD2K) Center of Excellence grant U54 EB020403, funded by a cross-NIH consortium
including NIBIB and NCI.
References
1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear
inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
2. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for
linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math.
57(11), 1413–1457 (2004)
3. Ghaoui, L.E., Viallon, V., Rabbani, T.: Safe feature elimination for the lasso and
sparse supervised learning problems. arXiv preprint arXiv:1009.4219 (2010)
4. Harold, D., et al.: Genome-wide association study identifies variants at clu and
picalm associated with Alzheimer’s disease. Nature Genet. 41(10), 1088–1093
(2009)
5. Liu, C.C., Kanekiyo, T., Xu, H., Bu, G.: Apolipoprotein e and Alzheimer disease:
risk, mechanisms and therapy. Nature Rev. Neurol. 9(2), 106–118 (2013)
6. Meinshausen, N., Bühlmann, P.: Stability selection. J. R. Stat. Soc. Series B (Stat.
Methodol.) 72(4), 417–473 (2010)
7. Sasieni, P.D.: From genotypes to genes: doubling the sample size. Biometrics, 1253–
1261 (1997)
8. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for ℓ1-regularized loss mini-
mization. J. Mach. Learn. Res. 12, 1865–1892 (2011)
9. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc.
Series B (Methodol.), 267–288 (1996)
10. Wang, J., Zhou, J., Wonka, P., Ye, J.: Lasso screening rules via dual polytope
projection. In: Advances in Neural Information Processing Systems (2013)
11. Yang, T., et al.: Detecting genetic risk factors for Alzheimer’s disease in whole
genome sequence data via lasso screening. In: IEEE International Symposium on
Biomedical Imaging, pp. 985–989 (2015)
Structured Sparse Low-Rank Regression Model
for Brain-Wide and Genome-Wide Associations
1 Introduction
Recently, it has been of great interest to identify the genetic basis (e.g., Sin-
gle Nucleotide Polymorphisms: SNPs) of phenotypic neuroimaging markers
(e.g., features in Magnetic Resonance Imaging: MRI) and study the associa-
tions between them, known as imaging-genetic analysis. In the previous work,
Vounou et al. categorized the association studies between neuroimaging pheno-
types and genotypes into four classes depending on both the dimensionality of
the phenotype being investigated and the size of genomic regions being searched
for association [13]. In this work, we focus on the Brain-Wide and Genome-Wide
Association (BW-GWA) study, in which we search non-random associations for
both the whole brain and the entire genome.
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 344–352, 2016.
DOI: 10.1007/978-3-319-46720-7 40
The BW-GWA study can potentially help discover important associations
between neuroimaging-based phenotypic markers and genotypes from a
different perspective. For example, by identifying strong associations between
specific SNPs and brain regions related to Alzheimer's Disease (AD), one can
utilize the information of the corresponding SNPs to predict the risk of incident
AD much earlier, even before pathological changes begin. This gives clinicians
more time to track the progress of AD and find potential treatments
to prevent it. Due to the high-dimensional nature of brain phenotypes
and genotypes, there have been only a few BW-GWA studies [3,8]. Conventional
methods formulated the problem as Multi-output Linear Regression (MLR),
estimating the coefficients independently and thus yielding unsatisfactory perfor-
mance. Recent studies were mostly devoted to dimensionality reduction
while keeping the results interpretable. For example, Stein et
al. [8] and Vounou et al. [13] separately employed the t-test and sparse reduced-
rank regression to conduct association studies between voxel-based neuroimaging
phenotypes and SNP genotypes.
In this paper, we propose a novel structured sparse low-rank regression model
for the BW-GWA study with MRI features of a whole brain as phenotypes and
the SNP genotypes. To do this, we first impose a low-rank constraint on the
coefficient matrix of the MLR. With a low-rank constraint, the coefficient
matrix can be decomposed into two low-rank matrices, i.e., two transforma-
tion subspaces, each of which separately transfers high-dimensional phenotypes
and genotypes into their own low-rank representations by considering the cor-
relations among the response variables and the features. We then introduce a
structured sparsity-inducing penalty (i.e., an ℓ2,1-norm regularizer) on each of the
transformation matrices to conduct biomarker selection on both phenotypes and
genotypes, taking the correlations among the features into account. The struc-
tured sparsity constraint allows the low-rank regression to select highly predic-
tive genotypes and phenotypes, as a large number of them are not expected to
be important or involved in the BW-GWA study [14]. In this way, our new
method integrates a low-rank constraint with structured sparsity constraints in
a unified framework. We apply the proposed method to study the genotype-
phenotype associations using the Alzheimer's Disease Neuroimaging Initiative
(ADNI) data. Our experimental results show that our new model consistently
outperforms the competing methods in terms of prediction accuracy.
2 Methodology
2.1 Notations
In this paper, we denote matrices, vectors, and scalars as boldface uppercase
letters, boldface lowercase letters, and normal italic letters, respectively. For a
matrix X = [xij ], its i -th row and the j -th column are denoted as xi and xj ,
respectively. Also, we denote the Frobenius norm and the ℓ2,1-norm of a matrix
X as ||X||_F = √(Σ_i ||x^i||_2^2) = √(Σ_j ||x_j||_2^2) and ||X||_{2,1} = Σ_i ||x^i||_2,
respectively.
346 X. Zhu et al.
We further denote the transpose operator, the trace operator, the rank, and the
inverse of a matrix X as XT , tr(X), rank(X), and X−1 , respectively.
Y = XW + eb (1)
However, the MLR illustrated in Fig. 1(a) with the OLS estimation in Eq.
(2) has at least two limitations. First, Eq. (2) is equivalent to conducting mass-
univariate linear models, fitting each of the c univariate response variables inde-
pendently. This obviously does not exploit possible relations among the
response variables (i.e., ROIs). Second, neither X nor Y in MLR is ensured
to have full rank, due to noise, outliers, or correlations in the data [13]. In the
non-full-rank (or low-rank) case of X^T X, Eq. (2) is not applicable.
(Fig. 1: (a) the MLR model Y_{n×c} = X_{n×d} W_{d×c} + E_{n×c}; (b) the low-rank
model, with W_{d×c} factorized as B_{d×r} A_{c×r}^T.)
3 Experimental Analysis
We conducted various experiments on the ADNI dataset (‘www.adni-info.org’)
by comparing the proposed method with the state-of-the-art methods.
for all methods in our experiments. As for the rank of the coefficient matrix W,
we varied the values of r in {1, 2, ..., 10} for our method.
Following the previous work [3,14], we selected the top {20, 40, ..., 200}
SNPs to predict the test data. The performance of each experiment was assessed
by Root-Mean-Square Error (RMSE), a widely used measure for regression
analysis, and 'Frequency' (∈ [0, 1]), defined as the ratio of experiments (out of
50) in which a feature was selected. The larger the value of 'Frequency', the more
likely the corresponding SNP (or ROI) is to be selected.
We summarized the RMSE performances of all methods in Fig. 2(a), where the
mean and standard deviation of the RMSEs were obtained from the 50 (5-fold
CV × 10 repetition) experiments. Figure 2(b) and (c) showed, respectively, the
values of ‘Frequency’ of the top 10 selected SNPs by the competing methods and
the frequency of the top 10 selected ROIs by our method.
Figure 2(a) reveals the following observations: (i) The RMSE values of all
methods decreased as the number of selected SNPs increased; in our experiments,
the more SNPs used, the better the performance of the BW-GWA study.
(ii) The proposed method obtained the best performance, fol-
lowed by the Baseline, RRR, GFS, CCA, L21, and MLR. Specifically, our method
improved on average by 12.75 % over the other competing methods. In
paired-sample t-tests at the 95 % confidence level, all p-values between the pro-
posed method and the comparison methods were less than 0.00001. Moreover,
our method was considerably more stable than the comparison methods. This clearly
demonstrates the advantage of the proposed method, which integrates a low-rank con-
straint with structured sparsity constraints in a unified framework. (iii) The
Baseline method improved on average by 8.26 % over the comparison
(Fig. 2 panels: RMSE curves (×10^-4) for MLR, L21, GFS, CCA, RRR, Baseline, and
the proposed method; the top 10 selected SNPs include rs429358, rs11234495,
rs7938033, rs10792820, rs7945931, rs2276346, rs6584307, rs1329600, rs17367504,
and rs10779339.)
Fig. 2. (a) RMSE with respect to different numbers of selected SNPs for all methods;
(b) frequency of the top 10 SNPs selected by all methods; and (c) frequency of the top 10
ROIs selected by our method in our 50 experiments. The names of the ROIs (indexed
from 1 to 10) are middle temporal gyrus left, perirhinal cortex left, temporal pole left,
middle temporal gyrus right, amygdala right, hippocampal formation right, middle
temporal gyrus left, amygdala left, inferior temporal gyrus right, and hippocampal
formation left.
methods, and the p-values were less than 0.001 in paired-sample t-tests at the
95 % confidence level. This shows that our model without ROI selection
(i.e., Baseline) still outperformed all comparison methods. It is noteworthy that
our proposed method improved on average by 4.49 % over the Baseline method,
and the paired-sample t-tests also indicated that the improvements were statistically
significant. This verifies again that it is essential to simultaneously
select a subset of ROIs and a subset of SNPs.
Figure 2(b) indicates that phenotypes can be affected by genotypes to dif-
ferent degrees: (i) The selected SNPs in Fig. 2(b) belong to genes such as
PICALM, APOE, SORL1, ENTPD7, DAPK1, MTHFR, and CR1, which have
been reported as top AD-related genes on the AlzGene website. (ii) Although
we know little about the underlying mechanisms of genotypes in relation to AD,
Fig. 2(b) offers the potential to gain biological insights from the BW-GWA
study. (iii) The selected ROIs of the proposed method in Fig. 2(c) are known to
be highly related to AD in previous studies [10,12,19]. It is noteworthy
that all methods selected the ROIs in Fig. 2(c) as their top ROIs, but with different
frequencies.
Finally, our method conducts the BW-GWA study by selecting a subset of SNPs
and a subset of ROIs that were also linked to AD by previous
state-of-the-art methods. This consistent performance clearly
demonstrates that the proposed method enables a more statistically
meaningful BW-GWA study than the comparison methods.
4 Conclusion
References
1. Du, L., et al.: A novel structure-aware sparse learning algorithm for brain imaging
genetics. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.)
MICCAI 2014. LNCS, vol. 8675, pp. 329–336. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10443-0_42
2. Evgeniou, A., Pontil, M.: Multi-task feature learning. NIPS 19, 41–48 (2007)
3. Hao, X., Yu, J., Zhang, D.: Identifying genetic associations with MRI-derived
measures via tree-guided sparse learning. In: Golland, P., Hata, N., Barillot, C.,
Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 757–764.
Springer, Heidelberg (2014). doi:10.1007/978-3-319-10470-6_94
4. Izenman, A.J.: Reduced-rank regression for the multivariate linear model. J. Mul-
tivar. Anal. 5(2), 248–264 (1975)
5. Jin, Y., Wee, C.Y., Shi, F., Thung, K.H., Ni, D., Yap, P.T., Shen, D.: Identification
of infants at high-risk for autism spectrum disorder using multiparameter multi-
scale white matter connectivity networks. Hum. Brain Mapp. 36(12), 4880–4896
(2015)
6. Lin, D., Cao, H., Calhoun, V.D., Wang, Y.P.: Sparse models for correlative and
integrative analysis of imaging and genetic data. J. Neurosci. Methods 237, 69–78
(2014)
7. Shen, L., Thompson, P.M., Potkin, S.G., et al.: Genetic analysis of quantitative
phenotypes in AD and MCI: imaging, cognition and biomarkers. Brain Imaging
Behav. 8(2), 183–207 (2014)
8. Stein, J.L., Hua, X., Lee, S., Ho, A.J., Leow, A.D., Toga, A.W., Saykin, A.J.,
Shen, L., Foroud, T., Pankratz, N., et al.: Voxelwise genome-wide association study
(vGWAS). NeuroImage 53(3), 1160–1174 (2010)
9. Suk, H., Lee, S., Shen, D.: Hierarchical feature representation and multimodal
fusion with deep learning for AD/MCI diagnosis. NeuroImage 101, 569–582 (2014)
10. Suk, H., Wee, C., Lee, S., Shen, D.: State-space model with deep learning for
functional dynamics estimation in resting-state fMRI. NeuroImage 129, 292–307
(2016)
11. Thung, K., Wee, C., Yap, P., Shen, D.: Neurodegenerative disease diagnosis using
incomplete multi-modality data via matrix shrinkage and completion. NeuroImage
91, 386–400 (2014)
12. Thung, K.H., Wee, C.Y., Yap, P.T., Shen, D.: Identification of progressive mild cog-
nitive impairment patients using incomplete longitudinal MRI scans. Brain Struct.
Funct., 1–17 (2015)
13. Vounou, M., Nichols, T.E., Montana, G.: ADNI: discovering genetic associations
with high-dimensional neuroimaging phenotypes: a sparse reduced-rank regression
approach. NeuroImage 53(3), 1147–1159 (2010)
14. Wang, H., Nie, F., Huang, H., et al.: Identifying quantitative trait loci via group-
sparse multitask regression and feature selection: an imaging genetics study of the
ADNI cohort. Bioinformatics 28(2), 229–237 (2012)
15. Yan, J., Du, L., Kim, S., et al.: Transcriptome-guided amyloid imaging genetic
analysis via a novel structured sparse learning algorithm. Bioinformatics 30(17),
i564–i571 (2014)
16. Zhang, C., Qin, Y., Zhu, X., Zhang, J., Zhang, S.: Clustering-based missing value
imputation for data preprocessing. In: IEEE International Conference on Industrial
Informatics, pp. 1081–1086 (2006)
17. Zhu, X., Huang, Z., Shen, H.T., Cheng, J., Xu, C.: Dimensionality reduction by
mixed kernel canonical correlation analysis. Pattern Recogn. 45(8), 3003–3016
(2012)
18. Zhu, X., Li, X., Zhang, S.: Block-row sparse multiview multilabel learning for image
classification. IEEE Trans. Cybern. 46(2), 450–461 (2016)
19. Zhu, X., Suk, H.I., Lee, S.W., Shen, D.: Canonical feature selection for joint regres-
sion and multi-class identification in Alzheimer's disease diagnosis. Brain Imaging
Behav., 1–11 (2015)
20. Zhu, X., Suk, H., Shen, D.: A novel matrix-similarity based loss function for joint
regression and classification in AD diagnosis. NeuroImage 100, 91–105 (2014)
3D Ultrasonic Needle Tracking with a 1.5D
Transducer Array for Guidance of Fetal
Interventions
1 Introduction
Ultrasound (US) image guidance is of crucial importance during percutaneous
interventions in many clinical fields including fetal medicine, regional anesthesia,
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 353–361, 2016.
DOI: 10.1007/978-3-319-46720-7_41
354 W. Xia et al.
Fig. 1. The 3D ultrasonic needle tracking system, shown schematically (a). The track-
ing probe was driven by a commercial ultrasound (US) scanner; transmissions from
the probe were received by a fiber-optic hydrophone sensor at the needle tip. The
transducer elements in the probe (b) were arranged in four rows (A–D).
Fig. 2. The algorithm to estimate the needle tip position from the sensor data is
shown schematically (top). Representative data from all transducer elements obtained
before Golay decoding (1) and after (2), show improvements in SNR relative to bipolar
excitation (3). These three datasets are plotted on a linear scale as the absolute value
of their Hilbert transforms, normalized separately to their maximum values.
$$(\tilde{x}, \tilde{z}) = \underset{(x_i, z_j)}{\arg\min} \left\{ \frac{\sum_{k=1}^{4} \left[ t_m^{(k)} - t_s^{(k)}(x_i, z_j) \right]^2 w^{(k)}}{\sum_{k=1}^{4} \left[ w^{(k)} \right]^2} \right\} \qquad (1)$$
where the signal amplitudes at the coordinates (h(k) , v (k) ) were used as weighting
factors, w(k) , so that tracking images with higher signal amplitudes contributed
more prominently.
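Equation (1) can be sketched as a weighted grid search over candidate tip coordinates. The sketch below assumes a simplified 2D geometry in which each of the four transducer rows is treated as a point receiver at a hypothetical position and the speed of sound is 1500 m/s; `ROWS`, the grid ranges, and the weights are all illustrative, not the system's actual parameters.

```python
import math

C = 1500.0  # assumed speed of sound in water, m/s
# Hypothetical positions (in metres) of the four transducer rows A-D.
ROWS = [(-0.0015, 0.0), (-0.0005, 0.0), (0.0005, 0.0), (0.0015, 0.0)]

def model_time(x, z, row):
    """One-way time of flight from a source at (x, z) to a row element."""
    return math.hypot(x - row[0], z - row[1]) / C

def estimate_tip(t_meas, weights, xs, zs):
    """Grid search over candidate coordinates, minimizing the weighted
    squared time residuals normalized by the squared weights (cf. Eq. (1))."""
    best, best_cost = None, float("inf")
    denom = sum(w * w for w in weights)
    for x in xs:
        for z in zs:
            num = sum(w * (tm - model_time(x, z, row)) ** 2
                      for w, tm, row in zip(weights, t_meas, ROWS))
            cost = num / denom
            if cost < best_cost:
                best, best_cost = (x, z), cost
    return best

# Synthetic check: simulate arrival times for a tip at (0.002, 0.038) m and
# verify the grid search recovers that position.
true_tip = (0.002, 0.038)
times = [model_time(true_tip[0], true_tip[1], r) for r in ROWS]
grid_x = [i * 0.001 for i in range(-5, 6)]
grid_z = [0.030 + i * 0.001 for i in range(0, 11)]
tip = estimate_tip(times, [1.0, 0.9, 0.9, 1.0], grid_x, grid_z)
```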
The relative tracking accuracy of the system was evaluated with a water phan-
tom. The needle was fixed on a translation stage, with its shaft oriented to
simulate an out-of-plane insertion: it was positioned within an X-Z plane with
its tip approximately 38 mm in depth from the tracking probe, and angled at 45°
to the water surface normal (Fig. 3a). The tracking probe was translated relative
to the needle in the out-of-plane dimension, X. This translation was performed
across 20 mm, with a step size of 2 mm. At each position, FOH sensor data were
acquired for needle tip tracking.
Each needle tip position estimate was compared with a corresponding refer-
ence position. The relative tracking accuracy was defined as the absolute differ-
ence between these two quantities. The X component of the reference position
was obtained from the translation stage, centered relative to the probe axis. As
Y and Z were assumed to be constant during translation of the tracking probe,
the Y and Z components of the reference position were taken to be the mean
values of these components of the position estimates.
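The evaluation procedure above can be sketched as follows, with hypothetical tip estimates (not measured data from the paper); the X reference comes from the stage positions, while the Y and Z references are the means of the estimated components, as described.

```python
# Hypothetical tracked tip estimates (x, y, z) in mm at successive 2 mm
# translation-stage steps; the numbers are purely illustrative.
estimates = [(-10.1, 0.2, 38.1), (-8.0, -0.1, 37.9), (-5.9, 0.0, 38.0),
             (-4.1, 0.1, 38.2), (-2.0, -0.2, 37.8)]
stage_x = [-10.0, -8.0, -6.0, -4.0, -2.0]  # reference X from the stage

# Y and Z are assumed constant during translation, so their reference
# values are taken as the means of the estimated components.
ref_y = sum(e[1] for e in estimates) / len(estimates)
ref_z = sum(e[2] for e in estimates) / len(estimates)

# Relative accuracy: absolute difference between estimate and reference.
err_x = [abs(e[0] - rx) for e, rx in zip(estimates, stage_x)]
err_y = [abs(e[1] - ref_y) for e in estimates]
err_z = [abs(e[2] - ref_z) for e in estimates]
mean_err = tuple(sum(v) / len(v) for v in (err_x, err_y, err_z))
```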
Fig. 3. (a) Relative tracking accuracy measurements were performed with the needle
and the ultrasonic needle tracking (UNT) probe in water. (b) The signal-to-noise ratios
(SNRs) of the tracking images were consistently higher for Golay-coded transmissions
than for bipolar transmissions, and they increased with proximity to the center of the
probe (X = 0). The error bars in (b) represent standard deviations calculated from the
four tracking images. (c) Estimated relative tracking accuracies for Golay-coded trans-
missions along orthogonal axes; error bars represent standard deviations calculated
from all needle tip positions.
358 W. Xia et al.
when the needle was approximately centered relative to the probe axis (X ∼ 0).
With Golay-coded excitation, they increased by factors of 7.3 to 8.5 (Fig. 3b).
The increases were broadly consistent with those anticipated: the temporal
averaging provided by a pair of 32-bit Golay codes results in an SNR improvement
of √(32 × 2) = 8. In water, the mean relative tracking accuracy depended on the
spatial dimension: 0.32 mm, 0.31 mm, and 0.084 mm in X, Y, and Z, respectively
(Fig. 3c). By comparison, these values are smaller than the inner diameter of 22 G
needles that are widely used in percutaneous procedures. They are also smaller
than recently reported EM tracking errors of 2 ± 1 mm [19]. The Z component of
the mean relative tracking accuracy is particularly striking; it is smaller than the
ultrasound wavelength at 9 MHz. This result reflects a high level of consistency
in the tracked position estimates.
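The SNR gain rests on the defining property of complementary (Golay) code pairs: their autocorrelations sum to a delta, so coded transmissions can be compressed without range sidelobes. The sketch below builds a 32-bit pair with the standard doubling construction (not necessarily the exact codes emitted by the scanner used here) and verifies the property.

```python
def golay_pair(n_bits):
    """Build a complementary (Golay) pair of length n_bits (a power of 2)
    by the standard doubling construction: (a, b) -> (a + b, a + negated b)."""
    a, b = [1], [1]
    while len(a) < n_bits:
        a, b = a + b, a + [-x for x in b]
    return a, b

def autocorr(seq, lag):
    """Aperiodic autocorrelation of a bipolar sequence at a given lag."""
    return sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))

a, b = golay_pair(32)
# Complementary property: the pair's autocorrelations sum to a delta of
# height 2N at lag 0 and cancel exactly at every other lag.
sidelobes = [autocorr(a, k) + autocorr(b, k) for k in range(1, 32)]
peak = autocorr(a, 0) + autocorr(b, 0)  # 2 * 32 = 64
```

The peak of 64 over a single-bit reference is the 8-fold amplitude gain quoted in the text.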
With the pregnant sheep model in vivo, in which clinically realistic ultrasound
attenuation was present, the SNR values were sufficiently high for obtaining
tracking estimates. As compared with conventional bipolar excitation, the SNR
was increased with Golay-coded excitation. In the former case, the SNR values
were in the range of 2.1 to 3.0; coding increased this range by factors of 5.3 to
6.2 (Fig. 4b). From the tracked position estimates, a needle insertion angle of
49° and a maximum needle tip depth of 31 mm were calculated.
We presented, for the first time, a 3D ultrasonic tracking system based on
a 1.5D transducer array and a fiber-optic ultrasound sensor. A primary advan-
tage of this system is its compatibility with existing US imaging scanners, which
could facilitate clinical translation. There are several ways in which the track-
ing system developed in this study could be improved. For future iterations,
imaging array elements and a corresponding cylindrical acoustic lens could be
included to enable simultaneous 3D tracking and 2D US imaging. The SNR
could be improved by increasing the sensitivity of the FOH sensor, which could
be achieved with a Fabry-Pérot interferometer cavity that has a curved distal
surface to achieve a high finesse [20]. Additional increases in the SNR could be
obtained with larger code lengths that were beyond the limits of the particular
ultrasound scanner used in this study. The results of this study demonstrate
that 3D ultrasonic needle tracking with a 1.5D array of transducer elements and
a FOH sensor is feasible in clinically realistic environments and that it provides
highly consistent results. When integrated into an ultrasound imaging probe
that includes a linear array for acquiring 2D ultrasound images, this method
has strong potential to reduce the risk of complications and decrease procedure
times.
References
1. Daffos, F., et al.: Fetal blood sampling during pregnancy with use of a needle
guided by ultrasound: a study of 606 consecutive cases. Am. J. Obstet. Gynecol.
153(6), 655–660 (1985)
2. Agarwal, K., et al.: Pregnancy loss after chorionic villus sampling and genetic
amniocentesis in twin pregnancies: a systematic review. Ultrasound Obstet.
Gynecol. 40(2), 128–134 (2012)
3. Hebard, S., et al.: Echogenic technology can improve needle visibility during
ultrasound-guided regional anesthesia. Reg. Anesth. Pain Med. 36(2), 185–189
(2011)
4. Klein, S.M., et al.: Piezoelectric vibrating needle and catheter for enhancing
ultrasound-guided peripheral nerve blocks. Anesth. Analg. 105, 1858–1860 (2007)
5. Rotemberg, V., et al.: Acoustic radiation force impulse (ARFI) imaging-based nee-
dle visualization. Ultrason. Imaging 33(1), 1–16 (2011)
6. Fronheiser, M.P., et al.: Vibrating interventional device detection using real-time
3-D color doppler. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 55(6), 1355–
1362 (2008)
7. Xia, W., et al.: Performance characteristics of an interventional multispectral pho-
toacoustic imaging system for guiding minimally invasive procedures. J. Biomed.
Opt. 20(8), 086005 (2015)
8. Poulin, F.: Interference during the use of an electromagnetic tracking system under
OR conditions. J. Biomech. 35, 733–737 (2002)
9. Guo, X., et al.: Photoacoustic active ultrasound element for catheter tracking. In:
Proceedings of SPIE, vol. 8943, p. 89435M (2014)
10. Xia, W., et al.: Interventional photoacoustic imaging of the human placenta with
ultrasonic tracking for minimally invasive fetal surgeries. In: Navab, N., Hornegger,
J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 371–378.
Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9_46
11. Xia, W., et al.: In-plane ultrasonic needle tracking using a fiber-optic hydrophone.
Med. Phys. 42(10), 5983–5991 (2015)
12. Xia, W., et al.: Coded excitation ultrasonic needle tracking: an in vivo study. Med.
Phys. 43(7), 4065–4073 (2016)
13. Nikolov, S.I.: Precision of needle tip localization using a receiver in the needle.
In: IEEE International Ultrasonics Symposium Proceedings, Beijing, pp. 479–482
(2008)
14. Mung, J., et al.: A non-disruptive technology for robust 3D tool tracking for
ultrasound-guided interventions. In: Fichtinger, G., Martel, A., Peters, T. (eds.)
MICCAI 2011. LNCS, vol. 6891, pp. 153–160. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23623-5_20
15. Mung, J.: Ultrasonically marked instruments for ultrasound-guided interventions.
In: IEEE Ultrasonics Symposium (IUS), pp. 2053–2056 (2013)
16. Morris, P., et al.: A Fabry-Pérot fiber-optic ultrasonic hydrophone for the simul-
taneous measurement of temperature and acoustic pressure. J. Acoust. Soc. Am.
125(6), 3611–3622 (2009)
17. Budisin, S.Z., et al.: New complementary pairs of sequences. Electron. Lett. 26(13),
881–883 (1990)
18. David, A.L., et al.: Recombinant adeno-associated virus-mediated in utero gene
transfer gives therapeutic transgene expression in the sheep. Hum. Gene Ther. 22,
419–426 (2011)
19. Boutaleb, S., et al.: Performance and suitability assessment of a real-time 3D elec-
tromagnetic needle tracking system for interstitial brachytherapy. J. Contemp.
Brachyther. 7(4), 280–289 (2015)
20. Zhang, E.Z., Beard, P.C.: A miniature all-optical photoacoustic imaging probe. In:
Proceedings of SPIE, p. 78991F (2011). http://proceedings.spiedigitallibrary.org/
proceeding.aspx?articleid=1349009
Enhancement of Needle Tip and Shaft from 2D
Ultrasound Using Signal Transmission Maps
1 Introduction
Ultrasound (US) is a popular image-guidance tool used to facilitate real-time
needle visualization in interventional procedures such as fine needle and core
tissue biopsies, catheter placement, drainages, and anesthesia. During such pro-
cedures, it is important that the needle precisely reaches the target with mini-
mum attempts. Unfortunately, successful visualization of the needle in US-based
procedures is greatly affected by the orientation of the needle to the US beam
and is inferior for procedures involving steep needle insertion angles. The visu-
alization becomes especially problematic for curvilinear transducers since only a
small portion or none of the needle gives strong reflection.
There is a wealth of literature on improving needle visibility and detection
in US. A sampling of the literature is provided here to illustrate the wide range of
approaches. External tracking technologies are available to track the needle [1],
but this requires custom needles and changes to the clinical work-flow. Hough
Transform [2,3], Random Sample Consensus (RANSAC) [4,5] and projection
based methods [6,7] were proposed for needle localization. In most of the previ-
ous approaches, assumptions were made for the appearance of the needle in US
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 362–369, 2016.
DOI: 10.1007/978-3-319-46720-7_42
images such as the needle having the longest and straightest line feature with
high intensity. Recently, Hacihaliloglu et al. [6] combined local phase-based image
projections with spatially distributed needle trajectory statistics and achieved
an error of 0.43 ± 0.31 mm for tip localization. Although the method is suitable
in instances where the shaft is discontinuous, it fails when a priori information
on shaft orientation is limited and when the tip does not appear as a charac-
teristic high-intensity feature along the needle trajectory. Regarding shaft visibility,
approaches based on beam steering [8], using linear transducers, or mechani-
cally introduced vibration [9] have been proposed. The success of beam steering
depends on the angle values used during the procedure. Furthermore, only a
portion of the needle is enhanced with curvilinear arrays so the tip is still indis-
tinguishable. Vibration-based approaches sometimes require external mechanical
devices, increasing the overall complexity of the system.
In view of the above-mentioned limitations, there is a need to develop methods
that perform both needle shaft and tip enhancement for improved localization
and guidance without changing the clinical work-flow or increasing the overall
complexity of the system. The proposed method is specifically useful for pro-
cedures, such as lumbar blocks, where needle shaft visibility is poor and the
tip does not have a characteristic high-intensity appearance. We introduce an
efficient L1 -norm based contextual regularization that enables us to incorporate
a filter bank into the image enhancement method by taking into account US
specific signal propagation constraints. Our main novelty is incorporation of US
signal modeling, for needle imaging, into an optimization problem to estimate
the unknown signal transmission map which is used for enhancement of needle
shaft and tip. Qualitative and quantitative validation results on scans collected
from porcine, bovine, kidney and liver tissue samples are presented. Compar-
ison results against previous state of the art [6], for tip localizations, are also
provided.
2 Methods
The proposed framework is based on the assumptions that the needle insertion
side (left or right) is known a priori, the needle is inserted in plane, and the
shaft close to the transducer surface is visible. Explicitly, we are interested in the
enhancement of needle images obtained from 2D curvilinear transducers.
Modeling of US signal transmission has been one of the main topics of research in
US-guided procedures [10]. The interaction of the US signal within tissue can
be characterized into two main categories, namely scattering and attenuation.
Since the backscattered US signal from the needle interface to the transducer
is modulated by these two interactions, they can be viewed as mechanisms of
structural information coding. Based on this, we develop a
364 C. Mwikirize et al.
model, called the US signal transmission map, for recovering the pertinent needle
structure from US images. The US signal transmission map maximizes the visi-
bility of high-intensity features inside a local region and satisfies the constraint
that the mean intensity of the local region is less than the echogenicity of the
tissue confining the needle. To achieve this, we propose the following linear
interpolation model, which combines scattering and attenuation effects in the
tissue: US(x, y) = US_A(x, y) US_E(x, y) + (1 − US_A(x, y))α. Here, US(x, y)
is the B-mode US image, US_A(x, y) is the signal transmission map, US_E(x, y) is
the enhanced needle image, and α is a constant representative of the echogenic-
ity of the tissue surrounding the needle. Our aim is the extraction of US_E(x, y),
which is obtained by estimating the signal transmission map US_A(x, y).
To calculate US_A(x, y), we make use of the well-known Beer-Lambert
law, US_T(x, y) = US_0(x, y) exp(−η d(x, y)), which models attenuation as a
function of depth. Here, US_T(x, y) is the attenuated intensity image, US_0 is the
initial intensity image, η is the attenuation coefficient, and d(x, y) is the distance
from the source/transducer. US_T(x, y) is modeled as a patch-wise transmission
function modulated by the attenuation and the orientation of the needle, as
explained in the next section. Once US_T(x, y) is obtained, US_A(x, y) is estimated
by minimizing the following objective function [11]:
$$\frac{\lambda}{2}\,\big\| US_A(x, y) - US_T(x, y) \big\|_2^2 + \sum_{j \in \omega} \big\| W_j \circ \big( D_j\, US_A(x, y) \big) \big\|_1 \qquad (1)$$
where E_H, E_V, and E_D are the horizontal, vertical, and diagonal edges on the
graph, and c_i = US(x, y)_i exp(−η ρ(x, y)_i). Here, US(x, y)_i is the image intensity
at node i and ρ(x, y)_i is the normalized closest distance from the node to the
nearest virtual transducer [10]. The attenuation coefficient η is inherently inte-
grated into the weighting function, γ is used to model the beam width, and β = 90
is an algorithmic constant. US_T(x, y) is obtained by taking the complement of
the confidence map US_CM(x, y), i.e., US_T(x, y) = US_CM(x, y)^C. Since we expect
the signal transmission map US_A(x, y) to display higher intensity with increasing
depth, the complement of the confidence map provides the ideal patch-wise
transmission map US_T(x, y) for minimizing the objective function. The result of needle tip
tip enhancement is shown in Fig. 1. Investigating Fig. 1(b), we can see that the
calculated transmission map U SA (x, y), using Eq. (1), has low intensity values
close to the transducer surface (shallow image regions) and high intensity fea-
tures in the regions away from the transducer (deep image regions). Furthermore,
it provides a smooth attenuation density estimate for the US image formation
model. Finally, the mean intensity of the local region in the estimated signal
transmission map is less than the echogenicity of the tissue confining the needle.
This translates into the enhanced image, where the tip will be represented by a
local average of the surrounding points, thus giving a uniform intensity region
with a high intensity feature belonging to the needle tip in the enhanced image
U SE (x, y).
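Once the transmission map is estimated, the linear interpolation model can be inverted pixel-wise for the enhanced image. A minimal sketch, with hypothetical intensities, an assumed α, and an `eps` guard that is an implementation choice rather than something specified in the paper:

```python
def enhance(us, us_a, alpha, eps=1e-3):
    """Invert the linear model US = US_A * US_E + (1 - US_A) * alpha for the
    enhanced image US_E, pixel-wise. `eps` guards against division by zero
    where the estimated transmission is very small (an assumption here)."""
    return [[(p - (1.0 - a) * alpha) / max(a, eps)
             for p, a in zip(row_p, row_a)]
            for row_p, row_a in zip(us, us_a)]

# Hypothetical 2x3 B-mode patch (intensities in [0, 1]) and transmission map;
# the last column mimics a bright needle-tip region at depth.
us    = [[0.30, 0.32, 0.90], [0.28, 0.31, 0.95]]
us_a  = [[0.50, 0.50, 0.80], [0.50, 0.50, 0.80]]
alpha = 0.30  # assumed echogenicity of tissue surrounding the needle
us_e = enhance(us, us_a, alpha)
```

Background pixels map back to roughly the tissue level while the bright tip-like pixels stand out, matching the uniform-background-plus-conspicuous-tip behavior described above.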
Needle Tip Localization: The first step in tip localization is the enhancement
of the needle shaft appearing close to the transducer surface (Fig. 1(c) top right)
in the enhanced US image US_E(x, y). This is achieved by constructing a phase-
based image descriptor, called phase symmetry (PS), using a 2D Log-Gabor
filter whose function is defined as: LG(ω, θ) = exp(−log(ω/κ)²/(2 log(σ_ω)²)) ·
exp(−(θ − θ_m)²/(2(σ_θ)²)).
Here, ω is the filter frequency while θ is its orientation, κ is the center frequency,
Here, ω is the filter frequency while θ is its orientation, k is the center frequency,
σω is the bandwidth on the frequency spectrum, σθ is the angular bandwidth
and θm is the filter orientation. These filter parameters are selected automatically
using the framework proposed in [6]. An initial needle trajectory is calculated by
using the peak value of the Radon Transformed PS image. This initial trajectory
is further optimized using Maximum Likelihood Estimation Sample Consensus
Fig. 1. Needle tip enhancement by the proposed method. (a) B-mode US image showing
the inserted needle at an angle of 45°. The needle tip has low contrast to the surrounding
tissue and the needle shaft is discontinuous. (b) The derived optimal signal transmission
map function U SA (x, y). The map gives an estimation of the signal density in the US
image, and thus displays higher intensity values in more attenuated and scattered
regions towards the bottom of the image (c) Result of needle tip enhancement. The
red arrow points to the conspicuous tip along the trajectory path.
(MLESAC) [6] algorithm for outlier rejection and geometric optimization for
connecting the extracted inliers [6]. The image intensities at this stage, lying
along a line L in a point cloud, are distributed into a set of line segments, each
defined by a set of points or knots denoted t_1, ..., t_n. The needle tip is extracted
using:

$$US_{needle}(US_B, L(t)) = \frac{\int_{t_i}^{t_{i+1}} US_B(L(t))\, dt}{\left\| L(t_{i+1}) - L(t_i) \right\|_2}, \quad t \in [t_i, t_{i+1}]. \qquad (3)$$
Here, US_B is the result of band-pass filtering the tip-enhanced US image,
while t_i and t_{i+1} are successive knots. US_needle consists of averaged pixel inten-
sities, and the needle tip is localized as the farthest maximum-intensity pixel of
US_needle at the distal end of the needle trajectory. One advantage of using the
tip enhanced image for localization instead of the original image is minimization
of interference from soft tissue. In the method of [6], if a high intensity region
other than that emanating from the tip were encountered along the trajectory
beyond the needle region, the likelihood of inaccurate localization was high. In
our case, the enhanced tip has a conspicuously higher intensity than soft tissue
interfaces and other interfering artifacts (Fig. 1(c)).
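Equation (3) can be sketched with a discrete stand-in for the integral: intensities sampled along the fitted line are averaged per knot interval, and the tip is taken at the most distal interval with the maximum average. The samples and knots below are hypothetical.

```python
def localize_tip(intensities, knots):
    """Average band-passed intensities over each knot interval [t_i, t_{i+1})
    (a discrete stand-in for the integral in Eq. (3)), then return the farthest
    interval attaining the maximum average, i.e., the tip segment."""
    means = []
    for ti, tj in zip(knots, knots[1:]):
        seg = intensities[ti:tj]
        means.append(sum(seg) / len(seg))
    best = max(means)
    # farthest (most distal) segment with the maximum averaged intensity
    tip_seg = max(i for i, m in enumerate(means) if m == best)
    return tip_seg, means

# Hypothetical samples along the trajectory: bright shaft, a gap where the
# shaft is discontinuous, then a bright tip, then soft tissue beyond it.
samples = [0.8, 0.8, 0.7, 0.1, 0.1, 0.1, 0.9, 0.9, 0.2, 0.1]
knots = [0, 3, 6, 8, 10]   # four knot intervals
tip_seg, means = localize_tip(samples, knots)
```

Averaging per interval suppresses isolated bright speckle, which is why the tip-enhanced image is more robust to soft-tissue interference than raw intensities.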
Shaft Enhancement: For shaft enhancement (Fig. 2), we use the regulariza-
tion framework explained above. However, with reference to Eq. (1), since
our objective is now to enhance the shaft, we construct a new patch-wise transmis-
sion function US_T(x, y) using the trajectory and tip information calculated in the
needle tip localization step. Specifically, we model the patch-wise trans-
mission function as US_T(x, y) = US_DM(x, y), the Euclidean
distance map of the trajectory-constrained region. Knowledge of the initial tra-
jectory, from the previous section, enables us to model an extended region which
includes the entire trajectory of the needle. Incorporating the needle tip loca-
tion calculated in the previous step, we limit this region to the trajectory depth
Fig. 2. The process of needle shaft enhancement (a) B-mode US image. (b) Trajectory
constrained region obtained from local phase information, indicated by the red line.
Line thickness can be adjusted to suit different needle diameters and bending insertions.
(c) The optimal signal transmission function U SA (x, y) for needle shaft enhancement.
(d) Result of shaft enhancement. Enhancement does not take place for features along
the trajectory that may lie beyond the needle tip.
so as to minimize enhancement of soft tissue interfaces beyond the tip (Fig. 2(c)).
Inspecting Fig. 2(c), we can see that the signal transmission map calculated
for the needle shaft has low density values for the local regions confining the
needle shaft and high density values for local regions away from the estimated
needle trajectory. The difference from the signal transmission map used for tip
enhancement is that the map for shaft enhancement is limited to the geometry
of the needle. This translates into the enhanced needle shaft image, where the
shaft is represented by a local average of the surrounding points, giving a
uniform intensity region with a high-intensity feature belonging to the needle
shaft in the enhanced image US_E(x, y).
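The trajectory-constrained distance map US_DM(x, y) can be sketched as the point-to-segment distance from each pixel to the needle trajectory, truncated at the estimated tip so that features beyond the tip are not enhanced. The geometry below is hypothetical.

```python
import math

def dist_to_segment(px, py, ax, ay, bx, by):
    """Euclidean distance from pixel (px, py) to the segment (a, b), i.e.,
    the needle trajectory limited at the tip, so that features beyond the
    tip keep a large distance value and stay unenhanced."""
    vx, vy = bx - ax, by - ay
    t = ((px - ax) * vx + (py - ay) * vy) / (vx * vx + vy * vy)
    t = max(0.0, min(1.0, t))  # clamp to the segment (entry point .. tip)
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))

# Hypothetical trajectory: entry at (0, 0), estimated tip at (40, 40) pixels.
entry, tip = (0.0, 0.0), (40.0, 40.0)
w, h = 60, 60
dist_map = [[dist_to_segment(x, y, entry[0], entry[1], tip[0], tip[1])
             for x in range(w)] for y in range(h)]
# Low values confine the shaft; values grow with distance from the
# trajectory and beyond the tip, matching the behavior of US_DM(x, y).
on_shaft = dist_map[20][20]    # a point on the trajectory
beyond_tip = dist_map[50][50]  # past the tip along the same line
```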
3 Experimental Results
Qualitative and quantitative results obtained from the proposed method are pro-
vided in Fig. 3. It is observed that the proposed method gives clear image detail
for the tip and shaft, even in instances where shaft information is barely visible
(Fig. 3(a) middle column). Using the method proposed in [6], incorrect tip local-
ization arises from soft tissue interface which manifests a higher intensity than
the tip along the needle trajectory in the B mode image (Fig. 3(a) right column).
In the proposed method, the tip is enhanced but the soft tissue interface is not,
thus improving localization, as shown in Fig. 3(b). Tip and shaft enhancement
take 0.6 s and 0.49 s, respectively, for a 370 × 370 2D image.
Figure 3(b) shows a summary of the quantitative results. The overall localization
error from the proposed method was 0.3 ± 0.06 mm while that from [6] under
an error cap of 2 mm (73 % of the dataset had an error of less than 2 mm) was
0.53 ± 0.07 mm.
Fig. 3. Qualitative and quantitative results for the proposed method. (a) Left column: B-
mode US images for bovine and porcine tissue respectively. Middle column: Respective
localized tip, red dot, overlaid on shaft enhanced image. Right column: Tip localization
results, red dot, from the method of Hacihaliloglu et al. [6]. (b) Quantitative analysis
of needle tip localization for bovine, porcine, liver and kidney tissue. Top: Proposed
method. Bottom: Using the method of Hacihaliloglu et al. [6]. For the method in [6],
only 73% of the localization results had an error under 2 mm and were used during
validation. Overall, the new method improves tip localization.
References
1. Hakime, A., Deschamps, F., De Carvalho, E.G., Barah, A., Auperin, A.,
De Baere, T.: Electromagnetic-tracked biopsy under ultrasound guidance: prelim-
inary results. Cardiovasc. Intervent. Radiol. 35(4), 898–905 (2012)
2. Zhou, H., Qiu, W., Ding, M., Zhang, S.: Automatic needle segmentation in 3D
ultrasound images using 3D improved Hough transform. In: SPIE Medical Imag-
ing, vol. 6918, pp. 691821-1–691821-9 (2008)
3. Elif, A., Jaydev, P.: Optical flow-based tracking of needles and needle-tip localiza-
tion using circular hough transform in ultrasound images. Ann. Biomed. Eng.
43(8), 1828–1840 (2015)
4. Uhercik, M., Kybic, J., Liebgott, H., Cachard, C.: Model fitting using RANSAC for
surgical tool localization in 3D ultrasound images. IEEE Trans. Biomed. Eng.
57(8), 1907–1916 (2010)
5. Zhao, Y., Cachard, C., Liebgott, H.: Automatic needle detection and tracking in 3D
ultrasound using an ROI-based RANSAC and Kalman method. Ultrason. Imaging
35(4), 283–306 (2013)
6. Hacihaliloglu, I., Beigi, P., Ng, G., Rohling, R.N., Salcudean, S., Abolmaesumi, P.:
Projection-based phase features for localization of a needle tip in 2D curvilinear
ultrasound. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI
2015. LNCS, vol. 9349, pp. 347–354. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9_43
7. Wu, Q., Yuchi, M., Ding, M.: Phase grouping-based needle segmentation in 3-D
trans-rectal ultrasound-guided prostate trans-perineal therapy. Ultrasound Med.
Biol. 40(4), 804–816 (2014)
8. Hatt, C.R., Ng, G., Parthasarathy, V.: Enhanced needle localization in ultrasound
using beam steering and learning-based segmentation. Comput. Med. Imaging
Graph. 14, 45–54 (2015)
9. Harmat, A., Rohling, R.N., Salcudean, S.: Needle tip localization using stylet vibra-
tion. Ultrasound Med. Biol. 32(9), 1339–1348 (2006)
10. Karamalis, A., Wein, W., Klein, T., Navab, N.: Ultrasound confidence maps using
random walks. Med. Image Anal. 16(6), 1101–1112 (2012)
11. Meng, G., Wang, Y., Duan, J., Xiang, S., Pan, C.: Efficient image dehazing with
boundary constraint and contextual regularization. In: IEEE International Con-
ference on Computer Vision, pp. 617–624 (2013)
Plane Assist: The Influence of Haptics
on Ultrasound-Based Needle Guidance
1 Introduction
The use of ultrasound for interventional guidance has expanded significantly
over the past decade. With research showing that ultrasound guidance improves
patient outcomes in procedures such as central vein catheterizations and periph-
eral nerve blocks [3,7], the relevant professional certification organizations began
recommending ultrasound guidance as the gold standard of care, e.g. [1,2]. Some
ultrasound training is now provided in medical school, but often solely involves
the visualization and identification of anatomical structures – a very necessary
skill, but not the only one required [11].
Simultaneous visualization of targets and instruments (usually needles) with
a single 2D probe is a significant challenge. The difficulty of maintaining align-
ment (between probe, instrument, and target) is a major reason for extended
intervention duration [4]. Furthermore, if target or needle visualization is lost
due to probe slippage or tipping, the user has no direct feedback to find them
again. Prior work has shown that bimanual tasks are difficult if the effects of
movements of both hands are not visible in the workspace; when there is a lack
of visual alignment, users must rely on their proprioception, which has an error
of up to 5 cm in position and 10◦ of orientation at the hands [9]. This is a
particular challenge for novice or infrequent ultrasound users, as this is on the
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 370–377, 2016.
DOI: 10.1007/978-3-319-46720-7_43
order of the range of unintended motion during ultrasound scans. Clinical accu-
racy limits (e.g., for deep biopsies of lesions) are greater than 10 mm in diameter.
With US beam thickness at depth easily exceeding 2 cm, correct continu-
ous target/needle visualization with a steady probe position is a critical challenge:
deviations of less than 10 mm practically cannot be confirmed by US alone. One
study [13] found that the second most common error of anesthesiology novices
during needle block placement (occurring in 27 % of cases) was unintentional
probe movement.
One solution to this problem is to pro-
vide corrective guidance to the user. Prior
work in haptic guidance used vibrotactile dis-
plays effectively in tasks where visual load
is high [12]. The guiding vibrations can free
up cognitive resources for more critical task
aspects. Combined visual and haptic feed-
back has been shown to decrease error [10]
and reaction time [16] over visual feedback
alone, and has been shown to be most effec-
tive in tasks with a high cognitive load [6].
Handheld ultrasound scanning systems with visual guidance or actuated feedback do exist [8], but are either limited to just initial visual positioning guidance when using camera-based local tracking [15], or offer active position feedback only for a small range of motion and require external tracking [5].

Fig. 1. Guidance system used in this study (Clear Guide ONE), including a computer and handheld ultrasound probe with mounted cameras, connected to a standard ultrasound system.
To improve this situation, we propose a
method for intuitive, always-available, direct probe guidance relative to a clinical
target, with no change to standard workflows. The innovation we describe here
is Plane Assist: ungrounded haptic (tactile) feedback signaling which direction
the user should move to bring the ultrasound imaging plane into alignment with
the target. Ergonomically, such feedback helps to avoid information overload
while allowing for full situational awareness, making it particularly useful for
less experienced operators.
Image guidance provides the user with information that helps align instruments, targets, and possibly imaging probes, to facilitate successful instrument handling relative to anatomical targets. This guidance information can be provided visually, haptically, or auditorily. In this study we consider visual guidance, haptic guidance, and their combination, for ultrasound-based interventions.
372 H. Culbertson et al.
For visual guidance, we use a Clear Guide ONE (Clear Guide Medical, Inc.,
Baltimore MD; Fig. 1), which adds instrument guidance capabilities to regular
ultrasound machines for needle-based interventions. Instrument and ultrasound
probe tracking is based on computer vision, using wide-spectrum stereo cameras
mounted on a standard clinical ultrasound transducer [14]. Instrument guidance
is displayed as a dynamic overlay on live ultrasound imaging.
Fiducial markers are attached to the patient skin in the cameras’ field of
view to permit dynamic target tracking. The operator defines a target initially by
tapping on the live ultrasound image. If the cameras observe a marker during this
target definition, further visual tracking of that marker allows continuous 6-DoF
localization of the probe. This target tracking enhances the operator’s ability
to maintain probe alignment with a chosen target. During the intervention, as
(inadvertent) deviations from this reference pose relative to the target – or vice
versa in the case of actual anatomical target motion – are tracked, guidance to
the target is indicated through audio and on-screen visual cues (needle lines,
moving target circles, and targeting crosshairs; Fig. 1).
From an initial target pose ^{US}P in the ultrasound (US) coordinate frame and the camera/ultrasound calibration transformation matrix ^{C}T_{US}, one determines the pose of the target in the original camera frame:

    ^{C}P = ^{C}T_{US} ^{US}P .    (1)

In a subsequent frame, where the same marker is observed in the new camera coordinate frame (C,t), one finds the transformation between the two camera frames (^{C,t}T_{C}) by simple rigid registration of the two marker corner point sets. Now the target is found in the new ultrasound frame (US,t):

    ^{US,t}P = ^{US,t}T_{C,t} ^{C,t}T_{C} ^{C}P .    (2)

Noting that the ultrasound and camera frames are fixed relative to each other (^{US,t}T_{C,t} = ^{US}T_{C}), and expanding, we get the continuously updated target positions in the ultrasound frame:

    ^{US,t}P = (^{C}T_{US})^{-1} ^{C,t}T_{C} ^{C}T_{US} ^{US}P .    (3)
This information can be used for both visual and haptic (see below) feedback.
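The chain of transformations in Eqs. (1)–(3) can be sketched in a few lines of Python. The rigid registration of the two marker corner point sets is shown here as a standard Kabsch fit; all matrices are illustrative placeholders, not the vendor's implementation:

```python
import numpy as np

def rigid_registration(src, dst):
    """Kabsch fit: rigid 4x4 transform mapping src points onto dst points.
    Used here to estimate the inter-frame camera motion (C,t)T_C from the
    marker corners observed in both camera frames."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = dst_c - R @ src_c
    return T

def update_target(P_us, T_c_us, T_ct_c):
    """Eq. (3): continuously updated target position in the US frame."""
    P = np.append(P_us, 1.0)                     # homogeneous coordinates
    return (np.linalg.inv(T_c_us) @ T_ct_c @ T_c_us @ P)[:3]
```

As a sanity check, when the camera has not moved (the inter-frame transform is the identity), the target position in the ultrasound frame is unchanged.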
To add haptic cues to this system, two C-2 tactors (Engineering Acoustics, Inc.,
Casselberry, FL) were embedded in a silicone band that was attached to the
ultrasound probe, as shown in Fig. 2. Each tactor is 3 cm wide, 0.8 cm tall, and
has a mass of 17 g. The haptic feedback band adds 65 g of mass and 2.5 cm of
thickness to the ultrasound probe. The tactors were located on the probe sides to
provide feedback to correct unintentional probe tilting. Although other degrees
of freedom (i.e. probe translation) will also result in misalignment between the
US plane and target, we focus this initial implementation on tilting because our
pilot study showed that tilting is one of the largest contributors to error between
US plane and target.
Haptic feedback is provided to the user if the target location is farther than 2 mm away from the ultrasound plane. This ±2 mm deadband thus corresponds to different amounts of probe tilt for different target depths. The tactor on the side
corresponding to the direction of tilt is vibrated with an amplitude proportional
to the amount of deviation.
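This deadband-plus-proportional control law can be written down directly; in the sketch below the gain and saturation values are our own illustrative choices, not the values used in the study:

```python
def tactor_command(deviation_mm, deadband_mm=2.0, gain=0.25, max_amp=1.0):
    """Map a signed out-of-plane target deviation to (left, right) vibration
    amplitudes in [0, 1]. Inside the +/-2 mm deadband both tactors are off;
    outside it, the tactor on the side of the tilt vibrates with an amplitude
    proportional to the excess deviation, clipped at max_amp."""
    excess = abs(deviation_mm) - deadband_mm
    if excess <= 0:
        return (0.0, 0.0)
    amp = min(gain * excess, max_amp)
    return (amp, 0.0) if deviation_mm < 0 else (0.0, amp)
```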
3 Experimental Methods
We performed a user study to test the effectiveness of haptic feedback in reducing
unintended probe motion during a needle insertion task. All procedures were
approved by the Stanford University Institutional Review Board. Eight right-
handed novice non-medical students were recruited for the study (five male,
three female, 22–43 years old). Novice subjects were used as an approximate
representation of medical residents’ skills to evaluate the effect of both visual
and haptic feedback on the performance of inexperienced users and to assess
the efficacy of this system for use in training. (Other studies indicate that the
system shows the greatest benefit with non-expert operators.)
Fig. 2. (a) Ultrasound probe, augmented with cameras for visual tracking of probe and
needle, and a tactor band for providing haptic feedback. (b) Participant performing
needle insertion trial into a gelatin phantom using visual needle and target guidance
on the screen, and haptic target guidance through the tactor band.
During each trial, the system determines the current position and orientation
of the ultrasound probe, and calculates its deviation from the reference pose.
Once the current probe/target deviation is computed, the operator is informed of
required repositioning using two forms of feedback: (1) Standard visual feedback
(by means of graphic overlays on the live US stream shown on-screen) indicates
the current target location as estimated by visual tracking and the probe motion
necessary to re-visualize the target in the US view port. The needle guidance is
also displayed as blue/green lines on the live imaging stream. (2) Haptic feedback
is presented as vibration on either side of the probe to indicate the direction of
probe tilt from its reference pose. The participants were instructed to tilt the
probe away from the vibration to correct for the unintended motion.
Each participant completed four trials under each of four feedback conditions:
no feedback (standard US imaging with no additional guidance), visual feedback
only, both visual and haptic feedback, and haptic feedback only. The conditions
and target locations were randomized and distributed across all sixteen trials to
mitigate learning effects and differences in difficulty between target locations.
Participants received each feedback and target location pair once.
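The fully crossed, randomized schedule described above (four feedback conditions × four target locations, each pair seen exactly once) amounts to shuffling the full factorial; a minimal sketch, where the condition names and the seed are illustrative:

```python
import itertools
import random

def trial_schedule(seed=0):
    """Generate the sixteen (condition, target) trials in randomized order,
    so each feedback/target pair occurs exactly once per participant."""
    conditions = ["none", "visual", "visual+haptic", "haptic"]
    targets = [0, 1, 2, 3]                                  # target locations
    trials = list(itertools.product(conditions, targets))   # 16 unique pairs
    random.Random(seed).shuffle(trials)                     # mitigate learning effects
    return trials
```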
4 Results
In our analysis, we define the amount of probe deviation as the perpendicular
distance between the ultrasound plane and the target location at the depth of
the target. In the no-feedback condition, participants had an uncorrected probe
deviation larger than 2 mm for longer than half of the trial time in 40 % of
the trials. This deviation caused these trials to be failures as the needle did
not hit the original 3D target location. This poor performance highlights the
prevalence of unintended probe motion and the need for providing feedback to
guide the user. We focus the remainder of our analysis on the comparison of the
effectiveness of the visual and haptic feedback, and do not include the results
from the no-feedback condition in our statistical analysis.
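The probe-deviation metric defined above, the perpendicular distance between the ultrasound plane and the target at the target's depth, is plain point-to-plane geometry. A sketch, under the assumption that the plane is given by a point on it and a normal vector:

```python
import numpy as np

def probe_deviation(target, plane_point, plane_normal):
    """Perpendicular distance (mm) from the 3D target to the US image plane."""
    n = np.asarray(plane_normal, float)
    n = n / np.linalg.norm(n)                       # unit normal
    return abs(np.dot(np.asarray(target, float) - plane_point, n))
```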
[Fig. 3 graphic: bar plots of (a) probe deviation (mm) and (b) correction time (s), for the conditions Vision On/Haptics Off, Vision On/Haptics On, and Vision Off/Haptics On.]
Fig. 3. (a) Probe deviation, and (b) time to correct probe deviation, averaged across
each trial. Statistically significant differences in probe deviation and correction time
marked (*** ≡ p ≤ 0.001, ** ≡ p ≤ 0.01, * ≡ p ≤ 0.05).
factor (p < 0.0005). No significant difference was found between the average
probe deviations across participants or target locations (p > 0.4). A multiple-
comparison test between the three feedback conditions indicated that the average
probe correction time for the condition including visual feedback only (2.15 ±
2.40 s) was significantly greater than that for the conditions with both haptic and
visual feedback (0.61±0.36 s; p < 0.0005) and haptic feedback only (0.77±0.59 s;
p < 0.005). These results indicate that the addition of haptic feedback resulted
in less undesired motion of the probe and allowed participants to more quickly
correct any deviations.
Several participants indicated that the haptic feedback was especially bene-
ficial because of the high visual-cognitive load of the needle alignment portion of
the task. The participants were asked to rate the difficulty of the experimental
conditions on a five-point Likert scale. The difficulty ratings (Fig. 4) support
our other findings. The condition including both haptic and visual feedback was
rated as significantly easier (2.75±0.76) than the conditions with visual feedback
only (3.38 ± 0.92; p < 0.05) and haptic feedback only (3.5 ± 0.46; p < 0.01).
5 Conclusion
We described a method to add haptic feedback to a commercial, vision-based
navigation system for ultrasound-guided interventions. In addition to conven-
tional on-screen cues (target indicators, needle guides, etc.), two vibrating pads
on either side of a standard handheld transducer indicate deviations from the
plane containing a locked target. A user study was performed under simulated
conditions which highlight the central problems of clinical ultrasound imaging
– namely difficult visualization of intended targets, and distraction caused by
task focusing and information overload, both of which contribute to inadver-
tent target-alignment loss. Participants executed a dummy needle-targeting task,
while probe deviation from the target plane, reversion time to return to plane,
and perceived targeting difficulty were measured.
The experimental results clearly show (1) that both visual and haptic feed-
back are extremely helpful at least in supporting inexperienced or overwhelmed
operators, and (2) that adding haptic feedback (presumably because of its intu-
itiveness and independent sensation modality) improves performance over both
static and dynamic visual feedback. The considered metrics map directly to clin-
ical precision (in the case of probe deviation) or efficacy of the feedback method
(in the case of reversion time). Since the addition of haptic feedback resulted in
significant improvement for novice users, the system shows promise for use in
training.
Although this system was implemented using a Clear Guide ONE, the haptic
feedback can in principle be implemented with any navigated ultrasound guid-
ance system. In the future, it would be interesting to examine the benefits of
haptic feedback in a clinical study, across a large cohort of diversely-skilled oper-
ators, while directly measuring the intervention outcome (instrument placement
accuracy). Future prototypes would be improved by including haptic feedback
for additional degrees of freedom such as translation and rotation of the probe.
References
1. Emergency ultrasound guidelines. Ann. Emerg. Med. 53(4), 550–570 (2009)
2. Revised statement on recommendations for use of real-time ultrasound guidance
for placement of central venous catheters. Bull. Am. Coll. Surg. 96(2), 36–37 (2011)
3. Antonakakis, J.G., Ting, P.H., Sites, B.: Ultrasound-guided regional anesthesia
for peripheral nerve blocks: an evidence-based outcome review. Anesthesiol. Clin.
29(2), 179–191 (2011)
4. Banovac, F., Wilson, E., Zhang, H., Cleary, K.: Needle biopsy of anatomically unfa-
vorable liver lesions with an electromagnetic navigation assist device in a computed
tomography environment. J. Vasc. Interv. Radiol. 17(10), 1671–1675 (2006)
5. Becker, B.C., Maclachlan, R.A., Hager, G.D., Riviere, C.N.: Handheld microma-
nipulation with vision-based virtual fixtures. In: IEEE International Conference of
Robotics Automation, vol. 2011, pp. 4127–4132 (2011)
6. Burke, J.L., Prewett, M.S., Gray, A.A., Yang, L., Stilson, F.R., Coovert, M.D.,
Elliot, L.R., Redden, E.: Comparing the effects of visual-auditory and visual-tactile
feedback on user performance: a meta-analysis. In: Proceedings of the 8th Inter-
national Conference on Multimodal Interfaces, pp. 108–117. ACM (2006)
7. Cavanna, L., Mordenti, P., Bertè, R., Palladino, M.A., Biasini, C., Anselmi, E.,
Seghini, P., Vecchia, S., Civardi, G., Di Nunzio, C.: Ultrasound guidance reduces
pneumothorax rate and improves safety of thoracentesis in malignant pleural effu-
sion: report on 445 consecutive patients with advanced cancer. World J. Surg.
Oncol. 12(1), 1 (2014)
8. Courreges, F., Vieyres, P., Istepanian, R.: Advances in robotic tele-echography services: the OTELO system. In: 26th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, IEMBS 2004, vol. 2, pp. 5371–5374.
IEEE (2004)
9. Gilbertson, M.W., Anthony, B.W.: Ergonomic control strategies for a handheld
force-controlled ultrasound probe. In: 2012 IEEE/RSJ International Conference
on Intelligent Robots and Systems (IROS), pp. 1284–1291. IEEE (2012)
10. Oakley, I., McGee, M.R., Brewster, S., Gray, P.: Putting the feel in ‘look and
feel’. In: Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, CHI 2000, pp. 415–422. ACM (2000)
11. Shapiro, R.S., Ko, P.P., Jacobson, S.: A pilot project to study the use of ultra-
sonography for teaching physical examination to medical students. Comput. Biol.
Med. 32(6), 403–409 (2002)
12. Sigrist, R., Rauter, G., Riener, R., Wolf, P.: Augmented visual, auditory, haptic,
and multimodal feedback in motor learning: a review. Psychon. Bull. Rev. 20(1),
21–53 (2013)
13. Sites, B.D., Spence, B.C., Gallagher, J.D., Wiley, C.W., Bertrand, M.L., Blike,
G.T.: Characterizing novice behavior associated with learning ultrasound-guided
peripheral regional anesthesia. Reg. Anesth. Pain Med. 32(2), 107–115 (2007)
14. Stolka, P.J., Wang, X.L., Hager, G.D., Boctor, E.M.: Navigation with local sensors
in handheld 3D ultrasound: initial in-vivo experience. In: SPIE Medical Imaging,
p. 79681J. International Society for Optics and Photonics (2011)
15. Sun, S.Y., Gilbertson, M., Anthony, B.W.: Computer-guided ultrasound probe
realignment by optical tracking. In: 2013 IEEE 10th International Symposium on
Biomedical Imaging (ISBI), pp. 21–24. IEEE (2013)
16. Van Erp, J.B., Van Veen, H.A.: Vibrotactile in-vehicle navigation system. Transp.
Res. Part F: Traffic Psychol. Behav. 7(4), 247–256 (2004)
A Surgical Guidance System for Big-Bubble
Deep Anterior Lamellar Keratoplasty
1 Introduction
Ophthalmic anterior segment surgery is among the most technically challenging
manual procedures. Penetrating Keratoplasty (PKP) is a well-established trans-
plant procedure for the treatment of multiple diseases of the cornea. In PKP, the
full thickness of the diseased cornea is removed and replaced with a donor cornea
that is positioned into place and sutured with stitches. Deep Anterior Lamellar
Keratoplasty (DALK) is proposed as an alternative method for corneal disor-
ders not affecting the endothelium. The main difference of DALK compared
to PKP is the preservation of the patient’s own endothelium. This advantage
reduces the risk of immunologic reactions and graft failure while showing simi-
lar overall visual outcomes. However, DALK is generally more complicated and
time-consuming, with a steep learning curve, particularly when the host stroma
is manually removed layer by layer [4]. In addition, a high rate of intraoperative
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 378–385, 2016.
DOI: 10.1007/978-3-319-46720-7 44
perforation keeps DALK from becoming surgeons' method of choice [7]. To overcome the long surgical time and high perforation rate of DALK, Anwar et al. [1] proposed the big-bubble DALK technique (BB-DALK). The fundamental step of the big-bubble technique is the insertion of a needle into the deep
stroma where air is injected with the goal of separating the posterior stroma and
the Descemet’s Membrane (DM). The needle is intended to penetrate to a depth
of more than 60 % of the cornea, where the injection of air in most cases forms a
bubble. However, for fear of perforating the DM, surgeons often stop the insertion before the target depth, where air injection results only in diffuse emphysema of the anterior stroma [7]. When bubble formation is not achieved, attempting to expose a deep layer as close as possible to the DM carries the risk of accidental perforation, which brings further complications to the surgical procedure.
Optical Coherence Tomography (OCT) has been shown to increase the suc-
cess rate of the procedure by determining the depth of the cannula before
attempting the air injection [2]. Furthermore, recent integration of Spectral
Domain OCT (SD-OCT) into surgical microscopes gives the possibility of con-
tinuous monitoring of the needle insertion. However, current OCT acquisition
configurations and available tools to visualize the acquired scans are insufficient
for the purpose. Metallic instruments interfere with the OCT signal leading to
obstruction of deep structures. The accurate depth of the needle can only be per-
ceived by removing the needle and imaging the created tunnel since the image
captured when the needle is in position only shows the reflection of the top seg-
ment of the metallic instrument [2]. Also, limited field of view makes it hard to
keep the OCT position over the needle when pressure is applied for insertion.
Here we propose a complete system as a guidance tool for BB-DALK.
The system consists of modified 3D+t OCT acquisition using a microscope-
mounted scanner, sophisticated visualization, tracking of the epithelium (top)
and endothelium (bottom) layers and providing depth information using Aug-
mented Reality (AR). The method addresses all aspects of the indicated com-
plex procedure, hence is a practical solution to improve surgeons’ and patients’
experience.
2 Method
[Fig. 2 graphic: OCT acquisition over a 6 mm × 6 mm × 2 mm volume (540 px laterally, 180 px in depth), with 90 A-scans per B-scan and 30 B-scans; panels (a) and (b).]
Fig. 2. (a): The modified pattern of OCT acquisition. (b): The lateral visualization of
the cornea (orange) and the surgical needle (gray) in an OCT cuboid.
voxels (Fig. 2b). For that, frames are first averaged along the depth to obtain
30 frames of 90 × 30 pixels. Then in each cell of the grid, a tricubic interpolant
which maps coordinates to intensity values is defined as follows:
    f(x, y, z) = \sum_{i,j,k=0}^{3} c_{ijk} x^i y^j z^k ,   x, y, z ∈ [0, 1],   (1)
in which cijk are the 64 interpolant coefficients calculated locally from the grid
sample points and their derivatives. The coefficients are calculated by multiplica-
tion of a readily available 64×64 matrix and the vector of 64 elements consisting
of 8 sample points and their derivatives [6]. The interpolation is implemented on
the CPU in a parallel fashion.
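Given the 64 coefficients c_{ijk} of one grid cell, evaluating the interpolant of Eq. (1) is a triple polynomial sum; the coefficient-solving step via the 64×64 matrix of [6] is omitted in this sketch:

```python
import numpy as np

def tricubic_eval(c, x, y, z):
    """Evaluate f(x,y,z) = sum_{i,j,k=0}^{3} c[i,j,k] x^i y^j z^k
    for local cell coordinates x, y, z in [0, 1].
    c: (4, 4, 4) array of interpolant coefficients for this cell."""
    xs = np.array([1.0, x, x**2, x**3])
    ys = np.array([1.0, y, y**2, y**3])
    zs = np.array([1.0, z, z**2, z**3])
    # contract the coefficient tensor with the three monomial vectors
    return float(np.einsum("ijk,i,j,k->", c, xs, ys, zs))
```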
2.2 Visualization
The achieved 3D OCT volume is visualized on both 2D monitors using GPU ray
casting with 100 rays per pixel. Maximum information in OCT images is gained
from high-intensity values representing boundaries between tissue layers. Hence,
the Maximum Intensity Projection (MIP) technique is employed for rendering
to put an emphasis on corneal layers. Many segmentation algorithms in OCT
imaging are based on adaptive intensity thresholding [5]. Metallic surgical instru-
ments including typical needles used for the BB-DALK procedure have infrared
reflectivity profiles that are distinct from cellular tissues. The 3D OCT volume is
segmented into the background, the cornea and the instrument by taking advan-
tage of various reflectivity profiles and employing K-means clustering. The initial
cluster mean values are set for the background to zero, the cornea to the volume
mean intensity (μ) and the instrument to the volume mean intensity plus two
standard deviations (μ + 2σ). The segmentation is used to dynamically alter the
color and opacity transfer functions to ensure the instrument is distinctly and
continuously visualized in red, the background speckle noise is suppressed and
the corneal tissue opacity does not obscure the instrument (Fig. 3b, c).
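The three-class intensity clustering can be sketched as a 1D k-means over voxel intensities, seeded as described (0, μ, and μ + 2σ). This is our own illustrative re-implementation, not the authors' code:

```python
import numpy as np

def segment_oct(volume, n_iter=20):
    """Cluster voxel intensities into background (0) / cornea (1) /
    instrument (2). Cluster means are initialised to 0, mu and
    mu + 2*sigma, as described in the text."""
    v = np.asarray(volume, float).ravel()
    means = np.array([0.0, v.mean(), v.mean() + 2 * v.std()])
    for _ in range(n_iter):
        # assign each voxel to the nearest cluster mean
        labels = np.argmin(np.abs(v[:, None] - means[None, :]), axis=1)
        for c in range(3):                     # update non-empty clusters
            if np.any(labels == c):
                means[c] = v[labels == c].mean()
    return labels.reshape(np.shape(volume)), means
```

The resulting label map can then drive the colour and opacity transfer functions as described above.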
Fig. 3. (a): Needle insertion performed by the surgeon on the ex vivo pig eye. (b), (c):
3D visualization of the OCT cuboid with frontal and lateral viewpoints. The needle is
distinctly visualized in red while endothelium (arrow) is not apparent.
382 H. Roodaki et al.
2.3 Tracking
The corneal DM and endothelial layer are the main targets of the BB-DALK
procedure. The DM must not be perforated while the needle must be guided as
close as possible to it. However, the two layers combined do not have a footprint
larger than a few pixels in OCT images. As an essential part of the guidance
system, DM and endothelium 3D surfaces are tracked for continuous feedback by
solid visualization. The advancement of the needle in a BB-DALK procedure is examined and reported as the percentage of the stroma that lies above the needle tip. Hence, the epithelium surface of the cornea is also tracked, providing the surgeon with quantitative guidance during the insertion.
Tracking in each volume is initiated by detection of the topmost and bot-
tommost 3D points in the segmented cornea of the OCT volume. Based on the
spherical shape of the cornea, two half spheres are considered as models of the
endothelium and epithelium surfaces. The models are then fitted to the detected
point clouds using the iterative closest point (ICP) algorithm. Since the insertion of the needle deforms the cornea, ICP is utilized with a 3D affine transformation at its core [3]. If the detected and the model half-sphere point clouds are respectively denoted as P = {p_i}_{i=1}^{N_P} ⊂ R^3 and M = {m_i}_{i=1}^{N_M} ⊂ R^3, each iteration k alternates between a correspondence step and a transformation-update step:

    C(i) = argmin_{j ∈ {1,...,N_P}} ||(A_{k-1} m_i + t_{k-1}) − p_j||_2^2 ,  for all i ∈ {1,...,N_M},   (2)

    (A_k, t_k) = argmin_{A,t} (1/N_M) \sum_{i=1}^{N_M} ||(A m_i + t) − p_{C(i)}||_2^2 .   (3)
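One iteration of this affine ICP, sketched with brute-force nearest neighbours for Eq. (2) and a least-squares affine fit for Eq. (3); this is an illustration of the method of [3], not the authors' code:

```python
import numpy as np

def affine_icp_step(M, P, A, t):
    """One affine-ICP iteration: match each transformed model point m_i to
    its nearest detected point (Eq. 2), then refit (A, t) in least squares
    against those correspondences (Eq. 3).
    M: (Nm, 3) model points, P: (Np, 3) detected points."""
    moved = M @ A.T + t
    # Eq. (2): nearest-neighbour correspondences (brute force)
    d2 = ((moved[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    corr = P[np.argmin(d2, axis=1)]
    # Eq. (3): least-squares affine transform M -> corr
    Mh = np.hstack([M, np.ones((len(M), 1))])       # homogeneous model points
    X, *_ = np.linalg.lstsq(Mh, corr, rcond=None)   # (4, 3) solution block
    return X[:3].T, X[3]                            # updated A (3x3), t (3,)
```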
Fig. 4. Augmented Reality is used to solidly visualize the endothelium and epithe-
lium surfaces (yellow) using wireframes. A hypothetical surface (green) is rendered to
indicate the insertion target depth.
surface (Fig. 4a). The pressure applied for insertion of the needle leads to defor-
mation of the cornea. To keep the OCT field of view centered on the focus of the
procedure despite the induced shifts, the OCT depth range is continuously cen-
tered to halfway between top and bottom surfaces. This is done automatically
to take the burden of manual repositioning away from the surgeon.
Fig. 5. Results of air injection in multiple pig eyes visualized from various viewpoints.
The concentration of air in the bottommost region of the cornea indicates the high
insertion accuracy. Deep stroma is reached with no sign of perforation.
[Results figure: frame-by-frame plots for the phantom-eye and pig-eye experiments, panels labelled "Phantom eye 10/30" and "Pig eye 10/30".]
4 Conclusion
This work presents a novel real-time guidance system for one of the most chal-
lenging procedures in ophthalmic microsurgery. The use of medical AR aims at
facilitation of the BB-DALK learning process. Experiments on ex vivo pig eyes
suggest the usability and reliability of the system leading to more effective yet
shorter surgery sessions. Quantitative evaluations of the system indicate its high
accuracy in depicting the surgical scene and tracking its changes leading to pre-
cise and deep insertions. Future work will be in the direction of adding needle
tracking and navigation, further evaluations and clinical in vivo tests.
References
1. Anwar, M., Teichmann, K.D.: Big-bubble technique to bare Descemet’s membrane
in anterior lamellar keratoplasty. J. Cataract Refract. Surg. 28(3), 398–403 (2002)
2. De Benito-Llopis, L., Mehta, J.S., Angunawela, R.I., Ang, M., Tan, D.T.: Intra-
operative anterior segment optical coherence tomography: a novel assessment tool
during deep anterior lamellar keratoplasty. Am. J. Ophthalmol. 157(2), 334–341
(2014)
3. Du, S., Zheng, N., Ying, S., Liu, J.: Affine iterative closest point algorithm for
point set registration. Pattern Recogn. Lett. 31(9), 791–799 (2010)
4. Fontana, L., Parente, G., Tassinari, G.: Clinical outcomes after deep anterior lamel-
lar keratoplasty using the big-bubble technique in patients with keratoconus. Am.
J. Ophthalmol. 143(1), 117–124 (2007)
5. Ishikawa, H., Stein, D.M., Wollstein, G., Beaton, S., Fujimoto, J.G., Schuman, J.S.:
Macular segmentation with optical coherence tomography. Invest. Ophthalmol.
Vis. Sci. 46(6), 2012–2017 (2005)
6. Lekien, F., Marsden, J.: Tricubic interpolation in three dimensions. Int. J. Numer.
Meth. Eng. 63(3), 455–471 (2005)
7. Scorcia, V., Busin, M., Lucisano, A., Beltz, J., Carta, A., Scorcia, G.: Anterior seg-
ment optical coherence tomography-guided big-bubble technique. Ophthalmology
120(3), 471–476 (2013)
Real-Time 3D Tracking of Articulated Tools
for Robotic Surgery
The Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK
menglong.ye11@imperial.ac.uk
1 Introduction
Recent advances in surgical robots have significantly improved the dexterity
of the surgeons, along with enhanced 3D vision and motion scaling. Surgical
robots such as the da Vinci® (Intuitive Surgical, Inc., CA) platform can allow
the augmentation of preoperative data to enhance the intraoperative surgical
guidance. In robotic surgery, tracking of surgical tools is an important task for
applications such as safe tool-tissue interaction and surgical skills assessment.
In the last decade, many approaches for surgical tool tracking have been pro-
posed. The majority of these methods have focused on the tracking of laparo-
scopic rigid tools, including using template matching [1] and combining colour-
segmentation with prior geometrical tool models [2]. In [3], the 3D poses of rigid
robotic tools were estimated by combining random forests with level-sets seg-
mentation. More recently, tracking of articulated tools has also attracted a lot of
interest. For example, Pezzementi et al. [4] tracked articulated tools based on an
offline synthetic model using colour and texture features. The CAD model of a
robotic tool was used by Reiter et al. [5] to generate virtual templates using the
robot kinematics. However, thousands of templates were created by configuring
the original tool kinematics, leading to time-demanding rendering and template
matching. In [6], boosted trees were used to learn predefined parts of surgical
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 386–394, 2016.
DOI: 10.1007/978-3-319-46720-7 45
Fig. 1. (a) Illustration of transformations; (b) Virtual rendering example of the large
needle driver and its keypoint locations; (c) Extracted gradient orientations from virtual
rendering. The orientations are quantised and colour-coded as shown in the pie chart.
tools. Similarly, regression forests have been employed in [7] to estimate the 2D
pose of articulated tools. In [8], the 3D locations of robotic tools estimated with
offline trained random forests, were fused with robot kinematics to recover the
3D poses of the tools. Whilst there has been significant progress on surgical tool
detection and tracking, none of the existing approaches have thus far achieved
real-time 3D tracking of articulated robotic tools.
In this paper, we propose a framework for real-time 3D tracking of articulated
tools in robotic surgery. Similar to [5], CAD models have been used to generate
virtual tools and their contour templates are extracted online, based on the
kinematic readings of the robot. In our work, the tool detection on the real
camera image is performed via matching the individual parts of the tools rather
than the whole instrument. This enables our method to deal with the changing
pose of the tools due to articulated motion. Another novel aspect of the proposed
framework is the robust verification approach based on 2D geometrical context,
which is used to reject outlier template matches of the tool parts. The inlier 2D
detections are then used for 3D pose estimation via the Extended Kalman Filter
(EKF). Experiments have been conducted on phantom, ex vivo and in vivo video
data, and the results verify that our approach outperforms the state-of-the-art.
2 Methods
Our proposed framework includes three main components. The first component is
a virtual tool renderer that generates part-based templates online. After template
matching, the second component performs verification to extract the inlier 2D
detections. These 2D detections are finally fused with kinematic data for 3D tool
pose estimation. Our framework is implemented on the da Vinci® robot. The robot kinematics are retrieved using the da Vinci® Research Kit (dVRK) [9].
offline training, we propose to generate the part models on-the-fly such that the
changing appearance of tool parts can be dynamically adapted.
To generate the part-based models online, the CAD model of the tool and the
robot kinematics have been used to render the tool in a virtual environment. The
pose of a tool in the robot base frame B can be denoted as the transformation T^B_E, where E is the end-effector coordinate frame shown in Fig. 1(a). T^B_E can be retrieved from the dVRK (kinematics) to provide the 3D coordinates of the tool in B. Thus, to set the virtual view to be the same as the laparoscopic view, a standard hand-eye calibration [10] is used to estimate the transformation T^C_B from B to the camera coordinate frame C. However, errors in the calibration can affect the accuracy of T^C_B, resulting in a 3D pose offset between the virtual tool and the real tool in C. In this regard, we represent the transformation found from the calibration as T^{C^-}_B, where C^- is the camera coordinate frame that includes the accumulated calibration errors. Therefore, a correction transformation denoted as T^C_{C^-} can be introduced to compensate for the calibration errors.
In this work, we have defined n = 14 keypoints P^B = {p^B_i}_{i=1}^{n} on the tool, and the large needle driver is taken as an example. The keypoints include the points shown in Fig. 1(b) and those on the symmetric side of the tool. These keypoints represent the skeleton of the tool and also apply to other da Vinci® tools. At time t, an image I_t can be obtained from the laparoscopic camera. The keypoints can be projected into I_t with the camera intrinsic matrix K via

    P^{I_t} = (1/s) K T^C_{C^-} T^{C^-}_B P^B_t .   (1)
Here, s is the scaling factor that normalises the depth to the image plane.
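Equation (1) amounts to a pinhole projection through the corrected transformation chain. A sketch in homogeneous coordinates; the intrinsic matrix values in the test are placeholders, and this is not the dVRK code:

```python
import numpy as np

def project_keypoint(p_B, K, T_c_cminus, T_cminus_B):
    """Project a 3D keypoint p_B (robot base frame B) into the image, Eq. (1):
    P^{I_t} = (1/s) K T^C_{C^-} T^{C^-}_B P^B.
    K: 3x3 camera intrinsics; both T arguments are 4x4 homogeneous."""
    p_h = np.append(p_B, 1.0)                      # homogeneous 3D point
    p_cam = (T_c_cminus @ T_cminus_B @ p_h)[:3]    # point in camera frame C
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]                        # divide by depth scale s
```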
To represent the appearance of the tool parts, the Quantised Gradient Orientations (QGO) approach [11] has been used (see Fig. 1(c)). Bounding boxes are created to represent part-based models and centred at the keypoints in the virtual view (see Fig. 2(a)). The box size for each part is adjusted based on the z
coordinate (from kinematics) of the keypoint with respect to the virtual camera
centre. QGO templates are then extracted inside these bounding boxes. As QGO
represents the contour information of the tool, it is robust to cluttered scenes and
illumination changes. In addition, a QGO template is represented as a binary
code by quantisation, thus template matching can be performed efficiently.
Note that not all of the defined parts are visible in the virtual view, as some of them may be occluded. Therefore, templates are only extracted for the m parts that are facing the camera. To find the correspondences of the tool parts
between the virtual and real images, QGO is also computed on the real image
(see Fig. 2(b)) and template matching is then performed for each part via sliding
windows. Exemplar template matching results are shown in Fig. 2(c).
To further extract the best location estimates of the tool parts, a consensus-based
verification approach [12] is included. This approach analyses the geometrical
context of the correspondences in a PROgressive SAmple Consensus (PROSAC)
scheme [13]. For the visible keypoints {p_i}_{i=1}^m in the virtual view, we denote their 2D correspondences in the real camera image as {p_{i,j}}_{i=1,j=1}^{m,k}, where {p_{i,j}}_{j=1}^k represent the top k correspondences of p_i sorted by QGO similarity.
For each iteration in PROSAC, we select two point pairs from {p_{i,j}}_{i=1,j=1}^{m,k} in sorted descending order. These two pairs represent the correspondences for two different parts, e.g., the pair p_1 and p_{1,2}, and the pair p_3 and p_{3,1}. The two pairs
are then used to verify the geometrical context of the tool parts. As shown in
Fig. 2(d) and (e), we use two polar grids to indicate the geometrical context of the
virtual view and the camera image. The origins of the grids are defined as p1 and
p1,2 , respectively. The major axis of the grids can be defined as the vectors from
p1 to p3 and p1,2 to p3,1 , respectively. The scale difference between the two grids
is found by comparing d(p_1, p_3) and d(p_{1,2}, p_{3,1}), where d(·,·) is the Euclidean distance. We then define the angular and radial bin sizes as 30° and 10 pixels (allowing moderate out-of-plane rotation), respectively. With these, two polar grids can be created and placed on the virtual and camera images. A point pair is determined to be an inlier if the two points are located in the same zone of the polar grids. Therefore, if the number of inliers is larger than a predefined value, the geometrical context of the tools in the virtual and real camera images is considered matched. Otherwise, the above verification is repeated until it reaches the maximum number (100) of iterations. After verification, the inlier point matches are used to estimate the correction transformation T^C_{C−}.
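The zone test can be sketched as follows, using the 30° angular and 10-pixel radial bins from the text; the helper names and example coordinates are ours.

```python
import math

# Sketch of the geometrical-context check: a point pair is an inlier if the
# point falls in the same polar-grid zone (30-degree angular bins, 10-pixel
# radial bins) relative to the anchor pair in both views. Bin sizes follow
# the text; everything else is an illustrative simplification.

ANGLE_BIN = math.radians(30.0)
RADIUS_BIN = 10.0

def polar_zone(origin, axis_point, scale, p):
    """(angular bin, radial bin) of p in the grid anchored at origin, with
    the major axis through axis_point and radii divided by `scale`."""
    dx, dy = p[0] - origin[0], p[1] - origin[1]
    ax, ay = axis_point[0] - origin[0], axis_point[1] - origin[1]
    angle = (math.atan2(dy, dx) - math.atan2(ay, ax)) % (2.0 * math.pi)
    radius = math.hypot(dx, dy) / scale
    return int(angle // ANGLE_BIN), int(radius // RADIUS_BIN)

def is_inlier(virt, real, p_virt, p_real):
    """virt/real = (origin, axis_point); the scale difference is taken from
    the anchor distances, as in the text."""
    d_v = math.dist(virt[0], virt[1])
    d_r = math.dist(real[0], real[1])
    return (polar_zone(*virt, 1.0, p_virt) ==
            polar_zone(*real, d_r / d_v, p_real))

# A point at the same relative angle and radius in both grids is accepted.
virt = ((0.0, 0.0), (40.0, 0.0))
real = ((100.0, 100.0), (180.0, 100.0))   # same axis direction, 2x scale
print(is_inlier(virt, real, (20.0, 5.0), (140.0, 110.0)))
```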
z = [u_1, v_1, ..., u_n, v_n]^T, where u and v are the keypoints' locations in the camera image.
To estimate x on-the-fly, the EKF has been adopted to find xt given the observa-
tions zt at time t. The process model is defined as xt = Ixt−1 + wt , where wt is
the process noise at time t, and I is the transition function defined as the identity
matrix. The measurement model is defined as z_t = h(x_t) + v_t, with v_t being the measurement noise. h(·) is the nonlinear function with respect to [θ_x, θ_y, θ_z, r_x, r_y, r_z]^T:
h(x_t) = (1/s) K f(x_t) T^{C−}_B P^B_t ,    (2)
which is derived according to Eq. 1. Note that f(·) is the function that composes the Euler angles and translation (in x_t) into the 4×4 homogeneous transformation matrix T^C_{C−}. As Eq. 2 is nonlinear, we derive the Jacobian matrix J of h(·) with respect to each element of x_t.
For iteration t, the predicted state x⁻_t is calculated and used to predict the measurement z⁻_t, and also to calculate J_t. In addition, z_t is obtained from the inlier detections (Sect. 2.2), which is used, along with J_t and x⁻_t, to derive the corrected state x⁺_t, which contains the corrected angles and translations. These are finally used to compose the transformation T^C_{C−} at time t, and thus the 3D pose of the tool in C is obtained as T^C_E = T^C_{C−} T^{C−}_B T^B_E. Note that if no 2D detections are available at time t, the previous T^C_{C−} is used.
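The predict/correct cycle can be illustrated with a minimal one-state EKF that shares the identity process model described above; the measurement function h and the noise covariances are toy choices, not the 6-DoF model of the paper.

```python
import math

# Minimal 1-state EKF sketch of the predict/correct cycle: identity process
# model x_t = x_{t-1} + w_t and a nonlinear measurement z_t = h(x_t) + v_t,
# linearised through the Jacobian (here a scalar derivative). The sine
# measurement and the Q/R values are illustrative only.

def ekf_step(x, P, z, h, dh, Q=1e-4, R=1e-2):
    """One EKF iteration: predict with the identity transition, then correct
    with the linearised measurement model."""
    x_pred, P_pred = x, P + Q                # predict (transition = identity)
    J = dh(x_pred)                           # Jacobian of h at the prediction
    S = J * P_pred * J + R                   # innovation covariance
    K = P_pred * J / S                       # Kalman gain
    x_new = x_pred + K * (z - h(x_pred))     # corrected state
    P_new = (1.0 - K * J) * P_pred
    return x_new, P_new

h, dh = math.sin, math.cos                   # toy nonlinear measurement
x, P, true_x = 0.0, 1.0, 0.5
for _ in range(50):                          # feed noise-free measurements
    x, P = ekf_step(x, P, math.sin(true_x), h, dh)
print(round(x, 3))                           # the estimate converges towards true_x
```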
At the beginning of the tracking process, an initial estimate of T^C_{C−} is required to initialise the EKF and to correct the virtual view to be as close as possible to the real view. Therefore, template matching is performed at multiple scales and rotations for initialisation; after initialisation, only one template is needed for matching each tool part. The Efficient Perspective-n-Point (EPnP) algorithm [14] is applied to estimate the initial T^C_{C−} based on the 2D-3D correspondences of the tool parts matched between the virtual and real views and their 3D positions from the kinematic data.
The proposed framework can be easily extended to track multiple tools. This only requires generating part-based templates for all the tools in the same graphic rendering and following the proposed framework. As template matching is performed on binarised templates, the computational speed does not deteriorate.
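The efficiency argument can be illustrated with a toy binarised matcher in the spirit of QGO [11]: each template cell stores its quantised orientations as a bit mask, so similarity needs only bitwise tests. The 8-bin quantisation and the scoring rule are our simplification, not the exact QGO measure.

```python
# Toy binarised orientation matching: a template cell holds a one-hot bit
# mask of its quantised gradient orientation, and the match score counts
# cells whose masks share a bit. This mirrors the spirit of binarised QGO
# matching; the 8-bin quantisation below is an illustrative choice.

def quantise(angle_deg, bins=8):
    """Map a gradient orientation to a one-hot bit in a `bins`-bit mask."""
    return 1 << (int(angle_deg // (360 / bins)) % bins)

def score(template, window):
    """Count cells where template and window share an orientation bit."""
    return sum(1 for t, w in zip(template, window) if t & w)

template = [quantise(a) for a in (0, 45, 90, 300)]
window_good = [quantise(a) for a in (10, 50, 100, 290)]   # similar contours
window_bad = [quantise(a) for a in (180, 225, 270, 120)]  # unrelated contours
print(score(template, window_good), score(template, window_bad))
```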
3 Results
The proposed framework has been implemented on an HP workstation with
an Intel Xeon E5-2643v3 CPU. Stereo videos are captured at 25 Hz. In our
C++ implementation, we have separated the part-based rendering and image
processing into two CPU running threads, enabling our framework to be real-
time. The rendering part is implemented based on VTK and OpenGL, of which
the speed is fixed as 25 Hz. As our framework only requires monocular images
for 3D pose estimation, only the images from the left camera were processed.
For image size 720 × 576, the processing speed is ≈29 Hz (without any GPU
programming). The threshold of the inlier number in the geometrical context
verification is empirically defined as 4. For initialisation, template matching is
Real-Time 3D Tracking of Articulated Tools for Robotic Surgery 391
Fig. 3. (a) and (b) Detection rate results of our online template matching and GradBoost [6] on two single-tool tracking sequences (see supplementary videos); (c) overall rotation angle errors (mean ± std) along each axis on Seqs. 1–6.
Table 1. Translation and rotation errors (mean ± std) on Seqs. 1–6. Tracking accuracies
with run-time speed in Hz (in brackets) compared to [8] on their dataset (Seqs. 7–12).
performed with additional scale ratios of 0.8 and 1.2, and rotations of ±15°, which does not deteriorate the run-time speed thanks to template binarisation. Our method was compared to tracking approaches for articulated tools, including [6,8].
To demonstrate the effectiveness of the online part-based templates for tool detection, we have compared our approach to the method proposed in [6], which is based on boosted trees for 2D tool part detection. For ease of training data generation, a subset of the tool parts was evaluated in this comparison, namely the front pin, logo, and rear pin. The classifier was trained with 6000 samples for each part. Since [6] applies to single-tool tracking only, the trained classifier, along with our approach, was tested on two single-tool sequences (1677 and 1732 images), for which ground-truth data was manually labelled. A part detection is determined to be correct if the distance between its centre and the ground truth is smaller than a threshold. To evaluate the results under different accuracy requirements, the threshold was sequentially set to 5, 10, and 20 pixels.
The detection rates of the methods were calculated among the top N detections. As shown in Fig. 3(a–b), our method significantly outperforms [6] under all accuracy requirements, because our templates are generated adaptively online.
To validate the accuracy of the 3D pose estimation, we manually labelled
the centre locations of the tool parts on both left and right camera images
392 M. Ye et al.
Fig. 4. Qualitative results. (a–c) phantom data (Seqs. 1–3); (d) ex vivo ovine data
(Seq. 4); (e) and (g) ex vivo porcine data (Seqs. 9 and 12); (f) in vivo porcine data
(Seq. 11). Red lines indicate the tool kinematics, and green lines indicate the tracking
results of our framework with 2D detections in coloured dots.
on phantom (Seqs. 1–3) and ex vivo (Seqs. 4–6) video data to generate the 3D
ground truth. The tool pose errors are then obtained as the relative pose between
the estimated pose and the ground truth. Our approach was also compared
to the 3D poses estimated performing EPnP for every image where the tool
parts are detected. However, EPnP generated unstable results and had inferior
performance to our approach as shown in Table 1 and Fig. 3(c).
We have also compared our framework to the method proposed in [8]. As
their code is not publicly available, we ran our framework on the same ex vivo
(Seqs. 7–10, 12) and in vivo data (Seq. 11) used in [8]. Example results are shown
in Fig. 4(e–g). To achieve a fair comparison, we evaluated the tracking accuracy as explained in their work, and present both our results and theirs, as reported in their paper, in Table 1. Our framework achieved slightly better accuracy than their approach, and our processing speed is significantly faster, ranging from 25–36 Hz, while theirs is approximately 1–2 Hz as reported in [8]. As shown in Figs. 4(b) and (d), our proposed method is robust to occlusion due to tool intersections and specularities, thanks to the fusion of 2D part detections and kinematics. In addition, our framework is able to provide accurate tracking even when T^{C−}_B becomes invalid after the laparoscope has moved (Fig. 4(c), Seq. 3). This is because T^C_{C−} is estimated online using the 2D part detections. All the processed videos are available via https://youtu.be/oqw 9Xp qsw.
4 Conclusions
In this paper, we have proposed a real-time framework for 3D tracking of artic-
ulated tools in robotic surgery. Online part-based templates are generated using
the tool CAD models and robot kinematics, such that efficient 2D detection
can then be performed in the camera image. For rejecting outliers, a robust
verification method based on 2D geometrical context is included. The inlier 2D
detections are finally fused with robot kinematics for 3D pose estimation. Our
framework runs in real time for multi-tool tracking, and can thus be used for imposing dynamic active constraints and for motion analysis. The results of phantom, ex vivo and in vivo experiments demonstrate that our approach achieves accurate 3D tracking and outperforms the current state of the art.
References
1. Sznitman, R., Ali, K., Richa, R., Taylor, R.H., Hager, G.D., Fua, P.: Data-driven
visual tracking in retinal microsurgery. In: Ayache, N., Delingette, H., Golland, P.,
Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7511, pp. 568–575. Springer, Heidelberg
(2012)
2. Wolf, R., Duchateau, J., Cinquin, P., Voros, S.: 3D tracking of laparoscopic instru-
ments using statistical and geometric modeling. In: Fichtinger, G., Martel, A.,
Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6891, pp. 203–210. Springer, Heidel-
berg (2011)
3. Allan, M., Chang, P.-L., Ourselin, S., Hawkes, D.J., Sridhar, A., Kelly, J., Stoyanov,
D.: Image based surgical instrument pose estimation with multi-class labelling and
optical flow. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI
2015. LNCS, vol. 9349, pp. 331–338. Springer, Heidelberg (2015)
4. Pezzementi, Z., Voros, S., Hager, G.: Articulated object tracking by rendering
consistent appearance parts. In: ICRA, pp. 3940–3947 (2009)
5. Reiter, A., Allen, P.K., Zhao, T.: Articulated surgical tool detection using virtually-
rendered templates. In: CARS (2012)
6. Sznitman, R., Becker, C., Fua, P.: Fast part-based classification for instrument
detection in minimally invasive surgery. In: Golland, P., Hata, N., Barillot, C.,
Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 692–699.
Springer, Heidelberg (2014)
7. Rieke, N., Tan, D.J., Alsheakhali, M., Tombari, F., San Filippo, C.A., Belagiannis,
V., Eslami, A., Navab, N.: Surgical tool tracking and pose estimation in retinal
microsurgery. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MIC-
CAI 2015. LNCS, vol. 9349, pp. 266–273. Springer, Heidelberg (2015)
8. Reiter, A., Allen, P.K., Zhao, T.: Appearance learning for 3D tracking of robotic
surgical tools. Int. J. Rob. Res. 33(2), 342–356 (2014)
9. Kazanzides, P., Chen, Z., Deguet, A., Fischer, G., Taylor, R., DiMaio, S.: An open-source research kit for the da Vinci® surgical system. In: ICRA, pp. 6434–6439 (2014)
10. Tsai, R., Lenz, R.: A new technique for fully autonomous and efficient 3D robotics
hand/eye calibration. IEEE Trans. Rob. Autom. 5(3), 345–358 (1989)
11. Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., Lepetit, V.:
Gradient response maps for real-time detection of textureless objects. IEEE Trans.
Pattern Anal. Mach. Intell. 34(5), 876–888 (2012)
12. Ye, M., Giannarou, S., Meining, A., Yang, G.Z.: Online tracking and retargeting
with applications to optical biopsy in gastrointestinal endoscopic examinations.
Med. Image Anal. 30, 144–157 (2016)
13. Chum, O., Matas, J.: Matching with PROSAC - progressive sample consensus. In:
CVPR, vol. 1, pp. 220–226 (2005)
14. Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(n) solution to the
PnP problem. Int. J. Comput. Vis. 81(2), 155–166 (2008)
Towards Automated Ultrasound
Transesophageal Echocardiography and X-Ray
Fluoroscopy Fusion Using an Image-Based
Co-registration Method
1 Introduction
during the navigation and deployment of the devices. For example, an X-ray/TEE fusion system can help the physician find the correct TAVR deployment angle on the fluoroscopic image using landmarks transformed from annotations on TEE.
To enable the fusion of X-ray and TEE images, several methods have been proposed to recover the 3D pose of the TEE probe from the X-ray image [1–3,5,6], where 3D pose recovery is accomplished by 3D-2D image registration. In [1,2,5], 3D-2D image registration is fulfilled by minimizing the dissimilarity between digitally reconstructed radiographs (DRRs) and X-ray images. In [6], DRR rendering is accelerated by using a mesh model instead of a computed tomography (CT) volume. In [3], registration is accelerated using a cost function that is computed directly from the X-ray image and the CT scan via splatting from a point-cloud model, without the explicit generation of DRRs. The main disadvantage of these methods is that they are not fully automatic and require initialization due to their small capture range. Recently, Mountney et al. [7] proposed a detection-based method to recover the 3D pose of the TEE probe from an X-ray image. The 3D translation is derived from the probe's in-plane position detector and scale detector. The 3D rotation (illustrated in Fig. 1(a)) is derived from the in-plane rotation (yaw angle) based on an orientation detector, and from the out-of-plane rotations (roll and pitch angles) based on a template-matching approach. They demonstrated feasibility
on synthetic data. Motivated by the detection-based method, in this paper we present a new method to handle practical challenges in a clinical setup, such as low X-ray dose, noise, clutter, and probe self-symmetry in the 2D image. Two self-symmetry examples are shown in Fig. 1(b). To minimize appearance ambiguity, three balls (Fig. 2(a)) and three holes (Fig. 2(b)) are manufactured on the probe. Examples of the ball and hole markers appearing in fluoroscopic images are shown in Fig. 2(c) and (d). Our algorithm explicitly detects the markers and incorporates the marker detection results into the TEE probe pose estimation for improved robustness and accuracy.
Fig. 1. (a) Illustration of the TEE Euler angles. Yaw is an in-plane rotation; pitch and roll are out-of-plane rotations. (b) Example of ambiguous appearance in two different poses. The green box indicates the probe's transducer array. The roll angle between the two poses is close to 90°. Without considering the markers (Fig. 2), the probe looks similar in the X-ray images.
Fig. 2. Illustration of probe markers circled in red. (a) 3D TEE probe front side with
3 ball markers and (b) back side with 3 hole markers. (c) Ball markers and (d) hole
markers appear in X-Ray images.
2 Methods
A 3D TEE point Q_TEE can be projected to the 2D fluoroscopic image point Q_Fluoro = P_int P_ext (R^W_TEE Q_TEE + T^W_TEE), where P_int is the C-Arm's internal projection matrix, and P_ext is the C-Arm's external matrix, which transforms a point from the TEE world coordinate system to the C-Arm coordinate system. R^W_TEE and T^W_TEE are the TEE probe's rotation and position in the world coordinate system. The internal and external matrices are known from calibration and the C-Arm rotation angles. R^W_TEE = P_ext⁻¹ R^C_TEE and T^W_TEE = P_ext⁻¹ T^C_TEE, where R^C_TEE and T^C_TEE are the probe's rotation and position in the C-Arm coordinate system. R^C_TEE is composed of three Euler angles (θz, θx, θy), which are illustrated in Fig. 1(a), and T^C_TEE = (x, y, z).
The proposed tracking algorithm is formulated as finding an optimal pose on the current image t, constrained via the prior pose from image t−1. In our work, pose hypotheses with pose parameters (u, v), θz, s, θx and θy are generated, and the optimal pose among these hypotheses is identified in a sequential Bayesian inference framework. Figure 3 illustrates an overview of the proposed algorithm. We define two tracking stages: in-plane pose tracking for parameters (u, v), s and θz, and out-of-plane tracking for parameters θx and θy. In the context of visual tracking, the search spaces of (u_t, v_t, θ_zt, s_t) and (θ_xt, θ_yt) are significantly reduced by generating in-plane pose hypotheses in the region of interest (u_{t−1} ± δT, v_{t−1} ± δT, θ_{z,t−1} ± δz, s_{t−1} ± δs), and out-of-plane pose hypotheses in the region of interest (θ_{x,t−1} ± δx, θ_{y,t−1} ± δy), where δT, δz, δs, δx and δy are search ranges. Note that we choose these search ranges conservatively, i.e. much larger than typical frame-to-frame probe motion.
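The restricted search regions can be sketched as a simple hypothesis grid around the previous pose; the δ values and grid resolutions below are illustrative, not the trained system's settings.

```python
import itertools

# Sketch of restricting the search space around the previous pose: in-plane
# hypotheses (u, v, theta_z, s) are drawn from a grid inside the region of
# interest (prev +/- delta). Step counts and deltas are example values.

def in_plane_hypotheses(prev, deltas, steps):
    """Yield pose tuples on a grid spanning prev[i] +/- deltas[i]."""
    axes = []
    for p, d, n in zip(prev, deltas, steps):
        axes.append([p - d + 2 * d * k / (n - 1) for k in range(n)])
    return list(itertools.product(*axes))

prev = (320.0, 240.0, 15.0, 1.0)        # (u, v, theta_z, s) from frame t-1
deltas = (20.0, 20.0, 10.0, 0.1)        # (delta_T, delta_T, delta_z, delta_s)
hyps = in_plane_hypotheses(prev, deltas, steps=(5, 5, 5, 3))
print(len(hyps))                         # 5 * 5 * 5 * 3 = 375 hypotheses
```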
398 S. Sun et al.
where M_t denotes the in-plane pose parameters (u, v, θz, s). M̂_t is the optimal solution under the maximum a posteriori (MAP) probability. P(Z_t|M_t) is the likelihood of an in-plane hypothesis being positive. P(M_t) represents the in-plane motion prior probability, which is defined as a joint Gaussian distribution over the parameters (u, v, θz, s) with standard deviations (σT, σT, σθz, σs).
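The MAP scoring just described (a hypothesis's detector likelihood multiplied by a Gaussian motion prior centred on the previous pose) can be sketched as follows; the σ values and likelihoods are illustrative choices, not the trained system's.

```python
import math

# Sketch of MAP hypothesis scoring: posterior score (up to a constant) =
# detector likelihood * Gaussian motion prior over (u, v, theta_z, s).
# All numeric values below are illustrative.

def gaussian(x, mu, sigma):
    """Unnormalised Gaussian weight."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def map_score(hyp, prev, sigmas, likelihood):
    """Score a pose hypothesis (u, v, theta_z, s) against the previous pose."""
    prior = 1.0
    for value, mu, sigma in zip(hyp, prev, sigmas):
        prior *= gaussian(value, mu, sigma)
    return likelihood * prior

prev = (320.0, 240.0, 15.0, 1.0)
sigmas = (20.0, 20.0, 10.0, 0.1)          # (sigma_T, sigma_T, sigma_theta_z, sigma_s)
near = map_score((322.0, 238.0, 16.0, 1.02), prev, sigmas, likelihood=0.8)
far = map_score((400.0, 300.0, 40.0, 1.3), prev, sigmas, likelihood=0.9)
print(near > far)   # a slightly weaker detection near the prior still wins
```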
In-plane pose hypotheses are generated using a marginal space learning method similar to the work in [10]. A series of cascaded classifiers is trained to classify probe position (u, v), size s, and orientation θz. These classifiers are trained sequentially: two position detectors for (u, v), an orientation detector for θz, and a scale detector for s. Each detector is a Probabilistic Boosting Tree (PBT) classifier [8] using Haar-like features and rotated Haar-like features [9]. The first position classifier is trained on the annotations (positive samples) and on negative samples randomly sampled away from the annotations. The second position detector performs a bootstrapping procedure: negative samples are collected from both the false positives of the first position detector and random negative samples. The orientation detector is trained on rotated images, which are rotated to 0° according to the annotated probe orientations; the Haar-like features are computed on the rotated images. During the orientation test stage, the input image is rotated every 5° in the range θ_{z,t−1} ± δz. The scale detector is likewise trained on the rotated images; the Haar-like features are computed on the rotated images and the Haar feature windows are scaled based on the probe's size. During the scale test stage, the Haar feature window is scaled and quantised over the range s_{t−1} ± δs.
Fig. 4. An example of the template matching score map for one probe pose. The x-axis is the roll angle and the y-axis is the pitch angle. Each pixel represents one template pose. Dark red indicates a high matching score and dark blue a low matching score.
The initial probe pose in the sequence is derived from detection results without considering temporal information. We detect the in-plane position, orientation and scale hypotheses, and the out-of-plane roll and pitch hypotheses, in the whole required search
space. We obtain the final in-plane pose via non-maximum suppression and a weighted average around the pose with the largest detection probability. The hypothesis with the largest search score is used as the out-of-plane pose. To initialise tracking: (1) we save the poses of Ni (e.g. Ni = 5) consecutive image frames; (2) a median pose is computed from the Ni detection results; (3) a weighted mean pose is computed based on the distance to the median pose; (4) the standard deviation σp with respect to the mean pose is computed. Once σp < σthreshold, tracking starts with the initial pose (i.e. the mean pose). During tracking, we identify tracking failure as follows: (1) we save Nf (e.g. Nf = 5) consecutive tracking results; (2) the average search score m_score is computed. If m_score < m_threshold, we stop tracking and re-start the tracking initialization procedure.
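A minimal sketch of this start/stop logic, reduced to a single scalar pose parameter: Ni = Nf = 5 follow the text, while the thresholds and the weighting scheme are our illustrative choices.

```python
import statistics

# Sketch of the initialisation / failure-detection logic: collect N
# consecutive per-frame results, start tracking once their spread around the
# weighted mean is small, and flag failure when the mean score drops.
# Thresholds and the distance-based weighting are illustrative only.

def try_initialise(poses, sigma_threshold=2.0):
    """Return the weighted mean pose if the N detections agree, else None."""
    median = statistics.median(poses)
    weights = [1.0 / (1.0 + abs(p - median)) for p in poses]  # near-median counts more
    mean = sum(w * p for w, p in zip(weights, poses)) / sum(weights)
    spread = statistics.pstdev(poses)        # sigma_p in the text
    return mean if spread < sigma_threshold else None

def tracking_failed(scores, m_threshold=0.5):
    """Failure when the mean search score of the last N_f frames is low."""
    return sum(scores) / len(scores) < m_threshold

print(try_initialise([10.0, 10.5, 9.8, 10.2, 10.1]) is not None)  # stable -> start
print(try_initialise([10.0, 30.0, 9.8, 25.0, 10.1]) is None)      # unstable -> wait
print(tracking_failed([0.9, 0.8, 0.2, 0.3, 0.1]))                 # low scores -> re-init
```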
For our study, we trained the machine-learning-based detectors on ∼10,000 fluoroscopic images (∼90% synthetically generated and ∼10% clinical). We validated our methods on 34 X-ray fluoroscopic videos (1933 images) acquired from clinical experiments, and 13 videos (2232 images) from synthetic generation. The synthetic images were generated by blending DRRs of the TEE probe (including the tube) with real fluoroscopic images containing no TEE probe. For the test synthetic sequences in particular, we simulate realistic probe motions (e.g., insertion, retraction, roll) in the fluoroscopic sequences. Ground-truth poses for the synthetic images are derived from the 3D probe geometry and the rendering parameters. Clinical images were manually annotated by 4 experts using our interactive annotation tool. The image size is 1024 × 1024 pixels. Computations were performed on a workstation with an Intel
Xeon (E5-1620) CPU at 3.7 GHz and 8.00 GB of memory. On average, our tracking runs near real time (about 10 FPS; see Sect. 4).
Towards Automated Ultrasound Transesophageal Echocardiography 401
Fig. 5. Result of success rate vs 2D TRE on clinical (a) and synthetic (b) validations
of the proposed detection, tracking and 3D-2D registration refinement algorithms.
Due to the limited availability of clinical data, we enlarged our training data set using synthetic images. Table 1 and Fig. 5 show that our approach performs well on real clinical data when trained on this hybrid data; we expect increased robustness and accuracy once a larger number of real clinical cases becomes available. The tracking algorithm improved robustness and accuracy compared to the detection-alone approach. One limitation of our tracking algorithm is that it cannot compensate for all discretization errors, although temporal smoothing is applied using a Kalman filter. This is a limitation of any detection-based approach. To further enhance accuracy, refinement is applied when physicians perform the measurements.
4 Conclusion
In this work, we presented a fully automated method for recovering the 3D pose of the TEE probe from X-ray images. Tracking is very important to give physicians confidence that the probe pose recovery is working robustly and continuously. Abrupt detection failures are particularly problematic when the probe is not moving, and a detection-only approach cannot address such failures, which arise from disturbance, noise, and appearance ambiguities of the probe. Our proposed visual tracking algorithm avoids abrupt failures and improves detection robustness, as shown in our experiments. In addition, our approach is near real-time (about 10 FPS) and fully automated, without any user interaction such as the manual pose initialization required by many state-of-the-art methods. Our complete solution to the TEE/X-ray fusion problem is applicable to clinical practice due to its high robustness and accuracy.
Disclaimer: The outlined concepts are not commercially available. Due to reg-
ulatory reasons their future availability cannot be guaranteed.
References
1. Gao, G., et al.: Rapid image registration of three-dimensional transesophageal
echocardiography and X-ray fluoroscopy for the guidance of cardiac interven-
tions. In: Navab, N., Jannin, P. (eds.) IPCAI 2010. LNCS, vol. 6135, pp. 124–134.
Springer, Heidelberg (2010)
2. Gao, G., et al.: Registration of 3D transesophageal echocardiography to X-ray
fluoroscopy using image-based probe tracking. Med. Image Anal. 16(1), 38–49
(2012)
3. Hatt, C.R., Speidel, M.A., Raval, A.N.: Robust 5DOF transesophageal echo probe
tracking at fluoroscopic frame rates. In: Navab, N., Hornegger, J., Wells, W.M.,
Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 290–297. Springer,
Heidelberg (2015). doi:10.1007/978-3-319-24553-9 36
4. Hinterstoisser, S., et al.: Gradient response maps for real-time detection of texture-
less objects. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 876–888 (2012)
5. Housden, R.J., et al.: Evaluation of a real-time hybrid three-dimensional echo
and X-ray imaging system for guidance of cardiac catheterisation procedures. In:
Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol.
7511, pp. 25–32. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33418-4 4
6. Kaiser, M., et al.: Significant acceleration of 2D–3D registration-based fusion of ultrasound and X-ray images by mesh-based DRR rendering. In: SPIE, p. 867111
(2013)
7. Mountney, P., et al.: Ultrasound and fluoroscopic images fusion by autonomous
ultrasound probe detection. In: Ayache, N., Delingette, H., Golland, P., Mori, K.
(eds.) MICCAI 2012. LNCS, vol. 7511, pp. 544–551. Springer, Heidelberg (2012).
doi:10.1007/978-3-642-33418-4 67
There is much ongoing research to develop and apply Augmented Reality (AR)
to improve laparoscopic surgery. One important goal is to visualise hidden sub-
surface structures such as tumors or major vessels by augmenting optical images
from a laparoscope with 3D radiological data from e.g. MRI or CT. Solutions
are currently being developed to assist various procedures, including liver tumor resection [6], myomectomy [3] and partial nephrectomy [9]. To solve the
problem one must register the data modalities. The general strategy is to build
a deformable 3D organ model from the radiological data, then to determine
the model’s 3D transformation to the laparoscope’s coordinate system at any
given time. This is very challenging and a general, automatic, robust and real-
time solution does not yet exist. The problem is especially hard with monocular
laparoscopes because of the lack of depth information. A crucial missing com-
ponent is a way to robustly compute dense matches between the organ’s surface
and the laparoscopic images. Currently, real-time results have only been achieved
with sparse feature-based matches using KLT [5,10]; however, this is quite fragile, suffers from drift, and can quickly break down for a number of reasons, including occlusions, sudden camera motion, motion blur and optical blur.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 404–412, 2016.
DOI: 10.1007/978-3-319-46720-7 47
Robust, Real-Time, Dense and Deformable 3D Organ Tracking 405
2 Methodology
We now present the framework, which we refer to as Robust, Real-time, Dense
and Deformable (R2D2) tracking. Figure 1 gives an overview of R2D2 tracking
using an in-vivo porcine kidney experiment as an example.
Fig. 1. Overview of R2D2 tracking with monocular laparoscopes. Top row: modelling
the organ’s texture by texture-mapping it from a set of reference laparoscopic images.
Bottom row: real-time tracking of the textured model.
406 T. Collins et al.
are combined with the model's internal energy, and xt is then solved for by energy minimisation. Once completed, the new solution is used to update the render, the next image is acquired, and the process repeats. Because this process tracks the model frame-to-frame, a mechanism is needed for initialisation (to provide an initial estimate of xt at the start) and re-initialisation (to provide an initial estimate if tracking fails). We discuss these mechanisms below.
We use DRBM as a basis and extend it to our problem. Firstly, DRBM requires at least some texture variation to be present; however, tissue can be quite textureless in some regions. To deal with this, additional constraints are needed. One constraint that has rarely been exploited before is the organ boundary constraint: if the organ's boundary is visible (either partially or fully), it can be used as a tracking constraint. Organ boundaries have been used previously to semi-automatically register pre-operative models [3], but not for automatic real-time tracking. This is non-trivial because one does not know a priori which points correspond to the organ's boundary. Secondly, we extend DRBM to volumetric biomechanical deformable models, and thirdly we introduce semi-automatic texture map updating, which allows strong changes of the organ's appearance (due to e.g. coagulation) to be handled.
E(x) = Ematch(x; Ctexture) + λbound Ematch(x; Cbound) + λinternal Einternal(x)    (1)
The term Ematch is a point-match energy, which generates the energy for both
texture and boundary matches. This is defined as follows:
Ematch(x; C) = Σ_{(pi, qi) ∈ C} ρ( ‖π(f(pi; x)) − qi‖₂ )    (2)
When a match is erroneous, the model should not align to it, and the M-estimator provides this by reducing the influence of an erroneous match on E. We have tested various M-estimators and found that good results are obtained with the pseudo-L1 estimator ρ(x) = √(x² + ε), with ε = 10⁻³ being a small constant that makes Ematch differentiable everywhere.
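The pseudo-L1 penalty and its bounded influence (derivative) can be written out directly; the comparison values below are illustrative.

```python
import math

# The pseudo-L1 robust penalty rho(x) = sqrt(x^2 + eps) from the text:
# nearly L1 for large residuals, so outliers gain little influence, yet
# differentiable at zero. The example values are for illustration.

EPS = 1e-3

def rho(residual):
    """Pseudo-L1 penalty."""
    return math.sqrt(residual ** 2 + EPS)

def influence(residual):
    """d rho / d residual: bounded by 1, unlike the unbounded 2r of L2."""
    return residual / math.sqrt(residual ** 2 + EPS)

# An outlier's influence saturates near 1 instead of growing linearly.
print(round(influence(0.1), 3), round(influence(100.0), 3))
```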
The terms λbound and λinternal are influence weights; we discuss how they are set in the experimental section. We follow the same procedure to minimise E as described in [2]: E is linearised about the current estimate (the solution from the previous frame), then the associated linear system is formed and its normal equations are solved using a coarse-to-fine multi-grid Gauss-Newton optimisation with backtracking line search.
Fig. 2. Visualisations of the five test cases and tracking results. Best viewed in colour.
3 Experimental Results
We evaluate performance with five test cases which are visualised in Fig. 2 as five
columns. These are two in-vivo porcine kidneys (a,b), an in-vivo human uterus
(c), an ex-vivo chicken thigh used for laparoscopy training (d) and an ex-vivo
porcine kidney (e). We used the same kidney in cases (a) and (e). The models
were constructed from CT (a,b,d,e) and T2 weighted MRI (c), and segmented
interactively with MITK. For each case we recorded a monocular laparoscopic
video (10 mm Karl Storz 1080p, 25fps with CLARA image enhancement) of the
object being moved and deformed with surgical tools (a,b,c,d) or with human
hands (e). The video durations ranged from 1424 to 2166 frames (57 to 82 s).
The objects never moved completely out-of-frame in the videos, so we used
them to test tracking performance without re-localisation. The main challenges
present are low light and high noise (c), strong motion blur (b,c), significant tex-
ture change caused by intervention (a,c), tool occlusions (a,b,c,d), specularities
(a,b,c,d,e), dehydration (b), smoke (c), and partial occlusion where the organ
disappears behind the peritoneum (b,c). We constructed deformable models with
a 6 mm grid spacing with the number of respective tetrahedral elements for (a–
e) being 1591, 1757, 8618, 10028 and 1591. Homogeneous StVK elements were
used for (a,b,c,e) with rough generic Poisson's ratio ν values from the literature.
These were ν = 0.43 for (a,b,e) [4] and ν = 0.45 for (c). Note that when we use
homogeneous elements, the Young's modulus E is not actually a useful parameter
for us: if we double E and halve λinternal, we end up with the same internal
energy. We therefore arbitrarily set E = 1 for (a,b,c,e). For (d) we
410 T. Collins et al.
used two coarse element classes corresponding to bone and all other tissue, and
we set their Young’s moduli using relative values of 200 and 1 respectively.
Our tracking framework has several tunable parameters, which are (i) the
energy weights, (ii) the boundary search length l, (iii) the boundary detector
parameters and (iv) the DRBM parameters. To make them independent of the
image resolution, we pre-scale the images to a canonical width of 640 pixels.
For all five cases we used the same values of (iii) and (iv) (their respective
defaults), and the same value for (ii) of l = 15 pixels. For (i), we used the
same value of λbound = 0.7 in all cases. For λinternal we used category-specific
values, which were λinternal = 0.2 for the uterus, λinternal = 0.09 for kidneys
and λinternal = 0.2 for the chicken thigh. In the interest of space, the results
presented here do not use texture model updating. This is to evaluate track-
ing robustness despite significant appearance change. We refer the reader to
the associated videos to see texture model updating in action. We benchmarked
processing speed on a mid-range Intel i7-5960X desktop PC with a single NVidia
GTX 980Ti GPU. With our current multi-threaded C++/CUDA implementa-
tion the average processing speeds were 35, 27, 22, 17 and 31fps for cases (a-
e) respectively. We also ran our framework without the boundary constraints
(λbound = 0). This was to analyse its influence on tracking accuracy, and we
call this version R2D2-b. We show snapshot results from the videos in Fig. 2. In
Fig. 2(f–j) we show five columns corresponding to each case. The top image is an
example input image, the middle image shows DRBM matches (with coarse-scale
matches in green, fine-scale matches in blue, gross outliers in red) and the bound-
ary matches in yellow. The third image shows an overlay of the tracked surface
mesh. We show three other images with corresponding overlays in Fig. 2(l–n).
The light path on the uterus in Fig. 2(h) is a coagulation path used for interven-
tional incision planning, and it significantly changed the appearance. The haze
in Fig. 2(m) is a smoke plume. In Fig. 2(o) we show the overlay with and without
boundary constraints (top and bottom respectively). This is an example where
the boundary constraints have clearly improved tracking.
We tested how well KLT-based tracking worked by measuring how long it
could sustain tracks from the first video frames. Due to the challenging
conditions, KLT tracks dropped off quickly in most cases, mostly due to blur or
tool occlusions. Only in case (b) did some KLT tracks persist to the end; however,
they were limited to a small surface region which congregated around speculari-
ties (and therefore were drifting). By contrast our framework sustained tracking
through all videos. It is difficult to quantitatively evaluate tracking accuracy
in 3D without interventional radiological images, which were not available. We
therefore measured accuracy using 2D proxies. These were (i) Correspondence
Prediction Error (CPE) and (ii) Boundary Prediction Error (BPE). CPE tells us
how well the tracker aligns the model with respect to a set of manually located
point correspondences. We found approximately 20 per case, and located them
in 30 representative video frames. We then measured the distance (in pixels)
to their tracked positions. BPE tells us how well the tracker aligns the model’s
boundaries to the image. This was done by manually marking any contours in
Robust, Real-Time, Dense and Deformable 3D Organ Tracking 411
4 Conclusion
We have presented a new, integrated, robust and real-time solution for dense
tracking of deformable 3D soft-tissue organ models in laparoscopic videos. There
are a number of possible future directions. The main three are to investigate
automatic texture map updating, to investigate its performance using stereo
laparoscopic images, and to automatically detect when tracking fails.
References
1. Agisoft Photoscan. http://www.agisoft.com. Accessed 30 May 2016
2. Collins, T., Bartoli, A.: Realtime shape-from-template: system and applications.
In: ISMAR (2015)
3. Collins, T., Pizarro, D., Bartoli, A., Canis, M., Bourdel, N.: Computer-assisted
laparoscopic myomectomy by augmenting the uterus with pre-operative MRI data.
In: ISMAR (2014)
4. Egorov, V., Tsyuryupa, S., Kanilo, S., Kogit, M., Sarvazyan, A.: Soft tissue elas-
tometer. Med. Eng. Phys. 30(2), 206–212 (2008)
5. Haouchine, N., Dequidt, J., Berger, M., Cotin, S.: Monocular 3D reconstruction
and augmentation of elastic surfaces with self-occlusion handling. IEEE Trans. Vis.
Comput. Graph. 21(12), 1363–1376 (2015)
6. Haouchine, N., Dequidt, J., Peterlik, I., Kerrien, E., Berger, M.-O., Cotin, S.:
Image-guided simulation of heterogeneous tissue deformation for augmented reality
during hepatic surgery. In: ISMAR (2013)
7. Puerto-Souza, G., Cadeddu, J.A., Mariottini, G.: Toward long-term and accurate
augmented-reality for monocular endoscopic videos. IEEE Trans. Biomed. Eng.
61(10), 2609–2620 (2014)
8. Puerto-Souza, G., Mariottini, G.: A fast and accurate feature-matching algorithm
for minimally-invasive endoscopic images. TMI 32(7), 1201–1214 (2013)
9. Su, L.-M., Vagvolgyi, B.P., Agarwal, R., Reiley, C.E., Taylor, R.H., Hager, G.D.:
Augmented reality during robot-assisted laparoscopic partial nephrectomy: toward
real-time 3D-CT to stereoscopic video registration. Urology 73, 896–900 (2009)
10. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical report
CMU-CS-91-132 (1991)
Structure-Aware Rank-1 Tensor Approximation
for Curvilinear Structure Tracking Using
Learned Hierarchical Features
1 Introduction
Reliable tracking of vascular structures or intravascular devices in dynamic X-ray
images is essential for guidance during interventional procedures and postpro-
cedural analysis [1–3,8,13,14]. However, poor tissue contrast due to low radiation
dose, together with the lack of depth information, makes detecting and tracking
these curvilinear structures (CS) challenging. Traditional registration- and
alignment-based trackers depend on local image intensity or gradient. Without
high-level context information, they cannot reliably discriminate a low-contrast
target structure from a complex background. Confounding irrelevant structures,
on the other hand, challenge detection-based tracking.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 413–421, 2016.
DOI: 10.1007/978-3-319-46720-7_48
414 P. Chu et al.
Recently, a new solution has been proposed that exploits progress in multi-target
tracking [2]. After initially detecting candidate points on a CS, the idea is to
model CS tracking as a multi-dimensional assignment (MDA) problem, then a
tensor approximation is applied to search for a solution. The idea encodes high-
order temporal information and hence gains robustness against local ambiguity.
However, it lacks a mechanism to encode the structure prior of a CS, and the
features used in [2], computed via random forests, lack discriminative power.
Fig. 1. Flowchart of the proposed method (diagram labels: build links; spatial interaction; likelihood on neighbor model).
In this paper, we present a new method (refer to Fig. 1 for the flowchart) to
detect and track CS in dynamic X-ray sequences. First, a convolutional neural
network (CNN) is used to detect candidate landmarks on CS. CNN automati-
cally learns the hierarchical representations of input images [6,7] and has been
recently used in medical image analysis (e.g. [9,10]). With the detected CS can-
didates, CS tracking is converted to a multiple target tracking problem and then
a multi-dimensional assignment (MDA) one. In MDA, candidates are associated
along motion trajectories across time, with the association constructed
according to the trajectory affinity. It has been shown in [11] that MDA can be
efficiently solved via rank-1 tensor approximation (R1TA), in which the goal is
to seek vectors that maximize the “joint projection” of an affinity tensor. Following
a similar procedure, our solution adopts R1TA to estimate the CS motion.
Specifically, a high-order tensor is first constructed from all trajectory candidates
over a time span. Then, the model prior of CS is integrated into R1TA encoding
the spatial interaction between adjacent candidates in the model. Finally, CS
tracking results are inferred from model likelihood.
The main contributions of our work are two-fold. (1) We propose a
structure-aware tensor approximation framework for CS tracking by considering
the spatial interaction between CS components. The combination of such spatial
interaction and higher order temporal information effectively reduces association
ambiguity and hence improves the tracking robustness. (2) We design a discrim-
inative CNN detector for CS candidate detection. Compared with traditional
hand-crafted features, the learned CNN features show very high detection qual-
ity in identifying CS from low-visibility dynamic X-ray images. As a result, it
greatly reduces the number of hypothesis trajectories and improves the tracking
efficiency.
Probability Map. For each image in the sequence except the first one, which has
manually annotated groundtruth, a CS probability map is computed by the learned
classifier. A threshold is set to eliminate most of the false alarms in the image.
The resulting images are further processed by filtering and thinning. The
binarized probability map is filtered by a distance mask in which locations too
far from the model are excluded. Instead of using a groundtruth bounding box, we
take the tracking results from previous image batches: based on the previously
tracked model, we calculate the speed and acceleration of the target to predict
its position in the next image batch. Finally, after removing isolated pixels, CS
candidates are generated from the thinning results. Examples of detection results are shown
in Fig. 3. For comparison, probability maps obtained by a random forest classifier
with hand-crafted features [2] are also shown. Our probability maps contain fewer
false alarms, which yields more accurate candidate locations after post-processing.
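A minimal sketch of this post-processing chain, assuming a generic probability map and a binary mask of the predicted model position (the threshold, distance, and size values below are illustrative, and the thinning step is omitted for brevity):

```python
import numpy as np
from scipy import ndimage

def extract_candidates(prob_map, model_mask, thresh=0.5,
                       max_dist=30.0, min_size=5):
    """Post-process a CNN probability map into CS candidate pixels:
    threshold, mask out locations too far from the predicted model
    position, and drop isolated specks."""
    binary = prob_map >= thresh
    # Distance (in pixels) from every location to the predicted model
    dist = ndimage.distance_transform_edt(~model_mask)
    binary &= dist <= max_dist
    # Remove isolated pixels / tiny components
    labels, n = ndimage.label(binary)
    sizes = ndimage.sum(binary, labels, index=np.arange(1, n + 1))
    keep = np.isin(labels, 1 + np.flatnonzero(sizes >= min_size))
    return keep
```

The surviving components would then be thinned (e.g. skeletonised) to obtain the final candidate points.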
Fig. 3. Probability maps and detected candidates of a vessel (left) and catheter (right).
For each example, from left to right are groundtruth, random forests result, and CNN
result, respectively. Red indicates regions with high probability, while green dots
show the resulting candidates.
The trajectory association objective is

$$ f(\mathcal{X}) = \sum_{j_1, \ldots, j_K} c_{j_1 j_2 \ldots j_K}\, x^{(1)}_{j_1} x^{(2)}_{j_2} \cdots x^{(K)}_{j_K} + \sum_{k=1}^{K} \sum_{l_k, j_k} w^{(k)}_{l_k j_k}\, e^{(k)}_{l_k j_k}\, x^{(k)}_{l_k} x^{(k)}_{j_k}, \quad (1) $$

where $c_{j_1 j_2 \ldots j_K}$ is the affinity measuring trajectory confidence; $w^{(k)}_{l_k j_k}$ the likelihood
that candidates $x^{(k)}_{j_k}$ and $x^{(k)}_{l_k}$ are neighboring on the model; and $e^{(k)}_{l_k j_k}$ the
spatial interaction of two candidates on two consecutive frames. The affinity has
two parts,

$$ c_{i_0 i_1 \ldots i_K} = \mathrm{app}_{i_0 i_1 \ldots i_K} \times \mathrm{kin}_{i_0 i_1 \ldots i_K}, \quad (2) $$

where $\mathrm{app}_{i_0 i_1 \ldots i_K}$ describes the appearance consistency of the trajectory, and
$\mathrm{kin}_{i_0 i_1 \ldots i_K}$ the kinetic affinity modeling the higher-order temporal affinity as
detailed in [2].
Model Prior. CS candidates share two kinds of spatial constraints. First, the
trajectories of two neighboring elements should have similar directions. Second,
the relative order of two neighboring elements should not change, so that
re-composition of the CS is prohibited. Thus inspired, we formulate the spatial
interaction of two candidates as

$$ e^{(k)}_{l_k j_k} \doteq e_{m_{k-1} m_k i_{k-1} i_k} = E_{\mathrm{para}} + E_{\mathrm{order}}, \quad (3) $$
where

$$ E_{\mathrm{para}} = \frac{(o^{(k-1)}_{i_{k-1}} - o^{(k)}_{i_k}) \cdot (o^{(k-1)}_{m_{k-1}} - o^{(k)}_{m_k})}{\|o^{(k-1)}_{i_{k-1}} - o^{(k)}_{i_k}\| \, \|o^{(k-1)}_{m_{k-1}} - o^{(k)}_{m_k}\|}, \qquad E_{\mathrm{order}} = \frac{(o^{(k-1)}_{i_{k-1}} - o^{(k-1)}_{m_{k-1}}) \cdot (o^{(k)}_{i_k} - o^{(k)}_{m_k})}{\|o^{(k-1)}_{i_{k-1}} - o^{(k-1)}_{m_{k-1}}\| \, \|o^{(k)}_{i_k} - o^{(k)}_{m_k}\|}, $$

such that $E_{\mathrm{para}}$ models the angle between two neighboring trajectories, which also
penalizes large distance changes between them, and $E_{\mathrm{order}}$ models the relative
order of two adjacent candidates via the inner product of the vectors between two
neighboring candidates.
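A small sketch of these two terms, written (as an assumption consistent with "models the angle") as cosines of the angles between the corresponding vectors:

```python
import numpy as np

def spatial_interaction(o_i_prev, o_i_cur, o_m_prev, o_m_cur):
    """Spatial interaction e = E_para + E_order between the trajectories
    of two neighbouring candidates (Eq. 3), using normalised inner
    products so each term is the cosine of an angle."""
    def cos(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    # E_para: angle between the two motion vectors (trajectory directions)
    e_para = cos(o_i_prev - o_i_cur, o_m_prev - o_m_cur)
    # E_order: relative order of the neighbours before and after the motion
    e_order = cos(o_i_prev - o_m_prev, o_i_cur - o_m_cur)
    return e_para + e_order
```

When both candidates translate identically, both cosines are 1, so a rigid common motion receives the maximal interaction score of 2.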
Maximizing Eq. 1 closely correlates with rank-1 tensor approximation
(R1TA) [4], which aims to approximate a tensor by the tensor product of unit
vectors up to a scale factor. By relaxing the integer constraint on the assignment
variables, once a real-valued solution of X^(k) is obtained, it can be binarized
using the Hungarian algorithm [5]. The key issue here is to accommodate the
row/column ℓ1 normalization of a general assignment problem, which differs
from the ℓ2 norm constraint commonly used in tensor factorization. We develop
an approach similar to [11], a tensor power iteration solution with ℓ1
row/column normalization.
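As a toy two-frame analogue of this scheme (the full method iterates over the modes of a higher-order tensor; here a single pairwise affinity matrix stands in), one can alternate multiplicative updates with ℓ1 row/column normalisation and binarise the relaxed solution with the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def l1_tensor_assignment(C, iters=50):
    """Relax the assignment matrix X, alternate multiplicative updates
    with row/column l1 normalisation (Sinkhorn-like), then binarise
    with the Hungarian algorithm."""
    X = np.full_like(C, 1.0 / C.shape[1])
    for _ in range(iters):
        X = X * C                           # power-iteration style update
        X /= X.sum(axis=1, keepdims=True)   # row l1 normalisation
        X /= X.sum(axis=0, keepdims=True)   # column l1 normalisation
    rows, cols = linear_sum_assignment(-X)  # maximise the relaxed scores
    A = np.zeros_like(C)
    A[rows, cols] = 1.0
    return A
```

The multiplicative update concentrates mass on high-affinity entries, while the ℓ1 normalisations keep the relaxed X close to a doubly stochastic assignment.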
Model Likelihood. The coefficient $w^{(k)}_{l_k j_k} \doteq w^{(k)}_{m_{k-1} m_k i_{k-1} i_k}$ measures the likelihood
that two candidates $o^{(k-1)}_{i_{k-1}}$ and $o^{(k-1)}_{m_{k-1}}$ are neighboring on the model. In order to
obtain the association of each candidate pair in each frame, or in other words, to
measure the likelihood of a candidate $o^{(k)}_{i_k}$ matching a model element $o^{(0)}_{i_0}$, we
maintain a “soft assignment”. In particular, we use $\theta^{(k)}_{i_0 i_k}$ to indicate the likelihood
that $o^{(k)}_{i_k}$ corresponds to $o^{(0)}_{i_0}$. It can be estimated by

$$ \Theta^{(k)} = \Theta^{(k-1)} X^{(k)}, \quad k = 1, 2, \ldots, K, \quad (4) $$

where $\Theta^{(k)} = (\theta^{(k)}_{i_0 i_k}) \in \mathbb{R}^{I_0 \times I_k}$ and $\Theta^{(0)}$ is fixed as the identity matrix.
The model likelihood is updated in each step of the power iteration. After the
update of the first term in Eq. 1, a pre-likelihood $\Theta^{(k)}$ is estimated for computing
$w^{(k)}_{l_k j_k}$. Since $\Theta^{(k)}$ associates candidates directly with the model, the final tracking
result of the matching between $o^{(0)}$ and $o^{(k)}$ can be derived from $\Theta^{(k)}$.
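The propagation in Eq. 4 is a chain of matrix products over the per-frame assignment matrices, which can be sketched as:

```python
import numpy as np

def propagate_likelihood(assignments):
    """Propagate the soft model likelihood Theta^(k) = Theta^(k-1) X^(k)
    (Eq. 4) along a list of per-frame assignment matrices X^(1)..X^(K),
    starting from Theta^(0) = I."""
    theta = np.eye(assignments[0].shape[0])
    history = [theta]
    for X in assignments:
        theta = theta @ X
        history.append(theta)
    return history
```

With hard (permutation) assignments the propagated Θ stays a permutation; with relaxed assignments it spreads likelihood over nearby model elements.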
With $\Theta^{(k)}$, the approximated distance on the model between $o^{(k-1)}_{i_{k-1}}$ and $o^{(k-1)}_{m_{k-1}}$
can be calculated as

$$ d^{(k)}_{i_k m_k} = \frac{\sum_{i_0} \|o^{(0)}_{i_0} - o^{(0)}_{i_0+1}\| \, \theta^{(k)}_{i_0 i_k}\, \theta^{(k)}_{i_0+1,\, m_k}}{\sum_{i_0} \theta^{(k)}_{i_0 i_k}\, \theta^{(k)}_{i_0+1,\, m_k}}. \quad (5) $$

Thereby, $w^{(k)}_{l_k j_k}$ can then be simply calculated as

$$ w^{(k)}_{l_k j_k} \doteq w^{(k)}_{m_{k-1} m_k i_{k-1} i_k} = \frac{2\, d^{(k-1)}_{i_{k-1} m_{k-1}}\, \bar{d}}{\big(d^{(k-1)}_{i_{k-1} m_{k-1}}\big)^2 + \bar{d}^2}, \quad (6) $$

where $\bar{d}$ is the average distance between two neighboring elements on the model $O^{(0)}$.
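A direct sketch of Eqs. 5 and 6, assuming the model is an ordered polyline of 2D points and Θ is given as a dense matrix:

```python
import numpy as np

def model_distance(theta, model_pts, i_k, m_k):
    """Approximated on-model distance between two candidates (Eq. 5):
    a soft-assignment-weighted average of the distances between
    consecutive model elements."""
    seg = np.linalg.norm(np.diff(model_pts, axis=0), axis=1)  # |o_i0 - o_{i0+1}|
    w = theta[:-1, i_k] * theta[1:, m_k]
    return float((seg * w).sum() / (w.sum() + 1e-12))

def neighbour_likelihood(d, d_bar):
    """Likelihood weight w (Eq. 6): peaks at 1 when the approximated
    distance d equals the average neighbour spacing d_bar."""
    return 2.0 * d * d_bar / (d ** 2 + d_bar ** 2)
```

Eq. 6 is a smooth bump that rewards candidate pairs whose inferred model distance matches the expected spacing of neighbouring model elements.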
The proposed tracking method is summarized in Algorithm 1.
4 Experiments
We evaluate the proposed CS tracking algorithm using two groups of X-ray clin-
ical data collected from liver and cardiac interventions. The first group consists
of six sequences of liver vessel images and the second 11 sequences of catheter
images, each with around 20 frames. The images are acquired at 512 × 512 pixels
with a physical resolution of 0.345 or 0.366 mm per pixel. The groundtruth of each
image is manually annotated (Fig. 4(a)).
Vascular Structure Tracking. We first evaluate the proposed algorithm on
the vascular sequences. The first frame of each sequence is used to generate
training samples for the CNN: specifically, 800 vascular structure patches and
1500 negative patches are generated from each image. From the six images, a total
of 2300 × 6 = 13,800 samples are extracted and split into 75 % training and 25 %
validation. All patches have the same size of 28 × 28 pixels. The distance
threshold of the predictive bounding box is set to 60 pixels for sufficient error
tolerance. Finally, around 200 vascular structure candidates are left in each
frame. The number of points on the model is around 50 for each sequence.
In our work, K = 3 is used, so that every four consecutive frames are associated.
During tracking, the tensor kernel takes around 10 s and 100 MB (peak) of RAM to
process one frame with 200 candidates in our setting, running on a single Intel
Xeon@2.3 GHz core. The tracking error is defined as the shortest
distance between tracked pixels and groundtruth annotation. For each perfor-
mance metric, we compute its mean and standard deviation. For comparison, the
registration-based (RG) approach [14], bipartite graph matching [2] (BM) and
pure tensor based method [2] (TB) are applied to the same sequences. For BM
and TB, the same tracking algorithms but with the CNN detector are also tested
and reported. The first block of Fig. 4 illustrates the tracking results of vascular
structures. B-spline is used to connect all tracked candidates to represent the
tracked vascular structure. The zoom-in view of a selected region (rectangle in
blue) in each tracking result is presented below, where portions with large errors
are colored red. Quantitative evaluation for each sequence is listed in Table 1.
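The error metric described above (shortest distance from each tracked pixel to the groundtruth, summarised by mean and standard deviation) can be sketched with a KD-tree for the nearest-neighbour lookup:

```python
import numpy as np
from scipy.spatial import cKDTree

def tracking_error(tracked_px, gt_px):
    """Per-pixel tracking error: the shortest distance from each tracked
    pixel to the groundtruth annotation, summarised by mean and std."""
    d, _ = cKDTree(gt_px).query(tracked_px)
    return d.mean(), d.std()
```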
Catheter Tracking. Similar procedures and parameters are applied to the
11 sequences of catheter images. The second block of Fig. 4 shows examples of
catheter tracking results. The numerical comparisons are listed in Table 1.
Fig. 4. Curvilinear structure tracking results. (a) Groundtruth, (b) registration, (c)
bipartite matching, (d) tensor-based, and (e) the proposed method. Red indicates
regions with large errors, while green indicates small errors.
The results show that our method clearly outperforms the other three
approaches. Candidates in our approach are detected by a highly accurate CNN
detector, ensuring that most extracted candidates lie on the CS, whereas the
registration-based method depends on the first frame as a reference to identify
targets. Our approach also improves on bipartite graph matching, where K = 1:
the proposed method incorporates higher-order temporal information from multiple
frames, whereas bipartite matching is computed from only two frames. Compared
with the pure tensor-based algorithm, the proposed method incorporates the model
prior, which provides more powerful clues for tracking the whole CS. As confirmed
by the zoom-in views, with the model prior our method is less affected by
neighboring confounding structures.
5 Conclusion
References
1. Baert, S.A., Viergever, M.A., Niessen, W.J.: Guide-wire tracking during endovas-
cular interventions. IEEE Trans. Med. Imaging 22(8), 965–972 (2003)
2. Cheng, E., Pang, Y., Zhu, Y., Yu, J., Ling, H.: Curvilinear structure tracking by
low rank tensor approximation with model propagation. In: IEEE Conference on
Computer Vision and Pattern Recognition, pp. 3057–3064 (2014)
3. Cheng, J.Z., Chen, C.M., Cole, E.B., Pisano, E.D., Shen, D.: Automated delin-
eation of calcified vessels in mammography by tracking with uncertainty and graph-
ical linking techniques. IEEE Trans. Med. Imaging 31(11), 2143–2155 (2012)
4. De Lathauwer, L., De Moor, B., Vandewalle, J.: On the best rank-1 and rank-
(r1, r2,. . ., rn) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl.
21(4), 1324–1342 (2000)
5. Frank, A.: On Kuhn’s Hungarian method —a tribute from Hungary. Nav. Res.
Logistics 52(1), 2–5 (2005)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Sys-
tems, pp. 1097–1105 (2012)
7. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to
document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
8. Palti-Wasserman, D., Brukstein, A.M., Beyar, R.P.: Identifying and tracking a
guide wire in the coronary arteries during angioplasty from X-ray images. IEEE
Trans. Biomed. Eng. 44(2), 152–164 (1997)
9. Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M.: Deep feature
learning for knee cartilage segmentation using a triplanar convolutional neural
network. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI
2013, Part II. LNCS, vol. 8150, pp. 246–253. Springer, Heidelberg (2013)
10. Roth, H.R., Wang, Y., Yao, J., Lu, L., Burns, J.E., Summers, R.M.: Deep convo-
lutional networks for automated detection of posterior-element fractures on spine
CT. In: SPIE Medical Imaging, p. 97850 (2016)
11. Shi, X., Ling, H., Xing, J., Hu, W.: Multi-target tracking by rank-1 tensor approx-
imation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
2387–2394 (2013)
12. Vedaldi, A., Lenc, K.: MatConvNet – convolutional neural networks for MATLAB.
In: Proceedings of the ACM International Conference on Multimedia (2015)
13. Wang, P., Chen, T., Zhu, Y., Zhang, W., Zhou, S.K., Comaniciu, D.: Robust
guidewire tracking in fluoroscopy. In: IEEE Conference on Computer Vision and
Pattern Recognition, pp. 691–698 (2009)
14. Zhu, Y., Tsin, Y., Sundar, H., Sauer, F.: Image-based respiratory motion com-
pensation for fluoroscopic coronary roadmapping. In: Jiang, T., Navab, N., Pluim,
J.P.W., Viergever, M.A. (eds.) MICCAI 2010, Part III. LNCS, vol. 6363, pp. 287–
294. Springer, Heidelberg (2010)
Real-Time Online Adaption for Robust
Instrument Tracking and Pose Estimation
the way to objective quality assessment for RM. Surgical tool tracking has been
investigated in different medical specialties: nephrectomy [2], neurosurgery [3],
laparoscopy/endoscopy [4,5]. However, RM presents specific challenges, such as
strong illumination changes, blur, and variability of surgical instrument
appearance, which make the aforementioned approaches not directly applicable in
this scenario. Among the several works recently proposed in the field of tool tracking
for RM, Pezzementi et al. [6] suggested to perform the tracking in two steps: first
via appearance modeling, which computes a pixel-wise probability of class mem-
bership (foreground/background), then filtering, which estimates the current
tool configuration. Richa et al. [7] employ mutual information for tool tracking.
Sznitman et al. [8] introduced a joint algorithm which simultaneously performs
tool detection and tracking. The tool configuration is parametrized, and tracking
is modeled as a Bayesian filtering problem. Subsequently, in [9], they propose
to use a gradient-based tracker to estimate the tool’s ROI followed by fore-
ground/background classification of the ROI’s pixels via boosted cascade. In
[10], a gradient boosted regression tree is used to create a multi-class classifier
which is able to detect different parts of the instrument. Li et al. [11] present
a multi-component tracking, i.e. a gradient-based tracker able to capture the
movements and an online-detector to compensate tracking losses.
In this paper, we introduce a robust closed-loop framework to track and
localize the instrument parts in in-vivo RM sequences in real-time, based on the
dual-random forest approach for tracking and pose estimation proposed in [12].
A fast tracker directly employs the pixel intensities in a random forest to infer
the tool tip bounding box in every frame. To cope with the strong illumina-
tion changes affecting the RM sequences, one of the main contributions of our
paper is to adapt the offline model to online information while tracking, so to
incorporate the appearance changes learned by the trees with real photometric
distortions witnessed at test time. This offline learning - online adaption leads to
a substantial capability regarding the generalization to unseen sequences. Sec-
ondly, within the estimated bounding box, another random forest predicts the
locations of the tool joints based on gradient information. Differently from [12],
we enforce spatio-temporal constraints by means of a Kalman filter [13]. As
a third contribution of this work, we propose to “close the loop” between the
tracking and 2D pose estimation by obtaining a joint prediction concerning the
template position acquired by merging the outcome of the two separate forests
through the confidence of their estimation. Such cooperative prediction will in
turn provide pose information for the tracker, improving its robustness and accu-
racy. The performance of the proposed approach is quantitatively evaluated on
two different in-vivo RM datasets and demonstrates remarkable advantages with
respect to the state of the art in terms of robustness and generalization.
2 Method
In this section, we discuss the proposed method, for which an overview is depicted
in Fig. 1. First, a fast intensity-based tracker locates a template around the
424 N. Rieke et al.
Fig. 1. Framework: The description of the tracker, sampling and online learning can
be found in Sect. 2.1. The pose estimator and Kalman filter is presented in Sect. 2.2.
Details on the integrator are given in Sect. 2.3.
instrument tips using an offline trained model based on random forest (RF) and
the location of the template in the previous frame. Within this ROI, a pose
estimator based on HOG recovers the three joints employing another offline
learned RF and filters the result by temporal-spatial constraints. To close the
loop, the output is propagated to an integrator, aimed at merging together
the intensity-based and gradient-based predictions in a synergic way in order
to provide the tracker with an accurate template location for the prediction in
the next frame. Simultaneously, the refined result is propagated to a separate
thread which adapts the model of the tracker to the current data characteristics
via online learning.
A central element in this approach is the definition of the tracked template,
which we define by the landmarks of the forceps. Let $(L, R, C) \in \mathbb{R}^{2\times3}$ be the
left, right and central joints of the instrument. Then the midpoint between the tips
is given by $M = (L + R)/2$, and the 2D similarity transform from the patch coordinate
system to the frame coordinate system can be defined as

$$ H = \begin{bmatrix} s\cos(\theta) & -s\sin(\theta) & C_x \\ s\sin(\theta) & s\cos(\theta) & C_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 30 \\ 0 & 0 & 1 \end{bmatrix}, $$

with $s = \frac{b}{100} \cdot \max\{\|L - C\|_2, \|R - C\|_2\}$ and $\theta = \cos^{-1}\!\big(\frac{M_y - C_y}{\|M - C\|_2}\big)$ for a fixed
patch size of 100×150 pixels and $b \in \mathbb{R}$ defining the relative size. In this way,
the entire instrument tip is enclosed by the template and aligned with the tool’s
direction. In the following, details of the different components are presented.
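A small sketch of this template transform, assuming the scale reads $s = (b/100)\cdot\max\{\|L-C\|_2, \|R-C\|_2\}$ as above (joints are given as 2D points):

```python
import numpy as np

def template_transform(L, R, C, b=1.0):
    """2D similarity transform H mapping the 100x150 patch coordinate
    system to the frame, aligned with the tool direction; L, R are the
    tip joints and C the central joint."""
    L, R, C = map(np.asarray, (L, R, C))
    M = (L + R) / 2.0                       # midpoint between the tips
    s = b / 100.0 * max(np.linalg.norm(L - C), np.linalg.norm(R - C))
    theta = np.arccos((M[1] - C[1]) / (np.linalg.norm(M - C) + 1e-12))
    S = np.array([[s * np.cos(theta), -s * np.sin(theta), C[0]],
                  [s * np.sin(theta),  s * np.cos(theta), C[1]],
                  [0.0, 0.0, 1.0]])
    T = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 30.0],
                  [0.0, 0.0, 1.0]])
    return S @ T
```

The left factor rotates, scales, and translates into the frame; the right factor offsets the patch origin so the template encloses the entire instrument tip.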
function. Thus, the tracker learns a generalized model of the tool based on mul-
tiple templates, taken as the tool undergoes different movements in a variety of
environmental settings, and predicts the translation parameter from the inten-
sity values at n random points {xp }np=1 within the template, similar to [12].
In addition, we assume a piecewise constant velocity between consecutive frames.
Therefore, given the image It at time t and the translation vector of the template
from t − 2 to t − 1 as vt−1 = (vx, vy)ᵀ, the input to the forest is a feature vector
concatenating the intensity values at the current location of the template It(xp)
with the velocity vector vt−1, assuming a constant time interval. In order to
learn the relation between the feature vector and the transformation update, we
use a random forest that follows a dimension-wise splitting of the feature vector,
such that the translation vectors at the leaves point to similar locations.
The cost of generalization is the inability to describe conditions that are
specific to a particular situation, such as the type of tool used in the surgery.
As a consequence, the robustness of the tracker is affected, since it cannot
confidently predict the location of the template for challenging frames that
deviate strongly from the generalized model. Hence, in addition to the offline learning
for a generalized tracker, we propose to perform an online learning strategy that
considers the current frames and learns the relation of the translation vector
with respect to the feature vector. The objective is to stabilize the tracker by
adapting its forest to the specific conditions at hand. In particular, we propose
to incrementally add new trees to the forest by using the predicted template
location on the current frames of the video sequence. To achieve this goal, we
impose random synthetic transformations on the bounding boxes that enclose
the templates to build the learning dataset with pairs of feature and transla-
tion vectors, such that the transformations emulate the motion of the template
between two consecutive frames. Thereafter, the resulting trees are added to the
existing forest, and the predictions for succeeding frames include both the
generalized and environment-specific trees. Notably, our online learning approach
does not learn from all incoming frames, but rather relies on a confidence
measure, introduced in Sect. 2.3, to evaluate and accumulate templates.
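A minimal sketch of building such a synthetic training set (function and parameter names are illustrative, not the authors' code): jitter the predicted box by random shifts emulating frame-to-frame motion, and pair sampled intensities with the shift that undoes the jitter.

```python
import numpy as np

def synthesise_training_pairs(frame, box, n=50, max_shift=8, rng=None):
    """Build (feature, translation) training pairs for new online trees
    from the predicted template location on the current frame."""
    rng = rng or np.random.default_rng()
    x0, y0, w, h = box
    pairs = []
    for _ in range(n):
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        # Sample intensities at random points inside the jittered box
        xs = np.clip(x0 + dx + rng.integers(0, w, 20), 0, frame.shape[1] - 1)
        ys = np.clip(y0 + dy + rng.integers(0, h, 20), 0, frame.shape[0] - 1)
        features = frame[ys, xs]
        target = np.array([-dx, -dy])   # translation that re-centres the box
        pairs.append((features, target))
    return pairs
```

New trees trained on these pairs are appended to the offline forest, so subsequent predictions mix the generalized and environment-specific models.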
During pose estimation, we model a direct mapping between image features and
the location of the three joints in the 2D space of the patch. Similar to [12], we
employ HOG features around a pool of randomly selected pixel locations within
the provided ROI as an input to the trees in order to infer the pixel offsets to
the joint positions. Since the HOG feature vector is extracted as in [14], the
splitting function of the trees considers only one dimension of the vector and is
optimized by means of information gain. The final vote is aggregated by a dense-
window algorithm. The predicted offsets to the joints in the reference frame of
the patch are back-warped onto the frame coordinate system. Up to now, the
forest considers every input as a still image. However, the surgical movement
is usually continuous. Therefore, we enforce a temporal-spatial relationship for
all joint locations via a Kalman filter [13] by employing the 2D location of the
joints in the frame coordinate system and their frame-to-frame velocity.
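A constant-velocity Kalman filter over one 2D joint can be sketched as follows (the noise parameters are illustrative assumptions, not values from the paper):

```python
import numpy as np

class JointKalman:
    """Constant-velocity Kalman filter on a 2D joint location:
    state = (x, y, vx, vy), measurement = (x, y)."""
    def __init__(self, q=1e-2, r=1.0):
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0  # motion model
        self.H = np.eye(2, 4)                                  # observe position
        self.Q = q * np.eye(4)                                 # process noise
        self.R = r * np.eye(2)                                 # measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4) * 10.0

    def step(self, z):
        # Predict with the constant-velocity model
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the measured joint position z
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

Running one such filter per joint smooths the per-frame forest predictions while respecting the continuity of the surgical movement.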
In the offline learning for the tracker, we trained 100 trees per parameter,
employed 20 random intensity values and velocity as feature vectors, and used
500 sample points. For the pose estimation, we used 15 trees, and the HOG
features are set to a bin size of 9 and a pixel resolution of 50×50.
Fig. 3. Szn-dataset: sequential and combined evaluation for sequences 1–3. For over
93 % of the frames, the results are so close that the individual graphs are not
distinguishable.
Fig. 4. Rie-dataset: Cross validation evaluation – the offline forests are learned on
three sequences and tested on the unseen one.
Table 2. Strict PCP for cross validation of Rie-dataset for Left and Right fork.
Methods Set I (L/R) Set II (L/R) Set III (L/R) Set IV (L/R)
Our work 89.0/88.5 98.5/99.5 99.5/99.5 94.5/95.0
POSE [12] 69.7/58.5 93.94/93.43 94.47/94.47 46.46/57.71
4 Conclusion
In this work, we propose a closed-loop framework for tool tracking and pose
estimation which runs at 40 fps. A combination of separate predictors yields a
robustness that withstands the challenges of RM sequences. The work further shows
the method's capability to generalize to unseen instruments and illumination
changes through online adaption. These key drivers allow our method to outperform
the state of the art on two benchmark datasets.
References
1. Ehlers, J.P., Kaiser, P.K., Srivastava, S.K.: Intraoperative optical coherence tomog-
raphy using the rescan 700: preliminary results from the discover study. Br. J.
Ophthalmol. 98, 1329–1332 (2014)
2. Reiter, A., Allen, P.K.: An online learning approach to in-vivo tracking using syn-
ergistic features. In: IROS, pp. 3441–3446 (2010)
3. Bouget, D., Benenson, R., Omran, M., Riffaud, L., Schiele, B., Jannin, P.: Detecting
surgical tools by modelling local appearance and global shape. IEEE Trans. Med.
Imaging 34(12), 2603–2617 (2015)
4. Allan, M., Chang, P.L., Ourselin, S., Hawkes, D., Sridhar, A., Kelly, J.,
Stoyanov, D.: Image based surgical instrument pose estimation with multi-class
labelling and optical flow. In: MICCAI, pp. 331–338 (2015)
5. Wolf, R., Duchateau, J., Cinquin, P., Voros, S.: 3D tracking of laparoscopic instru-
ments using statistical and geometric modeling. In: Fichtinger, G., Martel, A.,
Peters, T. (eds.) MICCAI 2011, Part I. LNCS, vol. 6891, pp. 203–210. Springer,
Heidelberg (2011)
6. Pezzementi, Z., Voros, S., Hager, G.D.: Articulated object tracking by rendering
consistent appearance parts. In: ICRA, pp. 3940–3947 (2009)
7. Richa, R., Balicki, M., Meisner, E., Sznitman, R., Taylor, R., Hager, G.: Visual
tracking of surgical tools for proximity detection in retinal surgery. In: Taylor, R.H.,
Yang, G.-Z. (eds.) IPCAI 2011. LNCS, vol. 6689, pp. 55–66. Springer, Heidelberg
(2011)
8. Sznitman, R., Basu, A., Richa, R., Handa, J., Gehlbach, P., Taylor, R.H.,
Jedynak, B., Hager, G.D.: Unified detection and tracking in retinal micro-
surgery. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011, Part I.
LNCS, vol. 6891, pp. 1–8. Springer, Heidelberg (2011)
9. Sznitman, R., Ali, K., Richa, R., Taylor, R.H., Hager, G.D., Fua, P.: Data-driven
visual tracking in retinal microsurgery. In: Ayache, N., Delingette, H., Golland, P.,
Mori, K. (eds.) MICCAI 2012, Part II. LNCS, vol. 7511, pp. 568–575. Springer,
Heidelberg (2012)
430 N. Rieke et al.
10. Sznitman, R., Becker, C., Fua, P.: Fast part-based classification for instrument
detection in minimally invasive surgery. In: Golland, P., Hata, N., Barillot, C.,
Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part II. LNCS, vol. 8674, pp. 692–
699. Springer, Heidelberg (2014)
11. Li, Y., Chen, C., Huang, X., Huang, J.: Instrument tracking via online learn-
ing in retinal microsurgery. In: Golland, P., Hata, N., Barillot, C., Hornegger, J.,
Howe, R. (eds.) MICCAI 2014, Part I. LNCS, vol. 8673, pp. 464–471. Springer,
Heidelberg (2014)
12. Rieke, N., Tan, D.J., Alsheakhali, M., Tombari, F., Amat di San Filippo, C.,
Belagiannis, V., Eslami, A., Navab, N.: Surgical tool tracking and pose estima-
tion in retinal microsurgery. In: MICCAI, pp. 266–273 (2015)
13. Haykin, S.S.: Kalman Filtering and Neural Networks. Wiley, Hoboken (2001)
14. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection
with discriminatively trained part-based models. PAMI 32(9), 1627–1645 (2010)
15. Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction
for human pose estimation. In: CVPR, pp. 1–8 (2008)
Integrated Dynamic Shape Tracking and RF
Speckle Tracking for Cardiac Motion Analysis
1 Introduction
Characterization of left ventricular myocardial deformation is useful for the
detection and diagnosis of cardiovascular diseases. Conditions such as ischemia
and infarction undermine the contractile property of the LV and analyzing
Lagrangian strains is one way of identifying such abnormalities.
Numerous methods calculate dense myocardial motion fields and then com-
pute strains using echo images. Despite the substantial interest in Lagrangian
motion and strains, and some recent contributions in spatio-temporal tracking
[1,2], most methods typically calculate frame-to-frame or Eulerian displacements
first and then obtain Lagrangian trajectories.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 431–438, 2016.
DOI: 10.1007/978-3-319-46720-7_51
432 N. Parajuli et al.
2 Methods
1. It will not move too far away from the previous point x_{i−1,A_j(i−1)} and the starting point x_{1,j}. The same applies to its shape descriptor F_{i,A_j(i)}.
2. Its displacement will not differ substantially from that of the previous point. The same applies to its shape descriptors. We call these the 2nd-order weights.
Trajectories satisfying the above conditions will have closely spaced con-
secutive points with similar shape descriptors (Fig. 1b). They will also have
points staying close to the starting point and their shape descriptor remain-
ing similar to the starting shape descriptor, causing them to be more closed.
The 2nd order weights enforce smoothness and shape consistency.
Integrated DST and RF Speckle Tracking 433
Let A represent the set of all possible trajectories originating from x_{1,j}. Hence, Â_j ∈ A minimizes the following energy:

Â_j = argmin_{A_j ∈ A} Σ_{i=2}^{N} [ λ_1 ‖x_{i,A_j(i)} − x_{i−1,A_j(i−1)}‖ + λ_2 ‖x_{i,A_j(i)} − x_{1,j}‖
      + λ_3 ‖F_{i,A_j(i)} − F_{i−1,A_j(i−1)}‖ + λ_4 ‖F_{i,A_j(i)} − F_{1,j}‖ ] + 2nd-order weights   (1)
Graph-based techniques have been used in particle tracking applications such as [7]. We set each point x_{i,j} as a node in our graph. Directed edges exist between a point x_{i,j} and its neighbors η_{i,j} in frame i + 1. Each edge has an associated cost of traversal (defined by Eq. 1). The optimal trajectory Â_j is the one that accrues the smallest cost in traversing from the starting point x_{1,j} to the last frame. This can be solved using Dijkstra's shortest path algorithm.
Algorithm. Because our search path is causal, we do not compute all edge weights in advance. We start at x_{1,j} and proceed frame by frame, computing edge costs between points and their neighbors and dynamically updating a cost matrix E ∈ ℝ^{N×K} and a correspondence matrix P ∈ ℝ^{N×K}. The search for the trajectory A_J stemming from a point j = J in frame 1 is described in Algorithm 1.
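Because edges only connect consecutive frames, Dijkstra's search degenerates to a single causal forward sweep over the candidate trellis. The following is an illustrative Python sketch, not the authors' implementation; `edge_cost` and `neighbors` stand in for the Eq. 1 terms and the neighborhood η:

```python
def track_trajectory(points, start_j, edge_cost, neighbors):
    """Minimal sketch of the causal trajectory search (Algorithm 1 in spirit).

    points[i] lists the candidate points of frame i; edge_cost(i, a, b) is the
    traversal cost from candidate a in frame i to candidate b in frame i+1;
    neighbors(i, a) yields the reachable candidates in frame i+1.
    """
    N = len(points)
    INF = float("inf")
    # E[i][k]: cheapest cost to reach candidate k of frame i; P: predecessors.
    E = [[INF] * len(points[i]) for i in range(N)]
    P = [[-1] * len(points[i]) for i in range(N)]
    E[0][start_j] = 0.0                       # trajectory stems from x_{1,j}
    for i in range(N - 1):                    # causal forward pass
        for a, cost_a in enumerate(E[i]):
            if cost_a == INF:
                continue
            for b in neighbors(i, a):
                c = cost_a + edge_cost(i, a, b)
                if c < E[i + 1][b]:
                    E[i + 1][b], P[i + 1][b] = c, a
    # Backtrack from the cheapest endpoint in the last frame.
    k = min(range(len(E[-1])), key=E[-1].__getitem__)
    traj = [k]
    for i in range(N - 1, 0, -1):
        k = P[i][k]
        traj.append(k)
    return traj[::-1], E
```

On a trellis with edges restricted to adjacent frames, this forward pass computes the same shortest path as Dijkstra's algorithm while visiting each edge exactly once.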
Agreement with Crystal Data. In Fig. 3a, we show, for one baseline image,
strains calculated using our method (echo) and using sonomicrometry (crys). We
can see that the strain values correlate well and drift error is minimal. In Fig. 3b,
we present bar graphs to explicitly quantify the final frame drift as the distance
between actual crystal position and results from tracking. We compare the results
from this method (DST) against that of GRPM (described in [4,11]), which is a
frame-to-frame tracking method, in BL and SO conditions for 5 canine datasets.
The last-frame errors for crystal position were lower and statistically significant for both BL and SO conditions (p < .01).
Mean correlation values for the cubic array regions (ISC, BOR, REM) for all conditions are summarized in Table 1. We see slightly improved correlations from the SHP to the COMB method. Correlation values were generally lower for the ischemic region and for longitudinal strains for both methods.
Since we only had a few data points to carry out statistical analysis in this
format, we also calculated overall correlations (with strain values included for
all time points and conditions together, n > 500) and computed statistical sig-
nificance using Fisher’s transformation. Change in radial strains (SHP r = .72
to COMB r = .82) was statistically significant (p < .01), while circumferential
(SHP r = .73 to COMB r = .75) and longitudinal (SHP r = .44 to COMB
r = .41) were not.
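The significance test above can be reproduced with a standard Fisher r-to-z comparison; z = atanh(r) is approximately normal with variance 1/(n − 3). The sketch below treats the two samples as independent with n = 500 each, both simplifying assumptions rather than the paper's exact analysis:

```python
import math

def fisher_z_test(r1, r2, n1, n2):
    """Two-sided z-test for a difference between two correlations using
    Fisher's r-to-z transformation (n1, n2 are the sample sizes)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return z, p
```

Under these assumptions, the radial change (.72 vs .82) comes out well below p = .01, while the circumferential change (.73 vs .75) does not approach significance, matching the reported pattern.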
Table 1. Mean correlation values across regions for SHP and COMB methods.
Fig. 3. Peak strain bar graphs (with mean and standard deviations) for radial, circum-
ferential and longitudinal strains - shown across ISC, BOR and REM regions for echo
and crystal based strains.
Physiological Response. Changes in the crystal- and echo-based (combined method) strain magnitudes across the physiological testing conditions (BL, SO and SODOB) are shown in Fig. 3.
Both echo and crystal strain magnitudes generally decreased with severe
occlusion and increased beyond baseline levels with low dose dobutamine stress.
The fact that functional recovery was observed with dobutamine stress indicates that, at the given dose, ischemia was not enhanced. Rather, it appears that the vasodilatory and inotropic effects of dobutamine were able to overcome the effects of the occlusion.
However, on average, the strain-magnitude recovery is smaller in the ISC region than in the BOR and REM regions for both echo and crystals. For echo, the overall physiological response was more pronounced for radial strains.
4 Conclusion
The DST method provides improved temporal regularization, and drift errors have therefore been reduced, especially in the diastolic phase. A combined dense-field calculation method that integrates the DST results with RF speckle tracking provided good strains, which was validated by comparison with sonomicrometry-based strains. The correlation values were particularly good for radial and circumferential strains.
We also studied how strains vary across the ISC, BOR and REM regions (defined by the cuboidal array of crystals in the anterior LV wall) during the BL, SO and SODOB conditions. Strain magnitudes (particularly radial) varied
in keeping with the physiological conditions, and also in good agreement with
the crystal based strains.
We seek to improve our methods, as we note that the longitudinal strains and the strains in the ischemic region were less accurate. Also, the DST algorithm occasionally resulted in higher error at end-systole. Therefore, in the future, we
will enforce spatial regularization directly by solving for neighboring trajectories
together, where the edge weights will be influenced by the neighboring trajecto-
ries. We would also like to extend the method to work with point sets generated
from other feature generation processes than segmentation.
References
1. Craene, M., Piella, G., Camara, O., Duchateau, N., Silva, E., Doltra, A.,
Dhooge, J., Brugada, J., Sitges, M., Frangi, A.F.: Temporal diffeomorphic free-
form deformation: application to motion and strain estimation from 3D echocar-
diography. Med. Image Anal. 16(2), 427–450 (2012)
2. Ledesma-Carbayo, M.J., Kybic, J., Desco, M., Santos, A., Sühling, M.,
Hunziker, P., Unser, M.: Spatio-temporal nonrigid registration for ultrasound car-
diac motion estimation. IEEE Trans. Med. Imaging 24(9), 1113–1126 (2005)
3. Compas, C.B., Wong, E.Y., Huang, X., Sampath, S., Lin, B.A., Pal, P.,
Papademetris, X., Thiele, K., Dione, D.P., Stacy, M., et al.: Radial basis functions
for combining shape and speckle tracking in 4D echocardiography. IEEE Trans.
Med. Imaging 33(6), 1275–1289 (2014)
4. Parajuli, N., Compas, C.B., Lin, B.A., Sampath, S., ODonnell, M., Sinusas, A.J.,
Duncan, J.S.: Sparsity and biomechanics inspired integration of shape and speckle
tracking for cardiac deformation analysis. In: van Assen, H., Bovendeerd, P.,
Delhaas, T. (eds.) FIMH 2015. LNCS, vol. 9126, pp. 57–64. Springer, Heidelberg
(2015)
5. Huang, X., Dione, D.P., Compas, C.B., Papademetris, X., Lin, B.A., Bregasi, A.,
Sinusas, A.J., Staib, L.H., Duncan, J.S.: Contour tracking in echocardiographic
sequences via sparse representation and dictionary learning. Med. Image Anal.
18(2), 253–271 (2014)
6. Belongie, S., Malik, J., Puzicha, J.: Shape context: a new descriptor for shape
matching and object recognition. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.)
Advances in Neural Information Processing Systems, pp. 831–837. MIT Press,
Cambridge (2001)
7. Shafique, K., Shah, M.: A noniterative greedy algorithm for multiframe point cor-
respondence. IEEE Trans. Pattern Anal. Mach. Intell. 27(1), 51–65 (2005)
8. Chen, X., Xie, H., Erkamp, R., Kim, K., Jia, C., Rubin, J., O’Donnell, M.: 3-D
correlation-based speckle tracking. Ultrason. Imaging 27(1), 21–36 (2005)
9. Dione, D., Shi, P., Smith, W., DeMan, P., Soares, J., Duncan, J., Sinusas, A.: Three-
dimensional regional left ventricular deformation from digital sonomicrometry. In:
Proceedings of the 19th Annual International Conference of the IEEE Engineering
in Medicine and Biology Society, vol. 2, pp. 848–851. IEEE (1997)
10. Waldman, L.K., Fung, Y., Covell, J.W.: Transmural myocardial deformation in
the canine left ventricle. Normal in vivo three-dimensional finite strains. Circ. Res.
57(1), 152–163 (1985)
11. Lin, N., Duncan, J.S.: Generalized robust point matching using an extended free-
form deformation model: application to cardiac images. In: 2004 IEEE Interna-
tional Symposium on Biomedical Imaging: Nano to Macro, pp. 320–323. IEEE
(2004)
The Endoscopogram: A 3D Model
Reconstructed from Endoscopic Video Frames
1 Introduction
Modern radiation therapy treatment planning relies on imaging modalities like
CT for tumor localization. For throat cancer, an additional kind of medical
imaging, called endoscopy, is also taken at treatment planning time. Endoscopic
videos provide direct optical visualization of the pharyngeal surface and provide
information, such as a tumor’s texture and superficial (mucosal) spread, that is
not available on CT due to CT’s relatively low contrast and resolution. However,
the use of endoscopy for treatment planning is significantly limited by the fact
that (1) the 2D frames from the endoscopic video do not explicitly provide 3D
spatial information, such as the tumor’s 3D location; (2) reviewing the video
is time-consuming; and (3) the optical views do not provide the full geometric
conformation of the throat.
In this paper, we introduce a pipeline for reconstructing a 3D textured surface
model of the throat, which we call an endoscopogram, from 2D video frames.
The model provides (1) more complete 3D pharyngeal geometry; (2) efficient
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 439–447, 2016.
DOI: 10.1007/978-3-319-46720-7_51
440 Q. Zhao et al.
visualization; and (3) the opportunity to register endoscopy data with the CT,
thereby enabling transfer of the tumor contours and texture into the CT space.
State-of-the-art monocular endoscopic reconstruction techniques have been
applied in applications like colonoscopy inspection [1], laparoscopic surgery [2]
and orthopedic surgeries [3]. However, most existing methods cannot simul-
taneously deal with the following three challenges: (1) non-Lambertian sur-
faces; (2) non-rigid deformation of tissues across frames; and (3) poorly known
shape or motion priors. Our proposed pipeline deals with these problems using
(1) a Shape-from-Motion-and-Shading (SfMS) method [4] incorporating a new
reflectance model for generating single-frame-based partial reconstructions; and
(2) a novel geometry fusion algorithm for non-rigid fusion of multiple partial
reconstructions. Since our pipeline does not assume any prior knowledge on envi-
ronments, motion and shapes, it can be readily generalized to other endoscopic
applications in addition to our nasopharyngoscopy reconstruction problem.
In this paper we focus on the geometry fusion step mentioned above. The
challenge here is that all individual reconstructions are only partially overlapping
due to the constantly changing camera viewpoint, may have missing data (holes)
due to camera occlusion, and may be slightly deformed since the tissue may have
deformed between 2D frame acquisitions. Our main contribution in this paper is
the design of a novel groupwise surface registration algorithm that can deal with
these limitations. An additional contribution is an outlier geometry trimming
algorithm based on robust regression. We generate endoscopograms and validate
our registration algorithm with data from synthetic CT surface deformations and
endoscopic video of a rigid phantom and real patients.
Shape from Motion and Shading (SfMS). Our novel reconstruction method
[4] has been shown to be efficient in single-camera reconstruction of live
endoscopy data. The method leverages sparse geometry information obtained
by Structure-from-Motion (SfM), Shape-from-Shading (SfS) estimation, and a
novel reflectance model to characterize non-Lambertian surfaces. In summary, it
iteratively estimates the reflectance model parameters and a SfS reconstruction
surface for each individual frame under sparse SfM constraints derived within a
sliding time window. One drawback of this method is that large tissue deforma-
tion and lighting changes across frames can induce inconsistent individual SfS
reconstructions. Nevertheless, our experiments show that this kind of error can
be well compensated in the subsequent geometry fusion step. In the end, for
each frame Fi , a reconstruction Ri is produced as a triangle mesh and trans-
formed into the world space using the camera position parameters estimated
from SfM. Mesh faces that are nearly tangent to the camera viewing ray are
removed because they correspond to occluded regions. The end result of this is
that the reconstructions {Ri } have missing patches and different topology and
are only partially overlapping with each other.
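The tangency-based face removal can be sketched as follows: a face is dropped when the ray from its centroid to the camera nearly grazes its plane. This is an illustrative sketch, and the 80-degree threshold is an assumption, not the paper's value:

```python
import numpy as np

def cull_tangent_faces(vertices, faces, cam_pos, max_angle_deg=80.0):
    """Remove mesh faces nearly tangent to the camera viewing ray.

    vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices.
    Returns the subset of faces seen at an angle within the threshold.
    """
    tri = vertices[faces]                      # (F, 3, 3) triangle corners
    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    view = cam_pos - tri.mean(axis=1)          # ray from face centroid to camera
    view /= np.linalg.norm(view, axis=1, keepdims=True)
    # |cos| near 0 means the viewing ray grazes the face (near-tangent).
    cos = np.abs((n * view).sum(axis=1))
    keep = cos >= np.cos(np.radians(max_angle_deg))
    return faces[keep]
```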
Texture Mapping. The goal of texture mapping is to assign a color to each vertex v^k (superscripts refer to the vertex index) in the fused geometry R, which is estimated by the geometry fusion (Sect. 3) of all the registered individual frame surfaces {R_i}. Our idea is to find a corresponding point of v^k in a registered surface R_i and to trace back its color in the corresponding frame F_i. Since v^k
might have correspondences in multiple registered surfaces, we formulate this
procedure as a labeling problem and optimize a Markov Random Field (MRF)
energy function. In general, the objective function prefers pulling color from non-
boundary nearby points in {Ri }, while encouraging regional label consistency.
3 Geometry Fusion
Zhao et al. [9] proposed a pairwise surface registration algorithm, Thin Shell
Demons, that can handle topology change and missing data. We have extended
this algorithm into our groupwise situation.
Thin Shell Demons. Thin Shell Demons is a physics-motivated method that
uses geometric virtual forces and a thin shell model to estimate surface deforma-
tion. The so-called forces {f} between two surfaces {R_1, R_2} are vectors connecting automatically selected corresponding vertex pairs, i.e. {f(v^k) = u^k − v^k | v^k ∈ R_1, u^k ∈ R_2} (with some abuse of notation, we use k here to index correspondences). The algorithm regards the surfaces as elastic thin shells and produces a non-parametric deformation vector field φ : R_1 → R_2 by iteratively minimizing the energy function E(φ) = Σ_{k=1}^{M} c(v^k) (φ(v^k) − f(v^k))² + E_shell(φ). The first
part penalizes inconsistency between the deformation vector and the force vector
applied on a point and uses a confidence score c to weight the penalization. The
second part minimizes the thin shell deformation energy, which is defined as the
integral of local bending and membrane energy:
E_shell(φ) = ∫_R [ λ_1 W(σ_mem(p)) + λ_2 W(σ_bend(p)) ] dp,   (1)

W(σ) = Y/(1 − τ²) ( (1 − τ) tr(σ²) + τ tr(σ)² ),   (2)
where Y and τ are the Young’s modulus and Poisson’s ratio of the shell. σmem
is the tangential Cauchy-Green strain tensor characterizing local stretching. The
bending strain tensor σbend characterizes local curvature change and is computed
as the shape operator change.
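The energy density of Eq. (2) can be evaluated directly for a given strain tensor. A minimal sketch with placeholder material constants (Y = 1 and τ = 0.3 are illustrative, not the shell parameters used in the paper):

```python
import numpy as np

def W(sigma, Y=1.0, tau=0.3):
    """Energy density W(sigma) from Eq. (2) for a strain tensor sigma.

    Y and tau (Young's modulus, Poisson's ratio) are placeholder values.
    """
    tr = np.trace
    return Y / (1.0 - tau**2) * ((1.0 - tau) * tr(sigma @ sigma)
                                 + tau * tr(sigma)**2)
```

For the identity strain (uniform unit stretch), tr(σ²) = 2 and tr(σ)² = 4, giving W = (0.7·2 + 0.3·4)/0.91 ≈ 2.857 under these constants.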
Our main observation is that the virtual force interaction is still valid among
N partial shells even without the mean geometry. Thus, we propose a group-
wise deformation scenario as an analog to the N-body problem: N surfaces are
deformed under the influence of their mutual forces. This groupwise attraction
can bypass the need of a target mean and still deform all surfaces into a sin-
gle geometric configuration. The deformation of a single surface is independent
and fully determined by the overall forces exerted on it. With the physical thin
shell model, its deformation can be topology-preserving and not influenced by its
partial-ness. With this notion in mind, we now have to define (1) mutual forces
among N partial surfaces; (2) an evolution strategy to deform the N surfaces.
Mutual Forces. In order to derive mutual forces, correspondences should be
credibly computed among N partial surfaces. It has been shown that by using
the geometric descriptor proposed in [10], a set of correspondences can be effec-
tively computed between partial surfaces. Additionally, in our application, each
surface Ri has an underlying texture image Fi . Thus, we also compute texture
correspondences between two frames by using standard computer vision tech-
niques. To improve matching accuracy, we compute inlier SIFT correspondences
only between frame pairs that are at most T seconds apart. Finally, these SIFT
matchings can be directly transformed to 3D vertex correspondences via the
SfMS reconstruction procedure.
In the end, any given vertex v_i^k ∈ R_i will have M_i^k corresponding vertices in the other surfaces {R_j | j ≠ i}, given as vectors {f^β(v_i^k) = u^β − v_i^k, β = 1...M_i^k}, where u^β is the β-th correspondence of v_i^k in some other surface. These correspondences are associated with confidence scores {c^β(v_i^k)} defined by

c^β(v_i^k) = { δ(u^β, v_i^k)  if (u^β, v_i^k) is a geometric correspondence,
             { c̄              if (u^β, v_i^k) is a texture correspondence,   (3)

where δ is the geometric feature distance defined in [10]. Since we only consider inlier SIFT matches found with RANSAC, the confidence score for texture correspondences is a constant c̄. We then define the overall force exerted on v_i^k as the weighted average: f̄(v_i^k) = Σ_{β=1}^{M_i^k} c^β(v_i^k) f^β(v_i^k) / Σ_{β=1}^{M_i^k} c^β(v_i^k).
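In code, this confidence-weighted average reduces to a few lines over the stacked correspondence vectors (array shapes are illustrative):

```python
import numpy as np

def overall_force(forces, scores):
    """Confidence-weighted mean force on one vertex.

    forces: (M, 3) stack of correspondence vectors f^beta;
    scores: (M,) confidence weights c^beta.
    """
    w = np.asarray(scores, dtype=float)
    f = np.asarray(forces, dtype=float)
    return (w[:, None] * f).sum(axis=0) / w.sum()
```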
Deformation Strategy. With mutual forces defined, we can solve for the group
deformation fields {φi } by optimizing independently for each surface
E(φ_i) = Σ_{k=1}^{M_i} c(v_i^k) (φ_i(v_i^k) − f̄(v_i^k))² + E_shell(φ_i),   (4)
where Mi is the number of vertices that have forces applied. Then, a groupwise
deformation scenario is to evolve the N surfaces by iteratively estimating the
mutual forces {f} and solving for the deformations {φ_i}. However, a potential hazard of our algorithm is that, without a common target template, the N surfaces could oscillate, especially in the early stage when the force magnitudes are large and tend to overshoot the deformation. To address this, we observe that the thin shell regularization weights λ_1, λ_2 control the deformation flexibility. Thus, to avoid oscillation, we design the strategy shown in Algorithm 1.
The final step of geometry fusion is to estimate a single geometry R from the reg-
istered surfaces {Ri } [11]. However, this fusion step can be seriously harmed by
the outlier geometry created by SfMS. Outlier geometries are local surface parts
Fig. 2. (a) 5 overlaying registered surfaces, one of which (pink) has a piece of outlier
geometry (circled) that does not correspond to anything else. (b) Robust quadratic
fitting (red grid) to normalized N (v k ). The outlier scores are indicated by the color.
(c) Color-coded W on L. (d) Fused surface after outlier geometry removal.
that are wrongfully estimated by SfMS under bad lighting conditions (insuffi-
cient lighting, saturation, or specularity) and are drastically different from all
other surfaces (Fig. 2a). The sub-surfaces do not correspond to any part in other
surfaces and thereby are carried over by the deformation process to {Ri }.
Our observation is that outlier geometry changes a local surface’s topology
(branching) and violates many differential geometry properties. We know that
the local surface around a point on a smooth 2-manifold can be approximately represented by a quadratic Monge patch h : U → ℝ³, where U defines a 2D open
set in the tangent plane, and h is a quadratic height function. Our idea is that if
we robustly fit a local quadratic surface at a branching place, the surface points
on the wrong branch of outlier geometry will be counted as outliers (Fig. 2b).
We define the 3D point cloud L = {v 1 , ...v P } of P points as the ensemble of
all vertices in {Ri }, N (v k ) as the set of points in the neighborhood of v k and W
as the set of outlier scores of L. For a given v k , we transform N (v k ) by taking
v k as the center of origin and the normal direction of v k as the z-axis. Then,
we use Iteratively Reweighted Least Squares to fit a quadratic polynomial to
the normalized N (v k ) (Fig. 2b). The method produces outlier scores for each of
the points in N (v k ), which are then accumulated into W (Fig. 2c). We repeat
this robust regression process for all v k in L. Finally, we remove the outlier
branches by thresholding the accumulated scores W, and the remaining largest
point cloud is used to produce the final single geometry R [11] (Fig. 2d).
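The robust regression step can be sketched as follows. This is an illustrative IRLS quadratic fit, not the authors' code: the Huber weight function, the tuning constant, and the MAD-based scale are assumptions, and the input neighborhood is taken to be already expressed in the local tangent frame of the query vertex.

```python
import numpy as np

def irls_outlier_scores(pts, iters=10, c=1.5):
    """Fit a quadratic Monge patch z = h(x, y) by Iteratively Reweighted
    Least Squares and score each point by its scaled residual.

    pts: (P, 3) neighborhood in the local tangent frame (z along the normal).
    Returns per-point outlier scores (large = likely outlier branch).
    """
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    # Design matrix of the 6 quadratic Monge-patch coefficients.
    A = np.stack([x * x, x * y, y * y, x, y, np.ones_like(x)], axis=1)
    w = np.ones(len(pts))
    for _ in range(iters):
        sw = np.sqrt(w)[:, None]
        coef, *_ = np.linalg.lstsq(A * sw, z * sw[:, 0], rcond=None)
        r = z - A @ coef
        s = 1.4826 * np.median(np.abs(r)) + 1e-12   # robust scale (MAD)
        w = np.where(np.abs(r) <= c * s, 1.0, c * s / np.abs(r))  # Huber weights
    return np.abs(r) / s
```

Points on an outlier branch keep large residuals against the robustly fitted patch and so receive large scores, while the reweighting prevents them from biasing the fit itself.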
4 Results
Fig. 3. Left to right: error plot of synthetic data for 6 patients; a phantom endoscopic
video frame; the fused geometry with color-coded deviation (in millimeters) from the
ground truth CT.
diameter, covering from the pharynx down to the vocal cords. We created defor-
mations typically seen in real data, such as the stretching of the pharyngeal
wall and the bending of the epiglottis. We generated for each patient 20 par-
tial surfaces by taking depth maps from different camera positions in the CT
space. Only geometric correspondences were used in this test. We measured the
registration error as the average Euclidean distance of all pairs of correspond-
ing vertices after registration (Fig. 3). Our method significantly reduced error
and performed better than a spectral-graph-based method [10], which is another
potential framework for matching partial surfaces without estimating the mean.
Phantom Data. To test our method on real-world data in a controlled envi-
ronment, we 3D-printed a static phantom model (Fig. 3) from one patient’s CT
data and then collected endoscopic video and high-resolution CT for the model.
We produced SfMS reconstructions for 600 frames in the video, among which 20
reconstructions were uniformly selected for geometry fusion (using more surfaces for geometry fusion does not further increase accuracy but is computationally slower). The SfMS results were downsampled to ∼2500 vertices and rigidly
aligned to the CT space. Since the phantom is rigid, the registration plays the
role of unifying inconsistent SfMS estimation. No outlier geometry trimming was
performed in this test. We define a vertex’s deviation as its distance to the near-
est point in the CT surface. The average deviation of all vertices is 1.24 mm for
the raw reconstructions and is 0.94 mm for the fused geometry, which shows
that the registration can help filter out inaccurate SfMS geometry estimation.
Figure 3 shows that the fused geometry resembles the ground truth CT surface
except in the farther part, where less data was available in the video.
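The deviation metric can be sketched with a brute-force nearest-neighbor search over a point-sampled CT surface (illustrative only; a k-d tree would be used for meshes of realistic size):

```python
import numpy as np

def mean_deviation(recon_pts, ct_pts):
    """Average vertex deviation: distance from each reconstructed vertex to
    its nearest point on the (point-sampled) CT surface.

    recon_pts: (M, 3) reconstruction vertices; ct_pts: (N, 3) CT samples.
    """
    # Pairwise distances via broadcasting, then the nearest CT point per vertex.
    d = np.linalg.norm(recon_pts[:, None, :] - ct_pts[None, :, :], axis=2)
    return float(d.min(axis=1).mean())
```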
Patient Data. We produced endoscopograms for 8 video sequences (300 frames
per sequence) extracted from 4 patient endoscopies. Outlier geometry trimming
was used since lighting conditions were often poor. We computed the overlap
distance (OD) defined in [12], which measures the average surface deviation
between all pairs of overlapping regions. The average OD of the 8 cases is 1.6 ±
0.13 mm before registration, 0.58 ± 0.05 mm after registration, and 0.24 ±
0.09 mm after outlier geometry trimming. Figure 4 shows one of the cases.
Fig. 4. OD plot on the point cloud of 20 surfaces. Left to right: before registration,
after registration, after outlier geometry trimming, the final endoscopogram.
5 Conclusion
We have described a pipeline for producing an endoscopogram from a video
sequence. We proposed a novel groupwise surface registration algorithm and
an outlier-geometry trimming algorithm. We have demonstrated via synthetic
and phantom tests that the N-body scenario is robust for registering partially-
overlapping surfaces with missing data. Finally, we produced endoscopograms for
real patient endoscopic videos. A current limitation is that the video sequence
is at most 3–4 s long for robust SfM estimation. Future work involves fusing
multiple endoscopograms from different video sequences.
References
1. Hong, D., Tavanapong, W., Wong, J., Oh, J., de Groen, P.C.: 3D reconstruction of
virtual colon structures from colonoscopy images. Comput. Med. Imaging Graph.
38(1), 22–23 (2014)
2. Maier-Hein, L., Mountney, P., Bartoli, A., Elhawary, H., Elson, D., Groch, A.,
Kolb, A., Rodrigues, M., Sorger, J., Speidel, S., Stoyanov, D.: Optical techni-
ques for 3D surface reconstruction in computer-assisted laparoscopic surgery. Med.
Image Anal. 17(8), 974–996 (2013)
3. Wu, C., Narasimhan, S.G., Jaramaz, B.: A multi-image shape-from-shading frame-
work for near-lighting perspective endoscopes. Int. J. Comput. Vis. 86(2), 211–228
(2010)
4. Price, T., Zhao, Q., Rosenman, J., Pizer, S., Frahm, J.M.: Shape from motion
and shading in uncontrolled environments. Under submission; to appear. http://
midag.cs.unc.edu/
5. Durrleman, S., Prastawa, M., Korenberg, J.R., Joshi, S., Trouvé, A., Gerig, G.:
Topology preserving atlas construction from shape data without correspondence
using sparse parameters. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.)
MICCAI 2012, Part III. LNCS, vol. 7512, pp. 223–230. Springer, Heidelberg (2012)
6. Durrleman, S., Pennec, X., Trouvé, A., Ayache, N.: Statistical models of sets of
curves and surfaces based on currents. Med. Image Anal. 13(5), 793–808 (2009)
7. Balci, S.K., Golland, P., Shenton, M., Wells, W.M.: Free-form B-spline deformation
model for groupwise registration. In: MICCAI, pp. 23–30 (2007)
8. Arslan, S., Parisot, S., Rueckert, D.: Joint spectral decomposition for the par-
cellation of the human cerebral cortex using resting-state fMRI. In: Ourselin, S.,
Alexander, D.C., Westin, C.-F., Cardoso, M.J. (eds.) IPMI 2015. LNCS, vol. 9123,
pp. 85–97. Springer, Heidelberg (2015)
9. Zhao, Q., Price, J.T., Pizer, S., Niethammer, M., Alterovitz, R., Rosenman, J.:
Surface registration in the presence of topology changes and missing patches. In:
Medical Image Understanding and Analysis, pp. 8–13 (2015)
10. Zhao, Q., Pizer, S., Niethammer, M., Rosenman, J.: Geometric-feature-based spec-
tral graph matching in pharyngeal surface registration. In: Golland, P., Hata, N.,
Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part I. LNCS, vol. 8673,
pp. 259–266. Springer, Heidelberg (2014)
11. Curless, B., Levoy, M.: A volumetric method for building complex models from
range images. In: SIGGRAPH, pp. 303–312 (1996)
12. Huber, D.F., Hebert, M.: Fully automatic registration of multiple 3D data sets.
Image Vis. Comput. 21(7), 637–650 (2003)
Robust Image Descriptors for Real-Time
Inter-Examination Retargeting
in Gastrointestinal Endoscopy
1 Introduction
Fig. 1. A framework overview. Grey arrows represent the training phase using diagnosis
video while black arrows represent the querying phase in the surveillance examination.
tracking [3,4], and mapping [5]. However, when applied across successive examinations, these often fail due to long-term variation in the appearance of the tissue surface, which makes it difficult to detect the same local features. For the inter-examination setting, endoscopic video manifolds (EVM) [6] were proposed, with retargeting achieved by projecting query images into a manifold space using locality preserving projections. In [7], an external positioning sensor was used for retargeting, but it requires manual trajectory registration, which interferes with the clinical workflow and increases the complexity and duration of the procedure.
In this work, we propose an inter-examination retargeting framework (see
Fig. 1) for optical biopsy. This enables recognition of biopsied locations in the sur-
veillance (second) examination, based on targets defined in the diagnosis (first)
examination, whilst not interfering with the clinical workflow. Rather than rely-
ing on feature detection, a global image descriptor is designed based on regional
image comparisons computed at multiple scales. At the higher scale, this offers
robustness to small variations in tissue appearance across examinations, whilst at
the lower scale, this offers discrimination in matching those tissue regions which
have not changed. Inspired by [8], efficient descriptor matching is achieved by
compression into binary codes, with a novel mapping function based on random
forests, allowing for fast encoding of a query image and hence real-time retarget-
ing. Validation was performed on 13 in vivo GI videos, obtained from successive
endoscopies of the same patient, with 6 patients in total. Extensive comparisons
to state-of-the-art methods have been conducted to demonstrate the practical
clinical value of our approach.
2 Methods
Fig. 2. (a) Obtaining an integer from one location; (b) creating the global image
descriptor from all locations using spatial pyramid pooling.
In recent years, the use of local binary patterns (LBP) [11] has proved popular for recognition due to its fast computation and its robustness to image noise and illumination variation. Here, pairs of pixels within an image patch are
compared in intensity to create a sequence of binary numbers. We propose a
novel, symmetric version of LBP which performs 4 diagonal comparisons within
a patch to yield a 4-bit string for each patch, representing an integer from 0 to
15. This comparison mask acts as a sliding window over the image, and a 16-bin
histogram is created from the full set of integers. To offer tolerance to camera
translation, we extend LBP by comparing local regions rather than individual
pixels, with each region the average of its underlying pixels, as shown in Fig. 2(a).
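The descriptor above can be sketched as follows. The exact pairing of the four diagonal comparisons and the region size are assumptions for illustration; the paper fixes only the 4-bit output per patch and the 16-bin histogram.

```python
import numpy as np

def symmetric_lbp_histogram(image, region=8):
    """Sketch of the 4-bit symmetric LBP: pixels are averaged into regions,
    the 4 diagonal comparisons of each 3x3-region patch yield an integer in
    [0, 15], and all integers are pooled into a 16-bin histogram."""
    h, w = image.shape
    rh, rw = h // region, w // region
    # Replace each region-by-region block of pixels by its mean intensity
    # (this gives tolerance to small camera translations).
    blocks = image[:rh * region, :rw * region]
    regions = blocks.reshape(rh, region, rw, region).mean(axis=(1, 3))
    hist = np.zeros(16, dtype=np.int64)
    # Slide the 3x3 comparison mask over the grid of regions.
    for i in range(rh - 2):
        for j in range(rw - 2):
            p = regions[i:i + 3, j:j + 3]
            bits = (p[0, 0] > p[2, 2], p[0, 2] > p[2, 0],   # both diagonals,
                    p[2, 0] > p[0, 2], p[2, 2] > p[0, 0])   # both directions
            hist[sum(int(b) << k for k, b in enumerate(bits))] += 1
    return hist
```

Comparing each diagonal in both directions makes the bit string symmetric under intensity inversion of the patch, which is one plausible reading of the "symmetric" design.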
To encode global geometry such that retargeting ensures similarity at mul-
tiple scales, we adopt the spatial pyramid pooling method [12] which divides
an image into a set of coarse-to-fine levels. As shown in Fig. 2(b), we perform
pooling with three levels, where the second and third levels are divided into 2× 2
and 4 × 4 partitions, respectively, with each partition assigned its own histogram
based on the patches it contains. For the second and third levels, further over-
lapped partitions of 1 × 1 and 3 × 3 are created to allow for deformation and
scale variance. For patches of 3×3 regions, we use patches of 24×24, 12×12 and
6 × 6 pixels for the first, second and third levels, respectively. The histograms for
all partitions over all levels are then concatenated to create a 496-d descriptor.
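The 496-dimensional figure follows directly from the partition counts: one partition at level one, a 2 × 2 grid plus an overlapped 1 × 1 grid at level two, and a 4 × 4 grid plus an overlapped 3 × 3 grid at level three, each partition contributing a 16-bin histogram:

```python
# Partitions per level: level 1 is the whole image; levels 2 and 3 add
# overlapped grids to tolerate deformation and scale variation.
partitions = [1 * 1,              # level 1
              2 * 2 + 1 * 1,      # level 2, with overlapped 1x1
              4 * 4 + 3 * 3]      # level 3, with overlapped 3x3
descriptor_dim = sum(partitions) * 16   # one 16-bin histogram per partition
assert descriptor_dim == 496
```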
Let us consider a set of training image descriptors {x_i}, i = 1, …, n, from the diagnosis sequence, each assigned to a scene label representing its topological location, where each scene is formed of a cluster of adjacent images. We now aim to
infer a binary code of m bits for each descriptor, by encouraging the Hamming
distance between the codes of two images to be small for images of the same
scene, and large for images of different scenes, as in [8]. Let us now denote Y as
an affinity matrix, where yij = 1 if images xi and xj have the same scene label,
and yij = 0 if not. We now sequentially optimise each bit in the code, such that
for r-th bit optimisation, we have the objective function:
    min_{b(r)}  Σ_{i=1}^{n} Σ_{j=1}^{n} l_r(b_{r,i}, b_{r,j}; y_ij),   s.t.  b(r) ∈ {0, 1}^n.        (1)
Here, br,i is the r-th bit of image xi , b(r) is a vector of the r-th bits for all
n images, and lr (·) is the loss function for the assignment of bits br,i and br,j
given the image affinity yij . As proved in [8], this objective can be optimised by
formulating a quadratic hinge loss function as follows:
    l_r(b_{r,i}, b_{r,j}; y_ij) = { (0 − D(b_{r,i}, b_{r,j}))^2,              if y_ij = 1
                                    (max(0.5m − D(b_{r,i}, b_{r,j}), 0))^2,   if y_ij = 0        (2)
Here, D(b_{r,i}, b_{r,j}) denotes the Hamming distance between b_i and b_j over the first r bits. Note that during binary code inference, the optimisation of each bit
uses the results of the optimisation of the previous bits, and hence this is a series
of local optimisations due to the intractability of global optimisation.
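The per-bit loss of Eq. (2) is simple to state in code; a minimal sketch:

```python
def bit_loss(d, y, m):
    """Quadratic hinge loss of Eq. (2). d is the Hamming distance over the
    first r bits, y the affinity label, m the total code length in bits."""
    if y == 1:
        return float(d) ** 2                 # same scene: any distance is penalised
    return max(0.5 * m - d, 0.0) ** 2        # different scenes: push beyond m/2
```

For a 64-bit code, two images of the same scene at distance 3 incur a loss of 9, while two images of different scenes incur no loss once their distance reaches 32.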
Table 1. Mean average precision for recognition, both for the descriptor and the entire
framework. Note that the results of hashing-based methods are at 64-bit.
  Methods |          Descriptor            |                  Framework
          | BOW    GIST   SPACT  Ours      | EVM    AGH    ITQ    KSH    Fasthash  Ours
  Pat.1   | 0.227  0.387  0.411  0.488     | 0.238  0.340  0.145  0.686  0.802     0.920
  Pat.2   | 0.307  0.636  0.477  0.722     | 0.304  0.579  0.408  0.921  0.925     0.956
  Pat.3   | 0.321  0.576  0.595  0.705     | 0.248  0.501  0.567  0.903  0.911     0.969
  Pat.4   | 0.331  0.495  0.412  0.573     | 0.274  0.388  0.289  0.889  0.923     0.957
  Pat.5   | 0.341  0.415  0.389  0.556     | 0.396  0.435  0.342  0.883  0.896     0.952
  Pat.6   | 0.201  0.345  0.315  0.547     | 0.273  0.393  0.298  0.669  0.812     0.895
where π(X) is the Shannon entropy: π(X) = −Σ_{y∈{0,1}} p_y log(p_y). Here, p_y is the fraction of data in X assigned to label y. Tree growth terminates when the tree reaches a defined maximum depth, or when the information gain I falls below a fixed threshold (e^{−10}).
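The split criterion can be sketched as follows; the exact tree-training procedure follows [8,16] and is not specified here.

```python
import math

def shannon_entropy(labels):
    """pi(X) = -sum_{y in {0,1}} p_y log(p_y), with 0 log 0 taken as 0."""
    n = len(labels)
    h = 0.0
    for y in (0, 1):
        p = labels.count(y) / n
        if p > 0:
            h -= p * math.log(p)
    return h

def information_gain(parent, left, right):
    """Entropy reduction of a candidate split; growth stops once this drops
    below the threshold e^-10 (about 4.5e-5)."""
    n = len(parent)
    return (shannon_entropy(parent)
            - len(left) / n * shannon_entropy(left)
            - len(right) / n * shannon_entropy(right))
```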
With T trained trees, each returning a value αt (x) between 0 and 1, the hashing
function for the ith bit then averages the responses from all trees and rounds
this accordingly to either 0 or 1:
    φ_i(x) = { 0   if (1/T) Σ_{t=1}^{T} α_t(x) < 0.5
               1   otherwise                                        (4)
Finally, to generate the m-bit binary code, the mapping function Φ(x) concatenates the output bits from all hashing functions {φ_i(x)}, i = 1, …, m, into a single
binary string. Therefore, to achieve retargeting, the binary string assigned to a
query image from the surveillance sequence is compared, via Hamming distance,
to the binary strings of scenes captured in a previous diagnosis sequence.
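Eq. (4) and the retargeting step can be sketched as follows, with each trained tree represented simply as a callable returning a response in [0, 1], a stand-in for the actual decision forests of [16]:

```python
import numpy as np

def hash_bit(x, trees):
    """Eq. (4): average the T tree responses alpha_t(x) and round to a bit."""
    return 0 if np.mean([t(x) for t in trees]) < 0.5 else 1

def encode(x, forests):
    """Phi(x): concatenate the m per-bit hashing functions into a binary code."""
    return np.array([hash_bit(x, trees) for trees in forests], dtype=np.uint8)

def retarget(query_code, scene_codes):
    """Match a surveillance query to the diagnosis scene whose binary code
    has the smallest Hamming distance to the query's code."""
    return int(np.argmin([(query_code != c).sum() for c in scene_codes]))
```

Because the code is binary, the Hamming distance reduces to counting differing bits, which is what makes the retargeting step real-time.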
Fig. 3. (a) Means and standard deviations of recognition rates (precisions @ 1-NN) and
(b) precision values @ 50-NN with different binary code lengths; (c-h) precision-recall
curves of individual experiments using 64-bit codes.
Fig. 4. Example top-ranked images for the proposed framework on six patients. Yellow-border images are queries from a surveillance sequence; green- and red-border images are the correct and incorrect matches from a diagnosis sequence, respectively.
4 Conclusions
In this paper, we have proposed a retargeting framework for optical biopsy in
serial endoscopic examinations. A novel global image descriptor with regional
Robust Image Descriptors for Real-Time Inter-Examination Retargeting 455
comparisons over multiple scales deals with tissue appearance variation across
examinations, whilst binary encoding with a novel random forest-based mapping
function adds discrimination and speeds up recognition. The framework can be
readily incorporated into the existing endoscopic workflow due to its capability
of real-time retargeting and its lack of manual calibration. Validation on in vivo videos of serial endoscopies from six patients shows that both our descriptor and our hashing scheme consistently outperform the state of the art.
References
1. Atasoy, S., Glocker, B., Giannarou, S., Mateus, D., Meining, A., Yang, G.-Z.,
Navab, N.: Probabilistic region matching in narrow-band endoscopy for targeted
optical biopsy. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C.
(eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 499–506. Springer, Heidelberg
(2009)
2. Allain, B., Hu, M., Lovat, L.B., Cook, R.J., Vercauteren, T., Ourselin, S., Hawkes,
D.J.: Re-localisation of a biopsy site in endoscopic images and characterisation of
its uncertainty. Med. Image Anal. 16(2), 482–496 (2012)
3. Ye, M., Giannarou, S., Patel, N., Teare, J., Yang, G.-Z.: Pathological site
retargeting under tissue deformation using geometrical association and track-
ing. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI
2013, Part II. LNCS, vol. 8150, pp. 67–74. Springer, Heidelberg (2013)
4. Ye, M., Johns, E., Giannarou, S., Yang, G.-Z.: Online scene association for endo-
scopic navigation. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R.
(eds.) MICCAI 2014, Part II. LNCS, vol. 8674, pp. 316–323. Springer, Heidelberg
(2014)
5. Mountney, P., Giannarou, S., Elson, D., Yang, G.-Z.: Optical biopsy mapping for
minimally invasive cancer screening. In: Yang, G.-Z., Hawkes, D., Rueckert, D.,
Noble, A., Taylor, C. (eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 483–490.
Springer, Heidelberg (2009)
6. Atasoy, S., Mateus, D., Meining, A., Yang, G.Z., Navab, N.: Endoscopic video
manifolds for targeted optical biopsy. IEEE Trans. Med. Imag. 31(3), 637–653
(2012)
7. Vemuri, A.S., Nicolau, S.A., Ayache, N., Marescaux, J., Soler, L.: Inter-operative
trajectory registration for endoluminal video synchronization: application to biopsy
site re-localization. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.)
MICCAI 2013, Part I. LNCS, vol. 8149, pp. 372–379. Springer, Heidelberg (2013)
8. Lin, G., Shen, C., van den Hengel, A.: Supervised hashing using graph cuts and
boosted decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2317–2331
(2015)
9. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Com-
put. Vis. 60(2), 91–110 (2004)
10. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with
large vocabularies and fast spatial matching. In: CVPR, pp. 1–8 (2007)
11. Wu, J., Rehg, J.: Centrist: a visual descriptor for scene categorization. IEEE Trans.
Pattern Anal. Mach. Intell. 33(8), 1489–1501 (2011)
12. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid
matching for recognizing natural scene categories. In: CVPR, pp. 2169–2178 (2006)
456 M. Ye et al.
13. Liu, W., Wang, J., Kumar, S., Chang, S.F.: Hashing with graphs. In: ICML, pp.
1–8 (2011)
14. Liu, W., Wang, J., Ji, R., Jiang, Y.G., Chang, S.F.: Supervised hashing with ker-
nels. In: CVPR, pp. 2074–2081 (2012)
15. Gong, Y., Lazebnik, S., Gordo, A., Perronnin, F.: Iterative quantization: a pro-
crustean approach to learning binary codes for large-scale image retrieval. IEEE
Trans. Pattern Anal. Mach. Intell. 35(12), 2916–2929 (2013)
16. Criminisi, A., Shotton, J.: Decision Forests for Computer Vision and Medical Image
Analysis. Springer, Heidelberg (2013)
17. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation
of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
Kalman Filter Based Data Fusion for Needle
Deflection Estimation Using
Optical-EM Sensor
1 Introduction
reasons. We propose to develop a real-time navigation system for better guidance while
accounting for the needle bending caused by the needle-tissue interactions.
Many methods have been proposed to estimate the needle deflection. The most
popular class of methods is the model-based estimation [6–8]. Roesthuis et al. proposed
the virtual springs model considering the needle as a cantilever beam supported by a
series of springs and utilized Rayleigh-Ritz method to solve for needle deflection [8].
The work of Dorileo et al. merged needle-tissue properties, tip asymmetry and needle
tip position updates from images to estimate the needle deflection as a function of
insertion depth [7]. However, since the model-based estimation is sensitive to model
parameters and the needle-tissue interaction is stochastic in nature, needle deflection
and insertion trajectory are not completely repeatable. The second type of estimation is
achieved using an optical fiber based sensor. Park et al. designed an MRI-compatible
biopsy needle instrumented with optical fiber Bragg gratings to track needle deviation
[4]. However, the design and functionality of certain needles, such as cryoablation and
radiofrequency ablation needles, do not allow for instrumentation of the optical fiber
based sensor in the lumen of the needle. The third kind of estimation strategy was
proposed in [9], where Kalman filter was employed to combine a needle bending model
with the needle base and tip position measurements from two electromagnetic
(EM) trackers to estimate the true tip position. This approach can effectively compensate for the quantification uncertainties of the needle model and is therefore more reliable. However, this method is not feasible in the MRI environment due to the use of
MRI-unsafe sensors. In this work, we present a new fusion method using an optical
tracker at the needle’s base and an MRI gradient field driven EM tracker attached to the
shaft of the needle. By integrating the sensor data with the angular springs model
presented in [10], the Kalman filter-based fusion model can significantly reduce the
estimation error in presence of needle bending.
2 Methodology
Needle Configuration. In this study, we have used a cone-tip IceRod® 1.5 mm MRI
Cryoablation Needle (Galil Medical, Inc.), as shown in Fig. 1. A frame with four
passive spheres (Northern Digital Inc. and a tracking system from Symbow Medical
Inc.) is mounted on the base of the needle, and an MRI-safe EndoScout® EM sensor
(Robin Medical, Inc.) is attached to the needle’s shaft with 10 cm offset from the tip set
by a depth stopper.
Through pivot calibration, the optical tracking system can provide the needle base
position POpt and the orientation of the straight needle OOpt . The EM sensor obtains the
sensor’s location PEM and its orientation with respect to the magnetic field of the MR
scanner OEM .
Kalman Filter Formulation. The state vector is set as x_k = [P_tip(k), Ṗ_tip(k)]^T. The insertion speed during the cryoablation procedure is slow enough to be considered as a
Fig. 1. Cryoablation needle mounted with Optical and EM sensor and a depth stopper.
constant. Therefore, the process model can be formulated in the form x_k = A x_{k−1} + w_{k−1} as follows:

    [P_tip(k) ]   [I_3   T_s I_3] [P_tip(k−1) ]   [(T_s^2/2) I_3]
    [Ṗ_tip(k)] = [0_3   I_3    ] [Ṗ_tip(k−1)] + [T_s I_3      ] P̈_tip(k)        (1)

where the block matrix is the transition matrix A, and T_s, I_3, 0_3 stand for the time step, the 3 × 3 identity matrix and the 3 × 3 null matrix, respectively. P_tip(k), Ṗ_tip(k), P̈_tip(k) represent the tip position, velocity and acceleration. The acceleration term [(T_s^2/2) I_3; T_s I_3] P̈_tip(k) is taken as the process noise, denoted w_{k−1} ~ N(0, Q), where Q is the process noise covariance matrix.
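A minimal sketch of this prediction step; the time-step value is an assumption for illustration:

```python
import numpy as np

Ts = 0.05                               # example time step in seconds (assumed)
I3, Z3 = np.eye(3), np.zeros((3, 3))

# Transition matrix A of the constant-velocity process model in Eq. (1).
A = np.block([[I3, Ts * I3],
              [Z3, I3]])

def predict(x_prev, a_tip=np.zeros(3)):
    """x_k = A x_{k-1} + [Ts^2/2 I3; Ts I3] a_tip, where the acceleration
    term plays the role of the process noise w_{k-1} ~ N(0, Q)."""
    G = np.vstack([0.5 * Ts ** 2 * I3, Ts * I3])
    return A @ x_prev + G @ a_tip
```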
When considering the needle as straight, the tip position was estimated using three sets of data as follows: TIP_Opt (using P_Opt, O_Opt and the needle length offset), TIP_EM (using P_EM, O_EM and the EM offset), and TIP_OptEM (drawing a straight line through P_Opt and P_EM, and applying the needle length offset). When taking needle bending into account, we can estimate the needle tip position using the angular springs model with either the combination of P_EM, P_Opt and O_Opt (TIP_EMOptOpt) or the combination of P_Opt, P_EM and O_EM (TIP_OptEMEM), which are formulated in (2) and (3).
    P_EMOptOpt = g_1(P_EM, P_Opt, O_Opt)        (2)
    P_OptEMEM = g_2(P_Opt, P_EM, O_EM)          (3)
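The straight-needle estimates admit a direct geometric sketch; the offset values below are illustrative, not the calibrated ones:

```python
import numpy as np

def tip_opt(P_opt, O_opt, needle_length):
    """TIP_Opt: extend the optically measured orientation from the base."""
    return np.asarray(P_opt) + needle_length * np.asarray(O_opt)

def tip_opt_em(P_opt, P_em, em_offset):
    """TIP_OptEM: the line through base and EM sensor positions, extended
    by the EM sensor's known offset from the tip."""
    d = np.asarray(P_em) - np.asarray(P_opt)
    d = d / np.linalg.norm(d)
    return np.asarray(P_em) + em_offset * d
```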
    k q_5 = F_tip l
    k q_4 = F_tip l (1 + cos q_5)
    k q_3 = F_tip l [1 + cos q_5 + cos(q_5 + q_4)]
    k q_2 = F_tip l [1 + cos q_5 + cos(q_5 + q_4) + cos(q_5 + q_4 + q_3)]
    k q_1 = F_tip l [1 + cos q_5 + cos(q_5 + q_4) + cos(q_5 + q_4 + q_3)
                     + cos(q_5 + q_4 + q_3 + q_2)]                              (5)
Equation (5) can be written in the form k U = F_tip J(U), where U = [q_1, q_2, …, q_n] and J is the parameter function calculating the force-deflection relationship vector. In order to implement this model in the tip estimation methods of (2) and (3), one more equation is needed to relate the sensor input data to (5). As the data P_EM, P_Opt, O_Opt and P_Opt, P_EM, O_EM are received during insertion, the deflection of the needle can be estimated as:
    d_EM = l [sin q_1 + sin(q_1 + q_2)]        (6)
where dEM represents the deviation of the EM sensor from the optical-measured straight
needle orientation and dbase stands for the relative deviation of the needle base from the
EM measured direction.
To estimate the needle deflection from P_EM, P_Opt, O_Opt or P_Opt, P_EM, O_EM, a set of nonlinear equations consisting of either (5) and (6) or (5) and (7) needs to be solved. As proposed in [10], this nonlinear system can be solved iteratively using Picard's method, expressed in (8): given the needle configuration U_t, the function J is used to estimate the needle posture at the next iteration. For minor deflections, fewer than 10 iterations suffice to solve the nonlinear system, which is efficient enough to achieve real-time estimation.
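A sketch of the fixed-point iteration for the five-spring case of Eq. (5); the spring constant k, tip force F_tip and segment length l are illustrative inputs:

```python
import math

def picard_solve(F_tip, l, k, n_iter=10):
    """Picard iteration for Eq. (5): each angle q_i is re-evaluated from the
    angles of the previous iterate, starting from a straight needle."""
    q = [0.0] * 5                       # [q1, q2, q3, q4, q5]
    c = F_tip * l / k
    for _ in range(n_iter):
        q1, q2, q3, q4, q5 = q
        total, ang, new = 1.0, 0.0, []
        for a in (q5, q4, q3, q2):      # build q5_new, q4_new, q3_new, q2_new
            new.append(c * total)
            ang += a
            total += math.cos(ang)
        new.append(c * total)           # q1_new
        q = new[::-1]                   # back to [q1, ..., q5] order
    return q

def deflection_em(q, l):
    """Eq. (6): deviation of the EM sensor from the straight-needle axis."""
    return l * (math.sin(q[0]) + math.sin(q[0] + q[1]))
```

For small forces the iterates contract quickly, which matches the observation that fewer than 10 iterations are needed for minor deflections.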
However, the implementation of Picard's method requires F_tip to be known. To find F_tip from the sensor inputs, a series of simulation experiments was conducted, in which a linearly-increasing simulated tip force F_tip and the corresponding d_EM and d_base were collected. The simulation results are shown in Fig. 3 (left).
A least-squares method is used to fit the force-deviation data with a cubic polynomial. Thereafter, to solve the needle configuration using P_EM, P_Opt, O_Opt or P_Opt, P_EM, O_EM, the fitted cubic polynomial is first used to estimate the tip force from the measured d_EM and d_base, and then (5) is solved iteratively using (8).
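The deviation-to-force lookup can be sketched with synthetic placeholder data; the actual curve comes from the simulation of Fig. 3 (left), so both the data and the coefficients below are assumptions:

```python
import numpy as np

# Synthetic stand-in for the simulated force-deviation data:
# tip force (N) as a cubic function of EM-sensor deviation (mm).
dev_em = np.linspace(0.1, 5.0, 100)
forces = 0.4 * dev_em + 0.02 * dev_em ** 3

# Least-squares cubic fit mapping a measured deviation to a tip-force estimate.
coeffs = np.polyfit(dev_em, forces, deg=3)

def estimate_tip_force(d_em):
    return float(np.polyval(coeffs, d_em))
```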
Fig. 3. Left: Tip force and deflection relation: tip force increases with 50 mN intervals. Right:
Static tip bending experiment setup at MRI entrance.
462 B. Jiang et al.
3 Experiments
In order to validate our proposed method, we designed the static tip bending experi-
ment, which was performed at the isocenter and 650 mm offset along z-axis from the
isocenter (entrance) of MRI shown in Fig. 3. Right. The experiment is conducted in
two steps: first, the needle tip was placed at a particular point (such as inside a phantom
marker) and kept static without bending the needle. The optical and EM sensor data
were recorded for 10 s. Second, the needle’s tip remained at the same point and the
needle was bent by maneuvering the needle base, with a mean magnitude of about
40 mm tip deviation for large bending validation and 20 mm for small bending vali-
dation. Similarly, the data were recorded from both sensors for an additional 20 s.
In addition, the needle was bent in three patterns (in the x-y plane of the MRI, in the y-z plane, and in all directions) to evaluate how the EM sensor's orientation affects its accuracy.
From the data collected in the first step, the estimated needle tip mean position
without needle deflection compensation can be viewed as the gold standard reference
point TIPgold . In the second step, the proposed fusion method, together with other tip
estimation methods, was used to estimate the static tip position, which was compared
with TIP_gold. The results are shown in Fig. 4. For large bending, the errors of TIP_Opt, TIP_EM and TIP_fused are 29.23 mm, 6.29 mm and 3.15 mm at the isocenter, and 39.96 mm, 9.77 mm and 6.90 mm at the MRI entrance, respectively. For small bending they become 21.00 mm, 3.70 mm and 2.20 mm at the isocenter, and 16.54 mm, 5.41 mm and 4.20 mm at the entrance, respectively.
4 Discussion
In comparing TIP_fused with TIP_Opt rather than TIP_EM, it should be noted that the EM sensor is primarily used to augment the measurements of the optical sensor and to compensate for its line-of-sight problem. Although the EM sensor better estimates the needle tip position in the presence of needle bending, it is sensitive to MR gradient field
nonlinearity and noise. Therefore, its performance is less reliable when performing the
needle insertion procedure at the MRI entrance.
Although quantifying the range of bending during therapy is difficult, our initial
insertion experiments in a homogeneous spine phantom using the same needle
demonstrated a needle bending of over 10 mm. Therefore, we attempted to simulate a
larger bending (40 mm tip deviation) that could be anticipated when needle is inserted
through heterogeneous tissue composition. However, as small bending will be more
commonly observed, validation experiments were conducted and demonstrated con-
sistently better estimation using the data fusion method.
From Fig. 4 Bottom, we find that the green dots, which represent bending in the x-y
plane, exhibit higher accuracy of the EM sensor, thus resulting in a better fusion result.
For large bending experiment in the x-y plane at the entrance, the mean error of TIPOpt ,
TIPEM and TIPfused are 28.22 mm, 5.76 mm, 3.40 mm, respectively. The result sug-
gests that by maneuvering the needle in the x-y plane, the estimation accuracy can be
further improved.
Fig. 4. Top: Single experiment result. Each scattered point represents a single time-step record.
The left-side points represent the estimated tip positions using different methods. The light blue
points in the middle and dark blue points to the right represent the raw data of EM sensor
locations and needle base positions respectively. The black sphere is centered at the gold standard
point, and encompasses 90 % of the fused estimation points (black). Lines connect the raw data
and estimated tip positions of a single time step. Bottom: From left to right: large bending
experiment at isocenter, large-entrance, small-isocenter, small-entrance. The x-axis, from 1 to 6, stands for TIP_fused, TIP_EM, TIP_OptEMEM, TIP_EMOptOpt, TIP_OptEM and TIP_Opt, respectively. The y-axis indicates the mean estimation error (mm) and each dot represents a single experiment result.
It should be noted that the magnitude of the estimation errors using the fusion method still appears large due to the significant bending introduced in the needle. When the actual bending is less conspicuous, the estimation error can be much smaller. In addition,
the estimation error is not equal to the overall targeting error. It only represents the
real-time tracking error in presence of needle bending. By integrating the data fusion
algorithm with the 3D Slicer-based navigation system [13], clinicians can be provided
with better real-time guidance and maneuverability of the needle.
5 Conclusion
In this work, we proposed a Kalman filter based optical-EM sensor fusion method to
estimate the flexible needle deflection. The data fusion method exhibits consistently
smaller mean error than the methods without fusion. The EM sensor used in our
method is MR-safe, and the method requires no other force or insertion-depth sensor,
making it easy to integrate with the clinical workflow. In the future, we will improve
the robustness of the needle bending model and integrate with our navigation system.
References
1. Abolhassani, N., Patel, R., Moallem, M.: Needle insertion into soft tissue: a survey. Med.
Eng. Phys. 29(4), 413–431 (2007)
2. Dupuy, D.E., Zagoria, R.J., Akerley, W., Mayo-Smith, W.W., Kavanagh, P.V., Safran, H.:
Percutaneous radiofrequency ablation of malignancies in the lung. AJR Am. J. Roentgenol.
174(1), 57–59 (2000)
3. Mala, T., Edwin, B., Mathisen, Ø., Tillung, T., Fosse, E., Bergan, A., Søreide, Ø.,
Gladhaug, I.: Cryoablation of colorectal liver metastases: minimally invasive tumour control.
Scand. J. Gastroenterol. 39(6), 571–578 (2004)
4. Park, Y.L., Elayaperumal, S., Daniel, B., Ryu, S.C., Shin, M., Savall, J., Black, R.J.,
Moslehi, B., Cutkosky, M.R.: Real-time estimation of 3-D needle shape and deflection for
MRI-guided interventions. IEEE/ASME Trans. Mechatron. 15(6), 906–915 (2010)
5. Wan, G., Wei, Z., Gardi, L., Downey, D.B., Fenster, A.: Brachytherapy needle deflection
evaluation and correction. Med. Phys. 32(4), 902–909 (2005)
6. Asadian, A., Kermani, M.R., Patel, R.V.: An analytical model for deflection of flexible
needles during needle insertion. In: 2011 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), pp. 2551–2556 (2011)
7. Dorileo, E., Zemiti, N., Poignet, P.: Needle deflection prediction using adaptive slope model.
In: 2015 IEEE International Conference on Advanced Robotics (ICAR), pp. 60–65 (2015)
8. Roesthuis, R.J., Van Veen, Y.R.J., Jahya, A., Misra, S.: Mechanics of needle-tissue
interaction. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), pp. 2557–2563 (2011)
9. Sadjadi, H., Hashtrudi-Zaad, K., Fichtinger, G.: Fusion of electromagnetic trackers to
improve needle deflection estimation: simulation study. IEEE Trans. Biomed. Eng. 60(10),
2706–2715 (2013)
10. Goksel, O., Dehghan, E., Salcudean, S.E.: Modeling and simulation of flexible needles. Med.
Eng. Phys. 31(9), 1069–1078 (2009)
11. Lagarias, J.C., Reeds, J.A., Wright, M.H., Wright, P.E.: Convergence properties of the
Nelder–Mead simplex method in low dimensions. SIAM J. Optim. 9(1), 112–147 (1998)
12. Du, H., Zhang, Y., Jiang, J., Zhao, Y.: Needle deflection during insertion into soft tissue
based on virtual spring model. Int. J. Multimedia Ubiquit. Eng. 10(1), 209–218 (2015)
13. Jayender, J., Lee, T.C., Ruan, D.T.: Real-time localization of parathyroid adenoma during
parathyroidectomy. N. Engl. J. Med. 373(1), 96–98 (2015)
Bone Enhancement in Ultrasound Based on 3D
Local Spectrum Variation for Percutaneous
Scaphoid Fracture Fixation
1 Introduction
Scaphoid fracture is the most likely outcome of a wrist injury, and it often occurs due to a sudden fall on an outstretched arm. To heal the fracture, casting
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 465–473, 2016.
DOI: 10.1007/978-3-319-46720-7 54
466 E.M.A. Anas et al.
is usually recommended, immobilizing the wrist in a short arm cast. The typical healing time is 10–12 weeks; however, it can be longer, especially for a
fracture located at the proximal pole of the scaphoid bone [8]. Better outcome
and faster recovery are normally achieved through open (for displaced fracture)
or percutaneous (for non-displaced fracture) surgical procedure, where a surgical
screw is inserted along the longest axis of the fractured scaphoid bone within a
clinical accuracy of 2 mm [7].
In the percutaneous surgical approach for scaphoid fractures, fluoroscopy is
usually used to guide the screw along its desired drill path. The major drawbacks
of a fluoroscopic guidance are that only a 2D projection view of a 3D anatomy can
be used and that the patient and the personnel working in the operating room are
exposed to radiation. For reduction of the X-ray radiation exposure, a camera-
based augmentation technique [10] can be used. As an alternative to fluoroscopy,
3D ultrasound (US)-based procedure [2,3] has been suggested, mainly to allow
real-time 3D data for the navigation. However, the main challenge of using US
in orthopaedics lies in enhancing the weak, disconnected, blurry and noisy bone responses in US images.
The detection and enhancement of US bone responses can be broadly categorized into two groups: intensity-based [4] and phase-based approaches [2,5,6]. A review of the literature suggests that the phase-based approaches have an advantage where there are low-contrast or variable bone responses, as often observed in 3D US data. Hacihaliloglu et al. [5,6] proposed a
number of phase-based bone enhancement approaches using a set of quadrature
band-pass (Log-Gabor) filters at different scales and orientations. These filters
assumed isotropic frequency responses across all orientations. However, the bone responses in US have a highly directional nature, which in turn produces anisotropic frequency responses in the frequency domain. Most recently, Anas et al. [2] presented an empirical wavelet-based approach to design a set of 2D anisotropic
band-pass filters. For bone enhancement of a 3D US volume, that 2D approach
could be applied to individual 2D frames of a given US volume. However, as a
2D-based approach, it cannot take advantage of correlations between adjacent
US frames. As a result, the enhancement is affected by the spatial compounding
errors and the errors resulting from the beam thickness effects [5].
In this work, we propose to utilize local 3D Fourier spectrum variations to
design a set of Log-Gabor filters for 3D local phase symmetry estimation applied
to enhance the wrist bone response in 3D US. In addition, information from
the shadow map [4] is utilized to further enhance the bone response. Finally, a
statistical wrist model is registered to the enhanced response to derive a patient-
specific 3D model of the wrist bones. A study consisting of 13 cadaver wrists
is performed to determine the accuracy of the registration, and the results are
compared with two previously published bone enhancement techniques [2,5].
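The radial component of a Log-Gabor band-pass filter has a standard closed form, shown below; the anisotropic angular weighting that this work derives from the local 3D spectrum variation is omitted, so this is only a generic sketch:

```python
import numpy as np

def log_gabor_radial(freq, f0, sigma_ratio=0.55):
    """Standard Log-Gabor radial response: a Gaussian on a log-frequency
    axis, peaking at the centre frequency f0 and zero at DC."""
    g = np.zeros_like(freq, dtype=float)
    nz = freq > 0
    g[nz] = np.exp(-np.log(freq[nz] / f0) ** 2
                   / (2.0 * np.log(sigma_ratio) ** 2))
    return g
```

Working on a log-frequency axis is what gives the filter zero DC response and an arbitrarily wide bandwidth, both useful for phase-symmetry estimation.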
2 Methods
Bone responses in US are highly directional with respect to the direction of
scanning, i.e., the width of the bone response along the scanning direction is
Fig. 1. Utilization of the spectrum variation in local phase symmetry estimation. (a)
A 3D frequency spectrum is divided into different cones, and each segmented cone is
further partitioned into different orientations. (b) The variation of spectrum strength
over the polar angle. (c) The variation of spectrum strength over the angular frequency.
For acquisition of US data from each cadaver wrist, a motorized linear probe
(Ultrasonix 4D L14-5/38, Ultrasonix, Richmond, BC, Canada) was used with
a frequency of 10 MHz, a depth of 40 mm and a field-of-view of 30◦ focusing
mainly on the scaphoid bone. A custom-built wrist holder was used to keep the
wrist fixed at extension position (suggested by expert hand surgeons) during
scanning. To obtain a preoperative image and a ground truth of wrist US bone
responses, CTs were acquired at neutral and extension positions, respectively,
for all 13 cadaver wrists. An optical tracking system equipped with six fiducial
markers was used to track the US probe.
3.2 Evaluation
To generate the ground truth wrist bone surfaces, CTs were segmented manually
using the Medical Imaging Interaction Toolkit. Fiducial-based registration was
used to align the segmented CT with the wrist bone responses in US. A manual adjustment was also needed afterward to compensate for the movement of the wrist bones during US acquisition due to the US probe's pressure on the wrist. The
manual translational adjustment was mainly performed along the direction of the
US scanning axis by registering the CT bone surfaces to the US bone responses.
For evaluation, we measured the mean surface distance error (mSDE) and
maximum surface (Hausdorff) distance error (mxSDE) between the registered
and reference wrist bone surfaces. The surface distance error (SDE) at each point
in the registered bone surface is defined as its Euclidean distance to the clos-
est neighboring point in the reference surface. mSDE and mxSDE are defined
as the average and maximum of SDEs across all vertices, respectively. We also
recorded the run-times of the three bone enhancement techniques from unoptimized MATLAB™ (Mathworks, Natick, MA, USA) code on an Intel Core i7-2600M CPU at 3.40 GHz, for a US volume of size 57.3 × 36.45 × 32.7 mm³ with a voxel spacing of 0.4 mm in all dimensions.
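The two surface metrics can be sketched with a brute-force nearest-neighbour search; for real meshes a k-d tree would be used instead:

```python
import numpy as np

def surface_distance_errors(registered, reference):
    """SDE per vertex: Euclidean distance from each registered-surface point
    to its closest neighbouring point on the reference surface."""
    diff = registered[:, None, :] - reference[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def msde_mxsde(registered, reference):
    """mSDE and mxSDE: mean and maximum of the SDEs over all vertices."""
    sde = surface_distance_errors(registered, reference)
    return sde.mean(), sde.max()
```

Note that mxSDE, as defined, is the directed Hausdorff distance from the registered surface to the reference surface.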
3.3 Results
Table 1 reports a comparative result of our approach with respect to the EWLPS
and 3DLPS methods. For each bone enhancement technique, a consistent threshold value that provides the least error is used across the 13 cadaver cases.
Fig. 2. Results of the proposed, EWLPS and 3DLPS methods. (a-h) Example sagittal US frames are shown in (a), (e). The corresponding bone enhancements are shown in (b-d), (f-h). The differences in enhancement are prominent on the surfaces marked by arrows. (i-k) Example registration results of the statistical model to US for the three methods.
Bone Enhancement in Ultrasound Based on 3D Local Spectrum Variation 473
Bioelectric Navigation: A New Paradigm
for Intravascular Device Guidance
1 Introduction
As common vascular procedures become less invasive, the need for advanced
catheter navigation techniques grows. These procedures depend on accurate nav-
igation of endovascular devices, but the clinical state of the art presents signif-
icant challenges. In practice, the interventionalist plans the path to the area of
interest based on pre-interventional images, inserts guide wires and catheters,
and navigates to the area of interest using multiple fluoroscopic images. How-
ever, it is difficult and time-consuming to identify bifurcations for navigation,
and the challenge is compounded by anatomic irregularities.
B. Fuerst and E.E. Sutton are joint first authors, having contributed equally.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 474–481, 2016.
DOI: 10.1007/978-3-319-46720-7_55
In our novel system, the local measurement from the catheter is compared to
predicted measurements from a pre-interventional image to identify the global
position of the catheter relative to the vessel tree. It takes advantage of high-
resolution pre-interventional images and live voltage measurement for improved
device navigation. Its primary benefit would be the reduction of radiation expo-
sure for the patient, interventionalist, and staff. Experiments in a synthetic vessel
tree and ex vivo biological tissue show the potential of the proposed technology.
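The comparison of measured and predicted signals rests on a simple physical relation: in a conductance-catheter setting, each slice of saline-filled vessel acts as a series resistor, so the local voltage drop at constant current tracks the inverse cross-sectional area. The sketch below illustrates that forward model only; the function name, resistivity value, and all numbers are illustrative assumptions, not the authors' simulation pipeline.

```python
# Illustrative conductance model: per-slice voltage along a saline-filled
# vessel. Each centerline slice of length dz is treated as a series
# resistor R = rho * dz / A(z), so at constant current the voltage drop
# is proportional to the inverse cross-sectional area.

def simulated_voltage(areas_mm2, dz_mm=1.0, current_uA=1.0, rho_ohm_mm=700.0):
    """Per-slice voltage drops (arbitrary units) from cross-sectional areas."""
    return [current_uA * rho_ohm_mm * dz_mm / a for a in areas_mm2]

# A stenosis (small area) raises the local voltage; a bifurcation
# (larger effective area) lowers it, as marked by the stars in Fig. 2.
areas = [12.0, 12.0, 4.0, 12.0, 24.0, 12.0]   # hypothetical mm^2 profile
v = simulated_voltage(areas)
assert v[2] == max(v)   # stenosis -> voltage peak
assert v[4] == min(v)   # bifurcation -> voltage dip
```

This inverse-area relation is what makes the simulated voltage and the inverse cross-sectional area directly comparable in Fig. 2(B).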
476 B. Fuerst et al.
Fig. 2. (A) Simulation of synthetic vessel phantom from imported CAD geometry.
The electrodes (black) span the left-most stenosis in this image. The voltage decreases
at a bifurcation (blue star) and increases at a stenosis (pink star). (B) Simulated
voltage magnitude (green) and the inverse of the cross-sectional area (purple) from the
segmented CBCT.
the vessel model by identifying the reference path with the highest similarity
measure: the normalized cross-correlation with the test signal. In these initial
experiments, we advanced the catheter through approximately 90 % of each path,
so our analysis did not take advantage of the open-ended nature of the algorithm.
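The matching step described above can be sketched as follows. This is a deliberately reduced illustration: it scores equal-length signals with zero-normalized cross-correlation and picks the best reference path, whereas the actual method first aligns the signals with open-ended dynamic time warping (OE-DTW); the path names and data are hypothetical.

```python
import math

def ncc(a, b):
    """Zero-normalized cross-correlation of two equal-length signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    return sum(x * y for x, y in zip(da, db)) / denom

def classify_path(measured, references):
    """Return (best path id, similarity). In the full method the signals
    are first aligned with OE-DTW, which this sketch omits."""
    scores = {pid: ncc(measured, ref) for pid, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

refs = {"path_A": [0, 1, 3, 1, 0], "path_B": [0, -1, -3, -1, 0]}
pid, s = classify_path([0.1, 1.2, 2.9, 0.8, 0.0], refs)
assert pid == "path_A" and s > 0.9
```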
Fig. 3. A function generator supplied a sinusoidal signal to the current source, creating
a weak electric field around the catheter tip. Electrodes on the catheter recorded the
voltage as it was pulled through six paths of the phantom. Inset: catheter in phantom.
Fig. 4. (A) Synthetic phantom with labeled paths. The two halves of the phantom
were machined from acrylic and sealed with a thin layer of transparent waterproof
grease. When assembled, it measured 10 cm × 25.4 cm × 5 cm. (B) Trials for which
OE-DTW incorrectly predicted catheter position. (C) The measured voltage (blue)
and the simulated signal (green) identify the two stenoses and four bifurcations. The
signals appear correlated but misaligned. (D) The OE-DTW algorithm found a corre-
spondence path between the two signals. (E) OE-DTW aligned the simulated data to
the measured data and calculated the cross-correlation between the two signals.
Fig. 5. Biological tissue experiment (left) and results from one trial in the long path
(right). The stenosis and bifurcation are visible in both the inverse of the cross-sectional
area and voltage magnitude.
The impedance difference between saline and vessel is less dramatic than between
saline and acrylic, so we expected lower amplitude signals in biological tissue. We
sutured two porcine aortas into a Y-shaped vessel tree and simulated a stenosis
in the trunk with a cable tie. We embedded the vessel in a 20 % gelatin solution
and filled the vessel with 0.9 % saline. The ground truth catheter position was
recorded from fluoroscopic image series collected simultaneously with the voltage
measurements (Fig. 5). The catheter was advanced six times through the long
path and three times through the short path. The algorithm correctly identified
the path 9/9 times with similarity measure 0.6081 ± 0.1614.
4 Discussion
This preliminary investigation suggests that the location of the catheter in a
blood vessel can be estimated by comparing a series of local measurements to
simulated bioimpedance measurements from a pre-interventional image.
Our technology will benefit from further integration of sensing and imaging
before clinical validation. While OE-DTW did not perfectly predict the location
of the catheter, the trials for which the algorithm misclassified the path also had
the lowest similarity scores. In practice, the system would prompt the interven-
tionalist to take a fluoroscopic image when similarity is low. Because it measures
local changes in bioimpedance, we expect the highest accuracy in feature-rich
environments, those most relevant to endovascular procedures. The estimate is
least accurate in low-feature environments like a long, uniform vessel, but as soon
as the catheter reaches the next landmark, the real-time location prediction is
limited only by the resolution of the electric image from the catheter. A possible
source of uncertainty is the catheter’s position in the vessel cross-section relative
to the centerline, but according to our simulations and the literature [12], it does
not significantly impact the voltage measurement.
To display the real-time position estimate, our next step is to compare tech-
niques that match simulated and live data in real time (e.g. OE-DTW, Hidden
Markov Models, random forests, and particle filters). A limitation of these match-
ing algorithms is that they fail when the catheter changes direction (insertion vs
retraction). One way we plan to address this is by attaching a simple encoder to
the introducer sheath to detect the catheter’s heading and prompting our soft-
ware to only analyze data from when the catheter is being inserted. We recently
validated Bioelectric Navigation in biologically relevant flow in the synthetic
phantom and performed successful renal artery detection in the abdominal
aorta of a sheep cadaver model. Currently, we are evaluating the prototype’s
performance in vivo, navigating through the abdominal vasculature of swine.
References
1. Ambrosini, P., Ruijters, D., Niessen, W.J., Moelker, A., van Walsum, T.: Continu-
ous roadmapping in liver TACE procedures using 2D–3D catheter-based registra-
tion. Int. J. CARS 10, 1357–1370 (2015)
2. Aylward, S.R., Jomier, J., Weeks, S., Bullitt, E.: Registration and analysis of vas-
cular images. Int. J. Comput. Vis. 55(2), 123–138 (2003)
3. Dibildox, G., Baka, N., Punt, M., Aben, J., Schultz, C., Niessen, W.,
van Walsum, T.: 3D/3D registration of coronary CTA and biplane XA recon-
structions for improved image guidance. Med. Phys. 41(9), 091909 (2014)
4. Von der Emde, G., Schwarz, S., Gomez, L., Budelli, R., Grant, K.: Electric fish
measure distance in the dark. Nature 395(6705), 890–894 (1998)
5. Gabriel, S., Lau, R., Gabriel, C.: The dielectric properties of biological tissues: II.
Measurements in the frequency range 10 Hz to 20 GHz. Phys. Med. Biol. 41(11),
2251–2269 (1996)
6. Groher, M., Zikic, D., Navab, N.: Deformable 2D–3D registration of vascular struc-
tures in a one view scenario. IEEE Trans. Med. Imaging 28(6), 847–860 (2009)
7. Hettrick, D., Battocletti, J., Ackmann, J., Linehan, J., Waltier, D.: In vivo mea-
surement of real-time aortic segmental volume using the conductance catheter.
Ann. Biomed. Eng. 26, 431–440 (1998)
8. Metzen, M., Biswas, S., Bousack, H., Gottwald, M., Mayekar, K., von der Emde, G.:
A biomimetic active electrolocation sensor for detection of atherosclerotic lesions
in blood vessels. IEEE Sens. J. 12(2), 325–331 (2012)
9. Mitrovic, U., Spiclin, Z., Likar, B., Pernus, F.: 3D–2D registration of cerebral
angiograms: a method and evaluation on clinical images. IEEE Trans. Med. Imag-
ing 32(8), 1550–1563 (2013)
10. Pauly, O., Heibel, H., Navab, N.: A machine learning approach for deformable
guide-wire tracking in fluoroscopic sequences. In: Jiang, T., Navab, N., Pluim,
J.P.W., Viergever, M.A. (eds.) MICCAI 2010, Part III. LNCS, vol. 6363, pp. 343–
350. Springer, Heidelberg (2010)
11. Tormene, P., Giorgino, T., Quaglini, S., Stefanelli, M.: Matching incomplete time
series with dynamic time warping: an algorithm and an application to post-stroke
rehabilitation. Artif. Intell. Med. 45, 11–34 (2009)
12. Choi, H.W., Zhang, Z., Farren, N., Kassab, G.: Implications of complex anatomical
junctions on conductance catheter measurements of coronary arteries. J. Appl.
Physiol. 114(5), 656–664 (2013)
Process Monitoring in the Intensive Care Unit:
Assessing Patient Mobility Through Activity
Analysis with a Non-Invasive Mobility Sensor
1 Introduction
Monitoring human activities in complex environments is attracting increasing
interest [2,3]. Our current investigation is driven by automated hospital surveil-
lance, specifically, for critical care units that house the sickest and most fragile
patients. In 2012, the Institute of Medicine released their landmark report [4] on
developing digital infrastructures that enable rapid learning health systems; one
of their key postulates is the need for improved technologies for measuring
the care environment. Currently, simple measures such as whether the patient
has moved in the last 24 h, or whether the patient has gone unattended for sev-
eral hours require manual observation by a nurse, which is highly impractical
to scale. Early mobilization of critically ill patients has been shown to reduce
physical impairments and decrease length of stay [5]; however, the reliance on
direct observation limits the amount of data that can be collected [6].
To automate this process, non-invasive low-cost camera systems have begun
to show promise [7,8], though current approaches are limited due to the unique
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 482–490, 2016.
DOI: 10.1007/978-3-319-46720-7_56
Patient Mobility in the ICU with NIMS 483
2 Methods
Figure 1 shows an overview of our NIMS system. People are localized, tracked,
and identified using an RGB-D sensor. We predict the pose of the patient and
identify nearby objects to serve as context. Finally, we analyze in-place motion
and train a classifier to determine the highest level of patient mobility.
Fig. 1. Flowchart of our mobility prediction framework. Our system tracks people in
the patient's room, identifies the "role" of each ("patient", "caregiver", or "family
member"), detects relevant objects, and builds attribute features for mobility
classification.
Fig. 2. Full-body (red) and Head (green) detectors trained by [11]. The head detector
may fail with (a) proximity or (d) distance. The full-body detector may also struggle
with proximity [(b) and (c)]. (To protect privacy, all images are blurred). (Color figure
online)
where w_k^t(n) is the detection score, serving as a penalty for each detected bounding box.
486 A. Reiter et al.
$$\min_{X^t,\,M^t}\;\lambda_{\mathrm{det}} E_{\mathrm{det}} + \lambda_{\mathrm{spa}} E_{\mathrm{spa}} + \lambda_{\mathrm{exc}} E_{\mathrm{exc}} + \lambda_{\mathrm{reg}} E_{\mathrm{reg}} + \lambda_{\mathrm{dyn}} E_{\mathrm{dyn}} + \lambda_{\mathrm{app}} E_{\mathrm{app}} \qquad (6)$$
We refer the interested reader to [22] for more details on our tracking framework.
Table 1. Table comparing our Sensor Scale, containing the 4 discrete levels of mobil-
ity that the NIMS is trained to categorize from a video clip of a patient in the ICU, to
the standardized ICU Mobility Scale [23], used by clinicians in practice today.
We manually annotated: (1) head and full body bounding boxes; (2) person
identification labels; (3) pose labels; and (4) chair, upright, and down beds.
To train the NIMS Mobility classifier, 83 of the 109 video segments covering
the 5 left-out patients were selected, each containing 1000 images. For each clip,
a senior clinician reviewed and reported the highest level of patient mobility and
we trained our mobility classifier through leave-one-out cross validation.
Tracking, Pose, and Identification Evaluation - We quantitatively com-
pared our tracking framework to the current SOTA. We evaluate with the widely
used metric MOTA (Multiple Object Tracking Accuracy) [26], which is defined
as 100 % minus three types of errors: false positive rate, missed detection rate,
and identity switch rate. With our ICU dataset, we achieved a MOTA of 29.14 %
compared to −18.88 % with [15] and −15.21 % with [16]. Using a popular RGBD
Pedestrian Dataset [27], we achieve a MOTA of 26.91 % compared to 20.20 % [15]
and 21.68 % [16]. We believe the difference in improvement here is due to there
being many more occlusions in our ICU data compared to [27]. With respect to
our person and pose ID, we achieved 99 % and 98 % test accuracy, respectively,
over 1052 samples. Our tracking framework requires a runtime of 10 secs/frame
(on average), and speeding this up to real-time is a point of future work.
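The MOTA score quoted above combines the three error types into a single number. The sketch below shows that computation; the error breakdown in the example is hypothetical (only the resulting 29.14 % figure appears in the text).

```python
def mota(false_positives, misses, id_switches, num_gt_objects):
    """CLEAR-MOT accuracy [26]: 100 % minus the summed error rates.
    The score can go negative when errors outnumber the ground-truth
    annotations, as for the baselines [15, 16] on the ICU data."""
    errors = false_positives + misses + id_switches
    return 100.0 * (1.0 - errors / num_gt_objects)

# Hypothetical error counts over 10,000 ground-truth annotations that
# would reproduce the reported 29.14 % MOTA:
assert abs(mota(2500, 4300, 286, 10000) - 29.14) < 1e-6
assert mota(4000, 7500, 400, 10000) < 0   # errors exceed ground truth
```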
Mobility Evaluation - Table 2 shows a confusion matrix for the 83 video seg-
ments to demonstrate the inter-rater reliability between the NIMS and clinician
ratings. We evaluated the NIMS using a weighted Kappa statistic with a lin-
ear weighting scheme [28]. The strength of agreement for the Kappa score was
qualitatively interpreted as: 0.0–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as
moderate, 0.61–0.80 as substantial, 0.81–1.0 as perfect [28]. Our weighted Kappa
was 0.8616 with a 95 % confidence interval of (0.72, 1.0). To compare to a pop-
ular technique, we computed features using Dense Trajectories [17] and trained
an SVM (using Fisher Vector encodings with 120 GMMs), achieving a weighted
Kappa of 0.645 with a 95 % confidence interval of (0.43, 0.86).
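A linearly weighted Kappa of the kind used above can be computed as follows; the 4×4 confusion matrix in the example is hypothetical, not the one in Table 2.

```python
def weighted_kappa(confusion, weights=None):
    """Cohen's kappa with linear disagreement weights [28].
    confusion[i][j]: clips rated level i by the clinician, j by NIMS."""
    k = len(confusion)
    if weights is None:                       # linear weighting scheme
        weights = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    n = sum(sum(row) for row in confusion)
    row = [sum(confusion[i]) for i in range(k)]
    col = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    obs = sum(weights[i][j] * confusion[i][j]
              for i in range(k) for j in range(k)) / n
    exp = sum(weights[i][j] * row[i] * col[j]
              for i in range(k) for j in range(k)) / n ** 2
    return 1.0 - obs / exp

# A hypothetical 4-level confusion matrix with strong diagonal agreement:
cm = [[20, 2, 0, 0], [1, 18, 1, 0], [0, 1, 19, 1], [0, 0, 1, 19]]
kappa = weighted_kappa(cm)
assert 0.81 <= kappa <= 1.0   # "perfect" band on the scale quoted above
```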
The main source of disagreement lay in differentiating "A" from "B". This
highlights a key difference between human and machine observation: the NIMS
is a computational method that distinguishes activities containing motion from
those that do not, using a quantitative, repeatable approach.
4 Conclusions
In this paper, we demonstrated a video-based activity monitoring system called
NIMS. With respect to the main technical contributions, our multi-person track-
ing methodology addresses a real-world problem of tracking humans in complex
environments where occlusions and rapidly-changing visual information occurs.
We will continue to develop our attribute-based activity analysis for more general
activities as well as work to apply this technology to rooms with multiple
patients and explore the possibility of quantifying patient/provider interactions.
References
1. Brower, R.: Consequences of bed rest. Crit. Care Med. 37(10), S422–S428 (2009)
2. Corchado, J., Bajo, J., De Paz, Y., Tapia, D.: Intelligent environment for moni-
toring Alzheimer patients, agent technology for health care. Decis. Support Syst.
44(2), 382–396 (2008)
3. Hwang, J., Kang, J., Jang, Y., Kim, H.: Development of novel algorithm and real-
time monitoring ambulatory system using bluetooth module for fall detection in
the elderly. In: IEEE EMBS (2004)
4. Smith, M., Saunders, R., Stuckhardt, K., McGinnis, J.: Best Care at Lower Cost:
the Path to Continuously Learning Health Care in America. National Academies
Press, Washington, DC (2013)
5. Hashem, M., Nelliot, A., Needham, D.: Early mobilization and rehabilitation in the
intensive care unit: moving back to the future. Respir. Care 61, 971–979 (2016)
6. Berney, S., Rose, J., Bernhardt, J., Denehy, L.: Prospective observation of physical
activity in critically ill patients who were intubated for more than 48 hours. J.
Crit. Care 30(4), 658–663 (2015)
7. Chakraborty, I., Elgammal, A., Burd, R.: Video based activity recognition in
trauma resuscitation. In: International Conference on Automatic Face and Ges-
ture Recognition (2013)
8. Lea, C., Facker, J., Hager, G., et al.: 3D sensing algorithms towards building an
intelligent intensive care unit. In: AMIA Joint Summits Translational Science Pro-
ceedings (2013)
9. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In:
IEEE CVPR (2005)
10. Chen, X., Mottaghi, R., Liu, X., et al.: Detect what you can: detecting and repre-
senting objects using holistic models and body parts. In: IEEE CVPR (2014)
11. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with
discriminatively trained part-based models. PAMI 32(9), 1627–1645 (2010)
12. Verceles, A., Hager, E.: Use of accelerometry to monitor physical activity in criti-
cally ill subjects: a systematic review. Respir. Care 60(9), 1330–1336 (2015)
13. Babenko, D., Yang, M., Belongie, S.: Robust object tracking with online multiple
instance learning. PAMI 33(8), 1619–1632 (2011)
14. Lu, Y., Wu, T., Zhu, S.: Online object tracking, learning and parsing with and-or
graphs. In: IEEE CVPR (2014)
15. Choi, W., Pantofaru, C., Savarese, S.: A general framework for tracking multiple
people from a moving camera. PAMI 35(7), 1577–1591 (2013)
16. Milan, A., Roth, S., Schindler, K.: Continuous energy minimization for multi-target
tracking. TPAMI 36(1), 58–72 (2014)
17. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: IEEE
ICCV (2013)
18. Karpathy, A., Toderici, G., Shetty, S., et al.: Large-scale video classification with
convolutional neural networks. In: IEEE CVPR (2014)
19. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recog-
nition in videos. In: NIPS (2014)
20. Wu, Z., Wang, X., Jiang, Y., Ye, H., Xue, X.: Modeling spatial-temporal clues in
a hybrid deep learning framework for video classification. In: ACMMM (2015)
21. Liu, J., Kuipers, B., Savarese, S.: Recognizing human actions by attributes. In:
IEEE CVPR (2011)
22. Ma, A.J., Yuen, P.C., Saria, S.: Deformable distributed multiple detector fusion
for multi-person tracking (2015). arXiv:1512.05990 [cs.CV]
23. Hodgson, C., Needham, D., Haines, K., et al.: Feasibility and inter-rater reliability
of the ICU mobility scale. Heart Lung 43(1), 19–24 (2014)
24. Girshick, R.: Fast R-CNN (2015). arXiv:1504.08083
25. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
volutional neural networks. In: NIPS (2012)
26. Keni, B., Rainer, S.: Evaluating multiple object tracking performance: the CLEAR
MOT metrics. EURASIP J. Image Video Process. 2008, 1–10 (2008)
27. Spinello, L., Arras, K.O.: People detection in RGB-D data. In: IROS (2011)
28. McHugh, M.: Interrater reliability: the Kappa statistic. Biochemia Med. 22(3),
276–282 (2012)
Patient MoCap: Human Pose Estimation Under
Blanket Occlusion for Hospital Monitoring
Applications
1 Introduction
Human motion analysis in the hospital is required in a broad range of diagnostic
procedures. While gait analysis and the evaluation of coordinated motor func-
tions [1,2] allow the patient to move around freely, the diagnosis of sleep-related
motion disorders and movement during epileptic seizures [3] requires a hospital-
ization and long-term stay of the patient. In specialized monitoring units, the
movements of hospitalized patients are visually evaluated in order to detect crit-
ical events and to analyse parameters such as lateralization, movement extent
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 491–499, 2016.
DOI: 10.1007/978-3-319-46720-7_57
492 F. Achilles et al.
2 Related Work
Human pose estimation in the hospital bed has so far been approached only as a
classification task, which yields a rough pose or the patient status [5,6].
Li et al. [5] use the Kinect sensor SDK in order to retrieve the patient pose and
estimate the corresponding status. However, they require the test subjects to
remain uncovered by a blanket, which reduces the practical value in real hospital
scenarios. Yu et al. [6] develop a method to extract torso and head locations
and use it to measure breathing motion and to differentiate sleeping positions.
No attempt was made to infer precise body joint locations and blanket occlusion
was reported to decrease the accuracy of the torso detection. While the number
of previous works that aim at human pose estimation for bed-ridden subjects is
limited, the popularity of depth sensors has pushed research on background-free
3D human pose estimation. Shotton et al. [7] and Girshick et al. [8] train Ran-
dom Forests on a large non-public synthetic dataset of depth frames in order to
capture a diverse range of human shapes and poses. In contrast to their method,
we rely on a realistic dataset that was specifically created to evaluate methods
for human pose estimation in bed. Furthermore, we augment the dataset with
blanket occlusions and aim at making it publicly available. More recently, deep
learning has entered the domain of human pose estimation. Belagiannis et al. [9]
use a convolutional neural network (CNN) and devise a robust loss function to
regress 2D joint positions in RGB images. Such one-shot estimations however
do not leverage temporal consistency. In the work of Fragkiadaki et al. [10], the
Patient MoCap: Human Pose Estimation Under Blanket Occlusion 493
3 Methods
3.1 Convolutional Neural Network
Fig. 1. Data generation and training pipeline. Motion capture (left) provides the
ground truth joint positions y, which are used to train a CNN-RNN model on depth
video. A simulation tool was used to occlude the input (blue) with a blanket (grey),
so that the system can learn to infer joint locations ŷ even under blanket occlusion.
Fig. 2. Snapshots of iterations of the physics simulation that was used to generate
depth maps occluded by a virtual blanket.
to generate depth maps with the person under a virtual blanket. Each RGB-D
frame is used as a collision body for a moving simulated blanket, represented as a
regular triangle mesh. At the beginning of a sequence, the blanket is added to the
scene at about 2 m above the bed. For each frame of the sequence, gravity acts
upon the blanket vertices. Collisions are handled by using a sparse signed dis-
tance function representation of the depth frame, implemented in OpenVDB [12].
See Fig. 2 for an example rendering. In order to optimize for the physical ener-
gies, we employ a state-of-the-art projection-based dynamics solver [13]. The
geometric energies used in the optimization are triangle area preservation, trian-
gle strain and edge bending constraints for the blanket and closeness constraints
for the collisions, which results in realistic bending and folding of the simulated
blanket.
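The simulation loop above can be caricatured in a few lines. The sketch below replaces the projective-dynamics solver [13] with a plain mass-spring grid and the OpenVDB signed-distance function with a simple heightfield, so it only illustrates the drop-gravity-collide cycle, not the actual constraint set (strain, area preservation, bending, closeness); all parameter values are made up.

```python
import numpy as np

def drop_blanket(body, n_steps=200, dt=0.01, g=9.81, stiffness=40.0):
    """Drop a cloth grid onto a heightfield `body` (height in metres)."""
    h, w = body.shape
    z = np.full((h, w), 2.0)            # blanket starts ~2 m above the bed
    v = np.zeros((h, w))
    for _ in range(n_steps):
        # spring forces pull each vertex toward the mean of its neighbours
        lap = (np.roll(z, 1, 0) + np.roll(z, -1, 0)
               + np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)
        v += dt * (stiffness * lap - g)
        z += dt * v
        # collision handling: project penetrating vertices back onto the body
        hit = z < body
        z[hit] = body[hit]
        v[hit] = 0.0
    return z

body = np.zeros((16, 16))
body[6:10, 4:12] = 0.25                 # a person-shaped bump on the bed
cloth = drop_blanket(body)
assert np.all(cloth >= body - 1e-6)     # blanket rests on, never inside, the body
```

Each iteration mirrors one frame of the real pipeline: apply gravity, integrate, then project colliding blanket vertices back onto the body surface.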
4 Experiments
To validate our method, we compare to the regression forest (RF) method
introduced by Girshick et al. [8]. The authors used an RF to estimate the body
pose from depth data. At the training phase, random pixels in the depth image
are taken as training samples. A set of relative offset vectors from each sample’s
3D location to the joint positions is stored. At each branch node, a depth-
difference feature is evaluated and compared to a threshold, which determines if
the sample is passed to the left or the right branch. Threshold and the depth-
difference feature parameters are jointly optimized to provide the maximum
information gain at the branch node. The tree stops growing after a maximum
depth has been reached or if the information gain is too low. At the leaves, the
sets of offsets vectors are clustered and stored as vote vectors. During test time,
body joint locations are inferred by combining the votes of all pixels via mean
shift. The training time of an ensemble of trees on >100 k images is prohibitively
long, which is why the original authors use a 1000-core computational cluster to
achieve state-of-the-art results [7]. To circumvent this requirement, we randomly
sample 10 k frames per tree. By evaluating the gain of using 20 k and 50 k frames
for a single tree, we found that the accuracy saturates quickly (compare Fig. 6
of [8]), such that using 10 k samples retains sufficient performance while cutting
down the training time from several days to hours.
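The split test evaluated at each branch node can be sketched as follows (after [7,8]); the toy image, offsets, and threshold are made up for illustration.

```python
# Sketch of the depth-difference split feature from [7, 8]: two pixel
# offsets, normalized by the depth at the probe pixel so the feature is
# depth-invariant, compared against a learned threshold at each node.

BG = 10.0  # large depth for pixels outside the image / background

def depth_at(img, x, y):
    h, w = len(img), len(img[0])
    return img[y][x] if 0 <= x < w and 0 <= y < h else BG

def feature(img, x, y, u, v):
    d = depth_at(img, x, y)
    probe = lambda off: depth_at(img, x + int(off[0] / d), y + int(off[1] / d))
    return probe(u) - probe(v)

def branch(img, x, y, u, v, threshold):
    """True -> sample goes to the left child, False -> right child."""
    return feature(img, x, y, u, v) < threshold

# A toy body at 1 m depth next to background: the feature is large when
# one probe lands on the body and the other off it.
img = [[1.0, 1.0, BG], [1.0, 1.0, BG], [1.0, 1.0, BG]]
assert feature(img, 1, 1, (2.0, 0.0), (-1.0, 0.0)) == BG - 1.0
```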
Fig. 3. Worst case accuracy computed on 36,000 test frames of the original dataset.
On the y-axis we plot the ratio of frames in which all estimated joints are closer to the
ground truth than a threshold D, which is plotted on the x-axis.
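The curve in Fig. 3 can be reproduced from per-frame joint errors as follows; the four example frames are hypothetical.

```python
def worst_case_accuracy(errors_per_frame, thresholds):
    """Fraction of frames whose *maximum* joint error is below each
    threshold D, i.e. the worst-case accuracy curve of Fig. 3."""
    worst = [max(frame) for frame in errors_per_frame]
    n = len(worst)
    return [sum(e < D for e in worst) / n for D in thresholds]

# Hypothetical per-joint errors (cm) for four frames:
frames = [[3, 5, 4], [2, 9, 6], [12, 4, 3], [1, 2, 2]]
curve = worst_case_accuracy(frames, thresholds=[5, 10, 15])
assert curve == [0.25, 0.75, 1.0]
```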
A blanket was simulated on a subset of 10,000 frames of the dataset (as explained
in Sect. 3.4). This set was picked from the clonic movement sequence, as it is
most relevant to clinical applications and allows comparing one-shot methods
(CNN and RF) with a time-series method (RNN) on repetitive movements under occlusion.
The three methods were trained on the new mixed dataset consisting of all
other sequences (not occluded by a blanket) and the new occluded sequence.
For the RF, we added a 6th tree which was trained on the occluded sequence.
Figure 4 shows a per joint comparison of the average error that was reached on
the occluded test set. Especially for hips and legs, the RF approach at over 20 cm
error performs worse than CNN and RNN, which achieve errors lower than 10 cm
except for the left foot. However, the regression forest manages to identify the
head and upper body joints very well and even beats the best method (RNN)
for head, right shoulder and right hand. In Table 1 we compare the average error
on the occluded sequence before and after retraining each method with blan-
ket data. Without retraining on the mixed dataset, the CNN performs best at
9.05 cm error, while after retraining the RNN clearly learns to infer a better joint
estimation for occluded joints, reaching the lowest error at 7.56 cm. Renderings
of the RNN predictions on unoccluded and occluded test frames are shown in
Fig. 5.
5 Conclusions
In this work we presented a unique hospital-setting dataset of depth sequences
with ground truth joint position data. Furthermore, we proposed a new scheme
for 3D pose estimation of hospitalized patients. Training a recurrent neural net-
work on CNN features reduced the average error both on the original dataset
and on the augmented version with an occluding blanket. Interestingly, the RNN
benefits substantially from seeing blanket-occluded sequences during training, while
the CNN improves only slightly. It appears that temporal information helps
to determine the location of limbs which are not directly visible but do interact
with the blanket. The regression forest performed well for arms and the head,
but was not able to deal with occluded legs and hip joints that are typically close
to the bed surface, resulting in a low contrast. The end-to-end feature learning
of our combined CNN-RNN model enables it to better adapt to the low contrast
of occluded limbs, which makes it a valuable tool for pose estimation in realistic
environments.
Acknowledgments. The authors would like to thank Leslie Casas and David Tan
from TUM and Marc Lazarovici from the Human Simulation Center Munich for their
support. This work has been funded by the German Research Foundation (DFG)
through grants NA 620/23-1 and NO 419/2-1.
References
1. Stone, E.E., Skubic, M.: Unobtrusive, continuous, in-home gait measurement using
the microsoft kinect. IEEE Trans. Biomed. Eng. 60(10), 2925–2932 (2013)
2. Kontschieder, P., Dorn, J.F., Morrison, C., Corish, R., Zikic, D., Sellen, A.,
D’Souza, M., Kamm, C.P., Burggraaff, J., Tewarie, P., Vogel, T., Azzarito, M.,
Glocker, B., Chin, P., Dahlke, F., Polman, C., Kappos, L., Uitdehaag, B.,
Criminisi, A.: Quantifying progression of multiple sclerosis via classification of
depth videos. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.)
MICCAI 2014, Part II. LNCS, vol. 8674, pp. 429–437. Springer, Heidelberg (2014)
3. Cunha, J., Choupina, H., Rocha, A., Fernandes, J., Achilles, F., Loesch, A.,
Vollmar, C., Hartl, E., Noachtar, S.: NeuroKinect: a novel low-cost 3Dvideo-EEG
system for epileptic seizure motion quantification. PLOS ONE 11(1), e0145669
(2015)
4. Benbadis, S.R., LaFrance, W., Papandonatos, G., Korabathina, K., Lin, K.,
Kraemer, H., et al.: Interrater reliability of eeg-video monitoring. Neurology
73(11), 843–846 (2009)
5. Li, Y., Berkowitz, L., Noskin, G., Mehrotra, S.: Detection of patient’s bed statuses
in 3D using a microsoft kinect. In: EMBC. IEEE (2014)
6. Yu, M.-C., Wu, H., Liou, J.-L., Lee, M.-S., Hung, Y.-P.: Multiparameter sleep mon-
itoring using a depth camera. In: Schier, J., Huffel, S., Conchon, E., Correia, C.,
Fred, A., Gamboa, H., Gabriel, J. (eds.) BIOSTEC 2012. CCIS, vol. 357, pp. 311–
325. Springer, Heidelberg (2013)
7. Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A.,
Cook, M., Moore, R.: Real-time human pose recognition in parts from single depth
images. Commun. ACM 56(1), 116–124 (2013)
8. Girshick, R., Shotton, J., Kohli, P., Criminisi, A., Fitzgibbon, A.: Efficient regres-
sion of general-activity human poses from depth images. In: ICCV. IEEE (2011)
9. Belagiannis, V., Rupprecht, C., Carneiro, G., Navab, N.: Robust optimization for
deep regression. In: ICCV. IEEE (2015)
10. Fragkiadaki, K., Levine, S., Felsen, P., Malik, J.: Recurrent network models for
human dynamics. In: ICCV. IEEE (2015)
11. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint
arXiv:1308.0850 (2013)
12. Museth, K., Lait, J., Johanson, J., Budsberg, J., Henderson, R., Alden, M.,
Cucka, P., Hill, D., Pearce, A.: OpenVDB: an open-source data structure and
toolkit for high-resolution volumes. In: ACM SIGGRAPH 2013 Courses. ACM
(2013)
13. Bouaziz, S., Martin, S., Liu, T., Kavan, L., Pauly, M.: Projective dynamics: fusing
constraint projections for fast simulation. ACM Trans. Graph. (TOG) 33(4), 154
(2014)
Numerical Simulation of Cochlear-Implant
Surgery: Towards Patient-Specific Planning
1 Introduction
Cochlear implant surgery can be used for profoundly deafened patients, for whom
hearing aids are not satisfactory. An electrode array is inserted into the tympanic
ramp of the patient's cochlea (scala tympani). When well inserted, this array can
then stimulate the auditory nerve and provide a substitute way of hearing (Fig. 2).
However, as of today, the surgery is performed manually and the surgeon has only
little perception of what happens in the cochlea during the insertion [1].

Fig. 2. Cross-section of a cochlea with implant inserted.

Yet, it is often the case that the implant gets blocked in the cochlea before
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 500–507, 2016.
DOI: 10.1007/978-3-319-46720-7_58
being completely inserted (Fig. 1). Another issue is that the insertion can create trauma on the walls of the cochlea and damage the basilar membrane. This can lead to poor postoperative speech performance, or to the loss of remaining acoustic hearing in the lower frequencies, which could otherwise be combined with electric stimulation. Simulating the insertion procedure would therefore be highly valuable. Indeed, it can be used for surgery planning, where the surgeon wishes to predict the quality of the insertion for a specific patient depending on various parameters (such as the insertion angle or the type of implant used), or, in the longer term, for surgery assistance (where the procedure would be robot-based). Cochlear implant surgery was simulated in [2,3], in two and three dimensions respectively, on simplified representations of the cochlea. These works provided first predictions of the forces endured by the cochlea walls.
In this contribution, we develop a framework able to accurately simulate, in three dimensions, the whole process of implant insertion into a patient-specific cochlea, including the deformation of the basilar membrane. The simulation is done using the finite element method and the SOFA framework¹. The implant is modelled using beam theory, while shell elements are used to define a computational model of the basilar membrane. The cochlea walls are modelled as rigid, which is a common assumption [4] given the bony nature of the cochlea.
Fig. 1. Examples of 3 insertions with different outcomes, from left to right: successful
insertion, failed insertion (folding tip), incomplete insertion.
In this section, we describe the numerical model used to capture the mechanical behavior and the specific shapes of the cochlear implant and the basilar membrane. The computation of the boundary conditions (contacts with the cochlea walls, insertion of the implant) is also described, as these play an important role in the simulation.
Implant Model: The implant is made of silicone and has about 20 electrodes (depending on the manufacturer) spread along its length. It is about half a millimetre thick and about two to three centimetres long. Its thin shape makes it possible to use beam elements to capture its motion and dynamics (see Fig. 3).

¹ www.sofa-framework.org.

502 O. Goury et al.

Fig. 3. (Left) The implant is modelled using beam elements; (middle) its motion is constrained by contact and friction responses to collisions with the cochlea walls; (right) contact forces induce strain on the basilar membrane.
Basilar Membrane Model: The basilar membrane separates two liquid-filled tunnels that run along the coil of the cochlea: the scala media and the scala tympani (through which the implant is inserted). It is made of a stiff material but is very thin (about 4 μm) and thus very sensitive to contact with the electrodes. During the insertion, even if the electrode is soft, the membrane will deform to comply with its local shape. In case of excessive contact force, the membrane will rupture: the electrode could then freely enter the scala media or the scala vestibuli. This leads to loss of remaining hearing, damage to the auditory nerve dendrites, and fibrosis. To represent the basilar membrane, we use a shell model [6] that derives from a combination of a triangular in-plane membrane element and a triangular thin plate in bending. The nodes of the membrane that are connected to the walls of the cochlea are fixed, as in the real anatomy.
Implant Motion: During the procedure, the implant is pushed (using pliers) through the round window, which marks the entrance of the cochlea. To simplify the implant model, we only simulate the portion of the implant that is inside the cochlea. The length of the beam model is thus increased progressively during the simulation to reproduce the insertion performed by the surgeon. Fortunately, our beam model relies on continuum equations, so we can adapt the sampling of beam elements at each simulation step while keeping the continuity of the values of F. The position and orientation of the implant body may play an important role (see Sect. 4), so these are not fixed. Conversely, we consider that the implant is pushed at constant velocity, as a motorized insertion tool was used in the experiments.
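The progressive-insertion idea can be sketched as follows (a minimal sketch, not the SOFA implementation; the function name, element size, and velocity are hypothetical illustration values):

```python
import math

def beam_nodes(inserted_length, max_elem_len):
    """Discretise the currently inserted portion of the implant into beam
    elements of at most max_elem_len, returning node abscissae along the
    implant centreline. Re-run at each simulation step as the length grows."""
    n_elems = max(1, math.ceil(inserted_length / max_elem_len))
    h = inserted_length / n_elems
    return [i * h for i in range(n_elems + 1)]

# Constant-velocity insertion: the simulated length grows linearly with time,
# and the beam sampling is rebuilt at every step.
velocity, dt = 1.0, 0.5            # mm/s and s (illustrative values)
for step in range(1, 5):
    nodes = beam_nodes(velocity * dt * step, max_elem_len=0.5)
```

After resampling, the continuum state (the values of F) would be interpolated onto the new nodes, which is what preserves continuity across steps.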
3 Experimental Validation
As mentioned in the introduction, it is difficult to have immediate feedback on how the implant deploys in the cochlea, due to the very limited workspace and visibility. This poor feedback prevents the surgeon from adapting and updating his/her gesture to improve the placement of the implant. To gain a better understanding of these behaviors and to simplify the measurements, we conducted implant placement experiments on cadaveric temporal bones. In this section, these experiments are presented, together with a comparison between the measurements and the simulation results.
Material: A custom experimental setup was built to evaluate the forces endured by the scala tympani during the insertion of an electrode array at constant velocity. This setup is described in Fig. 4. Recorded Data: This setup allows comparison of the forces measured when performing a manual insertion and a motorized, more regular, insertion. With this setup, we are able to reproduce failure cases such as incomplete insertion or the so-called folding-tip insertion, as displayed in Fig. 1.
Ability to Reproduce Incomplete Insertions: The goal of this first comparison is to show whether we can reproduce in simulation what is observed in practice. Due to contact and friction conditions, and because we work with biological structures, it is never possible to reproduce exactly the same insertion twice, even when the insertion is motorized. We therefore do not expect the simulation to be predictive. However, we show that the simulation is able to reproduce the different insertion scenarios (complete or incomplete insertion, folding tip). As in practice, the first important resistance to the insertion of the implant appears in the turn at the bottom of the cochlea (as in the middle picture of Fig. 3). This resistance creates a buckling of the implant that limits the transmission of force in the longitudinal direction until the implant presses against the cochlea walls and manages to advance again. If the resistance to motion is too large, the implant stays blocked. This differentiates a complete from an incomplete insertion and is captured by the simulation. Evolution of the Implant Forces While Performing the Insertion: An indicator of the
Fig. 4. Experimental setup. Microdissected cochleae are molded into resin (a) and fixed to a 6-axis force sensor (c). A motorized uniaxial insertion tool (b) is used to push the electrode array into the scala tympani at a constant velocity. The whole setup is shown schematically in (d).
smoothness of the insertion is the force applied on the implant by the surgeon during the surgery. To minimise trauma, that force should typically remain low. Experimental data show that this force generally increases as the insertion progresses. This is explained by the fact that, as the implant is inserted, its surface of contact with the cochlea walls and the basilar membrane increases, leading to more and more friction. The force peaks near the first turn of the cochlea wall (the basal turn). We see that the simulation reproduces this behaviour (see Figs. 1 and 6).
Many parameters can influence the results of the simulation. We distinguish the mechanical parameters (such as friction on the cochlea walls, stiffness of the implant, elasticity of the membrane, etc.) from the clinical parameters, which the surgeon can control to improve the success of the surgery. In this first study, among all the mechanical parameters, we chose to study the influence of friction, which is complex to measure. We show that the coefficient of friction has an influence on the completeness of the insertion but less influence on the force applied on the basilar membrane (see Fig. 7).
For the clinical parameters, we focus on the angle of insertion (see Fig. 5). The position and orientation of the implant relative to the cochlea tunnels play an important role in how easily the implant can be inserted. The anatomy makes it difficult to achieve a perfect alignment, but the surgeon still has some freedom in the placement of the tube tip. Furthermore, his mental representation of the optimal insertion axis is related to his experience, and even experts show a 7° alignment error [1]. We test the simulation with various insertion angles, from an aligned case with θ = 0° to a case where the implant is almost orthogonal
Fig. 5. (Left) Forces when performing motorized versus manual insertion using the setup presented in Fig. 4. (Right) Dissected temporal bone used during the experiments, with the definition of the insertion angle θ: the angle formed by the implant and the wall of the cochlea's entrance.
Fig. 6. Comparison between experiments and simulation in three cases. The simulation can reproduce the cases met in real experiments (see Fig. 1). Regarding the forces on the cochlea walls, the general trend of the simulation is similar to the experiments. To reproduce the folding-tip case in the simulation, which is rare in practice, the array was preplaced with a folded tip at the round window region, which is why the curve does not start from zero length. In the incomplete insertion case, the force increases greatly when the implant reaches the first turn. The simulation curve then stops, because we did not include the real anatomy outside the entrance of the cochlea, which would normally constrain the implant and cause the force to keep increasing.
Fig. 7. Forces applied on the cochlea wall (left) and the basilar membrane (center) at the first turn of the cochlea. Larger forces are generated when inserting the implant at a wide angle. Regarding the forces on the basilar membrane, there are two distinct groups of angles: small angles lead to much smaller forces than wider ones. Increasing the friction generally increases the forces (right), which leads to an early buckling of the implant outside the cochlea and hence to an incomplete insertion.
to the wall entrance with θ = 85°, and compare the outcome of the insertion, as well as the forces induced on the basilar membrane and the implant. The findings are displayed in Fig. 7.
Acknowledgements. The authors thank the foundation "Agir pour l'audition", which funded this work, and Oticon Medical.
References
1. Torres, R., Kazmitcheff, G., Bernardeschi, D., De Seta, D., Bensimon, J.L., Ferrary,
E., Sterkers, O., Nguyen, Y.: Variability of the mental representation of the cochlear
anatomy during cochlear implantation. European Archives of ORL, pp. 1–10 (2015)
2. Chen, B.K., Clark, G.M., Jones, R.: Evaluation of trajectories and contact pressures for the straight nucleus cochlear implant electrode array: a two-dimensional application of finite element analysis. Med. Eng. Phys. 25(2), 141–147 (2003)
3. Todd, C.A., Naghdy, F.: Real-time haptic modeling and simulation for prosthetic
insertion, vol. 73, pp. 343–351 (2011)
4. Ni, G., Elliott, S.J., Ayat, M., Teal, P.D.: Modelling cochlear mechanics. BioMed. Res. Int. 2014, 42 p. (2014). Article ID 150637. doi:10.1155/2014/150637
5. Kha, H.N., Chen, B.K., Clark, G.M., Jones, R.: Stiffness properties for nucleus
standard straight and contour electrode arrays. Med. Eng. Phys. 26(8), 677–685
(2004)
6. Comas, O., Cotin, S., Duriez, C.: A shell model for real-time simulation of intra-
ocular implant deployment. In: Bello, F., Cotin, S. (eds.) ISBMS 2010. LNCS, vol.
5958, pp. 160–170. Springer, Heidelberg (2010)
7. Johnson, D., Willemsen, P.: Six degree-of-freedom haptic rendering of complex
polygonal models. In: Haptic Interfaces for Virtual Environment and Teleoperator
Systems, HAPTICS 2003, pp. 229–235. IEEE (2003)
8. Tykocinski, M., Saunders, E., Cohen, L., Treaba, C., Briggs, R., Gibson, P., Clark,
G., Cowan, R.: The contour electrode array: safety study and initial patient trials
of a new perimodiolar design. Otol. Neurotol. 22(1), 33–41 (2001)
9. Kha, H.N., Chen, B.K.: Determination of frictional conditions between electrode
array and endosteum lining for use in cochlear implant models. J. Biomech. 39(9),
1752–1756 (2006)
Meaningful Assessment of Surgical Expertise:
Semantic Labeling with Data and Crowds
1 Introduction
A great musician, an all-star athlete, and a highly skilled surgeon share one
thing in common: the casual observer can easily recognize their expertise, sim-
ply by observing their movements. These movements, or rather, the appearance
of the expert in action, can often be described by words such as fluid, effort-
less, swift, and decisive. Given that our understanding of expertise is so innate
and ingrained in our vocabulary, we seek to develop a lexicon of surgical expertise through combined data analysis (e.g., user movements and physiological
response) and crowd-sourced labeling [1,2].
In recent years, the field of data-driven identification of surgical skill has
grown significantly. Methods now exist to accurately classify expert vs. novice
users based on motion analysis [3], eye tracking [4], and theories from motor con-
trol literature [5], to name a few. Additionally, it is also possible to rank several
users in terms of expertise through pairwise comparisons of surgical videos [2].
While all these methods present novel ways for determining and ranking exper-
tise, an open question remains: how can observed skill deficiencies translate into
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 508–515, 2016.
DOI: 10.1007/978-3-319-46720-7_59
more effective training programs? Leveraging prior work showing the superior-
ity of verbal coaching for training [6], we aim to develop and validate a mecha-
nism for translating conceptually difficult, but quantifiable, differences between
novice and expert surgeons (e.g. “more directed graphs on the known state tran-
sition diagrams” [7] and “superior exploitation of kinematic redundancy” [5])
into actionable, connotation-based feedback that a novice can understand and
employ.
The central hypothesis of this study is that human perception of surgical expertise is not so much a careful, rational evaluation as an instinctive, impulsive one. Prior work has proposed that surgical actions, or surgemes (e.g. knot tying, needle grasping), are ultimately the building blocks of surgery [8].
While these semantic labels describe the procedural flow of a surgical task, we
believe that surgical skill identification is more fundamental than how to push
a needle. It is about the quality of movement that one can observe from a short
snapshot of data. Are the movements smooth? Do they look fluid? Does the oper-
ator seem natural during the task? The hypothesis that expertise is a universal,
instinctive assessment is supported by recent work in which crowd-workers from
the general population identified surgical expertise with high accuracy [1]. Thus,
the key to developing effective training strategies is to translate movement qual-
ities into universally understandable, intuitive, semantic descriptors.
that is useful for counting stressful events, which correlate with increased anxiety [9], thus serving as a basis for a "calm/anxious" word pair. The choice of word pairs and the corresponding metrics is not unique; however, for the purpose of this paper, we simply aim to determine whether or not these word pairs have some relevance for surgical skill evaluation. The six preliminary word pairs chosen and their corresponding data metrics are listed in Table 1.
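As an illustration, counting GSR events can be sketched as a simple upward threshold-crossing count (a minimal sketch; the threshold and trace values are hypothetical, and principled decomposition methods such as [16] are more robust in practice):

```python
def count_gsr_events(signal, threshold):
    """Count skin-conductance responses as upward crossings of a threshold:
    each time the signal rises above the threshold after having been at or
    below it, one stressful event is recorded."""
    events, above = 0, False
    for sample in signal:
        if sample > threshold and not above:
            events += 1
            above = True
        elif sample <= threshold:
            above = False
    return events

# Hypothetical conductance trace (microsiemens) with three clear responses
trace = [0.1, 0.9, 2.6, 1.1, 0.4, 3.2, 1.8, 2.9, 0.5]
n_events = count_gsr_events(trace, threshold=2.0)  # → 3
```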
To quantify task movements and physiological response for our semantic label metrics, we chose to measure joint positions (elbow, wrist, shoulder), limb accelerations (hands, forearms), forearm muscle activity with EMG, and galvanic skin response (GSR). Joint positions were recorded using an electromagnetic tracker (trakSTAR, Model 180 sensors, Northern Digital Inc., Ontario, Canada) with an elbow estimation method as described in [5]. Limb accelerations, EMG and GSR were measured using sensor units from Shimmer Sensing, Inc. (Dublin, Ireland). Several muscles were selected for EMG measurement, including the bilateral (left and right) extensors and a flexor on the left arm, which are important for wrist rotation, as well as the abductor pollicis, which is important for pinch grasping with the thumb [13]. These muscles were recommended by a surgeon educator.

We also recorded videos of the user's posture and of the simulated surgical training task with CCD cameras (USB 3.0, Point Grey, Richmond, Canada). The Robot Operating System (ROS) was used to synchronize all data collection. The experimental setup and sensor locations are shown in Fig. 1(a,c).
[Fig. 1 labels: skills simulator; limb inertial measurement units with EMG + GSR; electromagnetic joint position tracking; EMG on the extensors, flexors and abductor pollicis; IMUs on the bilateral forearms, left hand and left foot; GSR sensor.]
The simulated surgical tasks chosen for this study were used to evaluate endowrist manipulation and needle control and driving skills (Fig. 1(a,c)). Endowrist instruments provide surgeons with a range of motion greater than that of the human hand; thus, these simulated tasks evaluate the subject's ability to manipulate these instruments. The needle driving task evaluates the subject's ability to effectively hand off and position needles for different types of suture throws (forehand and backhand) and with different hands (left and right).
Three subjects were recruited to participate in this study, approved by both the UTD and UTSW IRB offices (UTD #14-57, UTSW #STU 032015-053). The subjects (right-handed, 25–45 years old) consisted of an expert (6+ years of clinical robotic cases), an intermediate (PGY-4 surgical resident) and a novice (PGY-1 surgical resident). All subjects had limited to no training on the da Vinci simulator; however, the expert and intermediate had exposure to the da Vinci clinical robots. All subjects first performed two non-recorded warm-up tasks (Matchboard 3 for endowrist manipulation and Suture Sponge 2 for needle driving). After the warm-up, the subjects underwent baseline data collection, including arm measurements and maximum voluntary isometric muscle contractions (MVIC) for normalization and cross-subject comparison [14]. Subjects then performed the recorded experimental tasks for endowrist manipulation (Ring and Rail 2) and needle driving (Suture Sponge 3). For the purposes of data analysis, each task was subdivided into three repeated trials, corresponding to a single pass of a differently colored ring (red, blue or yellow) or to two consecutive suture throws.
512 M. Ershad et al.
Fig. 2. Wrist trajectory of subjects performing Ring and Rail 2 (red ring).
Fig. 3. Mean and standard deviation of all metrics for all trials and subjects. The panels compare novice, intermediate and expert subjects on the Ring and Rail and Suture Sponge tasks for the word-pair metrics, including wrist angular velocity variability in rad/s (sluggish vs. swift), number of GSR events (anxious vs. calm) and normalized mean EMG activation in %MVIC for the left arm extensor (tense vs. relaxed).
[Figure: assignment percentage (0–100 %) of the word labels fluid, smooth, crisp, swift, calm and relaxed to the expert, intermediate and novice subjects; panels (a) and (b).]
Acknowledgment. This work was supported by the Intuitive Surgical Simulator loan
program for the Southwestern Center for Minimally Invasive Surgery at UTSW (PI
Rege). We thank Deborah Hogg and Lauren Scott for providing access to the simulator.
References
1. Chen, C., et al.: Crowd-sourced assessment of technical skills: a novel method to
evaluate surgical performance. J. Surg. Res. 187(1), 65–71 (2014)
2. Malpani, A., Vedula, S.S., Chen, C.C.G., Hager, G.D.: Pairwise comparison-based
objective score for automated skill assessment of segments in a surgical task. In:
Stoyanov, D., Collins, D.L., Sakuma, I., Abolmaesumi, P., Jannin, P. (eds.) IPCAI
2014. LNCS, vol. 8498, pp. 138–147. Springer, Heidelberg (2014)
3. Howells, N.R., et al.: Motion analysis: a validated method for showing skill levels
in arthroscopy. J. Arthrosc. Relat. Surg. 24(3), 335–342 (2008)
4. Ahmidi, N., Hager, G.D., Ishii, L., Fichtinger, G., Gallia, G.L., Ishii, M.: Surgical
task and skill classification from eye tracking and tool motion in minimally invasive
surgery. In: Jiang, T., Navab, N., Pluim, J.P.W., Viergever, M.A. (eds.) MICCAI
2010, Part III. LNCS, vol. 6363, pp. 295–302. Springer, Heidelberg (2010)
5. Nisky, I., et al.: Uncontrolled manifold analysis of arm joint angle variability during
robotic teleoperation and freehand movement of surgeons and novices. IEEE Trans.
Biomed. Eng. 61(12), 2869–2881 (2014)
6. Porte, M.C., et al.: Verbal feedback from an expert is more effective than self-
accessed feedback about motion efficiency in learning new surgical skills. Am. J.
Surg. 193(1), 105–110 (2007)
7. Reiley, C.E., Hager, G.D.: Task versus subtask surgical skill evaluation of robotic
minimally invasive surgery. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A.,
Taylor, C. (eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 435–442. Springer,
Heidelberg (2009)
8. Lin, H.C., et al.: Towards automatic skill evaluation: detection and segmentation
of robot-assisted surgical motions. Comput. Aided Surg. 11(5), 220–230 (2006)
9. Critchley, H.D., et al.: Neural activity relating to generation and representation
of galvanic skin conductance responses: a functional magnetic resonance imaging
study. J. Neurosci. 20(8), 3033–3040 (2000)
10. Malpani, A., et al.: A study of crowdsourced segment-level surgical skill assessment
using pairwise rankings. Int. J. Comput. Assist. Radiol. Surg. 10(9), 1435–1447
(2015)
11. White, L.W., et al.: Crowd-sourced assessment of technical skill: a valid method for
discriminating basic robotic surgery skills. J. Endourol. 29(11), 1295–1301 (2015)
12. Kowalewski, T.M., et al.: Crowd-sourced assessment of technical skills for valida-
tion of Basic Laparoscopic Urologic Skills (BLUS) tasks. J. Urol. 195, 1859–1865
(2016)
13. Criswell, E.: Cram’s Introduction to Surface Electromyography. Jones & Bartlett
Publishers, Sudbury (2010)
14. Halaki, M., Ginn, K.: Normalization of EMG Signals: To Normalize Or Not to
Normalize and What to Normalize To?. INTECH Open Access Publisher, Rijeka
(2012)
15. Henrikson, R., et al.: Surgical trainer and navigator final report, pp. 1–14
16. Benedek, M., Kaernbach, C.: Decomposition of skin conductance data by means
of nonnegative deconvolution. Psychophysiology 47(4), 647–658 (2010)
2D-3D Registration Accuracy Estimation
for Optimised Planning of Image-Guided
Pancreatobiliary Interventions
1 Introduction
independent 'target' landmarks, both defined in the 3D image; error estimates for these landmarks, referred to here as 3D landmark localisation errors¹ (LLEs); error estimates for registration landmarks defined within the 2D image (2D LLEs); and estimates of the parameters of the transformation that matches the 2D and 3D registration landmarks.
Assuming that the 2D images are produced by projection, as is the case for ERCP, the 2D coordinates of a landmark, defined by the position vector $\mathbf{u} = [u, v]^T$ in the 2D image, are given by:

$$\mathbf{u} = f_p(\mathbf{x}; \boldsymbol{\theta}, K) \tag{1}$$

where $\mathbf{x} = [x, y, z]^T$ is the position vector containing the 3D coordinates of the landmark in the 3D image; $f_p$ is the perspective projection transformation; $K$ is the camera matrix, determined by the focal length (i.e. the distance between the X-ray source and detector) and the principal point coordinates. These 2D imaging parameters are assumed known and well-calibrated. $\boldsymbol{\theta} = [\alpha_x, \alpha_y, \alpha_z, t_x, t_y, t_z]^T$ is a vector containing the 3 rotation and 3 translation parameters of a rigid-body transformation $f_r$, which may also be parameterised by a $3 \times 3$ rotation matrix $R(\alpha_x, \alpha_y, \alpha_z)$ and a translation vector $\mathbf{t} = [t_x, t_y, t_z]^T$. Rewriting (1) in terms of $R$ and $\mathbf{t}$, and using homogeneous coordinates with a normalisation scaling factor $k$, we have:

$$k \begin{bmatrix} \mathbf{u} \\ 1 \end{bmatrix} = K \begin{bmatrix} R\mathbf{x} + \mathbf{t} \\ 1 \end{bmatrix}.$$
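The projection model above can be sketched in a few lines (a minimal sketch; the calibration values below are hypothetical, not taken from the paper's setup):

```python
import numpy as np

def project(x, K, R, t):
    """Pinhole projection: k [u, v, 1]^T = K (R x + t).
    Returns the 2D image coordinates after dividing out the scale k."""
    p = K @ (R @ x + t)      # homogeneous image coordinates
    return p[:2] / p[2]      # normalise by the scaling factor k

# Hypothetical calibration: focal length 1000 px, principal point (256, 256)
K = np.array([[1000.0, 0.0, 256.0],
              [0.0, 1000.0, 256.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                     # identity rotation for illustration
t = np.array([0.0, 0.0, 500.0])   # landmark 500 mm along the optical axis

u = project(np.array([10.0, -5.0, 0.0]), K, R, t)  # → [276., 246.]
```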
In this work, registration of the 2D- and 3D images was achieved by minimising a
collinearity-based error in 3D (as opposed to 2D) image space using the orthogonal
iteration algorithm [5].
Anatomical landmarks used in registration are defined with inherent uncertainties, due to intra-/inter-operator localisation error, anatomical variations, projection-related ambiguity, and tissue motion. Assuming an independent, anisotropic and heterogeneous Gaussian error model, the following errors are involved (see also Fig. 1): (a) a 3D LLE for the $i$th ($i = 1, \ldots, n$) 3D landmark $\mathbf{x}_i$, represented by a 3D covariance matrix $\Sigma_{\mathbf{x}_i}$; (b) a 2D LLE on the corresponding 2D landmark $\mathbf{u}_i$, represented by a 2D covariance matrix $\Sigma_{\mathbf{u}_i}$; (c) errors on the $m$ transformation parameters $\boldsymbol{\theta}$ (here, $m = 6$), represented by a 6D covariance matrix $\Sigma_{\boldsymbol{\theta}}$; and (d) an error, represented by a 3D matrix $\Sigma_{\mathbf{r}}$, associated with the target of interest, defined in the preoperative image by the position vector $\mathbf{r}$.
First, we would like to compute the uncertainty in the transformation parameters $\boldsymbol{\theta}$, i.e. $\Sigma_{\boldsymbol{\theta}}$, given the uncertainties from both the 2D and 3D LLEs, $\Sigma_{\mathbf{x}_i}$ and $\Sigma_{\mathbf{u}_i}$. Hoff et al. [6] derived a backward propagation of covariance using a direct least-squares pseudo-inversion of a full-rank Jacobian to estimate $\Sigma_{\boldsymbol{\theta}}$. Sielhorst et al. [7] directly utilised the forward and backward propagation of covariance, summarised in [8], to estimate errors for an optical tracking application. Both of these studies considered only 2D LLEs, i.e. the true
¹ Equivalent to fiducial localisation error.
Fig. 1. A schematic showing the variables and errors involved in registration and planning of an
ERCP-guided procedure (see Sects. 2 and 3 for details).
values of $\mathbf{x}_i$, the geometry of the calibrated tracking tool, are assumed known without uncertainty. However, this assumption does not hold in 2D-3D registration applications, and the registration error should be estimated by considering both $\Sigma_{\mathbf{x}_i}$ and $\Sigma_{\mathbf{u}_i}$. This can be achieved by modifying Eq. (1) and considering a new vector function:

$$\begin{bmatrix} \mathbf{u} \\ \mathbf{x} \end{bmatrix} = f_p\!\left( \begin{bmatrix} \boldsymbol{\theta} \\ \mathbf{x} \end{bmatrix}, K \right) \tag{2}$$
in which the same perspective projection still holds, but the 3D landmarks are now treated as additional parameters to estimate, as well as trivial function outputs. The parameter space of $f_p$ now becomes $(m + 3n)$-dimensional, whilst the function (measurement) space becomes $(2n + 3n)$-dimensional. Linearly approximating the vector transformation function $f_p$ by a first-order Taylor series, a $(2n + 3n) \times (m + 3n)$ Jacobian matrix $J_{f_p}$ can be computed to map the $(2n + 3n)$-dimensional covariance matrix $\Sigma_{U,X}$ onto the $(m + 3n)$-dimensional parameter space. Without loss of generality, and assuming independence between the measured landmarks, the new backward propagation formula is:

$$\hat{\Sigma}_{\boldsymbol{\theta},X} = \left( J_{f_p}^T \, \Sigma_{U,X}^{-1} \, J_{f_p} \right)^{+} \tag{3}$$
where

$$\Sigma_{U,X} = \operatorname{diag}\!\left( \Sigma_{\mathbf{u}_1}, \ldots, \Sigma_{\mathbf{u}_n}, \Sigma_{\mathbf{x}_1}, \ldots, \Sigma_{\mathbf{x}_n} \right), \quad \left[ J_{f_p}(\mathbf{u}_i) \right]_{i,\boldsymbol{\theta}} = \left[ \frac{\partial f_p}{\partial \mathbf{u}_i} \right]^T \text{ and } \left[ J_{f_p}(\mathbf{x}_i) \right]_{i,\boldsymbol{\theta}} = \left[ \frac{\partial f_p}{\partial \mathbf{x}_i} \right]^T.$$
where $\mathbf{u}_r = f_p(\mathbf{r})$ is the projected target point in 2D. A scalar root-mean-square error (RMSE) can also be computed: $\mathrm{TRE} = \sqrt{\operatorname{trace}\left( \hat{\Sigma}_{\mathbf{u}_r} \right)}$.
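The backward propagation and the scalar TRE can be sketched as follows (a minimal sketch under the independence assumptions above; the Jacobian and covariance values are illustrative only, not derived from the paper's data):

```python
import numpy as np

def backward_propagate(J, Sigma_meas):
    """Backward propagation of covariance (cf. Eq. 3):
    Sigma_params = (J^T Sigma_meas^{-1} J)^+  (Moore-Penrose pseudo-inverse)."""
    JtSJ = J.T @ np.linalg.inv(Sigma_meas) @ J
    return np.linalg.pinv(JtSJ)

def tre_rmse(Sigma_target):
    """Scalar target registration error: TRE = sqrt(trace(Sigma))."""
    return float(np.sqrt(np.trace(Sigma_target)))

# Illustrative example: identity Jacobian, isotropic 2 mm landmark noise
J = np.eye(3)
Sigma_meas = 4.0 * np.eye(3)                 # variance 4 mm^2 per axis
Sigma_params = backward_propagate(J, Sigma_meas)
rmse = tre_rmse(Sigma_params)                # sqrt(3 * 4) ≈ 3.46 mm
```

The pseudo-inverse handles the rank deficiency that can arise when the Jacobian is not full rank.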
with $\boldsymbol{\alpha}^{pat}$ and $\Delta\mathbf{t}$, respectively. Whilst it is noted that error estimates in patient positioning can be derived from observed registration errors, the published values (e.g. [9]), summarised in Table 1, were used in this study to reflect a plausible clinical scenario. The C-arm orientation $\boldsymbol{\alpha}^{carm}$, on the other hand, can be calibrated to a high degree of accuracy; therefore, its error is assumed negligible relative to that associated with patient positioning. If this error becomes significant, for example if an uncalibrated C-arm is used, an additional error term for the C-arm orientation should be considered.
Unlike the estimation of the intraoperative TRE, which can be conditioned on the registration transformation parameters, planning the C-arm orientation and patient position has to take into account (marginalise over) the uncertainty in patient positioning (see also Fig. 1). Given $\hat{\boldsymbol{\alpha}}^{pat}$, the uncertainty in planning can be represented by an error covariance $\Sigma_{\hat{\boldsymbol{\alpha}}^{carm}}$ on the optimised $\hat{\boldsymbol{\alpha}}^{carm}$. The entire process to compute the registered target position, with the parameterisation in Eq. (5), then becomes:

$$\mathbf{u}_r = f_p\!\left( f_r\!\left( \mathbf{r}; \hat{\boldsymbol{\alpha}}^{carm}, \hat{\boldsymbol{\alpha}}^{pat}, \Delta\mathbf{t} \right) \right).$$
Using the same treatment as for Eqs. (2) and (3), a composite covariance matrix can be computed as follows:

$$\hat{\Sigma}_{\hat{\boldsymbol{\alpha}}^{carm}, \hat{\boldsymbol{\alpha}}^{pat}, \Delta\mathbf{t}} = \left( J_{f_p}^T \, \Sigma_{\mathbf{u}_r, \hat{\boldsymbol{\alpha}}^{pat}, \Delta\mathbf{t}}^{-1} \, J_{f_p} \right)^{+} \tag{6}$$
As in Eq. (3), $\Sigma_{\mathbf{u}_r, \hat{\boldsymbol{\alpha}}^{pat}, \Delta\mathbf{t}}$ can be constructed from $\hat{\Sigma}_{\mathbf{u}_r}$ estimated from Eq. (4) and
MRCP and ERCP images were acquired for two patients who underwent ERCP-guided interventions under local research ethics approval. Anatomical landmarks were manually defined, on separate occasions, by an interventional radiologist, a gastroenterologist and three medical imaging research fellows. These included points on the ampulla, the hilum, the L1, L2 and T12 vertebrae, the pancreatic genu, the hepatic duct bifurcation, the cystic duct and pancreatic duct connections with the common bile duct, previously implanted surgical devices (e.g. a lap chole clip) and pathological features (e.g. a stricture); see an example in Fig. 2. The LLEs for these points, summarised in Table 1, were estimated from the variance of the multiple landmark definitions. In addition, five validation landmarks per case were defined to represent locations of clinical interest, such as points on the most distal aspect of the common bile duct and on the branches of the left and right hepatic ducts.
522 Y. Hu et al.
Table 1. Result summary from two sets of patient data; errors are summarised using RMSE.

Pat. | LLE (mm)                                        | Positioning error (°-mm)           | C-arm [1st, 2nd] angles (°) | Planning error (°) | Estimated TRE (mm) | Observed TRE (mm)
1    | 3D: 5.1 ± 1.2; 2D: 2.1 ± 0.6; Target: 3.3 ± 1.0 | Lateral: 8.9–11.5; Supine: 4.3–11.5 | [0.2, -0.1]                | 16.5 ± 3.3         | 4.9 ± 1.1          | 9.7 ± 3.0
     |                                                 |                                     | [-27.0, -0.1]              | 13.7 ± 5.0         | 6.1 ± 1.7          | 16.2 ± 3.4
2    | 3D: 3.4 ± 0.3; 2D: 2.4 ± 0.8; Target: 3.5 ± 0.6 | Prone: 4.3–11.5                     | [-0.1, 0.0]                | 92.8 ± 59.0        | 5.3 ± 0.9          | 12.0 ± 2.7
     |                                                 |                                     | [-18.9, 0.0]               | 114.3 ± 62.5       | 5.3 ± 0.8          | 13.2 ± 3.2
Monte-Carlo simulations were used to verify (1) the TRE estimated by Eq. (4)
and (2) the derived planning uncertainty given by Eq. (6), where the Jacobian was
estimated numerically. 10,000 simulations were performed, sampling the respective variables.
As illustrated in Fig. 3, the overall agreement is excellent, with a 4.5 ± 3.6 % difference
between the analytically computed TRE and the result of the numerical simulations,
measured as the RMSE. The overall RMSE ranged over [3.2, 9.8] mm
across the different C-arm positions. The difference in the planning uncertainty was 17.9 ± 9.7 % when
the RMSE was smaller than 50°, which we consider the useful planning range. As the planning
error increased, Eq. (6) provided an increasingly poor approximation, with up
to ~500 % difference from the simulation results. This is to be expected given that there
are only 2 DOFs in this case.
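The verification strategy, first-order (Jacobian-based) covariance propagation checked against Monte-Carlo sampling, can be sketched on a toy 2-DOF mapping; the function f below is a hypothetical stand-in, not the paper's projection model:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy nonlinear mapping standing in for the 2D-3D projection (hypothetical)
def f(x):
    return np.array([x[0] + 0.1 * x[1] ** 2, x[1] - 0.05 * x[0] * x[1]])

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian, as used to evaluate Eq. (6) numerically."""
    J = np.zeros((2, 2))
    for k in range(2):
        dx = np.zeros(2)
        dx[k] = h
        J[:, k] = (f(x + dx) - f(x - dx)) / (2 * h)
    return J

x0 = np.array([1.0, 2.0])
Sigma_x = np.diag([0.04, 0.09])            # input (landmark) covariance

# analytical first-order propagation: Sigma_y ~ J Sigma_x J^T
J = numerical_jacobian(f, x0)
Sigma_analytic = J @ Sigma_x @ J.T

# Monte-Carlo check with 10,000 samples, as in the verification above
samples = rng.multivariate_normal(x0, Sigma_x, size=10_000)
ys = np.array([f(s) for s in samples])
Sigma_mc = np.cov(ys.T)

rmse_analytic = np.sqrt(np.trace(Sigma_analytic))
rmse_mc = np.sqrt(np.trace(Sigma_mc))
```

For small input noise the two RMSE estimates agree to within a few percent, mirroring the 4.5 ± 3.6 % agreement reported above; the approximation degrades as the nonlinearity of f dominates.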
The measured TRE was computed for each target using ERCP images acquired
at different, sub-optimal C-arm angles (a consequence of the retrospective analysis). The
results, compared with the estimated TRE and planning error, are summarised in
Table 1. Although the estimated TREs significantly under-estimated the observed
values for both patients, possibly due to the rigid-body assumption, a trend implying potential
predictive ability was observed (correlation coefficient of 0.95). Significant differences
2D-3D Registration Accuracy Estimation for Optimised Planning 523
Fig. 3. Estimated TREs (RMSE in mm is indicated by the colour bar) for two example targets -
a bifurcation at left hepatic duct (left plot) and a stricture at pancreatic duct (right plot) -
computed using Monte-Carlo simulation (solid-line ellipse) versus analytical results (coloured
ellipse), for different C-arm orientations (in degrees). Please see the text for the experiment
details.
in both estimated and observed registration accuracy were found for Patient 1 between the
different C-arm angles (both p-values < 0.01), which is consistent with the estimated
planning errors producing mutually exclusive 90 % CIs. Patient 2 had larger
planning errors (overlapping within their respective 50 % CIs) for both
C-arm orientations, predicting that significant changes in TRE are unlikely. This was
confirmed by insignificant differences in both estimated and observed TREs
(p-values = 0.99 and 0.07).
Acknowledgements. This work is supported by CRUK, the EPSRC and the CIHR.
References
1. Anderson, M.A., Fisher, L., Jain, R., Evans, J.A., Appalaneni, V., Ben-Menachem, T., Fisher,
D.A.: Complications of ERCP. Gastrointest. Endosc. 75(3), 467–473 (2012)
2. Markelj, P., Tomaževič, D., Likar, B., Pernuš, F.: A review of 3D/2D registration methods for
image-guided interventions. Med. Image Anal. 16(3), 642–661 (2012)
3. Murphy, M.J., Adler, J.R., Bodduluri, M., Dooley, J., Forster, K., Hai, J., Poen, J.:
Image-guided radiosurgery for the spine and pancreas. Comput. Aided Surg. 5(4), 278–288
(2000)
4. Soltys, S.G., Goodman, K.A., Koong, A.C.: CyberKnife radiosurgery for pancreatic cancer.
In: Urschel, H.C., et al. (eds.) Treating Tumors that Move with Respiration, pp. 227–239.
Springer, Heidelberg (2007)
5. Lu, C.P., Hager, G.D., Mjolsness, E.: Fast and globally convergent pose estimation from
video images. IEEE Trans. PAMI 22(6), 610–622 (2000)
6. Hoff, W., Vincent, T.: Analysis of head pose accuracy in augmented reality. IEEE Trans. Vis.
Comput. Graph. 6(4), 319–334 (2000)
7. Sielhorst, T., Bauer, M., Wenisch, O., Klinker, G., Navab, N.: Online estimation of the target
registration error for n-ocular optical tracking systems. In: Ayache, N., Ourselin, S.,
Maeder, A. (eds.) MICCAI 2007, Part II. LNCS, vol. 4792, pp. 652–659. Springer,
Heidelberg (2007)
8. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge
University Press, Cambridge (2003)
9. Penney, G., Varnavas, A., Dastur, N., Carrell, T.: An image-guided surgery system to aid
endovascular treatment of complex aortic aneurysms: description and initial clinical
experience. In: Taylor, R.H., Yang, G.-Z. (eds.) IPCAI 2011. LNCS, vol. 6689, pp. 13–24.
Springer, Heidelberg (2011)
Registration-Free Simultaneous Catheter
and Environment Modelling
1 Introduction
Endovascular catheter procedures are among the most common surgical inter-
ventions used to treat Cardiovascular Diseases (CVD). Being minimally invasive,
these procedures extend the range of patients able to receive interventional CVD
treatment to age groups with high risks for open surgery [1]. However, the chal-
lenge associated with minimising access incisions lies in the increased complexity
of catheter manipulations, which is mainly caused by the loss of direct access
to the anatomy and the poor visualisation of the surgical site [2]. Thus, the
3D structure of the vasculature needs to be recovered intra-operatively in order
to model the interaction between the catheter and its surroundings and assist
catheter navigation.
The current clinical approaches to endovascular procedures mainly rely on
2D guidance based on X-ray fluoroscopy and the use of contrast agents [3].
An alternative imaging modality that does not depend on ionising radiation or
This work was supported by the FP7-ICT (601021) and the EPSRC (EP/L020688/1).
Dr. Stamatia Giannarou is supported by the Royal Society (UF140290).
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 525–533, 2016.
DOI: 10.1007/978-3-319-46720-7_61
526 L. Zhao et al.
2 Methods
In the state vector, $P_i = \{R_{P_i}, T_{P_i}\}$ is the current catheter pose in the EM
coordinate frame, in which $R_{P_i}$ and $T_{P_i}$ are the rotation matrix and the translation
vector respectively; $A_i = \{R_{A_i}, T_{A_i}\}$ is the pose of the anchored EM
sensor; and $R = \{R_R, T_R\}$ is the relative pose from the anchored EM sensor
to the CT coordinate frame, which corresponds to the EM-CT registration.
$C_i^C = [(c_1^C)^T, \dots, (c_n^C)^T]^T$ is the vessel contour computed from the pre-operative
data as the cross-section of the CT model and the plane defined by the catheter
pose $P_i$ transformed with $A_i$ and the registration pose $R$. Here, the IVUS images
and EM data used in the optimisation are gated to be at the same phase of the
cardiac cycle by the ECG signal. The anchored EM sensor is introduced only to
deal with global motion and we assume that at each phase of the cardiac cycle,
the relative pose between the anchor sensor and the vessel does not change.
The first term in (1) transforms the contour $C_i^C$ from the CT to the IVUS
coordinate frame and minimises the difference between the contour extracted
from the IVUS image and the contour computed from the pre-operative data,
weighted by the uncertainty of the IVUS contour ΣI . The second term in (1)
minimises the difference between the catheter pose and the pose reported by the
EM sensor, weighted by the uncertainty of the EM pose ΣE . The first two terms
in (1) are similar to SCEM+ [9], but with R and Ai included in the state vector.
The third term in the objective function (1) aims to minimise the difference
between the EM-CT registration pose R in the state vector and the optimal
solution of the registration pose R̂i−1 from the previous frame, weighted by the
corresponding covariance matrix ΣRi−1 computed from the proposed algorithm.
Here, (R̂i−1 , ΣRi−1 ) from the (i − 1)th frame is used as the observation in the
optimisation of the ith frame. The fourth term in the objective function is to
minimise the difference between the anchored EM pose in the state vector and
the observation of the EM pose reported from the anchored sensor, weighted by
the uncertainty of the EM sensor.
The optimal solution of the optimisation formulated in (1) can be obtained
iteratively by using the Gauss-Newton method, where in the k th iteration
$$
R^{k+1} = R^k + \Delta_R^k, \quad
P_i^{k+1} = P_i^k + \Delta_P^k, \quad
A_i^{k+1} = A_i^k + \Delta_A^k,
\qquad \text{where} \quad
J^T \Sigma^{-1} J
\begin{bmatrix} \Delta_R^k \\ \Delta_P^k \\ \Delta_A^k \end{bmatrix}
= J^T \Sigma^{-1} \varepsilon. \qquad (2)
$$
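A minimal sketch of the Gauss-Newton iteration in (2), applied to a toy weighted least-squares problem; the linear observation model below is illustrative, not the catheter model:

```python
import numpy as np

def gauss_newton(h, jac, x0, Sigma, y, n_iter=10):
    """Plain Gauss-Newton: each iteration solves the normal equations
    (J^T Sigma^-1 J) delta = J^T Sigma^-1 eps, then updates x <- x + delta,
    mirroring the structure of (2)."""
    x = np.asarray(x0, dtype=float)
    W = np.linalg.inv(Sigma)              # inverse observation covariance
    for _ in range(n_iter):
        eps = y - h(x)                    # residual of the observations
        J = jac(x)                        # Jacobian of the observation function
        delta = np.linalg.solve(J.T @ W @ J, J.T @ W @ eps)
        x = x + delta
    return x

# toy stand-in for (1): fit a line y = a*t + b to noisy observations
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.2])
h = lambda x: x[0] * t + x[1]
jac = lambda x: np.column_stack([t, np.ones_like(t)])
x_hat = gauss_newton(h, jac, np.zeros(2), np.eye(4), y)
```

For this linear model the iteration converges after one step to the weighted least-squares solution; for the nonlinear observation functions in (1) several iterations are needed.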
Here J is the linear mapping represented by the Jacobian matrix of the observa-
tion functions evaluated at Rk , Pki and Aki , Σ is the covariance matrix containing
the uncertainties of all the observations, and ε is the residual vector of all the
observations as
$$
J = \begin{bmatrix}
\frac{\partial C_i^I}{\partial R} & \frac{\partial C_i^I}{\partial P_i} & \frac{\partial C_i^I}{\partial A_i} \\
0 & \frac{\partial P_i^E}{\partial P_i} & 0 \\
\frac{\partial \hat{R}_{i-1}}{\partial R} & 0 & 0 \\
0 & 0 & \frac{\partial P_i^A}{\partial A_i}
\end{bmatrix}, \quad
\Sigma = \begin{bmatrix}
\Sigma_I & 0 & 0 & 0 \\
0 & \Sigma_E & 0 & 0 \\
0 & 0 & \Sigma_{R_{i-1}} & 0 \\
0 & 0 & 0 & \Sigma_E
\end{bmatrix}, \quad
\varepsilon = \begin{bmatrix}
C_i^I - f(R, P_i, A_i) \\
P_i^E - P_i \\
\hat{R}_{i-1} - R \\
P_i^A - A_i
\end{bmatrix} \qquad (3)
$$
where $f(\cdot)$ combines all the observation functions in the first term of (1),
$$
f(R, P_i, A_i) = [\dots, \, (((c_j^C)^T R_R^T + T_R^T) R_{A_i}^T + T_{A_i}^T - T_{P_i}^T) R_{P_i}, \, \dots]^T, \quad j = 1:n, \qquad (4)
$$
and $\frac{\partial P_i^E}{\partial P_i} = \frac{\partial \hat{R}_{i-1}}{\partial R} = \frac{\partial P_i^A}{\partial A_i} = E_6$, where $E_6$ is the $6 \times 6$ identity matrix.
For real-time implementation, the residual in the first term of (1) can be
replaced by the shortest distances to the pre-operative CT model. Thus, the
objective function and its Jacobians related to the first term can be pre-
calculated as the distance space and its gradient from the pre-operative data
[9]. By the formulation of the optimisation problem, the state vector can be simply
initialised using the observations $R^0 = \hat{R}_{i-1}$, $P_i^0 = P_i^E$ and $A_i^0 = P_i^A$.
After the optimal solutions of the EM-CT registration $\hat{R}_i$, the current
catheter pose $\hat{P}_i$ and the anchored sensor pose $\hat{A}_i$ are obtained, their corresponding
covariance matrices $\Sigma_{R_i}$, $\Sigma_{P_i}$ and $\Sigma_{A_i}$, which represent their uncertainty,
can also be computed using the Schur complement:
$$
\begin{aligned}
\Sigma_{R_i}^{-1} &= I_{RR} -
\begin{bmatrix} I_{RP} & I_{RA} \end{bmatrix}
\begin{bmatrix} I_{PP} & I_{PA} \\ I_{AP} & I_{AA} \end{bmatrix}^{-1}
\begin{bmatrix} I_{PR} \\ I_{AR} \end{bmatrix}, \\
\Sigma_{P_i}^{-1} &= I_{PP} -
\begin{bmatrix} I_{PR} & I_{PA} \end{bmatrix}
\begin{bmatrix} I_{RR} & I_{RA} \\ I_{AR} & I_{AA} \end{bmatrix}^{-1}
\begin{bmatrix} I_{RP} \\ I_{AP} \end{bmatrix}, \\
\Sigma_{A_i}^{-1} &= I_{AA} -
\begin{bmatrix} I_{AR} & I_{AP} \end{bmatrix}
\begin{bmatrix} I_{RR} & I_{RP} \\ I_{PR} & I_{PP} \end{bmatrix}^{-1}
\begin{bmatrix} I_{RA} \\ I_{PA} \end{bmatrix},
\end{aligned}
\quad \text{where} \quad
I = J^T \Sigma^{-1} J =
\begin{bmatrix} I_{RR} & I_{RP} & I_{RA} \\ I_{PR} & I_{PP} & I_{PA} \\ I_{AR} & I_{AP} & I_{AA} \end{bmatrix}. \qquad (5)
$$
Here $I_{RR}$, $I_{PP}$, $I_{AA}$ and $I_{RP} = I_{PR}^T$, $I_{RA} = I_{AR}^T$, $I_{PA} = I_{AP}^T$ are the blocks of
the information matrix $I$ corresponding to the variables $R$, $P_i$, $A_i$ and their
correlations, respectively.
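The Schur-complement marginalisation in (5) can be checked numerically: the marginal covariance of one block of variables obtained from the information matrix must equal the corresponding block of the full covariance. A sketch with a synthetic 6 × 6 information matrix (block sizes are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(1)

# a random symmetric positive-definite information matrix over (R, P, A)
M = rng.standard_normal((6, 6))
I_mat = M @ M.T + 6 * np.eye(6)

def marginal_covariance(I_mat, idx):
    """Covariance of the variables `idx` after marginalising the rest,
    via the Schur complement of the information matrix (cf. (5))."""
    rest = [k for k in range(I_mat.shape[0]) if k not in idx]
    I_aa = I_mat[np.ix_(idx, idx)]
    I_ab = I_mat[np.ix_(idx, rest)]
    I_bb = I_mat[np.ix_(rest, rest)]
    schur = I_aa - I_ab @ np.linalg.solve(I_bb, I_ab.T)
    return np.linalg.inv(schur)

# sanity check against the corresponding block of the full covariance
Sigma_full = np.linalg.inv(I_mat)
idx_R = [0, 1]
assert np.allclose(marginal_covariance(I_mat, idx_R),
                   Sigma_full[np.ix_(idx_R, idx_R)])
```

Working with the Schur complement avoids inverting the full information matrix when only one block's uncertainty is needed.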
The vessel reconstruction can be performed by transforming the IVUS contour
$C_i^I$ into the CT coordinate frame, $C_i = [c_1^T, \dots, c_n^T]^T$, using the optimal $\hat{R}_i$,
$\hat{P}_i$ and $\hat{A}_i$, with the corresponding covariance matrix $\Sigma_{C_i}$ as the uncertainty:
$$
c_j = \hat{R}_R^T (\hat{R}_{A_i} (\hat{R}_{P_i} c_j^I + \hat{T}_{P_i} - \hat{T}_{A_i}) - \hat{T}_R), \qquad
\Sigma_{C_i} = J_C \Sigma_S J_C^T, \qquad (6)
$$
where $J_C$ is the Jacobian matrix of $C_i$ w.r.t. the registration pose $R$, the catheter
pose $P_i$, the anchor pose $A_i$ and the IVUS contour $C_i^I$ respectively, and $\Sigma_S$
contains their covariance matrices on its diagonal:
$$
J_C = \begin{bmatrix} \frac{\partial C_i}{\partial R} & \frac{\partial C_i}{\partial P_i} & \frac{\partial C_i}{\partial A_i} & \frac{\partial C_i}{\partial C_i^I} \end{bmatrix}, \qquad
\Sigma_S = \mathrm{diag}(\Sigma_{R_i}, \Sigma_{P_i}, \Sigma_{A_i}, \Sigma_I). \qquad (7)
$$
At the end of the ith frame, the optimal solution of the EM-CT registration
pose together with the corresponding uncertainty (R̂i , ΣRi ) computed by (5) are
used as one of the observations in the (i + 1)th frame.
In the proposed algorithm, the uncertainty of the EM-CT registration pose is
initialised with zero information, i.e. $\Sigma_{R_0}^{-1} = 0_6$ (where $0_6$ is the $6 \times 6$ zero matrix), at
the first frame, to ensure that the proposed algorithm only uses the information
from IVUS, EM and the pre-operative data. Since the EM-CT registration is
incrementally estimated from IVUS and EM data, the result of the registration
will not be very accurate at the very beginning. As more parts of the vessel
are observed by IVUS, the EM-CT registration is updated intra-operatively and
becomes more accurate. With the formulation above, the information from
both the IVUS contour $(C_i^I, \Sigma_I)$ and the EM pose $(P_i^E, \Sigma_E)$ at the $i$th frame is
transferred into and accumulated in the covariance matrix $\Sigma_{R_i}$ of $\hat{R}_i$; in other words,
all the information from IVUS and EM from the 1st to the $i$th frame is summarised
in $\Sigma_{R_i}$ and used in the $(i+1)$th frame as an integrated observation $(\hat{R}_i, \Sigma_{R_i})$.
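This accumulation of information across frames can be illustrated on a scalar stand-in for the registration pose: each gated frame contributes one noisy observation, and the inverse variance (information) grows frame by frame starting from zero prior information. All values below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(42)
true_r, obs_var = 4.0, 0.25           # synthetic "registration" and noise variance

info, r_hat = 0.0, 0.0                # zero prior information at the first frame
for _ in range(200):
    z = true_r + rng.normal(0.0, np.sqrt(obs_var))   # per-frame observation
    # information-weighted fusion of the running estimate with the new frame
    r_hat = (info * r_hat + z / obs_var) / (info + 1.0 / obs_var)
    info += 1.0 / obs_var             # uncertainty shrinks every frame

var = 1.0 / info                      # accumulated covariance after 200 frames
```

After 200 frames the accumulated variance is obs_var/200, mirroring how the registration becomes more accurate as more of the vessel is observed.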
3 Results
3.1 Monte-Carlo Simulation
First, simulated data generated from a CT model with a known EM-CT registration,
together with perfect EM poses and IVUS contours as ground truth, were used to
assess the accuracy of the proposed algorithm w.r.t. the observation noise.
Different levels of zero-mean Gaussian noise were added to the ground truth EM
poses and to the IVUS contours, and these were used as observations for the proposed
algorithm. For each noise level, 25 runs were performed and the mean pose and
reconstruction errors are shown in Fig. 1(left). In Fig. 1(right), the changes of the
error of the EM-CT registration pose during the vessel reconstruction are shown
with the 2σ bound from the corresponding uncertainty estimation, when 0.1 rad
noise is added to the rotation and 1 mm noise to the translation of the EM pose,
Fig. 1. Monte-Carlo simulation: (left) the accuracy of catheter pose, vessel reconstruc-
tion and EM-CT registration pose w.r.t different levels of noise on the observations of
EM poses and IVUS contours, (right) the reduction of error (black lines) and uncer-
tainty (2σ bounds shown in blue lines) of the EM-CT registration pose.
and 1 mm noise to the IVUS contour. It can be seen that the reconstruction
errors remain small in the presence of noise, and that the error and uncertainty of
the EM-CT registration decrease quickly.
[Plot legends: registration error in rotation (rad) and translation (cm); reconstruction error (mm) for SCEM, SCEM+ and the proposed method, over Setup-1/2/3 and the static, periodic, global and global+periodic motion cases.]
Fig. 2. Accuracy of phantom experiments: (left) the static case using HeartPrint phan-
tom, (right) with global motion and periodic deformation using the silicone phantom.
Fig. 3. Vessel reconstruction results of the silicone phantom with global motion: (a) pre-
operative CT model, (b) result of SCEM shows the changes of the EM-CT registration,
(c) result of the proposed algorithm coloured by the error of reconstruction in mm, and
(d) the catheter tip poses found using SCEM (red) and the proposed algorithm (black).
In-vivo experiments in a swine model with global motion were also performed to
validate the proposed algorithm. A segmented CT scan provided the triangular
surface mesh of the aorta. Seven CT markers were attached to the body of the
swine but, as shown in Fig. 4(b), the EM-CT registration based on the CT markers has
a large error. In total, four pullbacks were performed. The IVUS was gated by the
ECG to deal with cardiac motion, and the results of the proposed algorithm are
shown in Fig. 4(c), (d) and Table 1. For the four pullbacks, the mean errors of vessel
reconstruction are 0.80, 0.83, 0.71 and 0.68 mm, respectively.
Fig. 4. Results of in-vivo experiments in swine model: (a) CT and IVUS image, (b)
SCEM results show the large error of EM-CT registration by using CT markers and
the global motion between Pullback-1 and Pullback-2, the results of Pullback-1 (c) and
Pullback-2 (d) by the proposed algorithm.
4 Conclusion
References
1. Mirabel, M., Iung, B., Baron, G., et al.: What are the characteristics of patients
with severe, symptomatic, mitral regurgitation who are denied surgery? Eur. Heart
J. 28(11), 1358–1365 (2007)
2. Kono, T., Kitahara, H., Sakaguchi, M., Amano, J.: Cardiac rupture after catheter
ablation procedure. Ann. Thorac. Surg. 80(1), 326–327 (2005)
3. Groher, M., Bender, F., Hoffmann, R.-T., Navab, N.: Segmentation-driven 2D-
3D registration for abdominal catheter interventions. In: Ayache, N., Ourselin, S.,
Maeder, A. (eds.) MICCAI 2007, Part II. LNCS, vol. 4792, pp. 527–535. Springer,
Heidelberg (2007)
4. Rosales, M., Radeva, P., Rodriguez-Leor, O., Gil, D.: Modelling of image-catheter
motion for 3-D IVUS. Med. Image Anal. 13(1), 91–104 (2009)
5. Wahle, A., Prause, G.P.M., DeJong, S.C., Sonka, M.: Geometrically correct
3-D reconstruction of intravascular ultrasound images by fusion with bi-plane
angiography-methods and validation. IEEE Trans. Med. Imag. 18(8), 686–699
(1999)
6. Bourantas, C.V., Papafaklis, M.I., Athanasiou, L., et al.: A new methodology for
accurate 3-dimensional coronary artery reconstruction using routine intravascu-
lar ultrasound and angiographic data: implications for widespread assessment of
endothelial shear stress in humans. EuroIntervention 9(5), 582–593 (2013)
1 Introduction
Preoperative planning of a safe and efficient trajectory for a Deep Brain Stimulation
(DBS) electrode is a crucial and challenging task that usually requires
long experience. The path is usually chosen as the best compromise between
multiple placement rules that may be contradictory, such as accurate targeting,
avoidance of various sensitive structures or zones, or compliance with standards.
Most of the automatic trajectory planning techniques that have been pro-
posed in the literature for DBS are based on mono-objective approaches
[1,2,5,7,11]. They combine the rules into a single aggregative weighted sum
and minimize it to find an optimal solution. This approach is intuitive and
close to the current decision-making process. However, the optimization
community has shown that using such mono-criterion approaches to solve
multi-criteria optimization problems can lead to an under-detection of the optimal
solutions in a given solution space: it often produces poorly distributed
solutions and does not find optimal solutions in non-convex regions [6].
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 534–541, 2016.
DOI: 10.1007/978-3-319-46720-7_62
Pareto Front vs. Weighted Sum for Automatic Trajectory Planning of DBS 535
While multi-criteria methods are already widely used for radiation therapy planning
[3], only recently have a few groups started to consider Pareto-optimality
techniques for path planning in minimally invasive surgery. For example, a
non-dominance-based optimization was described in [9,10] for radiofrequency ablation
of tumors. To our knowledge, however, no such method has been used in DBS.
The purpose of this work is to better understand and quantify the capacities
and limits of different approaches to detect optimal solutions in the case of
preoperative DBS path planning. We introduce to this context an optimality
quantification approach based on dominance with the computation of a Pareto
front. We compare it to a classical aggregative method based on a weighted
sum. For both methods, within a uniform distribution of candidate entry points,
optimal solutions are proposed and the difference is studied. These approaches
are described in detail in Sect. 2. Then, in Sect. 3, we describe the experiment
performed by an experienced neurosurgeon on 14 patient cases, in order to
quantify the relevant trajectories missed by the aggregative approach.
The set of all Pareto-optimal solutions is called a Pareto front. Let us denote
by S_PF the subset of points of S that belong to the Pareto front. Inside the front,
no solution dominates another.
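Dominance filtering, and the failure of any single weighted sum to reach a Pareto-optimal point lying in a concave region of the front, can be sketched as follows; the two-criterion cost values are hypothetical:

```python
import numpy as np

def pareto_front(costs):
    """Indices of non-dominated candidates (both criteria to be minimised)."""
    costs = np.asarray(costs, dtype=float)
    front = []
    for i in range(len(costs)):
        dominated = any(np.all(costs[j] <= costs[i]) and np.any(costs[j] < costs[i])
                        for j in range(len(costs)) if j != i)
        if not dominated:
            front.append(i)
    return front

# hypothetical two-criterion costs for four candidate entry points;
# candidate 1 lies in a non-convex (concave) part of the front,
# candidate 3 is dominated by candidate 1
costs = np.array([[1.0, 5.0], [3.0, 3.5], [5.0, 1.0], [4.0, 4.0]])
front = pareto_front(costs)

# sweep weighted sums over all convex weight pairs: candidate 1 is
# Pareto-optimal yet is never selected by any weighted sum
ws_hits = {int(np.argmin(costs @ np.array([w, 1.0 - w])))
           for w in np.linspace(0.0, 1.0, 101)}
```

Here the front contains candidates 0, 1 and 2, but the weighted-sum sweep only ever selects 0 or 2, which is exactly the under-detection behaviour discussed above [6].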
The objective of the test was (1) to compare the two methods on their coverage
over the surface of candidate entry points, and their ability to find the maximal
set of optimal solutions, and (2) to check whether the points found as optimal by
one method and not by the other one were likely to be chosen by neurosurgeons.
To this end, a retrospective study was performed on 14 datasets from 7
patients who underwent a bilateral Deep Brain Stimulation of the Subthalamic
Nucleus (STN) to treat Parkinson’s disease. Each dataset was composed of pre-
operative 3T T1 and T2 MRI with a resolution of 1.0 mm × 1.0 mm × 1.0 mm,
and a 3D brain model containing triangular surface meshes of the majority of
cerebral structures, segmented and reconstructed from the preoperative images
using the pyDBS pipeline described in [4]. The 3D structures include the STN,
a patch delineated on the skin as the search area for entry points, and
the ventricles and sulci that neurosurgeons try to avoid. The T1, T2 and 3D
meshes were registered in the same coordinate system.
A second pipeline was implemented and executed on the 3D scenes. First a
discretization S of the search space, as described in Sect. 2.3, was performed.
The distribution contained between 0.93 and 1.29 points per mm2 (average 1.07),
representing an average of 2,320 sample points per case on an average surface
of 2,158 mm2. Then we computed the subsets S_WSE and S_PF of points labelled
as optimal by methods M_WSE and M_PF respectively, as described in Sects. 2.1
and 2.2. Examples of the subsets of optimal points proposed by both methods are
presented in Figs. 1a and b. For each case we computed the difference sets of
points found by one method and not by the other, D_WSE = S_WSE − (S_WSE ∩ S_PF)
and D_PF = S_PF − (S_WSE ∩ S_PF), and their cardinality.
Finally, an experienced neurosurgeon was asked to perform a test in 4 steps.
(a) Step 2: weighted sum set S_WSE. (b) Step 3: Pareto front set S_PF.
Fig. 1. Case #12: area of feasible entry points, with solutions of M_WSE in blue,
solutions of M_PF in red, and the trajectory chosen with M_MP in green.
1 A video illustrating the experiment can be watched at http://goo.gl/mfgrqX or
https://youtu.be/16JthovAh5c.
Fig. 2. Number of points in S_WSE (in blue) and S_PF (in red) for the 14 cases
approach. In two other cases, the distance is higher than 4.8 mm, which is still
far from the preferred location. For the other 3 cases, the distance ranges between
1.6 mm and 2.05 mm which may correspond to relatively reasonable alternatives.
It is also interesting to observe that, for the two cases where M_MP was
ranked first (#7 and #13), the distance between the manually proposed entry
point and the closest point of S_PF (resp. 1.16 and 0.87 mm) was always lower
than the distance to the closest point of S_WSE (resp. 2.83 and 1.49 mm).
The average times taken for the three selection methods were
respectively 155 s for M_MP, 38 s for M_WSE, and 42 s for M_PF. Of course,
this measurement is biased because the target selection time is included only in
M_MP, as steps 2 and 3 consisted only of selecting an entry point. We did not
record separately the time required to select the entry point, because in step 1
we chose to let the surgeon go back and forth between target and entry point
position refinement to have a good accuracy. However, even considering that
planning the target point took half of the time in step 1, steps 2 and 3 were still
much faster. Besides, the improvement of speed was not at the cost of accuracy,
as an automatically proposed entry point was ranked first in 12/14 cases. This
experiment confirms the overall interest of automatic assistance to preoperative
trajectory planning for Deep Brain Stimulation.
Finally, we can notice that in 5 cases the surgeon did not choose the same
point using PF and WS, even though the preferred point was available in both. We
hypothesize that the display might need to be improved for M_PF, for instance
by using a color scheme for the objectives.
4 Conclusion
The automatic trajectory planning techniques that have been proposed for DBS
in the literature are based on mono-objective optimization approaches that com-
bine different criteria through weighted sums. Unfortunately, theory shows that
such techniques cannot find concavities in Pareto fronts, meaning that some
Pareto-optimal solutions cannot be reached.
This paper shows that methods using a quantification of trajectory quality
based on Pareto-optimality can find more optimal propositions than the
current state-of-the-art algorithms using weighted sums. The evaluation study we
conducted, involving a blind ranking, highlighted that the extra propositions can
often be chosen as more accurate by a neurosurgeon, and that some of them did
not have any reasonably close alternative proposed by the weighted sum method.
Finally, the recorded times indicated that the automatic assistance was, in 12
of 14 cases, both faster and more accurate than manual planning, which
further confirms the overall interest of automatic assistance for preoperative
trajectory planning for Deep Brain Stimulation.
Acknowledgments. The authors would like to thank the French Research Agency
(ANR) for funding this work through project ACouStiC (ANR 2010 BLAN 020901).
References
1. Bériault, S., Subaie, F.A., Collins, D.L., Sadikot, A.F., Pike, G.B.: A multi-modal
approach to computer-assisted deep brain stimulation trajectory planning. Int. J.
CARS 7(5), 687–704 (2012)
2. Brunenberg, E.J.L., Vilanova, A., Visser-Vandewalle, V., Temel, Y.,
Ackermans, L., Platel, B., ter Haar Romeny, B.M.: Automatic trajectory planning
for deep brain stimulation: a feasibility study. In: Ayache, N., Ourselin, S.,
Maeder, A. (eds.) MICCAI 2007, Part I. LNCS, vol. 4791, pp. 584–592. Springer,
Heidelberg (2007)
3. Craft, D.: Multi-criteria optimization methods in radiation therapy planning: a
review of technologies and directions (2013). arXiv preprint: arXiv:1305.1546
4. D’Albis, T., Haegelen, C., Essert, C., Fernandez-Vidal, S., Lalys, F., Jannin, P.:
PyDBS: an automated image processing workflow for deep brain stimulation
surgery. Int. J. Comput. Assist. Radiol. Surg. 10, 1–12 (2014)
5. Essert, C., Haegelen, C., Lalys, F., Abadie, A., Jannin, P.: Automatic computa-
tion of electrode trajectories for deep brain stimulation: a hybrid symbolic and
numerical approach. Int. J. Comput. Assist. Radiol. Surg. 7(4), 517–532 (2012)
6. Kim, I., de Weck, O.: Adaptive weighted-sum method for bi-objective optimization:
pareto front generation. Struct. Multidiscip. Optim. 29(2), 149–158 (2004)
7. Liu, Y., Konrad, P., Neimat, J., Tatter, S., Yu, H., Datteri, R., Landman, B.,
Noble, J., Pallavaram, S., Dawant, B., D’Haese, P.F.: Multisurgeon, multisite val-
idation of a trajectory planning algorithm for deep brain stimulation procedures.
IEEE Trans. Biomed. Eng. 61(9), 2479–2487 (2014)
8. Ng, K.W., Tian, G.L., Tang, M.L.: Dirichlet and Related Distributions: Theory,
Methods and Applications, vol. 888. Wiley, Chichester (2011)
9. Schumann, C., Rieder, C., Haase, S., Teichert, K., Süss, P., Isfort, P., Bruners, P.,
Preusser, T.: Interactive multi-criteria planning for radiofrequency ablation. Int.
J. CARS 10, 879–889 (2015)
10. Seitel, A., Engel, M., Sommer, C., Redeleff, B., Essert-Villard, C., Baegert, C.,
Fangerau, M., Fritzsche, K., Yung, K., Meinzer, H.P., Maier-Hein, L.: Computer-
assisted trajectory planning for percutaneous needle insertions. Med. Phys. 38(6),
3246–3260 (2011)
11. Trope, M., Shamir, R.R., Joskowicz, L., Medress, Z., Rosenthal, G., Mayer, A.,
Levin, N., Bick, A., Shoshan, Y.: The role of automatic computer-aided surgical
trajectory planning in improving the expected safety of stereotactic neurosurgery.
Int. J. CARS 10(7), 1127–1140 (2014)
Efficient Anatomy Driven Automated Multiple
Trajectory Planning for Intracranial Electrode
Implantation
1 Introduction
One-third of individuals with focal epilepsy continue to have seizures despite
optimal medical management. These patients are candidates for resection if the
epileptogenic zone (EZ) can be identified. Intracranial depth electrodes may be
implanted in the brain to record electroencephalographic (EEG) signals indica-
tive of epileptic activity in both deep and superficial regions of interest (ROIs)
within the cortex that have been identified as potential EZ. Implanted electrodes
are also used for stimulation studies to map eloquent areas (e.g. motor or sensory
cortex) and to determine whether a safe resection may be made that removes
the EZ without compromising eloquent cortex.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 542–550, 2016.
DOI: 10.1007/978-3-319-46720-7_63
2 Methodology
Fig. 1. Implantation plan with 10 electrodes. (a) Deep ROIs: amygdala (cyan), hip-
pocampus (yellow), anterior insula (brown), transverse temporal gyrus (blue), posterior
(orange) and middle cingulate (purple), posterior (green) and anterior medial orbital
gyrus (mauve). (b) Superficial ROIs: middle frontal (blue), superior frontal (purple),
middle temporal (light pink), and superior temporal (dark pink) lobes and supramar-
ginal (light orange), angular (light green), and precentral (dark green) gyri.
Ω_roi and Ω_cri are the sets of voxels in the deep ROI and in the critical structures,
respectively. f_roi(c) is the distance between c and the closest surface point on
the deep ROI, calculated using a bounding volume hierarchy (BVH) as in [8].
Similarly, f_cri(c) is the distance between c and the closest surface point on all
Fig. 2. (a) Axial, (b) sagittal, and (c) coronal views of C_t (high values of f_t(c) are
red, low values are green) for the hippocampus (blue), with blood vessels (cyan) and
trajectories determined by ADMTP (purple, pink).
critical structures. w_roi and w_cri control the relative importance of placing the
target within the deep ROI and of avoiding critical structures, respectively. Figure 2
displays C_t for a hippocampus; red corresponds to high values of f_t(c) and green
to low values. Target points are selected from c ∈ C_t by calculating local minima
using the watershed algorithm [1] and then sampling the M lowest values of f_t(c),
with a distance of at least d_tar between every pair of target points T_{n,i}.
Entry points, defined as E_{n,j} : j ∈ {1, . . . , P}, are computed using all vertices
on the skull mesh, roughly 10,000 points sampled every 0.2 mm3. Potential
trajectories, T_{n,i}E_{n,j} : i ∈ {1, . . . , M}, j ∈ {1, . . . , P}, are then removed from
consideration using a modified version of the approach of Zombori et al. [8] as follows:
After excluding trajectories based on these hard criteria, there typically remain
1,000–5,000 potential trajectories per electrode. For each remaining trajectory, a
risk score R_{n,i,j} and a GM ratio G_{n,i,j} are calculated. R_{n,i,j}, a measure of the
distance to critical structures, is computed as
$$
R_{n,i,j} = \frac{\int_{E_{n,j}}^{T_{n,i}} \big( d_{risk} - (f_{cri}(x) - d_{safe}) \big)\, dx}{(d_{risk} - d_{safe}) \cdot \mathrm{length}}, \qquad (2)
$$
where trajectories with f_cri(x) closer than d_safe have the highest risk (R_{n,i,j} = 1),
while those with f_cri(x) farther than d_risk have no risk (R_{n,i,j} = 0).
G_{n,i,j} measures the proportion of electrode contacts in GM. For each electrode,
Q contacts with a recording radius of p_r are spaced at even intervals p_q
along the trajectory. G_{n,i,j} is calculated as
$$
G_{n,i,j} = \frac{\sum_{q=1}^{Q} \big( H[f_{gm}(p_q - p_r)] + H[f_{gm}(p_q)] + H[f_{gm}(p_q + p_r)] \big)}{3Q}, \qquad (3)
$$
where f_gm(·) is the signed distance from the GM surface and H[·] is the Heaviside
function, with value 1 inside GM and 0 outside.
Each trajectory is assigned a weighted score S_{n,i,j} = 10 · R_{n,i,j} + G_{n,i,j}, where
the factor 10 was determined empirically so that low risk is prioritized over a
high GM ratio.
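The scoring pipeline (risk, GM ratio, weighted score) can be sketched with discrete samples along a trajectory; the margin values, the clipped-ramp form of the per-sample risk, and the sign convention for the GM distance are assumptions made for illustration:

```python
import numpy as np

D_SAFE, D_RISK = 3.0, 10.0   # hypothetical margins (mm); Table 1 fixes the real ones

def risk_score(dists_to_critical):
    """Discrete stand-in for (2): per-sample risk is 1 within d_safe of a
    critical structure, 0 beyond d_risk, ramping linearly in between."""
    d = np.asarray(dists_to_critical, dtype=float)
    return float(np.clip((D_RISK - d) / (D_RISK - D_SAFE), 0.0, 1.0).mean())

def gm_ratio(f_gm_samples):
    """Discrete stand-in for (3): fraction of contact samples whose signed
    distance indicates grey matter (H[.] = 1 inside GM; positive-inside is
    this sketch's assumption)."""
    return float((np.asarray(f_gm_samples) > 0).mean())

def weighted_score(risk, gm):
    return 10.0 * risk + gm   # risk dominates the GM ratio, as in the text

r = risk_score([2.0, 6.0, 12.0])      # one sample inside d_safe, one beyond d_risk
g = gm_ratio([0.5, -0.2, 1.0, 0.8])   # 3 of 4 samples inside GM
s = weighted_score(r, g)
```

Because risk is scaled by 10, a trajectory grazing a vessel can never be ranked above a safe one merely by sampling more grey matter.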
The final implantation plan V(N) is found by optimizing
$$
S_{total} = \underset{V(N)}{\operatorname{argmin}} \; \frac{1}{N} \sum_{n=1}^{N} S_{n,i,j}
\quad \text{s.t.} \quad D(T_{n,i}E_{n,j}, T_{k,i}E_{k,j}) > d_{traj} : \forall n, \forall k \in \{1, \dots, N\}, \; n \neq k, \qquad (4)
$$
where d_traj specifies the minimum distance between trajectories that do not conflict.
Because of the constraint d_traj, ADMTP will find unique targets even if the user
selects multiple electrodes for the same ROI. An implantation plan typically
contains 7–12 electrodes, each with 1,000–5,000 potential trajectories, representing
approximately 1 × 10^21 possible combinations; hence, a depth-first graph
search strategy is used to calculate a feasible implantation plan. If no combination
of trajectories satisfies d_traj, ADMTP returns the plan with the
largest distance between trajectories.
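The depth-first search over per-electrode candidate lists, backtracking whenever the pairwise distance constraint d_traj is violated, can be sketched as follows; trajectories are reduced to 1-D entry coordinates for illustration, and the fallback to the maximally separated plan is omitted:

```python
def plan_depth_first(candidates, conflict):
    """candidates: one list per electrode of (score, trajectory) pairs,
    sorted by ascending score; conflict(a, b) -> True when two
    trajectories are closer than d_traj. Returns the first feasible
    plan found by greedy depth-first search, or None."""
    plan = []

    def dfs(n):
        if n == len(candidates):
            return True                       # all electrodes placed
        for score, traj in candidates[n]:
            if all(not conflict(traj, p) for p in plan):
                plan.append(traj)
                if dfs(n + 1):
                    return True
                plan.pop()                    # backtrack on dead end
        return False

    return plan if dfs(0) else None

# toy example: conflict when entry coordinates are closer than 2 units
conflict = lambda a, b: abs(a - b) < 2.0
cands = [[(0.1, 0.0), (0.4, 5.0)],   # electrode 1 prefers 0.0
         [(0.2, 1.0), (0.3, 4.0)],   # electrode 2's best choice conflicts
         [(0.1, 2.5), (0.5, 8.0)]]   # electrode 3
plan = plan_depth_first(cands, conflict)
```

Visiting candidates in score order means the first feasible plan found is greedily low-cost, which keeps the search tractable despite the combinatorial number of full plans.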
Table 1. The following values were set by a consensus of 3 neurosurgeons: the most
oblique drillable angle, d_ang; the minimum safe distance from blood vessels, d_safe; the
distance at which there is no risk, d_risk; and the minimum distance between electrodes,
d_traj. A commonly used electrode configuration determined: the electrode length, d_len;
the number of contacts, Q; the interval between contacts, p_q; and the contact sample
radius, p_r. The following values were set empirically: the number of candidate targets, M;
the minimum distance between candidate targets, d_tar; and the relative importance of
sampling the ROI, w_roi, and of avoiding critical structures, w_cri.
Trajectories were assessed by angle with respect to the skull surface normal,
risk score, distance to nearest critical structure, and GM ratio. In Fig. 3 each
point corresponds to one trajectory with the manual plan value plotted on the
X axis and the ADMTP value plotted on the Y axis. The red point marks the center of mass for each measure. Points below the diagonal have a lower value for ADMTP than for the manual plan, which is preferred for the entry angle and the risk score; points above the diagonal have a higher value for ADMTP, which is preferred for the distance to critical structures and the GM ratio.
A two-tailed Student's t-test evaluated the statistical significance of differences between values determined by ADMTP and by manual planning, with the null hypothesis that the two methods return similar values.
ADMTP found a more feasible entry angle in 96/186 trajectories (p < 0.01)
and increased GM sampling in 104/186 trajectories (p > 0.01). ADMTP found
trajectories that were safer, in terms of a reduced risk score and an increased distance to the closest critical structure, in 145/186 trajectories (p < 0.01).
548 R. Sparks et al.
Fig. 4. Manual (pink) and ADMTP (blue) trajectories are shown with veins (cyan),
skull (opaque white), and with (a) the cortex (peach) and (b) no cortex.
4 Concluding Remarks
We presented an anatomically driven multiple trajectory planning (ADMTP)
algorithm for calculating intracerebral electrode trajectories from anatomical
regions of interest (ROIs). Compared to manual planning, ADMTP lowered
risk in 78 % of trajectories and increased GM sampling in 56 % of trajectories.
ADMTP was evaluated on quantitative measures of suitability; however, a qualitative analysis is necessary to assess its clinical suitability. Future work is required to ensure ADMTP provides trajectories that sample unique gyri. ADMTP efficiently calculates safe trajectories (<5 min).
Recognizing Surgical Activities with Recurrent
Neural Networks
1 Introduction
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 551–558, 2016.
DOI: 10.1007/978-3-319-46720-7 64
552 R. DiPietro et al.
learned convolutional filters. However, we note that even these unaries depend
only on inputs from fairly local neighborhoods in time.
In this work, we use recurrent neural networks (RNNs), and in particular long
short-term memory (LSTM), to map kinematics to labels. Rather than operating
only on local neighborhoods in time, LSTM maintains a memory cell and learns
when to write to memory, when to reset memory, and when to read from memory,
forming unaries that in principle depend on all inputs. In fact, we will rely only
on these unary terms, or in other words assume that labels are independent
given the sequence of kinematics. Despite this, we will see that predicted labels
are smooth over time with no post-processing. Further, using a single model
and a single set of hyperparameters, we match state-of-the-art performance for
gesture recognition and improve over state-of-the-art performance for maneuver
recognition, in terms of both accuracy and edit distance.
2 Methods
The goal of this work is to use nx kinematic signals over time to label every
time step with one of ny surgical activities. An individual sequence of length T
is composed of kinematic inputs {xt }, with each xt ∈ Rnx , and a collection of
one-hot encoded activity labels {yt }, with each yt ∈ {0, 1}ny . (For example, if we
have classes 1, 2, and 3, then the one-hot encoding of label 2 is (0, 1, 0)T .) We aim
to learn a mapping from {xt } to {yt } in a supervised fashion that generalizes to
users that were absent from the training set. In this work, we use recurrent neural
networks to discriminatively model p(yt |x1 , x2 , . . . , xt ) for all t when operating
online and p(yt |x1 , x2 , . . . , xT ) for all t when operating offline.
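The label encoding and data layout described above can be illustrated as follows (hypothetical helpers, not the authors' code):

```python
import numpy as np

def one_hot(label, n_y):
    """One-hot encode a 1-based class label: label 2 of 3 -> (0, 1, 0)."""
    y = np.zeros(n_y, dtype=int)
    y[label - 1] = 1
    return y

def encode_labels(labels, n_y):
    """Encode a length-T label sequence {y_t} as a (T, n_y) one-hot array."""
    return np.stack([one_hot(l, n_y) for l in labels])
```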
If we use the nonlinear block shown in Fig. 2b, we end up with a specific and
simple model: a vanilla RNN with one hidden layer. The recursive equation for
a vanilla RNN, which can be read off precisely from Fig. 2b, is
ht = tanh(Wx xt + Wh ht−1 + b) (1)
Here, Wx , Wh , and b are free parameters that are shared over time. For the
vanilla RNN, we have m̃t = h̃t = ht . The height of ht is a hyperparameter and
is referred to as the number of hidden units.
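Equation (1) unrolled over a sequence looks like this — a minimal NumPy sketch of a vanilla RNN (not the LSTM actually used in the paper), with parameter shapes chosen as assumptions:

```python
import numpy as np

def vanilla_rnn(x_seq, W_x, W_h, b):
    """Unroll Eq. (1), h_t = tanh(W_x x_t + W_h h_{t-1} + b), over x_seq.

    x_seq: (T, n_x) inputs; W_x: (n_h, n_x); W_h: (n_h, n_h); b: (n_h,).
    Returns the (T, n_h) sequence of hidden states, starting from h_0 = 0.
    """
    h = np.zeros(b.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # parameters shared over time
        states.append(h)
    return np.stack(states)
```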
In the case of multiclass classification, we use a linear layer to transform m̃t to
appropriate size ny and apply a softmax to obtain a vector of class probabilities:
ŷt = softmax(Wym m̃t + by ) (2)
p(ytk = 1 | x1 , x2 , . . . , xt ) = ŷtk (3)
where $\mathrm{softmax}(x)_k = \exp(x_k) / \sum_i \exp(x_i)$.
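Equations (2)–(3) amount to the following minimal sketch; the max-subtraction is a standard numerical-stability trick, not part of the paper's formula:

```python
import numpy as np

def softmax(z):
    """softmax(z)_k = exp(z_k) / sum_i exp(z_i), computed stably."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def class_probabilities(m_t, W_ym, b_y):
    """Eq. (2): project the hidden state to n_y logits and normalise."""
    return softmax(W_ym @ m_t + b_y)
```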
RNNs traditionally propagate information forward in time, forming predic-
tions using only past and present inputs. Bidirectional RNNs [12] can improve
performance when operating offline by using future inputs as well. This essen-
tially consists of running one RNN in the forward direction and one RNN in the
backward direction, concatenating hidden states, and computing outputs jointly.
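That concatenation scheme can be sketched generically; here `step_fwd` and `step_bwd` stand in for the two recurrent cells (LSTMs in the paper) and are assumptions:

```python
import numpy as np

def bidirectional_states(x_seq, step_fwd, step_bwd, n_h):
    """Run one recurrence forward and one backward over x_seq, then
    concatenate the hidden states at each time step -> shape (T, 2 * n_h).

    step_fwd / step_bwd: callables (x_t, h_prev) -> h_t.
    """
    T = len(x_seq)
    fwd, bwd = [], [None] * T
    h = np.zeros(n_h)
    for t in range(T):
        h = step_fwd(x_seq[t], h)
        fwd.append(h)
    h = np.zeros(n_h)
    for t in reversed(range(T)):  # the backward pass sees future inputs
        h = step_bwd(x_seq[t], h)
        bwd[t] = h
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])
```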
3 Experiments
3.1 Datasets
The JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) [2] is a
public benchmark surgical activity dataset recorded using the da Vinci. JIG-
SAWS contains synchronized video and kinematic data from a standard 4-throw
suturing task performed by eight subjects with varying skill levels. All subjects
performed about 5 trials, resulting in a total of 39 trials. We use the same
measurements and activity labels as the current state-of-the-art method [10].
Measurements are position (x, y, z), velocity (vx , vy , vz ), and gripper angle (θ)
for each of the left and right slave manipulators, and the surgical activity at each
time step is one of ten different gestures.
The Minimally Invasive Surgical Training and Innovation Center - Science
of Learning (MISTIC-SL) dataset, also recorded using the da Vinci, includes
49 right-handed trials performed by 15 surgeons with varying skill levels. We
follow [3] and use a subset of 39 right-handed trials for all experiments. All trials
consist of a suture throw followed by a surgeon’s knot, eight more suture throws,
and another surgeon’s knot. We used the same kinematic measurements as for
JIGSAWS, and the surgical activity at each time step is one of 4 maneuvers:
suture throw (ST), knot tying (KT), grasp pull run suture (GPRS), and inter-
maneuver segment (IMS). It is not possible for us to release this dataset at this
time, though we hope we will be able to release it in the future.
We performed a grid search over the number of RNN hidden layers (1 or 2),
the number of hidden units per layer (64, 128, 256, 512, or 1024), and whether
dropout [16] is used (with p = 0.5). One hidden layer of 1024 units, with dropout, resulted in the lowest edit distance and simultaneously yielded high accuracy.
These hyperparameters were used for all experiments.
Using a modern GPU, training takes about 1 h for any particular JIGSAWS
run and about 10 h for any particular MISTIC-SL run (MISTIC-SL sequences
are approximately 10x longer than JIGSAWS sequences). We note, however, that
RNN inference is fast, with a running time that scales linearly with sequence
length. At test time, it took the bidirectional RNN approximately 1 s of compute
time per minute of sequence (300 time steps).
3.4 Results
Table 1 shows results for both JIGSAWS (gesture recognition) and MISTIC-
SL (maneuver recognition). A forward LSTM and a bidirectional LSTM are
compared to the Markov/semi-Markov conditional random field (MsM-CRF),
Shared Discriminative Sparse Dictionary Learning (SDSDL), Skip-Chain CRF
(SC-CRF), and Latent-Convolutional Skip-Chain CRF (LC-SC-CRF). We note
that the LC-SC-CRF results were computed by the original author, using the
same MISTIC-SL validation set for hyperparameter selection.
We include standard deviations where possible, though we note that they
largely describe the user-to-user variations in the datasets. (Some users are exceptionally challenging, regardless of the method.) We also carried out statistical-significance testing using a paired-sample permutation test (p-value of 0.05).
This test suggests that the accuracy and edit-distance differences between the
bidirectional LSTM and LC-SC-CRF are insignificant in the case of JIGSAWS
but are significant in the case of MISTIC-SL. We also remark that even the
forward LSTM is competitive here, despite being the only algorithm that can
run online.
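A paired-sample permutation test of this kind can be sketched as sign-flipping on the paired differences; implementation details such as the +1 smoothing in the p-value are our assumptions, not taken from the paper:

```python
import numpy as np

def paired_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on the mean of the differences.

    Under the null hypothesis the sign of each paired difference is
    exchangeable, so we flip signs at random and count how often the
    permuted |mean difference| reaches the observed one.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    perm = np.abs((signs * d).mean(axis=1))
    return (np.count_nonzero(perm >= observed) + 1) / (n_perm + 1)
```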
Qualitative results are shown in Fig. 3 for the trials with highest, median, and
lowest accuracies for each dataset. We note that the predicted label sequences
are smooth, despite the fact that we assumed that labels are independent given
the sequence of kinematics.
JIGSAWS MISTIC-SL
Accuracy (%) Edit dist. (%) Accuracy (%) Edit dist. (%)
MsM-CRF [15] 72.6 — — —
SDSDL [13] 78.7 — — —
SC-CRF [9] 80.3 — — —
LC-SC-CRF [10] 82.5 ± 5.4 14.8 ± 9.4 81.7 ± 6.2 29.7 ± 6.8
Forward LSTM 80.5 ± 6.2 19.8 ± 8.7 87.8 ± 3.7 33.9 ± 13.3
Bidir. LSTM 83.3 ± 5.7 14.6 ± 9.6 89.5 ± 4.0 19.5 ± 5.2
Fig. 3. Qualitative results for JIGSAWS (top) and MISTIC-SL (bottom) using a bidi-
rectional LSTM. For each dataset, we show results from the trials with highest accuracy
(top), median accuracy (middle), and lowest accuracy (bottom). In all cases, ground
truth is displayed above predictions.
4 Summary
In this work we performed joint segmentation and classification of surgical activi-
ties from robot kinematics. Unlike prior work, we focused on high-level maneuver
prediction in addition to low-level gesture prediction, and we modeled the map-
ping from inputs to labels with recurrent neural networks instead of with HMM
or CRF based methods. Using a single model and a single set of hyperparameters,
we matched state-of-the-art performance for JIGSAWS (gesture recognition) and
advanced state-of-the-art performance for MISTIC-SL (maneuver recognition),
in the latter case increasing accuracy from 81.7 % to 89.5 % and decreasing normalized edit distance from 29.7 % to 19.5 %.
References
1. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient
descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
2. Gao, Y., Vedula, S.S., Reiley, C.E., Ahmidi, N., Varadarajan, B., Lin, H.C., Tao,
L., Zappella, L., Bejar, B., Yuh, D.D., Chen, C.C.G., Vidal, R., Khudanpur, S.,
Hager, G.D.: Language of surgery: a surgical gesture dataset for human motion
modeling. In: Modeling and Monitoring of Computer Assisted Interventions
(M2CAI) 2014. Springer, Boston, USA (2014)
3. Gao, Y., Vedula, S., Lee, G.I., Lee, M.R., Khudanpur, S., Hager, G.D.: Unsuper-
vised surgical data alignment with application to automatic activity annotation. In:
2016 IEEE International Conference on Robotics and Automation (ICRA) (2016)
4. Gers, F.A., Schmidhuber, J.: Recurrent nets that time and count. In: IEEE Con-
ference on Neural Networks, vol. 3 (2000)
5. Graves, A.: Supervised Sequence Labelling. Springer, Heidelberg (2012)
6. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint
arXiv:1308.0850 (2013)
7. Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.:
LSTM: A search space odyssey. arXiv preprint arXiv:1503.04069 (2015)
8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997)
9. Lea, C., Hager, G.D., Vidal, R.: An improved model for segmentation and recog-
nition of fine-grained activities with application to surgical training tasks. In: 2015
IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1123–
1129. IEEE (2015)
10. Lea, C., Vidal, R., Hager, G.D.: Learning convolutional action primitives for fine-
grained action recognition. In: 2016 IEEE International Conference on Robotics
and Automation (ICRA) (2016)
11. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-
propagating errors. Cogn. Model. 5(3), 1 (1988)
12. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans.
Sig. Process. 45(11), 2673–2681 (1997)
13. Sefati, S., Cowan, N.J., Vidal, R.: Learning shared, discriminative dictionaries for
surgical gesture segmentation and classification. In: Modeling and Monitoring of
Computer Assisted Interventions (M2CAI) 2015. Springer, Heidelberg (2015)
14. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural
networks. In: Advances in Neural Information Processing Systems (2014)
15. Tao, L., Zappella, L., Hager, G.D., Vidal, R.: Surgical gesture segmentation and
recognition. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MIC-
CAI 2013, Part III. LNCS, vol. 8151, pp. 339–346. Springer, Heidelberg (2013)
16. Zaremba, W., Sutskever, I., Vinyals, O.: Recurrent neural network regularization.
arXiv preprint arXiv:1409.2329 (2014)
Two-Stage Simulation Method to Improve
Facial Soft Tissue Prediction Accuracy
for Orthognathic Surgery
1 Introduction
important for orthognathic surgery. Therefore, there is an urgent clinical need to develop
a reliable method of accurately predicting facial changes following osteotomies.
Traditional FEM for facial soft tissue simulation assumes that the FEM mesh nodes
move together with the contacting bone surfaces. However, this assumption can lead to
significant errors when a large bone movement and occlusion changes are involved. In
human anatomy, cheek and lip mucosa are not directly attached to the bone and teeth;
they slide over each other. The traditional FEM does not consider this sliding, which
we believe is the main reason for inaccurate prediction in the lips and chin.
Implementing a realistic sliding effect in FEM is technically challenging. It requires high computational time and effort because the sliding mechanism in the human mouth is a dynamic interaction between two surfaces. The 2nd challenge is that even if
the sliding movement with force constraint is implemented, the simulation results may
still be inaccurate, because there is no strict nodal displacement boundary condition
applied to the sliding areas. The soft tissues at sliding surfaces follow the buccal surface
profile of the bones and teeth. Thus, it is necessary to consider the displacement
boundary condition for sliding movement. The 3rd challenge is that the mapping
between the bone surface and FEM mesh nodes needs to be reestablished after the bony
segments are moved to a desired planned position. This is because the bone and soft
tissue relationship is not constant before and after the bone movement, e.g. a setback or
advancement surgery may either decrease or increase the soft tissue contacting area to
the bones and teeth. This mismatch may lead to the distortion of the resulting mesh.
The 4th challenge is that occlusal changes, e.g. from preoperative cross-bite to post-
operative Class I (normal) bite, may cause a mesh distortion in the lip region where the
upper and lower teeth meet. Therefore, a simulation method with more advanced
sliding effects is required to increase the prediction accuracy in critical regions such as
the lips and chin.
We solved these technical problems. In this study, we developed a two-stage FEM
simulation method. In the first stage, the facial soft tissue changes following the bony
movements were simulated with an extended sliding boundary condition to overcome
the mesh distortion problem in traditional FEM simulations. The nodal force constraint
was applied to simulate the sliding effect of the mucosa. In the second stage, nodal
displacement boundary conditions were implemented in the sliding areas to accurately
reflect the postoperative bone surface geometry. The corresponding nodal displacement
for each node was recalculated after reassigning the mapping between the mesh and
bone surface in order to achieve a realistic sliding movement. Finally, our simulation
method was evaluated quantitatively and qualitatively using 30 sets of preoperative and
postoperative patient computed tomography (CT) datasets.
Our two-stage approach of simulating facial soft tissue changes following the osteotomies is described below in detail. In the 1st stage, a patient-specific FEM model with
homogeneous linear elastic material property is generated using a FEM template model
(Total of 38280 elements and 48593 nodes) [3]. The facial soft tissue changes are
predicted using FEM with the simple sliding effect of the mucosa around the teeth and
partial maxillary and mandibular regions. Only the parallel nodal force is considered on
the corresponding areas. In the 2nd stage, explicit boundary conditions are applied to
improve the tissue sliding effect by exactly reflecting the bone surface geometry, thus
ultimately improving the prediction accuracy.
2.1 The First Stage of FEM Simulation with Simple Sliding Effect
The patient-specific volume mesh is generated from an anatomically detailed FEM
template mesh, which was previously developed from a Visible Female dataset [3].
Both inner and outer surfaces of the template mesh are registered to the patient’s skull
and facial surfaces respectively using anatomical landmark-based thin-plate splines
(TPS) technique. Finally, the total mesh volume is morphed to the patient data by
interpolating the surface registration result using TPS again [3].
Although there have been studies investigating optimal tissue properties, the effect
of using different linear elastic material properties on the simulation results was negligible [4]. Furthermore, shape deformation patterns are independent of Young's modulus for an isotropic material under displacement boundary conditions, as long as the loading that causes the deformation is irrelevant to the study. Therefore, in our study,
we assign 3000 Pa for Young's modulus and 0.47 for Poisson's ratio [4].
Surface nodes of the FEM mesh are divided into the boundary nodes and free nodes
(Fig. 1). The displacements of free nodes (GreenBlue in Fig. 1b and c) are determined
by the displacements of boundary nodes using FEM. Boundary nodes are further
divided into static, moving and sliding nodes. The static nodes do not move in the
surgery (red in Fig. 1). Note that the lower posterior regions of the soft tissue mesh
(orange in Fig. 1b) are assigned as free nodes in the first stage. This is important
because together with the ramus sliding boundary condition, it maintains the soft tissue
integrity, flexibility and smoothness in the posterior and inferior mandibular regions
when an excessive mandibular advancement or setback occurs.
Fig. 1. Mesh nodal boundary condition. (a) Mesh inner surface boundary condition (illustrated
on bones for better understanding) for the 1st stage only; (b) Posterior and superior surface
boundary condition for both 1st and 2nd stages; (c) Mesh inner surface boundary condition
(illustrated on bones for better understanding) for the 2nd stage only. Static nodes: red, and
orange (2nd stage only); Moving nodes: Blue; Sliding nodes: pink; Free nodes: GreenBlue, and
orange (1st stage only); Scar tissue: green.
562 D. Kim et al.
The moving nodes on the mesh are the ones moving in sync with the bones (blue in
Fig. 1a). The corresponding relationships of the vertices of the STL bone segments to
the moving nodes of the mesh are determined by a closest point search algorithm. The
movement vector (magnitude and the direction) of each bone segment is then applied to
the moving nodes as a nodal displacement boundary condition. In addition, the areas
where two bone (proximal and distal) segments collide with each other after the sur-
gical movements are excluded from the moving boundary nodes. These are designated
as free nodes to further solve the mesh distortion at the mandibular inferior border.
Moreover, scar tissue is considered as a moving boundary (green in Fig. 1a). This is
because the soft tissues in these regions are degloved intraoperatively, causing scars
postoperatively, which subsequently affects the facial soft tissue geometry. The scar
tissue is added onto the corresponding moving nodes by shifting them an additional
2 mm in anterior direction as the displacement boundary condition.
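The closest point search that maps bone vertices onto mesh nodes, and the resulting displacement boundary condition, can be sketched as follows. This is a brute-force sketch for clarity (a k-d tree would be used at realistic mesh sizes) and all names are ours:

```python
import numpy as np

def closest_node_indices(bone_vertices, mesh_nodes):
    """For each STL bone vertex, the index of the nearest mesh node."""
    b = np.asarray(bone_vertices, dtype=float)[:, None, :]  # (V, 1, 3)
    m = np.asarray(mesh_nodes, dtype=float)[None, :, :]     # (1, N, 3)
    return np.argmin(np.linalg.norm(b - m, axis=2), axis=1)

def moving_node_displacements(n_nodes, moving_idx, movement_vector):
    """Apply one bone segment's rigid movement vector to its moving nodes
    as a nodal displacement boundary condition (other nodes left at zero)."""
    disp = np.zeros((n_nodes, 3))
    disp[list(moving_idx)] = np.asarray(movement_vector, dtype=float)
    return disp
```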
In the first stage, the sliding boundary conditions are applied to the sliding nodes
(pink in Fig. 1a) of the mouth, including the cheek, lips, and extended to the mesh inner
surface corresponding to a partial maxilla and mandible (including partial ramus). The
sliding boundary conditions in mucosa area are adopted from [2].
Movement of the free nodes (Fig. 1b) is determined by FEM with the aforemen-
tioned boundary conditions (Fig. 1a and b). An iterative FEM solving algorithm is
developed to calculate the movement of the free nodes and to solve the global FEM equation $Kd = f$, where $K$ is the global stiffness matrix, $d$ is the global nodal displacement, and $f$ is the global nodal force. This equation can be rewritten as:
$$\begin{pmatrix} K_{11} & K_{12} \\ K_{12}^{T} & K_{22} \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} \qquad (1)$$
where d1 is the displacement of the moving and static nodes, d2 is the displacement of
the free and sliding nodes to be determined. The parameter f1 is the nodal force on the
moving and static nodes, and f2 is the nodal force acting on both free and sliding nodes.
The nodal force of the free nodes is assumed to be zero, and only tangential nodal
forces along the contacting bone surface are considered for the sliding nodes [2].
The final value of d2 is calculated by iteratively updating d2 using Eq. (2) until the
converging condition is satisfied [described later].
$$d_2^{(k+1)} = d_2^{(k)} + d_{2\,update}^{(k)}, \qquad k = 1, 2, \ldots, n \qquad (2)$$

$$f_2 = K_{12}^{T} d_1 + K_{22} d_2 \qquad (3)$$

At each iteration, the nodal force $f_2$ is first computed using Eq. (3).
Second, $f_{2t}$ is calculated by transforming the nodal force of the sliding nodes within $f_2$ to retain only the tangential nodal force component [2]. $f_{2t}$ is thus composed of the nodal force of the free nodes ($f_{2\,free}$) and only the tangential component of the nodal force of the sliding nodes ($f_{2t\,sliding}$).
In the final step of the iteration, $f_{2\,update}$ is acquired to determine the required nodal displacement $d_{2\,update}$. Nodal force $f_{2\,update}$ is the difference between $f_{2t}$ and $f_2$. $d_{2\,update}$ is finally calculated as $d_{2\,update} = K_{22}^{-1} f_{2\,update}$, which is derived from Eq. (1). Then, $d_2^{(k+1)}$ is calculated using Eq. (2). The iteration continues until the maximal absolute value of $f_{2\,update}$ converges below 0.01 N ($k = n$). The final values of $d$ ($d_1$ and $d_2$) represent the displacement of the mesh nodes after applying the bone movements and the simple sliding effect. The algorithm was implemented in MATLAB. The final $d$ in this first-stage simulation is designated as $d_{first}$.
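Under the partitioning of Eq. (1), the first-stage iteration can be sketched as below. This is a dense toy version under our own assumptions (three DOFs per node in the "2" block, fixed surface normals, hypothetical names); the paper's implementation is in MATLAB:

```python
import numpy as np

def first_stage_iteration(K12, K22, d1, sliding_nodes, normals,
                          tol=0.01, max_iter=100):
    """Iterate Eqs. (2)-(3): drive free-node forces to zero and sliding-node
    forces to their tangential components, updating d2 by K22^{-1} f2_update.

    DOFs in the '2' block are consecutive 3-vectors per node; sliding_nodes
    lists which nodes slide, normals gives their (fixed) unit surface normals.
    """
    d2 = np.zeros(K22.shape[0])
    for _ in range(max_iter):
        f2 = K12.T @ d1 + K22 @ d2                    # Eq. (3)
        target = np.zeros_like(f2)                    # free nodes: zero force
        for node, n in zip(sliding_nodes, normals):   # sliding: tangential only
            s = slice(3 * node, 3 * node + 3)
            target[s] = f2[s] - np.dot(f2[s], n) * n
        f2_update = target - f2
        if np.max(np.abs(f2_update)) < tol:           # converged (|f| < 0.01 N)
            break
        d2 += np.linalg.solve(K22, f2_update)         # Eq. (2)
    return d2
```

With fixed normals this converges quickly; in the real method the surface geometry (and hence the tangential projection) evolves with the mesh.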
2.2 The Second Stage of FEM Simulation with Advanced Sliding Effect
The predicted facial soft tissue changes in the first stage are further refined in the
second stage by adding an advanced sliding effect. This is necessary because the first
stage only accounts for the nodal force constraint, which may result in a mismatch
between the simulated mesh inner surface and the bone surface (Fig. 2).
Fig. 2. Assignment of nodal displacement in the second stage of FEM. (a) Mismatch
between the simulated mesh inner surface and the bone surface. (b) Description of nodal
displacement boundary condition assignment.
Based on real clinical situations, the geometries of the teeth and bone buccal surface and of its contacting surface on the inner side of the soft tissue mesh should match exactly, even though the relationship between the vertices of the bones and the nodes of the soft tissue mesh changes after the bony segments are moved.
Therefore, the boundary mapping and condition between the bone surface and soft
tissue mesh nodes need to be reestablished in the sliding areas in order to properly
reflect the above realistic sliding effect. First, the nodes of the inner mesh surface
corresponding to the maxilla and mandible are assigned as the moving nodes in the
second stage (blue in Fig. 1c). The nodal displacements of the moving nodes are
calculated by finding the closest point from each mesh node to the bone surface, instead
of finding them from the bone to the mesh in the first-stage. The assignment is pro-
cessed from superior to inferior direction, ensuring an appropriate boundary condition
implementation without mesh distortion (Fig. 2). This is because clinically the postoperative lower teeth are always inside the upper teeth (a normal bite) regardless of the preoperative condition. This procedure prevents the nodes from having the same
nodal displacement being counted twice, thus solving the mismatch problem between
the bone surface and its contacting surface on the inner side of the simulated mesh.
Once computed, the vector between each node and its corresponding closest vertex on
the bone surface is assigned as the nodal displacement for the FEM simulation.
The free nodes at the inferoposterior surface of the soft tissue mesh in the first-stage
are now assigned as static nodes in this stage (orange in Fig. 1b). The rest of the nodes
are assigned as the free nodes (GreenBlue in Fig. 1b and c). The global stiffness matrix
(K), the nodal displacement (d) and the nodal force (f) are reorganized according to the
new boundary conditions. The 2nd-stage results are calculated by solving Eq. (1).
Based on the assumption that the nodal force of the free nodes, f2 , is zero (note no
sliding nodes in the second-stage), the nodal displacement of the free nodes, d2 , can be
calculated as follows: $d_2 = -K_{22}^{-1} K_{12}^{T} d_1$ (from Eq. (1)). Then, the final $d$ ($d_1$ and $d_2$) is
designated as $d_{second}$. Finally, the overall nodal displacement is calculated by combining the resulting nodal displacements of the first ($d_{first}$) and second ($d_{second}$) FEM simulations.
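The second-stage solve for the free nodes is a single linear system. A sketch under the same partitioning (the minus sign follows from setting $f_2 = 0$ in Eq. (1)):

```python
import numpy as np

def second_stage_free_displacements(K12, K22, d1):
    """With no sliding nodes and f2 = 0, Eq. (1) reduces to
    K12^T d1 + K22 d2 = 0, i.e. d2 = -K22^{-1} K12^T d1."""
    return -np.linalg.solve(K22, K12.T @ d1)
```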
The evaluation was completed by using 30 randomly selected datasets of patients who
had dentofacial deformity and underwent an orthognathic surgery [IRB0413-0045].
Each patient had complete preoperative and postoperative CT scans.
The soft tissue prediction was completed using 3 methods: (1) the traditional FEM without considering the sliding effect [1]; (2) the FEM with the first-stage (simple) sliding
effect by only considering the nodal force constraint; and (3) our novel FEM with
two-stage sliding effects. All FEM meshes were generated by adapting our FEM
template to the patient’s individual 3D model [3]. In order to determine the actual
movement vector of each bony segment, the postoperative patient’s bone and soft
tissue 3D CT models were registered to the preoperative ones at the cranium (surgically
unmoved). The movement vector of each bony segment was calculated by moving the
osteotomized segment from its preoperative original position to the postoperative
position.
Finally, the simulated results were evaluated quantitatively and qualitatively. In the
quantitative evaluation, displacement errors (absolute mean Euclidean distances) were
calculated between the nodes on the simulated facial mesh and their corresponding
points on the postoperative model. The evaluation was completed for the whole face
and 8 sub-regions (Fig. 3). Repeated measures analysis of variance and its post-hoc
tests were used to detect statistically significant differences. In the qualitative evaluation, two maxillofacial surgeons, each experienced in orthognathic surgery, evaluated the results together based on their clinical judgement and consensus. They were blinded to the methods used for the simulation. The predicted results were
compared to the postoperative ones using a binary visual analog scale (Unacceptable:
the predicted result was not clinically realistic; Acceptable: the predicted result was
clinically realistic and very similar to the postoperative outcome). A chi-square test was used to detect statistically significant differences.
The results of the quantitative evaluation showed that our two-stage sliding FEM method significantly improved the accuracy of the whole face, as well as of critical areas such as the lips and nose, in comparison with the traditional FEM method. The chin area also showed a trend of improvement (Table 1). Finally, the malar region showed a significant improvement due to the scar tissue modeling.
The results of the qualitative evaluation showed that 73 % (22/30) of the results predicted with the 2-stage FEM method were clinically acceptable. The prediction accuracy of the whole face and of critical regions (e.g., the lips and nose) was significantly improved (Table 1). However, only 43 % (13/30) were acceptable with both the traditional
and simple sliding FEMs. This was mainly due to the poor lower lip prediction. Even
though the cheek prediction was significantly improved in the simple sliding FEM,
inaccurately predicted lower lips severely impacted the whole facial appearance.
Table 1. Improvement of the simple and 2-stage sliding over the traditional FEM method (%)
for 30 patients.
Region Quantitative evaluation Qualitative evaluation
Simple sliding Two-stage sliding Simple sliding Two-stage sliding
Entire face 1.9 4.5* 0.0 30.0*
1. Nose 7.2* 8.4* 0.0 0.0
2. Upper lip −1.3 9.2* 13.3 20.0*
3. Lower lip −12.0 10.2 −6.7 23.3*
4. Chin −2.0 3.6 3.3 10.0
5. Right malar 6.1* 6.2* 0.0 0.0
6. Left malar 9.2* 8.8* 0.0 0.0
7. Right cheek 0.1 1.3 23.3* 23.3*
8. Left cheek 3.0 1.4 30.0* 30.0*
* Significant difference compared to the traditional method (P < 0.05).
Figure 4 illustrates the predicted results of a typical patient. Using the traditional FEM, the upper and lower lips moved together with the underlying bone segments without considering the sliding movement (1.4 mm of displacement error for the upper lip; 1.6 mm for the lower), resulting in large displacement errors (clinically unacceptable, Fig. 4(a)). The upper lip predicted with the simple sliding FEM was moderately improved (1.1 mm of error), while the lower lip showed a larger error (3.1 mm), leaving the upper and lower lips in an incorrect relation (clinically unacceptable, Fig. 4(b)).
566 D. Kim et al.
Fig. 4. An example of quantitative and qualitative evaluation results. The predicted mesh (red) is superimposed on the postoperative bone (blue) and soft tissue (grey). (a) Traditional FEM simulation (1.6 mm of error for the whole face, clinically not acceptable). (b) Simple sliding FEM simulation (1.6 mm of error, clinically not acceptable). (c) Two-stage FEM simulation (1.4 mm of error, clinically acceptable).
The mesh inner surface and the bony/teeth geometries, which should match perfectly in clinical terms, were also mismatched. Finally, our two-stage FEM simulation achieved the best results, accurately predicting the clinically important facial features with a correct lip relation (upper lip error: 0.9 mm; lower: 1.3 mm; clinically acceptable, Fig. 4(c)).
Hand-Held Sound-Speed Imaging Based
on Ultrasound Reflector Delineation
1 Introduction
Breast cancer is a high-prevalence disease affecting one in eight women in the USA. Current routine screening consists of X-ray mammography, which, however, shows low sensitivity to malignant tumors in dense breasts, where a large number of false positives leads to unnecessary breast biopsies. Also, the use of ionizing radiation discourages frequent utilization, for instance, to monitor the progress of a tumor. Finally, the compression of the breast down to a few centimeters may cause patient discomfort. For these reasons, the latest recommendations restrict the general use of X-ray mammography to biennial examinations in women over 50 years old [13].
Ultrasound (US) is a safe, pain-free, and widely available medical imaging
modality, which can complement routine mammographies. Conventional screen-
ing breast US (B-mode), which measures reflectivity and scattering from tissue
structures, showed significantly higher sensitivity combined with mammography
(97 %) than the latter alone (74 %) [8]. However, B-mode US shows poor speci-
ficity. A novel US modality, Ultrasound Computed Tomography (USCT), aims at
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 568–576, 2016.
DOI: 10.1007/978-3-319-46720-7_66
Hand-Held Sound-Speed Imaging 569
mapping other tissue parameters, such as the speed-of-sound (SoS), which shows
a high potential for tumor differentiation (e.g., fibroadenoma, carcinoma, cysts)
[2]. However, this method requires dedicated and complex systems consisting of
a large number of transducer elements located around the breast in order to
measure US wave propagation paths along multiple trajectories, from which the
SoS-USCT image is reconstructed [3,5,10,11]. Low-cost extensions of conventional B-mode systems that only require a single multi-element array transducer are desirable to bring SoS-USCT into the daily clinical routine. There have been some
early attempts to combine B-mode systems with X-ray mammography, using
the back compression plate as a timing reference. Yet, the reconstruction suffers
from strong limited-angle artifacts, which provide unsatisfactory image quality,
unless detailed prior information of the screened inclusion geometry is available
[6,9].
In this work we propose a novel SoS-USCT method, hand-held sound-speed imaging, which overcomes the limitations listed above by transmitting US waves through the tissue between a B-mode transducer and a hand-held reflector.
2 Methods
Fig. 2. Reflector identification for the ex-vivo liver test (Fig. 5c). a) Setup details; b) RF lines acquired, with overlapped DP delineation for the case of identical Tx and Rx; c) the measured ToF matrix t_{i,o}; and d) the relative path delays Δt_{i,o} after compensating for geometric effects. The proposed DP method outperforms independent RF-line analysis and adaptive amplitude tracking [12].
discrete timing decisions for each line and candidate. The optimum reflector timing is then found, which minimizes the cumulative cost; following M(l, t_l) backwards, the optimum reflector delineation T(l) is drawn:

C(l, t_l) = min_{t_{l-1}} { C(l−1, t_{l-1}) + f_1(t_l, t_{l-1}) } + f_0(t_l)
M(l, t_l) = argmin_{t_{l-1}} { C(l−1, t_{l-1}) + f_1(t_l, t_{l-1}) }        (1)
T(l) = argmin_{t_l} C(l, t_l) for l = L;  T(l) = M(l+1, T(l+1)) for l = 1 … L−1,
with f_0 and f_1 non-linear functions that incorporate the ToF of the current t_l and neighbouring t_{l-1} RF lines. The general formulation of Eq. 1 introduces regularization into the reflector timing problem, enabling the natural incorporation of available prior information (oscillatory pattern, smoothness, multiple echoes, path geometry) into the optimization. Moreover, the delineation does not require manual initialization and is parallelizable linewise. The currently unoptimized Matlab code runs on a single core of an Intel Core i7-4770K CPU in <100 s, and several future speed improvements are envisioned.
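A minimal sketch of the recursion in Eq. 1 is shown below. This is our simplified stand-in, not the authors' implementation: the data term f_0 is taken as a precomputed per-sample cost map, f_1 as an absolute-difference smoothness penalty, and the `max_jump`/`smooth` parameters are our own names:

```python
import numpy as np

def dp_delineate(cost, max_jump=3, smooth=1.0):
    """Regularized reflector delineation via dynamic programming (cf. Eq. 1).

    cost[l, t] plays the role of f0: the data cost of placing the reflector
    echo of RF line l at time sample t (e.g. negative envelope amplitude).
    The smoothness term f1 penalizes timing jumps between neighbouring
    lines. Returns the optimal timing T(l) for every line."""
    L, T = cost.shape
    C = np.full((L, T), np.inf)       # cumulative cost C(l, t_l)
    M = np.zeros((L, T), dtype=int)   # backpointers M(l, t_l)
    C[0] = cost[0]
    for l in range(1, L):
        for t in range(T):
            lo, hi = max(0, t - max_jump), min(T, t + max_jump + 1)
            prev = C[l - 1, lo:hi] + smooth * np.abs(np.arange(lo, hi) - t)
            k = int(np.argmin(prev))
            C[l, t] = prev[k] + cost[l, t]
            M[l, t] = lo + k
    path = np.empty(L, dtype=int)     # backtrack from the best final timing
    path[-1] = int(np.argmin(C[-1]))
    for l in range(L - 2, -1, -1):
        path[l] = M[l + 1, path[l + 1]]
    return path
```

Each line is visited once, so the delineation stays linear in the number of RF lines for a fixed timing range.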
where c is the average tissue speed of sound (with a nominal value of 1540 m s⁻¹), d is the distance between transducer and reflector, p is the array pitch (0.3 mm for our probe), and i_i, i_o are the indices of the Tx_i and Rx_o elements considered (1..128). Note that d and c are estimated with linear regression based on Eq. 2. In practice, a non-linear fit is performed to estimate both the reflector inclination and in-plane orientation.
The next step is the reconstruction of the SoS distribution, which is expressed in slowness units σ [s/m], with c(x, y) = c (1 + σ(x, y))⁻¹. The tissue region is discretized into cells c traversed by a finite set of ray paths p corresponding to different Tx–Rx pairs (Fig. 3a). With the known differential path lengths l_{p,c}, the path delays Δt_p are calculated as a function of the slowness increments σ_c, i.e., Δt_p = Σ_{c=1}^{C} l_{p,c} σ_c, or in matrix form Δt = Lσ. Since reconstruction can
be ill-posed, regularization becomes necessary. A conventional solution in X-ray
Computed Tomography (CT) [7], is Filtered Backprojection (FBP), which aver-
ages the delays of all rays p propagating through cell c. Previous reflector-based
US works [9] have used Algebraic Reconstruction (ART), in which Δt=Lσ is
approximated via singular value decomposition, preserving only the largest sin-
gular values of L (typically 5 % of the total).
572 S.J. Sanabria and O. Goksel
Fig. 3. Formulation of the sound-speed reconstruction problem. (a) Ray-tracing discretization. (b) Smoothness regularization with L2 and L1 norms. While the L2 norm favors the smooth sound-speed profile, the L1 norm (TV) equally weights smooth and sharp gradients.
Both FBP and ART provide a stable SoS-image reconstruction, which, however, suffers from strong streak artifacts and coarse resolution in the vertical direction (Fig. 4c). The reason is that, as in limited-angle CBCT, reflector-based SoS-USCT is an ill-posed problem [7]: every cell is traversed by only a limited set of path orientations, i.e., paths parallel to the reflector are missing. This is the main geometric limitation with respect to dedicated USCT systems, which incorporate complete angular path sets [3,5,10].
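The differential path lengths l_{p,c} that populate one row of L can be sketched as follows. This dense point-sampling approximation is our illustrative stand-in for exact ray tracing (e.g. Siddon's algorithm), and all function and parameter names are ours:

```python
import numpy as np

def ray_lengths(x0, y0, x1, y1, nx, ny, cell=1.0, samples=2000):
    """Approximate the differential path lengths l_{p,c} of one straight ray
    (Tx at (x0, y0), Rx at (x1, y1)) through an nx-by-ny grid of square
    cells of side `cell`, by dense point sampling along the ray.
    Returns one row of the system matrix L (length nx*ny)."""
    t = np.linspace(0.0, 1.0, samples)
    xs, ys = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
    seg = np.hypot(x1 - x0, y1 - y0) / (samples - 1)  # length per sample
    row = np.zeros(nx * ny)
    ix = np.clip((xs / cell).astype(int), 0, nx - 1)
    iy = np.clip((ys / cell).astype(int), 0, ny - 1)
    np.add.at(row, iy * nx + ix, seg)  # accumulate length in traversed cells
    return row

# Path delay of ray p for a slowness map sigma: dt_p = row @ sigma.ravel()
```

With rows assembled for all Tx–Rx pairs, the forward model Δt = Lσ above can be evaluated directly.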
To overcome the limited-angle artifacts, we introduce additional regularizing assumptions on the smoothness of the SoS image:

σ̂ = argmin_σ ‖Δt − Lσ‖₂ + λ Σ_{i,j} ‖∇σ_{i,j}‖_n                      (3)
Fig. 4. Simulation of sound-speed image reconstruction with (top) single and (bottom)
multiple inclusions: (a) in-silico phantom, (b–c) reconstruction with prior art, and (d)
our TV approach.
used, ‖x‖₁ = |x| (Total Variation (TV) regularization), sharp and smooth gradients are equally weighted, which leads to the reconstruction of a minimum number of piecewise homogeneous inclusions. This concept has been previously
applied to regularize sparse array apertures in full-angle 3D USCT [11]. We apply
this here for the first time to the limited-angle ultrasound reflection tomography
case. With n = 1, Eq. 3 becomes a convex problem, which is iteratively solved
with off-the-shelf optimization packages.
The resulting SoS images (Fig. 4c) successfully filter out limited-angle arti-
facts and delineate closed inclusion geometries. However, they still show reduced
axial resolution, due to the extremely reduced path orientation set (according
to Fig. 3a, for a SoS image aspect ratio of 1:1, the largest available ray angle is
25◦ ). In order to compensate for this resolution loss we introduce Anisotropically-
Weighted Total Variation (AWTV), which balances horizontal and vertical gra-
dients with a constant κ according to the available ray information in each
direction:
σ̂_AWTV = argmin_σ ‖Δt − Lσ‖₂ + λ Σ_{i,j} [ κ|σ_{i+1,j} − σ_{i,j}| + (1 − κ)|σ_{i,j+1} − σ_{i,j}| ]   (4)
With a reconstructed pixel size equal to the array pitch (p = 0.3 mm), optimum reconstruction performance was achieved with λ = 0.0008 and κ = 0.9.
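A toy sketch of the AWTV objective of Eq. 4 and of a minimizer is given below. The paper solves the convex problem with off-the-shelf optimization packages; here we substitute plain gradient descent on a smoothed |x|, and the step size, smoothing constant, and function names are our assumptions:

```python
import numpy as np

def awtv_objective(sigma, L, dt, lam=8e-4, kappa=0.9):
    """AWTV objective of Eq. 4: L2 data term plus anisotropically weighted
    vertical/horizontal absolute-gradient penalties."""
    data = np.linalg.norm(dt - L @ sigma.ravel())
    gv = np.abs(np.diff(sigma, axis=0))  # sigma_{i+1,j} - sigma_{i,j}
    gh = np.abs(np.diff(sigma, axis=1))  # sigma_{i,j+1} - sigma_{i,j}
    return data + lam * (kappa * gv.sum() + (1.0 - kappa) * gh.sum())

def awtv_solve(L, dt, shape, iters=300, step=0.05, lam=8e-4, kappa=0.9, eps=1e-6):
    """Minimize a smoothed AWTV objective by plain gradient descent
    (|x| is smoothed to sqrt(x^2 + eps) to make it differentiable)."""
    sigma = np.zeros(shape)
    for _ in range(iters):
        r = L @ sigma.ravel() - dt
        g = (L.T @ (r / (np.linalg.norm(r) + eps))).reshape(shape)
        gv = np.diff(sigma, axis=0)
        gh = np.diff(sigma, axis=1)
        sv = lam * kappa * gv / np.sqrt(gv ** 2 + eps)
        sh = lam * (1.0 - kappa) * gh / np.sqrt(gh ** 2 + eps)
        g[1:, :] += sv   # subgradients of the vertical TV term ...
        g[:-1, :] -= sv
        g[:, 1:] += sh   # ... and of the horizontal TV term
        g[:, :-1] -= sh
        sigma = sigma - step * g
    return sigma
```

Setting κ = 0.5 recovers isotropic TV; κ > 0.5, as used here, penalizes vertical gradients more strongly to counter the missing vertical ray information.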
Fig. 5. Relative-slowness images (depth vs. width, mm): (a) gelatin phantom with stiff inclusions; (b) breast phantom with glandular tissue, skin layer, stiff tumour, and cyst; (c) hard inclusion with reflector echo; (d) in-vivo experiment with cyst, and measured vs. simulated ToF matrices over Tx/Rx elements 1–128.
cally filters out fading positions from the reconstruction. Calibration experiments in gradually-heated water provided quantitative SoS values with a sensitivity < 0.005 m s⁻¹. The observed timing error of std = 15 ns results in a noise floor of 0.8 µs m⁻¹, corresponding to a < 0.1 % sound-speed contrast.
The proposed AWTV SoS reconstruction achieves significant improvements
in the delineation of inclusion geometry (Fig. 4). Often-problematic vertical elon-
gation of inclusions is strongly reduced (14 %) compared to ART (>300 %) and
TV (95 %), which enables a quantitative reconstruction of original SoS values
(SoS error <0.3 %). Streak artifacts, which are typical in ART (CNR = 15 dB),
are not visible in AWTV (CNR = 37 dB). Moreover, our novel approach success-
fully reconstructs multiple inclusions with different SoS values and geometries
(Fig. 4). Not only are the inclusion positions correctly identified, but their SoS values and diameters are also satisfactorily estimated.
An excellent performance is observed in both phantom and ex-vivo tests. For
the gelatin phantom, the hard inclusions were manufactured with a small SoS
contrast (−3.5 µs m−1 , 0.5 % SoS increase), but nonetheless were successfully
resolved (Fig. 5a). In the more heterogeneous breast phantom (Fig. 5b) both hard
inclusions (−17 µs m−1 , 2.6 % SoS increase) and cysts (−6 µs m−1 , 0.9 % SoS
increase) show a higher contrast and are well-separated from the background
noise, which is around 0.6 %. These values are more representative of real breast
tumors, as reported by [2]. The background noise is related to reconstruction
artifacts (e.g., the gradient information is missing at image boundaries), and to
a minor extent, to refraction effects not accounted for in the ray tracing model.
The hard inclusion in the ex-vivo liver samples was invisible in the B-mode, but
clearly delineated in the SoS image, with contrast similar to the breast phantom;
see Fig. 5d. Despite movement artifacts, lower US signal-to-noise ratio (<10 dB),
and imperfect coupling between reflector and breast tissue, the preliminary in-
vivo test demonstrates a successful identification of a cystic inclusion, with an expected lower SoS contrast (−8 µs m⁻¹) than the ex-vivo hard inclusions.
Acknowledgment. This work was funded by the Swiss National Science Foundation.
References
1. Crimi, A., Makhinya, M., Baumann, U., Thalhammer, C., et al.: Automatic mea-
surement of venous pressure using B-mode ultrasound. IEEE TMI 63, 288–299
(2016)
2. Duric, N., Littrup, P., Li, C., Roy, O., Schmidt, S., et al.: Breast imaging with
softVue: initial clinical examination. In: SPIE Medical Imaging, pp. 90400V (2014)
3. Duric, N., Littrup, P., Poulo, L., Babkin, A., Pevzner, R., Holsapple, E.: Detec-
tion of breast cancer with ultrasound tomography: first results with the computer
ultrasound risk evaluation (cure) prototype. Med. Phys. 34, 773–785 (2007)
4. Foroughi, P., Boctor, E., et al.: Ultrasound bone segmentation using dynamic programming. In: IEEE Ultrasonics Symposium, New York, NY, USA, pp. 2523–2526 (2007)
5. Gemmeke, H., Ruiter, N.V.: 3D ultrasound computer tomography for medical
imaging. Nucl. Instrum. Methods Phys. Res. A 580, 1057–1065 (2007)
6. Huang, S.W., Pai-Chi, L.: Ultrasonic computed tomography reconstruction of the
attenuation coefficient using a linear array. IEEE TUFFC 52, 2011–2022 (2005)
7. Kak, A.C., Slaney, M.: Principles of computerized tomographic imaging. IEEE
Press, New York (1988)
8. Kolb, T.M., Lichy, J., Newhouse, J.H.: Comparison of the performance of screening mammography, physical examination, and breast US and evaluation of factors that influence them: an analysis of 27,825 patient evaluations. Radiology 225, 165–175 (2002)
9. Krueger, M., Burow, V., et al.: Limited-angle US transmission tomography of the compressed female breast. In: IEEE Ultrasonics Symposium, Miyagi, Japan, pp. 1345–1348 (1998)
10. Nebeker, J., Nelson, T.R.: Imaging of sound speed using reflection ultrasound
tomography. J. Ultrasound Med. 31, 1389–1404 (2012)
11. Radovan, J., Peterlik, I., et al.: Sound-speed image reconstruction in sparse-
aperture 3-D us transmission tomography. IEEE Trans. Ultrason. Ferroelectr.,
Freq. Control 59, 254–264 (2012)
12. Sanabria, S.J., Hilbers, U., et al.: Modeling and prediction of density distribu-
tion and microstructure in particleboards from acoustic properties by correlation
of non-contact high-resolution pulsed air-coupled ultrasound and X-ray images.
Ultrasonics 53, 157–170 (2013)
13. Siu, A., U.S. Preventive Services Task Force: Screening for breast cancer: U.S.
preventive services task force recommendation. Ann. Int. Med. 164(4), 279–296
(2016). doi:10.7326/M15-2886
Ultrasound Tomosynthesis: A New Paradigm
for Quantitative Imaging of the Prostate
1 Introduction
Prostate cancer is the most common male cancer in the United States, with an estimated 220,000 new cases and 28,000 deaths in 2015 [1]. A key to survival and to avoiding over-treatment is early detection and accurate characterization [2]. Systematic sextant biopsies under TRUS guidance have been the gold-standard technique since the 1980s [3]. TRUS is real-time, relatively low-cost, and shows the prostate capsule and boundaries. However, it suffers from poor spatial resolution, subjectivity, and low sensitivity for cancer detection (40–60 % [4]).
MRI is the superior imaging modality for visualizing the prostate gland, nerve bundles, and clinically relevant cancer. However, real-time MRI is challenging and requires specialized, costly equipment, and in-gantry prostate biopsy is time and resource intensive and impractical to apply across a broad population. Fusion of TRUS and multi-parametric MRI takes advantage of the strengths of both imaging modalities. In fusion-guided biopsy, targeting information is solely dependent on MR images [4]. Even though US-MRI fusion-guided biopsy has been shown to be highly sensitive for detecting higher-grade cancer, it still suffers from high false positives for lower-grade cancers, resulting in unnecessary biopsies [4]. Also, MRI is expensive and less available to the broad population.
Some US-based technologies have recently been proposed to address this clinical need in addition to MRI-US fusion, including elastography [5], Doppler, and US tissue characterization [6]. Although several studies reported significant improvement in prostate cancer identification with quasi-static elastography, there are still limitations in reproducibility and subjectivity, and this method cannot differentiate cancer from chronic prostatitis [5]. Time-series analysis [6] is an interesting new machine learning technique for tissue characterization and has recently shown promising results for marking cancerous areas of the prostate using US RF images [6]. This machine learning method is, however, still based on post-processing of reflection data.
Transmission ultrasound imaging is based on the transmission of US signals through the tissue. The received signal can be used to reconstruct the volume's acoustic properties, such as SOS, attenuation, and spectral scattering maps. This information may theoretically be able to differentiate among different tissue types, including cancerous tissues. Transmission ultrasound can be performed in two ways, full angle and limited angle, just as with tomography. The full-angle technique, known as ultrasound computed tomography, has been extensively used for breast imaging [7] and, recently, for imaging of extremities [8]. The limited-angle technique is relatively more recent and has also been used in breast imaging [9]. Similar to X-ray tomosynthesis (which is a limited-angle version of CT [computed tomography]), here we refer to limited-angle US tomography as "US tomosynthesis" (USTS).
Current transmission US systems (e.g. [7, 8]) only work with the breast, since it is an easy target to scan in a small water tank. Leveraging these recent findings, we propose a method to extend this technology to prostate cancer diagnosis and screening utilizing robotic technology. In this concept (Fig. 1), a bi-plane or tri-plane TRUS probe resides in the rectum, and a linear/curved-array transducer resides on the abdomen/pelvis, using the bladder as an acoustic window to the prostate. The abdominal probe can be fixed and aligned with the TRUS probe using a co-robotic setup similar to the one proposed in [10]. Ex vivo modeling is requisite prior to evaluating prostate USTS in vivo. The first step is to evaluate the feasibility of USTS for prostate cancer detection in a controlled benchtop environment, to understand the potential of this technology. Therefore, this paper focuses on modeling and developing a system and method for ex vivo prostate USTS. The system was evaluated with a mock prostate and lesions with comparable SOS.
US Tomosynthesis: A New Paradigm for Quantitative Imaging 579
Fig. 1. (a) Prostate ultrasound tomosynthesis concept: a bi/tri-plane TRUS probe is placed into the rectum and a linear/curved-array transducer is placed on the patient's abdomen; (b) sagittal USTS imaging; (c) axial USTS imaging; (d) USTS image reconstruction concept: a larger angle θ leads to more tomographic data and fewer artifacts in the reconstructed image.
2 Method
Fig. 2. (a) USTS ex vivo setup. The patient-specific molds put MRI, histology, and USTS slices in correspondence. (b) The 3D-printed mold for MRI-histology comparison. (c) The 3D-printed box used to create the US-friendly mold. (d) The US-friendly mold and the 3D-printed prostate with its seminal vesicles.
580 F. Aalamifar et al.
MRI and histology are the ground truths for comparison with the USTS image reconstructed using this setup. The technique and test-bed model were designed to enable direct correlation with MRI and matching slices of correlative histology whole mounts. The technique was performed in two steps: first, a patient-specific mold (as shown in Fig. 2b) with grooves to guide the histology knife is 3D printed. The grooves are 3 or 6 mm apart and result in histology slices custom-designed to correspond to MR image slices [11]. Second, the same mold is created using a US-friendly material with marks indicating the corresponding slices to be scanned using the US probes.
The US-friendly mold was made from acrylamide gel with 1523 m/s SOS and other relevant tissue-mimicking properties, as reported previously [12]. The phantom does not decay, is rigid enough to hold the prostate, and has an SOS suitable for reconstruction (as described later). To make the mold, initially the prostate
(with seminal vesicles) is segmented from the clinical MR image. This prostate volume
is saved as a stereolithography (.stl) file and printed using a 3D printer (uPrint, Stratasys). The 3D-printed prostate is positioned inside a box at a position and orientation similar to the MRI 3D-printed mold, using guide rods as shown in Fig. 2c.
Then, the acrylamide solution was poured into the box. After solidification, the rods
were removed and the mold was cut to remove the 3D printed prostate. Figure 2d
shows the US friendly mold. The prostate can be put inside the mold cavity and the
mold’s halves are adhered together. Then, the mold is inserted into a container. The
container holds the mold in place during the USTS scan, can be filled with liquid to fill
the acoustically insulating air gaps between mold and prostate, and provides windows
made of mylar sheet to provide US transparency. The container is marked with lines
that determine the slices that correspond to the MRI slices.
We used two linear-array Ultrasonix probes. The transmitting probe was connected to an Ultrasonix SonixTouch scanner (Vancouver, BC). As shown in Fig. 2a, the receiving probe was connected to an Ultrasonix Data Acquisition (DAQ) device, which can receive the US waveforms of 128 channels in parallel with a sampling frequency of 40 MHz.
where s(t) is the intensity of the received signal at time t. s(t) is set to zero outside [t_bg − w, t_bg + w], where t_bg is the estimated background TOF and w is half of a certain window length, to reduce the effect of noise and refractions. As shown in Fig. 3, some of the waveforms contained electrical noise or refracted, delayed signals, which could result in mis-selection of the TOF. The MATLAB interface allows the user to correct these mis-selections.
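The windowing and picking step described above can be sketched as follows. This is our simplified reading (gate around t_bg, then pick the peak); the sampling frequency and sample values are illustrative:

```python
import numpy as np

def pick_tof(s, fs, t_bg, w):
    """Pick the TOF as the peak of |s(t)| inside [t_bg - w, t_bg + w]:
    s(t) is zeroed outside the window around the estimated background TOF,
    so electrical noise and late refracted arrivals are suppressed."""
    t = np.arange(len(s)) / fs
    gated = np.where(np.abs(t - t_bg) <= w, np.abs(s), 0.0)
    return t[int(np.argmax(gated))]

# Example: a 40 MHz record with a noise spike outside the window
s = np.zeros(400)
s[50] = 5.0    # electrical noise spike (mis-selected without gating)
s[200] = 1.0   # true transmission arrival at 5 microseconds
tof = pick_tof(s, fs=40e6, t_bg=5.2e-6, w=0.6e-6)
```

Without the gate, the larger noise spike at 1.25 µs would win the peak pick; with it, the true arrival is selected.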
The grid area between transmit-receive pairs (Fig. 1d) was formulated as a system matrix, and the following equation was used to calculate the image based on a straight-ray US propagation approximation [10]:

S (X − X_bg) = T − T_bg        (2)
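The difference system of Eq. 2 can be solved iteratively. The sketch below applies conjugate gradient to the normal equations, which is our reading of the "Diff-CG" method referred to later; the matrix sizes and names are illustrative, not the authors' code:

```python
import numpy as np

def diff_cg(S, dT, iters=20):
    """Solve S (X - X_bg) = T - T_bg for the slowness difference
    x = X - X_bg by conjugate gradient on the normal equations
    S^T S x = S^T dT (iteration counts as in Table 1)."""
    A, b = S.T @ S, S.T @ dT
    x = np.zeros(S.shape[1])
    r = b - A @ x
    p = r.copy()
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < 1e-12:  # converged
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x
```

Working on the difference from the background scan (X − X_bg, T − T_bg) cancels systematic timing offsets shared by both acquisitions.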
typical size of the prostate and lesions. As shown in Fig. 4a, the prostate can be modeled as a 3 × 4 cm ellipse containing two lesions of 5 and 10 mm in diameter. The speed of sound is set to 1614 m/s for the prostate region, and to 1572 m/s and 1596 m/s for the two lesions, based on [14].
3 Results
3.1 Simulation Results
A simulation phantom was created in MATLAB based on the prostate description given above. As shown in Fig. 4, a background speed of 1523 m/s (similar to the general tissue speed of sound) produced a superior image compared to 1375 m/s and 1010 m/s, corresponding to plastisol and silicone Ecoflex, respectively. Artifacts in the images are due to the limited-angle data, but the lesions are still distinguishable from the prostate.
Fig. 4. Simulation results: (a) ground-truth simulation phantoms; (b–c) reconstructed SOS map
using (b) Diff-CG and (c) Diff-EM methods.
US Tomosynthesis: A New Paradigm for Quantitative Imaging 583
Table 1. Bias and noise in the reconstructed images using the two methods at different iterations.

            Diff-CG                            Diff-EM
            Auto TOF pick   Corrected TOF      Auto TOF pick   Corrected TOF
Iteration    20     50       20      50         20     50       20     50
%Bias_p     2.89   3.68     3.44    4.1        1.77   16.5     2.93   2.69
%Bias_w     0.86   1.30     0.38    0.95       0.79   3.47     0.29   0.23
%Bias_b     0.80   1.52     0.79    1.42       0.06   0.20     0.11   0.06
Noise       15.7   30.5     14.10   27.05      1.06   4.57     1.20   1.14
p: plastisol; w: water; b: background. Noise was calculated as the standard deviation of background pixels.
Fig. 5. (a) B-mode image; (b–c) reconstructed image using the Diff-EM method with (b) automatically picked TOF (more iterations cause more artifacts) and (c) manually corrected TOF.
4 Conclusions
In this study, we proposed and modeled a new paradigm for quantitative imaging of the prostate, which we call ultrasound tomosynthesis. Prostate cancer screening, biopsy, focal image-guided therapies, and brachytherapy are examples of clinical applications that could potentially integrate this technology. In this study, a setup and a technique
584 F. Aalamifar et al.
were developed to evaluate the feasibility of prostate USTS on ex vivo prostates taken from prostatectomy patients. Simulation and phantom studies were performed to evaluate the feasibility of this setup. The proposed setup could be used for patient-specific USTS studies of ex vivo tissues. The SOS map reconstructed from a mock ex vivo prostate with relevant acoustic properties showed promise. The immediate next step is an ex vivo study. Since the SOS contrast among different tissues in the prostate may be small, the attenuation map and more advanced reconstruction techniques, including regularization [15], will be investigated. There is a critical public-health need for improved methodologies for prostate tissue characterization and prostate cancer detection that are cost-effective, broadly accessible, and easy to use.
Acknowledgement. This work was supported by the NIH intramural research funding and
Johns Hopkins internal funds.
References
1. Siegel, R., et al.: Cancer statistics. Cancer J. Clin. 64, 9–29 (2014)
2. Labrie, F., et al.: Screening decreases prostate cancer death: first analysis of the 1988 Quebec
prospective randomized controlled trial. Prostate 38, 83–91 (1999)
3. Durkan, G.C., et al.: Improving prostate cancer detection with an extended-core transrectal
ultrasonography-guided prostate biopsy protocol. BJU Int. 89(1), 33–39 (2002)
4. Imani, F., et al.: Augmenting MRI transrectal ultrasound guided prostate biopsy with
temporal ultrasound data: a clinical feasibility study. Int. J. Comput. Assist. Radiol. Surg. 10,
727–735 (2015)
5. Correas, J.M., et al.: Ultrasound elastography of the prostate: state of the art. Diagn. Interv.
Imaging 94(5), 551–560 (2013)
6. Imani, F., et al.: Computer-aided prostate cancer detection using ultrasound RF time series:
in vivo feasibility study. IEEE Trans. Med. Imaging 34(11), 2248–2257 (2015)
7. Duric, N., et al.: Whole breast tissue characterization with ultrasound tomography. In: SPIE Medical Imaging (2015)
8. Fincke, J.R., et al.: Towards ultrasound travel time tomography for quantifying human limb geometry and material properties. In: SPIE Medical Imaging, San Diego, CA (2016)
9. Huang, L., et al.: Breast ultrasound tomography with two parallel transducer arrays:
preliminary clinical results. In: SPIE Medical Imaging, p. 941916 (2015)
10. Aalamifar, F., et al.: Co-robotic ultrasound tomography: dual arm setup and error analysis. In: SPIE Medical Imaging (2015)
11. Turkbey, B., et al.: Multiparametric 3T prostate magnetic resonance imaging to detect
cancer: histopathological correlation using prostatectomy specimens processed in cus-
tomized magnetic resonance imaging based molds. J. Urol. 186(5), 1818–1824 (2011)
12. Negussie, A.H., et al.: Thermochromic tissue-mimicking phantom for optimisation of
thermal tumour ablation. Int. J. Hyperth. 32(3), 239–243 (2016)
13. Aalamifar, F., et al.: Image reconstruction for robot assisted ultrasound tomography. In: SPIE Medical Imaging (2016)
14. Tanoue, H., et al.: Ultrasonic tissue characterization of prostate biopsy tissues by ultrasound speed microscope. In: Engineering in Medicine and Biology Society (2011)
15. Huthwaite, P., et al.: A new regularization technique for limited-view sound-speed imaging.
IEEE Trans. Ultrason. Ferroelectr. Freq. Control 60(3), 603–613 (2013)
Photoacoustic Imaging Paradigm Shift:
Towards Using Vendor-Independent
Ultrasound Scanners
1 Introduction
Fig. 1. Conventional PA imaging system (a) and proposed PA imaging system using clinical US scanners (b). Channel data is necessary for PA beamforming because US-beamformed PA data is defocused by the incorrect delay function. The two proposed approaches could overcome this problem.
To use clinical US systems for PA image formation, Harrison and Zemp [8] proposed changing the speed-of-sound parameter. However, access to the speed-of-sound parameter is uncommon, and the changeable range of this parameter is bounded. Zhang et al. [9] proposed using US post-beamformed RF data with a fixed focal point. Our paper considers more general US beamformed data obtained with delay-and-sum dynamic receive focusing. Two PA beamforming algorithms are introduced: inverse beamforming, and synthetic aperture (SA) based re-beamforming (Fig. 1). US beamforming is a sequential process, scanning line by line. Using those sequentially beamformed data as input, inverse beamforming recovers channel data by taking into
Photoacoustic Imaging Paradigm Shift: Towards Using Vendor-Independent 587
2 Methods
where S(x, y) is an amplitude correction factor to correct for the wave-intensity change caused by distance. The signal source map can be formed by repeating the integration for all pixels. The second step is to mimic the PA data acquisition and find the signal value of each sampling point on a pre-beamformed image. As shown in Fig. 2, at t0, the signal source map in the FOV is I(x_m, y_n); the PA waves from each sub-source propagate through the media and reach the US probe array at the top of the image. At a given time t1, a particular array element at x_m receives signal from a circle with radius y, where y = c · (t1 − t0). For each pixel of the recovered channel-data geometry P(x_m, y_n), we integrate along the circle C2:
P(x_m, y_n) = ∮_{C2} I(x, y) · (1/y²),        (3)
where P(x_m, y_n) is the pixel amplitude received by the x_m element at time t1. The last step is to repeat step two for all pre-beamforming image sampling points, so that a pre-beamformed image is reconstructed.
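The circle integration of Eq. 3 can be sketched as a discrete sum over samples on the circle. The half-circle restriction, the grid-unit geometry, and the sample count below are our assumptions for illustration:

```python
import numpy as np

def channel_sample(I, xm, radius, n_angles=720):
    """Approximate Eq. 3: sum the source map I[y, x] along the half-circle
    of radius y = c*(t1 - t0) centred at array element xm on the top edge
    of the image, weighted by 1/y^2 (amplitude decay with distance)."""
    ny, nx = I.shape
    ang = np.linspace(0.0, np.pi, n_angles)  # waves return from below the array
    xs = np.round(xm + radius * np.cos(ang)).astype(int)
    ys = np.round(radius * np.sin(ang)).astype(int)
    ok = (xs >= 0) & (xs < nx) & (ys >= 0) & (ys < ny)
    return float(I[ys[ok], xs[ok]].sum()) / radius ** 2
```

Repeating this for every element position and radius fills the recovered pre-beamformed (channel-data) image.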
For each recovered channel-data pixel P(x_m, y_n), the corresponding delay is

t = |r| / c,        (4)

where

r = √( (y_n / 2)² + x_m² ).        (5)
3 Results
3.1 Simulation Analysis
The simulation results are shown in Fig. 4. The US-beamformed RF data was defocused due to an incorrect delay function (Fig. 4b). The reconstructed PA images are shown in Fig. 4c–e. The two proposed approaches were compared to the ground-truth conventional PA beamforming using channel data. The measured full width at half maximum (FWHM) is shown in Table 1. The reconstructed point size was comparable to the point reconstructed using a 9.6 mm aperture with conventional PA beamforming.
Fig. 4. Simulation results. (a) Channel data. (b) US post-beamformed RF data. (c) Recon-
structed PA image from channel data with an aperture size of 9.6 mm. (d) Reconstructed PA
image through inverse beamforming. (e) Reconstructed PA image through SA re-beamforming.
Table 1. FWHM of the simulated point targets for corresponding beamforming methods.
FWHM (mm)     Conventional using channel data   Inverse beamforming   SA re-beamforming
10 mm depth 0.60 0.62 0.63
20 mm depth 1.02 1.06 0.99
30 mm depth 1.53 1.39 1.43
40 mm depth 1.94 1.76 1.91
50 mm depth 2.45 2.11 2.42
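FWHM values such as those in Table 1 are typically measured on a profile through the reconstructed point target; a sketch with linear interpolation at the half-maximum crossings (the interpolation detail is an assumption):

```python
import numpy as np

def fwhm(profile, dx):
    """Full width at half maximum of a 1-D point-spread profile, with
    linear interpolation at the half-maximum crossings."""
    p = np.asarray(profile, dtype=float)
    half = p.max() / 2.0
    above = np.where(p >= half)[0]
    left, right = int(above[0]), int(above[-1])

    def crossing(i0, i1):
        # linear interpolation of the half-maximum position between i0 and i1
        return i0 + (half - p[i0]) / (p[i1] - p[i0]) * (i1 - i0)

    lx = crossing(left - 1, left) if left > 0 else float(left)
    rx = crossing(right, right + 1) if right < len(p) - 1 else float(right)
    return (rx - lx) * dx
```

For a Gaussian profile with standard deviation σ, this returns approximately 2.355 σ.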
Fig. 5. Experiment results with Pseudo-PA data. (a–b) Comparison of channel data. (a) Ref-
erence channel data collected using DAQ. (b) Recovered channel data through inverse
beamforming from US post-beamformed RF data. (c) US post-beamformed RF data collected
from clinical US scanner. Reconstructed PA image using DAQ channel data (d), inverse
beamforming (e), and SA re-beamforming (f).
Fig. 6. In vivo evaluation results. (a) Experiment setup. Contrast agents (ICG) targeting tumor
are visualized. (b) PA image using channel data. (c) PA image through SA re-beamforming.
Although PA image formation was demonstrated on point targets, the proposed algorithms should work for any structure with high optical absorption, such as a blood vessel, which shows strong contrast under near-infrared excitation. The algorithms could also be integrated into a real-time imaging system using clinical US machines [10].
A high pulse repetition frequency (PRF) laser is a system requirement, since the laser transmission must be synchronized to the US line transmission trigger.
592 H.K. Zhang et al.
To keep the frame rate similar to that of conventional US B-mode imaging, the PRF of
the laser transmission should be the same as the transmission rate, in the range of at
least several kHz. Therefore, a high PRF laser system such as a laser diode is desirable.
US transmission should be off or use low energy to eliminate the artifacts from US
signals.
In this paper, we proposed a new paradigm for PA imaging using US post-beamformed RF data from clinical US systems. Two algorithms, inverse beamforming and SA based re-beamforming, were introduced and their performance was demonstrated in simulation. In addition, experimental studies using a pseudo-PA signal source and in vivo targets confirmed the validity and clinical significance of these methods, in that a resolution similar to conventional PA imaging using channel data was achieved. Future work includes implementing the algorithms in a real-time environment.
Acknowledgement. Authors acknowledge Howard Huang for proofreading, and Dr. Ying Chen
for assisting in vivo experiment.
References
1. Xu, M., Wang, L.V.: Photoacoustic imaging in biomedicine. Rev. Sci. Instrum. 77, 041101
(2006)
2. Wang, L.V., Hu, S.: Photoacoustic tomography: in vivo imaging from organelles to organs. Science 335, 1458–1462 (2012)
3. Kolkman, R.G.M., et al.: Real-time in vivo photoacoustic and ultrasound imaging.
J. Biomed. Opt. 13(5), 050510 (2008)
4. Kolkman, R.G.M., et al.: In vivo photoacoustic imaging of blood vessels with a pulsed laser
diode. Lasers Med. Sci. 21(3), 134–139 (2006)
5. Park, S., Aglyamov, S.R., Emelianov, S.: Beamforming for photoacoustic imaging using
linear array transducer. In: Proceedings of the IEEE International Ultrasonics Symposium,
pp. 856–859 (2007)
6. Yin, B., et al.: Fast photoacoustic imaging system based on 320-element linear transducer
array. Phys. Med. Biol. 49(7), 1339–1346 (2004)
7. Liao, C.K., et al.: Optoacoustic imaging with synthetic aperture focusing and coherence
weighting. Opt. Lett. 29, 2506–2508 (2004)
8. Harrison, T., Zemp, R.J.: The applicability of ultrasound dynamic receive beamformers to
photoacoustic imaging. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 58(10), 2259–2263
(2011)
9. Zhang, H.K., et al.: Photoacoustic reconstruction using beamformed RF data: a synthetic
aperture imaging approach. Proc. SPIE 9419, 94190L (2015)
10. Taruttis, A., Ntziachristos, V.: Advances in real-time multispectral optoacoustic imaging and
its applications. Nat. Photonics 9(4), 219–227 (2015)
4D Reconstruction of Fetal Heart Ultrasound
Images in Presence of Fetal Motion
1 Introduction
Fast acquisition rates and non-invasiveness make ultrasound (US) imaging an ideal modality for screening the fetal heart to detect congenital heart malformation. Traditionally, the functioning of the fetal heart is inspected in real-time
during B-mode imaging. Guidelines recommend examination of the four-chamber
and outflow tract views [1]. Yet prenatal detection rates vary widely, due to dif-
ferences in examiner experience, maternal obesity, transducer frequency, gesta-
tional age, amniotic fluid volume and fetal position [1]. 4D US imaging simplifies
the assessment of the outflow tracts, allows a more detailed examination and
contributes to the diagnostic evaluation in case of complex heart defects [1,2].
4D US of the fetal heart requires special image reconstruction methods, since
the speed of 3D US acquisitions using common mechanically steered probes is
too slow compared to the fetal heart rate (e.g. 7–10 vs. 2–2.5 Hz). A general app-
roach for such a 4D reconstruction problem is to continuously acquire individual
2D images covering the region of interest [3–5], which then need reordering to
extract consistent 3D images. While cardiac 4D MR reconstructions for adults
can be supported by ECG and respiratory signals [6], these signals cannot be reliably extracted for the fetus [7]. Hence sorting has relied on extracting the periodic cardiac signal from the images, under the assumption that no other fetal motion is present [3,4].
The most common method for fetal 4D US reconstruction is the STIC
(Spatio-Temporal Image Correlation) method [4], where autocorrelation is used
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 593–601, 2016.
DOI: 10.1007/978-3-319-46720-7 69
594 C. Tanner et al.
to detect the systolic peaks, the fetal heart rate (HR) is deduced and the frames
are sorted according to their resulting phases. STIC builds on slow, single sweep
US acquisitions (e.g. 150 frames/s, 25° in 10 s, 1500 frames) and works well if
only fetal cardiac motion is present. Yet, additional motion from spontaneous
fetal activity or mother’s breathing creates artifacts [8,9]. Such artifacts cannot
be remedied, as motion affects all consecutive frame positions, which in a sin-
gle sweep have only been acquired once. Hence mothers are asked to hold their
breath and operators may wait for a period of less fetal movement, which prolongs examination time. Volumes acquired by non-STIC experts showed more motion artifacts (42 %) than those by experts (16 %) [8]. To our knowledge, no reports on motion correction for fetal heart 4D US reconstruction exist.
Image registration has been used to improve reconstructions, but is generally
computationally very expensive. For example, correction of fetal 3D MRIs by
slice-to-volume rigid registration of local patches required 40 min on multiple
GPUs [10]. Correction of adult 3D cardiac MRIs, after gating based on ECG and
breathing belt signals, took 3 h on a 16 workstation cluster [6]. For respiratory
motion, 4D US reconstruction has been based on extracting a gating signal per
slice position by image dimensionality reduction and then matching these signals across slices [5]. This relies on gathering reliable motion statistics per slice, and
hence might not be robust to severe, non-periodic motion, e.g. drift.
To avoid time-consuming registrations, we follow the approach of selecting
suitable image slices from repeated mechanically-swept US acquisitions. Herein
we focus on the consistency of the 4D reconstruction and the detection of out-
liers due to motion. A large range of selection criteria was first quantitatively
evaluated on simulated US sequences. Then, to have statistical power, only the
baseline and the best method were applied to in-vivo data, and the visual appear-
ances of the reconstructions were scored by 3 researchers and a gynecologist.
2 Material
Fig. 1. Illustration of (a) the in-silico phantom geometry with a transducer plane, (b)
a simulated US image and (c) the simulated motion over time.
3 Method
Reconstruction is based on first estimating the heart rate. Then frames are
selected for reconstruction according to phase, spatial and temporal consistency.
3.2 4D Reconstruction
Figure 2 illustrates the problem of reconstructing P 3D phase images from B B-mode images continuously acquired at K positions in S sweeps. From the estimated HR f_h, the phase value q_b ∈ [0.5, P + 0.5] of the B-mode image I_b (acquired at time t = b/f_i) was estimated from the fractional part of the elapsed heart beats t f_h, i.e.

$$ q_b = (P - 1)\,\big(t f_h - \lfloor t f_h \rfloor\big) + 0.5 \,. $$

The frame from sweep s at position k is denoted as I_k^s, with associated phase q_k^s. For reconstructing P 3D phase images, P × K indices (called ŝ_k^p) need to be determined.
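The phase assignment can be written as a one-line helper; the argument names are illustrative assumptions:

```python
import math

def phase_value(b, frame_rate, heart_rate, P):
    """Phase q_b of B-mode frame b acquired at t = b / frame_rate: the
    fractional part of the elapsed heart beats t * f_h, mapped onto the P
    phase bins via q_b = (P - 1) * frac(t * f_h) + 0.5.
    Rates are in Hz; names are illustrative."""
    beats = (b / frame_rate) * heart_rate
    frac = beats - math.floor(beats)
    return (P - 1) * frac + 0.5
```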
Table 1 provides an overview of the tested reconstruction methods. Method M0 selects frames whose phase q_k^s is closest to the desired phase p [3,4].
Greedy methods M1–M3 first determine for each phase p a reference B-mode image I_m^{ŝ_m^p} and then sequentially minimize spatial inconsistency, i.e.

$$ \hat{s}^p_{k+1} = \arg\min_{s \in S^{k+1}_p} D\big(I_k^{\hat{s}^p_k},\, I_{k+1}^{s}\big) \quad \text{for } k = \{m, m+1, \ldots, K-1, m-1, m-2, \ldots, 1\}, \qquad (1) $$

where D is an image dissimilarity measure and S_p^k = {s : |q_k^s − p| < 0.5} is the set of sweep indices of frames at position k belonging to phase p. For M1, I_m^{ŝ_m^p} is the first frame at position m = 1 which belongs to phase p, i.e. ŝ_1^p = min S_p^1. M2 is the same as M1 apart from using the midframe (m = K/2). In M3 the most typical midframe is used as reference, i.e. the midframe which has the highest correlation with all other midframes within the phase range S_p^{K/2}:

$$ \hat{s}^p_{K/2} = \arg\max_{s \in S^{K/2}_p} \sum_{r \in S^{K/2}_p} \mathrm{CC}\big(I_{K/2}^{s},\, I_{K/2}^{r}\big). \qquad (2) $$
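The greedy selection of Eq. (1) can be sketched as follows; the `frames[k][s]` list-of-lists layout and the function names are illustrative assumptions:

```python
import numpy as np

def greedy_select(frames, phases, p, m, dissim):
    """Hedged sketch of the greedy selection of Eq. (1).  frames[k][s] is the
    frame at position k from sweep s and phases[k][s] its phase value.
    Starting from a reference at position m, each position takes the sweep
    whose frame is least dissimilar to the already-selected neighbouring
    frame, restricted to sweeps with phase within 0.5 of target phase p."""
    K = len(frames)
    chosen = [None] * K
    admissible = lambda k: [s for s in range(len(frames[k]))
                            if abs(phases[k][s] - p) < 0.5]
    chosen[m] = admissible(m)[0]            # M1/M2-style reference frame
    # sweep outwards: m+1 .. K-1, then m-1 .. 0 (cf. the k-ordering in Eq. 1)
    for k in list(range(m + 1, K)) + list(range(m - 1, -1, -1)):
        ref = k - 1 if k > m else k + 1     # already-selected neighbour
        chosen[k] = min(admissible(k),
                        key=lambda s: dissim(frames[ref][chosen[ref]],
                                             frames[k][s]))
    return chosen
```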
M4 jointly minimizes the phase deviation PD_p^k and a spatial consistency cost SC_p^k over all admissible frame selections:

$$ \check{C}_{f_h} = \min_{\hat{s}^p_k \in S} \sum_{p=1}^{P} \Big( \sum_{k=1}^{K} PD^k_p + \alpha \sum_{k=1}^{K-1} SC^k_p \Big), \qquad (3) $$
4D Reconstruction of Fetal Heart Ultrasound Images 597
where the weight α = Σ_k α_k / K was automatically determined by α_k = |\overline{PD}^k / \overline{SC}^k|, with \overline{PD}^k denoting the mean of PD_p^k for the R = 10 closest observations to p, and \overline{SC}^k being the mean of SC_p^k for the R most similar spatial neighbours. M5 is the same as M4 apart from also allowing variations in the estimated HR f_h through an additional grid search over 1/f ∈ [1/f_h ± 0.05] s to minimize Č_f = min_{f_h ∈ f} Č_{f_h}. M6 extends Eq. (3) by adding a term for temporal consistency (TC):

$$ \check{C}_{f_h} = \min_{\hat{s}^p_k \in S} \sum_{p=1}^{P} \Big( \sum_{k=1}^{K} PD^k_p + \alpha \sum_{k=1}^{K-1} SC^k_p + \beta \sum_{k=1}^{K} TC^k_p \Big), \qquad (4) $$

where TC_p^k = D(I_k^{ŝ_k^p}, I_k^{ŝ_k^{p'}}) with p' = (p − 1) mod P, β = Σ_k |\overline{PD}_p^k / (\overline{TC}_p^k K)|, and \overline{TC}_p^k denotes the mean of TC_p^k for the R most similar temporal neighbours. Equation (4) was sequentially optimized until convergence after reconstructing a phase via Eq. (3).
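The Eq. (4) objective and the automatic weighting can be sketched as follows; the array shapes encode assumed summation limits, and the R-nearest-neighbour averaging that produces the per-position means is left to the caller:

```python
import numpy as np

def combined_cost(PD, SC, TC, alpha, beta):
    """Eq. (4) objective for one candidate frame assignment: summed phase
    deviation plus weighted spatial and temporal inconsistency.  The array
    shapes -- PD (P, K), SC (P, K-1), TC (P, K) -- encode assumed
    summation limits."""
    return float(np.sum(PD) + alpha * np.sum(SC) + beta * np.sum(TC))

def auto_weight(PD_bar, SC_bar):
    """Automatic weight of M4: alpha = (1/K) * sum_k |PD_bar_k / SC_bar_k|,
    given the per-position means of the two cost terms."""
    PD_bar = np.asarray(PD_bar, dtype=float)
    SC_bar = np.asarray(SC_bar, dtype=float)
    return float(np.mean(np.abs(PD_bar / SC_bar)))
```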
Outlier Removal (OR) - Having observed on simulated and real data that motion leads to low CC values when comparing images (see Fig. 3), we also tested all methods after removing low-correlating sweeps. For this we created the CC matrix J for the midframes, determined the midframe with the lowest mean correlation to all others, and discarded the associated sweep. This was continued until the lowest mean correlation was >0.5 or only 50 % of the sweeps were left.
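The outlier-removal loop can be sketched directly from this description; returning the list of kept sweep indices is an illustrative choice:

```python
import numpy as np

def remove_outlier_sweeps(J, thresh=0.5, keep_frac=0.5):
    """Iteratively discard the sweep whose midframe has the lowest mean
    correlation to all other kept midframes (CC matrix J), until that lowest
    mean exceeds `thresh` or only `keep_frac` of the sweeps remain."""
    S = J.shape[0]
    kept = list(range(S))
    while len(kept) > int(np.ceil(keep_frac * S)):
        sub = J[np.ix_(kept, kept)]
        # mean correlation of each kept sweep to the other kept sweeps
        means = (sub.sum(axis=1) - np.diag(sub)) / (len(kept) - 1)
        worst = int(np.argmin(means))
        if means[worst] > thresh:
            break
        kept.pop(worst)
    return kept
```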
Fig. 3. (left) Example CC matrix J of midframes from Sim3 and for in-vivo sequence
#2. (middle, right) Power spectra from (middle) J and (right) autocorrelation method.
HR ground truth for in-vivo data was estimated from M-mode traces by counting
the number of heart cycles between the first and the last visible extrema.
Table 2. Ground truth (GT) heart rate (in beats/min) and difference (GT-estimation)
for estimation methods using (A1) autocorrelation or (A2) image similarities.
The performance for the simulations was quantified by combined motion errors.
For this, phase errors were converted to motion errors by assigning to each phase
value the corresponding mean change in semi axis length (±2.25 mm).
Table 3 lists the mean absolute error for the most complex simulation (Sim3)
when applying methods M0–M6 using one of 3 image dissimilarity measures
D, and including outlier removal (OR) or not (OR×). Highest accuracy was
Table 3. (top-left) Table with mean absolute errors (in mm) for simulation Sim3. The
lowest errors are marked in bold. (top-right) Visualization of table results. (bottom)
Visualization of results for all simulations and their mean (Sim123).
D     OR   M0     M1     M2     M3     M4     M5     M6
CC    ×    2.59   1.68   0.50   0.58   0.88   6.04   0.93
CD2   ×    2.59   0.92   0.36   0.47   2.14   4.49   0.39
MI    ×    2.59   1.81   0.64   0.77   3.95   5.59   1.57
CC    ✓    0.71   0.57   0.50   0.59   0.54   1.25   0.55
CD2   ✓    0.71   0.62   0.36   0.47   0.68   1.51   0.36
MI    ✓    0.71   0.88   0.64   0.77   0.61   1.59   1.03
Fig. 4. Orthogonal example slices from reconstruction of simulation Sim3 for (a)
ground truth and methods (b) M0, (c) M2-CD2-OR and (d) M6-CD2-OR.
Fig. 5. Example of a representative in-vivo reconstruction (mean score 1.75) for (a,b)
M0 and (c,d) M2-CD2-OR for (a,c) phase 2 and (b,d) difference phase 3 - phase 2.
References
1. Carvalho, J.S., Allan, L.D., Chaoui, R., Copel, J.A., DeVore, G.R., Hecher, K.,
et al.: ISUOG practice guidelines (updated): sonographic screening examination of
the fetal heart. Ultrasound Obstet. Gynecol. 41(3), 348 (2013)
2. DeVore, G.R., Falkensammer, P., Sklansky, M.S., Platt, L.D.: Spatio-temporal
image correlation (STIC): new technology for evaluation of the fetal heart. Ultra-
sound Obstet. Gynecol. 22(4), 380 (2003)
3. Nelson, T.R., Pretorius, D.H., Sklansky, M., Hagen-Ansert, S.: Three-dimensional
echocardiographic evaluation of fetal heart anatomy and function: acquisition,
analysis, and display. J. Ultrasound Med. 15(1), 1 (1996)
4. Schoisswohl, A., Falkensammer, P.: Method and apparatus for obtaining a volu-
metric scan of a periodically moving object. US Patent 6,966,878, 22 November
2005
5. Wachinger, C., Yigitsoy, M., Rijkhorst, E.-J., Navab, N.: Manifold learning for
image-based breathing gating in ultrasound and MRI. Med. Image Anal. 16(4),
806 (2012)
6. Odille, F., Bustin, A., Chen, B., Vuissoz, P., Felblinger, J.: Motion-corrected, super-
resolution reconstruction for high-resolution 3D cardiac cine MRI. In: Navab, N.,
Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, Part III. LNCS,
vol. 9351, pp. 435–442. Springer, Berlin (2015)
7. Peterfi, I., Kellenyi, L., Szilagyi, A.: Noninvasive recording of true-to-form fetal
ECG during the third trimester of pregnancy. Obstet. Gynecol. Int. 2014, Article
ID 285636 (2014)
8. Uittenbogaard, L.B., Haak, M.C., Spreeuwenberg, M.D., Van Vugt, J.M.G.: A
systematic analysis of the feasibility of four-dimensional ultrasound imaging using
spatiotemporal image correlation in routine fetal echocardiography. Ultrasound
Obstet. Gynecol. 31(6), 625 (2008)
9. Yagel, S., Benachi, A., Bonnet, D., Dumez, Y., Hochner-Celnikier, D., Cohen, S.M.,
et al.: Rendering in fetal cardiac scanning: the intracardiac septa and the coronal
atrioventricular valve planes. Ultrasound Obstet. Gynecol. 28(3), 266 (2006)
10. Kainz, B., Alansary, A., Malamateniou, C., Keraudren, K., Rutherford, M.,
Hajnal, J.V., Rueckert, D.: Flexible reconstruction and correction of unpredic
table motion from stacks of 2D images. In: Navab, N., Hornegger, J., Wells, W.M.,
Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9350, pp. 555–562. Springer, Berlin
(2015)
11. Bürger, B., Bettinghausen, S., Radle, M., Hesser, J.: Real-time GPU-based ultrasound simulation using deformable mesh models. IEEE Trans. Med. Imaging 32(3), 609 (2013)
12. Cohen, B., Dinstein, I.: New maximum likelihood motion estimation schemes for
noisy ultrasound images. Pattern Recogn. 35(2), 455 (2002)
13. Seroul, P., Sarrut, D.: VV: a viewer for the evaluation of 4D image registration. MIDAS J. (MICCAI Workshop - Systems and Architectures for Computer Assisted Interventions) 40, 1 (2008)
Towards Reliable Automatic Characterization
of Neonatal Hip Dysplasia from 3D Ultrasound
Images
1 Introduction
Developmental dysplasia of the hip (DDH), which refers to hip joint abnormal-
ities ranging from mild acetabular dysplasia to irreducible hip joint dislocation,
affects 0.16 %−2.85 % of all newborns [1]. Early arthritis is often associated with
DDH [2] so failing to detect and treat DDH in infancy can lead to later expen-
sive corrective surgical procedures. Based on the figures presented in [2], Price
et al. [3] estimated that 25,000 total hip replacements per year are attributable
to missed early diagnosis in the United States alone. At approximately $50,000
per procedure [4], the direct financial impact of this problem is in the order
of $1B/year, not considering the costs of subsequent revision surgeries or other
socioeconomic costs.
To diagnose DDH prior to ossification of the femoral head, 2-dimensional
(2D) ultrasound (US) imaging is currently recommended over other imaging
modalities (e.g. x-ray, magnetic resonance imaging, computed tomography, etc.)
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 602–609, 2016.
DOI: 10.1007/978-3-319-46720-7 70
Towards Reliable Automatic Characterization of Neonatal Hip Dysplasia 603
due to its low cost and absence of ionizing radiation [5]. The standard DDH
metric obtained from 2D US scans is the angle between the acetabular roof and
the vertical cortex of the ilium, referred to as the alpha angle, α2D [5,6]. In
general, α2D > 60◦ indicates a normal hip, whereas 43◦ < α2D < 60◦ rep-
resents borderline to moderate DDH, and α2D < 43◦ suggests severe DDH
[6,7]. However, α2D suffers from high within-hip variability, i.e. the variabil-
ity between dysplasia metrics (DMs) measured on repeated examinations of the
same patien’s hip, with standard deviations of such measurements, σ, ranging
from 3◦ to 7◦ [8]. This may be partly attributable to variations in manually mea-
suring α2D on 2D slices (subjective variability, σ ≈ 5◦ ), but is more likely due
to variability in α2D resulting from differences within what is considered clini-
cally acceptable (standard) 2D US scans that are caused mainly by differences
in the probe orientation (probe-orientation-dependent variability, σ ≈ 7◦ ) [7].
This high variability in measured α2D leads to significant discrepancies between
the initial clinical determination of dysplasia severity and later clinical assess-
ments. Specifically, estimates suggest that 6 % − 29 % of cases that are later
treated were initially regarded as not needing early treatment [1,9]. Further,
there is significant potential for over-treatment since about 90 % of US-detected
hip dysplasia cases resolve spontaneously [10]. Recently, we have proposed to
reduce the subjective variability by automatically extracting α2D from 2D US
[11]. Our preliminary results in that work showed a 9 % reduction in within-hip
variability [11].
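The Graf-style interpretation ranges quoted above can be captured in a small helper; since the quoted ranges use strict inequalities, the handling of the exact boundary values 43° and 60° here is an assumption:

```python
def classify_alpha_2d(alpha_deg):
    """Interpretation of the 2D alpha angle: > 60 deg normal,
    43-60 deg borderline-to-moderate DDH, < 43 deg severe DDH.
    Boundary handling at exactly 43/60 deg is an assumption."""
    if alpha_deg > 60.0:
        return "normal"
    if alpha_deg >= 43.0:
        return "borderline-to-moderate DDH"
    return "severe DDH"
```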
To further reduce the within-hip variability, in this paper, we address the
crucial probe-orientation-dependent variability problem [7]. More specifically, we
propose to characterize DDH based on an intrinsically 3D morphology metric
derived directly from 3D US scans, which we argue captures more of the pertinent
anatomical structures, while reducing the dependency on probe orientation in
imaging those structures. To the best of our knowledge, only one previous work
[12] has proposed the use of an intrinsically 3D DM, the acetabular contact angle
(ACA). Similar to α2D , the ACA represents the angular separation between the
acetabular roof (A) and the lateral iliac (I), except that the ACA is based on
the segmented 3D surfaces of A and I. Hareendranathan et al.’s [12] method
involves a slice-by-slice analysis process that requires manually selecting 4 seed
points in each of the 2D US slices in a 3D US volume and manually separating A
from I. Using such an interactive method would require valuable clinician time
and the manual operations introduce within-image measurement variability of
approximately 1◦ [12] and inter-scan variability of approximately 4◦ [13].
In this paper, we propose a fully automatic approach for extracting a new
3D DM, the 3D alpha angle, α3D , by analogy to α2D [6]. To the best of
our knowledge, our work is the first that proposes a fully automatic approach of
extracting a 3D dysplasia metric. In this paper, we: (1) extend our previous phase
symmetry feature-based bone/cartilage extraction [11] to 3D, (2) define our new
proposed 3D metric, α3D , (3) automatically extract α3D , (4) demonstrate on
real clinical data a significant decrease in within-hip variability of α3D compared
to α2D .
604 N. Quader et al.
2 Methods
The 2D dysplasia metric, α2D , is defined as the angle between the fitted straight
lines that approximate A and I when viewed on a 2D B-mode US image [5,6]. We
therefore define an analogous 3D metric, α3D , based on the relative orientations
of the fitted planar surfaces of A and I (Fig. 1b). Briefly, given a 3D B-mode US
image, U : X ⊂ IR3 → IR, where X = (x, y, z) are the voxel coordinates, our
approach starts by extracting the bone cartilage structures, B (Sect. 2.1). We
then use prior anatomical knowledge of the hip joint to automatically identify
the 3D surfaces of A and I within B (Sect. 2.2). Finally, we approximate the
average normals across A and I, and compute α3D as the angle between these
approximated normals (Sect. 2.3).
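A minimal sketch of computing α3D as the angle between least-squares plane fits; plain least squares is an assumption (a robust fit such as MLESAC [16] may be preferable in practice), and all names are illustrative:

```python
import numpy as np

def plane_normal(points):
    """Least-squares plane normal of an (N, 3) point set: the right singular
    vector of the centred coordinates with the smallest singular value."""
    pts = np.asarray(points, dtype=float)
    centred = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[-1]

def alpha_3d(points_a, points_i):
    """Sketch of alpha_3D: the angle (degrees) between the planes fitted to
    the acetabular-roof surface A and the iliac surface I."""
    n_a, n_i = plane_normal(points_a), plane_normal(points_i)
    cosang = abs(float(np.dot(n_a, n_i)))   # sign of the normals is arbitrary
    return float(np.degrees(np.arccos(np.clip(cosang, 0.0, 1.0))))
```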
Our proposed α3D is based only on the surfaces of A and I, which are both
substructures of the detected B volume. To isolate A and I from other back-
ground structures, we use hip anatomy-based priors; the first prior is that A and
I are located superior to the spherical femoral head, F (Fig. 1(a,e,f)), while the
second prior is that A, I and the labrum (cartilage) tend to have a common
junction at the edge of the ilium (Fig. 1(e,g)). We thus start by detecting F ,
Fig. 1. (a) Rendering of the anatomy of a hip joint showing A (red), I(blue) and the
femoral head. (b) Schematic illustration of α3D - the angle between the fitted planar
surfaces that approximate A and I. (c) An example clinical B-mode volumetric US
image of a neonatal hip. (d) Extracted sheet-like SP S responses. (e) Extracted bone
and cartilage B responses with arrows pointing to A, I and the labrum. (f) Segmented
femoral head, F . (g) Responses of the direction-variability feature with arrow pointing
to the maximum response. (h) Extracted ROIs of A and I within B.
Data Acquisition and Experimental Setup: In this study, two orthopedic sur-
geons at British Columbia Children’s hospital participated in collecting 3D US
images from 30 hip examinations (15 US sessions × 2 hips/US session = 30
hip examinations) belonging to 12 patients (single US session for 9 patients and
two US sessions for 3 patients), obtained as part of routine clinical care under
the appropriate institutional review board approval. To investigate repeatability,
i.e., within-hip variability, each hip examination consisted of five repeated 3D
US image acquisitions. Our proposed α3D angles were automatically calculated
for each of the five 3D US volumes. Furthermore, each of the infants underwent
an independent regular clinical care 2D US scanning session at the radiology
department, where the infants were scanned by the radiologist on duty. In every
2D US session, the radiologist would acquire repeated 2D US images and make
manual measurements of α2D (3 to 6 measurements per hip) on all images that
the radiologist judged to adequately show the key anatomical structures needed
to measure α2D ; subsequently, these were recorded in and retrieved from the
patient chart. As in a typical 2D US scan session (Fig. 2(a,b)), the 3D scans
were acquired in the coronal plane, where the ilium (located superior to the
femoral head, Fig. 1a) appears towards the left of the femoral head in an US
image (Fig. 1(e,f)).
Fig. 2. Qualitative results. (a), (b), (d) and (e) show example variability of α2D and
α3D from two 2D and two 3D US images from a hip examination of patient 3DUS004
(α2D = 47◦ and 56◦ , α3D = 44.8◦ and 45.2◦ ). The higher variability in the input 2D
US images (and α2D values) can be seen in the manually aligned 2D US images in (c)
compared to the variability in the manually aligned 3D US images (and α3D values)
in (f).
Discrepancy Between α2D and α3D : The difference between the two metrics,
mean(α2D ) − mean(α3D ) in the 30 hip examinations, was significant (p < 0.01,
mean: 5.17◦ , SD: 3.33◦ with a bias towards α3D being smaller than α2D ).
Variability of Metrics, σ: The automatic α3D shows a statistically signifi-
cant improvement in variability compared to the manual α2D (p = 0.0053,
mean(σ3D ) = 2.19◦ , mean(σ2D ) = 3.08◦ (qualitative result in Fig. 2 and box-
plot in Fig. 3b)). This 28.9 % reduction in variability suggests that probe position
variation has a larger effect on variability in the DM than manual processing of
the 2D US (9 % improvement with automatic image processing within a 2D US
in our previous study [11]). The residual variability of α3D (σ3D ≈ 2◦ ) seems
to be small enough to be diagnostically valuable, given that the typical range
from normal to dysplastic hip is around 17◦ [6,7]. Furthermore, the variability
Fig. 3. (a) Scatter plot of α2D and α3D . (b) Box-plot of the standard deviations,
σ, among the manual α2D and automatic α3D measurements across all the 30 hip
examinations.
of α3D appears substantially lower than the reported variability of the recently
described 3D ACA metric (2.19◦ versus 4.1◦ [13]).
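The quoted 28.9 % reduction follows directly from the two reported means; a one-line helper makes the computation explicit:

```python
import numpy as np

def variability_reduction(sigma_2d, sigma_3d):
    """Relative reduction in mean within-hip standard deviation when moving
    from the manual 2D metric to the automatic 3D metric."""
    s2, s3 = float(np.mean(sigma_2d)), float(np.mean(sigma_3d))
    return (s2 - s3) / s2
```

With the reported means, (3.08 − 2.19)/3.08 ≈ 0.289, i.e. the 28.9 % reduction quoted above.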
Computational Considerations: The complete process of extracting α3D from an
US volume took approximately 270 seconds, when run on a Xeon(R) 3.40 GHz
CPU computer with 12 GB RAM. All processes were executed using MATLAB
2015b. Current practice has a sonographer process the images post-acquisition,
so this computation time is not a significant barrier to implementation. Although
not critical for clinical use, we plan to work towards optimizing our code to reduce
this computation time.
4 Conclusions
We presented an automatic 3D dysplasia metric, α3D , to characterize hip dys-
plasia in 3D US images of the neonatal hip. Using the proposed α3D resulted
in a statistically significant reduction in variability compared to the currently
standard 2D measure, α2D . This suggests that this 3D morphology-derived DM
could be valuable in improving the reliability in diagnosing DDH, which may
lead to a more standardized DDH assessment with better diagnostic accuracy.
Notably, the improvement in reliability associated with the 3D scans was
achieved by orthopaedic surgeons, who have limited training in performing US
examinations, while the 2D scans and metrics were obtained from radiologists
with explicit training in ultrasound acquisition and analysis. This strongly sug-
gests that we may, in future, be able to train personnel other than radiologists to
obtain reliable and reproducible dysplasia metrics using 3D ultrasound machines,
potentially reducing the costs associated with screening for DDH.
References
1. Shorter, D., Hong, T., Osborn, D.A.: Cochrane review: screening programmes for
developmental dysplasia of the hip in newborn infants. Evid.-Based Child health:
Cochrane Rev. J. 8(1), 11–54 (2013)
2. Hoaglund, F.T., Steinbach, L.S.: Primary osteoarthritis of the hip: etiology and
epidemiology. JAAOS 9(5), 320–327 (2001)
3. Price, C.T., Ramo, B.A.: Prevention of hip dysplasia in children and adults.
Orthop. Clin. North Am. 43(3), 269–279 (2012)
4. Rosenthal, J.A., Lu, X., Cram, P.: Availability of consumer prices from US hospitals for a common surgical procedure. JAMA Intern. Med. 173(6), 427–432 (2013)
5. Atweh, L.A., Kan, J.H.: Multimodality imaging of developmental dysplasia of the
hip. Pediatr. Radiol. 43(1), 166–171 (2013)
6. Graf, R.: Fundamentals of sonographic diagnosis of infant hip dysplasia. J. Pediatr.
Orthop. 4(6), 735–740 (1984)
7. Jaremko, J.L., et al.: Potential for change in US diagnosis of hip dysplasia solely caused by changes in probe orientation: patterns of alpha-angle variation revealed by using three-dimensional US. Radiology 273(3), 870–878 (2014)
8. Ömeroğlu, H.: Use of ultrasonography in developmental dysplasia of the hip. J.
Child. Orthop. 8(2), 105–113 (2014)
9. Imrie, M., et al.: Is ultrasound screening for DDH in babies born breech sufficient?
J. Child. Orthop. 4(1), 3–8 (2010)
10. Shipman, S.A., Helfand, M., Moyer, V.A., Yawn, B.P.: Screening for developmental dysplasia of the hip: a systematic literature review for the US Preventive Services Task Force. Pediatrics 117(3), e557–e576 (2006)
11. Quader, N., Hodgson, A., Mulpuri, K., Schaeffer, E., Cooper, A., Abugharbieh, R.:
A reliable automatic 2D measurement for developmental dysplasia of the hip. Bone
Joint J. (2016, in press)
12. Hareendranathan, A.R., Mabee, M., Punithakumar, K., Noga, M., Jaremko, J.L.:
A technique for semiautomatic segmentation of echogenic structures in 3D ultra-
sound, applied to infant hip dysplasia. IJCARS 11, 1–12 (2015)
13. Mabee, M.G., Hareendranathan, A.R., Thompson, R.B., Dulai, S., Jaremko, J.L.:
An index for diagnosing infant hip dysplasia using 3-D ultrasound: the acetabular
contact angle. Pediatr. Radiol. 1–9 (2016)
14. Quader, N., Hodgson, A., Abugharbieh, R.: Confidence weighted local phase fea-
tures for robust bone surface segmentation in ultrasound. In: Linguraru, M.G.,
Laura, C.O., Shekhar, R., Wesarg, S., Ballester, M.Á.G., Drechsler, K., Sato, Y.,
Erdt, M. (eds.) CLIP 2014. LNCS, vol. 8680, pp. 76–83. Springer, Heidelberg (2014)
15. Descoteaux, M., Audette, M., Chinzei, K., Siddiqi, K.: Bone enhancement filter-
ing: application to sinus bone segmentation and simulation of pituitary surgery.
Comput. Aided Surg. 11(5), 247–255 (2006)
16. Torr, P.H., Zisserman, A.: MLESAC: a new robust estimator with application to
estimating image geometry. Comput. Vis. Image Underst. 78(1), 138–156 (2000)
17. Averbuch, A., Shkolnisky, Y.: 3D Fourier based discrete Radon transform. Appl. Comput. Harmonic Anal. 15(1), 33–69 (2003)
Image-Based Computer-Aided Diagnostic
System for Early Diagnosis of Prostate Cancer
1 Introduction
Prostate cancer is the most frequently diagnosed malignancy after skin cancer, and is the second leading cause of cancer death in American men after lung cancer. More than 220,000 new prostate cancer cases and about 27,540 prostate cancer deaths among Americans were reported in 2015 [1]. Fortunately, mortality rates can be reduced by detecting
c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 610–618, 2016.
DOI: 10.1007/978-3-319-46720-7 71
Image-Based CAD System for Early Diagnosis of Prostate Cancer 611
prostate cancer in its early stages. Currently, the standard technique for diag-
nosing prostate cancer is to carry out a transrectal ultrasound (TRUS)-guided
needle biopsy after an elevated level prostate specific antigen (PSA) in the blood,
greater than 4 ng/mL (nanograms per milliliter), is reported. When there is a
contradiction between the PSA level and the results of TRUS-guided biopsy (such as an elevated PSA level with a negative biopsy result), MRI can play a significant role in detecting prostate cancer [2].
Different MRI techniques, such as T2 -weighted MRI, dynamic contrast
enhanced MRI (DCE-MRI) and DW-MRI, have been utilized in computer-aided
diagnostic (CAD) systems for detecting prostate cancer. T2 -weighted MRI pro-
vides superior pathological details of soft tissues but it lacks functional informa-
tion and the use of T2 -weighted MRI alone has resulted in low specificity [3].
Recently, the trend is to use functional MRI modalities such as DCE-MRI and
DW-MRI to increase the diagnostic accuracy. DCE-MRI employs contrast mate-
rials (e.g., gadolinium) to improve the contrast between the different tissue types.
However, DCE-MR images require long acquisition time and contrast materi-
als are deleterious especially for patients with kidney problems. On the other
hand, DW-MRI identifies tissue cellularity indirectly by studying the diffusion of
water molecules. Although the quality of DW-MR images is lower than DCE-MR
images, DW-MR images have distinct advantages over DCE-MR images as they
can be acquired very quickly, without the use of contrast materials [4]. The diag-
nostic accuracy of DW-MRI is higher than DCE-MRI and T2 -weighted MRI [5].
In the literature, a small number of prostate cancer CAD systems have eval-
uated the use of DW-MR images alone or in combination with other MRI tech-
niques [6]. For example, Firjani et al. [7] developed a DW-MRI based CAD
system in which a k-nearest-neighbor (KNN) classifier used three intensity fea-
tures to classify the prostate into benign or malignant. The first multiparametric
CAD system was proposed by Chan et al. [8] using T2 -MRI, T2 -mapping, and
line scan diffusion imaging (LSDI). Intensity and textural features were extracted
from manually-localized prostate region and fed into a support vector machine
(SVM) classifier or Fisher linear discriminant (FLD) classifier to detect prostate
cancer in the peripheral zone (PZ) of the prostate. The area under the curve
(AUC) was 0.76 ± 0.04 for the SVM and 0.84 ± 0.06 for the FLD. Another multi-
parametric CAD system that employed T2 -weighted MRI, DCE-MRI, and DW-
MRI was proposed by Litjens et al. [9]. In this system, an SVM classifier used
apparent diffusion coefficients (ADCs) and pharmacokinetic features extracted
from the segmented prostate to determine malignant and benign regions. Vos
et al. [10] proposed another multiparametric CAD system that utilized the same
MRI modalities used in [9]. In their system, a linear discriminant analysis (LDA)
classifier employed a set of features (e.g., texture-based, ADC maps) to differentiate
between malignant and benign prostates. Since most of the aforementioned
CAD systems rely on multi-parametric MRI, which can be cost-inefficient [11],
our CAD system for early detection of prostate cancer utilizes DW-MRI alone.
Our system focuses on classifying the entire prostate volume as malignant or
benign rather than on localizing the cancer.
Details of the proposed system will be discussed in the following sections.
612 I. Reda et al.
2 Methods
The proposed CAD system summarized in Fig. 1 performs three sequential
steps. First, the prostate is segmented using our previously developed geomet-
ric deformable model (level-sets) as described in [12]. This model is guided by
a stochastic speed function that is derived using nonnegative matrix factoriza-
tion (NMF). The NMF attributes are calculated using information from the
MRI intensity, a probabilistic shape model, and the spatial interactions between
prostate voxels. The proposed approach reaches 86.89% overall Dice similarity
coefficient and an average Hausdorff distance of 5.72 mm, indicating high seg-
mentation accuracy. Details of this approach and comparisons with other seg-
mentation approaches can be found in [12]. Afterwards, global features describing
the water diffusion inside the prostate tissue are extracted based on ADC-CDFs.
Finally, a two-stage structure of stacked nonnegativity constraint auto-encoder
(SNCAE) is trained to classify the prostate tumor as benign or malignant based
on the CDFs constructed in the previous step. The latter two steps of the pro-
posed CAD system are discussed in the following sections.
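The segmentation accuracy figures quoted above (Dice similarity coefficient, Hausdorff distance) are computed from binary segmentation masks; a minimal NumPy sketch of the Dice coefficient, with toy masks in place of real prostate segmentations:

```python
import numpy as np

def dice_coefficient(seg, gt):
    """Dice similarity coefficient between two binary masks (0/1 arrays)."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    intersection = np.logical_and(seg, gt).sum()
    total = seg.sum() + gt.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

# Toy example: two overlapping 2D masks
a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1   # 4 voxels
b = np.zeros((4, 4), dtype=int); b[1:3, 1:4] = 1   # 6 voxels, 4 shared
print(dice_coefficient(a, b))  # 2*4/(4+6) = 0.8
```

The same function applies unchanged to 3D volumes, since the reductions are over all array elements.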
ADC(x, y, z) = ln(S0(x, y, z)/S1(x, y, z)) / (b1 − b0)    (1)
where S0 and S1 are the signal intensities acquired at the b0 and b1 b-values,
respectively. Then, all ADC maps at a given b-value for all subjects are normalized
with respect to the maximum value over all of these maps, so that all
calculated ADC maps lie in the same range (between 0 and 1) and a single
color coding can be used for all of them. The calculated ADC values are refined using
a generalized Gauss-Markov random field (GGMRF) image model with a 26-
voxel neighborhood to remove any data inconsistency and preserve continuity.
Continuity of the constructed 3D volume is reinforced by using their maximum a
posteriori (MAP) estimates. The CDFs of the normalized ADCs of each subject
are constructed. These CDFs are considered as global features distinguishing
between benign and malignant cases. Instead of using the whole ADC volume,
the resultant CDFs are used to train an SNCAE classifier using the deep learning
approach.
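The ADC computation of Eq. (1) and the fixed-length CDF descriptor can be sketched as follows; the array shapes, b-values, and random toy signals are illustrative, not the paper's data, while the 100-bin CDF follows the text:

```python
import numpy as np

def adc_map(s0, s1, b0=0.0, b1=700.0):
    """Voxel-wise apparent diffusion coefficient, Eq. (1)."""
    return np.log(s0 / s1) / (b1 - b0)

def cdf_features(adc, n_bins=100):
    """100-component CDF of normalized ADC values: a fixed-size global
    descriptor regardless of the prostate volume's voxel count."""
    adc = adc / adc.max()                      # normalize to [0, 1]
    hist, _ = np.histogram(adc, bins=n_bins, range=(0.0, 1.0))
    return np.cumsum(hist) / adc.size          # monotone, last value = 1

rng = np.random.default_rng(0)
s1 = rng.uniform(0.2, 0.9, size=(8, 8, 4))     # toy DW signal at b = 700
s0 = np.ones_like(s1)                          # toy baseline at b = 0
feats = cdf_features(adc_map(s0, s1))
print(feats.shape, feats[-1])                  # (100,) 1.0
```

The output is always a 100-vector, which is what makes the descriptor independent of prostate size.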
It is worth noting that conventional classification methods, employing
directly the voxel-wise ADCs of the entire prostate volume as discriminative fea-
tures, encounter at least two serious difficulties. Various input data sizes require
unification by either data truncation for large prostate volumes, or zero padding
for small ones. Either approach may decrease classification accuracy. Tech-
niques like bag-of-visual-words (BoVW) can be employed to overcome the diffi-
culty of various input data sizes but the data has to be aligned and the accuracy
of BoVW technique is a function of the data resolution and the size of the bag.
In addition, large ADC data volumes lead to considerable time expenditures for
training and classification. In contrast, our SNCAE classifier exploits only the
100-component CDFs to describe the entire 3D ADC maps estimated at each
b-value. This fixed data size helps overcome the above challenges and notably
expedites the classification.
To classify the prostate tumor, our CAD system employs a deep neural network
with a two-stage structure of stacked autoencoders (AEs). In the first stage, seven
autoencoder-based classifiers, one classifier for each of seven different b-values
(100 to 700 s/mm2 ), are utilized to estimate initial classification probabilities
that are concatenated and fed in the second stage into another SNCAE to esti-
mate the final classification.
Each AE compresses its input data (100-component CDFs at some b-value)
to capture the most prominent variations and is built separately by greedy unsu-
pervised pre-training [14]. A softmax output layer, stacked after AE layers, facili-
tates the subsequent supervised back-propagation-based fine tuning of the entire
classifier by minimizing the total loss (negative log-likelihood) for given training
labeled data. Using the AEs with a non-negativity constraint (NCAE) [15] yields
both more reasonable data codes (features) during its unsupervised pre-training
and better classification performance after the supervised refinement.
For each SNCAE, let W = {Wje , Wid : j = 1, . . . , s; i = 1, . . . , n} denote a
set of column vectors of weights for encoding (e) and decoding (d) layers of a
single AE. Let T denote vector transposition. The AE converts an n-dimensional
column vector u = [u1 , . . . , un ]T of input signals into an s-dimensional column
vector h = [h1, . . . , hs]T of hidden codes (features, or activations), such that
hj = σ(WjeT u), j = 1, . . . , s.
The activations of the second NCAE layer, h[2] = σ(W[2]eT h[1]), are inputs of
the softmax classification layer, as sketched in Fig. 2(a) to compute a plausibility
of a decision in favor of each particular output class, c = 1, 2:
p(c; W◦:c) = exp(W◦:cT h[2]) / (exp(W◦:1T h[2]) + exp(W◦:2T h[2])),  c = 1, 2;  with Σc=1,2 p(c; W◦:c; h[2]) = 1.
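This two-class softmax is the standard normalized exponential; a NumPy sketch with placeholder weights and activations (the dimensions are illustrative):

```python
import numpy as np

def softmax_plausibility(W, h2):
    """p(c) = exp(W[:, c]^T h2) / sum_c' exp(W[:, c']^T h2), c = 1, 2."""
    scores = W.T @ h2                  # one score per class
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 2))            # one weight column per class
h2 = rng.normal(size=5)                # second-layer activations
p = softmax_plausibility(W, h2)
print(p.sum())                         # probabilities sum to 1
```

Subtracting the maximum score before exponentiation is a common guard against overflow and leaves the probabilities unchanged.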
3 Experimental Results
Experiments were conducted on 53 DW-MRI data sets (27 benign and 26 malig-
nant) obtained using a body coil Signa Horizon GE scanner in axial plane with
the following parameters: Magnetic field strength: 1.5 T; TE: 84.6 ms; TR: 8000
ms; Bandwidth: 142.86 kHz; FOV: 34 cm; Slice thickness: 3 mm; Inter-slice gap:
0 mm; Acquisition sequence: conventional EPI; Diffusion weighting directions:
mono-directional; b-values ranging from 0 to 700 s/mm2. On average,
26 slices were obtained in 120 s to cover the prostate in each patient with voxel
size of 1.25×1.25×3.00 mm3. Ground truth was obtained by manual slice-by-slice
segmentation using 3D Slicer (www.slicer.org). All annotations were
verified by an expert. All subjects were diagnosed using a
biopsy and the Gleason scores for the malignant cases range from 6 to 8. The
cases were evaluated as a whole and not per tumor.
To learn the statistical characteristics of both benign and malignant subjects,
we trained seven different SNCAEs, one for each b-value, on the 53 DW-MRI
datasets (27 benign and 26 malignant). All training was done within a leave-one-subject-out
cross-validation framework. The features used for classification are the CDFs
of the normalized ADC maps for 7 different b-values of the segmented prostate
tissue. To assess the accuracy of our system, we performed a leave-one-subject-out
cross-validation test for each AE on all 53 datasets. The overall
diagnostic accuracies for the different b-values are summarized in Table 1.
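The leave-one-subject-out protocol can be sketched generically; here a nearest-centroid rule on toy features stands in for the SNCAE, purely as an illustration of the validation loop:

```python
import numpy as np

def leave_one_subject_out_accuracy(X, y, fit, predict):
    """Train on all subjects but one, test on the held-out subject,
    and average the per-subject accuracy."""
    hits = 0
    for i in range(len(X)):
        train = [j for j in range(len(X)) if j != i]
        model = fit(X[train], y[train])
        hits += int(predict(model, X[i]) == y[i])
    return hits / len(X)

# Toy stand-in for the classifier: nearest class centroid on CDF features
fit = lambda X, y: {c: X[y == c].mean(axis=0) for c in np.unique(y)}
predict = lambda m, x: min(m, key=lambda c: np.linalg.norm(x - m[c]))

X = np.array([[0.1, 0.2], [0.15, 0.25], [0.9, 0.8], [0.85, 0.9]])
y = np.array([0, 0, 1, 1])   # 0 = benign, 1 = malignant (toy labels)
print(leave_one_subject_out_accuracy(X, y, fit, predict))  # 1.0
```

Because the held-out subject never influences training, the averaged accuracy is an unbiased per-subject estimate.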
In the last stage of the classification, we concatenated the output probabilities
from the 7 AEs. This vector of the fused probabilities is fed into the prediction
stage SNCAE. Our classifier achieves an overall accuracy of 98.11 % for all testing
data sets which is higher than all reported accuracies in Table 1.
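The fusion step can be illustrated as follows: the seven per-b-value malignancy probabilities are concatenated into one vector and scored by a second-stage model. A plain logistic unit with made-up weights stands in for the second-stage SNCAE, and the probabilities are toy values:

```python
import numpy as np

def fuse_probabilities(per_b_probs, w, bias=0.0):
    """Concatenate per-b-value malignancy probabilities and apply a
    logistic second stage (a stand-in for the second-stage SNCAE)."""
    x = np.concatenate(per_b_probs)           # 7 first-stage outputs
    return 1.0 / (1.0 + np.exp(-(w @ x + bias)))

# Seven first-stage malignancy probabilities, one per b-value (toy values)
probs = [np.array([p]) for p in (0.9, 0.8, 0.85, 0.7, 0.95, 0.88, 0.92)]
w = np.ones(7)                                # illustrative fusion weights
print(fuse_probabilities(probs, w) > 0.5)     # final decision: malignant
```

In the actual system the fusion weights are learned, so consistently informative b-values receive more influence on the final decision.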
To highlight the merit of using the proposed system, a comparison between
our classifier and four other ready-to-use classifiers (K*, K-nearest neighbor,
Random Forest and Random Tree classifiers implemented in Weka toolbox) [16]
is summarized in Table 2. The input features for each of those four classifiers are
the 100-component CDFs. As demonstrated in Table 2, the proposed framework
Table 2. Classification accuracy, sensitivity, specificity and AUC of our SNCAE clas-
sifier and four ready-to-use Weka classifiers.
Fig. 3. ROC curves for our SNCAE classifier and four ready-to-use Weka classifiers.
outperforms the other alternatives. The corresponding areas under the curve
(AUC) of the receiver operating characteristics of those classifiers are shown in
Fig. 3. The AUC of the proposed classifier approaches 0.987.
4 Conclusions
This paper presented an efficient DW-MRI CAD system for early detection of
prostate cancer. The proposed CAD system used integral statistics (CDFs of the
normalized ADC maps) as global diagnostic features.
References
1. Siegel, R.L., Miller, K.D., Jemal, A.: Cancer statistics, 2015. CA: Cancer J. Clin.
65(1), 5–29 (2015)
2. Lawrentschuk, N., Fleshner, N.: The role of magnetic resonance imaging in tar-
geting prostate cancer in patients with previous negative biopsies and elevated
prostate-specific antigen levels. BJU Int. 103(6), 730–733 (2009)
3. Hoeks, C.M., et al.: Prostate cancer: multiparametric MR imaging for detection,
localization, and staging. Radiology 261(1), 46–66 (2011)
4. Tan, C.H., Wang, J., Kundra, V.: Diffusion weighted imaging in prostate cancer.
Eur. Radiol. 21(3), 593–603 (2011)
5. Tamada, T., Sone, T., Jo, Y., Yamamoto, A., Ito, K.: Diffusion-weighted MRI and
its role in prostate cancer. NMR Biomed. 27(1), 25–38 (2014)
6. Lemaı̂tre, G., et al.: Computer-aided detection and diagnosis for prostate cancer
based on mono and multi-parametric MRI: a review. Comput. Biol. Med. 60, 8–31
(2015)
7. Firjani, A., Elnakib, A., Khalifa, F., Gimel’farb, G., El-Ghar, M.A.,
Elmaghraby, A., El-Baz, A.: A diffusion-weighted imaging based diagnostic sys-
tem for early detection of prostate cancer. J. Biomed. Sci. Eng. 6(03), 346 (2013)
8. Chan, I., et al.: Detection of prostate cancer by integration of line-scan diffusion,
T2-mapping and T2-weighted magnetic resonance imaging; a multichannel statis-
tical classifier. Med. Phys. 30(9), 2390–2398 (2003)
9. Litjens, G., Vos, P., Barentsz, J., Karssemeijer, N., Huisman, H.: Automatic com-
puter aided detection of abnormalities in multi-parametric prostate MRI. In: Pro-
ceedings of SPIE Medical Imaging 2011: Computer-Aided Diagnosis, vol. 7963, pp.
79630T–79630T. International Society for Optics and Photonics (2011)
10. Vos, P., Barentsz, J., Karssemeijer, N., Huisman, H.: Automatic computer-aided
detection of prostate cancer based on multiparametric magnetic resonance image
analysis. Phys. Med. Biol. 57(6), 1527 (2012)
11. Hambrock, T., Somford, D.M., Hoeks, C., Bouwense, S.A., Huisman, H., Yakar, D.,
van Oort, I.M., Witjes, J.A., Fütterer, J.J., Barentsz, J.O.: Magnetic resonance
imaging guided prostate biopsy in men with repeat negative biopsies and increased
prostate specific antigen. J. Urol. 183(2), 520–528 (2010)
12. McClure, P., Khalifa, F., Soliman, A., El-Ghar, M.A., Gimelfarb, G.,
Elmagraby, A., El-Baz, A.: A novel NMF guided level-set for DWI prostate seg-
mentation. J. Comput. Sci. Syst. Biol. 7, 209–216 (2014)
13. Le Bihan, D.: Apparent diffusion coefficient and beyond: what diffusion MR imag-
ing can tell us about tissue structure. Radiology 268(2), 318–322 (2013)
14. Bengio, Y., et al.: Greedy layer-wise training of deep networks. Adv. Neural Inf.
Process. Syst. 19, 153 (2007)
15. Hosseini-Asl, E., Zurada, J., Nasraoui, O.: Deep learning of part-based representa-
tion of data using sparse autoencoders with nonnegativity constraints. IEEE Trans.
Neural Netw. Learn. Syst. 99, 1–13 (2015)
16. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The
WEKA data mining software: an update. ACM SIGKDD Explor. 11(1), 10–18 (2009)
Multidimensional Texture Analysis for Improved
Prediction of Ultrasound Liver Tumor Response
to Chemotherapy Treatment
1 Introduction
Liver tumor ultrasound scanning has recently become increasingly recommended
as a first-line diagnostic option for early prediction of response to chemotherapy
treatment [1]. However, visual assessment of tumor response to chemotherapy is very
challenging without longitudinal monitoring of tumor development. This is,
in part, due to the intertwined tumor speckle variations, which lead to the formation of
complex texture patterns. A robust approach to tackle this texture complexity
is to assess the radio-frequency (RF) echoes – instead of B-mode images – which
are not subjected to log-compression and proprietary filtering algorithms. This
original data preservation allows for better statistical modeling of backscattering
properties.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 619–626, 2016.
DOI: 10.1007/978-3-319-46720-7_72
620 O.S. Al-Kadi et al.
A natural way of assessing the echo signal f (x, y) is to analyse its statistical
properties at different spatial scales. An efficient way to systematically decom-
pose f (x, y) into successive dyadic scales is to use a redundant isotropic wavelet
transform. The recursive projection of f on Simoncelli’s isotropic wavelet pro-
vides such a frame representation [9]. The radial profile of the wavelet function
is defined in polar coordinates in the Fourier domain as
ĥ(ρ) = cos((π/2) log2(2ρ/π)),  π/4 < ρ ≤ π;   ĥ(ρ) = 0, otherwise.    (2)
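The radial profile of Eq. (2) can be coded directly; a NumPy sketch (the band (π/4, π] and the cos–log2 form follow the text):

```python
import numpy as np

def h_hat(rho):
    """Radial profile of Simoncelli's isotropic wavelet, Eq. (2):
    cos((pi/2) * log2(2*rho/pi)) on (pi/4, pi], zero elsewhere."""
    rho = np.atleast_1d(np.asarray(rho, dtype=float))
    out = np.zeros_like(rho)
    band = (rho > np.pi / 4) & (rho <= np.pi)
    out[band] = np.cos(0.5 * np.pi * np.log2(2.0 * rho[band] / np.pi))
    return out

print(h_hat(np.pi / 2))   # [1.] -> peak response at the band center
print(h_hat(np.pi / 8))   # [0.] -> outside the support
```

The profile peaks at ρ = π/2 (where log2(2ρ/π) = 0) and falls to zero at both band edges, which is what makes the dyadic scales tile smoothly.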
The scaling function is omitted to ensure illumination invariance. The local
structural properties of f (x, y) can be well described in terms of the local circular
frequencies, which was at the origin of the success of methods such as local binary
patterns (LBP) [10]. In this work, local circular harmonics are computed on top
of the wavelet frames to characterize circular frequencies at multiple scales [11],
which is an extension of steerable Riesz wavelets [12]. Circular harmonic wavelets
(CHW) of order n are constructed in the Fourier domain as
The representation obtained from the collection of the complex magnitudes of the
scalar products |⟨f, φ(n)⟩| characterizes the local circular frequencies in f (x, y)
up to order n = 1, . . . , N and is rotation invariant [13].
Fig. 1. Classification accuracies for varying convolution kernel size (I) in pixels with
fixed order (N = 2) and scale (J = 6)
                     Cross-validation
Statistical measure  LOO      5-fold           10-fold
Accuracy             97.91    93.30 ± 0.017    95.70 ± 0.009
Sensitivity          98.80    96.40 ± 0.888    97.50 ± 0.931
Specificity          96.60    88.80 ± 0.964    93.10 ± 0.975
ROC-AUC              97.70    92.60 ± 0.020    95.30 ± 0.009
Fig. 2. (1st column) Tumor B-mode images, (2nd column) fractal texture maps and
(3rd column) corresponding tissue heterogeneity representation for a (1st row) non-
responsive vs (2nd row) responsive case, respectively. Red regions in (c) and (f) indicate
response to treatment according to RECIST criteria [16]. CHW decomposition was
based on a 2nd order and up to the 8th scale.
4 Conclusion
A novel approach has been presented for quantifying liver tumor response to
chemotherapy treatment, with three main contributions: (a) ultrasound liver
tumor texture analysis based on a Nakagami distribution model of the
envelope RF data, which retains sufficient information; (b) a set of CHW
frames used to define a new tumor heterogeneity descriptor characterized
by multi-scale circular harmonics of the ultrasound RF envelope data; (c)
heterogeneity specified by the lacunarity measure, viewed as the size
distribution of gaps in the fractal texture of the decomposed CHW coefficients.
Finally, the measurement of heterogeneity for the proposed representation
model is realized by means of support vector machines.
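The gliding-box lacunarity of contribution (c) [15] can be sketched for a single box size; the toy maps below are illustrative, not the CHW coefficient maps:

```python
import numpy as np

def gliding_box_lacunarity(img, box):
    """Gliding-box lacunarity for one box size: second moment of the
    box-mass distribution divided by its squared mean, E[M^2]/E[M]^2."""
    h, w = img.shape
    masses = np.array([img[i:i + box, j:j + box].sum()
                       for i in range(h - box + 1)
                       for j in range(w - box + 1)], dtype=float)
    return masses.var() / masses.mean() ** 2 + 1.0  # var/mean^2 + 1

# A homogeneous map has lacunarity 1; gaps raise it
uniform = np.ones((8, 8))
gappy = np.zeros((8, 8)); gappy[::4, ::4] = 1.0
print(gliding_box_lacunarity(uniform, 2))       # 1.0 (no gaps)
print(gliding_box_lacunarity(gappy, 2) > 1.0)   # True
```

Sweeping the box size yields a lacunarity curve, whose shape summarizes the size distribution of gaps in the texture.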
Acknowledgments. We would like to thank Dr. Daniel Y.F. Chung for providing the
ultrasound dataset. This work was partially supported by the Swiss National Science
Foundation (grant PZ00P2 154891) and the Arab Fund (grant 2015-02-00627).
References
1. Bae, Y.H., Mrsny, R., Park, K.: Cancer Targeted Drug Delivery: An Elusive Dream,
pp. 689–707. Springer, New York (2013)
2. Sadeghi-Naini, A., Papanicolau, N., Falou, O., Zubovits, J., Dent, R., Verma, S.,
Trudeau, M., Boileau, J.F., Spayne, J., Iradji, S., Sofroni, E., Lee, J.,
Lemon-Wong, S., Yaffe, M., Kolios, M.C., Czarnota, G.J.: Quantitative ultrasound
evaluation of tumor cell death response in locally advanced breast cancer patients
receiving chemotherapy. Clin. Cancer Res. 19(8), 2163–2174 (2013)
3. Tadayyon, H., Sadeghi-Naini, A., Wirtzfeld, L., Wright, F.C., Czarnota, G.: Quan-
titative ultrasound characterization of locally advanced breast cancer by estimation
of its scatterer properties. Med. Phys. 41, 012903 (2014)
4. Gangeh, M.J., Sadeghi-Naini, A., Diu, M., Kamel, M.S., Czarnota, G.J.: Cate-
gorizing extent of tumour cell death response to cancer therapy using quantita-
tive ultrasound spectroscopy and maximum mean discrepancy. IEEE Trans. Med.
Imaging 33(6), 268–272 (2014)
5. Wachinger, C., Klein, T., Navab, N.: The 2D analytic signal for envelope detection
and feature extraction on ultrasound images. Med. Image Anal. 16(6), 1073–1084
(2012)
6. Al-Kadi, O.S., Chung, D.Y., Carlisle, R.C., Coussios, C.C., Noble, J.A.: Quantifi-
cation of ultrasonic texture intra-heterogeneity via volumetric stochastic modeling
for tissue characterization. Med. Image Anal. 21(1), 59–71 (2015)
7. Al-Kadi, O.S., Watson, D.: Texture analysis of aggressive and non-aggressive lung
tumor CE CT images. IEEE Trans. Bio-med. Eng. 55(7), 1822–1830 (2008)
8. Shankar, P.M.: A general statistical model for ultrasonic backscattering from tis-
sues. IEEE T Ultrason. Ferroelectr. Freq. Control 47(3), 727–736 (2000)
9. Portilla, J., Simoncelli, E.P.: A parametric texture model based on joint statistics
of complex wavelet coefficients. Int. J. Comput. Vis. 40, 49–70 (2000)
10. Ojala, T., Pietikänen, M., Mäenpää, T.: Multiresolution gray-scale and rotation
invariant texture classification with local binary patterns. IEEE Trans. Pattern
Anal. Mach. Intell. 24, 971–987 (2002)
11. Unser, M., Chenouard, N.: A unifying parametric framework for 2D steerable
wavelet transforms. SIAM J. Imaging Sci. 6(1), 102–135 (2013)
12. Unser, M., Van De Ville, D.: Wavelet steerability and the higher-order Riesz trans-
form. IEEE Trans. Image Process. 19(3), 636–652 (2010)
13. Depeursinge, A., Püspöki, Z., et al.: Steerable wavelet machines (SWM): learning
moving frames for texture classification. IEEE Trans. Image Process. (submitted)
14. Lopes, R., Betrouni, N.: Fractal and multifractal analysis: a review. Med. Image
Anal. 13(4), 634–649 (2009)
15. Plotnick, R.E., Gardner, R.H., Hargrove, W.W., Prestegaard, K., Perlmutter, M.:
Lacunarity analysis: a general technique for the analysis of spatial patterns. Phys.
Rev. E 53(5), 5461–5468 (1996)
16. Eisenhauer, E.A., Therasse, P., et al.: New response evaluation criteria in solid
tumours: revised RECIST guideline. Eur. J. Cancer 45(2), 228–247 (2009)
Classification of Prostate Cancer Grades
and T-Stages Based on Tissue Elasticity
Using Medical Image Analysis
Shan Yang(B) , Vladimir Jojic, Jun Lian, Ronald Chen, Hongtu Zhu,
and Ming C. Lin
1 Introduction
Currently, prostate cancer screening is usually performed through routine
prostate-specific antigen (PSA) blood tests and/or a rectal examination. Based
on a positive PSA indication, a biopsy of randomly sampled areas of the prostate
can then be considered to diagnose the cancer and assess its aggressiveness.
Biopsy may miss sampling cancerous tissues, resulting in missed or delayed diag-
nosis, and miss areas with aggressive cancers, thus under-staging the cancer and
leading to under-treatment.
Studies have shown that tissue stiffness, as characterized by the tissue material
properties, may indicate an abnormal pathological process. Ex-vivo, measurement-based meth-
ods, such as [1,11] using magnetic resonance imaging (MRI) and/or ultrasound,
were proposed for study of prostate cancer tissue. However, previous works in
material property reconstruction often have limitations with respect to their
genericity, applicability, efficiency and accuracy [22]. More recent techniques,
such as inverse finite-element methods [6,13,17,21,22], stochastic finite-element
methods [18], and image-based ultrasound [20] have been developed for in-vivo
soft tissue analysis.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 627–635, 2016.
DOI: 10.1007/978-3-319-46720-7_73
In this paper, we study the possible use of tissue (i.e. prostate) elasticity to
help evaluate the prognosis of prostate cancer patients, given at least two sets of
CT images. The clinical T-stage of a prostate cancer measures how much
the tumor has grown and spread, while a Gleason score based on the biopsy of
cancer cells indicates the aggressiveness of the cancer. They are commonly used for
cancer staging and grading. We present an improved method that uses geomet-
ric and physical constraints to deduce the relative tissue elasticity parameters.
Although elasticity reconstruction, or elastography, can be used to estimate tis-
sue elasticity, it is less suited for in-vivo measurements or deeply seated organs
like the prostate. We describe a non-invasive method to estimate tissue elasticity
values based on pairs of CT images, using a finite-element based biomechan-
ical model derived from an initial set of images, local displacements, and an
optimization-based framework.
Given the recovered tissue properties reconstructed from analysis of med-
ical images and patient’s ages, we develop a multiclass classification system for
classifying clinical T-stage and Gleason scores for prostate cancer patients. We
demonstrate the feasibility of a statistically-based multiclass classifier that clas-
sifies a supplementary assessment on cancer T-stages and cancer grades using
the computed elasticity values from medical images, as an additional clinical
aids for the physicians and patients to make more informed decision (e.g. more
strategic biopsy locations, less/more aggressive treatment, etc.). Concurrently,
extracted image features [8–10] using dynamic contrast enhanced (DCE) MRI
have also been suggested for prostate cancer detection. These methods are com-
plementary to ours and can be used in conjunction with ours as a multimodal
classification method to further improve the overall classification accuracy.
2 Method
In our system, we apply the Finite Element Method (FEM) and adopt the Mooney-Rivlin
material model for bio-tissue modeling [3]. After discretization using FEM, we
arrive at a linear system,
Ku = f (1)
with K as the stiffness matrix, u as the displacement field and f as the external
forces. The stiffness matrix K is not always symmetric positive definite due
to the complicated boundary conditions. The boundary conditions we apply are the
traction forces (shown in Fig. 7(a) of the supplementary document) computed
based on the displacement of the surrounding tissue (overlapping surfaces shown
in Fig. 7(b) of the supplementary document). We choose the Generalized
Minimal Residual (GMRES) method [16] to solve the linear system instead of the
Generalized Conjugate Gradient (GCG) method [14], as GMRES copes better with
non-symmetric linear systems.
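Solving Eq. (1) with GMRES can be sketched using SciPy's `gmres`; the 3×3 matrix below is an illustrative stand-in for the assembled stiffness matrix, not the paper's FEM system:

```python
import numpy as np
from scipy.sparse.linalg import gmres

# Toy non-symmetric "stiffness" system K u = f (Eq. 1); GMRES does not
# require symmetry or positive-definiteness, unlike conjugate gradients.
K = np.array([[4.0, 1.0, 0.0],
              [0.5, 3.0, 1.0],
              [0.0, 2.0, 5.0]])       # non-symmetric, illustrative values
f = np.array([1.0, 2.0, 3.0])        # toy traction-derived load vector
u, info = gmres(K, f)                # info == 0 signals convergence
print(info, np.linalg.norm(K @ u - f))
```

For the real problem K would be a large sparse matrix, which is exactly the setting `scipy.sparse.linalg.gmres` is designed for.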
The computation of the stiffness matrix K in Eq. 1 depends on the energy
function Ψ of the Mooney-Rivlin material model [15,19]:
Ψ = (1/2) μ1 ((I1^2 − I2)/I3^(2/3) − 6) + μ2 (I1/I3^(1/3) − 3) + v1 (I3^(1/2) − 1)^2,    (2)
where μ1, μ2 and v1 are the material parameters. In this paper, we recover
parameters μ1 and μ2. Since prostate soft tissue (without tumors) tends to be
homogeneous, we use the average μ̄ of μ1 and μ2 as our recovered elasticity
parameter. To model incompressibility, we set v1 to a very large value (1e7
in our implementation). v1 is linearly related to the bulk modulus; the
larger the bulk modulus, the more incompressible the object.
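The energy of Eq. (2) can be evaluated directly from the strain invariants; the exact invariant exponents below follow one published form of this Mooney-Rivlin variant and are an assumption here. Note that Ψ vanishes in the undeformed state (I1 = I2 = 3, I3 = 1), a quick sanity check:

```python
import numpy as np

def mooney_rivlin_energy(I1, I2, I3, mu1, mu2, v1):
    """Energy Psi as a function of the strain invariants I1, I2, I3 and
    the material parameters mu1, mu2, v1 (exponents are an assumption)."""
    return (0.5 * mu1 * ((I1**2 - I2) / I3**(2.0 / 3.0) - 6.0)
            + mu2 * (I1 / I3**(1.0 / 3.0) - 3.0)
            + v1 * (np.sqrt(I3) - 1.0)**2)

# Undeformed state: every term vanishes
print(mooney_rivlin_energy(3.0, 3.0, 1.0, mu1=1.0, mu2=1.0, v1=1e7))  # 0.0
```

Any deformation away from the identity raises I1 or drives I3 away from 1, so the energy becomes positive, with the large v1 strongly penalizing volume change.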
Relative Elasticity Value: In addition, we divide the recovered absolute elasticity
parameter μ̄ by that of the surrounding tissue to compute the relative
elasticity parameter μ̂. This individualized relative value helps to remove the
variation in mechanical properties of tissues between patients, normalizing the
per-patient fluctuation in absolute elasticity values due to varying degrees of
hydration and other temporary factors. We refer readers to our supplementary
document for details regarding non-linear material models.
with d(Sl, St) as the distance between the deformed surface and the reference
surface, λ as the regularization weight, and Γ as the second-order differential
operator.
Given the CT images (shown in Fig. 1a) of the patient, the prostate, bladder
and rectum are first segmented in the images. Then the 3D surfaces (shown in
Fig. 1. Real patient CT image and reconstructed organ surfaces. (a) shows
one slice of the patient CT images with the bladder, prostate and rectum segmented.
(b) shows the reconstructed organ surfaces.
Fig. 1b) of these organs are reconstructed using VTK, and these surfaces serve
as the input to our elasticity parameter reconstruction algorithm. Our patient
dataset contains 113 sets of CT images (29 reference and 84 target)
from 29 patients, each patient having 2 to 15 sets of CT images. Every patient
in the dataset has prostate cancer, with clinical T-stage ranging from T1 to T3,
Gleason score ranging from 6 to 10, and age from 50 to 85. Gleason scores are
usually used to assess the aggressiveness of the cancer.
data, thus further improving the classification results and testing/validating its
classification power for cancer diagnosis. With more data, we could also apply
our learned model for cancer stage/score prediction. Other features, such
as the volume of the prostate, can also be included in a larger study. Another
possible direction is to perform the same study on normal subjects and increase
the patient diversity from different locations. A large-scale study can enable
more complete analysis and lead to more insights on the impact of variability
due to demographics and hospital practice on the study results. Similar analysis
and derivation could also be performed using other image modalities, such as
MR and ultrasound, and shown to be applicable to other types of cancers.
References
1. Ashab, H.A.D., Haq, N.F., Nir, G., Kozlowski, P., Black, P., Jones, E.C., Gold-
enberg, S.L., Salcudean, S.E., Moradi, M.: Multimodal classification of prostate
tissue: a feasibility study on combining multiparametric MRI and ultrasound. In:
SPIE Medical Imaging, p. 94141B. International Society for Optics and Photonics
(2015)
2. Bender, R., Grouven, U.: Ordinal logistic regression in medical research. J. R. Coll.
Physicians Lond. 31(5), 546–551 (1997)
3. Cotin, S., Delingette, H., Ayache, N.: Real-time elastic deformations of soft tissues
for surgery simulation. IEEE Trans. Vis. Comput. Graph. 5(1), 62–73 (1999)
4. Dubuisson, M.P., Jain, A.K.: A modified hausdorff distance for object matching. In:
Proceedings of the 12th IAPR International Conference on Pattern Recognition,
1994, vol. 1-Conference A: Computer Vision and Image Processing, vol. 1, pp.
566–568. IEEE (1994)
5. Engl, H.W., Kunisch, K., Neubauer, A.: Convergence rates for Tikhonov regulari-
sation of non-linear ill-posed problems. Inverse Prob. 5(4), 523 (1989)
6. Goksel, O., Eskandari, H., Salcudean, S.E.: Mesh adaptation for improving elas-
ticity reconstruction using the FEM inverse problem. IEEE Trans. Med. Imaging
32(2), 408–418 (2013)
7. Golub, G.H., Hansen, P.C., O’Leary, D.P.: Tikhonov regularization and total least
squares. SIAM J. Matrix Anal. Appl. 21(1), 185–194 (1999)
8. Haq, N.F., Kozlowski, P., Jones, E.C., Chang, S.D., Goldenberg, S.L., Moradi, M.:
Prostate cancer detection from model-free T1-weighted time series and diffusion
imaging. In: SPIE Medical Imaging, p. 94142X. International Society for Optics
and Photonics (2015)
9. Haq, N.F., Kozlowski, P., Jones, E.C., Chang, S.D., Goldenberg, S.L., Moradi, M.:
Improved parameter extraction and classification for dynamic contrast enhanced
MRI of prostate. In: SPIE Medical Imaging, p. 903511. International Society for
Optics and Photonics (2014)
10. Haq, N.F., Kozlowski, P., Jones, E.C., Chang, S.D., Goldenberg, S.L., Moradi, M.:
A data-driven approach to prostate cancer detection from dynamic contrast
enhanced MRI. Comput. Med. Imaging Graph. 41, 37–45 (2015)
11. Khojaste, A., Imani, F., Moradi, M., Berman, D., Siemens, D.R., Sauerberi, E.E.,
Boag, A.H., Abolmaesumi, P., Mousavi, P.: Characterization of aggressive prostate
cancer using ultrasound RF time series. In: SPIE Medical Imaging, p. 94141A.
International Society for Optics and Photonics (2015)
12. Kleinbaum, D.G., Klein, M.: Ordinal logistic regression. Logistic Regression, pp.
463–488. Springer, Berlin (2010)
13. Lee, H.P., Foskey, M., Niethammer, M., Krajcevski, P., Lin, M.C.: Simulation-
based joint estimation of body deformation and elasticity parameters for medical
image analysis. IEEE Trans. Med. Imaging 31(11), 2156–2168 (2012)
14. Liu, Y., Storey, C.: Efficient generalized conjugate gradient algorithms, part 1:
theory. J. Optim. Theory Appl. 69(1), 129–137 (1991)
15. Rivlin, R.S., Saunders, D.: Large elastic deformations of isotropic materials. VII.
Experiments on the deformation of rubber. Philos. Trans. R. Soc. Lond. Ser. A
Math. Phys. Sci. 243(865), 251–288 (1951)
16. Saad, Y., Schultz, M.H.: Gmres: a generalized minimal residual algorithm for solv-
ing nonsymmetric linear systems. SIAM J. Sci. Stat. Computing. 7(3), 856–869
(1986)
17. Shahim, K., Jürgens, P., Cattin, P.C., Nolte, L.-P., Reyes, M.: Prediction of cranio-
maxillofacial surgical planning using an inverse soft tissue modelling approach. In:
Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013, Part I.
LNCS, vol. 8149, pp. 18–25. Springer, Heidelberg (2013)
18. Shi, P., Liu, H.: Stochastic finite element framework for simultaneous estimation
of cardiac kinematic functions and material parameters. Med. Image Anal. 7(4),
445–464 (2003)
19. Treloar, L.R., Hopkins, H., Rivlin, R., Ball, J.: The mechanics of rubber elasticity
[and discussions]. Proc. R. Soc. Lond. A. Math. Phys. Sci. 351(1666), 301–330
(1976)
20. Uniyal, N., et al.: Ultrasound-based predication of prostate cancer in MRI-guided
biopsy. In: Linguraru, M.G., Laura, C.O., Shekhar, R., Wesarg, S., Ballester,
M.Á.G., Drechsler, K., Sato, Y., Erdt, M. (eds.) CLIP 2014. LNCS, vol. 8680,
pp. 142–150. Springer, Heidelberg (2017)
21. Vavourakis, V., Hipwell, J.H., Hawkes, D.J.: An inverse finite element u/p-
formulation to predict the unloaded state of in vivo biological soft tissues. Ann.
Biomed. Eng. 44(1), 187–201 (2016)
22. Yang, S., Lin, M.: Materialcloning: Acquiring elasticity parameters from images
for medical applications (2015)
Automatic Determination of Hormone Receptor
Status in Breast Cancer Using Thermography
1 Introduction
Breast cancer has the highest incidence among cancers in women [1]. Breast can-
cer also has wide variations in the clinical and pathological features [2], which
are taken into account for treatment planning [3], and to predict survival rates or
treatment outcomes [2,4]. Thermography offers a radiation-free, non-contact
approach to breast imaging and has recently been re-investigated [5–8]
with the availability of high-resolution thermal cameras. Thermography detects
the temperature increase in malignancy due to the increased metabolism of can-
cer [9] and due to the additional blood flow that feeds the malignant
tumors [6]. Thermography may also be sensitive to hormone receptor status, as
these hormones release Nitric Oxide, which causes vasodilation and a temperature
increase [6,10]. Both effects could potentially allow evaluation of the hormone
receptor status of malignant tumors using thermography. If this is possible, it
provides a non-invasive way of predicting the hormone receptor status of malig-
nancies through imaging, before immunohistochemistry (IHC)
analysis of the tumor samples after surgery. This paper investigates this possibil-
ity and the achievable prediction accuracy. Most other breast imaging techniques, including
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 636–643, 2016.
DOI: 10.1007/978-3-319-46720-7 74
mammography, are not able to detect hormone receptor status changes. Although
Chaudhury et al. [11] claim that Dynamic Contrast Enhanced (DCE) MRI can be
used to predict estrogen receptor status, DCE MRI requires contrast agent
injection, and their method has been tested only on a small dataset of 20 subjects
with leave-one-out cross-validation.
There has been a study [12] analyzing the effect of the hormone receptor status of
malignant tumors on thermography through quantitative analysis of the average
and maximum temperatures of the tumor site, the mirror tumor site and the breasts.
[12] reports a significant difference in these temperature measurements between hor-
mone receptor positive and negative cases. In this paper,
we automatically extract features from the thermographic images in the region
of interest (ROI), i.e. the breast tissue, using image processing, and attempt to
classify the hormone receptor status of malignant tumors using machine learn-
ing techniques. Determining whether or not a subject has breast cancer
using thermography, i.e. screening for cancer, is out of scope for this paper; for
breast cancer screening using thermography, the reader is referred to other
algorithms [8,13].
The paper is organized as follows. Section 2 provides details on the effect of
hormone receptor positive and negative breast cancers on thermography from
the existing literature. Section 3 describes our approach to automatic feature
extraction from the ROI for HR+ and HR− malignant tumor classification.
Section 4 describes the dataset used for our experiments and our classification
results are provided in Sect. 5. Conclusions and future work are given in Sect. 6.
associated with locally high concentrations of Nitric Oxide generation [10] for
prolonged periods of time. [12] finds a significant difference in the average
and maximum temperature of the tumor site between PR+ and PR− tumors,
with the PR− tumors being hotter. The same pattern holds for ER status,
although not significantly. Their study showed that the more aggres-
sive ER−/PR− tumors were hotter than the less aggressive ER+/PR+ tumors.
Their study also indicates that the difference between the average temperatures of the
tumor and its mirror site in the contra-lateral breast is higher in ER− tumors
than in ER+ tumors, although not significantly. The same pattern
holds for PR status. Since the hormone sensitivity of both breast tis-
sues is similar, a thermal increase on both breasts is probable
for estrogen or progesterone positive cases. [12] does not specifically analyze the
four ER/PR status subtypes, probably because the temperature differences
are small for a single hormone receptor status. Guided by these med-
ical reasons and empirical observations, in the next section we design a set of
novel features, along with a few existing features, that either extract these
observations automatically or correlate with these findings, for classifying
hormone receptor positive and negative tumors.
T2 = Tmax − τ     (2)

In the above equations, Tmax represents the overall maximum temperature over
all views, and Mode(ROI) represents the mode of the temperature histogram
obtained from the temperature values of pixels in the ROIs of all views. The
parameters ρ, τ and the decision fusion rule are selected based on the accuracy
of classification.
Distance Between Regions. The malignant tumor region is hotter than the
surrounding region, but the relative difference is higher for HR− tumors. In the case
of HR+ tumors, the entire breast region is warmed up, so this difference
is smaller. We use the normalized histogram of temperatures, or probability mass
function (PMF), to represent each region, and find the distance between regions
using a distance measure between PMFs. Here, the Jensen-Shannon Divergence
(JSD) is used as the measure, as it is symmetric. The JSD is defined as
JSD(P||Q) = (1/2) Σ_i P(i) log(P(i)/M(i)) + (1/2) Σ_i Q(i) log(Q(i)/M(i)),   (3)

where M = (P + Q)/2. The value of JSD(P||Q) tends to zero when P and Q
have identical distributions, and is large when the distributions
are very different. To include a measure of distance between multiple regions,
the PMF of a region is shifted by the mean temperature of
a corresponding contra-lateral region. The JSD between P − μ2 and Q − μ1, where P is the PMF of
the abnormal region on the malignant side, Q is the PMF of the normal region
on the malignant side, μ1 is the mean of the contra-lateral side abnormal region
and μ2 is the mean of the contra-lateral side normal region, is taken as a feature.
If there is no abnormal region on the contra-lateral side, μ1 is taken
to be equal to μ2. Subtracting the contra-lateral region means corresponds
to measuring the relative heat increase with respect to the contra-lateral regions. For
HR− tumors, there may be no abnormal region on the contra-lateral side,
so this JSD will be higher.
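The JSD of Eq. 3 over temperature PMFs can be sketched as below. The function names, the histogram range and the small-value guard `eps` are our own choices; only the 0.5 °C bin width follows the paper (Sect. 5). The mean-shifted variants (P − μ2, Q − μ1) follow by subtracting the contra-lateral means from the pixel temperatures before histogramming.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two PMFs (Eq. 3)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    # 0 * log(0/m) is taken as 0; eps guards the log against zero bins.
    term_p = np.where(p > 0, p * np.log((p + eps) / (m + eps)), 0.0)
    term_q = np.where(q > 0, q * np.log((q + eps) / (m + eps)), 0.0)
    return 0.5 * term_p.sum() + 0.5 * term_q.sum()

def temperature_pmf(temps, bin_width=0.5, t_range=(25.0, 40.0)):
    """Normalized temperature histogram (PMF) for a region, 0.5 deg-C bins.
    The 25-40 deg-C range is an assumed plausible skin-temperature span."""
    bins = np.arange(t_range[0], t_range[1] + bin_width, bin_width)
    hist, _ = np.histogram(temps, bins=bins)
    return hist / max(hist.sum(), 1)
```

The JSD is zero for identical PMFs and reaches log 2 for PMFs with disjoint support, so hotter HR− abnormal regions yield larger values.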
Relative Hotness to the Mirror Site. HR+ tumors have a lower temperature
difference between the tumor site and the mirror tumor site on the contra-lateral
side. To capture this, we use the mean squared distance between the temperature
of the malignant side abnormal region pixels and the mean temperature of the
contra-lateral side abnormal region, as defined in Eq. 4.
RH = (1/|A|) Σ_{(x,y)∈A} (T(x,y) − μ)²     (4)
640 S.T. Kakileti et al.
Fig. 1. Subjects with malignant tumors having (a) ER+/PR+ status, (b) ER−/PR−
status, (c) ER+/PR+ status with an asymmetrical thermal response, (d) ER−/PR−
status with some symmetrical thermal response.
where T (x, y) represents temperature of the malignant side abnormal region pix-
els at location (x, y) in the image, μ represents mean temperature of the contra-
lateral side abnormal region and |A| represents the cardinality of abnormal region
A on the malignant side. This value is lower for HR+ tumors compared to HR−
tumors, as hormone sensitive tissues will be present on both sides. As shown in
Fig. 1a and b, we see thermal responses on both sides for HR+ tumors and no
thermal response on the normal breast for HR− tumors. However, there might
be outliers like Fig. 1c and d.
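Eq. 4 reduces to the mean squared deviation of the abnormal-region pixel temperatures from the contra-lateral mean. A minimal sketch (the function name is ours):

```python
import numpy as np

def relative_hotness(abnormal_temps, mu_contralateral):
    """Eq. 4: mean squared distance between the malignant-side abnormal-region
    pixel temperatures T(x, y), (x, y) in A, and the mean temperature mu of the
    contra-lateral side abnormal region."""
    t = np.asarray(abnormal_temps, dtype=float)
    return float(np.mean((t - mu_contralateral) ** 2))
```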
A Run Length Matrix (RLM) is computed from the thermal map, after quantizing
the temperature into l bins. Gray-level non-uniformity and energy features are
computed from the RLM, as mentioned in [7]. The non-uniformity feature is expected
to be higher for HR− tumors, as these tumors have more focal temperature patterns.
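The paper takes its feature definitions from [7]; assuming the standard run-length gray-level non-uniformity (GLN) definition, the computation can be sketched as:

```python
import numpy as np

def run_length_matrix(quantized, n_levels, max_run):
    """RLM along image rows, for a temperature map already quantized into
    integer bins 0..n_levels-1. Runs longer than max_run fall in the last bin."""
    rlm = np.zeros((n_levels, max_run), dtype=int)
    for row in quantized:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                rlm[run_val, min(run_len, max_run) - 1] += 1
                run_val, run_len = v, 1
        rlm[run_val, min(run_len, max_run) - 1] += 1
    return rlm

def gray_level_non_uniformity(rlm):
    """GLN: large when runs concentrate in few temperature levels."""
    n_runs = rlm.sum()
    return float((rlm.sum(axis=1) ** 2).sum() / n_runs)
```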
4 Dataset Description
We obtained an anonymized dataset of 56 subjects with biopsy-confirmed breast
cancer, aged 27 to 76 years, through our collaboration with Manipal University.
A FLIR E60 camera with a spatial resolution of 320 × 240 pixels was used to
capture the initial 20 subjects, and a high-resolution FLIR T650Sc camera with
an image resolution of 640 × 480 pixels was used for the remaining subjects. A
video was captured for each subject, with an acquisition protocol in which the
subject rotates from the right lateral to the left lateral view. The data for each
subject included the mammography, sono-mammography and biopsy reports, the
ER/PR status values, and the surgery reports and HER2 Neu status values of
the tumors, where available. Of these subjects, 32 have HR+ malignant tumors
and the remaining 24 have HR− tumors.
5 Classification Results
From the obtained videos, we manually selected five frames corresponding to
the frontal, right & left oblique, and right & left lateral views, and manually
cropped the ROIs in these frames. Considering multiple views helps tumor
detection, since a tumor might not be visible in any single fixed view. Among
these views, the one with the maximum abnormal-region area relative to the ROI
area is considered the best view. This best view, along with its contra-lateral side
view, is used to calculate the features from the abnormal regions and the entire
ROI, as described in Sect. 3. The training and testing sets comprise randomly
chosen subsets of 26 and 30 subjects, respectively, with internal divisions of
14 HR+ & 12 HR− and 18 HR+ & 12 HR− tumors, respectively. The abnormal
region is located using ρ = 0.2 and τ = 3 °C with the AND decision rule, chosen
to optimize classification accuracy. All 11 deep tumors of size 0.9 cm and above
were detected in this dataset. The bin width of the PMFs is 0.5 °C, and the step
size of the temperature bins in the RLM computation is 0.25 °C.
A two-class Random Forest (RF) ensemble classifier is trained using the features
obtained. The RF randomly chooses a training subset and a feature subset for
each decision tree, and combines the decisions of multiple such trees for higher
classification accuracy; the mode of all tree decisions is taken as the final
classification. RFs with more trees have a lower standard deviation in accuracy
over multiple iterations: the standard deviations in (HR−, HR+) accuracies of
RFs using all features with 5, 25 and 100 trees over 20 iterations are
(9.1 %, 11.1 %), (6.4 %, 4.8 %) and (2.5 %, 2.0 %), respectively, so 100 trees
are used. Table 1 shows the
maximum accuracies over 20 iterations of RFs with 100 trees using the individual
and combined features proposed in our approach. We tested different textural
features obtained from both the RLM and the Gray Level Co-occurrence Matrix,
and found that gray-level non-uniformity from the RLM gives better accuracy
than the others. Using an optimal combined set of region-based and textural
features, we obtained accuracies of 82 % and 79 % in the classification of HR+
and HR− tumors, respectively.
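The classifier stage can be sketched with scikit-learn's `RandomForestClassifier` as a stand-in; the feature matrices below are synthetic placeholders, not the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in feature matrix: one row per subject, columns for the JSD,
# relative-hotness and RLM features (synthetic values for illustration).
X_train = rng.normal(size=(26, 4))
y_train = rng.integers(0, 2, size=26)   # 1 = HR+, 0 = HR-
X_test = rng.normal(size=(30, 4))

# 100 trees, as in the paper: each tree sees a bootstrap sample and random
# feature subsets, and the forest takes the majority vote across trees.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
```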
References
1. Fitzmaurice, C., et al.: The global burden of cancer 2013. JAMA Oncol. 1(4),
505–527 (2015)
2. Parise, C.A., Caggiano, V.: Breast cancer survival defined by the ER/PR/HER2
subtypes and a surrogate classification according to tumor grade and immunohisto-
chemical biomarkers. J. Cancer Epidemiol. 2014, 11 p. (2014). Article ID 469251
3. Alba, E., et al.: Chemotherapy (CT) and hormonotherapy (HT) as neoadjuvant
treatment in luminal breast cancer patients: results from the GEICAM/2006-03, a
multicenter, randomized, phase II study. Ann. Oncol. 23(12), 3069–3074 (2012)
4. Cheang, M., Chia, S.K., Voduc, D., et al.: Ki67 index, HER2 status, and prognosis
of patients with luminal B breast cancer. J. Nat. Cancer Inst. 101(10), 736–750
(2009)
5. Keyserlingk, J., Ahlgren, P., Yu, E., Belliveau, N., Yassa, M.: Functional infrared
imaging of the breast. Eng. Med. Biol. Mag. 19(3), 30–41 (2000)
6. Kennedy, D.A., Lee, T., Seely, D.: A comparative review of thermography as a
breast cancer screening technique. Integr. Cancer Ther. 8(1), 9–16 (2009)
7. Acharya, U.R., Ng, E., Tan, J.H., Sree, S.V.: Thermography based breast cancer
detection using texture features and support vector machine. J. Med. Syst. 36(3),
1503–1510 (2012)
8. Borchartt, T.B., Conci, A., Lima, R.C., Resmini, R., Sanchez, A.: Breast ther-
mography from an image processing viewpoint: a survey. Signal Process. 93(10),
2785–2803 (2013)
9. Gautherie, M.: Thermobiological assessment of benign and malignant breast dis-
eases. Am. J. Obstet. Gynecol. 147(8), 861–869 (1983)
10. Vakkala, M., Kahlos, K., Lakari, E., Paakko, P., Kinnula, V., Soini, Y.: Inducible
nitric oxide synthase expression, apoptosis, and angiogenesis in in-situ and invasive
breast carcinomas. Clin. Cancer Res. 6(6), 2408–2416 (2000)
11. Chaudhury, B., et al.: New method for predicting estrogen receptor status utilizing
breast MRI texture kinetic analysis. In: Proceedings of the SPIE Medical Imaging
(2014)
12. Zore, Z., Boras, I., Stanec, M., Oresic, T., Zore, I.F.: Influence of hormonal status
on thermography findings in breast cancer. Acta Clin. Croat. 52, 35–42 (2013)
13. Madhu, H., Kakileti, S.T., Venkataramani, K., Jabbireddy, S.: Extraction of med-
ically interpretable features for classification of malignancy in breast thermography.
In: 38th Annual IEEE International Conference on Engineering in Medicine and
Biology Society (EMBC) (2016)
14. Urruticoechea, A.: Proliferation marker ki-67 in early breast cancer. J. Clin. Oncol.
23(28), 7212–7220 (2005)
15. Ganong, W.F.: Review of Medical Physiology. McGraw-Hill Medical, New York
(2005)
16. Venkataramani, K., Mestha, L.K., Ramachandra, L., Prasad, S., Kumar, V., Raja,
P.J.: Semi-automated breast cancer tumor detection with thermographic video
imaging. In: 37th Annual International Conference on Engineering in Medicine
and Biology Society, pp. 2022–2025 (2015)
Prostate Cancer: Improved Tissue
Characterization by Temporal Modeling
of Radio-Frequency Ultrasound Echo Data
1 Introduction
Prostate cancer is the most widely diagnosed form of cancer in men [1]. The Amer-
ican Cancer Society predicts that one in seven men will be diagnosed with prostate
cancer during their lifetime. Initial assessment includes measuring Prostate Spe-
cific Antigen level in blood serum and digital rectal examination. If either test
is abnormal, core needle biopsy is performed under Trans-Rectal Ultrasound
(TRUS) guidance. Disease prognosis and treatment decisions are then based on
H. Shatkay and P. Mousavi—These authors have contributed equally to the
manuscript.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 644–652, 2016.
DOI: 10.1007/978-3-319-46720-7 75
sequences and others. Here we use them to model rf time series where time does
have an impact on the ultrasound data being recorded. We next describe our
rf time-series data and its representation, followed by a tissue-characterization
framework. We then present experiments and results demonstrating the effec-
tiveness of the method.
Fig. 1. Ultrasound rf-frames collected from a prostate-cancer patient over time. Solid
red dots indicate the same location across multiple frames. The time series for this
location is shown at the bottom right. A grid dividing each frame into rois is shown
on the left-most frame. Pathology labels for malignant/benign rois are also shown.
Table 1. The distribution of malignant and benign rois over the 9 patients.
Patient P1 P2 P3 P4 P5 P6 P7 P8 P9 Total
Malignant rois 42 29 18 64 35 28 23 30 17 286
Benign rois 42 29 18 61 35 29 23 30 17 284
hmms are often used to model time series where the generating process is
unknown or prone to variation and noise. The process is viewed as a sequence
of stochastic transitions between unobservable (hidden) states; some aspects of
each state are observed and recorded. As such, the states may be estimated
from the observation-sequence [12]. A simplifying assumption underlying the
648 L. Nahlawi et al.
use of these models is the Markov property, namely, that the state at a given
time-point depends only on the state at the preceding point, conditionally inde-
pendent of all other time points. In this work we view a tissue response value
recorded in an rf frame and discretized as discussed above, as an observation;
employing the Markov property, we assume each such value depends only on
the response recorded at the frame directly preceding it, independent of any
earlier responses. Formally, an hmm λ consists of five components: A set of N
states, S = {s1 , . . . , sN }; a set of M observation symbols, V = {v1 , . . . , vM }; an
N × N stochastic matrix A governing the state-transition probability, where
Aij = P r(statet+1 = si |statet = sj ), 1 ≤ i, j ≤ N , and statet is the state at time
t; an N × M stochastic-emission matrix B, where Bik = P r(obt = vk |statet = si ),
1 ≤ i ≤ N, 1 ≤ k ≤ M , denoting the probability of observing vk at state si ; an
N -dimensional stochastic vector π, where for each state si , πi = P r(state1 = si ),
denotes the probability to start the process at state si . Learning a model λ
from a sequence of observations O = o1 , o2 , . . . , o127 , amounts to estimating
the model parameters (namely, A, B & π), to maximize log[P r(O|λ)], i.e. the
observations’ probability given the model λ. In practice, π is fixed such that
π1 = P r(state1 = s1 ) = 1 & πj = 0 for j ≠ 1, i.e. s1 is always the first state. In
the experiments reported here, we also fix the matrix A to an initial estimate
based on clustering (as described below), while the matrix B is learned using
the Baum-Welch algorithm [12].
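Scoring a sequence under a learned hmm uses the forward algorithm [12]; a scaled-forward sketch is below. Note the row-stochastic convention A[i, j] = Pr(state j at t+1 | state i at t), which transposes the indexing stated above; the function name is ours.

```python
import numpy as np

def forward_log_likelihood(obs, A, B, pi):
    """log Pr(O | lambda) via the scaled forward algorithm [12].
    obs: sequence of observation indices in 0..M-1;
    A: N x N transition matrix, rows summing to 1;
    B: N x M emission matrix; pi: initial state distribution."""
    alpha = pi * B[:, obs[0]]        # joint prob. of first observation and state
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()      # rescale to avoid numerical underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        log_lik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return log_lik
```

Summing log scale factors rather than multiplying raw probabilities keeps the 127-step sequences of this paper from underflowing.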
The hmms we develop, as illustrated in Fig. 2, are ergodic models consisting
of 5 states and 10 observations. A small number of states allows for a compu-
tationally efficient model while typically leading to good generalization beyond
the training set. We determined the number of states by experimenting with 2–6
state models (and a few larger ones with >10 states). The classification perfor-
mance of 5-state models was higher than that of others. Moreover, each of the 5
states is associated with a distinct emission probability distribution, which is not
the case when using additional/fewer states. The observation set, as discussed in
Sect. 2, consists of 10 observation symbols v1 , ..., v10 , each of which corresponds
to a discretized interval of first-order difference values of the rf time-series.
Fig. 2. Example of hmms learned from (A) malignant rois, and (B) benign rois. Nodes
represent states. Edges are labeled by transition probabilities; Emission probabilities
are shown to the right of each model. Edges with probability <0.2 are not shown.
For tissue classification, we learn two hmms – one for representing series
obtained from malignant tissue, denoted λM , and the other for benign tissue,
denoted λB . We use supervised learning to learn the models’ parameters, where
the training and test data consist of the time-series corresponding to the rois
that were labeled as malignant and benign (described in Sect. 2). To train each
model, we use a leave-one-patient-out cross-validation strategy, partitioning each
set of roi time-series ( malignant for λM , benign for λB ) into training and test
sets. In each cross-validation run the rois of one of the 9 patients are left-out
as a test-set, while the rois of the other 8 patients are used to train the hmm.
Malignant rois are used to train λM , while λB is trained on benign rois. Given
a test-sequence, roitesti , each of the two models assigns it a log probability,
log(P r(ROItesti |λc )), (c ∈ {M, B}) – a measure indicating how likely the model
is to have generated the time-series. The class label assigned to ROItesti , Ctesti ,
is the one whose model maximizes the log probability, that is:
Ctesti = argmax_{c∈{M,B}} log(P r(ROItesti |λc )), 1 ≤ i ≤ L, where L is the number of test rois.
Practically, if the log-odds log(P r(ROItesti |λM )/P r(ROItesti |λB )) is positive, ROItesti is classified as
malignant, otherwise it is classified as benign. In Sect. 4, we use the log-odds as
a basis for heat-maps to visualize the results (Fig. 3).
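The classification step then reduces to a sign test on the log-odds; a sketch (the inputs would be the log-likelihoods each model assigns to the roi's time-series, e.g. via the forward algorithm of [12]):

```python
def classify_roi(log_lik_malignant, log_lik_benign):
    """Label a test roi by the model giving it the higher log-probability;
    equivalently, by the sign of the log-odds. Also returns the log-odds,
    which can drive the heat-map coloring of Sect. 4."""
    log_odds = log_lik_malignant - log_lik_benign
    label = "malignant" if log_odds > 0 else "benign"
    return label, log_odds
```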
To learn the two models, each of the models is initialized, and its observa-
tion matrix B is then iteratively updated until convergence, in accordance with
the Baum-Welch method. Model initialization is based on clustering the values
within all the discretized training time-series into 5 clusters, cl1 ,. . ., cl5 , where 5
is the number of states. Based on the assignment of each value to its respective
cluster, we estimate the transition probability Ai,j where 1 ≤ i, j ≤ 5 as the
data-frequency of observing a value from cluster cli followed by a value from
cluster clj within all the time series in the respective training set. Since the
model is not left-to-right, the transitions can be in either direction. A similar
estimation process is applied for initializing the observation matrix B.
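The cluster-based initial estimate of A described above can be sketched as follows; `sequences` holds the discretized training time-series and `labels_of` is the hypothetical value-to-cluster assignment from the 5-way clustering.

```python
import numpy as np

def init_transition_matrix(sequences, labels_of, n_states=5):
    """Initial estimate of A: A[i, j] is the frequency with which a value
    from cluster i is followed by a value from cluster j, counted over all
    training time-series, then normalized per row."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        lab = [labels_of[v] for v in seq]
        for a, b in zip(lab[:-1], lab[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions fall back to a uniform distribution.
    return np.where(row_sums > 0,
                    counts / np.maximum(row_sums, 1e-12),
                    1.0 / n_states)
```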
Fig. 3. Top: rf frames overlaid with malignant/benign pathology labels. Bottom: Heat-
map images based on our learned models, where each roi color is assigned based on
the log-odds ratio calculated for its respective time-series. The left three columns are
rf frames from patients P1 (col 1) and P5 (col 2, 3) while the frames in the rightmost
column are from Patient P7, for whom we noted a lower performance.
Table 2. The classification performance using hmms. The numbers in parentheses show
the respective result reported by [7] for the same patient.
Patient P1 P2 P3 P4 P5 P6 P7 P8 P9 Average
Accuracy 82.1 96.5 100 93.6 90 85.9 69.5 78.3 97.1 88.1 ± 9
(82) (71) (88) (95) (86) (86) (N/A) (80) (85)
Sensitivity 100 96.5 100 87.5 97.1 82.1 65.2 73.3 100 89.1 ± 12
(100) (68) (76) (90) (100) (81) (N/A) (98) (84)
Specificity 64.2 96.5 100 100 82.8 89.6 73.9 83.3 94.1 87.1 ± 11
(62) (74) (100) (100) (71) (90) (N/A) (61) (84)
malignant tissue and those obtained from benign tissue. Moreover, for most cases
our performance either matches or significantly improves upon that of an earlier
method [7] that used SVMs and did not explicitly model the temporal aspect
of the time-series. We note that for patient P8 our sensitivity is significantly
lower, although our specificity is much higher, which amounts to a compara-
ble overall accuracy. An exception to the high level of performance is clearly
observed for patient P7, for whom the classification performance is significantly
lower than that obtained for all other patients. Further investigation showed
that this patient was not included in the earlier reported results [7], because
the ground-truth registration of the histology labels of malignant tissue was not
accurate. The fact that mis-labeled rois are not well distinguished by models
learned from the other patients' data is further evidence that the models indeed
capture the salient differences between rf echoes emitted by benign vs. malignant
tissue. The top row of Fig. 3 shows several examples of rf frames obtained from
different patients, overlaid with malignant/benign labels. The bottom row shows
corresponding heat-map images based on our results. Each roi is assigned a color
reflecting the log-odds ratio calculated for its respective time-series Rx ,
log(P r(Rx |λM )/P r(Rx |λB )). The first three columns show rf frames from P1
(1st column) and P5 (2nd and 3rd columns), all of which show that the heat-maps
match the original annotations almost perfectly. The fourth column shows an rf
frame from P7. Despite inaccuracies in the gold-standard for this image, our
model still correctly identifies the benign regions, while showing
most of the malignant regions about equally likely to be malignant or benign.
5 Conclusion
Acknowledgment. This work was partially supported by grants from NSERC Discov-
ery to HS and PM, NSERC and CIHR CHRP to PM, and NIH #R56 LM011354A to HS.
References
1. Canadian Cancer Society and National Cancer Institute of Canada. Advisory Com-
mittee on Records, Registries: Canadian cancer statistics. Canadian Cancer Society
(2015)
2. Coast, D., Stern, R., Cano, G., et al.: An approach to cardiac arrhythmia analysis
using hidden Markov models. IEEE Trans. Biomed. Eng. 37(9), 26–36 (1990)
3. Etzioni, R., Tsodikov, A., et al.: Quantifying the role of PSA screening in the US
prostate cancer mortality decline. Cancer Causes Control 19(2), 75–81 (2008)
4. Feleppa, E., Porter, C., Ketterling, J., et al.: Recent developments in tissue-type
imaging (TTI) for planning and monitoring treatment of prostate cancer. Ultrason.
Imaging 26(3), 63–72 (2004)
5. Han, S., Lee, H., Choi, J.: Computer-aided prostate cancer detection using texture
features and clinical features in ultrasound image. J. Dig. Imaging 21(1), 21–33
(2008)
6. Hauskrecht, M., Fraser, H.: Planning treatment of ischemic heart disease with
partially observable Markov decision processes. AI Med. 18(3), 21–44 (2000)
7. Imani, F., Abolmaesumi, P., Gibson, E., et al.: Computer-aided prostate cancer
detection using ultrasound RF time series: in vivo feasibility study. IEEE Trans.
Med. Imaging 34(11), 48–57 (2015)
8. Krouskop, T., Wheeler, T., Kallel, F., et al.: Elastic moduli of breast and prostate
tissues under compression. Ultrason. Imaging 20(4), 60–74 (1998)
9. Li, Y., Lipsky Gorman, S., Elhadad, N.: Section classification in clinical notes using
supervised hidden Markov model. In: Proceedings of the 1st ACM International
Health Informatics Symposium, pp. 44–50. ACM (2010)
10. Moradi, M., Mousavi, P., Boag, A.H., et al.: Augmenting detection of prostate
cancer in transrectal ultrasound images using SVM and RF time series. IEEE
Trans. Biomed. Eng. 56(9), 214–224 (2009)
11. Nahlawi, L., Imani, F., Gaed, M., et al.: Using hidden Markov models to capture
temporal aspects of ultrasound data in prostate cancer. In: IEEE International
Conference on BIBM 2015, pp. 46–49. IEEE (2015)
12. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in
speech recognition. Proc. IEEE 77(2), 257–286 (1989)
13. Singer, E.A., Kaushal, A., Turkbey, B., et al.: Active surveillance for prostate
cancer: past, present and future. Curr. Opin Oncol. 24(3), 43–50 (2012)
Classifying Cancer Grades Using Temporal
Ultrasound for Transrectal Prostate Biopsy
1 Introduction
Prostate Cancer (PCa) is a significant public health issue. According to the
National Cancer Institute (NCI)1 , approximately 14 % of men will be diagnosed
with PCa at some point during their lifetime. Definitive diagnosis involves core
needle biopsy guided by Transrectal Ultrasound (TRUS), followed by histopatho-
logical analysis of the obtained samples. TRUS is blind to intraprostatic pathol-
ogy, and can miss clinically significant disease [5].
1 Surveillance, Epidemiology, and End Results (SEER) Cancer Statistics Review.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 653–661, 2016.
DOI: 10.1007/978-3-319-46720-7 76
654 S. Azizi et al.
For each target, the Gleason Score (GS) and the % distribution of PCa in
the axial and sagittal samples were reported. The GS is used to describe PCa
grade and ranges from 1 (resembling normal tissue) to 5 (aggressive cancerous
tissue). It is reported as a sum of the grades of the two most common patterns
in a tissue specimen. We only include cores in our study where the axial and
sagittal pathology match. From 197 cores in our data, 57 were cancerous (12 GS
of 3 + 3, 19 GS of 3 + 4, four GS of 4 + 3, 20 GS of 4 + 4, and two GS of 4 + 5)
while 140 had non-cancerous histology including benign or fibromuscular tissue,
chronic inflammation, atrophy and Prostatic Intraepithelial Neoplasia (PIN).
We divide the data from 197 cores into training and testing sets. Training data
consists of 32 biopsy cores from 27 patients with the following histopathology
labels: 19 benign, 0 GS 3+3, 5 GS 3+4, 2 GS 4+3, 4 GS 4+4 and, 2 GS 4+5.
The test data is made up of 165 cores from 114 patients, with the following
distribution: 121 benign, 12 GS 3+3, 14 GS 3+4, 2 GS 4+3, and 16 GS 4+4.
2.2 Preprocessing
We compute the spectrum of temporal US data obtained from each biopsy core.
For this purpose, we analyze an area of 2 × 10 mm2 around the target location in
the lateral and axial directions, respectively. This region lies along the projected
needle path in the US image and is centered on the target. We divide the selected
area into 20 equally-sized Regions of Interest (ROI) of size 1 mm2 . For each ROI,
we take the Fourier transforms of all time series corresponding to the RF samples
in the ROI, normalized to the frame rate, and average the absolute values of these
Fourier transforms. Finally, each ROI is represented by 50 positive frequency
components (see Fig. 1).
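The per-ROI spectral representation might be computed as sketched below; the array layout, the exact frame-rate normalization, and dropping the DC component are our assumptions.

```python
import numpy as np

def roi_spectrum(roi_time_series, frame_rate, n_components=50):
    """Average magnitude spectrum of the RF time series in one ROI.
    roi_time_series: array of shape (n_samples_in_roi, n_frames)."""
    ts = np.asarray(roi_time_series, dtype=float)
    # FFT along the temporal axis, normalized to the frame rate (assumed
    # to mean dividing the magnitudes by the frame rate).
    spectra = np.abs(np.fft.rfft(ts, axis=1)) / frame_rate
    mean_spectrum = spectra.mean(axis=0)
    # Keep the first 50 positive-frequency components as the ROI feature.
    return mean_spectrum[1:n_components + 1]
```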
Feature Learning: We use a Deep Belief Network (DBN) structure [1] to map
the set of 50 spectral components of each ROI to six high-level latent features.
The network structure includes 100, 50 and 6 hidden units in three layers, where
the last hidden layer represents the latent features. In the pre-training step,
the learning rate is fixed at 0.001, the mini-batch size is 5, and the number of
epochs is 100. Momentum and weight cost are set to defaults of 0.9 and 2 × 10−4 ,
respectively. For discriminative fine-tuning, a node is added to represent the
labels of the observations, and back-propagation is used with a learning rate of
0.01, 70 epochs and a mini-batch size of 10. We then perform dimensionality
reduction in the space of the latent features: we use Zero-phase Component
Analysis [2] to whiten the features and determine the top two eigenvectors,
f1 and f2 . We call this space the eigen feature space.
μk is the mean and Σk is the covariance matrix of the k-th mixture component.
Starting with an initial mixture model, the parameters of Θ are estimated with
Expectation-Maximization (EM) [15]. The EM algorithm is a local optimiza-
tion method, and hence particularly sensitive to the initialization of the model.
Instead of random initialization, we present a simple but efficient method for
finding initial parameters based on our prior knowledge from pathology.
GMM Initialization: Let XH be the set of all ROIs within cores of training
data with the histopathology labels H ∈ {benign, GS 3+4, GS 4+3, GS 4+4}. We
first analyze the distribution of the ROIs of benign cores, Xbenign , in the eigen
feature space; we observe two distinct clusters (Fig. 2) that span histopathology
labels of normal and fibromuscular tissue, chronic inflammation, atrophy, and
PIN. We use k-means clustering to separate the two clusters; we consider the
cluster with the maximum number of “normal tissue” ROIs as the dominant
benign cluster, and the second cluster as a representative for other non-cancerous
tissue. Next, we use ROIs in the training dataset that correspond to the cores
with GS 4+4, XGS4+4 , to identify the dominant cluster that represents Gleason
4 pattern. Finally, we use all other ROIs from cancerous cores that correspond
to GS 3+4 and GS 4+3 to identify the centre for Gleason 3 pattern in the eigen
feature space. We denote the centroid of all clusters by C = {Cbenign , CG4 , CG3 ,
Cnoncancerous}. To initialize the GMM, we set K = 4 to model the four tissue
patterns, with the mean μk of each Gaussian component equal to the centroid of
the corresponding cluster. We use equal covariance matrices for all components,
setting each Σk to the covariance of XH. Each weight ωk, k = 1, ..., K, is drawn
from a uniform distribution on [0, 1], and the weights are then normalized to
sum to one.
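A scikit-learn sketch of this initialization on synthetic 2-D data (the centroids, shared covariance, and cluster geometry here are illustrative stand-ins for C and the covariance of XH):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Illustrative cluster centroids C in the 2-D eigen feature space
# (benign, G3, G4, non-cancerous), plus synthetic ROI points around them.
centroids = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
X = np.vstack([c + 0.4 * rng.standard_normal((100, 2)) for c in centroids])

K = 4
weights0 = rng.uniform(0.0, 1.0, K)
weights0 /= weights0.sum()                 # normalize the random weights to sum to one
shared_cov = np.cov(X.T)                   # one covariance shared by all components
gmm = GaussianMixture(
    n_components=K,
    covariance_type="full",
    weights_init=weights0,
    means_init=centroids,                  # cluster centroids as initial means
    precisions_init=np.stack([np.linalg.inv(shared_cov)] * K),
)
gmm.fit(X)
print(gmm.converged_, gmm.means_.shape)
```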
Prediction of Gleason Score: For each test core, we map the data from 20
ROIs in that core to the eigen feature space. Subsequently, we assign a label
from {benign, G3, G4, non-cancerous} to each ROI based on its proximity to
the corresponding cluster centre in the eigen feature space. To determine a GS
for a test core, Y, we follow histopathology guidelines based on the numbers of
ROIs labeled as benign, G3 (NG3), and G4 (NG4) (e.g., a core with a large number
of G4 and a small number of G3 ROIs has GS 4+3):
Y = \begin{cases}
\text{GS 4+3 or higher}, & N_{G4} \neq 0 \text{ and } N_{G4} \geq N_{G3} \\
\text{GS 3+4 or lower}, & N_{G3} \neq 0 \text{ and } N_{G4} < N_{G3} \\
\text{benign}, & \text{otherwise}
\end{cases}
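In pure Python, the decision rule reads as follows (a minimal sketch; we read the extracted conditions as NG4 ≠ 0 and NG3 ≠ 0, since the equality versions would be degenerate):

```python
def predict_core_grade(roi_labels):
    """Map the 20 per-ROI labels of a test core to a Gleason-score category.

    roi_labels: iterable over {"benign", "G3", "G4", "non-cancerous"}.
    """
    labels = list(roi_labels)
    n_g3, n_g4 = labels.count("G3"), labels.count("G4")
    if n_g4 != 0 and n_g4 >= n_g3:
        return "GS 4+3 or higher"
    if n_g3 != 0 and n_g4 < n_g3:
        return "GS 3+4 or lower"
    return "benign"

print(predict_core_grade(["G4"] * 12 + ["G3"] * 5 + ["benign"] * 3))   # GS 4+3 or higher
print(predict_core_grade(["benign"] * 18 + ["non-cancerous"] * 2))     # benign
```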
Fig. 3. Target location and distribution of biopsies in the test data. Light and dark gray
indicate central and peripheral zones, respectively. The pie charts indicate the number
of cores and their histopathology. The size of the chart is proportional to the number
of biopsies (in the range from 1 to 25) and the colors dark red, light red and blue refer
to cores with GS ≥ 4 + 3, GS ≤ 3 + 4 and benign pathology, respectively. The top and
bottom rows depict histopathology results and our grade predictions, respectively.
combined approach leads to an AUC of 0.72 for predicting cancer grade versus
either 0.65 using mp-MRI or 0.69 using temporal US data. The combined AUC
is 0.83 for tumors with L ≥ 2.0 cm.
Table 1. Model performance for classification of cores in the test dataset and permu-
tation set. L is the greatest length of the tumor visible in mp-MRI.
4 Conclusion
In this paper, in an in vivo study including 197 TRUS-guided biopsy cores,
temporal US data was used to differentiate among clinically less significant
prostate cancer (GS ≤ 3+4), aggressive prostate cancer (GS ≥ 4+3), and non-cancerous
prostate tissue. Determining the aggressiveness of prostate cancer can help
reduce the current high rate of over-treatment in patients with indolent cancer.
We utilized a two-step machine learning approach to address the challenges
related to ground-truth labeling in PCa grading. First, differentiating features for
detection of cancerous and non-cancerous prostate tissue were learned, and then
the statistical distribution of PCa grades was modeled using a GMM. We showed
that we could successfully differentiate among aggressive PCa (GS ≥ 4+3), clinically
less significant PCa (GS ≤ 3+4), and non-cancerous prostate tissue. Furthermore,
the combination of temporal US and mp-MRI has the potential to outperform either
modality alone in the detection of PCa.
Future work includes: (1) examining physical phenomena governing US time
series tissue typing. Our results to-date suggest that tissue microvibration, pos-
sibly due to cardiac pulsation, and changes in tissue temperature due to acoustic
energy [4] play key roles; and (2) an inter-institution patient study to determine
accuracy across a wide range of patient subpopulations. By displaying the predicted
grade not only for the target but also for regions surrounding the target,
we will determine whether US time series can increase cancer yield.
References
1. Azizi, S., et al.: US-based detection of PCa using automatic feature selection with
deep belief networks. In: MICCAI, pp. 70–77. Springer (2015)
2. Bell, A.J., Sejnowski, T.J.: The independent components of natural scenes are edge
filters. Vis. Res. 37(23), 3327–3338 (1997)
3. Correas, J.M., et al.: PCa: diagnostic performance of real-time shear-wave elastog-
raphy. Radiology 275(1), 280–289 (2014)
4. Daoud, M., et al.: Tissue classification using US-induced variations in acoustic
backscattering features. IEEE TBME 60(2), 310–320 (2013)
5. Epstein, J.I., et al.: Upgrading and downgrading of PCa from biopsy to radical
prostatectomy: incidence and predictive factors using the modified Gleason grading
system and factoring in tertiary grades. Eur. Urol. 61(5), 1019–1024 (2012)
6. Feleppa, E., et al.: Recent advances in ultrasonic tissue-type imaging of the
prostate. In: Acoustical Imaging, pp. 331–339. Springer (2007)
7. Imani, F., et al.: US-based characterization of PCa using joint independent com-
ponent analysis. IEEE TBME 62(7), 1796–1804 (2015)
8. Khojaste, A., et al.: Characterization of aggressive PCa using US RF time series.
In: SPIE Med. Imaging, p. 94141A (2015)
9. Kuru, T.H., et al.: Critical evaluation of magnetic resonance imaging targeted,
transrectal US guided transperineal fusion biopsy for detection of PCa. J. Urol.
190(4), 1380–1386 (2013)
10. Llobet, R., et al.: Computer-aided detection of PCa. Int. J. Med. Inform. 76(7),
547–556 (2007)
11. Nelson, E.D., et al.: Targeted biopsy of the prostate: the impact of color Doppler
imaging and elastography on PCa detection and Gleason score. Urology 70(6),
1136–1140 (2007)
12. de Rooij, M., et al.: Accuracy of multiparametric MRI for PCa detection: a meta-
analysis. Am. J. Roentgenol. 202(2), 343–351 (2014)
Classifying Cancer Grades Using Temporal Ultrasound 661
13. Siddiqui, M.M., et al.: Comparison of MR/US fusion-guided biopsy with US-guided
biopsy for the diagnosis of PCa. JAMA 313(4), 390–397 (2015)
14. Vargas, H.A., et al.: Diffusion-weighted endorectal MRI at 3T for PCa: tumor
detection and assessment of aggressiveness. Radiology 259(3), 775–784 (2011)
15. Xu, L., Jordan, M.I.: On convergence properties of the EM algorithm for Gaussian
mixtures. Neural Comput. 8(1), 129–151 (1996)
Characterization of Lung Nodule Malignancy
Using Hybrid Shape and Appearance Features
Mario Buty¹, Ziyue Xu¹(B), Mingchen Gao¹, Ulas Bagci², Aaron Wu¹,
and Daniel J. Mollura¹

¹ National Institutes of Health, Bethesda, MD, USA
ziyue.xu@nih.gov
² University of Central Florida, Orlando, FL, USA
1 Introduction
Lung cancer led to approximately 159,260 deaths in the US in 2014 and is the
most common cancer worldwide. The increasing relevance of pulmonary CT data
has triggered dramatic growth in the computer-aided diagnosis (CAD) field.
Specifically, the CAD task for interpreting chest CT scans can be broken down
into separate steps: delineating the lungs, detecting and segmenting nodules,
and using the image observations to infer clinical judgments. Multiple techniques
have been proposed and subsequently studied for each step. This work focuses
on characterizing the segmented nodules.
Clinical protocols for identifying and assessing nodules, specifically the Fleis-
chner Society Guidelines, involve monitoring the size of the nodule with repeated
scans over a period of three months to two years. Ratings on several image-based
features may also be considered, including growth rate, spiculation, sphericity,
texture, etc. Features like size can be quantitatively estimated via image segmen-
tation, while other markers are mostly judged qualitatively and subjectively. For
nodule classification, existing CAD approaches are often based on sub-optimal
stratification of nodules solely based on their morphology. Malignancy is then
roughly correlated with broad morphological categories. For instance, one study
found malignancy in 82 % of lobulated nodules, 97 % of densely spiculated nod-
ules, 93 % of ragged nodules, 100 % of halo nodules, and 34 % of round nodules [1].
Subsequent approaches incorporated automatic or manual definitions of similar
shape features, along with various other contextual or appearance features into
linear discriminant classifiers. However, these features are mostly subjective and
arbitrarily-defined [2]. These limitations reflect the challenges in achieving a com-
plete and quantitative description of malignant nodule appearances. Similarly,
it is difficult to model the 3D shape of a nodule, which is not directly compre-
hensible with the routine slice-wise inspection of human observers. Therefore,
the extraction of proper appearance features, as well as shape description, are
of great value for the development of CAD systems.
For 3D shape modeling, spherical harmonic (SH) parameterization offers an
effective representation. As shape descriptors, SHs have been used successfully
in many applications such as protein structure [3], cardiac surface matching [4],
and brain mapping [5]. While SH has been shown to discriminate successfully
between malignant and benign nodules (with 93 % accuracy for binary
separation) [6], using the SH coefficients to uniquely describe a nodule's
“fingerprint” remains largely unexplored [2]. Also, as a scale- and
rotation-invariant descriptor of a mesh surface, SH does not capture a nodule's
size or other critical appearance features, e.g., solid, sub-solid, part-solid,
or peri-fissural. Hence, SH alone may not be sufficient for nodule characterization.
Recently, deep convolutional neural networks (DCNNs) have been shown to
be effective at extracting image features for successful classification across a
variety of situations [7,8]. More importantly, studies on “transfer learning” and
using DCNN as a generic image representation [9–11] have shown that successful
appearance feature extraction can be achieved without the need of significant
modifications to DCNN structures, or even training on the specific dataset [10].
While simpler neural networks have been used for nodule appearance [2], and
DCNN has recently been used to classify peri-fissural nodules [12], to our knowl-
edge, DCNNs such as the Imagenet DCNN introduced by Krizhevsky et al. [7]
have not been applied to the nodule malignancy problem, nor have they been
combined with 3D shape descriptors such as the SH method.
In this paper, we present a classification approach for malignancy evaluation
of lung nodules by combining both shape and appearance features using SHs
and DCNNs, respectively, on a large annotated dataset from the Lung Image
Database Consortium (LIDC) [13]. First, a surface parameterization scheme
2 Methods
Our method works from two inputs: radiologists’ binary nodule segmentations
and the local CT image patches. First, we produce a mesh representation of
each nodule from the binary segmentation using the method from [5]. These are
then mapped to the canonical parameter domain of SH functions via conformal
mapping, giving us a vector of function coefficients as a representation of the
nodule shape. Second, using local CT images, three orthogonal local patches
containing each nodule are combined as one image input for the DCNN, and
appearance features are extracted from the first fully-connected layer of the
network. This approach for appearance feature extraction is based on recent
work in “transfer learning” [9,10]. Finally, we combine shape and appearance
features together and use a RF classifier to assess nodule malignancy rating.
SHs form a set of basis functions for representing functions defined on the unit
sphere S². The basic idea of SH parameterization is to transform a 3D shape defined
in Euclidean space into the space of SHs. To do this, the shape must first be
mapped onto a unit sphere; conformal mapping is used for this task. It functions
by performing a set of one-to-one surface transformations preserving local angles,
and is especially useful for surfaces with significant variations, such as brain
cortical surfaces [5]. Specifically, let M and N be two Riemannian manifolds,
then a mapping φ : M → N will be considered conformal if local angles between
curves remain invariant. Following the Riemann mapping theorem, a simple
surface can always be mapped to the unit sphere S 2 , producing a spherical
parameterization of the surface.
For genus zero closed surfaces, conformal mapping is equivalent to a harmonic
mapping satisfying the Laplace equation, Δf = 0. For our application, nodules
have approximately spherical shape, with bounded local variations. Therefore, it
is an ideal choice to use spherical conformal mapping to normalize and parame-
terize the nodule surface to a unit sphere. We first convert the binary segmenta-
tions to meshes, and then perform conformal spherical mapping with harmonic
energy minimization. Further technical details can be found in [5].
With spherical conformal mapping, we are able to model the variations of
different nodule shapes onto a unit sphere. However, it is still challenging to
judge and quantify the differences within S 2 space. Therefore, SHs are used to
map S 2 to real space R.
Similar to the Fourier series on the circle, SHs decompose a given function f
on S² into a direct sum of irreducible sub-representations:

f = \sum_{l \ge 0} \sum_{|m| \le l} \hat{f}(l, m) \, Y_l^m,

where Y_l^m is the m-th harmonic basis function of degree l, and \hat{f}(l, m) is
the corresponding SH coefficient. Compared to directly using the surface in S²,
this gives us two
major benefits: first, the extracted representation features are rotation, scale,
and transformation invariant [5]; second, it is much easier to compute the cor-
relation between two vectors than two surfaces. Therefore, SHs are a powerful
representation for further shape analysis.
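As a self-contained numerical illustration of such a decomposition (NumPy only, with the real SHs up to degree l = 1 written out explicitly; this is a sketch of the expansion itself, not the conformal-mapping pipeline of [5]):

```python
import numpy as np

# Real spherical harmonics up to degree l = 1 (theta: polar, phi: azimuthal).
def Y00(theta, phi):
    return np.full_like(theta, 0.5 / np.sqrt(np.pi))

def Y10(theta, phi):
    return np.sqrt(3.0 / (4.0 * np.pi)) * np.cos(theta)

def sh_coefficient(f, Y, n=400):
    """f_hat = integral over S^2 of f * Y, by midpoint quadrature on a grid."""
    theta = (np.arange(n) + 0.5) * np.pi / n            # (0, pi)
    phi = (np.arange(2 * n) + 0.5) * np.pi / n          # (0, 2*pi)
    T, P = np.meshgrid(theta, phi, indexing="ij")
    dA = (np.pi / n) ** 2                               # d(theta) * d(phi)
    return float(np.sum(f(T, P) * Y(T, P) * np.sin(T)) * dA)

# A constant function on the sphere projects only onto Y00:
f = lambda theta, phi: np.ones_like(theta)
c00 = sh_coefficient(f, Y00)    # ~ 2 * sqrt(pi) ~ 3.545
c10 = sh_coefficient(f, Y10)    # ~ 0 by symmetry
print(round(c00, 3), round(c10, 6))
```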
two resulting SHs by using their direct difference. For comparison, the last two
rows show the SH computation for the same nodule, but with different segmen-
tations from two annotators. As illustrated, the SH coefficients have far greater
differences between malignant and benign nodules than two segmentations for
the same nodule, showing that it is possible to use SH coefficients to estimate the
malignancy rating of a specific nodule. Even so, as the figure demonstrates, for
nodules consisting of only a limited number of voxels, a change in segmentation
could lead to some discrepancy in SH coefficients. For such cases, SH may not
be able to serve as a reliable marker for malignancy, and we need to assist the
classification with further information, i.e., appearance.
Fig. 2. Process of appearance feature extraction. Local patches centered at each nodule
were first extracted on three orthogonal planes. Then, an RGB image is generated with
the three patches fed to each channel. This image is further resampled and used as input
to a trained DCNN. The resulting coefficients in the first fully-connected layer (yellow)
are then used as the feature vector for nodule appearance.
Fig. 2 shows how each candidate nodule was quantitatively coded. We first convert
a local 3D CT image volume to an RGB image, the required input to the DCNN
structure we use [7]. Here, we used a fixed-size cubic ROI
centered at each segmentation’s center of mass with the size of the largest nodule.
Since voxels in the LIDC dataset are mostly anisotropic, we used interpolation to
achieve isotropic resampling, avoiding distortion effects in the resulting patches.
In order to best preserve the appearance information, we performed principal
component analysis (PCA) on the binary segmentation data to identify the three
orthogonal axes x′, y′, z′ of the local nodule within the regular x, y, z space of
the axial, coronal, and sagittal planes. Then, we resampled the local space within
the x′y′, x′z′, and y′z′ planes to obtain local patches containing the nodule. The
three orthogonal patch samples formed an “RGB” image used as input to the DCNN's
expected three channels. We use Krizhevsky et al.’s pre-trained model for natural
images and extract the coefficients of the last few layers of the DCNN as a high-
order representation of the input image. This “transfer learning” approach from
natural images has proven successful within medical-imaging domains [9,10].
As an added benefit, no training of the DCNN is required, avoiding this time-
consuming and computationally expensive step. For our application, we use the
first fully-connected layer as the appearance descriptor.
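A minimal NumPy sketch of the PCA step on a synthetic binary segmentation (the mask shape and helper name are hypothetical):

```python
import numpy as np

def nodule_axes(mask):
    """PCA axes x', y', z' of a binary nodule segmentation.

    mask: 3-D boolean array. Returns a 3x3 matrix whose columns are unit
    vectors sorted by decreasing variance; the patch planes x'y', x'z', y'z'
    are resampled along these axes.
    """
    coords = np.argwhere(mask).astype(float)   # voxel coordinates of the nodule
    coords -= coords.mean(axis=0)              # center at the center of mass
    cov = coords.T @ coords / len(coords)
    _, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    return eigvecs[:, ::-1]                    # columns: x', y', z'

# Synthetic "nodule" elongated along the third array axis:
mask = np.zeros((16, 16, 48), dtype=bool)
mask[6:10, 6:10, 4:44] = True
axes = nodule_axes(mask)
print(np.round(np.abs(axes[:, 0]), 2))  # first axis aligned with the long direction
```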
2.3 RF Classification
By using SH and DCNN, both appearance and shape features of nodules can
be extracted as a vector of scalars, which in turn can be used together to dis-
tinguish nodules with different malignancy ratings. Combining these two very
different feature types is not trivial. Yet, recent work [14] has demonstrated that
non-image information can be successfully combined with CNN features using
classifiers. This success motivates our use of the RF classifier to synthesize the
SH and DCNN features together. The RF method features high accuracy and
efficiency, and is well-suited for problems of this form [15]. It works by “bag-
ging” the data to generate new training subsets with limited features, which are
in turn used to create a set of decision trees. A sample is then put through all
trees and voted on for correct classification. While the RF is generally insensitive
to parameter changes, we found that a set of 200 trees delivered accurate and
timely performance.
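A scikit-learn sketch of this feature fusion on synthetic data (the feature dimensions and labels are hypothetical placeholders for the actual SH and DCNN vectors):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 300
sh_feats = rng.standard_normal((n, 30))      # hypothetical SH coefficient vectors
dcnn_feats = rng.standard_normal((n, 256))   # hypothetical DCNN appearance features
# Synthetic malignancy ratings 1-5, tied to one SH coefficient for illustration.
y = np.clip(np.round(3 + sh_feats[:, 0]), 1, 5).astype(int)

X = np.hstack([sh_feats, dcnn_feats])        # concatenate shape + appearance features
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X[:250], y[:250])
pred = rf.predict(X[250:])
print(pred.shape)  # (50,)
```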
5 mm. To account for mis-meshing and artifacts from interpolating slices, meshes
were processed by filters to remove holes and fill islands. We also applied 1-step
Laplacian smoothing.
Judging from the distribution of malignancy ratings across all annotating
radiologists and based on Welch's t-test, inter-observer differences are
significant. Moreover, judging by the range of malignancy rating differences for
any specific nodule, most nodules have a rating discrepancy of 2 or 3 among
annotators, indicating that inter-observer variability is high. Therefore, to
evaluate the performance of the proposed framework, we used "off-by-one" accuracy,
regarding a predicted malignancy rating within ±1 of the reference as a reasonable
and acceptable evaluation.
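"Off-by-one" accuracy itself is a one-liner; a pure-Python sketch:

```python
def off_by_one_accuracy(predicted, reference):
    """Fraction of nodules whose predicted rating is within +/-1 of the
    reference rating (integer malignancy ratings, e.g. 1-5 in LIDC)."""
    pairs = list(zip(predicted, reference))
    return sum(abs(p - r) <= 1 for p, r in pairs) / len(pairs)

print(off_by_one_accuracy([3, 2, 5, 1], [4, 2, 3, 1]))  # 0.75
```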
Accuracy results for 10-fold cross validation are shown in Table 1 for a range
of nodule sets and SH coefficients. Three sets of models were used, one using
DCNN features only, one using SH coefficients only, and one using both SH and
DCNN features. Models were tested with a range of input parameters, including
maximum number of coefficients included and minimum number of annotators
marking the nodule. In all cases, the hybrid model achieved better results than
both individual models using the same input parameters. The hybrid model
results are even more impressive when compared against the inter-observer
variability of the LIDC dataset. These results indicate that DCNNs and SHs provide
complementary appearance and shape information that can help provide reference
malignancy ratings of lung nodules.
Table 1. Off-by-one accuracy for SH only, DCNN only, and hybrid models for input sets
of number of annotators marking the nodule, and maximum number of SH coefficients
included.
There are many promising avenues for future work. For instance, the method
would benefit from a larger and more accurate testing pool, as well as from more
reliable and precise ground truth beyond experts' subjective evaluations. In
addition, using complementary information such as volume and scale-based features
may further improve scores. In this study, we represented a nodule's appearance
within orthogonal planes along three PCA axes; including more 2D views, or even a
3D DCNN, could potentially improve on the promising results of the current setting.
The rating classification can also be formulated as regression, although the
regression results were not statistically significant in our current experiments.
How SH computation varies with nodule size and segmentation remains an open
question, and discussion in the existing literature is limited [6]. In this study,
our experiments partially addressed this robustness by testing segmentations of
the same nodules from different human observers. We also observed that including
more SH coefficients did not necessarily lead to higher accuracy. We postulate
that coefficients help define shape up to a certain point, beyond which they may
introduce more noise than useful information; further investigation would be
needed to test this hypothesis.
Based on the inter-observer variability, experimental results using the LIDC
dataset demonstrate that the proposed scheme can perform comparably to an
independent expert annotator, but does so using full automation up to segmen-
tation. As a result, this work serves as an important demonstration of how both
shape and appearance information can be harnessed for the important task of
lung nodule classification.
References
1. Furuya, K., Murayama, S., Soeda, H., Murakami, J., Ichinose, Y., Yauuchi, H.,
Katsuda, Y., Koga, M., Masuda, K.: New classification of small pulmonary nodules
by margin characteristics on high-resolution CT. Acta Radiol. 40, 496–504 (1999)
2. El-Baz, A., Beache, G.M., Gimel’farb, G., Suzuki, K., Okada, K., Elnakib, A.,
Soliman, A., Abdollahi, B.: Computer-aided diagnosis systems for lung cancer:
challenges and methodologies. Int. J. Biomed. Imaging 2013, 942353 (2013)
3. Venkatraman, V., Sael, L., Kihara, D.: Potential for protein surface shape analysis
using spherical harmonics and 3D Zernike descriptors. Cell Biochem. Biophys.
54(1–3), 23–32 (2009)
4. Huang, H., Shen, L., Zhang, R., Makedon, F.S., Hettleman, B., Pearlman, J.D.:
Surface alignment of 3D spherical harmonic models: application to cardiac MRI
analysis. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp.
67–74. Springer, Heidelberg (2005)
5. Gu, X., Wang, Y., Chan, T.F., Thompson, P.M.: Genus zero surface conformal
mapping and its application to brain surface mapping. IEEE Trans. Med. Imaging
23, 949–958 (2004)
6. El-Baz, A., Nitzken, M., Khalifa, F., Elnakib, A., Gimel’farb, G., Falk, R.,
El-Ghar, M.A.: 3D shape analysis for early diagnosis of malignant lung nodules. In:
Székely, G., Hahn, H.K. (eds.) IPMI 2011. LNCS, vol. 6801, pp. 772–783. Springer,
Heidelberg (2011)
7. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep
convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L.,
Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25,
pp. 1097–1105. Curran Associates Inc., Red Hook (2012)
8. Gao, M., Bagci, U., Lu, L., Wu, A., Buty, M., Shin, H.C., Roth, H.,
Papadakis, G.Z., Depeursinge, A., Summers, R., Xu, Z., Mollura, D.J.: Holistic
classification of CT attenuation patterns for interstitial lung diseases via deep con-
volutional neural networks. In: 1st Workshop on Deep Learning in Medical Image
Analysis, DLMIA 2015 pp. 41–48, October 2015
9. Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D.,
Summers, R.M.: Deep convolutional neural networks for computer-aided detection:
CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med.
Imaging 99, 1 (2016)
10. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H.: Chest
pathology detection using deep learning with non-medical training. In: 2015 IEEE
12th International Symposium on Biomedical Imaging (ISBI), pp. 294–297, April
2015
11. Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf:
an astounding baseline for recognition. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, pp. 806–813 (2014)
12. Ciompi, F., de Hoop, B., van Riel, S.J., Chung, K., Scholten, E.T., Oudkerk, M., de
Jong, P.A., Prokop, M., van Ginneken, B.: Automatic classification of pulmonary
peri-fissural nodules in computed tomography using an ensemble of 2D views and
a convolutional neural network out-of-the-box. Med. Image Anal. 26(1), 195–202
(2015)
13. Armato, S.G., McLennan, G., Bidaut, L., et al.: The lung image database consor-
tium (LIDC) and image database resource initiative (IDRI): a completed reference
database of lung nodules on CT scans. Med. Phys. 38(2), 915–931 (2011)
14. Sampaio, W.B., Diniz, E.M., Silva, A.C., de Paiva, A.C., Gattass, M.: Detection
of masses in mammogram images using CNN, geostatistic functions and SVM.
Comput. Biol. Med. 41(8), 653–664 (2011)
15. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Author Index
Xu, Yanwu II-132, III-441, III-458 Zhang, Han I-37, I-106, II-1, II-18, II-26,
Xu, Zheng II-640, II-676 II-115, II-212
Xu, Ziyue I-662 Zhang, Heye III-98
Xu, Zongben II-521 Zhang, Honghai II-344
Zhang, Jie I-326
Yamamoto, Tokunori III-353 Zhang, Jun I-308, II-79
Yan, Pingkun I-653 Zhang, Lichi II-1
Yang, Caiyun II-124 Zhang, Lin I-386
Yang, Feng II-124 Zhang, Miaomiao III-54, III-166
Yang, Guang-Zhong I-386, I-448, I-525 Zhang, Qiang II-274
Yang, Heran II-521 Zhang, Shaoting II-35, II-115, III-264
Yang, Jianhua III-1 Zhang, Shu I-19, I-28
Yang, Jie II-624 Zhang, Siyuan II-658
Yang, Lin II-185, II-442, II-658, III-183 Zhang, Tuo I-19, I-46, I-123
Yang, Shan I-627 Zhang, Wei I-19
Yang, Tao I-335 Zhang, Xiaoqin III-441
Yao, Jiawen II-640, II-649 Zhang, Xiaoyan I-559
Yap, Pew-Thian I-210, I-308, II-88, III-561, Zhang, Yizhe II-658
III-587 Zhang, Yong I-282, III-561
Yarmush, Martin L. III-388 Zhang, Zizhao II-185, II-442, III-183
Ye, Chuyang I-97 Zhao, Liang I-525
Ye, Jieping I-326, I-335 Zhao, Qinghua I-55
Ye, Menglong I-386, I-448 Zhao, Qingyu I-439
Yendiki, Anastasia I-184 Zhao, Shijie I-19, I-28, I-46, I-55
Yin, Qian II-442 Zhen, Xiantong III-210
Yin, Yilong II-335, III-210 Zheng, Yefeng I-413, II-487, III-317
Yin, Zhaozheng II-685 Zheng, Yingqiang III-326
Yoo, Youngjin II-406 Zheng, Yuanjie II-35
Yoshino, Yasushi III-353 Zhong, Zichun III-150
Yousry, Tarek III-81 Zhou, Mu II-124
Yu, Lequan II-149 Zhou, S. Kevin II-487
Yu, Renping I-37 Zhou, Xiaobo I-559
Yuan, Peng I-559 Zhu, Hongtu I-627
Yun, Il Dong III-308 Zhu, Xiaofeng I-106, I-264, I-291, I-344,
Yushkevich, Paul A. II-538, II-564, III-63 II-70
Zhu, Xinliang II-649
Zaffino, Paolo II-158 Zhu, Ying I-413
Zang, Yali II-124 Zhu, Yingying I-106, I-264, I-291
Zapp, Daniel I-378 Zhuang, Xiahai II-581
Zec, Michelle I-465 Zisserman, Andrew II-166
Zhan, Liang I-335 Zombori, Gergely I-542
Zhan, Yiqiang III-264 Zontak, Maria I-431
Zhang, Daoqiang I-1 Zu, Chen I-291
Zhang, Guangming I-559 Zuluaga, Maria A. I-542, II-352
Zhang, Haichong K. I-585 Zwicker, Jill G. I-175