
Sebastien Ourselin · Leo Joskowicz · Mert R. Sabuncu · Gozde Unal · William Wells (Eds.)

Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016

19th International Conference
Athens, Greece, October 17–21, 2016
Proceedings, Part I

LNCS 9900
Lecture Notes in Computer Science 9900
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7412
Editors

Sebastien Ourselin
University College London
London, UK

Leo Joskowicz
The Hebrew University of Jerusalem
Jerusalem, Israel

Mert R. Sabuncu
Harvard Medical School
Boston, MA, USA

Gozde Unal
Istanbul Technical University
Istanbul, Turkey

William Wells
Harvard Medical School
Boston, MA, USA

ISSN 0302-9743 (print)    ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-46719-1 (print)    ISBN 978-3-319-46720-7 (eBook)
DOI 10.1007/978-3-319-46720-7

Library of Congress Control Number: 2016952513

LNCS Sublibrary: SL6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics

© Springer International Publishing AG 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

In 2016, the 19th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2016) was held in Athens, Greece. It was organized by Harvard Medical School, The Hebrew University of Jerusalem, University College London, Sabancı University, Bogazici University, and Istanbul Technical University.
The meeting took place at the Intercontinental Athenaeum Hotel in Athens, Greece, during October 18–20. Satellite events associated with MICCAI 2016 were held on October 17 and October 21. MICCAI 2016 and its satellite events attracted world-leading scientists, engineers, and clinicians, who presented high-quality papers with the aim of uniting the fields of medical image processing, medical image formation, and medical robotics.
This year, the triple-anonymous review process was organized in several phases. In total, 756 submissions were received. Each paper was handled by one primary and two secondary Program Committee members. The review was initiated by the primary Program Committee member, who assigned exactly three expert reviewers, blinded to the authors of the paper. Based on these initial anonymous reviews, 82 papers were directly accepted and 189 papers were rejected. The remaining papers then went to the rebuttal phase, in which the authors had the chance to respond to the concerns raised by the reviewers, and the reviewers were in turn given a chance to revise their reviews based on the rebuttals. After this stage, 51 papers were accepted and 147 papers were rejected based on a consensus reached among the reviewers. Finally, the remaining reviews and associated rebuttals were discussed in person by the Program Committee during the MICCAI 2016 Program Committee meeting, held in London, UK, during May 28–29, 2016, and attended by 28 of the 55 Program Committee members, the four Program Chairs, and the General Chair. This process led to the acceptance of another 95 papers and the rejection of 192 papers. In total, 228 of the 756 submitted papers were accepted, corresponding to an acceptance rate of 30.1%.
For these proceedings, the 228 papers are organized in 18 groups as follows. The
first volume includes Brain Analysis (12), Brain Analysis: Connectivity (12), Brain
Analysis: Cortical Morphology (6), Alzheimer Disease (10), Surgical Guidance and
Tracking (15), Computer Aided Interventions (10), Ultrasound Image Analysis (5), and
Cancer Image Analysis (7). The second volume includes Machine Learning and Fea-
ture Selection (12), Deep Learning in Medical Imaging (13), Applications of Machine
Learning (14), Segmentation (33), and Cell Image Analysis (7). The third volume
includes Registration and Deformation Estimation (16), Shape Modeling (11), Cardiac
and Vascular Image Analysis (19), Image Reconstruction (10), and MRI Image
Analysis (16).
We thank Dekon, who did an excellent job organizing the conference. We thank the MICCAI Society for its support and insightful comments, and the Program Committee for their diligent work in helping to prepare the technical program,

as well as the reviewers for their support during the review process. We also thank
Andreas Maier for his support in editorial tasks. Last but not least, we thank our
sponsors for the financial support that made the conference possible.
We look forward to seeing you in Quebec City, Canada, in 2017!

August 2016

Sebastien Ourselin
William Wells
Leo Joskowicz
Mert Sabuncu
Gozde Unal
Organization

General Chair
Sebastien Ourselin University College London, London, UK

General Co-chair
Aytül Erçil Sabanci University, Istanbul, Turkey

Program Chair
William Wells Harvard Medical School, Boston, MA, USA

Program Co-chairs
Mert R. Sabuncu A.A. Martinos Center for Biomedical Imaging,
Charlestown, MA, USA
Leo Joskowicz The Hebrew University of Jerusalem, Israel
Gozde Unal Istanbul Technical University, Istanbul, Turkey

Local Organization Chair


Bülent Sankur Bogazici University, Istanbul, Turkey

Satellite Events Chair


Burak Acar Bogazici University, Istanbul, Turkey

Satellite Events Co-chairs


Evren Özarslan Harvard Medical School, Boston, MA, USA
Devrim Ünay Izmir University of Economics, Izmir, Turkey
Tom Vercauteren University College London, UK

Industrial Liaison
Tanveer Syeda-Mahmood IBM Almaden Research Center, San Jose, CA, USA

Publication Chair
Andreas Maier Friedrich-Alexander-Universität Erlangen-Nürnberg,
Erlangen, Germany

MICCAI Society Board of Directors


Stephen Aylward (Treasurer): Kitware, Inc., NY, USA
Hervé Delingette: Inria, Sophia Antipolis, France
Simon Duchesne: Université Laval, Québec, QC, Canada
Gabor Fichtinger (Secretary): Queen's University, Kingston, ON, Canada
Alejandro Frangi: University of Sheffield, UK
Pierre Jannin: INSERM/Inria, Rennes, France
Leo Joskowicz: The Hebrew University of Jerusalem, Israel
Shuo Li: Digital Imaging Group, Western University, London, ON, Canada
Wiro Niessen (President and Board Chair): Erasmus MC - University Medical Centre, Rotterdam, The Netherlands
Nassir Navab: Technical University of Munich, Germany
Alison Noble (Past President, Non-Voting): University of Oxford, UK
Sebastien Ourselin: University College London, UK
Josien Pluim: Eindhoven University of Technology, The Netherlands
Li Shen (Executive Director): Indiana University, IN, USA

MICCAI Society Consultants to the Board


Alan Colchester University of Kent, Canterbury, UK
Terry Peters University of Western Ontario, London, ON, Canada
Richard Robb Mayo Clinic College of Medicine, MN, USA

Executive Officers
President and Board Chair: Wiro Niessen
Executive Director (Managing Educational Affairs): Li Shen
Secretary (Coordinating MICCAI Awards): Gabor Fichtinger
Treasurer: Stephen Aylward
Elections Officer: Richard Robb

Non-Executive Officers
Society Secretariat: Janette Wallace, Canada
Recording Secretary and Web Maintenance: Jackie Williams, Canada
Fellows Nomination Coordinator: Terry Peters, Canada

Student Board Members


President: Lena Filatova
Professional Student Events Officer: Danielle Pace
Public Relations Officer: Duygu Sarikaya
Social Events Officer: Mathias Unberath

Program Committee
Arbel, Tal McGill University, Canada
Cardoso, Manuel Jorge University College London, UK
Castellani, Umberto University of Verona, Italy
Cattin, Philippe C. University of Basel, Switzerland
Chung, Albert C.S. Hong Kong University of Science and Technology,
Hong Kong
Cukur, Tolga Bilkent University, Turkey
Delingette, Herve Inria, France
Feragen, Aasa University of Copenhagen, Denmark
Freiman, Moti Philips Healthcare, Israel
Glocker, Ben Imperial College London, UK
Goksel, Orcun ETH Zurich, Switzerland
Gonzalez Ballester, Miguel Angel Universitat Pompeu Fabra, Spain
Grady, Leo HeartFlow, USA
Greenspan, Hayit Tel Aviv University, Israel
Howe, Robert Harvard University, USA
Isgum, Ivana University Medical Center Utrecht, The Netherlands
Jain, Ameet Philips Research North America, USA
Jannin, Pierre University of Rennes, France
Joshi, Sarang University of Utah, USA
Kalpathy-Cramer, Jayashree Harvard Medical School, USA
Kamen, Ali Siemens Corporate Technology, USA
Knutsson, Hans Linkoping University, Sweden
Konukoglu, Ender Harvard Medical School, USA
Landman, Bennett Vanderbilt University, USA
Langs, Georg University of Vienna, Austria

Lee, Su-Lin Imperial College London, UK


Liao, Hongen Tsinghua University, China
Linguraru, Marius George Children’s National Health System, USA
Liu, Huafeng Zhejiang University, China
Lu, Le National Institutes of Health, USA
Maier-Hein, Lena German Cancer Research Center, Germany
Martel, Anne University of Toronto, Canada
Masamune, Ken The University of Tokyo, Japan
Menze, Bjoern Technische Universität München, Germany
Modat, Marc Imperial College, London, UK
Moradi, Mehdi IBM Almaden Research Center, USA
Nielsen, Poul The University of Auckland, New Zealand
Niethammer, Marc UNC Chapel Hill, USA
O’Donnell, Lauren Harvard Medical School, USA
Padoy, Nicolas University of Strasbourg, France
Pohl, Kilian SRI International, USA
Prince, Jerry Johns Hopkins University, USA
Reyes, Mauricio University of Bern, Bern, Switzerland
Sakuma, Ichiro The University of Tokyo, Japan
Sato, Yoshinobu Nara Institute of Science and Technology, Japan
Shen, Li Indiana University School of Medicine, USA
Stoyanov, Danail University College London, UK
Van Leemput, Koen Technical University of Denmark, Denmark
Vrtovec, Tomaz University of Ljubljana, Slovenia
Wassermann, Demian Inria, France
Wein, Wolfgang ImFusion GmbH, Germany
Yang, Guang-Zhong Imperial College London, UK
Young, Alistair The University of Auckland, New Zealand
Zheng, Guoyan University of Bern, Switzerland

Reviewers

Abbott, Jake Alexander, Daniel Bai, Ying


Abolmaesumi, Purang Aljabar, Paul Bao, Siqi
Acosta-Tamayo, Oscar Allan, Maximilian Barbu, Adrian
Adeli, Ehsan Altmann, Andre Batmanghelich, Kayhan
Afacan, Onur Andras, Jakab Bauer, Stefan
Aganj, Iman Angelini, Elsa Bazin, Pierre-Louis
Ahmadi, Seyed-Ahmad Antony, Bhavna Beier, Susann
Aichert, Andre Ashburner, John Bello, Fernando
Akhondi-Asl, Alireza Auvray, Vincent Ben Ayed, Ismail
Albarqouni, Shadi Awate, Suyash P. Bergeles, Christos
Alberola-López, Carlos Bagci, Ulas Berger, Marie-Odile
Alberts, Esther Bai, Wenjia Bhalerao, Abhir

Bhatia, Kanwal Conjeti, Sailesh Foroughi, Pezhman


Bieth, Marie Cootes, Tim Forsberg, Daniel
Bilgic, Berkin Coupe, Pierrick Franz, Alfred
Birkfellner, Wolfgang Crum, William Freysinger, Wolfgang
Bloch, Isabelle Dalca, Adrian Fripp, Jurgen
Bogunovic, Hrvoje Darkner, Sune Frisch, Benjamin
Bouget, David Das Gupta, Mithun Fritscher, Karl
Bouix, Sylvain Dawant, Benoit Funka-Lea, Gareth
Brady, Michael de Bruijne, Marleen Gabrani, Maria
Bron, Esther De Craene, Mathieu Gallardo Diez, Guillermo Alejandro
Brost, Alexander Degirmenci, Alperen
Buerger, Christian Dehghan, Ehsan Gangeh, Mehrdad
Burgos, Ninon Demirci, Stefanie Ganz, Melanie
Cahill, Nathan Depeursinge, Adrien Gao, Fei
Cai, Weidong Descoteaux, Maxime Gao, Mingchen
Cao, Yu Despinoy, Fabien Gao, Yaozong
Carass, Aaron Dijkstra, Jouke Gao, Yue
Cardoso, Manuel Jorge Ding, Xiaowei Garvin, Mona
Carmichael, Owen Dojat, Michel Gaser, Christian
Carneiro, Gustavo Dong, Xiao Gass, Tobias
Caruyer, Emmanuel Dorfer, Matthias Georgescu, Bogdan
Cash, David Du, Xiaofei Gerig, Guido
Cerrolaza, Juan Duchateau, Nicolas Ghesu, Florin-Cristian
Cetin, Suheyla Duchesne, Simon Gholipour, Ali
Cetingul, Hasan Ertan Duncan, James S. Ghosh, Aurobrata
Chakravarty, M. Mallar Ebrahimi, Mehran Giachetti, Andrea
Chatelain, Pierre Ehrhardt, Jan Giannarou, Stamatia
Chen, Elvis C.S. Eklund, Anders Gibaud, Bernard
Chen, Hanbo El-Baz, Ayman Ginsburg, Shoshana
Chen, Hao Elliott, Colm Girard, Gabriel
Chen, Ting Ellis, Randy Giusti, Alessandro
Cheng, Jian Elson, Daniel Golemati, Spyretta
Cheng, Jun El-Zehiry, Noha Golland, Polina
Cheplygina, Veronika Erdt, Marius Gong, Yuanhao
Chowdhury, Ananda Essert, Caroline Good, Benjamin
Christensen, Gary Fallavollita, Pascal Gooya, Ali
Chui, Chee Kong Fang, Ruogu Grisan, Enrico
Côté, Marc-Alexandre Fenster, Aaron Gu, Xianfeng
Ciompi, Francesco Ferrante, Enzo Gu, Xuan
Clancy, Neil T. Fick, Rutger Gubern-Mérida, Albert
Claridge, Ela Figl, Michael Guetter, Christoph
Clarysse, Patrick Fischer, Peter Guo, Peifang B.
Cobzas, Dana Fishbaugh, James Guo, Yanrong
Comaniciu, Dorin Fletcher, P. Thomas Gur, Yaniv
Commowick, Olivier Forestier, Germain Gutman, Boris
Compas, Colin Foroughi, Pezhman Hacihaliloglu, Ilker

Haidegger, Tamas Kadkhodamohammadi, Abdolrahim Lepore, Natasha
Hamarneh, Ghassan Lesage, David
Hammer, Peter Kadoury, Samuel Li, Gang
Harada, Kanako Kainz, Bernhard Li, Jiang
Harrison, Adam Kakadiaris, Ioannis Li, Xiang
Hata, Nobuhiko Kamnitsas, Konstantinos Liang, Liang
Hatt, Chuck Kandemir, Melih Lindner, Claudia
Hawkes, David Kapoor, Ankur Lioma, Christina
Haynor, David Karahanoglu, F. Isik Liu, Jiamin
He, Huiguang Karargyris, Alexandros Liu, Mingxia
He, Tiancheng Kasenburg, Niklas Liu, Sidong
Heckemann, Rolf Katouzian, Amin Liu, Tianming
Hefny, Mohamed Kelm, Michael Liu, Ting
Heinrich, Mattias Paul Kerrien, Erwan Lo, Benny
Heng, Pheng Ann Khallaghi, Siavash Lombaert, Herve
Hennersperger, Christoph Khalvati, Farzad Lorenzi, Marco
Herbertsson, Magnus Köhler, Thomas Loschak, Paul
Hütel, Michael Kikinis, Ron Loy Rodas, Nicolas
Holden, Matthew Kim, Boklye Luo, Xiongbiao
Hong, Jaesung Kim, Hosung Lv, Jinglei
Hong, Yi Kim, Minjeong Maddah, Mahnaz
Hontani, Hidekata Kim, Sungeun Mahapatra, Dwarikanath
Horise, Yuki Kim, Sungmin Maier, Andreas
Horiuchi, Tetsuya King, Andrew Maier, Oskar
Hu, Yipeng Kisilev, Pavel Maier-Hein (né Fritzsche), Klaus Hermann
Huang, Heng Klein, Stefan
Huang, Junzhou Klinder, Tobias Mailhe, Boris
Huang, Xiaolei Kluckner, Stefan Malandain, Gregoire
Hughes, Michael Konofagou, Elisa Mansoor, Awais
Hutter, Jana Kunz, Manuela Marchesseau, Stephanie
Iakovidis, Dimitris Kurugol, Sila Marsland, Stephen
Ibragimov, Bulat Kuwana, Kenta Martí, Robert
Iglesias, Juan Eugenio Kwon, Dongjin Martin-Fernandez, Marcos
Iordachita, Iulian Ladikos, Alexander Masuda, Kohji
Irving, Benjamin Lamecker, Hans Masutani, Yoshitaka
Jafari-Khouzani, Kourosh Lang, Andrew Mateus, Diana
Jain, Saurabh Lapeer, Rudy Matsumiya, Kiyoshi
Janoos, Firdaus Larrabide, Ignacio Mazomenos, Evangelos
Jedynak, Bruno Larsen, Anders Boesen Lindbo McClelland, Jamie
Jiang, Tianzi Mehrabian, Hatef
Jiang, Xi Lauze, Francois Meier, Raphael
Jin, Yan Lea, Colin Melano, Tim
Jog, Amod Lefèvre, Julien Melbourne, Andrew
Jolly, Marie-Pierre Lekadir, Karim Mendelson, Alex F.
Joshi, Anand Lelieveldt, Boudewijn Menegaz, Gloria
Joshi, Shantanu Lemaitre, Guillaume Metaxas, Dimitris

Mewes, Philip Pace, Danielle Randles, Amanda


Meyer, Chuck Panayiotou, Maria Rathi, Yogesh
Miller, Karol Panse, Ashish Reinertsen, Ingerid
Misra, Sarthak Papa, Joao Reiter, Austin
Misra, Vinith Papademetris, Xenios Rekik, Islem
Mørup, Morten Papadopoulo, Theo Reuter, Martin
Moeskops, Pim Papież, Bartłomiej W. Riklin Raviv, Tammy
Moghari, Mehdi Parisot, Sarah Risser, Laurent
Mohamed, Ashraf Park, Sang hyun Rit, Simon
Mohareri, Omid Paulsen, Rasmus Rivaz, Hassan
Moore, John Peng, Tingying Robinson, Emma
Moreno, Rodrigo Pennec, Xavier Rohling, Robert
Mori, Kensaku Peressutti, Devis Rohr, Karl
Mountney, Peter Pernus, Franjo Ronneberger, Olaf
Mukhopadhyay, Anirban Peruzzo, Denis Roth, Holger
Müller, Henning Peter, Loic Rottman, Caleb
Nakamura, Ryoichi Peterlik, Igor Rousseau, François
Nambu, Kyojiro Petersen, Jens Roy, Snehashis
Nasiriavanaki, Mohammadreza Petersen, Kersten Rueckert, Daniel
Negahdar, Mohammadreza Petitjean, Caroline Rueda Olarte, Andrea
Pham, Dzung Ruijters, Daniel
Pheiffer, Thomas Salcudean, Tim
Nenning, Karl-Heinz Piechnik, Stefan Salvado, Olivier
Neumann, Dominik Pitiot, Alain Sanabria, Sergio
Neumuth, Thomas Pizzolato, Marco Saritas, Emine
Ng, Bernard Plenge, Esben Sarry, Laurent
Ni, Dong Pluim, Josien Scherrer, Benoit
Näppi, Janne Polimeni, Jonathan R. Schirmer, Markus D.
Niazi, Muhammad Khalid Khan Poline, Jean-Baptiste Schnabel, Julia A.
Pont-Tuset, Jordi Schultz, Thomas
Ning, Lipeng Popovic, Aleksandra Schumann, Christian
Noble, Alison Porras, Antonio R. Schumann, Steffen
Noble, Jack Prasad, Gautam Schwartz, Ernst
Noblet, Vincent Prastawa, Marcel Sechopoulos, Ioannis
Nouranian, Saman Pratt, Philip Seeboeck, Philipp
Oda, Masahiro Preim, Bernhard Seiler, Christof
O’Donnell, Thomas Preston, Joseph Seitel, Alexander
Okada, Toshiyuki Prevost, Raphael Sepasian, Neda
Oktay, Ozan Pszczolkowski, Stefan Sermesant, Maxime
Oliver, Arnau Qazi, Arish A. Sethuraman, Shriram
Onofrey, John Qi, Xin Shahzad, Rahil
Onogi, Shinya Qian, Zhen Shamir, Reuben R.
Orihuela-Espina, Felipe Qiu, Wu Shi, Kuangyu
Otake, Yoshito Quellec, Gwenole Shi, Wenzhe
Ou, Yangming Raj, Ashish Shi, Yonggang
Özarslan, Evren Rajpoot, Nasir Shin, Hoo-Chang

Siddiqi, Kaleem Tasdizen, Tolga Wang, Liansheng


Silva, Carlos Alberto Taylor, Russell Wang, Linwei
Simpson, Amber Thirion, Bertrand Wang, Qiu
Singh, Vikas Tie, Yanmei Wang, Song
Sivaswamy, Jayanthi Tiwari, Pallavi Wang, Yalin
Sjölund, Jens Toews, Matthew Warfield, Simon
Skalski, Andrzej Tokuda, Junichi Weese, Jürgen
Slabaugh, Greg Tong, Tong Wegner, Ingmar
Smeets, Dirk Tournier, J. Donald Wei, Liu
Sommer, Stefan Toussaint, Nicolas Wels, Michael
Sona, Diego Tsaftaris, Sotirios Werner, Rene
Song, Gang Tustison, Nicholas Westin, Carl-Fredrik
Song, Qi Twinanda, Andru Putra Whitaker, Ross
Song, Yang Twining, Carole Wörz, Stefan
Sotiras, Aristeidis Uhl, Andreas Wiles, Andrew
Speidel, Stefanie Ukwatta, Eranga Wittek, Adam
Špiclin, Žiga Umadevi Venkataraju, Kannan Wolf, Ivo
Sporring, Jon Wolterink, Jelmer Maarten
Staib, Lawrence Unay, Devrim
Stamm, Aymeric Urschler, Martin Wright, Graham
Staring, Marius Vaillant, Régis Wu, Guorong
Stauder, Ralf van Assen, Hans Wu, Meng
Stewart, James van Ginneken, Bram Wu, Xiaodong
Studholme, Colin van Tulder, Gijs Xie, Saining
Styles, Iain van Walsum, Theo Xie, Yuchen
Styner, Martin Vandini, Alessandro Xing, Fuyong
Sudre, Carole H. Vavourakis, Vasileios Xu, Qiuping
Suinesiaputra, Avan Vegas-Sanchez-Ferrero, Gonzalo Xu, Yanwu
Suk, Heung-Il Xu, Ziyue
Summers, Ronald Vemuri, Anant Suraj Yamashita, Hiromasa
Sun, Shanhui Venkataraman, Archana Yan, Jingwen
Sundar, Hari Vercauteren, Tom Yan, Pingkun
Sushkov, Mikhail Veta, Mitko Yan, Zhennan
Suzuki, Takashi Vidal, Rene Yang, Lin
Szczepankiewicz, Filip Villard, Pierre-Frederic Yao, Jianhua
Sznitman, Raphael Visentini-Scarzanella, Marco Yap, Pew-Thian
Taha, Abdel Aziz Yaqub, Mohammad
Tahmasebi, Amir Viswanath, Satish Ye, Dong Hye
Talbot, Hugues Vitanovski, Dime Ye, Menglong
Tam, Roger Vogl, Wolf-Dieter Yin, Zhaozheng
Tamaki, Toru von Berg, Jens Yokota, Futoshi
Tamura, Manabu Vrooman, Henri Zelmann, Rina
Tanaka, Yoshihiro Wang, Defeng Zeng, Wei
Tang, Hui Wang, Hongzhi Zhan, Yiqiang
Tang, Xiaoying Wang, Junchen Zhang, Daoqiang
Tanner, Christine Wang, Li Zhang, Fan

Zhang, Le Zhang, Yong Zhu, Hongtu


Zhang, Ling Zhen, Xiantong Zhu, Yuemin
Zhang, Miaomiao Zheng, Yefeng Zhuang, Xiahai
Zhang, Pei Zhang, Zhijun Zollei, Lilla
Zhang, Qing Zhou, Jinghao Zuluaga, Maria A.
Zhang, Tianhao Zhou, Luping
Zhang, Tuo Zhou, S. Kevin
Contents – Part I

Brain Analysis

Ordinal Patterns for Connectivity Networks in Brain Disease Diagnosis . . . . . 1
Mingxia Liu, Junqiang Du, Biao Jie, and Daoqiang Zhang

Discovering Cortical Folding Patterns in Neonatal Cortical Surfaces Using Large-Scale Dataset . . . . . 10
Yu Meng, Gang Li, Li Wang, Weili Lin, John H. Gilmore, and Dinggang Shen

Modeling Functional Dynamics of Cortical Gyri and Sulci . . . . . 19
Xi Jiang, Xiang Li, Jinglei Lv, Shijie Zhao, Shu Zhang, Wei Zhang, Tuo Zhang, and Tianming Liu

A Multi-stage Sparse Coding Framework to Explore the Effects of Prenatal Alcohol Exposure . . . . . 28
Shijie Zhao, Junwei Han, Jinglei Lv, Xi Jiang, Xintao Hu, Shu Zhang, Mary Ellen Lynch, Claire Coles, Lei Guo, Xiaoping Hu, and Tianming Liu

Correlation-Weighted Sparse Group Representation for Brain Network Construction in MCI Classification . . . . . 37
Renping Yu, Han Zhang, Le An, Xiaobo Chen, Zhihui Wei, and Dinggang Shen

Temporal Concatenated Sparse Coding of Resting State fMRI Data Reveal Network Interaction Changes in mTBI . . . . . 46
Jinglei Lv, Armin Iraji, Fangfei Ge, Shijie Zhao, Xintao Hu, Tuo Zhang, Junwei Han, Lei Guo, Zhifeng Kou, and Tianming Liu

Exploring Brain Networks via Structured Sparse Representation of fMRI Data . . . . . 55
Qinghua Zhao, Jianfeng Lu, Jinglei Lv, Xi Jiang, Shijie Zhao, and Tianming Liu

Discover Mouse Gene Coexpression Landscape Using Dictionary Learning and Sparse Coding . . . . . 63
Yujie Li, Hanbo Chen, Xi Jiang, Xiang Li, Jinglei Lv, Hanchuan Peng, Joe Z. Tsien, and Tianming Liu

Integrative Analysis of Cellular Morphometric Context Reveals Clinically Relevant Signatures in Lower Grade Glioma . . . . . 72
Ju Han, Yunfu Wang, Weidong Cai, Alexander Borowsky, Bahram Parvin, and Hang Chang

Mapping Lifetime Brain Volumetry with Covariate-Adjusted Restricted Cubic Spline Regression from Cross-Sectional Multi-site MRI . . . . . 81
Yuankai Huo, Katherine Aboud, Hakmook Kang, Laurie E. Cutting, and Bennett A. Landman

Extracting the Core Structural Connectivity Network: Guaranteeing Network Connectedness Through a Graph-Theoretical Approach . . . . . 89
Demian Wassermann, Dorian Mazauric, Guillermo Gallardo-Diez, and Rachid Deriche

Fiber Orientation Estimation Using Nonlocal and Local Information . . . . . 97
Chuyang Ye

Brain Analysis: Connectivity

Reveal Consistent Spatial-Temporal Patterns from Dynamic Functional Connectivity for Autism Spectrum Disorder Identification . . . . . 106
Yingying Zhu, Xiaofeng Zhu, Han Zhang, Wei Gao, Dinggang Shen, and Guorong Wu

Boundary Mapping Through Manifold Learning for Connectivity-Based Cortical Parcellation . . . . . 115
Salim Arslan, Sarah Parisot, and Daniel Rueckert

Species Preserved and Exclusive Structural Connections Revealed by Sparse CCA . . . . . 123
Xiao Li, Lei Du, Tuo Zhang, Xintao Hu, Xi Jiang, Lei Guo, and Tianming Liu

Modularity Reinforcement for Improving Brain Subnetwork Extraction . . . . . 132
Chendi Wang, Bernard Ng, and Rafeef Abugharbieh

Effective Brain Connectivity Through a Constrained Autoregressive Model . . . . . 140
Alessandro Crimi, Luca Dodero, Vittorio Murino, and Diego Sona

GraMPa: Graph-Based Multi-modal Parcellation of the Cortex Using Fusion Moves . . . . . 148
Sarah Parisot, Ben Glocker, Markus D. Schirmer, and Daniel Rueckert

A Continuous Model of Cortical Connectivity . . . . . 157
Daniel Moyer, Boris A. Gutman, Joshua Faskowitz, Neda Jahanshad, and Paul M. Thompson

Label-Informed Non-negative Matrix Factorization with Manifold Regularization for Discriminative Subnetwork Detection . . . . . 166
Takanori Watanabe, Birkan Tunc, Drew Parker, Junghoon Kim, and Ragini Verma

Predictive Subnetwork Extraction with Structural Priors for Infant Connectomes . . . . . 175
Colin J. Brown, Steven P. Miller, Brian G. Booth, Jill G. Zwicker, Ruth E. Grunau, Anne R. Synnes, Vann Chau, and Ghassan Hamarneh

Hierarchical Clustering of Tractography Streamlines Based on Anatomical Similarity . . . . . 184
Viviana Siless, Ken Chang, Bruce Fischl, and Anastasia Yendiki

Unsupervised Identification of Clinically Relevant Clusters in Routine Imaging Data . . . . . 192
Johannes Hofmanninger, Markus Krenn, Markus Holzer, Thomas Schlegl, Helmut Prosch, and Georg Langs

Probabilistic Tractography for Topographically Organized Connectomes . . . . . 201
Dogu Baran Aydogan and Yonggang Shi

Brain Analysis: Cortical Morphology

A Hybrid Multishape Learning Framework for Longitudinal Prediction of Cortical Surfaces and Fiber Tracts Using Neonatal Data . . . . . 210
Islem Rekik, Gang Li, Pew-Thian Yap, Geng Chen, Weili Lin, and Dinggang Shen

Learning-Based Topological Correction for Infant Cortical Surfaces . . . . . 219
Shijie Hao, Gang Li, Li Wang, Yu Meng, and Dinggang Shen

Riemannian Metric Optimization for Connectivity-Driven Surface Mapping . . . . . 228
Jin Kyu Gahm and Yonggang Shi

Riemannian Statistical Analysis of Cortical Geometry with Robustness to Partial Homology and Misalignment . . . . . 237
Suyash P. Awate, Richard M. Leahy, and Anand A. Joshi

Modeling Fetal Cortical Expansion Using Graph-Regularized Gompertz Models . . . . . 247
Ernst Schwartz, Gregor Kasprian, András Jakab, Daniela Prayer, Veronika Schöpf, and Georg Langs

Longitudinal Analysis of the Preterm Cortex Using Multi-modal Spectral Matching . . . . . 255
Eliza Orasanu, Pierre-Louis Bazin, Andrew Melbourne, Marco Lorenzi, Herve Lombaert, Nicola J. Robertson, Giles Kendall, Nikolaus Weiskopf, Neil Marlow, and Sebastien Ourselin

Alzheimer Disease

Early Diagnosis of Alzheimer’s Disease by Joint Feature Selection and Classification on Temporally Structured Support Vector Machine . . . . . 264
Yingying Zhu, Xiaofeng Zhu, Minjeong Kim, Dinggang Shen, and Guorong Wu

Prediction of Memory Impairment with MRI Data: A Longitudinal Study of Alzheimer’s Disease . . . . . 273
Xiaoqian Wang, Dinggang Shen, and Heng Huang

Joint Data Harmonization and Group Cardinality Constrained Classification . . . . . 282
Yong Zhang, Sang Hyun Park, and Kilian M. Pohl

Progressive Graph-Based Transductive Learning for Multi-modal Classification of Brain Disorder Disease . . . . . 291
Zhengxia Wang, Xiaofeng Zhu, Ehsan Adeli, Yingying Zhu, Chen Zu, Feiping Nie, Dinggang Shen, and Guorong Wu

Structured Outlier Detection in Neuroimaging Studies with Minimal Convex Polytopes . . . . . 300
Erdem Varol, Aristeidis Sotiras, and Christos Davatzikos

Diagnosis of Alzheimer’s Disease Using View-Aligned Hypergraph Learning with Incomplete Multi-modality Data . . . . . 308
Mingxia Liu, Jun Zhang, Pew-Thian Yap, and Dinggang Shen

New Multi-task Learning Model to Predict Alzheimer’s Disease Cognitive Assessment . . . . . 317
Zhouyuan Huo, Dinggang Shen, and Heng Huang

Hyperbolic Space Sparse Coding with Its Application on Prediction of Alzheimer’s Disease in Mild Cognitive Impairment . . . . . 326
Jie Zhang, Jie Shi, Cynthia Stonnington, Qingyang Li, Boris A. Gutman, Kewei Chen, Eric M. Reiman, Richard Caselli, Paul M. Thompson, Jieping Ye, and Yalin Wang

Large-Scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer’s Disease Across Multiple Institutions . . . . . 335
Qingyang Li, Tao Yang, Liang Zhan, Derrek Paul Hibar, Neda Jahanshad, Yalin Wang, Jieping Ye, Paul M. Thompson, and Jie Wang

Structured Sparse Low-Rank Regression Model for Brain-Wide and Genome-Wide Associations . . . . . 344
Xiaofeng Zhu, Heung-Il Suk, Heng Huang, and Dinggang Shen

Surgical Guidance and Tracking

3D Ultrasonic Needle Tracking with a 1.5D Transducer Array for Guidance of Fetal Interventions . . . . . 353
Wenfeng Xia, Simeon J. West, Jean-Martial Mari, Sebastien Ourselin, Anna L. David, and Adrien E. Desjardins

Enhancement of Needle Tip and Shaft from 2D Ultrasound Using Signal Transmission Maps . . . . . 362
Cosmas Mwikirize, John L. Nosher, and Ilker Hacihaliloglu

Plane Assist: The Influence of Haptics on Ultrasound-Based Needle Guidance . . . . . 370
Heather Culbertson, Julie M. Walker, Michael Raitor, Allison M. Okamura, and Philipp J. Stolka

A Surgical Guidance System for Big-Bubble Deep Anterior Lamellar Keratoplasty . . . . . 378
Hessam Roodaki, Chiara Amat di San Filippo, Daniel Zapp, Nassir Navab, and Abouzar Eslami

Real-Time 3D Tracking of Articulated Tools for Robotic Surgery . . . . . 386
Menglong Ye, Lin Zhang, Stamatia Giannarou, and Guang-Zhong Yang

Towards Automated Ultrasound Transesophageal Echocardiography and X-Ray Fluoroscopy Fusion Using an Image-Based Co-registration Method . . . . . 395
Shanhui Sun, Shun Miao, Tobias Heimann, Terrence Chen, Markus Kaiser, Matthias John, Erin Girard, and Rui Liao

Robust, Real-Time, Dense and Deformable 3D Organ Tracking in Laparoscopic Videos . . . . . 404
Toby Collins, Adrien Bartoli, Nicolas Bourdel, and Michel Canis

Structure-Aware Rank-1 Tensor Approximation for Curvilinear Structure Tracking Using Learned Hierarchical Features . . . . . 413
Peng Chu, Yu Pang, Erkang Cheng, Ying Zhu, Yefeng Zheng, and Haibin Ling

Real-Time Online Adaption for Robust Instrument Tracking and Pose Estimation . . . . . 422
Nicola Rieke, David Joseph Tan, Federico Tombari, Josué Page Vizcaíno, Chiara Amat di San Filippo, Abouzar Eslami, and Nassir Navab

Integrated Dynamic Shape Tracking and RF Speckle Tracking for Cardiac Motion Analysis . . . 431
Nripesh Parajuli, Allen Lu, John C. Stendahl, Maria Zontak,
Nabil Boutagy, Melissa Eberle, Imran Alkhalil, Matthew O’Donnell,
Albert J. Sinusas, and James S. Duncan

The Endoscopogram: A 3D Model Reconstructed from Endoscopic Video Frames . . . 439
Qingyu Zhao, True Price, Stephen Pizer, Marc Niethammer,
Ron Alterovitz, and Julian Rosenman

Robust Image Descriptors for Real-Time Inter-Examination Retargeting in Gastrointestinal Endoscopy . . . 448
Menglong Ye, Edward Johns, Benjamin Walter, Alexander Meining,
and Guang-Zhong Yang

Kalman Filter Based Data Fusion for Needle Deflection Estimation Using Optical-EM Sensor . . . 457
Baichuan Jiang, Wenpeng Gao, Daniel F. Kacher, Thomas C. Lee,
and Jagadeesan Jayender

Bone Enhancement in Ultrasound Based on 3D Local Spectrum Variation for Percutaneous Scaphoid Fracture Fixation . . . 465
Emran Mohammad Abu Anas, Alexander Seitel, Abtin Rasoulian,
Paul St. John, Tamas Ungi, Andras Lasso, Kathryn Darras,
David Wilson, Victoria A. Lessoway, Gabor Fichtinger, Michelle Zec,
David Pichora, Parvin Mousavi, Robert Rohling,
and Purang Abolmaesumi

Bioelectric Navigation: A New Paradigm for Intravascular Device Guidance . . . 474
Bernhard Fuerst, Erin E. Sutton, Reza Ghotbi, Noah J. Cowan,
and Nassir Navab

Computer Aided Interventions

Process Monitoring in the Intensive Care Unit: Assessing Patient Mobility Through Activity Analysis with a Non-Invasive Mobility Sensor . . . 482
Austin Reiter, Andy Ma, Nishi Rawat, Christine Shrock, and Suchi Saria

Patient MoCap: Human Pose Estimation Under Blanket Occlusion for Hospital Monitoring Applications . . . 491
Felix Achilles, Alexandru-Eugen Ichim, Huseyin Coskun,
Federico Tombari, Soheyl Noachtar, and Nassir Navab

Numerical Simulation of Cochlear-Implant Surgery: Towards Patient-Specific Planning . . . 500
Olivier Goury, Yann Nguyen, Renato Torres, Jeremie Dequidt,
and Christian Duriez

Meaningful Assessment of Surgical Expertise: Semantic Labeling with Data and Crowds . . . 508
Marzieh Ershad, Zachary Koesters, Robert Rege, and Ann Majewicz

2D-3D Registration Accuracy Estimation for Optimised Planning of Image-Guided Pancreatobiliary Interventions . . . 516
Yipeng Hu, Ester Bonmati, Eli Gibson, John H. Hipwell,
David J. Hawkes, Steven Bandula, Stephen P. Pereira,
and Dean C. Barratt

Registration-Free Simultaneous Catheter and Environment Modelling . . . 525
Liang Zhao, Stamatia Giannarou, Su-Lin Lee, and Guang-Zhong Yang

Pareto Front vs. Weighted Sum for Automatic Trajectory Planning of Deep Brain Stimulation . . . 534
Noura Hamzé, Jimmy Voirin, Pierre Collet, Pierre Jannin,
Claire Haegelen, and Caroline Essert

Efficient Anatomy Driven Automated Multiple Trajectory Planning for Intracranial Electrode Implantation . . . 542
Rachel Sparks, Gergely Zombori, Roman Rodionov, Maria A. Zuluaga,
Beate Diehl, Tim Wehner, Anna Miserocchi, Andrew W. McEvoy,
John S. Duncan, and Sebastien Ourselin

Recognizing Surgical Activities with Recurrent Neural Networks . . . 551
Robert DiPietro, Colin Lea, Anand Malpani, Narges Ahmidi,
S. Swaroop Vedula, Gyusung I. Lee, Mija R. Lee, and Gregory D. Hager

Two-Stage Simulation Method to Improve Facial Soft Tissue Prediction Accuracy for Orthognathic Surgery . . . 559
Daeseung Kim, Chien-Ming Chang, Dennis Chun-Yu Ho,
Xiaoyan Zhang, Shunyao Shen, Peng Yuan, Huaming Mai,
Guangming Zhang, Xiaobo Zhou, Jaime Gateno,
Michael A.K. Liebschner, and James J. Xia

Ultrasound Image Analysis

Hand-Held Sound-Speed Imaging Based on Ultrasound Reflector Delineation . . . 568
Sergio J. Sanabria and Orcun Goksel

Ultrasound Tomosynthesis: A New Paradigm for Quantitative Imaging of the Prostate . . . 577
Fereshteh Aalamifar, Reza Seifabadi, Marcelino Bernardo,
Ayele H. Negussie, Baris Turkbey, Maria Merino, Peter Pinto,
Arman Rahmim, Bradford J. Wood, and Emad M. Boctor

Photoacoustic Imaging Paradigm Shift: Towards Using Vendor-Independent Ultrasound Scanners . . . 585
Haichong K. Zhang, Xiaoyu Guo, Behnoosh Tavakoli,
and Emad M. Boctor

4D Reconstruction of Fetal Heart Ultrasound Images in Presence of Fetal Motion . . . 593
Christine Tanner, Barbara Flach, Céline Eggenberger,
Oliver Mattausch, Michael Bajka, and Orcun Goksel

Towards Reliable Automatic Characterization of Neonatal Hip Dysplasia from 3D Ultrasound Images . . . 602
Niamul Quader, Antony Hodgson, Kishore Mulpuri, Anthony Cooper,
and Rafeef Abugharbieh

Cancer Image Analysis

Image-Based Computer-Aided Diagnostic System for Early Diagnosis of Prostate Cancer . . . 610
Islam Reda, Ahmed Shalaby, Mohammed Elmogy, Ahmed Aboulfotouh,
Fahmi Khalifa, Mohamed Abou El-Ghar, Georgy Gimelfarb,
and Ayman El-Baz

Multidimensional Texture Analysis for Improved Prediction of Ultrasound Liver Tumor Response to Chemotherapy Treatment . . . 619
Omar S. Al-Kadi, Dimitri Van De Ville, and Adrien Depeursinge

Classification of Prostate Cancer Grades and T-Stages Based on Tissue Elasticity Using Medical Image Analysis . . . 627
Shan Yang, Vladimir Jojic, Jun Lian, Ronald Chen, Hongtu Zhu,
and Ming C. Lin

Automatic Determination of Hormone Receptor Status in Breast Cancer Using Thermography . . . 636
Siva Teja Kakileti, Krithika Venkataramani, and Himanshu J. Madhu

Prostate Cancer: Improved Tissue Characterization by Temporal Modeling of Radio-Frequency Ultrasound Echo Data . . . 644
Layan Nahlawi, Farhad Imani, Mena Gaed, Jose A. Gomez,
Madeleine Moussa, Eli Gibson, Aaron Fenster, Aaron D. Ward,
Purang Abolmaesumi, Hagit Shatkay, and Parvin Mousavi

Classifying Cancer Grades Using Temporal Ultrasound for Transrectal Prostate Biopsy . . . 653
Shekoofeh Azizi, Farhad Imani, Jin Tae Kwak, Amir Tahmasebi,
Sheng Xu, Pingkun Yan, Jochen Kruecker, Baris Turkbey, Peter Choyke,
Peter Pinto, Bradford Wood, Parvin Mousavi, and Purang Abolmaesumi

Characterization of Lung Nodule Malignancy Using Hybrid Shape and Appearance Features . . . 662
Mario Buty, Ziyue Xu, Mingchen Gao, Ulas Bagci, Aaron Wu,
and Daniel J. Mollura

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671


Contents – Part II

Machine Learning and Feature Selection

Feature Selection Based on Iterative Canonical Correlation Analysis for Automatic Diagnosis of Parkinson’s Disease . . . 1
Luyan Liu, Qian Wang, Ehsan Adeli, Lichi Zhang, Han Zhang,
and Dinggang Shen

Identifying Relationships in Functional and Structural Connectome Data Using a Hypergraph Learning Method . . . 9
Brent C. Munsell, Guorong Wu, Yue Gao, Nicholas Desisto,
and Martin Styner

Ensemble Hierarchical High-Order Functional Connectivity Networks for MCI Classification . . . 18
Xiaobo Chen, Han Zhang, and Dinggang Shen

Outcome Prediction for Patient with High-Grade Gliomas from Brain Functional and Structural Networks . . . 26
Luyan Liu, Han Zhang, Islem Rekik, Xiaobo Chen, Qian Wang,
and Dinggang Shen

Mammographic Mass Segmentation with Online Learned Shape and Appearance Priors . . . 35
Menglin Jiang, Shaoting Zhang, Yuanjie Zheng,
and Dimitris N. Metaxas

Differential Dementia Diagnosis on Incomplete Data with Latent Trees . . . 44
Christian Ledig, Sebastian Kaltwang, Antti Tolonen, Juha Koikkalainen,
Philip Scheltens, Frederik Barkhof, Hanneke Rhodius-Meester,
Betty Tijms, Afina W. Lemstra, Wiesje van der Flier, Jyrki Lötjönen,
and Daniel Rueckert

Bridging Computational Features Toward Multiple Semantic Features with Multi-task Regression: A Study of CT Pulmonary Nodules . . . 53
Sihong Chen, Dong Ni, Jing Qin, Baiying Lei, Tianfu Wang,
and Jie-Zhi Cheng

Robust Cancer Treatment Outcome Prediction Dealing with Small-Sized and Imbalanced Data from FDG-PET Images . . . 61
Chunfeng Lian, Su Ruan, Thierry Denœux, Hua Li,
and Pierre Vera

Structured Sparse Kernel Learning for Imaging Genetics Based Alzheimer’s Disease Diagnosis . . . 70
Jailin Peng, Le An, Xiaofeng Zhu, Yan Jin, and Dinggang Shen

Semi-supervised Hierarchical Multimodal Feature and Sample Selection for Alzheimer’s Disease Diagnosis . . . 79
Le An, Ehsan Adeli, Mingxia Liu, Jun Zhang, and Dinggang Shen

Stability-Weighted Matrix Completion of Incomplete Multi-modal Data for Disease Diagnosis . . . 88
Kim-Han Thung, Ehsan Adeli, Pew-Thian Yap, and Dinggang Shen

Employing Visual Analytics to Aid the Design of White Matter Hyperintensity Classifiers . . . 97
Renata Georgia Raidou, Hugo J. Kuijf, Neda Sepasian, Nicola Pezzotti,
Willem H. Bouvy, Marcel Breeuwer, and Anna Vilanova

Deep Learning in Medical Imaging

The Automated Learning of Deep Features for Breast Mass Classification from Mammograms . . . 106
Neeraj Dhungel, Gustavo Carneiro, and Andrew P. Bradley

Multimodal Deep Learning for Cervical Dysplasia Diagnosis . . . 115
Tao Xu, Han Zhang, Xiaolei Huang, Shaoting Zhang,
and Dimitris N. Metaxas

Learning from Experts: Developing Transferable Deep Features for Patient-Level Lung Cancer Prediction . . . 124
Wei Shen, Mu Zhou, Feng Yang, Di Dong, Caiyun Yang, Yali Zang,
and Jie Tian

DeepVessel: Retinal Vessel Segmentation via Deep Learning and Conditional Random Field . . . 132
Huazhu Fu, Yanwu Xu, Stephen Lin, Damon Wing Kee Wong,
and Jiang Liu

Deep Retinal Image Understanding . . . 140
Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Pablo Arbeláez,
and Luc Van Gool

3D Deeply Supervised Network for Automatic Liver Segmentation from CT Volumes . . . 149
Qi Dou, Hao Chen, Yueming Jin, Lequan Yu, Jing Qin,
and Pheng-Ann Heng

Deep Neural Networks for Fast Segmentation of 3D Medical Images . . . 158
Karl Fritscher, Patrik Raudaschl, Paolo Zaffino,
Maria Francesca Spadea, Gregory C. Sharp, and Rainer Schubert

SpineNet: Automatically Pinpointing Classification Evidence in Spinal MRIs . . . 166
Amir Jamaludin, Timor Kadir, and Andrew Zisserman

A Deep Learning Approach for Semantic Segmentation in Histology Tissue Images . . . 176
Jiazhuo Wang, John D. MacKenzie, Rageshree Ramachandran,
and Danny Z. Chen

Spatial Clockwork Recurrent Neural Network for Muscle Perimysium Segmentation . . . 185
Yuanpu Xie, Zizhao Zhang, Manish Sapkota, and Lin Yang

Automated Age Estimation from Hand MRI Volumes Using Deep Learning . . . 194
Darko Štern, Christian Payer, Vincent Lepetit, and Martin Urschler

Real-Time Standard Scan Plane Detection and Localisation in Fetal Ultrasound Using Fully Convolutional Neural Networks . . . 203
Christian F. Baumgartner, Konstantinos Kamnitsas, Jacqueline Matthew,
Sandra Smith, Bernhard Kainz, and Daniel Rueckert

3D Deep Learning for Multi-modal Imaging-Guided Survival Time Prediction of Brain Tumor Patients . . . 212
Dong Nie, Han Zhang, Ehsan Adeli, Luyan Liu, and Dinggang Shen

Applications of Machine Learning

From Local to Global Random Regression Forests: Exploring Anatomical Landmark Localization . . . 221
Darko Štern, Thomas Ebner, and Martin Urschler

Regressing Heatmaps for Multiple Landmark Localization Using CNNs . . . 230
Christian Payer, Darko Štern, Horst Bischof, and Martin Urschler

Self-Transfer Learning for Weakly Supervised Lesion Localization . . . 239
Sangheum Hwang and Hyo-Eun Kim

Automatic Cystocele Severity Grading in Ultrasound by Spatio-Temporal Regression . . . 247
Dong Ni, Xing Ji, Yaozong Gao, Jie-Zhi Cheng, Huifang Wang,
Jing Qin, Baiying Lei, Tianfu Wang, Guorong Wu, and Dinggang Shen

Graphical Modeling of Ultrasound Propagation in Tissue for Automatic Bone Segmentation . . . 256
Firat Ozdemir, Ece Ozkan, and Orcun Goksel

Bayesian Image Quality Transfer . . . 265
Ryutaro Tanno, Aurobrata Ghosh, Francesco Grussu, Enrico Kaden,
Antonio Criminisi, and Daniel C. Alexander

Wavelet Appearance Pyramids for Landmark Detection and Pathology Classification: Application to Lumbar Spinal Stenosis . . . 274
Qiang Zhang, Abhir Bhalerao, Caron Parsons, Emma Helm,
and Charles Hutchinson

A Learning-Free Approach to Whole Spine Vertebra Localization in MRI . . . 283
Marko Rak and Klaus-Dietz Tönnies

Automatic Quality Control for Population Imaging: A Generic Unsupervised Approach . . . 291
Mohsen Farzi, Jose M. Pozo, Eugene V. McCloskey, J. Mark Wilkinson,
and Alejandro F. Frangi

A Cross-Modality Neural Network Transform for Semi-automatic Medical Image Annotation . . . 300
Mehdi Moradi, Yufan Guo, Yaniv Gur, Mohammadreza Negahdar,
and Tanveer Syeda-Mahmood

Sub-category Classifiers for Multiple-instance Learning and Its Application to Retinal Nerve Fiber Layer Visibility Classification . . . 308
Siyamalan Manivannan, Caroline Cobb, Stephen Burgess,
and Emanuele Trucco

Vision-Based Classification of Developmental Disorders Using Eye-Movements . . . 317
Guido Pusiol, Andre Esteva, Scott S. Hall, Michael Frank,
Arnold Milstein, and Li Fei-Fei

Scalable Unsupervised Domain Adaptation for Electron Microscopy . . . 326
Róger Bermúdez-Chacón, Carlos Becker, Mathieu Salzmann,
and Pascal Fua

Automated Diagnosis of Neural Foraminal Stenosis Using Synchronized Superpixels Representation . . . 335
Xiaoxu He, Yilong Yin, Manas Sharma, Gary Brahm, Ashley Mercado,
and Shuo Li

Segmentation

Automated Segmentation of Knee MRI Using Hierarchical Classifiers and Just Enough Interaction Based Learning: Data from Osteoarthritis Initiative . . . 344
Satyananda Kashyap, Ipek Oguz, Honghai Zhang, and Milan Sonka

Dynamically Balanced Online Random Forests for Interactive Scribble-Based Segmentation . . . 352
Guotai Wang, Maria A. Zuluaga, Rosalind Pratt, Michael Aertsen,
Tom Doel, Maria Klusmann, Anna L. David, Jan Deprest,
Tom Vercauteren, and Sébastien Ourselin

Orientation-Sensitive Overlap Measures for the Validation of Medical Image Segmentations . . . 361
Tasos Papastylianou, Erica Dall'Armellina, and Vicente Grau

High-Throughput Glomeruli Analysis of µCT Kidney Images Using Tree Priors and Scalable Sparse Computation . . . 370
Carlos Correa Shokiche, Philipp Baumann, Ruslan Hlushchuk,
Valentin Djonov, and Mauricio Reyes

A Surface Patch-Based Segmentation Method for Hippocampal Subfields . . . 379
Benoit Caldairou, Boris C. Bernhardt, Jessie Kulaga-Yoskovitz,
Hosung Kim, Neda Bernasconi, and Andrea Bernasconi

Automatic Lymph Node Cluster Segmentation Using Holistically-Nested Neural Networks and Structured Optimization in CT Images . . . 388
Isabella Nogues, Le Lu, Xiaosong Wang, Holger Roth, Gedas Bertasius,
Nathan Lay, Jianbo Shi, Yohannes Tsehay, and Ronald M. Summers

Evaluation-Oriented Training via Surrogate Metrics for Multiple Sclerosis Segmentation . . . 398
Michel M. Santos, Paula R.B. Diniz, Abel G. Silva-Filho,
and Wellington P. Santos

Corpus Callosum Segmentation in Brain MRIs via Robust Target-Localization and Joint Supervised Feature Extraction and Prediction . . . 406
Lisa Y.W. Tang, Tom Brosch, XingTong Liu, Youngjin Yoo,
Anthony Traboulsee, David Li, and Roger Tam

Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields . . . 415
Patrick Ferdinand Christ, Mohamed Ezzeldin A. Elshaer,
Florian Ettlinger, Sunil Tatavarty, Marc Bickel, Patrick Bilic,
Markus Rempfler, Marco Armbruster, Felix Hofmann,
Melvin D’Anastasi, Wieland H. Sommer, Seyed-Ahmad Ahmadi,
and Bjoern H. Menze

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation . . . 424
Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox,
and Olaf Ronneberger

Model-Based Segmentation of Vertebral Bodies from MR Images with 3D CNNs . . . 433
Robert Korez, Boštjan Likar, Franjo Pernuš, and Tomaž Vrtovec

Pancreas Segmentation in MRI Using Graph-Based Decision Fusion on Convolutional Neural Networks . . . 442
Jinzheng Cai, Le Lu, Zizhao Zhang, Fuyong Xing, Lin Yang,
and Qian Yin

Spatial Aggregation of Holistically-Nested Networks for Automated Pancreas Segmentation . . . 451
Holger R. Roth, Le Lu, Amal Farag, Andrew Sohn,
and Ronald M. Summers

Topology Aware Fully Convolutional Networks for Histology Gland Segmentation . . . 460
Aïcha BenTaieb and Ghassan Hamarneh

HeMIS: Hetero-Modal Image Segmentation . . . 469
Mohammad Havaei, Nicolas Guizard, Nicolas Chapados,
and Yoshua Bengio

Deep Learning for Multi-task Medical Image Segmentation in Multiple Modalities . . . 478
Pim Moeskops, Jelmer M. Wolterink, Bas H.M. van der Velden,
Kenneth G.A. Gilhuijs, Tim Leiner, Max A. Viergever, and Ivana Išgum

Iterative Multi-domain Regularized Deep Learning for Anatomical Structure Detection and Segmentation from Ultrasound Images . . . 487
Hao Chen, Yefeng Zheng, Jin-Hyeong Park, Pheng-Ann Heng,
and S. Kevin Zhou

Gland Instance Segmentation by Deep Multichannel Side Supervision . . . 496
Yan Xu, Yang Li, Mingyuan Liu, Yipei Wang, Maode Lai,
and Eric I-Chao Chang

Enhanced Probabilistic Label Fusion by Estimating Label Confidences Through Discriminative Learning . . . 505
Oualid M. Benkarim, Gemma Piella, Miguel Angel González Ballester,
and Gerard Sanroma

Feature Sensitive Label Fusion with Random Walker for Atlas-Based Image Segmentation . . . 513
Siqi Bao and Albert C.S. Chung

Deep Fusion Net for Multi-atlas Segmentation: Application to Cardiac MR Images . . . 521
Heran Yang, Jian Sun, Huibin Li, Lisheng Wang, and Zongben Xu

Prior-Based Coregistration and Cosegmentation . . . 529
Mahsa Shakeri, Enzo Ferrante, Stavros Tsogkas, Sarah Lippé,
Samuel Kadoury, Iasonas Kokkinos, and Nikos Paragios

Globally Optimal Label Fusion with Shape Priors . . . 538
Ipek Oguz, Satyananda Kashyap, Hongzhi Wang, Paul Yushkevich,
and Milan Sonka

Joint Segmentation and CT Synthesis for MRI-only Radiotherapy Treatment Planning . . . 547
Ninon Burgos, Filipa Guerreiro, Jamie McClelland, Simeon Nill,
David Dearnaley, Nandita deSouza, Uwe Oelfke, Antje-Christin Knopf,
Sébastien Ourselin, and M. Jorge Cardoso

Regression Forest-Based Atlas Localization and Direction Specific Atlas Generation for Pancreas Segmentation . . . 556
Masahiro Oda, Natsuki Shimizu, Ken’ichi Karasawa, Yukitaka Nimura,
Takayuki Kitasaka, Kazunari Misawa, Michitaka Fujiwara,
Daniel Rueckert, and Kensaku Mori

Accounting for the Confound of Meninges in Segmenting Entorhinal and Perirhinal Cortices in T1-Weighted MRI . . . 564
Long Xie, Laura E.M. Wisse, Sandhitsu R. Das, Hongzhi Wang,
David A. Wolk, Jose V. Manjón, and Paul A. Yushkevich

7T-Guided Learning Framework for Improving the Segmentation of 3T MR Images . . . 572
Khosro Bahrami, Islem Rekik, Feng Shi, Yaozong Gao,
and Dinggang Shen

Multivariate Mixture Model for Cardiac Segmentation from Multi-Sequence MRI . . . 581
Xiahai Zhuang

Fast Fully Automatic Segmentation of the Human Placenta from Motion Corrupted MRI . . . 589
Amir Alansary, Konstantinos Kamnitsas, Alice Davidson,
Rostislav Khlebnikov, Martin Rajchl, Christina Malamateniou,
Mary Rutherford, Joseph V. Hajnal, Ben Glocker, Daniel Rueckert,
and Bernhard Kainz

Multi-organ Segmentation Using Vantage Point Forests and Binary Context Features . . . 598
Mattias P. Heinrich and Maximilian Blendowski

Multiple Object Segmentation and Tracking by Bayes Risk Minimization . . . 607
Tomáš Sixta and Boris Flach

Crowd-Algorithm Collaboration for Large-Scale Endoscopic Image Annotation with Confidence . . . 616
L. Maier-Hein, T. Ross, J. Gröhl, B. Glocker, S. Bodenstedt, C. Stock,
E. Heim, M. Götz, S. Wirkert, H. Kenngott, S. Speidel,
and K. Maier-Hein

Emphysema Quantification on Cardiac CT Scans Using Hidden Markov Measure Field Model: The MESA Lung Study . . . 624
Jie Yang, Elsa D. Angelini, Pallavi P. Balte, Eric A. Hoffman,
Colin O. Wu, Bharath A. Venkatesh, R. Graham Barr,
and Andrew F. Laine

Cell Image Analysis

Cutting Out the Middleman: Measuring Nuclear Area in Histopathology Slides Without Segmentation . . . 632
Mitko Veta, Paul J. van Diest, and Josien P.W. Pluim

Subtype Cell Detection with an Accelerated Deep Convolution Neural Network . . . 640
Sheng Wang, Jiawen Yao, Zheng Xu, and Junzhou Huang

Imaging Biomarker Discovery for Lung Cancer Survival Prediction . . . 649
Jiawen Yao, Sheng Wang, Xinliang Zhu, and Junzhou Huang

3D Segmentation of Glial Cells Using Fully Convolutional Networks and k-Terminal Cut . . . 658
Lin Yang, Yizhe Zhang, Ian H. Guldner, Siyuan Zhang,
and Danny Z. Chen

Detection of Differentiated vs. Undifferentiated Colonies of iPS Cells Using Random Forests Modeled with the Multivariate Polya Distribution . . . 667
Bisser Raytchev, Atsuki Masuda, Masatoshi Minakawa, Kojiro Tanaka,
Takio Kurita, Toru Imamura, Masashi Suzuki, Toru Tamaki,
and Kazufumi Kaneda

Detecting 10,000 Cells in One Second . . . 676
Zheng Xu and Junzhou Huang

A Hierarchical Convolutional Neural Network for Mitosis Detection in Phase-Contrast Microscopy Images . . . 685
Yunxiang Mao and Zhaozheng Yin
Yunxiang Mao and Zhaozheng Yin

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693


Contents – Part III

Registration and Deformation Estimation

Learning-Based Multimodal Image Registration for Prostate Cancer Radiation Therapy . . . 1
Xiaohuan Cao, Yaozong Gao, Jianhua Yang, Guorong Wu,
and Dinggang Shen

A Deep Metric for Multimodal Registration . . . 10
Martin Simonovsky, Benjamín Gutiérrez-Becker, Diana Mateus,
Nassir Navab, and Nikos Komodakis

Learning Optimization Updates for Multimodal Registration . . . 19
Benjamín Gutiérrez-Becker, Diana Mateus, Loïc Peter,
and Nassir Navab

Memory Efficient LDDMM for Lung CT . . . 28
Thomas Polzin, Marc Niethammer, Mattias P. Heinrich, Heinz Handels,
and Jan Modersitzki

Inertial Demons: A Momentum-Based Diffeomorphic Registration Framework . . . 37
Andre Santos-Ribeiro, David J. Nutt, and John McGonigle

Diffeomorphic Density Registration in Thoracic Computed Tomography . . . 46
Caleb Rottman, Ben Larson, Pouya Sabouri, Amit Sawant,
and Sarang Joshi

Temporal Registration in In-Utero Volumetric MRI Time Series . . . 54
Ruizhi Liao, Esra A. Turk, Miaomiao Zhang, Jie Luo, P. Ellen Grant,
Elfar Adalsteinsson, and Polina Golland

Probabilistic Atlas of the Human Hippocampus Combining Ex Vivo MRI and Histology . . . 63
Daniel H. Adler, Ranjit Ittyerah, John Pluta, Stephen Pickup,
Weixia Liu, David A. Wolk, and Paul A. Yushkevich

Deformation Estimation with Automatic Sliding Boundary Computation . . . 72
Joseph Samuel Preston, Sarang Joshi, and Ross Whitaker

Bilateral Weighted Adaptive Local Similarity Measure for Registration in Neurosurgery . . . 81
Martin Kochan, Marc Modat, Tom Vercauteren, Mark White, Laura Mancini,
Gavin P. Winston, Andrew W. McEvoy, John S. Thornton, Tarek Yousry,
John S. Duncan, Sébastien Ourselin, and Danail Stoyanov

Model-Based Regularisation for Respiratory Motion Estimation with Sparse Features in Image-Guided Interventions . . . 89
Matthias Wilms, In Young Ha, Heinz Handels,
and Mattias Paul Heinrich

Carotid Artery Wall Motion Estimated from Ultrasound Imaging Sequences Using a Nonlinear State Space Approach . . . 98
Zhifan Gao, Yuanyuan Sun, Heye Zhang, Dhanjoo Ghista, Yanjie Li,
Huahua Xiong, Xin Liu, Yaoqin Xie, Wanqing Wu, and Shuo Li

Accuracy Estimation for Medical Image Registration Using Regression Forests . . . 107
Hessam Sokooti, Gorkem Saygili, Ben Glocker,
Boudewijn P.F. Lelieveldt, and Marius Staring

Embedding Segmented Volume in Finite Element Mesh with Topology Preservation . . . 116
Kazuya Sase, Teppei Tsujita, and Atsushi Konno

Deformable 3D-2D Registration of Known Components for Image Guidance in Spine Surgery . . . 124
A. Uneri, J. Goerres, T. De Silva, M.W. Jacobson, M.D. Ketcha,
S. Reaungamornrat, G. Kleinszig, S. Vogt, A.J. Khanna, J.-P. Wolinsky,
and J.H. Siewerdsen

Anatomically Constrained Video-CT Registration via the V-IMLOP Algorithm . . . 133
Seth D. Billings, Ayushi Sinha, Austin Reiter, Simon Leonard,
Masaru Ishii, Gregory D. Hager, and Russell H. Taylor

Shape Modeling

A Multi-resolution T-Mixture Model Approach to Robust Group-Wise Alignment of Shapes . . . 142
Nishant Ravikumar, Ali Gooya, Serkan Çimen, Alejandro F. Frangi,
and Zeike A. Taylor

Quantifying Shape Deformations by Variation of Geometric Spectrum . . . 150
Hajar Hamidian, Jiaxi Hu, Zichun Zhong, and Jing Hua

Myocardial Segmentation of Contrast Echocardiograms Using Random Forests Guided by Shape Model . . . 158
Yuanwei Li, Chin Pang Ho, Navtej Chahal, Roxy Senior,
and Meng-Xing Tang

Low-Dimensional Statistics of Anatomical Variability via Compact Representation of Image Deformations . . . 166
Miaomiao Zhang, William M. Wells III, and Polina Golland

A Multiscale Cardiac Model for Fast Personalisation and Exploitation . . . 174
Roch Mollero, Xavier Pennec, Hervé Delingette, Nicholas Ayache, and Maxime Sermesant

Transfer Shape Modeling Towards High-Throughput Microscopy Image Segmentation . . . 183
Fuyong Xing, Xiaoshuang Shi, Zizhao Zhang, JinZheng Cai, Yuanpu Xie, and Lin Yang

Hierarchical Generative Modeling and Monte-Carlo EM in Riemannian Shape Space for Hypothesis Testing . . . 191
Saurabh J. Shigwan and Suyash P. Awate

Direct Estimation of Wall Shear Stress from Aneurysmal Morphology: A Statistical Approach . . . 201
Ali Sarrami-Foroushani, Toni Lassila, Jose M. Pozo, Ali Gooya, and Alejandro F. Frangi

Multi-task Shape Regression for Medical Image Segmentation . . . 210
Xiantong Zhen, Yilong Yin, Mousumi Bhaduri, Ilanit Ben Nachum, David Laidley, and Shuo Li

Soft Multi-organ Shape Models via Generalized PCA: A General Framework . . . 219
Juan J. Cerrolaza, Ronald M. Summers, and Marius George Linguraru

An Artificial Agent for Anatomical Landmark Detection in Medical Images . . . 229
Florin C. Ghesu, Bogdan Georgescu, Tommaso Mansi, Dominik Neumann, Joachim Hornegger, and Dorin Comaniciu

Cardiac and Vascular Image Analysis

Identifying Patients at Risk for Aortic Stenosis Through Learning from Multimodal Data . . . 238
Tanveer Syeda-Mahmood, Yanrong Guo, Mehdi Moradi, D. Beymer, D. Rajan, Yu Cao, Yaniv Gur, and Mohammadreza Negahdar

Multi-input Cardiac Image Super-Resolution Using Convolutional Neural Networks . . . 246
Ozan Oktay, Wenjia Bai, Matthew Lee, Ricardo Guerrero, Konstantinos Kamnitsas, Jose Caballero, Antonio de Marvao, Stuart Cook, Declan O’Regan, and Daniel Rueckert

GPNLPerf: Robust 4d Non-rigid Motion Correction for Myocardial Perfusion Analysis . . . 255
S. Thiruvenkadam, K.S. Shriram, B. Patil, G. Nicolas, M. Teisseire, C. Cardon, J. Knoplioch, N. Subramanian, S. Kaushik, and R. Mullick

Recognizing End-Diastole and End-Systole Frames via Deep Temporal Regression Network . . . 264
Bin Kong, Yiqiang Zhan, Min Shin, Thomas Denny, and Shaoting Zhang

Basal Slice Detection Using Long-Axis Segmentation for Cardiac Analysis . . . 273
Mahsa Paknezhad, Michael S. Brown, and Stephanie Marchesseau

Spatially-Adaptive Multi-scale Optimization for Local Parameter Estimation: Application in Cardiac Electrophysiological Models . . . 282
Jwala Dhamala, John L. Sapp, Milan Horacek, and Linwei Wang

Reconstruction of Coronary Artery Centrelines from X-Ray Angiography Using a Mixture of Student’s t-Distributions . . . 291
Serkan Çimen, Ali Gooya, Nishant Ravikumar, Zeike A. Taylor, and Alejandro F. Frangi

Barycentric Subspace Analysis: A New Symmetric Group-Wise Paradigm for Cardiac Motion Tracking . . . 300
Marc-Michel Rohé, Maxime Sermesant, and Xavier Pennec

Extraction of Coronary Vessels in Fluoroscopic X-Ray Sequences Using Vessel Correspondence Optimization . . . 308
Seung Yeon Shin, Soochahn Lee, Kyoung Jin Noh, Il Dong Yun, and Kyoung Mu Lee

Coronary Centerline Extraction via Optimal Flow Paths and CNN Path Pruning . . . 317
Mehmet A. Gülsün, Gareth Funka-Lea, Puneet Sharma, Saikiran Rapaka, and Yefeng Zheng

Vascular Registration in Photoacoustic Imaging by Low-Rank Alignment via Foreground, Background and Complement Decomposition . . . 326
Ryoma Bise, Yingqiang Zheng, Imari Sato, and Masakazu Toi

From Real MRA to Virtual MRA: Towards an Open-Source Framework . . . 335
N. Passat, S. Salmon, J.-P. Armspach, B. Naegel, C. Prud’homme, H. Talbot, A. Fortin, S. Garnotel, O. Merveille, O. Miraucourt, R. Tarabay, V. Chabannes, A. Dufour, A. Jezierska, O. Balédent, E. Durand, L. Najman, M. Szopos, A. Ancel, J. Baruthio, M. Delbany, S. Fall, G. Pagé, O. Génevaux, M. Ismail, P. Loureiro de Sousa, M. Thiriet, and J. Jomier

Improved Diagnosis of Systemic Sclerosis Using Nailfold Capillary Flow . . . 344
Michael Berks, Graham Dinsdale, Andrea Murray, Tonia Moore, Ariane Herrick, and Chris Taylor

Tensor-Based Graph-Cut in Riemannian Metric Space and Its Application to Renal Artery Segmentation . . . 353
Chenglong Wang, Masahiro Oda, Yuichiro Hayashi, Yasushi Yoshino, Tokunori Yamamoto, Alejandro F. Frangi, and Kensaku Mori

Automatic, Robust, and Globally Optimal Segmentation of Tubular Structures . . . 362
Simon Pezold, Antal Horváth, Ketut Fundana, Charidimos Tsagkas, Michaela Andělová, Katrin Weier, Michael Amann, and Philippe C. Cattin

Dense Volume-to-Volume Vascular Boundary Detection . . . 371
Jameson Merkow, Alison Marsden, David Kriegman, and Zhuowen Tu

HALE: Healthy Area of Lumen Estimation for Vessel Stenosis Quantification . . . 380
Sethuraman Sankaran, Michiel Schaap, Stanley C. Hunley, James K. Min, Charles A. Taylor, and Leo Grady

3D Near Infrared and Ultrasound Imaging of Peripheral Blood Vessels for Real-Time Localization and Needle Guidance . . . 388
Alvin I. Chen, Max L. Balter, Timothy J. Maguire, and Martin L. Yarmush

The Minimum Cost Connected Subgraph Problem in Medical Image Analysis . . . 397
Markus Rempfler, Bjoern Andres, and Bjoern H. Menze

Image Reconstruction

ASL-incorporated Pharmacokinetic Modelling of PET Data With Reduced Acquisition Time: Application to Amyloid Imaging . . . 406
Catherine J. Scott, Jieqing Jiao, Andrew Melbourne, Jonathan M. Schott, Brian F. Hutton, and Sébastien Ourselin

Probe-Based Rapid Hybrid Hyperspectral and Tissue Surface Imaging Aided by Fully Convolutional Networks . . . 414
Jianyu Lin, Neil T. Clancy, Xueqing Sun, Ji Qi, Mirek Janatka, Danail Stoyanov, and Daniel S. Elson

Efficient Low-Dose CT Denoising by Locally-Consistent Non-Local Means (LC-NLM) . . . 423
Michael Green, Edith M. Marom, Nahum Kiryati, Eli Konen, and Arnaldo Mayer

Deep Learning Computed Tomography . . . 432
Tobias Würfl, Florin C. Ghesu, Vincent Christlein, and Andreas Maier

Axial Alignment for Anterior Segment Swept Source Optical Coherence Tomography via Robust Low-Rank Tensor Recovery . . . 441
Yanwu Xu, Lixin Duan, Huazhu Fu, Xiaoqin Zhang, Damon Wing Kee Wong, Baskaran Mani, Tin Aung, and Jiang Liu

3D Imaging from Video and Planar Radiography . . . 450
Julien Pansiot and Edmond Boyer

Semantic Reconstruction-Based Nuclear Cataract Grading from Slit-Lamp Lens Images . . . 458
Yanwu Xu, Lixin Duan, Damon Wing Kee Wong, Tien Yin Wong, and Jiang Liu

Vessel Orientation Constrained Quantitative Susceptibility Mapping (QSM) Reconstruction . . . 467
Suheyla Cetin, Berkin Bilgic, Audrey Fan, Samantha Holdsworth, and Gozde Unal

Spatial-Angular Sparse Coding for HARDI . . . 475
Evan Schwab, René Vidal, and Nicolas Charon

Compressed Sensing Dynamic MRI Reconstruction Using GPU-accelerated 3D Convolutional Sparse Coding . . . 484
Tran Minh Quan and Won-Ki Jeong
MRI Image Analysis

Dynamic Volume Reconstruction from Multi-slice Abdominal MRI Using Manifold Alignment . . . 493
Xin Chen, Muhammad Usman, Daniel R. Balfour, Paul K. Marsden, Andrew J. Reader, Claudia Prieto, and Andrew P. King

Fast and Accurate Multi-tissue Deconvolution Using SHORE and H-psd Tensors . . . 502
Michael Ankele, Lek-Heng Lim, Samuel Groeschel, and Thomas Schultz

Optimisation of Arterial Spin Labelling Using Bayesian Experimental Design . . . 511
David Owen, Andrew Melbourne, David Thomas, Enrico De Vita, Jonathan Rohrer, and Sebastien Ourselin

4D Phase-Contrast Magnetic Resonance CardioAngiography (4D PC-MRCA) Creation from 4D Flow MRI . . . 519
Mariana Bustamante, Vikas Gupta, Carl-Johan Carlhäll, and Tino Ebbers

Joint Estimation of Cardiac Motion and T1 Maps for Magnetic Resonance Late Gadolinium Enhancement Imaging . . . 527
Jens Wetzl, Aurélien F. Stalder, Michaela Schmidt, Yigit H. Akgök, Christoph Tillmanns, Felix Lugauer, Christoph Forman, Joachim Hornegger, and Andreas Maier

Correction of Fat-Water Swaps in Dixon MRI . . . 536
Ben Glocker, Ender Konukoglu, Ioannis Lavdas, Juan Eugenio Iglesias, Eric O. Aboagye, Andrea G. Rockall, and Daniel Rueckert

Motion-Robust Reconstruction Based on Simultaneous Multi-slice Registration for Diffusion-Weighted MRI of Moving Subjects . . . 544
Bahram Marami, Benoit Scherrer, Onur Afacan, Simon K. Warfield, and Ali Gholipour

Self Super-Resolution for Magnetic Resonance Images . . . 553
Amod Jog, Aaron Carass, and Jerry L. Prince

Tight Graph Framelets for Sparse Diffusion MRI q-Space Representation . . . 561
Pew-Thian Yap, Bin Dong, Yong Zhang, and Dinggang Shen

A Bayesian Model to Assess T2 Values and Their Changes Over Time in Quantitative MRI . . . 570
Benoit Combès, Anne Kerbrat, Olivier Commowick, and Christian Barillot

Simultaneous Parameter Mapping, Modality Synthesis, and Anatomical Labeling of the Brain with MR Fingerprinting . . . 579
Pedro A. Gómez, Miguel Molina-Romero, Cagdas Ulas, Guido Bounincontri, Jonathan I. Sperl, Derek K. Jones, Marion I. Menzel, and Bjoern H. Menze

XQ-NLM: Denoising Diffusion MRI Data via x-q Space Non-local Patch Matching . . . 587
Geng Chen, Yafeng Wu, Dinggang Shen, and Pew-Thian Yap

Spatially Adaptive Spectral Denoising for MR Spectroscopic Imaging using Frequency-Phase Non-local Means . . . 596
Dhritiman Das, Eduardo Coello, Rolf F. Schulte, and Bjoern H. Menze

Beyond the Resolution Limit: Diffusion Parameter Estimation in Partial Volume . . . 605
Zach Eaton-Rosen, Andrew Melbourne, M. Jorge Cardoso, Neil Marlow, and Sebastien Ourselin

A Promising Non-invasive CAD System for Kidney Function Assessment . . . 613
M. Shehata, F. Khalifa, A. Soliman, M. Abou El-Ghar, A. Dwyer, G. Gimel’farb, R. Keynton, and A. El-Baz

Comprehensive Maximum Likelihood Estimation of Diffusion Compartment Models Towards Reliable Mapping of Brain Microstructure . . . 622
Aymeric Stamm, Olivier Commowick, Simon K. Warfield, and S. Vantini

Author Index . . . 631


Ordinal Patterns for Connectivity Networks
in Brain Disease Diagnosis

Mingxia Liu, Junqiang Du, Biao Jie, and Daoqiang Zhang(B)

School of Computer Science and Technology,
Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
dqzhang@nuaa.edu.cn

Abstract. Brain connectivity networks have been widely used for diag-
nosis of brain-related diseases, e.g., Alzheimer’s disease (AD), mild cog-
nitive impairment (MCI), and attention deficit hyperactivity disorder
(ADHD). Although several network descriptors have been designed for
representing brain connectivity networks, most of them ignore the important weight information of edges and, by focusing only on individual brain regions, cannot capture the modular local structures of brain connectivity networks. In this paper, we propose a new network
descriptor (called ordinal pattern) for brain connectivity networks, and
apply it for brain disease diagnosis. Specifically, we first define ordinal
patterns that contain sequences of weighted edges based on a functional
connectivity network. A frequent ordinal pattern mining algorithm is
then developed to identify those frequent ordinal patterns in a brain
connectivity network set. We further perform discriminative ordinal pattern selection, followed by an SVM classification step. Experimental
results on both the ADNI and the ADHD-200 data sets demonstrate
that the proposed method achieves significant improvement compared
with state-of-the-art brain connectivity network based methods.

1 Introduction
As a modern brain mapping technique, functional magnetic resonance imaging (fMRI) is an efficient and non-invasive way to map the patterns of functional connectivity of the human brain [1,2]. In particular, brain networks derived from task-free (resting-state) functional magnetic resonance imaging (rs-fMRI) exhibit a small-world architecture, which reflects a robust functional organization of the brain. Recent
studies [3–6] show the great promise of brain connectivity networks in understanding the pathology of brain diseases (e.g., AD, MCI, and ADHD) by exploring anatomical connections or functional interactions among different brain regions, where
brain regions are treated as nodes and anatomical connections or functional
associations are regarded as edges.
Several network descriptors have been developed for representing brain con-
nectivity networks, such as node degrees [3], clustering coefficients [4], and sub-
networks [7]. Most of existing descriptors are designed on un-weighted brain con-
nectivity networks, where the valuable weight information of edges are ignored.
M. Liu and J. Du—These authors contributed equally to this paper.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 1–9, 2016.
DOI: 10.1007/978-3-319-46720-7_1
2 M. Liu et al.

Actually, different edges are usually assigned different weights to measure the
connectivity strength between pairs of nodes (w.r.t. brain regions). However, pre-
vious studies usually simply apply thresholds to transform the original weighted
networks into un-weighted ones [2,5], which may lead to sub-optimal learning
performance. In addition, existing descriptors mainly focus on individual brain
regions rather than on local structures of brain networks, while much evidence indicates that some brain diseases (e.g., AD and MCI) are highly related to modular local structures [8]. Unfortunately, it is hard to capture such local structures
using existing network descriptors.

Fig. 1. An overview of ordinal pattern based learning for brain disease diagnosis.

In this paper, we propose a new network descriptor, i.e., ordinal pattern, for brain connectivity networks. The basic idea of the ordinal pattern is to
construct a sequence of weighted edges on a weighted network by considering
both the edge weights and the ordinal relations between edges. Compared with
conventional network descriptors, ordinal patterns are directly constructed on
weighted networks, which can naturally preserve the weight information and
local structures of original networks. Then, an ordinal pattern based learning
method is developed for brain disease diagnosis. Figure 1 presents the schematic
diagram of the proposed framework with each network representing a specific
subject. We first construct ordinal patterns on patients’ and normal controls’
(NCs) brain connectivity networks separately. A frequent ordinal pattern mining
algorithm is then developed to identify ordinal patterns that frequently occur in
patients’ and NCs’ brain networks. We then select the most discriminative ordinal patterns from those frequent ordinal patterns, and regard them as the feature representation of each subject. Finally, we learn a support vector machine (SVM)
classifier for brain disease diagnosis, by using ordinal pattern based feature
representation.
Ordinal Patterns for Brain Connectivity Network 3

2 Method
2.1 Data and Preprocessing

The first data set contains rs-fMRI data from the ADNI1 database, with 34 AD patients, 99 MCI patients, and 50 NCs. The rs-fMRI data were pre-processed by brain skull removal, motion correction, temporal pre-whitening, spatial smoothing, global drift removal, slice time correction, and band-pass filtering. By warping the automated anatomical labelling (AAL) [9] template, for each subject, we parcellate the brain space of the rs-fMRI scans into 90 regions of interest (ROIs). For each ROI, the rs-fMRI time series of all voxels were averaged to obtain the mean time series of the ROI. With ROIs as nodes and Pearson correlations between pairs of ROIs as connectivity weights, a fully connected weighted functional network is constructed for each subject. The second data set is ADHD-200 with the Athena preprocessed rs-fMRI data, including 118 ADHD patients and 98 NCs (a detailed description of data acquisition and post-processing is given online2).
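The network construction just described can be sketched in a few lines. The sketch below is a toy illustration only: the dict-of-edges representation, the function names, and the three-ROI example are ours, not part of the paper's actual pipeline (which uses 90 AAL ROIs per subject).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity_network(roi_series):
    """Fully connected weighted network: one edge per pair of ROIs,
    weighted by the Pearson correlation of their mean time series.
    roi_series: dict mapping ROI name -> list of time points.
    Returns a dict mapping frozenset({roi_a, roi_b}) -> edge weight."""
    return {frozenset((a, b)): pearson(roi_series[a], roi_series[b])
            for a, b in combinations(roi_series, 2)}

# Toy example with 3 "ROIs" and 4 time points each
ts = {"A": [1.0, 2.0, 3.0, 4.0],
      "B": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with A
      "C": [4.0, 3.0, 2.0, 1.0]}   # perfectly anti-correlated with A
net = connectivity_network(ts)
```

The edge-as-frozenset layout is convenient later, since ordinal patterns are sequences of exactly such edges.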

2.2 Ordinal Pattern and Frequent Ordinal Pattern

Definition 1: Ordinal Pattern. Let G = {V, E, w} denote a weighted network, where V is a set of nodes, E is a set of edges, and w is the weight vector for those edges, with the i-th element w(ei) representing the weight of the edge ei. If w(ei) > w(ej) for all 0 < i < j ≤ M, an ordinal pattern (op) of G is defined as op = {e1, e2, · · · , eM} ⊆ E, where M is the number of edges in op.
An illustration of the proposed ordinal patterns is given in Fig. 2(a), where
a weighted network contains 5 nodes and 7 edges. We can get ordinal patterns
that contain two edges, e.g., op1 = {ea−b , eb−c } and op2 = {eb−c , ec−e }. The
ordinal pattern op1 actually denotes w(ea−b ) > w(eb−c ). We can further obtain

Fig. 2. Illustration of (a) ordinal patterns, and (b) frequent ordinal pattern mining method.

1 http://adni.loni.usc.edu/.
2 http://www.nitrc.org/plugins/mwiki/index.php/neurobureau:AthenaPipeline.

ordinal patterns containing three edges, e.g., op4 = {ea−b , eb−c , ec−e }. Hence, the
proposed ordinal pattern can be regarded as the combination of some ordinal
relations between pairs of edges. We only consider connected ordinal patterns
in this study. That is, an ordinal pattern is connected if and only if the edges
it contains can construct a connected sub-network. Different from conventional
methods, the ordinal pattern is defined on a weighted network directly to explic-
itly utilize the weight information of edges. Also, as a special sub-network, an
ordinal pattern can model the ordinal relations conveyed in a weighted network,
and thus, can naturally preserve the local structures of the network.
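Definition 1 together with the connectedness requirement translates directly into code. In the sketch below, the representation is our own (a network is a dict mapping frozenset edges to weights, with weight values chosen to be consistent with the ordinal relations quoted from Fig. 2(a)); a sequence of edges is accepted only if its weights strictly decrease and its edges form a connected sub-network:

```python
E = frozenset  # shorthand: an edge is a frozenset of its two node labels

def is_ordinal_pattern(edge_seq, weights):
    """True iff edge_seq = (e1, ..., eM) is a connected ordinal pattern of
    the weighted network `weights` (dict: edge -> weight): w(ei) > w(ej)
    for all i < j, and the edges form a connected sub-network."""
    if any(e not in weights for e in edge_seq):
        return False
    ws = [weights[e] for e in edge_seq]
    if not all(ws[i] > ws[i + 1] for i in range(len(ws) - 1)):
        return False  # ordinal relations violated
    # connectedness: traverse the sub-network spanned by the edges
    nodes = {n for e in edge_seq for n in e}
    adj = {n: set() for n in nodes}
    for e in edge_seq:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return seen == nodes

# Illustrative weights consistent with the relations quoted from Fig. 2(a)
G = {E("ab"): 0.7, E("bc"): 0.5, E("ce"): 0.3, E("ed"): 0.2, E("be"): 0.4}
op4 = (E("ab"), E("bc"), E("ce"))  # weights decrease along the sequence
```

Here op4 is accepted, while reversing its first two edges (violating the ordinal relation) or pairing two disconnected edges is rejected.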

Definition 2: Frequent Ordinal Pattern. Let D = {G1, G2, · · · , GN} represent a set of N weighted networks. Given an ordinal pattern op, the frequency ratio of op is defined as follows:

f(op|D) = |{Gn | op is an ordinal pattern of Gn, Gn ∈ D}| / |D|    (1)

If f(op|D) > θ, where θ is a pre-defined threshold value, the ordinal pattern op is called a frequent ordinal pattern of D.
We can see that frequent ordinal patterns are ordinal patterns that frequently
appear in a weighted network set. For instance, a frequent ordinal pattern in
a brain network set may represent common functional or structural informa-
tion among subjects. Besides, frequent ordinal patterns have an appealing prop-
erty that plays an important role in data mining process. Specifically, for two
ordinal patterns opi = {ei1 , ei2 , · · · , eiM } and opj = {ej1 , ej2 , · · · , ejM , ejM +1 }, if
eim = ejm (∀m ∈ {1, 2, · · · , M }), opi is called the parent of opj , and opj is
called a child of opi . As shown in Fig. 2(a), op1 = {ea−b , eb−c } is the parent of
op4 = {ea−b , eb−c , ec−e }. It is easy to prove that the frequency ratio of an ordinal
pattern is no larger than the frequency ratios of its parents. That is, if an ordinal
pattern is not a frequent ordinal pattern, its children and descendants are not
frequent ordinal patterns, either.
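Equation (1) and this anti-monotone property can be checked on a toy network set. The helper below (same edge representation as in the earlier sketches; all names are ours) tests only the weight-ordering part of Definition 1, assuming connectedness is enforced elsewhere:

```python
E = frozenset  # an edge is a frozenset of its two node labels

def contains_pattern(weights, edge_seq):
    """True iff all edges exist in `weights` with strictly decreasing
    weights along the sequence (the ordinal-relation part of Definition 1)."""
    if any(e not in weights for e in edge_seq):
        return False
    ws = [weights[e] for e in edge_seq]
    return all(ws[i] > ws[i + 1] for i in range(len(ws) - 1))

def frequency_ratio(edge_seq, networks):
    """Eq. (1): fraction of the networks in D that contain the pattern."""
    return sum(contains_pattern(w, edge_seq) for w in networks) / len(networks)

# Toy set D of three subjects' networks over the same three edges
D = [
    {E("ab"): 0.9, E("bc"): 0.5, E("ac"): 0.1},
    {E("ab"): 0.8, E("bc"): 0.6, E("ac"): 0.2},
    {E("ab"): 0.3, E("bc"): 0.7, E("ac"): 0.2},
]
parent = (E("ab"),)
child = (E("ab"), E("bc"))  # a child of `parent`
```

On this toy set the child's frequency ratio (2/3; the third network has w(a-b) < w(b-c)) does not exceed its parent's (1.0), as the anti-monotone property guarantees.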

2.3 Ordinal Pattern Based Learning

Ordinal Pattern Construction: Using the above-mentioned preprocessing method, we can construct one brain connectivity network for each subject, with each node denoting an ROI and each edge representing the Pearson correlation between a pair of ROIs. We then construct ordinal patterns on patients’
and normal controls’ (NCs) brain connectivity networks separately. Given all
training subjects, we can obtain a brain network set with patients’ and NCs’
networks.

Frequent Ordinal Pattern Mining: We then propose a frequent ordinal pattern mining algorithm to identify ordinal patterns that frequently occur in a brain network set, by constructing a depth-first search (DFS) tree. We first

randomly choose an edge whose frequency ratio is larger than a threshold θ


as the root node. As illustrated in Fig. 2(b), a path from the root node to the
current node forms a specific ordinal pattern, e.g., op1 = {ea−b , eb−c }. We then
record the number of occurrences and compute the frequency ratio of this ordinal
pattern in a network set (with each network corresponding to a subject). If its
frequency ratio defined in Eq. (1) is larger than θ, the ordinal pattern (e.g., op1 )
is a frequent ordinal pattern and its children (e.g., op4 ) will be further searched.
Otherwise, the ordinal pattern (e.g., opM ) is not a frequent ordinal pattern,
and its descendants will be discarded directly. The max depth of a DFS tree is
limited by the level number. For example, if the level is 3, the frequent ordinal
patterns contain at most 3 edges. Obviously, more levels bring more frequent
ordinal patterns as well as more run-time.
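The pruned DFS described above can be sketched as follows. This is a deliberate simplification of the paper's miner, with our own data layout and names: patterns grow one edge at a time, a branch is cut as soon as its frequency ratio fails the threshold θ (justified by the anti-monotone property of Sect. 2.2), and connectedness is enforced by requiring each new edge to share a node with the pattern built so far.

```python
E = frozenset  # an edge is a frozenset of its two node labels

def mine_frequent_ordinal_patterns(networks, theta, max_level):
    """Return all connected ordinal patterns with frequency ratio > theta
    and at most max_level edges, via DFS with anti-monotone pruning."""
    all_edges = set().union(*(set(w) for w in networks))

    def freq(op):  # Eq. (1) restricted to the ordinal-relation test
        def contains(w):
            if any(e not in w for e in op):
                return False
            ws = [w[e] for e in op]
            return all(ws[i] > ws[i + 1] for i in range(len(ws) - 1))
        return sum(contains(w) for w in networks) / len(networks)

    frequent = []

    def dfs(op, nodes):
        frequent.append(op)
        if len(op) == max_level:
            return
        for e in all_edges - set(op):
            if nodes.isdisjoint(e):
                continue  # extension must keep the pattern connected
            child = op + (e,)
            if freq(child) > theta:   # prune otherwise: no descendant of an
                dfs(child, nodes | set(e))  # infrequent pattern is frequent

    for e in all_edges:  # roots of the DFS tree: frequent single edges
        if freq((e,)) > theta:
            dfs((e,), set(e))
    return frequent

nets = [
    {E("ab"): 0.9, E("bc"): 0.5},
    {E("ab"): 0.8, E("bc"): 0.6},
]
patterns = mine_frequent_ordinal_patterns(nets, theta=0.5, max_level=2)
```

A realistic implementation would also need to bound the cost of repeated frequency evaluation; this sketch only conveys the prune-and-recurse structure.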

Discriminative Ordinal Pattern Selection: There are a number of frequent ordinal patterns, and some of them may have little discriminative power. Accordingly, we perform a discriminative ordinal pattern selection process on those
frequent ordinal patterns. Specifically, we first mine frequent ordinal patterns
from the patients’ brain network set and the NCs’ brain network set separately.
According to the discriminative power, we select the most discriminative ordinal
patterns from all frequent ordinal patterns in both patients’ and NCs’ sets. The
ratio score [10] is used to evaluate the discriminative power of frequent ordinal
patterns. Given a frequent ordinal pattern opi mined from the patients’ brain
network set (denoted as D+ ), the ratio score of opi is defined as
RS(opi) = log [ |{Gn | opi is an ordinal pattern of Gn, Gn ∈ D+}| / (|{Gn | opi is an ordinal pattern of Gn, Gn ∈ D−}| + ε) · |D−| / |D+| ]    (2)

where D− denotes the NCs’ brain network set, and ε is a small value that prevents the denominator from being 0. Similarly, for a frequent ordinal pattern opj mined from the NCs’ brain network set (i.e., D−), the ratio score is computed as

RS(opj) = log [ |{Gn | opj is an ordinal pattern of Gn, Gn ∈ D−}| / (|{Gn | opj is an ordinal pattern of Gn, Gn ∈ D+}| + ε) · |D+| / |D−| ]    (3)
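Read concretely, the ratio score needs only four counts per pattern. The function and the example counts below are ours, chosen purely for illustration:

```python
import math

def ratio_score_pos(count_pos, count_neg, n_pos, n_neg, eps=0.1):
    """Eq. (2): ratio score of a frequent ordinal pattern mined from the
    patients' set D+.  count_pos / count_neg = number of networks in
    D+ / D- that contain the pattern; n_pos = |D+|, n_neg = |D-|;
    eps keeps the denominator non-zero (the paper uses eps = 0.1).
    Swapping the roles of the two sets gives Eq. (3)."""
    return math.log((count_pos / (count_neg + eps)) * (n_neg / n_pos))

# Hypothetical counts: a pattern present in 30 of 34 patients' networks
# but only 5 of 50 NC networks, versus one present in 30/34 and 44/50.
discriminative = ratio_score_pos(30, 5, 34, 50)
common = ratio_score_pos(30, 44, 34, 50)
```

A pattern common among patients but rare among NCs receives a large positive score, while a pattern occurring at similar rates in both groups scores near zero.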

Classification: A total of k discriminative ordinal patterns are first selected, with half from patients’ and the other half from NCs’ brain connectivity network
sets. We then combine those discriminative ordinal patterns to construct a fea-
ture matrix for representing subjects. Specifically, given |D| brain connectivity
networks (with each network corresponding to a specific subject) and k selected
discriminative ordinal patterns, we denote the feature matrix as F ∈ R|D|×k ,
where the element Fij represents the j-th feature of the i-th subject: if the j-th discriminative ordinal pattern appears in the brain connectivity network of the i-th subject, Fij is set to 1, and otherwise to 0. Finally, we adopt
an SVM classifier to identify AD/MCI/ADHD patients from NCs.
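The binary feature construction can be sketched as follows (same toy representation as in the earlier sketches). In the paper these features then feed a linear SVM with C = 1, e.g. scikit-learn's `SVC(kernel="linear", C=1)`; the SVM step is omitted here to keep the sketch dependency-free:

```python
E = frozenset  # an edge is a frozenset of its two node labels

def contains_pattern(weights, edge_seq):
    """Ordinal-relation test: all edges present, strictly decreasing weights."""
    if any(e not in weights for e in edge_seq):
        return False
    ws = [weights[e] for e in edge_seq]
    return all(ws[i] > ws[i + 1] for i in range(len(ws) - 1))

def feature_matrix(networks, patterns, contains):
    """F[i][j] = 1 if the j-th selected discriminative ordinal pattern
    occurs in the i-th subject's network, else 0 (a |D| x k matrix)."""
    return [[1 if contains(w, op) else 0 for op in patterns]
            for w in networks]

subjects = [
    {E("ab"): 0.9, E("bc"): 0.5},
    {E("ab"): 0.4, E("bc"): 0.6},
]
selected = [(E("ab"), E("bc")), (E("bc"), E("ab"))]
F = feature_matrix(subjects, selected, contains_pattern)
# F == [[1, 0], [0, 1]] for this toy data
```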

3 Experiments
Experimental Settings: We perform three classification tasks, i.e., AD vs.
NC, MCI vs. NC and ADHD vs. NC classification, by using a 10-fold cross-
validation strategy. Note that those discriminative ordinal patterns are selected
only from training data. Classification performance is evaluated by accuracy
(ACC), sensitivity (SEN), specificity (SPE) and area under the ROC curve
(AUC). The parameter ε in the ratio score in Eqs. (2) and (3) is set to 0.1 empirically. With an inner cross-validation strategy, the level number in our frequent ordinal pattern mining algorithm is chosen from [2, 6] with step 1, and the number of discriminative ordinal patterns is chosen from [10, 100] with step 10.
We compare our method with two widely used network descriptors in brain
connectivity network based studies, including clustering coefficients [4] and discriminative sub-networks [7]. Since these two descriptors require a thresholding process, we adopt both single-threshold and multi-threshold [5,11] strategies to transform weighted networks into un-weighted ones. In summary, there
are four competing methods, including (1) clustering coefficients (CC) with
single-threshold, (2) clustering coefficients using multi-thresholds (CCMT), (3)
discriminative sub-networks (DS) with single-threshold, and (4) discriminative
sub-networks using multi-thresholds (DSMT). The linear SVM with the default
parameter (i.e., C = 1) is used as the classifier in different methods.

Results: Experimental results are listed in Table 1, from which we can see
that our method consistently achieves the best performance in three tasks. For
instance, the accuracy achieved by our method is 94.05 % in AD vs. NC clas-
sification, which is significantly better than the second best result obtained by
DSMT. This demonstrates that the ordinal patterns are discriminative in dis-
tinguishing AD/MCI/ADHD patients from NCs, compared with conventional
network descriptors.
We further plot the top 2 discriminative ordinal patterns identified by our method in the three tasks in Fig. 3. For instance, the most discriminative
ordinal pattern for AD, shown in top left of Fig. 3(a), can be recorded as op =
{eDCG.L−ACG.L , eACG.L−ROL.L , eROL.L−P AL.R , eP AL.R−LIN G.L , eP AL.R−M OG.R }.

Table 1. Comparison of different methods in three classification tasks

           AD vs. NC                    MCI vs. NC                   ADHD vs. NC
Method     ACC   SEN   SPE   AUC        ACC   SEN   SPE   AUC        ACC   SEN   SPE   AUC
           (%)   (%)   (%)   (%)        (%)   (%)   (%)   (%)        (%)   (%)   (%)   (%)
CC         72.62 73.53 67.94 70.94      71.14 72.73 68.00 68.69      71.29 72.03 70.41 70.51
CCMT       80.95 82.35 80.00 76.35      74.50 75.76 72.00 74.79      74.53 75.43 73.47 77.64
DS         76.19 76.47 76.00 75.59      77.18 78.79 74.00 74.89      81.01 81.36 80.61 80.82
DSMT       85.71 85.29 86.00 87.59      79.19 80.81 76.00 76.99      83.79 84.74 82.65 84.63
Proposed   94.05 96.77 92.45 96.35      88.59 87.27 92.31 84.57      87.50 88.89 85.85 87.37

Fig. 3. The most discriminative ordinal patterns identified by the proposed method in three tasks: (a) AD vs. NC, (b) MCI vs. NC, and (c) ADHD vs. NC classification. In each row, the first two columns show the top 2 discriminative ordinal patterns selected from the positive class (i.e., AD, MCI, and ADHD), while the last two columns illustrate those selected from the negative class (i.e., NC).

These results imply that the proposed ordinal patterns do reflect some local structures of the original brain networks.
We investigate the influence of frequent ordinal pattern mining level and the
number of selected discriminative ordinal patterns, with results shown in Fig. 4.
From this figure, we can see that our method achieves relatively stable results
when the number of selected ordinal patterns is larger than 40. Also, our method
achieves overall good performance when the level number in the frequent ordinal pattern mining algorithm is 4 in AD/MCI vs. NC classification and 5 in ADHD vs. NC classification, respectively.
We perform an additional experiment by using weights of each edge in ordinal
patterns as raw features, and achieve the accuracies of 71.43 %, 67.11 %, and
69.91 % in AD vs. NC, MCI vs. NC and ADHD vs. NC classification, respectively.
We further utilize a real-valued network descriptor based on ordinal patterns (by taking the product of weights in each ordinal pattern), and obtain accuracies of 78.52 %, 72.37 %, and 72.69 % in the three tasks, respectively.

Fig. 4. Influence of the level number in the frequent ordinal pattern mining method and the number of discriminative ordinal patterns in AD vs. NC (left), MCI vs. NC (middle), and ADHD vs. NC (right) classification. Each panel plots accuracy (ACC) against the number of discriminative ordinal patterns for levels 1–5.

4 Conclusion
In this paper, we propose a new network descriptor (i.e., the ordinal pattern)
for brain connectivity networks. The proposed ordinal patterns are defined on
weighted networks, and can preserve both the edge weight information and the
local structure of the original brain networks. Then, we develop an ordinal pattern
based brain network classification method for the diagnosis of AD/MCI and
ADHD. Experimental results on both the ADNI and ADHD-200 data sets demonstrate
the efficacy of our method.

Acknowledgment. This study was supported by the National Natural Science Foundation
of China (Nos. 61422204, 61473149, 61473190, 61573023), the Jiangsu Natural Science
Foundation for Distinguished Young Scholar (No. BK20130034), and the NUAA
Fundamental Research Funds (No. NE2013105).

References
1. Robinson, E.C., Hammers, A., Ericsson, A., Edwards, A.D., Rueckert, D.: Identi-
fying population differences in whole-brain structural networks: a machine learning
approach. NeuroImage 50(3), 910–919 (2010)
2. Sporns, O.: From simple graphs to the connectome: networks in neuroimaging.
NeuroImage 62(2), 881–886 (2012)
3. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: uses
and interpretations. NeuroImage 52(3), 1059–1069 (2010)
4. Wee, C.Y., Yap, P.T., Li, W., Denny, K., Browndyke, J.N., Potter, G.G., Welsh-
Bohmer, K.A., Wang, L., Shen, D.: Enriched white matter connectivity networks
for accurate identification of MCI patients. NeuroImage 54(3), 1812–1822 (2011)
5. Jie, B., Zhang, D., Wee, C.Y., Shen, D.: Topological graph kernel on multiple
thresholded functional connectivity networks for mild cognitive impairment classi-
fication. Hum. Brain Mapp. 35(7), 2876–2897 (2014)
6. Liu, M., Zhang, D., Shen, D.: Relationship induced multi-template learning for
diagnosis of Alzheimer disease and mild cognitive impairment. IEEE Trans. Med.
Imaging 35(6), 1463–1474 (2016)
7. Fei, F., Jie, B., Zhang, D.: Frequent and discriminative subnetwork mining for mild
cognitive impairment classification. Brain Connect. 4(5), 347–360 (2014)
Ordinal Patterns for Brain Connectivity Network 9

8. Brier, M.R., Thomas, J.B., Fagan, A.M., Hassenstab, J., Holtzman, D.M.,
Benzinger, T.L., Morris, J.C., Ances, B.M.: Functional connectivity and
graph theory in preclinical Alzheimer’s disease. Neurobiol. Aging 35(4), 757–768
(2014)
9. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O.,
Delcroix, N., Mazoyer, B., Joliot, M.: Automated anatomical labeling of activations
in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject
brain. NeuroImage 15(1), 273–289 (2002)
10. Yan, X., Cheng, H., Han, J., Yu, P.S.: Mining significant graph patterns by leap
search. In: Proceedings of ACM SIGMOD International Conference on Manage-
ment of Data, pp. 433–444. ACM (2008)
11. Sanz-Arigita, E.J., Schoonheim, M.M., Damoiseaux, J.S., Rombouts, S.,
Maris, E., Barkhof, F., Scheltens, P., Stam, C.J., et al.: Loss of ‘small-
world’ networks in Alzheimer’s disease: graph analysis of fMRI resting-state func-
tional connectivity. PLoS ONE 5(11), e13788 (2010)
Discovering Cortical Folding Patterns
in Neonatal Cortical Surfaces Using
Large-Scale Dataset

Yu Meng¹,², Gang Li², Li Wang², Weili Lin², John H. Gilmore³,
and Dinggang Shen²(✉)

¹ Department of Computer Science, University of North Carolina at Chapel Hill,
Chapel Hill, NC, USA
² Department of Radiology and BRIC, University of North Carolina at Chapel Hill,
Chapel Hill, NC, USA
dinggang_shen@med.unc.edu
³ Department of Psychiatry, University of North Carolina at Chapel Hill,
Chapel Hill, NC, USA

Abstract. The cortical folding of the human brain is highly complex and
variable across individuals. Mining the major patterns of cortical folding from
modern large-scale neuroimaging datasets is of great importance in advancing
techniques for neuroimaging analysis and understanding the inter-individual
variations of cortical folding and its relationship with cognitive function and
disorders. As the primary cortical folding is genetically influenced and has been
established at term birth, neonates with the minimal exposure to the complicated
postnatal environmental influence are the ideal candidates for understanding the
major patterns of cortical folding. In this paper, for the first time, we propose a
novel method for discovering the major patterns of cortical folding in a
large-scale dataset of neonatal brain MR images (N = 677). In our method, first,
cortical folding is characterized by the distribution of sulcal pits, which are the
locally deepest points in cortical sulci. Because deep sulcal pits are genetically
related, relatively consistent across individuals, and also stable during brain
development, they are well suited for representing and characterizing cortical
folding. Then, the similarities between sulcal pit distributions of any two sub-
jects are measured from spatial, geometrical, and topological points of view.
Next, these different measurements are adaptively fused together using a simi-
larity network fusion technique, to preserve their common information and also
catch their complementary information. Finally, leveraging the fused similarity
measurements, a hierarchical affinity propagation algorithm is used to group
similar sulcal folding patterns together. The proposed method has been applied
to 677 neonatal brains (the largest neonatal dataset to our knowledge) in the
central sulcus, superior temporal sulcus, and cingulate sulcus, and revealed
multiple distinct and meaningful folding patterns in each region.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 10–18, 2016.
DOI: 10.1007/978-3-319-46720-7_2
Discovering Cortical Folding Patterns Using Large-Scale Dataset 11

1 Introduction

The human cerebral cortex is a highly convoluted and complex structure. Its cortical
folding is quite variable across individuals (Fig. 1). However, certain common folding
patterns exist in some specific cortical regions as shown in the classic textbook [1],
which examined 25 autopsy specimen adult brains. Mining the major representative
patterns of cortical folding from modern large-scale datasets is of great importance in
advancing techniques for neuroimaging analysis and understanding the inter-individual
variations of cortical folding and their relationship with structural connectivity, cog-
nitive function, and brain disorders. For example, in cortical surface registration [2],
typically a single cortical atlas is constructed for a group of brains. Such an atlas may
not be able to reflect some important patterns of cortical folding, due to the averaging
effect, thus leading to poor registration accuracy for some subjects that cannot be well
characterized by the folding patterns in the atlas. Building multiple atlases, with each
representing one major pattern of cortical folding, will lead to boosted accuracy in
cortical surface registration and subsequent group-level analysis.

Fig. 1. Huge inter-individual variability of sulcal folding patterns in neonatal cortical surfaces,
colored by the sulcal depth. Sulcal pits are shown by white spheres.

To investigate the patterns of cortical folding, a clustering approach has been


proposed [3]. This approach used 3D moment invariants to represent each sulcus and
used the agglomerative clustering algorithm to group major sulcal patterns in 150 adult
brains. However, the discrimination of 3D moment invariants was limited in distin-
guishing different patterns. Hence, a more representative descriptor was proposed in
[4], where the distance between any two sulcal folds in 62 adult brains was computed
after they were aligned, resulting in more meaningful results. Meanwhile, sulcal pits,
the locally deepest points in cortical sulci, were proposed for studying the
inter-individual variability of cortical folding [5]. This is because sulcal pits have been
suggested to be genetically affected and closely related to functional areas [6]. It has
been found that the spatial distribution of sulcal pits is relatively consistent across
individuals, compared to the shallow folding regions, in both adults (148 subjects) and
infants (73 subjects) [7, 8].
In this paper, we propose a novel method for discovering major representative
patterns of cortical folding on a large-scale neonatal dataset (N = 677). The motivation
of using a neonatal dataset is that all primary cortical folding is largely genetically
determined and has been established at term birth [9]; hence, neonates with the minimal
exposure to the complicated postnatal environmental influence are the ideal candidates
12 Y. Meng et al.

for discovering the major cortical patterns. This is very important for understanding the
biological relationships between cortical folding and brain functional development or
neurodevelopmental disorders rooted during infancy. The motivation of using a
large-scale dataset is that small datasets may not sufficiently cover all kinds of major
cortical patterns and thus would likely lead to biased results.
In our method, we leveraged the reliable deep sulcal pits to characterize the cortical
folding, thereby eliminating the effects of noisy shallow folding regions that are
extremely heterogeneous and variable. Specifically, first, sulcal pits were extracted
using a watershed algorithm [8] and represented using a sulcal graph. Then, the dif-
ference between sulcal pit distributions of any two cortices was computed based on six
complementary measurements, i.e., sulcal pit position, sulcal pit depth, ridge point
depth, sulcal basin area, sulcal basin boundary, and sulcal pit local connection, thus
resulting in six matrices. Next, these difference matrices were further converted to
similarity matrices, and adaptively fused as one comprehensive similarity matrix using
a similarity network fusion technique [10], to preserve their common information and
also capture their complementary information. Finally, based on the fused similarity
matrix, a hierarchical affinity propagation clustering algorithm was performed to group
sulcal graphs into different clusters. The proposed method was applied to 677 neonatal
brains (the largest neonatal dataset to our knowledge) in the central sulcus, superior
temporal sulcus, and cingulate sulcus, and revealed multiple distinct and meaningful
patterns of cortical folding in each region.

2 Methods

Subjects and Image Acquisition. MR images for N = 677 term-born neonates were
acquired on a Siemens head-only 3T scanner with a circular polarized head coil. Before
scanning, neonates were fed, swaddled, and fitted with ear protection. All neonates were
unsedated during scanning. T1-weighted MR images with 160 axial slices were obtained
using the parameters: TR = 1,820 ms, TE = 4.38 ms, and resolution = 1 × 1 × 1 mm³.
T2-weighted MR images with 70 axial slices were acquired with the parameters:
TR = 7,380 ms, TE = 119 ms, and resolution = 1.25 × 1.25 × 1.95 mm³.
Cortical Surface Mapping. All neonatal MRIs were processed using an infant-
dedicated pipeline [2]. Specifically, it contained the steps of rigid alignment between
T2 and T1 MR images, skull-stripping, intensity inhomogeneity correction, tissue
segmentation, topology correction, cortical surface reconstruction, spherical mapping,
spherical registration onto an infant surface atlas, and cortical surface resampling [2].
All results have been visually checked to ensure the quality.
Sulcal Pits Extraction and Sulcal Graph Construction. To characterize the sulcal
folding patterns in each individual, sulcal pits, the locally deepest point of sulci, were
extracted on each cortical surface (Fig. 1) using the method in [8]. The motivation is
that deep sulcal pits were relatively consistent across individuals and stable during
brain development as reported in [6], and thus were well suitable as reliable landmarks
for characterizing sulcal folding. To exact sulcal pits, each cortical surface was

partitioned into small basins using a watershed method based on the sulcal depth map
[11], and the deepest point of each basin was identified as a sulcal pit, after pruning
noisy basins [8]. Then, a sulcal graph was constructed for each cortical surface as in
[5]. Specifically, each sulcal pit was defined as a node, and two nodes were linked by
an edge, if their corresponding basins were spatially connected.
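As a rough illustration of this graph construction step (not the authors' implementation), the following Python sketch links the pits of two basins whenever a mesh edge straddles the basin boundary; the toy mesh, basin labels, and pit choices are hypothetical:

```python
def build_sulcal_graph(pit_of_basin, basin_label, mesh_edges):
    """Sulcal graph construction: one node per sulcal pit (the deepest vertex
    of a basin), and an edge between two pits whenever their basins share at
    least one mesh edge. All names and inputs here are illustrative."""
    links = set()
    for v, w in mesh_edges:
        a, b = basin_label[v], basin_label[w]
        if a != b:  # the mesh edge straddles two basins -> basins are adjacent
            links.add(tuple(sorted((pit_of_basin[a], pit_of_basin[b]))))
    return sorted(pit_of_basin.values()), sorted(links)

# Toy mesh: 4 vertices in two basins (0 and 1) that touch along edge (1, 2).
basin = {0: 0, 1: 0, 2: 1, 3: 1}       # vertex -> basin id
pits = {0: 0, 1: 3}                    # basin id -> its deepest vertex
graph = build_sulcal_graph(pits, basin, [(0, 1), (1, 2), (2, 3)])
# graph == ([0, 3], [(0, 3)]): two pit nodes joined by one edge
```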
Sulcal Graph Comparison. To compare two sulcal graphs, their similarities were
measured using multiple metrics from spatial, geometrical, and topological points of
view, to capture the multiple aspects of sulcal graphs. Specifically, we computed six
distinct metrics, using sulcal pit position D, sulcal pit depth H, sulcal basin area S,
sulcal basin boundary B, sulcal pit local connection C, and ridge point depth R. Given
N sulcal graphs from N subjects, any two of them were compared using the above six
metrics, so an N × N matrix was constructed for each metric.
The difference between two sulcal graphs can be measured by comparing the
attributes of the corresponding sulcal pits in the two graphs. In general, the difference
between any sulcal-pit-wise attribute of sulcal graphs P and Q can be computed as

Diff(P, Q, diff_X) = \frac{1}{2} \Big( \frac{1}{V_P} \sum_{i \in P} diff_X(i, Q) + \frac{1}{V_Q} \sum_{j \in Q} diff_X(j, P) \Big)    (1)

where V_P and V_Q are respectively the numbers of sulcal pits in P and Q, and diff_X(i, Q)
is the difference of a specific attribute X between sulcal pit i and its corresponding
sulcal pit in graph Q. Note that we treat the closest pit as the corresponding sulcal pit, as
all cortical surfaces have been aligned to a spherical surface atlas.
(1) Sulcal Pit Position. Based on Eq. (1), the difference between P and Q in terms of
sulcal pit positions is computed as D(P, Q) = Diff(P, Q, diff_D), where diff_D(i, Q) is
the geodesic distance between sulcal pit i and its corresponding sulcal pit in Q on the
spherical surface atlas.
(2) Sulcal Pit Depth. For each subject, the sulcal depth map is normalized by dividing
by the maximum depth value, to reduce the effect of the brain size variation. The
difference between P and Q in terms of sulcal pit depth is computed as
H(P, Q) = Diff(P, Q, diff_H), where diff_H(i, Q) is the depth difference between sulcal
pit i and its corresponding sulcal pit in Q.
(3) Sulcal Basin Area. To reduce the effect of surface area variation across subjects,
the area of each basin is normalized by the area of the whole cortical surface. The
difference between P and Q in terms of sulcal basin area is computed as
S(P, Q) = Diff(P, Q, diff_S), where diff_S(i, Q) is the area difference between the
basins of sulcal pit i and its corresponding sulcal pit in Q.
(4) Sulcal Basin Boundary. The difference between P and Q in terms of sulcal basin
boundary is formulated as B(P, Q) = Diff(P, Q, diff_B), where diff_B(i, Q) is the
difference between the sulcal basin boundaries of sulcal pit i and its corresponding sulcal
pit in Q. Specifically, we define a vertex as a boundary vertex of a sulcal basin if one of
its neighboring vertices belongs to a different basin. Given two corresponding sulcal
pits i ∈ P and i′ ∈ Q, their sulcal basin boundary vertices are respectively denoted as B_i
and B_{i′}. For any boundary vertex a ∈ B_i, its closest vertex a′ is found from B_{i′}; and
similarly, for any boundary vertex b′ ∈ B_{i′}, its closest vertex b is found from B_i. Then,

the difference between the basin boundaries of sulcal pit i and its corresponding pit
i′ ∈ Q is defined as:

diff_B(i, Q) = \frac{1}{2} \Big( \frac{1}{N_{B_i}} \sum_{a \in B_i} dis(a, a') + \frac{1}{N_{B_{i'}}} \sum_{b' \in B_{i'}} dis(b', b) \Big)    (2)

where N_{B_i} and N_{B_{i′}} are respectively the numbers of vertices in B_i and B_{i′}, and dis(·, ·) is
the geodesic distance between two vertices on the spherical surface atlas.
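A vectorized sketch of the boundary difference in Eq. (2), again substituting Euclidean distance for geodesic distance and using toy 2D boundary coordinates (not values from the paper):

```python
import numpy as np

def boundary_diff(Bi, Bj):
    """Symmetric average closest-vertex distance between two basin boundaries,
    as in Eq. (2). Boundary vertices are given as coordinate arrays; Euclidean
    distance stands in for geodesic distance on the spherical atlas."""
    # Pairwise distances between every vertex of Bi and every vertex of Bj.
    d = np.linalg.norm(Bi[:, None, :] - Bj[None, :, :], axis=2)
    # Average closest-match distance in each direction, then average the two.
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

# Two toy boundaries lying one unit apart everywhere.
Bi = np.array([[0.0, 0.0], [1.0, 0.0]])
Bj = np.array([[0.0, 1.0], [1.0, 1.0]])
d_b = boundary_diff(Bi, Bj)   # 1.0
```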
(5) Sulcal Pit Local Connection. The difference between local connections of two
graphs P and Q is computed as C(P, Q) = Diff(P, Q, diff_C), where diff_C(i, Q) is the
difference of local connection after mapping sulcal pit i to graph Q. Specifically, for a
sulcal pit i, assume k is one of its connected sulcal pits. Their corresponding sulcal pits
in graph Q are respectively i′ and k′. The change of local connection after mapping
sulcal pit i to graph Q is measured by:

diff_C(i, Q) = \frac{1}{N_{G_i}} \sum_{k \in G_i} \left| dis(i, k) - dis(i', k') \right|    (3)

where G_i is the set of sulcal pits connecting to i, and N_{G_i} is the number of pits in G_i.
(6) Ridge Point Depth. Ridge points are the locations where two sulcal basins meet.
As suggested by [5], the depth of the ridge point is an important indicator for distinguishing
sulcal patterns. Thus, we compute the difference between the average ridge
point depths of sulcal graphs P and Q as:

R(P, Q) = \left| \frac{1}{E_P} \sum_{e \in P} r_e - \frac{1}{E_Q} \sum_{e \in Q} r_e \right|    (4)

where E_P and E_Q are respectively the numbers of edges in P and Q; e is an edge
connecting two sulcal pits; and r_e is the normalized depth of the ridge point on edge e.
Sulcal Graph Similarity Fusion. The above six metrics measured the inter-individual
differences of sulcal graphs from different points of view, and each provided com-
plementary information to the others. To capture both the common information and the
complementary information, we employed a similarity network fusion (SNF) method
[10] to adaptively integrate all six metrics together. To do this, each difference matrix
was normalized by its maximum elements, and then transformed into a similarity
matrix as:

W_M(x, y) = \exp\left( - \frac{M^2(x, y)}{\mu \cdot \frac{U_x + U_y + M(x, y)}{3}} \right)    (5)

where μ was a scaling parameter; M could be any one of the above six matrices; U_x and
U_y were respectively the average values of the smallest K elements in the x-th row and
y-th row of M. Finally, the six similarity matrices W_D, W_H, W_R, W_S, W_B, and W_C were fused

together as a single similarity matrix W by using SNF with t iterations. The parameters
were set as μ = 0.8, K = 30, and t = 20, as suggested in [10].
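The kernel of Eq. (5), which converts a difference matrix into a similarity matrix, can be sketched as follows; the subsequent cross-network diffusion of SNF [10] is omitted, and the toy matrix and small K are illustrative:

```python
import numpy as np

def similarity_from_difference(M, mu=0.8, K=30):
    """Eq. (5): W(x, y) = exp(-M(x, y)^2 / (mu * (U_x + U_y + M(x, y)) / 3)),
    where U_x averages the K smallest entries of row x of the max-normalized
    difference matrix M. Only the kernel step of the fusion pipeline."""
    M = np.asarray(M, dtype=float)
    M = M / M.max()                               # max-normalization, as in the text
    K = min(K, M.shape[1])
    U = np.sort(M, axis=1)[:, :K].mean(axis=1)    # row-wise local scale U_x
    eps = (U[:, None] + U[None, :] + M) / 3.0
    return np.exp(-M ** 2 / (mu * eps))

# Toy difference matrix for three subjects (values are illustrative).
M = np.array([[0.0, 2.0, 4.0],
              [2.0, 0.0, 6.0],
              [4.0, 6.0, 0.0]])
W = similarity_from_difference(M, mu=0.8, K=2)
```

The same transformation would be applied to each of the six difference matrices before fusion.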
Sulcal Pattern Clustering. To cluster sulcal graphs into different groups based on the
fused similarity matrix W, we employed the Affinity Propagation Clustering
(APC) algorithm [12], which could automatically determine the number of clusters
based on the natural characteristics of data. However, since sulcal folding patterns were
extremely variable across individuals, too many clusters were identified after per-
forming APC, making it difficult to observe the most important major patterns.
Therefore, we proposed a hierarchical APC framework to further group the clusters.
Specifically, after running APC, (1) the exemplars of all clusters were used to perform a
new level of APC, so that fewer clusters were generated. Since the old clusters were merged,
the old exemplars may no longer be representative of the new clusters. Thus, (2) a new
exemplar was selected for each cluster based on the maximal average similarity to all
the other samples in the cluster. We repeated these steps until the cluster number
was reduced to an expected level (<5).
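The hierarchical APC loop described above might be sketched with scikit-learn's affinity propagation on a precomputed similarity matrix; the exemplar re-selection rule follows the text, while the toy two-block similarity matrix and the parameter choices are hypothetical:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def hierarchical_apc(W, max_clusters=5, seed=0):
    """Hierarchical APC: cluster with APC, re-cluster the exemplars, re-select
    each cluster's exemplar as the member with maximal average similarity to
    the others, and repeat until fewer than `max_clusters` clusters remain."""
    exemplars = np.arange(W.shape[0])   # current exemplar candidates
    member_of = np.arange(W.shape[0])   # each sample's exemplar (sample index)
    while len(exemplars) >= max_clusters:
        apc = AffinityPropagation(affinity="precomputed", random_state=seed)
        lab = apc.fit_predict(W[np.ix_(exemplars, exemplars)])
        # New exemplar per cluster: member with maximal average similarity.
        new_ex = []
        for c in np.unique(lab):
            members = exemplars[lab == c]
            sims = W[np.ix_(members, members)].mean(axis=1)
            new_ex.append(int(members[np.argmax(sims)]))
        # Propagate the merged grouping back to all original samples.
        remap = {int(e): new_ex[int(c)] for e, c in zip(exemplars, lab)}
        member_of = np.array([remap[int(m)] for m in member_of])
        if len(new_ex) == len(exemplars):   # nothing merged; stop looping
            break
        exemplars = np.array(new_ex)
    return member_of

# Toy fused similarity matrix: two clear groups of three subjects each.
S = np.full((6, 6), 0.1)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)
labels = hierarchical_apc(S)
```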

3 Results

We extracted sulcal pits on cortical surfaces from 677 neonatal brains. To demonstrate
the validity of our methods for discovering the cortical folding patterns, we employed
three representative cortical regions, i.e., the central sulcus, superior temporal sulcus,
and cingulate sulcus. For each cortical region, a 677 × 677 similarity matrix was
computed using SNF and all subjects were then clustered into different groups by the
hierarchical APC. To better explore the major folding patterns, an average cortical
surface was constructed for each cluster, based on 20 representative cortical surfaces
that are most similar to the exemplar in each cluster. All sulcal pits in each cluster were
mapped onto the average surfaces.
For the central sulcus, three distinct folding patterns were identified, as shown in
Fig. 2. In the pattern (a), two sulcal pits concentration areas can be observed, indicating
two sulcal basins in the central sulcus. This pattern was further confirmed by six
representative examples of individual subjects (second to seventh columns). In the
pattern (b), three distinct sulcal pits concentration areas can be observed, with one extra
area (basin 3) located in the most inferior portion of the central sulcus, compared to the
pattern (a). In the pattern (c), three distinct sulcal pits concentration areas can be
observed as in the pattern (b), but they are more concentrated. This is also confirmed by
six representative examples of (c). Moreover, compared to the pattern (b), the sulcal
basin 2 is very short, while the sulcal basin 3 is very long in the pattern (c). This
phenomenon is likely related to the “hand knob shift” reported in a study of the shape of the
central sulcus in adults [13]. Previously, different studies reported either two [8] or three [7]
sulcal basins in the central sulcus. Herein, we can see that both two-basin and
three-basin patterns are major patterns of sulcal folding.
For the superior temporal sulcus (STS), three distinct folding patterns were
identified, as shown in Fig. 3. In the pattern (a), the distribution of sulcal pits in the
posterior portion of STS is more diffused and bended, compared to the patterns (b) and

Fig. 2. Sulcal folding patterns in the central sulcus. The first column shows three discovered
sulcal folding patterns, with all sulcal pits (red spheres) mapped onto the average surface of each
cluster. For each pattern, the second to seventh columns show six representative examples of
individual subjects. Different sulcal basins are marked with different colors. The percentage of
each pattern is shown at the top-left corner.

(c), indicating the differences in the folding shape of STS. This is supported by a
previous cortical folding study in adults, which reported that some brains had
a Y-shaped STS while others had a single long STS [4]. In the pattern (b),
compared to (a) and (c), an extra concentration region of sulcal pits is exhibited near
the temporal pole, which is also confirmed by six representative examples in individual
subjects, showing small sulcal basins near the temporal pole. In the pattern (c), the
sulcal basin in the anterior portion of STS is very long and straight, extending to the
temporal pole.

Fig. 3. Sulcal folding patterns in the superior temporal sulcus. The first column shows three
discovered sulcal folding patterns, with all sulcal pits (red spheres) mapped onto the average
surface of each cluster. For each pattern, the second to seventh columns show six representative
examples of individual subjects. Different sulcal basins are marked with different colors.

For the cingulate sulcus, four distinct major folding patterns were identified, as
shown in Fig. 4. In the pattern (a), a single long cingulate sulcus is clearly shown,
while in the pattern (b), two long parallel sulci are observed. This is consistent with the
previous cortical folding pattern study in adults [4], which reported that two cingulate
sulci were observed in some brains. A study of autopsy specimen brains also reported
that 24 % of left hemispheres had a double parallel cingulate sulcus [1]. In the pattern (c),
the cingulate sulcus is interrupted in the anterior region; in contrast, in the pattern (d),
the cingulate sulcus is interrupted in the posterior region. These two types of interruption
were also reported in [1]. In patterns (c) and (d), some parallel sulci can be
observed, but they are much shorter than that in pattern (b).

Fig. 4. Sulcal folding patterns in the cingulate sulcus. The first column shows four discovered
folding patterns, with all sulcal pits (red spheres) mapped onto the average surface of each
cluster. The second column shows the schematic drawing of the sulcal curves (blue dashes) on
the average surface of each cluster. For each pattern, the third to seventh columns show five
representative examples of individual subjects. The percentage of each pattern is shown at the
top-left corner.

4 Conclusion

The main contribution of this paper is twofold. First, a novel generic method for
discovering the cortical folding patterns was proposed, by leveraging the reliable sulcal
pits. Specifically, multiple complementary similarity measures of sulcal pit graphs were
first computed and adaptively fused to comprehensively capture inter-individual
similarity. Then, based on the fused similarity, sulcal pit graphs were clustered using a
hierarchical affinity propagation algorithm. Second, for the first time, we applied the
proposed method to discover the cortical folding patterns in a large-scale neonatal
dataset with 677 subjects, and revealed multiple distinct and representative patterns.
These results suggest the need to construct multiple representative cortical
folding atlases for each region, for better spatial normalization of individuals in

group-level studies. Our future work includes discovering patterns in other cortical
regions, and exploring their relationships with structural connectivity and cognitive
functions.

Acknowledgements. This work was supported in part by UNC BRIC-Radiology start-up fund
and NIH grants (MH107815, MH108914, MH100217, HD053000, and MH070890).

References
1. Ono, M., Kubik, S., Abernathey, C.D.: Atlas of the Cerebral Sulci. Thieme, New York
(1990)
2. Li, G., Wang, L., Shi, F., et al.: Construction of 4D high-definition cortical surface atlases of
infants: methods and applications. Med. Image Anal. 25, 22–36 (2015)
3. Sun, Z.Y., Rivière, D., Poupon, F., Régis, J., Mangin, J.-F.: Automatic inference of sulcus
patterns using 3D moment invariants. In: Ayache, N., Ourselin, S., Maeder, A. (eds.)
MICCAI 2007, Part I. LNCS, vol. 4791, pp. 515–522. Springer, Heidelberg (2007)
4. Sun, Z.Y., Perrot, M., Tucholka, A., Rivière, D., Mangin, J.-F.: Constructing a dictionary of
human brain folding patterns. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C.
(eds.) MICCAI 2009, Part II. LNCS, vol. 5762, pp. 117–124. Springer, Heidelberg (2009)
5. Im, K., Raschle, N.M., Smith, S.A., et al.: Atypical sulcal pattern in children with
developmental dyslexia and at-risk kindergarteners. Cereb. Cortex 26, 1138–1148 (2016)
6. Lohmann, G., von Cramon, D.Y., Colchester, A.C.: Deep sulcal landmarks provide an
organizing framework for human cortical folding. Cereb. Cortex 18, 1415–1420 (2008)
7. Im, K., Jo, H.J., Mangin, J.F., et al.: Spatial distribution of deep sulcal landmarks and
hemispherical asymmetry on the cortical surface. Cereb. Cortex 20, 602–611 (2010)
8. Meng, Y., Li, G., Lin, W., et al.: Spatial distribution and longitudinal development of deep
cortical sulcal landmarks in infants. NeuroImage 100, 206–218 (2014)
9. Li, G., Nie, J., Wang, L., et al.: Mapping region-specific longitudinal cortical surface
expansion from birth to 2 years of age. Cereb. Cortex 23, 2724–2733 (2013)
10. Wang, B., Mezlini, A.M., Demir, F., et al.: Similarity network fusion for aggregating data
types on a genomic scale. Nat. Methods 11, 333–337 (2014)
11. Li, G., Nie, J., Wang, L., et al.: Mapping longitudinal hemispheric structural asymmetries of
the human cerebral cortex from birth to 2 years of age. Cereb. Cortex 24, 1289–1300 (2014)
12. Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315,
972–976 (2007)
13. Sun, Z.Y., Kloppel, S., Riviere, D., et al.: The effect of handedness on the shape of the
central sulcus. NeuroImage 60, 332–339 (2012)
Modeling Functional Dynamics of Cortical
Gyri and Sulci

Xi Jiang¹(✉), Xiang Li¹, Jinglei Lv¹,², Shijie Zhao², Shu Zhang¹,
Wei Zhang¹, Tuo Zhang², and Tianming Liu¹

¹ Cortical Architecture Imaging and Discovery Lab,
Department of Computer Science and Bioimaging Research Center,
The University of Georgia, Athens, GA, USA
superjx2318@gmail.com
² School of Automation, Northwestern Polytechnical University, Xi’an, China

Abstract. Cortical gyrification is one of the most prominent features of the human
brain. A variety of studies in the brain mapping field have demonstrated the

brain. A variety of studies in the brain mapping field have demonstrated the
specific structural and functional differences between gyral and sulcal regions.
However, previous studies of gyri/sulci function analysis based on the fMRI
data assume the temporal stationarity over the entire fMRI scan, while the
possible temporal dynamics of gyri/sulci function is largely unknown. We
present a computational framework to model the functional dynamics of cortical
gyri and sulci based on task fMRI data. Specifically, the whole-brain fMRI
signals’ temporal segments are derived via the sliding time window approach.
The spatial overlap patterns among functional networks (SOPFNs), which are
crucial for characterizing brain functions, are then measured within each time
window via a group-wise sparse representation approach. Finally, the temporal
dynamics of SOPFNs distribution on gyral/sulcal regions across all time win-
dows are assessed. Experimental results based on the publicly released Human
Connectome Project task fMRI data demonstrated that the proposed framework
identified meaningful differences in the temporal dynamics of the SOPFN distribution
between gyral and sulcal regions that are reproducible across different subjects
and task fMRI datasets. Our results provide novel understanding of the
functional dynamics mechanisms of the human cerebral cortex.

Keywords: Cortical gyri and sulci  Functional dynamics  Spatial overlap


pattern  Task-based fMRI

1 Introduction

Cortical gyrification, by which the cortex is highly convoluted into convex gyri and concave sulci, is
one of the most prominent characteristics of the human brain [1]. A variety of studies have
reported the specific structural/functional difference between gyral and sulcal regions.
For example, from the structural perspective, it has been reported that the terminations of
streamline white matter fiber bundles derived from diffusion tensor imaging or high
angular resolution diffusion imaging concentrate on gyri in both human fetal and
adult brains, as well as in chimpanzee and macaque brains [2–4]. From the functional
perspective, a recent study reported that the functional connectivity based on resting-state
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 19–27, 2016.
DOI: 10.1007/978-3-319-46720-7_3
20 X. Jiang et al.

functional magnetic resonance imaging (rsfMRI) data is strong between gyral-gyral


regions, weak between sulcal-sulcal regions, and moderate between gyral-sulcal
regions [5]. Another study demonstrated that the task-based heterogeneous functional
regions derived from task fMRI (tfMRI) data are located significantly more on gyral regions
than on sulcal regions [6].
Although significant progresses have been achieved in exploration of structural/
functional difference between gyral and sulcal regions, a potential limitation is that
previous studies of gyri/sulci function analysis based on the fMRI data (e.g., [5, 6])
assume the temporal stationarity over the entire fMRI scan, i.e., the functional mea-
surement is performed over the entire fMRI scan, while the possible temporal dynamics
of gyri/sulci function has been largely unknown. Neuroscience studies have suggested
that the function of the brain is dynamic both spatially and temporally [7]. That is, the
dynamically changing functional interactions between different cortical regions mediate
the moment-by-moment functional switching in the brain [7]. Recent studies based on
fMRI data also demonstrate that the brain undergoes dynamic changes of functional
connectivity [8]. In short, investigating the possible functional dynamics differences
between gyral and sulcal regions is crucial for understanding the functional dynamic
mechanisms of the human cerebral cortex.
As an attempt to address the abovementioned limitation, in this work we propose a
computational framework to model the functional dynamics of cortical gyri and sulci
based on task fMRI data. First, temporal segments of the whole-brain fMRI signals are
derived via the widely used sliding time window approach. Second, the spatial
overlap patterns among functional networks (SOPFNs), which characterize the brain
regions involved in multiple concurrent neural processes under specific task
performance and have been demonstrated to be crucial in depicting brain functions
[9, 10], are measured within each time window via the proposed group-wise sparse
representation approach. Finally, based on the SOPFN distribution on gyral/sulcal
regions within each time window, the temporal dynamics of the SOPFN distribution
on gyral/sulcal regions across all time windows are assessed. We hypothesize that
there are differences in functional dynamics (characterized in this work by the
temporal dynamics of the SOPFN distribution) between cortical gyral and sulcal
regions. Given the lack of ground truth in brain mapping, we argue that
reproducibility across different subjects and tasks is a reasonable verification of the
identified temporal dynamics differences of the SOPFN distribution between cortical
gyral and sulcal regions.

2 Materials and Methods

2.1 Data Acquisition and Pre-processing


The recently publicly released HCP (Q1 release) grayordinate tfMRI data [11],
including 64 subjects and 7 task designs (emotion, gambling, language, motor,
relational, social, and working memory), were used in this work. The tfMRI
acquisition parameters and task designs are detailed in [11]. Pre-processing of the
tfMRI data followed [10]. Note that all subjects have the same number (64,984) of
'grayordinates' (cortical surface vertices) [11], which have reasonably precise
correspondences across subjects for group-wise analysis.
Modeling Functional Dynamics of Cortical Gyri and Sulci 21

2.2 tfMRI Signals’ Temporal Segments Extraction


The temporal segments of the whole-brain tfMRI signals are first extracted for each
subject as illustrated in Fig. 1. Specifically, for subject i, the tfMRI signals of all
whole-brain grayordinates were extracted, normalized to zero mean and standard
deviation of 1, and aggregated into a signal matrix $X_i \in \mathbb{R}^{t \times n}$
with t tfMRI time points and n grayordinates (Fig. 1a). The sliding time window
approach, which has been widely and effectively applied for functional brain temporal
dynamics analysis (e.g., [8]), is then adopted, as defined in Eq. (1), to segment $X_i$
into a series of consecutive temporal segments $X_i^{w_j} \in \mathbb{R}^{l \times n}$
within the time window $w_j$, which starts at time point $t_j$ and has unified window
length l:

$$X_i^{w_j} = \{ X_i^q \mid t_j \le q \le t_j + l \}, \quad t_j = 1, \ldots, (t - l + 1) \tag{1}$$

where $X_i^q$ is the q-th row of $X_i$, i.e., the signal values of all grayordinates at
time point q. There are $(t - l + 1)$ time windows in total for each subject.

Fig. 1. tfMRI signals' temporal segment extraction. (a) The cortical surface and whole-brain
tfMRI signal matrix $X_i$ of subject i. The tfMRI signal of an example grayordinate (cortical
vertex) is shown and highlighted by the blue frame. (b) Examples of two extracted consecutive
temporal segments $X_i^{w_j}$ and $X_i^{w_{j+1}}$ (highlighted by yellow and blue frames,
respectively).

2.3 SOPFN Identification via Group-Wise Sparse Representation of tfMRI Signal
Temporal Segments

Based on the extracted temporal segments of the tfMRI signals within each time
window, we identify the SOPFN within each time window in two major steps. First, we
identify meaningful group-wise consistent functional networks across different subjects
within each time window via a group-wise sparse representation approach. Second, we
measure the SOPFN based on the identified functional brain networks within each time
window.
The dictionary learning and sparse representation framework has been demonstrated
to be efficient and effective in identifying concurrent functional brain networks from
tfMRI signals [12, 13]. As illustrated in Fig. 2a, considering a group of I subjects, the
temporal segment matrices $X_i^{w_j}$ at time window $w_j$ of all subjects are
arranged into a big matrix $X^{w_j} \in \mathbb{R}^{l \times (n \cdot I)}$. $X^{w_j}$
is then represented via an over-complete dictionary matrix $D^{w_j} \in \mathbb{R}^{l \times m}$
(m is the dictionary size, with $m > l$ and $m \ll n \cdot I$) and a sparse coefficient
weight matrix $\alpha^{w_j} \in \mathbb{R}^{m \times (n \cdot I)}$ using an effective
online dictionary learning algorithm [14]. In brief, an empirical cost function
considering the average loss of regression to the $n \cdot I$ temporal segments is
defined as

$$f_{n \cdot I}(D^{w_j}) \triangleq \frac{1}{n \cdot I} \sum_{k=1}^{n \cdot I} \min_{\alpha_k^{w_j} \in \mathbb{R}^m} \frac{1}{2} \| x_k^{w_j} - D^{w_j} \alpha_k^{w_j} \|_2^2 + \lambda \| \alpha_k^{w_j} \|_1 \tag{2}$$

where the $\ell_1$-norm regularization and $\lambda$ are adopted to trade off the
regression residual and the sparsity level of $\alpha_k^{w_j}$, and $x_k^{w_j}$ is the
k-th column of $X^{w_j}$. To make the coefficients in $\alpha^{w_j}$ comparable, we
also constrain each column $d_k^{w_j}$ of $D^{w_j}$ as defined in Eq. (3). The whole
problem is then rewritten as the matrix factorization problem in Eq. (4) and solved by
[14] to obtain $D^{w_j}$ and $\alpha^{w_j}$:

$$C \triangleq \{ D^{w_j} \in \mathbb{R}^{l \times m} \ \text{s.t.} \ \forall k = 1, \ldots, m, \ (d_k^{w_j})^T d_k^{w_j} \le 1 \} \tag{3}$$

$$\min_{D^{w_j} \in C, \ \alpha^{w_j} \in \mathbb{R}^{m \times (n \cdot I)}} \sum_{k=1}^{n \cdot I} \frac{1}{2} \| x_k^{w_j} - D^{w_j} \alpha_k^{w_j} \|_2^2 + \lambda \| \alpha_k^{w_j} \|_1 \tag{4}$$

Fig. 2. (a) Illustration of group-wise sparse representation of the temporal segments of a group
of subjects. (b) An example identified group-wise consistent functional network, obtained via
mapping a specific row (highlighted in red) of $p^{w_j}$ back onto the cortical surface.
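For intuition, the alternating optimization behind Eqs. (2)–(4) can be sketched in plain NumPy. This is a didactic ISTA-based stand-in for the online algorithm of [14], not the implementation used in the paper, and all matrix sizes below are illustrative:

```python
import numpy as np

def soft_threshold(Z, t):
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def sparse_code(X, D, lam, n_iter=200):
    # ISTA for min_A 0.5*||X - D A||_F^2 + lam*||A||_1 (Eq. (2), per column)
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - step * (D.T @ (D @ A - X)), step * lam)
    return A

def dictionary_learning(X, m, lam, n_outer=10, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], m))
    D /= np.linalg.norm(D, axis=0, keepdims=True)   # Eq. (3): unit-norm atoms
    for _ in range(n_outer):
        A = sparse_code(X, D, lam)                  # sparse coding step
        # least-squares dictionary update, then re-project onto Eq. (3)
        D = X @ A.T @ np.linalg.inv(A @ A.T + 1e-6 * np.eye(m))
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, A

# toy stand-in for X^{w_j}: l = 20 time points, n*I = 100 segment columns
X = np.random.default_rng(1).standard_normal((20, 100))
D, A = dictionary_learning(X, m=8, lam=1.0)
assert D.shape == (20, 8) and A.shape == (8, 100)
assert np.mean(A == 0) > 0              # the l1 penalty yields sparse codes
```

Each column of D plays the role of an atom $d_k^{w_j}$ (a network's temporal pattern), and the corresponding row of A its spatial coefficient vector.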

Since the dictionary learning and sparse representation maintain the organization of
all temporal segments and subjects in $X^{w_j}$, the obtained $\alpha^{w_j}$ also
preserves the spatial information of the temporal segments across the I subjects. We
therefore decompose $\alpha^{w_j}$ into I sub-matrices $\alpha_1^{w_j}, \ldots, \alpha_I^{w_j} \in \mathbb{R}^{m \times n}$
corresponding to the I subjects (Fig. 2a). The element (r, s) in each sub-matrix
represents the coefficient value of the s-th grayordinate for the r-th dictionary atom in
$D^{w_j}$ for that subject. In order to obtain a common sparse coefficient weight
matrix across the I subjects, we perform a t-test of the null hypothesis for each element
(r, s) across the I subjects (p-value < 0.05), similar to [15], to obtain the p-value
matrix $p^{w_j} \in \mathbb{R}^{m \times n}$ (Fig. 2b), in which element (r, s)
represents the statistical significance of the coefficient of the s-th grayordinate for the
r-th dictionary atom across all I subjects. $p^{w_j}$ thus serves as the common sparse
coefficient weight matrix. From a brain science perspective, $d_k^{w_j}$ (the k-th
column of $D^{w_j}$) represents the temporal pattern of a specific group-wise
consistent functional network, and its corresponding coefficient vector $p_k^{w_j}$
(the k-th row of $p^{w_j}$) can be mapped back onto the cortical surface (color-coded
by z-score transformed from the p-value) (Fig. 2b) to represent the spatial pattern of
the network. We then identify the meaningful group-wise consistent functional
networks from $p_k^{w_j}$ (k = 1, ..., m), similar to [10]. Specifically, the
GLM-derived activation maps and the intrinsic network templates provided in [16] are
adopted as the network templates. The network from $p_k^{w_j}$ with the highest
spatial pattern similarity to a specific network template (defined as
$J(S, T) = |S \cap T| / |T|$, where S and T are the spatial patterns of a specific
network and a template, respectively) is identified as a group-wise consistent
functional brain network at $w_j$.
Once we identify all group-wise consistent functional brain networks at $w_j$, the
SOPFN at $w_j$ is defined as the set of all common cortical vertices $g_i$
(i = 1, ..., 64984) involved in the spatial patterns of all identified functional networks
[9, 10]:

$$V_{w_j} = \{ g_i \ \text{s.t.} \ g_i \ \text{belongs to all networks at} \ w_j \} \tag{5}$$
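Given binarized spatial maps of the identified networks, the set intersection in Eq. (5) is a one-liner. The maps below are random placeholders, assumed only for illustration:

```python
import numpy as np

# hypothetical binary spatial maps of K identified networks over n vertices
rng = np.random.default_rng(0)
K, n = 5, 1000
networks = rng.random((K, n)) > 0.4   # True where a vertex belongs to a network

# Eq. (5): the SOPFN keeps vertices shared by ALL identified networks at w_j
sopfn = networks.all(axis=0)          # boolean SOPFN mask of length n
vertex_ids = np.flatnonzero(sopfn)    # indices g_i of the common vertices

assert sopfn.shape == (n,)
assert np.all(networks[:, vertex_ids])   # every kept vertex is in every network
```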

2.4 Temporal Dynamics Assessment of the SOPFN Distribution on Gyri/Sulci

Based on the identified SOPFN at time window $w_j$ in Eq. (5), we assess the SOPFN
distribution on cortical gyral/sulcal regions at $w_j$. Denote the principal curvature
value of cortical vertex $g_i$ (i = 1, ..., 64984) as $pcurv_{g_i}$, which is provided
in the HCP data [11]: $pcurv_{g_i} \ge 0$ indicates that $g_i$ lies on a gyrus, and
$pcurv_{g_i} < 0$ that it lies on a sulcus. The SOPFN $V_{w_j}$ restricted to gyral and
sulcal regions is then $V_{w_j}|_{gyri} = \{ g_i \ \text{s.t.} \ g_i \in V_{w_j}, pcurv_{g_i} \ge 0 \}$
and $V_{w_j}|_{sulci} = \{ g_i \ \text{s.t.} \ g_i \in V_{w_j}, pcurv_{g_i} < 0 \}$,
respectively. Note that $V_{w_j} = V_{w_j}|_{gyri} \cup V_{w_j}|_{sulci}$. We further
define the SOPFN distribution percentage at $w_j$ as
$P_{w_j}|_{gyri} = |V_{w_j}|_{gyri}| / |V_{w_j}|$ for gyri and
$P_{w_j}|_{sulci} = |V_{w_j}|_{sulci}| / |V_{w_j}|$ for sulci, where $|\cdot|$ denotes
the number of members of a set and $P_{w_j}|_{gyri} + P_{w_j}|_{sulci} = 1$.
Finally, to assess the temporal dynamics of the SOPFN distribution on gyral/sulcal
regions, we define $P_{gyri} = [P_{w_1}|_{gyri}, P_{w_2}|_{gyri}, \ldots, P_{w_{t-l+1}}|_{gyri}]$
as a $(t - l + 1)$-dimensional feature vector representing the dynamics of the SOPFN
distribution percentage across all $(t - l + 1)$ time windows on gyri. Similarly, we
define $P_{sulci} = [P_{w_1}|_{sulci}, P_{w_2}|_{sulci}, \ldots, P_{w_{t-l+1}}|_{sulci}]$
for sulci.
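The per-window gyral/sulcal fractions above reduce to simple mask arithmetic. The following sketch assumes boolean SOPFN masks and a toy curvature vector in place of the HCP-provided values:

```python
import numpy as np

def sopfn_percentages(sopfn_masks, pcurv):
    """Per-window fractions of SOPFN vertices on gyri (pcurv >= 0) and
    sulci (pcurv < 0), i.e. P_wj|gyri and P_wj|sulci of Sect. 2.4."""
    P_gyri, P_sulci = [], []
    for mask in sopfn_masks:
        total = mask.sum()
        P_gyri.append(np.sum(mask & (pcurv >= 0)) / total)
        P_sulci.append(np.sum(mask & (pcurv < 0)) / total)
    return np.array(P_gyri), np.array(P_sulci)

rng = np.random.default_rng(2)
pcurv = rng.standard_normal(500)                     # toy curvature values
masks = [rng.random(500) > 0.5 for _ in range(6)]    # toy SOPFN masks, 6 windows
P_gyri, P_sulci = sopfn_percentages(masks, pcurv)
assert np.allclose(P_gyri + P_sulci, 1.0)            # the two fractions sum to 1
```

Stacking the per-window values yields the feature vectors $P_{gyri}$ and $P_{sulci}$ directly.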

3 Experimental Results

For each of the seven tfMRI datasets, we equally divided the 64 subjects into two
groups (32 each) for reproducibility studies. The window length l was experimentally
determined (l = 20) using a method similar to that in [8]. The values of m and
$\lambda$ in Eq. (4) were experimentally determined (m = 50 and $\lambda$ = 1.5)
using a method similar to that in [13].

3.1 Group-Wise Consistent Functional Networks Within Different Time Windows
We successfully identified group-wise consistent functional networks within different
time windows based on the methods in Sect. 2.3. Figure 3 shows the spatial maps of
two example identified functional networks within different time windows in one
subject group of the emotion tfMRI data. We can see that for each of the two networks
(Fig. 3b–c), albeit similar in overall spatial pattern, there is considerable variability of
the spatial pattern across different time windows compared with the network template.
Quantitatively, the mean spatial pattern similarity J defined in Sect. 2.3 is 0.69 ± 0.10
and 0.36 ± 0.06 for the two networks, respectively. This finding is consistent between
the two subject groups for all seven tfMRI datasets. The spatial pattern variability of
the same functional network across different time windows is in agreement with the
argument that there is different involvement of specific brain regions in the
corresponding networks across different time windows [7].

Fig. 3. Two example group-wise consistent functional networks within different time windows
in one subject group of the emotion tfMRI data. (a) Task design curves across time windows
(TW) of the emotion tfMRI data. 12 example TWs are indexed. Three different TW types are
divided by black dashed lines and labeled: TW type #1 involves task design 1, TW type #2
involves task design 2, and TW type #3 involves both task designs. The spatial patterns of
(b) one example task-evoked functional network and (c) one example intrinsic connectivity
network (ICN) within the 12 example TWs are shown.

3.2 Temporal Dynamics Difference of SOPFN Distribution on Gyri/Sulci


We identified the SOPFN based on all identified functional networks using Eq. (5) and
further assessed the SOPFN distribution on cortical gyral/sulcal regions within each
time window. Figure 4 shows the mean SOPFN distribution on gyri and sulci across

Fig. 4. The mean SOPFN distribution on gyral (G) and sulcal (S) regions across different time
window types in the two subject groups of emotion tfMRI data. The common regions with higher
density are highlighted by red arrows. The two example surfaces illustrate the gyri/sulci and are
color-coded by the principal curvature value.

different TW types in the emotion tfMRI data as an example. We can see that despite
certain common regions (with relatively higher density, highlighted by red arrows),
there is considerable SOPFN distribution variability between gyral and sulcal regions
across different time windows. Quantitatively, the distribution percentage on gyral
regions is statistically larger than that on sulcal regions across all time windows
(two-sample t-test, p < 0.05) for all seven tfMRI datasets, as reported in Table 1.

Table 1. The mean ratio of SOPFN distribution percentage on gyri vs. that on sulci across all
time windows in the two subject groups of seven tfMRI datasets.
Emotion Gambling Language Motor Relational Social WM
Group 1 1.47 1.60 1.46 1.32 1.59 1.46 1.67
Group 2 1.45 1.55 1.38 1.33 1.49 1.47 1.66

Finally, we calculated and visualized $P_{gyri}$ and $P_{sulci}$, representing the
dynamics of the SOPFN distribution percentage across all time windows on gyri and
sulci, respectively, in Fig. 5. Interestingly, there are considerable peaks/valleys in the
distribution percentage on gyri/sulci that coincide with the specific task designs across
the entire scan, indicating a temporal dynamics difference of the SOPFN distribution
between gyral and sulcal regions. These results indicate that gyri might participate

Fig. 5. The temporal dynamics of SOPFN distribution percentage on gyri (green curve) and
sulci (yellow curve) across all time windows in the seven tfMRI datasets shown in (a)–(g),
respectively. The task design curves in each sub-figure are represented by different colors. Y-axis
represents the percentage value (×100 %).

more than sulci in spatially overlapping and interacting concurrent functional networks
(neural processes) under temporal dynamics. It should be noted that the identified
temporal dynamics difference of the SOPFN distribution between gyral and sulcal
regions (Fig. 5) is reasonably reproducible between the two subject groups and across
all seven high-resolution tfMRI datasets. Given the lack of ground truth in brain
mapping, the reproducibility of our results is unlikely to be due to systematic artifact
and thus provides a reasonable verification of the meaningfulness of the results.

4 Discussion and Conclusion

We proposed a novel computational framework to model the functional dynamics of
cortical gyri and sulci. Experimental results based on 64 HCP subjects and their 7
tfMRI datasets identified meaningful temporal dynamics differences of the SOPFN
distribution between cortical gyral and sulcal regions. Our results provide novel
understanding of the brain's functional dynamics mechanisms. In future work, we will
investigate other potential sources of the differences observed in the results, and we
will apply the proposed framework to resting state fMRI data and more tfMRI
datasets, e.g., the recent 900-subject tfMRI data released by HCP, to further
reproduce and validate the findings.

References
1. Rakic, P.: Specification of cerebral cortical areas. Science 241, 170–176 (1988)
2. Nie, J., et al.: Axonal fiber terminations concentrate on gyri. Cereb. Cortex 22(12), 2831–
2839 (2012)
3. Chen, H., et al.: Coevolution of gyral folding and structural connection patterns in primate
brains. Cereb. Cortex 23(5), 1208–1217 (2013)
4. Takahashi, E., et al.: Emerging cerebral connectivity in the human fetal brain: an MR
tractography study. Cereb. Cortex 22(2), 455–464 (2012)
5. Deng, F., et al.: A functional model of cortical gyri and sulci. Brain Struct. Funct. 219(4),
1473–1491 (2014)
6. Jiang, X., et al.: Sparse representation of HCP grayordinate data reveals novel functional
architecture of cerebral cortex. Hum. Brain Mapp. 36(12), 5301–5319 (2015)
7. Gilbert, C.D., Sigman, M.: Brain states: top-down influences in sensory processing. Neuron
54(5), 677–696 (2007)
8. Li, X., et al.: Dynamic functional connectomics signatures for characterization and
differentiation of PTSD patients. Hum. Brain Mapp. 35(4), 1761–1778 (2014)
9. Duncan, J.: The multiple-demand (MD) system of the primate brain: mental programs for
intelligent behaviour. Trends Cogn. Sci. 14(4), 172–179 (2010)
10. Lv, J., et al.: Sparse representation of whole-brain fMRI signals for identification of
functional networks. Med. Image Anal. 20(1), 112–134 (2015)
11. Glasser, M.F., et al.: The minimal preprocessing pipelines for the Human Connectome
Project. Neuroimage 80, 105–124 (2013)
12. Lee, K., et al.: A data-driven sparse GLM for fMRI analysis using sparse dictionary learning
with MDL criterion. IEEE Trans. Med. Imaging 30(5), 1076–1089 (2011)
13. Lv, J., et al.: Holistic atlases of functional networks and interactions reveal reciprocal
organizational architecture of cortical function. IEEE TBME 62(4), 1120–1131 (2015)
14. Mairal, J., et al.: Online learning for matrix factorization and sparse coding. J. Mach. Learn.
Res. 11, 19–60 (2010)
15. Lv, J., et al.: Assessing effects of prenatal alcohol exposure using group-wise sparse
representation of fMRI data. Psychiatry Res. 233, 254–268 (2015)
16. Smith, S.M., et al.: Correspondence of the brain’s functional architecture during activation
and rest. Proc. Natl. Acad. Sci. U.S.A. 106(31), 13040–13045 (2009)
A Multi-stage Sparse Coding Framework
to Explore the Effects of Prenatal Alcohol
Exposure

Shijie Zhao1, Junwei Han1(&), Jinglei Lv1,2, Xi Jiang2, Xintao Hu1,
Shu Zhang2, Mary Ellen Lynch3, Claire Coles3, Lei Guo1,
Xiaoping Hu3, and Tianming Liu2
1 School of Automation, Northwestern Polytechnical University, Xi'an, China
  Junweihan2010@gmail.com
2 Cortical Architecture Imaging and Discovery,
  Department of Computer Science and Bioimaging Research Center,
  The University of Georgia, Athens, GA, USA
3 Emory University, Atlanta, GA, USA

Abstract. In clinical neuroscience, task-based fMRI (tfMRI) is a popular
method to explore brain network activation differences between healthy
controls and brain disorders such as Prenatal Alcohol Exposure (PAE).
Traditionally, most studies adopt the general linear model (GLM) to detect
task-evoked activations. However, GLM has been demonstrated to be limited in
reconstructing concurrent heterogeneous networks. In contrast, sparse
representation based methods have attracted increasing attention due to their
capability of automatically reconstructing concurrent brain activities. However,
this data-driven strategy is still challenged in establishing accurate
correspondence across individuals and characterizing group-wise consistent
activation maps in a principled way. In this paper, we propose a novel
multi-stage sparse coding framework to identify group-wise consistent networks
in a structured way. By applying this novel framework to two groups of tfMRI
data (healthy control and PAE), we can effectively identify group-wise
consistent activation maps and characterize brain networks/regions affected
by PAE.

Keywords: Task-based fMRI · Dictionary learning · Sparse coding · PAE

1 Introduction

TfMRI has been widely used in clinical neuroscience to understand functional brain
disorders [1]. Among state-of-the-art tfMRI analysis methodologies, the general
linear model (GLM) is the most popular approach for detecting functional networks
under specific task performance [2]. The basic idea underlying GLM is that task-evoked
brain activities can be discovered by subtracting the activity of a control condition
[3, 4]. In common practice, experimental and control trials are performed several times
and the fMRI signals are averaged to increase the signal-to-noise ratio [3]. Thus,
task-dominant brain activities are greatly enhanced while other subtle and concurrent
activities are largely overlooked. Another alternative approach is independent
component analysis (ICA) [5]. However, the theoretical foundation of ICA-based
methods has been challenged in recent studies [6]. Therefore, more advanced tfMRI
activation detection methods are still needed.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 28–36, 2016.
DOI: 10.1007/978-3-319-46720-7_4
Recently, dictionary learning and sparse representation methods have been adopted
for fMRI data analysis [6, 7] and have attracted much attention. The basic idea is to
factorize the fMRI signal matrix into an over-complete dictionary of basis atoms and a
coefficient matrix via dictionary learning algorithms [8]. Specifically, each dictionary
atom represents the functional activity of a specific brain network and its
corresponding coefficient vector stands for the spatial distribution of this brain
network [7]. It should be noted that the decomposed coefficient matrix naturally
reveals the spatial patterns of the inferred brain networks. This strategy naturally
accounts for the various brain networks that might be involved in concurrent
functional processes [9, 10].
However, a notable challenge in the current data-driven strategy is how to establish
accurate network correspondence across individuals and characterize the group-wise
consistent activation map in a structured way. Since each dictionary is learned in a
data-driven way, it is hard to establish the correspondence across subjects. To address
this challenge, in this paper we propose a novel multi-stage sparse coding framework
to identify diverse group-wise consistent brain activities and characterize the subtle
cross-group differences under specific task conditions. Specifically, we first
concatenate all the fMRI datasets temporally and adopt a dictionary learning method
to identify the group-level activation maps across all subjects. After that, we constrain
spatial/temporal features in the dictionary learning procedure to identify individualized
temporal and spatial patterns from individual fMRI data. These constrained features
naturally preserve the correspondence across different subjects. Finally, a statistical
mapping method is adopted to identify group-wise consistent maps. In this way, the
group-wise consistent maps are identified in a structured way. By applying the
proposed framework to two groups of tfMRI data (healthy control and PAE groups),
we successfully identified diverse group-wise consistent brain networks for each group
and specific brain networks/regions that are affected by PAE under an arithmetic task.

2 Materials and Methods


2.1 Overview
Figure 1 summarizes the computational pipeline of the multi-stage sparse coding
framework. There are four major steps. First, we concatenate all subjects' datasets
temporally to form a concatenated time × voxels data matrix (Fig. 1a) and employ
dictionary learning and sparse coding algorithms [8] to identify the group-level
activation maps in the population. Then, for each subject's fMRI data, we adopt a
supervised dictionary learning method that constrains the group-level spatial patterns
to learn an individualized dictionary for each subject (Fig. 1b). These individualized
dictionaries are learned from individual data, and thus subject variability is better
preserved. After that, for each subject, supervised dictionary learning constraining the
individual dictionary is adopted to learn an individualized coefficient matrix (Fig. 1c).
In this way, the individualized spatial maps are reconstructed. Finally, based on the
correspondence established in our method, a statistical coefficient mapping method is
adopted to characterize the group-consistent activation maps for each group (Fig. 1d).
Therefore, the correspondence between different subjects is preserved throughout the
whole procedure and the group-consistent activation maps are identified in a
structured way.

Fig. 1. The computational framework of the proposed methods. (a) Concatenated sparse coding.
t is the number of time points, n is the number of voxels, and k is the number of dictionary
atoms. (b) Supervised dictionary learning with spatial maps fixed. (c) Supervised dictionary
learning with temporal features fixed. (d) Statistical mapping to identify group-wise consistent
maps for each group.

2.2 Data Acquisition and Pre-processing

Thirty subjects participated in the arithmetic task-based fMRI experiment under IRB
approval [11]. They are young adults aged 20–26 from two groups: unexposed healthy
controls (16 subjects) and PAE-affected subjects (14 subjects). Two participants from
the healthy control group were excluded due to poor data quality. All participants
were scanned in a 3T Siemens Trio scanner, and 10 task blocks alternated between a
letter-matching control task and a subtraction arithmetic task. The acquisition
parameters are as follows: TR = 3 s, TE = 32 ms, flip angle = 90°, resolution
3.44 mm × 3.44 mm × 3 mm, and dimensions 64 × 64 × 34. The preprocessing
pipeline was performed in FSL [12], including motion correction, slice timing
correction, spatial smoothing, and global drift removal. The processed volumes were
then registered to the standard space (MNI152) for further analysis.

2.3 Dictionary Learning and Sparse Representation

Given the fMRI signal matrix $S \in \mathbb{R}^{L \times n}$, where L is the number
of fMRI time points and n is the number of voxels, dictionary learning and sparse
representation methods aim to represent each signal in S as a sparse linear
combination of dictionary (D) atoms with a coefficient matrix A, i.e., S = DA. The
empirical cost function is defined as

$$f_n(D) \triangleq \frac{1}{n} \sum_{i=1}^{n} \ell(s_i, D) \tag{1}$$

where D is the dictionary, $\ell$ is the loss function, n is the number of voxels, and
$s_i$ is a training sample representing the time course of one voxel. This problem of
minimizing the empirical cost can be further rewritten as a matrix factorization
problem with a sparsity penalty:

$$\min_{D \in C, \ A \in \mathbb{R}^{k \times n}} \frac{1}{2} \| S - DA \|_2^2 + \lambda \| A \|_{1,1} \tag{2}$$

where $\lambda$ is a sparsity regularization parameter, k is the number of dictionary
atoms, and C is the set defined by the constraint preventing D from having arbitrarily
large values. To solve this problem, we adopt the online dictionary learning and
sparse coding method [8]; the algorithm pipeline is summarized in Algorithm 1
below.
below.

2.4 Constraining Spatial Maps in Dictionary Learning

In this section, we adjust the dictionary learning procedure to constrain spatial maps
and thereby learn an individualized dictionary. Similar to GLM, we name each
identified network an activation map. First, each group-level activation map is
converted into a binary vector matrix $V \in \{0, 1\}^{k \times n}$ by thresholding.
Since both A and S share the same number of voxels, they have similar structures. We
set these vectors V as constraints when updating the coefficient matrix. Specifically, if
a coefficient matrix element is zero while the corresponding constraint matrix location
is 1, this element is replaced with 0.1 (any other small nonzero value is acceptable) to
keep the element 'active'. The coefficient matrix is otherwise updated as usual, except
that these elements are kept 'active' (nonzero). The coefficient matrix updating
procedure can be represented as follows:

$$A_i \triangleq \arg\min_{A_i \in \mathbb{R}^m} \frac{1}{2} \| s_i - D^{(t-1)} A_i \|_2^2 + \lambda \| A_i \|_1; \quad A_i^p = 0.1 \ \text{if} \ A_i^p = 0 \ \text{and} \ V(i, p) = 1 \tag{5}$$
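The 'keep active' step of Eq. (5) amounts to a masked in-place replacement. A minimal NumPy sketch, with random placeholder matrices standing in for the real coefficient and constraint matrices:

```python
import numpy as np

# hypothetical sparse coefficient matrix A (k atoms x n voxels) and binary
# group-level spatial constraint matrix V of the same shape (Sect. 2.4)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 10))
A[rng.random((4, 10)) > 0.5] = 0.0          # sparse codes contain exact zeros
V = rng.random((4, 10)) > 0.5

A_active = A.copy()
A_active[(A_active == 0.0) & V] = 0.1       # keep constrained entries 'active'

assert np.all(A_active[(A == 0.0) & V] == 0.1)   # activated where V demands
assert np.all(A_active[~V] == A[~V])             # unconstrained entries untouched
```

In the full procedure this replacement would be applied after each sparse coding step, before the next dictionary update.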

2.5 Constraining Temporal Features in Dictionary Learning

In this stage, the dictionary is fully fixed and the learning problem reduces to a simple
sparse regression problem. Specifically, the dictionary learning and sparse
representation problem leads to the following formulation:

$$\min_{A \in \mathbb{R}^{k \times n}} \frac{1}{2} \| S - D_c A \|_2^2 + \lambda \| A \|_{1,1}$$

where $D_c$ is the fixed individualized dictionary, k is the number of dictionary
atoms, and A is the coefficient matrix learned from each individual's fMRI data with
the individualized dictionary held fixed in the dictionary learning procedure.
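With $D_c$ fixed, the problem above is a column-wise lasso, which can be sketched with a few ISTA iterations in NumPy. This is an illustrative stand-in for the solver actually used, with made-up sizes:

```python
import numpy as np

def fixed_dictionary_codes(S, Dc, lam, n_iter=300):
    """ISTA sketch of min_A 0.5*||S - Dc A||^2 + lam*||A||_1 with the
    individualized dictionary Dc held fixed (Sect. 2.5)."""
    step = 1.0 / (np.linalg.norm(Dc, 2) ** 2 + 1e-12)
    A = np.zeros((Dc.shape[1], S.shape[1]))
    for _ in range(n_iter):
        A = A - step * (Dc.T @ (Dc @ A - S))                      # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0.0)  # shrinkage
    return A

rng = np.random.default_rng(3)
L, n, k = 30, 50, 6
Dc = rng.standard_normal((L, k))
Dc /= np.linalg.norm(Dc, axis=0)            # unit-norm atoms
S = Dc @ rng.standard_normal((k, n))        # toy signals spanned by Dc
A = fixed_dictionary_codes(S, Dc, lam=0.1)

assert A.shape == (k, n)
# with a small lam, the codes reconstruct S better than the zero solution
assert np.linalg.norm(S - Dc @ A) < np.linalg.norm(S)
```

Each column of A is that voxel's loading on the fixed temporal atoms, i.e. its individualized spatial map values.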

2.6 Statistical Mapping

With the constrained features in the dictionary learning procedure, the
correspondences of spatial activation maps between different subjects are naturally
preserved. In order to reconstruct accurate group-consistency maps, we hypothesize
that each element of the coefficient matrix is group-wise null and carry out a standard
t-test to test the acceptance of this hypothesis. Specifically,

$$T(i, j) = \frac{\overline{A_{G_x}(i, j)}}{\sqrt{\mathrm{var}(A_{G_x}(i, j))}} \tag{7}$$

where $\overline{A_{G_x}(i, j)}$ represents the average value of element (i, j) over the
subjects in group $G_x$, and x denotes either the patient group or the control group.
The t-test acceptance threshold is set at p < 0.05. The derived T-values are further
transformed to standard z-scores. In this way, each group generates a group-consistent
Z statistic map, and each row of Z can be mapped back to the brain volume, standing
for the spatial distribution of the corresponding dictionary atom.
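Eq. (7) as printed normalizes the mean by the standard deviation; below, a standard one-sample t-test (mean over standard error) is used, which is presumably what is intended. The coefficients are simulated placeholders:

```python
import numpy as np
from scipy import stats

# hypothetical per-subject coefficients for one group: subjects x entries,
# where each entry is one (atom, voxel) element of the coefficient matrix
rng = np.random.default_rng(4)
n_subjects, n_entries = 16, 200
coeffs = rng.standard_normal((n_subjects, n_entries)) + 0.8  # true mean 0.8

# test the group-wise null hypothesis of Sect. 2.6 element-wise
t_vals, p_vals = stats.ttest_1samp(coeffs, popmean=0.0, axis=0)
z_vals = np.sign(t_vals) * stats.norm.isf(p_vals / 2)  # signed z from 2-sided p
significant = p_vals < 0.05                            # acceptance threshold

assert t_vals.shape == (n_entries,)
assert significant.mean() > 0.5    # most entries detect the simulated effect
```

Reshaping `z_vals` back to (atoms × voxels) gives the group-consistent Z statistic map described above.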

3 Experimental Results

The proposed framework was applied to the two groups of tfMRI data: unexposed
healthy controls and PAE patients. In each stage, the dictionary size is 300, the
sparsity is around 0.05, and the optimization method is stochastic approximation.
Briefly, we identified 263 meaningful networks in the concatenated sparse coding
stage, and 22 of them were affected by PAE. The detailed experimental results are
reported as follows.

3.1 Identified Group-Level Activation Maps by Concatenated Sparse Coding

Figure 2 shows a few examples of group-level activation maps identified by
concatenated sparse coding (Fig. 1a). From these figures, we can see that both the
GLM activation map and common resting state networks [13] are identified, which
indicates that sparse coding based methods are powerful in identifying diverse and
concurrent brain activities. The quantitative measurements are shown in Table 1. The
spatial similarity is defined as:

$$R(X, T) = \frac{|X \cap T|}{|T|} \tag{8}$$

where X is a learned spatial network from $A_l$ and T is the RSN template.
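The overlap rate in Eq. (8) is a simple set ratio over binarized maps; a minimal sketch with toy 5-voxel masks:

```python
import numpy as np

def overlap_rate(X_mask, T_mask):
    """Eq. (8): R(X, T) = |X ∩ T| / |T| for binary spatial maps."""
    return np.logical_and(X_mask, T_mask).sum() / T_mask.sum()

# toy 5-voxel maps: the network covers 2 of the template's 3 voxels
X_mask = np.array([1, 1, 0, 0, 1], dtype=bool)
T_mask = np.array([1, 0, 0, 1, 1], dtype=bool)
assert overlap_rate(X_mask, T_mask) == 2 / 3
```

Note the measure is asymmetric: it is normalized by the template size |T|, not by the union, so it rewards covering the template rather than matching it exactly.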

Fig. 2. Examples of identified meaningful networks by concatenated sparse coding. The first
row is the template name and the second row is the template spatial map. The third row is the
corresponding component network number in concatenated sparse coding. The last row shows
the corresponding spatial maps in concatenated sparse coding. RSN denotes a common resting
state network from [13]; the GLM result is computed with the FSL FEAT software.

Table 1. The spatial overlap rates between the identified networks and the corresponding GLM
activation map and resting state templates.
GLM RSN#1 RSN#2 RSN#3 RSN#4 RSN#5 RSN#6 RSN#7 RSN#8 RSN#9
0.47 0.45 0.57 0.45 0.37 0.29 0.36 0.48 0.34 0.34

3.2 Learned Individualized Temporal Patterns


After concatenated sparse coding, in order to better account for subject activation
variety, we constrained these identified spatial patterns in dictionary learning procedure
and learned individualized temporal patterns (the method is detailed in Sect. 2.4) for
each subject. Figure 3 shows two kinds of typically learned individualized temporal
34 S. Zhao et al.

(a)

(b)

Fig. 3. Identified individualized temporal patterns and correlation matrix between different
subjects. (a) Identified individualized temporal patterns by constraining the same task-evoked
activation map (identified in concatenated sparse coding) in dictionary learning procedure. The
red line is the task paradigm pattern and the other lines are derived individualized temporal
activity patterns from healthy control group subjects for the same task-evoked activation
map. The right figure is the correlation matrix between different subjects. (b) Identified
individualized temporal patterns by constraining resting state activation map (identified in
concatenated sparse coding).

patterns and the correlation matrix between different subjects. Specifically, Fig. 3a
shows the temporal patterns learned by constraining the task-evoked group activation
map (Network #175 in Fig. 2). The red line is the task design paradigm convolved
with the hemodynamic response function. It is interesting to see that the individualized
temporal patterns learned by constraining the task-evoked activation map are quite
consistent, and their average is similar to the task paradigm regressor. The correlation
matrix between subjects in the healthy control group is visualized in the right map in
Fig. 3a, and the average value is as high as 0.5.
Another kind of dictionary pattern is learned by constraining resting-state networks.
Figure 3b shows the temporal patterns and the correlation matrix between the healthy
control group subjects when constraining the resting-state network (#152 in Fig. 2).
These temporal patterns differ considerably across subjects, and the average correlation
is as low as 0.15. From these results, we can see that the learned individualized temporal
patterns are reasonable according to current neuroscience knowledge, and that the subtle
differences in temporal activation patterns among subjects under the same task condition
are recognized by the proposed framework (Fig. 4).
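As a sketch of this consistency check (not the authors' code), the correlation matrix between subjects' learned temporal patterns and its average off-diagonal value can be computed as follows; the toy sinusoidal patterns and all variable names are illustrative.

```python
import numpy as np

def inter_subject_correlation(patterns):
    """Correlation matrix between subjects' temporal patterns.

    patterns: (n_subjects, T) array, one learned temporal pattern per subject
    for the same (task-evoked or resting-state) network. Returns the full
    correlation matrix and the mean off-diagonal value.
    """
    corr = np.corrcoef(patterns)                      # (n_subjects, n_subjects)
    off_diag = corr[~np.eye(len(corr), dtype=bool)]   # drop the self-correlations
    return corr, off_diag.mean()

# Toy example: 5 subjects sharing a noisy sinusoidal "task response".
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
patterns = np.sin(t) + 0.5 * rng.standard_normal((5, 200))
corr, avg = inter_subject_correlation(patterns)
print(round(avg, 2))  # high average correlation for task-locked patterns
```

A resting-state-like case (independent noise per subject) would give an average near zero, mirroring the contrast between Fig. 3a and Fig. 3b.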
A Multi-Stage Sparse Coding Framework 35


Fig. 4. Examples of identified group-wise activation maps in different groups. (a) and (b) are
organized in the same fashion. The first row shows the component number, and the second row
shows the concatenated sparse coding results. The third row shows the reconstructed statistical
activation map in the healthy control group, and the last row shows the statistical activation
map in the PAE group. Blue circles highlight the differences between the statistical maps of the two groups.

3.3 Affected Activation Networks by Prenatal Alcohol Exposure


In order to identify individualized spatial activation maps, we then constrained the
dictionary learning procedure with each subject's individualized dictionary (detailed in
Sect. 2.5). These fixed features naturally preserve the correspondence information
between subjects. After that, we adopted the statistical mapping in Sect. 2.6 to generate
statistical group-wise consistency maps for each group. Although the general spatial
shapes are similar, there are subtle differences between the statistical consistency maps
of the different groups, which indicates that the multi-stage sparse coding better
captures individual variability. Specifically, blue circles highlight the brain regions that
differ between the healthy control group and the PAE group. These areas include the left
inferior occipital areas, the left superior and right inferior parietal regions, and the medial
frontal gyrus, which have been reported to be related to prenatal alcohol exposure [11].
Further, it is also interesting to see that there is a clear reduction of region size in the
corresponding group consistency networks, suggesting a similar effect of prenatal
alcohol exposure to that reported in the literature [11].
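A minimal sketch of these two steps, assuming a per-voxel lasso for coding with a fixed dictionary and a voxel-wise one-sample t-test for the group consistency map; this uses scikit-learn in place of the authors' implementation, and the toy sizes are illustrative.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

def individual_spatial_maps(X, D, alpha=0.1):
    """Sparse spatial coefficients for one subject with its dictionary fixed.

    X: (T, n_voxels) fMRI data, D: (T, K) individualized temporal dictionary.
    Returns A: (K, n_voxels), one sparse loading vector per voxel.
    """
    lasso = Lasso(alpha=alpha, max_iter=5000)
    A = np.empty((D.shape[1], X.shape[1]))
    for v in range(X.shape[1]):
        A[:, v] = lasso.fit(D, X[:, v]).coef_
    return A

def group_consistency_map(maps, k):
    """Voxel-wise one-sample t-test over subjects for network k.

    maps: list of (K, n_voxels) coefficient matrices, one per subject.
    Returns the t-statistic per voxel (the statistical consistency map).
    """
    loadings = np.stack([m[k] for m in maps])   # (n_subjects, n_voxels)
    t, _ = stats.ttest_1samp(loadings, 0.0, axis=0)
    return t

# Toy data: 4 subjects, 50 time points, 30 voxels, K = 3 networks.
rng = np.random.default_rng(1)
maps = []
for _ in range(4):
    D = rng.standard_normal((50, 3))
    A_true = rng.standard_normal((3, 30)) * (rng.random((3, 30)) > 0.6)
    X = D @ A_true + 0.1 * rng.standard_normal((50, 30))
    maps.append(individual_spatial_maps(X, D))
tmap = group_consistency_map(maps, k=0)
print(tmap.shape)  # (30,): one t-value per voxel
```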

4 Conclusion

We proposed a novel multi-stage sparse coding framework for inferring group consistency maps and characterizing subtle group response differences under specific
task performance. Specifically, we combined concatenated sparse coding, supervised
dictionary learning, and statistical mapping to identify statistical group consistency
maps in each group. This novel framework largely overcomes the lack of correspondence
between different subjects in current sparse coding based methods and provides a
structured way to identify statistical group consistency maps. Experiments on healthy
control and PAE tfMRI data have demonstrated the advantage of the proposed framework
in identifying meaningful and diverse group consistency brain networks. In the future,
we will further investigate the evaluation of subjects' individual maps in the framework,
together with parameter optimization, and test our framework on a variety of other
tfMRI datasets.

Acknowledgements. J. Han was supported by the National Science Foundation of China under
Grants 61473231 and 61522207. X. Hu was supported by the National Science Foundation of
China under grant 61473234, and the Fundamental Research Funds for the Central Universities
under grant 3102014JCQ01065. T. Liu was supported by the NIH Career Award (NIH
EB006878), NIH R01 DA033393, NSF CAREER Award IIS-1149260, NIH R01 AG-042599,
NSF BME-1302089, and NSF BCS-1439051.

References
1. Matthews, P.M., et al.: Applications of fMRI in translational medicine and clinical practice.
Nat. Rev. Neurosci. 7(9), 732–744 (2006)
2. Fox, M.D., et al.: The human brain is intrinsically organized into dynamic, anticorrelated
functional networks. PNAS 102(27), 9673–9678 (2005)
3. Mastrovito, D.: Interactions between resting-state and task-evoked brain activity suggest a
different approach to fMRI analysis. J. Neurosci. 33(32), 12912–12914 (2013)
4. Friston, K.J., et al.: Statistical parametric maps in functional imaging: a general linear
approach. Hum. Brain Mapp. 2(4), 189–210 (1994)
5. Mckeown, M.J., et al.: Spatially independent activity patterns in functional MRI data during
the stroop color-naming task. PNAS 95(3), 803–810 (1998)
6. Lee, K., et al.: A data-driven sparse GLM for fMRI analysis using sparse dictionary learning
with MDL criterion. IEEE Trans. Med. Imaging 30(5), 1076–1089 (2009)
7. Lv, J., et al.: Sparse representation of whole-brain fMRI signals for identification of
functional networks. Med. Image Anal. 1(20), 112–134 (2014)
8. Mairal, J., et al.: Online learning for matrix factorization and sparse coding. J. Mach. Learn.
Res. 11, 19–60 (2010)
9. Pessoa, L.: Beyond brain regions: network perspective of cognition–emotion interactions.
Behav. Brain Sci. 35(03), 158–159 (2012)
10. Anderson, M.L., Kinnison, J., Pessoa, L.: Describing functional diversity of brain regions
and brain networks. Neuroimage 73, 50–58 (2013)
11. Santhanam, P., et al.: Effects of prenatal alcohol exposure on brain activation during an
arithmetic task: an fMRI study. Alcohol. Clin. Exp. Res. 33(11), 1901–1908 (2009)
12. Jenkinson, M., Smith, S.: A global optimization method for robust affine registration of brain
images. Med. Image Anal. 5(2), 143–156 (2001)
13. Smith, S.M., et al.: Correspondence of the brain’s functional architecture during activation
and rest. PNAS 106(31), 13040–13045 (2009)
Correlation-Weighted Sparse Group
Representation for Brain Network Construction
in MCI Classification

Renping Yu1,2, Han Zhang2, Le An2, Xiaobo Chen2, Zhihui Wei1,
and Dinggang Shen2(B)
1 School of Computer Science and Engineering,
Nanjing University of Science and Technology, Nanjing, China
2 Department of Radiology and BRIC, UNC at Chapel Hill, Chapel Hill, NC, USA
dgshen@med.unc.edu

Abstract. Analysis of brain functional connectivity network (BFCN)


has shown great potential in understanding brain functions and iden-
tifying biomarkers for neurological and psychiatric disorders, such as
Alzheimer’s disease and its early stage, mild cognitive impairment (MCI).
In all these applications, the accurate construction of a biologically meaningful
brain network is critical. Due to the sparse nature of the brain
network, sparse learning has been widely used for complex BFCN con-
struction. However, the conventional l1 -norm penalty in the sparse learn-
ing equally penalizes each edge (or link) of the brain network, which
ignores the link strength and could remove strong links in the brain
network. Besides, the conventional sparse regularization often overlooks
the group structure in the brain network, i.e., a set of links (or connections)
sharing a similar attribute. To address these issues, we propose to
construct BFCN by integrating both link strength and group structure
information. Specifically, a novel correlation-weighted sparse group con-
straint is devised to account for and balance among (1) sparsity, (2) link
strength, and (3) group structure, in a unified framework. The proposed
method is applied to MCI classification using the resting-state fMRI from
ADNI-2 dataset. Experimental results show that our method is effective
in modeling human brain connectomics, as demonstrated by superior
MCI classification accuracy of 81.8 %. Moreover, our method is promis-
ing for its capability in modeling more biologically meaningful sparse
brain networks, which will benefit both basic and clinical neuroscience
studies.

1 Introduction
Study of the brain functional connectivity network (BFCN), based on resting-state
fMRI (rs-fMRI), has shown great potential in understanding brain functions
R. Yu was supported by the Research Fund for the Doctoral Program of Higher
Education of China (RFDP) (No. 20133219110029), the Key Research Foundation
of Henan Province (15A520056) and NFSC (No. 61171165, No. 11431015).

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 37–45, 2016.
DOI: 10.1007/978-3-319-46720-7 5
38 R. Yu et al.

and identifying biomarkers for neurological disorders [1]. Many BFCN modeling
approaches have been proposed, and most of them represent the brain network as
a graph by treating brain regions as nodes and the connectivity between a pair
of regions as an edge (or link) [2]. Specifically, the brain can first be parcellated
into regions-of-interest (ROIs), and the connectivity between a pair of ROIs can
then be estimated by the correlation between the mean blood-oxygen-level-dependent
(BOLD) time series of these ROIs.
The most common BFCN modeling approach is based on pairwise Pearson's
correlation (PC). However, PC is insufficient to account for the interactions
among multiple brain regions [3], since it only captures pairwise relationships.
Another common modeling approach is based on sparse representation (SR).
For example, the sparse estimation of partial correlation with l1 -regularization
can measure the relationship among certain ROIs while factoring out the effects
of other ROIs [4]. This technique has been applied to construct brain network in
the studies of Alzheimer’s disease (AD), mild cognitive impairment (MCI) [3],
and autism spectrum disorder [5]. However, the human brain inherently contains not
only sparse connections but also group structure [6], with the latter considered
more in recent BFCN modeling methods. A pioneering work [7] proposed non-overlapping
group sparse representation by considering group structure and supporting
group selection. The group structure has been utilized in various ways.
For example, Varoquaux et al. [8] used group sparsity prior to constrain all sub-
jects to share the same network topology. Wee et al. [9] used group constrained
sparsity to overcome inter-subject variability in the brain network construction.
To introduce sparsity within each group, sparse group representation (SGR)
has also been developed by combining l1-norm and lq,1-norm constraints. For
example, a recent work [10] defined each "group" based on anatomical connectivity,
and then applied SGR to construct BFCN from the whole-brain fMRI
signals.
Note that, in all these existing methods, the l1-norm constraint in both SR
and SGR penalizes each edge equally. That is, when learning the sparse representation
for a certain ROI, the BOLD signals in all other ROIs are treated equally. This
ignores the similarity between the BOLD signals of the considered ROI and
those of the other ROIs during the network reconstruction. In fact, if the BOLD signals of
two ROIs are highly similar, their strong connectivity should be kept or enhanced
during the BFCN construction, while weak connectivity should be restrained.
In light of this, we introduce a link-strength-related penalty into the sparse
representation. Moreover, to make the penalty consistent across all similar links
in the whole brain network, we propose a group-structure-based constraint on
the similar links, allowing them to share the same penalty during the network
construction. In this way, we can jointly model the whole brain network, instead
of separately modeling a sub-network for each ROI. This is implemented by a
novel weighted sparse group regularization that considers sparsity, link strength,
and group structure in a unified framework.
To validate the effectiveness of our proposed method in constructing brain
functional network, we conduct experiments on a real fMRI dataset for the BFCN
Brain Network Construction 39

construction and also for BFCN-based brain disorder diagnosis. The experimental
results in distinguishing MCI subjects from normal controls (NCs) confirm
that our proposed method, with a simple t-test for feature selection and a linear
SVM for classification, can achieve superior classification performance compared
to the competing methods. The features (i.e., network connections) selected by
our method can be utilized as potential biomarkers in future studies on early
intervention of such a progressive and incurable disease.

2 Brain Network Construction and MCI Classification


Suppose that each brain has been parcellated into $N$ ROIs according to a certain
brain atlas. The regional mean time series of the $i$-th ROI can be denoted by a
column vector $x_i = [x_{1i}; x_{2i}; \ldots; x_{Ti}] \in \mathbb{R}^{T}$, where $T$ is the number of time
points in the time series, and thus $X = [x_1, \ldots, x_i, \ldots, x_N] \in \mathbb{R}^{T \times N}$ denotes the
data matrix of a subject. The key step in constructing the BFCN for this
subject is to estimate the connectivity matrix $W \in \mathbb{R}^{N \times N}$, given the $N$ nodes
(i.e., $x_i$, $i = 1, 2, \ldots, N$), each of which represents the signals in an ROI.
Many studies model the connectivity of brain regions by a sparse network [4].
The optimization of the BFCN construction based on SR can be formulated as

$$\min_{W}\; \sum_{i=1}^{N} \frac{1}{2}\Big\|x_i - \sum_{j \neq i} x_j W_{ji}\Big\|_2^2 + \lambda \sum_{i=1}^{N} \sum_{j \neq i} |W_{ji}|. \qquad (1)$$

The l1-norm penalty in Eq. (1) penalizes each representation coefficient
with the same weight. In other words, it treats each ROI equally when
reconstructing a target ROI ($x_i$). As a result, sparse modeling methods based
on this formulation tend to reconstruct the target ROI from ROIs whose signals are
very different from those of the target ROI. Furthermore, the reconstruction of each
ROI is independent of the reconstructions of the other ROIs; thus, the estimated
reconstruction coefficients for similar ROIs can vary considerably, which can
lead to an unstable BFCN construction. Hence, the link strength, which indicates
the signal similarity of two ROIs, should be considered in the BFCN construction.

2.1 Correlation-Weighted Sparse Group Representation for BFCN Construction

To take the link strength into account, we introduce a correlation-weighted sparse
penalty into Eq. (1). Specifically, if the BOLD signals of two ROIs are highly similar,
i.e., their link is strong, then this link should be penalized less. On the
other hand, a weak link should be penalized more, with a larger weight. To measure
the link strength between the signals of two ROIs, the PC coefficient can be calculated.
The penalty weight for $W_{ji}$, i.e., the link between the $i$-th ROI $x_i$ and the
$j$-th ROI $x_j$, can then be defined as:

$$C_{ji} = e^{-\frac{P_{ji}^2}{\sigma}}, \qquad (2)$$

where $P_{ji}$ is the PC coefficient between the $i$-th ROI $x_i$ and the $j$-th ROI $x_j$,
and $\sigma$ is a parameter used to adjust the weight decay speed of the link-strength
adaptor. Accordingly, the correlation-weighted sparse representation (WSR) can
be formulated as

$$\min_{W}\; \sum_{i=1}^{N} \frac{1}{2}\Big\|x_i - \sum_{j \neq i} x_j W_{ji}\Big\|_2^2 + \lambda \sum_{i=1}^{N} \sum_{j \neq i} C_{ji} |W_{ji}|, \qquad (3)$$

where $C \in \mathbb{R}^{N \times N}$ is the link-strength adaptor matrix, with each element $C_{ji}$
inversely related to the similarity (i.e., PC coefficient) between the
signals of ROI $x_j$ and those of the target ROI $x_i$.
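As a hedged sketch (not the paper's SLEP-based solver), the weighted l1 problem in Eq. (3) can be reduced to a standard lasso by the change of variables $v_j = C_{ji} W_{ji}$, i.e., by rescaling each column before fitting and mapping the solution back; the choice of sigma below is a placeholder, not the paper's setting.

```python
import numpy as np
from sklearn.linear_model import Lasso

def wsr_network(X, alpha=0.05):
    """Sketch of the correlation-weighted sparse representation (Eqs. (2)-(3)).

    X: (T, N) matrix of regional mean time series (columns z-scored).
    The weighted l1 penalty with C_ji = exp(-P_ji**2 / sigma) is absorbed by
    rescaling column j by 1/C_ji, solving a plain lasso, and mapping back.
    sklearn's alpha plays the role of lambda (up to the 1/T factor).
    """
    T, N = X.shape
    P = np.corrcoef(X.T)                    # Pearson correlations between ROIs
    sigma = np.std(np.abs(P))               # illustrative choice of sigma
    C = np.exp(-P ** 2 / sigma)             # strong links are penalized less
    W = np.zeros((N, N))
    for i in range(N):
        idx = [j for j in range(N) if j != i]
        Xs = X[:, idx] / C[idx, i]          # column rescaling absorbs the weights
        v = Lasso(alpha=alpha, max_iter=10000).fit(Xs, X[:, i]).coef_
        W[idx, i] = v / C[idx, i]           # back to the original variables
    return W

rng = np.random.default_rng(2)
X = rng.standard_normal((120, 10))
X = (X - X.mean(0)) / X.std(0)
W = wsr_network(X)
print(W.shape)  # (10, 10), with a zero diagonal
```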
Note that the reconstruction of $x_i$, i.e., the construction of the $i$-th sub-network,
is still independent of the reconstructions of the sub-networks for the other ROIs.
To further make this link-strength-related penalty consistent across all
links of similar strength in the whole network, we propose a group structure
constraint on the similar links, allowing them to share the same penalty during
the whole BFCN construction. In this way, we can model the whole brain network
jointly, instead of separately modeling sub-networks for all ROIs.


Fig. 1. Illustration of the group partition for a typical subject in our data. (a) Pearson
correlation coefficient matrix P. (b) The corresponding group partition (K = 10) of (a), showing the size of each group G1–G10 over the absolute PC value.

To identify the group structure, we partition all links, i.e., the pairwise connections
among ROIs, into K groups based on their PC coefficients. Specifically,
K non-overlapping groups of links are pre-specified by their corresponding PC
coefficients. Assuming the absolute value of the PC coefficient $|P_{ij}|$ ranges over
$[P_{\min}, P_{\max}]$ with $P_{\min} \geq 0$ and $P_{\max} \leq 1$, we partition $[P_{\min}, P_{\max}]$ into K
uniform, non-overlapping intervals of equal width $\Delta = (P_{\max} - P_{\min})/K$. The
$k$-th group is defined as $G_k = \{(i, j) \mid |P_{ij}| \in [P_{\min} + (k-1)\Delta,\, P_{\min} + k\Delta]\}$.
Figure 1 shows the grouping result with K = 10 for illustration purposes. Most
links in the network are weak, while strong connections account for only a small
number of links.
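The partition just described can be sketched as follows; the toy correlation matrix and all names are illustrative.

```python
import numpy as np

def partition_links(P, K=10):
    """Partition all links into K groups by absolute Pearson correlation.

    P: (N, N) correlation matrix. Group k collects the index pairs whose
    |P_ij| falls in the k-th uniform interval of [Pmin, Pmax].
    """
    iu = np.triu_indices_from(P, k=1)           # each undirected link once
    absP = np.abs(P[iu])
    pmin, pmax = absP.min(), absP.max()
    delta = (pmax - pmin) / K
    # Clip so that |P| == pmax lands in the last group rather than group K+1.
    labels = np.clip(((absP - pmin) // delta).astype(int), 0, K - 1)
    groups = [list(zip(iu[0][labels == k], iu[1][labels == k]))
              for k in range(K)]
    return groups

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 20))
P = np.corrcoef(A.T)
groups = partition_links(P, K=10)
print(sum(len(g) for g in groups))  # 190 = 20*19/2 links in total
```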
To integrate the constraints on link strength and group structure, as well as sparsity, in a unified framework, we propose a novel weighted sparse group regularization formulated as:

$$\min_{W}\; \sum_{i=1}^{N} \frac{1}{2}\Big\|x_i - \sum_{j \neq i} x_j W_{ji}\Big\|_2^2 + \lambda_1 \sum_{i=1}^{N} \sum_{j \neq i} C_{ji} |W_{ji}| + \lambda_2 \sum_{k=1}^{K} d_k \|W_{G_k}\|_q, \qquad (4)$$

where $\|W_{G_k}\|_q = \big(\sum_{(i,j) \in G_k} |W_{ij}|^q\big)^{1/q}$ is the $l_q$-norm (with $q = 2$ in this work),
$d_k = e^{-E_k^2/\sigma}$ is a pre-defined weight for the $k$-th group, and
$E_k = \frac{1}{|G_k|}\sum_{(i,j) \in G_k} P_{ij}$.
Here $\sigma$ is the same parameter as in Eq. (2), set as the mean over all subjects of the standard
variance of the absolute PC coefficients. In Eq. (4), the first regularizer (the l1-norm
penalty) controls the overall sparsity of the reconstruction model, and the second
regularizer (the lq,1-norm penalty) contributes sparsity at the group level.
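For reference, here is a sketch of evaluating the objective of Eq. (4) on a candidate W, useful for sanity-checking a solver; all names and toy values are illustrative, and the actual optimization in the paper is done with the SLEP toolbox.

```python
import numpy as np

def wsgr_objective(X, W, C, groups, d, lam1, lam2, q=2):
    """Value of the unified objective in Eq. (4).

    X: (T, N) data; W: (N, N) coefficients; C: (N, N) link-strength weights
    from Eq. (2); groups: list of lists of index pairs G_k; d: per-group
    weights d_k = exp(-E_k**2 / sigma).
    """
    W0 = W - np.diag(np.diag(W))             # enforce W_ii = 0
    residual = X - X @ W0                    # column i: x_i - sum_{j!=i} x_j W_ji
    data_term = 0.5 * np.sum(residual ** 2)
    l1_term = lam1 * np.sum(C * np.abs(W0))
    group_term = lam2 * sum(
        dk * np.sum(np.array([abs(W0[i, j]) for i, j in g]) ** q) ** (1.0 / q)
        for dk, g in zip(d, groups) if g
    )
    return data_term + l1_term + group_term

# Sanity check on toy data: with W = 0, only the data term remains.
rng = np.random.default_rng(4)
X = rng.standard_normal((50, 6))
C = np.ones((6, 6))
groups = [[(0, 1)], [(2, 3), (4, 5)]]
d = [1.0, 0.5]
val = wsgr_objective(X, np.zeros((6, 6)), C, groups, d, lam1=0.1, lam2=0.1)
print(np.isclose(val, 0.5 * np.sum(X ** 2)))  # True
```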

2.2 MCI Classification


The estimated BFCNs are applied to the classification of MCI and NC. Note that
the learned connectivity matrix $W$ can be asymmetric. Therefore, we simply
symmetrize it as $W^* = (W + W^{T})/2$ and use $W^*$ to represent
the final network, which contains $N(N-1)/2$ effective connectivity measures due
to symmetry. These connectivity measures are used as the imaging features,
with a feature dimensionality of 4005 for the case of N = 90. For feature
selection, we use a two-sample t-test with a significance level of p < 0.05 to
select the features that significantly differentiate the MCI and NC classes. After
feature selection, we employ a linear SVM [11] with c = 1 for classification.
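The feature extraction and classification steps above can be sketched as follows; this is a toy illustration, not the authors' pipeline, and the synthetic networks and all names are assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.svm import SVC

def network_features(W):
    """Symmetrize W as (W + W.T)/2 and keep its N(N-1)/2 upper-triangle entries."""
    Ws = (W + W.T) / 2.0
    return Ws[np.triu_indices_from(Ws, k=1)]

def select_and_classify(F_train, y_train, F_test, p_thresh=0.05):
    """Two-sample t-test feature selection followed by a linear SVM with C = 1."""
    _, p = stats.ttest_ind(F_train[y_train == 1], F_train[y_train == 0], axis=0)
    keep = p < p_thresh
    clf = SVC(kernel="linear", C=1.0).fit(F_train[:, keep], y_train)
    return clf.predict(F_test[:, keep])

# Toy data: 40 subjects with 10-node networks and one informative link.
rng = np.random.default_rng(5)
y = np.array([0] * 20 + [1] * 20)
F = []
for label in y:
    W = 0.1 * rng.standard_normal((10, 10))
    W[0, 1] = W[1, 0] = 0.8 * label + 0.05 * rng.standard_normal()
    F.append(network_features(W))
F = np.asarray(F)
pred = select_and_classify(F[:30], y[:30], F[30:])
print(F.shape[1])  # 45 connectivity features for N = 10
```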

3 Experiments
The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset is used in this
study. Specifically, 50 MCI patients and 49 NCs are selected from the ADNI-2
dataset for our experiments. Subjects from both groups were scanned using 3.0T
Philips scanners. The SPM8 toolbox (http://www.fil.ion.ucl.ac.uk/spm/) was used
to preprocess the rs-fMRI data according to a well-accepted pipeline [6].

3.1 Brain Functional Network Construction


The Automated Anatomical Labeling (AAL) template is used to define 90 brain
ROIs, and the mean rs-fMRI signals are extracted from each ROI to model the
BFCN. For comparison, we also construct brain networks using other methods,
including PC, SR, WSR, and SGR (corresponding to $C_{ji} = 1$ in Eq. (4)).
The SLEP toolbox [12] is used to solve the sparsity-related models in this paper.
Figure 2 shows the BFCNs of one typical subject constructed by the 5 different
methods. As can be seen from Fig. 2(a), the intrinsic grouping in brain connectivity
can be observed indirectly, although PC only measures pairwise ROI interactions.
Comparing Fig. 2(b) and (d), we observe relatively fewer non-zero elements in
the SGR-based model due to the use of group sparse regularization. Similarly,
the group structure is more obvious in Fig. 2(e), by our WSGR method, than in
Fig. 2(c), by WSR.

(a) PC  (b) SR  (c) WSR  (d) SGR  (e) WSGR

Fig. 2. Comparison of BFCNs of the same subject reconstructed by 5 different methods.

Regarding the effectiveness of the link-strength-related weights, we can see
from Fig. 2(c) and (e) that the sparse constraints with these weights model the
BFCN more reasonably than their counterparts without weights (shown in
Fig. 2(b) and (d)).

3.2 Classification Results

A leave-one-out cross-validation (LOOCV) strategy is adopted in our experiments.
To set the values of the regularization parameters (i.e., $\lambda$ in SR and WSR, and
$\lambda_1, \lambda_2$ in SGR and WSGR), we employed a nested LOOCV strategy on the training
set. The parameters are grid-searched over the range $\{2^{-5}, 2^{-4}, \ldots, 2^{0}, \ldots, 2^{4}, 2^{5}\}$.
To evaluate the classification performance, we use seven evaluation measures:
accuracy (ACC), sensitivity (SEN), specificity (SPE), area under the curve (AUC),
Youden's index (YI), F-score, and balanced accuracy (BAC).
As shown in Fig. 3, the proposed WSGR model achieves the best classification
performance, with an accuracy of 81.8 %, followed by WSR (78.8 %).
By comparison, we can verify the effectiveness of the link-strength-related
weights from two aspects. First, the WSR model with
link-strength-related weights from PC performs much better than both the
PC and SR models. Second, the classification result of our model outperforms
that of the SGR model (72.73 %). Similarly, by comparing the results of the SR and
WSR models with those of the SGR and WSGR models, the effectiveness of
our group-structure-based penalty can be well justified. With the
DeLong's non-parametric statistical significance test [13], our proposed method

(a) Classification Performance  (b) ROC Curves

Fig. 3. Comparison of the classification results of the five methods, in terms of seven classification performance metrics and ROC curves.

significantly outperforms PC, SR, WSR, and SGR at the 95 % confidence level, with
p-values of $1.7 \times 10^{-7}$, $3.6 \times 10^{-6}$, 0.048, and 0.0017, respectively. The
superior performance of our method suggests that the weighted group sparsity is
beneficial in constructing brain networks and is also able to improve the
classification performance.
As the features selected by the t-test in each validation fold might differ, we
record all features selected during the training process. The 76 most frequently
selected features are visualized in Fig. 4, where the thickness of an arc indicates
the discriminative power of an edge, which is inversely proportional to the
estimated p-value. The colors of the arcs are randomly generated to differentiate ROIs

Fig. 4. The most frequently selected connections for the 90 ROIs of AAL template. The
thickness of an arc indicates the discriminative power of an edge for MCI classification.

and connections for clear visualization. We can see that several brain regions
(highlighted in the figure) are jointly selected as important features for MCI
classification. For example, a set of brain regions in the temporal pole, olfactory
areas, and medial orbitofrontal cortex, as well as the bilateral fusiform, are found to
have dense connections that are pivotal to MCI classification [14].

4 Conclusion

In this paper, we have proposed a novel weighted sparse group representation
method for brain network modeling, which integrates link strength, group structure,
and sparsity in a unified framework. In this way, the complex brain
network can be modeled more accurately than with other commonly used
methods. Our proposed method was validated on the task of MCI vs. NC classification,
and superior results were obtained compared to the classification performance
of other brain network modeling approaches. In future work, we plan
to develop more effective grouping strategies to further improve the
modeling accuracy and the MCI diagnosis performance.

References
1. Fornito, A., Zalesky, A., Breakspear, M.: The connectomics of brain disorders. Nat.
Rev. Neurosci. 16, 159–172 (2015)
2. Smith, S.M., Miller, K.L., et al.: Network modelling methods for FMRI. NeuroIm-
age 54, 875–891 (2011)
3. Huang, S., Li, J., Sun, L., Ye, J., Fleisher, A., Wu, T., The Alzheimer's Disease
NeuroImaging Initiative: Learning brain connectivity of Alzheimer's disease by sparse
inverse covariance estimation. NeuroImage 50, 935–949 (2010)
4. Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection
with the lasso. Ann. Stat., 1436–1462 (2006)
5. Lee, H., Lee, D.S., et al.: Sparse brain network recovery under compressed sensing.
IEEE Trans. Med. Imaging 30, 1154–1165 (2011)
6. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: uses
and interpretations. NeuroImage 52, 1059–1069 (2010)
7. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped
variables. J. R. Stat. Soc. Series. B. Stat. Methodol 68, 49–67 (2006)
8. Varoquaux, G., Gramfort, A., Poline, J.B., Thirion, B.: Brain covariance selec-
tion: better individual functional connectivity models using population prior. In:
Advances in Neural Information Processing Systems, pp. 2334–2342 (2010)
9. Wee, C.Y., et al.: Group-constrained sparse fMRI connectivity modeling for mild
cognitive impairment identification. Brain Struct. Funct. 219, 641–656 (2014)
10. Jiang, X., Zhang, T., Zhao, Q., Lu, J., Guo, L., Liu, T.: Fiber connection pattern-
guided structured sparse representation of whole-brain fMRI signals for functional
network inference. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A. (eds.)
MICCAI 2015. LNCS, vol. 9349, pp. 133–141. Springer, Heidelberg (2015)
11. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM
Trans. Intell. Syst. Technol. 2, 27 (2011)

12. Liu, J., Ji, S., Ye, J.: SLEP: sparse learning with efficient projections. Arizona
State Univ. 6, 491 (2009)
13. DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L.: Comparing the areas under
two or more correlated receiver operating characteristic curves: a nonparametric
approach. Biometrics, 837–845 (1988)
14. Albert, M.S., DeKosky, S.T., Dickson, D., et al.: The diagnosis of mild cognitive
impairment due to Alzheimer's disease: recommendations from the National
Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for
Alzheimer's disease. Alzheimer's Dement. 7, 270–279 (2011)
Temporal Concatenated Sparse Coding
of Resting State fMRI Data Reveal Network
Interaction Changes in mTBI

Jinglei Lv1,2(&), Armin Iraji3, Fangfei Ge1,2, Shijie Zhao1,
Xintao Hu1, Tuo Zhang1, Junwei Han1, Lei Guo1, Zhifeng Kou3,4,
and Tianming Liu2
1 School of Automation, Northwestern Polytechnical University, Xi'an, China
lvjinglei@gmail.com
2 Cortical Architecture Imaging and Discovery Lab, Department of Computer Science
and Bioimaging Research Center, The University of Georgia, Athens, GA, USA
3 Department of Biomedical Engineering, Wayne State University, Detroit, MI, USA
4 Department of Radiology, Wayne State University, Detroit, MI, USA

Abstract. Resting state fMRI (rsfMRI) has been a useful imaging modality for
network-level understanding and diagnosis of brain diseases, such as mild
traumatic brain injury (mTBI). However, effective methodologies are needed
that can detect group-wise and longitudinal changes of network interactions in
mTBI. The major challenges are twofold: (1) an individualized and
common network system that can serve as a reference platform for statistical
analysis is lacking; (2) networks and their interactions are usually not modeled in the same
algorithmic structure, which introduces bias and uncertainty. In this paper, we
propose a novel temporal concatenated sparse coding (TCSC) method to address
these challenges. Based on sparse graph theory, the proposed method models
the commonly shared spatial maps of networks and the local dynamics of
these networks in each subject within one algorithmic structure. The local
dynamics are naturally not comparable across subjects or groups in rsfMRI;
however, based on the correspondence established by the common spatial
profiles, the interactions of these networks can be modeled individually and
statistically assessed in a group-wise fashion. The proposed method has been
applied to an mTBI dataset with acute and sub-acute stages, and the experimental
results have revealed meaningful network interaction changes in mTBI.

1 Introduction

Mild traumatic brain injury (mTBI) has received increasing attention as a significant
public health care burden worldwide [1, 2]. Microstructural damage can be found in
most cases of mTBI using diffusion MRI [3, 4]. Meanwhile, many studies based on
resting state fMRI (rsfMRI) have reported network-level functional impairments in
the domains of memory, attention, executive function, and processing
time [5–7]. However, there is still no effective methodology that can model the

© Springer International Publishing AG 2016


S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 46–54, 2016.
DOI: 10.1007/978-3-319-46720-7_6
Temporal Concatenated Sparse Coding of Resting State fMRI Data 47

changes of interactions among brain networks longitudinally, which reflect the neural
plasticity and functional compensation during different stages of mTBI. The challenges
are mainly twofold: (1) an individualized and common network system
that can serve as a reference platform for statistical analysis is lacking; (2) networks and their
interactions are usually not modeled in the same algorithmic structure, which introduces
bias and uncertainty.
Conventional network analysis methods mainly fall into three streams: seed-based
network analysis [8], graph-theory-based quantitative network analysis [9], and
data-driven ICA component analysis [6, 10, 11]. Recently, sparse coding has attracted
intense attention in the fMRI analysis field because the sparsity constraint coincides
with the nature of neural activity, which makes it well suited to modeling the diversity of
brain networks [12–14]. Based on sparse graph theory, whole-brain fMRI signals
can be modeled by a learned dictionary of basis signals and a sparse parameter matrix.
Each voxel's signal is sparsely and linearly represented by the learned dictionary
with a sparse parameter vector [12–14]. The sparse parameters can be projected
onto the brain volume as spatial functional networks. This methodology has been
validated to be effective in reconstructing concurrent brain networks from fMRI data
[12–14]. However, the functional interactions among these networks have not been well
explored, especially for group-wise statistics on rsfMRI data. In this paper, we propose
a novel group-wise temporal concatenated sparse coding (TCSC) method for modeling resting
state functional networks and their network-level interactions. Briefly, a dictionary
matrix and a parameter matrix are learned from the temporally concatenated fMRI data
of multiple subjects and groups. Common network spatial profiles can then be
reconstructed from the parameter matrix. Interestingly, the learned dictionary is
also temporally concatenated and can be decomposed into a dictionary for each subject
of each group to represent the local dynamics of the common networks. Although the local
dynamics of each network are quite individualized, their interactions
are comparable based on the correspondence built by the common spatial profiles. The
proposed method has been applied to a longitudinal mTBI dataset, and our results
show that network interaction changes can be detected at different stages of
mTBI, suggesting brain recovery and plasticity after injury.

2 Materials and Method

2.1 Overview
Briefly, our method is designed for cross-group analysis and longitudinal modeling.
RsfMRI data from multiple subjects and groups are first pre-processed and then
spatially and temporally normalized, after which the fMRI signals are temporally
concatenated. Our framework consists of two main steps. As shown in
Fig. 1, in the first step, based on temporal concatenated sparse coding (TCSC), we
model the common spatial profiles of brain networks and the local network dynamics at the
same time. In the second step (Fig. 2), based on the local dynamics, the functional
interactions among networks are calculated, statistically assessed, and compared among
groups. In this research, there are two groups of subjects, which are healthy controls
groups. In this research, there are two groups of subjects, which are healthy controls
48 J. Lv et al.

and mTBI patients. For each group, there are two longitudinal stages: stage 1, with
patients at the acute stage and controls at their first visit; and stage 2, with patients at
the sub-acute stage and controls at their second visit.

2.2 Data Acquisition and Pre-processing


This study was approved by both the Human Investigation Committee of Wayne State
University and the Institutional Review Board of the Detroit Medical Center, where the
original data were collected [6]. Each participant signed an informed consent before enrollment.
RsfMRI data were acquired from 16 mTBI patients at the acute and sub-acute stages and
from 24 healthy controls at two stages with a one-month interval, on a 3-Tesla Siemens
Verio scanner with a 32-channel radiofrequency head-only coil, using a gradient EPI
sequence with the following imaging parameters: TR/TE = 2000/30 ms, slice
thickness = 3.5 mm, slice gap = 0.595 mm, pixel spacing = 3.125 × 3.125 mm,
matrix size = 64 × 64, flip angle = 90°, 240 volumes for whole-brain coverage,
NEX = 1, and an acquisition time of 8 min [6].
Pre-processing included skull removal, motion correction, slice-time correction,
spatial smoothing (FWHM = 5 mm), detrending and band-pass filtering (0.01 Hz–
0.1 Hz). All fMRI images were registered into the MNI space, and all fMRI data were
restricted to a common brain mask across groups. To avoid bias caused by individual
variability, each voxel's signal in each subject was normalized to zero mean and unit
standard deviation. Based on the voxel correspondence built by registration, the temporal
signals of each voxel can then be concatenated across subjects.
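The per-voxel normalization and temporal concatenation described above can be sketched as follows; this is a minimal NumPy illustration (array names, sizes, and the toy data are our own assumptions, with each subject's data stored as a time × voxel matrix):

```python
import numpy as np

def normalize_signals(X):
    """Z-score each voxel's time series (columns of a t x n matrix)
    to zero mean and unit standard deviation."""
    mu = X.mean(axis=0, keepdims=True)
    sd = X.std(axis=0, keepdims=True)
    sd[sd == 0] = 1.0              # guard for constant (e.g., masked-out) voxels
    return (X - mu) / sd

def temporal_concatenate(subject_matrices):
    """Stack normalized t x n matrices along the time axis, assuming the same
    registration-based voxel ordering in every subject."""
    return np.vstack([normalize_signals(X) for X in subject_matrices])

# Toy example: 3 subjects, 240 time points, 5 in-mask voxels each.
rng = np.random.default_rng(0)
subjects = [rng.standard_normal((240, 5)) for _ in range(3)]
S = temporal_concatenate(subjects)
print(S.shape)  # (720, 5)
```

The concatenated matrix S then serves as the input to the dictionary learning step of Sect. 2.3.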

2.3 Concatenated Sparse Coding


The method of temporal concatenated sparse coding (TCSC) is summarized in Fig. 1.
Whole-brain fMRI signals of each subject are extracted in a fixed voxel order and arranged
in a signal matrix, as shown in Fig. 1a. Then the signal matrices from multiple subjects and

Fig. 1. The framework of TCSC method on longitudinal mTBI rsfMRI data.


Temporal Concatenated Sparse Coding of Resting State fMRI Data 49

multiple groups are concatenated as the input of dictionary learning and sparse coding,
$S = [s_1, s_2, \ldots, s_i, \ldots, s_n]$ (Fig. 1b). Eventually, the concatenated input matrix is decomposed
into a concatenated dictionary matrix $D$ (Fig. 1c) and a parameter matrix $A =
[a_1, a_2, \ldots, a_i, \ldots, a_n]$ (Fig. 1d). Each row of the matrix $A$ is projected onto the brain volume to
represent a functional network (Fig. 1e). As the learning is based on groups of subjects,
the networks are group-wise, common spatial profiles.
The dictionary learning and sparse coding problem is an optimization-based matrix
factorization problem in the machine learning field [15]. The cost function of the problem
can be summarized in Eq. (1) by considering the average loss of the single-signal representations:

$$f_n(D) \triangleq \frac{1}{n} \sum_{i=1}^{n} \ell(s_i, D) \qquad (1)$$

The loss function for each input signal is defined in Eq. (2), in which an $\ell_1$ regularization
term is introduced to yield a sparse solution for $a_i$:

$$\ell(s_i, D) \triangleq \min_{D \in C,\ a_i \in \mathbb{R}^m} \frac{1}{2} \| s_i - D a_i \|_2^2 + \lambda \| a_i \|_1 \qquad (2)$$

$$C \triangleq \left\{ D \in \mathbb{R}^{t \times m} \ \text{s.t.} \ \forall j = 1, \ldots, m, \ d_j^T d_j \le 1 \right\} \qquad (3)$$

An established, open-source implementation of online dictionary learning [15] is
provided in the SPArse Modeling Software (SPAMS, http://spams-devel.gforge.inria.fr/).
We adopt this implementation to solve our temporal concatenated sparse coding problem.
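As an illustration of the decomposition S ≈ DA, the sketch below uses scikit-learn's `MiniBatchDictionaryLearning`, which implements the same online algorithm as [15], as a stand-in for SPAMS; the toy sizes and variable names are our own assumptions:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Toy stand-in for the concatenated signal matrix S (t time points x n voxels):
# here t = 120 concatenated time points and n = 300 voxels (assumed sizes).
rng = np.random.default_rng(0)
S = rng.standard_normal((120, 300))

m = 10          # dictionary size (the paper uses 50)
lam = 0.15      # sparsity weight, lambda in Eq. (2)

# scikit-learn treats rows as samples, so voxel signals go in as rows (S.T).
dl = MiniBatchDictionaryLearning(n_components=m, alpha=lam,
                                 transform_algorithm="lasso_lars",
                                 transform_alpha=lam, random_state=0)
A_T = dl.fit_transform(S.T)     # (n x m): sparse coefficients, A = A_T.T
D = dl.components_.T            # (t x m): temporally concatenated dictionary

print(D.shape, A_T.T.shape)     # (120, 10) (10, 300)
# Each row of A (column of A_T) maps back to the brain volume as one network.
```

In the actual pipeline, S would hold the temporally concatenated signals of all subjects, and D would afterwards be split row-wise (along the concatenated time axis) into the per-subject dictionaries D1 … Dk used in Sect. 2.4.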

2.4 Network Interaction Statistics


As shown in Figs. 1c and 2a, the learned dictionary follows the same concatenation
order as the original fMRI signals. By decomposing the dictionary D into the small
dictionaries of each individual, D1 … Dk, we can map the local dynamics of the common
networks. These signal patterns (Fig. 2a) are quite individualized and not directly
comparable across subjects, because resting state brain activity is largely spontaneous.
However, the interactions among these networks are comparable: thanks to the
correspondence established by the common spatial maps, individual variability is
balanced out and statistical cross-group analysis becomes feasible.
As shown in Fig. 2b, for each subject of each group, we define the interaction
matrix by calculating the Pearson correlations among dictionary atoms. Three
steps are then included in the statistics on the interactions. First, a global null-hypothesis
t-test is performed across all groups and stages, removing weak interactions
that are globally close to zero. Second, a one-way ANOVA on the surviving
interactions detects interactions that exhibit significant differences across the two
stages and two groups. Finally, based on the significance analysis output, we use
two-sample t-tests to detect significant interaction

Fig. 2. Statistics on network interactions across two stages and two groups.

difference between control subjects and mTBI patients. In addition, longitudinal
interaction changes can also be analyzed.
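The three statistical steps above can be sketched as follows; this is a hedged SciPy illustration on simulated per-subject dictionaries (group sizes, variable names, and the simulation itself are our own assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m = 6                                   # number of retained networks (atoms)
n_sub = {"C1": 24, "C2": 24, "P1": 16, "P2": 16}

def simulate_dictionary(t=240, k=6):
    """Toy per-subject dictionary: k atoms sharing a weak common signal,
    so atom-atom correlations are systematically nonzero."""
    common = rng.standard_normal((t, 1))
    return rng.standard_normal((t, k)) + 0.5 * common

def interaction_vector(Dk):
    """Pearson correlations among the atoms (columns) of one subject's
    dictionary, flattened to the upper triangle -> interaction vector."""
    R = np.corrcoef(Dk.T)
    iu = np.triu_indices(R.shape[0], k=1)
    return R[iu]

# subjects x interactions matrix for each of the four groups
X = {g: np.array([interaction_vector(simulate_dictionary(k=m))
                  for _ in range(n)]) for g, n in n_sub.items()}

# Step 1: global null-hypothesis t-test; drop interactions close to zero everywhere.
pooled = np.vstack(list(X.values()))
_, p_global = stats.ttest_1samp(pooled, 0.0)
keep = p_global < 0.05

# Step 2: one-way ANOVA across the four groups on the surviving interactions.
_, p_anova = stats.f_oneway(*[X[g][:, keep] for g in X])
survive = np.flatnonzero(keep)[p_anova < 0.05]

# Step 3: directional two-sample t-tests (e.g., C1 vs P1) at p < 0.01.
for idx in survive:
    t_val, p = stats.ttest_ind(X["C1"][:, idx], X["P1"][:, idx])
    if p < 0.01:
        print(f"interaction {idx}: C1 {'>' if t_val > 0 else '<'} P1 (p={p:.3g})")
print(pooled.shape, int(keep.sum()))
```

On the simulated data no group effect exists, so step 3 will usually report nothing; on the real data the surviving interactions populate Table 1.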

3 Results

The causes of mTBI are complex, and the microscopic damage to brain tissue differs
considerably across subjects. However, based on cognitive tests and literature reports,
patients usually suffer from similar functional deficits, so we group them together to
explore common functional interaction changes.
In this section, we first present the meaningful networks obtained from concatenated
sparse coding and then analyze the statistical interaction differences among the four
groups. Note that there are two scans for each subject, and we use the following
abbreviations: C1: Stage 1 of control group; C2: Stage 2 of control group; P1: Stage 1
of patient group; and P2: Stage 2 of patient group.

3.1 Common Networks from Temporal Concatenated Sparse Coding


In the pipeline of Sect. 2.3, we set the dictionary size to 50 based on previous ICA and
sparse coding works [6, 14]; thus 50 common networks are learned from concatenated
sparse coding. However, based on visual inspection, a number of these networks are
artifact components, so we removed them, as well as networks related to white matter
and CSF, from the following analysis. Finally, 29 networks were kept for interaction
analysis, as shown in Fig. 3, where the networks are reordered and renamed from N1 to
N29. Among these networks are conventional intrinsic networks such as the visual,
auditory, motor, executive control, and default mode networks. Together they cover the
whole cerebral cortex and also include subcortical structures such as the cerebellum,
thalamus, and corpus callosum.

Fig. 3. The networks reconstructed from the matrix A of the TCSC method. Each network is
visualized with volume rendering and surface mapping from the most representative view.

3.2 Interaction Analysis and Comparison


With the statistical steps of Sect. 2.4, we analyzed the network interaction differences
across groups. After the first step of the global null-hypothesis t-test (p < 0.05) and the
second step of the one-way ANOVA (p < 0.05), 16 of the 406 interactions show
significant differences among the four groups. To determine the direction of the
differences, we designed a series of two-sample t-tests, as shown in Table 1.
In Table 1, each element indexed by two group IDs shows the number of interactions
with a significant difference in that direction; e.g., element (C1, P1) is the number of
interactions significant for C1 > P1, and element (P1, C1) is the number significant for
P1 > C1. Note that we also pool C1 and C2 as group C, and P1 and P2 as group P, in
the lower part of Table 1. Considering the multiple comparisons, we used a corrected
threshold of p < 0.01 to determine significance.
In Fig. 4, all interactions with significant differences are visualized as red lines. In
general, only two network interactions are weakened by mTBI, whereas 8 network
interactions are strengthened, as shown in Fig. 4a–b.

Table 1. T-test design and number of interactions with significant difference (p < 0.01).

T-Test Design C1 C2 P1 P2
C1 Non 0 3 0
C2 0 Non 2 0
P1 2 4 Non 0
P2 2 4 3 Non
T-Test Design C P
C Non 2
P 8 Non

Fig. 4. Visualization of the network interactions with significant differences listed in Table 1.

This indicates that, to compensate for the functional loss caused by the micro-injuries
of mTBI, multiple networks and their interactions are recruited to generate alternative
functional pathways [16]. These could be signs of neural plasticity [17].
For the longitudinal analysis, we expect P1 (acute stage) and P2 (sub-acute stage) to be
different, so t-tests are performed separately against the control groups as well as
between the two patient stages. For validation, we also treat C1 and C2 as different groups.
First, from Table 1, no difference is detected between C1 and C2, as
expected. Interestingly, C1 and C2 have stronger interactions than P1 (Fig. 4c–d), but
none stronger than P2. This indicates that patients at the sub-acute
stage are recovering towards normal; during this recovery, some interactions are also
strengthened in P2 (Fig. 4e). The interaction of N14 and N20 is stably decreased in the P1
group, which makes sense because both N14 and N20 are related to memory function.
P1 and P2 both have stronger interactions than the control group, but in quite
different patterns (Fig. 4f–i). For example, N24 (DMN)-centered interactions are enhanced in
P1, which might suggest strengthened functional regularization for functional
compensation, while N18 (cerebellum)-centered interactions are enhanced in P2. These
findings are interesting, and explicit interpretations of these interactions will be explored
in the future.

4 Conclusion

In this paper, we proposed a network interaction modeling method to characterize the
longitudinal changes of mTBI. The method is based on temporal concatenated
sparse coding, by which common spatial network profiles can be modeled across
groups while, at the same time, local dynamics are modeled for each individual.
Based on the network correspondence established by the common spatial maps,
network interactions are statistically compared across groups and longitudinally. Our
method has been applied to an mTBI data set with acute and sub-acute stages.
Experimental results show that neural plasticity and functional compensation
can be observed through the interaction changes.

Acknowledgement. This work was supported by NSF CAREER Award IIS-1149260, NSF
BCS-1439051, NSF CBET-1302089, NIH R21NS090153 and Grant W81XWH-11-1-0493.

References
1. Iraji, A., et al.: The connectivity domain: analyzing resting state fMRI data using
feature-based data-driven and model-based methods. Neuroimage 134, 494–507 (2016)
2. Kou, Z., Iraji, A.: Imaging brain plasticity after trauma. Neural Regen. Res. 9, 693–700
(2014)
3. Kou, Z., VandeVord, P.J.: Traumatic white matter injury and glial activation: from basic
science to clinics. Glia 62, 1831–1855 (2014)
4. Niogi, S.N., Mukherjee, P.: Diffusion tensor imaging of mild traumatic brain injury. J. Head
Trauma Rehabil. 25, 241–255 (2010)
5. Mayer, A.R., et al.: Functional connectivity in mild traumatic brain injury. Hum. Brain
Mapp. 32, 1825–1835 (2011)
6. Iraji, A., et al.: Resting state functional connectivity in mild traumatic brain injury at the
acute stage: independent component and seed based analyses. J. Neurotrauma 32, 1031–
1045 (2014)
7. Stevens, M.C., et al.: Multiple resting state network functional connectivity abnormalities in
mild traumatic brain injury. Brain Imaging Behav. 6, 293–318 (2012)
8. Fox, M., Raichle, M.: Spontaneous fluctuations in brain activity observed with functional
magnetic resonance imaging. Nat. Rev. Neurosci. 8(9), 700 (2007)
9. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural
and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)
10. van de Ven, V., Formisano, E., Prvulovic, D., Roeder, C., Linden, D.: Functional
connectivity as revealed by spatial independent component analysis of fMRI measurements
during rest. Hum. Brain Mapp. 22(3), 165–178 (2004)
11. Iraji, A., et al.: Compensation through functional hyperconnectivity: a longitudinal
connectome assessment of mild traumatic brain injury. Neural Plast. 2016, 4072402 (2016)
12. Lee, Y.B., Lee, J., Tak, S., et al.: Sparse SPM: sparse-dictionary learning for resting-state
functional connectivity MRI analysis. Neuroimage 125 (2015)
13. Lv, J., et al.: Assessing effects of prenatal alcohol exposure using group-wise sparse
representation of fMRI data. Psychiatry Res. Neuroimaging 233(2), 254–268 (2015)

14. Lv, J., et al.: Sparse representation of whole-brain FMRI signals for identification of
functional networks. Med. Image Anal. 20(1), 112–134 (2014)
15. Mairal, J., Bach, F., Ponce, J., et al.: Online learning for matrix factorization and sparse
coding. J. Mach. Learn. Res. 11(1), 19–60 (2010)
16. Chen, H., Iraji, A., Jiang, X., Lv, J., Kou, Z., Liu, T.: Longitudinal analysis of brain recovery
after mild traumatic brain injury based on groupwise consistent brain network clusters. In:
Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, Part II. LNCS,
vol. 9350, pp. 194–201. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24571-3_24
17. Mishina, M.: Neural plasticity and compensation for human brain damage. Nihon Ika
Daigaku Igakkai Zasshi 10(2), 101–105 (2014)
Exploring Brain Networks via Structured
Sparse Representation of fMRI Data

Qinghua Zhao1,3, Jianfeng Lu1(&), Jinglei Lv2,3, Xi Jiang3,


Shijie Zhao2,3, and Tianming Liu3(&)
1
School of Computer Science and Engineering,
Nanjing University of Science and Technology, Nanjing, China
lujf@njust.edu.cn
2
School of Automation, Northwestern Polytechnical University, Xi’an, China
3
Cortical Architecture Imaging and Discovery,
Department of Computer Science and Bioimaging Research Center,
The University of Georgia, Athens, GA, USA
tianming.liu@gmail.com

Abstract. Investigating functional brain networks and activities using sparse


representation of fMRI data has received significant interest in the neuroimaging
field. It has been reported that sparse representation is effective in
reconstructing concurrent and interactive functional brain networks. However,
previous data-driven reconstruction approaches rarely take anatomical structures,
which are the substrate of brain function, into consideration.
Furthermore, it has rarely been explored whether structured sparse representation
with anatomical guidance could facilitate functional network reconstruction.
To address this problem, in this paper we propose to reconstruct brain
networks using anatomy-guided structured multi-task regression (AGSMR),
in which 116 anatomical regions from the AAL template are employed as prior
knowledge to guide the network reconstruction. Using the publicly available
Human Connectome Project (HCP) Q1 dataset as a test bed, we demonstrate
that anatomy-guided structured sparse representation is effective
in reconstructing concurrent functional brain networks.

Keywords: Sparse representation · Dictionary learning · Group sparsity · Functional networks

1 Introduction

Functional magnetic resonance imaging (fMRI) signal analysis and functional brain
network investigation using sparse representation have received increasing interest in the
neuroimaging field [1, 10]. The main theoretical assumption is that each brain fMRI
signal can be represented as a sparse linear combination of a set of signal bases in an
over-complete dictionary. The data-driven strategy of dictionary learning and sparse
coding is efficient and effective in reconstructing concurrent and interactive functional
networks from both resting state fMRI (rsfMRI) and task-based fMRI (tfMRI) data [1,
10]. However, these approaches leave room for further improvement, because
pure data-driven sparse coding does not integrate brain science domain knowledge
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 55–62, 2016.
DOI: 10.1007/978-3-319-46720-7_7

when reconstructing functional networks. In the neuroscience field, it is widely believed
that brain anatomy plays a crucial role in determining brain function, and that
anatomical structure is the substrate of brain function. Integrating anatomical
structure information into brain network representation is therefore well motivated and justified.
In this paper, we propose a novel anatomy-guided structured multi-task regression
(AGSMR) method for functional network reconstruction, which employs anatomical
group structures to guide the sparse representation of fMRI data. In general, group-wise
structured multi-task regression is an established methodology that imposes
group structure on the multiple tasks and employs a combination of ℓ2 and ℓ1 norms
to learn both intra-group homogeneity and inter-group sparsity [2, 6]. Our premise
is that fMRI voxels from the same anatomical structure should play
similar roles in brain function. Thus, employing the 116 brain regions of the AAL
template as anatomical group information can effectively improve the network
representation by constraining both homogeneity within each anatomical structure and
sparsity across anatomical structures. After applying our method to the recently publicly
released Human Connectome Project (HCP) data, our experimental results demonstrate
that the reconstructed networks have higher similarity to established templates, which
also provides anatomical clues for understanding the detected brain networks.

2 Method

2.1 Overview
Our computational framework for AGSMR is illustrated in Fig. 1. fMRI images from
individual brains are first registered into a standard space (MNI) to align with the AAL
template. fMRI signals are then extracted within a whole-brain mask, and an over-complete
signal dictionary is learned via the online dictionary learning method. With the learned
dictionary as a set of features (regressors), the group structured multi-task regression
employs anatomical structures as group information to regress the whole-brain signals.
Finally, the coefficient matrix is mapped back to the brain volume to represent
functional brain networks.

2.2 Data Acquisition and Preprocessing


The recently publicly released fMRI data from the Human Connectome Project (HCP),
release Q1, were used in this paper. The dataset was acquired from 68 subjects and
includes 7 tasks: Motor, Emotion, Gambling, Language, Relational, Social, and Working
Memory. The acquisition parameters of the tfMRI data are as follows: 90 × 104 matrix,
220 mm FOV, 72 slices, TR = 0.72 s, TE = 33.1 ms, flip angle = 52°, BW = 2290
Hz/Px, in-plane FOV = 208 × 180 mm, 2.0 mm isotropic voxels. The preprocessing
pipeline included motion correction, spatial smoothing, temporal pre-whitening, slice
time correction, and global drift removal. More details on the task descriptions and
preprocessing are given in [9]. After preprocessing, all fMRI images are registered
into a standard template space (MNI). fMRI signals are then extracted from
voxels within a brain mask, and each signal is normalized to zero mean and unit
standard deviation.
Exploring Brain Networks via Structured Sparse Representation of fMRI 57


Fig. 1. The flowchart of the proposed AGSMR pipeline. Step 1: data acquisition,
preprocessing, and extraction of whole-brain signals. Step 2: learning the dictionary D
from the whole-brain signals. Step 3: labelling the signals via the AAL template. Step 4:
feature selection based on the AGSMR method. Step 5: mapping the selected features
(coefficient matrix) back to the whole brain to identify meaningful functional networks.

2.3 The Whole Brain Signals Dictionary Learning


In our method, an over-complete dictionary $D$ is first learned from the whole-brain
fMRI signals $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{t \times n}$ ($t$ is the number of fMRI time points and $n$ is the
voxel number) using the online dictionary learning method [4]. The theoretical assumption
here is that the whole-brain fMRI signals can be represented by sparse linear combinations
of a set of signal bases, i.e., dictionary atoms. The empirical cost function of learning is
defined in Eq. (1):

$$f_n(D) \triangleq \frac{1}{n} \sum_{i=1}^{n} \ell(x_i, D) \qquad (1)$$

$$\ell(x_i, D) \triangleq \min_{a_i \in \mathbb{R}^m} \frac{1}{2} \| x_i - D a_i \|_2^2 + \lambda \| a_i \|_1 \qquad (2)$$

where $D = [d_1, d_2, \ldots, d_m] \in \mathbb{R}^{t \times m}$ ($m$ is the number of dictionary atoms) is the
dictionary, with each column representing a basis vector. The $\ell_1$ regularization in
Eq. (2) is adopted to generate a sparse solution; $D$ and $a$ are alternately updated and
learned by the online dictionary learning algorithm [4]. The learned $D$ is adopted as the
set of features (regressors) for sparse representation, and the proposed structured sparse
representation of brain fMRI signals is detailed in Sect. 2.5.

2.4 Grouping fMRI Signals with Anatomical AAL Template


Using the AAL template [7], 116 brain regions are employed in our method, as shown
in Fig. 1. Specifically, the whole-brain voxels are separated into 116 groups based on
the AAL template. Before signal extraction, each subject has been registered into the
standard space (MNI) and aligned with the AAL template, so that each voxel in the
brain mask is associated with a template label. Voxels with the same anatomical AAL
label are grouped together; thus, in each brain, the voxel fMRI signals are categorized
into 116 AAL groups. This anatomical group information is used to guide the
coefficient matrix learning in the next section.
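The grouping step can be sketched as follows; a minimal NumPy illustration in which a random label array stands in for the resampled AAL template (all names and sizes are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 100, 500                        # time points, in-mask voxels
X = rng.standard_normal((t, n))        # extracted whole-brain signal matrix

# Toy stand-in for the AAL labels of the n in-mask voxels (regions 1..116).
labels = rng.integers(1, 117, size=n)

# Group index sets {G_1, ..., G_116} used by the structured penalty of Sect. 2.5.
groups = {r: np.flatnonzero(labels == r) for r in range(1, 117)}

# Example: the fMRI signals of all voxels in AAL region 5.
X_g5 = X[:, groups[5]]
print(len(groups), X_g5.shape)
```

In the real pipeline, `labels` would come from sampling the AAL template volume at each in-mask voxel after registration to MNI space.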

2.5 Anatomical Guided Structured Multi-task Regression (AGSMR)


In the conventional approach, once the dictionary $D$ is defined, the learning of the
coefficient matrix is summarized as the typical LASSO [5] problem in Eq. (3):

$$\hat{a} = \arg\min \ \ell(a) + \lambda \phi(a) \qquad (3)$$

where $\ell(a)$ is the loss function, $\phi(a)$ is the regularization term, which performs
feature selection while achieving sparse regularization, and $\lambda > 0$ is the
regularization parameter. Given the learned dictionary $D = [d_1, d_2, \ldots, d_m] \in \mathbb{R}^{t \times m}$
(Sect. 2.3), the conventional LASSO regression of the brain fMRI signals $X =
[x_1, x_2, \ldots, x_n] \in \mathbb{R}^{t \times n}$ yields a sparse coefficient matrix $a = [a_1, a_2, \ldots, a_n] \in \mathbb{R}^{m \times n}$:

$$\hat{a} = \arg\min \sum_{i=1}^{n} \| x_i - D a_i \|_2^2 + \lambda \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}| \qquad (4)$$

where $\ell(a)$ is the least-squares loss, $\phi(a)$ is the $\ell_1$-norm regularization
term that induces sparsity, $a_{ij}$ is the coefficient element at the $i$-th column and $j$-th row,
and $m$ is the dictionary size. Equation (4) can be viewed as the LASSO-penalized least
squares problem, and it is a pure data-driven approach. However, previous studies [2, 3, 6]
have shown that prior structure information such as disjoint/overlapping groups, trees, and
graphs may significantly improve classification/regression performance and help identify
important features [3].
In this paper, we introduce a novel structured sparse representation approach
(group-guided structured multi-task regression) into the regression of fMRI signals.
Specifically, the group information of the fMRI signals is defined by the anatomical
structures of Sect. 2.4, i.e., the whole-brain fMRI signals are separated into groups
$\{G_1, G_2, \ldots, G_S\}$, $s = 1, 2, \ldots, S$, based on the AAL template. The conventional LASSO
adopts the $\ell_1$-norm regularization term to induce sparsity (Eq. (4)); here an $\ell_2$-norm
penalty is additionally introduced into the penalty term, as shown in Eq. (5), which
improves intra-group homogeneity. The combined $\ell_1$ and $\ell_2$ penalties induce both
intra-group and inter-group sparsity:

$$\hat{a} = \arg\min \sum_{i=1}^{n} \| x_i - D a_i \|_2^2 + \lambda \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}| + (1 - \lambda) \sum_{j=1}^{m} \sum_{s=1}^{S} \omega_s \| a_{j, G_s} \|_2 \qquad (5)$$

where $a_{j, G_s}$ denotes the $j$-th row of $a$ restricted to the voxels of group $G_s$, and
$\omega_s$ is the group weight. Equation (5) can thus be viewed as a structured-sparsity-penalized
multi-task least-squares problem. Detailed solutions for this structured LASSO-penalized
multi-task least-squares problem with combined $\ell_1$ and $\ell_2$ norms are given in [6, 8];
we employ the SLEP package (http://yelab.net/software/SLEP/) to solve the problem and
learn the coefficient matrix $a$. From a brain science perspective, the learned coefficient
matrix $a$ contains the spatial features of the functional networks, and each row of $a$ is
mapped back to the brain volume to identify and quantitatively characterize the
meaningful functional networks, similar to the methods in [1].
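SLEP solves Eq. (5) with efficient projections [6, 8]; as a rough, self-contained illustration of the sparse-group penalty, the sketch below runs ISTA with the sparse group lasso proximal operator on a toy Eq. (5)-style objective. This is not the SLEP algorithm, and all names, weights ω_s, and toy data are our own assumptions:

```python
import numpy as np

def sgl_prox(A, groups, tau, lam, omega):
    """Proximal operator of tau * [lam*||A||_1 + (1-lam)*sum_{j,s} omega_s*||A[j,G_s]||_2]:
    elementwise soft-threshold, then group soft-threshold per atom row."""
    B = np.sign(A) * np.maximum(np.abs(A) - tau * lam, 0.0)
    for s, idx in enumerate(groups):
        blk = B[:, idx]
        norms = np.linalg.norm(blk, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - tau * (1.0 - lam) * omega[s] / np.maximum(norms, 1e-12))
        B[:, idx] = blk * scale
    return B

def agsmr_ista(X, D, groups, lam=0.5, n_iter=200):
    """ISTA on 0.5*||X - D A||_F^2 + lam*||A||_1 + (1-lam)*sum_{j,s} omega_s*||A[j,G_s]||_2."""
    m, n = D.shape[1], X.shape[1]
    omega = np.sqrt([len(g) for g in groups])       # common group-size weighting
    A = np.zeros((m, n))
    tau = 1.0 / np.linalg.norm(D, 2) ** 2           # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        A = sgl_prox(A - tau * (D.T @ (D @ A - X)), groups, tau, lam, omega)
    return A

# Toy data: t=60 time points, n=40 voxels in 4 equal "anatomical" groups, m=8 atoms;
# one atom is truly active in one region only.
rng = np.random.default_rng(0)
t, n, m = 60, 40, 8
D = rng.standard_normal((t, m))
groups = [np.arange(g * 10, (g + 1) * 10) for g in range(4)]
A_true = np.zeros((m, n)); A_true[0, groups[0]] = 1.0
X = D @ A_true + 0.05 * rng.standard_normal((t, n))
A_hat = agsmr_ista(X, D, groups, lam=0.5)
print(np.count_nonzero(A_hat) / A_hat.size)         # most coefficients are exactly zero
```

The group term shrinks whole atom-by-region blocks toward zero together, which is the intra-group homogeneity that Eq. (5) encodes.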

3 Results

3.1 Identifying Resting State Networks on Seven Task Datasets


To evaluate the identified networks, we defined a spatial similarity coefficient to check
the spatial similarity between the identified networks and the resting state network (RSN)
templates [11]:

$$S = \frac{|A \cap B|}{|B|} \qquad (6)$$

where $A$ is the spatial map of our identified network component, $B$ is that of the RSN
template network, and $|A|$ and $|B|$ denote numbers of voxels.
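Equation (6) can be computed directly from binarized spatial maps; a minimal sketch (how the maps are thresholded into boolean masks is not specified in the paper, so the masks below are toy assumptions):

```python
import numpy as np

def spatial_similarity(A_map, B_map):
    """Overlap coefficient of Eq. (6): |A ∩ B| / |B|, where A and B are
    boolean voxel masks of the identified network and the RSN template."""
    A = np.asarray(A_map, dtype=bool)
    B = np.asarray(B_map, dtype=bool)
    return np.logical_and(A, B).sum() / B.sum()

# Toy masks: a template of 100 voxels, an identified map overlapping 80 of them.
B = np.zeros(1000, dtype=bool); B[:100] = True
A = np.zeros(1000, dtype=bool); A[20:120] = True
print(spatial_similarity(A, B))  # 0.8
```

Note that, unlike the Dice coefficient, this measure normalizes only by the template size |B|, so an identified map larger than the template is not penalized.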
We performed quantitative measurements on the Working Memory task dataset to
demonstrate the performance of our method, selecting 10 well-known resting state
networks for spatial similarity comparison. The identified networks are visualized in
Fig. 2, where RSN#1–RSN#10 show the 10 resting state template networks and
#1–#10 show our identified networks. The identified networks are consistent with the
templates: slices #1, #2, and #3 are visual networks corresponding to the medial,
occipital pole, and lateral visual areas; slice #4 is the default mode network (DMN);
slices #5 to #8 are the cerebellum, sensorimotor, auditory, and executive control
networks, respectively; and slices #9 and #10 are frontoparietal networks. The activated
areas of all identified networks are consistent with the template networks; detailed
comparison results are given in Table 1.
To validate that our method is effective and robust, we tested it on seven different task
datasets. Figure 3 shows the results, and Table 1 lists the similarity to the templates on
the 7 datasets.

Fig. 2. Comparison of 10 resting state networks (RSNs) with the networks identified by our
method on the working memory task dataset. RSN#1–RSN#10 show the 10 resting state
template networks [11], and #1–#10 show the networks identified by our method.

Table 1. Similarity coefficients between our results and the templates. The first column lists
the 7 tasks; the first row (#1–#10) indexes the 10 networks. The average similarity across the
7 tasks is 0.623.
Task #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
WM 0.84 0.66 0.72 0.61 0.67 0.74 0.68 0.45 0.63 0.69
Emotion 0.82 0.65 0.61 0.46 0.54 0.84 0.62 0.42 0.65 0.70
Gambling 0.86 0.65 0.61 0.53 0.54 0.57 0.55 0.43 0.66 0.73
Language 0.86 0.66 0.62 0.57 0.74 0.56 0.62 0.45 0.67 0.72
Motor 0.83 0.66 0.62 0.47 0.47 0.53 0.51 0.41 0.60 0.79
Relational 0.81 0.68 0.62 0.47 0.47 0.53 0.51 0.41 0.60 0.79
Social 0.82 0.66 0.67 0.48 0.54 0.56 0.63 0.42 0.71 0.71

3.2 Comparison Between Our Method and Traditional Method


In this section, we compare our method with the LASSO method on both the Working
Memory and Gambling datasets. Figure 4 shows the visual, executive control, and
auditory networks identified by our method and by LASSO, respectively. Table 2
compares the two methods' similarities with the template on the two task datasets.
These comparisons show that our method has higher similarity to the template and, in
this sense, is superior in reconstructing functional networks to the traditional LASSO
method, which does not use anatomical structure.

(a) (b)

Fig. 3. (a) and (b) show 10 resting state networks of one randomly selected subject on the HCP
Q1 datasets. The first seven columns correspond to the 7 different tasks, and the last column
shows the corresponding resting state network templates.

(a)

(b)

Fig. 4. The visual network (a) and the executive control and auditory networks (b) identified by
the template, LASSO, and our method on the Working Memory dataset.

Table 2. Comparison of the two methods by their similarities with the templates. The first row
(#1–#10) indexes the 10 resting state networks; the first column lists the two methods, and the
second column the two datasets, working memory (WM) and gambling (GB). In general, our
method has higher similarity than the LASSO method.
Method Task #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
Lasso WM 0.79 0.64 0.71 0.62 0.50 0.48 0.53 0.44 0.58 0.68
GB 0.83 0.64 0.55 0.49 0.52 0.56 0.47 0.43 0.52 0.62
AGSMR WM 0.84 0.66 0.73 0.61 0.71 0.74 0.68 0.45 0.63 0.69
GB 0.86 0.65 0.61 0.53 0.54 0.57 0.55 0.43 0.66 0.73

4 Conclusion

In this paper, we proposed a novel anatomy-guided structured multi-task regression
method for brain network identification. Experiments on 7 different task datasets
have demonstrated the effectiveness of our AGSMR method in identifying consistent
brain networks. Comparisons have shown that our method is more effective and accurate
than the traditional LASSO method. In general, our approach provides anatomical
substrates for the reconstructed functional networks. In the future, we plan to apply and
test the AGSMR method on larger fMRI datasets and compare it with other brain
network construction methods. In addition, it will be applied to clinical fMRI datasets to
potentially reveal abnormalities of brain networks in diseased brains.

Acknowledgements. This research was supported in part by Jiangsu Natural Science Founda-
tion (Project No. BK20131351), by the Chinese scholarship council (CSC).

References
1. Lv, J., Jiang, X., Li, X., Zhu, D., Chen, H., Zhang, T., Hu, X., Han, J., Huang, H., Zhang, J.:
Sparse representation of whole-brain fMRI signals for identification of functional networks.
Med. Image Anal. 20, 112–134 (2015)
2. Kim, S., Xing, E.P.: Tree-guided group lasso for multi-task regression with structured
sparsity. In: ICML, pp. 543–550 (2010)
3. Ye, J., Liu, J.: Sparse methods for biomedical data. ACM SIGKDD Explor. Newslett. 14,
4–15 (2012)
4. Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online learning for matrix factorization and sparse
coding. J. Mach. Learn. Res. 11, 19–60 (2010)
5. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser.
B (Methodological) 58, 267–288 (1996)
6. Liu, J., Ji, S., Ye, J.: Multi-task feature learning via efficient ℓ2,1-norm minimization. In:
Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence,
pp. 339–348. AUAI Press (2009)
7. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N.,
Mazoyer, B., Joliot, M.: Automated anatomical labeling of activations in SPM using a
macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15,
273–289 (2002)
8. Liu, J., Ji, S., Ye, J.: SLEP: Sparse learning with efficient projections. Arizona State
University (2009)
9. Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K.:
WU-Minn HCP consortium. The WU-Minn human connectome project: an overview.
Neuroimage 80, 62–79 (2013)
10. Lv, J., Jiang, X., Li, X., Zhu, D., Zhang, S., Zhao, S., Chen, H., Zhang, T., Hu, X., Han, J.,
Ye, J.: Holistic atlases of functional networks and interactions reveal reciprocal organiza-
tional architecture of cortical function. IEEE Trans. Biomed. Eng. 62, 1120–1131 (2015)
11. Smith, S., Fox, P., Miller, K., Glahn, D., Fox, P., Mackay, C., Filippini, N., Watkins, K.,
Toro, R., Laird, A., Beckmann, C.: Correspondence of the brain’s functional architecture
during activation and rest. Proc. Natl. Acad. Sci. U.S.A. 106, 13040–13045 (2009)
Discover Mouse Gene Coexpression Landscape
Using Dictionary Learning and Sparse Coding

Yujie Li1(✉), Hanbo Chen1, Xi Jiang1, Xiang Li1, Jinglei Lv1,2,
Hanchuan Peng3(✉), Joe Z. Tsien4(✉), and Tianming Liu1(✉)

1 Cortical Architecture Imaging and Discovery Lab,
Department of Computer Science and Bioimaging Research Center,
The University of Georgia, Athens, GA, USA
YL31679@uga.edu
2 School of Automation, Northwestern Polytechnical University, Xi’an, China
3 Allen Institute for Brain Science, Seattle, WA, USA
4 Brain and Behavior Discovery Institute,
Medical College of Georgia at Augusta University, Augusta, GA, USA

Abstract. Gene coexpression patterns carry rich information about complex brain structures and functions. Characterization of these patterns in an unbiased and integrated manner will illuminate the higher-order transcriptome organization and offer molecular foundations of functional circuitry. Here we demonstrate a data-driven method that can effectively extract coexpression networks from transcriptome profiles using the Allen Mouse Brain Atlas dataset. For each of the obtained networks, both the genetic composition and the spatial distribution in the brain volume are learned. Simultaneous knowledge of the precise spatial distribution of a specific gene, the networks in which the gene participates, and the weights it carries can bring insights into the molecular mechanisms of brain formation and function. Gene ontologies and comparisons with published data reveal interesting functions of the identified coexpression networks, including major cell types, biological functions, brain regions, and/or brain diseases.

Keywords: Gene coexpression network · Sparse coding · Transcriptome

1 Introduction

Gene coexpression patterns carry a rich amount of valuable information regarding enormously complex cellular processes. Previous studies have shown that genes displaying similar expression profiles are very likely to participate in the same biological processes [1]. The gene coexpression network (GCN), offering an integrated and effective representation of gene interactions, has shown advantages in deciphering biological and genetic mechanisms across species and during evolution [2]. In addition to revealing the intrinsic transcriptome organization, GCNs have also demonstrated superior performance when used to generate novel hypotheses for molecular

Y. Li and H. Chen—Co-first Authors.

© Springer International Publishing AG 2016


S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 63–71, 2016.
DOI: 10.1007/978-3-319-46720-7_8
64 Y. Li et al.

mechanisms of diseases, because many disease phenotypes result from the dysfunction of complex networks of molecular interactions [3].
Various proposals have been made to identify GCNs, including classical clustering methods [4, 5] and those applying network concepts and models to describe gene-gene interactions [6]. Given the high dimensionality of genetic data and the urgent need to make comparisons to unveil the changes or the consensus between subjects, one common theme of these methods is dimension reduction. Instead of analyzing the interactions between over ten thousand genes, grouping the data by its co-expression patterns can considerably reduce the complexity of comparisons from tens of thousands of genes to dozens of networks or clusters while preserving the original interactions.
Along this line of data reduction, we propose a dictionary learning and sparse coding (DLSC) algorithm for GCN construction. DLSC is an unbiased, data-driven method that learns a set of new dictionaries from the signal matrix so that the original signals can be represented in a sparse and linear manner. Because of the sparsity constraint, the dimensionality of the genetic data can be significantly reduced. The grouping by co-expression patterns is encoded in the sparse coefficient matrix, with the assumption that if two genes use the same dictionary to represent their original signals, their expression must share similar patterns, and they are thereby considered ‘coexpressed’. The proposed method overcomes the potential issue, seen in many clustering methods, of overlooking the multiple roles of regulatory domains in different networks [3], because DLSC does not require the bases to be orthogonal, so one gene can be claimed by multiple networks. More importantly, for each of the obtained GCNs, both the genetic composition and the spatial distribution are learned. Simultaneous knowledge of the precise distribution of a specific gene, the networks in which the gene participates, and the weights it carries can bring insights into the molecular mechanisms of brain formation and function.
In this study the proposed framework was applied to the Allen Mouse Brain Atlas (AMBA) [7], which surveyed the expression patterns of over 20,000 genes in the C57BL/6J mouse brain using in situ hybridization (ISH). One major advantage of ISH is its ability to preserve the precise spatial distribution of genes. This powerful dataset, featuring whole-genome scale, cellular resolution, and anatomical coverage, has made a holistic understanding of the molecular underpinnings and related functional circuitry possible. Using AMBA, the GCNs identified by DLSC showed significant enrichment for major cell types, biological functions, anatomical regions, and/or brain disorders, which holds promise to serve as a foundation for exploring different cell types and functional processes in diseased and healthy brains.

2 Methods
2.1 Experimental Setup
We downloaded the 4,345 3D volumes of expression energy of coronal sections and
the Allen Reference Atlas (ARA) from the website of AMBA (http://mouse.brain-map.
org/). The ISH data were collected in tissue sections, then digitally processed, stacked,
Discover Mouse Gene Coexpression Landscape Using DLSC 65

Fig. 1. Computational pipeline for constructing GCNs. (a) Input is one slice of 3D expression
grids of all genes. (b) Raw ISH data preprocessing step that removes unreliable genes and voxels
and estimates the remaining missing data. (c) Dictionary learning and sparse coding of ISH
matrix with sparse and non-negative constraints on the coefficient matrix α. (d) Visualization of
spatial distributions of GCNs. (e) Enrichment analysis of GCNs.

registered, gridded, and quantified, creating 3D maps of gene “expression energy” at 200-micron resolution. Coronal sections were chosen because they register more accurately to the reference model than their sagittal counterparts. Each 3D volume is composed of 67 slices with a dimension of 41 × 58.
Because the ISH data were acquired slice by slice before being stitched and aligned into a complete 3D volume, significant changes in the average expression level of the same gene are observed in adjacent slices, in spite of extensive preprocessing. To alleviate the artifacts due to slice handling and preprocessing, we decided to study the coexpression networks slice by slice. The input of the pipeline is the expression grid of one of the 67 coronal slices (Fig. 1a). A preprocessing module (Fig. 1b) is first applied to handle the foreground voxels with missing data (−1 in expression energy). Specifically, this module includes extraction, filtering, and estimation steps. First, the foreground voxels of the slice were extracted based on the ARA. Then the genes of low variance or with missing values in over 20% of foreground voxels were excluded. A similar filtering step was also applied to remove voxels in which over 20% of genes have no data. Most missing values were resolved in the filtering steps. The remaining missing values were recursively estimated as the mean of the foreground voxels in their 8-neighborhood until all missing values were filled. After preprocessing, the cleaned expression energies were organized into a matrix and sent to DLSC (Fig. 1c). In DLSC, the gene expression matrix is factorized into a dictionary matrix D and a coefficient matrix α. These two matrices encode the distribution and composition of the GCNs (Fig. 1d–e) and are further analyzed.
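The recursive estimation step above can be sketched as follows. This is a minimal illustration, not the AMBA pipeline itself; the grid, the −1 missing-value sentinel, and the function name are ours, and background voxels are marked with `np.nan` here.

```python
import numpy as np

def fill_missing(grid, missing=-1.0):
    """Recursively replace missing foreground voxels (marked `missing`) by the
    mean of the known foreground voxels in their 8-neighborhood, repeating
    until all missing values are filled. Background voxels are np.nan."""
    g = grid.astype(float).copy()
    g[g == missing] = np.inf          # sentinel: missing foreground value
    while np.isinf(g).any():
        progressed = False
        for r, c in zip(*np.where(np.isinf(g))):
            patch = g[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            # isfinite excludes inf (still missing) and nan (background)
            known = patch[np.isfinite(patch)]
            if known.size:
                g[r, c] = known.mean()
                progressed = True
        if not progressed:            # isolated voxels with no known neighbors
            break
    return g
```

For example, a missing voxel surrounded by the eight known values 1, 2, 3, 4, 6, 7, 8, 9 is replaced by their mean, 5.0; voxels filled earlier in a pass feed into later estimates, matching the recursive scheme described above.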

2.2 Dictionary Learning and Sparse Coding


DLSC is an effective method to achieve a compressed and succinct representation of ideally all signal vectors. Given a set of M-dimensional input signals X = [x1, …, xN] ∈ R^(M×N), learning a fixed number of dictionaries for a sparse representation of X can be accomplished by solving the following optimization problem. As discussed later, each entry of α indicates the degree of conformity of a gene to a coexpression pattern, so a non-negative constraint was added to the ℓ1-regularized formulation.

⟨D, α⟩ = argmin (1/2)‖X − D·α‖₂²   s.t. ‖α‖₁ ≤ k; ∀i, αᵢ ≥ 0    (1)

where D ∈ R^(N×K) is the dictionary matrix, α ∈ R^(K×M) is the corresponding coefficient matrix, k is a sparsity constraint factor indicating that each signal has fewer than k items in its decomposition, ‖·‖₁ and ‖·‖₂ are the ℓ1 and ℓ2 norms of each column, and ‖X − D·α‖₂² denotes the reconstruction error.
In practice, the gene expression grids are arranged into a matrix X ∈ R^(M×N), such that rows correspond to foreground voxels and columns correspond to genes (Fig. 1c). After normalizing each column by its Frobenius norm, the publicly available online DLSC package [8] was applied to solve the matrix factorization problem in Eq. (1). Eventually, X is represented as sparse combinations of the learned dictionary atoms D. Each column in D is one dictionary consisting of a set of voxels. Each row in α details the coefficients of each gene in a particular dictionary. The key assumption behind enforcing sparseness is that each gene is involved in a very limited number of gene networks. The non-negativity constraint on the α matrix ensures that no genes with opposite expression patterns are placed in the same network. In the context of GCN construction, we consider two genes ‘coexpressed’ in a dictionary if they both use that dictionary to represent their original signals. This set-up has several benefits. First, both the dictionaries and the coefficients are learned from the data and therefore intrinsic to it. Second, the level of coexpression is quantifiable and comparable not only within one dictionary but across the entire α matrix.
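The factorization in Eq. (1) was solved with the online package of [8]; as a hedged illustration of the same idea, an analogous non-negative sparse factorization can be run with scikit-learn's `DictionaryLearning`. The toy matrix, the number of atoms K, and the parameter values below are illustrative stand-ins, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(60, 40)))          # toy voxels-by-genes matrix
X /= np.linalg.norm(X, axis=0, keepdims=True)  # normalize each gene column

K = 8  # number of dictionaries (the paper sets this via a gene-dictionary ratio)
model = DictionaryLearning(
    n_components=K,
    alpha=0.5,                   # l1 sparsity weight
    positive_code=True,          # non-negativity constraint on the coefficients
    fit_algorithm="cd",
    transform_algorithm="lasso_cd",
    random_state=0,
)
# scikit-learn factors data as code @ components_, so genes are passed as rows:
# each row of `atoms` is a spatial pattern over voxels (a column of D in Eq. (1)),
# and each row of `alpha` gives one gene's sparse, non-negative network weights.
alpha = model.fit_transform(X.T)   # (genes x K)
atoms = model.components_          # (K x voxels)
```

Note the transposed convention relative to Eq. (1); the sparsity and non-negativity constraints on the coefficients carry over directly.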
Further, if we consider each dictionary as one network, the corresponding row of the α matrix contains all genes that use this dictionary for sparse representation, i.e., that are ‘coexpressed’. Each entry of α measures the extent to which the gene conforms to the coexpression pattern described by the dictionary atom. Therefore, this network, denoted the coexpression network, is formed. Since each dictionary atom is composed of multiple voxels, by mapping each atom in D back to the ARA space we can visualize the spatial patterns of the coexpressed networks. Combining information from both the D and α matrices, we obtain a set of intrinsically learned GCNs with knowledge of both their anatomical patterns and their gene compositions. As a dictionary is the equivalent of a network, these two terms will be used interchangeably.
The choice of the number of dictionaries and the regularization parameter k is crucial to an effective sparse representation. The final goal is a set of parameters that results in a sparse and accurate representation of the original signal while achieving the highest overlap with the ground truth, the known anatomy. A grid search of parameters was performed using three criteria: (1) reconstruction error; (2) mutual information between the dictionaries and the ARA; (3) the density of the α matrix, measured by the percentage of nonzero elements in α. As different numbers of genes are expressed in different slices, instead of a fixed number of dictionaries, a gene-dictionary ratio, defined as the ratio between the number of genes expressed and the number of dictionaries, is used. Guided by these criteria, k = 0.5 and a gene-dictionary ratio of 100 were selected as the optimal parameters.
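Two of the three grid-search criteria can be computed directly from the factorization. A minimal sketch follows; the function and variable names are ours, and the third criterion (mutual information between the dictionary maps and the ARA labels) is omitted.

```python
import numpy as np

def selection_criteria(X, D, alpha):
    """Return (reconstruction error, density of the coefficient matrix) for one
    candidate parameter setting; lower error and lower density are preferred."""
    recon_error = np.linalg.norm(X - D @ alpha) ** 2
    density = np.count_nonzero(alpha) / alpha.size  # fraction of nonzero entries
    return recon_error, density
```

A perfect factorization gives zero reconstruction error, and a sparser α gives a lower density; the grid search trades these off against the anatomical-overlap criterion.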

2.3 Enrichment Analysis of GCNs


GCNs were characterized based on common gene ontology (GO) categories (molecular function, biological process, cellular component), using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [9]. Enrichment analysis was performed by cross-referencing with published lists of genes related to cell-type markers, known and predicted disease genes, specific biological functions, etc. This list consists of 32 publications and was downloaded from [10]. Significance was assessed using a one-sided Fisher’s exact test with a threshold of p < 0.01.
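The one-sided Fisher's exact test used here reduces to a hypergeometric tail probability on a 2×2 overlap table. A stdlib-only sketch, with our own function name; the table values in the usage example are hypothetical, not from the paper:

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the 2x2 table
    [[a, b], [c, d]], e.g. a = genes both in a GCN and in a published list,
    b = in the GCN only, c = in the list only, d = in neither.
    Returns P(overlap >= a) under the hypergeometric null."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    return sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    ) / comb(n, col1)
```

For a hypothetical network of 100 genes sharing 12 members with a 52-gene marker list in a 5,000-gene background, `fisher_one_sided(12, 88, 40, 4860)` falls well below the p < 0.01 threshold, so the network would be called enriched.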

3 Results

DLSC allows readily interpretable results by plotting the spatial distributions of the GCNs. A visual inspection showed a set of spatially contiguous clusters partitioning the slice (Fig. 2a, e). Many of the formed clusters correspond to one or more canonical anatomical regions, providing an intuitive validation of the approach.
We demonstrate the effectiveness of DLSC by showing that the GCNs are mathematically valid and biologically meaningful. Since the grouping of genes is based purely on their expression patterns, a method with good mathematical ability will partition the genes so that expression patterns are similar within a group and dissimilar between groups. One caveat is that a gene may be expressed in multiple cell types or participate in multiple functional pathways; therefore, maintaining dissimilarity between groups may not be necessary. At the same time, the method should balance this against the biological ability to find functionally enriched networks. As examples, slices 27 and 38 are analyzed and discussed in depth due to their good anatomical coverage of various brain regions. Using a fixed gene-dictionary ratio of 100, 29 GCNs were identified for slice 27 and 31 GCNs were constructed for slice 38.

Fig. 2. Visualization of the spatial distribution of GCNs and the corresponding raw ISH data. On the left are the slice ID and GCN ID. The second column shows the spatial maps of two GCNs, one for each slice, followed by the raw ISH data of three representative genes. Gene acronyms and their weights in the GCN are listed at the bottom. The weights indicate the extent to which a gene conforms to the GCN.

3.1 Validation Against Raw ISH Data


One reliable way to examine whether the expression patterns are consistent within a GCN is to visually inspect the raw ISH data from which the GCNs are derived. As seen in Fig. 2, the raw ISH data in general match well with the respective spatial maps. In GCN 22 of slice 27, the expression peaks at the hypothalamus and extends to the substantia innominata and the bed nuclei of the stria terminalis (Fig. 2a). All three genes showed strong signals in these areas in the raw data (Fig. 2b–d). Similarly, the expression pattern for GCN 6 in slice 38 is centered at field CA1 (Fig. 2e), and all three genes showed significantly enhanced signals in the CA1 region compared with other regions (Fig. 2f–h). Relatedly, the weight in parentheses is a measure of the degree to which a gene conforms to the coexpression pattern. With decreasing weight, the resemblance of the raw data to the spatial map becomes weaker. One example is the comparison between Zkscanl6 and the other two genes: weaker signals were found for Zkscanl6 in the lateral septal nucleus (Fig. 2b–d red arrows), the preoptic area (Fig. 2b–d blue arrows), and the lateral olfactory tract (Fig. 2b–d green arrows). On the other hand, the spatial map of GCN 6 of slice 38 features an abrupt change of expression levels at the boundary of fields CA1 and CA2. This feature is seen clearly in Fibcd (Fig. 2f red arrows) and Arhgap12 (Fig. 2g red arrows), but not in Osbp16 (Fig. 2h red arrows). Also, the spatial map shows an absence of expression in the dentate gyrus (DG) and field CA3; however, Arhgap12 displays strong signals in the DG (Fig. 2g blue arrows) and Osbp16 shows high expression in both the DG and CA3 (Fig. 2h green arrows). The decreased similarity agrees well with the declining weights. Overall, we have demonstrated a good agreement between the raw ISH data and the corresponding spatial maps, with the level of agreement correctly measured by the weight. These results visually validate the mathematical ability of DLSC to group genes with similar expression patterns.

3.2 Enrichment Analysis of GCNs


Enrichment analysis using GO terms and existing published gene lists [10] provided exciting biological insights into the constructed GCNs. We roughly categorize the networks into four types for convenience of presentation. In fact, one GCN often falls into multiple categories, as these categories characterize GCNs from different perspectives. A comparison with the gene lists generated using purified cellular populations [11, 12] indicates that GCN1 (Fig. 3a), GCN5 (Fig. 3b), GCN28 (Fig. 3c), and GCN25 (Fig. 3d) of slice 27 are significantly enriched with markers of oligodendrocytes, astrocytes, neurons, and interneurons, with p-values of 1.1 × 10−7, 1.7 × 10−8, 2.5 × 10−3, and 15 × 10−10, respectively. The findings have been not only confirmed by several other studies using microarray and ISH data, but also corroborated by the GO terms. For example, two significant GO terms in GCN1 are myelination (p = 5.7 × 10−4) and axon ensheathment (p = 2.5 × 10−5), which are featured functions of oligodendrocytes, with established markers such as Mbp, Serinc5, and Ugt8a. A visualization of the spatial map also offers a useful complementary source. For example, the fact that GCN5 (Fig. 3b) is located at the lateral ventricle, where the subventricular zone is rich in astrocytes, confirms its enrichment in astrocytes.

Fig. 3. Visualization of spatial distribution of GCNs enriched for major cell types, particular
brain regions and function/disease related genes. In each panel, top row: Slice ID and GCN ID;
second row: spatial map; third row: sub-category; fourth row: highly weighted genes in the
sub-category.

In addition to cell-type-specific GCNs, we also found some GCNs remarkably selective for particular brain regions, such as GCN3 (Fig. 3e) in CA1, GCN5 (Fig. 3f) in the thalamus, GCN11 (Fig. 3g) in the hypothalamus, and GCN16 (Fig. 3h) in the caudoputamen. Other GCNs with more complex anatomical patterning revealed close associations with biological functions and brain diseases. The GCNs associated with ubiquitous functions such as ribosomal (Fig. 3j) and mitochondrial (Fig. 3k) functions have a wide coverage of the brain. A functional annotation suggested that GCN12 of slice 27 is highly enriched for the ribosome pathway (p = 6.3 × 10−5). As for GCN21 on the same slice, besides mitochondrial function (p = 1.5 × 10−8), it is also enriched in categories including neuron (p = 5.4 × 10−8) and postsynaptic proteins (p = 6.3 × 10−8) compared with the literature [10]. One significant GO term, synaptic transmission (p = 1.1 × 10−5), might add a possible explanation for the strong signals in the cortical regions. GCN13 of slice 38 (Fig. 3i) showed strong associations with genes found to be downregulated in Alzheimer’s disease. Comparisons with autism-susceptibility genes generated from microarray and high-throughput RNA-sequencing data [13] indicate an association for GCN24 of slice 27 (p = 1.0 × 10−3) (Fig. 3h). Despite slightly lower weights, the three most significant genes, Met, Pip5k1b, and Avpr1a, have all been reported altered in autism patients [13].

4 Discussion

We have presented a data-driven framework that can derive biologically meaningful GCNs from gene expression data. Using the rich and spatially resolved ISH AMBA data, we found a set of GCNs that are significantly enriched for major cell types, anatomical regions, biological pathways, and/or brain diseases. The highlighted advantage of this method is its capability of visualizing the spatial distribution of the GCNs while knowing the gene constituents and the weights they carry in each network. Although the edges in the network are not explicitly stated, this does not impact the biological interpretation of the GCNs.
In future work, new strategies will be developed to integrate the gene-gene interactions on single slices and to construct brain-scale GCNs. These GCNs can offer new insights in multiple applications. For example, GCNs can serve as baseline networks to enable comparisons across different species to understand brain evolution. Also, characterizing the GCNs of brains at different developmental stages can generate new hypotheses about the brain formation process. When the GCNs are correlated with neuroimaging measurements as brain phenotypes, we will be able to make new associations between molecular-scale and macro-scale measurements, advance the understanding of how genetic functions regulate and support brain structures and functions, and find new genetic variants that might account for variations in brain structure and function.

References
1. Tavazoie, S., Hughes, J.D., et al.: Systematic determination of genetic network architecture.
Nat. Genet. 22, 281–285 (1999)
2. Stuart, J.M.: A gene-coexpression network for global discovery of conserved genetic
modules. Science 302, 249–255 (2003)
3. Gaiteri, C., Ding, Y., et al.: Beyond modules and hubs: the potential of gene coexpression
networks for investigating molecular mechanisms of complex brain disorders. Genes. Brain
Behav. 13, 13–24 (2014)
4. Bohland, J.W., Bokil, H., et al.: Clustering of spatial gene expression patterns in the mouse
brain and comparison with classical neuroanatomy. Methods 50, 105–112 (2010)
5. Eisen, M.B., Spellman, P.T., et al.: Cluster analysis and display of genome-wide expression
patterns. Proc. Natl. Acad. Sci. U. S. A. 95, 12930–12933 (1999)
6. Langfelder, P., Horvath, S.: WGCNA: an R package for weighted correlation network
analysis. BMC Bioinform. 9, 559 (2008)
7. Lein, E.S., Hawrylycz, M.J., Ao, N., et al.: Genome-wide atlas of gene expression in the
adult mouse brain. Nature 445, 168–176 (2007)
8. Mairal, J., Bach, F., et al.: Online learning for matrix factorization and sparse coding.
J. Mach. Learn. Res. 11, 19–60 (2010)

9. Dennis, G., Sherman, B.T., et al.: DAVID: database for annotation, visualization, and
integrated discovery. Genome Biol. 4, P3 (2003)
10. Miller, J.A., Cai, C., et al.: Strategies for aggregating gene expression data: the collapseRows
R function. BMC Bioinform. 12, 322 (2011)
11. Cahoy, J., Emery, B., Kaushal, A., et al.: A transcriptome database for astrocytes, neurons,
and oligodendrocytes: a new resource for understanding brain development and function.
J. Neurosci. 28, 264–278 (2008)
12. Winden, K.D., Oldham, M.C., et al.: The organization of the transcriptional network in
specific neuronal classes. Mol. Syst. Biol. 5, 1–18 (2009)
13. Voineagu, I., Wang, X., et al.: Transcriptomic analysis of autistic brain reveals convergent
molecular pathology. Nature 474(7351), 380–384 (2011)
Integrative Analysis of Cellular Morphometric
Context Reveals Clinically Relevant Signatures
in Lower Grade Glioma

Ju Han1,2, Yunfu Wang1,5, Weidong Cai3, Alexander Borowsky4,
Bahram Parvin1,2, and Hang Chang1,2(✉)

1 Lawrence Berkeley National Laboratory, Berkeley, CA, USA
hchang@lbl.gov
2 Department of Electrical and Biomedical Engineering, University of Nevada, Reno, USA
3 School of Information Technologies, University of Sydney, Sydney, NSW, Australia
4 Center for Comparative Medicine, University of California, Davis, CA, USA
5 Department of Neurology, Taihe Hospital, Hubei University of Medicine, Hubei, China

Abstract. Integrative analysis based on quantitative representation of whole slide images (WSIs) in a large histology cohort may provide predictive models of clinical outcome. On one hand, the efficiency and effectiveness of such representation is hindered as a result of large technical variations (e.g., fixation, staining) and biological heterogeneities (e.g., cell type, cell state) that are always present in a large cohort. On the other hand, perceptual interpretation/validation of important multi-variate phenotypic signatures is often difficult due to the loss of visual information during feature transformation in hyperspace. To address these issues, we propose a novel approach for integrative analysis based on cellular morphometric context, which is a robust representation of a WSI, with an emphasis on tumor architecture and tumor heterogeneity, built upon cellular-level morphometric features within the spatial pyramid matching (SPM) framework. The proposed approach is applied to The Cancer Genome Atlas (TCGA) lower grade glioma (LGG) cohort, where experimental results (i) reveal several clinically relevant cellular morphometric types, which enable both perceptual interpretation/validation and further investigation through gene set enrichment analysis; and (ii) indicate significantly increased survival rates in one of the subtypes derived from the cellular morphometric context.

Keywords: Lower grade glioma · Cellular morphometric context ·


Cellular morphometric type · Spatial pyramid matching · Consensus
clustering · Survival analysis · Gene set enrichment analysis

This work was supported by NIH R01 CA184476 carried out at Lawrence Berkeley
National Laboratory.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 72–80, 2016.
DOI: 10.1007/978-3-319-46720-7_9
Clinically Relevant Cellular Morphometric Context 73

1 Introduction
Histology sections provide a wealth of information about the tissue architecture, which contains multiple cell types at different states of the cell cycle. These sections are often stained with hematoxylin and eosin (H&E), which label DNA (e.g., nuclei) and protein contents, respectively, in various shades of color. Morphometric aberrations in tumor architecture often lead to disease progression, and it is desirable to quantify indices associated with these aberrations, since they can be tested against clinical outcomes, e.g., survival or response to therapy.
For the quantitative analysis of H&E-stained sections, several excellent reviews can be found in [7,8]. Fundamentally, the trend has been based either on nuclear segmentation and the corresponding morphometric representation, or on patch-based representation of the histology sections that aids in clinical association. The major challenge for tissue morphometric representation is the large amount of technical and biological variation in the data. To overcome this problem, recent studies have focused on either fine-tuning human-engineered features [1,4,11,12] or applying automatic feature learning [5,9,15,16,19,20] for robust representation and characterization.
Even though there are inter- and intra-observer variations [6], a trained pathologist always uses rich content (e.g., various cell types, cellular organization, cell state and health), in context, to characterize tumor architecture and heterogeneity for the assessment of disease state. Motivated by the works of [13,18], we encode cellular morphometric signatures within the spatial pyramid matching (SPM) framework for a robust representation (i.e., cellular morphometric context) of WSIs in a large cohort, with an emphasis on tumor architecture and tumor heterogeneity. On this basis, an integrative analysis pipeline is constructed for the association of cellular morphometric context with clinical outcomes and molecular data, with potential for hypothesis generation regarding imaging biomarkers for personalized diagnosis or treatment. The proposed approach is applied to the TCGA LGG cohort, where experimental results (i) reveal several clinically relevant cellular morphometric types, which enable both perceptual interpretation/validation and further investigation through gene set enrichment analysis; and (ii) indicate significantly increased survival rates in one of the subtypes derived from the cellular morphometric context.

2 Approaches

The proposed approach starts with the construction of cellular morphometric types and cellular morphometric context, followed by integrative analysis with both clinical and molecular data. Specifically, the nuclear segmentation method in [4] was adopted given its demonstrated robustness in the presence of biological and technical variations, where the corresponding nuclear morphometric
74 J. Han et al.

descriptors are described in [3], and the constructed cellular morphometric context representations are released on our website1.

2.1 Construction of Cellular Morphometric Types and Cellular


Morphometric Context
For a set of WSIs and corresponding nuclear segmentation results, let M be the total number of segmented nuclei; N be the number of morphometric descriptors extracted from each segmented nucleus (e.g., nuclear size and nuclear intensity); and X be the set of morphometric descriptors for all segmented nuclei, where X = [x1, ..., xM] ∈ R^(M×N). The construction of cellular morphometric types and cellular morphometric context is described as follows:

1. Construct cellular morphometric types (D), where D = [d1 , ..., dK ] are the
K cellular morphometric types to be learned by the following optimization:
min(D,Z) Σ_{m=1}^{M} ‖xm − zm D‖²    (1)
subject to card(zm) = 1, |zm| = 1, zm ≥ 0, ∀m

where Z = [z1, ..., zM] indicates the assignment of the cellular morphometric type, card(zm) is a cardinality constraint enforcing only one nonzero element of zm, zm ≥ 0 is a non-negative constraint on the elements of zm, and |zm| is the L1-norm of zm. During training, Eq. 1 is optimized with respect to both Z and D; in the coding phase, for a new set of X, the learned D is applied and Eq. 1 is optimized with respect to Z only.
2. Construct cellular morphometric context via SPM. This is done by repeat-
edly subdividing an image and computing the histograms of different cellular
morphometric types over the resulting subregions. As a result, the spatial his-
togram, H, is formed by concatenating the appropriately weighted histograms
of all cellular morphometric types at all resolutions. For more details about
SPM, please refer to [13].
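Under the constraints of Eq. (1), the coding phase reduces to nearest-type assignment, and the context vector is then a concatenation of regional type histograms. A minimal sketch of both steps follows; the array shapes and function names are ours, and the pyramid-level weighting of [13] is omitted for brevity (the paper applies SPM at a single scale at the patient level).

```python
import numpy as np

def assign_types(X, D):
    """Coding phase of Eq. (1): with card(zm) = 1, |zm| = 1 and zm >= 0,
    each nucleus is assigned its nearest cellular morphometric type in D."""
    # X: (M nuclei x N features); D: (K types x N features)
    d2 = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def spm_context(positions, types, K, levels=1):
    """Cellular morphometric context: histograms of type labels over
    recursively subdivided image regions, concatenated and normalized."""
    # positions: (M, 2) nuclei coordinates scaled to [0, 1)
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level
        bins = np.clip((positions * cells).astype(int), 0, cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (bins[:, 0] == i) & (bins[:, 1] == j)
                feats.append(np.bincount(types[mask], minlength=K).astype(float))
    h = np.concatenate(feats)
    return h / max(h.sum(), 1.0)
```

With `levels=1` and K types, the context vector has length K·(1 + 4): one whole-image histogram plus one per quadrant.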

In our experiment, K is fixed to 64. Meanwhile, given that each patient may have multiple WSIs, SPM is applied at a single scale for the convenient construction of cellular morphometric context as well as integrative analysis at the patient level, where both the cellular morphometric types and the subtypes of cellular morphometric context are associated with clinical outcomes and molecular information.

2.2 Integrative Analysis


The construction of cellular morphometric context at patient level in a large
cohort enables the integrative analysis with both clinical and molecular infor-
mation, which contains the components as follows,
1 http://bmihub.org/project/tcgalggcellularmorphcontext.
Clinically Relevant Cellular Morphometric Context 75

1. Identification of cellular morphometric subtypes/clusters: consensus clustering [14]
is performed for identifying subtypes/clusters across patients. The
input of consensus clustering is the cellular morphometric context at the
patient level. Consensus clustering aggregates consensus across multiple runs
of a base clustering algorithm. Moreover, it provides a visualization tool to
explore the number of clusters in the data, as well as to assess the stability
of the discovered clusters.
2. Survival analysis: the Cox proportional hazards (PH) regression model is used
for survival analysis.
3. Enrichment analysis: Fisher’s exact test is used for the enrichment analysis
between cellular morphometric context subtypes and genomic subtypes.
4. Genomic association: linear models are used for assessing differential expres-
sion of genes between subtypes of cellular morphometric context, and the
correlation between genes and cellular morphometric types.
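As a sketch of components 3 and 4, both tests are available in scipy; the contingency table and expression values below are toy data for illustration, not results from the paper (which fits linear models for differential expression in addition to correlating frequencies with expression):

```python
import numpy as np
from scipy import stats

# Enrichment (component 3): 2x2 contingency table crossing membership in a
# cellular morphometric subtype with membership in a genomic subtype.
table = np.array([[12, 3],   # in both / morphometric subtype only
                  [5, 40]])  # genomic subtype only / in neither
odds, p_enrich = stats.fisher_exact(table)

# Genomic association (component 4): correlate a morphometric type's
# per-patient frequency with a gene's expression across patients.
freq = np.array([0.1, 0.4, 0.2, 0.8, 0.6])   # toy per-patient frequencies
expr = np.array([1.0, 2.1, 1.4, 3.2, 2.6])   # toy expression values
r, p_corr = stats.pearsonr(freq, expr)
```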

3 Experiments and Discussion

The proposed approach has been applied on the TCGA LGG cohort, including
215 WSIs from 209 patients, where the clinical annotations of 203 patients are
available. For quality control, background and border portions of
each whole slide image were detected and removed from the analysis.

3.1 Phenotypic Visualization and Integrative Analysis of Cellular Morphometric Types

The TCGA LGG cohort consists of ∼ 80 million segmented nuclear regions, from
which 2 million were randomly selected for construction of cellular morphometric
types. As described in Sect. 2, the cellular morphometric context representation
for each patient is a 64-dimensional vector, where each dimension represents the
normalized frequency of a specific cellular morphometric type appearing in the
WSIs of the patient. Initial integrative analysis is performed by linking individ-
ual cellular morphometric types to clinical outcomes and molecular data. Each
cellular morphometric type is chosen as the predictor variable in the Cox pro-
portional hazards (PH) regression model together with the age of the patient
(implemented through the R survival package). For each cellular morphometric
type, the frequencies are further correlated with the gene expression values across
all patients. The top-ranked genes of positive correlation and negative correla-
tion, respectively, are imported into the MSigDB [17] for gene set enrichment
analysis. Table 1 summarizes cellular morphometric types that best predict the
survival distribution, and the corresponding enriched gene sets. Figure 1 shows
the top-ranked examples for these cellular morphometric types.
As shown in Table 1, 8 out of 64 cellular morphometric types are significantly
associated with survival (FDR-adjusted p-value < 0.01).
The first four cellular morphometric types in Fig. 1 all have a hazard ratio > 1,
indicating that a higher frequency of these cellular morphometric types may lead
76 J. Han et al.

Table 1. Top cellular morphometric types for predicting the survival distribution
based on the Cox proportional hazards (PH) regression model, and the corresponding
enriched gene sets with respect to genes that best correlate the frequency of the cellu-
lar morphometric type appearing in the WSIs of the patient, positively or negatively.
Hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions with
a unit difference of an explanatory variable, and higher HR indicates higher hazard of
death.

Type   p-value   q-value   Hazard ratio   Enriched gene sets
Worse prognosis
#5 7.25e−4 7.73e−3 3.47e4
#28 2.05e−5 4.37e−4 9.32e3 Negatively correlated with: genes up-regulated in response to
IFNG; genes up-regulated in response to alpha interferon
proteins
#39 8.57e−7 2.74e−5 5.07e3 Positively correlated with: genes encoding proteins involved in
oxidative phosphorylation; genes up-regulated during
unfolded protein response, a cellular stress response related
to the endoplasmic reticulum; genes involved in DNA repair
Negatively correlated with: genes involved in the G2/M
checkpoint, as in progression through the cell division cycle;
genes important for mitotic spindle assembly; genes
defining response to androgens; genes up-regulated by
activation of the PI3K/AKT/mTOR pathway
#43 1.57e−9 1.00e−7 9.40e3 Negatively correlated with: genes up-regulated by activation
of Notch signaling
Better prognosis
#29 3.01e−4 3.85e−3 1.74e−8 Positively correlated with: genes up-regulated by IL6 via
STAT3 ; genes defining inflammatory response; genes
up-regulated in response to IFNG; genes regulated by
NF-kB in response to TNF ; genes up-regulated in
response to TGFB1 ; genes up-regulated in response to
alpha interferon proteins; genes involved in DNA repair;
genes mediating programmed cell death (apoptosis) by
activation of caspases; genes up-regulated through
activation of mTORC1 complex; genes involved in p53
pathways and networks
#31 1.23e−4 1.96e−3 5.49e−12 Positively correlated with: genes encoding components of the
complement system, which is part of the innate immune
system; genes up-regulated by KRAS activation; genes
up-regulated by IL6 via STAT3
#46 1.17e−3 9.84e−3 1.07e−8 Positively correlated with: a subgroup of genes regulated by
MYC; genes defining response to androgens; genes involved
in DNA repair; genes encoding cell cycle related targets of
E2F transcription factors
#52 1.23e−3 9.84e−3 6.86e−11 Positively correlated with: genes up-regulated during
transplant rejection; genes up-regulated during formation of
blood vessels; genes up-regulated in response to IFNG;
genes regulated by NF-kB in response to TNF ; genes
up-regulated in response to TGFB1 ; genes up-regulated by
IL6 via STAT3 ; genes mediating programmed cell death
(apoptosis) by activation of caspases
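The q-values above are FDR-adjusted p-values; assuming the standard Benjamini-Hochberg procedure (the paper does not name the adjustment method here), a minimal implementation is:

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(ranked[::-1])[::-1]   # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(q, 0.0, 1.0)
    return out
```

A morphometric type is then called significant when its q-value falls below the 0.01 threshold used in the text.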

Fig. 1. Top-ranked examples for cellular morphometric types that best predict the
survival distribution, as shown in Table 1. Each example is an image patch of 101 × 101
pixels centered by the retrieved cell marked with the green dot. The first four cellular
morphometric types (hazard ratio > 1) indicate a worse prognosis and the last four
cellular morphometric types (hazard ratio < 1) indicate a protective effect. Note, this
figure is best viewed in color at 400 % zoom-in.

to a worse prognosis. A common phenotypic property of these cellular morpho-


metric types is the loss of chromatin content in the nuclear regions, which may
be associated with poor prognosis of lower grade glioma. The last four cellular
morphometric types in Fig. 1 all have a hazard ratio < 1, indicating that a higher
frequency of these cellular morphometric types may lead to a better prognosis.
Table 1 also indicates the enrichment of genes up-regulated in response
to IFNG in cellular morphometric types #28, #29 and #52. In the glioma
microenvironment, tumor cells and local T cells produce abnormally low lev-
els of IFNG. IFNG acts on cell-surface receptors, and activates transcription
of genes that offer potentials in the treatment of brain tumors by increas-
ing tumor immunogenicity, disrupting proliferative mechanisms, and inhibiting
tumor angiogenesis [10]. The observations of IFNG as a positive survival factor
confirms the prognostic effect of these cellular morphometric types: #28 – neg-
ative correlation and worse prognosis; #29 and #52 – positive correlation and
better prognosis. Other interesting observations include that three cellular mor-
phometric types of better prognosis are enriched with genes up-regulated by IL6

via STAT3, and two cellular morphometric types of better prognosis are enriched
with genes regulated by NF-kB in response to TNF and genes up-regulated in
response to TGFB1, respectively.

3.2 Subtyping and Integrative Analysis of Cellular Morphometric Context
Hierarchical clustering was adopted as the clustering algorithm for consen-
sus clustering, which is implemented via R Bioconductor ConsensusClusterPlus
package with χ2 distance as the distance function. The procedure was run for
500 iterations with a sampling rate of 0.8 on 203 patients, and the corresponding
consensus clustering matrices with 2 to 9 clusters are shown in Fig. 2, where the
matrices with 2 to 5 clusters reveal different levels of similarity among patients
and matrices with 6 to 9 clusters provide little further information. Thus, we use
the five-cluster result for integrative analysis with clinical outcomes and genomic
signatures, where, due to insufficient patients in subtypes #1 (1 patient) and #2
(2 patients), we focus on the remaining three subtypes.
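The consensus clustering step can be sketched as repeated clustering of random subsamples followed by co-clustering counts. This illustrative version uses Euclidean distance with scipy's hierarchical clustering, whereas the paper uses the χ2 distance via the R ConsensusClusterPlus package:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def consensus_matrix(X, k, n_runs=100, rate=0.8, seed=0):
    """Consensus clustering: cluster random subsamples repeatedly and record
    how often each pair of patients lands in the same cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_runs):
        idx = rng.choice(n, int(rate * n), replace=False)
        lab = fcluster(linkage(X[idx], method="average"),
                       k, criterion="maxclust")
        sampled[np.ix_(idx, idx)] += 1                    # pair co-sampled
        together[np.ix_(idx, idx)] += lab[:, None] == lab[None, :]
    return together / np.maximum(sampled, 1)
```

Entry (i, j) of the returned matrix is the fraction of co-sampled runs in which patients i and j were clustered together; block structure in this matrix is what the consensus matrices in Fig. 2 visualize.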


Fig. 2. Consensus clustering matrices and corresponding consensus CDFs of 203 TCGA
patients with LGG for cluster number of N = 2 to N = 9 based on cellular morpho-
metric context.

Figure 3(a) shows the Kaplan-Meier survival plot for three major subtypes of
the five-cluster consensus clustering result. The log-rank test p-value of 2.82e−5
indicates that the difference between survival times of subtype #5 patients and
subtypes #3&#4 patients is statistically significant. The integration of genome-
wide data from multiple platforms uncovered three molecular classes of lower-
grade gliomas that were best represented by IDH and 1p/19q status: wild-type
IDH, IDH mutation with 1p/19q codeletion, and IDH mutation without 1p/19q
codeletion [2]. Further Fisher’s exact test reveals no enrichment between the
cellular morphometric subtypes and these molecular subtypes. On the other
hand, differentially expressed genes between subtype #5 and subtypes #3&#4
(Fig. 3(b)), indicate enrichment of genes that mediate programmed cell death
(apoptosis) by activation of caspases, and genes defining epithelial-mesenchymal
transition, as in wound healing, fibrosis and metastasis (via MSigDB).
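The Kaplan-Meier comparison above rests on the log-rank test; a self-contained two-sample version (with hypothetical survival times, not the cohort's data) can be written as:

```python
import numpy as np
from scipy.stats import chi2

def logrank(time1, event1, time2, event2):
    """Two-sample log-rank test: compare observed vs. expected events in
    group 1 at each distinct event time, then form a chi-squared statistic."""
    t = np.concatenate([time1, time2]).astype(float)
    e = np.concatenate([event1, event2]).astype(bool)
    g = np.concatenate([np.zeros(len(time1)), np.ones(len(time2))])
    O = E = V = 0.0
    for ti in np.unique(t[e]):
        at_risk = t >= ti
        n = at_risk.sum()
        n1 = (at_risk & (g == 0)).sum()
        d = (e & (t == ti)).sum()          # events at ti in both groups
        d1 = (e & (t == ti) & (g == 0)).sum()
        O += d1
        E += d * n1 / n
        if n > 1:
            V += d * (n - d) * n1 * (n - n1) / (n ** 2 * (n - 1))
    stat = (O - E) ** 2 / V
    return stat, chi2.sf(stat, df=1)
```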


Fig. 3. (a) Kaplan-Meier plot for three major subtypes associated with patient survival,
where subtypes #3 (53 patients) #4 (65 patients) and #5 (82 patients) correspond to
the three major subtypes from top-left to bottom-right, respectively, in Fig. 2 (N = 5).
(b) Top genes that are differently expressed between the subtype #5 and subtypes
#3&#4.

4 Conclusion and Future Work


In this paper, we encode cellular morphometric signatures within the SPM frame-
work for robust representation (i.e., cellular morphometric context) of WSIs in
a large cohort at patient level, based on which an integrative analysis pipeline
is constructed for the association of cellular morphometric context with clinical
outcomes and molecular data. The integrative analysis, performed on the TCGA
LGG cohort, reveals clinically relevant cellular morphometric types and morpho-
metric context subtypes, and the corresponding enriched gene sets. We believe
that the proposed approach has the potential to contribute to hypothesis gener-
ation regarding the imaging biomarkers for personalized diagnosis or treatment,
which will be further validated on an independent cohort.

References
1. Bhagavatula, R., Fickus, M., Kelly, W., Guo, C., Ozolek, J., Castro, C.,
Kovacevic, J.: Automatic identification and delineation of germ layer components
in H&E stained images of teratomas derived from human and nonhuman primate
embryonic stem cells. In: IEEE ISBI, pp. 1041–1044 (2010)
2. Cancer Genome Atlas Research Network: Comprehensive, integrative genomic
analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372(26), 2481–2498 (2015)

3. Chang, H., Borowsky, A., Spellman, P.T., Parvin, B.: Classification of tumor his-
tology via morphometric context. In: IEEE CVPR, pp. 2203–2210 (2013)
4. Chang, H., Han, J., Borowsky, A., Loss, L., Gray, J.W., Spellman, P.T., Parvin, B.:
Invariant delineation of nuclear architecture in glioblastoma multiforme for clinical
and molecular association. IEEE Trans. Med. Imaging 32(4), 670–682 (2013)
5. Chang, H., Zhou, Y., Borowsky, A., Barner, K.E., Spellman, P.T., Parvin, B.:
Stacked predictive sparse decomposition for classification of histology sections. Int.
J. Comput. Vis. 113(1), 3–18 (2015)
6. Dalton, L., Pinder, S., Elston, C., Ellis, I., Page, D., Dupont, W., Blamey, R.: Histological
gradings of breast cancer: linkage of patient outcome with level of pathologist
agreements. Mod. Pathol. 13(7), 730–735 (2000)
7. Demir, C., Yener, B.: Automated cancer diagnosis based on histopathological
images: a systematic survey (2009)
8. Gurcan, M., Boucheron, L., Can, A., Madabhushi, A., Rajpoot, N., Bulent, Y.:
Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171
(2009)
9. Huang, C.H., Veillard, A., Lomeine, N., Racoceanu, D., Roux, L.: Time efficient
sparse analysis of histopathological whole slide images. Comput. Med. Imaging
Graph. 35(7–8), 579–591 (2011)
10. Kane, A., Yang, I.: Interferon-gamma in brain tumor immunotherapy. Neurosurg.
Clin. N. Am. 21(1), 77–86 (2010)
11. Kong, J., Cooper, L., Sharma, A., Kurk, T., Brat, D., Saltz, J.: Texture based
image recognition in microscopy images of diffuse gliomas with multi-class gentle
boosting mechanism. In: IEEE ICASSP, pp. 457–460 (2010)
12. Kothari, S., Phan, J.H., Osunkoya, A.O., Wang, M.D.: Biological interpretation of
morphological patterns in histopathological whole slide images. In: Proceedings of
the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
(2012)
13. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid
matching for recognizing natural scene categories. In: IEEE CVPR, pp. 2169–2178
(2006)
14. Monti, S., Tamayo, P., Mesirov, J., Golub, T.: Consensus clustering: a resampling-
based method for class discovery and visualization of gene expression microarray
data. Mach. Learn. 52, 91–118 (2003)
15. Romo, D., Garcla-Arteaga, J.D., Arbelez, P., Romero, E.: A discriminant multi-
scale histopathology descriptor using dictionary learning. In: SPIE 9041 Medical
Imaging (2014)
16. Sirinukunwattana, K., Khan, A.M., Rajpoot, N.M.: Cell words: modelling the
visual appearance of cells in histopathology images. Comput. Med. Imaging Graph.
42, 16–24 (2015)
17. Subramanian, A., Tamayo, P., Mootha, V., Mukherjee, S., Ebert, B., Gillette, M.,
Paulovich, A., Pomeroy, S., Golub, T., Lander, E., Mesirov, J.: Gene set enrichment
analysis: a knowledge-based approach for interpreting genome-wide expression pro-
files. Proc. Natl. Acad. Sci. USA 102(43), 15545–15550 (2005)
18. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using
sparse coding for image classification. In: IEEE CVPR, pp. 1794–1801 (2009)
19. Zhou, Y., Chang, H., Barner, K.E., Parvin, B.: Nuclei segmentation via sparsity
constrained convolutional regression. In: IEEE ISBI, pp. 1284–1287 (2015)
20. Zhou, Y., Chang, H., Barner, K.E., Spellman, P.T., Parvin, B.: Classification of
histology sections via multispectral convolutional sparse coding. In: IEEE CVPR,
pp. 3081–3088 (2014)
Mapping Lifetime Brain Volumetry
with Covariate-Adjusted Restricted Cubic
Spline Regression from Cross-Sectional
Multi-site MRI

Yuankai Huo1(&), Katherine Aboud2, Hakmook Kang3,
Laurie E. Cutting2, and Bennett A. Landman1
1 Department of Electrical Engineering, Vanderbilt University, Nashville, TN, USA
yuankai.huo@vanderbilt.edu
2 Department of Special Education, Vanderbilt University, Nashville, TN, USA
3 Department of Biostatistics, Vanderbilt University, Nashville, TN, USA

Abstract. Understanding brain volumetry is essential to understand neurodevelopment
and disease. Historically, age-related changes have been studied in
detail for specific age ranges (e.g., early childhood, teen, young adults, elderly,
etc.) or more sparsely sampled for wider considerations of lifetime aging. Recent
advancements in data sharing and robust processing have made available con-
siderable quantities of brain images from normal, healthy volunteers. However,
existing analysis approaches have had difficulty addressing (1) complex volumetric
developments in large cohorts across the lifetime (e.g., beyond cubic
age trends), (2) accounting for confound effects, and (3) maintaining an analysis
framework consistent with the general linear model (GLM) approach pervasive
in neuroscience. To address these challenges, we propose to use covariate-
adjusted restricted cubic spline (C-RCS) regression within a multi-site cross-
sectional framework. This model allows for flexible consideration of nonlinear
age-associated patterns while accounting for traditional covariates and interac-
tion effects. As a demonstration of this approach on lifetime brain aging, we
derive normative volumetric trajectories and 95 % confidence intervals from
5111 healthy patients from 64 sites while accounting for confounding sex,
intracranial volume and field strength effects. The volumetric results are shown
to be consistent with traditional studies that have explored more limited age
ranges using single-site analyses. This work represents the first integration of
C-RCS with neuroimaging and the derivation of structural covariance networks
(SCNs) from a large study of multi-site, cross-sectional data.

1 Introduction

Brain volumetry across the lifespan is essential in neurological research and clinical
investigation. Magnetic resonance imaging (MRI) allows for quantification of such
changes, and consequent investigation of specific age ranges or more sparsely sampled
lifetime data [1]. Contemporaneous advancements in data sharing have made consid-
erable quantities of brain images available from normal, healthy populations. However,
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 81–88, 2016.
DOI: 10.1007/978-3-319-46720-7_10
82 Y. Huo et al.

the regression models prevalent in volumetric mapping (e.g., linear, polynomial,


non-parametric model, etc.) have had difficulty in modeling complex, cross-sectional
large cohorts while accounting for confound effects.
This paper proposes a novel multi-site cross-sectional framework using
Covariate-adjusted Restricted Cubic Spline (C-RCS) regression to map brain volumetry
on a large cohort (5111 MR 3D images) across the lifespan (4–98 years). The
C-RCS extends the Restricted Cubic Spline [2, 3] by regressing out the confound
effects in a general linear model (GLM) fashion. Multi-atlas segmentation is used to
obtain whole brain volume (WBV) and 132 regional volumes. The regional volumes
are further grouped to 15 networks of interest (NOIs). Then, structural covariance
networks (SCNs), i.e. regions or networks that mature or decline together during
developmental periods, are established based on NOIs using hierarchical clustering
analysis (HCA). To validate the large-scale framework, confidence intervals (CI) are
provided for both C-RCS regression and clustering from 10,000 bootstrap samples.

Table 1. Data summary of 5111 multi-site images.


Study name Website Images Sites
Baltimore Longitudinal Study of Aging (BLSA) www.blsa.nih.gov 605 4
Cutting Pediatrics vkc.mc.vanderbilt.edu/ebrl 586 2
Autism Brain Imaging Data Exchange (ABIDE) fcon_1000.projects.nitrc.org/indi/abide 563 17
Information eXtraction from Images (IXI) www.nitrc.org/projects/ixi_dataset 523 3
Attention Deficit Hyperactivity Disorder (ADHD200) fcon_1000.projects.nitrc.org/indi/adhd200 949 8
National Database for Autism Research (NDAR) ndar.nih.gov 328 6
Open Access Series on Imaging Study (OASIS) www.oasis-brains.org 312 1
1000 Functional Connectome (fcon_1000) fcon_1000.projects.nitrc.org 1102 22
Nathan Kline Institute Rockland (NKI_rockland) fcon_1000.projects.nitrc.org/indi/enhanced 143 1

2 Methods
2.1 Extracting Volumetric Information
The complete cohort aggregates 9 datasets with a total of 5111 MR T1w 3D images from
normal healthy subjects (Table 1). 45 atlases are non-rigidly registered [4] to a target
image and non-local spatial staple (NLSS) label fusion [5] is used to fuse the labels
from each atlas to the target image using the BrainCOLOR protocol [6] (Fig. 1). WBV
and regional volume are then calculated by multiplying the volume of a single voxel by

Fig. 1. The large-scale cross-sectional framework on 5111 multi-site MR 3D images.


Mapping Lifetime Brain Volumetry with C-RCS Regression 83

the number of labeled voxels in original image space. In total, 15 NOIs are defined by
structural and functional covariance networks including visual, frontal, language,
memory, motor, fusiform, basal ganglia (BG) and cerebellum (CB).
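The volume computation described above — voxel count per label times the volume of a single voxel — can be sketched as follows (the label array would come from the fused segmentation, e.g. loaded with nibabel; here it is a toy array):

```python
import numpy as np

def regional_volumes(labels, voxel_volume_mm3):
    """Volume per labeled region: count voxels per label and scale by the
    volume of a single voxel in the original image space."""
    ids, counts = np.unique(labels[labels > 0], return_counts=True)
    return dict(zip(ids.tolist(), (counts * voxel_volume_mm3).tolist()))
```

Regional volumes are then summed over each network's constituent labels to obtain the 15 NOI volumes.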

2.2 Covariate-Adjusted Restricted Cubic Spline (C-RCS)


We define $x$ as the ages of all subjects and $S(x)$ as the corresponding brain volumes. In
canonical $n$th degree spline regression, splines are used to model non-linear relationships
between variables $S(x)$ and $x$ by deciding the connections between $K$ knots
$(t_1 < t_2 < \cdots < t_K)$. In this work, such knots were determined based on previously
identified developmental shifts [1], specifically corresponding with transitions between
childhood (7–12), late adolescence (12–19), young adulthood (19–30), middle adulthood
(30–55), older adulthood (55–75), and late life (75–90). Using the expression
from Durrleman and Simon [2], the canonical $n$th degree spline function is defined as

$$S(x) = \sum_{j=0}^{n} \dot\beta_{0j} x^j + \sum_{i=1}^{K} \dot\beta_{in} (x - t_i)_+^n \tag{1}$$

where $(x - t_i)_+ = x - t_i$ if $x > t_i$, and $(x - t_i)_+ = 0$ if $x \le t_i$.
To regress out confound effects, new covariates $X'_1, X'_2, \ldots, X'_C$ (with coefficients
$\beta'_1, \beta'_2, \ldots, \beta'_C$) are introduced to the $n$th degree spline regression

$$S(x) = \sum_{j=0}^{n} \dot\beta_{0j} x^j + \sum_{i=1}^{K} \dot\beta_{in} (x - t_i)_+^n + \sum_{u=0}^{C} \beta'_u X'_u \tag{2}$$

where $C$ is the number of confound effects.


In the RCS regression, a linear constraint is introduced [2] to address the poor
behavior of the cubic spline model in the tails ($x < t_1$ and $x > t_K$) [7]. Using the same
principle, C-RCS regression extends the RCS regression ($n = 3$) and restricts the
relationship between $S(x)$ and $x$ to be a linear function in the tails. First, for $x < t_1$,

$$S(x) = \dot\beta_{00} + \dot\beta_{01} x + \dot\beta_{02} x^2 + \dot\beta_{03} x^3 + \sum_{u=0}^{C} \beta'_u X'_u \tag{3}$$

where $\dot\beta_{02} = \dot\beta_{03} = 0$ ensures the linearity before the first knot. Second, for $x > t_K$,

$$S(x) = \dot\beta_{00} + \dot\beta_{01} x + \dot\beta_{13} (x - t_1)_+^3 + \cdots + \dot\beta_{K3} (x - t_K)_+^3 + \sum_{u=0}^{C} \beta'_u X'_u \tag{4}$$

To guarantee the linearity of C-RCS after the last knot, we expand the previous
expression and force the coefficients of $x^2$ and $x^3$ to be zero. After expansion,

$$S(x) = \Big(\dot\beta_{00} - \big(\dot\beta_{13} t_1^3 + \cdots + \dot\beta_{K3} t_K^3\big)\Big) + \sum_{u=0}^{C} \beta'_u X'_u + \Big(\dot\beta_{01} + 3\dot\beta_{13} t_1^2 + \cdots + 3\dot\beta_{K3} t_K^2\Big) x - \Big(3\dot\beta_{13} t_1 + 3\dot\beta_{23} t_2 + \cdots + 3\dot\beta_{K3} t_K\Big) x^2 + \Big(\dot\beta_{13} + \dot\beta_{23} + \cdots + \dot\beta_{K3}\Big) x^3 \tag{5}$$

As a result, linearity of $S(x)$ at $x > t_K$ implies that $\sum_{i=1}^{K} \dot\beta_{i3} t_i = 0$ and $\sum_{i=1}^{K} \dot\beta_{i3} = 0$.
Following such restrictions, $\dot\beta_{(K-1)3}$ and $\dot\beta_{K3}$ are derived as

$$\dot\beta_{(K-1)3} = -\frac{\sum_{i=1}^{K-2} \dot\beta_{i3}\,(t_K - t_i)}{t_K - t_{K-1}} \quad \text{and} \quad \dot\beta_{K3} = \frac{\sum_{i=1}^{K-2} \dot\beta_{i3}\,(t_{K-1} - t_i)}{t_K - t_{K-1}} \tag{6}$$

and the complete C-RCS regression model is defined as

$$S(x) = \dot\beta_{00} + \dot\beta_{01} x + \sum_{i=1}^{K-2} \dot\beta_{i3} \Big[(x - t_i)_+^3 - \frac{t_K - t_i}{t_K - t_{K-1}}\,(x - t_{K-1})_+^3 + \frac{t_{K-1} - t_i}{t_K - t_{K-1}}\,(x - t_K)_+^3\Big] + \sum_{u=0}^{C} \beta'_u X'_u \tag{7}$$

2.3 Regressing Out Confound Effects by C-RCS Regression in GLM Fashion
To adapt C-RCS regression in the GLM fashion, we redefine the coefficients
$\beta_0, \beta_1, \beta_2, \ldots, \beta_{K-1}$ following Harrell [3], where $\beta_0 = \dot\beta_{00}$, $\beta_1 = \dot\beta_{01}$, $\beta_2 = \dot\beta_{13}$, $\beta_3 = \dot\beta_{23}$,
$\beta_4 = \dot\beta_{33}$, $\ldots$, $\beta_{K-1} = \dot\beta_{(K-2)3}$. Then, the C-RCS regression with confound effects
becomes

$$S(x) = \beta_0 + \sum_{j=1}^{K-1} \beta_j X_j + \sum_{u=0}^{C} \beta'_u X'_u \tag{8}$$

where $C$ is the number of all confound effects ($X'_u$), $X_1 = x$, and for $j = 2, \ldots, K-1$

$$X_j = (x - t_{j-1})_+^3 - \frac{t_K - t_{j-1}}{t_K - t_{K-1}}\,(x - t_{K-1})_+^3 + \frac{t_{K-1} - t_{j-1}}{t_K - t_{K-1}}\,(x - t_K)_+^3 \tag{9}$$

Then, the beta coefficients are solvable under the GLM framework. Once $\hat\beta_0, \hat\beta_1, \hat\beta_2,$
$\ldots, \hat\beta_{K-1}$ are obtained, two linearity-assuring terms $\hat\beta_K$ and $\hat\beta_{K+1}$ are estimated:

$$\hat\beta_K = \frac{\sum_{i=2}^{K-1} \hat\beta_i\,(t_{i-1} - t_K)}{t_K - t_{K-1}} \quad \text{and} \quad \hat\beta_{K+1} = \frac{\sum_{i=2}^{K-1} \hat\beta_i\,(t_{i-1} - t_{K-1})}{t_{K-1} - t_K} \tag{10}$$

The final estimated volumetric trajectories $\hat S(x)$ can be fitted as

$$\hat S(x) = \hat\beta_0 + \hat\beta_1 x + \sum_{j=2}^{K+1} \hat\beta_j\,(x - t_{j-1})_+^3 + \sum_{u=0}^{C} \hat\beta'_u X'_u \tag{11}$$

In this work, gender, field strength and total intracranial volume (TICV) are employed
as covariates $X'_u$. TICV values are calculated using SIENAX [8]. Field strength and
TICV are used to regress out site effects rather than using site categories directly, since
the sites are highly correlated with the explanatory variable age.
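As a sketch of Eqs. (8)-(9), the following hypothetical example builds the C-RCS design matrix for the paper's seven knots and fits it by ordinary least squares on synthetic data, with a single stand-in confound in place of gender, field strength and TICV:

```python
import numpy as np

def crcs_design(x, knots):
    """C-RCS basis of Eq. (9): X1 = x plus K-2 restricted cubic columns,
    each linear beyond the first and last knots."""
    t = np.asarray(knots, dtype=float)
    pos3 = lambda v: np.maximum(v, 0.0) ** 3
    cols = [x]
    for j in range(1, len(t) - 1):                 # paper's j = 2, ..., K-1
        cols.append(pos3(x - t[j - 1])
                    - (t[-1] - t[j - 1]) / (t[-1] - t[-2]) * pos3(x - t[-2])
                    + (t[-2] - t[j - 1]) / (t[-1] - t[-2]) * pos3(x - t[-1]))
    return np.column_stack(cols)

# GLM-style fit on synthetic data: intercept + C-RCS basis + one confound
rng = np.random.default_rng(0)
x = rng.uniform(4, 98, 500)                        # ages
confound = rng.normal(size=500)                    # stand-in covariate X'_u
y = 1200 - 3 * x + 5 * confound + rng.normal(0, 1, 500)   # toy volumes
A = np.column_stack([np.ones_like(x),
                     crcs_design(x, [7, 12, 19, 30, 55, 75, 90]),
                     confound])
beta = np.linalg.lstsq(A, y, rcond=None)[0]
```

Because each basis column is linear outside the knot range, the fitted trajectory is automatically linear in the tails, matching the constraints derived above.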

2.4 SCNs and CI Using Bootstrap Method


Using the aforementioned C-RCS regression, the lifespan volumetric trajectories of WBV
and 15 NOIs are obtained from 5111 images. Simultaneously, the piecewise volumetric
trajectories within a particular age bin (between adjacent knots) of all 15 NOIs
($\hat S_i(x),\ i = 1, 2, \ldots, 15$) are separated to establish SCN dendrograms using HCA [9].
The distance metric $D$ used in HCA is defined as $D = 1 - \mathrm{corr}(\hat S_i(x), \hat S_j(x))$,

Fig. 2. Volumetry and growth rate. The left plot in (a) shows the volumetric trajectory of whole
brain volume (WBV) using C-RCS regression on 5111 MR images. The right figure in
(a) indicates the growth rate curve, which shows volumetric change per year of the volumetric
trajectory. In (b), C-RCS regression is deployed on the same dataset by additionally regressing
out TICV. Our growth rate curves are compared with 40 previous longitudinal studies [1] on
smaller cohorts (21 studies in (a) without regressing out TICV and 19 studies in (b) regressing
out TICV). The standard deviations of previous studies are provided as black bars (if available).
The 95 % CIs in all plots are calculated from 10,000 bootstrap samples.

$i, j \in \{1, 2, \ldots, 15\}$ and $i \ne j$, where $\mathrm{corr}(\cdot)$ is the Pearson's correlation between any
two C-RCS fitted piecewise trajectories $\hat S_i(x)$ and $\hat S_j(x)$ in the same age bin.
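The dendrogram construction from the correlation distance D can be sketched with scipy's hierarchical clustering (average linkage here is illustrative; the paper does not restate its HCA linkage settings):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def scn_linkage(trajectories):
    """Average-linkage HCA on D = 1 - Pearson correlation between the
    C-RCS fitted NOI trajectories (rows of `trajectories`)."""
    C = np.corrcoef(trajectories)
    D = 1.0 - C[np.triu_indices_from(C, k=1)]   # condensed distance vector
    return linkage(np.clip(D, 0.0, None), method="average")
```

Running this on the 15 piecewise trajectories of one age bin yields one SCN dendrogram; repeating per age bin gives the six dendrograms of Fig. 4.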
The stability of the proposed approaches is demonstrated by the CIs of C-RCS
regression and SCNs using the bootstrap method [10]. First, the 95 % CIs of volumetric
trajectories on WBV (Fig. 2) and 15 NOIs (Fig. 3) are derived by deploying C-RCS
regression on 10,000 bootstrap samples. Then, the distances D between all pairs of
clustered NOIs are derived using 15 (NOIs) × 10,000 (bootstrap) C-RCS fitted trajectories,
and the 95 % CIs are obtained for each pair of clustered NOIs and shown
on six SCN dendrograms (Fig. 4). The average network distance (AND), the average
distance between 15 NOIs for a dendrogram, can be calculated 10,000 times using the
bootstrap. The AND reflects the modularity of connections between all NOIs. We are
able to see if the ANDs are significantly different across brain development periods by
deploying a two-sample t-test on AND values (10,000 per age bin) between age bins.
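The percentile bootstrap used for these CIs can be sketched generically (applied here to a toy sample; in the paper the resampled statistic is a fitted trajectory value or an AND):

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic: resample
    with replacement, recompute the statistic, and take the quantiles."""
    rng = np.random.default_rng(seed)
    v = np.asarray(values)
    reps = np.array([stat(v[rng.integers(0, len(v), len(v))])
                     for _ in range(n_boot)])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])
```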

Fig. 3. Lifespan trajectories of 15 NOIs are provided with 95 % CI from 10,000 bootstrap
samples. The upper 3D figures indicate the definition of NOIs (in red). The lower figures show
the trajectories with CI using C-RCS regression method by regressing out gender, field strength
and TICV (same model as Fig. 2b). For each NOI, the piecewise CIs of six age bins are shown in
different colors. The piecewise volumetric trajectories and CIs are separated by 7 knots in the
lifespan C-RCS regression rather than conducting independent fittings. The volumetric
trajectories on both sides of each NOI are derived separately except for CB.

3 Results

Figure 2a shows the lifespan volumetric trajectories using C-RCS regression as well as
the growth rate (volume change in percentage per year) of WBV when regressing out
gender and field strength effects. Figure 2b indicates the C-RCS regression on the same
dataset by adding TICV as an additional covariate. The cross-sectional growth rate

Fig. 4. The six structural covariance networks (SCNs) dendrograms using hierarchical
clustering analysis (HCA) indicate which NOIs develop together during different developmental
periods (age bins). The distance on the x-axis is in log scale and equals one minus the Pearson's
correlation between two curves. The correlation between NOIs becomes stronger from right to
left on the x-axis. The horizontal range of each colored rectangles indicates the 95 % CI of
distance from 10,000 bootstrap samples. Note that the colors are chosen for visualization
purposes without quantitative meanings.

curve using C-RCS regression is compared with 40 previous longitudinal studies (19
are TICV corrected) [1], which are typically limited to smaller age ranges.
Using the same C-RCS model as in Fig. 2b, Fig. 3 indicates both the lifespan and
piecewise volumetric trajectories of the 15 NOIs. In Fig. 4, the piecewise volumetric
trajectories of the 15 NOIs within each age bin are clustered using HCA and shown in one
SCN dendrogram.
Then, six SCNs dendrograms are obtained by repeating HCA on different age bins,
which demonstrate the evolution of SCNs during different developmental periods. The
ANDs between any two age bins in Fig. 4 are statistically significant (p < 0.001).

4 Conclusion and Discussion

This paper proposes a large-scale cross-sectional framework to investigate lifetime brain
volumetry using C-RCS regression. C-RCS regression captures complex brain volumetric
trajectories across the lifespan while regressing out confound effects in a GLM
fashion. Hence, it can be used by researchers within a familiar context.
fashion. Hence, it can be used by researchers within a familiar context. The estimated
volume trends are consistent with 40 previous smaller longitudinal studies. The stable
estimation of volumetric trends for NOI (exhibited by narrow confidence bands) provides
a basis for assessing patterns in brain changes through SCNs. Moreover, we demonstrate
how to compute confidence intervals for SCNs and correlations between NOIs. The
significant difference of AND indicates that the C-RCS regression detects the changes of
average SCNs connections during the brain development.
The software is freely available online1.

Acknowledgments. This research was supported by NSF CAREER 1452485, NIH 5R21EY
024036, NIH 1R21NS064534, NIH 2R01EB006136, NIH 1R03EB012461, NIH R01NS095291
and also supported by the Intramural Research Program, National Institute on Aging, NIH.

References
1. Hedman, A.M., van Haren, N.E., Schnack, H.G., Kahn, R.S., Hulshoff Pol, H.E.: Human
brain changes across the life span: a review of 56 longitudinal magnetic resonance imaging
studies. Hum. Brain Mapp. 33, 1987–2002 (2012)
2. Durrleman, S., Simon, R.: Flexible regression models with cubic splines. Stat. Med. 8, 551–
561 (1989)
3. Harrell, F.: Regression Modeling Strategies: with Applications to Linear Models, Logistic
and Ordinal Regression, and Survival Analysis. Springer, Switzerland (2015)
4. Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image
registration with cross-correlation: evaluating automated labeling of elderly and neurode-
generative brain. Med. Image Anal. 12, 26–41 (2008)
5. Asman, A.J., Dagley, A.S., Landman, B.A.: Statistical label fusion with hierarchical
performance models. In: Proceedings - Society of Photo-Optical Instrumentation Engineers,
vol. 9034, p. 90341E (2014)
6. Klein, A., Dal Canton, T., Ghosh, S.S., Landman, B., Lee, J., Worth, A.: Open labels: online
feedback for a public resource of manually labeled brain images. In: 16th Annual Meeting
for the Organization of Human Brain Mapping (2010)
7. Stone, C.J., Koo, C.-Y.: Additive splines in statistics, p. 48 (1986)
8. Smith, S.M., Zhang, Y., Jenkinson, M., Chen, J., Matthews, P.M., Federico, A., De
Stefano, N.: Accurate, robust, and automated longitudinal and cross-sectional brain change
analysis. Neuroimage 17, 479–489 (2002)
9. Anderberg, M.R.: Cluster Analysis for Applications: Probability and Mathematical Statistics:
A Series of Monographs and Textbooks. Academic Press, New York (2014)
10. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC Press, Boca Raton (1994)

1 https://masi.vuse.vanderbilt.edu/index.php/C-RCSregression.
Extracting the Core Structural Connectivity
Network: Guaranteeing Network Connectedness
Through a Graph-Theoretical Approach

Demian Wassermann1(B), Dorian Mazauric2, Guillermo Gallardo-Diez1,
and Rachid Deriche1
1 Athena EPI, Inria Sophia Antipolis-Méditerranée, Sophia Antipolis 06902, France
demian.wassermann@inria.fr
2 ABS EPI, Inria Sophia Antipolis-Méditerranée, Sophia Antipolis 06902, France

Abstract. We present a graph-theoretical algorithm to extract the connected core structural connectivity network of a subject population. Extracting this core common network across subjects is a main problem in current neuroscience. Such a network facilitates cognitive and clinical analyses by reducing the number of connections that need to be explored. Furthermore, insights into human brain structure can be gained by comparing the core networks of different populations. We show that our novel algorithm has theoretical and practical advantages. First, contrary to the current approach, our algorithm guarantees that the extracted core subnetwork is connected, agreeing with current evidence that the core structural network is tightly connected. Second, our algorithm shows enhanced performance when used as a feature selection approach for connectivity analysis on populations.

1 Introduction

Isolating the common core structural connectivity network (SCN) of a population


is an important problem in current neuroscience [3,5]. This procedure facilitates
cognitive and clinical studies based on diffusion MRI, e.g. [1,5], by increasing
their statistical power through a reduction of the number of analyzed structural
connections. We illustrate this process in Fig. 1. Furthermore, recent evidence
indicates that a core common network exists in human and macaque brains and that
it is tightly connected [2]. In this work we develop, for the first time, a group-wise
core SCN extraction algorithm which guarantees a connected network output.
Furthermore, we show the potential of such a network to select gender-specific
connections through an experiment on 300 human subjects.
The most widely used population-level core SCN extraction technique [5] is based
on an effective statistical procedure to extract a population SCN: (1) computing,
for each subject, a connectivity matrix using a standardised parcellation;
and (2) extracting a binary graph by analysing each connection separately and
rejecting the connections whose presence in the population is not statistically
supported. The resulting graph can be a set of
disconnected subgraphs. This is problematic: recent studies have shown the core

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 89–96, 2016.
DOI: 10.1007/978-3-319-46720-7_11
90 D. Wassermann et al.

Fig. 1. Scheme of analyses involving the core structural connectivity matrix: tractographies from the subject sample; structural connectivity matrices derived from the tractographies; the groupwise core connectivity matrix; and statistical analyses.

network to be tightly connected [2]. However, extracting a connected group-wise
core SCN is far from simple: an algorithm to find the largest core network of a
population cannot find an approximated solution in polynomial time.
In this work, we propose a graph-theoretical algorithm to obtain the connected
core SCN of a subject sample. Our approach guarantees a connected core
SCN, agreeing with novel evidence on structural connectivity network topology,
e.g. [2]. We start by proving that, although core network extraction is
NP-Complete in general, in our case an exact polynomial time algorithm exists
to perform the extraction. Finally, we show
that our algorithm outperforms that of Gong et al. [5] as a tool for selecting
regressors in structural connectivity. For this, we use 300 subjects from the HCP
database and compare the performance of the networks obtained with both algorithms
to predict connectivity values from gender in a subsection of the core network.

2 Definitions, Problems, and Contributions


A first approach to core sub-network identification can be derived from the
binary connectivity model. In this model the cortical and sub-cortical regions are
common across subjects and what varies is whether these regions are connected
or not [5]. Using this approach, a sample of human brain connectivity of a given
population can be represented by k ≥ 1 graphs G1 = (V, E1), . . . , Gk = (V, Ek).
In this formalism each graph Gi corresponds to a subject and, in accordance
with Gong et al. [5], the vertices V, stable across subjects, are cortical and
sub-cortical regions and the edges Ei are white matter bundles connecting those
regions. Note that all graphs have the same ordered set of nodes. A first approach
to compute, or approximate, the core sub-network of the population sample
consists in finding the core SCN graph G∗ = (V∗, E∗) such that G∗ and every
Gi have some quantitative common properties, where V∗ ⊆ V. In this article, we
model the difference between the core SCN, G∗, and the subject ones, Gi, by
a function fλ. This function measures the difference between the sets of edges
(and the sets of non-edges) of the core network and those of the subjects:

fλ(G∗, Gi) = λ|{e ∈ E∗ : e ∉ E(Gi[V∗])}| + (1 − λ)|{e ∉ E∗ : e ∈ E(Gi[V∗])}|,

where λ ∈ [0, 1], Gi[V∗] is the subgraph of Gi induced by the set of nodes
V∗ ⊆ V, and |S| is the cardinality of a set S. In other words, fλ represents the
Extracting the Core Structural Connectivity Network 91

difference between the set of edges of G∗ and the set of edges of Gi, modulated by
the parameter λ. In the following, we will refer to fλ(G∗, Gi) as the difference
threshold of a core sub-network G∗ w.r.t. Gi. Note that if λ = 1, we only consider
core edges absent from the subject, |{e ∈ E∗ : e ∉ E(Gi[V∗])}|, and if λ = 0,
we only consider subject edges absent from the core network, |{e ∉ E∗ : e ∈ E(Gi[V∗])}|.
In Definition 1, we formalize the problem of computing the core sub-network as
a combinatorial optimization problem:

Definition 1 (Core Sub-network Problem). Let G1 = (V, E1), . . . , Gk =
(V, Ek) be k ≥ 1 undirected graphs. Let λ be any real such that λ ∈ [0, 1]. Let
n ≥ 0 be any integer. Then, the core sub-network problem consists in computing
a connected graph G∗ = (V∗, E∗) such that |V∗| ≥ n and such that the sum of
the difference thresholds Σ_{i=1}^{k} fλ(G∗, Gi) is minimum.

Small Example: Consider the instance depicted in Fig. 2 with λ = 1/2.
Figure 2(a), (b), and (c) represent G1, G2, G3, respectively. Figure 2(d) is a solution
G∗ = (V∗, E∗) when n = 5. Indeed, we have fλ(G∗, G3) = 1/2 because the
difference between G∗ and G3 is the single edge {2, 5} connecting nodes 2 and 5;
we have fλ(G∗, G2) = 2 because the difference between G∗ and
G2 is the four edges {1, 5}, {1, 4}, {3, 4}, and {4, 5}; and we have fλ(G∗, G1) = 1
because the difference between G∗ and G1 is the two edges {3, 5} and {4, 5}. We
get fλ(G∗, G1) + fλ(G∗, G2) + fλ(G∗, G3) = 7/2.
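The difference threshold fλ is straightforward to compute from edge sets. The sketch below (Python) reproduces the 7/2 of this worked example; since the exact graphs of Fig. 2 are not reproduced in this text, the edge sets are hypothetical reconstructions consistent with the differences listed above, and n = |V| so that Gi[V∗] = Gi.

```python
# Difference threshold f_lambda between a candidate core network and a
# subject graph, for V* = V (so the induced subgraph G_i[V*] is G_i itself).
def f_lambda(lam, core_edges, subject_edges):
    missing = len(core_edges - subject_edges)   # e in E*, e not in E(G_i)
    extra = len(subject_edges - core_edges)     # e not in E*, e in E(G_i)
    return lam * missing + (1 - lam) * extra

def edges(*pairs):
    return {frozenset(p) for p in pairs}

# Hypothetical edge sets consistent with the differences described in the text.
core = edges((1, 2), (2, 3), (2, 4), (2, 5))          # a connected G*
g1 = core | edges((3, 5), (4, 5))                      # differs by {3,5}, {4,5}
g2 = core | edges((1, 5), (1, 4), (3, 4), (4, 5))      # differs by four edges
g3 = core - edges((2, 5))                              # differs by {2,5}

total = sum(f_lambda(0.5, core, g) for g in (g1, g2, g3))
print(total)  # 3.5, i.e. 7/2
```

With λ = 1/2 each differing edge contributes 1/2 regardless of direction, which is why the worked example sums to 7/2.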

Fig. 2. Instance of the common sub-network problem, on the node set {1, . . . , 5}. (a–c) Brain connectivity of different subjects, namely G1, G2 and G3. (d) Extracted common sub-network G∗ that is optimal for n = 5 with λ = 1/2: the difference threshold is 7/2.

In the rest of this section we state our main contribution, an optimal polynomial
time exact algorithm for the core sub-network problem when the number of
nodes is sufficiently large (optimal means here that there is no exact algorithm
with better complexity). Solving the problem in Definition 1 is hard: it can be
proved that, given an integer n ≥ 0 and a real number δ ≥ 0, the decision
version of the SCN problem is NP-complete even if k = 2. However, focusing on
the problem of minimizing fλ, we obtain a polynomial time algorithm for SCN
extraction.
The main point of this work is to present an algorithm for core graph
extraction and to assess its potential for clinical and cognitive studies. Even if the
problem is very difficult to solve in general, we design a polynomial time core
subnetwork extraction algorithm and show that it is optimal when we focus

on the problem of minimizing the difference threshold and when the number of
nodes of the core sub-network is large.
Theorem 1. Consider k ≥ 1 undirected graphs G1 = (V, E1), . . . , Gk = (V, Ek)
and consider any real number λ ∈ [0, 1]. Then, Core-Sum-Alg (Algorithm 1)
is an O(max(k, log(|V|)) · |V|²)-time complexity exact algorithm for the core sub-
network problem when n = |V|.

Algorithm 1 Core-Sum-Alg: Exact polynomial time complexity algorithm for the core sub-network problem when n = |V|.
Require: SC graphs for each subject G1 = (V, E1 ), . . . , Gk = (V, Ek ), and λ ∈ [0, 1]
Start Computing a Core Graph that can have Multiple Connected Components:
1: Construct the completely connected graph G = (V, V × V)
2: Compute w0(·), w1(·) across all subject graphs as in Eq. 1.
3: Compute the set of edges to be added to the core graph, E1 = {e | w1(e) ≤ w0(e)},
and construct G1 = (V, E1).
Compute the Connected Components of G1 and Connect Them:
4: Compute the set of maximal connected components cc(G1) = (cc1(G1), . . . , cct(G1))
5: Construct Gcc = (Vcc, Vcc × Vcc) with Vcc = {u1, . . . , ut}
6: Compute wcc as in Eq. 4.
7: Compute the set of edges E0 that corresponds to the argument minimum of Eq. 4. In
other words, for every {ui, uj}, select the edge e connecting the maximal connected
components cci(G1) and ccj(G1) such that w1(e) − w0(e) = wcc({ui, uj}).
8: Compute a minimum spanning tree Tcc of Gcc
9: Compute the set of edges E0∗ ⊆ E0 that corresponds to the set of edges of the
previous minimum spanning tree.
10: Construct G∗ = (V∗, E∗) with V∗ := V and E∗ := E1 ∪ E0∗
11: return G∗ the connected Core Structural Connectivity Network
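A compact way to realize Algorithm 1 is to merge lines 4–9 into a single Kruskal pass: running Kruskal's algorithm over all node pairs sorted by marginal cost w1(e) − w0(e), restricted to edges joining distinct components of G1, yields exactly a minimum spanning tree of Gcc with the weights of Eq. 4. The sketch below (pure Python; function and variable names are illustrative, not from the paper) follows this route for subject graphs given as sets of frozenset edges.

```python
from itertools import combinations

def core_sum_alg(n_nodes, subject_edge_sets, lam=0.5):
    """Sketch of Core-Sum-Alg for n = |V|: returns a connected core edge set."""
    all_edges = [frozenset(e) for e in combinations(range(n_nodes), 2)]
    # Eq. (1): cost of excluding (w0) / including (w1) each edge.
    w0 = {e: (1 - lam) * sum(e in Ei for Ei in subject_edge_sets) for e in all_edges}
    w1 = {e: lam * sum(e not in Ei for Ei in subject_edge_sets) for e in all_edges}
    # Eq. (2): keep every edge whose inclusion cost does not exceed exclusion cost.
    E1 = {e for e in all_edges if w1[e] <= w0[e]}

    # Union-find over the connected components of (V, E1).
    parent = list(range(n_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for e in E1:
        a, b = tuple(e)
        parent[find(a)] = find(b)

    # Kruskal pass by marginal cost w1 - w0: adding the cheapest edge between
    # two components is equivalent to an MST of Gcc with the weights of Eq. (4).
    extra = set()
    for e in sorted(all_edges, key=lambda e: w1[e] - w0[e]):
        a, b = tuple(e)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            extra.add(e)
    return E1 | extra
```

When all subjects agree on a connected graph the algorithm simply returns it; when the voxel-wise majority graph is disconnected, the Kruskal pass adds the t − 1 cheapest bridging edges.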

In the following, we aim at proving Theorem 1. Consider k ≥ 1 undirected
graphs G1 = (V, E1), . . . , Gk = (V, Ek) and consider any real number λ such that
λ ∈ [0, 1]. Let us define some notations and auxiliary graphs. Let G = (V, V × V)
be the completely connected cortical network graph, with V × V the set of all pairs
of elements of V. We define two edge-weighting functions w0 and w1:

w0(e) = (1 − λ)|{i : e ∈ Ei, 1 ≤ i ≤ k}|,  w1(e) = λ|{i : e ∉ Ei, 1 ≤ i ≤ k}|. (1)

Intuitively, w0(e) represents the cost of not adding the edge e to the solution
and w1(e) represents the cost of adding the edge e to the solution. From this,
we define the graph induced by the set of edges to keep in the core subnetwork:

G1 = (V, E1), E1 = {e | w1(e) ≤ w0(e)} ⊆ V × V. (2)

If G1 is a connected graph, then it is an optimal solution. Otherwise, we have to
add edges in order to obtain a connected graph while minimizing the cost of such

adding. To add such edges we define a graph representing the fully connected
graph where each node represents a maximal connected component:

Gcc = (Vcc, Ecc) with Vcc = {u1, . . . , ut} and Ecc = Vcc × Vcc, (3)

where cc(G1) = (cc1(G1), . . . , cct(G1)) are the t maximal connected components of
G1. Then, to select which maximal connected components to connect in our core
subnetwork graph, we define a weight function wcc:

wcc({ui, uj}) = min_{v ∈ V(cci(G1)), v′ ∈ V(ccj(G1))} w1({v, v′}) − w0({v, v′}), 1 ≤ i < j ≤ t. (4)

We formally prove in Lemma 1 that the problem of obtaining a minimum connected
graph from G1, that is, solving the core sub-network problem when
n = |V|, consists in computing a minimum spanning tree of Gcc.

Lemma 1. The core sub-network problem when n = |V| is equivalent to computing
a minimum spanning tree of Gcc = (Vcc, Ecc) with weight function wcc.

Proof. The core sub-network problem when n = |V| consists in computing
a graph G∗ = (V∗, E∗) such that V∗ = V and δ∗ = Σ_{i=1}^{k} fλ(G∗, Gi) is
minimum. Consider the graph G1 = (V, E1) previously defined. Observe that

δ∗ ≥ Σ_{e ∈ V×V} min(w0(e), w1(e)) = Σ_{e ∈ E1} w1(e) + Σ_{e ∈ V×V : e ∉ E1} w0(e).

Indeed, for every pair of nodes v, v′ of V, either we set {v, v′} ∈ E∗, if w1({v, v′}) ≤
w0({v, v′}), or we set {v, v′} ∉ E∗. Thus, if G1 is a connected graph, then G∗ = G1
is an optimal solution such that δ∗ = Σ_{e ∈ V×V} min(w0(e), w1(e)). Otherwise, we
have to add edges to E1 in order to get a connected graph (that is, a spanning
graph), and the "cost" of this addition has to be minimized.
Thus, suppose that the graph G1 contains at least two maximal connected
components. Let cc(G1) = (cc1(G1), . . . , cct(G1)) be the t ≥ 2 maximal connected
components of G1. We have to connect these different components while minimizing
the increase of the difference threshold. Let E0 be the set of candidate edges
constructed as follows. For every i, j, 1 ≤ i < j ≤ t, let {vi, vj} be an edge
such that for every v ∈ V(cci(G1)) and for every v′ ∈ V(ccj(G1)), we have
w1({vi, vj}) − w0({vi, vj}) ≤ w1({v, v′}) − w0({v, v′}). In other words, {vi, vj}
is an edge that minimizes the marginal cost of connecting cci(G1) and ccj(G1).
We add {vi, vj} to E0.
Thus, we have to add exactly t − 1 edges of E0 in order to get a connected
graph, and we aim at minimizing the cost of this addition. More precisely, we
get our optimal core network by finding, for every i, j, 1 ≤ i < j ≤ t, an edge e
such that wcc({ui, uj}) = w1(e) − w0(e), that is, an edge e of minimum marginal
cost between the maximal connected components cci(G1) and ccj(G1). Let E0∗
be such a subset of t − 1 edges. Thus, we get that

δ∗ = Σ_{e ∈ E1} w1(e) + Σ_{e ∈ V×V : e ∉ E1, e ∈ E0∗} w1(e) + Σ_{e ∈ V×V : e ∉ E1, e ∉ E0∗} w0(e).

Observe that Σ_{e ∈ V×V : e ∉ E1, e ∈ E0∗} w1(e) is exactly the cost of a minimum spanning
tree of the graph Gcc defined before. □


We are now able to prove Theorem 1.

Proof (of Theorem 1). Core-Sum-Alg (Algorithm 1) follows the proof of
Lemma 1 and so solves the core sub-network problem when n = |V|. The
construction of G (line 1) can be done in linear time in the number of edges, that is
in O(|V|²)-time. The time complexity of line 2 is O(k|V|²). The construction of
G1 (line 3) can be done in O(|V| + |E1|)-time, O(|V|²)-time in the worst case.
The computation of the maximal connected components of G1 (line 4) can be
done in linear time in the size of G1, that is in O(|V| + |E1|), O(|V|²)-time in
the worst case. The construction of Gcc (line 5) can be done in linear time in the
size of Gcc, that is, in the worst case, in O(|V|²)-time. The time complexity of
lines 6 and 7 is O(|V|²). There is an O(m log(n))-time complexity algorithm for
the problem of computing a minimum spanning tree of a graph composed of n
nodes and m edges (line 8). Thus, in our case, we get an O(log(|V|)|V|²)-time
algorithm. The time complexity of line 9 is O(|V|). Finally, the construction of
G∗ = (V∗, E∗) (line 10) can be done in constant time because V∗ = V and
E∗ = E1 ∪ E0∗. □

Having developed a core subnetwork extraction algorithm that guarantees a
connected core network (Algorithm 1), we proceed to assess its performance.

3 Experiments and Results


To assess the performance of our method, we compared our novel approach with
the currently used one [5]: first, we compared the stability of the obtained binary
graph across randomly chosen subpopulations; second, we compared connectivity
prediction performance.
For these comparisons, we used a homogeneous set from the HCP500 dataset
[6]: all subjects aged 21–40 with a complete dMRI protocol, which resulted
in 309 subjects (112 male). We obtained the weighted connectivity matrices
between the cortical regions defined by the Desikan atlas [4] as done by Bassett
et al. [1]. To verify the unthresholded graph construction, we computed the average
degree, the number of connections over the number of possible connections, for each
subject. Bassett et al. [1] reported an average degree of 0.20 and we obtained
0.20 ± 0.01 (min: 0.17, max: 0.25), showing that our preprocessing is in agreement.

3.1 Consistency of the Extracted Graph


To quantify the consistency of the core graph extraction procedure we performed
500 Leave-N-Out experiments. In each experiment we randomly sampled 100 subjects
from the total and computed the core graphs with both techniques. We
performed the extraction at 4 different levels of the parameter for each technique,
choosing the parameters such that the density of the resulting graph
connections is stable across methods. We also report the number of unstable
connections, defined as the connections that were neither present in all
experiments nor absent in all experiments. We show the results of this experiment
in Fig. 3. In this figure we can observe that the resulting graphs are similar, while
the number of unstable connections is larger for Gong et al. [5] by an order of magnitude.
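The instability count above can be sketched as a small simulation. The snippet below is illustrative only: the subject graphs are random perturbations of a synthetic ground truth (not HCP data), and the core extractor is a simple majority vote standing in for either method.

```python
import random

def unstable_connections(extract_core, subject_graphs, n_resamples, sample_size, rng):
    """Count edges that are neither present nor absent across all resampled cores."""
    n = len(subject_graphs[0])
    present = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        sample = rng.sample(subject_graphs, sample_size)
        core = extract_core(sample)
        for i in range(n):
            for j in range(i + 1, n):
                present[i][j] += core[i][j]
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if 0 < present[i][j] < n_resamples)

def majority_core(sample):
    """Toy voxel-wise core: keep a connection present in more than half the sample."""
    n = len(sample[0])
    half = len(sample) / 2
    return [[int(sum(g[i][j] for g in sample) > half) for j in range(n)]
            for i in range(n)]

rng = random.Random(0)
truth = [[int(i != j and (i + j) % 2 == 0) for j in range(8)] for i in range(8)]
# Hypothetical subjects: flip each ground-truth connection with probability 0.1.
subjects = [[[g if rng.random() > 0.1 else 1 - g for g in row] for row in truth]
            for _ in range(50)]
count = unstable_connections(majority_core, subjects, 20, 30, rng)
```

A connection is unstable exactly when its presence count across resamples is strictly between 0 and the number of resamples.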

Fig. 3. Consistency analysis for extracted core graphs. We performed a Leave-N-Out
procedure to quantify the consistency across methods at 4 different parameter levels.
The results show similar graphs for both methods. However, our method, in blue, has
a smaller number of connections that are not consistently present or absent across all
experiments, i.e. unstable connections (marked in red).

3.2 Predicting Gender-Specific Connectivity

To assess model fit and prediction we implemented a nested Leave-1/3-Out
procedure. The outer loop performs model selection on 1/3 of the subjects, randomly
selected. First, it computes the core graph of this population, with our approach
and with that of Gong et al. [5]. Then, it selects the features F that are most
determinant for gender classification using the f-test feature selection procedure. The
features are taken from the core graph, using each subject's connectivity weights
as values. The inner loop performs model fitting and prediction using the selected
features F. First, we randomly take 1/3 of the remaining subjects and fit a linear
model on F for predicting gender. Second, we predict the values of the features
F from the gender column. The outer loop is performed 100 times and the inner

Fig. 4. Performance of the core network as feature selection for a linear model of
gender-specific connectivity. We evaluate model fit (left, "Fitting") and prediction
(right), for Gong et al. [5], in green, and ours, in blue. We show the histograms of both
values from our nested Leave-1/3-Out experiment. In both measures, our approach has
more frequent lower values, showing a better performance.

loop 500 times per outer loop. This totals 50,000 experiments. Finally, for each
experiment, we quantify the performance of the linear model at each
inner loop with the mean squared error (MSE) of the prediction and the Akaike
Information Criterion (AIC) for model fitting.
We show the experiment's results in Fig. 4. In these results we can see that
our approach, in blue, performed better than that of Gong et al. [5], in green, as
the number of cases with lower AIC and MSE is larger in our case.

4 Discussion and Conclusion

We present, for the first time, an algorithm to extract the core structural
connectivity network of a subject population while guaranteeing connectedness. We
start by formalizing the problem and showing that, although the problem is
very hard (it is NP-complete), we produce a polynomial time exact algorithm
to extract such a network when its number of nodes is large. Finally, we show an
example in which our network constitutes a better feature selection step for
statistical analyses of structural connectivity. For this, we performed a nested
leave-1/3-out experiment on 300 subjects. The results show that performing
feature selection with our technique outperforms the most commonly used
approach.

Acknowledgments. This work has received funding from the European Research
Council (ERC Advanced Grant agreement No. 694665).

References
1. Bassett, D.S., Brown, J.A., Deshpande, V., Carlson, J.M., Grafton, S.T.: Conserved
and variable architecture of human white matter connectivity. Neuroimage 54(2),
1262–1279 (2011)
2. Bassett, D.S., Wymbs, N.F., Rombach, M.P., Porter, M.A., Mucha, P.J.,
Grafton, S.T.: Task-based core-periphery organization of human brain dynamics.
PLoS Comput. Biol. 9(9), e1003171 (2013)
3. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of
structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)
4. Desikan, R.S., Ségonne, F., Fischl, B., Quinn, B.T., Dickerson, B.C., Blacker, D.,
Buckner, R.L., Dale, A.M., Maguire, R.P., Hyman, B.T., Albert, M., Killiany, R.J.:
An automated labeling system for subdividing the human cerebral cortex on MRI
scans into gyral based regions of interest. Neuroimage 31(3), 968–980 (2006)
5. Gong, G., He, Y., Concha, L., Lebel, C., Gross, D.W., Evans, A.C., Beaulieu, C.:
Mapping anatomical connectivity patterns of human cerebral cortex using in vivo
diffusion tensor imaging tractography. Cereb. Cortex 19(3), 524–536 (2009)
6. Sotiropoulos, S.N., Jbabdi, S., Xu, J., Andersson, J.L., Moeller, S., Auerbach, E.J.,
Glasser, M.F., Hernandez, M., Sapiro, G., Jenkinson, M., Feinberg, D.A.,
Yacoub, E., Lenglet, C., Van Essen, D.C., Ugurbil, K., Behrens, T.E.J.: Advances
in diffusion MRI acquisition and processing in the Human Connectome Project.
Neuroimage 80, 125–143 (2013)
Fiber Orientation Estimation Using Nonlocal
and Local Information

Chuyang Ye(B)

Brainnetome Center, Institute of Automation,
Chinese Academy of Sciences, Beijing, China
chuyang.ye@nlpr.ia.ac.cn

Abstract. Diffusion magnetic resonance imaging (dMRI) enables in


vivo investigation of white matter tracts, where the estimation of fiber
orientations (FOs) is a crucial step. Dictionary-based methods have been
developed to compute FOs with a lower number of dMRI acquisitions.
To reduce the effect of noise that is inherent in dMRI acquisitions, spa-
tial consistency of FOs between neighbor voxels has been incorporated
into dictionary-based methods. Because many fiber tracts are tube- or
sheet-shaped, voxels belonging to the same tract could share similar FO
configurations even when they are not adjacent to each other. Therefore,
it is possible to use nonlocal information to improve the performance of
FO estimation. In this work, we propose an FO estimation algorithm,
Fiber Orientation Reconstruction using Nonlocal and Local Information
(FORNLI), which adds nonlocal information to guide FO computation.
The diffusion signals are represented by a set of fixed prolate tensors.
For each voxel, we compare its patch-based diffusion profile with those
of the voxels in a search range, and its nonlocal reference voxels are deter-
mined as the k nearest neighbors in terms of diffusion profiles. Then, FOs
are estimated by iteratively solving weighted ℓ1-norm regularized least
squares problems, where the weights are determined using local neigh-
bor voxels and nonlocal reference voxels. These weights encourage FOs
that are consistent with the local and nonlocal information. FORNLI
was performed on simulated and real brain dMRI, which demonstrates
the benefit of incorporating nonlocal information for FO estimation.

Keywords: Diffusion MRI · FO estimation · Nonlocal information

1 Introduction

By capturing the anisotropy of water diffusion in tissue, diffusion magnetic
resonance imaging (dMRI) enables in vivo investigation of white matter tracts. The
fiber orientation (FO) is a crucial feature computed from dMRI, which plays an
important role in fiber tracking [5].
Voxelwise FO estimation methods have been proposed and widely applied,
such as constrained spherical deconvolution [16], multi-tensor models [9,13,17],


© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 97–105, 2016.
DOI: 10.1007/978-3-319-46720-7_12
98 C. Ye

and ensemble average propagator methods [10]. In particular, to reduce the num-
ber of dMRI acquisitions required for resolving crossing fibers, sparsity assump-
tion has been incorporated in the estimation problem. For example, it has been
used in the multi-tensor framework [1,9,13], leading to dictionary-based FO esti-
mation algorithms that have been shown to reconstruct FOs of good quality yet
using a lower number of dMRI acquisitions [1].
Because of image noise that adversely affects FO estimation, the regular-
ization of spatial consistency has been used in FO estimation problems. For
example, smoothness of diffusion tensors and FOs has been used as regulariza-
tion terms in the estimation in [12,15], respectively, but no sparsity regulariza-
tion is introduced. Other methods incorporate both sparsity and smoothness
assumption. For example, in [11,14] sparsity regularization is used together with
the smoothness of diffusion images in a spherical ridgelets framework, where FO
smoothness is enforced indirectly. More recently, [4,18] manage to directly encode
spatial consistency of FOs between neighbor voxels with sparsity regularization
in the multi-tensor models by using weighted ℓ1-norm regularization, where FOs
that are consistent with neighbors are encouraged. These methods have focused
on the use of local information for robust FO estimation. However, because fiber
tracts are usually tube-like or sheet-like [19], voxels that are not adjacent to
each other can also share similar FO configurations. Thus, nonlocal information
could further contribute to improved FO reconstruction by providing additional
information.
In this work, we propose an FO estimation algorithm that improves esti-
mation quality by incorporating both nonlocal and local information, which
is named Fiber Orientation Reconstruction using Nonlocal and Local Informa-
tion (FORNLI). We use a dictionary-based FO estimation framework, where
the diffusion signals are represented by a tensor basis so that sparsity regular-
ization can be readily incorporated. We design an objective function that consists
of data fidelity terms and weighted 1 -norm regularization. The weights in the
weighted 1 -norm encourage spatial consistency of FOs and are here encoded
by both local neighbors and nonlocal reference voxels. To determine the nonlo-
cal reference voxels for each voxel, we compare its patch-based diffusion profile
with those of the voxels in a search range, and select the k nearest neighbors in
terms of diffusion profiles. FOs are estimated by minimizing the objective
function, where weighted ℓ1-norm regularized least squares problems are iteratively
solved.

2 Methods
2.1 Background: A Signal Model with Sparsity and Smoothness
Regularization
Sparsity regularization has been shown to improve FO estimation and reduce
the number of gradient directions required for resolving crossing fibers [1]. A
commonly used strategy to incorporate sparsity is to model the diffusion signals
using a fixed basis. The prolate tensors have been a popular choice because of
Fiber Orientation Estimation Using Nonlocal and Local Information 99

their explicit relationship with FOs [1,9,13]. Specifically, let {Di}_{i=1}^{N} be a set of
N fixed prolate tensors. The primary eigenvector (PEV) vi of each Di represents
a possible FO and these PEVs are evenly distributed on the unit sphere. The
eigenvalues of the basis tensors can be determined by examining the diffusion
tensors in noncrossing tracts [9]. Then, the diffusion weighted signal Sm(gk) at
voxel m associated with the gradient direction gk (k = 1, 2, . . . , K) and b-value
bk can be represented as

Sm(gk) = Sm(0) Σ_{i=1}^{N} fm,i exp(−bk gk^T Di gk) + nm(gk), (1)

where Sm(0) is the baseline signal without diffusion weighting, fm,i is Di's
unknown nonnegative mixture fraction (Σ_{i=1}^{N} fm,i = 1), and nm(gk) is noise.
We define ym(gk) = Sm(gk)/Sm(0) and ηm(gk) = nm(gk)/Sm(0), and let
ym = (ym(g1), ym(g2), . . . , ym(gK))^T and ηm = (ηm(g1), ηm(g2), . . . , ηm(gK))^T.
Then, Eq. (1) can be written as

ym = G fm + ηm, (2)

where G is a K × N dictionary matrix with Gki = exp(−bk gk^T Di gk), and fm =
(fm,1, fm,2, . . . , fm,N)^T. Based on the assumption that at each voxel the number
of FOs is small with respect to the number of gradient directions, the mixture
fractions can be estimated using a voxelwise sparse reconstruction formulation
f̂m = arg min_{fm ≥ 0, ||fm||1 = 1} ||G fm − ym||2² + β||fm||0. (3)

In practice, the constraint ||fm||1 = 1 is usually relaxed, and the sparse
reconstruction can be either solved directly [8] or by approximating the ℓ0-norm with
the ℓ1-norm [1,9,13]. Basis directions corresponding to nonzero mixture fractions
are determined as FOs.
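The dictionary G can be assembled directly from Eqs. (1)–(2): build a prolate basis tensor Di = (λ1 − λ2) vi vi^T + λ2 I for each PEV vi and evaluate Gki = exp(−bk gk^T Di gk). The sketch below is pure Python with hypothetical eigenvalues (roughly typical white-matter values) and a tiny hand-picked direction set rather than an even spherical sampling.

```python
import math

def prolate_tensor(v, lam1=1.7e-3, lam2=0.3e-3):
    """D = (lam1 - lam2) * v v^T + lam2 * I: a prolate tensor with PEV v."""
    return [[(lam1 - lam2) * v[a] * v[b] + (lam2 if a == b else 0.0)
             for b in range(3)] for a in range(3)]

def quad_form(g, D):
    """g^T D g for a 3-vector g and 3x3 tensor D."""
    return sum(g[a] * D[a][b] * g[b] for a in range(3) for b in range(3))

def dictionary_matrix(gradients, b_values, pevs):
    """G[k][i] = exp(-b_k g_k^T D_i g_k) over the fixed prolate basis tensors."""
    tensors = [prolate_tensor(v) for v in pevs]
    return [[math.exp(-b * quad_form(g, D)) for D in tensors]
            for g, b in zip(gradients, b_values)]

# Hypothetical basis PEVs and gradient scheme (unit vectors, b = 1000 s/mm^2).
pevs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
G = dictionary_matrix(pevs, [1000.0] * 3, pevs)
```

Attenuation is strongest when the gradient is aligned with the tensor axis (largest apparent diffusivity), so diagonal entries of this toy G are the smallest in each row.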
To further incorporate spatial coherence of FOs, weighted ℓ1-norm regularization
has been introduced into dictionary-based FO estimation [4,18]. For example,
in [18] the FOs in all voxels are jointly estimated by solving

{f̂m}_{m=1}^{M} = arg min_{f1, f2, . . . , fM ≥ 0} Σ_{m=1}^{M} ||G fm − ym||2² + β||Cm fm||1, (4)

where M is the number of voxels and Cm is a diagonal matrix that encodes
neighbor interaction. It places smaller penalties on mixture fractions associated
with basis directions that are more consistent with neighbor FOs, so that
these mixture fractions are more likely to be positive and their associated basis
directions are thus encouraged.
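Since fm ≥ 0, the penalty β||Cm fm||1 reduces to the linear term β Σi Cm,ii fm,i, so each voxel's subproblem in Eq. (4) can be attacked with a simple projected gradient descent. The sketch below is an illustrative pure-Python solver for one voxel, not the optimizer used in [18]; G, y and the diagonal of Cm are small hypothetical inputs.

```python
def solve_weighted_l1(G, y, c_diag, beta=0.1, step=0.1, iters=2000):
    """Minimize ||G f - y||^2 + beta * sum(c_i * f_i) subject to f >= 0."""
    K, N = len(G), len(G[0])
    f = [1.0 / N] * N
    for _ in range(iters):
        # Residual r = G f - y.
        r = [sum(G[k][i] * f[i] for i in range(N)) - y[k] for k in range(K)]
        # Gradient: 2 G^T r plus the linear l1 term (valid on the f >= 0 orthant).
        grad = [2 * sum(G[k][i] * r[k] for k in range(K)) + beta * c_diag[i]
                for i in range(N)]
        # Gradient step followed by projection onto the nonnegative orthant.
        f = [max(0.0, f[i] - step * grad[i]) for i in range(N)]
    return f

# Hypothetical 2-atom toy problem: y is generated by the first atom alone.
G = [[1.0, 0.5], [0.2, 1.0], [0.4, 0.3]]
y = [1.0, 0.2, 0.4]
f = solve_weighted_l1(G, y, c_diag=[1.0, 1.0])
```

The recovered mixture concentrates on the first atom, with a slight shrinkage toward zero induced by the penalty; lowering an atom's diagonal weight Cm,ii makes that atom cheaper to keep, which is exactly the encouragement mechanism described above.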

2.2 FO Estimation Incorporating Nonlocal Information


In image denoising or segmentation problems, nonlocal information has been
used to improve the performance [3,6]. In FO estimation, because fiber tracts

are usually tube-shaped (e.g., the cingulum bundle) or sheet-shaped (e.g., the
corpus callosum) [19], voxels that are not adjacent to each other can still have
similar FO patterns, and it is possible to use nonlocal information to improve
the estimation. We choose to use a weighted 1 -norm regularized FO estimation
framework similar to Eq. (4), and encode the weighting matrix Cm using both
nonlocal and local information.

Finding Nonlocal Reference Voxels. For each voxel m, the nonlocal infor-
mation is extracted from a set Rm of voxels, which are called nonlocal reference
voxels and should have diffusion profiles similar to that of m. To identify the
nonlocal reference voxels for m, we compute patch-based dissimilarities between
the voxel m and the voxels in a search range Sm , like the common practice
in nonlocal image processing [3,6]. Specifically, we choose a search range of an
11 × 11 × 11 cube [3] whose center is m. The patch at each voxel n ∈ Sm is
formed by the diffusion tensors of its 6-connected neighbors and the diffusion
tensor at n, which is represented as Δn = (Δn,1, . . . , Δn,7).
We define the following patch-based diffusion dissimilarity between two voxels
m and n:

dΔ(Δm, Δn) = (1/7) Σ_{j=1}^{7} d(Δm,j, Δn,j), (5)

where d(·, ·) is the log-Euclidean tensor distance [2]:

d(Δm,j, Δn,j) = √(Trace({log(Δm,j) − log(Δn,j)}²)). (6)

For each m we find its k nearest neighbors in terms of the diffusion dissimilarity
in Eq. (5), and define them as the nonlocal reference voxels. k is a parameter to
be specified by users. Note that although we call these reference voxels nonlocal,
it is possible that Rm contains the neighbors of m as well, if they have very
similar diffusion profiles to that of m. We used the implementation of k nearest
neighbors in the scikit-learn toolkit1 based on a ball tree search algorithm.
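For diagonal tensors the log-Euclidean distance of Eq. (6) has a particularly simple form: log(Δ) takes the log of each eigenvalue. The sketch below (pure Python, restricted to diagonal SPD tensors; a general implementation would need a matrix eigendecomposition) computes the patch dissimilarity of Eq. (5) and selects the k nearest reference voxels by sorting rather than with a ball tree; all tensor values are hypothetical.

```python
import math

def le_distance_diag(d1, d2):
    """Log-Euclidean distance between diagonal SPD tensors (eigenvalue triples)."""
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(d1, d2)))

def patch_dissimilarity(patch_m, patch_n):
    """Eq. (5): average tensor distance over the 7 patch positions."""
    return sum(le_distance_diag(a, b) for a, b in zip(patch_m, patch_n)) / 7.0

def k_nearest_reference(patch_m, candidate_patches, k):
    """Indices of the k candidates with smallest patch dissimilarity to m."""
    order = sorted(range(len(candidate_patches)),
                   key=lambda i: patch_dissimilarity(patch_m, candidate_patches[i]))
    return order[:k]

# Hypothetical diagonal-tensor patches (7 tensors each, eigenvalues in mm^2/s).
base = [(1.7e-3, 3e-4, 3e-4)] * 7
similar = [(1.6e-3, 3e-4, 3e-4)] * 7
different = [(7e-4, 7e-4, 7e-4)] * 7
nearest = k_nearest_reference(base, [different, similar, base], k=2)
```

Sorting is O(|Sm| log |Sm|) per voxel; the ball tree used by scikit-learn amortizes this across queries, which matters at the scale of a whole brain volume.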

Guided FO Estimation. We seek to guide FO estimation using the local neighbors
and nonlocal reference voxels. Like [18], we use a 26-connected neighborhood
Nm of m. Then, the set of voxels guiding FO estimation at m is Gm = Nm ∪ Rm.
Using $G_m$, we extract a set of likely FOs for $m$ to determine the weighting of basis directions and guide FO estimation. First, a voxel similarity between $m$ and each voxel $n \in G_m$ is defined as

$$w(m, n) = \begin{cases} \exp\{-\mu d^2(D_m, D_n)\}, & \text{if } n \in N_m\\ \exp\{-\mu d_{\Delta}^2(\Delta_m, \Delta_n)\}, & \text{otherwise,} \end{cases} \qquad(7)$$
where μ = 3.0 is a constant [18], and Dm and Dn are the diffusion tensors at m
and n, respectively. When n is a neighbor of m, the voxel similarity is exactly
1 http://scikit-learn.org/stable/modules/neighbors.html.
Fiber Orientation Estimation Using Nonlocal and Local Information 101

the one defined in [18]; when n is not adjacent to m, the voxel similarity is
defined using the patches $\Delta_m$ and $\Delta_n$. Second, suppose the FOs at a voxel $n$ are $\{\mathbf{w}_{n,j}\}_{j=1}^{W_n}$, where $W_n$ is the number of FOs at $n$. For each $m$ we can compute the similarity between the basis direction $\mathbf{v}_i$ and the FO configurations of the voxels in the guiding set $G_m$:

$$R_m(i) = \sum_{n \in G_m} w(m, n) \max_{j=1,2,\ldots,W_n} |\mathbf{v}_i \cdot \mathbf{w}_{n,j}|, \quad i = 1, 2, \ldots, N. \qquad(8)$$

When vi is aligned with the FOs in many voxels in the guiding set Gm and
these voxels are similar to m, large Rm (i) is observed, indicating that vi is
likely to be an FO. Note that Rm (i) is similar to the aggregate basis-neighbor
similarity defined in [18]. Here we have replaced the neighborhood Nm in [18]
with the guiding set Gm containing both local and nonlocal information. These
Rm (i) can then be plotted on the unit sphere according to their associated basis
directions, and the basis directions with local maximal $R_m(i)$ are determined as
the likely FOs $\mathcal{U}_m = \{\mathbf{u}_{m,p}\}_{p=1}^{U_m}$ ($U_m$ is the cardinality of $\mathcal{U}_m$) at $m$ [18].

With the likely FOs $\mathcal{U}_m$, the diagonal entries of $C_m$ are specified as [18]

$$C_{m,i} = \frac{1 - \alpha \max\limits_{p=1,2,\ldots,U_m} |\mathbf{v}_i \cdot \mathbf{u}_{m,p}|}{\min\limits_{q=1,2,\ldots,N}\left(1 - \alpha \max\limits_{p=1,2,\ldots,U_m} |\mathbf{v}_q \cdot \mathbf{u}_{m,p}|\right)}, \quad i = 1, 2, \ldots, N, \qquad(9)$$

where α is a constant controlling the influence of guiding voxels. Smaller weights
are associated with basis directions closer to likely FOs, and these directions are
encouraged. In this work, we set α = 0.8 as suggested by [18].
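The weighting steps of Eqs. (7)–(9) can be sketched in numpy as follows (a hedged sketch; the array shapes and variable names are assumptions of this example):

```python
import numpy as np

def basis_similarity(V, w_mn, fo_sets):
    """Eq. (8): R_m(i) = sum_n w(m,n) * max_j |v_i . w_{n,j}|.
    V: (N, 3) basis directions; w_mn: one weight w(m,n) per guiding
    voxel; fo_sets: list of (W_n, 3) FO arrays, one per guiding voxel."""
    R = np.zeros(len(V))
    for w, fos in zip(w_mn, fo_sets):
        R += w * np.max(np.abs(V @ fos.T), axis=1)
    return R

def weight_matrix_diag(V, likely_fos, alpha=0.8):
    """Eq. (9): diagonal entries of C_m given the likely FOs U_m.
    Basis directions close to a likely FO receive small weights."""
    num = 1.0 - alpha * np.max(np.abs(V @ likely_fos.T), axis=1)
    return num / num.min()
```

With a single likely FO aligned with the first basis direction, that direction receives the minimal weight 1 and all others are penalized more heavily.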
We estimate FOs in all voxels by minimizing the following objective function with weighted ℓ1-norm regularization:

$$E(\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_M) = \sum_{m=1}^{M}\left(\|G\mathbf{f}_m - \mathbf{y}_m\|_2^2 + \frac{\beta}{W_m}\|C_m \mathbf{f}_m\|_1\right), \qquad(10)$$

where $\mathbf{f}_m \geq 0$ and β is a constant. Note that we assign smaller weights to the
weighted ℓ1-norm when the number of FOs is larger, which in practice increases
accuracy. In this work, we set β = 0.3, which is smaller than the value used in [18]
because the number of gradient directions in the dMRI data is smaller than
that in [18]. Because $C_m$ is a function of the unknown FOs, to solve Eq. (10) we
iteratively update each $\mathbf{f}_m$ in turn. At iteration t, for each $\mathbf{f}_m$ we have
$$\hat{\mathbf{f}}_m^t = \arg\min_{\mathbf{f}_m \geq 0} E(\hat{\mathbf{f}}_1^t, \ldots, \hat{\mathbf{f}}_{m-1}^t, \mathbf{f}_m, \hat{\mathbf{f}}_{m+1}^{t-1}, \ldots, \hat{\mathbf{f}}_M^{t-1}) = \arg\min_{\mathbf{f}_m \geq 0} \|G\mathbf{f}_m - \mathbf{y}_m\|_2^2 + \frac{\beta}{W_m^{t-1}}\|C_m^t \mathbf{f}_m\|_1, \qquad(11)$$

which is a weighted Lasso problem that can be solved using the strategy in [17].
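The paper solves this with the strategy in [17], which is not reproduced here; as a generic, hedged alternative, the same weighted non-negative Lasso can be sketched with projected ISTA, exploiting that the weighted ℓ1 term is linear on the non-negative orthant:

```python
import numpy as np

def weighted_nonneg_lasso(G, y, c, lam, n_iter=500):
    """Solve min_{f >= 0} ||G f - y||_2^2 + lam * ||diag(c) f||_1.
    For f >= 0 the weighted l1 term equals lam * c . f, so each
    proximal step is a gradient step shifted by lam*c, followed by
    projection onto the non-negative orthant. A generic sketch,
    not the solver of [17]."""
    L = 2 * np.linalg.norm(G, 2) ** 2      # Lipschitz constant of the smooth part
    f = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = 2 * G.T @ (G @ f - y)
        f = np.maximum(f - (grad + lam * c) / L, 0.0)
    return f
```

For an orthonormal dictionary the fixed point is simply the data with the penalty subtracted and clipped at zero, which the test below verifies.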
102 C. Ye

Fig. 1. 3D rendering of the digital phantom.

3 Results
3.1 3D Digital Crossing Phantom
A 3D digital phantom (see Fig. 1) with the same tract geometries and diffusion
properties as used in [18] was created to simulate five tracts. Thirty gradient
directions (b = 1000 s/mm2 ) were used to simulate the diffusion weighted images
(DWIs). Rician noise was added to the DWIs. The signal-to-noise ratio (SNR)
is 20 on the b0 image.
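Rician noise at a given SNR can be simulated as below (a hedged sketch; the unit-b0 scaling `s0` and the two-channel Gaussian construction are assumptions of this example):

```python
import numpy as np

def add_rician_noise(signal, snr, s0=1.0, rng=None):
    """Add Rician noise: the noisy magnitude is
    sqrt((S + n1)^2 + n2^2) with n1, n2 Gaussian of standard
    deviation s0/snr, so the SNR is defined on the b0 image."""
    rng = np.random.default_rng(rng)
    sigma = s0 / snr
    n1 = rng.normal(0.0, sigma, signal.shape)
    n2 = rng.normal(0.0, sigma, signal.shape)
    return np.sqrt((signal + n1) ** 2 + n2 ** 2)
```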
FORNLI with k = 4 was applied on the phantom and compared with
CSD [16], CFARI [9], and FORNI [18] using the FO error proposed in [18].
CSD and CFARI are voxelwise FO estimation methods, and FORNI incorpo-
rates neighbor information for FO estimation. We used the CSD implementation
in the Dipy software2 , and implemented CFARI and FORNI using the parame-
ters reported in [9,18], respectively. The errors over the entire phantom and in
the regions with noncrossing or crossing tracts are plotted in Fig. 2(a), where
FORNLI achieves the most accurate result. In addition, we compared the two
best algorithms here, FORNI and FORNLI, using a paired Student’s t-test. In

Fig. 2. FO estimation errors. (a) Means and standard deviations of the FO errors
of CSD, CFARI, FORNI, and FORNLI; (b) mean FORNLI FO errors with different
numbers of nonlocal reference voxels in regions with noncrossing or crossing tracts.
2 http://nipy.org/dipy/examples_built/reconst_csd.html.

all four cases, errors of FORNLI are significantly smaller than those of FORNI
(p < 0.05), and the effect sizes (Cohen’s d) are between 0.5 and 0.6.
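The statistical comparison can be reproduced with a few lines of numpy (a sketch; `a` and `b` stand for paired per-case FO errors of FORNI and FORNLI, which are assumptions of this example):

```python
import numpy as np

def paired_t_and_cohens_d(a, b):
    """Paired Student's t statistic and Cohen's d computed on the
    differences of two error arrays measured on the same cases.
    Note t = d * sqrt(n) for paired samples."""
    d = a - b
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    cohen = d.mean() / d.std(ddof=1)
    return t, cohen
```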
Next, we studied the impact of the number of nonlocal reference voxels. Using
different k, the errors in regions with noncrossing or crossing tracts are shown in
Fig. 2(b). Note that k = 0 represents the case where only the local information from
neighbors is used. Incorporation of nonlocal information improves the estimation
quality, especially in the more complex regions with three crossing tracts. When
k reaches four, the estimation accuracy becomes stable, so we will use k = 4 for
the brain dMRI dataset.

3.2 Brain dMRI


We selected a random subject from the publicly available COBRE dataset [7].
The DWIs and b0 images were acquired on a 3T Siemens Trio scanner, where 30
gradient directions (b = 800 s/mm2 ) were used. The resolution is 2 mm isotropic.
The SNR is about 20 on the b0 image.
To evaluate FORNLI (with k = 4) and compare it with CSD, CFARI, and
FORNI, we demonstrate the results in a region containing the crossing of the
corpus callosum (CC) and the superior longitudinal fasciculus (SLF) in Fig. 3.
We have also shown the results of FORNLI with k = 0, where no nonlocal information
is used. By enforcing spatial consistency of FOs, FORNI and FORNLI
improve the estimation of crossing FOs. In addition, in the orange box FORNLI
(k = 4) achieves more consistent FO configurations than FORNI; and in the
blue box, compared with FORNI and FORNLI (k = 0), FORNLI (k = 4) avoids
the FO configurations in the upper-right voxels that seem to contradict the
adjacent voxels by having sharp turning angles.

Fig. 3. FO estimation in the crossing regions of SLF and CC overlaid on the fractional
anisotropy map. Note the highlighted region for comparison.

4 Conclusion
We have presented an FO estimation algorithm FORNLI which is guided by
both local and nonlocal information. Results on simulated and real brain dMRI
data demonstrate the benefit of the incorporation of nonlocal information for
FO estimation.

References
1. Aranda, R., Ramirez-Manzanares, A., Rivera, M.: Sparse and adaptive diffusion
dictionary (SADD) for recovering intra-voxel white matter structure. Med. Image
Anal. 26(1), 243–255 (2015)
2. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Log-Euclidean metrics for fast and
simple calculus on diffusion tensors. Magn. Reson. Med. 56(2), 411–421 (2006)
3. Asman, A.J., Landman, B.A.: Non-local statistical label fusion for multi-atlas seg-
mentation. Med. Image Anal. 17(2), 194–208 (2013)
4. Aurı́a, A., Daducci, A., Thiran, J.P., Wiaux, Y.: Structured sparsity for spatially
coherent fibre orientation estimation in diffusion MRI. NeuroImage 115, 245–255
(2015)
5. Basser, P.J., Pajevic, S., Pierpaoli, C., Duda, J., Aldroubi, A.: In vivo fiber trac-
tography using DT-MRI data. Magn. Reson. Med. 44(4), 625–632 (2000)
6. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In:
IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
vol. 2, pp. 60–65. IEEE (2005)
7. Cetin, M.S., Christensen, F., Abbott, C.C., Stephen, J.M., Mayer, A.R.,
Cañive, J.M., Bustillo, J.R., Pearlson, G.D., Calhoun, V.D.: Thalamus and
posterior temporal lobe show greater inter-network connectivity at rest and across
sensory paradigms in schizophrenia. NeuroImage 97, 117–126 (2014)
8. Daducci, A., Van De Ville, D., Thiran, J.P., Wiaux, Y.: Sparse regularization for
fiber ODF reconstruction: from the suboptimality of ℓ2 and ℓ1 priors to ℓ0. Med.
Image Anal. 18(6), 820–833 (2014)
9. Landman, B.A., Bogovic, J.A., Wan, H., ElShahaby, F.E.Z., Bazin, P.L.,
Prince, J.L.: Resolution of crossing fibers with constrained compressed sensing
using diffusion tensor MRI. NeuroImage 59(3), 2175–2186 (2012)
10. Merlet, S.L., Deriche, R.: Continuous diffusion signal, EAP and ODF estimation
via compressive sensing in diffusion MRI. Med. Image Anal. 17(5), 556–572 (2013)
11. Michailovich, O., Rathi, Y., Dolui, S.: Spatially regularized compressed sensing
for high angular resolution diffusion imaging. IEEE Trans. Med. Imaging 30(5),
1100–1115 (2011)
12. Pasternak, O., Assaf, Y., Intrator, N., Sochen, N.: Variational multiple-tensor
fitting of fiber-ambiguous diffusion-weighted magnetic resonance imaging voxels.
Magn. Reson. Imaging 26(8), 1133–1144 (2008)
13. Ramirez-Manzanares, A., Rivera, M., Vemuri, B.C., Carney, P., Mareci, T.: Dif-
fusion basis functions decomposition for estimating white matter intravoxel fiber
geometry. IEEE Trans. Med. Imaging 26(8), 1091–1102 (2007)
14. Rathi, Y., Michailovich, O., Laun, F., Setsompop, K., Grant, P.E., Westin, C.F.:
Multi-shell diffusion signal recovery from sparse measurements. Med. Image Anal.
18(7), 1143–1156 (2014)
15. Reisert, M., Kiselev, V.G.: Fiber continuity: an anisotropic prior for ODF estima-
tion. IEEE Trans. Med. Imaging 30(6), 1274–1283 (2011)
16. Tournier, J.D., Calamante, F., Connelly, A.: Robust determination of the fibre ori-
entation distribution in diffusion MRI: non-negativity constrained super-resolved
spherical deconvolution. NeuroImage 35(4), 1459–1472 (2007)
17. Ye, C., Murano, E., Stone, M., Prince, J.L.: A Bayesian approach to distinguishing
interdigitated tongue muscles from limited diffusion magnetic resonance imaging.
Comput. Med. Imaging Graph. 45, 63–74 (2015)

18. Ye, C., Zhuo, J., Gullapalli, R.P., Prince, J.L.: Estimation of fiber orientations
using neighborhood information. Med. Image Anal. 32, 243–256 (2016)
19. Yushkevich, P.A., Zhang, H., Simon, T.J., Gee, J.C.: Structure-specific statistical
mapping of white matter tracts. NeuroImage 41(2), 448–461 (2008)
Reveal Consistent Spatial-Temporal Patterns
from Dynamic Functional Connectivity
for Autism Spectrum Disorder Identification

Yingying Zhu1, Xiaofeng Zhu1, Han Zhang1, Wei Gao2,


Dinggang Shen1, and Guorong Wu1(&)
1
Department of Radiology and BRIC,
University of North Carolina at Chapel Hill, Chapel Hill, USA
grwu@med.unc.edu
2
Biomedical Imaging Research Institute,
Department of Biomedical Sciences and Imaging,
Cedars-Sinai Medical Center, Los Angeles, USA

Abstract. Functional magnetic resonance imaging (fMRI) provides a non-invasive way to investigate brain activity. Recently, convergent evidence shows
that the correlations of spontaneous fluctuations between two distinct brain
regions dynamically change even in resting state, due to the condition-
dependent nature of brain activity. Thus, quantifying the patterns of functional
connectivity (FC) in a short time period and changes of FC over time can
potentially provide valuable insight into both individual-based diagnosis and
group comparison. In light of this, we propose a novel computational method to
robustly estimate both static and dynamic spatial-temporal connectivity patterns
from the observed noisy signals of individual subjects. We achieve this goal in
two steps: (1) Construct static functional connectivity across brain regions. Due
to low signal-to-noise ratio induced by possible non-neural noise, the estimated
FC strength is very sensitive and it is hard to define a good threshold to distinguish
between real and spurious connections. To alleviate this issue, we
propose to optimize FC which is in consensus with not only the low level
region-to-region signal correlations but also the similarity of high level principal
connection patterns learned from the estimated link-to-link connections. Since
brain network is intrinsically sparse, we also encourage sparsity during FC
optimization. (2) Characterize dynamic functional connectivity along time. It is
hard to synchronize the estimated dynamic FC patterns and the real cognitive
state changes, even using learning-based methods. To address these limitations,
we further extend above FC optimization method into the spatial-temporal
domain by arranging the FC estimations along a set of overlapped sliding
windows into a tensor structure as the window slides. Then we employ low rank
constraint in the temporal domain, assuming there are likely a small number of
discrete states that the brain traverses during a short period of time. We
applied the learned spatial-temporal patterns from fMRI images to identify
autism subjects. Promising classification results have been achieved, suggesting
high discrimination power and great potentials in computer assisted diagnosis.

© Springer International Publishing AG 2016


S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 106–114, 2016.
DOI: 10.1007/978-3-319-46720-7_13
Reveal Consistent Spatial-Temporal Patterns from Dynamic FC 107

1 Introduction

In general, resting-state functional connectivity is a set of pair-wise connectivity


measurements, each of which describes the strength of co-activity between two regions
in human brain. In many group comparison studies, FC obtained from resting-state
fMRI shows observable abnormal patterns in patient cohort to understand different
disease mechanisms. In clinical practice, FC is regarded as an important biomarker for
disease diagnosis and monitoring in various clinical applications such as Alzheimer’s
disease [1] and Autism [2].
In current functional brain network studies, Pearson’s correlation on BOLD (Blood
Oxygen Level Dependent) signals is widely used to measure the strength of FC
between two brain regions [2, 3]. It is worth noting that such correlation-based
connectivity measure is exclusively calculated based on the observed BOLD signals and
fixed for the subsequent data analysis. However, the BOLD signal usually has very
poor signal-to-noise ratio and is mixed with substantial non-neural noise and artefacts.
Therefore, it is hard for current state-of-the-art methods to determine a good threshold
of FC measure which can effectively distinguish real and spurious connections.
For simplicity, many FC characterization methods assume that connectivity patterns
in the brain do not change over the course of a resting-state fMRI scan. There is a
growing consensus in the neuroimaging field, however, that the spontaneous fluctuations
and correlations of signals between two distinct brain regions change with correspondence
to cognitive states, even in a task-free environment [4]. Thus, dynamic FC
patterns have been investigated recently, mainly using the sliding window technique
patterns have been investigated recently by mainly using sliding window technique
[4, 11–13]. However, it is very difficult to synchronize the estimated dynamic patterns
with the real fluctuations of cognitive state, even using advanced machine learning
techniques such as clustering [5] and hidden Markov model [6]. For example, both

Fig. 1. The advantage of our learning-based spatial-temporal FC optimization method (bottom)
over the conventional method (top), which calculates the FC based on signal correlations. As
shown by the trajectory of FC at the Amygdala on the right, the dynamic FC optimized by our
learning-based method is more reasonable than that of the conventional correlation-based method.
108 Y. Zhu et al.

methods have to determine the number of states (clusters), which might work well on
the training data but may not generalize to unseen testing subjects.
To address above issues, we propose a novel data-driven solution to reveal the
consistent spatial-temporal FC patterns from resting-state fMRI images. Our contribution is
twofold. First, we present a robust learning-based method to optimize FC from the
BOLD signals in a fixed sliding window. In order to avoid the unreliable calculation of
FC based on signal correlations, high level feature representation is of necessity to
guide the optimization of FC. Specifically, we apply singular value decomposition
(SVD) to the tentatively estimated FC matrix and regard the top-ranked eigenvectors
as the high-level network features which characterize the principal connection
patterns across all brain regions. Thus, we can optimize functional connections for each
brain region based on not only the observed region-to-region signal correlations but
also the similarity between high level principal connection patterns. In turn, the refined
FC can lead to more reasonable estimation of principal connection patterns. Since brain
network is intrinsically economic and sparse, a sparsity constraint is used to control the
number of connections during the joint estimation of the principal connection patterns and
the optimization of FC. Second, we further extend the above FC optimization framework
from one sliding window (capturing the static FC patterns) to a set of overlapped
sliding windows (capturing the dynamic FC patterns), as shown in the middle of Fig. 1.
The leverage is that we arrange the FCs along time into a tensor structure (pink cubic in
Fig. 1) and we employ additional low rank constraint to penalize the oscillatory
changes of FC in the temporal domain.
In this paper, we apply our learning-based method to find the spatial-temporal
functional connectivity patterns for identifying childhood autism spectrum disorders
(ASD). Compared with conventional approaches which simply calculate FC based on
signal correlations, more accurate classification results have been achieved in classifying
normal control (NC) and ASD subjects by using our learned spatial-temporal FC
patterns.

2 Method
2.1 Construct Robust Functional Connectivity

Let $\mathbf{x}_i \in \mathbb{R}^{W \times 1}$ denote the mean BOLD signal calculated in brain region $O_i$ ($i = 1, \ldots, N$), where $W$ is the length of the time course within the sliding window. Conventionally, an $N \times N$ connectivity matrix $S$ is used to measure the FCs in the whole brain, where each element $s_{ij}$ quantitatively measures the strength of FC between regions $O_i$ and $O_j$ ($i \neq j$). For convenience, we use $\mathbf{s}_i \in \mathbb{R}^{N \times 1}$ to denote the $i$-th column of the connectivity matrix $S$, which characterizes the connections w.r.t. the other brain regions. Since the signal-to-noise ratio of the observed $\mathbf{x}_i$ is low, high-level features are needed to guide the estimation of the connectivity matrix $S$. To achieve this, we apply singular value decomposition to $S$ and regard the matrix of top-ranked eigenvectors $F_{K \times N} = [\mathbf{f}_i]_{i=1,\ldots,N}$ as the high-level network features, where each $\mathbf{f}_i \in \mathbb{R}^{K \times 1}$ denotes the principal connection pattern of region $O_i$. Thus, instead of calculating the connectivity $s_{ij}$ based on the correlation $c(\mathbf{x}_i, \mathbf{x}_j)$ between the observed BOLD signals $\mathbf{x}_i$ and $\mathbf{x}_j$, we require that the optimal

connectivity $s_{ij}$ should (1) be in consensus with the correlation of the low-level signals between $\mathbf{x}_i$ and $\mathbf{x}_j$; and (2) be in line with the similarity of the high-level principal connection patterns between $\mathbf{f}_i$ and $\mathbf{f}_j$. To that end, the objective function is defined as:

$$\arg\min_{s_{ij}, \mathbf{f}_i} \sum_{i=1}^{N}\left[\sum_{j=1}^{N}\left(\big(1 - c(\mathbf{x}_i, \mathbf{x}_j)\big)^2 s_{ij} + \|\mathbf{f}_i - \mathbf{f}_j\|_2^2\, s_{ij}\right) + r_1\|\mathbf{s}_i\|_1 + r_2\|\mathbf{s}_i\|_2^2\right] \quad \text{s.t.}\ \forall i,\ \mathbf{s}_i > 0, \qquad(1)$$

where $r_1$ is a scalar controlling the strength of connection sparsity for each connection pattern $\mathbf{s}_i$. For robustness, an L2 norm is also applied to $\mathbf{s}_i$. Since the estimations of $s_{ij}$ and $\mathbf{f}_i$ are coupled, we propose the following procedure to alternately solve for $s_{ij}$ and $\mathbf{f}_i$:
(1) Initialize the connectivity matrix by letting $s_{ij} = c(\mathbf{x}_i, \mathbf{x}_j)$;
(2) Given $S$, obtain the principal connection pattern $\mathbf{f}_i$ for each region $O_i$ by applying eigenvalue decomposition to $S$, which is possible since $S$ is symmetric. After that, we select the top $K$ eigenvectors.
(3) Fixing $\mathbf{f}_i$, we divide the estimation of $s_{ij}$ in Eq. (1) into two sub-tasks: (a) Estimate $s_{ij}$ without the sparsity constraint. Since the objective function without the $L_1$ norm can be reformulated into a quadratic form, we can use the Karush-Kuhn-Tucker (KKT) conditions [7] to optimize $s_{ij}$. (b) Make the connection pattern $\mathbf{s}_i$ sparse. The objective function requires the optimized connection pattern $\mathbf{s}_i$ to be not only sparse but also close to the solution of step 3(a). The standard Alternating Direction Method of Multipliers (ADMM) [7, 8, 14] can be used to solve this sub-task.
(4) Go to step 2 until convergence.
A typical optimized connectivity matrix $\hat{S}$ is shown in the pink cube in Fig. 1.
Compared to the connectivity matrix obtained by the conventional method based on signal
correlation, our learned connectivity matrix is much sparser, and it becomes much easier
to construct the brain network since many spurious connections have been removed
by the sparsity constraint during optimization.
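Steps (1) and (2) of the alternating procedure can be sketched in numpy as follows (an illustrative sketch; array shapes and the top-K selection are assumptions, and the KKT/ADMM updates of step (3) are omitted):

```python
import numpy as np

def initial_connectivity(X):
    """Step (1): initialise s_ij with the Pearson correlation of the
    mean BOLD signals; X holds one region per row."""
    return np.corrcoef(X)

def principal_connection_patterns(S, K):
    """Step (2): eigendecomposition of the symmetric connectivity
    matrix S; the top-K eigenvectors give the high-level network
    features F (one K-dimensional pattern f_i per region)."""
    w, V = np.linalg.eigh(S)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:K]         # indices of the K largest
    return V[:, idx].T                    # K x N: column i is f_i
```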

2.2 Characterize Dynamic Functional Connectivity


Next, we extend our learning based FC estimation method to address the problem of
dynamic connectivity in fMRI data. Here, we follow the sliding window technique to
obtain T overlapped sliding windows which cover the whole time course. Since we can
optimize the connectivity matrix St for each sliding window, we employ a tensor
S ¼ fSt jt ¼ 1; . . .; Tg 2 <NNT to describe the dynamic connectivity. Similarly, we
construct tensor C ¼ fCt jt ¼ 1; . . .; Tg 2 <NNT regarding the dynamic correlation
of BOLD signals and tensor F ¼ fFt jt ¼ 1; . . .; Tg 2 <NNT regarding
h i the dynamic
similarity of principal connection patterns, where Ct ¼ ctij ðctij ¼ 1 
i;j¼1;...;N
     2
 
c xti ; xtj Þ and Ft ¼ f ij i;j¼1;...;N ðf tij ¼ f ti  f tj  Þ are the N  N matrices in t-th sliding
2

window. Then, we extend the objective function in Eq. (1) to the spatial-temporal domain using tensor analysis:

$$\arg\min_{\mathcal{S}}\ \mathcal{C} \otimes \mathcal{S} + \mathcal{F} \otimes \mathcal{S} + \alpha\|\mathcal{S}_{(1)}\|_* + r_1\|\mathcal{S}_{(2)}\|_1 + r_2\|\mathcal{S}_{(2)}\|_F^2 \quad \text{s.t.}\ \forall i, t,\ \mathbf{s}_i^t > 0, \qquad(2)$$
where $\mathcal{C} \otimes \mathcal{S} = \sum_{t=1}^{T}(C^t)^T S^t$ and $\mathcal{F} \otimes \mathcal{S} = \sum_{t=1}^{T}(F^t)^T S^t$. We use $\mathcal{S}_{(k)}$ to denote the unfolding of the tensor $\mathcal{S}$ along the $k$-th mode. In our method, we have $\mathcal{S}_{(1)} \in \mathbb{R}^{N^2 \times T}$ and $\mathcal{S}_{(2)} \in \mathbb{R}^{NT \times N}$. Since the brain in resting state generally traverses a small number of discrete stages during a short period of time [4], we require the change of the connectivity matrix $S^t$ to be smooth along time. Thus, it is reasonable to apply a low-rank constraint on $\mathcal{S}_{(1)}$ such that minimizing $\|\mathcal{S}_{(1)}\|_*$ (the nuclear norm of $\mathcal{S}_{(1)}$) suppresses too-rapid FC changes in the temporal domain. An $L_1$-norm is applied to $\mathcal{S}_{(2)}$ since the brain network within each sliding window is sparse.
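The two unfoldings can be illustrated with numpy reshapes (a sketch; the exact element ordering within each unfolding is an assumption of this example, since any consistent ordering leaves the norms involved unchanged):

```python
import numpy as np

def unfold_mode1(S):
    """S_(1): each N x N slice S^t flattened into one column,
    giving an N^2 x T matrix (argument of the nuclear-norm term)."""
    N, _, T = S.shape
    return S.reshape(N * N, T)

def unfold_mode2(S):
    """S_(2): the T connectivity matrices stacked vertically,
    giving an NT x N matrix (argument of the sparsity term)."""
    N, _, T = S.shape
    return np.concatenate([S[:, :, t] for t in range(T)], axis=0)
```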
Optimization. In order to make the optimization of Eq. (2) tractable, we introduce two dummy variables $Z_1$ and $Z_2$ so that we can solve this problem using ADMM [7, 8]:

$$\arg\min_{\mathcal{S}, Z_1, Z_2} \sum_{t=1}^{T}\left[(C^t)^T S^t + (F^t)^T S^t + r_2\|S^t\|_F^2\right] + \alpha\|Z_1\|_* + r_1\|Z_2\|_1 \quad \text{s.t.}\ \forall i, t,\ \mathbf{s}_i^t > 0,\ \mathcal{S}_{(1)} = Z_1,\ \mathcal{S}_{(2)} = Z_2. \qquad(3)$$

Using Lagrangian multipliers, we can remove the equality constraints in Eq. (3) and reformulate it into:

$$\arg\min_{\mathcal{S}, Z_1, Z_2} \sum_{t=1}^{T}\left[(C^t)^T S^t + (F^t)^T S^t + r_2\|S^t\|_F^2\right] + \alpha\|Z_1\|_* + r_1\|Z_2\|_1 + \frac{\mu_1}{2}\|\mathcal{S}_{(1)} - Z_1\|_F^2 + \Lambda_1^T(\mathcal{S}_{(1)} - Z_1) + \frac{\mu_2}{2}\|\mathcal{S}_{(2)} - Z_2\|_F^2 + \Lambda_2^T(\mathcal{S}_{(2)} - Z_2), \qquad(4)$$

where $\Lambda_1$ and $\Lambda_2$ are the Lagrangian multiplier matrices (matching the sizes of $\mathcal{S}_{(1)}$ and $\mathcal{S}_{(2)}$, respectively), and $\mu_1$ and $\mu_2$ are the penalty parameters. Furthermore, we solve Eq. (4) by alternately optimizing $\mathcal{S}$, $Z_1$, and $Z_2$ until Eq. (4) converges. The dynamic connectivity matrices $S^t$ can be optimized by following the Karush-Kuhn-Tucker (KKT) method in [9]. The standard soft-threshold shrinkage method [7] can be used to solve for $Z_1$ and $Z_2$.
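Both dummy-variable updates rely on standard proximal operators, sketched below (generic textbook forms, not the authors' implementation):

```python
import numpy as np

def soft_threshold(X, tau):
    """Shrinkage operator: proximal map of tau * ||.||_1
    (used for the Z_2 update)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Proximal map of tau * ||.||_* (nuclear norm): soft-threshold
    the singular values (used for the Z_1 update)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```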

2.3 Identifying ASD Subjects with the Learned Static/Dynamic FC Patterns
The conventional method first calculates the connectivity matrix S within a certain window and
then extracts N-dimensional node-wise features which describe the connectivity efficiency
at each ROI (Region of Interest) [10]. After that, a classic SVM (linear kernel and L2
penalty, https://www.csie.ntu.edu.tw/~cjlin/libsvm/) is trained to identify individual
ASD subjects. We follow the same approach, except that we extract the node-wise features
from our learned static/dynamic connectivity matrices.
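As a hedged illustration of node-wise feature extraction (the paper uses connectivity-efficiency measures from [10]; node strength is substituted here as the simplest node-wise descriptor):

```python
import numpy as np

def node_strength_features(S):
    """One N-dimensional feature vector per connectivity matrix:
    node-wise strength, i.e. row sums with the diagonal excluded.
    Illustrative only; [10] defines richer efficiency measures."""
    A = S - np.diag(np.diag(S))
    return A.sum(axis=1)
```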

3 Experiment

Image Preprocessing. We conducted various experiments on resting-state fMRI


images from both NYU and UM sites in Autism Brain Imaging Data Exchange
(ABIDE) database, in order to demonstrate the generality of our method. Specifically,
45 NC and 45 ASD subjects are selected from the NYU site. 74 NC and 57 ASD
subjects are selected from UM site. The subjects from NYU site and UM site were
scanned for six and ten minutes during resting state, respectively, producing 180 time
points and 300 time points at a repetition time (TR) of 2 s. We processed all these data
using the Data Processing Assistant for Resting-State fMRI (DPARSF) software. Specifically,
we removed the first 20 and last 20 time points for robustness. After that,
we performed slice timing and motion correction on the fMRI images. Then, we
registered individual subjects to the standard space, applied the AAL template with 116
ROIs to the subject image domain, and computed the mean BOLD signal in each ROI,
where the conventional method calculates the 116 × 116 connectivity matrix S based on the
correlation of mean BOLD signals between any pair of distinct brain regions.
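The per-window correlation matrices that initialize the tensor can be computed as follows (a sketch; window length and the 1-TR shift are illustrative parameters):

```python
import numpy as np

def sliding_window_fc(X, win, step=1):
    """Sliding-window functional connectivity: X is N regions x L
    time points; returns an N x N x T tensor of Pearson correlation
    matrices, one per window, with the window shifted by `step`
    time points (1 TR here)."""
    N, L = X.shape
    starts = range(0, L - win + 1, step)
    return np.stack([np.corrcoef(X[:, s:s + win]) for s in starts], axis=2)
```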
Experiment Setup. Ten-fold cross validation strategy is used in our experiments. We
randomly partition the subjects in the NC and ASD groups into 10 non-overlapping
approximately equal-size sets. For each subject, we apply our learning-based method to
optimize the static/dynamic functional connectivity matrices and extract the node-wise
network features. Then, we use one fold for testing and the remaining folds for
training the SVM. The training subjects are further divided into 5 subsets for another
5-fold inner cross-validation to learn the optimal parameters. The optimal number of
principal components lies in the range [6, 8], and it takes 10 s on average to process one subject.
Evaluation of Learned Static/Dynamic FC Patterns in NC/ASD Classification. We
first manually set up the sliding window size which ranges from 20 % to 100 % of the
entire time course. In optimizing the dynamic FC pattern, we set the shift of sliding
window to 1 TR, in order to fully capture the dynamics of FC. The NC/ASD classification
results on the UM and NYU datasets are shown in Tables 1 and 2, respectively.
In these two tables, ACC represents accuracy and AUC represents the area under the
ROC curve. It is clear that the SVM using the learned dynamic FC patterns outperforms
both the conventional correlation-based FC patterns and the learned static FC patterns:
the learned dynamic FC patterns achieve an increase of almost 8 % over the
correlation-based FC patterns and almost 3 % over the learned static FC patterns.
Visualization of Dynamic Functional Connectivity Patterns. The learned dynamic
functional connectivity matrices $\{S^t\}$ for one ASD subject (in the blue box) and one NC
subject (in the red box) are displayed in the top of Fig. 2. For comparison, we also show
the corresponding connectivity matrices independently calculated based on the correlation
of BOLD signals in the bottom of Fig. 2. It is apparent that our learned dynamic
connectivity matrices are much sparser than the counterpart matrices from the conventional
method. Thus, it becomes much easier to construct the functional brain network
using a threshold-based approach. In order to evaluate the dynamic functional transitions, we
visualize the ROIs connected to the Amygdala along time by examining the estimated

connectivity matrices, since the Amygdala is a critical sub-cortical structure related to
autism. As the red masks shown in Fig. 2 indicate, the transitions of ROIs connected to
the Amygdala revealed by our learned dynamic FC patterns are much more consistent than
those revealed by the conventional correlation-based FC patterns, where the transitions of
connected ROIs are unrealistically fast and random. Specifically, we transpose the connection
vector $\mathbf{s}_i$ (i denotes the index of the Amygdala here) into a column vector, sequentially
arrange these vectors into a matrix, and visualize the transition of $\mathbf{s}_i$ along time in the
right of Fig. 1. From the trajectory of each element in $\mathbf{s}_i$, it is clear that the dynamic FC
optimized by our learning-based method is more reasonable than that of the conventional
method.
Validation of FC Patterns. We also inspected the learned FC patterns and found that the
connection patterns for vision-related regions, such as the lingual gyrus, cuneus, and
parahippocampal gyrus, and for motor-related regions, such as the putamen and globus
pallidus, are consistently stable across all subjects. We also found that the learned FC
patterns are similar to the correlation-based FC patterns, but change much more smoothly
along time, which is consistent with current neuroscience findings.

Table 1. Accuracy of identifying ASD subjects on UM dataset w.r.t. sliding window size.
Window Pearson correlation Learned static FC Learned dynamic FC
size ACC AUC ACC AUC ACC AUC
10 % 87.37 ± 6.13 93.22 ± 7.53 89.23 ± 5.27 94.74 ± 6.83 92.25 ± 7.21 97.31 ± 8.63
25 % 84.50 ± 6.51 89.71 ± 7.45 87.83 ± 4.57 92.03 ± 5.81 90.35 ± 6.72 95.46 ± 7.45
45 % 80.81 ± 5.55 85.02 ± 6.81 84.89 ± 3.78 88.91 ± 4.57 86.45 ± 5.34 91.33 ± 5.81
60 % 75.71 ± 8.45 78.83 ± 9.51 81.71 ± 7.17 87.12 ± 8.64 83.76 ± 6.52 88.42 ± 7.46
100 % 68.13 ± 12.41 74.32 ± 14.36 70.52 ± 11.35 75.17 ± 13.11 77.85 ± 8.66 81.31 ± 9.52

Table 2. Accuracy of identifying ASD subjects on NYU dataset w.r.t. sliding window size.
Window Pearson correlation Learned static FC Learned dynamic FC
size ACC AUC ACC AUC ACC AUC
10 % 86.59 ± 5.01 91.07 ± 6.92 88.37 ± 6.07 92.36 ± 7.31 91.85 ± 5.11 96.23 ± 7.24
25 % 83.83 ± 5.56 88.61 ± 6.12 86.89 ± 3.76 91.76 ± 4.57 89.07 ± 4.71 94.64 ± 5.67
45 % 77.72 ± 7.45 81.57 ± 9.24 84.71 ± 6.18 89.43 ± 7.17 87.25 ± 5.46 91.37 ± 6.16
60 % 72.96 ± 12.21 77.56 ± 13.73 78.22 ± 9.14 84.56 ± 10.29 82.52 ± 7.82 86.71 ± 8.62
100 % 65.35 ± 12.13 70.28 ± 14.52 69.33 ± 10.41 74.21 ± 11.35 75.73 ± 8.27 80.16 ± 9.13

Fig. 2. Visualization of dynamic functional connectivity matrices by our learning-based method
(top) and the conventional correlation-based method (bottom) for an ASD subject (a) and a NC subject (b).

4 Conclusion

In this work, we propose a novel learning-based method to discover both static and
dynamic connectivity patterns from resting-state fMRI data. For static FC estimation,
our method optimizes the functional connectivity based on not only the correlation of
low level BOLD signals but also the similarity of high level principal components from
the link-to-link connectivity patterns. To address the problem of dynamic functional
connectivity, we arrange connectivity matrices along time into a tensor structure and
apply sparsity to suppress spurious functional connectivities and low rank to avoid
unrealistic fast state transition along time. We use our method to obtain dynamic
connectivity patterns and apply them to identify ASD subject at individual level, where
classification method using our learned dynamic connectivity patterns can improve the
ASD identification accuracy with almost 8 % increase conventional correlation-based
framework.

References
1. Greicius, M., Srivastava, G., Reiss, A., Menon, V.: Default-mode network activity
distinguishes Alzheimer’s disease from healthy aging: evidence from functional MRI. PNAS
101, 4637–4642 (2004)
2. Amaral, D.G., Schumann, C.M., Nordahl, C.W.: Neuroanatomy of autism. Trends Neurosci.
31, 137–145 (2008)
3. van den Heuvel, M.P., Pol, H.E.H.: Exploring the brain network: a review on resting-state
fMRI functional connectivity. Eur. Neuropsychopharmacol. 20, 519–534 (2010)
4. Hutchison, R.M., Womelsdorf, T., Allen, E.A., Bandettini, P.A., Calhoun, V.D.,
Corbetta, M., Penna, S.D., Duyn, J.H., Glover, G.H., Gonzalez-Castillo, J.,
Handwerker, D.A., Keilholz, S., Kiviniemi, V., Leopold, D.A., de Pasquale, F.,
Sporns, O., Walter, M., Chang, C.: Dynamic functional connectivity: promise, issues, and
interpretations. Neuroimage 80, 360–378 (2013)
5. Wee, C.-Y., Yap, P.-T., Shen, D.: Diagnosis of autism spectrum disorders using temporally
distinct resting-state functional connectivity networks. CNS Neurosci. Ther. 22, 212–219
(2016)
6. Eavani, H., Satterthwaite, T.D., Gur, R.E., Gur, R.C., Davatzikos, C.: Unsupervised learning
of functional network dynamics in resting state fMRI. In: Gee, J.C., Joshi, S., Pohl, K.M.,
Wells, W.M., Zöllei, L. (eds.) IPMI 2013. LNCS, vol. 7917, pp. 426–437. Springer,
Heidelberg (2013)
7. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge
(2004)
8. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and
statistical learning via the alternating direction method of multipliers. Found. Trends Mach.
Learn. 3, 1–122 (2011)
9. Nie, F., Wang, X., Huang, H.: Clustering and projected clustering with adaptive neighbors.
In: The 20th International Conference on Knowledge Discovery and Data Mining (2014)
10. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: uses and
interpretations. Neuroimage 52, 1059–1069 (2010)
114 Y. Zhu et al.

11. Braun, U., et al.: Dynamic reconfiguration of frontal brain networks during executive cognition
in humans. PNAS 112, 11678–11683 (2015)
12. Suk, H.-I., Lee, S.W., Shen, D.: A hybrid of deep network and hidden Markov model for
MCI identification with resting-state fMRI. In: Navab, N., Hornegger, J., Wells, W.M.,
Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 573–580. Springer, Heidelberg
(2015)
13. Leonardi, N., et al.: Principal components of functional connectivity: a new approach to
study dynamic brain connectivity during rest. Neuroimage 83, 937–950 (2013)
14. Zhu, Y., Lucey, S.: Convolutional sparse coding for trajectory reconstruction. IEEE Trans.
Pattern Anal. Mach. Intell. 37(3), 529–540 (2015)
Boundary Mapping Through Manifold Learning
for Connectivity-Based Cortical Parcellation

Salim Arslan(B) , Sarah Parisot, and Daniel Rueckert

Biomedical Image Analysis Group, Department of Computing,


Imperial College London, London, UK
s.arslan13@imperial.ac.uk

Abstract. The study of the human connectome is becoming more pop-


ular due to its potential to reveal the brain function and structure. A
critical step in connectome analysis is to parcellate the cortex into coher-
ent regions that can be used to build graphical models of connectivity.
Computing an optimal parcellation is of great importance, as this stage
can affect the performance of the subsequent analysis. To this end, we
propose a new parcellation method driven by structural connectivity esti-
mated from diffusion MRI. We learn a manifold from the local connectiv-
ity properties of an individual subject and identify parcellation bound-
aries as points in this low-dimensional embedding where the connectivity
patterns change. We compute spatially contiguous and non-overlapping
parcels from these boundaries after projecting them back to the native
cortical surface. Our experiments with a set of 100 subjects show that
the proposed method can produce parcels with distinct patterns of con-
nectivity and a higher degree of homogeneity at varying resolutions com-
pared to the state-of-the-art methods, hence can potentially provide a
more reliable set of network nodes for connectome analysis.

1 Introduction
Connectome analysis has recently gained a lot of attention due to its potential
to reveal the functional and structural architecture of the human brain, as well
as understand its evolution through development, aging, and neurological disor-
ders [14]. Brain connectivity is typically analyzed via graphical models obtained
by connecting cortical regions to each other with respect to the similarity between
their connectivity profiles, derived from functional MRI (fMRI) or diffusion imag-
ing (dMRI). In a whole-brain connectivity analysis, parcellation of the cortex con-
stitutes an integral part of the pipeline, as the performance of the subsequent
stages depends on the ability of the parcels to reliably represent the underlying
connectivity [6]. Traditionally, parcellations derived from anatomical landmarks
or randomly partitioned subregions have been used for connectome analysis, how-
ever such parcellations generally fail to fully reflect the function of the cortical
architecture [14]. More recent approaches take into account the connectivity infor-
mation, generally in association with clustering algorithms [1,2,5,12] in order to
group vertices of connectional similarity [16]. Despite promising results, the par-
cellation problem is still open to improvements. This is primarily due to the fact

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 115–122, 2016.
DOI: 10.1007/978-3-319-46720-7 14
116 S. Arslan et al.

that the problem itself is ill-posed; thus, obtaining accurate parcels depends both
on the proposed method’s fidelity to the given data [12] and on its capacity to
differentiate vertices with different connectivity profiles [6].
To this end, we introduce a new parcellation method, in which we learn
a manifold from local connectivity characteristics of an individual subject and
develop an effective way of computing parcels from this manifold. Our app-
roach rests on the assumption that through dimensionality reduction, we can
capture the underlying connectivity structure that may not be visible in high-
dimensional space [10]. We use the manifold to locate transition points where
connectivity patterns change and interpret them as an abstract delineation of
the parcellation boundaries. After projecting back to the native cortical space,
these boundaries are used to compute non-overlapping and spatially contiguous
parcels. We achieve this with a watershed segmentation technique, originally
utilized to parcellate resting-state correlations [8]. Nonlinear manifold learning
has been formerly used to identify functional networks from fMRI [9,17] and for
surface matching [10], as well as within many other fMRI analysis techniques,
such as [15]. Nevertheless, we propose to use such technique in association with
dMRI-based structural connectivity and boundary mapping, in order to compute
cortical parcellations for individual subjects, which can be used as the network
nodes in a whole-brain connectome analysis.
We assess the parcellation quality based on parcel homogeneity [2,8] and
silhouette analysis [5,6]. Besides the dMRI data, we also evaluate the parcella-
tions with functional connectivity data obtained from resting-state fMRI as a
means of external validation [6]. Our method is compared to the state-of-the-
art connectivity-based parcellation techniques [5,12], as well as two parcellation
schemes which do not take into account any connectivity information [16]. In
addition, we show the extent to which our parcellation boundaries agree with
well-established patterns of cortical myelination and cytoarchitecture.

2 Method
We start with preprocessing the dMRI data using probabilistic tractography
to estimate a structural connectivity network, which is then reduced in dimen-
sionality through manifold learning. Driven by the boundaries identified in the
low-dimensional embedding as points where connectivity patterns change, we
utilize a watershed segmentation to achieve the final parcellation (Fig. 1).

Fig. 1. Parcellation pipeline, summarizing all steps after preprocessing.


Connectivity-Based Parcellation Through Manifold Learning 117

Estimating Structural Connectivity. We perform whole-brain probabilistic


tractography on dMRI data by following the procedures summarized in [12].
We applied an element-wise log transformation to the tractography matrix to
reduce the bias towards short connections and sampled 5000 streamlines from
each of the cortical vertices. We define a connectivity fingerprint for each vertex
vi by counting the number of streamlines that connect vi to other vertices.
Each subject’s structural connectivity network C ∈ ℝ^(N×N) is estimated as the
cross-correlations of the fingerprints associated with each vertex, where N is the
number of vertices. We excluded the medial wall vertices from further processing
as they do not possess reliable information for connectivity analysis.
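The construction above (log-transformed streamline counts, cross-correlated fingerprints, medial-wall vertices excluded) can be sketched as follows. This is an illustrative stand-in, not the authors' implementation; `streamline_counts` and `medial_wall_mask` are hypothetical inputs representing the tractography output.

```python
import numpy as np

def connectivity_matrix(streamline_counts, medial_wall_mask):
    """Sketch of the structural connectivity estimation described above.

    streamline_counts : (N, N) array of tractography streamline counts,
                        row i being the connectivity fingerprint of vertex i.
    medial_wall_mask  : boolean (N,) array, True for medial-wall vertices.
    """
    # Element-wise log transform reduces the bias towards short connections.
    fingerprints = np.log1p(streamline_counts.astype(float))
    # Exclude medial-wall vertices, which carry no reliable connectivity.
    fingerprints = fingerprints[~medial_wall_mask][:, ~medial_wall_mask]
    # Cross-correlate fingerprints: C[i, j] is the Pearson correlation
    # between the connectivity profiles of vertices i and j.
    C = np.corrcoef(fingerprints)
    return C
```

The resulting matrix C is symmetric with unit diagonal and feeds directly into the manifold-learning step of the next subsection.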

Learning a Manifold from Connectivity. We propose to use Laplacian


eigenmaps to compute a nonlinear embedding from a connectivity network [3].
This method can reveal the intrinsic geometry of the underlying connectivity
by forming an affinity matrix based on how vertices are connected within their
neighborhoods. To this end, we transform C into a locality-preserving affinity
matrix W ∈ ℝ+^(N×N) by retaining only the correlations of the k nearest neighbors
of each vertex. We set k = 100 in order to effectively capture the local
connectivity structure and to ensure that the affinity matrix is connected and
nonnegative (i.e., all Wij ≥ 0) for each subject. A nonlinear embedding
is computed through spectral decomposition of the normalized graph Laplacian,
defined as L = D^(−1/2)(D − W)D^(−1/2), where D is a diagonal matrix with each
entry Dii = Σj Wij representing the degree of vi. Solving the generalized eigen-
vector problem [3] with respect to L reveals the eigenvectors f0 , f1 , . . . , fN −1 ,
ordered according to their eigenvalues 0 = λ0 ≤ λ1 ≤ . . . ≤ λN −1 . After omit-
ting the eigenvector f0 corresponding to λ0 , we can use the next d eigenvectors
to define an embedding that can approximate a low dimensional manifold [3].
Hence, each cortical vertex vi can be expressed as a row in this spectral embed-
ding, i.e. i → (f1 (i), . . . , fd (i)).
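A minimal sketch of this embedding step, assuming a dense correlation matrix C as input. This is the standard Laplacian-eigenmaps recipe [3] as described in the text, not the authors' code; the k-nearest-neighbor sparsification and the handling of ties are simplifying assumptions.

```python
import numpy as np

def laplacian_eigenmap(C, k=100, d=10):
    """Sketch of the spectral embedding described above (Belkin & Niyogi).

    C : dense (N, N) correlation matrix.
    k : neighborhood size (k = 100 in the text).
    d : embedding dimension (d = 10..20 in the text).
    """
    N = C.shape[0]
    # Locality-preserving affinity: keep the k largest correlations per
    # vertex (the self-correlation is among them), clamp negatives to
    # zero, then symmetrise so W remains a valid affinity matrix.
    W = np.zeros_like(C)
    for i in range(N):
        nbrs = np.argsort(C[i])[-k:]
        W[i, nbrs] = np.maximum(C[i, nbrs], 0.0)   # enforce W_ij >= 0
    W = np.maximum(W, W.T)
    # Normalised graph Laplacian L = D^(-1/2) (D - W) D^(-1/2).
    deg = W.sum(axis=1)
    dinv = 1.0 / np.sqrt(deg)
    L = np.eye(N) - (dinv[:, None] * W) * dinv[None, :]
    # Eigenvectors of L in ascending eigenvalue order; drop f0, the
    # trivial eigenvector with eigenvalue 0.
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:d + 1]    # row i = spectral coordinates of vertex i
```

Each row of the returned array gives the spectral coordinates (f1(i), ..., fd(i)) of a cortical vertex.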

Eigenvector Discretization. The process of dimensionality reduction pre-


serves local connectivity as well as imposes a natural clustering of the data [3].
Therefore, the parcellation problem can be cast as a graph partitioning problem
and one would attempt to subdivide the connectivity graph with spectral clus-
tering, e.g. using the normalized cuts criterion and solving the aforementioned
generalized eigenvalue problem [13]. In particular, each of the smallest eigen-
vectors corresponds to a real valued solution that optimally sub-partitions the
graph. These partitions can be approximated by transforming the real valued
eigenvectors into discrete forms, ideally by dividing them into two parts with
respect to a splitting point [13]. This can further be generalized towards a multi-
way partitioning with a recursive or simultaneous discretization of the smallest
eigenvectors [13], and thus, can be used to obtain a parcellation [5]. However,
by definition, our affinity matrix does not impose any spatial constraints, hence
such spectral methods cannot guarantee spatial contiguity within the parcels.
Instead, we propose a more effective way of deriving parcellations from discrete

eigenvectors and later show that this method can produce more reliable parcel-
lations compared to spatially constrained spectral clustering.
We discretize the eigenvectors using k-means and partition each eigenvector
into two subregions. The edge between these subregions potentially provides
good separation points towards obtaining a parcellation, as the vertices within
the same subregions tend to have similar connectivity properties, whilst the
points closer to the boundary correspond to cortical areas where the connec-
tivity is in transition. For example, Fig. 2(a) shows that connectivity profiles of
different vertices may exhibit similar or varying patterns, depending on their
relative location to an edge. In order to show that this tendency holds across
the whole cortex, we randomly selected vertices from one subregion adjacent
to the edge and paired them with their closest neighbors residing in the other
subregion. Keeping the distance between the vertices in pairs approximately the
same, we selected new pairs of vertices, but this time from within the same sub-
regions. We then measured the average correlation between the paired vertices’
connectivity profiles in each set and repeated this for all eigenvectors and sub-
jects. Figure 2(b) shows that, the similarity between the connectivity profiles of
vertices drops by at least 20% if they reside on different sides of a boundary.
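The two-way discretization of a single eigenvector can be sketched as a 1-D k-means with k = 2. This is an illustrative stand-in under assumed initialization (centroids at the eigenvector extremes), not the authors' implementation.

```python
import numpy as np

def discretize_eigenvector(f, n_iter=50):
    """Split one real-valued eigenvector f (length N) into two subregions
    with k-means (k = 2), implemented directly for 1-D data.
    Returns a 0/1 label per vertex.
    """
    # Initialise the two centroids at the extremes of the eigenvector.
    c = np.array([f.min(), f.max()], dtype=float)
    labels = np.zeros(len(f), dtype=int)
    for _ in range(n_iter):
        # Assign each vertex to its nearest centroid.
        labels = (np.abs(f - c[1]) < np.abs(f - c[0])).astype(int)
        # Recompute centroids from the current assignment.
        for j in (0, 1):
            if np.any(labels == j):
                c[j] = f[labels == j].mean()
    return labels
```

The edge between the two resulting label regions, mapped back to the cortical surface, marks a candidate parcellation boundary as described above.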

Fig. 2. (a) Connectivity profiles of vertices from different sides of a boundary. (b) Left:
illustration of the vertex selection procedure. Right: average similarity (correlation)
between paired vertices for each eigenvector. Dotted lines show the standard deviations.

Boundary Map Generation and Cortical Parcellation. To locate the con-


nectivity transition points and construct a boundary map, we first transfer the
discrete eigenvectors back to the native high-dimensional space. We then cal-
culate the gradients of each eigenvector across the cortical surface and combine
them into a boundary map. This map constitutes a more robust substitution for
the boundary maps based on gradients directly calculated from the spatial cor-
relations [8], since it can adjust for possible spurious gradients. In addition, the
traditional boundary mapping requires a considerable amount of data in order to
effectively model the brain function at the individual level [11] and only becomes
reliable for parcellation when averaged across many subjects/datasets [8]. In
order to obtain the final parcellations from the boundary map, we use a marker-
controlled watershed algorithm [8]. We define a set of markers on the boundary

map where each marker corresponds to an estimated parcel position and then
grow these markers until a boundary is reached or two ridges touch each other
in the flooding process of the watershed. The marker definition is typically per-
formed by defining a threshold on the boundary map. We set this threshold to
the 25th percentile of the boundary map intensities, since in many empirically
tested cases, this effectively revealed approximate parcel locations to be used as
ideal markers for a watershed transformation.
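A sketch of this marker-controlled watershed step on a regular 2-D grid, used here as a stand-in for the cortical surface mesh. The 25th-percentile marker threshold follows the text; the use of SciPy's image-foresting-transform watershed and the intensity rescaling are assumptions of this sketch, not the authors' pipeline.

```python
import numpy as np
from scipy import ndimage

def watershed_parcellation(boundary_map, pct=25):
    """Marker-controlled watershed on a 2-D boundary map.

    Markers are the connected regions below the pct-th percentile of the
    boundary-map intensities; each marker grows until it reaches a ridge
    or touches another growing region.
    """
    # Threshold the boundary map to obtain approximate parcel locations.
    thresh = np.percentile(boundary_map, pct)
    markers, n_parcels = ndimage.label(boundary_map <= thresh)
    # Flood the boundary map from the markers; watershed_ift expects an
    # unsigned-integer relief image.
    relief = np.interp(boundary_map,
                       (boundary_map.min(), boundary_map.max()),
                       (0, 255)).astype(np.uint8)
    parcels = ndimage.watershed_ift(relief, markers.astype(np.int16))
    return parcels, n_parcels
```

On a surface mesh the same logic applies with mesh adjacency replacing the grid connectivity.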

3 Experiments
Data. Experiments are conducted on a set of 100 randomly selected adults (54
females, age 22–35) from the Human Connectome Project (HCP) S500 release1 .
All data have been acquired and preprocessed following the HCP minimal pre-
processing pipelines [7]. For each subject, the gray-matter voxels have been reg-
istered onto the 32k triangulated mesh at 2 mm spatial resolution, yielding a
standard set of cortical vertices per hemisphere.

Evaluation. We assess the quality of the parcellations using two validation


techniques: parcel homogeneity [2,8] and silhouette analysis [5,6]. The former
expresses the degree of homogeneity that a parcellation exhibits by calculating
average cross-correlations within each parcel. Silhouette analysis combines par-
cel homogeneity with inter-parcel separation and measures how vertices within
a parcel are similar to each other, compared to the vertices in the nearest
parcels [6]. The goodness-of-fit is estimated based on the structural connectivity
data from which the parcellations have been derived. In addition, we evaluate
parcellations by measuring their extent to reflect the underlying connectivity
estimated from resting-state fMRI, which can provide an external data source
for validation [6]. We compare our parcellations to the ones obtained by hier-
archical clustering applied to the low-dimensional embedding (HC-Low), hier-
archical clustering driven by the connectivity profiles in the high-dimensional
space (HC-High), multi-scale spectral clustering (M-Scale) [12], normalized cuts
(N-Cuts) [5], random parcellations by Poisson disk sampling, and geometric par-
cellations, i.e. k-means clustering of the vertex coordinates [16]. All methods are
spatially constrained to ensure the contiguity of parcels. M-Scale and HC-High
are based on an initial connectivity-based over-parcellation of the cortex to com-
pensate for the noise, and thus, to obtain higher accuracy (1000, 2000 and 3000
regions for M-Scale; 3000 regions for HC-High). Random and geometric par-
cellations do not account for any connectivity information, therefore provide a
baseline for the assessment [16].
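The homogeneity measure (average within-parcel cross-correlation of connectivity profiles) can be sketched as below; the exact weighting and normalization used in [2,8] may differ from this simplified version.

```python
import numpy as np

def parcel_homogeneity(profiles, labels):
    """Mean within-parcel homogeneity.

    profiles : (N, F) array, one connectivity profile per vertex.
    labels   : (N,) array assigning a parcel label to each vertex.
    For each parcel, compute the mean pairwise correlation of its
    vertices' profiles; return the average over parcels.
    """
    scores = []
    for p in np.unique(labels):
        idx = np.flatnonzero(labels == p)
        if len(idx) < 2:
            continue   # singleton parcels have no pairwise correlations
        R = np.corrcoef(profiles[idx])
        # Mean of the off-diagonal correlations within the parcel.
        scores.append((R.sum() - len(idx)) / (len(idx) * (len(idx) - 1)))
    return float(np.mean(scores))
```

A good parcellation should score higher than the same vertices under shuffled labels, which is essentially what the random-parcellation baseline tests.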

Results. As there is no known optimal number of parcels, we evaluate the


proposed method at different scales, determined by the number of eigenvectors
incorporated into the boundary map. We present results for d = 10, 15, and 20
1
http://www.humanconnectome.org/documentation/S500/.

[Figure: bar charts of homogeneity and silhouette coefficient for d = 10, 15, 20]

Fig. 3. Quantitative results based on structural connectivity estimated from dMRI.
Error bars represent the variability across subjects. Stars (*) indicate statistical
significance between the winner and the runner-up with p < 0.01.

[Figure: bar charts of homogeneity and silhouette coefficient for d = 10, 15, 20]

Fig. 4. Quantitative results based on resting-state functional connectivity.

eigenvectors per hemisphere, which, on average, yield parcellations with around


180, 230, and 280 regions for each subject, respectively. Our experiments with
fewer eigenvectors resulted in very coarse parcellations that may not be ideal
for network analysis, whereas using d > 30 eigenvectors led to noisy boundary
maps, generating many unreliable parcels. For a fair comparison, other methods
are tuned to use the same number of parcels as inferred by our models. Validation
measures were calculated for each subject-parcellation pair and then averaged
across all subjects. We present the results based on structural and functional
connectivity data in Figs. 3 and 4, respectively.
Figure 3 shows that our method surpasses other approaches at all resolu-
tions in terms of silhouette analysis and performs equally effective as HC-Low
with respect to homogeneity. This may indicate that spectral embedding, which
drives both methods, can successfully reveal the intrinsic geometry of the under-
lying connectivity, and hence, provides a more robust set of features towards
parcellating the cortical surface. In addition, the way we utilize discrete eigen-
vectors for deriving parcellations help obtain more distinct parcels compared to
the others. This can be deduced from silhouette coefficients, where we especially
perform better than HC-Low, which directly applies a traditional clustering app-
roach to the spectral coordinates. In addition, considering the results obtained by
HC-High, we can infer that nonlinear dimensionality reduction can identify local
connectivity patterns which may not be directly detected in the high dimensional
space. On the other hand, M-Scale and N-Cuts can obtain reliable parcellations
only to some extent. These spectral approaches solely consider the immediate

neighbors for the construction of their affinity matrices. Therefore, they may fail
to fully capture the underlying connectivity.
The difference in performance between our approach and the others becomes
more prominent with the resting-state functional connectivity results (Fig. 4).
Both homogeneity and silhouette analysis indicate that the proposed method
can effectively subdivide the cortical surface into functionally coherent subre-
gions, and hence better reflect the underlying function. Although other methods
can generate homogeneous parcels to some degree, they fail to separate vertices
with different signals from each other, as indicated by silhouette coefficients.
Finally, visual assessment of parcellations shows some alignment with
Brodmann’s cytoarchitectural areas and highly myelinated cortical regions (see
Supplementary Material). Dice-based overlapping measures [4] indicate that this
observation is substantially consistent across subjects, especially for the motor
(BA[1,3,4]) and visual cortex (BA17), with average Dice scores of 0.81 (±0.05)
and 0.82 (±0.05), respectively.
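The Dice-based overlap [4] between a parcel and a reference region (e.g., a Brodmann area) reduces to a one-line computation on binary masks:

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())
```

A score of 1 means perfect overlap and 0 means disjoint regions; the 0.81–0.82 values reported above thus indicate substantial agreement.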

4 Conclusions
In this paper, we introduced a new connectivity-driven parcellation approach
based on dMRI. The proposed method models the local connectivity character-
istics with manifold learning and describes an effective use of this manifold to
identify locations where connectivity patterns change. Particularly, these tran-
sition locations are interpreted as an abstraction of the parcellation boundaries,
and hence, used to derive distinct parcels at different scales. We showed that
our parcellations can more reliably capture the underlying connectivity of the
brain compared to a set of other approaches. This paper focuses on developing a
complete framework for computing subject-specific parcellations, which can be
used in many application areas, such as for driving a registration process based
on brain connectivity. In addition, a planned future work is to explore the vari-
ability across individual parcellations towards generating a connectivity-based
cortical atlas, which can allow performing population level connectome studies.

Acknowledgments. Authors would like to thank Markus Schirmer for providing the
random parcellations. Data were provided by the Human Connectome Project, WU-
Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54-
MH091657). The research leading to these results received funding from the European
Research Council under the European Union's Seventh Framework Programme
(FP7/2007-2013)/ERC Grant Agreement No. 319456.

References
1. Arslan, S., Parisot, S., Rueckert, D.: Joint spectral decomposition for the par-
cellation of the human cerebral cortex using resting-state fMRI. In: Ourselin, S.,
Alexander, D.C., Westin, C.-F., Cardoso, M.J. (eds.) IPMI 2015. LNCS, vol. 9123,
pp. 85–97. Springer, Heidelberg (2015). doi:10.1007/978-3-319-19992-4 7

2. Arslan, S., Rueckert, D.: Multi-level parcellation of the cerebral cortex using
resting-state fMRI. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.)
MICCAI 2015. LNCS, vol. 9351, pp. 47–54. Springer, Heidelberg (2015)
3. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data
representation. Neural Comput. 15(6), 1373–1396 (2003)
4. Bohland, J.W., Bokil, H., Allen, C.B., Mitra, P.P.: The brain atlas concordance
problem: Quantitative comparison of anatomical parcellations. PLoS ONE 4(9),
e7200 (2009)
5. Craddock, R.C., James, G., Holtzheimer, P.E., Hu, X.P., Mayberg, H.S.: A whole
brain fMRI atlas generated via spatially constrained spectral clustering. Hum.
Brain Mapp. 33(8), 1914–1928 (2012)
6. Eickhoff, S.B., Thirion, B., Varoquaux, G., Bzdok, D.: Connectivity-based parcel-
lation: critique and implications. Hum. Brain Mapp. 36(12), 4771–4792 (2015)
7. Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., Coalson, T.S., Fischl, B.,
Andersson, J.L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J.R., Van Essen, D.C.,
Jenkinson, M.: The minimal preprocessing pipelines for the Human Connectome
Project. NeuroImage 80, 105–124 (2013)
8. Gordon, E.M., Laumann, T.O., Adeyemo, B., Huckins, J.F., Kelley, W.M.,
Petersen, S.E.: Generation and evaluation of a cortical area parcellation from
resting-state correlations. Cereb. Cortex 26(1), 288–303 (2016)
9. Langs, G., Sweet, A., Lashkari, D., Tie, Y., Rigolo, L., Golby, A.J., Golland, P.:
Decoupling function and anatomy in atlases of functional connectivity patterns:
language mapping in tumor patients. NeuroImage 103, 462–475 (2014)
10. Langs, G., Golland, P., Ghosh, S.S.: Predicting activation across individuals with
resting-state functional connectivity based multi-atlas label fusion. In: Navab, N.,
Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9350,
pp. 313–320. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24571-3 38
11. Laumann, T.O., Gordon, E.M., Adeyemo, B., Snyder, A.Z., Joo, S.J.,
Chen, M.Y., Gilmore, A.W., McDermott, K.B., Dosenbach, N.U., Schlaggar, B.L.,
Mumford, J.A., Poldrack, R.A., Petersen, S.E.: Functional system and areal organi-
zation of a highly sampled individual human brain. Neuron 87(3), 657–670 (2015)
12. Parisot, S., Arslan, S., Passerat-Palmbach, J., Wells, W.M., Rueckert, D.:
Tractography-driven groupwise multi-scale parcellation of the cortex. In:
Ourselin, S., Alexander, D.C., Westin, C.F., Cardoso, M.J. (eds.) IPMI 2015.
LNCS, vol. 9351, pp. 600–612. Springer, Heidelberg (2015)
13. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern
Anal. Mach. Intell. 22(8), 888–905 (2000)
14. Sporns, O.: The human connectome: a complex network. Ann. N. Y. Acad. Sci.
1224(1), 109–125 (2011)
15. Thirion, B., Dodel, S., Poline, J.B.: Detection of signal synchronizations in resting-
state fMRI datasets. NeuroImage 29(1), 321–327 (2006)
16. Thirion, B., Varoquaux, G., Dohmatob, E., Poline, J.B.: Which fMRI clustering
gives good brain parcellations? Front. Neurosci. 8, 167 (2014)
17. Wang, D., Buckner, R.L., Fox, M.D., Holt, D.J., Holmes, A.J., Stoecklein, S.,
Langs, G., Pan, R., Qian, T., Li, K., Baker, J.T., Stufflebeam, S.M., Wang, K.,
Wang, X., Hong, B., Liu, H.: Parcellating cortical functional networks in individ-
uals. Nat. Neurosci. 18(12), 1853–1860 (2015)
Species Preserved and Exclusive Structural
Connections Revealed by Sparse CCA

Xiao Li1(&), Lei Du1, Tuo Zhang1, Xintao Hu1, Xi Jiang2, Lei Guo1,
and Tianming Liu2
1
Brain Decoding Research Center, Northwestern Polytechnical University,
Xi’an, Shaanxi, China
lixiao0827@gmail.com
2
Computer Science Department, The University of Georgia, Athens, GA, USA

Abstract. Brain evolution has been an intriguing research topic for centuries.
Efforts have been devoted to identifying the structural connectome preserved
between macaques and humans and the connections exclusive to one species. However,
recent studies mainly focus on one specific fasciculus or one region. The sim-
ilarity and difference of the global structural connection networks in macaques and
humans are still largely unknown. In this work, we used diffusion MRI (dMRI) to
estimate the whole brain large-scale white matter pathways and Brodmann areas
as a test bed to construct a global connectome for the two species. We adopted
a sparse canonical correlation analysis (SCCA) algorithm to yield weights
which, applied to the connectome, produce components strongly
correlated between the two species. Joint analysis of the weights helped to
identify the preserved white matter pathways and those exclusive to a specific
species. The results are consistent with reports in the literature, demon-
strating the effectiveness and promise of this framework.

Keywords: Species comparison  Diffusion MRI  Large-scale connectome

1 Introduction

Brain evolution has been an intriguing research topic for centuries. A comparative
structural connection study among primate brains may help in our understanding of the
structural substrates underlying the development of higher cognitive functions [1].
Recent research indicates that the organization of white matter (WM) bundles has been
preserved between macaques and humans while structural difference has also been
identified [2, 3]. However, these studies mainly focus on one specific fasciculus, e.g.
arcuate fasciculus, or one specific brain region, e.g. dorsal prefrontal lobe [1, 3]. The
similarity and difference between the two species in terms of global connective patterns
are still largely unknown. The lack of such knowledge partly stems from the
methodology used to analyze the connective anatomy.
Diffusion MRI (dMRI) and tractography approaches have given us the opportunity
to study the whole brain large-scale connectome in primate brains in vivo [1]. Recent

X. Li, L. Du and T. Zhang—These authors are co-first authors.

© Springer International Publishing AG 2016


S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 123–131, 2016.
DOI: 10.1007/978-3-319-46720-7_15
124 X. Li et al.

comparative dMRI studies suggest that it is a powerful approach in revealing inter-


esting structural connectivity patterns of brain evolution at a global scale. Taking the
advantages of this imaging technique, we proposed a novel framework in this paper to
identify, in a data-driven manner, the structural connectome globally preserved
between humans and macaques and the connections exclusive to one species.
Generally, a comparative study requires a common brain parcellation scheme;
in this paper, Brodmann areas are used as a test bed. Currently, only the
Brodmann sites common to both species are used to create the global connectome.
DMRI-derived connective strength is used as a matching criterion. To study the similarity
and difference between the species, an improved sparse CCA algorithm [5] was applied to
those connective matrices to produce optimized weight vectors for the connective
features, so that components strongly correlated between the two species are yielded.
Sparse CCA was adopted to avoid the limit of conventional CCA, as the feature
dimension is far larger than subject numbers and not all of them are of interest and
importance. By analyzing the weights, we identified the preserved connections and the
corresponding white matter fibers. Those exclusive to a specific species was also ana-
lyzed. The effectiveness of the framework has been evaluated by cross-validation. The
identified connections and fibers are consistent with reports in the literature,
demonstrating the effectiveness and promise of this framework.

2 Materials and Methods

Generally, as illustrated in Fig. 1, we used T1-weighted MRI and dMRI data to con-
struct structural connectivity matrices for each species. Then, each matrix was stretched
to a feature vector. Those feature vectors for a species compose a feature matrix. Next,
an improved SCCA algorithm [5] was adopted. Currently, only the canonical com-
ponents with the strongest correlation between the two feature matrices were consid-
ered by this SCCA algorithm [5]. Consequently, two weight vectors u and v were yielded
for the elements of the connectivity matrices, and they can be restored to matrix
format, U and V. By jointly analyzing U and V, we determined the strongly correlated
connectivities conserved between the two species, and the corresponding dMRI derived
fibers were extracted and were suggested to be the preserved fibers. Connectivities and
fibers exclusive to a specific species were also analyzed.
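The correlation step can be sketched with plain ridge-regularized CCA standing in for the sparse CCA of [5]; the L1 sparsity penalty of the paper is deliberately omitted here for brevity. X and Y are hypothetical subjects-by-features matrices of vectorized connectomes, one per species.

```python
import numpy as np

def first_canonical_pair(X, Y, reg=1e-3):
    """Find weight vectors u, v maximising the correlation of Xu and Yv.

    X, Y : (n_subjects, n_features) centered-or-not feature matrices.
    reg  : ridge term keeping the covariance blocks invertible when
           n_features exceeds n_subjects.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Cxx = X.T @ X + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y

    # Whiten both blocks and take the leading singular pair of the
    # whitened cross-covariance M = Cxx^(-1/2) Cxy Cyy^(-1/2).
    def inv_sqrt(A):
        w, V = np.linalg.eigh(A)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    u = Wx @ U[:, 0]
    v = Wy @ Vt[0]
    return u, v, s[0]
```

In the paper's setting, u and v would additionally be sparsified and then reshaped back to matrix form (U and V) for joint interpretation of the preserved connections.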

Fig. 1. The flowchart of the framework.


Species Preserved and Exclusive Structural Connections by SCCA 125

2.1 Data and Preprocessing

Human Brain Imaging. Ten randomly selected human brains from the Q1 release of
WU-Minn Human Connectome Project (HCP) consortium [6] were used in this study.
The T1-weighted structural MRI had voxels with 0.7 mm isotropic, three-dimensional
acquisition, TI = 1000 ms, TR = 2400 ms, TE = 2.14 ms, flip angle = 8°, image
matrix = 260 × 311 × 260. DMRI was acquired with the following parameters: spin-echo
EPI sequence; TR = 5520 ms; TE = 89.5 ms; flip angle = 78°; refocusing flip angle =
160°; FOV = 210 × 180; matrix = 168 × 144; spatial resolution = 1.25 mm ×
1.25 mm × 1.25 mm; echo spacing = 0.78 ms.
includes 6 runs, representing 3 different gradient tables, with each table acquired once
with right-to-left and left-to-right phase encoding polarities, respectively. Each gradient
table includes approximately 90 diffusion weighting directions plus 6 b = 0 acquisitions
interspersed throughout each run. Diffusion weighted data consisted of 3 shells of
b = 1000, 2000, and 3000 s/mm2 interspersed with an approximately equal number of
acquisitions on each shell within each run.
Macaque Brain Imaging. The UNC-Wisconsin rhesus macaque neurodevelopment MRI database (http://www.nitrc.org/projects/uncuw_macdevmri/), consisting of T1-weighted MRI and dMRI data, was used in this work. This is a longitudinal database, and we only used scans of 10 different subjects acquired when they were more than 18 months old. The released T1-weighted MRI data has been registered to the UNC Primate Brain Atlas space [4], which has a resolution of 0.27 × 0.27 × 0.27 mm³ and a matrix of 300 × 350 × 250. The basic parameters for the diffusion data acquisition were: resolution of 0.65 × 0.65 × 1.3 mm³, a matrix of 256 × 256 × 58, diffusion-weighting gradients applied in 120 directions, and a b value of 1000 s/mm². Ten images without diffusion weighting (b = 0 s/mm²) were also acquired.
Preprocessing. Preprocessing of the T1-weighted MRI included brain skull removal and tissue segmentation via FSL [7]. The T1-weighted data was nonlinearly warped to the b0 map of the dMRI data via FSL-fnirt [8] before cortical surface reconstruction was performed to reconstruct the inner cortical surface of the white matter (WM) [9]. For the dMRI data, skull stripping and eddy current correction were applied first; then BedpostX in FSL 5 (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FDT/UserGuide#BEDPOSTX) was adopted to estimate the axonal orientations for each voxel. We used two axonal orientations per voxel, because it has been suggested that b-values upwards of 4000 would be required to robustly resolve a three-fiber orthogonal system [10]. For convenience of visualization, DSIstudio [11] was used to reconstruct deterministic fibers from the BedpostX-derived axonal orientations. 5 × 10⁴ fiber tracts were reconstructed for each subject, with FA and angular thresholds of 0.1 and 60°, respectively; a small FA threshold for primate brains was suggested in [15].
Structural Connectivity Matrix Construction. The structural connectivity matrices were constructed from the dMRI-derived fibers and white matter surfaces with a parcellation scheme. Currently, we use the Brodmann area parcellation scheme as a test bed to develop and evaluate our framework. All macaque white matter surfaces were warped to the 'F99' macaque atlas space [13] via a spherical registration method [12].
126 X. Li et al.

The Brodmann parcellation in the atlas space was mapped back to the surface of each individual. The Brodmann areas in the 'Conte69' human atlas [14] were mapped back to each human subject's surface using the same approach. Currently, only ipsilateral structural connectivities were considered, and we use M and H to denote the connectivity matrices of macaque and human, respectively. The 28 Brodmann areas shared by the two atlases and robustly warped to individuals were used as nodes, so the connectivity matrices M and H have the same size. For each individual, an element of the matrix, such as m_{i,j} in M, was defined as the connective strength between Brodmann areas i and j, i.e., the number of fiber tracts connecting areas i and j divided by the total number of fibers.
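This construction can be sketched in Python/NumPy (our own illustration, not the authors' implementation; we assume each reconstructed fiber has already been assigned the pair of Brodmann-area indices its endpoints fall in):

```python
import numpy as np

def connectivity_matrix(fiber_endpoints, n_areas):
    """Normalized fiber-count connectivity matrix.

    fiber_endpoints: list of (i, j) index pairs, one per reconstructed
    fiber, giving the two parcels (here, Brodmann areas) it connects.
    The element (i, j) is the number of fibers linking areas i and j,
    divided by the total number of fibers, as described in the text.
    """
    M = np.zeros((n_areas, n_areas))
    for i, j in fiber_endpoints:
        if i != j:           # ignore fibers that stay within one area
            M[i, j] += 1
            M[j, i] += 1     # structural connectivity is symmetric
    return M / max(len(fiber_endpoints), 1)
```

For example, five fibers among four areas, two of which link areas 0 and 1, give M[0, 1] = 2/5 = 0.4.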

2.2 Sparse Canonical Correlation Analysis (SCCA)


The aim is to identify the common and distinct connections in a group-wise and global manner. We used M or H as a connective feature for each subject. Because M and H are symmetric matrices, their upper triangular parts were extracted and converted to feature vectors x and y, where x is associated with M and y with H. The problem is then to select features such that the similarity between the two groups, X = {x_1, ..., x_n} and Y = {y_1, ..., y_n}, is maximized. To this end, we adopted a sparse CCA algorithm [5], because conventional CCA without a sparsity penalty yields too many non-zero values, making the connectivity matrices noisy; moreover, the small sample size results in all correlation values being nearly equal to 1. The adopted SCCA [5] adds a penalty (the absolute-value-based GraphNet, AGN) that is more robust than existing methods [16]. The goal is to find a pair of weight vectors, denoted u and v, such that the correlation coefficient between Xu and Yv is the highest; u and v are the canonical loadings of X and Y, respectively. The AGN-SCCA model is formally defined as
\min_{u,v} \; -u^T X^T Y v    (1)

\text{s.t.} \quad \|Xu\|_2^2 \le 1, \; \|Yv\|_2^2 \le 1, \; \|u\|_{AGN} \le c_1, \; \|v\|_{AGN} \le c_2    (2)

where \|u\|_{AGN} \le c_1 and \|v\|_{AGN} \le c_2 are defined as follows:

\|u\|_{AGN} = \lambda_1 |u|^T L_1 |u| + \beta_1 \|u\|_1
\|v\|_{AGN} = \lambda_2 |v|^T L_2 |v| + \beta_2 \|v\|_1    (3)

L_1 and L_2 are the Laplacian matrices of the correlation matrices of X and Y, respectively. The AGN-SCCA method not only discovers a strong relationship between X and Y, but also recovers the structural information within X and Y; that is, it can identify the correlated features within X. The lasso terms in both penalties ensure sparsity. We solve the AGN-SCCA problem by an alternating iterative procedure, briefly described as follows (please see [5] for details).

Algorithm 1. Compute the first pair of canonical loadings of X and Y.

1. Initialize u and v;
2. Iterate until convergence:
   (a) u \leftarrow \arg\min_u -u^T X^T Y v, s.t. \|u\|_{AGN} \le c_1 and \|Xu\|_2^2 \le 1;
   (b) v \leftarrow \arg\min_v -u^T X^T Y v, s.t. \|v\|_{AGN} \le c_2 and \|Yv\|_2^2 \le 1;
3. Report u and v.
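The alternating structure of Algorithm 1 can be illustrated with the following NumPy sketch. This is not the authors' implementation: the AGN penalty is replaced here by a plain lasso (soft-thresholding) proximal step, and the scale constraints are enforced by rescaling; the function name `sparse_cca` and the thresholds `t_u`, `t_v` are our own.

```python
import numpy as np

def soft(a, t):
    """Soft-thresholding, the proximal operator of the l1 penalty."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def sparse_cca(X, Y, t_u=0.1, t_v=0.1, n_iter=100):
    """Simplified alternating sparse CCA (lasso penalty, not full AGN).

    Mirrors the two alternating steps of Algorithm 1: update u with v
    fixed, then v with u fixed, rescaling so ||Xu|| <= 1 and ||Yv|| <= 1.
    """
    rng = np.random.default_rng(0)
    v = rng.standard_normal(Y.shape[1])
    v /= np.linalg.norm(Y @ v)
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = soft(X.T @ (Y @ v), t_u)       # step (a): update u given v
        nu = np.linalg.norm(X @ u)
        if nu > 0:
            u /= nu                        # enforce ||Xu|| <= 1
        v = soft(Y.T @ (X @ u), t_v)       # step (b): update v given u
        nv = np.linalg.norm(Y @ v)
        if nv > 0:
            v /= nv                        # enforce ||Yv|| <= 1
    return u, v
```

On data sharing a latent signal, the correlation between Xu and Yv approaches the leading canonical correlation while loadings on irrelevant features are zeroed out by the thresholding.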
Indeed, the weight vectors u and v have the same size as the vectorized X and Y features. They can be transformed back to matrix form, denoted U and V (only the upper triangular part is retrieved; the lower part is a symmetric copy of the upper one). The values in the two weight matrices indicate the contribution of the corresponding structural connectivities to maximizing the correlation (similarity) of the global connective patterns between the two species. Positive u_{ij} and v_{ij} indicate that those connectivities in human and macaque are positively correlated, further suggesting that they might be conserved between species.
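The vectorization and restoration steps described above can be sketched as follows (our own NumPy illustration; the function names are hypothetical):

```python
import numpy as np

def vectorize(C):
    """Stack the upper triangle (excluding the diagonal) of a symmetric
    connectivity matrix into a feature vector, as done for M and H."""
    return C[np.triu_indices(C.shape[0], k=1)]

def restore(x, n):
    """Restore a weight vector (e.g. u or v) to symmetric matrix form
    (e.g. U or V): fill the upper triangle, then mirror it."""
    U = np.zeros((n, n))
    U[np.triu_indices(n, k=1)] = x
    return U + U.T
```

Round-tripping a symmetric, zero-diagonal matrix through `vectorize` and `restore` recovers it exactly.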

3 Results

3.1 Cross-Validation
We used 10 human and 10 macaque subjects in this study. Because only ipsilateral connections are considered in this work, and we assume that contralateral Brodmann areas with the same label have the same function, we have n = 20 samples for each species. A five-fold cross-validation scheme was adopted to evaluate whether the framework yields consistent U and V and whether the AGN-SCCA algorithm effectively produces high correlation coefficients. Specifically, 16 pairs of human and macaque samples were used as 'training' samples to tune the parameters of the AGN-SCCA algorithm until the obtained weight matrices U and V, applied to the remaining 4 'testing' sample pairs, yielded the highest correlation coefficient. In Fig. 2, we show the five optimized U and V pairs. Their consistency demonstrates that the weight matrices yielded by the framework are robust to inter-subject variance. The averaged correlation coefficient values on the

Fig. 2. The five weight-matrix pairs U and V yielded by the five-fold cross-validation. The standard errors of the five Us and Vs are shown on the far right.

five-fold test are 0.95 ± 0.01 (illustrated by the scatter chart in Fig. 3(a)), compared to the average intra-species values of 0.995 for human and 0.989 for macaque, demonstrating the effectiveness of the algorithm and suggesting that common global connective patterns between human and macaque do exist.

3.2 DTI Tracts Comparison Between Human and Macaque


The averaged Ū and V̄ obtained from the five-fold cross-validation results in Fig. 2 were used to analyze the global connective patterns between human and macaque. In Fig. 3(a), we show the ū_{ij}s and v̄_{ij}s whose positive values are greater than the standard deviation (0.04 for Ū and 0.1 for V̄). Globally, the structural connectivities corresponding to the positive ū_{ij}s and v̄_{ij}s are the strongest contributors to producing the highly correlated canonical components between the two species. Therefore, those connectivities are suggested to be the species-preserved ones. Figure 3(b) and (c) show those preserved fibers on a human subject and a macaque subject. It is also noticed in Fig. 3(a) that negative ū_{ij}s can be found for human connectivities, suggesting that those human connectivities together have a trend opposite to the combination of most macaque connectivities. This may suggest that those connectivities are exclusive to human. Because the connectivity matrices of the two species share the same parcellation scheme, we can further analyze the specific connectivities by overlapping the two positive weight matrices (no threshold applied). It is observed in Fig. 4(a) that the overlapped connectivity matrix exhibits three clusters, including two

Fig. 3. (a) The correlation of the first canonical component between human and macaque. The scatter chart in the top-right corner shows the positive correlation between human subjects and macaque subjects in the transformed space; the results of the five-fold cross-validation are shown in different colors. The weight vectors have been transformed back to matrices, and the averaged weight matrices of the five-fold cross-validation are shown beside the axes. Only the most positive weights ū_{ij}s and v̄_{ij}s (above the standard deviation) are shown; (b) and (c) show the fibers of a human subject and a macaque subject corresponding to the positive weights in (a).

clusters on the diagonal and one off-diagonal cluster. Cluster #1 is located on the somatosensory and motor cortices (BA1–7). The other diagonal cluster (Cluster #2) resides on the visual cortices (BA17–19) and the temporal lobe (BA20–22). The off-diagonal cluster (Cluster #3) consists of the fronto-occipital stream (BA9, 10 to BA17–19) and the fronto-temporal stream (BA9, 10 to BA20–22). The fibers of the three clusters on a human subject and a macaque subject are shown in Fig. 4(b–d). These structural connectivities have been reported to be preserved in human and macaque in many previous works [1, 3]. On the other hand, connectivity differences can be derived by overlapping the negative and positive matrices in Fig. 3(a). Because only one negative matrix was produced (for the human subjects), it can be directly used to identify the connectivity differences, e.g., the white-arrow-highlighted connectivities (Fig. 4(e)) linking the inferior frontal lobe to the temporal lobe (see Fig. 4(f) for the corresponding fibers). This absent frontal projection to the middle and inferior temporal gyrus in macaque has been validated by literature reports [3].

Fig. 4. (a) The overlapping of the two species’ positive weight matrices; (b)–(d) the fibers
derived from the 3 clusters in (a); (e) the connectivity difference matrix between the two species;
(f) the fibers on a human subject derived from the arrow-highlighted connectivities in (e).

4 Conclusion

In this work, we used dMRI to estimate whole-brain large-scale white matter pathways and used Brodmann areas as a test bed to construct a global connectome for human and macaque, on which the AGN-SCCA algorithm was adopted to yield the weights with which the connectivities produce the strongly correlated component between the two species. By analyzing these weights, we identified the preserved white

matter pathways as well as those exclusive to a specific species. The results are consistent with reports in the literature, demonstrating the effectiveness and promise of this framework.

References
1. Rilling, J.K., Glasser, M.F., Preuss, T.M., Ma, X., Zhao, T., Hu, X., Behrens, T.E.: The
evolution of the arcuate fasciculus revealed with comparative DTI. Nat. Neurosci. 11(4),
426–428 (2008)
2. Thiebaut de Schotten, M., Dell’Acqua, F., Valabregue, R., Catani, M.: Monkey to human
comparative anatomy of the frontal lobe association tracts. Cortex 48(1), 82–96 (2012)
3. Jbabdi, S., Lehman, J.F., Haber, S.N., Behrens, T.E.: Human and monkey ventral prefrontal
fibers use the same organizational principles to reach their targets: tracing versus
tractography. J. Neurosci. 33(7), 3190–3201 (2013)
4. Styner, M., Knickmeyer, R., Joshi, S., Coe, C., Short, S.J., Gilmore, J.: Automatic brain
segmentation in rhesus monkeys. In: Proceedings of SPIE on Medical Imaging, vol. 6512,
p. 65122L1-8 (2007)
5. Du, L., Huang, H., Yan, J., Kim, S., Risacher, S.L., Inlow, M., Moore, J.H., Saykin, A.J.,
Shen, L., for the Alzheimer’s Disease Neuroimaging Initiative: Structured sparse canonical
correlation analysis for brain imaging genetics: an improved GraphNet method. Bioinfor-
matics 32, 1544–1551 (2016)
6. Van Essen, D.C., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T.E.J., Bucholz, R.,
Chang, A., Chen, L., Corbetta, M., Curtiss, S.W., Della Penna, S., Feinberg, D.,
Glasser, M.F., Harel, N., Heath, A.C., Larson-Prior, L., Marcus, D., Michalareas, G.,
Moeller, S., Oostenveld, R., Petersen, S.E., Prior, F., Schlaggar, B.L., Smith, S.M.,
Snyder, A.Z., Xu, J., Yacoub, E.: The human connectome project: a data acquisition
perspective. Neuroimage 62, 2222–2231 (2012)
7. Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: FSL.
Neuroimage 62, 782–790 (2012)
8. Andersson, J.L.R., Jenkinson, M., Smith, S.: Non-linear registration, aka spatial normal-
isation. FMRIB technical report TR07JA2 (2010)
9. Liu, T., Nie, J., Tarokh, A., Guo, L., Wong, S.T.C.: Reconstruction of central cortical surface
from brain MRI images: method and application. Neuroimage 40, 991–1002 (2008)
10. Behrens, T.E., Berg, H.J., Jbabdi, S., Rushworth, M.F., Woolrich, M.W.: Probabilistic
diffusion tractography with multiple fibre orientations: what can we gain? Neuroimage 34(1),
144–155 (2007)
11. Basser, P.J., Pajevic, S., Pierpaoli, C., Duda, J., Aldroubi, A.: In vivo fiber tractography
using DT-MRI data. Magn. Reson. Med. 44, 625–632 (2000)
12. Yeo, B.T., Sabuncu, M.R., Vercauteren, T., Ayache, N., Fischl, B., Golland, P.: Spherical
demons: fast diffeomorphic landmark-free surface registration. IEEE Trans. Med. Imaging
29(3), 650–668 (2010)
13. Lewis, J.W., Van Essen, D.C.: Mapping of architectonic subdivisions in the macaque
monkey, with emphasis on parieto-occipital cortex. J. Comput. Neurol. 428(1), 79–111
(2000)
14. Van Essen, D.C., Glasser, M.F., Dierker, D.L., Harwell, J., Coalson, T.: Parcellations and
hemispheric asymmetries of human cerebral cortex analyzed on surface-based atlases. Cereb.
Cortex 22(10), 2241–2262 (2012)

15. Dauguet, J., Peled, S., Berezovskii, V., Delzescaux, T., Warfield, S.K., Born, R., Westin, C.-F.:
Comparison of fiber tracts derived from in-vivo DTI tractography with 3D histological neural
tract tracer reconstruction on a macaque brain. Neuroimage 37, 530–538 (2007). doi:10.1016/j.
neuroimage
16. Du, L., et al.: A novel structure-aware sparse learning algorithm for brain imaging genetics.
In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part III.
LNCS, vol. 8675, pp. 329–336. Springer, Heidelberg (2014)
Modularity Reinforcement for Improving Brain
Subnetwork Extraction

Chendi Wang1(B) , Bernard Ng2 , and Rafeef Abugharbieh1


1
Biomedical Signal and Image Computing Lab, UBC, Vancouver, Canada
{chendiw,rafeef}@ece.ubc.ca
2
Department of Statistics, UBC, Vancouver, Canada
bernardng@gmail.com

Abstract. Functional subnetwork extraction is commonly employed to


study the brain’s modular structure. However, reliable extraction from
functional magnetic resonance imaging (fMRI) data remains challenging.
As representations of brain networks, brain graph estimates are typically
noisy due to the pronounced noise in fMRI data. Also, confounds, such as
region size bias, motion artifacts, and signal dropout, introduce region-
specific bias in connectivity, e.g. a node in a signal dropout area tends
to display lower connectivity. The traditional approach of global thresh-
olding might thus remove relevant edges that have low connectivity due
to confounds, resulting in erroneous subnetwork extraction. In this paper,
we present a modularity reinforcement strategy that deals with the above
two challenges. Specifically, we propose a local thresholding scheme that
accounts for region-specific connectivity bias when pruning noisy edges.
From the resulting thresholded graph, we derive a node similarity mea-
sure by comparing the adjacency structure of each node, i.e. its connec-
tion fingerprint, with that of other nodes. Drawing on the intuition that
nodes belonging to the same subnetwork should have similar connection
fingerprints, we refine the brain graph with this similarity measure to rein-
force its modularity structure. On synthetic data, our strategy achieves
higher accuracy in subnetwork extraction compared to using standard
brain graph estimates. On real data, subnetworks extracted with our
strategy attain higher overlaps with well-established brain systems and
higher subnetwork reproducibility across a range of graph densities. Our
results thus demonstrate that modularity reinforcement with our strategy
provides a clear gain in subnetwork extraction.

Keywords: Brain graph estimation · Connection fingerprint · fMRI ·


Local thresholding · Subnetwork extraction

1 Introduction
The human brain naturally befits a graphical representation, where brain regions
and their pair-wise interactions constitute graph nodes and weighted edges,
respectively. An important attribute of the brain is its modular structure, in
which specific subnetworks of brain regions work in tandem to execute vari-
ous functions. Functional magnetic resonance imaging (fMRI) is widely used

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 132–139, 2016.
DOI: 10.1007/978-3-319-46720-7 16

for studying this modular structure of the brain. However, reliable subnetwork
extraction from fMRI data remains challenging. First, the brain network topol-
ogy may be obscured by noisy connectivity estimates [1]. Second, confounds,
such as region size bias [2], effects of motion artifacts [3], and signal dropouts
due to susceptibility artifacts (especially in regions like the orbitofrontal cor-
tex and the inferior temporal lobe) [4], introduce region-specific biases to the
connectivity estimates.
The conventional way for dealing with noisy connectivity matrices is to apply
global thresholding (GT) by either keeping only connections with values above
a certain threshold or keeping a certain graph density [1]. Due to region-specific
connectivity biases, e.g. brain regions in signal dropout locations tend to display
lower connectivity, certain regions that do belong to a subnetwork might not
appear as such based on the fMRI measurements, especially after GT, which
prunes weak edges. To mitigate this overlooked problem, a local thresholding
(LT) method based on the minimal spanning tree and k-nearest neighbors (MST-
kNN) has been proposed [5]. The idea in [5] was to build a single connected graph
using the MST and expand the tree by adding edges from each node to its near-
est neighbors until a desired graph density is reached. However, both key steps of
enforcing a single connected graph and adding edges to all nodes when expanding
the tree lack neuro-scientific justifications. A few studies have explored spectral
graph wavelet transform for graph de-noising [6], but this approach does not
explicitly handle region-specific connectivity biases. In fact, most existing con-
nectivity estimation and subnetwork extraction techniques [1,7] do not account
for these biases.
In this paper, we propose a modularity reinforcement strategy for improv-
ing brain subnetwork extraction. To deal with noisy edges and region-specific
connectivity biases, we propose a local thresholding scheme that normalizes the
connectivity distribution of each node prior to thresholding (Sect. 2.1). Also,
since node pairs belonging to the same subnetwork presumably connect to a
similar set of brain regions, i.e. have similar connection fingerprints, we derive a
node similarity measure from the thresholded graph by comparing the adjacency
structure of each node pair, and refine the graph with this similarity measure to
reinforce its modularity structure (Sect. 2.2). More reliable subnetwork extrac-
tion is consequently facilitated on the refined graph (Sect. 2.3). To set the number
of subnetworks, we adopt an automated technique based on graph Laplacian [8],
and compare that against the conventional modularity-maximization approach
[9]. We validate our modularity reinforcement strategy on both synthetic data
and real data from the Human Connectome Project (HCP).

2 Methods
2.1 Local Thresholding
Due to region-specific connectivity biases, conventional GT might prune relevant connections with weak edge strength. To account for these biases, we present an LT scheme. The idea is to first normalize the connectivity distribution of each

node into a uniform interval to rectify the biases. Subsequent global thresholding
on this normalized graph would have the effect of applying local thresholding on
each node. Specifically, let C be an n × n connectivity matrix, where n is the
number of nodes in the brain graph. We normalize the connectivity distribution
by mapping each row of C from [min(C_{i,:}), max(C_{i,:})] to [0, 1], where C_{i,:} denotes row i of C, corresponding to the connectivity between brain region i and all other regions in the brain. A threshold is then applied to generate a binary adjacency matrix, G, which we then symmetrize by taking the union of G and G^T: A_{i,j} = G_{i,j} ∪ G_{j,i}. This binary adjacency matrix A is used to mask out the noisy edges from C: Ĉ_{i,j} = A_{i,j} C_{i,j}, which is equivalent to applying a local threshold to C_{i,:} for each i. We note that in the event that noisy nodes are accidentally included, some of the connections to these noisy nodes (that might not be kept by GT) would be kept by LT due to the normalization step.
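A minimal NumPy sketch of this scheme (our own illustration; the threshold t on the normalized graph is the free parameter that sets the graph density):

```python
import numpy as np

def local_threshold(C, t):
    """Local thresholding via per-node normalization (Sect. 2.1).

    Each row of C is mapped to [0, 1] to rectify region-specific
    connectivity bias; a single threshold t on the normalized graph
    then acts as a local threshold per node. The binary adjacency is
    symmetrized by the union A = G OR G^T and used to mask C.
    """
    lo = C.min(axis=1, keepdims=True)
    hi = C.max(axis=1, keepdims=True)
    N = (C - lo) / (hi - lo + 1e-12)   # row-wise normalization
    G = N > t
    A = G | G.T                        # union of G and G^T
    return A, A * C                    # adjacency A and masked C (C-hat)
```

For a node whose connectivity is uniformly depressed (e.g. in a signal-dropout area), its strongest edges still survive, whereas a single global threshold on the raw values would prune them.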

2.2 Modularity Reinforcement


Since nodes working in tandem are expected to have similar connection fingerprints, given A, where A_{i,:} is the connection fingerprint of node i, we define the similarity between a pair of nodes (i, j) as the number of common adjacent nodes they share, normalized by the minimum node degree of the node pair:

S_{i,j} = \frac{\sum_{k=1}^{n} A_{i,k} A_{j,k}}{\min(d_i, d_j)}    (1)

where d_i = \sum_{k=1}^{n} A_{i,k}. We use the minimal degree for normalization, instead of e.g. the average degree, so that connections associated with hub nodes (nodes with more edges) will not be overly down-weighted. Since nodes within a subnetwork are expected to share more adjacent neighbors than nodes belonging to different subnetworks, S boosts the within-subnetwork edges while suppressing the between-subnetwork edges, which highlights the modular pattern inherent in Ĉ. Hence, we use S to refine Ĉ to reinforce its modular structure: Ĉ^S_{i,j} = S_{i,j} Ĉ_{i,j}.
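Eq. (1) and the refinement step translate directly into NumPy (our own sketch; A is the binary adjacency matrix and C_hat the locally thresholded connectivity matrix):

```python
import numpy as np

def modularity_reinforce(A, C_hat):
    """Connection-fingerprint similarity (Eq. 1) and graph refinement.

    S[i, j] is the number of common adjacent nodes of i and j divided
    by min(d_i, d_j); the refined graph is C^S = S * C_hat (element-wise).
    """
    A = A.astype(float)
    common = A @ A.T                   # sum_k A[i, k] * A[j, k]
    d = A.sum(axis=1)                  # node degrees
    denom = np.minimum.outer(d, d)
    S = np.divide(common, denom,
                  out=np.zeros_like(common), where=denom > 0)
    return S, S * C_hat
```

On a three-node path 0-1-2, nodes 0 and 2 share their single neighbor (node 1) and get S = 1, while the adjacent pair (0, 1) shares no common neighbor and gets S = 0, illustrating how S reflects fingerprint overlap rather than direct connection strength.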

2.3 Subnetwork Extraction


For subnetwork extraction, we employ normalized cuts (Ncuts), chosen due to
its wide use by the fMRI community. To set the number of subnetworks, m,
we adopt an automated technique based on the spectral properties of the graph Laplacian L = D − W, where W is a connectivity matrix and D_{ii} = \sum_{k=1}^{n} W_{i,k}.
Specifically, an eigenvalue of 1 has been shown to correspond to the transition
where single isolated nodes would no longer be declared as a subnetwork [8]. We
thus set m to the number of eigenvalues of L with values less than 1.
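This criterion amounts to a few lines of NumPy (our own sketch of the eigenvalue-counting rule; `eigvalsh` is appropriate since L is symmetric):

```python
import numpy as np

def n_subnetworks(W):
    """Set m to the number of eigenvalues of the graph Laplacian
    L = D - W that are below 1 (the transition described in [8])."""
    L = np.diag(W.sum(axis=1)) - W     # graph Laplacian D - W
    return int(np.sum(np.linalg.eigvalsh(L) < 1))
```

For two disconnected triangles with unit edge weights, L has one zero eigenvalue per connected component and all remaining eigenvalues equal to 3, so m = 2.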

3 Materials
3.1 Synthetic Data
To illustrate our strategy, we synthesized a small-scale network consisting of
n = 13 nodes in Fig. 1. We also generated synthetic data that cover 100 random

network configurations with n set to 100 nodes. For each network configuration,
the number of subnetworks, N , was randomly selected from [10, 20]. The number
of regions within each subnetwork was set to round (n/N ) + r, where r was
randomly selected from [−2, 2]. With the resulting configuration, we created the
corresponding adjacency matrix, Σ, and drew time courses with 4,800 samples
(analogous to real data) from N (0, Σ). We then added Gaussian noise to the time
courses with signal-to-noise ratio randomly set between [−6dB, −3dB]. Sample
covariance was then estimated from these time courses with correlation values
associated with q % of the nodes reduced by z %, where q was randomly selected
from [20 %, 30 %] and z was randomly selected from [30 %, 40 %] to simulate
region-specific connectivity biases for smaller brain regions [2].

3.2 Real Data

We used the resting state fMRI scans of 77 healthy subjects (36 males and
41 females, ages ranging from 22 to 35) from the HCP Q3 dataset [10]. The
data comprised two sessions, each having a 30 min acquisition with a TR of
0.72 s and an isotropic voxel size of 2 mm. Preprocessing already applied to the
data by HCP [11] included gradient distortion correction, motion correction,
spatial normalization to MNI space, and intensity normalization. Additionally,
we regressed out motion artifacts, mean white matter and cerebrospinal fluid
signals, and principal components of high variance voxels [12], followed by band-
pass filtering with cutoff frequencies of 0.01 and 0.1 Hz. We used the Will90fROI
atlas [13] and the Harvard-Oxford (HO) atlas [14] to define regions of interest
(ROIs). The Will90fROI and HO atlas have 90 and 112 ROIs, respectively. Voxel
time courses within ROIs were averaged to generate region time courses. The
region time courses were demeaned, normalized by the standard deviation, and
concatenated across subjects for extracting group subnetworks. The Pearson’s
correlation values between the region time courses were taken as estimates of
connectivity. Negative elements in the connectivity matrix were set to zero due
to the currently unclear interpretation of negative connectivity [15].

4 Results and Discussion

We compared our strategy (LT with modularity reinforcement - LTMR) against


GT, LT, GT with modularity reinforcement (GTMR) and MST-kNN in [5].
LT was implemented using our proposed scheme (Sect. 2.1). GTMR was imple-
mented by deriving adjacency matrices with global thresholding, and subse-
quently executing our proposed modularity reinforcement strategy (Sect. 2.2).
Instead of using a specific threshold, we examined a range of graph densities to test the robustness of our proposed strategy. For synthetic data, evaluation was based on the accuracy of subnetwork extraction. To estimate accuracy, we matched the extracted subnetworks to the ground-truth subnetworks using Hungarian matching [16] with the Dice coefficient: DC = 2|X ∩ Y|/(|X| + |Y|), where X is the set of regions of an extracted subnetwork and Y is the set of regions of
a ground truth subnetwork. The average DC over matched subnetworks was


taken as accuracy. For real data, we assessed the overlap between the extracted
subnetworks and fourteen well-established brain systems [13] and subnetwork
reproducibility for a range of graph densities [14] using DC.
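The matching step can be sketched with SciPy's Hungarian solver (`scipy.optimize.linear_sum_assignment`); this is our own illustration of the evaluation, with subnetworks represented as sets of region indices:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dice(X, Y):
    """Dice coefficient DC = 2|X ∩ Y| / (|X| + |Y|) between two sets."""
    X, Y = set(X), set(Y)
    return 2 * len(X & Y) / (len(X) + len(Y))

def matched_accuracy(extracted, truth):
    """Hungarian matching of extracted to ground-truth subnetworks,
    maximizing total Dice; returns the average DC over matched pairs."""
    D = np.array([[dice(x, y) for y in truth] for x in extracted])
    rows, cols = linear_sum_assignment(-D)   # negate to maximize
    return D[rows, cols].mean()
```

When the extracted subnetworks are a permutation of the ground truth, the matching recovers the permutation and the average DC is 1.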

4.1 Synthetic Data


An example of the various steps of our strategy is shown in Fig. 1c–f to demonstrate how it highlights the modular structure of the graph. With GT (Fig. 1c), node 2 was isolated from subnetwork 1. In contrast, our LT scheme (Fig. 1d) was able to preserve node 2. Also, with our LT (Fig. 1d), one of the between-subnetwork edges (i.e. the edges between nodes 6 and 7 and nodes 6 and 9) was pruned, which would help prevent the two subnetworks from being declared as one, whereas none of the between-network edges was pruned using GT (Fig. 1c). Further, refining the graph (Fig. 1c, d) with our similarity helped to highlight the modular pattern (Fig. 1e, f), e.g. the between-network edges, which were similar to or higher than some within-network edges (especially the edges between nodes 12 and 13, nodes 2 and 1, and nodes 2 and 5) in Fig. 1c, d, were suppressed by our similarity to the lowest values in Fig. 1e, f.

(a) Network structure (b) C (c) C̄ (d) Ĉ (e) C̄^S (f) Ĉ^S

Fig. 1. Schematic illustrating our method using a small-scale example with two subnetworks, each having a provincial hub (blue), linked by a connector hub (orange). In (b), warmer colors indicate higher connectivity and black dots indicate the ground-truth adjacency matrix. We denote by C̄ the globally thresholded and by Ĉ the locally thresholded connectivity matrix. At a graph density of 0.25, GT generated an isolated node 2 in (c), while our LT preserved two edges linked to node 2 in (d). Refining graphs (c) and (d) suppressed the between-network edges (the edges between nodes 6 and 7 and nodes 6 and 9) to the lowest connectivity in (e) and (f).

On the 100 synthetic datasets with 100 nodes, over a density range of [0.005, 0.5] at an interval of 0.01, LTMR achieved significantly higher accuracy (average DC = 0.6735) than GT (average DC = 0.6216, p = 7.56e-10), LT (average DC = 0.6537, p = 2.89e-7), and MST-kNN (average DC = 0.6327, p = 7.38e-8), based on the Wilcoxon signed-rank test. LTMR also achieved a higher DC than GTMR (average DC = 0.6610, p = 0.34), though the difference did not reach significance.

4.2 Real Data


We first evaluated our strategy by examining the overlap between our extracted
subnetworks and 14 well-established brain systems presented in [13], which we

(a) Overlap with established subnetworks (b) Reproducibility over density range

Fig. 2. Subnetwork extraction on real data at graph densities from 0.05 to 0.5 at an interval of 0.05. Blue = GT, green = LT, black = GTMR, cyan = MST-kNN, and red = our proposed LTMR strategy. Dashed lines indicate average values. In (b), the DC at the reference density of 0.2 was left blank, since including DC = 1 might mislead the reader. In both (a) and (b), local thresholding outperforms global thresholding, and modularity reinforcement further increases DC compared to using connectivity alone. Our proposed strategy attained the highest DC overall.

used as ground truth (Fig. 3a). For this assessment, we only considered connectivity matrices based on the Will90fROI atlas [13]. Our proposed LTMR achieved
an average DC of 0.6222, which was significantly higher than GT (average
DC = 0.5384, p = 0.002), MST-kNN (average DC = 0.4567, p = 0.002), GTMR
(average DC = 0.5422, p = 0.006), and higher than LT (average DC = 0.5936,
p = 0.063), as shown in Fig. 2a. At a graph density of 0.5041, corresponding to no
thresholding except negative correlation removal, a DC of 0.5667 was attained,
suggesting that some thresholding to remove noisy edges is beneficial. We note
that although some node-wise variations in connectivity distribution might have
a neuronal basis, we postulate that these variations would be overwhelmed by
the various confound-induced connectivity biases, as supported by how local
thresholding outperforms global thresholding. We further note that an average
m of 11 was estimated with the Laplace approach, whereas an average m of 4
was estimated with modularity maximization. This result reflects the resolution limit of modularity maximization [9], i.e. it tends to underestimate the number of subnetworks by favoring network partitions in which groups of modules are combined into larger communities. This suggests the need to explore alternative techniques
for estimating the number of subnetworks.
We next evaluated the subnetwork reproducibility over a range of graph
densities. We used connectivity matrices based on the HO atlas, which has larger
brain coverage than the Will90fROI atlas but does not have subnetwork labels
assigned to the regions. We set subnetworks corresponding to an edge density
of 0.2 as the reference. Based on the Laplace approach, the optimal number of
subnetworks was found to be 11±5 over the range of graph density examined. Our
proposed strategy achieved an average DC of 0.7302, which is significantly higher
than that of GT (DC = 0.6121, p = 0.004), LT (DC = 0.6677, p = 0.027), MST-
kNN (DC = 0.5737, p = 0.003), and higher than GTMR (DC = 0.7004, p = 0.262),
Fig. 2b. The results hold with other densities used as reference.
138 C. Wang et al.

(a) Will90fROI (b) Global thresholding (c) Local thresholding (d) Proposed

Fig. 3. Subnetwork visualization. 11 subnetworks were extracted from graphs with a
density of 0.2. (a) Well-established brain systems [13]. (b) Two subnetworks formed by
isolated nodes and the false inclusion of premotor-related regions in the auditory system
were observed with global thresholding. (c) Local thresholding failed to detect one region
of the known visual system and falsely assigned four unrelated regions to the dorsal default
mode system. (d) Our strategy correctly detects most of the subnetworks found in [13].

Qualitatively, with GT (Fig. 3b), we observed two subnetworks comprising
only isolated nodes in the left and right Pallidum (yellow and light grey nodes in
the blue circle). We also observed that a region in the right premotor area was
falsely grouped into the auditory subsystem (the light green region with a red
arrow). With GTMR, two subnetworks comprising single nodes were found. As
for LT (Fig. 3c), we observed the left and right insular cortex as well as the right
Frontal Operculum Cortices (orange nodes with red arrows) were falsely grouped
with Dorsal Default Mode regions and the left paracingulate gyrus was excluded.
In contrast, our proposed strategy correctly identified known Dorsal Default
Mode regions, such as paracingulate gyrus, anterior division of cingulate gyrus,
and Accumbens, as a single subnetwork. Further, LT excluded the left Cuneal
Cortex in the visual system (blue arrow in Fig. 3c). Other found subnetworks
with our strategy, such as left and right executive control subnetworks (red and
yellow), Fig. 3d, also conform well to known brain systems as was quantitatively
demonstrated in Fig. 3a.

5 Conclusions
We proposed a modularity reinforcement strategy for improving brain subnet-
work extraction. By applying local thresholding in combination with modular-
ity reinforcement based on connection fingerprint similarity, we attained higher
accuracy in subnetwork extraction compared to conventional global thresholding
and local thresholding. Higher overlap with established brain systems and higher
subnetwork reproducibility were also shown on the real data. Our results thus
demonstrate clear benefits of refining conventional connectivity estimates with
our strategy for subnetwork extraction. In fact, our strategy can be extended
to applications beyond subnetwork extraction by deriving features based on the
extracted subnetworks, e.g. within-subnetwork connectivity computed from the
original connectivity estimates, and using those features for group analysis and
behavioural association studies.
Modularity Reinforcement for Improving Brain Subnetwork Extraction 139

References
1. Fornito, A., Zalesky, A., Breakspear, M.: Graph analysis of the human connectome:
promise, progress, and pitfalls. Neuroimage 80, 426–444 (2013)
2. Achard, S., Coeurjolly, J.F., Marcillaud, R., Richiardi, J.: fMRI functional con-
nectivity estimators robust to region size bias. In: Statistical Signal Processing
Workshop, pp. 813–816. IEEE (2011)
3. Spisák, T., Jakab, A., Kis, S.A., Opposits, G., Aranyi, C., Berényi, E., Emri, M.:
Voxel-wise motion artifacts in population-level whole-brain connectivity analysis
of resting-state fMRI. PLoS ONE 9(9), e104947 (2014)
4. Weiskopf, N., Hutton, C., Josephs, O., Turner, R., Deichmann, R.: Optimized EPI
for fMRI studies of the orbitofrontal cortex: compensation of susceptibility-induced
gradients in the readout direction. Magn. Reson. Mater. Phys. Biol. Med. 20(1),
39–49 (2007)
5. Alexander-Bloch, A.F., Gogtay, N., Meunier, D., Birn, R., Clasen, L., Lalonde, F.,
Lenroot, R., Giedd, J., Bullmore, E.T.: Disrupted modularity and local connec-
tivity of brain functional networks in childhood-onset schizophrenia. Front. Syst.
Neurosci. 4, 147 (2010)
6. Hammond, D.K., Vandergheynst, P., Gribonval, R.: Wavelets on graphs via spec-
tral graph theory. Appl. Comput. Harmonic Anal. 30(2), 129–150 (2011)
7. Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3), 75–174 (2010)
8. Niu, J., Fan, J., Stojmenovic, I.: JLMC: a clustering method based on Jordan-Form
of Laplacian-Matrix. In: Performance Computing and Communications Confer-
ence, pp. 1–8. IEEE (2014)
9. Fortunato, S., Barthelemy, M.: Resolution limit in community detection. Proc.
Nat. Acad. Sci. 104(1), 36–41 (2007)
10. Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E.,
Ugurbil, K., Consortium, W.M.H., et al.: The WU-Minn human connectome
project: an overview. Neuroimage 80, 62–79 (2013)
11. Glasser, M.F., Sotiropoulos, S.N., Wilson, J.A., Coalson, T.S., Fischl, B.,
Andersson, J.L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J.R., et al.: The min-
imal preprocessing pipelines for the human connectome project. Neuroimage 80,
105–124 (2013)
12. Behzadi, Y., Restom, K., Liau, J., Liu, T.T.: A component based noise correction
method (CompCor) for BOLD and perfusion based fMRI. Neuroimage 37(1), 90–
101 (2007)
13. Shirer, W., Ryali, S., Rykhlevskaia, E., Menon, V., Greicius, M.: Decoding subject-
driven cognitive states with whole-brain connectivity patterns. Cereb. Cortex
22(1), 158–165 (2012)
14. Desikan, R.S., Ségonne, F., Fischl, B., Quinn, B.T., Dickerson, B.C., Blacker, D.,
Buckner, R.L., Dale, A.M., Maguire, R.P., Hyman, B.T., et al.: An automated
labeling system for subdividing the human cerebral cortex on MRI scans into gyral
based regions of interest. Neuroimage 31(3), 968–980 (2006)
15. Skudlarski, P., Jagannathan, K., Calhoun, V.D., Hampson, M., Skudlarska, B.A.,
Pearlson, G.: Measuring brain connectivity: diffusion tensor imaging validates rest-
ing state temporal correlations. Neuroimage 43(3), 554–561 (2008)
16. Munkres, J.: Algorithms for the assignment and transportation problems. J. Soc.
Ind. Appl. Math. 5(1), 32–38 (1957)
Effective Brain Connectivity Through
a Constrained Autoregressive Model

Alessandro Crimi1(B), Luca Dodero1, Vittorio Murino1,2, and Diego Sona1,3

1 Pattern Analysis and Computer Vision, Istituto Italiano di Tecnologia, Genoa, Italy
{alessandro.crimi,luca.dodero,vittorio.murino,diego.sona}@iit.it
2 Department of Computer Science, University of Verona, Verona, Italy
3 Neuroinformatics Laboratory, Fondazione Bruno Kessler, Trento, Italy

Abstract. Integration of functional and structural brain connectivity
is a topic receiving growing attention in the research community. Their
fusion can, in fact, shed new light on brain functions. Targeting this issue,
the manuscript proposes a constrained autoregressive model that generates
an "effective" connectivity matrix modeling the structural
connectivity integrated with the functional activity. In practice, an initial
structural connectivity representation is altered according to functional
data, by minimizing the reconstruction error of an autoregressive model
constrained by the structural prior. The proposed model has been tested
in a community detection framework, where the brain is partitioned using
the effective network across multiple subjects. Results showed that using
the effective connectivity the resulting clusters better describe the func-
tional interactions of different regions while maintaining the structural
organization.

Keywords: Autoregressive model · Spectral clustering · Connectome · Effective connectivity · Community detection · Brain imaging

1 Introduction

It is now well established in neuroscience that functional segregation is a fundamental
principle of brain organization. However, the human brain is a complex network
characterized by spatially interconnected regions that can be causally or concur-
rently activated during specific tasks or at rest. As a consequence, integration
of segregated regions is emerging as the most probable organization explaining
the complexity of brain function. This principle is however difficult to prove and
a possible approach is to investigate it through functional connectivity analysis,
which is usually based on the determination of functional correlations.
The main limitation of functional connectivity analysis is that structural
information is not used at all, preventing any interpretation in terms of effec-
tive connectivity. Indeed, understanding the relationships between the functional
activity in different brain regions and the structural network highlighted using
tractography can convey useful information about brain functions. While in [19]

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 140–147, 2016.
DOI: 10.1007/978-3-319-46720-7 17
it has been shown there is a significant overlap between neuroanatomical connections
and correlations of fMRI signals, it is yet to be understood how whole-brain
network interactions relate during specific tasks or at rest, where fundamental
signals have been suggested to play a key role [4].
It has also been shown, for example, that the intensity of functional activity
can be predicted by the strength of structural connections, suggesting that it
is possible to predict resting-state functional activity from structural informa-
tion [11]. On the same line, a predictive framework based on multiple sparse lin-
ear regression has been recently used to predict functional series from structural
data [2], and a model called the “virtual brain” has been designed to simulate
brain activity in injured and healthy subjects [12]. With the aim of highlighting
the relationships between functional and structural connectivity, a different
approach relying on a Bayesian framework to estimate the functional connectivity
using a structural graph as a prior [10] has been proposed. However, attempts at
generating functional connectivity from structural information have so far been
challenging, due to the fact that certain high correlations appear between brain
regions not directly linked by structural connections.
An alternative schema adopted to infer the functional directed connections
from brain activity is based on methods investigating the functional causality like
the Dynamical Causal Model (DCM) [5] and the Granger Causality (GC) [8].
DCM uses an explicit model of regional neural dynamics to capture changes
in regional activations and inter-regional coupling in response to stimulus or
task demands. Statistical inference is used to estimate parameters for directed
influences between neuronal elements. While this is a powerful method to
study effective connectivity, its main limitation is the combinatorial complexity
in the number of modeled regions and connections, which limits its applicability
to only a few regions. GC also measures the causal influence and the flow of
information between regions. Although the slow dynamics and the regional
variability of the hemodynamic response make it a controversial method for fMRI
data, it has also been used to identify the dynamics of Blood-Oxygen-Level
Dependent (BOLD) signal flow between brain regions [7,14]. The advantage of
GC is that it is based on a Multivariate Autoregressive model (MAR), a ran-
dom process specifying that the variables depend linearly on their own previous
values and on a stochastic term. Thanks to this, GC does not suffer from excessive
computational complexity. Nonetheless, in the case of a large number of regions
involved in the analysis, it is affected by cancellation issues and high sensitivity to
noise. This makes it impossible to perform a whole brain analysis involving many
brain regions. Moreover, this approach usually does not consider the structural
connectivity, unless considering one fiber/connection at a time with univariate
analysis [13].
Motivated by these limitations, this paper presents an extension of
the MAR model introducing a constrained MAR (CMAR), which uses the struc-
tural connectivity as a prior to bound the search space during parameter fitting.
This fusion of structural connectivity and functional time-series aims at repre-
senting an effective brain connectivity, addressing a whole brain analysis thanks
to the sparse representation of the connectivity matrix. Besides, the advantages
of the proposed method are that only the structural connections justified by the
functionality survive, and it can refine possible false positive connections [1].
The validation of the proposed method has been performed testing the quality
of results in a typical brain community detection framework, using a recently
developed technique based on group-wise spectral clustering [3]. This investigation
validated the hypothesis that clustering the brain using the effective
connectivity matrix retrieved with CMAR preserves the structural partitioning
while also optimizing the graph cut that minimizes the loss of functional
interactions.

2 Method
The aim of the proposed method is to reinforce the association between structural
brain connectivity and functional brain activation. To obtain this result, we
resort to a multivariate autoregressive model properly modified in order to allow
for the estimation of the temporal brain activation biased by the structural
connectivity.

2.1 Structurally Constrained Autoregressive Model


A multivariate autoregressive model of order n (MAR(n)) is a stochastic process
defining r variables y(t) as linearly dependent on their own previous values and
on a stochastic term ε: y(t) = Σ_{i=1}^{n} A_i · y(t − i) + ε, where the coefficient
matrices A_i are the model parameters. These can be estimated by minimizing
the discrepancy between the observed and the implied covariance, or constrained
by prior knowledge, though this is cumbersome for detailed brain-wise tasks.
Given a structural connectivity matrix Ainit , the functional signal in a
column-vector y for all r regions of interest and all T frames, an improved
connectivity matrix can be determined minimizing a reconstruction error of a
MAR(1) model. Indeed, to fit the model parameters A we used a gradient descent
approach minimizing the reconstruction error in a least square fashion:
E = (1/2) Σ_{t=0}^{T−1} ||A · y_t − y_{t+1}||_2^2 ,   (1)

with the gradient with respect to A computed as follows:

∇E = Σ_{t=0}^{T−1} (A · y_t − y_{t+1}) · y_t^⊤ ,   (2)

where (·)^⊤ denotes the transpose. To introduce the structural bias into the model,
the parameters fitting is constrained by the structural information in the update
rule:
Anew = (Aold − η∇E) ⊙ B,   (3)
Effective Brain Connectivity Through a Constrained Autoregressive Model 143

where η is the learning rate, ⊙ is the Hadamard (element-wise) product, and
B is a matrix of the same size as Ainit, with each element bij defined as

bij = 0 if the ij element of Ainit is 0, and bij = 1 otherwise.

In this manner, the absence of connections between certain regions in the initial
matrix Ainit is enforced at each iteration: connections originally set to zero are
reset whenever the gradient descent introduces non-null values in them. Thus,
B and Ainit encode the prior structural connectivity that reinforces the
relationship between functional and structural data. During the gradient
descent, it is possible that some values of the matrix Anew become negative,
which would be meaningless in terms of causality. Therefore, at each iteration
those values are set to zero to correct the descent. Setting negative values to zero
at each iteration in the matrix Anew is a common way to enforce non-negativity
during learning. Clearly, this has an effect on accuracy, as the solution is sub-optimal
with respect to an unconstrained fit. On the other hand, the need
for non-negative coefficients is motivated by the fact that negative causalities
cannot be interpreted.
The approach can be easily generalized to higher order MAR, where a differ-
ent matrix Ai has to be optimized for each order. Choosing the right model-order
is a trade-off between optimizing the variance and the model complexity. In our
experiments to have a direct validation of the method, we limited the model to
the first order.
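A minimal sketch of this constrained fitting (Eqs. (1)–(3)) for a first-order model is given below; the learning rate, fixed iteration count, and function name are illustrative choices, not the authors' implementation.

```python
import numpy as np

def fit_cmar(A_init, Y, eta=1e-4, n_iter=500):
    """Fit a first-order constrained MAR (CMAR) model.

    A_init : (r, r) structural connectivity prior.
    Y      : (r, T) functional time series, one row per region.
    Returns the effective connectivity matrix.
    """
    B = (A_init != 0).astype(float)       # structural mask for Eq. (3)
    A = A_init.astype(float).copy()
    for _ in range(n_iter):
        # gradient of the reconstruction error, Eq. (2)
        residual = A @ Y[:, :-1] - Y[:, 1:]
        grad = residual @ Y[:, :-1].T
        A = (A - eta * grad) * B          # masked descent step, Eq. (3)
        A[A < 0] = 0.0                    # enforce non-negative causality
    return A
```

Entries zeroed by the mask B can never become non-zero again, which is exactly how the structural prior bounds the search space; the clipping step implements the non-negativity correction discussed above.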

2.2 Effective Brain Community Detection


To assess whether the proposed method produces an effective connectivity infor-
mation characterizing the structural connectivity enriched with functional infor-
mation, a community detection analysis has been performed using a group-wise
graph clustering algorithm recently proposed in [3] both on the set of structural
connectivity matrices and on the effective connectivity matrices.
Given a set of connectivity matrices W = {Wi } representing undirected
weighted graphs with positive weights, the normalized graph Laplacian is built
as Li = Di^{−1/2} (Di − Wi) Di^{−1/2}, where Di is the diagonal degree matrix of Wi.
However, in general, the connectivity matrices resulting from the above CMAR
model computed for each subject are asymmetric (i.e., edges are directed), thus,
they have been converted to undirected graphs aiming at maintaining the proper-
ties of the original graphs estimated from CMAR. To this aim, a symmetrization
based on random walk was applied [15].
More specifically, given a directed graph M, the transition matrix of the
random walk can be defined as P = D_out^{−1} M, where D_out is a diagonal matrix
built using the nodes' out-degrees. The symmetric graph can therefore be defined as
M_sym = (1/2)(ΠP + P^⊤Π), where Π is the diagonal matrix defining the
probability of a walker staying in each node under the stationary distribution,
defined as Π = diag(d_out)/m, where d_out is the vector of out-degrees of each
node and m is the number of nodes. Thanks to this new representation, the pipeline described by


the authors in [3] can be applied, generating the Normalized Graph Laplacians
for each subject, performing the joint diagonalization of multiple Laplacians to
find a unique eigenspace and, finally, applying spectral clustering on the smallest
joint eigenvectors. In order to decide the number of clusters, as usual in spectral
clustering, we can look at the spectral gap of the mean approximated eigenvalues.
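The symmetrisation and Laplacian construction above can be sketched as follows; Π is built from the out-degrees divided by the number of nodes m, exactly as in the text (this fixes the stationary distribution only up to a constant factor, which does not affect clustering). Function names are illustrative.

```python
import numpy as np

def symmetrize_random_walk(M):
    """Random-walk symmetrisation of a directed graph [15].

    M : (n, n) non-negative adjacency matrix; every node is assumed
    to have at least one outgoing edge.
    Returns M_sym = (Pi P + P^T Pi) / 2.
    """
    d_out = M.sum(axis=1)
    P = M / d_out[:, None]                 # P = D_out^{-1} M
    Pi = np.diag(d_out / M.shape[0])       # Pi = diag(d_out) / m, as in the text
    return 0.5 * (Pi @ P + P.T @ Pi)

def normalized_laplacian(W):
    """L = D^{-1/2} (D - W) D^{-1/2} for an undirected graph W."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
```

Because Π is diagonal, (ΠP)^⊤ = P^⊤Π, so the output is symmetric by construction and can be fed directly into the joint Laplacian pipeline of [3].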

3 Data and Experimental Settings


All experiments have been performed on 20 right-handed healthy subjects from
the Nathan Kline Institute-Rockland dataset [16]. For each subject, fMRI, DTI
and T1 have been acquired and co-registered. FMRI data were acquired using a
3 T scanner, with TR/TE times of 1.4 s/30 ms, flip angle 65°, and isotropic voxel
size of 2 mm, resulting in resting-state time series 10 min long, where subjects
were asked to keep the eyes open. DTI volumes were acquired with a 1.5 Tesla
scanner and isotropic voxel-size of 2.5 mm using 35 gradient directions. The T1
weighted MRI data were acquired with the same scanner, using TR/TE times of
1.1 s/4.38 ms, flip angle 15◦ , and isotropic voxel-size of 1 mm.

3.1 Pre-processing and Connectome Construction


FMRI data have been pre-processed according to a standard pipeline: motion
correction, mean intensity subtraction, pass-band filtering with cutoff frequencies
of [0.005–0.1 Hz] and skull removal. To account for potential noise from physi-
ological processes such as cardiac and respiratory fluctuations, nine covariates
of no interest have been identified for inclusion in our analyses [18]. To further
reduce the effects of motion, compensation for frame-wise displacement has been
carried out [17]. Eddy current correction and skull stripping have been carried
out as the pre-processing for the DTI data. Linear registration has been applied
between the AAL atlas and the T1 reference volume by using linear registration
with 12 degrees of freedom.
Tractographies for all subjects have been generated processing DTI data with
the Python library Dipy [6]. In particular, a deterministic algorithm called Euler
Delta Crossings has been used, starting from 2,000,000 seed-points and stopping
when the fractional anisotropy was smaller than 0.1. Tracts shorter than
30 mm or in which a sharp angle occurred have been discarded. The final result
yielded about 250,000 fibers. To construct the connectome, the graph nodes
have been determined using the 90 regions in the AAL atlas. Specifically, the
structural connectome has been built counting the number of tracts connecting
two regions, for any pair of regions. The same regions have been used to compute
the averaged functional time series from the voxels in each region.
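The connectome construction step (counting tracts between every pair of atlas regions) can be sketched as below. Assigning a streamline by the atlas labels of its two endpoints is one common convention; the function name, input layout, and rounding of endpoint coordinates to voxel indices are illustrative assumptions, not the exact Dipy-based pipeline used here.

```python
import numpy as np

def structural_connectome(streamlines, label_volume, n_regions):
    """Count streamlines connecting each pair of atlas regions.

    streamlines  : iterable of (k, 3) arrays of voxel coordinates.
    label_volume : 3-D integer array, 0 = background, 1..n_regions labels.
    Returns an (n_regions, n_regions) symmetric count matrix.
    """
    C = np.zeros((n_regions, n_regions), dtype=int)
    for sl in streamlines:
        # atlas label at each endpoint of the tract
        a = label_volume[tuple(np.round(sl[0]).astype(int))]
        b = label_volume[tuple(np.round(sl[-1]).astype(int))]
        if a > 0 and b > 0 and a != b:
            C[a - 1, b - 1] += 1
            C[b - 1, a - 1] += 1
    return C
```

The same region labels can then be used to average the fMRI signal over each region's voxels, giving the node-wise time series used by the CMAR model.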

4 Results and Discussions


Figure 1 depicts an example from one subject of (a) the original structural con-
nectivity matrix and (b) the effective connectivity resulting from autoregressive
Fig. 1. Example of adjacency matrix for one subject plotted in logscale: (a) initial
matrix obtained from the tractography; (b) effective matrix obtained with the proposed
autoregressive model. The convergence process is shown in subfigure (c) as a decrease
of error defined in Eq. (1), where each color line is a different subject.

model filtering. The figures highlight that some structural connections which are
not “used” by the resting-state functional data are canceled out. This has clearly
an effect on the subsequent clustering, which is shown in Fig. 2. Indeed, by ana-
lyzing the group-wise eigenvalues resulting from the joint Laplacians diagonal-
ization, a spectral gap has been noted at the 4th and 8th eigenvalues for both
structural and effective connectivity matrices, in agreement with previous studies
on other datasets [9]. The value k = 8 has been chosen to cluster the brain as
it better explains the known brain communities. The resulting clustering of the
brain regions based on the structural connectome (Fig. 2(a)) and on the effective
connectome (Fig. 2(b)) are slightly different while preserving the overall organi-
zation. Regarding convergence, the gradient descent finds a sub-optimal solution
by definition. The zeroing step makes convergence more cumbersome but does not
prevent the descent from reaching a minimum, as shown in Fig. 1(c).

Fig. 2. Axial view of joint spectral clustering using k = 8 on (a) the original struc-
tural joint eigenspace, and (b) on the joint eigenspace given by effective connectivity
matrices.
Fig. 3. (a) Reconstruction error of CMAR model after converging to the effective
connectivity (green circles) or according to block-wise MAR based on the structural
communities (red squares) and the effective communities (blue crosses). The lower the
better. (b) Functional segregation of clusters using the effective communities (green
dots) or structural communities (black stars). The higher the better.

We also devised an analysis to assess whether the clusters obtained from the
autoregressive-filtered data are more meaningful in relation to the fMRI time-series
than the clusters obtained from the structural information. We carried out
a block-wise definition of the effective connectivity matrices where one block at
a time, defined by the brain regions belonging to a cluster, is used in a CMAR
model involving only the relative fMRI series. Then, the reconstruction error of
the fitted CMAR models for each cluster have been summed up over all clus-
ters and compared each other. The underlying intuition is that partitioning the
brain using an effective connectivity information would remove those structural
connections which are also meaningless from a functional perspective, at least
in the analyzed experimental data.
The reconstruction error per subject in Fig. 3(a) shows, as expected, that
the lowest error is given by considering the whole network in the CMAR com-
putation. However, when removing some connections according to the clustering
results, the communities determined from the effective connectivity matrix prove
to be more self-explanatory in terms of functional activity than the communities
obtained from the structural connectivity only. Similar evidence is obtained
when analyzing the cluster functional separation (CFS), defined as the average
ratio between the intra- and inter-cluster cross-correlation as follows:
CFS = (1/k) Σ_{s=1}^{k} [ Σ_{i<j∈C_s} w_ij / ( Σ_{i<j∈C_s} w_ij + Σ_{i∈C_s} Σ_{j∈C_t≠C_s} w_ij ) ]   (4)

where wij is the functional cross-correlation of the time-series for nodes i and j.
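Given a matrix of functional cross-correlations and a cluster assignment, Eq. (4) can be computed directly, e.g. as in the sketch below (names illustrative):

```python
import numpy as np

def cluster_functional_separation(W, labels):
    """Cluster functional separation (CFS), Eq. (4).

    W      : (n, n) symmetric matrix of functional cross-correlations.
    labels : (n,) cluster assignment of each node.
    Returns the average intra / (intra + inter) correlation ratio.
    """
    ratios = []
    for c in np.unique(labels):
        in_c = labels == c
        # sum over unordered pairs i < j inside cluster C_s
        block = W[np.ix_(in_c, in_c)]
        intra = block[np.triu_indices(in_c.sum(), k=1)].sum()
        # sum over pairs with one node inside and one outside C_s
        inter = W[np.ix_(in_c, ~in_c)].sum()
        ratios.append(intra / (intra + inter))
    return float(np.mean(ratios))
```

A CFS of 1 corresponds to clusters with no inter-cluster correlation at all, i.e. perfect functional separation; higher values therefore indicate communities that better respect the functional signal.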
This index has been computed for both structural and effective clustering
result. Figure 3(b) shows that CFS with clusters determined using our CMAR
approach is significantly higher when compared with the structural clusters (p <
0.001), demonstrating that the effective clusters are also underpinned by the
functional connectivity, although experiments on larger datasets are required.
5 Conclusions
The effective connectivity inferred by the proposed CMAR model highlights a
different brain architecture underpinned by both structural and functional con-
nectivity. Thanks to this, the method can lead to new insights into understanding
brain effective connections in healthy and pathological subjects.

References
1. Chen, H., et al.: Optimization of large-scale mouse brain connectome via joint
evaluation of DTI and neuron tracing data. NeuroImage 115, 202–213 (2015)
2. Deligianni, F., et al.: A framework for inter-subject prediction of functional con-
nectivity from structural networks. IEEE TMI 32(12), 2200–2214 (2013)
3. Dodero, L., Gozzi, A., Liska, A., Murino, V., Sona, D.: Group-wise functional
community detection through joint Laplacian diagonalization. In: Golland, P.,
Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol.
8674, pp. 708–715. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10470-6 88
4. Fox, M.D., et al.: The human brain is intrinsically organized into dynamic, anti-
correlated functional networks. PNAS 102(27), 9673–9678 (2005)
5. Friston, K.J.: Functional and effective connectivity: a review. Brain Connect. 1(1),
13–36 (2011)
6. Garyfallidis, E., et al.: Dipy, a library for the analysis of diffusion MRI data. Front.
Neuroinformatics 8, 8 (2014)
7. Goebel, R., et al.: Investigating directed cortical interactions in time-resolved fMRI
data using vector autoregressive modeling and Granger causality mapping. Magn.
Reson. Imaging 21(10), 1251–1261 (2003)
8. Granger, C.: Investigating causal relations by econometric models and cross-
spectral methods. Econometrica 37(3), 424–438 (1969)
9. Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C.J., et al.: Mapping
the structural core of human cerebral cortex. PLoS Biol. 6(7), e159 (2008)
10. Hinne, M., Ambrogioni, L., Janssen, R.J., Heskes, T., et al.: Structurally-informed
Bayesian functional connectivity analysis. NeuroImage 86, 294–305 (2014)
11. Honey, C., et al.: Predicting human resting-state functional connectivity from
structural connectivity. PNAS 106(6), 2035–2040 (2009)
12. Jirsa, V., et al.: Towards the virtual brain: network modeling of the intact and the
damaged brain. Arch. Ital. Biol. 148(3), 189–205 (2010)
13. Li, X., Li, K., Guo, L., Lim, C., Liu, T.: Fiber-centered granger causality analysis.
In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6892,
pp. 251–259. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23629-7 31
14. Liao, W., et al.: Small-world directed networks in the human brain: multivariate
Granger causality analysis of resting-state fMRI. NeuroImage 54, 2683–2694 (2011)
15. Malliaros, F.D., Vazirgiannis, M.: Clustering and community detection in directed
networks: a survey. Phys. Rep. 533(4), 95–142 (2013)
16. Nooner, K., et al.: The NKI-Rockland sample: a model for accelerating the pace
of discovery science in psychiatry. Front. Neurosci. 6, 152 (2012)
17. Power, J.D., et al.: Spurious but systematic correlations in functional connectivity
MRI networks arise from subject motion. NeuroImage 59(3), 2142–2154 (2012)
18. Saad, Z.S., et al.: Correcting brain-wide correlation differences in resting-state
fMRI. Brain Connect. 3(4), 339–352 (2013)
19. Vincent, J., et al.: Intrinsic functional architecture in the anaesthetized monkey
brain. Nature 447(7140), 83–86 (2007)
GraMPa: Graph-Based Multi-modal Parcellation
of the Cortex Using Fusion Moves

Sarah Parisot1(B), Ben Glocker1, Markus D. Schirmer2, and Daniel Rueckert1

1 Biomedical Image Analysis Group, Imperial College London, London, UK
s.parisot@imperial.ac.uk
2 Stroke Division, Massachusetts General Hospital, Harvard Medical School, Boston, USA

Abstract. Parcellating the brain into a set of distinct subregions is
an essential step for building and studying brain connectivity networks.
Connectivity driven parcellation is a natural approach, but suffers from
the lack of reliability of connectivity data. Combining modalities in the
parcellation task has the potential to yield more robust parcellations,
yet hasn’t been explored much. In this paper, we propose a graph-
based multi-modal parcellation method that iteratively computes a set
of modality specific parcellations and merges them using the concept of
fusion moves. The merged parcellation initialises the next iteration, forc-
ing all modalities to converge towards a set of mutually informed parcel-
lations. Experiments on 50 subjects of the Human Connectome Project
database show that the multi-modal setting yields parcels that are more
reproducible and more representative of the underlying connectivity.

1 Introduction

Advances in neuroimaging have provided a tremendous amount of in-vivo information
about the brain's organisation. In particular, studying connectivity net-
works from a network theoretic perspective has shown great potential and
receives growing interest [15]. An essential step for carrying out network analy-
sis is the definition of the nodes of the networks, as the high dimensionality of
the data acquired at the voxel or vertex level hinders tractable and meaning-
ful analysis. Nodes are typically obtained by parcellating the brain into a set
of subregions. Connectivity driven parcellation is a natural approach, as each
network node will comprise regions with similar connectivity profiles. Cortical
parcellation can also be approached from the perspective of identifying function-
ally specified cortical areas, an elusive objective pursued for over
a century [4]. Cortical areas can be defined based on their local microstructure,
connectivity and function. This suggests that a single modality is not sufficient
for identifying these areas [4]. Brain parcellation has notably been carried out

S. Parisot—The research leading to these results has received funding from the
European Union’s Seventh Framework Programme (FP/2007-2013)/ERC Grant
Agreement No. 319456.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 148–156, 2016.
DOI: 10.1007/978-3-319-46720-7 18
using myelin [5], diffusion MRI (dMRI) and tractography [10,12] and functional
MRI (fMRI) data [2,3]. Yet, these modalities suffer from important drawbacks
that cannot be addressed in a mono-modal setting. dMRI is prone to false neg-
atives and biased with respect to the location of fibre terminations, fMRI can
be noisy and prone to false positives, while myelin lacks information outside the
motor area and visual cortex.
Exploring parcellations driven by several modalities could provide more
robust and accurate cortical delineations. For instance, strong similarities have
been observed between myelin maps and resting-state fMRI gradients [5], while
functional and structural connectivity are intrinsically linked. Few methods have
attempted to combine different modalities. The majority of efforts have aimed to
construct a more robust fMRI connectivity matrix informed from tractography,
for instance by eliminating functional connections that do not have a struc-
tural support [11]. This kind of approach however assumes a strong reliability of
dMRI data and a global agreement between structural and functional connec-
tivity. Markov Random Field (MRF) models have been applied successfully to
dMRI and fMRI driven cortical parcellation tasks [6,13]. Their main advantage
is their versatility, in the sense that no restriction is made on the data term
driving the parcellation scheme. As a result, the same framework can be used
for parcellation tasks using different kinds of input data.
In this paper, we exploit this idea and extend the mono-modal MRF mod-
els to the multi-modal setting. We propose an iterative approach where each
iteration computes a set of parcellations driven by a single modality. These par-
cellations are subsequently merged based on each modality’s local reliability
using fusion moves [9]. The merged parcellation initialises the next iteration,
forcing the different modalities to converge towards a set of mutually informed
parcellations. The method was tested on the Human Connectome Project (HCP)
database using myelin maps, and fMRI and dMRI data. Focusing on fMRI par-
cellation, our experiments show that the multi-modal setting yields parcels that
are more reproducible and more representative of the underlying connectivity.

2 Methods

As illustrated in Fig. 1, our method alternates between generating a set of N
modality specific parcellations (Sect. 2.1), and the construction of a joint parcel-
lation that initialises the next iteration (Sects. 2.2 and 2.3).

2.1 Modality Specific Markov Random Field Formulation

Considering a set of N aligned modalities, we represent the brain’s cortical sur-
face as a triangular mesh M = {V, E}, where each vertex is associated with
modality specific data (e.g. fMRI timeseries). We cast the parcellation task as
a labelling problem where, for each modality, we aim to assign a label l_v^mod to
each vertex v, with l_v^mod ∈ L = {1, . . . , K}, where K is the number of sought
parcels and each label corresponds to a parcel assignment. This can be done in the
150 S. Parisot et al.

Fig. 1. Overview of the proposed iterative method. Each iteration updates an initial
parcellation into a set of modality specific parcellations using an MRF model. The
parcellations are then merged based on the modalities’ relative influences into a multi-
modal parcellation, which initialises the next iteration.

mono-modal setting following the coarse-to-fine iterative approach proposed in
[13]. Given an initial parcellation, each update consists of: (1) defining a parcel
specific property based on the current parcellation and (2) updating the parcella-
tion by optimising an MRF model. This corresponds to minimising the following
energy:

    E(l^mod) = Σ_{v∈V} D_v^mod(l_v^mod) + β Σ_{v∈V} Σ_{w∈N(v)} U_{v,w}(l_v^mod, l_w^mod)    (1)

The pairwise term U_{v,w}(l_v^mod, l_w^mod) acts as a smoothing prior and is designed
for all modalities as a Potts model that penalises assigning different labels to
neighbours with a constant cost. The unary cost D_v^mod(l_v^mod) defines the likeli-
hood of assigning vertex v to a specific parcel. We consider three different kinds
of modalities that can be used to drive the parcellation scheme and associate
each modality with a specific unary cost D_v^mod(l_v^mod) and parcel property.
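For a candidate labelling, the energy of Eq. (1) is simple to evaluate; a minimal sketch (function and argument names are ours, and the Potts term here counts each undirected mesh edge once):

```python
import numpy as np

def mrf_energy(labels, unary, edges, beta):
    """Evaluate the MRF energy of Eq. (1) for one modality:
    the sum of unary costs D_v(l_v) plus a Potts smoothing term that
    charges a constant beta for every mesh edge whose endpoints
    carry different labels (each undirected edge counted once)."""
    labels = np.asarray(labels)
    data_term = unary[np.arange(len(labels)), labels].sum()
    v, w = np.asarray(edges).T
    smooth_term = beta * np.count_nonzero(labels[v] != labels[w])
    return data_term + smooth_term
```

The sketch only evaluates the energy; in the paper the minimisation itself is delegated to fastPD [8].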
Diffusion MRI. We adopt the unary cost used in [13] which showed good per-
formance for tractography data. The unary cost computes the Pearson’s correla-
tion coefficient between the vertices’ tractography connectivity profiles and the
parcel centres. Parcel centres are the vertices that have the highest correlation
to the rest of their currently assigned parcel.
Functional MRI. Each vertex is associated with fMRI timeseries. Similarly
to dMRI, we obtain a parcel centre by maximising the within parcel timeseries
correlation. Due to the low SNR of fMRI data, we follow the approach proposed
in [6] and compute a representative average timeseries for each label. We average
the timeseries of the N closest vertices (in terms of shortest path) to the parcel
centre that are in the same parcel. The unary cost for a given vertex is then the
correlation between the vertex’s timeseries and the parcel’s average timeseries.
Non Connectivity Data (e.g. Myelin Maps). Each vertex is associated
with a scalar value. We use unary costs and parcel properties inspired from

MRF-based image segmentation. The parcel centre is defined as the geometric
centre of the parcel (obtained by erosion of the parcel), and is assigned the
average vertex value within the parcel. The unary cost is the shortest path on
the mesh M between the centre and the vertex v, where the edges are weighted
by the absolute difference between the values of the parcel centre and the vertex.
The use of the shortest path allows the parcels to be contiguous.
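The scalar-map unary above can be sketched as a weighted shortest path; inputs and names are hypothetical, edge weights follow the description (absolute difference between the centre's value and the edge endpoint's), and a tiny epsilon is added because scipy's sparse graph routines treat zero entries as missing edges:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def scalar_unary(values, edges, centre, eps=1e-12):
    """Shortest-path unary cost for a scalar modality (e.g. myelin):
    geodesic distance on the mesh from the parcel centre, with each
    directed edge (u -> w) weighted by |values[w] - values[centre]|."""
    n = len(values)
    rows, cols, weights = [], [], []
    for u, w in edges:
        for a, b in ((u, w), (w, u)):
            rows.append(a)
            cols.append(b)
            # eps keeps zero-cost edges from being dropped by the sparse graph
            weights.append(abs(values[b] - values[centre]) + eps)
    graph = csr_matrix((weights, (rows, cols)), shape=(n, n))
    return dijkstra(graph, indices=centre)
```

Vertices whose values match the centre accumulate (almost) no cost, so parcels grow along regions of homogeneous signal while staying contiguous.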
This set-up yields independent mono-modal parcellations for a set of modal-
ities, where each modality is parcellated independently and subject to modality
specific noise. In particular, parcellations may not be coherent, making com-
parisons difficult. Forcing all parcellations to agree (e.g. by using inter-modality
pairwise costs) would not be adequate as the modalities are not expected to pro-
vide information that agrees across the whole cortex [4] and modalities should
not have the same local importance. We propose to force the parcellations to
converge towards a set of coherent parcellations by initialising each iteration with
a multi-modal parcellation computed by merging the individual parcellations.

2.2 Merging Modalities with Fusion Moves


Modelling the merging problem as an MRF of energy E^m(l) = Σ_{v∈V} U_v^m(l_v) +
Σ_{v∈V} Σ_{w∈N(v)} U_{v,w}(l_v, l_w) allows us to tailor the model according to the modal-
ities considered, their interaction, and the quality of the data. We propose to
define the unary cost U_v^m(l_v) based on the mono-modal labellings and how impor-
tant and reliable a modality is locally. For instance, U_v^m(l_v) can give lower costs
to labels selected by the most reliable modalities. Reliability could be defined
based on prior knowledge, or in a data-driven way using segmentation uncer-
tainties obtained from the mono-modal MRFs [7]. U_v^m(l_v) could also integrate
parcel boundaries obtained from a different source (e.g. registered atlases or
expert annotations). An example of unary costs is presented in Sect. 2.3.
Each mono-modal solution can be seen as a suboptimal solution to the multi-
modal problem, where the modality is given more importance compared to the
others. This makes the concept of fusion moves particularly well suited to solving
our problem. Fusion moves [9] cast the task of combining an ensemble of subopti-
mal solutions as a set of simple binary MRF subproblems. In each binary optimi-
sation (a fusion move), we consider two suboptimal labellings l1 and l2 ∈ LV . We
seek to label the cortical mesh M with a binary label b ∈ {0, 1}V so as to obtain
a combination lc defined as lc (b) = l1 ◦ (1 − b) + l2 ◦ b, where ◦ is the Hadamard
product. The fusion move is carried out by minimising the binary MRF energy
E b (b) = E m (lc (b)). The next fusion move considers a new modality and the
current combined parcellation lc as the suboptimal labellings to merge. Fusion
moves are repeated for all modalities until the combined labelling is no longer
updated. The joint parcellation then initialises the next resolution.
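A fusion move can be illustrated with a brute-force stand-in for the binary optimisation (the paper uses QPBO [14], which scales to real meshes); the labellings and the `energy` callable are hypothetical:

```python
import numpy as np
from itertools import product

def fuse(l1, l2, b):
    """Combined labelling l_c(b) = l1 * (1 - b) + l2 * b (Hadamard product):
    vertex v keeps l1[v] where b[v] = 0 and takes l2[v] where b[v] = 1."""
    l1, l2, b = (np.asarray(a) for a in (l1, l2, b))
    return l1 * (1 - b) + l2 * b

def fusion_move(l1, l2, energy):
    """Exhaustive search over binary labels b (illustration only).
    Returns the lowest-energy per-vertex combination of l1 and l2."""
    best_b = min(product((0, 1), repeat=len(l1)),
                 key=lambda b: energy(fuse(l1, l2, np.array(b))))
    return fuse(l1, l2, np.array(best_b))
```

Because b is chosen per vertex, the merged labelling can never have higher energy than either input labelling.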

2.3 Application to Multi-modality Informed rs-fMRI Parcellation


We propose to apply the multi-modal framework to increase the robustness
of resting-state fMRI (rs-fMRI) driven parcellations. rs-fMRI provides useful

information all around the cortex. However, it suffers from a low SNR which can
significantly impact the obtained parcels and their reproducibility. Introducing
multi-modal information in the parcellation scheme could be a way of address-
ing this issue. We consider combining rs-fMRI, dMRI and myelin maps due to
their expected similarities. We define the merging unary costs based on how
informative the modalities are, and from prior knowledge of their weaknesses:
U_v^m(l_v) = min_{mod∈[1,N]} (1 − α_v^mod δ(l_v, l_v^mod)), where α^mod ∈ [0, 1]^V are costs
that describe the local reliability of all modalities, δ(., .) is the Kronecker delta
function and l^mod are the labellings obtained from mono-modal parcellations.
All α^mod costs are rescaled between 0 and 1.
Because of rs-fMRI’s low SNR, the joint parcellation should be influenced
by the other modalities when they are reliable. We therefore assign a uniform
reliability α^fMRI = 0.5 to rs-fMRI across the whole cortex. Myelin maps should
influence the merged parcellation in regions where strong variations of myeli-
nation are observed. We therefore define the myelin cost as the gradient of the
pre-smoothed myelin maps (see Fig. 3b). Finally, dMRI tractography suffers from
a gyral bias: tractography streamlines tend to terminate preferentially in gyri
[17]. This bias influences the boundaries of dMRI driven parcellations that tend
to align with cortical folding. To evaluate which vertices are impacted by this
bias, we compute for each vertex v the ratio of the number of fibres that ter-
minate at v over the number of connections obtained by sending streamlines
from v. As shown in Fig. 3a, this measurement supports the gyral bias theory
as the resulting map agrees with cortical folding patterns. Using this measure
as a unary cost prevents the vertices affected by the bias from influencing the joint
parcellation. In this setting, dMRI will have little influence on parcel boundaries
and essentially act as a smoothing prior, indicating which vertices should be in
the same parcel. As a result, we expect the converged joint parcellation to be
similar to the rs-fMRI parcellation.

Fig. 2. Quantitative evaluation measures. From left to right in each figure, we compare
the mono-modal parcellations to the merged and the multi-modality guided rs-fMRI
parcellations. Lower BIC values (d) are better. Paired t-test results are shown as non-
significant (n.s), p < 0.05 (*) and p < 0.001 (**).

3 Results
Evaluation of cortical parcellations is challenging due to the absence of ground
truth. Our proposed evaluation has two main objectives: evaluate (i) whether
multi-modality increases the robustness of the parcellation method, (ii) how well
the parcellations reflect the underlying connectivity. Since our application is tai-
lored to construct more reliable rs-fMRI parcellations, we focus our evaluation
on this modality. We evaluate the impact of multi-modal information by com-
paring the mono-modal rs-fMRI driven parcellation to the joint and individual
rs-fMRI parcellations obtained using our Graph-based Multi-modal Parcella-
tion (GraMPa) method. We tested GraMPa on 50 randomly selected subjects
(left hemisphere) of the HCP database (S500 release) and used the HCP’s pre-
processed fMRI and dMRI data, and myelin maps. dMRI tractography con-
nectivity profiles are obtained using FSL’s bedpostX and probtrackX [1]. 5000
streamlines are sampled from each mesh vertex. We perform rs-fMRI driven
parcellation using timeseries from a 30 min acquisition. Evaluation is performed
on a second independent 30 min acquisition to test the method’s robustness.
The MRF’s smoothness parameter β is set heuristically to 0.3. Modality specific
MRFs were optimised using fastPD [8] due to its speed, while fusion moves were
optimised using QPBO [14] because of asymmetric pairwise costs. Several MRF
optimisation algorithms were tested with very little impact on the obtained par-
cellations. We tested the reproducibility with respect to initialisation using 10
random initialisations constructed using Poisson Disc Sampling. Parcellations
were computed for four different resolutions (50, 100, 150 and 200 labels). All
measures are computed for all initialisations and subjects. Reproducibility is
evaluated using the Adjusted Rand Index (ARI) [4] and the modified Dice Score
Coefficient (DSC) [13] that allows merging very similar parcels. ARI is a mea-
sure from probability theory that assesses the statistical dependence between

Fig. 3. Visual results for randomly selected subjects. (a) dMRI and (b) Myelin relia-
bility maps. (c, d) Overlap between the boundaries of the multi-modal parcellation and
(c) myelin maps and (d) Brodmann areas. (e–g) Comparative overlap between rs-fMRI
parcellations boundaries and t-fMRI activation maps. Top row: mono-modal parcel-
lations, bottom row: GraMPa rs-fMRI parcellations. (e, f) Motor task, (g) Language
task. Coloured arrows indicate striking examples.

two clustering solutions. It takes values between −1 and 1, where 1 means the
clusterings are identical. Figures 2a and b show comparative boxplots of the two
measures between GraMPa and the mono-modal approach. We can see that most
configurations are more reproducible. Results are significant (p < 0.001) for the
two largest resolutions. The lower performance for 50 parcels could indicate that
it is difficult to obtain large smooth parcels while agreeing with all reliability
maps. Our parcellations’ agreement with the underlying structure is evaluated
by (i) computing the average functional coherence (FC) [6] and (ii) evaluating
the agreement with task fMRI activation maps (obtained using FSL’s standard
tools) using the Bayesian Information Criterion (BIC) [16]. FC evaluates the
average correlation between a parcel’s average timeseries and the timeseries of
all vertices in the same parcel. In order to avoid introducing a size bias, very
small parcels are excluded from the computation. For each parcel, BIC evaluates
how well it is possible to fit a probabilistic model of the concatenated task acti-
vation maps of all 50 subjects. As shown in Fig. 2c and d, GraMPa yields better
results for both measures. Results are significant (p < 0.001) for most configu-
rations. Finally, Fig. 3c–g visually compares parcel boundaries with Brodmann
and myelin maps and the average task activation maps over all 50 subjects. We
can see that GraMPa parcellations have a stronger agreement with task activa-
tion boundaries.

4 Discussion
In this paper, we proposed a general graph-based framework which provides
modality specific coherent parcellations, as well as a multi-modal parcellation
that merges modalities based on their reliabilities. We propose an application
to the construction of more reliable rs-fMRI parcellations through the introduc-
tion of multi-modal information from structural connectivity and myelin maps.
Our experiments show that GraMPa’s parcellations are more robust and more
representative of the underlying structure. One of the main advantages of the
proposed framework is its flexibility. It can be tailored for a specific set of modal-
ities and issues associated with a particular acquisition process. It is also easy to
integrate prior knowledge both in designing the modalities’ reliability maps and
through the introduction of known reliable boundaries defined as a new locally
reliable modality. Another possibility is to design a fully data-driven fusion move
step. Local segmentation uncertainties could be estimated for each modality after
each MRF optimisation using min-marginal energies [7].
Furthermore, our model alleviates the need to match the different modali-
ties’ unary costs and does not limit the number of modalities considered. The
method could be extended to other multi-modal segmentation tasks. It could
prove particularly well-suited to group-wise parcellation, where each subject
would be assimilated to a modality and the fusion would be driven by group
consistency measures. It would have the potential of handling very large groups,
as subjects don’t have to be considered simultaneously. The method could sim-
ilarly be used to merge MRF parcellations obtained from a large set of initial-
isations. Many challenges remain associated with the multi-modal parcellation

task. fMRI and dMRI are currently the best way of measuring in vivo connectiv-
ity, but remain very indirect measurements and can be unreliable. In addition,
multi-modal analyses would benefit from a stronger knowledge of the modalities
interactions and similarities. Finally, using our parcellations in a clinical context
requires the development of robust methods for analysing the obtained connec-
tivity networks, while parcellation of diseased subjects may be associated with
new challenges.

References
1. Behrens, T., Berg, H.J., Jbabdi, S., Rushworth, M., Woolrich, M.: Probabilistic
diffusion tractography with multiple fibre orientations: what can we gain? Neu-
roImage 34(1), 144–155 (2007)
2. Blumensath, T., Jbabdi, S., Glasser, M.F., Van Essen, D.C., Ugurbil, K.,
Behrens, T.E., Smith, S.M.: Spatially constrained hierarchical parcellation of the
brain with resting-state fMRI. NeuroImage 76, 313–324 (2013)
3. Craddock, R.C., James, G.A., Holtzheimer, P.E., Hu, X.P., Mayberg, H.S.: A whole
brain fMRI atlas generated via spatially constrained spectral clustering. Hum.
Brain Mapp. 33, 1914–1928 (2012)
4. Eickhoff, S.B., Thirion, B., Varoquaux, G., Bzdok, D.: Connectivity-based parcel-
lation: critique and implications. Hum. Brain Mapp. 36(12), 4771–4792 (2015)
5. Glasser, M.F., Van Essen, D.C.: Mapping human cortical areas in vivo based
on myelin content as revealed by T1-and T2-weighted MRI. J. Neurosci. 31(32),
11597–11616 (2011)
6. Honnorat, N., Eavani, H., Satterthwaite, T., Gur, R., Gur, R., Davatzikos, C.:
GraSP: geodesic graph-based segmentation with shape priors for the functional
parcellation of the cortex. NeuroImage 106, 207–221 (2015)
7. Kohli, P., Torr, P.H.: Measuring uncertainty in graph cut solutions. Comput. Vis.
Image Underst. 112(1), 30–38 (2008)
8. Komodakis, N., Tziritas, G.: Approximate labeling via graph cuts based on linear
programming. IEEE Trans. Pattern Anal. Mach. Intell. 29(8), 1436–1453 (2007)
9. Lempitsky, V., Rother, C., Roth, S., Blake, A.: Fusion moves for Markov random
field optimization. IEEE Trans. PAMI 32(8), 1392–1405 (2010)
10. Moreno-Dominguez, D., Anwander, A., Knösche, T.R.: A hierarchical method for
whole-brain connectivity-based parcellation. Hum. Brain Mapp. 35, 5000–5025
(2014)
11. Ng, B., Varoquaux, G., Poline, J.B., Thirion, B.: Implications of inconsistencies
between fMRI and dMRI on multimodal connectivity estimation. In: Mori, K.,
Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol.
8151, pp. 652–659. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40760-4_81
12. Parisot, S., Arslan, S., Passerat-Palmbach, J., Wells, W.M., Rueckert, D.:
Tractography-driven groupwise multi-scale parcellation of the cortex. In: Ourselin,
S., Alexander, D.C., Westin, C.-F., Cardoso, M.J. (eds.) IPMI 2015. LNCS, vol.
9123, pp. 600–612. Springer, Heidelberg (2015). doi:10.1007/978-3-319-19992-4_47
13. Parisot, S., Rajchl, M., Passerat-Palmbach, J., Rueckert, D.: A continuous flow-
maximisation approach to connectivity-driven cortical parcellation. In: Navab, N.,
Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351,
pp. 165–172. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24574-4_20

14. Rother, C., Kolmogorov, V., Lempitsky, V., Szummer, M.: Optimizing binary
MRFs via extended roof duality. In: CVPR, pp. 1–8. IEEE (2007)
15. Sporns, O.: The human connectome: a complex network. Ann. N. Y. Acad. Sci.
1224, 109–125 (2011)
16. Thirion, B., Varoquaux, G., Dohmatob, E., Poline, J.B.: Which fMRI clustering
gives good brain parcellations? Front. Neurosci. 8(167), 13 (2014)
17. Van Essen, D.C., Jbabdi, S., Sotiropoulos, S.N., Chen, C., et al.: Mapping connec-
tions in humans and non-human primates: aspirations and challenges for diffusion
imaging. In: Diffusion MRI, pp. 337–358 (2013)
A Continuous Model of Cortical Connectivity

Daniel Moyer(B), Boris A. Gutman, Joshua Faskowitz, Neda Jahanshad,
and Paul M. Thompson

Imaging Genetics Center, University of Southern California, Los Angeles, USA
moyerd@usc.edu

Abstract. We present a continuous model for structural brain connec-
tivity based on the Poisson point process. The model treats each stream-
line curve in a tractography as an observed event in connectome space,
here a product space of cortical white matter boundaries. We approxi-
mate the model parameter via kernel density estimation. To deal with
the heavy computational burden, we develop a fast parameter estima-
tion method by pre-computing associated Legendre products of the data,
leveraging properties of the spherical heat kernel. We show how our app-
roach can be used to assess the quality of cortical parcellations with
respect to connectivity. We further present empirical results that suggest
the “discrete” connectomes derived from our model have substantially
higher test-retest reliability compared to standard methods.

Keywords: Human connectome · Diffusion MRI · Non-parametric
estimation

1 Introduction
In recent years, the study of structural and functional brain connectivity has
expanded rapidly. Following the rise of diffusion and functional MRI, connec-
tomics has unlocked a wealth of knowledge to be explored. Almost synonymous
with the connectome is the network-theory based representation of the brain.
In much of the recent literature, the quantitative analysis of connectomes has
focused on region-to-region connectivity. This paradigm equates physical brain
regions with nodes in a graph, and uses observed structural measurements or
functional correlations as a proxy for edge strengths between nodes.
Critical to this representation of connectivity is the delineation of brain
regions, the parcellation. Multiple studies have shown that the choice of parcel-
lation influences the graph statistics of both structural and functional networks
[15,17,18]. It remains an open question which of the proposed parcellations is
the optimal representation, or even if such a parcellation exists [14].
It is thus useful to construct a more general framework for cortical connectiv-
ity, one in which any particular parcellation of the cortex may be expressed and
its connectivity matrix derived, and one in which the variability of connectivity
measures can be modeled and assessed statistically. It is also important that
this framework allow comparisons between parcellations, and representations in

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 157–165, 2016.
DOI: 10.1007/978-3-319-46720-7_19
158 D. Moyer et al.

this framework must be both analytically and computationally tractable. Since
several brain parcellations at the macroscopic scale are possible, a representation
of connectivity that is independent of parcellation is particularly appealing.
In this paper, we develop such a general framework for a parcellation inde-
pendent connectivity representation, building on the work of [8]. We describe a
continuous point process model for the generation of an observed tract1 (stream-
line) intersections with the cortical surface, from which we may recover a dis-
tribution of edge strengths for any pair of cortical regions, as measured by the
inter-region tract count. Our model is an intensity function over the product
space of the cortical surface with itself, assigning to every pair of points on the
surface a point connectivity. We describe an efficient method to estimate the
parameter of the model, as well as a method to recover the region-to-region edge
strength. We then demonstrate the estimation of the model on a Test-Retest
dataset. We provide reproducibility estimates for our method and the standard
direct count methods [10] for comparison. We also compare the representational
power of common cortical parcellations with respect to a variety of measures.

2 Continuous Connectivity Model

The key theoretical component of our work is the use of point process theory
to describe estimated cortical tract projections. A point process is a random
process where any realization consists of a collection of discrete points on a
measurable space. The most basic of these processes is the Poisson process, in
which events occur independently at a specific asymptotic intensity (rate) λ over
the chosen domain [12]. λ completely characterizes each particular process, and
is often defined as a function λ : Domain → R+ , which allows the process to
vary in intensity by location. The expected count of any sub-region (subset) of
the domain is its total intensity, the integral of λ over the sub-region. In this
paper, our domain is the connectivity space of the cortex, the set of all pairs of
points on the surface, and the events are estimated tract intersections with the
cortical surface.

2.1 Model Definition

Let Ω be the union of two disjoint subspaces, each diffeomorphic to the 2-sphere,
representing the white matter boundaries in each hemisphere. Further consider
the space Ω × Ω, which here represents all possible end point pairs for tracts
that reach the white matter boundary. We treat the observation of such tracts as
events generated by an inhomogeneous (symmetric) Poisson process on Ω × Ω;
in our case, for every event (x, y) we have a symmetric event (y, x).

¹ It is critical to distinguish between white matter fibers (fascicles) and observed
“tracts.” Here, “tracts” denotes the 3d-curves recovered from Diffusion Weighted
Imaging via tractography algorithms.

Assuming that each event is independent of all other events except for its
symmetric event (i.e., each tract is recovered independently), we model con-
nectivity as an intensity function λ : Ω × Ω → R+, such that for any regions
E1, E2 ⊂ Ω, the number of events is Poisson distributed with parameter

    C(E1, E2) = ∫_{E1×E2} λ(x, y) dx dy.    (1)

Due to properties of the Poisson distribution, the expected number of tracts is
exactly C(E1, E2). For any collection of regions {E_i}_{i=1}^N = P, we can compute a
weighted graph G(P, λ) by computing each C(E_i, E_j) for pairs (E_i, E_j) ∈ P × P.
Each node in this graph represents a region, and each weighted edge represents
the pairwise connectivity of the pair of nodes (regions) it connects. We call P a
parcellation of Ω if ∪_i E_i = Ω and ∩_i E_i has measure 0 ({E_i} is almost disjoint).
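On a discretised surface, the region-to-region parameters C(E_i, E_j) reduce to sums of the intensity over vertex pairs; a sketch with hypothetical inputs (the vertex-pair matrix `lam` is assumed to already carry the surface area weights of the integral):

```python
import numpy as np

def region_graph(lam, labels, n_regions):
    """Weighted graph G(P, lambda): C[i, j] sums a discretised intensity
    lam[v, w] (assumed area-weighted) over all vertex pairs with
    v in region i and w in region j."""
    C = np.zeros((n_regions, n_regions))
    labels = np.asarray(labels)
    for i in range(n_regions):
        for j in range(n_regions):
            C[i, j] = lam[np.ix_(labels == i, labels == j)].sum()
    return C
```

Under the Poisson model, C[i, j] is then the expected tract count between regions i and j.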

2.2 Recovery of the Intensity Function


A sufficient statistic for Poisson process models is the intensity function λ(x, y).
Estimation of the function is non-trivial, and has been the subject of much
study in the spatial statistics community [3]. We choose to use a non-parametric
Kernel Density Estimation (KDE) approach due to an efficient closed form for
bandwidth estimation described below. This process is self-tuning up to a choice
of desiderata for the bandwidth parameter.
We first inflate each surface to a sphere and register them using a spheri-
cal registration (See Sect. 3.1); however this entire method can be undertaken
without group registration. We treat each hemisphere as disjoint from the other,
allowing us to treat Ω × Ω as the product of spheres (S1 ∪ S2 ) × (S1 ∪ S2 ).
Throughout the rest of the paper D denotes a dataset containing observed tract
endpoints (x, y)i , and λ̂ denotes our estimation of λ.
The unit normalized spherical heat kernel is a natural choice of kernel for S².
We use its truncated spherical harmonic representation [1], defined as follows for
any two unit vectors p and q on the 2-sphere:

    K_σ(p, q) = Σ_{h=0}^{H} (2h + 1)/(4π) exp{−h(h + 1)σ} P_h^0(p · q)

Here, P_h^0 is the hth degree associated Legendre polynomial of order 0. Note
that the non-zero order polynomials have coefficient zero due to the radial
symmetry of the spherical heat kernel [1]. However, since we are estimating
a function on Ω × Ω, we use the product of two heat kernels as our KDE
kernel κ. For any two points p and q, the kernel value associated to an end
point pair (x, y) is κ((p, q)|(x, y)) = K_σ(x, p)K_σ(y, q). It is easy to show that
∫_{Ω×Ω} K_σ(x, p)K_σ(y, q) dp dq = 1.
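The truncated kernel is straightforward to evaluate numerically; a sketch (the function name and default truncation order H are ours):

```python
import numpy as np
from scipy.special import eval_legendre

def heat_kernel(p, q, sigma, H=20):
    """Truncated spherical heat kernel
    K_sigma(p, q) = sum_{h=0}^{H} (2h+1)/(4*pi) exp(-h(h+1)*sigma) P_h(p.q)
    for unit vectors p and q on the 2-sphere."""
    t = float(np.clip(np.dot(p, q), -1.0, 1.0))
    h = np.arange(H + 1)
    coeff = (2 * h + 1) / (4 * np.pi) * np.exp(-h * (h + 1) * sigma)
    return float(np.sum(coeff * eval_legendre(h, t)))
```

For large sigma only the h = 0 term survives, so the kernel flattens toward the uniform density 1/(4π) on the sphere.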
The spherical heat kernel has a single shape parameter σ which corre-
sponds to its bandwidth. While in general tuning this parameter requires the
re-estimation of λ̂ at every iteration, by rewriting our kernel we can memoize

part of the computation so that we only need to store the sum of the outer prod-
ucts of the harmonics. Writing out κ((p, q)|D) = Σ_{(x_i,y_i)∈D} K_σ(x_i, p)K_σ(y_i, q),
we have the following:

    κ((p, q)|D) = Σ_{h=0}^{H} Σ_{k=0}^{H} [(2h + 1)/(4π)] [(2k + 1)/(4π)] exp{−σ(h² + h + k² + k)}
                  (independent of D, evaluated every iteration)
                × Σ_{(x_i,y_i)∈D} P_h^0(x_i · p) P_k^0(y_i · q)
                  (independent of σ, evaluated once)

Thus, evaluations of the kernel at any point (p, q) can be done quickly for
sequences of values of σ. We then are left with the choice of loss function. Denot-
ing the true intensity function λ, the estimated intensity λ̂, and the leave-one-
out estimate λ̂_i (leaving out observation i), Integrated Squared Error (ISE) is
defined:

    ISE(σ|D) = ∫_{Ω×Ω} (λ̂(x, y|σ) − λ(x, y))² dx dy
             ≈ ∫_{Ω×Ω} λ̂(x, y|σ)² dx dy − (2/|D|) Σ_{(x_i,y_i)∈D} λ̂_i(x_i, y_i) + Constant.

Hall and Marron [9] suggest tuning bandwidth parameters using ISE. In practice,
we find that replacing each leave-one-out estimate with its logarithm log λ̂_i(x_i, y_i)
yields more consistent and stable results.
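The memoisation behind the bandwidth sweep can be sketched as follows: the Legendre product sums S[h, k] are computed once from the data, after which κ((p, q)|D) is a cheap quadratic form in the σ-dependent coefficients (names are ours; each kernel factor carries its 1/(4π) normalisation):

```python
import numpy as np
from scipy.special import eval_legendre

def legendre_products(X, Y, p, q, H):
    """S[h, k] = sum_i P_h(x_i . p) * P_k(y_i . q): depends only on the
    data and the evaluation point, so it is computed once per (p, q).
    X, Y are (n, 3) arrays of unit endpoint vectors."""
    dx = np.clip(X @ p, -1.0, 1.0)
    dy = np.clip(Y @ q, -1.0, 1.0)
    Ph = np.stack([eval_legendre(h, dx) for h in range(H + 1)])
    Pk = np.stack([eval_legendre(k, dy) for k in range(H + 1)])
    return Ph @ Pk.T  # (H+1, H+1)

def kappa(S, sigma):
    """kappa((p, q)|D) for any sigma, reusing the sigma-independent S:
    the quadratic form c^T S c with c_h = (2h+1)/(4*pi) exp(-h(h+1)*sigma)."""
    h = np.arange(S.shape[0])
    c = (2 * h + 1) / (4 * np.pi) * np.exp(-h * (h + 1) * sigma)
    return float(c @ S @ c)
```

A bandwidth sweep then scores a grid of σ values with the (log) leave-one-out ISE criterion without ever touching the raw endpoint data again.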

2.3 Selecting a Parcellation


Given an estimated intensity λ̂ and two or more parcellations P1 , P2 , . . . , we
would like to know which parcellation and associated graph G(P, λ̂) best repre-
sents the underlying connectivity function. There are at least two perspectives
to consider.
Approximation Error: Because each P_i covers Ω (and P_i × P_i = Ω × Ω),
each G(P_i, λ̂) can be viewed as a piece-wise constant function g : Ω × Ω → R+, where
g(x, y) = C(E_i, E_j)/(|E_i||E_j|) such that x ∈ E_i and y ∈ E_j. In other words, g is
the constant approximation to λ over every pair of regions. A natural measure
of error is another form of Integrated Squared Error:

    Err(λ̂, G(P_1, λ̂)) = ∫_{Ω×Ω} (g(x, y) − λ(x, y))² dx dy.    (2)

This is analogous to squared loss (ℓ2-loss).


Likelihood: An alternative viewpoint leverages the point process model to mea-
sure a likelihood:

log L(P ) = log Poisson(|{(x, y)i ∈ D : x ∈ Ei , y ∈ Ej }|; C(Ei , Ej )). (3)
Ei ,Ej ∈P

Here, the independence assumption plays a critical role, allowing pairs of regions
to be evaluated separately. Unfortunately this is biased toward parcellations with
more, smaller regions, as the Poisson distribution has tied variance and mean
in one parameter. A popular likelihood-based option that somewhat counterbal-
ances this is Akaike’s Information Criterion (AIC),

    AIC(P) = −2 log L(P) + (|P|/2) log |D|.    (4)

AIC balances accuracy with parsimony, penalizing overly parameterized models,
in our case parcellations with too many regions.
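Given the observed region-pair counts and the model rates C(E_i, E_j), the criterion takes a few lines; a sketch assuming the counts matrix holds all |D| endpoint pairs and |P| is the number of regions:

```python
import numpy as np
from scipy.stats import poisson

def parcellation_aic(counts, C):
    """Eq. (4) as printed: AIC(P) = -2 log L(P) + (|P|/2) log |D|,
    where log L (Eq. 3) sums independent log-Poisson probabilities of
    the observed region-pair counts under the model rates C(E_i, E_j)."""
    counts = np.asarray(counts, dtype=float)
    C = np.asarray(C, dtype=float)
    log_l = poisson.logpmf(counts, C).sum()
    n_events = counts.sum()        # |D|, total observed endpoint pairs
    n_regions = counts.shape[0]    # |P|
    return -2.0 * log_l + (n_regions / 2.0) * np.log(n_events)
```

Holding |P| fixed, a better-fitting set of rates C yields a lower score, while the penalty term disfavours parcellations with many regions.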

3 Application to CoRR Test-Retest Data


We demonstrate the use of our framework on a test-retest dataset. We measure
connectome reproducibility using Intraclass Correlation (ICC) [13], and compare
three parcellations using multiple criteria (See Eqs. 2, 3, and 4).

3.1 Procedure, Connectome Generation, and Evaluation


Our data are comprised of 29 subjects from the Institute of Psychology, Chinese
Academy of Sciences sub-dataset of the larger Consortium for Reliability and
Reproducibility (CoRR) dataset [19]. T1-weighted (T1w) and diffusion weighted
(DWI) images were obtained on a 3T Siemens TrioTim scanner using an 8-channel head
coil and 60 directions. Each subject was scanned twice, roughly two weeks apart.
T1w images were processed with FreeSurfer's [4] recon-all pipeline to obtain a
triangle mesh of the grey-white matter boundary registered to a shared spherical
space [5], as well as corresponding vertex labels per subject for three atlas-based
cortical parcellations, the Destrieux atlas [6], the Desikan-Killiany (DK) atlas [2],
and the Desikan-Killiany-Tourville (DKT31) atlas [11]. Probabilistic streamline
tractography was conducted using the DWI in 2 mm isotropic MNI 152 space,
using Dipy’s [7] implementation of constrained spherical deconvolution (CSD)
[16] with a harmonic order of 6. As per Dipy’s ACT, we retained only tracts
longer than 5 mm with endpoints in likely grey matter.
We provide the mean ICC score computed both with and without entries
that are zero for all subjects. When estimating λ̂ the kernels are divided by
the number of tracks, and we use a sphere with unit surface area instead of
unit radius for ease of computation. We threshold each of the kernel integrated
connectomes at 10−5 , which is approximately one half of one unit track density.
We then compute three measures of parcellation representation accuracy, namely
ISE, Negative Log Likelihood, and AIC scores.
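The thresholding step and an edge-wise test-retest ICC can be sketched as follows. This is our own illustration: it uses a one-way random-effects ICC(1,1) across the two sessions, and the exact ICC variant is an assumption on our part, since [13] describes several forms:

```python
import numpy as np

def edgewise_icc(session1, session2):
    """One-way random-effects ICC(1,1) per connectome entry, two sessions.
    session1, session2 : (n_subjects, n_edges) arrays."""
    x = np.stack([session1, session2])            # (k=2, n, e)
    k, n = x.shape[0], x.shape[1]
    subj_mean = x.mean(axis=0)
    grand = x.mean(axis=(0, 1))
    bms = k * ((subj_mean - grand) ** 2).sum(axis=0) / (n - 1)     # between-subject MS
    wms = ((x - subj_mean) ** 2).sum(axis=(0, 1)) / (n * (k - 1))  # within-subject MS
    return (bms - wms) / (bms + (k - 1) * wms)

def threshold_connectome(conn, tau=1e-5):
    """Zero out kernel-integrated entries below tau (about half of one
    unit track density in the paper's units)."""
    out = np.asarray(conn, dtype=float).copy()
    out[out < tau] = 0.0
    return out
```

Edges that reproduce perfectly across sessions score 1; edges whose variance is purely within-subject noise score at or below 0, which is why near-zero noisy entries drag the mean ICC down.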

3.2 Results and Discussion


162 D. Moyer et al.

Table 1 shows surprisingly low mean ICC scores for regular count matrices. This
may be because ICC normalizes each measure by its s² statistic, meaning that
entries in the adjacency matrices that should be zero, but are subject to a
small amount of noise – a few erroneous tracks – have very low ICC. Our method
in effect smooths tract endpoints into a density; endpoints near the region
boundaries are in effect shared with the adjacent regions. Thus, even without
thresholding we dampen noise effects as measured by ICC. With thresholding,
our method's performance is further improved, handily beating the counting
method with respect to ICC score. It is important to note that for many graph
statistics, changing graph topology can greatly affect the measured value [18].
While it is important to have consistent non-zero measurements, the difference
between zero and small-but-non-zero in the graph context is also non-trivial. The
consistency of zero-valued measurements is thus very important in connectomics.

Table 1. Mean ICC scores for each connectome generation method. The count
method – the standard approach – defines edge strength by the fiber endpoint
count. The integrated intensity method is our proposed method; in general it
returns a dense matrix. However, many of the values are extremely low, so we
include results after thresholding the matrix, with and without elements that
are zero for all subjects.

Type                                     DK      Destrieux  DKT31
Number of regions                        68      148        62
Count ICC                                0.2093  0.1722     0.2266
Integrated intensity ICC (no threshold)  0.5069  0.5144     0.4374
Integrated intensity ICC (no zeros)      0.5130  0.5341     0.3781
Integrated intensity ICC                 0.7606  0.9026     0.6102
Table 2 suggests that all three measures, while clearly different, are consistent
in their selection, at least with respect to these three parcellations. It is
somewhat surprising that the Destrieux atlas has quite low likelihood criteria,
but this may be due to the (quadratically) larger number of region pairs. Both
likelihood-based retest statistics also choose the DK parcellation, while ISE
chooses the Destrieux parcellation by a small margin. It should be noted that
these results must be conditioned on the use of a probabilistic CSD tractography
model. Different models may lead to different intensity functions and resulting
matrices. The biases and merits of the different models and methods (e.g., gray
matter dilation for fiber counting vs. streamline projection) remain important
open questions (Figs. 1 and 2).

Table 2. Means over all subjects of three measures of parcellation "goodness".
The retest versions are the mean of the measure using the parcellation's
regional connectivity matrix (or the count matrix) from one scan, and the
estimated intensity function from the other scan.

Type                     DK             Destrieux      DKT31
ISE                      1.8526 × 10⁻⁵  2.1005 × 10⁻⁵  2.1258 × 10⁻⁵
Negative LogLik          85062.5        355769.4       88444.5
AIC score                174680.95      733294.8       185253.5
Retest ISE               1.0517 × 10⁻⁵  1.0257 × 10⁻⁵  1.1262 × 10⁻⁵
Retest negative LogLik   85256.0        357292.9       88434.9
Retest AIC score         175068.1       736341.9       185234.3

Fig. 1. A visualization of the ICC scores for connectivity to Brodmann Area 45
(Destrieux region 14) for the Count connectomes (left) and the proposed
Integrated Intensity connectomes (right). Blue denotes a higher score.

Fig. 2. A visualization of the marginal connectivity M(x) = ∫_{E_i} λ̂(x, y) dy
for the Left Post-central Gyrus region of the DK atlas (Region 57). The region
is shown in blue on the inset. Red denotes higher connectivity with the blue
region.

4 Conclusion
We have presented a general framework for structural brain connectivity. This
framework provides a representation for cortical connectivity that is independent
of the choice of regions, and thus may be used to compare the accuracy of
a given set of regions’ connectivity matrix. We provide one possible estimation
method for this representation, leveraging spherical harmonics for fast parameter
estimation. We have demonstrated this framework’s viability, as well as provided
a preliminary comparison of regions using several measures of accuracy.
The results presented here lead us to conjecture that our connectome esti-
mates are more reliable compared to standard fiber counting, though we stress
that a much larger study is required for strong conclusions to be made. Fur-
ther adaptations of our method are possible, such as using FA-weighted fiber

counting. Our future work will explore these options, conduct tests on larger
datasets, and investigate the relative differences between tracking methods and
parcellations more rigorously.

Acknowledgments. This work was supported by NIH Grant U54 EB020403, as well
as the Rose Hills Fellowship at the University of Southern California. The authors would
like to thank the reviewers as well as Greg Ver Steeg for multiple helpful conversations.

References
1. Chung, M.K.: Heat kernel smoothing on unit sphere. In: 3rd IEEE International
Symposium on Biomedical Imaging: Nano to Macro 2006, pp. 992–995. IEEE
(2006)
2. Desikan, R.S., et al.: An automated labeling system for subdividing the human
cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage
31(3), 968–980 (2006)
3. Diggle, P.: A kernel method for smoothing point process data. Appl. Stat. 34,
138–147 (1985)
4. Fischl, B.: FreeSurfer. NeuroImage 62(2), 774–781 (2012)
5. Fischl, B., et al.: High-resolution intersubject averaging and a coordinate system
for the cortical surface. Hum. Brain Mapp. 8(4), 272–284 (1999)
6. Fischl, B., et al.: Automatically parcellating the human cerebral cortex. Cereb.
Cortex 14(1), 11–22 (2004)
7. Garyfallidis, E., et al.: Dipy, a library for the analysis of diffusion MRI data. Front.
Neuroinform. 8(8), 1–17 (2014)
8. Gutman, B., Leonardo, C., Jahanshad, N., Hibar, D., Eschenburg, K., Nir, T.,
Villalon, J., Thompson, P.: Registering cortical surfaces based on whole-brain
structural connectivity and continuous connectivity analysis. In: Golland, P.,
Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol.
8675, pp. 161–168. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10443-0 21
9. Hall, P., Marron, J.S.: Extent to which least-squares cross-validation minimises
integrated square error in nonparametric density estimation. Probab. Theor. Relat.
Fields 74(4), 567–581 (1987)
10. Jahanshad, N., et al., for the Alzheimer's Disease Neuroimaging Initiative: Genome-wide scan
of healthy human connectome discovers SPON1 gene variant influencing dementia
severity. Proc. Natl. Acad. Sci. USA 110(12), 4768–4773 (2013)
11. Klein, A., Tourville, J.: 101 labeled brain images and a consistent human
cortical labeling protocol. Front. Neurosci. 6, 171 (2012)
12. Moller, J., Waagepetersen, R.P.: Statistical Inference and Simulation for Spatial
Point Processes. CRC Press, Boca Raton (2003)
13. Portney, L.G., Watkins, M.P.: Statistical measures of reliability. Found. Clin. Res.:
Appl. Pract. 2, 557–586 (2000)
14. de Reus, M.A., Van den Heuvel, M.P.: The parcellation-based connectome: limita-
tions and extensions. NeuroImage 80, 397–404 (2013)
15. Satterthwaite, T.D., Davatzikos, C.: Towards an individualized delineation of func-
tional neuroanatomy. Neuron 87(3), 471–473 (2015)
16. Tournier, J.D., Yeh, C.H., Calamante, F., Cho, K.H., Connelly, A., Lin, C.P.:
Resolving crossing fibres using constrained spherical deconvolution: validation
using diffusion-weighted imaging phantom data. NeuroImage 42(2), 617–625
(2008)

17. Wang, J., et al.: Parcellation-dependent small-world brain functional networks: a


resting-state fMRI study. Hum. Brain Mapp. 30(5), 1511–1523 (2009)
18. Zalesky, A., et al.: Whole-brain anatomical networks: does the choice of nodes
matter? NeuroImage 50(3), 970–983 (2010)
19. Zuo, X.N., et al.: An open science resource for establishing reliability and repro-
ducibility in functional connectomics. Sci. Data 1, 140049 (2014)
Label-Informed Non-negative Matrix Factorization
with Manifold Regularization
for Discriminative Subnetwork Detection

Takanori Watanabe¹, Birkan Tunc¹, Drew Parker¹, Junghoon Kim², and Ragini Verma¹

¹ Section of Biomedical Image Analysis, University of Pennsylvania, Philadelphia, PA, USA
watanabe@uphs.upenn.edu
² The City College of New York, New York, NY, USA

Abstract. In this paper, we present a novel method for obtaining a low


dimensional representation of a complex brain network that: (1) can be inter-
preted in a neurobiologically meaningful way, (2) emphasizes group differences
by accounting for label information, and (3) captures the variation in disease
subtypes/severity by respecting the intrinsic manifold structure underlying the
data. Our method is a supervised variant of non-negative matrix factorization
(NMF), and achieves dimensionality reduction by extracting an orthogonal set
of subnetworks that are interpretable, reconstructive of the original data, and
also discriminative at the group level. In addition, the method includes a
manifold regularizer that encourages the low dimensional representations to be
smooth with respect to the intrinsic geometry of the data, allowing subjects with
similar disease-severity to share similar network representations. While the
method is generalizable to other types of non-negative network data, in this
work we have used structural connectomes (SCs) derived from diffusion data to
identify the cortical/subcortical connections that have been disrupted in abnor-
mal neurological state. Experiments on a traumatic brain injury (TBI) dataset
demonstrate that our method can identify subnetworks that can reliably classify
TBI from controls and also reveal insightful connectivity patterns that may be
indicative of a biomarker.

1 Introduction

Substantial evidence suggests that many major psychiatric and neurological disorders
are associated with aberrations in the network structure of the brain [5, 7]. With the
availability of modern neuroimaging modalities such as diffusion tensor (DTI) and
functional (fMRI) imaging, there is currently an exciting potential for researchers to
identify connectivity-based biomarkers of disease states. Since brain networks are
known to exhibit complex interactions, multivariate pattern analysis (MVPA) methods
are particularly suitable here, as they aim to identify the site of the pathology by
examining the data as a whole, accounting for the correlations among the network
features.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 166–174, 2016.
DOI: 10.1007/978-3-319-46720-7_20

In this work, we are interested in applying MVPA methods on diffusion-based


structural connectomes (SCs) to identify the patterns of structural dysconnectivity
induced by a brain disorder. However, due to the high dimensionality of SCs, standard
MVPA methods such as the support vector machine (SVM) become prone to over-
fitting and thus tend to generalize poorly to test data. Even when generalizability is
achieved, SVM lacks clinical interpretability since it returns a dense, high dimensional
weight vector. One way to address this is by adding an L1-regularizer to the SVM
objective for feature selection [6], but this approach is known to perform poorly when
the features are highly correlated. Thus dimensionality reduction becomes critical for
improving classification performance and interpretability. Some well-established
dimensionality reduction methods in neuroimaging include the principal and inde-
pendent component analysis (PCA and ICA). However, these approaches do not pre-
serve the non-negativity of the SCs, thus return global representations of brain network
that are highly overlapping and lack interpretability since negative structural connec-
tion is biologically ill-defined.
Non-negative matrix factorization (NMF) [9] is a relatively recent method that
addresses this problem by incorporating non-negativity as a constraint. This constraint
leads to a more localized “parts-based” representation where the data is decomposed
into purely additive combinations of non-negative basis components. For our work, the
bases can be interpreted as data-driven subnetworks, and the corresponding coefficients
provide a low-dimensional representation of the SC that can be used in a classifier.
However, despite its success, NMF possesses several limitations. First, NMF does
not guarantee the basis components to be local and parts-based, i.e., the subnetworks
may be global representations that are overlapping and redundant. Moreover, standard
NMF and many of its variants are unsupervised, thus they ignore discriminative
structures that may signify important group differences. Finally, NMF assumes that the
data are sampled from a Euclidean space, thus does not account for the intrinsic
manifold structure underlying the data. While this last issue was addressed in a recent
work by Ghanbari et al. [7] under a graph-embedding framework, their method is also
unsupervised and thus ignores label information. On the other hand, although super-
vised subnetwork detection frameworks have been introduced in some recent works [2,
8], these methods do not account for the manifold structure underlying the data.
To overcome these limitations, in this paper we introduce a novel supervised NMF
framework for identifying an orthogonal set of subnetworks that is interpretable and
emphasizes group differences in structural connectivity. The method also respects the
intrinsic geometric structure in the data through manifold regularization [7, 10], which
encourages subnetwork representations to be smooth with respect to the data manifold.
To solve the proposed objective function, we introduce an optimization algorithm
based on the alternating direction method (ADM), which has recently been demon-
strated to solve NMF with superior performance over other state-of-the-art algorithms
[12]. The proposed framework was evaluated on a TBI dataset, and the results
demonstrate the interpretability and the discriminative capacity of the subnetworks.

2 Method

Projective NMF. Let X = [x_1, …, x_n] and y = [y_1, …, y_n]^T denote a set of training
samples consisting of SCs x_i ∈ R^p_+, i = 1, …, n, where y_i ∈ {±1} indicates the label of
subject i. An SC is a vector representation of the brain network obtained via tractography,
where each vector element represents the strength of the structural connection
between a distinct pair of brain regions (see Sect. 3 for details). Given a target dimension
r ≪ p, NMF learns a decomposition of the form X ≈ WH by minimizing the Frobenius
norm error ‖X − WH‖²_F, where W = [w_1, …, w_r] ∈ R^{p×r}_+ is the basis matrix and
H = [h_1, …, h_n] ∈ R^{r×n}_+ is the coefficient matrix. In the context of our work, the
columns of W are connectivity bases that represent subnetworks.
Following [10], we assume that H is obtained from a linear projection of X, i.e.,
H = PX, where P ∈ R^{r×p}_+ is a non-negative projection matrix that embeds the data onto
the intrinsic subspace. Under this assumption, the objective function for NMF becomes

    min_{W, P ≥ 0} (1/2) ‖X − W(PX)‖²_F.    (1)

A key advantage of this projective NMF is that once an optimal projection P* is learned
by solving (1), the trained model can be readily generalized to unseen data. That is,
given a new test datum x*, we can immediately obtain its low-dimensional representation
as h* = P* x*. This is extremely important for running cross-validation (CV).
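A minimal sketch of this projection property (hypothetical names, our own illustration): evaluating objective (1) and embedding an unseen subject require only matrix products, with no re-factorization:

```python
import numpy as np

def projective_nmf_loss(X, W, P):
    """Objective (1): 0.5 * ||X - W(PX)||_F^2."""
    R = X - W @ (P @ X)
    return 0.5 * float(np.sum(R * R))

def embed_new_subject(P, x_new):
    """Embed an unseen SC: h* = P* x*; no factorization has to be re-run,
    which is what makes cross-validation cheap."""
    return P @ x_new
```

Because the test-time embedding is a single matrix-vector product, the learned P can be applied inside each CV fold without touching the training factorization.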
Orthogonal NMF with Manifold Regularization and Label Information. Despite
the merits of the projective NMF, it has three key deficiencies. Firstly, it is often
reported that NMF does not necessarily return meaningful parts-based decompositions
for some datasets. Secondly, although many real-world data are found to lie in a low
dimensional manifold, NMF assumes that the data are sampled from a Euclidean space,
neglecting the intrinsic geometric structure in the data. Thirdly, traditional NMF
models are unsupervised and thus ignore the discriminative information from the
different label groups.
In light of these limitations, we propose to include the following terms in our
model:

1. Orthogonality constraint: F_1(W) = I_Ω(W), where Ω := {W ∈ R^{p×r} | W^T W = I_r}
   and I_C(·) is the indicator function of a set C: I_C(W) = 0 if W ∈ C and I_C(W) = ∞
   otherwise.
2. Manifold regularization: F_2(P) = Σ_{i=1}^n Σ_{j=1}^n ‖P x_i − P x_j‖² S_{ij}.
3. Classification error: F_3(P, b, β) = ‖y − (PX)^T b − β 1_n‖²_2, where b ∈ R^r and β ∈ R
   define a hyperplane in the intrinsic subspace, and 1_n ∈ R^n is a vector of all ones.

The F_1 term constrains the basis matrix to reside within the set Ω, which is the set
of orthogonal matrices known as the Stiefel manifold [11]. Since W is non-negative,

orthogonality implies that the bases representing the subnetworks are non-overlapping,
which enhances interpretability and eliminates redundancy.
The F_2 term ensures smoothness of the low-dimensional representation with respect
to the manifold structure encoded in the affinity matrix S ∈ R^{n×n}. Intuitively, this
regularizer preserves the intrinsic geometric structure in the data by encouraging the
representations P x_i and P x_j to be close if S_{i,j} is large, i.e., if subjects i and j are
similar under some notion. This regularizer can also be expressed in terms of the trace
operator: F_2(P) = Tr((PX) L (PX)^T), where L ∈ R^{n×n} is the graph Laplacian defined
by L = D − S, and D is a diagonal matrix with D_{i,i} = Σ_{j=1}^n S_{i,j} for all i. While the
type of inter-subject relationship that can be encoded via the affinity matrix S is
general, in this work we take advantage of the clinical scores that are used to evaluate
patients, and create a "disease-severity graph" to capture the disease-induced variation
in the SCs. Specifically, we assign a higher value to S_{i,j} if subjects i and j share
similar severity scores.
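The trace form of F_2 can be checked numerically. One convention caveat is ours: for a symmetric S, the pairwise sum Σ_{i,j} S_{ij} ‖P x_i − P x_j‖² equals twice the trace Tr((PX) L (PX)^T), a standard factor-of-two relationship. A sketch with our own names:

```python
import numpy as np

def manifold_penalty(PX, S):
    """Trace form of F2: Tr((PX) L (PX)^T) with L = D - S, D_ii = sum_j S_ij.
    PX : (r, n) embedded subjects (columns are P x_i);  S : (n, n) symmetric
    affinity.  For symmetric S this equals half the pairwise weighted sum
    of squared embedding distances."""
    L = np.diag(S.sum(axis=1)) - S
    return float(np.trace(PX @ L @ PX.T))
```

The Laplacian form is what makes the H̃ subproblem below a linear solve rather than a sum over all subject pairs.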
Finally, the classification error term F_3 enhances the discriminatory power of NMF
by encouraging the label groups in the low-dimensional embedding PX to be separated
by a hyperplane b (for clarity, the intercept term β is dropped from our presentation
hereafter). Thus, our proposed NMF model seeks to identify subnetwork bases that
are not only reconstructive of the data but also discriminative of the label groups (note
that the squared error is used here to allow the ADM algorithm to admit a closed-form
solution).
Integrating the above terms into the projective NMF Eq. (1) gives our final
objective function (λ_1, λ_2 ≥ 0 below are regularization parameters):

    min_{W, P ≥ 0, b} ‖X − W(PX)‖²_F + λ_1 Tr((PX) L (PX)^T) + λ_2 ‖y − (PX)^T b‖²_2 + I_Ω(W).    (2)

ADM Algorithm. We now introduce an optimization algorithm based on the ADM
algorithm [12] for solving the proposed cost function. Before applying ADM, we first
convert objective function (2) into the following equivalent constrained form by
introducing auxiliary variables {H, H̃, W̃_1, W̃_2, P̃} (a technique called variable
splitting):

    min_{W, P, H, b, P̃, H̃, W̃_1, W̃_2}  ‖X − WH‖²_F + λ_1 Tr(H̃ L H̃^T) + λ_2 ‖y − H^T b‖²_2 + I_+(W̃_1) + I_Ω(W̃_2) + I_+(P̃)

    such that H = PX, W = W̃_1, W = W̃_2, P = P̃, H = H̃,

where I_+(·) denotes the indicator function of the non-negative orthant. Although the
auxiliary variables introduced by variable splitting may appear redundant, this
strategy is commonly used in ADM frameworks (see [12] for example), as it allows the
ADM subproblems to be solved in closed form. In the context of our work, the
augmented Lagrangian (AL) function for the above constrained problem is given by:

    L_AL(W, P, b, P̃, H, H̃, W̃_1, W̃_2, Λ_{W̃_1}, Λ_{W̃_2}, Λ_{P̃}, Λ_H, Λ_{H̃})
      = ‖X − WH‖²_F + λ_1 Tr(H̃ L H̃^T) + λ_2 ‖y − H^T b‖²_2 + I_+(W̃_1) + I_Ω(W̃_2) + I_+(P̃)
      + ⟨Λ_{W̃_1}, W − W̃_1⟩ + ⟨Λ_{W̃_2}, W − W̃_2⟩ + ⟨Λ_{P̃}, P − P̃⟩ + ⟨Λ_H, H − PX⟩ + ⟨Λ_{H̃}, H − H̃⟩
      + (ρ/2) { ‖W − W̃_1‖²_F + ‖W − W̃_2‖²_F + ‖P − P̃‖²_F + ‖H − PX‖²_F + ‖H − H̃‖²_F },

where (W, P, b, W̃_1, W̃_2, P̃, H, H̃) and (Λ_{W̃_1}, Λ_{W̃_2}, Λ_{P̃}, Λ_H, Λ_{H̃}) are the primal and
dual variables, ρ > 0 is the AL penalty parameter, and ⟨·, ·⟩ denotes the trace inner
product. The ADM algorithm is derived by alternately minimizing L_AL with respect to
each primal variable while holding the others fixed, followed by a gradient ascent step
on the dual variables. The overall ADM algorithm can be summarized as follows:

Repeat until convergence after variable initialization:

Primal updates (1): P ← argmin_P L_AL;  W ← argmin_W L_AL;  H ← argmin_H L_AL;  b ← argmin_b L_AL.
Primal updates (2): P̃ ← argmin_{P̃} L_AL;  W̃_1 ← argmin_{W̃_1} L_AL;  W̃_2 ← argmin_{W̃_2} L_AL;  H̃ ← argmin_{H̃} L_AL.
Dual updates: Λ_{P̃} ← Λ_{P̃} + ρ(P − P̃);  Λ_{W̃_1} ← Λ_{W̃_1} + ρ(W − W̃_1);  Λ_{W̃_2} ← Λ_{W̃_2} + ρ(W − W̃_2);  Λ_H ← Λ_H + ρ(H − PX);  Λ_{H̃} ← Λ_{H̃} + ρ(H − H̃).

The primal updates above can all be carried out efficiently in closed form:

    P ← (H X^T + P̃ + (Λ_H X^T − Λ_{P̃})/ρ)(X X^T + I_p)^{−1}
    W ← (X H^T + ρ(W̃_1 + W̃_2) − Λ_{W̃_1} − Λ_{W̃_2})(H H^T + 2ρ I_r)^{−1}
    H ← (W^T W + 2ρ I_r + λ_2 b b^T)^{−1}(W^T X + ρ PX − Λ_H + λ_2 b y^T)
    b ← (H H^T)^{−1} H y
    P̃ ← max(0, P + Λ_{P̃}/ρ)
    W̃_1 ← max(0, W + Λ_{W̃_1}/ρ)
    W̃_2 ← Proj_Ω(W + Λ_{W̃_2}/ρ)
    H̃ ← (ρ H + Λ_{H̃})(λ_1 L + ρ I_n)^{−1}

Note that Proj_Ω(·) in the W̃_2 update denotes the Euclidean projection of a matrix onto
the Stiefel manifold. Letting A ∈ R^{p×r} (r ≤ p) denote a rank-r matrix, this is given by:

    Proj_Ω(A) = argmin_{Q ∈ Ω} ‖A − Q‖²_F = U [I_r; 0] V^T,    (3)

where U Σ V^T is the SVD of A and 0 ∈ R^{(p−r)×r} is a matrix of all zeros;
solution (3) is unique as long as A has full column rank (see Proposition 7 in [11]).
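In practice Eq. (3) reduces to a thin SVD: for a full-column-rank A = U Σ V^T, the projection is simply U V^T. A sketch with our own names:

```python
import numpy as np

def proj_stiefel(A):
    """Euclidean projection of a full-column-rank p x r matrix A onto the
    Stiefel manifold {W : W^T W = I_r}: with the thin SVD A = U S V^T,
    the nearest orthonormal-column matrix in Frobenius norm is U V^T."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt
```

The result always has exactly orthonormal columns, and a matrix that already lies on the manifold is mapped to itself.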

3 Experiments and Conclusions

Dataset. We apply our method to a TBI dataset consisting of 34 TBI patients and 32
age-matched controls. While the control subjects were scanned only once, the TBI
patients were scanned and evaluated at three different time points: 3, 6, and 12 months
post-injury. Of the 34 TBI patients, 18 had all 3 time points, 9 had 2 and 7 had only one
timepoint. The functional outcome of patients was evaluated using the Glasgow Out-
come Scale Extended (GOSE) and Disability Rating Scale (DRS), which are com-
monly used in TBI. GOSE ranges from 1 = dead to 8 = good recovery, whereas DRS
ranges from 0 = normal to 29 = extreme vegetative state. In total, the dataset comprises
111 total scans, with 32 labeled control and 79 labeled TBI. All scans are accompanied
with 11 clinical scores that are intended to assess the cognitive functioning of the
subject.
Creating the SCs. DTI data was acquired for each subject (Siemens 3T TrioTim, 8
channel head coil, single shot spin echo sequence, TR/TE = 6500/84 ms, b = 1000
s/mm2, 30 gradient directions). 86 ROIs from the Desikan atlas were extracted to
represent the nodes of the structural network. Probabilistic tractography [3] was per-
formed from each of these regions with 100 streamline fibers sampled per voxel,
resulting in an 86  86 matrix of weighted connectivity values, where each element
represents the conditional probability of a pathway between regions, normalized by the
active surface area of the seed ROI. Finally, the 86  86 connectivity matrix of each
subject was vectorized to its p = 3655 lower triangular elements, resulting in x 2 Rpþ
representing the SC.
Implementation Details. We applied our method to SCs computed from the TBI
dataset to compute the subnetwork bases and their corresponding NMF coefficients;
here we let y = +1 indicate TBI and y = −1 indicate control. The disease-severity
graph was created using the functional outcome indices of GOSE/DRS as follows.
First, we constructed a symmetrized k-nearest-neighbor (k-NN) graph with k = 5,
where the distance between scans i and j was measured as
d_{i,j} = (GOSE_i − GOSE_j)² + (DRS_i − DRS_j)². Then a binary affinity graph was
created by setting S_{i,j} to 1 if and only if scans i and j were connected by the k-NN
graph and did not represent the same subject (to avoid connecting the multiple scans
of a single TBI patient); controls were left unconnected.
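The disease-severity graph construction can be sketched as follows (our own names; the same-subject masking and symmetrization follow the description above):

```python
import numpy as np

def severity_affinity(gose, drs, subject_ids, k=5):
    """Binary disease-severity affinity S: symmetrized k-NN on the squared
    GOSE/DRS distance, with same-subject (and self) edges removed."""
    g = np.asarray(gose, float)
    d = np.asarray(drs, float)
    ids = np.asarray(subject_ids)
    dist = (g[:, None] - g[None, :]) ** 2 + (d[:, None] - d[None, :]) ** 2
    np.fill_diagonal(dist, np.inf)                 # a scan is not its own neighbor
    S = np.zeros_like(dist)
    for i in range(len(g)):
        S[i, np.argsort(dist[i])[:k]] = 1.0        # scan i's k nearest scans
    S = np.maximum(S, S.T)                         # symmetrize the k-NN graph
    S[np.equal.outer(ids, ids)] = 0.0              # drop same-subject edges
    return S
```

Scans with similar GOSE/DRS outcomes end up connected, which is exactly the structure the manifold regularizer then preserves in the embedding.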
We identified r = 5 subnetwork bases using this affinity graph, and the regularization
parameters were set at λ_1 = λ_2 = 0.25, as the model became stable around this
value (degradation in classification performance was observed when the parameters were
set at λ_1 = λ_2 = 0, i.e., a setup equivalent to traditional NMF). To initialize the ADM
variables, we used the strategy introduced in [4] to deterministically initialize W and H,
and set all other variables to zero for replicability. The AL parameter value was set to
ρ = 1000 based on empirical test runs, and the ADM algorithm was terminated when
the relative change in the objective function value (Eq. 2) at successive iterations fell
below 10⁻⁴ and the following primal residual condition was met:

    max( ‖W − W̃_1‖_F / ‖W‖_F, ‖W − W̃_2‖_F / ‖W‖_F, ‖H − PX‖_F / ‖H‖_F, ‖H − H̃‖_F / ‖H‖_F, ‖P − P̃‖_F / ‖P‖_F ) < 10⁻⁴.

To remove features that are likely non-biological, we applied feature selection


using the aforementioned 11 clinical scores. Precisely, we first correlated individual SC
features with each clinical score to obtain 11 separate p-value rankings (rank = 1 the
smallest), and summed these rankings to obtain a rank-sum value for each feature. We
then selected 1000 features having the smallest rank-sum that were then standardized
via linear scaling to the range [0,1]. This feature selection and standardization proce-
dures were conducted within the CV-folds to avoid biasing the classification
performance.
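The rank-sum selection and [0, 1] scaling can be sketched as below. One simplification is ours: rather than computing p-values, we rank features by decreasing |Pearson r| per clinical score, which yields the same ordering as ranking two-sided p-values at a fixed sample size:

```python
import numpy as np

def ranksum_select(X, clinical, n_keep):
    """Rank-sum feature selection: per clinical score, rank features by
    decreasing |Pearson r| (rank 1 = most correlated), sum ranks across
    scores, keep the n_keep smallest, and linearly scale them to [0, 1].
    X : (n, p) SC features;  clinical : (n, q) clinical scores."""
    Xc = X - X.mean(axis=0)
    Cc = clinical - clinical.mean(axis=0)
    r = np.abs(Xc.T @ Cc) / np.outer(np.linalg.norm(Xc, axis=0),
                                     np.linalg.norm(Cc, axis=0))
    order = np.argsort(-r, axis=0)                 # largest |r| first
    ranks = np.empty_like(order)
    for j in range(r.shape[1]):
        ranks[order[:, j], j] = np.arange(1, r.shape[0] + 1)
    keep = np.argsort(ranks.sum(axis=1))[:n_keep]
    Xk = X[:, keep]
    lo, hi = Xk.min(axis=0), Xk.max(axis=0)
    return keep, (Xk - lo) / np.where(hi > lo, hi - lo, 1.0)
```

Run inside each CV fold (on training data only), this keeps the selection from leaking test-set information into the classifier.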
We compared the performance for the following classifiers (implemented using
Liblinear [6]). The first three methods are applied to the 1000 features selected using
the above procedure: (1) L1-loss L2-regularized SVM (SVM), (2) L2-loss, L1 regu-
larized SVM (SVM + L1), (3) L1-regularized Logistic regression (LogReg + L1), and
(4) L1-loss L2-regularized SVM applied to the projected NMF coefficients with our
method. A weighted loss function was used for all classifiers, where the weight
assigned to each label class is inversely proportional to the class frequency. Since
subjects have multiple timepoints, classification accuracy was assessed using a
Leave-One-Subject-Out CV (LOSO-CV) procedure, where all scans from a test subject
are iteratively left out during training. Finally, the hyperparameter C, which is common
to all classifiers, was tuned via an internal LOSO-CV over the range
C ∈ {2⁻¹⁰, 2⁻⁹, …, 2¹⁰}.

3.1 Experimental Results and Conclusions

Classification Results. Table 1 reports the classification results from LOSO-CV for
different methods, showing overall accuracy, specificity (type I error), sensitivity (type II
error), and balanced score rate (BSR), which is the mean of specificity and sensitivity.
The results show that the classification performance obtained using the proposed sub-
network features demonstrates a noticeable improvement over using the SC features in
their original form, achieving an accuracy of 82.0 % and a BSR of 81.8 %. The SVM
achieves the next best performance, but the model is hard to interpret since all 1000 edge
features contribute to the classifier. Finally, despite using a weighted loss function, we
see the sparsity-promoting L1-regularized classifiers suffer from low sensitivity, which

Table 1. Classification results from “leave-one-subject-out” cross-validation.


Classifier Accuracy Sensitivity Specificity BSR
SVM 76.6 % 77.2 % 75.0 % 76.1 %
SVM + L1 69.4 % 73.4 % 59.4 % 66.4 %
LogReg + L1 67.6 % 70.9 % 59.4 % 65.1 %
Proposed NMF + SVM 82.0 % 82.3 % 81.3 % 81.8 %

is likely caused by the label imbalance in the data, as well as the correlated structure
among the features (a case where L1 regularization tends to suffer).
Effect of Manifold Regularization. We next assessed whether the manifold regu-
larizer with the disease-severity graph has successfully preserved the inter-patient
relationship in terms of GOSE/DRS functional outcome indices. To do this, we
computed Spearman’s rank correlation between the subnetwork bases coefficients and
GOSE/DRS indices from the 79 TBI scans. The results reported in Table 2 reveal that
for all basis coefficients, consistently positive and negative correlations (statistically
significant) are obtained for GOSE and DRS, respectively. This result indicates that
subjects with similar level of disease-severity share similar representations in the
embedding space, demonstrating the impact of manifold regularization.

Table 2. Spearman’s correlation coefficients and corresponding p values between the r = 5


subnetwork basis coefficients and DRS/GOSE severity scores among TBI patients.
Basis label Basis coefs’ correlation with DRS Basis coefs’ correlation with GOSE
1 −0.538 (p = 3.24e-7) 0.596 (p = 6.75e-9)
2 −0.464 (p = 1.65e-5) 0.584 (p = 1.63e-8)
3 −0.387 (p = 4.19e-4) 0.408 (p = 1.88e-4)
4 −0.516 (p = 1.12e-6) 0.607 (p = 3.00e-9)
5 −0.517 (p = 1.08e-6) 0.605 (p = 3.54e-9)

Subnetwork Visualization. Given the high predictive capacity of subnetwork coef-


ficients, we next examine their corresponding subnetwork bases W = [w_1, …, w_5] to
assess the pathological impact TBI may have induced on structural connectivity. For
visualization and interpretation, we retrained the proposed NMF model using the entire
dataset, and learned an SVM hyperplane b ∈ R⁵ in the corresponding embedding
space. The resulting subnetworks are rendered in 3-D brain space in Fig. 1 (figures
generated using the Python module Nilearn [1]); the color of the edges represents the
sign of the hyperplane coefficients in b, with red indicating contribution towards TBI
(positive) and blue indicating contribution towards control (negative). From the figure,
we can see

Fig. 1. The subnetwork bases obtained with r = 5. The edge color represents the sign of the
corresponding hyperplane coefficient in b ∈ R^r (blue = negative/control, red = positive/TBI).

that the network structure of the first basis exhibits strong bilateral symmetry with
notable inter-hemispheric connections between the cerebellar, precuneus, and cingulate
regions. Moreover, the second subnetwork basis resembles dense inter-hemispheric
connections among the subcortical regions, with the sign indicating that these edges
tend to be weaker among TBI patients. On the other hand, subnetwork bases 3–5
represent connections associated with TBI. Overall, the subnetworks exhibit a diffuse
connectivity pattern that spans the cortex, suggesting that damage from TBI results
in a widespread disturbance of the brain network. Interestingly, the first two bases
exhibit a rich connectivity pattern within the subcortical and medial posterior regions,
which are frequently reported to be vulnerable in TBI.
Conclusions. We have presented a supervised NMF framework for extracting a disjoint set of subnetworks that are interpretable and highlight group differences in structural connectivity. The method is also capable of preserving the manifold structure in the data encoded by an affinity graph, thereby respecting the intrinsic geometry of the data. Experiments on a TBI dataset show that the subnetworks identified by our method can not only be used to reliably discriminate TBI patients from controls, but also exhibit tight correlations with TBI outcome indices, indicating that subjects with similar levels of TBI severity share similar subnetwork representations due to the manifold regularization.

References
1. Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., et al.: Machine
learning for neuroimaging with scikit-learn. Front. Neuroinformatics 8(14) (2014)
2. Allahyar, A., Ridder, J.: FERAL: network-based classifier with application to breast cancer
outcome prediction. Bioinformatics 31(12), i311–i319 (2015)
3. Behrens, T., et al.: Non-invasive mapping of connections between human thalamus and
cortex using diffusion imaging. Nat. Neurosci. 6(7), 750–757 (2003)
4. Boutsidis, C., Gallopoulos, E.: SVD based initialization: a head start for nonnegative matrix
factorization. Pattern Recognit. 41, 1350–1362 (2008)
5. Cheplygina, V., Tax, D.M., Loog, M., Feragen, A.: Network-guided group feature selection
for classification of autism spectrum disorder. In: Wu, G., Zhang, D., Zhou, L. (eds.) MLMI
2014. LNCS, vol. 8679, pp. 190–197. Springer, Heidelberg (2014)
6. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large
linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
7. Ghanbari, Y., Smith, A.R., Schultz, R.T., Verma, R.: Identifying group discriminative and
age regressive sub-networks from DTI-based connectivity via a unified framework of
non-negative matrix factorization and graph embedding. Med. Image Anal. 18(8) (2014)
8. Kasenburg, N., et al.: Supervised hub-detection for brain connectivity. In: Proceedings of the
SPIE, vol. 9784, Medical Imaging 2016: Image Processing, p. 978409 (2016)
9. Lee, D.D., Seung, H.S.: Learning the parts of objects by NMF. Nature 401, 788–791 (1999)
10. Liu, X., et al.: Projective nonnegative graph embedding. IEEE Trans. Image Process. (2010)
11. Manton, J.H.: Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal
Process. 50(3), 635–650 (2002)
12. Xu, Y., Yin, W., Wen, Z., Zhang, Y.: An alternating direction algorithm for matrix
completion with nonnegative factors. Front. Math. China 7(2), 365–384 (2012)
Predictive Subnetwork Extraction
with Structural Priors for Infant Connectomes

Colin J. Brown1(B) , Steven P. Miller2 , Brian G. Booth1 , Jill G. Zwicker3 ,


Ruth E. Grunau3 , Anne R. Synnes3 , Vann Chau2 , and Ghassan Hamarneh1
1
Medical Image Analysis Lab, Simon Fraser University, Burnaby, BC, Canada
cjbrown@sfu.ca
2
The Hospital for Sick Children and The University of Toronto,
Toronto, ON, Canada
3
University of British Columbia and Child and Family Research Institute,
Vancouver, BC, Canada

Abstract. We present a new method to identify anatomical subnet-


works of the human white matter connectome that are predictive of
neurodevelopmental outcomes. We employ our method on a dataset of
168 preterm infant connectomes, generated from diffusion tensor images
(DTI) taken shortly after birth, to discover subnetworks that predict
scores of cognitive and motor development at 18 months. Predictive sub-
networks are extracted via sparse linear regression with weights on each
connectome edge. By enforcing novel backbone network and connectivity
based priors, along with a non-negativity constraint, the learned subnet-
works are simultaneously anatomically plausible, well connected, posi-
tively weighted and reasonably sparse. Compared to other state-of-the-
art subnetwork extraction methods, we found that our approach extracts
subnetworks that are more integrated, have fewer noisy edges and that
are also better predictive of neurodevelopmental outcomes.

1 Introduction
Preterm birth is a world-wide health challenge, affecting millions of children
every year [1]. Very preterm birth (≤ 32 weeks post-menstrual age, PMA) affects
brain development and puts a child at a high risk for delayed, or altered, cognitive
and motor neurodevelopment. It is known from studies of diffusion MR images that the development of white matter plays a critical role in the function of a child’s brain, and that white matter injury is associated with poorer outcomes [2–5]. Recently, Ziv et al. and Brown et al. showed that by representing the set of
white matter connections as a network (i.e., connectome), features of network
topology could be used to predict abnormal general neurological function and
neuromotor function respectively [4,6].
Representing a diffusion tensor image (DTI) of the brain as a network defined
between regions of interest (ROIs) allows an anatomically informed reduction of
dimensionality from millions of tensor-valued voxels down to thousands of con-
nections (edges). However, for the purposes of prediction, thousands of features

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 175–183, 2016.
DOI: 10.1007/978-3-319-46720-7 21
176 C.J. Brown et al.

may still be too many and cause over-fitting when limited numbers (e.g. only
hundreds) of scans are available [7]. Furthermore, region of interest based studies
suggest that structural abnormalities related to poor neurodevelopmental out-
comes are not spread evenly across the entire brain, but instead are localized to
particular anatomy [3]. Thus, there is motivation to discover which particular
subnetworks (group of connections or edges) in the brain network best predict
different brain functions.
Some previous works have explored the use of brain subnetworks for predict-
ing outcomes [7–10]. For instance, Zhu et al. used t-tests at each edge in a dataset of functional connectomes to assess group discriminability, followed by correlation-based
feature selection and training of a support vector machine (SVM), to find sub-
networks that were predictive of schizophrenia [8]. This multi-stage feature selec-
tion and model training is not ideal, however, because it precludes simultaneous
optimization of all model parameters. Munsell et al. used an Elastic-Net based
subnetwork selection for predicting the presence of temporal lobe epilepsy and
the success of corrective surgery in adults [7]. This method encourages sparse
selection of stable features, useful for identifying those edges most important for
prediction [11], but fails to leverage the underlying structure of the brain net-
works that might inform the importance or the relationships between edges. In
order to capture dependencies between neighbouring edges, Li et al. employed
a Laplacian-based regularizer (in a framework similar to GraphNet [11]) that
encouraged their subnetwork weights to smoothly vary between neighbouring
edges [10]. However, this smoothing may reduce sparsity by promoting many
small weights and blur discontinuities between the weights of neighbouring edges
that should be preserved. An ideal regularizer would encourage a well con-
nected subnetwork while preserving sparsity and discontinuities. Ghanbari et
al. used non-negative matrix factorization to find a sparse set of non-negative
basis subnetworks in structural connectomes [9]. However, rather than trying
to predict specific outcomes (as we propose below), Ghanbari et al. introduced
age-regressive, group-discriminative, and reconstructive regularization terms on
groups of subnetworks, encouraging each group to covary with a particular factor.
They argued that non-negative subnetwork edge weights are more anatomically
interpretable, especially in the case of structural connectomes which have only
non-negative edge feature values.
In this paper, we present our novel approach to identifying anatomical sub-
networks of the human white-matter connectome that are optimally predictive
of a preterm infant’s cognitive and motor neurodevelopmental scores assessed at
18 months of age, adjusted for prematurity. Similar to Munsell et al., our method
is based on a regularized linear regression on the outcome score of choice. Here,
however, we introduce a constraint that ensures the non-negativity of subnet-
work edge weights. We further propose two novel informed priors designed to find
predictive edges that are both anatomically plausible and well integrated into a
connected subnetwork. We demonstrate that these priors effectuate the desired
effect on the learned subnetworks and that, consequently, our method outper-
forms a variety of other competing methods on this very challenging outcome
prediction task. Finally, we discuss the structure of the learned subnetworks in
the context of the underlying neuroanatomy.
Preterm Infant Connectome Subnetworks 177

2 Method
2.1 Preterm Data
Our dataset contains 168 scans taken between 27 and 45 weeks PMA from a
cohort of 115 preterm infants (nearly half of the infants were scanned twice),
born between 24 and 32 weeks PMA. Connectomes were generated for each scan
by aligning an infant atlas of 90 anatomical brain regions with each DTI. Full-
brain streamline tractography was then performed in order to count the number
of tracts (i.e., edge strength) connecting each pair of regions. Our previous works
provide details on the scanning and connectome construction processes [6] and
a discussion on interpreting infant connectomes [5]. Cognitive and neuromotor
function of each infant was assessed at 18 months of age, corrected for prematu-
rity, using the Bayley Scales of Infant and Toddler Development 3rd edition [12].
The scores are normalized to 100 ± 15; adverse outcomes are those with scores
at or below 85 (i.e., ≤ −1 std.).
Our dataset is imbalanced, containing few scans of infants with high and low
outcome scores. In order to flatten this distribution, the number of connectomes
in each training set was doubled by synthesizing instances with high and low
outcome scores, using the synthetic minority over-sampling technique [13].

2.2 Subnetwork Extraction


Given a set of preterm infant connectomes, our goal is to find a subnetwork that
is: (a) predictive (i.e., contains edges that accurately predict a neurodevelopmen-
tal outcome), (b) anatomically plausible (i.e., edges correspond to valid axon
bundles), (c) well connected (i.e., high network integration [5]), (d) reasonably
sparse and (e) non-negative.
Each connectome is represented as a graph G(V, E) comprising a set of 90
vertices, V , and M = 90 × 89/2 = 4005 edges, E. The tract counts associated
with the edges are represented as a single feature vector x ∈ R1×M and the
entire training set of N subjects is represented as X ∈ RN ×M with outcome
scores y ∈ RN ×1 . To find a subnetwork that fits the above criteria, we optimize
an objective function over a vector of subnetwork edge weights, w ∈ RM ×1 :
w∗ = argmin_w ||y − Xw||² + λL1 ||w||1 + λB (wT Bw) + λC (wT Cw)    (1)
such that w ≥ 0,    (2)
where ||w||1 is a sparsity regularization term, B is the network backbone prior
matrix (see Sect. 2.3), and C is the connectivity prior matrix (see Sect. 2.4).
Hyper-parameters, λB , λC and λL1 are used to weight each of the regularization
terms. Given a set of learned weights, w∗ , the outcome score of a novel infant
connectome, xnew can be predicted as ypred = xnew w∗ .
Note that since X is non-negative and since w is required to be non-negative,
we also require y to be non-negative, as they should since the true Bayley scores
range between 45 and 155. To perform this optimization we used the method
(and software) of Schmidt et al. [14].
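The constrained objective in Eqs. (1)–(2) can also be minimized with a simple projected-gradient scheme. The sketch below is a hypothetical minimal re-implementation, not the Schmidt et al. solver the paper uses; the function name and hyper-parameter defaults are illustrative.

```python
import numpy as np

def extract_subnetwork(X, y, B, C, lam_l1=4.0, lam_b=64.0, lam_c=2.0,
                       step=1e-4, n_iter=5000):
    """Projected gradient descent for
    min_w ||y - Xw||^2 + lam_l1*||w||_1 + lam_b*w'Bw + lam_c*w'Cw, w >= 0.
    Since w >= 0, the L1 term reduces to the smooth term lam_l1 * sum(w)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = (2.0 * X.T @ (X @ w - y)       # data-fit term
                + lam_l1                      # L1 term on the feasible set
                + 2.0 * lam_b * (B @ w)       # backbone prior (B symmetric)
                + 2.0 * lam_c * (C @ w))      # connectivity prior (C symmetric)
        w = np.maximum(w - step * grad, 0.0)  # project onto w >= 0
    return w
```

A predicted score for a new connectome feature vector is then simply `x_new @ w`, as in the text.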

2.3 Network Backbone Prior

Many of the 4005 possible connectome edges are anatomically unlikely (i.e.,
between regions not connected by white matter fibers) but may be non-zero in
certain scans due to imaging noise and accumulated pipeline error (i.e. due to
atlas registration, tractography, and tract counting) [15]. With many more edges
than training samples, some edges may appear discriminative by pure chance,
when in fact they are just noise. Therefore, we propose a network backbone prior
term that encodes a penalty discouraging the subnetwork from including edges
with a low signal-to-noise ratio (SNR) in the training data. The SNR of the j-th
edge can be computed as the ratio MEAN(X:,j )/SD(X:,j ). However, this may
falsely declare an edge as noisy when the variability (cf. the denominator) in the edge value is not due to noise but rather due to the edge’s values changing in
a manner that correlates with the outcome of the subject. To counteract this
problem, we divide the scans into two classes: scans with normal outcomes, H,
and scans with adverse outcomes, U . The SNR is then computed separately for
each class. Let XΩ represent a matrix with a subset of the rows in X where Ω ∈
{U, H}. The SNR for each edge, j, in each class, Ω, is computed as SNR(XΩ,j) = MEAN(XΩ,j)/SD(XΩ,j). In order not to favour the strongest fiber bundles over weak yet
important bundles, we threshold the SNR at each edge conservatively, to exclude
only the least anatomically likely edges. An edge, j, is only penalized if both
SNR(XU,j ) and SNR(XH,j ) are less than or equal to 1 (i.e., signal is weaker
than noise in both classes). In particular, B is an M × M diagonal matrix, such
that,
Bj,j = 1, if SNR(XH,j) ≤ 1 and SNR(XU,j) ≤ 1;  Bj,j = 0, otherwise.    (3)

So wT Bw only penalizes edges that do not pass the SNR threshold in both the normal-outcome and the adverse-outcome instances, and thus are likely noisy.
Figure 1 shows an example of B. Note that, especially for infant connectomes,
even edges with high SNR may not represent white matter fibers but instead
high FA from other causes [5]. Nevertheless, such high-SNR edges are not likely
due to noise but instead to some real effect and thus may aid prediction.
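Concretely, B can be assembled from the training matrix and the outcome classes as below. This is a sketch of the construction described above, not the authors' code; the function name is illustrative, and edges with zero variance are treated as infinite-SNR (never penalized).

```python
import numpy as np

def backbone_prior(X, adverse):
    """Diagonal backbone prior B (Eq. 3): penalize edges whose per-class
    SNR (mean/std over training scans) is <= 1 in BOTH outcome classes.
    `adverse` is a boolean mask over the rows of X (True = adverse outcome)."""
    def snr(rows):
        mu = rows.mean(axis=0)
        sd = rows.std(axis=0)
        # Constant edges (sd == 0) get infinite SNR and are never penalized.
        return np.divide(mu, sd, out=np.full_like(mu, np.inf), where=sd > 0)
    low_u = snr(X[adverse]) <= 1.0    # low SNR among adverse-outcome scans
    low_h = snr(X[~adverse]) <= 1.0   # low SNR among normal-outcome scans
    return np.diag((low_u & low_h).astype(float))
```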

2.4 Connectivity Prior

We also want to encourage the subnetwork to be highly integrated as opposed


to being a set of scattered, disconnected edges. This is motivated by the fact
that functional brain network activity is generally constrained to white mat-
ter structure [16] and white matter structure is organized into well connected
link communities [17]. Thus, we do not expect there to be many, disconnected
sub-parts of the brain that are all highly responsible for any particular neurode-
velopmental outcome type. To embed this prior, we incentivize pairs of edges in
the target subnetwork to share common nodes. For edge ei,j , between nodes i
and j, and edge ep,q between nodes p and q, we construct the matrix,

Fig. 1. (a) A sample backbone prior network (i.e., all edges where Bj,j = 0) mapped on
to a Circos ideogram (http://circos.ca/). Inter-hemispherical connections are in green
and intra-hemispherical connections are in red (left) and blue (right). Opacity of each
link is computed as SNR(XU,j ) × SNR(XH,j ). (b) Axial, (c) sagittal and (d) coronal
views of the same network rendered as curves representing the mean shape of all tracts
between those connected regions (from one infant’s scan).


C(ei,j, ep,q) = −1, if i = p or i = q or j = p or j = q;  C(ei,j, ep,q) = 0, otherwise,    (4)

such that the term wT Cw becomes smaller (i.e., more optimal) for each pair of
non-zero weighted subnetwork edges sharing a node. This term places a priority
on retaining edges in the subnetwork that are connected to hub nodes. This is
desirable since subnetwork hub nodes indicate regions that join many connections
(i.e., edges) predictive of outcome. In contrast to a Laplacian based regularizer
which would encourage subnetwork weights to become locally similar, reducing
sparsity, our proposed term simply rewards subnetworks with stronger hubs.
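The matrix C can be built directly from the edge list. The sketch below assumes an upper-triangular ordering of the M = n(n−1)/2 undirected edges and only fills off-diagonal entries (pairs of distinct edges); the helper name is illustrative.

```python
import numpy as np
from itertools import combinations

def connectivity_prior(n_nodes):
    """Connectivity prior C (Eq. 4) over the M = n(n-1)/2 undirected edges:
    C[a, b] = -1 when distinct edges a and b share an endpoint node, else 0."""
    edges = list(combinations(range(n_nodes), 2))  # (i, j) with i < j
    m = len(edges)
    C = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            if set(edges[a]) & set(edges[b]):  # shared node: i = p, i = q, j = p, or j = q
                C[a, b] = C[b, a] = -1.0
    return C
```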

3 Results

We compare the proposed subnetwork-driven predicted outcomes for the preterm


infant cohort (N = 168) with competing outcome prediction techniques. Methods
are evaluated using (i) Pearson’s correlation between ground truth and predicted
scores, and (ii) the area over the regression error characteristic curve (AOC),
which provides an estimate of regression error [18]. Some previous studies have
focused on predicting a binary abnormality label instead of predicting actual
scalar outcome scores [4,6]. Thus, to compare more directly to these works, we
also evaluate the accuracy of our models as a binary classifier for predicting scores
above or below 85. Similar to Brown et al., an SVM was used to classify normal
from abnormal instances as it was found to perform better than thresholding
the predicted scores at 85. SVM learns a max-margin threshold for the predicted
scores (i.e., one input feature), optimal for classification over the training set.
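For a single input feature and separable classes, the max-margin threshold an SVM learns reduces to the midpoint of the gap between the two classes. The sketch below illustrates that reduction (the helper name is illustrative; in practice a soft-margin SVM is trained on the predicted scores):

```python
import numpy as np

def max_margin_threshold(scores, adverse):
    """1-D max-margin split: midpoint between the highest adverse predicted
    score and the lowest normal predicted score (assumes separable scores,
    with adverse outcomes scoring lower)."""
    hi = scores[adverse].max()    # closest adverse-outcome score to the gap
    lo = scores[~adverse].min()   # closest normal-outcome score to the gap
    return 0.5 * (hi + lo)
```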

For each method (both proposed and competing), coarse grid searches were
performed in powers of two over the method’s hyper-parameters to find the
best performance for both cognitive and motor outcomes independently. For the
proposed method, this search was over λL1 , λC , λB ∈ {20 , ..., 29 }. A finer grid
search was not performed to avoid over-fitting to the dataset. For each setting of
the parameters, a leave-2-out, 1000-round cross validation test was performed.
If two scans were of the same infant, those scans were not split between test and
training sets. Table 1 shows a comparison of the different methods tested on the
preterm infant connectomes for prediction of motor and cognitive scores.

Table 1. Correlation (r) between ground-truth and predicted scores, area over REC
curve (AOC) values and classification accuracy of scores at or below 85 (acc.) for
each model, assessed via 1000 rounds of leave-2-out cross validation. Note that Brown
et al.’s method [6] performs binary classification only.

                              -------- Motor --------   ------ Cognitive ------
Method                        r       AOC      acc.      r        AOC      acc.
Zhu et al. [8]                0.1586  27.3904  45.10     0.02055  28.0529  49.65
Elastic-Net [7]               0.2703  24.575   58.75     0.2074   24.8292  54.75
Brown et al. [6]              -       -        62.85     -        -        52.55
Linear regression             0.2696  24.777   58.75     0.2445   24.72    55.15
+ L1 regularization           0.3136  18.5451  64.00     0.2443   24.7514  55.2
+ Non-neg. constraint         0.4327  14.5326  68.80     0.3171   17.7255  57.65
+ Backbone prior              0.4355  14.474   68.55     0.3271   17.8184  58.45
+ Connectivity prior (Ours)   0.4423  14.253   70.80     0.3432   17.3768  59.50

Our proposed method with backbone and connectivity priors achieved the
highest correlations, lowest AOCs and best 2-class classification accuracies
for both motor and cognitive scores (for parameter settings, [λL1 , λC , λB ] of
[22 , 21 , 26 ] and [25 , 22 , 25 ], respectively). For 2-class classification in particular,
our method outperformed Brown et al.’s method by 7.4 %, Elastic-Net [7] by
8.4 % and Zhu et al.’s method [8] by 17.6 % higher accuracy on average. Using a
two-proportion z-test, we found all these differences to be statistically significant
(p < 0.05). Also, note that, beginning with standard linear regression, the corre-
lation values improved as each regularization term was added. All tested methods
had statistically significant (p < 0.05) correlations since, for 1000 × 2 = 2000
total predictions, the threshold for 95 % significance is r ≥ 0.0439.
Figure 2 displays the predictive subnetworks learned by our proposed method
(averaged over all rounds of cross validation). Subnetworks were stable across
rounds: 93.6 % of all edges were consistently in or out of the subnetwork 95 % of
the time. We examined the structure of the selected subnetworks to analyse the
effect of the proposed regularization terms. By including the L1 regularization
term, the learned subnetworks were very sparse, having an average of 71.6 % and

98.2 % of edge weights set to zero for motor and cognitive scores, respectively,
up from only 6.7 % (for either score) without the L1 term. Adding the back-
bone network prior reduced the number of low SNR edges (i.e., Bj,j = 1) by
18.6 % percent for motor score prediction and 11.2 % for cognitive score predic-
tion. Adding the connectivity prior improved subnetwork efficiencies (a measure
of network integration [5]) by a factor of 6.8 (from 0.0059 to 0.0403) and 2.2
(from 0.2807 to 0.6215) for subnetworks predictive of motor and cognitive scores,
respectively.

Fig. 2. (Top) Optimal weighted subnetworks for prediction of (a) motor and (b) cog-
nitive outcomes. Stronger edge weights are represented with more opaque streamlines.
(Bottom) Circos ideograms for the (c) motor and (d) cognitive subnetworks.

As expected, the predictive motor subnetwork clearly includes the cortico-


spinal tracts (Fig. 2a.i). The predictive cognitive subnetwork was more sparse
and had generally lower weights than the motor subnetwork (as visualized by
less dense, more transparent streamlines), due to the larger L1 weight used
for best prediction of the cognitive scores. However, the left and right medial
superior frontal gyri (SFGmed), and the connection between these two regions, had stronger weights (by a factor of 2.1) in the cognitive network than in the motor network (Fig. 2d.ii). This is not surprising as these regions contain the
presupplementary motor area which is thought to be responsible for a range of
cognitive functions [19].

4 Conclusions
To better understand neurodevelopment and to allow for early intervention when
poor outcomes are predicted, we proposed a framework for learning subnetworks
of structural connectomes that are predictive of neurodevelopmental outcomes
for infants born very preterm. We found that by introducing our novel network
backbone prior, the learned subnetworks were more robust to noise by includ-
ing fewer edges with low SNR weights. By including our connectivity prior, the
subnetworks became more highly integrated, a property we expect for subnet-
works pertinent to specific functions. Compared to other methods, our approach
achieved the best accuracies for predicting both cognitive and motor scores of
preterm infants, 18 months into the future.

Acknowledgements. We thank NSERC, CIHR (MOP-79262: S.P.M. and MOP-


86489: R.E.G.), the Canadian Child Health Clinician Scientist Program and the
Michael Smith Foundation for Health Research for their financial support.

References
1. World Health Organization. Preterm birth fact sheet no. 363. http://www.who.
int/mediacentre/factsheets/fs363/en/. Accessed 03 Mar 2015
2. Back, S.A., Miller, S.P.: Brain injury in premature neonates: a primary cerebral
dysmaturation disorder? Ann. Neurol. 75(4), 469–486 (2014)
3. Chau, V., Synnes, A., Grunau, R.E., Poskitt, K.J., Brant, R., Miller, S.P.: Abnor-
mal brain maturation in preterm neonates associated with adverse developmental
outcomes. Neurology 81(24), 2082–2089 (2013)
4. Ziv, E., Tymofiyeva, O., Ferriero, D.M., Barkovich, A.J., Hess, C.P., Xu, D.: A
machine learning approach to automated structural network analysis: application
to neonatal encephalopathy. PLoS ONE 8(11), e78824 (2013)
5. Brown, C.J., Miller, S.P., Booth, B.G., Andrews, S., Chau, V., Poskitt, K.J.,
Hamarneh, G.: Structural network analysis of brain development in young preterm
neonates. NeuroImage 101, 667–680 (2014)
6. Brown, C.J., et al.: Prediction of motor function in very preterm infants using
connectome features and LSI. In: Navab, N., Hornegger, J., Wells, W.M.,
Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 69–76. Springer, Heidelberg
(2015)
7. Munsell, B.C., Wee, C.-Y., Keller, S.S., Weber, B., Elger, C., da Silva, L.A.T.,
Nesland, T., Styner, M., Shen, D., Bonilha, L.: Evaluation of machine learning
algorithms for treatment outcome prediction in patients with epilepsy based on
structural connectome data. NeuroImage 118, 219–230 (2015)
8. Zhu, D., Shen, D., Jiang, X., Liu, T.: Connectomics signature for characterizaton
of MCI and schizophrenia. In: ISBI, pp. 325–328. IEEE (2014)
9. Ghanbari, Y., Smith, A.R., Schultz, R.T., Verma, R.: Identifying group discrimina-
tive and age regressive sub-nets from DTI-based connectivity via a unified frame-
work of NMF and graph embedding. MIA 18(8), 1337–1348 (2014)
10. Li, H., Xue, Z., Ellmore, T.M., Frye, R.E., Wong, S.T.: Identification of faulty DTI-
based sub-networks in autism using network regularized SVM. In: Proceedings of
ISBI, vol. 6, pp. 550–553 (2012)

11. Grosenick, L., Klingenberg, B., Katovich, K., Knutson, B., Taylor, J.E.: Inter-
pretable whole-brain prediction analysis with GraphNet. NeuroImage 72(2), 304–
321 (2013)
12. Bayley, N.: Manual for the Bayley Scales of Infant Development, 3rd edn. Harcourt,
San Antonio (2006)
13. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic
minority over-sampling technique. J. AI Res. 16(1), 321–357 (2002)
14. Schmidt, M.: Graphical model structure learning with l1-regularization. Ph.D. thesis, University of British Columbia, Vancouver (2010)
15. Cheng, H., Wang, Y., Sheng, J., Kronenberger, W.G., Mathews, V.P.,
Hummer, T.A., Saykin, A.J.: Characteristics and variability of structural networks
derived from diffusion tensor imaging. NeuroImage 61(4), 1153–1164 (2012)
16. Honey, C.J., Sporns, O., Cammoun, L., Gigandet, X., Thiran, J.P., Meuli, R.,
Hagmann, P.: Predicting human resting-state functional connectivity from struc-
tural connectivity. Proc. Natl. Acad. Sci. USA 106(6), 2035–40 (2009)
17. de Reus, M.A., Saenger, V.M., Kahn, R.S., van den Heuvel, M.P.: An edge-centric
perspective on the human connectome: link communities in the brain. Phil. Trans.
R. Soc. B 369(1653), 20130527 (2014)
18. Bi, J., Bennett, K.P.: Regression error characteristic curves. In: Proceedings of
ICML-2003, pp. 43–50 (2003)
19. Zhang, S., Ide, J.S., Li, C.S.R.: Resting-state functional connectivity of the medial
superior frontal cortex. Cereb. Cortex 22(1), 99–111 (2012)
Hierarchical Clustering of Tractography
Streamlines Based on Anatomical Similarity

Viviana Siless1(B) , Ken Chang3 , Bruce Fischl1,2 , and Anastasia Yendiki1


1
Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging,
Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
vsiless@mgh.harvard.edu
2
MIT Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
3
Harvard Medical School, Boston, MA, USA

Abstract. Diffusion MRI tractography produces massive sets of stream-


lines that contain a wealth of information on brain connections. The
size of these datasets creates a need for automated clustering methods
to group the streamlines into anatomically meaningful bundles. Con-
ventional clustering techniques group streamlines based on their spa-
tial coordinates. Neuroanatomists, however, define white-matter bundles
based on the anatomical structures that they go through or next to,
rather than their spatial coordinates. Thus we propose a similarity met-
ric for clustering streamlines based on their position relative to cortical
and subcortical brain regions. We incorporate this metric into a hier-
archical clustering algorithm and compare it to a metric that relies on
Euclidean distance, using data from the Human Connectome Project.
We show that the anatomical similarity metric leads to a 20 % improve-
ment in the agreement of clustering results with manually labeled tracts,
without introducing prior information from a tract atlas into the clus-
tering.

Keywords: Hierarchical clustering · Normalized cuts · Tractography ·


dMRI

1 Introduction
Diffusion MRI (dMRI) allows us to estimate the preferential direction of water
molecule diffusion at each voxel in white matter (WM). Tractography algorithms
follow these directions to reconstruct continuous paths of diffusion. The most
common approach to segmenting WM from dMRI data is to use every voxel in the
brain as a seed for tractography and to group the resulting streamlines into bun-
dles. Recent advances in dMRI acquisition hardware and software have increased
both spatial and angular resolution, yielding large tractography datasets that are
difficult to parse manually. This creates a need for computational methods that
can extract anatomically meaningful bundles automatically.
Typical methods for unsupervised clustering of streamlines use similarity
measures based on spatial coordinates [1–3]. This is not consistent with the approach followed by neuroanatomists, who define WM bundles based on the brain

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 184–191, 2016.
DOI: 10.1007/978-3-319-46720-7 22
Anatomical-Based Hierarchical Clustering of Streamlines Tractography 185

structures that they go through or next to, rather than their spatial coordinates
in a template space. Our goal is to develop a similarity measure that mimics
this approach, comparing streamlines based on their anatomical neighborhood.
Previous attempts to incorporate anatomical information in streamline cluster-
ing mostly used the termination regions of the streamlines, either in a post-hoc
manner [3] or in the similarity metric itself [4,5]. The similarity measure that we
propose in this work includes a detailed description of all the regions that form
the anatomical neighborhood of a streamline, everywhere along its trajectory.
Such a description was previously used to incorporate prior information from a
set of training subjects in the tractography step itself [6]. However, that was a
supervised approach, limited to a set of predefined bundles from an atlas.
We incorporate the proposed anatomical similarity measure into a hierarchi-
cal spectral clustering algorithm [1–3]. The benefit of a hierarchical approach
is that it models the structure of large WM tracts, which are known to be
subdivided into multiple smaller bundles. We compare our similarity metric
to one based on Euclidean distance between streamlines, using data from the
MGH/UCLA Human Connectome Project. We show that clustering streamlines
based on their anatomical neighborhood rather than their spatial coordinates
leads to a 20 % improvement in the agreement of the clusters with manual label-
ing by a human rater. Importantly, we achieve this without using prior infor-
mation from manual labels, which allows us to explore whole-brain structure
without being constrained to a predetermined set of bundles.

2 Methods

2.1 Normalized Cuts

Spectral methods approach clustering as a graph partitioning problem. The


graph weights, which represent local information, are used to form a similarity
matrix. Clusters are defined globally based on the eigenvectors of this matrix.
Given a connected graph G, the Normalized Cuts algorithm searches for a graph
cut that divides G into sets A and B (A ∩ B = ∅), by minimizing similarity
between A and B and maximizing similarity within A and B. Clusters are split
recursively, generating a top-down hierarchical structure [7].
When clustering tractography data, each streamline represents a node on a
graph G, and the weight of each edge is the similarity between the nodes it
connects. For the purpose of finding the optimal cut of the graph into A and B,
we need to quantify the similarity between A and B. This is defined as the sum of the weights of the edges between them: s(A, B) = Σ_{u∈A, v∈B} w(u, v), where w(·, ·) is a similarity function between two streamlines. To avoid trivial solutions where A or B is a single isolated node, s is normalized by an association measure a(A) = Σ_{u∈A, t∈G} w(u, t). Thus the minimum cut is defined by:

min_{A,B} snorm(A, B),   snorm(A, B) = s(A, B)/a(A) + s(A, B)/a(B).
186 V. Siless et al.

Embedding the problem in the real-valued domain, it can be efficiently solved
via the eigenvalue problem

$$D^{-\frac{1}{2}}(D - W)D^{-\frac{1}{2}} z = \lambda z,$$

where $D$ is a diagonal matrix with $D_{ii} = \sum_j w(i, j)$, $W$ is the similarity matrix
with $W_{ij} = w(i, j)$, $z = D^{\frac{1}{2}} y$, and $D - W$ is a Laplacian matrix. The second
eigenvector of the Laplacian matrix is a solution to the above equation. The
optimal cut is approximated by assigning the i-th node to A if $y_i > 0$ and to B
otherwise [8]. The algorithm is run iteratively until a desired number of clusters
is reached or a threshold for minimum cluster size is met.
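The recursive bipartitioning described above can be sketched as follows. This is a minimal NumPy illustration of the Shi-Malik procedure, not the authors' implementation; the stopping rule and the handling of degenerate cuts are simplified assumptions.

```python
import numpy as np

def normalized_cut(W):
    """One Shi-Malik bipartition: threshold the second eigenvector of the
    normalized Laplacian D^{-1/2} (D - W) D^{-1/2}."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
    z = vecs[:, 1]                       # second smallest eigenvector
    y = z / np.sqrt(d)                   # map back via z = D^{1/2} y
    return y > 0                         # True -> set A, False -> set B

def recursive_ncut(W, ids, min_size, clusters=None):
    """Top-down hierarchy: split recursively until clusters are small
    (a simplified stand-in for the stopping criteria in the text)."""
    if clusters is None:
        clusters = []
    mask = normalized_cut(W) if len(ids) > min_size else None
    if mask is None or mask.all() or (~mask).all():   # done, or degenerate cut
        clusters.append(ids)
        return clusters
    for m in (mask, ~mask):
        recursive_ncut(W[np.ix_(m, m)], ids[m], min_size, clusters)
    return clusters
```

On a similarity matrix with two well-separated blocks, the first cut recovers the two blocks and the recursion then stops once clusters reach the minimum size.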

2.2 Similarity Measures

Let $f_i$ be a tractography streamline, defined as a sequence of N points $x_k^i \in \mathbb{R}^3$,
$k = 1, \ldots, N$. Each tractography dataset consists of a set of M streamlines,
$F = \{f_1, \ldots, f_M\}$.
We define a similarity measure based on Euclidean distances between two
streamlines $f_i$ and $f_j$ as:

$$w_e(f_i, f_j) = \sum_{k=1}^{N} \left(\|x_k^i - x_k^j\|_2 + 1\right)^{-1}.$$
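A direct translation of this measure, assuming the two streamlines are already resampled to N corresponding points:

```python
import numpy as np

def w_e(fi, fj):
    """Euclidean similarity of two streamlines given as (N, 3) arrays of
    corresponding points; larger values mean the streamlines are closer."""
    d = np.linalg.norm(fi - fj, axis=1)    # ||x_k^i - x_k^j||_2 per point pair
    return float(np.sum(1.0 / (d + 1.0)))
```

Note that streamlines have no canonical start/end; implementations often also evaluate the reversed point ordering and keep the larger similarity. That step is an assumption on our part and is not shown, since the text does not specify it.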

We incorporate anatomical information in the clustering with a cortical and
subcortical segmentation, $S(x)$, $x \in \mathbb{R}^3$. We then find the segmentation labels that
each streamline goes through or next to. Specifically, each point x on a streamline
is associated with a set of segmentation labels, $S(x + d_l(x) v_l)$, $l = 1, \ldots, P$, where
$d_l(x)$ is the minimum $d > 0$ such that $S(x + d v_l) \neq S(x)$. That is, for each point
x, we find the nearest neighboring segmentation labels in a set of directions $v_l$,
$l = 1, \ldots, P$. For a neighborhood of P = 26 elements, $v_l = [e_1, e_2, e_3]$, where
$e_{1,2,3} \in \{-1, 0, 1\}$. In the case of $e_1 = e_2 = e_3 = 0$, we get the label that the
streamline goes through. In all other cases we get its neighboring labels.
For each neighborhood element $l = 1, \ldots, P$, we compute a histogram $H_i^l$
that represents the frequency with which different segmentation labels are the
l-th neighbor across all points on the i-th streamline. We then define our anatomical
similarity between two streamlines $f_i$ and $f_j$ as:

$$w_a(f_i, f_j) = |L_i \cap L_j| \sum_{l=1}^{P} \langle H_i^l, H_j^l \rangle,$$

where $\langle \cdot, \cdot \rangle$ is the inner product, and $L_i$, $L_j$ are the sets of all neighboring labels
for streamlines $f_i$, $f_j$. The normalization term $|L_i \cap L_j|$, which is the number of
common neighbors between the streamlines, penalizes trivial streamlines with
too few neighbors. The sum in the above equation can be seen as the joint
probability of the anatomical neighborhoods of two streamlines.
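The neighbor-label histograms and the similarity could be computed along these lines. This is an illustrative sketch on a voxel grid only: the paper works with registered, real-valued coordinates and frequency-normalized histograms, whereas this simplified version uses raw counts, a fixed step cap, and omits the zero-direction label.

```python
import numpy as np
from collections import Counter
from itertools import product

# The 26-neighborhood directions: all e in {-1, 0, 1}^3 except the origin.
DIRS = [np.array(e) for e in product((-1, 0, 1), repeat=3) if any(e)]

def neighbor_label(seg, x, v, max_steps=50):
    """Walk from voxel x along direction v until the segmentation label
    changes; return that first different label (None if the border is hit)."""
    start = seg[tuple(x)]
    p = x.copy()
    for _ in range(max_steps):
        p = p + v
        if np.any(p < 0) or np.any(p >= np.array(seg.shape)):
            return None
        if seg[tuple(p)] != start:
            return seg[tuple(p)]
    return None

def label_histograms(seg, streamline):
    """One label histogram per direction, aggregated over all points
    of the streamline (points rounded to the voxel grid here)."""
    hists = [Counter() for _ in DIRS]
    for x in np.round(streamline).astype(int):
        for l, v in enumerate(DIRS):
            lab = neighbor_label(seg, x, v)
            if lab is not None:
                hists[l][lab] += 1
    return hists

def w_a(hists_i, hists_j):
    """Anatomical similarity: |L_i ∩ L_j| * sum_l <H_i^l, H_j^l>."""
    Li = set().union(*[set(h) for h in hists_i])
    Lj = set().union(*[set(h) for h in hists_j])
    inner = sum(sum(hi[k] * hj[k] for k in hi)
                for hi, hj in zip(hists_i, hists_j))
    return len(Li & Lj) * inner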
Anatomical-Based Hierarchical Clustering of Streamlines Tractography 187

3 Results
3.1 Data Analysis

We used dMRI and structural MRI (sMRI) data from 32 healthy subjects,
scanned as part of the Human Connectome Project (humanconnectomeproject.org).
The data was acquired with the MGH Siemens Connectom, a Skyra 3T MRI
system with a custom gradient capable of maximum strength 300 mT/m and
slew rate 200 T/m/s. The sMRI data was acquired with MEMPRAGE [9],
TR = 2530 ms, TE = 1.15 ms, TI = 1100 ms, 1 mm isotropic resolution.
The dMRI data was acquired with 2D EPI, TR = 8800 ms, TE = 57.0 ms, 1.5 mm
isotropic resolution, 512 gradient directions, $b_{max}$ = 10,000 s/mm².
We reconstructed orientation distribution functions using the generalized
q-sampling imaging model [10] and performed deterministic tractography using
DSI Studio [11]. We obtained a total of 500 k streamlines. As we are interested in
long-range connections, and to make computations tractable, we excluded any
streamlines shorter than 55 mm, leaving on the order of 100 k streamlines per
subject. Streamlines were then downsampled to N = 10 equispaced points.
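Downsampling a streamline to N equispaced points can be done by linear interpolation along arc length, for example (a generic sketch, not necessarily the resampling used by the authors):

```python
import numpy as np

def downsample(points, n=10):
    """Resample a streamline to n points equally spaced along its arc
    length, with linear interpolation between the original points."""
    points = np.asarray(points, float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])             # cumulative arc length
    target = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(target, s, points[:, d]) for d in range(3)])
```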
For comparison with unsupervised clustering, a trained rater labeled the 18
major WM bundles manually for each subject: corticospinal tract (cst), inferior
longitudinal fasciculus (ilf), uncinate fasciculus (unc), anterior thalamic radia-
tion (atr), cingulum - supracallosal bundle (ccg), cingulum - infracallosal (angu-
lar) bundle (cab), superior longitudinal fasciculus - parietal bundle (slfp), supe-
rior longitudinal fasciculus - temporal bundle (slft), corpus callosum - forceps
major (fmaj), corpus callosum - forceps minor (fmin) [12].
Each subject’s dMRI and sMRI data was co-registered with an affine trans-
formation. The anatomical segmentation was obtained by processing the sMRI
data with the automated cortical parcellation and subcortical segmentation tools
in FreeSurfer [13,14]. In addition, subcortical WM labels were defined by classi-
fying each WM voxel that was within 5 mm from the cortex based on its nearest
cortical label. This resulted in a total of 261 cortical and subcortical labels.
We performed unsupervised clustering with the two similarity measures
described in the previous section. For the anatomical similarity measure we eval-
uated neighborhoods with 6, 14 and 26 elements. Due to space constraints, we
show here results with the 26-element neighborhood only as it performed best.
We iterated the clustering algorithm until a total of 200 clusters were gener-
ated. To evaluate the algorithm for different numbers of clusters, we pruned the
hierarchical clustering tree to keep the first 75, 100, 125, 150 or 200 clusters.

3.2 Comparison with Manual Labeling

We use the 18 manually labeled bundles to evaluate correspondence between
unsupervised clustering and labeling by a human rater. We compute the Dice
coefficient [15] between each manually labeled bundle and the union of all clus-
ters for which at least 5 % of streamlines belong to the manually labeled bundle.
Figure 1(a) shows the average Dice coefficient over all 18 tracts and 32 subjects,


Fig. 1. (a) Average Dice coefficient of clusters and manually labeled tracts over 18
tracts and 32 subjects, as a function of the total number of clusters. (b) Average Dice
coefficient over all subjects by tract, when the total number of clusters is 200.


Fig. 2. Average homogeneity (a) and completeness (b) over 18 tracts and 32 subjects,
as a function of the number of clusters.

as a function of the total number of clusters. Figure 1(b) shows the average Dice
coefficient over all subjects by tract, when the total number of clusters is 200.
The anatomical similarity measure is 20 % better than the Euclidean similarity
measure in terms of its agreement with bundles defined by a human rater. We
also compute homogeneity and completeness, two metrics that are commonly
used to evaluate clustering quality [16]. In Fig. 2 we show homogeneity (a) and
completeness (b) for both the anatomical and Euclidean similarity measure. This
comparison takes into account only streamlines that belong to one of the manu-
ally labeled tracts, as it requires ground truth classes. Our anatomical similarity
measure outperforms the Euclidean similarity measure in both homogeneity and
completeness (p < .0001).
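The Dice-based comparison can be sketched as follows, treating a manually labeled bundle and each cluster as sets of streamline indices (the helper names are ours):

```python
def dice(a, b):
    """Dice coefficient between two sets of streamline indices [15]."""
    a, b = set(a), set(b)
    return 2.0 * len(a & b) / (len(a) + len(b))

def matched_union(bundle, clusters, min_frac=0.05):
    """Union of all clusters in which at least min_frac of the streamlines
    belong to the manually labeled bundle."""
    bundle = set(bundle)
    union = set()
    for c in clusters:
        if len(set(c) & bundle) / len(c) >= min_frac:
            union |= set(c)
    return union
```

Homogeneity and completeness as defined in [16] are available off the shelf, e.g. as `sklearn.metrics.homogeneity_completeness_v_measure`.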

3.3 Anatomical and Spatial Consistency of Clusters


One might expect streamlines that are close to each other in Euclidean space to
also have common anatomical neighbors, and vice versa. Here we evaluate the
extent to which streamlines that are clustered based on the anatomical similarity
metric wa are also similar with respect to the Euclidean similarity metric we and
vice versa. To this end, we calculate the average of all streamlines in a cluster,
and find the streamline closest to this average. We refer to this streamline as
the centroid of the cluster. We then evaluate we and wa between every element
in a cluster and the cluster centroid. Average similarities are shown in Fig. 3.
Figure 3(a) shows plots of we for clusters derived by optimizing either we or wa .
Figure 3(b) shows plots of wa for clusters derived by optimizing either we or
wa . Of course we expect that the we -optimal clusters will have higher we than
the wa -optimal clusters, and vice versa. This is confirmed in the plots. However,
we find that the wa -optimal clusters are also near-optimal in terms of we but
the reverse is not true. Based on a two-way analysis of variance, where the
factors are the clustering methods and the number of clusters, we find that the
difference in we between the wa -optimal clusters and the we -optimal clusters is
not statistically significant (p = .48), whereas the difference in wa between them
is (p < .0001). This implies that, while wa tends to group together streamlines
that are also close to each other in the Euclidean space, we does not necessarily
yield clusters that are anatomically consistent.
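The centroid construction described above, in a minimal form:

```python
import numpy as np

def centroid_streamline(cluster):
    """Index of the streamline closest to the point-wise average of all
    streamlines in the cluster (each streamline has shape (N, 3))."""
    stack = np.stack(cluster)                       # (M, N, 3)
    mean = stack.mean(axis=0)                       # the "average" streamline
    dists = np.linalg.norm(stack - mean, axis=2).sum(axis=1)
    return int(np.argmin(dists))
```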


Fig. 3. Average Euclidean similarity (a) and anatomical similarity (b) for each of the
two clustering methods, as a function of the number of clusters.

In Fig. 4 we show, for four pairs of anatomical ROIs, the clusters for which
at least 5 % of streamlines pass through both ROIs when the number of clusters
is 200. The Euclidean similarity measure produces noisier and less anatomically
consistent clusters than the anatomical similarity measure. For example, stream-
lines that lie on opposite sides of the midline but are close to each other in space
may be erroneously clustered together by we , but not by wa (see Fig. 4(d)).

Fig. 4. Union of clusters containing at least 5 % of streamlines passing through a pair of
anatomical ROIs: (a) precentral and brainstem, (b) superior parietal and brainstem, (c)
superior temporal and precentral, (d) isthmus cingulate and rostral anterior cingulate.
Top row: Euclidean similarity; bottom row: anatomical similarity.

To evaluate the robustness of our method to the accuracy of the anatomical
segmentation, we randomly perturbed the label borders by 1.5 mm. This experiment
did not lead to significant changes in the performance of our method (p = .7
for homogeneity, .7 for completeness, .3 for Dice). Although we show results
only on the long-range (>55 mm) streamlines to reduce computation, prelimi-
nary tests on full tractography datasets indicate that the algorithm also works
when shorter streamlines are included. The algorithm runs in ∼45 min with the
anatomical similarity measure when the computation over anatomical neighbors is parallelized. It
runs in ∼1 h when the Euclidean similarity measure is used. Segmentations are
computed in ∼12 h per subject.

4 Conclusion
We present a method for unsupervised hierarchical clustering of dMRI tractog-
raphy data based on anatomical similarity. We compare this to the conventional
approach of using a similarity based on Euclidean distance. We find that the
anatomical similarity yields results more consistent with manual labeling. That
is, without introducing any training data from human raters, we are able to
obtain results that are in closer agreement with such a rater. We achieve this
simply by using a similarity metric that is better at replicating how a human
with neuroanatomical expertise would segment WM tracts, i.e., based on the
anatomical structures that they either intersect or neighbor, everywhere along
the tracts’ trajectory. This allows us to obtain anatomically meaningful WM

bundles without being limited to a small set of tracts included in an atlas. We
expect that our approach, which relies on the relative positions between streamlines
and a set of anatomical structures that is encountered in all subjects, will
allow us to identify bundles that are consistent across subjects and populations.
We plan to investigate this further in the future.

Acknowledgement. This research was supported by the Boston Adolescent Neuroimaging
of Depression and Anxiety project (U01-MH108168), and was made possible
by the resources provided by Shared Instrumentation Grants 1S10RR023401,
1S10RR019307, and 1S10RR023043. Data were provided by the MGH/UCLA Human
Connectome Project (5U01-MH093765).

References
1. O’Donnell, L., et al.: Automatic tractography segmentation using a high-
dimensional white matter atlas. IEEE Trans. Med. Imaging 26, 1562–1575 (2007)
2. Guevara, P., et al.: Robust clustering of massive tractography datasets. NeuroImage 54(3), 1975–1993 (2011)
3. Wassermann, D., et al.: Unsupervised white matter fiber clustering and tract prob-
ability map generation: applications of a Gaussian process framework for white
matter fibers. NeuroImage 51(1), 228–241 (2010)
4. Wang, Q., et al.: Application of neuroanatomical features to tractography cluster-
ing. Hum. Brain Mapp. 34(9), 2089–2102 (2013)
5. Tunc, B., et al.: Automated tract extraction via atlas based adaptive clustering. NeuroImage 102(Part 2), 596–607 (2014)
6. Yendiki, A., et al.: Automated probabilistic reconstruction of white-matter path-
ways in health and disease using an atlas of the underlying anatomy. Front. Neu-
roinform. 5(23), 12–23 (2011)
7. Shi, J., et al.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal.
Mach. Intell. 22(8), 888–905 (2000)
8. Golub, G.H., et al.: Matrix Computations. Johns Hopkins University Press, Baltimore (1996)
9. van der Kouwe, A., et al.: Brain morphometry with multiecho MPRAGE. Neu-
roImage 40(2), 559–569 (2008)
10. Yeh, F.C., et al.: Generalized q sampling imaging. IEEE Trans. Med. Imaging
29(9), 1626–1635 (2010)
11. Yeh, F.C., et al.: Deterministic diffusion fiber tracking improved by quantitative
anisotropy. PLoS ONE 8(11), 11 (2013)
12. Wakana, S., et al.: Reproducibility of quantitative tractography methods applied to cerebral white matter. NeuroImage 36(3), 630–644 (2007)
13. Fischl, B., et al.: Whole brain segmentation: automated labeling of neuroanatom-
ical structures in the human brain. Neuron 33(3), 341–355 (2002)
14. Fischl, B., et al.: Automatically parcellating the human cerebral cortex. Cereb.
Cortex 14(1), 11–22 (2004)
15. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology
26(3), 297–302 (1945)
16. Rosenberg, A., et al.: V-measure: a conditional entropy-based external cluster eval-
uation measure. In: Proceedings of 2007 Joint Conference on Empirical Methods
in Natural Language Processing and Computational Natural Language Learning
(EMNLP-CoNLL), pp. 410–420 (2007)
Unsupervised Identification of Clinically
Relevant Clusters in Routine Imaging Data

Johannes Hofmanninger(B), Markus Krenn, Markus Holzer, Thomas Schlegl,
Helmut Prosch, and Georg Langs

Department of Biomedical Imaging and Image-guided Therapy,
Computational Imaging Research Lab, Medical University of Vienna, Vienna, Austria
johannes.hofmanninger@meduniwien.ac.at
http://www.cir.meduniwien.ac.at/

Abstract. A key question in learning from clinical routine imaging data
is whether we can identify coherent patterns that re-occur across a population,
and at the same time are linked to clinically relevant patient
parameters. Here, we present a feature learning and clustering approach
that groups 3D imaging data based on visual features at corresponding
anatomical regions extracted from clinical routine imaging data without
any supervision. On a set of 7812 routine lung computed tomography vol-
umes, we show that the clustering results in a grouping linked to terms
in radiology reports which were not used for clustering. We evaluate dif-
ferent visual features in light of their ability to identify groups of images
with consistent reported findings.

1 Introduction
The number of images produced in radiology departments is rising rapidly, gen-
erating thousands of records per day that cover a wide range of diseases and
treatment paths [9]. Identifying diagnostically relevant markers in this data is
a key to improving diagnosis and prognosis. Currently, computational image
analysis typically relies on well annotated and curated training data such as
COPDGene or LTRC1 that have fostered substantial methodological advance.
While these kind of data sets enable the creation of accurate and sensitive detec-
tors for specific findings, they are limited, since annotation is only feasible on
a relatively small number of cases. Selection or study specific data acquisition
can introduce bias, and limits the range of observations represented in the data.
In contrast, learning from routine data could enable the discovery of relation-
ships and markers beyond those that can be feasibly annotated, sampling a wide
variety of cases. Furthermore, unsupervised learning on such data enables the
search for novel disease phenotypes that better reflect a grouping of patients
with similar prognosis, than current categories do.
G. Langs—This research was supported by teamplay which is a Digital Health Ser-
vice of Siemens Healthineers, by the Austrian Science Fund, FWF I2714-B31, and
WWTF S14-069.
¹ www.copdgene.org (COPDGene), ltrcpublic.com (LTRC).

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 192–200, 2016.
DOI: 10.1007/978-3-319-46720-7_23

In this paper, we propose unsupervised learning to group patients based on
non-annotated clinical routine imaging data. We show that based on learned
visual features, we identify population clusters with homogeneous (within clus-
ters) but distinct (across clusters) clinical findings. To evaluate the link between
visual clusters and clinical findings, we compare clusters with corresponding radi-
ology report information extracted with natural language processing algorithms.
An overview of the workflow is given in Fig. 1.

[Fig. 1 workflow: (a) PACS image data → segmentation → spatial normalization →
feature extraction → clustering → cluster analysis; (b) radiology reports → term extraction]
Fig. 1. Population clustering and evaluation. (a) All processing steps towards pop-
ulation clustering are performed unsupervised and use anonymized routine images
exported from a PACS system. (b) Findings extracted from radiology reports are used
to evaluate if clusters reflect disease phenotypes in the population.

Relation to Previous Work. Radiomics [11] involving (a) imaging data, (b)
segmentation, (c) feature extraction and (d) analysis [10] has recently gained
significant attention, but approaches that reduce the reliance on annotation to
extend the coverage of variability are scarce. Our work is a contribution in this
direction. Although applicable to a large number of conditions, radiomics is
mostly applied and developed in oncology [1,3,11]. Aerts et al. use a large number
of routine CT images of cancer patients recorded on multiple sites to discover
prognostic tumor phenotypes [1]. Wibmer et al. differentiate malign from benign
prostate tissue by analysing texture features extracted from MRI images [17].
Shin et al. learn semantic associations between radiology images and reports
from a data set extracted from a PACS [14], but only use pre-selected 2D key
slices that were referenced by clinicians.
The proposed radiomics approach differs from previous techniques in several
significant aspects. We do not restrict analysis to a certain disease type or a
small region of interest but implement a general form of population analysis.
The most significant difference to prior work is that human interaction is not a
prerequisite to bring images into processable form. We do not require selection of
key images [14] or manual annotation of regions of interest [1,11,17]. In order to
make this possible, spatial normalization involving localization and registration
is performed. The resulting non-linear mapping to a common reference space
allows coordinates and label-masks to be transferred across the population. We
extract texture and shape features and use Latent Dirichlet Allocation (LDA)
[2] to discover latent topics of co-occurring feature classes that are shared across
the population. Subsequently, these topics are used to build volume descriptors
by encoding the contribution of each topic to a specific subject.

2 Identification of Clusters
Spatial Normalization. We perform spatial normalization to establish spatial
correspondences of voxels across the population. This allows studying location-dependent
visual variation without the need for manual definition of regions of
interest or preselection of imaging data only showing a specific organ. For this
purpose, we perform non-linear registrations of all images to a common reference
atlas. For a given image Ii ∈ {I1 , . . . , II } and an atlas A, we seek to find a non-
linear transformation T so that A ≈ T(Ii ). High variability in the data such as
the absence of organs, variation in size and shape or diseases poses challenges to
such a registration process. To account for part of this variation in the normalization
process, we implement a multi-template approach (Fig. 2). Instead of a
direct mapping to an atlas, images are registered to a set of template candidates
{E1 , . . . , EE } that cover variability in the population. The transformations of the
templates to the atlas are performed in advance, when building the template-set.
They are carefully supervised and supported by manually annotated landmarks
to ensure high quality registrations.


Fig. 2. Multi-Template Approach. During normalization, an image (a) is aligned
to multiple templates (b). All templates are aligned with the atlas (c) by a high-quality
registration. An image is mapped to the atlas by concatenation of the two
corresponding transformations that yield maximal registration quality. After normalization,
sponding transformations that yield maximal registration quality. After normalization,
coordinates and label masks are mapped across the population (d).

Let $T_{ie}$ denote a non-linear transformation from $I_i$ to a template $E_e$ and
$T_{eA}$ the transformation from $E_e$ to A. $I_i$ is then mapped to A by concatenating
both non-linear transformations so that A ≈ TeA (Tie (Ii )). The use of multiple
templates gives a candidate set of registrations of a fragment to the reference
atlas. Normalized Cross Correlation (NCC) is then used as a quality criterion to
select the best transformation by

$$\arg\max_{1 \le e \le E} NCC(A, T_{eA}(T_{ie}(I_i))) \qquad (1)$$

In most cases, radiology images cover a delimited region rather than the whole
body. To identify the location and extent of these fragments in the templates, we

perform rigid and affine transformations. An initial rigid position estimation is


performed by utilizing correlated 3D-SIFT features [15]. For the template set and
the atlas, we use 17 volumes of the VISCERAL Anatomy 3 dataset [4], which
provides CT volumes paired with manually annotated landmarks and organ
masks. Non-linear registrations are performed on an isotropic voxel resolution of
2 mm using Ezys [5].
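The template selection of Eq. (1) can be sketched as follows. The warps are stand-in callables here, an assumption for illustration; in the actual pipeline they are precomputed non-linear registrations.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally shaped volumes."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())

def best_template(image, atlas, warps_image_to_template, warps_template_to_atlas):
    """Eq. (1): pick the template index e that maximizes
    NCC(A, T_eA(T_ie(I_i))) over the candidate set of registrations."""
    scores = [ncc(atlas, t2a(i2t(image)))
              for i2t, t2a in zip(warps_image_to_template, warps_template_to_atlas)]
    return int(np.argmax(scores))
```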
Feature Extraction. We extract two types of features that capture comple-
mentary visual characteristics in order to map an image to a visual descriptor
representation so that Ii → fi .
1. Texture Features. We densely sample Haralick [6] features of orientation
independent Gray-Level Co-occurrence Matrices similar to the work in [16].
Haralick features are able to encode 3D texture and have been used to clas-
sify lung diseases [7,16] or distinguish between cancerous and benign breast
tissue [17].
2. Shape Features. We extract 3D-SIFT [15] features to encode rotation vari-
ant gradient changes such as shape. 3D-SIFT has been used in diagnosis of lung
and brain diseases [8,13].
3. Bag of Words. We follow the Bag of Visual Words paradigm to summarize
local features into global volume descriptors. Beforehand, we augment the features
with their spatial position in the reference space. This enables training spatio-visual
vocabularies. To account for the different occurrence frequencies of small
and large 3D-SIFT features, we train two separate vocabularies, microSIFT
(3D-SIFT features with diameter ≤ 2 cm) and macroSIFT (diameter > 2 cm). We
denote $f_i^H$ (Haralick) and $f_i^S$ (SIFT) as the word-count feature representations
for an image $I_i$.
4. Embedding. Finally, we learn a set of 20 latent topics of co-occurring feature
settings of $f^H$ and $f^S$ using Latent Dirichlet Allocation (LDA) [2]. This allows
interpreting an image as a mixture of topics, represented by its 20-dimensional
topic assignment vector $f_i^L$.
Clustering. We perform clustering of the population to retrieve groups of sub-
jects with (visually) similar properties. Here we interpret the Euclidean distance
between two volume descriptors as a measure of visual similarity. This allows
extracting clusters using vector quantization. For this purpose, we use the k-means
algorithm, an iterative procedure that minimizes the sum of squared errors within
the clusters. Each subject i is mapped to one cluster c(i) : i → k, k ∈
{1, . . . , K}. We evaluate if these clusters based on visual information reflect
homogeneous profiles of findings in the corresponding radiology reports.
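The bag-of-words quantization and the k-means grouping can be illustrated as follows. This is a simplified sketch: vocabulary learning and the LDA embedding $f^L$ are omitted (e.g. `sklearn.decomposition.LatentDirichletAllocation` could provide the latter), the initialization is naive, and the helper names are ours.

```python
import numpy as np

def build_descriptor(local_features, vocabulary):
    """Bag of visual words: assign each local (spatially augmented) feature
    to its nearest vocabulary word and return the word-count histogram."""
    d = np.linalg.norm(local_features[:, None, :] - vocabulary[None, :, :], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(vocabulary))

def kmeans(X, k, iters=50):
    """Plain Lloyd k-means for grouping volume descriptors. Deterministic
    naive init (first k points); real use would apply k-means++ and
    several random restarts, as done with the seeds in Sect. 4."""
    centers = X[:k].astype(float).copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # keep empty clusters unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```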

3 Evaluation
Data. Experiments are performed on a set of 7812 daily routine CT scans
acquired in the radiology department of a hospital. The dataset includes all
CT scans that were taken during a period of 2 1/2 years and show the lung. We
only include volumes with slice thickness of ≤3 mm, where the number of slices
exceeds 100 and a high spatial frequency reconstruction kernel (e.g. B60, B70,

B80, I70, I80,. . . ) was used. For a subset of 5886 cases, the radiology reports in
the form of unstructured text are available.
Term Extraction. We build a NLP framework for automatic extraction of
terms describing pathological findings in radiology reports. Extracted terms are
mapped to the RadLex2 ontology, which provides a unified vocabulary of clinical
terms, and models relationships by mapping into multiple hierarchies. One of
these hierarchies comprises all words that are related to pathological findings.
We identify pathological terms by searching for words and their synonyms in
the report that are part of this specific hierarchy. The words are then mapped
to their respective RadLex term. Our framework is furthermore able to identify
negations, so that explicitly negated terms are ignored. We define T as the
number of distinct pathological terms and substitute each term by an integer
number {1, . . . , T }. We define Ti as the set of all terms that occur in the radiology
report of subject i. For further analysis we only consider terms that occur more
than 50 times resulting in a set of T = 69 distinct terms.
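A toy version of term extraction with negation handling is sketched below. The synonym mapping is an invented stand-in for the RadLex vocabulary, and the fixed negation window is an assumption; this is not the authors' NLP framework.

```python
import re

# Toy stand-in for the RadLex mapping: surface form -> canonical term id.
SYNONYMS = {"effusion": 1, "emphysema": 2, "bulla": 3}
# Simple negation cues (English and German, as routine reports may be German).
NEGATIONS = ("no", "without", "kein", "keine", "ohne")

def extract_terms(report, window=3):
    """Return the set of term ids mentioned and not explicitly negated.
    A term counts as negated when a cue appears within `window` tokens
    before it; multi-word synonyms would need phrase matching (omitted)."""
    tokens = re.findall(r"[a-zäöüß]+", report.lower())
    found = set()
    for i, tok in enumerate(tokens):
        if tok in SYNONYMS:
            if not any(n in tokens[max(0, i - window):i] for n in NEGATIONS):
                found.add(SYNONYMS[tok])
    return found
```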
Evaluating Associations Between Visual Clusters and Report Terms.
For evaluation, we restrict the area of interest to the lung, so that only features
extracted in the lung are used. Clustering is performed on the full set of images,
while for evaluation only records with a report are considered. The aim of the
evaluation is to test the hypothesis that the clustering reflects pathological subgroups
in the population. To do so, we test whether volume label assignments
(pathology terms) are associated with cluster assignments. A cell-χ2 -test is per-
formed for each term t ∈ {1, . . . , T } and each cluster k ∈ {1, . . . , K} to test
whether its cluster frequency V is significantly different from its population fre-
quency C by a 2 × 2 contingency table:

Here, B denotes the total number of subjects in the population and R the
size of a cluster. Since V is potentially small, we perform Fisher’s exact test. This
results in a p-value that gives the statistical significance of term t being over or
under represented in cluster k. Testing for each cluster independently increases
the Family-Wise Error (FWE) rate and inflates the probability of making a false
discovery of an association between the term and a cluster. We strongly control
the FWE by correcting the p-values with the Bonferroni-Holmes approach. We
define ptk as the corrected p-value for term t being associated with cluster k and
ORtk as the corresponding Odds Ratio. As this is an exploratory analysis we do
not correct the p-values on the term level.
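The test and correction described above can be sketched with a self-contained two-sided Fisher's exact test and Holm step-down correction; `scipy.stats.fisher_exact` computes the same test in practice. The quality criterion of Eq. (2) is included; the function names are ours.

```python
import numpy as np
from math import comb, inf

def fisher_p(V, R, C, B):
    """Two-sided Fisher's exact test for the 2x2 table
    [[V, R-V], [C-V, B-R-(C-V)]]: V subjects of the cluster carry the term,
    R is the cluster size, C the population frequency, B the total count."""
    pmf = lambda x: comb(C, x) * comb(B - C, R - x) / comb(B, R)
    lo, hi = max(0, R + C - B), min(R, C)
    p_obs = pmf(V)
    return sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))

def odds_ratio(V, R, C, B):
    """Sample odds ratio of the same 2x2 table (inf for a zero cell)."""
    a, b, c, d = V, R - V, C - V, B - R - (C - V)
    return (a * d) / (b * c) if b * c else inf

def holm(pvals):
    """Holm step-down correction, strongly controlling the FWE."""
    p = np.asarray(pvals, float)
    order = np.argsort(p)
    adj = np.maximum.accumulate(p[order] * (len(p) - np.arange(len(p))))
    out = np.empty_like(p)
    out[order] = np.minimum(adj, 1.0)
    return out

def quality(p_corrected, alpha=0.05):
    """Q_K of Eq. (2): number of significant (term, cluster) associations."""
    return int((np.asarray(p_corrected) <= alpha).sum())
```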
Quality Criterion of Clusters. We interpret the number of discovered asso-
ciations between cluster and terms as a measure of quality of the population

² http://www.rsna.org/RadLex.

clustering. This not only allows quantifying the relative quality of an image
descriptor, but also enables finding the optimal number of clusters. For a predefined
number of clusters K we define the measure of quality

$$Q_K = \sum_{k=1}^{K} \sum_{t=1}^{T} [p_{tk} \le 0.05]. \qquad (2)$$

4 Results
Figure 3 shows values of the quality criterion (Eq. 2) for varying numbers of clusters K,
using the LDA volume descriptor f L for clustering. K-means is based on random
initialization. Thus, to rule out random effects, we perform the experiments with
a set of 5 different random seeds. Graphs are shown for each seed (gray), the
average result (blue) that was used to determine the number of clusters and
the random seed (red) for which the evaluation results are reported. Figure 4
shows a comparison of different feature sets (f H , f S and f L ) with respect to the
clustering quality $Q_K$. Concatenating texture and shape features $[f^H f^S]$ allows
discovering more structure in the data than each feature set individually. The
LDA embedding $f^L$ further improves the number of associations discovered. For
all further results, the descriptor $f^L$ and a fixed number of K = 20 clusters are used. Figure 5
illustrates the visual variability of the data by showing a 2D visualization of
the f L descriptors using t-SNE [12]. In addition, exemplary slices of volumes
at different positions in the feature space are shown. Figure 6a illustrates all
associations discovered by population clustering. Positive associations (ORtk >
1) and negative associations (ORtk < 1) are shown for all ptk ≤ 0.05. Figure 6(b–
e) shows a comparison of 3 exemplary clusters illustrating the raw features (b),
the embedding (c) a set of terms that are associated with the cluster (d) and
exemplary slices of volumes in the cluster (e).

Fig. 3. Number of discovered associations (Q) over varying numbers of clusters (K)
for different seeds (gray) and averaged (blue). Red indicates the seed used to generate
the evaluation results.

Fig. 4. Comparison of different feature sets. "SIFT3D+Haralick" denotes the
concatenation of the two feature sets. The values represent the average using 5 different seeds.

Fig. 5. 2D visualization of the LDA image descriptors of 7812 volumes using t-SNE.
Exemplary volume slices from different areas in the feature space are given to illustrate
the visual variability in the population.

[Fig. 6(a) legend: columns are the 20 clusters, rows the 69 terms; OR > 1 in red, OR < 1 in blue, white where p > 0.05. Fig. 6(e) term lists (term, corrected p-value, odds ratio) for the three example clusters:
Example cluster 1: Effusion (<0.001, 7.00), Ascites (<0.001, 6.66), Haemorrhage (<0.001, 5.99), Haematoma (<0.001, 5.36), Compression (<0.001, 5.12)
Example cluster 2: Emphysema (<0.001, 11.22), Bulla (<0.001, 2.63), Sclerosis (0.004, 1.80), Mass (0.047, 1.75)
Example cluster 3: Cyst (<0.001, 2.60), Lymphoma (<0.001, 4.25), Lesion (<0.001, 2.10), Granuloma (0.002, 1.80), Sclerosis (0.004, 1.52)]
Fig. 6. (a) Discovered associations between clusters (columns) and terms (rows). Terms
are sorted by decreasing occurrence frequency. Positive associations (OR > 1) are
indicated in red and negative associations (OR < 1) in blue. (b–e) Comparison of
three clusters: (b) shows the raw features, (c) the LDA embedding, and (d) indicates
the appearance of 6 terms that are overrepresented in one of these clusters. (e) Shows
exemplary volume slices of members and lists up to 5 significantly overrepresented
terms with p-values and OR for the respective clusters.

5 Conclusion
We propose a framework for visual population clustering of large clinical routine
imaging data. After spatial normalization, visual features are learned, and a
clustering is performed on the volume level. We evaluate the impact of features
on the clustering, and validate the clinical relevance of the resulting grouping
of patients based on corresponding radiology reports. Results show that the
Unsupervised Identification of Clinically Relevant Clusters 199

clustering after normalization identifies groups with coherent sets of reported


findings. This demonstrates that visual markers that relate to clinical findings
can be learned without supervision. The proposed approach is a step towards
unsupervised learning from clinical routine imaging data.

Probabilistic Tractography for Topographically
Organized Connectomes

Dogu Baran Aydogan(B) and Yonggang Shi

Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute,


Keck School of Medicine, University of Southern California, Los Angeles, USA
baran.aydogan@loni.usc.edu

Abstract. While tractography is widely used in brain imaging research,


its quantitative validation is highly difficult. Many fiber systems, how-
ever, have well-known topographic organization which can even be quan-
titatively mapped such as the retinotopy of visual pathway. Motivated
by this previously untapped anatomical knowledge, we develop a novel
tractography method that preserves both topographic and geometric reg-
ularity of fiber systems. For topographic preservation, we propose a novel
likelihood function that tests the match between parallel curves and fiber
orientation distributions. For geometric regularity, we use Gaussian dis-
tributions of Frenet-Serret frames. Taken together, we develop a Bayesian
framework for generating highly organized tracks that accurately fol-
low neuroanatomy. Using multi-shell diffusion images of 56 subjects
from Human Connectome Project, we compare our method with algo-
rithms from MRtrix. By applying regression analysis between retinotopic
eccentricity and tracks, we quantitatively demonstrate that our method
achieves superior performance in preserving the retinotopic organization
of optic radiation.

Keywords: Probabilistic tractography · Bayesian inference · Visual


pathway

1 Introduction
Tractography is a widely used technique for studying brain connectomes with
diffusion MRI (dMRI) and has provided many exciting results in brain imaging
research [1]. The lack of rigorous validation for in vivo human brain studies,
however, has long been a critical challenge to push tractography toward a quan-
titative tool [2,3]. On the other hand, the regular topographic organization of
many fiber systems in human brains provides a surprisingly untapped anatomical
knowledge for the improvement and validation of tractography techniques. Some
of the well-known examples include the retinotopic organization of the visual
pathway [4], the somatotopic organization of the somatosensory pathway [5],
and the tonotopic organization of the auditory pathway [6]. In this paper, we
incorporate this insight on anatomical regularity to develop a novel probabilistic

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 201–209, 2016.
DOI: 10.1007/978-3-319-46720-7 24
202 D.B. Aydogan and Y. Shi

tractography algorithm for studying the connectome of these topographically


organized systems.
Conventional streamline methods rely on step size and curvature parameters
to control the regularity of fiber tracks [7]. Such a local mechanism can hardly
account for the topographic organization of fiber pathways. It becomes
particularly challenging when we want to reconstruct fiber tracks that are
topographically arranged but have highly bent segments such as the Meyer's loop
of the optic radiation. More recently, global tractography techniques were devel-
oped to improve the robustness of streamline techniques [8,9], but their focus is
on balancing the fitting of dMRI signals and the regularity of local stick models.
These developments are followed by microstructure based optimization tech-
niques that start with a large set of tracks and assign weights or prune them so
that resultant set of tracks best fit the dMRI signal [10,11].
In this work we propose a novel probabilistic tractography method that incor-
porates both topographic and geometric regularity. Each fiber track is repre-
sented with the Frenet-Serret frame in our method. Fiber orientation distribu-
tions (FODs) [7,12] are used in our work to represent the connectivity informa-
tion at each voxel. For topographic regularity, the key idea in our method is the
development of a novel likelihood function that tests the match of parallel curves
to the FODs in the neighborhood of each point. The geometric prior is modeled
as a Gaussian distribution of the Frenet-Serret frame. Taken together, we use
a Bayesian approach with rejection sampling to propagate the fiber tracks and
reconstruct highly organized fiber bundles.
In our experimental results, we validate the proposed technique on the recon-
struction of the optic radiation using the multi-shell diffusion imaging data of
the Human Connectome Project (HCP) [13]. The retinotopy of the visual sys-
tem means there is a corresponding point on the cortex for each point of the
retinal space [4]. This correspondence between the topography of the retina and
axonal projections provides a great opportunity for localized mapping of retina
disease to visual pathway integrity. Another distinct aspect of this fiber system
is the Meyer’s loop that bends sharply toward the anterior aspect before mov-
ing toward the visual cortex [14]. Given these anatomical knowledge about the
topography and geometric organization of the optic radiation, we believe it is
an ideal testbed for tractography algorithms. On multi-shell imaging data from
56 HCP subjects, we compare our method with three tractography algorithms
of the MRtrix package [7]. We demonstrate that our method is able to gener-
ate highly organized tracks while capturing the challenging Meyer’s loop. Using
regression analysis, we quantitatively demonstrate that our method performs
better in preserving the retinotopy of the optic radiation.

2 Methods

Frenet-Serret Apparatus: We use the Frenet-Serret frame and formulas


to represent fiber tracks as differentiable curves. Let γ(s) represent a non-
degenerate curve parameterized by its arc length s. The Frenet-Serret frame

can then be defined with three orthonormal vectors: tangent (T ), normal (N )


and binormal (B) that are given as T = dγ/ds, N = (dT /ds)/ | dT /ds | and
B = T × N . The Frenet-Serret formulas express the derivatives of T , N and
B in terms of themselves with a system of first order ODEs: dT /ds = κN ,
dN/ds = −κT + τ B and dB/ds = −τ N . Here κ and τ are the curvature and
torsion of the curve, respectively. Given the initial conditions, this system can
be uniquely solved.
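The ODE system above can be integrated numerically to trace a curve segment. The following is a minimal sketch (not the authors' implementation) that propagates a point and its Frenet-Serret frame with a classical RK4 integrator, assuming κ and τ are held constant over the step:

```python
import numpy as np

def frenet_rhs(y, kappa, tau):
    """Right-hand side of the Frenet-Serret ODE system for the
    stacked state y = [gamma, T, N, B] (12 components)."""
    T, N, B = y[3:6], y[6:9], y[9:12]
    return np.concatenate([
        T,                     # dgamma/ds = T
        kappa * N,             # dT/ds = kappa N
        -kappa * T + tau * B,  # dN/ds = -kappa T + tau B
        -tau * N,              # dB/ds = -tau N
    ])

def propagate_frenet(p, T, N, B, kappa, tau, ds, n_steps=100):
    """Advance the point p and frame (T, N, B) by arc length ds using RK4."""
    y = np.concatenate([p, T, N, B])
    h = ds / n_steps
    for _ in range(n_steps):
        k1 = frenet_rhs(y, kappa, tau)
        k2 = frenet_rhs(y + 0.5 * h * k1, kappa, tau)
        k3 = frenet_rhs(y + 0.5 * h * k2, kappa, tau)
        k4 = frenet_rhs(y + h * k3, kappa, tau)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[0:3], y[3:6], y[6:9], y[9:12]

# Sanity check: with kappa = 1 and tau = 0 the curve is a unit circle, so after
# a quarter arc (ds = pi/2) the tangent rotates by 90 degrees.
p0 = np.zeros(3)
T0, N0, B0 = np.eye(3)
p1, T1, N1, B1 = propagate_frenet(p0, T0, N0, B0, kappa=1.0, tau=0.0, ds=np.pi / 2)
```

For short arcs RK4 keeps the frame orthonormal to high accuracy; a production tracker would still renormalize the frame after each step.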
Curve Spaces: Curve parameters can be put in a vector form to represent the
space of differentiable curves that pass through $p \in \mathbb{R}^3$ with $c_p = [\mathcal{F}, \kappa, \tau] \in \mathcal{C}$.
Here $\mathcal{F}$ denotes the Frenet-Serret frame and $\mathcal{C}$ is a short notation for the space
of curves, $S^2 \times S^2 \times \mathbb{R}^+ \times \mathbb{R}$, where $S^2$ is the unit sphere. We will use the
notation $\gamma_{c_p}(s) \in \mathbb{R}^3$ to denote where $c_p$ traces in space, and denote $\mathcal{F}_{c_p}(s) =$
$\{T_{c_p}(s), N_{c_p}(s), B_{c_p}(s)\}$, $\kappa_{c_p}(s)$ and $\tau_{c_p}(s)$ as the parameters of the curve at $s$.
Parallel Curves: Let $c_p$ and $c_{p'}$ be two curves. If $p'$ lies on the normal plane
of $c_p$ and $c_{p'} = c_p$, then we call $c_p$ and $c_{p'}$ parallel curves and denote them
as $c_{p'} \parallel c_p$. These are also known as offset curves in computer graphics and
computer-aided design.
Fiber Model: We model a fiber track as a finite-length, arc length parameterized
curve that starts from a seed point $p^0 \in \mathbb{R}^3$ at time step zero ($t = 0$). In a
nutshell, our algorithm initializes by estimating $c^0_{p^0}$ at $s = 0$, with $\gamma_{c^0_{p^0}}(0) = p^0$.
It solves the Frenet-Serret ODE system and moves to $t = 1$ by taking a step of
$\Delta s$, which gives $s = \Delta s$, $\gamma_{c^0_{p^0}}(\Delta s) = p^1$ and
$c^1_{p^1} = [\mathcal{F}_{c^0_{p^0}}(\Delta s), \kappa_{c^0_{p^0}}(\Delta s), \tau_{c^0_{p^0}}(\Delta s)]$.
We then compute the priors, likelihood and posterior using Bayesian inference.
Lastly, we pick the next curve, $c^2_{p^1}$, randomly by rejection sampling and iterate
until a stopping condition is met. Our track is the train of points traced by
the curve, $\{p^0, p^1, p^2, \dots\}$. Figure 1 explains our notation and the propagation
technique.

Fig. 1. (a) At $t = 0$, without prior information, we select a random curve among the red
candidate curves based on their likelihood. The thicker the curve, the higher its posterior
probability. (b) By solving the Frenet-Serret ODE, we propagate by $\Delta s$. (c) At
$t = 1$, we calculate the prior probability of the candidate curves for a smooth transition
from the previous curve $c^1_{p^1}$, shown in green. (d) Propagate to $p^2$.

Bayesian Inference: Given a curve at time $t$, $c^t_{p^t}$, and data $D$, we estimate the
posterior probability of the next curve $c^{t+1}_{p^t}$ using Bayesian inference as follows:

\[
\underbrace{p(c^{t+1}_{p^t} \mid D, c^t_{p^t})}_{\text{posterior}}
= \frac{p(D \mid c^{t+1}_{p^t})\, p(c^{t+1}_{p^t} \mid c^t_{p^t})}{p(D \mid c^t_{p^t})}
\;\propto\;
\underbrace{p(D \mid c^{t+1}_{p^t})}_{\text{likelihood}}\,
\underbrace{p(c^{t+1}_{p^t} \mid c^t_{p^t})}_{\text{prior}} \tag{1}
\]
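The propagation step draws the next curve from this posterior by rejection sampling over a discrete set of candidates. A minimal sketch of that selection step, with a stand-in (hypothetical) scoring function in place of the actual likelihood-times-prior:

```python
import numpy as np

def rejection_sample(candidates, posterior, rng):
    """Pick one candidate with probability proportional to its unnormalized
    posterior score: propose uniformly, accept with prob score / max score."""
    scores = [posterior(c) for c in candidates]
    m = max(scores)
    while True:
        i = rng.integers(len(candidates))
        if rng.random() * m <= scores[i]:
            return candidates[i]

# Toy example: three candidate curves with unnormalized posterior scores.
rng = np.random.default_rng(0)
score = {"a": 0.1, "b": 0.1, "c": 0.8}
draws = [rejection_sample(["a", "b", "c"], lambda c: score[c], rng)
         for _ in range(2000)]
```

Over many draws, the high-posterior candidate is selected roughly in proportion to its score, which is what makes the tracking probabilistic rather than greedy.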

Prior Estimation: In order to preserve smoothness in the transition between
curves, which involves changes in $\kappa$, $\tau$ and a 3D rotation of $\mathcal{F}$, we use Gaussian
distributions to define the geometric prior. The variances for curvature and torsion
are denoted by $\sigma_\kappa^2$ and $\sigma_\tau^2$. The 3D rotation is realized by rotating $\mathcal{F}$ around
the $T$, $N$ and $B$ axes, in order, with variances $\sigma_T^2$, $\sigma_N^2$ and $\sigma_B^2$ respectively. The
overall prior probability is expressed in Eq. 2:

\[
p(c^{t+1}_{p^t} \mid c^t_{p^t}) =
\underbrace{p(\mathcal{F}^{t+1} \mid \mathcal{F}^t, \sigma_T^2, \sigma_N^2, \sigma_B^2)}_{p(T^{t+1} \mid T^t, \sigma_N^2, \sigma_B^2)\; p(N^{t+1} \mid N^t, \sigma_T^2, \sigma_B^2)\; p(B^{t+1} \mid B^t, \sigma_T^2, \sigma_N^2)}
\; p(\kappa^{t+1} \mid \kappa^t, \sigma_\kappa^2)\; p(\tau^{t+1} \mid \tau^t, \sigma_\tau^2) \tag{2}
\]

The functions used for computing the prior probability are given in Eq. 3:

\[
\begin{aligned}
p(T^{t+1} \mid T^t, \sigma_N^2, \sigma_B^2) &= (2\pi\sigma_N\sigma_B)^{-1}\, e^{-\mathrm{acos}^2\langle T', T^t\rangle / 2\sigma_N^2 \,-\, \mathrm{acos}^2\langle T^{t+1}, T'\rangle / 2\sigma_B^2} \\
p(N^{t+1} \mid N^t, \sigma_T^2, \sigma_B^2) &= (2\pi\sigma_T\sigma_B)^{-1}\, e^{-\mathrm{acos}^2\langle N', N^t\rangle / 2\sigma_T^2 \,-\, \mathrm{acos}^2\langle N^{t+1}, N'\rangle / 2\sigma_B^2} \\
p(B^{t+1} \mid B^t, \sigma_T^2, \sigma_N^2) &= (2\pi\sigma_T\sigma_N)^{-1}\, e^{-\mathrm{acos}^2\langle B', B^t\rangle / 2\sigma_T^2 \,-\, \mathrm{acos}^2\langle B^{t+1}, B'\rangle / 2\sigma_N^2} \\
p(\kappa^{t+1} \mid \kappa^t, \sigma_\kappa^2) &= (\sqrt{2\pi}\,\sigma_\kappa)^{-1}\, e^{-(\Psi(\kappa^{t+1}) - \Psi(\kappa^t))^2 / (2\Psi(\sigma_\kappa)^2)} \\
p(\tau^{t+1} \mid \tau^t, \sigma_\tau^2) &= (\sqrt{2\pi}\,\sigma_\tau)^{-1}\, e^{-(\tau^{t+1} - \tau^t)^2 / (2\sigma_\tau^2)}
\end{aligned} \tag{3}
\]

In Eq. 3, $\langle\cdot,\cdot\rangle$ is the dot product, $T'$, $N'$ and $B'$ are the intermediary rotations,
and $\Psi(\kappa) = \mathrm{asin}(\kappa)$ is used to linearize the change in curvature.
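As an illustration of how one Gaussian factor of this prior behaves, here is a minimal sketch (our reading of the tangent term of Eq. 3, not the authors' code) that scores a proposed tangent through an intermediary rotation T′:

```python
import numpy as np

def angle(u, v):
    """Angle between two unit vectors, with clipping for numerical safety."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def tangent_prior(T_next, T_mid, T_prev, sigma_N, sigma_B):
    """Gaussian prior factor for the tangent: the angular deviation of the
    intermediary rotation T_mid from T_prev is penalized with variance
    sigma_N**2, and that of T_next from T_mid with variance sigma_B**2."""
    norm = 1.0 / (2.0 * np.pi * sigma_N * sigma_B)
    expo = (angle(T_mid, T_prev) ** 2 / (2.0 * sigma_N ** 2)
            + angle(T_next, T_mid) ** 2 / (2.0 * sigma_B ** 2))
    return norm * np.exp(-expo)

# With angular standard deviations of ~1.25 degrees, even a 10-degree rotation
# of the tangent is far less probable than no rotation at all.
s = np.deg2rad(1.25)
T_prev = np.array([1.0, 0.0, 0.0])
theta = np.deg2rad(10.0)
T_rot = np.array([np.cos(theta), np.sin(theta), 0.0])
p_same = tangent_prior(T_prev, T_prev, T_prev, s, s)
p_rot = tangent_prior(T_rot, T_prev, T_prev, s, s)
```

Such sharply peaked factors are what keep consecutive curves nearly parallel and the resulting tracks geometrically regular.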
Likelihood Estimation: We estimate the data support for the next curve
using the parallel curve definition and fiber orientation distribution (FOD). The
field of FODs over an image volume can be expressed as a spherical function
D(p, T ) : R3 × S 2 → R, where p ∈ R3 is the position of a point in the dMRI
image and T ∈ S 2 . We define our likelihood expression as follows:
 
\[
p(D \mid c^{t+1}_{p^t}) \;=\; \frac{1}{\tfrac{4}{3}\pi r^3}
\int_{\forall c_{p'} \,\parallel\, c^{t+1}_{p^t}} \;
\int_{|p^t - \gamma_{c_{p'}}(s)| \le r}
D\big(\gamma_{c_{p'}}(s),\, T_{c_{p'}}(s)\big)\, ds\, dc_{p'} \tag{4}
\]

In Eq. 4, $r$ is the radius of a sphere centered at $p^t$, which is the integration
domain. The constant term in front of Eq. 4 is used for normalization. The
likelihood expression computes the data support for a candidate curve $c^{t+1}_{p^t}$ by
integrating the contributions of FODs along its parallel curves.
Implementation Details: The double integral in Eq. 4 is intractable. In order
to estimate the likelihood numerically, we discretize the integral domain and
add the contributions of FODs for a set of points as illustrated in Fig. 2. In this
case, we used an actual human dMRI and picked a point in corpus callosum. For
better visualization, we show here 7 random points as black dots within a radius
of a few voxels, but in practice we use 27 points. The likelihood of a curve is
computed by estimating the parallel curves that pass through these points. We

Fig. 2. (a) To estimate the likelihood of a curve, we randomly pick a number of points
within the integration radius, r. (b) Parallel curves passing through the random points.
(c) We compute the tangents of parallel curves for each point and obtain the average
FOD. (d)(e) Estimated likelihoods are shown in proportion to the thickness of curves.

then compute the tangents and interpolate the FODs at the points. The final
likelihood is obtained by adding and averaging the data support contributed by
these points. In Fig. 2(d) and (e), the estimated likelihoods of two different sets
of parallel curves are visualized, where the thickness of each fiber is in proportion
to its likelihood.
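The discretized likelihood described above can be sketched as a small Monte Carlo routine. Everything here is illustrative: `fod(point, direction)` stands in for trilinear interpolation of the FOD field, and for a locally straight segment each parallel curve through a sampled point shares the candidate's tangent:

```python
import numpy as np

def likelihood(tangent, p, r, fod, rng, n_points=27):
    """Monte Carlo estimate of the data support of Eq. 4: sample n_points
    uniformly inside the sphere of radius r around p, evaluate the FOD along
    the tangent of the parallel curve through each point, and average."""
    total = 0.0
    for _ in range(n_points):
        # Rejection-sample a point uniformly inside the sphere.
        while True:
            q = p + r * (2.0 * rng.random(3) - 1.0)
            if np.linalg.norm(q - p) <= r:
                break
        total += fod(q, tangent)
    return total / n_points

# Toy FOD supporting only the +x direction: a candidate tangent along +x
# receives full support, a tangent along +y receives none.
rng = np.random.default_rng(2)
fod = lambda q, d: max(float(d[0]), 0.0)
lx = likelihood(np.array([1.0, 0.0, 0.0]), np.zeros(3), 2.0, fod, rng)
ly = likelihood(np.array([0.0, 1.0, 0.0]), np.zeros(3), 2.0, fod, rng)
```

In the actual method the tangents vary along each parallel curve and the FODs are interpolated from the image grid; the averaging structure is the same.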

3 Test Subjects and Experimental Setup

In our experiments we used multi-shell images from the Q1 release of HCP, which


includes 74 subjects, 56 of which completed both T1 and dMRI scans. For all 56
subjects, we computed FODs following the algorithm in [12] for multi-shell data.
The FODs from this method are represented by spherical harmonics (SPHARM)
and fully compatible with MRtrix that we compare with. We focused on the
reconstruction of optic radiation that connects the lateral geniculate nucleus
(LGN) and primary visual cortex (V1). One salient feature of this bundle is the
retinotopic organization. Following the method in [16], we automatically gen-
erate V1 ROI and its retinotopic map that assigns each vertex in V1 cortex
two coordinates: angle and eccentricity. ROI for LGN was generated using the
method proposed in [14]. The inferior bundle of the optic radiation has an
unconventional trajectory which first courses anteriorly before it runs posteriorly towards
the visual cortex forming the elusive Meyer’s loop. Because of the quantitative
coordinates provided by the retinotopic map and the challenges in reconstructing
the Meyer's loop, we picked the optic radiation in our tests. We conducted
qualitative and quantitative comparisons of our technique with algorithms in
MRtrix3 [7], which includes two probabilistic tractography algorithms: iFOD2
[15], iFOD1 and a deterministic, SD STREAM, approach. Table 1 shows the
parameters used for each technique. For our technique, we typically use a very
small step size together with small $\sigma_N^2$ and $\sigma_B^2$ values since we walk along curves.
In order to capture the Meyer’s loop, for MRtrix3 algorithms we used a higher
angular threshold than the default parameters. Cut-off values were adjusted so
that all the techniques can capture the Meyer’s loop. A lower cut-off threshold
and a much higher angle threshold are used for SD STREAM to achieve this.

Table 1. Tractography parameters used for each technique. vs is voxel size, ◦ is degree.

                     Step (vs)  Angle  Cutoff  σT²   σN²    σB²    σκ² (vs)  στ² (vs)  r (vs)
Our method           0.001             0.04    60◦   1.25◦  1.25◦  0.2       0.2       2
MRtrix3 iFOD2        0.2        22◦    0.04
MRtrix3 iFOD1        0.1        11◦    0.04
MRtrix3 SD STREAM    0.1        60◦    0.02

4 Results
Qualitative Evaluation: Reconstruction results of the left optic radiation of
an HCP subject by our method and MRtrix algorithms are shown in Fig. 3. We
can clearly see that our results are more desirable as they are able to successfully
capture the Meyer’s loop while exhibiting highly organized trajectories. As the
tracks approach the V1 cortex, we can see the probabilistic tractography results
from iFOD2 and iFOD1 start to become topographically less organized.

Fig. 3. Qualitative comparison of our method with MRTrix algorithms on the recon-
struction of a left optic radiation of an HCP subject.

Quantitative Evaluation: In Fig. 4 we illustrate our approach for quantita-


tive evaluation. Using the eccentricity of the V1 cortex, we can divide it into
three parts: fovea (red), superior-peripheral (green) and inferior-peripheral (blue)
regions as shown in Fig. 4(a), which also allow us to split the fiber bundle into
three sub-bundles. By cutting the bundle with a coronal plane (black dashed
lines in Fig. 4(a)), we can visualize the topographic organization of the fiber
tracks from different methods as shown in Fig. 4(b)–(e). Because of the topo-
graphic organization of the fovea and peripheral bundles, the eccentricity of the
fiber tracks and their coordinates on the cross-section should follow a U-shape

Fig. 4. Quantitative comparison of the bundles shown in Fig. 3. Top row shows the labels
for three sub-bundles of the optic radiation. Bottom row shows the eccentricity values
and the quality of quadratic fit using MSE and R2 . The low MSE and high R2 values
obtained by the proposed technique corroborate the qualitative observation.

relation as plotted in Fig. 4(f), where the black dots are the raw data and the
colored points are the fitted value with quadratic regression. In Fig. 4(g)–(j),
the eccentricity values of each bundle on the cross-section are visualized, where
the color bar for eccentricity is shown on the right most image. To quantita-
tively assess this relation, we applied quadratic regression to model the relation
between eccentricity and the cross-sectional coordinates of fiber bundles. We
report both the mean square error (MSE) and coefficient of determination (R2 )
to measure how well the fiber tracks preserve the retinotopy of the fiber bundle.
For each technique, the mean value of these two measures from 56 HCP sub-
jects are listed in Table 2, where we can see that our method achieves the best
performance in both measures.
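The MSE and R² used above come from an ordinary quadratic least-squares fit. A minimal sketch on synthetic U-shaped data (illustrative only; variable names are ours):

```python
import numpy as np

def quadratic_fit_metrics(x, y):
    """Fit y = a x^2 + b x + c by least squares and return (MSE, R^2)."""
    coeffs = np.polyfit(x, y, deg=2)
    residuals = y - np.polyval(coeffs, x)
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return mse, 1.0 - ss_res / ss_tot

# Synthetic cross-sectional coordinate vs. eccentricity with a U-shape profile
# and mild noise, mimicking a well-organized bundle.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
ecc = 3.0 * x ** 2 + 0.5 + 0.05 * rng.standard_normal(100)
mse, r2 = quadratic_fit_metrics(x, ecc)
```

A topographically well-organized bundle yields a low MSE and a high R² under this fit, which is the sense in which Table 2 ranks the methods.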

Table 2. Quantitative evaluation of the retinotopic organization of the optic radiation


bundle. Best results are in bold.

          Our method  iFOD2  iFOD1  SD STREAM
Mean R²   0.53        0.27   0.4    0.46
Mean MSE  7.44        14.83  9.03   12.5

5 Discussions and Conclusion


Our current implementation is non-optimal and written in MATLAB and C++.
Typically it takes several hours to generate tracks on the order of thousands,
which we are working to improve. Our technique uses more parameters compared
to the conventional algorithms, but they are geometrically intuitive and
their effects are not very difficult to understand. For quantitative comparison,
we took a different approach than in [2] which suggests counting valid, invalid
and no connections. This is because we focus on the reconstruction of individual

bundles. We also did not use synthetic phantoms or simulated tracks because of
the availability of retinotopic maps for in vivo validation.
In summary, we developed a novel probabilistic tractography technique that
aims to capture the topographic organization of fiber bundles. A key idea in our
method is the use of parallel curves to examine the local fitting of fiber tracks
to the underlying field of FODs. Using the retinotopic mapping on V1 cortex,
we have conducted quantitative evaluations and demonstrated that our method
is able to generate more organized fiber tracks that follow the known anatomy of
the visual system. For future work, we will conduct more extensive validations
on the visual pathway, its connectivity maps and other bundles that also follow
topographic organizations such as the auditory and somatosensory pathways.

Acknowledgements. This work was in part supported by the National Institutes
of Health (NIH) under Grants K01EB013633, P41EB015922, P50AG005142,
U01EY025864, and U01AG051218.

References
1. Fillard, P., Descoteaux, M., Goh, A., Gouttard, S., Jeurissen, B., Malcolm, J.,
Ramirez-Manzanares, A., Reisert, M., Sakaie, K., Tensaouti, F., Yo, T., Mangin,
J.F., Poupon, C.: Quantitative evaluation of 10 tractography algorithms on a real-
istic diffusion MR phantom. NeuroImage 56(1), 220–234 (2011)
2. Côté, M.A., Girard, G., Bor, A., Garyfallidis, E., Houde, J.C., Descoteaux, M.:
Tractometer: towards validation of tractography pipelines. Med. Image Anal.
17(7), 844–857 (2013)
3. Thomas, C., Ye, F.Q., Irfanoglu, M.O., Modi, P., Saleem, K.S., Leopold, D.A.,
Pierpaoli, C.: Anatomical accuracy of brain connections derived from diffusion
MRI tractography is inherently limited. PNAS 111(46), 16574–16579 (2014)
4. Engel, S.A., Glover, G.H., Wandell, B.A.: Retinotopic organization in human visual
cortex and the spatial precision of functional MRI. Cereb. Cortex 7(2), 181–192
(1997)
5. Ruben, J., Schwiemann, J., Deuchert, M., Meyer, R., Krause, T., Curio, G., Vill-
ringer, K., Kurth, R., Villringer, A.: Somatotopic organization of human secondary
somatosensory cortex. Cereb. Cortex 11(5), 463–473 (2001)
6. Morosan, P., Rademacher, J., Schleicher, A., Amunts, K., Schormann, T., Zilles,
K.: Human primary auditory cortex: cytoarchitectonic subdivisions and mapping
into a spatial reference system. NeuroImage 13(4), 684–701 (2001)
7. Tournier, J.D., Calamante, F., Connelly, A.: MRtrix: diffusion tractography in
crossing fiber regions. Int. J. Imaging Syst. Technol. 22(1), 53–66 (2012)
8. Reisert, M., Mader, I., Anastasopoulos, C., Weigel, M., Schnell, S., Kiselev, V.:
Global fiber reconstruction becomes practical. NeuroImage 54(2), 955–962 (2011)
9. Mangin, J.F., Fillard, P., Cointepas, Y., Le Bihan, D., Frouin, V., Poupon, C.:
Toward global tractography. NeuroImage 80, 290–296 (2013)
10. Daducci, A., Dal Palu, A., Lemkaddem, A., Thiran, J.P.: COMMIT: convex opti-
mization modeling for microstructure informed tractography. IEEE Trans. Med.
Imaging 34(1), 246–257 (2015)
11. Smith, R.E., Tournier, J.D., Calamante, F., Connelly, A.: SIFT2: enabling dense
quantitative assessment of brain white matter connectivity using streamlines trac-
tography. NeuroImage 119, 338–351 (2015)

12. Tran, G., Shi, Y.: Fiber orientation and compartment parameter estimation from
multi-shell diffusion imaging. IEEE Trans. Med. Imaging 34(11), 2320–2332 (2015)
13. Essen, D.V., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T., Bucholz, R.,
Chang, A., Chen, L., Corbetta, M., Curtiss, S., Penna, S.D., Feinberg, D., Glasser,
M., Harel, N., Heath, A., Larson-Prior, L., Marcus, D., Michalareas, G., Moeller, S.,
Oostenveld, R., Petersen, S., Prior, F., Schlaggar, B., Smith, S., Snyder, A., Xu,
J., Yacoub, E.: The human connectome project: a data acquisition perspective.
NeuroImage 62(4), 2222–2231 (2012)
14. Kammen, A., Law, M., Tjan, B.S., Toga, A.W., Shi, Y.: Automated retinofugal
visual pathway reconstruction with multi-shell HARDI and FOD-based analysis.
NeuroImage 125, 767–779 (2016)
15. Tournier, J.D., Calamante, F., Connelly, A.: Improved probabilistic streamlines
tractography by 2nd order integration over fibre orientation distributions. In: Pro-
ceedings of 18th Annual Meeting of the International Society for Magnetic Reso-
nance in Medicine (ISMRM), p. 1670 (2010)
16. Benson, N.C., Butt, O.H., Datta, R., Radoeva, P.D., Brainard, D.H., Aguirre,
G.K.: The retinotopic organization of striate cortex is well predicted by surface
topology. Curr. Biol. 22(21), 2081–2085 (2012)
A Hybrid Multishape Learning Framework
for Longitudinal Prediction of Cortical Surfaces
and Fiber Tracts Using Neonatal Data

Islem Rekik, Gang Li, Pew-Thian Yap, Geng Chen, Weili Lin,
and Dinggang Shen(B)

Department of Radiology and BRIC, University of North Carolina at Chapel Hill,


Chapel Hill, NC, USA
dgshen@med.unc.edu

Abstract. Dramatic changes of the human brain during the first year
of postnatal development are poorly understood due to their multifold
complexity. In this paper, we present the first attempt to jointly pre-
dict, using neonatal data, the dynamic growth pattern of brain cortical
surfaces (collection of 3D triangular faces) and fiber tracts (collection of
3D lines). These two entities are modeled jointly as a multishape (a set
of interlinked shapes). We propose a hybrid learning-based multishape
prediction framework that captures both the diffeomorphic evolution of
the cortical surfaces and the non-diffeomorphic growth of fiber tracts. In
particular, we learn a set of geometric and dynamic cortical features and
fiber connectivity features that characterize the relationships between
cortical surfaces and fibers at different timepoints (0, 3, 6, and 9 months
of age). Given a new neonatal multishape at 0 months of age, we hierarchically
predict, at 3, 6 and 9 months, the postnatal cortical surfaces vertex-by-vertex,
along with the fibers connected to the faces adjacent to these vertices. This is
achieved using a new fiber-to-face metric that quantifies
the similarity between multishapes. For validation, we propose several
evaluation metrics to thoroughly assess the performance of our frame-
work. The results confirm that our framework yields good prediction
accuracy of complex neonatal multishape development within a few sec-
onds.

1 Introduction
Knowledge about postnatal brain development fuels our understanding of cog-
nition, actions, sensation, perception, decision, and thought. From a modeling
perspective, one could see the developing brain as characterized by complex and
dynamic interactions of multiple shapes, comprising highly folded cortical sur-
faces and white matter fiber tracts that are evolving rapidly due to myelination.
Developing models that accurately capture the spatiotemporal growth of a spe-
cific multishape (here, tract and cortical surface) can help the investigation of
This work was supported in part by NIH grants (NS093842, EB006733, EB008374,
EB009634, AG041721, MH107815, MH108914, and MH100217).

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 210–218, 2016.
DOI: 10.1007/978-3-319-46720-7 25

brain development and improve the diagnosis of several neurodevelopmental and


psychiatric illnesses that are rooted in early infancy [1].
Recently, the generic varifold metric tailored to measure multidimensional
shapes (e.g., a set of landmarks, a set of lines, and surfaces) was introduced
in [2]. It has been used for population-based multishape atlas reconstruction of
subcortical surfaces and fiber tracts [3,4]. However, the evaluation and hence the
utility of these methods are limited. First, they were tested on simple deep brain
structures (e.g., caudate) and specific fiber tracts (e.g., those connecting the
cortical surface to the caudate) [3]. Second, they were tested on adult patients,
where the inter- and intra-subject multishape variability is not as large as that
in postnatal development (Fig. 1). More recently, Gori et al. presented a double-
diffeomorphism strategy to jointly estimate a cortical surface and fiber-bundle
template for both adult control and patient populations. The idea of double-
diffeomorphism nicely accounts for the possibility of having fibers connecting to
a specific cortical region in one subject and then ‘switching’ to another cortical
region for another subject. However, when modeling subject-specific multishape
development in infants, one would not expect the fibers to change their con-
necting spots on the cortical surface. Furthermore, these fiber tracts undergo
fundamental topological changes, especially for the fiber tracts which bifurcate,
branch out and multiply with myelination after birth. This is a non-diffeomorphic
growth behavior, which contrasts the more stable diffeomorphic fiber deforma-
tion in older children and adults. For instance, Li et al. recently found that
cortical fiber density is regionally heterogeneous and increases dramatically in

Fig. 1. Training steps of hybrid multishape prediction framework for one training sub-
ject. (Top row) Estimate the baseline cortical surface diffeomorphic deformation trajec-
tory through the diffeomorphism φ using [6]. (Middle row) Whole-brain deterministic
tractography to estimate diffusion fiber tracts {Fi } at each acquisition timepoint. The
red box demonstrates the non-diffeomorphic nature of fiber growth. (Bottom row)
Non-diffeomorphic projection, using $\pi_{A_i}$, of the training longitudinal fiber tracts onto the
estimated longitudinal mean atlas $\{A_i\}$.
212 I. Rekik et al.

the first year [1]. Put together, these facts present key challenges for predicting
subject-specific postnatal brain multishape development, solely from the neonatal
multishape. To the best of our knowledge, this is a problem that has not been
addressed.
Noting the limited works targeting the prediction of subject-specific postnatal
cortical shape development from a single timepoint [5], we propose in this work
the first learning-based multishape prediction framework from neonatal cortex
and fibers. The proposed framework comprises training and testing stages. In
the training stage, for each infant, we learn from the training subjects (1) the
geometric features (surface vertices), (2) the dynamic features of the baseline
cortical surface development (smooth and invertible evolution trajectories), and
(3) the fiber-to-face connectivity features via projections on an empirical longi-
tudinal cortical surface atlas. In the testing stage, for a new neonatal multishape,
we hierarchically select the best learned features that simultaneously predict the
triangular faces on the cortical surface (or meshes) and the fibers traversing
them at all training timepoints (in our case, 3, 6 and 9 months of age) based on
cortical shape topographic properties and a novel fiber-face selection criterion.
Our proposed method has several advantages. First, it is not restricted
to predicting only cortical surface growth as in [5]. Second, it does not require
the computationally expensive process of registering or regressing out thousands
of fibers to establish tract-to-tract correspondence for prediction, which is less
likely to be achieved using a conventional diffeomorphic multishape registration
setting as in [2]. Third, it relies on the diffeomorphic cortical surface deforma-
tion trajectory, which is less complex and more accurate to estimate than for
developing fibers, to guide fiber prediction. More importantly, this enables us to
account for fiber connectivity changes and the appearance of ‘new’ fibers with
different topologies. Ultimately, we present a new metric for jointly predicting
both diffeomorphic surface evolution and non-diffeomorphic fiber growth within
the multishape, thus making our approach hybrid.

2 Hybrid Longitudinal Surface-Fiber Evolution Modeling (Training Stage)
In this section, we present the advanced mathematical tools that mold our work.
As a preliminary step, we embed the multishape (both fibers and cortical surface)
into the varifold space. The multidirectional varifold-based surface representa-
tion will be used to estimate the diffeomorphic cortical growth [6], whereas the
conventional unidirectional varifold-based fiber representation will be part of the
proposed non-diffeomorphic fiber selection criterion for prediction [2].
Surface and Fiber Tract Representation Using Respectively Multidi-
rectional and Unidirectional Varifold Metrics. The varifold metric mea-
sures the rich geometry of any shape with dimension d > 0 by the way the
shape integrates a square-integrable 3D vector field ω ∈ W through convolu-
tions based on a reproducing kernel K W [2,5]. In this case, measuring a surface
S as a varifold is defined as an integration of a testing vector field ω ∈ W
A Hybrid Multishape Learning Framework for Longitudinal Prediction 213

along its nonoriented normal vectors n and principal curvature direction. More
simply, measuring a fiber F as a varifold refers to the mathematical operation of
integrating ω along the fiber's nonoriented tangent vectors τ: F(ω) = ∫_F ω(x)^t τ(x) dx.
In this context, W is defined as a Reproducing Kernel Hilbert Space (RKHS)
with a Gaussian kernel K_W(x, y) = exp(−|x − y|^2/σ_W^2). The kernel decays at a
rate σ_W, which defines the scale under which geometric details will be overlooked
when converting a shape into a varifold. Hence, any discrete shape embedded
in the varifold space W ∗ is a summation of local discrete measurements, each
encoding the interaction of the shape at a local scale with a vector field ω [2,6].
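To make the discrete varifold representation concrete, the sketch below computes a fiber's varifold norm. The Gaussian position kernel and the nonoriented tangent comparison (via a squared cosine, so reversing the fiber leaves the norm unchanged) follow the description above; the midpoint discretization, function name, and default kernel scale are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fiber_varifold_norm(points, sigma_w=5.0):
    """||F||_{W*} for a polyline fiber: each segment is a local measurement
    at its midpoint carrying its nonoriented tangent. Positions interact
    through a Gaussian kernel K_W; tangents through a squared cosine, so
    the norm is insensitive to the fiber's orientation."""
    points = np.asarray(points, dtype=float)
    centers = 0.5 * (points[1:] + points[:-1])       # segment midpoints
    tangents = points[1:] - points[:-1]              # segment vectors
    lengths = np.linalg.norm(tangents, axis=1)
    diff = centers[:, None, :] - centers[None, :, :]
    K = np.exp(-(diff ** 2).sum(-1) / sigma_w ** 2)  # K_W(x_i, x_j)
    dots = tangents @ tangents.T
    cos2 = dots ** 2 / np.outer(lengths, lengths) ** 2
    inner = (K * cos2 * np.outer(lengths, lengths)).sum()
    return float(np.sqrt(inner))
```

Flipping the orientation of the input polyline produces the same norm, which is exactly the nonoriented behavior the varifold metric is chosen for.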
Diffeomorphic Geodesic Longitudinal Surface Regression for Extract-
ing Geometric and Dynamic Features. To longitudinally deform a source
varifold surface S0 observed at t0 into a set of target varifold surfaces
{S1 , . . . , SN } respectively observed at {t1 , . . . , tN }, we adopt the Hamiltonian
formulation setting as described in [2,5,6] to estimate a diffeomorphism
φ(x, t), t ∈ [0, 1], which is fully parameterized by a set of control points c_k
and their attached initial deformation momenta α_k. The initial momenta fully
guide the geodesic shooting of S0 onto the subsequent surfaces and are estimated
along with the control points by minimizing the following energy functional
using a conjugate gradient descent [5]: E = (1/2) ∫_0^1 |v_t|_V^2 dt + γ Σ_{j∈{1,...,N}} ||S_j − φ(S0, t_j)||_{W∗}^2,
with γ denoting the trade-off between the deformation smoothness term and the
fidelity-to-data term. The velocity field v_t belongs to a RKHS V, with a Gaussian
kernel K_V decaying at rate σ_V, and is defined at a location x and timepoint t in
terms of convolutions as: v(x, t) = Σ_{k=1}^{N_c} K_V(x, c_k(t)) α_k(t), with N_c the
number of estimated control points. This allows us to set vertex-to-vertex
correspondence across subjects and timepoints. For prediction, we define the set of
geometric features V as the set of all vertex positions x belonging to the baseline
training surfaces, and the dynamic features as their corresponding evolution
trajectories φ(x, t).
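The control-point parameterization of the velocity field can be sketched as follows. Note this is an illustration only: the velocity evaluation matches the convolution formula above, but the forward integration freezes the control points and momenta, whereas the Hamiltonian formulation of [2,5,6] also evolves them along the geodesic.

```python
import numpy as np

def velocity_field(x, controls, momenta, sigma_v=30.0):
    """v(x, t) = sum_{k=1}^{Nc} K_V(x, c_k(t)) alpha_k(t) for query points
    x (M, 3), control points (Nc, 3), and momenta (Nc, 3), with a Gaussian
    kernel K_V of scale sigma_v (the paper's deformation-kernel setting)."""
    d2 = ((x[:, None, :] - controls[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma_v ** 2) @ momenta      # (M, 3) velocities

def shoot(vertices, controls, momenta, n_steps=10, sigma_v=30.0):
    """Forward-Euler integration of vertex trajectories phi(x, t), t in [0, 1],
    under the (frozen) velocity field -- a simplification of geodesic shooting."""
    x = np.asarray(vertices, dtype=float).copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x += dt * velocity_field(x, controls, momenta, sigma_v)
    return x
```

With zero momenta the flow is the identity; a single control point with a nonzero momentum drags nearby vertices along its direction, which is the behavior the sparse parameterization relies on.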
Estimation of Non-diffeomorphic Longitudinal Fiber-to-Face Connec-
tivity Features Using Multi-projections on Spatiotemporal Atlases.
Since we aim to predict the multishape growth from a single timepoint, we
estimate a set of spatiotemporal surface atlases {A0 , . . . , AN } by averaging the
shapes of the training surfaces at each timepoint to help guide the prediction
process (Fig. 1). Note that all these atlases are in correspondence with all subjects
and across all acquisition timepoints. Then, to define the fiber-to-face connec-
tivity features that capture the non-diffeomorphic growth of neonatal fibers, for
each ensemble of fibers Fi from a training subject at ti , we introduce the sur-
jective projection function π Ai (Fi ) to project it onto the corresponding surface
atlas Ai . Specifically, for a fiber line f ∈ Fi with two extremities f 1 and f 2 , we
perform: f k → π Ai (f k ) = ξ, where k ∈ {1, 2} and ξ denotes a face in Ai . In
turn, this allows us to identify for each training subject the connectivity features
for each face in the atlas Ai at a specific timepoint ti as the set of proximal
fibers that hit it or are 'connected' to it (denoted Fi (ξ)) (Fig. 1). To define
the connectivity features from all training subjects, we independently project
the set of fibers for each training subject on the atlas. Hence, each atlas face

stores for each training subject a set of connecting fibers through this process of
multi-projections onto a fixed atlas.
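The multi-projection step can be sketched as below. Mapping each fiber terminus to the closest face centroid is our own assumption about the point-to-face assignment (the paper only states that the projection is surjective onto atlas faces); the function and variable names are illustrative.

```python
import numpy as np
from collections import defaultdict

def project_fibers(fibers, face_centers):
    """Surjective projection pi^{A_i}: send each fiber terminus f^1, f^2 to
    the closest atlas face (faces approximated by their centroids), and
    record per face the fibers 'connected' to it -- the connectivity
    features F_i(xi).

    fibers: list of (P, 3) polylines; face_centers: (NF, 3) centroids."""
    face_centers = np.asarray(face_centers, dtype=float)
    connectivity = defaultdict(set)
    for idx, fiber in enumerate(fibers):
        for terminus in (np.asarray(fiber[0], float), np.asarray(fiber[-1], float)):
            face = int(((face_centers - terminus) ** 2).sum(1).argmin())
            connectivity[face].add(idx)
    return connectivity
```

Projecting each training subject's fibers independently onto the same fixed atlas yields, per atlas face, one fiber set per subject, as described above.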

3 Longitudinal Multishape Prediction Algorithm from Baseline (Testing Stage)

In the prediction stage, we first warp all baseline training surfaces onto the
baseline cortical surface of a testing subject. Then, in the common space, we
estimate the baseline testing fiber tracts using deterministic whole-brain tractography.
Because of the non-diffeomorphic nature of neonatal fiber growth, we avoid
diffeomorphically regressing fibers for prediction, as we do for surfaces; instead,
we explore the fiber-cortex relationship (or connectivity) to guide fiber prediction.
Hence, we introduce the following fiber-face selection criterion.
Fiber-face Selection Criterion. We define a distance between two faces ξ
and ξ′, with F(ξ) = {f_1, . . . , f_N} and F(ξ′) = {f′_1, . . . , f′_{N′}} the sets of
fibers that 'connect' to them, as follows: d(ξ, ξ′) = d_shape(ξ, ξ′) + d_termini(ξ, ξ′) +
d_connectivity(ξ, ξ′). The first term measures the overall shape difference between
the fibers attached to faces ξ and ξ′ using the varifold metric:
d_shape(ξ, ξ′) = | (1/N) Σ_{k=1}^{N} ||f_k||_{W∗} − (1/N′) Σ_{j=1}^{N′} ||f′_j||_{W∗} |.
The second term quantifies the geometric closeness between the fiber termini positions:
d_termini(ξ, ξ′) = (1/2) ( | (1/N) Σ_{k=1}^{N} f_k^1 − (1/N′) Σ_{j=1}^{N′} f′_j^1 |_2 + | (1/N) Σ_{k=1}^{N} f_k^2 − (1/N′) Σ_{j=1}^{N′} f′_j^2 |_2 ).
The third term computes the difference between the numbers of fibers attached to
ξ and ξ′ (with η = 0.01 for normalization): d_connectivity(ξ, ξ′) = η|N − N′|. This
criterion defines a distance between two faces in terms of their 'attached' fiber
characteristics in shape, geometric proximity, and connectivity.
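The three-term face distance above can be sketched directly. The `fiber_norm` argument stands in for the varifold norm ||f||_{W∗} (any scalar shape descriptor works for the sketch); the function signature and names are illustrative assumptions.

```python
import numpy as np

def face_distance(fibers_a, fibers_b, fiber_norm, eta=0.01):
    """d(xi, xi') = d_shape + d_termini + d_connectivity between two faces,
    each given as the list of fibers (polylines of shape (P, 3)) attached
    to it. `fiber_norm` maps a fiber to its scalar shape value ||f||_{W*}."""
    N, Np = len(fibers_a), len(fibers_b)
    # d_shape: difference of the mean fiber varifold norms
    d_shape = abs(np.mean([fiber_norm(f) for f in fibers_a]) -
                  np.mean([fiber_norm(f) for f in fibers_b]))
    # d_termini: distance between the mean first and last termini
    mean_t1 = lambda fs: np.mean([np.asarray(f[0], float) for f in fs], axis=0)
    mean_t2 = lambda fs: np.mean([np.asarray(f[-1], float) for f in fs], axis=0)
    d_termini = 0.5 * (np.linalg.norm(mean_t1(fibers_a) - mean_t1(fibers_b)) +
                       np.linalg.norm(mean_t2(fibers_a) - mean_t2(fibers_b)))
    # d_connectivity: normalized difference in fiber counts
    return d_shape + d_termini + eta * abs(N - Np)
```

Identical fiber sets give distance zero, and each term contributes additively, so the criterion can trade off shape, proximity, and connectivity mismatches.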
Postnatal Multishape Prediction Algorithm. Algorithm 1 presents the key
steps for multishape prediction based on the learned geometric, dynamic and con-
nectivity features. Briefly, for surface prediction, we use a surface topography-
based metric similar to the one introduced in [5] to hierarchically identify, for
each baseline testing vertex, the closest baseline training vertices that fall within
an ε-distance from the baseline atlas A0. Specifically, we propose to reconstruct
a testing baseline surface S̃0 from training baseline surfaces using the following
nested steps: (a) selecting a set of geometrically closest training vertices to the
testing vertex, (b) selecting a subset of these vertices that have most similar
normal directions to the normal vector at the testing vertex, and (c) selecting
another subset of vertices marked in (b) that additionally share the same maxi-
mum principal curvature sign. As for fiber prediction, we first aim to reconstruct
the baseline testing fibers F̃0 using training fibers. To do so, we project the fibers
of the testing subject onto the baseline atlas A0 , hence estimating the testing
connectivity features. Then, for each vertex μ in the reconstructed baseline sur-
face S̃0 , we use the fiber-face selection criterion to first mark the most similar
corresponding training face in fiber properties to the testing face, then add its
connecting fibers to F̃0 . Note that this uses the baseline atlas A0 as a proxy

since each of its faces stores the set of its connecting fibers from all training
subjects. Ultimately, for each marked training face, we trace its diffeomorphic
deformation using φ, while retrieving the set of its connecting fibers at different
acquisition timepoints ti , thereby estimating F̃i .
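The nested vertex selection in steps (a)-(c) can be sketched as follows. The neighborhood size and normal-similarity threshold are illustrative parameters of our own (the paper does not report them), and unit normals are assumed.

```python
import numpy as np

def hierarchical_select(x, normal, curv_sign, train_x, train_n, train_curv,
                        n_near=20, cos_thresh=0.8):
    """Nested selection of training vertices for one testing vertex:
    (a) the n_near geometrically closest training vertices,
    (b) those whose (unit) normals are similar to the testing normal,
    (c) those sharing the sign of the maximum principal curvature.
    Falls back to the result of (b) if (c) empties the candidate set."""
    d2 = ((train_x - x) ** 2).sum(1)
    cand = np.argsort(d2)[:n_near]                             # (a) proximity
    cand = cand[train_n[cand] @ normal >= cos_thresh]          # (b) normals
    final = cand[np.sign(train_curv[cand]) == np.sign(curv_sign)]  # (c) curvature
    return final if final.size else cand
```

Filtering by position first, then normals, then curvature sign mirrors the coarse-to-fine topographic matching described above.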

Algorithm 1. Hybrid longitudinal multishape evolution prediction from baseline

1: INPUTS:
   The longitudinal mean atlases A_i, the set of training baseline vertices V, the
   baseline testing multishape M0 = (S0, F0), and π A0 (F0).
2: Initialize S̃_i ← A_i and F̃_i = {} for i ∈ {0, . . . , N}.
3: Initialize ε as the mean distance between S0 and A0 plus its standard deviation.
4: for every vertex μ in the reconstructed baseline shape S̃0 do
5:   if its 3D position x is located outside the ε-neighborhood of S0 then
       Update x using the hierarchical surface topography-based metric.
       (⋆) For each unchecked face ξ adjacent to μ, use the fiber-face selection
       criterion to identify the training face most similar in fiber properties to the
       testing face. Mark this face as 'checked'.
       Retrieve the dynamic feature for μ as S̃_i(x) = φ(x, t_i) at each timepoint.
       Retrieve the spatiotemporal connectivity features for the selected deforming
       training face (the set of fibers F_i(φ(ξ, t_i)) that hit φ(ξ, t_i) at timepoint t_i),
       then F̃_i = F̃_i ∪ F_i(φ(ξ, t_i)).
6:   else
       Perform step (⋆) using projections of both training and testing fibers on A0.
7:   end if
8: end for
9: OUTPUT:
   Set of predicted multishapes {M̃_i = (S̃_i, F̃_i)} at timepoints t_i.

4 Experiments and Discussion


Dataset and Parameter Setting. We use leave-one-out cross-validation to
evaluate the proposed framework using data of 10 left and right cortical hemi-
spheres from 5 infants, each with longitudinal diffusion and structural MR images
acquired at around birth, 3, 6, and 9 months of age. For varifold surface and fiber
representation, we set σW = 5 for the shape kernel KW , σV = 30 for the defor-
mation kernel KV , and γ = 0.001 for the energy E as explained in [6]. Streamline
tractography [7] was used to estimate the fibers inside each cortical surface at
each timepoint.
Evaluation Metrics. For surface evaluation, we use both the Dice index, which
quantifies the face-to-face cortical overlap between two surfaces S and S′ as
the ratio 2|S ∩ S′|/(|S| + |S′|), and the symmetric Euclidean distance. For fiber prediction
evaluation, we introduce three metrics: (1) Global mismatch (%). This rep-
resents the percentage of faces with attached fibers while the corresponding
predicted faces had no fibers and vice versa. (2) Mean varifold difference.

For a pair of faces both with traversing fibers, we use the varifold metric to
measure a face-wise discrepancy between the ground truth and predicted fibers
F and F̃ connected to two surfaces S and S̃: (1/N_S) Σ_{i=1}^{N_S} | ||F^{ξi}||_{W∗} − ||F̃^{ξi}||_{W∗} |,
with N_S denoting the number of faces in S, and ξi a face in S. (3) Fiber
mismatch per face. This metric represents the average number of mis-
matched fibers per face across surface faces that are hit by either predicted
or ground truth fibers, or both. We also evaluate the joint prediction accuracy
for both surface and tracts using a unified varifold difference metric:
(1/N_S) Σ_{i=1}^{N_S} ( | ||F^{ξi}||_{W∗} − ||F̃^{ξi}||_{W∗} | + | ||Si||_{W∗} − ||S̃i||_{W∗} | ).
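For reference, the overlap-style metrics can be sketched on per-face arrays. The mask/count representations and function names are illustrative assumptions; the surfaces are assumed to be in face-to-face correspondence, as established by the framework.

```python
import numpy as np

def face_dice(in_a, in_b):
    """Dice overlap between two face regions of surfaces in correspondence:
    2|A intersect B| / (|A| + |B|), with boolean masks over a shared face set."""
    inter = np.logical_and(in_a, in_b).sum()
    return 2.0 * inter / (in_a.sum() + in_b.sum())

def global_mismatch(count_true, count_pred):
    """Percentage of faces where exactly one of the ground-truth and
    predicted fiber sets is non-empty (inputs: per-face fiber counts)."""
    return 100.0 * ((count_true > 0) != (count_pred > 0)).mean()

def fiber_mismatch_per_face(count_true, count_pred):
    """Average |N_true - N_pred| over faces hit by either set of fibers."""
    hit = (count_true > 0) | (count_pred > 0)
    return np.abs(count_true - count_pred)[hit].mean()
```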

Multishape Prediction Evaluation. Despite the small size of our dataset and
its large variability in cortical shape and fiber tracts, our framework led to very
promising results as summarized in Table 1. Since this is the first work to pre-
dict developing cortical fibers, we compared our prediction error with the error
of the observable baseline multishape reconstruction from the baseline ground
truth multishape, which is very low (0 month in Table 1). We notice that the
prediction accuracy generally decreases from 3 to 9 months compared to the
baseline reconstruction from the ground truth, with a slight potential improve-
ment at 6 months. Notably, the global mismatch for the predicted fibers peaks
at 3 months. This is quite expected since the training fibers at around 3 months
are largely variable due to the rapidly developing myelination. Moreover, the
proposed rich fiber-face selection criterion generated better prediction results
compared to using symmetric Euclidean distance as a similarity metric between
fibers for face-fiber selection. Indeed, mean fiber mismatch per face dropped from
1.76 to 1.64 and mean varifold value from 19.98 to 18.83 when using our metric.
Figure 2 shows a good overall overlap between ground truth and predicted fibers
for a representative testing subject. The red-blue fiber mismatch regions can be
explained by a large variability in the training fiber data as well as the use of
inconsistent subject-specific tractography in the temporal domain. Additionally,
we locally evaluated the accuracy of our prediction method in 35 anatomical
cortical regions (Fig. 3), which showed a spatially-varying prediction accuracy
that generally decreased with time. Nonetheless, the errors still fell within a promising
range of prediction values for each evaluation metric (e.g., ∼3 mismatched fibers

Table 1. Surface (S) and fiber (F) prediction accuracy evaluation averaged across 10
cortical hemispheres. The baseline multishape reconstruction error (in bold) is consid-
ered as a ‘reference’ in assessing the performance of our prediction framework.

0 month 3 months 6 months 9 months


Global mismatch % (F) 15.40 ± 2.31 20.40 ± 3.68 19.50 ± 1.26 19.87 ± 1.24
Mean varifold difference (F) 18.83 ± 4.39 21.80 ± 3.22 21.22 ± 4.27 23.83 ± 4.59
Fiber mismatch per face (F) 1.64 ± 0.63 3.09 ± 1.14 2.76 ± 0.44 3.15 ± 0.30
Mean Dice index (S) 1 0.81 ± 0.03 0.81 ± 0.03 0.77 ± 0.02
Mean Euclidean distance in mm (S) 0.45 ± 0.07 0.68 ± 0.09 0.91 ± 0.14 1.08 ± 0.14
Unified varifold difference (S + F) 50.43 55.2 56.42 60.23

Fig. 2. Multishape prediction for a representative subject. The blue multishape repre-
sents the ground truth while the one in red represents the predicted multishape. The
reconstructed baseline multishape (S̃0 , F̃0 ) is used as guidance for multishape predic-
tion at late timepoints and as a reference for evaluation.

per face). For the cortical surface, the prediction mainly dropped in highly folded
and buried cortical regions such as the insular cortex. On the other hand, the
prediction error of the overall shape of the predicted fiber tracts compared with
the ground truth tracts, quantified using the varifold distance, reached its apex
in the paracentral lobule, the posterior cingulate cortex and the precentral gyrus.
This can be explained by large variability in the shape of the fibers connected to
these regions. For potentially similar reasons, the mean face-wise mismatch was
below 15 % in most cortical regions, except for the anterior and posterior cingu-
late cortices, and the insular cortex. These regions were also affected by the largest
values of mean fiber mismatch per face (which generally remained below 5).

Fig. 3. Multishape prediction evaluation in 35 anatomical inflated cortical regions.

5 Conclusion
We proposed the first hybrid developing multishape prediction model
that captured well both the diffeomorphic cortical shape deformation and
non-diffeomorphic fiber tract growth. Our method leveraged the
fiber-surface relationship through multi-projections of fiber termini on the cor-
responding surface. Our prediction results are promising and we hope that in

the light of this work more attention will be drawn to solving this challenging
problem. Eventually, building an accurate and fast multishape prediction model
can also help predict structural brain connectivity of axonal wiring during early
postnatal stages. One way to improve our work is to develop a longitudinally
consistent non-diffeomorphic brain tractography algorithm as a preprocessing step;
to our knowledge, existing tractography is still not tailored to handle developing 3D fiber tracts.

References
1. Li, G., Liu, T., Ni, D., Lin, W., Gilmore, J., Shen, D.: Spatiotemporal patterns
of cortical fiber density in developing infants, and their relationship with cortical
thickness. Hum. Brain Mapp. 36, 5183–5195 (2015)
2. Durrleman, S., Prastawa, M., Charon, N., Korenberg, J., Joshi, S., Gerig, G.,
Trouvé, A.: Morphometry of anatomical shape complexes with dense deformations
and sparse parameters. NeuroImage 101, 35–49 (2014)
3. Gori, P., Colliot, O., Worbe, Y., Marrakchi-Kacem, L., Lecomte, S., Poupon, C.,
Hartmann, A., Ayache, N., Durrleman, S.: Bayesian atlas estimation for the vari-
ability analysis of shape complexes. Med. Image Comput. Comput. Assist. Interv.
16, 267–274 (2013)
4. Gori, P., Colliot, O., Marrakchi-Kacem, L., Worbe, Y., Routier, A., Poupon, C.,
Hartmann, A., Ayache, N., Durrleman, S.: Joint morphometry of fiber tracts and
gray matter structures using double diffeomorphisms. Inf. Process. Med. Imaging
24, 275–287 (2015)
5. Rekik, I., Li, G., Lin, W., Shen, D.: Predicting infant cortical surface development
using a 4D varifold-based learning framework and local topography-based shape
morphing. Med. Image Anal. 28, 1–12 (2015)
6. Rekik, I., Li, G., Lin, W., Shen, D.: Multidirectional and topography-based dynamic-
scale varifold representations with application to matching developing cortical sur-
faces. NeuroImage 135, 152–162 (2016)
7. Stieltjes, B., Kaufmann, W., Zijl, P.V., Fredericksen, K., Pearlson, G.,
Solaiyappan, M., Mori, S.: Diffusion tensor imaging and axonal tracking in the
human brainstem. NeuroImage 14, 723–735 (2001)
Learning-Based Topological Correction
for Infant Cortical Surfaces

Shijie Hao1,2, Gang Li2, Li Wang2, Yu Meng2,


and Dinggang Shen2(&)
1
School of Computer and Information,
Hefei University of Technology, Anhui, China
2
Department of Radiology and BRIC,
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
dgshen@med.unc.edu

Abstract. Reconstruction of topologically correct and accurate cortical surfaces


from infant MR images is of great importance in neuroimaging mapping of early
brain development. However, due to rapid growth and ongoing myelination,
infant MR images exhibit extremely low tissue contrast and dynamic appearance
patterns, thus leading to many more topological errors (holes and handles) in the
cortical surfaces derived from tissue segmentation results, in comparison to adult
MR images, which typically have good tissue contrast. Existing methods for
topological correction either rely on the minimal correction criterion or on ad hoc
rules based on image intensity priors, thus often resulting in erroneous corrections
and large anatomical errors in reconstructed infant cortical surfaces. To address
these issues, we propose to correct topological errors by learning information
from the anatomical references, i.e., manually corrected images. Specifically, in
our method, we first locate candidate voxels of topologically defected regions by
using a topology-preserving level set method. Then, by leveraging rich infor-
mation of the corresponding patches from reference images, we build region-
specific dictionaries from the anatomical references and infer the correct labels
of candidate voxels using sparse representation. Notably, we further integrate
these two steps into an iterative framework to enable gradual correction of large
topological errors, which frequently occur in infant images and cannot be
completely corrected using one-shot sparse representation. Extensive experi-
ments on infant cortical surfaces demonstrate that our method not only effec-
tively corrects the topological defects, but also leads to better anatomical
consistency, compared to the state-of-the-art methods.

1 Introduction

The human cerebral cortex is a highly convoluted structure of gray matter. Geometri-
cally, its surface is topologically equivalent to a sphere (without holes and handles),
when artificially closing the midline hemispheric connections. Reconstruction of
topologically correct and accurate cortical surfaces from MR images plays a funda-
mental role in neuroimaging studies [1]. However, due to the highly folded nature of the
cortex and limitations in the MRI acquisition process, it is inevitable to have errors in
brain tissue segmentation, which is a prerequisite for cortical surface reconstruction.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 219–227, 2016.
DOI: 10.1007/978-3-319-46720-7_26
220 S. Hao et al.

This situation is especially severe in infant brain MR images, which typically


have extremely low tissue contrast, severe partial volume effects, and regionally-
heterogeneous, dynamically-changing imaging appearance patterns across time, due to
the rapid brain growth and ongoing myelination [2]. For example, at birth, T2-weighted
images typically have much better contrast than T1-weighted images; at 1 year of age,
T1-weighted images have much better contrast than T2-weighted images; at 6 months of
age, both T1- and T2-weighted images exhibit extremely low contrast. Although some
infant-dedicated tissue segmentation methods [3] were proposed and achieved reasonable
segmentation results, they do not guarantee the topological correctness of the recon-
structed infant cortical surface, and topological errors caused by inaccurate segmenta-
tion are frequently seen during surface reconstruction, as shown in Fig. 1. Of note, even
a very small error in segmentation could lead to a significant topological defect (the
enlarged view in Fig. 1), thus bringing errors to the cortical surface-based analysis, e.g.,
measuring and processing structural and functional signals based on the geodesic (e.g.,
the red dotted curve in Fig. 1) in the cortical surface.

Fig. 1. Illustration of topological errors in infant cortical surfaces.

Topological correction typically involves two sequential tasks, i.e., (1) locating
topologically defected regions and (2) correcting them. For the former task, methods
largely rely on the prior knowledge that each cortical hemisphere has a simple
spherical topology. Based on this, cyclic graph loops [4–6] or overlapping surface
meshes after remapping [7–9] are used as hints to locate regions with topological
defects. The latter task is much more challenging, as the two types of topological
errors, i.e., holes and handles, essentially differ only in how they are inconsistent
with the cortical anatomy. Typically, holes incorrectly perforate the cortical
surface, while handles erroneously bridge the nonadjacent points in the cortical surface,
as shown in Fig. 1. In this context, topological correction methods have to make a
choice between the two correction types: filling a hole or breaking a handle. However,
since the difference between holes and handles actually lies in the sense of anatomical
correctness, they are hard to distinguish solely using geometric information. So
heuristics were usually introduced to address this issue. For example, a minimal correction
criterion was adopted by assuming that the change for correction should be as small as
Learning-Based Topological Correction for Infant Cortical Surfaces 221

possible [4, 5, 7, 10]. As this criterion is not reliable enough, several ad hoc rules based
on MRI appearance patterns were proposed [6, 8, 11] to help determine the correction
type. Although these methods achieve good performance on adult cortical surfaces,
they have major limitations in processing infant cortical surfaces for two reasons. First,
the minimal correction criterion typically cuts the handles of large topological defects
frequently occurring in infant images, thus leading to missing or anatomically
inconsistent regions. Second, the ad hoc rules designed based on adult MRIs (typically with
clear contrast) are invalid for the infant MRIs, which have longitudinally changing and
regionally heterogeneous intensity patterns. Hence, methods for handling infant MR
images at a variety of developmental stages are highly desired.
In this paper, we propose a novel learning-based method for correcting the topo-
logical defects in infant cortical surfaces, without requiring predefined rules as in the
existing methods. Specifically, we first locate topologically defected regions by using a
topology-preserving level set method. Then, by leveraging rich information of the
corresponding patches from anatomical reference images with correct and accurate
topology, we build region-specific dictionaries and infer the correct tissue labels using
sparse representation. Notably, we further integrate these two steps as an iterative
framework to gradually correct large topological errors that frequently occur in infant
MR images and cannot be completely corrected in one-shot sparse representation.
Extensive experiments demonstrate the feasibility and effectiveness of our method.

2 Method

Given a tissue segmentation image V, labeled as white matter (WM), gray matter
(GM), and cerebrospinal fluid (CSF), our method includes two stages: extracting
candidate voxels (Sect. 2.1) and inferring their new tissue labels (Sect. 2.2). Then,
these two stages are further integrated into an iterative framework (Sect. 2.3).

2.1 Extracting Candidate Voxels


To locate candidate voxels involved in topological defects, we propose to leverage a
topology-preserving level set method [12]. Specifically, for each hemisphere, a level set
function with a spherical topology is first initialized by a large ellipsoid containing all
WM and GM voxels. Then the level set function is gradually shrunk towards the WM
surface, and meanwhile preserves its initial topology by carefully checking the “sim-
ple” voxels in topology during the evolution. Briefly, for a binary volume, a voxel is
called “simple” if its addition or removal from the volume does not change the target’s
topology. On the contrary, a voxel is called “non-simple” if these manipulations change
its topology [12]. During the level set evolution, the judgment on a voxel’s simpleness
is regarded as the topology-aware constraint, which always enables the evolving sur-
face to keep the genus-zero-topology. Therefore, the converged volume of the level set
(LS) evolution VLS can be considered as the result of a pure hole-filling process on
VWM, where all the hole errors are successfully fixed, while none of the handle errors
are. However, this level set evolution process is still very useful, as it facilitates us in

extracting candidates (CAN) of topological defects VCAN by a simple XOR operation,


VCAN = VLS ⊕ VWM.
However, VCAN only provides all the hole positions in VWM in a pure topological
sense. Considering anatomical correctness, these holes actually have different origins.
Some of them are caused by erroneous perforations of WM (holes in anatomy, such as
blue arrows in Fig. 1), while others are spin-off products of the erroneous connection of
WM (handles in anatomy, such as red arrows in Fig. 1). Hence, the obtained VCAN only
covers all the hole voxels, but does not contain any handle voxels. So we further include
the neighboring voxels via a morphological dilation that is adaptive to the shape of the
local structures in VCAN . Specifically, for each connected region in VCAN , we first
compute its three PCA coefficients. We then deform the originally isotropic dilator
according to these PCA coefficients and directions. Finally, the connected region is
dilated with this region adaptive dilator. By gathering all the dilated results of con-
nected regions, an updated version of the candidate set for topological correction VCAN
is generated, containing voxels from both holes and handles. In this way, the cardinality
of VCAN is smaller than that obtained by simply using an isotropic dilator.

2.2 Inferring New Labels of Candidate Voxels Using Anatomical References
As topological defects stem from mislabeled brain tissue voxels, the task of topological
correction can be considered as a classification problem, with the goal of accurately
inferring new labels for the topologically defected voxels. To this end, we leverage a
 
sparse representation model based on multiple reference volumes {Rk} (k = 1, . . ., K),
which are the manually corrected volumes by experts and thus are free of topological
errors. Given the fact that the common morphological patterns of brain anatomy exist
across subjects, the new labels of voxels involved in topological defects in a subject can
be learned from the reference volumes {Rk}, thus improving the topological correctness
and anatomical accuracy. The details are described as follows.
For a candidate voxel v ∈ VCAN, we collect the tissue labels from v and its dc ×
dc × dc neighbors, and represent v by a cubic patch c(v), which encodes information of
the local anatomical structure. Topologically defected voxels can be considered as noise
that contaminates the patch's morphology. For c(v), we build a region-specific dictionary
D(v) composed of "clean" patches from {Rk} without topological errors.
Considering the inter-subject variability, each volume in {Rk} is first non-linearly
aligned to the input V (the tissue-segmented infant MR image) using the Diffeomorphic
Demons method [13]. In this way, the morphed {Rk} play a better role as the
anatomical reference, since more accurate correspondence can be established between
V and {Rk}. D(v) is then built as follows. Considering possible errors in image
registration, for the corresponding voxel r_v^k in each morphed volume Rk, we gather
patches of size dc × dc × dc from r_v^k and its dn × dn × dn neighbors. Hence, D(v) is
formed as a (dc)^3 × (K · (dn)^3) matrix, where each column (atom) is a vectorized
patch and K · (dn)^3 is the number of atoms collected from the morphed {Rk}.
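The dictionary construction can be sketched directly from the description above. Boundary handling is omitted for brevity (the center voxel is assumed far enough from the volume edges), and the function name and argument layout are our own.

```python
import numpy as np

def build_dictionary(refs, center, d_c=3, d_n=3):
    """Build the region-specific dictionary D(v): for each of the K aligned
    reference label volumes, extract the d_c x d_c x d_c label patches
    centered at every voxel in the d_n x d_n x d_n neighborhood of the
    corresponding voxel r_v^k. Returns a ((d_c)^3, K * (d_n)^3) matrix
    whose columns (atoms) are vectorized patches."""
    rc, rn = d_c // 2, d_n // 2
    cz, cy, cx = center
    atoms = []
    for ref in refs:                         # K morphed reference volumes
        for dz in range(-rn, rn + 1):
            for dy in range(-rn, rn + 1):
                for dx in range(-rn, rn + 1):
                    z, y, x = cz + dz, cy + dy, cx + dx
                    patch = ref[z - rc:z + rc + 1,
                                y - rc:y + rc + 1,
                                x - rc:x + rc + 1]
                    atoms.append(patch.reshape(-1))
    return np.stack(atoms, axis=1)
```

With K = 2 references and the default d_c = d_n = 3, the dictionary has 27 rows and 54 atoms, matching the (dc)^3 × (K · (dn)^3) shape stated above.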

Based on the dictionary D(v), c(v) is then linearly reconstructed by a weighting
vector w using a sparse representation model, with the following motivations. First, we
intend to exclude the irrelevant patches in D(v) during the reconstruction, which
adversely affect the representation results. So an ℓ1 LASSO penalty term is adopted to
keep their weights close to zero. Second, as the atoms are densely extracted in the dn ×
dn × dn neighborhood for each morphed Rk, some of them can be highly similar to
each other and should be either jointly selected or ignored. As the ℓ1 LASSO term tends
to select only one atom from a group and ignore the others, we add an ℓ2 penalty. Thus the
model is finally formulated as a non-negative Elastic-Net problem [14]:

min_{w ≥ 0} ||c(v) − D(v)w||_2^2 + λ1 ||w||_1 + λ2 ||w||_2^2    (1)

The element wi in the weight vector w indicates the appearance similarity between
cðvÞ and the i-th atom in DðvÞ. Herein, the center of the i-th atom in DðvÞ is a voxel ri in
the reference image. Based on the assumption that the appearance similarity wi also
reveals the likelihood that v in subject shares the same label as ri , we can infer the new
label of v with a weighted nearest neighbor model. Denoting lv as the tissue label of v,
we can compute the probability of lv ¼ j, where j 2 fWM; GM; CSFg.
XKðdn Þ3
pðlv ¼ jÞ ¼ i¼1
wi pðlv ¼ jjri Þ ð2Þ
(
1 lri ¼ j
pðlv ¼ jjri Þ ¼ ð3Þ
0 otherwise

The new label of v is finally obtained by the MAP criteria, i.e., arg maxj pðlv ¼ jÞ.
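As a concrete toy illustration of Eqs. 1–3, the sketch below uses scikit-learn's `ElasticNet` with `positive=True` as a stand-in solver for the non-negative Elastic-Net, followed by the weighted nearest-neighbor label inference. The dictionary, patch, labels, and all sizes are synthetic, and sklearn's `(alpha, l1_ratio)` only loosely correspond to $(\lambda_1, \lambda_2)$:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_atoms = 40                               # K * (d_n)^3 atoms in D(v) (toy size)
patch_len = 5 ** 3                         # (d_c)^3 voxels per patch (toy d_c = 5)
D = rng.random((patch_len, n_atoms))       # dictionary D(v): one column per atom
atom_labels = rng.integers(0, 3, n_atoms)  # label of each atom's center voxel r_i
c = D[:, :3] @ np.array([0.5, 0.3, 0.2])   # patch c(v), a sparse mix of 3 atoms

# Non-negative Elastic-Net reconstruction (Eq. 1). Note sklearn scales the
# data term by 1/(2n) and couples the two penalties through (alpha, l1_ratio),
# so these values only approximate (lambda_1, lambda_2).
enet = ElasticNet(alpha=1e-4, l1_ratio=0.95, positive=True,
                  fit_intercept=False, max_iter=50000)
enet.fit(D, c)
w = enet.coef_                             # weights w_i >= 0

# Weighted nearest-neighbor label inference (Eqs. 2-3) and the MAP decision.
labels = (0, 1, 2)                         # stand-ins for WM, GM, CSF
p = np.array([w[atom_labels == j].sum() for j in labels])
p = p / p.sum()
new_label = labels[int(np.argmax(p))]
```
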

2.3 Iterative Framework

As infant cortical surfaces often contain large handles or holes, which generally cannot be completely corrected using one-shot sparse representation, we further propose to integrate the above two steps into an iterative framework that gradually refines the topological correction results. The whole algorithm is summarized as follows:
224 S. Hao et al.

This framework brings two benefits. First, large topological defects in infant cortical surfaces are gradually corrected, as the algorithm updates the candidate voxels in each iteration. Second, the cardinality of the candidate voxel set decreases over the iterations, because successfully fixed defects are no longer included in the next iteration. The computational cost is mainly determined by the dictionary size, the cardinality of $V_{CAN}$, and the number of iterations.
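The structure of the loop described above can be sketched as follows. Both inner steps are toy stand-ins (the paper locates candidates with a topology-preserving level set and relabels them via Eqs. 1–3): here a "defect" is simply a voxel whose label differs from a reference labeling, and relabeling copies the reference. The sketch only illustrates the shrinking candidate set and the iteration cap $T$:

```python
def correct_topology(labels, reference, T=4):
    # Toy sketch of the iterative framework: detect candidate voxels,
    # relabel them, and repeat for up to T iterations.
    for _ in range(T):
        # stand-in for candidate detection via the topology-preserving level set
        candidates = [i for i, (a, b) in enumerate(zip(labels, reference)) if a != b]
        if not candidates:       # successfully fixed defects are no longer
            break                # included, so the candidate set shrinks
        for i in candidates:
            # stand-in for sparse-representation relabeling (Eqs. 1-3)
            labels[i] = reference[i]
    return labels
```
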

3 Experiments

To validate our method, brain MR images with a resolution of $1 \times 1 \times 1$ mm³ from 100 infants at 6 months of age were used in the experiments. As our method only relies on tissue segmentation results, we note that it is generic and can also be applied to adult brains and to infant brains at other developmental stages, such as neonates and 1-year-olds. The main motivation for using 6-month-old infants for validation is that, among all stages of early brain development, MR images at 6 months exhibit the lowest tissue contrast and thus the most severe topological errors in tissue segmentation. Herein, the tissue segmentation was conducted by the state-of-the-art method in [3]. After segmentation, experts manually corrected the topological errors in the cortical surfaces of WM for all subjects using ITK-SNAP. Among the 100 pairs of uncorrected and manually corrected volumes, 20 manually corrected volumes were randomly selected as $\{R^k\}$ $(k = 1, \ldots, 20)$. Half of the remaining 80 pairs were randomly selected for adjusting parameters, and the other half were used for performance evaluation.
We use the success rate $S_c$ to quantitatively evaluate our method:

$$S_c = \frac{\#(\text{successfully corrected topological defects})}{\#(\text{topological defects})} \qquad (4)$$

Here a successfully corrected topological defect means a hole that is correctly filled or a handle that is correctly broken. However, $S_c$ is limited in reflecting the anatomical consistency between the resulting surface and the ground truth. We therefore also adopt the Dice Ratio (DR) and the average Surface Distance (SD) as evaluation measures:

$$DR = \frac{2\,|V_1' \cap V_2'|}{|V_1'| + |V_2'|} \qquad (5)$$

$$SD(V_1, V_2) = \frac{1}{2}\left( \frac{1}{n_1}\sum_{v_1 \in surf(V_1)} d(v_1, surf(V_2)) + \frac{1}{n_2}\sum_{v_2 \in surf(V_2)} d(v_2, surf(V_1)) \right) \qquad (6)$$

where $V_1$ is the output of a topological correction method, and $V_2$ is the manually corrected WM volume. Of note, to better reflect the performance of topology correction using DR, we only use the regions enclosing the candidate voxels and their adjacent voxels in $V_1$ and $V_2$, i.e., $V_1'$ and $V_2'$, obtained by dilating the set of candidate voxels. In Eq. 6, $d(\cdot,\cdot)$ is the Euclidean distance, and $n_1$ and $n_2$ are the cardinalities of $surf(V_1)$ and $surf(V_2)$, respectively.
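A minimal NumPy sketch of the two measures, assuming boolean voxel masks for Eq. 5 and surfaces approximated by vertex point sets for Eq. 6 (the function names are ours):

```python
import numpy as np

def dice_ratio(v1, v2):
    # Eq. (5): DR = 2|V1' ∩ V2'| / (|V1'| + |V2'|) on boolean voxel masks
    # restricted to the dilated candidate regions.
    v1, v2 = np.asarray(v1, bool), np.asarray(v2, bool)
    return 2.0 * (v1 & v2).sum() / (v1.sum() + v2.sum())

def avg_surface_distance(s1, s2):
    # Eq. (6): symmetric mean of point-to-surface Euclidean distances, with
    # surf(V1) and surf(V2) approximated by point sets of n1 and n2 vertices.
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    d = np.linalg.norm(s1[:, None, :] - s2[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```
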

Fig. 2. Examples of topological correction results by our method.

Based on the validation set, we found that the best $S_c$ was achieved by setting $\lambda_1 = 0.2$, $\lambda_2 = 0.01$, $d_c = 11$, $d_n = 5$, and $T = 4$, which were then applied to the testing set. Figure 2 shows examples of topological correction results by our method. We can see that our method effectively fixes topological defects while ensuring anatomical consistency and correctness. In the iterative framework (visually validated in Fig. 3), four iterations ($T = 4$) are empirically enough for all the cases in our experiments.
As there is no available software specifically designed for correcting infant cortical surfaces, we compared our method with two popular software packages, BrainSuite [5] and FreeSurfer [8], which are designed for processing the adult brain and achieve state-of-the-art performance in the field. We show typical results in Fig. 4 and quantitative results in Table 1. Due to its minimal correction criterion, BrainSuite does not fully remove the handle regions, e.g., the red ellipses in Fig. 4. More importantly, it erroneously breaks too many holes that should be filled, e.g., the blue ellipses in Fig. 4, leading to a low $S_c$. In contrast, our learning-based method and FreeSurfer achieve much better $S_c$ than BrainSuite. However, FreeSurfer has low accuracy in terms of DR and SD, indicating poor anatomical consistency and correctness. For example, in Fig. 4, the gyral structures highlighted by the ellipses in FreeSurfer's results are missing compared with the ground truth. After checking all experimental results, we found that a similar problem of missing large gyral structures occurred in over half of FreeSurfer's results, resulting in a clear drop in DR and a significant increase in SD in Table 1. In contrast, our method produces more balanced results. Its $S_c$ is generally comparable with FreeSurfer's, and its DR and SD are much better than FreeSurfer's, indicating that our method not only effectively corrects topological defects, but also better ensures anatomical accuracy.

Fig. 3. Examples of iterative correction of topological defects.

Fig. 4. Comparison with other topological correction methods.

Table 1. Quantitative comparison with other topological correction methods.


Sc DR (mean ± std) SD (mean ± std)
BrainSuite 0.694 0.912 ± 0.021 0.025 ± 0.013 mm
FreeSurfer 0.944 0.872 ± 0.119 0.389 ± 2.516 mm
Proposed 0.920 0.941 ± 0.025 0.020 ± 0.007 mm

4 Conclusion

In this paper, we proposed a learning-based method for correcting topological defects in infant brain cortical surfaces. Our contribution is threefold. First, based on the sparse representation model, for the first time, we correct topological errors in infant cortical surfaces by learning rich information from expert manually-corrected volumes. Second, to locate the regions with topological errors, we leverage a topology-preserving level set method. Third, we formulate an iterative framework to facilitate the correction of the large topological errors that frequently occur in infant cortical surfaces. Experiments demonstrate the effectiveness and accuracy of our method.

Acknowledgements. This work was supported in part by NIH grants (MH107815, MH108914, MH100217, EB006733, EB008374, and EB009634). Dr. Shijie Hao was supported by National Natural Science Foundation of China grant 61301222.

References
1. Li, G., et al.: Mapping region-specific longitudinal cortical surface expansion from birth to
2 years of age. Cereb. Cortex 23(11), 2724–2733 (2013)
2. Paus, T., et al.: Maturation of white matter in the human brain: a review of magnetic
resonance studies. Brain Res. Bull. 54(3), 255–266 (2001)
3. Wang, L., et al.: LINKS: learning-based multi-source IntegratioN frameworK for
Segmentation of infant brain images. NeuroImage 108, 160–172 (2015)
4. Shattuck, D.W., Leahy, R.M.: Automated graph-based analysis and correction of cortical
volume topology. TMI 20(11), 1167–1177 (2001)
5. Han, X., et al.: Topology correction in brain cortex segmentation using a multiscale,
graph-based algorithm. TMI 21(2), 109–121 (2002)
6. Shi, Y., Lai, R., Toga, A.W.: Cortical surface reconstruction via unified Reeb analysis of
geometric and topological outliers in magnetic resonance images. TMI 32(3), 511–530
(2013)
7. Fischl, B., et al.: Automated manifold surgery: constructing geometrically accurate and
topologically correct models of the human cerebral cortex. TMI 20(1), 70–80 (2001)
8. Segonne, F., Pacheco, J., Fischl, B.: Geometrically accurate topology-correction of cortical
surfaces using nonseparating loops. TMI 26(4), 518–529 (2007)
9. Yotter, R.A., et al.: Topological correction of brain surface meshes using spherical
harmonics. Hum. Brain Mapp. 32(7), 1109–1124 (2011)
10. Bazin, P.-L., Pham, D.: Topology correction of segmented medical images using a fast
marching algorithm. Comput. Methods Prog. Biomed. 88(2), 182–190 (2007)
11. Ségonne, F., Grimson, W.L., Fischl, B.: A genetic algorithm for the topology correction of
cortical surfaces. In: Christensen, G.E., Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565,
pp. 393–405. Springer, Heidelberg (2005)
12. Han, X., Xu, C., Prince, J.: A topology preserving level set method for geometric deformable
models. PAMI 25(6), 755–768 (2003)
13. Vercauteren, T., et al.: Diffeomorphic demons: efficient non-parametric image registration.
NeuroImage 45(S1), 61–72 (2009)
14. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat.
Soc. Ser. B 67(2), 301–320 (2005)
Riemannian Metric Optimization
for Connectivity-Driven Surface Mapping

Jin Kyu Gahm and Yonggang Shi(B)

Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute,


Keck School of Medicine, University of Southern California, Los Angeles, USA
{jkgahm,yshi}@loni.usc.edu

Abstract. With the advance of human connectome research, there are great interests in computing diffeomorphic maps of brain surfaces with rich connectivity features. In this paper, we propose a novel framework for connectivity-driven surface mapping based on Riemannian metric optimization on surfaces (RMOS) in the Laplace-Beltrami (LB) embedding space. The mathematical foundation of our method is that we can use the pullback metric to define an isometry between surfaces for an arbitrary diffeomorphism, which in turn results in identical LB embeddings from the two surfaces. For connectivity-driven surface mapping, our goal is to compute a diffeomorphism that can match a set of connectivity features defined over anatomical surfaces. The proposed RMOS approach achieves this goal by iteratively optimizing the Riemannian metric on surfaces to match the connectivity features in the LB embedding space. At the core of our framework is an optimization approach that converts the cost function of connectivity features into a distance measure in the LB embedding space, and optimizes it using gradients of the LB eigen-system with respect to the Riemannian metric. We demonstrate our method on the mapping of thalamic surfaces according to connectivity to ten cortical regions, which we compute with the multi-shell diffusion imaging data from the Human Connectome Project (HCP). Comparisons with a state-of-the-art method show that the RMOS method can more effectively match anatomical features and detect thalamic atrophy due to normal aging.

1 Introduction

Surface mapping plays an important role in brain imaging research by enabling a localized comparison of anatomical structures [1,2]. With the advance of MRI techniques, there is an avalanche of large-scale brain imaging data that focus on mapping the connectome of human brains [3], and thus an increase of interest in mapping brain surfaces with connectivity [4,5]. However, these data also pose significant challenges for existing surface mapping techniques, which have largely depended upon customized geometric features for the mapping of specific brain structures such as the cortex. For general connectivity-driven surface mapping,

Y. Shi—This work was in part supported by the National Institute of Health


(NIH) under Grant K01EB013633, P41EB015922, P50AG005142, U01EY025864,
U01AG051218.

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 228–236, 2016.
DOI: 10.1007/978-3-319-46720-7 27
we develop in this work a novel computational framework for intrinsic and diffeomorphic surface mapping in the Laplace-Beltrami (LB) embedding space via the optimization of Riemannian metrics.
For intrinsic shape analysis, there have been growing interests in using the spectrum of the LB operator in computer vision and medical image analysis [6,7]. Graph-based approaches were proposed in [8] for surface mapping with LB eigenfunctions. An isometry-invariant embedding space was proposed in [7] using the LB spectrum. Based on the equivalence of isometry and the minimization of a spectral-$l^2$ distance in the LB embedding space, a novel surface mapping algorithm was developed recently via conformal metric optimization on surfaces (CMOS) [9]. The CMOS approach, however, only computes conformal maps and cannot incorporate rich connectivity features.
To overcome this limitation, we propose in this paper a more general computational framework based on Riemannian metric optimization on surfaces (RMOS). Given any diffeomorphism between two surfaces, the pullback metric defines an isometry between the two surfaces. Since the LB eigen-system is completely determined by the Riemannian metric, we can thus pose the computation of the diffeomorphism as the problem of finding the proper Riemannian metric that minimizes the spectral-$l^2$ distance in the LB embedding space, which ensures that an isometry is achieved with the resulting diffeomorphism. In this general framework, we can easily incorporate the matching of desirable connectivity features during the RMOS process. For the numerical implementation, it was established that the Riemannian metrics on triangular meshes are weights defined on the edges, and that they fully determine the heat kernel on the triangular meshes [10]. Thus the goal of our RMOS is to compute the optimal weights on the mesh edges that realize a diffeomorphic mapping of connectivity features in the LB embedding space. In our experimental results, we apply RMOS to connectivity-driven mapping of the thalamic surfaces, which have well-established rich connectivity to cortical regions [11]. In comparisons with the CMOS method, we demonstrate that the proposed RMOS method can achieve better alignment of anatomical features and improved sensitivity in detecting thalamic atrophy due to normal aging.
The rest of the paper is organized as follows. In Sect. 2, we first introduce the mathematical background of LB embedding and Riemannian metric optimization. After that, we propose the RMOS framework and develop the numerical algorithms for energy minimization. Experimental results on surface mapping with connectivity features are presented in Sect. 3. Finally, conclusions will be made in Sect. 4.

2 Riemannian Metric Optimization on Surfaces (RMOS)

In this section, we first review the computation of the LB embedding on triangular meshes. Then we develop a numerical algorithm to compute its gradient w.r.t. the Riemannian metrics, and the metric optimization algorithm for connectivity-driven surface mapping.

Fig. 1. Notations for Riemannian metric.
LB Embedding and Riemannian Metric. Let $M = (V, T)$ denote a triangular mesh, where $V$ and $T$ are the sets of vertices and triangles, respectively. For a triangular mesh, the Riemannian metric is denoted as the set of weights $W = [w_1, w_2, \cdots]$ on all edges of the mesh [10]. These weights are non-negative and satisfy the triangle inequality on each triangle. With the standard metric induced from $\mathbb{R}^3$, the metric on the edge $V_iV_j$ is the edge length $d_{ij}$, as shown in Fig. 1. For metric optimization, we will optimize these weights to achieve desirable surface mapping results. To compute the LB eigen-system of $M$, we solve a generalized matrix eigenvalue problem: $Qf = \lambda U f$, where $\lambda$ is an eigenvalue, $f$ is an eigenfunction, and the matrices $Q$ and $U$ are defined as [9]:
$$Q_{ij} = \begin{cases} \displaystyle\sum_{V_j \in N(V_i)} \sum_{T_l \in N(V_i, V_j)} \frac{\cot \theta_l^{ij}}{2}, & \text{if } i = j \\ \displaystyle -\sum_{T_l \in N(V_i, V_j)} \frac{\cot \theta_l^{ij}}{2}, & \text{if } V_j \in N(V_i) \\ 0, & \text{otherwise,} \end{cases} \qquad U_{ij} = \begin{cases} \displaystyle\sum_{T_l \in N(V_i)} \frac{A_l}{6}, & \text{if } i = j \\ \displaystyle\sum_{T_l \in N(V_i, V_j)} \frac{A_l}{12}, & \text{if } V_j \in N(V_i) \\ 0, & \text{otherwise,} \end{cases} \qquad (1)$$

where $N(V_i)$ is the set of vertices in the 1-ring neighborhood of $V_i$, $N(V_i, V_j)$ is the set of triangles sharing the edge $(V_i, V_j)$, $\theta_l^{ij}$ is the angle in the triangle $T_l$ opposite to the edge $(V_i, V_j)$, and $A_l = \frac{1}{2} d_{ik} d_{jk} \sin \theta_l^{ij}$ is the area of the $l$-th triangle $T_l$. The set of eigenfunctions $\Phi = \{f_0, f_1, f_2, \cdots\}$ forms an orthonormal basis on the surface. Using the LB eigen-system, an embedding $I_M^\Phi : M \to \mathbb{R}^\infty$ was proposed in [7]:

$$I_M^\Phi(x) = \left( \frac{f_1(x)}{\sqrt{\lambda_1}}, \frac{f_2(x)}{\sqrt{\lambda_2}}, \cdots, \frac{f_n(x)}{\sqrt{\lambda_n}}, \cdots \right) \quad \forall x \in M, \qquad (2)$$
where λn and fn denote the n-th eigenvalue and eigenfunction. To compute
the gradient of the LB embedding w.r.t. the Riemannian metric, we derive the
gradient of Q and U w.r.t. the metric on each edge, i.e., the length dij , and list
them in Table 1. We can compute the gradient of fn w.r.t. a metric element
wi ∈ W by taking the derivative on both sides of the eigen-system equation and
solving it as in [9].
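A compact SciPy sketch of this construction under the standard metric: assembling $Q$ and $U$ per triangle and solving the generalized eigenproblem, demonstrated on a toy octahedron mesh. The function name is ours, and the gradients w.r.t. the edge weights needed for metric optimization are omitted:

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import eigsh

def lb_eigensystem(verts, tris, k=3):
    # Assemble Q and U of Eq. (1) with edge lengths from R^3, then solve
    # the generalized eigenproblem Q f = lambda U f.
    n = len(verts)
    Iq, Jq, Vq, Iu, Ju, Vu = [], [], [], [], [], []
    for t in tris:
        for a in range(3):  # angle at vertex o, opposite the edge (i, j)
            o, i, j = t[a], t[(a + 1) % 3], t[(a + 2) % 3]
            e1, e2 = verts[i] - verts[o], verts[j] - verts[o]
            cot = (e1 @ e2) / np.linalg.norm(np.cross(e1, e2))
            Iq += [i, j, i, j]; Jq += [j, i, i, j]
            Vq += [-cot / 2, -cot / 2, cot / 2, cot / 2]
        area = 0.5 * np.linalg.norm(np.cross(verts[t[1]] - verts[t[0]],
                                             verts[t[2]] - verts[t[0]]))
        for a in range(3):  # mass contributions: A_l/12 per edge, A_l/6 per vertex
            i, j = t[a], t[(a + 1) % 3]
            Iu += [i, j, i]; Ju += [j, i, i]
            Vu += [area / 12, area / 12, area / 6]
    Q = coo_matrix((Vq, (Iq, Jq)), shape=(n, n)).tocsc()
    U = coo_matrix((Vu, (Iu, Ju)), shape=(n, n)).tocsc()
    # shift-invert around a small negative sigma: Q is PSD with a zero
    # eigenvalue (constant eigenfunction f_0), so Q - sigma*U is invertible.
    return eigsh(Q, k=k, M=U, sigma=-0.01)

# Toy closed mesh: an octahedron.
verts = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]], float)
tris = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
        (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
vals, vecs = lb_eigensystem(verts, tris, k=3)
```

The embedding of Eq. 2 then uses $f_n/\sqrt{\lambda_n}$ for $n \ge 1$, skipping the constant eigenfunction at $\lambda_0 = 0$.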
Diffeomorphism and LB Embedding. Let us consider two brain surfaces as Riemannian manifolds: $(M_1, g_1)$ and $(M_2, g_2)$, where $g_i$ is the metric on $M_i$ $(i = 1, 2)$.

Table 1. Non-zero gradient elements of $Q$ and $U$ w.r.t. the metric (length) of an edge $V_iV_j$, i.e., $d_{ij}$. $\partial Q/\partial d_{ij}$ and $\partial U/\partial d_{ij}$ are symmetric. Each edge has two neighboring triangles, $T_{l1}$ and $T_{l2}$, where $A_{l1}$ and $A_{l2}$ are their areas, and $B_{l1}$ and $B_{l2}$ are the products of their edge lengths. The third vertices of these two triangles are $V_{k1}$ and $V_{k2}$.

For any point $p \in M_1$, the metric $g_1(p)$ defines the inner product of vectors
on the tangent plane at p. Let u : M1 → M2 be a diffeomorphic map from M1 to
M2 . Following the definition of diffeomorphism in differential geometry, we can
map two vectors x1 and x2 on the tangent plane of p ∈ M1 onto the tangent
plane of u(p) ∈ M2 and denote them as du(x1 ) and du(x2 ). We can then define
the inner product of x1 and x2 as the inner product of du(x1 ) and du(x2 ) using the
metric g2 (u(p)) at u(p) ∈ M2 , which is called the pullback metric for M1 induced
by the map u. For every possible diffeomorphism from M1 to M2 , we can thus
induce an isometry between M1 and M2 via the pullback metric. Since the LB
spectrum of a surface is completely determined by its Riemannian metric, the LB
spectrum of M1 generated by the pullback metric will match the LB spectrum of
M2 . For any desired diffeomorphic map, this shows mathematically the existence
of a Riemannian metric on M1 that will ensure its perfect alignment with M2 in
the LB embedding space. For the problem of surface mapping, the diffeomorphism
we want to compute is unknown. In the RMOS framework, our goal is to search
for the Riemannian metric that can minimize their distance in the LB embedding
space while matching connectivity features.
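Written out (a standard differential-geometry identity, added here for concreteness; $T_pM_1$ is the tangent plane at $p$ and $du_p$ the differential of $u$ at $p$), the pullback metric is:

$$(u^* g_2)_p(x_1, x_2) = g_2\big(u(p)\big)\big(du_p(x_1),\, du_p(x_2)\big), \qquad \forall\, x_1, x_2 \in T_p M_1,$$

so that $u$ is, by construction, an isometry between $(M_1, u^* g_2)$ and $(M_2, g_2)$.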
Let $W_1$ and $W_2$ denote their Riemannian metrics, i.e., the edge weights of $M_1$ and $M_2$, respectively. The eigen-systems of $M_1$ and $M_2$ are denoted as $(\lambda_{1,n}, f_{1,n})$ and $(\lambda_{2,n}, f_{2,n})$ $(n = 1, 2, \cdots)$, respectively. We denote $u_1 : M_1 \to M_2$ as the map from $M_1$ to $M_2$ and $u_2 : M_2 \to M_1$ as the map from $M_2$ to $M_1$. As shown in Fig. 2, we compute the LB eigen-systems and construct the embeddings $\widetilde{M}_1 = I_{M_1}^{\Phi_1}(M_1)$ and $\widetilde{M}_2 = I_{M_2}^{\Phi_2}(M_2)$. In the embedding space, the maps are $\tilde{u}_1 : \widetilde{M}_1 \to \widetilde{M}_2$ and $\tilde{u}_2 : \widetilde{M}_2 \to \widetilde{M}_1$. The final maps between the two surfaces are obtained via composition of the embeddings and the maps in the embedding space.

Fig. 2. Symmetric RMOS mapping process.
Energy Function for Surface Mapping. Let $\xi_1^j : M_1 \to \mathbb{R}$ and $\xi_2^j : M_2 \to \mathbb{R}$ $(j = 1, 2, \cdots, L)$ denote $L$ connectivity feature functions on each surface. In our experiments, we will define each feature as the normalized fiber count to a specific cortical region for thalamic surfaces, but our framework and numerical algorithm are general for both geometric and other forms of connectivity features. We define an energy function for connectivity-driven surface mapping with RMOS: $E = E_F + \gamma E_R$, where $E_F$ is the data fidelity term for matching given features, $E_R$ is the regularization term, and $\gamma$ is the weight between the two terms. We define the data fidelity term with an $L^2$ energy:

$$E_F = \sum_{j=1}^{L} \left[ \int_{M_1} \big(\xi_1^j - \xi_2^j \circ u_1\big)^2 \, dM_1 + \int_{M_2} \big(\xi_2^j - \xi_1^j \circ u_2\big)^2 \, dM_2 \right]. \qquad (3)$$
This energy is symmetric w.r.t. both surfaces. It penalizes the mismatch between the original and mapped features. We define the regularization term as:

$$E_R = \sum_{w_{1,i} \in W_1} \left( \frac{w_{1,i}}{\hat{w}_{1,i}} - \frac{1}{n_{1,i}} \sum_{j \in N_{1,i}} \frac{w_{1,j}}{\hat{w}_{1,j}} \right)^2 + \sum_{w_{2,i} \in W_2} \left( \frac{w_{2,i}}{\hat{w}_{2,i}} - \frac{1}{n_{2,i}} \sum_{j \in N_{2,i}} \frac{w_{2,j}}{\hat{w}_{2,j}} \right)^2, \qquad (4)$$

where $N_{1,i}$ and $N_{2,i}$ are the sets of edges in the neighborhood (directly connected to the edge $i$), $n_{1,i}$ and $n_{2,i}$ are the total numbers of the neighboring edges, and $w_{1,i}$ and $w_{2,i}$ ($\hat{w}_{1,i}$ and $\hat{w}_{2,i}$) are the metric (the standard metric), respectively, on an edge of $M_1$ and $M_2$. This term constrains the changes of the metric ratios to be smooth.
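For one surface, Eq. 4 can be sketched in a few lines of NumPy (a toy illustration with our own function name; `nbrs[i]` lists the indices of edges adjacent to edge $i$):

```python
import numpy as np

def regularization_energy(w, w_hat, nbrs):
    # Eq. (4), one surface: for each edge i, penalize the deviation of its
    # metric ratio w_i / w_hat_i from the mean ratio over neighboring edges.
    r = np.asarray(w, float) / np.asarray(w_hat, float)
    return float(sum((r[i] - r[list(nbrs[i])].mean()) ** 2 for i in range(len(r))))

# Three mutually adjacent edges; doubling one edge's metric costs 1.5,
# while a uniform rescaling of all edges costs nothing.
E = regularization_energy([2, 1, 1], [1, 1, 1], [[1, 2], [0, 2], [0, 1]])
```
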
Optimization Algorithm. To minimize the energy function using metric optimization, we first construct a coarse correspondence, which we call a β-map, that transforms the energy into distance measurements in the embedding space. Let $\tilde{u}_1^\beta : \widetilde{M}_1 \to \widetilde{M}_2$ denote the β-map from $\widetilde{M}_1$ to $\widetilde{M}_2$. For each point $x \in \widetilde{M}_1$, $\tilde{u}_1^\beta(x)$ can be discretized as a linear combination of vertex positions in $\widetilde{M}_2$. Thus we can represent the β-maps $\tilde{u}_1^\beta : \widetilde{M}_1 \to \widetilde{M}_2$ and $\tilde{u}_2^\beta : \widetilde{M}_2 \to \widetilde{M}_1$ as linear operators $A$ and $B$, respectively, as shown in Fig. 2. To construct the β-maps for the minimization of $E_F$, we start from the nearest-point maps and move the points along the gradient descent direction in the tangent space of the meshes as:

$$\frac{\partial E_F}{\partial \tilde{u}_1^\beta} = -2 \sum_{j=1}^{L} \left( \xi_1^j - \xi_2^j \circ \tilde{u}_1^\beta \right) \nabla_{\widetilde{M}_2} \xi_2^j(\tilde{u}_1^\beta), \quad \frac{\partial E_F}{\partial \tilde{u}_2^\beta} = -2 \sum_{j=1}^{L} \left( \xi_2^j - \xi_1^j \circ \tilde{u}_2^\beta \right) \nabla_{\widetilde{M}_1} \xi_1^j(\tilde{u}_2^\beta),$$

where $\nabla_{\widetilde{M}_1}$ and $\nabla_{\widetilde{M}_2}$ are the intrinsic gradients on the surfaces $\widetilde{M}_1$ and $\widetilde{M}_2$. The β-maps are obtained by updating the maps for a fixed number of time steps. Given the β-map, we convert the data fidelity term $E_F$ into the distance energy in the embedding space, which we call $\widetilde{E}_F$, and compute its gradient descent direction $\partial \widetilde{E}_F / \partial W_i$ $(i = 1, 2)$ using Eqs. 11 and 12 in [9].
To minimize the energy $E_R$ w.r.t. the metrics $W_1$ and $W_2$, we rewrite Eq. 4 in matrix form, $E_R = \|D_1 W_1\|^2 + \|D_2 W_2\|^2$, and compute the gradients of $E_R$ as:

$$\frac{\partial E_R}{\partial W_1} = 2 D_1^T D_1 W_1, \quad \frac{\partial E_R}{\partial W_2} = 2 D_2^T D_2 W_2, \qquad (5)$$

where $D_1$ and $D_2$ are used to calculate the differences of the metrics between neighboring edges. They are initially given and fixed because the mesh connectivity does not change during the optimization process. We redefine the energy function as:

$$\widetilde{E} = \widetilde{E}_F + \tilde{\gamma} E_R, \qquad (6)$$

which is directly differentiable w.r.t. the metric $W_i$, and finally form the gradient as $\partial \widetilde{E}/\partial W_i = \partial \widetilde{E}_F/\partial W_i + \tilde{\gamma}\,(\partial E_R/\partial W_i)$ $(i = 1, 2)$. By minimizing this energy using gradient descent, we deform the embedding of a surface toward its β-map, thus achieving the goal of minimizing the original energy $E$. When it
is minimized, the LB embeddings of $M_1$ and $M_2$ are perfectly aligned, and the nearest-point map gives the final maps, which are diffeomorphic.
As mentioned above, the Riemannian metrics defined on the edges of triangular meshes need to satisfy the triangle inequality. By incorporating these convex conditions, we obtain an optimization problem for the minimization of the energy $\widetilde{E}$ with linear constraints. We use Rosen's gradient projection method [12], which projects the search direction onto the subspace tangent to any active constraints. In every iteration of the optimization process, we compute the projection matrix $P$, apply it to the gradient, $P(\partial \widetilde{E}/\partial W_i)$ $(i = 1, 2)$, and finally update the metric with the projected gradient.
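One projected update can be sketched as follows (a generic illustration of Rosen's projection, not the paper's implementation; `A` collects the active constraint rows, and the projection $P = I - A^T (A A^T)^{-1} A$ keeps the step tangent to them):

```python
import numpy as np

def projected_gradient_step(W, grad, A, step=0.01):
    # One Rosen-style update: project the gradient onto the subspace
    # tangent to the active linear constraints (rows of A), then descend.
    if A.size:
        grad = grad - A.T @ np.linalg.solve(A @ A.T, A @ grad)
    return W - step * grad

# Toy example: one active constraint (the metrics' sum is pinned), so the
# update must not change the constrained quantity A @ W.
A = np.array([[1.0, 1.0, 1.0]])
W = np.array([1.0, 2.0, 3.0])
W_new = projected_gradient_step(W, np.array([0.3, -0.1, 0.5]), A)
```
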

3 Results

In this section, we present experimental results to demonstrate the value of the RMOS framework in connectivity-based brain mapping. MRI data from 212 subjects of the Q1–Q3 release of the Human Connectome Project (HCP) [3] and 18 subjects from the LifeSpan pilot project of HCP were used in our experiments. We use the left thalamus surfaces from these subjects to compare the performance of RMOS and CMOS in aligning connectivity features and detecting group differences. All thalamic surfaces are represented as triangular meshes with 1000 vertices and 2994 edges. For CMOS-based experiments, we use its implementation in the publicly distributed MOCA software¹.
To define the connectivity features, we use probabilistic tractography with fiber orientation distributions (FODs) reconstructed from the multi-shell diffusion MRI data of HCP [13]. For each thalamic surface, 100,000 fiber tracts are generated. For each vertex, we define a neighborhood with a radius of 2 mm. Given a cortical region, the connectivity from this vertex to the cortical region is defined as the number of tracts that pass through the vertex neighborhood and reach the cortical region. By repeating this process for each vertex, we obtain a connectivity map for this cortical region. After that, we divide the connectivity map by its maximum value to generate a normalized connectivity map, which we use in our surface mapping. Overall, we compute the connectivity maps to ten cortical regions: orbital-frontal, superior-frontal, middle/inferior-frontal, motor, sensory, superior-parietal, inferior-parietal, insular, temporal, and occipital cortices of the same hemisphere.
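The per-vertex feature construction just described can be sketched as follows (a toy illustration with hypothetical inputs: each tract is a point array paired with a flag indicating whether it reaches the cortical region of interest):

```python
import numpy as np

def connectivity_map(vertices, tracts, radius=2.0):
    # For each surface vertex, count the tracts that both pass within
    # `radius` (mm) of it and reach the cortical region, then normalize
    # by the maximum count over all vertices.
    counts = np.zeros(len(vertices), float)
    V = np.asarray(vertices, float)
    for points, reaches_region in tracts:    # points: (n_pts, 3) array
        if not reaches_region:
            continue
        d = np.linalg.norm(V[:, None, :] - np.asarray(points, float)[None, :, :],
                           axis=2)
        counts += d.min(axis=1) <= radius    # tract enters this vertex's ball
    m = counts.max()
    return counts / m if m > 0 else counts

verts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
tracts = [(np.array([[0.5, 0.0, 0.0], [5.0, 5.0, 5.0]]), True),
          (np.array([[10.0, 0.5, 0.0]]), False)]   # ignored: misses the region
feat = connectivity_map(verts, tracts)
```
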
As a first experiment, we demonstrate a robust approach for selecting the regularization parameter $\tilde{\gamma}$ in our energy function Eq. 6. Instead of using a fixed value during the whole iterative optimization process, we adaptively change $\tilde{\gamma}$ in every iteration so that the normalized maximum gradient magnitudes of $\widetilde{E}_F$ and $E_R$ have the constant ratio $\bar{\gamma}$. For a pair of thalamus surfaces shown in Fig. 3(d) and (e), the effect of the regularization term can be clearly observed in Fig. 3(f) and (g), where the source mesh is projected onto the target surface using the RMOS maps computed with two different $\bar{\gamma}$ values. For a wide range of $\bar{\gamma}$ values, we run the RMOS map and plot the optimized total energy $\widetilde{E}$, data fidelity
¹ https://www.nitrc.org/projects/moca 2015.
Fig. 3. RMOS mapping of two thalamic surfaces. Plots of (a) $\widetilde{E}$, (b) $\widetilde{E}_F$ and (c) $E_R$ over a range of the parameter $\bar{\gamma}$ after RMOS on (d) the source and (e) target surfaces (lateral view). Projection of the source onto the target surface with $\bar{\gamma} =$ (f) 0.01 and (g) 0.24. (h) The final optimized metric on the source surface with $\bar{\gamma} = 0.24$ (lateral and medial views).

term $\widetilde{E}_F$ and regularization term $E_R$ as a function of $\bar{\gamma}$ in Fig. 3(a), (b) and (c), respectively. With the increase of $\bar{\gamma}$ up to the turning point of the L-shaped curve in (a), we obtain a relatively large decrease of the regularization energy without much increase of the data fidelity term. We thus consider it the sweet spot of our energy minimization problem and choose the parameter $\bar{\gamma} = 10^{-0.625} = 0.24$ for our large-scale experiments. We follow the multi-scale strategy in [9] that starts with the first 10 eigenfunctions, iteratively increases the number of eigenfunctions by 5 up to 20, and sets a maximum of 500 iterations at the final eigen-order. The optimized metric is plotted on the source surface in Fig. 3(h). The RMOS computational process takes around 2 h on a 16-core 2.6-GHz Intel Xeon CPU (multi-threading enabled) with maximal memory consumption around 900 MB.
As an illustration, the connectivity features to the superior-frontal and sensory cortices of the source and target surfaces are shown in Fig. 4(a) and (b). Using the maps computed by RMOS and CMOS, we pull back the connectivity features from the target surface onto the source surface, and the results are shown in Fig. 4(c) and (d). Clearly a better match with the source connectivity features is achieved by the RMOS method. This is not surprising, but it emphasizes the need to integrate connectivity features into diffeomorphic surface mapping. We then apply both RMOS and CMOS to the 212 thalamic surfaces from the HCP data and construct average connectivity maps to the ten cortical regions. The resulting maps to the superior-frontal and sensory cortices are shown in Fig. 4(e) and (f), where we can see that the atlas from the RMOS method appears to be more concentrated, i.e., less variable, than the CMOS atlas. This demonstrates the potential of connectivity-based mapping with RMOS for the construction of more anatomically meaningful atlases.

Fig. 4. Mapping the connectivity features of thalamic surfaces. To highlight the differences between RMOS and CMOS, only connectivity features to two cortical regions, the superior-frontal (left) and sensory (right) cortices, are shown in each subfigure (a)–(f) from the lateral view: (a) source features, (b) target features, (c) pullback by RMOS, (d) pullback by CMOS, (e) RMOS atlas, (f) CMOS atlas.
In the last experiment, we examine localized thickness changes of the left thalamus between two groups from the LifeSpan pilot project of HCP. Group one consists of 9 subjects in the age range 14–35 yrs. Group two consists of 9 subjects in the age range 45–75 yrs. The thickness map of each surface is computed for statistical analysis [2]. Using the surface maps generated by RMOS and CMOS, we run vertex-wise t-tests, and the p-value maps from these two methods are shown in Fig. 5. Clearly the RMOS maps generate more significant results about thalamic atrophy due to normal aging.

Fig. 5. Log-scale p-value ($-\log p$) maps of the thickness for the 9 young (14–35 yrs) vs. 9 old (45–75 yrs) subjects: (a) RMOS, (b) CMOS. Each subfigure shows the superior (left) and inferior (right) views.

4 Conclusion
In this paper, we developed a novel method for mapping surface connectivity
based on the optimization of the Riemannian metric in the Laplace-Beltrami
embedding space. We demonstrated the value of our method by applying it
to compute connectivity-driven maps of the thalamic surfaces. In comparisons
with a state-of-the-art method, we showed that our method can achieve better
alignment of connectivity features and higher sensitivity in detecting thalamic
atrophy in normal aging. For future work, we will validate our method on more
general anatomical surfaces with both geometric and connectivity features.

References
1. Fischl, B., Sereno, M.I., Dale, A.M.: Cortical surface-based analysis II: infla-
tion, flattening, and a surface-based coordinate system. NeuroImage 9(2), 195–207
(1999)
2. Thompson, P.M., Hayashi, K.M., de Zubicaray, G.I., Janke, A.L., Rose, S.E.,
Semple, J., Hong, M.S., Herman, D.H., Gravano, D., Doddrell, D.M., Toga, A.W.:
Mapping hippocampal and ventricular change in Alzheimer disease. NeuroImage
22(4), 1754–1766 (2004)
3. Essen, D.C.V., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K.:
The WU-Minn human connectome project: an overview. NeuroImage 80, 62–79
(2013)
4. Gutman, B., Leonardo, C., Jahanshad, N., Hibar, D., Eschenburg, K., Nir, T.,
Villalon, J., Thompson, P.: Registering cortical surfaces based on whole-brain
structural connectivity and continuous connectivity analysis. In: Golland, P.,
Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol.
8675, pp. 161–168. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10443-0 21
5. Jiang, X., Zhang, T., Zhu, D., Li, K., Chen, H., Lv, J., Hu, X., Han, J., Shen, D.,
Guo, L., Liu, T.: Anatomy-guided dense individualized and common connectivity-
based cortical landmarks (A-DICCCOL). IEEE Trans. Biomed. Eng. 62(4), 1108–
1119 (2015)
6. Reuter, M., Wolter, F., Peinecke, N.: Laplace-Beltrami spectra as Shape-DNA of
surfaces and solids. Comput. Aided Des. 38, 342–366 (2006)
7. Rustamov, R.M.: Laplace-Beltrami eigenfunctions for deformation invariant shape
representation. In: Proceedings of the Eurographics Symposium on Geometry
Processing, pp. 225–233 (2007)
8. Lombaert, H., Sporring, J., Siddiqi, K.: Diffeomorphic spectral matching of cortical
surfaces. In: Gee, J.C., Joshi, S., Pohl, K.M., Wells, W.M., Zöllei, L. (eds.) IPMI
2013. LNCS, vol. 7917, pp. 376–389. Springer, Heidelberg (2013).
doi:10.1007/978-3-642-38868-2_32
9. Shi, Y., Lai, R., Wang, D., Pelletier, D., Mohr, D., Sicotte, N., Toga, A.: Metric
optimization for surface analysis in the Laplace-Beltrami embedding space. IEEE
Trans. Med. Imaging 33(7), 1447–1463 (2014)
10. Zeng, W., Guo, R., Luo, F., Gu, X.: Discrete heat kernel determines discrete
Riemannian metric. Graph. Models 74(4), 121–129 (2012)
11. Behrens, T.E., Johansen-Berg, H., Woolrich, M.W., Smith, S.M., Wheeler-
Kingshott, C.A., Boulby, P.A., Barker, G.J., Sillery, E.L., Sheehan, K.,
Ciccarelli, O., Thompson, A.J., Brady, J.M., Matthews, P.M.: Non-invasive map-
ping of connections between human thalamus and cortex using diffusion imaging.
Nat. Neurosci. 6(7), 750–757 (2003)
12. Rosen, J.B.: The gradient projection method for nonlinear programming. Part I:
Linear constraints. J. Soc. Ind. Appl. Math. 8(1), 181–217 (1960)
13. Tran, G., Shi, Y.: Fiber orientation and compartment parameter estimation from
multi-shell diffusion imaging. IEEE Trans. Med. Imaging 34(11), 2320–2332 (2015)
Riemannian Statistical Analysis of Cortical
Geometry with Robustness to Partial Homology
and Misalignment

Suyash P. Awate1(B) , Richard M. Leahy2 , and Anand A. Joshi2


1
Computer Science and Engineering Department,
Indian Institute of Technology (IIT) Bombay, Mumbai, India
suyash@cse.iitb.ac.in
2
Signal and Image Processing Institute (SIPI),
University of Southern California (USC), Los Angeles, USA

Abstract. Typical studies of the geometry of the cerebral cortical struc-


ture focus on either cortical folding or thickness. They rely on spatial
normalization, but use cortical descriptors that are sensitive to misreg-
istration arising from the well-known problems of partial homologies
between subject brains and local optima in nonlinear registration. In
contrast to these approaches, we propose a novel framework for study-
ing the geometry of the entire cortical sheet, subsuming its folding and
thickness characteristics. We propose a novel descriptor of local cortical
geometry to increase robustness to partial homology and misregistration.
The proposed descriptor lies on a Riemannian manifold, and we describe
a method for hypothesis testing on manifolds for cross-sectional studies.
Results on simulated and clinical data show the benefits of the proposed
approach for detecting between-group differences with greater accuracy
and consistency.

Keywords: Brain cortex · Folding · Thickness · Riemannian space ·


Hypothesis tests

1 Introduction and Related Work


Studies of cerebral cortical geometry can provide insights into development,
aging, and disease progression. We propose a novel framework to study the geom-
etry of the entire cortical sheet, subsuming its folding and thickness properties
and modeling the complementary nature of these two attributes. Our histogram-
based approach provides robustness to partial homologies and misregistration in
detecting inter-cohort differences.
Typical cross-sectional cortical studies of thickness [6,10,18] or folding [16,
21,22] first perform spatial normalization and then conduct hypothesis tests at
every cortical location in the normalized space. However, it is difficult to find a
large number of homologous features across individual cortices [11,12,20].

S.P. Awate—This work was funded by the following grants: NIH R01 NS089212,
NIH R01 NS074980, and IIT Bombay Seed Grant 14IRCCSG010.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 237–246, 2016.
DOI: 10.1007/978-3-319-46720-7_28

While


the major sulcal patterns are similar across individuals, there is a large individ-
ual variation in the folding pattern and, thus, the homology between cortical
surfaces of two brains is only approximate. This partial homology between two
cortices can lead to large within-group variance of cortical properties, e.g., bend-
ing, shape, or thickness. Moreover, fine-scale misregistrations, either because of
partial homology or numerical challenges in finding the global optimum under-
lying nonlinear diffeomorphic registration, can lead to invalid between-group
comparisons of cortical structure at non-homologous locations.
This paper proposes a novel statistical descriptor of local cortical geometry
that increases robustness to partial homology and misregistration. The proposed
descriptor for cortical folding, and thickness, can lead to easier interpretation,
unlike descriptors based on spherical harmonics or spherical wavelets [21]. The
proposed descriptor lies on a Riemannian manifold and, unlike related studies
on region-based cortical folding [1], uses a method for hypothesis testing on the
Riemannian manifold.
This paper presents a framework for cross-sectional cortical studies, which
models the geometry of the entire cortical sheet, unlike approaches that model
either cortical folding or thickness. It proposes a neighborhood-based histogram
feature of local cortical shape, which is robust to partial homology and misreg-
istration. It presents a method for hypothesis testing for cross-sectional studies
on the Riemannian manifold of histograms. The cross-sectional studies on simu-
lated and clinical brain MRI show the benefits of (i) modeling the entire geome-
try of the cortex, (ii) the robust histogram-based measure, and (iii) Riemannian
hypothesis testing, each of which leads to the detection of the between-group
differences with greater accuracy and precision.

2 Methods
This section describes the proposed model for the cortical sheet, the robust
descriptor of local cortical geometry, and its use for hypothesis testing on a
Riemannian manifold.

2.1 Modeling the Cortex

We propose a medial surface model for the cortex, which subsumes models for
cortical folding and thickness. The proposed model comprises (i) the mid cortical
surface, as the medial surface, and (ii) local cortical thickness values at each point
on the mid-cortical surface. Given the mid-cortical surface M, the value of the
thickness t at each point m on M gives the locations of the inner and outer
(pial) cortical surfaces, at distances t/2 along the inward and outward normals
to the mid-cortical surface at m.
We compute cortical thickness based on [7]. We model the geometry of the
mid-cortical surface M through the local surface-patch characteristics at each
point on the surface. At every point m ∈ M, the principal curvatures κmin (m)

and κmax (m) describe the local geometry [3] (up to second order and up to a
translation and rotation). The space (κmin (m),κmax (m)) can be reparametrized,
by a polar transformation, into the orthogonal bases of curvedness
C(m) := [κmin(m)² + κmax(m)²]^0.5 and shape index
S(m) := (2/π) arctan[(κmin(m) + κmax(m)) / (κmin(m) − κmax(m))],
which meaningfully separate notions of bending and shape [9], leading to easier
interpretation. The shape index S(m) ∈ [−1, 1] is a pure measure of shape,
modulo size, location, and pose. The curvedness C(m) ≥ 0 captures a notion of
surface bending at a particular patch scale/size, and is invariant to location and
pose. We compute principal curvatures at m by fitting a quadratic patch to the
local surface around m [3].
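As an illustrative sketch (not the authors' implementation), the two descriptors can be computed from the principal curvatures once the quadratic patch fitting that produces κmin and κmax has been done; the sign convention follows the formula as printed, with κmin − κmax in the denominator:

```python
import numpy as np

def curvedness_shape_index(k_min, k_max):
    """Curvedness C >= 0 and shape index S in [-1, 1] from principal
    curvatures, via the polar reparametrization of (k_min, k_max)."""
    k_min = np.asarray(k_min, dtype=float)
    k_max = np.asarray(k_max, dtype=float)
    C = np.sqrt(k_min ** 2 + k_max ** 2)
    # arctan2 form of (2/pi) * arctan((k_min + k_max) / (k_min - k_max)).
    # Since k_min - k_max <= 0, both arguments are negated so the angle
    # stays in [-pi/2, pi/2] and S in [-1, 1], even at umbilic points
    # (k_min == k_max), where the plain ratio would divide by zero.
    S = (2.0 / np.pi) * np.arctan2(-(k_min + k_max), k_max - k_min)
    return C, S
```

For a unit-sphere patch (κmin = κmax = 1) this yields C = √2 and |S| = 1; for a symmetric saddle (κmin = −1, κmax = 1) it yields S = 0.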

2.2 Multivariate Local Descriptor of Cortical Folding and Thickness

We propose a novel local descriptor of the cortical geometry (folding as well as


thickness) for cross-sectional studies for detecting cortical differences. The par-
tial homology across different brains, biologically limited to about two dozen
landmarks in each hemisphere [20], casts doubts on the validity of typical com-
parisons across non-homologous locations. At location m on the mid-cortical
surface in normalized space, the partial homology can greatly increase the vari-
ance of shape-index values Si (m) across individuals i. The variance increase
reduces the power of the subsequent hypothesis tests.
Surface based smoothing of the shape-index values Si (n), over a neighbor-
hood of location m, cannot address the partial-homology problem. When the
average of the shape index over sulci and gyri leads to a value close to zero, this
average is non-informative about the nature of the folding; e.g., the sinusoidal
surfaces f1 (x, y) := sin(x + y) and f2 (x, y) := sin(y) can both lead to the same
average close to zero in sufficiently large neighborhoods. Thus, spatial smoothing
of shape-index values leads to (i) high variance at fine scales (as for pointwise
analyses) and (ii) loss of differentiability between surfaces at coarse scales (i.e.,
large neighborhoods). So, a single-/multiscale analysis with shape-index values
can be an unreliable indicator of folding differences.
We propose to consider the histogram of shape-index values S(n) in a spatial
neighborhood around location n as the feature. The size of the neighborhood
for building this histogram depends on the typical size of regions (not individual
points) in the cortex over which sulcal/gyral homologies can be reliably estab-
lished. This histogram is immune to the inevitable misregistration of sulci/gyri
at fine scales. Moreover, unlike the neighborhood average that is a scalar, the
histogram is a far richer descriptor that retains neighborhood information by
restricting averaging to each histogram bin.
The limitations exhibited by the shape index, because of partial homology,
are also shared by the curvedness because the surface path from the crown of
a gyrus to a fundus of an adjacent sulcus takes the curvature through a large
variation, i.e., from a large positive value to zero (at the inflection point) and back
to a large positive value. Cortical thickness appears to be the least affected by the
partial homology because thickness exhibits a much smaller variation from gyrus

to sulcus. Nevertheless, because the crowns of gyri are typically 20 % thicker


than the fundi of sulci [4], even thickness studies, relying on normalization and
groupwise comparison of spatially-averaged thickness values (at multiple scales),
suffer from problems related to increased variance/information loss. Thus, we
also include curvedness and thickness through their local histograms.
Finally, motivated by the empirically found biological correlations between
the values of shape index Si (n), curvedness Ci (n), and thickness Ti (n), we pro-
pose their joint histogram, denoted by Hi (m), in the spatial neighborhood (about
5 mm radius) of location m on the medial (mid-cortical) surface, as the local
descriptor of the cortex.
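A minimal sketch of such a descriptor follows; the bin counts and value ranges below are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def joint_histogram_descriptor(s_vals, c_vals, t_vals, bins=(4, 4, 4),
                               ranges=((-1, 1), (0, 2), (0, 6))):
    """L1-normalized joint histogram of shape index S, curvedness C and
    thickness T sampled over the surface neighborhood of a location m.
    `bins` and `ranges` here are illustrative assumptions."""
    data = np.column_stack([s_vals, c_vals, t_vals])
    H, _ = np.histogramdd(data, bins=bins, range=ranges)
    H = H.ravel()
    return H / H.sum()  # normalize so ||H||_1 = 1
```

Applied per surface node over its roughly 5 mm neighborhood, this yields one histogram per node per subject.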

2.3 Riemannian Statistical Modeling

We perform hypothesis testing using the joint histograms Hi (m) as the local
feature descriptor for the cortex at location m for subject i. If the number
of bins in the histogram is B, then Hi (m) ∈ (R≥0 )B , ||Hi (m)||1 = 1, and
Hi (m) lies on a Riemannian manifold. To measure distance between histograms
H1 (m) and H2 (m), we use the Fisher-Rao distance metric d(H1 (m), H2 (m)) :=
dg(F1(m), F2(m)), where Fi(m) is the square-root histogram, denoted √Hi(m),
with the value in the b-th bin Fi(m, b) := √(Hi(m, b)), and dg(F1(m), F2(m)) is
the geodesic distance between F1(m) and F2(m) on the unit hypersphere S^(B−1) [19].
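On the unit hypersphere this distance reduces to the arc length between the square-root histograms; a sketch:

```python
import numpy as np

def fisher_rao_distance(h1, h2):
    """Fisher-Rao geodesic distance between two L1-normalized
    histograms: the arc length between their square roots, which lie
    on the unit hypersphere S^(B-1)."""
    f1, f2 = np.sqrt(h1), np.sqrt(h2)
    # Clip the inner product to [-1, 1] to guard against round-off.
    return np.arccos(np.clip(np.dot(f1, f2), -1.0, 1.0))
```

Identical histograms are at distance 0; disjoint one-hot histograms are a quarter circle (π/2) apart.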
Modeling a probability density function (PDF) on a hypersphere entails fun-
damental trade-offs between model generality and the viability of the underlying
parameter estimation. For instance, although Fisher-Bingham PDFs on Sd are
able to model generic anisotropic distributions using O(d2 ) parameters, their
parameter estimation may be intractable [14]. In contrast, parameter estimation
for the O(d)-parameter von Mises-Fisher PDF is tractable, but that PDF can
only model isotropic distributions. We use a tractable approximation of a Normal
law on a Riemannian manifold [17], modeling anisotropy through its covariance
parameter in the tangent space at the mean.
For a group with I subjects, at each cortical location m, we fit the approximate
Normal law to the data {√Hi(m)}, i = 1, . . . , I, as follows. We optimize for the
Frechet mean μ ∈ S^(B−1) via iterative gradient descent on the manifold S^(B−1) [2],
where

μ := arg min_ν Σ_{i=1}^{I} dg²(ν, √Hi(m))  under the constraint ν ∈ S^(B−1).    (1)
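The gradient descent of [2] alternates between the logarithmic map (sphere to tangent space) and the exponential map (tangent space back to sphere). A self-contained sketch for unit vectors such as square-root histograms, under the assumption of a plain fixed-step scheme:

```python
import numpy as np

def log_map(mu, f):
    """Tangent vector at mu pointing toward f, with length equal to
    the geodesic distance (unit-sphere logarithmic map)."""
    theta = np.arccos(np.clip(np.dot(mu, f), -1.0, 1.0))
    v = f - np.dot(mu, f) * mu          # component of f tangent at mu
    n = np.linalg.norm(v)
    return np.zeros_like(mu) if n < 1e-12 else theta * v / n

def exp_map(mu, v):
    """Walk along tangent vector v from mu (unit-sphere exponential map)."""
    t = np.linalg.norm(v)
    return mu if t < 1e-12 else np.cos(t) * mu + np.sin(t) * v / t

def frechet_mean(F, n_iter=100, step=1.0):
    """Frechet mean of unit vectors (rows of F) by gradient descent:
    the mean of the log-mapped data is the negative gradient of the
    sum of squared geodesic distances in Eq. (1)."""
    mu = F.mean(axis=0)
    mu /= np.linalg.norm(mu)
    for _ in range(n_iter):
        grad = np.mean([log_map(mu, f) for f in F], axis=0)
        mu = exp_map(mu, step * grad)
        if np.linalg.norm(grad) < 1e-10:
            break
    return mu
```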

We use the logarithmic map Logμ(·) to map the square-root histograms
{√Hi(m)}, i = 1, . . . , I, to the tangent space at the estimated Frechet mean μ and find
the optimal covariance matrix Σ in closed form [5]. For any histogram H, we
define the squared geodesic Mahalanobis distance between √H and the mean μ,
given covariance Σ, as dM²(√H; μ, Σ) := Logμ(√H)ᵀ Σ⁻¹ Logμ(√H). Then, the
proposed PDF evaluated at histogram H is
P(H | μ, Σ) := exp(−0.5 dM²(√H; μ, Σ)) / ((2π)^((B−1)/2) |Σ|^(1/2)).    (2)

2.4 Permutation Testing for Riemannian Statistical Analysis


Voxel-wise parametric hypothesis testing in the framework of general linear mod-
els runs a test at each voxel and adjusts p-values to control for Type-I error aris-
ing from multiple comparisons, using Gaussian field theory. However, such para-
metric approaches make strong assumptions on the data distributions and the
dependencies within neighborhoods [15]. In contrast, permutation tests are non-
parametric, rely on the generic assumption of exchangeability, lead to stronger
control over Type-1 error, are more robust to deviations of the data and effects
of processing from an assumed model, and yield multiple-comparison adjusted
p values [15].
For permutation testing within the Riemannian manifold of histograms, we
use a test statistic for cross-sectional studies to measure the differences between
the histogram distributions arising from two cohorts X and Y , at every location
m on the cortex. At each cortical location m, for both cohorts {Hi^X(m)}, i = 1, . . . , I,
and {Hj^Y(m)}, j = 1, . . . , J, we fit the above Riemannian model to estimate the Frechet
means μX (m), μY (m) and covariances Σ X (m), Σ Y (m) (in the respective mean’s
tangent space). The Hotelling’s T 2 test statistic used in the standard multivariate
Gaussian case cannot be applied in our Riemannian case because the covariances
Σ X (m) and Σ Y (m) are defined in two different (tangent) spaces. Thus, we
propose the following test statistic t(m) to measure the dissimilarity between the
two cohort distributions by adding the squared Mahalanobis geodesic distance
between the group means with respect to each group covariance, i.e.,

t(m) := dM²(μX(m); μY(m), ΣY(m)) + dM²(μY(m); μX(m), ΣX(m)).    (3)
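A generic two-sample permutation test built around such a statistic might look as follows; `stat_fn` stands in for the t(m) of Eq. (3) (here exercised with a simple difference of means), and the add-one correction keeps the p-value strictly positive:

```python
import numpy as np

def permutation_p_value(stat_fn, X, Y, n_perm=1000, seed=0):
    """One-sided permutation p-value for t = stat_fn(X, Y) under
    random relabeling of cohort membership (exchangeability)."""
    rng = np.random.default_rng(seed)
    t_obs = stat_fn(X, Y)
    pooled = np.concatenate([X, Y])
    n_x = len(X)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if stat_fn(pooled[perm[:n_x]], pooled[perm[n_x:]]) >= t_obs:
            count += 1
    return (count + 1) / (n_perm + 1)
```

Running this at every cortical location and correcting across locations then gives the multiple-comparison adjusted maps shown below.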

3 Results and Conclusion


We evaluate the proposed framework on MRI volumes from the OASIS
dataset [13]. We use BrainSuite (brainsuite.org) for tissue segmentation, mid-
cortical surface extraction, computing thickness and curvature measures, and
spatial normalization [8].
Validation on Brain MRI by Simulating Cortical Differences. We ran-
domly assigned 140 control subjects to 2 groups of 50 and 90 subjects. We treat
the larger group as normal. For the 50 subjects, we simulated both cortical
thinning (eroding the cortex segmentation) and flattening (smoothing the cor-
tex segmentation) in part of the right parietal lobe (Fig. 1(d)). This (i) reduced
thickness and curvedness values and (ii) increased the concentration of shape
index values around ±0.5 (corresponding to gyral ridges and sulcal valleys) by
smoothing fine-scale cortical features. We then tested for differences between
cortices of these 2 cohorts.
The new approach using the joint histogram descriptor with Riemannian
modeling and hypothesis testing (Fig. 2(d)) correctly shows significantly low p

(a) shape index (b) curvedness (c) thickness (d) region selected

Fig. 1. Cortex parametrization: a sample brain showing computed values of the


(a) shape index, (b) curvedness, and (c) thickness, at each point on the mid-cortical
surface. Simulating cortical differences: (d) Selected region for simulating cortical
thinning and flattening.

(a) (b) (c) (d)

Fig. 2. Validation with simulated differences. Permutation test p values using


Riemannian statistical modeling and hypothesis testing for histogram descriptors with:
(a) shape index, (b) curvedness, (c) thickness, and (d) shape index, curvedness, and
thickness jointly (proposed).

(a) (b) (c) (d)

Fig. 3. Validation with simulated differences. Permutation test p values using


multiscale features of (a) curvedness, (b) thickness, and (c) shape index, curvedness,
and thickness jointly. Permutation test p values with the (d) histogram descriptor and
Euclidean statistics.

values in the thinned-flattened region (Fig. 1(d)) and high p values elsewhere. In
contrast, Riemannian analysis on the marginal histograms for the shape index
(Fig. 2(a)), curvedness (Fig. 2(b)), and thickness (Fig. 2(c)) produces far more
Type-I/Type-II errors.
In comparison, a multiscale shape-index descriptor using a Laplacian scale-
space pyramid was unable to detect any significant differences (all p values
> 0.3; hence, figure not shown), multiscale descriptors of curvedness (Fig. 3(a)),
thickness (Fig. 3(b)), and joint shape-curvedness-thickness (Fig. 3(c)) lead to a
large number of false positives. Furthermore, the joint histogram descriptor with
Euclidean statistical modeling and hypothesis testing (permutation test with

(a) (b) (c)

Fig. 4. Simulated differences, stability analysis. Standard deviation of permuta-


tion test p values, using bootstrap sampling, for (a) joint multiscale descriptor, (b) joint
histogram descriptor with Euclidean analysis, (c) joint histogram descriptor with Rie-
mannian analysis (proposed).

(a) (b) (c) (d)

(e) (f) (g) (h)

(i) (j) (k) (l)

(m) (n) (o) (p)

Fig. 5. OASIS, histogram descriptors, Riemannian analysis. Permutation test p


values comparing MCI with controls using Riemannian statistical modeling and hypoth-
esis testing for histogram descriptors, for both hemispheres, using: (a),(e) shape index,
(b),(f) curvedness, (c),(g) thickness, and (d),(h) shape index, curvedness, and thick-
ness jointly (proposed). Analogous p values with (i)–(l) MCI cohort subset of 18 (ran-
domly chosen) subjects and (m)–(p) MCI cohort subset of the remaining 10 subjects.
[Color bar same as Fig. 2]

Hotelling’s test statistic) leads to a large number of false negatives (differences


detected in shrunk region; Fig. 3(d)).
To evaluate the stability of the p values under variation in cohorts, we com-
puted a set of p values by bootstrap sampling the original cohort. This analysis
indicates that the stability of the p values from our framework (Fig. 4(c)) is
superior to approaches using (i) multiscale descriptors (Fig. 4(a)) and (ii) his-
togram descriptors without Riemannian analysis (Fig. 4(b)). The lower values in
the parietal lobe are consistent with the location of the selected region where
strong differences are introduced.
Comparisons of an MCI Cohort to the Control Group. We tested for
differences in 2 cohorts from the OASIS dataset: (i) 140 control subjects and
(ii) 28 subjects with mild cognitive impairment (MCI) with a clinical demen-
tia rating of 1. The results using the proposed approach for 28 MCI subjects
(Fig. 5(d)) remain far more stable for smaller cohort sizes, i.e., 18 MCI sub-
jects (Fig. 5(l)) and 10 MCI subjects (Fig. 5(p)), as compared to using the his-
togram descriptors separately for the shape index (Fig. 5(a),(i),(m)), curvedness
(Fig. 5(b),(j),(n)), and thickness (Fig. 5(c),(k),(o)). The joint multiscale descrip-
tor (Fig. 6) also leads to widely varying results with change in cohort size. Boot-
strap sampling of the cohorts shows that the stability of the p values from our
framework (Fig. 7(c)–(d)) are clearly more stable than those using the joint mul-
tiscale descriptor (Fig. 7(a)–(b)). Our thickness-based results (Fig. 5(c),(g)) share
some similarity with the thickness changes found in MCI [18].

(a) (b) (c)

Fig. 6. OASIS, multiscale descriptor. Permutation test p values with the joint
multiscale descriptor for a MCI cohort of (a) 10 subjects, (b) 18 subjects, and (c) 28
subjects.

(a) (b) (c) (d)

Fig. 7. OASIS, stability analysis. Standard deviation of permutation test p val-


ues, using bootstrap sampling for (a)–(b) joint multiscale descriptor (both hemispheres)
and (c)–(d) joint histogram descriptor with Riemannian analysis (both hemispheres)
(proposed).

Conclusion. We have described a framework for analysis of cortical geometry


that combines a novel histogram-based descriptor, which is robust to partial
homologies and misregistration, with statistical analysis of a Riemannian man-
ifold. Our results show improved accuracy relative to multiscale approaches in
simulations and improved robustness to small sample sizes using clinical data.

References
1. Awate, S., Yushkevich, P., Song, Z., Licht, D., Gee, J.: Cerebral cortical folding
analysis with multivariate modeling and testing: studies on gender differences and
neonatal development. NeuroImage 53(2), 450–459 (2010)
2. Buss, S., Fillmore, J.: Spherical averages and applications to spherical splines and
interpolation. ACM Trans. Graph. 20(2), 95–126 (2001)
3. Carmo, M.D.: Differential Geometry of Curves and Surfaces. Prentice Hall, Upper
Saddle River (1976)
4. Fischl, B., Dale, A.: Measuring the thickness of the human cerebral cortex from
magnetic resonance images. Proc. Nat. Acad. Sci. 97(20), 11050–11055 (2000)
5. Fletcher, T., Lu, C., Pizer, S., Joshi, S.: Principal geodesic analysis for the study
of nonlinear statistics of shape. IEEE Trans. Med. Imaging 23(8), 995–1005 (2004)
6. Hardan, A., Muddasani, S., Vemulapalli, M., Keshavan, M., Minshew, N.: An MRI
study of increased cortical thickness in autism. Am. J. Psychiatry 163(7), 1290–
1292 (2006)
7. Jones, S., Buchbinder, B., Aharon, I.: Three-dimensional mapping of cortical thick-
ness using Laplace’s equation. Hum. Brain Mapp. 11(1), 12–32 (2000)
8. Joshi, A.A., Shattuck, D.W., Leahy, R.M.: A method for automated cortical surface
registration and labeling. In: Dawant, B.M., Christensen, G.E., Fitzpatrick, J.M.,
Rueckert, D. (eds.) WBIR 2012. LNCS, vol. 7359, pp. 180–189. Springer,
Heidelberg (2012)
9. Koenderink, J.J.: Solid Shape. MIT Press, Cambridge (1991)
10. Luders, E., Narr, K., Thompson, P., Rex, D., Woods, R., Jancke, L., Toga, A.:
Gender effects on cortical thickness and the influence of scaling. Hum. Brain Mapp.
27, 314–324 (2006)
11. Lyttelton, O., Boucher, M., Robbins, S., Evans, A.: An unbiased iterative group
registration template for cortical surface analysis. NeuroImage 34, 1535–1544
(2007)
12. Mangin, J., Riviere, D., Cachia, A., Duchesnay, E., Cointepas, Y., Papadopoulos-
Orfanos, D., Scifo, P., Ochiai, T., Brunelle, F., Regis, J.: A framework to study
the cortical folding patterns. NeuroImage 23(1), S129–S138 (2004)
13. Marcus, D., Wang, T., Parker, J., Csernansky, J., Morris, J., Buckner, R.: Open
access series of imaging studies (OASIS): cross-sectional MRI data in young, middle
aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19(9), 1498–
1507 (2007)
14. Mardia, K., Jupp, P.: Directional Statistics. Wiley, Hoboken (2000)
15. Nichols, T., Holmes, A.: Nonparametric permutation tests for functional neu-
roimaging: a primer with examples. Hum. Brain Mapp. 15(1), 1–25 (2002)
16. Nordahl, C., Dierker, D., Mostafavi, I., Schumann, C., Rivera, S., Amaral, D.,
Van-Essen, D.: Cortical folding abnormalities in autism revealed by surface-based
morphometry. J. Neurosci. 27(43), 11725–11735 (2007)

17. Pennec, X.: Intrinsic statistics on Riemannian manifolds: basic tools for geometric
measurements. J. Math. Imaging Vis. 25(1), 127–154 (2006)
18. Redolfi, A., Manset, D., Barkhof, F., Wahlund, L., Glatard, T., Mangin, J.F.,
Frisoni, G.: Head-to-head comparison of two popular cortical thickness extraction
algorithms: a cross-sectional and longitudinal study. PLoS ONE 10(3), e0117692
(2015)
19. Srivastava, A., Jermyn, I., Joshi, S.: Riemannian analysis of probability density
functions with applications in vision. In: Proceedings of International Conference
on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
20. Van-Essen, D., Dierker, D.: Surface-based and probabilistic atlases of primate cere-
bral cortex. Neuron 56, 209–225 (2007)
21. Yeo, B.T.T., Yu, P., Grant, P.E., Fischl, B., Golland, P.: Shape analysis with
overcomplete spherical wavelets. In: Metaxas, D., Axel, L., Fichtinger, G., Székely,
G. (eds.) MICCAI 2008, Part I. LNCS, vol. 5241, pp. 468–476. Springer, Heidelberg
(2008)
22. Yu, P., Grant, P., Qi, Y., Han, X., Segonne, F., Pienaar, R., Busa, E., Pacheco, J.,
Makris, N., Buckner, R., Golland, P., Fischl, B.: Cortical surface shape analysis
based on spherical wavelets. IEEE Trans. Med. Imaging 26(4), 582–597 (2007)
Modeling Fetal Cortical Expansion Using
Graph-Regularized Gompertz Models

Ernst Schwartz1(B) , Gregor Kasprian1 , András Jakab1,2 , Daniela Prayer1 ,


Veronika Schöpf3 , and Georg Langs1
1
Computational Imaging Research Lab and Division of Neuroradiology
and Musculoskeletal Radiology,
Department of Biomedical Imaging and Image-guided Therapy,
Medical University Vienna, Vienna, Austria
ernst.schwartz@meduniwien.ac.at
2
Center for MR-Research, University Children’s Hospital Zürich, Zürich, Switzerland
3
Institute of Psychology, University of Graz, Graz, Austria

Abstract. Understanding patterns of brain development before birth is


of both high clinical and scientific interest. However, despite advances
in reconstruction methods, the challenging setting of in-utero imaging
renders precise, point-wise measurements of the rapidly changing fetal
brain morphology difficult. This paper proposes a method to deal with
bad measurement quality due to image noise, motion artefacts and ensu-
ing segmentation and registration errors by enforcing spatial regularity
during the estimation of parametric models of cortical expansion. Qual-
itative and quantitative analysis of the proposed method was performed
on 88 clinical fetal MR volumes. We show that the resulting models accu-
rately capture the morphological and temporal properties of fetal brain
development by predicting gestational age on unseen cases at human-
level accuracy.

1 Introduction
During the second and third trimester of gestation, the fetal brain grows from
a smooth shape to a complex folded structure. Understanding the processes
driving this rapid development is of strong academic and clinical interest [1].
In the last decade, in utero imaging using Magnetic Resonance Imaging (fetal
MRI), together with specialized reconstruction procedures [2] have led to a better
understanding of gross morphological fetal neurodevelopment. Various authors
have reported normative values for the developing brains’ volume, folding and
surface area. However, these studies are either based on premature neonates [3],
report global measurements [4–6] or rely on a-priori parcellations of the cortical
surface into lobar regions [7].
In this paper, we aim at computing a continuous model of fetal cortical
expansion. We build on recent advances in structured prediction [8] to regularize
parametric growth models along the cortex. We show that the resulting models
can be used to precisely model fetal cortical expansion on a surface node level,
predict gestational age with high accuracy, and identify cortical regions that are
predictive for age.

G. Langs—This project was supported by the FWF under KLI 544-B27 and
I2714-B31, the OeNB under 14812 and 15929, and EU FP7 under 2012-PIEF-GA-33003.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 247–254, 2016.
DOI: 10.1007/978-3-319-46720-7_29

2 Regularizing Parametric Cortical Growth Models


We propose a method allowing for the joint estimation of sets of parametric
models in a regularization framework. Modeling prior knowledge about the type
of interactions between elements of the set as graphs allows us to include this
information in the process of fitting the models to noisy data. Specifically, we
are interested in enforcing spatial smoothness of growth models on a surface. We
show how this can be achieved in the case of a special type of logistic growth
model, the Gompertz function.

Parametric Growth Modeling. Simple models such as linear and polyno-


mial functions are commonly used to summarize observations. However, their
underlying assumption that the observed process is unbounded is generally not
valid. Logistic functions such as the Gompertz function

f (t) = β1 + β2 exp(− exp(−β3 (t − β4 ))) β1,2,4 ∈ R, β3 ∈ R+ (1)


on the other hand can be used to describe an asymptotically bounded process
such as brain growth [7]. Fitting functions of this type to measurements allows
for the summarization of the observed process in terms of initial (β1 ) and final
(β2 ) quantity, as well as the timing (β4 ) and rate (β3 ) of its growth.
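A sketch of fitting Eq. (1) to noisy synthetic measurements; the gestational-week range and generating parameters below are illustrative choices, not clinical values:

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, b1, b2, b3, b4):
    """Gompertz growth: initial level b1, amplitude b2, rate b3 > 0,
    timing (inflection point) b4."""
    return b1 + b2 * np.exp(-np.exp(-b3 * (t - b4)))

t = np.linspace(18, 38, 60)                      # gestational weeks
rng = np.random.default_rng(0)
y = gompertz(t, 1.0, 4.0, 0.4, 28.0) + rng.normal(0.0, 0.05, t.size)

# Constrain b3 >= 0 as in the text; the other parameters are free.
popt, _ = curve_fit(gompertz, t, y, p0=[1.0, 3.0, 0.3, 27.0],
                    bounds=([-np.inf, -np.inf, 0.0, -np.inf], np.inf))
```

Here `popt` recovers (β1, β2, β3, β4) close to the generating values, summarizing the observed growth curve by its initial and final quantity and the timing and rate of its expansion.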

Graph-Based Regularization. Spatial relationships between observations


can be incorporated into model fitting using an appropriate regularization term.
This approach is common in imaging, where the lattice structure of image ele-
ments is exploited to solve otherwise ill-posed problems such as denoising, seg-
mentation or registration. In linear modeling, more complex types of structured
approaches have been proposed that exploit known regularities of the data such
as sparsity or group structure. In work that is most related to the proposed
method, Grosenik et al. [8] exploit the underlying spatial coherence of fMRI to
predict subjects’ behavior by solving the regularized linear least squares problem

argmin_β ||Xβ − y||₂² + λ1 ||β||₁ + λ2 βᵀGβ    (2)

where G is a graph representing the connections between voxels of the fMRI


volume.

Spatially Smooth Gompertz Models. A common measure of smoothness of


a function f : Rⁿ → Rᵐ is the sum of all its unmixed second partial derivatives,
which define the Laplace operator Δf = Σ_{i=1}^{n} ∂²f/∂x_i². In the discrete space of values

observed on a graph G, an analogous operator can be defined as L = D − A,


where D is the diagonal matrix of the degrees of the nodes in the adjacency
matrix A.
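Concretely (a sketch; `A` is assumed to be the symmetric 0/1 adjacency matrix of the surface mesh graph):

```python
import numpy as np

def graph_laplacian(A):
    """Combinatorial graph Laplacian L = D - A, where D is the diagonal
    matrix of node degrees. For any signal x on the nodes,
    x^T L x = sum over edges (i, j) of (x_i - x_j)^2, i.e. a
    smoothness measure."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A
```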
We introduce this prior into the non-linear problem of fitting a set of growth
models represented as Gompertz functions to observations obtained at locations
corresponding to nodes in a graph G. As in [8], we encourage spatial smoothness
of the model parameters in G by solving

argmin_β Σ_n ||β(1,n) + β(2,n) exp(−exp(−β(3,n)(t − β(4,n)))) − y(t, xn)||₂²
        + λ βᵀ(L ⊗ B)β    (3)

where β = (β(1,1) , . . . , β(4,1) , . . . , β(1,n) , . . . , β(4,n) ) and y(t, xn ) is a measurement


at location xn at time t. Introducing the diagonal matrix B ∈ R^(4×4) allows us to
selectively penalize spatial variability of specific model parameters. For example,
setting B = diag(0, 0, 1, 1) adds costs for the rate and timing of growth, but not
its amplitude. We solve Eq. 3 for observed data y using constrained optimization
to ensure β(3,·) > 0.
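The regularizer of Eq. 3 can be evaluated with a Kronecker product when β is stored node-major, four parameters per node; a sketch using the B = diag(0, 0, 1, 1) choice discussed above:

```python
import numpy as np

def smoothness_penalty(beta, L, lam=1.0, B=None):
    """lam * beta^T (L kron B) beta for node-major beta (4 Gompertz
    parameters per node). With B = diag(0, 0, 1, 1), only the rate and
    timing parameters are penalized for spatial variability."""
    if B is None:
        B = np.diag([0.0, 0.0, 1.0, 1.0])
    return lam * beta @ np.kron(L, B) @ beta
```

For two connected nodes, identical parameters incur zero cost, while a rate difference of 0.4 costs 0.4² = 0.16; this term is added to the data misfit and minimized jointly over all nodes.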

3 Data, Cortical Segmentation, and Tracking

Obtaining surface models of the fetal brain from fetal MRI requires a sequence
of processing steps. Artefacts due to fetal motion during the acquisition are
mitigated by using fast Rapid Acquisition with Refocused Echoes (RARE) T2
sequences [9] at increased (3–4 mm) slice thickness. To avoid the loss of important
anatomical information due to this strong anisotropy, orthogonal views in axial,
coronal and sagittal direction are acquired and fused into an isotropic, high-
resolution (HR) volume [2].

Cortical Segmentation. We use probability maps of brain tissues provided


with a publicly available atlas of fetal brain anatomy [10] to initialize a graph-
based segmentation procedure [11]. Atlas alignment is performed using affine
and non-rigid registration, initialized using fiducials placed at the distal horns
of the lateral ventricles. The resulting segmentation is split at the interhemi-
spheric fissure and an estimate of the gray/white matter boundary in each
hemisphere is computed using marching cubes. A spherical parametrization of
the resulting surface mesh is then computed [12] and used to obtain an ini-
tial regular sampling S0 = (V0 , F0 ){l,r} , |V0 | = 642, |F0 | = 1280 of each cortical
surface. The surface mesh of each hemisphere is deformed towards the darkest
part of the cortical band using an active contour model while avoiding self-
intersections. We proceed in a multi-resolution manner using recursive icosa-
hedral subdivisions. This procedure stops after 3 subdivisions, yielding surface
meshes S3 = (V3 , F3 ){l,r} , |V3 | = 40962, |F3 | = 81920.
250 E. Schwartz et al.

Surface Normalization. Comparing the surfaces of different cases requires


their point-wise correspondence. We employ a spectral graph matching tech-
nique [13] to establish inter-patient correspondences. To ensure compatibility
over the whole course of development, we first perform surface extraction on
the atlas volumes of [10] and compute pair-wise correspondences between the
models obtained for subsequent weeks as described in [14]. We then perform
spectral matching between each individual case and its age-matched reference
surface. After normalization, we compute the surface area of each face in F3
using Heron’s formula. As Fn+1 is a complete subdivision of Fn , we can directly
integrate these measurements over the faces of F0 to reduce computational com-
plexity. Note that this is not identical to computing the area of F0 directly.
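The per-face area computation via Heron's formula vectorises directly (a sketch; variable names are illustrative):

```python
import numpy as np

def face_areas(verts, faces):
    """Per-triangle surface area via Heron's formula.
    verts: (V, 3) vertex coordinates; faces: (F, 3) vertex indices."""
    p0, p1, p2 = (verts[faces[:, i]] for i in range(3))
    a = np.linalg.norm(p1 - p0, axis=1)
    b = np.linalg.norm(p2 - p1, axis=1)
    c = np.linalg.norm(p0 - p2, axis=1)
    s = 0.5 * (a + b + c)  # semi-perimeter
    # clip guards against tiny negative values from round-off on slivers
    return np.sqrt(np.maximum(s * (s - a) * (s - b) * (s - c), 0.0))
```

Integrating these values over the coarse faces of F0 then amounts to summing the areas of each coarse face's descendants in the subdivision hierarchy.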

4 Results

We performed the described reconstruction, segmentation and matching proce-


dure on 88 fetal MRIs obtained from routine clinical evaluations at the General
Hospital of Vienna. The gestational age in days (GW + d, or GD) of the individ-
ual cases has been assessed based on the last menstrual cycle of the mother. The
dataset spans the second and third trimesters, from GW 19 + 5 to GW 36 + 5 (mean
GD 201.76 ± 35.34). None of the cases showed any neurological pathologies.

A Continuous Model of Fetal Cortical Expansion. We fit both indepen-


dent and graph-regularized1 Gompertz models to measurements of local surface
area of corresponding faces in the 88 surface models. Due to differences in the
size of the surface elements in the triangulation (Fig. 1(b)), we enforce smooth-
ness only on the growth rate β(3,·) and inflation time-point β(4,·) by setting B in
Eq. 3 accordingly.

[Panels (a) and (c) plot Local Cortical Area against GWs]
(a) Unregularized model fit (b) Avg. surface model (c) Regularized model fit

Fig. 1. Comparison between unregularized and regularized model fit showing higher
similarity of local cortical expansion models at neighboring cortex points after
regularization.
¹ The choice of the regularization weight λ is described in the next section.

Inter-patient variability as well as inconsistencies in the segmentation and


matching procedures lead to considerable noise when measuring the local sur-
face area (Fig. 1). Thus, fitting independent growth models at each location x
can yield physiologically meaningless measurements. This is clearly visible in
Fig. 1(a), where the time-courses of development estimated via the fitted mod-
els vary considerably in neighboring surface patches. Using the proposed graph-
regularized model on the other hand (Fig. 1(c)) has the expected effect of finding
locally similar models that still fit the data well. We assessed model fit by
computing $R^2$ values for both the individual ($R^2 = 0.75$) and regularized
($R^2 = 0.74$) models, which yielded only a negligible (95% CI $[9 \times 10^{-3}, 1 \times 10^{-2}]$) decrease
for the regularized model. Lower model fit could be expected due to the added
regularization term in Eq. 3. The small difference however shows that the pro-
posed hypothesis of a smooth developmental process of fetal cortical expansion
can indeed be experimentally validated.
A major advantage of logistic models is their interpretable parameters.
Regularization plays an important role of providing more robust measures of
these factors by reducing the influence of spurious measurements on the model
fit. Figures 2–4 show this effect when modeling fetal cortical expansion. The
relative increase in cortical surface area (Fig. 2) corresponds well to published
results on the global cortical expansion [5,6] and due to our choice of B remains
largely unaffected by the regularization. On the other hand, both the time-point
of maximal cortical expansion (Fig. 3) and the rate of growth (Fig. 4) show the
expected effect of the regularization removing physiologically implausible results

(a) Unregularized model (b) Regularized model

Fig. 2. Percentage increase in cortical surface area from GW 20 to GW 35. Due to the
design of B, this component is largely unaffected by regularization.

(a) Unregularized model (b) Regularized model

Fig. 3. Time-point (GW) of maximal expansion. Regularization removes inconsistencies
in regions exhibiting linear growth (cf. Fig. 4) and recovers the known pattern
of early expansion in central cortical regions [3].

(a) Unregularized model (b) Regularized model

Fig. 4. Expansion rate. Distinctive expansion of the left inferior parietal lobule is robust
with respect to regularization, while values in the insula are homogenized.

such as an inhomogeneous time-course of operculization or an inhomogeneous


expansion pattern in the insula.
Note however that the regularized model fit does not simply correspond to
smoothing the parameters of the unregularized model on the surface mesh. This
is apparent in the absence of the distinctive blurring effect of high values affecting
its neighbors, as for example in the expansion rate at the level of the right middle
frontal gyrus (Fig. 4) or in the expansion time-point of the right superior frontal
sulcus (Fig. 3).

Predicting Gestational Age from Cortical Surface Area. Given a mea-


surement of local cortical surface area and a fitted model β1...4 , the inverse of
the Gompertz model can be used to estimate the gestational age of a fetus2 .
We evaluate the validity of models obtained with λ ∈ [0, 0.1, 0.5, 1, 2, 5, 10] using
Leave-One-Out Cross-Validation (LOOCV). By averaging all predictions of the
models, we achieved an overall error of 5.70 ± 4.49 days in the unregularized
case. The best results using regularized models were obtained using λ = 5,
yielding a prediction error of 5.70 ± 4.37 days. Although these differences are not
significant, the regularized model is much easier to interpret. This shows that
spatial smoothness is an adequate prior for fetal cortical expansion and helps in
building descriptive, interpretable growth models. Prediction accuracy in both
cases is also much higher than, for instance, in [7], where the authors used global
cortical surface curvature measurements instead of local ones. We plot the GWs
estimated using the regularized model against the true GWs in Fig. 5(a), along
with their 0.95-th quantiles over all model faces. Comparing the error rates for
the left and right hemispheres shows an increase in variability with age that is
higher in the left than the right hemisphere.
Finally, the proposed method allows for the point-wise evaluation of the pre-
diction error (Fig. 5(b)). This enables us to distinguish regions that are stable
predictors of age, such as the parietal lobe, the prefrontal cortex and the callosal
area. We performed an exhaustive search over percentiles of face-wise LOOCV
prediction errors in the range of [0.01, . . . , 1] to determine a threshold defin-
ing regions predictive of gestational age. The best prediction was obtained by
retaining all faces that have a mean prediction error below 11.79 days (GW
² The inverse of Eq. 1 is given as $t = f^{-1}(y) = -\log(-\log((y - \beta_1)/\beta_2))/\beta_3 + \beta_4$.
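The footnote's closed-form inverse is easy to verify with a round trip (the parameter values below are illustrative, not fitted ones):

```python
import numpy as np

def gompertz(t, b1, b2, b3, b4):
    """Gompertz growth model: y = b1 + b2 * exp(-exp(-b3 * (t - b4)))."""
    return b1 + b2 * np.exp(-np.exp(-b3 * (t - b4)))

def gompertz_inverse(y, b1, b2, b3, b4):
    """Closed-form inverse (footnote 2), valid for b1 < y < b1 + b2:
    t = -log(-log((y - b1)/b2))/b3 + b4."""
    return -np.log(-np.log((y - b1) / b2)) / b3 + b4
```

Given a fitted model per face and a measured local surface area, this inverse yields one gestational-age estimate per face, which can then be averaged over the predictive regions.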

[Figure: estimated GW plotted against true GW, shown separately for the left and right hemispheres]

Fig. 5. Results of LOOCV age prediction from local cortical surface area.

1 + 5, 0.05-th quantile for λ = 5), Fig. 5(b). By averaging the prediction in these
regions, the error is reduced significantly to 4.65±3.58 days (p < 0.05). The accu-
racy of these results is in the order of the underlying uncertainty of reported last
menstrual cycle and comparable with state-of-the-art results in [15] as well as
manual measurements [4].

5 Conclusion
We have proposed a novel method for fitting spatially regularized growth models
to noisy data. Applying this method in the challenging setting of fetal brain
development enables building accurate interpretable models of cortical expansion
in utero, and allows for the point-wise estimation of gestational age. We have shown
that the resulting models are in line with published knowledge about fetal brain
growth and are able to predict the age of the fetus with high accuracy. We believe
that the presented method is of significant value in deepening the understanding
of the time-course of neuroanatomical development, as well as allowing for the
precise localization and characterization of its vulnerabilities.

References
1. Tallinen, T., Chung, J.Y., Rousseau, F., Girard, N., Lefevre, J., Mahadevan, L.:
On the growth and form of cortical convolutions. Nat. Phys., February 2016
2. Rousseau, F., Oubel, E., Pontabry, J., Schweitzer, M., Studholme, C., Koob, M.,
Dietemann, J.L.: BTK: An open-source toolkit for fetal brain MR image processing.
Comput. Methods Prog. Biomed. 191(1) (2012)

3. Dubois, J., Benders, M., Cachia, A., Lazeyras, F., Leuchter, R.H.V.,
Sizonenko, S.V., Borradori-Tolsa, C., Mangin, J.F., Hüppi, P.S.: Mapping the early
cortical folding process in the preterm newborn brain. Cereb. Cortex 18(6), 1444–
1454 (2008)
4. Wu, J., Awate, S.P., Licht, D.J., Clouchoux, C., du Plessis, A.J., Avants, B.B.,
Vossough, A., Gee, J.C., Limperopoulos, C.: Assessment of MRI-based automated
fetal cerebral cortical folding measures in prediction of gestational age in the third
trimester. Am. J. Neuroradiol. 36(7), 1369–1374 (2015)
5. Clouchoux, C., Kudelski, D., Gholipour, A., Warfield, S.K., Viseur, S., Bouyssi-
Kobar, M., Mari, J.L., Evans, A.C., du Plessis, A.J., Limperopoulos, C.: Quantita-
tive in vivo MRI measurement of cortical development in the fetus. Brain Struct.
Funct. 217(1), 127–139 (2011)
6. Rajagopalan, V., Scott, J., Habas, P.A., Kim, K., Corbett-Detig, J., Rousseau, F.,
Barkovich, A.J., Glenn, O.A., Studholme, C.: Local tissue growth patterns under-
lying normal fetal human brain gyrification quantified in utero. J. Neurosci. 31(8),
2878–2887 (2011)
7. Wright, R., Kyriakopoulou, V., Ledig, C., Rutherford, M.A., Hajnal, J.V.,
Rueckert, D., Aljabar, P.: Automatic quantification of normal cortical folding pat-
terns from foetal brain MRI. NeuroImage 91, 1–12 (2014)
8. Grosenick, L., Klingenberg, B., Katovich, K., Knutson, B., Taylor, J.E.: Inter-
pretable whole-brain prediction analysis with GraphNet. NeuroImage 72(C), 304–
321 (2013)
9. Hennig, J., Nauerth, A., Friedburg, H.: RARE imaging: a fast imaging method for
clinical MR. Magn. Reson. Med. 3(6), 823–833 (1986)
10. Serag, A., Aljabar, P., Ball, G., Counsell, S.J., Boardman, J.P., Rutherford, M.A.,
Edwards, A.D., Hajnal, J.V., Rueckert, D.: Construction of a consistent high-
definition spatio-temporal atlas of the developing brain using adaptive kernel
regression. NeuroImage 59(3), 2255–2265 (2012)
11. Rajchl, M., Baxter, J.S.H., McLeod, A.J., Yuan, J., Qiu, W., Peters, T.M.,
Khan, A.R.: Hierarchical max-flow segmentation framework for multi-atlas seg-
mentation with Kohonen self-organizing map based Gaussian mixture modeling.
Med. Image Anal. 1–19, May 2015
12. Crane, K., Pinkall, U., Schröder, P.: Robust fairing via conformal curvature flow.
ACM Trans. Graph. 32(4), 1–10 (2013)
13. Lombaert, H., Sporring, J., Siddiqi, K.: Diffeomorphic spectral matching of cortical
surfaces. In: Gee, J.C., Joshi, S., Pohl, K.M., Wells, W.M., Zöllei, L. (eds.) IPMI
2013. LNCS, vol. 7917, pp. 376–389. Springer, Heidelberg (2013). doi:10.1007/
978-3-642-38868-2 32
14. Wright, R., Makropoulos, A., Kyriakopoulou, V., Patkee, P.A., Koch, L.M.,
Rutherford, M.A., Hajnal, J.V., Rueckert, D., Aljabar, P.: Construction of a fetal
spatio-temporal cortical surface atlas from in utero MRI: application of spectral
surface matching. NeuroImage 120(C), 467–480 (2015)
15. Namburete, A.I.L., Stebbing, R.V., Kemp, B., Yaqub, M., Papageorghiou, A.T.,
Noble, J.A.: Learning-based prediction of gestational age from ultrasound images
of the fetal brain. Med. Image Anal. 21(1), 72–86 (2015)
Longitudinal Analysis of the Preterm Cortex
Using Multi-modal Spectral Matching

Eliza Orasanu¹(B), Pierre-Louis Bazin², Andrew Melbourne¹, Marco Lorenzi¹,
Herve Lombaert³, Nicola J. Robertson⁴, Giles Kendall⁴, Nikolaus Weiskopf²,
Neil Marlow⁴, and Sebastien Ourselin¹

¹ Centre for Medical Image Computing, University College London, London, UK
eliza.orasanu.12@ucl.ac.uk
² Department of Neurophysics,
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
³ INRIA - Microsoft Research Joint Centre, Palaiseau, France
⁴ Academic Neonatology, EGA UCL Institute for Women's Health, London, UK

Abstract. Extremely preterm birth (less than 32 weeks completed ges-


tation) overlaps with a period of rapid brain growth and development.
Investigating longitudinal brain changes over the preterm period in these
infants may allow the development of biomarkers for predicting neurolog-
ical outcome. In this paper we investigate longitudinal changes in cortical
thickness, cortical fractional anisotropy and cortical mean diffusivity in
a groupwise space obtained using a novel multi-modal spectral matching
technique. The novelty of this method consists in its ability to register
surfaces with very little shape complexity, like in the case of the early
developmental stages of preterm infants, by also taking into account their
underlying biology. A multi-modal method also allows us to investigate
interdependencies between the parameters. Such tools have great poten-
tial in investigating in depth the regions affected by preterm birth and
how they relate to each other.

1 Introduction

Infants born extremely preterm are at high risk of developing cognitive and neu-
rologic impairment from an early age [1]. During the last trimester of pregnancy,
the fetal brain undergoes several changes in size, shape, volume, appearance [2],
as well as changes in connectivity and microstructure. Premature birth implies
that this development of the infant brain will take place under the harsh condi-
tions of the extra-uterine environment. Accurate measurements of the preterm
brain during this early post-natal period may yield predictive biomarkers of
neurological outcome. Furthermore, connecting information given by different
imaging modalities (structural and diffusion), may begin to provide an under-
standing of brain development during the preterm period and how it is affected
by preterm birth.


© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 255–263, 2016.
DOI: 10.1007/978-3-319-46720-7 30

Longitudinal studies are critical to the accurate analysis of neurodevelop-


ment due to the rapidity of changes in shape and structure. However, longitu-
dinal studies of this period of development are challenging for several reasons.
First, the early and more developed infant brains have very different appearance
and connectivity. Spectral techniques have been used in the preterm popula-
tions for matching white matter surfaces for the study of longitudinal cortical
folding patterns and changes in the white-grey matter boundary [3]. These tech-
niques have proved successful in the intra-subject case due to the prominence
of primary sulci even at early gestational age, allowing matching to be driven
by low-frequency spatial features. Second, the anatomical variability of cortical
surfaces is very high cross-sectionally, and variability is not well-represented by
geometric folding patterns. This is especially true in the very early stages of
development, when we deal with lissencephalic brains from different subjects.
During the very-preterm period, additional diffusion information of the white
matter might guide surface registration by contributing emergent microstruc-
tural information; high cortical fractional anisotropy (FA) is a feature of early
brain development, although FA falls rapidly in the third trimester with cortical
maturation. Matching of diffusion tensor images alone [4] to study longitudinal
changes in diffusion parameters and white matter tract shape has already shown
to be successful, but is of limited use to align cortical surfaces except within a
very narrow age range.
The relationships between cerebral microstructure and cortical shape are
intrinsically related and the ability to accurately combine multi-modal informa-
tion about cortical and white matter structure with cortical shape information
represents a key challenge to understanding the synergistic processes of neu-
rodevelopment. In this paper we propose a novel registration technique based
on Pairing Images using the Multi-Modal Spectra (PIMMS) to register cross-
sectional data from 9 early-scanned preterm infants and investigate longitudinal
rates of change in cortical thickness, cortical FA and MD, in the created group-
wise space. The method matches the combined spectra based on tensor similarity
(from the diffusion weighted images) and on the surface information (obtained
from structural image segmentation). The proposed method has an advantage
over the classical surface registration algorithm, since it optimises both surface
and microstructural information, thus providing a more biologically accurate
mapping based on tissue properties and not only sulcal patterns. The mapping
also enables us to study multi-modal variations and interdependency between
parameters obtained from different imaging modalities.

2 Data and Image Processing


Subjects. Volumetric T1 -weighted images were acquired for nine infants (Mean
Gestational Age at Birth (GAB) of 26.8 ± 1.5 weeks) on a Philips Achieva 3T
MRI machine. The infants were first imaged shortly after birth, at a mean age
of 26.8 ± 1.1 weeks equivalent gestational age (EGA) and then at a mean age of
41.7 ± 2.9 weeks EGA, in an MR-compatible incubator after feeding, when spon-
taneously asleep, with no sedation. T1 -weighted data was acquired at a resolution

of 0.82 mm×0.82 mm×0.5 mm at TR/TE =17/4.6 ms, acquisition duration 462 s.


The diffusion weighted data had a resolution of 1.75 mm×1.75 mm×2 mm. We
acquired six volumes at b = 0 s/mm², 16 directions at b = 750 s/mm² and 32 at
b = 2000 s/mm² with TR/TE = 9 s/60 ms.
Image Preprocessing and Infant Brain Segmentation. The preprocessing
of the T1-weighted images was carried out as described in [3]. Briefly, images
were bias-corrected then segmented with the help of neonate brain atlases and
an adaptive EM algorithm. The preprocessing of the diffusion images was done
as described in [5]. We obtained FA and MD maps by fitting a diffusion tensor
model to the data. We resampled the diffusion images to T1 space using a rigid
registration and corrected for EPI distortions.
Cortical Thickness. Using the obtained white matter, grey matter and CSF
segmentations, we automatically computed the level set functions of the inner
(WM/GM), central and outer (GM/CSF) boundaries of each hemisphere of the
cerebral cortex using CRUISE [6]. We estimated the cortical thickness (CT)
as the difference between the distance to the inner cortical surface and the
distance to the outer cortical surfaces (given by the level set values). The
WM/GM boundary level set was used to create smooth triangle based meshes
of each hemisphere. We mapped the CT values onto the white-grey matter inner
surface.
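With the usual signed-distance convention (here assumed positive outside each surface), the thickness estimate described above reduces to a voxel-wise difference of the two level set maps — a sketch of the idea, not the CRUISE implementation:

```python
import numpy as np

def cortical_thickness(phi_inner, phi_outer):
    """Cortical thickness per voxel as the difference of signed distances:
    phi_inner > 0 outside the WM/GM surface and phi_outer < 0 inside the
    GM/CSF surface, so (phi_inner - phi_outer) spans the cortical ribbon.
    Returns 0 outside the ribbon."""
    gm = (phi_inner > 0) & (phi_outer < 0)  # voxels inside the ribbon
    return np.where(gm, phi_inner - phi_outer, 0.0)
```

These per-voxel values can then be sampled at the inner-surface mesh vertices, as done in the text.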
Laminar Analysis. From the level set functions of the WM/GM and GM/CSF
boundaries, we created a continuous layering of the cortex (cortical grey matter)
[7]. We used the obtained 11 laminar profiles to sample the FA and MD maps,
and computed the mean values of these parameters across the cortex. To reduce
partial volume effects, we excluded the first and last three profiles from the
computation of the average cortical FA and MD values. These mean FA and
MD values were then mapped onto the white-grey surface.
Longitudinal and Cross-Sectional Mapping. To quantify the longitudinal
changes taking place over the preterm period, we defined a mapping for the
intra-subject WM/GM surfaces, by hemisphere, at the two timepoints, using
JSM [8], initialised with a rigid CPD [9]. To investigate the changes in the same
reference space, we create an average early time point template, by choosing a
random subject as template, mapping all the others into its space and averaging
the results. To register this WM/GM surface data, using only a surface-based
matching technique based on mapping sulcal patterns would not be appropriate,
since early cross-sectional data of this type does not provide us with sufficient
surface information for a proper match. Hence we propose a novel multi-modal
registration technique based on spectral matching (PIMMS), described in the
following section.

3 Pairing Images Using Multi-Modal Spectra (PIMMS)


To estimate a more biologically accurate surface matching in the case
of lissencephalic surfaces, we propose the novel PIMMS, which uses both

diffusion tensor images and surface information. Combining surface (2D) and
volume (3D) information is not trivial [10]. We tackled this problem by embed-
ding the surface with a level set representation in the 3D image domain, and
reformulating the surface spectral matching problem in this context. We fol-
lowed the previous strategies of spectral decomposition in the case of surfaces
and diffusion tensor images. We then compared the groupwise average of PIMMS
with the results of JSM.
Spectral Components of Surface in Image Domain. To decompose the
cortical surface, but in image space, we used the level set images of the white-
grey matter boundary, ILS . To optimise our decomposition, we considered a
subset of our image, ILSΩ1 , consisting of the voxels around the boundary within
a chosen threshold. Similarly to the surface decomposition, where we need to
have continuous surfaces with no holes to obtain smooth spectra, we chose the
smallest threshold that ensured a continuous surface for all subjects, which was
found to be 3.5 mm in the presented work.
We constructed the connected graph (V, E) with the vertices V being image
voxels and the edges E are defined by the neighbourhood structure of these
vertices. We then represented the graph with its adjacency matrix W , where for
each pair of voxels $x_i$ and $x_j$, $x_i \neq x_j$, $W_{ij}$ is 1 if the voxels are neighbours
and 0 if they are not. The diagonal matrix D gives the total weighting of all
edges connected to each voxel and is computed as $D_{ii} = \sum_j W_{ij}$. The general
graph Laplacian is defined by $L = G^{-1}(D - W)$, with G being the diagonal node
weighting matrix, which we computed according to each voxel i's inverse level
set value, $G_{ii} = 1/x_i$. Hence, elements closer to the boundary, with a smaller
level set value, will have a higher weighting when computing the spectra.
The graph spectrum of the level set image at the defined points is given by
the eigen-decomposition of the general graph Laplacian L. The spectral components
$U_1^{LS_{\Omega_1}}, \ldots, U_N^{LS_{\Omega_1}}$ represent the fundamental modes of vibration of the
image, and respectively describe increasing complexity of its geometric features,
from coarse to fine scales.
Mapping the spectra obtained from the level set image decomposition onto
surfaces describes similar patterns of variation as the direct spectral decomposition
of surfaces given by [8], as shown in Fig. 1.
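A small sketch of this weighted spectral decomposition — a literal reading of the construction above, using a dense eigendecomposition for brevity (the function name is illustrative; a real pipeline would use sparse matrices):

```python
import numpy as np

def weighted_spectrum(W, phi, k=5, eps=1e-6):
    """First k spectral components of the general graph Laplacian
    L = G^{-1}(D - W), where the node weights G_ii = 1/x_i come from
    the level set magnitudes phi, so near-boundary voxels (small |phi|)
    weigh more. eps guards against division issues exactly on the boundary."""
    D = np.diag(W.sum(axis=1))
    G_inv = np.diag(np.abs(phi) + eps)  # G^{-1} has entries x_i
    vals, vecs = np.linalg.eig(G_inv @ (D - W))
    order = np.argsort(vals.real)       # modes from coarse to fine
    return vals.real[order[:k]], vecs[:, order[:k]].real
```

The first (zero-eigenvalue) mode is constant; the following low-frequency modes are the coarse geometric features used for matching.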
Combined Level Set and Diffusion Tensor Spectra. We combined the
level set spectra with the spectra obtained by the decomposition of the diffu-
sion tensor images as described by [4]. Briefly, for obtaining the DTI spectra,
the weights between the graph nodes (also neighbouring voxels) are computed
based on both tensor similarity from the log-Euclidean distance and Euclidean
distance. Our main goal was to optimise the surface correspondence by taking
into account microstructural information inside the white matter. Hence, we
separately compute tensor spectral components $U_1^{DTI_{\Omega_2}}, \ldots, U_N^{DTI_{\Omega_2}}$ for a
subset of the image $I_{DTI_{\Omega_2}}$, in the deeper white matter structures, i.e. for the
values inside the level set boundary (negative level set values) and outside the
level set subset $I_{LS_{\Omega_1}}$ ($I_{DTI_{\Omega_2}} \cap I_{LS_{\Omega_1}} = \emptyset$). The independently computed spectra
were then combined in the same space to obtain the combined spectra, with

Fig. 1. Spectral modes of shape variation given by the decomposition of a subset of a


level set image, and by the surface of the same boundary in the same subject

Fig. 2. Combined spectral modes for the left hemisphere: shape variation given by the
decomposition of a subset of a level set image (edges of the surface) and microstructural
variation given by the decomposition of the diffusion tensor image (inner)

voxels receiving spectral information from either the diffusion data (inside the WM)
or the surface data (around the boundary): $[U_1^{LS_{\Omega_1}}, U_1^{DTI_{\Omega_2}}], \ldots, [U_N^{LS_{\Omega_1}}, U_N^{DTI_{\Omega_2}}]$
(Fig. 2).
Matching of Multi-modal Spectra. Having the multi-modal spectra of two
subjects R and F , we can now estimate the spatial correspondences between
them by optimising the correspondences between the spectral coordinates defined
by the first k multi-modal components of UR , and UF . We followed the com-
putational scheme introduced in [8]. Briefly, the first k spectral components are
initially corrected for their sign ambiguity by computing the dot product between
the corresponding eigenmodes at similar locations. For this we ran a coherent-
point drift (CPD) rigid registration [9] of the respective clouds of points, which
we used just to ensure the sign matching of the spectra in both the spectral
and diffusion components, independently. Using the combined spectra and the
thresholded level set distance maps for regularisation of the reference and float-
ing images, we estimate a mapping between the corresponding points using a
nearest neighbour search algorithm.
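The matching step can be sketched as a nearest-neighbour search in spectral coordinates; for brevity the CPD-based sign check is simplified here to a column-wise correlation test, which assumes the two point sets are roughly aligned (an illustrative simplification, not the method as described):

```python
import numpy as np
from scipy.spatial import cKDTree

def match_spectra(U_ref, U_flt):
    """Nearest-neighbour correspondences in spectral-coordinate space.
    Rows are points; columns are the first k combined spectral components.
    Eigenvector signs are ambiguous, so each floating column is flipped
    if it anti-correlates with the matching reference column."""
    m = min(len(U_ref), len(U_flt))
    signs = np.sign(np.sum(U_ref[:m] * U_flt[:m], axis=0))
    signs[signs == 0] = 1.0
    _, idx = cKDTree(U_ref).query(U_flt * signs)
    return idx  # idx[i]: reference point matched to floating point i
```

Because matching happens in spectral space rather than Euclidean space, corresponding anatomical locations are close even when the two surfaces differ in pose and scale.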

Comparison of Multi-modal Spectral Matching with Surface Spectral


Matching. We used both PIMMS and JSM to create a groupwise average of
the early time point of the subjects described in Sect. 2. The accuracy of the
matching was evaluated by comparing at each vertex the standard deviation of
the mean diffusivity values in the groupwise space. A lower variability indicates
better alignment and consistency of the registration algorithm. We chose the
MD for this validation over the FA, since FA values in the cortex are more
homogeneous.
Figure 3 shows that the standard deviation of the mean diffusivity is
minimised when we used the proposed multi-modal technique. Using
additional microstructural information improves the alignment of the surfaces
by taking into account tissue properties.

Fig. 3. Standard deviation in mean diffusivity in the cortex of the early timepoint on
average groupwise for the left hemisphere, obtained using the proposed method and
Joint-Spectral Matching of surfaces

4 Groupwise Analysis of Longitudinal Changes

Longitudinal Rates of Change. All longitudinal changes in the parameters


were corrected for the time between scans. We computed rates of change per week
in CT, cortical FA and cortical MD during the preterm period in all infants and
mapped them in the groupwise average space (Fig. 4).
Multi-modal Parameter Interdependencies. To investigate correlations
between the rates of change in the cortex, we computed the rank based cor-
relation coefficient between rates in all infants, per region, for each pair of para-
meters: CT-FA, CT-MD and FA-MD. The p-values were FDR corrected at a 0.05
significance rate. The cortical rates of change show statistically significant cor-
relations between the three measures in different regions summarised in Table 1.
We notice that the changes in CT and FA are positively correlated, while the
ones in CT-MD and FA-MD show a direct negative correlation.
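The per-region correlation analysis with FDR correction can be sketched as follows (array layout and function names are our assumptions; Benjamini-Hochberg is the standard FDR procedure, which we assume here):

```python
import numpy as np
from scipy.stats import spearmanr

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = len(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return reject

def region_correlations(rates_a, rates_b, alpha=0.05):
    """Rank-based (Spearman) correlation per region between two
    (n_subjects, n_regions) arrays of per-week rates of change,
    with an FDR-corrected significance mask at the given level."""
    rhos, ps = zip(*(spearmanr(rates_a[:, r], rates_b[:, r])
                     for r in range(rates_a.shape[1])))
    return np.array(rhos), fdr_bh(ps, alpha)
```

Running this once per parameter pair (CT-FA, CT-MD, FA-MD) yields the signed coefficients and significance pattern reported in Table 1.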

Fig. 4. Mean Longitudinal Rates of Change per week in cortical thickness (CT), cortical
fractional anisotropy (FA) and cortical mean diffusivity (MD) in Groupwise Space

Table 1. Statistically significant (0.05 significance, FDR corrected) correlation coeffi-


cients between multi-modal cortex parameters in all four lobes. Negative values imply
a negative correlation, while positive imply a direct positive correlation. The regions
not mentioned did not have a significant correlation between certain parameters.

         Left hemisphere                                      Right hemisphere
CT-FA    Temporal: 0.15, Occipital: 0.09                      Temporal: 0.32, Occipital: 0.21
CT-MD    Frontal: −0.06, Occipital: −0.13, Parietal: −0.17    Frontal: −0.11, Temporal: −0.33, Occipital: −0.17
FA-MD    Frontal: −0.24, Temporal: −0.18, Parietal: −0.33     -

5 Discussion

In this work we propose a novel registration technique based on Pairing Images


using the Multi-Modal Spectra (PIMMS), which defines a surface-to-surface
mapping in image domain by optimising both microstructural information in
the white matter (from the diffusion tensor images) and WM/GM surface infor-
mation (obtained from structural image segmentation). We applied this method
to the challenging problem of registering early developmental stages in preterm
born infants. Because of the timing, these surfaces do not provide us with suf-
ficient sulcal patterns needed for a classical surface registration algorithm. The
novelty of this method consists of ensuring a biologically accurate correspon-
dence for surfaces with low gyrification.
We used PIMMS to create a groupwise average space of the early develop-
mental time point, in which we mapped longitudinal changes over the preterm
period in 9 infants. We investigated the rates of change per week in cortical
thickness, cortical fractional anisotropy and cortical mean diffusivity in the cor-
tex. The cortical FA is decreasing in most regions of the brain, and the cortical
MD is increasing, results which match previous studies at the global level [5] and

are likely related to increasing dendrification in the cortex. The cortical thick-
ness is increasing in most regions, except the temporal lobe, where it is slightly
decreasing. This result may be connected to the later development of the tem-
poral lobe and the fact that it is the region most affected by preterm birth [3].
We further investigated the interdependency of these multi-modal parameters
of the cortex across the different lobes. We found a positive CT-FA correlation,
while the CT-MD and FA-MD correlations were negative.
Our future research will involve linking grey and white matter properties close
to the surface (e.g. studying cortical laminae in the cortex and closer to the
white matter boundary), as well as linking the cortical surface and deep white
matter connectivity. Furthermore, this method may also allow us to look into
the relationship between cortical folding and fibre-based connectivity.

Acknowledgements. This work was supported by funding from the UK charity


SPARKS, EPSRC (EP/H046410/1, EP/J020990/1, EP/K005278), the EPSRC-funded
UCL Centre for Doctoral Training in Medical Imaging (EP/L016478/1), the MRC
(MR/J01107X/1) and the National Institute for Health Research University College
London Hospitals Biomedical Research Centre (NIHR BRC UCLH/UCL High Impact
Initiative).

References
1. Marlow, N., Wolke, D., Bracewell, M.A., Samara, M.: Neurologic and developmen-
tal disability at six years of age after extremely preterm birth. N. Engl. J. Med.
352(1), 9–19 (2005)
2. Kapellou, O., Counsell, S.J., Kennea, N., Dyet, L., Saeed, N., Stark, J.,
Maalouf, E., Duggan, P., Ajayi-obe, M., Hajnal, J., Allsop, J.M., Boardman, J.,
Rutherford, M.A., Cowan, F., Edwards, A.D.: Abnormal cortical development after
premature birth shown by altered allometric scaling of brain growth. PLoS Med.
3(8), e265 (2006)
3. Orasanu, E., Melbourne, A., Cardoso, M.J., Lombaert, H., Kendall, G.S.,
Robertson, N.J., Marlow, N., Ourselin, S.: Cortical folding of the preterm brain: a
longitudinal analysis of extremely preterm born neonates using spectral matching.
Brain Behav. 488, 1–18 (2016)
4. Orasanu, E., Melbourne, A., Lorenzi, M., Modat, M., Eaton-Rosen, Z.,
Robertson, N.J., Kendall, G., Ourselin, S.: Tensor spectral matching of diffusion
weighted images. In: SAMI Conference Proceedings, MIDAS Journal, pp. 35–44
(2015)
5. Eaton-Rosen, Z., Melbourne, A., Orasanu, E., Cardoso, M.J., Modat, M., Bain-
bridge, A., Kendall, G.S., Robertson, N.J., Marlow, N., Ourselin, S.: Longitudinal
measurement of the developing grey matter in preterm subjects using multi-modal
MRI. NeuroImage 111, 580–589 (2015)
6. Han, X., Pham, D.L., Tosun, D., Rettmann, M.E., Xu, C., Prince, J.L.: CRUISE:
cortical reconstruction using implicit surface evolution. NeuroImage 23, 997–1012
(2004)
7. Waehnert, M.D., Dinse, J., Weiss, M., Streicher, M.N., Waehnert, P., Geyer, S.,
Turner, R., Bazin, P.: Anatomically motivated modeling of cortical laminae. Neu-
roImage 93, 210–220 (2014)

8. Lombaert, H., Sporring, J., Siddiqi, K.: Diffeomorphic spectral matching of cortical
surfaces. In: Gee, J.C., Joshi, S., Pohl, K.M., Wells, W.M., Zöllei, L. (eds.) IPMI
2013. LNCS, vol. 7917, pp. 376–389. Springer, Heidelberg (2013). doi:10.1007/
978-3-642-38868-2 32
9. Myronenko, A., Song, X.: Point set registration: coherent point drift. IEEE Trans.
Pattern Anal. Mach. Intell. 32(12), 2262–2275 (2010)
10. Postelnicu, G., Zollei, L., Fischl, B.: Combined volumetric and surface registration.
IEEE Trans. Med. Imaging 28(4), 508–522 (2009)
Early Diagnosis of Alzheimer’s Disease
by Joint Feature Selection and Classification
on Temporally Structured Support
Vector Machine

Yingying Zhu, Xiaofeng Zhu, Minjeong Kim,


Dinggang Shen, and Guorong Wu(&)

Department of Radiology and BRIC,


University of North Carolina at Chapel Hill, Chapel Hill, USA
grwu@med.unc.edu

Abstract. The diagnosis of Alzheimer's disease (AD) from neuroimaging data
at the pre-clinical stage has been intensively investigated because of the
immense social and economic cost. In the past decade, computational approaches
to longitudinal image sequences have been actively studied, with special
attention to mild cognitive impairment (MCI), an intermediate stage between
normal control (NC) and AD. However, current state-of-the-art diagnosis
methods have limited power in clinical practice, due to excessive requirements
such as an equal and large number of scans in the longitudinal imaging data.
More critically, very few methods are specifically designed for the early
alarm of AD onset. To address these limitations, we propose a flexible
spatial-temporal solution for the early detection of AD that recognizes
abnormal structural changes from a longitudinal MR image sequence.
Specifically, our method leverages the non-reversible nature of AD
progression. We employ a temporally structured SVM to alarm AD accurately at
an early stage by enforcing monotonicity on the classification results,
avoiding unrealistic and inconsistent diagnoses over time. Furthermore, in
order to select the best features to collaborate with the classifier, we
present a joint feature selection and classification framework. The evaluation
on more than 150 longitudinal subjects from the ADNI dataset shows that our
method is able to alarm the conversion to AD 12 months prior to the clinical
diagnosis with at least 82.5 % accuracy. It is worth noting that our proposed
method works on widely used MR images and does not restrict the number of
scans in the longitudinal sequence, which is very attractive for real clinical
practice.

1 Introduction

Alzheimer's disease is an incurable neurodegenerative disease. Typical clinical
symptoms include memory loss, disorientation, and language and behavioral
issues. The progression of AD is not reversible; however, treatments are
available to modify the disease effects in the early stage of AD [1]. Thus,
early diagnosis or prognosis of AD is of high value in clinical practice,
since it saves more time for treatment and thus improves quality of life not
only for patients but also for their caregivers.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 264–272, 2016.
DOI: 10.1007/978-3-319-46720-7_31

AD introduces both structural and functional loss that is known to have
dynamically evolving morphological patterns [1–3, 12–15]. In the past decade,
longitudinal studies have been actively investigated for AD diagnosis with
special attention to MCI [1, 4], which is an intermediate stage between NC and
AD. For example, tensor-based morphometry is used in [4] to reveal brain
atrophy patterns from 91 probable AD patients and 189 MCI subjects scanned at
baseline, and after 6, 12, 18, and 24 months. Moreover, the trend of
longitudinal cortical thickness is used as the morphological pattern in [5] to
identify subjects that eventually convert to AD. However, current longitudinal
AD diagnosis methods place very strong restrictions on the longitudinal image
sequence. For example, each subject recruited in [5] must have at least five
time points at six-month intervals, and must develop AD at least 12 months
after the baseline scan. For convenience, many longitudinal approaches assume
the number of scans is equal, albeit implicitly. In real clinical settings,
however, not all patients have a large or an equal number of imaging scans.
In order to accurately measure the tiny structural changes along time, current
state-of-the-art computer-assisted diagnosis methods have to wait until the
patient has a sufficient number of longitudinal scans. More critically, the
prediction is short term, e.g., only 6 months before the real onset of AD in
[5]. Although promising results have been achieved in predicting whether a
subject has progressed to AD or stays in the MCI stage, the limitation of
short-term prediction substantially hampers deployment in clinical practice.
In light of this, we propose a flexible solution for early detection of AD by
sequentially and consistently recognizing abnormal patterns of structural
change from the longitudinal MR image sequence. First, we present a novel
temporally structured SVM (TS-SVM) which is trained on a set of partial image
sequences cut from the complete longitudinal data. Compared to a conventional
SVM, our TS-SVM has two major improvements to achieve early alarm and high
accuracy in detecting AD progression. (1) Temporal consistency. We enforce a
monotonicity constraint to avoid inconsistent detection results over time.
Since convergent evidence suggests that AD progression is non-reversible
[6, 7], we require that the risk of AD progression monotonically increase
within each subject as more time points are inspected. (2) Early detection. We
employ sequential recognition to achieve the best balance between early alarm
and detection accuracy. In the training stage, we specifically train the
classifiers by making the classification margin adaptive to the length of the
partial image sequence. Given the longitudinal image sequence of a new subject
with an arbitrary number of scans, we sequentially examine the longitudinal
imaging patterns from baseline onward and alarm the AD conversion as soon as
the detection of abnormal change reaches high confidence. Thus, our proposed
AD early detection method places no requirement on the number of scans.
Second, we further present a joint feature selection and classification
framework, so that the selected features are optimal for the learned support
vector machine. We have evaluated the performance of AD early detection on
more than 150 longitudinal subjects from the ADNI dataset. Our method achieved
promising results, alarming AD onset 12 months prior to the clinical diagnosis
with at least 82.5 % accuracy.

2 Methods
2.1 Temporally Structured SVM for Early Detection of AD
The goal of our method is to accurately predict conversion to AD as early as
possible by longitudinally tracking structural changes. Since magnetic
resonance (MR) imaging is non-invasive and widely used in clinical practice,
we present a novel temporally structured SVM on longitudinal MR image
sequences.
Morphological Features. Suppose we have N training subjects; each subject
S_n has an MR image sequence I^n = {I_t^n | t = 1, ..., T_n} (n = 1, ..., N)
with T_n longitudinal scans. For each volumetric image I_t^n, we first
register the template image (http://qnl.bu.edu/obart/explore/AAL/) with 90
manually labeled ROIs (regions of interest) to the underlying image I_t^n
using the HAMMER registration tool, and extract seven morphological features
in each ROI: the tissue percentiles (volumetric percentiles of the ROI volume)
of white matter (WM), gray matter (GM), cerebrospinal fluid (CSF), and
background, and the average voxel-wise Jacobian determinant in the WM, GM, and
CSF regions. Therefore, the image feature f_t^n for each volumetric image
I_t^n is a 90 × 7 = 630-dimensional feature vector.
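As a rough illustration (not the authors' code), the 630-dimensional per-scan
feature vector can be assembled as follows; the function name and the
precomputed per-ROI statistics passed in are assumptions:

```python
import numpy as np

def roi_features(tissue_pct, jacobian_means):
    """Concatenate the seven per-ROI features described above.

    tissue_pct: (90, 4) array of WM, GM, CSF, background volume
        percentiles per ROI (assumed precomputed after registration).
    jacobian_means: (90, 3) array of mean Jacobian determinants in the
        WM, GM, and CSF compartments of each ROI.
    Returns a flat 90 * 7 = 630-dimensional feature vector.
    """
    per_roi = np.hstack([tissue_pct, jacobian_means])  # (90, 7)
    return per_roi.reshape(-1)                         # (630,)
```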
Decomposition into Partial Image Sequences. We can decompose the complete
longitudinal image sequence I^n into (T_n − 1) partial image sequences
P^n = {P^n(b) | b = 2, ..., T_n}, where each P^n(b) = {I_t^n | t = 1, ..., b}
is the partial image sequence with b time points from the baseline to the
(b − 1)-th follow-up. For each P^n(b), we further extract a longitudinal
feature representation as the column vector

$$h(b, n) = \Big[\tfrac{1}{b}\sum_{t=1}^{b} f_t^n;\; f_1^n - f_b^n\Big]^{\prime},$$

where the first half of the elements is the average of the morphological
features from the baseline to the last time point, and the second half
measures the longitudinal difference of the morphological features from the
baseline to the last follow-up. It is apparent that each feature
representation h(b, n) describes both the spatial and temporal morphological
patterns. As we will explain in Sect. 2.2, feature selection is necessary to
remove data redundancy from such a high dimension (d = 1,260).
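The decomposition and the feature vector h(b, n) can be sketched in a few
lines of Python (a hedged illustration; `seq` is assumed to be a (T_n, 630)
array of per-scan features):

```python
import numpy as np

def longitudinal_feature(seq):
    """Build h(b, n) for a partial sequence of per-scan feature vectors.

    seq: (b, d) array, scans from the baseline to the (b-1)-th follow-up.
    Returns a 2d-dimensional vector: [mean of features over the sequence;
    baseline-minus-last difference].
    """
    mean_part = seq.mean(axis=0)   # average from baseline to last scan
    diff_part = seq[0] - seq[-1]   # longitudinal change, f_1^n - f_b^n
    return np.concatenate([mean_part, diff_part])

def partial_sequences(seq):
    """All partial sequences P^n(b), b = 2..T_n, as vectors h(b, n)."""
    return [longitudinal_feature(seq[:b]) for b in range(2, len(seq) + 1)]
```

For a subject with T_n scans this yields exactly T_n − 1 feature vectors, one
per partial sequence.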
A Naive Way to Achieve Early Detection with a Classic SVM. In our
application, the goal of classification is to determine (1) whether we can
detect the conversion to AD for a new testing subject based on its MR image
sequence Z = {Z_t | t = 1, ..., T_z} up to the current time point T_z; and
(2) whether we can detect the AD onset as early as possible, i.e., push T_z as
close to the baseline as possible. Thus, we regard the early detection of AD
as a binary classification problem between MCI non-converters (MCI-NC for
short) and MCI converters (MCI-C for short). Without loss of generality, we
assume the first M subjects belong to the MCI-NC group and the remaining
subjects belong to the MCI-C group. Therefore, we divide all partial image
sequences for training into two groups: the MCI-NC group
X = {x_{b,p} | x_{b,p} = h(b, p), p = 1, ..., M, b = 1, ..., T_p} and the
MCI-C group Y = {y_{b,q} | y_{b,q} = h(b, q), q = M + 1, ..., N,
b = 1, ..., T_q}. To achieve the above goal, the naive way is to train an SVM
by:
$$\arg\min_{W} \|W\|_F^2 + \lambda\epsilon^2,\ \text{s.t.}\
\begin{cases}
\delta_x - (w_x^T - w_y^T)\, x_{b,p} < \epsilon, & \epsilon > 0,\ \forall x_{b,p} \in X \\
\delta_y - (w_y^T - w_x^T)\, y_{b,q} < \epsilon, & \epsilon > 0,\ \forall y_{b,q} \in Y
\end{cases} \quad (1)$$

where W ¼ ½wx wy  is a matrix consisting of classifier wx 2 Nd1 for MCI-NC group


and wy 2 Nd1 for MCI-C group. The intuition behind the constraint is that the
 
probability score wTx xb;p for each MCI-NC sample xb;p staying the MCI-NC group
 
should be greater than the score wTy xb;p for jumping to MCI-C group by an
inter-class margin dx . Similar principle also applies to the sample yb;q from MCI-C
group. e is the slack variable which compensates for the mis-classification errors.
It is clear that there is strong structural correlations along partial image sequences in
each subject. However, the naïve SVM solution shown in Eq. (1) treats each partial
sequence separately. As shown in the left of Fig. 1, the probability scores of AD
conversion and staying in MCI stage are not stable along time, which is not realistic
since the structural change and AD progression are normally regarded as
non-reversible.

Fig. 1. Advantages of our TS-SVM (right) over the naive SVM solution (left). In our TS-SVM
method, we enforce the temporal monotony and consistency constraints on the extracted partial
image sequences (shown in the middle).

Temporally Structured SVM on Longitudinal MR Image Sequences. To improve
the accuracy of early AD detection, we propose the temporally structured SVM:
$$\arg\min_{W} \|W\|_F^2 + \lambda\epsilon^2,\ \text{s.t.}\
C_1: \begin{cases}
\delta_x(b) - (w_x^T - w_y^T)\, x_{b,p} < \epsilon, & \epsilon > 0,\ \forall x_{b,p} \in X \\
\delta_y(b) - (w_y^T - w_x^T)\, y_{b,q} < \epsilon, & \epsilon > 0,\ \forall y_{b,q} \in Y
\end{cases}\ \text{and} \quad (2)$$
$$C_2: \tau_y(l) - w_y^T (y_{b,q} - y_{a,q}) < \epsilon,\quad
l = b - a,\ \epsilon > 0,\ 2 \le a < b,\ \forall y_{a,q}, y_{b,q} \in Y.$$

Compared to the objective function of the naive SVM in Eq. (1), two new
constraints (C1 and C2) are used. (1) We first turn the inter-class margins
δ_x and δ_y in Eq. (1) from scalar values into monotonically increasing
functions of b (the length of the partial image sequence). The constraint C1
is mainly used to achieve early detection, i.e., we require that the
probability of making an accurate classification increase as more time points
become available. (2) The second constraint C2 takes advantage of the
non-reversible nature of AD progression. Suppose y_{a,q} and y_{b,q} are the
morphological features from the same MCI-C subject, but y_{b,q} is extracted
at a later time point than y_{a,q} (i.e., a < b). Then we require that the
probability of the underlying MCI-C subject being converted to AD be higher at
the later time point b than at the earlier time point a, i.e.,
w_y^T y_{b,q} > w_y^T y_{a,q}, since AD conversion is irreversible.
Furthermore, the intra-class margin τ_y is a monotonically increasing function
of l (l = b − a is the length difference between the two partial image
sequences). Intuitively, the bigger the gap between two time points, the
larger the increase of AD conversion risk. It is worth noting that the
constraint C2 is not applicable to MCI-NC subjects, since an MCI-NC subject
might still convert to AD as more follow-ups are scanned in the future; thus
it is unreasonable to assume that an MCI-NC subject keeps staying at the MCI
stage. As shown on the right of Fig. 1, for a particular MCI-C subject, not
only the probability score of AD conversion but also the difference between
the probability scores of converting to AD and staying in MCI monotonically
increase as the partial image sequence becomes longer. Thus, our TS-SVM can
detect AD onset at an early stage with high confidence. It is worth noting
that we set δ_x(b) = b, δ_y(b) = b, and τ_y(l) = l in all experiments.

2.2 Joint Feature Selection and Classification on TS-SVM


Since the morphological features are high-dimensional, feature selection is a
standard procedure to remove data redundancy. Usually, feature selection is
applied independently, prior to training the classifiers. In order to make the
selected features also optimal for the TS-SVM, we propose to jointly select
the best features and train the classifiers by introducing an L_{2,1} norm on
the classification matrix W:

$$\arg\min_{W} \|W\|_{2,1} + \lambda\epsilon^2,\quad \text{s.t. } C_1 \text{ and } C_2. \quad (3)$$

The intuitions behind using ||W||_{2,1} are that (1) the sparsity constraint
on each column of W means only a small number of features are selected, which
helps suppress noisy and redundant patterns, and (2) the group-wise constraint
on each row of W means that the MCI-NC and MCI-C classifiers select/discard
the same morphological features. In this way, W can be simultaneously regarded
as a coefficient matrix for feature selection and as a classifier.

2.3 Optimization
Although Eq. (3) is a convex problem, it is hard to optimize directly due to
the large number of linear inequality constraints. To solve it efficiently, we
reformulate it as an unconstrained problem following the framework of the
Alternating Direction Method of Multipliers (ADMM) [8, 9, 16]. Specifically,
we rewrite Eq. (3) as an unconstrained convex optimization problem by
introducing a dummy variable Z to decouple the group sparsity constraint from
the other inequality constraints:
$$\begin{aligned}
\arg\min_{W,Z}\ & \|Z\|_{2,1}
+ \lambda \Big[ \sum_{x_{b,p} \in X} \big\| \delta_x(b) - (w_x^T - w_y^T)\, x_{b,p} \big\|_h
+ \sum_{y_{b,q} \in Y} \big\| \delta_y(b) - (w_y^T - w_x^T)\, y_{b,q} \big\|_h \\
& + \sum_{y_{a,q},\, y_{b,q} \in Y} \big\| \tau_y(b - a) - w_y^T (y_{b,q} - y_{a,q}) \big\|_h \Big]
+ \mu \|W - Z\|_F^2 + \mathrm{Tr}\big(\Lambda^T (W - Z)\big)
\end{aligned} \quad (4)$$

where ||·||_h is a hinge loss function that measures the mis-classification
error with a quadratic loss, ||x||_h = ||max(0, x)||_2^2; μ is the penalty
parameter for the constraint W = Z; Λ ∈ R^{d×2} is the Lagrange multiplier
matrix for the equality constraint W = Z; Tr(·) denotes the trace operator;
and λ is the penalty parameter for the constraints C1 and C2. Equation (4)
can be optimized by alternately solving for W and Z until the overall energy
function converges.
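A minimal ADMM skeleton for Eq. (4) might look as follows. The Z-step is the
standard closed-form proximal operator of the ℓ2,1 norm; the W-step is
sketched here as a plain gradient step on the smooth terms, with
`grad_loss_W` left as a user-supplied gradient of the hinge-loss part. The
step size, iteration count, and stopping rule are assumptions, not values from
the paper:

```python
import numpy as np

def z_update(V, mu):
    """Closed-form Z-step: row-wise soft thresholding, i.e. the proximal
    operator of ||Z||_{2,1} evaluated at V = W + Lambda / mu."""
    norms = np.sqrt((V ** 2).sum(axis=1, keepdims=True))
    scale = np.maximum(0.0, 1.0 - 1.0 / (mu * np.maximum(norms, 1e-12)))
    return scale * V

def admm(grad_loss_W, W0, mu=1.0, n_iter=100, lr=1e-3):
    """Alternate W, Z, and dual Lambda updates for a fixed iteration
    budget. grad_loss_W(W) returns the gradient of the smooth hinge-loss
    terms of Eq. (4) at W."""
    W = W0.copy()
    Z = W0.copy()
    Lam = np.zeros_like(W0)
    for _ in range(n_iter):
        # W-step: gradient step on loss + mu/2 ||W - Z + Lam/mu||_F^2,
        # whose augmented-Lagrangian gradient is mu*(W - Z) + Lam.
        W -= lr * (grad_loss_W(W) + mu * (W - Z) + Lam)
        # Z-step: closed-form row-wise shrinkage.
        Z = z_update(W + Lam / mu, mu)
        # Dual ascent on the equality constraint W = Z.
        Lam += mu * (W - Z)
    return W, Z
```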

3 Experiments

In the following experiments, we select 70 MCI-C subjects from the ADNI
dataset that have AD onset in the middle of the longitudinal image sequence,
and 81 MCI-NC subjects that stay in the MCI stage until the last scan in the
latest ADNI dataset. Of all subjects, 95.3 % have 4 follow-ups every 6 months,
and the remaining 4.7 % have more than 4 follow-ups. Specifically, of the 70
MCI-C subjects, 11.1 % are diagnosed with AD at 6 months, 31.8 % at 12 months,
and 25.3 % at 18 months after the baseline scan, while the remaining 31.8 %
are diagnosed with AD more than 24 months after the baseline scan. We compare
our proposed TS-SVM based early detection method with the standard SVM based
method. Furthermore, we evaluate the importance of feature selection in both
the TS-SVM and the standard SVM method. Thus, we compare the classification
performance of four methods in total, denoted SVM, SVM+FS, TS-SVM, and
TS-SVM+FS, respectively. In all experiments, we split the data into 10
non-overlapping folds and report the average classification accuracy after
10-fold cross-validation. The parameters are tuned using a grid search
strategy only on the training dataset.
Performance of AD Early Detection. In each cross-validation case, we train
our TS-SVM on the training data and sequentially apply the trained classifier
to the testing subject's image sequence from the first follow-up onward. Since
the month of conversion to AD after the baseline scan varies across MCI-C
subjects, we show the detection accuracy for MCI-C subjects converting to AD
12 months, 18 months, and 24 months after the baseline scan in Tables 1, 2 and
3, respectively. It is clear that our TS-SVM beats the standard SVM by more
than 10 % in classification accuracy, which shows the advantage of the
temporal consistency and monotony constraints in our proposed method. Feature
selection is also very important for improving the detection accuracy: SVM+FS
and TS-SVM+FS obtain an average 3.8 % and 2.9 % increase over SVM and TS-SVM,
respectively. In brief, our full method (TS-SVM+FS) can detect AD 6 months
prior to AD onset with 86.8 % accuracy, 12 months prior with 82.5 % accuracy,
and 18 months prior with 76.5 % accuracy. Note that the early detection
performance in Table 3 is worse than in Tables 1 and 2 at the corresponding
pre-diagnosis windows. The reason is that the subjects in Table 3 mostly have
5 time points and have AD onset exactly at the last time point; the unbalanced
partial image sequences before and after AD onset thus challenge the learning
of robust classifiers.
Critical Brain Regions Related to AD Progression. Since our method jointly
selects morphological features while training the TS-SVM, it is interesting to
examine the critical brain regions whose morphological features contribute
significantly to detecting AD progression via longitudinal tracking. Figure 2
shows the top 20 regions selected by our TS-SVM+FS method. The selected brain
regions are located in AD-related sub-cortical regions (such as the putamen,
thalamus, and hippocampus) and cortical areas (such as the orbitofrontal
cortex, medial/lateral temporal lobe, and medial/lateral parietal lobe), which
is in consensus with the neuroimaging observations in the literature [10, 11].
We also compared the top selected ROIs for short-term and long-term detection
and found that the cortical regions contribute more to short-term detection,
while the sub-cortical regions, such as the putamen, thalamus, and
hippocampus, contribute more to the detection of long-term converters. This
may indicate that changes in the sub-cortical regions are more significant
than those in the cortical regions at the earlier stage of AD progression. We
do not visualize this result due to the page limitation.

Table 1. Accuracy of AD early detection at 6 months and 0 months before AD onset for the
MCI-C subjects converting to AD 12 months after the baseline scan.

Method       18 months    12 months    6 months          0 months
             ACC   AUC    ACC   AUC    ACC     AUC       ACC     AUC
SVM          -     -      -     -      0.7110  0.7612    0.7345  0.7937
SVM+FS       -     -      -     -      0.7557  0.7862    0.7735  0.8237
TS-SVM       -     -      -     -      0.8816  0.9327    0.8975  0.9431
TS-SVM+FS    -     -      -     -      0.9025  0.9649    0.9075  0.9776

Table 2. Accuracy of AD early detection at 12 months, 6 months, and 0 months before AD onset
for the MCI-C subjects converting to AD 18 months after the baseline scan.

Method       18 months    12 months         6 months          0 months
             ACC   AUC    ACC     AUC       ACC     AUC       ACC     AUC
SVM          -     -      0.7325  0.7822    0.7455  0.7917    0.7535  0.8223
SVM+FS       -     -      0.7537  0.7912    0.7685  0.8123    0.7725  0.8314
TS-SVM       -     -      0.8425  0.8851    0.8593  0.9042    0.8635  0.9128
TS-SVM+FS    -     -      0.8475  0.8932    0.8720  0.9277    0.8812  0.9216

Table 3. Accuracy of AD early detection at 18 months, 12 months, 6 months, and 0 months
before AD onset for the MCI-C subjects converting to AD 24 months after the baseline scan.

Method       18 months         12 months         6 months          0 months
             ACC     AUC       ACC     AUC       ACC     AUC       ACC     AUC
SVM          0.6016  0.6542    0.6025  0.6677    0.6325  0.6712    0.6515  0.6931
SVM+FS       0.6557  0.6862    0.6735  0.6127    0.6675  0.6983    0.6464  0.6926
TS-SVM       0.7345  0.7734    0.7675  0.8116    0.7805  0.8334    0.7875  0.8503
TS-SVM+FS    0.7653  0.7983    0.8125  0.8434    0.8345  0.8672    0.8431  0.8894

Fig. 2. The top 20 critical brain regions that contribute to AD early detection.

4 Conclusion

In this paper, we present a novel early AD diagnosis method using a temporally
structured SVM. In order to avoid inconsistent and unrealistic classification
results, we enforce monotony on the output of the SVM, since AD progression is
generally non-reversible. In order to achieve an early alarm of AD onset, we
adjust the classification margin such that the confidence of detecting AD
progression grows as more follow-up scans are examined. Furthermore, we
jointly perform feature selection and training of the TS-SVM, so that the
selected features work well with the trained classifiers.

References
1. Thompson, P.M., Hayashi, K.M., Dutton, R.A., Chiang, M.C., Leow, A.D., Sowell, E.R., De
Zubicaray, G., Becker, J.T., Lopez, O.L., Aizenstein, H.J., Toga, A.W.: Tracking
Alzheimer’s disease. Ann. NY Acad. Sci. 1097, 198–214 (2007)
2. Chételat, G., Baron, J.-C.: Early diagnosis of Alzheimer’s disease: contribution of
structural neuroimaging. NeuroImage 18, 525–541 (2003)
3. Reisberg, B., Ferris, S.H., Kluger, A., Franssen, E., Wegiel, J., de Leon, M.J.: Mild cognitive
impairment (MCI): a historical perspective. Int. Psychogeriatr. 20, 18–31 (2008)
4. Hua, X., Lee, S., Hibar, D.P., Yanovsky, I., Leow, A.D., Toga Jr., A.W., Jack, C.R.,
Bernstein, M.A., Reiman, E.M., Harvey, D.J., Kornak, J., Schuff, N., Alexander, G.E.,
Weiner, M.W., Thompson, P.M.: Mapping Alzheimer’s disease progression in 1309 MRI
scans: power estimates for different inter-scan intervals. NeuroImage 51, 63–75 (2010)
5. Li, Y., Wang, Y., Wu, G., Shi, F., Zhou, L., Lin, W., Shen, D.: Discriminant analysis of
longitudinal cortical thickness changes in Alzheimer’s disease using dynamic and network
features. Neurobiol. Aging 33, 427.e415–430 (2012)

6. Hua, X., Gutman, B., Boyle, C.P., Rajagopalan, P., Leow, A.D., Yanovsky, I., Kumar, A.R.,
Toga Jr., A.W., Jack, C.R., Schuff, N., Alexander, G.E., Chen, K., Reiman, E.M., Weiner,
M.W., Thompson, P.M.: Accurate measurement of brain changes in longitudinal MRI scans
using tensor-based morphometry. NeuroImage 57, 5–14 (2011)
7. Filley, C.: Alzheimer’s disease: it’s irreversible but not untreatable. Geriatrics 50, 18–23
(1995)
8. Boyd, S., et al.: Distributed optimization and statistical learning via the alternating
direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011)
9. Nie, F., Huang, Y., Wang, X., Huang, H.: New primal SVM solver with linear computational
cost for big data classifications. In: ICML (2014)
10. Hoesen, G.W.V., Parvizi, J., Chu, C.-C.: Orbitofrontal cortex pathology in Alzheimer’s
disease. Cereb. Cortex 10, 243–251 (2000)
11. Risacher, S., Saykin, A.: Neuroimaging biomarkers of neurodegenerative diseases and
dementia. Semin. Neurol. 33, 386–416 (2013)
12. Antila, K., Lötjönen, J., Thurfjell, L., et al.: The PredictAD project: development of novel
biomarkers and analysis software for early diagnosis of the Alzheimer’s disease. Interface
Focus 3(2012)
13. Lorenzi, M., Ziegler, G., Alexander, D.C., Ourselin, S.: Efficient Gaussian process-based
modelling and prediction of image time series. In: Ourselin, S., Alexander, D.C.,
Westin, C.-F., Cardoso, M. (eds.) IPMI 2015. LNCS, vol. 9123, pp. 626–637. Springer,
Heidelberg (2015)
14. Young, A.L., et al.: A data-driven model of bio-marker changes in sporadic Alzheimer’s
disease. Brain 25, 64–77 (2014)
15. Fonteijn, H.M., et al.: An event-based model for disease progression and its application in
familial Alzheimer’s disease and Huntington’s disease. NeuroImage 60, 1880–1889 (2012)
16. Zhu, Y., Lucey, S.: Convolutional sparse coding for trajectory reconstruction. IEEE Trans.
Pattern Anal. Mach. Intell. 37(3), 529–540 (2015)
Prediction of Memory Impairment with MRI
Data: A Longitudinal Study of Alzheimer’s
Disease

Xiaoqian Wang1 , Dinggang Shen2 , and Heng Huang1(B)


1
Computer Science and Engineering, University of Texas at Arlington,
Arlington, USA
heng@uta.edu
2
Department of Radiology and BRIC, University of North Carolina at Chapel Hill,
Chapel Hill, USA

Abstract. Alzheimer's Disease (AD), a severe type of neurodegenerative
disorder with progressive impairment of learning and memory, has threatened
the health of millions of people. Recognizing AD at an early stage is crucial.
Multiple models have been presented to predict cognitive impairments by means
of neuroimaging data. However, traditional models did not employ the valuable
longitudinal information along the progression of the disease. In this paper,
we propose a novel longitudinal feature learning model to simultaneously
uncover the interrelations among different cognitive measures at different
time points and utilize such interrelated structures to enhance the learning
of associations between imaging features and prediction tasks. Moreover, we
adopt the Schatten p-norm to identify the interrelation structures existing in
the low-rank subspace. Empirical results on the ADNI cohort demonstrate the
promising performance of our model.

1 Introduction
Alzheimer's Disease (AD), the most common form of dementia, is a
neurodegenerative disorder which severely impacts patients' thinking, memory
and behavior. Current consensus has emphasized the demand for early
recognition of this disease, with which the goal of stopping or slowing down
the disease progression can be achieved [8]. The effectiveness of neuroimaging
in predicting the progression of AD or cognitive performance has been studied
and reported in plentiful research [4,12]. However, much previous research
merely paid attention to prediction using the baseline data, neglecting the
correlation among longitudinal cognitive performance measures. AD is a
progressive neurodegenerative disorder, thus it is significant to discover
neuroimaging measures that impact the progression of this disease along the
time axis.
X. Wang and H. Huang were supported in part by NSF IIS-1117965, IIS-1302675,
IIS-1344152, DBI-1356628, and NIH AG049371. D. Shen was supported in part by
NIH AG041721.

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 273–281, 2016.
DOI: 10.1007/978-3-319-46720-7 32

In the association study of predicting cognitive scores from imaging features,
the input data usually consist of two matrices: the imaging feature matrix X
and the cognitive score matrix Y. If we denote the number of samples as n,
the number of features as d, and the number of different measures of a certain
cognitive performance test as m, then X and Y can be formed in the following
format: X = [X_1, · · · , X_T] ∈ R^{d×nT} corresponds to the imaging features
at T consecutive time points, where X_t ∈ R^{d×n} is the imaging marker
matrix at the t-th time point; Y = [Y_1, · · · , Y_T] ∈ R^{n×mT} corresponds
to the cognitive scores at T consecutive time points, with Y_t ∈ R^{n×m}
denoting the measurements at the t-th time point.
Let us consider the prediction of one cognitive measure at one time point to
be one task; then the association study between cognitive scores and imaging
features can be regarded as a multi-task problem. Apparently, in our setting
of the longitudinal association study, the number of tasks is mT. The goal of
the association study is to find a weight matrix W = [W_1, · · · , W_T] ∈
R^{d×mT} which captures the relevant features for predicting the cognitive
scores.
A forthright method is to perform linear regression at each time point and
determine W_t separately. However, linear regression treats all tasks
independently and ignores the useful information preserved in the change along
the time continuum. Since AD is a progressive neurodegenerative disorder and
cognitive performance is an intuitive indication of the disease status, we can
reasonably regard the various tasks as possibly related. In one cognitive
experiment, the results of a certain measure at different time points may be
correlated, and different cognitive measures at a certain time point may also
have mutual influence. To excavate the correlations among the cognitive
scores, several multi-task models have been put forward.
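The per-time-point baseline described above can be sketched as a least-squares
solve (illustrative only; the small ridge term is an assumption added for
numerical stability, not part of the plain linear regression baseline):

```python
import numpy as np

def independent_regression(X_t, Y_t, reg=1e-3):
    """Baseline: solve W_t for one time point, ignoring all others.

    X_t: (d, n) imaging features at time t; Y_t: (n, m) cognitive scores.
    Solves (X_t X_t^T + reg I) W_t = X_t Y_t and returns W_t of shape
    (d, m)."""
    d = X_t.shape[0]
    A = X_t @ X_t.T + reg * np.eye(d)
    return np.linalg.solve(A, X_t @ Y_t)
```

Running this separately for t = 1, ..., T produces the W_t blocks that the
multi-task models below are designed to couple.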
One possible method is the longitudinal ℓ2,1-norm regression model [6,11]. In
this model, the introduced ℓ2,1-norm regularization enforces structured sparsity,
which helps to detect features related to all the cognitive measures along the
whole time axis. Moreover, under the assumption that imaging features may be
correlated with each other and thus overlap in their effects on brain structure
or disease progression, we can use the trace norm (also known as the nuclear
norm) regularization to impose a low-rank restriction. There are also models
combining these two regularization terms to enforce structured sparsity as well
as a low-rank constraint [13,14].
Indeed, these models impose trace norm regularization on the whole parame-
ter matrix, such that the common subspace globally shared by different predic-
tion tasks can be extracted. However, the longitudinal prediction tasks can be
interrelated as different groups. The straightforward way to discover such inter-
related groups is to conduct a clustering analysis first and extract the group
structures. However, such a heuristic step is independent of the entire longitu-
dinal learning model, so the detected group structures are not optimal for the
longitudinal learning process.
Prediction of Memory Impairment with MRI Data 275

To address this challenging problem, we propose a novel longitudinal struc-
tured low-rank learning model to uncover the interrelations among different
cognitive measures and utilize the learned interrelated structures to enhance
cognitive function prediction tasks.

2 Longitudinal Structured Low-Rank Regression Model

In our multi-task problem, suppose these mT tasks come from c groups, where
tasks in each group are correlated. We can introduce and optimize a group index
matrix set Q = {Q1 , Q2 , . . . , Qc } to discover this group structure. Each Qi is a
diagonal matrix with Qi ∈ {0, 1}mT ×mT indicating the assignment of tasks to the
i-th group. For the (k, k)-th element of Qi , (Qi )kk = 1 means that the k-th
task belongs to the i-th group, while (Qi )kk = 0 means it does not. To avoid
overlap of groups, we constrain $\sum_{i=1}^{c} Q_i = I$.
Since the tasks in each group share correlated dependence, we can reasonably
assume the latent subspace of each group maintains a low-rank structure. We
impose the Schatten p-norm as a low-rank constraint to uncover the common sub-
space shared by different tasks. According to the discussion below, the Schatten
p-norm makes a better approximation of the low-rank constraint than the pop-
ular trace norm regularization [7].
For a matrix $A \in \mathbb{R}^{d\times n}$, let $\sigma_i$ be its $i$-th singular value. The rank of $A$ can then be written as $\mathrm{rank}(A) = \sum_{i=1}^{\min\{d,n\}} \sigma_i^0$, where $0^0 = 0$. The $p$-th power of the Schatten $p$-norm ($0 < p < \infty$) of $A$ is defined as

$$\|A\|_{S_p}^p = \mathrm{Tr}\big((A^T A)^{\frac{p}{2}}\big) = \sum_{i=1}^{\min\{d,n\}} \sigma_i^p.$$

In particular, when $p = 1$, the Schatten $p$-norm of $A$ is exactly its trace norm: $\|A\|_{S_1} = \mathrm{Tr}\big((A^T A)^{\frac{1}{2}}\big) = \sum_{i=1}^{\min\{d,n\}} \sigma_i = \|A\|_*$.
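These identities are easy to sanity-check numerically via the singular values; a minimal sketch (the test matrix is arbitrary):

```python
import numpy as np

def schatten_p_power(A, p):
    # p-th power of the Schatten p-norm: sum_i sigma_i^p
    sigma = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(sigma ** p))

A = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])        # a rank-1 matrix
sigma = np.linalg.svd(A, compute_uv=False)

# p = 1 recovers the trace (nuclear) norm
assert np.isclose(schatten_p_power(A, 1.0), np.sum(sigma))
# small p pushes sum_i sigma_i^p towards rank(A) = 1
print(schatten_p_power(A, 0.1))   # well below the nuclear norm
```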
So when $0 < p < 1$, the Schatten $p$-norm is a better low-rank regularization
than the trace norm. Accordingly, our longitudinal structured low-rank regression
model is:

$$\min_{W,\; Q_i|_{i=1}^{c} \in \{0,1\}^{mT\times mT},\; \sum_{i=1}^{c} Q_i = I} \;\sum_{t=1}^{T} \|W_t^T X_t - Y_t\|_F^2 + \gamma \sum_{i=1}^{c} \big(\|W Q_i\|_{S_p}^p\big)^l. \tag{1}$$

In Problem (1), the grouping structure tends to be unstable when p is small,
so we add a power parameter l to the regularization term to make our model
robust. It is difficult to solve this new non-convex and non-smooth objective func-
tion. In the next section, we propose a novel alternating optimization method
for Problem (1).

3 Optimization Algorithm for Solving Problem (1)

According to the property of $Q_i$ that $Q_i^2 = Q_i$, Problem (1) can be rewritten as:

$$\min_{W,\; Q_i|_{i=1}^{c} \in \{0,1\}^{mT\times mT},\; \sum_{i=1}^{c} Q_i = I} \;\sum_{t=1}^{T} \|W_t^T X_t - Y_t\|_F^2 + \gamma \sum_{i=1}^{c} \mathrm{Tr}(W^T D_i W Q_i), \tag{2}$$

where $D_i$ is defined as:

$$D_i = \frac{lp}{2} \big(\|W Q_i\|_{S_p}^p\big)^{l-1} (W Q_i W^T)^{\frac{p-2}{2}}. \tag{3}$$

276 X. Wang et al.
We can solve Problem (2) via an alternating optimization method.
The first step is fixing W and solving Q, and then Problem (2) becomes:

$$\min_{Q_i|_{i=1}^{c} \in \{0,1\}^{mT\times mT},\; \sum_{i=1}^{c} Q_i = I} \;\sum_{i=1}^{c} \mathrm{Tr}\big((W^T D_i W) Q_i\big). \tag{4}$$

Letting $A_i = W^T D_i W$, the solution for $Q_i$ is:

$$(Q_i)_{kk} = \begin{cases} 1 & i = \arg\min_j \,(A_j)_{kk} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$
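The assignment rule of Eq. (5) is direct to implement: each task k goes to the group whose A_i has the smallest k-th diagonal entry. A sketch (matrix sizes are illustrative only):

```python
import numpy as np

def update_Q(W, D_list):
    # Eq. (5): (Q_i)_kk = 1 iff i = argmin_j (A_j)_kk, where A_j = W^T D_j W
    diags = np.array([np.diag(W.T @ D @ W) for D in D_list])  # c x mT
    winner = np.argmin(diags, axis=0)                         # group per task
    return [np.diag((winner == i).astype(float)) for i in range(len(D_list))]

rng = np.random.default_rng(1)
d, mT, c = 5, 6, 2
W = rng.standard_normal((d, mT))
D_list = []
for _ in range(c):
    B = rng.standard_normal((d, d))
    D_list.append(B @ B.T)                     # symmetric PSD D_i
Q_list = update_Q(W, D_list)
print(np.allclose(sum(Q_list), np.eye(mT)))    # constraint sum_i Q_i = I holds
```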

The second step is fixing Q and solving W, and then Problem (2) becomes:

$$\min_{W} \;\sum_{t=1}^{T} \|W_t^T X_t - Y_t\|_F^2 + \gamma \sum_{i=1}^{c} \mathrm{Tr}(W^T D_i W Q_i). \tag{6}$$

Denote $Q_i$ in the format $Q_i = \mathrm{diag}(Q_{i1}, Q_{i2}, \ldots, Q_{iT})$. Since $\mathrm{Tr}(W^T D_i W Q_i) = \sum_{t=1}^{T} \mathrm{Tr}(W_t^T D_i W_t Q_{it})$, we can decouple Problem (6) for each $t$:

$$\min_{W_t} \;\|W_t^T X_t - Y_t\|_F^2 + \gamma \sum_{i=1}^{c} \mathrm{Tr}(W_t^T D_i W_t Q_{it}). \tag{7}$$

Problem (7) can be further decoupled for each column of $W_t$ as follows:

$$\min_{(w_t)_k} \;\|(w_t^T)_k X_t - (y_t)_k\|_2^2 + \gamma \, \mathrm{Tr}\Big((w_t^T)_k \big(\textstyle\sum_{i=1}^{c} (Q_{it})_{kk} D_i\big) (w_t)_k\Big). \tag{8}$$

Taking the derivative w.r.t. $(w_t)_k$ in Problem (8) and setting it to zero, we get:

$$(w_t)_k = \Big(X_t X_t^T + \gamma \sum_{i=1}^{c} (Q_{it})_{kk} D_i\Big)^{-1} X_t \big((y_t)_k\big)^T. \tag{9}$$
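Eq. (9) is a ridge-style linear solve per column; a sketch (dimensions invented) that also checks the optimality condition of Problem (8):

```python
import numpy as np

def update_w_column(Xt, yt_k, D_list, Qit_diag_k, gamma):
    # Eq. (9): (w_t)_k = (X_t X_t^T + gamma * sum_i (Q_it)_kk D_i)^{-1} X_t (y_t)_k
    S = sum(q * D for q, D in zip(Qit_diag_k, D_list))
    return np.linalg.solve(Xt @ Xt.T + gamma * S, Xt @ yt_k)

rng = np.random.default_rng(2)
d, n, gamma = 4, 30, 0.5
Xt = rng.standard_normal((d, n))
yt_k = rng.standard_normal(n)
D_list = []
for _ in range(2):
    B = rng.standard_normal((d, d))
    D_list.append(B @ B.T)                     # symmetric PSD D_i
Qit_diag_k = [1.0, 0.0]                        # task k assigned to group 1
w = update_w_column(Xt, yt_k, D_list, Qit_diag_k, gamma)

# optimality: gradient X_t (X_t^T w - y) + gamma * (sum_i q_i D_i) w vanishes
S = Qit_diag_k[0] * D_list[0] + Qit_diag_k[1] * D_list[1]
grad = Xt @ (Xt.T @ w - yt_k) + gamma * (S @ w)
print(np.allclose(grad, 0.0))   # True
```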

We can iteratively update Q, W, and D with the alternating steps mentioned
above; the algorithm for Problem (2) is summarized in Algorithm 1.

Convergence Analysis: Our algorithm uses an alternating optimization method,
whose convergence has already been proved in [1]. In Algorithm 1, the variables in
each iteration have a closed-form solution and can be computed fairly fast. In the
following experiments on the ADNI data, the running time of each iteration is
about 0.005 s and our method usually converges within one second.

Algorithm 1. Algorithm to solve Problem (2).

Input:
Longitudinal imaging feature matrix X = [X1 , X2 , ..., XT ] ∈ R^{d×nT}, longitudinal
cognitive score matrix Y = [Y1 , Y2 , ..., YT ] ∈ R^{n×mT}, parameter γ, and number of
groups c.
Output:
Weight matrix W = [W1 , W2 , ..., WT ], where Wt ∈ R^{d×m}, and c group
matrices Qi ∈ R^{mT ×mT} which partition the tasks into exactly c groups.
Initialize W by the optimal solution to the ridge regression problem.
while not converged do
1. Update Di , i = 1, . . . , c, according to the definition in Eq. (3).
2. Update Qi , i = 1, . . . , c, according to the solution in Eq. (5).
3. Update W, where the solution for the k-th column of Wt is given in Eq. (9).
end while

4 Experimental Results

In this section, we evaluate the prediction performance of our proposed method


by applying it to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data-
base.

4.1 Data Description

Data used in the preparation of this article were obtained from the ADNI data-
base (adni.loni.usc.edu). Each MRI T1-weighted image was first anterior com-
missure (AC)-posterior commissure (PC) corrected using MIPAV, intensity
inhomogeneity corrected using the N3 algorithm [10], skull stripped [16] with
manual editing, and cerebellum-removed [15]. We then used FAST [17] in the
FSL package to segment the image into gray matter (GM), white matter (WM),
and cerebrospinal fluid (CSF), and used HAMMER [9] to register the images to
a common space. GM volumes obtained from 93 ROIs defined in [5], normal-
ized by the total intracranial volume, were extracted as features. Longitudinal
scores were downloaded from three independent cognitive assessments including
Fluency Test, Rey’s Auditory Verbal Learning Test (RAVLT) and Trail making
test (TRAILS). The details of these cognitive assessments can be found in the
ADNI procedure manuals. The time points examined in this study for both imag-
ing markers and cognitive assessments included baseline (BL), Month 6 (M6),
Month 12 (M12) and Month 24 (M24). All the participants with no missing
BL/M6/M12/M24 MRI measurements and cognitive measures were included in
this study. A total of 385 subjects are involved in our study, among which
we have 56 AD samples, 181 MCI samples, and 148 healthy control (HC) sam-
ples. Seven cognitive scores were included: (1) RAVLT TOTAL, RAVLT TOT6,
and RAVLT RECOG scores from the RAVLT cognitive assessment; (2) FLU ANIM
and FLU VEG scores from the Fluency cognitive assessment; (3) Trails A and
Trails B scores from the Trail making test.
4.2 Performance Comparison on the ADNI Cohort

We first evaluate the ability of our method to predict a certain set of cognitive
scores via neuroimaging markers. We tracked the process along the time axis and
aimed to find the set of markers which could influence the cognitive scores
over the time points. As the evaluation metric, we reported the Root Mean
Square Error (RMSE) as well as the Correlation Coefficient (CorCoe) between
the predicted score and the ground truth.
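Both metrics are standard; a minimal sketch of how they could be computed:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root Mean Square Error between prediction and ground truth
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def corcoe(y_true, y_pred):
    # Pearson correlation coefficient between prediction and ground truth
    return float(np.corrcoef(y_true, y_pred)[0, 1])

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))    # 0.0
print(corcoe([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0 for an exact linear relation
```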
We compared our method with all the counterparts discussed in the intro-
duction, which are: Multivariate Linear Regression (MLR), Multivariate Ridge
Regression (MRR), Longitudinal Trace-norm Regression (LTR), Longitudinal
ℓ2,1-norm Regression (L21R), and their combination (L21R + LTR). To illustrate
the advantage of simultaneously conducting task correlation and longitudinal
feature learning, we also compared with the method of using K-means to cluster
the tasks first and then implementing LTR in each group (K-means + LTR) as
the baseline.
We utilized the 10-fold cross validation technique and ran 50 times for each
method. The average RMSE and CorCoe on these 500 trials are reported. For
MLR and MRR, since they were not designed for the longitudinal tasks, we
computed the weight matrix for each time point separately and then merged
them to the final weight matrix according to the definition W = [W1 , · · · , WT ].
Here in this experiment, the number of time points T is 4. Our initial analy-
ses indicated that our model performs fairly stably when choosing parameter l
from {2, 2.5, . . . , 5} and parameter p from {0.1, 0.2, . . . , 0.8} (data not
shown). In our experiments, we fixed p = 0.1 and l = 3.
The experimental results are summarized in Table 1. From all the results,
we can see that our method outperforms all other methods consistently on
all data sets. The reasons are as follows: MLR and MRR assume the cognitive
measures at different time points to be independent and thus do not consider the
correlations along time. Their neglect of the longitudinal correlation within
the data is detrimental to their prediction ability. As for L21R, LTR, and their
combination LTR + L21R, even though they take into account the longitudinal
information, they cannot handle the possible group structure within the cognitive

Table 1. Cognitive assessment FLUENCY, RAVLT and TRAILS prediction compari-


son via RMSE and CorCoe. Better performance corresponds to lower RMSE or higher
CorCoe value.

MLR MRR LTR L21R L21R + LTR K-means + LTR OURS


RMSE FLUENCY 0.352 0.350 0.343 0.339 0.345 0.351 0.316
RAVLT 0.469 0.447 0.458 0.445 0.448 0.459 0.417
TRAILS 0.571 0.554 0.564 0.551 0.567 0.557 0.511
CorCoe FLUENCY 0.504 0.499 0.516 0.528 0.513 0.503 0.579
RAVLT 0.872 0.880 0.877 0.879 0.879 0.874 0.891
TRAILS 0.541 0.551 0.548 0.558 0.547 0.562 0.600
scores. That is why they outperform the standard methods like MLR and MRR
in most cases, but are inferior to our proposed method. For K-means + LTR,
the clustering step is detached from the longitudinal association study, thus
the learned interrelation structure is not optimal for the following longitudinal
learning process. As for our proposed method, we not only captured longitudinal
correlations among imaging features, but also detected group structure within
cognitive scores. As was discussed in the theoretical sections, our model is able
to find features which impact the cognitive result at different stages and
meanwhile cluster the cognitive results into groups. Thus, our model can capture
features responsible for some, but not necessarily all, cognitive measures along
the time continuum, which preserves more effective information for the prediction.

Fig. 1. Heat maps of our learned weight matrices on the RAVLT cognitive assessment
via MRI data. The weight matrices at four time points, BL, M6, M12 and M24, are
plotted. We draw two matrices for each time point, where the left figure is for the
left hemisphere and the right figure for the right hemisphere. For each weight matrix,
columns denote neuroimaging features while rows represent three different RAVLT
scores, which are RAVLT TOTAL, RAVLT TOT6 and RAVLT RECOG, respectively.
Imaging features (columns) with larger weights possess higher correlation with the
corresponding cognitive measure.

4.3 Identification of Longitudinal Imaging Markers

We further take a special case, the RAVLT assessment, as an example to analyze
the results of our model. RAVLT is composed of three cognitive measures, which
are: (1) the total number of words kept in mind by the testee in the first five

trials, RAVLT TOTAL; (2) the number of words recalled during the 6th trial,
RAVLT TOT6; and (3) the number of words recognized after a gap of 30 min,
RAVLT RECOG. Intuitively, these three measures should
be interrelated with each other and thus clustered into the same group by our model.
The results of our model consistently obey this rule, i.e., no matter
what the value of c (the number of groups) is, our model invariably puts all three
measures into the same group, which is in line with reality. Notably, when c is
larger than the real number of groups, the extra groups become empty.
Figure 1 shows the heat maps of the weight matrices learned by our method.
The figures demonstrate the capture of a small set of features that are con-
sistently associated with a certain group of cognitive measures (here the group
includes all measures). Among the selected features, we found the top two to be
the hippocampal formation and thalamus, whose impacts on AD have already
been reported in previous papers [2,3]. In summary, our model is able
to select a small set of features that consistently correlate with a certain group
of cognitive measures along the time axis, and the effectiveness of the selected
features is confirmed by previous reports in the literature.

5 Conclusion

In this paper, we proposed a novel longitudinal structured low-rank regression


model to study the longitudinal cognitive score prediction. Our model can simul-
taneously uncover the interrelation structures existing in different prediction
tasks and utilize such learned interrelated structures to enhance the longitudinal
learning model. Moreover, we utilized Schatten p-norm to extract the common
subspace shared by the prediction tasks. Our new model is applied to ADNI
cohort for cognitive impairment prediction using MRI data. Empirical results
validate the effectiveness of our model, showing a potential to provide reference
for current clinical research.

References
1. Bezdek, J.C., Hathaway, R.J.: Convergence of alternating optimization. Neural
Parallel Sci. Comput. 11(4), 351–368 (2003)
2. De Jong, L., Van der Hiele, K., Veer, I., Houwing, J., Westendorp, R., Bollen, E.,
De Bruin, P., Middelkoop, H., Van Buchem, M., Van Der Grond, J.: Strongly
reduced volumes of putamen and thalamus in Alzheimer’s disease: an MRI study.
Brain 131(12), 3277–3285 (2008)
3. De Leon, M., George, A., Golomb, J., Tarshish, C., Convit, A., Kluger, A.,
De Santi, S., Mc Rae, T., Ferris, S., Reisberg, B., et al.: Frequency of hippocam-
pal formation atrophy in normal aging and Alzheimer’s disease. Neurobiol. Aging
18(1), 1–11 (1997)
4. Ewers, M., Sperling, R.A., Klunk, W.E., Weiner, M.W., Hampel, H.: Neuroimaging
markers for the prediction and early diagnosis of Alzheimer’s disease dementia.
Trends Neurosci. 34(8), 430–442 (2011)
5. Kabani, N.J.: 3D anatomical atlas of the human brain. Neuroimage 7, P-0717
(1998)
6. Nie, F., Huang, H., Cai, X., Ding, C.H.: Efficient and robust feature selection
via joint l2,1 -norms minimization. In: Advances in Neural Information Processing
Systems, pp. 1813–1821 (2010)
7. Nie, F., Huang, H., Ding, C.H.: Low-rank matrix recovery via efficient schatten
p-norm minimization. In: AAAI (2012)
8. Petrella, J.R., Coleman, R.E., Doraiswamy, P.M.: Neuroimaging and early diagno-
sis of Alzheimer disease: a look to the future 1. Radiology 226(2), 315–336 (2003)
9. Shen, D., Davatzikos, C.: Hammer: hierarchical attribute matching mechanism for
elastic registration. IEEE Trans. Med. Imaging 21(11), 1421–1439 (2002)
10. Sled, J.G., Zijdenbos, A.P., Evans, A.C.: A nonparametric method for automatic
correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging
17(1), 87–97 (1998)
11. Wang, H., Nie, F., Huang, H., Kim, S., Nho, K., Risacher, S.L., Saykin, A.J.,
Shen, L.: Identifying quantitative trait loci via group-sparse multitask regression
and feature selection: an imaging genetics study of the ADNI cohort. Bioinformatics
28(2), 229–237 (2012)
12. Wang, H., Nie, F., Huang, H., Risacher, S., Saykin, A.J., Shen, L.: Identifying AD-
sensitive and cognition-relevant imaging biomarkers via joint classification and
regression. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011, Part III.
LNCS, vol. 6893, pp. 115–123. Springer, Heidelberg (2011)
13. Wang, H., Nie, F., Huang, H., Yan, J., Kim, S., Nho, K., Risacher, S.L.,
Saykin, A.J., Shen, L., et al.: From phenotype to genotype: an association study
of longitudinal phenotypic markers to Alzheimer’s disease relevant SNPs. Bioin-
formatics 28(18), i619–i625 (2012)
14. Wang, H., Nie, F., Huang, H., Yan, J., Kim, S., Risacher, S., Saykin, A., Shen, L.:
High-order multi-task feature learning to identify longitudinal phenotypic markers
for Alzheimer’s disease progression prediction. In: Advances in Neural Information
Processing Systems, pp. 1277–1285 (2012)
15. Wang, Y., Nie, J., Yap, P.T., Li, G., Shi, F., Geng, X., Guo, L., Shen, D.,
Alzheimer’s Disease Neuroimaging Initiative: Knowledge-guided robust MRI brain
extraction for diverse large-scale neuroimaging studies on humans and non-human
primates. PloS One 9(1), e77810 (2014)
16. Wang, Y., Nie, J., Yap, P.-T., Shi, F., Guo, L., Shen, D.: Robust deformable-
surface-based skull-stripping for large-scale studies. In: Fichtinger, G., Martel, A.,
Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6893, pp. 635–642. Springer, Heidelberg
(2011). doi:10.1007/978-3-642-23626-6_78
17. Zhang, Y., Brady, M., Smith, S.: Segmentation of brain MR images through a
hidden Markov random field model and the expectation-maximization algorithm.
IEEE Trans. Med. Imaging 20(1), 45–57 (2001)
Joint Data Harmonization and Group
Cardinality Constrained Classification

Yong Zhang1(B) , Sang Hyun Park2 , and Kilian M. Pohl1,2


1
Department of Psychiatry and Behavioral Sciences,
Stanford University, Stanford, USA
michaelzhang917@gmail.com
2
Center of Health Sciences, SRI International, Menlo Park, USA

Abstract. To boost the power of classifiers, studies often increase the


size of existing samples through the addition of independently collected
data sets. Doing so requires harmonizing the data for demographic
and acquisition differences based on a control cohort before perform-
ing disease specific classification. The initial harmonization often miti-
gates group differences negatively impacting classification accuracy. To
preserve cohort separation, we propose the first model unifying linear
regression for data harmonization with a logistic regression for disease
classification. Learning to harmonize data is now an adaptive process
taking both disease and control data into account. Solutions within that
model are confined by group cardinality to reduce the risk of overfit-
ting (via sparsity), to explicitly account for the impact of disease on
the inter-dependency of regions (by grouping them), and to identify dis-
ease specific patterns (by enforcing sparsity via the l0 -‘norm’). We test
those solutions in distinguishing HIV-Associated Neurocognitive Disor-
der from Mild Cognitive Impairment on two independently collected neu-
roimaging data sets; each contains controls and samples from one disease.
Our classifier is impartial to acquisition differences between the data sets
while being more accurate in disease separation than sequential learning
of harmonization and classification parameters and than non-sparsity-based
logistic regressors.

1 Introduction
A popular way to improve the power of classifiers is to expand the application from a
single data set (Fig. 1(a)) to multiple, independently collected sets of the same
disease (Fig. 1(b)) [1]. To analyze across multiple sets, neuroimage studies gener-
ally first harmonize the data by, for example, regressing out demographic factors
from MRI measurements and then train the classifier to distinguish disease from
control samples [2]. However, harmonization might mitigate group differences
making classification difficult (such as in Fig. 2). To improve classification accu-
racy, we propose the first approach to jointly learn how to harmonize MR image
data and classify disease. Harmonization relies on both controls and disease
Equal contribution by Dr. Zhang and Dr. Park.

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 282–290, 2016.
DOI: 10.1007/978-3-319-46720-7_33
Joint Data Harmonization and Classification 283

(a) (b) (c)

Set A - Controls Set B - Controls Set C - Controls Harmonization


Set A - Diseases I Set B - Diseases I Set C - Diseases II Classifier

Fig. 1. Examples for classifying data: (a) classification based on a single set, (b) sep-
arating healthy and disease samples based on multiple sets requiring harmonization, and (c)
separating two disease groups based on harmonizing the controls of disease-specific data sets.

cohorts to reduce differences in the image based measurements due to acquisi-


tion differences between sets while preserving the group separation investigated
by the classifier. We evaluate our approach on the challenging task (Fig. 1(c))
of being trained on two data sets differing in their acquisition and diseases they
sample, and tested on accurately distinguishing the two diseases while being
impartial to acquisition differences, i.e., the controls of the two sets.
Training of our approach is defined by an energy function that combines a
linear harmonization model with a logistic regression classifier. Minimizing this
function is confined to group cardinality constrained solutions, i.e., labeling is
based on a small number of groups of image scores (counted via the l0 -‘norm’).
Doing so reduces the risk of overfitting, accounts for inter-dependency between
regions (e.g., bilateral impact of diseases), and identifies disease distinguishing
patterns (defined by non-zero weights of classifiers) that avoid issues of solutions
based on relaxed sparsity constraints [3]. Inspired by [4], our method uses block
coordinate descent to find the optimal parameters for harmonizing the two data
sets and correctly labeling disease samples while being indifferent to control
cohorts (i.e., acquisition differences). During testing, we use those parameters
to harmonize the image scores before performing classification ensuring that the
data set associated with a subject is not part of the labeling decision.
Using 5-fold cross-validation, we measure the accuracy of our method on dis-
tinguishing HIV-Associated Neurocognitive Disorder (HAND) from Mild Cog-
nitive Impairment (MCI) of two independently collected data sets; each set con-
tains controls and samples of one disease only. Distinguishing HAND from MCI
is clinically challenging due to similarity in symptoms and missing standard pro-
tocols for assessing neuropsychological deficits [5]. Not only is our classifier indif-
ferent to the controls from both groups, but it is also significantly more accurate in
distinguishing the two diseases than the conventional sequential harmonization
and classification approach and non-sparsity based logistic regression methods.

2 Jointly Learning of Harmonization and Classification

Let a data set consist of a set SA of controls and samples with disease A, and
an independently collected set SB of controls and samples of disease B. The
284 Y. Zhang et al.

four subsets are matched with respect to demographic scores, such as age. Each
sample s of the joint set is represented by a vector of image features xs and
a label ys , where ys = −1 if s ∈ SA and ys = +1 for s ∈ SB . The acquisi-
tion differences between SA and SB are assumed to linearly impact the image
features. To extract disease separating patterns from this joint set, we review
the training of a sequential model for data harmonization and classification, and
then propose to simultaneously parameterize both tasks by minimizing a single
energy function.

2.1 Sequential Harmonization and Classification


Training the linear regression model for data harmonization results in parame-
terizing matrix $U$ so that it minimizes the difference (with respect to the $\ell_2$-norm
$\|\cdot\|_2$) between the inferred values $U \cdot [1\; y_s]$ (1 is the bias term of the linear
model) and the raw image scores $x_s$ across the $n_C$ controls of the joint data set, i.e.,

$$\hat{U} := \arg\min_U h_C(U) \quad \text{with} \quad h_C(U) := \frac{1}{n_C} \sum_{s=1}^{n_C} \|U \cdot [1\; y_s] - x_s\|_2^2. \tag{1}$$
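Since Eq. (1) is an ordinary least-squares problem in $U$ with the two-dimensional design vector $[1\; y_s]$, it has a closed-form solution via the normal equations; a sketch (function names and dimensions are ours, not the authors'):

```python
import numpy as np

def fit_harmonization(X_ctrl, y_ctrl):
    # minimize (1/n_C) sum_s ||U [1, y_s] - x_s||_2^2 over U (d x 2)
    Z = np.vstack([np.ones_like(y_ctrl), y_ctrl])      # 2 x n_C design matrix
    return X_ctrl @ Z.T @ np.linalg.inv(Z @ Z.T)       # normal equations

rng = np.random.default_rng(3)
d, nC = 3, 40
y = np.concatenate([-np.ones(nC // 2), np.ones(nC // 2)])  # data-set labels
U_true = rng.standard_normal((d, 2))
X = U_true @ np.vstack([np.ones(nC), y])               # noise-free controls
U_hat = fit_harmonization(X, y)
print(np.allclose(U_hat, U_true))   # exact recovery in the noise-free case
```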

A group sparsity constrained logistic regression classifier is trained on the
residuals $r_s(\hat{U}) := \hat{U} \cdot [1\; y_s] - x_s$, i.e., the harmonized scores, across the samples $s$ of
two groups, i.e., the $n_D$ samples of the two disease cohorts. Note, classification
based on the inferred values $\hat{U} \cdot [1\; y_s]$ is uninformative as all scores of a region are
mapped to two values. Now, let the logistic function be $\theta(\alpha) := \log(1+\exp(-\alpha))$,
the weight vector $w$ encode the importance of each score of $r_s$, and $v \in \mathbb{R}$ be the
label offset; then the penalty (or label) function of the classifier is defined by

$$l_D(\hat{U}, v, w) := \frac{1}{n_D} \sum_{s=n_C+1}^{n_C+n_D} \theta\big(y_s \cdot (w^\top r_s(\hat{U}) + v)\big). \tag{2}$$
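The penalty of Eq. (2) can be written down directly; a sketch (a numerically stable form of θ is used; dimensions and names are ours):

```python
import numpy as np

def theta(alpha):
    # logistic loss log(1 + exp(-alpha)), computed stably
    return np.logaddexp(0.0, -alpha)

def l_D(U, v, w, X_dis, y_dis):
    # Eq. (2): mean logistic penalty over disease samples on residuals r_s
    Z = np.vstack([np.ones_like(y_dis), y_dis])
    R = U @ Z - X_dis                      # residuals r_s, one per column
    return float(np.mean(theta(y_dis * (w @ R + v))))

rng = np.random.default_rng(4)
d, nD = 3, 10
U = rng.standard_normal((d, 2))
X_dis = rng.standard_normal((d, nD))
y_dis = np.concatenate([-np.ones(nD // 2), np.ones(nD // 2)])
print(l_D(U, 0.0, np.zeros(d), X_dis, y_dis))   # theta(0) = log 2 ~ 0.693
```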

The search for the point $(\hat v, \hat w)$ minimizing $l_D(\hat U, \cdot, \cdot)$ often has to be limited so
that $w$ conforms to disease specific constraints, such as the bilateral impact
of HAND or MCI on individual regions. These constraints can be modeled by
group cardinality. Specifically, every entry of $w$ is assigned to one of $n_G$ groups
$g_1(w), \ldots, g_{n_G}(w)$. The search space $S$ is composed of weights for which the number
of groups with non-zero weight entries is below a certain threshold $k \in \mathbb{N}$, i.e.,
$S := \{w : \sum_{i=1}^{n_G} \big\| \|g_i(w)\|_2 \big\|_0 \le k\}$ with $\|\cdot\|_0$ being the $l_0$-`norm'. Finally, the
training of the classifier is fully described by the following minimization problem

$$(\hat{v}, \hat{w}) := \arg\min_{v \in \mathbb{R},\, w \in S} l_D(\hat{U}, v, w), \tag{3}$$

which can be solved via penalty decomposition [4,6]. Note that Eq. (3) turns
into a sparsity constrained problem if each group $g_i$ is of size 1. Furthermore,
setting $k = n_G$ turns Eq. (3) into a logistic regression problem. Finally, Eq. (3)
can distinguish a single disease from controls by simply replacing $y_s$ in Eq. (2)
with a variable encoding assignment to cohorts instead of data sets.
Algorithm 1. Joint Harmonization and Classification

1: Set $\epsilon = 0.1$, $\sigma = 10$, $\epsilon_B = \epsilon_P = 10^{-3}$ (according to [4]) and $U' = 0$, $w' = 0$, $q' = 0$, $v' = 1$
2: Repeat
3:   Repeat (Block Coordinate Descent)
4:     $U'' \leftarrow U'$, $w'' \leftarrow w'$, $q'' \leftarrow q'$, $v'' \leftarrow v'$
5:     $U' \leftarrow \arg\min_U (1 - \lambda)\, l_D(U, v', q') + \lambda\, h_C(U)$ (via Gradient Descent)
6:     $(v', q') \leftarrow \arg\min_{v,q} (1 - \lambda)\, l_D(U', v, q) + \epsilon\, \|w' - q\|_2^2$ (via Gradient Descent)
7:     Update $w'$ by
         Sort the groups $g_j(q')$ so that $\|g_{j_1}(q')\|_2 \ge \|g_{j_2}(q')\|_2 \ge \ldots \ge \|g_{j_{n_G}}(q')\|_2$
         Define $I$ to be the set of indices of $q'$ associated with groups $g_{j_1}, \ldots, g_{j_k}$
         $w'_i \leftarrow q'_i$ if $q'_i \in I$ and 0 otherwise.
8:   Until $\max\big\{ \frac{\|U'-U''\|_{\max}}{\max\{\|U''\|_{\max},1\}}, \frac{|v'-v''|}{\max\{|v''|,1\}}, \frac{\|w'-w''\|_{\max}}{\max\{\|w''\|_{\max},1\}}, \frac{\|q'-q''\|_{\max}}{\max\{\|q''\|_{\max},1\}} \big\} < \epsilon_B$
9:   $\epsilon \leftarrow \sigma \cdot \epsilon$
10: Until $\|w' - q'\|_{\max} < \epsilon_P$.

2.2 Simultaneous Harmonization and Classification


We now determine $(\hat U, \hat v, \hat w)$ from a single minimization problem composed of the
linear (harmonization) and logistic (classification) regression terms, i.e.,

$$(\hat U, \hat v, \hat w) := \arg\min_{U,\, v,\, w \in S} (1 - \lambda) \cdot l_D(U, v, w) + \lambda \cdot h_C(U), \tag{4}$$
where λ ∈ (0, 1). Note, the model fails to classify when λ = 1 ( v and w  are
undefined) or harmonize when λ = 0 (entries of U  are undefined). Motivated by
[4], we simplify optimization by first parameterizing the classifier with respect
to the ‘unconstrained’ vector q before determining the corresponding sparse
 The solution to Eq. (4) is estimated by iteratively increasing  of
solution w.
(U , v , w , q ) := arg min (1 − λ) · lD (U, v, q) + λ · hC (U ) +  · w − q22 (5)
U,v,w∈S,q

until the maximum of the absolute difference between the elements of $w_\epsilon$ and $q_\epsilon$
is below a threshold, i.e., $\|w_\epsilon - q_\epsilon\|_{\max} < \epsilon_P$. $(U_\epsilon, v_\epsilon, w_\epsilon, q_\epsilon)$ are determined by
block coordinate descent (BCD). As outlined in Algorithm 1, let $(U', v', w', q')$
be the estimates of $(U_\epsilon, v_\epsilon, w_\epsilon, q_\epsilon)$; then $U'$ is updated by solving Eq. (5) with fixed
$(v', w', q')$:

$$U' := \arg\min_U (1 - \lambda) \cdot l_D(U, v', q') + \lambda \cdot h_C(U). \tag{6}$$

As this minimization is over a convex and smooth function, Eq. (6) is solved via
gradient descent. Note that determining $U'$ is equivalent to increasing the sepa-
ration between the two disease groups by minimizing $l_D(\cdot, v', q')$ while reducing
the difference between the two control groups by minimizing $h_C(\cdot)$.
Next, BCD updates $v'$ and $q'$ by keeping $(U', w')$ fixed in Eq. (5), i.e.,

$$(v', q') := \arg\min_{v, q} (1 - \lambda) \cdot l_D(U', v, q) + \epsilon \cdot \|w' - q\|_2^2. \tag{7}$$

This minimization problem is defined by a smooth and convex function. Its
solution is thus also estimated via gradient descent. Finally, $w'$ is updated by
solving Eq. (5) with fixed $(U', v', q')$, i.e.,

$$w' := \arg\min_{w \in S} \|w - q'\|_2^2. \tag{8}$$

(Fig. 2 panels: Raw Scores, Sequential, Joint (λ = 0.8), Joint (λ = 0.5); legend:
Set I Control, Set I Disease A, Set II Control, Set II Disease B.)

Fig. 2. Impact of harmonization on classification (black line) of two synthetic sets.


Compared to the raw scores, the Sequential approach mitigated differences between
the two disease groups by ‘pushing’ them together. Our joint model with λ = 0.5 is
the only approach that is indifferent to controls and correctly labels all disease cases.

As shown in [4,6], the closed-form solution of Eq. (8) first computes ‖g_i(q)‖₂
for each group i and then sets ŵ to the entries of q that are assigned to
the k groups with the highest norms. The remaining entries of ŵ are set to 0.
The procedures (6)–(8) are repeated until the relative changes of (U, v, w, q)
between iterations are smaller than a threshold ε_B. (Û, v̂, ŵ, q̂) is updated
with the converged (U, v, w, q), μ is increased, and another BCD loop is
initiated until ŵ and q̂ converge towards each other (see Algorithm 1 for
details).
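The group hard-thresholding step in Eq. (8) is easy to sketch in code. The snippet below is our own illustrative implementation (not the authors' code, and the variable names `q`, `groups`, and `k` are ours): it computes the norm of q restricted to each group and keeps only the entries in the k groups with the largest norms.

```python
import numpy as np

def project_group_cardinality(q, groups, k):
    """Keep the entries of q that fall in the k groups with the largest
    Euclidean norms; zero out all other entries (closed form of Eq. (8))."""
    norms = np.array([np.linalg.norm(q[idx]) for idx in groups])
    keep = np.argsort(norms)[-k:]      # indices of the k largest-norm groups
    w = np.zeros_like(q)
    for i in keep:
        w[groups[i]] = q[groups[i]]
    return w

# toy example: 6 scores in 3 groups of 2, keep k = 2 groups
q = np.array([0.9, 0.8, 0.05, 0.1, 0.5, 0.4])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
w = project_group_cardinality(q, groups, k=2)
```

Here the low-norm middle group is zeroed out while the entries of the other two groups survive unchanged.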
Figure 2 showcases the differences between sequential and joint harmoniza-
tion and classification. Two synthetic data sets consist of a control and disease
cohort, where the raw scores for each cohort are 20 random samples from a
Gaussian distribution with the covariance being the identity matrix multiplied
by 0.01. The mean of the Gaussian for Disease A of Set I (blue) is (1.3,2) result-
ing in samples that are somewhat separated from those of Disease B of Set II
(mean = (1.5,2), red). The difference in data acquisition between the two sets is
simulated by an even larger separation of the means between the two control
groups (Set I: mean = (0.9,1), green; Set II: mean = (1.2,1), black). The Sequential
method (see Sect. 2.1 without sparsity) harmonizes the scores so that the
classifier assigns the controls to one set, i.e., the separating plane (black line) is
impartial to acquisition differences. This plane fails to perfectly separate the two
disease cohorts as the cohorts are ‘pushed’ together with the mean of Disease
B being now to the right of the mean of Disease A. Higher accuracy in disease
classification is achieved by our joint model (omitting sparsity) with λ = 0.8.
Comparing this plot to the results with λ = 0.5 shows that, as λ decreases, the
emphasis on separating the two disease groups increases, as intended by Eq. (6). The
classifier is still impartial to acquisition differences and perfectly labels the samples
of the two disease cohorts. In summary, the joint model enables data harmonization
that preserves group differences, which was not the case for the sequential
approach.
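The synthetic setup described above can be reproduced along the following lines (our own sketch; the random seed and variable names are arbitrary choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
cov = 0.01 * np.eye(2)            # identity covariance scaled by 0.01

# 20 samples per cohort, with the means quoted in the text
cohorts = {
    "Set I control":    rng.multivariate_normal([0.9, 1.0], cov, 20),
    "Set I Disease A":  rng.multivariate_normal([1.3, 2.0], cov, 20),
    "Set II control":   rng.multivariate_normal([1.2, 1.0], cov, 20),
    "Set II Disease B": rng.multivariate_normal([1.5, 2.0], cov, 20),
}
```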
Joint Data Harmonization and Classification 287

3 Distinguishing HAND from MCI


We tested our method on a joint set of two independently collected data sets:
the ‘UHES’ set [7] contained cross-sectional MRIs of 15 HAND cases and 21
controls while the ‘ADNI’ set contained MRIs of 20 MCIs and 18 controls. The
4 groups were matched with respect to sex and age. Each MRI was acquired on
a 3T Siemens scanner (with the two sets having different acquisition protocols)
and was processed by [8], resulting in 102 regional volume scores. We assigned
those scores to n_G = 52 groups to account for the bilateral impact of MCI
and HAND on both hemispheres. We now review our experimental setup and
results highlighting that jointly learning harmonization and classification results
in findings that are indifferent to the two control cohorts and more accurate in
distinguishing the two disease groups compared to alternative learning models.
To measure the accuracy of our method, we used 5-fold cross-validation
with each fold containing roughly 20% of each cohort. On the training sets,
we used parameter exploration to determine the optimal group cardinality
k ∈ {1, 2, . . . , 10} and weighting λ ∈ {0.1, 0.2, . . . , 0.9}. Across all folds and settings,
our algorithm converged within 5 iterations, while each BCD optimization
converged within 500 iterations. For each setting, we then computed the disease
accuracy of the classifier by separately computing the accuracy for each cohort
(MCI and HAND) to be assigned to the right set (UHES or ADNI) and then
averaging the scores to account for the imbalance of cohort sizes. We determined
the control accuracy by repeating those computations with respect to the two
control groups. Since higher control accuracy implies worse harmonization, an
unbiased control accuracy, i.e., around 50 %, coupled with a high disease accuracy
was viewed as preferable. Note, an unbiased control accuracy only qualifies the
harmonization in that the remaining acquisition differences do not impact the
separating plane (or weights ŵ) of the classifier. For each training set, we then
chose the setting (λ, k) with corresponding weights ŵ and harmonization
parameters Û that produced the largest difference between the two accuracy scores.
This criterion selected a unique setting for 2 folds, 2 settings for 2 folds, and 3
settings for 1 fold.
On the test set, we computed the harmonized scores (residuals) of all sam-
ples for each selected setting. We then hid the subjects’ data set associations
from the ensemble of selected classifiers by applying the residuals to classifiers
parameterized by the corresponding weight settings. Based on those results, we
computed control and disease accuracy as well as the corresponding p-values
via the Fisher exact test [9]. We viewed p < 0.05 as significantly more accurate
than randomly assigning samples to the two sets. An ensemble of classifiers was
viewed as informative, if separating HAND from MCI resulted in a significant
p-value and a non-significant one for controls.
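This protocol (cohort-balanced accuracy plus a Fisher exact test against chance) can be sketched as follows. The code is our own illustration; the two-sided Fisher test is hand-rolled from the hypergeometric distribution rather than taken from a statistics library.

```python
import numpy as np
from math import comb

def balanced_accuracy(y_true, y_pred):
    """Average of per-cohort accuracies, compensating for unequal cohort sizes."""
    return float(np.mean([np.mean(y_pred[y_true == c] == c)
                          for c in np.unique(y_true)]))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]:
    sum the hypergeometric probabilities no larger than the observed one."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    p_tab = lambda x: comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    p_obs = p_tab(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(p_tab(x) for x in range(lo, hi + 1) if p_tab(x) <= p_obs + 1e-12)
```

For a perfectly separated table [[5, 0], [0, 5]] this yields p ≈ 0.008, i.e., significantly better than chance at the 0.05 level.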
We repeated the above experiment for the sequential harmonization and clas-
sification approach of Sect. 2.1, called SeqGroup, to show the advantages of our
joint learning approach, called JointGroup. To generalize our findings about
joint learning of harmonization and classification parameters to non-sparsity
constrained models, we also tested the approach omitting the group sparsity
constraint, i.e., k = 52. JointNoGroup refers to the results of the correspond-


ing joint method and SeqNoGroup to the sequential results.
Table 1 shows the accuracy scores and p-values of all implementations listed
according to the difference between disease and control accuracy. The method
with the smallest difference, SeqNoGroup, was the only approach recording a
significant control accuracy score (68.4 %, p = 0.024). The corresponding sepa-
rating plane was thus not impartial to acquisition difference so that the relatively
high disease accuracy (82.5 %) was insignificant. The next approach, SeqGroup,
obtained non-significant accuracy scores for controls (64.6 %), indicating that
the group sparsity constraint improved generalizing the results from the relatively
small training set to testing. The difference between control (52.8 %) and disease
accuracy (83.3 %) almost doubled (30.5%) for the joint approach JointNoGroup.
As reflected in Fig. 2, learning data harmonization separately from classification
does not generalize as well as the joint approach, which harmonizes data so that
differences between disease groups are preserved. Confirming all previous findings,
the overall best approach (i.e., control and disease accuracy differ most) is
our proposed JointGroup approach achieving a disease accuracy of 90 %.

Table 1. Implementations listed according to the difference between disease (MCI
vs. HAND) and control (UHES controls vs. ADNI controls) accuracy. In bold are
the findings that are statistically significantly different from chance, i.e., p < 0.05. Our
method, JointGroup, achieves the highest disease accuracy and largest spread in scores.

              Classification accuracy            p-value with ground truth
              Disease   Control   Difference     Disease    Control
SeqNoGroup    82.5      68.4      14.1           <0.001     0.024
SeqGroup      80.0      64.6      15.4           <0.001     0.073
JointNoGroup  83.3      52.8      30.5           <0.001     0.753
JointGroup    90.0      52.8      37.2           <0.001     0.753

The group sparsity constraint aided in separating diseases and identified patterns
of regions (i.e., non-zero weights ŵ) impacted by either MCI or HAND (or
HIV). Each column of Fig. 3 shows the largest, unique pattern associated with
a training set. For those training sets that selected multiple patterns (i.e., multiple ŵ
settings), patterns with fewer regions were always included in the largest pattern.
The precentral gyrus, cerebellum VIII, and lateral ventricle were parts of all pat-
terns. HIV is known to impact the cerebellum [10] and accelerated enlargement
of ventricles is linked to both HIV [11] and MCI [12]. These findings indicate
that the extracted patterns are informative with respect to MCI and HAND
(and HIV), though an in-depth morphometric analysis is required for confirmation.
Fig. 3. Each column shows the largest, unique pattern extracted by JointGroup on one
of the 5 training sets. Identified regions are impacted by HAND, HIV, or MCI.

4 Conclusion

We proposed an approach that simultaneously learned the parameters for data


harmonization and disease classification. The search for the optimal separating
plane was confined by group cardinality to reduce the risk of overfitting, to
explicitly account for the impact of disease on the inter-dependency of regions,
and to identify disease specific patterns. On separating HAND from MCI samples
of two disease specific data sets, our joint approach achieved better classification
accuracy than the non-sparsity based model and sequential approaches.

Acknowledgement. This research was supported in part by the NIH grants U01
AA017347, AA010723, K05-AA017168, K23-AG032872, and P30 AI027767. We thank
Dr. Valcour for giving us access to the UHES data set. With respect to the ADNI data,
collection and sharing for this project was funded by the NIH Grant U01 AG024904 and
DOD Grant W81XWH-12-2-0012. Please see https://adni.loni.usc.edu/wp-content/
uploads/how_to_apply/ADNI_DSP_Policy.pdf for further details.

References
1. Jovicich, J., et al.: Multisite longitudinal reliability of tract-based spatial statistics
in diffusion tensor imaging of healthy elderly subjects. Neuroimage 101, 390–403
(2014)
2. Moradi, E., et al.: Predicting symptom severity in autism spectrum disorder based
on cortical thickness measures in agglomerative data. bioRxiv (2016)
3. Sabuncu, M.R.: A universal and efficient method to compute maps from image-
based prediction models. In: Golland, P., Hata, N., Barillot, C., Hornegger, J.,
Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8675, pp. 353–360. Springer, Heidelberg
(2014). doi:10.1007/978-3-319-10443-0 45
4. Zhang, Y., et al.: Computing group cardinality constraint solutions for logistic
regression problems. Medical Image Analysis (2016, in press)
5. Sanmarti, M., et al.: HIV-associated neurocognitive disorders. J.M.P. 2(2) (2014)
6. Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM
J. Optim. 23(4), 2448–2478 (2013)
7. Nir, T.M., et al.: Mapping white matter integrity in elderly people with HIV. Hum.
Brain Mapp. 35(3), 975–992 (2014)
8. Pfefferbaum, A., et al.: Variation in longitudinal trajectories of regional brain vol-
umes of healthy men and women (ages 10 to 85 years) measured with atlas-based
parcellation of MRI. Neuroimage 65, 176–193 (2013)
9. Fisher, R.: The logic of inductive inference. J. Roy. Stat. Soc. 1(98), 38–54 (1935)
10. Chang, L., et al.: Impact of apolipoprotein E ε4 and HIV on cognition and brain
atrophy: antagonistic pleiotropy and premature brain aging. Neuroimage 4(58),
1017–1027 (2011)
11. Thompson, P.M., et al.: 3D mapping of ventricular and corpus callosum abnormal-
ities in HIV/AIDS. Neuroimage 31(1), 12–23 (2006)
12. Nestor, S.M., et al.: Ventricular enlargement as a possible measure of Alzheimer’s
disease progression validated using the Alzheimer’s disease neuroimaging initiative
database. Brain 131(9), 2443–2454 (2008)
Progressive Graph-Based Transductive
Learning for Multi-modal Classification
of Brain Disorder Disease

Zhengxia Wang1,2, Xiaofeng Zhu1, Ehsan Adeli1, Yingying Zhu1,
Chen Zu1, Feiping Nie3, Dinggang Shen1, and Guorong Wu1(✉)

1 Department of Radiology and BRIC,
University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
grwu@med.unc.edu
2 Department of Information Science and Engineering,
Chongqing Jiaotong University, Chongqing 400074, China
3 School of Computer Science and OPTIMAL Center,
Northwestern Polytechnical University, Xi’an 710072, China

Abstract. Graph-based Transductive Learning (GTL) is a powerful tool in


computer-assisted diagnosis, especially when the training data is not sufficient to
build reliable classifiers. Conventional GTL approaches first construct a fixed
subject-wise graph based on the similarities of observed features (i.e., extracted
from imaging data) in the feature domain, and then follow the established graph
to propagate the existing labels from training to testing data in the label domain.
However, such a graph is exclusively learned in the feature domain and may not
be necessarily optimal in the label domain. This may eventually undermine the
classification accuracy. To address this issue, we propose a progressive GTL
(pGTL) method to progressively find an intrinsic data representation. To achieve
this, our pGTL method iteratively (1) refines the subject-wise relationships
observed in the feature domain using the learned intrinsic data representation in
the label domain, (2) updates the intrinsic data representation from the refined
subject-wise relationships, and (3) verifies the intrinsic data representation on
the training data, in order to guarantee an optimal classification on the new
testing data. Furthermore, we extend our pGTL to incorporate multi-modal
imaging data, to improve the classification accuracy and robustness as
multi-modal imaging data can provide complementary information. Promising
classification results in identifying Alzheimer’s disease (AD), Mild Cognitive
Impairment (MCI), and Normal Control (NC) subjects are achieved using MRI
and PET data.

1 Introduction

Alzheimer’s disease (AD) is the most common neurological disorder in the older
population. There is overwhelming evidence in the literature that the morphological
patterns are observable by means of either structural and diffusion MRI or PET [1–3].
However, abnormal morphological patterns are often subtle compared to high
inter-subject variations. Hence, sophisticated pattern recognition methods are in high
demand to accurately identify individuals at different stages of AD progression.

© Springer International Publishing AG 2016


S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 291–299, 2016.
DOI: 10.1007/978-3-319-46720-7_34
292 Z. Wang et al.

Medical imaging applications often deal with high-dimensional data and usually
few samples with ground-truth labels. Thus, it is very challenging to find a
general model that can work well for an entire set of data. Hence, GTL methods have
been investigated with great success in the medical imaging area [4, 5], since they can
overcome the above difficulties by taking advantage of the data representation on
unlabeled testing subjects. In current state-of-the-art methods, a graph is used to
represent the subject-wise relationship. Specifically, each subject, regardless of being
labeled or unlabeled, is treated as a graph node. Two subjects are connected by a graph
link (i.e., an edge) if they have similar morphological patterns. Using these connections,
the labels can be propagated throughout the graph until all latent labels are
determined. Many label propagation strategies have been proposed to determine
the latent labels of testing subjects based on the subject-wise relationships encoded in the
graph [6].
The assumption of current methods is that the graph constructed in the observed
feature domain represents the real data distribution and can be transferred to guide label
propagation. However, this assumption usually does not hold since morphological
patterns are often highly complex and heterogeneous. Figure 1(a) shows the affinity
matrix of 51 AD and 52 NC subjects using the ROI-based features extracted from each
MR image, where red and blue dots denote high and low subject-wise similarities,
respectively. Since the clinical data (e.g., MMSE and CDR scores [1]) is more closely
related to the clinical labels, we use these clinical scores to construct another affinity matrix, as
shown in Fig. 1(c). It is apparent that the data representations using structural image
features and clinical scores are completely different. Thus, there is no guarantee that the
learned graph from the affinity matrix in Fig. 1(a) can effectively guide the classifi-
cation of AD and NC subjects. More critically, the affinity matrix using observed image
features is not even necessarily optimal in the feature domain, due to possible imaging
noise and outlier subjects. Many studies take advantage of multi-modal information to
improve the discrimination power of transductive learning. However, the graphs from
different modalities might be different too, as shown in the affinity matrices using
structural image features from MR images (Fig. 1(a)) and functional image features
from PET images (Fig. 1(b)). Graph diffusion [5] has recently been proposed to find a
common graph. Unfortunately, as shown in Fig. 1, it is hard to find a combination of
the graphs in Fig. 1(a) and (b) that leads to the graph in Fig. 1(c), which is more
closely related to the final classification task.

Fig. 1. Affinity matrices using structural image features (a), functional image features (b), and
clinical scores (c).
Progressive Graph-Based Transductive Learning 293

To solve these issues, we propose a pGTL method to learn the intrinsic data
representation, which could be eventually optimal for label propagation. Specifically,
the intrinsic data representation is required to be (a) close to subject-wise relationships
constructed by image features extracted from different modalities, and (b) verified on
the training data and guaranteed to be optimal for label classification. To that end, we
simultaneously (1) refine the data representation (subject-wise graph) in the feature
domain, (2) find the intrinsic data representation based on the constructed graphs on
multi-modal imaging data and also the clinical labels of entire subject set (including
known labels on training subjects and also the tentatively-determined labels on testing
subjects), and (3) propagate the clinical labels from training subjects to testing subjects,
following the latest learned intrinsic data representation. Promising classification
results have been achieved in classifying 93 AD, 202 MCI, and 101 NC subjects, each
with MR and PET images.

2 Methods

Suppose we have N subjects {I_1, …, I_P, I_{P+1}, …, I_N}, which sequentially consist of P
training subjects and Q (= N − P) testing subjects. For the P training subjects, the clinical
labels F_P = {f_p}_{p=1,…,P} are known, where each f_p ∈ {0, 1}^C is a binary coding vector
indicating the clinical label from C classes. Our goal is to jointly determine the latent
labels for the Q testing subjects based on a set of their continuous likelihood vectors
F_Q = {f_q}_{q=P+1,…,N}, where each element of f_q indicates the likelihood of the
q-th subject belonging to one of the C classes. For convenience, we concatenate F_P and F_Q
into a single label matrix F ∈ R^{N×C}, F = [F_P; F_Q].

2.1 Progressive Graph-Based Transductive Learning

Conventional Graph-Based Transductive Learning. For clarity, we first extract


single-modality image features from each subject I_i (i = 1, …, N), denoted as x_i. In
conventional GTL methods, the subject-wise relationships are computed based on
feature similarity, which is encoded in an N × N feature affinity matrix S. Each element
s_ij (0 ≤ s_ij ≤ 1; i, j = 1, …, N) represents the feature affinity degree between x_i and x_j.
After constructing S (based on feature similarity), conventional methods determine the
latent label for each testing subject I_q by solving a classic graph learning problem:

    F̂_Q = arg min_{F_Q} Σ_{i,j=1}^N ‖f_i − f_j‖²₂ s_ij.    (1)

As shown in Fig. 1, the affinity matrix S might not be strongly related to the intrinsic
data representation in the label domain. Therefore, it is necessary to further design a
graph based on the label matrix, rather than solely using the graph constructed from the
features. However, the labels of the testing subjects are not determined yet. To
solve this chicken-and-egg dilemma, we propose to construct a dynamic graph that
progressively reflects the intrinsic data representation in the label domain.
Progressive Graph-Based Transductive Learning on Single Modality. We propose
three strategies to remedy the above issue. (1) We propose to gradually find an intrinsic
data representation T = {t_ij}_{i,j=1,…,N} which is more relevant than S to guide the label
propagation in Eq. (1). (2) Since only the training images have known clinical
labels, exclusively optimizing T in the label domain is an ill-posed problem. Thus, we
encourage the intrinsic data representation T to also respect the affinity matrix S, where
image features are complete in the feature domain. (3) In order to suppress possible
noisy patterns and outlier subjects, we allow the intrinsic data representation T to
progressively refine the affinity matrix S in the feature domain. In this way, the
estimations of S and T are coupled, leading to a dynamic graph learning model with
the following objective function:

    arg min_{S,T,F} Σ_{i,j=1}^N { μ ‖f_i − f_j‖²₂ t_ij + ‖x_i − x_j‖²₂ s_ij + λ₁ s_ij² + λ₂ (s_ij − t_ij)² }    (2)
    s.t. 0 ≤ s_ij ≤ 1, s_i′1 = 1; 0 ≤ t_ij ≤ 1, t_i′1 = 1; F = [F_P; F_Q],

where μ is a scalar balancing the data fitting terms from the two different domains (i.e.,
the first and second terms in Eq. (2)), and s_i ∈ R^{N×1} and t_i ∈ R^{N×1} are the vectors
whose j-th elements are s_ij and t_ij, respectively. In order to avoid a trivial solution, an
ℓ2-norm penalty is imposed on each element s_ij of the affinity matrix S. λ₁ and λ₂ are
two scalars controlling the strengths of the last two terms in Eq. (2).
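As a concrete reading of the objective in Eq. (2), the sketch below evaluates it for given S, T, F and features X. This is our own illustrative code, not the authors' implementation; μ, λ₁, and λ₂ appear as `mu`, `lam1`, and `lam2`.

```python
import numpy as np

def pgtl_objective(S, T, F, X, mu, lam1, lam2):
    """Value of the single-modality objective in Eq. (2)."""
    df = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=2)  # ||f_i - f_j||^2
    dx = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # ||x_i - x_j||^2
    return (mu * np.sum(df * T)             # label-domain fitting term
            + np.sum(dx * S)                # feature-domain fitting term
            + lam1 * np.sum(S ** 2)         # l2 penalty on S
            + lam2 * np.sum((S - T) ** 2))  # coupling between S and T
```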
Progressive Graph-Based Transductive Learning on Multiple Modalities. Suppose
we have M modalities. For each subject I_i, we can extract multi-modal image
features x_i^m, m = 1, …, M. For the m-th modality, we optimize an affinity matrix S^m. As
shown in Fig. 1(a) and (b), the affinity matrices across modalities could be different.
Thus, we require the intrinsic data representation T to be close to all S^m, m = 1, …, M.
It is straightforward to extend our above pGTL method to the multi-modal scenario:

    arg min_{S^m,T,F} Σ_{i,j=1}^N { μ ‖f_i − f_j‖²₂ t_ij + Σ_{m=1}^M [ ‖x_i^m − x_j^m‖²₂ s_ij^m + λ₁ (s_ij^m)² + λ₂ (s_ij^m − t_ij)² ] }    (3)
    s.t. 0 ≤ s_ij^m ≤ 1, (s_i^m)′1 = 1; 0 ≤ t_ij ≤ 1, t_i′1 = 1; F = [F_P; F_Q]

It is worth noting that, although the multi-modal information leads to multiple affinity
matrices in the feature domain, they share the same intrinsic data representation T.

2.2 Optimization
Since our proposed energy function in Eq. (3) is convex with respect to each set of
variables, i.e., S, T, and F, we present the following divide-and-conquer solution that
optimizes one set of variables at a time while fixing the others. We initialize
s_ij = exp(−‖x_i − x_j‖²₂ / (2σ²)) with σ an empirical parameter, T = Σ_{m=1}^M S^m / M,
and F_Q = {0}^{Q×C}.
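The initialization can be written compactly. Below is our own sketch with toy shapes (6 subjects, 4 features, 2 modalities); the shapes and seeds are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """s_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for rows x_i of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# one affinity matrix per modality; T starts as their average, F_Q as zeros
Xs = [np.random.default_rng(m).normal(size=(6, 4)) for m in range(2)]
S = [gaussian_affinity(X) for X in Xs]
T = sum(S) / len(S)
F_Q = np.zeros((3, 2))   # likelihoods of 3 testing subjects over 2 classes
```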
Estimation of Affinity Matrix S^m for Each Modality. Removing the terms unrelated
to S^m in Eq. (3), the optimization of S^m falls to the following objective function:

    arg min_{S^m} Σ_{i,j=1}^N { ‖x_i^m − x_j^m‖²₂ s_ij^m + λ₁ (s_ij^m)² + λ₂ (s_ij^m − t_ij)² }    (4)

where 0 ≤ s_ij^m ≤ 1 and (s_i^m)′1 = 1. Since Eq. (4) is separable over i, we can
reformulate it in vector form as:

    arg min_{s_i^m} ‖ s_i^m + d_i/(2 r₁) ‖²₂    (5)

where s_i^m is the i-th column vector of the affinity matrix S^m, d_i = {d_ij}_{j=1,…,N} is a vector
with each d_ij = ‖x_i^m − x_j^m‖²₂ − 2 λ₂ t_ij, and r₁ = λ₁ + λ₂. The problem in Eq. (5) is
equivalent to a projection onto the simplex, which has a closed-form solution [7]. After we
solve for each s_i^m, we obtain the affinity matrix S^m.
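The sorting-based closed-form simplex projection referred to above (the kind of solution cited from [7]) can be sketched as follows; the toy values of `d_i` and `r1` are our own, chosen only to exercise the function.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1}."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / ks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

# Eq. (5): the optimal column s_i is the projection of -d_i / (2 r1)
d_i, r1 = np.array([0.3, -0.2, 1.5, 0.1]), 0.5
s_i = project_simplex(-d_i / (2.0 * r1))
```

The result is non-negative and sums to one, as the simplex constraint in Eq. (4) requires.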

Estimate the Intrinsic Data Representation T. Fixing S^m and F, the objective
function w.r.t. T reduces to:

    arg min_T Σ_{i,j=1}^N { μ ‖f_i − f_j‖²₂ t_ij + λ₂ Σ_{m=1}^M (s_ij^m − t_ij)² }    (6)

Similarly, we can reformulate Eq. (6) by solving for each t_i at a time:

    arg min_{t_i} ‖ t_i + h_i/(2 r₂) ‖²₂    (7)

where h_i = {h_ij}_{j=1,…,N} is a vector with each element
h_ij = μ ‖f_i − f_j‖²₂ − 2 λ₂ Σ_{m=1}^M s_ij^m, and r₂ = M λ₂ is a scalar.
Update the Latent Labels F_Q on Testing Subjects. Given both S^m and T, the
objective function for the latent labels F_Q can be derived from Eq. (3) as below:

    arg min_F Σ_{i,j=1}^N ‖f_i − f_j‖²₂ t_ij  ⇒  arg min_F Trace(F′ L F),    (8)

where Trace(·) denotes the matrix trace operator and L = D − (T′ + T)/2 is the
Laplacian matrix of T, with D the diagonal degree matrix of (T′ + T)/2. By
differentiating Eq. (8) w.r.t. F and setting the gradient LF = 0, we obtain

    [ L_PP  L_PQ ] [ F_P ]
    [ L_QP  L_QQ ] [ F_Q ] = 0,

where L_PP, L_PQ, L_QP, and L_QQ denote the top-left, top-right, bottom-left, and
bottom-right blocks of L. The solution for F_Q is then F̂_Q = −(L_QQ)^{−1} L_QP F_P.
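This closed-form propagation step can be sketched on a toy graph (our own example, not from the paper; the first P rows of T correspond to the labeled training subjects):

```python
import numpy as np

def propagate_labels(T, F_P):
    """Solve Eq. (8): F_Q = -(L_QQ)^-1 L_QP F_P on the graph T."""
    W = (T + T.T) / 2.0                        # symmetrized edge weights
    L = np.diag(W.sum(axis=1)) - W             # graph Laplacian
    P = F_P.shape[0]
    return -np.linalg.solve(L[P:, P:], L[P:, :P] @ F_P)

# subject 2 is unlabeled and strongly tied to subject 0 (class 0)
T = np.array([[0.0, 0.1, 0.9],
              [0.1, 0.0, 0.1],
              [0.9, 0.1, 0.0]])
F_P = np.eye(2)                                # one-hot labels of subjects 0, 1
F_Q = propagate_labels(T, F_P)
```

The unlabeled subject inherits most of its likelihood from its strongest neighbor's class, as expected from the graph weights.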

Discussion. Taking the MRI and PET modalities as an example, Fig. 2(a) illustrates the
optimization of Eq. (3) by alternating the following three steps. (1) Estimate each

Fig. 2. (a) The dynamic procedure of the proposed pGTL method, (b) Classification accuracy as
a function of the number of training samples used.

affinity matrix S^m, which depends on the observed image features x^m and the currently
estimated intrinsic data representation T (red arrows); (2) estimate the intrinsic data
representation T, which requires the estimates of both S^1 and S^2 as well as the
subject-wise relationships in the label domain (purple arrows); (3) update the latent
labels F_Q on the testing subjects, guided by the learned intrinsic data
representation T (blue arrows). It is apparent that the intrinsic data representation
T links the feature domain and the label domain, which eventually leads to the dynamic
graph learning model.

3 Experiments

Subject Information and Image Processing. In the following experiments, we select


93 AD subjects, 202 MCI subjects, and 101 NC subjects from the ADNI dataset.
Since MCI is a highly heterogeneous group, we further separate it into 55 progressive
MCI (pMCI) subjects, who developed AD within the next 24 months, and 63 stable
MCI (sMCI) subjects, who did not convert to AD within 24 months. The remaining
MCI subjects comprise one group that did not convert within 24 months but converted
by 36 months, and another group with baseline observations but missing information
at 24 months. Each subject has both MR and 18-Fluoro-DeoxyGlucose PET
(FDG-PET) images.
For each subject, we first align the PET image to the MR image. Then we remove the
skull and cerebellum from the MR image and segment it into white matter, gray
matter, and cerebrospinal fluid. Next, we parcellate each subject image into 93 ROIs
(regions of interest) by registering a template (with manual annotations of the 93 ROIs)
to the subject image domain. Finally, the gray matter volume and the mean PET
intensity in each ROI are concatenated to form a 186-dimensional feature vector.
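The per-subject feature vector is then a simple concatenation; the sketch below uses random numbers in place of the real ROI measurements, purely for illustration.

```python
import numpy as np

n_roi = 93
rng = np.random.default_rng(42)
gm_volume = rng.random(n_roi)   # gray-matter volume per ROI (placeholder values)
pet_mean = rng.random(n_roi)    # mean FDG-PET intensity per ROI (placeholder)
feature = np.concatenate([gm_volume, pet_mean])   # 186-dimensional vector
```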
Experiment Settings. First, we evaluate our proposed pGTL method in comparison
with classic classification methods, such as Canonical Correlation Analysis
(CCA)-based SVM [8] (denoted as CCA in the following), Multi-Kernel SVM
(MKSVM) [9], and a conventional GTL method, since these methods are widely used
in AD studies. In order to demonstrate the overall performance of our method on several
classification tasks, i.e., AD vs. NC, MCI vs. NC, and pMCI vs. sMCI, in each
experiment we use a 10-fold cross-validation strategy, with 9 folds of data as the training
dataset and the remaining fold as the testing dataset. Second, we compare our proposed
method
with three recently published state-of-the-art classification methods: (1) random-forest
based classification method [10], (2) multi-modal graph-fusion method [4], and
(3) a multi-modal deep learning method [11]. It is worth noting that we only use the
classification accuracies reported in their papers, in order to ensure a fair comparison.
Parameter Settings. In the following experiments, we use the same greedy strategy to
select the best parameters for CCA, MKSVM, and our proposed method. For example, we
obtain the optimal values of μ, λ₁, and λ₂ in our method by exhaustive search over the
range 10⁻³ to 10³ on a small portion of the training dataset.
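The exhaustive search can be sketched as follows; `evaluate` is our own placeholder for the cross-validated accuracy of one parameter triple, and the toy objective below exists only to exercise the search.

```python
import itertools
import numpy as np

grid = [10.0 ** e for e in range(-3, 4)]   # 10^-3, 10^-2, ..., 10^3

def tune(evaluate):
    """Score every (mu, lambda1, lambda2) triple and return the best one.
    `evaluate` should return a validation accuracy for the given triple."""
    return max(itertools.product(grid, repeat=3),
               key=lambda params: evaluate(*params))

# toy stand-in objective peaking at mu = 1, lambda1 = 0.1, lambda2 = 10
def toy(mu, lam1, lam2):
    return -(np.log10(mu) ** 2 + (np.log10(lam1) + 1) ** 2
             + (np.log10(lam2) - 1) ** 2)

best = tune(toy)
```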
Comparison with Classic CCA, GTL and Multi-kernel SVM (MKSVM). The
classification accuracies of CCA, MKSVM, GTL, and our method are evaluated on
three classification tasks (AD vs. NC, MCI vs. NC, and pMCI vs. sMCI), respectively.
The average classification accuracy (ACC), sensitivity (SEN), and specificity
(SPE) under 10-fold cross-validation are summarized in Table 1. It is clear that our
proposed method outperforms the other competing classification methods in all three
classification tasks, with significant improvement under a paired t-test (p < 0.001,
designated by '*' in Table 1).

Table 1. Comparison of classification performance by different methods.

Furthermore, we evaluate the classification performance w.r.t. the number of


training samples, as shown in Fig. 2(b). It is clear that (1) our proposed method always
has higher classification accuracy than both CCA and MKSVM methods; and (2) all
methods can improve the classification accuracy as the number of training samples
increases. It is worth noting that our proposed method achieves a large improvement
over MKSVM when only 10 % of the data is used as the training dataset. The reason is
Table 2. Comparison with the classification accuracies reported in the literature (%).

Method               Subject information         Modality                    AD/NC  MCI/NC
Random forest [10]   37 AD + 75 MCI + 35 NC      MRI + PET + CSF + Genetic   89.0   74.6
Graph fusion [4]     35 AD + 75 MCI + 77 NC      MRI + PET + CSF + Genetic   91.8   79.5
Deep learning [11]   85 AD + 169 MCI + 77 NC     MRI + PET                   91.4   82.1
Our method           99 AD + 202 MCI + 101 NC    MRI + PET                   92.6   78.6

that supervised methods require a sufficient number of samples to train a reliable
classifier. Since training samples with known labels are expensive to collect in the
medical imaging area, this experiment indicates that our method has high potential to
be deployed in current neuroimaging studies.
Comparison with Recently Published State-of-the-Art Methods. Table 2 summa-
rizes the subject information, imaging modality, and average classification accuracy by
using state-of-the-art methods. These comparison methods represent four typical
machine learning techniques. Since the classification between pMCI and sMCI groups
are not reported in [4, 10, 11], we only show the classification results for AD vs NC,
and MCI vs NC tasks. Our method achieves higher classification accuracy than both
random forest and graph fusion methods, even though those two methods use addi-
tional CSF and genetic information.
Discussion. The deep learning approach in [11] learns feature representations in a
layer-by-layer manner; thus, it is time-consuming to re-train the deep neural network
from scratch. Instead, our proposed method only uses hand-crafted features for
classification. It is noteworthy that we can complete the classification on a new dataset
(including greedy parameter tuning) within three hours on a regular PC (8 CPU cores
and 16 GB of memory), which is much more economical than the massive training cost
in [11]. Since complementary information in multi-modal data can help improve
classification performance, we combine our proposed pGTL with multi-modal
information in order to find the intrinsic data representation.

4 Conclusion

In this paper, we presented a novel pGTL method to identify individual subjects at
different stages of AD progression using multi-modal imaging data. Compared to
conventional methods, our method seeks the intrinsic data representation, which can be
learned from the observed imaging features and simultaneously validated on the
existing labels of the training data. Since the learned intrinsic data representation is more
relevant to label propagation, our method achieves promising classification
performance on the AD vs. NC, MCI vs. NC, and pMCI vs. sMCI tasks, after
comprehensive comparison with classic and recent state-of-the-art methods.
Progressive Graph-Based Transductive Learning 299

References
1. Thompson, P.M., Hayashi, K.M., et al.: Tracking Alzheimer’s disease. Ann. NY Acad. Sci.
1097, 198–214 (2007)
2. Zhu, X., Suk, H.-I., et al.: A novel matrix-similarity based loss function for joint regression
and classification in AD diagnosis. NeuroImage 100, 91–105 (2014)
3. Jin, Y., Shi, Y., et al.: Automated multi-atlas labeling of the fornix and its integrity in
Alzheimer’s disease. In: 2015 IEEE 12th ISBI, pp. 140–143. IEEE (2015)
4. Tong, T., Gray, K., Gao, Q., Chen, L., Rueckert, D.: Nonlinear graph fusion for multi-modal
classification of Alzheimer’s disease. In: Zhou, L., Wang, L., Wang, Q., Shi, Y. (eds.)
MICCAI 2015. LNCS, vol. 9352, pp. 77–84. Springer, Heidelberg (2015)
5. Wang, B., Mezlini, A.M., et al.: Similarity network fusion for aggregating data types on a
genomic scale. Nat. Methods 11, 333–337 (2014)
6. Zhang, Y., Huang, K., et al.: MTC: a fast and robust graph-based transductive learning
method. IEEE Trans. Neural Netw. Learn. Syst. 26, 1979–1991 (2015)
7. Huang, H., Yan, J., Nie, F., Huang, J., Cai, W., Saykin, A.J., Shen, L.: A new sparse simplex
model for brain anatomical and genetic network analysis. In: Mori, K., Sakuma, I., Sato, Y.,
Barillot, C., Navab, N. (eds.) MICCAI 2013, Part II. LNCS, vol. 8150, pp. 625–632.
Springer, Heidelberg (2013)
8. Thompson, B.: Canonical correlation analysis. In: Encyclopedia of Statistics in Behavioral
Science (2005)
9. Gönen, M., Alpaydın, E.: Multiple kernel learning algorithms. J. Mach. Learn. Res. 12,
2211–2268 (2011)
10. Gray, K., Aljabar, P., et al.: Random forest-based similarity measures for multi-modal
classification of Alzheimer’s disease. NeuroImage 65, 167–175 (2013)
11. Liu, S., Liu, S., et al.: Multimodal neuroimaging feature learning for multiclass diagnosis of
Alzheimer’s disease. IEEE Trans. Biomed. Eng. 62, 1132–1141 (2015)
Structured Outlier Detection in Neuroimaging
Studies with Minimal Convex Polytopes

Erdem Varol(B) , Aristeidis Sotiras, and Christos Davatzikos

Center for Biomedical Image Computing and Analytics, University of Pennsylvania,
Philadelphia, PA 19104, USA
erdem.varol@uphs.upenn.edu

Abstract. Computer-assisted imaging aims to characterize disease
processes by contrasting healthy and pathological populations. The sensitivity
of these analyses is hindered by the variability in the neuroanatomy
of the normal population. To alleviate this shortcoming, it is necessary
to define a normative range of controls. Moreover, elucidating the struc-
ture in outliers may be important in understanding diverging individu-
als and characterizing prodromal disease states. To address these issues,
we propose a novel geometric concept called minimal convex polytope
(MCP). The proposed approach is used to simultaneously capture high
probability regions in datasets consisting of normal subjects, and delin-
eate outliers, thus characterizing the main directions of deviation from
the normative range. We validated our method using simulated datasets
before applying it to an imaging study of elderly subjects consisting
of 177 controls, 123 Alzheimer’s disease (AD) and 285 mild cognitive
impairment (MCI) patients. We show that cerebellar degeneration is a
major type of deviation among the controls. Furthermore, our findings
suggest that a subset of AD patients may be following an accelerated
type of deviation that is observed among the normal population.

1 Introduction
Mass-univariate and multivariate pattern analysis techniques aim to reveal dis-
ease effects by comparing a patient group to the control population [1,9]. The
latter is commonly assumed to be homogeneous. However, as noted in recent
works [6,13], controls may often consist of subjects that are outside a normative
range, and this may confound the actual pathological effect when comparing
against the patient group. The confounding effect may be remedied by identify-
ing a normative range and removing outliers that lie outside this range.
There have been two main directions of outlier detection in the context of
neuroimaging. The first class of methods includes parametric models that aim to
select a subset of samples such that the determinant of the covariance matrix
is minimized. This is in contrast to non-parametric methods such as the one-
class support vector machine (OC-SVM) [7,13,14], which attempts to separate a
subset of samples from the origin with maximum margin in the Gaussian radial
basis function (GRBF) kernel space. Another complementary non-parametric

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 300–307, 2016.
DOI: 10.1007/978-3-319-46720-7_35

approach is the support vector data description (SVDD) [15] whose objective is
to solve for the smallest radius hypersphere that encloses a subset of the samples
(Fig. 1b). All of the aforementioned outlier detection methods effectively capture
the main probability mass of a dataset and delineate samples outside this region
as outliers. However, they do not provide further information about whether
there are different types of outliers. In this work, we posit that there may be
a structure by which outliers deviate from the normal population. Capturing
this structure may be instrumental in characterizing and understanding how
pathogenesis originates from those who are healthy. Thus, the overall aim of our
approach is to learn the organization by which samples deviate from the main
probability mass.
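For concreteness, the OC-SVM baseline discussed above can be run in a few lines with scikit-learn; this is a generic sketch on toy data (the dataset and parameter values are placeholders, not those of the study), where ν plays the role of the bound on the outlier fraction:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))        # stand-in dataset

# Separate the samples from the origin with maximum margin in the
# GRBF kernel space; nu upper-bounds the fraction of training outliers.
oc = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X)
inlier = oc.predict(X) == 1              # +1 inlier, -1 outlier
```

Note that, as the text observes, such a detector only flags samples as inside or outside the normative region; it says nothing about *how* the outliers are organized.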
We resolve the limitation of prior methods regarding learning the structure
of outliers by containing the high-probability region of a dataset using convex
polytopes [16]. The geometry of our formulation allows us to simultaneously enclose
the normative samples within the convex polytope while excluding outliers with
maximum margin. The assignment of outliers to unique faces of the convex
polytope permits our formulation to be posed as a clustering problem. This
clustering allows us to subtype the directions of deviation from the normal population.
The remainder of this paper is organized as follows. In Sect. 2 we detail the
proposed approach, while experimental validation follows in Sect. 3. Section 4
concludes the paper with our final remarks.

2 Method

To learn the organization by which samples deviate from the main probability
mass, we aim to find the minimal convex polytope (MCP) that excludes ρ percent
of the samples with maximum margin. The convex polytope is minimal in
the sense that the radius of the largest hypersphere that is circumscribed within
the polytope is the minimum possible. Furthermore, the convex polytope is maximum
margin in the sense that the margin between samples within the polytope
and the outliers surrounding the polytope is maximized (Fig. 1c).


Fig. 1. (a) A simulated dataset with three deviations from normal; (b) the minimum
hypersphere that excludes ρ percent of samples; (c) Proposed solution: minimum convex
polytope (MCP) that excludes ρ percent of samples. Note that the MCP characterizes
the types of deviations by associating outliers to different faces (indicated by colors
orange, green and blue).

The previous problem involves two steps. The first step is to find the minimal
hypersphere that excludes ρ percent of the samples, and the second is to find the
convex polytope that circumscribes this hypersphere. Let x_i ∈ R^d, for i = 1, . . . , n,
denote the i-th d-dimensional sample in the dataset. The minimal hypersphere
that excludes ρ percent of the samples can be cast as the following optimization
problem:

$$\min_{R,\,x_c}\; R^2 \;+\; \frac{1}{n\rho}\sum_{i=1}^{n}\max\bigl\{0,\; \|x_i - x_c\|_2^2 - R^2\bigr\}, \qquad (1)$$

where R describes the radius and x_c denotes the center of the hypersphere. This
problem is convex [15] and can be solved using LIBSVM^1.
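The paper solves this step with the LIBSVM SVDD tool; purely as an illustration of the objective itself, a naive subgradient descent also works on toy data (the learning rate, iteration count, initialization, and data below are arbitrary choices, not from the paper):

```python
import numpy as np

def svdd_hypersphere(X, rho=0.1, lr=0.02, n_iter=3000):
    """Subgradient descent on the SVDD-style objective of Eq. (1):
    R^2 + 1/(n*rho) * sum_i max(0, ||x_i - x_c||^2 - R^2)."""
    n = len(X)
    c = X.mean(axis=0)                               # center init: sample mean
    r2 = np.median(((X - c) ** 2).sum(axis=1))       # R^2 init: median sq. distance
    for _ in range(n_iter):
        sq = ((X - c) ** 2).sum(axis=1)
        outside = sq > r2                            # samples paying slack
        g_r2 = 1.0 - outside.sum() / (n * rho)       # subgradient w.r.t. R^2
        g_c = -2.0 * (X[outside] - c).sum(axis=0) / (n * rho)
        r2 = max(r2 - lr * g_r2, 0.0)
        c = c - lr * g_c
    return c, np.sqrt(r2)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))                    # toy data
c, R = svdd_hypersphere(X, rho=0.1)
frac_outside = (((X - c) ** 2).sum(axis=1) > R ** 2).mean()
```

At a stationary point the subgradient with respect to R² vanishes, which is why roughly a ρ fraction of the samples ends up outside the sphere.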
Once the dichotomy between the outliers and normative samples has been
established, the maximum-margin convex polytope [16] that separates the out-
liers from the normative samples can be cast as the following objective:

$$
\min_{\substack{\{w_j,\,b_j\}_{j=1}^{K},\ \{a_{i,j}\}\\ \sum_j a_{i,j}=1,\ a_{i,j}\ge 0}}\;
\underbrace{\sum_{j=1}^{K}\|w_j\|_1}_{\text{regularization/margin}}
+ C\Biggl[\,\underbrace{\sum_{\substack{i:\ \|x_i-x_c\|_2\le R\\ j=1,\dots,K}}\frac{1}{K}\max\{0,\ 1+w_j^{T}x_i+b_j\}}_{\text{loss for normative samples}}
+ \underbrace{\sum_{\substack{i:\ \|x_i-x_c\|_2> R\\ j=1,\dots,K}} a_{i,j}\,\max\{0,\ 1-w_j^{T}x_i-b_j\}}_{\text{assignment \& loss for outliers}}\Biggr].
\qquad (2)
$$

This objective bears resemblance to standard large-margin classifiers such as the
SVM. The first term encourages sparsity to capture focal directions of deviation,
which are often encountered in neuroimaging studies. The loss term is broken into
one part for the normative samples and another for the outliers. Specifically, the normative
samples are constrained to be in the negative halfspace of all faces of the polytope,
while the outliers are constrained to be in the positive halfspace of at least
one of the faces. This leads to an assignment problem, which is encoded by the
entries a_{i,j} of the matrix A that indicate whether the i-th sample belongs to the
j-th face of the polytope or not. The resulting formulation is non-convex, and an
iterative optimization alternating between the solutions of the faces, W, b, and the assignments, A,
is necessary.
When fixing the assignments, the problem can be solved by K applications
of weighted LIBSVM^2. On the other hand, when fixing the convex polytope, the
outliers can be assigned to the face that yields the maximum value of w_j^T x_i + b_j.
The overall optimization scheme is summarized in Algorithm 1.
^1 https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/svdd/.
^2 https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/weights/.

Algorithm 1. Minimal (enclosing) Convex Polytope (MCP)

Input: X ∈ R^{n×d}; C (loss penalty); ρ (outlier percentage); K (number of outlier subtypes)
Output: W ∈ R^{d×K}, b ∈ R^{1×K} (outlier-excluding convex polytope); A ∈ [0, 1]^{n×K} (outlier subtype assignment)
Outlier delineation: solve for R, x_c in Eq. (1) using LIBSVM-SVDD^1
Initialization: initialize the outlier assignments A randomly
Loop: repeat until convergence (or a fixed number of iterations)
• Fix A: solve for W, b with weighted LIBSVM^2, using the weights
    w_{i,j} = C/K        if ||x_i − x_c||_2 ≤ R
    w_{i,j} = C a_{i,j}  if ||x_i − x_c||_2 > R
• Fix W, b: solve for A:
    a_{i,j} = 1/K  if ||x_i − x_c||_2 ≤ R
    a_{i,j} = 1    if ||x_i − x_c||_2 > R and j = argmax_{j'} (w_{j'}^T x_i + b_{j'})
    a_{i,j} = 0    otherwise
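The alternating scheme of Algorithm 1 can be sketched in a few dozen lines. The sketch below is an assumption-laden illustration: scikit-learn's `LinearSVC` (l1 penalty, squared hinge) stands in for the weighted hinge-loss LIBSVM subproblem, and the data and parameters are toy values, not those of the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC

def fit_mcp(X, is_outlier, K=3, C=1.0, n_iter=10, seed=0):
    """Alternating optimization of the MCP faces: fix the assignments A and
    fit one weighted linear SVM per face, then reassign each outlier to its
    maximally violated face."""
    rng = np.random.default_rng(seed)
    norm_idx = np.where(~is_outlier)[0]
    out_idx = np.where(is_outlier)[0]
    assign = rng.integers(0, K, size=len(out_idx))      # random init of A
    W, b = np.zeros((X.shape[1], K)), np.zeros(K)
    for _ in range(n_iter):
        for j in range(K):
            pos = out_idx[assign == j]
            if len(pos) == 0:                            # empty face: skip
                continue
            idx = np.concatenate([norm_idx, pos])
            y = np.concatenate([-np.ones(len(norm_idx)), np.ones(len(pos))])
            # sample weights mimic C/K (normative) vs C*a_ij (assigned outliers)
            sw = np.concatenate([np.full(len(norm_idx), 1.0 / K),
                                 np.ones(len(pos))])
            clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False,
                            C=C, max_iter=5000)
            clf.fit(X[idx], y, sample_weight=sw)
            W[:, j], b[j] = clf.coef_.ravel(), clf.intercept_[0]
        assign = (X[out_idx] @ W + b).argmax(axis=1)     # reassign outliers
    return W, b, assign

# Toy data: a normative blob with three directions of deviation (as in Fig. 1a).
rng = np.random.default_rng(0)
norm = rng.normal(0, 0.3, size=(150, 2))
dirs = np.array([[4.0, 0.0], [-2.0, 3.5], [-2.0, -3.5]])
outs = np.vstack([d + rng.normal(0, 0.3, size=(20, 2)) for d in dirs])
X = np.vstack([norm, outs])
is_outlier = np.arange(len(X)) >= 150
W, b, assign = fit_mcp(X, is_outlier, K=3)
```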

2.1 Model Selection

The proposed MCP model is ultimately a clustering method whose performance
depends on the selection of the following three parameters: (1) K, the number of
deviation subtypes; (2) ρ, the outlier percentage; (3) C, the loss penalty for violating the
margin. We choose the parameter combination that yields the most stable clus-
tering [2]. To measure stability, we compute the average pairwise adjusted Rand
index (ARI) [8] in a 10-fold cross-validation setting. The considered parameter
space is K ∈ {1, . . . , 9}, ρ ∈ {0.1, 0.2, 0.3, 0.4, 0.5}, and C ∈ {10^-3, . . . , 10^1}.
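The stability criterion can be illustrated generically. The snippet below is a sketch: K-means stands in for the MCP subtyping step, and the subsample fraction and data are arbitrary; the score is the average pairwise ARI between clusterings fit on random subsamples, evaluated on the samples the two fits share:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability_score(X, K, n_fits=10, frac=0.9, seed=0):
    """Average pairwise adjusted Rand index between clusterings fit on
    random subsamples of X (shared samples only)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    labelings = []
    for _ in range(n_fits):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        lab = np.full(n, -1)                 # -1 marks unsampled subjects
        lab[idx] = KMeans(n_clusters=K, n_init=10,
                          random_state=0).fit_predict(X[idx])
        labelings.append(lab)
    aris = [adjusted_rand_score(a[(a >= 0) & (c >= 0)], c[(a >= 0) & (c >= 0)])
            for i, a in enumerate(labelings) for c in labelings[i + 1:]]
    return float(np.mean(aris))

# Three well-separated blobs: K = 3 should be highly stable.
rng = np.random.default_rng(1)
X = np.vstack([np.array(ctr) + rng.normal(0, 0.2, size=(60, 2))
               for ctr in [(0, 0), (5, 0), (0, 5)]])
s3 = stability_score(X, K=3)
```

ARI is invariant to label permutations, so two fits that recover the same partition score 1 regardless of how their cluster indices are numbered.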

3 Experimental Validation

3.1 Simulated Data

Due to the lack of ground truth in clinical datasets and the need to quantitatively
evaluate performance, we validated our method on two simulated datasets where
the number of directions of deviation from the normal was determined a priori.
Both datasets were composed of 1000 samples and 150 features. 130 of the 150
features were drawn from a zero-mean, unit-variance, multivariate Gaussian
distribution. For the first dataset, the remaining 20 features were replicates of the
random variable that is uniformly distributed within a unit-side-length
equilateral triangle (as in Fig. 1a). Thus, the number of simulated deviations
from the spherical white noise was three for this dataset. The second dataset was
analogously generated, except that the 20 signal-carrying features were replicates
of the random variable that is uniformly distributed within a unit-side-length
square. Hence, this dataset was designed to yield four types of outliers.
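The triangular dataset can be generated as follows. This is a sketch under the assumption that the 20 signal-carrying features are ten copies of the 2-D triangular coordinates; the function names are illustrative, and the paper does not spell out the exact replication layout:

```python
import numpy as np

def sample_triangle(n, rng):
    """Uniform samples inside a unit-side equilateral triangle,
    via rejection-free barycentric sampling."""
    r1, r2 = rng.random(n), rng.random(n)
    s = np.sqrt(r1)
    A = np.array([0.0, 0.0])                       # triangle vertices
    B = np.array([1.0, 0.0])
    C = np.array([0.5, np.sqrt(3) / 2])
    return ((1 - s)[:, None] * A
            + (s * (1 - r2))[:, None] * B
            + (s * r2)[:, None] * C)

def make_triangular_dataset(n=1000, seed=0):
    """130 spherical-noise features plus 20 triangular signal features."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n, 130))
    tri = sample_triangle(n, rng)
    signal = np.tile(tri, (1, 10))                 # 10 copies of 2-D signal = 20 features
    return np.hstack([noise, signal])

X = make_triangular_dataset()
```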
For the triangular dataset, the parameter selection revealed that the most
stable clustering occurs at K = 3, ρ = 0.1, C = 0.01 (Fig. 2a), while for the
square dataset, the most stable clustering occurred at K = 4, ρ = 0.5, C = 0.01


(a) Triangular simulated dataset (b) Square simulated dataset

Fig. 2. The parameter selection for (a) triangular simulated dataset, and (b) square
simulated dataset. (a) K = 3, ρ = 0.1, C = 0.01 were selected, (b) K = 4, ρ = 0.5, C =
0.01 were selected. Different solid lines indicate the ARI of MCP at different values
of ρ at the maximum ARI yielding C parameter. Black dashed lines indicate the ARI
of K-means for comparison. Note that MCP yields more stable clusterings that align
with the ground truth.

(Fig. 2b). For both of these datasets, the ARI values for the optimal K were com-
parable across varying ρ and C, which indicates that the most important direc-
tions of deviation were captured regardless of the amount of outliers searched.
These results demonstrate the ability of MCP to capture the underlying direc-
tions of deviation.
For comparison, K-means clustering was applied to the same datasets (see
Fig. 2a, b, dashed lines). For the triangular and square datasets, K = 2 and
K = 3 yielded the most stable clusterings, respectively. This demonstrates that
K-means was not able to accurately capture the main directions of deviation,
but was most likely grouping outliers with the normative samples.

3.2 Application to a Study of Alzheimer’s Disease


The proposed method was applied to a subset of the ADNI study3 which is
composed of magnetic resonance imaging (MRI) scans of 177 controls (CN), 123
Alzheimer’s disease (AD) patients and 285 mild cognitive impairment (MCI)
patients. T1-weighted MRI volumetric scans were obtained at 1.5 Tesla. The
images were pre-processed through a pipeline consisting of (1) alignment to the
Anterior and Posterior Commissures plane; (2) skull-stripping; (3) N3 bias cor-
rection; (4) deformable mapping to a standardized template space. Following
these steps, a low-level representation of the tissue volumes was extracted by
automatically partitioning the MRI volumes of all participants into 153 volumet-
ric regions of interest (ROI) spanning the entire brain. The ROI segmentation
was performed by applying a multi-atlas label fusion method [4]. The derived
ROIs were used as the input features for our method. Before training the model,
all ROIs were linearly residualized to remove the effect of age and sex [5].
^3 http://adni.loni.usc.edu/data-samples/mri/.
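The residualization step (removing linear age and sex effects from each ROI feature) amounts to per-feature ordinary least squares; a minimal sketch, with illustrative variable names:

```python
import numpy as np

def residualize(features, covariates):
    """Linearly residualize each feature column w.r.t. the covariates
    (e.g. age, sex) via least squares, keeping the residuals."""
    Z = np.column_stack([np.ones(len(covariates)), covariates])  # intercept + covariates
    beta, *_ = np.linalg.lstsq(Z, features, rcond=None)
    return features - Z @ beta

# Toy demo: two ROI features with linear age and sex effects plus noise.
rng = np.random.default_rng(0)
age = rng.uniform(60, 90, size=200)
sex = rng.integers(0, 2, size=200).astype(float)
roi = np.column_stack([2.0 * age + rng.normal(size=200),
                       3.0 * sex + rng.normal(size=200)])
roi_res = residualize(roi, np.column_stack([age, sex]))
```

Because the least-squares residuals are orthogonal to the covariate columns, the residualized features are uncorrelated with age and sex by construction.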


(c) D1: Cerebellar and brain stem degeneration associated with deviation subtype 1.

(d) D2: Widespread gray matter atrophy patterns associated with deviation subtype 2.

Fig. 3. (a) The parameter selection for ADNI control group, K = 2, ρ = 0.3, C = 1
yielded the highest clustering stability. (b) The projections of all ADNI subjects along
the two faces of the MCP. Normative samples (N) are in the negative orthant while
deviated subtypes are on the upper left (subtype 2) and lower right (subtype 1). (c,
d) The voxel-based group differences between all normative samples and deviation
subtype 1 (c), and deviation subtype 2 (d) are shown. Warmer colors indicate that the
normative group volume is greater, while colder colors indicate that the deviated group
volume is greater.

The method was applied only to the control group. The parameter selection
revealed that K = 2 subtypes, and 30 % outliers with C = 1 yielded the highest
clustering stability (Fig. 3a). Once the MCP that captured the normative con-
trols was found, it was used to subtype the rest of the ADNI dataset consisting
of AD and MCI subjects into three groups denoted by normative (N), deviation
subtype 1 (D1) and deviation subtype 2 (D2).
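Once W and b are learned, subtyping a new subject reduces to checking which face of the polytope, if any, is violated; a sketch, with hypothetical face parameters for the demo:

```python
import numpy as np

def subtype(X, W, b):
    """Assign each subject to the normative group (-1) or to the deviation
    subtype given by the maximally violated polytope face."""
    scores = X @ W + b                       # signed score w.r.t. each face
    labels = scores.argmax(axis=1)
    labels[scores.max(axis=1) <= 0] = -1     # inside all faces: normative
    return labels

# Hypothetical 2-face polytope around the origin and three test subjects.
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.array([-1.0, -1.0])
pts = np.array([[0.0, 0.0],    # inside -> normative
                [2.0, 0.0],    # violates face 0 -> subtype 0
                [0.0, 2.0]])   # violates face 1 -> subtype 1
labels = subtype(pts, W, b)
```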
The distribution of the entire ADNI dataset with respect to the MCP is illus-
trated in Fig. 3b. Furthermore, the demographic and clinical biomarker informa-
tion of CN, MCI and AD subjects within their respective subgroup is summarized
in Table 1. 56 % of AD and 62 % of MCI patients were categorized into the nor-
mative group. This indicated that the main type of AD and MCI neuropathology
was dissimilar to the deviations exhibited by the normal population. However,
a non-negligible portion, 37 % of AD and 28 % of MCI, was found to deviate

Table 1. Demographic and clinical characteristics of CN, AD, MCI subjects and their
grouping into the normative (N) or deviated subtypes (D1, D2). a – Mini mental
state exam. b – Presence of at least one APOE ε4 allele. c – Cerebrospinal fluid
(CSF) concentrations of Amyloid-beta (Aβ), total tau (t-tau), and phosphorylated tau
(p-tau). d – p-values using ANOVA between three subgroups

Normative and deviated subjects in ADNI study


Group AD MCI
Subtype AD-N AD-D1 AD-D2 p-valued MCI-N MCI-D1 MCI-D2 p-valued
n (%) 69 (56.0) 8 (6.5) 46 (37.3) 178 (62.4) 26 (9.1) 81 (28.4)
Age (years) 76.4 ± 6.7 74.4 ± 6.6 72.0 ± 7.8 0.007 75.4 ± 6.7 77.6 ± 6.3 71.9 ± 7.4 0.01
Sex (female), n (%) 36 (52.1) 3 (37.5) 23 (50.0) 0.73 59 (33.1) 6 (23.0) 33 (40.7) 0.15
MMSEa 23.6 ± 1.9 23.7 ± 1.9 23.5 ± 1.7 0.92 27.0 ± 1.7 27.0 ± 1.8 26.9 ± 1.7 0.89
APOE ε4b , n (%) 45 (65.2) 6 (75.0) 31 (67.3) 0.85 95 (53.3) 12 (46.1) 50 (61.7) 0.47
CSF Aβ (pg/mL)c 153.8 ± 49.2 146.2 ± 39.7 126.9 ± 23.3 0.03 158.8 ± 51.9 187.7 ± 59.2 164.9± 54.5 0.24
CSF t-tau (pg/mL)c 118.5 ± 59.0 104.2 ± 44.7 132.3 ± 59.0 0.49 90.0 ± 38.4 129.2 ± 71.0 106.2 ± 53.0 0.0029
CSF p-tau (pg/mL)c 37.9 ± 18.3 38 ± 14 45.5 ± 20.6 0.27 34.2 ± 16.2 38.1 ± 20.3 36.6 ± 16.5 0.05

Group CN
Subtype CN-N CN-D1 CN-D2 p-valued
n (%) 125 (70.6) 19 (10.7) 33 (18.6)
Age (years) 75.4 ± 5.3 78.1 ± 4.2 76.2 ± 4.7 0.10
Sex (female), n (%) 58 (46.4) 10 (52.6) 19 (57.5) 0.49
MMSEa 29.0 ± 1.1 29.0 ± 0.7 29.4 ± 0.8 0.19
APOE ε4b , n (%) 34 (27.2) 6 (31.5) 8 (24.2) 0.85
CSF Aβ (pg/mL)c 203.4 ± 55.7 233.7 ± 35.2 218.5 ± 52.3 0.21
CSF t-tau (pg/mL)c 67.4 ± 23.8 63.7 ± 22.6 73.9 ± 29.2 0.54
CSF p-tau (pg/mL)c 24.5 ± 14.5 22.4 ± 11.5 24.8 ± 11.4 0.90

along the second subtype direction, together with 18 % of CN. This suggested that
a sizeable portion of the normal population might have the propensity to deviate
towards AD-like pathology.
To better understand and interpret the neuroanatomical directions of these
deviations from the normative range, voxel-based analysis was performed on all
subjects in the normative group versus either of the two subtypes of deviations
using gray matter tissue density maps. The group differences are visualized in
Fig. 3.
There has been a substantial amount of research in the past that has demon-
strated that the normal pattern of aging consists of prefrontal and motor cortex
thinning along with increased ventricle size [11,12]. Corresponding manifesta-
tions of these patterns can be observed in group D2 (Fig. 3d). The significantly
younger ages of AD and MCI subjects (Table 1) that fall into this subtype may
indicate that the cognitive decline they exhibit may be caused by early and
accelerated aging that follows this pattern. Furthermore, the relatively lower
CSF amyloid-β and t-tau concentrations (Table 1) of these patients are another
strong indicator of AD [3].
On the other hand, the patterns seen in group D1 (Fig. 3c) indicate cerebellar
degeneration which is usually accompanied by brain stem atrophy [10]. Although
cerebellar thinning has been demonstrated to be part of normal aging, our find-
ings suggest that the increased rate of this degenerative pattern may be a type
of deviation. Lastly, it should be mentioned that the majority of the AD and
MCI subjects were not designated to be moving along either of the directions of

deviations of normal subjects. A possible explanation is that, for these particular
subjects, the deviation towards AD may have begun at an earlier time point,
which was not represented by the control subjects present in the study.

4 Conclusion
In summary, we have introduced a method that can simultaneously detect a
homogeneous normative group and define subtypes of outliers. This allows a
better understanding of the structure of deviations in control groups in neu-
roimaging cohorts. This, in turn, aids in the better interpretation of the patho-
logical processes, which occur when subjects diverge from the normative region.

References
1. Ashburner, J., Friston, K.J.: Voxel-based morphometry-the methods. Neuroimage
11(6), 805–821 (2000)
2. Ben-Hur, A., et al.: A stability based method for discovering structure in clustered
data. In: Pacific Symposium on Biocomputing, vol. 7, pp. 6–17 (2001)
3. Blennow, K.: Cerebrospinal fluid protein biomarkers for Alzheimer’s disease. Neu-
roRx 1(2), 213–225 (2004)
4. Doshi, J., et al.: MUSE: multi-atlas region segmentation utilizing ensembles of
registration algorithms and parameters, and locally optimal atlas selection. Neu-
roImage 127, 186–195 (2015)
5. Dukart, J., Schroeter, M.L., Mueller, K., Initiative, A.D.N., et al.: Age correction
in dementia-matching to a healthy brain. PLoS ONE 6(7), e22193 (2011)
6. Fritsch, V., et al.: Detecting outliers in high-dimensional neuroimaging datasets
with robust covariance estimators. Med. Image Anal. 16(7), 1359–1370 (2012)
7. Gardner, A.B., et al.: One-class novelty detection for seizure analysis from intracra-
nial EEG. J. Mach. Learn. Res. 7, 1025–1044 (2006)
8. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)
9. Kawasaki, Y., et al.: Multivariate voxel-based morphometry successfully differen-
tiates schizophrenia patients from healthy controls. Neuroimage 34(1), 235–242
(2007)
10. Luft, A.R., et al.: Patterns of age-related shrinkage in cerebellum and brainstem
observed in vivo using three-dimensional MRI volumetry. Cereb. Cortex 9(7), 712–
721 (1999)
11. Raz, N., Rodrigue, K.M.: Differential aging of the brain: patterns, cognitive corre-
lates and modifiers. Neurosci. Biobehav. Rev. 30(6), 730–748 (2006)
12. Salat, D.H., et al.: Thinning of the cerebral cortex in aging. Cereb. Cortex 14(7),
721–730 (2004)
13. Sato, J.R., et al.: An fMRI normative database for connectivity networks using
one-class support vector machines. Hum. Brain Mapp. 30(4), 1068–1076 (2009)
14. Schölkopf, B., et al.: Estimating the support of a high-dimensional distribution.
Neural Comput. 13(7), 1443–1471 (2001)
15. Tax, D.M., Duin, R.P.: Support vector data description. Mach. Learn. 54(1),
45–66 (2004)
16. Varol, E., Sotiras, A., Davatzikos, C.: HYDRA: revealing heterogeneity of imaging
and genetic patterns through a multiple max-margin discriminative analysis frame-
work. NeuroImage (2016)
Diagnosis of Alzheimer’s Disease Using
View-Aligned Hypergraph Learning
with Incomplete Multi-modality Data

Mingxia Liu, Jun Zhang, Pew-Thian Yap, and Dinggang Shen(B)

Department of Radiology and BRIC, University of North Carolina at Chapel Hill,
Chapel Hill, NC 27599, USA
dgshen@med.unc.edu

Abstract. Effectively utilizing incomplete multi-modality data for diag-
nosis of Alzheimer's disease (AD) is still an area of active research. Several
multi-view learning methods have recently been developed to deal
with missing data, with each view corresponding to a specific modality or
a combination of several modalities. However, existing methods usually
ignore the underlying coherence among views, which may lead to sub-
optimal learning performance. In this paper, we propose a view-aligned
hypergraph learning (VAHL) method to explicitly model the coherence
among the views. Specifically, we first divide the original data into several
views based on possible combinations of modalities, followed by a sparse
representation based hypergraph construction process in each view.
A view-aligned hypergraph classification (VAHC) model is then pro-
posed, by using a view-aligned regularizer to model the view coherence.
We further assemble the class probability scores generated from VAHC
via a multi-view label fusion method to make a final classification decision.
We evaluate our method on the baseline ADNI-1 database having
807 subjects and three modalities (i.e., MRI, PET, and CSF). Our
method achieves at least a 4.6 % improvement in classification accuracy
compared with state-of-the-art methods for AD/MCI diagnosis.

1 Introduction
Alzheimer’s disease (AD) is a neurodegenerative disease that continues to pose
major challenges to global health care systems [1]. Studies have shown that
multi-modality data (e.g., structural magnetic resonance imaging (MRI), flu-
orodeoxyglucose positron emission tomography (PET), and cerebrospinal fluid
(CSF)) provide complementary information that can be harnessed for improv-
ing diagnosis of AD and its prodrome, known as mild cognitive impairment
(MCI) [2–5]. However, collecting data with multi-modalities is challenging and
the data are often incomplete due to patient dropouts. In the Alzheimer’s Dis-
ease Neuroimaging Initiative (ADNI) database, for instance, while baseline MRI
D. Shen—This study was supported in part by NIH grants (EB006733, EB008374,
EB009634, MH100217, AG041721, AG042599, AG010129, AG030514, and
NS093842).

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 308–316, 2016.
DOI: 10.1007/978-3-319-46720-7_36

data were collected for all subjects, only approximately half of the subjects have
baseline PET data and half of the subjects have baseline CSF data.
Various approaches have been developed to deal with the problem of incom-
plete multi-modality data. A straightforward method is to remove subjects with
missing data. This approach, however, significantly reduces the sample size. An
alternative way is to impute the missing data using techniques such as expecta-
tion maximization (EM) [6], singular value decomposition (SVD) [7], and matrix
completion [5]. However, the effectiveness of this method can be affected by impu-
tation artifacts. Several recently introduced multi-view learning based methods
circumvent the need for imputation [3,4]. These methods generally apply specific
learning algorithms to different views of the data, comprising the combinations
of available data from different modalities. However, the coherence among views
is not explicitly considered in these methods. Intuitively, integrating these views
coherently can lead to better diagnostic performance. On the other hand, hyper-
graph learning [8] has attracted increasing attention in neuroimaging analysis,
where complex relationships among vertices can be modeled via hyperedges [9].
In this paper, we propose a view-aligned hypergraph learning (VAHL)
method with incomplete multi-modality data for AD/MCI diagnosis. Different
from conventional multi-view based learning methods, VAHL explicitly incor-
porates the coherence among views into the learning model, where the optimal
weights for different views are automatically learned from the data. Figure 1
presents a schematic diagram of our method. We first divide the whole dataset
into M views (M = 6 in Fig. 1) according to the data availability in association
with different combinations of modalities, followed by a sparse representation
based hypergraph construction process in each view space. We then develop a
view-aligned hypergraph classification (VAHC) model to explicitly capture the
coherence among views. To arrive at a final classification decision, we agglomer-
ate the class probability scores via a multi-view label fusion method.

Fig. 1. Overview of the proposed view-aligned hypergraph learning method.



2 Method
Data and Pre-processing: A total of 807 subjects in the baseline ADNI-
1 database [10] with MRI, PET and CSF modalities are used in this study,
which include 186 AD subjects, 226 NCs, and 395 MCI subjects. According
to whether MCI would convert to AD within 24 months, the MCI subjects are
further divided into two categories: (1) stable MCI (sMCI), if diagnosis was
MCI at all available time points (0–96 months); (2) progressive MCI (pMCI),
if diagnosis was MCI at baseline but conversion to AD occurred after baseline
within 24 months. The 395 MCI subjects are separated into 169 pMCI and 226
sMCI subjects.
Image features are extracted from the MR and PET images based on
regions-of-interest (ROIs). Specifically, for each MR image, we perform ante-
rior commissure (AC)-posterior commissure (PC) correction, resampling to size
256 × 256 × 256, and inhomogeneity correction using the N3 algorithm [11].
Skull stripping is then performed using BET [12], followed by manual editing
to ensure that both skull and dura are cleanly removed. Next, we remove the
cerebellum by warping a labeled template to each skull-stripped image. FAST
[12] is applied to segment the human brain into three different tissue types,
i.e., gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF). The
anatomical automatic labeling (AAL) atlas, with 90 pre-defined ROIs in the
cerebrum, is aligned to the native space of each subject using a deformable reg-
istration algorithm. Finally, for each subject, we extract the volumes of GM
tissue inside the 90 ROIs as features, which are normalized by the total intracra-
nial volume (estimated by the summation of GM, WM and CSF volumes from all
ROIs). We align each PET image to its corresponding MR image via affine trans-
formation and compute the mean PET intensity in each ROI as features. We
also employ five CSF biomarkers, including amyloid β (Aβ42), CSF total tau
(t-tau), CSF tau hyperphosphorylated at threonine 181 (p-tau), and two tau
ratios with respect to Aβ42 (i.e., t-tau/Aβ42 and p-tau/Aβ42). Ultimately, we
have a 185-dimensional feature vector for each subject with complete data modal-
ities, including 90 MRI features, 90 PET features, and 5 CSF features.
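The resulting 185-dimensional feature vector can be assembled as below; this is a sketch with illustrative function and argument names:

```python
import numpy as np

def assemble_features(gm_roi_vol, pet_roi_mean, csf_markers, icv):
    """Concatenate the per-subject features described above: 90 GM ROI
    volumes normalized by total intracranial volume, 90 mean PET ROI
    intensities, and 5 CSF biomarkers (185 values in total)."""
    return np.concatenate([np.asarray(gm_roi_vol) / icv,
                           np.asarray(pet_roi_mean),
                           np.asarray(csf_markers)])

# Toy demo with placeholder values.
vec = assemble_features(np.full(90, 2.0),   # GM ROI volumes
                        np.zeros(90),       # mean PET intensities
                        np.arange(5.0),     # 5 CSF biomarkers
                        icv=4.0)
```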

Multi-view Data Grouping: We group the subjects into 6 views, including
"PET + MRI", "PET + MRI + CSF", "MRI + CSF", "PET", "MRI", and
“CSF”. Here, each view denotes a specific modality or a possible combination of
several modalities. As shown in Fig. 1, subjects in View 1 have PET and MRI data,
while those in View 6 only have CSF data. This grouping allows us to make use of
all data without discarding subjects or introducing imputation artifacts.
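This grouping is a simple partition by observed modality combination; a sketch, with a hypothetical availability map:

```python
def group_views(availability):
    """Partition subjects by which modality combination is observed.
    `availability` maps subject id -> set of modality names; returns
    view-key -> list of subject ids, one view per observed combination."""
    views = {}
    for sid, mods in availability.items():
        key = "+".join(sorted(mods))
        views.setdefault(key, []).append(sid)
    return views

# Example: four subjects with different available modalities.
views = group_views({1: {"MRI", "PET"},
                     2: {"MRI"},
                     3: {"PET", "MRI"},
                     4: {"CSF"}})
```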

Sparse Representation Based Hypergraph Construction: In this study,
we formulate AD/MCI diagnosis as a multi-view hypergraph based classi-
fication problem, where one hypergraph is constructed in each view space. Let
G^m = (V^m, E^m, w^m) denote the hypergraph with N^m vertices corresponding
to the m-th view, where V^m is a vertex set with each vertex representing a
subject, E^m denotes a hyperedge set with N_e^m hyperedges, and w^m ∈ R^{N_e^m} is
the corresponding weight vector for the hyperedges. Denote H^m ∈ R^{N^m × N_e^m} as the
vertex-edge incidence matrix, with the (v, e)-entry indicating whether the vertex
v is connected with other vertices in the hyperedge e.
In conventional hypergraph based methods [8], the Euclidean distance is typ-
ically used to evaluate similarity between pairs of vertices. We argue that the
Euclidean distance can only model the local structure of data. To this end, we
propose a sparse representation (SR) based hypergraph construction method to
exploit the global structure of data. Specifically, we first select each vertex as a
centroid, and then represent each centroid using the other vertices via a SR model
[13]. A hyperedge can then be constructed by connecting each centroid to the
other vertices, with global sparse representation coefficients as similarity measurements.
Given N^m vertices, we can obtain N_e^m = N^m hyperedges. A larger
value of the ℓ1 regularization parameter in SR will lead to sparser
coefficients. To capture richer data structure information, we employ multiple
(e.g., q) parameters in SR to construct multiple sets of hyperedges, and finally
have N_e^m = qN^m hyperedges for the hypergraph G^m.
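A minimal numpy sketch of this SR-based hyperedge construction is given below. It codes each centroid over the remaining subjects with an ℓ1-regularized least squares solved by a few ISTA iterations (a stand-in for a production lasso solver); all sizes and the regularization value are toy choices, not the paper's settings.

```python
# Sketch of sparse-representation hyperedge construction: each subject
# (a column of X) is taken as a centroid and represented over the other
# subjects via the lasso; the absolute coefficients give one hyperedge
# (one column of the incidence matrix) per centroid.
import numpy as np

def sr_coefficients(X, i, lam, n_iter=200):
    """Code column i of X over the other columns via ISTA for the lasso."""
    d, n = X.shape
    idx = [j for j in range(n) if j != i]
    A, y = X[:, idx], X[:, i]
    L = np.linalg.norm(A, 2) ** 2 + 1e-12     # Lipschitz constant of the gradient
    c = np.zeros(n - 1)
    for _ in range(n_iter):
        g = A.T @ (A @ c - y)                 # gradient of 0.5*||Ac - y||^2
        z = c - g / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    out = np.zeros(n)
    out[idx] = c
    return out

def sr_incidence(X, lam):
    """Vertex-hyperedge incidence: column e holds SR weights of centroid e."""
    n = X.shape[1]
    return np.column_stack([np.abs(sr_coefficients(X, i, lam))
                            for i in range(n)])

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))       # 5 features, 8 subjects (toy)
H = sr_incidence(X, lam=0.1)
```

Running this with several regularization values and stacking the resulting incidence matrices column-wise would give the qN^m hyperedges described above.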

View-Aligned Hypergraph Classification: Denote f^m as the class probability
score vector for N subjects in the m-th view, and F = [f^1, · · · , f^m, · · · , f^M] ∈
R^{N×M}. To model the coherence among different views, we propose a view-aligned
regularizer, as illustrated in Fig. 2. For instance, the circles indicate subject 1
with PET, MRI and CSF features (i.e., x_1^PET, x_1^MRI and x_1^CSF), respectively.
Intuitively, their class probability scores (i.e., f_1^PET, f_1^MRI and f_1^CSF) should be
close to one another, because they represent the same subject. Let Ω^m ∈ R^{N×N}
be a diagonal matrix with the diagonal element Ω^m_{n,n} = 0 if the n-th subject
has missing values in the m-th view, and Ω^m_{n,n} = 1 otherwise. The view-aligned
regularizer is then defined as

\sum_{n=1}^{N} \sum_{m=1}^{M} \sum_{p=1}^{M} \Omega^m_{n,n} \Omega^p_{n,n} (f^m_n - f^p_n)^2 = \sum_{m=1}^{M} \sum_{p=1}^{M} (f^m - f^p)^T \Omega^m \Omega^p (f^m - f^p). \quad (1)
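The identity behind Eq. (1) can be checked numerically: because each Ω^m is a 0/1 diagonal availability mask, the per-subject triple sum equals the quadratic-form expression. All sizes and values below are toy stand-ins.

```python
# Numerical check of the view-aligned regularizer identity in Eq. (1).
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 3
F = rng.standard_normal((N, M))      # column m holds the score vector f^m
Omega = [np.diag(rng.integers(0, 2, N).astype(float)) for _ in range(M)]

# Left-hand side: per-subject sum over all view pairs.
lhs = sum(Omega[m][n, n] * Omega[p][n, n] * (F[n, m] - F[n, p]) ** 2
          for n in range(N) for m in range(M) for p in range(M))
# Right-hand side: matrix quadratic form over view pairs.
rhs = sum((F[:, m] - F[:, p]) @ Omega[m] @ Omega[p] @ (F[:, m] - F[:, p])
          for m in range(M) for p in range(M))
```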

Fig. 2. Illustration of the view-aligned regularizer with PET, MRI, and CSF data.
312 M. Liu et al.

Using the hypergraph constructed in the m-th view, the objective of hypergraph
based semi-supervised learning is formulated as [8]

\min_{f^m} R_{emp}(f^m) + R_{reg}(f^m), \quad (2)

where the first term is the empirical loss, and the second term is a hypergraph
regularizer [8] defined as

R_{reg}(f^m) = \sum_{e \in E^m} \sum_{u,v \in V^m} \frac{w^m_e h^m_{u,e} h^m_{v,e}}{\delta^m_e} \left( \frac{f^m_u}{\sqrt{d^m_u}} - \frac{f^m_v}{\sqrt{d^m_v}} \right)^2 = (f^m)^T \tilde{L}^m f^m, \quad (3)

where the hypergraph Laplacian matrix is defined as \tilde{L}^m = I - \Theta^m. Here,
\Theta^m = (D^m_v)^{-1/2} H^m W^m (D^m_e)^{-1} (H^m)^T (D^m_v)^{-1/2}, where D^m_v ∈ R^{N^m × N^m} is the diagonal
vertex degree matrix and D^m_e ∈ R^{N_e^m × N_e^m} denotes the diagonal hyperedge degree
matrix. Note that the vertex degree for v is defined as d^m_v = \sum_{e \in E^m} w^m_e h^m_{v,e}, and the
hyperedge degree for e is defined as \delta^m_e = \sum_{v \in V^m} h^m_{v,e}.
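The normalized hypergraph Laplacian described here can be assembled in a few lines; the incidence matrix and hyperedge weights below are toy inputs, not derived from the paper's data.

```python
# Sketch of the normalized hypergraph Laplacian:
#   L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
import numpy as np

def hypergraph_laplacian(H, w):
    """H: (n_vertices, n_edges) incidence matrix; w: hyperedge weights."""
    W = np.diag(w)
    dv = H @ w                        # vertex degrees: sum_e w_e * h_{v,e}
    de = H.sum(axis=0)                # hyperedge degrees: sum_v h_{v,e}
    Dv_isqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    Theta = Dv_isqrt @ H @ W @ De_inv @ H.T @ Dv_isqrt
    return np.eye(H.shape[0]) - Theta

H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])           # 3 vertices, 2 hyperedges (toy)
L = hypergraph_laplacian(H, np.array([1.0, 2.0]))
```

The resulting matrix is symmetric positive semi-definite, which is what makes the quadratic form in Eq. (3) a valid smoothness penalty.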


Let y = [y_{la}^T, y_{un}^T]^T ∈ R^N, where y_{la} represents the label information for
labeled data and y_{un} is the label information for the unlabeled data. For the
i-th sample, y_i = 1 if it is associated with the positive class (e.g., AD), y_i = −1 if
it belongs to the negative class (e.g., NC), and y_i = 0 if its category is unknown.
Since different views and hyperedges may play different roles in classification,
we learn the weights associated with different views and hyperedges from data.
Denote α ∈ R^M as a weight vector, with the element α^m representing the weight
for the m-th view. For the m-th hypergraph, we denote W^m ∈ R^{N_e^m × N_e^m} as the
diagonal matrix of hyperedge weights. Our view-aligned hypergraph classification
(VAHC) model is formulated as follows:

\min_{F, \alpha, \{W^m\}_{m=1}^M} \sum_{m=1}^{M} \|\Omega^m (f^m - y)\|^2 + \sum_{m=1}^{M} (\alpha^m)^2 (f^m)^T \tilde{L}^m f^m

+ \mu \sum_{m=1}^{M} \sum_{p=1}^{M} (f^m - f^p)^T \Omega^m \Omega^p (f^m - f^p) + \lambda \sum_{m=1}^{M} \|W^m\|_F^2, \quad (4)

s.t. \sum_{m=1}^{M} \alpha^m = 1, \forall \alpha^m \ge 0; \quad \sum_{i=1}^{N_e^m} W^m_{i,i} = 1, \forall W^m_{i,i} \ge 0,

where the first term is the square loss, and the second one is the hypergraph
Laplacian regularizer. The regularization coefficient (α^m)^2 is used to prevent a
degenerate solution for α. The last term and the constraints in Eq. (4) are used
to penalize the complexity of the weights (i.e., α) for views and the weights (i.e.,
W^m) for hyperedges. It is worth noting that the third term in Eq. (4) is the
proposed view-aligned regularizer, which encourages the estimated labels
of one subject represented in different views to be similar. Using Eq. (4), we can
jointly learn the class probability scores F, the optimal weights for views (i.e.,
α), and the optimal weights for hyperedges (i.e., {W^m}_{m=1}^M) from data.
Since the problem in Eq. (4) is not jointly convex w.r.t. F, α and {W^m}_{m=1}^M,
we adopt an alternating optimization method to solve the objective function.

First, we optimize F with fixed α and {W^m}_{m=1}^M. Given fixed F and α, we
optimize {W^m}_{m=1}^M in the second step. In the third step, we optimize α with
fixed F and {W^m}_{m=1}^M. This alternating optimization process is repeated until
convergence. The overall computational complexity of our method is O(N^2).

Multi-view Label Fusion: For a new testing subject z, we now compute
the weighted mean of its class probability scores {f^m_z}_{m=1}^M for making a final
classification decision. Specifically, its class label can be obtained via

l(z) = sign\left( \frac{\sum_{m=1}^{M} \alpha^m f^m_z}{\gamma} \right), where \gamma = \sum_{m=1}^{M} \alpha^m and \alpha^m is the learned weight of the
m-th view via VAHL. Note that if z has missing values in a specific modality,
the weights for the corresponding views associated with this modality will be 0.
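A sketch of this fusion rule, with an explicit availability mask so that views the subject is missing receive zero weight; all score and weight values are illustrative.

```python
# Sketch of the multi-view label fusion: weighted mean of per-view scores,
# masked by view availability, followed by the sign.
import numpy as np

def fuse_label(scores, alpha, available):
    """scores: per-view class scores f_z^m; alpha: learned view weights;
    available: 0/1 mask of views usable for this subject."""
    a = np.asarray(alpha, dtype=float) * np.asarray(available, dtype=float)
    gamma = a.sum()                            # effective total view weight
    return int(np.sign(a @ np.asarray(scores, dtype=float) / gamma))

label = fuse_label(scores=[0.8, -0.2, 0.5],
                   alpha=[0.5, 0.3, 0.2],
                   available=[1, 1, 0])        # third view missing
```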

3 Experiments
Experimental Settings: We performed three classification tasks, including
AD vs. NC, MCI vs. NC, and pMCI vs. sMCI classification. The classification
performance was evaluated by accuracy (ACC), sensitivity (SEN), specificity
(SPE), and area under the ROC curve (AUC). We compared VAHL with 4 base-
line methods, including Zero (with missing values as zeros), KNN, EM [6], and
SVD [7]. VAHL was further compared with 4 state-of-the-art methods, includ-
ing an ensemble-based method [2] with weighted mean (Ensemble-1) and mean
(Ensemble-2) strategies, iMSF [3] with square loss (iMSF-1) and logistic loss
(iMSF-2), iSFS [4], and matrix shrinkage and completion (MSC) [5].
A 10-fold cross-validation (CV) strategy was used for performance evaluation.
To optimize parameters, we performed an inner 10-fold CV using training
data. The parameters μ and λ in Eq. (4) were chosen from {10^{-3}, 10^{-2}, · · · , 10^4},
while the iteration number in the alternating optimization algorithm for Eq. (4)
was empirically set to 20. Multiple values of the ℓ1 regularization parameter in the
SR model [13] were set to {10^{-3}, 10^{-2}, 10^{-1}, 10^0} to construct multiple sets of hyperedges
in each hypergraph of VAHL. The parameter k for KNN was chosen from
{3, 5, 7, 9, 11, 15, 20}. The rank parameter was chosen from {5, 10, 15, 20, 25, 30}
for SVD, and the parameter λ for iMSF was chosen from {10^{-5}, 10^{-4}, · · · , 10^1}.
Results of iSFS [4] and MSC [5] were taken directly from the authors.

Results: Experimental results achieved by our method and those baseline meth-
ods are given in Fig. 3. As can be seen from Fig. 3, our method consistently
achieves the best performance in terms of ACC, SEN and AUC in three clas-
sification tasks. We further report the comparison between our method and
state-of-the-art methods in Table 1, with results demonstrating that our method
outperforms those competing methods. For instance, the ACC values achieved
by our method are 93.10 % and 80.00 % in AD vs. NC and MCI vs. NC classi-
fication, respectively, which are significantly better than the second best results

Fig. 3. Performance (ACC, SEN, SPE, and AUC, in %) of VAHL and the baseline methods (Zero, KNN, EM, SVD) in (a) AD vs. NC, (b) MCI vs. NC, and (c) pMCI vs. sMCI classification.

Table 1. Comparison with the state-of-the-art methods (all values in %)

Method         | AD vs. NC               | MCI vs. NC              | pMCI vs. sMCI
               | ACC   SEN   SPE   AUC   | ACC   SEN   SPE   AUC   | ACC   SEN   SPE   AUC
Ensemble-1 [2] | 83.03 78.54 86.72 89.82 | 62.58 65.42 57.73 64.40 | 68.10 55.44 77.77 64.60
Ensemble-2 [2] | 81.07 76.37 84.94 87.39 | 61.61 64.16 57.28 62.07 | 65.56 51.15 75.41 61.78
iMSF-1 [3]     | 86.41 76.91 94.24 85.57 | 70.64 81.62 54.42 63.02 | 65.82 56.90 72.38 68.20
iMSF-2 [3]     | 86.97 75.78 93.90 86.34 | 71.61 82.83 54.73 63.78 | 64.55 56.85 70.22 66.00
iSFS [4]       | 88.48 88.95 88.16 88.56 | -     -     -     -     | -     -     -     -
MSC [5]        | 88.50 83.70 92.70 94.40 | 71.50 75.30 64.90 77.30 | -     -     -     -
VAHL (ours)    | 93.10 90.00 95.65 94.83 | 80.00 86.19 68.78 80.49 | 79.00 60.80 92.53 79.66

(i.e., 88.50 % and 71.61 %, respectively). Similarly, the results in pMCI vs. sMCI
classification show that our method can identify progressive MCI patients from
the whole population more accurately than the state-of-the-art methods.
We also conducted experiments using VAHL based on complete data only (subjects
with all of the PET, MRI and CSF modalities), and achieved accuracies of 89.23 %, 78.50 %
and 78.00 % in AD vs. NC, MCI vs. NC, and pMCI vs. sMCI classification,
respectively. These results are worse than those obtained using all subjects with
incomplete data, implying that subjects with missing data can provide useful
information. We then compared VAHL with its variant VAHL-1 (without
the view-aligned regularizer); the accuracies achieved by VAHL-1 are
85.24 %, 75.16 % and 75.25 % in the three classification tasks, respectively. These
results imply that our view-aligned regularizer plays an important role in VAHL.
We further investigate the influence of parameters and the weights for different
views learned from Eq. (4), with results shown in Fig. 4. Figure 4(a–b) indicates
that the best results are achieved by VAHL when 0.1 ≤ μ ≤ 100 and
0.01 ≤ λ ≤ 10 in the three tasks. From Fig. 4(c), we can observe that the learned
weights for the “PET + MRI + CSF” view are much larger than those of the
other five views, implying that this view contributes the most in the three tasks.

Fig. 4. Influence of parameters (a–b) and learned weights for different views (c): ACC is plotted against the values of μ and λ for the three classification tasks, and the learned weight value is shown for each of the six views (MRI, PET, CSF, PET + MRI, MRI + CSF, PET + MRI + CSF).

4 Conclusion

We propose a view-aligned hypergraph learning (VAHL) method using incomplete
multi-modality data for AD/MCI diagnosis. Specifically, we first group
data into several views according to the availability of modalities, and construct
one hypergraph in each view using a sparse representation based hypergraph
construction method. We then develop a view-aligned hypergraph classification
model to explicitly capture coherence among views, as well as to automatically
learn the optimal weights of different views from data. A multi-view label fusion
method is employed to arrive at a final classification decision. Results on the
baseline ADNI-1 database (with MRI, PET, and CSF modalities) demonstrate
the efficacy of our method in AD/MCI diagnosis with incomplete data.

References
1. Brookmeyer, R., Johnson, E., Ziegler-Graham, K., Arrighi, H.M.: Forecasting the
global burden of Alzheimer’s disease. Alzheimer’s Dement. 3(3), 186–191 (2007)
2. Ingalhalikar, M., Parker, W.A., Bloy, L., Roberts, T.P.L., Verma, R.: Using mul-
tiparametric data with missing features for learning patterns of pathology. In:
Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol.
7512, pp. 468–475. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33454-2 58
3. Yuan, L., Wang, Y., Thompson, P.M., Narayan, V.A., Ye, J.: Multi-source feature
learning for joint analysis of incomplete multiple heterogeneous neuroimaging data.
NeuroImage 61(3), 622–632 (2012)
4. Xiang, S., Yuan, L., Fan, W., Wang, Y., Thompson, P.M., Ye, J.: Bi-level multi-
source learning for heterogeneous block-wise missing data. NeuroImage 102, 192–
206 (2014)
5. Thung, K.H., Wee, C.Y., Yap, P.T., Shen, D.: Neurodegenerative disease diag-
nosis using incomplete multi-modality data via matrix shrinkage and completion.
NeuroImage 91, 386–400 (2014)
6. Schneider, T.: Analysis of incomplete climate data: estimation of mean values and
covariance matrices and imputation of missing values. J. Clim. 14(5), 853–871
(2001)
7. Golub, G.H., Reinsch, C.: Singular value decomposition and least squares solutions.
Numer. Math. 14(5), 403–420 (1970)

8. Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: clustering, classification,
and embedding. In: NIPS, pp. 1601–1608 (2006)
9. Gao, Y., Wang, M., Tao, D., Ji, R., Dai, Q.: 3-D object retrieval and recognition
with hypergraph analysis. IEEE Trans. Image Process. 21(9), 4290–4303 (2012)
10. Jack, C.R., Bernstein, M.A., Fox, N.C., Thompson, P., Alexander, G.,
Harvey, D., Borowski, B., Britson, P.J., Whitwell, L., Ward, C.: The
Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. J. Magn.
Reson. Imaging 27(4), 685–691 (2008)
11. Sled, J.G., Zijdenbos, A.P., Evans, A.C.: A nonparametric method for automatic
correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging
17(1), 87–97 (1998)
12. Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: FSL.
NeuroImage 62(2), 782–790 (2012)
13. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition
via sparse representation. IEEE Trans. Pattern Anal. 31(2), 210–227 (2009)
New Multi-task Learning Model to Predict
Alzheimer’s Disease Cognitive Assessment

Zhouyuan Huo1 , Dinggang Shen2 , and Heng Huang1(B)


1
Computer Science and Engineering, University of Texas at Arlington,
Arlington, USA
heng@uta.edu
2
Department of Radiology and BRIC, University of North Carolina at Chapel Hill,
Chapel Hill, USA

Abstract. As a neurodegenerative disorder, the Alzheimer’s disease


(AD) status can be characterized by the progressive impairment of mem-
ory and other cognitive functions. Thus, it is an important topic to use
neuroimaging measures to predict cognitive performance and track the
progression of AD. Many existing cognitive performance prediction meth-
ods employ the regression models to associate cognitive scores to neu-
roimaging measures, but these methods do not take into account the
interconnected structures within imaging data and those among cog-
nitive scores. To address this problem, we propose a novel multi-task
learning model for minimizing the k smallest singular values to uncover
the underlying low-rank common subspace and jointly analyze all the
imaging and clinical data. The effectiveness of our method is demon-
strated by the clearly improved prediction performances in all empirical
AD cognitive scores prediction cases.

1 Introduction
Accruing scientific evidences have demonstrated that the neuroimaging tech-
niques, such as magnetic resonance imaging (MRI), are important for the detec-
tion of early Alzheimer’s Disease (AD) [2,4,7,13]. Current American Academy
of Neurology (AAN) guidelines [3] for dementia diagnosis recommend imaging to
identify structural brain diseases that can cause cognitive impairment. Because
AD is a neurodegenerative disorder characterized by progressive impairment of
cognitive functions, it is important to diagnose the degree of brain impairment,
and how much it can influence the performance of cognitive tests. As a result,
many studies have focused on using regression models to predict cognitive scores
and track AD progression [10,11]. In [10], the voxel-based morphometry (VBM)
features extracted from the entire brain were jointly analyzed by the relevance

Z. Huo and H. Huang were supported in part by NSF IIS-1117965, IIS-1302675,
IIS-1344152, DBI-1356628, and NIH AG049371. D. Shen was supported in part by
NIH AG041721.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 317–325, 2016.
DOI: 10.1007/978-3-319-46720-7_37

vector regression method to predict different clinical scores individually. How-


ever, different neuroimaging features or different cognitive scores are often inter-
related. To tackle this problem, several recent studies, such as [11,12], tried to
employ the multi-task learning models to uncover the inherent structures among
neuroimaging features and cognitive scores. The low-rank regularization is an
effective method to extract the common subspace for multiple tasks. Although
trace norm is a widely used convex relaxation of low-rank regularization [1], its
performance is easily influenced by the large singular values. For example, when
the largest singular values of a matrix M increase, the rank of M does not change,
but the trace norm of M increases correspondingly.
To address the above problems, in this paper, we propose a novel multi-
task learning model to learn the associations between neuroimaging features
and cognitive scores and uncover the low-rank common subspace among dif-
ferent tasks by minimizing the k smallest singular values. Our new k minimal
singular values minimization regularization is a tighter relaxation than trace
norm for rank minimization, such that our new multi-task learning model can
have better prediction performance. We derive a new optimization algorithm to
solve the proposed objective function and demonstrate the proof of its conver-
gence. The proposed new model is applied to analyze the Alzheimer’s Disease
Neuroimaging Initiative (ADNI) cohort [16] data. In all empirical results, our
new multi-task learning method consistently outperforms the widely used multi-
variate regression method, as well as different state-of-the-art multi-task learning
approaches.

2 New Multi-task Learning Model


2.1 New Objective Function

In our new model, we focus on minimizing the k smallest singular values of
W and ignoring the largest singular values, such that our new regularization
function is a better relaxation than the trace norm. Thus, we propose to solve the
following problem for multi-task learning:

J_{opt} = \min_{W=[W_1,...,W_T]} \sum_{t=1}^{T} f(W_t^T X_t, Y_t) + \gamma \sum_{i=1}^{k} \sigma_i(W) \quad (1)

Suppose there are T learning tasks, where the t-th task has n_t training data points
X_t = [x_1^t, x_2^t, ..., x_{n_t}^t] ∈ R^{d×n_t}. For each data point x_i^t, the label y_i^t is given, with the
label matrix Y_t = [y_1^t, y_2^t, ..., y_{n_t}^t] ∈ R^{c_t×n_t} for each task t. W_t ∈ R^{d×c_t} is the
projection matrix to be learned, W ∈ R^{d×c}, and c = \sum_{t=1}^{T} c_t.
It is interesting to see that when γ is large enough, the k smallest
singular values of the optimal solution W to problem (1) will be zero, as all the
singular values of a matrix are non-negative. That is, when γ is large enough,
problem (1) is equivalent to constraining the rank of W to be r = m − k.

2.2 Optimization Algorithm

As per the definition of ||W||_* and the singular value decomposition of W, it is
known that:

\sum_{i=1}^{k} \sigma_i(W) = \|W\|_* - \max_{F \in R^{d \times r}, F^T F = I; \; G \in R^{c \times r}, G^T G = I} Tr(F^T W G), \quad (2)

where ||W||_* is the sum of all the singular values of W, the optimal value of the
right-hand maximization is the sum of the r largest singular values, F collects the r left singular vectors
of W, and G collects the r right singular vectors of W.
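This identity is easy to verify numerically: with F and G built from the top-r singular vectors, Tr(F^T W G) equals the sum of the r largest singular values, so the remainder is the sum of the k smallest. The matrix below is a random stand-in.

```python
# Numerical check of the identity behind Eq. (2).
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
r = len(s) - k                          # rank budget
F, G = U[:, :r], Vt[:r, :].T            # top-r left/right singular vectors
lhs = np.sort(s)[:k].sum()              # sum of the k smallest singular values
rhs = s.sum() - np.trace(F.T @ W @ G)   # ||W||_* - Tr(F^T W G)
```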
According to Eq. (2), the objective J_{opt} in Eq. (1) is equivalent to:

\min_{W=[W_1,...,W_T]; \; F \in R^{d \times r}, F^T F = I; \; G \in R^{c \times r}, G^T G = I} \sum_{t=1}^{T} f(W_t^T X_t, Y_t) + \gamma \|W\|_* - \gamma Tr(F^T W G). \quad (3)

When W is fixed, the problem (3) becomes:

\max_{F \in R^{d \times r}, F^T F = I; \; G \in R^{c \times r}, G^T G = I} Tr(F^T W G). \quad (4)

The optimal solution F to the problem (4) is formed by r left singular vectors of
W corresponding to the r largest singular values, and the optimal solution G is
formed by r right singular vectors of W corresponding to the r largest singular
values.
When F and G are fixed, we define:

g(W_t) = f(W_t^T X_t, Y_t) - \gamma Tr(W_t^T F G_t^T), \quad (5)

the problem (3) becomes:

\min_{W=[W_1,...,W_T]} \sum_{t=1}^{T} g(W_t) + \gamma \|W\|_*. \quad (6)

Using the reweighted method [6], we can solve problem (6) by iteratively solving
the following problem:

\min_{W=[W_1,...,W_T]} \sum_{t=1}^{T} g(W_t) + \gamma \sum_{t=1}^{T} Tr(W_t W_t^T D), \quad (7)

where D is computed according to the solution W* in the last iteration and is
defined as:

D = \frac{1}{2} (W^* (W^*)^T)^{-1/2}. \quad (8)

We can see that each subproblem of task t is independent of the others in
problem (7). Thus, if we use the least square loss function, for each task W_t the
objective function can be written as:

\min_{W_t} \|W_t^T X_t + b_t 1_t^T - Y_t\|_F^2 - \gamma Tr(W_t^T F G_t^T) + \gamma Tr(W_t W_t^T D). \quad (9)

We take derivatives of Eq. (9) with respect to b_t and W_t, and set them to zero.
The optimal solution to problem (9) is as follows:

W_t = (X_t H X_t^T + \gamma D)^{-1} \left( X_t H Y_t^T + \frac{1}{2} \gamma F G_t^T \right), \quad H = I - \frac{1}{n_t} 1_t 1_t^T, \quad (10)

b_t = \frac{1}{n_t} Y_t 1_t - \frac{1}{n_t} W_t^T X_t 1_t. \quad (11)
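The closed-form update can be sanity-checked by confirming that the gradient of the Eq. (9) objective vanishes at the computed (W_t, b_t). In this sketch, D is fixed to the identity purely for the check, and all matrices are random stand-ins for one task's data.

```python
# Gradient check for the closed-form solution of Eqs. (10)-(11).
import numpy as np

rng = np.random.default_rng(3)
d, c, n = 4, 2, 7
X = rng.standard_normal((d, n))
Y = rng.standard_normal((c, n))
FG = rng.standard_normal((d, c))        # plays the role of F G_t^T
D = np.eye(d)                           # reweighting matrix, fixed for the check
gamma = 0.5
one = np.ones((n, 1))
H = np.eye(n) - one @ one.T / n         # centering matrix from Eq. (10)

W = np.linalg.solve(X @ H @ X.T + gamma * D,
                    X @ H @ Y.T + 0.5 * gamma * FG)   # Eq. (10)
b = (Y @ one - W.T @ X @ one) / n                     # Eq. (11)

# Gradient of ||W^T X + b 1^T - Y||_F^2 - gamma*Tr(W^T FG) + gamma*Tr(W W^T D)
R = W.T @ X + b @ one.T - Y
grad_W = 2 * X @ R.T - gamma * FG + 2 * gamma * D @ W
grad_b = 2 * R @ one
```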
We summarize the detailed algorithm to solve the objective Jopt in Algorithm 1.

Algorithm 1. Algorithm to solve the objective J_{opt} in Eq. (1)

Input: The training data matrix X_t = [x_1^t, x_2^t, ..., x_{n_t}^t] ∈ R^{d×n_t} and the label matrix
Y_t = [y_1^t, y_2^t, ..., y_{n_t}^t] ∈ R^{c_t×n_t} for each task t.
Output: W ∈ R^{d×c}.
Initialize W ∈ R^{d×c}.
repeat
  1. Update F and G by the optimal solution to the problem (4).
  2. Compute D = (1/2)(W W^T)^{-1/2}.
  3. For each t, update W_t by the optimal solution to the problem (7).
until convergence
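A compact numpy sketch of Algorithm 1 with the least-square loss is given below. The sizes, data, rank budget r, and the small ridge inside the inverse square root are illustrative choices (not the authors' settings); the optimal bias is folded in via a centering matrix, and the loop records the Eq. (3) objective so its monotone decrease can be observed.

```python
# Toy end-to-end sketch of Algorithm 1 (least-square loss, one output per task).
import numpy as np

rng = np.random.default_rng(4)
d, T, n, r, gamma = 5, 3, 12, 2, 0.1
Xs = [rng.standard_normal((d, n)) for _ in range(T)]
Ys = [rng.standard_normal((1, n)) for _ in range(T)]
one = np.ones((n, 1))
H = np.eye(n) - one @ one.T / n        # folds in the optimal bias b_t

def objective(W):
    """Eq. (3) value: centered square loss + gamma * sum of smallest singular values."""
    loss = sum(np.linalg.norm((W[:, [t]].T @ Xs[t] - Ys[t]) @ H) ** 2
               for t in range(T))
    s = np.linalg.svd(W, compute_uv=False)
    return loss + gamma * np.sort(s)[: len(s) - r].sum()

W = rng.standard_normal((d, T))
v0 = objective(W)
vals = []
for _ in range(30):
    # Step 1: F, G from the top-r singular vectors of W (problem (4)).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    F, G = U[:, :r], Vt[:r, :].T
    # Step 2: D = (1/2)(W W^T)^(-1/2), with a tiny ridge for invertibility.
    ev, Q = np.linalg.eigh(W @ W.T + 1e-10 * np.eye(d))
    D = 0.5 * Q @ np.diag(ev ** -0.5) @ Q.T
    # Step 3: per-task closed-form update, Eq. (10).
    for t in range(T):
        FG = F @ G[[t], :].T
        W[:, [t]] = np.linalg.solve(Xs[t] @ H @ Xs[t].T + gamma * D,
                                    Xs[t] @ H @ Ys[t].T + 0.5 * gamma * FG)
    vals.append(objective(W))
```

Up to the tiny ridge, the recorded objective values are non-increasing, which is the behavior Theorem 1 establishes.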

2.3 Algorithm Analysis

Algorithm 1 monotonically decreases the objective of the problem in
Eq. (1) in each iteration. To prove this, we need the following lemma:
Lemma 1. For any positive definite matrices A, A_t ∈ R^{m×m}, the following
inequality holds when 0 < p ≤ 2:

Tr(A^{p/2}) - \frac{p}{2} Tr(A A_t^{(p-2)/2}) \le Tr(A_t^{p/2}) - \frac{p}{2} Tr(A_t A_t^{(p-2)/2}). \quad (12)
It is proved in [6] that Lemma 1 holds. Based on the Lemma, we have the
following theorem:
Theorem 1. Algorithm 1 monotonically decreases the objective of the
problem in Eq. (3) in each iteration until convergence.

Proof. In each iteration, at first, we fix W and compute F̃ and G̃. According to
the solution of Eq. (4), we know:

-\gamma Tr(\tilde{F}^T W \tilde{G}) \le -\gamma Tr(F^T W G). \quad (13)

When F̃ and G̃ are fixed, the problem becomes Eq. (7); assuming that W̃ is
the solution in this iteration, we have:

\sum_{t=1}^{T} g(\tilde{W}_t) + \frac{\gamma}{2} Tr(\tilde{W} \tilde{W}^T (W W^T)^{-1/2}) \le \sum_{t=1}^{T} g(W_t) + \frac{\gamma}{2} Tr(W W^T (W W^T)^{-1/2}). \quad (14)
On the other hand, according to Lemma 1, when p = 1, we have:

Tr((\tilde{W} \tilde{W}^T)^{1/2}) - \frac{1}{2} Tr(\tilde{W} \tilde{W}^T (W W^T)^{-1/2}) \le Tr((W W^T)^{1/2}) - \frac{1}{2} Tr((W W^T)(W W^T)^{-1/2}). \quad (15)
Combining (13), (14), and (15), we arrive at:

\sum_{t=1}^{T} f(\tilde{W}_t^T X_t, Y_t) + \gamma \|\tilde{W}\|_* - \gamma Tr(\tilde{F}^T \tilde{W} \tilde{G}) \le \sum_{t=1}^{T} f(W_t^T X_t, Y_t) + \gamma \|W\|_* - \gamma Tr(F^T W G). \quad (16)
Thus Algorithm 1 will not increase the objective function in (3) at any
iteration. Note that the equalities in the above equations hold only when the algorithm
converges. Therefore, Algorithm 1 monotonically decreases the objective
value in each iteration until convergence.
Because we alternately solve F, G, and W, Algorithm 1 will converge
to a local optimum of the problem (3), which is equivalent to the proposed
objective function.

3 Experimental Results and Discussions


3.1 Data Set Description
Data used in this paper were obtained from the ADNI database
(adni.loni.usc.edu). One goal of ADNI has been to test whether serial MRI,
PET, other biological markers, and clinical and neuropsychological assessment
can be combined to measure the progression of MCI and early AD. For up-to-
date information, we refer interested readers to visit www.adni-info.org.
The data processing steps are as follows. Each MRI T1-weighted image was
first anterior commissure (AC)-posterior commissure (PC) corrected using
MIPAV, intensity inhomogeneity corrected using the N3 algorithm [9], skull
stripped [15] with manual editing, and cerebellum-removed [14]. We then used
FAST [17] in the FSL package to segment the image into gray matter (GM),
white matter (WM), and cerebrospinal fluid (CSF), and further used HAMMER
[8] to register the images to a common space. GM volumes obtained from 93
ROIs defined in [5], normalized by the total intracranial volume, were extracted
as features. Nine cognitive scores from five independent cognitive assessments

were downloaded, including three scores from RAVLT cognitive assessment; two
scores from Fluency cognitive assessment (FLU); two scores from Trail making
test (TRAIL). A total of 525 subjects are involved in our study, including 78
AD, 260 MCI, and 187 HC participants.

3.2 Improved Cognitive Status Prediction for Individual Assessment Tests

First, we apply the proposed method to the ADNI cohort, and separately predict
each of the following three sets of cognitive scores: RAVLT, TRAILS and
FLUENCY. The morphometric variables are {x_i}_{i=1}^n ⊂ R^d, with d = 93 in this
experiment.
We compare the proposed multi-task learning method to the three most related
methods: multivariate regression (MVR), the multi-task learning model with ℓ2,1-norm
regularization (ℓ2,1) [11], and the multi-task learning model with trace norm
(LS TRACE) [1], in cognitive performance prediction. For each test case, we use
5-fold cross validation and the prediction performance is assessed by the root
mean square error (RMSE). All experimental results are reported in Table 1.
The proposed method consistently outperforms other methods in nearly all the
test cases for all the cognitive tasks.
The heat maps of parameter weights are shown in Fig. 1. Visualizing the
parameter weights can help us locate the features which play important roles in
the corresponding cognitive prediction tasks. In this way, there is much potential
to identify the relevant imaging predictors and explain the effects of morphometric
changes in relation to cognitive performance. As we can see, different
coefficient values are represented by different colors in the heat map. The blue

Table 1. Prediction performance measured by RMSE (mean ± std)

Test cases Algorithm Score1 Score2 Score3


FLUENCY MVR 6.2292 ± 0.4191 4.1210 ± 0.4733 -
LS TRACE 5.9792 ± 0.6339 4.0492 ± 0.4294 -
2,1 5.7431 ± 0.2796 3.9567 ± 0.2143 -
Our method 5.4377 ± 0.3125 3.9498 ± 0.3505 -
RAVLT MVR 10.8194 ± 0.9530 4.0606 ± 0.3071 4.0616 ± 0.3928
LS TRACE 10.6359 ± 1.1303 4.0252 ± 0.2896 4.0399 ± 0.2250
2,1 10.4451 ± 0.8905 3.9618 ± 0.2484 3.7906 ± 0.1444
Our method 9.7834 ± 0.4867 3.7261 ± 0.1368 3.6984 ± 0.1603
TRAILS MVR 22.3629 ± 1.0656 78.1796 ± 7.3501 70.9399 ± 7.2238
LS TRACE 20.7686 ± 1.1213 75.0121 ± 6.4147 65.3007 ± 6.0726
2,1 19.5400 ± 2.8240 72.7200 ± 8.6480 63.4796 ± 7.3528
Our method 18.1809 ± 2.0390 66.9982 ± 5.1144 58.0915 ± 4.0492

Fig. 1. Heat map of corresponding features for cognitive score prediction.

and red poles of the color map indicate a significant effect of the corresponding
features on cognitive score performance.

3.3 Improved Cognitive Performance Prediction for Joint Assessment Tests

To further evaluate the multi-task joint analysis power, we apply the proposed
method to predict all five types of cognitive scores (RAVLT, TRAILS, FLU-
ENCY) jointly. Such experiments will demonstrate how the interrelations among
cognitive assessment tests are utilized to enhance the prediction performance.

Table 2. Prediction performance measured by RMSE (mean ± std) for joint assessment
tests.

Algorithm Score name Score1 Score2 Score3


MVR FLUENCY 6.0282 ± 0.2255 4.1852 ± 0.4346 -
RAVLT 11.0376 ± 0.4489 4.0608 ± 0.2554 4.0561 ± 0.1547
TRAILS 21.7435 ± 1.3936 77.0161 ± 5.2578 68.1576 ± 4.837
LS TRACE FLUENCY 5.7778 ± 0.1130 3.9681 ± 0.2965 -
RAVLT 10.8519 ± 0.8808 3.8674 ± 0.4112 3.8772 ± 0.1943
TRAILS 20.5224 ± 1.1906 74.4795 ± 4.5967 64.3386 ± 4.2974
2,1 FLUENCY 5.8100 ± 0.9274 3.9139 ± 0.3538 -
RAVLT 10.4500 ± 0.3846 3.9806 ± 0.2158 3.8797 ± 0.2050
TRAILS 19.7753 ± 1.5802 70.9585 ± 5.5396 62.3717 ± 4.9592
Our method FLUENCY 5.4644 ± 0.3515 3.8724 ± 0.1908 -
RAVLT 10.4492 ± 0.8235 3.6522 ± 0.2542 3.7086 ± 0.1814
TRAILS 17.8778 ± 1.8126 66.3821 ± 5.6292 57.7588 ± 5.3360

Similar to the previous experiment, we also compare our method to three


other related models. For each test case, we use 5-fold cross validation to evalu-
ate the average performance of each algorithm. The prediction results are eval-
uated by RMSE and reported in Table 2. In all prediction cases, our method
outperforms other methods.

4 Conclusion

In this paper, we proposed a new multi-task learning model minimizing the k
smallest singular values to predict the cognitive scores for complex brain disorders.
This proposed new low-rank regularization is a better approximation of
rank minimization regularization problem than the standard trace norm regular-
ization, thus our new multi-task learning method can uncover the shared com-
mon subspace efficiently and sufficiently. As a result, cognitive score prediction
results are enhanced by the learned hidden structures among tasks and features.
We also introduced an efficient optimization algorithm to solve our proposed
objective function with rigorous theoretical analysis. Our experiments were con-
ducted on the MRI and multiple cognitive scores data of the ADNI cohort and
yield promising results: (1) Prediction performance of the proposed multi-task
learning model is better than all related methods in all cases; (2) Our method
can predict multiple cognitive scores at the same time and has a potential to
play an important role in determining cognitive functions and characterizing AD
progression.

References
1. Argyriou, A., Evgeniou, T., Pontil, M.: Convex multi-task feature learning. Mach.
Learn. 73(3), 243–272 (2008)
2. Batmanghelich, N., Taskar, B., Davatzikos, C.: A general and unifying framework
for feature construction, in image-based pattern classification. In: Prince, J.L.,
Pham, D.L., Myers, K.J. (eds.) IPMI 2009. LNCS, vol. 5636, pp. 423–434. Springer,
Heidelberg (2009)
3. De Leon, M., George, A., Stylopoulos, L., Smith, G., Miller, D.: Early marker for
Alzheimer’s disease: the atrophic hippocampus. Lancet 334(8664), 672–673 (1989)
4. Hassabis, D., Maguire, E.A.: Deconstructing episodic memory with construction.
Trends Cogn. Sci. 11(7), 299–306 (2007)
5. Kabani, N.J.: 3D anatomical atlas of the human brain. Neuroimage 7, P-0717
(1998)
6. Nie, F., Huang, H., Ding, C.H.: Low-rank matrix recovery via efficient Schatten
p-norm minimization. In: AAAI (2012)
7. Rosen, H.J., Gorno-Tempini, M.L., Goldman, W., Perry, R., Schuff, N., Weiner, M.,
Feiwell, R., Kramer, J., Miller, B.L.: Patterns of brain atrophy in frontotemporal
dementia and semantic dementia. Neurology 58(2), 198–208 (2002)
8. Shen, D., Davatzikos, C.: Hammer: hierarchical attribute matching mechanism for
elastic registration. IEEE Trans. Med. Imaging 21(11), 1421–1439 (2002)

9. Sled, J.G., Zijdenbos, A.P., Evans, A.C.: A nonparametric method for automatic
correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging
17(1), 87–97 (1998)
10. Stonnington, C.M., Chu, C., Klöppel, S., Jack Jr., C.R., Ashburner, J.,
Frackowiak, R.S.: Predicting clinical scores from magnetic resonance scans in
Alzheimer’s disease. Neuroimage 51(4), 1405–1413 (2010)
11. Wang, H., Nie, F., Huang, H., Risacher, S., Ding, C., Saykin, A.J., Shen, L.: Sparse
multi-task regression and feature selection to identify brain imaging predictors for
memory performance. In: 2011 IEEE International Conference on Computer Vision
(ICCV), pp. 557–562. IEEE (2011)
12. Wang, H., Nie, F., Huang, H., Risacher, S., Saykin, A.J., Shen, L., ADNI: Joint
classification and regression for identifying AD-sensitive and cognition-relevant
imaging biomarkers. In: 14th International Conference on Medical Image Computing
and Computer Assisted Intervention (MICCAI), pp. 115–123 (2011)
13. Wang, H., Nie, F., Huang, H., Risacher, S.L., Saykin, A.J., Shen, L.: ADNI: iden-
tifying disease sensitive and quantitative trait relevant biomarkers from multi-
dimensional heterogeneous imaging genetics data via sparse multi-modal multi-
task learning. Bioinformatics 28(12), i127–i136 (2012)
14. Wang, Y., Nie, J., Yap, P.T., Li, G., Shi, F., Geng, X., Guo, L., Shen, D.,
Initiative, A.D.N., et al.: Knowledge-guided robust MRI brain extraction for diverse
large-scale neuroimaging studies on humans and non-human primates. PLoS One
9(1), e77810 (2014)
15. Wang, Y., Nie, J., Yap, P.-T., Shi, F., Guo, L., Shen, D.: Robust deformable-
surface-based skull-stripping for large-scale studies. In: Fichtinger, G., Martel, A.,
Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6893, pp. 635–642. Springer, Heidelberg
(2011). doi:10.1007/978-3-642-23626-6 78
16. Weiner, M.W., Aisen, P.S., Jack Jr., C.R., Jagust, W.J., Trojanowski, J.Q.,
Shaw, L., Saykin, A.J., Morris, J.C., Cairns, N., Beckett, L.A., et al.: The
Alzheimer’s disease neuroimaging initiative: progress report and future plans.
Alzheimer’s Dement. 6(3), 202–211 (2010)
17. Zhang, Y., Brady, M., Smith, S.: Segmentation of brain MR images through a
hidden Markov random field model and the expectation-maximization algorithm.
IEEE Trans. Med. Imaging 20(1), 45–57 (2001)
Hyperbolic Space Sparse Coding with Its
Application on Prediction of Alzheimer’s Disease
in Mild Cognitive Impairment

Jie Zhang1 , Jie Shi1 , Cynthia Stonnington2 , Qingyang Li1 , Boris A. Gutman4 ,
Kewei Chen3 , Eric M. Reiman3 , Richard Caselli3 , Paul M. Thompson4 ,
Jieping Ye5 , and Yalin Wang1(B)
1
School of Computing, Informatics, and Decision Systems Engineering,
Arizona State University, Tempe, AZ, USA
ylwang@asu.edu
2
Department of Psychiatry and Psychology, Mayo Clinic Arizona,
Scottsdale, AZ, USA
3
Banner Alzheimer’s Institute and Banner Good Samaritan PET Center,
Phoenix, AZ, USA
4
Imaging Genetics Center, Institute for Neuroimaging and Informatics,
University of Southern California, Marina del Rey, CA, USA
5
Department of Computational Medicine and Bioinformatics,
University of Michigan, Ann Arbor, MI, USA

Abstract. Mild Cognitive Impairment (MCI) is a transitional stage


between normal age-related cognitive decline and Alzheimer’s disease
(AD). Here we introduce a hyperbolic space sparse coding method to
predict impending decline of MCI patients to dementia using surface
measures of ventricular enlargement. First, we compute diffeomorphic
mappings between ventricular surfaces using a canonical hyperbolic para-
meter space with consistent boundary conditions and surface tensor-
based morphometry is computed to measure local surface deformations.
Second, ring-shaped patches of TBM features are selected according to
the geometric structure of the hyperbolic parameter space to initialize a
dictionary. Sparse coding is then applied on the patch features to learn
sparse codes and update the dictionary. Finally, we adopt max-pooling
to reduce the feature dimensions and apply Adaboost to predict AD in
MCI patients (N = 133) from the Alzheimer’s Disease Neuroimaging
Initiative baseline dataset. Our work achieved an accuracy rate of 96.7 %
and outperformed some other morphometry measures. The hyperbolic
space sparse coding method may offer a more sensitive tool to study AD
and its early symptoms.

Keywords: Mild Cognitive Impairment · Hyperbolic parameter space ·


Ring-shaped patches · Sparse coding and dictionary learning

1 Introduction
Mild Cognitive Impairment (MCI) is a transitional stage between normal aging
and Alzheimer’s disease (AD). Many neuroimaging studies aim to identify

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 326–334, 2016.
DOI: 10.1007/978-3-319-46720-7 38
Hyperbolic Space Sparse Coding 327

abnormal anatomical or functional patterns, characterize their association with
cognitive decline, and evaluate the therapeutic efficacy of interventions in MCI.
Structural magnetic resonance imaging (MRI) measures have been a mainstay of AD
imaging research, including whole-brain [12], entorhinal cortex [2], hippocam-
pus [15] and ventricular enlargement [14].
Ventricular enlargement is a highly reproducible measure of AD progression,
owing to the high contrast between the CSF and surrounding brain tissue on T1-
weighted images. However, its concave shape, complex branching topology, and
the extreme narrowness of the inferior and posterior horns have made the
ventricular surface notoriously difficult to analyze. Recent research has demonstrated
that subregional surface-based ventricular morphometry analysis may
offer improved statistical power. For example, a variety of surface-based analysis
techniques such as SPHARM [13] and radial distance [14] have been proposed
to analyze ventricular morphometry abnormalities. To model a topologically
complicated ventricular surface, Shi et al. [11] proposed to use the hyperbolic
conformal geometry to build the canonical hyperbolic parameter space of ven-
tricular surfaces. After introducing cuts on the ends of three horns, ventricular
surfaces become genus-zero surfaces with multiple open boundaries, which may
be equipped with Riemannian metrics that induce negative Gaussian curvature.
The hyperbolic Ricci flow method was adopted to compute their hyperbolic conformal
parameterizations, and the resulting parameterizations have no singularities.
After registration, tensor-based morphometry (TBM) [11] was computed on the
entire ventricular surfaces and used for group difference study. Thus far, no
attempt has been made to use the hyperbolic space based surface morphometry
features for the prognosis of AD.
In this paper, we propose a new hyperbolic space sparse coding and dictio-
nary learning framework, in which a Farthest point sampling with Breadth-first
Search (FBS) algorithm is proposed to construct ring-shaped feature patches
from the hyperbolic space, and a patch-based hyperbolic sparse coding algorithm is
developed to reduce feature dimensions. Max-pooling [1] and Adaboost [10] are
used for finalizing features and binary classification. We further validate our
algorithms with AD prediction in MCI using ventricular surface TBM features.
The major contributions of this paper are as follows. First, to the best of our
knowledge, it is the first sparse coding framework which is designed on hyperbolic
space. Second, the hyperbolic space sparse coding empowers the AD prediction
accuracy through ventricular morphometry analysis. In our experiments with
the ADNI data (N = 133), our ventricular morphometry system achieves 96.7 %
accuracy, 93.3 % sensitivity, 100.0 % specificity and outperforms other ventricu-
lar morphometric measures in predicting AD conversion for MCI patients.

2 Hyperbolic Space Sparse Coding

The major computational steps of the proposed system are illustrated in Fig. 1.
The new method can be divided into two stages. In the first stage, we perform
MRI scan segmentation, ventricular surface reconstruction, hyperbolic Ricci
flow-based surface registration, and surface TBM statistic computation.

328 J. Zhang et al.

Fig. 1. The major processing steps in the proposed framework.

In the second
stage, we build ring-shaped patches on the hyperbolic parameter space by FBS
to initialize original dictionary, SCC based sparse coding and dictionary learn-
ing and max-pooling are performed for dimension reduction. Following that,
Adaboost is adopted to predict future AD conversion, i.e., classification of the
MCI-converter group versus the MCI-stable group.

2.1 Hyperbolic Space and Surface Tensor-Based Morphometry

We applied the hyperbolic Ricci flow method [11] on ventricular surfaces and mapped
them to the Poincaré disk with conformal mapping. On the Poincaré disk, we
computed a set of consistent geodesics and projected them back to the original
ventricular surface, termed as geodesic curve lifting. Further, we converted the
Poincaré model to the Klein model where the ventricular surfaces are registered
by the constrained harmonic map. The computation of canonical hyperbolic
spaces for a left ventricular surface is shown in Fig. 2.
In Fig. 2, geodesic curve lifting is used to construct a canonical hyperbolic space
for ventricular surface registration. γ1 , γ2 , γ3 are consistent anchor curves
automatically located on the end points of each horn. On the parameter domain,
τ1 is an arc on the circle that passes one endpoint of γ1 and one endpoint of
γ2 and is orthogonal to |z| = 1. The initial paths τ1 and τ2 can be inconsistent,
but they have to connect consistent endpoints of γ1 , γ2 and γ3 , as to guarantee
the consistency of the geodesic curve computation. After slicing the universal
covering space along the geodesics, we get the canonical fundamental domain
in the Poincaré disk, as shown in Fig. 2(b). All the boundary curves become
geodesics. As the geodesics are unique, they are also consistent when we map
them back to the surface in R3 . Furthermore, we convert the Poincaré model
to the Klein model with the complex function [11] z ↦ 2z/(1 + z z̄). It converts
the canonical fundamental domains of the ventricular surfaces to a Euclidean
octagon, as shown in Fig. 2(c). Then we use the Klein disk as the canonical
parameter space for the ventricular surface analysis.
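The Poincaré-to-Klein conversion can be sketched in a few lines. This is our own minimal illustration (not the authors' code), using the map z ↦ 2z/(1 + z z̄) = 2z/(1 + |z|²):

```python
import numpy as np

def poincare_to_klein(z):
    """Map a point z (|z| < 1) from the Poincare disk to the Klein disk."""
    z = np.asarray(z, dtype=complex)
    return 2.0 * z / (1.0 + np.abs(z) ** 2)  # note z * conj(z) = |z|^2

# The origin is fixed; other points move radially outward, e.g. 0.5 -> 0.8.
print(poincare_to_klein(0.5 + 0.0j))
```

Under this map, the circular-arc geodesics of the Poincaré disk become straight chords of the Klein disk, which is why the fundamental domain of Fig. 2(b) becomes the Euclidean octagon of Fig. 2(c).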

Fig. 2. Modeling ventricular surface with hyperbolic geometry. (a) shows three iden-
tified open boundaries, γ1 , γ2 , γ3 , on the ends of three horns. After that, ventricular
surfaces can be conformally mapped to the hyperbolic space. (b)(c) show the hyperbolic
parameter space, where (b) is the Poincaré disk model and (c) is the Klein model.

After that, we computed the TBM features [11] and smoothed them with the
heat kernel method [3]. Suppose φ : S1 → S2 is a map from surface S1 to
surface S2 . The derivative map of φ is the linear map between the tangent
spaces dφ : T M (p) → T M (φ(p)), induced by the map φ, which also defines
the Jacobian matrix of φ. The derivative map dφ is approximated by the linear
map from one face [v1 , v2 , v3 ] to another one [w1 , w2 , w3 ]. First, we isometrically
embed the triangles [v1 , v2 , v3 ] and [w1 , w2 , w3 ] onto the Klein disk, with the
planar coordinates of the vertices denoted by vi , wi , i = 1, 2, 3. Then, the Jacobian
matrix of the derivative map dφ can be computed as J = dφ = [w3 − w1 , w2 −
w1 ][v3 − v1 , v2 − v1 ]^{−1} . Based on the derivative map J, the deformation tensor
S = √(J^T J) is defined as the TBM, which measures the amount of local area
change on a surface.
As pointed out in [3], each step in the processing pipeline, including MRI acquisition,
surface registration, etc., is expected to introduce noise in the deformation
measurement. To account for the noise effects, we apply the heat kernel smooth-
ing algorithm proposed in [3] to increase the SNR in the TBM statistical features
and boost the sensitivity of statistical analysis.
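The per-triangle computation above can be sketched as follows. This is our own illustration with assumed 2D coordinates (after the isometric embedding), not the authors' implementation:

```python
import numpy as np

def triangle_jacobian(v, w):
    """Jacobian J of the linear map sending triangle v to triangle w.

    v, w: (3, 2) arrays of embedded 2D vertex coordinates.
    J = [w3 - w1, w2 - w1][v3 - v1, v2 - v1]^{-1}
    """
    V = np.column_stack([v[2] - v[0], v[1] - v[0]])
    W = np.column_stack([w[2] - w[0], w[1] - w[0]])
    return W @ np.linalg.inv(V)

def deformation_tensor(J):
    """S = sqrt(J^T J), via eigendecomposition of the SPD matrix J^T J."""
    vals, vecs = np.linalg.eigh(J.T @ J)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

v = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = 2.0 * v                       # a uniform 2x scaling of the triangle
S = deformation_tensor(triangle_jacobian(v, w))
print(np.linalg.det(S))           # ~4.0: the local area quadruples
```

The determinant of S recovers the local area change, which is the quantity the TBM statistic measures.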

2.2 Ring-Shaped Patch Selection

The hyperbolic space differs from the familiar Euclidean space: its structure
is more complicated, and patch selection demands more care with respect to its
topological structure. The common rectangular patch construction cannot be
applied directly to the hyperbolic space. Therefore, we propose Farthest
point sampling with Breadth-first Search (FBS) on the hyperbolic space to initialize
the original dictionary for sparse coding. Figure 3 (right) visualizes the patch
selection on the hyperbolic parameter domain, and Fig. 3 (left) projects the selected
patches back to the original ventricular surface, which maintains the same
topological structure as the parameter domain.
First, we randomly select a patch center on the hyperbolic space, denoted by
px1 , px1 ∈ Xr , where Xr is the set of all discrete vertices on the hyperbolic space.
Then, we find all points px1 ,i (i = 1, 2, ..., n), where n is the number of

Fig. 3. Visualization of computed image patches on ventricle surfaces and hyperbolic
geometry, respectively. The zoom-in pictures show some overlapping areas between
image patches.

points connected to the patch center px1 . This procedure is called
breadth-first search (BFS) [8], an algorithm for searching graph data
structures: it starts at the tree root and explores the neighboring nodes first,
before moving to the next-level neighbors. Then, we use the same procedure,
BFS, to find all points connected to px1 ,i , denoted px1 ,ij (j = 1, 2, · · · , mi ).
Here, mi represents the number of points connected to each specific point px1 ,i .
Finally, we obtain the set Px1 below, which is a selected patch with patch center
px1 and contains no duplicate points.

Px1 = {px1 , px1 ,1 , px1 ,11 , · · · , px1 ,1m1 , · · · , px1 ,n , px1 ,n1 , · · · , px1 ,nmn }. (1)

All connected components of the center point px1 are thus contained in the
set Px1 . After that, we reconstruct the topological patches based on hyperbolic
geometry and connect edges between the points within Px1 according to the
topological structure. We use Φ1 to denote the first selected patch of the root
(patch center) px1 . Since we randomly select patches with different degrees of
overlap, we use the radius r = maxpx ∈Xr dXr (px , px1 ) to determine the position
of the next patch root px2 .
In this way, we can find the second patch root px2 ∈ Xr at the farthest
distance r from px1 . We apply farthest point sampling [7] because the sampling
principle is based on the idea of repeatedly placing the next sample point in the
middle of the least known area of the sampling domain, which can guarantee
the randomness of the patch selection. Here, d is the hyperbolic distance in the
Klein model. Given two points p and q, draw a straight line between them; the
straight line intersects the unit circle at points a and b, so d is defined as follows:

d(p, q) = (1/2) log( |aq||bp| / (|ap||bq|) ),    (2)

where |aq| > |ap| and |bp| > |bq|.
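Equation (2) can be evaluated by intersecting the chord through p and q with the unit circle. A small sketch (our own illustration in plain NumPy, not the authors' code):

```python
import numpy as np

def klein_distance(p, q):
    """Hyperbolic distance between p, q (|p|, |q| < 1) in the Klein disk."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.allclose(p, q):
        return 0.0
    d = q - p
    # Solve |p + t*d|^2 = 1 for t: the chord meets the circle at a and b.
    A, B, C = d @ d, 2.0 * (p @ d), (p @ p) - 1.0
    disc = np.sqrt(B * B - 4.0 * A * C)
    a = p + ((-B - disc) / (2.0 * A)) * d     # on the far side of p
    b = p + ((-B + disc) / (2.0 * A)) * d     # on the far side of q
    aq, ap = np.linalg.norm(a - q), np.linalg.norm(a - p)
    bp, bq = np.linalg.norm(b - p), np.linalg.norm(b - q)
    return 0.5 * np.log((aq * bp) / (ap * bq))

# d((0,0),(0.5,0)): a=(-1,0), b=(1,0) -> 0.5*log(1.5/0.5) = 0.5*log(3)
print(klein_distance((0.0, 0.0), (0.5, 0.0)))
```

With a placed beyond p and b beyond q, the inequalities |aq| > |ap| and |bp| > |bq| hold, so the cross-ratio exceeds 1 and the distance is positive.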


Then, we can calculate:

px2 = arg max_{px ∈Xr} dXr (px , X′ ),    (3)

Algorithm 1. Farthest point sampling with Breadth-first Search (FBS)


Input: Hyperbolic parameter space.
Output: A collection of patches, with varying amounts of overlap, on the topological structure.
1: Start with X′ = {px1 }; Xr denotes all discrete vertices on the hyperbolic space.
2: for t = 1 to T do
3:    Find all connected components pxt ,i of pxt by using one step of BFS.
4:    Find the set Pxt as in Eq. 1 by using one step of BFS.
5:    r = maxpx ∈Xr dXr (px , pxt )   {determine the sampling radius}
6:    if r ≤ 10^{-2} then STOP
7:    end if
8:    Find the farthest point from X′ :
9:    pxt+1 = arg max_{px ∈Xr} dXr (px , X′ )
10:   Add pxt+1 to X′
11: end for

where X′ denotes the set of selected patch centers. Then, we add px2 to X′ and
iterate the patch selection procedure T = 2000 times, which covers all vertices
according to our experimental results. The details of FBS are summarized in
Algorithm 1.
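The patch-selection loop of Algorithm 1 can be sketched on a vertex adjacency graph as follows. This is our own simplified illustration: `dist` stands in for the hyperbolic distance d of Eq. (2) (a Euclidean stand-in is used in the toy example), and a fixed BFS depth replaces the paper's one-step-BFS bookkeeping:

```python
from collections import deque

def bfs_patch(adj, center, depth=2):
    """All vertices within `depth` BFS steps of `center` (one patch)."""
    patch, frontier = {center}, deque([(center, 0)])
    while frontier:
        v, d = frontier.popleft()
        if d == depth:
            continue
        for u in adj[v]:
            if u not in patch:
                patch.add(u)
                frontier.append((u, d + 1))
    return patch

def fbs(adj, coords, dist, n_patches, start=0):
    """Farthest point sampling of patch centers, with a BFS patch each."""
    centers, patches = [start], [bfs_patch(adj, start)]
    for _ in range(n_patches - 1):
        # next center maximizes its distance to the selected centers
        nxt = max(adj, key=lambda v: min(dist(coords[v], coords[c])
                                         for c in centers))
        centers.append(nxt)
        patches.append(bfs_patch(adj, nxt))
    return centers, patches

# A toy path graph 0-1-2-3-4 with 1D coordinates 0..4:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
coords = {i: float(i) for i in adj}
dist = lambda a, b: abs(a - b)
centers, patches = fbs(adj, coords, dist, 3)
print(centers)      # [0, 4, 2]: the two ends first, then the middle
```

The toy run shows the farthest-point principle at work: each new center lands in the least-covered region, which guarantees the patch centers spread over the whole domain.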

2.3 Sparse Coding and Dictionary Learning

For our problem, the dimension of the surface-based features is usually much larger
than the number of subjects; e.g., we have approximately 150,000 features from
each side of the ventricular surface for each subject. Therefore, we used the technique
of dictionary learning [6] with pooling to reduce the dimension before prediction.
The problem statement of dictionary learning is as follows.
Given a finite training set of n image patches X = (x1 , x2 , · · · , xn ) ∈ R^{m×n} ,
with each image patch xi ∈ R^m , i = 1, 2, · · · , n, where m is the dimension of an
image patch, we can incorporate the idea of patch features into the following
optimization problem for each patch xi :

min_{D,zi} fi (D, zi ) = (1/2)||Dzi − xi ||_2^2 + λ||zi ||_1 .    (4)

Specifically, suppose there are t atoms dj ∈ R^m , j = 1, 2, · · · , t, where the
number of atoms t is much smaller than n (the number of image patches) but
larger than m (the dimension of the image patches). xi can be represented by
xi = Σ_{j=1}^{t} zi,j dj . In this way, the m-dimensional vector xi is represented
by a t-dimensional vector zi = (zi,1 , · · · , zi,t )^T , which means the learned
feature vector zi is a sparse vector. In Eq. 4, λ is the regularization parameter,
|| · || is the standard Euclidean norm, ||zi ||_1 = Σ_{j=1}^{t} |zi,j |, and
D = (d1 , d2 , · · · , dt ) ∈ R^{m×t} is the dictionary, with each column representing
a basis vector.

To prevent an arbitrary scaling of the sparse codes, the columns dj are constrained
by C = {D ∈ R^{m×t} s.t. ∀j = 1, · · · , t, dj^T dj ≤ 1}. Thus, the problem of
dictionary learning can be rewritten as a matrix factorization problem:

min_{D∈C, Z∈R^{t×n}} (1/2)||X − DZ||_F^2 + λ||Z||_1 .    (5)
It is a convex problem when either D or Z is fixed. When the dictionary D is
fixed, solving for each sparse code zi is a Lasso problem. Otherwise, when Z is
fixed, it becomes a quadratic problem, which is relatively time-consuming.
Thus, we choose the SCC algorithm [6], because it can dramatically reduce the
computational cost of the sparse coding while keeping a comparable performance.
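A minimal alternating sketch of problem (5) is shown below. It is our own illustration, not the SCC algorithm of [6]: one ISTA step for the codes Z, then a gradient step for the dictionary D with column renormalization (a simple stand-in for keeping D in the constraint set C):

```python
import numpy as np

def soft_threshold(v, a):
    return np.sign(v) * np.maximum(np.abs(v) - a, 0.0)

def sparse_coding(X, t, lam=0.01, iters=200, seed=0):
    """Alternating minimization of 0.5*||X - D Z||_F^2 + lam*||Z||_1."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    D = rng.standard_normal((m, t))
    D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
    Z = np.zeros((t, n))
    for _ in range(iters):
        # Z step: one ISTA iteration with step size 1/L, L = ||D||_2^2
        L = np.linalg.norm(D, 2) ** 2
        Z = soft_threshold(Z - (D.T @ (D @ Z - X)) / L, lam / L)
        # D step: gradient step on the smooth term, then renormalize
        D -= ((D @ Z - X) @ Z.T) / (np.linalg.norm(Z, 2) ** 2 + 1e-12)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D, Z

X = np.kron(np.eye(4), np.ones((2, 8)))             # 8 x 32 toy patch matrix
D, Z = sparse_coding(X, t=8)
print(np.linalg.norm(X - D @ Z) / np.linalg.norm(X))  # small relative error
```

SCC replaces the full gradient sweeps with cheap stochastic coordinate updates, which is what makes it practical at the ~150,000-feature scale described above.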

3 Dataset of Experiments and Classification Results


We selected 133 subjects from the MCI group in the ADNI baseline dataset [11].
These subjects were chosen on the basis of having at least 36 months of longitudinal
data, consisting of 71 subjects who developed AD within 36 months (the MCIc
group) and 62 subjects who did not convert to AD (the MCIs group). All subjects
underwent thorough clinical and cognitive assessment at the time of acquisition,
including the Mini-Mental State Examination (MMSE) and the Alzheimer’s
Disease Assessment Scale-Cognitive subscale (ADAS-Cog). The statistics, with
matched gender, education, age, and MMSE, are shown in Table 1.

Table 1. Demographic statistic information of our experiment’s dataset.

Number Gender (F/M) Education Age MMSE


MCIc 71 26/45 15.99 ± 2.73 74.77 ± 6.81 26.83 ± 1.60
MCIs 62 18/44 15.87 ± 2.76 75.42 ± 7.83 27.66 ± 1.57

In this work, we employed Adaboost [10] to perform the binary classification
and distinguish individuals in the two groups. Accuracy (ACC), Sensitivity
(SEN), Specificity (SPE), Positive predictive value (PPV), and Negative
predictive value (NPV) were computed to evaluate the classification results [4]. We
also computed the area under the curve (AUC) of the receiver operating
characteristic (ROC) [4]. A five-fold cross-validation was adopted to estimate
classification accuracy. For comparison purposes, we computed ventricular volumes
and surface areas within the MNI space for each brain hemisphere [9], which are
viewed as powerful MRI biomarkers that have been widely used in studies of AD.
We also compared FBS with a ventricular surface shape method in [5] (Shape),
which built an automatic shape modeling method to generate comparable meshes
of all ventricles. The deformation-based morphometry features were selected with
repeated permutation tests and then used as geometric features. A support vector
machine was adopted as the classifier. With our ventricular surface registration
results, we followed the Shape work for selecting biomarkers and

Table 2. Classification results comparing with other systems.

Name Region ACC SEN SPE PPV NPV AUC


FBS Left 0.727 0.786 0.684 0.647 0.813 0.754
Right 0.652 0.652 0.000 1.000 0.000 0.567
Whole 0.967 0.933 1.000 1.000 0.889 0.976
Shape Left 0.535 0.615 0.412 0.615 0.412 0.572
Right 0.512 0.515 0.500 0.773 0.238 0.526
Whole 0.605 0.656 0.500 0.731 0.412 0.656
Volume Left 0.558 0.571 0.552 0.381 0.727 0.532
Right 0.517 0.536 0.467 0.652 0.350 0.430
Whole 0.535 0.607 0.400 0.654 0.353 0.452
Area Left 0.558 0.552 0.571 0.727 0.381 0.626
Right 0.465 0.625 0.370 0.370 0.625 0.493
Whole 0.512 0.482 0.563 0.650 0.391 0.517

classification on the same dataset with our new algorithm. We tested FBS, Shape,
volume, and area on the left, right, and whole ventricle, respectively. Table 2
shows the classification performance of the four methods in one experiment.
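For reference, the metrics reported in Table 2 derive from a binary confusion matrix (positives = MCI converters). A small sketch with illustrative counts (our own example, not the paper's actual fold counts):

```python
def metrics(tp, fn, tn, fp):
    """ACC, SEN, SPE, PPV, NPV from binary confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "SEN": tp / (tp + fn),      # sensitivity (recall on converters)
        "SPE": tn / (tn + fp),      # specificity
        "PPV": tp / (tp + fp),      # positive predictive value
        "NPV": tn / (tn + fn),      # negative predictive value
    }

# e.g. one test fold with 14/15 converters and 15/15 stables correct:
m = metrics(tp=14, fn=1, tn=15, fp=0)
print(m["ACC"], m["SEN"], m["SPE"])   # ~0.967, ~0.933, 1.0
```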

Fig. 4. Classification performance comparison with ROC curves and AUC measures.

Across all the experimental results, we find that the best accuracy (96.7 %),
the best sensitivity (93.3 %), the best specificity (100 %), the best positive
predictive value (100 %), and the best negative predictive value (88.9 %) were
achieved when we used TBM features on the ventricular hyperbolic space on both
sides (whole) for training and testing. The comparison also shows that our new
framework selected better features and produced better and more meaningful
classification results. In Fig. 4, we also generated ROC curves and computed AUC
measures in four experiments. The FBS algorithm with whole-ventricle TBM
features achieved the best AUC (0.957). The comparison demonstrates that our
proposed algorithm may be useful for AD diagnosis and prognosis research. In
the future, we will perform more in-depth comparisons against other shape
analysis methods, such as SPHARM-PDM and radial distance, to further improve
our algorithm’s efficiency and accuracy.

Acknowledgments. The research was supported in part by NIH (R21AG043760,


R21AG049216, RF1AG051710, R01AG031581, P30AG19610 and U54EB020403) and
NSF (DMS-1413417 and IIS-1421165).

References
1. Boureau, Y.L., Ponce, J., LeCun, Y.: A theoretical analysis of feature pooling in
visual recognition. In: Proceedings of the ICML-2010, vol. 10, pp. 111–118 (2010)
2. Cardenas, V., Chao, L., Studholme, C., Yaffe, K., Miller, B., Madison, C.,
Buckley, S., Mungas, D., Schuff, N., Weiner, M.: Brain atrophy associated with
baseline and longitudinal measures of cognition. Neurobiol. Aging 32(4), 572–580
(2011)
3. Chung, M.K., Robbins, S.M., Dalton, K.M., Davidson, R.J., Alexander, A.L.,
Evans, A.C.: Cortical thickness analysis in autism with heat kernel smoothing.
NeuroImage 25(4), 1256–1265 (2005)
4. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–
874 (2006)
5. Ferrarini, L., Palm, W.M., Olofsen, H., van der Landen, R., van Buchem, M.A.,
Reiber, J.H., Admiraal-Behloul, F.: Ventricular shape biomarkers for Alzheimer’s
disease in clinical MR images. Magn. Reson. Med. 59(2), 260–267 (2008)
6. Lin, B., Li, Q., Sun, Q., Lai, M.J., Davidson, I., Fan, W., Ye, J.: Stochastic coordi-
nate coding and its application for drosophila gene expression pattern annotation
(2014). arXiv preprint arXiv:1407.8147
7. Moenning, C., Dodgson, N.A.: Fast marching farthest point sampling. In: Proceed-
ings of EUROGRAPHICS 2003 (2003)
8. Patel, J.R., Shah, T.R., Shingadiy, V.P., Patel, V.B.: Comparison between breadth
first search and nearest neighbor algorithm for waveguide path planning.
9. Patenaude, B., Smith, S.M., Kennedy, D.N., Jenkinson, M.: A Bayesian model
of shape and appearance for subcortical brain segmentation. Neuroimage 56(3),
907–922 (2011)
10. Rojas, R.: AdaBoost and the super bowl of classifiers: a tutorial introduction to
adaptive boosting. Technical report, Freie Universität Berlin (2009)
11. Shi, J., Stonnington, C.M., Thompson, P.M., Chen, K., Gutman, B., Reschke, C.,
Baxter, L.C., Reiman, E.M., Caselli, R.J., Wang, Y.: Studying ventricular abnor-
malities in mild cognitive impairment with hyperbolic Ricci flow and tensor-based
morphometry. NeuroImage 104, 1–20 (2015)
12. Stonnington, C.M., Chu, C., Klöppel, S., Jack, C.R., Ashburner, J., Frackowiak,
R.S.: Predicting clinical scores from magnetic resonance scans in Alzheimer’s dis-
ease. Neuroimage 51(4), 1405–1413 (2010)
13. Styner, M., Lieberman, J.A., McClure, R.K., Weinberger, D.R., Jones, D.W.,
Gerig, G.: Morphometric analysis of lateral ventricles in schizophrenia and healthy
controls regarding genetic and disease-specific factors. Proc. Natl. Acad. Sci. U.S.A.
102(13), 4872–4877 (2005)
14. Thompson, P.M., Hayashi, K.M., de Zubicaray, G.I., Janke, A.L., Rose, S.E.,
Semple, J., Hong, M.S., Herman, D.H., Gravano, D., Doddrell, D.M., Toga, A.W.:
Mapping hippocampal and ventricular change in Alzheimer disease. Neuroimage
22(4), 1754–1766 (2004)
15. Zhang, J., Stonnington, C., Li, Q., Shi, J., Bauer, R.J., Gutman, B.A., Chen, K.,
Reiman, E.M., Thompson, P.M., Ye, J., Wang, Y.: Applying sparse coding to
surface multivariate tensor-based morphometry to predict future cognitive decline.
In: IEEE International Symposium on Biomedical Imaging (2016)
Large-Scale Collaborative Imaging Genetics
Studies of Risk Genetic Factors for Alzheimer’s
Disease Across Multiple Institutions

Qingyang Li1 , Tao Yang1 , Liang Zhan2 , Derrek Paul Hibar3 , Neda Jahanshad3 ,
Yalin Wang1 , Jieping Ye4 , Paul M. Thompson3 , and Jie Wang4(B)
1
School of Computing, Informatics, and Decision Systems Engineering,
Arizona State University, Tempe, AZ, USA
2
Department of Engineering and Technology, University of Wisconsin-Stout,
Menomonie, WI, USA
3
Imaging Genetics Center, Institute for Neuroimaging and Informatics,
University of Southern California, Marina del Rey, CA, USA
4
Department of Computational Medicine and Bioinformatics,
University of Michigan, Ann Arbor, MI, USA
jwangumi@umich.edu

Abstract. Genome-wide association studies (GWAS) offer new oppor-


tunities to identify genetic risk factors for Alzheimer’s disease (AD).
Recently, collaborative efforts across different institutions have emerged that
enhance the power of many existing techniques on individual institution
data. However, a major barrier to collaborative studies of GWAS is that
many institutions need to preserve individual data privacy. To address
this challenge, we propose a novel distributed framework, termed Local
Query Model (LQM) to detect risk SNPs for AD across multiple research
institutions. To accelerate the learning process, we propose a Distributed
Enhanced Dual Polytope Projection (D-EDPP) screening rule to iden-
tify irrelevant features and remove them from the optimization. To the
best of our knowledge, this is the first successful run of the computa-
tionally intensive model selection procedure to learn a consistent model
across different institutions without compromising their privacy while
ranking the SNPs that may collectively affect AD. Empirical studies are
conducted on 809 subjects with 5.9 million SNP features which are dis-
tributed across three individual institutions. D-EDPP achieved a 66-fold
speed-up by effectively identifying irrelevant features.

Keywords: Alzheimer’s disease · GWAS · Data privacy · Lasso


screening

1 Introduction
Alzheimer’s Disease (AD) is a severe and growing worldwide health problem.
Many techniques have been developed to investigate AD, such as magnetic reso-
nance imaging (MRI) and genome-wide association studies (GWAS), which are
powerful neuroimaging modalities to identify preclinical and clinical AD patients.

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 335–343, 2016.
DOI: 10.1007/978-3-319-46720-7 39
336 Q. Li et al.

GWAS [4] are achieving great success in finding single nucleotide polymorphisms
(SNPs) associated with AD. For example, APOE is a highly prevalent AD risk
gene, and each copy of the adverse variant is associated with a 3-fold increase
in AD risk. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) collects
neuroimaging and genomic data from elderly individuals across North America.
However, processing and integrating genetic data across different institutions is
challenging. Each institution may wish to collaborate with others, but often legal
or ethical regulations restrict access to individual data, to avoid compromising
data privacy.
Some studies, such as ADNI, share genomic data publicly under certain con-
ditions, but more commonly, each participating institution may be required to
keep its genomic data private, so collecting all data together may not be feasible.
To deal with this challenge, we proposed a novel distributed framework,
termed Local Query Model (LQM), to perform the Lasso regression analysis
in a distributed manner, learning genetic risk factors without accessing others’
data. However, applying LQM for model selection, such as stability selection,
can be very time-consuming on a large-scale data set. To speed up the learning
process, we proposed a family of distributed safe screening rules (D-SAFE and
D-EDPP) to identify irrelevant features and remove them from the optimiza-
tion without sacrificing accuracy. Next, LQM is employed on the reduced data
matrix to train the model so that each institution obtains top risk genes for AD
by stability selection on the learnt model without revealing its own data set. We
evaluate our method on the ADNI GWAS data, which contains 809 subjects with
5,906,152 SNP features, involving an 80 GB data matrix with approximately
42 billion nonzero elements, distributed across three research institutions.
Empirical evaluations demonstrate a 66-fold speedup gained by D-EDPP, compared
to LQM without D-EDPP. Stability selection results show that the proposed
framework ranked APOE as the top risk SNP among all features.

2 Data Processing
2.1 ADNI GWAS Data
The ADNI GWAS data contains genotype information for each of the 809 ADNI
participants, who consist of 128 patients with AD, 415 with mild cognitive
impairment (MCI), and 266 cognitively normal (CN) subjects. SNPs at
approximately 5.9 million specific loci are recorded for each participant. We
encode SNPs with the coding scheme in [7] and apply Minor Allele Frequency
(MAF) < 0.05 and Genotype Quality (GQ) < 45 as two quality control criteria
to filter out low-quality SNP features; for details, refer to [11].
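A sketch of this QC step under one plausible reading of the thresholds (keep SNPs with MAF ≥ 0.05 and GQ ≥ 45); the array names are illustrative, not ADNI field names:

```python
import numpy as np

def qc_mask(maf, gq, maf_min=0.05, gq_min=45.0):
    """Boolean mask of SNPs passing both quality-control criteria."""
    maf, gq = np.asarray(maf), np.asarray(gq)
    return (maf >= maf_min) & (gq >= gq_min)

maf = np.array([0.20, 0.01, 0.30])    # second SNP fails the MAF criterion
gq = np.array([50.0, 60.0, 30.0])     # third SNP fails the GQ criterion
print(qc_mask(maf, gq))               # [ True False False]
```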

2.2 Data Partition


Lasso [9] is a widely-used regression technique to find sparse representations of
data, or predictive models. Standard Lasso takes the form of
min_{x∈R^p} (1/2)||Ax − y||_2^2 + λ||x||_1 ,    (1)
Large-Scale Collaborative Imaging Genetics Studies 337

where A is the genomic data set distributed across different institutions, y is
the response vector (e.g., hippocampus volume or disease status), x is the sparse
representation shared across all institutions, and λ is a positive regularization
parameter.
Suppose that we have m participating institutions. For the ith institution, we
denote its data set by (Ai , yi ), where Ai ∈ R^{ni×p} , ni is the number of subjects
in this institution, p is the number of features, yi ∈ R^{ni} is the corresponding
response vector, and n = Σ_{i=1}^{m} ni . We assume p is the same across all m
institutions. Our goal is to apply Lasso to rank risk SNPs of AD based on the
distributed data sets (Ai , yi ), i = 1, 2, ..., m.

3 Methods
Figure 1 illustrates the general idea of our distributed framework. Suppose that
each institution maintains the ADNI genome-wide data for a few subjects. We
first apply the distributed Lasso screening rule to pre-identify inactive features
and remove them from the training phase. Next, we employ the LQM on the
reduced data matrices to perform collaborative analyses across different institu-
tions. Finally, each institution obtains the learnt model and performs stability
selection to rank the SNPs that may collectively affect AD. The process of sta-
bility selection is to count the frequency of nonzero entries in the solution vectors
and select the most frequent ones as the top risk genes for AD. The whole learn-
ing procedure results in the same model for all institutions, and preserves data
privacy at each of them.
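The stability-selection counting step described above can be sketched as follows (our own illustration; `solutions` stacks the sparse coefficient vectors obtained from repeated runs):

```python
import numpy as np

def rank_by_stability(solutions):
    """Rank features by how often their coefficient is nonzero across runs.

    solutions: (n_runs, p) array of sparse solution vectors.
    Returns (order, freq): feature indices, most stable first, and the
    per-feature selection frequency.
    """
    freq = (np.abs(solutions) > 0).mean(axis=0)
    order = np.argsort(-freq)
    return order, freq

sols = np.array([[0.0, 1.2, 0.0, -0.3],
                 [0.0, 0.9, 0.1, 0.0],
                 [0.2, 1.1, 0.0, 0.0]])
order, freq = rank_by_stability(sols)
print(order[0], freq[1])   # feature 1 is selected in every run (frequency 1.0)
```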

Fig. 1. The streamline of our proposed framework.

3.1 Local Query Model


We apply a proximal gradient descent algorithm, the Iterative Shrinkage/
Thresholding Algorithm (ISTA) [2], to solve problem (1). We define g(x; A, y) =
(1/2)||Ax − y||_2^2 as the least-squares loss function. The general updating rule
of ISTA is:
338 Q. Li et al.

x^{k+1} = Γ_{λt_k}(x^k − t_k ∇g(x^k; A, y)),   (2)

where k is the iteration number, t_k is an appropriate step size, and Γ is the soft-thresholding operator [8] defined by Γ_α(x) = sign(x) · (|x| − α)_+.
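As a concrete reference, update (2) can be sketched in a centralized setting as follows; the data, step size, and λ below are illustrative choices of ours, not the paper's:

```python
import numpy as np

def soft_threshold(x, alpha):
    # Γ_α(x) = sign(x) · (|x| − α)_+
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimize (1/2)||Ax − y||² + λ||x||₁ via iterative shrinkage."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2      # constant step 1/L, L = ||A||₂²
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)             # ∇g(x) = Aᵀ(Ax − y)
        x = soft_threshold(x - t * grad, lam * t)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
y = A[:, 0] * 3.0                            # only feature 0 drives the response
x_hat = ista(A, y, lam=5.0)                  # x_hat concentrates on feature 0
```

The step size 1/||A||₂² is the usual Lipschitz choice for the least-squares gradient.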
In view of (2), to solve (1), we need to compute the gradient of the loss
function ∇g, which equals to AT (Ax − y). However, because the data set (A, y)
is distributed to different institutions, we cannot compute the gradient directly.
To address this challenge, we propose a Local Query Model to learn the model
x across multiple institutions without compromising data privacy.
In our study, each institution maintains its own data set (A_i, y_i) to preserve privacy. To avoid collecting all data matrices A_i, i = 1, 2, ..., m, together, we can rewrite problem (1) as the following equivalent formulation: min_x Σ_{i=1}^m g_i(x; A_i, y_i) + λ||x||_1, where g_i(x; A_i, y_i) = (1/2)||A_i x − y_i||_2^2 is the least-squares loss.
The key of LQM lies in the following decomposition: ∇g = A^T(Ax − y) = Σ_{i=1}^m A_i^T(A_i x − y_i) = Σ_{i=1}^m ∇g_i. We use "local institution" to denote all the institutions and "global center" to represent the place where intermediate results are
calculated. The ith local institution computes ∇g_i = A_i^T(A_i x − y_i). Then, each local institution sends the partial gradient of the loss function to the global center. After gathering all the gradient information, the global center can compute the exact gradient with respect to x by adding all ∇g_i together, and it sends the updated gradient ∇g back to all the local institutions to compute x.
The master (global center) only serves as the computation center and does not store any data sets. Although the master receives g_i, it cannot reconstruct A_i and y_i. Let g_i^k denote the kth iterate of g_i. Suppose x is initialized to zero; then g_i^1 = −A_i^T y_i and g_i^k = A_i^T(A_i x^k − y_i). We can recover A_i^T A_i x^k from g_i^k − g_i^1, but A_i cannot be reconstructed, since updating and storing x only happens in the workers (local institutions). As a result, LQM properly maintains data privacy for all the institutions.
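The gradient decomposition underlying LQM is easy to verify numerically; a minimal sketch (with hypothetical institution splits of our own) is:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 8))
y = rng.standard_normal(60)
x = rng.standard_normal(8)

# Rows split across m = 3 hypothetical institutions.
parts = np.array_split(np.arange(60), 3)

# Each institution computes only its partial gradient ∇g_i = A_iᵀ(A_i x − y_i);
partial = [A[idx].T @ (A[idx] @ x - y[idx]) for idx in parts]

# the global center sums the m partial gradients,
grad_global = np.sum(partial, axis=0)

# which equals the centralized gradient Aᵀ(Ax − y).
grad_central = A.T @ (A @ x - y)
```

Only the aggregated vectors travel between sites; the raw (A_i, y_i) never leave the institution.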

3.2 Safe Screening Rules for Lasso


The dual problem of Lasso (1) can be formulated as:

sup_θ { (1/2)||y||_2^2 − (λ²/2)||θ − y/λ||_2^2 : |[A]_j^T θ| ≤ 1, j = 1, 2, ..., p },   (3)
where θ is the dual variable and [A]_j denotes the jth column of A. Let θ*(λ) be the optimal solution of problem (3) and x*(λ) the optimal solution of problem (1). The Karush–Kuhn–Tucker (KKT) conditions are given by:

y = Ax*(λ) + λθ*(λ),   (4)

[A]_j^T θ*(λ) ∈ { sign([x*(λ)]_j) }  if [x*(λ)]_j ≠ 0,  and  [A]_j^T θ*(λ) ∈ [−1, 1]  if [x*(λ)]_j = 0,   (5)

where [x*(λ)]_j denotes the jth component of x*(λ). In view of the KKT condition (5), the following rule holds: |[A]_j^T θ*(λ)| < 1 ⇒ [x*(λ)]_j = 0 ⇒ x_j is an inactive feature.

The inactive features have zero components in the optimal solution vector x*(λ), so we can remove them from the optimization without sacrificing the accuracy of the optimal value of the objective function (1). Such screening methods are called safe screening rules. SAFE [3] is a highly efficient safe screening method. In SAFE, the jth entry of x*(λ) is discarded when

|[A]_j^T y| < λ − ||[A]_j||_2 ||y||_2 (λ_max − λ)/λ_max,   (6)

where λ_max = max_j |[A]_j^T y|. As a result, the optimization can be performed on the reduced data matrix Ã, and the original problem (1) can be reformulated as:

min_x̃ (1/2)||Ã x̃ − y||_2^2 + λ||x̃||_1,  x̃ ∈ R^{p̃}, Ã ∈ R^{n×p̃},   (7)

where p̃ is the number of remaining features after employing safe screening rules. The optimization is performed on a reduced feature matrix, which accelerates the whole learning process significantly.
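Rule (6) is straightforward to express in code; a centralized sketch on random data of our own is:

```python
import numpy as np

def safe_screen(A, y, lam):
    """SAFE rule (6): discard feature j when
    |[A]_jᵀ y| < λ − ||[A]_j||₂ ||y||₂ (λmax − λ)/λmax."""
    corr = np.abs(A.T @ y)
    lam_max = corr.max()
    thresh = lam - np.linalg.norm(A, axis=0) * np.linalg.norm(y) \
                   * (lam_max - lam) / lam_max
    return corr < thresh          # boolean mask of features to discard

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 50))
y = rng.standard_normal(30)
lam_max = np.abs(A.T @ y).max()

# At λ = λmax the threshold reduces to λmax itself, so every feature
# except the single most correlated one is screened out.
discard = safe_screen(A, y, lam_max)
```

Features surviving the mask form the reduced matrix Ã on which (7) is solved.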

3.3 Distributed Safe Screening Rules for Lasso


As data are distributed across different institutions, we develop a family of distributed Lasso screening rules to identify and discard inactive features in a distributed environment. Suppose the ith institution holds the data set (A_i, y_i); we summarize a distributed version of the SAFE screening rule (D-SAFE) as follows:

Step 1: Q_i = A_i^T y_i; update Q = Σ_{i=1}^m Q_i by LQM.
Step 2: λ_max = max_j |[Q]_j|.
Step 3: If |[A]_j^T y| < λ − ||[A]_j||_2 ||y||_2 (λ_max − λ)/λ_max, discard the jth feature.

To compute ||[A]_j||_2 in Step 3, we first compute H_i = ||[A_i]_j||_2^2 and perform LQM to compute H = Σ_{i=1}^m H_i. Then, we have ||[A]_j||_2 = √H. Similarly, we can compute ||y||_2 in Step 3. As the data communication only involves intermediate results, D-SAFE preserves the data privacy of each institution.
To tune the value of λ, commonly used methods such as cross validation need to solve the Lasso problem along a sequence of parameters λ_0 > λ_1 > ... > λ_κ, which can be very time-consuming. Enhanced Dual Polytope Projection (EDPP) [10] is a highly efficient safe screening rule. Implementation details of EDPP are available on GitHub: http://dpc-screening.github.io/lasso.html.
To address the problem of data privacy, we propose a distributed Lasso
screening rule, termed Distributed Enhanced Dual Polytope Projection (D-
EDPP), to identify and discard inactive features along a sequence of parameter
values in a distributed manner. The idea of D-EDPP is similar to LQM. Specifically, to update the global variables, we apply LQM to query each local institution for intermediate results, computed locally, and aggregate them at the global center. After obtaining the reduced matrix for each institution, we apply LQM to solve the Lasso problem on the reduced data sets Ã_i, i = 1, ..., m. We assume that j

Algorithm 1. Distributed Enhanced Dual Polytope Projection (D-EDPP)

Require: A set of data pairs {(A_1, y_1), (A_2, y_2), ..., (A_m, y_m)}, where the ith institution holds the data pair (A_i, y_i); a sequence of parameters λ_max = λ_0 > λ_1 > ... > λ_κ.
Ensure: The learnt models: {x*(λ_0), x*(λ_1), ..., x*(λ_κ)}.
1: Perform the computation on the m institutions. For the ith institution:
2: Let R_i = A_i^T y_i; compute R = Σ_{i=1}^m R_i by LQM. Then λ_max = ||R||_∞.
3: J = arg max_j |[R]_j|; v_i = [A_i]_J, where [A_i]_J is the Jth column of A_i.
4: Let λ_0 ∈ (0, λ_max] and λ ∈ (0, λ_0].
5: θ_i(λ) = y_i/λ_max if λ = λ_max; θ_i(λ) = (y_i − A_i x*(λ))/λ if λ ∈ (0, λ_max).
6: T_i = v_i^T y_i; compute T = Σ_{i=1}^m T_i by LQM.
7: v_1(λ_0)_i = sign(T) · v_i if λ_0 = λ_max; v_1(λ_0)_i = y_i/λ_0 − θ_i(λ_0) if λ_0 ∈ (0, λ_max).
8: v_2(λ, λ_0)_i = y_i/λ − θ_i(λ_0); S_i = ||v_1(λ_0)_i||_2^2; compute S = Σ_{i=1}^m S_i by LQM.
9: v_2⊥(λ, λ_0)_i = v_2(λ, λ_0)_i − (⟨v_1(λ_0)_i, v_2(λ, λ_0)_i⟩ / S) · v_1(λ_0)_i.
10: Given the sequence λ_max = λ_0 > λ_1 > ... > λ_κ, for k ∈ [1, κ], screening at λ_k can be performed once x*(λ_{k−1}) is known:
11: for j = 1 to p̃ do
12:   w_i = [A_i]_j^T (θ_i(λ_{k−1}) + (1/2) v_2⊥(λ_k, λ_{k−1})_i); compute w = Σ_{i=1}^m w_i by LQM.
13:   if |w| < 1 − (1/2) ||v_2⊥(λ_k, λ_{k−1})||_2 ||[A]_j||_2 then
14:     identify [x*(λ_k)]_j = 0.
15:   end if
16: end for

indicates the jth column of Ã, j = 1, ..., p̃, where p̃ is the number of remaining features. We summarize the proposed D-EDPP in Algorithm 1.
To calculate R, we apply LQM by aggregating all the R_i at the global center via R = Σ_{i=1}^m R_i and sending R back to every institution. The same approach is used to calculate T, S, and w in D-EDPP. The calculation of ||[A]_j||_2 and ||v_2⊥(λ_k, λ_{k−1})||_2 follows the same way as in D-SAFE. The screening result at λ_k relies on the previous optimal solution x*(λ_{k−1}). In particular, λ_k equals λ_max when k is zero; thus, all the components of x*(λ_0) are identified as zero. When k is 1, we can perform screening based on x*(λ_0).
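A centralized sketch of one such screening step may clarify the algebra. We seed at λ_0 = λ_max, where x*(λ_max) = 0 and hence θ_i(λ_max) = y_i/λ_max; in the distributed version every aggregate below would instead be summed across institutions via LQM. The data and λ values are our own illustrative choices:

```python
import numpy as np

def edpp_screen_from_lmax(A, y, lam):
    """EDPP screening at λ, seeded at λ0 = λmax (so x*(λ0) = 0)."""
    corr = A.T @ y
    lam_max = np.abs(corr).max()
    jstar = np.abs(corr).argmax()
    theta0 = y / lam_max                                  # θ(λmax) (step 5)
    v1 = np.sign(corr[jstar]) * A[:, jstar]               # step 7, λ0 = λmax case
    v2 = y / lam - theta0                                 # step 8
    v2p = v2 - (v1 @ v2) / (v1 @ v1) * v1                 # step 9: remove v1 part
    lhs = np.abs(A.T @ (theta0 + 0.5 * v2p))              # step 12
    rhs = 1.0 - 0.5 * np.linalg.norm(v2p) * np.linalg.norm(A, axis=0)
    return lhs < rhs                                      # steps 13-14: [x*(λ)]_j = 0

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 200))
y = rng.standard_normal(50)
lam_max = np.abs(A.T @ y).max()
discard = edpp_screen_from_lmax(A, y, 0.95 * lam_max)
```

Each inner product and norm here maps to one LQM aggregation round in Algorithm 1.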

3.4 Local Query Model for Lasso

To further accelerate the learning process, we apply FISTA [1] to solve the Lasso problem in a distributed manner. The convergence rate of FISTA is O(1/k²), compared to O(1/k) for ISTA, where k is the iteration number. We integrate FISTA with LQM (F-LQM) to solve the Lasso problem on the reduced matrices Ã_i. We summarize the updating rule of F-LQM in the kth iteration as follows:

Step 1: ∇g_i^k = Ã_i^T(Ã_i x^k − y_i); update ∇g^k = Σ_{i=1}^m ∇g_i^k by LQM.
Step 2: z^k = Γ_{λt_k}(x^k − t_k ∇g^k) and t_{k+1} = (1 + √(1 + 4t_k²))/2.
Step 3: x^{k+1} = z^k + ((t_k − 1)/t_{k+1})(z^k − z^{k−1}).

The matrix Ã_i denotes the reduced matrix for the ith institution obtained by the D-EDPP rule. We repeat this procedure until a satisfactory global model is obtained. Step 1 calculates ∇g_i^k from the local data (Ã_i, y_i); the institutions then perform LQM to obtain the aggregated gradient ∇g^k. Step 2 updates the auxiliary variable z^k and the step size t_k. Step 3 updates the model x. Similar to LQM, the data privacy of the institutions is well preserved by F-LQM.
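The three steps can be sketched as follows in a single process, with the per-institution gradients summed in place of the LQM round-trip; the data, λ, and the Lipschitz bound are our own illustrative choices:

```python
import numpy as np

def soft_threshold(x, alpha):
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def f_lqm(blocks, lam, p, n_iter=300):
    """F-LQM sketch: FISTA whose gradient is assembled from the
    per-institution pieces ∇g_i = Ã_iᵀ(Ã_i x − y_i)."""
    L = sum(np.linalg.norm(Ai, 2) ** 2 for Ai, _ in blocks)   # crude Lipschitz bound
    step = 1.0 / L
    x, z_prev, t = np.zeros(p), np.zeros(p), 1.0
    for _ in range(n_iter):
        grad = sum(Ai.T @ (Ai @ x - yi) for Ai, yi in blocks)  # Step 1 (LQM sum)
        z = soft_threshold(x - step * grad, lam * step)        # Step 2
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        x = z + (t - 1.0) / t_next * (z - z_prev)              # Step 3
        z_prev, t = z, t_next
    return z_prev

rng = np.random.default_rng(4)
A = rng.standard_normal((90, 12))
y = A[:, 1] * 2.0                            # only feature 1 is active
blocks = [(A[:30], y[:30]), (A[30:60], y[30:60]), (A[60:], y[60:])]
x_hat = f_lqm(blocks, lam=4.0, p=12)
```

Summing the squared block spectral norms overestimates the true Lipschitz constant, which is safe but conservative.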

4 Experiment
We implement the proposed framework across three institutions on a state-of-the-art distributed platform, Apache Spark, a fast and efficient platform for large-scale data computing. The experiments show the efficiency and effectiveness of the proposed models.

4.1 Comparison of Lasso with and Without D-EDPP Rule


We choose the volume of the lateral ventricle as the variable being predicted, in trials containing 717 subjects (subjects without labels were removed). The volumes of brain regions were extracted from each subject's T1 MRI scan using FreeSurfer: http://freesurfer.net. We evaluate the efficiency of D-EDPP across three research institutions that maintain 326, 215, and 176 subjects, respectively. The subjects are stored as HDFS files. We solve the Lasso problem along a sequence of 100 parameter values equally spaced on the linear scale of λ/λ_max from 1.00 to 0.05. We randomly select 0.1 million to 1 million features and apply F-LQM, since [1] proved that FISTA converges faster than ISTA. We report the results in Fig. 2: D-EDPP+F-LQM achieves a speedup of up to 66-fold over F-LQM alone.

Fig. 2. Running time comparison of Lasso with and without D-EDPP rules.

4.2 Stability Selection for Top Risk Genetic Factors


We employ stability selection [6,11] with D-EDPP+F-LQM to select top risk
SNPs from the entire GWAS with 5,906,152 features. We conduct four groups

Table 1. Top 5 selected risk SNPs associated with diagnosis and with the volumes of the hippocampus, entorhinal cortex, and lateral ventricle at baseline, based on ADNI.

Diagnosis at baseline:
No.  Chr  Position   RS ID       Gene
1    19   45411941   rs429358    APOE
2    19   45410002   rs769449    APOE
3    12   9911736    rs3136564   CD69
4    1    172879023  rs2227203   unknown
5    20   58267891   rs6100558   PHACTR3

Hippocampus at baseline:
No.  Chr  Position   RS ID       Gene
1    19   45411941   rs429358    APOE
2    8    145158607  rs34173062  SHARPIN
3    11   11317240   rs10831576  GALNT18
4    10   71969989   rs12412466  PPA1
5    6    168107162  rs71573413  unknown

Entorhinal cortex at baseline:
No.  Chr  Position   RS ID       Gene
1    19   45411941   rs429358    APOE
2    15   89688115   rs8025377   ABHD2
3    Y    10070927   rs79584829  unknown
4    14   47506875   rs41354245  MDGA2
5    3    30106956   rs55904134  unknown

Lateral ventricle at baseline:
No.  Chr  Position   RS ID       Gene
1    Y    3164319    rs2261174   unknown
2    10   62162053   rs10994327  ANK3
3    Y    13395084   rs62610496  unknown
4    1    77895410   rs2647521   AK5
5    1    114663751  rs2629810   SYT6

of trials, reported in Table 1. In each trial, D-EDPP+F-LQM is carried out along a sequence of 100 parameter values equally spaced on the linear scale from 1 to 0.05. We repeat this 200 times, running on 500 randomly selected subjects in each round. Table 1 shows the top 5 selected SNPs. APOE, one of the top genetic risk factors for AD [5], is ranked #1 for three groups.

Acknowledgments. This work was supported in part by NIH Big Data to Knowledge
(BD2K) Center of Excellence grant U54 EB020403, funded by a cross-NIH consortium
including NIBIB and NCI.

References
1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear
inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
2. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for
linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math.
57(11), 1413–1457 (2004)
3. Ghaoui, L.E., Viallon, V., Rabbani, T.: Safe feature elimination for the lasso and
sparse supervised learning problems. arXiv preprint arXiv:1009.4219 (2010)
4. Harold, D., et al.: Genome-wide association study identifies variants at clu and
picalm associated with Alzheimer’s disease. Nature Genet. 41(10), 1088–1093
(2009)
5. Liu, C.C., Kanekiyo, T., Xu, H., Bu, G.: Apolipoprotein e and Alzheimer disease:
risk, mechanisms and therapy. Nature Rev. Neurol. 9(2), 106–118 (2013)
6. Meinshausen, N., Bühlmann, P.: Stability selection. J. R. Stat. Soc. Series B (Stat.
Methodol.) 72(4), 417–473 (2010)
7. Sasieni, P.D.: From genotypes to genes: doubling the sample size. Biometrics, 1253–
1261 (1997)

8. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for ℓ1-regularized loss minimization. J. Mach. Learn. Res. 12, 1865–1892 (2011)
9. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc.
Series B (Methodol.), 267–288 (1996)
10. Wang, J., Zhou, J., Wonka, P., Ye, J.: Lasso screening rules via dual polytope
projection. In: Advances in Neural Information Processing Systems (2013)
11. Yang, T., et al.: Detecting genetic risk factors for Alzheimer’s disease in whole
genome sequence data via lasso screening. In: IEEE International Symposium on
Biomedical Imaging, pp. 985–989 (2015)
Structured Sparse Low-Rank Regression Model
for Brain-Wide and Genome-Wide Associations

Xiaofeng Zhu1, Heung-Il Suk2, Heng Huang3, and Dinggang Shen1(B)

1 Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA
dgshen@med.unc.edu
2 Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
3 Computer Science and Engineering, University of Texas at Arlington, Arlington, USA

Abstract. With the advances of neuroimaging techniques and the understanding of genome sequences, phenotype and genotype data have been utilized to study brain diseases (known as imaging genetics). One of the most important topics in imaging genetics is to discover the genetic basis of phenotypic markers and their associations. In such studies, linear regression models have played an important role by providing interpretable results. However, due to their modeling characteristics, they are limited in effectively utilizing the inherent information among the phenotypes and genotypes, which is helpful for better understanding their associations. In this work, we propose a structured sparse low-rank regression method to explicitly consider the correlations within the imaging phenotypes and the genotypes simultaneously for the Brain-Wide and Genome-Wide Association (BW-GWA) study. Specifically, we impose the low-rank constraint as well as structured sparsity constraints on both phenotypes and genotypes. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we conducted experiments predicting the phenotype data from genotype data and achieved a performance improvement of 12.75 % on average, in terms of the root-mean-square error, over the state-of-the-art methods.

1 Introduction
Recently, it has been of great interest to identify the genetic basis (e.g., Sin-
gle Nucleotide Polymorphisms: SNPs) of phenotypic neuroimaging markers
(e.g., features in Magnetic Resonance Imaging: MRI) and study the associa-
tions between them, known as imaging-genetic analysis. In the previous work,
Vounou et al. categorized the association studies between neuroimaging pheno-
types and genotypes into four classes depending on both the dimensionality of
the phenotype being investigated and the size of genomic regions being searched
for association [13]. In this work, we focus on the Brain-Wide and Genome-Wide
Association (BW-GWA) study, in which we search non-random associations for
both the whole brain and the entire genome.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 344–352, 2016.
DOI: 10.1007/978-3-319-46720-7_40

The BW-GWA study has a potential benefit to help discover important asso-
ciations between neuroimaging based phenotypic markers and genotypes from a
different perspective. For example, by identifying high associations between spe-
cific SNPs and some brain regions related to Alzheimer’s Disease (AD), one can
utilize the information of the corresponding SNPs to predict the risk of incident
AD much earlier, even before pathological changes begin. This will help clini-
cians have much time to track the progress of AD and find potential treatments
to prevent the AD. Due to the high-dimensional nature of brain phenotypes
and genotypes, there were only a few studies for BW-GWA [3,8]. Conventional
methods formulated the problem as Multi-output Linear Regression (MLR) to
estimate the coefficients independently, thus resulting in unsatisfactory performance. Recent studies were mostly devoted to conducting dimensionality reduction while keeping the results interpretable. For example, Stein et
al. [8] and Vounou et al. [13], separately, employed t-test and sparse reduced-
rank regression to conduct association study between voxel-based neuroimaging
phenotypes and SNP genotypes.
In this paper, we propose a novel structured sparse low-rank regression model
for the BW-GWA study with MRI features of a whole brain as phenotypes and
the SNP genotypes. To do this, we first impose a low-rank constraint on the
coefficient matrix of the MLR. With a low-rank constraint, the coefficient matrix can be thought of as decomposed into two low-rank matrices, i.e., two transformation subspaces, each of which separately transfers the high-dimensional phenotypes and genotypes into their own low-rank representations by considering the correlations among the response variables and the features. We then introduce a structured sparsity-inducing penalty (i.e., an ℓ2,1-norm regularizer) on each of the transformation matrices to conduct biomarker selection on both phenotypes and
genotypes by taking the correlations among the features into account. The struc-
tured sparsity constraint allows the low-rank regression to select highly predic-
tive genotypes and phenotypes, as a large number of them are not expected to
be important and involved in the BW-GWA study [14]. In this way, our new
method integrates low-rank constraint with structured sparsity constraints in
a unified framework. We apply the proposed method to study the genotype-
phenotype associations using the Alzheimer’s Disease Neuroimaging Initiative
(ADNI) data. Our experimental results show that our new model consistently
outperforms the competing methods in term of the prediction accuracy.

2 Methodology

2.1 Notations
In this paper, we denote matrices, vectors, and scalars as boldface uppercase
letters, boldface lowercase letters, and normal italic letters, respectively. For a
matrix X = [xij ], its i -th row and the j -th column are denoted as xi and xj ,
Frobenius norm and the2,1 -norm of a matrix
respectively. Also, we denote the
 i
X as XF = i
i x 2 = j xj 2 and X2,1 = i x 2 , respectively.
2 2

We further denote the transpose operator, the trace operator, the rank, and the
inverse of a matrix X as XT , tr(X), rank(X), and X−1 , respectively.
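The two matrix norms defined above can be checked numerically; a small sanity example of ours:

```python
import numpy as np

X = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 2.0]])

# Frobenius norm: sqrt of the total sum of squares, equivalently the
# sqrt of the summed squared row norms (or column norms).
frob = np.sqrt((X ** 2).sum())

# ℓ2,1-norm: sum of the ℓ2 norms of the rows; an all-zero row adds
# nothing, which is why this norm encourages row-wise sparsity.
l21 = np.linalg.norm(X, axis=1).sum()        # 5 + 0 + sqrt(5)
```

The zero middle row illustrates why an ℓ2,1 penalty drives entire rows, not single entries, to zero.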

2.2 Low-Rank Multi-output Linear Regression

We denote by X ∈ R^{n×d} and Y ∈ R^{n×c} the matrices of n samples of d SNPs and c MRI features, respectively. We assume that there exists a linear relationship between them, formulated as follows:

Y = XW + eb (1)

where W ∈ Rd×c is a coefficient matrix, b ∈ R1×c is a bias term, and e ∈ Rn×1


denotes a column vector with all ones. If the covariance matrix XT X has full
rank, i.e., rank(XT X) = d, the solution of W in Eq. (1) can be obtained by the
Ordinary Least Square (OLS) estimation [4] as:

Ŵ = (XT X)−1 XT (Y − eb). (2)
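Estimate (2) can be reproduced on synthetic data; here we handle the bias term by mean-centering, a standard equivalent of solving for b jointly (the data sizes and the noise-free setup are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, c = 100, 6, 4
X = rng.standard_normal((n, d))
W_true = rng.standard_normal((d, c))
b_true = rng.standard_normal((1, c))
Y = X @ W_true + np.ones((n, 1)) @ b_true    # Y = XW + eb, noise-free

# Centering X and Y eliminates the bias term; then (2) becomes
# W_hat = (Xcᵀ Xc)⁻¹ Xcᵀ Yc, and b is recovered from the column means.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
W_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)
b_hat = Y.mean(axis=0) - X.mean(axis=0) @ W_hat
```

With noise-free data and n > d, the estimate recovers W and b exactly, since XᵀX has full rank.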

However, the MLR illustrated in Fig. 1(a) with the OLS estimation in Eq. (2) has at least two limitations. First, Eq. (2) is equivalent to conducting mass-univariate linear models, which fit each of the c univariate response variables independently. This obviously does not make use of possible relations among the response variables (i.e., ROIs). Second, neither X nor Y in MLR is ensured to have full rank, due to noise, outliers, and correlations in the data [13]. For the non-full-rank (or low-rank) case of X^T X, Eq. (2) is not applicable.

Fig. 1. Illustration of (a) multi-output linear regression (Y_{n×c} = X_{n×d} W_{d×c} + E_{n×c}) and (b) low-rank regression (Y_{n×c} = X_{n×d} B_{d×r} A_{c×r}^T + E_{n×c}).

The principle of parsimony in many areas of science and engineering, especially in machine learning, justifies hypothesizing low-rankness of the data, i.e., of the MRI phenotypes and the SNP genotypes in our work. Low-rankness leads to the inequality rank(W) ≤ min(d, c), or even rank(W) ≤ min(n, d, c) in the case of limited samples. It thus allows decomposing the coefficient matrix W into the product of two low-rank matrices, i.e., W = BA^T, where B ∈ R^{d×r}, A ∈ R^{c×r}, and r is the rank of W. For a fixed r, the low-rank MLR model illustrated in Fig. 1(b) is formulated as:

min_{A,B,b} ||Y − XBA^T − eb||_F^2.   (3)

The assumption of the existence of latent factors in either phenotypes or genotypes has been reported to make imaging-genetic analysis gain accurate estimation [1,15]. Equation (3) may achieve this by seeking low-rank representations of phenotypes and genotypes, but it neither produces interpretable results nor addresses the issues of a non-invertible X^T X and over-fitting. Naturally, a regularizer is preferred.

2.3 Structured Sparse Low-Rank Multi-output Linear Regression


From a statistical point of view, a well-defined regularizer may produce a generalized solution, thus resulting in stable estimation. In this section, we devise new regularizers for identifying a statistically interpretable BW-GWA.
The high-dimensional feature matrix often suffers from multi-collinearity, i.e., a lack of orthogonality among features, which may lead to the singularity problem and the inflation of the variance of the coefficients [13]. In order to circumvent this problem, we introduce an orthogonality constraint on A into Eq. (3). In the BW-GWA study, there are a large number of SNP genotypes and MRI phenotypes, some of which may not be related to the association analysis between them. The unuseful SNP genotypes (or MRI phenotypes) may affect the extraction of the r latent factors of X (or Y). In such cases, it is not known with certainty which quantitative phenotypes or genotypes provide good estimation for the model.
As the human brain is a complex system, brain regions may be dependently related to each other [3,14]. This motivates us to conduct feature selection via structured sparsity constraints on both X (i.e., SNPs) and Y (i.e., brain regions) while conducting subspace learning via the low-rank constraint. The rationale of using a structured sparsity constraint (e.g., an ℓ2,1-norm regularizer on A, i.e., ||A||_{2,1}) is that it effectively selects highly predictive features (i.e., discarding the unimportant features from the model) by considering the correlations among the features. Such a process amounts to extracting latent vectors from 'purified' data (i.e., the data after removing unuseful features by conducting feature selection), or to conducting feature selection with the help of the low-rank constraint. By applying the constraints of orthogonality and structured sparsity, Eq. (3) can be rewritten as follows:

min_{A,B,b} ||Y − XBA^T − eb||_F^2 + α||B||_{2,1} + β||A||_{2,1},  s.t. A^T A = I.   (4)

Clearly, the ℓ2,1-norm regularizers on B and A penalize the coefficients of B and A in a row-wise manner, for joint selection or un-selection of the features and the response variables, respectively.
Compared to sparse Reduced-Rank Regression (RRR) [13], which exploits ℓ1-norm regularization terms on B and A to sequentially output a vector of either B or A, thus leading to suboptimal solutions of B and A, our method penalizes the ℓ2,1-norms of B and A to explicitly conduct feature selection on X and Y. Furthermore, the orthogonality constraint on A helps avoid the multi-collinearity problem, and thus simplifies the objective function to optimizing only B (instead of BA^T) and A.

Finally, after optimizing Eq. (4), we conduct feature selection by discarding the features (or the response variables) whose corresponding rows of coefficients (i.e., in B or A) are zero.
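This final selection step amounts to keeping the rows with nonzero ℓ2 norm; a minimal sketch of ours, with a hypothetical coefficient matrix B:

```python
import numpy as np

def select_rows(M, tol=1e-8):
    """Indices of rows of a coefficient matrix (B or A) whose ℓ2 norm
    exceeds `tol`, i.e., the features/responses the model keeps."""
    return np.where(np.linalg.norm(M, axis=1) > tol)[0]

# Hypothetical 5×2 matrix B after optimizing Eq. (4): rows 1 and 3 were
# driven to zero by the ℓ2,1 penalty, so SNPs 0, 2, and 4 are selected.
B = np.array([[ 0.7, -0.1],
              [ 0.0,  0.0],
              [ 0.2,  0.5],
              [ 0.0,  0.0],
              [-0.3,  0.4]])
selected = select_rows(B)
```

The same routine applied to A yields the selected ROIs.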

3 Experimental Analysis
We conducted various experiments on the ADNI dataset (‘www.adni-info.org’)
by comparing the proposed method with the state-of-the-art methods.

3.1 Preprocessing and Feature Extraction


Following the literature [9,11,20], we used baseline MRI images of 737 subjects, including 171 AD, 362 mild cognitive impairment, and 204 normal control subjects.
We preprocessed the MRI images by sequentially applying spatial distortion cor-
rection, skull-stripping, and cerebellum removal. We then segmented images into
gray matter, white matter, and cerebrospinal fluid, and further warped them into
93 Regions Of Interest (ROIs). We computed the gray matter tissue volume in
each ROI by integrating the gray matter segmentation result of each subject.
Finally, we acquired 93 features for one MRI image.
The genotype data of all participants were obtained from ADNI-1 and genotyped using the Human 610-Quad BeadChip. In our experiments, 2,098 SNPs from 153 AD candidate genes (boundary: 20 KB) listed in the AlzGene database (www.alzgene.org) as of 4/18/2011 were selected by standard quality control (QC) and imputation steps. The QC criteria include (1) call rate check per subject and per SNP marker, (2) gender check, (3) sibling pair identification, (4) the Hardy-Weinberg equilibrium test, (5) marker removal by the minor allele frequency, and (6) population stratification. The imputation step imputed the QC'ed SNPs using the MaCH software.

3.2 Experimental Setting

The comparison methods include the standard regularized Multi-output Linear Regression (MLR) [4], sparse feature selection with an ℓ2,1-norm regularizer (L21 for short) [2], Group sparse Feature Selection (GFS) [14], sparse Canonical Correlation Analysis (CCA) [6,17], and sparse Reduced-Rank Regression (RRR) [13]. The former two are the most widely used methods in both statistical learning and medical image analysis, while the latter three are state-of-the-art methods in imaging-genetic analysis. Besides, we define the 'Baseline' method by removing the third term (i.e., β||A||_{2,1}) in Eq. (4), to select only SNPs using our model.
We conducted a 5-fold Cross Validation (CV) on all methods, and then repeated the whole process 10 times. The final result was computed by averaging the results of all 50 experiments. We also used a 5-fold nested CV to tune the parameters (such as α and β in Eq. (4)) in the space of {10^{−5}, 10^{−4}, ..., 10^4, 10^5}

for all methods in our experiments. As for the rank of the coefficient matrix W, we varied the value of r in {1, 2, ..., 10} for our method.
Following previous work [3,14], we picked the top {20, 40, ..., 200} SNPs to predict test data. The performance of each experiment was assessed by the Root-Mean-Square Error (RMSE), a widely used measurement for regression analysis, and by 'Frequency' (∈ [0, 1]), defined as the ratio of the 50 experiments in which a feature was selected. The larger the value of 'Frequency', the more likely the corresponding SNP (or ROI) is to be selected.

3.3 Experimental Results

We summarize the RMSE performance of all methods in Fig. 2(a), where the mean and standard deviation of the RMSEs were obtained from the 50 (5-fold CV × 10 repetitions) experiments. Figure 2(b) and (c) show, respectively, the 'Frequency' values of the top 10 SNPs selected by the competing methods and the frequency of the top 10 ROIs selected by our method.
Figure 2(a) reveals the following observations: (i) The RMSE values of all methods decreased as the number of selected SNPs increased; in our experiments, the more SNPs used, the better the performance of the BW-GWA study. (ii) The proposed method obtained the best performance, followed by the Baseline, RRR, GFS, CCA, L21, and MLR. Specifically, our method improved by 12.75 % on average over the other competing methods. In the paired-sample t-test at the 95 % confidence level, all p-values between the proposed method and the comparison methods were less than 0.00001. Moreover, our method was considerably more stable than the comparison methods. This clearly manifests the advantage of the proposed method, which integrates a low-rank constraint with structured sparsity constraints in a unified framework. (iii) The Baseline method improved by 8.26 % on average over the comparison

[Fig. 2 panels: (a) RMSE vs. number of selected SNPs for MLR, L21, GFS, CCA, RRR, Baseline, and the proposed method; (b) frequency of the top 10 selected SNPs (rs429358, rs11234495, rs7938033, rs10792820, rs7945931, rs2276346, rs6584307, rs1329600, rs17367504, rs10779339); (c) frequency of the top 10 selected ROIs.]

Fig. 2. (a) RMSE with respect to different number of selected SNPs of all methods;
(b) Frequency of top 10 selected SNPs by all methods; and (c) Frequency of the top 10
selected ROIs by our method in our 50 experiments. The name of the ROIs (indexed
from 1 to 10) are middle temporal gyrus left, perirhinal cortex left, temporal pole left,
middle temporal gyrus right, amygdala right, hippocampal formation right, middle
temporal gyrus left, amygdala left, inferior temporal gyrus right, and hippocampal
formation left.

methods, and the p-values were less than 0.001 in the paired-sample t-tests at the 95 % confidence level. This shows that our model without selecting ROIs (i.e., Baseline) still outperformed all comparison methods. It is noteworthy that our proposed method improved by 4.49 % on average over the Baseline method, and the paired-sample t-tests also indicated that the improvements were statistically significant. This verifies again that it is essential to simultaneously select a subset of ROIs and a subset of SNPs.
Figure 2(b) indicates that phenotypes can be affected by genotypes to different degrees: (i) The selected SNPs in Fig. 2(b) belong to genes, such as PICALM, APOE, SORL1, ENTPD7, DAPK1, MTHFR, and CR1, which have been reported as top AD-related genes on the AlzGene website. (ii) Although we know little about the underlying mechanisms of genotypes in relation to AD, Fig. 2(b) offers the potential to gain biological insights from the BW-GWA study. (iii) The ROIs selected by the proposed method in Fig. 2(c) are known to be highly related to AD from previous studies [10,12,19]. It is noteworthy that all methods selected the ROIs in Fig. 2(c) as their top ROIs, but with different probabilities.
Finally, our method conducted the BW-GWA study to select a subset of SNPs and a subset of ROIs that were also known to be related to AD from previous state-of-the-art studies. The consistent performance of our method clearly demonstrates that the proposed method enables a more statistically meaningful BW-GWA study than the comparison methods.

4 Conclusion

In this paper, we proposed an efficient structured sparse low-rank regression


method to select highly associated MRI phenotypes and SNP genotypes in a
BW-GWA study. The experimental results on the association study between
neuroimaging data and genetic information verified the effectiveness of the pro-
posed method, by comparing with the state-of-the-art methods.
Our method treated all SNPs (and ROIs) equally. However, SNPs are natu-
rally connected via different pathways, while ROIs have various functional or
structural relations to each other [6,7]. In future work, we will extend our
model to take the interlinked structures within both genotypes and incomplete
multi-modality phenotypes [5,16,18] into account, to further improve the per-
formance of the BW-GWA study.

Acknowledgements. This work was supported in part by NIH grants (EB006733,


EB008374, EB009634, MH100217, AG041721, AG042599). Heung-Il Suk was supported
in part by Institute for Information & communications Technology Promotion (IITP)
grant funded by the Korea government (MSIP) (No. B0101-16-0307, Basic Software
Research in Human-level Lifelong Machine Learning (Machine Learning Center)). Heng
Huang was supported in part by NSF IIS 1117965, IIS 1302675, IIS 1344152, DBI
1356628, and NIH AG049371. Xiaofeng Zhu was supported in part by the National
Natural Science Foundation of China under grants 61573270 and 61263035.
Structured Sparse Low-Rank Regression Model 351

References
1. Du, L., et al.: A novel structure-aware sparse learning algorithm for brain imaging
genetics. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.)
MICCAI 2014. LNCS, vol. 8675, pp. 329–336. Springer, Heidelberg (2014). doi:10.
1007/978-3-319-10443-0 42
2. Evgeniou, A., Pontil, M.: Multi-task feature learning. NIPS 19, 41–48 (2007)
3. Hao, X., Yu, J., Zhang, D.: Identifying genetic associations with MRI-derived
measures via tree-guided sparse learning. In: Golland, P., Hata, N., Barillot, C.,
Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 757–764.
Springer, Heidelberg (2014). doi:10.1007/978-3-319-10470-6 94
4. Izenman, A.J.: Reduced-rank regression for the multivariate linear model. J. Mul-
tivar. Anal. 5(2), 248–264 (1975)
5. Jin, Y., Wee, C.Y., Shi, F., Thung, K.H., Ni, D., Yap, P.T., Shen, D.: Identification
of infants at high-risk for autism spectrum disorder using multiparameter multi-
scale white matter connectivity networks. Hum. Brain Mapp. 36(12), 4880–4896
(2015)
6. Lin, D., Cao, H., Calhoun, V.D., Wang, Y.P.: Sparse models for correlative and
integrative analysis of imaging and genetic data. J. Neurosci. Methods 237, 69–78
(2014)
7. Shen, L., Thompson, P.M., Potkin, S.G., et al.: Genetic analysis of quantitative
phenotypes in AD and MCI: imaging, cognition and biomarkers. Brain Imaging
Behav. 8(2), 183–207 (2014)
8. Stein, J.L., Hua, X., Lee, S., Ho, A.J., Leow, A.D., Toga, A.W., Saykin, A.J.,
Shen, L., Foroud, T., Pankratz, N., et al.: Voxelwise genome-wide association study
(vGWAS). NeuroImage 53(3), 1160–1174 (2010)
9. Suk, H., Lee, S., Shen, D.: Hierarchical feature representation and multimodal
fusion with deep learning for AD/MCI diagnosis. NeuroImage 101, 569–582 (2014)
10. Suk, H., Wee, C., Lee, S., Shen, D.: State-space model with deep learning for
functional dynamics estimation in resting-state fMRI. NeuroImage 129, 292–307
(2016)
11. Thung, K., Wee, C., Yap, P., Shen, D.: Neurodegenerative disease diagnosis using
incomplete multi-modality data via matrix shrinkage and completion. NeuroImage
91, 386–400 (2014)
12. Thung, K.H., Wee, C.Y., Yap, P.T., Shen, D.: Identification of progressive mild cog-
nitive impairment patients using incomplete longitudinal MRI scans. Brain Struct.
Funct., 1–17 (2015)
13. Vounou, M., Nichols, T.E., Montana, G.: ADNI: discovering genetic associations
with high-dimensional neuroimaging phenotypes: a sparse reduced-rank regression
approach. NeuroImage 53(3), 1147–1159 (2010)
14. Wang, H., Nie, F., Huang, H., et al.: Identifying quantitative trait loci via group-
sparse multitask regression and feature selection: an imaging genetics study of the
ADNI cohort. Bioinformatics 28(2), 229–237 (2012)
15. Yan, J., Du, L., Kim, S., et al.: Transcriptome-guided amyloid imaging genetic
analysis via a novel structured sparse learning algorithm. Bioinformatics 30(17),
i564–i571 (2014)
16. Zhang, C., Qin, Y., Zhu, X., Zhang, J., Zhang, S.: Clustering-based missing value
imputation for data preprocessing. In: IEEE International Conference on Industrial
Informatics, pp. 1081–1086 (2006)

17. Zhu, X., Huang, Z., Shen, H.T., Cheng, J., Xu, C.: Dimensionality reduction by
mixed kernel canonical correlation analysis. Pattern Recogn. 45(8), 3003–3016
(2012)
18. Zhu, X., Li, X., Zhang, S.: Block-row sparse multiview multilabel learning for image
classification. IEEE Trans. Cybern. 46(2), 450–461 (2016)
19. Zhu, X., Suk, H.I., Lee, S.W., Shen, D.: Canonical feature selection for joint regres-
sion and multi-class identification in Alzheimer's disease diagnosis. Brain Imaging
Behav., 1–11 (2015)
20. Zhu, X., Suk, H., Shen, D.: A novel matrix-similarity based loss function for joint
regression and classification in AD diagnosis. NeuroImage 100, 91–105 (2014)
3D Ultrasonic Needle Tracking with a 1.5D
Transducer Array for Guidance of Fetal
Interventions

Wenfeng Xia1(B) , Simeon J. West2 , Jean-Martial Mari3 , Sebastien Ourselin4 ,


Anna L. David5 , and Adrien E. Desjardins1
1 Department of Medical Physics and Biomedical Engineering,
University College London, Gower Street, London WC1E 6BT, UK
wenfeng.xia@ucl.ac.uk
2 Department of Anaesthesia, Main Theatres, Maple Bridge Link Corridor, Podium 3, University College Hospital, 235 Euston Road, London NW1 2BU, UK
3 GePaSud, University of French Polynesia, Faa'a 98702, French Polynesia
4 Translational Imaging Group, Centre for Medical Image Computing, Department of Medical Physics and Biomedical Engineering, University College London, Wolfson House, London NW1 2HE, UK
5 Institute for Women's Health, University College London, 86-96 Chenies Mews, London WC1E 6HX, UK

Abstract. Ultrasound image guidance is widely used in minimally inva-


sive procedures, including fetal surgery. In this context, maintaining vis-
ibility of medical devices is a significant challenge. Needles and catheters
can readily deviate from the ultrasound imaging plane as they are
inserted. When the medical device tips are not visible, they can dam-
age critical structures, with potentially profound consequences including
loss of pregnancy. In this study, we performed 3D ultrasonic tracking
of a needle using a novel probe with a 1.5D array of transducer ele-
ments that was driven by a commercial ultrasound system. A fiber-optic
hydrophone integrated into the needle received transmissions from the
probe, and data from this sensor was processed to estimate the position
of the hydrophone tip in the coordinate space of the probe. Golay cod-
ing was used to increase the signal-to-noise (SNR). The relative tracking
accuracy was better than 0.4 mm in all dimensions, as evaluated using a
water phantom. To obtain a preliminary indication of the clinical poten-
tial of 3D ultrasonic needle tracking, an intravascular needle insertion was
performed in an in vivo pregnant sheep model. The SNR values ranged
from 12 to 16 at depths of 20 to 31 mm and at an insertion angle of 49°
relative to the probe surface normal. The results of this study demon-
strate that 3D ultrasonic needle tracking with a fiber-optic hydrophone
sensor and a 1.5D array is feasible in clinically realistic environments.

1 Introduction
Ultrasound (US) image guidance is of crucial importance during percutaneous
interventions in many clinical fields including fetal medicine, regional anesthesia,

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 353–361, 2016.
DOI: 10.1007/978-3-319-46720-7 41
354 W. Xia et al.

interventional pain management, and interventional oncology. Fetal interven-


tions such as amniocentesis, chorionic villus sampling and fetal blood sampling
are commonly performed under US guidance [1,2]. Two-dimensional (2D) US
imaging is typically used to visualize anatomy and to identify the location of
the needle tip. The latter is often challenging, however. One reason is that the
needle tip can readily deviate from the US imaging plane, particularly with nee-
dle insertions at large depths. A second reason is that the needles tend to have
poor echogenicity during large-angle insertions, as the incident US beams can
be reflected outside the aperture of the external US imaging probe. In the con-
text of fetal interventions, misplacement of the needle tip can result in severe
complications, including the loss of pregnancy [2].
A number of methods have been proposed to improve needle tip visibility dur-
ing US guidance, including the use of echogenic surfaces, which tend to be most
relevant at steep insertion angles. However, a recent study on peripheral nerve
blocks found that even with echogenic needles, tip visibility was lost in approx-
imately 50 % of the procedure time [3]. Other methods for improving needle tip
visibility are based on the introduction of additional sources of image contrast,
including shaft vibrations [4], acoustic radiation force imaging [5], Doppler imag-
ing [6], and photoacoustic imaging [7]. Electromagnetic (EM) tracking has many
advantages, but the accuracy of EM tracking can be severely degraded by EM
field disturbances such as those arising from metal in tables [8], and the sen-
sors integrated into needles tend to be bulky and expensive. A needle tracking
method that is widely used in clinical practice has remained elusive.
Ultrasonic needle tracking is an emerging method that has shown promise
in terms of its accuracy and its compatibility with clinical workflow: positional
information and ultrasound images can be acquired from the same probe. With
this method, there is ultrasonic communication between the external US imaging
probe and the needle. One implementation involves integrating a miniature US
sensor into the needle that receives transmissions from the imaging probe; the
location of the needle tip can be estimated from the times between transmission
onset and reception, which we refer to here as the “time-of-flights”. With their
flexibility, small size, wide bandwidths, and low manufacturing costs, fiber-optic
US sensors are ideally suited for this purpose [9–11]. Recently, ultrasonic tracking
with coded excitation was performed in utero, in an in vivo ovine model [12].
A piezoelectric ring sensor has also been used [13].
In this study, we present a novel system for ultrasonic tracking that includes
a 1.5D array of 128 US transducer elements to identify the needle tip position
in three-dimensions (3D). Whilst ultrasonic tracking can be performed with 3D
US imaging probes, including those with 2D matrix arrays [14,15], the use of
these probes in clinical practice is limited. Indeed, 3D imaging probes tend to
be bulky and expensive, 2D matrix arrays are only available on a few high-end
systems, and it can be challenging to interpret 3D image volumes acquired from
complex tissue structures in real-time. In contrast, the 1.5D array in this study
is compatible with a standard commercial US system that drives 1D US imaging
probes. We evaluated the relative tracking accuracy with a water phantom, and
validated the system with an in vivo pregnant sheep model.
3D Ultrasonic Needle Tracking with a 1.5D Transducer Array 355

2 Materials and Methods


2.1 System Configuration

The ultrasonic tracking system was centered on a clinical US imaging system


(SonixMDP, Analogic Ultrasound, Richmond, BC, Canada) that was operated
in research mode (Fig. 1a). A custom 1.5D tracking probe, which comprised four
linear rows of 32 transducer elements with a nominal bandwidth of 4–9 MHz
(Fig. 1b), was produced by Vermon (Tours, France). This array was denoted as
“1.5D” to reflect the much larger number of elements in one dimension than in
the other. The US sensor was a fiber-optic hydrophone (FOH) that was inte-
grated into the cannula of a 20 gauge spinal needle (Terumo, Surrey, UK). The
FOH sensor (Precision Acoustics, Dorchester, UK) has a Fabry-Pérot cavity at
the distal end, so that impinging ultrasound waves result in changes in optical
reflectivity [16]. It was epoxied within the needle cannula so that its tip was flush
with the bevel surface, and used to receive US transmissions from the tracking
probe.
Three transmission sequences were used for tracking. The first comprised
bipolar pulses; the second and third, 32-bit Golay code pairs [17]. Transmissions
were performed from individual transducer elements, sequentially across rows
(Fig. 1b). The synchronization of data acquisition from the FOH sensor with US
transmissions was presented in detail in Refs. [10,11]. Briefly, two output triggers
were used: a frame trigger (FT) for the start of all 128 transmissions, and a
line trigger (LT) for each transmission. The FOH sensor signal was digitized at
100 MS/s (USB-5132, National Instruments, Austin, TX). Transmissions from
the ultrasonic tracking probe were controlled by a custom LabView program
operating on the ultrasound scanner PC, with access to low-level libraries.

Fig. 1. The 3D ultrasonic needle tracking system, shown schematically (a). The track-
ing probe was driven by a commercial ultrasound (US) scanner; transmissions from
the probe were received by a fiber-optic hydrophone sensor at the needle tip. The
transducer elements in the probe (b) were arranged in four rows (A–D).

Fig. 2. The algorithm to estimate the needle tip position from the sensor data is
shown schematically (top). Representative data from all transducer elements obtained
before Golay decoding (1) and after (2), show improvements in SNR relative to bipolar
excitation (3). These three datasets are plotted on a linear scale as the absolute value
of their Hilbert transforms, normalized separately to their maximum values.

2.2 Tracking Algorithms


The algorithm for converting raw FOH sensor data to a 3D needle tip position
estimate is shown schematically in Fig. 2. It was implemented offline using cus-
tom scripts written in Matlab. First, band-pass frequency filtering matched to
the bandwidth of the transducer elements of the tracking probe was performed
(Chebyshev Type I; 5th order; 2–6 MHz). For Golay-coded transmissions, the
frequency-filtered data from each pair of transmissions were convolved with the
time-reversed versions of the oversampled Golay codes. As the final decoding
step, these convolved data from each pair were summed. The decoded data were
concatenated according to the rows of transducer elements from which the trans-
missions originated to form 4 tracking images.
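The pair-wise decoding step can be illustrated with a short NumPy sketch. This is not the authors' implementation; the recursive pair construction and variable names are our assumptions, and oversampling of the codes to the acquisition sample rate is omitted for clarity. The key property is that the correlation sidelobes of a complementary Golay pair cancel when the two correlations are summed, leaving a single peak at the echo delay.

```python
import numpy as np

def golay_pair(n_bits):
    """Build a complementary Golay pair of length n_bits (a power of two)
    by the standard append/negate recursion."""
    a, b = np.array([1.0]), np.array([1.0])
    while len(a) < n_bits:
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def golay_decode(rx_a, rx_b, a, b):
    """Correlate each received trace with its own code (convolution with the
    time-reversed code) and sum the pair; the sidelobes cancel exactly."""
    corr_a = np.convolve(rx_a, a[::-1], mode="full")
    corr_b = np.convolve(rx_b, b[::-1], mode="full")
    return corr_a + corr_b

# simulate one transmission pair received after a 40-sample time-of-flight
a, b = golay_pair(32)
delay = 40
rx_a = np.concatenate([np.zeros(delay), a, np.zeros(10)])
rx_b = np.concatenate([np.zeros(delay), b, np.zeros(10)])
decoded = golay_decode(rx_a, rx_b, a, b)
peak = int(np.argmax(np.abs(decoded)))  # delay + len(a) - 1 in 'full' indexing
```

Summing the pair grows the peak amplitude by 2N = 64 while uncorrelated noise grows only by √(2N) = 8, which is the origin of the roughly 8-fold SNR gain discussed in Sect. 3.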
The 4 tracking images were processed to obtain an estimate of the needle
tip position in the coordinate space of the tracking probe (x̃, ỹ, z̃). The hori-
zontal coordinate of each tracking image was the transducer element number;
the vertical coordinate, the distance from the corresponding transducer element.
Typically, each tracking image comprised a single region of high signal ampli-
tude. For the k-th tracking image (k = {1, 2, 3, 4}), the coordinate of the image
for which the signal was a maximum, (h^(k), v^(k)), was identified. The h^(k) values
were consistent across tracking images (Fig. 2). Accordingly, ỹ was calculated as
their mean, offset from center and scaled by the distance between transducer
elements. To obtain x̃ and z̃, the measured time-of-flights t_m^(k) were calculated
as v^(k)/c, where c is the speed of sound. The t_m^(k) values were compared with a
set of simulated time-of-flight values t_s^(k). The latter were pre-computed at each
point (x_i, z_j) of a 2D grid in the X-Z coordinate space of the tracking probe,
where i and j are indices. This grid had ranges of −20 to 20 mm in X and 0 to
80 mm in Z, with a spacing of 0.025 mm. For estimation, the squared differences
between t_m^(k) and t_s^(k) were minimized:

$$(\tilde{x}, \tilde{z}) = \arg\min_{(x_i, z_j)} \left\{ \frac{\sum_{k=1}^{4} \left[ t_m^{(k)} - t_s^{(k)}(x_i, z_j) \right]^2 \cdot w^{(k)}}{\sum_{k=1}^{4} \left[ w^{(k)} \right]^2} \right\} \qquad (1)$$

where the signal amplitudes at the coordinates (h^(k), v^(k)) were used as weighting
factors, w^(k), so that tracking images with higher signal amplitudes contributed
more prominently.
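Eq. (1) amounts to a weighted least-squares search over a pre-computed grid of candidate tip positions. The sketch below is an illustrative NumPy reconstruction under a simplified geometry (each row's transmit element idealized as a point at (elem_x[k], 0) in the X-Z plane); the function and variable names are ours, not the authors'.

```python
import numpy as np

SPEED_OF_SOUND = 1480.0  # m/s, assumed value for water

def localize(tm, w, xs, zs, elem_x, c=SPEED_OF_SOUND):
    """Minimize Eq. (1) over a 2D X-Z grid.

    tm:      measured time-of-flights t_m^(k), one per tracking image (s)
    w:       weights w^(k) (peak signal amplitudes)
    xs, zs:  1D arrays defining the search grid (m)
    elem_x:  idealized in-plane element position for each row (m)
    """
    X, Z = np.meshgrid(xs, zs, indexing="ij")
    cost = np.zeros_like(X)
    for k in range(len(tm)):
        ts = np.hypot(X - elem_x[k], Z) / c  # simulated time-of-flight
        cost += (tm[k] - ts) ** 2 * w[k]
    cost /= np.sum(np.asarray(w) ** 2)
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return xs[i], zs[j]

# recover a known tip position from noiseless time-of-flights
xs = np.linspace(-5e-3, 5e-3, 101)   # 0.1 mm spacing
zs = np.linspace(20e-3, 40e-3, 201)
elem_x = np.array([-1.5e-3, -0.5e-3, 0.5e-3, 1.5e-3])
tm = np.hypot(2e-3 - elem_x, 30e-3) / SPEED_OF_SOUND
x_hat, z_hat = localize(tm, np.ones(4), xs, zs, elem_x)
```

In the paper the grid spans −20 to 20 mm in X and 0 to 80 mm in Z at 0.025 mm spacing; a coarser grid is used here only to keep the example fast.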

2.3 Relative Tracking Accuracy

The relative tracking accuracy of the system was evaluated with a water phan-
tom. The needle was fixed on a translation stage, with its shaft oriented to
simulate an out-of-plane insertion: it was positioned within an X-Z plane with
its tip approximately 38 mm in depth from the tracking probe, and angled at 45°
to the water surface normal (Fig. 3a). The tracking probe was translated relative
to the needle in the out-of-plane dimension, X. This translation was performed
across 20 mm, with a step size of 2 mm. At each position, FOH sensor data were
acquired for needle tip tracking.
Each needle tip position estimate was compared with a corresponding refer-
ence position. The relative tracking accuracy was defined as the absolute differ-
ence between these two quantities. The X component of the reference position
was obtained from the translation stage, centered relative to the probe axis. As
Y and Z were assumed to be constant during translation of the tracking probe,
the Y and Z components of the reference position were taken to be the mean
values of these components of the position estimates.
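The accuracy metric above can be summarized in a few lines. This is a hedged sketch with hypothetical variable names; the exact re-centring of the stage readings onto the probe axis is an assumption about the setup.

```python
import numpy as np

def relative_accuracy(estimates, stage_x):
    """Mean absolute deviation of tracked tip positions from their references.

    estimates: (N, 3) array of tracked positions (x, y, z), in mm.
    stage_x:   (N,) translation-stage readings for the X axis.
    The X reference comes from the stage (re-centred, here aligned to the
    estimates' mean); the Y and Z references are the means of the
    corresponding estimates, assumed constant during the translation.
    """
    est = np.asarray(estimates, dtype=float)
    ref_x = stage_x - stage_x.mean() + est[:, 0].mean()
    ref = np.column_stack([ref_x,
                           np.full(len(est), est[:, 1].mean()),
                           np.full(len(est), est[:, 2].mean())])
    return np.abs(est - ref).mean(axis=0)

# synthetic check: estimates that follow the stage exactly give zero error
stage_x = np.arange(0.0, 22.0, 2.0)        # 20 mm span, 2 mm steps
est = np.column_stack([stage_x - 10.0,     # out-of-plane motion
                       np.full(11, 0.0),   # constant Y
                       np.full(11, 38.0)]) # constant Z (depth)
err = relative_accuracy(est, stage_x)
```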

Fig. 3. (a) Relative tracking accuracy measurements were performed with the needle
and the ultrasonic needle tracking (UNT) probe in water. (b) The signal-to-noise ratios
(SNRs) of the tracking images were consistently higher for Golay-coded transmissions
than for bipolar transmissions, and they increased with proximity to the center of the
probe (X = 0). The error bars in (b) represent standard deviations calculated from the
four tracking images. (c) Estimated relative tracking accuracies for Golay-coded trans-
missions along orthogonal axes; error bars represent standard deviations calculated
from all needle tip positions.

2.4 In Vivo Validation


To obtain a preliminary indication of the system’s potential for guiding fetal
interventions, 3D needle tracking was performed in a pregnant sheep model in
vivo [18]. The primary objective of this experiment was to measure the signal-
to-noise ratios (SNRs) in a clinically realistic environment. All procedures on
animals were conducted in accordance with U.K. Home Office regulations and
the Guidance for the Operation of Animals (Scientific Procedures) Act (1986).
Ethics approval was provided by the joint animal studies committee of the Royal
Veterinary College and the University College London, United Kingdom. Ges-
tational age was confirmed using ultrasound. The sheep was placed under gen-
eral anesthesia and monitored continuously. The needle was inserted into the
uterus, towards a vascular target (Fig. 4a), with the bevel facing upward. Dur-
ing insertion, tracking was performed continuously, so that 4 tracked positions
were identified.

Fig. 4. In vivo validation of the 3D ultrasonic needle tracking system in a pregnant


sheep model. (a) Schematic illustration of the measurement geometry showing the out-
of-plane needle insertion into the abdomen of the sheep. The needle tip was tracked at 4
positions (p1–p4). (b) Comparison of signal-to-noise ratios (SNRs) using Golay-coded
and bipolar excitation, for all 4 tracked positions. The error bars represent standard
deviations obtained at each tracked position. (c) The tracked needle tip positions, which
were used to calculate the needle trajectory.

2.5 SNR Analysis


The SNR, was calculated for each tracking image at each needle tip position. The
numerator was defined as the maximum signal value attained for each tracking
image; the denominator, as the standard deviation of signal values obtained from
each tracking image in a region above the needle tip, where there was a visual
absence of signal (20 mm × 16 tracking elements).
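A minimal sketch of this SNR computation follows, using a synthetic tracking image; the noise-region geometry (20 mm × 16 elements in the paper) maps here to arbitrary row/column slices, and all names are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def tracking_snr(img, noise_rows, noise_cols):
    """SNR as defined in Sect. 2.5: the maximum signal value of the tracking
    image divided by the standard deviation of a signal-free region
    (e.g. above the needle tip)."""
    noise = img[noise_rows, :][:, noise_cols]
    return float(np.max(img) / np.std(noise))

# synthetic tracking image: a noise floor plus one bright tip echo
img = np.abs(rng.normal(0.0, 1.0, size=(400, 32)))
img[300, 16] = 20.0  # planted needle-tip signal
snr = tracking_snr(img, slice(0, 200), slice(0, 16))
```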

3 Results and Discussion


With the needle in water (Fig. 3a), transmissions from the tracking probe could
clearly be identified in the received signals without averaging. With bipolar exci-
tation, the SNR values ranged from 12 to 21, with the highest values obtained

when the needle was approximately centered relative to the probe axis (X ∼ 0).
With Golay-coded excitation, they increased by factors of 7.3 to 8.5 (Fig. 3b).
The increases were broadly consistent with those anticipated: the temporal aver-
aging provided by a pair of 32-bit Golay codes results in an SNR improvement
of √(32 × 2) = 8. In water, the mean relative tracking accuracy depended on the
spatial dimension: 0.32 mm, 0.31 mm, and 0.084 mm in X, Y, and Z, respectively
(Fig. 3c). By comparison, these values are smaller than the inner diameter of 22 G
needles that are widely used in percutaneous procedures. They are also smaller
than recently reported EM tracking errors of 2 ± 1 mm [19]. The Z component of
the mean relative tracking accuracy is particularly striking; it is smaller than the
ultrasound wavelength at 9 MHz. This result reflects a high level of consistency
in the tracked position estimates.
With the pregnant sheep model in vivo, in which clinically realistic ultrasound
attenuation was present, the SNR values were sufficiently high for obtaining
tracking estimates. As compared with conventional bipolar excitation, the SNR
was increased with Golay-coded excitation. In the former case, the SNR values
were in the range of 2.1 to 3.0; coding increased this range by factors of 5.3 to
6.2 (Fig. 4b). From the tracked position estimates, a needle insertion angle of
49° and a maximum needle tip depth of 31 mm were calculated.
We presented, for the first time, a 3D ultrasonic tracking system based on
a 1.5D transducer array and a fiber-optic ultrasound sensor. A primary advan-
tage of this system is its compatibility with existing US imaging scanners, which
could facilitate clinical translation. There are several ways in which the track-
ing system developed in this study could be improved. For future iterations,
imaging array elements and a corresponding cylindrical acoustic lens could be
included to enable simultaneous 3D tracking and 2D US imaging. The SNR
could be improved by increasing the sensitivity of the FOH sensor, which could
be achieved with a Fabry-Pérot interferometer cavity that has a curved distal
surface to achieve a high finesse [20]. Additional increases in the SNR could be
obtained with larger code lengths that were beyond the limits of the particular
ultrasound scanner used in this study. The results of this study demonstrate
that 3D ultrasonic needle tracking with a 1.5D array of transducer elements and
a FOH sensor is feasible in clinically realistic environments and that it provides
highly consistent results. When integrated into an ultrasound imaging probe
that includes a linear array for acquiring 2D ultrasound images, this method
has strong potential to reduce the risk of complications and decrease procedure
times.

Acknowledgments. This work was supported by an Innovative Engineering for


Health award by the Wellcome Trust (No. WT101957) and the Engineering and Phys-
ical Sciences Research Council (EPSRC) (No. NS/A000027/1), by a Starting Grant
from the European Research Council (ERC-2012-StG, Proposal No. 310970 MOPHIM),
and by an EPSRC First Grant (No. EP/J010952/1). A.L.D. is supported by the
UCL/UCLH NIHR Comprehensive Biomedical Research Centre.

References
1. Daffos, F., et al.: Fetal blood, sampling during pregnancy with use of a needle
guided by ultrasound: a study of 606 consecutive cases. Am. J. Obstet. Gynecol.
153(6), 655–660 (1985)
2. Agarwal, K., et al.: Pregnancy loss after chorionic villus sampling and genetic
amniocentesis in twin pregnancies: a systematic review. Ultrasound Obstet.
Gynecol. 40(2), 128–134 (2012)
3. Hebard, S., et al.: Echogenic technology can improve needle visibility during
ultrasound-guided regional anesthesia. Reg. Anesth. Pain Med. 36(2), 185–189
(2011)
4. Klein, S.M., et al.: Piezoelectric vibrating needle and catheter for enhancing ultra-
soundguided peripheral nerve blocks. Anesth. Analg. 105, 1858–1860 (2007)
5. Rotemberg, V., et al.: Acoustic radiation force impulse (ARFI) imaging-based nee-
dle visualization. Ultrason. Imaging 33(1), 1–16 (2011)
6. Fronheiser, M.P., et al.: Vibrating interventional device detection using real-time
3-D color doppler. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 55(6), 1355–
1362 (2008)
7. Xia, W., et al.: Performance characteristics of an interventional multispectral pho-
toacoustic imaging system for guiding minimally invasive procedures. J. Biomed.
Opt. 20(8), 086005 (2015)
8. Poulin, F.: Interference during the use of an electromagnetic tracking system under
OR conditions. J. Biomech. 35, 733–737 (2002)
9. Guo, X., et al.: Photoacoustic active ultrasound element for catheter tracking. In:
Proceedings of SPIE, vol. 8943, p. 89435M (2014)
10. Xia, W., et al.: Interventional photoacoustic imaging of the human placenta with
ultrasonic tracking for minimally invasive fetal surgeries. In: Navab, N., Hornegger,
J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 371–378.
Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9 46
11. Xia, W., et al.: In-plane ultrasonic needle tracking using a fiber-optic hydrophone.
Med. Phys. 42(10), 5983–5991 (2015)
12. Xia, W., et al.: Coded excitation ultrasonic needle tracking: an in vivo study. Med.
Phys. 43(7), 4065–4073 (2016)
13. Nikolov, S.I.: Precision of needle tip localization using a receiver in the needle.
In: IEEE International Ultrasonics Symposium Proceedings, Beijing, pp. 479–482
(2008)
14. Mung, J., et al.: A non-disruptive technology for robust 3D tool tracking for
ultrasound-guided interventions. In: Fichtinger, G., Martel, A., Peters, T. (eds.)
MICCAI 2011. LNCS, vol. 6891, pp. 153–160. Springer, Heidelberg (2011). doi:10.
1007/978-3-642-23623-5 20
15. Mung, J.: Ultrasonically marked instruments for ultrasound-guided interventions.
In: IEEE Ultrasonics Symposium (IUS), pp. 2053–2056 (2013)
16. Morris, P., et al.: A Fabry-Pérot fiber-optic ultrasonic hydrophone for the simul-
taneous measurement of temperature and acoustic pressure. J. Acoust. Soc. Am.
125(6), 3611–3622 (2009)
17. Budisin, S.Z., et al.: New complementary pairs of sequences. Electron. Lett. 26(13),
881–883 (1990)
18. David, A.L., et al.: Recombinant adeno-associated virus-mediated in utero gene
transfer gives therapeutic transgene expression in the sheep. Hum. Gene Ther. 22,
419–426 (2011)

19. Boutaleb, S., et al.: Performance and suitability assessment of a real-time 3D elec-
tromagnetic needle tracking system for interstitial brachytherapy. J. Contemp.
Brachyther. 7(4), 280–289 (2015)
20. Zhang, E.Z., Beard, P.C.: A miniature all-optical photoacoustic imaging probe. In:
Proceedings of SPIE, p. 78991F (2011). http://proceedings.spiedigitallibrary.org/
proceeding.aspx?articleid=1349009
Enhancement of Needle Tip and Shaft from 2D
Ultrasound Using Signal Transmission Maps

Cosmas Mwikirize1(B) , John L. Nosher2 , and Ilker Hacihaliloglu1,2


1 Department of Biomedical Engineering, Rutgers University, Piscataway, USA
cosmas.mwikirize@rutgers.edu
2 Department of Radiology, Rutgers Robert Wood Johnson Medical School, New Brunswick, USA

Abstract. New methods for needle tip and shaft enhancement in 2D


curvilinear ultrasound are proposed. Needle tip enhancement is achieved
using an image regularization method that utilizes ultrasound signal
transmission maps to model inherent signal loss due to attenuation. Shaft
enhancement is achieved by optimizing the proposed signal transmission
map using the information based on trajectory constrained boundary
statistics derived from phase oriented features. The enhanced tip is auto-
matically localized using spatially distributed image statistics from the
estimated shaft trajectory. Validation results from 100 ultrasound images
of bovine, porcine, kidney and liver ex vivo reveal a mean localization
error of 0.3 ± 0.06 mm, a 43 % improvement in localization over previous
state of the art.

Keywords: Image regularization · Confidence maps · Needle enhancement · Local phase · Ultrasound

1 Introduction
Ultrasound (US) is a popular image-guidance tool used to facilitate real-time
needle visualization in interventional procedures such as fine needle and core
tissue biopsies, catheter placement, drainages, and anesthesia. During such pro-
cedures, it is important that the needle precisely reaches the target with mini-
mum attempts. Unfortunately, successful visualization of the needle in US-based
procedures is greatly affected by the orientation of the needle to the US beam
and is inferior for procedures involving steep needle insertion angles. The visu-
alization becomes especially problematic for curvilinear transducers since only a
small portion or none of the needle gives strong reflection.
There is a wealth of literature on improving needle visibility and detection
in US. A sampling of the literature is provided here to see the wide range of
approaches. External tracking technologies are available to track the needle [1],
but this requires custom needles and changes to the clinical work-flow. Hough
Transform [2,3], Random Sample Consensus (RANSAC) [4,5] and projection
based methods [6,7] were proposed for needle localization. In most of the previ-
ous approaches, assumptions were made about the appearance of the needle in US

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 362–369, 2016.
DOI: 10.1007/978-3-319-46720-7 42
Enhancement of Needle Tip and Shaft from 2D Ultrasound 363

images such as the needle having the longest and straightest line feature with
high intensity. Recently, Hacihaliloglu et al. [6] combined local phase-based image
projections with spatially distributed needle trajectory statistics and achieved
an error of 0.43 ± 0.31 mm for tip localization. Although the method is suitable
in instances when the shaft is discontinuous, it fails when a priori information
on shaft orientation is limited and when the tip does not appear as a characteristic
high-intensity feature along the needle trajectory. Regarding shaft visibility,
approaches based on beam steering [8], using linear transducers, or mechani-
cally introduced vibration [9] have been proposed. The success of beam steering
depends on the angle values used during the procedure. Furthermore, only a
portion of the needle is enhanced with curvilinear arrays so the tip is still indis-
tinguishable. Vibration-based approaches sometimes require external mechanical
devices, increasing the overall complexity of the system.
In view of the above mentioned limitations, there is need to develop methods
that perform both needle shaft and tip enhancement for improved localization
and guidance without changing the clinical work-flow and increasing the overall
complexity of the system. The proposed method is specifically useful for pro-
cedures, such as lumbar blocks, where needle shaft visibility is poor and the
tip does not have a characteristic high intensity appearance. We introduce an
efficient L1-norm based contextual regularization that enables us to incorporate
a filter bank into the image enhancement method by taking into account US
specific signal propagation constraints. Our main novelty is incorporation of US
signal modeling, for needle imaging, into an optimization problem to estimate
the unknown signal transmission map which is used for enhancement of needle
shaft and tip. Qualitative and quantitative validation results on scans collected
from porcine, bovine, kidney and liver tissue samples are presented. Compar-
ison results against previous state of the art [6], for tip localizations, are also
provided.

2 Methods

The proposed framework is based on the assumptions that the needle insertion side
(left or right) is known a priori, that the needle is inserted in plane, and that the
shaft close to the transducer surface is visible. Specifically, we are interested in the
enhancement of needle images obtained from 2D curvilinear transducers.

2.1 L1-Norm Based Contextual Regularization for Image Enhancement

Modeling of US signal transmission has been one of the main topics of research in
US-guided procedures [10]. The interaction of the US signal within the tissue can
be characterized into two main categories, namely, scattering and attenuation.
Since the backscattered US signal traveling from the needle interface to the
transducer is modulated by these two interactions, they can be viewed as
mechanisms of structural information coding. Based on this, we develop a
364 C. Mwikirize et al.

model, called the US signal transmission map, for recovering the pertinent needle
structure from the US images. The US signal transmission map maximizes the
visibility of high-intensity features inside a local region and satisfies the constraint
that the mean intensity of the local region is less than the echogenicity of the
tissue confining the needle. In order to achieve this we propose the following
linear interpolation model which combines scattering and attenuation effects in
the tissue: US(x, y) = US_A(x, y) US_E(x, y) + (1 − US_A(x, y)) α. Here, US(x, y)
is the B-mode US image, US_A(x, y) is the signal transmission map, US_E(x, y) is
the enhanced needle image, and α is a constant value representative of echogenicity
in the tissue surrounding the needle. Our aim is the extraction of US_E(x, y),
which is obtained by estimating the signal transmission map image US_A(x, y).
In order to calculate US_A(x, y), we make use of the well-known Beer–Lambert
law: US_T(x, y) = US_0(x, y) exp(−η d(x, y)), which models the attenuation as a
function of depth. Here US_T(x, y) is the attenuated intensity image, US_0 is the
initial intensity image, η the attenuation coefficient, and d(x, y) the distance
from the source/transducer. US_T(x, y) is modeled as a patch-wise transmission
function modulated by the attenuation and orientation of the needle, which will be
explained in the next section. Once US_T(x, y) is obtained, US_A(x, y) is estimated
by minimizing the following objective function [11]:
(λ/2) ‖US_A(x, y) − US_T(x, y)‖₂² + Σ_{j∈ω} ‖W_j ∘ (D_j ∗ US_A(x, y))‖₁ .   (1)

Here ω is an index set, ◦ represents element-wise multiplication, and ∗ is the
convolution operator. D_j is obtained using a bank of high-order differential filters
consisting of eight Kirsch filters and one Laplacian filter [11]. The incorporation
of this filter bank into the contextual regularization framework helps in attenuat-
ing the image noise and results in the enhancement of ridge edge features such as
the needle shaft and tip in the local region. Wj is a weighting matrix calculated
using W_j(x, y) = exp(−|D_j(x, y) ∗ US(x, y)|²). In Eq. (1), the first part is the
data term, which measures the dependability of US_A(x, y) on US_T(x, y). The
second part of Eq. (1) models the contextual constraints of U SA (x, y), and λ is
the regularization parameter used to balance the two terms. The optimization
of Eq. (1) is achieved using variable splitting where several auxiliary variables
are introduced to construct a sequence of simple sub-problems, the solutions of
which finally converge to the optimal solution of the original problem [11]. Once
US_A(x, y) is estimated, the enhanced needle image US_E(x, y) can be extracted
using US_E(x, y) = [(US(x, y) − α)/max(US_A(x, y), ε)^δ] + α. Here ε is a small
constant used to avoid division by zero and δ is related to η, the attenuation
coefficient. In the next sections we explain how US_T(x, y) is modeled for needle tip
and shaft enhancement.
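As a concrete illustration, the closed-form extraction step above can be sketched in a few lines of Python (a minimal sketch; the function name `enhance` and the default ε and δ values are illustrative, not from the paper):

```python
import numpy as np

def enhance(us, us_a, alpha, eps=1e-3, delta=1.0):
    """Closed-form extraction of the enhanced needle image US_E from the
    B-mode image US and an estimated signal transmission map US_A, via
    US_E = (US - alpha) / max(US_A, eps)**delta + alpha."""
    denom = np.maximum(us_a, eps) ** delta  # eps guards against division by zero
    return (us - alpha) / denom + alpha
```

Pixels where the transmission map is small are amplified relative to the background level α, which is what turns a faint tip into a conspicuous high-intensity feature.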
Needle Tip Enhancement: For tip enhancement (Fig. 1), we apply the regu-
larization framework described in previous section. With reference to Eq. (1), we
require U ST (x, y) to represent boundary constraints imposed on the US image
by attenuation and orientation of the needle within the tissue. Therefore, we
first calculate the US confidence map, denoted as U SCM (x, y), using US specific
Enhancement of Needle Tip and Shaft from 2D Ultrasound 365

domain constraints previously proposed by Karamalis et al. [10]. The confidence


map is a measure of the probability that a random walk starting from a pixel would
reach the virtual transducer elements, given the US image and US-specific
constraints. The B-mode US image, U S(x, y), is represented as a weighted con-
nected graph and random walks from virtual transducers at the top of the image
are used to calculate the apparent signal strengths at each pixel location. Let eij
denote the edge between nodes i and j. The graph Laplacian has the weights:
w_ij =
  ⎧ w_ij^H = exp(−β |c_i − c_j| + γ),    if i, j adjacent and e_ij ∈ E_H
  ⎨ w_ij^V = exp(−β |c_i − c_j|),         if i, j adjacent and e_ij ∈ E_V
  ⎪ w_ij^D = exp(−β |c_i − c_j| + √2 γ),  if i, j adjacent and e_ij ∈ E_D
  ⎩ 0,                                    otherwise.                      (2)

where EH ,EV and ED are the horizontal, vertical and diagonal edges on the
graph and ci = U S(x, y)i exp(−ηρ(x, y)i ). Here U S(x, y)i is the image intensity
at node i and ρ(x, y)i is the normalized closest distance from the node to the
nearest virtual transducer [10]. The attenuation coefficient η is inherently inte-
grated in the weighting function, γ is used to model the beam width and β = 90
is an algorithmic constant. U ST (x, y) is obtained by taking the complement of
U SCM (x, y) (U ST (x, y) = U SCM (x, y)C ). Since we expect the signal transmis-
sion map function U SA (x, y) to display higher intensity with increasing depth,
the complement of the confidence map provides the ideal patch-wise transmission
map, U ST (x, y), in deriving the minimal objective function. The result of needle
tip enhancement is shown in Fig. 1. Investigating Fig. 1(b), we can see that the
calculated transmission map U SA (x, y), using Eq. (1), has low intensity values
close to the transducer surface (shallow image regions) and high intensity fea-
tures in the regions away from the transducer (deep image regions). Furthermore,
it provides a smooth attenuation density estimate for the US image formation
model. Finally, the mean intensity of the local region in the estimated signal
transmission map is less than the echogenicity of the tissue confining the needle.
This translates into the enhanced image, where the tip will be represented by a
local average of the surrounding points, thus giving a uniform intensity region
with a high intensity feature belonging to the needle tip in the enhanced image
U SE (x, y).
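The transmission-map construction for tip enhancement can be sketched as follows (assumptions: the confidence map is normalized to [0, 1], and the virtual transducer is approximated as the top image row so that ρ grows linearly with row index; the function names are illustrative):

```python
import numpy as np

def tip_transmission_map(us_cm):
    """Patch-wise transmission map for tip enhancement: the complement of
    the US confidence map, so values grow with depth/attenuation."""
    return 1.0 - us_cm

def node_weights(us, eta=2.0):
    """Depth-weighted node intensities c_i = US_i * exp(-eta * rho_i), with
    rho approximated as the normalized row distance from a top-row
    (virtual) transducer."""
    rows = us.shape[0]
    rho = np.linspace(0.0, 1.0, rows)[:, None] * np.ones_like(us)
    return us * np.exp(-eta * rho)
```

In the full method the confidence map itself comes from the random-walk formulation of [10] with the graph weights of Eq. (2); here only the complement step and the depth weighting are shown.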
Needle Tip Localization: The first step in tip localization is the enhancement
of the needle shaft appearing close to the transducer surface (Fig. 1(c) top right)
in the enhanced US image U SE (x, y). This is achieved by constructing a phase-
based image descriptor, called phase symmetry (PS), using a 2D Log-Gabor
filter whose transfer function is defined as:
LG(ω, θ) = exp(−log(ω/κ)² / (2 log(σ_ω)²)) exp(−(θ − θ_m)² / (2 σ_θ²)).
Here, ω is the filter frequency while θ is its orientation, κ is the center frequency,
σω is the bandwidth on the frequency spectrum, σθ is the angular bandwidth
and θm is the filter orientation. These filter parameters are selected automatically
using the framework proposed in [6]. An initial needle trajectory is calculated by
using the peak value of the Radon Transformed PS image. This initial trajectory
is further optimized using Maximum Likelihood Estimation Sample Consensus

Fig. 1. Needle tip enhancement by the proposed method. (a) B-mode US image showing
the inserted needle at an angle of 45°. The needle tip has a low contrast to the surrounding
tissue and the needle shaft is discontinuous. (b) The derived optimal signal transmission
map function US_A(x, y). The map gives an estimation of the signal density in the US
image, and thus displays higher intensity values in more attenuated and scattered
regions towards the bottom of the image. (c) Result of needle tip enhancement. The
red arrow points to the conspicuous tip along the trajectory path.

(MLESAC) [6] algorithm for outlier rejection and geometric optimization for
connecting the extracted inliers [6]. The image intensities at this stage, lying
along a line L in a point cloud, are distributed into a set of line segments, each
defined by a set of points, or knots, denoted as t_1, ..., t_n. The needle tip is extracted using:
US_needle(US_B, L(t)) = ( ∫_{t_i}^{t_{i+1}} US_B(L(t)) dt ) / ‖L(t_{i+1}) − L(t_i)‖₂ ;   t ∈ [t_i, t_{i+1}].   (3)
Here, U SB is the result after band-pass filtering the tip enhanced US image,
while ti and ti+1 are successive knots. U Sneedle consists of averaged pixel inten-
sities, and the needle tip is localized as the farthest maximum intensity pixel of
U Sneedle at the distal end of the needle trajectory. One advantage of using the
tip enhanced image for localization instead of the original image is minimization
of interference from soft tissue. In the method of [6], if a high intensity region
other than that emanating from the tip were encountered along the trajectory
beyond the needle region, the likelihood of inaccurate localization was high. In
our case, the enhanced tip has a conspicuously higher intensity than soft tissue
interfaces and other interfering artifacts (Fig. 1(c)).
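The segment-averaging and distal-maximum selection described above can be sketched as follows (a sketch; the function name and knot layout are illustrative, and the intensities are assumed to be samples of the band-passed, tip-enhanced image along the fitted line):

```python
import numpy as np

def localize_tip(intensities, knots):
    """Average the (band-passed) intensities over each knot-to-knot segment
    along the fitted trajectory, then take the most distal segment whose
    mean attains the maximum as the tip segment."""
    means = [float(intensities[a:b].mean())
             for a, b in zip(knots[:-1], knots[1:])]
    best = max(means)
    # farthest (distal) segment attaining the maximum intensity
    tip_segment = max(i for i, m in enumerate(means) if m == best)
    return tip_segment, means
```

Taking the farthest maximum rather than the first reflects the paper's choice of the distal end of the trajectory as the tip.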
Shaft Enhancement: For shaft enhancement (Fig. 2), we use the regulariza-
tion framework previously explained. However, with reference to Eq. (1), since
our objective is to enhance the shaft, we construct a new patch-wise transmis-
sion function U ST (x, y) using trajectory and tip information calculated in the
needle tip localization section. Consequently, we model the patch-wise trans-
mission function as U ST (x, y) = U SDM (x, y) which represents the Euclidean
distance map of the trajectory constrained region. Knowledge of the initial tra-
jectory, from previous section, enables us to model an extended region which
includes the entire trajectory of the needle. Incorporating the needle tip loca-
tion, calculated in the previous step, we limit this region to the trajectory depth

Fig. 2. The process of needle shaft enhancement. (a) B-mode US image. (b) Trajectory
constrained region obtained from local phase information, indicated by the red line.
Line thickness can be adjusted to suit different needle diameters and bending insertions.
(c) The optimal signal transmission function U SA (x, y) for needle shaft enhancement.
(d) Result of shaft enhancement. Enhancement does not take place for features along
the trajectory that may lie beyond the needle tip.

so as to minimize enhancement of soft tissue interfaces beyond the tip (Fig. 2(c)).
Investigating Fig. 2(c) we can see that the signal transmission map calculated
for the needle shaft has low density values for the local regions confining the
needle shaft and high density values for local regions away from the estimated
needle trajectory. The difference of the signal transmission map for needle shaft
enhancement compared to the tip enhancement is that the signal transmission
map for shaft enhancement is limited to the geometry of the needle. This trans-
lates into the enhanced needle shaft image, where the shaft will be represented by
a local average of the surrounding points, thus giving a uniform intensity region
with a high intensity feature belonging to the needle shaft in the enhanced image
U SE (x, y).
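The distance-map transmission function for the shaft can be sketched with a brute-force Euclidean distance computation (illustrative; a real implementation would use a fast distance transform such as SciPy's `distance_transform_edt`, and the normalization to [0, 1] is an assumption):

```python
import numpy as np

def shaft_transmission_map(shape, traj_pts):
    """Per-pixel Euclidean distance to the trajectory-constrained region
    (a set of (row, col) points): low on/near the estimated shaft, high
    away from it; normalized to [0, 1]."""
    rows, cols = np.indices(shape)
    pix = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    pts = np.asarray(traj_pts, dtype=float)
    # brute-force distance from every pixel to its nearest trajectory point
    d = np.sqrt(((pix[:, None, :] - pts[None, :, :]) ** 2).sum(-1)).min(axis=1)
    dm = d.reshape(shape)
    return dm / dm.max() if dm.max() > 0 else dm
```

Feeding this map into the same regularization of Eq. (1) confines the enhancement to the needle geometry, which is exactly the difference from the tip-enhancement map noted above.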

2.2 Data Acquisition and Analysis


The US images used to evaluate the proposed methods were obtained using
a SonixGPS system (Analogic Corporation, Peabody, MA, USA) with a C5-2
curvilinear probe. A standard 17 gauge SonixGPS vascular access needle was
inserted at varying insertion angles (30°–70°) and depths up to 8 cm. The
image resolutions varied from 0.13 mm to 0.21 mm for different depth settings.
Freshly excised ex vivo porcine, bovine, liver, and kidney tissue samples, obtained
from a local butcher, were used as the imaging medium. A total of 100 images
were analyzed (25 images per tissue sample). The proposed method was implemented
using the MATLAB 2014a software package and run on a 3.6 GHz Intel® Core™ i7
CPU, 16 GB RAM Windows PC. In order to quantify the accuracy,
we compared the localized tip against the gold standard tip location obtained by
an expert user. The Euclidean Distance (ED) between the ground truth and the
estimated tip locations was calculated. We also report the Root Mean Square
Error (RMS) and 95 % Confidence Interval (CI) for the quantitative localization
results. Finally, we also provide comparison results against the method proposed
in [6]. For calculating the US confidence map, U SCM (x, y), the constant values
were chosen as: η = 2, β = 90, γ = 0.03. For Eq. (1), λ = 2 and the Log-
Gabor filter parameters were calculated using [6]. α, the constant related to

tissue echogenicity, was chosen as 90 % of the maximum intensity value of the


patch-wise transmission function. Throughout the experimental validation these
values were not changed.
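The reported error measures (per-image Euclidean distance, RMS error, and a 95 % confidence interval) can be computed as follows (a sketch; the normal-approximation interval with z = 1.96 is an assumption about how the CI was formed):

```python
import numpy as np

def tip_errors(est, gt):
    """Per-image Euclidean distance between estimated and gold-standard tip
    locations, the RMS error, and a normal-approximation 95% confidence
    interval for the mean error."""
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    ed = np.sqrt(((est - gt) ** 2).sum(axis=1))   # Euclidean distances
    rms = np.sqrt((ed ** 2).mean())               # root-mean-square error
    half = 1.96 * ed.std(ddof=1) / np.sqrt(len(ed))
    return ed, rms, (ed.mean() - half, ed.mean() + half)
```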

3 Experimental Results
Qualitative and quantitative results obtained from the proposed method are pro-
vided in Fig. 3. It is observed that the proposed method gives clear image detail
for the tip and shaft, even in instances where shaft information is barely visible
(Fig. 3(a) middle column). Using the method proposed in [6], incorrect tip local-
ization arises from soft tissue interface which manifests a higher intensity than
the tip along the needle trajectory in the B mode image (Fig. 3(a) right column).
In the proposed method, the tip is enhanced but the soft tissue interface is not,
thus improving localization as shown in Fig. 3(b). The tip and shaft enhancement
takes 0.6 s and 0.49 s, respectively, for a 370 × 370 2D image.
Figure 3(b) shows a summary of the quantitative results. The overall localization
error from the proposed method was 0.3 ± 0.06 mm while that from [6] under
an error cap of 2 mm (73 % of the dataset had an error of less than 2 mm) was
0.53 ± 0.07 mm.

Fig. 3. Qualitative and quantitative results for the proposed method. (a) Left column: B-
mode US images for bovine and porcine tissue respectively. Middle column: Respective
localized tip, red dot, overlaid on shaft enhanced image. Right column: Tip localization
results, red dot, from the method of Hacihaliloglu et al. [6]. (b) Quantitative analysis
of needle tip localization for bovine, porcine, liver and kidney tissue. Top: Proposed
method. Bottom: Using the method of Hacihaliloglu et al. [6]. For the method in [6]
only 73 % of the localization result had an error value under 2 mm and were used during
validation. Overall, the new method improves tip localization.

4 Discussions and Conclusions


We presented a method for needle shaft and tip enhancement from curvilinear
US images. The proposed method is based on the incorporation of US signal

modeling into an L1-norm based contextual regularization framework by taking
into account US-specific signal propagation constraints. The improvement
achieved in terms of needle tip localization compared to the previously proposed
state-of-the-art method [6] was 43 %. The method is validated on epidural nee-
dles with minimal bending. For enhancement of bending needles, the proposed
model can be updated by incorporating the bending information into the frame-
work. As part of future work, we will incorporate an optical tracking system in
validation to minimize variability associated with an expert user. We will also
explore shaft and tip enhancement at steeper angles (>70°) and deeper insertion
(>8 cm) depths. The achieved tip localization accuracy and shaft enhancement
make our method appropriate for further investigation in vivo, and it can complement
previously proposed state-of-the-art needle localization methods.

References
1. Hakime, A., Deschamps, F., De Carvalho, E.G., Barah, A., Auperin, A.,
De Baere, T.: Electromagnetic-tracked biopsy under ultrasound guidance: prelim-
inary results. Cardiovasc. Intervent. Radiol. 35(4), 898–905 (2012)
2. Zhou, H., Qiu, W., Ding, M., Zhang, S.: Automatic needle segmentation in 3D
ultra-sound images using 3D improved Hough transform. In: SPIE Medical Imag-
ing, vol. 6918, pp. 691821-1–691821-9 (2008)
3. Ayvali, E., Desai, J.P.: Optical flow-based tracking of needles and needle-tip localiza-
tion using circular Hough transform in ultrasound images. Ann. Biomed. Eng.
43(8), 1828–1840 (2015)
4. Uhercik, M., Kybic, J., Liebgott, H., Cachard, C.: Model fitting using RANSAC for
surgical tool localization in 3D ultrasound images. IEEE Trans. Biomed. Eng.
57(8), 1907–1916 (2010)
5. Zhao, Y., Cachard, C., Liebgott, H.: Automatic needle detection and tracking in 3D
ultrasound using an ROI-based RANSAC and Kalman method. Ultrason. Imaging
35(4), 283–306 (2013)
6. Hacihaliloglu, I., Beigi, P., Ng, G., Rohling, R.N., Salcudean, S., Abolmaesumi, P.:
Projection-based phase features for localization of a needle tip in 2D curvilinear
ultrasound. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI
2015. LNCS, vol. 9349, pp. 347–354. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9_43
7. Wu, Q., Yuchi, M., Ding, M.: Phase grouping-based needle segmentation in 3-D
trans-rectal ultrasound-guided prostate trans-perineal therapy. Ultrasound Med.
Biol. 40(4), 804–816 (2014)
8. Hatt, C.R., Ng, G., Parthasarathy, V.: Enhanced needle localization in ultrasound
using beam steering and learning-based segmentation. Comput. Med. Imaging
Graph. 14, 45–54 (2015)
9. Harmat, A., Rohling, R.N., Salcudean, S.: Needle tip localization using stylet vibra-
tion. Ultrasound Med. Biol. 32(9), 1339–1348 (2006)
10. Karamalis, A., Wein, W., Klein, T., Navab, N.: Ultrasound confidence maps using
random walks. Med. Image Anal. 16(6), 1101–1112 (2012)
11. Meng, G., Wang, Y., Duan, J., Xiang, S., Pan, C.: Efficient image dehazing with
boundary constraint and contextual regularization. In: IEEE International Con-
ference on Computer Vision, pp. 617–624 (2013)
Plane Assist: The Influence of Haptics
on Ultrasound-Based Needle Guidance

Heather Culbertson1(B) , Julie M. Walker1 , Michael Raitor1 ,


Allison M. Okamura1 , and Philipp J. Stolka2
1 Stanford University, Stanford, CA 94305, USA
{hculbert,jwalker4,mraitor,aokamura}@stanford.edu
2 Clear Guide Medical, Baltimore, MD 21211, USA
stolka@clearguidemedical.com

Abstract. Ultrasound-based interventions require experience and good


hand-eye coordination. Especially for non-experts, correctly guiding a
handheld probe towards a target, and staying there, poses a remarkable
challenge. We augment a commercial vision-based instrument guidance
system with haptic feedback to keep operators on target. A user study
shows significant improvements across deviation, time, and ease-of-use
when coupling standard ultrasound imaging with visual feedback, haptic
feedback, or both.

Keywords: Ultrasound · Haptic feedback · Instrument guidance ·


Image guidance

1 Introduction
The use of ultrasound for interventional guidance has expanded significantly
over the past decade. With research showing that ultrasound guidance improves
patient outcomes in procedures such as central vein catheterizations and periph-
eral nerve blocks [3,7], the relevant professional certification organizations began
recommending ultrasound guidance as the gold standard of care, e.g. [1,2]. Some
ultrasound training is now provided in medical school, but often solely involves
the visualization and identification of anatomical structures – a very necessary
skill, but not the only one required [11].
Simultaneous visualization of targets and instruments (usually needles) with
a single 2D probe is a significant challenge. The difficulty of maintaining align-
ment (between probe, instrument, and target) is a major reason for extended
intervention duration [4]. Furthermore, if target or needle visualization is lost
due to probe slippage or tipping, the user has no direct feedback to find them
again. Prior work has shown that bimanual tasks are difficult if the effects of
movements of both hands are not visible in the workspace; when there is lack
of visual alignment, users must rely on their proprioception, which has an error
of up to 5 cm in position and 10◦ of orientation at the hands [9]. This is a
particular challenge for novice or infrequent ultrasound users, as this is on the

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 370–377, 2016.
DOI: 10.1007/978-3-319-46720-7_43
Plane Assist: Haptics for Ultrasound Needle Guidance 371

order of the range of unintended motion during ultrasound scans. Clinical accuracy
limits (e.g., for deep biopsies of lesions) are greater than 10 mm in diameter. With
US beam thickness at depth easily greater than 2 cm, maintaining correct, continuous
target/needle visualization and a steady probe position is a critical challenge.
Deviations of less than 10 mm practically cannot be confirmed by US alone. One
study [13] found that the second most common error of anesthesiology novices
during needle block placement (occurring in 27 % of cases) was unintentional
probe movement.
One solution to this problem is to pro-
vide corrective guidance to the user. Prior
work in haptic guidance used vibrotactile dis-
plays effectively in tasks where visual load
is high [12]. The guiding vibrations can free
up cognitive resources for more critical task
aspects. Combined visual and haptic feed-
back has been shown to decrease error [10]
and reaction time [16] over visual feedback
alone, and has been shown to be most effec-
tive in tasks with a high cognitive load [6].
Handheld ultrasound scanning systems with visual guidance or actuated feedback
do exist [8], but are either limited to just initial visual positioning guidance when
using camera-based local tracking [15], or offer active position feedback only for a
small range of motion and require external tracking [5].

Fig. 1. Guidance system used in this study (Clear Guide ONE), including a computer
and handheld ultrasound probe with mounted cameras, connected to a standard
ultrasound system.
To improve this situation, we propose a
method for intuitive, always-available, direct probe guidance relative to a clinical
target, with no change to standard workflows. The innovation we describe here
is Plane Assist: ungrounded haptic (tactile) feedback signaling which direction
the user should move to bring the ultrasound imaging plane into alignment with
the target. Ergonomically, such feedback helps to avoid information overload
while allowing for full situational awareness, making it particularly useful for
less experienced operators.

2 Vision-Based Guidance System and Haptic Feedback

Image guidance provides the user with information to help align instruments,
targets, and possibly imaging probes to facilitate successful instrument handling
relative to anatomical targets. This guidance information can be provided visu-
ally, haptically, or auditorily. In this study we consider visual guidance, haptic
guidance, and their combinations, for ultrasound-based interventions.
372 H. Culbertson et al.

2.1 Visual Guidance

For visual guidance, we use a Clear Guide ONE (Clear Guide Medical, Inc.,
Baltimore MD; Fig. 1), which adds instrument guidance capabilities to regular
ultrasound machines for needle-based interventions. Instrument and ultrasound
probe tracking is based on computer vision, using wide-spectrum stereo cameras
mounted on a standard clinical ultrasound transducer [14]. Instrument guidance
is displayed as a dynamic overlay on live ultrasound imaging.
Fiducial markers are attached to the patient skin in the cameras’ field of
view to permit dynamic target tracking. The operator defines a target initially by
tapping on the live ultrasound image. If the cameras observe a marker during this
target definition, further visual tracking of that marker allows continuous 6-DoF
localization of the probe. This target tracking enhances the operator’s ability
to maintain probe alignment with a chosen target. During the intervention, as
(inadvertent) deviations from this reference pose relative to the target – or vice
versa in the case of actual anatomical target motion – are tracked, guidance to
the target is indicated through audio and on-screen visual cues (needle lines,
moving target circles, and targeting crosshairs; Fig. 1).
From an initial target pose ^{US}P in the ultrasound (US) coordinate frame and
the camera/ultrasound calibration transformation matrix ^{C}T_{US}, one determines
the pose of the target in the original camera frame:
^{C}P = ^{C}T_{US} ^{US}P   (1)

In a subsequent frame, where the same marker is observed in the new camera
coordinate frame (C, t), one finds the transformation between the two camera
frames (^{C,t}T_{C}) by simple rigid registration of the two marker corner point sets.
Now the target is found in the new ultrasound frame (U S, t):
^{US,t}P = ^{US,t}T_{C,t} ^{C,t}T_{C} ^{C}P   (2)

Noting that the ultrasound and camera frames are fixed relative to each other
(^{US,t}T_{C,t} = ^{US}T_{C}), and expanding, we get the continuously updated target
positions in the ultrasound frame:
^{US,t}P = (^{C}T_{US})^{−1} ^{C,t}T_{C} ^{C}T_{US} ^{US}P   (3)

This information can be used for both visual and haptic (see below) feedback.
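Eq. (3)'s composition of homogeneous transforms can be sketched directly (illustrative names; points are homogeneous 4-vectors and the matrices are 4 × 4 rigid transforms):

```python
import numpy as np

def update_target(P_us, T_c_us, T_ct_c):
    """Re-express a stored target (homogeneous 4-vector in the original US
    frame) in the current US frame, given the fixed camera-from-US
    calibration T_c_us and the inter-frame camera motion T_ct_c recovered
    by rigid registration of the marker corners."""
    return np.linalg.inv(T_c_us) @ T_ct_c @ T_c_us @ P_us
```

Because the calibration is fixed, only the inter-frame camera motion needs to be re-estimated each frame, which is what makes marker-based target tracking cheap.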

2.2 Haptic Guidance

To add haptic cues to this system, two C-2 tactors (Engineering Acoustics, Inc.,
Casselberry, FL) were embedded in a silicone band that was attached to the
ultrasound probe, as shown in Fig. 2. Each tactor is 3 cm wide, 0.8 cm tall, and
has a mass of 17 g. The haptic feedback band adds 65 g of mass and 2.5 cm of
thickness to the ultrasound probe. The tactors were located on the probe sides to
provide feedback to correct unintentional probe tilting. Although other degrees

of freedom (i.e. probe translation) will also result in misalignment between the
US plane and target, we focus this initial implementation on tilting because our
pilot study showed that tilting is one of the largest contributors to error between
US plane and target.
Haptic feedback is provided to the user if the target location is further than 2
mm away from the ultrasound plane. This ±2 mm deadband thus corresponds to
different amounts of probe tilt for different target depths1 . The tactor on the side
corresponding to the direction of tilt is vibrated with an amplitude proportional
to the amount of deviation.
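The deadband-plus-proportional mapping described above can be sketched as follows (the gain, saturation level, and sign convention for the left/right tactors are assumptions for illustration, not values from the paper):

```python
def tactor_commands(deviation_mm, deadband=2.0, gain=0.2, max_amp=1.0):
    """Map the signed out-of-plane target deviation (mm) to (left, right)
    vibration amplitudes: silent inside the +/-deadband, then amplitude
    proportional to the excess deviation, on the side of the tilt."""
    excess = abs(deviation_mm) - deadband
    if excess <= 0:
        return 0.0, 0.0
    amp = min(gain * excess, max_amp)  # saturate at max_amp
    return (amp, 0.0) if deviation_mm < 0 else (0.0, amp)
```

The user then tilts the probe away from the vibrating side to re-center the target in the imaging plane.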

3 Experimental Methods
We performed a user study to test the effectiveness of haptic feedback in reducing
unintended probe motion during a needle insertion task. All procedures were
approved by the Stanford University Institutional Review Board. Eight right-
handed novice non-medical students were recruited for the study (five male,
three female, 22–43 years old). Novice subjects were used as an approximate
representation of medical residents’ skills to evaluate the effect of both visual
and haptic feedback on the performance of inexperienced users and to assess
the efficacy of this system for use in training. (Other studies indicate that the
system shows the greatest benefit with non-expert operators.)

3.1 Experiment Set-Up


In the study, the participants used the ultrasound probe to image a synthetic
homogeneous gelatin phantom (Fig. 2(b)) with surface-attached markers for
probe pose tracking. After target definition, the participants used the instru-
ment guidance of the Clear Guide system to adjust a needle to be in-plane with
ultrasound, and its trajectory to be aligned with the target. After appropriate
alignment, they then inserted the needle into the phantom until reaching the
target, and the experimenter recorded success or failure of each trial. The suc-
cess of a trial was determined by watching the needle on the ultrasound image; if
it intersected the target, the trial was a success, otherwise a failure. The system
continuously recorded the target position in ultrasound coordinates (U S,t P ) for
all trials.

3.2 Guidance Conditions


Each participant was asked to complete sixteen needle insertion trials. At the
beginning of each trial, the experimenter selected one of four pre-specified tar-
get locations ranging from 3 cm to 12 cm in depth. When the experimenter
defined a target location on the screen, the system saved the current position
and orientation of the ultrasound probe as the reference pose.
1 Note that we ignore the effects of ultrasound beam physics resulting in varying
resolution cell widths (beam thickness), and instead consider the ideal geometry.

Fig. 2. (a) Ultrasound probe, augmented with cameras for visual tracking of probe and
needle, and a tactor band for providing haptic feedback. (b) Participant performing
needle insertion trial into a gelatin phantom using visual needle and target guidance
on the screen, and haptic target guidance through the tactor band.

During each trial, the system determines the current position and orientation
of the ultrasound probe, and calculates its deviation from the reference pose.
Once the current probe/target deviation is computed, the operator is informed of
required repositioning using two forms of feedback: (1) Standard visual feedback
(by means of graphic overlays on the live US stream shown on-screen) indicates
the current target location as estimated by visual tracking and the probe motion
necessary to re-visualize the target in the US view port. The needle guidance is
also displayed as blue/green lines on the live imaging stream. (2) Haptic feedback
is presented as vibration on either side of the probe to indicate the direction of
probe tilt from its reference pose. The participants were instructed to tilt the
probe away from the vibration to correct for the unintended motion.
Each participant completed four trials under each of four feedback conditions:
no feedback (standard US imaging with no additional guidance), visual feedback
only, both visual and haptic feedback, and haptic feedback only. The conditions
and target locations were randomized and distributed across all sixteen trials to
mitigate learning effects and differences in difficulty between target locations.
Participants received each feedback and target location pair once.

4 Results
In our analysis, we define the amount of probe deviation as the perpendicular
distance between the ultrasound plane and the target location at the depth of
the target. In the no-feedback condition, participants had an uncorrected probe
deviation larger than 2 mm for longer than half of the trial time in 40 % of

the trials. This deviation caused these trials to be failures as the needle did
not hit the original 3D target location. This poor performance highlights the
prevalence of unintended probe motion and the need for providing feedback to
guide the user. We focus the remainder of our analysis on the comparison of the
effectiveness of the visual and haptic feedback, and do not include the results
from the no-feedback condition in our statistical analysis.

* **
** ***
2.5 12

10
Probe Deviation (mm)

Correction Time (s)


8
1.5
6
1
4

0.5
2

0 0
Vision On Vision On Vision Off Vision On Vision On Vision Off
Haptics Off Haptics On Haptics On Haptics Off Haptics On Haptics On

Fig. 3. (a) Probe deviation, and (b) time to correct probe deviation, averaged across
each trial. Statistically significant differences in probe deviation and correction time
marked (∗ ∗ ∗ ≡ p ≤ 0.001, ∗∗ ≡ p ≤ 0.01, ∗ ≡ p ≤ 0.05).

The probe deviation was averaged in each trial. A three-way ANOVA was run on
the average deviation with participant, condition, and target location as factors
(Fig. 3(a)). Feedback condition and target location were found to be significant
factors (p < 0.001). No significant difference was found between the average probe
deviations across participants (p > 0.1). A multiple-comparison test between the
three feedback conditions indicated that the average probe deviation for the
condition including visual feedback only (1.12 ± 0.62 mm) was significantly greater
than that for the conditions with both haptic and visual feedback (0.80 ± 0.38 mm;
p < 0.01) and haptic feedback only (0.87 ± 0.48 mm; p < 0.05).

Fig. 4. Rated difficulty for the three feedback conditions (see below). Statistically
significant differences in rated difficulty marked (∗∗ ≡ p ≤ 0.01, ∗ ≡ p ≤ 0.05).
Additionally, the time it took for participants to correct probe deviations
larger than the 2 mm deadband was averaged in each trial. A three-way ANOVA
was run on the average correction time with participant, condition, and target
location as factors (Fig. 3(b)). Feedback condition was found to be a significant
376 H. Culbertson et al.

factor (p < 0.0005). No significant difference was found between the average
probe deviations across participants or target locations (p > 0.4). A multiple-
comparison test between the three feedback conditions indicated that the average
probe correction time for the condition including visual feedback only (2.15 ±
2.40 s) was significantly greater than that for the conditions with both haptic and
visual feedback (0.61±0.36 s; p < 0.0005) and haptic feedback only (0.77±0.59 s;
p < 0.005). These results indicate that the addition of haptic feedback resulted
in less undesired motion of the probe and allowed participants to more quickly
correct any deviations.
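The correction time can be recovered from a sampled deviation trace as the duration of each excursion beyond the 2 mm deadband; a sketch assuming a uniformly sampled trace (the paper's exact logging details are not specified):

```python
def correction_times(deviations, dt, deadband=2.0):
    """Durations (s) of each episode where |deviation| exceeds the deadband,
    for a trace sampled every `dt` seconds."""
    episodes, run = [], 0
    for d in deviations:
        if abs(d) > deadband:
            run += 1
        elif run:
            episodes.append(run * dt)
            run = 0
    if run:                      # excursion still open at end of trial
        episodes.append(run * dt)
    return episodes

trace = [0.5, 1.0, 2.5, 3.0, 2.2, 1.0, 0.8, 2.1, 1.5]
times = correction_times(trace, dt=0.1)
# two excursions: 3 samples (0.3 s) and 1 sample (0.1 s)
```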
Several participants indicated that the haptic feedback was especially bene-
ficial because of the high visual-cognitive load of the needle alignment portion of
the task. The participants were asked to rate the difficulty of the experimental
conditions on a five-point Likert scale. The difficulty ratings (Fig. 4) support
our other findings. The condition including both haptic and visual feedback was
rated as significantly easier (2.75±0.76) than the conditions with visual feedback
only (3.38 ± 0.92; p < 0.05) and haptic feedback only (3.5 ± 0.46; p < 0.01).

5 Conclusion
We described a method to add haptic feedback to a commercial, vision-based
navigation system for ultrasound-guided interventions. In addition to conven-
tional on-screen cues (target indicators, needle guides, etc.), two vibrating pads
on either side of a standard handheld transducer indicate deviations from the
plane containing a locked target. A user study was performed under simulated
conditions which highlight the central problems of clinical ultrasound imaging
– namely difficult visualization of intended targets, and distraction caused by
task focusing and information overload, both of which contribute to inadver-
tent target-alignment loss. Participants executed a dummy needle-targeting task,
while probe deviation from the target plane, reversion time to return to plane,
and perceived targeting difficulty were measured.
The experimental results clearly show (1) that both visual and haptic feed-
back are extremely helpful at least in supporting inexperienced or overwhelmed
operators, and (2) that adding haptic feedback (presumably because of its intu-
itiveness and independent sensation modality) improves performance over both
static and dynamic visual feedback. The considered metrics map directly to clin-
ical precision (in the case of probe deviation) or efficacy of the feedback method
(in the case of reversion time). Since the addition of haptic feedback resulted in
significant improvement for novice users, the system shows promise for use in
training.
Although this system was implemented using a Clear Guide ONE, the haptic
feedback can in principle be implemented with any navigated ultrasound guid-
ance system. In the future, it would be interesting to examine the benefits of
haptic feedback in a clinical study, across a large cohort of diversely-skilled oper-
ators, while directly measuring the intervention outcome (instrument placement
accuracy). Future prototypes would be improved by including haptic feedback
for additional degrees of freedom such as translation and rotation of the probe.

A Surgical Guidance System for Big-Bubble
Deep Anterior Lamellar Keratoplasty

Hessam Roodaki¹(B), Chiara Amat di San Filippo¹, Daniel Zapp³, Nassir Navab¹,², and Abouzar Eslami⁴

¹ Computer Aided Medical Procedures, Technische Universität München, Munich, Germany
he.roodaki@tum.de
² Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA
³ Augenklinik rechts der Isar, Technische Universität München, Munich, Germany
⁴ Carl Zeiss Meditec AG, Munich, Germany

Abstract. Deep Anterior Lamellar Keratoplasty using the Big-Bubble technique (BB-DALK) is a delicate and complex surgical procedure with a multitude of benefits over Penetrating Keratoplasty (PKP). Yet the steep learning curve and the challenges associated with BB-DALK prevent it from becoming the standard procedure for keratoplasty. Optical Coherence Tomography (OCT) helps surgeons carry out BB-DALK in a shorter time with a higher success rate, but also brings complications of its own, such as image occlusion by the instrument, the constant need to reposition, and added distraction. This work presents a novel real-time guidance system for BB-DALK that is practically a complete tool for smooth execution of the procedure. The guidance system comprises modified 3D+t OCT acquisition, advanced visualization, tracking of corneal layers, and depth information provided through Augmented Reality. The system is tested by an ophthalmic surgeon performing BB-DALK on several ex vivo pig eyes. Results from multiple evaluations show a maximum tracking error of 8.8 µm.

1 Introduction
Ophthalmic anterior segment surgery is among the most technically challenging
manual procedures. Penetrating Keratoplasty (PKP) is a well-established trans-
plant procedure for the treatment of multiple diseases of the cornea. In PKP, the
full thickness of the diseased cornea is removed and replaced with a donor cornea
that is positioned into place and sutured with stitches. Deep Anterior Lamellar
Keratoplasty (DALK) is proposed as an alternative method for corneal disor-
ders not affecting the endothelium. The main difference of DALK compared
to PKP is the preservation of the patient’s own endothelium. This advantage
reduces the risk of immunologic reactions and graft failure while showing simi-
lar overall visual outcomes. However, DALK is generally more complicated and
time-consuming with a steep learning curve particularly when the host stroma
is manually removed layer by layer [4]. In addition, high rate of intraoperative

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 378–385, 2016.
DOI: 10.1007/978-3-319-46720-7_44

perforation keeps DALK from becoming surgeons’ method of choice [7]. To over-
come the long surgical time and high perforation rate of DALK, in [1] Anwar
et al. have proposed the big-bubble DALK technique (BB-DALK). The funda-
mental step of the big-bubble technique is the insertion of a needle into the deep
stroma where air is injected with the goal of separating the posterior stroma and
the Descemet’s Membrane (DM). The needle is intended to penetrate to a depth
of more than 60 % of the cornea, where the injection of air in most cases forms a
bubble. However, in fear of perforating the DM, surgeons often stop the insertion
before the target depth, where air injection results only in diffuse emphysema
of the anterior stroma [7]. When bubble formation is not achieved, effort on
exposing a deep layer nearest possible to the DM carries the risk of accidental
perforation which brings further complications to the surgical procedure.
Optical Coherence Tomography (OCT) has been shown to increase the suc-
cess rate of the procedure by determining the depth of the cannula before
attempting the air injection [2]. Furthermore, recent integration of Spectral
Domain OCT (SD-OCT) into surgical microscopes gives the possibility of con-
tinuous monitoring of the needle insertion. However, current OCT acquisition
configurations and available tools to visualize the acquired scans are insufficient
for the purpose. Metallic instruments interfere with the OCT signal leading to
obstruction of deep structures. The accurate depth of the needle can only be per-
ceived by removing the needle and imaging the created tunnel since the image
captured when the needle is in position only shows the reflection of the top seg-
ment of the metallic instrument [2]. Also, limited field of view makes it hard to
keep the OCT position over the needle when pressure is applied for insertion.
Here we propose a complete system as a guidance tool for BB-DALK.
The system consists of modified 3D+t OCT acquisition using a microscope-
mounted scanner, sophisticated visualization, tracking of the epithelium (top)
and endothelium (bottom) layers and providing depth information using Aug-
mented Reality (AR). The method addresses all aspects of the indicated com-
plex procedure, hence is a practical solution to improve surgeons’ and patients’
experience.

2 Method

As depicted in Fig. 1, the system is based on an OPMI LUMERA 700 microscope equipped with a modified integrated RESCAN 700 OCT device (Carl Zeiss
Meditec, Germany). A desktop computer with a quad-core Intel Core i7 CPU,
a single NVIDIA GeForce GTX TITAN X GPU and two display screens are
connected to the OCT device. Interaction with the guidance system is done by
the surgeon’s assistant via a 3D mouse (3Dconnexion, Germany). The surgeon
performs the procedure under the microscope while looking at the screens for
both microscopic and OCT feedback. The experiments are performed on ex vivo
pig eyes as shown in Fig. 3a using 27 and 30 gauge needles. For evaluations, a
micromanipulator and a plastic anterior segment phantom eye are used.

Fig. 1. Experimental setup of the guidance system.

2.1 OCT Acquisition


The original configuration of the intraoperative OCT device is set to acquire
B-scans consisting of either 512 or 1024 A-scans. It can be set to acquire a single
B-scan, 2 orthogonal B-scans or 5 parallel B-scans. For the proposed guidance
system, the OCT device is set to provide 30 B-scans each with 90 A-scan samples
by reprogramming the movement of its internal mirror galvanometers. B-scans
are captured in a reciprocating manner for shorter scanning time. The scan region
covered by the system is 2 mm by 6 mm. The depth of each A-scan is 1024 pixels
corresponding to 2 mm in tissue. The concept is illustrated in Fig. 2a.
The cuboid of 30 × 90 × 1024 voxels is scanned at the rate of 10 volumes per
second. Since the cuboid is a 3D grid of samples from a continuous scene, it is
interpolated using tricubic interpolants to the target resolution of 180×540×180

Fig. 2. (a): The modified pattern of OCT acquisition. (b): The lateral visualization of
the cornea (orange) and the surgical needle (gray) in an OCT cuboid.

voxels (Fig. 2b). For that, frames are first averaged along the depth to obtain
30 frames of 90 × 30 pixels. Then in each cell of the grid, a tricubic interpolant
which maps coordinates to intensity values is defined as follows:


f(x, y, z) = \sum_{i,j,k=0}^{3} c_{ijk}\, x^{i} y^{j} z^{k}, \qquad x, y, z \in [0, 1], \qquad (1)

in which cijk are the 64 interpolant coefficients calculated locally from the grid
sample points and their derivatives. The coefficients are calculated by multiplica-
tion of a readily available 64×64 matrix and the vector of 64 elements consisting
of 8 sample points and their derivatives [6]. The interpolation is implemented on
the CPU in a parallel fashion.
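Once the 64 coefficients c_ijk of a cell are known, evaluating the interpolant of Eq. (1) is a direct triple sum; the coefficient computation via the 64 × 64 matrix of [6] is omitted in this sketch:

```python
def eval_tricubic(c, x, y, z):
    """Evaluate f(x,y,z) = sum_{i,j,k=0..3} c[i][j][k] * x^i * y^j * z^k
    for local cell coordinates x, y, z in [0, 1]."""
    value = 0.0
    for i in range(4):
        for j in range(4):
            for k in range(4):
                value += c[i][j][k] * (x ** i) * (y ** j) * (z ** k)
    return value

# A cell whose only nonzero coefficient is c_000 = 7 interpolates to the
# constant intensity 7 everywhere inside the cell.
c = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
c[0][0][0] = 7.0
assert eval_tricubic(c, 0.3, 0.5, 0.9) == 7.0
```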

2.2 Visualization

The achieved 3D OCT volume is visualized on both 2D monitors using GPU ray
casting with 100 rays per pixel. Maximum information in OCT images is gained
from high-intensity values representing boundaries between tissue layers. Hence,
the Maximum Intensity Projection (MIP) technique is employed for rendering
to put an emphasis on corneal layers. Many segmentation algorithms in OCT
imaging are based on adaptive intensity thresholding [5]. Metallic surgical instru-
ments including typical needles used for the BB-DALK procedure have infrared
reflectivity profiles that are distinct from cellular tissues. The 3D OCT volume is
segmented into the background, the cornea and the instrument by taking advan-
tage of various reflectivity profiles and employing K-means clustering. The initial
cluster mean values are set for the background to zero, the cornea to the volume
mean intensity (μ) and the instrument to the volume mean intensity plus two
standard deviations (μ + 2σ). The segmentation is used to dynamically alter the
color and opacity transfer functions to ensure the instrument is distinctly and
continuously visualized in red, the background speckle noise is suppressed and
the corneal tissue opacity does not obscure the instrument (Fig. 3b, c).
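The clustering step with means seeded at 0, μ and μ + 2σ can be sketched as a 1D k-means over voxel intensities (pure Python on a synthetic intensity list; a real volume would use a vectorised implementation):

```python
import statistics

def segment_intensities(voxels, iters=10):
    """Cluster intensities into background / cornea / instrument with
    k-means, seeding the means at 0, mu and mu + 2*sigma."""
    mu = statistics.fmean(voxels)
    sigma = statistics.pstdev(voxels)
    means = [0.0, mu, mu + 2 * sigma]
    for _ in range(iters):
        buckets = [[] for _ in means]
        for v in voxels:
            nearest = min(range(3), key=lambda c: abs(v - means[c]))
            buckets[nearest].append(v)
        means = [statistics.fmean(b) if b else means[c]
                 for c, b in enumerate(buckets)]
    return means

# Synthetic volume: dark background, mid-intensity cornea, bright needle
voxels = [0.02] * 50 + [0.45] * 40 + [0.95] * 10
bg, cornea, tool = segment_intensities(voxels)
```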


Fig. 3. (a): Needle insertion performed by the surgeon on the ex vivo pig eye. (b), (c):
3D visualization of the OCT cuboid with frontal and lateral viewpoints. The needle is
distinctly visualized in red while endothelium (arrow) is not apparent.

The OCT cuboid could be examined from different viewpoints according to the exact need of the surgeon. For this purpose, one of the two displays could be
controlled by the surgeon’s assistant using a 3D mouse with zooming, panning
and 3D rotating functionalities. The proposed guidance system maintains an
automatic viewpoint of the OCT volume next to the microscopic view in the
second display using the tracking procedure described below.

2.3 Tracking

The corneal DM and endothelial layer are the main targets of the BB-DALK
procedure. The DM must not be perforated while the needle must be guided as
close as possible to it. However, the two layers combined do not have a footprint
larger than a few pixels in OCT images. As an essential part of the guidance
system, DM and endothelium 3D surfaces are tracked for continuous feedback by
solid visualization. The advancement of the needle in a BB-DALK procedure is
examined and reported by percentage of the stroma that is above the needle tip.
Hence, the epithelium surface of the cornea is also tracked to assist the surgeon
by the quantitative guidance of the insertion.
Tracking in each volume is initiated by detection of the topmost and bot-
tommost 3D points in the segmented cornea of the OCT volume. Based on the
spherical shape of the cornea, two half spheres are considered as models of the
endothelium and epithelium surfaces. The models are then fitted to the detected
point clouds using iterative closest point (ICP) algorithm. Since the insertion of
the needle deforms the cornea, ICP is utilized with 3D affine transformation at
its core [3]. If the detected and the model half-sphere point clouds are respectively denoted as P = \{p_i\}_{i=1}^{N_P} \in \mathbb{R}^3 and M = \{m_i\}_{i=1}^{N_M} \in \mathbb{R}^3, each iteration of the tracker algorithm consecutively minimizes the following functions:

C(i) = \arg\min_{j \in \{1,\dots,N_P\}} \|(A_{k-1} m_i + t_{k-1}) - p_j\|_2^2, \quad \text{for all } i \in \{1,\dots,N_M\}. \qquad (2)

(A_k, t_k) = \arg\min_{A,t} \frac{1}{N} \sum_{i=1}^{N} \|(A m_i + t) - p_{C(i)}\|_2^2. \qquad (3)

Equation 2 finds the correspondence C(i) between N \le \min(N_P, N_M) detected and model points. Equation 3 minimizes the Euclidean distance between
the detected points and the transformed points of the model. Ak and tk are the
desired affine and translation matrices at iteration k. For each incoming vol-
ume, ICP is initialized by the transformation that brings the centroid of the
model points to the centroid of the detected points. The algorithm stops after
30 iterations.
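One iteration of Eqs. 2 and 3 — nearest-neighbour correspondence followed by a closed-form least-squares affine fit — might look as follows (NumPy sketch on a toy point set; a production tracker would use a k-d tree for the search):

```python
import numpy as np

def icp_affine_step(M, P, A, t):
    """One tracker iteration: Eq. (2) matches every model point to its
    nearest detected point; Eq. (3) refits the affine transform (A, t)."""
    moved = M @ A.T + t                        # A_{k-1} m_i + t_{k-1}
    d2 = ((moved[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    corr = P[d2.argmin(axis=1)]                # p_{C(i)} for each i
    H = np.hstack([M, np.ones((len(M), 1))])   # solve [M 1] X ~= corr
    X, *_ = np.linalg.lstsq(H, corr, rcond=None)
    return X[:3].T, X[3]                       # A_k, t_k

# Toy "surfaces": a unit grid of model points and a translated copy
M = np.array([[i, j, k] for i in range(3)
              for j in range(3) for k in range(3)], dtype=float)
t_true = np.array([0.10, -0.20, 0.05])
P = M + t_true
A, t = icp_affine_step(M, P, np.eye(3), np.zeros(3))
# one step recovers A ~ identity and t ~ t_true for this noise-free case
```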
The lateral view of the OCT volume gives a better understanding of the nee-
dle dimensions and advancement. Also, the perception of the distance between
the instrument and the endothelium layer is best achieved from viewing the
scene parallel to the surface. Therefore, the viewpoint of the second display is
constantly kept parallel to a small plane at the center of the tracked endothelium


Fig. 4. Augmented Reality is used to solidly visualize the endothelium and epithe-
lium surfaces (yellow) using wireframes. A hypothetical surface (green) is rendered to
indicate the insertion target depth.

surface (Fig. 4a). The pressure applied for insertion of the needle leads to defor-
mation of the cornea. To keep the OCT field of view centered on the focus of the
procedure despite the induced shifts, the OCT depth range is continuously cen-
tered to halfway between top and bottom surfaces. This is done automatically
to take the burden of manual repositioning away from the surgeon.

2.4 Augmented Reality

To further assist the surgeon, a hypothetical third surface is composed between the top and bottom surfaces, indicating the insertion target depth (Fig. 4). The
surgeon can choose a preferred percentage of penetration at which the imaginary
surface would be rendered. Each point of the third surface is a linear combination
of the corresponding points on the tracked epithelium and endothelium layers
according to the chosen percentage. To visualize the detected surfaces, a wire-
frame mesh is formed on each of the three point sets. The two detected surfaces
are rendered in yellow at their tracked position and the third surface is rendered


Fig. 5. Results of air injection in multiple pig eyes visualized from various viewpoints.
The concentration of air in the bottommost region of the cornea indicates the high
insertion accuracy. Deep stroma is reached with no sign of perforation.

in green at its hypothetical location. Visualization of each surface could be turned off if necessary. After injection, the presence of air leads to high-intensity voxels
in the OCT volume. Therefore, the separation region is visualized effectively in
red and could be used for validation of separation (Fig. 5).
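Each point of the hypothetical surface is the stated linear combination of corresponding epithelium and endothelium points; for example, at a chosen penetration of 90 % (toy coordinates, in µm):

```python
def target_surface(epithelium, endothelium, depth_fraction):
    """Blend corresponding surface points: 0.0 -> epithelium,
    1.0 -> endothelium, 0.9 -> the 90% insertion-depth surface."""
    return [tuple((1.0 - depth_fraction) * e + depth_fraction * n
                  for e, n in zip(ep, en))
            for ep, en in zip(epithelium, endothelium)]

epi = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]        # top surface (toy points)
endo = [(0.0, 0.0, 500.0), (1.0, 0.0, 520.0)]   # bottom surface
target = target_surface(epi, endo, 0.9)
# the 90% surface sits ~450 um deep at the first point, ~468 um at the second
```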

3 Experiments and Results

The proposed guidance system is tested by an ophthalmic surgeon experienced in corneal transplantation procedures on several ex vivo pig eyes. In his comments, the visualization gives the scene a new dimension not offered by conventional systems. His experience with the system demonstrates the ability of the real-time guidance solution to help in deep needle insertions with fewer perforation incidents.
For the purpose of real-time OCT acquisition, the surgical scene is sparsely
sampled via a grid of A-scans and interpolated. To evaluate the accuracy of inter-
polation against dense sampling, four fixed regions of a phantom eye (2 mm ×
6 mm × 2 mm) are scanned once with real-time sparse sampling (30 px × 90 px ×
1024 px) and two times with slow dense sampling (85 px × 512 px × 1024 px). The
sparse volumes are then interpolated to the size of the dense volumes. Volume
pixels have intensities in the range of [0, 1]. For each of the four regions, Mean
Absolute Error (MAE) of pixel intensities is once calculated for the two dense
captures and once for one of the dense volumes and the interpolated volume. A
maximum pixel intensity error of 0.073 is observed for the dense-sparse compari-
son while a minimum pixel intensity error of 0.043 is observed for the dense-dense
comparison. The reason for the observed error in dense-dense comparison lies in
the presence of OCT speckle noise which is a known phenomenon. The error
observed for the dense-sparse comparison is comparable with the error induced
by speckle noise hence the loss in sparse sampling is insignificant.
Human corneal thickness is reported to be around 500 µm. To ensure a min-
imum chance of perforation when insertion is done to the depth of 90 %, the
tracking accuracy required is around 50 µm. To evaluate tracking accuracy of
the proposed solution, a micromanipulator with a resolution of 5 µm is used.
A phantom eye and a pig eye are fixed to a micromanipulator and precisely
moved upwards and downwards while the epithelium and endothelium surfaces
are tracked. At each position, the change in depth of the tracked surfaces' corresponding points is studied. Results are presented in Table 1 using box-and-
whisker plots. The whiskers are showing the minimum and maximum recorded
change of all tracked points while the start and the end of the box are the
first and third quartiles. Bands and dots represent medians and means of the
recorded changes respectively. The actual value against which the tracking accu-
racy should be compared is highlighted in red on the horizontal axis of the plots.
Overall, the maximum tracking error is 8.8 µm.

Table 1. Evaluation of Tracking

Experiment, actual move (µm) | Detected epithelium displacement (µm) | Detected endothelium displacement (µm)
Phantom eye, 10 µm | [box-and-whisker plot, axis 2–18 µm] | [box-and-whisker plot, axis 2–18 µm]
Phantom eye, 30 µm | [box-and-whisker plot, axis 21–39 µm] | [box-and-whisker plot, axis 21–39 µm]
Pig eye, 10 µm | [box-and-whisker plot, axis 2–18 µm] | [box-and-whisker plot, axis 2–18 µm]
Pig eye, 30 µm | [box-and-whisker plot, axis 21–39 µm] | [box-and-whisker plot, axis 21–39 µm]

4 Conclusion
This work presents a novel real-time guidance system for one of the most chal-
lenging procedures in ophthalmic microsurgery. The use of medical AR aims at
facilitation of the BB-DALK learning process. Experiments on ex vivo pig eyes
suggest the usability and reliability of the system leading to more effective yet
shorter surgery sessions. Quantitative evaluations of the system indicate its high
accuracy in depicting the surgical scene and tracking its changes leading to pre-
cise and deep insertions. Future work will be in the direction of adding needle
tracking and navigation, further evaluations and clinical in vivo tests.

References
1. Anwar, M., Teichmann, K.D.: Big-bubble technique to bare Descemet’s membrane
in anterior lamellar keratoplasty. J. Cataract Refract. Surg. 28(3), 398–403 (2002)
2. De Benito-Llopis, L., Mehta, J.S., Angunawela, R.I., Ang, M., Tan, D.T.: Intra-
operative anterior segment optical coherence tomography: a novel assessment tool
during deep anterior lamellar keratoplasty. Am. J. Ophthalmol. 157(2), 334–341
(2014)
3. Du, S., Zheng, N., Ying, S., Liu, J.: Affine iterative closest point algorithm for
point set registration. Pattern Recogn. Lett. 31(9), 791–799 (2010)
4. Fontana, L., Parente, G., Tassinari, G.: Clinical outcomes after deep anterior lamel-
lar keratoplasty using the big-bubble technique in patients with keratoconus. Am.
J. Ophthalmol. 143(1), 117–124 (2007)
5. Ishikawa, H., Stein, D.M., Wollstein, G., Beaton, S., Fujimoto, J.G., Schuman, J.S.:
Macular segmentation with optical coherence tomography. Invest. Ophthalmol.
Vis. Sci. 46(6), 2012–2017 (2005)
6. Lekien, F., Marsden, J.: Tricubic interpolation in three dimensions. Int. J. Numer.
Meth. Eng. 63(3), 455–471 (2005)
7. Scorcia, V., Busin, M., Lucisano, A., Beltz, J., Carta, A., Scorcia, G.: Anterior seg-
ment optical coherence tomography-guided big-bubble technique. Ophthalmology
120(3), 471–476 (2013)
Real-Time 3D Tracking of Articulated Tools
for Robotic Surgery

Menglong Ye(B) , Lin Zhang, Stamatia Giannarou, and Guang-Zhong Yang

The Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK
menglong.ye11@imperial.ac.uk

Abstract. In robotic surgery, tool tracking is important for provid-


ing safe tool-tissue interaction and facilitating surgical skills assessment.
Despite recent advances in tool tracking, existing approaches are faced
with major difficulties in real-time tracking of articulated tools. Most
algorithms are tailored for offline processing with pre-recorded videos.
In this paper, we propose a real-time 3D tracking method for articu-
lated tools in robotic surgery. The proposed method is based on the
CAD model of the tools as well as robot kinematics to generate online
part-based templates for efficient 2D matching and 3D pose estimation.
A robust verification approach is incorporated to reject outliers in 2D
detections, which is then followed by fusing inliers with robot kinematic
readings for 3D pose estimation of the tool. The proposed method has
been validated with phantom data, as well as ex vivo and in vivo exper-
iments. The results derived clearly demonstrate the performance advan-
tage of the proposed method when compared to the state-of-the-art.

1 Introduction
Recent advances in surgical robots have significantly improved the dexterity
of the surgeons, along with enhanced 3D vision and motion scaling. Surgical
robots such as the da Vinci® (Intuitive Surgical, Inc., CA) platform can allow
the augmentation of preoperative data to enhance the intraoperative surgical
guidance. In robotic surgery, tracking of surgical tools is an important task for
applications such as safe tool-tissue interaction and surgical skills assessment.
In the last decade, many approaches for surgical tool tracking have been pro-
posed. The majority of these methods have focused on the tracking of laparo-
scopic rigid tools, including using template matching [1] and combining colour-
segmentation with prior geometrical tool models [2]. In [3], the 3D poses of rigid
robotic tools were estimated by combining random forests with level-sets seg-
mentation. More recently, tracking of articulated tools has also attracted a lot of
interest. For example, Pezzementi et al. [4] tracked articulated tools based on an
offline synthetic model using colour and texture features. The CAD model of a
robotic tool was used by Reiter et al. [5] to generate virtual templates using the
robot kinematics. However, thousands of templates were created by configuring
the original tool kinematics, leading to time-demanding rendering and template
matching. In [6], boosted trees were used to learn predefined parts of surgical

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 386–394, 2016.
DOI: 10.1007/978-3-319-46720-7_45

Fig. 1. (a) Illustration of transformations; (b) Virtual rendering example of the large
needle driver and its keypoint locations; (c) Extracted gradient orientations from virtual
rendering. The orientations are quantised and colour-coded as shown in the pie chart.

tools. Similarly, regression forests have been employed in [7] to estimate the 2D
pose of articulated tools. In [8], the 3D locations of robotic tools estimated with
offline trained random forests, were fused with robot kinematics to recover the
3D poses of the tools. Whilst there has been significant progress on surgical tool
detection and tracking, none of the existing approaches have thus far achieved
real-time 3D tracking of articulated robotic tools.
In this paper, we propose a framework for real-time 3D tracking of articulated
tools in robotic surgery. Similar to [5], CAD models have been used to generate
virtual tools and their contour templates are extracted online, based on the
kinematic readings of the robot. In our work, the tool detection on the real
camera image is performed via matching the individual parts of the tools rather
than the whole instrument. This enables our method to deal with the changing
pose of the tools due to articulated motion. Another novel aspect of the proposed
framework is the robust verification approach based on 2D geometrical context,
which is used to reject outlier template matches of the tool parts. The inlier 2D
detections are then used for 3D pose estimation via the Extended Kalman Filter
(EKF). Experiments have been conducted on phantom, ex vivo and in vivo video
data, and the results verify that our approach outperforms the state-of-the-art.

2 Methods
Our proposed framework includes three main components. The first component is
a virtual tool renderer that generates part-based templates online. After template
matching, the second component performs verification to extract the inlier 2D
detections. These 2D detections are finally fused with kinematic data for 3D tool
pose estimation. Our framework is implemented on the da Vinci® robot. The
robot kinematics are retrieved using the da Vinci® Research Kit (dVRK) [9].

2.1 Part-Based Online Templates for Tool Detection


In this work, to deal with the changing pose of articulated surgical tools, the
tool detection has been performed by matching individual parts of the tools,
rather than the entire instrument, similar to [6]. To avoid the limitations of
388 M. Ye et al.

Fig. 2. (a) An example of part-based templates; (b) Quantised gradient orientations
from a camera image; (c) Part-based template matching results of tool parts; (d) and
(e) Geometrical context verification; (f) Inlier detections obtained after verification.

offline training, we propose to generate the part models on-the-fly such that the
changing appearance of tool parts can be dynamically adapted.
To generate the part-based models online, the CAD model of the tool and the
robot kinematics have been used to render the tool in a virtual environment. The
pose of a tool in the robot base frame B can be denoted as the transformation
TB B
E , where E is the end-effector coordinate frame shown in Fig. 1(a). TE can be
retrieved from dVRK (kinematics) to provide the 3D coordinates of the tool in B.
Thus, to set the virtual view to be the same as the laparoscopic view, a standard
hand-eye calibration [10] is used to estimate the transformation TC B from B to
the camera coordinate frame C. However, errors in the calibration can affect the
accuracy of TC B , resulting in a 3D pose offset between the virtual tool and the
real tool in C. In this regard, we represent the transformation found from the

calibration as TC B , where C

is the camera coordinate frame that includes the
accumulated calibration errors. Therefore, a correction transformation denoted
as TC C − can be introduced to compensate for the calibration errors.
 n
In this work, we have defined n = 14 keypoints PB = pB i i=1 on the
tool, and the large needle driver is taken as an example. The keypoints include
the points shown in Fig. 1(b) and those on the symmetric side of the tool. These
keypoints represent the skeleton of the tool, which also apply to other da Vinci
R

tools. At time t, an image It can be obtained from the laparoscopic camera. The
keypoints can be projected in It with the camera intrinsic matrix K via
1 C− B
PIt = KTC
C − TB Pt . (1)
s
Here, s is the scaling factor that normalises the depth to the image plane.
To represent the appearance of the tool parts, the Quantised Gradient Ori-
entations (QGO) approach [11] has been used (see Fig. 1(c)). Bounding boxes
are created to represent part-based models and centred at the keypoints in the
virtual view (see Fig. 2(a)). The box size for each part is adjusted based on the z
coordinate (from kinematics) of the keypoint with respect to the virtual camera
centre. QGO templates are then extracted inside these bounding boxes. As QGO
represents the contour information of the tool, it is robust to cluttered scenes and
illumination changes. In addition, a QGO template is represented as a binary
code by quantisation, thus template matching can be performed efficiently.
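A simplified sketch of the idea behind QGO [11]: gradient orientations are quantised into a few bins, one-hot encoded as per-pixel bitmasks, and templates are compared with cheap bitwise operations. This is an illustrative approximation under our own naming and thresholds, not the authors' implementation:

```python
import numpy as np

def quantise_orientations(img, n_bins=8, mag_thresh=10.0):
    """Quantise gradient orientations into n_bins over [0, 180) degrees and
    one-hot encode each pixel as a bitmask (0 where the gradient is weak).
    Contour polarity is ignored, so orientations are taken modulo 180."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.degrees(np.arctan2(gy, gx)), 180.0)
    bins = np.minimum((ori / (180.0 / n_bins)).astype(int), n_bins - 1)
    mask = (1 << bins).astype(np.uint8)   # one bit per orientation bin
    mask[mag < mag_thresh] = 0            # keep contour pixels only
    return mask

def similarity(template_mask, image_mask):
    """Fraction of template contour pixels whose orientation bit also occurs
    at the same location in the image (a cheap bitwise-AND test)."""
    t = template_mask > 0
    if not t.any():
        return 0.0
    return np.count_nonzero(template_mask[t] & image_mask[t]) / t.sum()

# A vertical step edge matches itself perfectly.
img = np.zeros((16, 16)); img[:, 8:] = 100.0
tmpl = quantise_orientations(img)
sim = similarity(tmpl, quantise_orientations(img))   # -> 1.0
```

Because each template reduces to a bitmask, sliding-window matching over the camera image needs only AND and popcount operations, which is what makes the online matching cheap.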
Note that not all of the defined parts are visible in the virtual view, as some
of them may be occluded. Therefore, templates are only extracted for the
m parts that are facing the camera. To find the correspondences of the tool parts
between the virtual and real images, QGO is also computed on the real image
(see Fig. 2(b)) and template matching is then performed for each part via sliding
windows. Exemplar template matching results are shown in Fig. 2(c).

2.2 Tool Part Verification via 2D Geometrical Context

To further extract the best location estimates of the tool parts, a consensus-based
verification approach [12] is included. This approach analyses the geometrical
context of the correspondences in a PROgressive SAmple Consensus (PROSAC)
scheme [13]. For the visible keypoints {p_i}_{i=1}^m in the virtual view, we denote their
2D correspondences in the real camera image as {p_{i,j}}_{i=1,j=1}^{m,k}, where {p_{i,j}}_{j=1}^k
represent the top k correspondences of p_i sorted by QGO similarities.
For each iteration in PROSAC, we select two point pairs from {p_{i,j}}_{i=1,j=1}^{m,k}
in sorted descending order. These two pairs represent the correspondences for
two different parts, e.g., the pair of p1 and p1,2, and the pair of p3 and p3,1. The two pairs
are then used to verify the geometrical context of the tool parts. As shown in
Fig. 2(d) and (e), we use two polar grids to indicate the geometrical context of the
virtual view and the camera image. The origins of the grids are defined as p1 and
p1,2 , respectively. The major axis of the grids can be defined as the vectors from
p1 to p3 and p1,2 to p3,1 , respectively. The scale difference between the two grids
is found by comparing d(p1, p3) and d(p1,2, p3,1), where d(·, ·) is the Euclidean
distance. We can then define the angular and radial bin sizes as 30° and 10
pixels (allowing moderate out-of-plane rotation), respectively. With these, two
polar grids can be created and placed on the virtual and camera images. A point
pair is determined as an inlier if the two points are located in the same zone in
the polar grids. Therefore, if the number of inliers is larger than a predefined
value, the geometrical context of the tools in the virtual and the real camera
images is considered matched. Otherwise, the above verification is repeated
until it reaches the maximum number (100) of iterations. After verification, the
inlier point matches are used to estimate the correction transformation T^C_{C-}.
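The polar-grid consistency test described above can be sketched as follows, with two anchor correspondences defining each grid's origin, major axis, and scale; the helper names and exact binning details are illustrative:

```python
import numpy as np

def polar_zone(p, origin, axis_angle_deg, scale, ang_bin=30.0, rad_bin=10.0):
    """Zone (angular bin, radial bin) of point p in a polar grid anchored at
    `origin`, with its major axis rotated by `axis_angle_deg` and radii
    normalised by `scale` (30-degree / 10-pixel bins, as in Sect. 2.2)."""
    d = np.asarray(p, float) - np.asarray(origin, float)
    r = np.hypot(d[0], d[1]) / scale
    a = np.mod(np.degrees(np.arctan2(d[1], d[0])) - axis_angle_deg, 360.0)
    return int(a // ang_bin), int(r // rad_bin)

def is_inlier(p_virt, p_real, anchors_virt, anchors_real):
    """True if a candidate pair lies in the same polar-grid zone, where each
    grid is defined by two anchor correspondences (origin and axis point)."""
    (o_v, a_v), (o_r, a_r) = anchors_virt, anchors_real
    ang_v = np.degrees(np.arctan2(a_v[1] - o_v[1], a_v[0] - o_v[0]))
    ang_r = np.degrees(np.arctan2(a_r[1] - o_r[1], a_r[0] - o_r[0]))
    s_v = np.hypot(a_v[0] - o_v[0], a_v[1] - o_v[1])
    s_r = np.hypot(a_r[0] - o_r[0], a_r[1] - o_r[1])
    scale = s_r / s_v   # scale difference, cf. d(p1, p3) vs d(p1,2, p3,1)
    return polar_zone(p_virt, o_v, ang_v, 1.0) == polar_zone(p_real, o_r, ang_r, scale)

# Anchors related by a 2x scale and a translation: a consistent third point
# is accepted, an inconsistent one is rejected.
av, ar = ((0.0, 0.0), (10.0, 0.0)), ((100.0, 100.0), (120.0, 100.0))
ok = is_inlier((5.0, 5.0), (110.0, 110.0), av, ar)    # True
bad = is_inlier((5.0, 5.0), (100.0, 140.0), av, ar)   # False
```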

2.3 From 2D to 3D Tool Pose Estimation

We now describe how to combine the 2D detections with 3D kinematic data
to estimate T^C_{C-}. Here, the transformation matrix is represented as a vector of
rotation angles and translations along each axis: x = [θx, θy, θz, rx, ry, rz]^T. We
denote the n observations (corresponding to the tool parts defined in Sect. 2.1) as
z = [u1, v1, ..., un, vn]^T, where u and v are their locations in the camera image.
To estimate x on-the-fly, the EKF has been adopted to find x_t given the observations
z_t at time t. The process model is defined as x_t = I x_{t-1} + w_t, where w_t is
the process noise at time t, and I is the transition function defined as the identity
matrix. The measurement model is defined as z_t = h(x_t) + v_t, with v_t being the
measurement noise. h(·) is the nonlinear function with respect to [θx, θy, θz, rx, ry, rz]^T:

    h(x_t) = (1/s) K f(x_t) T^{C-}_B P^B_t,    (2)

which is derived according to Eq. 1. Note here, f(·) is the function that composes
the Euler angles and translations (in x_t) into the 4×4 homogeneous transformation
matrix T^C_{C-}. As Eq. 2 is a nonlinear function, we derive the Jacobian matrix J
of h(·) with respect to each element in x_t.
For iteration t, the predicted state x^-_t is calculated and used to predict the
measurement z^-_t, and also to calculate J_t. In addition, z_t is obtained from the
inlier detections (Sect. 2.2) and is used, along with J_t and x^-_t, to derive the
corrected state x^+_t, which contains the corrected angles and translations. These
are finally used to compose the transformation T^C_{C-} at time t, and thus the 3D
pose of the tool in C is obtained as T^C_E = T^C_{C-} T^{C-}_B T^B_E. Note that if no 2D
detections are available at time t, the previous T^C_{C-} is then used.
At the beginning of the tracking process, an initial estimate ^0 T^C_{C-} is required to
initialise the EKF and correct the virtual view to be as close as possible to the real
view. Therefore, template matching is performed at multiple scales and rotations
for initialisation; however, only one template is needed for matching each tool
part after initialisation. The Efficient Perspective-n-Point (EPnP) algorithm
[14] is applied to estimate ^0 T^C_{C-} based on the 2D-3D correspondences of the
tool parts matched between the virtual and real views and their 3D positions
from kinematic data.
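One EKF iteration with the identity process model above can be sketched generically. Here the measurement function h, which in the paper is the projection of Eq. 2, is a toy stand-in, and the Jacobian is linearised numerically rather than analytically:

```python
import numpy as np

def ekf_step(x_prev, P_prev, z, h, Q, R, eps=1e-6):
    """One EKF iteration with the identity process model x_t = I x_{t-1} + w_t.
    h maps the 6-DoF state to the stacked 2D part observations (cf. Eq. 2);
    its Jacobian is obtained by central differences for simplicity."""
    x_pred = x_prev.copy()          # predict (identity transition)
    P_pred = P_prev + Q
    hx = np.atleast_1d(h(x_pred))
    J = np.zeros((hx.size, x_pred.size))
    for i in range(x_pred.size):    # numerical Jacobian of h at x_pred
        dx = np.zeros(x_pred.size); dx[i] = eps
        J[:, i] = (h(x_pred + dx) - h(x_pred - dx)) / (2 * eps)
    S = J @ P_pred @ J.T + R        # innovation covariance
    K = P_pred @ J.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - hx)   # corrected state
    P_new = (np.eye(x_pred.size) - K @ J) @ P_pred
    return x_new, P_new

# Toy 1D check: directly observing the state pulls the estimate toward z = 1.
x, P = ekf_step(np.array([0.0]), np.eye(1), np.array([1.0]),
                lambda x: x, 0.01 * np.eye(1), 0.01 * np.eye(1))
```

With comparable process and measurement noise, the corrected state moves most of the way from the prediction (0) toward the observation (1), and the state covariance shrinks accordingly.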
The proposed framework can be easily extended to track multiple tools. This
only requires generating part-based templates for all the tools in the same
graphic rendering and following the proposed framework. As template matching is
performed on binarised templates, the computational speed is not deteriorated.

3 Results
The proposed framework has been implemented on an HP workstation with
an Intel Xeon E5-2643v3 CPU. Stereo videos are captured at 25 Hz. In our
C++ implementation, we have separated the part-based rendering and image
processing into two CPU running threads, enabling our framework to be real-
time. The rendering part is implemented based on VTK and OpenGL, and its
speed is fixed at 25 Hz. As our framework only requires monocular images
for 3D pose estimation, only the images from the left camera were processed.
For image size 720 × 576, the processing speed is ≈29 Hz (without any GPU
programming). The threshold of the inlier number in the geometrical context
verification is empirically defined as 4. For initialisation, template matching is
Fig. 3. (a) and (b) Detection rate results of our online template matching and GradBoost [6]
on two single-tool tracking sequences (see supplementary videos); (c) Overall
rotation angle errors (mean ± std) along each axis on Seqs. 1–6.

Table 1. Left: translation and rotation errors (mean ± std) on Seqs. 1–6. Right: tracking
accuracy, with run-time speed in Hz in brackets, compared to [8] on their dataset (Seqs. 7–12).

3D pose error (Seqs. 1–6):
          Our method                     EPnP-based
Seq.      Trans. (mm)    Rot. (rad)      Trans. (mm)    Rot. (rad)
1         1.31 ± 1.15    0.11 ± 0.08     3.10 ± 3.89    0.12 ± 0.09
2         1.50 ± 1.12    0.12 ± 0.07     6.69 ± 8.33    0.24 ± 0.19
3         3.14 ± 1.96    0.12 ± 0.08     8.03 ± 8.46    0.23 ± 0.20
4         4.04 ± 2.21    0.19 ± 0.15     5.02 ± 5.41    0.29 ± 0.18
5         3.07 ± 2.02    0.14 ± 0.11     5.47 ± 5.63    0.26 ± 0.20
6         3.24 ± 2.70    0.12 ± 0.05     4.03 ± 3.87    0.23 ± 0.13
Overall   2.83 ± 2.19    0.13 ± 0.10     5.51 ± 6.45    0.24 ± 0.18

Tracking accuracy (Seqs. 7–12):
Seq.      Our method      [8]
7         97.79 % (27)    97.12 % (1)
8         99.45 % (27)    96.88 % (1)
9         99.25 % (28)    98.04 % (1)
10        96.84 % (28)    97.75 % (1)
11        96.57 % (36)    98.76 % (2)
12        98.70 % (25)    97.25 % (1)
Overall   97.83 %         97.81 %

performed with additional scale ratios of 0.8 and 1.2, and rotations of ±15◦ ,
which does not deteriorate the run-time speed due to template binarisation. Our
method was compared to the tracking approaches for articulated tools including
[6,8].
For demonstrating the effectiveness of the online part-based templates for
tool detection, we have compared our approach to the method proposed in [6],
which is based on boosted trees for 2D tool part detection. For ease of training
data generation, a subset of the tool parts was evaluated in this comparison,
namely the front pin, logo, and rear pin. The classifier was trained with 6000
samples for each part. Since [6] applies to single tool tracking only, the trained
classifier along with our approach were tested on two single-tool sequences (1677
and 1732 images), where ground truth data was manually labelled. A part detec-
tion is determined to be correct if the distance of its centre and ground truth is
smaller than a threshold. To evaluate the results with different accuracy require-
ments, the threshold was therefore sequentially set to 5, 10, and 20 pixels.
The detection rates of the methods were calculated among the top N detections.
As shown in Fig. 3(a–b), our method significantly outperforms [6] under all
accuracy requirements. This is because our templates are generated adaptively
online.
To validate the accuracy of the 3D pose estimation, we manually labelled
the centre locations of the tool parts on both left and right camera images
Fig. 4. Qualitative results. (a–c) phantom data (Seqs. 1–3); (d) ex vivo ovine data
(Seq. 4); (e) and (g) ex vivo porcine data (Seqs. 9 and 12); (f) in vivo porcine data
(Seq. 11). Red lines indicate the tool kinematics, and green lines indicate the tracking
results of our framework with 2D detections in coloured dots.

on phantom (Seqs. 1–3) and ex vivo (Seqs. 4–6) video data to generate the 3D
ground truth. The tool pose errors are then obtained as the relative pose between
the estimated pose and the ground truth. Our approach was also compared
to the 3D poses estimated performing EPnP for every image where the tool
parts are detected. However, EPnP generated unstable results and had inferior
performance to our approach as shown in Table 1 and Fig. 3(c).
We have also compared our framework to the method proposed in [8]. As
their code is not publicly available, we ran our framework on the same ex vivo
(Seqs. 7–10, 12) and in vivo data (Seq. 11) used in [8]. Example results are shown
in Fig. 4(e–g). For achieving a fair comparison, we have evaluated the tracking
accuracy as explained in their work, and presented both our results and theirs
reported in the paper in Table 1. Although our framework achieved slightly bet-
ter accuracies than their approach, our processing speed is significantly faster,
ranging from 25–36 Hz, while theirs is approximately 1–2 Hz as reported in [8].
As shown in Figs. 4(b) and (d), our proposed method is robust to occlusion due
to tool intersections and specularities, thanks to the fusion of 2D part detections
and kinematics. In addition, our framework is able to provide accurate tracking
even when T^{C-}_B becomes invalid after the laparoscope has moved (Fig. 4(c),
Seq. 3). This is because T^C_{C-} is estimated online using the 2D part detections.
All the processed videos are available via https://youtu.be/oqw_9Xp_qsw.

4 Conclusions
In this paper, we have proposed a real-time framework for 3D tracking of artic-
ulated tools in robotic surgery. Online part-based templates are generated using
the tool CAD models and robot kinematics, such that efficient 2D detection
can then be performed in the camera image. For rejecting outliers, a robust
verification method based on 2D geometrical context is included. The inlier 2D
detections are finally fused with robot kinematics for 3D pose estimation. Our
framework can run in real-time for multi-tool tracking, thus can be used for
imposing dynamic active constraints and motion analysis. The results on phan-
tom, ex vivo and in vivo experiments demonstrate that our approach can achieve
accurate 3D tracking, and outperform the current state-of-the-art.

Acknowledgements. We thank Dr. DiMaio (Intuitive Surgical) for providing the
CAD models, and Dr. Reiter (Johns Hopkins University) for assisting with the method
comparisons. Dr. Giannarou is supported by the Royal Society (UF140290).

References
1. Sznitman, R., Ali, K., Richa, R., Taylor, R.H., Hager, G.D., Fua, P.: Data-driven
visual tracking in retinal microsurgery. In: Ayache, N., Delingette, H., Golland, P.,
Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7511, pp. 568–575. Springer, Heidelberg
(2012)
2. Wolf, R., Duchateau, J., Cinquin, P., Voros, S.: 3D tracking of laparoscopic instru-
ments using statistical and geometric modeling. In: Fichtinger, G., Martel, A.,
Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6891, pp. 203–210. Springer, Heidel-
berg (2011)
3. Allan, M., Chang, P.-L., Ourselin, S., Hawkes, D.J., Sridhar, A., Kelly, J., Stoyanov,
D.: Image based surgical instrument pose estimation with multi-class labelling and
optical flow. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI
2015. LNCS, vol. 9349, pp. 331–338. Springer, Heidelberg (2015)
4. Pezzementi, Z., Voros, S., Hager, G.: Articulated object tracking by rendering
consistent appearance parts. In: ICRA, pp. 3940–3947 (2009)
5. Reiter, A., Allen, P.K., Zhao, T.: Articulated surgical tool detection using virtually-
rendered templates. In: CARS (2012)
6. Sznitman, R., Becker, C., Fua, P.: Fast part-based classification for instrument
detection in minimally invasive surgery. In: Golland, P., Hata, N., Barillot, C.,
Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 692–699.
Springer, Heidelberg (2014)
7. Rieke, N., Tan, D.J., Alsheakhali, M., Tombari, F., San Filippo, C.A., Belagiannis,
V., Eslami, A., Navab, N.: Surgical tool tracking and pose estimation in retinal
microsurgery. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MIC-
CAI 2015. LNCS, vol. 9349, pp. 266–273. Springer, Heidelberg (2015)
8. Reiter, A., Allen, P.K., Zhao, T.: Appearance learning for 3D tracking of robotic
surgical tools. Int. J. Rob. Res. 33(2), 342–356 (2014)
9. Kazanzides, P., Chen, Z., Deguet, A., Fischer, G., Taylor, R., DiMaio, S.: An open-
source research kit for the da Vinci® surgical system. In: ICRA, pp. 6434–6439
(2014)
10. Tsai, R., Lenz, R.: A new technique for fully autonomous and efficient 3D robotics
hand/eye calibration. IEEE Trans. Rob. Autom. 5(3), 345–358 (1989)
11. Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., Lepetit, V.:
Gradient response maps for real-time detection of textureless objects. IEEE Trans.
Pattern Anal. Mach. Intell. 34(5), 876–888 (2012)
12. Ye, M., Giannarou, S., Meining, A., Yang, G.Z.: Online tracking and retargeting
with applications to optical biopsy in gastrointestinal endoscopic examinations.
Med. Image Anal. 30, 144–157 (2016)
13. Chum, O., Matas, J.: Matching with PROSAC - progressive sample consensus. In:
CVPR, vol. 1, pp. 220–226 (2005)
14. Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(n) solution to the
PnP problem. Int. J. Comput. Vis. 81(2), 155–166 (2008)
Towards Automated Ultrasound
Transesophageal Echocardiography and X-Ray
Fluoroscopy Fusion Using an Image-Based
Co-registration Method

Shanhui Sun1(B), Shun Miao1, Tobias Heimann1, Terrence Chen1,
Markus Kaiser2, Matthias John2, Erin Girard2, and Rui Liao1
1 Siemens Healthcare, Medical Imaging Technologies, Princeton, NJ 08540, USA
shanhui.sun@siemens.com
2 Siemens Healthcare, Advanced Therapies, 91301 Forchheim, Germany

Abstract. Transesophageal Echocardiography (TEE) and X-Ray fluoroscopy
are two routinely used real-time image guidance modalities for
interventional procedures, and co-registering them into the same coordinate
system enables advanced hybrid image guidance by providing augmented
and complementary information. In this paper, we present an
image-based system for co-registering these two modalities through real-time
tracking of the 3D position and orientation of a moving TEE probe
from 2D fluoroscopy images. The 3D pose of the TEE probe is estimated
fully automatically using a detection-based visual tracking algorithm,
followed by intensity-based 3D-to-2D registration refinement. In addition,
to provide high reliability for clinical use, the proposed system can
automatically recover from tracking failures. The system is validated on
over 1900 fluoroscopic images from clinical trial studies, and achieves a
success rate of 93.4 % at a 2D target registration error (TRE) of less than
2.5 mm and an average TRE of 0.86 mm, demonstrating high accuracy
and robustness when dealing with poor image quality caused by low
radiation dose and pose ambiguity caused by probe self-symmetry.

Keywords: Visual tracking based pose detection · 3D-2D registration

1 Introduction

There is fast growth in catheter-based procedures for structural heart disease,
such as transcatheter aortic valve implantation (TAVI) and transcatheter mitral
valve replacement (TMVR). These procedures are typically performed under the
independent guidance of two real-time imaging modalities, i.e. fluoroscopic Xray
and transesophageal echocardiography (TEE). Both imaging modalities have
their own advantages; for example, Xray is good at depicting devices, and TEE
is much better at soft tissue visualization. Therefore, fusion of the two modalities
could provide complementary information for improved safety and accuracy

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 395–403, 2016.
DOI: 10.1007/978-3-319-46720-7 46
396 S. Sun et al.

during the navigation and deployment of the devices. For example, an Xray/TEE
fusion system can help the physician find the correct TAVR deployment angle on
the fluoroscopic image using landmarks transformed from annotations on TEE.
To enable the fusion of Xray and TEE images, several methods have been proposed
to recover the 3D pose of the TEE probe from the Xray image [1–3,5,6],
where 3D pose recovery is accomplished by 3D-2D image registration. In [1,2,5],
3D-2D image registration is fulfilled by minimizing the dissimilarity between
digitally generated radiographs (DRRs) and X-ray images. In [6], DRR rendering
is accelerated by using a mesh model instead of a computed tomography (CT)
volume. In [3], registration is accelerated using a cost function that is directly
computed from the X-ray image and the CT scan via splatting from a point cloud
model, without the explicit generation of DRRs. The main disadvantage of these
methods is that they are not fully automatic and require initialization due to their
small capture range. Recently, Montney et al. proposed a detection-based method to
recover the 3D pose of the TEE probe from an Xray image [7]. 3D translation
is derived from the probe's in-plane position detector and scale detector. 3D
rotation (illustrated in Fig. 1(a)) is derived from the in-plane rotation (yaw angle)
using an orientation detector, and from the out-of-plane rotations (roll and pitch
angles) using a template matching based approach. They demonstrated feasibility
on synthetic data. Motivated by the detection-based method, we present a new
method in this paper to handle practical challenges in a clinical setup, such as
low X-Ray dose, noise, clutter and probe self-symmetry in the 2D image. Two self-
symmetry examples are shown in Fig. 1(b). To minimize appearance ambiguity,
three balls (Fig. 2(a)) and three holes (Fig. 2(b)) are manufactured on the probe.
Examples of the ball and hole markers appearing in fluoroscopic images are
shown in Fig. 2(c) and (d). Our algorithm explicitly detects the markers and
incorporates the marker detection results into the TEE probe pose estimation for
improved robustness and accuracy.

Fig. 1. (a) Illustration of TEE Euler angles. Yaw is an in-plane rotation. Pitch and
roll are out-of-plane rotations. (b) Example of ambiguous appearance in two different
poses. The green box indicates the probe's transducer array. The roll angles of the two
poses differ by close to 90°. Without considering the markers (Fig. 2), the probe looks
similar in the X-ray images.

In addition, based on the fact that physicians acquire a series of frames
(a video sequence) during an interventional cardiac procedure, we incorporate
temporal information to boost accuracy and speed, and we formulate our 6-DOF
parameter tracking as a sequential Bayesian inference problem. To
further remove discretization errors, a Kalman filter is applied to the temporal pose
parameters. In addition, tracking failure is automatically detected and an automated
tracking initialization method is applied. For critical time points when the
mated tracking initialization method is applied. For critical time points when the
measurements (e.g., annotated anatomical landmarks) from the TEE image are
to be transformed to the fluoroscopic image for enhanced visualization, intensity-
based 3D to 2D registration of the TEE probe is performed to further refine the
estimated pose to ensure a high accuracy.

Fig. 2. Illustration of probe markers circled in red. (a) 3D TEE probe front side with
3 ball markers and (b) back side with 3 hole markers. (c) Ball markers and (d) hole
markers appear in X-Ray images.

2 Methods
A 3D TEE point Q_TEE can be projected to the 2D fluoroscopic image point
Q_Fluoro = P_int P_ext (R^W_TEE Q_TEE + T^W_TEE), where P_int is the C-Arm's internal
projection matrix and P_ext is the C-Arm's external matrix, which transforms a
point from the TEE world coordinate system to the C-Arm coordinate system.
R^W_TEE and T^W_TEE are the TEE probe's rotation and position in the world
coordinate system. The internal and external matrices are known from calibration
and the C-Arm rotation angles. R^W_TEE = P_ext^{-1} R^C_TEE and
T^W_TEE = P_ext^{-1} T^C_TEE, where R^C_TEE and T^C_TEE are the probe's rotation
and position in the C-Arm coordinate system. R^C_TEE is composed of three Euler
angles (θz, θx, θy), which are illustrated in Fig. 1(a), and T^C_TEE = (x, y, z).
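A toy sketch of this projection model, with R^C_TEE composed from the three Euler angles; the rotation order and all numeric values are illustrative assumptions, since they are not specified in this excerpt:

```python
import numpy as np

def euler_to_rotation(theta_z, theta_x, theta_y):
    """Compose R^C_TEE from the three Euler angles (radians). The rotation
    order used on the real system is not specified here; Z-X-Y is an
    illustrative choice."""
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return Rz @ Rx @ Ry

def project_tee_point(P_int, R_C_TEE, T_C_TEE, Q_TEE):
    """Project a 3D TEE point already expressed in the C-Arm frame
    (P_ext folded into R and T): Q_Fluoro is proportional to
    P_int (R^C_TEE Q_TEE + T^C_TEE)."""
    q = P_int @ (R_C_TEE @ Q_TEE + T_C_TEE)
    return q[:2] / q[2]

# Toy pinhole intrinsics: a point on the optical axis lands on the
# principal point regardless of its depth.
P_int = np.array([[1000.0, 0, 256], [0, 1000.0, 256], [0, 0, 1]])
uv = project_tee_point(P_int, euler_to_rotation(0, 0, 0),
                       np.array([0.0, 0.0, 500.0]), np.array([0.0, 0.0, 0.0]))
```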
The proposed tracking algorithm is formulated as finding an optimal pose in
the current image t constrained by the prior pose from image t − 1. In our work,
pose hypotheses with pose parameters (u, v), θz, s, θx and θy are generated, and
the optimal pose among these hypotheses is identified in a sequential Bayesian
inference framework. Figure 3 illustrates an overview of the proposed algorithm.
We define two tracking stages: in-plane pose tracking for parameters (u, v), s,
and θz, and out-of-plane tracking for parameters θx and θy. In the context of
visual tracking, the searching spaces of (ut , vt , θzt , st ) and (θxt , θyt ) are signifi-
cantly reduced via generating in-plane pose hypotheses in the region of interest
(ut−1 ± δT , vt−1 ± δT , θzt−1 ± δz , st−1 ± δs ), and out-of-plane pose hypotheses in
the region of interest (θxt−1 ± δx , θyt−1 ± δy ), where δT , δz , δs, δx and δy are
searching ranges. Note that we choose these searching ranges conservatively, i.e.
much larger than typical frame-to-frame probe motion.
Fig. 3. Overview of tracking framework.

2.1 In-Plane Pose Tracking


To realize tracking, we use a Bayesian inference network [9] as follows:

    P(M_t | Z_t) ∝ P(M_t) P(Z_t | M_t),    (1a)
    M̂_t = argmax_{M_t} P(M_t | Z_t),    (1b)

where M_t denotes the in-plane pose parameters (u, v, θz, s), and M̂_t is the
optimal solution under the maximum a posteriori (MAP) probability. P(Z_t | M_t)
is the likelihood of an in-plane hypothesis being positive. P(M_t) represents the
in-plane motion prior probability, which is defined as a joint Gaussian distribution
with respect to the parameters (u, v, θz, s) with standard deviations (σT, σT, σθz, σs).
In-plane pose hypotheses are generated using a marginal space learning method
similar to the work in [10]. A series of cascaded classifiers is trained to classify
probe position (u, v), size s, and orientation θz. These classifiers are trained
sequentially: two position detectors for (u, v), an orientation detector for θz, and
a scale detector for s. Each detector is a Probabilistic Boosting Tree (PBT)
classifier [8] using Haar-like features [9] and rotated Haar-like features [9]. The
position classifier is trained on the annotations (positive samples) and on negative
samples randomly sampled away from the annotations. The second position detector
performs a bootstrapping procedure: its negative samples are collected from both
false positives of the first position detector and random negative samples. The
orientation detector is trained on rotated images, which are rotated to 0° according
to the annotated probe orientations, with the Haar-like features computed on the
rotated images. At the orientation test stage, the input image is rotated every 5°
in the range θz,t−1 ± δz. The scale detector is likewise trained on the rotated
images; the Haar-like features are computed on the rotated images with the Haar
feature windows scaled based on the probe's size. At the scale test stage, the Haar
feature window is scaled and quantised in the range s_{t−1} ± δs.
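Selecting the MAP hypothesis of Eqs. 1a and 1b amounts to combining a Gaussian motion prior around the previous pose with the detector likelihood. A minimal sketch, with our own names and angle wrap-around ignored:

```python
import numpy as np

def map_select(hypotheses, likelihoods, prev_pose, sigmas):
    """Pick the hypothesis maximising P(M_t) P(Z_t | M_t): a joint Gaussian
    motion prior around the previous pose times the detector likelihood.
    Rows of `hypotheses` are (u, v, theta_z, s)."""
    d = (np.asarray(hypotheses, float) - np.asarray(prev_pose, float)) \
        / np.asarray(sigmas, float)
    log_prior = -0.5 * np.sum(d * d, axis=1)       # log of the Gaussian prior
    log_post = log_prior + np.log(np.asarray(likelihoods, float))
    return int(np.argmax(log_post))

# A slightly stronger detection far from the previous pose loses to a
# nearby, slightly weaker one once the motion prior is applied.
hyps = [[100, 100, 0.0, 1.0], [160, 160, 0.0, 1.0]]
best = map_select(hyps, [0.6, 0.7], prev_pose=[102, 101, 0.0, 1.0],
                  sigmas=[10, 10, 0.2, 0.1])   # -> 0
```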

2.2 Out-of-Plane Pose Tracking


Out-of-plane pose tracking performs another Bayesian inference derived
from Eq. 1. In this case, M_t (in Eq. 1) is the out-of-plane pose parameters
(θx, θy), and M̂_t is the optimal solution under the MAP probability. P(Z_t | M_t)
is the likelihood of an out-of-plane hypothesis being positive. P(M_t) is the
out-of-plane motion prior probability, which is defined as a joint Gaussian
distribution with respect to the parameters (θx, θy) with standard deviations (σx, σy).
Out-of-plane pose hypothesis generation is based on a K nearest neighbour
search using library-based template matching. At the training stage, we generate
2D synthetic X-Ray images at different out-of-plane poses while keeping the same
in-plane pose. The roll angle ranges from −180° to 180°. The pitch angle ranges
from −52° to 52°; angles outside this range are not considered since they are not
clinically relevant. Both step sizes are 4°. All in-plane poses of these synthetic
images are set to the same canonical space: probe positioned at the image center,
0° yaw angle and normalized size. A global image representation of each image,
encoding its out-of-plane pose, is computed and saved in a database. The image
representation is derived based on the method presented in [4]. At the test stage, L
in-plane pose perturbations (small translations, rotations and scales) about the
computed in-plane pose (Sect. 2.1) are produced. The L in-plane poses are utilized
to define L probe ROIs in the same canonical space. The image representation of
each ROI is computed and used to search the database (e.g. via a KD-Tree),
yielding K nearest neighbours per ROI. Unfortunately, using only the global
representation is not able to differentiate symmetric poses. For example, a response
map of an exemplar pose to all the synthetic images is shown in Fig. 4. Note that
there are two dominant symmetrical modes, and thus out-of-plane hypotheses are
generated around these two regions. We utilize the markers (Fig. 2) to address this problem.
For each synthetic image, we thus save the marker positions in the database.
The idea is to perform a visibility test at each marker position in the L ∗ K
search results. The updated searching score is

    T̂_score = (T_score / N) Σ_{i=1}^{N} (α + P_i(x_i, y_i)),

where T_score is the original searching score, P_i is the i-th marker's visibility
(in [0.0, 1.0]) at the marker position (x_i, y_i) in the corresponding synthetic image
template, N is the number of markers, and α is a constant with value 0.5. The
marker visibility test is fulfilled using two marker detectors: a ball marker detector
and a hole marker detector. Each detector consists of two cascaded position
classifiers (PBT classifiers with Haar-like features), and visibility maps are
computed based on the detected marker locations.
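The marker-based score update can be written directly from the formula above; the numbers below are illustrative:

```python
def updated_search_score(t_score, marker_visibilities, alpha=0.5):
    """T-hat_score = (T_score / N) * sum_i (alpha + P_i(x_i, y_i)): re-weight a
    template-matching score by the detected visibility P_i in [0, 1] of each
    of the N markers expected in that synthetic template."""
    n = len(marker_visibilities)
    return t_score / n * sum(alpha + p for p in marker_visibilities)

# All 3 markers visible: 1.5x boost; none visible: halved. This is what
# separates the two symmetric modes of Fig. 4, which look alike otherwise.
boosted = updated_search_score(0.8, [1.0, 1.0, 1.0])   # ~ 1.2
damped = updated_search_score(0.8, [0.0, 0.0, 0.0])    # ~ 0.4
```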

Fig. 4. An example of template matching score map for one probe pose. X-axis is roll
angle and Y-axis is pitch angle. Each pixel represents one template pose. Dark red
color indicates a high matching score and dark blue indicates a small matching score.

2.3 Tracking Initialization and Failure Detection

The initial probe pose in a sequence is derived from detection results without
considering temporal information. We detect the in-plane position, orientation and
scale, and the out-of-plane roll and pitch hypotheses over the whole required
searching space. We obtain a final in-plane pose via non-maximum suppression
and a weighted average around the pose with the largest detection probability.
The hypothesis with the largest searching score is used as the out-of-plane pose.
For initializing tracking: (1) we save the poses of Ni (e.g. Ni = 5) consecutive
image frames. (2) A median pose is computed from the Ni detection results.
(3) A weighted mean pose is computed based on the distance to the median pose.
(4) The standard deviation σp with respect to the mean pose is computed. Once
σp < σthreshold, tracking starts with the initial pose (i.e. the mean pose). During
tracking, we identify tracking failure as follows: (1) we save Nf (e.g. Nf = 5)
consecutive tracking results. (2) The average searching score mscore is computed.
If mscore < mthreshold, we stop tracking and re-start the tracking initialization
procedure.
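The initialization and failure-detection logic of this subsection can be sketched as a small supervisor class; the class name and threshold values are illustrative placeholders, not the paper's settings:

```python
import numpy as np

class TrackingSupervisor:
    """Start tracking once N_i consecutive detected poses agree (deviation
    around their distance-weighted mean below a threshold); stop once the
    mean search score over N_f frames drops below a threshold."""

    def __init__(self, n_init=5, sigma_thresh=5.0, n_fail=5, score_thresh=0.3):
        self.n_init, self.sigma_thresh = n_init, sigma_thresh
        self.n_fail, self.score_thresh = n_fail, score_thresh
        self.poses, self.scores, self.tracking = [], [], False

    def add_detection(self, pose):
        """Feed one per-frame detection; returns the initial pose when the
        last n_init detections agree, else None."""
        self.poses = (self.poses + [np.asarray(pose, float)])[-self.n_init:]
        if len(self.poses) < self.n_init:
            return None
        p = np.stack(self.poses)
        med = np.median(p, axis=0)
        w = 1.0 / (1.0 + np.linalg.norm(p - med, axis=1))  # closer to median => heavier
        mean = (w[:, None] * p).sum(0) / w.sum()
        sigma = np.sqrt(np.mean(np.sum((p - mean) ** 2, axis=1)))
        if sigma < self.sigma_thresh:
            self.tracking = True
            return mean
        return None

    def add_score(self, score):
        """Feed one per-frame tracking score; flags failure on a low average."""
        self.scores = (self.scores + [score])[-self.n_fail:]
        if len(self.scores) == self.n_fail and np.mean(self.scores) < self.score_thresh:
            self.tracking = False  # fall back to re-initialization

sup = TrackingSupervisor()
for _ in range(5):
    init = sup.add_detection([100.0, 100.0, 0.0, 1.0])  # five agreeing detections
```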

2.4 3D-2D Registration Based Pose Refinement

In addition, we perform 3D-2D registration of the probe at critical time points
when measurements are to be transformed from TEE images to fluoroscopic
images. With known perspective geometry of the C-Arm system, a DRR can
be rendered for any given pose parameters. In 3D-2D registration, the pose
parameters are iteratively optimized to maximize a similarity metric calculated
between the DRR and the fluoroscopic image. In the proposed method, we
use Spatially Weighted Gradient Correlation (SWGC) as the similarity met-
ric, where areas around the markers in the DRR are assigned higher weights
as they are more distinct and reliable features indicating the alignment of the
two images. SWGC is calculated as Gradient Correlation (GC) of two weighted
images: SW GC = GC(If · W, Id · W ), where If and Id denote the fluoroscopic
image and the DRR, respectively, W is a dense weight map calculated based
on the projection of the markers, and GC(·, ·) denotes the GC of the two input
images. The pose parameters are then optimized with a Nelder-Mead optimizer to maximize SWGC.
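A minimal sketch of SWGC follows, assuming GC is computed as the average normalized correlation of the horizontal and vertical image gradients (a common definition; the exact GC variant and the construction of the weight map W are not specified here, so both are illustrative assumptions):

```python
import numpy as np

def ncc(a, b):
    # normalized correlation of two equal-size arrays
    a = a - a.mean()
    b = b - b.mean()
    d = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / d)

def gradient_correlation(i1, i2):
    # GC: average normalized correlation of the two gradient components
    gy1, gx1 = np.gradient(i1)
    gy2, gx2 = np.gradient(i2)
    return 0.5 * (ncc(gx1, gx2) + ncc(gy1, gy2))

def swgc(i_f, i_d, w):
    # SWGC = GC(I_f * W, I_d * W): GC of the two weighted images,
    # with W emphasising areas around the projected markers
    return gradient_correlation(i_f * w, i_d * w)
```

An identical fluoroscopic image and DRR give a score of 1 regardless of the weight map; the Nelder-Mead optimizer would then maximize this score over the pose parameters.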

3 Experiment Setup, Results and Discussions

For our study, we trained machine-learning-based detectors on ∼10,000 fluoroscopic images (∼90 % synthetically generated and ∼10 % clinical). We validated our methods on 34 X-ray fluoroscopic videos (1933 images) acquired from clinical experiments, and 13 synthetically generated videos (2232 images). The synthetic images were generated by blending DRRs of the TEE probe (including the tube) with real fluoroscopic images containing no TEE probe. For the synthetic test sequences in particular, we simulated realistic probe motions (e.g., insertion, retraction, roll) in the fluoroscopic sequences. Ground truth poses for synthetic images are derived
from 3D probe geometry and rendering parameters. Clinical images are manually annotated using our developed interactive tool by 4 experts. Image size is
1024 × 1024 pixels. Computations were performed on a workstation with Intel
Xeon (E5-1620) CPU 3.7 GHz and 8.00 GB Memory. On average, our tracking
Towards Automated Ultrasound Transesophageal Echocardiography 401

algorithm performs at 10 fps. We ran the proposed detection algorithm (discussed in Sect. 2.3, with tracking disabled), the proposed automated tracking algorithm, and registration refinement after tracking on all test images. Algorithm accuracy was evaluated by calculating the standard target registration error (TRE) in 2D. The targets are defined at the four corners of the TEE imaging cone at 60 mm depth, and the reported TRE is the average over the four targets. The 2D TRE is a target registration error in which the z axis (depth) of the projected target point is not considered when computing the distance error. Table 1 shows the success rate, average TRE and median TRE at 2D TRE < 4 mm and < 2.5 mm, respectively. Figure 5 shows success rate vs. 2D TRE on all validated clinical and synthetic images.
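The 2D TRE and success-rate evaluation could be sketched as follows (the projection interfaces are hypothetical; depth is excluded by comparing points directly in the 2D detector plane):

```python
import numpy as np

def mean_tre_2d(project_est, project_gt, targets_3d):
    """Mean 2D TRE over the targets: distances are measured in the 2D
    image/detector plane (mm), so depth along z is ignored by construction.
    project_est / project_gt map (N, 3) 3D target points to (N, 2) plane
    points under the estimated and ground-truth probe poses respectively."""
    d = np.linalg.norm(project_est(targets_3d) - project_gt(targets_3d), axis=1)
    return float(d.mean())

def success_rate(tre_per_frame, threshold_mm):
    # fraction of frames whose mean TRE falls below the threshold
    t = np.asarray(tre_per_frame, float)
    return float((t < threshold_mm).mean())
```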


Fig. 5. Result of success rate vs 2D TRE on clinical (a) and synthetic (b) validations
of the proposed detection, tracking and 3D-2D registration refinement algorithms.

Due to the limited availability of clinical data, we enlarged our training set with synthetic images. Table 1 and Fig. 5 show that our approach performs well on real clinical data when trained on this hybrid data set. We expect further gains in robustness and accuracy as a larger number of real clinical cases becomes available. The tracking algorithm improved robustness and accuracy compared to the detection-alone approach. One limitation of our tracking algorithm is that it cannot compensate for all discretization errors, although temporal smoothing is applied with a Kalman filter. This is a limitation of any detection-based approach. To further enhance accuracy, refinement is applied when physicians perform the measurements. To

Table 1. Quantitative results of the proposed detection (Det), tracking (Trak) and 3D-2D registration refinement (Reg) algorithms. Entries show success rate, mean TRE (mm) and median TRE (mm) under different TRE error ranges.

Clinical data Synthetic data


Method TRE < 4 mm TRE < 2.5 mm TRE < 4 mm TRE < 2.5 mm
Det (80.0 %, 2.09, 2.02) (50.9 %, 1.47, 1.48) (88.4 %, 1.86, 1.73) (64.7 %, 1.38, 1.35)
Trak (91.6 %, 1.71, 1.61) (73.7 %, 1.38, 1.36) (96.4 %, 1.59, 1.42) (79.7 %, 1.28, 1.22)
Reg (98.0 %, 0.97, 0.79) (93.4 %, 0.86, 0.75) (96.4 %, 0.69, 0.52) (94.3 %, 0.63, 0.51)

better understand the performance of the registration refinement, we applied the refinement step to all images after tracking in our study. Note that the refinement algorithm did not bring additional robustness but improved the accuracy.

4 Conclusion
In this work, we presented a fully automated method for recovering the 3D pose of a TEE probe from X-ray images. Tracking is very important for giving physicians confidence that the probe pose recovery is working robustly and continuously. Abrupt detection failures are particularly undesirable when the probe is not moving. A detection-only approach cannot address such abrupt failures, which arise from disturbance, noise and appearance ambiguities of the probe. Our proposed visual tracking algorithm avoids abrupt failures and improves detection robustness, as shown in our experiments. In addition, our approach runs in near real-time (about 10 fps) and is fully automated, without any user interaction such as the manual pose initialization required by many state-of-the-art methods. Our proposed complete solution to the TEE and X-ray fusion problem is applicable to clinical practice due to its high robustness and accuracy.
Disclaimer: The outlined concepts are not commercially available. Due to regulatory reasons their future availability cannot be guaranteed.

Robust, Real-Time, Dense and Deformable 3D
Organ Tracking in Laparoscopic Videos

Toby Collins(B) , Adrien Bartoli, Nicolas Bourdel, and Michel Canis

ALCoV-ISIT, UMR 6284 CNRS/Université d’Auvergne, Clermont-Ferrand, France


toby.collins@gmail.com

Abstract. An open problem in computer-assisted surgery is to robustly track soft-tissue 3D organ models in laparoscopic videos in real-time and over long durations. Previous real-time approaches use locally-tracked features such as SIFT or SURF to drive the process, usually with KLT tracking. However, this is not robust and breaks down with occlusions, blur, specularities, rapid motion and poor texture. We have developed a fundamentally different framework that can deal with most of the above challenges, in real-time. It works by densely matching tissue texture at the pixel level, without requiring feature detection or matching. It naturally handles texture distortion caused by deformation and/or viewpoint change, does not cause drift, is robust to occlusions from tools and other structures, and handles blurred frames. It also integrates robust boundary contour matching, which provides tracking constraints at the organ's boundaries. We show that it can track over long durations and can handle challenging cases that were previously unsolvable.

1 Introduction and Background

There is much ongoing research to develop and apply Augmented Reality (AR)
to improve laparoscopic surgery. One important goal is to visualise hidden sub-
surface structures such as tumors or major vessels by augmenting optical images
from a laparoscope with 3D radiological data from e.g. MRI or CT. Solutions
are currently being developed to assist various procedures, including liver tumor resection [6], myomectomy [3] and partial nephrectomy [9]. To solve the
problem one must register the data modalities. The general strategy is to build
a deformable 3D organ model from the radiological data, then to determine
the model’s 3D transformation to the laparoscope’s coordinate system at any
given time. This is very challenging and a general, automatic, robust and real-
time solution does not yet exist. The problem is especially hard with monocular
laparoscopes because of the lack of depth information. A crucial missing com-
ponent is a way to robustly compute dense matches between the organ’s surface
and the laparoscopic images. Currently, real-time results have only been achieved
with sparse feature-based matches using KLT [5,10]; however, this is quite fragile,
suffers from drift, and can quickly break down for a number of reasons including
occlusions, sudden camera motion, motion blur and optical blur.


c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 404–412, 2016.
DOI: 10.1007/978-3-319-46720-7 47

To reliably solve the problem a much more advanced, integrated framework is required, which is the focus of this paper. Our framework is fundamentally a template-driven approach which works by matching each image directly to a deformable 3D template, which in our case is a textured 3D biomechanical model of the organ. The model's intrinsic, physical constraints are fully integrated, which allows a high level of robustness. This differs from registration using KLT tracks, where tracks are made by independently tracking points frame-to-frame without being constrained by the model. This causes a lack of robustness and drift, where over time the tracked points no longer correspond to the same physical point. We propose to solve this by densely and robustly matching the organ's texture at the pixel level, which is designed to overcome several fundamental limitations of feature-based matching. Specifically, feature-based matches exist only at sparse, discriminative, repeatable feature points (or interest points), and for tissues with weak and/or repetitive texture it can be difficult to detect and match enough features to recover the deformation. This is especially true with blurred frames, lens smears, significant illumination changes, and distortions caused by deformations or viewpoint change. By contrast we match the organ's texture densely, without requiring any feature detection or feature matching, and in a way that naturally handles texture distortions and illumination change.

2 Methodology
We now present the framework, which we refer to as Robust, Real-time, Dense
and Deformable (R2D2) tracking. Figure 1 gives an overview of R2D2 tracking
using an in-vivo porcine kidney experiment as an example.

Model Requirements. We require three main models. The first is a geometric model of the organ's outer surface, which we assume is represented by a closed surface mesh S. We denote its interior by Ω ⊂ R³. The second is a deformation model, which has a transform function f(p; x_t) : Ω → R³ that transforms a 3D

Fig. 1. Overview of R2D2 tracking with monocular laparoscopes. Top row: modelling
the organ’s texture by texture-mapping it from a set of reference laparoscopic images.
Bottom row: real-time tracking of the textured model.
406 T. Collins et al.

point p ∈ Ω to the laparoscope's coordinate frame at time t. The vector x_t denotes the model's parameters at time t, and our task is to recover it. We also require the deformation model to have an internal energy function E_internal, which gives the energy associated with transforming the organ according to x_t. We use E_internal to regularise the tracking problem. In the presented experiments the deformation models are tetrahedral finite element models, generated by a regular 3D vertex grid cropped to the organ's surface mesh (sometimes called a cage), and we compute f with trilinear interpolation. Thus x_t denotes the unknown 3D positions of the grid vertices in the laparoscope's coordinate frame. For E_internal we use the isotropic Saint Venant-Kirchhoff (StVK) strain energy, which has been shown to work well for reconstructing deformations from 2D images [5]. Further modelling details are given in the experimental section. The third model that we require is a texture model, which models the photometric appearance of S. Unlike feature-based tracking, where the texture model is essentially a collection of 2D features, we will be densely tracking the organ's texture, and so we require a dense texture model. We use a texture-map, a common model in computer graphics. Specifically, our texture-map is a 2D colour image T(u, v) : R² → [0, 255]³ which models the surface appearance up to changes of illumination.
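The trilinear interpolation used for f(p; x_t) over the deforming vertex grid can be sketched as below (a simplified illustration with a full rectangular grid; the actual cage is cropped to the organ mesh, and bounds handling is omitted):

```python
import numpy as np

def trilinear_transform(p, grid_origin, spacing, grid_verts):
    """f(p; x): map a rest-state point p through the deforming vertex grid.

    grid_verts: (nx, ny, nz, 3) array of current vertex positions (the
    unknowns x_t); grid_origin and spacing define the rest-state lattice.
    Assumes p lies inside the grid (no bounds handling in this sketch).
    """
    u = (np.asarray(p, float) - np.asarray(grid_origin, float)) / spacing
    i0 = np.floor(u).astype(int)   # cell index
    t = u - i0                     # fractional position inside the cell
    out = np.zeros(3)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (t[0] if dx else 1 - t[0]) * \
                    (t[1] if dy else 1 - t[1]) * \
                    (t[2] if dz else 1 - t[2])
                out += w * grid_verts[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out
```

With an undeformed grid the transform is the identity, and translating all grid vertices translates every interior point rigidly.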

Texture-Map Construction. Before tracking begins we construct T through a process known as image-based texture-mapping. This requires taking laparoscopic images of the organ from several viewpoints (we call these reference images). The reference images are then used to generate T through an image mosaicing process. To do this we must align S to the reference images. Once done, T can be constructed automatically using an existing method (we currently use Agisoft Photoscan [1], with a default mosaic resolution of 4096 × 4096 pixels).
The difficult part is computing the alignments. Note that this is done just once
so it does not need to be real-time. We do this using an existing semi-automatic
approach based on [3], which assumes the organ does not deform when the
reference images are taken. This requires a minimum of two reference images,
however more can be used to build a more complete texture model (in our
experiments we use between 4 and 8 reference images), taking approximately
three minutes to compute with non-optimised code.

Tracking Overview. Our solution builds on a new technique called Deformable Render-based Block Matching (DRBM) [2], which was originally proposed to track thin-shell objects such as cloth and plastic bottles, yet has great potential for our problem. It works by densely matching each image I_t to a time-varying 2D photometric render R_t of the deforming object. The render is generated from the camera's viewpoint and is continuously updated to reflect the current deformation. Matching is performed by dividing R_t into local pixel windows, then each window is matched to I_t with an illumination-invariant score function and a fast coarse-to-fine search process. In a final stage most incorrect matches, caused by e.g. occlusions or specularities, are detected and eliminated using several consistency tests. The remaining matches are used as deformation constraints, which are combined with the model's internal energy, then x_t is solved for by energy minimisation. Once completed, the new solution is used to update the render, the next image is acquired and the process repeats. Because this process tracks the model frame-to-frame, a mechanism is needed for initialisation (to provide an initial estimate of x_t at the start) and re-initialisation (to provide an initial estimate if tracking fails). We discuss these mechanisms below.
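As an illustration of window-based matching with an illumination-invariant score, here is a sketch using zero-mean normalized cross-correlation (ZNCC) as a stand-in for DRBM's actual score function and coarse-to-fine search, which are more elaborate:

```python
import numpy as np

def zncc(a, b):
    # zero-mean normalized cross-correlation: unchanged by per-window
    # gain and offset changes, a simple illumination-invariant score
    a = a - a.mean()
    b = b - b.mean()
    d = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / d) if d > 0 else 0.0

def match_window(render_win, image, center, radius):
    """Exhaustively search `image` around `center` (window top-left corners
    within +/- radius) for the best match to `render_win`. This stands in
    for one level of a coarse-to-fine search."""
    h, w = render_win.shape
    best_score, best_pos = -2.0, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = center[0] + dy, center[1] + dx
            if y < 0 or x < 0 or y + h > image.shape[0] or x + w > image.shape[1]:
                continue
            s = zncc(render_win, image[y:y + h, x:x + w])
            if s > best_score:
                best_score, best_pos = s, (y, x)
    return best_pos, best_score
```

Because ZNCC is invariant to affine illumination change, the correct location is recovered even if the live image differs from the render by a global gain and offset.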
We use DRBM as a basis and extend it to our problem. Firstly, DRBM requires at least some texture variation to be present; however, tissue can be quite textureless in some regions. To deal with this, additional constraints are needed. One kind that has rarely been exploited before is organ boundary constraints. Specifically, if the organ's boundary is visible (either partially or fully) it can be used as a tracking constraint. Organ boundaries have been used previously to semi-automatically register pre-operative models [3], but not for automatic real-time tracking. This is non-trivial because one does not know a priori which points correspond to the organ's boundary. Secondly, we extend DRBM to volumetric biomechanical deformable models, and thirdly we introduce semi-automatic texture map updating, which allows strong changes of the organ's appearance, due to e.g. coagulation, to be handled.

Overview and Energy-Based Formulation. To ease readability we now drop the time index. During tracking, texture matches are found using DRBM, which outputs a quasi-dense set of texture matches C_texture = {(p_1, q_1), ..., (p_N, q_N)} between 3D points p_i ∈ R³ on the surface mesh S and points q_i ∈ R² in the image. We also compute a dense set of boundary matches C_bound = {(p̃_1, q̃_1), ..., (p̃_M, q̃_M)} along the model's boundary, as described below. Note that this set can be empty if none of the boundaries are visible. The boundary matches work in an Iterative Closest Point (ICP) sense, where over time the boundary correspondences slide over the surface as it deforms.
Our energy function E(x) ∈ R⁺ encodes tracking cues from the image (C_texture, C_bound) and the model's internal deformation energy, and has the following form:

E(x) = E_match(x; C_texture) + λ_bound E_match(x; C_bound) + λ_internal E_internal(x)   (1)
The term E_match is a point-match energy, which generates the energy for both texture and boundary matches. It is defined as follows:

E_match(x; C) = Σ_{(p_i, q_i) ∈ C} ρ(‖π(f(p_i; x)) − q_i‖₂)   (2)

where π : R³ → R² is the camera's projection function. We assume the laparoscope is intrinsically calibrated, which means π is known. The function ρ : R → R⁺ is an M-estimator and is crucial for achieving robust tracking. Its purpose is to align the model point p_i with the image point q_i, but to do so robustly, to account for erroneous matches, which are practically unavoidable.

When a match is erroneous the model should not align to it, and the M-estimator provides this by reducing the influence of an erroneous match on E. We have tested various M-estimators and found good results are obtained with the pseudo-L1 estimator ρ(x) = √(x² + ε), with ε = 10⁻³ a small constant that makes E_match differentiable everywhere.
The terms λ_bound and λ_internal are influence weights; we discuss how they are set in the experimental section. We follow the same procedure to minimise E as described in [2]. This is done by linearising E about the current estimate (the solution from the previous frame); we then form the associated linear system and solve its normal equations using a coarse-to-fine multi-grid Gauss-Newton optimisation with backtracking line-search.
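The pseudo-L1 M-estimator and its effect on erroneous matches can be illustrated with a small iteratively-reweighted least-squares example (the robust-mean demo is our own illustration, not part of the method; it shows how the Gauss-Newton weight ρ'(r)/r down-weights outliers):

```python
import numpy as np

EPS = 1e-3  # the epsilon = 10^-3 used above

def pseudo_l1(x):
    # rho(x) = sqrt(x^2 + eps): close to |x| for large x, smooth at 0
    return np.sqrt(x * x + EPS)

def irls_weight(r):
    # IRLS weight rho'(r)/r = 1/sqrt(r^2 + eps):
    # large residuals (erroneous matches) get weights near zero
    return 1.0 / np.sqrt(r * r + EPS)

def robust_mean(values, iters=50):
    """Toy 1D demo: a robust mean via iteratively reweighted least
    squares with the pseudo-L1 weights (illustration only)."""
    values = np.asarray(values, float)
    m = np.median(values)
    for _ in range(iters):
        w = irls_weight(values - m)
        m = float((w * values).sum() / w.sum())
    return m
```

With data [1.0, 1.1, 0.9, 10.0] the ordinary mean is pulled to 3.25 by the outlier, while the robust estimate stays near 1.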

Computing Boundary Matches. We illustrate this process in Fig. 2(k). First we take R and extract all pixels P on the render's boundary. For each pixel p_i ∈ P we denote its 3D position on the model by p̃_i, which is determined from the render's depthmap. We then conduct a 1D search in I for a putative match q̃_i. The search is centred at p_i in the direction orthogonal to the render's boundary, which we denote by the unit vector v_i. We search within a range [−l, +l] in one-pixel increments, where l is a free parameter, and measure the likelihood b(p) ∈ R that a sample p corresponds to the organ's boundary. We currently compute b with a hand-crafted detector, based on the fact that organ boundaries tend to occur at low-frequency intensity gradients, which correspond to a change of predominant tissue albedo. We give the precise algorithm for computing b in the supplementary material. We take q̃_i as the sample with the maximal b beyond a detection threshold b_τ. If no such sample exists then we do not have a boundary match. An important stage is then to eliminate false positives, because there may be other nearby boundary structures that could cause confusion. For this we adopt a conservative strategy and reject the match if there exists another local maximum of b along the search line that also exceeds b_τ.
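The conservative boundary-sample selection can be sketched as below (a simplified 1D version; the likelihood b itself comes from the hand-crafted detector described in the supplementary material, so here it is just an input array):

```python
import numpy as np

def boundary_match(b_values, b_tau):
    """Pick the boundary sample along a 1D search line.

    b_values: boundary likelihood b at each sample along [-l, +l].
    Returns the index of the unique local maximum above b_tau, or None
    if there is no detection or the detection is ambiguous (another
    peak above b_tau would cause confusion, so we conservatively reject).
    """
    b = np.asarray(b_values, float)
    peaks = [i for i in range(1, len(b) - 1)
             if b[i] >= b[i - 1] and b[i] >= b[i + 1] and b[i] > b_tau]
    if len(peaks) != 1:
        return None
    return peaks[0]
```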

Initialisation, Re-localisation and Texture Model Updating. There are various approaches one can use for initialisation and re-localisation. One is an automatic wide-baseline pose estimation method such as [7]. An alternative is to have the laparoscope operator provide them, by roughly aligning the live video with an overlaid render of the organ from some canonical viewpoint (Figs. 1 and 2(a)), after which tracking is activated. The alignment does not need to be particularly precise due to the robustness of our match terms, which makes it a practical option. For the default viewpoint we use the model's pose in one of the reference images from the texture-map construction stage. The exact choice is not too important, so we simply use the one where the model centroid is closest to the image centre. During tracking, we have the option to update the texture model by re-texturing its front-facing surface regions with the current image. This is useful where the texture changes substantially during surgery. Currently this is semi-automatic, to ensure the organ is not occluded by tools or other organs in the current image, and is activated by a user notification. In future work we aim to make this automatic, but this is non-trivial.

Fig. 2. Visualisations of the five test cases and tracking results. Best viewed in colour.

3 Experimental Results

We evaluate performance with five test cases which are visualised in Fig. 2 as five
columns. These are two in-vivo porcine kidneys (a,b), an in-vivo human uterus
(c), an ex-vivo chicken thigh used for laparoscopy training (d) and an ex-vivo
porcine kidney (e). We used the same kidney in cases (a) and (e). The models
were constructed from CT (a,b,d,e) and T2 weighted MRI (c), and segmented
interactively with MITK. For each case we recorded a monocular laparoscopic
video (10 mm Karl Storz 1080p, 25fps with CLARA image enhancement) of the
object being moved and deformed with surgical tools (a,b,c,d) or with human
hands (e). The video durations ranged from 1424 to 2166 frames (57 to 82 s).
The objects never moved completely out-of-frame in the videos, so we used
them to test tracking performance without re-localisation. The main challenges
present are low light and high noise (c), strong motion blur (b,c), significant tex-
ture change caused by intervention (a,c), tool occlusions (a,b,c,d), specularities
(a,b,c,d,e), dehydration (b), smoke (c), and partial occlusion where the organ
disappears behind the peritoneum (b,c). We constructed deformable models with
a 6 mm grid spacing, with 1591, 1757, 8618, 10028 and 1591 tetrahedral elements for cases (a-e) respectively. Homogeneous StVK elements were used for (a,b,c,e) with rough generic Poisson's ratio ν values from the literature: ν = 0.43 for (a,b,e) [4] and ν = 0.45 for (c). Note that when we use homogeneous elements, the Young's modulus E is not actually a useful parameter for us. This is because if we double E and halve λ_internal we end up with the same internal energy. We therefore arbitrarily set E = 1 for (a,b,c,e). For (d) we

used two coarse element classes corresponding to bone and all other tissue, and we set their Young's moduli using relative values of 200 and 1 respectively.
Our tracking framework has several tunable parameters: (i) the energy weights, (ii) the boundary search length l, (iii) the boundary detector parameters and (iv) the DRBM parameters. To make them independent of the image resolution, we pre-scale the images to a canonical width of 640 pixels. For all five cases we used the same values of (iii) and (iv) (their respective defaults), and the same value of l = 15 pixels for (ii). For (i), we used the same value of λ_bound = 0.7 in all cases. For λ_internal we used category-specific values: λ_internal = 0.2 for the uterus, λ_internal = 0.09 for the kidneys and λ_internal = 0.2 for the chicken thigh. In the interest of space, the results
presented here do not use texture model updating. This is to evaluate track-
ing robustness despite significant appearance change. We refer the reader to
the associated videos to see texture model updating in action. We benchmarked
processing speed on a mid-range Intel i7-5960X desktop PC with a single NVidia
GTX 980Ti GPU. With our current multi-threaded C++/CUDA implementation, the average processing speeds were 35, 27, 22, 17 and 31 fps for cases (a-e) respectively. We also ran our framework without the boundary constraints
(λbound = 0). This was to analyse its influence on tracking accuracy, and we
call this version R2D2-b. We show snapshot results from the videos in Fig. 2. In
Fig. 2(f–j) we show five columns corresponding to each case. The top image is an
example input image, the middle image shows DRBM matches (with coarse-scale
matches in green, fine-scale matches in blue, gross outliers in red) and the bound-
ary matches in yellow. The third image shows an overlay of the tracked surface
mesh. We show three other images with corresponding overlays in Fig. 2(l–n).
The light path on the uterus in Fig. 2(h) is a coagulation path used for interven-
tional incision planning, and it significantly changed the appearance. The haze
in Fig. 2(m) is a smoke plume. In Fig. 2(o) we show the overlay with and without
boundary constraints (top and bottom respectively). This is an example where
the boundary constraints have clearly improved tracking.
We tested how well KLT-based tracking worked by measuring how long it could sustain tracks from the first video frames. Due to the challenging conditions, KLT tracks dropped off quickly in most cases, mostly due to blur or tool occlusions. Only in case (b) did some KLT tracks persist to the end; however, they were limited to a small surface region and congregated around specularities (and were therefore drifting). By contrast, our framework sustained tracking through all videos. It is difficult to quantitatively evaluate tracking accuracy
in 3D without interventional radiological images, which were not available. We
therefore measured accuracy using 2D proxies. These were (i) Correspondence
Prediction Error (CPE) and (ii) Boundary Prediction Error (BPE). CPE tells us
how well the tracker aligns the model with respect to a set of manually located point correspondences. We located approximately 20 correspondences per case across 30 representative video frames, and measured the distance (in pixels) to their tracked positions. BPE tells us how well the tracker aligns the model's boundaries to the image. This was done by manually marking any contours in

Table 1. Summary statistics of the quantitative performance evaluation (in pixels). Errors are computed using a default image width of 640 pixels.

the representative images that corresponded to the object's boundary. We then measured the distance (in pixels) between each contour point and the model's boundary. The results are shown in Table 1, where we give summary statistics (median, inter-quartile range, mean, standard deviation and maximum). The table also includes results from R2D2-b. To show the benefits of tracking with a deformable model, we also compare with a fast feature-based baseline method using a rigid transform model. For this we used SIFT matching with HMA outlier detection [8] (using the authors' implementation) and rigid pose estimation using OpenCV's PnP implementation. We denote this by R-HMA. Its performance is considerably worse, because it cannot model deformation, and also because HMA was sometimes unable to find any correct feature clusters, most notably in (c) due to poor texture, blur and appearance changes.
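The CPE measurement and the summary statistics reported in Table 1 could be sketched as follows (function names are our own):

```python
import numpy as np

def cpe(tracked_pts, gt_pts):
    """Correspondence Prediction Error: pixel distances between tracked
    points and the manually located ground-truth correspondences."""
    return np.linalg.norm(np.asarray(tracked_pts, float) -
                          np.asarray(gt_pts, float), axis=1)

def summary_stats(errors):
    # median, inter-quartile range, mean, standard deviation and maximum
    e = np.asarray(errors, float)
    q1, med, q3 = np.percentile(e, [25, 50, 75])
    return {"median": float(med), "iqr": float(q3 - q1),
            "mean": float(e.mean()), "std": float(e.std()),
            "max": float(e.max())}
```

BPE is computed analogously, with distances taken between marked contour points and the nearest point on the model's projected boundary.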

4 Conclusion
We have presented a new, integrated, robust and real-time solution for dense
tracking of deformable 3D soft-tissue organ models in laparoscopic videos. There
are a number of possible future directions. The main three are to investigate
automatic texture map updating, to investigate its performance using stereo
laparoscopic images, and to automatically detect when tracking fails.

References
1. Agisoft Photoscan. http://www.agisoft.com. Accessed 30 May 2016
2. Collins, T., Bartoli, A.: Realtime shape-from-template: system and applications.
In: ISMAR (2015)
3. Collins, T., Pizarro, D., Bartoli, A., Canis, M., Bourdel, N.: Computer-assisted
laparoscopic myomectomy by augmenting the uterus with pre-operative MRI data.
In: ISMAR (2014)

4. Egorov, V., Tsyuryupa, S., Kanilo, S., Kogit, M., Sarvazyan, A.: Soft tissue elas-
tometer. Med. Eng. Phys. 30(2), 206–212 (2008)
5. Haouchine, N., Dequidt, J., Berger, M., Cotin, S.: Monocular 3D reconstruction
and augmentation of elastic surfaces with self-occlusion handling. IEEE Trans. Vis.
Comput. Graph. 21(12), 1363–1376 (2015)
6. Haouchine, N., Dequidt, J., Peterlik, I., Kerrien, E., Berger, M.-O., Cotin, S.:
Image-guided simulation of heterogeneous tissue deformation for augmented reality
during hepatic surgery. In: ISMAR (2013)
7. Puerto-Souza, G., Cadeddu, J.A., Mariottini, G.: Toward long-term and accurate
augmented-reality for monocular endoscopic videos. Bio. Eng. 61(10), 2609–2620
(2014)
8. Puerto-Souza, G., Mariottini, G.: A fast and accurate feature-matching algorithm
for minimally-invasive endoscopic images. TMI 32(7), 1201–1214 (2013)
9. Su, L.-M., Vagvolgyi, B.P., Agarwal, R., Reiley, C.E., Taylor, R.H., Hager, G.D.:
Augmented reality during robot-assisted laparoscopic partial nephrectomy: toward
real-time 3D-CT to stereoscopic video registration. Urology 73, 896–900 (2009)
10. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical report
CMU-CS-91-132 (1991)
Structure-Aware Rank-1 Tensor Approximation
for Curvilinear Structure Tracking Using
Learned Hierarchical Features

Peng Chu¹, Yu Pang¹, Erkang Cheng¹, Ying Zhu², Yefeng Zheng³, and Haibin Ling¹(B)

¹ Computer and Information Sciences Department, Temple University, Philadelphia, USA
hbling@temple.edu
² Electrical and Computer Engineering Department, Temple University, Philadelphia, USA
³ Medical Imaging Technologies, Siemens Healthcare, Princeton, USA

Abstract. Tracking of curvilinear structures (CS), such as vessels and


catheters, in X-ray images has become increasingly important in recent
interventional applications. However, CS is often barely visible in low-
dose X-ray due to overlay of multiple 3D objects in a 2D projection,
making robust and accurate tracking of CS very difficult. To address
this challenge, we propose a new tracking method that encodes the
structure prior of CS in the rank-1 tensor approximation tracking frame-
work, and it also uses the learned hierarchical features via a convolu-
tional neural network (CNN). The three components, i.e., curvilinear
prior modeling, high-order information encoding and automatic feature
learning, together enable our algorithm to reduce the ambiguity arising
from the complex background, and consequently improve the tracking
robustness. Our proposed approach is tested on two sets of X-ray fluo-
roscopic sequences including vascular structures and catheters, respec-
tively. In the tests our approach achieves a mean tracking error of 1.1
pixels for vascular structure and 0.8 pixels for catheter tracking, signifi-
cantly outperforming state-of-the-art solutions on both datasets.

1 Introduction
Reliable tracking of vascular structures or intravascular devices in dynamic X-ray
images is essential for guidance during interventional procedures and post-procedural
analysis [1–3,8,13,14]. However, poor tissue contrast due to low radiation
dose and the lack of depth information make detecting and tracking such
curvilinear structures (CS) challenging. Traditional registration- and
alignment-based trackers depend on local image intensity or gradient; without
high-level context information, they cannot efficiently discriminate a
low-contrast target structure from the complex background. On the other hand,
confounding irrelevant structures challenge detection-based tracking.
Recently, a new solution was proposed that exploits the progress in multi-target

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 413–421, 2016.
DOI: 10.1007/978-3-319-46720-7 48
414 P. Chu et al.

tracking [2]. After initially detecting candidate points on a CS, the idea is to
model CS tracking as a multi-dimensional assignment (MDA) problem, then a
tensor approximation is applied to search for a solution. The idea encodes high-
order temporal information and hence gains robustness against local ambiguity.
However, it lacks a mechanism to encode the structure prior of
CS, and the random-forest features used in [2] lack discrimination power.

[Figure 1 flowchart components: detection using hierarchical features, link
building, rank-1 tensor approximation with spatial interaction, model
likelihood, and unfolding.]

Fig. 1. Overview of the proposed method.

In this paper, we present a new method (refer to Fig. 1 for the flowchart) to
detect and track CS in dynamic X-ray sequences. First, a convolutional neural
network (CNN) is used to detect candidate landmarks on CS. CNN automati-
cally learns the hierarchical representations of input images [6,7] and has been
recently used in medical image analysis (e.g. [9,10]). With the detected CS can-
didates, CS tracking is converted to a multiple target tracking problem and then
a multi-dimensional assignment (MDA) one. In MDA, candidates are associ-
ated along motion trajectories across time, while the association is constructed
according to the trajectory affinity. It has been shown in [11] that MDA can be
efficiently solved via rank-1 tensor approximation (R1TA), in which the goal is
to seek vectors that maximize the “joint projection” of an affinity tensor.
Following a similar procedure, our solution adopts R1TA to estimate the CS motion.
Specifically, a high-order tensor is first constructed from all trajectory candidates
over a time span. Then, the model prior of CS is integrated into R1TA encoding
the spatial interaction between adjacent candidates in the model. Finally, CS
tracking results are inferred from model likelihood.
The main contribution of our work is two-fold. (1) We propose a
structure-aware tensor approximation framework for CS tracking by considering
the spatial interaction between CS components. The combination of such spatial
interaction and higher order temporal information effectively reduces association
ambiguity and hence improves the tracking robustness. (2) We design a discrim-
inative CNN detector for CS candidate detection. Compared with traditional
hand-crafted features, the learned CNN features show very high detection qual-
ity in identifying CS from low-visibility dynamic X-ray images. As a result, it
greatly reduces the number of hypothesis trajectories and improves the tracking
efficiency.
Structure-Aware Rank-1 Tensor Approximation for Curvilinear Structure 415

For evaluation, our method is tested on two sets of X-ray fluoroscopic


sequences including vascular structures and catheters, respectively. Our app-
roach achieves a mean tracking error of 1.1 pixels on the vascular dataset and
0.8 pixels on the catheter dataset. Both results are clearly better than other
state-of-the-art solutions in comparison.

2 Candidate Detection with Hierarchical Features


Detecting CS in the low-visibility dynamic X-ray images is challenging. Without
color and depth information, CS shares great similarity with other anatomical
structures or imaging noise. To address these problems, a four-layer CNN (Fig. 2)
is designed to automatically learn hierarchical features for CS candidate detec-
tion. We employ 32 filters of size 5 × 5 in the first convolution stage, and 64
filters of the same size in the second stage. Max-pooling layers with a receptive
window of 2 × 2 pixels are employed to down-sample the feature maps. Finally,
two fully-connected layers are used as the classifier. Dropout is employed to
reduce overfitting. The CNN framework used in our experiments is based on
MatConvNet [12].
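The layer sizes implied by this architecture can be checked with a short sketch (assuming 'valid' convolutions with stride 1 and non-overlapping pooling, which the paper does not state explicitly):

```python
def conv_out(size, kernel):
    # 'valid' convolution: no padding, stride 1 (assumption)
    return size - kernel + 1

def pool_out(size, window):
    # non-overlapping max-pooling
    return size // window

# 28x28 input patch (Sect. 4), two conv + pool stages (Fig. 2)
s = 28
s = conv_out(s, 5)   # stage 1: 32 filters of 5x5 -> 24x24
s = pool_out(s, 2)   # stage 1: 2x2 max-pooling   -> 12x12
s = conv_out(s, 5)   # stage 2: 64 filters of 5x5 -> 8x8
s = pool_out(s, 2)   # stage 2: 2x2 max-pooling   -> 4x4
flat = s * s * 64    # flattened input to the fully-connected classifier
print(s, flat)       # 4 1024
```

Under these assumptions the two fully-connected layers operate on a 1024-dimensional feature vector.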

[Figure 2 pipeline: input image → image patches → convolutional → max-pooling
(Stage 1) → convolutional → max-pooling (Stage 2) → fully connected layers
(classifier) → probability map.]

Fig. 2. The CNN architecture for CS candidate detection.

For each image in the sequence, except the first one, which has manually
annotated groundtruth, a CS probability map is computed by the learned classifier.
A threshold is set to eliminate most of the false alarms in the image. The
resulting images are further processed by filtering and thinning: the binarized
probability map is filtered by a distance mask that excludes locations too far
from the model. Instead of using a groundtruth bounding box, we take the
tracking results from previous image batches. Based on the previously tracked
model, we calculate the speed and acceleration of the target to predict its
position in the next image batch. Finally, after removing isolated pixels, CS candidates
are generated from the thinning results. Examples of detection results are shown
in Fig. 3. For comparison, probability maps obtained by a random forests classi-
fier with hand-crafted features [2] are also shown. Our probability maps contain
fewer false alarms, which guarantees more accurate candidate locations after post-
processing.
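The thresholding and distance-mask step can be sketched as follows (a simplified illustration; the 60-pixel radius matches the distance threshold reported in Sect. 4, while the 0.5 threshold and the toy image are assumptions):

```python
import numpy as np

def candidate_mask(prob_map, center, threshold=0.5, radius=60):
    """Binarize a CS probability map and keep only pixels inside a
    distance mask around the predicted target position. Parameter
    values are illustrative, not the authors' exact settings."""
    binary = prob_map >= threshold
    ys, xs = np.mgrid[0:prob_map.shape[0], 0:prob_map.shape[1]]
    dist = np.hypot(ys - center[0], xs - center[1])
    return binary & (dist <= radius)

# toy example: a bright diagonal line plus one far-away false alarm
prob = np.zeros((128, 128))
prob[np.arange(40, 80), np.arange(40, 80)] = 0.9   # true structure
prob[5, 120] = 0.95                                # false alarm
mask = candidate_mask(prob, center=(60, 60))
```

Here the false alarm at (5, 120) is suppressed by the distance mask even though it exceeds the probability threshold.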

Fig. 3. Probability maps and detected candidates of a vessel (left) and catheter (right).
For each example, from left to right are groundtruth, random forests result, and CNN
result, respectively. Red indicates regions with high probability, while green dots show
the resulting candidates.

3 Tracking with Model Prior


To encode the structure prior in a CS model, we use an energy maximiza-
tion scheme that combines temporal energy of individual candidate and spatial
interaction energy of multiple candidates into a united optimization framework.
Here, we consider the pairwise interactions of two candidates on neighboring
frames. The assignment matrix between two consecutive sets O^{(k-1)} and O^{(k)}
(i.e., detected candidate CS landmarks) can be written as X^{(k)} = (x^{(k)}_{i_{k-1} i_k}),
where k = 1, 2, \dots, K, and o^{(k)}_{i_k} \in O^{(k)} is the i_k-th landmark candidate of CS.
For notational convenience, we use a single subscript j_k to represent the entry
index (i_{k-1}, i_k), i.e., x^{(k)}_{j_k} \doteq x^{(k)}_{i_{k-1} i_k} and
\mathrm{vec}(X^{(k)}) = (x^{(k)}_{j_k}) for the vectorized X^{(k)}. Then our
objective function can be written as

f(X^{(1)}, \dots, X^{(K)}) = \sum_{j_1 j_2 \dots j_K} c_{j_1 j_2 \dots j_K}\, x^{(1)}_{j_1} x^{(2)}_{j_2} \cdots x^{(K)}_{j_K} + \sum_{k=1}^{K} \sum_{l_k, j_k} w^{(k)}_{l_k j_k} e^{(k)}_{l_k j_k}\, x^{(k)}_{l_k} x^{(k)}_{j_k},   (1)

where c_{j_1 j_2 \dots j_K} is the affinity measuring trajectory confidence;
w^{(k)}_{l_k j_k} is the likelihood that candidates x^{(k)}_{j_k} and x^{(k)}_{l_k}
are neighboring on the model; and e^{(k)}_{l_k j_k} is the spatial interaction of
two candidates on two consecutive frames. The affinity has two parts:

c_{i_0 i_1 \dots i_K} = \mathrm{app}_{i_0 i_1 \dots i_K} \times \mathrm{kin}_{i_0 i_1 \dots i_K},   (2)

where \mathrm{app}_{i_0 i_1 \dots i_K} describes the appearance consistency of the
trajectory, and \mathrm{kin}_{i_0 i_1 \dots i_K} is the kinetic affinity modeling the
higher-order temporal affinity, as detailed in [2].

Model Prior. CS candidates share two kinds of spatial constraints. First, the
trajectories of two neighboring elements should have similar directions. Second,
the relative order of two neighboring elements should not change, so that
re-composition of the CS is prohibited. Thus inspired, we formulate the spatial
interaction of two candidates as

e_{l_k j_k} \doteq e_{m_{k-1} m_k i_{k-1} i_k} = E_{\mathrm{para}} + E_{\mathrm{order}},   (3)

where

E_{\mathrm{para}} = \frac{(o^{(k-1)}_{i_{k-1}} - o^{(k)}_{i_k}) \cdot (o^{(k-1)}_{m_{k-1}} - o^{(k)}_{m_k})}{\|o^{(k-1)}_{i_{k-1}} - o^{(k)}_{i_k}\| \, \|o^{(k-1)}_{m_{k-1}} - o^{(k)}_{m_k}\|}, \qquad
E_{\mathrm{order}} = \frac{(o^{(k-1)}_{i_{k-1}} - o^{(k-1)}_{m_{k-1}}) \cdot (o^{(k)}_{i_k} - o^{(k)}_{m_k})}{\|o^{(k-1)}_{i_{k-1}} - o^{(k-1)}_{m_{k-1}}\| \, \|o^{(k)}_{i_k} - o^{(k)}_{m_k}\|},

such that E_{\mathrm{para}} models the angle between two neighboring trajectories,
which also penalizes large distance changes between them; and E_{\mathrm{order}}
models the relative order of two adjacent candidates via the inner product of the
vectors between two neighboring candidates.
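A minimal sketch of the two interaction terms, assuming both denominators are products of Euclidean norms so that each term is a cosine, which is consistent with the angle interpretation in the text:

```python
import numpy as np

def cosine(u, v):
    # cosine of the angle between two 2D vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def spatial_interaction(o_i_prev, o_i, o_m_prev, o_m):
    """e = E_para + E_order for candidates i, m on frames k-1 and k.
    E_para: angle between the two displacement (trajectory) vectors.
    E_order: angle between the i -> m offset vectors on both frames,
    penalizing a change of relative order along the CS."""
    E_para = cosine(o_i_prev - o_i, o_m_prev - o_m)
    E_order = cosine(o_i_prev - o_m_prev, o_i - o_m)
    return E_para + E_order

# two candidates translating rigidly: both terms reach their maximum of 1
a0, a1 = np.array([0.0, 0.0]), np.array([1.0, 0.0])
b0, b1 = np.array([0.0, 2.0]), np.array([1.0, 2.0])
print(spatial_interaction(a0, a1, b0, b1))  # 2.0
```

When the two trajectories diverge or the candidates swap order, one of the cosines drops below 1 and the interaction score decreases.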
Maximizing Eq. 1 closely correlates with the rank-1 tensor approximation
(R1TA) [4], which aims to approximate a tensor by the tensor product of unit
vectors up to a scale factor. By relaxing the integer constraint on the assignment
variables, once a real-valued solution of X^{(k)} is achieved, it can be binarized
using the Hungarian algorithm [5]. The key issue here is to accommodate the
row/column ℓ1 normalization of a general assignment problem, which is different
from the commonly used ℓ2 norm constraint in tensor factorization. We develop
an approach similar to [11], which is a tensor power iteration solution with ℓ1
row/column normalization.
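The ℓ1 row/column normalization step can be illustrated with a Sinkhorn-style alternation (an illustrative stand-in, not the authors' exact implementation):

```python
import numpy as np

def l1_row_col_normalize(X, n_iter=50):
    """Alternate l1 normalization of rows and columns of a positive
    matrix until it is (approximately) doubly stochastic."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(n_iter):
        X /= X.sum(axis=1, keepdims=True)  # rows sum to 1
        X /= X.sum(axis=0, keepdims=True)  # columns sum to 1
    return X

X = l1_row_col_normalize([[4.0, 1.0], [1.0, 4.0]])
print(X)  # rows and columns each sum to 1; diagonal entries dominate
```

The resulting soft assignment can then be discretized with the Hungarian algorithm.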

Model Likelihood. The coefficient w^{(k)}_{l_k j_k} \doteq w^{(k)}_{m_{k-1} m_k i_{k-1} i_k}
measures the likelihood that two candidates o^{(k-1)}_{i_{k-1}} and o^{(k-1)}_{m_{k-1}}
are neighboring on the model. In order to obtain the association of each candidate
pair in each frame, or in other words, to measure the likelihood that a candidate
o^{(k)}_{i_k} matches a model element o^{(0)}_{i_0}, we maintain a “soft assignment”.
In particular, we use \theta^{(k)}_{i_0 i_k} to indicate the likelihood that
o^{(k)}_{i_k} corresponds to o^{(0)}_{i_0}. It can be estimated by

\Theta^{(k)} = \Theta^{(k-1)} X^{(k)}, \quad k = 1, 2, \dots, K,   (4)

where \Theta^{(k)} = (\theta^{(k)}_{i_0 i_k}) \in \mathbb{R}^{I_0 \times I_k} and
\Theta^{(0)} is fixed as the identity matrix.
The model likelihood is updated in each step of the power iteration. After the
update of the first term in Eq. 1, a pre-likelihood \Theta^{(k)} is estimated for
computing w^{(k)}_{l_k j_k}. Since \Theta^{(k)} associates candidates directly with
the model, the final tracking result of the matching between o^{(0)} and o^{(k)}
can be derived from \Theta^{(k)}.
With \Theta^{(k)}, the approximated distance on the model between o^{(k-1)}_{i_{k-1}}
and o^{(k-1)}_{m_{k-1}} can be calculated as follows:

d^{(k)}_{i_k m_k} = \frac{\sum_{i_0} (o^{(0)}_{i_0} - o^{(0)}_{i_0+1})\, \theta^{(k)}_{i_0 i_k} \theta^{(k)}_{i_0+1, m_k}}{\sum_{i_0} \theta^{(k)}_{i_0 i_k} \theta^{(k)}_{i_0+1, m_k}}.   (5)

Thereby, w^{(k)}_{l_k j_k} can simply be calculated as

w^{(k)}_{l_k j_k} \doteq w^{(k)}_{m_{k-1} m_k i_{k-1} i_k} = \frac{2\, d^{(k-1)}_{i_{k-1} m_{k-1}}\, \bar{d}}{(d^{(k-1)}_{i_{k-1} m_{k-1}})^2 + \bar{d}^2},   (6)

where \bar{d} is the average distance between two neighboring elements on the model O^{(0)}.
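Equations (4) and (6) can be sketched as follows (toy dimensions; `d_bar` plays the role of the average neighbor spacing, and the soft assignment matrix is illustrative):

```python
import numpy as np

def update_likelihood(theta_prev, X):
    # Eq. (4): propagate the soft assignment toward the model
    return theta_prev @ X

def neighbor_weight(d, d_bar):
    # Eq. (6): peaks at 1 when the approximated model distance d
    # equals the average neighbor spacing d_bar
    return 2.0 * d * d_bar / (d * d + d_bar * d_bar)

theta0 = np.eye(3)                      # Theta^(0) is the identity
X1 = np.array([[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.1, 0.9]])        # soft frame-to-frame assignment
theta1 = update_likelihood(theta0, X1)  # equals X1 in this toy case

print(neighbor_weight(1.0, 1.0))        # 1.0
```

Candidate pairs whose approximated model distance deviates from the average spacing receive a weight below 1, down-weighting their spatial interaction term.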
The proposed tracking method is summarized in Algorithm 1.

Algorithm 1. Power iteration with model prior

1: Input: Global affinity C = (c_{j_1 j_2 \dots j_K}), spatial interactions e^{(k)}_{l_k j_k}, k = 1 \dots K, and CS candidates O^{(k)}, k = 0 \dots K.
2: Output: CS matching.
3: Initialize X^{(k)}, k = 1 \dots K, CS^{(0)} = O^{(0)} and \Theta^{(0)} = I.
4: repeat
5:   for k = 1, \dots, K do
6:     for j_k = 1, \dots, J do
7:       update x^{(k)}_{j_k} \doteq x^{(k)}_{i_{k-1} i_k} by x^{(k)}_{j_k} \propto x^{(k)}_{j_k} \sum_{j_f: f \neq k} c_{j_1 \dots j_k \dots j_K} x^{(1)}_{j_1} \cdots x^{(f)}_{j_f} \cdots x^{(K)}_{j_K}
8:     end for
9:     row/column normalize X^{(k)}
10:    update model pre-likelihood: \Theta^{(k)} = \Theta^{(k-1)} X^{(k)}
11:    for j_k = 1, \dots, J do
12:      update x^{(k)}_{j_k} \doteq x^{(k)}_{i_{k-1} i_k} by x^{(k)}_{j_k} \propto x^{(k)}_{j_k} \sum_{l_k} w^{(k)}_{l_k j_k} e^{(k)}_{l_k j_k} x^{(k)}_{l_k}
13:    end for
14:    update model likelihood: \Theta^{(k)} = \Theta^{(k-1)} X^{(k)}
15:  end for
16: until convergence
17: discretize \Theta^{(k)} to obtain the CS matching.

4 Experiments

We evaluate the proposed CS tracking algorithm using two groups of X-ray clin-
ical data collected from liver and cardiac interventions. The first group consists
of six sequences of liver vessel images and the second 11 sequences of catheter
images, each with around 20 frames. The data is acquired at 512 × 512 pixels
with a physical resolution of 0.345 or 0.366 mm per pixel. Groundtruth for each image is
manually annotated (Fig. 4(a)).
Vascular Structure Tracking. We first evaluate the proposed algorithm on
the vascular sequences. The first frame of each sequence is used to generate
training samples for the CNN. Specifically, 800 vascular structure patches and 1500
negative patches are generated from each image. From the six images, a total
of 2300 × 6 = 13,800 samples is extracted and split into 75% training and 25%
validation. All patches have the same size of 28 × 28 pixels. The distance
threshold of the predictive bounding box is set to 60 pixels to allow sufficient error tolerance.
Finally, there are around 200 vascular structure candidates left in each frame.
The number of points on the model is around 50 for each sequence.
In our work, K = 3 is used so that every four consecutive frames are
associated. During tracking, the tensor kernel takes around 10 s and 100 MB (peak)
of RAM to process one frame with 200 candidates in our setting, running on
a single Intel Xeon 2.3 GHz core. The tracking error is defined as the shortest
distance between tracked pixels and groundtruth annotation. For each perfor-
mance metric, we compute its mean and standard deviation. For comparison, the
registration-based (RG) approach [14], bipartite graph matching [2] (BM) and
pure tensor based method [2] (TB) are applied to the same sequences. For BM

and TB, same tracking algorithms but with the CNN detector are also tested
and reported. The first block of Fig. 4 illustrates the tracking results of vascular
structures. B-spline is used to connect all tracked candidates to represent the
tracked vascular structure. The zoom-in view of a selected region (rectangle in
blue) in each tracking result is presented below, where portions with large errors
are colored red. Quantitative evaluation for each sequence is listed in Table 1.
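The error metric (shortest distance from each tracked pixel to the groundtruth annotation, summarized by mean and standard deviation) can be sketched as:

```python
import numpy as np

def tracking_error(tracked, groundtruth):
    """Mean/std of the shortest distance from each tracked pixel to the
    groundtruth annotation (brute-force nearest-neighbor sketch)."""
    tracked = np.asarray(tracked, float)
    gt = np.asarray(groundtruth, float)
    # pairwise distances: |tracked| x |gt|
    d = np.linalg.norm(tracked[:, None, :] - gt[None, :, :], axis=2)
    nearest = d.min(axis=1)
    return nearest.mean(), nearest.std()

gt = [[0, 0], [0, 1], [0, 2]]
tracked = [[1, 0], [0, 1], [0, 3]]
mean_err, std_err = tracking_error(tracked, gt)
print(round(mean_err, 3))  # (1 + 0 + 1) / 3 = 0.667
```

For long annotated curves, the groundtruth point set would be the rasterized curve pixels rather than three toy points.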
Catheter Tracking. Similar procedures and parameters are applied to the
11 sequences of catheter images. The second block of Fig. 4 shows example of
catheter tracking results. The numerical comparisons are listed in Table 1.
The results show that our method clearly outperforms the other three
approaches. Candidates in our approach are detected by a highly accurate CNN
detector, ensuring that most extracted candidates lie on the CS, while the
registration-based method depends on the first frame as a reference to identify targets. Our
approach is also better than the results of bipartite graph matching where K = 1.
The reason is that our proposed method incorporates higher-order temporal
information from multiple frames; by contrast, bipartite matching is only com-
puted from two frames. Compared with the pure tensor based algorithm, the
proposed method incorporates the model prior which provides more powerful

Table 1. Curvilinear structure tracking errors (in pixels)

Dataset Seq ID RG [14] BM [2] TB [2] BM+CNN TB+CNN Proposed


Vascular structures VAS1 2.77 ± 3.25 1.54 ± 1.59 1.33 ± 1.08 1.44 ± 2.37 1.15 ± 0.91 1.14 ± 0.84
VAS2 2.02 ± 3.10 1.49 ± 1.14 1.49 ± 1.74 1.11 ± 0.83 1.30 ± 2.48 1.09 ± 0.83
VAS3 3.25 ± 7.64 1.65 ± 2.40 1.41 ± 1.54 1.19 ± 0.91 1.17 ± 0.92 1.17 ± 0.91
VAS4 2.16 ± 2.52 1.61 ± 2.25 1.99 ± 3.02 1.12 ± 1.00 1.95 ± 5.00 1.17 ± 1.53
VAS5 3.04 ± 5.46 2.71 ± 4.36 1.36 ± 1.44 1.95 ± 3.94 1.14 ± 1.55 1.09 ± 1.42
VAS6 2.86 ± 5.60 1.40 ± 1.94 1.32 ± 1.68 1.39 ± 2.53 1.09 ± 1.70 1.11 ± 1.90
75%ile, 100%ile – 2.00, 31.2 2.00, 26.8 1.40, 32.6 1.40, 56.9 1.40, 23.2
Overall 2.69 ± 5.03 1.75 ± 2.60 1.49 ± 1.86 1.37 ± 2.26 1.30 ± 2.64 1.13 ± 1.30
Catheters CAT1 2.86 ± 3.83 1.47 ± 1.57 1.29 ± 1.06 1.13 ± 1.19 1.08 ± 0.85 1.00 ± 0.77
CAT2 1.98 ± 2.66 2.38 ± 5.33 1.11 ± 1.58 1.77 ± 4.11 0.77 ± 1.06 0.56 ± 0.89
CAT3 2.20 ± 1.56 1.55 ± 1.98 1.39 ± 1.70 0.99 ± 1.52 0.72 ± 0.66 0.74 ± 0.65
CAT4 1.07 ± 0.76 2.12 ± 3.35 1.15 ± 1.33 0.94 ± 1.37 0.92 ± 1.34 0.76 ± 0.77
CAT5 2.54 ± 3.65 2.02 ± 4.85 1.04 ± 0.88 1.65 ± 5.36 0.84 ± 1.01 0.83 ± 0.97
CAT6 1.93 ± 2.15 2.06 ± 3.92 1.14 ± 0.95 1.19 ± 2.03 0.96 ± 0.92 0.93 ± 0.89
CAT7 1.39 ± 2.18 1.86 ± 3.79 1.00 ± 0.78 0.76 ± 0.72 0.76 ± 0.72 0.73 ± 0.63
CAT8 2.74 ± 4.32 2.30 ± 5.53 1.31 ± 2.21 1.22 ± 2.21 1.74 ± 3.81 0.96 ± 1.37
CAT9 1.74 ± 1.25 2.80 ± 4.78 2.00 ± 2.74 1.54 ± 3.44 1.18 ± 2.02 0.99 ± 1.33
CAT10 3.17 ± 5.26 2.86 ± 4.33 2.48 ± 3.59 0.86 ± 1.26 0.81 ± 1.12 0.86 ± 1.29
CAT11 3.96 ± 5.89 2.68 ± 4.36 1.17 ± 0.97 3.50 ± 11.3 1.35 ± 3.72 0.80 ± 0.74
75%ile, 100%ile – 2.00, 47.7 1.40, 24.0 1.00, 70.5 1.00, 48.4 1.00, 19.2
Overall 2.40 ± 3.62 2.17 ± 4.14 1.38 ± 1.90 1.39 ± 4.16 1.01 ± 1.93 0.83 ± 0.98
[Figure 4 layout: top rows show vascular structures, bottom rows show a catheter;
columns are (a) GT, (b) RG [14], (c) BM [2], (d) TB [2], (e) Proposed.]

Fig. 4. Curvilinear structure tracking results. (a) groundtruth, (b) registration, (c)
bipartite matching, (d) tensor based, and (e) proposed method. Red indicates regions
with large errors, while green indicates small errors.

clues for tracking the whole CS. Confirmed by the zoom-in views, with model
prior, our proposed method is less affected by neighboring confounding struc-
tures.

5 Conclusion

We presented a new method that combines hierarchical features learned by a CNN
with an encoded model prior to estimate the motion of CS in X-ray image sequences.
Experiments on two groups of CS demonstrate the effectiveness of the proposed
approach. Achieving a tracking error of around one pixel (or smaller than
0.5 mm), it clearly outperforms the other state-of-the-art algorithms. For future
work, we plan to adopt a pyramid detection strategy in order to accelerate the
pixel-wise probability map computation in our current approach.

Acknowledgement. We thank the anonymous reviewers for valuable suggestions.


This work was supported in part by NSF grants IIS-1407156 and IIS-1350521.

References
1. Baert, S.A., Viergever, M.A., Niessen, W.J.: Guide-wire tracking during endovas-
cular interventions. IEEE Trans. Med. Imaging 22(8), 965–972 (2003)
2. Cheng, E., Pang, Y., Zhu, Y., Yu, J., Ling, H.: Curvilinear structure tracking by
low rank tensor approximation with model propagation. In: IEEE Conference on
Computer Vision and Pattern Recognition, pp. 3057–3064 (2014)
3. Cheng, J.Z., Chen, C.M., Cole, E.B., Pisano, E.D., Shen, D.: Automated delin-
eation of calcified vessels in mammography by tracking with uncertainty and graph-
ical linking techniques. IEEE Trans. Med. Imaging 31(11), 2143–2155 (2012)
4. De Lathauwer, L., De Moor, B., Vandewalle, J.: On the best rank-1 and
rank-(R1, R2, ..., RN) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl.
21(4), 1324–1342 (2000)
5. Frank, A.: On Kuhn’s Hungarian method —a tribute from Hungary. Nav. Res.
Logistics 52(1), 2–5 (2005)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Sys-
tems, pp. 1097–1105 (2012)
7. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to
document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
8. Palti-Wasserman, D., Brukstein, A.M., Beyar, R.P.: Identifying and tracking a
guide wire in the coronary arteries during angioplasty from X-ray images. IEEE
Trans. Biomed. Eng. 44(2), 152–164 (1997)
9. Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M.: Deep feature
learning for knee cartilage segmentation using a triplanar convolutional neural
network. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI
2013, Part II. LNCS, vol. 8150, pp. 246–253. Springer, Heidelberg (2013)
10. Roth, H.R., Wang, Y., Yao, J., Lu, L., Burns, J.E., Summers, R.M.: Deep convo-
lutional networks for automated detection of posterior-element fractures on spine
CT. In: SPIE Medical Imaging, p. 97850 (2016)
11. Shi, X., Ling, H., Xing, J., Hu, W.: Multi-target tracking by rank-1 tensor approx-
imation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp.
2387–2394 (2013)
12. Vedaldi, A., Lenc, K.: MatConvNet – convolutional neural networks for MATLAB.
In: Proceedings of the ACM International Conference on Multimedia (2015)
13. Wang, P., Chen, T., Zhu, Y., Zhang, W., Zhou, S.K., Comaniciu, D.: Robust
guidewire tracking in fluoroscopy. In: IEEE Conference on Computer Vision and
Pattern Recognition, pp. 691–698 (2009)
14. Zhu, Y., Tsin, Y., Sundar, H., Sauer, F.: Image-based respiratory motion com-
pensation for fluoroscopic coronary roadmapping. In: Jiang, T., Navab, N., Pluim,
J.P.W., Viergever, M.A. (eds.) MICCAI 2010, Part III. LNCS, vol. 6363, pp. 287–
294. Springer, Heidelberg (2010)
Real-Time Online Adaption for Robust
Instrument Tracking and Pose Estimation

Nicola Rieke1(B) , David Joseph Tan1 , Federico Tombari1,4 ,


Josué Page Vizcaı́no1 , Chiara Amat di San Filippo3 , Abouzar Eslami3 ,
and Nassir Navab1,2
1
Computer Aided Medical Procedures,
Technische Universität München, Munich, Germany
Nicola.Rieke@tum.de
2
Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA
3
Carl Zeiss MEDITEC München, Munich, Germany
4
DISI, University of Bologna, Bologna, Italy

Abstract. We propose a novel method for instrument tracking in Reti-


nal Microsurgery (RM) which is apt to withstand the challenges of RM
visual sequences in terms of varying illumination conditions and blur.
At the same time, the method is general enough to deal with different
background and tool appearances. The proposed approach relies on two
random forests to, respectively, track the surgery tool and estimate its
2D pose. Robustness to photometric distortions and blur is provided by a
specific online refinement stage of the offline trained forest, which makes
our method also capable of generalizing to unseen backgrounds and tools.
In addition, a dedicated framework for merging the predictions of
tracking and pose is employed to improve the overall accuracy. Remark-
able advantages in terms of accuracy over the state-of-the-art are shown
on two benchmarks.

1 Introduction and Related Work


Retinal Microsurgery (RM) is a challenging task wherein a surgeon has to handle
anatomical structures at micron-scale dimension while observing targets through
a stereo-microscope. Novel imaging modalities such as interoperative Optical
Coherence Tomography (iOCT) [1] aid the physician in this delicate task by
providing anatomical sub-retinal information, but lead to an increased workload
due to the required manual positioning of the region of interest (ROI). Recent
research has aimed at introducing advanced computer vision and augmented
reality techniques within RM to increase safety during surgical maneuvers and
to simplify the surgical workflow. A key step for most of these methods is the
accurate, real-time localization of the instrument tips, which allows the iOCT
to be positioned automatically. This further enables the calculation of the
distance of the instrument tip to the retina and provides real-time feedback
to the physician. In addition, the trajectories performed by
the instrument during surgery can be compared with other surgeries, thus paving

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 422–430, 2016.
DOI: 10.1007/978-3-319-46720-7 49
Real-Time Online Adaption for Robust Instrument Tracking 423

the way to objective quality assessment for RM. Surgical tool tracking has been
investigated in different medical specialties: nephrectomy [2], neurosurgery [3],
laparoscopy/endoscopy [4,5]. However, RM presents specific challenges such as
strong illumination changes, blur, and variability of surgical instrument
appearance, which make the aforementioned approaches not directly applicable in this
scenario. Among the several works recently proposed in the field of tool tracking
for RM, Pezzementi et al. [6] suggested to perform the tracking in two steps: first
via appearance modeling, which computes a pixel-wise probability of class mem-
bership (foreground/background), then filtering, which estimates the current
tool configuration. Richa et al. [7] employ mutual information for tool tracking.
Sznitman et al. [8] introduced a joint algorithm which simultaneously performs
tool detection and tracking. The tool configuration is parametrized and tracking
is modeled as a Bayesian filtering problem. Subsequently, in [9], they propose
to use a gradient-based tracker to estimate the tool’s ROI, followed by fore-
ground/background classification of the ROI’s pixels via boosted cascade. In
[10], a gradient boosted regression tree is used to create a multi-class classifier
which is able to detect different parts of the instrument. Li et al. [11] present
a multi-component tracker, i.e., a gradient-based tracker able to capture the
movements and an online detector to compensate for tracking losses.
In this paper, we introduce a robust closed-loop framework to track and
localize the instrument parts in in-vivo RM sequences in real-time, based on the
dual-random forest approach for tracking and pose estimation proposed in [12].
A fast tracker directly employs the pixel intensities in a random forest to infer
the tool tip bounding box in every frame. To cope with the strong illumina-
tion changes affecting the RM sequences, one of the main contributions of our
paper is to adapt the offline model to online information while tracking, so to
incorporate the appearance changes learned by the trees with real photometric
distortions witnessed at test time. This offline learning - online adaption leads to
a substantial capability regarding the generalization to unseen sequences. Sec-
ondly, within the estimated bounding box, another random forest predicts the
locations of the tool joints based on gradient information. Differently from [12],
we enforce spatial temporal constraints by means of a Kalman filter [13]. As
a third contribution of this work, we propose to “close the loop” between the
tracking and 2D pose estimation by obtaining a joint prediction of the
template position, acquired by merging the outcomes of the two separate forests
weighted by the confidence of their estimations. Such a cooperative prediction will in
turn provide pose information for the tracker, improving its robustness and accu-
racy. The performance of the proposed approach is quantitatively evaluated on
two different in-vivo RM datasets, and demonstrate remarkable advantages with
respect to the state-of-the-art in terms of robustness and generalization.

2 Method
In this section, we discuss the proposed method, for which an overview is depicted
in Fig. 1. First, a fast intensity-based tracker locates a template around the
424 N. Rieke et al.

Fig. 1. Framework: The description of the tracker, sampling and online learning can
be found in Sect. 2.1. The pose estimator and Kalman filter is presented in Sect. 2.2.
Details on the integrator are given in Sect. 2.3.

instrument tips using an offline trained model based on random forest (RF) and
the location of the template in the previous frame. Within this ROI, a pose
estimator based on HOG recovers the three joints employing another offline
learned RF and filters the result by temporal-spatial constraints. To close the
loop, the output is propagated to an integrator, aimed at merging together
the intensity-based and gradient-based predictions in a synergic way in order
to provide the tracker with an accurate template location for the prediction in
the next frame. Simultaneously, the refined result is propagated to a separate
thread which adapts the model of the tracker to the current data characteristics
via online learning.
A central element in this approach is the definition of the tracked template,
which we define by the landmarks of the forceps. Let (L, R, C) \in \mathbb{R}^{2\times3}
be the left, right and central joints of the instrument. The midpoint between the
tips is then given by M = (L + R)/2, and the 2D similarity transform from the patch
coordinate system to the frame coordinate system can be defined as

H = \begin{bmatrix} s\cos\theta & -s\sin\theta & C_x \\ s\sin\theta & s\cos\theta & C_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 30 \\ 0 & 0 & 1 \end{bmatrix}

with s = \frac{b}{100} \cdot \max\{\|L - C\|_2, \|R - C\|_2\} and
\theta = \cos^{-1}\!\left(\frac{M_y - C_y}{\|M - C\|_2}\right) for a fixed
patch size of 100 × 150 pixels and b \in \mathbb{R} defining the relative size. In this way,
the entire instrument tip is enclosed by the template and aligned with the tool’s
direction. In the following, details of the different components are presented.
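A sketch of the template transform, assuming the scale reads s = (b/100) · max{‖L−C‖₂, ‖R−C‖₂}, consistent with the 100 × 150 patch size and the relative-size factor b:

```python
import numpy as np

def template_transform(L, R, C, b=1.0):
    """2D similarity transform from patch to frame coordinates, built
    from the forceps joints (L, R, C). `b` is the relative template
    size; the second matrix applies the fixed 30-pixel y-offset."""
    L, R, C = (np.asarray(p, float) for p in (L, R, C))
    M = (L + R) / 2.0
    s = b / 100.0 * max(np.linalg.norm(L - C), np.linalg.norm(R - C))
    theta = np.arccos((M[1] - C[1]) / np.linalg.norm(M - C))
    S = np.array([[s * np.cos(theta), -s * np.sin(theta), C[0]],
                  [s * np.sin(theta),  s * np.cos(theta), C[1]],
                  [0.0,               0.0,                1.0]])
    T = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 30.0],
                  [0.0, 0.0, 1.0]])
    return S @ T

H = template_transform(L=[-10, 50], R=[10, 50], C=[0, 0])
```

With this symmetric toy configuration the tool axis is vertical, so the rotation angle is zero and H reduces to a pure scale plus the fixed offset.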

2.1 Tracker – Offline Learning, Online Adaption

Derived from image registration, tracking aims to determine a transformation


parameter that minimizes the similarity measure to a given template. In con-
trast to attaining a single template, the tool undergoes an articulated motion
and a variation of lighting changes which is difficult to minimize as an energy

function. Thus, the tracker learns a generalized model of the tool based on mul-
tiple templates, taken as the tool undergoes different movements in a variety of
environmental settings, and predicts the translation parameter from the inten-
sity values at n random points {xp }np=1 within the template, similar to [12].
In addition, we assume a piecewise constant velocity from consecutive frames.
Therefore, given the image It at time t and the translation vector of the template
from t − 2 to t − 1 as vt−1 = (vx , vy ) , the input to the forest is a feature vector
concatenating the intensity values on the current location of the template It (xp )
with the velocity vector vt−1 , assuming a constant time interval. In order to
learn the relation between the feature vector and the transformation update, we
use a random forest that follows a dimension-wise splitting of the feature vector
such that the translation vectors on the leaves point to a similar location.
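The feature vector construction can be sketched as follows (frame size, number of sample points, and offsets are illustrative, not the authors' settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def tracker_features(frame, template_origin, v_prev, offsets):
    """Feature vector for the tracking forest: intensities sampled at n
    fixed random offsets inside the template, concatenated with the
    previous frame-to-frame velocity."""
    y0, x0 = template_origin
    intensities = [frame[y0 + dy, x0 + dx] for dy, dx in offsets]
    return np.concatenate([intensities, v_prev])

frame = rng.random((480, 640))
offsets = rng.integers(0, 100, size=(50, 2))   # n = 50 sample points
v_prev = np.array([2.0, -1.0])                 # velocity from t-2 to t-1
f = tracker_features(frame, (100, 200), v_prev, offsets)
print(f.shape)  # (52,)
```

Because the offsets are fixed per forest, the same pixels are re-sampled in every frame, which keeps feature extraction cheap enough for real time.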
The cost of generalization is the inability to describe conditions that are
specific to a particular situation, such as the type of tool used in the surgery.
As a consequence, the robustness of the tracker is affected, since it cannot con-
fidently predict the location of the template for challenging frames with high
variations from the generalized model. Hence, in addition to the offline learning
for a generalized tracker, we propose to perform an online learning strategy that
considers the current frames and learns the relation of the translation vector
with respect to the feature vector. The objective is to stabilize the tracker by
adapting its forest to the specific conditions at hand. In particular, we propose
to incrementally add new trees to the forest by using the predicted template
location on the current frames of the video sequence. To achieve this goal, we
impose random synthetic transformations on the bounding boxes that enclose
the templates to build the learning dataset with pairs of feature and transla-
tion vectors, such that the transformations emulate the motion of the template
between two consecutive frames. Thereafter, the resulting trees are added to the
existing forest, and the prediction for the succeeding frames includes both the
generalized and the environment-specific trees. Notably, our online learning
approach does not learn from all incoming frames, but rather relies on a
confidence measure, introduced in Sect. 2.3, to evaluate and accumulate templates.
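A minimal sketch of the synthetic-transformation step is shown below, assuming pure integer translations of the bounding box (the paper's transformations emulate general inter-frame template motion; the box size, shift range, and raw-intensity features are illustrative assumptions, not the authors' exact choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def synth_training_pairs(frame, box, n_samples=50, max_shift=5):
    """Build (feature, translation) training pairs for new online trees by
    applying random synthetic shifts to the tracked bounding box.
    `box` = (x0, y0, w, h); all parameters here are illustrative."""
    x0, y0, w, h = box
    pairs = []
    for _ in range(n_samples):
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        patch = frame[y0 + dy:y0 + dy + h, x0 + dx:x0 + dx + w]
        # Target: the translation mapping the perturbed box back onto the
        # template, emulating template motion between consecutive frames.
        pairs.append((patch.ravel().astype(float), np.array([-dx, -dy])))
    return pairs

frame = rng.random((100, 100))               # current video frame (toy)
pairs = synth_training_pairs(frame, box=(40, 40, 16, 16))
print(len(pairs), pairs[0][0].shape)  # 50 (256,)
```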

2.2 2D Pose Estimation with Temporal-Spatial Constraints

During pose estimation, we model a direct mapping between image features and
the location of the three joints in the 2D space of the patch. Similar to [12], we
employ HOG features around a pool of randomly selected pixel locations within
the provided ROI as an input to the trees in order to infer the pixel offsets to
the joint positions. Since the HOG feature vector is extracted as in [14], the
splitting function of the trees considers only one dimension of the vector and is
optimized by means of information gain. The final vote is aggregated by a dense-
window algorithm. The predicted offsets to the joints in the reference frame of
the patch are back-warped onto the frame coordinate system. Up to now, the
forest considers every input as a still image. However, the surgical movement
is usually continuous. Therefore, we enforce a temporal-spatial relationship for
426 N. Rieke et al.

all joint locations via a Kalman filter [13] by employing the 2D location of the
joints in the frame coordinate system and their frame-to-frame velocity.
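The temporal-spatial constraint can be sketched as a standard constant-velocity Kalman filter over one joint's 2D position and frame-to-frame velocity (the process and measurement covariances Q and R below are assumed values, not taken from the paper):

```python
import numpy as np

# State [x, y, vx, vy] with a constant-velocity transition; only the 2D joint
# position is observed. Q and R are assumed values.
F = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])
Q, R = 0.01 * np.eye(4), 1.0 * np.eye(2)

def kalman_step(x, P, z):
    x, P = F @ x, F @ P @ F.T + Q                    # predict
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain
    x = x + K @ (z - H @ x)                          # update with measurement z
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
for t in range(1, 60):        # joint moving at a constant 2 px/frame in x
    x, P = kalman_step(x, P, np.array([2.0 * t, 0.0]))
print(x[2])  # estimated x-velocity approaches the true 2 px/frame
```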

2.3 Closed Loop via Integrator


Although the combination of the pose estimation with the Kalman filter would
already define a valid instrument tracking for all three joints, it completely relies
on the gradient information, which may be unreliable in case of blurred frames.
In these scenarios, the intensity information is still a valid source for predicting
the movement. On the other hand, gradient information tends to be more reliable
for precise localization in focused images. Due to the definition of the template,
the prediction of the joint positions can directly be connected to the expected
prediction of the tracker via the similarity transform. Depending on the confi-
dence for the current prediction of the separate random forests, we define the
scale s_F and the translation t_F of the joint similarity transform as the weighted
average

    s_F = (s_T σ_P + s_P σ_T) / (σ_T + σ_P)   and   t_F = (t_T σ_P + t_P σ_T) / (σ_T + σ_P),
where σ_T and σ_P are the average standard deviations of the tracking and pose
predictions, respectively, and t_F is constrained to be greater than or equal to
the initial translation. In this way, the final template is biased towards the more
reliable prediction. If σ_T is higher than a threshold τ_σ, the tracker transmits
the previous location of the template, which is subsequently corrected by the
similarity transform of the predicted pose. Furthermore, the prediction of the
pose can also correct for the scale of the 2D similarity transform which is actually
not captured by the tracker, leading to a scale adaptive tracking. This is an
important improvement because an implicit assumption of the pose algorithm is
that the size of the bounding box corresponds to the size of the instrument due
to the HOG features. The refinement also guarantees that only reliable templates
are used for the online learning thread.
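The weighted averaging above can be transcribed directly (scalar parameters for brevity; the τ_σ fallback and the lower bound on t_F are omitted):

```python
def fuse(s_T, t_T, sigma_T, s_P, t_P, sigma_P):
    """Weighted average of tracker (T) and pose (P) similarity-transform
    parameters: each prediction is weighted by the *other* forest's standard
    deviation, so the more confident (lower-sigma) forest dominates."""
    s_F = (s_T * sigma_P + s_P * sigma_T) / (sigma_T + sigma_P)
    t_F = (t_T * sigma_P + t_P * sigma_T) / (sigma_T + sigma_P)
    return s_F, t_F

# Pose forest far more confident (sigma_P << sigma_T): result leans to pose.
s_F, t_F = fuse(s_T=1.0, t_T=10.0, sigma_T=4.0, s_P=1.2, t_P=14.0, sigma_P=1.0)
print(round(s_F, 2), round(t_F, 2))  # 1.16 13.2
```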

3 Experiments and Results


We evaluated our approach on two different datasets ([9,12]), which we refer to
as Szn- and Rie-dataset, respectively. We considered both datasets because of
their intrinsic difference: the first one presents a strong coloring of the sequences
and a well-focused ocular of the microscope; the second presents different types
of instruments, a changing zoom factor, the presence of a light source, and a
detached epiretinal membrane. Further information on the datasets can be found
in Table 1 and in [9,12]. Analogously to baseline methods, we evaluate the perfor-
mance of our method by means of a threshold measure [9] for the separate joint
predictions and the strict PCP score [15] for evaluating the parts connected by
the joints. The proposed method is implemented in C++ and runs at 40 fps on
a Dell Alienware Laptop, Intel Core i7-4720HQ @ 2.6 GHz and 16 GB RAM.
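A minimal interpretation of the two evaluation measures, the threshold measure [9] and the strict PCP score [15], might look as follows (per-paper conventions, e.g. the exact PCP factor α, may differ):

```python
import numpy as np

def threshold_accuracy(pred, gt, taus):
    """Threshold measure: fraction of frames whose joint error is below
    each pixel threshold tau."""
    err = np.linalg.norm(pred - gt, axis=1)
    return [float((err < t).mean()) for t in taus]

def strict_pcp(pred_a, pred_b, gt_a, gt_b, alpha=0.5):
    """Strict PCP: a part (a, b) counts as correct only if *both* endpoint
    errors are below alpha times the ground-truth part length."""
    L = np.linalg.norm(gt_a - gt_b, axis=1)
    ok = (np.linalg.norm(pred_a - gt_a, axis=1) < alpha * L) & \
         (np.linalg.norm(pred_b - gt_b, axis=1) < alpha * L)
    return float(ok.mean())

gt_a = np.zeros((4, 2))
gt_b = np.tile([10.0, 0.0], (4, 1))                     # part length 10 px
pred_a = gt_a + np.array([[0, 0], [1, 0], [6, 0], [2, 0]], float)
pred_b = gt_b + np.array([[0, 0], [1, 0], [0, 0], [2, 0]], float)
print(strict_pcp(pred_a, pred_b, gt_a, gt_b))  # 0.75: one endpoint exceeds 5 px
```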

Table 1. Summary of the datasets.

Set             Szn [9]   Rie [12]
# Frames  I       402        200
          II      222        200
          III     547        200
          IV       —         200
Resolution      640×480   1920×1080

Fig. 2. Component evaluation.

In the offline learning for the tracker, we trained 100 trees per parameter,
employed 20 random intensity values and velocity as feature vectors, and used
500 sample points. For the pose estimation, we used 15 trees, and the HOG
features were set to a bin size of 9 and a pixel resolution of 50×50.

3.1 Evaluation of Components


To analyze the influence of the different proposed components, we evaluate the
algorithm with different settings on the Rie-dataset, whereby the sequences I,
II and III are used for the offline learning and sequence IV is used as the test
sequence. Figure 2 shows the threshold measure for the left tip in (a) and the
strict PCP for the left fork in (b). Individually, each component improves per-
formance, and combined they yield a robust tracker. Among them,
the most prominent improvement is the weighted averaging of the templates
from Sect. 2.3.

3.2 Comparison to State-of-the-Art


We compare the performance of our method against the state-of-the-art methods
DDVT [9], MI [7], ITOL [11] and POSE [12]. Throughout the experiments on the
Szn-dataset, the proposed method can compete with state-of-the-art methods,
as depicted in Fig. 3. In the first experiment, in which the forests are learned on
the first half of a sequence and evaluated on the second half, our method reaches
an accuracy of at least 94.3 % by means of threshold distance for the central
joint. In the second experiment, the first halves of all sequences are included
in the learning database and the method is tested on the second halves.
In contrast to the Szn-dataset, the Rie-dataset is not as saturated in terms
of accuracy, and therefore the benefits of our method are more evident. Figure 4
illustrates the results for the cross-validation setting, i.e. the offline training is
performed on three sequences and the method is tested on the remaining one. In
this case, our method outperforms POSE for all test sequences. Notably, there is
a significant improvement in accuracy for Rie-Set IV, which demonstrates the
generalization capacity of our method to unseen illumination and instruments.
Table 2 also reflects this improvement in the strict PCP scores which indicate
that our method is nearly twice as accurate as the baseline method [12].

Fig. 3. Szn-dataset: Sequential and combined evaluation for sequences 1–3. Above
93 %, the results are so close that the individual graphs are not distinguishable.

Fig. 4. Rie-dataset: Cross validation evaluation – the offline forests are learned on
three sequences and tested on the unseen one.

Table 2. Strict PCP for cross validation of Rie-dataset for Left and Right fork.

Methods Set I (L/R) Set II (L/R) Set III (L/R) Set IV (L/R)
Our work 89.0/88.5 98.5/99.5 99.5/99.5 94.5/95.0
POSE [12] 69.7/58.5 93.94/93.43 94.47/94.47 46.46/57.71

4 Conclusion
In this work, we propose a closed-loop framework for tool tracking and pose
estimation, which runs at 40 fps. The combination of separate predictors yields
a robustness that withstands the challenges of retinal microsurgery sequences.
The work further shows the method’s capability to generalize to unseen instru-
ments and illumination changes via online adaption. These key properties allow
our method to outperform the state of the art on two benchmark datasets.

References
1. Ehlers, J.P., Kaiser, P.K., Srivastava, S.K.: Intraoperative optical coherence tomog-
raphy using the rescan 700: preliminary results from the discover study. Br. J.
Ophthalmol. 98, 1329–1332 (2014)
2. Reiter, A., Allen, P.K.: An online learning approach to in-vivo tracking using syn-
ergistic features. In: IROS, pp. 3441–3446 (2010)
3. Bouget, D., Benenson, R., Omran, M., Riffaud, L., Schiele, B., Jannin, P.: Detecting
surgical tools by modelling local appearance and global shape. IEEE Trans. Med.
Imaging 34(12), 2603–2617 (2015)
4. Allan, M., Chang, P.L., Ourselin, S., Hawkes, D., Sridhar, A., Kelly, J.,
Stoyanov, D.: Image based surgical instrument pose estimation with multi-class
labelling and optical flow. In: MICCAI, pp. 331–338 (2015)
5. Wolf, R., Duchateau, J., Cinquin, P., Voros, S.: 3D tracking of laparoscopic instru-
ments using statistical and geometric modeling. In: Fichtinger, G., Martel, A.,
Peters, T. (eds.) MICCAI 2011, Part I. LNCS, vol. 6891, pp. 203–210. Springer,
Heidelberg (2011)
6. Pezzementi, Z., Voros, S., Hager, G.D.: Articulated object tracking by rendering
consistent appearance parts. In: ICRA, pp. 3940–3947 (2009)
7. Richa, R., Balicki, M., Meisner, E., Sznitman, R., Taylor, R., Hager, G.: Visual
tracking of surgical tools for proximity detection in retinal surgery. In: Taylor, R.H.,
Yang, G.-Z. (eds.) IPCAI 2011. LNCS, vol. 6689, pp. 55–66. Springer, Heidelberg
(2011)
8. Sznitman, R., Basu, A., Richa, R., Handa, J., Gehlbach, P., Taylor, R.H.,
Jedynak, B., Hager, G.D.: Unified detection and tracking in retinal micro-
surgery. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011, Part I.
LNCS, vol. 6891, pp. 1–8. Springer, Heidelberg (2011)
9. Sznitman, R., Ali, K., Richa, R., Taylor, R.H., Hager, G.D., Fua, P.: Data-driven
visual tracking in retinal microsurgery. In: Ayache, N., Delingette, H., Golland, P.,
Mori, K. (eds.) MICCAI 2012, Part II. LNCS, vol. 7511, pp. 568–575. Springer,
Heidelberg (2012)

10. Sznitman, R., Becker, C., Fua, P.: Fast part-based classification for instrument
detection in minimally invasive surgery. In: Golland, P., Hata, N., Barillot, C.,
Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part II. LNCS, vol. 8674, pp. 692–
699. Springer, Heidelberg (2014)
11. Li, Y., Chen, C., Huang, X., Huang, J.: Instrument tracking via online learn-
ing in retinal microsurgery. In: Golland, P., Hata, N., Barillot, C., Hornegger, J.,
Howe, R. (eds.) MICCAI 2014, Part I. LNCS, vol. 8673, pp. 464–471. Springer,
Heidelberg (2014)
12. Rieke, N., Tan, D.J., Alsheakhali, M., Tombari, F., Amat di San Filippo, C.,
Belagiannis, V., Eslami, A., Navab, N.: Surgical tool tracking and pose estima-
tion in retinal microsurgery. In: MICCAI, pp. 266–273 (2015)
13. Haykin, S.S.: Kalman Filtering and Neural Networks. Wiley, Hoboken (2001)
14. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection
with discriminatively trained part-based models. PAMI 32(9), 1627–1645 (2010)
15. Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction
for human pose estimation. In: CVPR, pp. 1–8 (2008)
Integrated Dynamic Shape Tracking and RF
Speckle Tracking for Cardiac Motion Analysis

Nripesh Parajuli1(B), Allen Lu2, John C. Stendahl3, Maria Zontak4,
Nabil Boutagy3, Melissa Eberle3, Imran Alkhalil3, Matthew O’Donnell4,
Albert J. Sinusas3,5, and James S. Duncan1,2,5
1 Departments of Electrical Engineering, Yale University, New Haven, CT, USA
  nripesh.parajuli@yale.edu
2 Biomedical Engineering, Yale University, New Haven, CT, USA
3 Internal Medicine, Yale University, New Haven, CT, USA
4 Department of Bioengineering, University of Washington, Seattle, WA, USA
5 Radiology and Biomedical Imaging, Yale University, New Haven, CT, USA

Abstract. We present a novel dynamic shape tracking (DST) method
that solves for Lagrangian motion trajectories originating at the left ven-
tricle (LV) boundary surfaces using a graphical structure and Dijkstra’s
shortest path algorithm.
These trajectories, which are temporally regularized and accrue min-
imal drift, are augmented with radio-frequency (RF) speckle tracking
based mid-wall displacements and dense myocardial deformation fields
and strains are calculated.
We used this method on 4D Echocardiography (4DE) images acquired
from 7 canine subjects and validated the strains using a cuboidal array
of 16 sonomicrometric crystals that were implanted on the LV wall. The
4DE based strains correlated well with the crystal based strains. We
also created an ischemia on the LV wall and evaluated how strain values
change across ischemic, non-ischemic remote and border regions (with
the crystals planted accordingly) during baseline, severe occlusion and
severe occlusion with dobutamine stress conditions. We were able to
observe some interesting strain patterns for the different physiological
conditions, which were in good agreement with the crystal based strains.

1 Introduction
Characterization of left ventricular myocardial deformation is useful for the
detection and diagnosis of cardiovascular diseases. Conditions such as ischemia
and infarction undermine the contractile property of the LV and analyzing
Lagrangian strains is one way of identifying such abnormalities.
Numerous methods calculate dense myocardial motion fields and then com-
pute strains using echo images. Despite the substantial interest in Lagrangian
motion and strains, and some recent contributions in spatio-temporal tracking
[1,2], most methods typically calculate frame-to-frame or Eulerian displacements
first and then obtain Lagrangian trajectories.

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 431–438, 2016.
DOI: 10.1007/978-3-319-46720-7 50

Therefore, we propose a novel dynamic shape tracking (DST) method that
first provides Lagrangian motion trajectories of points in the LV surfaces and
then computes dense motion fields to obtain Lagrangian strains. This approach
aims to reduce the drift problem which is prevalent in many frame-to-frame
tracking methods.
We first segment our images and obtain point clouds of LV surfaces, which
are then set up as nodes in a directed acyclic graph. Nearest neighbor relation-
ships define the edges, which have weights based on the Euclidean distance and
difference in shape properties between neighboring and starting points. Find-
ing the trajectory is then posed as a shortest path problem and solved using
Dijkstra’s algorithm and dynamic programming.
Once we obtain trajectories, we calculate dense displacement fields for each
frame using radial basis functions (RBFs), which are regularized using spar-
sity and incompressibility constraints. Since the trajectories account for motion
primarily in the LV boundaries, we also include mid-wall motion vectors from
frame-to-frame RF speckle tracking. This fusion strategy was originally pro-
posed in [3] and expanded in [4]. Ultimately, we calculate Lagrangian strains
and validate our results by comparing with sonomicrometry based strains.

2 Methods

2.1 Dynamic Shape Tracking (DST)


We first segment our images to obtain endocardial and epicardial surfaces using
an automated (except the first frame) level-set segmentation method [5]. There
are surfaces from N frames, each with K points, and each point has M neighbors
(Fig. 1a, with M = 3). x ∈ R^{N×K×3} is the point matrix, F ∈ R^{N×K×S} is the
shape descriptor matrix (where S is the descriptor length) and η ∈ R^{N×K×M}
is the neighborhood matrix (the shape context feature is described in [6]). The j-th
point of the i-th frame is indexed as x_{i,j} (i ∈ [1 : N] and j ∈ [1 : K]).
Let A_j(i) : N_N → N_K be the set that indexes the points in the trajectory
starting at point x_{1,j}. For any point x_{i,A_j(i)} in the trajectory, we assume that:

1. It will not move too far away from the previous point x_{i−1,A_j(i−1)} and the
   starting point x_{1,j}. The same applies to its shape descriptor F_{i,A_j(i)}.
2. Its displacement will not differ substantially from that of the previous point.
   The same applies to its shape descriptors. We call these the 2nd-order weights.

Trajectories satisfying the above conditions will have closely spaced con-
secutive points with similar shape descriptors (Fig. 1b). They will also have
points staying close to the starting point and their shape descriptor remain-
ing similar to the starting shape descriptor, causing them to be more closed.
The 2nd order weights enforce smoothness and shape consistency.

Fig. 1. Points, shape descriptors and trajectories. (a) Endocardial surfaces, derived
points and the graphical grid structure with a trajectory. (b) Abstraction of points
(circles) and shape descriptors (dotted curves) in a trajectory.

Let A represent the set of all possible trajectories originating from x_{1,j}.
Hence, the optimal Â_j ∈ A minimizes the following energy:

    Â_j = argmin_{A_j ∈ A} Σ_{i=2}^{N} [ λ1 ||x_{i,A_j(i)} − x_{i−1,A_j(i−1)}|| + λ2 ||x_{i,A_j(i)} − x_{1,j}||
          + λ3 ||F_{i,A_j(i)} − F_{i−1,A_j(i−1)}|| + λ4 ||F_{i,A_j(i)} − F_{1,j}|| ] + 2nd-order weights    (1)
Graph based techniques have been used in particle tracking applications such
as [7]. We set each point x_{i,j} as a node in our graph. Directed edges exist between
a point x_{i,j} and its neighbors η_{i,j} in frame i + 1. Each edge has an associated
cost of traversal (defined by Eq. 1). The optimal trajectory Â_j is the one that
accrues the smallest cost in traversing from point x_{1,j} to the last frame. This can
be solved using Dijkstra’s shortest path algorithm.

Algorithm. Because our search is causal, we do not compute all edge weights
in advance. We start at x_{1,J} and proceed frame-by-frame, computing edge costs
between points and their neighbors and dynamically updating a cost matrix
E ∈ R^{N×K} and a correspondence matrix P ∈ R^{N×K}. The search for the
trajectory A_J stemming from a point j = J in frame 1 is described in
Algorithm 1.
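Because the graph is a DAG layered by frame index, the shortest-path search of Algorithm 1 reduces to one forward dynamic-programming sweep plus a backtrack; the toy numpy sketch below keeps only the first-order distance terms of Eq. 1 (point clouds, neighborhoods, and weights are illustrative, and 2D points stand in for the 3D surfaces):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 6, 15, 4                 # frames, points per frame, neighbors
x = rng.random((N, K, 2))          # toy 2D point clouds (3D in the paper)
lam1, lam2 = 1.0, 0.1
J = 0                              # trajectory starts at x[0, J]

E = np.full((N, K), np.inf)        # accumulated cost matrix
P = np.zeros((N, K), dtype=int)    # correspondence (back-pointer) matrix
E[0, J] = 0.0
for i in range(N - 1):
    for j in range(K):
        if not np.isfinite(E[i, j]):
            continue               # node not reachable from x[0, J]
        # The M nearest neighbours in frame i+1 define the outgoing edges.
        nbrs = np.argsort(np.linalg.norm(x[i + 1] - x[i, j], axis=1))[:M]
        for l in nbrs:
            e = E[i, j] + lam1 * np.linalg.norm(x[i + 1, l] - x[i, j]) \
                        + lam2 * np.linalg.norm(x[i + 1, l] - x[0, J])
            if e < E[i + 1, l]:
                E[i + 1, l], P[i + 1, l] = e, j

# Backtrack the optimal trajectory A_J from the cheapest last-frame node.
A = [int(np.argmin(E[N - 1]))]
for i in range(N - 1, 0, -1):
    A.append(int(P[i, A[-1]]))
A = A[::-1]
print(len(A), A[0])  # N indices, starting at J
```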

2.2 Speckle Tracking


We use speckle information, which is consistent in a small temporal window
(given sufficient frequency), from the raw radio frequency (RF) data from our

Algorithm 1. DST Algorithm

 1: Inputs: x, F and η
 2: Initialize: E_{i,j} = ∞ and P_{i,j} = 0 ∀(i, j) ∈ N_{N×K}; E_{1,J} = 0
 3: for i = 1, . . . , N − 1 do
 4:   for j = 1, . . . , K do
 5:     for l ∈ η_{i,j} do
 6:       e_temp ← E_{i,j} + λ1 ||x_{i+1,l} − x_{i,j}|| + λ2 ||x_{i+1,l} − x_{1,J}|| +
                  λ3 ||F_{i+1,l} − F_{i,j}|| + λ4 ||F_{i+1,l} − F_{1,J}|| + 2nd-order weights
 7:       if e_temp < E_{i+1,l} then
 8:         E_{i+1,l} ← e_temp
 9:         P_{i+1,l} ← j
10: A_J(N) = argmin_j E_{N,j}
11: for i = N − 1, . . . , 1 do
12:   A_J(i) = P_{i+1,A_J(i+1)}

echo acquisitions, and correlate them from frame-to-frame to provide mid-wall
displacement values.
A kernel of one speckle length, around a pixel in the complex signal, is cor-
related with neighboring kernels in the next frame [8]. The peak correlation
value is used to determine the matching kernel in the next frame and to calculate
a displacement and a confidence measure. The velocity in the echo beam direction
is further refined using zero-crossings of the phase of the complex correlation
function.
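A real-valued stand-in for this correlation search is sketched below (the actual method of [8] correlates the complex RF signal and refines the axial estimate with phase zero-crossings, both omitted here; kernel and search sizes are assumed):

```python
import numpy as np

def track_kernel(prev, curr, center, ksize=7, search=5):
    """Frame-to-frame block matching: normalize a kernel around `center` in
    `prev` and correlate it with shifted kernels in `curr`. The peak location
    gives the displacement and the peak value a confidence measure."""
    r = ksize // 2
    cy, cx = center
    ref = prev[cy - r:cy + r + 1, cx - r:cx + r + 1].ravel()
    ref = (ref - ref.mean()) / (ref.std() + 1e-12)
    best, disp = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = curr[cy + dy - r:cy + dy + r + 1,
                        cx + dx - r:cx + dx + r + 1].ravel()
            cand = (cand - cand.mean()) / (cand.std() + 1e-12)
            rho = float(ref @ cand) / ref.size    # normalized correlation
            if rho > best:
                best, disp = rho, (dy, dx)
    return disp, best

rng = np.random.default_rng(2)
prev = rng.random((64, 64))
curr = np.roll(prev, shift=(2, -1), axis=(0, 1))   # ground-truth motion (2, -1)
disp, conf = track_kernel(prev, curr, center=(32, 32))
print(disp)  # (2, -1)
```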

2.3 Integrated Dense Displacement Field

We use an RBF-based representation to solve for a dense displacement field U
that adheres to the dynamic shape (U^{sh}) and speckle tracking (U^{sp}) results and
regularize the field in the same manner as [4]:

    ŵ = argmin_w  f_adh(Hw, U^{sh}, U^{sp}) + λ1 ||w||_1 + λ2 f_biom(U)        (2)

    f_biom(U) = ||∇ · U||_2^2 + α ||∇U||_2^2.

Here, U is parametrized as Hw, where H represents the RBF matrix and w
represents the weights on the bases; f_adh is the squared loss function, and f_biom
is the penalty on divergences and derivatives, which, along with the l1-norm penalty, results
in smooth and biomechanically consistent displacement fields. This is a convex
optimization problem and can be solved efficiently. The λ’s here and in Eq. 1 are
chosen heuristically and scaled based on the number of frames in the images.
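If the quadratic f_biom terms are dropped for brevity, the remaining adhesion-plus-l1 problem is a lasso, solvable by proximal gradient descent (ISTA); a self-contained sketch on synthetic data follows (the authors' actual solver is not specified in this excerpt, and the quadratic penalties could be folded into H in the same framework):

```python
import numpy as np

def ista(H, u, lam, n_iter=400):
    """Proximal-gradient (ISTA) sketch for sparse RBF weights:
    min_w ||H w - u||_2^2 + lam * ||w||_1
    (the adhesion and l1 terms of Eq. (2))."""
    L = 2.0 * np.linalg.norm(H, 2) ** 2        # Lipschitz constant of the gradient
    w = np.zeros(H.shape[1])
    for _ in range(n_iter):
        z = w - 2.0 * H.T @ (H @ w - u) / L    # gradient step on the data term
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(3)
H = rng.normal(size=(40, 60))                  # toy RBF matrix (underdetermined)
w_true = np.zeros(60)
w_true[[3, 17, 42]] = [1.0, -2.0, 0.5]         # a sparse weight vector
u = H @ w_true                                 # noiseless "measurements"
w = ista(H, u, lam=0.1)
print(w.shape)  # (60,)
```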

3 Experiment and Results


4DE Data (Acute Canine Studies). We acquired 4DE images from 7 acute
canine studies (open chested, transducer suspended in a water bath) in the fol-
lowing conditions: baseline (BL), severe occlusion of the left anterior descending
(LAD) coronary artery (SO), and severe LAD occlusion with low-dose dobuta-
mine stress (SODOB, dosage: 5 μg/kg/min). All BL images were used, while 2
SO and SODOB images were excluded due to a lack of good crystal data. Philips
iE33 ultrasound system, with the X7-2 probe and a hardware attachment that
provided RF data, was used for acquisition. All experiments were conducted in
compliance with the Institutional Animal Care and Use Committee policies.

Sonomicrometry Data. We utilized an array of sonomicrometry crystals
(Sonometrics Corp) to validate strains calculated via echo. 16 crystals were
implanted in the anterior LV wall. They were positioned with respect to the LAD
occlusion and perfusion characteristics within the crystal area, which are defined
accordingly: ischemic (ISC), border (BOR) and remote (REM) (see Fig. 2). Three
crystals were implanted in the apical (1) and basal (2) regions (similar to [9]).
We adapted the 2D sonomicrometry based strain calculation method outlined in
[10] to our 3D case. Sonomicrometric strains were calculated for each cube and
used for the validation of echo based strains.
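For a single crystal pair, a strain computed from inter-crystal distances reduces to the 1D Green-Lagrange strain of the segment (a simplified scalar version of the 3D method of [10], which uses the full set of pairwise lengths):

```python
def lagrangian_strain(l0, l):
    """1D Green-Lagrange strain of one crystal pair from its initial (l0) and
    current (l) inter-crystal distance, the quantity sonomicrometry measures:
    E = (l^2 - l0^2) / (2 * l0^2)."""
    return (l * l - l0 * l0) / (2.0 * l0 * l0)

print(lagrangian_strain(10.0, 11.0))  # 0.105 (stretch)
print(lagrangian_strain(10.0, 9.0))   # -0.095 (shortening)
```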

Fig. 2. Crystals and their relative position in the LV. (a) 3 cuboidal lattices.
(b) Crystals aligned to the LV.

Agreement with Crystal Data. In Fig. 3a, we show, for one baseline image,
strains calculated using our method (echo) and using sonomicrometry (crys). We
can see that the strain values correlate well and drift error is minimal. In Fig. 3b,
we present bar graphs to explicitly quantify the final frame drift as the distance
between actual crystal position and results from tracking. We compare the results
from this method (DST) against that of GRPM (described in [4,11]), which is a
frame-to-frame tracking method, in BL and SO conditions for 5 canine datasets.
The last-frame errors for crystal position were significantly lower for our method
in both BL and SO conditions (p < .01).

Strain Correlation. Pearson correlations of echo-based strain curves (calcu-
lated using shape-based tracking only (SHP) and shape and speckle tracking
combined (COMB)) with the corresponding crystal-based strain curves, across

the cubic array regions (ISC, BOR, REM) for all conditions are summarized in
Table 1. We see slightly improved correlations from the SHP to the COMB method.
Correlation values were generally lower for the ischemic region and for longitudinal
strains with both methods.
Since we only had a few data points to carry out statistical analysis in this
format, we also calculated overall correlations (with strain values included for
all time points and conditions together, n > 500) and computed statistical sig-
nificance using Fisher’s transformation. Change in radial strains (SHP r = .72
to COMB r = .82) was statistically significant (p < .01), while circumferential
(SHP r = .73 to COMB r = .75) and longitudinal (SHP r = .44 to COMB
r = .41) were not.
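The Fisher-transformation comparison of two independent correlations can be sketched as follows (standard formula; n = 500 per sample is an assumption consistent with the stated n > 500):

```python
import math

def fisher_z_test(r1, r2, n1, n2):
    """z statistic for the difference of two independent Pearson correlations,
    via Fisher's variance-stabilizing transformation z = atanh(r)."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se

# Radial strains: COMB r = .82 vs SHP r = .72, assumed n = 500 each.
z = fisher_z_test(0.82, 0.72, 500, 500)
print(round(z, 2))  # 3.93, beyond the two-sided p < .01 cutoff of 2.58
```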

Table 1. Mean correlation values across regions for SHP and COMB methods.

         Radial           Circ             Long
         ISC  BOR  REM    ISC  BOR  REM    ISC  BOR  REM
SHP      .75  .85  .82    .77  .86  .87    .30  .66  .80
COMB     .72  .85  .86    .77  .87  .92    .35  .71  .75

Fig. 3. Peak strain bar graphs (with mean and standard deviations) for radial, circum-
ferential and longitudinal strains - shown across ISC, BOR and REM regions for echo
and crystal based strains.

Physiological Response. Changes in the crystal- and echo-based (using the
combined method) strain magnitudes across the physiological testing conditions
(BL, SO and SODOB) are shown in Fig. 3.
Both echo and crystal strain magnitudes generally decreased with severe
occlusion and increased beyond baseline levels with low dose dobutamine stress.
The fact that functional recovery was observed with dobutamine stress indicates
that, at the dose given, ischemia was not enhanced. Rather, it appears that
the vasodilatory and inotropic effects of dobutamine were able to overcome the
effects of the occlusion.
However, on average, the strain magnitude recovery is smaller in the ISC region
than in the BOR and REM regions for both echo and crystals. For echo, the
overall physiological response was more pronounced for radial strains.

4 Conclusion
The DST method provides improved temporal regularization, and therefore
drift errors are reduced, especially in the diastolic phase. A combined
dense field calculation method that integrates the DST results with RF speckle
tracking results provided good strains, which we validated by comparison with
sonomicrometry-based strains. The correlation values were especially good for
radial and circumferential strains.
We also studied how strains vary across the ISC, BOR and REM regions
(defined by the cuboidal array of crystals in the anterior LV wall) during the
BL, SO and SODOB conditions. Strain magnitudes (particularly radial) varied
in keeping with the physiological conditions, and also in good agreement with
the crystal based strains.
We seek to improve our method, since the longitudinal strains and the strains
in the ischemic region were less accurate. Also, the DST algorithm
occasionally resulted in higher error at end systole. Therefore, in the future, we
will enforce spatial regularization directly by solving for neighboring trajectories
together, where the edge weights will be influenced by the neighboring trajecto-
ries. We would also like to extend the method to work with point sets generated
by feature-generation processes other than segmentation.

Acknowledgment. Several members of Dr. Albert Sinusas’s lab, including Christi
Hawley and James Bennett, were involved in the image acquisitions. Dr. Xiaojie Huang
provided code for image segmentation. We would like to sincerely thank everyone for
their contributions. This work was supported in part by the National Institute of Health
(NIH) grant number 5R01HL121226.

References
1. Craene, M., Piella, G., Camara, O., Duchateau, N., Silva, E., Doltra, A.,
Dhooge, J., Brugada, J., Sitges, M., Frangi, A.F.: Temporal diffeomorphic free-
form deformation: application to motion and strain estimation from 3D echocar-
diography. Med. Image Anal. 16(2), 427–450 (2012)

2. Ledesma-Carbayo, M.J., Kybic, J., Desco, M., Santos, A., Sühling, M.,
Hunziker, P., Unser, M.: Spatio-temporal nonrigid registration for ultrasound car-
diac motion estimation. IEEE Trans. Med. Imaging 24(9), 1113–1126 (2005)
3. Compas, C.B., Wong, E.Y., Huang, X., Sampath, S., Lin, B.A., Pal, P.,
Papademetris, X., Thiele, K., Dione, D.P., Stacy, M., et al.: Radial basis functions
for combining shape and speckle tracking in 4D echocardiography. IEEE Trans.
Med. Imaging 33(6), 1275–1289 (2014)
4. Parajuli, N., Compas, C.B., Lin, B.A., Sampath, S., ODonnell, M., Sinusas, A.J.,
Duncan, J.S.: Sparsity and biomechanics inspired integration of shape and speckle
tracking for cardiac deformation analysis. In: van Assen, H., Bovendeerd, P.,
Delhaas, T. (eds.) FIMH 2015. LNCS, vol. 9126, pp. 57–64. Springer, Heidelberg
(2015)
5. Huang, X., Dione, D.P., Compas, C.B., Papademetris, X., Lin, B.A., Bregasi, A.,
Sinusas, A.J., Staib, L.H., Duncan, J.S.: Contour tracking in echocardiographic
sequences via sparse representation and dictionary learning. Med. Image Anal.
18(2), 253–271 (2014)
6. Belongie, S., Malik, J., Puzicha, J.: Shape context: a new descriptor for shape
matching and object recognition. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.)
Advances in Neural Information Processing Systems, pp. 831–837. MIT Press,
Cambridge (2001)
7. Shafique, K., Shah, M.: A noniterative greedy algorithm for multiframe point cor-
respondence. IEEE Trans. Pattern Anal. Mach. Intell. 27(1), 51–65 (2005)
8. Chen, X., Xie, H., Erkamp, R., Kim, K., Jia, C., Rubin, J., O’Donnell, M.: 3-D
correlation-based speckle tracking. Ultrason. Imaging 27(1), 21–36 (2005)
9. Dione, D., Shi, P., Smith, W., DeMan, P., Soares, J., Duncan, J., Sinusas, A.: Three-
dimensional regional left ventricular deformation from digital sonomicrometry. In:
Proceedings of the 19th Annual International Conference of the IEEE Engineering
in Medicine and Biology Society, vol. 2, pp. 848–851. IEEE (1997)
10. Waldman, L.K., Fung, Y., Covell, J.W.: Transmural myocardial deformation in
the canine left ventricle. Normal in vivo three-dimensional finite strains. Circ. Res.
57(1), 152–163 (1985)
11. Lin, N., Duncan, J.S.: Generalized robust point matching using an extended free-
form deformation model: application to cardiac images. In: 2004 IEEE Interna-
tional Symposium on Biomedical Imaging: Nano to Macro, pp. 320–323. IEEE
(2004)
The Endoscopogram: A 3D Model
Reconstructed from Endoscopic Video Frames

Qingyu Zhao1(B), True Price1, Stephen Pizer1,2, Marc Niethammer1,
Ron Alterovitz1, and Julian Rosenman1,2
1 Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
  zenyo@cs.unc.edu
2 Radiation Oncology, University of North Carolina at Chapel Hill,
  Chapel Hill, USA

Abstract. Endoscopy enables high resolution visualization of tissue tex-
ture and is a critical step in many clinical workflows, including diagnosis
and radiation therapy treatment planning for cancers in the nasophar-
ynx. However, an endoscopic video does not provide explicit 3D spatial
information, making it difficult to use in tumor localization, and it is
inefficient to review. We introduce a pipeline for automatically recon-
structing a textured 3D surface model, which we call an endoscopogram,
from multiple 2D endoscopic video frames. Our pipeline first reconstructs
a partial 3D surface model for each input individual 2D frame. In the
next step (which is the focus of this paper), we generate a single high-
quality 3D surface model using a groupwise registration approach that
fuses multiple, partially overlapping, incomplete, and deformed surface
models together. We generate endoscopograms from synthetic, phantom,
and patient data and show that our registration approach can account for
tissue deformations and reconstruction inconsistency across endoscopic
video frames.

1 Introduction
Modern radiation therapy treatment planning relies on imaging modalities like
CT for tumor localization. For throat cancer, an additional kind of medical
imaging, called endoscopy, is also taken at treatment planning time. Endoscopic
videos provide direct optical visualization of the pharyngeal surface and provide
information, such as a tumor’s texture and superficial (mucosal) spread, that is
not available on CT due to CT’s relatively low contrast and resolution. However,
the use of endoscopy for treatment planning is significantly limited by the fact
that (1) the 2D frames from the endoscopic video do not explicitly provide 3D
spatial information, such as the tumor’s 3D location; (2) reviewing the video
is time-consuming; and (3) the optical views do not provide the full geometric
conformation of the throat.
In this paper, we introduce a pipeline for reconstructing a 3D textured surface
model of the throat, which we call an endoscopogram, from 2D video frames.
The model provides (1) more complete 3D pharyngeal geometry; (2) efficient

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 439–447, 2016.
DOI: 10.1007/978-3-319-46720-7 51

visualization; and (3) the opportunity to register endoscopy data with the CT,
thereby enabling transfer of the tumor contours and texture into the CT space.
State-of-the-art monocular endoscopic reconstruction techniques have been
applied in applications like colonoscopy inspection [1], laparoscopic surgery [2]
and orthopedic surgeries [3]. However, most existing methods cannot simul-
taneously deal with the following three challenges: (1) non-Lambertian sur-
faces; (2) non-rigid deformation of tissues across frames; and (3) poorly known
shape or motion priors. Our proposed pipeline deals with these problems using
(1) a Shape-from-Motion-and-Shading (SfMS) method [4] incorporating a new
reflectance model for generating single-frame-based partial reconstructions; and
(2) a novel geometry fusion algorithm for non-rigid fusion of multiple partial
reconstructions. Since our pipeline does not assume any prior knowledge on envi-
ronments, motion and shapes, it can be readily generalized to other endoscopic
applications in addition to our nasopharyngoscopy reconstruction problem.
In this paper we focus on the geometry fusion step mentioned above. The
challenge here is that all individual reconstructions are only partially overlapping
due to the constantly changing camera viewpoint, may have missing data (holes)
due to camera occlusion, and may be slightly deformed since the tissue may have
deformed between 2D frame acquisitions. Our main contribution in this paper is
the design of a novel groupwise surface registration algorithm that can deal with
these limitations. An additional contribution is an outlier geometry trimming
algorithm based on robust regression. We generate endoscopograms and validate
our registration algorithm with data from synthetic CT surface deformations and
endoscopic video of a rigid phantom and real patients.

2 Endoscopogram Reconstruction Pipeline

The input to our system (Fig. 1) is a video sequence of hundreds of consecutive frames {Fi | i = 1...N}. The output is an endoscopogram, which is a textured 3D
surface model derived from the input frames. We first generate for each frame
Fi a reconstruction Ri by the SfMS method. We then fuse multiple single-frame
reconstructions {Ri } into a single geometry R. Finally, we texture R by pulling
color from the original frames {Fi }. We will focus on the geometry fusion step
in Sect. 3 and briefly introduce the other techniques in the rest of this section.

Fig. 1. The endoscopogram reconstruction pipeline.


The Endoscopogram 441

Shape from Motion and Shading (SfMS). Our novel reconstruction method
[4] has been shown to be efficient in single-camera reconstruction of live
endoscopy data. The method leverages sparse geometry information obtained
by Structure-from-Motion (SfM), Shape-from-Shading (SfS) estimation, and a
novel reflectance model to characterize non-Lambertian surfaces. In summary, it
iteratively estimates the reflectance model parameters and a SfS reconstruction
surface for each individual frame under sparse SfM constraints derived within a
sliding time window. One drawback of this method is that large tissue deforma-
tion and lighting changes across frames can induce inconsistent individual SfS
reconstructions. Nevertheless, our experiments show that this kind of error can
be well compensated in the subsequent geometry fusion step. In the end, for
each frame Fi , a reconstruction Ri is produced as a triangle mesh and trans-
formed into the world space using the camera position parameters estimated
from SfM. Mesh faces that are nearly tangent to the camera viewing ray are
removed because they correspond to occluded regions. The end result of this is
that the reconstructions {Ri } have missing patches and different topology and
are only partially overlapping with each other.
Texture Mapping. The goal of texture mapping is to assign a color to each
vertex v k (superscripts refer to vertex index) in the fused geometry R, which is
estimated by the geometry fusion (Sect. 3) of all the registered individual frame
surfaces {Ri }. Our idea is to find a corresponding point of v k in a registered
surface Ri and to trace back its color in the corresponding frame Fi . Since v k
might have correspondences in multiple registered surfaces, we formulate this
procedure as a labeling problem and optimize a Markov Random Field (MRF)
energy function. In general, the objective function prefers pulling color from non-
boundary nearby points in {Ri }, while encouraging regional label consistency.

3 Geometry Fusion

This section presents the main methodological contributions of this paper: a novel groupwise surface registration algorithm based on N-body interaction, and
an outlier-geometry trimming algorithm based on robust regression.
Related Work. Given the set of partial reconstructions {Ri }, our goal is to
non-rigidly deform them into a consistent geometric configuration, thus compen-
sating for tissue deformation and minimizing reconstruction inconsistency among
different frames. Current groupwise surface registration methods often rely on
having or iteratively estimating the mean geometry (template) [5]. However, in
our situation, the topology change and partially overlapping data renders initial
template geometry estimation almost impossible. Missing large patches also pose
serious challenges to the currents metric [6] for surface comparison. Template-
free methods have been studied for images [7], but it has not been shown that
such methods can be generalized to surfaces. The joint spectral graph frame-
work [8] can match a group of surfaces without estimating the mean, but these
methods do not explicitly compute deformation fields for geometry fusion.

Zhao et al. [9] proposed a pairwise surface registration algorithm, Thin Shell
Demons, that can handle topology change and missing data. We have extended
this algorithm to the groupwise setting.
Thin Shell Demons. Thin Shell Demons is a physics-motivated method that
uses geometric virtual forces and a thin shell model to estimate surface deforma-
tion. The so-called forces {f } between two surfaces {R1 , R2 } are vectors connect-
ing automatically selected corresponding vertex pairs, i.e. {f (v k ) = uk −v k | v k ∈
R1 , uk ∈ R2 } (with some abuse of notation, we use k here to index correspon-
dences). The algorithm regards the surfaces as elastic thin shells and produces a
non-parametric deformation vector field φ : R1 → R2 by iteratively minimizing the energy function E(φ) = Σ_{k=1}^M c(v^k)(φ(v^k) − f(v^k))² + Eshell(φ). The first
part penalizes inconsistency between the deformation vector and the force vector
applied on a point and uses a confidence score c to weight the penalization. The
second part minimizes the thin shell deformation energy, which is defined as the
integral of local bending and membrane energy:


Eshell(φ) = ∫_R λ1 W(σmem(p)) + λ2 W(σbend(p)) dp,   (1)
W(σ) = Y/(1 − τ²) ((1 − τ) tr(σ²) + τ tr(σ)²),   (2)

where Y and τ are the Young’s modulus and Poisson’s ratio of the shell. σmem
is the tangential Cauchy-Green strain tensor characterizing local stretching. The
bending strain tensor σbend characterizes local curvature change and is computed
as the shape operator change.

3.1 N-Body Surface Registration

Our main observation is that the virtual force interaction is still valid among
N partial shells even without the mean geometry. Thus, we propose a group-
wise deformation scenario as an analog to the N-body problem: N surfaces are
deformed under the influence of their mutual forces. This groupwise attraction
can bypass the need of a target mean and still deform all surfaces into a sin-
gle geometric configuration. The deformation of a single surface is independent
and fully determined by the overall forces exerted on it. With the physical thin
shell model, its deformation can be topology-preserving and not influenced by its
partial-ness. With this notion in mind, we now have to define (1) mutual forces
among N partial surfaces; (2) an evolution strategy to deform the N surfaces.
Mutual Forces. In order to derive mutual forces, correspondences should be
credibly computed among N partial surfaces. It has been shown that by using
the geometric descriptor proposed in [10], a set of correspondences can be effec-
tively computed between partial surfaces. Additionally, in our application, each
surface Ri has an underlying texture image Fi . Thus, we also compute texture
correspondences between two frames by using standard computer vision tech-
niques. To improve matching accuracy, we compute inlier SIFT correspondences

only between frame pairs that are at most T seconds apart. Finally, these SIFT
matchings can be directly transformed to 3D vertex correspondences via the
SfMS reconstruction procedure.
In the end, any given vertex v_i^k ∈ Ri will have M_i^k corresponding vertices in other surfaces {Rj | j ≠ i}, given as vectors {f^β(v_i^k) = u^β − v_i^k, β = 1...M_i^k}, where u^β is the β-th correspondence of v_i^k in some other surface. These correspondences are associated with confidence scores {c^β(v_i^k)} defined by

c^β(v_i^k) = δ(u^β, v_i^k)   if (u^β, v_i^k) is a geometric correspondence,
c^β(v_i^k) = c̄              if (u^β, v_i^k) is a texture correspondence.   (3)

where δ is the geometric feature distance defined in [10]. Since we only consider inlier SIFT matchings using RANSAC, the confidence score for texture correspondences is a constant c̄. We then define the overall force exerted on v_i^k as the weighted average: f̄(v_i^k) = Σ_{β=1}^{M_i^k} c^β(v_i^k) f^β(v_i^k) / Σ_{β=1}^{M_i^k} c^β(v_i^k).
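As a small illustration (not the authors' implementation), the confidence-weighted force average above can be sketched in Python; the function name and array layout are our assumptions:

```python
import numpy as np

def mutual_force(v, corr_points, corr_scores):
    """Confidence-weighted average force on one vertex (hypothetical helper).

    v: (3,) vertex position; corr_points: (M, 3) corresponding points
    u^beta on other surfaces; corr_scores: (M,) confidences c^beta."""
    f = corr_points - v                      # force vectors f^beta = u^beta - v
    w = np.asarray(corr_scores, dtype=float)
    return (w[:, None] * f).sum(axis=0) / w.sum()
```

A vertex with no correspondences receives no force and is moved only by the thin-shell regularization.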

Deformation Strategy. With mutual forces defined, we can solve for the group deformation fields {φ_i} by optimizing independently for each surface

E(φ_i) = Σ_{k=1}^{M_i} c(v_i^k)(φ_i(v_i^k) − f̄(v_i^k))² + Eshell(φ_i),   (4)

where M_i is the number of vertices that have forces applied. Then, a groupwise
deformation scenario is to evolve the N surfaces by iteratively estimating the
mutual forces {f } and solving for the deformations {φi }. However, a potential
hazard of our algorithm is that without a common target template, the N sur-
faces could oscillate, especially in the early stage when the force magnitudes are
large and tend to overshoot the deformation. To this end, we observe that the
thin shell energy regularization weights λ1 , λ2 control the deformation flexibility.
Thus, to avoid oscillation, we design the strategy shown in Algorithm 1.

Algorithm 1. N-body Groupwise Surface Registration


1: Start with large regularization weights: λ1 (0), λ2 (0)
2: In iteration p, compute {f } from the current N surfaces {Ri (p)}
3: Optimize Eq. 4 independently for each surface to obtain {Ri (p + 1)}
4: λ1 (p + 1) = σ ∗ λ1 (p), λ2 (p + 1) = σ ∗ λ2 (p), with σ < 1
5: Go to step 2 until reaching maximum number of iterations.
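The outer loop of Algorithm 1 can be sketched as follows; `compute_forces` and `solve_deformation` are hypothetical stand-ins for the correspondence/force step and the per-surface thin-shell optimisation of Eq. 4:

```python
def nbody_register(surfaces, compute_forces, solve_deformation,
                   lam1=1.0, lam2=1.0, sigma=0.95, max_iters=100):
    """Sketch of Algorithm 1: N-body groupwise registration with
    geometrically decaying thin-shell regularization weights."""
    for _ in range(max_iters):
        forces = compute_forces(surfaces)                # step 2: mutual forces {f}
        surfaces = [solve_deformation(s, f, lam1, lam2)  # step 3: optimise Eq. 4
                    for s, f in zip(surfaces, forces)]
        lam1 *= sigma                                    # step 4: relax stiffness
        lam2 *= sigma
    return surfaces
```

Starting stiff (large λ) damps the early, large forces that would otherwise cause oscillation; as λ decays, the surfaces become free to deform into a common configuration.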

3.2 Outlier Geometry Trimming

The final step of geometry fusion is to estimate a single geometry R from the reg-
istered surfaces {Ri } [11]. However, this fusion step can be seriously harmed by
the outlier geometry created by SfMS. Outlier geometries are local surface parts

(a) (b) (c) (d)

Fig. 2. (a) 5 overlaying registered surfaces, one of which (pink) has a piece of outlier
geometry (circled) that does not correspond to anything else. (b) Robust quadratic
fitting (red grid) to normalized N (v k ). The outlier scores are indicated by the color.
(c) Color-coded W on L. (d) Fused surface after outlier geometry removal.

that are wrongfully estimated by SfMS under bad lighting conditions (insuffi-
cient lighting, saturation, or specularity) and are drastically different from all
other surfaces (Fig. 2a). The sub-surfaces do not correspond to any part in other
surfaces and thereby are carried over by the deformation process to {Ri }.
Our observation is that outlier geometry changes a local surface’s topology
(branching) and violates many differential geometry properties. We know that
the local surface around a point in a smooth 2-manifold can be approximately
represented by a quadratic Monge patch h : U → R3 , where U defines a 2D open
set in the tangent plane, and h is a quadratic height function. Our idea is that if
we robustly fit a local quadratic surface at a branching place, the surface points
on the wrong branch of outlier geometry will be counted as outliers (Fig. 2b).
We define the 3D point cloud L = {v^1, ..., v^P} of P points as the ensemble of all vertices in {Ri}, N(v^k) as the set of points in the neighborhood of v^k, and W as the set of outlier scores of L. For a given v^k, we transform N(v^k) by taking v^k as the center of origin and the normal direction of v^k as the z-axis. Then, we use Iteratively Reweighted Least Squares to fit a quadratic polynomial to the normalized N(v^k) (Fig. 2b). The method produces outlier scores for each of the points in N(v^k), which are then accumulated into W (Fig. 2c). We repeat this robust regression process for all v^k in L. Finally, we remove the outlier branches by thresholding the accumulated scores W, and the remaining largest point cloud is used to produce the final single geometry R [11] (Fig. 2d).
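The robust local fit can be illustrated with a small IRLS routine; this is a schematic reconstruction (the function name, the reweighting scheme, and the median-based scale estimate are our assumptions, not the paper's exact choices):

```python
import numpy as np

def irls_quadratic_outlier_scores(P, iters=10, eps=1e-6):
    """Fit z = a + b*x + c*y + d*x^2 + e*x*y + f*y^2 by iteratively
    reweighted least squares; return per-point outlier scores (residuals
    in robust-scale units).  P holds the neighbourhood points already
    transformed into the local tangent frame of v^k."""
    x, y, z = P[:, 0], P[:, 1], P[:, 2]
    A = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    w = np.ones(len(P))
    for _ in range(iters):
        coef, *_ = np.linalg.lstsq(A * w[:, None], z * w, rcond=None)
        r = np.abs(A @ coef - z)            # absolute residuals
        s = np.median(r) + eps              # robust scale (our choice)
        w = 1.0 / np.maximum(r / s, 1.0)    # downweight large residuals
    return r / s
```

Points on a wrong branch sit far from the fitted Monge patch, so their residuals (and accumulated scores) dominate and can be thresholded away.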

4 Results

We validate our groupwise registration algorithm by generating and evaluating endoscopograms from synthetic data, phantom data, and real patient endoscopic
videos. We selected algorithm parameters by tuning on a test patient’s data (sep-
arate from the datasets presented here). We set the thin shell elastic parameters
Y = 2, τ = 0.05, the energy weighting parameters λ1 = λ2 = 1, σ = 0.95, the
frame interval T = 0.5s, and the texture confidence score c̄ = 1.
Synthetic Data. We produced synthetic deformations to 6 patients’ head-and-
neck CT surfaces. Each surface has 3500 vertices and a 2–3 cm cross-sectional

Fig. 3. Left to right: error plot of synthetic data for 6 patients; a phantom endoscopic
video frame; the fused geometry with color-coded deviation (in millimeters) from the
ground truth CT.

diameter, covering from the pharynx down to the vocal cords. We created defor-
mations typically seen in real data, such as the stretching of the pharyngeal
wall and the bending of the epiglottis. We generated for each patient 20 par-
tial surfaces by taking depth maps from different camera positions in the CT
space. Only geometric correspondences were used in this test. We measured the
registration error as the average Euclidean distance of all pairs of correspond-
ing vertices after registration (Fig. 3). Our method significantly reduced error
and performed better than a spectral-graph-based method [10], which is another
potential framework for matching partial surfaces without estimating the mean.
Phantom Data. To test our method on real-world data in a controlled envi-
ronment, we 3D-printed a static phantom model (Fig. 3) from one patient’s CT
data and then collected endoscopic video and high-resolution CT for the model.
We produced SfMS reconstructions for 600 frames in the video, among which 20
reconstructions were uniformly selected for geometry fusion (using more surfaces
for geometry fusion won’t further increase accuracy, but will be computation-
ally slower). The SfMS results were downsampled to ∼2500 vertices and rigidly
aligned to the CT space. Since the phantom is rigid, the registration plays the
role of unifying inconsistent SfMS estimation. No outlier geometry trimming was
performed in this test. We define a vertex’s deviation as its distance to the near-
est point in the CT surface. The average deviation of all vertices is 1.24 mm for
the raw reconstructions and is 0.94 mm for the fused geometry, which shows
that the registration can help filter out inaccurate SfMS geometry estimation.
Figure 3 shows that the fused geometry resembles the ground truth CT surface
except in the farther part, where less data was available in the video.
Patient Data. We produced endoscopograms for 8 video sequences (300 frames
per sequence) extracted from 4 patient endoscopies. Outlier geometry trimming
was used since lighting conditions were often poor. We computed the overlap
distance (OD) defined in [12], which measures the average surface deviation
between all pairs of overlapping regions. The average OD of the 8 cases is 1.6 ±
0.13 mm before registration, 0.58 ± 0.05 mm after registration, and 0.24 ±
0.09 mm after outlier geometry trimming. Figure 4 shows one of the cases.

Fig. 4. OD plot on the point cloud of 20 surfaces. Left to right: before registration,
after registration, after outlier geometry trimming, the final endoscopogram.

5 Conclusion
We have described a pipeline for producing an endoscopogram from a video
sequence. We proposed a novel groupwise surface registration algorithm and
an outlier-geometry trimming algorithm. We have demonstrated via synthetic
and phantom tests that the N-body scenario is robust for registering partially-
overlapping surfaces with missing data. Finally, we produced endoscopograms for
real patient endoscopic videos. A current limitation is that the video sequence
is at most 3–4 s long for robust SfM estimation. Future work involves fusing
multiple endoscopograms from different video sequences.

Acknowledgements. This work was supported by NIH grant R01 CA158925.

References
1. Hong, D., Tavanapong, W., Wong, J., Oh, J., de Groen, P.C.: 3D reconstruction of
virtual colon structures from colonoscopy images. Comput. Med. Imaging Graph.
38(1), 22–23 (2014)
2. Maier-Hein, L., Mountney, P., Bartoli, A., Elhawary, H., Elson, D., Groch, A.,
Kolb, A., Rodrigues, M., Sorger, J., Speidel, S., Stoyanov, D.: Optical techni-
ques for 3D surface reconstruction in computer-assisted laparoscopic surgery. Med.
Image Anal. 17(8), 974–996 (2013)
3. Wu, C., Narasimhan, S.G., Jaramaz, B.: A multi-image shape-from-shading frame-
work for near-lighting perspective endoscopes. Int. J. Comput. Vis. 86(2), 211–228
(2010)
4. Price, T., Zhao, Q., Rosenman, J., Pizer, S., Frahm, J.M.: Shape from motion
and shading in uncontrolled environments. Under submission, to appear. http://midag.cs.unc.edu/
5. Durrleman, S., Prastawa, M., Korenberg, J.R., Joshi, S., Trouvé, A., Gerig, G.:
Topology preserving atlas construction from shape data without correspondence
using sparse parameters. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.)
MICCAI 2012, Part III. LNCS, vol. 7512, pp. 223–230. Springer, Heidelberg (2012)
6. Durrleman, S., Pennec, X., Trouvé, A., Ayache, N.: Statistical models of sets of
curves and surfaces based on currents. Med. Image Anal. 13(5), 793–808 (2009)
7. Balci, S.K., Golland, P., Shenton, M., Wells, W.M.: Free-form B-spline deformation
model for groupwise registration. In: MICCAI, pp. 23–30 (2007)
8. Arslan, S., Parisot, S., Rueckert, D.: Joint spectral decomposition for the par-
cellation of the human cerebral cortex using resting-state fMRI. In: Ourselin, S.,
Alexander, D.C., Westin, C.-F., Cardoso, M.J. (eds.) IPMI 2015. LNCS, vol. 9123,
pp. 85–97. Springer, Heidelberg (2015)

9. Zhao, Q., Price, J.T., Pizer, S., Niethammer, M., Alterovitz, R., Rosenman, J.:
Surface registration in the presence of topology changes and missing patches. In:
Medical Image Understanding and Analysis, pp. 8–13 (2015)
10. Zhao, Q., Pizer, S., Niethammer, M., Rosenman, J.: Geometric-feature-based spec-
tral graph matching in pharyngeal surface registration. In: Golland, P., Hata, N.,
Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part I. LNCS, vol. 8673,
pp. 259–266. Springer, Heidelberg (2014)
11. Curless, B., Levoy, M.: A volumetric method for building complex models from
range images. In: SIGGRAPH, pp. 303–312 (1996)
12. Huber, D.F., Hebert, M.: Fully automatic registration of multiple 3D data sets.
Image Vis. Comput. 21(7), 637–650 (2003)
Robust Image Descriptors for Real-Time
Inter-Examination Retargeting
in Gastrointestinal Endoscopy

Menglong Ye1(B) , Edward Johns2 , Benjamin Walter3 , Alexander Meining3 ,


and Guang-Zhong Yang1
1
The Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK
menglong.ye11@imperial.ac.uk
2
Dyson Robotics Laboratory, Imperial College London, London, UK
3
Centre of Internal Medicine, Ulm University, Ulm, Germany

Abstract. For early diagnosis of malignancies in the gastrointestinal tract, surveillance endoscopy is increasingly used to monitor abnormal
tissue changes in serial examinations of the same patient. Despite suc-
cesses with optical biopsy for in vivo and in situ tissue characterisa-
tion, biopsy retargeting for serial examinations is challenging because
tissue may change in appearance between examinations. In this paper, we
propose an inter-examination retargeting framework for optical biopsy,
based on an image descriptor designed for matching between endoscopic
scenes over significant time intervals. Each scene is described by a hierar-
chy of regional intensity comparisons at various scales, offering tolerance
to long-term change in tissue appearance whilst remaining discrimina-
tive. Binary coding is then used to compress the descriptor via a novel
random forests approach, providing fast comparisons in Hamming space
and real-time retargeting. Extensive validation conducted on 13 in vivo
gastrointestinal videos, collected from six patients, show that our app-
roach outperforms state-of-the-art methods.

1 Introduction

In gastrointestinal (GI) endoscopy, serial surveillance examinations are increasingly used to monitor recurrence of abnormalities, and detect malignancies in
the GI tract in time for curative therapy. In addition to direct visualisation of
the mucosa, serial endoscopic examinations involve the procurement of histologi-
cal samples from suspicious regions, for diagnosis and assessment of pathologies.
Recent advances in imaging modalities such as confocal laser endomicroscopy
and narrow band imaging (NBI), allow for in vivo and in situ tissue characteri-
sation with optical biopsy. Despite the advantages of optical biopsy, the required
retargeting of biopsied locations, for tissue monitoring, during intra- or inter-
examination of the same patient is challenging.
For intra-examination, retargeting techniques using local image features have
been proposed, which include feature matching [1], geometric transformations [2],

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 448–456, 2016.
DOI: 10.1007/978-3-319-46720-7 52

Fig. 1. A framework overview. Grey arrows represent the training phase using diagnosis
video while black arrows represent the querying phase in the surveillance examination.

tracking [3,4], and mapping [5]. However, when applied over successive exami-
nations, these often fail due to the long-term variation in appearance of tissue
surface, which causes difficulty in detecting the same local features. For inter-
examination, endoscopic video manifolds (EVM) [6] was proposed, with retar-
geting achieved by projecting query images into manifold space using locality
preserving projections. In [7], an external positioning sensor was used for retar-
geting, but requiring manual trajectory registration which interferes with the
clinical workflow, increasing the complexity and duration of the procedure.
In this work, we propose an inter-examination retargeting framework (see
Fig. 1) for optical biopsy. This enables recognition of biopsied locations in the sur-
veillance (second) examination, based on targets defined in the diagnosis (first)
examination, whilst not interfering with the clinical workflow. Rather than rely-
ing on feature detection, a global image descriptor is designed based on regional
image comparisons computed at multiple scales. At the higher scale, this offers
robustness to small variations in tissue appearance across examinations, whilst at
the lower scale, this offers discrimination in matching those tissue regions which
have not changed. Inspired by [8], efficient descriptor matching is achieved by
compression into binary codes, with a novel mapping function based on random
forests, allowing for fast encoding of a query image and hence real-time retarget-
ing. Validation was performed on 13 in vivo GI videos, obtained from successive
endoscopies of the same patient, with 6 patients in total. Extensive comparisons
to state-of-the-art methods have been conducted to demonstrate the practical
clinical value of our approach.

2 Methods

2.1 A Global Image Descriptor for Endoscopic Scenes

Visual scene recognition is often addressed using keypoint-based methods such as SIFT [9], typically made scalable with Bag-of-Words (BOW) [10]. However,
these approaches rely on consistent detection of the same keypoint on different
observations of the same scene, which is often not possible when applied to
endoscopic scenes undergoing long-term appearance changes of the tissue surface.

Fig. 2. (a) Obtaining an integer from one location; (b) creating the global image
descriptor from all locations using spatial pyramid pooling.

In recent years, the use of local binary patterns (LBP) [11] has proved popular
for recognition due to its fast computational speed, and robustness to image
noise and illumination variation. Here, pairs of pixels within an image patch are
compared in intensity to create a sequence of binary numbers. We propose a
novel, symmetric version of LBP which performs 4 diagonal comparisons within
a patch to yield a 4-bit string for each patch, representing an integer from 0 to
15. This comparison mask acts as a sliding window over the image, and a 16-bin
histogram is created from the full set of integers. To offer tolerance to camera
translation, we extend LBP by comparing local regions rather than individual
pixels, with each region the average of its underlying pixels, as shown in Fig. 2(a).
To encode global geometry such that retargeting ensures similarity at mul-
tiple scales, we adopt the spatial pyramid pooling method [12] which divides
an image into a set of coarse-to-fine levels. As shown in Fig. 2(b), we perform
pooling with three levels, where the second and third levels are divided into 2 × 2
and 4 × 4 partitions, respectively, with each partition assigned its own histogram
based on the patches it contains. For the second and third levels, further over-
lapped partitions of 1 × 1 and 3 × 3 are created to allow for deformation and
scale variance. For patches of 3×3 regions, we use patches of 24×24, 12×12 and
6 × 6 pixels for the first, second and third levels, respectively. The histograms for
all partitions over all levels are then concatenated to create a 496-d descriptor.
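A rough sketch of the patch-level 4-bit comparison code and its 16-bin histogram (for a single pyramid cell); the region size `r` and the particular choice of the four opposing region pairs are our assumptions:

```python
import numpy as np

def region_means(img, r):
    """Average img over non-overlapping r x r cells."""
    h, w = img.shape
    return img[:h - h % r, :w - w % r].reshape(h // r, r, w // r, r).mean(axis=(1, 3))

def symmetric_lbp_histogram(img, r=8):
    """16-bin histogram of 4-bit codes from four symmetric comparisons
    inside each sliding 3x3 block of region means (a sketch of Sect. 2.1)."""
    m = region_means(np.asarray(img, dtype=float), r)
    # four opposing neighbour pairs of each 3x3 block (assumed layout)
    pairs = [((0, 0), (2, 2)), ((0, 2), (2, 0)),
             ((0, 1), (2, 1)), ((1, 0), (1, 2))]
    H, W = m.shape
    code = np.zeros((H - 2, W - 2), dtype=int)
    for bit, ((r1, c1), (r2, c2)) in enumerate(pairs):
        a = m[r1:r1 + H - 2, c1:c1 + W - 2]
        b = m[r2:r2 + H - 2, c2:c2 + W - 2]
        code |= (a > b).astype(int) << bit
    return np.bincount(code.ravel(), minlength=16)
```

Comparing region means rather than raw pixels is what gives the descriptor its tolerance to noise and small camera translations; the full descriptor concatenates such histograms over all pyramid partitions.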

2.2 Compressing the Descriptor into a Compact Binary Code

Recent advances in large-scale image retrieval propose compressing image descriptors into compact binary codes (known as Hashing [8,13–15]), to allow
for efficient descriptor matching in Hamming space. To enable real-time retar-
geting, and hence application without affecting the existing clinical workflow, we
similarly compress our descriptor via a novel random forests hash. Furthermore,
we propose to learn the hash function with a loss function, which maps to a
new space where images from the same scene have a smaller descriptor distance,
compared with the original descriptor.

Let us consider a set of training image descriptors {xi}_{i=1}^n from the diagnosis
sequence, each assigned to a scene label representing its topological location,
where each scene is formed of a cluster of adjacent images. We now aim to
infer a binary code of m bits for each descriptor, by encouraging the Hamming
distance between the codes of two images to be small for images of the same
scene, and large for images of different scenes, as in [8]. Let us now denote Y as
an affinity matrix, where yij = 1 if images xi and xj have the same scene label,
and yij = 0 if not. We now sequentially optimise each bit in the code, such that
for r-th bit optimisation, we have the objective function:
min_{b(r)} Σ_{i=1}^n Σ_{j=1}^n lr(br,i, br,j; yij),   s.t. b(r) ∈ {0, 1}^n.   (1)

Here, br,i is the r-th bit of image xi , b(r) is a vector of the r-th bits for all
n images, and lr (·) is the loss function for the assignment of bits br,i and br,j
given the image affinity yij . As proved in [8], this objective can be optimised by
formulating a quadratic hinge loss function as follows:
lr(br,i, br,j; yij) = (0 − D(b_i^r, b_j^r))²,           if yij = 1
lr(br,i, br,j; yij) = max(0.5m − D(b_i^r, b_j^r), 0)²,  if yij = 0   (2)
 
Here, D(b_i^r, b_j^r) denotes the Hamming distance between bi and bj for the
first r bits. Note that during binary code inference, the optimisation of each bit
uses the results of the optimisation of the previous bits, and hence this is a series
of local optimisations due to the intractability of global optimisation.
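The per-pair loss of Eq. 2 is simple to state in code; here `d` is the Hamming distance over the first r bits and the function name is ours:

```python
def quadratic_hinge_loss(d, y, m):
    """Eq. 2: pull similar pairs (y = 1) toward Hamming distance 0 and
    push dissimilar pairs (y = 0) beyond half the code length m."""
    if y == 1:
        return float(d) ** 2
    return max(0.5 * m - d, 0.0) ** 2
```

Note the hinge: dissimilar pairs already separated by at least 0.5m bits contribute no loss, so the optimisation concentrates on pairs that are still confusable.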

2.3 Learning Encoding Functions with Random Forests


With each training image descriptor assigned a compact binary code, we now
propose a novel method for mapping {xi}_{i=1}^n to {bi}_{i=1}^n, such that the binary
code for a new query image may be computed. We denote this function Φ(x), and represent it as a set of independent hashing functions {φi(x)}_{i=1}^m, one for each bit. To learn the hashing function φi of the i-th bit in b, we treat this as a binary classifier which is trained on input data {xi}_{i=1}^n with labels b(i).
Rather than using boosted trees as in [8], we employ random forests [16],
which are faster for training and less susceptible to overfitting. Our approach
allows for fast hashing which enables encoding to be achieved without slowing
down the clinical workflow. We create a set of random forests, one for each hashing function {φi(x)}_{i=1}^m. Each tree in one forest is independently trained with a random subset of {xi}_{i=1}^n, and comparisons of random pairs of descriptor
elements as the split functions. We grow each binary decision tree by maximising
the information gain to optimally split the data X into left XL and right XR
subsets at each node. This information gain I is defined as:
I = π(X) − (1/|X|) Σ_{k∈{L,R}} |Xk| π(Xk)   (3)

Table 1. Mean average precision for recognition, both for the descriptor and the entire
framework. Note that the results of hashing-based methods are at 64-bit.

         Descriptor                   | Framework
Methods  BOW    GIST   SPACT  Ours    | EVM    AGH    ITQ    KSH    Fasthash  Ours
Pat.1    0.227  0.387  0.411  0.488   | 0.238  0.340  0.145  0.686  0.802     0.920
Pat.2    0.307  0.636  0.477  0.722   | 0.304  0.579  0.408  0.921  0.925     0.956
Pat.3    0.321  0.576  0.595  0.705   | 0.248  0.501  0.567  0.903  0.911     0.969
Pat.4    0.331  0.495  0.412  0.573   | 0.274  0.388  0.289  0.889  0.923     0.957
Pat.5    0.341  0.415  0.389  0.556   | 0.396  0.435  0.342  0.883  0.896     0.952
Pat.6    0.201  0.345  0.315  0.547   | 0.273  0.393  0.298  0.669  0.812     0.895


where π(X) is the Shannon entropy: π(X) = −Σ_{y∈{0,1}} p_y log(p_y). Here, p_y is
the fraction of data in X assigned to label y. Tree growth terminates when the
tree reaches a defined maximum depth, or I is below a certain threshold (e^−10).
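For concreteness, the entropy and information-gain computations of Eq. 3 can be written as follows (we assume the natural logarithm, as the paper does not state the base):

```python
import math

def entropy(labels):
    """Shannon entropy pi(X) of a set of binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return -sum(p * math.log(p) for p in (p1, 1.0 - p1) if p > 0)

def info_gain(left, right):
    """Information gain I of splitting X into left/right subsets (Eq. 3)."""
    X = left + right
    return entropy(X) - (len(left) * entropy(left) +
                         len(right) * entropy(right)) / len(X)
```

A split that perfectly separates the two bit labels attains the maximum gain entropy(X), while a split that leaves both children with the parent's label mixture gains nothing.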
With T trained trees, each returning a value αt (x) between 0 and 1, the hashing
function for the ith bit then averages the responses from all trees and rounds
this accordingly to either 0 or 1:
φi(x) = 0  if (1/T) Σ_{t=1}^T αt(x) < 0.5;   φi(x) = 1  otherwise.   (4)
Finally, to generate the m-bit binary code, the mapping function Φ(x) concatenates the output bits from all hashing functions {φi(x)}_{i=1}^m into a single
binary string. Therefore, to achieve retargeting, the binary string assigned to a
query image from the surveillance sequence is compared, via Hamming distance,
to the binary strings of scenes captured in a previous diagnosis sequence.
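Matching in Hamming space is what makes retargeting real-time; a sketch of packed-bit matching with a popcount lookup table (the helper names are ours):

```python
import numpy as np

def pack_bits(codes):
    """codes: (n, m) array of 0/1 bits -> packed uint8 rows."""
    return np.packbits(np.asarray(codes, dtype=np.uint8), axis=1)

# popcount lookup table for all 256 byte values
_POPCNT = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)

def nearest_scene(query, database):
    """Index of the database code with smallest Hamming distance to query."""
    d = _POPCNT[np.bitwise_xor(database, query)].sum(axis=1)
    return int(np.argmin(d))
```

XOR of packed codes followed by a byte-wise popcount reduces each comparison to a handful of table lookups, so thousands of diagnosis-scene codes can be scanned per query frame.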

3 Experiments and Results


For validation, in vivo experiments were performed on 13 GI videos (≈17,700
images) obtained from six patients. For each from patients 1–5, two videos were
recorded in two separate endoscopies of the same examination, resulting in ten
videos. For patient 6, three videos were collected in three serial examinations,
with each consecutive examination 3–4 months apart. All videos were collected
using standard Olympus endoscopes, with NBI-mode on for image enhancement.
The videos were captured at 720 × 576-pixels, and the black borders in the images
were cropped out.
We used leave-one-video-out cross validation, where one surveillance video
(O1) and one diagnosis video (O2) are selected for each experiment, for a total
of 16 experiments (two for each of patients 1–5, and six for patient 6). Intensity-
based k-means clustering was used to divide O2 into clusters, with the number of
clusters defined empirically and proportional to the video length (10–34 clusters).
To assign ground truth labels to test images, videos O1 and O2 were observed
Robust Image Descriptors for Real-Time Inter-Examination Retargeting 453

Fig. 3. (a) Means and standard deviations of recognition rates (precisions @ 1-NN) and
(b) precision values @ 50-NN with different binary code lengths; (c-h) precision-recall
curves of individual experiments using 64-bit codes.

side-by-side manually by an expert, moving simultaneously from start to end. For each experiment, we randomly selected 50 images from O1 (testing) as queries.
Our framework has been implemented using Matlab and C++, and runs on an
HP workstation (Intel X5650 CPU).
Recognition results for our original descriptor before binary encoding were
compared to a range of standard image descriptors, including a BOW vector [10]
based on SIFT features, a global descriptor GIST based on frequency response
[17], and SPACT [11], a global descriptor based on pixel comparisons. We used
the publicly-available code of GIST, and implemented a 10,000-d BOW descriptor and a 1,240-d SPACT descriptor. Descriptor similarity was computed using
the L2 distance for all methods. Table 1 shows the mAP results, with our descrip-
tor significantly outperforming all competitors. As expected, BOW offers poor
results due to the inconsistency of local keypoint detection over long time inter-
vals. We also outperform SPACT as the latter is based on pixel-level compar-
isons, while our regional comparisons are more robust to illumination variation
and camera translation. Whilst GIST typically offers good tolerance to scene
deformation, it lacks local texture encoding, whereas the multi-level nature of
our novel descriptor ensures that similar descriptors suggest image similarity
across a range of scales.
Our entire framework was compared to the EVM method [6] and hashing-
based methods, including ITQ [15], AGH [13], KSH [14] and Fasthash [8]. For the
competitors based on hashing, our descriptor was used as input. For our frame-
work, the random forest consisted of 100 trees, each with a stopping criterion of maximum tree depth of 10 or minimum information gain of $e^{-10}$.

454 M. Ye et al.

Fig. 4. Example top-ranked images for the proposed framework on six patients. Yellow-border images are queries from a surveillance sequence; green- and red-border images are the correct and incorrect matches from a diagnosis sequence, respectively.

Figure 3(a) shows the recognition rate if the best-matched image is correct (average precision at 1-nearest-neighbour (NN)). We compare across a range of binary string lengths, with our framework consistently outperforming others and with the
highest mean recognition rate {0.87, 0.86, 0.82, 0.75}. We also show the precision
values at 50-NN in Fig. 3(b). Precision-recall curves (at 64-bit length) for each
patient data are shown in Fig. 3(c-h), with mAP values in Table 1. As well as sur-
passing the original descriptor, our full framework outperforms all other hashing
methods, with the highest mAP scores and graceful fall-offs in precision-recall.
Our separation of encoding and hashing achieves strong discrimination through
a powerful independent classifier compared to the single-stage approaches of [13–
15] and the less flexible classifier of [8]. We also found that the performance of
EVM is inferior to ours (Table 1), and significantly lower than that presented
in [6]. This is because in their work, training and testing data were from the
same video sequence. In our experiments, however, two different sequences were used for training and testing, yielding a more challenging task that fully evaluates the performance on inter-examination retargeting. The current average querying
time using 64-bit strings (including descriptor extraction, binary encoding and
Hamming distance calculation) is around 19 ms, which demonstrates its real-
time capability, compared to 490 ms for querying with the original descriptor.
Finally, images for example retargeting attempts are provided for our framework
in Fig. 4.
Note that our descriptor currently does not explicitly address rotation invari-
ance. However, from the experiments, we do observe that the framework allows
for a moderate degree of rotation, due to the inclusion of global geometry in the
descriptor. A simple way to achieve full rotation-invariance would be to augment
the training data with images from rotated versions of the diagnosis video.

4 Conclusions
In this paper, we have proposed a retargeting framework for optical biopsy in
serial endoscopic examinations. A novel global image descriptor with regional

comparisons over multiple scales deals with tissue appearance variation across
examinations, whilst binary encoding with a novel random forest-based mapping
function adds discrimination and speeds up recognition. The framework can be
readily incorporated into the existing endoscopic workflow due to its capability
of real-time retargeting and no requirement of manual calibration. Validation
on in vivo videos of serial endoscopies from six patients, shows that both our
descriptor and hashing scheme are consistently state-of-the-art.

References
1. Atasoy, S., Glocker, B., Giannarou, S., Mateus, D., Meining, A., Yang, G.-Z.,
Navab, N.: Probabilistic region matching in narrow-band endoscopy for targeted
optical biopsy. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C.
(eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 499–506. Springer, Heidelberg
(2009)
2. Allain, B., Hu, M., Lovat, L.B., Cook, R.J., Vercauteren, T., Ourselin, S., Hawkes,
D.J.: Re-localisation of a biopsy site in endoscopic images and characterisation of
its uncertainty. Med. Image Anal. 16(2), 482–496 (2012)
3. Ye, M., Giannarou, S., Patel, N., Teare, J., Yang, G.-Z.: Pathological site
retargeting under tissue deformation using geometrical association and track-
ing. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI
2013, Part II. LNCS, vol. 8150, pp. 67–74. Springer, Heidelberg (2013)
4. Ye, M., Johns, E., Giannarou, S., Yang, G.-Z.: Online scene association for endo-
scopic navigation. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R.
(eds.) MICCAI 2014, Part II. LNCS, vol. 8674, pp. 316–323. Springer, Heidelberg
(2014)
5. Mountney, P., Giannarou, S., Elson, D., Yang, G.-Z.: Optical biopsy mapping for
minimally invasive cancer screening. In: Yang, G.-Z., Hawkes, D., Rueckert, D.,
Noble, A., Taylor, C. (eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 483–490.
Springer, Heidelberg (2009)
6. Atasoy, S., Mateus, D., Meining, A., Yang, G.Z., Navab, N.: Endoscopic video
manifolds for targeted optical biopsy. IEEE Trans. Med. Imag. 31(3), 637–653
(2012)
7. Vemuri, A.S., Nicolau, S.A., Ayache, N., Marescaux, J., Soler, L.: Inter-operative
trajectory registration for endoluminal video synchronization: application to biopsy
site re-localization. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.)
MICCAI 2013, Part I. LNCS, vol. 8149, pp. 372–379. Springer, Heidelberg (2013)
8. Lin, G., Shen, C., van den Hengel, A.: Supervised hashing using graph cuts and
boosted decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2317–2331
(2015)
9. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Com-
put. Vis. 60(2), 91–110 (2004)
10. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with
large vocabularies and fast spatial matching. In: CVPR, pp. 1–8 (2007)
11. Wu, J., Rehg, J.: Centrist: a visual descriptor for scene categorization. IEEE Trans.
Pattern Anal. Mach. Intell. 33(8), 1489–1501 (2011)
12. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid
matching for recognizing natural scene categories. In: CVPR, pp. 2169–2178 (2006)

13. Liu, W., Wang, J., Kumar, S., Chang, S.F.: Hashing with graphs. In: ICML, pp.
1–8 (2011)
14. Liu, W., Wang, J., Ji, R., Jiang, Y.G., Chang, S.F.: Supervised hashing with ker-
nels. In: CVPR, pp. 2074–2081 (2012)
15. Gong, Y., Lazebnik, S., Gordo, A., Perronnin, F.: Iterative quantization: a pro-
crustean approach to learning binary codes for large-scale image retrieval. IEEE
Trans. Pattern Anal. Mach. Intell. 35(12), 2916–2929 (2013)
16. Criminisi, A., Shotton, J.: Decision Forests for Computer Vision and Medical Image
Analysis. Springer, Heidelberg (2013)
17. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation
of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
Kalman Filter Based Data Fusion for Needle
Deflection Estimation Using
Optical-EM Sensor

Baichuan Jiang^{1,3}(&), Wenpeng Gao^{2,3}, Daniel F. Kacher^{3}, Thomas C. Lee^{3}, and Jagadeesan Jayender^{3}

^{1} Department of Mechanical Engineering, Tianjin University, Tianjin, China
baichuan@tju.edu.cn
^{2} School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
^{3} Department of Radiology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA

Abstract. In many clinical procedures involving needle insertion, such as cryoablation, accurate navigation of the needle to the desired target is of paramount importance to optimize the treatment and minimize damage to the neighboring anatomy. However, the force interaction between the needle and tissue may lead to needle deflection, resulting in considerable error in the intraoperative tracking of the needle tip. In this paper, we have proposed a Kalman filter-based formulation to fuse data from two sensors, an optical sensor at the needle base and a magnetic resonance (MR) gradient-field driven electromagnetic (EM) sensor placed 10 cm from the needle tip, to estimate the needle deflection online. Angular springs model based tip estimations and a model-free EM-based estimation are used to form the measurement vector in the Kalman filter. Static tip bending experiments show that the fusion method can reduce the tip estimation error from 29.23 mm to 3.15 mm at the MRI isocenter and from 39.96 mm to 6.90 mm at 650 mm from the isocenter.

Keywords: Sensor fusion · Needle deflection · Kalman filter · Surgical navigation

1 Introduction

Minimally invasive therapies such as biopsy, brachytherapy, radiofrequency ablation


and cryoablation involve the insertion of multiple needles into the patient [1–3].
Accurate placement of the needle tip can result in reliable acquisition of diagnostic
samples [4], effective drug delivery [5] or target ablation [2]. When the clinicians
maneuver the needle to the target location, the needle is likely to bend due to the
tissue-needle or hand-needle interaction, resulting in suboptimal placement of the
needle. Mala et al. [3] have reported that in nearly 28 % of the cases, cryoablation of
liver metastases was inadequate due to improper placement of the needles among other

© Springer International Publishing AG 2016


S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 457–464, 2016.
DOI: 10.1007/978-3-319-46720-7_53
458 B. Jiang et al.

reasons. We propose to develop a real-time navigation system for better guidance while
accounting for the needle bending caused by the needle-tissue interactions.
Many methods have been proposed to estimate the needle deflection. The most
popular class of methods is the model-based estimation [6–8]. Roesthuis et al. proposed
the virtual springs model considering the needle as a cantilever beam supported by a
series of springs and utilized Rayleigh-Ritz method to solve for needle deflection [8].
The work of Dorileo et al. merged needle-tissue properties, tip asymmetry and needle
tip position updates from images to estimate the needle deflection as a function of
insertion depth [7]. However, since the model-based estimation is sensitive to model
parameters and the needle-tissue interaction is stochastic in nature, needle deflection
and insertion trajectory are not completely repeatable. The second type of estimation is
achieved using an optical fiber based sensor. Park et al. designed an MRI-compatible
biopsy needle instrumented with optical fiber Bragg gratings to track needle deviation
[4]. However, the design and functionality of certain needles, such as cryoablation and
radiofrequency ablation needles, do not allow for instrumentation of the optical fiber
based sensor in the lumen of the needle. The third kind of estimation strategy was
proposed in [9], where Kalman filter was employed to combine a needle bending model
with the needle base and tip position measurements from two electromagnetic
(EM) trackers to estimate the true tip position. This approach can effectively com-
pensate for the quantification uncertainties of the needle model and therefore be more
reliable. However, this method is not feasible in the MRI environment due to the use of
MRI-unsafe sensors. In this work, we present a new fusion method using an optical
tracker at the needle’s base and an MRI gradient field driven EM tracker attached to the
shaft of the needle. By integrating the sensor data with the angular springs model
presented in [10], the Kalman filter-based fusion model can significantly reduce the
estimation error in presence of needle bending.

2 Methodology

2.1 Sensor Fusion

Needle Configuration. In this study, we have used a cone-tip IceRod® 1.5 mm MRI
Cryoablation Needle (Galil Medical, Inc.), as shown in Fig. 1. A frame with four
passive spheres (Northern Digital Inc. and a tracking system from Symbow Medical
Inc.) is mounted on the base of the needle, and an MRI-safe EndoScout® EM sensor
(Robin Medical, Inc.) is attached to the needle’s shaft with 10 cm offset from the tip set
by a depth stopper.
Through pivot calibration, the optical tracking system can provide the needle base
position POpt and the orientation of the straight needle OOpt . The EM sensor obtains the
sensor’s location PEM and its orientation with respect to the magnetic field of the MR
scanner OEM .
Kalman Filter Formulation. The state vector is set as $x_k = [P_{tip(k)}, \dot{P}_{tip(k)}]^T$. The insertion speed during the cryoablation procedure is slow enough to be considered as a

Kalman Filter Based Data Fusion for Needle Deflection Estimation 459

Fig. 1. Cryoablation needle mounted with optical and EM sensors and a depth stopper.

constant. Therefore, the process model can be formulated in the form $x_k = A x_{k-1} + w_{k-1}$ as follows:

$$\begin{bmatrix} P_{tip(k)} \\ \dot{P}_{tip(k)} \end{bmatrix} = \underbrace{\begin{bmatrix} I_3 & T_S I_3 \\ 0_3 & I_3 \end{bmatrix}}_{\text{transition matrix } A} \begin{bmatrix} P_{tip(k-1)} \\ \dot{P}_{tip(k-1)} \end{bmatrix} + \begin{bmatrix} \frac{T_S^2}{2} I_3 \\ T_S I_3 \end{bmatrix} \ddot{P}_{tip(k)} \qquad (1)$$

where $T_S$, $I_3$, $0_3$ stand for the time step, the $3 \times 3$ identity matrix and the $3 \times 3$ null matrix, and $P_{tip(k)}$, $\dot{P}_{tip(k)}$, $\ddot{P}_{tip(k)}$ represent the tip position, velocity and acceleration, respectively. The acceleration element $[\frac{T_S^2}{2} I_3; T_S I_3]^T \ddot{P}_{tip(k)}$ is taken as the process noise, denoted by $w_{k-1} \sim \mathcal{N}(0, Q)$, where $Q$ is the process noise covariance matrix.

When considering the needle as straight, the tip position was estimated using the three sets of data as follows: $TIP_{Opt}$ (using $P_{Opt}$, $O_{Opt}$ and the needle length offset), $TIP_{EM}$ (using $P_{EM}$, $O_{EM}$ and the EM offset), and $TIP_{OptEM}$ (drawing a straight line through $P_{Opt}$ and $P_{EM}$, and the needle length offset). When taking the needle bending into account, we can estimate the needle tip position using the angular springs model with either the combination of $P_{EM}$, $P_{Opt}$ and $O_{Opt}$ ($TIP_{EMOptOpt}$) or the combination of $P_{Opt}$, $P_{EM}$ and $O_{EM}$ ($TIP_{OptEMEM}$), which are formulated in (2) and (3):

$$P_{EMOptOpt} = g_1(P_{EM}, P_{Opt}, O_{Opt}) \qquad (2)$$

$$P_{OptEMEM} = g_2(P_{Opt}, P_{EM}, O_{EM}) \qquad (3)$$

In our measurement equation $z_k = H x_k + v_k$, as $z_k$ is of crucial importance for the stability and accuracy, $z_k = [g_1(P_{Opt}, P_{EM}, O_{Opt}),\; g_2(P_{Opt}, P_{EM}, O_{EM}),\; TIP_{EM}]^T$ is suggested by later experiments. Accordingly, $H$ is defined as in (4):

$$H = \begin{bmatrix} I_3 & 0_3 \\ I_3 & 0_3 \\ I_3 & 0_3 \end{bmatrix} \qquad (4)$$

The measurement noise is denoted as $v_k \sim \mathcal{N}(0, R)$, where $R$ is the measurement noise covariance matrix. For finding the optimal noise estimation, we used the Nelder-Mead simplex method [11].
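As an illustration of the filter above, the following is a per-axis sketch of the constant-velocity Kalman fusion (Eq. (1) as the process model, with the three stacked tip-position estimates of the measurement vector applied as sequential scalar updates, which is equivalent to the block update for a diagonal R). The time step, noise variances, the simplified Q = qI, and the measurement values are all illustrative assumptions, not the paper's fitted parameters:

```python
# Per-axis constant-velocity Kalman filter: state [position, velocity].
# Three tip-position estimates per time step (two model-based, one EM-based)
# are fused as sequential scalar updates. Q is simplified to q*I here; the
# paper derives the process noise from the acceleration term of Eq. (1).

def predict(x, P, Ts, q):
    p, v = x
    x = [p + Ts * v, v]  # x_k = A x_{k-1} with A = [[1, Ts], [0, 1]]
    P = [[P[0][0] + Ts * (P[0][1] + P[1][0]) + Ts * Ts * P[1][1] + q,
          P[0][1] + Ts * P[1][1]],
         [P[1][0] + Ts * P[1][1],
          P[1][1] + q]]
    return x, P

def update(x, P, z, r):
    # Scalar measurement of the position component, H = [1, 0].
    s = P[0][0] + r                        # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s      # Kalman gain
    y = z - x[0]                           # innovation
    x = [x[0] + k0 * y, x[1] + k1 * y]
    P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
         [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    return x, P

# Illustrative run: a tip resting near 10 mm along one axis.
x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]   # large initial uncertainty
for z1, z2, z_em in [(10.2, 9.9, 10.4), (10.1, 10.0, 10.2)]:
    x, P = predict(x, P, Ts=0.05, q=1e-3)
    for z, r in ((z1, 4.0), (z2, 4.0), (z_em, 9.0)):
        x, P = update(x, P, z, r)
# x[0] is now close to the ~10 mm measurements
```

The full filter runs this recursion on the six-dimensional state with the stacked measurement matrix of (4); per-axis sequential processing is shown only to keep the linear algebra readable.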

2.2 Bending Model

In order to estimate the flexible needle deflection from the sensor data, an efficient and robust bending model is needed. In [10], three different models are presented, including two models based on the finite element method (FEM) and one angular springs model. Further, in [8] and [12], a virtual springs model was proposed, which took the needle-tissue force interaction into consideration. In [5] and [9], a kinematic quadratic polynomial model is implemented to estimate the needle tip deflection. Since we assume that the deflection is planar and caused by the orthogonal force acting on the needle tip, we have investigated multiple models and here we present the angular springs formulation to model the needle.

Angular Springs Model. In this method, the needle is modeled as $n$ rigid rods connected by angular springs with the same spring constant $k$, which can be identified through experiment. Due to the orthogonal force acting on the needle tip, the needle deflects, causing the springs to extend. The insertion process is slow enough to be considered as quasi-static; therefore, the rods and springs are in equilibrium at each time step. Additionally, for the elastic range of deformations, the springs behave linearly, i.e., $\tau_i = k\,\theta_i$, where $\tau_i$ is the spring torque at joint $i$ and $\theta_i$ its joint angle. The implementation of this method is demonstrated in Fig. 2, and the mechanical relations are expressed as in (5).

8
>
> kq5 ¼ Ftip l
>
>
>
> kq4 ¼ Ftip lð1 þ cos q5 Þ
>
>
>
< kq3 ¼ Ftip l½1 þ cos q5 þ cosðq5 þ q4 Þ
ð5Þ
>
> kq ¼ F tip ½1 þ cos q5 þ cosðq5 þ q4 Þ þ cosðq5 þ q4 þ q3 Þ
l
>
>
2
>
> kq1 ¼ Ftip l½1 þ cos q5 þ cosðq5 þ q4 Þ þ cosðq5 þ q4 þ q3 Þ
>
>
>
:
þ cosðq5 þ q4 þ q3 þ q2 Þ

Equation (5) can be written in the form $k \cdot \Theta = F_{tip} \cdot J(\Theta)$, where $\Theta = [\theta_1, \theta_2, \ldots, \theta_n]$ and $J$ is the parameter function calculating the force-deflection relationship vector. In order to implement this model in the tip estimation methods (2) and (3), one more equation is needed to relate the sensor input data with (5). As the data $P_{EM}, P_{Opt}, O_{Opt}$ and $P_{Opt}, P_{EM}, O_{EM}$ are received during insertion, the deflection of the needle can be estimated as:

$$\delta_{EM} = l \cdot [\sin\theta_1 + \sin(\theta_1 + \theta_2)] \qquad (6)$$

$$\delta_{base} = l \cdot [\sin\theta_3 + \sin(\theta_3 + \theta_2)] \qquad (7)$$

where $\delta_{EM}$ represents the deviation of the EM sensor from the optically measured straight needle orientation and $\delta_{base}$ stands for the relative deviation of the needle base from the EM-measured direction.

Fig. 2. Angular springs model, taking 5 rods as an example

To estimate the needle deflection from $P_{EM}, P_{Opt}, O_{Opt}$ or $P_{Opt}, P_{EM}, O_{EM}$, a set of nonlinear equations consisting of either (5) and (6) or (5) and (7) needs to be solved. However, as proposed in [10], this nonlinear system can be solved iteratively using Picard’s method, which is expressed in (8). Given the needle configuration $\Theta_t$, we can use the function $J$ to estimate the needle posture at the next iteration:

$$\Theta_{t+1} = k^{-1} J(\Theta_t) F_{tip} \qquad (8)$$

For minor deflections, it takes fewer than 10 iterations to solve these nonlinear equations, which is efficient enough to achieve real-time estimation.
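A small sketch of the angular springs solve: J implements the bracketed moment-arm sums of Eq. (5) for general n, and the Picard iteration of Eq. (8) is run from a straight configuration. The tip-deflection helper uses the standard chain sum of the link angles (an assumed form, in the spirit of the partial sums (6) and (7)), and all numeric values are illustrative, not identified parameters:

```python
import math

def J(theta, l):
    """Right-hand sides of Eq. (5) divided by F_tip: for each joint i
    (index 0 = base), 1 plus the cosines of the cumulative angles of the
    joints between i and the tip, scaled by the rod length l."""
    n = len(theta)
    out = []
    for i in range(n):
        total, cum = 1.0, 0.0
        for j in range(n - 1, i, -1):   # walk from the tip joint toward i
            cum += theta[j]
            total += math.cos(cum)
        out.append(l * total)
    return out

def solve_picard(f_tip, k, l, n=5, iters=10):
    """Eq. (8): Theta <- k^-1 * J(Theta) * F_tip, from a straight needle."""
    theta = [0.0] * n
    for _ in range(iters):
        theta = [f_tip * ji / k for ji in J(theta, l)]
    return theta

def tip_deflection(theta, l):
    """Chain sum of link angles (assumed helper, not from the paper)."""
    cum, d = 0.0, 0.0
    for t in theta:
        cum += t
        d += math.sin(cum)
    return l * d

# Illustrative units: f_tip in N, l in m, k in N*m/rad.
theta = solve_picard(f_tip=0.5, k=1.0, l=0.04)
# theta decreases from base to tip; the tip joint satisfies k*theta_n = F*l
```

For the small angles produced here the cosine terms barely change between iterations, which is why the fixed point is reached in only a few passes.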
However, the implementation of Picard’s method requires $F_{tip}$ to be known. In order to find $F_{tip}$ from the sensor inputs, a series of simulation experiments was conducted, and a linearly increasing simulated tip force $F_{tip}$ with the corresponding $\delta_{EM}$ and $\delta_{base}$ was collected. The simulation results are shown in Fig. 3 (left).

A least squares method is implemented to fit the force-deviation data with a cubic polynomial. Thereafter, to solve the needle configuration using $P_{EM}, P_{Opt}, O_{Opt}$ and $P_{Opt}, P_{EM}, O_{EM}$, the optimal cubic polynomial is used first to estimate the tip force from the measured $\delta_{EM}$ and $\delta_{base}$, and then (5) is solved iteratively using (8).
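The least-squares fit in the last step can be sketched as below; the synthetic force-deviation pairs and the pure-Python normal-equation solver are illustrative stand-ins for the paper's simulated data and fitting routine:

```python
def polyfit_cubic(xs, ys):
    """Least-squares cubic y = c0 + c1*x + c2*x^2 + c3*x^3 via the normal
    equations (A^T A) c = A^T y, solved by Gaussian elimination."""
    M = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(4)]
    for col in range(4):                      # forward elimination w/ pivoting
        piv = max(range(col, 4), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 4):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * 4
    for row in range(3, -1, -1):              # back substitution
        s = b[row] - sum(M[row][c] * coeffs[c] for c in range(row + 1, 4))
        coeffs[row] = s / M[row][row]
    return coeffs

def eval_cubic(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

# Synthetic deviation (mm) -> force (mN) pairs drawn from a known cubic:
devs = [0.5 * i for i in range(10)]
forces = [2.0 + 0.5 * d + 0.1 * d ** 3 for d in devs]
fit = polyfit_cubic(devs, forces)   # recovers ~[2.0, 0.5, 0.0, 0.1]
```

At run time, `eval_cubic(fit, measured_deviation)` would stand in for reading the tip force off the fitted curve before the Picard solve.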

Fig. 3. Left: Tip force and deflection relation: tip force increases with 50 mN intervals. Right:
Static tip bending experiment setup at MRI entrance.

3 Experiments

In order to validate our proposed method, we designed the static tip bending experiment, which was performed at the isocenter of the MRI and at 650 mm offset along the z-axis from the isocenter (the entrance), as shown in Fig. 3 (right). The experiment was conducted in two steps: first, the needle tip was placed at a particular point (such as inside a phantom marker) and kept static without bending the needle. The optical and EM sensor data were recorded for 10 s. Second, the needle’s tip remained at the same point and the needle was bent by maneuvering the needle base, with a mean tip deviation magnitude of about 40 mm for large bending validation and 20 mm for small bending validation. Similarly, the data were recorded from both sensors for an additional 20 s. In addition, the needle was bent in three patterns: in the x-y plane of the MRI, in the y-z plane, and in all directions, to evaluate the relevance between the EM sensor orientation and its accuracy.
From the data collected in the first step, the estimated needle tip mean position
without needle deflection compensation can be viewed as the gold standard reference
point TIPgold . In the second step, the proposed fusion method, together with other tip
estimation methods, was used to estimate the static tip position, which was compared
with TIPgold. The results are shown in Fig. 4. For large bending, the errors of TIPOpt, TIPEM and TIPfused are 29.23 mm, 6.29 mm and 3.15 mm at the isocenter, and 39.96 mm, 9.77 mm and 6.90 mm at the MRI entrance, respectively. For small bending, they become 21.00 mm, 3.70 mm and 2.20 mm at the isocenter, and 16.54 mm, 5.41 mm and 4.20 mm at the entrance, respectively.

4 Discussion

By comparing the TIPfused with TIPOpt instead of TIPEM , it should be noted that the EM
sensor is primarily used to augment the measurements of the optical sensor and
compensate for its line-of-sight problem. Although the EM sensor better estimates the needle tip position in the presence of needle bending, it is sensitive to the MR gradient field nonlinearity and noise. Therefore, its performance is less reliable when performing the needle insertion procedure at the MRI entrance.
Although quantifying the range of bending during therapy is difficult, our initial
insertion experiments in a homogeneous spine phantom using the same needle
demonstrated a needle bending of over 10 mm. Therefore, we attempted to simulate a
larger bending (40 mm tip deviation) that could be anticipated when needle is inserted
through heterogeneous tissue composition. However, as small bending will be more
commonly observed, validation experiments were conducted and demonstrated con-
sistently better estimation using the data fusion method.
From Fig. 4 (bottom), we find that the green dots, which represent bending in the x-y plane, exhibit higher accuracy of the EM sensor, thus resulting in a better fusion result. For the large bending experiment in the x-y plane at the entrance, the mean errors of TIPOpt, TIPEM and TIPfused are 28.22 mm, 5.76 mm and 3.40 mm, respectively. The result suggests that by maneuvering the needle in the x-y plane, the estimation accuracy can be further improved.

Fig. 4. Top: Single experiment result. Each scattered point represents a single time step record. The left-side points represent the estimated tip positions using different methods. The light blue points in the middle and dark blue points to the right represent the raw data of the EM sensor locations and needle base positions, respectively. The black sphere is centered at the gold standard point and encompasses 90 % of the fused estimation points (black). Lines connect the raw data and estimated tip positions of a single time step. Bottom: From left to right: large bending experiment at the isocenter, large-entrance, small-isocenter, small-entrance. The x-axis, from 1 to 6, stands for TIPfused, TIPEM, TIPOptEMEM, TIPEMOptOpt, TIPOptEM and TIPOpt, respectively. The y-axis indicates the mean estimation error (mm), and each dot represents a single experiment result.

It should be noted that the magnitude of estimation errors using fusion method still
appears large due to the significant bending introduced in the needle. When the actual
bending becomes less conspicuous, the estimation error can be much smaller. In addition,
the estimation error is not equal to the overall targeting error. It only represents the
real-time tracking error in presence of needle bending. By integrating the data fusion
algorithm with the 3D Slicer-based navigation system [13], clinicians can be provided
with better real-time guidance and maneuverability of the needle.

5 Conclusion

In this work, we proposed a Kalman filter based optical-EM sensor fusion method to
estimate the flexible needle deflection. The data fusion method exhibits consistently
smaller mean error than the methods without fusion. The EM sensor used in our
method is MR-safe, and the method requires no other force or insertion-depth sensor,
making it easy to integrate with the clinical workflow. In the future, we will improve
the robustness of the needle bending model and integrate with our navigation system.

References
1. Abolhassani, N., Patel, R., Moallem, M.: Needle insertion into soft tissue: a survey. Med.
Eng. Phys. 29(4), 413–431 (2007)
2. Dupuy, D.E., Zagoria, R.J., Akerley, W., Mayo-Smith, W.W., Kavanagh, P.V., Safran, H.:
Percutaneous radiofrequency ablation of malignancies in the lung. AJR Am. J. Roentgenol.
174(1), 57–59 (2000)
3. Mala, T., Edwin, B., Mathisen, Ø., Tillung, T., Fosse, E., Bergan, A., Søreide, Ø.,
Gladhaug, I.: Cryoablation of colorectal liver metastases: minimally invasive tumour control.
Scand. J. Gastroenterol. 39(6), 571–578 (2004)
4. Park, Y.L., Elayaperumal, S., Daniel, B., Ryu, S.C., Shin, M., Savall, J., Black, R.J.,
Moslehi, B., Cutkosky, M.R.: Real-time estimation of 3-D needle shape and deflection for
MRI-guided interventions. IEEE/ASME Trans. Mechatron. 15(6), 906–915 (2010)
5. Wan, G., Wei, Z., Gardi, L., Downey, D.B., Fenster, A.: Brachytherapy needle deflection
evaluation and correction. Med. Phys. 32(4), 902–909 (2005)
6. Asadian, A., Kermani, M.R., Patel, R.V.: An analytical model for deflection of flexible
needles during needle insertion. In: 2011 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), pp. 2551–2556 (2011)
7. Dorileo, E., Zemiti, N., Poignet, P.: Needle deflection prediction using adaptive slope model.
In: 2015 IEEE International Conference on Advanced Robotics (ICAR), pp. 60–65 (2015)
8. Roesthuis, R.J., Van Veen, Y.R.J., Jahya, A., Misra, S.: Mechanics of needle-tissue
interaction. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), pp. 2557–2563 (2011)
9. Sadjadi, H., Hashtrudi-Zaad, K., Fichtinger, G.: Fusion of electromagnetic trackers to
improve needle deflection estimation: simulation study. IEEE Trans. Biomed. Eng. 60(10),
2706–2715 (2013)
10. Goksel, O., Dehghan, E., Salcudean, S.E.: Modeling and simulation of flexible needles. Med.
Eng. Phys. 31(9), 1069–1078 (2009)
11. Lagarias, J.C., Reeds, J.A., Wright, M.H., Wright, P.E.: Convergence properties of the
Nelder–Mead simplex method in low dimensions. SIAM J. Optim. 9(1), 112–147 (1998)
12. Du, H., Zhang, Y., Jiang, J., Zhao, Y.: Needle deflection during insertion into soft tissue
based on virtual spring model. Int. J. Multimedia Ubiquit. Eng. 10(1), 209–218 (2015)
13. Jayender, J., Lee, T.C., Ruan, D.T.: Real-time localization of parathyroid adenoma during
parathyroidectomy. N. Engl. J. Med. 373(1), 96–98 (2015)
Bone Enhancement in Ultrasound Based on 3D
Local Spectrum Variation for Percutaneous
Scaphoid Fracture Fixation

Emran Mohammad Abu Anas^{1}(B), Alexander Seitel^{1}, Abtin Rasoulian^{1}, Paul St. John^{2}, Tamas Ungi^{3}, Andras Lasso^{3}, Kathryn Darras^{4}, David Wilson^{5}, Victoria A. Lessoway^{6}, Gabor Fichtinger^{3}, Michelle Zec^{2}, David Pichora^{2}, Parvin Mousavi^{3}, Robert Rohling^{1,7}, and Purang Abolmaesumi^{1}

^{1} Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada
emrana@ece.ubc.ca
^{2} Kingston General Hospital, Kingston, ON, Canada
^{3} School of Computing, Queen’s University, Kingston, ON, Canada
^{4} Vancouver General Hospital, Vancouver, BC, Canada
^{5} Orthopaedics and Centre for Hip Health and Mobility, University of British Columbia, Vancouver, BC, Canada
^{6} BC Women’s Hospital, Vancouver, BC, Canada
^{7} Mechanical Engineering, University of British Columbia, Vancouver, BC, Canada

Abstract. This paper proposes a 3D local phase-symmetry-based bone enhancement technique to automatically identify weak bone responses in 3D ultrasound images of the wrist. The objective is to enable percutaneous fixation of scaphoid bone fractures, which occur in 90 % of all carpal bone fractures. For this purpose, we utilize 3D frequency spectrum variations to design a set of 3D band-pass Log-Gabor filters for phase symmetry estimation. Shadow information is also incorporated to further enhance the bone surfaces compared to the soft-tissue response. The proposed technique is then used to register a statistical wrist model to intraoperative ultrasound in order to derive a patient-specific 3D model of the wrist bones. We perform a cadaver study of 13 subjects to evaluate our method. Our results demonstrate average mean surface and Hausdorff distance errors of 0.7 mm and 1.8 mm, respectively, showing better performance compared to two state-of-the-art approaches. This study demonstrates the potential of the proposed technique to be included in an ultrasound-based percutaneous scaphoid fracture fixation procedure.

Keywords: Bone enhancement · Scaphoid fracture · Phase symmetry · Log-Gabor filters · Shadow map

1 Introduction
Scaphoid fracture is the most probable outcome of wrist injury, and it often occurs due to a sudden fall on an outstretched arm. To heal the fracture, casting

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 465–473, 2016.
DOI: 10.1007/978-3-319-46720-7_54
466 E.M.A. Anas et al.

is usually recommended, which immobilizes the wrist in a short arm cast. The
typical healing time is 10–12 weeks, however, it can be longer especially for a
fracture located at the proximal pole of the scaphoid bone [8]. Better outcome
and faster recovery are normally achieved through open (for displaced fracture)
or percutaneous (for non-displaced fracture) surgical procedure, where a surgical
screw is inserted along the longest axis of the fractured scaphoid bone within a
clinical accuracy of 2 mm [7].
In the percutaneous surgical approach for scaphoid fractures, fluoroscopy is
usually used to guide the screw along its desired drill path. The major drawbacks
of a fluoroscopic guidance are that only a 2D projection view of a 3D anatomy can
be used and that the patient and the personnel working in the operating room are
exposed to radiation. For reduction of the X-ray radiation exposure, a camera-
based augmentation technique [10] can be used. As an alternative to fluoroscopy,
3D ultrasound (US)-based procedure [2,3] has been suggested, mainly to allow
real-time 3D data for the navigation. However, the main challenge of using US
in orthopaedics lies in the enhancement of weak, disconnected, blurry and noisy
US bone responses.
The detection/enhancement of US bone responses can be broadly categorized
into two groups: intensity-based [4] and phase-based [2,5,6] approaches.
A review of the literature on these two approaches suggests that phase-based
approaches have an advantage where there are low-contrast or variable bone
responses, as often observed in 3D US data. Hacihaliloglu et al. [5,6] proposed a
number of phase-based bone enhancement approaches using a set of quadrature
band-pass (Log-Gabor) filters at different scales and orientations. These filters
assumed isotropic frequency responses across all orientations. However, bone
responses in US have a highly directional nature, which in turn produces anisotropic
frequency responses in the frequency domain. Most recently, Anas et al. [2]
presented an empirical wavelet-based approach to design a set of 2D anisotropic
band-pass filters. For bone enhancement of a 3D US volume, that 2D approach
could be applied to individual 2D frames of a given US volume. However, as a
2D-based approach, it cannot take advantage of correlations between adjacent
US frames. As a result, the enhancement is affected by spatial compounding
errors and by errors resulting from beam thickness effects [5].
In this work, we propose to utilize local 3D Fourier spectrum variations to
design a set of Log-Gabor filters for 3D local phase symmetry estimation applied
to enhance the wrist bone response in 3D US. In addition, information from
the shadow map [4] is utilized to further enhance the bone response. Finally, a
statistical wrist model is registered to the enhanced response to derive a patient-
specific 3D model of the wrist bones. A study consisting of 13 cadaver wrists
is performed to determine the accuracy of the registration, and the results are
compared with two previously published bone enhancement techniques [2,5].

2 Methods
Bone responses in US are highly directional with respect to the direction of
scanning, i.e., the width of the bone response along the scanning direction is
Bone Enhancement in Ultrasound Based on 3D Local Spectrum Variation 467

significantly narrower than along other directions. As a result, the magnitude


spectrum of an US volume has a wider bandwidth along the scanning direction
than along other directions. Most of the existing phase-based approaches [5,6]
employ isotropic filters (having the same bandwidths and center frequencies) across
different directions for the phase symmetry estimation. However, an isotropic
filter bank may not be appropriate to extract the phase symmetry accurately
from an anisotropic magnitude spectrum. In contrast to those approaches,
here we account for the spectrum variations in different directions to design
an anisotropic 3D Log-Gabor filter bank for an improved phase symmetry
estimation.

2.1 Phase Symmetry Estimation


The 3D local phase symmetry estimation starts with dividing a 3D frequency
spectrum into different orientations (example in Fig. 1(a)). A set of orientational
filters are used for this purpose, where the frequency response of each filter is
defined as a multiplication of azimuthal (Φ(φ)) and polar (Θ(θ)) filters:
   
$$O(\phi, \theta) = \Phi(\phi) \times \Theta(\theta) = \exp\left(-\frac{(\phi - \phi_0)^2}{2\sigma_\phi^2}\right) \times \exp\left(-\frac{(\theta - \theta_0)^2}{2\sigma_\theta^2}\right), \qquad (1)$$
where the azimuthal angle φ (0 ≤ φ ≤ 2π) measures the angle in the xy-plane
from the positive x-axis in counter-clockwise direction, and the polar angle
θ (0 ≤ θ ≤ π) indicates the angle from the positive z-axis. φ0 and θ0 indicate the
center of the orientation, and σφ and σθ represent the span/bandwidth of the ori-
entation (Fig. 1(a)). The purpose of the polar orientational filter is to divide the
3D spectrum into different cones, and the azimuthal orientational filter further
divides each cone into different sub-spectrums/orientations (Fig. 1(a)).
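For concreteness, the separable orientational filter of Eq. (1) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the Gaussian denominators are written in the standard 2σ² form, and the grid sizes and filter parameters are invented for the example.

```python
import numpy as np

def orientational_filter(phi, theta, phi0, theta0, sigma_phi, sigma_theta):
    """Frequency response of one orientational filter (Eq. 1):
    a Gaussian in the azimuthal angle times a Gaussian in the polar angle."""
    azimuthal = np.exp(-((phi - phi0) ** 2) / (2 * sigma_phi ** 2))
    polar = np.exp(-((theta - theta0) ** 2) / (2 * sigma_theta ** 2))
    return azimuthal * polar

# Evaluate on a small spherical-coordinate grid (sizes are arbitrary).
phi = np.linspace(0, 2 * np.pi, 64)
theta = np.linspace(0, np.pi, 32)
PHI, THETA = np.meshgrid(phi, theta, indexing="ij")
O = orientational_filter(PHI, THETA, phi0=np.pi / 4, theta0=np.pi / 2,
                         sigma_phi=0.3, sigma_theta=0.2)
```

The response peaks at the orientation center (φ₀, θ₀) and falls off with the chosen bandwidths, which is exactly the cone/sub-spectrum selection behavior described above.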

z orientation 7.6 0
7.5 −10
cone 0
7.4
−20
7.3
)
u( )

−30 ,
y 7.2
−40 ,
7.1
0 7 −50
6.9 −60
x 6.8 −70
0.5 1 1.5 2 2.5 −2 −1 0
10 10 10
(a) (b) (c)

Fig. 1. Utilization of the spectrum variation in local phase symmetry estimation. (a)
A 3D frequency spectrum is divided into different cones, and each segmented cone is
further partitioned into different orientations. (b) The variation of spectrum strength
over the polar angle. (c) The variation of spectrum strength over the angular frequency.

After selection of a particular orientation, band-pass Log-Gabor filters are
applied at different scales. Mathematically, the frequency response of a Log-Gabor
filter is defined as below:

$$R(\omega) = \exp\left(-\frac{(\ln(\omega/\omega_0))^2}{2(\ln \kappa)^2}\right), \qquad (2)$$
where ω (0 ≤ ω ≤ 3π) represents the angular frequency, ω0 represents the peak
tuning frequency, and 0 < κ < 1 is related to the octave bandwidth. Finally, the
frequency response of a band-pass Log-Gabor filter at a particular orientation
can be expressed as: F (ω, φ, θ) = R(ω)O(φ, θ).
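A matching sketch of the radial Log-Gabor filter of Eq. (2) follows. Zeroing the DC term (ω = 0, where the logarithm is undefined) is a common Log-Gabor convention assumed here, not a detail given in the paper; the grid and parameter values are invented for the example.

```python
import numpy as np

def log_gabor(omega, omega0, kappa):
    """Radial Log-Gabor response of Eq. (2); the DC term (omega = 0) is
    set to zero because ln(omega/omega0) is undefined there."""
    r = np.zeros_like(omega, dtype=float)
    nz = omega > 0
    r[nz] = np.exp(-np.log(omega[nz] / omega0) ** 2 / (2 * np.log(kappa) ** 2))
    return r

omega = np.linspace(0, 3 * np.pi, 256)
R = log_gabor(omega, omega0=np.pi / 2, kappa=0.55)
# The full band-pass filter at one orientation is the separable product
# F(omega, phi, theta) = R(omega) * O(phi, theta), as stated in the text.
```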

2.2 Enhancement of Bone Responses in US


The bone enhancement starts with the estimation of the parameters for the ori-
entational filter (φ0 , θ0 , σφ , σθ ) for each orientation (Sect. 2.2.1). The estimation
of the Log-Gabor filter parameters for each orientation is presented in Sect. 2.2.2.
The subsequent bone surface extraction is described afterward (Sect. 2.2.3).

2.2.1 Parameters for Orientational Filter


The first step is to compute the spherical Fourier transform (FT) P(ω, φ, θ) of
a given US volume. To do so, the conventional 3D FT in rectangular coordinates
is calculated, followed by a transformation into spherical coordinates. For
segmentation of the spectrum into different cones, we compute the strength of
the spectrum along the polar angle coordinate as:

$$u(\theta) = \sum_{\omega=0}^{3\pi} \sum_{\phi=0}^{2\pi} \log(|P(\omega, \phi, \theta)|).$$

An example u(θ) is demonstrated in Fig. 1(b). The locations θ_m of the maxima
of u(θ) are detected, where m = 1, 2, ..., M, and M is the total number of detected
maxima. For each θ_m, the detected left and right minima are represented as
θ_m^- and θ_m^+ (shown in Fig. 1(b)), and the difference between these two minima
positions is estimated as σ_{θ,m} = θ_m^+ − θ_m^-. Note that each detected maximum
corresponds to a cone in the 3D frequency spectrum, i.e., the total number of cones
is M, and the center and the bandwidth of the m-th cone are θ_m and σ_{θ,m},
respectively.
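The maxima-and-neighboring-minima bookkeeping used to define the cones can be illustrated as below. The paper does not specify its exact peak-detection rules (smoothing, tie-breaking), so this sketch uses simple discrete local extrema on a toy strength profile.

```python
def cone_parameters(u):
    """Given spectrum strength u(theta) sampled on a 1-D grid, find local
    maxima and, for each, the nearest local minima on both sides.
    Returns (center_index, bandwidth_in_samples) pairs, mirroring the
    (theta_m, sigma_theta_m) cone parameters of Sect. 2.2.1."""
    maxima = [i for i in range(1, len(u) - 1) if u[i - 1] < u[i] >= u[i + 1]]
    minima = [i for i in range(1, len(u) - 1) if u[i - 1] > u[i] <= u[i + 1]]
    cones = []
    for m in maxima:
        left = [j for j in minima if j < m]
        right = [j for j in minima if j > m]
        lo = left[-1] if left else 0            # fall back to the grid edge
        hi = right[0] if right else len(u) - 1
        cones.append((m, hi - lo))
    return cones

u = [0.1, 0.5, 1.0, 0.4, 0.2, 0.6, 1.2, 0.7, 0.3]
cones = cone_parameters(u)  # two cones, centred at indices 2 and 6
```

The same routine applied to the azimuthal strength profile within one cone yields the sub-spectrum centers and bandwidths described in the next paragraph.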
Subsequently, each segmented cone is further divided into different sub-spectrums.
To do so, the strength of the spectrum is calculated along the azimuthal angle
within a particular cone (say, the m-th cone), followed by detection of the maxima
and the corresponding two neighboring minima as before. Then, the center φ_n^m
and the bandwidth σ_{φ,n}^m of the n-th sub-spectrum within the m-th cone are
calculated.

2.2.2 Parameters for Log-Gabor Filters


For estimation of the Log-Gabor filter parameters at each orientation, the
spectrum strength is calculated within that orientation as:

$$w^{m,n}(\omega) = \sum_{\theta} \sum_{\phi} 20 \log(|P(\omega, \phi, \theta)|) \; \mathrm{dB}.$$

A segmentation of w^{m,n}(ω) is performed [2] to estimate the parameters of
the Log-Gabor filters at different scales. The lower ω_{s,l}^{m,n} and upper
ω_{s,u}^{m,n} cut-off frequencies for a scale s are determined from w^{m,n}(ω)
(an example is shown in Fig. 1(c)), where 1 ≤ s ≤ S^{m,n}, and S^{m,n} is
the total number of scales at the n-th orientation within the m-th cone. The
subscripts 'l' and 'u' indicate the lower and upper cut-off frequencies. The
parameters of the Log-Gabor filters (ω₀ and κ) can be directly calculated
from the lower and upper cut-off frequencies as:

$$\omega_{s,0}^{m,n} = \sqrt{\omega_{s,l}^{m,n}\,\omega_{s,u}^{m,n}} \quad \text{and} \quad \kappa_s^{m,n} = \exp\left(-0.25 \log_2\!\left(\frac{\omega_{s,u}^{m,n}}{\omega_{s,l}^{m,n}}\right)\sqrt{2 \ln 2}\right).$$

2.2.3 Bone Surface Extraction


The filter parameters estimated above are then utilized to compute the frequency
responses of the orientational and Log-Gabor filters using Eqs. (1)–(2). These
filters are subsequently used in 3D phase symmetry estimation [5]. As local
phase symmetry also enhances other anatomical interfaces having symmetrical
responses, shadow information is utilized to suppress the responses from other
anatomies. A shadow map is estimated for each voxel by a weighted summation
of the intensity values of all voxels beneath it [4]. The product of the shadow map
with the phase symmetry is defined as the bone response (BR) in this work,
which has a range from 0 to 1. To construct a target bone surface, we apply
simple thresholding with a threshold T_bone to the BR volume to detect the
bones in the US volume. An optimized selection of the threshold is not possible
due to the small sample size (13) in this work; therefore, an empirical threshold
value of 0.15 is chosen.
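The bone-response computation of this section reduces to an element-wise product followed by a threshold. A toy sketch with made-up 2×2 phase-symmetry and shadow-map values (the real inputs are 3D volumes):

```python
import numpy as np

def bone_response(phase_symmetry, shadow_map, t_bone=0.15):
    """Voxel-wise product of phase symmetry and shadow map, thresholded at
    T_bone = 0.15 as in Sect. 2.2.3. Both inputs are assumed in [0, 1]."""
    br = phase_symmetry * shadow_map
    return br, br > t_bone

ps = np.array([[0.9, 0.1], [0.6, 0.05]])   # toy phase-symmetry values
sh = np.array([[0.8, 0.9], [0.1, 0.9]])    # toy shadow-map values
br, bone_mask = bone_response(ps, sh)
```

Only the voxel where both the symmetry and the shadow evidence are strong survives the threshold, which is the intended suppression of symmetric non-bone interfaces.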

2.3 Registration of a Statistical Wrist Model

A multi-object statistical wrist shape+scale+pose model is developed based on
the idea in [1] to capture the main modes of shape, scale, and pose variations
of the wrist bones across a group of subjects at different wrist positions. For
training during the model development, we use a publicly available wrist
database [9]. For registration of the model to a target point cloud, a multi-object
probabilistic registration is used [11]. The sequential registration is carried out
in two steps: (1) the statistical model is registered to a preoperative CT acquired
at the neutral wrist position, and (2) the model is subsequently registered to the
extracted bone surface in US acquired at a non-neutral wrist position. Note
that in the second step only the pose coefficients are optimized, to capture the pose
differences between CT and US. Note also that the pose model in [1] captures both
the rigid-body and scale variations; in this work, however, we use two different
models (pose and scale, respectively) to capture those variations. The key idea
behind separating the scale from the rigid-body motion is to avoid the scale
optimization during the US registration, as scale estimation from a limited
view of the bony anatomy in US may introduce additional registration error.

3 Experiments, Evaluation and Results

A cadaver experiment including 13 cadaver wrists was performed for evaluation
as well as comparison of our proposed approach with two state-of-the-art
techniques: a 2D empirical wavelet-based local phase symmetry (EWLPS)
method [2] and a 3D local phase symmetry (3DLPS) method [5].
3.1 Experimental Setup

For acquisition of US data from each cadaver wrist, a motorized linear probe
(Ultrasonix 4D L14-5/38, Ultrasonix, Richmond, BC, Canada) was used with
a frequency of 10 MHz, a depth of 40 mm, and a field-of-view of 30°, focusing
mainly on the scaphoid bone. A custom-built wrist holder was used to keep the
wrist fixed in an extended position (suggested by expert hand surgeons) during
scanning. To obtain a preoperative image and a ground truth of wrist US bone
responses, CTs were acquired at neutral and extension positions, respectively,
for all 13 cadaver wrists. An optical tracking system equipped with six fiducial
markers was used to track the US probe.

3.2 Evaluation

To generate the ground-truth wrist bone surfaces, CTs were segmented manually
using the Medical Imaging Interaction Toolkit. Fiducial-based registration was
used to align the segmented CT with the wrist bone responses in US. A manual
adjustment was also needed afterward to compensate for the movement of the wrist
bones during US acquisition due to the US probe's pressure on the wrist. The
manual translational adjustment was mainly performed along the direction of the
US scanning axis by registering the CT bone surfaces to the US bone responses.
For evaluation, we measured the mean surface distance error (mSDE) and
maximum surface (Hausdorff) distance error (mxSDE) between the registered
and reference wrist bone surfaces. The surface distance error (SDE) at each point
in the registered bone surface is defined as its Euclidean distance to the closest
neighboring point in the reference surface. mSDE and mxSDE are defined
as the average and maximum of the SDEs across all vertices, respectively. We also
recorded the run-times of the three bone enhancement techniques, using
unoptimized MATLAB™ (MathWorks, Natick, MA, USA) code on an Intel Core
i7-2600M CPU at 3.40 GHz, for a US volume of size 57.3 × 36.45 × 32.7 mm³
with a voxel spacing of 0.4 mm in all dimensions.
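The mSDE/mxSDE definitions above can be sketched with a brute-force nearest-neighbor search. The point clouds here are tiny invented examples, not the cadaver data; for real surfaces a spatial index (e.g., a k-d tree) would replace the pairwise distance matrix.

```python
import numpy as np

def surface_distance_errors(registered, reference):
    """mSDE and mxSDE between two point clouds (Sect. 3.2): for every vertex
    of the registered surface, take the Euclidean distance to its closest
    point on the reference surface, then return the mean and the maximum."""
    diffs = registered[:, None, :] - reference[None, :, :]
    sde = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)  # SDE per vertex
    return sde.mean(), sde.max()

reg = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # toy registered surface
ref = np.array([[0.0, 0.0, 0.5], [1.0, 0.0, 0.0]])  # toy reference surface
msde, mxsde = surface_distance_errors(reg, ref)
```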

3.3 Results

Table 1 reports comparative results of our approach with respect to the EWLPS
and 3DLPS methods. For each bone enhancement technique, a consistent threshold
value that provides the least error is used across the 13 cadaver cases.

Table 1. Comparative results of the proposed approach.

Method mSDE (mm) mxSDE (mm) Run-time (sec)


Our 0.7 ± 0.2 1.8 ± 0.3 11
EWLPS 0.8 ± 0.2 2.5 ± 0.5 4
3DLPS 0.9 ± 0.3 2.3 ± 0.4 10
Fig. 2. Results of the proposed, EWLPS, and 3DLPS methods. (a–h) Example sagittal
US frames are shown in (a), (e). The corresponding bone enhancements are
demonstrated in (b–d), (f–h). The differences in the enhancement are prominent in
the surfaces marked by arrows. (i–k) Example registration results of the statistical
model to US for the three different methods.

Figure 2 demonstrates the significant improvement we obtain in bone enhancement
quality using our proposed method over the two competing techniques. Two
example US sagittal slices are shown in Figs. 2(a), (e), and the corresponding
bone enhancements are shown below them. The registered atlases superimposed on
the US volume are displayed in Figs. 2(i–k).
The 2D EWLPS method is applied across the axial slices to obtain the bone
enhancement of the given US volume; therefore, a better bone enhancement
is expected across the axial slices than along the other directions. Figure 2(i)
demonstrates a better registration accuracy in the axial direction compared to the
sagittal one (solid vs. dashed arrow) for the EWLPS method. The 3DLPS method
mainly fails to enhance curved surfaces (an example in Fig. 2(c)); as a result,
it is less accurate in registration to curved surfaces (Fig. 2(j)).

4 Discussion and Conclusion

We have presented a bone enhancement method for 3D US volumes based on
the utilization of local 3D spectrum variation. The introduction of the spectrum
variation in the filter design allows us to estimate the 3D local phase symmetry
more accurately and subsequently better enhance the expected bone locations.
The improved bone enhancement in turn allows a better statistical model
registration to the US volume. We have applied our technique to 13 cadaver wrists,
and obtained an average mSDE of 0.7 mm and an average mxSDE of 1.8 mm
between the registered and reference scaphoid bone surfaces. Though our mxSDE
improvement of 0.5 mm is small in absolute magnitude, the achieved improvement
is significant at about 25 % of the clinical surgical accuracy (2 mm).
The appearance of neighboring bones in the US volume has a significant
impact on the registration accuracy. We have observed better registration
accuracies where the scaphoid and all four of its neighboring bones (lunate,
trapezium, capitate, part of the radius) are included in the field of view of the
US scans.
The tuning parameter T_bone acts as a trade-off between the appearance of
the bony anatomy and the outliers in the extracted surface. We have selected T_bone
in such a way that more outliers are allowed, with the purpose of increased bone
visibility. The effect of the outliers has been compensated for by using a probabilistic
registration approach that is robust to noise and outliers.
One limitation of the proposed approach is the enhancement of symmetrical
noise. This type of noise mainly appears as scattered objects (marked
by circles in Figs. 2(d), (h)) in the bone-enhanced volumes. Another limitation
is the ineffective utilization of shadow information: the shadow map used in this
work was not able to substantially reduce the non-bony responses.
Future work includes the development of a post-filtering approach on the
bone-enhanced volume to remove the scattered outliers. We also aim to integrate
the proposed technology into a clinical workflow and compare it with fluoroscopic
guidance. Further improvement of the run-time is also needed for the clinical
implementation.

Acknowledgements. We would like to thank the Natural Sciences and Engineering


Research Council, and the Canadian Institutes of Health Research for funding this
project.

References
1. Anas, E.M.A., et al.: A statistical shape+pose model for segmentation of wrist CT
images. In: SPIE Medical Imaging, vol. 9034, pp. T1–8. International Society for
Optics and Photonics (2014)
2. Anas, E.M.A., et al.: Bone enhancement in ultrasound using local spectrum vari-
ations for guiding percutaneous scaphoid fracture fixation procedures. IJCARS
10(6), 959–969 (2015)
3. Beek, M., et al.: Validation of a new surgical procedure for percutaneous scaphoid
fixation using intra-operative ultrasound. Med. Image Anal. 12(2), 152–162 (2008)
4. Foroughi, P., Boctor, E., Swartz, M.: 2-D ultrasound bone segmentation using
dynamic programming. In: IEEE Ultrasonics Symposium, pp. 2523–2526 (2007)
5. Hacihaliloglu, I., et al.: Automatic bone localization and fracture detection from
volumetric ultrasound images using 3-D local phase features. UMB 38(1), 128–144
(2012)
6. Hacihaliloglu, I., et al.: Local phase tensor features for 3D ultrasound to statistical
shape+pose spine model registration. IEEE TMI 33(11), 2167–2179 (2014)
7. Menapace, K.A., et al.: Anatomic placement of the Herbert-Whipple screw in
scaphoid fractures: a cadaver study. J. Hand Surg. 26(5), 883–892 (2001)
8. van der Molen, M.A.: Time off work due to scaphoid fractures and other carpal
injuries in the Netherlands in the period 1990 to 1993. J. Hand Surg.: Br. Eur.
24(2), 193–198 (1999)
9. Moore, D.C., et al.: A digital database of wrist bone anatomy and carpal kinemat-
ics. J. Biomech. 40(11), 2537–2542 (2007)
10. Navab, N., Heining, S.M., Traub, J.: Camera augmented mobile C-arm (CAMC):
calibration, accuracy study, and clinical applications. IEEE TMI 29(7), 1412–1423
(2010)
11. Rasoulian, A., Rohling, R., Abolmaesumi, P.: Lumbar spine segmentation using
a statistical multi-vertebrae anatomical shape+pose model. IEEE TMI 32(10),
1890–1900 (2013)
Bioelectric Navigation: A New Paradigm
for Intravascular Device Guidance

Bernhard Fuerst1,2(B), Erin E. Sutton1,3, Reza Ghotbi4, Noah J. Cowan3,
and Nassir Navab1,2

1 Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, MD, USA
{be.fuerst,esutton5}@jhu.edu
2 Computer Aided Medical Procedures, Technische Universität München, Munich, Germany
3 Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA
4 Department of Vascular Surgery, HELIOS Klinikum München West, Munich, Germany

Abstract. Inspired by the electrolocalization behavior of weakly electric
fish, we introduce a novel catheter guidance system for interventional
vascular procedures. Impedance measurements from electrodes on the
catheter form an electric image of the internal geometry of the vessel.
That electric image is then mapped to a pre-interventional model to
determine the relative position of the catheter within the vessel tree. The
catheter's measurement of its surroundings is unaffected by movement of
the surrounding tissue, so there is no need for deformable 2D/3D image
registration. Experiments in a synthetic vessel tree and ex vivo biological
tissue are presented. We employed dynamic time warping to map
the empirical data to the pre-interventional simulation, and our system
correctly identified the catheter's path in 25/30 trials in a synthetic phantom
and 9/9 trials in biological tissue. These first results demonstrated
the capability and potential of Bioelectric Navigation as a non-ionizing
technique to guide intravascular devices.

1 Introduction

As common vascular procedures become less invasive, the need for advanced
catheter navigation techniques grows. These procedures depend on accurate
navigation of endovascular devices, but the clinical state of the art presents
significant challenges. In practice, the interventionalist plans the path to the area of
interest based on pre-interventional images, inserts guide wires and catheters,
and navigates to the area of interest using multiple fluoroscopic images. However,
it is difficult and time-consuming to identify bifurcations for navigation,
and the challenge is compounded by anatomic irregularities.

B. Fuerst and E.E. Sutton are joint first authors, having contributed equally.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 474–481, 2016.
DOI: 10.1007/978-3-319-46720-7_55
Previous work has focused on interventional and pre-interventional image
registration [2,3,9], but adapting those techniques to endovascular navigation requires
deformable registration, a task that has proven challenging due to significant
vessel deformation [1]. From a clinical standpoint, the interventionalist is interested
in the catheter's position within the vessel tree, and deformation is irrelevant
to navigation since the catheter is constrained to stay within the tree. In fact,
to guide the catheter, one only needs a series of consecutive local measurements
of its surroundings to identify the branch and the excursion into the branch. This
suggests that a sensor directly on the catheter could be employed for navigation.
In this paper, we propose a radically new solution: Bioelectric Navigation.
It was inspired by the weakly electric fish, which generates an electric field
to detect subtle features of nearby objects [4]. Our technique combines such
local impedance measurements with estimates from pre-interventional imaging
to determine the position of the catheter within the vascular tree (Fig. 1). Instead of
the interventionalist relying on fluoroscopy, the catheter itself is equipped with
electrodes to provide feedback. One or more of the electrodes on the catheter
emits a weak electric signal and measures the change to the resulting electric
field as the catheter advances through the vessel tree. The impedance of blood is
much lower than that of vessel walls and surrounding tissue [5], so the catheter
detects local vessel geometry from measured impedance changes. For instance, as
the device passes a bifurcation, it detects a significant disturbance to the electric
field caused by the dramatic increase in vessel cross-sectional area. Bioimpedance
analysis has been proposed for plaque classification [8] and vessel lumen
measurement [7,12] but, to our knowledge, has not been applied to navigation. The
relative ordering and amplitude of the features (e.g., bifurcations, stenoses) used
for matching the live signal to the pre-interventional estimate are unchanged
under deformation, so the system is unaffected by movement and manipulation
of the surrounding tissue and does not require 2D/3D deformable registration.

[Fig. 1 block diagram: Diagnostic Images → Vessel Model → Signal Simulation → Signal Registration; Live Signal Acquisition → Signal Processing → Visualization]

Fig. 1. In Bioelectric Navigation, live bioelectric measurements are registered to sim-


ulated signals from a pre-interventional image to identify the catheter’s position.

In our novel system, the local measurement from the catheter is compared to
predicted measurements from a pre-interventional image to identify the global
position of the catheter relative to the vessel tree. It takes advantage of
high-resolution pre-interventional images and live voltage measurement for improved
device navigation. Its primary benefit would be the reduction of radiation
exposure for the patient, interventionalist, and staff. Experiments in a synthetic vessel
tree and ex vivo biological tissue show the potential of the proposed technology.
2 Materials and Methods


2.1 Modeling Bioimpedance as a Function of Catheter Location

The first step in Bioelectric Navigation is the creation of a bioimpedance model
from the pre-interventional image. A complete bioimpedance model requires
solution of the 3D Poisson equation, assuming known permittivities of blood
and tissue. Given a relatively simple geometry, one can employ finite element
analysis to numerically solve for the electric potential distribution. For our first
feasibility experiments, we designed an eight-path vessel phantom with two
stenoses and one aneurysm. We imported the 3D CAD model into COMSOL
Multiphysics (COMSOL, Inc., Stockholm, Sweden) and simulated the signal as a
two-electrode catheter passed through the six primary branches (Fig. 2A). The
simulation yielded six distinct models, one for each path.

2.2 Cross-Sectional Area to Parameterize Vessel Tree

It is infeasible to import the geometry of an entire human cardiovascular system
and simulate every path from a given insertion site. However, for catheter
navigation, we are only interested in the temporal variation of the measured signal as
the catheter travels through the vascular tree. This is why sensing technologies
such as intravascular ultrasound and optical coherence tomography are excessive
for catheter navigation. Instead of an exact characterization of the vessel wall
at each location, we use a simpler parameter to characterize the vessel geometry:
the cross-sectional area along the centerline of the vessel. We model the blood
between the emitting electrode and the ground electrode as an RC circuit, so
the voltage magnitude at the emitting electrode is inversely proportional to the
cross-sectional area of the vessel between the two electrodes, greatly simplifying
our parameterization of the vessel tree.
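Under this RC model, the predicted signal along a path is simply a reciprocal of the area profile. A minimal sketch with a hypothetical proportionality constant k (the paper does not give one; only the inverse-proportionality itself is from the text):

```python
def predicted_voltage(areas_mm2, k=1.0):
    """Voltage magnitude modeled as inversely proportional to the vessel
    cross-sectional area between the electrodes (Sect. 2.2). The constant
    k is a hypothetical scale factor for illustration."""
    return [k / a for a in areas_mm2]

# Passing a stenosis (area drops) raises the voltage; entering a
# bifurcation (area jumps) lowers it.
v = predicted_voltage([10.0, 4.0, 25.0])  # -> [0.1, 0.25, 0.04]
```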
There are many methods for the segmentation of the vascular tree in CT
images, and selecting the optimal method is not a contribution of this work. In
fact, our system is largely invariant to the segmentation algorithm chosen. It
uses the relative variation between segments to guide the catheter, so as long as
the segmentation captures the major geometric features, the extracted model need
not have high resolution. Here, we selected segmentation parameters specific to the
imaging modality (e.g., threshold, shape, background suppression) based on
published techniques [6,10]. After manual initialization at an entry point, the
algorithm detected the centerline and the shortest path between two points in a
vessel-like segmentation. It generated the vessel model and computed the
cross-sectional area at each segment for each possible path.
For the synthetic phantom, the simulated voltage at the emitting electrode
was proportional to the inverse of the cross-sectional area extracted from the
cone-beam CT (CBCT) (Fig. 2B). We conclude that cross-sectional area is
adequate for localization with a two-electrode catheter, the minimum required for
Bioelectric Navigation.
Fig. 2. (A) Simulation of synthetic vessel phantom from imported CAD geometry.
The electrodes (black) span the left-most stenosis in this image. The voltage decreases
at a bifurcation (blue star) and increases at a stenosis (pink star). (B) Simulated
voltage magnitude (green) and the inverse of the cross-sectional area (purple) from the
segmented CBCT.

2.3 Bioimpedance Acquisition


Like the fish, the catheter measures changes to its electric field to detect changes
in the geometry of its surroundings. The bioimpedance acquisition consists of
three main components: the catheter, the electronics, and the signal processing.
Almost any commercially available catheter equipped with electrodes can be
used to measure bioimpedance. A function generator supplies a sinusoidal input
to a custom-designed constant current source. The current source supplies a
constant low-current signal to the emitting electrode on the catheter, creating a
weak electric field in its near surroundings. As such, the voltage measured by the
catheter is a function of the change in impedance. Our software simply extracts
the voltage magnitude at the input frequency as the catheter advances.

2.4 Modeled and Empirical Signal Matching


The bioimpedance signal is a scaled and time-warped version of the inverse
cross-sectional area of the vessel, so the alignment of the measured bioimpedance
from the catheter with the modeled vessel tree is the foundation of our technique.
While we are investigating other alignment methods, in these initial experiments we
used open-ended dynamic time warping (OE-DTW) [11]. OE-DTW was chosen
because it can be adapted to provide feedback to the interventionalist during
a procedure. OE-DTW enables the alignment of incomplete test time series
with complete references. The incomplete voltage series during a procedure is
incrementally compared to each of the complete references from the model to
obtain constant feedback about the predicted location of the catheter. See [11]
for details. In our implementation, the inverse cross-sectional area along each path
formed the reference dataset, and the voltage magnitude from the catheter was
the test time series. The algorithm estimated the most likely position of the catheter in
the vessel model by identifying the reference path with the highest similarity
measure: the normalized cross-correlation with the test signal. In these initial
experiments, we advanced the catheter through approximately 90 % of each path,
so our analysis did not take advantage of the open-ended nature of the algorithm.
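A minimal sketch of open-ended DTW in the spirit of [11]: the test series is aligned from the start of each reference, and the end point is left free so that an incomplete pull-back can match a prefix of a path. This is an illustrative implementation with an absolute-difference cost, not the code used in the experiments (which ranked paths by normalized cross-correlation after warping).

```python
import numpy as np

def oe_dtw(test, ref):
    """Open-ended DTW: align `test` against every prefix of `ref`, anchored
    at the start, with the end left open. Returns the accumulated cost of the
    best alignment and the index in `ref` where it ends."""
    n, m = len(test), len(ref)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(test[i - 1] - ref[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    j_end = int(np.argmin(D[n, 1:])) + 1   # open end: best prefix of ref
    return D[n, j_end], j_end

test = np.array([0.0, 1.0, 2.0])            # partial pull-back signal
ref = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # one simulated path
cost, end = oe_dtw(test, ref)               # perfect match on a prefix
```

Running `oe_dtw` against every simulated path and keeping the best-scoring one mirrors the path identification described above.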

3 Experiments and Results


3.1 Experimental Setup
The prototype was kept constant for the two experiments presented here (Fig. 3).
We used a 6 F cardiac electrophysiology catheter (MultiCath 10J, Biotronik,
Berlin, Germany). Its ten ring electrodes were 2 mm wide with 5 mm spacing.
The input to the current source was ±5 mV at 430 Hz, and the current source
supplied a constant 18 µA to the emitting electrode. A neighboring electrode was
grounded. The voltage between the two electrodes was amplified and filtered by
a low-power biosignal acquisition system (RHD2000, Intan Technologies, Los
Angeles, USA). The Intan software (Intan Interface 1.4.2, Intan Technologies,
Los Angeles, USA) logged the signal from the electrodes. A windowed discrete
Fourier transform converted the signal into the frequency domain, and the
magnitude at the input frequency was extracted from each window. The most likely
path was identified as described in Sect. 2.4. While real-time implementation is
crucial to navigation, these first experiments involved only post hoc analyses.
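The windowed-DFT magnitude extraction described above can be sketched as follows. The sampling rate and window length are invented for the example; only the 430 Hz tone frequency is taken from the text.

```python
import numpy as np

def tone_magnitude(signal, fs, f_tone, win_len):
    """Split the signal into non-overlapping windows and return the DFT
    magnitude at the bin nearest f_tone for each window (Sect. 3.1)."""
    k = int(round(f_tone * win_len / fs))      # DFT bin of the input tone
    mags = []
    for start in range(0, len(signal) - win_len + 1, win_len):
        window = signal[start:start + win_len]
        mags.append(np.abs(np.fft.rfft(window))[k])
    return np.array(mags)

fs, f = 10000.0, 430.0                         # hypothetical sampling rate
t = np.arange(2000) / fs
x = np.sin(2 * np.pi * f * t)                  # a clean 430 Hz tone
m = tone_magnitude(x, fs, f, win_len=1000)
```

For a unit-amplitude tone that fits an integer number of cycles per window, each window's magnitude equals win_len/2, so changes in the measured magnitude track changes in the tone's amplitude as the catheter moves.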

Fig. 3. A function generator supplied a sinusoidal signal to the current source, creating
a weak electric field around the catheter tip. Electrodes on the catheter recorded the
voltage as it was pulled through six paths of the phantom. Inset: catheter in phantom.

3.2 Synthetic Vessel Tree


We performed the first validation experiments in the synthetic vessel tree
immersed in 0.9 % saline (Fig. 4). A camera recorded the trajectory of the
catheter through the phantom as it advanced through the six main paths at
1–2 mm/s. The OE-DTW algorithm correctly identified the path taken in 25/30
trials. The similarity measure was 0.5245 ± 0.0683 for misidentified trials and
0.6751 ± 0.1051 for correctly identified trials.
[Fig. 4 graphic. Panel (B), trials for which OE-DTW predicted the wrong path:
Actual  Predicted  Similarity
1       6          0.4639
2       1          0.4396
5       4          0.5841
5       4          0.5838
5       4          0.5509]
Fig. 4. (A) Synthetic phantom with labeled paths. The two halves of the phantom
were machined from acrylic and sealed with a thin layer of transparent waterproof
grease. When assembled, it measured 10 cm × 25.4 cm × 5 cm. (B) Trials for which
OE-DTW incorrectly predicted catheter position. (C) The measured voltage (blue)
and the simulated signal (green) identify the two stenoses and four bifurcations. The
signals appear correlated but misaligned. (D) The OE-DTW algorithm found a corre-
spondence path between the two signals. (E) OE-DTW aligned the simulated data to
the measured data and calculated the cross-correlation between the two signals.


Fig. 5. Biological tissue experiment (left) and results from one trial in the long path
(right). The stenosis and bifurcation are visible in both the inverse of the cross-sectional
area and voltage magnitude.

3.3 Ex Vivo Aorta

The impedance difference between saline and vessel is less dramatic than between
saline and acrylic, so we expected lower amplitude signals in biological tissue. We
sutured two porcine aortas into a Y-shaped vessel tree and simulated a stenosis
480 B. Fuerst et al.

in the trunk with a cable tie. We embedded the vessel in a 20 % gelatin solution
and filled the vessel with 0.9 % saline. The ground truth catheter position was
recorded from fluoroscopic image series collected simultaneously with the voltage
measurements (Fig. 5). The catheter was advanced six times through the long
path and three times through the short path. The algorithm correctly identified
the path 9/9 times with similarity measure 0.6081 ± 0.1614.

4 Discussion
This preliminary investigation suggests that the location of the catheter in a
blood vessel can be estimated by comparing a series of local measurements to
simulated bioimpedance measurements from a pre-interventional image.
Our technology will benefit from further integration of sensing and imaging
before clinical validation. While OE-DTW did not perfectly predict the location
of the catheter, the trials for which the algorithm misclassified the path also had
the lowest similarity scores. In practice, the system would prompt the interven-
tionalist to take a fluoroscopic image when similarity is low. Because it measures
local changes in bioimpedance, we expect the highest accuracy in feature-rich
environments, those most relevant to endovascular procedures. The estimate is
least accurate in low-feature environments like a long, uniform vessel, but as soon
as the catheter reaches the next landmark, the real-time location prediction is
limited only by the resolution of the electric image from the catheter. A possible
source of uncertainty is the catheter’s position in the vessel cross-section relative
to the centerline, but according to our simulations and the literature [12], it does
not significantly impact the voltage measurement.
To display the real-time position estimate, our next step is to compare tech-
niques that match simulated and live data in real time (e.g. OE-DTW, Hidden
Markov Models, random forests, and particle filters). A limitation of these match-
ing algorithms is that they fail when the catheter changes direction (insertion vs
retraction). One way we plan to address this is by attaching a simple encoder to
the introducer sheath to detect the catheter’s heading and prompting our soft-
ware to only analyze data from when the catheter is being inserted. We recently
validated Bioelectric Navigation in biologically relevant flow in the synthetic
phantom and performed successful renal artery detection in the abdominal
aorta of a sheep cadaver model. Currently, we are evaluating the prototype’s
performance in vivo, navigating through the abdominal vasculature of swine.

5 Scientific and Clinical Context


The generation and measurement of bioelectrical signals within vessels and their
mapping to a patient-specific vessel model has never been proposed for catheter
navigation. This work is complementary to the research done within MICCAI
community and has the potential to advance image-guided intervention. Bioelec-
tric Navigation circumvents many clinical imaging challenges such as catheter
detection, motion compensation, and catheter tracking. Significantly, deformable

registration for global 3D localization becomes irrelevant; the interventionalist
may move a vessel, but the catheter remains in the same vascular branch.
Once incorporated into the clinical workflow, Bioelectric Navigation has the
potential to significantly reduce fluoroscope use during common endovascular
procedures. In addition, it could ease the positioning of complex grafts, for
instance a graft to repair abdominal aortic aneurysm. These custom grafts incor-
porate holes such that the visceral arterial ostia are not occluded. Angio-
graphic imaging is of limited use to the positioning of the device because the oper-
ator must study the graft markers and arterial anatomy simultaneously. In con-
trast, when the bioelectric catheter passes a bifurcation, the electric impedance
changes dramatically. Bioelectric Navigation’s inside-out sensing could change
the current practice for device deployment by providing real-time feedback about
device positioning from inside the device itself.

References
1. Ambrosini, P., Ruijters, D., Niessen, W.J., Moelker, A., van Walsum, T.: Continu-
ous roadmapping in liver TACE procedures using 2D–3D catheter-based registra-
tion. Int. J. CARS 10, 1357–1370 (2015)
2. Aylward, S.R., Jomier, J., Weeks, S., Bullitt, E.: Registration and analysis of vas-
cular images. Int. J. Comput. Vis. 55(2), 123–138 (2003)
3. Dibildox, G., Baka, N., Punt, M., Aben, J., Schultz, C., Niessen, W.,
van Walsum, T.: 3D/3D registration of coronary CTA and biplane XA recon-
structions for improved image guidance. Med. Phys. 41(9), 091909 (2014)
4. Von der Emde, G., Schwarz, S., Gomez, L., Budelli, R., Grant, K.: Electric fish
measure distance in the dark. Nature 395(6705), 890–894 (1998)
5. Gabriel, S., Lau, R., Gabriel, C.: The dielectric properties of biological tissues: II.
Measurements in the frequency range 10 Hz to 20 GHz. Phys. Med. Biol. 41(11),
2251–2269 (1996)
6. Groher, M., Zikic, D., Navab, N.: Deformable 2D–3D registration of vascular struc-
tures in a one view scenario. IEEE Trans. Med. Imaging 28(6), 847–860 (2009)
7. Hettrick, D., Battocletti, J., Ackmann, J., Linehan, J., Waltier, D.: In vivo mea-
surement of real-time aortic segmental volume using the conductance catheter.
Ann. Biomed. Eng. 26, 431–440 (1998)
8. Metzen, M., Biswas, S., Bousack, H., Gottwald, M., Mayekar, K., von der Emde, G.:
A biomimetic active electrolocation sensor for detection of atherosclerotic lesions
in blood vessels. IEEE Sens. J. 12(2), 325–331 (2012)
9. Mitrovic, U., Spiclin, Z., Likar, B., Pernus, F.: 3D–2D registration of cerebral
angiograms: a method and evaluation on clinical images. IEEE Trans. Med. Imag-
ing 32(8), 1550–1563 (2013)
10. Pauly, O., Heibel, H., Navab, N.: A machine learning approach for deformable
guide-wire tracking in fluoroscopic sequences. In: Jiang, T., Navab, N., Pluim,
J.P.W., Viergever, M.A. (eds.) MICCAI 2010, Part III. LNCS, vol. 6363, pp. 343–
350. Springer, Heidelberg (2010)
11. Tormene, P., Giorgino, T., Quaglini, S., Stefanelli, M.: Matching incomplete time
series with dynamic time warping: an algorithm and an application to post-stroke
rehabilitation. Artif. Intell. Med. 45, 11–34 (2009)
12. Choi, H.W., Zhang, Z., Farren, N., Kassab, G.: Implications of complex anatomical
junctions on conductance catheter measurements of coronary arteries. J. Appl.
Physiol. 114(5), 656–664 (2013)
Process Monitoring in the Intensive Care Unit:
Assessing Patient Mobility Through Activity
Analysis with a Non-Invasive Mobility Sensor

Austin Reiter1(B) , Andy Ma1 , Nishi Rawat2 , Christine Shrock2 ,


and Suchi Saria1
1 The Johns Hopkins University, Baltimore, MD, USA
areiter@cs.jhu.edu
2 Johns Hopkins Medical Institutions, Baltimore, MD, USA

Abstract. Throughout a patient’s stay in the Intensive Care Unit


(ICU), accurate measurement of patient mobility, as part of routine
care, is helpful in understanding the harmful effects of bedrest [1]. How-
ever, mobility is typically measured through observation by a trained
and dedicated observer, which is extremely limiting. In this work, we
present a video-based automated mobility measurement system called
NIMS: Non-Invasive Mobility Sensor. Our main contributions are:
(1) a novel multi-person tracking methodology designed for complex envi-
ronments with occlusion and pose variations, and (2) an application of
human-activity attributes in a clinical setting. We demonstrate NIMS on
data collected from an active patient room in an adult ICU and show a
high inter-rater reliability using a weighted Kappa statistic of 0.86 for
automatic prediction of the highest level of patient mobility as compared
to clinical experts.

Keywords: Activity recognition · Tracking · Patient safety

1 Introduction
Monitoring human activities in complex environments is attracting increasing
interest [2,3]. Our current investigation is driven by automated hospital
surveillance, specifically for critical care units that house the sickest and most fragile
patients. In 2012, the Institute of Medicine released their landmark report [4] on
developing digital infrastructures that enable rapid learning health systems; one
of their key postulates is the need for improved technologies for measuring
the care environment. Currently, simple measures such as whether the patient
has moved in the last 24 h, or whether the patient has gone unattended for sev-
eral hours require manual observation by a nurse, which is highly impractical
to scale. Early mobilization of critically ill patients has been shown to reduce
physical impairments and decrease length of stay [5], however the reliance on
direct observation limits the amount of data that may be collected [6].
To automate this process, non-invasive low-cost camera systems have begun
to show promise [7,8], though current approaches are limited due to the unique

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 482–490, 2016.
DOI: 10.1007/978-3-319-46720-7 56
Patient Mobility in the ICU with NIMS 483

challenges common to complex environments. First, though person detection in


images is an active research area [9,10], significant occlusions present limitations
because the expected appearances of people do not match what is observed in
the scene. Part-based deformable methods [11] do somewhat address these issues
as well as provide support for articulation, however when combining deformation
with occlusion, these too suffer for similar reasons.
This paper presents two main contributions towards addressing challenges
common to complex environments. First, using an RGB-D sensor, we demon-
strate a novel methodology for human tracking systems which accounts for vari-
ations in occlusion and pose. We combine multiple detectors and model their
deformable spatial relationship with a temporal consistency so that individual
parts may be occluded at any given time, even through articulation. Second, we
apply an attribute-based framework to supplement the tracking information in
order to recognize activities, such as mobility events in a complex clinical environment. We call this system NIMS: A Non-Invasive Mobility Sensor.

1.1 Related Work

Currently, few techniques exist to automatically and accurately monitor ICU
patients' mobility. Accelerometry is one method that has been validated [12],
but it has limited use in critically ill inpatient populations [6]. Related to
multi-person tracking, methods have been introduced to leverage temporal cues
[13,14], however hand-annotated regions are typically required at the onset, lim-
iting automation. To avoid manual initializations, techniques such as [15,16]
employ a single per-frame detector with temporal constraints. Because sin-
gle detectors are limited towards appearance variations, [15] proposes to make
use of multiple detectors, however this assumes that the spatial configuration
between the detectors is fixed, which does not scale to address significant pose
variations.
Much activity analysis research has approached action classification with bag-
of-words approaches. Typically, spatio-temporal features, such as Dense Trajec-
tories [17], are used with a histogram of dictionary elements or a Fisher Vector
encoding [17]. Recent work has applied Convolutional Neural Network (CNN)
models to the video domain [18,19] by utilizing both spatial and temporal infor-
mation within the network topology. Other work uses Recurrent Neural Networks
with Long Short Term Memory [20] to model sequences over time. Because the
“activities” addressed in this paper are more high-level in nature, traditional
spatio-temporal approaches often suffer. Attributes describe high-level proper-
ties that have been demonstrated for activities [21], but these tend to ignore
contextual information.
The remainder of this paper is as follows: first, we describe our multi-person
tracking framework followed by our attributes and motivate their use in the
clinical setting to predict mobility. We then describe our data collection protocol
and experimental results and conclude with discussions and future directions.
484 A. Reiter et al.

2 Methods
Figure 1 shows an overview of our NIMS system. People are localized, tracked,
and identified using an RGB-D sensor. We predict the pose of the patient and
identify nearby objects to serve as context. Finally, we analyze in-place motion
and train a classifier to determine the highest level of patient mobility.

[Fig. 1 pipeline: Input Video → 1. Person Localization → 2. Patient Identification → 3. Patient Pose Classification and Context Detection → 4. Motion Analysis → 5. Mobility Classification → Mobility Level.]

Fig. 1. Flowchart of our mobility prediction framework. Our system tracks people in
the patient’s room, identifies the “role” of each (“patient”, “caregiver”, or “family
member”), relevant objects, and builds attribute features for mobility classification.

2.1 Multi-person Tracking by Fusing Multiple Detectors

Our tracking method works by formulating an energy functional comprising of


spatial and temporal consistency over multiple part-based detectors (see Fig. 2).
We model the relationship between detectors within a single frame using a
deformable spatial model and then track in an online setting.

Fig. 2. Full-body (red) and Head (green) detectors trained by [11]. The head detector
may fail with (a) proximity or (d) distance. The full-body detector may also struggle
with proximity [(b) and (c)]. (To protect privacy, all images are blurred). (Color figure
online)

Modeling Deformable Spatial Configurations - For objects that exhibit


deformation, such as humans, there is an expected spatial structure between
regions of interest (ROIs) (e.g., head, hands, etc.) across pose variations. Within
each pose (e.g. lying, sitting, or standing), we can speculate about an ROI (e.g.
head) based on other ROIs (e.g. full-body). To model such relationships, we
assume that there is a projection matrix $A^c_{ll'}$ which maps the location of ROI
$l$ to that of $l'$ for a given pose $c$. With a training dataset, $C$ types of poses
are determined automatically by clustering location features [10], and the projection
matrix $A^c_{ll'}$ can be learnt by solving a regularized least-squares optimization
problem.
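A minimal sketch of this regularized least-squares fit, phrased as ridge regression on corresponding box pairs, is given below. The box parameterization, the homogeneous bias row, and the synthetic head/body relation are assumptions for illustration only.

```python
import numpy as np

def learn_projection(X, Y, lam=1e-3):
    """Regularised least squares: find A minimising ||A X - Y||^2 + lam ||A||^2,
    where columns of X are source-ROI boxes and columns of Y target-ROI boxes."""
    d = X.shape[0]
    return Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))

# Synthetic pose cluster: the head box sits centred at the top fifth of the body box.
rng = np.random.default_rng(0)
body = rng.uniform(0, 100, size=(4, 200))          # rows: x, y, w, h
X = np.vstack([body, np.ones((1, 200))])           # homogeneous coordinate
true_A = np.array([[1, 0, 0.25, 0,   0],           # head x = body x + 0.25 w
                   [0, 1, 0,    0,   0],
                   [0, 0, 0.5,  0,   0],           # head w = body w / 2
                   [0, 0, 0,    0.2, 0]])          # head h = body h / 5
Y = true_A @ X + 0.01 * rng.standard_normal((4, 200))
A = learn_projection(X, Y)
```

One such matrix would be learnt per pose subcategory and ordered ROI pair.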
To derive the energy function of our deformable model, we denote the number
of persons in the t-th frame as $M^t$. For the m-th person, the set of corresponding
bounding boxes from $L$ ROIs is defined by $X^t(m) = \{X^t_1(m), \dots, X^t_L(m)\}$. For
any two proposed bounding boxes $X^t_l(m)$ and $X^t_{l'}(m)$ at frame t for individual
m, the deviation from the expected spatial configuration is quantified as the
error between the expected location of the bounding box for the second ROI
conditioned on the first. The total cost is computed by summing, for each of the
$M^t$ individuals, the minimum cost over the $C$ subcategories:

$$E_{spa}(X^t, M^t) = \sum_{m=1}^{M^t} \min_{1 \le c \le C} \sum_{l \ne l'} \left\| A^c_{ll'} X^t_l(m) - X^t_{l'}(m) \right\|^2 \qquad (1)$$
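The per-person term of Eq. (1) can be sketched as a minimum over pose subcategories; the data layout below (dictionaries of projection matrices keyed by ROI pair) is our own illustration, not the paper's implementation.

```python
import numpy as np

def e_spa_person(boxes, subcats):
    """Eq. (1) inner term for one person: min over pose subcategories c of the
    summed projection error over ordered ROI pairs (l, lp).
    `boxes[l]` is the box vector of ROI l; `subcats[c][(l, lp)]` is A^c_{l,lp}."""
    return min(
        sum(np.sum((A @ boxes[l] - boxes[lp]) ** 2)
            for (l, lp), A in pairs.items())
        for pairs in subcats
    )

# Two ROIs (0 = full body, 1 = head), two candidate pose subcategories
boxes = {0: np.array([0.0, 0.0, 10.0, 10.0]),
         1: np.array([0.0, 0.0, 10.0, 10.0])}
subcats = [{(0, 1): np.eye(4)},                 # identity projection fits here
           {(0, 1): 2.0 * np.eye(4)}]           # a poorly matching subcategory
```

Summing this quantity over all $M^t$ tracked individuals gives $E_{spa}$.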

Grouping Multiple Detectors - Next we wish to automate the process of


detecting people to track using a combination of multiple part-based detectors.
A collection of existing detection methods [11] can be employed to train K
detectors; each detector is geared towards detecting an ROI. Let us consider two
bounding boxes $D^t_k(n)$ and $D^t_{k'}(n')$ from any two detectors $k$ and $k'$, respectively.
If these are from the same person, the overlapped region is large when they are
projected to the same ROI using our projection matrix. In this case, the average
depths in these two bounding boxes are similar. We calculate the probability
that these are from the same person as:

$$p = a\, p_{over} + (1 - a)\, p_{depth} \qquad (2)$$
where $a$ is a positive weight, $p_{over}$ and $p_{depth}$ measure the overlapping ratio and
depth similarity between two bounding boxes, respectively. These scores are:

$$p_{over} = \max\left( \frac{\left|A^c_{\ell(k)\ell(k')} D^t_k(n) \cap D^t_{k'}(n')\right|}{\min\left(\left|A^c_{\ell(k)\ell(k')} D^t_k(n)\right|, \left|D^t_{k'}(n')\right|\right)},\; \frac{\left|D^t_k(n) \cap A^c_{\ell(k')\ell(k)} D^t_{k'}(n')\right|}{\min\left(\left|D^t_k(n)\right|, \left|A^c_{\ell(k')\ell(k)} D^t_{k'}(n')\right|\right)} \right) \qquad (3)$$

$$p_{depth} = \frac{1}{2} e^{-\frac{\left(v^t_k(n) - v^t_{k'}(n')\right)^2}{2\,\sigma^t_k(n)^2}} + \frac{1}{2} e^{-\frac{\left(v^t_k(n) - v^t_{k'}(n')\right)^2}{2\,\sigma^t_{k'}(n')^2}} \qquad (4)$$

where $\ell(\cdot)$ maps the k-th detector to the l-th region-of-interest, and $v$ and $\sigma$ denote the
mean and standard deviation of the depth inside a bounding box, respectively.
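A simplified sketch of this fusion score is given below. We assume both boxes are already projected to a common ROI, so a single overlap ratio stands in for the max of the two projected ratios in Eq. (3); function names and the example values are our own.

```python
import numpy as np

def overlap_ratio(box_a, box_b):
    """Area of intersection over the smaller box area; boxes are (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / min(area(box_a), area(box_b))

def same_person_prob(box_a, box_b, v_a, v_b, s_a, s_b, a=0.5):
    """p = a * p_over + (1 - a) * p_depth, cf. Eqs. (2)-(4).
    v_*: mean depth inside each box, s_*: depth standard deviation.
    Both boxes are assumed already projected to a common ROI."""
    p_over = overlap_ratio(box_a, box_b)
    p_depth = 0.5 * np.exp(-(v_a - v_b) ** 2 / (2 * s_a ** 2)) \
            + 0.5 * np.exp(-(v_a - v_b) ** 2 / (2 * s_b ** 2))
    return a * p_over + (1 - a) * p_depth
```

Detections whose pairwise score is high are merged into one group, feeding the grouping step described next.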
By the proximity measure given by (2), we group the detection outputs into
$N^t$ sets of bounding boxes. In each group $G^t(n)$, the bounding boxes are likely
from the same person. Then, we define a cost function that represents the match-
ing relationships between the true positions of our tracker and the candidate
locations suggested by the individual detectors as:

$$E_{det}(X^t, M^t) = \sum_{n=1}^{N^t} \min_{1 \le m \le M^t} \sum_{D^t_k(n') \in G^t(n)} w^t_k(n') \left\| D^t_k(n') - X^t_{\ell(k)}(m) \right\|^2 \qquad (5)$$

where $w^t_k(n')$ is the detection score, acting as a penalty for each detected bounding box.

Tracking Framework - We initialize our tracker at time t = 1 by aggregating


the spatial (Eq. 1) and detection matching (Eq. 5) cost functions. To determine
the best bounding box locations at time t conditioned on the inferred bounding
box locations at time t − 1, we extend the temporal trajectory Edyn and appear-
ance Eapp energy functions from [16] and solve the joint optimization (definition
for Eexc , Ereg , Edyn , Eapp left out for space considerations) as:

$$\min_{X^t, M^t}\; \lambda_{det} E_{det} + \lambda_{spa} E_{spa} + \lambda_{exc} E_{exc} + \lambda_{reg} E_{reg} + \lambda_{dyn} E_{dyn} + \lambda_{app} E_{app} \qquad (6)$$

We refer the interested reader to [22] for more details on our tracking framework.

2.2 Activity Analysis by Contextual Attributes


We describe the remaining steps for our NIMS system here.
Patient Identification - We fine-tune a pre-trained CNN [24] based on the
architecture in [25], which is initially trained on ImageNet (http://image-net.
org/). From our RGB-D sensor, we use the color images to classify images of
people into one of the following categories: patient, caregiver, or family-member.
Given each track from our multi-person tracker, we extract a small image accord-
ing to the tracked bounding box to be classified. By understanding the role of
each person, we can tune our activity analysis to focus on the patient as the
primary “actor” in the scene and utilize the caregivers into supplementary roles.
Patient Pose Classification and Context Detection - Next, we seek to
estimate the pose of the patient, and so we fine-tune a pre-trained network
to classify our depth images into one of the following categories: lying-down,
sitting, or standing. We choose depth over color as this is a geometric decision.
To supplement our final representation, we apply a real-time object detector
[24] to localize important objects that supplement the state of the patient, such
as: bed upright, bed down, and chair. By combining bounding boxes identified
as people with bounding boxes of objects, the NIMS may better ascertain if a
patient is, for example, “lying-down in a bed down” or “sitting in a chair”.
Motion Analysis - Finally, we compute in-place body motion. For example, if
a patient is lying in-bed for a significant period of time, clinicians are interested
in how much exercise in-bed occurs [23]. To achieve this, we compute the mean
magnitude of motion with a dense optical flow field within the bounding box
of the tracked patient between successive frames in the sequence. This statistic
indicates how much frame-to-frame in-place motion the patient is exhibiting.
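This motion statistic can be sketched as below. The dense flow field would come from an optical-flow routine such as OpenCV's `calcOpticalFlowFarneback`; here we substitute a synthetic field so the sketch stays self-contained.

```python
import numpy as np

def mean_motion(flow, bbox):
    """Mean optical-flow magnitude inside a tracked bounding box.
    `flow` is an (H, W, 2) dense field of per-pixel (dx, dy) displacements,
    `bbox` is (x, y, w, h) in pixel coordinates."""
    x, y, w, h = bbox
    patch = flow[y:y + h, x:x + w]
    return float(np.linalg.norm(patch, axis=2).mean())

# Toy field: 3 px/frame rightward motion inside the patient box, none elsewhere
flow = np.zeros((100, 100, 2))
flow[20:60, 30:70, 0] = 3.0
```

Averaging this value over a clip yields the in-place motion attribute used below.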
Mobility Classification - [23] describes a clinically-accepted 11-point mobility
scale (ICU Mobility Scale), as shown in Table 1 on the right. We collapsed
this into our Sensor Scale (left) with 4 discrete categories. The motivation for
this collapse was that walking is often performed outside the room, beyond the
view of our sensor.
By aggregating the different sources of information described in the preceding
steps, we construct our attribute feature Ft with:

Table 1. Table comparing our Sensor Scale, containing the 4 discrete levels of mobil-
ity that the NIMS is trained to categorize from a video clip of a patient in the ICU, to
the standardized ICU Mobility Scale [23], used by clinicians in practice today.

Sensor Scale ICU Mobility Scale


A. Nothing in bed 0. Nothing (lying in bed)
B. In-bed activity 1. Sitting in bed, exercises in bed
C. Out-of-bed activity 2. Passively moved to chair (no standing)
3. Sitting over edge of bed
4. Standing
5. Transferring bed to chair (with standing)
6. Marching in place (at bedside) for short duration
D. Walking 7. Walking with assistance of 2 or more people
8. Walking with assistance of 1 person
9. Walking independently with a gait aid
10. Walking independently without a gait aid

1. Was a patient detected in the image? (0 for no; 1 for yes)


2. What was the patient’s pose? (0 for sitting; 1 for standing; 2 for lying-down;
3 for no patient found )
3. Was a chair found? (0 for no; 1 for yes)
4. Was the patient in a bed? (0 for no; 1 for yes)
5. Was the patient in a chair? (0 for no; 1 for yes)
6. Average patient motion value
7. Number of caregivers present in the scene

We chose these attributes because their combination describes the “state”


of the activity. Given a video segment of length $T$, all attributes $F =
[F_1, F_2, \dots, F_T]$ are extracted and the mean $F_\mu = \frac{1}{T}\sum_{t=1}^{T} F_t$ is used to rep-
resent the overall video segment (the mean is used to account for spurious errors
that may occur). We then train a Support Vector Machine (SVM) to automati-
cally map each Fμ to the corresponding Sensor Scale mobility level from Table 1.
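To make the representation concrete, the per-clip aggregation is sketched below, with a nearest-centroid classifier standing in for the trained SVM. The attribute values and centroids are invented for illustration; the actual system learns its decision boundary from labelled clips.

```python
import numpy as np

# Per-frame attribute vectors F_t, cf. items 1-7 above (values illustrative):
# [patient_found, pose, chair_found, in_bed, in_chair, motion, n_caregivers]
clip = np.array([[1, 2, 0, 1, 0, 0.9, 1],
                 [1, 2, 0, 1, 0, 1.1, 1],
                 [1, 2, 0, 1, 0, 1.0, 2]])
F_mu = clip.mean(axis=0)        # video-level descriptor fed to the classifier

# Stand-in for the trained SVM: nearest class centroid in attribute space
centroids = {"A": np.array([1, 2, 0, 1, 0, 0.0, 1]),    # nothing in bed
             "B": np.array([1, 2, 0, 1, 0, 1.0, 1]),    # in-bed activity
             "C": np.array([1, 0, 1, 0, 1, 0.5, 1])}    # out-of-bed (chair)
label = min(centroids, key=lambda k: np.linalg.norm(F_mu - centroids[k]))
```

Here the lying pose, in-bed flag, and nonzero motion pull the clip toward Sensor Scale level B.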

3 Experiments and Discussions


Video data was collected from a surgical ICU at a large tertiary care hospital. All
ICU staff and patients were consented to participate in our IRB-approved study.
A Kinect sensor was mounted on the wall of a private patient room and was
connected to a dedicated encrypted computer where data was de-identified and
encrypted. We recorded 362 h of video and manually curated 109 video segments
covering 8 patients. Of these 8 patients, we use 3 of them to serve as training
data for the NIMS components (Sect. 2), and the remaining 5 to evaluate.
Training - To train the person, patient, pose, and object detectors we selected
2000 images from the 3 training patients to cover a wide range of appearances.

We manually annotated: (1) head and full body bounding boxes; (2) person
identification labels; (3) pose labels; and (4) chair, upright, and down beds.
To train the NIMS Mobility classifier, 83 of the 109 video segments covering
the 5 left-out patients were selected, each containing 1000 images. For each clip,
a senior clinician reviewed and reported the highest level of patient mobility and
we trained our mobility classifier through leave-one-out cross validation.
Tracking, Pose, and Identification Evaluation - We quantitatively com-
pared our tracking framework to the current SOTA. We evaluate with the widely
used metric MOTA (Multiple Object Tracking Accuracy) [26], which is defined
as 100 % minus three types of errors: false positive rate, missed detection rate,
and identity switch rate. With our ICU dataset, we achieved a MOTA of 29.14 %
compared to −18.88 % with [15] and −15.21 % with [16]. Using a popular RGBD
Pedestrian Dataset [27], we achieve a MOTA of 26.91 % compared to 20.20 % [15]
and 21.68 % [16]. We believe the difference in improvement here is due to there
being many more occlusions in our ICU data compared to [27]. With respect to
our person and pose ID, we achieved 99 % and 98 % test accuracy, respectively,
over 1052 samples. Our tracking framework requires a runtime of 10 secs/frame
(on average), and speeding this up to real-time is a point of future work.

Table 2. Confusion matrix demonstrating clinician and sensor agreement.

A. Nothing B. In-Bed C. Out-of-Bed D. Walking


A. Nothing 18 4 0 0
B. In-Bed 3 25 2 0
C. Out-of-Bed 0 1 25 1
D. Walking 0 0 0 4

Mobility Evaluation - Table 2 shows a confusion matrix for the 83 video seg-
ments to demonstrate the inter-rater reliability between the NIMS and clinician
ratings. We evaluated the NIMS using a weighted Kappa statistic with a lin-
ear weighting scheme [28]. The strength of agreement for the Kappa score was
qualitatively interpreted as: 0.0–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as
moderate, 0.61–0.80 as substantial, and 0.81–1.0 as almost perfect [28]. Our weighted Kappa
was 0.8616 with a 95 % confidence interval of (0.72, 1.0). To compare to a pop-
ular technique, we computed features using Dense Trajectories [17] and trained
an SVM (using Fisher Vector encodings with 120 GMMs), achieving a weighted
Kappa of 0.645 with a 95 % confidence interval of (0.43, 0.86).
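The linearly weighted kappa can be computed directly from the confusion matrix in Table 2; the sketch below reproduces the reported value of 0.8616.

```python
import numpy as np

def weighted_kappa(conf, ):
    """Weighted Cohen's kappa with a linear weighting scheme [28]."""
    conf = np.asarray(conf, dtype=float)
    n = conf.shape[0]
    i, j = np.indices((n, n))
    w = 1.0 - np.abs(i - j) / (n - 1)          # linear weights
    expected = np.outer(conf.sum(1), conf.sum(0)) / conf.sum()
    po = (w * conf).sum() / conf.sum()         # observed weighted agreement
    pe = (w * expected).sum() / conf.sum()     # chance weighted agreement
    return (po - pe) / (1.0 - pe)

# Confusion matrix from Table 2 (clinician vs. NIMS)
table2 = [[18, 4, 0, 0],
          [3, 25, 2, 0],
          [0, 1, 25, 1],
          [0, 0, 0, 4]]
kappa = weighted_kappa(table2)   # ≈ 0.8616, as reported
```

Linear weighting credits near-misses (e.g. confusing levels A and B) more than distant errors, which is appropriate for an ordinal mobility scale.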
The main source of disagreement lay in differentiating “A” from “B”. This
highlights a key difference between human and machine observation: the NIMS
distinguishes activities containing motion from those that do not using a
quantitative, repeatable approach.

4 Conclusions
In this paper, we demonstrated a video-based activity monitoring system called
NIMS. With respect to the main technical contributions, our multi-person track-
ing methodology addresses a real-world problem of tracking humans in complex
environments where occlusions and rapidly-changing visual information occur.
We will continue to develop our attribute-based activity analysis for more general
activities as well as work to apply this technology to rooms with multiple
patients and explore the possibility of quantifying patient/provider interactions.

References
1. Brower, R.: Consequences of bed rest. Crit. Care Med. 37(10), S422–S428 (2009)
2. Corchado, J., Bajo, J., De Paz, Y., Tapia, D.: Intelligent environment for moni-
toring Alzheimer patients, agent technology for health care. Decis. Support Syst.
44(2), 382–396 (2008)
3. Hwang, J., Kang, J., Jang, Y., Kim, H.: Development of novel algorithm and real-
time monitoring ambulatory system using bluetooth module for fall detection in
the elderly. In: IEEE EMBS (2004)
4. Smith, M., Saunders, R., Stuckhardt, K., McGinnis, J.: Best Care at Lower Cost:
the Path to Continuously Learning Health Care in America. National Academies
Press, Washington, DC (2013)
5. Hashem, M., Nelliot, A., Needham, D.: Early mobilization and rehabilitation in the
intensive care unit: moving back to the future. Respir. Care 61, 971–979 (2016)
6. Berney, S., Rose, J., Bernhardt, J., Denehy, L.: Prospective observation of physical
activity in critically ill patients who were intubated for more than 48 hours. J.
Crit. Care 30(4), 658–663 (2015)
7. Chakraborty, I., Elgammal, A., Burd, R.: Video based activity recognition in
trauma resuscitation. In: International Conference on Automatic Face and Ges-
ture Recognition (2013)
8. Lea, C., Facker, J., Hager, G., et al.: 3D sensing algorithms towards building an
intelligent intensive care unit. In: AMIA Joint Summits Translational Science Pro-
ceedings (2013)
9. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In:
IEEE CVPR (2005)
10. Chen, X., Mottaghi, R., Liu, X., et al.: Detect what you can: detecting and repre-
senting objects using holistic models and body parts. In: IEEE CVPR (2014)
11. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with
discriminatively trained part-based models. PAMI 32(9), 1627–1645 (2010)
12. Verceles, A., Hager, E.: Use of accelerometry to monitor physical activity in criti-
cally ill subjects: a systematic review. Respir. Care 60(9), 1330–1336 (2015)
13. Babenko, D., Yang, M., Belongie, S.: Robust object tracking with online multiple
instance learning. PAMI 33(8), 1619–1632 (2011)
14. Lu, Y., Wu, T., Zhu, S.: Online object tracking, learning and parsing with and-or
graphs. In: IEEE CVPR (2014)
15. Choi, W., Pantofaru, C., Savarese, S.: A general framework for tracking multiple
people from a moving camera. PAMI 35(7), 1577–1591 (2013)
16. Milan, A., Roth, S., Schindler, K.: Continuous energy minimization for multi-target
tracking. TPAMI 36(1), 58–72 (2014)

17. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: IEEE
ICCV (2013)
18. Karpathy, A., Toderici, G., Shetty, S., et al.: Large-scale video classification with
convolutional neural networks. In: IEEE CVPR (2014)
19. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recog-
nition in videos. In: NIPS (2014)
20. Wu, Z., Wang, X., Jiang, Y., Ye, H., Xue, X.: Modeling spatial-temporal clues in
a hybrid deep learning framework for video classification. In: ACMMM (2015)
21. Liu, J., Kuipers, B., Savarese, S.: Recognizing human actions by attributes. In:
IEEE CVPR (2011)
22. Ma, A.J., Yuen, P.C., Saria, S.: Deformable distributed multiple detector fusion
for multi-person tracking (2015). arXiv:1512.05990 [cs.CV]
23. Hodgson, C., Needham, D., Haines, K., et al.: Feasibility and inter-rater reliability
of the ICU mobility scale. Heart Lung 43(1), 19–24 (2014)
24. Girshick, R.: Fast R-CNN (2015). arXiv:1504.08083
25. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
volutional neural networks. In: NIPS (2012)
26. Keni, B., Rainer, S.: Evaluating multiple object tracking performance: the CLEAR
MOT metrics. EURASIP J. Image Video Proces. 2008, 1–10 (2008)
27. Spinello, L., Arras, K.O.: People detection in RGB-D data. In: IROS (2011)
28. McHugh, M.: Interrater reliability: the Kappa statistic. Biochemia Med. 22(3),
276–282 (2012)
Patient MoCap: Human Pose Estimation Under
Blanket Occlusion for Hospital Monitoring
Applications

Felix Achilles1,2(B) , Alexandru-Eugen Ichim3 , Huseyin Coskun1 ,


Federico Tombari1,4 , Soheyl Noachtar2 , and Nassir Navab1,5
1 Computer Aided Medical Procedures, Technische Universität München, Munich, Germany
achilles@in.tum.de
2 Department of Neurology, Ludwig-Maximilians-University of Munich, Munich, Germany
3 Graphics and Geometry Laboratory, EPFL, Lausanne, Switzerland
4 DISI, University of Bologna, Bologna, Italy
5 Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA

Abstract. Motion analysis is typically used for a range of diagnostic


procedures in the hospital. While automatic pose estimation from RGB-
D input has entered the hospital in the domain of rehabilitation medicine
and gait analysis, no such method is available for bed-ridden patients.
However, patient pose estimation in the bed is required in several fields
such as sleep laboratories, epilepsy monitoring and intensive care units.
In this work, we propose a learning-based method that automatically
infers 3D patient pose from depth images. To this end we rely on
a combination of a convolutional neural network and a recurrent neural net-
work, which we train on a large database that covers a range of motions
in the hospital bed. We compare to a state-of-the-art pose estimation
method which is trained on the same data and show the superior results
of our method. Furthermore, we show that our method can estimate the
joint positions under a simulated occluding blanket with an average joint
error of 7.56 cm.

Keywords: Pose estimation · Motion capture · Occlusion · CNN · RNN · Random forest

1 Introduction
Human motion analysis in the hospital is required in a broad range of diagnostic
procedures. While gait analysis and the evaluation of coordinated motor func-
tions [1,2] allow the patient to move around freely, the diagnosis of sleep-related
motion disorders and movement during epileptic seizures [3] requires a hospital-
ization and long-term stay of the patient. In specialized monitoring units, the
movements of hospitalized patients are visually evaluated in order to detect crit-
ical events and to analyse parameters such as lateralization, movement extent

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 491–499, 2016.
DOI: 10.1007/978-3-319-46720-7 57
492 F. Achilles et al.

or the occurrence of pathological patterns. As the analysis of patient movements


can be highly subjective [4], several groups have developed semi-automatic
methods to provide a quantified analysis. However, none of the above works
has attempted full-body joint regression, which would be necessary for an
automatic and objective quantification of patient movement. In this work, we
propose a new system for fully automatic continuous pose estimation of hospi-
talized patients, purely based on visual data. In order to capture the constrained
body movements in the hospital bed, we built up a large motion database that
is comprised of synchronized data from a motion capture system and a depth
sensor. We use a novel combination of a deep convolutional neural network and
a recurrent network in order to discriminatively predict the patient body pose in
a temporally smooth fashion. Furthermore, we augment our dataset with blan-
ket occlusion sequences, and show that our approach can learn to infer body
pose even under an occluding blanket. Our contributions can be summarized as
follows: (1) proposing a novel framework based on deep learning for real time
regression of 3D human pose from depth video, (2) collecting a large dataset of
movement sequences in a hospital bed, consisting of synchronized depth video
and motion capture data, (3) developing a method for synthetic occlusion of
the hospital bed frames with a simulated blanket model, (4) evaluating our new
approach against a state-of-the-art pose estimation method based on Random
Forests.

2 Related Work

Human pose estimation in the hospital bed has only been approached as a
classification task, which yields only a rough pose or the patient status [5,6].
Li et al. [5] use the Kinect sensor SDK in order to retrieve the patient pose and
estimate the corresponding status. However, they are required to leave the test
subjects uncovered by a blanket, which reduces the practical value for real hospi-
tal scenarios. Yu et al. [6] develop a method to extract torso and head locations
and use it to measure breathing motion and to differentiate sleeping positions.
No attempt was made to infer precise body joint locations and blanket occlusion
was reported to decrease the accuracy of the torso detection. While the number
of previous works that aim at human pose estimation for bed-ridden subjects is
limited, the popularity of depth sensors has pushed research on background-free
3D human pose estimation. Shotton et al. [7] and Girshick et al. [8] train Ran-
dom Forests on a large non-public synthetic dataset of depth frames in order to
capture a diverse range of human shapes and poses. In contrast to their method,
we rely on a realistic dataset that was specifically created to evaluate methods
for human pose estimation in bed. Furthermore, we augment the dataset with
blanket occlusions and aim at making it publicly available. More recently, deep
learning has entered the domain of human pose estimation. Belagiannis et al. [9]
use a convolutional neural network (CNN) and devise a robust loss function to
regress 2D joint positions in RGB images. Such one-shot estimations however
do not leverage temporal consistency. In the work of Fragkiadaki et al. [10], the
authors rely on a recurrent neural network (RNN) to improve pose prediction on
RGB video. However in their setting, the task is formulated as a classification
problem for each joint, which results in a coarse detection on a 12 × 12 grid.
Our method in contrast produces accurate 3D joint predictions in the contin-
uous domain, and is able to handle blanket occlusions that occur in hospital
monitoring settings.

3 Methods
3.1 Convolutional Neural Network

A convolutional neural network is trained for the objective of one-shot pose


estimation in 3D. The network directly predicts all 14 joint locations y which
are provided by the motion capture system. We use an L2 objective during
stochastic gradient descent training. Incorrect joint predictions ŷ result in a
gradient g = 2 · (ŷ − y), which is used to optimize the network weights via
backpropagation. An architecture of three convolutional layers followed by two
fully connected layers proved successful for this task. The layers are configured
as [9-9-64]/[3-3-128]/[3-3-128]/[13-5-1024]/[1024-42] in terms of [height-width-
channels]. A [2x2] max pooling is applied after each convolution. In order to
achieve better generalization of our network, we use a dropout function before
the second and before the fifth layer during training, which randomly switches
off features with a probability of 50 %. Rectified linear units are used after every
learned layer in order to allow for non-linear mappings of input and output. In
total, the CNN has 8.8 million trainable weights. After convergence, we use the
1024-element feature of the 4th layer and pass it to a recurrent neural network in
order to improve the temporal consistency of our joint estimations. An overview
of the full pipeline of motion capture and depth video acquisition as well as the
combination of convolutional and recurrent neural network is shown in Fig. 1.
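As a sanity check on this configuration, the stated weight count can be reproduced directly from the layer sizes; a minimal sketch, assuming a single-channel depth input and reading the [13-5-1024] entry as a 13×5 feature map with 128 channels entering the first fully connected layer:

```python
# Sanity check: reproduce the "8.8 million trainable weights" figure from the
# stated configuration. Conv layers are (kernel_h, kernel_w, in_ch, out_ch);
# the input is assumed to be a single-channel depth image.
convs = [(9, 9, 1, 64), (3, 3, 64, 128), (3, 3, 128, 128)]
conv_params = sum(h * w * cin * cout + cout for h, w, cin, cout in convs)

# The [13-5-1024] entry is read as a 13x5 feature map with 128 channels
# entering the first fully connected layer.
fc_params = (13 * 5 * 128) * 1024 + 1024
out_params = 1024 * 42 + 42        # 42 outputs = 14 joints x 3 coordinates
total = conv_params + fc_params + out_params
print(total)  # 8790442, i.e. ~8.8 million
```

As is typical for such architectures, the first fully connected layer dominates the parameter count.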

3.2 Recurrent Neural Network

While convolutional neural networks are capable of learning and exploiting


local spatial correlations of data, their design does not allow them to learn tem-
poral dependencies. Recurrent neural networks on the other hand are specifically
modeled to process time-series data and can hence complement convolutional net-
works. Their cyclic connections allow them to capture long-range dependencies
by propagating a state vector. Our RNN is built from Long Short-Term Memory
(LSTM) units and its implementation closely follows the one described by
Graves [11]. We use the 1024-element input vector of the CNN and train
128 hidden LSTM units to predict the 42-element output consisting of x-, y-
and z-coordinate of each of the 14 joints. The number of trainable weights of
our RNN is around 596,000. During training, backpropagation through time is
limited to 20 frames.
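The stated weight count follows from the standard LSTM parameterization; a sketch, assuming a single LSTM layer (four gates, each with an input matrix, a recurrent matrix and a bias, no peephole connections) followed by a plain linear readout:

```python
# Reproduce the "around 596,000" trainable weights of the RNN from the
# standard LSTM parameterization: four gates, each with an input-to-hidden
# matrix, a hidden-to-hidden matrix and a bias, plus a linear readout.
n_in, n_hidden, n_out = 1024, 128, 42
lstm_params = 4 * (n_hidden * (n_in + n_hidden) + n_hidden)
readout_params = n_hidden * n_out + n_out   # 14 joints x 3 coordinates
total = lstm_params + readout_params
print(total)  # 595754
```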

Fig. 1. Data generation and training pipeline. Motion capture (left) allows to retrieve
ground truth joint positions y, which are used to train a CNN-RNN model on depth
video. A simulation tool was used to occlude the input (blue) with a blanket (grey),
such that the system can learn to infer joint locations ŷ even under blanket occlusion.

3.3 Patient MoCap Dataset


Our dataset consists of a balanced set of easier sequences (no occlusion, little
movement) and more difficult sequences (high occlusion, extreme movement)
with ground truth pose information. Ground truth is provided through five cali-
brated motion capture cameras which track 14 rigid targets attached to each sub-
ject. The system allows to infer the location of 14 body joints (head, neck, shoul-
ders, elbows, wrists, hips, knees and ankles). All test subjects (5 female, 5 male)
performed 10 sequences, with a duration of one minute per sequence. Activi-
ties include getting out/in the bed, sleeping on a horizontal/elevated bed, eating
with/without clutter, using objects, reading, clonic movement and a calibration
sequence. During the clonic movement sequence, the subjects were asked to per-
form rapid twitching movements of arms and legs, such as to display motions
that occur during the clonic phase of an epileptic seizure. A calibrated and syn-
chronized Kinect sensor was used to capture depth video at 30 fps. In total, the
dataset consists of 180,000 video frames. For training, we select a bounding box
that only contains the bed. To ease the adaptation to different hospital
environments, all frames are rendered from a consistent camera viewpoint, fixed at
2 m distance from the center of the bed at a 70° inclination.

3.4 Blanket Simulation


Standard motion capture technologies make it impossible to track bodies under
blankets, because the physical markers must remain visible to the tracking
cameras. For this reason, we captured the ground truth data of each person
lying on the bed without being covered. We turned to physics simulation in order

Fig. 2. Snapshots of iterations of the physics simulation that was used to generate
depth maps occluded by a virtual blanket.

to generate depth maps with the person under a virtual blanket. Each RGB-D
frame is used as a collision body for a moving simulated blanket, represented as a
regular triangle mesh. At the beginning of a sequence, the blanket is added to the
scene at about 2 m above the bed. For each frame of the sequence, gravity acts
upon the blanket vertices. Collisions are handled by using a sparse signed dis-
tance function representation of the depth frame, implemented in OpenVDB [12].
See Fig. 2 for an example rendering. In order to optimize for the physical ener-
gies, we employ a state-of-the-art projection-based dynamics solver [13]. The
geometric energies used in the optimization are triangle area preservation, trian-
gle strain and edge bending constraints for the blanket and closeness constraints
for the collisions, which results in realistic bending and folding of the simulated
blanket.
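A drastically simplified, single-vertex version of this pipeline illustrates the mechanism: gravity acts on the vertex and a closeness constraint projects it back to the zero level set whenever it penetrates. An analytic sphere stands in for the OpenVDB distance grid, and the inelastic response is a simplification; the actual solver [13] couples many such projections with the blanket's elastic constraints:

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface."""
    return np.linalg.norm(p - center) - radius

def drop_vertex(p, v, sdf, dt=1e-2, steps=200):
    """Explicit integration under gravity with a projection-style collision
    response: a penetrating vertex is pushed back along the SDF gradient."""
    g = np.array([0.0, 0.0, -9.81])
    for _ in range(steps):
        v = v + dt * g
        p = p + dt * v
        d = sdf(p)
        if d < 0.0:                    # closeness constraint: project to surface
            eps = 1e-4
            grad = np.array([(sdf(p + eps * e) - d) / eps for e in np.eye(3)])
            p = p - d * grad / np.linalg.norm(grad)
            v = np.zeros(3)            # inelastic contact, for simplicity
    return p

center, radius = np.zeros(3), 0.5
p_end = drop_vertex(np.array([0.1, 0.0, 2.0]), np.zeros(3),
                    lambda p: sphere_sdf(p, center, radius))
# the vertex comes to rest on the sphere surface, |sdf| ~ 0
```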

4 Experiments
To validate our method, we compare to the regression forest (RF) method
introduced by Girshick et al. [8]. The authors used an RF to estimate the body
pose from depth data. At the training phase, random pixels in the depth image
are taken as training samples. A set of relative offset vectors from each sample’s
3D location to the joint positions is stored. At each branch node, a depth-
difference feature is evaluated and compared to a threshold, which determines if
the sample is passed to the left or the right branch. Threshold and the depth-
difference feature parameters are jointly optimized to provide the maximum
information gain at the branch node. The tree stops growing after a maximum
depth has been reached or if the information gain is too low. At the leaves, the
sets of offset vectors are clustered and stored as vote vectors. During test time,
body joint locations are inferred by combining the votes of all pixels via mean
shift. The training time of an ensemble of trees on >100 k images is prohibitively
long, which is why the original authors use a 1000-core computational cluster to
achieve state-of-the-art results [7]. To circumvent this requirement, we randomly

sample 10 k frames per tree. By evaluating the gain of using 20 k and 50 k frames
for a single tree, we found that the accuracy saturates quickly (compare Fig. 6
of [8]), such that using 10k samples retains sufficient performance while cutting
down the training time from several days to hours.
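The depth-difference split feature of [7,8] can be sketched as follows (a simplified numpy illustration, not the authors' implementation; the image, offsets and threshold are hypothetical):

```python
import numpy as np

def depth_feature(depth, x, u, v):
    """Depth-difference feature in the spirit of [7,8]: the two probe offsets
    u, v are scaled by the inverse depth at pixel x to make the feature
    depth-invariant; out-of-bounds probes read as far background."""
    d = depth[x]
    h, w = depth.shape

    def probe(offset):
        p = (int(x[0] + offset[0] / d), int(x[1] + offset[1] / d))
        if 0 <= p[0] < h and 0 <= p[1] < w:
            return depth[p]
        return 1e6

    return probe(u) - probe(v)

# A branch node thresholds the feature to route a sample left or right.
depth = np.full((40, 40), 3.0)       # 3 m background
depth[10:30, 10:30] = 1.0            # a body-like blob at 1 m
f = depth_feature(depth, (20, 20), (0.0, 5.0), (0.0, -15.0))
go_left = f < 0.5                    # hypothetical learned threshold
print(f)  # -2.0: probe u hits the blob, probe v the background
```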

4.1 Comparison on the Patient MoCap Dataset


We fix the training and test set by using all sequences of 4 female and 4 male
subjects for training, and the remaining subjects (1 female, 1 male) for testing.
A grid search over batch sizes B and learning rates η provided B = 50 and
η = 3 · 10⁻² as the best choice for the CNN and η = 10⁻⁴ for the RNN. The
regression forest was trained on the same distribution of training data, from
which we randomly sampled 10,000 images per tree. We observed a saturation
of the RF performance after training 5 trees with a maximum depth of 15.
We compare the CNN, RNN and RF methods with regard to their average
joint error (see Table 1) and with regard to their worst case accuracy, which is
the percentage of frames for which all joint errors satisfy a maximum distance
constraint D, see Fig. 3. While the RNN reaches the lowest average error at
12.25 cm, the CNN appears to have less outlier estimations which result in the
best worst case accuracy curve. At test-time, the combined CNN and RNN block
takes 8.87 ms to infer the joint locations (CNN: 1.65 ms, RNN: 7.25 ms), while
the RF algorithm takes 36.77 ms per frame.

Fig. 3. Worst case accuracy computed on 36,000 test frames of the original dataset.
On the y-axis we plot the ratio of frames in which all estimated joints are closer to the
ground truth than a threshold D, which is plotted on the x-axis.
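The worst case accuracy of Fig. 3 counts a frame as correct only when every joint satisfies the distance constraint; a minimal numpy sketch on synthetic data:

```python
import numpy as np

def worst_case_accuracy(pred, gt, D):
    """Fraction of frames in which *all* joints are closer to the ground
    truth than D. pred, gt: (n_frames, n_joints, 3) arrays in meters."""
    err = np.linalg.norm(pred - gt, axis=2)     # per-frame, per-joint error
    return float(np.mean(err.max(axis=1) < D))

gt = np.zeros((100, 14, 3))
pred = gt + 0.05               # every coordinate off by 5 cm (~8.7 cm per joint)
acc_10cm = worst_case_accuracy(pred, gt, 0.10)
acc_5cm = worst_case_accuracy(pred, gt, 0.05)
print(acc_10cm, acc_5cm)       # 1.0 0.0
```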

4.2 Blanket Occlusion

A blanket was simulated on a subset of 10,000 frames of the dataset (as explained
in Sect. 3.4). This set was picked from the clonic movement sequence, as it is
most relevant to clinical applications and allows comparing one-shot (CNN and
RF) and time-series methods (RNN) on repetitive movements under occlusion.
The three methods were trained on the new mixed dataset consisting of all

Fig. 4. Average per joint error on the blanket occluded sequence.

Table 1. Euclidean distance errors in [cm]. Error on the occluded test set decreases
after retraining the models on blanket occluded sequences (+r).

Sequence     CNN    RNN    RF
All          12.69  12.25  28.10
Occluded     9.05   9.23   21.30
Occluded+r   8.61   7.56   19.80

Fig. 5. Examples of estimated (red) and ground truth skeletons (green). Pose
estimations work without (a,b) and underneath (c,d) the blanket (blue).

other sequences (not occluded by a blanket) and the new occluded sequence.
For the RF, we added a 6th tree which was trained on the occluded sequence.
Figure 4 shows a per joint comparison of the average error that was reached on
the occluded test set. Especially for hips and legs, the RF approach at over 20 cm
error performs worse than CNN and RNN, which achieve errors lower than 10 cm
except for the left foot. However, the regression forest manages to identify the
head and upper body joints very well and even beats the best method (RNN)
for head, right shoulder and right hand. In Table 1 we compare the average error
on the occluded sequence before and after retraining each method with blan-
ket data. Without retraining on the mixed dataset, the CNN performs best at
9.05 cm error, while after retraining the RNN clearly learns to infer a better joint
estimation for occluded joints, reaching the lowest error at 7.56 cm. Renderings
of the RNN predictions on unoccluded and occluded test frames are shown in
Fig. 5.

5 Conclusions
In this work we presented a unique hospital-setting dataset of depth sequences
with ground truth joint position data. Furthermore, we proposed a new scheme
for 3D pose estimation of hospitalized patients. Training a recurrent neural net-
work on CNN features reduced the average error both on the original dataset
and on the augmented version with an occluding blanket. Interestingly, the RNN
benefits greatly from seeing blanket occluded sequences during training, while the
CNN improves only very little. It appears that temporal information helps
to determine the location of limbs which are not directly visible but do interact
with the blanket. The regression forest performed well for arms and the head,
but was not able to deal with occluded legs and hip joints that are typically close
to the bed surface, resulting in a low contrast. The end-to-end feature learning
of our combined CNN-RNN model enables it to better adapt to the low contrast
of occluded limbs, which makes it a valuable tool for pose estimation in realistic
environments.

Acknowledgments. The authors would like to thank Leslie Casas and David Tan
from TUM and Marc Lazarovici from the Human Simulation Center Munich for their
support. This work has been funded by the German Research Foundation (DFG)
through grants NA 620/23-1 and NO 419/2-1.

References
1. Stone, E.E., Skubic, M.: Unobtrusive, continuous, in-home gait measurement using
the microsoft kinect. IEEE Trans. Biomed. Eng. 60(10), 2925–2932 (2013)
2. Kontschieder, P., Dorn, J.F., Morrison, C., Corish, R., Zikic, D., Sellen, A.,
D’Souza, M., Kamm, C.P., Burggraaff, J., Tewarie, P., Vogel, T., Azzarito, M.,
Glocker, B., Chin, P., Dahlke, F., Polman, C., Kappos, L., Uitdehaag, B.,
Criminisi, A.: Quantifying progression of multiple sclerosis via classification of
depth videos. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.)
MICCAI 2014, Part II. LNCS, vol. 8674, pp. 429–437. Springer, Heidelberg (2014)
3. Cunha, J., Choupina, H., Rocha, A., Fernandes, J., Achilles, F., Loesch, A.,
Vollmar, C., Hartl, E., Noachtar, S.: NeuroKinect: a novel low-cost 3Dvideo-EEG
system for epileptic seizure motion quantification. PLOS ONE 11(1), e0145669
(2015)
4. Benbadis, S.R., LaFrance, W., Papandonatos, G., Korabathina, K., Lin, K.,
Kraemer, H., et al.: Interrater reliability of EEG-video monitoring. Neurology
73(11), 843–846 (2009)
5. Li, Y., Berkowitz, L., Noskin, G., Mehrotra, S.: Detection of patient’s bed statuses
in 3D using a microsoft kinect. In: EMBC. IEEE (2014)
6. Yu, M.-C., Wu, H., Liou, J.-L., Lee, M.-S., Hung, Y.-P.: Multiparameter sleep mon-
itoring using a depth camera. In: Schier, J., Huffel, S., Conchon, E., Correia, C.,
Fred, A., Gamboa, H., Gabriel, J. (eds.) BIOSTEC 2012. CCIS, vol. 357, pp. 311–
325. Springer, Heidelberg (2013)
7. Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A.,
Cook, M., Moore, R.: Real-time human pose recognition in parts from single depth
images. Commun. ACM 56(1), 116–124 (2013)

8. Girshick, R., Shotton, J., Kohli, P., Criminisi, A., Fitzgibbon, A.: Efficient regres-
sion of general-activity human poses from depth images. In: ICCV. IEEE (2011)
9. Belagiannis, V., Rupprecht, C., Carneiro, G., Navab, N.: Robust optimization for
deep regression. In: ICCV. IEEE (2015)
10. Fragkiadaki, K., Levine, S., Felsen, P., Malik, J.: Recurrent network models for
human dynamics. In: ICCV. IEEE (2015)
11. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint
arXiv:1308.0850 (2013)
12. Museth, K., Lait, J., Johanson, J., Budsberg, J., Henderson, R., Alden, M.,
Cucka, P., Hill, D., Pearce, A.: OpenVDB: an open-source data structure and
toolkit for high-resolution volumes. In: ACM SIGGRAPH 2013 Courses. ACM
(2013)
13. Bouaziz, S., Martin, S., Liu, T., Kavan, L., Pauly, M.: Projective dynamics: fusing
constraint projections for fast simulation. ACM Trans. Graph. (TOG) 33(4), 154
(2014)
Numerical Simulation of Cochlear-Implant
Surgery: Towards Patient-Specific Planning

Olivier Goury1,2(B), Yann Nguyen2, Renato Torres2, Jeremie Dequidt1,
and Christian Duriez1

1 Inria Lille - Nord Europe, Université de Lille 1, Villeneuve-d’Ascq, France
olivier.goury@inria.fr
2 Inserm, UMR-S 1159, Université Paris VI Pierre et Marie Curie, Paris, France

Abstract. During Cochlear Implant Surgery, the right placement of the


implant and the minimization of the surgical trauma to the inner ear are
important issues with recurrent failures. In this study, we reproduced,
using simulation, the mechanical insertion of the implant during the
surgery. This simulation provides a better understanding of the
failure cases: excessive contact force, buckling of the implant inside and
outside the cochlea. Moreover, using a patient-specific geometric model
of the cochlea in the simulation, we show that the insertion angle is a
clinical parameter that has an influence on the forces endured by both
the cochlea walls and the basilar membrane, and hence to post-operative
trauma. The paper presents the mechanical models used for the implant,
for the basilar membrane and the boundary conditions (contact, friction,
insertion, etc.) and discusses the obtained results in the perspective
of using the simulation for planning and robotization of the implant
insertion.

Keywords: Cochlear implant surgery · Cochlea modeling · FEM

1 Introduction
Cochlear implant surgery can be used for profoundly deafened patients, for whom
hearing aids are not satisfactory. An electrode array is inserted into the tympanic
ramp of the patient’s cochlea (scala tympani). When well inserted, this array can
then stimulate the auditory nerve and provide a substitute way of hearing (Fig. 2).
However, as of today, the surgery is performed manually and the surgeon has only
little perception of what happens in the cochlea while he is doing the insertion [1].

Fig. 2. Cross-section of a cochlea with implant inserted.
Yet, it is often the case that the
implant gets blocked in the cochlea before

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 500–507, 2016.
DOI: 10.1007/978-3-319-46720-7 58
Numerical Simulation of Cochlear-Implant Surgery 501

being completely inserted (Fig. 1). Another issue is the fact that this insertion can
create trauma on the wall of the cochlea as well as damage the basilar membrane.
This can lead to poor postoperative speech performance or loss of the remaining
acoustic hearing in lower frequencies that can be combined with electric
stimulation. Simulating the insertion procedure would therefore be highly valuable.
Indeed, it can be used for surgery planning, where the surgeon wishes to predict the
quality of the insertion depending on various parameters (such as the insertion
angle or the type of implant used) for a specific patient, or for surgery assistance in
the longer term (where the procedure would be robot-based). Cochlear implant
surgery was simulated in [2,3] respectively in 2 and 3 dimensions, on simplified
representations of the cochlea. These works allowed first predictions to be made
about the forces endured by the cochlea walls.
In this contribution, we develop a framework able to accurately simulate,
in three dimensions, the whole process of the implant insertion into a patient-
specific cochlea, including the basilar membrane deformation. The simulation is
done using the finite element method and the SOFA framework1 . The implant
is modelled using the beam theory, while shell elements are used to define a
computational model of the basilar membrane. The cochlea walls are modelled
as rigid, which is a common assumption [4] due to the bony nature of the cochlea.

Fig. 1. Examples of 3 insertions with different outcomes, from left to right: successful
insertion, failed insertion (folding tip), incomplete insertion.

2 Numerical Models and Algorithms

In this section, we describe the numerical model used to capture the mechanical
behavior and the specific shapes of the cochlear implant and the basilar mem-
brane. Moreover, the computation of the boundary conditions (contacts with
the cochlea walls, insertion of the implant) are also described, as they play an
important role in this simulation.

Implant Model: The implant is made of silicone and has about 20 electrodes
(depending on the manufacturer) spread along its length. It is about half a
millimetre thick and about two to three centimetre long. Its thin shape makes

1 www.sofa-framework.org.

it possible to use beam elements to capture its motion (see Fig. 3). Its dynamics
can be modelled as follows:

Mv̇ = p − F(q, v) + Hᵀλ,    (1)

where M is the mass matrix, q is the vector of generalised coordinates (each


node at the extremity of a beam contains three spatial degrees of freedom and
three angular degrees of freedom), v is the vector of velocities. F represents the
internal forces of the beams while p gathers the external forces. λ is the vector of
contact forces magnitudes with either the cochlea wall or the basilar membrane,
and H gathers the contact directions. The computation of the internal forces F
relies on the assumption of an elastic behavior, which brings back the electrode
to its rest shape when external forces are released. In practice, we defined a
Young’s modulus of around 250 MPa as in [5] and we rely on the assumption
of a straight rest shape to model the electrode we used for the experiments.
However some pre-shaped electrodes exist, and our implementation of the beam
model supports the use of curved reference shape.
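In the unconstrained case (λ = 0), Eq. (1) can be advanced with a semi-implicit Euler scheme; a toy numpy sketch for a single node with a linearized internal force F(q, v) = Kq + cv (the matrices, damping and load are illustrative stand-ins, not the actual beam quantities):

```python
import numpy as np

# Toy integration of the unconstrained dynamics M dv/dt = p - F(q, v)
# for one node (lambda = 0, i.e. no contact).
M = np.diag([1e-3, 1e-3, 1e-3])       # lumped nodal mass [kg] (illustrative)
K = np.diag([50.0, 50.0, 50.0])       # linearized elastic stiffness [N/m]
c = 0.05                               # viscous damping: F(q, v) = K q + c v
p_ext = np.array([0.0, 0.0, -1e-2])    # constant external load [N]

q = np.zeros(3)                        # displacement from the rest shape
v = np.zeros(3)
dt = 1e-3
for _ in range(20000):                 # semi-implicit (symplectic) Euler
    a = np.linalg.solve(M, p_ext - K @ q - c * v)
    v = v + dt * a
    q = q + dt * v

# With damping, q settles at the static equilibrium K q = p_ext.
q_static = np.linalg.solve(K, p_ext)
print(np.allclose(q, q_static))        # True
```

The elastic term pulls the node back toward its rest shape once the load is released, mirroring the rest-shape assumption described above.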

Fig. 3. (Left) The implant is modeled using beam elements and, (middle) its motion
is constrained by contact and friction response to collision with cochlear walls. (right)
Contact forces induces strain on the Basilar membrane.

Basilar Membrane Model: The basilar membrane separates two liquid-filled tun-
nels that run along the coil of the cochlea: scala media and scala tympani (by
which the implant is inserted). It is made of a stiff material but is very thin
(about 4 μm) and thus very sensitive to the contact with the electrodes. During
the insertion, even if the electrode is soft, the membrane will deform to com-
ply with its local shape. In case of excessive contact force, the membrane will
rupture: the electrode could then freely go in the scala media or scala vestibuli.
This will lead to loss of remaining hearing, damages to auditory nerve dendrites
and fibrosis. To represent the Basilar membrane, we use a shell model [6] that
derives from a combination of a triangular in-plane membrane element and a
triangular thin plate in bending. The nodes of the membrane that are connected
with the walls of the cochlea are fixed, like in the real case.

Implant Motion: During the procedure, the implant is pushed (using pliers)
through the round window which marks the entrance of the cochlea. To simplify
the implant model, we only simulate the portion of the implant which is inside

the cochlea. The length of the beam model is thus increased progressively during
the simulation to simulate the insertion made by the surgeon. Fortunately, our
beam model relies on continuum equations, and we can adapt the sampling of
beam elements at each simulation step while keeping the continuity of the values
of F. The position and orientation of the implant body may play an important
role (see Sect. 4), so these are not fixed. Conversely, we consider that the implant
is pushed at constant velocity, as a motorized tool for pushing the implant was
used in the experiments.

Contact Response on Cochlear Walls: The motion of the implant is constrained


by contact and friction forces that appear when colliding with the walls of the
cochlea. To obtain an accurate simulation, the modeling of both the geometry of the
cochlea walls and the physics of the collision response is important. To reproduce the geom-
etry of the cochlea, we rely on images issued from cone-beam CT. The images are
segmented using ITK-Snap and the surface of the obtained mesh is smoothed
to remove sampling noise. Compared to previous work [2,3], our simulations do
not use a simplified geometric representation of the cochlear walls.
The contact points between implant and cochlea walls are detected using an
algorithm that computes the closest distances (proximity queries) between the
mesh and the centerline of the implant model. The algorithm is derived from [7].
At each contact point, the signed distance δn(q) between the centerline
and the corresponding point on the collision surface (along the normal direction
of the surface) must be larger than the radius of the implant (δn(q) ≥ r). It
should be noted that this collision formulation creates a round shape at the tip
of the implant which is realistic but badly displayed visually in the simulation.
The contact force λn follows Signorini’s law:
0 ≤ λn ⊥ δn(q) − r ≥ 0    (2)
In addition to the precision, one advantage of this law is that there are no
additional parameters other than the relative compliance of the deformable structure
in contact. In the tangential direction, λt follows Coulomb’s friction law to repro-
duce the stick/slip transitions that are observed in the clinical practice. At each
contact point, the collision response is based on Signorini’s law and Coulomb’s
friction using the solvers available in SOFA.
Unfortunately, the friction coefficient μ is one of the missing parameters of
the simulation. Several studies have tried to estimate the frictional conditions
between the electrode array of the implant and the endosteum lining and the
wall of the tympani, such as [8] or [9]. However, experiments were performed ex-
vivo on a relatively small set of samples and exhibited important variability
and heterogeneity. As a consequence, in Sect. 4, we perform a sensitivity analysis
of this parameter.
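For a single contact point, the two laws can be sketched as a force projection (numpy; the penalty factor k is illustrative, whereas SOFA resolves the coupled complementarity problem with its constraint solvers):

```python
import numpy as np

def contact_force(gap, v_t, mu, k=1e3):
    """Single-contact sketch: gap = delta_n(q) - r, v_t = tangential velocity.
    k is an illustrative penalty factor, not a SOFA quantity."""
    lam_n = max(0.0, -k * gap)          # Signorini: 0 <= lam_n, lam_n * gap = 0
    if lam_n == 0.0:
        return 0.0, np.zeros(2)         # open contact: no force at all
    lam_t = -k * v_t                    # trial tangential (sticking) force
    if np.linalg.norm(lam_t) > mu * lam_n:
        # slipping: project the force onto the friction cone boundary
        lam_t = mu * lam_n * lam_t / np.linalg.norm(lam_t)
    return lam_n, lam_t

# A penetrating, sliding contact: normal force on, friction capped at mu*lam_n,
# reproducing the stick/slip transition described above.
lam_n, lam_t = contact_force(gap=-1e-3, v_t=np.array([0.2, 0.0]), mu=0.1)
```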

3 Experimental Validation
As mentioned in the introduction, it is difficult to have immediate feedback
on how the implant deploys in the cochlea due to very limited workspace and

visibility. This poor feedback prevents the surgeon from adapting and updating
his/her gesture to improve the placement of the implant. To have a better understanding
of the behaviors and to simplify the measurements, we have conducted exper-
iments of implant placement on temporal bones issued from cadavers. In this
section, these experiments are presented as well as a comparison between the
measurements and the simulation results.
Material: A custom experimental setup was built to evaluate the forces
endured by the scala tympani during the insertion of an electrode array at con-
stant velocity. This setup is described in Fig. 4. Recorded data: This setup allows
to compare forces when performing a manual insertion and a motorized, more
regular, insertion. With this setup, we are able to reproduce failure cases such
as incomplete insertion or so-called folding tip insertion, as displayed in Fig. 1.
Ability to Reproduce Incomplete Insertions: The goal of this first comparison is
to show whether we can reproduce what is observed in practice using simulation. Due
to contact and friction conditions and the fact that we work with living struc-
tures, it is never possible to reproduce the same insertion, even if the insertion
is motorized. So we do not expect the simulation to be predictive. However, we
show that the simulation is able to reproduce different scenarios of insertion
(complete/incomplete insertion or folding tip). Like in practice, the first impor-
tant resistance to the insertion of the implant appears in the turn at the bottom
of the cochlea (like in the picture (middle) of Fig. 3). This resistance creates a
buckling of the implant that limits the transmission in the longitudinal direction
until the implant presses the cochlear walls and manages to advance again. If the
resistance to motion is too large, the implant stays blocked. This differentiates
a complete and incomplete insertion and is captured by the simulation. Evolu-
tion of the implant forces while performing the insertion: An indicator of the

Fig. 4. Experimental setup. Microdissected cochleae are molded into resin (a) and
fixed to a 6-axis force sensor (c). A motorized uniaxial insertion tool (b) is used to
push the electrode array into the scala tympani at a constant velocity. The whole
setup is shown schematically in (d).
Numerical Simulation of Cochlear-Implant Surgery 505

smoothness of the insertion is the force applied on the implant by the surgeon
during the surgery. To minimise trauma, this force should remain low.
Experimental data show that this force generally increases as the insertion
progresses. This is explained by the fact that, as the implant is inserted, its
surface of contact with the cochlea walls and the basilar membrane increases,
leading to more and more friction. The force has a peak near the first turn of
the cochlea wall (the basal turn). We see that the simulation reproduces this
behaviour (see Figs. 1 and 6).
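The growth of the insertion force with contact length can be illustrated with a toy capstan-style friction model, in which the force needed at the base of the array grows exponentially with the angle the array wraps around the duct wall. This is only an illustrative sketch, not the paper's finite-element simulation; the friction coefficient and wrap angles below are invented:

```python
import math

def base_force(f_tip, mu, wrap_angle):
    """Capstan-style estimate of the force needed at the base of the
    electrode array to overcome a tip resistance f_tip when the array
    wraps wrap_angle radians around the cochlear wall with friction mu."""
    return f_tip * math.exp(mu * wrap_angle)

mu = 0.2  # assumed friction coefficient (illustrative only)
for turns in (0.25, 0.5, 1.0, 1.5):
    theta = 2.0 * math.pi * turns
    print(f"{turns:4.2f} cochlear turn(s): force amplification x{base_force(1.0, mu, theta):.2f}")
```

The exponential growth with wrapped angle mirrors the observed behaviour: past the basal turn the base force rises steeply, and if it exceeds the buckling load of the array, the implant stays blocked.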

4 Sensitivity of the Results to Mechanical and Clinical Parameters

Many parameters can influence the results of the simulation. We distinguish the
mechanical parameters (such as friction on the cochlea walls, stiffness of the
implant, elasticity of the membrane, etc.) from the clinical parameters, which
the surgeon can control to improve the success of the surgery. In this first study,
among all the mechanical parameters, we chose to study the influence of
friction, which is complex to measure. We show that the coefficient of friction
has an influence on the completeness of the insertion but less influence on
the force that is applied on the basilar membrane (see Fig. 7).
For the clinical parameters, we focus on the angle of insertion (see Fig. 5).
The position and orientation of the implant relative to the cochlear tunnels
play an important role in the ease of inserting the implant. The anatomy
makes it difficult to achieve a perfect alignment, but the surgeon still has a certain
freedom in the placement of the tube tip. Furthermore, his mental representation
of the optimal insertion axis is related to his experience, and even experts have
a 7° alignment error [1]. We test the simulation with various insertion angles,
from an aligned case with θ = 0° to a case where the implant is almost orthogonal

Fig. 5. (Left) Forces when performing motorized versus manual insertion using the
setup presented in Fig. 4. (Right) Dissected temporal bone used during experiments
with the definition of the insertion angle θ: the angle formed by the implant and the
wall of the cochlea's entrance.

Fig. 6. Comparison between experiments and simulation in 3 cases. We can see that the
simulation can reproduce cases met in real experiments (see Fig. 1). Regarding forces
on the cochlea walls, the general trend of the simulation is similar to the experiments.
To reproduce the folding-tip case in the simulation, which is rare in practice, the
array was preplaced with a folded tip at the round window region, which is why the
curve does not start from 0 length. In the incomplete insertion case, the force increases
greatly when the implant reaches the first turn. The simulation curve then stops. This
is because we did not include the real anatomy outside the entrance of the cochlea,
which would normally constrain the implant and lead the force to keep increasing.

Fig. 7. Forces applied on the cochlea wall (left) and the basilar membrane (center) at
the first turn of the cochlea. We can see that larger forces are generated when inserting
the implant at a wide angle. Regarding the forces on the basilar membrane, there
are two distinct groups of angles: small angles lead to much smaller forces than wider
ones. Changing the friction generally increases the forces (right). This leads to early
buckling of the implant outside the cochlea and hence to an incomplete insertion.

to the wall entrance with θ = 85°, and compare the outcomes of the insertion, as
well as the forces induced on the basilar membrane and the implant. Findings
are displayed in Fig. 7.

5 Conclusion and Future Work


In this paper, we propose the first mechanical simulation tool that reproduces
the insertion of the cochlear implant in 3D, using patient data. Several scenarios
are considered, and the results we obtained show that several failure modes of the
surgery can be reproduced in the simulator. Moreover, similar patterns of forces
against the cochlea's wall are measured in the experimental scenarios and their
corresponding simulations. From a quantitative standpoint, an analysis has been
conducted to estimate the influence of the main parameters reported by clinicians.
This preliminary study could be extended with the following perspectives:
first, we need to enrich our experimental study by considering several patients
and different implants; second, a (semi-)automated framework should be considered
to generate patient-specific data from medical images, so as to allow virtual planning
of the surgery within a clinically compatible time. This work could be a first
step towards the use of simulation in the planning of cochlear implant surgery
or even robot-assisted surgery. This objective would require the use of accurate
and validated biomechanical simulations of the whole procedure (anatomical
structures and implant). In-vivo experiments may be necessary.

Acknowledgements. The authors thank the foundation “Agir pour l’audition” which
funded this work and Oticon Medical.

References
1. Torres, R., Kazmitcheff, G., Bernardeschi, D., De Seta, D., Bensimon, J.L., Ferrary,
E., Sterkers, O., Nguyen, Y.: Variability of the mental representation of the cochlear
anatomy during cochlear implantation. European Archives of ORL, pp. 1–10 (2015)
2. Chen, B.K., Clark, G.M., Jones, R.: Evaluation of trajectories and contact pres-
sures for the straight nucleus cochlear implant electrode array: a two-dimensional
application of finite element analysis. Med. Eng. Phys. 25(2), 141–147 (2003)
3. Todd, C.A., Naghdy, F.: Real-time haptic modeling and simulation for prosthetic
insertion, vol. 73, pp. 343–351 (2011)
4. Ni, G., Elliott, S.J., Ayat, M., Teal, P.D.: Modelling cochlear mechanics. BioMed.
Res. Int. 2014, Article ID 150637, 42 p. (2014). doi:10.1155/2014/150637
5. Kha, H.N., Chen, B.K., Clark, G.M., Jones, R.: Stiffness properties for nucleus
standard straight and contour electrode arrays. Med. Eng. Phys. 26(8), 677–685
(2004)
6. Comas, O., Cotin, S., Duriez, C.: A shell model for real-time simulation of intra-
ocular implant deployment. In: Bello, F., Cotin, S. (eds.) ISBMS 2010. LNCS, vol.
5958, pp. 160–170. Springer, Heidelberg (2010)
7. Johnson, D., Willemsen, P.: Six degree-of-freedom haptic rendering of complex
polygonal models. In: Haptic Interfaces for Virtual Environment and Teleoperator
Systems, HAPTICS 2003, pp. 229–235. IEEE (2003)
8. Tykocinski, M., Saunders, E., Cohen, L., Treaba, C., Briggs, R., Gibson, P., Clark,
G., Cowan, R.: The contour electrode array: safety study and initial patient trials
of a new perimodiolar design. Otol. Neurotol. 22(1), 33–41 (2001)
9. Kha, H.N., Chen, B.K.: Determination of frictional conditions between electrode
array and endosteum lining for use in cochlear implant models. J. Biomech. 39(9),
1752–1756 (2006)
Meaningful Assessment of Surgical Expertise:
Semantic Labeling with Data and Crowds

Marzieh Ershad1, Zachary Koesters1, Robert Rege2, and Ann Majewicz1,2(B)

1 The University of Texas at Dallas, Richardson, TX, USA
Ann.Majewicz@utdallas.edu
2 UT Southwestern Medical Center, Dallas, TX, USA
http://www.utdallas.edu/hero/

Abstract. Many surgical assessment metrics have been developed to


identify and rank surgical expertise; however, some of these metrics (e.g.,
economy of motion) can be difficult to understand and do not coach
the user on how to modify behavior. We aim to standardize assessment
language by identifying key semantic labels for expertise. We chose six
pairs of contrasting adjectives and associated a metric with each pair
(e.g., fluid/viscous correlated to variability in angular velocity). In a
user study, we measured quantitative data (e.g., limb accelerations, skin
conductivity, and muscle activity), for subjects (n = 3, novice to expert)
performing tasks on a robotic surgical simulator. Task and posture videos
were recorded for each repetition and crowd-workers labeled the videos
by selecting one word from each pair. The expert was assigned more
positive words and also had better quantitative metrics for the majority
of the chosen word pairs, showing feasibility for automated coaching.

Keywords: Surgical training and evaluation · Crowdsourced assess-


ment · Semantic descriptors

1 Introduction
A great musician, an all-star athlete, and a highly skilled surgeon share one
thing in common: the casual observer can easily recognize their expertise, sim-
ply by observing their movements. These movements, or rather, the appearance
of the expert in action, can often be described by words such as fluid, effort-
less, swift, and decisive. Given that our understanding of expertise is so innate
and engrained in our vocabulary, we seek to develop a lexicon of surgical exper-
tise through combined data analysis (e.g., user movements and physiological
response) and crowd-sourced labeling [1,2].
In recent years, the field of data-driven identification of surgical skill has
grown significantly. Methods now exist to accurately classify expert vs. novice
users based on motion analysis [3], eye tracking [4], and theories from motor con-
trol literature [5], to name a few. Additionally, it is also possible to rank several
users in terms of expertise through pairwise comparisons of surgical videos [2].
While all these methods present novel ways for determining and ranking exper-
tise, an open question remains: how can observed skill deficiencies translate into

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 508–515, 2016.
DOI: 10.1007/978-3-319-46720-7_59
Meaningful Assessment of Surgical Expertise Using Data and Crowds 509

more effective training programs? Leveraging prior work showing the superior-
ity of verbal coaching for training [6], we aim to develop and validate a mecha-
nism for translating conceptually difficult, but quantifiable, differences between
novice and expert surgeons (e.g. “more directed graphs on the known state tran-
sition diagrams” [7] and “superior exploitation of kinematic redundancy” [5])
into actionable, connotation-based feedback that a novice can understand and
employ.
The central hypothesis to this study is that human perception of surgical
expertise is not so much a careful, rational evaluation, but rather an instinctive,
impulsive one. Prior work has proposed that surgical actions, or surgemes (e.g.,
knot tying, needle grasping, etc.), are ultimately the building blocks of surgery [8].
While these semantic labels describe the procedural flow of a surgical task, we
believe that surgical skill identification is more fundamental than how to push
a needle. It is about the quality of movement that one can observe from a short
snapshot of data. Are the movements smooth? Do they look fluid? Does the oper-
ator seem natural during the task? The hypothesis that expertise is a universal,
instinctive assessment is supported by recent work in which crowd-workers from
the general population identified surgical expertise with high accuracy [1]. Thus,
the key to developing effective training strategies is to translate movement qual-
ities into universally understandable, intuitive, semantic descriptors.

2 Semantic Descriptors of Expertise


Inspired by studies showing the benefit of verbal coaching for surgical training [6],
we selected a set of semantic labels which could be used to both describe sur-
gical expertise and coach a novice during training. The choice of adjectives was
informed by metrics commonly found in the literature (e.g., time, jerk, accel-
eration), as well as through discussions with surgeon educators and training
staff. As this is a preliminary study, we selected adjective pairs that were com-
monly used and also had some logical data metric that could be associated with
them. For example, “crisp/jittery” can be matched with a jerk measurement, and
“relaxed/tense” can be matched to some metric from electromyography (EMG)
recordings. Galvanic skin response (GSR) is another physiological measurement

Table 1. Semantic labeling lexicon

Positive adjective | Positive metric         | Negative metric         | Negative adjective
Crisp              | High mean jerk          | Low mean jerk           | Jittery
Fluid              | Low ang. velocity var.  | High ang. velocity var. | Viscous
Smooth             | Low acceleration var.   | High acceleration var.  | Rough
Swift              | Short completion time   | Long completion time    | Sluggish
Relaxed            | Low normalized EMG      | High normalized EMG     | Tense
Calm               | Low GSR event count     | High GSR event count    | Anxious
510 M. Ershad et al.

that is useful for counting stressful events, which correlate to increased anxi-
ety [9], thus serving as a basis for a “calm/anxious” word pair. The choice of
word pairs and the corresponding metric is not unique; however, for the purpose
of this paper, we simply aim to determine whether or not these word pairs have
some relevance in terms of surgical skill evaluation. The six preliminary word
pairs chosen and their corresponding data metrics are listed in Table 1.
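The kinematic entries in Table 1 are simple summary statistics of the recorded motion. A minimal sketch of how such metrics might be computed from one trial of IMU data (the sampling rate, array shapes, and function name are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def motion_metrics(acc, gyro, fs=100.0):
    """Candidate expertise metrics from one trial of IMU data.

    acc  : (N, 3) array of accelerations
    gyro : (N, 3) array of angular velocities
    fs   : sampling rate in Hz (assumed value)
    """
    acc_mag = np.linalg.norm(acc, axis=1)   # magnitude, so the gravity direction drops out
    jerk = np.gradient(acc_mag, 1.0 / fs)   # time derivative of acceleration magnitude
    gyro_mag = np.linalg.norm(gyro, axis=1)
    return {
        "mean_jerk": float(np.mean(np.abs(jerk))),  # crisp vs. jittery
        "ang_vel_var": float(np.var(gyro_mag)),     # fluid vs. viscous
        "acc_var": float(np.var(acc_mag)),          # smooth vs. rough
        "completion_time": len(acc) / fs,           # swift vs. sluggish
    }
```

Each returned value maps onto one word pair from Table 1; the physiological pairs (relaxed/tense, calm/anxious) require the EMG and GSR processing described in Sect. 3.4.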

3 Experimental Setup and Methods


Our hypothesis is that crowd-chosen words, which simultaneously correlate to
expertise level and measurable data metrics, will be good choices for an auto-
mated coaching system. Many studies have recently investigated the effectiveness
of crowd-sourcing for the assessment of technical skills and have shown correla-
tions to expert evaluations [1,10–12]. These studies support the hypothesis that
the identification of expertise is somewhat instinctive, regardless of whether or
not the evaluator is a topic-expert in his or her evaluation area. The goal of
this portion of our study is to see if the crowd can identify which of the chosen
words for expertise are most important or relevant for surgical skill. Therefore,
we conducted an experimental evaluation of our word pairs and metrics through
a human subjects study, using the da Vinci Surgical Simulator (on loan from
Intuitive Surgical, Sunnyvale, CA). Users were outfitted with a variety of sen-
sors used to collect metric data while performing tasks on the simulator. Video
recordings of the user performing tasks were used for crowd sourced identification
of relevant semantic descriptors.

3.1 Data Collection System

To quantify task movements and physiological response for our semantic label
metrics, we chose to measure joint positions (elbow, wrist, shoulder), limb accel-
erations (hand, forearms), forearm muscle activity with EMG, and GSR. Joint
positions were recorded using an electromagnetic tracker (trakSTAR, Model
180 sensors, Northern Digital Inc., Ontario, Canada) with an elbow estimation
method as described in [5]. Limb accelerations, EMG and GSR were measured
using sensor units from Shimmer Sensing, Inc. (Dublin, Ireland). Several muscles
were selected for EMG measurement, including the bilateral (left and right)
extensors and a flexor on the left arm, which are important for wrist rotation,
as well as the abductor pollicis, which is important for pinch grasping with the
thumb [13]. These muscles were recommended by a surgeon educator.
We also recorded videos of the user posture and simulated surgical training
task with CCD cameras (USB 3.0, Point Grey, Richmond, Canada). The Robot
Operating System (ROS) was used to synchronize all data collection. The exper-
imental setup and sensor locations are shown in Fig. 1(a,c).

(a) Human Subject Trial (b) Ring and Rail Task
(c) Shimmer Sensor Placement (d) Suturing Task

Fig. 1. Experimental setup and sensor positioning.

3.2 Simulated Surgical Tasks and Human Subject Study

The simulated surgical tasks chosen for this study were used to evaluate
endowrist manipulation and needle control and driving skills (Fig. 1(b,d)).
Endowrist instruments provide surgeons with a range of motion greater than
that of the human hand; thus, these simulated tasks evaluate the subject's ability to
manipulate these instruments. The needle driving task evaluates the subject's
ability to effectively hand off and position needles for different types of suture
throws (forehand and backhand) while using different hands (left and right).
Three subjects were recruited to participate in this study, approved by both
UTD and UTSW IRB offices (UTD #14-57, UTSW #STU 032015-053). The
subjects (right-handed, 25–45 years old) consisted of an expert (6+ years of clinical
robotic cases), an intermediate (PGY-4 surgical resident), and a novice (PGY-1
surgical resident). All subjects had limited to no training using the da Vinci
simulator; however, the expert and intermediate had exposure to the da Vinci
clinical robots. All subjects first conducted two non-recorded warm-up tasks (i.e.,
Matchboard 3 for endowrist manipulation warm-up and Suture Sponge 2 for needle
driving warm-up). After training, the subjects then underwent baseline data
collection, including arm measurements and maximum voluntary isometric muscle
contractions (MVIC) for normalization and cross-subject comparison [14].
Subjects then conducted the recorded experimental tasks for endowrist manipulation
(Ring and Rail 2) and needle driving (Suture Sponge 3). For the purposes
of data analysis, each task was subdivided into three repeated trials, corresponding
to a single pass of a different colored ring (i.e., red, blue, or yellow), or two
consecutive suture throws.

3.3 Crowd-Worker Recruitment and Tasks


For each trial, side-by-side, time-synchronized videos of the simulated surgical
task and user posture were posted on Amazon Mechanical Turk. The videos
ranged in length from 15 s to 3 min 40 s. Anonymous crowd-workers (n = 547)
were recruited to label the videos by selecting one word from each of the six
contrasting adjective pairs (Fig. 4(a)). Crowd-workers received $0.10 for each video and
were not allowed to evaluate the same video more than once.

3.4 Data Analysis Methods


For each word pair, many options exist for correlation to a desired metric (e.g.,
sensor location, muscle type, summary statistic, etc.). In this paper, we selected
metrics based on logical reasoning and feedback from surgical collaborators. To
measure crisp vs. jittery hand movement, we calculated the mean value of jerk
over each trial from the inertial measurement unit (IMU) mounted on the subject's
right hand. Similarly, fluid/viscous was measured by the variability in angular
velocity of the same IMU. Smooth/rough was measured by the variability in
acceleration magnitude of an IMU mounted on the right forearm. The acceleration
magnitude was calculated from each time-sampled x, y, and z acceleration to
eliminate the effects due to gravity [15]. To measure the calmness vs. anxiousness
of the user, the GSR signal was processed using the Ledalab EDA data analysis
toolbox in Matlab to count stressful events [16]. Mean EMG levels were recorded
through electrodes placed on the forearm extensor as a measure of relaxedness vs.
tenseness of the subject. In order to compare EMG levels between different subjects,
these signals were normalized using the maximum of three repeated EMG signals
recorded during a maximal voluntary isometric contraction (MVIC) for each muscle.
All EMG signals were high-pass filtered using a fourth-order Butterworth filter
with a 20 Hz cut-off frequency to remove motion artifacts, and were detrended,
rectified, and smoothed [14]. The EM tracker data were used to visualize user
movements and were not correlated to any word pairs. Finally, a Pearson's R
correlation was used to compare the crowd-worker results and data metrics for
each word pair.
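The EMG processing chain described above, together with the final correlation step, can be sketched as follows. The sampling rate, smoothing window, and the toy numbers at the end are assumptions for illustration, not values from the study:

```python
import numpy as np
from scipy.signal import butter, filtfilt, detrend
from scipy.stats import pearsonr

def process_emg(raw, mvic_max, fs=1000.0, fc=20.0, win_s=0.05):
    """High-pass filter, detrend, rectify, smooth, and normalise an EMG
    signal to %MVIC, following the processing steps described in the text."""
    b, a = butter(4, fc / (fs / 2.0), btype="highpass")
    x = filtfilt(b, a, raw)                 # zero-phase filtering removes motion artifact
    x = np.abs(detrend(x))                  # detrend, then full-wave rectify
    n = max(1, int(win_s * fs))
    envelope = np.convolve(x, np.ones(n) / n, mode="same")  # moving-average smoothing
    return 100.0 * envelope / mvic_max      # percentage of the MVIC maximum

# Correlating a data metric with crowd word-assignment rates (toy numbers):
mean_emg = np.array([0.31, 0.22, 0.12])      # novice, intermediate, expert (invented)
relaxed_rate = np.array([0.40, 0.55, 0.85])  # fraction of workers choosing "relaxed"
r, p = pearsonr(mean_emg, relaxed_rate)
```

With these invented numbers, lower muscle activation goes with a higher "relaxed" labeling rate, giving a strong negative Pearson correlation.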

4 Results and Discussion


The trajectories of each subject's wrist movements, as measured by the EM
tracker, are shown in Fig. 2. As expected, the expert's movements are tighter and
smoother than those of the intermediate and novice. Figure 3 compares the mean and
standard deviation of each chosen metric over all trials among the three subjects.
Of the 547 crowd-workers recruited, 7 jobs were rejected due to incomplete labeling
assignments, resulting in 30 complete jobs for each of the 18 videos posted.
The results of the analysis can be seen in Fig. 4(b). An ANOVA analysis was
conducted to identify significant groups in terms of expertise level, type of task,
and repetition for the data metrics, as well as the crowd-sourced data. Additionally,
the crowd-sourced data were evaluated for significant differences in terms

(a) Novice (b) Intermediate (c) Expert

Fig. 2. Wrist trajectory of subjects performing Ring and Rail 2 (red ring)

(a) Fluid (vs. Viscous)  (b) Rough (vs. Smooth)  (c) Crisp (vs. Jittery)
(d) Sluggish (vs. Swift)  (e) Anxious (vs. Calm)  (f) Tense (vs. Relaxed), Left Arm Extensor

Fig. 3. Mean and standard deviation of all metrics for all trials and subjects.

of word assignment rates. Table 2 summarizes the significant statistical results
(p ≤ 0.05), significant groups (post-hoc Scheffé test), and the correlation between
the crowd ratings and data metrics. For nearly all metrics, there was no significant
effect due to task or repetition. The expert exhibited better performance
on all metrics, with the exception of smooth/rough and relaxed/tense. This
could be due to a poor choice of metrics, or to data collection errors with the expert
EMG signal or baseline. The crowd assigned significantly better semantic labels
to the expert, followed by the intermediate and the novice. Additionally, the crowd
rated the Ring and Rail tasks significantly lower than the suturing task,
and evaluated the second repetition across all subjects as worse than the first
and last. The magnitude of the data-metric-to-crowd-rating correlation ranged
from 0.25 to 0.99. The best correlated metric-to-word-pair was swift/sluggish,
and the worst correlated was smooth/rough, followed by crisp/jittery.

Table 2. Statistical analysis summary

Source              | Metric/crowd | Subject (E, I, N)    | Task (RR, SS)     | Repetition (1–3)
                    | correlation  | p        signif.     | p        signif.  | p        signif.
Fluid/viscous       |  0.82        | 0.0005   E > I & N   | 0.0374   RR > SS  | 0.1134   n/a
Smooth/rough        | −0.25        | 0.0001   I < E & N   | 0.1240   n/a      | 0.3366   n/a
Crisp/jittery       |  0.63        | 0.073    n/a         | 0.7521   n/a      | 0.9128   n/a
Calm/anxious        | −0.98        | 0.035    E < I < N   | 0.2286   n/a      | 0.9504   n/a
Relaxed/tense       |  0.76        | <0.0001  E > N > I   | 0.6834   RR > SS  | 0.6291   n/a
Swift/sluggish      | −0.99        | 0.0028   E < N       | 0.1659   n/a      | 0.8541   n/a
Crowd-worker rating |      –       | <0.0001  E > I > N   | <0.0001  SS > RR  | 0.0005   2 < 1 & 3
Crowd-worker rating for word choice: "Fluid" was the least chosen word (p = 0.0003)
E – Expert, I – Intermediate, N – Novice, RR – Ring and Rail 2, SS – Suture Sponge 3.

(a) (b)

Fig. 4. The crowd-worker assignment (a), and the results (b): percentage of trials
assigned the positive word (fluid, smooth, crisp, swift, calm, relaxed) for the expert,
intermediate, and novice.

5 Conclusions and Future Work


In this paper, we present a lexicon of surgical expertise composed of contrasting
adjective pairs, associated with quantitative movement or user-based metrics.
We have shown that crowd-worker labeled training videos correlate strongly to
expertise level. The data metrics typically also corresponded to expertise level;
however, there were some discrepancies in terms of the smooth/rough metric
and relaxed/tense metric. Finally, the crowd-workers identified differences based
on task and repetition not seen in the data metrics, and not all data metrics
correlated to trends in the crowd-worker ratings. An interesting area of future
work will be to determine which metrics are best correlated, and who/what is
more correct in evaluating expertise - the crowd or the metrics. Finally, we plan
to expand the lexicon to include additional semantic labels and will identify,
specifically, which better predict expertise. These labels will be the basis of a
future automated coaching system. We hope to extend these methods to other
aspects of surgical training, such as: open and laparoscopic skills, team dynamics,
patient interactions, and professionalism.

Acknowledgment. This work was supported by the Intuitive Surgical Simulator loan
program for the Southwestern Center for Minimally Invasive Surgery at UTSW (PI
Rege). We thank Deborah Hogg and Lauren Scott for providing access to the simulator.

References
1. Chen, C., et al.: Crowd-sourced assessment of technical skills: a novel method to
evaluate surgical performance. J. Surg. Res. 187(1), 65–71 (2014)
2. Malpani, A., Vedula, S.S., Chen, C.C.G., Hager, G.D.: Pairwise comparison-based
objective score for automated skill assessment of segments in a surgical task. In:
Stoyanov, D., Collins, D.L., Sakuma, I., Abolmaesumi, P., Jannin, P. (eds.) IPCAI
2014. LNCS, vol. 8498, pp. 138–147. Springer, Heidelberg (2014)
3. Howells, N.R., et al.: Motion analysis: a validated method for showing skill levels
in arthroscopy. J. Arthrosc. Relat. Surg. 24(3), 335–342 (2008)
4. Ahmidi, N., Hager, G.D., Ishii, L., Fichtinger, G., Gallia, G.L., Ishii, M.: Surgical
task and skill classification from eye tracking and tool motion in minimally invasive
surgery. In: Jiang, T., Navab, N., Pluim, J.P.W., Viergever, M.A. (eds.) MICCAI
2010, Part III. LNCS, vol. 6363, pp. 295–302. Springer, Heidelberg (2010)
5. Nisky, I., et al.: Uncontrolled manifold analysis of arm joint angle variability during
robotic teleoperation and freehand movement of surgeons and novices. IEEE Trans.
Biomed. Eng. 61(12), 2869–2881 (2014)
6. Porte, M.C., et al.: Verbal feedback from an expert is more effective than self-
accessed feedback about motion efficiency in learning new surgical skills. Am. J.
Surg. 193(1), 105–110 (2007)
7. Reiley, C.E., Hager, G.D.: Task versus subtask surgical skill evaluation of robotic
minimally invasive surgery. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A.,
Taylor, C. (eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 435–442. Springer,
Heidelberg (2009)
8. Lin, H.C., et al.: Towards automatic skill evaluation: detection and segmentation
of robot-assisted surgical motions. Comput. Aided Surg. 11(5), 220–230 (2006)
9. Critchley, H.D., et al.: Neural activity relating to generation and representation
of galvanic skin conductance responses: a functional magnetic resonance imaging
study. J. Neurosci. 20(8), 3033–3040 (2000)
10. Malpani, A., et al.: A study of crowdsourced segment-level surgical skill assessment
using pairwise rankings. Int. J. Comput. Assist. Radiol. Surg. 10(9), 1435–1447
(2015)
11. White, L.W., et al.: Crowd-sourced assessment of technical skill: a valid method for
discriminating basic robotic surgery skills. J. Endourol. 29(11), 1295–1301 (2015)
12. Kowalewski, T.M., et al.: Crowd-sourced assessment of technical skills for valida-
tion of Basic Laparoscopic Urologic Skills (BLUS) tasks. J. Urol. 195, 1859–1865
(2016)
13. Criswell, E.: Cram’s Introduction to Surface Electromyography. Jones & Bartlett
Publishers, Sudbury (2010)
14. Halaki, M., Ginn, K.: Normalization of EMG Signals: To Normalize Or Not to
Normalize and What to Normalize To?. INTECH Open Access Publisher, Rijeka
(2012)
15. Henrikson, R., et al.: Surgical trainer and navigator final report, pp. 1–14
16. Benedek, M., Kaernbach, C.: Decomposition of skin conductance data by means
of nonnegative deconvolution. Psychophysiology 47(4), 647–658 (2010)
2D-3D Registration Accuracy Estimation
for Optimised Planning of Image-Guided
Pancreatobiliary Interventions

Yipeng Hu1(&), Ester Bonmati1, Eli Gibson1, John H. Hipwell1,
David J. Hawkes1, Steven Bandula2, Stephen P. Pereira3,
and Dean C. Barratt1

1 Centre for Medical Image Computing, University College London, London, UK
yipeng.hu@ucl.ac.uk
2 Centre for Medical Imaging, University College London, London, UK
3 Institute for Liver and Digestive Health, University College London, London, UK

Abstract. We describe a fast analytical method to estimate landmark-based
2D-3D registration accuracy to aid the planning of pancreatobiliary interventions
in which ERCP images are combined with information from diagnostic 3D
MR or CT images. The method analytically estimates a target registration error
(TRE), accounting for errors in the manual selection of both 2D and 3D
landmarks, that agrees with Monte Carlo simulation to within 4.5 ± 3.6 %
(mean ± SD). We also show how to analytically estimate a planning uncertainty
incorporating uncertainty in patient positioning, and utilise it to support
ERCP-guided procedure planning by selecting the optimal patient position and
X-ray C-arm orientation that minimises the expected TRE. Simulated and
derived planning uncertainties agreed to within 17.9 ± 9.7 % when the
root-mean-square error was less than 50°. We demonstrate the feasibility of this
approach on clinical data from two patients.
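A Monte Carlo TRE estimate of the kind used as the validation baseline can be sketched as follows. For brevity, this toy version uses a 3D-3D rigid point registration (Kabsch) rather than the paper's 2D-3D projective formulation, and all coordinates and noise levels are invented:

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src[i] + t ~ dst[i]."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    H = (src - sc).T @ (dst - dc)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dc - R @ sc

def monte_carlo_tre(fiducials, target, sigma, n_trials=2000, rng=None):
    """RMS target registration error under isotropic Gaussian landmark noise."""
    rng = np.random.default_rng(rng)
    errs = np.empty(n_trials)
    for i in range(n_trials):
        noisy = fiducials + rng.normal(0.0, sigma, fiducials.shape)
        R, t = kabsch(noisy, fiducials)      # register the noisy landmarks back
        errs[i] = np.linalg.norm(R @ target + t - target)
    return float(np.sqrt(np.mean(errs ** 2)))

# Invented geometry: 4 fiducials (mm) around a target, 1 mm landmark noise.
fids = np.array([[0, 0, 0], [50, 0, 0], [0, 50, 0], [0, 0, 50]], dtype=float)
tre = monte_carlo_tre(fids, np.array([25.0, 25.0, 25.0]), sigma=1.0, rng=0)
```

In each trial, the fiducial landmarks are perturbed, the rigid transform is re-estimated, and the displacement of the (non-fiducial) target point is recorded; the RMS of these displacements is the Monte Carlo TRE against which an analytical estimate can be checked.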

1 Introduction

Endoscopic retrograde cholangiopancreatography (ERCP) is an imaging technique that
provides real-time, planar X-ray images of the bile ducts following the intra-ductal
injection of a radio-opaque contrast agent via a catheter. It is the standard guidance
technique for many diagnostic and therapeutic endoscopic interventions involving the
pancreatobiliary ducts, such as ductal lesion tissue sampling and stenting, where it is
used to navigate catheter-based instruments through the ductal tree. As ERCP-guided
procedures can be technically demanding with a significant risk of complications, such
as post-procedural pancreatitis (2–16 %) and cholangitis (<1%) [1], there is growing
clinical interest in using high-quality 3D CT and MR imaging of the pancreatobiliary
system to plan interventional procedures and to enhance the navigational information
provided by ERCP. For example, a 3D graphical representation of organs such as the
pancreas, the biliary tree, and associated pathology (such as cysts and cancerous lesions) obtained from an MR or CT scan might be displayed together with an indication of the current 3D location of instruments determined from ERCP images during a procedure. Alternatively, anatomical structures and landmarks defined within the MR/CT scan can be displayed superimposed on ERCP images by projecting these onto the X-ray plane. Adopting multimodal image guidance using either of these approaches has the potential advantages of increasing surgical accuracy and confidence, and of reducing the technical skill and time required to perform a procedure. Such benefits are likely to be especially useful in enabling rapid localisation of pathology in the more distal biliary tree, and in cases where strictures prevent or severely impede the visualisation of the bile ducts distal to the ampulla of Vater. Enhanced surgical guidance also offers the potential to reduce the risk of ERCP-contrast-related toxicity and radiation dose by reducing the reliance on ERCP for visualisation of the ductal tree.

2D-3D Registration Accuracy Estimation for Optimised Planning
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 516–524, 2016.
DOI: 10.1007/978-3-319-46720-7_60
A fundamental requirement of multimodal image guidance is the registration of
ERCP and MR/CT images. This is a 2D-3D image registration problem, solutions
for which have been the subject of a substantial body of research, although largely in
the context of other applications, such as orthopaedic surgery and neurovascular
interventions [2]. Very few studies have focused on multimodal image guidance for
ERCP-guided pancreatobiliary interventions; in [3, 4], for example, commercial guid-
ance systems were used to guide radiosurgery of pancreatic cancer, with registration
achieved by matching implanted fiducials. However, the use of implanted fiducials
specifically for registration is in general very difficult to justify for the majority of
patients undergoing ERCP-guided procedures. For this reason, in this work we focus on
registration using anatomical landmarks.
The ability to predict intraoperative registration accuracy is becoming increasingly
important as guidance systems become more widely adopted in clinical practice. For
ERCP-guided pancreatobiliary procedures this information could be used to optimise
registration protocols, for example, by informing the selection of anatomical landmarks
so that a sufficient number of landmarks are used, with an adequate geometric con-
figuration, to ensure an accurate registration. Furthermore, predicted registration
accuracy can inform how much confidence a clinician should place in a 3D guidance
system, determining whether additional effort should be made to improve a registra-
tion – for example, changing the position of a C-arm X-ray or searching for additional
landmarks in the ERCP image. In this paper, we develop a new, fast analytical method
for estimating 2D-3D registration error, with a particular focus on pancreatobiliary
procedures, and demonstrate how it can be applied practically to predict and optimise
landmark-based registration of ERCP and MR cholangiopancreatography (MRCP)
images.

2 Analytical Estimation of 2D-3D Registration Error

In this section, we develop an analytical model of error propagation for a landmark-based 2D-3D registration in which corresponding anatomical landmarks
defined in both 2D- and 3D images are matched. Figure 1 shows a schematic of the
variables and errors involved. The objective of the model is to estimate the target registration error (TRE), given: the locations of registration landmarks and one or more independent 'target' landmarks, both defined in the 3D image; error estimates for these landmarks, referred to here as 3D landmark localisation errors¹ (LLEs); error estimates for registration landmarks defined within the 2D image (2D LLEs); and estimates of the parameters of the transformation that match the 2D and 3D registration landmarks.
Assuming that the 2D images are produced by projection, as is the case for ERCP, the 2D coordinates of a landmark, defined by the position vector $\mathbf{u} = [u, v]^T$ in the 2D image, are given by:

$$\mathbf{u} = f_p(f_r(\mathbf{x}; \theta), K) \qquad (1)$$

where $\mathbf{x} = [x, y, z]^T$ is the position vector containing the 3D coordinates of the landmark in the 3D image; $f_p$ is the perspective projection transformation; and $K$ is the camera matrix, determined by the focal length (i.e. the distance between the X-ray source and detector) and the principal point coordinates. These 2D imaging parameters are assumed known and well-calibrated. $\theta = [\alpha_x, \alpha_y, \alpha_z, t_x, t_y, t_z]^T$ is a vector containing the 3 rotation and 3 translation parameters of a rigid-body transformation $f_r$, which may also be parameterised by a $3 \times 3$ rotation matrix $R(\alpha_x, \alpha_y, \alpha_z)$ and a translation vector $\mathbf{t} = [t_x, t_y, t_z]^T$. Rewriting (1) in terms of $R$ and $\mathbf{t}$, and using homogeneous coordinates with a normalisation scaling factor $k$, we have:

$$k \begin{bmatrix} \mathbf{u} \\ 1 \end{bmatrix} = K \begin{bmatrix} R\mathbf{x} + \mathbf{t} \\ 1 \end{bmatrix}.$$
In this work, registration of the 2D- and 3D images was achieved by minimising a
collinearity-based error in 3D (as opposed to 2D) image space using the orthogonal
iteration algorithm [5].
Anatomical landmarks used in registration are defined with inherent uncertainties, due to intra-/inter-operator localisation error, anatomical variations, projection-related ambiguity, and tissue motion. Assuming an independent, anisotropic and heterogeneous Gaussian error model, the following errors are involved (see also Fig. 1): (a) a 3D LLE for the $i$th ($i = 1, \ldots, n$) 3D landmark $\mathbf{x}_i$, represented by a 3D covariance matrix $\Sigma_{x_i}$; (b) a 2D LLE on the corresponding 2D landmark $\mathbf{u}_i$, represented by a 2D covariance matrix $\Sigma_{u_i}$; (c) errors on the $m$ transformation parameters $\theta$ (here, $m = 6$), represented by a 6D covariance matrix $\Sigma_\theta$; and (d) an error, represented by a 3D covariance matrix $\Sigma_r$, associated with the target of interest, defined in the preoperative image by the position vector $\mathbf{r}$.
Fig. 1. A schematic showing the variables and errors involved in registration and planning of an ERCP-guided procedure (see Sects. 2 and 3 for details).

¹ Equivalent to fiducial localisation error.

First, we would like to compute the uncertainty in the transformation parameters $\theta$, i.e. $\Sigma_\theta$, given the uncertainties from both the 2D and 3D LLEs, $\Sigma_{x_i}$ and $\Sigma_{u_i}$. Hoff et al. [6] derived a backward propagation of covariance using a direct least-squares, pseudo-inversion of a full-rank Jacobian to estimate $\Sigma_\theta$. Sielhorst et al. [7] directly utilised the forward and backward propagation of covariance, summarised in [8], to estimate errors for an optical tracking application. Both of these studies considered only 2D LLEs, i.e. the true values of $\mathbf{x}_i$ (the geometry of the calibrated tracking tool) are assumed known without uncertainty. However, this assumption does not hold in 2D-3D registration applications, and the registration error should be estimated by considering both $\Sigma_{x_i}$ and $\Sigma_{u_i}$. This can be achieved by modifying Eq. (1) and considering a new vector function:

$$\begin{bmatrix} \mathbf{u} \\ \mathbf{x} \end{bmatrix} = f_p\left( \begin{bmatrix} \theta \\ \mathbf{x} \end{bmatrix}, K \right) \qquad (2)$$

in which the same perspective projection still holds, but the 3D landmarks are now treated as additional parameters to estimate, as well as trivial function outputs. The parameter space of $f_p$ now becomes $(m + 3n)$-dimensional, whilst the function (measurement) space becomes $(2n + 3n)$-dimensional. Linearly approximating the vector transformation function $f_p$ by a first-order Taylor series, a $(2n + 3n) \times (m + 3n)$ Jacobian matrix $J_{f_p}$ can be computed to map the $(2n + 3n)$-dimensional covariance matrix $\Sigma_{U,X}$ onto the $(m + 3n)$-dimensional parameter space. Without loss of generality, and assuming independence between the measured landmarks, the new backward propagation formula is:

$$\hat{\Sigma}_{\theta,X} = \left( J_{f_p}^T \Sigma_{U,X}^{-1} J_{f_p} \right)^+ \qquad (3)$$

where

$$\Sigma_{U,X} = \mathrm{diag}(\Sigma_{u_1}, \ldots, \Sigma_{u_n}, \Sigma_{x_1}, \ldots, \Sigma_{x_n}), \quad J_{f_p}^{(u_i)} = \left[ \frac{\partial f_p}{\partial \mathbf{u}_i} \right]^T_{i,\theta} \quad \text{and} \quad J_{f_p}^{(x_i)} = \left[ \frac{\partial f_p}{\partial \mathbf{x}_i} \right]^T_{i,\theta}.$$

The estimated covariance of the registration parameters, $\hat{\Sigma}_\theta$, is the $m \times m$ leading principal submatrix of $\hat{\Sigma}_{\theta,X}$. The pseudo-inverse operator, applicable when the Jacobian has a rank smaller than its number of columns, is denoted by $(\cdot)^+$.
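The backward propagation of Eq. (3) reduces to a few lines of linear algebra. The sketch below uses a toy Jacobian and observation covariance (not values from the paper) and extracts the leading principal submatrix as described in the text:

```python
import numpy as np

def backward_propagation(J, Sigma_obs, m):
    """Sigma_hat = (J^T Sigma_obs^-1 J)^+ (Eq. 3); return its m x m leading
    principal submatrix, i.e. the covariance of the transformation parameters."""
    info = J.T @ np.linalg.inv(Sigma_obs) @ J    # information matrix
    Sigma_hat = np.linalg.pinv(info)             # pseudo-inverse handles rank deficiency
    return Sigma_hat[:m, :m]

# Toy example: 2 parameters, each observed directly twice with unit noise
J = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Sigma_obs = np.eye(4)
Sigma_theta = backward_propagation(J, Sigma_obs, 2)   # two unit-variance observations each -> 0.5*I
```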
Given estimated values for $\hat{\Sigma}_\theta$, transforming the covariance of an independent target point (i.e. covariance $\Sigma_{\theta,r} = 0$) at location $\mathbf{r}$ is a straightforward application of the forward propagation, which maps the parameter space onto the function space:

$$\hat{\Sigma}_{u_r} = J_{f_p}(\theta)\, \hat{\Sigma}_\theta\, J_{f_p}(\theta)^T + J_{f_p}(\mathbf{r})\, \Sigma_r\, J_{f_p}(\mathbf{r})^T \qquad (4)$$

where $\mathbf{u}_r = f_p(\mathbf{r})$ is the projected target point in 2D. A scalar root-mean-square error (RMSE) can also be computed: $\mathrm{TRE} = \sqrt{\mathrm{trace}(\hat{\Sigma}_{u_r})}$.
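Given $\hat{\Sigma}_\theta$, the forward propagation of Eq. (4) and the scalar TRE are equally direct. The Jacobians below are toy stand-ins for $J_{f_p}(\theta)$ and $J_{f_p}(\mathbf{r})$:

```python
import numpy as np

def estimate_tre(J_theta, Sigma_theta, J_r, Sigma_r):
    """Eq. (4): map parameter and target uncertainty into 2D image space,
    then return the scalar TRE = sqrt(trace(Sigma_ur)) and the 2D covariance."""
    Sigma_ur = J_theta @ Sigma_theta @ J_theta.T + J_r @ Sigma_r @ J_r.T
    return np.sqrt(np.trace(Sigma_ur)), Sigma_ur

# Toy Jacobians: a 2D projection sensitive to 6 pose parameters and a 3D target
J_theta = np.hstack([np.eye(2), np.zeros((2, 4))])     # 2 x 6
J_r = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0]])                      # 2 x 3
tre, _ = estimate_tre(J_theta, 0.04 * np.eye(6), J_r, 0.09 * np.eye(3))
# Sigma_ur = 0.04*I2 + 0.09*I2 = 0.13*I2, so TRE = sqrt(0.26)
```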

3 Estimation of Planning Error for ERCP-Guided Procedures

In this section, we develop an example framework for planning ERCP-guided procedures that minimises the estimated TRE to optimise the patient position and C-arm orientation, for which an analytical formulation of the uncertainty in planning is introduced to assess the optimised plan.
The estimated TRE from Sect. 2 depends on the estimates of the transformation parameters $\theta$, which are determined by the physical positions of the X-ray positioner and the patient. Over-parameterising $f_r$ in Eq. (1), a point $\mathbf{x}^{plan}$ defined in 3D plan space (i.e. the physical coordinate system of the preoperative image, in mm) can be transformed to $\mathbf{x}^{pos}$ in positioner space (the camera or C-arm space, in which the X-ray source is the centre of projection and at the origin, and the detector is parallel to the x-y plane). One possible parameterisation of $f_r$, without loss of generality, is:

$$\mathbf{x}^{pos} = (\mathbf{R}^{pos})^{-1}\mathbf{R}^{pat}\mathbf{x}^{plan} - (\mathbf{R}^{pos})^{-1}\mathbf{R}^{pat}\mathbf{c}^{iso}_{plan} + (\mathbf{R}^{pos})^{-1}(\mathbf{t}^{pat} - \mathbf{t}^{pos}) + \mathbf{c}^{iso}_{pos} \qquad (5)$$

where, for ease of interpretation, the isocentre of the C-arm ($\mathbf{c}^{iso}_{plan}$ and $\mathbf{c}^{iso}_{pos}$ in plan and positioner space, respectively) is assumed to be the rotation centre for both the patient and positioner orientations, described by $\mathbf{R}^{pat}$ (DOF = 3, $\alpha^{pat}$ in Euler angles) and $\mathbf{R}^{pos}$ (DOF = 2, the C-arm primary and secondary angles $\alpha^{carm}$), respectively. The vector $\Delta\mathbf{t} = \mathbf{t}^{pat} - \mathbf{t}^{pos}$ represents the relative translation between patient and positioner and, as indicated by Eq. (5), can be reduced to DOF = 3. These eight variables are not independent and are also subject to constraints imposed by clinical practice.
For the applications of interest, the patient position is assumed to take one of three discrete values in which the ERCP is performed, corresponding to the prone, left lateral, or supine position. Based on the current parameterisation, each patient position is optimised separately using a quasi-Newton method over the two remaining variables $\alpha^{carm}$, to minimise the estimated TRE.
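The per-position optimisation can be sketched as follows. The TRE surface used here is a hypothetical smooth surrogate (the real objective is the analytical TRE of Sect. 2), and BFGS stands in for the unspecified quasi-Newton method:

```python
import numpy as np
from scipy.optimize import minimize

def estimated_tre(carm_angles):
    """Hypothetical surrogate for the analytical TRE of Sect. 2 as a function
    of the C-arm primary and secondary angles (degrees)."""
    a1, a2 = carm_angles
    return 5.0 + 2.0 * np.sin(np.radians(a1 - 20.0))**2 + np.sin(np.radians(a2))**2

# Optimise the two remaining variables with a quasi-Newton method (BFGS),
# repeated separately for each candidate patient position.
res = minimize(estimated_tre, x0=np.array([0.0, 0.0]), method='BFGS')
best_angles, best_tre = res.x, res.fun   # minimum of this surrogate is near [20, 0] degrees
```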
The three patient positions approximately correspond to three different orientations for $\alpha^{pat}$. $\Delta\mathbf{t}$ may be calibrated with respect to external landmarks, such as palpable points on the ribs or vertebrae, or tracked markers fixed to the surgical bed. However, neither can be estimated accurately before a procedure. Rotating a patient to a given angle, or repositioning accurately during a procedure, may be clinically infeasible, so it is more practical to estimate the bounds of the uncertainties $\Sigma_{\alpha^{pat}}$ and $\Sigma_{\Delta t}$ associated with $\alpha^{pat}$ and $\Delta\mathbf{t}$, respectively. Whilst it is noted that error estimates in patient positioning can be derived from observed registration errors, the published values (e.g. [9]), summarised in Table 1, were used in this study to reflect a plausible clinical scenario. The C-arm orientation $\alpha^{carm}$, on the other hand, can be calibrated to a high degree of accuracy; its error is therefore assumed negligible relative to that associated with patient positioning. If this error becomes significant, for example if an uncalibrated C-arm is used, an additional error term for the C-arm orientation should be included.
Unlike the estimation of intraoperative TRE, which can be conditioned on the registration transformation parameters, planning the C-arm orientation and patient position has to take into account (marginalise over) the uncertainty in patient positioning (see also Fig. 1). Given $\hat{\alpha}^{pat}$, the uncertainty in planning can be represented by an error covariance $\Sigma_{\hat{\alpha}^{carm}}$ on the optimised $\hat{\alpha}^{carm}$. The entire process to compute the registered target position, with the parameterisation in Eq. (5), now becomes:

$$\mathbf{u}_r = f_p\left(f_r\left(\mathbf{r}; \hat{\alpha}^{carm}, \hat{\alpha}^{pat}, \Delta\mathbf{t}\right)\right).$$
Using the same treatment as for Eqs. (2) and (3), a composite covariance matrix can be computed as follows:

$$\hat{\Sigma}_{\hat{\alpha}^{carm}, \hat{\alpha}^{pat}, \Delta t} = \left( J_{f_p}^T \Sigma_{u_r, \hat{\alpha}^{pat}, \Delta t}^{-1} J_{f_p} \right)^+ \qquad (6)$$

As in Eq. (3), $\Sigma_{u_r, \hat{\alpha}^{pat}, \Delta t}$ can be constructed from $\hat{\Sigma}_{u_r}$, estimated from Eq. (4), and the uncertainties in patient positioning, $\Sigma_{\hat{\alpha}^{pat}}$ and $\Sigma_{\Delta t}$. The uncertainty in the planned C-arm orientation, $\Sigma_{\hat{\alpha}^{carm}}$, is the $2 \times 2$ leading principal submatrix. The estimated uncertainty in C-arm orientation indicates how precisely the C-arm position can be planned to achieve the estimated TRE that is optimal at the time of planning. For example, a 95 % confidence interval (CI) would indicate that the true optimised plan will lie within the estimated region 95 % of the time. This may be considered together with other practical restrictions, such as the orientation range and control precision.
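For a 2-DOF Gaussian, such a CI region is an ellipse whose semi-axes follow from the chi-square quantile and the eigenvalues of the 2 × 2 covariance. A sketch, using a hypothetical planning covariance rather than values from the paper:

```python
import numpy as np
from scipy.stats import chi2

def confidence_ellipse_semi_axes(Sigma, level=0.95):
    """Semi-axis lengths of the confidence ellipse of a 2x2 covariance matrix:
    the squared Mahalanobis radius at `level` is the 2-DOF chi-square quantile."""
    k = chi2.ppf(level, df=2)
    eigvals = np.linalg.eigvalsh(Sigma)   # eigenvalues in ascending order
    return np.sqrt(k * eigvals)

# Hypothetical planning covariance for the two C-arm angles (degrees^2)
Sigma_carm = np.diag([4.0, 1.0])
axes = confidence_ellipse_semi_axes(Sigma_carm, level=0.95)
# semi-axes approx [2.448, 4.895] degrees (ascending)
```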

4 Experiment and Results

MRCP and ERCP images were acquired for two patients who underwent ERCP-guided interventions under local research ethics approval. Anatomical landmarks were manually defined by an interventional radiologist, a gastroenterologist and three medical imaging research fellows, on separate occasions. These included points on the ampulla, hilum, the L1, L2 and T12 vertebrae, the pancreatic genu, the hepatic duct bifurcation, the cystic duct and pancreatic duct connections with the common bile duct, previously implanted surgical devices (e.g. a lap chole clip), and pathological features (e.g. a stricture); an example is shown in Fig. 2. LLEs for these points, summarised in Table 1, were estimated from the variance of the multiple landmark selections. In addition, five validation landmarks were defined for each case to represent locations of clinical interest, such as points on the most distal aspect of the common bile duct and on the branches of the left and right hepatic ducts.
Table 1. Result summary from the two sets of patient data; errors are summarised using RMSE.

Pat. | LLE (mm)                                        | Positioning error (°–mm)             | C-arm [1st, 2nd] angles (°) | Planning error (°) | Estimated TRE (mm) | Observed TRE (mm)
1    | 3D: 5.1 ± 1.2; 2D: 2.1 ± 0.6; Target: 3.3 ± 1.0 | Lateral: 8.9–11.5; Supine: 4.3–11.5  | [0.2, −0.1]                 | 16.5 ± 3.3         | 4.9 ± 1.1          | 9.7 ± 3.0
     |                                                 |                                      | [−27.0, −0.1]               | 13.7 ± 5.0         | 6.1 ± 1.7          | 16.2 ± 3.4
2    | 3D: 3.4 ± 0.3; 2D: 2.4 ± 0.8; Target: 3.5 ± 0.6 | Prone: 4.3–11.5                      | [−0.1, 0.0]                 | 92.8 ± 59.0        | 5.3 ± 0.9          | 12.0 ± 2.7
     |                                                 |                                      | [−18.9, 0.0]                | 114.3 ± 62.5       | 5.3 ± 0.8          | 13.2 ± 3.2

Fig. 2. A snapshot of the graphical user interface of the MRCP-ERCP registration software, developed as part of this research. Landmarks are plotted with their respective uncertainties using 2D ellipses or 3D ellipsoids, corresponding to 90 % confidence levels. Left: preoperative MRCP with 3D landmarks (blue), targets (yellow) and the segmented biliary tree shown in green and grey; middle: a digitally reconstructed radiograph generated from the MRCP image with the estimated target TREs in yellow; right: ERCP with 2D landmarks (blue) and two example targets (registered targets are shown in yellow, whereas validation targets are shown in red).

Monte Carlo simulations were used to verify: (1) the TRE estimated by Eq. (4); and (2) the derived planning uncertainty given by Eq. (6), where the Jacobian was estimated numerically. 10,000 simulations were performed, sampling the respective variables. As the example in Fig. 3 shows, the overall agreement is excellent, with a 4.5 ± 3.6 % difference between the TRE computed analytically and the result of the numerical simulations, measured as the RMSE in $\Sigma_{u_r}$. The overall RMSE had a range of [3.2, 9.8] mm between different C-arm positions. The difference in $\Sigma_{\hat{\alpha}^{carm}}$ was 17.9 ± 9.7 % when the RMSE was smaller than 50°, which we consider a practical planning range. As the planning error increased, Eq. (6) provided an increasingly poor approximation, with up to ~500 % difference from the simulation results. This is to be expected given that there are only 2 DOFs in this case.
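This verification strategy can be reproduced on a toy problem: propagate a covariance analytically through a first-order linearisation, then compare against the empirical covariance of samples pushed through the nonlinear function. The function and values below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta):
    """A hypothetical nonlinear projection-like map from 2 parameters to a 2D point."""
    return np.array([theta[0] + 0.1 * theta[1]**2, theta[1]])

# Analytical (first-order) propagation: Sigma_u ~= J Sigma_theta J^T
theta0 = np.array([1.0, 2.0])
Sigma_theta = np.diag([0.01, 0.01])
J = np.array([[1.0, 0.1 * 2 * theta0[1]],
              [0.0, 1.0]])                      # Jacobian of f at theta0
Sigma_analytic = J @ Sigma_theta @ J.T

# Monte Carlo: sample parameters, push through f, take the empirical covariance
samples = rng.multivariate_normal(theta0, Sigma_theta, size=10000)
outputs = np.array([f(s) for s in samples])
Sigma_mc = np.cov(outputs.T)

# Relative difference of the RMSE-style scalar, as in the paper's comparison
rel_diff = abs(np.sqrt(np.trace(Sigma_mc)) - np.sqrt(np.trace(Sigma_analytic))) \
           / np.sqrt(np.trace(Sigma_analytic))
```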
The measured TRE was computed for each target from ERCP images acquired at different, but sub-optimal, C-arm angles (a consequence of the retrospective analysis). The results, compared with the estimated TRE and planning error, are summarised in Table 1. Although the estimated TREs significantly under-estimated the observed values for both patients, possibly due to the rigid-body assumption, a trend implying potential predictive ability was observed (correlation coefficient of 0.95). Significant differences
Fig. 3. Estimated TREs (RMSE in mm is indicated by the colour bar) for two example targets -
a bifurcation at left hepatic duct (left plot) and a stricture at pancreatic duct (right plot) -
computed using Monte-Carlo simulation (solid-line ellipse) versus analytical results (coloured
ellipse), for different C-arm orientations (in degrees). Please see the text for the experiment
details.

in both the estimated and observed registration accuracy were found for Patient 1 for the different C-arm angles (both p-values < 0.01), which is consistent with the estimated planning errors producing mutually exclusive 90 % CIs. Patient 2 had larger planning errors (overlapping within their respective 50 % CIs) for both C-arm orientations, predicting that significant changes in TRE are unlikely. This was confirmed by insignificant differences in both the estimated and observed TREs (p-values = 0.99 and 0.07).

5 Conclusion and Discussion

Three main contributions have been identified in the proposed planning-informed registration method: (a) a new error propagation approach is derived, incorporating
uncertainties in 2D- and 3D landmark identification, to estimate a TRE for 2D-3D
registration in ERCP-guided procedures; (b) a novel method for handling uncertainty in
planning is introduced to assess a planning protocol in which C-arm orientation and
patient position are optimised to achieve a minimum estimated TRE; and (c) presen-
tation of preliminary results from real patient data that validate the derived error
propagation methods and demonstrate possible clinical utility of the proposed methods.
The proposed error propagation method may be extended to non-rigid registration,
in particular, to inform planning strategies to maximise registration accuracy. The
assumption of a Gaussian distribution provides a reasonable and convenient approxi-
mation for the purpose of surgical planning, but the registration landmark and patient
positioning error distributions may be non-Gaussian. In this case, alternative methods,
such as unscented filtering, may be applicable. The values of LLEs and uncertainty in
patient positioning are estimated using a relatively small sample size and from litera-
ture. Current results in Sect. 4 are based on retrospective image data which do not have
optimised C-arm orientation. A larger prospective patient study will be able to further
validate the proposed methods.

Acknowledgements. This work is supported by CRUK, the EPSRC and the CIHR.

References
1. Anderson, M.A., Fisher, L., Jain, R., Evans, J.A., Appalaneni, V., Ben-Menachem, T., Fisher,
D.A.: Complications of ERCP. Gastrointest. Endosc. 75(3), 467–473 (2012)
2. Markelj, P., Tomaževič, D., Likar, B., Pernuš, F.: A review of 3D/2D registration methods for
image-guided interventions. Med. Image Anal. 16(3), 642–661 (2012)
3. Murphy, M.J., Adler, J.R., Bodduluri, M., Dooley, J., Forster, K., Hai, J., Poen, J.:
Image-guided radiosurgery for the spine and pancreas. Comput. Aided Surg. 5(4), 278–288
(2000)
4. Soltys, S.G., Goodman, K.A., Koong, A.C.: CyberKnife radiosurgery for pancreatic cancer.
In: Urschel, H.C., et al. (eds.) Treating Tumors that Move with Respiration, pp. 227–239.
Springer, Heidelberg (2007)
5. Lu, C.P., Hager, G.D., Mjolsness, E.: Fast and globally convergent pose estimation from
video images. IEEE Trans. PAMI 22(6), 610–622 (2000)
6. Hoff, W., Vincent, T.: Analysis of head pose accuracy in augmented reality. IEEE Trans. Vis.
Comput. Graph. 6(4), 319–334 (2000)
7. Sielhorst, T., Bauer, M., Wenisch, O., Klinker, G., Navab, N.: Online estimation of the target
registration error for n-ocular optical tracking systems. In: Ayache, N., Ourselin, S.,
Maeder, A. (eds.) MICCAI 2007, Part II. LNCS, vol. 4792, pp. 652–659. Springer,
Heidelberg (2007)
8. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge
University Press, Cambridge (2003)
9. Penney, G., Varnavas, A., Dastur, N., Carrell, T.: An image-guided surgery system to aid
endovascular treatment of complex aortic aneurysms: description and initial clinical
experience. In: Taylor, R.H., Yang, G.-Z. (eds.) IPCAI 2011. LNCS, vol. 6689, pp. 13–24.
Springer, Heidelberg (2011)
Registration-Free Simultaneous Catheter
and Environment Modelling

Liang Zhao, Stamatia Giannarou, Su-Lin Lee, and Guang-Zhong Yang

Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK
{liang.zhao,stamatia.giannarou,su-lin.lee,g.z.yang}@imperial.ac.uk

Abstract. Endovascular procedures are challenging to perform due to the complexity and difficulty of catheter manipulation. The simultaneous recovery of the 3D structure of the vasculature and the catheter position and orientation intra-operatively is necessary for catheter control and navigation. State-of-the-art Simultaneous Catheter and Environment Modelling provides robust and real-time 3D vessel reconstruction based on real-time intravascular ultrasound (IVUS) imaging and electromagnetic (EM) sensing, but still relies on accurate registration between EM and pre-operative data. In this paper, a registration-free vessel reconstruction method is proposed for endovascular navigation. In the optimisation framework, the EM-CT registration is estimated and updated intra-operatively together with the 3D vessel reconstruction from IVUS, EM and pre-operative data, and thus does not require explicit registration. The proposed algorithm can also deal with global (patient) motion and periodic deformation caused by cardiac motion. Phantom and in-vivo experiments validate the accuracy of the algorithm and the results demonstrate the potential clinical value of the technique.

1 Introduction

Endovascular catheter procedures are among the most common surgical inter-
ventions used to treat Cardiovascular Diseases (CVD). Being minimally invasive,
these procedures extend the range of patients able to receive interventional CVD
treatment to age groups with high risks for open surgery [1]. However, the chal-
lenge associated with minimising access incisions lies in the increased complexity
of catheter manipulations, which is mainly caused by the loss of direct access
to the anatomy and the poor visualisation of the surgical site [2]. Thus, the
3D structure of the vasculature needs to be recovered intra-operatively in order
to model the interaction between the catheter and its surroundings and assist
catheter navigation.
The current clinical approaches to endovascular procedures mainly rely on
2D guidance based on X-ray fluoroscopy and the use of contrast agents [3].
An alternative imaging modality that does not depend on ionising radiation or

This work was supported by the FP7-ICT (601021) and the EPSRC (EP/L020688/1).
Dr. Stamatia Giannarou is supported by the Royal Society (UF140290).

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 525–533, 2016.
DOI: 10.1007/978-3-319-46720-7 61
526 L. Zhao et al.

nephrotoxic contrast agents is intravascular ultrasound (IVUS) [4]. In [5,6], the


3D shape of the vessel is reconstructed by registering IVUS images to angiog-
raphy data in the 3D space, but still involves x-ray radiation and the use of
contrast agents. Reconstruction only based on IVUS requires assumptions on
pose of the transducer [7]. Recently, Simultaneous Catheter and Environment
Modelling (SCEM) [8] has been proposed to reconstruct the 3D vessel shape by
fusing IVUS and electromagnetic (EM) sensing data. This framework has been
enhanced in [9] with SCEM+ for more robust and real time 3D vessel recon-
struction. This has been achieved by formulating the 3D vessel reconstruction
as a real-time nonlinear optimisation problem by considering the uncertainty in
both the IVUS contour and the EM pose, as well as vessel structure information
from pre-operative CT data. Both of the above frameworks rely on accurate
pre-registration between the EM and CT data. However, in practice this is chal-
lenging as the registration is performed using external markers which can cause
large errors [10], and requires updating every time the patient moves.
To this end, this paper proposes a registration-free simultaneous catheter
and environment modelling method for endovascular navigation. This framework
advances SCEM+ as it does not require any prior information about the EM-CT
registration and can deal with global motion (e.g. patient motion) and periodic
vessel deformation caused by the cardiac cycle. A novel optimisation framework
has been formulated which incorporates the relative pose between the EM and
CT data and allows this pose to be updated online. The uncertainty of the
EM-CT registration is reduced incrementally and accurate 3D vessel reconstruc-
tions are estimated. In addition, the proposed algorithm can deal with global
motion by introducing an anchored EM sensor and periodic deformation is over-
come by gating the IVUS images and EM data to the same phase of the cycle
using electrocardiogram (ECG) signal. Detailed validation on phantom and in-
vivo datasets has been performed to compare the performance of the proposed
framework to SCEM and SCEM+ and demonstrate its accuracy and robustness
to global motion.

2 Methods

Similarly to SCEM+ [9], in this work the 3D vessel reconstruction is formulated as a real-time nonlinear optimisation problem by considering the uncertainty in
both the IVUS contour and the EM pose, as well as vessel structure informa-
tion from pre-operative CT data. The transformation between the EM and CT
coordinate systems is required. Advancing SCEM+, to avoid the need for pre-
registering the EM and CT coordinate systems, the relative pose between the
EM and CT data is incorporated in the optimisation framework and updated
online. Robustness to global motion is achieved by including in the optimisation
the pose of an externally anchored 6DoF EM sensor.
At the $i$th frame, suppose $\mathbf{P}_i^E$ is the 6DoF pose reported by the EM sensor on the catheter tip, with corresponding uncertainty $\Sigma_E$; $\mathbf{C}_i^I = [(\mathbf{c}_1^I)^T, \ldots, (\mathbf{c}_n^I)^T]^T$ is the contour of the inner vessel wall extracted from the IVUS image, consisting of a set of $n$ boundary points, with corresponding uncertainty $\Sigma_I = \mathrm{diag}(\Sigma_1, \ldots, \Sigma_n)$; and $(\mathbf{P}_i^A, \Sigma_E)$ is the observation from the anchored EM sensor. The proposed algorithm can then be mathematically formulated as the following nonlinear optimisation framework:

$$\operatorname*{argmin}_{\mathbf{R}, \mathbf{P}_i, \mathbf{A}_i} \; \sum_{j=1}^{n} \left\| \mathbf{c}_j^I - \mathbf{R}_{P_i}^T\left(\mathbf{R}_{A_i}^T(\mathbf{R}_R \mathbf{c}_j^C + \mathbf{T}_R) + \mathbf{T}_{A_i} - \mathbf{T}_{P_i}\right) \right\|^2_{\Sigma_j^{-1}} + \left\| \mathbf{P}_i^E - \mathbf{P}_i \right\|^2_{\Sigma_E^{-1}} + \left\| \hat{\mathbf{R}}_{i-1} - \mathbf{R} \right\|^2_{\Sigma_{R_{i-1}}^{-1}} + \left\| \mathbf{P}_i^A - \mathbf{A}_i \right\|^2_{\Sigma_E^{-1}} \qquad (1)$$

In the state vector, $\mathbf{P}_i = \{\mathbf{R}_{P_i}, \mathbf{T}_{P_i}\}$ is the current catheter pose in the EM coordinate frame, in which $\mathbf{R}_{P_i}$ and $\mathbf{T}_{P_i}$ are the rotation matrix and the translation vector respectively, $\mathbf{A}_i = \{\mathbf{R}_{A_i}, \mathbf{T}_{A_i}\}$ is the pose of the anchored EM sensor, and $\mathbf{R} = \{\mathbf{R}_R, \mathbf{T}_R\}$ is the relative pose from the anchored EM sensor to the CT coordinate frame, which corresponds to the EM-CT registration. $\mathbf{C}_i^C = [(\mathbf{c}_1^C)^T, \ldots, (\mathbf{c}_n^C)^T]^T$ is the vessel contour computed from the pre-operative data as the cross section of the CT model and the plane defined by the catheter pose $\mathbf{P}_i$ transformed with $\mathbf{A}_i$ and the registration pose $\mathbf{R}$. Here, the IVUS images and EM data used in the optimisation are gated to the same phase of the cardiac cycle using the ECG signal. The anchored EM sensor is introduced only to deal with global motion, and we assume that at each phase of the cardiac cycle the relative pose between the anchor sensor and the vessel does not change.
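ECG gating of this kind can be sketched as a simple phase filter over timestamped frames. The phase tolerance and the availability of detected R-peaks are assumptions of this illustration:

```python
import numpy as np

def gate_frames(timestamps, r_peaks, target_phase=0.0, tolerance=0.05):
    """Keep only frames whose cardiac phase (fraction of the enclosing R-R
    interval) lies within `tolerance` of `target_phase`, with wrap-around."""
    gated = []
    for i, t in enumerate(timestamps):
        k = np.searchsorted(r_peaks, t) - 1          # index of the preceding R peak
        if k < 0 or k + 1 >= len(r_peaks):
            continue                                 # outside the recorded R-R intervals
        phase = (t - r_peaks[k]) / (r_peaks[k + 1] - r_peaks[k])
        d = abs(phase - target_phase)
        if d <= tolerance or d >= 1.0 - tolerance:   # wrap phase 0.98 ~ phase 0.0
            gated.append(i)
    return gated

# 1 Hz heart rate, frames at 10 Hz: one frame per cycle falls near phase 0.02
r_peaks = np.arange(0.0, 10.0, 1.0)
frames = np.arange(0.02, 9.0, 0.1)
idx = gate_frames(frames, r_peaks, target_phase=0.0, tolerance=0.05)
```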
The first term in (1) transforms the contour $\mathbf{C}_i^C$ from the CT to the IVUS coordinate frame and minimises the difference between the contour extracted from the IVUS image and the contour computed from the pre-operative data, weighted by the uncertainty of the IVUS contour $\Sigma_I$. The second term in (1) minimises the difference between the catheter pose and the pose reported by the EM sensor, weighted by the uncertainty of the EM pose $\Sigma_E$. The first two terms in (1) are similar to SCEM+ [9], but with $\mathbf{R}$ and $\mathbf{A}_i$ included in the state vector. The third term in the objective function (1) aims to minimise the difference between the EM-CT registration pose $\mathbf{R}$ in the state vector and the optimal solution of the registration pose $\hat{\mathbf{R}}_{i-1}$ from the previous frame, weighted by the corresponding covariance matrix $\Sigma_{R_{i-1}}$ computed by the proposed algorithm. Here, $(\hat{\mathbf{R}}_{i-1}, \Sigma_{R_{i-1}})$ from the $(i-1)$th frame is used as an observation in the optimisation of the $i$th frame. The fourth term in the objective function minimises the difference between the anchored EM pose in the state vector and the EM pose reported by the anchored sensor, weighted by the uncertainty of the EM sensor.
The optimal solution of the optimisation formulated in (1) can be obtained iteratively using the Gauss-Newton method, where in the $k$th iteration:

$$\mathbf{R}^{k+1} = \mathbf{R}^k + \Delta_R^k, \quad \mathbf{P}_i^{k+1} = \mathbf{P}_i^k + \Delta_P^k, \quad \mathbf{A}_i^{k+1} = \mathbf{A}_i^k + \Delta_A^k, \quad \text{where} \quad \left( J^T \Sigma^{-1} J \right) \begin{bmatrix} \Delta_R^k \\ \Delta_P^k \\ \Delta_A^k \end{bmatrix} = J^T \Sigma^{-1} \varepsilon. \qquad (2)$$
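Each Gauss-Newton update in Eq. (2) solves the weighted normal equations. A minimal sketch with a toy 1D problem (not the paper's state vector):

```python
import numpy as np

def gauss_newton_step(J, Sigma, eps):
    """Solve (J^T Sigma^-1 J) delta = J^T Sigma^-1 eps for the state update (Eq. 2)."""
    W = np.linalg.inv(Sigma)
    H = J.T @ W @ J          # Gauss-Newton approximation of the Hessian
    g = J.T @ W @ eps        # weighted gradient of the residuals
    return np.linalg.solve(H, g)

# Toy example: estimate one parameter from two noisy direct observations
J = np.array([[1.0], [1.0]])
Sigma = np.diag([1.0, 4.0])      # the second observation is less certain
eps = np.array([2.0, 4.0])       # residuals at the current estimate
delta = gauss_newton_step(J, Sigma, eps)
# weighted update: (2/1 + 4/4) / (1/1 + 1/4) = 3 / 1.25 = 2.4
```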
Here $J$ is the linear mapping represented by the Jacobian matrix of the observation functions evaluated at $\mathbf{R}^k$, $\mathbf{P}_i^k$ and $\mathbf{A}_i^k$; $\Sigma$ is the covariance matrix containing the uncertainties of all the observations; and $\varepsilon$ is the residual vector of all the observations:

$$J = \begin{bmatrix} \frac{\partial \mathbf{C}_i^I}{\partial \mathbf{R}} & \frac{\partial \mathbf{C}_i^I}{\partial \mathbf{P}_i} & \frac{\partial \mathbf{C}_i^I}{\partial \mathbf{A}_i} \\ 0 & \frac{\partial \mathbf{P}_i^E}{\partial \mathbf{P}_i} & 0 \\ \frac{\partial \hat{\mathbf{R}}_{i-1}}{\partial \mathbf{R}} & 0 & 0 \\ 0 & 0 & \frac{\partial \mathbf{P}_i^A}{\partial \mathbf{A}_i} \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_I & 0 & 0 & 0 \\ 0 & \Sigma_E & 0 & 0 \\ 0 & 0 & \Sigma_{R_{i-1}} & 0 \\ 0 & 0 & 0 & \Sigma_E \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \mathbf{C}_i^I - f(\mathbf{R}, \mathbf{P}_i, \mathbf{A}_i) \\ \mathbf{P}_i^E - \mathbf{P}_i \\ \hat{\mathbf{R}}_{i-1} - \mathbf{R} \\ \mathbf{P}_i^A - \mathbf{A}_i \end{bmatrix} \qquad (3)$$

where $f(\cdot)$ combines all the observation functions in the first term in (1):

$$f(\mathbf{R}, \mathbf{P}_i, \mathbf{A}_i) = [\ldots, \left(((\mathbf{c}_j^C)^T \mathbf{R}_R^T + \mathbf{T}_R^T)\mathbf{R}_{A_i} + \mathbf{T}_{A_i}^T - \mathbf{T}_{P_i}^T\right)\mathbf{R}_{P_i}, \ldots]^T, \quad j = 1:n \qquad (4)$$

and $\frac{\partial \mathbf{P}_i^E}{\partial \mathbf{P}_i} = \frac{\partial \hat{\mathbf{R}}_{i-1}}{\partial \mathbf{R}} = \frac{\partial \mathbf{P}_i^A}{\partial \mathbf{A}_i} = E_6$, where $E_6$ is the $6 \times 6$ identity matrix.

For a real-time implementation, the residual in the first term of (1) can be replaced by the shortest distances to the pre-operative CT model. Thus, the objective function and its Jacobians related to the first term can be pre-calculated as a distance space and its gradient from the pre-operative data [9]. By the formulation of the optimisation problem, the state vector can simply be initialised using the observations $\mathbf{R}^0 = \hat{\mathbf{R}}_{i-1}$, $\mathbf{P}_i^0 = \mathbf{P}_i^E$ and $\mathbf{A}_i^0 = \mathbf{P}_i^A$.

After the optimal solutions of the EM-CT registration R̂i , the current
catheter pose P̂i and the anchored sensor pose Âi are obtained, their corre-
sponding covariance matrices ΣRi , ΣPi and ΣAi which present their uncertainty
can also be computed by using the Schur complement
$$
\begin{cases}
\Sigma_{R_i}^{-1} = I_{RR} -
\begin{bmatrix} I_{PR} \\ I_{AR} \end{bmatrix}^{T}
\begin{bmatrix} I_{PP} & I_{PA} \\ I_{AP} & I_{AA} \end{bmatrix}^{-1}
\begin{bmatrix} I_{PR} \\ I_{AR} \end{bmatrix} \\[6pt]
\Sigma_{P_i}^{-1} = I_{PP} -
\begin{bmatrix} I_{RP} \\ I_{AP} \end{bmatrix}^{T}
\begin{bmatrix} I_{RR} & I_{RA} \\ I_{AR} & I_{AA} \end{bmatrix}^{-1}
\begin{bmatrix} I_{RP} \\ I_{AP} \end{bmatrix} \\[6pt]
\Sigma_{A_i}^{-1} = I_{AA} -
\begin{bmatrix} I_{RA} \\ I_{PA} \end{bmatrix}^{T}
\begin{bmatrix} I_{RR} & I_{RP} \\ I_{PR} & I_{PP} \end{bmatrix}^{-1}
\begin{bmatrix} I_{RA} \\ I_{PA} \end{bmatrix}
\end{cases}
\quad \text{where} \quad
I = J^{T}\Sigma^{-1}J =
\begin{bmatrix}
I_{RR} & I_{RP} & I_{RA} \\
I_{PR} & I_{PP} & I_{PA} \\
I_{AR} & I_{AP} & I_{AA}
\end{bmatrix}.
\tag{5}
$$
Here $I_{RR}$, $I_{PP}$, $I_{AA}$ and $I_{RP} = I_{PR}^{T}$, $I_{RA} = I_{AR}^{T}$, $I_{PA} = I_{AP}^{T}$ are the blocks of
the information matrix $I$ that correspond to the variables $R$, $P_i$, $A_i$ and their
correlations, respectively.
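The marginalisation in (5) is generic: the covariance of a subset of variables equals the inverse of the Schur complement taken with respect to the remaining block of the information matrix, which is the same as the corresponding block of the full inverse. A minimal sketch, assuming a dense symmetric information matrix (function name hypothetical, not from the paper):

```python
import numpy as np

def marginal_cov(I, idx):
    """Covariance of the variables `idx`, marginalising out the rest of the
    information matrix I via the Schur complement; equals the corresponding
    block of inv(I)."""
    rest = [k for k in range(I.shape[0]) if k not in idx]
    I_aa = I[np.ix_(idx, idx)]
    I_ab = I[np.ix_(idx, rest)]
    I_bb = I[np.ix_(rest, rest)]
    S = I_aa - I_ab @ np.linalg.solve(I_bb, I_ab.T)   # Schur complement
    return np.linalg.inv(S)
```

Computing the Schur complement avoids inverting the full information matrix when only a few marginal blocks are needed.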
The vessel reconstruction can be performed by transforming the IVUS contour
$C_i^{I}$ into the CT coordinate frame, $C_i = [c_1^{T}, \ldots, c_n^{T}]^{T}$, using the optimal $\hat{R}_i$,
$\hat{P}_i$ and $\hat{A}_i$, with the corresponding covariance matrix $\Sigma_{C_i}$ as uncertainty:

$$
c_j = \hat{R}_R^{T}\,(\hat{R}_{A_i}(\hat{R}_{P_i} c_j^{I} + \hat{T}_{P_i} - \hat{T}_{A_i}) - \hat{T}_R),
\quad
\Sigma_{C_i} = J_C \Sigma_S J_C^{T}
\tag{6}
$$

where $J_C$ is the Jacobian matrix of $C_i$ w.r.t. the registration pose $R$, catheter
pose $P_i$, anchor pose $A_i$ and the IVUS contour $C_i^{I}$, respectively, and $\Sigma_S$
contains their covariance matrices on its diagonal:

$$
J_C = \begin{bmatrix} \partial C_i/\partial R & \partial C_i/\partial P_i & \partial C_i/\partial A_i & \partial C_i/\partial C_i^{I} \end{bmatrix},
\quad
\Sigma_S = \mathrm{diag}(\Sigma_{R_i}, \Sigma_{P_i}, \Sigma_{A_i}, \Sigma_I).
\tag{7}
$$
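Once the Jacobian blocks are stacked, the first-order propagation $\Sigma_{C_i} = J_C \Sigma_S J_C^{T}$ in (6)–(7) is plain linear algebra. The sketch below is illustrative only, with hypothetical helper names and a hand-rolled block-diagonal builder:

```python
import numpy as np

def block_diag(*mats):
    """Stack matrices along the diagonal (structure of Sigma_S in (7))."""
    rows = sum(m.shape[0] for m in mats)
    cols = sum(m.shape[1] for m in mats)
    out = np.zeros((rows, cols))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out

def propagate_cov(jacobians, covs):
    """First-order propagation Sigma_C = J_C Sigma_S J_C^T, with J_C the
    horizontal stack of the per-variable Jacobian blocks."""
    J = np.hstack(jacobians)
    return J @ block_diag(*covs) @ J.T

# Scalar sanity check: var = 2^2 * 1 + 3^2 * 4 = 40
Sigma = propagate_cov([np.array([[2.0]]), np.array([[3.0]])],
                      [np.array([[1.0]]), np.array([[4.0]])])
```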
Registration-Free Simultaneous Catheter and Environment Modelling 529

At the end of the i-th frame, the optimal solution of the EM-CT registration
pose together with the corresponding uncertainty $(\hat{R}_i, \Sigma_{R_i})$ computed by (5) is
used as one of the observations in the (i + 1)-th frame.
In the proposed algorithm, the uncertainty of the EM-CT registration pose is
initialised with zero information, i.e. $\Sigma_{R_0}^{-1} = 0_6$ (where $0_6$ is the $6 \times 6$ zero matrix), at
the first frame to ensure that the proposed algorithm only uses the information
from IVUS, EM and the pre-operative data. Since the EM-CT registration is
incrementally estimated from IVUS and EM data, the result of the registration
will not be very accurate at the very beginning. As more parts of the vessel
are observed by IVUS, the EM-CT registration is updated intra-operatively and
becomes more accurate. By using the formulation above, the information from
both the IVUS contour $(C_i^{I}, \Sigma_I)$ and the EM pose $(P_i^{E}, \Sigma_E)$ at the i-th frame is
transferred and accumulated in the covariance matrix $\Sigma_{R_i}$ of $\hat{R}_i$, which means
that all the information of IVUS and EM from the 1st to the i-th frame is summarised
in $\Sigma_{R_i}$, and is used in the (i + 1)-th frame as an integrated observation $(\hat{R}_i, \Sigma_{R_i})$.

3 Results
3.1 Monte-Carlo Simulation
First, simulated data generated from a CT model with known EM-CT registra-
tion, and perfect EM poses and IVUS contours as ground truth were used to
assess the accuracy of the proposed algorithm w.r.t the observation noise. Dif-
ferent levels of zero mean Gaussian noise were added to the ground truth EM
poses and to the IVUS contours and were used as observations to the proposed
algorithm. For each noise level, 25 runs were performed and the mean pose and
reconstruction errors are shown in Fig. 1(left). In Fig. 1(right), the changes of the
error of the EM-CT registration pose during the vessel reconstruction are shown
with the 2σ bound from the corresponding uncertainty estimation, when 0.1 rad
noise is added to the rotation and 1 mm noise to the translation of the EM pose,


Fig. 1. Monte-Carlo simulation: (left) the accuracy of catheter pose, vessel reconstruc-
tion and EM-CT registration pose w.r.t different levels of noise on the observations of
EM poses and IVUS contours, (right) the reduction of error (black lines) and uncer-
tainty (2σ bounds shown in blue lines) of the EM-CT registration pose.
and 1 mm noise to the IVUS contour. It can be seen that the reconstruction
errors remain small in the presence of noise, and that the error and uncertainty
of the EM-CT registration decrease quickly.

3.2 Phantom Experiments

The Static Case: A static HeartPrint® (Materialise, Leuven, Belgium) aortic
phantom was used to compare the proposed algorithm against SCEM and
SCEM+. Three setups with different EM-CT registrations were used and for
each setup 5 datasets with catheter insertions and pullbacks were generated.
The mean errors of the vessel reconstruction are shown in Fig. 2(left). Since the
ground truth of EM-CT registrations cannot be obtained, the registrations esti-
mated by the proposed algorithm were compared to the ones computed by using
CT markers which were used in the SCEM and SCEM+ algorithms. As shown
in Fig. 2(left), the proposed algorithm can achieve vessel reconstruction accuracy
similar to that of SCEM+ (around 0.3 mm), but without explicit EM-CT registration.


Fig. 2. Accuracy of phantom experiments: (left) the static case using HeartPrint phan-
tom, (right) with global motion and periodic deformation using the silicone phantom.

Global Motion and Periodic Deformation: A silicone aortic phantom connected
to a pump was used, and periodic deformation and different global motions
were simulated. For the global motion, the catheter was first inserted, and the
EM field generator was moved before pulling the catheter back. For the periodic
deformation, the pump simulated the cardiac motion, and its signal was used
to gate the IVUS images in the proposed algorithm. For each cardiac cycle, the
3D shape of the vessel was reconstructed at the same phase of the cycle. Exper-
iments with static phantom, global motion, periodic deformation, and global
motion+periodic deformation were performed and for each case five experiments
were done. The result of the proposed algorithm with global motion is shown in
Fig. 3, and quantitative evaluation of the accuracy of the vessel reconstruction
is presented in Fig. 2(right). The results show that the proposed algorithm
successfully handles global motion and periodic deformation, achieving an
accuracy of about 0.45 mm.
Fig. 3. Vessel reconstruction results of the silicone phantom with global motion: (a) pre-
operative CT model, (b) result of SCEM shows the changes of the EM-CT registration,
(c) result of the proposed algorithm coloured by the error of reconstruction in mm, and
(d) the catheter tip poses found using SCEM (red) and the proposed algorithm (black).

3.3 In-vivo Experiments

In-vivo experiments in a swine model with global motions were also performed to
validate the proposed algorithm. A segmented CT scan provided the triangular
surface mesh of the aorta. Seven CT markers were attached to the body of the
swine, but as shown in Fig. 4(b), the EM-CT registration based on the CT markers has
a large error. In total, 4 pullbacks were performed. The IVUS was gated by the
ECG to deal with cardiac motion, and the results of the proposed algorithm are
shown in Fig. 4(c), (d) and Table 1. For the 4 pullbacks, the mean errors of vessel
reconstruction are 0.80, 0.83, 0.71, 0.68 mm, respectively.


Fig. 4. Results of in-vivo experiments in a swine model: (a) CT and IVUS image, (b)
SCEM results showing the large error of the EM-CT registration obtained with CT
markers and the global motion between Pullback-1 and Pullback-2, and the results of
Pullback-1 (c) and Pullback-2 (d) obtained with the proposed algorithm.
Table 1. Vessel reconstruction error of in-vivo experiments (in mm)

                 SCEM            SCEM+           Proposed
Pullback      Mean    Std     Mean    Std     Mean    Std
1            5.0291  5.0452  1.7009  1.3939  0.8065  0.5976
2            5.6656  5.3751  1.8991  1.4088  0.8330  0.5756
3            3.9780  3.9749  1.5499  1.1917  0.7072  0.5372
4            4.1629  4.1490  1.4851  1.1708  0.6851  0.5409

4 Conclusion

This paper presents an intra-operative, real-time, registration-free 3D vessel
reconstruction approach based on nonlinear optimisation using IVUS, EM and
pre-operative data. Phantom and in-vivo experiments show that the proposed
algorithm can achieve accurate vessel reconstruction, and, unlike the SCEM and
SCEM+ methods, explicit registration between the EM system and the pre-operative
data is not required. The use of external CT markers for registration
can cause large errors; by removing this requirement, the proposed method
can be easily integrated clinically without interrupting the workflow. The
algorithm runs in real time at around 200 fps (on one core of an Intel i7-2600
CPU at 3.4 GHz) and is robust to global motion and periodic deformation.
In conclusion, the proposed algorithm can be deployed to improve endovascular
navigation without the need for explicit EM-CT registration.

References
1. Mirabel, M., Iung, B., Baron, G., et al.: What are the characteristics of patients
with severe, symptomatic, mitral regurgitation who are denied surgery? Eur. Heart
J. 28(11), 1358–1365 (2007)
2. Kono, T., Kitahara, H., Sakaguchi, M., Amano, J.: Cardiac rupture after catheter
ablation procedure. Ann. Thor. Surg. 80(1), 326–327 (2005)
3. Groher, M., Bender, F., Hoffmann, R.-T., Navab, N.: Segmentation-driven 2D-
3D registration for abdominal catheter interventions. In: Ayache, N., Ourselin, S.,
Maeder, A. (eds.) MICCAI 2007, Part II. LNCS, vol. 4792, pp. 527–535. Springer,
Heidelberg (2007)
4. Rosales, M., Radeva, P., Rodriguez-Leor, O., Gil, D.: Modelling of image-catheter
motion for 3-D IVUS. Med. Imag. Anal. 13(1), 91–104 (2009)
5. Wahle, A., Prause, G.P.M., DeJong, S.C., Sonka, M.: Geometrically correct
3-D reconstruction of intravascular ultrasound images by fusion with bi-plane
angiography-methods and validation. IEEE Trans. Med. Imag. 18(8), 686–699
(1999)
6. Bourantas, C.V., Papafaklis, M.I., Athanasiou, L., et al.: A new methodology for
accurate 3-dimensional coronary artery reconstruction using routine intravascu-
lar ultrasound and angiographic data: implications for widespread assessment of
endothelial shear stress in humans. EuroIntervention 9(5), 582–593 (2013)
7. Sanz-Requena, R., Moratal, D., Diego, R.G.S., et al.: Automatic segmentation
and 3D reconstruction of intravascular ultrasound images for a fast preliminary
evaluation of vessel pathologies. Comput. Med. Imag. Graph. 31(2), 71–80 (2007)
8. Shi, C., Giannarou, S., Lee, S.L., Yang, G.Z.: Simultaneous catheter and envi-
ronment modeling for trans-catheter aortic valve implantation. In: Proceedings of
IROS, pp. 2024–2029 (2014)
9. Zhao, L., Giannarou, S., Lee, S.L., Yang, G.Z.: SCEM+: real-time robust simul-
taneous catheter and environment modeling for endovascular navigation. IEEE
Robot. Autom. Lett. 1(2), 961–968 (2016)
10. Fitzpatrick, J.M., West, J.B., Maurer, C.R.: Predicting error in rigid-body point-
based registration. IEEE Trans. Med. Imag. 17(5), 694–702 (1998)
Pareto Front vs. Weighted Sum for Automatic
Trajectory Planning of Deep Brain Stimulation

Noura Hamzé^{1,5}, Jimmy Voirin^{1,2}, Pierre Collet^{1}, Pierre Jannin^{3},
Claire Haegelen^{3,4}, and Caroline Essert^{1,5}

1 ICube, Université de Strasbourg, Strasbourg, France
  n.hamze@unistra.fr
2 Department of Neurosurgery, Strasbourg University Hospital, Strasbourg, France
3 LTSI, Inserm UMR 1099 - Université de Rennes 1, Rennes, France
4 Department of Neurosurgery, Pontchaillou University Hospital, Rennes, France
5 IHU, Institute of Image-Guided Surgery, Strasbourg, France

Abstract. Preoperative path planning for Deep Brain Stimulation
(DBS) is a multi-objective optimization problem that consists in searching
for the best compromise between multiple placement constraints. Its
automation is usually addressed by turning the problem into a mono-objective
one through an aggregative approach. However, despite its intuitiveness,
this approach is known to be incapable of finding all optimal
solutions. In this work, we introduce an approach based on multi-objective
dominance to DBS path planning. We compare it to a classical
aggregative weighted sum of the multiple constraints and to manual
planning, through a retrospective study performed by a neurosurgeon
on 14 DBS cases. The results show that the dominance-based method
is preferred over manual planning, and covers a larger choice of relevant
optimal entry points than the traditional weighted sum approach, which
discards interesting solutions that could be preferred by surgeons.

1 Introduction
Preoperative planning of a safe and efficient trajectory for a Deep Brain Stimu-
lation (DBS) electrode is a crucial and challenging task which usually requires
a long experience. The path is usually chosen as the best compromise between
multiple placement rules that may be contradictory, such as accurate targeting,
avoidance of various sensitive structures or zones, or compliance with standards.
Most of the automatic trajectory planning techniques that have been pro-
posed in the literature for DBS are based on mono-objective approaches
[1,2,5,7,11]. They combine the rules into a single aggregative weighted sum
and minimize it to find an optimal solution. This approach is intuitive and
close to the current decision-making process. However, the optimization
community has shown that using such mono-criteria approaches to solve
multi-criteria optimization problems can lead to an under-detection of the
optimal solutions in a given solution space: they often produce poorly
distributed solutions and fail to find optimal solutions in non-convex regions [6].

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 534–541, 2016.
DOI: 10.1007/978-3-319-46720-7_62
Pareto Front vs. Weighted Sum for Automatic Trajectory Planning of DBS 535

While multi-criteria methods are already widely used for radiation therapy
planning [3], only recently have a few groups started to consider Pareto-optimality
techniques for path planning in minimally invasive surgery. For example, a non-
dominance based optimization was described in [9,10] for radiofrequency abla-
tion of tumors. But to our knowledge, no such method has been used in DBS.
The purpose of this work is to better understand and quantify the capacities
and limits of different approaches to detect optimal solutions in the case of
preoperative DBS path planning. We introduce to this context an optimality
quantification approach based on dominance with the computation of a Pareto
front. We compare it to a classical aggregative method based on a weighted
sum. For both methods, within a uniform distribution of candidate entry points,
optimal solutions are proposed and the difference is studied. These approaches
are described in detail in Sect. 2. Then in Sect. 3, we describe the experiment
performed by an experienced neurosurgeon on 14 patients cases, in order to
quantify the loss of relevant trajectories missed by the aggregative approach.

2 Materials and Methods


In this section, we detail both quality quantification approaches that were com-
pared. Then, we describe the GUI proposed to facilitate the navigation within
the solutions, and present the data and experiment used for comparison.

2.1 Method 1: Pareto Front (M_PF)

Method M_PF is a multi-objective method based on a Pareto ranking scheme.
It consists in analyzing the mutual non-dominance of candidate entry points in
an initial set S. We define the strict dominance relationship dom between two
individuals x and y of the solution space S, for a set of n objective functions f_i,
as follows:

∀x, y ∈ S: x dom y ⟺ ∀i ∈ [1..n], f_i(x) < f_i(y)

A solution x is Pareto-optimal if it is not dominated by any other solution
in the solution space S:

x ∈ S is Pareto-optimal ⟺ ∀y ∈ S, ¬(y dom x)

The set of all Pareto-optimal solutions is called the Pareto front. Let us denote
by S_PF the subset of points of S that belong to the Pareto front. Inside the front,
no solution dominates another:

x ∈ S_PF ⟺ ∀y ∈ S_PF, ¬(y dom x) ∧ ¬(x dom y)

S_PF represents the Pareto-optimal points of S that can be reached using
M_PF. They are computed by comparing the points of the sampling in pairs and
keeping only those that satisfy the above property.
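The pairwise non-dominance test described above can be sketched in a few lines. This is an illustrative O(n²) implementation with hypothetical names, not the authors' code; `objectives` maps a candidate point to its tuple (f_1, ..., f_n):

```python
def dominates(fx, fy):
    """Strict dominance: x dom y iff f_i(x) < f_i(y) for every objective i."""
    return all(a < b for a, b in zip(fx, fy))

def pareto_front(points, objectives):
    """Keep the points of S that no other point strictly dominates."""
    scores = [objectives(p) for p in points]
    return [p for p, fp in zip(points, scores)
            if not any(dominates(fq, fp) for fq in scores)]

# (4,4) is strictly dominated by (2,2); the other three points are mutually
# non-dominated and form the front.
candidates = [(1.0, 3.0), (3.0, 1.0), (2.0, 2.0), (4.0, 4.0)]
front = pareto_front(candidates, lambda p: p)
```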
536 N. Hamzé et al.

2.2 Method 2: Weighted Sum Exploration (M_WSE)

The weighted sum is a mono-objective approach that quantifies the quality of a
solution by representing all of the n objective functions f_i with a single
aggregative cost function f to minimize. A weight w_i is associated to each
f_i as follows:

f(x) = Σ_{i=1}^{n} w_i · f_i(x),  x ∈ R^N

where 0 < w_i < 1 and Σ w_i = 1 (weights condition), and x represents the
trajectory associated to a candidate entry point.

For a fixed combination of weights W = (w_1, ..., w_n), we can quantify the
quality of each candidate entry point p_j ∈ S of the initial set by evaluating
f(x_j), where x_j is the trajectory corresponding to p_j. The optimal entry
point for combination W is then the point of S with a minimal evaluation of f.

By varying the weights w_i in W, different entry points of S minimizing f can
be obtained. An exploration that systematically varies a high number of different
combinations of weights is the most widely used approach to approximate a
Pareto front: the maximal coverage of this method M_WSE is the subset S_WSE
of all the points of S that can be found as optimal with this method.

To achieve this, a stochastic sampling of the n weights w_i satisfying the
above-mentioned weights condition is built. A Dirichlet distribution [8] is used
to obtain a uniform sampling of 20,000 different combinations of weights. Note
that different combinations can lead to the same optimal entry point within a
predefined finite set of candidate entry points.
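The exploration can be sketched as follows, assuming the objective values have been pre-computed for all candidates (hypothetical names, not the authors' code). The example also reproduces the known weakness discussed in the paper: a Pareto-optimal point lying in a concavity of the front is never the minimiser of any weighted sum:

```python
import numpy as np

def weighted_sum_optima(scores, n_samples=20000, seed=0):
    """scores: (n_points, n_objectives) matrix of f_i values per candidate.
    Sample weight vectors uniformly on the simplex (flat Dirichlet) and
    collect the indices of points that minimise some weighted sum."""
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(scores.shape[1]), size=n_samples)
    return set(np.argmin(W @ scores.T, axis=1).tolist())

# The point (0.55, 0.55) is Pareto-optimal (neither extreme strictly
# dominates it) but sits in a concavity: for weights (w, 1-w) its cost 0.55
# always exceeds min(w, 1-w) <= 0.5, so it is never found.
scores = np.array([[0.0, 1.0], [1.0, 0.0], [0.55, 0.55]])
found = weighted_sum_optima(scores)
```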

2.3 Discretization of the Solution Space


Usual automatic trajectory planning techniques involve the search of the best
entry point thanks to an optimization phase converging to solutions optimiz-
ing the chosen quality measurement. However, optimization methods would dif-
fer for mono- and multi-objective cases: classical derivative-free optimization
techniques are appropriate for mono-objective, while evolutionary approaches
are more suitable and most frequently used for multi-objective techniques. The
choice of such different optimizers could bias the comparison of the quality mea-
surement method, as their convergence may differ. In order to have a fair com-
parison, we chose to avoid the use of optimizers, and focused on comparing only
the quality measurement methods on a selection of candidate entry points.
To do so, we computed a discretization of the solution space by choosing a
uniform distribution S of points over the surface of the feasible entry points,
i.e. the points leading to a safe trajectory not crossing any forbidden anatomical
structure or zone. The precision of the distribution was chosen such that we
have one candidate trajectory per degree if the center of rotation is the center
of the targeted structure, which corresponds to approximately one entry point
every millimeter. This precision was assessed as sufficient by our expert
neurosurgeon. An example of a distribution of candidate points over a surface
of feasible points is shown in Fig. 1d.
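As a quick sanity check of the quoted density (assuming a representative target-to-scalp distance of about 60 mm, a value not stated in the paper):

```python
import math

target_to_scalp_mm = 60.0                            # assumed radius, hypothetical
spacing_mm = target_to_scalp_mm * math.pi / 180.0    # arc length per 1 degree
```

At that radius, a 1° angular step corresponds to an arc of about 1.05 mm, consistent with roughly one entry point per millimeter.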
2.4 Evaluation Study

The objective of the test was (1) to compare the two methods on their coverage
over the surface of candidate entry points, and their ability to find the maximal
set of optimal solutions, and (2) to check whether the points found as optimal by
one method and not by the other one were likely to be chosen by neurosurgeons.
To this end, a retrospective study was performed on 14 datasets from 7
patients who underwent a bilateral Deep Brain Stimulation of the Subthalamic
Nucleus (STN) to treat Parkinson’s disease. Each dataset was composed of pre-
operative 3T T1 and T2 MRI with a resolution of 1.0 mm × 1.0 mm × 1.0 mm,
and a 3D brain model containing triangular surface meshes of the majority of
cerebral structures, segmented and reconstructed from the preoperative images
using the pyDBS pipeline described in [4]. Among the 3D structures, we have
the STN, a patch delineated on the skin as a search area for the entry points,
the ventricles and the sulci that neurosurgeons try to avoid. The T1, T2 and 3D
meshes were registered in the same coordinate system.
A second pipeline was implemented and executed on the 3D scenes. First a
discretization S of the search space, as described in Sect. 2.3, was performed.
The distribution contained between 0.93 and 1.29 points per mm² (average 1.07),
representing an average of 2,320 sample points per case on an average surface
of 2,158 mm². Then we computed the subsets S_WSE and S_PF of points labeled
as optimal by methods M_WSE and M_PF respectively, as described in Sects. 2.2
and 2.1. Examples of the subsets of optimal points proposed by both methods are
presented in Figs. 1a and b. For each case, we marked the difference sets of
points found by one method and not by the other, D_WSE = S_WSE − (S_WSE ∩ S_PF)
and D_PF = S_PF − (S_WSE ∩ S_PF), and computed their cardinality.
Finally, an experienced neurosurgeon was asked to perform a test in 4 steps.

– Step 1: “Manual planning M_MP”. This phase consisted in selecting interactively
the target point and the entry point on the 2D T1/T2 slices. The
chosen trajectory T_MP could be visualized and assessed in the 3D view to
check whether the position was satisfactory.
– Step 2: “Planning using method M_WSE”. In this phase, the target point
chosen in Step 1 was kept, and the surgeon had to choose an entry point among
the ones proposed by M_WSE.
– Step 3: “Planning using method M_PF”. In this phase, the target point
chosen in Step 1 was also kept, and the surgeon had to choose an entry point
among the ones proposed by method M_PF.
– Step 4: “Trajectories ranking”. This phase consisted in ranking the three
trajectories T_MP, T_WSE and T_PF chosen in Steps 1–3. The ranking was blind,
as the three trajectories were randomly assigned a color and the surgeon
ranked the colors. An illustration of this step is shown in Fig. 1c, where the
colors have been set to match Figs. 1a and b for readability purposes. The
ranking could be zero if a trajectory was marked as being really
worse than the others and rejected. Trajectories could be equally ranked if
they were identical or estimated to be of similar quality.
(a) Step 2: weighted sum set S_WSE (b) Step 3: Pareto front set S_PF

(c) Step 4: ranking (d) Initial distribution of points

Fig. 1. Case #12: area of feasible entry points, with solutions of M_WSE in blue,
solutions of M_PF in red, and the trajectory chosen with M_MP in green.

The trajectory planning is submitted to a number of surgical rules. We have


chosen to represent three of them, that seemed to be among the most commonly
accepted rules, as objective functions for our experiment. Function f1 represents
the proximity to a standard trajectory defined by expert neurosurgeons and
commonly used in commercial platforms: 30° anterior and 30° lateral. Function
f_2 represents the distance from the electrode to the sulci, where the vessels
are most often located and which the surgeons try to avoid as much as possible.
Function f_3 represents the distance from the electrode to the ventricles, to be
avoided as well.
In order to assist the navigation through the solutions proposed by each
method, we displayed visual clues controlled by sliders. For method M_WSE, a
slider i allows the surgeon to assign a value to weight w_i. Modifying the slider
position updates a color map representing f for a particular set of weights, to
help in the selection of a candidate. Its use is optional, and the visualization of all
candidates in S_WSE is not affected. In the case of method M_PF, the implemented
filtering sliders were inspired by those proposed in [9] for radiofrequency tumor
ablation. A slider i assigns a threshold θ_i for the value of f_i: a point p ∈ S_PF
is displayed only if f_i(p) < θ_i, and hidden otherwise. Sliders range from 1 (all
solutions displayed) to 0 (no solution displayed). Their use is also optional.
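The filtering rule behind the M_PF sliders is a simple conjunction of per-objective thresholds; a minimal sketch with hypothetical names (assuming the f_i values are normalized to [0, 1]):

```python
def filter_front(points, objectives, thresholds):
    """Display rule of the filtering sliders: point p is shown only if
    f_i(p) < theta_i for every objective i."""
    return [p for p in points
            if all(f < t for f, t in zip(objectives(p), thresholds))]

# Lowering the second slider to 0.6 hides the point whose f_2 is 0.9:
shown = filter_front([(0.2, 0.9), (0.5, 0.5)], lambda p: p, (1.0, 0.6))
```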
The experiment was performed on a workstation with an Intel Core i7 running at 2.67 GHz
and 8 GB of RAM. For all cases, the positions of the target/entry
points, the final ranking, and the times required for each step were recorded1.

1
A video illustrating the experiment can be watched at http://goo.gl/mfgrqX or
https://youtu.be/16JthovAh5c.
3 Results and Discussion

For all the cases, S_WSE ⊂ S_PF. The average cardinalities are |S_WSE| = 26 and
|S_PF| = 190. The difference set D_WSE was always empty, meaning that all points
found as optimal by M_WSE were also proposed by M_PF. On the contrary, M_PF
always found more points than M_WSE. The chart in Fig. 2 shows the number
of points in S_WSE and S_PF. The average cardinality of the difference D_PF is
164, which represents 86.41% of the average number of points of M_PF.

Fig. 2. Number of points in S_WSE (in blue) and S_PF (in red) for the 14 cases

In order to determine whether the points missed by M_WSE were interesting points
likely to be chosen by a surgeon, we analyzed the data recorded during the test.
First, we observed that M_MP ranked first in 2/14 cases, method M_WSE
ranked first in 5/14 cases, and method M_PF ranked first in 6/14 cases. In the
remaining case, the entry points chosen using M_WSE and M_PF coincided, so
both methods were equally ranked in first position.
Let us note that the risk of being biased towards a solution picked right
before is significantly reduced by the random color coding, the protocol pipeline
(14 plannings with M_MP, then 14 with M_WSE, then 14 with M_PF, before the
14 rankings), and the absence of any visual clue when ranking.
The fact that M_WSE ranked first 5 times does not mean that M_WSE
outperformed M_PF in those cases, as S_WSE ⊂ S_PF. On the contrary, when M_PF
ranked first, we observed that none of the chosen entry points were also part of
S_WSE. Therefore, we can state that method M_PF is superior to M_WSE in the
sense that it finds preferred solutions that M_WSE does not propose, while the
opposite is not true. Presumably, the best possible solution was not available in
S_WSE, so a sub-optimal alternative was chosen.
In order to see whether, in this kind of case, reasonably close alternatives
would be available in S_WSE, we computed the distances between the entry points
selected in S_PF and the closest point of S_WSE. Results are shown in the left
part of Table 1. It can be observed that in one case out of six (#12), the distance
is higher than 16 mm, which means that no point was proposed in S_WSE within
the region of the selected entry point. This case is also the one with the highest
difference in coverage of optimal points between the two methods. We
chose to display this particular case in Fig. 1 to highlight that such cases may
happen quite often due to the mathematical specificity of the weighted sum
approach. In two other cases, the distance is higher than 4.8 mm, which is still
far from the preferred location. For the other 3 cases, the distance ranges between
1.6 mm and 2.05 mm which may correspond to relatively reasonable alternatives.

Table 1. Distances in mm between entry points selected with methods M_PF and
M_MP and the closest alternative point in other automatic methods

                     M_PF ranked first                 M_MP ranked first
Distance to:     2     6     9     11     12     14      7     13
S_WSE          2.05  4.83  1.67  1.90  16.05   5.92    2.83  1.49
S_PF             -     -     -     -      -      -     1.16  0.87

It is also interesting to observe that for the two cases where M_MP was
ranked first (#7 and #13), the distance between the manually chosen entry
point and the closest point of S_PF (resp. 1.16 and 0.87 mm) was always lower
than the distance to the closest point of S_WSE (resp. 2.83 and 1.49 mm).
The average times taken for each of the three selection methods were
respectively 155 s for M_MP, 38 s for M_WSE, and 42 s for M_PF. Of course,
this measurement is biased because the target selection time is included only in
M_MP, as Steps 2 and 3 consisted only in selecting an entry point. We did not
record separately the time required to select the entry point, because in Step 1
we chose to let the surgeon go back and forth between target and entry point
refinement to achieve a good accuracy. However, even considering that
planning the target point took half of the time in Step 1, Steps 2 and 3 were still
much faster. Besides, the improvement in speed was not at the cost of accuracy,
as an automatically proposed entry point was ranked first in 12/14 cases. This
experiment confirms the overall interest of automatic assistance for preoperative
trajectory planning for Deep Brain Stimulation.
Finally, we can notice that in 5 cases the surgeon did not choose the same
point using M_PF and M_WSE even though the preferred point was available in both.
We hypothesize that the display might have to be improved for M_PF, for instance
by using a color scheme for the objectives.

4 Conclusion

The automatic trajectory planning techniques that have been proposed for DBS
in the literature are based on mono-objective optimization approaches that com-
bine different criteria through weighted sums. Unfortunately, theory shows that
such techniques cannot find concavities in Pareto fronts, meaning that some
Pareto-optimal solutions cannot be reached.
This paper shows that methods using a quantification of trajectory quality
based on Pareto-optimality can find more optimal propositions than the
current state-of-the-art algorithms using weighted sums. The evaluation study we
conducted, involving a blind ranking, highlighted that the extra propositions can
often be chosen as more accurate by a neurosurgeon, and that some of them had
no reasonably close alternative proposed by the weighted sum method.
Finally, the recorded times indicated that the automatic assistance was, in 12
cases out of 14, both faster and more accurate than manual planning, which
further confirms the overall interest of automatic assistance for preoperative
trajectory planning for Deep Brain Stimulation.

Acknowledgments. The authors would like to thank the French Research Agency
(ANR) for funding this work through project ACouStiC (ANR 2010 BLAN 020901).

References
1. Bériault, S., Subaie, F.A., Collins, D.L., Sadikot, A.F., Pike, G.B.: A multi-modal
approach to computer-assisted deep brain stimulation trajectory planning. Int. J.
CARS 7(5), 687–704 (2012)
2. Brunenberg, E.J.L., Vilanova, A., Visser-Vandewalle, V., Temel, Y.,
Ackermans, L., Platel, B., ter Haar Romeny, B.M.: Automatic trajectory planning
for deep brain stimulation: a feasibility study. In: Ayache, N., Ourselin, S.,
Maeder, A. (eds.) MICCAI 2007, Part I. LNCS, vol. 4791, pp. 584–592. Springer,
Heidelberg (2007)
3. Craft, D.: Multi-criteria optimization methods in radiation therapy planning: a
review of technologies and directions (2013). arXiv preprint: arXiv:1305.1546
4. D’Albis, T., Haegelen, C., Essert, C., Fernandez-Vidal, S., Lalys, F., Jannin, P.:
PyDBS: an automated image processing workflow for deep brain stimulation
surgery. Int. J. Comput. Assist. Radiol. Surg. 10, 1–12 (2014)
5. Essert, C., Haegelen, C., Lalys, F., Abadie, A., Jannin, P.: Automatic computa-
tion of electrode trajectories for deep brain stimulation: a hybrid symbolic and
numerical approach. Int. J. Comput. Assist. Radiol. Surg. 7(4), 517–532 (2012)
6. Kim, I., de Weck, O.: Adaptive weighted-sum method for bi-objective optimization:
pareto front generation. Struct. Multidiscip. Optim. 29(2), 149–158 (2004)
7. Liu, Y., Konrad, P., Neimat, J., Tatter, S., Yu, H., Datteri, R., Landman, B.,
Noble, J., Pallavaram, S., Dawant, B., D’Haese, P.F.: Multisurgeon, multisite val-
idation of a trajectory planning algorithm for deep brain stimulation procedures.
IEEE Trans. Biomed. Eng. 61(9), 2479–2487 (2014)
8. Ng, K.W., Tian, G.L., Tang, M.L.: Dirichlet and Related Distributions: Theory,
Methods and Applications, vol. 888. Wiley, Chichester (2011)
9. Schumann, C., Rieder, C., Haase, S., Teichert, K., Süss, P., Isfort, P., Bruners, P.,
Preusser, T.: Interactive multi-criteria planning for radiofrequency ablation. Int.
J. CARS 10, 879–889 (2015)
10. Seitel, A., Engel, M., Sommer, C., Redeleff, B., Essert-Villard, C., Baegert, C.,
Fangerau, M., Fritzsche, K., Yung, K., Meinzer, H.P., Maier-Hein, L.: Computer-
assisted trajectory planning for percutaneous needle insertions. Med. Phys. 38(6),
3246–3260 (2011)
11. Trope, M., Shamir, R.R., Joskowicz, L., Medress, Z., Rosenthal, G., Mayer, A.,
Levin, N., Bick, A., Shoshan, Y.: The role of automatic computer-aided surgical
trajectory planning in improving the expected safety of stereotactic neurosurgery.
Int. J. CARS 10(7), 1127–1140 (2014)
Efficient Anatomy Driven Automated Multiple
Trajectory Planning for Intracranial Electrode
Implantation

Rachel Sparks1(B) , Gergely Zombori1 , Roman Rodionov2 , Maria A. Zuluaga1 ,


Beate Diehl2,3 , Tim Wehner2,3 , Anna Miserocchi2,3 , Andrew W. McEvoy2,3 ,
John S. Duncan2,3 , and Sebastien Ourselin1,4
1 Centre for Medical Image Computing, University College London, London, UK
rachel.sparks@ucl.ac.uk
2 Department of Clinical and Experimental Epilepsy,
University College London Institute of Neurology, London, UK
3 National Hospital for Neurology and Neurosurgery (NHNN), London, UK
4 Dementia Research Centre, Department of Neurodegenerative Disease,
University College London Institute of Neurology, London, UK

Abstract. Epilepsy is curable if the epileptogenic zone (EZ) can be


identified within the brain and resected. Intracranial depth electrodes
help identify the EZ and also map cortical function. In current clinical
practice, 7–12 electrode trajectories are typically needed, and are planned
manually, requiring 2–3 h. Automated methods can reduce planning
time and improve safety by computing suitable trajectories. We present
anatomy driven multiple trajectory planning (ADMTP) to compute safe
trajectories from anatomical regions of interest (ROIs). Trajectories are
computed by (1) identifying targets within deep ROIs, (2) finding trajec-
tories that traverse superficial ROIs and avoid critical structures (blood
vessels, sulci), and (3) determining a feasible configuration of trajecto-
ries. ADMTP was evaluated on 20 patients (186 electrodes). Compared
to manual planning, ADMTP lowered risk in 78 % of trajectories and
increased GM sampling in 56 % of trajectories. ADMTP is computation-
ally efficient, computing between 7–12 trajectories in 61 (15–279) s.

1 Introduction
One-third of individuals with focal epilepsy continue to have seizures despite
optimal medical management. These patients are candidates for resection if the
epileptogenic zone (EZ) can be identified. Intracranial depth electrodes may be
implanted in the brain to record electroencephalographic (EEG) signals indica-
tive of epileptic activity in both deep and superficial regions of interest (ROIs)
within the cortex that have been identified as potential EZ. Implanted electrodes
are also used for stimulation studies to map eloquent areas (e.g. motor or sensory
cortex) and to determine whether a safe resection may be made that removes
the EZ without compromising eloquent cortex.


c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 542–550, 2016.
DOI: 10.1007/978-3-319-46720-7 63
Efficient Anatomy Driven Automated Multiple Trajectory Planning 543

Electrode trajectories must avoid critical structures (blood vessels, sulci) to


minimise the risk of haemorrhage. In current clinical practice each electrode tra-
jectory, defined by a brain target and a skull entry, is planned by visual inspection
of preoperative imaging data. Trajectories are placed to attain deep and superfi-
cial ROIs, avoid critical structures, sample grey matter (GM), and avoid conflict
between electrodes. Maximal sampling of GM is preferred, as epileptic seizures
arise in GM rather than white matter. This is a complex, time-consuming task
typically requiring 2–3 h per case. Trajectory planning algorithms may reduce
planning time, improve safety, and increase GM sampling by optimizing trajec-
tories according to quantitative suitability measures.
Single trajectory planning algorithms require the user to specify a target as a
point [2,6,8] or manual ROI [5]. Trajectories are discarded if surgically infeasible
or an unsafe distance from critical structures. Trajectories are then scored with
a risk metric and the most suitable trajectory is selected. These methods are
limited when applied to epilepsy as (1) the most suitable target point for deep
ROIs may not be obvious, (2) multiple electrodes are implanted and must be
placed to avoid conflicts between electrodes and to maximize GM sampling.
Trajectory planning algorithms designed for intracranial depth electrodes
allow users to specify target and entry ROIs [4,7]. de Momi et al. [4] determined
trajectories by randomly sampling from user selected target and entry ROIs.
Zelmann et al. [7] determined candidate target points within the hippocampus
and amygdala by sampling a Gaussian distribution defined on a ROI distance
map where targets far from the surface were preferentially sampled. For both
methods, the best trajectory in terms of entry angle with the skull, avoidance of
critical structures, and no conflicts between electrodes was computed. Zelmann
et al. [7] also had a constraint to maximize GM sampling.
We present anatomically-driven multiple trajectory planning (ADMTP) to
compute electrode trajectories. ADMTP improves the state-of-the-art by (1)
requiring the user to only input anatomical names for deep and superficial cere-
bral ROIs, (2) selecting target points within deep ROIs according to safety,
measured as the distance from critical structures, and (3) computing the best
trajectories to attain superficial ROIs, avoid critical structures, improve GM
sampling, and avoid conflicts between electrodes.

2 Methodology

ADMTP requires brain parcellation and segmentation of critical structures
(Sect. 2.1). From a list of user-defined ROIs, ADMTP finds an implantation plan
V (N ) = [v1 , . . . , vN ], where vn is the trajectory for the nth electrode. For vn ,
M candidate target points Tn,i : i ∈ {1, . . . , M } are calculated as described in
Sect. 2.2. Next ADMTP determines P entry points En,j : j ∈ {1, . . . , P } for each
Tn,i and then a combination of Tn,i and En,j for all N electrodes are computed
to avoid electrode interference as detailed in Sect. 2.3.

2.1 Regions of Interest and Critical Structure Extraction


Brain parcellation and segmentation of GM, arteries, veins, sulci, and the skull is
performed as follows. Veins and arteries are segmented from CT angiography or
T1 weighted MRI with gadolinium enhancement using multi-scale, multi-modal
tensor voting [9]. The skull is segmented from CT using thresholding. Geodesic
Information Flows (GIF) [3] is used to parcellate the brain on T1 weighted MRI,
giving 208 ROIs. GM and sulci are obtained from the brain parcellation. Figure 1
shows an implantation plan with deep and superficial ROIs.


Fig. 1. Implantation plan with 10 electrodes. (a) Deep ROIs: amygdala (cyan), hip-
pocampus (yellow), anterior insula (brown), transverse temporal gyrus (blue), posterior
(orange) and middle cingulate (purple), posterior (green) and anterior medial orbital
gyrus (mauve). (b) Superficial ROIs: middle frontal (blue), superior frontal (purple),
middle temporal (light pink), and superior temporal (dark pink) lobes and supramar-
ginal (light orange), angular (light green), and precentral (dark green) gyri.

2.2 Candidate Target Point Selection


For a ROI, M candidate target points Tn,i : i ∈ {1, . . . , M } are computed for
the nth electrode. First, a target risk image is defined as Ct = [C, ft (c) : c ∈ C]
where C is the grid of image voxel locations and ft (c) is the target risk score for
the given voxel c. ft (c) is computed as,


$$f_t(c) = \begin{cases} 1, & \text{if } c \notin \Omega_{roi} \\ 1, & \text{if } c \in \Omega_{cri} \\ w_{roi}\left(1 - \dfrac{f_{roi}(c)}{\max(f_{roi})}\right) + w_{cri}\left(1 - \dfrac{f_{cri}(c)}{\max(f_{cri})}\right), & \text{otherwise} \end{cases} \quad (1)$$

Ωroi and Ωcri are the set of voxels in the deep ROI and critical structures,
respectively. froi (c) is the distance between c and the closest surface point on
the deep ROI, calculated using a bounding volume hierarchy (BVH) as in [8].
Similarly, fcri (c) is the distance between c and the closest surface point on all


Fig. 2. (a) Axial, (b) sagittal, and (c) coronal views of Ct (high values of ft (c) are
red, low values are green) for the hippocampus (blue) with blood vessels (cyan) and
trajectories determined by ADMTP (purple, pink).

critical structures. wroi and wcri control the relative importance of placing the
target within the deep ROI and avoiding critical structures, respectively. Figure 2
displays Ct for a hippocampus, red corresponds to high values of ft (c) and green
to low values. Target points are selected from c ∈ Ct by calculating local minima
using the watershed algorithm [1] then sampling the M lowest values of ft (c)
with a distance of at least dtar between every target point Tn,i .
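For illustration, the target risk image of Eq. (1) and the candidate selection can be sketched with distance transforms. This is a simplified sketch, not the authors' implementation: a greedy farthest-apart selection stands in for the watershed local-minima step of [1], and all function names and defaults are assumptions.

```python
import numpy as np
from scipy import ndimage

def target_risk_image(roi_mask, cri_mask, w_roi=0.25, w_cri=0.75):
    """Risk f_t(c): 1 outside the deep ROI or inside a critical structure,
    otherwise a weighted sum of normalised ROI-depth and critical-distance terms."""
    f_roi = ndimage.distance_transform_edt(roi_mask)   # depth inside the deep ROI
    f_cri = ndimage.distance_transform_edt(~cri_mask)  # distance to critical structures
    ft = (w_roi * (1 - f_roi / f_roi.max())
          + w_cri * (1 - f_cri / f_cri.max()))
    ft[~roi_mask] = 1.0   # c outside the deep ROI
    ft[cri_mask] = 1.0    # c inside a critical structure
    return ft

def candidate_targets(ft, roi_mask, M=10, d_tar=3.0):
    """Greedily pick the M lowest-risk ROI voxels at least d_tar apart."""
    chosen = []
    for flat in np.argsort(ft, axis=None):
        c = np.array(np.unravel_index(flat, ft.shape))
        if roi_mask[tuple(c)] and all(np.linalg.norm(c - p) >= d_tar for p in chosen):
            chosen.append(c)
            if len(chosen) == M:
                break
    return chosen
```

The same code applies unchanged to 3-D voxel grids, since the distance transform and the greedy selection are dimension-agnostic.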

2.3 Automated Trajectory Planning

Entry points, defined as En,j : j ∈ {1, . . . , P }, are computed by using all ver-
tices on the skull mesh, roughly 10,000 points sampled every 0.2 mm³. Potential
trajectories, Tn,i En,j : i ∈ {1, . . . , M }, j ∈ {1, . . . , P }, are then removed from
consideration using a modified approach of Zombori et al. [8] as follows:

1. Tn,i En,j is longer than dlen .


2. The angle between Tn,i En,j and the skull normal is greater than dang .
3. Tn,i En,j does not traverse the superficial ROI.
4. Tn,i En,j intersects a critical structure (arteries, veins, or sulci).
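The four exclusion rules can be expressed as a single predicate. In this sketch the two intersection tests (superficial ROI traversal and critical-structure collision) are abstracted as precomputed booleans, and the parameter defaults follow Table 1; the function name and signature are illustrative assumptions.

```python
import numpy as np

def passes_hard_criteria(target, entry, skull_normal,
                         hits_superficial_roi, hits_critical_structure,
                         d_len=80.0, d_ang=25.0):
    """Rules 1-4: discard trajectories that are too long, too oblique,
    miss the superficial ROI, or cross a critical structure."""
    v = np.asarray(entry, float) - np.asarray(target, float)
    length = np.linalg.norm(v)
    if length > d_len:                                   # rule 1: too long
        return False
    cos_a = np.clip(np.dot(v / length, skull_normal), -1.0, 1.0)
    if np.degrees(np.arccos(cos_a)) > d_ang:             # rule 2: too oblique
        return False
    if not hits_superficial_roi:                         # rule 3: misses the ROI
        return False
    if hits_critical_structure:                          # rule 4: hits a vessel/sulcus
        return False
    return True
```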

After excluding trajectories based on these hard criteria there are typically
1,000–5,000 potential trajectories per electrode. The remaining trajectories have
a risk score Rn,i,j and GM ratio Gn,i,j calculated. Rn,i,j , a measure of the dis-
tance to critical structures, is computed as,
$$R_{n,i,j} = \frac{\int_{E_{n,j}}^{T_{n,i}} \left( d_{risk} - (f_{cri}(x) - d_{safe}) \right) dx}{(d_{risk} - d_{safe}) \cdot \mathrm{length}} \quad (2)$$

where trajectories with fcri(x) closer than dsafe have the highest risk (Rn,i,j = 1)
while fcri(x) farther than drisk have no risk (Rn,i,j = 0).
Gn,i,j measures the proportion of electrode contacts in GM. For each elec-
trode, Q contacts with a recording radius of pr are spaced at even intervals pq
along the trajectory. Gn,i,j is calculated as,

$$G_{n,i,j} = \frac{\sum_{q=1}^{Q} \left( H[f_{gm}(p_q - p_r)] + H[f_{gm}(p_q)] + H[f_{gm}(p_q + p_r)] \right)}{3Q} \quad (3)$$
where fgm (·) is the signed distance from the GM surface and H[·] is the Heaviside
function, with values of 1 inside GM and 0 outside.
Each trajectory is assigned a weighted score Sn,i,j computed as Sn,i,j =
10 ∗ Rn,i,j + Gn,i,j , where 10 was determined empirically so low risk is prioritized
over a high GM ratio.
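A discretised version of Eqs. (2) and (3) and the weighted score can be sketched as follows. Sampling the distance field at evenly spaced points stands in for the line integral, and clamping distances to [dsafe, drisk] is an assumption consistent with the stated boundary behaviour; all names are illustrative.

```python
import numpy as np

def risk_score(f_cri_samples, d_safe=3.0, d_risk=10.0):
    """Discretised Eq. (2): distances clamped so that passing closer than
    d_safe scores 1 and farther than d_risk scores 0."""
    d = np.clip(np.asarray(f_cri_samples, float), d_safe, d_risk)
    return float(np.mean((d_risk - d) / (d_risk - d_safe)))

def gm_ratio(f_gm, contact_positions, p_r=1.2):
    """Eq. (3): fraction of contact sample points inside grey matter,
    with H[.] = 1 where the signed distance f_gm is positive (inside GM)."""
    H = lambda x: 1.0 if x > 0 else 0.0
    total = sum(H(f_gm(p - p_r)) + H(f_gm(p)) + H(f_gm(p + p_r))
                for p in contact_positions)
    return total / (3 * len(contact_positions))

def suitability(R, G):
    """S = 10 R + G: low risk dominates a high GM ratio."""
    return 10.0 * R + G
```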
The final implantation plan V (N ) is found by optimizing,
$$S_{total} = \operatorname*{argmin}_{V(N)} \left( \frac{1}{N} \sum_{n=1}^{N} S_{n,i,j} \right) \quad \text{s.t. } D(T_{n,i}E_{n,j}, T_{k,i}E_{k,j}) > d_{traj} : \forall n, \forall k \in \{1, \ldots, N\}, n \neq k. \quad (4)$$
where dtraj specifies the minimum distance between trajectories that do not con-
flict. Due to the constraint dtraj, if the user selects multiple electrodes with the
same ROI, ADMTP will find unique targets. For an implantation plan there are
typically 7–12 electrodes, each with 1,000–5,000 potential trajectories, representing
approximately 1 × 10^21 possible combinations; hence, a depth-first graph
search strategy is used to calculate a feasible implantation plan. If no combination
of trajectories satisfies dtraj, ADMTP returns the plan with the largest
distance between trajectories.
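The depth-first search over candidate trajectories can be sketched as a simple backtracking routine. The score-sorted candidate lists and the distance function D are assumptions standing in for the paper's data structures; the fallback to the largest-separation plan is omitted for brevity.

```python
import numpy as np

def plan_depth_first(candidates, d_traj=10.0, distance=None):
    """Depth-first search for one trajectory per electrode such that every
    pair is more than d_traj apart. candidates[n] is the list of surviving
    trajectories for electrode n, sorted best-score first."""
    if distance is None:
        distance = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    plan = []

    def dfs(n):
        if n == len(candidates):
            return True                       # all electrodes placed
        for traj in candidates[n]:
            if all(distance(traj, p) > d_traj for p in plan):
                plan.append(traj)
                if dfs(n + 1):
                    return True
                plan.pop()                    # backtrack
        return False

    return plan if dfs(0) else None
```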

3 Experimental Design and Results


3.1 Experimental Design
ADMTP was evaluated on retrospective data from 20 patients with refractory
focal epilepsy who underwent intracerebral electrode implantation. Each patient
had 7–12 electrodes, 186 in total. Manual plans were determined by the consensus
of two neurosurgeons. Deep and superficial ROIs were identified by the same
neurosurgeons and ADMTP was evaluated using the parameters in Table 1.

Table 1. The following values were set by a consensus of 3 neurosurgeons: the most
oblique angle drillable, dang, the minimum safe distance from blood vessels, dsafe, the
distance at which there is no risk, drisk , the minimum distance between electrodes,
dtraj . A commonly used electrode configuration determined: electrode length, dlen , the
number of contacts, Q, the interval between contacts, pq , the contact sample radius,
pr . The following values were set empirically: the number of candidate targets, M ,
the minimum distance between candidate targets, dtar , and the relative importance of
sampling the ROI, wroi , and avoiding critical structures, wcri .

Parameter M dtar wroi wcri dlen dang dsafe drisk Q pq pr dtraj
Value 10 3 mm 0.25 0.75 80 mm 25° 3 mm 10 mm 10 6 mm 1.2 mm 10 mm



Fig. 3. Measures of suitability for trajectories determined by manual planning (plotted


on the X axis) versus ADMTP (plotted on the Y axis) are shown for (a) angle with
respect to the skull surface normal, (b) risk score, (c) distance to the nearest critical
structure, and (d) GM ratio. Red points are the center of mass for each measure. For
(a) angle and (b) risk score points below the diagonal correspond to ADMTP giving
the preferred result. For (c) distance to critical structures and (d) GM ratio points
above the diagonal correspond to ADMTP giving the preferred result.

3.2 Trajectory Suitability

Trajectories were assessed by angle with respect to the skull surface normal,
risk score, distance to nearest critical structure, and GM ratio. In Fig. 3 each
point corresponds to one trajectory with the manual plan value plotted on the
X axis and the ADMTP value plotted on the Y axis. The red point represents
the center of mass for each measure. Points below the diagonal have a lower
value for ADMTP compared to manual plans which represent ADMTP giving
the preferred value for angle and risk score; points above the diagonal represent
ADMTP giving the preferred value for critical structures distance and GM ratio.
A two-tailed Student’s t-test evaluated the statistical significance between values
determined by ADMTP and manual plans where the null hypothesis was that
the methods return similar values.
ADMTP found a more feasible entry angle in 96/186 trajectories (p < 0.01)
and increased GM sampling in 104/186 trajectories (p > 0.01). ADMTP found
trajectories that were safer, in terms of reduced risk score and increased distance
to the closest critical structure in 145/186 trajectories (p < 0.01).


Fig. 4. Manual (pink) and ADMTP (blue) trajectories are shown with veins (cyan),
skull (opaque white), and with (a) the cortex (peach) and (b) no cortex.

3.3 Implantation Plan Suitability


The suitability of implantation plans was assessed by (1) the distance between trajecto-
ries and (2) the ratio of unique gyri sampled to total number of electrodes. Ideally
each electrode samples a unique gyrus, corresponding to a ratio of 1. ADMTP has
a median distance between trajectories of 35.5 mm (5.2–124.2 mm) compared
to manual plans with a median of 34.2 mm (1.3–117.5 mm). Manual plans and
ADMTP have 12 trajectory pairs separated by less than dtraj (<10 mm). Man-
ual plans had an average gyri-to-electrode ratio of 0.96 (0.86–1) while ADMTP
has a ratio of 0.92 (0.75–1). ADMTP has larger distances between trajectories
but manual planning still provides preferred coverage in terms of more unique
gyri sampled and consistent coverage of the cortex. Figure 4 displays a man-
ual (pink) and ADMTP (blue) plan. Figure 4(b) shows target point differences
within the cortex, and Fig. 4(a) shows entry point differences. ADMTP often
finds trajectories within the same gyrus as manual planning.

3.4 Computational Efficiency


The computational efficiency of ADMTP was assessed for computing candidate
target points, trajectories, and ADMTP (combination of both steps). Compu-
tations were performed on a computer with an Intel(R) Xeon(R) 12-core CPU
2.10 GHz with 64.0 GB RAM and a single NVIDIA Quadro K4000 4 GB GPU.
Table 2 reports computation time. ADMTP took 61.14 s (15.43–279.20 s) with
most of the computation time spent on trajectory planning.

Table 2. Computation time for each step in ADMTP reported in seconds.

Algorithm Median time [Range] (sec)


Target point selection 6.91 [3.5–19.18]
Trajectory planning 54.07 [7.86–264.01]
ADMTP 61.14 [15.43–279.20]

4 Concluding Remarks
We presented an anatomically driven multiple trajectory planning (ADMTP)
algorithm for calculating intracerebral electrode trajectories from anatomical
regions of interest (ROIs). Compared to manual planning, ADMTP lowered
risk in 78 % of trajectories and increased GM sampling in 56 % of trajectories.
ADMTP was evaluated on quantitative measures of suitability; however, a qualitative
analysis is necessary to assess the clinical suitability of ADMTP. Future
work is required to ensure ADMTP provides trajectories that sample unique
gyri. ADMTP efficiently calculates safe trajectories in under 5 min.

Acknowledgments. This publication represents in part independent research com-


missioned by the Health Innovation Challenge Fund (HICF-T4-275, WT097914,
WT106882), a parallel funding partnership between the Wellcome Trust and the
Department of Health, and the National Institute for Health Research University Col-
lege London Hospitals Biomedical Research Centre (NIHR BRC UCLH/UCL High
Impact Initiative). This work was undertaken at University College London Hospitals,
which received a proportion of funding from the Department of Health’s NIHR BRC
funding scheme. The views expressed in this publication are those of the authors and
not necessarily those of the Wellcome Trust or NIHR.

References
1. Beare, R., Lehmann, G.: Finding regional extrema - methods and performance
(2005). http://hdl.handle.net/1926/153
2. Bériault, S., Subaie, F.A., Collins, D.L., Sadikot, A.F., Pike, G.B.: A multi-modal
approach to computer-assisted deep brain stimulation trajectory planning. IJCARS
7(5), 687–704 (2012)
3. Cardoso, M.J., Modat, M., Wolz, R., Melbourne, A., Cash, D., Rueckert, D.,
Ourselin, S.: Geodesic information flows: Spatially-variant graphs and their appli-
cation to segmentation and fusion. IEEE TMI 34(9), 1976–1988 (2015)
4. De Momi, E., Caborni, C., Cardinale, F., Casaceli, G., Castana, L., Cossu, M.,
Mai, R., Gozzo, F., Francione, S., Tassi, L., Lo Russo, G., Antiga, L., Ferrigno, G.:
Multi-trajectories automatic planner for StereoElectroEncephaloGraphy (SEEG).
IJCARS, 1–11 (2014)
5. Essert, C., Haegelen, C., Lalys, F., Abadie, A., Jannin, P.: Automatic computation
of electrode trajectories for deep brain stimulation: a hybrid symbolic and numerical
approach. IJCARS 7(4), 517–532 (2012)
6. Shamir, R.R., Joskowicz, L., Tamir, I., Dabool, E., Pertman, L., Ben-Ami, A.,
Shoshan, Y.: Reduced risk trajectory planning in image-guided keyhole neuro-
surgery. Med. Phys. 39(5), 2885–2895 (2012)
7. Zelmann, R., Beriault, S., Marinho, M.M., Mok, K., Hall, J.A., Guizard, N.,
Haegelen, C., Olivier, A., Pike, G.B., Collins, D.L.: Improving recorded volume in
mesial temporal lobe by optimizing stereotactic intracranial electrode implantation
planning. IJCARS 10(10), 1599–1615 (2015)

8. Zombori, G., Rodionov, R., Nowell, M., Zuluaga, M.A., Clarkson, M.J., Micallef, C.,
Diehl, B., Wehner, T., Miserocchi, A., McEvoy, A.W., Duncan, J.S., Ourselin, S.: A
computer assisted planning system for the placement of sEEG electrodes in the
treatment of epilepsy. In: Stoyanov, D., Collins, D.L., Sakuma, I., Abolmaesumi, P.,
Jannin, P. (eds.) IPCAI 2014. LNCS, vol. 8498, pp. 118–127. Springer, Heidelberg
(2014)
9. Zuluaga, M.A., Rodionov, R., Nowell, M., Achhala, S., Zombori, G.,
Mendelson, A.F., Cardoso, M.J., Miserocchi, A., McEvoy, A.W., Duncan, J.S.,
Ourselin, S.: Stability, structure and scale: improvements in multi-modal vessel
extraction for SEEG trajectory planning. IJCARS 10(8), 1227–1237 (2015)
Recognizing Surgical Activities with Recurrent
Neural Networks

Robert DiPietro1(B) , Colin Lea1 , Anand Malpani1 , Narges Ahmidi1 ,


S. Swaroop Vedula1 , Gyusung I. Lee2 , Mija R. Lee2 , and Gregory D. Hager1
1 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
rdipietro@gmail.com
2 Department of Surgery, Johns Hopkins University, Baltimore, MD, USA

Abstract. We apply recurrent neural networks to the task of recog-


nizing surgical activities from robot kinematics. Prior work in this area
focuses on recognizing short, low-level activities, or gestures, and has
been based on variants of hidden Markov models and conditional ran-
dom fields. In contrast, we work on recognizing both gestures and longer,
higher-level activities, or maneuvers, and we model the mapping from
kinematics to gestures/maneuvers with recurrent neural networks. To
our knowledge, we are the first to apply recurrent neural networks to
this task. Using a single model and a single set of hyperparameters, we
match state-of-the-art performance for gesture recognition and advance
state-of-the-art performance for maneuver recognition, in terms of both
accuracy and edit distance. Code is available at https://github.com/
rdipietro/miccai-2016-surgical-activity-rec.

1 Introduction

Automated surgical-activity recognition is a valuable precursor for higher-level


goals such as objective surgical-skill assessment and for providing targeted feed-
back to trainees. Previous research on automated surgical-activity recognition
has focused on gestures within a surgical task [9,10,13,15]. Gestures are atomic
segments of activity that typically last for a few seconds, such as grasping a
needle. In contrast, maneuvers are composed of a sequence of gestures and rep-
resent higher-level segments of activity, such as tying a knot. We believe that
targeted feedback for maneuvers is meaningful and consistent with the subjective
feedback that faculty surgeons currently provide to trainees.
Here we focus on jointly segmenting and classifying surgical activities. Other
work in this area has focused on variants of hidden Markov models (HMMs) and
conditional random fields (CRFs) [9,10,13,15]. HMM and CRF based methods
often define unary (label-input) and pairwise (label-label) energy terms, and
during inference find a global label configuration that minimizes overall energy.
Here we put emphasis on the unary terms and note that defining unaries that
are both general and meaningful is a difficult task. For example, of the works
above, the unaries of [10] are perhaps most general: they are computed using


c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 551–558, 2016.
DOI: 10.1007/978-3-319-46720-7 64
552 R. DiPietro et al.

Fig. 1. Example images from the JIGSAWS and MISTIC datasets.

learned convolutional filters. However, we note that even these unaries depend
only on inputs from fairly local neighborhoods in time.
In this work, we use recurrent neural networks (RNNs), and in particular long
short-term memory (LSTM), to map kinematics to labels. Rather than operating
only on local neighborhoods in time, LSTM maintains a memory cell and learns
when to write to memory, when to reset memory, and when to read from memory,
forming unaries that in principle depend on all inputs. In fact, we will rely only
on these unary terms, or in other words assume that labels are independent
given the sequence of kinematics. Despite this, we will see that predicted labels
are smooth over time with no post-processing. Further, using a single model
and a single set of hyperparameters, we match state-of-the-art performance for
gesture recognition and improve over state-of-the-art performance for maneuver
recognition, in terms of both accuracy and edit distance.

2 Methods
The goal of this work is to use nx kinematic signals over time to label every
time step with one of ny surgical activities. An individual sequence of length T
is composed of kinematic inputs {xt }, with each xt ∈ Rnx , and a collection of
one-hot encoded activity labels {yt }, with each yt ∈ {0, 1}ny . (For example, if we
have classes 1, 2, and 3, then the one-hot encoding of label 2 is (0, 1, 0)T .) We aim
to learn a mapping from {xt } to {yt } in a supervised fashion that generalizes to
users that were absent from the training set. In this work, we use recurrent neural
networks to discriminatively model p(yt |x1 , x2 , . . . , xt ) for all t when operating
online and p(yt |x1 , x2 , . . . , xT ) for all t when operating offline.

2.1 Recurrent Neural Networks


Though not yet as ubiquitous as their feedforward counterparts, RNNs have
been applied successfully to many diverse sequence-modeling tasks, from text-
to-handwriting generation [6] to machine translation [14].
A generic RNN is shown in Fig. 2a. An RNN maintains a hidden state h̃t ,
and at each time step t, the nonlinear block uses the previous hidden state h̃t−1
and the current input xt to produce a new hidden state h̃t and an output m̃t .


(a) A recurrent neural network. (b) A vanilla RNN block.

Fig. 2. A recurrent neural network.

If we use the nonlinear block shown in Fig. 2b, we end up with a specific and
simple model: a vanilla RNN with one hidden layer. The recursive equation for
a vanilla RNN, which can be read off precisely from Fig. 2b, is
ht = tanh(Wx xt + Wh ht−1 + b) (1)
Here, Wx , Wh , and b are free parameters that are shared over time. For the
vanilla RNN, we have m̃t = h̃t = ht . The height of ht is a hyperparameter and
is referred to as the number of hidden units.
In the case of multiclass classification, we use a linear layer to transform m̃t to
appropriate size ny and apply a softmax to obtain a vector of class probabilities:
ŷt = softmax(Wym m̃t + by ) (2)
p(ytk = 1 | x1 , x2 , . . . , xt ) = ŷtk (3)

where $\mathrm{softmax}(x) = \exp(x) / \sum_i \exp(x_i)$.
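Equations (1)–(3) can be transcribed directly in NumPy; in this sketch the weight shapes and function names are illustrative, and the weights would normally come from training.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract max for numerical stability
    return e / e.sum()

def vanilla_rnn_forward(xs, Wx, Wh, b, Wym, by):
    """Eq. (1): h_t = tanh(Wx x_t + Wh h_{t-1} + b);
    Eqs. (2)-(3): class probabilities y_hat_t = softmax(Wym h_t + by)."""
    h = np.zeros(Wh.shape[0])
    probs = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)
        probs.append(softmax(Wym @ h + by))
    return probs
```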
RNNs traditionally propagate information forward in time, forming predic-
tions using only past and present inputs. Bidirectional RNNs [12] can improve
performance when operating offline by using future inputs as well. This essen-
tially consists of running one RNN in the forward direction and one RNN in the
backward direction, concatenating hidden states, and computing outputs jointly.
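A bidirectional pass can be sketched by running any step function in both directions and concatenating hidden states; the step functions here are placeholders for trained RNN cells, and the function name is an assumption.

```python
import numpy as np

def bidirectional_states(xs, step_fwd, step_bwd, h0_fwd, h0_bwd):
    """Run one RNN forward and one backward over the sequence, then
    concatenate the two hidden states at each time step; outputs are
    subsequently computed jointly from the concatenated states."""
    hs_fwd, h = [], h0_fwd
    for x in xs:
        h = step_fwd(x, h)
        hs_fwd.append(h)
    hs_bwd, h = [], h0_bwd
    for x in reversed(xs):
        h = step_bwd(x, h)
        hs_bwd.append(h)
    hs_bwd.reverse()                     # align backward states with time
    return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]
```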

2.2 Long Short-Term Memory


Vanilla RNNs are very difficult to train because of what is known as the van-
ishing gradient problem [1]. LSTM [8] was specifically designed to overcome this
problem and has since become one of the most widely-used RNN architectures.
The recursive equations for the LSTM block used in this work are
x̃t = tanh(Wx̃x xt + Wx̃m mt−1 + bx̃ ) (4)
it = σ(Wix xt + Wim mt−1 + Wic ct−1 + bi ) (5)
ft = σ(Wf x xt + Wf m mt−1 + Wf c ct−1 + bf ) (6)
ct = it ⊙ x̃t + ft ⊙ ct−1 (7)
ot = σ(Wox xt + Wom mt−1 + Woc ct + bo ) (8)
mt = ot ⊙ tanh(ct) (9)

where ⊙ represents element-wise multiplication and σ(x) = 1/(1 + exp(−x)). All


matrices W and all biases b are free parameters that are shared across time.
LSTM maintains a memory over time and learns when to write to memory,
when to reset memory, and when to read from memory [5]. In the context of
the generic RNN, m̃t = mt , and h̃t is the concatenation of ct and mt . ct is the
memory cell and is updated at each time step to be a linear combination of x̃t
and ct−1 , with proportions governed by the input gate it and the forget gate ft .
mt , the output, is a nonlinear version of ct that is filtered by the output gate ot .
Note that all elements of the gates it , ft , and ot lie between 0 and 1.
This version of LSTM, unlike the original, has forget gates and peephole con-
nections, which let the input, forget, and output gates depend on the memory
cell. Forget gates are a standard part of modern LSTM [7], and we include peep-
hole connections because they have been found to improve performance when
precise timing is required [4]. All weight matrices are full except the peephole
matrices Wic , Wf c , and Woc , which by convention are restricted to be diagonal.
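A single step of Eqs. (4)–(9) can be sketched as follows. The dictionary-of-weights layout is an illustrative assumption; the diagonal peephole matrices Wic, Wfc, Woc are stored as vectors and applied element-wise.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, m_prev, c_prev, W, b):
    """One LSTM step with forget gates and peephole connections."""
    x_tilde = np.tanh(W['xx'] @ x + W['xm'] @ m_prev + b['x'])               # Eq. (4)
    i = sigmoid(W['ix'] @ x + W['im'] @ m_prev + W['ic'] * c_prev + b['i'])  # Eq. (5)
    f = sigmoid(W['fx'] @ x + W['fm'] @ m_prev + W['fc'] * c_prev + b['f'])  # Eq. (6)
    c = i * x_tilde + f * c_prev                                             # Eq. (7)
    o = sigmoid(W['ox'] @ x + W['om'] @ m_prev + W['oc'] * c + b['o'])       # Eq. (8)
    m = o * np.tanh(c)                                                       # Eq. (9)
    return m, c
```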

Loss. Because we assume every yt is independent of all other yt given x1 , . . . , xt ,


maximizing the log likelihood of our data is equivalent to minimizing the overall
cross entropy between the true labels {yt } and the predicted labels {ŷt }. The
global loss for an individual sequence is therefore
 
$$l_{seq}(\{y_t\}, \{\hat{y}_t\}) = \sum_t l_t(y_t, \hat{y}_t) \qquad \text{with} \qquad l_t(y_t, \hat{y}_t) = -\sum_k y_{tk} \log \hat{y}_{tk}$$
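The sequence loss is a direct transcription of this cross entropy; here ys are one-hot label vectors and y_hats predicted probability vectors, and the function name is an assumption.

```python
import numpy as np

def sequence_loss(ys, y_hats):
    """Cross entropy summed over time: l_t = -sum_k y_tk log y_hat_tk."""
    return -sum(float(y @ np.log(y_hat)) for y, y_hat in zip(ys, y_hats))
```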

Training. All experiments in this paper use standard stochastic gradient


descent to minimize loss. Although the loss is non-convex, it has repeatedly
been observed empirically that ending up in a poor local optimum is unlikely.
Gradients can be obtained efficiently using backpropagation [11]. In practice, one
can build a computation graph out of fundamental operations, each with known
local gradients, and then apply the chain rule to compute overall gradients with
respect to all free parameters. Frameworks such as Theano and Google Tensor-
Flow let the user specify these computation graphs symbolically and alleviate
the user from computing overall gradients manually.
Once gradients are obtained for a particular free parameter p, we take a small
step in the direction opposite to that of the gradient: with η being the learning
rate,
$$p = p - \eta \frac{\partial l_{seq}}{\partial p} \qquad \text{with} \qquad \frac{\partial l_{seq}}{\partial p} = \sum_t \frac{\partial l_t}{\partial p}$$

3 Experiments
3.1 Datasets
The JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) [2] is a
public benchmark surgical activity dataset recorded using the da Vinci. JIG-
SAWS contains synchronized video and kinematic data from a standard 4-throw

suturing task performed by eight subjects with varying skill levels. All subjects
performed about 5 trials, resulting in a total of 39 trials. We use the same
measurements and activity labels as the current state-of-the-art method [10].
Measurements are position (x, y, z), velocity (vx , vy , vz ), and gripper angle (θ)
for each of the left and right slave manipulators, and the surgical activity at each
time step is one of ten different gestures.
The Minimally Invasive Surgical Training and Innovation Center - Science
of Learning (MISTIC-SL) dataset, also recorded using the da Vinci, includes
49 right-handed trials performed by 15 surgeons with varying skill levels. We
follow [3] and use a subset of 39 right-handed trials for all experiments. All trials
consist of a suture throw followed by a surgeon’s knot, eight more suture throws,
and another surgeon’s knot. We used the same kinematic measurements as for
JIGSAWS, and the surgical activity at each time step is one of 4 maneuvers:
suture throw (ST), knot tying (KT), grasp pull run suture (GPRS), and inter-
maneuver segment (IMS). It is not possible for us to release this dataset at this
time, though we hope we will be able to release it in the future.

3.2 Experimental Setup


JIGSAWS has a standardized leave-one-user-out evaluation setup: for the i-th
run, train using all users except i and test on user i. All results in this paper
are averaged over the 8 runs, one per user. We follow the same strategy for
MISTIC-SL, averaging over 11 runs, one for each user that does not appear in
the validation set, as explained below.
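The leave-one-user-out protocol described above can be sketched as follows (user identifiers and trial objects are hypothetical):

```python
def leave_one_user_out(trials_by_user):
    """Yield (held-out user, train trials, test trials) splits: each user
    is held out once while all other users' trials form the training set."""
    users = sorted(trials_by_user)
    for held_out in users:
        train = [t for u in users if u != held_out for t in trials_by_user[u]]
        test = trials_by_user[held_out]
        yield held_out, train, test

# three hypothetical users with their trial ids
splits = list(leave_one_user_out({'A': [1, 2], 'B': [3], 'C': [4, 5]}))
```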
We include accuracy and edit distance (Levenshtein distance) as performance
metrics. Accuracy is the percentage of correctly-classified frames, measuring per-
formance without taking temporal consistency into account. In contrast, edit dis-
tance is the number of operations needed to transform predicted segment-level
labels into ground-truth segment-level labels, here normalized for each dataset
using the maximum number (over all sequences) of segment-level labels.
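As a concrete sketch (not the authors' implementation), the segment-level edit distance can be computed with the standard Levenshtein dynamic program after collapsing frame-level labels into segments; the maneuver labels below are illustrative:

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def segments(frame_labels):
    """Collapse frame-level labels into segment-level labels."""
    return [l for i, l in enumerate(frame_labels)
            if i == 0 or l != frame_labels[i - 1]]

# predicted vs. ground-truth frame labels (illustrative)
pred = ['ST', 'ST', 'KT', 'KT', 'KT', 'ST']
gt   = ['ST', 'ST', 'ST', 'KT', 'KT', 'KT']
dist = edit_distance(segments(pred), segments(gt))  # one spurious segment -> 1
```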

3.3 Hyperparameter Selection and Training


Here we include the most relevant details regarding hyperparameter selection
and training; other details are fully specified in code, available at https://github.
com/rdipietro/miccai-2016-surgical-activity-rec.
For each run we train for a total of approximately 80 epochs, maintaining
a learning rate of 1.0 for the first 40 epochs and then halving the learning rate
every 5 epochs for the rest of training. Using a small batch size is important;
we found that otherwise the lack of stochasticity let us converge to bad local
optima. We use a batch size of 5 sequences for all experiments.
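The learning-rate schedule just described can be expressed as a small helper (a sketch; 0-indexed epochs are an assumption):

```python
def learning_rate(epoch, base_lr=1.0, warm_epochs=40, halve_every=5):
    """Constant base_lr for the first warm_epochs, then halved every
    halve_every epochs for the remainder of training."""
    if epoch < warm_epochs:
        return base_lr
    n_halvings = (epoch - warm_epochs) // halve_every + 1
    return base_lr / (2 ** n_halvings)
```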
Because JIGSAWS has a fixed leave-one-user-out test setup, with all users
appearing in the test set exactly once, it is not possible to use JIGSAWS for
hyperparameter selection without inadvertently training on the test set. We
therefore choose all hyperparameters using a small MISTIC-SL validation set
consisting of 4 users (those with only one trial each), and we use the resulting
hyperparameters for both JIGSAWS experiments and MISTIC-SL experiments.
556 R. DiPietro et al.

We performed a grid search over the number of RNN hidden layers (1 or 2),
the number of hidden units per layer (64, 128, 256, 512, or 1024), and whether
dropout [16] is used (with p = 0.5). One hidden layer of 1024 units, with dropout,
resulted in the lowest edit distance and simultaneously yielded high accuracy.
These hyperparameters were used for all experiments.
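The grid search amounts to a Cartesian product over the three hyperparameters (a sketch with a placeholder scoring function standing in for validation edit distance):

```python
from itertools import product

def grid_search(score_fn):
    """Evaluate every combination of layer count, hidden size, and
    dropout; lower score (e.g., validation edit distance) is better."""
    grid = product([1, 2], [64, 128, 256, 512, 1024], [False, True])
    return min(grid, key=lambda cfg: score_fn(*cfg))

# placeholder scoring: pretend 1 layer of 1024 units with dropout wins
best = grid_search(lambda layers, units, dropout:
                   0.0 if (layers, units, dropout) == (1, 1024, True) else 1.0)
```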
Using a modern GPU, training takes about 1 h for any particular JIGSAWS
run and about 10 h for any particular MISTIC-SL run (MISTIC-SL sequences
are approximately 10x longer than JIGSAWS sequences). We note, however, that
RNN inference is fast, with a running time that scales linearly with sequence
length. At test time, it took the bidirectional RNN approximately 1 s of compute
time per minute of sequence (300 time steps).

3.4 Results

Table 1 shows results for both JIGSAWS (gesture recognition) and MISTIC-
SL (maneuver recognition). A forward LSTM and a bidirectional LSTM are
compared to the Markov/semi-Markov conditional random field (MsM-CRF),
Shared Discriminative Sparse Dictionary Learning (SDSDL), Skip-Chain CRF
(SC-CRF), and Latent-Convolutional Skip-Chain CRF (LC-SC-CRF). We note
that the LC-SC-CRF results were computed by the original author, using the
same MISTIC-SL validation set for hyperparameter selection.
We include standard deviations where possible, though we note that they
largely describe the user-to-user variations in the datasets. (Some users are exceptionally challenging, regardless of the method.) We also carried out statistical-significance testing using a paired-sample permutation test (p-value of 0.05).
This test suggests that the accuracy and edit-distance differences between the
bidirectional LSTM and LC-SC-CRF are insignificant in the case of JIGSAWS
but are significant in the case of MISTIC-SL. We also remark that even the
forward LSTM is competitive here, despite being the only algorithm that can
run online.
Qualitative results are shown in Fig. 3 for the trials with highest, median, and
lowest accuracies for each dataset. We note that the predicted label sequences
are smooth, despite the fact that we assumed that labels are independent given
the sequence of kinematics.

Table 1. Quantitative results and comparisons to prior work.

JIGSAWS MISTIC-SL
Accuracy (%) Edit dist. (%) Accuracy (%) Edit dist. (%)
MsM-CRF [15] 72.6 — — —
SDSDL [13] 78.7 — — —
SC-CRF [9] 80.3 — — —
LC-SC-CRF [10] 82.5 ± 5.4 14.8 ± 9.4 81.7 ± 6.2 29.7 ± 6.8
Forward LSTM 80.5 ± 6.2 19.8 ± 8.7 87.8 ± 3.7 33.9 ± 13.3
Bidir. LSTM 83.3 ± 5.7 14.6 ± 9.6 89.5 ± 4.0 19.5 ± 5.2

Fig. 3. Qualitative results for JIGSAWS (top) and MISTIC-SL (bottom) using a bidi-
rectional LSTM. For each dataset, we show results from the trials with highest accuracy
(top), median accuracy (middle), and lowest accuracy (bottom). In all cases, ground
truth is displayed above predictions.

4 Summary
In this work we performed joint segmentation and classification of surgical activi-
ties from robot kinematics. Unlike prior work, we focused on high-level maneuver
prediction in addition to low-level gesture prediction, and we modeled the map-
ping from inputs to labels with recurrent neural networks instead of with HMM
or CRF based methods. Using a single model and a single set of hyperparameters,
we matched state-of-the-art performance for JIGSAWS (gesture recognition) and
advanced state-of-the-art performance for MISTIC-SL (maneuver recognition),
in the latter case increasing accuracy from 81.7 % to 89.5 % and decreasing nor-
malized edit distance from 29.7 % to 19.5 %.

References
1. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient
descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
2. Gao, Y., Vedula, S.S., Reiley, C.E., Ahmidi, N., Varadarajan, B., Lin, H.C., Tao,
L., Zappella, L., Bejar, B., Yuh, D.D., Chen, C.C.G., Vidal, R., Khudanpur, S.,
Hager, G.D.: Language of surgery: a surgical gesture dataset for human motion
modeling. In: Modeling and Monitoring of Computer Assisted Interventions
(M2CAI) 2014. Springer, Boston, USA (2014)
3. Gao, Y., Vedula, S., Lee, G.I., Lee, M.R., Khudanpur, S., Hager, G.D.: Unsuper-
vised surgical data alignment with application to automatic activity annotation. In:
2016 IEEE International Conference on Robotics and Automation (ICRA) (2016)

4. Gers, F.A., Schmidhuber, J.: Recurrent nets that time and count. In: IEEE Con-
ference on Neural Networks, vol. 3 (2000)
5. Graves, A.: Supervised Sequence Labelling. Springer, Heidelberg (2012)
6. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint
arXiv:1308.0850 (2013)
7. Greff, K., Srivastava, R.K., Koutnı́k, J., Steunebrink, B.R., Schmidhuber, J.:
LSTM: A search space odyssey. arXiv preprint arXiv:1503.04069 (2015)
8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997)
9. Lea, C., Hager, G.D., Vidal, R.: An improved model for segmentation and recog-
nition of fine-grained activities with application to surgical training tasks. In: 2015
IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1123–
1129. IEEE (2015)
10. Lea, C., Vidal, R., Hager, G.D.: Learning convolutional action primitives for fine-
grained action recognition. In: 2016 IEEE International Conference on Robotics
and Automation (ICRA) (2016)
11. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-
propagating errors. Cogn. Model. 5(3), 1 (1988)
12. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans.
Sig. Process. 45(11), 2673–2681 (1997)
13. Sefati, S., Cowan, N.J., Vidal, R.: Learning shared, discriminative dictionaries for
surgical gesture segmentation and classification. In: Modeling and Monitoring of
Computer Assisted Interventions (M2CAI) 2015. Springer, Heidelberg (2015)
14. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural
networks. In: Advances in Neural Information Processing Systems (2014)
15. Tao, L., Zappella, L., Hager, G.D., Vidal, R.: Surgical gesture segmentation and
recognition. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MIC-
CAI 2013, Part III. LNCS, vol. 8151, pp. 339–346. Springer, Heidelberg (2013)
16. Zaremba, W., Sutskever, I., Vinyals, O.: Recurrent neural network regularization.
arXiv preprint arXiv:1409.2329 (2014)
Two-Stage Simulation Method to Improve
Facial Soft Tissue Prediction Accuracy
for Orthognathic Surgery

Daeseung Kim1, Chien-Ming Chang1, Dennis Chun-Yu Ho1, Xiaoyan Zhang1,
Shunyao Shen1, Peng Yuan1, Huaming Mai1, Guangming Zhang2, Xiaobo Zhou2,
Jaime Gateno1,3, Michael A.K. Liebschner4, and James J. Xia1,3(&)
1 Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, TX, USA
JXia@houstonmethodist.org
2 Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, USA
3 Department of Surgery, Weill Medical College, Cornell University, New York, NY, USA
4 Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA

Abstract. It is clinically important to accurately predict facial soft tissue


changes prior to orthognathic surgery. However, the current simulation methods
are problematic, especially in clinically critical regions. We developed a
two-stage finite element method (FEM) simulation model with realistic tissue
sliding effects. In the 1st stage, the facial soft-tissue-change following bone
movement was simulated using FEM with a simple sliding effect. In the 2nd
stage, the tissue sliding effect was improved by reassigning the bone-soft tissue
mapping and boundary condition. Our method has been quantitatively and
qualitatively evaluated using 30 patient datasets. The two-stage FEM simulation
method showed significant accuracy improvement in the whole face and the
critical areas (i.e., lips, nose and chin) in comparison with the traditional FEM
method.

1 Introduction

Facial appearance significantly impacts human social life. Orthognathic surgery is a


bone-only surgical procedure to treat patients with dentofacial deformity, in which the
deformed jaws are cut into pieces and repositioned to a desired position (osteotomy).
Currently, only osteotomies can be accurately planned presurgically. Facial soft tissue
changes, a direct result from osteotomies, cannot be accurately predicted due to the
complex nature of facial anatomy. Traditionally soft tissue simulation is based on
bone-to-soft tissue movement ratios, which have been proven inaccurate. Among the
published reports, finite element method (FEM) [1] is reported to be the most common,
accurate and biomechanically relevant method [1, 2]. Nonetheless, the predicted results
are still less than ideal, especially in nose, lips and chin regions, which are extremely
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 559–567, 2016.
DOI: 10.1007/978-3-319-46720-7_65

important for orthognathic surgery. Therefore, there is an urgent clinical need to develop
a reliable method of accurately predicting facial changes following osteotomies.
Traditional FEM for facial soft tissue simulation assumes that the FEM mesh nodes
move together with the contacting bone surfaces. However, this assumption can lead to
significant errors when a large bone movement and occlusion changes are involved. In
human anatomy, cheek and lip mucosa are not directly attached to the bone and teeth;
they slide over each other. The traditional FEM does not consider this sliding, which
we believe is the main reason for inaccurate prediction in the lips and chin.
Implementing the realistic sliding effect into FEM is technically challenging. It
requires high computational times and efforts because the sliding mechanism in human
mouth is a dynamic interaction between two surfaces. The 2nd challenge is that even if
the sliding movement with force constraint is implemented, the simulation results may
still be inaccurate, because there is no strict nodal displacement boundary condition
applied to the sliding areas. The soft tissues at sliding surfaces follow the buccal surface
profile of the bones and teeth. Thus, it is necessary to consider the displacement
boundary condition for sliding movement. The 3rd challenge is that the mapping
between the bone surface and FEM mesh nodes needs to be reestablished after the bony
segments are moved to a desired planned position. This is because the bone and soft
tissue relationship is not constant before and after the bone movement, e.g. a setback or
advancement surgery may either decrease or increase the soft tissue contacting area to
the bones and teeth. This mismatch may lead to the distortion of the resulting mesh.
The 4th challenge is that occlusal changes, e.g. from preoperative cross-bite to post-
operative Class I (normal) bite, may cause a mesh distortion in the lip region where the
upper and lower teeth meet. Therefore, a simulation method with more advanced
sliding effects is required to increase the prediction accuracy in critical regions such as
the lips and chin.
We solved these technical problems. In this study, we developed a two-stage FEM
simulation method. In the first stage, the facial soft tissue changes following the bony
movements were simulated with an extended sliding boundary condition to overcome
the mesh distortion problem in traditional FEM simulations. The nodal force constraint
was applied to simulate the sliding effect of the mucosa. In the second stage, nodal
displacement boundary conditions were implemented in the sliding areas to accurately
reflect the postoperative bone surface geometry. The corresponding nodal displacement
for each node was recalculated after reassigning the mapping between the mesh and
bone surface in order to achieve a realistic sliding movement. Finally, our simulation
method was evaluated quantitatively and qualitatively using 30 sets of preoperative and
postoperative patient computed tomography (CT) datasets.

2 Two-Stage FEM Simulation Algorithm

Our two-stage approach of simulating facial soft tissue changes following the osteo-
tomies is described below in details. In the 1st stage, a patient-specific FEM model with
homogeneous linear elastic material property is generated using a FEM template model
(Total of 38280 elements and 48593 nodes) [3]. The facial soft tissue changes are

predicted using FEM with the simple sliding effect of the mucosa around the teeth and
partial maxillary and mandibular regions. Only the parallel nodal force is considered on
the corresponding areas. In the 2nd stage, explicit boundary conditions are applied to
improve the tissue sliding effect by exactly reflecting the bone surface geometry, thus
ultimately improving the prediction accuracy.

2.1 The First Stage of FEM Simulation with Simple Sliding Effect
The patient-specific volume mesh is generated from an anatomically detailed FEM
template mesh, which was previously developed from a Visible Female dataset [3].
Both inner and outer surfaces of the template mesh are registered to the patient’s skull
and facial surfaces respectively using anatomical landmark-based thin-plate splines
(TPS) technique. Finally, the total mesh volume is morphed to the patient data by
interpolating the surface registration result using TPS again [3].
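As an illustration of landmark-based TPS morphing (a minimal NumPy sketch using the 3D kernel U(r) = r; illustrative only, not the template-morphing code of [3]):

```python
import numpy as np

def tps_fit(src, dst):
    """Fit a 3D thin-plate spline mapping the source landmarks src onto
    the target landmarks dst. Returns (n+4) x 3 coefficients: n kernel
    weights followed by 4 affine terms."""
    n = len(src)
    K = np.linalg.norm(src[:, None] - src[None, :], axis=2)  # U(r) = r
    P = np.hstack([np.ones((n, 1)), src])                    # affine part
    A = np.zeros((n + 4, n + 4))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.vstack([dst, np.zeros((4, 3))])
    return np.linalg.solve(A, b)

def tps_warp(coeffs, src, pts):
    """Evaluate the fitted spline at arbitrary points pts."""
    n = len(src)
    U = np.linalg.norm(pts[:, None] - src[None, :], axis=2)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return U @ coeffs[:n] + P @ coeffs[n:]
```

By construction the spline interpolates the landmarks exactly, so warping the source landmarks reproduces the targets; the same coefficients then deform the remaining mesh nodes.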
Although there have been studies investigating optimal tissue properties, the effect of using different linear elastic material properties on the simulation results was negligible [4]. Furthermore, for an isotropic material under displacement boundary conditions, shape deformation patterns are independent of Young’s modulus, as long as the loading that causes the deformation is irrelevant to the study. Therefore, in our study, we assign a Young’s modulus of 3000 Pa and a Poisson’s ratio of 0.47 [4].
Surface nodes of the FEM mesh are divided into the boundary nodes and free nodes
(Fig. 1). The displacements of free nodes (GreenBlue in Fig. 1b and c) are determined
by the displacements of boundary nodes using FEM. Boundary nodes are further
divided into static, moving and sliding nodes. The static nodes do not move in the
surgery (red in Fig. 1). Note that the lower posterior regions of the soft tissue mesh
(orange in Fig. 1b) are assigned as free nodes in the first stage. This is important
because together with the ramus sliding boundary condition, it maintains the soft tissue
integrity, flexibility and smoothness in the posterior and inferior mandibular regions
when an excessive mandibular advancement or setback occurs.

Fig. 1. Mesh nodal boundary condition. (a) Mesh inner surface boundary condition (illustrated
on bones for better understanding) for the 1st stage only; (b) Posterior and superior surface
boundary condition for both 1st and 2nd stages; (c) Mesh inner surface boundary condition
(illustrated on bones for better understanding) for the 2nd stage only. Static nodes: red, and
orange (2nd stage only); Moving nodes: Blue; Sliding nodes: pink; Free nodes: GreenBlue, and
orange (1st stage only); Scar tissue: green.

The moving nodes on the mesh are the ones moving in sync with the bones (blue in
Fig. 1a). The corresponding relationships of the vertices of the STL bone segments to
the moving nodes of the mesh are determined by a closest point search algorithm. The
movement vector (magnitude and the direction) of each bone segment is then applied to
the moving nodes as a nodal displacement boundary condition. In addition, the areas
where two bone (proximal and distal) segments collide with each other after the sur-
gical movements are excluded from the moving boundary nodes. These are designated
as free nodes to further solve the mesh distortion at the mandibular inferior border.
Moreover, scar tissue is considered as a moving boundary (green in Fig. 1a). This is
because the soft tissues in these regions are degloved intraoperatively, causing scars
postoperatively, which subsequently affects the facial soft tissue geometry. The scar
tissue is added onto the corresponding moving nodes by shifting them an additional
2 mm in anterior direction as the displacement boundary condition.
In the first stage, the sliding boundary conditions are applied to the sliding nodes
(pink in Fig. 1a) of the mouth, including the cheek, lips, and extended to the mesh inner
surface corresponding to a partial maxilla and mandible (including partial ramus). The
sliding boundary conditions in mucosa area are adopted from [2].
Movement of the free nodes (Fig. 1b) is determined by FEM with the aforemen-
tioned boundary conditions (Fig. 1a and b). An iterative FEM solving algorithm is
developed to calculate the movement of the free nodes and to solve the global FEM
equation: Kd = f, where K is a global stiffness matrix, d is a global nodal displacement, and f is a global nodal force. This equation can be rewritten as:

    [ K11    K12 ] [ d1 ]   [ f1 ]
    [ K12^T  K22 ] [ d2 ] = [ f2 ]                      (1)

where d1 is the displacement of the moving and static nodes, d2 is the displacement of
the free and sliding nodes to be determined. The parameter f1 is the nodal force on the
moving and static nodes, and f2 is the nodal force acting on both free and sliding nodes.
The nodal force of the free nodes is assumed to be zero, and only tangential nodal
forces along the contacting bone surface are considered for the sliding nodes [2].
The final value of d2 is calculated by iteratively updating d2 using Eq. (2) until the
converging condition is satisfied [described later].

    d2^(k+1) = d2^(k) + d2_update^(k),   k = 1, 2, ..., n        (2)

d2_update is calculated as follows. First, f2 is calculated by substituting the current d2 into Eq. (3), which is derived from Eq. (1). At the start of the iteration (k = 1), the initial d2 is randomly assigned and substituted for d2 to solve Eq. (3). f2 is composed of the nodal forces of the sliding nodes (f2_sliding) and the free nodes (f2_free).

    f2 = K12^T d1 + K22 d2                      (3)

Second, f2t is calculated by transforming the nodal force of the sliding nodes within f2 so that only the tangential nodal-force component remains [2]. Now, f2t is composed of the nodal force of the free nodes (f2_free) and only a tangential component of the nodal force of the sliding nodes (f2t_sliding).
In the final step of the iteration, f2_update is acquired to determine the required nodal displacement (d2_update). The nodal force f2_update is the difference between f2t and f2. d2_update is finally calculated as d2_update = K22^{-1} (f2_update + K12^T d1), which is derived from Eq. (1). Then, d2^(k+1) is calculated using Eq. (2). The iteration continues until the maximal absolute value of f2_update converges below 0.01 N (k = n). The final values of d (d1 and d2) represent the displacement of the mesh nodes after applying the bone movements and the simple sliding effect. The algorithm was implemented in MATLAB. The final d in this first-stage simulation is designated as dfirst.
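Under the assumption that the stiffness blocks K12 and K22 and the prescribed displacements d1 are assembled as NumPy arrays, the first-stage iteration can be sketched as follows; project_tangential is a placeholder for the tangential force projection of [2], and this simplified variant re-solves Eq. (1) for the target forces at each step rather than accumulating increments:

```python
import numpy as np

def first_stage_solve(K22, K12, d1, project_tangential, tol=0.01, max_iter=200):
    """Iterate d2 until the force correction f2_update = f2t - f2 converges:
    f2 comes from Eq. (3), f2t keeps only the allowed (tangential or zero)
    force components, then d2 is re-solved so that f2 matches f2t."""
    rng = np.random.default_rng(0)
    d2 = rng.standard_normal(K22.shape[0])           # random initial d2 (k = 1)
    for _ in range(max_iter):
        f2 = K12.T @ d1 + K22 @ d2                   # Eq. (3)
        f2t = project_tangential(f2)                 # constrained nodal forces
        if np.max(np.abs(f2t - f2)) < tol:           # |f2_update| below tolerance
            break
        d2 = np.linalg.solve(K22, f2t - K12.T @ d1)  # enforce f2 = f2t via Eq. (1)
    return d2
```

With project_tangential returning zero (i.e., free nodes only), this reduces to the usual static condensation d2 = K22^{-1}(−K12^T d1).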

2.2 The Second Stage of FEM Simulation with Advanced Sliding Effect
The predicted facial soft tissue changes in the first stage are further refined in the
second stage by adding an advanced sliding effect. This is necessary because the first
stage only accounts for the nodal force constraint, which may result in a mismatch
between the simulated mesh inner surface and the bone surface (Fig. 2).

Fig. 2. Assignment of nodal displacement in the second stage of FEM. (a) Mismatch
between the simulated mesh inner surface and the bone surface. (b) Description of nodal
displacement boundary condition assignment.

Based on real clinical situations, the geometries of the teeth and bone buccal
surface and its contacting surface on the inner side of the soft tissue mesh should
exactly be matched, even though the relationship between the vertices of the bones and
the nodes of the soft tissue mesh is changed after the bony segments are moved.
Therefore, the boundary mapping and condition between the bone surface and soft
tissue mesh nodes need to be reestablished in the sliding areas in order to properly
reflect the above realistic sliding effect. First, the nodes of the inner mesh surface
corresponding to the maxilla and mandible are assigned as the moving nodes in the
second stage (blue in Fig. 1c). The nodal displacements of the moving nodes are
calculated by finding the closest point from each mesh node to the bone surface, instead
of finding them from the bone to the mesh in the first-stage. The assignment is pro-
cessed from superior to inferior direction, ensuring an appropriate boundary condition
implementation without mesh distortion (Fig. 2). This is because, clinically, the postoperative lower teeth are always inside the upper teeth (a normal bite) regardless of the preoperative condition. This procedure prevents the nodes from having the same

nodal displacement being counted twice, thus solving the mismatch problem between
the bone surface and its contacting surface on the inner side of the simulated mesh.
Once computed, the vector between each node and its corresponding closest vertex on
the bone surface is assigned as the nodal displacement for the FEM simulation.
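The closest-point assignment can be sketched with a brute-force nearest-neighbor search (illustrative only; production code would use a spatial data structure such as a k-d tree):

```python
import numpy as np

def closest_point_displacements(mesh_nodes, bone_vertices):
    """For each mesh node, find the nearest bone-surface vertex and return
    the node-to-vertex vector as its displacement boundary condition."""
    # pairwise squared distances: shape (n_nodes, n_vertices)
    diff = mesh_nodes[:, None, :] - bone_vertices[None, :, :]
    idx = np.argmin(np.sum(diff ** 2, axis=2), axis=1)
    return bone_vertices[idx] - mesh_nodes

nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
bone = np.array([[0.1, 0.0, 0.0], [2.0, 0.0, 0.0]])
disp = closest_point_displacements(nodes, bone)  # both nodes map to vertex 0
```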
The free nodes at the inferoposterior surface of the soft tissue mesh in the first-stage
are now assigned as static nodes in this stage (orange in Fig. 1b). The rest of the nodes
are assigned as the free nodes (GreenBlue in Fig. 1b and c). The global stiffness matrix
(K), the nodal displacement (d) and the nodal force (f) are reorganized according to the
new boundary conditions. The 2nd-stage results are calculated by solving Eq. (1).
Based on the assumption that the nodal force of the free nodes, f2, is zero (note that there are no sliding nodes in the second stage), the nodal displacement of the free nodes, d2, can be calculated as d2 = −K22^{-1} K12^T d1 (from Eq. (1)). Then, the final d (d1 and d2) is designated as dsecond. Finally, the overall nodal displacement is calculated by combining the resulting nodal displacements of the first (dfirst) and second (dsecond) FEM simulations.
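With zero free-node forces, the second-stage solve reduces to a single linear system (a sketch; the block matrices are assumed already assembled):

```python
import numpy as np

def second_stage_free_displacements(K22, K12, d1):
    """Second-stage FEM: with zero nodal force on the free nodes, Eq. (1)
    gives K12^T d1 + K22 d2 = 0, hence d2 = -K22^{-1} K12^T d1."""
    return -np.linalg.solve(K22, K12.T @ d1)

d2 = second_stage_free_displacements(np.array([[2.0, 0.0], [0.0, 4.0]]),
                                     np.array([[2.0, 4.0]]),
                                     np.array([1.0]))  # -> [-1., -1.]
```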

3 Quantitative and Qualitative Evaluations and Results

The evaluation was completed by using 30 randomly selected datasets of patients who
had dentofacial deformity and underwent an orthognathic surgery [IRB0413-0045].
Each patient had a complete preoperative and postoperative CT scans.
The soft tissue prediction was completed using 3 methods: (1) the traditional FEM
without considering the slide effect [1]; (2) the FEM with first-stage (simple) sliding
effect by only considering the nodal force constraint; and (3) our novel FEM with
two-stage sliding effects. All FEM meshes were generated by adapting our FEM
template to the patient’s individual 3D model [3]. In order to determine the actual
movement vector of each bony segment, the postoperative patient’s bone and soft
tissue 3D CT models were registered to the preoperative ones at the cranium (surgically
unmoved). The movement vector of each bony segment was calculated by moving the
osteotomized segment from its preoperative original position to the postoperative
position.
Finally, the simulated results were evaluated quantitatively and qualitatively. In the
quantitative evaluation, displacement errors (absolute mean Euclidean distances) were
calculated between the nodes on the simulated facial mesh and their corresponding
points on the postoperative model. The evaluation was completed for the whole face
and 8 sub-regions (Fig. 3). Repeated measures analysis of variance and its post-hoc
tests were used to detect statistically significant differences. In the qualitative
evaluation, two maxillofacial surgeons who are experienced in orthognathic surgery
together evaluated the results based on their clinical judgement and consensus. They
were also blinded from the methods used for the simulation. The predicted results were
compared to the postoperative ones using a binary visual analog scale (Unacceptable:
the predicted result was not clinically realistic; Acceptable: the predicted result was
clinically realistic and very similar to the postoperative outcome). A chi-square test was used to detect statistically significant differences.
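The displacement-error metric is then a mean of per-node Euclidean distances (a sketch; node correspondences are assumed given):

```python
import numpy as np

def displacement_error(simulated_nodes, postop_points):
    """Mean Euclidean distance between simulated mesh nodes and their
    corresponding points on the postoperative model."""
    return float(np.mean(np.linalg.norm(simulated_nodes - postop_points, axis=1)))

err = displacement_error(np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
                         np.array([[0.0, 3.0, 4.0], [1.0, 0.0, 0.0]]))  # -> 2.5
```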

Fig. 3. Sub-regions (automatically divided using anatomical landmarks)

The results of the quantitative evaluation showed that our two-stage sliding effects
FEM method significantly improved the accuracy of the whole face, as well as the
critical areas (i.e., lips, nose and chin) in comparison with the traditional FEM method.
The chin area also showed a trend of improvement (Table 1). Finally, the malar region
showed a significant improvement due to the scar tissue modeling.
The results of the qualitative evaluation showed that 73 % (22/30) predicted results
achieved with 2-stage FEM method were clinically acceptable. The prediction accuracy
of the whole face and the critical regions (e.g., lips and nose) were significantly
improved (Table 1). However, only 43 % (13/30) were acceptable with both traditional
and simple sliding FEMs. This was mainly due to the poor lower lip prediction. Even
though the cheek prediction was significantly improved in the simple sliding FEM,
inaccurately predicted lower lips severely impacted the whole facial appearance.

Table 1. Improvement of the simple and 2-stage sliding over the traditional FEM method (%)
for 30 patients.
Region Quantitative evaluation Qualitative evaluation
Simple sliding Two-stage sliding Simple sliding Two-stage sliding
Entire face 1.9 4.5* 0.0 30.0*
1. Nose 7.2* 8.4* 0.0 0.0
2. Upper lip −1.3 9.2* 13.3 20.0*
3. Lower lip −12.0 10.2 −6.7 23.3*
4. Chin −2.0 3.6 3.3 10.0
5. Right malar 6.1* 6.2* 0.0 0.0
6. Left malar 9.2* 8.8* 0.0 0.0
7. Right cheek 0.1 1.3 23.3* 23.3*
8. Left cheek 3.0 1.4 30.0* 30.0*
* Significant difference compared to the traditional method (P < 0.05).

Figure 4 illustrates the predicted results of a typical patient. Using the traditional
FEM, the upper and lower lip moved together with the underlying bone segments
without considering the sliding movement (1.4 mm of displacement error for the upper
lip; 1.6 mm for the lower), resulting in large displacement errors (clinically unaccept-
able, Fig. 4(a)). The predicted upper lip using the simple sliding FEM was moderately
improved (1.1 mm of error), while the lower lip showed a larger error (3.1 mm). The
upper and lower lips were in a wrong relation (clinically unacceptable, Fig. 4(b)).

Fig. 4. An example of quantitative and qualitative evaluation results. The predicted mesh
(red) is superimposed to the postoperative bone (blue) and soft tissue (grey). (a) Traditional FEM
simulation (1.6 mm of error for the whole face, clinically not acceptable). (b) Simple sliding
FEM simulation (1.6 mm of error, clinically not acceptable). (c) Two-stage FEM simulation
(1.4 mm of error, clinically acceptable).

The mesh inner surface and the bony/teeth geometries, which clinically should match perfectly, were also mismatched. Finally, our two-stage FEM simulation achieved
the best results of accurately predicting clinically important facial features with a correct
lip relation (the upper lip error: 0.9 mm; the lower: 1.3 mm, clinically acceptable,
Fig. 4(c)).

4 Discussion and Future Work

We developed a novel two-stage FEM simulation method to accurately predict facial


soft tissue changes following osteotomies. Our approach was quantitatively and qual-
itatively evaluated using 30 patient datasets. The clinical contribution of this method is
significant. Our approach allows doctors to understand how the bony movements affect
the facial soft tissues changes preoperatively, and subsequently revise the plan as
needed. In addition, it also allows patients to foresee their postoperative facial
appearance prior to the surgery (patient education). The technical contributions include:
(1) Efficient 2-stage sliding effects are implemented into the FEM simulation model to
predict realistic facial soft tissue changes following the osteotomies. (2) The extended
definition of the boundary condition and the ability of changing node types during the
simulation clearly solve the mesh distortion problem, not only in the sliding regions, but
also in the bone collision areas where the proximal and distal segments meet. (3) The
patient-specific soft tissue FEM model can be efficiently generated by deforming our
FEM template, without the need of building FEM model for each patient. It makes the
FEM simulation feasible for clinical use.
There are still some limitations in the current approach. A preoperatively strained lower lip is not considered in the simulation. In surgery, it is automatically corrected to a reposed status by a pure horizontal surgical movement, but the same is not true in the simulation. The 8 clinically unacceptable results using our two-stage FEM method were all due to this reason. We are working on solving this clinically observed phenomenon. In addition, we are also improving the error evaluation method. The quantitative results in this study do not necessarily reflect the qualitative results, as shown in Table 1 and Fig. 4. Nonetheless, our two-stage FEM simulation is the first step towards
achieving a realistic facial soft-tissue-change prediction following osteotomies. In the
near future, it will be fully tested in a larger clinical study.

References
1. Pan, B., et al.: Incremental kernel ridge regression for the prediction of soft tissue
deformations. Med. Image Comput. Comput. Assist. Interv. 15(Pt 1), 99–106 (2012)
2. Kim, H., Jürgens, P., Nolte, L.-P., Reyes, M.: Anatomically-driven soft-tissue simulation
strategy for cranio-maxillofacial surgery using facial muscle template model. In: Jiang, T.,
Navab, N., Pluim, J.P., Viergever, M.A. (eds.) MICCAI 2010, Part I. LNCS, vol. 6361,
pp. 61–68. Springer, Heidelberg (2010)
3. Zhang, X., et al.: An eFace-template method for efficiently generating patient-specific
anatomically-detailed facial soft tissue FE models for craniomaxillofacial surgery simulation.
Ann. Biomed. Eng. 44, 1656–1671 (2016)
4. Mollemans, W., Schutyser, F., Nadjmi, N., Maes, F., Suetens, P.: Parameter optimisation of a
linear tetrahedral mass tensor model for a maxillofacial soft tissue simulator. In: Harders, M.,
Székely, G. (eds.) ISBMS 2006. LNCS, vol. 4072, pp. 159–168. Springer, Heidelberg (2006)
Hand-Held Sound-Speed Imaging Based
on Ultrasound Reflector Delineation

Sergio J. Sanabria(B) and Orcun Goksel

Computer-assisted Applications in Medicine, ETH Zurich, Zurich, Switzerland


ssanabria@ethz.ch

Abstract. A novel hand-held speed-of-sound (SoS) imaging method is


proposed, which requires only minor hardware extensions to conventional
ultrasound (US) B-mode systems. A hand-held reflector is used as a tim-
ing reference for US signals. A robust reflector-detection algorithm, based
on dynamic programming (DP), achieves unambiguous timing even with
10 dB signal-to-noise ratio in real tissues, successfully detecting delays
<100 ns introduced by SoS heterogeneities. An Anisotropically-Weighted
Total-Variation (AWTV) regularization based on L1-norm smoothness
reconstruction is shown to achieve significant improvements in the delin-
eation of focal lesions. The Contrast-to-noise-ratio (CNR) is improved
from 15 dB to 37 dB, and the axial resolution loss from >300 % to <15 %.
Experiments with breast-mimicking phantoms and ex-vivo liver samples
showed, for hard hypoechogenic inclusions not visible in B-mode US, a
high SoS contrast (2.6 %) with respect to cystic inclusions (0.9 %) and the
background SoS noise (0.6 %). We also tested our method on a healthy
volunteer in a preliminary in-vivo test. The proposed technique demon-
strates potential for low-cost and non-ionizing screening, as well as for
diagnostics in daily clinical routine.

1 Introduction

Breast cancer is a high-prevalence disease affecting 1/8 women in the USA. Cur-
rent routine screening consists of X-ray mammography, which, however, shows
low sensitivity to malignant tumors in dense breasts, for which a large number of
false positives leads to an unnecessary number of breast biopsies. Also, the use of
ionizing radiation advises against frequent utilization, for instance, to monitor
the progress of a tumor. Finally, the compression of the breast down to a few
centimeters may cause patient discomfort. For these reasons, the latest recommenda-
tions restrict the general use of X-ray mammography to biennial examinations
in women over 50 years old [13].
Ultrasound (US) is a safe, pain-free, and widely available medical imaging
modality, which can complement routine mammographies. Conventional screen-
ing breast US (B-mode), which measures reflectivity and scattering from tissue
structures, showed significantly higher sensitivity combined with mammography
(97 %) than the latter alone (74 %) [8]. However, B-mode US shows poor speci-
ficity. A novel US modality, Ultrasound Computed Tomography (USCT), aims at

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 568–576, 2016.
DOI: 10.1007/978-3-319-46720-7 66
mapping other tissue parameters, such as the speed-of-sound (SoS), which shows
a high potential for tumor differentiation (e.g., fibroadenoma, carcinoma, cysts)
[2]. However, this method requires dedicated and complex systems consisting of
a large number of transducer elements located around the breast in order to
measure US wave propagation paths along multiple trajectories, from which the
SoS-USCT image is reconstructed [3,5,10,11]. Low-cost extensions of conven-
tional B-mode systems that only require a single multi-element array transducer
are desirable for SoS-USCT for the daily clinical routine. There have been some
early attempts to combine B-mode systems with X-ray mammography, using
the back compression plate as a timing reference. Yet, the reconstruction suffers
from strong limited-angle artifacts, which provide unsatisfactory image quality,
unless detailed prior information of the screened inclusion geometry is available
[6,9].
In this work we propose a novel SoS-USCT method, hand-held sound-speed imaging,
which overcomes the above listed limitations. By transmitting US waves through tissue
between a B-mode transducer and a hand-held reflector, a SoS-USCT image of sufficient
quality for tumor screening is obtained (Fig. 1). A specific reflector design combined
with dedicated image processing provides unambiguous measurement of US
time-of-flight (ToF) between the different transmitter/receiver elements of a transducer,
from which the local tissue SoS is derived as an image. Total-variation regularization
overcomes the previously reported limited-angle artifacts and enables prior-less SoS
imaging and precise delineation of piecewise homogeneous inclusions. The proposed
method requires only a small and localized breast compression, while allowing for
flexible access to arbitrary imaging planes within the breast.

Fig. 1. SoS imaging setup.

2 Methods

A 128-element 5 MHz linear ultrasound array (L14/5-38) was operated in multistatic
mode (Fig. 2a), each element sequentially firing (Tx) and the rest receiving (Rx).
For this purpose, a custom acquisition sequence was implemented on a
research ultrasound machine (SonixTouch, Ultrasonix, Richmond, Canada). In
a first implementation, a conventional ultrasound beamformer is adapted to the
application by beamforming only a single element pair in Tx and Rx at a time,
which requires the acquisition of 128 × 128 RF lines for 40 mm depth in about
8 s. To keep the measurement scene stable during acquisition, a positioning frame
was introduced to keep the orientation of transducer and reflector fixed with
respect to each other (Fig. 5b). For each line, the raw (unmodulated) ultrasound
data (RF lines) are recorded. Computations are then performed in Matlab®.
Fig. 2. Reflector identification for ex-vivo liver test (Fig. 5c). a) Setup details; b) RF
lines acquired with overlapped DP delineation for the case of same Tx and Rx; c) the
measured ToF matrix ti,o; and d) the relative path delays Δti,o after compensating for
geometric effects. The proposed DP method outperforms independent RF line analysis
and adaptive amplitude-tracking [12].

2.1 Reflector Delineation

The reflector consists of a thin Plexiglas stripe (50 mm × 7 mm × 5 mm), which


limits the reflected echoes to the desired imaging plane, and allows for flexible
access to different breast locations. The material choice ensures a coherent wave
reflection along the tested angular range. The flat reflector geometry is simple to
manufacture and easy to identify in US data. Secondary echoes corresponding to
wave reflections between reflector boundaries are well-separated from the main
echo and filtered out (Fig. 2a, b).
In a real tissue scenario, a modulated ultrasound waveform with an oscil-
latory pressure pattern is recorded. The recorded signal shows multiple local
maxima, with varying amplitudes depending on the wave path. Simply picking
the peak response in each RF line yields incorrect ToF values, since different
peaks may be selected for different transmit-receive (Tx-Rx) pairs. An adaptive
amplitude-tracking measurement, which uses the current timing measurement as
prior information for the adjacent Tx-Rx pairs, was shown for non-destructive
testing of heterogeneous materials [12]. However, it requires manual initializa-
tion, which is not feasible for in-vivo scenarios, and fails when, due to wave
interference and scattering effects, the reflected wave-front falls below the system
noise level (fading), as frequently observed in real tissue samples (Fig. 2c, d).
In this work a global optimization is introduced, which simultaneously con-
siders the full Tx-Rx dataset. Based on Dynamic Programming (DP), which has
been applied in US for the segmentation of bones [4] and vessel walls [1], an
algorithm for detecting oscillatory patterns in RF lines is proposed. It consists
of a global cost matrix $C(l, t_l)$, which is cumulatively built along successive RF
lines l (adjacent Tx-Rx pairs) for a list of N timing candidates $t_l = t_l^0, t_l^1, \ldots, t_l^N$,
i.e., a list of possible time samples in the current RF line l, among which the
optimum reflector timing can be found. Also, a memory matrix $M(l, t_l)$ records
discrete timing decisions for each line and candidate. The optimum reflector tim-
ing is then found, which minimizes the cumulative cost, and following M (l, tl )
backwards the optimum reflector delineation T(l) is drawn:

$$\begin{aligned} C(l, t_l) &= \min_{t_{l-1}} \{ C(l-1, t_{l-1}) + f_1(t_l, t_{l-1}) \} + f_0(t_l) \\ M(l, t_l) &= \operatorname{argmin}_{t_{l-1}} \{ C(l-1, t_{l-1}) + f_1(t_l, t_{l-1}) \} \end{aligned} \tag{1}$$

$$T(l) = \begin{cases} \operatorname{argmin}_{t_l} C(l, t_l), & l = L \\ M(l+1, T(l+1)), & l = 1 \ldots L-1 \end{cases}$$

with $f_0$ and $f_1$ non-linear functions that incorporate the ToF for the current $t_l$ and
neighbouring $t_{l-1}$ RF lines. The general formulation of Eq. 1 introduces regularization
into the reflector timing problem, enabling the natural incorporation
of available prior information (oscillatory pattern, smoothness, multiple echoes,
path geometry) into the optimization. Moreover, the delineation does not require
manual initialization and is parallelizable linewise. The currently unoptimized
Matlab code runs on a single core of an Intel Core i7-4770K CPU in <100 s, but
several future speed improvements are envisioned.
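As an illustration of the recursion in Eq. 1, the sketch below implements the cumulative cost matrix, memory matrix, and backtracking in Python (the paper's implementation is in Matlab). The unary cost f0 (e.g., negative echo-envelope amplitude per time candidate) and the quadratic jump penalty used for f1 are our illustrative choices, not the paper's exact cost functions.

```python
import numpy as np

def delineate_reflector(unary, smooth_weight=1.0):
    """Dynamic-programming delineation (sketch of Eq. 1).

    unary : (L, N) array, unary cost f0 of picking time candidate t for RF
            line l (e.g., negative envelope amplitude at that sample).
    Returns T : (L,) index of the optimal timing candidate per RF line.
    """
    n_lines, n_cand = unary.shape
    C = np.zeros((n_lines, n_cand))            # cumulative cost C(l, t)
    M = np.zeros((n_lines, n_cand), dtype=int)  # memory M(l, t): best predecessor
    C[0] = unary[0]
    cand = np.arange(n_cand)
    for l in range(1, n_lines):
        # f1: penalize timing jumps between adjacent Tx-Rx pairs (quadratic here)
        f1 = smooth_weight * (cand[:, None] - cand[None, :]) ** 2  # (prev, cur)
        total = C[l - 1][:, None] + f1
        M[l] = np.argmin(total, axis=0)          # best predecessor per candidate
        C[l] = total[M[l], cand] + unary[l]
    # backtrack from the best final candidate
    T = np.empty(n_lines, dtype=int)
    T[-1] = int(np.argmin(C[-1]))
    for l in range(n_lines - 2, -1, -1):
        T[l] = M[l + 1, T[l + 1]]
    return T
```

On a toy cost map with one low-cost candidate per line, the backtracked path follows that candidate; on real RF data, f1 would additionally encode the expected path-geometry smoothness and multiple-echo structure.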

2.2 Total-Variation Sound-Speed Image Reconstruction


Once the time-delay matrix $t_{i,o}$ for all transmit-receive elements has been
obtained (Fig. 2c), a SoS image is reconstructed. First, the baseline geometrical
delays $\bar{t}_{i,o}$ due to the different path lengths between different transmit-receive
elements are subtracted from $t_{i,o}$ to isolate the relative delays induced by SoS
inhomogeneities $\Delta t_{i,o}$ (Fig. 2d):

$$\Delta t_{i,o} = t_{i,o} - \bar{t}_{i,o}, \qquad \bar{t}_{i,o} = \bar{c}^{-1}\sqrt{4d^2 + \left(p\,(i_o - i_i)\right)^2} \quad \forall i, o \tag{2}$$

where $\bar{c}$ is the average tissue speed of sound (with a nominal value of 1540 m s−1),
d is the distance between transducer and reflector, p is the array pitch (0.3 mm for our
probe), and $i_i$, $i_o$ are the indices of the Tx i and Rx o elements considered (1..128).
Note that d and $\bar{c}$ are estimated with linear regression based on Eq. 2. In practice,
a non-linear fit is performed to estimate both the reflector inclination and in-plane
orientation.
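The geometric compensation of Eq. 2 reduces to a few lines. The sketch below assumes a flat reflector parallel to the array (the non-linear fit for inclination is omitted), and all function names are ours:

```python
import numpy as np

def baseline_tof(d, c_bar, n_elem=128, pitch=0.3e-3):
    """Geometric baseline ToF t_bar[i, o] of Eq. 2 for all Tx/Rx element
    pairs, assuming a flat reflector parallel to the array at distance d.
    c_bar is the average tissue speed of sound (nominally 1540 m/s)."""
    idx = np.arange(n_elem)
    lateral = pitch * (idx[None, :] - idx[:, None])  # p * (i_o - i_i)
    return np.sqrt(4.0 * d**2 + lateral**2) / c_bar

def relative_delays(t_meas, d, c_bar):
    """Delta t[i, o] of Eq. 2: measured ToF minus the geometric baseline,
    isolating the delays induced by SoS inhomogeneities."""
    return t_meas - baseline_tof(d, c_bar, n_elem=t_meas.shape[0])
```

In a homogeneous medium at the average SoS the residual delays vanish by construction; in practice, d and the average SoS would first be fitted to the measured ToF matrix by regression, as described above.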
The next step is the reconstruction of the SoS distribution, which is expressed
in slowness units σ [s/m], with $c(x, y) = \bar{c}\,(1 + \sigma(x, y))^{-1}$. The tissue region
is discretized into cells c traversed by a finite set of ray paths p correspond-
ing to different Tx-Rx pairs (Fig. 3a). With the known differential path lengths
$l_{p,c}$, the path delays $\Delta t_p$ are calculated as a function of the slowness increments
$\sigma_c$, i.e., $\Delta t_p = \sum_{c=1}^{C} l_{p,c}\,\sigma_c$, in matrix form $\Delta t = L\sigma$. Since the reconstruction can
be ill-posed, regularization becomes necessary. A conventional solution in X-ray
Computed Tomography (CT) [7] is Filtered Backprojection (FBP), which aver-
ages the delays of all rays p propagating through cell c. Previous reflector-based
US works [9] have used Algebraic Reconstruction (ART), in which $\Delta t = L\sigma$ is
approximated via singular value decomposition, preserving only the largest sin-
gular values of L (typically 5 % of the total). Both FBP and ART provide a stable
Fig. 3. Formulation of the sound-speed reconstruction problem. (a) Ray tracing dis-
cretization. (b) Smoothness regularization with L2 and L1 norms. While the L2 norm
favors the smooth sound-speed profile, the L1 norm (TV) equally weights smooth and
sharp gradients.

SoS-image reconstruction, which however suffers from strong streak artifacts and
a coarse resolution in the vertical direction (Fig. 4c). The reason is that, similarly
as in limited-angle CBCT, reflector-based SoS-USCT is an ill-posed problem [7],
every cell being traversed by only a limited set of path orientations; i.e., paths
parallel to the reflector are missing. This is a main geometric limitation with
respect to dedicated USCT systems, which incorporate complete angular path
sets [3,5,10].
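For illustration, the forward model Δt = Lσ can be assembled by tracing each straight Tx-reflector-Rx ray through a pixel grid. The dense-sampling discretization below is a simple stand-in for an exact ray-cell intersection; the names and the midpoint-reflection assumption (flat reflector) are ours:

```python
import numpy as np

def path_matrix(n_elem, d, pitch, nx, ny, n_samp=400):
    """Path-length matrix L (rows: Tx-Rx pairs, cols: grid cells).

    Each ray travels from Tx element down to the flat reflector (depth d) at
    the lateral midpoint, then back up to the Rx element; segment lengths are
    accumulated into the nx-by-ny cells the ray traverses (dense sampling
    approximates the exact cell intersections)."""
    width = pitch * (n_elem - 1)
    L = np.zeros((n_elem * n_elem, nx * ny))
    for i in range(n_elem):            # Tx element
        for o in range(n_elem):        # Rx element
            xi, xo = i * pitch, o * pitch
            xm = 0.5 * (xi + xo)       # reflection point on the flat reflector
            # sample both legs of the V-shaped path
            down = np.linspace([xi, 0.0], [xm, d], n_samp)
            up = np.linspace([xm, d], [xo, 0.0], n_samp)
            pts = np.vstack([down, up])
            seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
            mid = 0.5 * (pts[1:] + pts[:-1])
            cx = np.clip((mid[:, 0] / width * nx).astype(int), 0, nx - 1)
            cy = np.clip((mid[:, 1] / d * ny).astype(int), 0, ny - 1)
            np.add.at(L[i * n_elem + o], cy * nx + cx, seg)
    return L
```

Each row of L sums to the total path length √(4d² + (p(i_o − i_i))²), consistent with the baseline delays of Eq. 2.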
To overcome the limited-angle artifacts, we introduce additional regularizing
assumptions on the smoothness of the SoS image:

$$\hat{\sigma}_{TV} = \operatorname{argmin}_{\sigma} \left\{ \|\Delta t - L\sigma\|_2 + \lambda \|D\sigma\|_n \right\} \tag{3}$$

where $\|D\sigma\|_n = \sum_{i,j} |\sigma_{i+1,j} - \sigma_{i,j}|^n + |\sigma_{i,j+1} - \sigma_{i,j}|^n$ minimizes the sum of
horizontal i and vertical j gradients of the reconstructed image, and λ is a
constant. The norm n of the smoothness term critically influences the recon-
struction results (Fig. 3b). For the L2 norm, i.e., $\|x\|_2 = \sqrt{\sum |x|^2}$, a closed linear
solution (Tikhonov regularization) of Eq. 3 is found, but smooth gradients are
favored with respect to sharp gradients. As a result, the reconstruction does not
significantly improve with respect to ART. However, if the L1-norm n = 1 is

Fig. 4. Simulation of sound-speed image reconstruction with (top) single and (bottom)
multiple inclusions: (a) in-silico phantom, (b-c) reconstruction with prior art, and (d)
our TV approach.

used, $\|x\|_1 = \sum |x|$ (Total Variation (TV) regularization), sharp and smooth
gradients are equally weighted, which leads to the reconstruction of a minimum
number of piecewise homogeneous inclusions. This concept has been previously
applied to regularize sparse array apertures in full-angle 3D USCT [11]. We apply
it here for the first time to the limited-angle ultrasound reflection tomography
case. With n = 1, Eq. 3 becomes a convex problem, which is iteratively solved
with off-the-shelf optimization packages.
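The norm behavior sketched in Fig. 3b can be checked numerically: for a sharp and a smooth transition of equal total rise, the L1 (TV) gradient cost is identical, while the L2 cost penalizes the sharp profile more:

```python
import numpy as np

# 1-D slowness profiles from Fig. 3b: same total rise, different sharpness
sharp = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
smooth = np.array([0.0, 0.5, 1.0, 1.0, 0.5, 0.0])

def grad_norm(profile, n):
    """Sum of |finite differences|^n, the 1-D analogue of ||D sigma||_n^n."""
    return np.sum(np.abs(np.diff(profile)) ** n)

# L1 (TV) weights both transitions equally; L2 favors the smooth one.
print(grad_norm(sharp, 1), grad_norm(smooth, 1))  # -> 2.0 2.0
print(grad_norm(sharp, 2), grad_norm(smooth, 2))  # -> 2.0 1.0
```

This is why a least-squares (Tikhonov) smoothness term blurs inclusion boundaries, while the L1 term tolerates the sharp edges of piecewise homogeneous inclusions.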
The resulting SoS images (Fig. 4c) successfully filter out limited-angle arti-
facts and delineate closed inclusion geometries. However, they still show reduced
axial resolution, due to the extremely reduced path orientation set (according
to Fig. 3a, for a SoS image aspect ratio of 1:1, the largest available ray angle is
25°). In order to compensate for this resolution loss we introduce Anisotropically-
Weighted Total Variation (AWTV), which balances horizontal and vertical gra-
dients with a constant κ according to the available ray information in each
direction:

$$\hat{\sigma}_{\mathrm{AWTV}} = \operatorname{argmin}_{\sigma} \left\{ \|\Delta t - L\sigma\|_2 + \lambda \sum_{i,j} \kappa\,|\sigma_{i+1,j} - \sigma_{i,j}| + (1-\kappa)\,|\sigma_{i,j+1} - \sigma_{i,j}| \right\} \tag{4}$$

With a reconstructed pixel size equal to the array pitch (p = 0.3 mm), an optimum
reconstruction performance was achieved with λ = 0.0008 and κ = 0.9.
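As a rough sketch of how the convex problem in Eq. 4 can be minimized without a dedicated package, the code below runs plain gradient descent on a smoothed absolute value (and a squared data term); the step size, iteration count, and smoothing constant are our illustrative choices, not the paper's solver settings.

```python
import numpy as np

def awtv_reconstruct(L, dt, shape, lam=8e-4, kappa=0.9,
                     n_iter=2000, step=1e-3, eps=1e-8):
    """Gradient-descent sketch of the AWTV problem (Eq. 4).

    Minimizes ||dt - L sigma||^2 + lam * anisotropically weighted absolute
    gradients, with |x| smoothed as sqrt(x^2 + eps) so that plain gradient
    descent applies (the paper uses off-the-shelf convex solvers)."""
    ny, nx = shape
    sigma = np.zeros(ny * nx)
    for _ in range(n_iter):
        r = L @ sigma - dt
        g = 2.0 * (L.T @ r)                    # data-term gradient
        s = sigma.reshape(ny, nx)
        gx = np.diff(s, axis=1)                # horizontal gradients
        gy = np.diff(s, axis=0)                # vertical gradients
        tv = np.zeros_like(s)
        wx = kappa * gx / np.sqrt(gx**2 + eps)
        wy = (1.0 - kappa) * gy / np.sqrt(gy**2 + eps)
        tv[:, 1:] += wx                        # d/d sigma_{i+1} of |sigma_{i+1}-sigma_i|
        tv[:, :-1] -= wx
        tv[1:, :] += wy
        tv[:-1, :] -= wy
        sigma -= step * (g + lam * tv.ravel())
    return sigma.reshape(ny, nx)
```

With L = I (every pixel directly observed) the reconstruction converges to the observed slowness map up to a small TV-induced bias; in the paper, L is the limited-angle ray-path matrix and the anisotropy κ = 0.9 compensates for the missing vertical gradient information.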

2.3 Tissue Phantoms, Ex-vivo and In-vivo Tests

A tissue-mimicking phantom was manufactured from gelatin (9 g/100 mL water),


mixed with flour to simulate typical ultrasound reflectivity patterns (speckle).
Hard inclusions simulating tumors (Fig. 5a) were introduced by using a higher
amount of gelatin (13 g/100 mL water) in well-defined phantom regions. In order
to make these inclusions invisible to conventional B-mode US, the same amount
of scattering was used in the tissue background and hard inclusions, so that both
exhibited the same echogenicity. To test the applicability of the method to breast
mammography, a breast elastography phantom (Model 059, CIRS Inc., Norfolk,
VA, USA) was tested. The phantom is fabricated with a tissue-mimicking
material (Zerdine™) and shows a realistic breast geometry, incorporating both skin
layers and glandular tissue, together with cystic (water) and dense lesions (with
embedded microcalcifications) (Fig. 5b). Ex-vivo tests were performed in bovine
liver samples. Hard inclusions were simulated by ablating small pieces of liver
(submerged in 250 mL water for 6 min at 700 W microwave). These were after-
wards inserted in the liver (Fig. 5c). Finally, a preliminary in-vivo test was carried
out with a healthy volunteer with benign cysts. While the subject sat in tripod
position (Fig. 5d), the sonographer placed the US probe on the region-of-interest.
Then, the subject held the positioning frame closed with both hands, while the
reflector USCT data was acquired. For all tests, B-mode US images were also
generated from the multi-static datasets for comparison with the SoS images.
Fig. 5. Hand-held sound-speed mammography of gelatin phantom (a), breast-mimicking
phantom (b), ex-vivo liver sample with hard inclusion (c), and in-vivo data for a benign
cystic mass (d). Left: conventional B-mode; right: hand-held sound-speed (relative
slowness) images.

3 Results and Discussion

The proposed DP method clearly outperforms independent RF line analysis


and adaptive amplitude-tracking, enabling the acquisition of a continuous ToF
matrix for real tissues (Fig. 2c), in which small timing variations (<100 ns)
caused by SoS inhomogeneities are successfully observed (Fig. 2d). Signal fading
(Fig. 2b) was typically observed around the inclusion boundaries, where strong
wave refraction occurs due to quasi-parallel incident ray paths. DP automatically
filters out fading positions from the reconstruction. Calibration experiments
in gradually heated water provided quantitative SoS values with a sensitivity of
<0.005 m s−1. The observed timing error of std = 15 ns results in a noise floor
of 0.8 µs m−1, corresponding to a <0.1 % sound-speed contrast.
The proposed AWTV SoS reconstruction achieves significant improvements
in the delineation of inclusion geometry (Fig. 4). Often-problematic vertical elon-
gation of inclusions is strongly reduced (14 %) compared to ART (>300 %) and
TV (95 %), which enables a quantitative reconstruction of original SoS values
(SoS error <0.3 %). Streak artifacts, which are typical in ART (CNR = 15 dB),
are not visible in AWTV (CNR = 37 dB). Moreover, our novel approach success-
fully reconstructs multiple inclusions with different SoS values and geometries
(Fig. 4). Not only are the inclusion positions correctly identified, but their SoS
values and diameters are also satisfactorily estimated.
An excellent performance is observed in both phantom and ex-vivo tests. For
the gelatin phantom, the hard inclusions were manufactured with a small SoS
contrast (−3.5 µs m−1 , 0.5 % SoS increase), but nonetheless were successfully
resolved (Fig. 5a). In the more heterogeneous breast phantom (Fig. 5b) both hard
inclusions (−17 µs m−1 , 2.6 % SoS increase) and cysts (−6 µs m−1 , 0.9 % SoS
increase) show a higher contrast and are well-separated from the background
noise, which is around 0.6 %. These values are more representative of real breast
tumors, as reported by [2]. The background noise is related to reconstruction
artifacts (e.g., the gradient information is missing at image boundaries), and to
a minor extent, to refraction effects not accounted for in the ray tracing model.
The hard inclusion in the ex-vivo liver samples was invisible in the B-mode, but
clearly delineated in the SoS image, with contrast similar to the breast phantom;
see Fig. 5c. Despite movement artifacts, lower US signal-to-noise ratio (<10 dB),
and imperfect coupling between reflector and breast tissue, the preliminary in-
vivo test demonstrates a successful identification of cystic inclusion, with an
expected lower SoS contrast (−8 µs m−1 ) than the ex-vivo hard inclusions.

4 Conclusions and Outlook


A novel hand-held sound-speed imaging modality has been proposed with min-
imum hardware modifications to conventional B-mode ultrasound systems. An
accurate geometric delineation of hypoechogenic inclusions was achieved with a
high SoS-contrast for hard inclusions in both breast elastography phantoms and
ex-vivo liver samples. SoS values are known as potential quantitative imaging
biomarkers for breast mass differentiation [2]. In our preliminary in-vivo test, even
cystic inclusions, which are known to be of low SoS contrast, were successfully
identified, indicating the future potential for detecting higher contrast cancer-
ous tumors. The proposed method is radiation-free, painless, and can potentially
complement routine screening for breast cancer. Prospective applications can be
for other organs that allow reflector placement such as the testicles, limbs, skin,
the prostate, and with catheters; or during open-surgery, e.g., for liver.

Acknowledgment. This work was funded by the Swiss National Science Foundation.
References
1. Crimi, A., Makhinya, M., Baumann, U., Thalhammer, C., et al.: Automatic mea-
surement of venous pressure using B-mode ultrasound. IEEE TMI 63, 288–299
(2016)
2. Duric, N., Littrup, P., Li, C., Roy, O., Schmidt, S., et al.: Breast imaging with
softVue: initial clinical examination. In: SPIE Medical Imaging, pp. 90400V (2014)
3. Duric, N., Littrup, P., Poulo, L., Babkin, A., Pevzner, R., Holsapple, E.: Detec-
tion of breast cancer with ultrasound tomography: first results with the computer
ultrasound risk evaluation (cure) prototype. Med. Phys. 34, 773–785 (2007)
4. Foroughi, P., Boctor, E., et al.: Ultrasound bone segmentation using dynamic
programming. In: IEEE Ultras Symposium, New York, NY, USA, pp. 2523–2526
(2007)
5. Gemmeke, H., Ruiter, N.V.: 3D ultrasound computer tomography for medical
imaging. Nucl. Instrum. Methods Phys. Res. A 580, 1057–1065 (2007)
6. Huang, S.W., Pai-Chi, L.: Ultrasonic computed tomography reconstruction of the
attenuation coefficient using a linear array. IEEE TUFFC 52, 2011–2022 (2005)
7. Kak, A.C., Slaney, M.: Principles of computerized tomographic imaging. IEEE
Press, New York (1988)
8. Kolb, T.M., Lichy, J., Newhouse, J.H.: Comparison of the performance of screening
mammography, physical examination, and breast us and evaluation of factors that
influence them: an analysis of 27,825 patient evaluation. Radiology 225, 165–175
(2002)
9. Krueger, M., Burow, V., et al.: Limited-angle us transmission tomography of the
compressed female breast. In: IEEE Ultrasonics Symposium, Miyagi, Japan, pp.
1345–1348 (1998)
10. Nebeker, J., Nelson, T.R.: Imaging of sound speed using reflection ultrasound
tomography. J. Ultrasound Med. 31, 1389–1404 (2012)
11. Radovan, J., Peterlik, I., et al.: Sound-speed image reconstruction in sparse-
aperture 3-D us transmission tomography. IEEE Trans. Ultrason. Ferroelectr.,
Freq. Control 59, 254–264 (2012)
12. Sanabria, S.J., Hilbers, U., et al.: Modeling and prediction of density distribu-
tion and microstructure in particleboards from acoustic properties by correlation
of non-contact high-resolution pulsed air-coupled ultrasound and X-ray images.
Ultrasonics 53, 157–170 (2013)
13. Siu, A., U.S. Preventive Services Task Force: Screening for breast cancer: U.S.
preventive services task force recommendation. Ann. Int. Med. 164(4), 279–296
(2016). doi:10.7326/M15-2886
Ultrasound Tomosynthesis: A New Paradigm
for Quantitative Imaging of the Prostate

Fereshteh Aalamifar1,2(✉), Reza Seifabadi2, Marcelino Bernardo2,
Ayele H. Negussie2, Baris Turkbey2, Maria Merino2, Peter Pinto2,
Arman Rahmim1, Bradford J. Wood2, and Emad M. Boctor1
1 Johns Hopkins University, Baltimore, MD, USA
fereshteh@jhu.edu
2 National Institutes of Health, Bethesda, MD, USA

Abstract. Biopsy under B-mode transrectal ultrasound (TRUS) is the gold


standard for prostate cancer diagnosis. However, B-mode US shows only the
boundary of the prostate, therefore biopsy is performed in a blind fashion,
resulting in many false negatives. Although MRI or TRUS-MRI fusion is more
sensitive and specific, it may not be readily available across a broad population,
and may be cost prohibitive. In this paper, a limited-angle transmission US
methodology is proposed, here called US tomosynthesis (USTS), for prostate
imaging. This enables quantitative imaging of the prostate, such as generation of
a speed of sound (SOS) map, which theoretically may improve detection,
localization, or characterization of cancerous prostate tissue. Prostate USTS can
be enabled by adding an abdominal probe aligned with the transrectal probe by
utilizing a robotic arm. In this paper, we elaborate the proposed methodology,
then develop a setup and a technique to enable ex vivo USTS imaging of the human
prostate immediately after prostatectomy. Custom hardware and software were
developed and implemented. Mock ex vivo prostate and lesions were made by
filling a mold cavity with water and adding a plastisol lesion. The times of flight
were picked using a proposed center-of-mass method and corrected manually.
The SOS map with a difference expectation-maximization reconstruction per-
formed most accurately, with 2.69 %, 0.23 %, 0.06 % bias in estimating the
SOS of plastisol, water, and mold respectively. Although USTS methodology
requires further ex vivo validation, USTS has the potential to open up a new
window in quantitative low-cost US imaging of the prostate which may meet a
public health need.

Keywords: Transmission ultrasound · Ultrasound tomography · Ultrasound
tomosynthesis · Robotic ultrasound · Ex vivo · Prostate

1 Introduction

Prostate cancer is the most common male cancer in the United States with an estimated
220,000 new cases and 28,000 deaths in 2015 [1]. A key to survival and to avoid
over-treatment is early detection, and accurate characterization [2]. Systematic sextant
biopsies under TRUS guidance have been the gold standard technique since the 1980s
[3]. TRUS is real-time, relatively low cost, and shows the prostate capsule and

© Springer International Publishing AG 2016


S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 577–584, 2016.
DOI: 10.1007/978-3-319-46720-7_67
boundaries. However, it suffers from poor spatial resolution, subjectivity, and low
sensitivity for cancer detection (40–60 % [4]).
MRI is the superior imaging modality for visualizing prostate gland, nerve bundles,
and clinically-relevant cancer. However, real-time MRI is challenging, requires spe-
cialized costly equipment, and in-gantry prostate biopsy is time and resource intensive
and impractical to apply across a broad population. Fusion of TRUS and multi-parametric
MRI takes advantage of the strengths of both imaging modalities. In fusion-guided
biopsy, targeting information is solely dependent on MR images [4]. Even though
US-MRI fusion guided biopsy has shown to be highly sensitive to detect higher-grade
cancer, it still suffers from high false positives for lower-grade cancers resulting in
unnecessary biopsies [4]. Also, MRI is expensive and less available to the broad
population.
Some US-based technologies have recently been proposed to address this clinical
need in addition to MRI-US fusion, including elastography [5], Doppler, and US tissue
characterization [6]. Although several studies reported significant improvement in
prostate cancer identification with quasi-static elastography, there are still some limi-
tations in reproducibility, subjectivity, and the inability of this method to differentiate
cancer from chronic prostatitis [5]. Time series analysis [6] is an interesting new
machine learning technique to perform the tissue characterization and has recently
shown promising results for marking cancerous areas of prostate using the US RF
image [6]. This machine learning method is still based on a post-processing of
reflection data.
Transmission ultrasound imaging works based on transmission of US signals. The
received signal can be used to reconstruct the volume’s acoustic properties such as
SOS, attenuation, and spectral scattering maps. This information may theoretically be
able to differentiate among different tissue types, including cancerous tissues. Trans-
mission ultrasound can be performed in two ways: full angle and limited angle, just as
with tomography. Full angle is a described technique called ultrasound computed
tomography which has been extensively used for breast imaging [7] and recently,
imaging of extremities [8]. Limited angle, however, is a relatively more recent tech-
nique, which has also been used in breast imaging [9]. Similar to X-ray tomosynthesis
(which is a limited angle version of CT [computed tomography]), here, we refer to the
limited angle US tomography as “US tomosynthesis” (USTS).
The current transmission US systems (e.g. [7, 8]) only work with breast, since it is
an easy target to scan in a small water tank. Leveraging these recent findings, we
propose a method to further this technology to prostate cancer diagnosis and screening
utilizing robotic technology. In this concept (Fig. 1), a bi-plane or tri-plane TRUS
probe resides in the rectum, and a linear/curved array transducer resides on the
abdomen/pelvis, using the bladder as an acoustic window to the prostate. The
abdominal probe can be fixed and aligned with the TRUS probe using a co-robotic
setup similar to the one proposed in [10]. Ex vivo modeling is requisite prior to
evaluating prostate USTS in vivo. The first step is to evaluate the feasibility of USTS in
prostate cancer detection in a controlled benchtop environment, to understand the
potential of this technology. Therefore, this paper focuses on modeling and developing
a system and method for ex vivo prostate USTS. The system was evaluated with a mock
prostate and lesions with comparable SOS.
Fig. 1. (a) Prostate ultrasound tomosynthesis concept: a bi/tri-plane TRUS probe is placed into
the rectum and a linear/curved array transducer is placed on the patient's abdomen; (b) sagittal
USTS imaging; (c) axial USTS imaging; (d) USTS image reconstruction concept; a larger angle θ
leads to more tomographic data, and less artifact in the reconstructed image.

2 Method

2.1 System Components


For the ex vivo study, we propose the setup depicted in Fig. 2. In this setup, two
128-element, 6 cm linear US probes are precisely aligned. It should be noted that in the
in-vivo study a transrectal probe with similar geometry to the abdominal probe used
in this setup can be utilized (e.g., Ultrasonix BPL9-5/55). The distance between the
probes is adjustable to provide a sufficient acoustic window with contact against the
scanned volume. The ex vivo prostate can be put inside a patient-specific, 3D-printed,
US-friendly mold, whose 3D geometry is based upon 3D MRI data. This mold is
placed inside a container which has transparent rubber windows. A small amount of
acoustic-transmitting liquid is injected to fill the gaps between the prostate, mold, and
container. The container is placed between the aligned probes and its height is adjusted
in order to scan different slices.

US machine

(c)
DAQ (b)

Receiver
Transmitter
(a)
(d)

Fig. 2. (a) USTS ex vivo setup; the patient-specific molds align MRI, histology, and USTS
slices. (b) The 3D-printed mold for MRI-histology comparison. (c) The 3D-printed box used to
create the US-friendly mold. (d) The US-friendly mold and the 3D-printed prostate with its
seminal vesicles.
580 F. Aalamifar et al.

MRI and histology are the ground-truths for comparison of the USTS image
reconstructed using this setup. The technique and test bed model were designed to
enable direct correlation with MRI and matching slices of correlative histology whole
mounts. This technique was performed in two steps: first, a patient-specific mold (as
shown in Fig. 2b) with grooves to guide the histology knife is 3D printed. The grooves are 3
or 6 mm apart and result in histology slices custom designed to correspond
to MR image slices [11]. Second, the same mold is created using a US-friendly material
with marks indicating the corresponding slices to be scanned using the US probes.
The US-friendly mold was made from acrylamide gel with 1523 m/s SOS and other
relevant tissue-mimicking properties, as reported previously [12]. The phantom does not
decay, is rigid enough to hold the prostate, and has an SOS suitable for
reconstruction (described later). To make the mold, the prostate
(with seminal vesicles) is segmented from the clinical MR image. This prostate volume
is saved as a stereolithography (.stl) file and printed using a 3D printer (uPrint,
Stratasys). The 3D-printed prostate is positioned inside a box at a position and
orientation similar to those in the MRI 3D-printed mold, using guide rods as shown in Fig. 2c.
Then, the acrylamide solution was poured into the box. After solidification, the rods
were removed and the mold was cut open to remove the 3D-printed prostate. Figure 2d
shows the US friendly mold. The prostate can be put inside the mold cavity and the
mold’s halves are adhered together. Then, the mold is inserted into a container. The
container holds the mold in place during the USTS scan, can be filled with liquid to fill
the acoustically insulating air gaps between mold and prostate, and provides windows
made of mylar sheet to provide US transparency. The container is marked with lines
that determine the slices that correspond to the MRI slices.
We used two linear-array Ultrasonix probes. The transmitting probe was connected to
an Ultrasonix SonixTouch scanner (Vancouver, BC). As shown in Fig. 2a, the receiving
probe was connected to an Ultrasonix data acquisition (DAQ) device, which can receive
the US waveforms of 128 channels in parallel at a sampling frequency of 40 MHz.

2.2 Data Processing


In this study we were interested in SOS in each pixel of the image (as shown in
Fig. 1d). To reconstruct an USTS image, i.e. to calculate the SOS in each pixel of the
image, two pieces of information are required: the accurate distances between each
transmit-receive pair, and the measured time of flight (TOF) between them.
The US data collected contains 128 waveforms per transmitter, each corresponding
to one receiver; one image (slice) is calculated from 128 transmissions. Hence, in
order to compute the SOS, the TOF must be picked in all 128 × 128 (= 16,384)
waveforms. A MATLAB interface was implemented to pick the TOFs semi-
automatically. The initial locations of the TOFs were estimated using a
center-of-mass method as:
t_{cm} = \frac{\int_t t\, s^2(t)\, dt}{\int_t s^2(t)\, dt} \qquad (1)

where s(t) is the intensity of the received signal at time t. s(t) is set to zero outside
[tbg−w, tbg + w], where tbg is the estimated background TOF and w is half of a
window length chosen to reduce the effect of noise and refractions. As shown in Fig. 3, some
of the waveforms contained electrical noise or refracted, delayed signals, which could
result in mis-selection of the TOF. The MATLAB interface allows the user to correct
these mis-selections.
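The windowed center-of-mass estimate of Eq. (1) can be sketched as follows. This is an illustration, not the authors' MATLAB code; the sampling rate and window parameters in the example are hypothetical.

```python
import numpy as np

def com_tof(s, fs, t_bg, w):
    """Center-of-mass TOF estimate (Eq. 1): s is the received waveform
    sampled at fs; samples outside [t_bg - w, t_bg + w] are zeroed to
    suppress electrical noise and refracted, delayed arrivals."""
    t = np.arange(len(s)) / fs
    gated = np.where((t >= t_bg - w) & (t <= t_bg + w), s, 0.0)
    e = gated ** 2                      # signal energy s^2(t)
    return np.sum(t * e) / np.sum(e)    # ratio of integrals in Eq. (1)
```

For a pulse that is symmetric within the gate, the estimate coincides with the pulse center; as noted above, manual correction is still needed when noise or refractions dominate the window.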

Fig. 3. A sample of raw data.

The grid area between transmit-receive pairs (Fig. 1d) was formulated as a system
matrix, and the following equation was used to calculate the image based on a
straight-ray US propagation approximation [10]:

S\,(X - X_{bg}) = T - T_{bg} \qquad (2)

where S is the system matrix, X is a vectorized concatenation of the image matrix,
and T is a vector containing the TOF measurements. X_{bg} and T_{bg} are the known
background SOS values and the measured background TOFs, respectively. The
background image was collected by scanning a slice that only contains the acrylamide
gel. This information helps in compensating for probes’ misalignment and measure-
ment bias [10]. Equation (2) is under-determined and would be computationally
expensive to solve analytically. Hence, we tested two iterative methods of conjugate
gradient and expectation maximization. Since the background information is incorporated,
these methods are referred to as Diff-CG and Diff-EM in the results. More
details regarding the Diff-EM method can be found in [13].
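As a sketch of the Diff-CG idea, Eq. (2) can be solved in the least-squares sense with a conjugate-gradient iteration on the normal equations (CGNR). The matrix shapes, solver settings, and the use of slowness (1/SOS) as the image variable so that TOFs are linear in it are assumptions for illustration; the paper's actual system matrix and solver are not reproduced here.

```python
import numpy as np

def diff_cg(S, t_meas, t_bg, x_bg, n_iter=50, tol=1e-24):
    """Solve S (X - X_bg) = T - T_bg (Eq. 2) for X via CGNR.
    S: ray-path length matrix (rays x pixels); t_meas, t_bg: measured
    and background TOF vectors; x_bg: background image (slowness per
    pixel). Returns the updated image X = X_bg + dX."""
    b = t_meas - t_bg               # difference data (right-hand side)
    dx = np.zeros(S.shape[1])       # image update X - X_bg
    r = b - S @ dx                  # data-space residual
    z = S.T @ r                     # gradient of 0.5 * ||S dx - b||^2
    p = z.copy()
    for _ in range(n_iter):
        w = S @ p
        if w @ w < tol:             # search direction vanished: converged
            break
        alpha = (z @ z) / (w @ w)
        dx += alpha * p
        r -= alpha * w
        z_new = S.T @ r
        if z_new @ z_new < tol:
            break
        p = z_new + (z_new @ z_new) / (z @ z) * p
        z = z_new
    return x_bg + dx
```

Because only the difference T − T_bg enters the right-hand side, fixed per-channel delays and small probe misalignments cancel, which is the compensation effect described above.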

2.3 Simulation Setup


A simulation study was carried out to answer the following questions for the designed
setup: (1) how well can the image be reconstructed given the limited-angle data and the
small difference between cancerous and non-cancerous SOS [14]; and (2) what is the best
SOS for the background material (i.e., which material should be chosen for the
US-friendly mold)? It should be noted that this study simulates the mathematics of
the reconstruction problem without considering US wave propagation properties. The
probes were simulated at the top and bottom of the image with a 5 cm axial distance.
The ground-truth image was created with arbitrary features based on the
typical size of the prostate and lesions. As shown in Fig. 4a, the prostate was modeled
as a 3 × 4 cm ellipse containing two lesions of 5 and 10 mm diameter.
The speed of sound was set to 1614 m/s for the prostate region, and to 1572 m/s
and 1596 m/s for the two lesions, based on [14].
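The ground-truth map described above can be generated as follows. The SOS values and object sizes come from the text; the grid size, pixel pitch, and lesion positions are illustrative assumptions.

```python
import numpy as np

def prostate_phantom(nx=128, ny=128, dx=0.5e-3):
    """Ground-truth SOS map sketch: a 4 x 3 cm ellipse at 1614 m/s with
    a 10 mm lesion at 1572 m/s and a 5 mm lesion at 1596 m/s, in a
    1523 m/s background (values taken from the simulation setup)."""
    x = (np.arange(nx) - nx / 2) * dx
    y = (np.arange(ny) - ny / 2) * dx
    X, Y = np.meshgrid(x, y, indexing='ij')
    sos = np.full((nx, ny), 1523.0)                    # background gel
    inside = (X / 0.02) ** 2 + (Y / 0.015) ** 2 <= 1.0 # prostate ellipse
    sos[inside] = 1614.0
    # (center_x, center_y, radius, SOS); lesion centers are assumed
    for cx, cy, r, v in [(-0.008, 0.0, 0.005, 1572.0),
                         (0.008, 0.0, 0.0025, 1596.0)]:
        sos[(X - cx) ** 2 + (Y - cy) ** 2 <= r ** 2] = v
    return sos
```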

2.4 Phantom Study


Using the setup shown in Fig. 2, an image of the mock prostate was acquired. The US
machine was set to B-mode acquisition with a depth of 7 cm, a US frequency
of 5 MHz, and an aperture size of 1 (equivalent to 2 elements on the Ultrasonix machine)
to enable sequential transmission of US waves. The mock ex vivo study was performed by
filling the mold cavity with water (1480 m/s) and attaching to the inner part of the mold
a lesion made of plastisol (~1375 m/s). The container with the mold inside was placed
between the aligned probes, and their axial distance was adjusted to 50 mm using a
caliper. US gel was applied to the probes' tips to enhance coupling, and the center
slice was chosen for USTS data collection.

3 Results
3.1 Simulation Results
A simulation phantom was created in MATLAB based on the prostate description
given above. As shown in Fig. 4, a background speed of 1523 m/s (similar to the
speed of sound in general tissue) produced a superior image to those with 1375 m/s and
1010 m/s, corresponding to plastisol and silicone Ecoflex, respectively. Artifacts in the
images are due to the limited-angle data, but the lesions are still distinguishable from
the prostate.

Fig. 4. Simulation results: (a) ground-truth simulation phantoms; (b–c) reconstructed SOS maps
using (b) Diff-CG and (c) Diff-EM methods. Rows correspond to background SOS values of
1010 m/s, 1375 m/s, and 1523 m/s; color scales are in m/s.

3.2 Phantom Results


Figure 5a shows a B-mode image of a slice of the mock prostate. The TOF was picked
once automatically and once with manual correction; then the image was reconstructed.
Images using Diff-CG and automatically picked TOFs contained a high amount
of artifacts, which increased with the number of iterations. The Diff-EM method with
manually corrected TOFs produced the most accurate results, with the least amount of
artifacts, and is shown in Fig. 5c. The theoretical SOSs are around 1523, 1480, and
1375 m/s for the mold, water, and plastisol, respectively; as shown in Fig. 5c, the
values estimated at one pixel in each of these areas are 1523, 1476, and 1415 m/s.
Table 1 shows the bias and noise over a 5 × 5 group of pixels around each of these pixels
for the different methods. The Diff-EM method seems less robust to TOF errors, while
Diff-CG was less robust to other experimental noise and errors.

Table 1. Bias and noise in the reconstructed images using the two methods at different
iterations.

             Diff-CG                        Diff-EM
             Auto TOF pick  Corrected TOF   Auto TOF pick  Corrected TOF
Iteration    20     50      20     50       20     50      20     50
%Bias_p      2.89   3.68    3.44   4.1      1.77   16.5    2.93   2.69
%Bias_w      0.86   1.30    0.38   0.95     0.79   3.47    0.29   0.23
%Bias_b      0.80   1.52    0.79   1.42     0.06   0.20    0.11   0.06
Noise        15.7   30.5    14.10  27.05    1.06   4.57    1.20   1.14

p: plastisol; w: water; b: background. Noise was calculated as the standard
deviation of background pixels.

(Panels: Diff-EM reconstructions after 20 and 50 iterations; data cursors mark pixels in the
plastisol, mold, and water regions; color scale in m/s.)

Fig. 5. (a) B-mode image; (b–c) reconstructed images using the Diff-EM method with
(b) automatically picked TOFs (more iterations cause more artifacts) and (c) manually
corrected TOFs.

4 Conclusions

In this study, we proposed and modeled a new paradigm for quantitative imaging of
the prostate, which we call ultrasound tomosynthesis. Prostate cancer screening, biopsy, focal
image-guided therapies, and brachytherapy are examples of clinical applications
that could potentially integrate this technology. In this study, a setup and a technique

were developed to evaluate the feasibility of prostate USTS on ex vivo prostates taken from
prostatectomy patients. Simulation and phantom studies were performed to evaluate the
feasibility of this setup. The proposed setup could be used for patient-specific USTS
study of ex vivo tissues. The SOS map reconstructed from a mock ex vivo prostate with
relevant acoustic properties showed promise. The immediate next step is an ex vivo
study. Since the SOS contrast among different prostate tissues may be small, the
attenuation map and more advanced reconstruction techniques, including regularization
[15], will be investigated. There is a critical public health need for improved
methodologies of prostate tissue characterization and prostate cancer detection that are
cost-effective, broadly accessible, and easy to use.

Acknowledgement. This work was supported by the NIH intramural research funding and
Johns Hopkins internal funds.

References
1. Siegel, R., et al.: Cancer statistics. Cancer J. Clin. 64, 9–29 (2014)
2. Labrie, F., et al.: Screening decreases prostate cancer death: first analysis of the 1988 Quebec
prospective randomized controlled trial. Prostate 38, 83–91 (1999)
3. Durkan, G.C., et al.: Improving prostate cancer detection with an extended-core transrectal
ultrasonography-guided prostate biopsy protocol. BJU Int. 89(1), 33–39 (2002)
4. Imani, F., et al.: Augmenting MRI transrectal ultrasound guided prostate biopsy with
temporal ultrasound data: a clinical feasibility study. Int. J. Comput. Assist. Radiol. Surg. 10,
727–735 (2015)
5. Correas, J.M., et al.: Ultrasound elastography of the prostate: state of the art. Diagn. Interv.
Imaging 94(5), 551–560 (2013)
6. Imani, F., et al.: Computer-aided prostate cancer detection using ultrasound RF time series:
in vivo feasibility study. IEEE Trans. Med. Imaging 34(11), 2248–2257 (2015)
7. Duric, N., et al: Whole breast tissue characterization with ultrasound tomography. In: SPIE
Medical Imaging (2015)
8. Fincke, J.R., et al: Towards ultrasound travel time tomography for quantifying human limb
geometry and material properties. In: SPIE Medical Imaging, San Diego, CA (2016)
9. Huang, L., et al.: Breast ultrasound tomography with two parallel transducer arrays:
preliminary clinical results. In: SPIE Medical Imaging, p. 941916 (2015)
10. Aalamifar, F., et al: Co-robotic ultrasound tomography: dual arm setup and error analysis. In:
SPIE Medical Imaging (2015)
11. Turkbey, B., et al.: Multiparametric 3T prostate magnetic resonance imaging to detect
cancer: histopathological correlation using prostatectomy specimens processed in cus-
tomized magnetic resonance imaging based molds. J. Urol. 186(5), 1818–1824 (2011)
12. Negussie, A.H., et al.: Thermochromic tissue-mimicking phantom for optimisation of
thermal tumour ablation. Int. J. Hyperth. 32(3), 239–243 (2016)
13. Aalamifar, F., et al: Image reconstruction for robot assisted ultrasound tomography. In: SPIE
Medical Imaging (2016)
14. Tanoue, H., et al: Ultrasonic tissue characterization of prostate biopsy tissues by ultrasound
speed microscope. In: Engineering in Medicine and Biology Society (2011)
15. Huthwaite, P., et al.: A new regularization technique for limited-view sound-speed imaging.
IEEE Trans. Ultrason. Ferroelectr. Freq. Control 60(3), 603–613 (2013)
Photoacoustic Imaging Paradigm Shift:
Towards Using Vendor-Independent
Ultrasound Scanners

Haichong K. Zhang, Xiaoyu Guo, Behnoosh Tavakoli, and Emad M. Boctor
The Johns Hopkins University, Baltimore, MD 21218, USA
{hzhang61,xguo9}@jhu.edu, behnoosh.tavakoli@gmail.com, eboctor1@jhmi.edu

Abstract. Photoacoustic (PA) imaging requires channel data acquisition synchronized
with a laser firing system. Unfortunately, the access to these channel
data is only available on specialized research systems, and most clinical ultra-
sound scanners do not offer an interface to obtain this data. To broaden the
impact of clinical PA imaging, we propose a vendor-independent PA imaging
system utilizing ultrasound post-beamformed radio frequency (RF) data, which
is readily accessible in some clinical scanners. In this paper, two PA beam-
forming algorithms that use the post-beamformed RF data as the input are
introduced: inverse beamforming, and synthetic aperture (SA) based
re-beamforming. Inverse beamforming recovers the channel data by taking into
account the ultrasound beamforming delay function. The recovered channel data
can then be used to reconstruct a PA image. SA based re-beamforming algo-
rithm regards the defocused RF data as a set of pre-beamformed RF data
received by virtual elements; an adaptive synthetic aperture beamforming
algorithm is applied to refocus it. We demonstrated the concepts in simulation,
and experimentally validated their applicability on a clinical ultrasound scanner
using a pseudo-PA point source and in vivo data. Results indicate the full width
at the half maximum (FWHM) of the point target using the proposed inverse
beamforming and SA re-beamforming were 1.33 mm, and 1.08 mm, respec-
tively. This is comparable to conventional delay-and-sum PA beamforming, for
which the measured FWHM was 1.49 mm.

1 Introduction

Photoacoustic (PA) imaging is an emerging imaging modality that visualizes optical
absorption properties with acoustic penetration depth. PA imaging has a wide range of
applications from small animals to human patients [1–3]. However, there are several
factors that prevent PA imaging from being more widely used in clinical research and
applications. The first limitation is the laser. Most of the laser systems used for PA
imaging have high power but a low pulse repetition frequency (PRF). Those laser
systems are expensive, bulky, and non-portable. Thus, a portable low cost laser system
with sufficient power output such as a pulsed laser diode (PLD) is desired for easier
access to PA data acquisition [4]. The second limitation is in PA signal reception.
© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 585–592, 2016.
DOI: 10.1007/978-3-319-46720-7_68

PA signals contain broad-band spectral information, while conventional ultrasound
(US) has a frequency window for reception [5]. Although a narrow-band probe cannot
utilize the full potential of PA imaging, it still can receive signals, and is sometimes
useful for collecting signals from deeper regions. The previous two limitations have
been addressed or studied in past research. The third limitation is the necessity of
using channel data, which has not been well studied. PA reconstruction requires a delay
function calculated based on the time-of-flight (TOF) from the PA source to the
receiving probe element [6, 7], while US beamforming takes into account the round trip
initiated from the transmitting and receiving probe element.
Thus, the reconstructed PA image with US beamformer would be defocused due to
the incorrect delay function (Fig. 1a). Real-time channel data acquisition systems are
only accessible from limited research platforms. Most of them are not FDA approved,
which hinders the development of PA imaging in the clinical setting. Therefore, there is
a strong demand to implement PA imaging on more widely used clinical machines.
This third limitation motivated our research, whose goal is to investigate a
vendor-independent PA imaging system.


Fig. 1. Conventional PA imaging system (a) and proposed PA imaging system using clinical US
scanners (b). Channel data is necessary for PA beamforming because US beamformed PA data is
defocused with the incorrect delay function. The proposed two approaches could overcome the
problem.

To use clinical US systems for PA image formation, Harrison and Zemp [8]
proposed to change the speed of sound parameter. However, the access to the speed of
sound parameter is uncommon, and the changeable range of this parameter is bounded.
Zhang et al. [9] proposed to use US post-beamformed RF data with a fixed focal point.
Our paper considers more general US beamformed data applied with delay-and-sum
dynamic receive focusing. Two PA beamforming algorithms are introduced: inverse
beamforming, and synthetic aperture (SA) based re-beamforming (Fig. 1). US beam-
forming is a sequential process scanning line by line. Using those sequentially
beamformed data as input, inverse beamforming recovers channel data by taking into

account the delay function used to construct US post-beamformed RF data. The
recovered channel data can then be used to reconstruct a PA image. SA based
re-beamforming algorithm regards the defocused RF data as a set of pre-beamformed
RF data from virtual elements; an adaptive synthetic aperture beamforming algorithm is
applied on the RF data to refocus it.
In this paper, we first introduce the theory behind the proposed PA reconstruction
method. Afterwards, we present the evaluation of our method through simulation and
experiments that validate its feasibility for practical implementation.

2 Methods

2.1 Approach I: Inverse Beamforming


The idea of inverse beamforming is based on three hypotheses: (1) localizing an US
point source does not require measuring its whole wavefront; (2) according to the
Huygens-Fresnel principle, a non-point source can be considered as a cloud of multiple
point sub-sources; (3) given the distribution, intensity, and phase of the sub-sources,
the pre-beamforming data can be derived using the previous two hypotheses.
According to Huygens-Fresnel principle, given any wavefront, each point on this
wavefront is a sub-signal source. Thus, it is possible to reverse the beam propagation
process, and reconstruct a map of the original signal source from beamformed RF data.
In this specific case, each pixel on this image is considered as a sub-signal source. The
value of the pixel represents the signal amplitude. Once the signal source map is
derived, we can “fire” an US pulse from each pixel. By summing up all the time
reversal wavefronts, and correcting the known distortion caused by the incorrect
beamforming, the original channel data can be derived.
Figure 2 shows the three steps of the inverse beamforming method. The first step is
to reconstruct the signal source map. Suppose at the beginning of the image frame t0, a
laser pulse is sent to the field of view (FOV) and stimulates PA waves. The PA wave
amplitude distribution at t0 is I(x, y) under the continuous geometry (x, y). The US
system receives the signal, performs conventional pulse-echo beamforming, and
outputs an incorrectly constructed image A(xm, yn) under the discrete geometry (xm, yn).
If the FOV is quantized as an M by N grid, each cell can be viewed as a PA point
sub-source, and the distribution I(xm, yn) is the signal source map. The value of each
cell I(xm, yn) on the map indicates the sub-source intensity at that particular position.
For each sub-source (xm, yn), the intensity I(xm, yn) is derived by integrating along the
curve C1:
y = \frac{1}{2}\sqrt{y_n^2 + (x - x_m)^2}, \qquad (1)

I(x_m, y_n) = \oint_{C_1} A(x, y)\cdot S(x, y), \qquad (2)

where S(x, y) is an amplitude correction factor to correct the wave intensity change
caused by the distance. The signal source map can be formed by repeating the inte-
gration for all pixels. The second step is to mimic the PA data acquisition, and find the
signal value of each sampling point on a pre-beamformed image. As shown in Fig. 2, at
t0, the signal source map in FOV is I(xm, yn), the PA waves from each sub-source
propagate through the media, and reach the US probe array at the top of the image. At a
given time t1, a particular array element at xm receives signal from a circle with a radius
y, where y = c(t1 − t0). For each pixel of the recovered channel data geometry P(xm,
yn), we integrate along the circle C2:
P(x_m, y_n) = \oint_{C_2} I(x, y)\cdot \frac{1}{y^2}, \qquad (3)

where P(xm, yn) is the pixel amplitude received by the xm sample at time t1. The last
step is to repeat step two for all pre-beamforming image sampling points, so a
pre-beamformed image is reconstructed.
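A discrete sketch of step two (Eq. 3): each sub-source of the map I is spread onto the receive elements at the sample index matching its one-way travel time, with the 1/y² amplitude weighting. The grid spacing, element positions, speed of sound, and sampling rate here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def source_map_to_channels(I, xs, ys, xe, fs, c):
    """Synthesize pre-beamformed channel data P[element, sample] from a
    source intensity map I[ix, iy] on grid (xs, ys), for elements at
    lateral positions xe (coordinates in meters, probe at y = 0)."""
    dx_max = max(xs.max(), xe.max()) - min(xs.min(), xe.min())
    n_samp = int(np.ceil(np.hypot(dx_max, ys.max()) / c * fs)) + 2
    P = np.zeros((len(xe), n_samp))
    for ix, x in enumerate(xs):
        for iy, y in enumerate(ys):
            if I[ix, iy] == 0.0:
                continue
            d = np.hypot(xe - x, y)               # one-way distances
            k = np.round(d / c * fs).astype(int)  # arrival sample index
            P[np.arange(len(xe)), k] += I[ix, iy] / d**2  # Eq. (3) weight
    return P
```

Each element thus accumulates, sample by sample, the contributions lying on the circle of radius y = c·(t1 − t0), which is the discrete counterpart of the integral along C2.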


Fig. 2. Illustration of inverse beamforming processes.

2.2 Approach II: Synthetic Aperture Based Re-Beamforming


The difference between US beamforming and PA beamforming is the TOF and
accompanied delay function. US beamforming takes into account the TOF of the round
trip of acoustic signals transmitted and received by the US probe elements (that is sent
to and reflected from targets), while PA beamforming only counts a one way trip from
the PA source to the US probe. Therefore, the PA signals under US beamforming is
defocused due to an incorrect delay function (Fig. 3).
The delay function in dynamically focused US beamforming accounts for the round
trip between the transmitter and the reflecting point; thus, the focal point at each
depth corresponds to half the distance used in PA beamforming. It is therefore possible
to model a virtual point source whose depth varies dynamically in the axial dimension,
at half the depth of the true focal point.
The TOF from the virtual element to the receiving elements can be formulated as

t = \frac{|r|}{c}, \qquad (4)

where

r = \sqrt{\left(\frac{y_n}{2}\right)^2 + x_m^2}. \qquad (5)
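The delay of Eqs. (4)-(5) is straightforward to compute per receive element. A minimal sketch follows; the default speed of sound is an assumption:

```python
import numpy as np

def virtual_source_tof(y_n, x_m, c=1540.0):
    """One-way TOF (Eq. 4) from the virtual source, located at half the
    depth y_n, to a receive element at lateral offset x_m (Eq. 5)."""
    r = np.hypot(y_n / 2.0, x_m)
    return r / c
```

These delays then drive a standard synthetic-aperture delay-and-sum refocusing in which each US-beamformed line is treated as the channel of a virtual element.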

Fig. 3. Illustration of synthetic aperture based re-beamforming processes.

2.3 Simulation and Experiment Setup


For the simulation, five point targets were placed at the depth of 10 mm to 50 mm with
10 mm intervals. A 0.3 mm pitch linear array transducer with 128 elements was
designed to receive the PA signals. Delay-and-sum with dynamic receive focusing and
an aperture size of 4.8 mm was used to beamform the simulated channel data assuming
US delays.
The experiment was performed with a clinical US machine (Sonix Touch, Ultra-
sonix), which was used to display and save the received data. A 1 mm piezoelectric
element was used to imitate a PA point source target. The element was attached to the
tip of a needle and wired to an electric board controlling the voltage and transmission.
The acoustic signal transmission is triggered by the line trigger sent from the clinical
US machine. The US post-beamformed RF data with dynamic receive focusing was
then saved. To validate the channel data recovery through inverse beamforming, the
raw channel data was collected using a data acquisition device (DAQ). For the in vivo
mouse experiment, a tumor mimicking prostate cancer, targeted with ICG, was imaged
using a 2 MHz center-frequency convex probe.

3 Results
3.1 Simulation Analysis
The simulation results are shown in Fig. 4. The US beamformed RF data was defo-
cused due to an incorrect delay function (Fig. 4b). The reconstructed PA images are
shown in Figs. 4c–e. The two proposed approaches were compared to the ground-truth
conventional PA beamforming using channel data. The measured full width at half
maximum (FWHM) is shown in Table 1. The reconstructed point size was comparable
to that of a point reconstructed using a 9.6 mm aperture with conventional PA
beamforming.


Fig. 4. Simulation results. (a) Channel data. (b) US post-beamformed RF data. (c) Recon-
structed PA image from channel data with an aperture size of 9.6 mm. (d) Reconstructed PA
image through inverse beamforming. (e) Reconstructed PA image through SA re-beamforming.

Table 1. FWHM of the simulated point targets for the corresponding beamforming methods.

FWHM (mm)     Conventional using   Inverse        SA
              channel data         beamforming    re-beamforming
10 mm depth   0.60                 0.62           0.63
20 mm depth   1.02                 1.06           0.99
30 mm depth   1.53                 1.39           1.43
40 mm depth   1.94                 1.76           1.91
50 mm depth   2.45                 2.11           2.42
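FWHM values like those in Table 1 can be measured from a lateral intensity profile through the reconstructed point. The following is a generic sketch, not the authors' measurement code:

```python
import numpy as np

def fwhm(profile, dx):
    """Full width at half maximum of a 1-D profile, with linear
    interpolation of the two half-maximum crossings; dx is the sample
    spacing, so the result is in the same units as dx."""
    p = np.asarray(profile, dtype=float)
    half = p.max() / 2.0
    above = np.where(p >= half)[0]
    i0, i1 = above[0], above[-1]
    # sub-sample crossing positions on each flank
    left = i0 - (p[i0] - half) / (p[i0] - p[i0 - 1]) if i0 > 0 else float(i0)
    right = (i1 + (p[i1] - half) / (p[i1] - p[i1 + 1])
             if i1 < len(p) - 1 else float(i1))
    return (right - left) * dx
```

For a Gaussian point-spread function the result approaches the analytic value 2.3548 σ as the sampling becomes finer.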

3.2 Validation Using Pseudo-Photoacoustic Signal Source


US beamformed data was distorted due to the incorrect delay (Fig. 5c), but both algorithms
were applied to the RF data. Comparing the channel data recovered through inverse
beamforming with the channel data from the DAQ (Figs. 5a–b), an identical wavefront was
confirmed, although there were intensity differences due to different noise realizations and
recovery artifacts. However, this effect was negligible in the final PA image (Fig. 5e).
The measured FWHM was also similar for both inverse beamforming and SA
re-beamforming compared to the ground truth result using channel data (Table 2). This
indicates the proposed methods could replace conventional PA beamforming using raw
channel data.


Fig. 5. Experiment results with Pseudo-PA data. (a–b) Comparison of channel data. (a) Ref-
erence channel data collected using DAQ. (b) Recovered channel data through inverse
beamforming from US post-beamformed RF data. (c) US post-beamformed RF data collected
from clinical US scanner. Reconstructed PA image using DAQ channel data (d), inverse
beamforming (e), and SA re-beamforming (f).

Table 2. FWHM of the point targets for the corresponding beamforming methods.

              Conventional   Inverse beamforming   SA re-beamforming
FWHM (mm)     1.49           1.33                  1.08

3.3 In Vivo Prostate Cancer Visualization


The tumor mimicking prostate cancer could be visualized with both approaches (Fig. 6).
The main contrast features were well captured by both methods, while the surrounding
contrast varied due to different noise realizations.


Fig. 6. In vivo evaluation results. (a) Experiment setup. Contrast agents (ICG) targeting tumor
are visualized. (b) PA image using channel data. (c) PA image through SA re-beamforming.

4 Discussion and Conclusion

Although the demonstration of PA image formation was based on point targets, the
proposed algorithms would work for any structure with high optical absorption,
such as a blood vessel, which shows strong contrast under near-infrared light
excitation. The algorithms could also be integrated into a real-time imaging system
using clinical US machines [10].
A high PRF laser system can be considered as a system requirement, as it is
necessary to synchronize the laser transmission to the US line transmission trigger.

To keep the frame rate similar to that of conventional US B-mode imaging, the PRF of
the laser transmission should be the same as the transmission rate, in the range of at
least several kHz. Therefore, a high PRF laser system such as a laser diode is desirable.
US transmission should be off or use low energy to eliminate the artifacts from US
signals.
In this paper, we proposed a new paradigm for PA imaging using US post-beamformed
RF data from clinical US systems. Two algorithms, inverse beamforming
and SA based re-beamforming, were introduced and their performance was demon-
strated in the simulation. In addition, experimental study using the pseudo-PA signal
source and in vivo targets reveals the validity and clinical significance of these methods,
in that a similar resolution was achieved compared to conventional PA imaging using
channel data. Future work includes implementing the algorithm in a real-time
environment.

Acknowledgement. Authors acknowledge Howard Huang for proofreading, and Dr. Ying Chen
for assisting in vivo experiment.

References
1. Xu, M., Wang, L.V.: Photoacoustic imaging in biomedicine. Rev. Sci. Instrum. 77, 041101
(2006)
2. Wang, L.V., Hu, S.: Photoacoustic tomography: in vivo imaging from organelles to organs.
Science 335, 1458–1462 (2012)
3. Kolkman, R.G.M., et al.: Real-time in vivo photoacoustic and ultrasound imaging.
J. Biomed. Opt. 13(5), 050510 (2008)
4. Kolkman, R.G.M., et al.: In vivo photoacoustic imaging of blood vessels with a pulsed laser
diode. Lasers Med. Sci. 21(3), 134–139 (2006)
5. Park, S., Aglyamov, S.R., Emelianov, S.: Beamforming for photoacoustic imaging using
linear array transducer. In: Proceedings in IEEE International Ultrasonics Symposium,
pp. 856–859 (2007)
6. Yin, B., et al.: Fast photoacoustic imaging system based on 320-element linear transducer
array. Phys. Med. Biol. 49(7), 1339–1346 (2004)
7. Liao, C.K., et al.: Optoacoustic imaging with synthetic aperture focusing and coherence
weighting. Opt. Lett. 29, 2506–2508 (2004)
8. Harrison, T., Zemp, R.J.: The applicability of ultrasound dynamic receive beamformers to
photoacoustic imaging. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 58(10), 2259–2263
(2011)
9. Zhang, H.K., et al.: Photoacoustic reconstruction using beamformed RF data: a synthetic
aperture imaging approach. Proc. SPIE 9419, 94190L (2015)
10. Taruttis, A., Ntziachristos, V.: Advances in real-time multispectral optoacoustic imaging and
its applications. Nat. Photonics 9(4), 219–227 (2015)
4D Reconstruction of Fetal Heart Ultrasound
Images in Presence of Fetal Motion

Christine Tanner1(B), Barbara Flach1, Céline Eggenberger1, Oliver Mattausch1,
Michael Bajka2, and Orcun Goksel1
1 Computer Vision Laboratory, ETH Zurich, 8092 Zurich, Switzerland
tannerch@vision.ee.ethz.ch
2 Department of Gynaecology, University Hospital, 8091 Zurich, Switzerland

Abstract. 4D ultrasound imaging of the fetal heart relies on reconstructions
from B-mode images. In the presence of fetal or mother's motion,
current approaches suffer from artifacts. We propose to use many sweeps
and exploit the resulting redundancy to recover from motion by recon-
structing a 4D image which is consistent in phase, space and time. We
first quantified the performance of 7 formulations on simulated data.
Reconstructions of the best and baseline approach were then visually
compared for 10 in-vivo sequences. Ratings from 4 observers showed
that the proposed consistent reconstruction significantly improved image
quality.

1 Introduction
Fast acquisition rates and the non-invasiveness of ultrasound (US) imaging make
it an ideal modality for screening the fetal heart to detect congenital heart
malformation. Traditionally, the functioning of the fetal heart is inspected in real-time
during B-mode imaging. Guidelines recommend examination of the four-chamber
and outflow tract views [1]. Yet prenatal detection rates vary widely, due to dif-
ferences in examiner experience, maternal obesity, transducer frequency, gesta-
tional age, amniotic fluid volume and fetal position [1]. 4D US imaging simplifies
the assessment of the outflow tracts, allows a more detailed examination and
contributes to the diagnostic evaluation in case of complex heart defects [1,2].
4D US of the fetal heart requires special image reconstruction methods, since the speed of 3D US acquisitions using common mechanically steered probes is too slow compared to the fetal heart rate (e.g. 7–10 vs. 2–2.5 Hz). A general approach for such a 4D reconstruction problem is to continuously acquire individual 2D images covering the region of interest [3–5], which then need reordering to extract consistent 3D images. While cardiac 4D MR reconstructions for adults can be supported by ECG and respiratory signals [6], these signals cannot be reliably extracted for the fetus [7]. Hence sorting has relied on extracting the periodic cardiac signal from the images, under the assumption that no other fetal motion is present [3,4].
The most common method for fetal 4D US reconstruction is the STIC
(Spatio-Temporal Image Correlation) method [4], where autocorrelation is used

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 593–601, 2016.
DOI: 10.1007/978-3-319-46720-7 69
594 C. Tanner et al.

to detect the systolic peaks, the fetal heart rate (HR) is deduced and the frames
are sorted according to their resulting phases. STIC builds on slow, single sweep
US acquisitions (e.g. 150 frames/s, 25° in 10 s, 1500 frames) and works well if
only fetal cardiac motion is present. Yet, additional motion from spontaneous
fetal activity or mother’s breathing creates artifacts [8,9]. Such artifacts cannot
be remedied, as motion affects all consecutive frame positions, which in a sin-
gle sweep have only been acquired once. Hence mothers are asked to hold their
breath and operators may wait for a period of less fetal movement, which prolongs examination time. Volumes acquired by non-STIC experts showed more motion artifacts (42 %) than those by experts (16 %) [8]. Reports on correcting motion for fetal heart 4D US reconstruction have not been encountered.
Image registration has been used to improve reconstructions, but is generally
computationally very expensive. For example, correction of fetal 3D MRIs by
slice-to-volume rigid registration of local patches required 40 min on multiple
GPUs [10]. Correction of adult 3D cardiac MRIs, after gating based on ECG and
breathing-belt signals, took 3 h on a 16-workstation cluster [6]. For respiratory motion, 4D US reconstruction has been based on extracting a gating signal per slice position by image dimensionality reduction and then matching these signals across slices [5]. This relies on gathering reliable motion statistics per slice, and hence might not be robust to severe, non-periodic motion, e.g. drift.
To avoid time-consuming registrations, we follow the approach of selecting
suitable image slices from repeated mechanically-swept US acquisitions. Herein
we focus on the consistency of the 4D reconstruction and the detection of out-
liers due to motion. A large range of selection criteria was first quantitatively
evaluated on simulated US sequences. Then, to have statistical power, only the
baseline and the best method were applied to in-vivo data, and the visual appearances of the reconstructions were scored by 3 researchers and a gynecologist.

2 Material

Simulated Data - To support method development by having a ground truth, B-mode images were simulated from an in-silico phantom (see Fig. 1a) based on [11]. This method uses GPU ray-tracing to simulate US beam propagation and interactions with given surfaces to accurately reproduce typical US attenuation, reflection, refraction, and shadowing effects. Following the mechanical sweep protocol of the real data, 3845 frames at an image frequency of fi = 279 frames/s were simulated. The in-silico phantom consisted of an ellipsoidal object with semi-axes of a = [9.9 11.5 12.3] mm length. The object changed in size (a ± 20 %) according to a sinusoidal pattern with the simulated HR frequency. Regular HR was set to 143.08 beats/min (i.e. 117 frames/beat). Irregular HR was modelled by linearly changing it by 5 % over 1500 frames. Fetal global motion was simulated by applying a [4 8 3] mm translation and a [4 3 8]° rotation from frame 701 to 1100 and their inverse from frames 1701 to 2200, see Fig. 1c. Simulations included 3 scenarios, namely (Sim1) irregular HR, no global motion, (Sim2) regular HR, with global motion, and (Sim3) irregular HR, with global motion.
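The irregular-HR scenarios can be reproduced in spirit as follows; this is an illustrative sketch, not the authors' simulation code, and since the text only states "linearly changing it by 5 % over 1500 frames", the saturating ramp shape below is an assumption:

```python
import numpy as np

def simulated_phase(n_frames, fi, fh0, drift=0.05, drift_frames=1500):
    """Ground-truth cardiac phase in [0, 1) for an irregular-HR simulation.

    The heart rate ramps linearly by `drift` (5% in the paper) over the
    first `drift_frames` frames; the phase is the integral of the
    instantaneous frequency, sampled at frame rate fi (frames/s).
    """
    ramp = np.minimum(np.arange(n_frames) / drift_frames, 1.0)
    f_inst = fh0 * (1.0 + drift * ramp)       # instantaneous HR in Hz
    phase = np.cumsum(f_inst) / fi            # integrate frequency over time
    return phase % 1.0
```

With `drift=0` this reduces to the regular-HR case, a uniform phase ramp at fh0.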


Fig. 1. Illustration of (a) the in-silico phantom geometry with a transducer plane, (b) a simulated US image and (c) the simulated motion over time.

In-Vivo Data - A total of 10 US sequences of 6 fetuses at 19–25 weeks of gestation (mean ± SD heart semi-axes of [10.9 9.2 10.6] ± [3.0 2.5 2.3] mm) were acquired. B-mode images were continuously acquired at fi ∈ [217, 372] frames/s (i.e. ≈[87,148] frames/beat) during 56–128 motorized forward-backward sweeps, each covering 25° and consisting of 31 frames (i.e. ≈[18,46] beats/sequence).

3 Method
Reconstruction is based on first estimating the heart rate. Then frames are
selected for reconstruction according to phase, spatial and temporal consistency.

3.1 Mean Heart Rate (HR) Estimation

We tested two approaches (A1, A2) for automatically estimating the HR fh (Hz). A1 was based on computing the autocorrelation of the intensity profile over time per pixel, taking their mean and then extracting the power spectrum via Fourier transform. For A2, the image similarity J(i, j) between frames i and j was measured using various (dis)similarity measures (correlation coefficient (CC), mean square difference (MSD), mutual information (MI) and the US-specific image similarity measures SK1, SK2, CD1, CD2 [12]). The power spectrum of each row of J was then extracted via Fourier transform and their mean obtained. For each method the resulting spectrum was bandpass filtered around the expected HR (100–200 beats/min) and fh was set to the frequency at its global maximum.
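The A1 pipeline can be sketched as below; this is an illustrative re-implementation rather than the authors' code, and the FFT-based autocorrelation and simple peak picking are assumptions about implementation details:

```python
import numpy as np

def estimate_heart_rate(frames, fi, band=(100 / 60.0, 200 / 60.0)):
    """Estimate the mean heart rate (Hz) from a frame stack (method A1).

    frames: array of shape (T, H, W); fi: frame rate in frames/s.
    Per-pixel autocorrelation of the intensity profile over time is
    averaged, its power spectrum taken, band-passed around the expected
    fetal HR (100-200 beats/min) and the peak frequency returned.
    """
    T = frames.shape[0]
    x = frames.reshape(T, -1).astype(float)
    x = x - x.mean(axis=0)                       # remove DC per pixel
    # autocorrelation per pixel via the Wiener-Khinchin theorem
    F = np.fft.rfft(x, n=2 * T, axis=0)
    ac = np.fft.irfft(np.abs(F) ** 2, axis=0)[:T]
    mean_ac = ac.mean(axis=1)
    # power spectrum of the mean autocorrelation
    spec = np.abs(np.fft.rfft(mean_ac))
    freqs = np.fft.rfftfreq(T, d=1.0 / fi)
    # band-pass around the expected HR and pick the global maximum
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(spec[mask])]
```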

3.2 4D Reconstruction
Figure 2 illustrates the problem of reconstructing P 3D phase images from B B-mode images continuously acquired at K positions in S sweeps. From the estimated HR fh, the phase value qb ∈ [0.5, P + 0.5] of the B-mode image Ib (acquired at time t = b/fi) was estimated from the fractional part of the heart beats (t fh), i.e. qb = (P − 1)(t fh − ⌊t fh⌋) + 0.5. The frame from sweep s and position k is denoted as I_k^s with associated phase q_s^k. For reconstructing the P 3D phase images, P × K indices (called š_p^k) need to be determined.
Table 1 provides an overview of the tested reconstruction methods. Method M0 selects frames whose phase q_s^k is closest to the desired phase p [3,4].
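The phase assignment and the baseline M0 selection can be sketched as follows; the function names, the 0-based frame indices and the position layout are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def phase_values(n_frames, fi, fh, P):
    """q_b = (P-1) * frac(t*fh) + 0.5 with t = b/fi (Sect. 3.2)."""
    t = np.arange(n_frames) / fi
    frac = t * fh - np.floor(t * fh)
    return (P - 1) * frac + 0.5

def reconstruct_m0(q, pos, P, K):
    """Baseline M0: for each (phase p, position k) pick the acquired
    frame whose phase value is closest to p. Returns indices (P, K)."""
    sel = np.full((P, K), -1, dtype=int)
    for k in range(K):
        idx = np.where(pos == k)[0]             # frames acquired at position k
        for p in range(1, P + 1):
            sel[p - 1, k] = idx[np.argmin(np.abs(q[idx] - p))]
    return sel
```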

Fig. 2. Problem overview: a sequence of B B-mode images from S sweeps of K positions needs to be reconstructed into P 3D images capturing individual heart-beat phases.

Table 1. Overview of methods M0 to M6. Optimization included phase difference PD, spatial inconsistency cost SC and temporal inconsistency cost TC.

Name | fh    | Cost       | Optimization type | Reference image Im
M0   | fixed | PD         | global            | n/a
M1   | fixed | SC         | sequential        | m=1, PD < 0.5, min(s)
M2   | fixed | SC         | sequential        | m=K/2, PD < 0.5, min(s)
M3   | fixed | SC         | sequential        | m=K/2, PD < 0.5, max(sumCC)
M4   | fixed | PD, SC     | global            | n/a
M5   | opt.  | PD, SC     | global            | n/a
M6   | fixed | PD, SC, TC | sequential        | n/a

Greedy methods M1–M3 first determine for each phase p a reference B-mode image I_m^{š_p^m} and then sequentially minimize spatial inconsistency, i.e.

š_p^{k+1} = arg min_{s ∈ S_p^{k+1}} D(I_k^{š_p^k}, I_{k+1}^s)  for k = {m, m+1, ..., K−1, m−1, m−2, ..., 1}   (1)

where D is an image dissimilarity measure and S_p^k = {s : |q_s^k − p| < 0.5} is the set of sweep indices of frames at position k belonging to phase p. For M1, I_m^{š_p^m} is the first frame at position m = 1 which belongs to phase p, i.e. š_p^1 = min S_p^1. M2 is the same as M1 apart from using the midframe (m = K/2). In M3 the most typical midframe is used as reference, i.e. the midframe which has the highest correlation with all other midframes within the phase range S_p^{K/2}:

š_p^{K/2} = arg max_{s ∈ S_p^{K/2}} Σ_{r ∈ S_p^{K/2}} CC(I_{K/2}^s, I_{K/2}^r).   (2)
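The greedy propagation of Eq. (1) can be sketched as below; the data layout (`frames[k][s]` holding the frame of sweep s at position k) and the dictionary return type are assumptions for illustration:

```python
def greedy_select(frames, S, m, s_ref, D):
    """Greedy sweep selection for one phase p (M1-M3 style, Eq. 1).

    frames[k][s]: image of sweep s at position k; S[k]: candidate sweep
    indices at position k for this phase; m: reference position; s_ref:
    sweep chosen for the reference frame. Returns a dict k -> sweep.
    """
    K = len(frames)
    sel = {m: s_ref}
    # visit positions m+1..K-1, then m-1..0, as in Eq. (1)
    order = list(range(m + 1, K)) + list(range(m - 1, -1, -1))
    prev = {k: k - 1 if k > m else k + 1 for k in order}
    for k in order:
        ref = frames[prev[k]][sel[prev[k]]]     # already-selected neighbour
        sel[k] = min(S[k], key=lambda s: D(ref, frames[k][s]))
    return sel
```

For M1, `s_ref` would be the first sweep in S[0] with m = 0; for M3, the midframe with the highest summed correlation (Eq. 2).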

In M4–M6 different cost functions are globally minimized using dynamic programming to determine the best P × K frame selection indices š_p^k. M4 balances the phase difference PD against the spatial inconsistency cost SC:

Č_{fh} = min_{s_p^k ∈ S_p^k} Σ_{p=1}^{P} [ Σ_{k=1}^{K} (q_{s_p^k}^k − p)² + α Σ_{k=1}^{K−1} D(I_k^{s_p^k}, I_{k+1}^{s_p^{k+1}}) ]   (3)

where the first cost term is denoted PD_p^k and the second SC_p^k. The weight α = Σ_k α_k / K was automatically determined by α_k = |PD^k / SC^k|, with PD^k denoting the mean of PD_p^k over the R = 10 closest observations to p and SC^k the mean of SC_p^k over the R most similar spatial neighbours. M5 is the same as M4 apart from also allowing variations in the estimated HR fh through an additional grid search over 1/f ∈ [1/fh ± 0.05] s to minimize Č_f = min_{fh ∈ f} Č_{fh}. M6 extends Eq. (3) by adding a term for temporal consistency (TC):

Č_{fh} = min_{s_p^k ∈ S_p^k} Σ_{p=1}^{P} [ Σ_{k=1}^{K} PD_p^k + α Σ_{k=1}^{K−1} SC_p^k + β Σ_{k=1}^{K} TC_p^k ]   (4)

where TC_p^k = D(I_k^{s_p^k}, I_k^{s_{(p−1) mod P}^k}), β = Σ_k |PD^k / (TC^k K)| and TC^k denotes the mean of TC_p^k over the R most similar temporal neighbours. Equation (4) was sequentially optimized until convergence after reconstructing a phase via Eq. (3).
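For a fixed phase, the global optimization of Eq. (3) is a shortest-path problem over the K positions and can be solved with a Viterbi-style dynamic program; the cost-array layout below is an assumption for illustration, not the paper's implementation:

```python
import numpy as np

def dp_select(PD, SC, alpha):
    """Dynamic program for one phase (cf. M4, Eq. 3).

    PD[k][s]: squared phase-difference cost of sweep s at position k.
    SC[k][s_prev][s]: dissimilarity between the frame of sweep s_prev at
    position k and sweep s at position k+1. Returns the optimal sweep
    index per position.
    """
    K = len(PD)
    cost = np.asarray(PD[0], float)
    back = []
    for k in range(1, K):
        # accumulated cost of arriving at each sweep of position k
        trans = cost[:, None] + alpha * np.asarray(SC[k - 1], float)
        back.append(np.argmin(trans, axis=0))
        cost = np.asarray(PD[k], float) + np.min(trans, axis=0)
    # backtrack the optimal path
    sel = [int(np.argmin(cost))]
    for bp in reversed(back):
        sel.append(int(bp[sel[-1]]))
    return sel[::-1]
```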
Outlier Removal (OR) - Having observed on simulated and real data that motion leads to low CC values when comparing images (see Fig. 3), we also tested all methods after removing low-correlating sweeps. For this we created the CC matrix J for the midframes, determined the midframe with the lowest mean correlation to all others, and discarded the associated sweep. This was continued until the lowest mean correlation was >0.5 or only 50 % of sweeps were left.
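The outlier-removal loop can be sketched directly from this description; the threshold (0.5) and stopping fraction (50 %) follow the text, while the function signature is an illustrative assumption:

```python
import numpy as np

def remove_outlier_sweeps(J, cc_min=0.5, keep_frac=0.5):
    """Iterative outlier removal (OR) on the midframe CC matrix J.

    Repeatedly drops the sweep whose midframe has the lowest mean
    correlation to the remaining midframes, until that mean exceeds
    cc_min or only keep_frac of the sweeps remain. Returns kept indices.
    """
    J = np.asarray(J, float)
    keep = list(range(J.shape[0]))
    min_keep = int(np.ceil(keep_frac * J.shape[0]))
    while len(keep) > min_keep:
        sub = J[np.ix_(keep, keep)]
        # mean correlation of each midframe to all *other* midframes
        mean_cc = (sub.sum(axis=1) - np.diag(sub)) / (len(keep) - 1)
        worst = int(np.argmin(mean_cc))
        if mean_cc[worst] > cc_min:
            break
        keep.pop(worst)
    return keep
```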


Fig. 3. (left) Example CC matrix J of midframes from Sim3 and for in-vivo sequence
#2. (middle, right) Power spectra from (middle) J and (right) autocorrelation method.

4 Experiments and Results


4.1 Mean Heart Rate (HR) Estimation

HR ground truth for in-vivo data was estimated from M-mode traces by counting
the number of heart cycles between the first and the last visible extrema.
598 C. Tanner et al.

Table 2. Ground truth (GT) heart rate (in beats/min) and difference (GT-estimation)
for estimation methods using (A1) autocorrelation or (A2) image similarities.

Method/sequence    | Sim1 | Sim2  | Sim3  | In-vivo1 | In-vivo2 | In-vivo3 | In-vivo4
GT                 | 143.08 | 143.08 | 143.08 | 148.06 | 154.29 | 159.34 | 147.86
A1                 | 0.00 | 0.00  | 0.00  | −0.75 | 0.34  | 0.60 | 0.46
A2 CC              | 0.00 | 0.00  | 4.62  | −0.75 | −2.03 | 0.60 | 0.46
A2 MSD             | 0.00 | 0.00  | 0.00  | −0.75 | 16.92 | 0.60 | 0.46
A2 SK1             | 0.00 | −4.62 | −4.62 | −0.75 | −2.03 | 0.60 | 0.46
A2 MI,SK2,CD1,CD2  | 0.00 | 0.00  | 0.00  | −0.75 | −2.03 | 0.60 | 0.46

Table 2 lists the errors in automatic HR estimation for the 3 simulations and 4 in-vivo sequences. Errors were below 0.6 % for autocorrelation (A1), and below 3.5 % for the image similarity measures (A2), apart from MSD for in-vivo sequence #2 (11.0 %). Hence we used A1 for estimating the HR for all 4D reconstructions.

4.2 4D Reconstruction of Simulated Data

The performance for the simulations was quantified by combined motion errors. For this, phase errors were converted to motion errors by assigning to each phase value the corresponding mean change in semi-axis length (±2.25 mm).
Table 3 lists the mean absolute error for the most complex simulation (Sim3)
when applying methods M0–M6 using one of 3 image dissimilarity measures
D, and including outlier removal (OR) or not (OR×). Highest accuracy was

Table 3. (top-left) Table with mean absolute errors (in mm) for simulation Sim3. The lowest errors are marked in bold. (top-right) Visualization of table results. (bottom) Visualization of results for all simulations and their mean (Sim123).

D   | OR | M0   | M1   | M2   | M3   | M4   | M5   | M6
CC  | ×  | 2.59 | 1.68 | 0.50 | 0.58 | 0.88 | 6.04 | 0.93
CD2 | ×  | 2.59 | 0.92 | 0.36 | 0.47 | 2.14 | 4.49 | 0.39
MI  | ×  | 2.59 | 1.81 | 0.64 | 0.77 | 3.95 | 5.59 | 1.57
CC  | ✓  | 0.71 | 0.57 | 0.50 | 0.59 | 0.54 | 1.25 | 0.55
CD2 | ✓  | 0.71 | 0.62 | 0.36 | 0.47 | 0.68 | 1.51 | 0.36
MI  | ✓  | 0.71 | 0.88 | 0.64 | 0.77 | 0.61 | 1.59 | 1.03


Fig. 4. Orthogonal example slices from reconstruction of simulation Sim3 for (a)
ground truth and methods (b) M0, (c) M2-CD2-OR and (d) M6-CD2-OR.

achieved by M2 based on CD2 with or without OR (M2-CD2) and by M6-CD2-OR. The result overview shows that without motion (Sim1), the errors were low and OR had no impact. For simulations with motion, additional optimization of the heart rate (M5) was counter-productive, while OR generally helped. For the motion scenario with regular heart rate, lowest errors were achieved with M6-CD2 (×: 0.12, ✓: 0.11 mm) followed by M2-CD2 (0.29 mm). When considering the 3 simulations, M2-CD2 (0.31 mm) and M6-CD2 (×: 0.26, ✓: 0.23 mm) still provided the lowest errors. The mean runtime of M0, M2, or M6 with OR was 12 s, 191 s, or 285 s, respectively, when reconstructing Sim3 on a single CPU using non-optimized Matlab code. Prior OR reduced the image data by 31 % and the runtime of M2 (M6) by 58 (59) %. Figure 4 shows example reconstructions for Sim3. Artifacts can be observed for M0 across the combined frames. Reconstructions by the best OR methods are both very similar to the ground truth. Due to its lower runtime, we selected M2-CD2-OR for the in-vivo evaluation.

4.3 4D Reconstruction of In-Vivo Data

The reconstructed 4D US images were visually inspected using the vv image viewer [13]. Four raters were asked to compare the image quality provided by the 2 methods for the 10 cases and to rate method A on a Likert scale as either (score 1: 'much better', 2: 'better', 3: 'equal', 4: 'worse', or 5: 'much worse') than method B. The mean score when comparing M2-CD2-OR against M0 was 2.1 (close to 'better'). The distribution of the 5 categories was 1: 20.0 %, 2: 57.5 %, 3: 15.0 %, 4: 7.5 % and 5: 0 %. The mean score of the 4 raters ranged from 1.8 to 2.4, with the clinician's result being closest to the mean (2.2). The median score (2: 'better') was statistically significantly different from score 3 ('equal') at the <0.0001 level using the Wilcoxon signed rank test. Raters observed reduced artifacts across frames and fewer motion artifacts. Figure 5 shows sample reconstructions. Misalignment artifacts are clearly reduced by M2-CD2-OR.


Fig. 5. Example of a representative in-vivo reconstruction (mean score 1.75) for (a,b)
M0 and (c,d) M2-CD2-OR for (a,c) phase 2 and (b,d) difference phase 3 - phase 2.

5 Discussion and Conclusion

We developed a fast reconstruction method, which noticeably improved reconstructions of 4D fetal heart US images in comparison to neglecting the presence of fetal motion. Based on evaluations on simulated data, the proposed outlier removal was found beneficial. The most successful methods were M6, which optimizes phase, spatial and temporal consistency, and M2, which uses the first midframe within a phase and iteratively selects the most similar neighbouring slice.
The developed framework is suitable for continuous, long acquisitions. Dissimilarity calculation of neighbouring slices (97 % of M3 runtime) is easily parallelizable. A real-time implementation can also use the outlier removal criterion to process midframes immediately, providing real-time feedback on acquisition quality. The out-of-plane image resolution can be improved by denser sampling (slower sweep speed). Given the relatively low number of rejected outliers in this study, reconstruction of more phases should also be possible, if needed.

Acknowledgments. We thank the Swiss CTI and NSF for funding.

References
1. Carvalho, J.S., Allan, L.D., Chaoui, R., Copel, J.A., DeVore, G.R., Hecher, K.,
et al.: ISUOG practice guidelines (updated): sonographic screening examination of
the fetal heart. Ultrasound Obstet. Gynecol. 41(3), 348 (2013)
2. DeVore, G.R., Falkensammer, P., Sklansky, M.S., Platt, L.D.: Spatio-temporal
image correlation (STIC): new technology for evaluation of the fetal heart. Ultra-
sound Obstet. Gynecol. 22(4), 380 (2003)
3. Nelson, T.R., Pretorius, D.H., Sklansky, M., Hagen-Ansert, S.: Three-dimensional
echocardiographic evaluation of fetal heart anatomy and function: acquisition,
analysis, and display. J. Ultrasound Med. 15(1), 1 (1996)
4. Schoisswohl, A., Falkensammer, P.: Method and apparatus for obtaining a volu-
metric scan of a periodically moving object. US Patent 6,966,878, 22 November
2005
5. Wachinger, C., Yigitsoy, M., Rijkhorst, E.-J., Navab, N.: Manifold learning for
image-based breathing gating in ultrasound and MRI. Med. Image Anal. 16(4),
806 (2012)

6. Odille, F., Bustin, A., Chen, B., Vuissoz, P., Felblinger, J.: Motion-corrected, super-
resolution reconstruction for high-resolution 3D cardiac cine MRI. In: Navab, N.,
Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, Part III. LNCS,
vol. 9351, pp. 435–442. Springer, Berlin (2015)
7. Peterfi, I., Kellenyi, L., Szilagyi, A.: Noninvasive recording of true-to-form fetal
ECG during the third trimester of pregnancy. Obstet. Gynecol. Int. 2014, Article
ID 285636 (2014)
8. Uittenbogaard, L.B., Haak, M.C., Spreeuwenberg, M.D., Van Vugt, J.M.G.: A
systematic analysis of the feasibility of four-dimensional ultrasound imaging using
spatiotemporal image correlation in routine fetal echocardiography. Ultrasound
Obstet. Gynecol. 31(6), 625 (2008)
9. Yagel, S., Benachi, A., Bonnet, D., Dumez, Y., Hochner-Celnikier, D., Cohen, S.M.,
et al.: Rendering in fetal cardiac scanning: the intracardiac septa and the coronal
atrioventricular valve planes. Ultrasound Obstet. Gynecol. 28(3), 266 (2006)
10. Kainz, B., Alansary, A., Malamateniou, C., Keraudren, K., Rutherford, M., Hajnal, J.V., Rueckert, D.: Flexible reconstruction and correction of unpredictable motion from stacks of 2D images. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9350, pp. 555–562. Springer, Berlin (2015)
11. Bürger, B., Bettinghausen, S., Radle, M., Hesser, J.: Real-time GPU-based ultrasound simulation using deformable mesh models. IEEE Trans. Med. Imaging 32(3), 609 (2013)
12. Cohen, B., Dinstein, I.: New maximum likelihood motion estimation schemes for
noisy ultrasound images. Pattern Recogn. 35(2), 455 (2002)
13. Seroul, P., Sarrut, D.: VV: a viewer for the evaluation of 4D image registration. In: MIDAS Journal (MICCAI Workshop - Systems and Architectures for Computer Assisted Interventions), vol. 40, p. 1 (2008)
Towards Reliable Automatic Characterization
of Neonatal Hip Dysplasia from 3D Ultrasound
Images

Niamul Quader(B), Antony Hodgson, Kishore Mulpuri, Anthony Cooper,
and Rafeef Abugharbieh

BiSICL, University of British Columbia, Vancouver, Canada
niamul@ece.ubc.ca

Abstract. Ultrasound (US) imaging is recommended for early detection of developmental dysplasia of the hip (DDH), which includes a spectrum of hip joint abnormalities in infants. However, the currently standard 2-dimensional (2D) US-based approach to measuring the dysplasia metric (DM), namely the α angle, suffers from high within-hip variability, with standard deviations typically ranging between 3° and 7°. Such high variability leads to elevated over- and under-treatment rates in hip classification. To reduce this high variability inherent to the 2D α angle, α2D, we propose a 3D US-based DM in the form of a 3D α angle, α3D, that more accurately characterizes the morphology of an infant's hip joint. Our method leverages phase symmetry features that automatically identify the 3D bone/cartilage structures to compute α3D. Validating on 30 clinical patient hip examinations, we demonstrate the within-hip variability of α3D to be significantly smaller than that of α2D (28.9 % reduction, p < 0.01). Our findings indicate that α3D may be significantly more reproducible than the conventional 2D measure, which will likely reduce misclassification rates.

1 Introduction

Developmental dysplasia of the hip (DDH), which refers to hip joint abnormalities ranging from mild acetabular dysplasia to irreducible hip joint dislocation, affects 0.16 %−2.85 % of all newborns [1]. Early arthritis is often associated with DDH [2], so failing to detect and treat DDH in infancy can lead to later expensive corrective surgical procedures. Based on the figures presented in [2], Price et al. [3] estimated that 25,000 total hip replacements per year are attributable to missed early diagnosis in the United States alone. At approximately $50,000 per procedure [4], the direct financial impact of this problem is on the order of $1B/year, not considering the costs of subsequent revision surgeries or other socioeconomic costs.
To diagnose DDH prior to ossification of the femoral head, 2-dimensional
(2D) ultrasound (US) imaging is currently recommended over other imaging
modalities (e.g. x-ray, magnetic resonance imaging, computed tomography, etc.)

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 602–609, 2016.
DOI: 10.1007/978-3-319-46720-7 70

due to its low cost and absence of ionizing radiation [5]. The standard DDH metric obtained from 2D US scans is the angle between the acetabular roof and the vertical cortex of the ilium, referred to as the alpha angle, α2D [5,6]. In general, α2D > 60° indicates a normal hip, whereas 43° < α2D < 60° represents borderline to moderate DDH, and α2D < 43° suggests severe DDH [6,7]. However, α2D suffers from high within-hip variability, i.e. the variability between dysplasia metrics (DMs) measured on repeated examinations of the same patient's hip, with standard deviations of such measurements, σ, ranging from 3° to 7° [8]. This may be partly attributable to variations in manually measuring α2D on 2D slices (subjective variability, σ ≈ 5°), but is more likely due to variability in α2D resulting from differences within what is considered clinically acceptable (standard) 2D US scans, caused mainly by differences in the probe orientation (probe-orientation-dependent variability, σ ≈ 7°) [7]. This high variability in measured α2D leads to significant discrepancies between the initial clinical determination of dysplasia severity and later clinical assessments. Specifically, estimates suggest that 6 %−29 % of cases that are later treated were initially regarded as not needing early treatment [1,9]. Further, there is significant potential for over-treatment since about 90 % of US-detected hip dysplasia cases resolve spontaneously [10]. Recently, we proposed to reduce the subjective variability by automatically extracting α2D from 2D US [11]. Our preliminary results in that work showed a 9 % reduction in within-hip variability [11].
To further reduce the within-hip variability, in this paper we address the crucial probe-orientation-dependent variability problem [7]. More specifically, we propose to characterize DDH based on an intrinsically 3D morphology metric derived directly from 3D US scans, which we argue captures more of the pertinent anatomical structures, while reducing the dependency on probe orientation in imaging those structures. To the best of our knowledge, only one previous work [12] has proposed the use of an intrinsically 3D DM, the acetabular contact angle (ACA). Similar to α2D, the ACA represents the angular separation between the acetabular roof (A) and the lateral iliac (I), except that the ACA is based on the segmented 3D surfaces of A and I. Hareendranathan et al.'s [12] method involves a slice-by-slice analysis process that requires manually selecting 4 seed points in each of the 2D US slices in a 3D US volume and manually separating A from I. Using such an interactive method would require valuable clinician time, and the manual operations introduce a within-image measurement variability of approximately 1° [12] and an inter-scan variability of approximately 4° [13].
In this paper, we propose a fully automatic approach for extracting a new 3D DM, the 3D alpha angle, α3D, by analogy to α2D [6]. To the best of our knowledge, our work is the first to propose a fully automatic approach for extracting a 3D dysplasia metric. In this paper, we: (1) extend our previous phase symmetry feature-based bone/cartilage extraction [11] to 3D, (2) define our new proposed 3D metric, α3D, (3) automatically extract α3D, and (4) demonstrate on real clinical data a significant decrease in within-hip variability of α3D compared to α2D.

2 Methods
The 2D dysplasia metric, α2D , is defined as the angle between the fitted straight
lines that approximate A and I when viewed on a 2D B-mode US image [5,6]. We
therefore define an analogous 3D metric, α3D , based on the relative orientations
of the fitted planar surfaces of A and I (Fig. 1b). Briefly, given a 3D B-mode US
image, U : X ⊂ IR3 → IR, where X = (x, y, z) are the voxel coordinates, our
approach starts by extracting the bone cartilage structures, B (Sect. 2.1). We
then use prior anatomical knowledge of the hip joint to automatically identify
the 3D surfaces of A and I within B (Sect. 2.2). Finally, we approximate the
average normals across A and I, and compute α3D as the angle between these
approximated normals (Sect. 2.3).

2.1 Extraction of Bone/Cartilage Structures

To extract the hyperechoic bone/cartilage boundaries, we first compute the local phase symmetry feature, PS, from U using responses to a band-pass quadrature filter bank [14]. To segment the sheet-like bone and cartilage surfaces from the PS feature volume, we deploy a multi-scale eigen analysis of the Hessian matrix. For eigenvalues |λ1| ≤ |λ2| ≤ |λ3|, voxels on sheet-like structures will exhibit |λ1| ≈ |λ2| ≈ 0, |λ3| ≫ 0. We enhance the sheet-like features in our PS volume as:

SPS = 0 if λ3 > 0, and SPS = (1 − exp(−Ra²)) exp(−Rb²) (1 − exp(−S²)) otherwise,   (1)

where Ra = |2|λ3| − |λ2| − |λ1|| / |λ3| is a blob-eliminating term, Rb = |λ2| / |λ3| is a sheet-enhancing term, and S = √(λ1² + λ2² + λ3²) is a noise-cancelling term [15].
Bone/cartilage boundaries also tend to attenuate an US beam more than other neighboring structures (e.g. soft tissue). We further enhance the bone/cartilage structures in SPS to form B (Fig. 1e) using an attenuation-based method [14]. Specifically, the enhancement process extracts attenuation and shadowing features from U, which are then used to modify SPS such that the bone/cartilage boundaries are enhanced and outliers (e.g. soft tissue) are removed [14].
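Eq. (1) can be evaluated per voxel from the Hessian eigenvalues as sketched below; the small epsilon guard against division by zero is an added assumption for numerical safety:

```python
import numpy as np

def sheetness(l1, l2, l3):
    """Sheet-enhancement of Eq. (1), given Hessian eigenvalues sorted
    such that |l1| <= |l2| <= |l3| (scalars or elementwise arrays)."""
    l1, l2, l3 = (np.asarray(v, float) for v in (l1, l2, l3))
    a1, a2, a3 = np.abs(l1), np.abs(l2), np.abs(l3)
    eps = 1e-12                                  # guard against a3 == 0
    Ra = np.abs(2 * a3 - a2 - a1) / (a3 + eps)   # blob-eliminating term
    Rb = a2 / (a3 + eps)                         # sheet-enhancing term
    S = np.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)     # noise-cancelling term
    out = (1 - np.exp(-Ra ** 2)) * np.exp(-Rb ** 2) * (1 - np.exp(-S ** 2))
    return np.where(l3 > 0, 0.0, out)            # suppress bright-ridge polarity
```

A dark sheet (λ1 ≈ λ2 ≈ 0, λ3 ≪ 0) scores near 1, while blobs (λ1 ≈ λ2 ≈ λ3) and noise (all λ small) score near 0.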

2.2 Localization of the 3D Ilium and Acetabulum Surfaces

Our proposed α3D is based only on the surfaces of A and I, which are both substructures of the detected B volume. To isolate A and I from other background structures, we use hip anatomy-based priors: the first prior is that A and I are located superior to the spherical femoral head, F (Fig. 1(a,e,f)), while the second prior is that A, I and the labrum (cartilage) tend to have a common junction at the edge of the ilium (Fig. 1(e,g)). We thus start by detecting F,

which we estimate by fitting a sphere (with radius r and center c = [cx , cy , cz ])


in the B volume using an M-estimator SAmple Consensus (MSAC) algorithm
[16]. In fitting the sphere, we set the maximum allowable distance from an inlier
point to the sphere as 1mm based on the empirical observation that average
radius of an infant’s femoral head is around 10 mm [6]. Next, we locate the edge
of the ilium, i = [ix , iy , cz ], based on the fact that the bone/cartilage structures
around such a junction point, i, has a heterogeneous gradient direction in the
image volume. Specifically, we calculate a direction-variability feature (Fig. 1g),
d, that captures the variability in orientations (angular values of eigenvectors
 to λ1 ) of bone/cartilage surfaces (voxels with B > 0). We define
corresponding
d as, d = k std(tan−1 (uk /vk )), where k ∈ x represents the neighboring r/2
voxels, and uk and vk are the x and y direction components of eigenvectors
corresponding to minor eigenvalues λ1 . Amongst the voxels located superior to
c, we expect the directional variability d to have the highest response around i
(Fig. 1g); consequently, we assign i = [id,x , id,y , cz ], where id,x and id,y are the
x and y coordinates corresponding to the maximum value of d that is located
superior to c.
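A rough 2D sketch of the direction-variability feature d is given below; the exact neighborhood shape and the quadrant convention of tan⁻¹(u/v) are not fully specified in the text, so the square window and `arctan2` used here are assumptions:

```python
import numpy as np

def direction_variability(u, v, bone_mask, half_width):
    """Direction-variability feature d (cf. Sect. 2.2), one slice.

    u, v: x/y components of the minor-eigenvalue eigenvector per voxel;
    bone_mask: voxels with B > 0. d is the standard deviation of local
    surface orientation angles in a (2*half_width+1)^2 window, evaluated
    on bone voxels; high d marks the heterogeneous junction at the ilium edge.
    """
    H, W = u.shape
    ang = np.arctan2(u, v)               # orientation angle per voxel
    d = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            if not bone_mask[y, x]:
                continue
            y0, y1 = max(0, y - half_width), min(H, y + half_width + 1)
            x0, x1 = max(0, x - half_width), min(W, x + half_width + 1)
            win = ang[y0:y1, x0:x1][bone_mask[y0:y1, x0:x1]]
            d[y, x] = win.std() if win.size > 1 else 0.0
    return d
```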

Fig. 1. (a) Rendering of the anatomy of a hip joint showing A (red), I (blue) and the femoral head. (b) Schematic illustration of α3D - the angle between the fitted planar surfaces that approximate A and I. (c) An example clinical B-mode volumetric US image of a neonatal hip. (d) Extracted sheet-like SPS responses. (e) Extracted bone and cartilage B responses with arrows pointing to A, I and the labrum. (f) Segmented femoral head, F. (g) Responses of the direction-variability feature with arrow pointing to the maximum response. (h) Extracted ROIs of A and I within B.

Having located F (radius r, center c = [cx, cy, cz]) and i = [i_{d,x}, i_{d,y}, cz], we extract cubic regions-of-interest (ROIs) around A and I that are centered at [i_{d,x} + r/2, i_{d,y} − r/2, cz] and [i_{d,x} − r/2, i_{d,y}, cz], respectively, with sides of length r (Fig. 1h). These ROIs simplify the fitting of planes that approximate A and I by reducing background structures and by lowering computational complexity.

2.3 Computation of the 3D α Angle

Once we have identified the 3D surfaces of A and I, we proceed to fit planes that approximate them, which we then use to estimate α3D. We start by evaluating the 3D Radon transforms, RA and RI, of A and I, respectively. The 3D Radon transform, R, maps an input in ℝ³ into the set of its plane integrals in ℝ³, where the orientations of the planes are represented by elevation angles, θ, and azimuth angles, φ [17]. We estimate the best-fit plane for an input at Rm = max(R), i.e., the plane for which the plane integral is maximum, with orientation θm, φm and unit normal vector n = [cos θm sin φm, sin θm sin φm, cos φm] [17]. We use the resulting unit normal vectors nA and nI to calculate α3D using the formula α3D = cos⁻¹((nA · nI)/(|nA| |nI|)).
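Given the Radon-peak orientations, constructing the normals and the final angle follows directly; this sketch intentionally omits the 3D Radon transform itself and only illustrates the normal construction and angle formula:

```python
import numpy as np

def plane_normal(theta, phi):
    """Unit normal from the Radon-peak orientation angles (Sect. 2.3)."""
    return np.array([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)])

def alpha_3d(n_A, n_I):
    """alpha_3D (degrees): angle between the fitted acetabulum and ilium planes."""
    c = np.dot(n_A, n_I) / (np.linalg.norm(n_A) * np.linalg.norm(n_I))
    # clip guards against tiny floating-point excursions outside [-1, 1]
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
```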

3 Results and Discussion

Data Acquisition and Experimental Setup: In this study, two orthopedic surgeons at British Columbia Children's Hospital participated in collecting 3D US images from 30 hip examinations (15 US sessions × 2 hips/US session = 30 hip examinations) belonging to 12 patients (a single US session for 9 patients and two US sessions for 3 patients), obtained as part of routine clinical care under the appropriate institutional review board approval. To investigate repeatability, i.e., within-hip variability, each hip examination consisted of five repeated 3D US image acquisitions. Our proposed α3D angles were automatically calculated for each of the five 3D US volumes. Furthermore, each of the infants underwent an independent regular clinical care 2D US scanning session at the radiology department, where the infants were scanned by the radiologist on duty. In every 2D US session, the radiologist would acquire repeated 2D US images and make manual measurements of α2D (3 to 6 measurements per hip) on all images that the radiologist judged to adequately show the key anatomical structures needed to measure α2D; subsequently, these were recorded in and retrieved from the patient chart. As in a typical 2D US scan session (Fig. 2(a,b)), the 3D scans were acquired in the coronal plane, where the ilium (located superior to the femoral head, Fig. 1a) appears towards the left of the femoral head in an US image (Fig. 1(e,f)).

Validation Scheme: We compared the automatically extracted measure, α3D ,
with the currently standard manual 2D measurement, α2D . Specifically, we
analyzed the following for each of the 30 hip examinations: (1) the correla-
tion between the averages of α2D and α3D on individual hips (i.e., between
mean(α2D ) and mean(α3D )), (2) the discrepancy between the averages of α2D
and α3D (i.e., mean(α2D ) − mean(α3D )), and (3) the within-hip variability, σ,
in the measurements (i.e., std(α2D ) and std(α3D )) on individual hips.
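Assuming each hip's repeated measurements are stored as arrays, the three analyses amount to a few lines of NumPy; the data below are invented for illustration (the real study had 30 examinations with five α3D repeats each):

```python
import numpy as np

# Hypothetical repeated measurements (degrees) for three hip examinations:
alpha_2d = [np.array([55.0, 58.0, 52.0]), np.array([61.0, 65.0, 63.0]),
            np.array([47.0, 50.0, 49.0])]
alpha_3d = [np.array([51.0, 52.1, 51.5, 51.8, 51.2]),
            np.array([58.2, 59.0, 58.5, 58.8, 58.4]),
            np.array([45.1, 44.6, 45.3, 44.9, 45.0])]

mean_2d = np.array([a.mean() for a in alpha_2d])
mean_3d = np.array([a.mean() for a in alpha_3d])

r = np.corrcoef(mean_2d, mean_3d)[0, 1]       # (1) correlation of per-hip means
discrepancy = mean_2d - mean_3d               # (2) per-hip discrepancy
sigma_2d = np.array([a.std(ddof=1) for a in alpha_2d])  # (3) within-hip
sigma_3d = np.array([a.std(ddof=1) for a in alpha_3d])  #     variability

print(round(r, 3), discrepancy.shape, bool(sigma_3d.mean() < sigma_2d.mean()))
```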

Correlation Between α2D and α3D : In terms of agreement, mean(α3D ) showed
good correlation with mean(α2D ) across all the 30 hip examinations (Fig. 3a,
Towards Reliable Automatic Characterization of Neonatal Hip Dysplasia 607

Fig. 2. Qualitative results. (a), (b), (d) and (e) show example variability of α2D and
α3D from two 2D and two 3D US images from a hip examination of patient 3DUS004
(α2D = 47◦ and 56◦ , α3D = 44.8◦ and 45.2◦ ). The higher variability in the input 2D
US images (and α2D values) can be seen in the manually aligned 2D US images in (c)
compared to the variability in the manually aligned 3D US images (and α3D values)
in (f).

correlation coefficient R = 0.87 (95 % confidence interval: 0.74 to 0.94)). This
suggests that the proposed α3D measures an aspect of morphology similar to
what the current α2D measures.

Discrepancy Between α2D and α3D : The difference between the two metrics,
mean(α2D ) − mean(α3D ) in the 30 hip examinations, was significant (p < 0.01,
mean: 5.17◦ , SD: 3.33◦ with a bias towards α3D being smaller than α2D ).
Variability of Metrics, σ: The automatic α3D shows a statistically signifi-
cant improvement in variability compared to the manual α2D (p = 0.0053,
mean(σ3D ) = 2.19◦ , mean(σ2D ) = 3.08◦ (qualitative result in Fig. 2 and box-
plot in Fig. 3b)). This 28.9 % reduction in variability suggests that probe position
variation has a larger effect on variability in the DM than manual processing of
the 2D US (9 % improvement with automatic image processing within a 2D US
in our previous study [11]). The residual variability of α3D (σ3D ≈ 2◦ ) seems
to be small enough to be diagnostically valuable, given that the typical range
from normal to dysplastic hip is around 17◦ [6,7]. Furthermore, the variability
608 N. Quader et al.

Fig. 3. (a) Scatter plot of α2D and α3D . (b) Box-plot of the standard deviations,
σ, among the manual α2D and automatic α3D measurements across all the 30 hip
examinations.

of α3D appears substantially lower than the reported variability of the recently
described 3D ACA metric (2.19◦ versus 4.1◦ [13]).
Computational Considerations: The complete process of extracting α3D from an
US volume took approximately 270 seconds, when run on a Xeon(R) 3.40 GHz
CPU computer with 12 GB RAM. All processes were executed using MATLAB
2015b. Current practice has a sonographer process the images post-acquisition,
so this computation time is not a significant barrier to implementation. Although
not critical for clinical use, we plan to work towards optimizing our code to reduce
this computation time.

4 Conclusions
We presented an automatic 3D dysplasia metric, α3D , to characterize hip dys-
plasia in 3D US images of the neonatal hip. Using the proposed α3D resulted
in a statistically significant reduction in variability compared to the currently
standard 2D measure, α2D . This suggests that this 3D morphology-derived DM
could be valuable in improving the reliability in diagnosing DDH, which may
lead to a more standardized DDH assessment with better diagnostic accuracy.
Notably, the improvement in reliability associated with the 3D scans was
achieved by orthopaedic surgeons, who have limited training in performing US
examinations, while the 2D scans and metrics were obtained from radiologists
with explicit training in ultrasound acquisition and analysis. This strongly sug-
gests that we may, in future, be able to train personnel other than radiologists to
obtain reliable and reproducible dysplasia metrics using 3D ultrasound machines,
potentially reducing the costs associated with screening for DDH.

References
1. Shorter, D., Hong, T., Osborn, D.A.: Cochrane review: screening programmes for
developmental dysplasia of the hip in newborn infants. Evid.-Based Child Health:
Cochrane Rev. J. 8(1), 11–54 (2013)

2. Hoaglund, F.T., Steinbach, L.S.: Primary osteoarthritis of the hip: etiology and
epidemiology. JAAOS 9(5), 320–327 (2001)
3. Price, C.T., Ramo, B.A.: Prevention of hip dysplasia in children and adults.
Orthop. Clin. North Am. 43(3), 269–279 (2012)
4. Rosenthal, J.A., Lu, X., Cram, P.: Availability of consumer prices from us hospitals
for a common surgical procedure. JAMA Intern. Med. 173(6), 427–432 (2013)
5. Atweh, L.A., Kan, J.H.: Multimodality imaging of developmental dysplasia of the
hip. Pediatr. Radiol. 43(1), 166–171 (2013)
6. Graf, R.: Fundamentals of sonographic diagnosis of infant hip dysplasia. J. Pediatr.
Orthop. 4(6), 735–740 (1984)
7. Jaremko, J.L., et al.: Potential for change in us diagnosis of hip dysplasia solely
caused by changes in probe orientation: patterns of alpha-angle variation revealed
by using three-dimensional us. Radiology 273(3), 870–878 (2014)
8. Ömeroğlu, H.: Use of ultrasonography in developmental dysplasia of the hip. J.
Child. Orthop. 8(2), 105–113 (2014)
9. Imrie, M., et al.: Is ultrasound screening for DDH in babies born breech sufficient?
J. Child. Orthop. 4(1), 3–8 (2010)
10. Shipman, S.A., Helfand, M., Moyer, V.A., Yawn, B.P.: Screening for developmental
dysplasia of the hip: a systematic literature review for the us preventive services
task force. Pediatrics 117(3), e557–e576 (2006)
11. Quader, N., Hodgson, A., Mulpuri, K., Schaeffer, E., Cooper, A., Abugharbieh, R.:
A reliable automatic 2D measurement for developmental dysplasia of the hip. Bone
Joint J. (2016, in press)
12. Hareendranathan, A.R., Mabee, M., Punithakumar, K., Noga, M., Jaremko, J.L.:
A technique for semiautomatic segmentation of echogenic structures in 3D ultra-
sound, applied to infant hip dysplasia. IJCARS 11, 1–12 (2015)
13. Mabee, M.G., Hareendranathan, A.R., Thompson, R.B., Dulai, S., Jaremko, J.L.:
An index for diagnosing infant hip dysplasia using 3-D ultrasound: the acetabular
contact angle. Pediatr. Radiol. 1–9 (2016)
14. Quader, N., Hodgson, A., Abugharbieh, R.: Confidence weighted local phase fea-
tures for robust bone surface segmentation in ultrasound. In: Linguraru, M.G.,
Laura, C.O., Shekhar, R., Wesarg, S., Ballester, M.Á.G., Drechsler, K., Sato, Y.,
Erdt, M. (eds.) CLIP 2014. LNCS, vol. 8680, pp. 76–83. Springer, Heidelberg (2014)
15. Descoteaux, M., Audette, M., Chinzei, K., Siddiqi, K.: Bone enhancement filter-
ing: application to sinus bone segmentation and simulation of pituitary surgery.
Comput. Aided Surg. 11(5), 247–255 (2006)
16. Torr, P.H., Zisserman, A.: MLESAC: a new robust estimator with application to
estimating image geometry. Comput. Vis. Image Underst. 78(1), 138–156 (2000)
17. Averbuch, A., Shkolnisky, Y.: 3D fourier based discrete radon transform. Appl.
Comput. Harmonic Anal. 15(1), 33–69 (2003)
Image-Based Computer-Aided Diagnostic
System for Early Diagnosis of Prostate Cancer

Islam Reda1,2 , Ahmed Shalaby2 , Mohammed Elmogy1 , Ahmed Aboulfotouh1 ,


Fahmi Khalifa2 , Mohamed Abou El-Ghar3 , Georgy Gimelfarb4 ,
and Ayman El-Baz2(B)
1
Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
2
Bioengineering Department, University of Louisville, Louisville, KY 40292, USA
aselba01@louisville.edu
3
Radiology Department, Urology and Nephrology Center,
University of Mansoura, Mansoura, Egypt
4
Department of Computer Science, University of Auckland, Auckland, New Zealand

Abstract. The goal of this paper is to develop a computer-aided diagnostic
(CAD) system for early detection of prostate cancer from diffusion-weighted
magnetic resonance imaging (DW-MRI) acquired at different
b-values. The proposed system consists of three main steps. First, the
prostate is segmented using a hybrid framework that integrates geomet-
ric deformable model (level-sets) and nonnegative matrix factorization
(NMF). Secondly, the apparent diffusion coefficient (ADC) of the seg-
mented prostate volume is first estimated at different b-values and is
then normalized and refined using a generalized Gauss-Markov random
field (GGMRF) image model. Then, the cumulative distribution function
(CDF) of the refined ADCs at different b-values are constructed. Finally,
a two-stage structure of stacked non-negativity constraint auto-encoder
(SNCAE) is trained to classify the prostate tumor as benign or malignant
based on the constructed CDFs. In the first stage, classification proba-
bilities are estimated at each b-value and in the second stage, those prob-
abilities are fused and fed into the prediction stage SNCAE to calculate
the final classification. Preliminary experiments on 53 clinical DW-MRI
datasets resulted in 98.11 % correct classification (sensitivity = 96.15 %
and specificity = 100 %), indicating the high performance of the proposed
CAD system and holding promise of the proposed system as a reliable
non-invasive diagnostic tool.

Keywords: Prostate cancer · NMF · Autoencoder · CAD

1 Introduction

Prostate cancer is the most frequently diagnosed malignancy after skin cancer,
and is the second primary reason of cancer deaths in American men after lung
cancer. More than 220,000 new cases are diagnosed with prostate cancer and
about 27,540 deaths because of prostate cancer among Americans were reported
in 2015 [1]. Fortunately, the mortality rates can be reduced in case of detecting

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 610–618, 2016.
DOI: 10.1007/978-3-319-46720-7 71
Image-Based CAD System for Early Diagnosis of Prostate Cancer 611

prostate cancer in its early stages. Currently, the standard technique for diag-
nosing prostate cancer is to carry out a transrectal ultrasound (TRUS)-guided
needle biopsy after an elevated prostate-specific antigen (PSA) level in the blood,
greater than 4 ng/mL (nanograms per milliliter), is reported. When there is a
contradiction between the PSA level and the results of TRUS-guided biopsy
(such as, elevated PSA level while the biopsy result is negative), the use of MRI
to detect prostate cancer can be significant [2].
Different MRI techniques, such as T2 -weighted MRI, dynamic contrast
enhanced MRI (DCE-MRI) and DW-MRI, have been utilized in computer-aided
diagnostic (CAD) systems for detecting prostate cancer. T2 -weighted MRI pro-
vides superior pathological details of soft tissues but it lacks functional informa-
tion and the use of T2 -weighted MRI alone has resulted in low specificity [3].
Recently, the trend is to use functional MRI modalities such as DCE-MRI and
DW-MRI to increase the diagnostic accuracy. DCE-MRI employs contrast mate-
rials (e.g., gadolinium) to improve the contrast between the different tissue types.
However, DCE-MR images require long acquisition time and contrast materi-
als are deleterious especially for patients with kidney problems. On the other
hand, DW-MRI identifies tissue cellularity indirectly by studying the diffusion of
water molecules. Although the quality of DW-MR images is lower than DCE-MR
images, DW-MR images have distinct advantages over DCE-MR images as they
can be acquired very quickly, without the use of contrast materials [4]. The diag-
nostic accuracy of DW-MRI is higher than DCE-MRI and T2 -weighted MRI [5].
In the literature, a small number of prostate cancer CAD systems have eval-
uated the use of DW-MR images alone or in combination with other MRI tech-
niques [6]. For example, Firjani et al. [7] developed a DW-MRI based CAD
system in which a k-nearest-neighbor (KNN) classifier used three intensity fea-
tures to classify the prostate into benign or malignant. The first multiparametric
CAD system was proposed by Chan et al. [8] using T2 -MRI, T2 -mapping, and
line scan diffusion imaging (LSDI). Intensity and textural features were extracted
from manually-localized prostate region and fed into a support vector machine
(SVM) classifier or Fisher linear discriminant (FLD) classifier to detect prostate
cancer in the peripheral zone (PZ) of the prostate. The area under the curve
(AUC) was 0.76 ± 0.04 for the SVM and 0.84 ± 0.06 for the FLD. Another multi-
parametric CAD system that employed T2 -weighted MRI, DCE-MRI, and DW-
MRI was proposed by Litjens et al. [9]. In this system, an SVM classifier used
apparent diffusion coefficients (ADCs) and pharmacokinetic features extracted
from the segmented prostate to determine malignant and benign regions. Vos
et al. [10] proposed another multiparametric CAD system that utilized the same
MRI modalities used in [9]. In their system, a linear discriminant analysis (LDA)
classifier employed a set of features (e.g., texture-based, ADC maps) to differ-
entiate between malignant and benign prostates. Most of the aforementioned
CAD systems used multi-parametric MRI, which can be cost-inefficient [11];
in contrast, our CAD system for early detection of prostate cancer utilizes
DW-MRI alone. In our CAD system, the focus is on classifying the entire prostate
volume into malignant or benign and not on finding the location of the cancer.
Details of the proposed system will be discussed in the following sections.
612 I. Reda et al.

2 Methods
The proposed CAD system summarized in Fig. 1 performs sequentially three
steps. First, the prostate is segmented using our previously developed geomet-
ric deformable model (level-sets) as described in [12]. This model is guided by
a stochastic speed function that is derived using nonnegative matrix factoriza-
tion (NMF). The NMF attributes are calculated using information from the
MRI intensity, a probabilistic shape model, and the spatial interactions between
prostate voxels. The proposed approach achieves an overall Dice similarity
coefficient of 86.89 % and an average Hausdorff distance of 5.72 mm, indicating high seg-
mentation accuracy. Details of this approach and comparisons with other seg-
mentation approaches can be found in [12]. Afterwards, global features describing
the water diffusion inside the prostate tissue are extracted based on ADC-CDFs.
Finally, a two-stage structure of stacked nonnegativity constraint auto-encoder
(SNCAE) is trained to classify the prostate tumor as benign or malignant based
on the CDFs constructed in the previous step. The latter two steps of the pro-
posed CAD system are discussed in the following sections.

Fig. 1. Framework of the proposed DW-MRI CAD system.

2.1 Feature Extraction


After the prostate tissues are segmented, discriminatory features are estimated
from the segmented DW-MRI data and are used to distinguish between benign
and malignant cases. In this paper, ADCs were used as discriminatory features
to assess the tumor status, where the malignant tissues show a lower ADC at
different b-values compared with benign and normal tissue due to the replace-
ment of normal tissue [13]. The voxel-wise ADC is computed according to
Eq. (1) to generate the ADC map at each b-value.

    ADC(x, y, z) = ln( S0(x, y, z) / S1(x, y, z) ) / (b1 − b0)          (1)
where S0 and S1 are the signal intensities acquired at the b0 and b1 b-values,
respectively. Then, all ADC maps at a given b-value for all subjects are nor-
malized with respect to the maximum value of all of these maps to make all
calculated ADC maps in the same range (between 0 and 1) in order to use a
unique color coding for all of them. The calculated ADC values are refined using
a generalized Gauss-Markov random field (GGMRF) image model with a 26-
voxel neighborhood to remove any data inconsistency and preserve continuity.
Continuity of the constructed 3D volume is amplified by using their maximum a
posteriori (MAP) estimates. The CDFs of the normalized ADCs of each subject
are constructed. These CDFs are considered as global features distinguishing
between benign and malignant cases. Instead of using the whole ADC volume,
the resultant CDFs are used to train an SNCAE classifier using the deep learning
approach.
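Leaving aside the GGMRF refinement, this feature pipeline (ADC map from two b-values as in Eq. (1), global normalization, fixed-length CDF) can be sketched as follows; the array shapes and signal model are illustrative assumptions, not the paper's data:

```python
import numpy as np

def adc_map(s0, s1, b0, b1):
    """Voxel-wise ADC from signals at two b-values (Eq. 1)."""
    eps = 1e-12  # guard against log of zero
    return np.log((s0 + eps) / (s1 + eps)) / (b1 - b0)

def cdf_features(adc, n_bins=100):
    """100-component CDF of the (already normalized) ADC values."""
    hist, _ = np.histogram(adc.ravel(), bins=n_bins, range=(0.0, 1.0))
    return np.cumsum(hist) / adc.size

rng = np.random.default_rng(0)
s0 = rng.uniform(0.5, 1.0, size=(26, 64, 64))               # b = 0 volume
s1 = s0 * np.exp(-100 * rng.uniform(5e-4, 2e-3, s0.shape))  # b = 100 volume
adc = adc_map(s0, s1, b0=0, b1=100)
adc /= adc.max()             # the paper normalizes across the whole cohort
features = cdf_features(adc)
print(features.shape)        # (100,)
```

Whatever the prostate volume's size, `features` always has 100 components, which is the fixed-size property the text relies on.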
It is worth noting that conventional classification methods, employing
directly the voxel-wise ADCs of the entire prostate volume as discriminative fea-
tures, encounter at least two serious difficulties. Various input data sizes require
unification by either data truncation for large prostate volumes, or zero padding
for small ones. Both ways may decrease the accuracy of the classification. Tech-
niques like bag-of-visual-words (BoVW) can be employed to overcome the diffi-
culty of various input data sizes but the data has to be aligned and the accuracy
of BoVW technique is a function of the data resolution and the size of the bag.
In addition, large ADC data volumes lead to considerable time expenditures for
training and classification. Contrastingly, our SNCAE classifier exploits only the
100-component CDFs to describe the entire 3D ADC maps estimated at each
b-value. This fixed data size helps overcome the above challenges and notably
expedites the classification.

2.2 A Two-Stage Classification

To classify the prostate tumor, our CAD system employs a deep neural network
with two-stage structure of stacked autoencoders (AE). In the first stage, seven
autoencoder-based classifiers, one classifier for each of seven different b-values
(100 to 700 s/mm2 ), are utilized to estimate initial classification probabilities
that are concatenated and fed in the second stage into another SNCAE to esti-
mate the final classification.
Each AE compresses its input data (100-component CDFs at some b-value)
to capture the most prominent variations and is built separately by greedy unsu-
pervised pre-training [14]. A softmax output layer, stacked after AE layers, facili-
tates the subsequent supervised back-propagation-based fine tuning of the entire
classifier by minimizing the total loss (negative log-likelihood) for given training
labeled data. Using the AEs with a non-negativity constraint (NCAE) [15] yields
both more reasonable data codes (features) during its unsupervised pre-training
and better classification performance after the supervised refinement.
For each SNCAE, let W = {Wje , Wid : j = 1, . . . , s; i = 1, . . . , n} denote a
set of column vectors of weights for encoding (e) and decoding (d) layers of a
single AE. Let T denote vector transposition. The AE converts an n-dimensional
column vector u = [u1 , . . . , un ]T of input signals into an s-dimensional column
vector h = [h1 , . . . , hs ]T of hidden codes (features, or activations), such that
s ≪ n, by uniform nonlinear transformation of s weighted linear combinations
of signals:

    h_j = σ((W_j^e)^T u) ≡ σ( Σ_{i=1}^{n} w^e_{j:i} u_i )

where σ(. . .) is a certain sigmoid, i.e., a differentiable monotone scalar function
with values in the range [0, 1].
The classifier is built by stacking the NCAE layers with an output soft-
max layer, that computes a softmax regression, generalizing the common logistic
regression to more than two classes as shown in Fig. 2(a). Each NCAE is pre-
trained separately in the unsupervised mode, by using the activation vector of a
lower layer as the input to the upper layer. In our case, the initial input data uf ;
f = 1, . . . , 7 consisted of the 100-component CDFs, each of size 100. The bottom
NCAE compresses the input vector to s1 = 50 first-level activators, compressed
by the next NCAE to s2 = 5 second-level activators, which are reduced in turn
by the output softmax layer to s◦ = 2 values.
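As a rough illustration of the data flow (not the training), a forward pass through this 100 → 50 → 5 → 2 stack can be sketched in NumPy, with random stand-in weights in place of the pre-trained, fine-tuned NCAE parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# Random stand-ins for the learned encoder and softmax weights:
W1 = rng.standard_normal((100, 50))   # bottom NCAE: 100-bin CDF -> 50
W2 = rng.standard_normal((50, 5))     # second NCAE: 50 -> 5
Wo = rng.standard_normal((5, 2))      # softmax layer: 5 -> 2 classes

u = rng.uniform(size=100)             # one 100-component CDF feature vector
h1 = sigmoid(W1.T @ u)                # first-level activations
h2 = sigmoid(W2.T @ h1)               # second-level activations
p = softmax(Wo.T @ h2)                # class plausibilities, sums to 1
print(p.shape, round(float(p.sum()), 6))  # (2,) 1.0
```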

Fig. 2. Schematic diagram of a two-stage structure of SNCAE.

The activations of the second NCAE layer, h^[2] = σ((W^e_[2])^T h^[1]), are inputs of
the softmax classification layer, as sketched in Fig. 2(a), to compute a plausibility
of a decision in favor of each particular output class, c = 1, 2:

    p(c; W_◦:c) = exp(W_◦:c^T h^[2]) / ( exp(W_◦:1^T h^[2]) + exp(W_◦:2^T h^[2]) );
    c = 1, 2;  Σ_{c=1}^{2} p(c; W_◦:c; h^[2]) = 1.

Finally, the entire stacked NCAE classifier (SNCAE) is fine-tuned on the
labeled training data by the conventional error back-propagation through the
network and penalizing only the negative weights of the softmax layer. The
parameters that specify relative contributions of the non-negativity and sparsity
constraints to the overall loss were chosen based on comparative experiments.
In the second stage, each SNCAE’s output probability is extracted and fused
by concatenation, resulting in a vector of fused probability ut = [g1 , . . . , g14 ]
as shown in Fig. 2(b). To enhance the classification accuracy, this vector (ut ) is
fed into a new SNCAE to estimate the final classification as a class probability
using the following equation:

    p_t(c; W^t_◦:c) = exp((W^t_◦:c)^T g_t) / Σ_{c′=1}^{C} exp((W^t_◦:c′)^T g_t);
    c = 1, 2                                                              (2)
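The fusion stage can be sketched in the same style: the seven per-b-value probability pairs are concatenated into a 14-component vector and pushed through the prediction-stage output layer. For brevity this sketch collapses the second-stage SNCAE to its final softmax, and all weights are placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical outputs of the 7 first-stage SNCAEs (benign/malignant pairs):
stage1_probs = [softmax(rng.standard_normal(2)) for _ in range(7)]
g = np.concatenate(stage1_probs)      # fused vector u_t = [g1, ..., g14]

Wt = rng.standard_normal((14, 2))     # stand-in prediction-stage weights
p_final = softmax(Wt.T @ g)           # final class probability, as in Eq. (2)
label = ("benign", "malignant")[int(p_final.argmax())]
print(g.shape, round(float(p_final.sum()), 6))  # (14,) 1.0
```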

3 Experimental Results

Experiments were conducted on 53 DW-MRI data sets (27 benign and 26 malig-
nant) obtained using a body coil Signa Horizon GE scanner in axial plane with
the following parameters: Magnetic field strength: 1.5 T; TE: 84.6 ms; TR: 8000
ms; Bandwidth: 142.86 kHz; FOV: 34 cm; Slice thickness: 3 mm; Inter-slice gap:
0 mm; Acquisition sequence: conventional EPI; Diffusion weighting directions:
mono direction; the used range of b-values is from 0 to 700 s/mm2 . On average,
26 slices were obtained in 120 s to cover the prostate in each patient with voxel
size of 1.25×1.25×3.00 mm3. The ground truth was obtained on a slice-by-slice
basis by manual segmentation using Slicer
(www.slicer.org). All
annotations are verified by an expert. All the subjects were diagnosed using a
biopsy and the Gleason scores for the malignant cases range from 6 to 8. The
cases were evaluated as a whole and not per tumor.
To learn the statistical characteristics of both benign and malignant subjects,
we trained 7 different SNCAEs, one for each b-value, on the 53 DW-MRI datasets
(27 benign and 26 malignant). All training was done inside leave-one-subject-out
cross-validation framework. The features involved for classification are the CDF
of the normalized ADC maps for 7 different b-values of the segmented prostate
tissue. To assess the accuracy of our system, we perform a leave-one-subject-
out cross-validation test for each AE with the whole 53 datasets. The overall
diagnostic accuracy for different b-values are summarized in Table 1.
In the last stage of the classification, we concatenated the output probabilities
from the 7 AEs. This vector of the fused probabilities is fed into the prediction
stage SNCAE. Our classifier achieves an overall accuracy of 98.11 % for all testing
data sets which is higher than all reported accuracies in Table 1.
To highlight the merit of using the proposed system, a comparison between
our classifier and four other ready-to-use classifiers (K*, K-nearest neighbor,
Random Forest and Random Tree classifiers implemented in Weka toolbox) [16]
is summarized in Table 2. The input features for each of those four classifiers are
the 100-component CDFs. As demonstrated in Table 2, the proposed framework

Table 1. Classification accuracy of our SNCAE classifier at different b-values.

Autoencoder Correct instance Accuracy


SNCAE 1 (b-value = 100) 50 out of 53 94.34 %
SNCAE 2 (b-value = 200) 48 out of 53 90.57 %
SNCAE 3 (b-value = 300) 49 out of 53 92.45 %
SNCAE 4 (b-value = 400) 47 out of 53 88.68 %
SNCAE 5 (b-value = 500) 48 out of 53 90.57 %
SNCAE 6 (b-value = 600) 49 out of 53 92.45 %
SNCAE 7 (b-value = 700) 51 out of 53 96.23 %

Table 2. Classification accuracy, sensitivity, specificity and AUC of our SNCAE clas-
sifier and four ready-to-use Weka classifiers.

Classifier Accuracy Sensitivity Specificity AUC


SNCAE (proposed) 98.11 % 96.15 % 100 % 0.987
K* (K-Star) 94.32 % 94.33 % 94.42 % 0.926
KNN-classifier (IBK) 88.67 % 88.63 % 88.71 % 0.887
Random forest 88.64 % 88.72 % 88.60 % 0.952
Random tree 84.91 % 85.13 % 84.93 % 0.851

Fig. 3. ROC curves for our SNCAE classifier and four ready-to-use Weka classifiers.

outperforms the other alternatives. The corresponding areas under the curve
(AUC) of the receiver operating characteristics of those classifiers are shown in
Fig. 3. The AUC of the proposed classifier approaches 0.987.

4 Conclusions

This paper presented an efficient DW-MRI CAD system for early detection of
prostate cancer. The proposed CAD system used integral statistics (CDFs of the
ADCs) of the segmented prostates to train an SNCAE classifier. The proposed
CAD system was tested on DW-MRI datasets from 53 subjects acquired at
different b-values (100 to 700 s/mm2 ). The leave-one-subject-out experiments
resulted in 98.11 % overall diagnostic accuracy using all b-values. Our future
work will include acquiring data at higher b-values (larger than 700 s/mm2 ) and
increasing the number of training and testing datasets to confirm the accuracy
and robustness of the proposed CAD system.

References
1. Siegel, R.L., Miller, K.D., Jemal, A.: Cancer statistics, 2015. CA: Cancer J. Clin.
65(1), 5–29 (2015)
2. Lawrentschuk, N., Fleshner, N.: The role of magnetic resonance imaging in tar-
geting prostate cancer in patients with previous negative biopsies and elevated
prostate-specific antigen levels. BJU Int. 103(6), 730–733 (2009)
3. Hoeks, C.M., et al.: Prostate cancer: multiparametric MR imaging for detection,
localization, and staging. Radiology 261(1), 46–66 (2011)
4. Tan, C.H., Wang, J., Kundra, V.: Diffusion weighted imaging in prostate cancer.
Eur. Radiol. 21(3), 593–603 (2011)
5. Tamada, T., Sone, T., Jo, Y., Yamamoto, A., Ito, K.: Diffusion-weighted MRI and
its role in prostate cancer. NMR Biomed. 27(1), 25–38 (2014)
6. Lemaı̂tre, G., et al.: Computer-aided detection and diagnosis for prostate cancer
based on mono and multi-parametric MRI: a review. Comput. Biol. Med. 60, 8–31
(2015)
7. Firjani, A., Elnakib, A., Khalifa, F., Gimel’farb, G., El-Ghar, M.A.,
Elmaghraby, A., El-Baz, A.: A diffusion-weighted imaging based diagnostic sys-
tem for early detection of prostate cancer. J. Biomed. Sci. Eng. 6(03), 346 (2013)
8. Chan, I., et al.: Detection of prostate cancer by integration of line-scan diffusion,
T2-mapping and T2-weighted magnetic resonance imaging; a multichannel statis-
tical classifier. Med. Phys. 30(9), 2390–2398 (2003)
9. Litjens, G., Vos, P., Barentsz, J., Karssemeijer, N., Huisman, H.: Automatic com-
puter aided detection of abnormalities in multi-parametric prostate MRI. In: Pro-
ceedings of SPIE Medical Imaging 2011: Computer-Aided Diagnosis, vol. 7963, pp.
79630T–79630T. International Society for Optics and Photonics (2011)
10. Vos, P., Barentsz, J., Karssemeijer, N., Huisman, H.: Automatic computer-aided
detection of prostate cancer based on multiparametric magnetic resonance image
analysis. Phys. Med. Biol. 57(6), 1527 (2012)
11. Hambrock, T., Somford, D.M., Hoeks, C., Bouwense, S.A., Huisman, H., Yakar, D.,
van Oort, I.M., Witjes, J.A., Fütterer, J.J., Barentsz, J.O.: Magnetic resonance
imaging guided prostate biopsy in men with repeat negative biopsies and increased
prostate specific antigen. J. Urol. 183(2), 520–528 (2010)
12. McClure, P., Khalifa, F., Soliman, A., El-Ghar, M.A., Gimelfarb, G.,
Elmagraby, A., El-Baz, A.: A novel NMF guided level-set for DWI prostate seg-
mentation. J. Comput. Sci. Syst. Biol. 7, 209–216 (2014)
13. Le Bihan, D.: Apparent diffusion coefficient and beyond: what diffusion MR imag-
ing can tell us about tissue structure. Radiology 268(2), 318–322 (2013)
14. Bengio, Y., et al.: Greedy layer-wise training of deep networks. Adv. Neural Inf.
Process. Syst. 19, 153 (2007)

15. Hosseini-Asl, E., Zurada, J., Nasraoui, O.: Deep learning of part-based representa-
tion of data using sparse autoencoders with nonnegativity constraints. IEEE Trans.
Neural Netw. Learn. Syst. 99, 1–13 (2015)
16. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The
weka data mining software: an update. ACM SIGKDD Explor. 11(1), 10–18 (2009)
Multidimensional Texture Analysis for Improved
Prediction of Ultrasound Liver Tumor Response
to Chemotherapy Treatment

Omar S. Al-Kadi1,2(B) , Dimitri Van De Ville2,3 , and Adrien Depeursinge2,4


1
King Abdullah II School for Information Technology,
University of Jordan, Amman, Jordan
2
School of Engineering, École Polytechnique Fédérale de Lausanne (EPFL),
Lausanne, Switzerland
omar.alkadi@epfl.ch
3
Department of Radiology and Medical Informatics,
Université de Genève, Geneva, Switzerland
4
University of Applied Sciences Western Switzerland (HES–SO), Sierre, Switzerland

Abstract. The number density of scatterers in tumor tissue contributes
to a heterogeneous ultrasound speckle pattern that can be difficult to discern
by visual observation. Such tumor stochastic behavior becomes even
more challenging if the tumor texture heterogeneity itself is investigated
for changes related to response to chemotherapy treatment. Here we
define a new tumor texture heterogeneity model for evaluating response
to treatment. The characterization of the speckle patterns is performed
via state-of-the-art multi-orientation and multi-scale circular harmonic
wavelet (CHW) frames analysis of the envelope of the radio-frequency
signal. The lacunarity measure – corresponding to scatterer number den-
sity – is then derived from fractal dimension texture maps within the
CHW decomposition, leading to a localized quantitative assessment of
tumor texture heterogeneity. Results indicate that evaluating tumor het-
erogeneity in a multidimensional texture analysis approach could poten-
tially impact on designing an early and effective chemotherapy treat-
ment.

1 Introduction
Ultrasound scanning of liver tumors is increasingly recommended as a first-line
diagnostic option for early prediction of response to chemotherapy treatment [1].
However, visual assessment of tumor response to chemotherapy is very
challenging without monitoring longitudinally the tumor development. This is,
in part, due to the intertwined tumor speckle variations, leading to formation of
complex texture patterns. A robust approach to tackle this texture complexity
is to assess the radio-frequency (RF) echoes – instead of B-mode images – which
are not subjected to log-compression and proprietary filtering algorithms. This
original data preservation allows for better statistical modeling of backscattering
properties.

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 619–626, 2016.
DOI: 10.1007/978-3-319-46720-7 72
620 O.S. Al-Kadi et al.

During the course of chemotherapy treatment, changes within the tumor region
may occur due to progression or regression of the disease. The speckle pattern
variations are heterogeneous, as tumor angiogenesis can affect the complexity of the
tissue spatial relationship. This heterogeneity in tumor tissue scatterers could
span different scatterer densities. Regions within the tumor tissue that respond
to treatment exhibit statistical properties different from those of the non-responding
counterpart [2–4,6]. Thus it is essential to improve the discriminative abilities of
the employed statistical distribution model for better characterization of tumor
heterogeneity.
A number of contributions employed the backscattered statistics from RF
ultrasound signal for evaluating the early death of tumor cells in response to
chemotherapy treatment; such as using the scatterer spacing and diameter, and
acoustic concentration in combination with texture features from the gray-level
co-occurrence matrix [3]. Others relied on calculating the power spectrum from
the Fourier transform of raw RF data, and subsequently deriving the spectral
slope, 0-MHz intercept, and mid-band fit quantitative parameters [2]. In another
similar work, the maximum mean discrepancy features were extracted on his-
tograms of quantitative ultrasound spectroscopic parametric maps [4]. However,
analyzing tissue backscattering properties from a single resolution perspective is
limiting, as substantial information that could assist in tumor tissue characteri-
zation can be hidden at different locations, orientations and scales of resolution.
Refined statistical properties can be obtained from the RF envelope-detected
signal performed in 2D [5], where a fractal-based representation that underpins
a tumor model can be achieved [6].
For an improved ultrasound tissue characterization of tumor texture, the use
of the fractal dimension (FD) as in [6] might not capture well all relevant tissue
changes. Namely, the FD corresponds to the scatterers' spatial distribution and not
scatterer number density, thus it is possible to get similar FD values for tex-
tures which might not look alike. As a result, this could overlook some of the
important aspects of the statistics of the envelope detected from the RF signal.
Tissue heterogeneity property previously proved to be useful in assessing tumor
aggressiveness [7]. Therefore, the variation in scatterer number density within
the tumor focal region – which corresponds to spatial heterogeneity – would be
an interesting property to consider as well.
Here we propose a novel ultrasound texture analysis approach for tissue char-
acterization. The assumption is made that quantifying tumor spatial heterogene-
ity could assist in revealing subtle cues (i.e., small changes in tissue texture) for
tumors that responded to chemotherapy administration. A Nakagami statisti-
cal distribution is fitted locally to the envelope RF signal for estimating the
backscattering parameters. Subsequently, circular harmonic wavelet frames are used to decompose the backscattered shape statistics into different spatial scales and local circular frequencies. Finally, a heterogeneity feature descriptor is successively constructed by mapping the circular harmonic wavelet frames onto the fractal dimension space, and then quantitatively estimating tissue sparsity from
the constructed fractal texture maps.
Multidimensional Texture Analysis for Improved Prediction 621

2 Materials and Methods


Although the backscattered signal from tumor tissue tends to show a stochastic
pattern, the local concentration and spatial arrangement of progressive tumor
tissue scatterers may follow a distribution different from regressive ones. To
maximize the difference between the two conditions, the fractal signatures are
derived from multi-scale circular frequency analysis of the acoustic properties of
the envelope RF signal for assessing tissue heterogeneity.

2.1 Ultrasound RF Data Analysis

The amplitudes of the individual backscattered signals are assumed to be randomly distributed due to the random backscatter coefficient of each individual
scatterer. The interference signals from the large number of randomly distrib-
uted scatterers give the echo signal its statistical nature. Many statistical models
exist for the purpose of characterizing randomness in soft tissue; however, very
few can estimate the model parameters with analytical simplicity and computa-
tional ease. The Nakagami distribution is an example of a simple bi-parametric
model which can characterize tissue in various scattering conditions and scatterer
densities [8]. The Nakagami density function is defined as
$$P_n(x \mid m, \Omega) = \frac{2\,m^{m}}{\Gamma(m)\,\Omega^{m}}\, x^{2m-1}\, e^{-m x^{2}/\Omega}, \qquad (1)$$
for x ≥ 0, where Γ is the Euler gamma function. The real numbers m > 0
(related to the local concentration of scatterers) and Ω > 0 (related to the local
backscattered energy) are called the shape and scaling parameters, respectively.
As with the Rayleigh distribution, the squared RF envelope $x^2$ follows a gamma distribution. By tuning the shape parameter m, other statistical distributions can be modeled, such as an approximation of
the Rician distribution (i.e., post-Rayleigh) for m > 1, a Rayleigh distribution
for the special case when m = 1, and when m < 1 a K-distribution (i.e., pre-
Rayleigh). The envelope-detected RF signal based on the Nakagami m parameter
was used subsequently for investigating tissue heterogeneity.
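As a concrete illustration, the local Nakagami fit of Eq. (1) can be sketched in Python: SciPy ships the Nakagami law (its shape parameter is m, and Ω corresponds to the squared scale), and the moment-based estimators below are a common alternative to maximum likelihood. The simulated envelope data and parameter values are ours, not from the paper's dataset.

```python
import numpy as np
from scipy.stats import nakagami

rng = np.random.default_rng(0)

# Simulated envelope samples from a Nakagami law; m < 1 corresponds to
# the pre-Rayleigh (K-distribution-like) regime discussed above.
# scale**2 plays the role of Omega in Eq. (1).
m_true, omega_true = 0.8, 2.0
env = nakagami.rvs(m_true, scale=np.sqrt(omega_true),
                   size=20000, random_state=rng)

# Moment-based estimators: Omega = E[x^2], m = E[x^2]^2 / Var[x^2]
x2 = env ** 2
omega_hat = x2.mean()
m_hat = omega_hat ** 2 / x2.var()

# Maximum-likelihood alternative via SciPy, with the location pinned at 0
m_ml, _, scale_ml = nakagami.fit(env, floc=0)

print(m_hat, omega_hat, m_ml)
```

Applied in a sliding window over the envelope image, the estimated m values form the parametric map that is decomposed in the next section.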

2.2 Circular Harmonic Wavelets

A natural way of assessing the echo signal f (x, y) is to analyse its statistical
properties at different spatial scales. An efficient way to systematically decom-
pose f (x, y) into successive dyadic scales is to use a redundant isotropic wavelet
transform. The recursive projection of f on Simoncelli’s isotropic wavelet pro-
vides such a frame representation [9]. The radial profile of the wavelet function
is defined in polar coordinates in the Fourier domain as
$$\hat{h}(\rho) = \begin{cases} \cos\!\left(\frac{\pi}{2}\log_{2}\frac{2\rho}{\pi}\right), & \frac{\pi}{4} < \rho \le \pi \\ 0, & \text{otherwise.} \end{cases} \qquad (2)$$

The scaling function is omitted so as to ensure illumination invariance. The local
structural properties of f (x, y) can be well described in terms of the local circular
frequencies, which was at the origin of the success of methods such as local binary
patterns (LBP) [10]. In this work, local circular harmonics are computed on top
of the wavelet frames to characterize circular frequencies at multiple scales [11],
which is an extension of steerable Riesz wavelets [12]. Circular harmonic wavelets
(CHW) of order n are constructed in the Fourier domain as

φ̂(n) (ρ, ϕ) = ĥ(ρ)ejnϕ . (3)

The representation obtained from the collection of the complex magnitudes of the scalar products $|\langle f, \phi^{(n)}\rangle|$ characterizes the local circular frequencies in f(x, y) up to an order n = 1 : N and is rotation invariant [13].
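A minimal numerical sketch of such a CHW analysis filter, assuming the radial profile of Eq. (2) is sampled directly on the discrete Fourier grid of an image (function and variable names are ours, and dyadic scaling is emulated by dilating ρ):

```python
import numpy as np

def chw_filter(shape, n, j=0):
    """Circular harmonic wavelet of order n at dyadic scale j, built in
    the Fourier domain as h_hat(2**j * rho) * exp(1j * n * phi)."""
    h, w = shape
    wy, wx = np.meshgrid(np.fft.fftfreq(h) * 2 * np.pi,
                         np.fft.fftfreq(w) * 2 * np.pi, indexing="ij")
    rho = np.hypot(wx, wy) * 2 ** j      # dilating rho selects coarser bands
    phi = np.arctan2(wy, wx)
    # Simoncelli radial profile (Eq. 2): support (pi/4, pi]
    band = (rho > np.pi / 4) & (rho <= np.pi)
    radial = np.where(band,
                      np.cos(np.pi / 2 *
                             np.log2(np.maximum(rho, 1e-12) * 2 / np.pi)),
                      0.0)
    return radial * np.exp(1j * n * phi)

def chw_response(img, n, j=0):
    """Complex magnitude of the CHW coefficients at every pixel."""
    return np.abs(np.fft.ifft2(np.fft.fft2(img) * chw_filter(img.shape, n, j)))
```

Because the filter vanishes at the zero frequency (the scaling function is dropped), a constant image yields a zero response, which is the illumination invariance noted above.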

2.3 Heterogeneity Quantification


After the projection of f on CHW frames, the self-similarity of each voxel from its
surrounding neighborhood is determined via estimating its localized FD. This
would serve as an estimated array of localized FD values (i.e. fractal texture
map) for a multi-dimensional representation of tissue heterogeneity.

Fractal Texture Map Estimation. There are several methods to estimate the FD; however, multiplicative speckle scale changes can affect the stability of parameter estimation. The fractional Brownian motion (fBm), which is known for its
capability for describing random phenomena [14], can work well with ultrasound
tissue characterization [6]. Both scale- and rotation-invariance properties of the
non-stationary fBm model make it a perfect candidate to be integrated with the
Nakagami modeling and CHW decomposition. The fBm can be characterized by

$$E(\Delta v) = K\,\Delta r^{H}, \qquad (4)$$


where $E(\Delta v) = |q - p|$ is the mean absolute difference of voxel pairs $\Delta v$; $\Delta r = \sqrt{\sum_{i=1}^{s}(q_i - p_i)^2}$ (with s = 3 for an fBm texture surface enveloped in 3-D space) is the voxel pair distance; H is the Hurst exponent; and K > 0 is a constant.
Given a volume set V of constructed envelope-detected RF tumor regions $f_i^{\mu}(x, y)$, where μ stands for the Nakagami shape parameter and i is a certain slice in the acquired volume, tissue fractal characteristics from the backscattered envelope are investigated. A fractal texture map F, of size m × n and for k dimensions, can be defined as in (5) based on the CHW frames for all corresponding voxels $v_{xy}^{k}$ of $f_i^{\mu}(x, y)$, $f \in V$. The value of k empirically specifies the maximum convolution kernel size I used in estimating Δv of (4). The slope of the linear regression line of the log-log plot of (4) gives H, from which the localized fractal dimension is estimated ($FD = 3 - H$). This procedure is iterated over all $v_{xy}^{k}$, yielding a set of multi-dimensional fractal texture maps $M_f = \{F_1, F_2, \ldots, F_z\}$ constructed for each V, where z is the total number of maps F in $M_f$.
$$\mathbf{F}^{(N,J)}\{f\}(x, y) = \begin{pmatrix} v_{11}^{k} & v_{12}^{k} & \cdots & v_{1y}^{k} & \cdots & v_{1n}^{k}\\ v_{21}^{k} & v_{22}^{k} & \cdots & v_{2y}^{k} & \cdots & v_{2n}^{k}\\ \vdots & \vdots & \ddots & \vdots & & \vdots\\ v_{x1}^{k} & v_{x2}^{k} & \cdots & v_{xy}^{k} & \cdots & v_{xn}^{k}\\ \vdots & \vdots & & \vdots & \ddots & \vdots\\ v_{m1}^{k} & v_{m2}^{k} & \cdots & v_{my}^{k} & \cdots & v_{mn}^{k} \end{pmatrix} \qquad (5)$$
The integration of the fBm model at different CHW orders N and scales J can contribute towards a better separability between the mixtures of speckle patterns within tissue: Δv and Δr locally estimate the FD of each $v_{xy}^{k}$ up to the resolution limits of $f_i^{\mu}$ specified by the k that best characterizes the speckle patterns for different scales and circular frequencies. This approach enables further probing of the resolution of the CHW frames, and hence facilitates assessing the speckle pattern heterogeneity.
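The localized Hurst/FD estimate of Eq. (4) can be sketched as a log-log regression over a small window; this toy version uses only horizontal and vertical voxel-pair lags in 2-D and is our simplification of the full procedure:

```python
import numpy as np

def local_fd(patch, max_r=4):
    """Localized FD for one 2-D patch: regress log E|v(p) - v(q)| on
    log |p - q| over small lags (Eq. 4), then FD = 3 - H."""
    radii, diffs = [], []
    for r in range(1, max_r + 1):
        # mean absolute intensity difference over horizontal and vertical lags
        d = np.concatenate([np.abs(patch[:, r:] - patch[:, :-r]).ravel(),
                            np.abs(patch[r:, :] - patch[:-r, :]).ravel()])
        radii.append(r)
        diffs.append(d.mean())
    H, _ = np.polyfit(np.log(radii), np.log(diffs), 1)
    return 3.0 - H
```

As a sanity check, a smooth linear ramp gives H near 1 (FD near 2), while uncorrelated noise gives H near 0 (FD near 3), matching the rough/smooth interpretation of the fractal dimension.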

Lacunarity Analysis. To further differentiate between textures having similar FD values, the lacunarity (L), a coefficient of variation that measures the sparsity of the fractal texture, can assist in quantifying aspects of patterns that
sparsity of the fractal texture – can assist in quantifying aspects of patterns that
exhibit scale-dependent changes in structure [15]. Namely, it measures the het-
erogeneity of the fractal pattern, providing meta-information about the dimen-
sion of F. The lower the L value, the more heterogeneous the examined tumor
region fiμ represented by F, and vice versa. L can be defined in terms of the
relative variance of the size distribution in F as
$$L = \frac{\frac{1}{mn}\sum_{x}\sum_{y}\mathbf{F}^{2} - \left(\frac{1}{mn}\sum_{x}\sum_{y}\mathbf{F}\right)^{2}}{\left(\frac{1}{mn}\sum_{x}\sum_{y}\mathbf{F}\right)^{2}} = \frac{E[\mathbf{F}^{2}] - E^{2}[\mathbf{F}]}{E^{2}[\mathbf{F}]} = \frac{Var[\mathbf{F}]}{E^{2}[\mathbf{F}]}. \qquad (6)$$
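In code, Eq. (6) reduces to a one-liner over a fractal texture map F (a sketch, with the map supplied as any NumPy array):

```python
import numpy as np

def lacunarity(F):
    """Eq. (6): relative variance of the fractal texture map,
    L = Var[F] / E[F]^2."""
    F = np.asarray(F, dtype=float)
    return F.var() / F.mean() ** 2
```

A constant map has zero lacunarity, and the value grows as the distribution of FD values in the map becomes more dispersed relative to its mean.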

3 Results and Discussion


3.1 Clinical Tumor Cross-Sectional Dataset
The approach has been validated on RF ultrasound data acquired using a diag-
nostic ultrasound system (Zonare Medical Systems, Mountain View, CA, USA)
4 MHz curvilinear transducer and 11 MHz sampling. The output 2-D image size
was 65 × 160 mm with a resolution of 225 × 968 pixels. A total of 287 cross-sectional images of 33 manually segmented volumetric liver tumors were obtained from the stacks of 2-D images; 117 were responsive and 170 were non-responsive to chemotherapy treatment. Response to treatment was determined based on
conventional computed tomography follow up imaging as part of the patient
standard clinical care based on the response evaluation criteria in solid tumors
(RECIST) [16]. The baseline cross-sectional imaging was compared against those
performed at the end of treatment according to the RECIST criteria to determine response to treatment for each target tumor. A tumor was classified as responsive if categorized as partial response, and as non-responsive if it showed no change or demonstrated disease progression.

3.2 Statistical Analysis


To quantitatively assess the robustness of our approach, 2 (N + 1) × J features,
where 2 stands for both the average F and L estimated at each N and J per
slice of each of the acquired volumes, were fed into a support vector machine
classifier to compare the overall modeling performance of classifying responsive
versus non-responsive cases. Cross-validation was performed based on a leave-one-tumor-out (loo) approach, and further validated on an independent test set of 107 cross-sectional images (69 responsive versus 38 non-responsive). The convolution kernel size I used in estimating the localized FD of F was initially optimized while keeping N and J fixed (see Fig. 1). The classification performance for different N and J values of the CHW representation was then investigated in order to quantify L extracted from F. With the optimized values of N, J and I, a best classification accuracy of 97.91 % (97.2 % on unseen data) is achieved, compared to 92.1 % in the work of [6]; the same holds for the 5- and 10-fold cross-validation results (reported as mean ± standard deviation of the performance over 60 runs).
corresponding L for a non-responsive vs responsive case. A less heterogeneous
texture (i.e. higher L values colored in red in Fig. 2) is witnessed in the respon-
sive case. This indicates tumor tissue texture is becoming more sparse, which
could be signs of necrotic regions, and hence responding to treatment (Table 1).
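The leave-one-tumor-out protocol can be emulated with scikit-learn's LeaveOneGroupOut, grouping slices by tumor so that no tumor contributes to both training and testing; the synthetic features, class shift, and group counts below are placeholders, not the paper's data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

# Toy stand-ins for the 2(N+1) x J per-slice feature vectors; 'groups'
# ties every slice to its tumor so that all slices of one tumor are held
# out together, mirroring the leave-one-tumor-out protocol.
n_slices, n_feats = 120, 14
X = rng.standard_normal((n_slices, n_feats))
y = (rng.random(n_slices) > 0.5).astype(int)   # responsive vs non-responsive
X[y == 1] += 1.5                               # make the toy classes separable
groups = rng.integers(0, 30, size=n_slices)    # 30 hypothetical tumors

scores = cross_val_score(SVC(kernel="rbf"), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print(scores.mean())
```

Grouping by tumor matters because slices from the same tumor are highly correlated; plain per-slice folds would leak information and inflate accuracy.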

Fig. 1. Classification accuracies for varying convolution kernel size (I) in pixels with
fixed order (N = 2) and scale (J = 6)

Characterizing the speckle patterns in terms of a multi-scale circular harmonic representation could assist in better modeling of the backscattered signal, which adapts according to the varying nature of tissue structure. As changes in the scatterers' spatial distribution and number density are reflected in the ultrasound backscattering, the sensitivity of the envelope RF signal to treatment response is implicitly linked to changes in FD and the associated L on the CHW frames.
Finally, tumors with varying depth would decrease the amplitude of the RF
data, and quantifying the tumor response to chemotherapy treatment under
such conditions is planned for future work.

Table 1. Classification performance for the multidimensional heterogeneity analysis of the clinical liver tumor dataset

Statistical measures   Cross-validation
                       loo     5-fold          10-fold
Accuracy               97.91   93.30 ± 0.017   95.70 ± 0.009
Sensitivity            98.80   96.40 ± 0.888   97.50 ± 0.931
Specificity            96.60   88.80 ± 0.964   93.10 ± 0.975
ROC-AUC                97.70   92.60 ± 0.020   95.30 ± 0.009

Fig. 2. (1st column) Tumor B-mode images, (2nd column) fractal texture maps and
(3rd column) corresponding tissue heterogeneity representation for a (1st row) non-
responsive vs (2nd row) responsive case, respectively. Red regions in (c) and (f) indicate
response to treatment according to RECIST criteria [16]. CHW decomposition was
based on a 2nd order and up to the 8-th scale.

4 Conclusion

A novel approach has been presented for quantifying liver tumor response to
chemotherapy treatment, with three main contributions: (a) ultrasound liver tumor texture analysis is based on a Nakagami distribution model for analyzing the envelope RF data, which is important for retaining enough information; (b) a set of CHW
frames are used to define a new tumor heterogeneity descriptor that is charac-
terized at multi-scale circular harmonics of the ultrasound RF envelope data; (c)
the heterogeneity is specified by the lacunarity measure, which is viewed as the
size distribution of gaps on the fractal texture of the decomposed CHW coeffi-
cients. Finally the measurement of heterogeneity for the proposed representation
model is realized by means of support vector machines.

Acknowledgments. We would like to thank Dr. Daniel Y.F. Chung for providing the
ultrasound dataset. This work was partially supported by the Swiss National Science
Foundation (grant PZ00P2 154891) and the Arab Fund (grant 2015-02-00627).

References
1. Bae, Y.H., Mrsny, R., Park, K.: Cancer Targeted Drug Delivery: An Elusive Dream,
pp. 689–707. Springer, New York (2013)
2. Sadeghi-Naini, A., Papanicolau, N., Falou, O., Zubovits, J., Dent, R., Verma, S.,
Trudeau, M., Boileau, J.F., Spayne, J., Iradji, S., Sofroni, E., Lee, J.,
Lemon-Wong, S., Yaffe, M., Kolios, M.C., Czarnota, G.J.: Quantitative ultrasound
evaluation of tumor cell death response in locally advanced breast cancer patients
receiving chemotherapy. Clin. Cancer Res. 19(8), 2163–2174 (2013)
3. Tadayyon, H., Sadeghi-Naini, A., Wirtzfeld, L., Wright, F.C., Czarnota, G.: Quan-
titative ultrasound characterization of locally advanced breast cancer by estimation
of its scatterer properties. Med. Phys. 41, 012903 (2014)
4. Gangeh, M.J., Sadeghi-Naini, A., Diu, M., Kamel, M.S., Czarnota, G.J.: Cate-
gorizing extent of tumour cell death response to cancer therapy using quantita-
tive ultrasound spectroscopy and maximum mean discrepancy. IEEE Trans. Med.
Imaging 33(6), 268–272 (2014)
5. Wachinger, C., Klein, T., Navab, N.: The 2D analytic signal for envelope detection
and feature extraction on ultrasound images. Med. Image Anal. 16(6), 1073–1084
(2012)
6. Al-Kadi, O.S., Chung, D.Y., Carlisle, R.C., Coussios, C.C., Noble, J.A.: Quantifi-
cation of ultrasonic texture intra-heterogeneity via volumetric stochastic modeling
for tissue characterization. Med. Image Anal. 21(1), 59–71 (2015)
7. Al-Kadi, O.S., Watson, D.: Texture analysis of aggressive and non-aggressive lung
tumor CE CT images. IEEE Trans. Bio-med. Eng. 55(7), 1822–1830 (2008)
8. Shankar, P.M.: A general statistical model for ultrasonic backscattering from tissues. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 47(3), 727–736 (2000)
9. Portilla, J., Simoncelli, E.P.: A parametric texture model based on joint statistics
of complex wavelet coefficients. Int. J. Comput. Vis. 40, 49–70 (2000)
10. Ojala, T., Pietikänen, M., Mäenpää, T.: Multiresolution gray-scale and rotation
invariant texture classification with local binary patterns. IEEE Trans. Pattern
Anal. Mach. Intell. 24, 971–987 (2002)
11. Unser, M., Chenouard, N.: A unifying parametric framework for 2D steerable
wavelet transforms. SIAM J. Imaging Sci. 6(1), 102–135 (2013)
12. Unser, M., Van De Ville, D.: Wavelet steerability and the higher-order Riesz trans-
form. IEEE Trans. Image Process. 19(3), 636–652 (2010)
13. Depeursinge, A., Püspöki, Z., et al.: Steerable wavelet machines (SWM): learning
moving frames for texture classification. IEEE Trans. Image Process. (submitted)
14. Lopes, R., Betrouni, N.: Fractal and multifractal analysis: a review. Med. Image
Anal. 13(4), 634–649 (2009)
15. Plotnick, R.E., Gardner, R.H., Hargrove, W.W., Prestegaard, K., Perlmutter, M.:
Lacunarity analysis: a general technique for the analysis of spatial patterns. Phys.
Rev. E 53(5), 5461–5468 (1996)
16. Eisenhauer, E.A., Therasse, P., et al.: New response evaluation criteria in solid
tumours: revised RECIST guideline. Eur. J. Cancer 45(2), 228–247 (2009)
Classification of Prostate Cancer Grades
and T-Stages Based on Tissue Elasticity
Using Medical Image Analysis

Shan Yang(B) , Vladimir Jojic, Jun Lian, Ronald Chen, Hongtu Zhu,
and Ming C. Lin

University of North Carolina at Chapel Hill, Chapel Hill, USA


alexyang@cs.unc.edu
http://gamma.cs.unc.edu/CancerClass

Abstract. In this paper, we study the correlation of tissue (i.e. prostate)


elasticity with the spread and aggression of prostate cancers. We describe
an improved, in-vivo method that estimates the individualized, relative
tissue elasticity parameters directly from medical images. Although elasticity reconstruction, or elastography, can be used to estimate tissue elasticity, it is less suited for in-vivo measurements or deeply-seated organs
like prostate. We develop a non-invasive method to estimate tissue elas-
ticity values based on pairs of medical images, using a finite-element
based biomechanical model derived from an initial set of images, local
displacements, and an optimization-based framework. We demonstrate
the feasibility of a statistically-based multi-class learning method that
classifies a clinical T-stage and Gleason score using the patient’s age and
relative prostate elasticity values reconstructed from computed tomog-
raphy (CT) images.

1 Introduction
Currently, screening for prostate cancer is usually performed through routine
prostate-specific antigen (PSA) blood tests and/or a rectal examination. Based
on positive PSA indication, a biopsy of randomly sampled areas of the prostate
can then be considered to diagnose the cancer and assess its aggressiveness.
Biopsy may miss sampling cancerous tissues, resulting in missed or delayed diag-
nosis, and miss areas with aggressive cancers, thus under-staging the cancer and
leading to under-treatment.
Studies have shown that the tissue stiffness described by the tissue properties
may indicate abnormal pathological process. Ex-vivo, measurement-based meth-
ods, such as [1,11] using magnetic resonance imaging (MRI) and/or ultrasound,
were proposed for study of prostate cancer tissue. However, previous works in
material property reconstruction often have limitations with respect to their
genericity, applicability, efficiency and accuracy [22]. More recent techniques,
such as inverse finite-element methods [6,13,17,21,22], stochastic finite-element
methods [18], and image-based ultrasound [20] have been developed for in-vivo
soft tissue analysis.

c Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 627–635, 2016.
DOI: 10.1007/978-3-319-46720-7 73

In this paper, we study the possible use of tissue (i.e. prostate) elasticity to
help evaluate the prognosis of prostate cancer patients given at least two sets of
CT images. The clinical T-stage of a prostate cancer is a measure of how much
the tumor has grown and spread; while a Gleason score based on the biopsy of
cancer cells indicates aggressiveness of the cancer. They are commonly used for
cancer staging and grading. We present an improved method that uses geomet-
ric and physical constraints to deduce the relative tissue elasticity parameters.
Although elasticity reconstruction, or elastography, can be used to estimate tis-
sue elasticity, it is less suited for in-vivo measurements or deeply seated organs
like prostate. We describe a non-invasive method to estimate tissue elasticity
values based on pairs of CT images, using a finite-element based biomechan-
ical model derived from an initial set of images, local displacements, and an
optimization-based framework.
Given the recovered tissue properties reconstructed from analysis of medical images and patients' ages, we develop a multiclass classification system for
classifying clinical T-stage and Gleason scores for prostate cancer patients. We
demonstrate the feasibility of a statistically-based multiclass classifier that clas-
sifies a supplementary assessment on cancer T-stages and cancer grades using
the computed elasticity values from medical images, as an additional clinical
aids for the physicians and patients to make more informed decision (e.g. more
strategic biopsy locations, less/more aggressive treatment, etc.). Concurrently,
extracted image features [8–10] using dynamic contrast enhanced (DCE) MRI
have also been suggested for prostate cancer detection. These methods are com-
plementary to ours and can be used in conjunction with ours as a multimodal
classification method to further improve the overall classification accuracy.

2 Method

Our iterative simulation-optimization-identification framework consists of two alternating phases: the forward simulation to estimate the tissue deformation
and inverse process that refines the tissue elasticity parameters to minimize the
error in a given objective function. The input to our framework is two sets of 3D
images. After iterations of the forward and inverse processes, we obtain the best
set of elasticity parameters. Below we provide a brief overview of the key steps
in this framework and we refer the interested readers to the supplementary doc-
ument at http://gamma.cs.unc.edu/CancerClass/ for the detailed mathematical
formulations and algorithmic process to extract the tissue elasticity parameters
from medical images.

2.1 Forward Simulation: BioTissue Modeling

In our system, we apply the Finite Element Method (FEM) and adopt a Mooney-Rivlin material for bio-tissue modeling [3]. After discretization using FEM, we arrive at a linear system,

Ku = f (1)
with K as the stiffness matrix, u as the displacement field, and f as the external forces. The stiffness matrix K is not always symmetric positive definite due to the complicated boundary conditions. The boundary conditions we applied are the traction forces (shown in Fig. 7(a) of the supplementary document) computed based on the displacement of the surrounding tissue (overlapping surfaces shown in Fig. 7(b) of the supplementary document). We choose the Generalized Minimal Residual (GMRES) [16] solver to solve the linear system instead of the Generalized Conjugate Gradient (GCG) [14], as GMRES copes better with non-symmetric, positive-definite linear systems.
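The solver choice can be sketched with SciPy's GMRES on a toy non-symmetric system standing in for K u = f; the 3×3 matrix is illustrative only, not an actual FEM stiffness matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import gmres

# Toy non-symmetric stand-in for the FEM stiffness system K u = f;
# a real K would come from the discretized Mooney-Rivlin model.
K = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [0.5, 3.0, 1.0],
                         [0.0, 2.0, 5.0]]))
f = np.array([1.0, 2.0, 3.0])

u, info = gmres(K, f)          # info == 0 signals convergence
print(info, np.linalg.norm(K @ u - f))
```

GMRES only requires matrix-vector products, so the same call scales to the large sparse stiffness matrices produced by FEM assembly.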
The computation of the stiffness matrix K in Eq. 1 depends on the energy function Ψ of the Mooney-Rivlin material model [15,19]:
$$\Psi = \tfrac{1}{2}\mu_{1}\left(\left(I_{1}^{2} - I_{2}\right)/I_{3}^{2/3} - 6\right) + \mu_{2}\left(I_{1}/I_{3}^{1/3} - 3\right) + v_{1}\left(I_{3}^{1/2} - 1\right)^{2}, \qquad (2)$$
where μ1, μ2 and v1 are the material parameters. In this paper, we recover parameters μ1 and μ2. Since prostate soft tissue (without tumors) tends to be homogeneous, we use the average μ̄ of μ1 and μ2 as our recovered elasticity parameter. To model incompressibility, we set v1 to be a very large value (1e7 was used in our implementation). v1 is linearly related to the bulk modulus; the larger the bulk modulus, the more incompressible the object.
Relative Elasticity Value: In addition, we divide the recovered absolute elasticity parameter μ̄ by that of the surrounding tissue to compute the relative elasticity parameter μ̂. This individualized relative value helps to remove the
variation in mechanical properties of tissues between patients, normalizing the
per-patient fluctuation in absolute elasticity values due to varying degrees of
hydration and other temporary factors. We refer readers to our supplementary
document for details regarding non-linear material models.

2.2 Inverse Process: Optimization for Parameter Identification


To estimate the patient-specific relative elasticity, our framework minimizes the
error due to approximated parameters in an objective function. Our objective
function, as defined in Eq. 3, consists of two components. The first part is
the difference between the two surfaces – one reconstructed from the reference
(initial) set of images, deformed using FEM simulation with the estimated para-
meters toward the target surface, and one target surface reconstructed from the
second set of images. This difference is measured by the Hausdorff distance [4].
In addition, we add a Tikhonov regularization [5,7] term, which improves the
conditioning of a possibly ill-posed problem.
With regularization, our objective function is given as:

$$\mu = \operatorname*{argmin}_{\mu}\; d(S_{l}, S_{t})^{2} + \lambda\,\Gamma S_{l}, \qquad (3)$$

with d(Sl, St) as the distance between the deformed surface Sl and the target surface St, λ as the regularization weight, and Γ as the second-order differential operator.
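The data term's Hausdorff distance has a direct SciPy counterpart; a symmetric wrapper over the two directed distances might look like the following, where the vertex arrays are assumed to be n×3 point sets sampled from the surfaces:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(Sl, St):
    """Symmetric Hausdorff distance between two surface vertex sets
    (n x 3 arrays), as used in the data term d(Sl, St) of Eq. (3)."""
    return max(directed_hausdorff(Sl, St)[0],
               directed_hausdorff(St, Sl)[0])
```

Taking the maximum of both directed distances makes the measure symmetric, so neither surface can "hide" unmatched regions from the objective.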

The second-order differential operator Γ on a continuous surface (a 2-manifold) S is given by the curvatures at a point on the surface. The curvature is defined through the tangent plane passing through that point. We denote the normal vector of the tangent plane as n and the unit direction in the tangent plane as eθ. The curvature along the unit direction eθ is κ(θ). The mean curvature κmean for a continuous surface is defined as the average curvature over all directions, $\kappa_{mean} = \frac{1}{2\pi}\int_{0}^{2\pi}\kappa(\theta)\,d\theta$. In our implementation, we use a triangle mesh to
approximate a continuous surface. We use the 1-ring neighbor as the region for
computing the mean curvature normal on our discrete surface Sl . We treat each
triangle of the mesh as a local surface with two conformal space parameters u
and v. With these two parameters, the second-order differential operator Γ on a vertex x is $\Delta_{u,v}\, x = x_{uu} + x_{vv}$.
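On a triangle mesh, one simple discrete stand-in for this second-order operator is the uniform "umbrella" operator over the 1-ring; this sketch uses a plain adjacency dict rather than a real mesh structure, and is our illustration rather than the paper's exact discretization:

```python
import numpy as np

def umbrella_operator(verts, one_ring):
    """Uniform 1-ring ('umbrella') approximation of the second-order
    operator at each listed vertex: mean of the 1-ring neighbours minus
    the vertex itself. verts is (n, 3); one_ring maps vertex -> ids."""
    out = np.zeros_like(verts)
    for i, ring in one_ring.items():
        out[i] = verts[list(ring)].mean(axis=0) - verts[i]
    return out
```

The operator vanishes on flat regions and grows with local bending, which is exactly the behaviour a curvature-based Tikhonov regularizer needs.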

2.3 Classification Methods

For classification of cancer prognostic scores, we develop a learning method to


classify patient cancer T-Stage and Gleason score based on the relative elasticity
parameters recovered from CT images. Both the prostate cancer T-stage and the
Gleason score are generally considered as ordinal responses. We study the effec-
tiveness of ordianl logistic regression [2] and multinomial logistic regression [12]
in the context of prostate cancer staging and grading. For both cases we use
RBF kernel to project our feature to higher dimentional space. We refer readers
to supplementary document for method details and the comparison with the
Random Forests method.
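One way to realize "RBF kernel plus multinomial logistic regression" with scikit-learn is to use the training-set kernel matrix as the feature map; the toy three-class data below is illustrative, not the patient cohort:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)

# Toy 2-D features (relative elasticity, age), three hypothetical classes
# standing in for T1/T2/T3; none of this is the actual patient data.
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 40)

# The RBF kernel against the training set acts as the nonlinear feature
# map; a softmax (multinomial) logistic regression is fitted on top.
K = rbf_kernel(X, X, gamma=1.0)
clf = LogisticRegression(max_iter=1000).fit(K, y)
acc = clf.score(K, y)
print(acc)
```

An ordinal variant would replace the softmax with cumulative-link thresholds over a single latent score, which is the distinction the authors evaluate.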

3 Patient Data Study

3.1 Preprocessing and Patient Dataset

Given the CT images (shown in Fig. 1a) of the patient, the prostate, bladder
and rectum are first segmented in the images. Then the 3D surfaces (shown in


Fig. 1. Real Patient CT Image and Reconstructed Organ Surfaces. (a) shows one slice of the patient CT images with the bladder, prostate and rectum segmented. (b) shows the reconstructed organ surfaces.

Fig. 1b) of these organs are reconstructed using VTK and these surfaces would
be the input to our elasticity parameter reconstruction algorithm. Our patient
dataset contains 113 (29 as the reference and 84 as target) sets of CT images
from 29 patients, each patient having 2 to 15 sets of CT images. Every patient
in the dataset has prostate cancer with cancer T-stage ranging from T1 to T3,
Gleason score ranging from 6 to 10, and age from 50 to 85. Gleason scores are
usually used to assess the aggressiveness of the cancer.

3.2 Cancer Grading/Staging Classification Based on Prostate


Elasticity Parameters

We further study the feasibility of using recovered elasticity parameters as a


cancer prognostic indicator using our classifier based on relative tissue elastic-
ity values and ages. Two classification methods, ordinal logistic regression and
multinomial logistic regression, were tested in our study. We test each method
with two sets of features. The first set of features contains only the relative tissue
elasticity values μ̂. The resultant feature vector is one dimension. The second set
of features contains both the relative tissue elasticity values and the age. The
feature vector for this set of features is two dimensional. Our cancer staging has C = 3 classes (T1, T2, and T3), and the cancer grading has G = 5 classes, from 6 to 10. In our patient dataset, each patient has at least 2 sets of CT images.
The elasticity parameter reconstruction algorithm needs 2 sets of CT images as
input. We fix one set of CT images as the initial (reference) image and use the
other M images T, where |T| = M, as the target (deformed) images.
By registering the initial image to the target images, we obtain one elasticity
parameter μ̂i , i = 1 . . . M for each image in T . We perform both per-patient and
per-image cross validation.
Per-Image Cross Validation: We treat all the target images (N = 84) of
all the patients as data points of equal importance. The elasticity feature for
each target image is the recovered elasticity parameter μ̂. In this experiment, we
train our classifier using the elasticity feature of the 83 images then cross validate
with the one left out. Then, we add the patient’s age as another feature to the
classifier and perform the validation. The results for cancer staging (T-Stage)
classification are shown in Fig. 2a and that for cancer grading (Gleason score)
classification are shown in Fig. 2b. The error metric is measured as the absolute
difference between the classified cancer T-Stage and the actual cancer T-Stage.
Zero error-distance means our classifier accurately classifies the cancer T-Stage.
The multinomial method outperforms the ordinal method for both cancer staging (T-Stage) and cancer aggression (Gleason score) classification. The main reason we observe this is the dimension of the optimization weights, or unknown regression coefficients β (refer to the supplementary document for the definition), of the multinomial and ordinal logistic regression methods.
The dimension of the unknown regression coefficients of the multinomial logistic
regression for cancer staging classification (with elasticity parameter and age as
features) is 6 while that of ordinal logistic regression is 4. With the ‘age’ feature,


Fig. 2. Error Distribution of Cancer Grading/Staging Classification for Per-


Image Study. (a) shows error distribution of our cancer staging classification using the
recovered prostate elasticity parameter and the patient’s age. For our patient dataset,
the multinomial classifier (shown in royal blue and sky blue) outperforms the ordi-
nal classifier (shown in crimson and coral). We achieve up to 91 % accuracy using
multinomial logistic regression and 89 % using ordinal logistic regression for classify-
ing cancer T-Stage based on recovered elasticity parameter and age. (b) shows the
correlation between the recovered relative elasticity parameter and the Gleason score
with/without the patient’s age. We achieve up to 88 % accuracy using multinomial
logistic regression and 81 % using ordinal logistic regression for classifying Gleason
score based on recovered elasticity parameter and age.

we obtain up to 91 % accuracy for predicting cancer T-Stage using multinomial
logistic regression and 89 % using ordinal logistic regression. For Gleason score
classification, we achieve up to 88 % accuracy using multinomial logistic
regression and 81 % using ordinal logistic regression.
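The coefficient-dimension argument can be checked with a small calculation. Assuming K = 3 stage classes (inferred from the dimensions quoted above) and p = 2 features (elasticity and age), a multinomial model fits (K − 1)(p + 1) coefficients, while a proportional-odds ordinal model fits p shared slopes plus K − 1 thresholds:

```python
# Coefficient counts for multinomial vs. proportional-odds ordinal logistic
# regression. K = 3 classes is inferred from the dimensions quoted in the text.
def multinomial_dim(n_classes: int, n_features: int) -> int:
    # (K - 1) logit equations, each with its own slopes and intercept
    return (n_classes - 1) * (n_features + 1)

def ordinal_dim(n_classes: int, n_features: int) -> int:
    # one shared slope per feature, plus K - 1 cut-point intercepts
    return n_features + (n_classes - 1)

print(multinomial_dim(3, 2))  # 6, matching the multinomial dimension above
print(ordinal_dim(3, 2))      # 4, matching the ordinal dimension above
```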
Per-Patient Cross Validation: For patients with more than 2 sets of images,
we apply Gaussian sampling to μ̂i , i = 1 . . . M to compute the sampled elasticity
parameter as the elasticity feature of the patient. We first train our classifier
using the elasticity features of 28 patients and then test the trained classifier on
the one remaining patient not in the training set. We repeat this process for each
of the 29 patients. Then we include the patient age as another feature in the
classifier. The error distribution for cancer staging (T-Stage) classification is
shown in Fig. 3a and that for cancer grading (Gleason score) classification in
Fig. 3b. We observe that the multinomial method in
general outperforms the ordinal method. More interestingly, the age feature helps
to increase the classification accuracy by 2 % for staging classification and 7 % for
Gleason score classification. With the age feature, our multinomial classifier
achieves up to 84 % accuracy for classifying cancer T-Stage and up to 77 %
accuracy for classifying Gleason scores, while our ordinal classifier achieves up to
82 % for cancer T-Stage classification and 70 % for Gleason score classification.
The drop in accuracy for the per-patient experiments compared with the per-image
ones is primarily due to the smaller number of data samples.
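One plausible reading of the per-patient feature construction above is to fit a Gaussian to a patient's per-image estimates μ̂i and draw a sampled patient-level feature; the paper's exact sampling scheme may differ, so treat this as an illustrative sketch with made-up values:

```python
# Sketch of the per-patient feature: fit a Gaussian to a patient's per-image
# elasticity estimates and sample a patient-level feature from it.
# The exact sampling scheme in the paper may differ; values are illustrative.
import numpy as np

def patient_elasticity_feature(mu_hats, rng):
    """Sample a patient-level elasticity feature from per-image estimates."""
    mu_hats = np.asarray(mu_hats, dtype=float)
    mean, std = mu_hats.mean(), mu_hats.std(ddof=0)
    return rng.normal(mean, std)

rng = np.random.default_rng(42)
feature = patient_elasticity_feature([1.1, 1.3, 1.2, 1.25], rng)
```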
Classification of Prostate Cancer Grades and T-Stages 633

Fig. 3. Error Distribution of Cancer Aggression/Staging Classification for Per-Patient Study. (a) shows the accuracy and error distribution of our recovered
prostate elasticity parameter and cancer T-Stage. For our patient dataset, the multino-
mial classifier (shown in royal blue and sky blue) outperforms the ordinal classifier
(shown in crimson and coral). We achieve up to 84 % accuracy using multinomial logis-
tic regression and 82 % using ordinal logistic regression for classifying cancer T-Stage
based on our recovered elasticity parameter and patient age information. (b) shows the
correlation between the recovered relative elasticity parameter and the Gleason score.
We achieve up to 77 % accuracy using multinomial logistic regression and 70 % using
ordinal logistic regression for classifying Gleason score based on our recovered elasticity
parameter and patient age information.

Among the 16 % failure cases for cancer staging classification, 15 % of our
multinomial classification results with the age feature are only one stage away
from the ground truth. For the failure cases in Gleason score classification,
10 % of the classified Gleason scores are one away from the ground truth and
13 % of them are two away from the ground truth.

4 Conclusion and Future Work


In this paper, we present an improved, non-invasive tissue elasticity parame-
ter reconstruction framework using CT images. We further studied the correla-
tion of the recovered relative elasticity parameters with prostate cancer T-Stage
and Gleason score for multiclass classification of cancer T-stages and grades.
The classification accuracy on our patient dataset using multinomial logistic
regression method is up to 84 % accurate for cancer T-stages and up to 77 %
accurate for Gleason scores. This study further demonstrates the effectiveness of
our algorithm for recovering (relative) tissue elasticity parameter in-vivo and its
promising potential for correct classification in cancer screening and diagnosis.
Future Work: This study is performed on 113 sets of images from 29 prostate
cancer patients all treated in the same hospital. More image data from more
patients across multiple institutions can provide a much richer set of training

data, thus further improving the classification results and testing/validating its
classification power for cancer diagnosis. With more data, we could also apply
our learned model for cancer stage/score prediction. Other features, such as
the volume of the prostate, could also be included in the larger study. Another
possible direction is to perform the same study on normal subjects and increase
the patient diversity from different locations. A large-scale study can enable
more complete analysis and lead to more insights on the impact of variability
due to demographics and hospital practice on the study results. Similar analysis
and derivation could also be performed using other image modalities, such as
MR and ultrasound, and may prove applicable to other types of cancer.

Acknowledgments. This project is supported in part by NIH R01 EB020426-01.

Automatic Determination of Hormone Receptor
Status in Breast Cancer Using Thermography

Siva Teja Kakileti, Krithika Venkataramani(B) , and Himanshu J. Madhu

Xerox Research Centre India, Bangalore, India


{SivaTeja.Kakileti,Krithika.Venkataramani,Himanshu.Madhu2}@xerox.com

Abstract. Estrogen and progesterone hormone receptor status play a
role in the treatment planning and prognosis of breast cancer. These
are typically found after Immuno-Histo-Chemistry (IHC) analysis of the
tumor tissues after surgery. Since breast cancer and hormone receptor
status affect thermographic images, we attempt to estimate the hormone
receptor status before surgery through non-invasive thermographic imag-
ing. We automatically extract novel features from the thermographic
images that would differentiate hormone receptor positive tumors from
hormone receptor negative tumors, and classify them through machine
learning. We obtained a good accuracy of 82 % and 79 % in classification
of HR+ and HR− tumors, respectively, on a dataset consisting of 56
subjects with breast cancer. This shows a novel application of automatic
thermographic classification in breast cancer prognosis.

Keywords: Thermography · Breast cancer prognosis · Hormone receptor status

1 Introduction

Breast cancer has the highest incidence among cancers in women [1]. Breast can-
cer also has wide variations in the clinical and pathological features [2], which
are taken into account for treatment planning [3], and to predict survival rates or
treatment outcomes [2,4]. Thermography offers a radiation free and non-contact
approach to breast imaging and is being re-investigated in recent times [5–8]
with the availability of high resolution thermal cameras. Thermography detects
the temperature increase in malignancy due to the increased metabolism of can-
cer [9] and due to the additional blood flow generated for feeding the malignant
tumors [6]. Thermography may also be sensitive to hormone receptor status as
these hormones release Nitric Oxide, which causes vasodilation and temperature
increase [6,10]. Both these effects could potentially lead to evaluation of hormone
receptor status of malignant tumors using thermography. If this is possible, it
provides a non-invasive way of predicting the hormone receptor status of malig-
nancies through imaging, before going through Immuno-Histo-Chemistry (IHC)
analysis on the tumor samples after surgery. This paper investigates this possibil-
ity and the prediction accuracy. Most other breast imaging techniques including

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 636–643, 2016.
DOI: 10.1007/978-3-319-46720-7 74

mammography are not able to detect hormone receptor status changes. Though
the paper by Chaudhury et al. [11] claims that Dynamic Contrast Enhanced
(DCE) MRI can be used for prediction of Estrogen status, it is invasive, and
has been tested only on a small dataset of 20 subjects with leave-one-out cross-
validation.
There has been a study to analyze the effect of hormone receptor status of
malignant tumors on thermography [12] through quantitative analysis of average
or maximum temperatures of the tumor, the mirror tumor site and the breasts.
[12] reports a significant difference in these temperature measurements for hor-
mone receptor positive and negative status using thermography. In this paper,
we automatically extract features from the thermographic images in the region
of interest (ROI), i.e. the breast tissue, using image processing and attempt to
classify the hormone receptor status of malignant tumors using machine learn-
ing techniques. The determination of whether or not a subject has breast cancer
using thermography, i.e. screening for cancer, is out of scope for this paper. There
are other algorithms for breast cancer screening using thermography [8,13],
which the reader may refer to based on interest.
The paper is organized as follows. Section 2 provides details on the effect of
hormone receptor positive and negative breast cancers on thermography from
the existing literature. Section 3 describes our approach to automatic feature
extraction from the ROI for HR+ and HR− malignant tumor classification.
Section 4 describes the dataset used for our experiments and our classification
results are provided in Sect. 5. Conclusions and future work are given in Sect. 6.

2 Effect of Hormone Receptor Status on Thermography


Readily available tumor markers such as Estrogen Receptor (ER), Progesterone
Receptor (PR), Human Epidermal growth factor Receptor 2 (HER2), and the
tumor cell growth protein marker Ki67 are used for treatment planning [3,14]
and survival rate prediction [2,4], especially in resource-constrained developing
countries like India. [2] uses ER, PR and HER2 for estimating breast cancer
mortality risk from a large dataset of more than 100,000 patients with invasive
breast cancer. They find that there is variability in the 8 different ER/PR/HER2
subtypes, and the ER status has the largest importance. ER+ tumors have a
lower risk than ER− tumors. PR status has a lesser importance than ER status
and PR+ tumors have lower risk than PR− tumors. HER2 status has variations
in risk across the different hormone receptor subtypes, depending on the stage
of the cancer, with the lowest risk for the ER+/PR+/HER2− tumors, and the
highest risk for ER−/PR−/HER2− tumors. The Ki-67 marker indicates the
rate of tumor cell growth [14]. More aggressive tumors may have higher
temperatures due to their increased metabolism [9] and so the Ki-67 marker sta-
tus may play a role in thermography, but it has not been formally investigated
in any study yet.
Estrogen leads to increased vasodilation due to the production of Nitric
Oxide, with a resultant temperature increase [6,15]. Progesterone is also

associated with locally high concentrations of Nitric Oxide generation [10] for
prolonged periods of time. [12] find there is a significant difference in average
and maximum temperature of the tumor site between PR+ and PR− tumors,
with the PR− tumors being hotter. The same pattern holds for ER status
although in a non-significant manner. Their study showed that the more aggres-
sive ER−/PR− tumors were hotter than the less aggressive ER+/PR+ tumors.
Their study also indicates that the difference in average temperatures of the
tumor and its mirror sites in contra-lateral breasts is higher in ER− tumors
than in ER+ tumors, although in a non-significant manner. The same pattern
holds for the PR status too. Since the hormone sensitivity of both breast tis-
sues are similar, it is probable that there is a thermal increase on both breasts
for estrogen or progesterone positive cases. [12] don’t specifically analyze the
four different subtypes of ER/PR status, probably because the difference in
temperatures are small for just one hormone receptor status. Using these med-
ical reasons and empirical observations, in the next section, we design a set of
novel features along with a few existing features that would either extract these
observations automatically or would correlate with these findings for classifying
hormone receptor positive and negative tumors.

3 Automatic Feature Extraction for Hormone Receptor Status
We attempt to distinguish all combinations of Hormone Receptor (HR) positive
(ER+/PR+, ER+/PR−, ER−/PR+) tumors from the HR negative
(ER−/PR−) tumors. We extracted features from elevated temperature regions
in the ROI, and the overall ROI. The elevated temperature regions, i.e., the
hot-spots, are extracted as described below.

3.1 Abnormal Region Extraction


The entire ROI is divided into abnormal regions and normal regions based on
their regional temperatures. The malignant tumor region is typically an abnor-
mal region with an elevated temperature. The abnormal regions have the highest
regional temperature in the Region of Interest (ROI). To segment an abnormal
region, we used an algorithm proposed in [16], where segmentation areas are
combined from multiple features defined by Eqs. 1 and 2 using a decision rule.

T1 = Mode(ROI) + ρ ∗ (Tmax − Mode(ROI))    (1)

T2 = Tmax − τ    (2)
In the above equations, Tmax represents the overall maximum temperature in
all views and Mode(ROI) represents the mode of the temperature histogram
obtained using temperature values of pixels from the ROIs of all views. The
parameters ρ, τ and the decision fusion rule are selected based on the accuracy

of classification on a training/cross-validation subset and diversity in the
segmentation decisions. Decision fusion results in better hot-spot detection than
simple thresholding techniques [16]. Heat transmission from deep tumors results
in diffused lower temperatures on the surface and these parameters play a large
role in the deep tumor detection. Research on determining the combined depth
and size of tumors that can be detected needs to be done.
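A minimal sketch of the two thresholds and their fusion on a toy temperature map, using the ρ, τ values and AND rule reported in Sect. 5; the 0.5-degree histogram bin width and the helper name are our assumptions:

```python
# Toy illustration of the two segmentation thresholds of Eqs. (1) and (2)
# and their AND fusion. The 0.5-degree bin width and names are assumptions.
import numpy as np

def hot_spot_mask(roi_temps, t_max, rho=0.2, tau=3.0):
    """Binary hot-spot mask from the AND fusion of thresholds T1 and T2."""
    vals = roi_temps[np.isfinite(roi_temps)]
    hist, edges = np.histogram(vals, bins=np.arange(vals.min(), vals.max() + 0.5, 0.5))
    mode = edges[np.argmax(hist)]          # Mode(ROI), from the temperature histogram
    t1 = mode + rho * (t_max - mode)       # Eq. (1)
    t2 = t_max - tau                       # Eq. (2)
    return (roi_temps >= t1) & (roi_temps >= t2)

roi = np.full((8, 8), 33.0)                # background skin temperature
roi[2:4, 2:4] = 36.5                       # simulated hot spot
mask = hot_spot_mask(roi, t_max=roi.max())
```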
As discussed in [12], HR− tumors are hotter compared to HR+ tumors while
temperature increase on both sides is observed for HR+ tumors due to the
presence of similar hormone sensitive tissues. To capture these properties, we
extract the following features from these detected abnormal regions.

Distance Between Regions. The malignant tumor region is hotter than the
surrounding region, but the relative difference is higher for HR− tumors. In case
of HR+ tumors, the entire breast region is warmed up, and so this difference
is lesser. We use the normalized histogram of temperatures, or probability mass
function (PMF), to represent each region, and find the distance between regions
using a distance measure between PMFs. Here, the Jensen-Shannon Divergence
(JSD) is used as the distance, since it is a symmetric measure. The JSD is defined as

JSD(P ||Q) = (1/2) Σ_i P(i) log(P(i)/M(i)) + (1/2) Σ_i Q(i) log(Q(i)/M(i)),    (3)

where M = (1/2)(P + Q). The value of JSD(P ||Q) tends to zero when P and Q
have identical distributions and approaches its maximum when the distributions
are very different. To include a measure of distance between multiple regions,
one or more of the PMFs of one region is modified by the mean temperature of
another region. The JSD between P − μ2 and Q − μ1 , where P is the PMF of
the abnormal region on the malignant side, Q is the PMF of the normal region
on the malignant side, μ1 is the mean of the contra-lateral side abnormal region
and μ2 is the mean of the contra-lateral side normal region, is taken as a feature.
In case of absence of an abnormal region on the contralateral side, μ1 is taken
to be equal to μ2 . A subtraction of the contralateral region means corresponds
to a relative increase in the heat with respect to the contralateral regions. For
HR− tumors, there may be no abnormal regions on the contra-lateral side, due
to which this JSD will be higher.
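The feature described above can be sketched as follows; the synthetic temperatures, contra-lateral means, and bin grid are illustrative assumptions:

```python
# Sketch of the JSD-based distance feature of Eq. (3). The malignant-side
# abnormal (P) and normal (Q) region temperatures are shifted by the means of
# the corresponding contra-lateral regions before comparing their PMFs.
# Temperatures, means, and the bin grid are illustrative assumptions.
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two PMFs defined over the same bins."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def temperature_pmf(temps, bins):
    hist, _ = np.histogram(temps, bins=bins)
    return hist / hist.sum()

rng = np.random.default_rng(1)
abnormal = rng.normal(37.5, 0.4, 200)      # malignant-side abnormal region temps
normal = rng.normal(34.0, 0.4, 200)        # malignant-side normal region temps
mu1, mu2 = 34.1, 33.9                      # contra-lateral abnormal/normal means

bins = np.arange(-5.0, 10.0, 0.5)          # common 0.5-degree grid for shifted temps
feature = jsd(temperature_pmf(abnormal - mu2, bins),
              temperature_pmf(normal - mu1, bins))
```

With well-separated regions, as in this HR−-like example, the feature approaches the JSD maximum of log 2.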

Relative Hotness to the Mirror Site. HR+ tumors have a lower temperature
difference between the tumor site and the mirror tumor site on the contra-lateral
side. To capture this, we use the mean squared distance between the temperature
of the malignant side abnormal region pixels and the mean temperature of the
contra-lateral side abnormal region, as defined in Eq. 4.
RH = (1/|A|) Σ_{(x,y)∈A} ||T(x, y) − μ||²    (4)

Fig. 1. Subjects with malignant tumors having (a) ER+/PR+ status, (b) ER−/PR− status, (c) ER+/PR+ status with asymmetrical thermal response, and (d) ER−/PR− status with some symmetrical thermal response.

where T(x, y) represents the temperature of the malignant side abnormal region pix-
els at location (x, y) in the image, μ represents mean temperature of the contra-
lateral side abnormal region and |A| represents the cardinality of abnormal region
A on the malignant side. This value is lower for HR+ tumors compared to HR−
tumors, as hormone sensitive tissues will be present on both sides. As shown in
Fig. 1a and b, we see thermal responses on both sides for HR+ tumors and no
thermal response on the normal breast for HR− tumors. However, there might
be outliers like Fig. 1c and d.
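A toy computation of Eq. (4), with made-up temperatures chosen to mimic the HR+ case (bilateral warming, small deviation) and the HR− case (unilateral hot spot, large deviation):

```python
# Sketch of the Relative Hotness feature of Eq. (4): mean squared deviation of
# the malignant-side abnormal-region temperatures from the mean temperature
# of the contra-lateral abnormal region. All values are illustrative.
import numpy as np

def relative_hotness(abnormal_temps, contra_mean):
    t = np.asarray(abnormal_temps, dtype=float)
    return np.mean((t - contra_mean) ** 2)

# HR+-like case: both sides warmed up, so the deviation is small
rh_pos = relative_hotness([36.1, 36.3, 36.2, 36.0], contra_mean=36.0)
# HR--like case: only the malignant side is hot, so the deviation is large
rh_neg = relative_hotness([37.8, 38.0, 37.9, 38.1], contra_mean=34.5)
```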

Thermal Distribution Ratio. In addition to the temperature change, the
areas of the abnormal regions on both sides are also considered as features.
We used the ratio of areas of abnormal regions on the contralateral side to the
malignant side. This value tends to be zero for HR− tumors, as there may be
no abnormal region on the contralateral side, and is higher for HR+ tumors.
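The ratio itself is a one-liner; guarding the degenerate case where no malignant-side region is detected is our own defensive assumption:

```python
# Thermal distribution ratio: area of the contra-lateral abnormal region over
# the area of the malignant-side abnormal region. The zero-area guard is a
# defensive assumption; the counts below are illustrative.
def thermal_distribution_ratio(area_contra: int, area_malignant: int) -> float:
    return area_contra / area_malignant if area_malignant else 0.0

print(thermal_distribution_ratio(0, 120))   # HR--like: no contra-lateral hot spot -> 0.0
print(thermal_distribution_ratio(80, 120))  # HR+-like: hot spots on both sides
```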

3.2 Entire ROI Features


Textural features are extracted from the entire ROI.
However, instead of using the original temperature map of the ROI, a modified
temperature map is used. The thermal map formed by subtracting the malignant
side ROI with the contra-lateral side mean temperature, i.e. the relative temper-
ature from the contralateral side, is used to determine the textural features. The

Run Length Matrix (RLM) is computed from the thermal map, after quantizing
the temperature into l bins. Gray level non-uniformity and Energy features from
the RLM are computed, as mentioned in [7]. The non-uniformity feature would
be higher for HR− tumors as their tumors have more focal temperatures.

4 Dataset Description
We obtained an anonymized dataset of 56 subjects with biopsy confirmed breast
cancer with age varying from 27 to 76 years through our collaboration with Mani-
pal University. The FLIR E60 camera with a spatial resolution of 320 × 240 pix-
els is used to capture the initial 20 subjects and a high-resolution FLIR T650Sc
camera with an image resolution of 640 × 480 pixels is used for the remain-
ing subjects. A video is captured for each subject, and the acquisition protocol
involved asking the subject to rotate from right lateral to left lateral views. The
data for each subject included the mammography, sono-mammography, and biopsy
reports, the ER/PR status values of the tumors, and, where available, surgery
reports and HER2/neu status values. From this data, there are 32 subjects with
HR+ malignant tumors and the rest have HR− tumors.

5 Classification Results
From the obtained videos, we manually selected five frames that correspond
to frontal, right & left oblique and lateral views, and manually cropped the
ROIs in these. Consideration of multiple views helps in better tumor detection
since it might not be seen in a fixed view. From these multiple views, the view
corresponding to maximum abnormal region area with respect to the ROI area
is considered as the best view. This best view along with its contra-lateral side
view is used to calculate the features from the abnormal regions and the entire
ROI as mentioned in Sect. 3. The training and testing sets comprise randomly
chosen subsets of 26 and 30 subjects, respectively, with an internal
division of 14 HR+ & 12 HR− and 18 HR+ & 12 HR− tumors, respectively.
The abnormal region is located using ρ = 0.2 and τ = 3 °C with the AND decision
rule, to optimize for the accuracy in classification. All 11 deep tumors of size
0.9 cm and above have been detected in this dataset. The bin width of the PMFs
used is 0.5 °C. The step size of the temperature bins in the RLM computation
is 0.25 °C.
A two-class Random Forest ensemble classifier is trained using the features
obtained. The Random Forest (RF) randomly chooses a training subset and a
feature subset for training each decision tree, and combines the decisions from
multiple such trees to improve classification accuracy. The mode of all tree decisions is taken
as the final classification decision. RFs with increasing number of trees have a
lower standard deviation in the accuracies over multiple iterations. The standard
deviation in (HR−, HR+) accuracies of the RFs using all features with 5, 25
and 100 trees over 20 iterations is (9.1 %, 11.1 %), (6.4 %, 4.8 %), (2.5 %, 2.0 %),
respectively, and hence a large number of 100 trees is chosen. Table 1 shows the

maximum accuracies over 20 iterations of RFs with 100 trees using individual and
combined features proposed in our approach. We tested with different textural
features obtained from both the RLM and the Gray Level Co-occurrence Matrix,
and found that gray-level non-uniformity from the RLM yields better accuracy
than the others. Using an optimal combined set of region based features and
textural features, we obtained an accuracy of 82 % and 79 % in classification of
HR+ and HR− tumors respectively.
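The ensemble setup can be sketched with scikit-learn's RandomForestClassifier (which takes the majority vote across trees); the four feature columns and their values below are synthetic stand-ins with deliberately well-separated classes, not the study's data.

```python
# Sketch of the two-class Random Forest setup: 100 trees with a majority vote.
# Feature values are synthetic stand-ins, not the study's dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def make_features(n, hr_positive):
    # Columns: JSD distance, relative hotness, distribution ratio, gray-level non-uniformity
    base = np.array([0.3, 1.0, 0.6, 2.0]) if hr_positive else np.array([0.6, 8.0, 0.05, 4.0])
    return base + rng.normal(0.0, [0.1, 1.0, 0.1, 0.5], size=(n, 4))

X_train = np.vstack([make_features(14, True), make_features(12, False)])
y_train = np.array([1] * 14 + [0] * 12)    # 1 = HR+, 0 = HR-
X_test = np.vstack([make_features(18, True), make_features(12, False)])
y_test = np.array([1] * 18 + [0] * 12)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```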

Table 1. Accuracies with different features obtained using our approach

Feature set                Features                                  HR− Accuracy   HR+ Accuracy
Abnormal Region Features   Distance between regions                  74 %           56 %
                           Relative Hotness                          79 %           73 %
                           Thermal Distribution Ratio                63 %           27 %
                           Combination of above three features       84 %           73 %
Entire ROI Features        Gray-level non-uniformity                 68 %           64 %
Overall features           Combination of features from abnormal
                           and entire ROI regions                    79 %           82 %

From Table 1, it is clear that the Abnormal Region features play a more
important role than the textural features. Among these abnormal region features, fea-
tures corresponding to relative temperatures, i.e., Relative Hotness and Distance
Between Regions, have an important role in the classification of HR+ and HR−
tumors, thus validating the findings of [12].

6 Conclusions and Future Work


We have presented a novel application that automatically classifies breast cancer
tumors as HR+ or HR− using thermography, with a reasonably good accuracy
of around 80 %. This is a first approach through image
processing features and machine learning algorithms for such automatic classi-
fication. This also presents an advantage to thermography over other imaging
modalities in estimating prognosis and treatment planning of breast cancer with-
out invasive surgery. In future work, we will test our algorithm on larger datasets
with more variation in data and modify the algorithm to detect sub classes within
HR+ tumors. Additionally, we will try to determine the role of Ki-67 status in
thermography to refine the automatic classification.

Acknowledgement. We thank Manipal University and Dr. L. Ramachandra, Dr.


S. S. Prasad and Dr. Vijayakumar for sharing the data and assisting us in thermo-
graphic image interpretation.
Automatic Determination of Hormone Receptor Status in Breast Cancer 643

References
1. Fitzmaurice, C., et al.: The global burden of cancer 2013. JAMA Oncol. 1(4),
505–527 (2015)
2. Parise, C.A., Caggiano, V.: Breast cancer survival defined by the ER/PR/HER2 subtypes and a surrogate classification according to tumor grade and immunohistochemical biomarkers. J. Cancer Epidemiol. 2014, 11 p. (2014). Article ID 469251
3. Alba, E., et al.: Chemotherapy (CT) and hormonotherapy (HT) as neoadjuvant
treatment in luminal breast cancer patients: results from the GEICAM/2006-03, a
multicenter, randomized, phase-ii study. Ann. Oncol. 23(12), 3069–3074 (2012)
4. Cheang, M., Chia, S.K., Voduc, D., et al.: Ki67 index, HER2 status, and prognosis
of patients with luminal B breast cancer. J. Nat. Cancer Inst. 101(10), 736–750
(2009)
5. Keyserlingk, J., Ahlgren, P., Yu, E., Belliveau, N., Yassa, M.: Functional infrared
imaging of the breast. Eng. Med. Biol. Mag. 19(3), 30–41 (2000)
6. Kennedy, D.A., Lee, T., Seely, D.: A comparative review of thermography as a
breast cancer screening technique. Integr. Cancer Ther. 8(1), 9–16 (2009)
7. Acharya, U.R., Ng, E., Tan, J.H., Sree, S.V.: Thermography based breast cancer
detection using texture features and support vector machine. J. Med. Syst. 36(3),
1503–1510 (2012)
8. Borchartt, T.B., Conci, A., Lima, R.C., Resmini, R., Sanchez, A.: Breast ther-
mography from an image processing viewpoint: a survey. Signal Process. 93(10),
2785–2803 (2013)
9. Gautherie, M.: Thermobiological assessment of benign and malignant breast dis-
eases. Am. J. Obstet. Gynecol. 147(8), 861–869 (1983)
10. Vakkala, M., Kahlos, K., Lakari, E., Paakko, P., Kinnula, V., Soini, Y.: Inducible
nitric oxide synthase expression, apoptosis, and angiogenesis in in-situ and invasive
breast carcinomas. Clin. Cancer Res. 6(6), 2408–4216 (2000)
11. Chaudhury, B., et al.: New method for predicting estrogen receptor status utilizing
breast MRI texture kinetic analysis. In: Proceedings of the SPIE Medical Imaging
(2014)
12. Zore, Z., Boras, I., Stanec, M., Oresic, T., Zore, I.F.: Influence of hormonal status
on thermography findings in breast cancer. Acta Clin. Croat. 52, 35–42 (2013)
13. Madhu, H., Kakileti, S.T., Venkataramani, K., Jabbireddy, S.: Extraction of med-
ically interpretable features for classification of malignancy in breast thermography.
In: 38th Annual IEEE International Conference on Engineering in Medicine and
Biology Society (EMBC) (2016)
14. Urruticoechea, A.: Proliferation marker ki-67 in early breast cancer. J. Clin. Oncol.
23(28), 7212–7220 (2005)
15. Ganong, W.F.: Review of Medical Physiology. McGraw-Hill Medical, New York
(2005)
16. Venkataramani, K., Mestha, L.K., Ramachandra, L., Prasad, S., Kumar, V., Raja,
P.J.: Semi-automated breast cancer tumor detection with thermographic video
imaging. In: 37th Annual International Conference on Engineering in Medicine
and Biology Society, pp. 2022–2025 (2015)
Prostate Cancer: Improved Tissue
Characterization by Temporal Modeling
of Radio-Frequency Ultrasound Echo Data

Layan Nahlawi1(B), Farhad Imani2, Mena Gaed4, Jose A. Gomez4,
Madeleine Moussa4, Eli Gibson3, Aaron Fenster4, Aaron D. Ward4,
Purang Abolmaesumi2, Hagit Shatkay1,5, and Parvin Mousavi1
1 School of Computing, Queen's University, Kingston, Canada
lnahlawi@cs.queensu.ca
2 Department of Electrical and Computer Engineering,
University of British Columbia, Vancouver, Canada
3 Centre for Medical Image Computing, University College London, London, UK
4 Department of Medical Biophysics, Pathology and Robarts Institute,
Western University, London, Canada
5 Department of Computer and Information Sciences,
University of Delaware, Newark, USA

Abstract. Despite recent advances in clinical oncology, prostate cancer


remains a major health concern in men, where current detection tech-
niques still lead to both over- and under-diagnosis. More accurate predic-
tion and detection of prostate cancer can improve disease management
and treatment outcome. Temporal ultrasound is a promising imaging
approach that can help identify tissue-specific patterns in time-series of
ultrasound data and, in turn, differentiate between benign and malignant
tissues. We propose a probabilistic-temporal framework, based on hid-
den Markov models, for modeling ultrasound time-series data obtained
from prostate cancer patients. Our results show improved prediction of
malignancy compared to previously reported results, where we identify
cancerous regions with over 88 % accuracy. As our models directly repre-
sent temporal aspects of the data, we expect our method to be applicable
to other types of cancer in which temporal-ultrasound can be captured.

1 Introduction
Prostate cancer is the most widely diagnosed form of cancer in men [1]. The Amer-
ican Cancer Society predicts that one in seven men will be diagnosed with prostate
cancer during their lifetime. Initial assessment includes measuring Prostate Spe-
cific Antigen level in blood serum and digital rectal examination. If either test
is abnormal, core needle biopsy is performed under Trans-Rectal Ultrasound
(TRUS) guidance. Disease prognosis and treatment decisions are then based on
grading, i.e., assessing the degree of cancer-aggressiveness in the biopsy cores.
TRUS-guided biopsy often leads to a high rate (∼30 %) of false negatives for cancer
diagnosis, as well as to over- and under-estimation of the cancer grade. Extensive
heterogeneity in the morphology and pathology of prostate adenocarcinoma is an
additional challenge to accurate diagnosis.

H. Shatkay and P. Mousavi—These authors have contributed equally to the manuscript.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 644–652, 2016.
DOI: 10.1007/978-3-319-46720-7_75
While improved prostate-cancer screening has reduced mortality rates by
45 % over the past two decades [3], inaccurate diagnosis and grading have
resulted in a surge in over-treatment. Notably, radical over-aggressive treatment
of prostate-cancer patients leads to a decline in their quality of life. For indolent
prostate cancer, such aggressive treatment should be avoided as active surveil-
lance has proven to be an effective disease management course [13]. Accurate
identification and grading of lesions and their extent – especially using low cost,
readily accessible technology such as ultrasound – can, therefore, significantly
contribute to appropriate effective treatment. To achieve this, methods must be
developed to guide TRUS biopsies to target regions likely to be malignant. The
task of differentiating malignant tissue from its surrounding tissue is referred to
in the literature as tissue-typing or characterization. In this paper we propose a
new method that utilizes ultrasound time-series data to characterize malignant
vs. benign tissue obtained from prostate-cancer patients.
Most of the research on ultrasound-based tissue characterization focuses on
analysis of texture- [5] and spectral-features [4] within single ultrasound frames.
Elastography [8], another ultrasound technique, aims to distinguish tissue types
based on their measured stiffness in response to external vibrations. A different
way of utilizing ultrasound is by acquiring radio-frequency (rf) time series, which
is a sequence of ultrasound frames captured from sonication of tissue over time,
without moving the tissue or the ultrasound probe. Frequency domain analysis of
rf time series has shown promising results for tissue characterization in breast
and prostate cancer. Moradi et al. [10] used the fractal dimension of rf time
series as features and employed Bayesian and neural network classifiers for ex-
vivo characterization of prostate tissue. More recently, Imani et al. [7] combined
wavelet features and mean central frequency of rf time-series to characterize
in-vivo prostate tissue using SVMs. Neither of these lines of work have explicitly
modeled the temporal aspect of the time-series data and utilized it for tissue
characterization. In a recent study [11] we have suggested that the temporal
aspect of the data may carry useful information if directly captured.
Here we carry the idea forward, presenting a new method for analyzing rf
time series, using a probabilistic temporal model, namely, a hidden Markov
model, hmm [12], specifically representing and making use of the temporal aspect
of the data. We apply the method to differentiate between malignant and benign
prostate tissue and demonstrate its utility, showing an improved performance
compared to previous methods. Probabilistic temporal modeling, and hmms in
particular, have been applied to a wide range of clinical data. They are typically
used to model a time-dependent physiological process (e.g. heartbeats [2]), or the
progression of disease-risk over time [6]. hmms are also used within and outside
the biomedical domain to model sequence-data such as text [9], proteins, DNA

sequences and others. Here we use them to model rf time series where time does
have an impact on the ultrasound data being recorded. We next describe our
rf time-series data and its representation, followed by a tissue-characterization
framework. We then present experiments and results demonstrating the effec-
tiveness of the method.

2 RF Time Series Data


rf time series record tissue-response to prolonged sonication. These responses
consist of reflected ultrasound echo intensity values. Figure 1 shows ultrasound
image-frames collected from prostate sonication over time (each such frame is
referred to as an rf frame). The boundary of the prostate is encircled in white.
The solid red dots indicate the same location within the prostate over time,
while the dotted blue arrows point to the corresponding echo intensity values.
The sequence of echo intensities obtained from the same point in the prostate
over time makes up an rf time series (bottom right of the Figure). Due to
the scattering phenomenon in ultrasound imaging, very small objects such as
individual cells cannot be identified using single rf values. As such, we partition
each rf frame using a grid into small regions, where each window in the grid
is referred to as a Region of Interest ( roi), and comprises multiple rf values.
In this work, we use the same dataset as Imani et al. [7], and adopt the same
roi size of 1.7 × 1.7 mm², which corresponds to 44 × 2 rf values. The 88 rf values
within each grid-window in a single frame recorded at time t, are averaged to
produce a single value representing each roi at the corresponding time-point t.
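The per-frame averaging described above can be sketched as follows, assuming each rf frame is a 2-D numpy array of echo intensities and that the roi window coordinates are known; the frame layout and the function name are illustrative assumptions, not the authors' code.

```python
import numpy as np

def roi_time_series(frames, row0, col0, roi_rows=44, roi_cols=2):
    """Average the rf samples inside one grid window (roi) in every frame,
    yielding one scalar per time-point. A 44 x 2 window holds 88 rf values,
    which are collapsed to a single value per frame."""
    series = []
    for frame in frames:  # frames: iterable of 2-D arrays (axial x lateral rf samples)
        window = frame[row0:row0 + roi_rows, col0:col0 + roi_cols]
        series.append(window.mean())
    return np.asarray(series)
```

Applied to 128 frames, this yields the 128-long series associated with one roi.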

Fig. 1. Ultrasound rf-frames collected from a prostate-cancer patient over time. Solid
red dots indicate the same location across multiple frames. The time series for this
location is shown at the bottom right. A grid dividing each frame into rois is shown
on the left-most frame. Pathology labels for malignant/benign rois are also shown.

The image data consists of in-vivo rf frames gathered from 9 prostate-cancer


patients who have undergone radical prostatectomy.¹ Prior to the surgery, 128
rf frames, recorded over a time-period of 4 s, were gathered from each patient. A
grid is overlaid on each of the frames, rois are obtained as described above, and
for each roi, Rk, a 128-long time series Rk = Rk1, …, Rk128 is created. Each
point Rkt in the series corresponds to the average intensity of that roi in the
rf-frame recorded at time t, where 1 ≤ t ≤ 128. While the number of patients is
relatively low, the total number of rois per patient is quite high (see Table 1),
thus providing sufficient data to support effective model-learning.

¹ The study was approved by the institutional research ethics board and the patients provided consent to participate.
As commonly done in time series analysis, we map the series associated with
each roi, Rk , to its first-order difference series, i.e. the sequence of differences
between pairs of consecutive time-points. To simplify the modeling task, we
further discretize the difference series, by placing the values into 10 equally-
spaced bins, where the values in the lowest bin are all mapped to 1, and those
at the top-most bin are mapped to 10. We denote the sequence obtained by
discretizing Rk , as Ok1 , ..., Ok127 . Our experiments suggest that 10 bins are
sufficient for obtaining good classification performance.
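The differencing-and-binning step above can be sketched as follows; binning over each series' own min/max range (rather than a global range) is an assumption made for this illustration.

```python
import numpy as np

def discretize_series(rk, n_bins=10):
    """Map an roi time-series to its first-order difference series, then
    discretize into n_bins equally spaced bins labelled 1..n_bins."""
    diff = np.diff(rk)                                  # 128 points -> 127 differences
    edges = np.linspace(diff.min(), diff.max(), n_bins + 1)
    # digitize against the interior edges gives labels 0..n_bins-1; shift to 1..n_bins
    obs = np.digitize(diff, edges[1:-1]) + 1
    return obs
```

The resulting symbol sequence Ok1, ..., Ok127 is what the hidden Markov models in the next section consume.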
To create a gold-standard of labeled malignant vs benign regions we use
whole-mount histopathology information. To obtain such information, following
prostatectomy, the tissues are formalin-fixed and imaged using mri. The tissues
are then cut into ∼4 mm slices, and further processed to enable high resolution
microscopy. Two clinicians then assign (in consensus) the appropriate labels to
the malignant and to the benign areas within each slice. A multi-step rigorous
registration process, in which mri images are used as an intermediate step, is
employed to overlay the labeled histopathology images on the in-vivo ultrasound
frames (see [7] for additional details). This registration process results in an
assignment of a pathology label to each roi, indicating whether it is malignant
or benign. Figure 1 shows several examples of such labeled rois. We use the same
570 labeled rois as in [7], of which 286 are malignant and 284 benign. Table 1
summarizes the data. The rf time-series associated with the labeled rois are
used as training and test data for building a probabilistic model for distinguishing
between benign and malignant tissues, as described in the next section.

Table 1. The distribution of malignant and benign rois over the 9 patients.

Patient P1 P2 P3 P4 P5 P6 P7 P8 P9 Total
Malignant rois 42 29 18 64 35 28 23 30 17 286
Benign rois 42 29 18 61 35 29 23 30 17 284

3 Probabilistic Modeling Using Hidden Markov Models

hmms are often used to model time series where the generating process is
unknown or prone to variation and noise. The process is viewed as a sequence
of stochastic transitions between unobservable (hidden) states; some aspects of
each state are observed and recorded. As such, the states may be estimated
from the observation-sequence [12]. A simplifying assumption underlying the

use of these models is the Markov property, namely, that the state at a given
time-point depends only on the state at the preceding point, conditionally inde-
pendent of all other time points. In this work we view a tissue response value
recorded in an rf frame and discretized as discussed above, as an observation;
employing the Markov property, we assume each such value depends only on
the response recorded at the frame directly preceding it, independent of any
earlier responses. Formally, an hmm λ consists of five components: A set of N
states, S = {s1 , . . . , sN }; a set of M observation symbols, V = {v1 , . . . , vM }; an
N × N stochastic matrix A governing the state-transition probability, where
Aij = P r(statet+1 = si |statet = sj ), 1 ≤ i, j ≤ N , and statet is the state at time
t; an N × M stochastic-emission matrix B, where Bik = P r(obt = vk |statet = si ),
1 ≤ i ≤ N, 1 ≤ k ≤ M , denoting the probability of observing vk at state si ; an
N -dimensional stochastic vector π, where for each state si , πi = P r(state1 = si ),
denotes the probability to start the process at state si . Learning a model λ
from a sequence of observations O = o1 , o2 , . . . , o127 , amounts to estimating
the model parameters (namely, A, B & π), to maximize log[P r(O|λ)], i.e. the
observations’ probability given the model λ. In practice, π is fixed such that
π1 = P r(state1 = s1 ) = 1 and πj = 0 for j ≠ 1, i.e. s1 is always the first state. In
the experiments reported here, we also fix the matrix A to an initial estimate
based on clustering (as described below), while the matrix B is learned using
the Baum-Welch algorithm [12].
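The quantity log Pr(O|λ) used throughout this section can be computed with the standard forward algorithm. The following is a generic log-space sketch, not the authors' implementation; note that, as a convention choice for this sketch, A is indexed row-stochastically (A[i, j] = Pr of moving from state i to state j), which is the transpose of the indexing written above.

```python
import numpy as np

def _logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def log_likelihood(obs, pi, A, B):
    """log Pr(O | lambda) by the forward algorithm in log space.
    obs: symbol indices 0..M-1; pi: (N,) start probabilities;
    A: (N, N) row-stochastic transition matrix; B: (N, M) emissions."""
    logA = np.log(A + 1e-300)            # tiny offset avoids log(0)
    logB = np.log(B + 1e-300)
    n = len(pi)
    alpha = np.log(pi + 1e-300) + logB[:, obs[0]]        # initialization
    for o in obs[1:]:                                    # induction over time
        alpha = np.array([_logsumexp(alpha + logA[:, j])
                          for j in range(n)]) + logB[:, o]
    return _logsumexp(alpha)                             # termination
```

Scoring a test sequence under two such models (one malignant, one benign) is exactly the comparison used for classification below.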
The hmms we develop, as illustrated in Fig. 2, are ergodic models consisting
of 5 states and 10 observations. A small number of states allows for a compu-
tationally efficient model while typically leading to good generalization beyond
the training set. We determined the number of states by experimenting with 2–6
state models (and a few larger ones with >10 states). The classification perfor-
mance of 5-state models was higher than that of others. Moreover, each of the 5
states is associated with a distinct emission probability distribution, which is not
the case when using additional/fewer states. The observation set, as discussed in
Sect. 2, consists of 10 observation symbols v1 , ..., v10 , each of which corresponds
to a discretized interval of first-order difference values of the rf time-series.

Fig. 2. Example of hmms learned from (A) malignant rois, and (B) benign rois. Nodes
represent states. Edges are labeled by transition probabilities; Emission probabilities
are shown to the right of each model. Edges with probability <0.2 are not shown.

For tissue classification, we learn two hmms – one for representing series
obtained from malignant tissue, denoted λM , and the other for benign tissue,
denoted λB . We use supervised learning to learn the models’ parameters, where
the training and test data consist of the time-series corresponding to the rois
that were labeled as malignant and benign (described in Sect. 2). To train each
model, we use a leave-one-patient-out cross-validation strategy, partitioning each
set of roi time-series ( malignant for λM , benign for λB ) into training and test
sets. In each cross-validation run the rois of one of the 9 patients are left-out
as a test-set, while the rois of the other 8 patients are used to train the hmm.
Malignant rois are used to train λM , while λB is trained on benign rois. Given
a test-sequence, roitesti , each of the two models assigns it a log probability,
log(P r(ROItesti |λc )), (c ∈ {M, B}) – a measure indicating how likely the model
is to have generated the time-series. The class label assigned to ROItesti, Ctesti,
is the one whose model maximizes the log probability, that is:
Ctesti = argmax_{c∈{M,B}} log(Pr(ROItesti | λc)), 1 ≤ i ≤ L, where L is the number of test rois.
Practically, if the log-odds log[Pr(ROItesti | λB)/Pr(ROItesti | λM)] is positive,
ROItesti is classified as malignant, otherwise it is classified as benign. In Sect. 4,
we use the log-odds as a basis for heat-maps to visualize the results (Fig. 3).
To learn the two models, each of the models is initialized, and its observa-
tion matrix B is then iteratively updated until convergence, in accordance with
the Baum-Welch method. Model initialization is based on clustering the values
within all the discretized training time-series into 5 clusters, cl1 ,. . ., cl5 , where 5
is the number of states. Based on the assignment of each value to its respective
cluster, we estimate the transition probability Ai,j where 1 ≤ i, j ≤ 5 as the
data-frequency of observing a value from cluster cli followed by a value from
cluster clj within all the time series in the respective training set. Since the
model is not left-to-right, the transitions can be in either direction. A similar
estimation process is applied for initializing the observation matrix B.
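The data-frequency estimate of the transition matrix described above can be sketched as follows; `cluster_of`, a function mapping a value to its cluster index 0..n_states-1, stands in for the clustering step and is an assumption of this illustration.

```python
import numpy as np

def init_transition_matrix(series_list, cluster_of, n_states=5):
    """Initial estimate of A: count how often a value from cluster i is
    followed by a value from cluster j across all training series, then
    row-normalize the counts into probabilities."""
    counts = np.zeros((n_states, n_states))
    for series in series_list:
        labels = [cluster_of(v) for v in series]
        for i, j in zip(labels[:-1], labels[1:]):
            counts[i, j] += 1
    counts += 1e-6  # avoid all-zero rows before normalizing
    return counts / counts.sum(axis=1, keepdims=True)
```

Since the model is ergodic rather than left-to-right, no entries are forced to zero; transitions in either direction get whatever probability the counts support.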

Fig. 3. Top: rf frames overlaid with malignant/benign pathology labels. Bottom: Heat-
map images based on our learned models, where each roi color is assigned based on
the log-odds ratio calculated for its respective time-series. The left three columns are
rf frames from patients P1 (col 1) and P5 (col 2, 3) while the frames in the rightmost
column are from Patient P7, for whom we noted a lower performance.

To assess our method’s performance, we apply each of the trained models


(trained over roi time-series obtained from 8 patients) to assign labels over the
test data (the rois of the left-out patient), and calculate the average standard
measures accuracy, sensitivity and specificity of the assigned labels with respect
to the ground-truth in the gold-standard. The learnt hmms provide a summary
of the course of changes in rf values that each of the tissue types goes through
in response to the prolonged sonication.
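The three measures named above follow their standard definitions, with malignant as the positive class; a minimal sketch:

```python
import numpy as np

def sens_spec_acc(y_true, y_pred):
    """Accuracy, sensitivity and specificity of predicted roi labels
    against the gold standard, with 1 = malignant and 0 = benign."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return ((tp + tn) / len(y_true),   # accuracy
            tp / (tp + fn),            # sensitivity
            tn / (tn + fp))            # specificity
```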

4 Results and Discussion


As explained above, we train 9 pairs of hmms – one malignant and one benign –
where for each pair, the training is done on data obtained from 8 of the 9 patients.
Each pair of hmms is then used for classifying the rois of the left-out 9th (test)
patient. Figure 2 shows an example of such a pair of learned hmms, where the left
one was trained on time-series obtained from malignant rois while the one on the
right was trained on benign ones. The transition probabilities are shown on the
edges while emission probabilities for each state are shown as histograms. The
figure shows that in both models, each state is characterized by its own markedly
distinct observation distribution. Moreover, the most likely path transitioning
through the malignant model alternates primarily between the states s1 and s5
possibly via s2 , where s5 is the most central state, that is, the one most likely to be
visited. In contrast, the benign model alternates primarily between the states s1 ,
s3 and s4 , with s3 being the most central state. Notably the emission distribution
associated with s5 in the malignant model is very different from that associated
with s3 in the benign one, hence these two states are not equivalent. The clear dis-
tinction between the two models means that time series obtained from malignant
roi’s form a certain typical pattern of changing values, while time series obtained
from benign roi’s form a different typical pattern, and our models do capture the
difference.
The classification results for all test patients are shown in Table 2. The aver-
age accuracy is 88.1 %, whereas the average sensitivity and specificity are 89.1 %
and 87.1 %, respectively. The results clearly indicate that for the majority of
rois, our trained models can correctly distinguish between rois obtained from

Table 2. The classification performance using hmms. The numbers in parentheses show
the respective result reported by [7] for the same patient.

Patient P1 P2 P3 P4 P5 P6 P7 P8 P9 Average
Accuracy 82.1 96.5 100 93.6 90 85.9 69.5 78.3 97.1 88.1 ± 9
(82) (71) (88) (95) (86) (86) (N/A) (80) (85)
Sensitivity 100 96.5 100 87.5 97.1 82.1 65.2 73.3 100 89.1 ± 12
(100) (68) (76) (90) (100) (81) (N/A) (98) (84)
Specificity 64.2 96.5 100 100 82.8 89.6 73.9 83.3 94.1 87.1 ± 11
(62) (74) (100) (100) (71) (90) (N/A) (61) (84)

malignant tissue and those obtained from benign tissue. Moreover, for most cases
our performance either matches or significantly improves upon that of an earlier
method [7] that used SVMs and did not explicitly model the temporal aspect
of the time-series. We note that for patient P8 our sensitivity is significantly
lower, although our specificity is much higher, which amounts to a compara-
ble overall accuracy. An exception to the high level of performance is clearly
observed for patient P7, for whom the classification performance is significantly
lower than that obtained for all other patients. Further investigation showed
that this patient was not included in the earlier reported results [7], because
the ground-truth registration of the histology labels of malignant tissue was not
accurate. The fact that mis-labeled rois are not well-distinguished based on the
models learned from other patient data serves as further evidence for the fact
that the models indeed capture the salient differences between rf echos emitted
by benign vs malignant tissue. The top row of Fig. 3 shows several examples
of rf frames obtained from different patients overlaid with malignant/benign
labels. The bottom row shows corresponding images of heat-maps based on our
results. Each roi, is assigned a color reflecting the log-odds ratio calculated for
its respective time-series Rx , log PP r(R
r(Rx |λB )
x |λM )
. The first three columns show rf
frames from P1 (1 column) and P5 (2 , 3rd columns), all of which show that
st nd

the heat-maps match the original annotations almost perfectly. The fourth col-
umn shows an rf frame from P7. Despite inaccuracies in the gold-standard for
this image, our model still identifies correctly the benign regions, while showing
most of the malignant regions about equally likely to be malignant or benign.

5 Conclusion

We introduced a new approach for tissue-classification in prostate cancer, based


on modeling temporal aspects of tissue-response to prolonged sonication. Repre-
senting the two tissue types (malignant/benign), each as a probabilistic-temporal
model learned from patients’ data (training-set) allows for accurate labeling of
test-data obtained from another patient. Our results indicate that temporal pat-
terns, captured by our models, help differentiate between rf time series obtained
from malignant vs. benign tissues, with an average accuracy of over 88 %. As
a next step we plan to take into account the heterogeneity in benign tissue, as
well as incorporate cancer grades to support a more refined categorization of
tissue types. This study takes a first step using such models, and is limited by a
relatively small number of patients for which we have reliably annotated whole-
mount tissue images. In the future we shall increase the number of patients,
and include anatomical-data indicating the zones from which rois are selected.
Beyond prostate cancer, we expect our method to be applicable to other types
of cancer such as breast and liver.

Acknowledgment. This work was partially supported by grants from nserc Discov-
ery to hs and pm, nserc and cihr chrp to pm and nih #r56 lm011354a to hs.

References
1. Canadian Cancer Society and National Cancer Institute of Canada. Advisory Com-
mittee on Records, Registries: Canadian cancer statistics. Canadian Cancer Society
(2015)
2. Coast, D., Stern, R., Cano, G., et al.: An approach to cardiac arrhythmia analysis
using hidden Markov models. IEEE Trans. Biomed. Eng. 37(9), 26–36 (1990)
3. Etzioni, R., Tsodikov, A., et al.: Quantifying the role of PSA screening in the US
prostate cancer mortality decline. Cancer Causes Control 19(2), 75–81 (2008)
4. Feleppa, E., Porter, C., Ketterling, J., et al.: Recent developments in tissue-type
imaging (TTI) for planning and monitoring treatment of prostate cancer. Ultrason.
Imaging 26(3), 63–72 (2004)
5. Han, S., Lee, H., Choi, J.: Computer-aided prostate cancer detection using texture
features and clinical features in ultrasound image. J. Dig. Imaging 21(1), 21–33
(2008)
6. Hauskrecht, M., Fraser, H.: Planning treatment of ischemic heart disease with
partially observable Markov decision processes. AI Med. 18(3), 21–44 (2000)
7. Imani, F., Abolmaesumi, P., Gibson, E., et al.: Computer-aided prostate cancer
detection using ultrasound RF time series: in vivo feasibility study. IEEE Trans.
Med. Imaging 34(11), 48–57 (2015)
8. Krouskop, T., Wheeler, T., Kallel, F., et al.: Elastic moduli of breast and prostate
tissues under compression. Ultrason. Imaging 20(4), 60–74 (1998)
9. Li, Y., Lipsky Gorman, S., Elhadad, N.: Section classification in clinical notes using
supervised hidden Markov model. In: Proceedings of the 1st ACM International
Health Informatics Symposium, pp. 44–50. ACM (2010)
10. Moradi, M., Mousavi, P., Boag, A.H., et al.: Augmenting detection of prostate
cancer in transrectal ultrasound images using SVM and RF time series. IEEE
Trans. Biomed. Eng. 56(9), 214–224 (2009)
11. Nahlawi, L., Imani, F., Gaed, M., et al.: Using hidden Markov models to capture
temporal aspects of ultrasound data in prostate cancer. In: IEEE International
Conference on Bioinformatics and Biomedicine (BIBM), pp. 46–49. IEEE (2015)
12. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in
speech recognition. Proc. IEEE 77(2), 257–286 (1989)
13. Singer, E.A., Kaushal, A., Turkbey, B., et al.: Active surveillance for prostate
cancer: past, present and future. Curr. Opin Oncol. 24(3), 43–50 (2012)
Classifying Cancer Grades Using Temporal
Ultrasound for Transrectal Prostate Biopsy

Shekoofeh Azizi1(B), Farhad Imani1, Jin Tae Kwak5, Amir Tahmasebi2,
Sheng Xu3, Pingkun Yan2, Jochen Kruecker2, Baris Turkbey4, Peter Choyke4,
Peter Pinto4, Bradford Wood3, Parvin Mousavi6, and Purang Abolmaesumi1

1 The University of British Columbia, Vancouver, BC, Canada
Shekoofeh.a@gmail.com
2 Philips Research North America, Briarcliff Manor, NY, USA
3 National Institutes of Health, Bethesda, MD, USA
4 National Cancer Institute, Bethesda, MD, USA
5 Sejong University, Gwangjin-gu, Seoul, South Korea
6 Queen’s University, Kingston, ON, Canada

Abstract. We propose a cancer grading approach for transrectal


ultrasound-guided prostate biopsy based on analysis of temporal ultra-
sound signals. Histopathological grading of prostate cancer reports the
statistics of cancer distribution in a biopsy core. We propose a coarse-
to-fine classification approach, similar to histopathology reporting, that
uses statistical analysis and deep learning to determine the distribution
of aggressive cancer in ultrasound image regions surrounding a biopsy
target. Our approach consists of two steps; in the first step, we learn
high-level latent features that maximally differentiate benign from can-
cerous tissue. In the second step, we model the statistical distribution of
prostate cancer grades in the space of latent features. In a study with
197 biopsy cores from 132 subjects, our approach can effectively sepa-
rate clinically significant disease from low-grade tumors and benign tis-
sue. Further, we achieve the area under the curve of 0.8 for separating
aggressive cancer from benign tissue in large tumors.

Keywords: Temporal ultrasound · Cancer grading · Deep belief


network · Gaussian mixture model

1 Introduction
Prostate Cancer (PCa) is a significant public health issue. According to the
National Cancer Institute (NCI)¹, approximately 14 % of men will be diagnosed
with PCa at some point during their lifetime. Definitive diagnosis involves core
needle biopsy guided by Transrectal Ultrasound (TRUS), followed by histopatho-
logical analysis of the obtained samples. TRUS is blind to intraprostatic pathol-
ogy, and can miss clinically significant disease [5].
¹ Surveillance, Epidemiology, and End Results (SEER) Cancer Statistics Review.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 653–661, 2016.
DOI: 10.1007/978-3-319-46720-7_76

In recent years, multi-parametric Magnetic Resonance Imaging (mp-MRI)


and its fusion with TRUS has emerged as a promising technology to target
potential cancer lesions identified in mp-MRI [13,14]. While mp-MRI has neg-
ative predictive values as high as 94 % [9,12], it has a high false positive rate,
and can miss smaller tumors. Furthermore, mp-MRI can not reliably detect the
degree of aggressiveness of cancer, known as the grade. The vagaries of PCa diag-
nosis and prognosis have led to high rates of over-treatment: for every man saved
from PCa-related death, 1400 are screened and 48 undergo radical treatment [8].
Accurate detection of aggressive cancer is critical to its appropriate management.
Patients with indolent cancer can then opt for active surveillance [5,13].
There have been a large number of efforts to adopt ultrasound (US)-based
tissue typing for PCa detection as the US is affordable, accessible and real time.
PCa detection using analysis of B-mode images [10] and single frame radio-
frequency (RF) US data [6] has not had significant clinical uptake, while the
application of these methods for PCa grading is not widely reported. Elastog-
raphy [3] and Doppler imaging [11], available on many conventional US sys-
tems, have been promising for PCa detection while conflicting results have been
reported on their application to PCa grading [8]. The main shortcoming of these
approaches is the need to determine a consistent threshold for tissue properties
that can reliably identify cancer, and generalize well to prospective patients [3].
More recently, analysis of temporal US data has emerged as a promising
modality for PCa tissue typing. In this technology, a series of US frames is
captured from a stationary tissue location without intentional movement of the
tissue or the transducer. This approach has been successful in classification of
cancerous and benign prostate tissue [1,7]. It has also been employed to differ-
entiate between various cancer grades, in preliminary whole-mount studies [8].
In this paper, in a clinical study of 197 TRUS-guided biopsy cores from 132
patients, we use temporal US to address the problem of PCa grading. We pro-
pose an approach that is based on deep learning and statistical analysis of image
regions corresponding to biopsy targets. It has two components (Fig. 1): (1) fea-
ture learning, where a deep learning architecture derives a set of high-level latent
features to separate benign from the cancerous tissue; and (2) distribution learn-
ing, where clustering is applied in the space of the latent features to determine
the cancer grade. Our proposed approach is effective in differentiating aggressive
PCa from clinically-less-significant disease and non-cancerous tissue.

2 Materials and Methods


2.1 Data
One hundred and thirty-two (132) subjects were enrolled in the study. All sub-
jects provided informed consent to participate and the study was approved by
the institutional research ethics board. The subjects underwent a diagnostic mp-
MRI of the prostate. The mp-MRI sequences were examined by two independent
radiologists to identify primary and secondary cancerous lesions (with cancer
suspicious level assigned to as low, intermediate, or high), and to provide the

“largest diameter of tumor”. Subjects with suspicious lesions underwent MRI-


guided targeted TRUS biopsies using the UroNav (Invivo Corp., FL) MR/US
fusion system. During biopsy, T2-weighted MR images were registered to the 3D
US volume of the prostate using UroNav. The clinician then navigated in the
prostate volume towards the MR-identified target; the TRUS transducer was
held steady for about 5 s to acquire 100 frames of temporal US data from the
target, and the biopsy core was taken. Two cores were obtained for the primary
lesion; one in the axial, the other in the sagittal plane. Temporal US data was
only recorded from the primary lesion in the axial imaging plane to minimize
disruption of the clinical work flow. Histopathology labels of the cores were used
as the ground-truth (Fig. 1).

Fig. 1. An illustration of the proposed cancer grading approach.

For each target, the Gleason Score (GS) and the % distribution of PCa in
the axial and sagittal samples were reported. The GS is used to describe PCa
grade and ranges from 1 (resembling normal tissue) to 5 (aggressive cancerous
tissue). It is reported as a sum of the grades of the two most common patterns
in a tissue specimen. We only include cores in our study where the axial and
sagittal pathology match. Of the 197 cores in our data, 57 were cancerous (12 with
GS 3+3, 19 with GS 3+4, 4 with GS 4+3, 20 with GS 4+4, and 2 with GS 4+5),
while 140 had non-cancerous histology including benign or fibromuscular tissue,
chronic inflammation, atrophy and Prostatic Intraepithelial Neoplasia (PIN).
We divide the data from 197 cores into training and testing sets. Training data
consists of 32 biopsy cores from 27 patients with the following histopathology
labels: 19 benign, 0 GS 3+3, 5 GS 3+4, 2 GS 4+3, 4 GS 4+4 and, 2 GS 4+5.
The test data is made up of 165 cores from 114 patients, with the following
distribution: 121 benign, 12 GS 3+3, 14 GS 3+4, 2 GS 4+3, and 16 GS 4+4.
656 S. Azizi et al.

2.2 Preprocessing

We compute the spectrum of temporal US data obtained from each biopsy core.
For this purpose, we analyze an area of 2 × 10 mm² around the target location in
the lateral and axial directions, respectively. This region is along the projected
needle path in the US image and centered on the target. We divide the selected
area into 20 equally sized Regions of Interest (ROIs) of size 1 mm². For each ROI,
we take the Fourier transforms of all time series corresponding to the RF samples
in each ROI, normalized to the frame rate. Then, we average the absolute values
of the Fourier transforms of the RF time series in each ROI. Finally, each ROI
is represented by 50 positive frequency components (see Fig. 1).
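As a rough Python sketch of this preprocessing step (the exact form of the frame-rate normalization and the array layout are our assumptions, not given in the text):

```python
import numpy as np

def roi_spectrum(rf_frames, frame_rate, n_components=50):
    """Average spectrum of the RF time series in one 1 mm^2 ROI.

    rf_frames: (n_frames, n_samples) array -- ~100 temporal US frames,
    one column per RF sample location inside the ROI.
    """
    # One time series per RF sample; FFT along the time (frame) axis,
    # normalized to the frame rate (assumed to mean division by it)
    spectra = np.abs(np.fft.rfft(rf_frames, axis=0)) / frame_rate
    # Average the magnitude spectra over all RF samples in the ROI
    mean_spectrum = spectra.mean(axis=1)
    # Keep the first 50 positive-frequency components as the feature vector
    return mean_spectrum[1:n_components + 1]
```

Each biopsy core then yields twenty such 50-dimensional spectral feature vectors, one per ROI.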

2.3 Cancer Grading

Grading can be considered as a multi-class classification problem, where the
objective is to determine if an area in the tissue is benign or has various grades
of PCa (Grades 3, 4 or 5). Training such a classifier with prostate biopsy data is
non-trivial: the ground-truth histopathology reports a measure of the statistical
distribution of cancer in a biopsy core. The exact location of the cancerous tissue
in the core is not provided. Therefore, the exact label of each ROI in a core is not
available; rather, the statistics of ROIs with various labels in a core are known.
We propose a coarse-to-fine classification approach that, similar to histopathology
reporting, calculates a statistical representation of the distribution of ROIs in
the various classes (benign, grade 3, or grade 4). The approach has two steps (Fig. 1):
(1) feature learning to extract latent features that maximally separate benign
from cancerous tissues; and (2) distribution learning to model the statistical
distribution of cancer grades in the space of learnt features.

Feature Learning: We use a Deep Belief Network (DBN) structure [1] to map
the set of 50 spectral components for each ROI to six high-level latent features.
The network structure includes 100, 50 and 6 hidden units in three layers, where
the last hidden layer represents the latent features. In the pre-training step,
the learning rate is fixed at 0.001, mini-batch size is 5, and the epoch is 100.
Momentum and weight cost are set to defaults of 0.9 and 2 × 10⁻⁴, respectively.
For discriminative fine-tuning, a node is added to represent the labels of obser-
vations, and back-propagation with a learning rate of 0.01 for 70 epochs and
mini-batch size of 10 is used. We perform dimensionality reduction in the space
of the latent features. We use Zero-phase Component Analysis [2] to whiten the
features and determine the top two eigenvectors, f1 and f2 . We call this space
the eigen feature space.
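A minimal sketch of this whitening-plus-projection step, assuming the 6-D latent activations are available as a matrix (the eps regularizer is our addition):

```python
import numpy as np

def eigen_feature_space(latent, eps=1e-8):
    """Project 6-D DBN latent features into the 2-D 'eigen feature space'.

    latent: (n_rois, 6) array of last-hidden-layer activations.
    ZCA-whitens the features, then keeps the coordinates along the top
    two eigenvectors f1, f2 of the covariance matrix.
    """
    X = latent - latent.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))  # ascending eigenvalues
    W = V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T   # ZCA whitening matrix
    Xw = X @ W
    f12 = V[:, ::-1][:, :2]                         # top two eigenvectors
    return Xw @ f12                                 # (n_rois, 2) coordinates
```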

Distribution Learning: We use the training data to build a Gaussian Mixture
Model (GMM) [15] to represent the distribution of different Gleason
patterns in the eigen feature space. The K-component GMM is denoted by
Θ = {(ωk, μk, Σk) | k = 1, ..., K}, where ωk is the mixing weight
(∑_{k=1}^{K} ωk = 1), μk is the mean, and Σk is the covariance matrix of the
k-th mixture component.
Starting with an initial mixture model, the parameters of Θ are estimated with
Expectation-Maximization (EM) [15]. The EM algorithm is a local optimiza-
tion method, and hence particularly sensitive to the initialization of the model.
Instead of random initialization, we present a simple but efficient method for
finding initial parameters based on our prior knowledge from pathology.

GMM Initialization: Let XH be the set of all ROIs within cores of the training
data with histopathology labels H ∈ {benign, GS 3+4, GS 4+3, GS 4+4}. We
first analyze the distribution of the ROIs of benign cores, Xbenign , in the eigen
feature space; we observe two distinct clusters (Fig. 2) that span histopathology
labels of normal and fibromuscular tissue, chronic inflammation, atrophy, and
PIN. We use k-means clustering to separate the two clusters; we consider the
cluster with the maximum number of “normal tissue” ROIs as the dominant
benign cluster, and the second cluster as a representative for other non-cancerous
tissue. Next, we use ROIs in the training dataset that correspond to the cores
with GS 4+4, XGS4+4 , to identify the dominant cluster that represents Gleason
4 pattern. Finally, we use all other ROIs from cancerous cores that correspond
to GS 3+4 and GS 4+3 to identify the centre for Gleason 3 pattern in the eigen
feature space. We denote the centroids of the clusters by C = {Cbenign , CG4 , CG3 ,
Cnoncancerous }. To initialize the K-component GMM, we set K = 4 to model the
four tissue patterns with mean, μk , for each Gaussian component equal to the
centroid of each cluster. We use equal covariance matrices for all components
and set Σk to the covariance of XH. Each ωk, k = 1, ..., K, is randomly drawn
from a uniform distribution on [0, 1], and the weights are normalized so that
∑_{k=1}^{K} ωk = 1.
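Assuming the grouped training ROIs are available as 2-D eigen-feature arrays, the initialization and EM refinement might look like the sketch below. The cluster bookkeeping is simplified: the benign split uses k-means as described, but cluster roles are taken in order rather than by "normal tissue" counts, and the G4/G3 centres are taken as class means rather than dominant-cluster centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def fit_grading_gmm(X_benign, X_gs44, X_gs34_43, seed=0):
    """Fit the 4-component GMM with pathology-informed initialization."""
    # Split benign-core ROIs into a dominant benign cluster and an
    # 'other non-cancerous' cluster
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X_benign)
    c_benign, c_noncancer = km.cluster_centers_
    c_g4 = X_gs44.mean(axis=0)      # Gleason 4 centre from GS 4+4 cores
    c_g3 = X_gs34_43.mean(axis=0)   # Gleason 3 centre from GS 3+4 / 4+3 cores
    means = np.vstack([c_benign, c_noncancer, c_g4, c_g3])

    X = np.vstack([X_benign, X_gs44, X_gs34_43])
    cov = np.cov(X, rowvar=False)   # equal covariance for all components
    prec = np.linalg.inv(cov)
    w = np.random.default_rng(seed).uniform(size=4)  # random mixing weights
    gmm = GaussianMixture(
        n_components=4, covariance_type="full",
        weights_init=w / w.sum(), means_init=means,
        precisions_init=np.repeat(prec[None], 4, axis=0),
        random_state=seed,
    )
    return gmm.fit(X)               # EM refinement of the initial model
```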

Fig. 2. An illustration of the proposed GMM initialization method.



Prediction of Gleason Score: For each test core, we map the data from 20
ROIs in that core to the eigen feature space. Subsequently, we assign a label
from {benign, G3, G4, non-cancerous} to each ROI based on its proximity to
the corresponding cluster centre in the eigen feature space. To determine a GS
for a test core, Y, we follow histopathology guidelines where we use the ratio
of the number of ROIs labeled as benign, G3 (NG3 ) and G4 (NG4 ) (e.g., a core
with a large number of G4 and a small number of G3 ROIs has GS 4+3):


        ⎧ GS 4+3 or higher,  if NG4 ≠ 0 and NG4 ≥ NG3
    Y = ⎨ GS 3+4 or lower,   if NG3 ≠ 0 and NG4 < NG3
        ⎩ benign,            otherwise
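In code, this core-level decision rule reads (the label strings are illustrative):

```python
def core_gleason_label(roi_labels):
    """Map the 20 per-ROI labels of a core to a core-level prediction.

    Implements: GS 4+3 or higher if N_G4 != 0 and N_G4 >= N_G3;
    GS 3+4 or lower if N_G3 != 0 and N_G4 < N_G3; benign otherwise.
    """
    n_g3 = roi_labels.count("G3")
    n_g4 = roi_labels.count("G4")
    if n_g4 != 0 and n_g4 >= n_g3:
        return "GS 4+3 or higher"
    if n_g3 != 0 and n_g4 < n_g3:
        return "GS 3+4 or lower"
    return "benign"
```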

3 Results and Discussion


We assess the overall performance of our approach using the area under the
receiver operating characteristic curve (AUC). This curve depicts relative trade-
offs between sensitivity and specificity where larger AUC values indicate better
classification performance. Figure 3 (top) shows the target location and distri-
bution of histopathologic outcome of biopsies in the prostate, as divided into
anterior/posterior, and central/peripheral zones for base, midgland and apex.
Figure 3 (bottom) shows our predictions of cancer grades using temporal US
data. The distribution of cancerous cores out of all biopsies by location within
the gland was 34 % (19 out of 56 biopsies) in the central region, and 24 % (25 out
of 109 biopsies) in the peripheral region. Although more biopsies were performed
in the peripheral zone, a higher portion of positive biopsies was observed in the
central zone. In the central zone, we can differentiate between non-cancerous
targets and clinically significant cancer (GS ≥ 4+3) with an AUC of 0.80.
Table 1 shows the classification performance based on the inter-class AUC. To
investigate the effect of the size of the tumor on our detection performance, we
analyze the AUC against the greatest length of the tumor in MRI for each target
biopsy, ranging from 0.3 cm to 3.8 cm. We obtained an AUC of 0.70 for cores with
an MR tumor size ≥ 2.0 cm. The results show that our method has a higher
performance for larger tumors.
We also performed an analysis to determine the sensitivity of our method-
ology to the choice of the training data. We create 32 pairs of training and
testing datasets: each new pair of datasets is identical to the original except that
one benign or cancerous core is swapped between the datasets in the pair. As
Table 1 shows, the average AUC of the sensitivity analysis is consistent with our
previous performance results, which supports the generalization of the proposed model.
Finally, we combine our cancer grading results with readings from mp-
MRI. The combination takes advantage of both imaging techniques. If mp-MRI
declares the cancer suspicion level as low or high for a core, we use its predictions
alone and declare the core as benign or aggressive cancer, respectively. On the
other hand, when mp-MRI declares the suspicion level as intermediate (70 %
of all cores in our data), we use predictions based on temporal US data. The
combined approach leads to an AUC of 0.72 for predicting cancer grade versus
either 0.65 using mp-MRI or 0.69 using temporal US data. The combined AUC
is 0.83 for tumors with L ≥ 2.0 cm.

Fig. 3. Target location and distribution of biopsies in the test data, shown for (a) the
base, (b) the mid-gland, and (c) the apex. Light and dark gray indicate central and
peripheral zones, respectively. The pie charts indicate the number of cores and their
histopathology. The size of each chart is proportional to the number of biopsies (in the
range from 1 to 25), and the colors dark red, light red, and blue refer to cores with
GS ≥ 4+3, GS ≤ 3+4, and benign pathology, respectively. The top and bottom rows
depict histopathology results and our grade predictions, respectively.
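This mp-MRI/US fusion rule can be sketched as a small function (the score interface, a cancer probability in [0, 1], is our assumption):

```python
def fused_grade_score(mpmri_suspicion, us_score):
    """Combine the mp-MRI suspicion level with the temporal-US score.

    Trust mp-MRI when it is confident ('low' -> benign, 'high' ->
    aggressive cancer); otherwise fall back on the temporal-US prediction.
    """
    if mpmri_suspicion == "low":
        return 0.0          # declare the core benign
    if mpmri_suspicion == "high":
        return 1.0          # declare aggressive cancer
    return us_score         # 'intermediate': use temporal US
```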

Table 1. Model performance for classification of cores in the test dataset and permu-
tation set. L is the greatest length of the tumor visible in mp-MRI.

Evaluation                     Test dataset             Permutation dataset
                               All cores   L ≥ 2.0 cm   All cores   L ≥ 2.0 cm
Non-cancerous vs. GS ≥ 4+3     0.69        0.80         0.68        0.78
Non-cancerous vs. GS ≤ 3+4     0.62        0.63         0.62        0.62
GS ≤ 3+4 vs. GS ≥ 4+3          0.61        0.67         0.60        0.67
Non-cancerous vs. Cancerous    0.62        0.70         0.61        0.69

4 Conclusion
In this paper, in an in vivo study including 197 TRUS-guided biopsy cores,
temporal US data was used to differentiate between clinically less significant
prostate cancer (GS ≤ 3+4), aggressive prostate cancer (GS ≥ 4+3), and
non-cancerous prostate tissue. Determining the aggressiveness of prostate cancer can help
reduce the current high rate of over-treatment in patients with indolent can-
cer. We utilized a two step machine learning approach to address the challenges
related to ground-truth labeling in PCa grading. First, differentiating features for
detection of cancerous and non-cancerous prostate tissue were learned, and then
the statistical distribution of PCa grades was modeled using a GMM. We showed
that we could successfully differentiate among aggressive PCa (GS ≥ 4+3), clin-
ically less significant PCa (GS ≤ 3+4), and non-cancerous prostate tissues. Fur-
thermore, combination of temporal US and mp-MRI has the potential to out-
perform either modality alone in detection of PCa.
Future work includes: (1) examining physical phenomena governing US time
series tissue typing. Our results to-date suggest that tissue microvibration, pos-
sibly due to cardiac pulsation, and changes in tissue temperature due to acoustic
energy [4] play key roles; (2) an inter-institution patient study to determine the
accuracy across a wide range of patient subpopulations. By displaying the pre-
dicted grade not only for the target, but also for regions surrounding the target,
we will determine if US time series can increase cancer yield.

References
1. Azizi, S., et al.: US-based detection of PCa using automatic feature selection with
deep belief networks. In: MICCAI, pp. 70–77. Springer (2015)
2. Bell, A.J., Sejnowski, T.J.: The independent components of natural scenes are edge
filters. Vis. Res. 37(23), 3327–3338 (1997)
3. Correas, J.M., et al.: PCa: diagnostic performance of real-time shear-wave elastog-
raphy. Radiology 275(1), 280–289 (2014)
4. Daoud, M., et al.: Tissue classification using US-induced variations in acoustic
backscattering features. IEEE TBME 60(2), 310–320 (2013)
5. Epstein, J.I., et al.: Upgrading and downgrading of PCa from biopsy to radical
prostatectomy: incidence and predictive factors using the modified Gleason grading
system and factoring in tertiary grades. Eur. Urol. 61(5), 1019–1024 (2012)
6. Feleppa, E., et al.: Recent advances in ultrasonic tissue-type imaging of the
prostate. In: Acoustical Imaging, pp. 331–339. Springer (2007)
7. Imani, F., et al.: US-based characterization of PCa using joint independent com-
ponent analysis. IEEE TBME 62(7), 1796–1804 (2015)
8. Khojaste, A., et al.: Characterization of aggressive PCa using US RF time series.
In: SPIE Med. Imaging, p. 94141A (2015)
9. Kuru, T.H., et al.: Critical evaluation of magnetic resonance imaging targeted,
transrectal US guided transperineal fusion biopsy for detection of PCa. J. Urol.
190(4), 1380–1386 (2013)
10. Llobet, R., et al.: Computer-aided detection of PCa. Int. J. Med. Inform. 76(7),
547–556 (2007)
11. Nelson, E.D., et al.: Targeted biopsy of the prostate: the impact of color Doppler
imaging and elastography on PCa detection and Gleason score. Urology 70(6),
1136–1140 (2007)
12. de Rooij, M., et al.: Accuracy of multiparametric MRI for PCa detection: a meta-
analysis. Am. J. Roentgenol. 202(2), 343–351 (2014)
13. Siddiqui, M.M., et al.: Comparison of MR/US fusion-guided biopsy with US-guided
biopsy for the diagnosis of PCa. JAMA 313(4), 390–397 (2015)
14. Vargas, H.A., et al.: Diffusion-weighted endorectal MRI at 3T for PCa: tumor
detection and assessment of aggressiveness. Radiology 259(3), 775–784 (2011)
15. Xu, L., Jordan, M.I.: On convergence properties of the EM algorithm for Gaussian
mixtures. Neural Comput. 8(1), 129–151 (1996)
Characterization of Lung Nodule Malignancy
Using Hybrid Shape and Appearance Features

Mario Buty¹, Ziyue Xu¹(B), Mingchen Gao¹, Ulas Bagci², Aaron Wu¹,
and Daniel J. Mollura¹
¹ National Institutes of Health, Bethesda, MD, USA
ziyue.xu@nih.gov
² University of Central Florida, Orlando, FL, USA

Abstract. Computed tomography imaging is a standard modality for
detecting and assessing lung cancer. In order to evaluate the malignancy
of lung nodules, clinical practice often involves expert qualitative ratings
on several criteria describing a nodule’s appearance and shape. Translat-
ing these features for computer-aided diagnostics is challenging due to
their subjective nature and the difficulties in gaining a complete descrip-
tion. In this paper, we propose a computerized approach to quantitatively
evaluate both appearance distinctions and 3D surface variations. Nod-
ule shape was modeled and parameterized using spherical harmonics,
and appearance features were extracted using deep convolutional neural
networks. Both sets of features were combined to estimate the nodule
malignancy using a random forest classifier. The proposed algorithm
was tested on the publicly available Lung Image Database Consortium
dataset, achieving high accuracy. By providing lung nodule characteriza-
tion, this method can provide a robust alternative reference opinion for
lung cancer diagnosis.

Keywords: Nodule characterization · Conformal mapping · Spherical
harmonics · Deep convolutional neural network · Random forest

1 Introduction

Lung cancer led to approximately 159,260 deaths in the US in 2014 and is the
most common cancer worldwide. The increasing relevance of pulmonary CT data
has triggered dramatic growth in the computer-aided diagnostics (CAD) field.
Specifically, the CAD task for interpreting chest CT scans can be broken down
into separate steps: delineating the lungs, detecting and segmenting nodules,
and using the image observations to infer clinical judgments. Multiple techniques
have been proposed and subsequently studied for each step. This work focuses
on characterizing the segmented nodules.

Z. Xu—This research is supported by CIDI, the intramural research program of


the National Institute of Allergy and Infectious Diseases (NIAID) and the National
Institute of Biomedical Imaging and Bioengineering (NIBIB).

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 662–670, 2016.
DOI: 10.1007/978-3-319-46720-7 77

Clinical protocols for identifying and assessing nodules, specifically the Fleis-
chner Society Guidelines, involve monitoring the size of the nodule with repeated
scans over a period of three months to two years. Ratings on several image-based
features may also be considered, including growth rate, spiculation, sphericity,
texture, etc. Features like size can be quantitatively estimated via image segmen-
tation, while other markers are mostly judged qualitatively and subjectively. For
nodule classification, existing CAD approaches are often based on sub-optimal
stratification of nodules solely based on their morphology. Malignancy is then
roughly correlated with broad morphological categories. For instance, one study
found malignancy in 82 % of lobulated nodules, 97 % of densely spiculated nod-
ules, 93 % of ragged nodules, 100 % of halo nodules, and 34 % of round nodules [1].
Subsequent approaches incorporated automatic or manual definitions of similar
shape features, along with various other contextual or appearance features into
linear discriminant classifiers. However, these features are mostly subjective and
arbitrarily-defined [2]. These limitations reflect the challenges in achieving a com-
plete and quantitative description of malignant nodule appearances. Similarly,
it is difficult to model the 3D shape of a nodule, which is not directly compre-
hensible with the routine slice-wise inspection of human observers. Therefore,
the extraction of proper appearance features, as well as shape description, are
of great value for the development of CAD systems.
For 3D shape modeling, spherical harmonic (SH) parameterizations offer an
effective model of 3-D shapes. As shape descriptors, they have been used success-
fully in many applications such as protein structure [3], cardiac surface match-
ing [4], and brain mapping [5]. While SH has been shown to successfully discrimi-
nate between malignant and benign nodules (with 93 % accuracy for binary sepa-
ration) [6], using the SH coefficients to uniquely describe a nodule’s “fingerprint”
remains largely unexplored [2]. Also, as a scale- and rotation-invariant descriptor
of a mesh surface, SH does not have the capability of describing a nodule’s size and
other critical appearance features, e.g., solid, sub-solid, part-solid, or peri-fissural.
Hence, SH alone may not be sufficient for nodule characterization.
Recently, deep convolutional neural networks (DCNNs) have been shown to
be effective at extracting image features for successful classification across a
variety of situations [7,8]. More importantly, studies on “transfer learning” and
using DCNN as a generic image representation [9–11] have shown that successful
appearance feature extraction can be achieved without the need of significant
modifications to DCNN structures, or even training on the specific dataset [10].
While simpler neural networks have been used for nodule appearance [2], and
DCNN has recently been used to classify peri-fissural nodules [12], to our knowl-
edge, DCNNs such as the Imagenet DCNN introduced by Krizhevsky et al. [7]
have not been applied to the nodule malignancy problem, nor have they been
combined with 3D shape descriptors such as the SH method.
In this paper, we present a classification approach for malignancy evaluation
of lung nodules by combining both shape and appearance features using SHs
and DCNNs, respectively, on a large annotated dataset from the Lung Image
Database Consortium (LIDC) [13]. First, a surface parameterization scheme
based on SH conformal mapping is used to model the variations of 3D nodule
shape. Then, a trained DCNN is used to extract the texture and intensity
features from local image patches. Finally, the sets of DCNN and SH coefficients
are combined and used to train a random forest (RF) classifier for evaluation of
their corresponding malignancy scores, on a scale of 1 to 5. The proposed algo-
rithm aims to achieve a more complete description of local nodules from both
shape (SH) and appearance (DCNN) perspective. In the following sections, we
discuss the proposed method in more detail.

2 Methods

Our method works from two inputs: radiologists’ binary nodule segmentations
and the local CT image patches. First, we produce a mesh representation of
each nodule from the binary segmentation using the method from [5]. These are
then mapped to the canonical parameter domain of SH functions via conformal
mapping, giving us a vector of function coefficients as a representation of the
nodule shape. Second, using local CT images, three orthogonal local patches
containing each nodule are combined as one image input for the DCNN, and
appearance features are extracted from the first fully-connected layer of the
network. This approach for appearance feature extraction is based on recent
work in “transfer learning” [9,10]. Finally, we combine shape and appearance
features together and use a RF classifier to assess nodule malignancy rating.

2.1 Spherical Harmonics Computation

SHs are a series of basis functions for representing functions defined over the unit
sphere S².
Euclidean space into the space of SHs. In order to do this, a shape must be first
mapped onto a unit sphere. Conformal mapping is used for this task. It functions
by performing a set of one-to-one surface transformations preserving local angles,
and is especially useful for surfaces with significant variations, such as brain
cortical surfaces [5]. Specifically, let M and N be two Riemannian manifolds,
then a mapping φ : M → N will be considered conformal if local angles between
curves remain invariant. Following the Riemann mapping theorem, a simple
surface can always be mapped to the unit sphere S², producing a spherical
parameterization of the surface.
For genus zero closed surfaces, conformal mapping is equivalent to a harmonic
mapping satisfying the Laplace equation, Δf = 0. For our application, nodules
have approximately spherical shape, with bounded local variations. Therefore, it
is an ideal choice to use spherical conformal mapping to normalize and parame-
terize the nodule surface to a unit sphere. We first convert the binary segmenta-
tions to meshes, and then perform conformal spherical mapping with harmonic
energy minimization. Further technical details can be found in [5].
With spherical conformal mapping, we are able to model the variations of
different nodule shapes onto a unit sphere. However, it is still challenging to
judge and quantify the differences within the S² space. Therefore, SHs are used
to map S² to the real space ℝ.
Similar to the Fourier series as a basis for the circle, SHs are capable of decom-
posing a given function f ∈ S² into a direct sum of irreducible sub-representations:

    f = ∑_{l≥0} ∑_{|m|≤l} f̂(l, m) Y_l^m,

where Y_l^m is the m-th harmonic basis of degree l, and f̂(l, m) is the corresponding
SH coefficient. Compared to directly using the surface in S², this gives us two
major benefits: first, the extracted representation features are rotation, scale,
and transformation invariant [5]; second, it is much easier to compute the cor-
relation between two vectors than two surfaces. Therefore, SHs are a powerful
representation for further shape analysis.
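As a rough illustration of the decomposition itself (not of the full conformal-mapping pipeline), the coefficients of a function sampled on a regular spherical grid can be estimated by quadrature. The basis is built here from `scipy.special.lpmv`; the grid layout and resolution are assumptions:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def sph_harm_basis(l, m, theta, phi):
    """Y_l^m for azimuth theta in [0, 2*pi) and polar angle phi in [0, pi]."""
    am = abs(m)
    norm = np.sqrt((2 * l + 1) / (4 * np.pi)
                   * factorial(l - am) / factorial(l + am))
    y = norm * lpmv(am, l, np.cos(phi)) * np.exp(1j * am * theta)
    return (-1) ** am * np.conj(y) if m < 0 else y

def sh_coefficients(f_grid, max_degree):
    """Estimate f_hat(l, m) for f sampled on an (azimuth x polar) grid."""
    n_az, n_pol = f_grid.shape
    theta = np.linspace(0, 2 * np.pi, n_az, endpoint=False)
    phi = (np.arange(n_pol) + 0.5) * np.pi / n_pol         # midpoint rule
    T, P = np.meshgrid(theta, phi, indexing="ij")
    dA = (2 * np.pi / n_az) * (np.pi / n_pol) * np.sin(P)  # area element
    return {(l, m): np.sum(f_grid * np.conj(sph_harm_basis(l, m, T, P)) * dA)
            for l in range(max_degree + 1) for m in range(-l, l + 1)}
```

For the constant function f ≡ 1, the only non-negligible coefficient is f̂(0, 0) = 2√π; in the paper, such coefficient vectors serve as the nodule shape descriptor.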

Fig. 1. Example of SH coefficients’ difference for different nodules and segmentations.
The top two rows show a comparison of high-malignancy and low-malignancy nodules,
and the difference of their SH coefficient values. The bottom two rows show that two
different segmentations of the same nodule have much more similar SH coefficients.
Nonetheless, differences still remain, motivating supplementing shape-based descriptors
with appearance-based ones.

Fig. 1 illustrates the process of computing SH representations. It also compares
the SH coefficients of four nodule segmentation cases: nodules with high
and low malignancy, and two segmentations of the same nodule by different radi-
ologists. From the manual segmentations, we first generate their corresponding
3D mesh. Then, the mesh is conformally mapped to the unit sphere and subse-
quently decomposed into a series of SH coefficients. Here, we briefly compare the
two resulting SHs by using their direct difference. For comparison, the last two
rows show the SH computation for the same nodule, but with different segmen-
tations from two annotators. As illustrated, the SH coefficients have far greater
differences between malignant and benign nodules than between two segmentations
of the same nodule, showing that it is possible to use SH coefficients to estimate the
malignancy rating of a specific nodule. Even so, as the figure demonstrates, for
nodules consisting of only a limited number of voxels, a change in segmentation
could lead to some discrepancy in SH coefficients. For such cases, SH may not
be able to serve as a reliable marker for malignancy, and we need to assist the
classification with further information, i.e., appearance.

2.2 DCNN Appearance Feature Extraction

The goal of the DCNN appearance feature extraction is to obtain a representation
of local nodule patches and relate them to malignancy. Here, we have
used the same DCNN structure used by Krizhevsky et al. [7], which has demon-
strated success in multiple applications. This network balances discriminative
power with computational efficiency by using five convolutional layers followed
by three fully-connected layers. With the trained DCNN, each layer provides
different levels of image representations.

Fig. 2. Process of appearance feature extraction. Local patches centered at each nodule
were first extracted on three orthogonal planes. Then, an RGB image is generated
with one patch fed to each channel. This image is further resampled and used as input
to a trained DCNN. The resulting coefficients in the first fully-connected layer (yellow)
are then used as the feature vector for nodule appearance.

Fig. 2 shows the process by which each candidate was quantitatively coded. We
first convert a local 3D CT image volume to an RGB image, which is the required
input to the DCNN structure we use [7]. Here, we used a fixed-size cubic ROI
centered at each segmentation’s center of mass with the size of the largest nodule.
Since voxels in the LIDC dataset are mostly anisotropic, we used interpolation to
achieve isotropic resampling, avoiding distortion effects in the resulting patches.
In order to best preserve the appearance information, we performed principal
component analysis (PCA) on the binary segmentation data to identify the three
orthogonal axes x′, y′, z′ of the local nodule within the regular x, y, z space of
axial, coronal, and sagittal planes. Then, we resampled the local space within the
x′y′, x′z′, and y′z′ planes to obtain local patches containing the nodule. The three
orthogonal patch samples formed an “RGB” image used as input to the DCNN’s
expected three channels. We use Krizhevsky et al.’s pre-trained model for natural
images and extract the coefficients of the last few layers of the DCNN as a high-
order representation of the input image. This “transfer learning” approach from
natural images has proven successful within medical-imaging domains [9,10].
As an added benefit, no training of the DCNN is required, avoiding this time-
consuming and computationally expensive step. For our application, we use the
first fully-connected layer as the appearance descriptor.

2.3 RF Classification

By using SH and DCNN, both appearance and shape features of nodules can
be extracted as a vector of scalars, which in turn can be used together to dis-
tinguish nodules with different malignancy ratings. Combining these two very
different feature types is not trivial. Yet, recent work [14] has demonstrated that
non-image information can be successfully combined with CNN features using
classifiers. This success motivates our use of the RF classifier to synthesize the
SH and DCNN features together. The RF method features high accuracy and
efficiency, and is well-suited for problems of this form [15]. It works by “bag-
ging” the data to generate new training subsets with limited features, which are
in turn used to create a set of decision trees. A sample is then put through all
trees and voted on for correct classification. While the RF is generally insensitive
to parameter changes, we found that a set of 200 trees delivered accurate and
timely performance.
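Combining the two descriptor vectors and training the forest is then only a few lines; the feature dimensions and all hyperparameters other than the 200 trees are assumptions (scikit-learn defaults):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_malignancy_rf(sh_feats, dcnn_feats, ratings, seed=0):
    """Train the 200-tree RF on concatenated shape + appearance features.

    sh_feats: (n, d_sh) SH coefficient features; dcnn_feats: (n, d_cnn)
    DCNN activations; ratings: (n,) expert malignancy ratings in 1..5.
    """
    X = np.hstack([sh_feats, dcnn_feats])   # hybrid feature vector
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    return rf.fit(X, ratings)
```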

3 Experiments and Results


We trained and tested our method on the Lung Image Database Consortium
(LIDC) image collection [13], which consists of 1018 helical thoracic CT scans.
Each scan was processed by four blinded radiologists, who provided segmenta-
tions, shape and texture characteristic descriptors, and also malignancy ratings.
Inclusion criteria consisted of scans with a collimation and reconstruction inter-
val less than or equal to 3 mm, and those with between approximately 1 and 6
lung nodules with longest dimensions between 3 and 30 mm. The LIDC dataset
was chosen for its high-quality and numerous multi-radiologist assessments.
In total 2054 nodules were extracted with 5155 segmentations, and 1432 nod-
ules were marked by at least 2 annotators. Different segmentations/malignancy
ratings were treated individually. In order to avoid training and testing against
different segmentations of the same nodule, the dataset was split at the nodule level
to avoid bias. Different segmentations of the same nodule were grouped into sets based
on the mean Euclidean distance between their ROI centers using a threshold of
5 mm. To account for mis-meshing and artifacts from interpolating slices, meshes
were processed by filters to remove holes and fill islands. We also applied 1-step
Laplacian smoothing.
Judging from the distribution of malignancy ratings for all annotating radiologists
and based on Welch’s t-test, inter-observer differences are significant among
annotators. Meanwhile, according to the range of malignancy rating differences
for any specific nodule, most nodules have a rating discrepancy of 2 or 3 among
different annotators, indicating that inter-observer variability is highly signifi-
cant. Therefore to evaluate the performance of the proposed framework, we used
“off-by-one” accuracy, meaning that we regard a malignancy rating with ±1 as
a reasonable and acceptable evaluation.
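This metric is a one-liner in numpy:

```python
import numpy as np

def off_by_one_accuracy(predicted, reference):
    """Fraction of predicted ratings within +/-1 of the reference rating."""
    predicted, reference = np.asarray(predicted), np.asarray(reference)
    return float(np.mean(np.abs(predicted - reference) <= 1))
```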
Accuracy results for 10-fold cross validation are shown in Table 1 for a range
of nodule sets and SH coefficients. Three sets of models were used, one using
DCNN features only, one using SH coefficients only, and one using both SH and
DCNN features. Models were tested with a range of input parameters, including
maximum number of coefficients included and minimum number of annotators
marking the nodule. In all cases, the hybrid model achieved better results than
both individual models using the same input parameters. The hybrid model
results are even more impressive when compared against the inter-observer vari-
ability of the LIDC dataset. These results indicate that DCNNs and SHs provide
complementary appearance and shape information that can help provide
reference malignancy ratings of lung nodules.

Table 1. Off-by-one accuracy for SH-only, DCNN-only, and hybrid models, for
input sets defined by the minimum number of annotators marking the nodule and
the maximum number of SH coefficients included. (DCNN features do not depend on
the number of SH coefficients, so the DCNN-only accuracy is listed once per
annotation setting.)

Min annotations  # of SH coeffs  DCNN only  SH only  SH+DCNN
1                100             0.791      0.772    0.812
1                150                        0.783    0.824
1                400                        0.774    0.807
2                100             0.759      0.761    0.803
2                150                        0.779    0.793
2                400                        0.761    0.824
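The nodule-level cross-validation split and the hybrid feature construction described above can be sketched as follows. The function names and the simple round-robin fold assignment are illustrative assumptions, and the RF classifier itself is omitted:

```python
import random

def nodule_level_folds(nodule_ids, n_folds=10, seed=0):
    """Assign a cross-validation fold per *nodule*, so that all segmentations
    of a nodule land in the same fold and never straddle train/test."""
    unique = sorted(set(nodule_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    fold_of = {nid: i % n_folds for i, nid in enumerate(unique)}
    return [fold_of[nid] for nid in nodule_ids]

def hybrid_features(sh_coeffs, dcnn_feats):
    """Concatenate shape (SH) and appearance (DCNN) descriptors per sample,
    as would be fed to a random forest classifier."""
    return [list(s) + list(d) for s, d in zip(sh_coeffs, dcnn_feats)]

# Nodule "a" has two segmentations; both must share a fold.
folds = nodule_level_folds(["a", "a", "b", "c"], n_folds=2)
feats = hybrid_features([[0.1, 0.2]], [[0.9]])
```

Splitting by nodule rather than by segmentation is what prevents near-duplicate samples of one nodule from appearing in both training and test sets.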

4 Discussion and Conclusion


In this study, we presented an approach for generating a reference opinion on
lung nodule malignancy based on the knowledge of experts' characterizations.
Our method is based on hybrid feature sets that combine shape features, from an
SH decomposition, with appearance features, from a DCNN trained on natural
images. Both feature sets are subsequently used for malignancy classification
with an RF classifier.
Characterization of Lung Nodule Malignancy 669

There are many promising avenues for future work. For instance, the method
would benefit from a larger and more accurate testing pool, as well as the
inclusion of more reliable and precise ground-truth data beyond experts'
subjective evaluations. Using additional complementary information, such as
volume and scale-based features, may also further improve scores. In this
study, we represented a nodule's appearance within orthogonal planes along
three PCA axes; including more 2D views, or even a 3D DCNN, could potentially
improve upon the promising results of the current setting. The rating
classification can also be formulated as regression, although the results were
not statistically significant in our current experiments.
How SH computation varies with nodule size and segmentation remains an open
question, and discussion in the existing literature is limited [6]. In this
study, our experiments partially addressed this robustness by testing
segmentations of the same nodules from different human observers. We also
observed that including more SH coefficients did not necessarily lead to higher
accuracy. We postulate that the coefficients help define shape up to a certain
point, beyond which they may introduce more noise than useful information;
further investigation would be needed to test this hypothesis.
Based on the inter-observer variability, experimental results on the LIDC
dataset demonstrate that the proposed scheme performs comparably to an
independent expert annotator, while being fully automated apart from the
segmentation step. This work thus serves as an important demonstration of how
shape and appearance information can be harnessed together for the important
task of lung nodule classification.

References
1. Furuya, K., Murayama, S., Soeda, H., Murakami, J., Ichinose, Y., Yauuchi, H.,
Katsuda, Y., Koga, M., Masuda, K.: New classification of small pulmonary nodules
by margin characteristics on high-resolution CT. Acta Radiol. 40, 496–504 (1999)
2. El-Baz, A., Beache, G.M., Gimel’farb, G., Suzuki, K., Okada, K., Elnakib, A.,
Soliman, A., Abdollahi, B.: Computer-aided diagnosis systems for lung cancer:
challenges and methodologies. Int. J. Biomed. Imaging 2013, 942353 (2013)
3. Venkatraman, V., Sael, L., Kihara, D.: Potential for protein surface shape analysis
using spherical harmonics and 3D Zernike descriptors. Cell Biochem. Biophys.
54(1–3), 23–32 (2009)
4. Huang, H., Shen, L., Zhang, R., Makedon, F.S., Hettleman, B., Pearlman, J.D.:
Surface alignment of 3D spherical harmonic models: application to cardiac MRI
analysis. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp.
67–74. Springer, Heidelberg (2005)
5. Gu, X., Wang, Y., Chan, T.F., Thompson, P.M.: Genus zero surface conformal
mapping and its application to brain surface mapping. IEEE Trans. Med. Imaging
23, 949–958 (2004)
6. El-Baz, A., Nitzken, M., Khalifa, F., Elnakib, A., Gimel’farb, G., Falk, R.,
El-Ghar, M.A.: 3D shape analysis for early diagnosis of malignant lung nodules. In:
Székely, G., Hahn, H.K. (eds.) IPMI 2011. LNCS, vol. 6801, pp. 772–783. Springer,
Heidelberg (2011)
7. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep
convolutional neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L.,
Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25,
pp. 1097–1105. Curran Associates Inc., Red Hook (2012)
8. Gao, M., Bagci, U., Lu, L., Wu, A., Buty, M., Shin, H.C., Roth, H.,
Papadakis, G.Z., Depeursinge, A., Summers, R., Xu, Z., Mollura, D.J.: Holistic
classification of CT attenuation patterns for interstitial lung diseases via deep con-
volutional neural networks. In: 1st Workshop on Deep Learning in Medical Image
Analysis, DLMIA 2015, pp. 41–48, October 2015
9. Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D.,
Summers, R.M.: Deep convolutional neural networks for computer-aided detection:
CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med.
Imaging 99, 1 (2016)
10. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H.: Chest
pathology detection using deep learning with non-medical training. In: 2015 IEEE
12th International Symposium on Biomedical Imaging (ISBI), pp. 294–297, April
2015
11. Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf:
an astounding baseline for recognition. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops, pp. 806–813 (2014)
12. Ciompi, F., de Hoop, B., van Riel, S.J., Chung, K., Scholten, E.T., Oudkerk, M., de
Jong, P.A., Prokop, M., van Ginneken, B.: Automatic classification of pulmonary
peri-fissural nodules in computed tomography using an ensemble of 2D views and
a convolutional neural network out-of-the-box. Med. Image Anal. 26(1), 195–202
(2015)
13. Armato, S.G., McLennan, G., Bidaut, L., et al.: The lung image database consor-
tium (LIDC) and image database resource initiative (IDRI): a completed reference
database of lung nodules on CT scans. Med. Phys. 38(2), 915–931 (2011)
14. Sampaio, W.B., Diniz, E.M., Silva, A.C., de Paiva, A.C., Gattass, M.: Detection
of masses in mammogram images using CNN, geostatistic functions and SVM.
Comput. Biol. Med. 41(8), 653–664 (2011)
15. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Author Index

Aalamifar, Fereshteh I-577 Balfour, Daniel R. III-493


Abdulkadir, Ahmed II-424 Balte, Pallavi P. II-624
Aboagye, Eric O. III-536 Balter, Max L. III-388
Abolmaesumi, Purang I-465, I-644, I-653 Bandula, Steven I-516
Aboud, Katherine I-81 Bao, Siqi II-513
Aboulfotouh, Ahmed I-610 Barillot, Christian III-570
Abugharbieh, Rafeef I-132, I-602 Barkhof, Frederik II-44
Achilles, Felix I-491 Barr, R. Graham II-624
Adalsteinsson, Elfar III-54 Barratt, Dean C. I-516
Adeli, Ehsan I-291, II-1, II-79, II-88, II-212 Bartoli, Adrien I-404
Adler, Daniel H. III-63 Baruthio, J. III-335
Aertsen, Michael II-352 Baumann, Philipp II-370
Afacan, Onur III-544 Baumgartner, Christian F. II-203
Ahmadi, Seyed-Ahmad II-415 Bazin, Pierre-Louis I-255
Ahmidi, Narges I-551 Becker, Carlos II-326
Akgök, Yigit H. III-527 Bengio, Yoshua II-469
Alansary, Amir II-589 Benkarim, Oualid M. II-505
Alexander, Daniel C. II-265 BenTaieb, Aïcha II-460
Al-Kadi, Omar S. I-619 Berks, Michael III-344
Alkhalil, Imran I-431 Bermúdez-Chacón, Róger II-326
Alterovitz, Ron I-439 Bernardo, Marcelino I-577
Amann, Michael III-362 Bernasconi, Andrea II-379
An, Le I-37, II-70, II-79 Bernasconi, Neda II-379
Anas, Emran Mohammad Abu I-465 Bernhardt, Boris C. II-379
Ancel, A. III-335 Bertasius, Gedas II-388
Andělová, Michaela III-362 Beymer, D. III-238
Andres, Bjoern III-397 Bhaduri, Mousumi III-210
Angelini, Elsa D. II-624 Bhalerao, Abhir II-274
Ankele, Michael III-502 Bickel, Marc II-415
Arbeláez, Pablo II-140 Bilgic, Berkin III-467
Armbruster, Marco II-415 Bilic, Patrick II-415
Armspach, J.-P. III-335 Billings, Seth D. III-133
Arslan, Salim I-115 Bischof, Horst II-230
Aung, Tin III-441 Bise, Ryoma III-326
Awate, Suyash P. I-237, III-191 Blendowski, Maximilian II-598
Ayache, Nicholas III-174 Boctor, Emad M. I-577, I-585
Aydogan, Dogu Baran I-201 Bodenstedt, S. II-616
Azizi, Shekoofeh I-653 Bonmati, Ester I-516
Booth, Brian G. I-175
Bagci, Ulas I-662 Borowsky, Alexander I-72
Bahrami, Khosro II-572 Bounincontri, Guido III-579
Bai, Wenjia III-246 Bourdel, Nicolas I-404
Bajka, Michael I-593 Boutagy, Nabil I-431
Balédent, O. III-335 Bouvy, Willem H. II-97
Boyer, Edmond III-450 Cheng, Jie-Zhi II-53, II-247


Bradley, Andrew P. II-106 Choyke, Peter I-653
Brahm, Gary II-335 Christ, Patrick Ferdinand II-415
Breeuwer, Marcel II-97 Christlein, Vincent III-432
Brosch, Tom II-406 Chu, Peng I-413
Brown, Colin J. I-175 Chung, Albert C.S. II-513
Brown, Michael S. III-273 Çiçek, Özgün II-424
Brox, Thomas II-424 Çimen, Serkan III-142, III-291
Burgess, Stephen II-308 Clancy, Neil T. III-414
Burgos, Ninon II-547 Cobb, Caroline II-308
Bustamante, Mariana III-519 Coello, Eduardo III-596
Buty, Mario I-662 Coles, Claire I-28
Collet, Pierre I-534
Caballero, Jose III-246 Collins, Toby I-404
Cai, Jinzheng II-442, III-183 Comaniciu, Dorin III-229
Cai, Weidong I-72 Combès, Benoit III-570
Caldairou, Benoit II-379 Commowick, Olivier III-570, III-622
Canis, Michel I-404 Cook, Stuart III-246
Cao, Xiaohuan III-1 Cooper, Anthony I-602
Cao, Yu III-238 Coskun, Huseyin I-491
Carass, Aaron III-553 Cowan, Noah J. I-474
Cardon, C. III-255 Crimi, Alessandro I-140
Cardoso, M. Jorge II-547, III-605 Criminisi, Antonio II-265
Carlhäll, Carl-Johan III-519 Culbertson, Heather I-370
Carneiro, Gustavo II-106 Cutting, Laurie E. I-81
Caselli, Richard I-326
Cattin, Philippe C. III-362 D’Anastasi, Melvin II-415
Cerrolaza, Juan J. III-219 Dall’ Armellina, Erica II-361
Cetin, Suheyla III-467 Darras, Kathryn I-465
Chabannes, V. III-335 Das, Dhritiman III-596
Chahal, Navtej III-158 Das, Sandhitsu R. II-564
Chang, Chien-Ming I-559 Davatzikos, Christos I-300
Chang, Eric I-Chao II-496 David, Anna L. I-353, II-352
Chang, Hang I-72 Davidson, Alice II-589
Chang, Ken I-184 de Marvao, Antonio III-246
Chapados, Nicolas II-469 De Silva, T. III-124
Charon, Nicolas III-475 de Sousa, P. Loureiro III-335
Chau, Vann I-175 De Vita, Enrico III-511
Chen, Alvin I. III-388 Dearnaley, David II-547
Chen, Danny Z. II-176, II-658 Delbany, M. III-335
Chen, Geng I-210, III-587 Delingette, Hervé III-174
Chen, Hanbo I-63 Denny, Thomas III-264
Chen, Hao II-149, II-487 Denœux, Thierry II-61
Chen, Kewei I-326 Depeursinge, Adrien I-619
Chen, Ronald I-627 Deprest, Jan II-352
Chen, Sihong II-53 Dequidt, Jeremie I-500
Chen, Terrence I-395 Deriche, Rachid I-89
Chen, Xiaobo I-37, II-18, II-26 Desisto, Nicholas II-9
Chen, Xin III-493 Desjardins, Adrien E. I-353
Cheng, Erkang I-413 deSouza, Nandita II-547
Dhamala, Jwala III-282 Flach, Barbara I-593


Dhungel, Neeraj II-106 Flach, Boris II-607
di San Filippo, Chiara Amat I-378, I-422 Forman, Christoph III-527
Diehl, Beate I-542 Fortin, A. III-335
Diniz, Paula R.B. II-398 Frangi, Alejandro F. II-291, III-142, III-201,
Dinsdale, Graham III-344 III-353
DiPietro, Robert I-551 Frank, Michael II-317
Djonov, Valentin II-370 Fritscher, Karl II-158
Dodero, Luca I-140 Fu, Huazhu II-132, III-441
Doel, Tom II-352 Fua, Pascal II-326
Dong, Bin III-561 Fuerst, Bernhard I-474
Dong, Di II-124 Fujiwara, Michitaka II-556
Dou, Qi II-149 Fundana, Ketut III-362
Du, Junqiang I-1 Funka-Lea, Gareth III-317
Du, Lei I-123
Duan, Lixin III-441, III-458 Gaed, Mena I-644
Dufour, A. III-335 Gahm, Jin Kyu I-228
Duncan, James S. I-431 Gallardo-Diez, Guillermo I-89
Duncan, John S. I-542, III-81 Gao, Mingchen I-662
Durand, E. III-335 Gao, Wei I-106
Duriez, Christian I-500 Gao, Wenpeng I-457
Dwyer, A. III-613 Gao, Yaozong II-247, II-572, III-1
Gao, Yue II-9
Gao, Zhifan III-98
Eaton-Rosen, Zach III-605
Garnotel, S. III-335
Ebbers, Tino III-519
Gateno, Jaime I-559
Eberle, Melissa I-431
Ge, Fangfei I-46
Ebner, Thomas II-221
Génevaux, O. III-335
Eggenberger, Céline I-593
Georgescu, Bogdan III-229
El-Baz, Ayman I-610, III-613
Ghesu, Florin C. III-229, III-432
El-Ghar, Mohamed Abou I-610, III-613
Ghista, Dhanjoo III-98
Elmogy, Mohammed I-610
Gholipour, Ali III-544
Elshaer, Mohamed Ezzeldin A. II-415
Ghosh, Aurobrata II-265
Elson, Daniel S. III-414
Ghotbi, Reza I-474
Ershad, Marzieh I-508
Giannarou, Stamatia I-386, I-525
Eslami, Abouzar I-378, I-422
Gibson, Eli I-516, I-644
Essert, Caroline I-534
Gilhuijs, Kenneth G.A. II-478
Esteva, Andre II-317
Gilmore, John H. I-10
Ettlinger, Florian II-415
Gimelfarb, Georgy I-610, III-613
Girard, Erin I-395
Fall, S. III-335 Glocker, Ben I-148, II-589, II-616, III-107,
Fan, Audrey III-467 III-536
Farag, Amal II-451 Goerres, J. III-124
Farzi, Mohsen II-291 Goksel, Orcun I-568, I-593, II-256
Faskowitz, Joshua I-157 Golland, Polina III-54, III-166
Fei-Fei, Li II-317 Gomez, Jose A. I-644
Fenster, Aaron I-644 Gómez, Pedro A. III-579
Ferrante, Enzo II-529 González Ballester, Miguel Angel II-505
Fichtinger, Gabor I-465 Gooya, Ali III-142, III-201, III-291
Fischl, Bruce I-184 Götz, M. II-616
Goury, Olivier I-500 Ho, Dennis Chun-Yu I-559


Grady, Leo III-380 Hodgson, Antony I-602
Grant, P. Ellen III-54 Hoffman, Eric A. II-624
Grau, Vicente II-361 Hofmann, Felix II-415
Green, Michael III-423 Hofmanninger, Johannes I-192
Groeschel, Samuel III-502 Holdsworth, Samantha III-467
Gröhl, J. II-616 Holzer, Markus I-192
Grunau, Ruth E. I-175 Horacek, Milan III-282
Grussu, Francesco II-265 Hornegger, Joachim III-229, III-527
Guerreiro, Filipa II-547 Horváth, Antal III-362
Guerrero, Ricardo III-246 Hu, Jiaxi III-150
Guizard, Nicolas II-469 Hu, Xiaoping I-28
Guldner, Ian H. II-658 Hu, Xintao I-28, I-46, I-123
Gülsün, Mehmet A. III-317 Hu, Yipeng I-516
Guo, Lei I-28, I-46, I-123 Hua, Jing III-150
Guo, Xiaoyu I-585 Huang, Heng I-273, I-317, I-344
Guo, Yanrong III-238 Huang, Junzhou II-640, II-649, II-676
Guo, Yufan II-300 Huang, Xiaolei II-115
Gupta, Vikas III-519 Hunley, Stanley C. III-380
Gur, Yaniv II-300, III-238 Huo, Yuankai I-81
Gutiérrez-Becker, Benjamín III-10, III-19 Huo, Zhouyuan I-317
Gutman, Boris A. I-157, I-326 Hutchinson, Charles II-274
Hutton, Brian F. III-406
Ha, In Young III-89 Hwang, Sangheum II-239
Hacihaliloglu, Ilker I-362
Haegelen, Claire I-534
Ichim, Alexandru-Eugen I-491
Hager, Gregory D. I-551, III-133
Iglesias, Juan Eugenio III-536
Hajnal, Joseph V. II-589
Imamura, Toru II-667
Hall, Scott S. II-317
Imani, Farhad I-644, I-653
Hamarneh, Ghassan I-175, II-460
Iraji, Armin I-46
Hamidian, Hajar III-150
Išgum, Ivana II-478
Hamzé, Noura I-534
Ishii, Masaru III-133
Han, Ju I-72
Ismail, M. III-335
Han, Junwei I-28, I-46
Ittyerah, Ranjit III-63
Handels, Heinz III-28, III-89
Hao, Shijie I-219
Havaei, Mohammad II-469 Jacobson, M.W. III-124
Hawkes, David J. I-516 Jahanshad, Neda I-157, I-335
Hayashi, Yuichiro III-353 Jakab, András I-247
He, Xiaoxu II-335 Jamaludin, Amir II-166
Heim, E. II-616 Janatka, Mirek III-414
Heimann, Tobias I-395 Jannin, Pierre I-534
Heinrich, Mattias Paul II-598, III-28, III-89 Jayender, Jagadeesan I-457
Helm, Emma II-274 Jeong, Won-Ki III-484
Heng, Pheng-Ann II-149, II-487 Jezierska, A. III-335
Herrick, Ariane III-344 Ji, Xing II-247
Hibar, Derrek Paul I-335 Jiang, Baichuan I-457
Hipwell, John H. I-516 Jiang, Menglin II-35
Hlushchuk, Ruslan II-370 Jiang, Xi I-19, I-28, I-55, I-63, I-123
Ho, Chin Pang III-158 Jiao, Jieqing III-406
Jie, Biao I-1 Kochan, Martin III-81


Jin, Yan II-70 Koesters, Zachary I-508
Jin, Yueming II-149 Koikkalainen, Juha II-44
Jog, Amod III-553 Kokkinos, Iasonas II-529
John, Matthias I-395 Komodakis, Nikos III-10
John, Paul St. I-465 Konen, Eli III-423
Johns, Edward I-448 Kong, Bin III-264
Jojic, Vladimir I-627 Konno, Atsushi III-116
Jomier, J. III-335 Konukoglu, Ender III-536
Jones, Derek K. III-579 Korez, Robert II-433
Joshi, Anand A. I-237 Kou, Zhifeng I-46
Joshi, Sarang III-46, III-72 Krenn, Markus I-192
Kriegman, David III-371
Kacher, Daniel F. I-457 Kruecker, Jochen I-653
Kaden, Enrico II-265 Kuijf, Hugo J. II-97
Kadir, Timor II-166 Kulaga-Yoskovitz, Jessie II-379
Kadoury, Samuel II-529 Kurita, Takio II-667
Kainz, Bernhard II-203, II-589 Kwak, Jin Tae I-653
Kaiser, Markus I-395
Kakileti, Siva Teja I-636 Lai, Maode II-496
Kaltwang, Sebastian II-44 Laidley, David III-210
Kamnitsas, Konstantinos II-203, II-589, Laine, Andrew F. II-624
III-246 Landman, Bennett A. I-81
Kaneda, Kazufumi II-667 Langs, Georg I-192, I-247
Kang, Hakmook I-81 Larson, Ben III-46
Karasawa, Ken’ichi II-556 Lassila, Toni III-201
Kashyap, Satyananda II-344, II-538 Lasso, Andras I-465
Kasprian, Gregor I-247 Lavdas, Ioannis III-536
Kaushik, S. III-255 Lay, Nathan II-388
Kee Wong, Damon Wing II-132 Lea, Colin I-551
Kendall, Giles I-255 Leahy, Richard M. I-237
Kenngott, H. II-616 Ledig, Christian II-44
Kerbrat, Anne III-570 Lee, Gyusung I. I-551
Ketcha, M. III-124 Lee, Kyoung Mu III-308
Keynton, R. III-613 Lee, Matthew III-246
Khalifa, Fahmi I-610, III-613 Lee, Mija R. I-551
Khanna, A.J. III-124 Lee, Soochahn III-308
Khlebnikov, Rostislav II-589 Lee, Su-Lin I-525
Kim, Daeseung I-559 Lee, Thomas C. I-457
Kim, Hosung II-379 Lei, Baiying II-53, II-247
Kim, Hyo-Eun II-239 Leiner, Tim II-478
Kim, Junghoon I-166 Lelieveldt, Boudewijn P.F. III-107
Kim, Minjeong I-264 Lemstra, Afina W. II-44
King, Andrew P. III-493 Leonard, Simon III-133
Kiryati, Nahum III-423 Lepetit, Vincent II-194
Kitasaka, Takayuki II-556 Lessoway, Victoria A. I-465
Kleinszig, G. III-124 Li, David II-406
Klusmann, Maria II-352 Li, Gang I-10, I-210, I-219
Knopf, Antje-Christin II-547 Li, Hua II-61
Knoplioch, J. III-255 Li, Huibin II-521
Li, Qingyang I-326, I-335 Maier-Hein, L. II-616


Li, Shuo II-335, III-98, III-210 Majewicz, Ann I-508
Li, Xiang I-19, I-63 Malamateniou, Christina II-589
Li, Xiao I-123 Malpani, Anand I-551
Li, Yang II-496 Mancini, Laura III-81
Li, Yanjie III-98 Mani, Baskaran III-441
Li, Yuanwei III-158 Maninis, Kevis-Kokitsi II-140
Li, Yujie I-63 Manivannan, Siyamalan II-308
Lian, Chunfeng II-61 Manjón, Jose V. II-564
Lian, Jun I-627 Mansi, Tommaso III-229
Liao, Rui I-395 Mao, Yunxiang II-685
Liao, Ruizhi III-54 Marami, Bahram III-544
Liebschner, Michael A.K. I-559 Marchesseau, Stephanie III-273
Lienkamp, Soeren S. II-424 Mari, Jean-Martial I-353
Likar, Boštjan II-433 Marlow, Neil I-255, III-605
Lim, Lek-Heng III-502 Marom, Edith M. III-423
Lin, Jianyu III-414 Marsden, Alison III-371
Lin, Ming C. I-627 Marsden, Paul K. III-493
Lin, Stephen II-132 Masuda, Atsuki II-667
Lin, Weili I-10, I-210 Mateus, Diana III-10, III-19
Ling, Haibin I-413 Mattausch, Oliver I-593
Linguraru, Marius George III-219 Matthew, Jacqueline II-203
Lippé, Sarah II-529 Mayer, Arnaldo III-423
Liu, Jiang II-132, III-441, III-458 Mazauric, Dorian I-89
Liu, Luyan II-1, II-26, II-212 McClelland, Jamie II-547
Liu, Mingxia I-1, I-308, II-79 McCloskey, Eugene V. II-291
Liu, Mingyuan II-496 McEvoy, Andrew W. I-542, III-81
Liu, Tianming I-19, I-28, I-46, I-55, I-63, McGonigle, John III-37
I-123 Meining, Alexander I-448
Liu, Weixia III-63 Melbourne, Andrew I-255, III-406, III-511,
Liu, Xin III-98 III-605
Liu, XingTong II-406 Meng, Yu I-10, I-219
Lombaert, Herve I-255 Menze, Bjoern H. II-415, III-397, III-579,
Lorenzi, Marco I-255 III-596
Lötjönen, Jyrki II-44 Menzel, Marion I. III-579
Lu, Allen I-431 Mercado, Ashley II-335
Lu, Jianfeng I-55 Merino, Maria I-577
Lu, Le II-388, II-442, II-451 Merkow, Jameson III-371
Lugauer, Felix III-527 Merveille, O. III-335
Luo, Jie III-54 Metaxas, Dimitris N. II-35, II-115
Lv, Jinglei I-19, I-28, I-46, I-55, I-63 Miao, Shun I-395
Lynch, Mary Ellen I-28 Miller, Steven P. I-175
Milstein, Arnold II-317
Ma, Andy I-482 Min, James K. III-380
MacKenzie, John D. II-176 Minakawa, Masatoshi II-667
Madhu, Himanshu J. I-636 Miraucourt, O. III-335
Maguire, Timothy J. III-388 Misawa, Kazunari II-556
Mai, Huaming I-559 Miserocchi, Anna I-542
Maier, Andreas III-432, III-527 Modat, Marc III-81
Maier-Hein, K. II-616 Modersitzki, Jan III-28
Moeskops, Pim II-478 Oktay, Ozan III-246


Molina-Romero, Miguel III-579 Orasanu, Eliza I-255
Mollero, Roch III-174 Ourselin, Sebastien I-255, I-353, I-542,
Mollura, Daniel J. I-662 II-352, II-547, III-81, III-406, III-511,
Moore, Tonia III-344 III-605
Moradi, Mehdi II-300, III-238 Owen, David III-511
Mori, Kensaku II-556, III-353 Ozdemir, Firat II-256
Mousavi, Parvin I-465, I-644, I-653 Ozkan, Ece II-256
Moussa, Madeleine I-644
Moyer, Daniel I-157 Pagé, G. III-335
Mullick, R. III-255 Paknezhad, Mahsa III-273
Mulpuri, Kishore I-602 Pang, Yu I-413
Munsell, Brent C. II-9 Pansiot, Julien III-450
Murino, Vittorio I-140 Papastylianou, Tasos II-361
Murray, Andrea III-344 Paragios, Nikos II-529
Mwikirize, Cosmas I-362 Parajuli, Nripesh I-431
Parisot, Sarah I-115, I-148
Park, Jin-Hyeong II-487
Nachum, Ilanit Ben III-210
Park, Sang Hyun I-282
Naegel, B. III-335
Parker, Drew I-166
Nahlawi, Layan I-644
Parsons, Caron II-274
Najman, L. III-335
Parvin, Bahram I-72
Navab, Nassir I-378, I-422, I-474, I-491,
Passat, N. III-335
III-10, III-19
Patil, B. III-255
Negahdar, Mohammadreza II-300, III-238
Payer, Christian II-194, II-230
Negussie, Ayele H. I-577
Peng, Hanchuan I-63
Neumann, Dominik III-229
Peng, Jailin II-70
Ng, Bernard I-132
Pennec, Xavier III-174, III-300
Nguyen, Yann I-500
Pereira, Stephen P. I-516
Ni, Dong II-53, II-247
Pernuš, Franjo II-433
Nicolas, G. III-255
Peter, Loïc III-19
Nie, Dong II-212
Pezold, Simon III-362
Nie, Feiping I-291
Pezzotti, Nicola II-97
Niethammer, Marc I-439, III-28
Pichora, David I-465
Nill, Simeon II-547
Pickup, Stephen III-63
Nimura, Yukitaka II-556
Piella, Gemma II-505
Noachtar, Soheyl I-491
Pinto, Peter I-577, I-653
Nogues, Isabella II-388
Pizer, Stephen I-439
Noh, Kyoung Jin III-308
Pluim, Josien P.W. II-632
Nosher, John L. I-362
Pluta, John III-63
Nutt, David J. III-37
Pohl, Kilian M. I-282
Polzin, Thomas III-28
O’Donnell, Matthew I-431 Pont-Tuset, Jordi II-140
O’Regan, Declan III-246 Pozo, Jose M. II-291, III-201
Oda, Masahiro II-556, III-353 Pratt, Rosalind II-352
Oelfke, Uwe II-547 Prayer, Daniela I-247
Oguz, Ipek II-344, II-538 Preston, Joseph Samuel III-72
Okamura, Allison M. I-370 Price, True I-439
Prieto, Claudia III-493 Ruan, Su II-61


Prince, Jerry L. III-553 Rueckert, Daniel I-115, I-148, II-44, II-203,
Prosch, Helmut I-192 II-556, II-589, III-246, III-536
Prud’homme, C. III-335 Rutherford, Mary II-589
Pusiol, Guido II-317
Sabouri, Pouya III-46
Qi, Ji III-414 Salmon, S. III-335
Qin, Jing II-53, II-149, II-247 Salzmann, Mathieu II-326
Quader, Niamul I-602 Sanabria, Sergio J. I-568
Quan, Tran Minh III-484 Sankaran, Sethuraman III-380
Sanroma, Gerard II-505
Rahmim, Arman I-577 Santos, Michel M. II-398
Raidou, Renata Georgia II-97 Santos, Wellington P. II-398
Raitor, Michael I-370 Santos-Ribeiro, Andre III-37
Rajan, D. III-238 Sapkota, Manish II-185
Rajchl, Martin II-589 Sapp, John L. III-282
Rak, Marko II-283 Saria, Suchi I-482
Ramachandran, Rageshree II-176 Sarrami-Foroushani, Ali III-201
Rapaka, Saikiran III-317 Sase, Kazuya III-116
Rasoulian, Abtin I-465 Sato, Imari III-326
Raudaschl, Patrik II-158 Sawant, Amit III-46
Ravikumar, Nishant III-142, III-291 Saygili, Gorkem III-107
Rawat, Nishi I-482 Schaap, Michiel III-380
Raytchev, Bisser II-667 Scheltens, Philip II-44
Reader, Andrew J. III-493 Scherrer, Benoit III-544
Reaungamornrat, S. III-124 Schirmer, Markus D. I-148
Reda, Islam I-610 Schlegl, Thomas I-192
Rege, Robert I-508 Schmidt, Michaela III-527
Reiman, Eric M. I-326 Schöpf, Veronika I-247
Reiter, Austin I-482, III-133 Schott, Jonathan M. III-406
Rekik, Islem I-210, II-26, II-572 Schubert, Rainer II-158
Rempfler, Markus II-415, III-397 Schulte, Rolf F. III-596
Reyes, Mauricio II-370 Schultz, Thomas III-502
Rhodius-Meester, Hanneke II-44 Schwab, Evan III-475
Rieke, Nicola I-422 Schwartz, Ernst I-247
Robertson, Nicola J. I-255 Scott, Catherine J. III-406
Rockall, Andrea G. III-536 Seifabadi, Reza I-577
Rodionov, Roman I-542 Seitel, Alexander I-465
Rohé, Marc-Michel III-300 Senior, Roxy III-158
Rohling, Robert I-465 Sepasian, Neda II-97
Rohrer, Jonathan III-511 Sermesant, Maxime III-174, III-300
Ronneberger, Olaf II-424 Shakeri, Mahsa II-529
Roodaki, Hessam I-378 Shalaby, Ahmed I-610
Rosenman, Julian I-439 Sharma, Manas II-335
Ross, T. II-616 Sharma, Puneet III-317
Roth, Holger R. II-388, II-451 Sharp, Gregory C. II-158
Rottman, Caleb III-46 Shatkay, Hagit I-644
Shehata, M. III-613 Suk, Heung-Il I-344


Shen, Dinggang I-10, I-37, I-106, I-210, Summers, Ronald M. II-388, II-451, III-219
I-219, I-264, I-273, I-291, I-308, I-317, Sun, Jian II-521
I-344, II-1, II-18, II-26, II-70, II-79, Sun, Shanhui I-395
II-88, II-212, II-247, II-572, III-561, Sun, Xueqing III-414
III-587 Sun, Yuanyuan III-98
Shen, Shunyao I-559 Sutton, Erin E. I-474
Shen, Wei II-124 Suzuki, Masashi II-667
Shi, Feng II-572 Syeda-Mahmood, Tanveer II-300, III-238
Shi, Jianbo II-388 Synnes, Anne R. I-175
Shi, Jie I-326 Szopos, M. III-335
Shi, Xiaoshuang III-183
Shi, Yonggang I-201, I-228 Tahmasebi, Amir I-653
Shigwan, Saurabh J. III-191 Talbot, H. III-335
Shimizu, Natsuki II-556 Tam, Roger II-406
Shin, Min III-264 Tamaki, Toru II-667
Shin, Seung Yeon III-308 Tan, David Joseph I-422
Shokiche, Carlos Correa II-370 Tanaka, Kojiro II-667
Shriram, K.S. III-255 Tang, Lisa Y.W. II-406
Shrock, Christine I-482 Tang, Meng-Xing III-158
Siewerdsen, J.H. III-124 Tanner, Christine I-593
Siless, Viviana I-184 Tanno, Ryutaro II-265
Silva-Filho, Abel G. II-398 Tarabay, R. III-335
Simonovsky, Martin III-10 Tatavarty, Sunil II-415
Sinha, Ayushi III-133 Tavakoli, Behnoosh I-585
Sinusas, Albert J. I-431 Taylor, Charles A. III-380
Sixta, Tomáš II-607 Taylor, Chris III-344
Smith, Sandra II-203 Taylor, Russell H. III-133
Sohn, Andrew II-451 Taylor, Zeike A. III-142, III-291
Sokooti, Hessam III-107 Teisseire, M. III-255
Soliman, A. III-613 Thiriet, M. III-335
Sommer, Wieland H. II-415 Thiruvenkadam, S. III-255
Sona, Diego I-140 Thomas, David III-511
Sonka, Milan II-344, II-538 Thompson, Paul M. I-157, I-326, I-335
Sotiras, Aristeidis I-300 Thornton, John S. III-81
Spadea, Maria Francesca II-158 Thung, Kim-Han II-88
Sparks, Rachel I-542 Tian, Jie II-124
Speidel, S. II-616 Tijms, Betty II-44
Sperl, Jonathan I. III-579 Tillmanns, Christoph III-527
Stalder, Aurélien F. III-527 Toi, Masakazu III-326
Stamm, Aymeric III-622 Tolonen, Antti II-44
Staring, Marius III-107 Tombari, Federico I-422, I-491
Stendahl, John C. I-431 Tönnies, Klaus-Dietz II-283
Štern, Darko II-194, II-221, II-230 Torres, Renato I-500
Stock, C. II-616 Traboulsee, Anthony II-406
Stolka, Philipp J. I-370 Trucco, Emanuele II-308
Stonnington, Cynthia I-326 Tsagkas, Charidimos III-362
Stoyanov, Danail III-81, III-414 Tsehay, Yohannes II-388
Styner, Martin II-9 Tsien, Joe Z. I-63
Subramanian, N. III-255 Tsogkas, Stavros II-529
Tsujita, Teppei III-116 Wang, Tianfu II-53, II-247


Tu, Zhuowen III-371 Wang, Xiaoqian I-273
Tunc, Birkan I-166 Wang, Xiaosong II-388
Turk, Esra A. III-54 Wang, Yalin I-326, I-335
Turkbey, Baris I-577, I-653 Wang, Yipei II-496
Wang, Yunfu I-72
Ulas, Cagdas III-579 Wang, Zhengxia I-291
Unal, Gozde III-467 Ward, Aaron D. I-644
Uneri, A. III-124 Warfield, Simon K. III-544, III-622
Ungi, Tamas I-465 Wassermann, Demian I-89
Urschler, Martin II-194, II-221, II-230 Watanabe, Takanori I-166
Usman, Muhammad III-493 Wehner, Tim I-542
Wei, Zhihui I-37
Weier, Katrin III-362
Van De Ville, Dimitri I-619 Weiskopf, Nikolaus I-255
van der Flier, Wiesje II-44 Wells, William M. III-166
van der Velden, Bas H.M. II-478 West, Simeon J. I-353
van Diest, Paul J. II-632 Wetzl, Jens III-527
Van Gool, Luc II-140 Whitaker, Ross III-72
Vantini, S. III-622 White, Mark III-81
Varol, Erdem I-300 Wilkinson, J. Mark II-291
Vedula, S. Swaroop I-551 Wilms, Matthias III-89
Venkataramani, Krithika I-636 Wilson, David I-465
Venkatesh, Bharath A. II-624 Winston, Gavin P. III-81
Vera, Pierre II-61 Wirkert, S. II-616
Vercauteren, Tom II-352, III-81 Wisse, Laura E.M. II-564
Verma, Ragini I-166 Wolinsky, J.-P. III-124
Veta, Mitko II-632 Wolk, David A. II-564, III-63
Vidal, René III-475 Wolterink, Jelmer M. II-478
Viergever, Max A. II-478 Wong, Damon Wing Kee III-441, III-458
Vilanova, Anna II-97 Wong, Tien Yin III-458
Vizcaíno, Josué Page I-422 Wood, Bradford J. I-577, I-653
Vogt, S. III-124 Wu, Aaron I-662
Voirin, Jimmy I-534 Wu, Colin O. II-624
Vrtovec, Tomaž II-433 Wu, Guorong I-106, I-264, I-291, II-9,
II-247, III-1
Walker, Julie M. I-370 Wu, Wanqing III-98
Walter, Benjamin I-448 Wu, Yafeng III-587
Wang, Chendi I-132 Würfl, Tobias III-432
Wang, Chenglong III-353
Wang, Guotai II-352 Xia, James J. I-559
Wang, Hongzhi II-538, II-564 Xia, Wenfeng I-353
Wang, Huifang II-247 Xie, Long II-564
Wang, Jiazhuo II-176 Xie, Yaoqin III-98
Wang, Jie I-335 Xie, Yuanpu II-185, III-183
Wang, Li I-10, I-219 Xing, Fuyong II-442, III-183
Wang, Linwei III-282 Xiong, Huahua III-98
Wang, Lisheng II-521 Xu, Sheng I-653
Wang, Qian II-1, II-26 Xu, Tao II-115
Wang, Sheng II-640, II-649 Xu, Yan II-496
Xu, Yanwu II-132, III-441, III-458 Zhang, Han I-37, I-106, II-1, II-18, II-26,
Xu, Zheng II-640, II-676 II-115, II-212
Xu, Ziyue I-662 Zhang, Heye III-98
Xu, Zongben II-521 Zhang, Honghai II-344
Zhang, Jie I-326
Yamamoto, Tokunori III-353 Zhang, Jun I-308, II-79
Yan, Pingkun I-653 Zhang, Lichi II-1
Yang, Caiyun II-124 Zhang, Lin I-386
Yang, Feng II-124 Zhang, Miaomiao III-54, III-166
Yang, Guang-Zhong I-386, I-448, I-525 Zhang, Qiang II-274
Yang, Heran II-521 Zhang, Shaoting II-35, II-115, III-264
Yang, Jianhua III-1 Zhang, Shu I-19, I-28
Yang, Jie II-624 Zhang, Siyuan II-658
Yang, Lin II-185, II-442, II-658, III-183 Zhang, Tuo I-19, I-46, I-123
Yang, Shan I-627 Zhang, Wei I-19
Yang, Tao I-335 Zhang, Xiaoqin III-441
Yao, Jiawen II-640, II-649 Zhang, Xiaoyan I-559
Yap, Pew-Thian I-210, I-308, II-88, III-561, Zhang, Yizhe II-658
III-587 Zhang, Yong I-282, III-561
Yarmush, Martin L. III-388 Zhang, Zizhao II-185, II-442, III-183
Ye, Chuyang I-97 Zhao, Liang I-525
Ye, Jieping I-326, I-335 Zhao, Qinghua I-55
Ye, Menglong I-386, I-448 Zhao, Qingyu I-439
Yendiki, Anastasia I-184 Zhao, Shijie I-19, I-28, I-46, I-55
Yin, Qian II-442 Zhen, Xiantong III-210
Yin, Yilong II-335, III-210 Zheng, Yefeng I-413, II-487, III-317
Yin, Zhaozheng II-685 Zheng, Yingqiang III-326
Yoo, Youngjin II-406 Zheng, Yuanjie II-35
Yoshino, Yasushi III-353 Zhong, Zichun III-150
Yousry, Tarek III-81 Zhou, Mu II-124
Yu, Lequan II-149 Zhou, S. Kevin II-487
Yu, Renping I-37 Zhou, Xiaobo I-559
Yuan, Peng I-559 Zhu, Hongtu I-627
Yun, Il Dong III-308 Zhu, Xiaofeng I-106, I-264, I-291, I-344,
Yushkevich, Paul A. II-538, II-564, III-63 II-70
Zhu, Xinliang II-649
Zaffino, Paolo II-158 Zhu, Ying I-413
Zang, Yali II-124 Zhu, Yingying I-106, I-264, I-291
Zapp, Daniel I-378 Zhuang, Xiahai II-581
Zec, Michelle I-465 Zisserman, Andrew II-166
Zhan, Liang I-335 Zombori, Gergely I-542
Zhan, Yiqiang III-264 Zontak, Maria I-431
Zhang, Daoqiang I-1 Zu, Chen I-291
Zhang, Guangming I-559 Zuluaga, Maria A. I-542, II-352
Zhang, Haichong K. I-585 Zwicker, Jill G. I-175
