Segmentation of Cone Beam CT in Stereotactic Radiosurgery


DEGREE PROJECT IN MEDICAL ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2016

Segmentation of Cone Beam CT in Stereotactic Radiosurgery

AWAIS ASHFAQ

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF TECHNOLOGY AND HEALTH
This master thesis was performed in the Research and Physics
Department at Elekta Instrument AB, Stockholm and School of
Technology and Health at KTH Royal Institute of Technology.

Segmentation of Cone Beam CT in Stereotactic Radiosurgery

Awais Ashfaq

Master of Science Thesis in Medical Engineering
Advanced level (second cycle), 30 credits
Supervisor at KTH: Asst. Prof. Pawel Herman
Supervisor at Elekta: Jonas Adler
Examiner: Dr. Mats Nilsson
School of Technology and Health
TRITA-STH. EX 2016:104

KTH Royal Institute of Technology
School of Technology and Health (STH)
SE-141 86 Flemingsberg, Sweden
http://www.kth.se/sth
Abstract
C-arm Cone Beam CT (CBCT) systems – due to their compact size, flexible geometry
and low radiation exposure – inaugurated the era of on-board 3D image guidance
in therapeutic and surgical procedures. The Leksell Gamma Knife Icon by Elekta
introduced an integrated CBCT system to determine patient position prior to the
surgical session, marking a paradigm shift toward frameless stereotactic
radiosurgery. While CBCT offers quick imaging with high spatial accuracy, the
quantitative values tend to be distorted by various physics-based artifacts such
as scatter, beam hardening and the cone beam effect. Several 3D reconstruction
algorithms targeting these artifacts require an accurate and fast segmentation
of craniofacial CBCT images into air, tissue and bone.

The objective of the thesis is to investigate the performance of deep learning
based convolutional neural networks (CNN) in relation to conventional image
processing and machine learning algorithms in segmenting CBCT images. CBCT
data for the training and testing procedures was provided by Elekta. A framework
of segmentation algorithms including multilevel automatic thresholding, fuzzy
clustering, multilayer perceptrons and CNN was developed and tested against
predefined evaluation metrics including pixel-wise prediction accuracy,
statistical tests and execution time. The CNN outperformed the other
segmentation algorithms on all evaluation metrics except execution time. The
mean segmentation error for the CNN was 0.4% with a standard deviation of
0.07%, followed by fuzzy clustering with a mean segmentation error of 0.8% and
a standard deviation of 0.12%. CNN-based segmentation takes 500 s, compared to
~1 s for multilevel thresholding on a similarly sized CBCT image.

The present work demonstrates the ability of CNNs to handle artifacts and noise
in CBCT images while maintaining high semantic segmentation performance.
However, further efforts targeting CNN execution speed are required before the
segmentation framework can be used within real-time 3D reconstruction algorithms.

Keywords: Cone Beam CT, Convolutional Neural Networks, Image Segmentation, Leksell Gamma Knife Icon, Deep Learning.
Sammanfattning (Swedish Abstract)

C-arm Cone Beam CT (CBCT) systems have, thanks to their compact format,
flexible geometry and low radiation dose, started an era of integrated 3D
imaging systems for guiding therapeutic and surgical interventions. Elekta's
Leksell Gamma Knife Icon introduced an integrated CBCT system for determining
the patient's position before operations, thereby entering a paradigm of
frameless stereotactic radiosurgery. Although CBCT offers fast imaging with
high spatial accuracy, the quantitative values tend to be disturbed by various
artifacts such as scatter, beam hardening and the cone beam effect. Several 3D
reconstruction algorithms that attempt to reduce these artifacts require an
accurate and fast segmentation of craniofacial CBCT images into air, soft
tissue and bone.

The aim of this thesis is to investigate how deep convolutional neural networks
(CNN) perform in comparison with conventional image processing and machine
learning algorithms for segmentation of CBCT images. CBCT data for training and
testing was provided by Elekta. A framework of segmentation algorithms
including multilevel automatic thresholding, fuzzy clustering, multilayer
perceptrons and CNN was developed and tested against predefined evaluation
criteria such as pixel-wise accuracy, statistical tests and execution time. The
CNN performed best on all metrics except execution time. The mean segmentation
error for the CNN was 0.4% with a standard deviation of 0.07%, followed by
fuzzy clustering with a mean error of 0.8% and a standard deviation of 0.12%.
The CNN requires 500 seconds, compared to roughly 1 second for the fastest
algorithm, multilevel thresholding, on equally sized CBCT volumes.

The work shows the ability of CNNs to handle artifacts and noise in CBCT images
while maintaining high-quality semantic segmentation. Further work is, however,
needed to improve the performance of the algorithm before the method can be
applied in real-time reconstruction algorithms.

Keywords: Cone Beam CT, convolutional neural networks, image segmentation, Leksell Gamma Knife Icon, deep learning.
To the One
who gives me life
Table of Contents
1. Introduction
1.1 Problem Statement
1.2 Aim
1.3 Thesis Layout
2. Methods
2.1 Convolutional Neural Networks
2.1.1 Training
2.1.2 Implementation
2.1.3 Testing
2.2 Other Algorithms
2.2.1 Features Selection
2.2.2 Modified Fuzzy C-means
2.3 Evaluation Metrics
3. Results
3.1 Two Sample T-Test
3.2 Confusion Matrix
3.3 ROI error
3.4 Visualization
4. Discussion
4.1 Performance
4.2 Limitations
5. Conclusion
Appendix A: Background and Literature Review
A.1 Computed Tomography
A.2 Cone Beam Computed Tomography
A.3 CBCT System at Elekta
A.4 CBCT Artifacts
A.5 CBCT Denoising Methods
A.6 Segmentation
Appendix B: Image Processing and Segmentation
B.1 Image Filtering
B.2 Automatic Thresholding
B.3 K-means
B.4 Fuzzy C-means
B.5 Mean Shift
B.6 Multilayer Perceptron
B.7 Convolutional Neural Networks
References
List of Figures
Figure 1.1: The Leksell Gamma Knife Icon
Figure 2.1: Steps involved in preparation of training images and labels
Figure 2.2: A 3D illustration of the stereotactic head frame
Figure 2.3: Extracting patches around edge pixels
Figure 2.4: Patch generation around edge pixels
Figure 2.5: Block diagram of the implemented CNN
Figure 2.6: Validation set assisting in avoiding over-fitting
Figure 2.7: Block diagram for parallel test data processing
Figure 2.8: A slice from craniofacial probabilistic maps
Figure 2.9: Modified fuzzy C-means framework
Figure 3.1: Box plots showing statistical distribution of prediction accuracy
Figure 3.2: 5 regions of interest (L-R) in CBCT images
Figure 3.3: Bar graph showing pixel prediction error in 5 ROIs
Figure 3.4: Visual comparison of different segmentation algorithms
Figure 4.1: Bone information per craniofacial CBCT slice
Figure 4.2: An illustration of beam hardening in CT images
Figure A.1: Comparison of conventional fan beam CT and CBCT systems
Figure A.2: Elekta CBCT scanner
Figure A.3: Basic artifacts in CBCT images
Figure A.4: Basic steps in CBCT image creation
Figure A.5: Iterative reconstruction flowchart
Figure A.6: Typical framework for a scatter correction method
Figure B.1: Automatic thresholding
Figure B.2: Multilayer perceptron
Figure B.3: Features in deep learning in context of images
List of Abbreviations
ANN Artificial Neural Network
CBCT Cone Beam Computed Tomography
CNN Convolutional Neural Network
CPU Central Processing Unit
CT Computed Tomography
CUDA Compute Unified Device Architecture
cuDNN CUDA Deep Neural Network library
DICOM Digital Imaging and Communications in Medicine
DL Deep Learning
FBP Filtered Backprojection
FCM Fuzzy C Means
GPU Graphical Processing Unit
IGRT Image Guided Radiation Therapy
LMA Levenberg-Marquardt Algorithm
MFCM Modified Fuzzy C-means
ML Machine Learning
MLP Multilayer Perceptron
MR Magnetic Resonance
NLM Non-Local Means
ReLU Rectified Linear Unit
RMSE Root Mean Square Error
ROI Region of Interest
SVM Support Vector Machine

1. Introduction
The development of compact, high-quality two-dimensional detector arrays coupled
with a modified Filtered Backprojection (FBP) algorithm [1] that supports volumetric
reconstruction from 2D X-ray projections acquired in a circular trajectory led to the
introduction of Cone Beam Computed Tomography (CBCT) [2]. This expanded the
role of X-ray imaging from medical diagnosis to image guidance in operative and
surgical procedures. C-arm mounted CBCT systems are another advancement in
radiology [3]. The C-arm CBCT systems primarily assist surgeons to anatomically
locate the target site when performing medical interventions. This leads to safer and
minimally invasive procedures. Due to its compact size, mobility and reduced storage
area, C-arm CBCT systems also inaugurated the era of on-board treatment guidance
through physical attachment to therapeutic and surgical devices such as Linear
Accelerators and the Leksell Gamma Knife.

The state-of-the-art, Leksell Gamma Knife Icon (see Fig. 1.1) introduced an image
guidance system based on CBCT to enable fractionated, frameless and time efficient
surgeries while maintaining similar precision levels as is achieved by the Leksell Frame
[4]. The purpose of the CBCT is to determine patient position prior to radiosurgery.
This information is then registered to the treatment planning Magnetic Resonance
(MR) images. In applications such as brain tumor radiosurgery, a sub-millimeter
registration accuracy becomes a necessity in order to deliver a high radiation dose to
tumor cells while sparing neighboring cells at risk. More on Gamma Knife and
stereotactic radiosurgery can be found in section A.3.

Figure 1.1: The Leksell Gamma Knife Icon. Published with permission from Elekta AB.¹

¹ For a video demonstration, visit https://www.elekta.com/radiosurgery/leksell-gamma-knife-icon.html
1.1 Problem Statement
CBCT offers a quick imaging facility with high spatial accuracy, yet the quantitative
values tend to be distorted due to various artifacts such as photon scattering, beam
hardening and photon starvation. This results in sub-optimal registration of
the planning MR and stereotactic CBCT images and, consequently, less accurate
positioning of the target site. Most of the existing software methods used for
artifact reduction in CBCT [5-9] require a fast and accurate segmentation of the
volume into different materials. See section A.5.2 for details.

Automated image segmentation is a fundamental problem in medical image analysis.


The applications include contouring organs, lesion (tumor) detections, dose
calculations, noise reduction etc. There are three broad categories of segmentation
methods.

(i) Segmentation based on local information. These algorithms work from an
image processing viewpoint and include morphological and thresholding
operations [10]. Such methods are sensitive to image noise and non-uniform
object textures, among other factors.
(ii) Segmentation based on a priori information. Contrary to local information, these are
high-level methods that make use of prior information of the expected volume and
adapt it to the actual volume. They include deformable models and atlas-based
segmentations [11-13]. There are two main drawbacks of these methods. First, they
require a rich database for constructing atlases. Unlike MR images, the
availability of CBCT data is scarce. Second, these methods require a smooth
transformation between expected and actual volumes. Tumors and other
pathological conditions may disrupt this transformation.
(iii) Segmentation based on Machine Learning (ML) techniques. These methods have
recently shown huge success in medical image segmentation inclined towards MR and
CT images [14]. These methods make use of supervised learning principles where a set
of training data (hand-engineered input features) and corresponding labels are fed into
a network for training under a certain ML algorithm. More on ML algorithms can be
found in [15]. A major drawback of these so-called hand-crafted training
features is that they may fail to capture the salient information in the data.
As a result, their performance varies from dataset to dataset.

Contrary to such segmentation methods that depend on feature engineering, deep
learning networks directly extract hierarchical features specialized for the
training data [16]. Deep learning based methods for medical image segmentation
have been shown to outperform traditional feature-based methods on MR images
[17, 18]. However, to the best of our knowledge, these methods have not yet
been tested on CBCT images.

1.2 Aim
The objective of the thesis is to develop and implement a deep learning based
convolutional neural network [19] for de-noising CBCT volumes through semantic
segmentation – voxel classification into air, soft tissue or bone. It is aimed at
investigating whether deep learning based convolutional neural networks surpass
conventional image processing and ML methods in segmenting noisy CBCT images,
given the following evaluation metrics (see section 2.3 for details):
i. Two-sample T-test p-values
ii. Confusion matrix
iii. Execution time
iv. Region of interest (ROI) error
v. Visual inspection
It is hypothesized that CNN shall outperform conventional image processing and
ML methods in segmenting noisy CBCT volumes. The hypothesis rests on the
following grounds:
1. Deep learning models tend to extract high-level abstraction within data which
make them less sensitive to image noise.
2. Deep architectures have shown promising segmentation results on MR images
[17, 18, 20].

1.2.1 Scientific Contribution


CBCT is widely used as a treatment planning tool in implant dentistry and as a
patient positioning tool in image guided radiation therapy [3]. It offers a
simpler geometry and less radiation exposure compared to conventional CT [21].
This work addresses an important limitation of CBCT – poor image contrast –
which has been a prime research area over the past decade [22]. The current
work focuses on semantic segmentation aimed at noise reduction in CBCT;
however, a similar framework may be extended to other applications such as dose
calculation. The current work also introduces the novelty of using CNN to
segment CBCT images. It highlights necessary considerations when using deep
learning architectures for semantic segmentation of medical images. Last but
not least, the present work demonstrates the ability of CNN to resist noise
artifacts when extracting salient features from images. The idea of processing
noisy images with a CNN is distinctive because CNNs – unlike deep encoder
networks – have found most of their applications in image classification and
object recognition.

1.3 Thesis Layout


Chapters 2 and 3 describe the implementation of methods and the results, respectively.
Chapter 4 discusses the proposed results, challenges and possible improvements.
Appendix A covers the State of the Art in CBCT and its applications.
Appendix B covers a revision of basic image processing and segmentation techniques.
2. Methods
This chapter shall describe and motivate the steps involved to develop the
segmentation model for CBCT images.

2.1 Convolutional Neural Networks


Convolutional Neural Networks (CNN) belong to the group of feed-forward
Artificial Neural Networks in which the connectivity pattern between layers is
inspired by the connectivity of the animal visual cortex. See section B.7 for
details on CNN.

2.1.1 Training
The training of a CNN is carried out in a supervised fashion. This means that
for every input image we require a true label. These image-label pairs are
stored in a training set. The training process was divided into the following
key steps:

I. Data Preparation
Training data and labels were generated in the following order: (see Fig. 2.1)
i. Real craniofacial CT data was provided by Elekta Instruments, Stockholm. All
CT images had a stereotactic frame attached to the head, which needed to be
removed from the images in order to proceed with the segmentation method (see
Fig. 2.2). The dataset has over 1000 3D CT images of mixed gender and age
groups. All images were of size 512×512×S, where S is the number of slices
ascending to the top of the head. S varies between 100 and 200 throughout the
dataset. The voxel size is 0.5×0.5×1.0 mm³.
ii. After frame removal, one CT image was randomly selected to be a template
(size: 512×512×154) to provide a common frame of reference. A unimodal
intensity-based rigid registration was performed to geometrically align every
other CT image, one at a time, with the template. Rigid registration consists
of translational and rotational transformations that align images with the
template. This operation also makes every registered image equal in size to
the template.
iii. Next, hard segmentation of the registered CT images into bone, tissue and
air regions was performed using multi-level thresholding [23]. These are the
training labels.
iv. CBCT projections were then acquired using Elekta's proprietary algorithm
involving density calculations from CT data F, segmented CT data G and the
energy spectrum of the Elekta Icon CBCT.
v. Finally, CBCT projections were reconstructed using the Conjugate Gradient
Least Squares (CGLS) technique in the Operator Discretization Library (ODL)
package². These are the training images.

40 CBCT images and corresponding labels were generated following the method in
Fig. 2.1. 20 images were used for training the CNN and the remaining 20 were
used for testing.

² Available at https://github.com/odlgroup/odl
[Figure 2.1 shows the data preparation pipeline as a block diagram: A. 3D CT → B. Otsu thresholding → C. 3D filling → D. extract largest 3D contiguous region → E. apply mask to CT volume → F. CT registration → G. multi-class thresholding (training labels) and H. CBCT projections → I. 3D CBCT (training images).]

Figure 2.1: An illustration of the steps involved in the preparation of training images and labels.
A. Craniofacial CT data provided by Elekta, Stockholm.
B. After applying Otsu thresholding at 30%.
C. After a 3D morphological hole-filling operation.
D. After extracting the largest connected region in the volume with a 3D connectivity of 26.
E. After applying the mask generated in D to A.
F. After performing rigid registration with a CT template.
G. After applying multi-class thresholding to F. In the current work a voxel may have one of three training labels: Air = 0; Tissue = 1; Bone = 2.
H. Generate CBCT projections using F and G. (Beyond thesis scope: provided by Elekta.)
I. 3D reconstruction. (Beyond thesis scope: provided by Elekta.)
Figure 2.2: A 3D illustration of the stereotactic head frame before and after removal. Metallic screws piercing through the patient's head can be seen in the frame-removed image.

II. Generate Training Set

The training set was at first generated by extracting a 2D patch of size 33×33
centered on every pixel in the CBCT image. Each input image (also known as a
patch) corresponds to a single label that belongs to the center pixel of the
patch. In general, craniofacial CBCT images have far more air pixels than
tissue pixels, which are in turn far more numerous than bone pixels. For
optimal training, it is important to incorporate an approximately equal number
of samples of every class in the training set [24]. A common solution is to
isolate bone, tissue and air samples and randomly draw an equal number of
samples from each. However, a novel approach using Canny edge detection [25]
was implemented to generate a balanced training set; a code sketch follows
Fig. 2.4 below. There are two advantages of this approach.
1. Edge pixels are non-trivial because they directly delineate the anatomy in
the image.
2. Patches around edge voxels possess a structure which speeds up the learning
process (see Fig. 2.3).

Figure 2.3: Extracting patches around edge voxels. Random voxel sampling often results in patches with no structural information (A). Hence, edge voxels were determined (B) and patches were captured only around edge voxels (C).

Patches around edge voxels only were used for training the CNN. Fig. 2.4 shows
different variants of patch sizes attempted during the thesis.

[Figure 2.4 shows the patch-generation pipeline: a 512×512×154 input image is pre-processed (Gaussian filtering, edge detection) into a 512×512×154 edge image, from which patches are prepared in three variants: Case I – a 2D patch (33×33×1); Case II – a tri-patch (33×33×3); Case III – a cubic patch (33×33×33).]

Figure 2.4: Patch generation around edge voxels.
Case I – a 2D patch extracted around the edge voxel of interest in the x-y plane.
Case II – three 2D patches extracted around the edge voxel of interest in the x-y, x-z and y-z planes.
Case III – a cubic patch extracted around the edge voxel of interest.
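As referenced above, a minimal Python sketch of the edge-centred patch extraction (Case I) might look as follows, assuming 2D slices as NumPy arrays and scikit-image's Canny detector; names and border handling are illustrative rather than the thesis implementation.

```python
import numpy as np
from skimage.feature import canny

def edge_patches(slice2d, labels2d, half=16):
    """Yield (33x33 patch, centre-pixel label) pairs around Canny edge pixels."""
    edges = canny(slice2d)  # binary edge map (step B in Fig. 2.3)
    for y, x in zip(*np.nonzero(edges)):
        # Skip edge pixels too close to the border for a full patch
        if half <= y < slice2d.shape[0] - half and half <= x < slice2d.shape[1] - half:
            patch = slice2d[y - half:y + half + 1, x - half:x + half + 1]
            yield patch, labels2d[y, x]
```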

III. CNN Architecture


A usual convention for implementing a CNN is to begin with a convolution block
followed by a spatial pooling block, a non-linearity and a fully connected
layer (see section B.7 for details). It is a feed-forward network, which means
that there are no backward connections in the network. A feed-forward network
can be viewed as a stream of linearly stacked non-linear functions.

$$f(x) = \left(f_{L,w_L} \circ \dots \circ f_{2,w_2} \circ f_{1,w_1}\right)(x) \qquad (1)$$

Each functional block $f_{l,w_l}$ takes an input $x_l$ along with some parameters $w_l$ to compute an output $x_{l+1}$, and the computations carry on until the last stacked function $f_L$ is computed. Note that only $x = x_1$ is the actual input image passed to the first computation block; the remaining blocks receive intermediate feature maps. Fig. 2.5 explains the overall architecture of the proposed CNN using Case I in Fig. 2.4. In this implementation $x_1$ is a 2D square patch with a single label belonging to the center pixel of the patch.

The output of the proposed architecture can be written as

$$y = \left(s \circ f_{2,w_4} \circ R \circ f_{1,w_3} \circ R \circ p_{2,a_2} \circ c_{2,w_2} \circ R \circ p_{1,a_1} \circ c_{1,w_1}\right)(x_1) \qquad (2)$$

where $c_n$, $p_n$ and $f_n$ are the $n$-th convolution, pooling and fully connected blocks respectively; $R$ and $s$ are the ReLU and softmax operators respectively; $w_n$ are the weight vectors corresponding to convolution or fully connected blocks; $a_n$ is the pooling window of the $n$-th pooling block; and $y$ is the posterior probability vector of size $1 \times 3$.

[Figure 2.5 shows the implemented CNN as a block diagram. A 33×33 input patch passes through a 2D convolution (5×5×1×20) giving 29×29×20 feature maps, a 2D max pooling with stride 2 giving 14×14×20, and a ReLU; then a second 2D convolution (5×5×20×32) giving 10×10×32, a 2D max pooling with stride 2 giving 5×5×32, and a ReLU. The 5×5×32 maps are reshaped to an 800×1 vector and passed through fully connected layers (800→512→3, per Eq. 2) with an intermediate ReLU; a final softmax loss yields the categorical probability distribution over the three classes.]

Figure 2.5: Block diagram of the implemented CNN.
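For readers who prefer code, the architecture of Fig. 2.5 / Eq. 2 can be sketched as below. The thesis implementation used MatConvNet in Matlab; this PyTorch version is an assumed, illustrative equivalent with only the layer sizes taken from the figure.

```python
import torch.nn as nn

# Layer sizes follow Fig. 2.5; framework and initialisation details are assumed.
cnn = nn.Sequential(
    nn.Conv2d(1, 20, kernel_size=5),   # 33x33x1  -> 29x29x20
    nn.MaxPool2d(2),                   # stride 2 -> 14x14x20
    nn.ReLU(),
    nn.Conv2d(20, 32, kernel_size=5),  #          -> 10x10x32
    nn.MaxPool2d(2),                   # stride 2 -> 5x5x32
    nn.ReLU(),
    nn.Flatten(),                      #          -> 800
    nn.Linear(800, 512),
    nn.ReLU(),
    nn.Linear(512, 3),                 # logits for air / tissue / bone
)
# The softmax of Eq. 2 is folded into the training loss (nn.CrossEntropyLoss).
```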

IV. Training Objective
The computations are performed in two modes. In the forward mode, posterior
probabilities $y$ are calculated according to Eq. 2. The posterior
probabilities are then compared with the ground truth labels. The objective of
the training algorithm is to minimize the dissimilarity, or error, between $y$
and the ground truth. The error function $E$ is given by the cross entropy for
a 1-of-$K$ coding scheme [15].

$$E(W) = -\sum_{i=1}^{N} \sum_{k=1}^{K} t_{ik} \ln S_k(x_i, W) \qquad (3)$$

where $W$ is the weight vector; $N$ is the batch size (see VIII of this section for details); $t_{ik}$ indicates that the $i$-th input image belongs to the $k$-th class; and $S_k$ is the softmax operator that returns the probability that the $i$-th image belongs to the $k$-th class.

Since the only tunable parameters in Eq. 2 are the weights $w_n$, the error minimization problem can be stated as:

$$w_1^*, w_2^*, w_3^*, w_4^* = \underset{w_1, w_2, w_3, w_4}{\operatorname{argmin}} \sum_{(x,T) \in D} E(W) \qquad (4)$$

where $E$ is the error function between the posterior probabilities $y$ and the true label $T$, and $D$ is the set of training data.
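As a quick illustration of Eq. 3, the batch cross entropy can be written in a few lines of NumPy; `t` (one-hot targets) and `s` (softmax outputs) are assumed to be N×K arrays.

```python
import numpy as np

def cross_entropy(t, s, eps=1e-12):
    # Eq. 3: E(W) = -sum_i sum_k t_ik * ln S_k(x_i, W); eps guards against log(0)
    return -np.sum(t * np.log(s + eps))
```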

V. Backpropagation
The algorithm used to estimate the optimum weights leading to the minimum error
in Eq. 4 is known as backpropagation. It is essentially a method of computing
gradients by recursive application of the chain rule; a mathematical
description of the algorithm can be found in [26]. It is a gradient descent
algorithm in which the weights are updated in the direction of the negative
gradient – downhill – in search of a global minimum on the error surface. The
standard gradient descent algorithm updates the parameter $W$ of the objective
function $E$ (Eq. 3) as

$$W_{new} = W_{old} - \eta \, \nabla_W E \qquad (5)$$

where $\nabla_W E$ is the error gradient with respect to the weights and $\eta$ is the learning rate.
VI. Learning Rate
The learning rate governs how fast the network converges towards the global
minimum. A large learning step may overshoot the error minimum and lead to
unstable learning; a small learning step leads to slow convergence and more
processing time. To balance this, the learning rate is set to a high value in
the beginning and is gradually decreased as the algorithm progresses. In the
current implementation a learning rate of 0.1 was used, divided by a factor of
2 after every 50 iterations.

VII. Momentum
Momentum is used to keep the network from getting stuck in a local minimum as
we descend the error surface. The value of the momentum determines how much a
previous weight change contributes to the current update. Using momentum in a
CNN also permits a smaller learning rate, which is necessary for stable
learning. In the current implementation a momentum ($\alpha$) of 0.9 was used.

$$W_{n+1} = W_n - \eta \, \nabla_{W_n} E + \alpha \left(W_n - W_{n-1}\right) \qquad (6)$$
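A compact NumPy sketch of the update rule in Eqs. 5–6, together with the stepwise learning-rate decay from section VI; the function names and the gradient argument are illustrative assumptions.

```python
import numpy as np

def lr_schedule(iteration, base=0.1, factor=2, every=50):
    # Learning rate of 0.1, divided by 2 every 50 iterations (section VI)
    return base / factor ** (iteration // every)

def momentum_step(w, w_prev, grad, lr, alpha=0.9):
    # Eq. 6: W_{n+1} = W_n - lr * grad + alpha * (W_n - W_{n-1})
    w_new = w - lr * grad + alpha * (w - w_prev)
    return w_new, w  # new weights; current weights become "previous"
```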

VIII. Batch Size


There are two ways to update the weights in a CNN. In serial or online
updating, images are fed one at a time to the CNN: the image is processed
through all the network blocks, the error is calculated, and the weights are
updated in the direction of the negative error gradient. This method has two
drawbacks. First, it requires high computation time. Second, it is sensitive to
noise, since a single noisy image may corrupt the weights of the network. To
avoid this, parallel or batch training is used. In batch training, input images
are grouped together before being fed to the CNN. All computations are
performed in parallel and the cumulative error of the entire group is
calculated, which is then back-propagated to update the weights. The number of
images sent to the CNN at one time is called the batch size. A higher batch
size requires more parallel computation memory. In the current implementation a
batch size of 500 was used – denoted $N$ in Eq. 3.

IX. Validation Set


Since the network is allowed to train for a given number of iterations, it is
likely to lose its generalization ability by overfitting on the training images
[27]. To avoid over-fitting, 10% of the training images were set aside in an
independent validation set. During every iteration, after the weight update,
the validation set was fed to the CNN and the error was calculated. The
training and validation errors were plotted as shown in Fig. 2.6. As the
validation set is independent of the training set, it offers a good measure of
the generalization ability of the network. No weight update is done using the
validation error.

Figure 2.6: An illustration of how the validation set assists in avoiding over-fitting. The arrows mark possible training termination points.

2.1.2 Implementation
Compute Unified Device Architecture (CUDA) is a parallel computing platform
developed by NVIDIA [28]. It utilizes the multi-core flexibility of modern
GPUs, enabling data to be processed in parallel rather than serially on CPUs,
which results in exceptionally fast computation for big data processing
applications. CUDA supports the C, C++ and Fortran programming languages, and
Matlab supports CUDA-enabled NVIDIA GPUs through its parallel processing
toolbox. Since speed is a major concern in the current segmentation problem,
the implemented methods make use of the GPU architecture through CUDA
libraries. Matlab, Microsoft Visual Studio and Enthought Canopy were used as
programming environments. The MatConvNet library was used for implementing the
CNN [26].

2.1.3 Testing
After the optimal weights are determined, the CNN is frozen and no further
weight update takes place. Testing the CNN on an input CBCT image of size
512×512×154 involves two key steps: first, patch generation (Fig. 2.4) for
every pixel in the input image; second, computing $y$ (Eq. 2) for every patch.
The former step was performed using parallel CPU threads. Each thread receives
a 2D image slice (512×512) and computes its patches (33×33×262144). Then one
thread at a time requests GPU control and the outputs $y$ are computed
batch-wise. Based on the maximum class probability, every patch is assigned a
single label. Once all patches from a slice are labelled, GPU control is
transferred to the next CPU thread, which by then has generated the patches for
the next slice in the queue. Fig. 2.7 illustrates the parallel test
architecture.
Figure 2.7: Block diagram for parallel test data processing.
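A simplified sketch of the slice-wise labelling loop, continuing the illustrative PyTorch notation from the Fig. 2.5 sketch (the thesis used Matlab/CUDA); `patches` is assumed to be an array of 33×33 patches from one slice.

```python
import torch

def label_patches(patches, cnn, batch=500, device="cuda"):
    labels = []
    cnn = cnn.to(device).eval()
    with torch.no_grad():
        for i in range(0, len(patches), batch):
            # add a channel dimension and classify one batch on the GPU
            x = torch.as_tensor(patches[i:i + batch]).unsqueeze(1).float().to(device)
            labels.append(cnn(x).argmax(dim=1).cpu())  # maximum posterior -> label
    return torch.cat(labels)
```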

2.2 Other Algorithms
Several other segmentation algorithms were tried on CBCT images to compare their
performance with CNN. Details on image domain based simple segmentation
algorithms such as thresholding and K-means can be found in Appendix B. The
following section includes details on features selection for Multilayer Perceptron (MLP)
followed by a modified Fuzzy C-mean algorithm.

2.2.1 Features Selection


Conventional machine learning algorithms such as the Support Vector Machine
[29] and the MLP (see Appendix B.6) rely on manually extracted features as
input to the network. In order to compare the performance of the proposed CNN
against the MLP, 9 training features were extracted from the CBCT images. In
practice, this means a 9×1 feature vector for every CBCT image pixel (a code
sketch follows Fig. 2.8). The feature vector includes the following:
• 1 pixel intensity that roughly corresponds to CT Hounsfield (HU) value.
• 3 spherical coordinates. Spherical coordinates tend to provide a better realistic
interpretation of craniofacial anatomy than Cartesian coordinates.
• 2 Haar-like features [30] extracted along x and y directions.
• 3 probabilistic features giving the probability of the pixel belonging to
each of the three classes. The main idea behind generating the probabilistic
maps is to ensure that the predicted segmentation is consistent with expected
craniofacial anatomy. 500 CT images were generated using the framework
described in Fig. 2.1 (steps A–H). Next, three separate binary segmentations
were performed for bone, tissue and air respectively. The probabilistic maps
were generated by taking the mean over all segmented images. Fig. 2.8 shows
a slice from the three probabilistic maps.

Figure 2.8: A slice from the craniofacial probabilistic maps (L-R: air, tissue, bone). The intensity of every pixel indicates the probability of the pixel being in the air, tissue or bone region.
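As referenced above, a rough sketch of the 9-element feature vector for one voxel follows; the Haar-like responses are crudely simplified and border handling is omitted, so treat every detail here as an assumption rather than the thesis implementation.

```python
import numpy as np

def pixel_features(vol, prob_maps, z, y, x):
    """9x1 feature vector: intensity, spherical coords, 2 Haar-like, 3 priors."""
    dz, dy, dx = np.array([z, y, x]) - np.array(vol.shape) / 2.0
    r = np.sqrt(dx**2 + dy**2 + dz**2)
    theta = np.arccos(dz / r) if r > 0 else 0.0   # polar angle
    phi = np.arctan2(dy, dx)                      # azimuth
    # crude Haar-like responses (assumes the voxel is >= 4 pixels from borders)
    haar_x = vol[z, y, x + 1:x + 5].sum() - vol[z, y, x - 4:x].sum()
    haar_y = vol[z, y + 1:y + 5, x].sum() - vol[z, y - 4:y, x].sum()
    priors = [m[z, y, x] for m in prob_maps]      # air, tissue, bone maps
    return np.array([vol[z, y, x], r, theta, phi, haar_x, haar_y, *priors])
```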

For MLP training, these 9 features were fed as inputs to a network with one
hidden layer of 800 nodes and an output layer of 3 nodes. Training the MLP is
an optimization problem in which the weights were updated using the
Levenberg-Marquardt backpropagation algorithm [31]. One million samples were
randomly selected from 20 CBCT images for training, and the training set was
balanced to contain roughly equal numbers of samples from every class. 20% of
the training samples were used as a validation set to avoid overfitting.

2.2.2 Modified Fuzzy C-means
A novel approach for correcting shading artifacts in CBCT images that does not
require any prior information was implemented during the thesis work. The basic
framework, shown in Fig. 2.9, is inspired by the analogy between scatter and
the bias-field effect – a low-frequency noise signal – observed in MR images.
Several variants of Fuzzy C-means (FCM) have been proposed in the literature
for estimating the bias field in MR images [32-35]; the present study extends
the application of this framework to craniofacial CBCT images. The framework
combines FCM [36] with a neighborhood regularization term and Otsu's method
[37]. It is based on the assumption that true CBCT images follow a uniform
volumetric intensity distribution, and that scatter perturbs this uniform
texture by contributing cupping and shading artifacts in the image domain. The
neighborhood regularization term within the FCM objective function biases the
solution toward piece-wise smooth clusters. Otsu's method initializes the
cluster centers to ensure a similar solution every time the function is run.
Hard segmentation of the final image is performed after subtracting the
estimated bias field from the input image. Details of this framework shall be
available in the preprint [38].

[Figure 2.9 shows the framework as a block diagram: the input image is fed to the modified FCM, which produces a membership matrix and a bias field; the bias field is subtracted from the input to give the bias-corrected image, and multilevel thresholding of the corrected image yields the hard-segmented image.]

Figure 2.9: Modified Fuzzy C-means framework.
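To make the clustering step concrete, here is a compact sketch of Otsu-initialised FCM on image intensities. Note that this is plain FCM: the neighbourhood regularization and bias-field estimation of the actual MFCM framework are omitted, so the code illustrates the underlying idea rather than the method of [38].

```python
import numpy as np
from skimage.filters import threshold_multiotsu

def fcm(img, n_clusters=3, m=2.0, n_iter=50):
    x = img.ravel().astype(float)
    # Otsu-based initialisation keeps the result reproducible across runs
    thr = threshold_multiotsu(img, classes=n_clusters)
    bins = np.digitize(x, thr)
    centers = np.array([x[bins == k].mean() for k in range(n_clusters)])
    for _ in range(n_iter):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-9  # point-centre distances
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                 # fuzzy membership matrix
        centers = (u**m * x[:, None]).sum(0) / (u**m).sum(0)
    return u.reshape(*img.shape, n_clusters), centers
```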

2.3 Evaluation Metrics
The evaluation metric includes the following examinations:

I. Two Sample T-test


The two-sample T-test [39] is a statistical test of whether the means of two
groups are reliably different from one another. The mean is a descriptive
statistic that describes the result but cannot be generalized beyond it.
Inferential statistics like the T-test, on the other hand, allow us to make
inferences about our predictions beyond the given data.
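The comparison can be reproduced with SciPy's two-sample t-test, sketched below; `err_a` and `err_b` are assumed per-image error arrays for two methods.

```python
from scipy.stats import ttest_ind

def compare_methods(err_a, err_b, alpha=0.05):
    """Return (reject H0 at level alpha, p-value) for two per-image error arrays."""
    stat, p = ttest_ind(err_a, err_b)
    return p < alpha, p
```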

II. Confusion Matrix


Confusion matrix demonstrates the sensitivity (true positive rate) and specificity (true
negative rate) of the results. Sensitivity measures the proportion of actual positives
which are correctly identified as such (e.g. the percentage of bone pixels correctly
classified as bone pixels). Mathematically,

$$\text{Sensitivity} = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}} \qquad (7)$$

Specificity measures the proportion of negatives which are correctly identified
as such (e.g. the percentage of non-bone pixels correctly identified as
non-bone). Mathematically,

$$\text{Specificity} = \frac{\text{number of true negatives}}{\text{number of true negatives} + \text{number of false positives}} \qquad (8)$$
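A small NumPy sketch of Eqs. 7–8 computed per class from predicted and ground-truth label volumes; the function name and the 0/1/2 label encoding follow the conventions above but are otherwise assumptions.

```python
import numpy as np

def sens_spec(pred, truth, cls):
    """Sensitivity and specificity (Eqs. 7-8) for one class label."""
    p, t = pred == cls, truth == cls
    tp, fn = np.sum(p & t), np.sum(~p & t)
    tn, fp = np.sum(~p & ~t), np.sum(p & ~t)
    return tp / (tp + fn), tn / (tn + fp)
```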

III. Execution Time

IV. ROI error

Several regions in craniofacial CBCT images, such as the Dorsum Sellae and
Clivus, are hard to segment due to poor image contrast. Hence, 5 regions of
interest (ROIs) were manually defined in the CBCT images and the classification
error within each ROI was calculated.

$$\mathrm{ROIError}_k = \frac{1}{N} \sum_{i=1}^{N} \operatorname{nnz}\left(P_{ik} - L_{ik}\right) \qquad \forall\, k \in \{1,2,3,4,5\} \qquad (9)$$

where $N$ is the number of test images; $P_{ik}$ and $L_{ik}$ are the predicted and label vectors for the $k$-th ROI in the $i$-th test image, respectively; and $\operatorname{nnz}$ counts the number of non-zero elements.
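Eq. 9 translates directly into NumPy; `preds` and `labels` are assumed lists of label volumes and `roi` a boolean mask for one ROI.

```python
import numpy as np

def roi_error(preds, labels, roi):
    # Eq. 9: mean count of voxels whose predicted label differs inside the ROI
    return np.mean([np.count_nonzero(p[roi] != l[roi]) for p, l in zip(preds, labels)])
```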

V. Visual Inspection
The aim of visual inspection is to analyze regions within images that are more
susceptible to misclassifications due to different segmentation methods.

3. Results
This chapter covers the results generated by the aforementioned methods. All
simulations were performed on a desktop computer with an Intel Core i7-4790K
CPU, 32 GB RAM and an NVIDIA GeForce GTX 970 (4 GB). The training of the MLP
was performed on the CPU and took 6 days to complete; the training of the CNN
was performed on the GPU and took 2 days to complete.
To maintain result fidelity, none of the 20 test images was exposed to any of
the segmentation algorithms during the training process for parameter tuning.
The results presented below are first-time evaluations on the test images, and
no parameter tuning was performed afterwards. All test images have a fixed size
of 512×512×154.

3.1 Two Sample T-Test

Method         Mean Error %   Standard Deviation %   Execution Time (s)
Thresholding   1.1            0.18                   0.9
MFCM           0.8            0.12                   64*
MLP            1.2            0.19                   92
CNN            0.4            0.07                   500

Table 3.1: Mean classification error and execution times over 20 test images. The best scores per metric are CNN's mean error and standard deviation, and thresholding's execution time. Execution times marked with an * indicate that the test was performed on the CPU; otherwise the test was performed on the GPU.

Table 3.1 shows that CNN based segmentation results in the minimum
classification error and variance among the test CBCT images, albeit at the
cost of a long execution time. For the T-test, the null hypothesis states that
there is no statistically significant difference between the means of two
methods. The test result is 1 if the null hypothesis is rejected at the 5%
significance level, and 0 otherwise. When the test result is 1, the group with
the lower mean error is concluded to be better. A p-value is the probability of
observing the result by chance if the null hypothesis is true; smaller p-values
are preferred.

Table 3.2 supports the generalization that in at least 95% of cases, CNN will
outperform the other segmentation techniques on CBCT images. MFCM proves to be
the second best segmentation algorithm, outperforming MLP and multi-level
thresholding. MLP and multi-level thresholding show no statistically
significant difference in their performance.

Method 1       Method 2       Result   P-value
CNN            Thresholding   1        < 0.001
CNN            MFCM           1        < 0.001
CNN            MLP            1        < 0.001
MFCM           Thresholding   1        < 0.001
MFCM           MLP            1        < 0.001
Thresholding   MLP            0        0.08

Table 3.2: T-test results for comparing mean errors of different segmentation methods.

3.2 Confusion Matrix

Method         Specificity (%)           Sensitivity (%)
               Bone   Tissue   Air       Bone   Tissue   Air
Thresholding   99.9   98.8     99.6      85.3   99.3     99.7
MFCM           99.8   99.3     99.4      91.7   98.5     99.8
MLP            99.1   99.6     99.7      98.1   95.9     99.7
CNN            99.8   99.7     99.4      97.9   98.7     99.8

Table 3.3: Confusion matrix (specificity and sensitivity) for bone, tissue and air.

Table 3.3 displays the sensitivity and specificity for the different materials,
averaged over 20 CBCT test images. The table shows a close call between the
segmentation algorithms, with no significant differences in scores except for
the low bone sensitivity of multi-level thresholding.

Sensitivity and specificity values must always be considered as a pair. For
instance, the high bone specificity of thresholding in Table 3.3 implies that
when a pixel is predicted to be bone, it is nearly certain to truly be a bone
pixel. However, the correspondingly low sensitivity implies that when a pixel
is predicted not to be bone, it is not certain whether it actually is non-bone.
Hence, both sensitivity and specificity should be high for a strong classifier.

Also of interest is the variance in the performance of the different
segmentation algorithms. This can be analyzed through the statistical box plots
(Fig. 3.1) calculated over 20 CBCT images. The box plots show that CNN offers
the smallest deviation in results of all the segmentation algorithms while
achieving the highest level of accuracy. Modified FCM shows the second best
performance in terms of prediction accuracy and variance among test images.
This plot is consistent with the T-test results in Table 3.2.

[Figure 3.1 contains two box plots, "Box plot for tissue prediction" and "Box plot for bone prediction", showing the correct prediction rate (%) for CNN, MFCM, MLP and thresholding.]

Figure 3.1: Box plots showing the statistical distribution of prediction accuracy for different segmentation algorithms evaluated over 20 test images. For each box, the red mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers; the + symbol indicates outliers.
3.3 ROI error

Also of interest is the performance of the different segmentation algorithms
over the manually defined low-contrast regions of interest (Fig. 3.2) within
CBCT images. Fig. 3.3 demonstrates a clear decrease in prediction error for
CNN relative to the other segmentation techniques across the ROIs.

Figure 3.2: 5 regions of interest (L-R) in CBCT images.

[Figure 3.3 is a bar graph of per-ROI pixel prediction error for CNN, MFCM, MLP and thresholding across ROIs 1-5; the plotted errors range from roughly 2.5% to 25.7%.]

Figure 3.3: Bar graph showing the average pixel prediction error in the 5 ROIs for different segmentation algorithms over 20 test images.

3.4 Visualization

Fig. 3.4 provides a visual comparison of the different segmentation algorithms.
The three colors – black, gray and white – represent air, soft tissue and bone
regions respectively. The CNN output shows the greatest similarity to the
ground truth, followed by MFCM. Multi-level thresholding (F) obscures certain
low-contrast bone regions around the central craniofacial anatomy, consistent
with its low sensitivity value in Table 3.3. MFCM (D) outperforms multi-level
thresholding because its bias correction smooths out shading artifacts in the
images. MLP (E) produces a slightly noisy segmentation, owing to the lack of
contextual information captured within the 9 training features.

[Figure 3.4 shows, for several slices: A. input slices; B. ground truth; C. CNN; D. MFCM; E. MLP; F. multilevel thresholding.]

Figure 3.4: Visual comparison of different segmentation algorithms.
4. Discussion
This chapter discusses the results; highlights major challenges involved during the
work and suggests improvements.

4.1 Performance

Table 3.1 shows that CNN outperforms the other segmentation techniques in terms
of classification accuracy. However, it also requires the highest execution
time because of its complex architecture. Table 3.1 also shows that
histogram-based thresholding gives a good enough classification performance in
minimal time. In a strict mathematical sense, the difference in classification
performance between CNN and multi-level thresholding (~0.7%) may seem small
compared to the time and resources needed for the CNN implementation. However,
0.7% misclassification in volumetric images of size 512×512×154 amounts to
approximately 300,000 misclassified pixels. These misclassifications are mostly
seen around the low-contrast regions of the central anatomy (Fig. 3.4 F), where
they obscure crucial structural information. This is illustrated graphically in
Fig. 4.1. The plot depicts the number of bone pixels per slice for a test CBCT
image. It is apparent that significant bone information is lost by multi-level
thresholding in regions inferior to the top of the patient's head (axial slices
20 to 50) when compared to the dotted red reference line – the ground truth.
The CNN (blue line), in contrast, shows a considerable match to the ground
truth.

Figure 4.1: Bone information per craniofacial CBCT slice in the transverse plane. The highest slice number corresponds to the top of the head.
Another important contribution of CNN is the stability of its results, as shown
in Fig. 3.1. Reliability and validity are significant assessment tools for a
system: reliability is the degree to which the system outputs consistent
results, and validity is the quality of being factually correct. CNN achieves
high reliability and validity by capturing intrinsic noise-free features from
the images; hence, its output is resistant to image artifacts. Shading
artifacts are a major source of non-uniform pixel intensities in CBCT images,
which makes multi-level thresholding less reliable and valid in its results.
MFCM performs shading correction prior to segmentation, which makes its results
more reliable and valid than those of multi-level thresholding.

A key ingredient behind the success of CNN prediction is the choice of patches centered
on edge voxels as explained in section 2.1.1 II. Contrary to random patches extracted
from CBCT images, it was observed that the mean segmentation error on test images
was reduced by 70% when the CNN was trained using patches around edge voxels.

The choice of CNN architecture (the order and size of layers) is also a crucial
element governing classification accuracy. The present CNN architecture was
determined by iteratively running different combinations of network layers and
sizes and comparing the validation scores. However, there is always room for
incorporating more efficient network architectures to improve classification
accuracy. Different variants of patch dimensions (Fig. 2.4) were tried during
the thesis work, but the network performance was optimized for 2D patches only.
Further investigation of 3D patches might be of interest for improving network
efficiency.

An alternative approach is to make use of deep convolutional encoder networks
(CEN) [40]. Contrary to our existing patch-based CNN model, a CEN has
intermediate convolution and deconvolution blocks which extract features from
the entire image rather than from patches. Hence, it provides a same-sized
mapping between input and output images and eliminates the steps of patch
selection and the redundant calculations at the overlap of neighboring patches.
However, considering the size of CBCT images, a major drawback of CEN is that
it would require approximately 40 times more processing memory to perform
training.

A major drawback of the CNN is its high execution time, which includes the time
required for creating patches (Fig. 2.4) and then generating an output for
every patch (Fig. 2.5). Possible solutions to this limitation include:

i. Using a stronger GPU in terms of memory, or using multiple GPUs.
ii. Generating test patches on the GPU rather than the CPU.
iii. Using advanced CNN libraries such as Caffe [41] with cuDNN acceleration [42].

4.2 Limitations

Since the ultimate aim of the segmentation step is to assist in the iterative
reconstruction of CBCT images (section A.5.2), a further study targeting the
impact of different segmentation techniques on CBCT projection generation is
required. Such a study would establish the boundary conditions for our
confusion matrix; for instance, it would highlight how lower or higher
sensitivity/specificity values for bone, tissue or air voxels impact CBCT
projection generation, which in turn affects 3D image reconstruction.

A notable limitation of this work lies in the limited number of segmentation algorithms
tested on CBCT images. A further study addressing the performance of new
segmentation techniques such as level set, region growing, graph partitioning etc. is
essential to generalize the performance of CNN over other segmentation algorithms.
Also of interest is to attempt improvements in the existing tested methods, for
instance, improved features extraction for MLP etc.

Since the Leksell Gamma Knife Icon was introduced in May 2015, a major
challenge in implementing a deep framework was the unavailability of
expert-segmented real CBCT images for training. It is therefore worth
emphasizing that no real CBCT data was used during the thesis work: synthetic
CBCT images, generated from a stereotactic CT dataset, were used for training
and testing (Fig. 2.1 – I) as explained in section 2.1.1. The use of real CBCT
images in the future is important to certify the validity of the results at a
clinical level.

Another issue to be addressed is that the present study rests on the strong
assumption that CT images are less sensitive to noise artifacts than CBCT
images, which would make CT segmentation by thresholding very accurate.
Following this assumption, segmented CT images were used as the ground truth
for CNN training. However, this assumption is not entirely true. For instance,
conventional CT is more sensitive to beam hardening artifacts, primarily due to
its higher energy spectrum [43]. During the data preparation process explained
in Fig. 2.1 (steps A-E), the metallic screws fastening the stereotactic frame
to the skull could not be entirely removed. These screws penetrate the skin
(see Fig. 2.2) and cause streaking artifacts. Severe beam hardening was also
apparent in the maxillofacial region of some CT images. This significantly
deteriorates the quality of the ground truth obtained through thresholding
(Fig. 4.2). The availability of expert-segmented CBCT images in the future is
likely to overcome this problem. Another possible fix is to make use of
advanced tomogram segmentation algorithms that determine thresholds from the
projections rather than from the reconstructed volumes [44, 45].

Figure 4.2: An illustration of beam hardening in CT images (left column) deteriorating the
ground truth (right column) due to dental anatomy (top row) and metallic stereotactic
frame screws (bottom row). Dotted rectangles demonstrate the presence of artifacts.

An additional consideration is that all of the CT data used to generate the
CBCT images and ground truth for training and testing was obtained from a
single clinical center (anonymous). This data is therefore likely to have
inherited unwanted traits specific to scanner deformations, user protocols,
etc. Hence, there is a possibility that the trained CNN model might not achieve
a similar performance index at another clinical site. However, since the
training architecture is in place, it is advisable to tune the CNN parameters
using data from new sites before practical implementation.

Medical data handling is a sensitive exercise demanding a high level of
integrity and probity. The CT data from Elekta was provided in DICOM format,
which groups patient, scanner, protocol and image information in one file; more
on DICOM can be found in [46]. Respecting patients' privacy and
confidentiality, common tags that describe patient identity – such as name,
age, sex, ethnic group, occupation, examination date and hospital ID – were
obscured. Only the image information was used, as Matlab (.mat) or Python (.py)
files, for this work. However, since craniofacial anatomy differs slightly
between age groups, sexes and geographical locations, it might be of interest
in the future to use this DICOM information as separate features in training
deep networks. It is, however, essential to seek legal consent from the data
provider before using this information.
5. Conclusion
Semantic segmentation of Cone Beam CT images was performed using conventional
image processing methods, a feature-based machine learning technique and a
feature-independent deep learning framework. The objective of the work was to
investigate whether deep learning based CNNs surpass conventional image
processing and ML methods in segmenting noisy CBCT images. It was hypothesized
that CNN would outperform the other algorithms given the aforementioned
evaluation metrics. Apart from execution time, the CNN outperformed the other
segmentation algorithms on every evaluation metric. Since execution speed is
important for real-time 3D image reconstruction in CBCT, further efforts to
speed up CNN execution are required.

Appendix A: Background and Literature Review

If the great German physicist W. Röntgen were to return today, what would
probably impress him most about medical evolution? We might shortlist three
achievements: the resourcefulness of our electronic health record systems in
providing secure patient information anywhere and anytime; the vast range and
specificity of drugs in our pharmacies; and our ability to look into every part
of the human body through imaging and scoping techniques. Nearly every clinic
that Röntgen would encounter would have a busy imaging facility running medical
diagnosis or treatment planning, and much of this success can be attributed to
his own discovery of what he called 'unknown radiations', or X-radiation. It
happened by chance – as so many things in science do – and marked the beginning
of a glittering career and fame for Röntgen as the father of diagnostic
radiology, a discovery that revolutionized the field of medical imaging.

A.1 Computed Tomography


Within months after the discovery of X-radiation in 1895, X-ray images found their
place in clinical environments [47]. In 1917, the Austrian mathematician Johann Radon
showed that any function can be estimated from an infinite number of its projections
[48]. Oblivious to Radon's theorem, the English engineer G. Hounsfield and the
South African physicist Allan Cormack invented Computed Tomography (CT) in
1972 [49], inaugurating the era of digital medical imaging. Since then, there has been
an explosion of research activity refining CT technology, primarily focusing on
better tissue contrast and spatial resolution for medical diagnosis under the constraints
of minimum radiation dosage to patients and reduced reconstruction times [50].
Modern third generation CT scanners consist of an X-ray source in a fan geometry
that penetrates the object or patient being scanned, covering one slice section
of it. The X-rays are attenuated when penetrating the object, and the local
absorption is measured by a detector array. In order to capture a 3D representation,
the object is irradiated by rotating the X-ray source and the detector around it.
The local absorptions, or projections, captured by the detector under different
projection angles are used to reconstruct 2D axial slices of the object. Conventional
CT generates a 2D slice at every rotation; CT becomes a 3D imaging modality when
consecutive slices are arranged as axial stacks. Reconstruction algorithms fall into two
broad categories: direct and iterative methods. The Filtered Back Projection (FBP)
algorithm is a common example of a direct reconstruction method based on the Radon
transform [51]. The standard FBP algorithm makes several fundamental assumptions
about scanner geometry and is essentially a compromise between image noise and
reconstruction speed. With the advent of high speed parallel GPU computation,
iterative reconstruction algorithms have attracted huge interest in medical image
reconstruction. The field of iterative 3D reconstruction has since been steadily evolving,
giving rise to a seemingly never ceasing flow of new algorithms [52, 53].
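As a concrete illustration of the direct approach, the following sketch reconstructs a
synthetic 2D slice with filtered back projection. It assumes a recent version of
scikit-image; a clinical pipeline would of course start from measured sinograms rather
than a software phantom.

# Filtered back projection on a synthetic slice (assumes scikit-image >= 0.19).
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()                          # 400x400 test slice
angles = np.linspace(0.0, 180.0, 180, endpoint=False)
sinogram = radon(image, theta=angles)                  # projections (Radon transform)
reconstruction = iradon(sinogram, theta=angles,
                        filter_name="ramp")            # FBP with a ramp filter
rmse = np.sqrt(np.mean((reconstruction - image) ** 2)) # reconstruction error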

Figure A.1: An illustrative operational comparison of conventional fan beam and CBCT systems.

A.2 Cone Beam Computed Tomography


In 1984, Feldkamp, Davis and Kress (FDK) provided an extension to the FBP method
by formulating a 3D reconstruction algorithm using a convolution back projection
formula on a set of 2D projections acquired along a circular trajectory [1]. This
reconstruction algorithm, coupled with the development of continuous exposure X-ray
tubes and compact high-quality 2D detector arrays, led to the production of Cone
Beam CT, which expanded the role of X-ray imaging from diagnosis to image guidance
in operative and surgical procedures.

CBCT is a recent imaging modality. The name cone-beam CT was coined to illustrate
the cone-shaped divergence of the X-ray beam from the source to the detector. The
system consists of a flat detector that captures 2D projections (rather than slices) of
the object at each angular step. The projections are used to estimate a 3D volume of
the object being scanned.

CBCT mounted on a C-arm is another advancement in radiology. This configuration
provides volumetric imaging capabilities within interventional suites. Historically,
medical domains such as interventional radiology and the implant industry had been
using 2D radiographic imaging techniques like digital subtraction angiography to
visualize, manipulate and intervene on 3D structures [3]. C-arm CBCT systems may
also be physically attached to a therapeutic device such as proton therapy systems and
linear accelerators [54]. This intervention inaugurated the era of Image Guided
Radiotherapy (IGRT), where frequent imaging during the course of treatment helps
improve the precision and accuracy of the radiation source. Recently, Elekta utilized
the C-arm CBCT functionality by physically attaching it to the Gamma Knife to
determine patient position prior to radiosurgery [55]. The present work is concerned
with this latter application of CBCT.
CBCT has several advantages over traditional CT which make it suitable for point-
of-care applications. It offers a small footprint, low cost and an adaptable geometry
that qualifies it to be physically attached to a surgical device like the Gamma Knife.
It also offers better isotropic spatial resolution. In conventional CT, the voxels are
anisotropic rectangular cuboids with in-plane dimensions as small as 0.6 mm and a
voxel depth determined by the axial slice thickness, usually on the order of 1-2 mm.
The volumetric dataset of CBCT comprises isotropic cubic voxels that determine the
resolution of the image, ranging from 0.4 mm down to 0.076 mm [56]. Last but not
least, CBCT has proven to offer a lower radiation dosage compared to conventional
CT systems [3, 21]. The price paid for all the above mentioned advantages is poor
image contrast and significantly higher noise artifacts compared to conventional CT.

A.3 CBCT System at Elekta


Lars Leksell – a Swedish physician – developed the concept of radiosurgery in 1951
[57]. Radiosurgery is a surgical treatment targeting and destroying tumor cells with
an ionizing radiation source. The word stereotactic refers to a 3D coordinate system
used to locate the target site within the cranial region. Leksell later invented the
Gamma Knife in 1967 and founded Elekta in 1972. Gamma Knife stereotactic
radiosurgery by Elekta has since been one of the top treatment choices for brain
tumors and functional disorders around the globe.

In the Gamma Knife, gamma rays emitted from multiple radioactive 60Co sources
arranged in a hemispherical array are focused on a single point of the brain, creating
a high spherical dose to destroy tumor cells. The target site (tumor) is first located
using a Magnetic Resonance (MR) scan in a separate room; this is termed the planning
image. The patient is then transferred to the Gamma Knife for a surgical session. The
planning images are mapped onto the stereotactic coordinates defined for the Gamma
Knife. This mapping involves annotations on the planning images by a physician using
the Leksell Frame System [4]. The Leksell Frame has certain drawbacks. First, the
metallic frame fixation to the patient's head is an invasive procedure leading to patient
discomfort and increased treatment time. Second, dose fractionation – a common
protocol in radiation oncology [58] – is cumbersome to perform with the frame based
system.

The state-of-the-art Leksell Gamma Knife Icon introduced an image guidance system
based on CBCT to enable fractionated, frameless and time efficient surgeries while
maintaining precision levels similar to those achieved with the Leksell Frame. A C-arm
CBCT system attached to the Gamma Knife provides the stereotactic reference prior
to the surgical session. Planning MR images are then automatically co-registered to
the stereotactic CBCT images to determine the patient position relative to the Gamma
Knife coordinates at the time of surgery.

Focusing on the scope of the present thesis, it is worth discussing a few design
constraints of the CBCT system mounted on the Gamma Knife Icon. Because the
patient table is physically attached to the gantry, the rotational scan range of the
C-arm CBCT is reduced to 200 degrees, limiting the number of projections. Another
implication of the geometry is that the detector needs to be very close to the patient's
head, which results in a substantial amount of scatter radiation being absorbed and
produces cupping artifacts. The application of CBCT in the Gamma Knife is limited
to obtaining a better registration of the planning MR images onto the stereotactic
reference system. This puts a high demand on spatial accuracy within CBCT
generated volumes for target localization to ensure precise dose delivery. However,
the system is not meant for medical diagnosis, and hence high soft tissue contrast is
not a requirement. Also, the field of view (FOV) is limited to the craniofacial region;
thus, basic prior knowledge of anatomical craniofacial features might prove handy in
improving image quality. The following sections take these design constraints and
requirements into consideration when discussing image artifacts.

Figure A.2: Elekta CBCT scanner. Published with permission from Elekta AB.

A.4 CBCT Artifacts
Artifacts are image errors that deteriorate image quality. They can be broadly
classified into three categories: (i) physics based artifacts, which result from our
assumptions, simplifications and modelling limitations of the physical processes
involved in the acquisition and reconstruction of CBCT data [51]; (ii) scanner based
artifacts, which occur due to imperfections in the scanning geometry, defective
detector elements etc.; and (iii) patient based artifacts, which occur due to patient
movement. Understanding the source of an artifact is crucial for applying correction
measures. The present thesis focuses on physics based artifacts.

A.4.1 Scatter
Scatter is a major concern in CBCT imaging. Scatter radiation is caused by photons
that, after interaction with matter, are deflected from their original path and add to
the measured intensities of primary photons. Unlike the fan beam geometry of
conventional CT, where most of the scatter radiation is dispersed away from the
detector, scatter radiation has a large probability of interacting with the 2D detector
in CBCT. Streak and shading artifacts commonly occur due to scatter radiation [43].

A.4.2 Beam Hardening
The FBP method assumes that the X-radiation is monochromatic – all photons have
the same energy [51]. This assumption allows calculating a homogeneous attenuation
coefficient (dependent only on beam path length) when X-rays pass through an object
being radiographed. In practice, the X-ray beam is polychromatic and, from a physical
perspective, attenuation coefficients depend non-linearly on the density of the material
being radiographed. The denser the object, the more radiation it absorbs. Within the
polychromatic spectrum, low energy rays are absorbed more strongly when passing
through the object than high energy rays. The beam reaching the detector is therefore
of higher mean energy (harder) than the beam emitted by the source. This phenomenon
is termed beam hardening [59]. Metals and bony structures are a major cause of
this artifact. Since this artifact is produced by the monochromatic assumption in
the reconstruction algorithm – which is the same for both CT and CBCT – beam
hardening is present in both tomography techniques. CBCT, however, is less sensitive
to beam hardening due to its low energy spectrum.

A.4.3 Cone-beam artifact
Cone-beam artifact is commonly visible in the periphery of scan volumes. As the X-
ray source rotates around the patient in a horizontal arc, the amount of data recorded
at the peripheral detector pixels is less than at the central detector pixels. The cone
beam artifact is proportional to the cone beam angle and smears out flat planar (xy-
plane) structures in the z-direction [51].

A: Scatter noise significantly disturbing tissue contrast and homogeneity
in CBCT scan (right) compared to CT scan (left)

B: Beam Hardening artifact in CT (left) and CBCT (right) showing dark streaks
along the lines of greatest attenuation and bright streaks in the other direction.

C: Cone Beam artifact visible on the top region of the head in CBCT scan (right).

Figure A.3: The above illustrations show basic artifacts in CBCT images. Dotted Rectangles
demonstrate the presence of artifacts. Published with permission from Elekta AB.

A.5 CBCT Denoising Methods
Since the introduction of CBCT in 2001 [60], numerous algorithms have been proposed
to tackle its noise artifacts. These correction algorithms can be broadly grouped into
two categories.

A.5.1 Hardware approaches
Hardware approaches are pre-acquisition methods that are specific to a particular
artifact. For instance, in the case of scatter artifacts, hardware approaches tend to
use air gaps [61] or anti-scatter grids [62] to prevent scatter photons from reaching
the CBCT detector. Similarly, beam filtration using bow-tie filters is an example of
a hardware correction for the beam hardening artifact, where an attenuation plate is
placed between the source and the object to shift the X-ray spectrum towards the
high-energy end [63]. There are three major drawbacks of hardware approaches. First,
they reduce the number of primary photons reaching the detector, thus decreasing
the signal to noise ratio (SNR); as a result, a higher patient dose is delivered to
maintain image contrast. Second, they are specialized for single artifact correction
and do not account for image correction in general. Third, they require an add-on
device that complicates the hardware on site.

A.5.2 Software approaches
These methods can be divided into three categories:

I. Pre-acquisition correction
These methods adjust the gain variations of detector pixels prior to the acquisition
phase. They make use of beam filters to calculate the ratio between ideal and measured
pixel values – the gain correction factor – for every pixel. They have shown promising
results in eliminating ring artifacts [64]. However, they do not account for other
common physics based artifacts such as scatter noise.
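A generic flat-field style sketch of such a per-pixel gain correction is shown below.
This is an illustrative assumption in NumPy, not the specific protocol of [64].

# Illustrative per-pixel gain correction (flat-field style), not the exact
# protocol of [64]: scale each detector pixel by its gain correction factor.
import numpy as np

def gain_correct(raw, flood, dark, eps=1e-6):
    measured = flood - dark                    # measured pixel response
    ideal = measured.mean()                    # ideal (uniform) response
    gain = ideal / np.maximum(measured, eps)   # gain correction factor
    return (raw - dark) * gain                 # corrected projection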

II. Post reconstruction correction
Post reconstruction techniques are general methods that aim to eliminate shading
artifacts from reconstructed images. They include image transformation algorithms
coupled with prior information – usually CT images – to minimize the difference
between the reconstructed CBCT image and a CT image of the same patient [65-68].
The common idea is to build a mathematical fitting model that minimizes a distance
metric between the CT and CBCT pixel values. This approach is highly dependent
on prior information, since it requires exact CT-CBCT image pairs to estimate the
polynomial function. In radiotherapy applications, the CT information is gathered
from planning images captured by conventional fan beam CT. However, it is desirable
to replace CT planning images with MR images due to their higher tissue contrast
and absence of radiation exposure [69]. In radiosurgery applications such as the
Gamma Knife, planning images are typically not acquired through CT scans; MRI
scans are used instead. This impedes the general use of the polynomial fitting
approach for shading correction in CBCT images.
III. Correction within reconstruction
These include analytical and iterative reconstruction techniques that incorporate
mathematical-physical models for estimating scan objects, the scatter contribution in
projections, noise levels and system nonlinearities in a deterministic or statistical
manner. Iterative reconstruction algorithms – owing to their more realistic modelling
– are seen as a bellwether in the reconstruction domain. They have demonstrated
improved noise suppression while maintaining image features in emission tomography
[70, 71] and computed tomography [72, 73]. FDK – an analytical reconstruction
algorithm – is currently used in the Gamma Knife Icon, and Elekta is investing in
significant research to alleviate the artifact problems via state-of-the-art iterative
methods. Iterative approaches minimize a cost function, constructed from the noise
characteristics of the measured data, over repeated iterations. Accurate noise
modelling is essential for statistical iterative algorithms, and hence a series of research
activities has given birth to specialized techniques targeting particular artifacts in the
measured data. These include: (i) Monte Carlo simulations for modelling scatter noise
by tracing the photon path from source to detector and performing corrections in the
projection domain [74]; this method is considered the most accurate approach for
scatter estimation but is computationally burdensome. (ii) Analytical kernel methods
using scatter convolution kernels to approximate the scatter model [75]; these models
offer a balance between computational burden and accuracy.

Fig. A.4 briefly explains the overall picture of CBCT image creation. Figs. A.5 and
A.6 show the classical steps in an iterative reconstruction algorithm and a typical
example of how an analytical scatter estimation model works, respectively. Either
way, segmentation plays a small but crucial part in both algorithms. The current
scope of work is limited to a better segmentation method to be run within the
iterative reconstruction loop to assist image denoising. In order to have real-time 3D
reconstruction, the segmentation step needs to be fast and robust.

Figure A.4: Basic steps in CBCT image creation: capture of 2D X-ray projections,
pre-processing, volume reconstruction, post-processing and visualization. The present
thesis work is focused within the volume reconstruction block.

A typical iterative reconstruction loop is shown below.

Figure A.5: Iterative reconstruction flowchart. Starting from an initial image estimate,
the volume is segmented; 2D raw projections are generated from the segmented image
and compared with the measured projections; the volume is then reconstructed and
the reconstruction parameters are updated, and the loop repeats until a stopping
criterion is met.
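In code, the loop of Fig. A.5 can be sketched as follows. The reconstruct, segment and
forward_project helpers below are toy stand-ins for the scanner's forward model, the
reconstruction operator and the segmentation step studied in this thesis.

# Skeleton of the iterative loop in Fig. A.5; all helpers are toy stand-ins.
import numpy as np

def reconstruct(projections, prior=None):
    return projections.mean(axis=0) * np.ones((8, 8))        # toy "volume"

def segment(volume):
    return (volume > volume.mean()).astype(float)            # toy 2-class labels

def forward_project(labels):
    return np.stack([labels.sum(axis=0), labels.sum(axis=1)])  # toy projections

def iterative_reconstruction(measured, n_iter=10, tol=1e-3):
    volume = reconstruct(measured)                       # initial image estimate
    for _ in range(n_iter):
        labels = segment(volume)                         # air/tissue/bone map
        simulated = forward_project(labels)              # generate raw projections
        if np.abs(measured - simulated).mean() < tol:    # stopping criterion
            break
        volume = reconstruct(measured, prior=labels)     # update reconstruction
    return volume

measured = np.random.default_rng(0).random((2, 8))       # toy measured projections
volume = iterative_reconstruction(measured)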

Similarly, different scatter estimation methods also make use of fast segmented
CBCT images [6], as shown below. The extent of this thesis is the segmentation step.

Figure A.6: Typical framework for a scatter correction method. An initial 3D image
estimate is segmented; 2D raw projections are generated from the segmented image
and compared with the measured projections; scatter is estimated, correction data is
calculated and a corrected image is reconstructed.

A.6 Segmentation
Automated image segmentation is a fundamental problem in medical image analysis.
Applications include contouring organs, lesion (tumor) detection, dose calculations,
noise reduction etc. Within the present work, the aim of segmentation is to divide the
image into homogeneous, meaningful regions. It is essentially a classification problem
where every voxel belongs to a particular class; this form of segmentation is also
referred to as semantic segmentation. There are three broad categories of segmentation
methods. (i) Segmentation based on local information. These algorithms work from an
image processing viewpoint, based on morphological and thresholding algorithms
[76, 77]. They are sensitive to image noise and non-uniform object textures, among
other issues [78]. (ii) Segmentation based on a priori information. These are high-level
methods that make use of prior information about the expected volume and adapt it
to the actual volume. They include deformable models and atlas-based segmentation
[10], and these approaches are often combined for segmentation purposes [11, 12].
However, there are two main problems with such methods. First, they require a rich
database for constructing atlases. Second, they require a smooth transformation
between the expected and actual volumes; tumors and other pathological conditions
may disrupt this transformation. (iii) Machine Learning (ML) based segmentation
techniques, which have recently shown huge success in medical image analysis with a
focus on MRI and CT images [14]. They can segment lesions or tumor sites within
scan volumes and even anatomically segment the entire scan volume into homogeneous
regions [79]. These methods build on supervised learning principles, where a set of
training data and corresponding true labels is fed into a network for training under a
certain ML algorithm. The network model is trained to identify the best fitting
relationship between the inputs and outputs in a generalized manner. More information
on ML algorithms can be found in [15]. A key ingredient behind the success of ML
based segmentation methods is how the input data is represented: better image
representation often leads to better analysis. This is called feature engineering and
selection, and it has been an active research topic in the past decade [80]. The Haar
wavelet [81] and histograms of oriented gradients [82] are common examples of feature
engineering used for medical image analysis. A major drawback of these so-called
hand-crafted features is that they often fail to extract the most salient information
from the data; as a result, their performance varies from dataset to dataset.

Contrary to such traditional classification methods that depend on feature
engineering, deep learning networks directly extract hierarchical features that are
adapted to the dataset [16]. Deep learning based methods for medical image
segmentation have been shown to outperform traditional feature based methods on
MR images [17, 18]. However, unlike MRI segmentation, these methods have not yet
been applied to semantic CBCT segmentation. There are two main reasons for this.
First, well labelled CBCT data is not openly available for training deep architectures.
Second, CBCT data, unlike MRI, is extremely noisy, which poses a problem for
segmentation.

Appendix B: Image Processing and Segmentation

B.1 Image Filtering
Image filtering is a neighborhood operation used to enhance image quality by
emphasizing certain features and suppressing others. Noise is invariably present in
digital images. Noise in CBCT images can be broadly grouped into two categories:
1. Signal independent noise: This includes electrical and additive noise. Electrical
noise may originate from several factors involved in the image acquisition
process, such as digital discretization and sensor noise.
2. Signal dependent noise: Quantum noise is the photon-count noise that arises
from uncertainty in the number of X-ray photons absorbed by the detector
after passing through an object. Quantum noise can be modelled by a Poisson
distribution and is often taken care of in the iterative reconstruction
process [43].
These noise sources introduce irregularities in the projection images which cause
problems in the reconstructed image. However, for simplicity, we shall treat both
noise types as additive (signal independent). Hence, we assume that the noise is
modelled as Additive White Gaussian Noise (AWGN):

$V_i = U_i + N_i \quad \forall\, i \in I$ (10)

where $i$ is a pixel belonging to the image set of pixels $I$, $U$ is the assumed true
noiseless image, $V$ is the noisy image, and $N$ is the noise distribution with mean 0
and standard deviation 1.

B.1.1 Gaussian Filtering
Gaussian filtering is a linear spatial operation that convolves the input volume with a
Gaussian function kernel [83]. It smooths the image by replacing every pixel value
with a weighted average of its neighbors defined by the kernel window. In the Fourier
domain, it removes the high frequency signal, which is assumed to be mostly noise.
However, high frequency image features such as edges and corners are also lost in the
operation.
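A minimal denoising sketch using SciPy's separable Gaussian filter, applied to a
synthetic stand-in volume, could look as follows.

# Gaussian smoothing of a (synthetic) noisy volume with SciPy.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
noisy = rng.normal(loc=0.0, scale=1.0, size=(64, 64, 64))  # stand-in AWGN volume
smoothed = gaussian_filter(noisy, sigma=1.5)  # larger sigma = stronger smoothing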

B.1.2 Non-Local Means Filtering (NLM)
Unlike Gaussian filtering, which denoises images by averaging pixels within a
neighborhood kernel, NLM filtering averages similar pixels across the image. The main
difference lies in the concept of neighborhood used by the two methods. The
neighborhood of a pixel A in NLM filtering is a set of pixels B such that the window
around each pixel in B (usually Gaussian) is similar to the window around A. Every
pixel in B carries a similarity weighting factor that influences the averaging operation.
This idea was first proposed in [84], and [85] motivated the use of NLM filtering for
denoising CBCT projections.
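A minimal sketch using scikit-image's NLM implementation (an assumption; [85]
operates on real CBCT projections) is given below; the h parameter controls the
filter strength.

# Non-local means denoising of a 2D slice (assumes scikit-image).
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

rng = np.random.default_rng(0)
noisy = rng.normal(size=(128, 128))                # stand-in noisy projection
sigma = estimate_sigma(noisy)                      # rough noise level estimate
denoised = denoise_nl_means(noisy, h=0.8 * sigma,  # h controls filter strength
                            patch_size=5, patch_distance=6)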

B.2 Automatic Thresholding
This is a histogram based hard segmentation method where pixels are quantized
according to fixed thresholds. Valleys in the image histogram determine the threshold
points, as shown in Fig. B.1. Otsu's method was originally developed to identify a
single threshold value separating the foreground from the background [37]. In the
present work, however, a modified version of Otsu's method is used to determine
multiple thresholds, allowing multi-class segmentation [86].

Figure B.1: Automatic Thresholding
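For illustration, scikit-image ships a multi-Otsu implementation; the sketch below
(an assumption, not the exact method of [86] used in this work) produces a
three-class label map.

# Three-class thresholding with multi-Otsu (assumes scikit-image >= 0.16).
import numpy as np
from skimage.filters import threshold_multiotsu
from skimage.data import camera

slice_img = camera()                                    # stand-in grayscale slice
thresholds = threshold_multiotsu(slice_img, classes=3)  # two threshold values
labels = np.digitize(slice_img, bins=thresholds)        # 0, 1 or 2 per pixel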

B.3 K-means
K-means is an image clustering algorithm where pixels are grouped together based on
a similarity measure [87]. For grayscale CBCT images, the Euclidean distance between
the grayscale values of pixels is the simplest choice for segmenting the image. The
algorithm is given below:

a. Randomly choose K initial cluster centers.
b. Assign each pixel to the closest cluster.
c. Update cluster centers as the average of the pixels assigned to each cluster.
d. Continue until the cluster centers no longer change.

A major limitation of the algorithm is that it does not take spatial coherence into
account.
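A minimal sketch of this grayscale K-means segmentation, using scikit-learn on a
synthetic stand-in volume, could look as follows; note that spatial coherence is
ignored, exactly as discussed above.

# Grayscale K-means segmentation (assumes scikit-learn).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
volume = rng.random((32, 32, 32))                 # stand-in grayscale volume
intensities = volume.reshape(-1, 1)               # one feature per voxel
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(intensities)
labels = km.labels_.reshape(volume.shape)         # per-voxel class map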

B.4 Fuzzy C-means
The Fuzzy C-means algorithm is very similar to K-means, except for step b above.
Rather than hard-assigning each pixel to its closest center, Fuzzy C-means maintains
a partition matrix that stores a membership vector for every pixel, defining the degree
to which that pixel belongs to each of the K clusters [36]. Fuzzy C-means also does
not take spatial coherence into account.
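A compact NumPy sketch of the standard membership and center updates is given
below; it is illustrative only, not the modified variant used in this work.

# Plain fuzzy C-means on 1D intensities (illustrative sketch only).
import numpy as np

def fuzzy_cmeans(x, c=3, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)                                  # memberships sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um @ x / um.sum(axis=1)               # fuzzily weighted centers
        d = np.abs(x[None, :] - centers[:, None]) + 1e-9  # pixel-center distances
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=0)                              # renormalize memberships
    return centers, u

centers, memberships = fuzzy_cmeans(np.random.default_rng(1).random(5000))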

B.5 Mean Shift
Mean shift is also a clustering based segmentation method; it combines the spatial
coordinates (x, y, z) and grayscale values of the pixels [88]. Mean shift treats this four
dimensional input as samples from an empirical probability density function, so that
peaks of the density function correspond to true clusters or segments in the image.
These peaks are found iteratively using gradient ascent. The basic algorithm works
as follows:
a. Determine a window (often called the bandwidth) around each image pixel.
b. Calculate the data mean within the bandwidth.
c. Shift the window to the new mean and continue until convergence.

A major drawback of mean shift is that the number of clusters or segments cannot
be controlled directly. It is also computationally demanding.
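A small sketch using scikit-learn's MeanShift on the joint spatial-plus-intensity
feature vector is shown below; the bandwidth value is an arbitrary assumption, and
only a small crop is used because of the computational cost noted above.

# Mean shift on joint (x, y, z, intensity) features (assumes scikit-learn).
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)
crop = rng.random((8, 8, 8))                        # small crop: mean shift is slow
coords = np.indices(crop.shape).reshape(3, -1).T    # (x, y, z) per voxel
features = np.column_stack([coords, crop.ravel()])  # 4D feature vectors
labels = MeanShift(bandwidth=3.0).fit_predict(features)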

B.6 Multilayer Perceptron
Artificial Neural Networks (ANN) are machine learning techniques inspired by and
loosely based on biological neural networks. The Multilayer Perceptron (MLP) is a
type of supervised ANN that was developed to overcome the limitation of the linear
perceptron, which can only classify linearly separable data. An MLP consists of one
or more feed-forward hidden layers that rest between the input and output layers and
are used to model non-linearity in the feature space. Each layer in an MLP is fully
connected to the next one via weights: an input layer receives the external inputs,
hidden layers follow, and an output layer generates the classification results (Fig. B.2).
Except for the neurons in the input layer, every neuron is a computational element
with a summation processor and a nonlinear activation function. Rumelhart discovered
a way to train MLPs using backpropagation of error, in which the weights between
nodes are trained so that the error (the difference between output and target value)
decreases [89]. This is a gradient descent algorithm in which the weights are adjusted
to change the value of the error E in the direction of the negative gradient, seeking a
minimum on the error surface.

Figure B.2: Multilayer Perceptron
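For illustration, a minimal MLP trained with backpropagation can be sketched with
scikit-learn (an assumption; the thesis trains its own MLP on CBCT features):

# A small MLP trained with backpropagation / gradient descent
# (assumes scikit-learn; illustrative only, not the network of this thesis).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 4))                       # stand-in per-voxel features
y = (X.sum(axis=1) > 2.0).astype(int)           # stand-in labels
clf = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    solver="sgd", max_iter=500, random_state=0).fit(X, y)
predicted = clf.predict(X)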

B.7 Convolutional Neural Networks
Convolutional Neural Networks (CNN) belong to the group of feed-forward artificial
neural networks in which the connectivity pattern between layers is inspired by the
connectivity in the animal visual cortex. CNNs offer a hierarchical method of data
representation, as shown in Fig. B.3. For instance, in language processing applications,
features may be sorted into letters, words, phrases, sentences and finally a story.

Figure B.3: Features in deep learning in the context of images: image in (pixel
intensities etc.) → low level features (edges, motifs etc.) → mid-level features → high
level features (objects etc.) → trainable classifier.

The animal visual cortex – first explored by Hubel and Wiesel [90] – is a complex
processing system that extracts strong spatial correlations within images via a mosaic
cellular arrangement covering the entire visual field. Every cell in the visual cortex is
sensitive only to a small region of the visual field, often known as its receptive field.
Efforts to emulate the architecture of the animal visual cortex led to the introduction
of the convolutional neural network, which rests on the principles of local connectivity
and extensive weight sharing [19]. This allows handling of high-dimensional data such
as images and videos without the need for big networks (in number of nodes) and
long processing times.

B.7.1 Computational Blocks
Conventionally, CNNs are built from five fundamental blocks.

I. Convolution Block
This block takes a randomly initialized filter bank of arbitrary size and performs a
2D or 3D convolution operation on the input image. Simply stated, convolution means
applying a filter over an image at all possible offsets. The convolution of a 2D image
$f(x, y)$ with a filter $h$ of size $(2N + 1) \times (2M + 1)$ is given as:

$g(x, y) = (h * f)(x, y) = \sum_{n=-N}^{N} \sum_{m=-M}^{M} h(m, n)\, f(x - m, y - n)$ (11)

Intuitively, the filters are analogous to visual cells in the animal cortex. The size of
the filter corresponds to the receptive field of a visual cell. At any instant, the filter
is only wired to a sub-region of the input image equal to the size of the filter – local
connectivity. As the filter is swept over the image along the horizontal and vertical
directions during convolution, the dot product of the filter with its corresponding
receptive field on the input image is calculated. This operation highlights the presence
of a particular feature at any spatial location in the image. Each filter has the same
weights – weight sharing – and bias values at every offset, and its output is termed a
feature map. The number of filters corresponds to the number of features in the
convolution layer, and this value is chosen arbitrarily.
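A minimal sketch of Eq. (11) on a single feature map, assuming SciPy and an
arbitrary Sobel-like kernel, could look as follows.

# 2D convolution of an image with a small filter, as in Eq. (11).
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((32, 32))                          # stand-in input image
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])                 # 3x3 filter (edge detector)
feature_map = convolve2d(image, kernel, mode="same")  # filter at every offset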

II. Spatial Pooling Block
This block performs a non-linear down-sampling of the input image. It takes a window
size as input, sweeps the window across the image in both directions, and outputs the
maximum or mean value at every offset. Hence, for an input image of size (32x32) and
a pool window of size (2x2), the output is (16x16). This block has two main advantages.
First, it reduces the computational burden by down-sampling the input image; this
reduces the dimensionality of the input, which helps to avoid overfitting. Second, it
provides a degree of translational invariance in the network, adding robustness to
the solution.
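The (32x32) to (16x16) max pooling example above can be sketched in a few NumPy
lines:

# 2x2 max pooling of a 32x32 map down to 16x16 with NumPy.
import numpy as np

feature_map = np.random.default_rng(0).random((32, 32))
pooled = feature_map.reshape(16, 2, 16, 2).max(axis=(1, 3))  # max per 2x2 window
assert pooled.shape == (16, 16)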

III. Non-Linearity Block
This block is essential for stacking multiple layers in the network. Different
non-linearity functions, also termed activation functions, can be used. The most
common are the Rectified Linear Unit (ReLU) and the Sigmoid; the former sets all
negative values to 0 and the latter computes the sigmoid function:

$y = \max\{0, x\}$ (ReLU) (12)

$y = \sigma(x) = \frac{1}{1 + e^{-x}}$ (Sigmoid) (13)

IV. Fully Connected Block
Nodes in a fully connected block have full connections to every node in the previous
and next layers. This is similar to the MLP discussed in Appendix B.6. This block
captures position-dependent information and more global patterns.

V. Softmax Block
The softmax operator outputs posterior probabilities for a categorical label. These
probabilities lie in the range 0 to 1 and sum to 1. For $D$ classes, the softmax operator
is given as

$y_j = \frac{e^{x_j}}{\sum_{t=1}^{D} e^{x_t}}$ (14)
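A numerically stable NumPy sketch of Eq. (14):

# Numerically stable softmax over D class scores, as in Eq. (14).
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())   # subtract max to avoid overflow
    return z / z.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1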

Other important computational blocks include de-convolution, dropout, normalization,
log loss etc.; details can be found in MatConvNet [26].3

3 Technical manual available at http://www.vlfeat.org/matconvnet/matconvnet-manual.pdf
References

1. Feldkamp, L.A., L.C. Davis, and J.W. Kress, Practical cone-beam algorithm. Journal
of the Optical Society of America A, 1984. 1(6): p. 612-619.
2. Jaffray, D.A. and J.H. Siewerdsen, Cone-beam computed tomography with a flat-
panel imager: initial performance characterization. Med Phys, 2000. 27(6): p. 1311-
23.
3. Orth, R.C., M.J. Wallace, and M.D. Kuo, C-arm cone-beam CT: general principles
and technical considerations for use in interventional radiology. J Vasc Interv Radiol,
2008. 19(6): p. 814-20.
4. Maciunas, R.J., R.L. Galloway Jr, and J.W. Latimer, The application accuracy of
stereotactic frames. Neurosurgery, 1994. 35(4): p. 682-695.
5. Zhao, W., et al., A model-based scatter artifacts correction for cone beam CT.
Medical physics, 2016. 43(4): p. 1736-1753.
6. Zhao, W., et al. A patient-specific scatter artifacts correction method. in SPIE
Medical Imaging. 2014. International Society for Optics and Photonics.
7. Bai, Y., et al., Iterative CBCT Scatter Shading Correction Without Prior
Information. Medical Physics, 2016. 43(6): p. 3345-3345.
8. Wu, P., et al., Shading correction assisted iterative cone-beam CT reconstruction,
in The 4th International Conference on Image Formation in X-Ray Computed
Tomography. 2016: Bamberg, Germany. p. 391-394.
9. Wu, P., et al., Iterative CT shading correction with no prior information. Phys Med
Biol, 2015. 60(21): p. 8437-55.
10. Sharma, N. and L.M. Aggarwal, Automated medical image segmentation techniques.
J Med Phys, 2010. 35(1): p. 3-14.
11. Ciofolo, C. and C. Barillot, Atlas-based segmentation of 3D cerebral structures with
competitive level sets and fuzzy control. Medical Image Analysis, 2009. 13(3): p. 456-
470.
12. Ghadimi, S., et al., Segmentation of scalp and skull in neonatal MR images using
probabilistic atlas and level set method. Conf Proc IEEE Eng Med Biol Soc, 2008.
2008: p. 3060-3.
13. Ledig, C., et al., Robust whole-brain segmentation: Application to traumatic brain
injury. Medical Image Analysis. 21(1): p. 40-58.
14. Zhou, L., L. Wang, Q. Wang, and Y. Shi, eds., Machine Learning in Medical
Imaging: 6th International Workshop, MLMI 2015. 2015: Springer International Publishing.
15. Marsland, S., Machine Learning - An Algorithmic Perspective. Machine Learning and
Pattern Recognition Series, ed. C.a. Hall/CRC. 2009: Taylor and Francis Group.
16. Deng, L. and D. Yu, Deep Learning: Methods and Applications. 2014: Now
Publishers.
17. Brosch, T., et al., Deep Convolutional Encoder Networks for Multiple Sclerosis
Lesion Segmentation, in Medical Image Computing and Computer-Assisted
Intervention – MICCAI 2015: 18th International Conference, Munich, Germany,
October 5-9, 2015, Proceedings, Part III, N. Navab, et al., Editors. 2015, Springer
International Publishing: Cham. p. 3-11.
18. Liao, S., et al., Representation learning: a unified deep learning framework for
automatic prostate MR segmentation. Med Image Comput Comput Assist Interv,
2013. 16(Pt 2): p. 254-61.
19. Krizhevsky, A., I. Sutskever, and G.E. Hinton. Imagenet classification with deep
convolutional neural networks. in Advances in neural information processing systems.
2012.
20. Brosch, T., et al., Deep 3D Convolutional Encoder Networks with Shortcuts for
Multiscale Feature Integration Applied to Multiple Sclerosis Lesion Segmentation.
IEEE Trans Med Imaging, 2016.

21. Nagatani, Y., et al., Ultra-low-dose computed tomography system with a flat panel
detector: assessment of radiation dose reduction and spatial and low contrast
resolution. Radiat Med, 2008. 26(10): p. 627-35.
22. Ruhrnschopf, E.P. and K. Klingenbeck, A general framework and review of scatter
correction methods in x-ray cone-beam computerized tomography. Part 1: Scatter
compensation approaches. Med Phys, 2011. 38(7): p. 4296-311.
23. Huang, D.-Y. and C.-H. Wang, Optimal multi-level thresholding using a two-stage
Otsu optimization approach. Pattern Recognition Letters, 2009. 30(3): p. 275-284.
24. Kubat, M. and S. Matwin. Addressing the curse of imbalanced training sets: one-
sided selection. in ICML. 1997. Nashville, USA.
25. Canny, J., A Computational Approach to Edge Detection. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 1986. PAMI-8(6): p. 679-698.
26. Vedaldi, A. and K. Lenc, MatConvNet – Convolutional Neural Networks for
MATLAB. Proceedings of the ACM Int. Conf. on Multimedia. 2015.
27. Geman, S., E. Bienenstock, and R. Doursat, Neural networks and the bias/variance
dilemma. Neural computation, 1992. 4(1): p. 1-58.
28. Nickolls, J., et al., Scalable parallel programming with CUDA. Queue, 2008. 6(2): p.
40-53.
29. Cortes, C. and V. Vapnik, Support-vector networks. Machine learning, 1995. 20(3):
p. 273-297.
30. Viola, P. and M. Jones. Rapid object detection using a boosted cascade of simple
features. in Computer Vision and Pattern Recognition, 2001. CVPR 2001.
Proceedings of the 2001 IEEE Computer Society Conference on. 2001. IEEE.
31. Marquardt, D.W., An algorithm for least-squares estimation of nonlinear parameters.
Journal of the society for Industrial and Applied Mathematics, 1963. 11(2): p. 431-
441.
32. Ahmed, M.N., et al., A modified fuzzy c-means algorithm for bias field estimation
and segmentation of MRI data. IEEE Transactions on Medical Imaging, 2002. 21(3):
p. 193-199.
33. Aparajeeta, J., P.K. Nanda, and N. Das. Bias field estimation and segmentation of
MR image using modified fuzzy-C means algorithms. in Advances in Pattern
Recognition (ICAPR), 2015 Eighth International Conference on. 2015.
34. Bei, Y., et al. A fuzzy C-means based algorithm for bias field estimation and
segmentation of MR images. in Apperceiving Computing and Intelligence Analysis
(ICACIA), 2010 International Conference on. 2010.
35. Pham, D.L. and J.L. Prince, Adaptive fuzzy segmentation of magnetic resonance
images. IEEE Transactions on Medical Imaging, 1999. 18(9): p. 737-752.
36. Nock, R. and F. Nielsen, On weighting clustering. Pattern Analysis and Machine
Intelligence, IEEE Transactions on, 2006. 28(8): p. 1223-1235.
37. Otsu, N., A threshold selection method from gray-level histograms. IEEE
Transactions on Systems, Man, and Cybernetics, 1979. 9(1): p. 62-66.
38. Ashfaq, A. and J. Adler, A modified Fuzzy C-Means algorithm for Shading
Correction in Craniofacial CBCT Images [Preprint]. 2016, Elekta Instruments.
39. Snedecor, G.W. and W.G. Cochran, Statistical methods, 8thEdn. Ames: Iowa State
Univ. Press Iowa, 1989.
40. Brosch, T., et al. Deep convolutional encoder networks for multiple sclerosis lesion
segmentation. in International Conference on Medical Image Computing and
Computer-Assisted Intervention. 2015. Springer.
41. Jia, Y., et al. Caffe: Convolutional architecture for fast feature embedding. in
Proceedings of the ACM International Conference on Multimedia. 2014. ACM.
42. Chetlur, S., et al., cudnn: Efficient primitives for deep learning. arXiv preprint
arXiv:1410.0759, 2014.
43. Schulze, R., et al., Artefacts in CBCT: a review. Dentomaxillofac Radiol, 2011. 40(5):
p. 265-73.
44. Batenburg, K.J. and J. Sijbers, Adaptive thresholding of tomograms by projection
distance minimization. Pattern Recognition, 2009. 42(10): p. 2297-2305.

45. Batenburg, K.J. and J. Sijbers, Optimal threshold selection for tomogram
segmentation by projection distance minimization. IEEE Transactions on Medical
Imaging, 2009. 28(5): p. 676-686.
46. Indrajit, I., Digital imaging and communications in medicine: A basic review. Indian
Journal of Radiology and Imaging, 2007. 17(1): p. 5.
47. Mould, R.F., The early history of x-ray diagnosis with emphasis on the contributions
of physics 1895-1915. Phys Med Biol, 1995. 40(11): p. 1741-87.
48. Radon, J., On the Determination of Functions from Their Integral Values along
Certain Manifolds. IEEE Trans Med Imaging, 1986. 5(4): p. 170-6.
49. Peeters, F., B. Verbeeten, Jr., and H.W. Venema, [Nobel Prize for medicine and
physiology 1979 for A.M. Cormack and G.N. Hounsfield]. Ned Tijdschr Geneeskd,
1979. 123(51): p. 2192-3.
50. Ginat, D.T. and R. Gupta, Advances in computed tomography imaging technology.
Annu Rev Biomed Eng, 2014. 16: p. 431-53.
51. Buzug, T.M., Computed Tomography. 2009: Springer.
52. Willemink, M.J., et al., Iterative reconstruction techniques for computed tomography
part 2: initial results in dose reduction and image quality. European radiology, 2013.
23(6): p. 1632-1642.
53. Willemink, M.J., et al., Iterative reconstruction techniques for computed tomography
Part 1: technical principles. European radiology, 2013. 23(6): p. 1623-1631.
54. Jaffray, D.A., et al., Flat-panel cone-beam computed tomography for image-guided
radiation therapy. Int J Radiat Oncol Biol Phys, 2002. 53(5): p. 1337-49.
55. Instruments, E., Design and performance characteristics of a Cone Beam CT system
for Leksell Gamma Knife® Icon™, in White Paper. 2015.
56. Scarfe, W.C. and A.G. Farman, What is Cone-Beam CT and How Does it Work?
Dental Clinics of North America, 2008. 52(4): p. 707-730.
57. Leksell, L., The stereotaxic method and radiosurgery of the brain. Acta Chir Scand,
1951. 102(4): p. 316-9.
58. Minniti, G., et al., Fractionated stereotactic radiosurgery for patients with brain
metastases. J Neurooncol, 2014. 117(2): p. 295-301.
59. Esmaeili, F., et al., Beam Hardening Artifacts: Comparison between Two Cone Beam
Computed Tomography Scanners. Journal of Dental Research, Dental Clinics, Dental
Prospects, 2012. 6(2): p. 49-53.
60. Hatcher, D.C., Operational principles for cone-beam computed tomography. J Am
Dent Assoc, 2010. 141 Suppl 3: p. 3s-6s.
61. Neitzel, U., Grids or air gaps for scatter reduction in digital radiography: A model
calculation. Medical Physics, 1992. 19(2): p. 475-481.
62. Wiegert, J., et al. Performance of standard fluoroscopy antiscatter grids in flat-
detector-based cone-beam CT. 2004.
63. Mail, N., et al., The influence of bowtie filtration on cone-beam CT image quality.
Med Phys, 2009. 36(1): p. 22-32.
64. Altunbas, C., et al., Reduction of ring artifacts in CBCT: Detection and correction of
pixel gain variations in flat panel detectors. Medical Physics, 2014. 41(9): p. 091913.
65. Marchant, T.E., et al., Shading correction algorithm for improvement of cone-beam
CT images in radiotherapy. Phys Med Biol, 2008. 53(20): p. 5719-33.
66. Stephen, B., et al., Prior image constrained scatter correction in cone-beam
computed tomography image-guided radiation therapy. Physics in Medicine and
Biology, 2011. 56(4): p. 1015.
67. Niu, T., A. Al-Basheer, and L. Zhu, Quantitative cone-beam CT imaging in radiation
therapy using planning CT as a prior: first patient studies. Med Phys, 2012. 39(4): p.
1991-2000.
68. Niu, T., et al., Shading correction for on-board cone-beam CT in radiation therapy
using planning MDCT images. Med Phys, 2010. 37(10): p. 5395-406.
69. Schmidt, M.A. and G.S. Payne, Radiotherapy planning using MRI. Physics in
medicine and biology, 2015. 60(22): p. R323.

70. Leahy, R.M. and J. Qi, Statistical approaches in quantitative positron emission
tomography. Statistics and Computing, 2000. 10(2): p. 147-165.
71. Iriarte, A., et al., System models for PET statistical iterative reconstruction: A
review. Comput Med Imaging Graph, 2016. 48: p. 30-48.
72. Kahn, J., et al., Computed tomography in trauma patients using iterative
reconstruction: reducing radiation exposure without loss of image quality. Acta
Radiol, 2016. 57(3): p. 362-9.
73. Ono, S., et al., Improved image quality of helical computed tomography of the head
in children by iterative reconstruction. J Neuroradiol, 2016. 43(1): p. 31-6.
74. Yuan, X., et al., A practical cone-beam CT scatter correction method with optimized
Monte Carlo simulations for image-guided radiation therapy. Physics in Medicine
and Biology, 2015. 60(9): p. 3567.
75. Wei, Z., et al., Patient-specific scatter correction for flat-panel detector-based cone-
beam CT imaging. Physics in Medicine and Biology, 2015. 60(3): p. 1339.
76. Gonzalez, R.C. and R.E. Woods, Digital Image Processing. 3rd ed. 2008: Pearson
Education International.
77. Dogdas, B., D.W. Shattuck, and R.M. Leahy, Segmentation of skull and scalp in 3-D
human MRI using mathematical morphology. Hum Brain Mapp, 2005. 26(4): p. 273-
85.
78. Hristidis, V., "Information discovery on electronic health records" Chapter 10:
Medical Image Segmentation. 2009: CRC Press.
79. Montillo, A., et al., Entangled Decision Forests and Their Application for Semantic
Segmentation of CT Images, in Information Processing in Medical Imaging: 22nd
International Conference, IPMI 2011, Kloster Irsee, Germany, July 3-8, 2011.
Proceedings, G. Székely and H.K. Hahn, Editors. 2011, Springer Berlin Heidelberg:
Berlin, Heidelberg. p. 184-196.
80. Visalakshi, S. and V. Radha. A literature review of feature selection techniques and
applications: Review of feature selection in data mining. in Computational
Intelligence and Computing Research (ICCIC), 2014 IEEE International Conference
on. 2014.
81. Mallat, S.G., A theory for multiresolution signal decomposition: the wavelet
representation. IEEE Transactions on Pattern Analysis and Machine Intelligence,
1989. 11(7): p. 674-693.
82. Dalal, N. and B. Triggs. Histograms of oriented gradients for human detection. in
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer
Society Conference on. 2005.
83. Blinchikoff, H. and H. Krause, Filtering in the time and frequency domains. 2001:
The Institution of Engineering and Technology.
84. Buades, A., B. Coll, and J.M. Morel. A non-local algorithm for image denoising. in
2005 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR'05). 2005.
85. Dacke, F., Non-local means denoising of projection images in cone beam computed
tomography, in KTH, Mathematical Statistics. 2013, KTH: Stockholm. p. 80.
86. Liao, P.-S., T.-S. Chen, and P.-C. Chung, A fast algorithm for multilevel
thresholding. J. Inf. Sci. Eng., 2001. 17(5): p. 713-727.
87. Hartigan, J.A. and M.A. Wong, Algorithm AS 136: A K-Means Clustering
Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics),
1979. 28(1): p. 100-108.
88. Yizong, C., Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 1995. 17(8): p. 790-799.
89. Rumelhart, D.E., G.E. Hinton, and R.J. Williams, Learning representations by back-
propagating errors. Cognitive modeling, 1988. 5(3): p. 1.
90. Hubel, D.H. and T.N. Wiesel, Receptive fields and functional architecture of monkey
striate cortex. The Journal of Physiology, 1968. 195(1): p. 215-243.
