
H06W8a: Medical image analysis
Deep learning for medical image computing

Class 10: Deep learning


Part 1: Principles
Prof. Frederik Maes
frederik.maes@esat.kuleuven.be

Medical Image Computing: Fundamental problems

• Image segmentation:
– Detection and delineation of objects of interest in the images
– Prerequisite for morphometry and regional quantification

• Image registration:
– Spatial alignment of different images (different modalities / time points / subjects…)
– Prerequisite for fusion and joint analysis of complementary image information

• Image visualization:
– Presentation of the relevant information that was extracted from the images
– Prerequisite for clinical interpretation

[Figures: clinical examples, including fMRI, normal vs stroke perfusion, and Huntington cases]

Challenges

• Complex data
– multi-dimensional, multi-temporal, multi-modal, multi-subject
– limited image quality: resolution, contrast, noise, artifacts → ambiguity
• Complex objects
– 3D anatomical shapes
– normal biological variability
– abnormalities (pathology…)
• Complex applications
– continuous technological advances in medical imaging
– increasing clinical requirements
• Complex validation
– lack of objective ground truth in clinical images
– observer variability

Model-based image analysis

• Incorporate prior knowledge about the appearance of the objects of interest in the images
– photometric properties (intensity, contrast, texture…)
– geometric properties (position, shape, deformations…)
– context (other objects, clinical information…)
• Model = mathematical representation of prior knowledge
– must be flexible (= parameterized) to account for variability
– can itself be represented as an image (e.g. an atlas)
• Image analysis problem formulated as an optimization problem
– Find the model parameters that “best fit” the image data
– Suitable criterion to measure “goodness of fit” (= objective function)
– Suitable computational strategy to find the optimal solution (= optimization method)

Maximum A Posteriori Probability (MAP) formulation

M(Θ) = model with parameters Θ

Bayes’ rule: P(a,b) = P(a)·P(b|a) = P(b)·P(a|b), hence

Prob(M(Θ) | I) = Prob(I | M(Θ)) · Prob(M(Θ)) / Prob(I)

Prob(M(Θ)) = prior probability of the model with parameters Θ
Prob(I | M(Θ)) = data likelihood for the specified model parameters
Prob(I) = prior probability of observing image I (independent of Θ)
Prob(M(Θ) | I) = posterior probability of the model with parameters Θ for the observed image I

If Prob(M(Θ)) = constant → maximum likelihood

Approach 1: energy minimization

Gibbs distribution: Prob(M(Θ) | I) = (1/Z)·exp(−E(Θ))
E = energy function (to be defined…)
Z = normalization constant

Taking the negative logarithm of the posterior turns MAP estimation into minimization of the energy

E(Θ) = Eext(I | Θ) + γ·Eint(Θ)

Eint = internal energy → measures fidelity to the prior
Eext = external energy → measures conformity to the data
γ = user-specified weight (hyper-parameter) → tunable behavior…

Energy minimization problem
→ Flexibility to define the energy terms, based on heuristics, physics, statistics…

Example 1: statistical shape models

Model building: M = contour with N landmarks; Θ = landmark coordinates
Model fitting: Eint(Θ) = shape model; Eext(I | Θ) = intensity model

Example 2: atlas-based segmentation

Model building: M = atlas; Θ = deformation field
Model fitting: Eint(Θ) = regularization; Eext(I | Θ) = similarity measure

Approach 2: Classification

MAP formulation: estimate Prob(M(Θ) | I) directly from training data (supervised classification)

F = classifier / regressor
Maps a given input image I onto the most likely model instance Θ* based on previous training data

Instead of the full data I: feature vector f (= dimensionality reduction)

Conventional machine learning: handcrafted features

Data → handcrafted feature vector (based on domain knowledge) → classifier → output
Typical classifiers: SVM, random forest, neural network

Deep learning: automated optimal features

Data → features + classifier → output
The handcrafted feature vector and the domain knowledge behind it are replaced by a deep neural network with automated discovery of optimal features.

A single neural node

a = f(z), with z = Σi wi·xi (by convention x0 = 1, so w0 acts as a bias)
= linear weighted sum, followed by a non-linear activation function f
Example: the sigmoid, which rises from 0 to 1 with f(0) = 0.5, so z = 0 → a = 0.5

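As an illustration, a minimal NumPy sketch of this single node; the input and weight values are made up for the example:

import numpy as np

def sigmoid(z):
    # Sigmoid activation: maps any real z to (0, 1), with f(0) = 0.5
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w):
    # Single node: linear weighted sum z = sum_i w_i * x_i,
    # followed by the non-linear activation a = f(z).
    # By convention x[0] = 1, so w[0] acts as a bias term.
    z = np.dot(w, x)
    return sigmoid(z)

x = np.array([1.0, 0.8, -0.3])   # x0 = 1 plus two input features
w = np.array([0.1, 1.5, 2.0])    # illustrative weights
print(neuron(x, w))              # a > 0.5 exactly when z > 0
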
Rationale

z = Σi wi·xi defines a linear decision boundary in the feature space (x1, x2):
z = 0 → a = 0.5 (on the boundary), z > 0 → a → 1, z < 0 → a → 0

With non-optimal values for w, the boundary separates Class 1 and Class 2 poorly.
With optimal values for w, the decision boundary z = 0 separates the two classes, and a = Prob(Class = 1).

Rationale

For a new sample *, with optimal values for w: a(*) > 0.5 → Class 1

A simple neural network

Output of layer l = input of layer (l+1)
Weight w associated with each connection
For classification: output = binary class labels (1 if Class 1, 0 otherwise; 1 if Class 2, 0 otherwise)
For regression: output = real value
Stacking layers yields curved decision boundaries in feature space (x1, x2); a sketch of the forward pass follows below.

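A minimal sketch of such a network as a sequence of layers, each computing a weighted sum followed by a sigmoid activation; the layer sizes and random weights are illustrative (in practice the weights are learned from data):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # Output of layer l = input of layer (l+1):
    # each layer applies its weight matrix and bias, then the activation.
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 2)), np.zeros(3)),   # 2 inputs -> 3 hidden nodes
          (rng.normal(size=(1, 3)), np.zeros(1))]   # 3 hidden nodes -> 1 output
print(forward(np.array([0.5, -1.0]), layers))       # a in (0, 1): Prob(Class = 1)
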
Supervised training

Classifier based on features f, with internal parameters w
For each training sample i, a prediction is made with the current parameters w
Find the classifier parameters w that yield optimal classification performance over the training set w.r.t. some loss function L
= complex non-linear optimization problem

Popular loss functions:
- Cross-entropy (for classification)
- Mean squared error (for regression)
- Dice similarity (for segmentation)

Gradient descent

= iterative search procedure: walk downhill on the loss L by taking small steps along the direction of steepest descent

w(k+1) = w(k) − α·(dL/dw)(k)

α = step size = learning rate
Too large: no convergence
Too small: local optimum, slow progress towards the optimum wopt

Updates determined for each weight layer-by-layer (from output to input) = back-propagation

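A minimal sketch of the update rule on a toy one-dimensional loss L(w) = (w − 3)², with gradient dL/dw = 2(w − 3); the learning rate value is illustrative:

def gradient_descent(grad_L, w0, alpha=0.1, n_steps=100):
    # Iterative search: w(k+1) = w(k) - alpha * dL/dw, evaluated at w(k)
    w = w0
    for _ in range(n_steps):
        w = w - alpha * grad_L(w)
    return w

w_opt = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_opt)  # close to the optimum w = 3; too large a step diverges, too small converges slowly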

Deep neural networks

Input: 2000x2000 pixels = vector with 4 million values
Output: N classes, e.g.
1 if Normal, 0 otherwise
1 if Abnormal type A, 0 otherwise
1 if Abnormal type B, 0 otherwise

Huge amount of parameters if fully connected
Limited amount of training samples
Complex training
Poor generalization for classification of new test images

Convolutional neural networks

Share weights between different regions → fewer parameters
Convolution = linear filtering operator → feature maps
Pooling → position and scale invariance
Cascaded → increasing level of abstraction
Learn optimal features during training (back-propagation)
Fully connected layers for final classification using the learned features

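A rough back-of-the-envelope comparison for the 2000x2000 input above; the hidden-layer size is an assumed, illustrative number:

# Fully connected: every input pixel connected to every node of the next layer.
n_in = 2000 * 2000                 # 4 million input values
n_hidden = 1000                    # assumed size of a single hidden layer
print(n_in * n_hidden + n_hidden)  # ~4 billion parameters for one layer alone

# Convolutional: one 3x3 kernel shared across all positions in the image.
print(3 * 3 + 1)                   # 10 parameters per feature map, independent of image size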

Convolution operator

Example: 3x3 filter W applied to image I yields filtered image J (= feature map):

j(r, c) = w0·i0 + w1·i1 + … + w8·i8

where i0 = i(r, c) and i1…i8 are its 8 neighbors in the 3x3 window centered on (r, c), each multiplied by the corresponding filter weight wk.

Convolutional filters in image processing

Smoothing (weighted average), e.g. a 5x5 Gaussian kernel:

0.0232 0.0338 0.0383 0.0338 0.0232
0.0338 0.0492 0.0558 0.0492 0.0338
0.0383 0.0558 0.0632 0.0558 0.0383
0.0338 0.0492 0.0558 0.0492 0.0338
0.0232 0.0338 0.0383 0.0338 0.0232

Edge detection, e.g. vertical difference and horizontal difference kernels:

-1 -2 -1        -1  0  1
 0  0  0        -2  0  2
 1  2  1        -1  0  1

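A minimal sketch of this operator for a 3x3 filter, using the horizontal-difference kernel above. (As in most CNN frameworks, the code computes cross-correlation; true convolution would additionally flip the kernel.)

import numpy as np

def conv2d(image, w):
    # "Valid" 2D filtering: j(r, c) = weighted sum of the 3x3 neighborhood of each pixel
    H, W = image.shape
    out = np.zeros((H - 2, W - 2))
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            out[r - 1, c - 1] = np.sum(image[r - 1:r + 2, c - 1:c + 2] * w)
    return out

w = np.array([[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]])            # horizontal-difference (edge detection) kernel
image = np.random.rand(8, 8)
print(conv2d(image, w).shape)         # (6, 6): without padding the feature map shrinks
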
Stride and padding

Stride = step with which the filter window is shifted over the image
Padding = extending the image border (e.g. with zeros) so the filtered image keeps the same size

Convolution layer

Shorthand notation: 3x3 kernels, 3 input channels, 2 output feature maps
Number of parameters: 3x3x3x2 + 2 = 56
If fully connected: 108x32 + 32 = 3488

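The parameter counts on the slide can be checked directly; interpreting the 108 fully connected inputs as a 6x6x3 patch is an assumption:

# Convolution layer: 3x3 kernels over 3 input channels, 2 output feature maps
k, c_in, c_out = 3, 3, 2
print(k * k * c_in * c_out + c_out)   # 56: weights plus one bias per feature map

# Fully connected layer: 108 inputs (assumed 6x6x3) to 32 output units
n_in, n_out = 6 * 6 * 3, 32
print(n_in * n_out + n_out)           # 3488: one weight per connection plus biases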

Alternative activation functions

sigmoid: a = 1 / (1 + e^(−z)), ranging from 0 to 1
tanh: a = tanh(z), ranging from −1 to 1
Rectified Linear Unit (RELU): a = max(0, z) ← MOST POPULAR
Leaky RELU (LRELU): a = z for z ≥ 0, a small non-zero slope times z for z < 0

Pooling layer

[Figure: pooling operation reducing each block of a feature map to a single value]

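Minimal sketches of the two rectified activations plus a 2x2 max-pooling operation; the leaky slope value is illustrative:

import numpy as np

def relu(z):
    # Rectified Linear Unit: a = max(0, z)
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    # Leaky RELU: identity for z >= 0, small non-zero slope for z < 0,
    # so the gradient never vanishes completely
    return np.where(z >= 0.0, z, slope * z)

def max_pool_2x2(x):
    # Keep the strongest response in each 2x2 block:
    # some invariance to small positional shifts, at half the resolution
    H, W = x.shape
    return x[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))                                       # [0. 0. 0. 1.5]
print(leaky_relu(z))                                 # [-0.02 -0.005 0. 1.5]
print(max_pool_2x2(np.arange(16.0).reshape(4, 4)))   # [[ 5. 7.] [13. 15.]]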

Feature maps

Consecutive layers = consecutive levels of abstraction

Training of deep CNNs: plenty of heuristics

• Large amount of parameters, limited training samples
→ Data augmentation (image translation, rotation, mirroring, deformations)
→ Normalization
→ Initialization (pre-training, transfer learning)
• Local minima
→ Vanishing gradient problem: gradients get smaller from end to front (→ RELU)
→ Batch gradient descent: less erratic updates
→ Momentum: keep track of the previous downhill direction (e.g. Adam optimizer)
→ Variable learning rate
→ Drop out
• Overfitting
→ Training set & validation set
→ Early stopping
→ Regularization

Drop out

= randomly deactivate a fraction of the nodes at each training iteration, so the network cannot rely on any single node (a sketch follows below)

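A minimal sketch of one common variant, inverted dropout, in which the surviving activations are rescaled so their expected value is unchanged; the drop probability is illustrative:

import numpy as np

def dropout(a, p_drop=0.5, training=True):
    # During training, zero each activation with probability p_drop and
    # rescale the survivors by 1 / (1 - p_drop); at test time, do nothing.
    if not training:
        return a
    mask = (np.random.rand(*a.shape) >= p_drop) / (1.0 - p_drop)
    return a * mask

a = np.ones((2, 4))
print(dropout(a))  # roughly half the entries zeroed, the rest scaled to 2.0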

Training & validation set

N training samples, split into:
TRAINING SET (60%): used to train the network parameters
VALIDATION SET (20%): used during training to assess overfitting
TEST SET (20%): new data examples, not used for training

Batch gradient descent

Stochastic gradient descent: parameters adjusted after every training sample
Batch gradient descent: parameters adjusted after every pass over all training samples (once after each epoch)
Mini-batch gradient descent: parameters adjusted after every m training samples (several times per epoch)

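A minimal sketch covering all three variants with one loop: m = 1 gives stochastic gradient descent, m = N gives (full) batch gradient descent, anything in between gives mini-batch gradient descent. The update function is a hypothetical placeholder for one gradient-descent step:

import numpy as np

def minibatch_epochs(X, y, update_fn, m=32, n_epochs=10, seed=0):
    # One parameter update per mini-batch of m samples, several updates per epoch
    rng = np.random.default_rng(seed)
    N = len(X)
    for _ in range(n_epochs):
        order = rng.permutation(N)        # reshuffle the training set each epoch
        for start in range(0, N, m):
            idx = order[start:start + m]
            update_fn(X[idx], y[idx])     # e.g. one gradient-descent step on this batch

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
minibatch_epochs(X, y, update_fn=lambda xb, yb: None, m=32, n_epochs=2)  # dummy update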

Early stopping

= stop training when the loss on the validation set stops improving, even though the training loss may still be decreasing (a sketch follows below)

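A minimal sketch of early stopping with a patience parameter (one common formulation; train_step and val_loss are hypothetical placeholders for one training pass and the validation-loss evaluation):

def train_with_early_stopping(train_step, val_loss, max_epochs=200, patience=10):
    # Stop when the validation loss has not improved for `patience` epochs:
    # the network is then most likely starting to overfit the training set.
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()                         # one pass over the training set
        loss = val_loss()                    # loss on the held-out validation set
        if loss < best:
            best, best_epoch = loss, epoch   # in practice, also save these weights
        elif epoch - best_epoch >= patience:
            break                            # no improvement for `patience` epochs
    return best

losses = iter([1.0, 0.8, 0.7, 0.75, 0.9, 1.1, 1.3, 1.4])
print(train_with_early_stopping(lambda: None, lambda: next(losses), patience=3))  # 0.7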

Example: LeNet-5 for written digit recognition (1998)

Input dimensions: 32 x 32 = 1024
Output dimensions: 10
Number of parameters:
– Convolution layers: ~2500
– Fully connected layers: ~59100
– Total: ~62000

Example: AlexNet for object recognition (2012)

Input dimensions: ~150 K
Output dimensions: 1000
Number of parameters:
– Convolution layers: ~2.3 M
– Fully connected layers: ~58.6 M
– Total: ~61 M

CNN variants

Residual Neural Network (skip connections)

GAN (Generative Adversarial Networks)

GAN: examples
– AI-generated portrait auctioned at Christie’s, 2018
– StarGAN, 2018
– Synthetic CT, 2018

The DL zoo

RNN, LSTM, GAN, U-Net, SqueezeNet, DeepMedic, …
(U-net, Ronneberger et al.; DeepMedic, Kamnitsas et al.)

Deep learning for medical image computing

Part 2: Applications

General principles of deep learning

Step 1: Gather data which is relevant for your learning problem
Step 2: Develop your neural network with basic parameters
Step 3: Train your neural network using an optimization method and a cost function
Step 4: Tune the hyper-parameters of your neural network to secure generalization
Step 5: Use the predicted parameters from your model to predict outcomes on new data

Deep Learning for Computer Vision

• Pictures are everywhere
• Everyone is an expert
• Performance is (usually) not critical
• Underlying hypotheses are irrelevant
→ Strive for fully automated solutions

Deep Learning for Medical Imaging

Applications: reconstruction & QC, quantification, treatment planning, diagnosis, outcome prediction
(© ADNI; Lao et al., Scientific Reports, 2017)

• Image access is limited
• True expertise is scarce
• Performance is (usually) critical
• Image findings require clinically relevant interpretation
→ Essential to keep the expert in the loop

Deep Learning for Image Segmentation

Ubiquitous in medical image computing
Requires domain-specific knowledge
Until recently: variety of model-based strategies (cfr. energy minimization)
Now: unified approach using Deep Learning (classification)

CNNs for image segmentation: U-Net
(U-net, Ronneberger et al.)

CNNs for image segmentation: DeepMedic
(DeepMedic, Kamnitsas et al.)

CNNs for image segmentation: our own DeepVoxNet

Input: patch-based approach, convolutional neural network
Architecture based on DeepMedic
Loss function (cross-entropy, Dice)
Image transformations
Class-weighted sample generator
Sample transformations

Dice Similarity Coefficient (DSC)

DSC = V(1&2) / ((V(1) + V(2)) / 2)

where V(1) and V(2) are the volumes of the two segmentations and V(1&2) the volume of their overlap.

50% over-segmentation: DSC = 80%
20% over-segmentation: DSC = 90%
The same x mm of over-segmentation can yield DSC = 82% or DSC = 90%, depending on the size of the segmented object.

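The DSC is straightforward to compute for binary masks; the worked example reproduces the slide’s 50% over-segmentation case:

import numpy as np

def dice(seg, gt):
    # DSC = V(1&2) / ((V(1) + V(2)) / 2) = 2 |A n B| / (|A| + |B|)
    intersection = np.logical_and(seg, gt).sum()
    return 2.0 * intersection / (seg.sum() + gt.sum())

gt = np.zeros(100, dtype=bool); gt[:40] = True     # ground truth: 40 voxels
seg = np.zeros(100, dtype=bool); seg[:60] = True   # 50% over-segmentation covering GT
print(dice(seg, gt))                               # 2*40 / (60 + 40) = 0.80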

Example 1: Delineation of left ventricle in cardiac MRI

• Unet-based architecture
– Input = stack of 11 T1-weighted images (mid-cavity SA, 320 x 320, pre-aligned)
– Output = segmentation of myocardium
• Training: 146 patients x 5 slices
– Loss function = Dice coefficient
– No data augmentation
– Trained from scratch (~30’)
• Test: 50 patients x 5 slices

Results (expert vs AI tool):
– Mean DSC [%]: 91.8
– MAD_Endo [mm]: 0.94
– MAD_Epi [mm]: 1.36
– HD_Endo [mm]: 2.54
– HD_Epi [mm]: 3.40
– Best DSC = .97, Median DSC = .93, Worst DSC = .40

Example 2: Delineation of left ventricle in cardiac US

• Unet-based architecture
– Input = 2CH US, ED or ES (672 x 672, point of cone aligned)
– Output = delineation of LV epi/endo + LA
• Training: 450 patients x 2 phases
– Loss function = mean DSC
– No data augmentation
– Trained from scratch
• Test: 50 patients x 2 phases
• From concept to state-of-the-art result: < 2 weeks…

Results, DSC (%) mean +/− std:
– LV: 91.1 +/− 4.9
– Myocard: 82.2 +/− 7.7
– LA: 84.7 +/− 15.4
[Figure: max, median and min cases, e.g. DSC = .80]

Example 3: Brain tissue segmentation (2D CNN)
Moeskops et al., IEEE Trans Medical Imaging, 2016

Prenatal, young adults, aging subjects
Modalities: T1, T2
Very limited training set (5-10 images)
2D patches at multiple scales
No post-processing
(ChildBrain 2019)

Example 4: Brain tissue segmentation (3D CNN)
Wachinger et al., DeepNAT, NeuroImage 2018

25 structures
T1-weighted, 1 mm3
13x13x13 patches
2.7 million parameters
Use coordinates for local context
Preprocessing: BFC, brain mask
30 images (20 training, 10 test)
Performance: DSC 80-90%

Example 5: Detection and segmentation of brain pathology

Tumor, MS, Stroke: unified approach using DL! (© BRATS, ISLES)

Input: multimodal MRI (FLAIR, T1, T1 CE, T2); output compared to ground truth (GT)
Binary classification: whole tumor
Multiclass classification: whole tumor, tumor core, enhancing tumor
[Figures: series of cases showing GT vs prediction]

Example 6: Assessment of tissue viability in acute stroke

Input: initial CT perfusion scan (frames at 0 s, 5 s, 10 s, 15 s, 20 s, 25 s)
Conventional approach: perfusion model → Tmax, ischemic core (excluding penumbra)
Alternative approach: data-driven prediction based on MRI → predicted Tmax vs ground truth

Deep Learning for RT planning

Workflow: CT acquisition → contouring → RT plan optimization → final dose calculation
– Contouring: manual or semi-automatic (target volumes, organs at risk); 45 min - 2 hours; inter- and intra-observer variability
– Plan optimization: manual interventions; hours; inter-institutional variability
– Result: one patient-specific treatment plan

Delineation of Organs at Risk (OAR)

Input: planning CT
H&N: 16 OAR, clinical guidelines (Brouwer et al, 2016):
Brainstem, Left Cochlea, Right Cochlea, Upper Esophagus, Glottic Larynx, Mandible, Oral Cavity, Supraglottic Larynx, Left Parotid gland, Right Parotid gland, Inferior PCM, Mid PCM, Superior PCM, Left Submandibular gland, Right Submandibular gland, Spinal Cord

Semi-automated delineation approaches have been developed (e.g. atlas-based), but are complicated to use
In clinical practice: manual delineation…

DeepVoxNet 3D CNN for H&N OAR segmentation

DeepVoxNet with 16 outputs = probability maps → post-processing
Loss function: mean Dice per structure
~70 training images: planning CT + delineations exported from clinical TPS
Trained from scratch (randomly initialized weights)
Adam optimizer with drop out

Initial results
[Figure: expert vs AI tool contours]

Integration in the clinical RT workflow in UZ Leuven
(Radiation Oncology & Medical Imaging Research Center)

Planning CT (DICOM) → auto-delineation by DeepVoxNet on a GPU server hosted in the UZ Leuven datacenter → DICOM RT-Struct → TPS → validation → approved contours
Clinical feedback feeds back into retraining.

Retrospective validation: AI tool mimicking the expert

Expert (45’) vs AI tool (5’)
Retrospective = ‘never perfect…’: observer variability… ‘ground truth’?

DSC (%) on 10 test cases:
Brainstem: 84.7
Left Cochlea: 70.1
Right Cochlea: 80.1
Upper Esophagus: 66.7
Glottic Larynx: 53.3
Mandible: 84.9
Oral Cavity: 67.5
Supraglottic Larynx: 54.3
Left Parotid gland: 80.8
Right Parotid gland: 79.8
Inferior PCM: 48.0
Mid PCM: 55.5
Superior PCM: 36.5
Left Submandibular gland: 69.6
Right Submandibular gland: 71.9
Spinal Cord: 81.9

Prospective validation: expert correcting the AI tool

AI tool (5’) + corrections (15’); 15 consecutive H&N RT patients, clinical planning CT
Prospective = ‘good enough!’: the observer focuses on clear errors, irrelevant variability is ignored
Direct clinical feedback: mouth open/closed, missing organs

DSC (%), retrospective (10 cases) vs prospective (20 cases):
Brainstem: 84.7 / 91.5
Left Cochlea: 70.1 / 75.4
Right Cochlea: 80.1 / 73.1
Upper Esophagus: 66.7 / 34.8
Glottic Larynx: 53.3 / 39.4
Mandible: 84.9 / 95.9
Oral Cavity: 67.5 / 83.5
Supraglottic Larynx: 54.3 / 71.2
Left Parotid gland: 80.8 / 86.3
Right Parotid gland: 79.8 / 89.7
Inferior PCM: 48.0 / 57.9
Mid PCM: 55.5 / 60.9
Superior PCM: 36.5 / 46.1
Left Submandibular gland: 69.6 / 78.8
Right Submandibular gland: 71.9 / 87.7
Spinal Cord: 81.9 / 95.9

Clinical benefits of auto-delineation using DL?

Experimental automated workflow: automated delineations (16 OAR) → corrected delineations by Expert 1 and Expert 2
Conventional manual workflow: manual delineations by Expert 1 and Expert 2

Auto-delineation performance: efficiency

Delineation time = time spent by the expert to delineate or to correct all 16 OAR structures in one patient
Average manual delineation time = 36 min
Average corrected delineation time = 22 min
→ Reduction of 38%

Auto-delineation performance: observer variability

[Figure: manual vs corrected delineations for Expert 1, Expert 2 and the automated tool]

Auto-delineation performance: robustness

[Figure: manual vs corrected delineations for Expert 1, Expert 2 and the automated tool]

Beyond delineation in Radiology: computer-aided diagnosis

Cardiac MRI (Dept. Cardiology, KU Leuven): healthy or infarcted?
Pipeline: delineation (epicardium & endocardium, at end-diastole and end-systole) → imaging biomarkers (features) → statistical modeling → feature selection → classification
Deep learning today: the delineation step
Conventional machine learning: feature selection and classification
Deep learning in the future: the whole pipeline?

Beyond delineation in RT: treatment planning

Workflow: CT acquisition → contouring → RT plan optimization → final dose calculation
– Contouring: manual or semi-automatic; 45 min - 2 hours; inter- and intra-observer variability
– Plan optimization: manual interventions; hours; inter-institutional variability
– Result: one patient-specific treatment plan

Deep learning for optimal dose prediction

1) Contours derived from CT using deep learning (= voxel-wise classification)
2) Optimal dose derived from the contours (& CT) using deep learning (= voxel-wise regression)

Pipeline: images → (DL 1) → contours → (DL 2) → dose → dose mimicking in the TPS (plan, DVH constraints, treatment setup) → actual dose

Sequential → sub-optimal use of the correlation between CT / contours / dose

Multi-task learning: contouring + dose prediction

Contours and optimal dose estimated jointly from CT using deep learning
Simultaneous, largely based on the same image-derived features → exploits the correlations

Deep learning for treatment adaptation?

Pre-treatment planning CT vs on-board CBCT of the day
Option 1: Adapt
Option 2: Replan

Conclusion

• Deep learning is based on deep convolutional neural networks
• A deep neural network is a NN with many hidden layers
• A deep NN aims at achieving a complex mapping of a high-dimensional input (an image) onto a user-specified output (classification, regression)
• Training involves optimizing the (very many) internal parameters of the NN (using back-propagation) by presenting samples of input/output pairs until convergence (assessed by a suitable loss function)
• CNNs are conceptually simple, but due to the huge number of parameters, many heuristics are needed to get them to work
• CNNs are computationally complex and best implemented on GPUs
• CNNs were invented in the 1990s, but broke through 20 years later in computer vision due to cheaper GPUs, large image databases on the internet and more clever training schemes
• DL using CNNs is a revolution in computer vision, as it works with rich image patterns directly instead of poor feature-based representations derived from them
• Medical imaging applications are more complex, as data is scarce and specific expertise is needed
• Properly structured image annotations are key for deep learning
• The current AI hype is all about DL. The intelligence is in the combination of {data, annotations, domain knowledge}, not in the algorithm itself.
