Nackaerts 2014
Beam CT Datasets
Olivia Nackaerts, MSc, PhD;* Maarten Depypere, MSc, PhD;† Guozhi Zhang, MSc, PhD;‡
Bart Vandenberghe, DDS, MSc, PhD;§ Frederik Maes, MSc, PhD;¶ Reinhilde Jacobs, DDS, MSc, PhD;**
SEDENTEXCT Consortium
ABSTRACT
Background: The term bone quality is often used in a dentomaxillofacial context, for example in implant planning, as bone
density and bone structure have been linked to primary implant success.
Purpose: This research aimed to investigate the performance of adaptive thresholding of trabecular bone in cone beam CT
(CBCT) images. The segmentation quality was assessed for different imaging devices and upper and lower jaws.
Materials and Methods: Four jaws were scanned with eight CBCT scanners and one micro-CT device. Images of the jaws
were spatially aligned with the micro-CT images. Two volumes of interest for each jaw were manually delineated. Trabecular
bone in the volumes of interest in the micro-CT images was segmented so that the micro-CT images could serve as
high-resolution ground truth images. The volumes of interest in the CBCT images were segmented using both global and
adaptive thresholding.
Results: Segmentation was significantly better for the lower jaw than for the upper jaw. Differences in performance between
the scanners were significant for both jaws. Adaptive thresholding performed significantly better in segmenting the bone
structure out of CBCT images.
Conclusions: When assessing jaw bone structure, the observer should always choose adaptive thresholding. It remains a
challenge to identify the optimal threshold selection for the structural assessment of jaw bone.
KEY WORDS: bone density, CBCT imaging, micro-CT
2 Clinical Implant Dentistry and Related Research, Volume *, Number *, 2014
depends on the segmentation of the trabecular bone in the images. In turn, segmentation performance depends on a range of factors, including hardware, patient positioning and imaging stability, reconstruction software, postprocessing algorithms, and others.

For implant placement planning, clinicians often opt for cone beam CT (CBCT) imaging.4,5 In previous research, it has been shown that the intensity values of CBCT images are not as uniform or reproducible as those of multislice CT images,6,7 making mere densitometric analysis inappropriate for assessment of bone quality with CBCT. When using segmentation-derived morphometric measures, it is important that the postprocessing of CBCT images be adapted to this inconsistency in intensity values. Adaptive thresholding algorithms that allow the segmentation threshold to vary in every pixel of the image can be expected to cope reasonably well with the nonuniformity of intensity values in CBCT images.8,9 A thorough evaluation of global and adaptive thresholding of CBCT images is therefore of interest. Both thresholding methods are simple and fast, but they require manual selection of appropriate threshold parameters. Comparing segmentation methods that rely on user input can be delicate, as it cannot be guaranteed that the user will select optimal thresholds in practice. Moreover, different CBCT devices or acquisition parameters may require different threshold selections for optimal performance. In fact, the entire imaging chain, including scanned sample, CBCT device, and segmentation method, should be evaluated, irrespective of the user input.

Receiver operating characteristic (ROC) curves measure performance on classification tasks, provided that the ground truth is known. While such curves are often used in the medical imaging context to measure the performance of a radiologist in detecting and classifying image patterns of disease, they can also serve as a useful evaluation of image segmentation performance through analysis of pixel classification accuracy. For segmentation algorithms that require user input, every possible parameter value generates a different point on the ROC curve. When an algorithm is evaluated by its ROC characteristics, all possible user-selected inputs are considered.10 ROC curves can also be used to compare the final segmentation accuracy of images obtained by various imaging devices or the accuracy obtained for different bone samples independently of the user-selected parameters.

This research aims to investigate the performance of adaptive thresholding of trabecular bone in images of human jawbones acquired by CBCT. To account for the varying user-selected thresholds in the segmentation algorithms, ROC curves are used for evaluating the performance of global and adaptive segmentation methods. The ROC approach additionally enables assessment of segmentation qualities for different imaging devices and for upper and lower jaws. The impact of different approaches for threshold selection on morphometric indices is also investigated.

MATERIALS AND METHODS
This study received ethical approval from the medical ethics committee of the University Hospitals Leuven (number B32220083749). Table 1 shows an overview of the steps that were followed in order to assess the performance of adaptive thresholding of trabecular bone in images of human jawbones acquired by CBCT. In the paragraphs below, each step will be further clarified.
TABLE 1 Overview of the Steps Followed
Column 1: 4 jaws; 8 CBCT scanners; 1 micro-CT scanner
Column 2: Scan registration; Scan normalisation; Volume of interest (VOI) selection; Segmentation of trabecular bone, micro-CT (ground truth); Segmentation of trabecular bone, CBCT
Column 3: Percentage bone volume; Trabecular thickness; Trabecular number; Trabecular separation
Column 4: ROC curves; Global thresholding versus adaptive thresholding; Manual thresholding versus thresholding based on sample properties versus thresholding based on optimal overlap
Column 5: One-way ANOVA: scanner comparison; Paired t-test: thresholding method comparison, upper versus lower jaw
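The overview above lists thresholding based on sample properties as one of the threshold selection strategies. As a minimal sketch of that idea, assume the VOI is available as a flat list of intensities and that a target bone volume fraction (BV/TV) is known, for instance from micro-CT. The threshold can then be picked so that the segmented bone fraction comes closest to the target. The function names, the `>=` convention for bone, and the toy intensities are illustrative assumptions and are not taken from the study.

```python
def bvtv(voxels, threshold):
    """Fraction of voxels labelled bone (intensity >= threshold; assumed convention)."""
    return sum(v >= threshold for v in voxels) / len(voxels)


def threshold_for_bvtv(voxels, target):
    """Return the candidate threshold whose segmented BV/TV is closest to the target.

    Candidates are the distinct intensities in the VOI, which is enough to
    reach every bone fraction that a global threshold can produce.
    """
    candidates = sorted(set(voxels))
    return min(candidates, key=lambda t: abs(bvtv(voxels, t) - target))


# Hypothetical VOI intensities and a hypothetical micro-CT BV/TV of 40%.
voxels = [12, 35, 18, 60, 22, 75, 15, 80, 28, 66]
chosen = threshold_for_bvtv(voxels, target=0.4)
print(chosen, bvtv(voxels, chosen))
```

Matching BV/TV in this way constrains only the amount of segmented bone, not its location, so it does not by itself guarantee voxel-by-voxel agreement with the ground truth.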
Segmentation of Jaw Bone on CBCT Datasets 3
adaptive thresholding. In global thresholding, one single intensity threshold value was selected to classify all voxels as either background or bone. This method is typically applied in CBCT data, as bone is bright in these image sets. In adaptive thresholding, the average intensity within a sphere around each voxel was calculated and used as the local threshold for that voxel. For the current analysis, the radius of the sphere was kept constant at 3 voxels (i.e., 0.6 mm) based on an estimate of the trabecular thickness derived from micro-CT morphological analysis. This approach implicitly assumes that each sphere contains bone as well as soft tissue and defines the threshold as the average of those two tissues. Adaptive thresholding enables the selection of a range of intensity values outside of which voxels are excluded from local thresholding. This range provides a correction for regions that contain no bone or only bone. Through selection of a lower threshold, a minimum average intensity is defined, under which none of the voxels within the sphere are considered bone; through selection of an upper threshold, a maximum average intensity is defined, above which none of the voxels are considered soft tissue. For the current analysis, no upper threshold was selected, as no cortical bone was involved in the analysis and therefore we did not expect to find spheres with only bone. As a result, both segmentation methods, global and adaptive, had a single parameter that needed to be set manually.

Image Analysis: Bone Morphometry
Standard morphometric parameters were calculated with CT-Analyser, including BV/TV, trabecular thickness, trabecular number, and trabecular separation. For precise definitions of these quantities and their interpretation, please refer to Parfitt and colleagues.12

Image Analysis: Segmentation Quality
ROC Curves. To incorporate all the possible choices of the threshold values, a ROC curve can be generated for each combination of VOI and segmentation method. Given the ground truth, every voxel within the VOI in the segmented image can be categorised in one of four possible classes. If the voxel is bone and it is segmented as bone, it is classified as true positive; if it is segmented as background it is classified as false negative. If the voxel is background in the ground truth and segmented as background, it is classified as true negative; if it is segmented as bone it is classified as false positive. The false positive rate (FPR) is then defined as the ratio of false positives to the number of background voxels in the ground truth, while the true positive rate (TPR) is given by the ratio of true positives to the number of bone voxels in the ground truth.

A point on the ROC curve plots the TPR versus the FPR in the VOI of a segmented image. Figure 1 illustrates how global segmentations of the same VOI in the same image with five different thresholds can result in a ROC curve. The segmentation at the lower left has a global threshold value higher than the highest intensity in the image, resulting in no bone in the segmentation. As no voxel is assigned to bone, there are no positives, and both TPR and FPR are 0. Upon lowering of the threshold, some voxels are segmented as bone, some correctly and some incorrectly, leading to a specific combination of TPR and FPR. The computation of a ROC point for each threshold value results in a ROC curve. To obtain an average ROC curve, such as the average of ROC curves obtained for different images acquired with the same CBCT device, these curves are averaged by threshold averaging, that is, averaging the points on the curve based on the thresholds that created the points.10

In general, a ROC curve that lies more northwesterly in a ROC graph provides higher accuracy, with a high TPR and a low FPR. This feature can be characterised by the area under the curve (AUC), which will be larger for curves lying more northwesterly. Note that for the adaptive thresholding scheme, not all voxels can be segmented as bone, resulting in a curve that does not reach the coordinate (1,1). Furthermore, the most extreme threshold values are very unlikely to be chosen by a user and are not of interest for our evaluation. Therefore, we computed the partial area under the curve (pAUC) as the AUC between an FPR of 0.1 and an FPR of 0.4.

Manual Thresholding versus Thresholding Based on Sample Properties or Optimal Overlap
In clinical practice, a threshold is typically selected by visual inspection by an examiner. Alternatively, some information about the sample might be available, such as the percentage bone volume (BV/TV), and a threshold might then be chosen in a way that the segmented volume has equivalent BV/TV values.

As stated before, the pAUC is based on the TPR and FPR, which can be related to the overlap, defined as the
Figure 1 ROC curve construction. Each coordinate on the curve represents a threshold for segmentation. CBCT E: see Table 2.
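The construction in Figure 1 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the software used in the study: the VOI is a flat list of intensities with a matching ground-truth list (1 for bone, 0 for background), a voxel is labelled bone when its intensity is at or above the global threshold, and the partial AUC between FPR 0.1 and 0.4 is integrated with the trapezoidal rule after linear interpolation at the two limits. The source does not state how the pAUC was integrated, and all names and data below are hypothetical.

```python
def roc_point(image, truth, threshold):
    """Return (FPR, TPR) for one global threshold (bone if value >= threshold)."""
    tp = fp = bone = background = 0
    for value, is_bone in zip(image, truth):
        predicted_bone = value >= threshold
        if is_bone:
            bone += 1
            tp += predicted_bone
        else:
            background += 1
            fp += predicted_bone
    return fp / background, tp / bone


def roc_curve(image, truth, thresholds):
    """One ROC point per threshold, from the highest threshold to the lowest."""
    return [roc_point(image, truth, t) for t in sorted(thresholds, reverse=True)]


def tpr_at(curve, fpr):
    """Linearly interpolate the TPR at a given FPR on a curve sorted by FPR."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    raise ValueError("FPR outside the range covered by the curve")


def partial_auc(curve, lo=0.1, hi=0.4):
    """Trapezoidal area under the ROC curve between FPR = lo and FPR = hi."""
    inner = [(x, y) for x, y in curve if lo < x < hi]
    pts = [(lo, tpr_at(curve, lo))] + inner + [(hi, tpr_at(curve, hi))]
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy VOI: eight voxels, bone tends to be brighter than background.
image = [10, 20, 30, 40, 50, 60, 70, 80]
truth = [0, 0, 0, 1, 0, 1, 1, 1]
curve = roc_curve(image, truth, thresholds=[90, 65, 45, 25, 0])
print(curve)
print(partial_auc(curve))
```

The first threshold (90) lies above the brightest voxel, so it yields the point (0, 0), as described for the lower-left segmentation in Figure 1. Each lower threshold adds both true and false positives and moves the point up and to the right.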
Comparison | Test | Result | P value
AUC, global thresholding, upper versus lower jaw | Paired t-test | AUC lower > AUC upper | .04
AUC, global thresholding, difference between scanners for upper jaw | ANOVA (single-factor) | Significant differences for the upper jaw | .005
AUC, global thresholding, difference between scanners for lower jaw | ANOVA (single-factor) | Significant differences for the lower jaw | <.001
AUC, global versus adaptive thresholding | Paired t-test | AUC adaptive > AUC global | <.001
Sensitivity to threshold changes for global versus adaptive thresholding | Paired t-test | Sensitivity global > adaptive | .03
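The comparisons above depend on how the two thresholding schemes differ. The sketch below is a plain-Python illustration of both, following the description in Materials and Methods: global thresholding applies one intensity cut-off to every voxel, while adaptive thresholding compares each voxel with the mean intensity inside a sphere centred on it, with an optional lower limit on that mean. Several details are assumptions and do not come from the source: the volume is a nested list indexed `[z][y][x]`, spheres are clipped at the volume border, and a voxel counts as bone only when it is strictly brighter than its local mean.

```python
from itertools import product


def global_threshold(volume, threshold):
    """Label every voxel at or above one fixed intensity as bone (True)."""
    return [[[v >= threshold for v in row] for row in plane] for plane in volume]


def sphere_offsets(radius):
    """Integer (dz, dy, dx) offsets inside a sphere of the given voxel radius."""
    r = range(-radius, radius + 1)
    return [(dz, dy, dx) for dz, dy, dx in product(r, r, r)
            if dz * dz + dy * dy + dx * dx <= radius * radius]


def adaptive_threshold(volume, radius=3, lower=None):
    """Label a voxel as bone if it is brighter than the mean of its sphere.

    If `lower` is given, spheres whose mean intensity falls below it are
    treated as containing no bone at all.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    offsets = sphere_offsets(radius)
    out = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    for z, y, x in product(range(nz), range(ny), range(nx)):
        total = count = 0
        for dz, dy, dx in offsets:
            zz, yy, xx = z + dz, y + dy, x + dx
            if 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx:
                total += volume[zz][yy][xx]
                count += 1
        local_mean = total / count
        if lower is not None and local_mean < lower:
            continue  # sphere assumed to hold soft tissue only
        out[z][y][x] = volume[z][y][x] > local_mean
    return out


# Toy 1 x 1 x 12 volume: a background ramp (nonuniform intensity) with three
# brighter "trabeculae" at x = 2, 6 and 10.
row = [0, 10, 50, 30, 40, 50, 90, 70, 80, 90, 130, 110]
volume = [[row]]
global_bone = [i for i, b in enumerate(global_threshold(volume, 45)[0][0]) if b]
adaptive_bone = [i for i, b in enumerate(adaptive_threshold(volume, radius=1)[0][0]) if b]
print(global_bone)
print(adaptive_bone)
```

In this toy case no single global threshold separates the three bright voxels from the rising background, whereas the local mean follows the ramp. The radius defaults to 3 voxels as in the study; the example uses radius 1 only because the toy volume is so small.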
We can conclude from Table 4 that for all methods of threshold selection, adaptive segmentation generally produces smaller errors in morphometric indices than global thresholding, which is translated into a smaller mean absolute error over the morphometric indices. Manual threshold selection results in smaller errors for trabecular thickness than other threshold selection methods, but other indices are estimated less accurately. A threshold based on the ground truth bone volume percentage (BV/TV as derived from segmented micro-CT images) improves the results for this individual morphometric index. When the threshold is based on maximal overlap (aiming for maximal overlap with the segmented micro-CT images), the result is a compromise between all morphometric indices, as indicated by the smaller mean error value for adaptive thresholding.

Figure 2 ROC curves for global thresholding. a, Upper jaw. b, Lower jaw. For details on CBCTs A–I, see Table 2.
Figure 3 ROC curves for adaptive thresholding. a, Upper jaw. b, Lower jaw. For details on CBCTs A–I, see Table 2.

DISCUSSION
We have quantified the segmentation accuracy of trabecular bone in CBCT images of the upper and lower jaw obtained with eight different CBCT scanners using global and adaptive thresholding, with micro-CT imaging as the ground truth.

In the search for a robust method of bone quantification, we found a clear benefit of using adaptive instead of global thresholding for segmentation. Similarly, Burghardt and colleagues13 found equivalent or improved accuracy of trabecular bone quantification using a local thresholding scheme on high-resolution peripheral quantitative computed tomography (HR-pQCT) scans. As mentioned before, the intensity values are not uniform through all CBCT scans.6,7 As adaptive thresholding uses spheres for local thresholding, the results of this procedure are less sensitive to density fluctuations within the image. The problem of inconsistent intensity is less important in HR-pQCT, which is why we could find an unequivocal improvement for all indices in all scanners when adaptive thresholding was used instead of global thresholding. Despite the success of adaptive thresholding, segmentation accuracy for a scanner with suboptimal hardware and software performance cannot be brought to the same level as that of other scanners.

Several authors have attempted to categorize CBCT devices based on observers’ opinion on overall image quality or image suitability for specific diagnostic purposes.14–17 We summarize these results in Table 5, including the assessment criteria used to classify different devices. Not all studies included the same devices or identical imaging parameters, so a full comparison is not possible.

Based on the classification in Table 5, an average ranking could be given to each scanner. As such, scanners could be classified into 4 groups: {C,E} (mean rank
TABLE 4 Mean Percentage Error Compared with Micro-CT for Different Thresholding Methods
Mean percentage error (standard deviation)
Thresholding method | BV/TV | Trabecular thickness | Trabecular number | Trabecular separation | Mean absolute error
Global threshold, manual | −30.7 (26.9) | 89.8 (48.6) | −62.3 (16.4) | 92.1 (53.6) | 68.7
Adaptive threshold, manual | −20.9 (25.4) | 51.3 (22.8) | −46.5 (18.8) | 79.5 (52.6) | 49.5
Global threshold based on micro-CT BV/TV | 3.2 (2.2) | 115.5 (47.8) | −49.9 (10.7) | 68.5 (34.5) | 59.3
Adaptive threshold based on micro-CT BV/TV | 3.2 (3.9) | 55.2 (21.6) | −32.4 (8.9) | 57.3 (31.5) | 37.0
Global threshold based on overlap | −12.1 (82.6) | 111.4 (110.7) | −63.2 (22.1) | 92.4 (71.9) | 69.7
Adaptive threshold based on overlap | 24.5 (54.1) | 55.7 (24.6) | −20.5 (31.5) | 41.5 (38.2) | 35.6
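The final value in each row of Table 4 appears to be the mean of the absolute values of the four index errors in that row, which matches the "mean absolute error over the morphometric indices" mentioned in the text. The short check below uses only the numbers printed in the table. The reading of the last column is an inference from the arithmetic, not something the table states, and the variable names are illustrative.

```python
def mean_absolute_error(errors):
    """Mean of the absolute percentage errors over the morphometric indices."""
    return sum(abs(e) for e in errors) / len(errors)


# Row label -> (four mean percentage errors in table order, reported last column)
table4 = {
    "Global threshold, manual": ((-30.7, 89.8, -62.3, 92.1), 68.7),
    "Adaptive threshold, manual": ((-20.9, 51.3, -46.5, 79.5), 49.5),
    "Global threshold based on micro-CT BV/TV": ((3.2, 115.5, -49.9, 68.5), 59.3),
    "Adaptive threshold based on micro-CT BV/TV": ((3.2, 55.2, -32.4, 57.3), 37.0),
    "Global threshold based on overlap": ((-12.1, 111.4, -63.2, 92.4), 69.7),
    "Adaptive threshold based on overlap": ((24.5, 55.7, -20.5, 41.5), 35.6),
}

for label, (errors, reported) in table4.items():
    computed = mean_absolute_error(errors)
    print(f"{label}: computed {computed:.3f}, reported {reported}")
```

Every computed value lies within 0.1 of the reported one, which is consistent with rounding of the underlying figures. For each selection method, the adaptive row has the smaller mean absolute error.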
allowing more variation in exposure protocols without changes in the study results. Moreover, trabecular bone is sparse under the maxillary sinus, and a minor threshold shift is capable of changing segmentation accuracy to a great extent. All jaws were fixed in formaldehyde, which has a high pH and therefore is known to affect the chemical consistency of bone.20 This is obviously more significant for thin structures. In the upper jaw, trabecular bone is densely vascularized, while the lower jaw has a more centralized blood supply that flows through channels. The formaldehyde might have been more destructive in the upper jaw, causing the already sparse trabeculation to become even thinner and thus more sensitive to threshold changes during segmentation.20 Hence, differences in quality between scanners might be best assessed based on their performance in upper jaw scanning accuracy.

The ROC analysis evaluates segmentation quality by true and false positive rates. In clinical practice, one is typically more interested in morphometric indices such as bone volume fraction and trabecular thickness. Such morphometric indices are average values for the entire volume that is analyzed and do not guarantee good local congruence with the ground truth. This encumbers evaluation of segmentation quality, as all relevant indices should be tested for a comprehensive result. We hypothesize that individual morphometric indices improve when more voxels are segmented correctly. To test this hypothesis, we use the TPR and FPR, which are a measure of overlap on a voxel-by-voxel basis. The results in Table 4 indicate that the hypothesis has merit, as thresholds based on overlap lead to a smaller mean error for adaptive thresholding. For global thresholding, however, the overlap threshold selection does not provide a lower mean error. This can be attributed to the lower FPR and TPR values for global thresholding, hence a lower number of correctly selected voxels with global thresholding. It seems that our assumption is only valid when enough voxels are correctly segmented, such that the morphometric index estimates are near their nominal values. Manual threshold selection provides lower errors for trabecular thickness than the automatic methods, suggesting an observer bias resulting from mentally trying to match thicknesses when selecting a threshold.

Although the differences between adaptive and global thresholding appear limited in the ROC graphs, Table 4 shows that adaptive thresholding provides much smaller errors on morphometric indices, irrespective of the threshold selection method. However, as can be seen from the ROC graphs, a good choice of scanning equipment and scan settings influences the final segmentation quality more than the segmentation method. Changing the segmentation method influences the individual pAUC, but it does not change the classification of the scanners based on pAUC.

Some clinical advantages of adaptive thresholding are obvious: Implant placement procedures can be planned better and performed more safely with precise knowledge of the supporting bone structures. Adaptive thresholding indeed provides better visual inspection of the bone pattern. The ability to increase segmentation accuracy can also provide a better differentiation between bone and nerves in osteoporotic bone. In the next research stage, we intend to relate the results of the current study to clinical results, such as in the optimization of implant surgery. We are in need of systematic comparisons of segmentation results with observers’ opinions on image quality as well as objective technical quality parameters of CBCT scanners. This requires the analysis of a large amount of data from different CBCT equipment, but also a variation of exposure and reconstruction settings for each device. Ideally
this type of investigation should be approached in a simulation context.

CONCLUSIONS
When assessing jaw bone structure, for example in implant planning, the observer should always choose adaptive thresholding. Although it clearly provides better structural assessment of jaw bone, it is not yet common practice among scientists in dentomaxillofacial radiology to use adaptive thresholding in image analysis. It remains a challenging research focus to identify the optimal threshold selection for meaningful jawbone structural assessment. Lastly, it needs to be said that the present study has been performed with CBCT image data of existing scanners. Yet, considering the rapid developments in CBCT hardware and software, the differential scanner results shown in the present study may not necessarily be extrapolated to new generation devices.

ACKNOWLEDGMENTS
The research leading to these results has received funding from the European Atomic Energy Community’s Seventh Framework Programme FP7/2007–2011 under grant agreement no. 212246 (SEDENTEXCT: Safety and Efficacy of a New and Emerging Dental X-ray Modality; http://www.sedentexct.eu) and from the KU Leuven Research Fund (Bijzonder Onderzoeksfonds OT/08/057).

REFERENCES
1. Ikumi N, Tsutsumi S. Assessment of correlation between computerized tomography values of the bone and cutting torque values at implant placement: a clinical study. Int J Oral Maxillofac Impl 2005; 20:253–260.
2. Merheb J, Van Assche N, Coucke W, Jacobs R, Naert I, Quirynen M. Relationship between cortical bone thickness or computerized tomography-derived bone density values and implant stability. Clin Oral Impl Res 2010; 21:612–617.
3. Ribeiro-Rotta RF, Lindh C, Pereira AC, Rohlin M. Ambiguity in bone tissue characteristics as presented in studies on dental implant planning and placement: a systematic review. Clin Oral Impl Res 2011; 22:798–801.
4. Vandenberghe B, Jacobs R, Bosmans H. Modern dental imaging: a review of the current technology and clinical applications in dental practice. Eur Radiol 2010; 20:2637–2655.
5. Jacobs R. Dental cone beam computed tomography and its justified use in oral health care. JBR-BTR 2011; 94:254–265.
6. Nackaerts O, Maes F, Yan H, Couto Souza P, Pauwels R, Jacobs R. Analysis of intensity variability in multislice and cone beam computed tomography. Clin Oral Impl Res 2011; 22:873–879.
7. Pauwels R, Nackaerts O, Bellaiche N, et al. Variability of dental cone beam computed tomography grey values for density estimations. Br J Radiol 2013; 86. Doi: 20120135
8. Kuhn JL, Goldstein SA, Feldkamp LA, Goulet RW, Jesion G. Evaluation of a microcomputed tomography system to study trabecular bone structure. J Orthop Res 1990; 8:833–842.
9. Waarsing JH, Day JS, Weinans H. An improved segmentation method for in vivo microCT imaging. J Bone Min Res 2004; 19:1640–1650.
10. Fawcett T. An introduction to ROC analysis. Patt Rec Letters 2006; 27:861–874.
11. Maes F, Collignon A, Vandermeulen D, Marchal G, Suetens P. Multimodality image registration by maximization of mutual information. IEEE Med Im 1997; 16:187–198.
12. Parfitt AM, Drezner MK, Glorieux FH, et al. Bone histomorphometry: standardization of nomenclature, symbols and units. J Bone Min Res 1987; 2:595–610.
13. Burghardt AJ, Kazakia GJ, Majumdar S. A local adaptive threshold strategy for high resolution peripheral quantitative computed tomography of trabecular bone. Annals Biomed Eng 2007; 35:1678–1686.
14. Liang X, Jacobs R, Hassan B, et al. A comparative evaluation of cone beam computed tomography and multislice CT. Part I. On subjective image quality. Eur J Radiol 2010; 75:265–269.
15. Alqerban A, Jacobs R, Fieuws S, Nackaerts O, SEDENTEXCT Project Consortium, Willems G. Comparison of 6 cone beam computed tomography systems for image quality and detection of simulated canine impaction-induced external root resorption in maxillary lateral incisors. Am J Orthod Dentofac Orthop 2011; 140:e129–e139.
16. Pauwels R, Beinsberger J, Stamatakis H, et al. Comparison of spatial and contrast resolution for cone beam computed tomography scanners. Oral Surg Oral Med Oral Path Oral Radiol 2012; 114:127–135.
17. Vandenberghe B, Luchsinger S, Hostens J, Dhoore E, Jacobs R, SEDENTEXCT Project Consortium. The influence of exposure parameters on jaw bone model accuracy using cone beam computed tomography and multislice CT. Dentomaxillofac Radiol 2012; 41:466–474.
18. Pauwels R, Stamatakis H, Manousaridis G, et al. Development and applicability of a quality control phantom for dental cone beam computed tomography. J App Clin Med Phys 2011; 12(4):245–260.
19. Lofthag-Hansen S, Thilander-Klang A, Gröndahl K. Evaluation of subjective image quality in relation to diagnostic task for cone beam computed tomography with different fields of view. Eur J Radiol 2011; 80:483–488.
20. Fonseca AA, Cherubini K, Veeck EB, Ladeira RS, Carapeto LP. Effect of 10% formalin on radiographic optical density of bone specimens. Dentomaxillofac Radiol 2008; 37:137–141.