Computer Vision in The Surgical Operating Room: Review Article
Danail Stoyanov
Department of Computer Science, Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London, London, UK
Keywords
Artificial intelligence · Computer-assisted intervention · Computer vision · Minimally invasive surgery

Abstract
Background: Multiple types of surgical cameras are used in modern surgical practice and provide a rich visual signal that is used by surgeons to visualize the clinical site and make clinical decisions. This signal can also be used by artificial intelligence (AI) methods to provide support in identifying instruments, structures, or activities, both in real time during procedures and postoperatively for analytics and understanding of surgical processes. Summary: In this paper, we provide a succinct perspective on the use of AI, and especially computer vision, to power solutions for the surgical operating room (OR). The synergy between data availability and technical advances in computational power and AI methodology has led to rapid developments in the field and promising advances. Key Messages: With the increasing availability of surgical video sources and the convergence of technologies around video storage, processing, and understanding, we believe clinical solutions and products leveraging vision are going to become an important component of modern surgical capabilities. However, both technical and clinical challenges remain to be overcome before vision-based approaches can be used efficiently in the clinic.
© 2020 S. Karger AG, Basel

Introduction

Surgery has progressively shifted towards the minimally invasive surgery (MIS) paradigm. This means that today most operating rooms (ORs) are equipped with digital cameras that visualize the surgical site. The video generated by surgical cameras is a form of digital measurement and observation of the patient anatomy at the surgical site. It contains information about the appearance, shape, motion, and function of the anatomy and the instrumentation within it. Once recorded over the duration of a procedure, it also embeds information about the surgical process, actions performed, instruments used, possible hazards or complications, and even information about risk. While such information can be inferred by expert observers, this is not practical for providing assistance in routine clinical use, and automated techniques are necessary to effectively utilize the data for driving improvements in practice [1, 2].

Currently, the majority of surgical video is either not recorded or it is stored for a limited period of time on the stack accompanying the surgical camera and then discarded at a later date. Perhaps some video is used in case presentations during clinical meeting discussions or society conferences or for educational purposes, and on an individual level surgeons may choose to record their case history. Storage has an associated cost, and hence it is sensible to reduce data stores to only relevant and clinically useful information. This is largely due to the lack of tools that can synthesize the surgical video into meaningful information, either about
the process or about physiological information contained in the video observations. For certain diagnostic procedures, e.g., endoscopic gastroenterology, storage of images from the procedure in the patient medical record to document observed lesions is becoming standard practice, but this is largely not done for surgical video.

In addition to surgical cameras, it is nowadays also common for other OR cameras to be present. These can be used to monitor activity throughout the OR and not just at the surgical site [3]. As such, opportunities are present to capture this signal and provide an understanding of the entire room and the activities or events that occur within it. This can potentially be used to optimize team performance or to monitor room-level events that can be used to improve the surgical process. To effectively make use of video data from the OR, it is necessary to build algorithms for video analysis and understanding. In this paper, we provide a short review of the state of the art in artificial intelligence (AI), and especially computer vision, for the analysis of surgical data, and outline some of the concepts and directions for future development and practical translation into the clinic.

Computer Vision

The field of computer vision is a sub-branch of AI focused on building algorithms and methods for understanding information captured in images and video [4]. To make vision problems tractable, computational methods typically focus on sub-components of the human visual system, e.g., object detection or identification (classification), motion extraction, or spatial understanding (Fig. 1). Developing these building blocks in the context of surgery and surgeons' vision can lead to exciting possibilities for utilizing surgical video [4].

Computer vision has seen major improvements in the last 2 decades, driven by breakthroughs in computing, digital cameras, mathematical modelling, and, most recently, deep learning techniques. While previous systems required human intervention in the design and modelling of image features that capture different objects in a certain domain, in deep learning the most discriminant features are learned autonomously from extensive amounts of annotated data. The increasing access to high volumes of digitally recorded image-guided surgeries is sparking significant interest in translating deep learning to intraoperative imaging. Annotated surgical video datasets in a wide variety of domains are being made publicly available for training and validating new algorithms in the form of competition challenges [5], resulting in rapid progress towards reliable automatic interpretation of surgical data.

Surgical Process Understanding

Surgical procedures can be decomposed into a number of sequential tasks (e.g., dissection, suturing, and anastomosis), typically called procedural phases or steps [6–8]. Recognizing and temporally localizing these tasks allows for surgical process modelling and workflow analysis [9]. This further facilitates the current trend in MIS practice towards establishing standardized protocols for surgical workflow and guidelines for task execution, describing optimal tool positioning with respect to the anatomy, setting performance benchmarks, and ensuring operational safety and complication-free, cost-effective procedures [10, 11]. The ability to objectively quantify surgical performance could impact many aspects of the user and patient experience, such as reduced mental/physical load, increased safety, and more efficient training and planning [12, 13]. Intraoperative video is the main sensory cue for
the automatic generation of standardized procedure reports for quality control assessment and clinical training [37].

Surgical Instrument Detection
Automatic detection and localization of surgical instruments, when integrated with anatomy detection, can also contribute to accurate positioning and ensure that critical structures are not damaged. In robotic MIS (RMIS), this information has the added benefit of making progress toward active guidance, e.g., during needle suturing [38]. For this reason, research on surgical instrument detection has largely targeted the articulated tools used in RMIS [39]. With RMIS there are interesting possibilities in using robotic systems to generate data for training AI models, bypassing the need for expensive manual labelling [40]. However, instrument detection has also received some attention in nonrobotic procedures, including colorectal, pelvic, spine, and retinal surgery [41]. In such cases, vision for instrument analysis may assist in building systems that report analytics about instrument usage, or about instrument motion and activity for surgical technical skill analysis or verification.

Computer-Assisted Navigation

Vision-based methods for localization and mapping of the environment using the surgical camera have advanced rapidly in recent years. This is crucial in both diagnostic and surgical procedures because it may enable more complete diagnosis or the fusion of pre- and intraoperative information to enhance clinical decision making.

… learn a depth map from a single image, overcoming the need for image registration. It has recently been shown that these approaches are able to infer dense and detailed depth maps in colonoscopy [45]. By fusing consecutive depth maps and simultaneously estimating the endoscope motion using geometric constraints, it has been demonstrated that long-range colon sections can be reconstructed [46]. A similar approach has also been successfully applied to 3-D reconstruction of the sinus anatomy from endoscopic video, proposed as an alternative to CT scans, which are expensive and use ionizing radiation, for longitudinal monitoring of patients after nasal obstruction surgery [47]. However, critical limitations, such as navigation within deformable environments, need to be overcome.

Navigation in Robotic Surgery
Surgical robots such as the da Vinci Surgical System generally use stereo endoscopes, which have a significant advantage over monocular endoscopes in their ability to capture 3-D measurements. Estimating a dense depth map from a pair of stereo images generally consists of estimating dense disparity maps defining the apparent pixel motion between the 2 images. Most stereo registration approaches rely on geometric methods [48]. It has, however, been shown that DL-based approaches can be successfully applied to partial nephrectomy, outperforming state-of-the-art stereo reconstruction methods [49]. Surgical tool segmentation and localization contribute to safe tool-tissue interaction and are essential to visually guided manipulation tasks. Recent DL approaches demonstrate significant improvements over hand-crafted tool tracking methods, offering a high degree of flexibility, accuracy, and reliability [50].
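The disparity-to-depth relationship described above can be stated concretely: for a rectified stereo pair with focal length f (in pixels) and baseline B, a pixel with disparity d lies at depth Z = fB/d. The sketch below illustrates only this conversion step; the function name and the camera parameters (1,000 px focal length, 5 mm baseline) are illustrative assumptions rather than values from any cited system, and in practice the disparity map itself would come from a geometric or DL-based stereo matcher [48, 49].

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a dense disparity map (pixels) to a depth map (metres).

    Assumes a rectified stereo pair, so depth Z = f * B / d.
    Pixels with zero or negative disparity (no stereo match)
    are mapped to infinity rather than dividing by zero.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Illustrative numbers only: 1,000 px focal length and a 5 mm
# baseline, roughly the scale of a stereo endoscope (assumption).
disp = np.array([[50.0, 100.0],
                 [0.0, 25.0]])
depth = disparity_to_depth(disp, focal_px=1000.0, baseline_m=0.005)
# 1000 * 0.005 / 50 = 0.1 m; /100 = 0.05 m; /25 = 0.2 m; d = 0 -> inf
```

Masking non-positive disparities explicitly, as above, avoids the spurious near-zero depths that would otherwise appear wherever the matcher finds no correspondence.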
DOI: 10.1159/000511934
5 Bernal J, Tajbakhsh N, Sánchez FJ, Matuszewski BJ, Chen H, Yu L, et al. Comparative validation of polyp detection methods in video colonoscopy: results from the MICCAI 2015 endoscopic vision challenge. IEEE Trans Med Imaging. 2017 Jun;36(6):1231–49.
6 Ahmidi N, Tao L, Sefati S, Gao Y, Lea C, Haro BB, et al. A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery. IEEE Trans Biomed Eng. 2017 Sep;64(9):2025–41.
7 Stauder R, Ostler D, Kranzfelder M, Koller S, Feußner H, Navab N. The TUM LapChole dataset for the M2CAI 2016 workflow challenge [Internet]. 2016. Available from: https://arxiv.org/abs/1610.09278.
8 Flouty E, Kadkhodamohammadi A, Luengo I, Fuentes-Hurtado F, Taleb H, Barbarisi S, et al. CaDIS: Cataract dataset for image segmentation [Internet]. 2019. Available from: https://arxiv.org/abs/1906.11586.
9 Gholinejad M, Loeve AJ, Dankelman J. Surgical process modelling strategies: which method to choose for determining workflow? Minim Invasive Ther Allied Technol. 2019 Apr;28(2):91–104.
10 Gill S, Stetler JL, Patel A, Shaffer VO, Srinivasan J, Staley C, et al. Transanal Minimally Invasive Surgery (TAMIS): Standardizing a Reproducible Procedure. J Gastrointest Surg. 2015 Aug;19(8):1528–36.
11 Jajja MR, Maxwell D, Hashmi SS, Meltzer RS, Lin E, Sweeney JF, et al. Standardization of operative technique in minimally invasive right hepatectomy: improving cost-value relationship through value stream mapping in hepatobiliary surgery. HPB (Oxford). 2019 May;21(5):566–73.
12 Reiley CE, Lin HC, Yuh DD, Hager GD. Review of methods for objective surgical skill evaluation. Surg Endosc. 2011 Feb;25(2):356–66.
13 Mazomenos EB, Chang PL, Rolls A, Hawkes DJ, Bicknell CD, Vander Poorten E, et al. A survey on the current status and future challenges towards objective skills assessment in endovascular surgery. J Med Robot Res. 2016;1(3):1640010.
14 Azevedo-Coste C, Pissard-Gibollet R, Toupet G, Fleury É, Lucet JC, Birgand G. Tracking Clinical Staff Behaviors in an Operating Room. Sensors (Basel). 2019 May;19(10):2287.
15 Khan RS, Tien G, Atkins MS, Zheng B, Panton ON, Meneghetti AT. Analysis of eye gaze: do novice surgeons look at the same location as expert surgeons during a laparoscopic operation? Surg Endosc. 2012 Dec;26(12):3536–40.
16 Twinanda AP, Shehata S, Mutter D, Marescaux J, de Mathelin M, Padoy N. EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos. IEEE Trans Med Imaging. 2017 Jan;36(1):86–97.
17 Chen W, Feng J, Lu J, Zhou J. Endo3D: Online workflow analysis for endoscopic surgeries based on 3D CNN and LSTM. In: Stoyanov D, Taylor Z, Sarikaya D, et al., editors. OR 2.0 context-aware operating theaters, computer assisted robotic endoscopy, clinical image-based procedures, and skin image analysis. Cham: Springer International; 2018.
18 Jin Y, Dou Q, Chen H, Yu L, Qin J, Fu CW, et al. SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network. IEEE Trans Med Imaging. 2018 May;37(5):1114–26.
19 Funke I, Bodenstedt S, Oehme F, von Bechtolsheim F, Weitz J, Speidel S. Using 3D Convolutional Neural Networks to Learn Spatiotemporal Features for Automatic Surgical Gesture Recognition in Video. Conference on Medical Image Computing and Computer-Assisted Intervention. 2019. https://doi.org/10.1007/978-3-030-32254-0_52.
20 Bodenstedt S, Rivoir D, Jenke A, Wagner M, Breucha M, Müller-Stich B, et al. Active learning using deep Bayesian networks for surgical workflow analysis. Int J CARS. 2019 Jun;14(6):1079–87.
21 Liu D, Jiang T. Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification. Conference on Medical Image Computing and Computer-Assisted Intervention. 2018; vol 11073, pp 247–55.
22 Despinoy F, Bouget D, Forestier G, Penet C, Zemiti N, Poignet P, et al. Unsupervised Trajectory Segmentation for Surgical Gesture Recognition in Robotic Training. IEEE Trans Biomed Eng. 2016 Jun;63(6):1280–91.
23 Van Amsterdam B, Nakawala H, De Momi E, Stoyanov D. Weakly Supervised Recognition of Surgical Gestures. IEEE International Conference on Robotics and Automation. 2019; pp 9565–71.
24 Van Amsterdam B, Clarkson MJ, Stoyanov D. Multi-Task Recurrent Neural Network for Surgical Gesture Recognition and Progress Prediction. IEEE International Conference on Robotics and Automation. 2020.
25 Vedula SS, Ishii M, Hager GD. Objective assessment of surgical technical skill and competency in the operating room. Annu Rev Biomed Eng. 2017 Jun 21;19:301–25.
26 Nguyen XA, Ljuhar D, Pacilli M, Nataraja RM, Chauhan S. Surgical skill levels: classification and analysis using deep neural network model and motion signals. Comput Methods Programs Biomed. 2019 Aug;177:1–8.
27 Ismail Fawaz H, Forestier G, Weber J, Idoumghar L, Muller PA. Accurate and interpretable evaluation of surgical skills from kinematic data using fully convolutional neural networks. Int J CARS. 2019 Sep;14(9):1611–7.
28 Wang Z, Majewicz Fey A. Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. Int J CARS. 2018 Dec;13(12):1959–70.
29 Ismail Fawaz H, Forestier G, Weber J, Idoumghar L, Muller PA. Automated Performance Assessment in Transoesophageal Echocardiography with Convolutional Neural Networks. Conference on Medical Image Computing and Computer Assisted Intervention. 2018.
30 Abdi AH, Luong C, Tsang T, Allan G, Nouranian S, Jue J, et al. Automatic quality assessment of echocardiograms using convolutional neural networks: feasibility on the apical four-chamber view. IEEE Trans Med Imaging. 2017 Jun;36(6):1221–30.
31 Castellino RA. Computer aided detection (CAD): an overview. Cancer Imaging. 2005 Aug;5(1):17–9.
32 Ahmad OF, Soares AS, Mazomenos E, Brandao P, Vega R, Seward E, et al. Artificial intelligence and computer-aided diagnosis in colonoscopy: current evidence and future directions. Lancet Gastroenterol Hepatol. 2019 Jan;4(1):71–80.
33 Brandao P, Zisimopoulos O, Mazomenos E, Ciuti G, Bernal J, Visentini-Scarzanella M, et al. Towards a computed-aided diagnosis system in colonoscopy: automatic polyp segmentation using convolution neural networks. J Med Robot Res. 2018;3(2):1840002.
34 Hussein M, Puyal J, Brandao P, Toth D, Sehgal V, Everson M, et al. Deep neural network for the detection of early neoplasia in Barrett's oesophagus. Gastrointest Endosc. 2020;91(6):AB250.
35 Janatka M, Sridhar A, Kelly J, Stoyanov D. Higher order of motion magnification for vessel localisation in surgical video. Conference on Medical Image Computing and Computer-Assisted Intervention. 2018. https://doi.org/10.1007/978-3-030-00937-3_36.
36 Mascagni P, Fiorillo C, Urade T, Emre T, Yu T, Wakabayashi T, et al. Formalizing video documentation of the critical view of safety in laparoscopic cholecystectomy: a step towards artificial intelligence assistance to improve surgical safety. Surg Endosc. 2020 Jun;34(6):2709–14.
37 He Q, Bano S, Ahmad OF, Yang B, Chen X, Valdastri P, et al. Deep learning-based anatomical site classification for upper gastrointestinal endoscopy. Int J CARS. 2020 Jul;15(7):1085–94.
38 D’Ettorre C, Dwyer G, Du X, Chadebecq F, Vasconcelos F, De Momi E, et al. Automated pick-up of suturing needles for robotic surgical assistance. IEEE International Conference on Robotics and Automation. 2018; pp 1370–7.
39 Du X, Kurmann T, Chang PL, Allan M, Ourselin S, Sznitman R, et al. Articulated multi-instrument 2-D pose estimation using fully convolutional networks. IEEE Trans Med Imaging. 2018 May;37(5):1276–87.
40 Colleoni E, Edwards P, Stoyanov D. Synthetic and Real Inputs for Tool Segmentation in Robotic Surgery. Conference on Medical Image Computing and Computer Assisted Intervention. 2020. https://doi.org/10.1007/978-3-030-59716-0_67.
41 Bouget D, Allan M, Stoyanov D, Jannin P. Vision-based and marker-less surgical tool detection and tracking: a review of the literature. Med Image Anal. 2017 Jan;35:633–54.
42 Maier-Hein L, Mountney P, Bartoli A, Elhawary H, Elson D, Groch A, et al. Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery. Med Image Anal. 2013 Dec;17(8):974–96.
43 Liu X, Zheng Y, Killeen B, Ishii M, Hager GD, Taylor RH, et al. Extremely dense point correspondences using a learned feature descriptor [Internet]. 2020. Available from: https://arxiv.org/abs/2003.00619.
44 Bano S, Vasconcelos F, Tella Amo M, Dwyer G, Gruijthuijsen C, Deprest J, et al. Deep Sequential Mosaicking of Fetoscopic Videos. Conference on Medical Image Computing and Computer Assisted Intervention. 2019.