Comparative Study

JAMA. 2017 Dec 12;318(22):2199-2210. doi: 10.1001/jama.2017.14585.

Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer

Babak Ehteshami Bejnordi 1, Mitko Veta 2, Paul Johannes van Diest 3, Bram van Ginneken 1, Nico Karssemeijer 1, Geert Litjens 4, Jeroen A W M van der Laak 4; the CAMELYON16 Consortium: Meyke Hermsen 4, Quirine F Manson 3, Maschenka Balkenhol 4, Oscar Geessink 4 5, Nikolaos Stathonikos 3, Marcory CRF van Dijk 6, Peter Bult 4, Francisco Beca 7, Andrew H Beck 7 8, Dayong Wang 7 8, Aditya Khosla 8 9, Rishab Gargeya 10, Humayun Irshad 7, Aoxiao Zhong 11, Qi Dou 11 12, Quanzheng Li 11, Hao Chen 12, Huang-Jing Lin 12, Pheng-Ann Heng 12, Christian Haß 13, Elia Bruni 13, Quincy Wong 14, Ugur Halici 15 16, Mustafa Ümit Öner 15, Rengul Cetin-Atalay 17, Matt Berseth 18, Vitali Khvatkov 19, Alexei Vylegzhanin 19, Oren Kraus 20, Muhammad Shaban 21, Nasir Rajpoot 21 22, Ruqayya Awan 23, Korsuk Sirinukunwattana 21, Talha Qaiser 21, Yee-Wah Tsang 22, David Tellez 4, Jonas Annuscheit 24, Peter Hufnagl 24, Mira Valkonen 25, Kimmo Kartasalo 24 26, Leena Latonen 27, Pekka Ruusuvuori 24 28, Kaisa Liimatainen 24, Shadi Albarqouni 29, Bharti Mungal 29, Ami George 29, Stefanie Demirci 29, Nassir Navab 29, Seiryo Watanabe 30, Shigeto Seno 30, Yoichi Takenaka 30, Hideo Matsuda 30, Hady Ahmady Phoulady 31, Vassili Kovalev 32, Alexander Kalinovsky 32, Vitali Liauchuk 32, Gloria Bueno 33, M Milagro Fernandez-Carrobles 33, Ismael Serrano 33, Oscar Deniz 33, Daniel Racoceanu 34 35, Rui Venâncio 36

Abstract

Importance: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency.

Objective: To assess the performance of automated deep learning algorithms in detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and to compare it with pathologists' diagnoses in a diagnostic setting.

Design, setting, and participants: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands, with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining, was provided to challenge participants to build algorithms. Algorithm performance was evaluated on an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands, who ascertained the likelihood of nodal metastases for each slide in a flexible 2-hour session simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC).

Exposures: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation.

Main outcomes and measures: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image, evaluated using receiver operating characteristic (ROC) curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor.
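As context for this outcome measure, a minimal sketch of how an ROC analysis can be run over such 5-level confidence ratings follows (Python with scikit-learn is an assumption of this sketch; the ratings and labels are invented placeholders, not study data):

```python
# Minimal sketch: ROC analysis over 5-level ordinal confidence ratings.
# The ratings and ground-truth labels are invented placeholders, not study data.
from sklearn.metrics import roc_auc_score, roc_curve

LEVELS = {  # map each confidence level to an ordinal score
    "definitely normal": 0,
    "probably normal": 1,
    "equivocal": 2,
    "probably tumor": 3,
    "definitely tumor": 4,
}

y_true = [0, 0, 1, 1, 0, 1]  # 1 = slide contains a metastasis (placeholder)
ratings = ["definitely normal", "probably tumor", "equivocal",
           "definitely tumor", "probably normal", "probably tumor"]
y_score = [LEVELS[r] for r in ratings]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
print("AUC:", roc_auc_score(y_true, y_score))
```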

Results: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P < .001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC).
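For readers estimating similar intervals, one common approach to a 95% CI for a slide-level AUC is bootstrap resampling of the test slides; the sketch below is an illustration only (the exact interval method used in the study is not stated in this abstract and may differ):

```python
# Minimal bootstrap sketch for a slide-level AUC confidence interval.
# Illustration only; the study's exact interval method may differ.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def auc_ci(y_true, y_score, n_boot=2000, alpha=0.05):
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample slides with replacement
        if len(np.unique(y_true[idx])) < 2:  # an AUC needs both classes present
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, y_score), (lo, hi)
```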

Conclusions and relevance: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.


Conflict of interest statement

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Veta reported receiving grant funding from the Netherlands Organization for Scientific Research. Dr van Ginneken reported being a co-founder of and holding shares in Thirona and receiving grant funding and royalties from Mevis Medical Solutions. Dr Karssemeijer reported holding shares in Volpara Solutions, QView Medical, and ScreenPoint Medical BV; receiving consulting fees from QView Medical; and being an employee of ScreenPoint Medical BV. Dr van der Laak reported receiving personal fees from Philips, ContextVision, and Diagnostic Services Manitoba. Dr Manson reported receiving grant funding from the Dutch Cancer Society. Mr Geessink reported receiving grant funding from the Dutch Cancer Society. Dr Beca reported receiving personal fees from PathAI and Nvidia and owning stock in Nvidia. Dr Li reported receiving grant funding from the National Institutes of Health. Dr Ruusuvuori reported receiving grant funding from the Finnish Funding Agency for Innovation. No other disclosures were reported.

Figures

Figure 1. FROC Curves of the Top 5 Performing Algorithms vs Pathologist WOTC for the Metastases Identification Task (Task 1) From the CAMELYON16 Competition
CAMELYON16 indicates Cancer Metastases in Lymph Nodes Challenge 2016; CULab, Chinese University Lab; FROC, free-response receiver operating characteristic; HMS, Harvard Medical School; MGH, Massachusetts General Hospital; MIT, Massachusetts Institute of Technology; WOTC, without time constraint. The x-axis is linear between 0 and 0.125 (blue) and on a base-2 logarithmic scale between 0.125 and 8. Teams were those organized in the CAMELYON16 competition. Task 1 was measured on the 129 whole-slide images in the test data set, of which 49 contained metastatic regions. The pathologist did not produce any false positives and achieved a true-positive fraction of 0.724 for detecting and localizing metastatic regions.
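As an illustration of how a FROC curve like the one in Figure 1 is assembled, a hypothetical sketch follows; candidate-to-lesion matching is assumed to have been done upstream, and the data structures are illustrative, not the challenge evaluation code:

```python
# Hypothetical sketch of assembling FROC operating points from scored
# candidate detections. Not the CAMELYON16 evaluation code.
import numpy as np

def froc_points(candidates, n_lesions, n_normal_images):
    """candidates: list of (score, lesion_id) tuples, where lesion_id is the
    annotated lesion a candidate hits, or None if it is a false positive."""
    candidates = sorted(candidates, key=lambda c: -c[0])  # descending score
    hit_lesions, fps = set(), 0
    sensitivity, fp_per_image = [], []
    for score, lesion_id in candidates:
        if lesion_id is None:
            fps += 1                      # candidate hits no annotated lesion
        else:
            hit_lesions.add(lesion_id)    # each lesion counts once
        sensitivity.append(len(hit_lesions) / n_lesions)
        fp_per_image.append(fps / n_normal_images)
    return np.array(fp_per_image), np.array(sensitivity)
```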
Figure 2. Probability Maps Generated by the Top 3 Algorithms From the CAMELYON16 Competition
For abbreviations, see the legend of Figure 3. The color scale bar (top right) indicates the probability that each pixel is part of a metastatic region. For additional examples, see eFigure 5 in the Supplement. A, Four annotated micrometastatic regions in whole-slide images of hematoxylin and eosin–stained lymph node tissue sections taken from the test set of the Cancer Metastases in Lymph Nodes Challenge 2016 (CAMELYON16) dataset. B-D, Probability maps from each team overlaid on the original images.
Figure 3. ROC Curves of the Top-Performing Algorithms vs Pathologists for Metastases Classification (Task 2) From the CAMELYON16 Competition
AUC indicates area under the receiver operating characteristic curve; CAMELYON16, Cancer Metastases in Lymph Nodes Challenge 2016; CULab, Chinese University Lab; HMS, Harvard Medical School; MGH, Massachusetts General Hospital; MIT, Massachusetts Institute of Technology; ROC, receiver operating characteristic; WOTC, without time constraint; WTC, with time constraint. The blue on the axes in the left panels corresponds with the blue on the axes in the right panels. Task 2 was measured on the 129 whole-slide images (for the algorithms) and the corresponding glass slides (for the pathologists WTC and the pathologist WOTC) in the test data set, of which 49 contained metastatic regions. A, A machine-learning system achieves superior performance to a pathologist if the operating point of the pathologist lies below the ROC curve of the system. The top 2 deep learning–based systems outperform all the pathologists WTC in this study. All the pathologists WTC scored the glass slides using 5 levels of confidence: definitely normal, probably normal, equivocal, probably tumor, and definitely tumor. To generate estimates of sensitivity and specificity for each pathologist, confidence levels of definitely normal and probably normal were defined as negative; all others as positive. B, The mean ROC curve was computed using the pooled mean technique: all the diagnoses of the pathologists WTC are joined and the resulting ROC curve is computed as if 1 person had analyzed 11 × 129 = 1419 cases.
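The pooled mean technique described in panel B can be sketched in a few lines; the rating and label arrays below are random placeholders, not the study's data:

```python
# Sketch of the pooled-mean ROC: ratings from all 11 pathologists WTC are
# concatenated and treated as one reader over 11 x 129 = 1419 cases.
# Arrays are random placeholders, not study data.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
n_readers, n_slides = 11, 129
ratings = rng.integers(0, 5, size=(n_readers, n_slides))  # ordinal scores 0-4
labels = rng.integers(0, 2, size=n_slides)                # 1 = metastasis

pooled_scores = ratings.reshape(-1)         # 1419 pooled ratings
pooled_labels = np.tile(labels, n_readers)  # each slide's label, once per reader

fpr, tpr, _ = roc_curve(pooled_labels, pooled_scores)
print("Pooled mean AUC:", roc_auc_score(pooled_labels, pooled_scores))
```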


References

    1. Griffin J, Treanor D. Digital pathology in clinical use: where are we now and what is holding us back? Histopathology. 2017;70(1):134-145. - PubMed
    1. Madabhushi A, Lee G. Image analysis and machine learning in digital pathology: challenges and opportunities. Med Image Anal. 2016;33:170-175. - PMC - PubMed
    1. Gulshan V, Peng L, Coram M, et al. . Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410. - PubMed
    1. Esteva A, Kuprel B, Novoa RA, et al. . Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118. - PMC - PubMed
    1. Vestjens JHMJ, Pepels MJ, de Boer M, et al. . Relevant impact of central pathology review on nodal classification in individual breast cancer patients. Ann Oncol. 2012;23(10):2561-2566. - PubMed

Publication types