International evaluation of an AI system for breast cancer screening

Screening mammography aims to identify breast cancer before symptoms appear, enabling earlier therapy while the disease is more treatable. Despite the existence of screening programs worldwide, interpretation of these images suffers from suboptimal rates of false positives and false negatives. Here we present an AI system capable of surpassing a single expert reader in breast cancer prediction performance. Using two large data sets representative of clinical practice in the United States (US) and the United Kingdom (UK), we show an absolute reduction of 5.7%/1.2% (US/UK) in false positives and 9.4%/2.7% (US/UK) in false negatives. We show evidence of the system's ability to generalize from the UK sites to the US site. In an independently conducted reader study, the AI system outperformed all six radiologists, with an area under the receiver operating characteristic curve (AUC-ROC) greater than that of the average radiologist by an absolute margin of 12.1%. By simulating the AI system's role in the double-reading process, we show that it maintains noninferior performance while reducing the second reader's workload by 88%. This robust assessment of the AI system paves the way for prospective clinical trials to improve the accuracy and efficiency of breast cancer screening.
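
To make the reported quantities concrete, the following is a minimal sketch of how an absolute reduction in false positives and false negatives, and an AUC-ROC, could be computed. It assumes the reductions are differences in false-positive and false-negative rates between a reader and a thresholded AI score. The function names (`error_rates`, `auc_roc`), the 0.5 threshold, and all data values are hypothetical toy inputs for illustration; they are not the study's data or the authors' code.

```python
def error_rates(labels, decisions):
    """Return (false_positive_rate, false_negative_rate) for binary labels and decisions."""
    fp = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 1)
    fn = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 0)
    return fp / labels.count(0), fn / labels.count(1)


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer, 0 = no cancer.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_decisions = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]  # one reader's recall decisions
ai_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.05]
ai_decisions = [1 if s >= 0.5 else 0 for s in ai_scores]  # assumed operating point

reader_fpr, reader_fnr = error_rates(labels, reader_decisions)
ai_fpr, ai_fnr = error_rates(labels, ai_decisions)

# Absolute reductions are plain differences of rates, not relative changes.
fp_reduction = reader_fpr - ai_fpr
fn_reduction = reader_fnr - ai_fnr
ai_auc = auc_roc(labels, ai_scores)

print(f"absolute FP reduction: {fp_reduction:.3f}")
print(f"absolute FN reduction: {fn_reduction:.3f}")
print(f"AI AUC-ROC: {ai_auc:.3f}")
```

The pairwise definition of AUC-ROC used here is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be preferable on data sets of clinical size.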