Study population
In 2009, China launched the “Two Cancers (Breast and Cervical Cancer) Screening” project, which provides free screening for millions of rural women aged 35–69 years. Since 2019, the program has been expanded as a basic public health service to include all eligible women in both rural and urban areas [27, 28].
Building on this screening initiative, we conducted a prospective controlled trial in Shanghai, China. Eligible participants were women aged 35–69 years who were attending the “Two Cancers (Breast and Cervical Cancer) Screening” project and had no history of breast cancer (including in situ cancer) or of any other cancer in the previous 5 years. Participants were also required to have no serious cardiopulmonary, liver, or kidney insufficiency and no other systemic diseases, and to have a life expectancy of more than 5 years (Fig. 1).
This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of Fudan University Shanghai Cancer Center (No. 2008223-22). Informed consent was obtained from all individual participants included in the study.
This study is registered at ClinicalTrials.gov under the number NCT06521788 (initial release date: July 22, 2024).
Definition and selection of a cluster
We define one district as a cluster. As of 2021, Shanghai has jurisdiction over 16 municipal districts. The division of these districts was based on field-measured data and relevant geographical maps, utilizing human–computer interaction for the vectorization of administrative division maps. Among these divisions, seven are classified as urban areas and nine as suburban areas. The urban districts cover an area of 20–40 square kilometers, with a resident population ranging from 600,000 to 1,000,000. In contrast, the suburban districts span 300–1200 square kilometers and have populations between 500,000 and 3,200,000. Each district has at least 20,000 women aged 35–69.
We used districts, rather than individuals, as the unit of allocation. Two districts were selected based on economic conditions, population characteristics, the impact of the pandemic, and willingness to cooperate. Hongkou district served as the intervention group and Pudong district as the control group.
Trial participants and intervention
Women in the intervention group received AI-assisted ultrasound screening, while those in the control group underwent routine ultrasound screening. The AI-assisted ultrasound diagnostic device used in this study is a portable instrument developed collaboratively by Fudan University Shanghai Cancer Center (FUSCC), the School of Information Science and Engineering at Fudan University, Shanghai University, and Shisun Intelligent Technology (Shanghai) Co., Ltd. This project was funded by the Major Instruments Program of the National Natural Science Foundation of China and the Science and Technology Innovation Action Plan of the Shanghai Municipal Science and Technology Commission. The device consists of a laptop-style display unit measuring 500 mm × 500 mm × 20 mm and an ultrasound probe (frequency 10 MHz, width 4 cm). It can be carried in a bag for on-the-go examinations, which makes it well suited to community and rural screening (Fig. 2). During scanning, the device detects breast nodules in real time. When a suspicious lesion is detected, the system issues both audio and visual alerts and outlines the lesion with a red bounding box. After localizing the lesion, the system describes its characteristics in detail, including morphology, boundary, margin, and echogenicity, and detects and analyzes associated features. Following the BI-RADS classification guideline, the AI system then evaluates the lesion comprehensively and assigns a BI-RADS category. The routine ultrasound device was the iU22 ultrasound diagnostic system (Philips, Netherlands) with an L9-3 probe (frequency range 3–9 MHz).
Fig. 2 Ultrasound physicians using the portable AI-assisted ultrasound diagnostic instrument for on-site screening
Participants in both groups were eligible for further diagnostic evaluation and treatment at Fudan University Shanghai Cancer Center. Recruitment began in January 2021 and was completed in December 2022. All participants were followed up until the end of 2023, and the database was locked in January 2024 for analysis.
Screening procedure
Our screening procedure is based on the National Breast Cancer Screening Process Technical Guidelines, issued in 2015 by the National Health Commission of the People’s Republic of China [27]. The procedure consists of the following steps (Fig. 3):
(1) Clinical examination and initial breast ultrasound screening: All enrolled participants underwent a clinical examination and an initial breast ultrasound (routine or AI-assisted). The results of the initial screening tests were classified according to BI-RADS.
(2) MAM rescreen: Participants assigned BI-RADS category 0 or 3 on the initial ultrasound were advised to undergo an MAM rescreen. The MAM results were also classified according to BI-RADS.
(3) Histopathological examination (biopsy): Participants assigned BI-RADS category 4 or 5 on either the initial ultrasound or MAM were advised to undergo biopsy. For those whose MAM rescreen result was category 0 or 3, either short-term follow-up (3–6 months) or biopsy was recommended, depending on the evaluation of breast specialists.
(4) Follow-up: Participants assigned BI-RADS category 1 or 2 on either the ultrasound or MAM were monitored closely through visits and phone calls conducted by trained medical social workers.
Fig. 3 Breast cancer screening flowchart based on the “Two Cancers (Breast and Cervical Cancer) Screening” project. The “initial ultrasound, MAM rescreen” strategy was applied
The final screening result (positive or negative) was determined using a two-step approach. In the first step, the initial result was assigned from the BI-RADS category of the breast ultrasound: negative for categories 1 or 2, positive for categories 4 or 5, and indeterminate for categories 0 or 3. Participants with indeterminate results were informed that suspicious lesions had been detected and required further MAM assessment. In the second step, indeterminate initial results were reclassified as negative or positive based on the MAM BI-RADS category.
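The two-step rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's software: the function names are hypothetical, and leaving a case "indeterminate" when the MAM result is missing or is itself category 0 or 3 is an assumption, since the text refers such cases to short-term follow-up or biopsy at the specialists' discretion.

```python
from typing import Optional

NEGATIVE, INDETERMINATE, POSITIVE = "negative", "indeterminate", "positive"


def classify_birads(category: int) -> str:
    """Map one BI-RADS category to a screening result."""
    if category in (1, 2):
        return NEGATIVE
    if category in (4, 5):
        return POSITIVE
    if category in (0, 3):
        return INDETERMINATE
    raise ValueError(f"unsupported BI-RADS category: {category}")


def final_result(ultrasound: int, mam: Optional[int] = None) -> str:
    """Two-step result: ultrasound first, MAM only for indeterminate cases."""
    initial = classify_birads(ultrasound)
    if initial != INDETERMINATE:
        return initial
    if mam is None:
        # Assumption: without an MAM rescreen the case stays indeterminate.
        return INDETERMINATE
    # Assumption: an MAM category of 0 or 3 also stays indeterminate here.
    return classify_birads(mam)


print(final_result(2))      # ultrasound category 2, no rescreen needed
print(final_result(3, 4))   # indeterminate ultrasound, MAM category 4
```

For example, `final_result(3, 4)` returns "positive" because the indeterminate ultrasound result is resolved by the MAM category.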
After receiving a negative final screening result, participants did not undergo any additional diagnostic procedures. However, those with a positive final screening result were referred for further diagnostic work-up to exclude or confirm a diagnosis of breast cancer at FUSCC.
Interval cancers were defined as cancers diagnosed after a negative or indeterminate screening test with no subsequent mammography or diagnostic work-up within 1 year, confirmed by linkage to regional cancer registries. To assess interval cancers, we collected data on all diagnosed breast cancers, covering an additional year of follow-up, from the Shanghai Municipal Cancer Registry, one of the largest cancer registries globally and an associate member of the International Association of Cancer Registries (IACR). For each patient diagnosed with breast cancer outside of the screening (i.e., diagnosed with interval cancer), we gathered all relevant medical records. Two experienced radiologists reviewed the screening ultrasound and MAM images from the study together with the clinical MAM and ultrasound images used for breast cancer diagnosis, and reached a consensus on whether the cancer could be identified retrospectively on the screening images.
Outcomes
The primary endpoint of this study was screening sensitivity, with higher sensitivity reflecting the detection of more true-positive cases. True-positive screening results were defined as positive screening tests in participants whose breast cancer was histologically confirmed through subsequent diagnostic evaluation. Screening sensitivity was calculated as the proportion of screen-detected breast cancers among all breast cancer cases (both screen-detected and interval cancers) identified within one year of follow-up in the screened population. Screen-detected cancers were defined as breast cancer cases confirmed by pathological diagnosis following a positive screening test result. Interval cancers were defined as breast cancers diagnosed either (1) after a negative screening test or (2) after an indeterminate screening result without subsequent diagnostic follow-up (including mammography or other diagnostic examinations). Interval cancers therefore included false negatives, true interval cancers, and cases in which a woman with a prior negative screening examination presented with symptoms during a normal screening interval and was found to have cancer.
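As a worked illustration of this definition, the following minimal Python sketch computes screening sensitivity from the two counts it depends on. The function name is hypothetical and the counts are invented for illustration; they are not study data.

```python
def screening_sensitivity(screen_detected: int, interval: int) -> float:
    """Screen-detected cancers divided by all cancers found within follow-up."""
    total = screen_detected + interval
    if total == 0:
        raise ValueError("no breast cancer cases; sensitivity is undefined")
    return screen_detected / total


# Hypothetical counts for illustration only (not study data).
example = screening_sensitivity(screen_detected=28, interval=12)
print(f"{example:.1%}")
```

With 28 screen-detected and 12 interval cancers, the denominator is 40 cases and the sensitivity is 28/40.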
The secondary outcome was the proportion of early-stage cancers among the cancers detected. Early-stage cancers were defined as those meeting any of the following criteria at the time of diagnosis: tumor size of less than 20 mm; no axillary lymph node metastasis; no distant metastasis; non-invasive cancer; or stage 0, stage I, or stage II according to the American Joint Committee on Cancer (AJCC) 8th Edition staging system [29].
Sample size consideration
We assumed a sensitivity of 40% for routine ultrasound screening and hypothesized that AI-assisted ultrasound screening would achieve a sensitivity of at least 70%. Assuming a two-sided significance level (α) of 0.05, 80% power, and a 5‰ breast lesion detection rate among Asian women, the estimated total sample size required was 16,800 participants, with 8,400 allocated to the conventional screening group and 8,400 to the AI-assisted screening group.
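The text does not state which formula was used for this calculation. As a hedged illustration, the sketch below applies one standard normal-approximation formula for comparing two independent proportions (pooled variance under the null hypothesis) to obtain the number of cancer cases needed per arm, then scales by the 5‰ rate to obtain participants per arm. The function names are hypothetical, a two-sided test is assumed, and no adjustment for clustering or attrition is included because none is described.

```python
import math
from statistics import NormalDist


def cancers_per_arm(p_control: float, p_intervention: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Cases per arm needed to compare two sensitivities (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_intervention) / 2
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = math.sqrt(p_control * (1 - p_control)
                       + p_intervention * (1 - p_intervention))
    n = (z_alpha * sd_null + z_beta * sd_alt) ** 2 / (p_control - p_intervention) ** 2
    return math.ceil(n)


def participants_per_arm(cancers: int, rate_per_mille: float) -> int:
    """Women to screen so that the expected number of cases reaches `cancers`."""
    return math.ceil(cancers * 1000 / rate_per_mille)


cases = cancers_per_arm(0.40, 0.70)
per_arm = participants_per_arm(cases, rate_per_mille=5)
print(cases, per_arm, 2 * per_arm)
```

Under these assumptions the sketch gives 42 cases and 8,400 participants per arm, which matches the stated figure, although that agreement does not confirm that this was the method used.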
Statistical analysis
Numerical data are presented as medians with interquartile ranges (IQRs), while categorical variables are expressed as percentages. Sensitivity was calculated by dividing the number of true-positive screenings by the total number of true positives and false negatives. Specificity was determined by dividing the number of true-negative screenings by the total number of true negatives and false positives. The positive predictive value (PPV) was estimated by dividing the number of participants with true-positive screenings by the total number of participants with positive screenings. Conversely, the negative predictive value (NPV) was calculated by dividing the number of participants with true-negative screenings by the total number of participants with negative screenings. The 95% confidence intervals (CIs) were calculated by bootstrapping with 5,000 samples. The significance of differences in incidence rates was assessed using Poisson regression. For categorical variables, we used Fisher’s exact test or the likelihood-based χ² test. All analyses were conducted using IBM SPSS (version 20).
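The accuracy measures and bootstrap intervals can be illustrated with a short Python sketch. The analyses themselves were run in SPSS, so this is not the authors' implementation. It uses the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), PPV = TP/(TP + FP), NPV = TN/(TN + FN)) and assumes a percentile bootstrap that resamples participants with replacement, because the interval type is not stated. The function names and counts are hypothetical.

```python
import random
from collections import Counter


def accuracy_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def bootstrap_ci(tp, fp, fn, tn, n_boot=5000, level=0.95, seed=0) -> dict:
    """Percentile bootstrap CIs (assumed method), resampling participants."""
    rng = random.Random(seed)
    labels, weights = ["tp", "fp", "fn", "tn"], [tp, fp, fn, tn]
    n = sum(weights)
    draws = {name: [] for name in ("sensitivity", "specificity", "ppv", "npv")}
    for _ in range(n_boot):
        counts = Counter(rng.choices(labels, weights=weights, k=n))
        try:
            resampled = accuracy_metrics(counts["tp"], counts["fp"],
                                         counts["fn"], counts["tn"])
        except ZeroDivisionError:
            continue  # skip degenerate resamples with an empty denominator
        for name, value in resampled.items():
            draws[name].append(value)
    tail = (1 - level) / 2
    intervals = {}
    for name, values in draws.items():
        values.sort()
        lower = values[int(tail * len(values))]
        upper = values[min(len(values) - 1, int((1 - tail) * len(values)))]
        intervals[name] = (lower, upper)
    return intervals


# Hypothetical 2x2 counts for illustration only (not study data).
point = accuracy_metrics(tp=30, fp=50, fn=10, tn=910)
ci = bootstrap_ci(tp=30, fp=50, fn=10, tn=910)
for name in point:
    print(name, round(point[name], 3), tuple(round(x, 3) for x in ci[name]))
```

Fixing the seed makes the resampling reproducible; a different seed gives slightly different interval limits.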