  • Agusti, A., Vogelmeier, C. F. & Halpin, D. M. G. Tackling the global burden of lung disease through prevention and early diagnosis. Lancet Respir. Med. 10, 1013–1015 (2022).

  • Ardila, D. et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25, 954–961 (2019).

  • Chen, Z., Song, Y., Chang, T.-H. & Wan, X. Generating radiology reports via memory-driven transformer. In Proc. Conference on Empirical Methods in Natural Language Processing (eds Webber, B. et al.) 1439–1449 (ACL, 2020); https://doi.org/10.18653/v1/2020.emnlp-main.112

  • OpenAI. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

  • Kirillov, A. et al. Segment anything. In Proc. IEEE/CVF International Conference on Computer Vision 4015–4026 (IEEE, 2023).

  • Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat. Med. 28, 31–38 (2022).

  • Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).

  • Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

  • Ma, J. et al. Segment anything in medical images. Nat. Commun. 15, 654 (2024).

  • Lei, W., Wei, X., Zhang, X., Li, K. & Zhang, S. MedLSAM: localize and segment anything model for 3D medical images. Med. Image Anal. 99, 103370 (2025).

  • Zhang, X., Wu, C., Zhang, Y., Xie, W. & Wang, Y. Knowledge-enhanced visual-language pre-training on chest radiology images. Nat. Commun. 14, 4542 (2023).

  • Zhang, S. et al. A multimodal biomedical foundation model trained from fifteen million image–text pairs. NEJM AI 2, AIoa2400640 (2025).

  • Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).

  • Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619, 357–362 (2023).

  • Topol, E. J. As artificial intelligence goes multimodal, medical applications multiply. Science 381, adk6139 (2023).

  • National Academies of Sciences, Engineering, and Medicine. Improving Diagnosis in Health Care (The National Academies Press, 2015).

  • Azizi, S. et al. Big self-supervised models advance medical image classification. In Proc. IEEE/CVF International Conference on Computer Vision 3458–3468 (IEEE, 2021); https://doi.org/10.1109/ICCV48922.2021.00346

  • Hosseinzadeh Taher, M. R., Haghighi, F., Gotway, M. B. & Liang, J. CAiD: context-aware instance discrimination for self-supervised learning in medical imaging. Proc. Mach. Learn. Res. 172, 535–551 (2022).

  • Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 6000–6010 (Curran Associates, 2017).

  • The National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N. Engl. J. Med. 365, 395–409 (2011).

  • Morozov, S. P. et al. MosMedData: chest CT scans with COVID-19 related findings dataset. Preprint at https://arxiv.org/abs/2005.06465 (2020).

  • Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).

  • Eslami, S., Meinel, C. & De Melo, G. PubMedCLIP: how much does CLIP benefit visual question answering in the medical domain? In Proc. Findings of the Association for Computational Linguistics (eds Vlachos, A. & Augenstein, I.) 1151–1163 (ACL, 2023).

  • Moor, M. et al. Med-Flamingo: a multimodal medical few-shot learner. In Proc. 3rd Machine Learning for Health Symposium (eds Hegselmann, S. et al.) 353–367 (PMLR, 2023).

  • Li, C. et al. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. In Proc. 37th Conference on Neural Information Processing Systems 28541–28564 (Curran Associates, 2023).

  • Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In Proc. International Conference on Learning Representations (OpenReview.net, 2021).

  • Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  • Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting of the Association for Computational Linguistics (eds Isabelle, P. et al.) 311–318 (ACL, 2002); https://doi.org/10.3115/1073083.1073135

  • Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. In Proc. Text Summarization Branches Out 74–81 (ACL, 2004).

  • Banerjee, S. & Lavie, A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In Proc. ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization (eds Goldstein, J. et al.) 65–72 (ACL, 2005).

  • Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171–4186 (ACL, 2019); https://doi.org/10.18653/v1/N19-1423

  • Fedus, W., Zoph, B. & Shazeer, N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res. 23, 1–39 (2022).

  • Houlsby, N. et al. Parameter-efficient transfer learning for NLP. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 2790–2799 (PMLR, 2019).

  • Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).

  • Luo, R. et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief. Bioinform. 23, bbac409 (2022).

  • Yang, X. et al. A large language model for electronic health records. npj Digit. Med. 5, 194 (2022).

  • Rasmy, L., Xiang, Y., Xie, Z., Tao, C. & Zhi, D. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digit. Med. 4, 86 (2021).

  • Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).

  • Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1930–1940 (2023).

  • Liu, S. et al. Multimodal data matters: language model pre-training over structured and unstructured electronic health records. IEEE J. Biomed. Health Inform. 27, 504–514 (2023).

  • Peiris, H., Hayat, M., Chen, Z., Egan, G. & Harandi, M. Uncertainty-guided dual-views for semi-supervised volumetric medical image segmentation. Nat. Mach. Intell. 5, 724–738 (2023).

  • Zhou, L. et al. Self pre-training with masked autoencoders for medical image classification and segmentation. In Proc. IEEE 20th International Symposium on Biomedical Imaging 1–6 (IEEE, 2023).

  • Hu, X., Xu, X. & Shi, Y. How to efficiently adapt large segmentation model (SAM) to medical images. Preprint at https://arxiv.org/abs/2306.13731 (2023).

  • Qiu, Z., Hu, Y., Li, H. & Liu, J. Learnable ophthalmology SAM. Preprint at https://arxiv.org/abs/2304.13425 (2023).

  • Cao, H. et al. Swin-Unet: Unet-like pure transformer for medical image segmentation. In Proc. Computer Vision–ECCV 2022 Workshops (eds Karlinsky, L. et al.) 205–218 (Springer, 2023).

  • Schäfer, R. et al. Overcoming data scarcity in biomedical imaging with a foundational multi-task model. Nat. Comput. Sci. 4, 495–509 (2024).

  • Pai, S. et al. Foundation model for cancer imaging biomarkers. Nat. Mach. Intell. 6, 354–367 (2024).

  • Tu, T. et al. Towards generalist biomedical AI. NEJM AI 1, AIoa2300138 (2024).

  • Zhou, H.-Y. et al. Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. Nat. Mach. Intell. 4, 32–40 (2022).

  • Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).

  • Zhang, K. et al. A generalist vision–language foundation model for diverse biomedical tasks. Nat. Med. https://doi.org/10.1038/s41591-024-03185-2 (2024).

  • Zhou, H.-Y., Adithan, S., Acosta, J. N., Topol, E. J. & Rajpurkar, P. MedVersa: a generalist foundation model for medical image interpretation. Preprint at https://arxiv.org/abs/2405.07988 (2025).

  • Yang, J. et al. Poisoning medical knowledge using large language models. Nat. Mach. Intell. 6, 1156–1168 (2024).

  • Jin, C. et al. Development and evaluation of an artificial intelligence system for COVID-19 diagnosis. Nat. Commun. 11, 5088 (2020).

  • Chen, X., Fan, H., Girshick, R. & He, K. Improved baselines with momentum contrastive learning. Preprint at https://arxiv.org/abs/2003.04297 (2020).

  • Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning (eds Daumé III, H. & Singh, A.) 1597–1607 (PMLR, 2020).

  • He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 9726–9735 (IEEE, 2020).

  • van den Oord, A., Li, Y. & Vinyals, O. Representation learning with contrastive predictive coding. Preprint at https://arxiv.org/abs/1807.03748 (2019).

  • He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 15979–15988 (IEEE, 2022); https://doi.org/10.1109/CVPR52688.2022.01553

  • Brody, S., Alon, U. & Yahav, E. How attentive are graph attention networks? In Proc. International Conference on Learning Representations (OpenReview.net, 2022).

  • Pelka, O., Koitka, S., Rückert, J., Nensa, F. & Friedrich, C. M. Radiology objects in context (ROCO): a multimodal image dataset. In Proc. Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis (eds Stoyanov, D. et al.) 180–189 (Springer, 2018).

  • Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. In Proc. 37th Conference on Neural Information Processing Systems 34892–34916 (Curran Associates, 2023).

  • Abnar, S. & Zuidema, W. Quantifying attention flow in transformers. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 4190–4197 (ACL, 2020); https://doi.org/10.18653/v1/2020.acl-main.385

  • Chefer, H., Gur, S. & Wolf, L. Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers. In Proc. IEEE/CVF International Conference on Computer Vision 387–396 (IEEE, 2021); https://doi.org/10.1109/ICCV48922.2021.00045

  • Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In Proc. International Conference on Learning Representations (OpenReview.net, 2019).

  • Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 8024–8035 (Curran Associates, 2019).

  • Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  • Ma, L. D. et al. MedMPT. GitHub https://github.com/maliangdi/MedMPT (2025).