<p>The rise in publications addressing the use of generative artificial intelligence (GAI), namely large language models (LLMs), for health purposes has generated the need to guide authors on transparent reporting practices<sup>1,2</sup>. Although LLMs currently dominate, other GAI applications such as diffusion models and large multimodal models are gaining popularity<sup>3</sup>. One key distinction between GAI and conventional AI is the ability of GAI to create new information based on its training data. Varying methodology and incomplete reporting among studies applying GAI for health purposes compromise the ability of readers to accurately interpret study findings<sup>3</sup>, a particularly relevant issue when evaluating the effectiveness of complex GAI platforms in a healthcare context.</p>
<p>GAI models are now used to address a variety of research questions across alternative study designs, which require novel reporting guidelines<sup>4</sup>.
While more than 25 reporting guidelines address studies applying artificial intelligence or machine learning in a healthcare context, very few reporting standards apply to studies involving GAI applications in healthcare, and fewer still adhere to contemporary methodological standards<sup>5–8</sup>. As journal editors adopt these reporting standards, investigators may be encouraged to complete and submit checklists and methodological diagrams alongside their submissions to optimize the transparent reporting of their methods. Authors applying GAI models in healthcare must therefore carefully identify the most appropriate reporting guideline for their study, as these standards contain tailored items for studies involving GAI models<sup>5–8</sup>.
The purpose of this article is to summarize current GAI reporting guidelines of contemporary rigor and to highlight those in development.</p>
<p>Reporting guidelines for GAI</p>
<p>Selecting the most suitable reporting guideline will generally depend on the research aims. Figure 1 provides a list of potential research aims currently addressed by reporting guidelines. At the time of writing, LLMs are the predominant GAI models being evaluated in the healthcare context, though other popular examples include diffusion models and large multimodal models<sup>9</sup>. Studies involving LLMs are addressed by the Chatbot Assessment Reporting Tool (CHART), the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)-LLM, or the Generative Artificial intelligence tools in MEdical Research (GAMER) statement<sup>5–8</sup>.</p>
<p><b id="Fig1">Fig. 1</b> Overview of GAI reporting guidelines<sup>17–25</sup>. <img src="https://www.europesays.com/uk/wp-content/uploads/2025/11/41746_2025_2113_Fig1_HTML.png" alt="figure 1" width="685" height="827"/></p>
<p>Clinical evidence summaries and health advice</p>
<p>CHART provides reporting recommendations for studies evaluating any GAI model or GAI-driven chatbot that summarizes clinical evidence and provides health advice, termed Chatbot Health Advice (CHA) studies<sup>6,8</sup>. CHART can also be applied to studies of standalone GAI models, provided the model interacts with users in natural language, such as through an application programming interface. Investigators should apply CHART for CHA studies evaluating a single GAI model or GAI-driven chatbot, as well as in comparative studies between multiple GAI models or chatbots<sup>6,8</sup>.
The framework is also relevant for evaluations of tuned or fine-tuned GAI models or chatbots that provide tailored evidence summaries or health advice. While examples are provided in Fig. 1, CHART's scope includes clinical evidence or health advice related to health prevention, screening, diagnosis, treatment, prognosis, and general health information<sup>6,8</sup>.</p>
<p>Model development, document generation, and outcome prediction</p>
<p>Authors may apply TRIPOD-LLM across a wide range of use cases, from de novo LLM development to using LLMs for generating medical documents or predicting outcomes from patient data<sup>5</sup>. The TRIPOD-LLM authors also recommend its use for studies assessing an LLM's capability in tasks such as:</p>
<ul>
<li>Text processing (e.g., identifying predefined categories of objects in a body of data, or named entity recognition)<sup>5</sup>.</li>
<li>Classification (e.g., determining whether a clinic note uses a patient's pronouns correctly).</li>
<li>Information retrieval (e.g., training a GAI model to respond to user queries using relevant publications)<sup>5</sup>.</li>
<li>Summarization (e.g., translating clinical documents into specific languages for patients).</li>
</ul>
<p>Figure 1 outlines further use cases, as does the original TRIPOD-LLM publication<sup>5</sup>. The reporting recommendations are suitable for evaluations of a single LLM or comparisons among multiple LLMs.</p>
<p>Applying GAI for manuscript writing</p>
<p>Studies discussed thus far have evaluated GAI model performance against specific study objectives. However, there is growing interest in applying GAI models to assist in manuscript writing across traditional research designs<sup>7</sup>.
Rather than focusing on model performance, the GAMER reporting guideline provides recommendations for studies in which all or portions of a medical research manuscript are written by a GAI model<sup>7</sup>. For example, authors may apply GAMER if they use a GAI model to assist in writing a case report. Figure 1 lists additional examples.</p>
<p>Strengths and limitations of current reporting guidelines</p>
<p>All reporting guidelines described above followed methodological guidance from the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network, an international initiative to improve the transparency of health research<sup>10,11</sup>. These reporting guidelines currently apply to LLMs, and CHART and TRIPOD-LLM are designed as living documents that will be updated periodically in response to advances in the field<sup>5,6,8</sup>. Authors applying conventional study designs such as randomized controlled trials or cohort studies should continue to adhere to relevant tools such as the CONsolidated Standards Of Reporting Trials (CONSORT) and the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) reporting guidelines in addition to those described here<sup>5,12</sup>.</p>
<p>One strength of the CHART reporting guideline was its input from a broad, interdisciplinary group of 531 stakeholders during its Delphi consensus. Though it is highly applicable to CHA studies, its scope is narrow. In contrast, TRIPOD-LLM applies to a multitude of use cases involving LLMs, though the applicability of each checklist item may depend on the specific use case.
While the GAMER checklist is concise and specifically relevant to medical research, it may lack important items included in other reporting guidelines.</p>
<p>Reporting guidelines in development</p>
<p>Multiple reporting guidelines are in development, including the ChatGPT, Generative Artificial Intelligence and Natural Large Language Models for ACcountable Reporting and Use (CANGARU) guideline<sup>13</sup>. CANGARU is being developed according to robust methodological standards involving a living systematic review, a Delphi consensus, and panel consensus meetings among international, multidisciplinary stakeholders<sup>14</sup>. Once published, the CANGARU guidelines may interest investigators using LLMs in academic research and scientific writing.
The CANGARU guidelines will apply not only to studies within medicine but also to studies using LLMs for manuscript writing in other scientific sectors<sup>14</sup>.</p>
<p>In health economics, investigators have initiated the ELEVATE-GenAI framework, with 10 preliminary checklist items developed through a targeted literature review, iterative discussion, and usability testing for both systematic reviews and health economic modeling<sup>15</sup>. It currently consists of a structured framework and a checklist for practical implementation that uses a scoring system, with a maximum of 3 points awarded per domain.
The authors are planning stakeholder consultation across various disciplines through a Delphi consensus to improve the validity of the tool<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Fleurence, R. L. et al. ELEVATE-GenAI: reporting guidelines for the use of large language models in health economics and outcomes research: an ISPOR Working Group on Generative AI Report. Value Health 11, 1&#x2013;57 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-02113-z#ref-CR15\" id=\"ref-link-section-d301500351e815\" target=\"_blank\" rel=\"noopener\">15<\/a>.<\/p>\n<p>In contrast, the Consolidated Criteria for Reporting Qualitative Research (COREQ) extension for LLMs (COREQ-LLM) will address studies employing LLMs for qualitative research<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Fehring, L. et al. Reporting of qualitative research using large language models (COREQ+LLM): protocol for an extension of the consolidated criteria for reporting qualitative research guideline. JMIR Res. Protoc. 14, 1&#x2013;19 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-02113-z#ref-CR16\" id=\"ref-link-section-d301500351e822\" target=\"_blank\" rel=\"noopener\">16<\/a>. COREQ-LLM will be developed following a systematic scoping review and Delphi consensus to identify checklist items to aid in the transparent reporting of qualitative research involving LLMs. It is anticipated that this reporting guideline will address current trends in qualitative research in which LLMs are used to support research design, data processing, analysis, interpretation, and direct interaction with qualitative data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"Fehring, L. 
et al. Reporting of qualitative research using large language models (COREQ+LLM): protocol for an extension of the consolidated criteria for reporting qualitative research guideline. JMIR Res. Protoc. 14, 1&#x2013;19 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-02113-z#ref-CR16\" id=\"ref-link-section-d301500351e826\" target=\"_blank\" rel=\"noopener\">16<\/a>.<\/p>\n<p>These represent the first iterations of reporting guidelines addressing the landscape of GAI research in healthcare. They cover both the development of GAI models and the use of GAI models for manuscript writing, summarizing clinical evidence, providing health advice, or predicting health outcomes using electronic health records. Clinicians, researchers, journal editors, and publishers should note that these reporting guidelines apply to any study evaluating the use of GAI models for health purposes. Future iterations, extensions, and\/or new reporting guidelines will keep pace with this rapidly evolving field. Researchers must remain up-to-date with the literature and continue to apply the most relevant reporting standards to their work as the field moves toward the safe and responsible integration of GAI technology in healthcare. Journal editors and publishers must also be alert to updates in the GAI field and continue to encourage authors to adhere to relevant reporting standards. 
We will perform a living systematic survey of GAI-oriented reporting guidelines to help readers remain up-to-date with the dynamically evolving environment of GAI literature.<\/p>