<h1>Concerns raised over Foresight AI trained on 57 million NHS medical records</h1>
<p><img decoding="async" class="Image" alt="The Foresight AI model uses data taken from hospital and family doctor records in England" width="1350" height="901" src="https://www.europesays.com/uk/wp-content/uploads/2025/05/SEI_250200236.jpg" loading="eager" fetchpriority="high"/></p>
<p class="ArticleImageCaption__Title">The Foresight AI model uses data taken from hospital and family doctor records in England</p>
<p class="ArticleImageCaption__Credit">Hannah McKay/Reuters/Bloomberg via Getty Images</p>
<p>An artificial intelligence model trained on the medical data of 57 million people who have used the National Health Service in England could one day help doctors predict disease or forecast hospitalisation rates, its creators claim. However, other researchers say there are still significant privacy and data protection concerns around such large-scale use of health data, and even the AI’s architects concede they cannot guarantee that it won’t inadvertently reveal sensitive patient data.</p>
<p>The model, called Foresight, was <a href="https://www.newscientist.com/article/2356001-medical-ais-are-advancing-when-will-they-be-in-a-clinic-near-you/" target="_blank" rel="noopener">first developed in 2023</a>. 
That initial version used OpenAI’s GPT-3, the large language model (LLM) behind the first version of ChatGPT, and was trained on 1.5 million real patient records from two London hospitals.</p>
<p>Now, <a href="https://www.ctomlinson.net/" target="_blank" rel="noopener">Chris Tomlinson</a> at University College London and his colleagues have scaled up Foresight to create what they say is the world’s first “national-scale generative AI model of health data” and the largest of its kind.</p>
<p>Foresight is based on Meta’s open-source LLM Llama 2 and draws on eight different datasets of medical information routinely collected by the NHS in England between November 2018 and December 2023. These datasets include outpatient appointments, hospital visits and vaccination records, comprising a total of 10 billion different health events for 57 million people – essentially everyone in England.</p>
<p>Tomlinson says his team isn’t releasing information about how well Foresight performs because the model is still being tested, but he claims it could one day be used for everything from making individual diagnoses to predicting broad future health trends, such as hospitalisations or heart attacks. “The real potential of Foresight is to predict disease complications before they happen, giving us a valuable window to intervene early, and enabling a shift towards more preventative healthcare at scale,” he told a press conference on 6 May.</p>
<p>While the potential benefits are yet to be demonstrated, there are already concerns about people’s medical data being fed to an AI at such a large scale. 
The researchers insist all records were “de-identified” before being used to train the AI, but the <a href="https://www.newscientist.com/article/2210928-anonymised-data-isnt-nearly-anonymous-enough-heres-how-we-fix-it/" target="_blank" rel="noopener">risks of someone using patterns in the data to re-identify records are well documented</a>, particularly when it comes to large datasets.</p>
<p>“Building powerful generative AI models that protect patient privacy is an open, unsolved scientific problem,” says <a href="https://www.oii.ox.ac.uk/people/profiles/luc-rocher/" target="_blank" rel="noopener">Luc Rocher</a> at the University of Oxford. “The very richness of data that makes it valuable for AI also makes it incredibly hard to anonymise. These models should remain under strict NHS control where they can be safely used.”</p>
<p>“The data that goes into the model is de-identified, so the direct identifiers are removed,” said <a href="https://digital.nhs.uk/people/nhs-digital/data-services/michael-chapman" target="_blank" rel="noopener">Michael Chapman</a> at NHS Digital, speaking at the press conference. But Chapman, who oversees the data used to train Foresight, admitted that there is always a risk of re-identification: “It’s then very hard with rich health data to give 100 per cent certainty that somebody couldn’t be spotted in that dataset.”</p>
<p>To mitigate this risk, Chapman said the AI operates within a custom-built “secure” NHS data environment, which is intended to ensure that information isn’t leaked out of the model and is accessible only to approved researchers. 
Amazon Web Services and the data company Databricks have also supplied “computational infrastructure”, but cannot access the data, said Tomlinson.</p>
<p><a href="http://www.demontjoye.com/" target="_blank" rel="noopener">Yves-Alexandre de Montjoye</a> at Imperial College London says one way to check whether models can reveal sensitive information is to verify whether they have memorised data seen during training. When asked by New Scientist whether the Foresight team had conducted such tests, Tomlinson said it hadn’t, but that it was looking at doing so in the future.</p>
<p>Using such a vast dataset without telling people how their data is being used can also weaken public trust, says <a href="https://www.oxford-aiethics.ox.ac.uk/caroline-emmer-de-albuquerque-green" target="_blank" rel="noopener">Caroline Green</a> at the University of Oxford. “Even if it is being anonymised, it’s something that people feel very strongly about from an ethical point of view, because people usually want to keep control over their data and they want to know where it’s going.”</p>
<p>But existing controls give people little chance to opt out of their data being used by Foresight. 
All of the data used to train the model comes from nationally collected NHS datasets and, because it has been “de-identified”, <a href="https://digital.nhs.uk/services/national-data-opt-out/understanding-the-national-data-opt-out#is-the-use-or-disclosure-confidential-patient-information-" target="_blank" rel="noopener">existing opt-out mechanisms don’t apply</a>, says an NHS England spokesperson, though data from people who have chosen not to share their family doctor records won’t be fed into the model.</p>
<p>Under the <a href="https://www.newscientist.com/article/mg23831794-700-those-gdpr-emails-should-stop-soon-but-our-data-nightmare-wont/" target="_blank" rel="noopener">General Data Protection Regulation</a> (GDPR), people must have the option to withdraw consent for the use of their personal data, but because of the way LLMs like Foresight are trained, it isn’t possible to remove a single record from the model. The NHS England spokesperson says that “as the data used to train the model is anonymised, it is not using personal data and GDPR would therefore not apply”.</p>
<p>Exactly how the GDPR should address the impossibility of removing data from an LLM is an <a href="https://arxiv.org/abs/2307.03941" target="_blank" rel="noopener">untested legal question</a>, but the UK Information Commissioner’s Office’s website states that “de-identified” should not be used as a synonym for anonymous data. 
“This is because UK data protection law doesn’t define the term, so using it can lead to confusion,” <a href="https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/data-sharing/anonymisation/introduction-to-anonymisation/" target="_blank" rel="noopener">it states</a>.</p>
<p>The legal position is further complicated because Foresight is currently being used only for research related to covid-19, says Tomlinson. That means exceptions to data protection laws enacted during the pandemic still apply, says Sam Smith at <a href="https://medconfidential.org/" target="_blank" rel="noopener">medConfidential</a>, a UK data privacy organisation. “This covid-only AI almost certainly has patient data embedded in it, which cannot be let out of the lab,” he says. “Patients should have control over how their data is used.”</p>
<p>Ultimately, the competing rights and responsibilities around using medical data for AI leave Foresight in an uncertain position. “There is a bit of a problem when it comes to AI development, where the ethics and people are a second thought, rather than the starting point,” says Green. 
“But what we need is the humans and the ethics need to be the starting point, and then comes the technology.”</p>
<p>Article amended on 7 May 2025: We have correctly attributed comments made by an NHS England spokesperson.</p>
<p class="ArticleTopics__Heading">Topics: data, health, healthcare, privacy, UK</p>