{"id":132405,"date":"2025-08-09T18:46:11","date_gmt":"2025-08-09T18:46:11","guid":{"rendered":"https:\/\/www.europesays.com\/us\/132405\/"},"modified":"2025-08-09T18:46:11","modified_gmt":"2025-08-09T18:46:11","slug":"lessons-for-synthetic-data-from-care-datas-past","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/132405\/","title":{"rendered":"Lessons for synthetic data from care.data\u2019s past"},"content":{"rendered":"<p>Confidentiality<\/p>\n<p>Despite the scheme\u2019s legality, the risk of breaching patient confidentiality through re-identification became a central argument for opponents of care.data, including medConfidential<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 10\" title=\"Vezyridis, P. Kindling the fire&#x2019; of NHS patient data exploitations: the care.data controversy in news media discourses. Soc. Sci. Med. 348, 116824 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR10\" id=\"ref-link-section-d133932255e449\" target=\"_blank\" rel=\"noopener\">10<\/a>. Although pseudonymisation removes direct identifiers from data, a risk of re-identification persists. For this reason, pseudonymised data remains subject to the UK General Data Protection Regulation (UK GDPR) rules that apply to personal data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 11\" title=\"Lodie, A. &amp; Lauradoux, C. Is It Personal Data?: solving the Gordian Knot of Anonymisation. in Privacy Symposium 2024 (eds. Hoepman, J.-H., Jensen, M., Porcedda, M. G., Schiffner, S. &amp; Ziegler, S.) 83&#x2013;109 (Springer Nature, 2025).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR11\" id=\"ref-link-section-d133932255e453\" target=\"_blank\" rel=\"noopener\">11<\/a>. 
Beyond public stakeholders, professional groups such as the British Medical Association (BMA) and the Royal College of General Practitioners (RCGP) also raised concerns that confidentiality risks could damage the patient-doctor relationship and, in turn, patient care. NHS England\u2019s failure to reassure the public and professional bodies that the risk of re-identification was low enough to warrant data sharing contributed to the abandonment of care.data. Although synthetic data differs from pseudonymised real-world data, both carry a risk of re-identification, so confidentiality is a critical consideration for synthetic data policy.<\/p>\n<p>Using privacy metrics to stratify synthetic datasets into risk categories (e.g., low, medium and high) would help policymakers mitigate risks proportionately and reassure the public and professional bodies about re-identification risk. Privacy risk thresholds should be agreed at a national level by cross-functional teams, led by the relevant government department, that bridge technical knowledge and sector-specific insight. Low-fidelity synthetic data carries the lowest re-identification risk and would warrant far less stringent requirements than current NHS data access processes, allowing greater data sharing without compromising privacy. For example, as part of an NHS pilot, low-fidelity synthetic data generated from aggregate Hospital Episode Statistics data is currently publicly available to download<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 12\" title=\"NHS Digital. Artificial Data Pilot. (NHS Digital, 2025).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR12\" id=\"ref-link-section-d133932255e460\" target=\"_blank\" rel=\"noopener\">12<\/a>. 
Medium-risk synthetic datasets carry a higher re-identification risk and could therefore be protected by additional safeguards, such as access through Trusted Research Environments (TREs). This risk stratification also serves as a design criterion for organisations generating synthetic data: if they cannot support a TRE, they would prioritise generating low-fidelity synthetic data. For synthetic datasets with the highest re-identification risk, the same processes required for real-world data access should apply. Organisations that need high-fidelity synthetic data may therefore choose to pursue real-world data instead, since the access requirements would be the same.<\/p>\n<p>Consent<\/p>\n<p>A significant criticism of care.data centred on the failure to obtain adequate consent from patients, which many saw as a violation of patient autonomy<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 13\" title=\"Stancic, H. Trust and Records in an Open Digital Environment. (Routledge, 2021).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR13\" id=\"ref-link-section-d133932255e472\" target=\"_blank\" rel=\"noopener\">13<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 14\" title=\"Hays, R. &amp; Daker-White, G. The care.data consensus? A qualitative analysis of opinions expressed on Twitter. BMC Public Health 15, 838 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR14\" id=\"ref-link-section-d133932255e475\" target=\"_blank\" rel=\"noopener\">14<\/a>. 
Strategies suggested by NHS England to obtain informed consent included posters displayed in GP practices and leaflets delivered to homes<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Sterckx, S., Rakic, V., Cockbain, J. &amp; Borry, P. &#x201C;You hoped we would sleep walk into accepting the collection of our data&#x201D;: controversies surrounding the UK care.data scheme and their wider relevance for biomedical research. Med. Health Care Philos. 19, 177&#x2013;190 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR15\" id=\"ref-link-section-d133932255e479\" target=\"_blank\" rel=\"noopener\">15<\/a>. The obvious problems with these methods are the assumption that they will be read and the exclusion of patients for whom written text is not accessible (e.g., because of language barriers or low literacy). Furthermore, the unaddressed leaflets were often mistaken for junk mail, and households that had opted out of junk mail deliveries were missed altogether<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"McCartney, M. Care.data doesn&#x2019;t care enough about consent. BMJ 348, g2831 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR16\" id=\"ref-link-section-d133932255e483\" target=\"_blank\" rel=\"noopener\">16<\/a>. Those who did receive and read the leaflet reported that it did not mention care.data by name and lacked detailed information about the risks, including the possibility of re-identification, and about how to opt out<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 16\" title=\"McCartney, M. Care.data doesn&#x2019;t care enough about consent. BMJ 348, g2831 (2014).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR16\" id=\"ref-link-section-d133932255e487\" target=\"_blank\" rel=\"noopener\">16<\/a>.<\/p>\n<p>For synthetic data efforts to succeed, public acceptance is key. Although data-sharing initiatives may be lawful, legal authority does not always equate to social legitimacy<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Carter, P., Laurie, G. T. &amp; Dixon-Woods, M. The social licence for research: why care.data ran into trouble. J. Med Ethics 41, 404&#x2013;409 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR17\" id=\"ref-link-section-d133932255e494\" target=\"_blank\" rel=\"noopener\">17<\/a>. Carter et al.<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Carter, P., Laurie, G. T. &amp; Dixon-Woods, M. The social licence for research: why care.data ran into trouble. J. Med Ethics 41, 404&#x2013;409 (2015).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR17\" id=\"ref-link-section-d133932255e498\" target=\"_blank\" rel=\"noopener\">17<\/a> explore this further, arguing that data-sharing initiatives rely on a \u2018social contract\u2019 that requires trust and transparency in managing data, so that patients continue to consent to its use for research. Patients must understand what synthetic data is, its risks and benefits, and their rights as data subjects. Without adequate information, synthetic data initiatives are likely to face opposition similar to that faced by care.data, owing to a breakdown in the social contract on which data-sharing initiatives depend. 
Policymakers should therefore prioritise meaningful engagement with Patient and Public Involvement and Engagement (PPIE) groups.<\/p>\n<p>Transparency<\/p>\n<p>Another significant criticism that contributed to the abandonment of care.data was the lack of transparency about who would be able to access the data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Sterckx, S., Rakic, V., Cockbain, J. &amp; Borry, P. &#x201C;You hoped we would sleep walk into accepting the collection of our data&#x201D;: controversies surrounding the UK care.data scheme and their wider relevance for biomedical research. Med. Health Care Philos. 19, 177&#x2013;190 (2016).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR15\" id=\"ref-link-section-d133932255e510\" target=\"_blank\" rel=\"noopener\">15<\/a>. In response to these concerns, the Care Act 2014 was amended to prohibit data release to certain commercial companies, namely marketing and insurance firms<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 10\" title=\"Vezyridis, P. Kindling the fire&#x2019; of NHS patient data exploitations: the care.data controversy in news media discourses. Soc. Sci. Med. 348, 116824 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR10\" id=\"ref-link-section-d133932255e514\" target=\"_blank\" rel=\"noopener\">10<\/a>. Although research shows that public concern about commercial use of health data decreases when conditions are applied, such as requiring data access requests to demonstrate a clear public benefit, care.data\u2019s clarifications came too late<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Kalkman, S. et al. Patients&#x2019; and public views and attitudes towards the sharing of health data for research: a narrative review of the empirical evidence. J. Med. Ethics 48, 3&#x2013;13 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR18\" id=\"ref-link-section-d133932255e518\" target=\"_blank\" rel=\"noopener\">18<\/a>. More recently, NHS plans for data sharing through a federated data platform have also faced obstacles, largely because of the controversy surrounding the award of the contract to Palantir, a multibillion-dollar US technology company<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Abbasi, K. Trust and the Palantir question. BMJ 388, r452 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR19\" id=\"ref-link-section-d133932255e522\" target=\"_blank\" rel=\"noopener\">19<\/a>.<\/p>\n<p>In light of care.data\u2019s transparency failures, synthetic data initiatives must clearly communicate to patients who the intended users are before roll-out. To safeguard patients\u2019 interests and build trust, external organisations requesting access to medium- and high-fidelity synthetic data should have their rationale for data use vetted to ensure it serves the public benefit. In addition, patients should be able to opt out of having their real data used to generate synthetic data, and to choose whether commercial entities may be given access. 
This would ease the concerns of individuals who oppose commercial access while respecting patient autonomy and choice.<\/p>\n<p>Given the current controversy over NHS England\u2019s award of a \u00a3480\u2009million contract to Palantir to build and run the platform, synthetic data initiatives must be transparent about who is creating the data<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Abbasi, K. Trust and the Palantir question. BMJ 388, r452 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR19\" id=\"ref-link-section-d133932255e532\" target=\"_blank\" rel=\"noopener\">19<\/a>. A designated public body within the NHS would be best placed to own the creation of synthetic data and the management of access to it, a move that would reassure the public and avoid the controversies currently surrounding other privacy-enhancing technology (PET) initiatives. Where outsourcing is necessary, conflicts of interest must be properly assessed and published so that partners can be judged trustworthy to manage sensitive data. By contrast with the Palantir contract, the OpenSAFELY federated platform, a publicly funded collaborative project, has gained the support of the BMA, RCGP and medConfidential, highlighting that trust in the same technology depends on who manages the platform<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 20\" title=\"Mahase, E. Researchers could soon access GP patient data&#x2014;how will it work? 
BMJ 388, r375 (2025).\" href=\"http:\/\/www.nature.com\/articles\/s41746-025-01928-0#ref-CR20\" id=\"ref-link-section-d133932255e536\" target=\"_blank\" rel=\"noopener\">20<\/a>.<\/p>\n<p>In summary, synthetic data shows promise for addressing limited data availability and data imbalance when developing AI models. For synthetic data initiatives to succeed in a UK context, lessons from previous endeavours such as care.data must be learned by prioritising patient confidentiality, consent and organisational transparency, with the ultimate aim of improving patient care.<\/p>\n","protected":false},"excerpt":{"rendered":"Confidentiality The risk of breaching patient confidentiality through re-identification became a key tenet that opponents of care.data, including&hellip;\n","protected":false},"author":3,"featured_media":132406,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[35],"tags":[15576,150,80285,30258,834,454,210,1141,1142,3228,3740,3209,67,132,68],"class_list":{"0":"post-132405","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-health-care","8":"tag-biomedicine","9":"tag-biotechnology","10":"tag-data-acquisition","11":"tag-ethics","12":"tag-general","13":"tag-government","14":"tag-health","15":"tag-health-care","16":"tag-healthcare","17":"tag-law","18":"tag-medical-research","19":"tag-medicine-public-health","20":"tag-united-states","21":"tag-unitedstates","22":"tag-us"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@us\/115000297561060292","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/132405","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/
v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=132405"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/132405\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/132406"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=132405"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=132405"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=132405"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}