A large government study published Thursday shows more definitively than ever before that Americans’ self-reported race is a poor proxy for their genetic ancestry. Researchers said the findings have major implications for the way health disparities are studied, and how they are discussed in the public sphere.
The new paper offered more nuance and consideration to the complicated relationship between race and genetics than past studies, outside commentators said. Its massive dataset and National Institutes of Health authors give authority to its conclusions, which arrive amid a heated debate over the role racial categories play in research as the Trump administration has targeted grants it deems related to “diversity, equity, and inclusion” as being “unscientific.”
In the study, published Thursday in the American Journal of Human Genetics, researchers analyzed the genomes of more than 200,000 participants in the All of Us cohort, which was established by the NIH to create a dataset that accurately represented the makeup of the United States.
Outside scientists said the study drives home the important differences between race and genetic ancestry. “The clear message here is that these are two distinct constructs, they mean different things, and they should not be used interchangeably,” said Luisa Borrell, a social epidemiologist at the CUNY School of Public Health who worked on a 2023 National Academies report on the use of race in genetics research.
Jonathan Kahn, a legal and historical scholar at Northeastern Law School who just published a book on the use of diversity in biology and in law, added, “I thought it was thoughtful and much more nuanced than the kind of language geneticists were using about race and variation, even five years ago, let alone 20 years ago. There was a lot of movement towards a subtler, more well-considered examination and articulation of relationships between race and genetic variation.”
The study compared people’s self-reported race with their genetic ancestry, and across several different analyses it points to a much fuzzier picture of race than one in which racial groups are distinct and easily defined.
In one case, it used body mass index as an example of how broad racial categories can be misleading. The study found that those with West African ancestry were predisposed to have a high BMI while those with East African ancestry were predisposed to not have a high BMI, but those groups may be lumped together if a study just accounts for someone having African ancestry.
The study also revealed how differences in race mapped out across the United States, which is likely the result of migration to different parts of the country from different parts of the world. The paper also homes in on Latinos as an example where a socially defined group of people doesn’t map neatly onto genetic ancestry.
“As a representation of the general U.S. population and how complicated it is, this is a pretty big advance, and an important set of results,” said Sasha Gusev, a statistical geneticist at the Dana-Farber Cancer Institute. He noted that other studies have pointed out the incongruence between race and genetic ancestry, but those have typically been from smaller biobanks that intentionally collect data from racially minoritized groups. He added that the kind of geographic analyses the study did could only be done because it was a national cohort. “It also speaks to the fact that not only is racial self-identification a construct, but it’s one that differs across different parts of the country.”
The study could have a profound influence on the way geneticists conduct and frame their work. Historically, the idea that race was inextricably linked to biology was used to justify health disparities and other forms of racial bias.
There have generally been two camps in the field: one that claims race is a social construct and that a biological understanding of race is too simplistic, and another that says races are discrete, identifiable groups and that race is the primary driver of a person’s trajectory. The new study reinforces the shortcomings of using socially defined racial categories in genetics work.
“Race and ethnicity are poor proxies for genetic ancestry; therefore, biomedical research should adjust directly for ancestries estimated from genetic data rather than relying on self-identified race or ethnicity,” the study’s lead author, Charles Rotimi, scientific director of the National Human Genome Research Institute, wrote in an email to reporters.
Just minutes after that email was sent, an NIH spokesperson responded, “Hi all – please hold on using these responses until you hear from me,” but did not follow up by the time of publication.
STAT reached out to the paper’s co-authors outside of the NIH. Only one responded, writing that “all communication related to the paper is under review by HHS, and we do not have approval to participate in press interviews.”
The screening of scientists’ communications contrasts with NIH Director Jay Bhattacharya’s promise to foster a culture of free speech. Bhattacharya’s boss, health secretary Robert F. Kennedy Jr., has similarly pledged to promote “radical transparency.”
After publication, NIH made Rotimi available for an interview. He said the study results point to race being the wrong way to think about a person’s risk of disease — and should encourage researchers to think of people on a more individual level. The results don’t necessarily invalidate previous work that has found genetic differences between races, he said, but “those past findings are not as precise as they should have been, or they should be. As we gather more data with large cohorts like All of Us, we should be in a better position to truly characterize people.”
A fraught history
The current study is part of a long and often fraught history linking genetics and ideas of race.
Since Charles Darwin proposed the basics of what would become the field of genetics, people have tried to apply its principles to racial groups. Soon after Darwin proposed his theory of evolution, his cousin, Francis Galton, created the foundations of what would become the eugenics movement. On the first page of his book “Hereditary Genius,” Galton writes he was interested in understanding “the mental peculiarities of different races.”
“He spent his career trying to, in the primitive ways that were available at the time, prove that Europeans were biologically superior. It’s continuous, since then, it’s never, ever been dropped,” said Eric Turkheimer, a behavior geneticist at the University of Virginia who has written about the way Trump’s Make America Great Again movement could affect the field of genomics.
The All of Us cohort was launched in 2018 to create a database of genetic data that more accurately represented the U.S. — but even it has had a spotty past when it comes to discussing race.
The program caught some flak last year for a paper published by some of its scientists because it used a method of genetic visualization, called UMAP, that can artificially create distinct clusters in data. Several geneticists argued that it amplified genetic differences between racial groups in a misleading way. The worry at the time was that the figure could send the message that the research was reifying a genetic basis for racial groups, a concern that was only heightened because genetics research has at times been used to justify white supremacy.
“The majority of population geneticists try not to promote the narrative that there are these discrete genetic racial clusters, because that’s not what the data tells us, not what the last 70 years of research tell us,” said Jedidiah Carlson, a population geneticist who has studied how some population genetics data are used in right-wing circles. While he has not studied the 2024 paper specifically, he did note it seems to be shared frequently by conservatives.
‘Open to interpretation’
The new paper does not mince words in its recommendation to scientists, but how it may be interpreted by the general public is much harder to pin down.
While the text and analyses of the paper point to race being a social construct, the way the paper still leans on ancestry by subcontinental groups could allow people to interpret it as simply calling for more granular racial categories.
“You can build a narrative around this that says, ‘Our socially constructed view is completely true.’ You can also build another view, which is, ‘It’s a little different than we thought it was back in the day. But the socially defined races look like they have pretty different ancestry components,’” said Aaron Panofsky, the director of the Institute for Society and Genetics at the University of California, Los Angeles, who has studied how conservatives have used genetics. He said that the paper is written in such a way that it “does not just double down on all these straightforward, old-style ways of talking about it, but it can’t emancipate itself from the problematic space of race and ethnicity. In some ways, it slips back in, and I believe it contradicts it. I think it’s very open to interpretation. It doesn’t settle anything.”
Another potential interpretation is that the paper supports the rationale the NIH has used in recent weeks when terminating research grants it deems related to DEI, which states, “Research programs based primarily on artificial and non-scientific categories, including amorphous equity objectives, are antithetical to the scientific inquiry.”
For his part, Rotimi, the lead author, said he tries to stay out of politics and likes “my data to speak for me.” He said his lab views race “as really a social construct. It doesn’t mean it’s not useful, but it’s truly a social construct. And the best way I can describe that is to say that trying to use genetics to define race or to use genetics to support our racial classification is like slicing soup. You can cut all you want — that soup is going to stay mixed.”
He added that “when we use concepts as broad as European or Africans or Asians, we distort our understanding of genetic variation, and that distortion can put individuals at risk when we try to prescribe medicine or when we try to treat them.”
Kahn and others noted the paper is making what could be a radical shift in thinking about the role race plays in medical research. In the past, geneticists have largely chalked up variation by race in health outcomes that couldn’t be explained by other variables to genetic differences they did not yet fully understand. But the paper instead says that race is likely a proxy for environmental factors.
“We’re not saying this because we’re a bunch of woke, leftist scientists, we are saying this because this is going to improve the science that we’re doing,” Carlson said. But he also worries that if the paper garners negative attention, it could have a chilling effect on further research that confronts genetic determinism.
Kahn lauded the paper’s use of the word “variation” as opposed to “diversity,” a word that he says may have encouraged the connection between the idea of race and genetics.
“The two concepts, the social and biological understandings of diversity, have been doing this sort of dance around each other, and sometimes they’re in productive juxtaposition. But oftentimes they’re entangled, and it’s just a mess,” he said.
“What I see this paper doing is providing a potential template to keep these concepts from being entangled in a way that is both scientifically unproductive but also politically volatile. So I’m hopeful.”
Correction: An earlier version of this story misspelled Jonathan Kahn’s name in one instance. This story has been updated with comments from an interview with the paper’s lead author.