Using an artificial intelligence (AI) model to conduct a genomic meta-analysis of Cronobacter sakazakii strains, researchers from the University of Maryland discovered genetic differences in isolates from powdered food samples that could explain why the pathogen persists in infant formula.

The finding is especially important in the wake of the infant formula crisis of 2022. After four infant hospitalizations and two fatalities were caused by C. sakazakii infection, an investigation led to a widespread formula recall and the temporary shutdown of a major U.S. production facility where C. sakazakii contamination was found. These events resulted in a critical shortage of infant formula across the nation, driving federal regulatory agencies to prioritize infant formula safety and supply.

Published in the International Journal of Food Microbiology, the study leveraged an AI large language model (LLM) to standardize the inconsistent metadata attached to large, global genetic datasets for C. sakazakii, making it possible to compare records that previously could not be reconciled. The resulting dataset is believed to be the most complete C. sakazakii pangenome to date, comprising whole genome sequencing (WGS) data for 748 food, clinical, and environmental isolates originating from North America, Europe, and Asia.

Next, the researchers used machine learning models to associate genetic information with sampling metadata, such as where, from what product, and under which conditions an isolate was collected.

The analysis showed that C. sakazakii isolates from powdered food samples, such as infant formula and milk powders, had a higher frequency of genes that could contribute to the pathogen’s survival under dry conditions, specifically genes involved in DNA recombination, repair, and desiccation resistance. Moreover, virulence genes were more prevalent in strains with the greatest potential for persistence, adding to the likelihood of consumers contracting foodborne illness from contaminated product. Correlations between geographic region and biofilm formation, as well as between geographic region and heavy metal resistance, were also discovered.
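To make the idea of a "higher frequency of genes" concrete, the following is a minimal Python sketch of how gene presence can be tallied per isolate source. It is not the study's workflow: the toy records, the gene names, and the function `gene_frequency_by_source` are all hypothetical and serve only to illustrate the comparison.

```python
from collections import defaultdict

# Hypothetical toy data, not from the study:
# each record is (isolate source, set of genes detected in that isolate).
isolates = [
    ("powdered_food", {"desiccation_gene_A", "dna_repair_gene_B"}),
    ("powdered_food", {"desiccation_gene_A"}),
    ("clinical", {"dna_repair_gene_B"}),
    ("environmental", set()),
]


def gene_frequency_by_source(records):
    """Return, per source, the fraction of isolates carrying each gene."""
    counts = defaultdict(lambda: defaultdict(int))  # source -> gene -> count
    totals = defaultdict(int)                       # source -> isolate count
    for source, genes in records:
        totals[source] += 1
        for gene in genes:
            counts[source][gene] += 1
    return {
        source: {gene: n / totals[source] for gene, n in counts[source].items()}
        for source in totals
    }


freq = gene_frequency_by_source(isolates)
for source, genes in sorted(freq.items()):
    print(source, dict(sorted(genes.items())))
```

In a real analysis, frequencies like these would be followed by a statistical test and a correction for multiple comparisons before any gene is called enriched in a given source.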

Overall, the findings suggest that the abundance of genes conferring adaptive traits may be the reason why C. sakazakii is so persistent in a variety of environments, including food production environments and low-moisture foods. The researchers believe their AI-based approach could inform effective molecular surveillance strategies and help identify targets for control strategies in food facilities.

With the food trade being a global system in which products are shipped across borders, the researchers underline the need for international cooperation to better understand how foodborne pathogens evolve and move through the food chain, including tracking genetic virulence and resistance markers. Additionally, the researchers point to AI as a useful tool for standardizing and analyzing epidemiological data, as demonstrated through the reproducible workflow used in their study.

The project was supported by the U.S. Department of Agriculture’s National Institute of Food and Agriculture (USDA-NIFA). Authors of the study include Ryan Blaustein, Ph.D., Abani Pradhan, Ph.D., and Maurui Gao, Ph.D. with the University of Maryland’s Department of Food and Nutrition Science.