Researchers trace 73% of U.S. cases of salmonella to chicken and vegetables

Transparency: Editorially created and reviewed.
Published on

A modern genomic model shows where the risk of salmonella really comes from, pinpointing chicken and vegetables as the main sources and reshaping how foodborne disease is tracked.

In a study recently published in the journal Emerging Infectious Diseases, a group of researchers used genome sequencing and machine learning to determine the primary food sources of human Salmonella infections in the United States (US).

Background

Each year, Salmonella enterica infections cause approximately 1.35 million illnesses in the United States, along with a significant number of hospitalizations. Common sources include contaminated food and water, animals, soil, and infected people. Serotypes such as Enteritidis and Typhimurium can infect numerous hosts, while others, such as Dublin, primarily affect cattle. Traditional methods attribute only about 5% of cases to known outbreaks, leaving most illnesses unexplained. Earlier approaches relied on laboratory techniques with limited resolution, but whole genome sequencing (WGS) now allows a clearer picture of Salmonella transmission routes to emerge. Improved attribution models are critical for refining food safety regulations and preventive measures, underscoring the need for research that uses advanced genomic technologies.

About the study

Researchers compiled a dataset of 18,661 Salmonella isolates from food and animal samples available at the National Center for Biotechnology Information (NCBI), supplemented with metadata from U.S. government agencies, including the Food and Drug Administration (FDA), the Food Safety and Inspection Service (FSIS), and the Centers for Disease Control and Prevention (CDC). The isolates were categorized into 15 food groups; samples from mixed sources were excluded. Because chicken isolates were overrepresented, 50% of them were randomly selected to balance the dataset, and inverse class weighting was applied to correct the remaining imbalance. Although the model used Salmonella isolates from around the world, 76% came from the United States, making the dataset broadly representative of domestic food sources.
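
To illustrate the two balancing steps described above, here is a minimal Python sketch. It assumes a simple list of food-source labels; the helper names (`downsample_class`, `inverse_class_weights`) are hypothetical, and the weight formula n / (k × count), a common "balanced" convention, is an assumption, since the article does not give the study's exact weighting scheme.

```python
import random
from collections import Counter


def downsample_class(labels, target, fraction, seed=0):
    """Keep every non-target sample plus a random `fraction` of the target class.

    Returns the sorted indices of the samples to keep (hypothetical helper).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    target_idx = [i for i, lab in enumerate(labels) if lab == target]
    other_idx = [i for i, lab in enumerate(labels) if lab != target]
    keep = rng.sample(target_idx, round(len(target_idx) * fraction))
    return sorted(other_idx + keep)


def inverse_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rarer classes get larger weights; this "balanced" formula is an assumed convention.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Toy labels (invented): chicken is overrepresented, as in the study's raw data.
labels = ["chicken"] * 6 + ["pork"] * 2 + ["beef"] * 2
kept = downsample_class(labels, "chicken", fraction=0.5)  # keep 50% of chicken
balanced = [labels[i] for i in kept]
weights = inverse_class_weights(balanced)
print(weights)
```

After downsampling, three chicken isolates remain, so chicken receives a weight of about 0.78 while pork and beef each receive about 1.17; each class then contributes the same total weight during training.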

For human infections, 6,470 Salmonella isolates from cases with unknown sources of infection and no history of international travel were collected from the Foodborne Diseases Active Surveillance Network (FoodNet), which covered approximately 15% of the U.S. population, between 2014 and 2017.

The research team assembled the genomes using SPAdes software and used whole-genome multilocus sequence typing (wgMLST) to characterize both food-derived and human isolates. Serotypes were identified with the SeqSero2 tool. A random forest machine learning algorithm, which classifies data using numerous genetic markers, was trained on isolates with known sources. The model's accuracy was evaluated with cross-validation, and permutation analysis was used to identify the most informative genomic markers. The model achieved its highest accuracy using a subset of 7,360 genetic loci, reinforcing the value of high-dimensional genomic data for classification tasks. The optimized model assigned a source to a human case only when the predicted probability exceeded 50%; cases below this threshold were attributed to an unknown source.
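
The 50% decision rule described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the study's code: `assign_source` is a hypothetical helper, and it presumes the trained classifier has already produced one probability per food source for a human isolate (for a random forest, roughly the share of trees voting for each source).

```python
def assign_source(probabilities, threshold=0.5):
    """Return the most probable food source, or "unknown" if it does not exceed `threshold`.

    `probabilities` maps each candidate source to the model's predicted probability
    for a single human isolate.
    """
    source, p = max(probabilities.items(), key=lambda item: item[1])
    # Attribution requires a probability above 50%, hence the strict ">".
    return source if p > threshold else "unknown"


# Invented probability vectors for two human isolates.
confident_case = {"chicken": 0.72, "vegetables": 0.18, "pork": 0.10}
uncertain_case = {"chicken": 0.40, "vegetables": 0.35, "turkey": 0.25}

print(assign_source(confident_case))  # chicken
print(assign_source(uncertain_case))  # unknown
```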

Study results

Among the 18,661 food and animal isolates used to train the random forest model, the most common Salmonella sources were chicken (31%), vegetables (13%), turkey (12%) and pork (11%). The most frequent Salmonella serotypes were Kentucky, Typhimurium, Enteritidis, and Heidelberg.

Applied to 6,470 human cases, the model attributed 34% of illnesses to chicken and 30% to vegetables, together accounting for nearly two-thirds of infections. When uncertainty was taken into account (cases with probabilities below 50%), approximately 44% of cases remained unclassified. Excluding these uncertain cases, the model traced 46% of infections to chicken and 27% to vegetables, which together account for around 73% of attributed cases.
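
Percentages computed over attributed cases only use a smaller denominator than percentages computed over all cases. The Python sketch below illustrates that effect with a hypothetical helper, `attribution_shares`, applied to ten invented case labels; the numbers are illustrative and are not the study's data.

```python
from collections import Counter


def attribution_shares(assignments, exclude_unknown=False):
    """Percentage of cases assigned to each source.

    With exclude_unknown=True, cases labeled "unknown" are dropped from the
    denominator, so the remaining shares are relative to attributed cases only.
    """
    labels = [a for a in assignments if not (exclude_unknown and a == "unknown")]
    if not labels:
        return {}
    counts = Counter(labels)
    return {source: 100 * count / len(labels) for source, count in counts.items()}


# Ten invented human cases after applying a 50% probability threshold.
cases = ["chicken"] * 3 + ["vegetables"] * 2 + ["pork"] + ["unknown"] * 4

all_cases = attribution_shares(cases)
attributed_only = attribution_shares(cases, exclude_unknown=True)
print(all_cases["chicken"], attributed_only["chicken"])  # 30.0 50.0
```

In this toy example, dropping the four unknown cases raises chicken's share from 30% of all cases to 50% of attributed cases.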

Different Salmonella serotypes showed different source associations. Chicken was particularly associated with serotypes Enteritidis, Typhimurium, Heidelberg, and Infantis, while vegetables were mainly associated with Javiana and Newport. Pork was identified as the dominant source of Salmonella enterica serotype 4,[5],12:i:- (STM).

Figure: Percentage of Salmonella isolates collected from known single-source foods in the United States and other countries from 2003 to 2018, by food category (n = 18,661, including 613 isolates collected before 2003). These isolates served as training data for the random forest model.

The model's accuracy was strong, particularly for chicken (97%), vegetables (82%), turkey (88%), pork (83%) and beef (77%). However, it struggled with less common sources such as milk and game. Increasing the number of genomic loci used in the model improved precision, confirming the effectiveness of WGS and machine learning for source attribution.
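
The article does not say exactly how the per-category accuracy figures were computed. A common choice, assumed here, is per-class recall on held-out cross-validation predictions: for each true source, the fraction of its isolates that the model predicted correctly. The hypothetical `per_class_accuracy` helper below sketches that calculation in Python on invented labels, and shows how a rare class such as milk can score poorly even when a common class scores perfectly.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """For each true source, the fraction of its isolates predicted correctly (recall)."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {source: correct[source] / n for source, n in totals.items()}


# Invented held-out predictions: chicken is common, milk is rare.
true_labels = ["chicken"] * 4 + ["milk"] * 2
predicted = ["chicken"] * 4 + ["chicken", "milk"]

print(per_class_accuracy(true_labels, predicted))  # {'chicken': 1.0, 'milk': 0.5}
```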

Compared with previous outbreak-based studies, this analysis identified chicken as a far more extensive source of Salmonella infections, reflecting different risk profiles for sporadic infections and outbreaks. Importantly, the predictions agree well with known epidemiological data, supporting the model's real-world applicability.

These results highlight the need for targeted interventions and policies focused on poultry and fresh produce, which are critical to reducing the public health burden of Salmonella. Because many infections remain unattributed, expanding the dataset with more diverse non-chicken isolates and additional non-food sources, such as environmental and wildlife samples, could further improve accuracy. The regional limitations of FoodNet data and variations in healthcare-seeking behavior also suggest a need for broader nationwide data collection.

Conclusions

In conclusion, this study demonstrated the effectiveness of WGS combined with a random forest machine learning algorithm in accurately identifying the food sources of Salmonella infections in the United States. Chicken and vegetables emerged as key sources, reinforcing the importance of targeted regulatory and public health strategies. This genomic approach offers significant improvements over traditional methods and provides detailed insights critical to food safety policy, routine surveillance, and outbreak management. Future research should incorporate broader sample diversity, expand geographic representation, and include non-food sources to further improve the model's precision, thereby strengthening public health efforts against Salmonella.

