UVA Health researchers develop new tool to advance genomics and disease research
UVA Health researchers have developed an important new tool to help scientists distinguish signals from noise as they study the genetic causes of cancer and other diseases. In addition to advancing research and potentially accelerating new treatments, the new tool could help improve cancer diagnosis by making it easier for doctors to detect cancer cells.
The new tool, developed by Chongzhi Zang, PhD, UVA, and his team and collaborators, is a mathematical model that will help ensure the integrity of “big data” about the building blocks of our chromosomes, genetic material called chromatin. Chromatin – a combination of DNA and protein – plays an important role in controlling the activity of our genes. When chromatin goes wrong, it can turn a healthy cell into cancer or contribute to other diseases.
Scientists can now study chromatin in individual cells using a cutting-edge technology called single-cell ATAC-seq, but this produces an enormous amount of data, including a lot of noise and distortion. Zang's new tool cuts through that, saving scientists from false leads and wasted efforts.
In the best of times, large-scale genomic research on single cells is like “hunting for a needle in a haystack,” says Zang. But his new tool will make it a lot easier by clearing away a lot of bad hay.
“In the traditional way of data analysis, you may see some patterns that look like real signals of a particular chromatin state but are false because of the bias of the experimental technology itself. Such fake signals can confuse scientists. We have developed a model to better capture and filter out such false signals so that the real needle we are looking for can more easily stand out from the hay.”
Chongzhi Zang, PhD, computational biologist at the UVA Center for Public Health Genomics and UVA Health Cancer Center
About the genomics tool
Zang's new tool adapts a model from number theory and cryptology called "simplex coding." He and his colleagues used it to encode DNA sequences into mathematical forms, ultimately converting the complex genome sequence into a much simpler mathematical representation. These representations can then be compared to detect distortions and noise in the sequence data that are not easily found using traditional approaches.
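To give a rough sense of what encoding DNA into a mathematical form can look like, here is a minimal Python sketch of one common simplex coding scheme, in which each of the four bases is mapped to a vertex of a regular tetrahedron. The vertex assignment and the function name `encode_kmer` are illustrative assumptions, not taken from the SELMA software, and the researchers' actual model may differ.

```python
# Illustrative assumption: each base is a vertex of a regular tetrahedron,
# so every pair of bases is equally far apart and no base is "between" others.
SIMPLEX = {
    "A": (1, -1, -1),
    "C": (-1, 1, -1),
    "G": (-1, -1, 1),
    "T": (1, 1, 1),
}


def encode_kmer(kmer):
    """Concatenate the 3-number simplex code of each base into one vector."""
    vector = []
    for base in kmer.upper():
        vector.extend(SIMPLEX[base])  # raises KeyError on non-ACGT characters
    return vector


if __name__ == "__main__":
    # A 4-base sequence becomes a vector of 12 numbers.
    print(encode_kmer("ACGT"))
```

Because the four vertices are equidistant, the code does not favor any base over another, which is one reason this kind of representation is convenient for fitting simple numerical models to sequence data.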
"The complexity of DNA sequences increases exponentially as they get longer. They are difficult to model because a typical data set contains millions of sequences from thousands of cells," said Shengen Shawn Hu, PhD, a researcher in Zang's lab and lead author of this work. “But the simplex coding model can provide an accurate estimate of sequence distortions because of its beautiful mathematical property.”
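Hu's point about exponential growth can be made concrete with simple arithmetic: there are 4^k possible DNA sequences of length k, while a code that uses three numbers per base needs only 3k values to describe any one of them. The sketch below only counts both quantities. Treating three numbers per base as the encoding size is an assumption for illustration, and SELMA's model may use additional terms.

```python
def possible_sequences(k):
    """Number of distinct DNA sequences of length k (4 choices per position)."""
    return 4 ** k


def encoded_length(k, numbers_per_base=3):
    """Length of the vector describing one sequence (hypothetical 3 numbers per base)."""
    return numbers_per_base * k


if __name__ == "__main__":
    for k in (6, 8, 10):
        print(k, possible_sequences(k), encoded_length(k))
    # 6 4096 18
    # 8 65536 24
    # 10 1048576 30
```

The first count grows exponentially with sequence length, while the second grows only linearly, which illustrates why a compact encoding can make long sequences easier to model.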
Tests of the tool showed that it was significantly better at analyzing complex single-cell data to characterize different cell types. This is important for both basic biological research and disease diagnosis, where doctors need to detect tiny numbers of disease cells in much larger samples, ranging from tens of thousands to millions of cells.
"The distortions were not easy to find because they were interwoven with real signals and hidden in the large amounts of data. It might not be a big deal if people just picked the strongest signals from a large number of cells," said Zang, who recently co-led several other single-cell genomics studies on coronary artery disease and intestinal development. "But when you look at single-cell data, there is no longer any low-hanging fruit. The signals are always weak at the individual cell level, and the effects of noise and distortion can be catastrophic. Bias correction is often ignored, but can be crucial in single-cell data analysis."
To make their new tool widely available, the researchers developed free, open-source software and put it online. The software can be found at https://github.com/zang-lab/SELMA and at https://doi.org/10.5281/zenodo.7048767.
“We hope this tool can benefit the biomedical research community in studying chromatin biology and genomics and ultimately support disease research,” Zang said. “It’s always exciting to see how our colleagues use the tools we develop to make important scientific discoveries in their own research.”
Results published
The researchers published their results in the journal Nature Communications. (The article is open access, meaning free to read.) The team consisted of Shengen Shawn Hu, Lin Liu, Qi Li, Wenjing Ma, Michael J. Guertin, Clifford A. Meyer, Ke Deng, Tingting Zhang and Chongzhi Zang.
Zang is part of UVA's departments of Public Health Sciences, Biochemistry and Molecular Genetics, and Biomedical Engineering. The Department of Biomedical Engineering is a collaboration of the UVA School of Medicine and School of Engineering.
The work was supported by National Institutes of Health grants R35GM133712, K22CA204439, and R35GM128635; the National Science Foundation, Grant NSF-796 2048991; the University of Pittsburgh Center for Research Computing; UVA Cancer Center; and the National Cancer Institute of the NIH, Cancer Center Support Grant P30 CA44579.
Source:
University of Virginia Health System
Reference:
Hu, SS, et al. (2022) Intrinsic bias estimation for improved analysis of bulk and single-cell chromatin accessibility profiles using SELMA. Nature Communications. doi.org/10.1038/s41467-022-33194-z.