Long-read sequencing is revolutionizing the diagnosis of rare diseases
One in 10 people worldwide are affected by a rare genetic disorder, but about 50% of them remain undiagnosed despite rapid advances in genetic technology and testing. Even if a person has access to testing, the process of diagnosis can take around five years or longer, which is sometimes too late for patients, who are often children, to begin proper treatment.
This is partly because current clinical testing uses a method called short-read sequencing, which cannot access information in certain regions of the genome and therefore cannot provide crucial evidence for diagnosis. However, UC Santa Cruz researchers are advancing a cutting-edge alternative method, called long-read sequencing, that can provide a more comprehensive data set for identifying variations, eliminate the need for multiple specialized tests, and optimize the diagnosis of rare diseases.
A new study shows that long-read sequencing has the potential to improve diagnosis rates while reducing time to diagnosis from years to days - in a single test and at a much lower cost. The study was published in The American Journal of Human Genetics and led by core members of the UCSC Genomics Institute: Professor of Biomolecular Engineering (BME) Benedict Paten and Associate Professor of BME Karen Miga, as well as former UCSC postdoctoral scholar Jean Monlong.
“Rare diseases are something that people have struggled with for so many years, and if we have sequencing technology that optimizes diagnostic testing, that will be a huge contribution - and that's what we tested as part of this paper.”
Shloka Negi, UC Santa Cruz BME Ph.D. student, first author of the paper
“Today, the diagnostic yield of genetic sequencing is frustratingly low,” said Paten. “A likely cause is the incomplete sequencing methods, variants and epigenetic signals in our cohort.”
Finding rare diseases
This study focused on rare monogenic diseases, which are caused by a disruption in a single gene.
Scientists diagnose genetic diseases by searching through a patient's genetic material to find variants - differences in a gene that can prevent it from functioning properly. The typical approach to finding these variants uses a technique called short-read sequencing, which reads genetic base pairs (combinations of adenine (A), cytosine (C), guanine (G), and thymine (T)) in stretches of about 150 to 250 base pairs each.
However, the limitation of short-read sequencing is that it can miss crucial information in certain regions of the genome, such as patterns of base pairs that run much longer than 250 base pairs. Nor can it readily perform “phasing,” the process of determining which variants are inherited from the mother and which come from the father. Phasing helps clinicians work out how variants were inherited: for example, whether two variants came from the same parent, one came from each parent, or a variant was not inherited at all. This can be very useful information for genetic diagnoses, especially when parental data is not available.
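To make the idea of phasing concrete, here is a minimal sketch in Python. It is not the method used in the study; the read format (a dictionary mapping a genomic position to the allele seen there) and the function name `phase_pair` are assumptions made purely for illustration. The point it shows is that only a read covering both variant sites can tell whether two variants sit on the same copy of a chromosome.

```python
from collections import Counter

def phase_pair(reads, site_a, site_b):
    """Classify two heterozygous variants as 'cis' (same chromosome copy)
    or 'trans' (opposite copies) using reads that cover both sites.

    Each read is a dict {position: "ref" or "alt"}; this toy format is
    an assumption for illustration, not a real file format.
    """
    votes = Counter()
    for read in reads:
        if site_a in read and site_b in read:       # read spans both sites
            same_allele = read[site_a] == read[site_b]
            votes["cis" if same_allele else "trans"] += 1
    if votes["cis"] == votes["trans"]:
        return "unphased"                            # no linking reads, or a tie
    return "cis" if votes["cis"] > votes["trans"] else "trans"

# Hypothetical variant sites 20,000 base pairs apart.
short_reads = [{1000: "alt"}, {21000: "alt"}, {1000: "ref"}, {21000: "ref"}]
long_reads = [
    {1000: "alt", 21000: "ref"},
    {1000: "ref", 21000: "alt"},
    {1000: "alt", 21000: "ref"},
]

print(phase_pair(short_reads, 1000, 21000))  # unphased
print(phase_pair(long_reads, 1000, 21000))   # trans
```

In this toy example the short reads each see only one site, so nothing links the two variants, while every long read that spans both sites carries one variant but not the other, which points to one variant on each chromosome copy.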
In contrast, long-read sequencing can read lengthy stretches of DNA at once, eliminating gaps that can cause scientists and clinicians to miss important information about gene variation. Long-read sequencing also provides direct phasing data as well as information about methylation, a chemical modification of DNA that can cause genes to be “turned on or off” and can contribute to disease.
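The difference in reach can be illustrated with a small sketch. The region length, the long-read length and the 20-base "anchor" of flanking sequence are all hypothetical values chosen for illustration; only the 150 to 250 base-pair short-read lengths come from the description above.

```python
def can_span(read_length, region_length, anchor=20):
    """Return True if one read can cover the whole region plus `anchor`
    bases of flanking sequence on each side. The 20-base anchor is an
    arbitrary illustrative choice, not a value from the study."""
    return read_length >= region_length + 2 * anchor

region_length = 3000                      # hypothetical hard-to-read region, in base pairs
for read_length in (150, 250, 20000):     # 20,000 is an assumed long-read length
    print(read_length, can_span(read_length, region_length))
```

Under these assumptions neither short-read length can cover the region in a single read, while the long read can, which is the sense in which long reads eliminate gaps.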
“Long-read sequencing will be much better in certain cases, and we are taking steps to prove that,” Negi said.
Leading in methods
UC Santa Cruz Genomics Institute researchers have an extensive history of innovation and expertise in long-read sequencing and are actively developing methods to optimize sequencing and analysis for a wide range of health research applications. Many of the techniques that researchers developed to reach milestones, such as the first truly complete “telomere-to-telomere” reference genome, are now being used to improve patient outcomes.
“Reinforcing previous findings, we found that the benefits of using long-read sequencing were significantly increased by using a complete, so-called ‘telomere-to-telomere’ reference genome instead of the existing incomplete but widely used genomic reference,” Miga said. “We anticipate that pangenomes - references representing diverse human variation - will benefit even more from new long-read sequencing technologies.”
Paten and Miga's laboratories teamed up with clinicians to work on cases of 42 patients with rare diseases - some of whom were diagnosed through short-read methods or other specialized tests, some of whom were not yet diagnosed. In some cases, researchers had access to parents' genetic information, but in others they did not.
Long-read sequencing of the patients' samples was conducted at UCSC by the Miga laboratory using nanopore sequencing, a long-read sequencing method, to achieve highly accurate, end-to-end reads of the patients' genomes for approximately $1,000 per sample.
The genomic data was analyzed using computational methods developed in the Paten laboratory to find small and large variants, phasing data and methylation data, all using a pipeline called Napu. The analysis process takes about a day or less, depending on the speed of computer processing, and costs $100.
Solving cases
After sequencing and analyzing patient data, researchers found that long reads provided a more comprehensive data set than can be derived using short-read sequencing.
Long-read sequencing provided a conclusive diagnosis for 11 of the 42 patients in the cohort, providing everything from the short-read data as well as additional information including additional rare candidate variants, long-range phasing and methylation - all in a single, inexpensive and rapid protocol.
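The figures reported in this article can be put side by side with some simple arithmetic. This is a back-of-the-envelope sketch, not a calculation from the paper: it assumes the $100 analysis cost applies per sample and that sequencing and analysis costs simply add.

```python
cohort_size = 42
diagnosed = 11
sequencing_cost = 1000   # approximate USD per sample, as reported
analysis_cost = 100      # USD; assumed here to be per sample

diagnosed_share = diagnosed / cohort_size
cost_per_sample = sequencing_cost + analysis_cost   # assumes the two costs add

print(f"Share of cohort with a conclusive diagnosis: {diagnosed_share:.1%}")
print(f"Estimated cost per sample: ${cost_per_sample:,}")
print(f"Estimated cost for the cohort: ${cohort_size * cost_per_sample:,}")
```

Under those assumptions, roughly a quarter of the cohort received a conclusive diagnosis at an estimated cost of about $1,100 per sample.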
The 11 cases diagnosed included four cases of congenital adrenal hyperplasia (a rare condition in which the adrenal glands are enlarged and do not function properly). The gene responsible for this disease is located in a particularly challenging region of the genome - it cannot be characterized using short-read sequencing technology, and current clinical testing is cumbersome and incomplete.
“To solve these cases, we have developed a new pangenomic tool that integrates new high-quality assemblies such as the telomere-to-telomere reference genome,” said Monlong, now at INSERM in France. “We were excited to see the human genome, which has historically been difficult to study. Our results encourage us to expand our approach to more diseases that have long been stalled.”
In addition, two cases involved disorders of sexual development, including a rare case of Leydig cell hypoplasia, in which underdeveloped Leydig cells in the testes affect male sexual development. Four cases of neurodevelopmental disorders, each representing a long and challenging diagnostic odyssey, were also finally resolved.
“Long-read sequencing is probably the next best test for unsolved cases with compelling variants in a single gene or a clear phenotype,” Negi said. “It can serve as a single diagnostic test, reducing the need for multiple clinical visits and converting a years-long diagnostic journey into a matter of hours.”
On average, each patient had 280 genes (including some Mendelian disease genes, which are associated with inherited diseases caused by single-gene mutations) with significant protein-coding regions that were clearly covered by long reads but went undetected by short reads.
“There is so much more of the genome that the long reads can unlock,” Negi said. “But it will take time before we can fully interpret this new information revealed by long reads. These data were not present in our clinical databases. Long reads uncover about 5.8% more of the telomere-to-telomere genome that short reads simply could not access.”
Other UC Santa Cruz researchers involved in this research include Brandy McNulty, Ivo Violich, Joshua Gardner, Todd Hillaker and Sara O’Rourke.
This research was funded in part by the Chan Zuckerberg Initiative.
Sources:
Negi, S., et al. (2025) Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. The American Journal of Human Genetics. doi.org/10.1016/j.ajhg.2025.01.002.