Menlo Park, California, USA
March 17, 2025
PacBio (NASDAQ: PACB), a leading provider of high-quality, highly accurate sequencing platforms, today announced a newly published study in Nature Communications unveiling a powerful new method for analyzing some of the most complex regions of the human genome. Led by researchers from PacBio, GeneDx, and a global consortium of genomics experts, the study uses Paraphase, an informatics tool that, when paired with HiFi long-read sequencing, allows for high-precision variant detection and copy number analysis of 316 genes in previously inaccessible segmental duplication regions, including 9 challenging, medically relevant genes.
Segmental duplications (SDs) are duplicated regions of the genome whose copies are highly similar to one another, and they have posed persistent challenges for genetic analysis. These regions contain hundreds of genes critical to human health, including those implicated in spinal muscular atrophy (SMN1/SMN2), congenital adrenal hyperplasia (CYP21A2), and red-green color blindness (OPN1LW/OPN1MW), but their high sequence similarity makes accurate mapping and variant detection nearly impossible with short-read sequencing. Paraphase, combined with HiFi sequencing, overcomes these challenges by phasing haplotypes across paralogous gene families, providing a more complete and accurate view of genetic variation. The length and accuracy of HiFi reads are what make this phasing possible.
Study Reveals Previously Inaccessible Regions of the Genome
By applying Paraphase to 160 long (>10 kb) segmental duplication regions spanning 316 genes, the researchers revealed new insights into genetic variation across five ancestral populations.
Among the key findings:
- Newly Identified De Novo Variants in SDs in Parent-Offspring Trios: Analysis of 36 trios uncovered 7 previously undetected de novo single nucleotide variants (SNVs) and 4 de novo gene conversion events, two of which were non-allelic—a level of detail not possible with traditional sequencing approaches.
- Copy Number Variability Across Populations: The study profiled the copy number distributions of paralog groups across populations, showing high copy number variability in many gene families in SDs. It also provided a new approach for identifying false duplications in the reference genome.
- Gene Conversion Drives Sequence Similarity between Genes and Paralogs: The team identified 23 paralog groups with strikingly low genetic diversity between genes and paralogs, indicating that frequent gene conversion and/or unequal crossing-over may have played a role in preserving highly similar gene copies over time.
“For decades, sequencing technologies have struggled to provide reliable data on paralogous genes—some of the most medically relevant but hardest to analyze regions of the genome,” said Dr. Michael A. Eberle, Vice President of Bioinformatics at PacBio and senior author of the study. “With Paraphase and HiFi sequencing, we now have a scalable way to accurately genotype SD-encoded genes across diverse populations, filling in long-standing gaps in genomic research and improving our ability to identify disease-linked variants.”
The study also highlights how Paraphase can disentangle medically important gene families that have long required specialized, multi-step assays like MLPA and Sanger sequencing. For example, in the CYP21A2/CYP21A1P region, where mutations cause congenital adrenal hyperplasia, the researchers characterized a previously overlooked duplication allele carrying both a functional CYP21A2 copy and a CYP21A2(Q319X) pseudogene copy, a configuration that could have led to misclassification in standard tests.
“This study demonstrates that when we use HiFi sequencing we see a much richer and more complex picture of genetic variation,” said Dr. Xiao Chen, lead author of the study and Principal Scientist at PacBio. “Paraphase enables the precise resolution of genetic regions that have been largely inaccessible until now, providing new opportunities for disease research, population genetics, and potentially even clinical testing.”
“Long-read genome sequencing offers the ability to detect variants that are difficult to identify using other testing methods, particularly in regions with highly similar sequence,” said Dr. Paul Kruszka, MD, FACMG, Chief Medical Officer at GeneDx. “This work may enhance variant detection, resolve complex genomic regions, and provide more answers for patients and families, so we are encouraged by the prospect of the data.”
The full study, “Genome-wide profiling of highly similar paralogous genes using HiFi sequencing,” is now available in Nature Communications.