Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall
Genome Research (IF 7), Pub Date: 2023-12-01, DOI: 10.1101/gr.278070.123
William T. Harvey, Peter Ebert, Jana Ebler, Peter A. Audano, Katherine M. Munson, Kendra Hoekzema, David Porubsky, Christine R. Beck, Tobias Marschall, Kiran Garimella, Evan E. Eichler

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of the Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage, with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
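
For readers less familiar with the metrics named in the abstract, the following is a minimal sketch of how precision, recall, and the F1 score are conventionally derived from a comparison of a call set against a truth set. The class and function names and the example counts are hypothetical, chosen only for illustration; they are not taken from the paper, and the authors' actual benchmarking pipeline is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSetCounts:
    """Hypothetical tallies from comparing a call set against a truth set."""
    true_positives: int   # called variants that match the truth set
    false_positives: int  # called variants absent from the truth set
    false_negatives: int  # truth-set variants that were not called


def precision(counts: CallSetCounts) -> float:
    """Fraction of called variants that are correct: TP / (TP + FP)."""
    called = counts.true_positives + counts.false_positives
    return counts.true_positives / called if called else 0.0


def recall(counts: CallSetCounts) -> float:
    """Fraction of true variants that were found: TP / (TP + FN)."""
    truth = counts.true_positives + counts.false_negatives
    return counts.true_positives / truth if truth else 0.0


def f1_score(counts: CallSetCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(counts), recall(counts)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only (assumed, not data from the study).
example = CallSetCounts(true_positives=80, false_positives=20, false_negatives=120)
print(f"precision={precision(example):.3f}")
print(f"recall={recall(example):.3f}")
print(f"F1={f1_score(example):.3f}")
```

With these assumed counts, precision is 0.8 and recall is 0.4, giving an F1 score of about 0.533, which would just clear the 0.5 threshold the abstract describes as reasonable accuracy. Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a call set cannot reach a high F1 on precision alone.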

Updated: 2023-12-01