Benefits and Limits of Phasing Alleles for Network Inference of Allopolyploid Complexes
Systematic Biology (IF 6.5). Pub Date: 2024-05-11. DOI: 10.1093/sysbio/syae024
George P Tiley¹, Andrew A Crowl², Paul S Manos², Emily B Sessa³, Claudia Solís-Lemus⁴, Anne D Yoder², J Gordon Burleigh³

Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent by obtaining the true network with fewer loci compared to haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. These data may improve network estimates compared to using haplotype consensus assemblies by accurately inferring the direction of gene flow, but statistical non-identifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.
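The abstract contrasts phased alleles with "sequences with heterozygous bases represented as ambiguity codes." The minimal Python sketch below makes that contrast concrete using the standard IUPAC nucleotide codes. The helper `ambiguity_consensus` and the toy sequences are hypothetical illustrations, not part of PATÉ. The sketch shows that two different phasings of the same heterozygous sites collapse to an identical ambiguity-coded sequence, so the information about which variants sit together on the same allele is lost.

```python
# Hypothetical helper for illustration only (not part of PATÉ): collapse
# aligned, phased alleles into one sequence using IUPAC ambiguity codes.

# Standard IUPAC codes keyed by the set of nucleotides observed at a site.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def ambiguity_consensus(alleles):
    """Merge aligned alleles column by column.

    Assumes equal-length, uppercase A/C/G/T strings (no gaps).
    Columns where all alleles agree keep that base; heterozygous
    columns become the matching IUPAC ambiguity code.
    """
    if len({len(a) for a in alleles}) != 1:
        raise ValueError("alleles must be aligned to the same length")
    return "".join(IUPAC[frozenset(column)] for column in zip(*alleles))


# Two different phasings of the same heterozygous sites (A/G at position 0,
# C/T at position 2):
phasing_1 = ["AACGT", "GATGT"]  # allele 1 carries A..C, allele 2 carries G..T
phasing_2 = ["AATGT", "GACGT"]  # allele 1 carries A..T, allele 2 carries G..C

print(ambiguity_consensus(phasing_1))  # RAYGT
print(ambiguity_consensus(phasing_2))  # RAYGT
```

Because both phasings produce `RAYGT`, an analysis that sees only the ambiguity-coded sequence cannot tell which variants co-occur on each allele, whereas phased alleles retain that linkage and can be placed separately in gene trees. This is offered as one plausible intuition for the abstract's simulation result, not as the authors' own explanation.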

Updated: 2024-05-11