Random-effects substitution models for phylogenetics via scalable gradient approximations
Systematic Biology (IF 6.5), Pub Date: 2024-05-07, DOI: 10.1093/sysbio/syae019
Andrew F Magee (1), Andrew J Holbrook (1), Jonathan E Pekar (2, 3), Itzue W Caviedes-Solis (4), Fredrick A Matsen IV (5, 6, 7, 8), Guy Baele (9), Joel O Wertheim (10), Xiang Ji (11), Philippe Lemey (9), Marc A Suchard (1, 12, 13)

Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.
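To make the idea of a random-effects substitution model more concrete, here is a minimal sketch that starts from an HKY rate matrix and perturbs each off-diagonal rate multiplicatively, log q_ij = log q_ij(HKY) + ε_ij, then computes transition probabilities P(t) = exp(Qt). This parameterization, the helper names (`hky_base_rates`, `random_effects_q`, `expm`), and the Gaussian draws used for ε are illustrative assumptions, not the authors' implementation. Rate normalization, the paper's prior on the random effects, and its approximate likelihood gradient are not reproduced. Because ε_ij and ε_ji are drawn independently, the perturbed matrix will generally violate detailed balance, which is the kind of nonreversible departure from the base model the abstract describes.

```python
import math
import random

NUCS = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_base_rates(kappa, pi):
    """Unnormalized HKY off-diagonal rates: q_ij = pi_j, times kappa for transitions."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(NUCS):
        for j, b in enumerate(NUCS):
            if i != j:
                q[i][j] = pi[j] * (kappa if (a, b) in TRANSITIONS else 1.0)
    return q


def random_effects_q(base, effects):
    """Assumed random-effects form: q_ij = base_ij * exp(effects_ij) off the diagonal.

    The diagonal is set so that each row sums to zero (a valid CTMC generator).
    No rescaling to one expected substitution per unit time is done here.
    """
    n = len(base)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = base[i][j] * math.exp(effects[i][j])
        q[i][i] = -sum(q[i])  # q[i][i] is still 0.0, so this sums the off-diagonals
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(m, terms=20):
    """exp(M) via scaling and squaring with a truncated Taylor series (fine for 4x4)."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # undo the scaling: exp(M) = exp(M / 2^s)^(2^s)
        result = mat_mul(result, result)
    return result


if __name__ == "__main__":
    pi = [0.3, 0.2, 0.2, 0.3]
    base = hky_base_rates(kappa=2.0, pi=pi)
    rng = random.Random(1)
    effects = [[0.0 if i == j else rng.gauss(0.0, 0.5) for j in range(4)] for i in range(4)]
    q = random_effects_q(base, effects)
    p = expm([[x * 0.1 for x in row] for row in q])  # branch length t = 0.1
    for row in p:
        print([round(x, 4) for x in row])
```

Setting every ε_ij to zero recovers the base HKY generator exactly, which mirrors the abstract's point that these models can accommodate both negligible and radical departures from the base model. In practice, ε would receive a shrinkage prior and be sampled, for example with Hamiltonian Monte Carlo, rather than fixed.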

Updated: 2024-05-07