Growing strings in a chemical reaction space for searching retrosynthesis pathways
npj Computational Materials (IF 9.7), Pub Date: 2024-05-10, DOI: 10.1038/s41524-024-01290-x
Federico Zipoli, Carlo Baldassari, Matteo Manica, Jannis Born, Teodoro Laino

Machine learning algorithms have shown great accuracy in predicting chemical reaction outcomes and retrosyntheses. However, designing synthesis pathways remains challenging for existing machine learning models which are trained for single-step prediction. In this manuscript, we propose to recast the retrosynthesis problem as a string optimization problem in a data-driven fingerprint space, leveraging the similarity between chemical reactions and embedding vectors. Based on this premise, multi-step complex synthesis can be conceptualized as sequences that link multidimensional vectors (fingerprints) representing individual chemical reaction steps. We extracted an extensive corpus of chemical synthesis from patents and converted them into multidimensional strings. While optimizing the retrosynthetic path, we use the Euclidean metric to minimize the distance between the expanded trajectory of the growing retrosynthesis string and the corpus of extracted strings. By doing so, we promote the assembly of synthetic pathways that, in the chemical reaction space, will be more similar to existing retrosyntheses, thereby inheriting the strategic guidelines designed by human experts. We integrated this approach into the RXN platform (https://rxn.res.ibm.com/) and present the method’s application to complex synthesis as well as its ability to produce better synthetic strategies than current methodologies.
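
The distance-guided growth described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how strings are aligned or scored, so the sketch assumes that each pathway is an array of per-step fingerprint vectors, that a growing string is compared step by step against the prefix of each corpus string, and that the score is the mean Euclidean distance to the nearest corpus string. The function names (`string_distance`, `nearest_corpus_distance`, `pick_next_step`) and the toy 2-D fingerprints are hypothetical.

```python
import numpy as np


def string_distance(partial, reference):
    """Mean step-wise Euclidean distance between a growing string and the
    equally long prefix of a reference string (assumed alignment rule)."""
    k = len(partial)
    if len(reference) < k:
        # Assumption: references shorter than the growing string are ignored.
        return float("inf")
    return float(np.linalg.norm(partial - reference[:k], axis=1).mean())


def nearest_corpus_distance(partial, corpus):
    """Distance from the growing string to its closest corpus string."""
    return min(string_distance(partial, ref) for ref in corpus)


def pick_next_step(partial, candidates, corpus):
    """Append each candidate fingerprint in turn and keep the one whose
    grown string lies closest to the corpus."""
    scores = []
    for cand in candidates:
        grown = np.vstack([partial, cand[None, :]])
        scores.append(nearest_corpus_distance(grown, corpus))
    best = int(np.argmin(scores))
    return best, scores[best]


# Toy data: two corpus strings and one partial string in a 2-D fingerprint space.
corpus = [
    np.array([[0.0, 0.0], [1.0, 0.0]]),
    np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]),
]
partial = np.array([[0.1, 0.0]])
candidates = np.array([[0.9, 0.1], [5.0, 5.0]])

best, score = pick_next_step(partial, candidates, corpus)
print(best, round(score, 3))
```

In this toy case the first candidate is selected, because appending it keeps the growing string close to the first corpus string. A real system would obtain the candidates from a single-step retrosynthesis model and the fingerprints from a learned reaction embedding, neither of which is modeled here.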




Updated: 2024-05-11