Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models
Nature Biomedical Engineering (IF 28.1), Pub Date: 2024-03-21, DOI: 10.1038/s41551-024-01193-8
Francisco Carrillo-Perez, Marija Pizurica, Yuanning Zheng, Tarak Nath Nandi, Ravi Madduri, Jeanne Shen, Olivier Gevaert

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.




Updated: 2024-03-23