Sparks of Generative Pretrained Transformers in Edge Intelligence for the Metaverse: Caching and Inference for Mobile Artificial Intelligence-Generated Content Services
IEEE Vehicular Technology Magazine ( IF 8.1 ) Pub Date : 2023-11-03 , DOI: 10.1109/mvt.2023.3323757
Minrui Xu, Dusit Niyato, Hongliang Zhang, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han
Aiming at achieving artificial general intelligence (AGI) for the metaverse, pretrained foundation models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide various artificial intelligence (AI) services, such as autonomous driving, digital twins (DTs), and AI-generated content (AIGC) for extended reality (XR). With the advantages of low latency and privacy preservation, serving PFMs for mobile AI services in edge intelligence is a viable solution, in which PFMs are cached and executed on edge servers with limited computing resources and GPU memory. However, PFMs typically consist of billions of parameters, making loading and execution computation- and memory-intensive for edge servers. In this article, we investigate edge PFM serving problems for mobile AIGC services in the metaverse. First, we introduce the fundamentals of PFMs and discuss their characteristic fine-tuning and inference methods in edge intelligence. Then, we propose a novel framework of joint model caching and inference for managing models and allocating resources to satisfy users' requests efficiently. Furthermore, considering the in-context learning ability of PFMs, we propose a new metric, the Age of Context (AoC), to evaluate the freshness and relevance between examples in demonstrations and the executing tasks. Finally, we propose a least-context (LC) algorithm for managing cached models at edge servers by balancing the tradeoff among latency, energy consumption, and accuracy.
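The abstract does not specify the LC algorithm's details; by analogy with least-recently-used (LRU) caching, a minimal illustrative sketch is given below, assuming that each cached PFM accumulates a count of in-context examples it has served and that the model with the fewest accumulated examples is evicted first when GPU memory is exhausted. All class and attribute names (`LeastContextCache`, `context_examples`, `capacity_gb`) are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CachedModel:
    """A PFM resident on the edge server's GPU memory."""
    name: str
    size_gb: float
    context_examples: int = 0  # in-context examples accumulated while cached


class LeastContextCache:
    """Hypothetical least-context (LC) eviction policy: when a new model
    must be loaded and memory is full, evict the cached model with the
    fewest accumulated in-context examples."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.models: dict[str, CachedModel] = {}

    def used_gb(self) -> float:
        return sum(m.size_gb for m in self.models.values())

    def serve(self, name: str, size_gb: float, new_examples: int = 1) -> None:
        if name not in self.models:
            # Evict least-context models until the requested model fits.
            while self.models and self.used_gb() + size_gb > self.capacity_gb:
                victim = min(self.models.values(),
                             key=lambda m: m.context_examples)
                del self.models[victim.name]
            self.models[name] = CachedModel(name, size_gb)
        self.models[name].context_examples += new_examples
```

Under this sketch, a model that keeps accumulating context (i.e., is fresher and more relevant to incoming tasks, in AoC terms) stays cached, while rarely exercised models are evicted first.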
