RAMP: A flat nanosecond optical network and MPI operations for distributed deep learning systems
Optical Switching and Networking ( IF 2.2 ) Pub Date : 2023-08-17 , DOI: 10.1016/j.osn.2023.100761
Alessandro Ottino , Joshua Benjamin , Georgios Zervas

Distributed deep learning (DDL) systems strongly depend on network performance. Current electronic packet switched (EPS) network architectures and technologies suffer from variable-diameter topologies, low bisection bandwidth, and over-subscription, which affect the completion time of communication and collective operations. We introduce a near-exascale, full-bisection-bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration, called RAMP, which supports large-scale distributed and parallel computing systems (12.8 Tbps per node for up to 65,536 nodes). For the first time, a custom RAMP-x MPI strategy and a network transcoder are proposed to run MPI collective operations across the optical circuit switched (OCS) network in a schedule-less and contention-less manner. RAMP achieves a 7.6-171× speed-up in completion time across all MPI operations compared to realistic EPS and OCS counterparts. It also delivers a 1.3-16× and 7.8-58× reduction in Megatron and DLRM training time respectively, while offering a 38-47× improvement in energy consumption and a 6.4-26.5× improvement in cost.
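To make the collective-operation cost concrete, the sketch below simulates standard ring all-reduce, the textbook MPI_Allreduce algorithm commonly used on EPS fabrics, which needs 2(p−1) sequential communication steps for p nodes. This is illustrative only and is not the paper's RAMP-x strategy; RAMP's single-hop all-to-all optical fabric is precisely what lets collectives avoid such multi-step neighbor-to-neighbor schedules. All function and variable names here are hypothetical.

```python
import copy

def ring_allreduce(vectors):
    """Simulate ring all-reduce: each of p nodes holds a vector split into
    p chunks; afterwards every node holds the element-wise sum of all vectors.
    Returns (final per-node data, number of sequential communication steps)."""
    p = len(vectors)
    data = [list(v) for v in vectors]
    steps = 0
    # Phase 1: reduce-scatter (p-1 steps). In step s, node i receives chunk
    # (i-1-s) mod p from its ring predecessor and accumulates it.
    for s in range(p - 1):
        snap = copy.deepcopy(data)  # all sends in a step happen concurrently
        for i in range(p):
            c = (i - 1 - s) % p
            data[i][c] += snap[(i - 1) % p][c]
        steps += 1
    # After phase 1, node i holds the fully reduced chunk (i+1) mod p.
    # Phase 2: all-gather (p-1 steps). In step s, node i receives the already
    # reduced chunk (i-s) mod p from its predecessor and overwrites its copy.
    for s in range(p - 1):
        snap = copy.deepcopy(data)
        for i in range(p):
            c = (i - s) % p
            data[i][c] = snap[(i - 1) % p][c]
        steps += 1
    return data, steps

# Example: 4 nodes, each contributing a 4-chunk vector.
p = 4
vecs = [[float(i * p + c) for c in range(p)] for i in range(p)]
result, steps = ring_allreduce(vecs)
# Every node ends with the column-wise sums, after 2*(p-1) = 6 steps.
```

On a flat single-hop network such as RAMP, each node can instead exchange chunks directly with every other node, collapsing this serialized neighbor schedule; that is the structural source of the collective-operation speed-ups the abstract reports.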




Updated: 2023-08-17