SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

InfiniBand In-Network Computing Technology and Roadmap


Authors: Gilad Shainer (Mellanox Technologies), Daniel Gruner (University of Toronto), Ron Hawkins (San Diego Supercomputer Center)

Abstract: Being a standards-based interconnect, InfiniBand benefits from the continuous development of new capabilities.

HDR 200G InfiniBand In-Network Computing technology provides innovative engines that offload and accelerate communication frameworks and application algorithms. The session will discuss InfiniBand In-Network Computing technology and testing results from DOE systems, Canada’s fastest InfiniBand Dragonfly-based supercomputer at the University of Toronto, the world’s first HDR 200G InfiniBand systems, and more.

As the need for faster data speeds grows, the InfiniBand Trade Association has been setting goals for future speeds, and this topic will also be covered in the session.


Long Description: Supercomputers are essential tools for conducting research, enabling scientific discoveries, designing new products, and developing self-learning software algorithms. Supercomputing leadership means scientific leadership, which explains the investments many governments and research institutes make to build faster and more powerful supercomputing platforms. The heart of a supercomputer is the network that connects the compute elements, enabling parallel and synchronized computing cycles. Over the past decades, many network technologies have emerged and many have disappeared. InfiniBand, an industry standard developed in 1999, continues to maintain a strong presence in the high-performance computing market. It connected one of the top three supercomputers in 2003, and today it is used in six of the top ten supercomputers in the world according to the TOP500 list. Being a standards-based interconnect, InfiniBand benefits from the continuous development of new capabilities, better performance, and greater scalability. It is used in many of the leading supercomputers around the world, demonstrating 96% network utilization with arguably the most advanced adaptive routing capabilities (source: “The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems”), and delivering leading performance for the most demanding compute-intensive applications.

InfiniBand technology can be separated into three main pillars: connectivity, network, and communication. The connectivity pillar covers the elements of the interconnect infrastructure, such as topologies. The network pillar covers elements such as network transport and routing. The communication pillar covers co-design elements related to communication frameworks such as MPI, SHMEM/PGAS, and more. In the past, smart interconnect development focused on offloading network functions from the CPU to the network.
With the new co-design approach, the next generation of smart interconnects will also offload data algorithms that are managed within the network, allowing users to run these algorithms while the data is being transferred through the system interconnect, rather than waiting for the data to reach the CPU. This technology is referred to as In-Network Computing. In-Network Computing transforms the data center interconnect into a “distributed CPU” and “distributed memory”, making it possible to overcome performance walls and enabling faster and more scalable data analysis. HDR 200G InfiniBand In-Network Computing technology provides innovative engines that accelerate and improve each of the pillars, such as the Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), a technology developed by Oak Ridge National Laboratory and Mellanox that received an R&D 100 Award, smart Tag Matching and rendezvous protocol, SHIELD, and more. These technologies are in use in several recent large-scale supercomputers around the world, including top TOP500 platforms.

The session will discuss InfiniBand In-Network Computing technology and testing results from DOE systems, Canada’s fastest InfiniBand Dragonfly-based supercomputer at the University of Toronto, the world’s first HDR 200G InfiniBand systems, and more. As the need for faster data speeds grows, the InfiniBand Trade Association has been setting goals for future speeds, and this topic will also be covered in the session.
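To make the idea of aggregating data inside the network more concrete, here is a minimal sketch of a hierarchical sum-allreduce over a switch tree. It is not Mellanox's SHARP implementation; it only illustrates the general principle that each switch adds the vectors arriving from its children and forwards a single result upward, so the root receives one vector per child switch rather than one per host. The two-level topology, the names `TOPOLOGY`, `in_network_reduce`, and `allreduce`, and the message counting are hypothetical assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level tree (assumption): each leaf switch connects three
# hosts, and a single root switch connects the two leaf switches.
TOPOLOGY: Dict[str, List[str]] = {
    "root": ["leaf0", "leaf1"],
    "leaf0": ["host0", "host1", "host2"],
    "leaf1": ["host3", "host4", "host5"],
}


def in_network_reduce(node: str,
                      host_data: Dict[str, List[float]],
                      received: Dict[str, int]) -> List[float]:
    """Sum vectors bottom-up; each switch forwards ONE aggregated vector upward.

    received[switch] records how many vectors that switch gets from its children.
    """
    if node in host_data:
        # A host simply sends its own vector to its leaf switch.
        return list(host_data[node])
    partials = [in_network_reduce(child, host_data, received)
                for child in TOPOLOGY[node]]
    received[node] = len(partials)
    # Element-wise sum of the child vectors, performed "in the switch".
    return [sum(values) for values in zip(*partials)]


def allreduce(host_data: Dict[str, List[float]]
              ) -> Tuple[Dict[str, List[float]], Dict[str, int]]:
    received: Dict[str, int] = {}
    total = in_network_reduce("root", host_data, received)
    # In this model the root would send `total` back down the tree,
    # so every host ends up with the same reduced vector.
    return {host: list(total) for host in host_data}, received


if __name__ == "__main__":
    data = {f"host{i}": [float(i), 1.0] for i in range(6)}
    results, received = allreduce(data)
    print(results["host4"])   # [15.0, 6.0]
    print(received["root"])   # 2 vectors reach the root instead of 6
```

In this toy model the root handles two incoming vectors instead of six, and the reduction work is spread across the switches along the data path rather than concentrated at a single endpoint, which is the intuition behind offloading reductions into the network.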



Back to Birds of a Feather Archive Listing