InfiniBand, Omni-Path, and High-Speed Ethernet: Advanced Features, Challenges in Designing HEC Systems and Usage
Time: Monday, 18 November 2019, 1:30pm - 5pm
Description: As InfiniBand (IB), Omni-Path, and High-Speed Ethernet (HSE) technologies mature, they are being used to design and deploy a wide range of High-End Computing (HEC) systems: HPC clusters with GPGPUs supporting MPI, storage and parallel file systems, cloud computing systems with SR-IOV virtualization, grid computing systems, and deep learning systems. These systems bring new challenges in performance, scalability, portability, reliability, and network congestion. Many scientists, engineers, researchers, managers, and system administrators are interested in learning about these challenges, the approaches being used to address them, and their impact on performance and scalability.
This tutorial will start with an overview of these systems. It will emphasize the advanced hardware and software features of IB, Omni-Path, HSE, and RoCE, and how they can address these challenges. Next, it will focus on OpenFabrics RDMA and libfabric programming, as well as the network management infrastructure and tools needed to use these systems effectively. A common set of challenges faced when designing these systems will then be presented, followed by case studies on domain-specific challenges, their solutions, and sample performance numbers. Finally, hands-on exercises will be carried out with the OpenFabrics and libfabric software stacks and network management tools.