Poster 22: Fast Profiling-Based Performance Modeling of Distributed GPU Applications

SC19 Proceedings

Student: Jaemin Choi (University of Illinois, Lawrence Livermore National Laboratory)
Supervisor: Abhinav Bhatele (Lawrence Livermore National Laboratory)

Abstract: An increasing number of applications use GPUs to accelerate computation, with MPI responsible for communication in distributed environments. Existing performance models focus on either GPU kernels or MPI communication, but not both; the few that do model the entire application are often too specialized for a single application and require extensive input from the programmer.

To model different types of distributed GPU applications quickly, we propose a profiling-based methodology for creating performance models. We build upon the roofline performance model for GPU kernels and analytical models for MPI communication, while significantly reducing profiling time. We also develop a benchmark to model the 3D halo exchange that occurs in many scientific applications. Our proposed model for the main iteration loops of MiniFE achieves 6-7% prediction error on LLNL Lassen and 1-2% error on PSC Bridges, with minimal code inspection required to model MPI communication.
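
As a rough illustration of how the two ingredients named above can be combined, the following Python sketch pairs a roofline bound for kernel time with a latency-bandwidth cost for a six-face 3D halo exchange. It is a minimal sketch under stated assumptions, not the authors' model: the abstract does not say which analytical MPI model is used, so the latency-bandwidth form is an assumption, as are the absence of computation-communication overlap, the one-message-per-face exchange, and every function and parameter name.

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Roofline bound on kernel time: the slower of compute and memory traffic."""
    return max(flops / peak_flops, bytes_moved / peak_bandwidth)


def message_time(nbytes, latency, bandwidth):
    """Latency-bandwidth cost of one point-to-point message (assumed model)."""
    return latency + nbytes / bandwidth


def halo_exchange_time(nx, ny, nz, bytes_per_cell, latency, bandwidth):
    """Cost of sending the six faces of an nx*ny*nz block, one message per face.

    Assumes the six messages are sent one after another with no overlap.
    """
    face_cells = [ny * nz, ny * nz, nx * nz, nx * nz, nx * ny, nx * ny]
    return sum(
        message_time(cells * bytes_per_cell, latency, bandwidth)
        for cells in face_cells
    )


def iteration_time(kernels, halo_time, peak_flops, peak_bandwidth):
    """Predicted time of one iteration: kernel times plus halo exchange time.

    `kernels` is a list of (flops, bytes_moved) pairs, for example taken
    from profiler counters. Assumes no computation-communication overlap.
    """
    compute = sum(
        roofline_time(flops, nbytes, peak_flops, peak_bandwidth)
        for flops, nbytes in kernels
    )
    return compute + halo_time
```

In such a sketch, the per-kernel operation and byte counts would come from profiling, while the latency and bandwidth terms would be fitted from a communication benchmark on the target machine.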

ACM-SRC Semi-Finalist: no

