Performance Optimality or Reproducibility: That Is the Question

SC19 Proceedings

Performance Optimality or Reproducibility: That Is the Question

Authors: Tapasya Patki (Lawrence Livermore National Laboratory), Jayaraman J. Thiagarajan (Lawrence Livermore National Laboratory), Alexis Ayala (Western Washington University), Tanzima Z. Islam (Western Washington University)

Abstract: The era of extremely heterogeneous supercomputing brings with itself the devil of increased performance variation and reduced reproducibility. There is a lack of understanding in the HPC community on how the simultaneous consideration of network traffic, power limits, concurrency tuning, and interference from other jobs impacts application performance.

In this paper, we design a methodology that allows both HPC users and system administrators to understand the trade-off space between optimal and reproducible performance. We present a first-of-its-kind dataset that simultaneously varies multiple system- and user-level parameters on a production cluster, and introduce a new metric, called the desirability score, which enables comparison across different system configurations. We develop a novel, model-agnostic machine learning methodology based on the graph signal theory for comparing the influence of parameters on application predictability, and using a new visualization technique, make practical suggestions for best practices for multi-objective HPC environments.

Back to Technical Papers Archive Listing