Abstract: This work tackles the communication challenges posed by the latency-bound applications with irregular communication patterns, i.e., applications with high average and/or maximum message counts. We propose a novel algorithm for reorganizing a given set of irregular point-to-point messages with the objective of reducing total latency cost at the expense of increased volume. We organize processes into a virtual process topology inspired by the k-ary n-cube networks and regularize irregular messages by imposing regular communication pattern(s) onto them. Exploiting this process topology, we propose a flexible store-and-forward algorithm to control the trade-off between latency and volume. Our approach is able to reduce the communication time of sparse-matrix multiplication with latency-bound instances drastically: up to 22.6x for 16K processes on a 3D Torus network and up to 7.2x for 4K processes on a Dragonfly network, with its performance getting better with increasing number of processes.
Back to Technical Papers Archive Listing