Workshop: Conveyors for Streaming Many-to-Many Communication
Abstract: We report on a software package that offers high-bandwidth and memory-efficient ways for a parallel application to transmit numerous small data items among its processes. The package provides a standalone library that can be integrated into any SHMEM, UPC, or MPI application. It defines a simple interface to parallel objects called conveyors, and it provides a variety of conveyor implementations. Often the most efficient type of conveyor is an asynchronous three-hop conveyor, which makes heavy use of fast intranode communication. This type also uses the least memory internally. Conveyors of this type scale well to 100,000 processes and beyond.
Our experience with conveyors applied to irregular algorithms at scale has convinced us of the necessity and profitability of message aggregation. The conveyor interface is a low-level C API that is intended to guide future hardware and runtime improvements and to be a foundation for future parallel programming models.