Abstract: Inter-thread communication is a vital performance indicator in shared-memory systems. Prior works on identifying inter-thread communication employed hardware simulators or binary instrumentation and suffered from inaccuracy or high overheads—both space and time—making them impractical for production use. We propose ComDetective, which produces communication matrices that are accurate and introduces low runtime and memory overheads, thus making it practical for production use.
ComDetective employs hardware performance counters to sample memory-access events and uses hardware debug registers to sample communicating pairs of threads. ComDetective can differentiate communication as true or false sharing between threads. Its runtime and memory overheads are only 1.30x and 1.27x, respectively, for the 18 applications studied under 500K sampling period. Using ComDetective, we produce insightful communication matrices for micro-benchmarks, PARSEC benchmark suite, and several CORAL applications and compare the generated matrices against MPI counterparts. Guided by ComDetective we optimize a few codes and achieve up to 13% speedup.
Back to Technical Papers Archive Listing