Presentation

Poster 48: Runtime System for GPU-Based Hierarchical LU Factorization

Session: Research Posters Display

Authors

Qianxiang Ma

Rio Yokota

Event Type

Posters

Research Posters

Time: Wednesday, 20 November 2019, 8:30am - 5pm

Location: E Concourse

Description: Hierarchical low-rank approximation can reduce both the storage and computation costs of dense matrices, but its implementation is challenging. In this research, we tackle one of the most difficult problems in this area: parallelizing the factorization of these hierarchical matrices on GPUs. To this end, we are developing a new runtime system for GPUs that can schedule all tasks into one GPU kernel. Other existing runtime systems, like cuGraph and Stanford Legion, can only manage streams and kernel-level parallelism. Even without extensive tuning, we achieved 4x better performance in H-LU factorization on a single GPU compared with a well-tuned CPU-based hierarchical matrix library, HLIBpro, on moderately sized matrices. Additionally, our runtime exposes significantly lower overhead when processing smaller matrices.
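To make the notion of "tasks" that a runtime must schedule more concrete, the following is a minimal Python sketch of the task graph for a plain dense tiled LU factorization without pivoting. It is an illustration only, not the poster's runtime system and not the hierarchical (H-LU) algorithm: the function names `lu_task_graph` and `wavefronts` are hypothetical, and the task labels GETRF, TRSM and GEMM are borrowed from conventional LAPACK/BLAS naming. The sketch groups tasks into wavefronts, where every task in a wavefront depends only on tasks in earlier ones and could therefore run concurrently.

```python
from collections import defaultdict


def lu_task_graph(nb):
    """Map each task to its prerequisites for an nb x nb tiled LU (no pivoting).

    Illustrative assumption: a dense tiled LU, not the hierarchical variant.
    """
    deps = {}
    for k in range(nb):

        def prev_update(i, j, k=k):
            # Block (i, j) must have received the update from step k - 1.
            return {("GEMM", i, j, k - 1)} if k > 0 else set()

        # Factorize the diagonal block.
        deps[("GETRF", k)] = prev_update(k, k)
        # Triangular solves along block row k and block column k.
        for j in range(k + 1, nb):
            deps[("TRSM_ROW", k, j)] = {("GETRF", k)} | prev_update(k, j)
        for i in range(k + 1, nb):
            deps[("TRSM_COL", i, k)] = {("GETRF", k)} | prev_update(i, k)
        # Trailing-matrix updates: A[i][j] -= L[i][k] * U[k][j].
        for i in range(k + 1, nb):
            for j in range(k + 1, nb):
                deps[("GEMM", i, j, k)] = {
                    ("TRSM_COL", i, k),
                    ("TRSM_ROW", k, j),
                } | prev_update(i, j)
    return deps


def wavefronts(deps):
    """Group tasks by dependency depth; tasks in one group are independent."""
    level = {}

    def depth(task):
        if task not in level:
            level[task] = 1 + max((depth(d) for d in deps[task]), default=-1)
        return level[task]

    waves = defaultdict(list)
    for task in deps:
        waves[depth(task)].append(task)
    return [waves[lvl] for lvl in sorted(waves)]


if __name__ == "__main__":
    for step, tasks in enumerate(wavefronts(lu_task_graph(3))):
        print(step, tasks)
```

Even in this small dense case the wavefronts vary widely in size, and the diagonal factorizations sit alone on the critical path. Launching each such task as its own kernel gives many small launches, which is the kind of overhead that scheduling all tasks inside one kernel, as the description proposes, aims to avoid.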