BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163204Z
LOCATION:401-402-403-404
DTSTART;TZID=America/Denver:20191119T153000
DTEND;TZID=America/Denver:20191119T160000
UID:submissions.supercomputing.org_SC19_sess144_pap293@linklings.com
SUMMARY:Red-Blue Pebbling Revisited: Near Optimal Parallel Matrix Multipli
 cation
DESCRIPTION:Paper\n\nRed-Blue Pebbling Revisited: Near Optimal Parallel Ma
 trix Multiplication\n\nKwasniewski, Kabic, Besta, Solca, VandeVondele...\n
 \nWe propose COSMA: a parallel matrix-matrix multiplication algorithm tha
 t is near communication-optimal for all combinations of matrix dimensions
 , processor counts, and memory sizes. The key idea behind COSMA is to der
 ive an optimal (up to a factor of 0.03% for 10 MB of fast memory) sequent
 ial schedule and then parallelize it, preserving I/O optimality. To achi
 eve this, we use the red-blue pebble game to precisely model MMM depende
 ncies and derive constructive and tight sequential and parallel I/O lowe
 r bound proofs. Compared to 2D or 3D algorithms, which fix processor deco
 mposition upfront and then map it to the matrix dimensions, COSMA reduce
 s communication volume by up to a factor of sqrt(3). COSMA outperforms t
 he established ScaLAPACK, CARMA, and CTF algorithms in all scenarios by u
 p to 12.8x (2.2x on average), achieving up to 88% of Piz Daint's peak pe
 rformance. Our work does not require any hand tuning and is maintained a
 s an open source implementation.\n\nTag: Tech Program Reg Pass, BP Finali
 st, BSP Finalist, Algorithms, I/O, Linear Algebra, Parallel Programming L
 anguages, Libraries, and Models, Performance, Task-based programming\n\nR
 egistration Category: Tech Program Reg Pass, BP Finalist, BSP Finalist, A
 lgorithms, I/O, Linear Algebra, Parallel Programming Languages, Librarie
 s, and Models, Performance, Task-based programming\n\nAward Finalist: Tec
 h Program Reg Pass, BP Finalist, BSP Finalist, Algorithms, I/O, Linear A
 lgebra, Parallel Programming Languages, Libraries, and Models, Performan
 ce, Task-based programming
URL:https://sc19.supercomputing.org/presentation/?id=pap293&sess=sess144
END:VEVENT
END:VCALENDAR