BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163558Z
LOCATION:704-706
DTSTART;TZID=America/Denver:20191117T120000
DTEND;TZID=America/Denver:20191117T123000
UID:submissions.supercomputing.org_SC19_sess114_ws_prot111@linklings.com
SUMMARY:Performance Analysis of Tile Low-Rank Cholesky Factorization Using
  PaRSEC Instrumentation Tools
DESCRIPTION:Workshop\n\nPerformance Analysis of Tile Low-Rank Cholesky Fac
 torization Using PaRSEC Instrumentation Tools\n\nCao, Pei, Herault, Akbuda
 k, Mikhalev...\n\nThis paper highlights the necessary development of new
  instrumentation tools within the PaRSEC task-based runtime system to
  leverage the performance of low-rank matrix computations. In particular,
  the tile low-rank (TLR) Cholesky factorization represents one of the most
  critical matrix operations toward solving challenging large-scale
  scientific applications. The challenge resides in the heterogeneous
  arithmetic intensity of the various computational kernels, which
  stresses PaRSEC's dynamic engine when orchestrating the task executions
  at runtime. Such an irregular workload requires the deployment of new
  scheduling heuristics that prioritize the critical path while exposing
  task parallelism to maximize hardware occupancy. To measure the
  effectiveness of PaRSEC's engine and its various scheduling strategies
  for tackling such workloads, it becomes paramount to implement adequate
  performance analysis and profiling tools tailored to fine-grained and
  heterogeneous task execution. This permits us not only to provide
  insights from PaRSEC, but also to identify potential performance
  bottlenecks in applications. These instrumentation tools may foster
  synergy between application and PaRSEC developers for productivity as
  well as high-performance computing purposes. We demonstrate the
  benefits of these tools while assessing the performance of TLR Cholesky
  factorization from data distribution, communication-reducing, and
  synchronization-reducing perspectives. This tool-assisted performance
  analysis results in three major contributions: a new hybrid data
  distribution, a new hierarchical TLR Cholesky algorithm, and a new
  performance model for tuning the tile size. The new TLR Cholesky
  factorization achieves an 8x speedup over existing implementations on
  massively parallel supercomputers, toward solving large-scale 3D climate
  and weather prediction applications.\n\nTag: Workshop Reg Pass,
  Performance, Programming Systems, Visualization\n\nRegistration Category:
  Workshop Reg Pass, Performance, Programming Systems, Visualization
URL:https://sc19.supercomputing.org/presentation/?id=ws_prot111&sess=sess1
 14
END:VEVENT
END:VCALENDAR

