BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163600Z
LOCATION:607
DTSTART;TZID=America/Denver:20191118T113000
DTEND;TZID=America/Denver:20191118T115000
UID:submissions.supercomputing.org_SC19_sess124_ws_lasalss104@linklings.co
 m
SUMMARY:GPU Acceleration of Communication Avoiding Chebyshev Basis Conjuga
 te Gradient Solver for Multiphase CFD Simulations
DESCRIPTION:Workshop\n\nGPU Acceleration of Communication Avoiding Chebysh
 ev Basis Conjugate Gradient Solver for Multiphase CFD Simulations\n\nAli, 
 Onodera, Idomura, Ina, Imamura\n\nIterative methods for solving large line
 ar systems are common parts of computational fluid dynamics (CFD) codes. T
 he Preconditioned Conjugate Gradient (P-CG) method is one of the most wide
 ly used iterative methods. However, in the P-CG method, global collective 
 communication is a crucial bottleneck especially on accelerated computing 
 platforms. To resolve this issue, communication avoiding (CA) variants of 
 the P-CG method are becoming increasingly important. In this paper, the P-
 CG and Preconditioned Chebyshev Basis CA CG (P-CBCG) solvers in the multip
 hase CFD code JUPITER are ported to the latest V100 GPUs. All GPU kernels
  are highly optimized to achieve about 90% of the roofline performance, th
 e block Jacobi preconditioner is re-designed to exploit the high computing
  power of GPUs, and the remaining bottleneck of halo data commun
 ication is avo
 ided by overlapping communication and computation. The overall performance
  of the P-CG and P-CBCG solvers is determined by the competition between t
 he CA properties of the global collective communication and the halo data 
 communication, indicating the importance of the inter-node interconnect ba
 ndwidth per GPU.\nThe developed GPU solvers are accelerated up to 2x compa
 red with the former CPU solvers on KNLs, and excellent strong scaling is a
 chieved up to 7,680 GPUs on Summit.\n\nTag: Workshop Reg Pass, Algorith
 ms, Scalable Computing\n\nRegistration Category: Workshop Reg Pass, Algori
 thms, Scalable Computing
URL:https://sc19.supercomputing.org/presentation/?id=ws_lasalss104&sess=se
 ss124
END:VEVENT
END:VCALENDAR

