BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163600Z
LOCATION:501
DTSTART;TZID=America/Denver:20191118T113000
DTEND;TZID=America/Denver:20191118T115200
UID:submissions.supercomputing.org_SC19_sess117_ws_mchpc107@linklings.com
SUMMARY:Optimizing Memory Layout of Hyperplane Ordering for Vector Superco
 mputer SX-Aurora TSUBASA
DESCRIPTION:Workshop\n\nOptimizing Memory Layout of Hyperplane Ordering fo
 r Vector Supercomputer SX-Aurora TSUBASA\n\nWatanabe, Hougi, Komatsu, Sato
 , Musa...\n\nThis paper describes the performance optimization of hyperpla
 ne ordering methods applied to the high-cost routine of the turbine simula
 tion code called “Numerical Turbine” for the newest vector supercomputer. 
 The Numerical Turbine code is a computational fluid dynamics code develope
 d at Tohoku University, which can execute large-scale parallel calculation
  of the entire thermal flow through multistage cascades of gas and steam t
 urbines. The Numerical Turbine code is a memory-intensive application tha
 t requires a high memory bandwidth to achieve a high sustained performance
 . For this reason, it is implemented on a vector supercomputer equipped w
 ith a high-performance memory subsystem. The main performance bottleneck o
 f the Numerical Turbine code is the time-integration routine. To vectorize
  the lower-upper symmetric Gauss-Seidel method used in this time-integrati
 on routine, a hyperplane ordering method is used. We clarify the problems 
 of the current hyperplane ordering methods for the newest vector supercomp
 uter NEC SX-Aurora TSUBASA and propose an optimized hyperplane ordering me
 thod that changes the data layout in the memory to resolve this bottleneck
 . The performance evaluation shows that the proposed hyperplane ordering
  improves performance by up to 2.77x, and by 1.27x on average.\n\nTag:
  Workshop Reg Pass, HPC, Memory, O
 S and Runtime Systems, Runtime Systems\n\nRegistration Category: Workshop 
 Reg Pass, HPC, Memory, OS and Runtime Systems, Runtime Systems
URL:https://sc19.supercomputing.org/presentation/?id=ws_mchpc107&sess=sess
 117
END:VEVENT
END:VCALENDAR

