BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163600Z
LOCATION:501
DTSTART;TZID=America/Denver:20191118T104600
DTEND;TZID=America/Denver:20191118T110800
UID:submissions.supercomputing.org_SC19_sess117_ws_mchpc103@linklings.com
SUMMARY:Optimizing Data Layouts for Irregular Applications on a Migratory 
 Thread Architecture
DESCRIPTION:Workshop\n\nOptimizing Data Layouts for Irregular Applications
  on a Migratory Thread Architecture\n\nRolinger, Krieger, Sussman\n\nAppli
 cations that operate on sparse data induce irregular data access patterns 
 and cannot take full advantage of caches and prefetching. Novel hardware a
 rchitectures have been proposed to address the disparity between processor
  and memory speeds by moving computation closer to memory. One such archit
 ecture is the Emu system, which employs light-weight threads that migrate 
 to the location of the data being accessed. While smart heuristics and pro
 file-guided techniques have been developed to derive good data layouts for
  traditional machines, these methods are largely ineffective when applied 
 to a migratory thread architecture. In this work, we present an applicatio
 n-independent framework for data layout optimizations that targets the Emu
  architecture. We discuss the necessary tools and concepts to facilitate s
 uch optimizations, including a data-centric profiler, data distribution li
 brary, and cost model. To demonstrate the framework, we have designed a bl
 ock placement optimization that distributes blocks of data across the syst
 em such that access latency is reduced. The optimization was applied to sp
 arse matrix-vector multiplication on an Emu FPGA implementation, achieving
  a geometric mean speedup of 12.5% across 57 matrices. Only one matrix exp
 erienced a loss of performance of 6%, while the maximum runtime speedup wa
 s 50%.\n\nTag: Workshop Reg Pass, HPC, Memory, OS and Runtime System
 s, Runtime Systems\n\nRegistration Category: Workshop Reg Pass, HPC, Memor
 y, OS and Runtime Systems, Runtime Systems
URL:https://sc19.supercomputing.org/presentation/?id=ws_mchpc103&sess=sess
 117
END:VEVENT
END:VCALENDAR