BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163557Z
LOCATION:505
DTSTART;TZID=America/Denver:20191118T162000
DTEND;TZID=America/Denver:20191118T163000
UID:submissions.supercomputing.org_SC19_sess128_ws_ia104@linklings.com
SUMMARY:Performance Impact of Memory Channels on Sparse and Irregular Algo
 rithms
DESCRIPTION:Workshop\n\nPerformance Impact of Memory Channels on Sparse an
 d Irregular Algorithms\n\nGreen\, Fox\, Young\, Shirako\, Bader\n\nGr
 aph proce
 ssing is typically considered to be a memory-bound rather than compute-bou
 nd problem. One common line of thought is that more available memory bandw
 idth corresponds to better graph processing performance. However\, in th
 is work we demonstrate that the key factor in the utilization of the mem
 ory s
 ystem for graph algorithms is not necessarily the raw bandwidth or even th
 e latency of memory requests. Instead\, we show that performance is prop
 ort
 ional to the number of memory channels available to handle small data tran
 sfers with limited spatial locality.\n\nUsing several widely used graph fr
 ameworks\, including Gunrock (on the GPU) and GAPBS and Ligra (for CPUs)
 \, w
 e evaluate key graph analytics kernels using two unique memory hierarchies
 \, DDR-based and HBM/MCDRAM. Our results show that the differences in th
 e p
 eak bandwidths of several Pascal-generation GPU memory subsystems aren't r
 eflected in the performance of various analytics. Furthermore\, our exper
 im
 ents on CPU and Xeon Phi systems (see arXiv extended version) demonstrate 
 that the number of memory channels utilized can be a decisive factor in pe
 rformance across several different applications. For CPU systems with smal
 ler thread counts\, the memory channels can be underutilized while syst
 ems with high thread counts can oversaturate the memory subsystem\, whi
 ch leads to limited performance. Finally\, we model the potential perfor
 mance impro
 vements of adding more memory channels with narrower access widths than ar
 e found in current platforms. We analyze performance trade-offs for the tw
 o most prominent types of memory accesses found in graph algorithms\, st
 reaming and random accesses.\n\nTag: Workshop Reg Pass\, Algorithms\, Ar
 chitectures\, Irregular Applications\n\nRegistration Category: Worksh
 op Reg Pass\, Algorithms\, Architectures\, Irregular Applications
URL:https://sc19.supercomputing.org/presentation/?id=ws_ia104&sess=sess128
END:VEVENT
END:VCALENDAR

