BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163557Z
LOCATION:702
DTSTART;TZID=America/Denver:20191118T160000
DTEND;TZID=America/Denver:20191118T163000
UID:submissions.supercomputing.org_SC19_sess127_ws_waccpd109@linklings.com
SUMMARY:A Portable SIMD Primitive Using Kokkos for Heterogeneous Architect
 ures
DESCRIPTION:Workshop\n\nA Portable SIMD Primitive Using Kokkos for Heterog
 eneous Architectures\n\nSahasrabudhe, Phipps, Rajamanickam, Berzins\n\nAs 
 computer architectures are rapidly evolving (e.g. those designed for exasc
 ale), multiple portability frameworks have been developed to avoid new arc
 hitecture-specific development and tuning. However, portability frameworks
  depend on compilers for auto-vectorization and may lack support for expli
 cit vectorization on heterogeneous platforms. Alternatively, programmers c
 an use intrinsics-based primitives to achieve more efficient vectorization
 , but the lack of a GPU back-end for these primitives makes such code non-
 portable. A unified, portable Single Instruction Multiple Data (SIMD) pri
 mitive proposed in this work allows intrinsics-based vectorization on CPU
 s and many-core architectures such as Intel Knights Landing (KNL), and als
 o facilitates Single Instruction Multiple Threads (SIMT) based execution o
 n GPUs. This unified primitive, coupled with the Kokkos portability ecosys
 tem, makes it possible to develop explicitly vectorized code that is por
 table across heterogeneous platforms. The new SIMD primitive is used on di
 fferent architectures to test the performance boost against a hard-to-
 auto-vectorize baseline, to measure the overhead against an efficie
 ntly vectorized baseline, and to evaluate the new feature called th
 e “logical vector length” (LVL). The SIMD primitive provides porta
 bility across CPUs and GPUs with no performance degradation observed ex
 perimentally.\n\nTag: Wo
 rkshop Reg Pass, Accelerators, Parallel Application Frameworks, Parallel P
 rogramming Languages, Libraries, and Models, Scientific Computing, Softwar
 e Engineering\n\nRegistration Category: Workshop Reg Pass, Accelerators, P
 arallel Application Frameworks, Parallel Programming Languages, Libraries,
  and Models, Scientific Computing, Software Engineering
URL:https://sc19.supercomputing.org/presentation/?id=ws_waccpd109&sess=ses
 s127
END:VEVENT
END:VCALENDAR