BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163559Z
LOCATION:502-503-504
DTSTART;TZID=America/Denver:20191118T103000
DTEND;TZID=America/Denver:20191118T110000
UID:submissions.supercomputing.org_SC19_sess115_ws_mlhpce116@linklings.com
SUMMARY:Understanding Scalability and Fine-Grain Parallelism of Synchronou
 s Data Parallel Training
DESCRIPTION:Workshop\n\nUnderstanding Scalability and Fine-Grain Paralleli
 sm of Synchronous Data Parallel Training\n\nLi, Nicolae, Wozniak, Bosilca\
 n\nIn the age of big data, deep learning has emerged as a powerful tool to
  extract insight and exploit its value, both in industry and scientific ap
 plications. With increasing complexity of learning models and amounts
  of training data, data-parallel approaches based on frequent
  all-reduce synchronization steps are increasingly popular. Although
  high-performance computing (HPC) technologies have been designed to
  address such patterns efficiently, the behavior of data-parallel
  approaches on HPC plat
 forms is not well understood. To address this issue, in this paper we
  study the behavior of Horovod, a popular data-parallel approach that
  relies on MPI, on Theta, a pre-Exascale machine at Argonne National
  Laboratory. Using two representative applications, we explore two
  aspects: (1) how performance and scalability are affected by
  important parameters such as number of nodes, number of workers,
  threads per node, and batch size; (2) how computational phases are
  interleaved with all-reduce communication phases at fine granularity
  and what consequences this interleaving has in terms of potential
  bottlenecks. Our findings show that pipelining of back-propagation,
  gradient reduction and weight updates mitigates the effects of
  stragglers during all-reduce only partially. Furthermore, there can
  be significant delays between weight updates, which can be leveraged
  to mask the overhead of additional background operations
  that are coupled with the training.\n\nTag: Workshop Reg Pass, Machine Le
 arning\n\nRegistration Category: Workshop Reg Pass, Machine Learning
URL:https://sc19.supercomputing.org/presentation/?id=ws_mlhpce116&sess=ses
 s115
END:VEVENT
END:VCALENDAR

