BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163557Z
LOCATION:710
DTSTART;TZID=America/Denver:20191118T121000
DTEND;TZID=America/Denver:20191118T123000
UID:submissions.supercomputing.org_SC19_sess125_pec267@linklings.com
SUMMARY:On the Feasibility of Optical Circuit Switching for Distributed De
 ep Learning
DESCRIPTION:Workshop\n\nOn the Feasibility of Optical Circuit Switching fo
 r Distributed Deep Learning\n\nNguyen\n\nData parallelism is the dominant 
 method used to train deep learning (DL) models on High-Performance Compu
 ting systems such as large-scale GPU clusters. In this setting, collectiv
 e communication of large messages (e.g., up to hundreds of MB) between GP
 Us becomes one of the major bottlenecks. This is especially true when tra
 ining a deep learning model on a large number of nodes, where inter-node c
 ommunication becomes a bottleneck due to its higher latency and lower lin
 k bandwidth compared with intra-node communication. To cope with this pro
 blem, several techniques have been proposed to (a) optimize collective co
 mmunication algorithms to account for the network topology, (b) reduce th
 e message size, and (c) overlap communication with computation. All of t
 hese approaches aim to deal with the large message size while mitigating t
 he impact of inter-node network limitations. In this study, we investigat
 e the benefit of increasing inter-node link bandwidth by using hybrid swi
 tching systems, i.e., Electrical Packet Switching and Optical Circuit Swi
 tching. We find that the typical data transfers of synchronous data-paral
 lel training are long-lived and rarely change, so they can be sped up wit
 h optical switching. Simulation results on the SimGrid simulator show tha
 t our approach reduces the training time of deep learning applications.
 \n\nTag: Worksh
 op Reg Pass, Architectures, Datacenter, Emerging Technologies, Hardware, H
 PC, I/O, Networks, Photonics, Silicon Fabrication\n\nRegistration Category
 : Workshop Reg Pass, Architectures, Datacenter, Emerging Technologies, Har
 dware, HPC, I/O, Networks, Photonics, Silicon Fabrication
URL:https://sc19.supercomputing.org/presentation/?id=pec267&sess=sess125
END:VEVENT
END:VCALENDAR

