BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163557Z
LOCATION:708
DTSTART;TZID=America/Denver:20191117T160000
DTEND;TZID=America/Denver:20191117T163000
UID:submissions.supercomputing.org_SC19_sess112_pec251@linklings.com
SUMMARY:A Performance Comparison of Dask and Apache Spark for Data-Intensi
 ve Neuroimaging Pipelines
DESCRIPTION:Workshop\n\nA Performance Comparison of Dask and Apache Spark 
 for Data-Intensive Neuroimaging Pipelines\n\nDugré, Hayot-Sasson, Glatard\
 n\nIn the past few years, neuroimaging has entered the Big Data era due to
  the joint increase in image resolution, data sharing, and study sizes. Ho
 wever, no particular Big Data engines have emerged in this field, and seve
 ral alternatives remain available. We compare two popular Big Data engines
  with Python APIs, Apache Spark and Dask, for their runtime performance in
  processing neuroimaging pipelines. Our evaluation uses two synthetic pipe
 lines processing the 81GB BigBrain image, and a real pipeline processing a
 natomical data from more than 1,000 subjects. We benchmark these pipelines
  using various combinations of task durations, data sizes, and numbers of 
 workers, deployed on an 8-node (8 cores each) compute cluster in Compute C
 anada's Arbutus cloud. We evaluate PySpark's RDD API against Dask's Bag,
  De
 layed and Futures. Results show that despite slight differences between Sp
 ark and Dask, both engines perform comparably. However, Dask pipelines ris
 k being limited by Python's GIL depending on task type and cluster configu
 ration. In all cases, the major limiting factor was data transfer. While e
 ither engine is suitable for neuroimaging pipelines, more effort needs to 
 be placed in reducing data transfer time.\n\nTag: Workshop Reg Pass, Extre
 me Scale Computing, Scalable Computing, Scientific Workflows\n\nRegistrati
 on Category: Workshop Reg Pass, Extreme Scale Computing, Scalable Computin
 g, Scientific Workflows
URL:https://sc19.supercomputing.org/presentation/?id=pec251&sess=sess112
END:VEVENT
END:VCALENDAR

