Aggregating Local Storage for Scalable Deep Learning I/O
Time: Sunday, 17 November 2019, 12:10pm - 12:30pm
Description: Deep learning applications place heavy I/O loads on computer systems. Their long-running, highly concurrent, and random file accesses can easily saturate traditional shared file systems and degrade performance for other users. We investigate a solution to these problems that uses local storage and the interconnect to serve training datasets at scale. We present FanStore, a user-level transient object store that provides low-latency, scalable, POSIX-compliant file access by combining function interception with various metadata and data placement strategies. On a single node, FanStore provides performance similar to that of the XFS journaling file system. On many nodes, our experiments with real applications show that FanStore achieves over 90% scaling efficiency.
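
The abstract does not say how scaling efficiency is measured. As a rough illustration only, the sketch below uses the common definition: the throughput achieved on many nodes divided by the throughput that perfect linear scaling from a baseline run would give. The function name, parameters, and numbers are hypothetical and are not taken from the FanStore experiments.

```python
def scaling_efficiency(base_throughput, base_nodes, scaled_throughput, scaled_nodes):
    """Return the fraction of ideal linear scaling that was achieved.

    Assumed definition (not stated in the abstract):
        efficiency = scaled_throughput / (base_throughput * scaled_nodes / base_nodes)
    Throughput can be in any consistent unit, e.g. training samples per second.
    """
    if min(base_throughput, scaled_throughput) <= 0:
        raise ValueError("throughputs must be positive")
    if min(base_nodes, scaled_nodes) <= 0:
        raise ValueError("node counts must be positive")
    # Throughput we would see if performance grew perfectly linearly with nodes.
    ideal_throughput = base_throughput * (scaled_nodes / base_nodes)
    return scaled_throughput / ideal_throughput


if __name__ == "__main__":
    # Made-up numbers for illustration: 100 samples/s on 1 node,
    # 1500 samples/s on 16 nodes (ideal would be 1600 samples/s).
    eff = scaling_efficiency(100.0, 1, 1500.0, 16)
    print(f"scaling efficiency: {eff:.1%}")
```

Under this definition, a value above 0.9 means the multi-node run retains more than 90% of the throughput that perfect linear scaling would deliver.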