Poster 147: Extremely Accelerated Deep Learning: ResNet-50 Training in 70.4 Seconds
Time: Thursday, 21 November 2019, 8:30am - 5pm
Description: Distributed deep learning using a large mini-batch is a key technology for accelerating training. However, it is difficult to achieve high scalability while maintaining validation accuracy in distributed training on large clusters. We introduce two optimizations: reducing the computation time and overlapping the communication with the computation. By applying these techniques on 2,048 GPUs, we achieved the world's fastest ResNet-50 training in MLPerf, the de facto standard DNN benchmark (as of July 2019).
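The overlap idea mentioned above can be sketched in plain Python: as backpropagation produces each layer's gradients, the gradient all-reduce for that layer is launched in a background thread while earlier layers keep computing. This is a minimal, hypothetical illustration only; the function names (`backward_layer`, `allreduce`) are stand-ins, and a real implementation would use NCCL or MPI collectives on GPUs, not Python threads.

```python
import concurrent.futures

def backward_layer(layer_id, grads):
    # Stand-in for per-layer gradient computation during backprop.
    return [g * 2 for g in grads]

def allreduce(bucket):
    # Stand-in for an all-reduce across workers; a real system
    # would call an asynchronous NCCL/MPI collective here.
    return [sum(bucket)] * len(bucket)

def train_step(num_layers=4):
    grads = [1.0] * 8
    buckets = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        for layer in reversed(range(num_layers)):
            grads = backward_layer(layer, grads)          # compute this layer's gradients
            buckets.append(pool.submit(allreduce, grads)) # communicate while backprop continues
        reduced = [f.result() for f in buckets]           # wait for all communication to finish
    return reduced
```

The key point is that communication for layer *k* runs concurrently with the gradient computation for layers *k-1, k-2, ...*, hiding much of the communication latency behind useful work.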