Presentation

Description

Large-batch training approaches have enabled researchers to utilize distributed processing and greatly accelerate deep neural network training. However, there are three problems in current large-batch research:

(1) Although RNN approaches like LSTM have been widely used in many applications, current large-batch research is principally focused on CNNs.

(2) Even for CNNs, there is no automated technique for extending the batch size beyond 8K.

(3) To keep the variance in the gradient expectation constant, theory suggests that a Sqrt Scaling scheme should be used in large-batch training. Unfortunately, there are not many successful applications of this scheme.

In this paper, we propose the Dynamic Adaptive-Tuning Engine (DATE) for better large-batch training. DATE achieves a 5.3x average speedup over the baselines for four LSTM-based applications on the same hardware. We finish ImageNet training with ResNet-50 in two minutes on 1024 v3 TPUs (76.7% top-1 accuracy), which is the fastest result as of June 2019.
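As a rough illustration of the Sqrt Scaling scheme mentioned in point (3), the sketch below scales a learning rate with the square root of the batch-size ratio and compares it with the more common linear scaling rule. The usual motivation is that the variance of a mini-batch gradient shrinks roughly as 1/B, so growing the learning rate in proportion to sqrt(B) keeps the variance of each update about constant. The function names and the baseline values (learning rate 0.1 at batch size 256) are assumptions chosen for illustration; this is not the DATE algorithm itself.

```python
import math


def sqrt_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate by sqrt(new_batch / base_batch)."""
    return base_lr * math.sqrt(new_batch / base_batch)


def linear_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate by new_batch / base_batch, for comparison."""
    return base_lr * new_batch / base_batch


if __name__ == "__main__":
    # Hypothetical baseline: lr = 0.1 tuned at batch size 256.
    base_lr, base_batch = 0.1, 256
    for batch in (1024, 8192, 32768):
        print(
            batch,
            sqrt_scaled_lr(base_lr, base_batch, batch),
            linear_scaled_lr(base_lr, base_batch, batch),
        )
```

The two rules diverge quickly as the batch grows: under these assumed values, quadrupling the batch size doubles the learning rate with Sqrt Scaling but quadruples it with linear scaling.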

Paper available from the ACM Digital Library