Workshop: Afternoon Keynote - Running large models in minutes: an engineering journey through high-performance computing for AI
Abstract: From climate modelling to drug design, AI models are now fully part of scientific modelling, and they are growing larger and more complex every year. The adoption of challenging workloads such as the BERT language model, together with the popularity of deep learning performance blogs and benchmarks such as MLPerf, highlights the importance of being able to train and tune such models quickly. Until recently, system design for HPC and AI was often done in isolation because the platform requirements were different, making large-scale scientific experimentation difficult. To close this gap, systems are now designed with AI software in mind, and scalability is built into the software from the ground up, so that each model running at the edge can be trained in minutes at scale. In this talk we will cover how software leverages the inherent scaling nature of large models, and how HPC infrastructures can be built and used as ideal platforms for fast experimentation and large problems.