DescriptionThis paper presents a new mixed-precision implementation of a linear-solver kernel used in practical large-scale CFD simulations to improve GPU performance. The new implementation reduces memory traffic by using the half-precision format for some critical computations while maintaining double-precision solution accuracy. As the linear-solver kernel is memory bound on GPUs, a reduction in memory traffic directly translates to improved performance. The performance of the new implementation is assessed for a benchmark steady flow simulation and a large-scale unsteady turbulent flow application. Both studies were conducted using NVIDIA® Tesla V100 GPUs on the Summit system at the Oak Ridge Leadership Computing Facility.