DataRaceBench is an OpenMP benchmark suite designed to systematically and quantitatively evaluate data race detection tools. It has been used by several research and development groups to measure the quality of their tools. In this paper we explore how to assess regressions in data race detection tools in the presence of observed tool errors. We identify the major factors that govern consistent, reproducible, and comparable evaluation results, and define a detailed evaluation process with a set of configuration and execution rules. We also outline the differences between evaluating dynamic and static data race detection tools.

On top of the evaluation results, we further explore and suggest different ways to process and present the data, with a focus on how to account for tool errors. Using DataRaceBench, we show regressions in several popular data race detection tools across recent release cycles.
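To make the setting concrete, the following is a minimal sketch in the style of a DataRaceBench microbenchmark (a hypothetical example, not a kernel taken from the suite): the loop-carried dependence on a[i + 1] turns into a data race once OpenMP distributes the iterations across threads.

```c
/* Hypothetical microbenchmark in the style of DataRaceBench
 * (not an actual kernel from the suite). */
#include <stdio.h>

int main(void) {
    int a[1000];

    for (int i = 0; i < 1000; i++)
        a[i] = i;

    /* Each iteration writes a[i] and reads a[i + 1], which iteration
     * i + 1 writes concurrently: a loop-carried data race. */
    #pragma omp parallel for
    for (int i = 0; i < 999; i++)
        a[i] = a[i + 1] + 1;

    printf("a[500] = %d\n", a[500]);
    return 0;
}
```

A dynamic detector (for example, ThreadSanitizer invoked via clang -fopenmp -fsanitize=thread, typically combined with an OpenMP-aware layer such as Archer) can only report the race on executions where the conflicting accesses are actually observed, while a static tool analyzes the loop without running it; this difference is one reason the two tool classes require distinct evaluation rules.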