Abstract: With task parallel models, programmers can easily parallelize divide-and-conquer algorithms by using nested fork-join structures. Work stealing, which is a popular scheduling strategy for task parallel programs, can efficiently perform dynamic load balancing; however, it tends to damage data locality and does not scale well with memory-bound applications. This paper introduces Almost Deterministic Work Stealing (ADWS), which addresses the issue of data locality of traditional work stealing by making the scheduling almost deterministic. Specifically, ADWS consists of two parts: (i) deterministic task allocation, which deterministically distributes tasks to workers based on the amount of work for each task, and (ii) hierarchical localized work stealing, which dynamically compensates load imbalance in a locality-aware manner. Experimental results show that ADWS is up to nearly 6 times faster than a traditional work stealing scheduler with memory-bound applications, and that dynamic load balancing works well while maintaining good data locality.
Back to Technical Papers Archive Listing