Description

Many HPC applications are designed around underlying performance and execution models. In the past, these models could successfully be employed to balance load within and between compute nodes. However, the increasing complexity of modern software and hardware makes performance predictability and load balancing much more difficult. Tackling these challenges in search of a generic solution, we present a novel library for fine-grained, task-based reactive load balancing in distributed memory based on MPI and OpenMP. Our concept allows the creation of individual migratable tasks that can be executed on any MPI rank. Migration decisions are made at run time based on online performance or load data. We compare two fundamental approaches that balance load while, at the same time, overlapping computation and communication. We evaluate our concept under enforced power caps and clock frequency changes using a synthetic benchmark, and we demonstrate robustness against work-induced imbalances for an AMR application.
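To make the idea of run-time migration decisions concrete, the following is a minimal sketch of one possible reactive policy: given a vector of per-rank load values (e.g., the number of queued tasks reported by each rank), greedily propose moving half of the load difference from the most-loaded to the least-loaded rank. All names (`MigrationDecision`, `decide_migration`, the `tolerance` threshold) are hypothetical illustrations and do not reflect the library's actual API or policy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a reactive migration decision (not the
// library's actual API or policy). The load vector would, in a real
// setting, be gathered online from all MPI ranks.
struct MigrationDecision {
    std::size_t source;  // over-loaded rank (migration victim)
    std::size_t target;  // under-loaded rank (migration destination)
    long tasks_to_move;  // 0 means "balanced enough, do nothing"
};

MigrationDecision decide_migration(const std::vector<long>& load,
                                   long tolerance = 1) {
    // Locate the extremes of the current load distribution.
    auto max_it = std::max_element(load.begin(), load.end());
    auto min_it = std::min_element(load.begin(), load.end());
    long diff = *max_it - *min_it;
    // Move half the difference, but only if the imbalance
    // exceeds the tolerance threshold.
    long move = (diff > tolerance) ? diff / 2 : 0;
    return {static_cast<std::size_t>(max_it - load.begin()),
            static_cast<std::size_t>(min_it - load.begin()),
            move};
}
```

In an actual hybrid MPI+OpenMP setting, such a decision step would run alongside the task execution loop, with the chosen tasks serialized and sent to the target rank while remaining tasks execute locally, so that communication overlaps with computation.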