Workshop: Arbiter: Dynamically Limiting Resource Consumption on Login Nodes
Abstract: Login nodes are shared resources meant for small, lightweight tasks such as submitting to the batch system, scripting, compiling, and staging data. Because they are shared, the responsiveness of login nodes depends on users being good citizens with respect to CPU time and memory usage. Many HPC centers have policies that define acceptable behavior. However, the unfortunate reality is that users frequently violate these policies and negatively impact the work of others. The problem is compounded by the fact that administrators often have to police users’ behavior themselves.
Arbiter is a service that addresses the misuse of login nodes by automatically enforcing policies using cgroups. When users log in, Arbiter sets default hard memory and CPU limits on each user to prevent any one user from dominating the machine’s memory and CPU resources. To enforce policies, Arbiter tracks the usage of individual users over a set interval and looks for policy violations. When a violation occurs, the violating user is emailed a description of the behavior that constituted the violation, along with the acceptable usage policy for login nodes. Arbiter also temporarily lowers the user’s hard memory and CPU limits to discourage excessive usage. The duration and severity of the lowered limits depend on whether the user has violated policies repeatedly, so that continued excessive usage is penalized more heavily. The result is that login nodes stay responsive, and users are informed of policies and discouraged from running computationally heavy jobs on login nodes.
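The escalating-penalty idea described above can be illustrated with a minimal Python sketch. It is not Arbiter's actual implementation: the default limits, the penalty tiers, the averaging rule in `is_violation`, and the names `Limits`, `limits_for`, and `systemd_command` are all hypothetical, chosen only for illustration. The sketch also assumes a systemd host with cgroup v2, where per-user limits can be applied to `user-UID.slice` through the `CPUQuota=` and `MemoryMax=` properties; it only builds the command and does not run it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Limits:
    cpu_percent: int       # CPU quota as a percentage of one core (400 = 4 cores)
    memory_bytes: int      # hard memory limit in bytes
    duration_minutes: int  # how long the limits apply; 0 means no expiry


# Hypothetical default limits applied at login (not Arbiter's real values).
DEFAULT = Limits(cpu_percent=400, memory_bytes=8 * 1024**3, duration_minutes=0)

# Hypothetical penalty tiers: (percent of default limits, duration in minutes).
# Each repeated violation moves the user one tier down the list.
PENALTY_TIERS = [(80, 30), (50, 60), (30, 120)]


def is_violation(cpu_samples: Sequence[float], threshold_percent: float) -> bool:
    """Assumed rule: flag a user whose mean CPU usage over the interval exceeds the threshold."""
    if not cpu_samples:
        return False
    return sum(cpu_samples) / len(cpu_samples) > threshold_percent


def limits_for(violations: int) -> Limits:
    """Return the limits for a user with the given number of recorded violations."""
    if violations <= 0:
        return DEFAULT
    # Repeat offenders beyond the last tier stay at the harshest tier.
    percent, minutes = PENALTY_TIERS[min(violations, len(PENALTY_TIERS)) - 1]
    return Limits(
        cpu_percent=DEFAULT.cpu_percent * percent // 100,
        memory_bytes=DEFAULT.memory_bytes * percent // 100,
        duration_minutes=minutes,
    )


def systemd_command(uid: int, limits: Limits) -> List[str]:
    """Build (but do not run) a systemctl command that applies the limits to a user's slice."""
    return [
        "systemctl", "set-property", "--runtime", f"user-{uid}.slice",
        f"CPUQuota={limits.cpu_percent}%",
        f"MemoryMax={limits.memory_bytes}",
    ]
```

For example, under these assumed tiers a user with two recorded violations would be held to half of the default limits for 60 minutes, and `systemd_command(1000, limits_for(2))` would produce the corresponding `systemctl set-property` invocation for `user-1000.slice`.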