SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Poster 127: sFlow Monitoring for Security and Reliability

Authors: Xava A. Grooms (Los Alamos National Laboratory, University of Kentucky), Robert V. Rollins (Los Alamos National Laboratory, Michigan Technological University), Collin T. Rumpca (Los Alamos National Laboratory, Dakota State University)

Abstract: In the past ten years, High Performance Computing (HPC) has moved far beyond terascale performance, making petascale systems the new standard. This drastic improvement in performance has not been matched by comparable improvements in system monitoring. Thus, there is an immediate need for practical and scalable monitoring solutions to ensure the effectiveness of costly compute clusters. This project explores the viability and impact of sFlow-enabled switches in cluster network monitoring for security and reliability. A series of tests and exploits was performed to target specific network abnormalities on a nine-node HPC cluster. The results are presented through web-based dashboards that can aid network administrators in improving a cluster’s security and reliability.

Best Poster Finalist (BP): no

