Poster 127: sFlow Monitoring for Security and Reliability
Time: Thursday, 21 November 2019, 8:30am - 5pm
Description: In the past ten years, High Performance Computing (HPC) has moved far beyond terascale performance, making petascale systems the new standard. This drastic improvement in performance has gone largely unmatched, with only insignificant improvements in system monitoring. Thus, there is an immediate need for practical and scalable monitoring solutions to ensure the effectiveness of costly compute clusters. This project explores the viability and impact of sFlow-enabled switches in cluster network monitoring for security and reliability. A series of tests and exploits was performed to target specific network abnormalities on a nine-node HPC cluster. The results present web-based dashboards that can aid network administrators in improving a cluster’s security and reliability.