Workshop: WORKS19 Keynote: Priority Research Directions for In Situ Data Management: Enabling Scientific Discovery from Diverse Data Sources
Abstract: Scientific computing will increasingly incorporate tasks that must be managed alongside the main simulation or experimental tasks: ensemble analysis, data-driven science, artificial intelligence, machine learning, surrogate modeling, and graph analytics, all nontraditional applications unheard of in HPC just a few years ago. Many of these tasks will need to execute concurrently with simulations and experiments, that is, in situ, while sharing the same computing resources.
There are two primary, interdependent motivations for processing and managing data in situ. The first motivation is the need to decrease data volume. The in situ methodology can make critical contributions to managing large data from computations and experiments to minimize data movement, save storage space, and boost resource efficiency—often while simultaneously increasing scientific precision. The second motivation is that the in situ methodology can enable scientific discovery from a broad range of data sources—HPC simulations, experiments, scientific instruments, and sensor networks—over a wide scale of computing platforms: leadership-class HPC, clusters, clouds, workstations, and embedded devices at the edge.
The successful development of in situ data management capabilities can potentially benefit real-time decision making, design optimization, and data-driven scientific discovery. This talk will feature six priority research directions that highlight the components and capabilities needed for in situ data management to succeed across a wide variety of applications: making in situ data management more pervasive, controllable, composable, and transparent; coordinating it more closely with the software stack; and developing a diversity of fundamentally new data algorithms.