Modeling the Next-Generation High Performance Schedulers
High performance computing (HPC) resources and workloads are undergoing tumultuous changes. HPC resources are growing more diverse with the adoption of accelerators, and HPC workloads have increased in size by orders of magnitude. Despite these changes, HPC schedulers still rely on users to accurately anticipate their applications’ resource usage when assigning jobs to resources, and they remain stuck with the decades-old centralized scheduling model.
In this talk, we will discuss these ongoing changes and propose alternative models for HPC scheduling that are resource-aware and fully hierarchical. An emulator of a real, open-source, next-generation resource management system plays a key role in evaluating our models, and we will discuss the challenges of realistically mimicking that system’s scheduling behavior. Our evaluation shows how our models improve scheduling scalability on a diverse set of synthetic and real-world workloads.
This is joint work with Stephen Herbein and Michael Wyatt at the University of Delaware, and with Tapasya Patki, Dong H. Ahn, Don Lipari, Thomas R.W. Scogland, Marc Stearman, Mark Grondona, Jim Garlick, Tamara Dahlgren, David Domyancic, and Becky Springmeyer at Lawrence Livermore National Laboratory.
Michela Taufer is a Professor in Computer and Information Sciences and a J.P. Morgan Case Scholar at the University of Delaware; she has a joint appointment in the Biomedical Department and the Bioinformatics Program at the same university. She earned her undergraduate degrees in Computer Engineering from the University of Padova (Italy) and her doctoral degree in Computer Science from the Swiss Federal Institute of Technology, or ETH (Switzerland). From 2003 to 2004 she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry. Taufer’s research interests in high performance computing include scientific applications, scheduling and reproducibility challenges, and big data analytics. She has nearly 100 publications and has delivered nearly 80 talks at conferences and research institutes. She currently serves on the NSF Advisory Committee for Cyberinfrastructure (ACCI). She is a professional member of the IEEE and a Distinguished Scientist of the ACM.