| Workshops and Tutorials | ||||||
|---|---|---|---|---|---|---|
|
8:00 AM - 9:00 AM
|
Registration Opens |
|||||
|
9:00 AM - 10:30 AM
|
Tutorial: High-Performance and Smart Networking Technologies for HPC and AI Room: Senior Classroom A |
QUASAR Room: Student Organization Center |
REX-IO Room: Senior Classroom B |
FlexScience Room: Second Floor Conference Room |
||
|
10:30 AM - 11:00 AM
|
Coffee Break |
|||||
|
11:00 AM - 12:30 PM
|
Tutorial: High-Performance and Smart Networking Technologies for HPC and AI Room: Senior Classroom A |
QUASAR Room: Student Organization Center |
REX-IO Room: Senior Classroom B |
FlexScience Room: Second Floor Conference Room |
||
|
12:30 PM - 1:30 PM
|
(Lunch on Your Own) |
|||||
|
1:30 PM - 3:00 PM
|
Tutorial: Principles and Practice of High Performance Deep Learning Training and Inference Room: Senior Classroom A |
QUASAR Room: Student Organization Center |
PERMAVOST Room: Senior Classroom B |
AI4Sys Room: Second Floor Conference Room |
Tutorial: When Error-Bounded Lossy Compression Meets Large-Scale AI Model Training in Federated Environments Room: First Floor Conference Room |
|
|
3:00 PM - 3:30 PM
|
Coffee Break |
|||||
|
3:30 PM - 5:00 PM
|
Tutorial: Principles and Practice of High Performance Deep Learning Training and Inference Room: Senior Classroom A |
QUASAR Room: Student Organization Center |
PERMAVOST Room: Senior Classroom B |
AI4Sys Room: Second Floor Conference Room |
Tutorial: When Error-Bounded Lossy Compression Meets Large-Scale AI Model Training in Federated Environments Room: First Floor Conference Room |
|
|
8:00 AM - 9:00 AM
|
Breakfast + Opening Remarks |
|
|---|---|---|
|
9:00 AM - 10:00 AM
|
Keynote: The Social Life of Distributed Systems: From Virtual Organizations to Intentional Systems
Abstract: Twenty-five years ago, the "Grid Problem" was defined as flexible, secure, and coordinated resource sharing among dynamic collections of individuals and institutions — the Virtual Organization (VO). The Anatomy of the Grid (2001) focused on the technical "brawn" required for multi-institutional interoperability, but it soon became clear that infrastructure alone could not solve the challenges of collaborative discovery. In Brain Meets Brawn: Why Grid and Agents Need Each Other (2004), we argued that autonomous agents would provide the necessary "brains" to manage the complexity, coordination, and scale of these distributed environments. Over the following two decades, this perspective evolved toward socio-technical systems in which data, policy, organizational structure, and human interaction became first-class concerns. Our work on Deriva and Deriva-ML treats continuously evolving, curated data collections as the core organizing principle for collaborative discovery, making explicit the rich relationships among data, processes, collaborators, and outcomes across the lifetime of a scientific project. This shift also reframes a long-standing challenge: existing approaches to reproducibility preserve computational artifacts and execution histories, but perfect reconstruction of an experiment does not guarantee reproduction of scientific understanding — one can faithfully reproduce the wrong answer. By making context explicit and persistent, data-centric systems create the foundation for a new class of intentional systems: systems that represent, maintain, and act upon the goals, assumptions, and shared understanding that give scientific results their meaning, not merely the artifacts they produce. The emergence of Large Language Models now makes this practical.
Earlier agent-based systems were constrained by narrow reasoning and brittle coordination; LLMs supply a reasoning substrate powerful enough to finally realize the long-envisioned potential of agent-mediated science, shifting the focus from process-centric automation to managing the interactional layer of discovery itself. In this talk, I will discuss recent work in which agent-based systems perform a dual role: executing computational tasks, and — in coordination with a data-centric ecosystem — serving as the "social glue" that captures intent, maintains semantic alignment, and manages shared state across long-running human-agent collaborations. Consider a multi-institutional effort to develop deep learning models for detecting referable glaucoma from fundus photographs collected through a Los Angeles County safety-net teleretinal screening program. As cohorts are refined, label conventions evolve, and foundation-model and supervised approaches are compared across sites, the surrounding agents capture not just the resulting models and metrics, but the clinical and methodological rationale behind each choice — so that when a collaborator, a reviewer, or a downstream agent revisits the work months later, the reasoning behind the result is recoverable, not just the result itself. If the Grid was originally conceived as enabling coordinated resource sharing among dynamic collections of individuals and institutions, the convergence of AI, data-centric systems, and agent-based computing now lets us deliver on a larger vision: coordinating understanding, intent, and discovery across long-lived human-agent collectives.
Speaker: Carl Kesselman
Biography: Carl Kesselman is the William M. Keck Professor of Engineering at the University of Southern California, with appointments in the Daniel J. Epstein Department of Industrial and Systems Engineering, the Thomas Lord Department of Computer Science, the Keck School of Medicine, and the Herman Ostrow School of Dentistry. He is Director of the Informatics Systems Research Division at the USC Information Sciences Institute and is internationally recognized as one of the pioneers of Grid Computing and distributed cyberinfrastructure. Kesselman co-founded the Globus Project, whose technologies and concepts helped establish the foundations for modern distributed, cloud, and data-intensive computing systems. His research has spanned distributed systems, scientific cyberinfrastructure, data integration, security, and large-scale collaborative science platforms. More recently, his work has focused on data-centric socio-technical ecosystems, AI-enabled scientific infrastructure, and agent-mediated systems that support long-running human-machine scientific interactions. He has co-authored four papers recognized in HPDC's retrospective list of the most important papers from the conference's first twenty years. Kesselman is a Fellow of the ACM, IEEE, and the British Computer Society. His honors include the British Computer Society's Lovelace Medal, the IEEE Internet Award, and the IEEE Computer Society's Goode Memorial Award. |
|
|
10:00 AM - 10:20 AM
|
Coffee Break |
|
|
10:20 AM - 11:20 AM
|
Session 1 — LLM Inference: Scheduling & Serving
Session Chair: TBA |
|
|
STAR: Decode-Phase Rescheduling for LLM Inference Authors: Zhibin Wang, Zetao Hong, Xue Li, Zibo Wang, Shipeng Li, Qingkai Meng, Qing Wang, Chengying Huan, Rong Gu, Sheng Zhong, Chen Tian |
||
|
PKAS: Predictive KVCache-Aware Scheduling Authors: Jie Ye, Avinash Maurya, Krishna Teja Chitty-Venkata, Bogdan Nicolae, Anthony Kougkas, Xian-He Sun |
||
|
Omnia: RAG Serving through Speculative Scheduling Authors: Rongtian Fu, Shigang Li, Youxuan Xu, Tong Wu, Zhi Ma, Jinliang Shi |
||
|
Kizashi Talks Session 1 (11:05 AM - 11:25 AM)
|
||
|
11:30 AM - 12:30 PM
|
Session 2 — Scaling Generative Inference
Session Chair: TBA |
|
|
Scaling Attention Beyond GPUs for LLM Inference Authors: Weishu Deng, Yujie Yang, Peiran Du, Lingfeng Xiang, Zhen Lin, Chen Zhong, Faraz Ahmed, Lianjie Cao, Puneet Sharma, Song Jiang, Hui Lu, Jia Rao |
||
|
Accelerating Block Low-Rank Foundation Model Inference on Memory-Constrained GPUs Authors: Pierre Abillama, Changwoo Lee, Juechu Dong, David Blaauw, Dennis Sylvester, Hun-Seok Kim |
||
|
MoE-Lens: High-Throughput MoE LLM Inference at the Hardware Limit Authors: Yichao Yuan, Lin Ma, Nishil Talati |
||
|
Kizashi Talks Session 2 (12:15 PM - 12:35 PM)
|
||
|
12:30 PM - 1:30 PM
|
Lunch |
|
|
1:30 PM - 2:30 PM
|
Session 3 — MoE Systems & Distributed LLM Training
Session Chair: TBA |
|
|
UniEP: Unified Expert-Parallel MoE MegaKernel for Training Authors: Size Zheng, Xuegui Zheng, Li-Wen Chang, Jidong Zhai |
||
|
ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism Authors: Tenghui Ma, Jihu Guo, Wei Gao, Sitian Lu, Zhisheng Ye, Hanjing Wang, Dahua Lin |
||
|
TACO: Communication Compression for Tensor-Parallel LLM Training Authors: Man Liu, Xingchen Liu, Xingjian Tian, Bing Lu, Shengkai Lyu, Shengquan Yin, Wenjing Huang, Zheng Wei, Hairui Zhao, Guangming Tan, Dingwen Tao |
||
|
Kizashi Talks Session 3 (2:15 PM - 2:35 PM)
|
||
|
2:40 PM - 3:40 PM
|
Session 4 — Programming Models, Frameworks & Compilers
Session Chair: TBA |
|
|
SYCL++: Unified Programming for Heterogeneous Supercomputers at Scale Authors: Zitao Shen, Yuyang Jin, Kinman Lei, Zixuan Ma, Wenqiang Wang, Yinuo Wang, Wenhao Zhou, Zhenchuan Chen, Di Wei, Qi Zhang, Fei Wang, Ying Liu, Lin Gan, Jidong Zhai |
||
|
Floating Point Virtualization With Tiny Numbers Authors: Kevin Hayes, Peter Dinda |
||
|
FAME: Framework for Multi-Agent RL on Heterogeneous Platforms Authors: Samuel Wiggins, Nikunj Gupta, Grace Zgheib, Mahesh A. Iyer, Viktor Prasanna |
||
|
CARBS: Compiler Autotuning via Randomized Biased Search (Best paper nominee) Authors: Wei Li, Bin Gao, Weng-Fai Wong |
||
|
3:40 PM - 4:00 PM
|
Coffee Break |
|
|
4:00 PM - 5:00 PM
|
The Five Most Important Research Problems for the Next Decade
Moderator: Manish Parashar | |
|
8:00 AM - 9:00 AM
|
Breakfast |
|
|---|---|---|
|
9:00 AM - 10:00 AM
|
Keynote: Misconceptions in Parallel Computing
Abstract: For many years, parallel computing was driven by the steady advance of commodity processors. Clusters of commodity CPUs provided ever greater computing power. Simple parallel performance models made it easy to analyze algorithms and implementations. Systems with mostly nodes of identical type became common, simplifying the application environment. Since 2005, the situation has become increasingly complicated. The end of Dennard scaling has spurred ever greater degrees of parallelism within a single processor chip, as well as specialization in GPUs and other processors. AI has driven systems to sizes never seen before, as well as changing the mix of operations. The way we think about parallel systems needs to be reexamined. In this talk, I will talk about what I think are misconceptions in parallel computing that are a result of the changes in computing, especially since 2005. These include: Is Moore's Law over? Do we still have good performance models? Are we prepared for systems with dissimilar node types? Are standards still useful? I will close with a few challenges for the community.
Speaker: William Gropp
Biography: William Gropp is a professor in the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign, where he holds a Grainger Distinguished Chair in Engineering. He earned his Ph.D. in Computer Science from Stanford University in 1982 and has held research and leadership positions at Yale University and Argonne National Laboratory. Gropp's research focuses on parallel computing, scientific software, and numerical methods for partial differential equations. He is widely known for his contributions to high-performance computing, including foundational work on the MPI message-passing standard and the development of influential software tools used throughout computational science. From 2016 to 2025, he served as Director of the National Center for Supercomputing Applications (NCSA), guiding major initiatives in advanced computing and data-intensive research. He currently chairs the Computing Community Consortium for the Computing Research Association, helping shape long-term research directions for the computing field. Gropp is a Fellow of AAAS, ACM, IEEE, and SIAM, a member of the National Academy of Engineering, and the recipient of numerous awards recognizing his impact on high-performance computing. |
|
|
10:00 AM - 10:20 AM
|
Coffee Break |
|
|
10:20 AM - 11:20 AM
|
Session 5 — Scientific Applications at Scale
Session Chair: TBA |
|
|
POLAR-PIC: Matrixized PIC for Plasma Physics Authors: Yizhuo Rao, Xingjian Cui, Shangzhi Pang, Jiabin Xie, Guangnan Feng, Ziyan Zhang, Jinhui Wei, Languang Gao, Zhenyu Wang, Zhiguang Chen, Yutong Lu |
||
|
A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States Authors: Daran Sun, Bowen Kan, Haoquan Long, Hairui Zhao, Haoxu Li, Yicheng Liu, Pengyu Zhou, Ankang Feng, Wenjing Huang, Yida Gu, Zhenyu Li, Honghui Shang, Yunquan Zhang, Dingwen Tao, Ninghui Sun, Guangming Tan |
||
|
Full-Core Fluid-Structure-Interaction Simulation of Nuclear Reactor Authors: Xue Miao, Jue Wang, Qida Lin, Shufei Zhang, Rongqiang Cao, Chunbao Zhou, Ningming Nie, He Bai, Yangang Wang |
||
|
Endeavor: PairHMM for DNA Variant Detection at Genome Scale Authors: Miguel Graça, Aleksandar Ilic |
||
|
11:30 AM - 12:30 PM
|
Session 6 — Scientific Data Compression
Session Chair: TBA |
|
|
OPAL: On-demand Progressive Accelerated Scientific Lossy Compression Authors: Longtao Zhang, Ruoyu Li, Zhuoxun Yang, Robert Underwood, Sheng Di, Daoce Wang, Jinyang Liu, Jiajun Huang, Franck Cappello, Kai Zhao |
||
|
TZ: High-Ratio Scientific Data Compression on GPUs Authors: Zhuoxun Yang, Ruoyu Li, Amit Subrahmanya, Vishwas Rao, Sheng Di, Robert Underwood, Longtao Zhang, Jinyang Liu, Franck Cappello, Kai Zhao |
||
|
QProR: An Efficient Framework for Quantity-of-Interest Based Progressive Retrieval with Guaranteed Error Control Authors: Wenbo Li, Qian Gong, Xuan Wu, Jieyang Chen, Qing Liu, Xubin He, Norbert Podhorszki, Scott Klasky, Xin Liang |
||
|
Bridging Information Theory and Practice for Scientific Lossy Compression Authors: Sujata Sinha, Sheng Di, Vishwas Rao, Robert Underwood, David Lenz, Zizhe Jian, Zhuoxun Yang, Kai Zhao, Lingjia Liu, Franck Cappello |
||
|
12:30 PM - 1:30 PM
|
Lunch |
|
|
1:30 PM - 2:30 PM
|
Session 7 — Graph Processing & Community Detection
Session Chair: TBA |
|
|
SAGA: State-Aware Graph Analytics for Combinatorial Optimization on Dynamic Graphs Authors: Rohit Prajapati, Prajjwal Nijhara, Dip Sankar Banerjee |
||
|
Efficient Tracking of Communities on Evolving Graphs with Leiden Authors: Subhajit Sahu |
||
|
νMG-LPA and νBM-LPA: Memory Efficient GPU-based Label Propagation Algorithms (LPA) for Community Detection Authors: Subhajit Sahu |
||
|
Kizashi Talks Session 7 (2:15 PM - 2:35 PM)
|
||
|
2:40 PM - 3:40 PM
|
Session 8 — Storage & I/O for Data-Intensive Workloads
Session Chair: TBA |
|
|
A High-Performance Persistent Transactional Memory System via Cooperative Concurrency Authors: Hao Hu, Xinrui Zheng, Yizou Chen, Xiangyu Zou, Erci Xu, Hongpeng Wang, Wen Xia |
||
|
GLANCED-IO: Taming I/O Optimization for Deep Learning at Scale Authors: Ray A. O. Sinurat, William Nixon, Philip Carns, Huihuo Zheng, Sandeep Madireddy, Sam Foreman, Troy Arcomano, Robert Ross, Haryadi S. Gunawi, Hariharan Devarajan |
||
|
Merkle-Tree Weight Snapshot Deduplication for NN Training Provenance Authors: Kin Wai Ng, Francesco Antici, Nigel Tan, Befikir Bogale, Caleb Han, Florence Tama, Osamu Miyashita, Bogdan Nicolae, Michela Taufer |
||
|
Pome: Parallelizing I/Os and Computations for LSM-tree Storage (Best paper nominee) Authors: Yanpeng Hu, Li Zhu, Lei Jia, Chundong Wang |
||
|
3:40 PM - 4:00 PM
|
Coffee Break |
|
|
4:00 PM - 5:00 PM
|
Data Centers: Information, Misinformation, and the future of large scale computing
Moderator: Barney Maccabe | |
|
6:00 PM - 7:00 PM
|
Poster Reception — Student, Conference, RSE, Industry |
|
|
7:30 PM - 10:00 PM
|
Banquet Dinner and Museum Visit |
|
|
8:00 AM - 9:00 AM
|
Breakfast |
|
|---|---|---|
|
9:00 AM - 10:00 AM
|
Keynote: From Scalable Systems to Quantum Computing: Navigating New Computing Frontiers
Abstract: Over the past several decades, the high-performance parallel and distributed computing community has played a central role in advancing computing infrastructures, from operating systems to cloud platforms and large-scale distributed environments. These systems challenges are now reappearing in the context of emerging scientific application workflows that leverage advances in artificial intelligence (AI) and quantum information science. This talk connects a long-standing research trajectory in scalable systems with more recent work in quantum computing. It discusses how a researcher grounded in high-performance and distributed systems can approach the quantum computing landscape, identify familiar abstractions and challenges, and contribute to the emerging quantum software and systems stack. As quantum technologies evolve toward larger and more programmable platforms, issues such as orchestration of hybrid quantum–classical workflows, runtime systems, compilation, resource management, distributed control, and reliability are becoming increasingly important. Advances in AI and quantum computing point toward a future in which scalable systems expertise may play a foundational role in shaping new computing paradigms and scientific capabilities. The presentation will also provide an overview of the evolving U.S. federal research funding landscape in quantum computing, quantum networking, and quantum sensing.
Speaker: Dilma M. Da Silva
Biography: Dilma Da Silva is a Regents Professor and holder of the Ford Design Professorship II in the Department of Computer Science and Engineering at Texas A&M University. From July 2022 to June 2026, she served in several leadership positions at the U.S. National Science Foundation. Her roles at Texas A&M include Department Head (2014-2019), Associate Dean (2019-2020), interim Director of the Texas A&M Institute of Data Science, and interim Director of the Texas A&M Cybersecurity Center. Her primary research interests are high performance computing, computer science education, and quantum computing. Before joining Texas A&M, she worked at Qualcomm Research (2012-2014), IBM Research (2000-2012), and the University of São Paulo (1996-2000). Dilma is an ACM Distinguished Scientist. She received her doctoral degree in computer science from Georgia Tech in 1997 and her bachelor's and master's degrees from the University of São Paulo, Brazil. She is passionate about mentoring and supporting the next generation of computing researchers and practitioners. |
|
|
10:00 AM - 10:20 AM
|
Coffee Break |
|
|
10:20 AM - 11:20 AM
|
Session 9 — Cloud, Serverless & Microservices
Session Chair: TBA |
|
|
Enabling High-Utilization and Low-Contention FaaS: A Request-Level Resource Provisioning Approach Authors: Runfu Li, Zishu Yu, Yifan Wang, Xiaohui Peng, Ninghui Sun, Zhiwei Xu |
||
|
Cremes: Cost-Efficient Microservice Execution on Spot Instances Authors: Liao Chen, Chenyu Lin, Junlin Chen, Shutian Luo, Huanle Xu, ChengZhong Xu |
||
|
Krysha: Cost-Efficient Geo-Distributed Serverless Microservices Authors: Yuqiu Zhang, Hans-Arno Jacobsen |
||
|
ACM SRC Presentations (11:05 AM - 11:25 AM)
|
||
|
11:30 AM - 12:30 PM
|
Session 10 — Data Systems for AI & Analytics
Session Chair: TBA |
|
|
SIVF: GPU-Resident IVF Index for Streaming Vector Analytics Authors: Dongfang Zhao |
||
|
BCCE: Block-Centric GPU Co-Design for Real-Time Range-Top-K Query at Scale Authors: Chengying Huan, Ziheng Meng, Zhengyi Yang, Yongchao Liu, Jie Zhang, Qing Wang, Jing Wang, Shaonan Ma, Zhibin Wang, Mingxing Zhang, Rong Gu, Baokun Wang, Guihai Chen, Chen Tian |
||
|
ATLAS: Out-of-Core Inference for Billion-Scale GNNs Authors: Pranjal Naman, Yogesh Simmhan |
||
|
Kizashi Talks Session 10 (12:15 PM - 12:35 PM)
|
||
|
12:30 PM - 1:30 PM
|
Lunch |
|
|
1:30 PM - 2:30 PM
|
Session 11 — Communication, Workflows & Sustainability
Session Chair: TBA |
|
|
GICC: GPU-Initiated Communication and Coordination Runtime Authors: Baodi Shan, Mauricio Araya-Polo, Barbara Chapman |
||
|
When RDMA Goes Long-Haul: Characterization, Modeling, and Verbs-Level Emulation with Implications for Federated Learning Authors: Yuke Li, Zhonghao Chen, Xiaoyi Lu |
||
|
JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient Authors: Vladislav Esaulov, Jieyang Chen, Norbert Podhorszki, Fred Suter, Scott Klasky, Anu G. Bourgeois, Lipeng Wan |
||
|
PowerQuant: Architecture-Agnostic GPU Power Estimation via Quantile Regression (Best paper nominee) Authors: Aditya Challa, Tanish Desai, Gargi Alavani Prabhu, Snehanshu Saha, Santonu Sarkar |
||
|
2:40 PM - 3:40 PM
|
Session 12 — Emerging Generative & Agentic AI Workloads
Session Chair: TBA |
|
|
DiTango: Cost-Effective Parallel Diffusion Generation with Selective Attention State Reuse Authors: Yuyang Chen, Runxin Zhong, Zan Zong, Hengjie Li, Yuyang Jin, Jidong Zhai |
||
|
Dynamo-MoE: Accelerating Sparse Large Model Inference Authors: Jiahao Chen, Shigang Li, Rongtian Fu, Tong Wu, Zhi Ma, Jingkun Dong |
||
|
3:40 PM - 4:00 PM
|
Coffee Break |
|
|
4:00 PM - 5:00 PM
|
Awards Announcements and Closing Remarks |
|