Day 1 - Workshops - 13th July (Tinkham Veale University Center, Case Western Reserve University)

Workshops and Tutorials
8:00 AM - 9:00 AM
Registration Opens
9:00 AM - 10:30 AM
Tutorial: High-Performance and Smart Networking Technologies for HPC and AI
Room: Senior Classroom A
QUASAR
Room: Student Organization Center
REX-IO
Room: Senior Classroom B
FlexScience
Room: Second Floor Conference Room
10:30 AM - 11:00 AM
Coffee Break
11:00 AM - 12:30 PM
Tutorial: High-Performance and Smart Networking Technologies for HPC and AI
Room: Senior Classroom A
QUASAR
Room: Student Organization Center
REX-IO
Room: Senior Classroom B
FlexScience
Room: Second Floor Conference Room
12:30 PM - 1:30 PM
(Lunch on Your Own)
1:30 PM - 3:00 PM
Tutorial: Principles and Practice of High Performance Deep Learning Training and Inference
Room: Senior Classroom A
QUASAR
Room: Student Organization Center
PERMAVOST
Room: Senior Classroom B
AI4Sys
Room: Second Floor Conference Room
Tutorial: When Error-Bounded Lossy Compression Meets Large-Scale AI Model Training in Federated Environments
Room: First Floor Conference Room
3:00 PM - 3:30 PM
Coffee Break
3:30 PM - 5:00 PM
Tutorial: Principles and Practice of High Performance Deep Learning Training and Inference
Room: Senior Classroom A
QUASAR
Room: Student Organization Center
PERMAVOST
Room: Senior Classroom B
AI4Sys
Room: Second Floor Conference Room
Tutorial: When Error-Bounded Lossy Compression Meets Large-Scale AI Model Training in Federated Environments
Room: First Floor Conference Room

Day 2 - 14th July (Foster-Castele Great Hall)

8:00 AM - 9:00 AM
Breakfast + Opening Remarks
9:00 AM - 10:00 AM
Keynote: The Social Life of Distributed Systems: From Virtual Organizations to Intentional Systems
Abstract Twenty-five years ago, the "Grid Problem" was defined as flexible, secure, and coordinated resource sharing among dynamic collections of individuals and institutions — the Virtual Organization (VO). The Anatomy of the Grid (2001) focused on the technical "brawn" required for multi-institutional interoperability, but it soon became clear that infrastructure alone could not solve the challenges of collaborative discovery. In Brain Meets Brawn: Why Grid and Agents Need Each Other (2004), we argued that autonomous agents would provide the necessary "brains" to manage the complexity, coordination, and scale of these distributed environments. Over the following two decades, this perspective evolved toward socio-technical systems in which data, policy, organizational structure, and human interaction became first-class concerns. Our work on Deriva and Deriva-ML treats continuously evolving, curated data collections as the core organizing principle for collaborative discovery, making explicit the rich relationships among data, processes, collaborators, and outcomes across the lifetime of a scientific project. This shift also reframes a long-standing challenge: existing approaches to reproducibility preserve computational artifacts and execution histories, but perfect reconstruction of an experiment does not guarantee reproduction of scientific understanding — one can faithfully reproduce the wrong answer. By making context explicit and persistent, data-centric systems create the foundation for a new class of intentional systems: systems that represent, maintain, and act upon the goals, assumptions, and shared understanding that give scientific results their meaning, not merely the artifacts they produce. The emergence of Large Language Models now makes this practical. 
Earlier agent-based systems were constrained by narrow reasoning and brittle coordination; LLMs supply a reasoning substrate powerful enough to finally realize the long-envisioned potential of agent-mediated science, shifting the focus from process-centric automation to managing the interactional layer of discovery itself. In this talk, I will discuss recent work in which agent-based systems perform a dual role: executing computational tasks, and — in coordination with a data-centric ecosystem — serving as the "social glue" that captures intent, maintains semantic alignment, and manages shared state across long-running human-agent collaborations. Consider a multi-institutional effort to develop deep learning models for detecting referable glaucoma from fundus photographs collected through a Los Angeles County safety-net teleretinal screening program. As cohorts are refined, label conventions evolve, and foundation-model and supervised approaches are compared across sites, the surrounding agents capture not just the resulting models and metrics, but the clinical and methodological rationale behind each choice — so that when a collaborator, a reviewer, or a downstream agent revisits the work months later, the reasoning behind the result is recoverable, not just the result itself. If the Grid was originally conceived as enabling coordinated resource sharing among dynamic collections of individuals and institutions, the convergence of AI, data-centric systems, and agent-based computing now lets us deliver on a larger vision: coordinating understanding, intent, and discovery across long-lived human-agent collectives.

Speaker: Carl Kesselman
Biography Carl Kesselman is the William M. Keck Professor of Engineering at the University of Southern California, with appointments in the Daniel J. Epstein Department of Industrial and Systems Engineering, the Thomas Lord Department of Computer Science, the Keck School of Medicine, and the Herman Ostrow School of Dentistry. He is Director of the Informatics Systems Research Division at the USC Information Sciences Institute and is internationally recognized as one of the pioneers of Grid Computing and distributed cyberinfrastructure. Kesselman co-founded the Globus Project, whose technologies and concepts helped establish the foundations for modern distributed, cloud, and data-intensive computing systems. His research has spanned distributed systems, scientific cyberinfrastructure, data integration, security, and large-scale collaborative science platforms. More recently, his work has focused on data-centric socio-technical ecosystems, AI-enabled scientific infrastructure, and agent-mediated systems that support long-running human-machine scientific interactions. He has co-authored four papers recognized in HPDC's retrospective list of the most important papers from the conference's first twenty years. Kesselman is a Fellow of the ACM, IEEE, and the British Computer Society. His honors include the British Computer Society's Lovelace Medal, the IEEE Internet Award, and the IEEE Computer Society's Goode Memorial Award.

10:00 AM - 10:20 AM
Coffee Break
10:20 AM - 11:20 AM
Session 1 — LLM Inference: Scheduling & Serving

Session Chair: TBA

STAR: Decode-Phase Rescheduling for LLM Inference

Authors: Zhibin Wang, Zetao Hong, Xue Li, Zibo Wang, Shipeng Li, Qingkai Meng, Qing Wang, Chengying Huan, Rong Gu, Sheng Zhong, Chen Tian

PKAS: Predictive KVCache-Aware Scheduling

Authors: Jie Ye, Avinash Maurya, Krishna Teja Chitty-Venkata, Bogdan Nicolae, Anthony Kougkas, Xian-He Sun

Omnia: RAG Serving through Speculative Scheduling

Authors: Rongtian Fu, Shigang Li, Youxuan Xu, Tong Wu, Zhi Ma, Jinliang Shi

Kizashi Talks Session 1 (11:05 AM - 11:25 AM)

11:30 AM - 12:30 PM
Session 2 — Scaling Generative Inference

Session Chair: TBA

Scaling Attention Beyond GPUs for LLM Inference

Authors: Weishu Deng, Yujie Yang, Peiran Du, Lingfeng Xiang, Zhen Lin, Chen Zhong, Faraz Ahmed, Lianjie Cao, Puneet Sharma, Song Jiang, Hui Lu, Jia Rao

Accelerating Block Low-Rank Foundation Model Inference on Memory-Constrained GPUs

Authors: Pierre Abillama, Changwoo Lee, Juechu Dong, David Blaauw, Dennis Sylvester, Hun-Seok Kim

MoE-Lens: High-Throughput MoE LLM Inference at the Hardware Limit

Authors: Yichao Yuan, Lin Ma, Nishil Talati

Kizashi Talks Session 2 (12:15 PM - 12:35 PM)

12:30 PM - 1:30 PM
Lunch
1:30 PM - 2:30 PM
Session 3 — MoE Systems & Distributed LLM Training

Session Chair: TBA

UniEP: Unified Expert-Parallel MoE MegaKernel for Training

Authors: Size Zheng, Xuegui Zheng, Li-Wen Chang, Jidong Zhai

ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism

Authors: Tenghui Ma, Jihu Guo, Wei Gao, Sitian Lu, Zhisheng Ye, Hanjing Wang, Dahua Lin

TACO: Communication Compression for Tensor-Parallel LLM Training

Authors: Man Liu, Xingchen Liu, Xingjian Tian, Bing Lu, Shengkai Lyu, Shengquan Yin, Wenjing Huang, Zheng Wei, Hairui Zhao, Guangming Tan, Dingwen Tao

Kizashi Talks Session 3 (2:15 PM - 2:35 PM)

2:40 PM - 3:40 PM
Session 4 — Programming Models, Frameworks & Compilers

Session Chair: TBA

SYCL++: Unified Programming for Heterogeneous Supercomputers at Scale

Authors: Zitao Shen, Yuyang Jin, Kinman Lei, Zixuan Ma, Wenqiang Wang, Yinuo Wang, Wenhao Zhou, Zhenchuan Chen, Di Wei, Qi Zhang, Fei Wang, Ying Liu, Lin Gan, Jidong Zhai

Floating Point Virtualization With Tiny Numbers

Authors: Kevin Hayes, Peter Dinda

FAME: Framework for Multi-Agent RL on Heterogeneous Platforms

Authors: Samuel Wiggins, Nikunj Gupta, Grace Zgheib, Mahesh A. Iyer, Viktor Prasanna

CARBS: Compiler Autotuning via Randomized Biased Search (Best paper nominee)

Authors: Wei Li, Bin Gao, Weng-Fai Wong

3:40 PM - 4:00 PM
Coffee Break
4:00 PM - 5:00 PM
The Five Most Important Research Problems for the Next Decade

Moderator: Manish Parashar


Day 3 - 15th July (Foster-Castele Great Hall)

8:00 AM - 9:00 AM
Breakfast
9:00 AM - 10:00 AM
Keynote: Misconceptions in Parallel Computing
Abstract For many years, parallel computing was driven by the steady advance of commodity processors. Clusters of commodity CPUs provided ever greater computing power. Simple parallel performance models made it easy to analyze algorithms and implementations. Systems with mostly nodes of identical type became common, simplifying the application environment. Since 2005, the situation has become increasingly complicated. The end of Dennard scaling has spurred ever greater degrees of parallelism within a single processor chip, as well as specialization in GPUs and other processors. AI has driven systems to sizes never seen before, as well as changing the mix of operations. The way we think about parallel systems needs to be reexamined. In this talk, I will talk about what I think are misconceptions in parallel computing that are a result of the changes in computing, especially since 2005. These include: Is Moore's Law over? Do we still have good performance models? Are we prepared for systems with dissimilar node types? Are standards still useful? I will close with a few challenges for the community.

Speaker: William Gropp
Biography William Gropp is a professor in the Siebel School of Computing and Data Science at the University of Illinois Urbana‑Champaign, where he holds a Grainger Distinguished Chair in Engineering. He earned his Ph.D. in Computer Science from Stanford University in 1982 and has held research and leadership positions at Yale University and Argonne National Laboratory. Gropp’s research focuses on parallel computing, scientific software, and numerical methods for partial differential equations. He is widely known for his contributions to high‑performance computing, including foundational work on the MPI message‑passing standard and the development of influential software tools used throughout computational science. From 2016 to 2025, he served as Director of the National Center for Supercomputing Applications (NCSA), guiding major initiatives in advanced computing and data‑intensive research. He currently chairs the Computing Community Consortium for the Computing Research Association, helping shape long‑term research directions for the computing field. Gropp is a Fellow of AAAS, ACM, IEEE, and SIAM, a member of the National Academy of Engineering, and the recipient of numerous awards recognizing his impact on high‑performance computing.

10:00 AM - 10:20 AM
Coffee Break
10:20 AM - 11:20 AM
Session 5 — Scientific Applications at Scale

Session Chair: TBA

POLAR-PIC: Matrixized PIC for Plasma Physics

Authors: Yizhuo Rao, Xingjian Cui, Shangzhi Pang, Jiabin Xie, Guangnan Feng, Ziyan Zhang, Jinhui Wei, Languang Gao, Zhenyu Wang, Zhiguang Chen, Yutong Lu

A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States

Authors: Daran Sun, Bowen Kan, Haoquan Long, Hairui Zhao, Haoxu Li, Yicheng Liu, Pengyu Zhou, Ankang Feng, Wenjing Huang, Yida Gu, Zhenyu Li, Honghui Shang, Yunquan Zhang, Dingwen Tao, Ninghui Sun, Guangming Tan

Full-Core Fluid-Structure-Interaction Simulation of Nuclear Reactor

Authors: Xue Miao, Jue Wang, Qida Lin, Shufei Zhang, Rongqiang Cao, Chunbao Zhou, Ningming Nie, He Bai, Yangang Wang

Endeavor: PairHMM for DNA Variant Detection at Genome Scale

Authors: Miguel Graça, Aleksandar Ilic

11:30 AM - 12:30 PM
Session 6 — Scientific Data Compression

Session Chair: TBA

OPAL: On-demand Progressive Accelerated Scientific Lossy Compression

Authors: Longtao Zhang, Ruoyu Li, Zhuoxun Yang, Robert Underwood, Sheng Di, Daoce Wang, Jinyang Liu, Jiajun Huang, Franck Cappello, Kai Zhao

TZ: High-Ratio Scientific Data Compression on GPUs

Authors: Zhuoxun Yang, Ruoyu Li, Amit Subrahmanya, Vishwas Rao, Sheng Di, Robert Underwood, Longtao Zhang, Jinyang Liu, Franck Cappello, Kai Zhao

QProR: An Efficient Framework for Quantity-of-Interest Based Progressive Retrieval with Guaranteed Error Control

Authors: Wenbo Li, Qian Gong, Xuan Wu, Jieyang Chen, Qing Liu, Xubin He, Norbert Podhorszki, Scott Klasky, Xin Liang

Bridging Information Theory and Practice for Scientific Lossy Compression

Authors: Sujata Sinha, Sheng Di, Vishwas Rao, Robert Underwood, David Lenz, Zizhe Jian, Zhuoxun Yang, Kai Zhao, Lingjia Liu, Franck Cappello

12:30 PM - 1:30 PM
Lunch
1:30 PM - 2:30 PM
Session 7 — Graph Processing & Community Detection

Session Chair: TBA

SAGA: State-Aware Graph Analytics for Combinatorial Optimization on Dynamic Graphs

Authors: Rohit Prajapati, Prajjwal Nijhara, Dip Sankar Banerjee

Efficient Tracking of Communities on Evolving Graphs with Leiden

Authors: Subhajit Sahu

νMG-LPA and νBM-LPA: Memory Efficient GPU-based Label Propagation Algorithms (LPA) for Community Detection

Authors: Subhajit Sahu

Kizashi Talks Session 7 (2:15 PM - 2:35 PM)

2:40 PM - 3:40 PM
Session 8 — Storage & I/O for Data-Intensive Workloads

Session Chair: TBA

A High-Performance Persistent Transactional Memory System via Cooperative Concurrency

Authors: Hao Hu, Xinrui Zheng, Yizou Chen, Xiangyu Zou, Erci Xu, Hongpeng Wang, Wen Xia

GLANCED-IO: Taming I/O Optimization for Deep Learning at Scale

Authors: Ray A. O. Sinurat, William Nixon, Philip Carns, Huihuo Zheng, Sandeep Madireddy, Sam Foreman, Troy Arcomano, Robert Ross, Haryadi S. Gunawi, Hariharan Devarajan

Merkle-Tree Weight Snapshot Deduplication for NN Training Provenance

Authors: Kin Wai Ng, Francesco Antici, Nigel Tan, Befikir Bogale, Caleb Han, Florence Tama, Osamu Miyashita, Bogdan Nicolae, Michela Taufer

Pome: Parallelizing I/Os and Computations for LSM-tree Storage (Best paper nominee)

Authors: Yanpeng Hu, Li Zhu, Lei Jia, Chundong Wang

3:40 PM - 4:00 PM
Coffee Break
4:00 PM - 5:00 PM
Data Centers: Information, Misinformation, and the future of large scale computing

Moderator: Barney Maccabe

6:00 PM - 7:00 PM
Poster Reception — Student, Conference, RSE, Industry
7:30 PM - 10:00 PM
Banquet Dinner and Museum Visit

Day 4 - 16th July (Foster-Castele Great Hall)

8:00 AM - 9:00 AM
Breakfast
9:00 AM - 10:00 AM
Keynote: From Scalable Systems to Quantum Computing: Navigating New Computing Frontiers
Abstract Over the past several decades, the high-performance parallel and distributed computing community has played a central role in advancing computing infrastructures, from operating systems to cloud platforms and large-scale distributed environments. These systems challenges are now reappearing in the context of emerging scientific application workflows that leverage advances in artificial intelligence (AI) and quantum information science. This talk connects a long-standing research trajectory in scalable systems with more recent work in quantum computing. It discusses how a researcher grounded in high-performance and distributed systems can approach the quantum computing landscape, identify familiar abstractions and challenges, and contribute to the emerging quantum software and systems stack. As quantum technologies evolve toward larger and more programmable platforms, issues such as orchestration of hybrid quantum–classical workflows, runtime systems, compilation, resource management, distributed control, and reliability are becoming increasingly important. Advances in AI and quantum computing point toward a future in which scalable systems expertise may play a foundational role in shaping new computing paradigms and scientific capabilities. The presentation will also provide an overview of the evolving U.S. federal research funding landscape in quantum computing, quantum networking, and quantum sensing.

Speaker: Dilma M. Da Silva
Biography Dilma Da Silva is a Regents Professor and holder of the Ford Design Professorship II in the Department of Computer Science and Engineering at Texas A&M University. From July 2022 to June 2026, she served in several leadership positions at the U.S. National Science Foundation. Her roles at Texas A&M include Department Head (2014-2019), Associate Dean (2019-2020), interim Director of the Texas A&M Institute of Data Science, and interim Director of the Texas A&M Cybersecurity Center. Her primary research interests are high performance computing, computer science education, and quantum computing. Before joining Texas A&M, she worked at Qualcomm Research (2012-2014), IBM Research (2000-2012), and the University of São Paulo (1996-2000). Dilma is an ACM Distinguished Scientist. She received her doctoral degree in computer science from Georgia Tech in 1997 and her bachelor's and master's degrees from the University of São Paulo, Brazil. She is passionate about mentoring and supporting the next generation of computing researchers and practitioners.

10:00 AM - 10:20 AM
Coffee Break
10:20 AM - 11:20 AM
Session 9 — Cloud, Serverless & Microservices

Session Chair: TBA

Enabling High-Utilization and Low-Contention FaaS: A Request-Level Resource Provisioning Approach

Authors: Runfu Li, Zishu Yu, Yifan Wang, Xiaohui Peng, Ninghui Sun, Zhiwei Xu

Cremes: Cost-Efficient Microservice Execution on Spot Instances

Authors: Liao Chen, Chenyu Lin, Junlin Chen, Shutian Luo, Huanle Xu, ChengZhong Xu

Krysha: Cost-Efficient Geo-Distributed Serverless Microservices

Authors: Yuqiu Zhang, Hans-Arno Jacobsen

ACM SRC Presentations (11:05 AM - 11:25 AM)

11:30 AM - 12:30 PM
Session 10 — Data Systems for AI & Analytics

Session Chair: TBA

SIVF: GPU-Resident IVF Index for Streaming Vector Analytics

Authors: Dongfang Zhao

BCCE: Block-Centric GPU Co-Design for Real-Time Range-Top-K Query at Scale

Authors: Chengying Huan, Ziheng Meng, Zhengyi Yang, Yongchao Liu, Jie Zhang, Qing Wang, Jing Wang, Shaonan Ma, Zhibin Wang, Mingxing Zhang, Rong Gu, Baokun Wang, Guihai Chen, Chen Tian

ATLAS: Out-of-Core Inference for Billion-Scale GNNs

Authors: Pranjal Naman, Yogesh Simmhan

Kizashi Talks Session 10 (12:15 PM - 12:35 PM)

12:30 PM - 1:30 PM
Lunch
1:30 PM - 2:30 PM
Session 11 — Communication, Workflows & Sustainability

Session Chair: TBA

GICC: GPU-Initiated Communication and Coordination Runtime

Authors: Baodi Shan, Mauricio Araya-Polo, Barbara Chapman

When RDMA Goes Long-Haul: Characterization, Modeling, and Verbs-Level Emulation with Implications for Federated Learning

Authors: Yuke Li, Zhonghao Chen, Xiaoyi Lu

JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient

Authors: Vladislav Esaulov, Jieyang Chen, Norbert Podhorszki, Fred Suter, Scott Klasky, Anu G. Bourgeois, Lipeng Wan

PowerQuant: Architecture-Agnostic GPU Power Estimation via Quantile Regression (Best paper nominee)

Authors: Aditya Challa, Tanish Desai, Gargi Alavani Prabhu, Snehanshu Saha, Santonu Sarkar

2:40 PM - 3:40 PM
Session 12 — Emerging Generative & Agentic AI Workloads

Session Chair: TBA

DiTango: Cost-Effective Parallel Diffusion Generation with Selective Attention State Reuse

Authors: Yuyang Chen, Runxin Zhong, Zan Zong, Hengjie Li, Yuyang Jin, Jidong Zhai

Dynamo-MoE: Accelerating Sparse Large Model Inference

Authors: Jiahao Chen, Shigang Li, Rongtian Fu, Tong Wu, Zhi Ma, Jingkun Dong

3:40 PM - 4:00 PM
Coffee Break
4:00 PM - 5:00 PM
Awards Announcements and Closing Remarks