Program • HPDC 2026

Day 1 - Workshops - 13th July (Tinkham Veale University Center, Case Western Reserve University)

	Workshops and Tutorials
8:00 AM - 9:00 AM	Registration Opens
9:00 AM - 10:30 AM	Tutorial: High-Performance and Smart Networking Technologies for HPC and AI Room: Senior Classroom A	QUASAR Room: Student Organization Center	REX-IO Room: Senior Classroom B	FlexScience Room: Second Floor Conference Room
10:30 AM - 11:00 AM	Coffee Break
11:00 AM - 12:30 PM	Tutorial: High-Performance and Smart Networking Technologies for HPC and AI Room: Senior Classroom A	QUASAR Room: Student Organization Center	REX-IO Room: Senior Classroom B	FlexScience Room: Second Floor Conference Room
12:30 PM - 1:30 PM	(Lunch on Your Own)
1:30 PM - 3:00 PM	Tutorial: Principles and Practice of High Performance Deep Learning Training and Inference Room: Senior Classroom A	QUASAR Room: Student Organization Center	PERMAVOST Room: Senior Classroom B	AI4Sys Room: Second Floor Conference Room	Tutorial: When Error-Bounded Lossy Compression Meets Large-Scale AI Model Training in Federated Environments Room: First Floor Conference Room
3:00 PM - 3:30 PM	Coffee Break
3:30 PM - 5:00 PM	Tutorial: Principles and Practice of High Performance Deep Learning Training and Inference Room: Senior Classroom A	QUASAR Room: Student Organization Center	PERMAVOST Room: Senior Classroom B	AI4Sys Room: Second Floor Conference Room	Tutorial: When Error-Bounded Lossy Compression Meets Large-Scale AI Model Training in Federated Environments Room: First Floor Conference Room

Day 2 - 14th July (Foster-Castele Great Hall)

8:00 AM - 9:00 AM	Breakfast + Opening Remarks
9:00 AM - 10:00 AM	Keynote: The Social Life of Distributed Systems: From Virtual Organizations to Intentional Systems
	Abstract: Twenty-five years ago, the "Grid Problem" was defined as flexible, secure, and coordinated resource sharing among dynamic collections of individuals and institutions — the Virtual Organization (VO). The Anatomy of the Grid (2001) focused on the technical "brawn" required for multi-institutional interoperability, but it soon became clear that infrastructure alone could not solve the challenges of collaborative discovery. In Brain Meets Brawn: Why Grid and Agents Need Each Other (2004), we argued that autonomous agents would provide the necessary "brains" to manage the complexity, coordination, and scale of these distributed environments. Over the following two decades, this perspective evolved toward socio-technical systems in which data, policy, organizational structure, and human interaction became first-class concerns. Our work on Deriva and Deriva-ML treats continuously evolving, curated data collections as the core organizing principle for collaborative discovery, making explicit the rich relationships among data, processes, collaborators, and outcomes across the lifetime of a scientific project. This shift also reframes a long-standing challenge: existing approaches to reproducibility preserve computational artifacts and execution histories, but perfect reconstruction of an experiment does not guarantee reproduction of scientific understanding — one can faithfully reproduce the wrong answer. By making context explicit and persistent, data-centric systems create the foundation for a new class of intentional systems: systems that represent, maintain, and act upon the goals, assumptions, and shared understanding that give scientific results their meaning, not merely the artifacts they produce. The emergence of Large Language Models now makes this practical. Earlier agent-based systems were constrained by narrow reasoning and brittle coordination; LLMs supply a reasoning substrate powerful enough to finally realize the long-envisioned potential of agent-mediated science, shifting the focus from process-centric automation to managing the interactional layer of discovery itself. In this talk, I will discuss recent work in which agent-based systems perform a dual role: executing computational tasks, and — in coordination with a data-centric ecosystem — serving as the "social glue" that captures intent, maintains semantic alignment, and manages shared state across long-running human-agent collaborations. Consider a multi-institutional effort to develop deep learning models for detecting referable glaucoma from fundus photographs collected through a Los Angeles County safety-net teleretinal screening program. As cohorts are refined, label conventions evolve, and foundation-model and supervised approaches are compared across sites, the surrounding agents capture not just the resulting models and metrics, but the clinical and methodological rationale behind each choice — so that when a collaborator, a reviewer, or a downstream agent revisits the work months later, the reasoning behind the result is recoverable, not just the result itself. If the Grid was originally conceived as enabling coordinated resource sharing among dynamic collections of individuals and institutions, the convergence of AI, data-centric systems, and agent-based computing now lets us deliver on a larger vision: coordinating understanding, intent, and discovery across long-lived human-agent collectives.
	Speaker: Carl Kesselman
	Biography: Carl Kesselman is the William M. Keck Professor of Engineering at the University of Southern California, with appointments in the Daniel J. Epstein Department of Industrial and Systems Engineering, the Thomas Lord Department of Computer Science, the Keck School of Medicine, and the Herman Ostrow School of Dentistry. He is Director of the Informatics Systems Research Division at the USC Information Sciences Institute and is internationally recognized as one of the pioneers of Grid Computing and distributed cyberinfrastructure. Kesselman co-founded the Globus Project, whose technologies and concepts helped establish the foundations for modern distributed, cloud, and data-intensive computing systems. His research has spanned distributed systems, scientific cyberinfrastructure, data integration, security, and large-scale collaborative science platforms. More recently, his work has focused on data-centric socio-technical ecosystems, AI-enabled scientific infrastructure, and agent-mediated systems that support long-running human-machine scientific interactions. He has co-authored four papers recognized in HPDC's retrospective list of the most important papers from the conference's first twenty years. Kesselman is a Fellow of the ACM, IEEE, and the British Computer Society. His honors include the British Computer Society's Lovelace Medal, the IEEE Internet Award, and the IEEE Computer Society's Goode Memorial Award.
10:00 AM - 10:20 AM	Coffee Break
10:20 AM - 11:20 AM	Session 1 — LLM Inference: Scheduling & Serving Session Chair: TBA
	STAR: Decode-Phase Rescheduling for LLM Inference Authors: Zhibin Wang, Zetao Hong, Xue Li, Zibo Wang, Shipeng Li, Qingkai Meng, Qing Wang, Chengying Huan, Rong Gu, Sheng Zhong, Chen Tian
	PKAS: Predictive KVCache-Aware Scheduling Authors: Jie Ye, Avinash Maurya, Krishna Teja Chitty-Venkata, Bogdan Nicolae, Anthony Kougkas, Xian-He Sun
	Omnia: RAG Serving through Speculative Scheduling Authors: Rongtian Fu, Shigang Li, Youxuan Xu, Tong Wu, Zhi Ma, Jinliang Shi
	Kizashi Talks Session 1 (11:05 AM - 11:25 AM)
11:30 AM - 12:30 PM	Session 2 — Scaling Generative Inference Session Chair: TBA
	Scaling Attention Beyond GPUs for LLM Inference Authors: Weishu Deng, Yujie Yang, Peiran Du, Lingfeng Xiang, Zhen Lin, Chen Zhong, Faraz Ahmed, Lianjie Cao, Puneet Sharma, Song Jiang, Hui Lu, Jia Rao
	Accelerating Block Low-Rank Foundation Model Inference on Memory-Constrained GPUs Authors: Pierre Abillama, Changwoo Lee, Juechu Dong, David Blaauw, Dennis Sylvester, Hun-Seok Kim
	MoE-Lens: High-Throughput MoE LLM Inference at the Hardware Limit Authors: Yichao Yuan, Lin Ma, Nishil Talati
	Kizashi Talks Session 2 (12:15 PM - 12:35 PM)
12:30 PM - 1:30 PM	Lunch
1:30 PM - 2:30 PM	Session 3 — MoE Systems & Distributed LLM Training Session Chair: TBA
	UniEP: Unified Expert-Parallel MoE MegaKernel for Training Authors: Size Zheng, Xuegui Zheng, Li-Wen Chang, Jidong Zhai
	ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism Authors: Tenghui Ma, Jihu Guo, Wei Gao, Sitian Lu, Zhisheng Ye, Hanjing Wang, Dahua Lin
	TACO: Communication Compression for Tensor-Parallel LLM Training Authors: Man Liu, Xingchen Liu, Xingjian Tian, Bing Lu, Shengkai Lyu, Shengquan Yin, Wenjing Huang, Zheng Wei, Hairui Zhao, Guangming Tan, Dingwen Tao
	Kizashi Talks Session 3 (2:15 PM - 2:35 PM)
2:40 PM - 3:40 PM	Session 4 — Programming Models, Frameworks & Compilers Session Chair: TBA
	SYCL++: Unified Programming for Heterogeneous Supercomputers at Scale Authors: Zitao Shen, Yuyang Jin, Kinman Lei, Zixuan Ma, Wenqiang Wang, Yinuo Wang, Wenhao Zhou, Zhenchuan Chen, Di Wei, Qi Zhang, Fei Wang, Ying Liu, Lin Gan, Jidong Zhai
	Floating Point Virtualization With Tiny Numbers Authors: Kevin Hayes, Peter Dinda
	FAME: Framework for Multi-Agent RL on Heterogeneous Platforms Authors: Samuel Wiggins, Nikunj Gupta, Grace Zgheib, Mahesh A. Iyer, Viktor Prasanna
	CARBS: Compiler Autotuning via Randomized Biased Search (Best paper nominee) Authors: Wei Li, Bin Gao, Weng-Fai Wong
3:40 PM - 4:00 PM	Coffee Break
4:00 PM - 5:00 PM	The Five Most Important Research Problems for the Next Decade Moderator: Manish Parashar

Day 3 - 15th July (Foster-Castele Great Hall)

8:00 AM - 9:00 AM	Breakfast
9:00 AM - 10:00 AM	Keynote: Misconceptions in Parallel Computing
	Abstract: For many years, parallel computing was driven by the steady advance of commodity processors. Clusters of commodity CPUs provided ever greater computing power. Simple parallel performance models made it easy to analyze algorithms and implementations. Systems with mostly nodes of identical type became common, simplifying the application environment. Since 2005, the situation has become increasingly complicated. The end of Dennard scaling has spurred ever greater degrees of parallelism within a single processor chip, as well as specialization in GPUs and other processors. AI has driven systems to sizes never seen before, as well as changing the mix of operations. The way we think about parallel systems needs to be reexamined. In this talk, I will talk about what I think are misconceptions in parallel computing that are a result of the changes in computing, especially since 2005. These include: Is Moore's Law over? Do we still have good performance models? Are we prepared for systems with dissimilar node types? Are standards still useful? I will close with a few challenges for the community.
	Speaker: William Gropp
	Biography: William Gropp is a professor in the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign, where he holds a Grainger Distinguished Chair in Engineering. He earned his Ph.D. in Computer Science from Stanford University in 1982 and has held research and leadership positions at Yale University and Argonne National Laboratory. Gropp's research focuses on parallel computing, scientific software, and numerical methods for partial differential equations. He is widely known for his contributions to high-performance computing, including foundational work on the MPI message-passing standard and the development of influential software tools used throughout computational science. From 2016 to 2025, he served as Director of the National Center for Supercomputing Applications (NCSA), guiding major initiatives in advanced computing and data-intensive research. He currently chairs the Computing Community Consortium for the Computing Research Association, helping shape long-term research directions for the computing field. Gropp is a Fellow of AAAS, ACM, IEEE, and SIAM, a member of the National Academy of Engineering, and the recipient of numerous awards recognizing his impact on high-performance computing.
10:00 AM - 10:20 AM	Coffee Break
10:20 AM - 11:20 AM	Session 5 — Scientific Applications at Scale Session Chair: TBA
	POLAR-PIC: Matrixized PIC for Plasma Physics Authors: Yizhuo Rao, Xingjian Cui, Shangzhi Pang, Jiabin Xie, Guangnan Feng, Ziyan Zhang, Jinhui Wei, Languang Gao, Zhenyu Wang, Zhiguang Chen, Yutong Lu
	A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States Authors: Daran Sun, Bowen Kan, Haoquan Long, Hairui Zhao, Haoxu Li, Yicheng Liu, Pengyu Zhou, Ankang Feng, Wenjing Huang, Yida Gu, Zhenyu Li, Honghui Shang, Yunquan Zhang, Dingwen Tao, Ninghui Sun, Guangming Tan
	Full-Core Fluid-Structure-Interaction Simulation of Nuclear Reactor Authors: Xue Miao, Jue Wang, Qida Lin, Shufei Zhang, Rongqiang Cao, Chunbao Zhou, Ningming Nie, He Bai, Yangang Wang
	Endeavor: PairHMM for DNA Variant Detection at Genome Scale Authors: Miguel Graça, Aleksandar Ilic
11:30 AM - 12:30 PM	Session 6 — Scientific Data Compression Session Chair: TBA
	OPAL: On-demand Progressive Accelerated Scientific Lossy Compression Authors: Longtao Zhang, Ruoyu Li, Zhuoxun Yang, Robert Underwood, Sheng Di, Daoce Wang, Jinyang Liu, Jiajun Huang, Franck Cappello, Kai Zhao
	TZ: High-Ratio Scientific Data Compression on GPUs Authors: Zhuoxun Yang, Ruoyu Li, Amit Subrahmanya, Vishwas Rao, Sheng Di, Robert Underwood, Longtao Zhang, Jinyang Liu, Franck Cappello, Kai Zhao
	QProR: An Efficient Framework for Quantity-of-Interest Based Progressive Retrieval with Guaranteed Error Control Authors: Wenbo Li, Qian Gong, Xuan Wu, Jieyang Chen, Qing Liu, Xubin He, Norbert Podhorszki, Scott Klasky, Xin Liang
	Bridging Information Theory and Practice for Scientific Lossy Compression Authors: Sujata Sinha, Sheng Di, Vishwas Rao, Robert Underwood, David Lenz, Zizhe Jian, Zhuoxun Yang, Kai Zhao, Lingjia Liu, Franck Cappello
12:30 PM - 1:30 PM	Lunch
1:30 PM - 2:30 PM	Session 7 — Graph Processing & Community Detection Session Chair: TBA
	SAGA: State-Aware Graph Analytics for Combinatorial Optimization on Dynamic Graphs Authors: Rohit Prajapati, Prajjwal Nijhara, Dip Sankar Banerjee
	Efficient Tracking of Communities on Evolving Graphs with Leiden Authors: Subhajit Sahu
	νMG-LPA and νBM-LPA: Memory Efficient GPU-based Label Propagation Algorithms (LPA) for Community Detection Authors: Subhajit Sahu
	Kizashi Talks Session 7 (2:15 PM - 2:35 PM)
2:40 PM - 3:40 PM	Session 8 — Storage & I/O for Data-Intensive Workloads Session Chair: TBA
	A High-Performance Persistent Transactional Memory System via Cooperative Concurrency Authors: Hao Hu, Xinrui Zheng, Yizou Chen, Xiangyu Zou, Erci Xu, Hongpeng Wang, Wen Xia
	GLANCED-IO: Taming I/O Optimization for Deep Learning at Scale Authors: Ray A. O. Sinurat, William Nixon, Philip Carns, Huihuo Zheng, Sandeep Madireddy, Sam Foreman, Troy Arcomano, Robert Ross, Haryadi S. Gunawi, Hariharan Devarajan
	Merkle-Tree Weight Snapshot Deduplication for NN Training Provenance Authors: Kin Wai Ng, Francesco Antici, Nigel Tan, Befikir Bogale, Caleb Han, Florence Tama, Osamu Miyashita, Bogdan Nicolae, Michela Taufer
	Pome: Parallelizing I/Os and Computations for LSM-tree Storage (Best paper nominee) Authors: Yanpeng Hu, Li Zhu, Lei Jia, Chundong Wang
3:40 PM - 4:00 PM	Coffee Break
4:00 PM - 5:00 PM	Data Centers: Information, Misinformation, and the Future of Large-Scale Computing Moderator: Barney Maccabe
6:00 PM - 7:00 PM	Poster Reception — Student, Conference, RSE, Industry
7:30 PM - 10:00 PM	Banquet Dinner and Museum Visit

Day 4 - 16th July (Foster-Castele Great Hall)

8:00 AM - 9:00 AM	Breakfast
9:00 AM - 10:00 AM	Keynote: From Scalable Systems to Quantum Computing: Navigating New Computing Frontiers
	Abstract: Over the past several decades, the high-performance parallel and distributed computing community has played a central role in advancing computing infrastructures, from operating systems to cloud platforms and large-scale distributed environments. These systems challenges are now reappearing in the context of emerging scientific application workflows that leverage advances in artificial intelligence (AI) and quantum information science. This talk connects a long-standing research trajectory in scalable systems with more recent work in quantum computing. It discusses how a researcher grounded in high-performance and distributed systems can approach the quantum computing landscape, identify familiar abstractions and challenges, and contribute to the emerging quantum software and systems stack. As quantum technologies evolve toward larger and more programmable platforms, issues such as orchestration of hybrid quantum–classical workflows, runtime systems, compilation, resource management, distributed control, and reliability are becoming increasingly important. Advances in AI and quantum computing point toward a future in which scalable systems expertise may play a foundational role in shaping new computing paradigms and scientific capabilities. The presentation will also provide an overview of the evolving U.S. federal research funding landscape in quantum computing, quantum networking, and quantum sensing.
	Speaker: Dilma M Da Silva
	Biography: Dilma Da Silva is a Regents Professor and holder of the Ford Design Professorship II in the Department of Computer Science and Engineering at Texas A&M University. From July 2022 to June 2026, she served in several leadership positions at the U.S. National Science Foundation. Her roles at Texas A&M include Department Head (2014-2019), Associate Dean (2019-2020), interim Director of the Texas A&M Institute of Data Science, and interim Director of the Texas A&M Cybersecurity Center. Her primary research interests are high performance computing, computer science education, and quantum computing. Before joining Texas A&M, she worked at Qualcomm Research (2012-2014), IBM Research (2000-2012), and the University of São Paulo (1996-2000). Dilma is an ACM Distinguished Scientist. She received her doctoral degree in computer science from Georgia Tech in 1997 and her bachelor's and master's degrees from the University of São Paulo, Brazil. She is passionate about mentoring and supporting the next generation of computing researchers and practitioners.
10:00 AM - 10:20 AM	Coffee Break
10:20 AM - 11:20 AM	Session 9 — Cloud, Serverless & Microservices Session Chair: TBA
	Enabling High-Utilization and Low-Contention FaaS: A Request-Level Resource Provisioning Approach Authors: Runfu Li, Zishu Yu, Yifan Wang, Xiaohui Peng, Ninghui Sun, Zhiwei Xu
	Cremes: Cost-Efficient Microservice Execution on Spot Instances Authors: Liao Chen, Chenyu Lin, Junlin Chen, Shutian Luo, Huanle Xu, ChengZhong Xu
	Krysha: Cost-Efficient Geo-Distributed Serverless Microservices Authors: Yuqiu Zhang, Hans-Arno Jacobsen
	ACM SRC Presentations (11:05 AM - 11:25 AM)
11:30 AM - 12:30 PM	Session 10 — Data Systems for AI & Analytics Session Chair: TBA
	SIVF: GPU-Resident IVF Index for Streaming Vector Analytics Authors: Dongfang Zhao
	BCCE: Block-Centric GPU Co-Design for Real-Time Range-Top-K Query at Scale Authors: Chengying Huan, Ziheng Meng, Zhengyi Yang, Yongchao Liu, Jie Zhang, Qing Wang, Jing Wang, Shaonan Ma, Zhibin Wang, Mingxing Zhang, Rong Gu, Baokun Wang, Guihai Chen, Chen Tian
	ATLAS: Out-of-Core Inference for Billion-Scale GNNs Authors: Pranjal Naman, Yogesh Simmhan
	Kizashi Talks Session 10 (12:15 PM - 12:35 PM)
12:30 PM - 1:30 PM	Lunch
1:30 PM - 2:30 PM	Session 11 — Communication, Workflows & Sustainability Session Chair: TBA
	GICC: GPU-Initiated Communication and Coordination Runtime Authors: Baodi Shan, Mauricio Araya-Polo, Barbara Chapman
	When RDMA Goes Long-Haul: Characterization, Modeling, and Verbs-Level Emulation with Implications for Federated Learning Authors: Yuke Li, Zhonghao Chen, Xiaoyi Lu
	JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient Authors: Vladislav Esaulov, Jieyang Chen, Norbert Podhorszki, Fred Suter, Scott Klasky, Anu G. Bourgeois, Lipeng Wan
	PowerQuant: Architecture-Agnostic GPU Power Estimation via Quantile Regression (Best paper nominee) Authors: Aditya Challa, Tanish Desai, Gargi Alavani Prabhu, Snehanshu Saha, Santonu Sarkar
2:40 PM - 3:40 PM	Session 12 — Emerging Generative & Agentic AI Workloads Session Chair: TBA
	DiTango: Cost-Effective Parallel Diffusion Generation with Selective Attention State Reuse Authors: Yuyang Chen, Runxin Zhong, Zan Zong, Hengjie Li, Yuyang Jin, Jidong Zhai
	Dynamo-MoE: Accelerating Sparse Large Model Inference Authors: Jiahao Chen, Shigang Li, Rongtian Fu, Tong Wu, Zhi Ma, Jingkun Dong
3:40 PM - 4:00 PM	Coffee Break
4:00 PM - 5:00 PM	Awards Announcements and Closing Remarks
