Leveraging In-Network Processing for Practical Memory Disaggregation
Data systems leverage Big Data and AI tools to extract value out of data, be it local or distributed across the world. Over the last five years, we have demonstrated that network-informed data systems design can yield order-of-magnitude performance and efficiency improvements. In the process, we have established two complementary threads of research: (1) tailoring Big Data and AI applications to their underlying networks and (2) applying networking principles in designing new data systems. In this talk, I will provide an overview of the broader landscape and dive deep into a case study of the former: how to leverage in-network processing for practical memory disaggregation.
High-performance data systems often over-provision memory because applications today cannot access otherwise unused memory beyond their machine boundaries, even when their performance grinds to a halt. But could they? Many have attempted to answer this question since the 1980s, but practical memory disaggregation has remained elusive. I will start by presenting the first scalable memory disaggregation system that allows any application to use remote memory without any changes to the application or the underlying operating system, and without requiring new hardware. Remote memory access is still slow, however: even a microsecond is almost an order of magnitude slower than accessing a local memory page. I will present a new prefetching algorithm to breach this latency barrier. Finally, I will introduce the fastest lock manager in the world, which processes billions of lock requests every second and enables safe concurrent remote page access using programmable switches. Our open-source solutions allow unmodified data systems to run with only 25% local memory with little to no performance loss.
Mosharaf Chowdhury is a Morris Wellman Assistant Professor of CSE at the University of Michigan, Ann Arbor. His current research focuses on application-infrastructure symbiosis across different layers of software and hardware stacks. Mosharaf invented coflows and is a co-creator of Apache Spark. Software artifacts from his research have been deployed in Microsoft, Facebook, Google, and Amazon datacenters. He has received an NSF CAREER award, the 2015 ACM SIGCOMM Doctoral Dissertation Award, best paper awards at NSDI and ATC, multiple faculty fellowships and awards from Google, VMware, and Alibaba, as well as a Facebook fellowship and a Cheriton Scholarship. He received his PhD from the AMPLab at UC Berkeley in 2015.