Dr. Nicolas Weber

NEC Labs


Systems for ML and ML for Systems @ NEC Labs Europe


Research and development in machine learning and, specifically, neural network-based learning is experiencing tremendous growth. Due to the resource (compute, memory, energy) consumption of ML, the volume of data, and the need to deploy ML in resource-constrained environments, there is a strong push for ML research to pay close attention to systems considerations. The talk will cover three projects of the SysML group at the intersection of ML and systems research. Each of these projects is motivated by the question of what ML can do for systems research and what systems research can do for ML. First, I will introduce net2vec, a deep learning system for the network that performs unsupervised representation learning and is able to compute predictions at line speed. Second, I will describe our work on using machine learning for resource scheduling. This work integrates systems telemetry, ranging from standard resource usage statistics to the kernel and library calls of applications, to train a machine learning model. Intuitively, such a state-based recurrent model approximates the state of a system at any point in time and allows us to solve tasks such as resource usage prediction. Third, I will discuss BrainSlug, a principled method for transparently accelerating ML workloads by changing the breadth-first, layer-by-layer processing of neural networks to also perform depth-first processing. BrainSlug reduces the amount of data on which operations are performed and, therefore, improves the utilization of hardware caches. BrainSlug achieves performance improvements of up to 41.1% on CPUs and 35.7% on GPUs with zero cost for the user.


After receiving his PhD from Technische Universität Darmstadt in 2017, Nicolas Weber joined the Systems and Machine Learning group at NEC Laboratories Europe. He is the lead researcher for the “BrainSlug” transparent neural network acceleration project. His research interests mainly focus on automated performance optimization of compute-intensive applications on high-performance hardware.