Distinguished Lecture Series

John Wilkes

Google

11 June 2015, 4:15 pm

“Large-scale Cluster Management at Google with Borg”

Venue: S2|02 room C110 (Robert-Piloty building, Hochschulstr. 10, 64289 Darmstadt)

Abstract:

Google’s Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.

It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.

Short Bio:

John is working on cluster management for Google’s compute infrastructure; he was one of the architects of Omega, Google’s next-generation cluster management system. He is interested in far too many aspects of distributed systems, but a recurring theme has been technologies that allow systems to manage themselves.

John received a PhD in CS from the Univ. of Cambridge (UK), joined HP Labs in 1982, and was elected an HP Fellow and an ACM Fellow in 2002 for his work on storage system design. He is listed as an inventor on 40+ US patents and has an adjunct faculty appointment at Carnegie Mellon Univ. (US).