Very Short Bottlenecks and Very Long Latencies in Mission-Critical Systems

Keynote Speaker: Prof. Calton Pu

Professor and John P. Imlay, Jr. Chair in Software

School of Computer Science, Georgia Institute of Technology



Web-facing applications have complex deployment dependencies and stringent quality of service requirements, e.g., 99.9% of requests with response time within 0.5 seconds. However, despite continued efforts by industry and academic researchers, the latency long tail problem, where a non-trivial fraction of requests return only after a few seconds, remains a serious research and practical challenge. The latency long tail appears even when system utilization is still very far from saturation (e.g., 40 to 60% average CPU utilization).
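To make the quality of service requirement above concrete, here is a minimal Python sketch of checking "99.9% of requests within 0.5 seconds" against a set of measured response times. The function name `meets_slo` and the sample workload are hypothetical, chosen only for illustration; they are not taken from the talk.

```python
def meets_slo(latencies_s, threshold_s=0.5, target_fraction=0.999):
    """Return True if at least target_fraction of requests finish within threshold_s."""
    if not latencies_s:
        raise ValueError("no latency samples")
    within = sum(1 for t in latencies_s if t <= threshold_s)
    return within / len(latencies_s) >= target_fraction


# Hypothetical workload (an assumption, not measured data):
# 10,000 requests, 50 of which take 3 seconds.
samples = [0.05] * 9950 + [3.0] * 50
mean_latency = sum(samples) / len(samples)

print(f"mean latency: {mean_latency:.4f} s")
print(f"meets 99.9% within 0.5 s: {meets_slo(samples)}")
```

In this made-up workload the mean response time stays far below 0.5 seconds, yet the requirement is violated because 0.5% of requests fall in the long tail. Average-based metrics can therefore look healthy while the tail requirement fails.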

Millibottlenecks (resource saturations that last only tens to hundreds of milliseconds) have been shown to cause the latency long tail for a variety of hardware (e.g., CPU, memory, and disk) and software issues (e.g., garbage collection and scheduling algorithms) in n-tier benchmarks. Recent experimental results show that the latency long tail problem may be amplified in mission-critical microservices-based systems due to composition and dependencies. To meet the challenge of the latency long tail problem, we developed a methodical and automated approach to collect fine-grain performance data on microservices-based benchmarks, filter and process the experimental data, and analyze the resulting large volume of data to find the millibottlenecks causing the latency long tail. With a better understanding of the latency long tail problem, we will be able to improve overall system utilization in mission-critical applications while preserving high quality of service.
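As an illustration of the definition above, the following Python sketch scans a fine-grained utilization trace for short saturated runs. It is not the speaker's tooling: the function name `find_millibottlenecks`, the 10 ms sampling interval, the 95% saturation threshold, and the 10 to 1000 ms duration window are all assumptions made for this example.

```python
def find_millibottlenecks(utilization, interval_ms=10, saturation=0.95,
                          min_ms=10, max_ms=1000):
    """Return (start_ms, duration_ms) for each short saturated run.

    utilization: resource utilization samples in [0, 1], one per interval_ms.
    All thresholds are illustrative assumptions, not values from the talk.
    """
    episodes = []
    run_start = None
    # A sentinel below any threshold closes a run that ends at the last sample.
    for i, u in enumerate(list(utilization) + [float("-inf")]):
        if u >= saturation:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            duration_ms = (i - run_start) * interval_ms
            if min_ms <= duration_ms <= max_ms:
                episodes.append((run_start * interval_ms, duration_ms))
            run_start = None
    return episodes


# Hypothetical trace: 1 second sampled every 10 ms, mostly 45% busy,
# with one 80 ms burst of full saturation.
trace = [0.45] * 100
trace[30:38] = [1.0] * 8

average = sum(trace) / len(trace)
episodes = find_millibottlenecks(trace)

print(f"average utilization: {average:.3f}")
print(f"millibottlenecks (start_ms, duration_ms): {episodes}")
```

The average utilization of this synthetic trace stays near 50%, so coarse monitoring (for example, one sample per second) would not reveal the 80 ms saturation. This is why the sketch assumes sampling at a granularity finer than the episodes it is looking for.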


Calton Pu was born in Taiwan and grew up in Brazil. He received his PhD from the University of Washington and served on the faculty of Columbia University and the Oregon Graduate Institute. He currently holds the position of Professor and John P. Imlay, Jr. Chair in Software in the College of Computing, Georgia Institute of Technology. He has worked on several projects in systems and database research. His contributions to systems research include program specialization and software feedback. His contributions to database research include extended transaction models and their implementation. His recent research has focused on automated system management in clouds (Elba project), information quality (e.g., spam processing), and big data in the Internet of Things (GRAIT-DM project). He has collaborated extensively with scientists and industry researchers. He has published more than 70 journal papers and book chapters and 280 conference and refereed workshop papers. He has served on more than 120 program committees, including about a dozen as PC co-chair and a dozen as co-general chair. He is a Fellow of AAAS and IEEE.