Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

Liu Ke, Udit Gupta, Mark Hempstead, Carole-Jean Wu, Hsien-Hsin S. Lee, Xuan Zhang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

13 Scopus citations

Abstract

Personalized recommendation is an important class of deep-learning applications that powers a large collection of internet services and consumes a considerable amount of datacenter resources. As the scale of production-grade recommendation systems continues to grow, optimizing their serving performance and efficiency in a heterogeneous datacenter is important and can translate into infrastructure capacity saving. In this paper, we propose Hercules, an optimized framework for personalized recommendation inference serving that targets diverse industry-representative models and cloud-scale heterogeneous systems. Hercules performs a two-stage optimization procedure: offline profiling and online serving. The first stage searches the large under-explored task scheduling space with a gradient-based search algorithm, achieving up to 9.0× latency-bounded throughput improvement on individual servers; it also identifies the optimal heterogeneous server architecture for each recommendation workload. The second stage performs heterogeneity-aware cluster provisioning to optimize resource mapping and allocation in response to fluctuating diurnal loads. The proposed cluster scheduler in Hercules achieves 47.7% cluster capacity saving and reduces the provisioned power by 23.7% over a state-of-the-art greedy scheduler.

Original language: English (US)
Title of host publication: Proceedings - 2022 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022
Publisher: IEEE Computer Society
Pages: 141-154
Number of pages: 14
ISBN (Electronic): 9781665420273
DOIs
State: Published - 2022
Externally published: Yes
Event: 28th Annual IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022 - Virtual, Online, Korea, Republic of
Duration: Apr 2 2022 to Apr 6 2022

Publication series

Name: Proceedings - International Symposium on High-Performance Computer Architecture
Volume: 2022-April

Conference

Conference: 28th Annual IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022
Country/Territory: Korea, Republic of
City: Virtual, Online
Period: 4/2/22 to 4/6/22

ASJC Scopus subject areas

  • Hardware and Architecture
