TY - GEN
T1 - MP-Rec
T2 - 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2023
AU - Hsia, Samuel
AU - Gupta, Udit
AU - Acun, Bilge
AU - Ardalani, Newsha
AU - Zhong, Pan
AU - Wei, Gu Yeon
AU - Brooks, David
AU - Wu, Carole Jean
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/3/25
Y1 - 2023/3/25
N2 - Deep learning recommendation systems serve personalized content under diverse tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user preferences over large bodies of contents. The reliance on a fixed embedding representation of embedding tables not only imposes significant memory capacity and bandwidth requirements but also limits the scope of compatible system solutions. This paper challenges the assumption of fixed embedding representations by showing how synergies between embedding representations and hardware platforms can lead to improvements in both algorithmic-and system performance. Based on our characterization of various embedding representations, we propose a hybrid embedding representation that achieves higher quality embeddings at the cost of increased memory and compute requirements. To address the system performance challenges of the hybrid representation, we propose MP-Rec-a co-design technique that exploits heterogeneity and dynamic selection of embedding representations and underlying hardware platforms. On real system hardware, we demonstrate how matching custom accelerators, i.e., GPUs, TPUs, and IPUs, with compatible embedding representations can lead to 16.65× performance speedup. Additionally, in query-serving scenarios, MP-Rec achieves 2.49× and 3.76× higher correct prediction throughput and 0.19% and 0.22% better model quality on a CPU-GPU system for the Kaggle and Terabyte datasets, respectively.
AB - Deep learning recommendation systems serve personalized content under diverse tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user preferences over large bodies of contents. The reliance on a fixed embedding representation of embedding tables not only imposes significant memory capacity and bandwidth requirements but also limits the scope of compatible system solutions. This paper challenges the assumption of fixed embedding representations by showing how synergies between embedding representations and hardware platforms can lead to improvements in both algorithmic-and system performance. Based on our characterization of various embedding representations, we propose a hybrid embedding representation that achieves higher quality embeddings at the cost of increased memory and compute requirements. To address the system performance challenges of the hybrid representation, we propose MP-Rec-a co-design technique that exploits heterogeneity and dynamic selection of embedding representations and underlying hardware platforms. On real system hardware, we demonstrate how matching custom accelerators, i.e., GPUs, TPUs, and IPUs, with compatible embedding representations can lead to 16.65× performance speedup. Additionally, in query-serving scenarios, MP-Rec achieves 2.49× and 3.76× higher correct prediction throughput and 0.19% and 0.22% better model quality on a CPU-GPU system for the Kaggle and Terabyte datasets, respectively.
KW - Deep Learning
KW - Hardware-Software Co-Design
KW - Recommender Systems
UR - https://www.scopus.com/pages/publications/85159309083
UR - https://www.scopus.com/pages/publications/85159309083#tab=citedBy
U2 - 10.1145/3582016.3582068
DO - 10.1145/3582016.3582068
M3 - Conference contribution
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 449
EP - 465
BT - ASPLOS 2023 - Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
A2 - Aamodt, Tor M.
A2 - Jerger, Natalie Enright
A2 - Swift, Michael
PB - Association for Computing Machinery
Y2 - 25 March 2023 through 29 March 2023
ER -