TY - GEN
T1 - GPU-enabled Function-as-a-Service for Machine Learning Inference
AU - Zhao, Ming
AU - Jha, Kritshekhar
AU - Hong, Sungho
N1 - Funding Information: This work is partly supported by National Science Foundation awards CNS-1955593 and OAC-2126291. Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Function-as-a-Service (FaaS) is emerging as an important cloud computing service model because it can improve the scalability and usability of a wide range of applications, especially Machine Learning (ML) inference tasks that require scalable resources and complex software configurations. These inference tasks rely heavily on GPUs to achieve high performance; however, GPU support is currently lacking in existing FaaS solutions. The unique event-triggered and short-lived nature of functions poses new challenges to enabling GPUs on FaaS, which must account for the overhead of transferring data (e.g., ML model parameters and inputs/outputs) between GPU and host memory. This paper proposes a novel GPU-enabled FaaS solution that enables ML inference functions to utilize GPUs efficiently to accelerate their computations. First, it extends existing FaaS frameworks such as OpenFaaS to support the scheduling and execution of functions across GPUs in a FaaS cluster. Second, it provides caching of ML models in GPU memory to improve the performance of model inference functions, and global management of GPU memories to improve cache utilization. Third, it offers co-designed GPU function scheduling and cache management to optimize the performance of ML inference functions. Specifically, the paper proposes locality-aware scheduling, which maximizes the utilization of both GPU memory for cache hits and GPU cores for parallel processing. A thorough evaluation based on real-world traces and ML models shows that the proposed GPU-enabled FaaS works well for ML inference tasks, and that the proposed locality-aware scheduler achieves a speedup of 48x compared to the default load-balancing-only scheduler.
AB - Function-as-a-Service (FaaS) is emerging as an important cloud computing service model because it can improve the scalability and usability of a wide range of applications, especially Machine Learning (ML) inference tasks that require scalable resources and complex software configurations. These inference tasks rely heavily on GPUs to achieve high performance; however, GPU support is currently lacking in existing FaaS solutions. The unique event-triggered and short-lived nature of functions poses new challenges to enabling GPUs on FaaS, which must account for the overhead of transferring data (e.g., ML model parameters and inputs/outputs) between GPU and host memory. This paper proposes a novel GPU-enabled FaaS solution that enables ML inference functions to utilize GPUs efficiently to accelerate their computations. First, it extends existing FaaS frameworks such as OpenFaaS to support the scheduling and execution of functions across GPUs in a FaaS cluster. Second, it provides caching of ML models in GPU memory to improve the performance of model inference functions, and global management of GPU memories to improve cache utilization. Third, it offers co-designed GPU function scheduling and cache management to optimize the performance of ML inference functions. Specifically, the paper proposes locality-aware scheduling, which maximizes the utilization of both GPU memory for cache hits and GPU cores for parallel processing. A thorough evaluation based on real-world traces and ML models shows that the proposed GPU-enabled FaaS works well for ML inference tasks, and that the proposed locality-aware scheduler achieves a speedup of 48x compared to the default load-balancing-only scheduler.
KW - Caching
KW - Function-as-a-Service
KW - GPU scheduling
KW - Machine learning inference
UR - http://www.scopus.com/inward/record.url?scp=85166675105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166675105&partnerID=8YFLogxK
U2 - 10.1109/IPDPS54959.2023.00096
DO - 10.1109/IPDPS54959.2023.00096
M3 - Conference contribution
T3 - Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023
SP - 918
EP - 928
BT - Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 37th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023
Y2 - 15 May 2023 through 19 May 2023
ER -