TY - JOUR
T1 - Serving Deep Learning Models with Deduplication from Relational Databases
AU - Zhou, Lixi
AU - Chen, Jiaqing
AU - Das, Amitabh
AU - Min, Hong
AU - Yu, Lei
AU - Zhao, Ming
AU - Zou, Jia
N1 - Funding Information: This work was supported by ASU FSE start-up funding, an IBM Academic Research Award, and an NSF CAREER award (Number 2144923). We also appreciate the constructive feedback from the anonymous reviewers of VLDB 2022. Publisher Copyright: © 2022, VLDB Endowment. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Serving deep learning models from relational databases brings significant benefits. First, features extracted from databases do not need to be transferred to any decoupled deep learning systems for inference, so the system management overhead can be significantly reduced. Second, in a relational database, data management along the storage hierarchy is fully integrated with query processing, so model serving can continue even when the working set size exceeds the available memory. Applying model deduplication can greatly reduce the storage space, memory footprint, cache misses, and inference latency. However, existing data deduplication techniques are not applicable to deep learning model serving in relational databases: they consider neither the impact on model inference accuracy nor the mismatch between tensor blocks and database pages. This work proposes synergistic storage optimization techniques for duplication detection, page packing, and caching to enhance database systems for model serving. Evaluation results show that the proposed techniques significantly improve storage efficiency, reduce model inference latency, and outperform existing deep learning frameworks in the targeted scenarios.
AB - Serving deep learning models from relational databases brings significant benefits. First, features extracted from databases do not need to be transferred to any decoupled deep learning systems for inference, so the system management overhead can be significantly reduced. Second, in a relational database, data management along the storage hierarchy is fully integrated with query processing, so model serving can continue even when the working set size exceeds the available memory. Applying model deduplication can greatly reduce the storage space, memory footprint, cache misses, and inference latency. However, existing data deduplication techniques are not applicable to deep learning model serving in relational databases: they consider neither the impact on model inference accuracy nor the mismatch between tensor blocks and database pages. This work proposes synergistic storage optimization techniques for duplication detection, page packing, and caching to enhance database systems for model serving. Evaluation results show that the proposed techniques significantly improve storage efficiency, reduce model inference latency, and outperform existing deep learning frameworks in the targeted scenarios.
UR - http://www.scopus.com/inward/record.url?scp=85138003831&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138003831&partnerID=8YFLogxK
U2 - 10.14778/3547305.3547325
DO - 10.14778/3547305.3547325
M3 - Conference article
SN - 2150-8097
VL - 15
SP - 2230
EP - 2243
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 10
T2 - 48th International Conference on Very Large Data Bases, VLDB 2022
Y2 - 1 January 2022
ER -