TY - GEN
T1 - Poster
T2 - 8th Annual IEEE/ACM Symposium on Edge Computing, SEC 2023
AU - Chen, Yitao
AU - Zhao, Ming
AU - Chen, Dawei
AU - Han, Kyungtae
AU - Kenney, John
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023
Y1 - 2023
N2 - Video instance segmentation has emerged as a critical component in enabling connected vehicles to comprehend complex driving scenes, thereby facilitating navigation under various driving conditions. Recent advances focus on video-based solutions, which leverage temporal and spatial information to achieve superior performance compared to traditional image-based approaches. However, these video-based solutions are difficult to deploy efficiently at the edge because of their high computational and memory demands, making them ill-suited for edge devices such as intelligent vehicles. Furthermore, the large size of video data makes it impractical to upload to cloud servers. To address the latency challenge during on-device inference, we propose to incorporate early exits into the model. While the early exit strategy has been successful in image classification and natural language processing tasks, our study is the first to explore its application to video instance segmentation. Specifically, we incorporate early exits into the transformer-based video instance segmentation model, VisTR. Our experimental results on the YouTube-VIS dataset demonstrate that early exit can speed up inference by up to 4.83x with a minimal trade-off of only 3% in the average precision scores. Furthermore, our qualitative analysis confirms the satisfactory quality of the generated segmentation masks.
AB - Video instance segmentation has emerged as a critical component in enabling connected vehicles to comprehend complex driving scenes, thereby facilitating navigation under various driving conditions. Recent advances focus on video-based solutions, which leverage temporal and spatial information to achieve superior performance compared to traditional image-based approaches. However, these video-based solutions are difficult to deploy efficiently at the edge because of their high computational and memory demands, making them ill-suited for edge devices such as intelligent vehicles. Furthermore, the large size of video data makes it impractical to upload to cloud servers. To address the latency challenge during on-device inference, we propose to incorporate early exits into the model. While the early exit strategy has been successful in image classification and natural language processing tasks, our study is the first to explore its application to video instance segmentation. Specifically, we incorporate early exits into the transformer-based video instance segmentation model, VisTR. Our experimental results on the YouTube-VIS dataset demonstrate that early exit can speed up inference by up to 4.83x with a minimal trade-off of only 3% in the average precision scores. Furthermore, our qualitative analysis confirms the satisfactory quality of the generated segmentation masks.
UR - http://www.scopus.com/inward/record.url?scp=85186120127&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85186120127&partnerID=8YFLogxK
U2 - 10.1145/3583740.3626630
DO - 10.1145/3583740.3626630
M3 - Conference contribution
T3 - Proceedings - 2023 IEEE/ACM Symposium on Edge Computing, SEC 2023
SP - 270
EP - 272
BT - Proceedings - 2023 IEEE/ACM Symposium on Edge Computing, SEC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 January 2023
ER -