TY - GEN
T1 - Poster
T2 - 8th Annual IEEE/ACM Symposium on Edge Computing, SEC 2023
AU - Zhao, Kaiqi
AU - Zhao, Ming
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023
Y1 - 2023
AB - Quantization-aware training (QAT) achieves competitive performance and is widely used for image classification tasks in model compression. Existing QAT works start with a pre-trained full-precision model and perform quantization during retraining. However, these works require supervision from ground-truth labels, whereas obtaining sufficient labeled data is often infeasible in real-world environments. Also, they suffer from accuracy loss due to reduced precision, and no algorithm consistently achieves the best or the worst performance on every model architecture. To address the aforementioned limitations, this paper proposes a novel Self-Supervised Quantization-Aware Knowledge Distillation framework (SQAKD). SQAKD unifies the forward and backward dynamics of various quantization functions, making it flexible enough to incorporate various QAT works. With the full-precision model as the teacher and the low-bit model as the student, SQAKD reframes QAT as a co-optimization problem that simultaneously minimizes the KL-Loss (i.e., the Kullback-Leibler divergence loss between the teacher's and student's penultimate outputs) and the discretization error (i.e., the difference between the full-precision weights/activations and their quantized counterparts). This optimization is achieved in a self-supervised manner without labeled data. The evaluation shows that SQAKD significantly improves the performance of various state-of-the-art QAT works (e.g., PACT, LSQ, DoReFa, and EWGS). SQAKD establishes stronger baselines and does not require extensive labeled training data, potentially making state-of-the-art QAT research more accessible.
UR - http://www.scopus.com/inward/record.url?scp=85186121926&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85186121926&partnerID=8YFLogxK
U2 - 10.1145/3583740.3626620
DO - 10.1145/3583740.3626620
M3 - Conference contribution
T3 - Proceedings - 2023 IEEE/ACM Symposium on Edge Computing, SEC 2023
SP - 250
EP - 252
BT - Proceedings - 2023 IEEE/ACM Symposium on Edge Computing, SEC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 January 2023
ER -
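
The abstract above reframes QAT as a co-optimization of a KL distillation loss and a discretization error, trained without labels. The following is a minimal PyTorch-style sketch of that general idea, added for illustration only; the uniform quantizer, straight-through estimator (STE), temperature T, and weighting factor lam are assumptions of this sketch, not the authors' exact formulation.

# Illustrative sketch (not the authors' code): co-optimize a KL distillation
# loss between a full-precision teacher and a low-bit student with a
# discretization-error penalty, using no ground-truth labels.
import torch
import torch.nn.functional as F


def uniform_quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniformly quantize x to `bits` bits, with an STE for the backward pass.

    Assumes inputs are already scaled to [0, 1]; both the quantizer and the
    STE are simplifying assumptions of this sketch.
    """
    levels = 2 ** bits - 1
    x_clipped = torch.clamp(x, 0.0, 1.0)
    x_q = torch.round(x_clipped * levels) / levels   # discretize
    return x_clipped + (x_q - x_clipped).detach()    # STE: identity gradient


def sqakd_style_loss(student_logits, teacher_logits, fp_weights, q_weights,
                     T: float = 4.0, lam: float = 1.0):
    """KL loss (teacher vs. student penultimate outputs) + discretization error."""
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Discretization error: gap between full-precision weights and their
    # quantized counterparts; no labels are used anywhere in this loss.
    disc = sum(F.mse_loss(q, w) for q, w in zip(q_weights, fp_weights))
    return kl + lam * disc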