TY - GEN
T1 - Learning to Selectively Learn for Weakly-supervised Paraphrase Generation
AU - Ding, Kaize
AU - Li, Dingcheng
AU - Li, Alexander Hanbo
AU - Fan, Xing
AU - Guo, Chenlei
AU - Liu, Yang
AU - Liu, Huan
N1 - Publisher Copyright: © 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
AB - Paraphrase generation is a longstanding NLP task with diverse downstream applications. However, the effectiveness of existing efforts predominantly relies on large amounts of gold-labeled data. Although unsupervised approaches have been proposed to address this issue, they may fail to generate meaningful paraphrases due to the lack of supervision signals. In this work, we go beyond the existing paradigms and propose a novel approach to generating high-quality paraphrases with only weak supervision. Specifically, we tackle the weakly-supervised paraphrase generation problem by (1) obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo-paraphrase expansion, and (2) developing a meta-learning framework to progressively select valuable samples for fine-tuning a pre-trained language model, i.e., BART, on the sentential paraphrasing task. We demonstrate that our approach achieves significant improvements over existing unsupervised approaches and is even comparable in performance with supervised state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=85127410310&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127410310&partnerID=8YFLogxK
M3 - Conference contribution
T3 - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 5930
EP - 5940
BT - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Y2 - 7 November 2021 through 11 November 2021
ER -
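
As a concrete starting point for the two-step recipe the abstract describes, the Python sketch below retrieves nearest-neighbour sentences as weakly-labeled paraphrase pairs and then fine-tunes BART on them. The embedding model, the similarity cutoff, and the plain per-pair gradient step standing in for the paper's meta-learned progressive sample selection are all illustrative assumptions, not the authors' released code.

# A minimal sketch of the abstract's two-step recipe. Model names, the
# similarity threshold, and the plain fine-tuning loop are assumptions;
# the paper's meta-learned sample selection is not reproduced here.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import BartForConditionalGeneration, BartTokenizer

corpus = [
    "How can I improve my English speaking skills?",
    "What should I do to speak English more fluently?",
    "Why is the sky blue?",
    "What makes the sky appear blue?",
]

# Step 1: retrieval-based pseudo-paraphrase expansion. Pair each sentence
# with its nearest neighbour in sentence-embedding space.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
embeddings = encoder.encode(corpus, convert_to_tensor=True)
weak_pairs = []
for i in range(len(corpus)):
    hits = util.semantic_search(embeddings[i : i + 1], embeddings, top_k=2)[0]
    for hit in hits:
        # Skip self-matches; 0.5 is an arbitrary cosine-similarity cutoff.
        if hit["corpus_id"] != i and hit["score"] > 0.5:
            weak_pairs.append((corpus[i], corpus[hit["corpus_id"]]))

# Step 2: fine-tune BART on the weak pairs. One AdamW step per pair here;
# the paper instead selects valuable samples via a meta-learning framework.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for src, tgt in weak_pairs:
    batch = tokenizer(src, return_tensors="pt")
    labels = tokenizer(tgt, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()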