TY - GEN
T1 - “Let’s Eat Grandma”
T2 - 22nd Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022
AU - Karami, Mansooreh
AU - Mosallanezhad, Ahmadreza
AU - Mancenido, Michelle V.
AU - Liu, Huan
N1 - Funding Information: The authors would like to thank Sarath Sreedharan (ASU) and Sachin Grover (ASU) for their comments on the manuscript. This material is, in part, based upon works supported by ONR (N00014-21-1-4002) and the U.S. Department of Homeland Security (17STQAC00001-05-00) (Disclaimer: “The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.”). Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Neural network-based embeddings have been the mainstream approach for creating vector representations of text that capture lexical and semantic similarities and dissimilarities. In general, existing encoding methods dismiss punctuation as insignificant information; consequently, punctuation marks are routinely treated as predefined tokens/words or eliminated in the pre-processing phase. However, punctuation can play a significant role in the semantics of a sentence, as in “Let’s eat, grandma” versus “Let’s eat grandma”. We hypothesize that a punctuation-aware representation model would affect the performance of downstream tasks. Accordingly, we propose a model-agnostic method that incorporates both syntactic and contextual information to improve performance on the sentiment classification task. We corroborate our findings through experiments on publicly available datasets and provide case studies showing that our model generates representations that account for the punctuation in a sentence.
AB - Neural network-based embeddings have been the mainstream approach for creating vector representations of text that capture lexical and semantic similarities and dissimilarities. In general, existing encoding methods dismiss punctuation as insignificant information; consequently, punctuation marks are routinely treated as predefined tokens/words or eliminated in the pre-processing phase. However, punctuation can play a significant role in the semantics of a sentence, as in “Let’s eat, grandma” versus “Let’s eat grandma”. We hypothesize that a punctuation-aware representation model would affect the performance of downstream tasks. Accordingly, we propose a model-agnostic method that incorporates both syntactic and contextual information to improve performance on the sentiment classification task. We corroborate our findings through experiments on publicly available datasets and provide case studies showing that our model generates representations that account for the punctuation in a sentence.
KW - Punctuation
KW - Representation learning
KW - Sentiment analysis
KW - Structural embedding
UR - http://www.scopus.com/inward/record.url?scp=85151066155&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85151066155&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-26390-3_34
DO - 10.1007/978-3-031-26390-3_34
M3 - Conference contribution
SN - 9783031263897
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 588
EP - 604
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2022, Proceedings
A2 - Amini, Massih-Reza
A2 - Canu, Stéphane
A2 - Fischer, Asja
A2 - Guns, Tias
A2 - Kralj Novak, Petra
A2 - Tsoumakas, Grigorios
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 19 September 2022 through 23 September 2022
ER -