Abstract

Big data presents new challenges for understanding large text corpora. Topic modeling algorithms help uncover the underlying patterns, or "topics", in the data. Researchers often read these topics to gain an understanding of the underlying corpus, so it is important to evaluate the interpretability of these automatically generated topics. Methods have previously been designed to measure interpretability using crowdsourcing platforms. In this paper, we demonstrate that a key concept, coherence, is necessary when assessing topics, and we propose an effective method for measuring it. We show that the proposed coherence measure captures a different aspect of the topics than existing measures. We further study the automation of these topic measures for scalability and reproducibility, showing that they can be automated.
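
The abstract refers to automated coherence measures without defining one. As background only, the following is a minimal sketch of a standard automated coherence score from this literature (average pairwise NPMI over a topic's top words, computed from a document co-occurrence index); the function name, index structure, and toy corpus are illustrative assumptions, not the paper's proposed measure.

    import math
    from itertools import combinations

    def npmi_coherence(top_words, doc_sets, n_docs):
        """Average NPMI over all pairs of a topic's top words.

        top_words : the topic's highest-probability words
        doc_sets  : dict mapping each word to the set of ids of the
                    reference-corpus documents containing it
        n_docs    : total number of documents in the reference corpus
        """
        scores = []
        for w1, w2 in combinations(top_words, 2):
            d1 = doc_sets.get(w1, set())
            d2 = doc_sets.get(w2, set())
            p1, p2 = len(d1) / n_docs, len(d2) / n_docs
            p12 = len(d1 & d2) / n_docs
            if p12 == 0:
                scores.append(-1.0)   # words never co-occur: minimum NPMI
            elif p12 == 1.0:
                scores.append(1.0)    # words co-occur everywhere: maximum NPMI
            else:
                pmi = math.log(p12 / (p1 * p2))
                scores.append(pmi / -math.log(p12))
        return sum(scores) / len(scores)

    # Example: score one topic's top words against a toy reference corpus
    docs = [{"cat", "dog", "pet"}, {"dog", "leash"}, {"cat", "pet"}]
    index = {}
    for i, d in enumerate(docs):
        for w in d:
            index.setdefault(w, set()).add(i)
    print(npmi_coherence(["cat", "dog", "pet"], index, len(docs)))

Scores near 1 indicate top words that tend to appear in the same documents; scores near -1 indicate words that rarely co-occur, which typically reads as an incoherent topic.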

Original language: English (US)
Title of host publication: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 543-548
Number of pages: 6
ISBN (Electronic): 9781510827592
DOIs
State: Published - 2016
Event: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: Aug 7, 2016 - Aug 12, 2016

Publication series

Name: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Short Papers

Conference

Conference: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Country/Territory: Germany
City: Berlin
Period: 8/7/16 - 8/12/16

ASJC Scopus subject areas

  • Language and Linguistics
  • Artificial Intelligence
  • Linguistics and Language
  • Software
