The American National Corpus: More than the web can provide

Nancy Ide, Randi Reppen, Keith Suderman

Research output: Contribution to conference, Paper, peer-reviewed

23 Scopus citations

Abstract

The American National Corpus (ANC) project is developing a corpus of American English comparable to the British National Corpus (BNC). Recent interest in the web as a source of corpus materials has led some in the language processing community to suggest that developing a corpus of American English is unnecessary. We argue, however, that far from being rendered superfluous by the availability of web materials, the ANC is likely to serve as a resource for developing web acquisition techniques that support tasks such as genre and language detection and automatic annotation. This paper compares the ANC, in terms of both content and format, with a test corpus compiled from web data, and discusses their points of intersection and divergence.

Original language: English (US)
Pages: 839-844
Number of pages: 6
State: Published, 2002
Event: 3rd International Conference on Language Resources and Evaluation, LREC 2002, Las Palmas, Canary Islands, Spain
Duration: May 29, 2002 to May 31, 2002

Country/Territory: Spain
City: Las Palmas, Canary Islands

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Library and Information Sciences
