Abstract
Bootstrapping is a statistical technique that relies on randomly sampling with replacement from a set of observed values. Bootstrapping makes it possible to measure the accuracy and reliability of sample estimates and is often recommended for small samples and samples with unknown or non-normal distributions. In corpus linguistics, bootstrapping has also been proposed as a method for quantifying the degree of homogeneity in a corpus sample, for validation of statistical results, and as a methodological step in random decision forests, an advanced classification method. However, to date bootstrapping techniques have seldom been used with corpus data. We argue in this chapter that bootstrapping is underused in corpus linguistics, and that quantitative corpus linguists would do well to add this tool to their repertoire. This chapter includes an introduction to the fundamentals, both conceptual and practical, of bootstrapping methods. We address several applications of bootstrapping, including the measurement of sample estimate accuracy, the validation of statistical models, the estimation of corpus homogeneity, and random forests. We include an overview of two representative studies that have successfully used bootstrapping techniques with corpus data. Finally, we demonstrate how to perform bootstrapping on corpus data using R, and how to visualize and interpret the results.
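The core idea in the abstract, resampling with replacement to gauge the accuracy of a sample estimate, can be illustrated with a minimal sketch. This is not the chapter's code: the chapter works in R, whereas this sketch uses Python, and the function name `bootstrap_ci`, its parameters, and the per-text frequencies are all hypothetical and invented for illustration. It computes a simple percentile bootstrap confidence interval for a mean.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values).

    Hypothetical helper for illustration; not taken from the chapter.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Each resample draws n observations with replacement from the observed
    # values; the statistic is recomputed on every resample.
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented per-text frequencies (per 1,000 words) of some linguistic feature.
rates = [2.1, 3.4, 0.8, 5.6, 2.9, 4.2, 1.7, 3.8, 2.5, 6.1]
low, high = bootstrap_ci(rates)
print(f"mean = {statistics.mean(rates):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A wide interval relative to the mean would suggest that the estimate is unstable for a sample of this size, which is the kind of reliability check the abstract describes.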
Original language | English (US) |
---|---|
Title of host publication | A Practical Handbook of Corpus Linguistics |
Publisher | Springer International Publishing |
Pages | 593-610 |
Number of pages | 18 |
ISBN (Electronic) | 9783030462161 |
ISBN (Print) | 9783030462154 |
State | Published - Jan 1 2021 |
ASJC Scopus subject areas
- General Arts and Humanities
- General Social Sciences