Data reduction prior to inference: Are there consequences of comparing groups using a t-test based on principal component scores?

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

Researchers often use a two-step process to analyze multivariate data. First, dimensionality is reduced using a technique such as principal component analysis, followed by a group comparison using a t-test or analysis of variance. Although this practice is often discouraged, the statistical properties of this procedure are not well understood, starting with the hypothesis being tested. We suggest that this approach might be considering two distinct hypotheses, one of which is a global test of no differences in the mean vectors, and the other being a focused test of a specific linear combination where the coefficients have been estimated from the data. We study the asymptotic properties of the two-sample t-statistic for these two scenarios, assuming a nonsparse setting. We show that the size of the global test agrees with the presumed level but that the test has poor power. In contrast, the size of the focused test can be arbitrarily distorted with certain mean and covariance structures. A simple method is provided to correct the size of the focused test. Data analyses and simulations are used to illustrate the results. Recommendations on the use of this two-step method and the related use of principal components for prediction are provided.
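
The two-step procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' code or simulation design: it assumes PCA is computed on the combined sample centered at the grand mean, uses only the first principal component, and draws data from independent standard normals. The function names `pc1_t_test` and `rejection_rate` and all parameter values are hypothetical choices made for this sketch.

```python
import numpy as np
from scipy import stats


def pc1_t_test(x, y):
    """Two-step procedure: PCA on the combined sample, then a t-test on PC1 scores.

    Assumption for this sketch: PCA ignores group labels and is centered
    at the grand mean of the stacked data.
    """
    combined = np.vstack([x, y])
    centered = combined - combined.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    # Classical two-sample t-test (equal variances) on the PC1 scores.
    result = stats.ttest_ind(scores[: len(x)], scores[len(x):])
    return float(result.statistic), float(result.pvalue)


def rejection_rate(n=20, p=50, shift=0.0, reps=200, alpha=0.05, seed=0):
    """Monte Carlo estimate of how often the two-step test rejects at level alpha.

    shift=0.0 simulates equal mean vectors; a nonzero shift moves every
    coordinate of the second group's mean by that amount.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal((n, p))
        y = rng.standard_normal((n, p)) + shift
        _, pvalue = pc1_t_test(x, y)
        rejections += int(pvalue < alpha)
    return rejections / reps


if __name__ == "__main__":
    print("rejection rate, equal means:", rejection_rate(shift=0.0))
    print("rejection rate, shifted means:", rejection_rate(shift=0.3))
```

Calling `rejection_rate` with `shift=0.0` estimates the empirical size of the two-step test in this particular setup, and a nonzero `shift` gives a rough look at power. Other mean and covariance structures, such as those the abstract says can distort the size, would need a different data-generating step.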

Original language: English (US)
Pages (from-to): 508-517
Number of pages: 10
Journal: Biometrics
Volume: 76
Issue number: 2
State: Published - Jun 1 2020

Keywords

  • data reduction
  • pooled covariance
  • principal component analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
  • General Agricultural and Biological Sciences
  • Applied Mathematics
