Abstract
Excess of heterozygosity (H) is a widely used measure of the genetic diversity of a population. As high-throughput sequencing and genotyping data have become readily available, it has been applied to investigate associations of genome-wide genetic diversity with human diseases and traits. However, these studies often report contradictory results. In this paper, we present a meta-analysis of five whole-exome studies to examine the association of H scores with Alzheimer's disease. We show that the mean H score of a group is not associated with disease status, but it is associated with sample size. Across all five studies, the group with more samples has a significantly lower H score than the group with fewer samples. To remove potential confounders in empirical data sets, we perform computer simulations to create artificial genomes controlled for the number of polymorphic loci, the sample size, and the allele frequency. Analyses of these simulated data confirm the negative correlation between sample size and H score. Furthermore, we find that genomes with a large number of rare variants also have inflated H scores. Together, these biases can lead to spurious associations between genetic diversity and the phenotype of interest. Based on these findings, we advocate that studies should balance sample sizes when using genome-wide H scores to assess the genetic diversity of different populations, which would help improve the reproducibility of future research.
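The sample-size effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation pipeline. It assumes a per-locus score of the form (Ho - He) / He, where Ho is the observed heterozygote fraction and He = 2p(1 - p) is computed from the sample allele frequency. The paper's exact definition of H may differ. The function name `mean_excess_het`, the allele frequency of 0.3, and the group sizes are hypothetical choices made only for illustration.

```python
import random


def mean_excess_het(n_samples, n_loci, allele_freq, rng):
    """Mean per-locus (Ho - He) / He over loci polymorphic in the sample.

    Assumed definition (not taken from the paper): He = 2 * p_hat * (1 - p_hat),
    with p_hat estimated from the same sample.
    """
    scores = []
    for _ in range(n_loci):
        # Diploid genotypes under Hardy-Weinberg: two independent allele draws,
        # coded as the number of alternate alleles (0, 1, or 2).
        genotypes = [
            (rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n_samples)
        ]
        p_hat = sum(genotypes) / (2 * n_samples)
        if p_hat == 0.0 or p_hat == 1.0:
            continue  # locus is monomorphic in this sample; skip it
        h_obs = sum(1 for g in genotypes if g == 1) / n_samples
        h_exp = 2 * p_hat * (1 - p_hat)
        scores.append((h_obs - h_exp) / h_exp)
    return sum(scores) / len(scores)


rng = random.Random(0)  # fixed seed so the run is repeatable
small = mean_excess_het(n_samples=10, n_loci=2000, allele_freq=0.3, rng=rng)
large = mean_excess_het(n_samples=200, n_loci=2000, allele_freq=0.3, rng=rng)
print(f"mean H, 10 samples:  {small:.4f}")
print(f"mean H, 200 samples: {large:.4f}")
```

Both groups are drawn from the same population in Hardy-Weinberg equilibrium, so any difference between the two means reflects the estimator rather than biology. Under this assumed definition, the smaller group should tend to show the higher mean score, in line with the direction of the bias reported above.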
Original language | English (US) |
---|---|
Pages (from-to) | 197-202 |
Number of pages | 6 |
Journal | Human Heredity |
Volume | 84 |
Issue number | 4-5 |
State | Published - Jul 1 2020 |
Keywords
- Alzheimer's disease
- Excess of heterozygosity
- Genetic diversity
- Genome analysis
- Sample size bias
ASJC Scopus subject areas
- Genetics
- Genetics (clinical)