TY - JOUR
T1 - Best practices for genotype imputation from low-coverage sequencing data in natural populations
AU - Watowich, Marina M.
AU - Chiou, Kenneth L.
AU - Graves, Brian
AU - Montague, Michael J.
AU - Brent, Lauren J.N.
AU - Higham, James P.
AU - Horvath, Julie E.
AU - Lu, Amy
AU - Martinez, Melween I.
AU - Platt, Michael L.
AU - Schneider-Crease, India A.
AU - Lea, Amanda J.
AU - Snyder-Mackler, Noah
N1 - Publisher Copyright: © 2023 John Wiley & Sons Ltd.
PY - 2023
Y1 - 2023
N2 - Monitoring genetic diversity in wild populations is a central goal of ecological and evolutionary genetics and is critical for conservation biology. However, genetic studies of nonmodel organisms generally lack access to species-specific genotyping methods (e.g. array-based genotyping) and must instead use sequencing-based approaches. Although costs are decreasing, high-coverage whole-genome sequencing (WGS), which produces the highest confidence genotypes, remains expensive. More economical reduced representation sequencing approaches fail to capture much of the genome, which can hinder downstream inference. Low-coverage WGS combined with imputation using a high-confidence reference panel is a cost-effective alternative, but the accuracy of genotyping using low-coverage WGS and imputation in nonmodel populations is still largely uncharacterized. Here, we empirically tested the accuracy of low-coverage sequencing (0.1–10×) and imputation in two natural populations, one with a large (n = 741) reference panel, rhesus macaques (Macaca mulatta), and one with a smaller (n = 68) reference panel, gelada monkeys (Theropithecus gelada). Using samples sequenced to coverage as low as 0.5×, we could impute genotypes at >95% of the sites in the reference panel with high accuracy (median r² ≥ 0.92). We show that low-coverage imputed genotypes can reliably calculate genetic relatedness and population structure. Based on these data, we also provide best practices and recommendations for researchers who wish to deploy this approach in other populations, with all code available on GitHub (https://github.com/mwatowich/LoCSI-for-non-model-species). Our results endorse accurate and effective genotype imputation from low-coverage sequencing, enabling the cost-effective generation of population-scale genetic datasets necessary for tackling many pressing challenges of wildlife conservation.
KW - conservation
KW - genotyping
KW - imputation
KW - next-generation sequencing
KW - population genetics
UR - http://www.scopus.com/inward/record.url?scp=85169100847&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85169100847&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.13854
DO - 10.1111/1755-0998.13854
M3 - Article
SN - 1755-098X
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
ER -