Question: PLINK logistic regression controlling for population structure in a tri-hybrid admixed population
2.3 years ago, Guilherme wrote:


I have run STRUCTURE to estimate individual ancestry in my population sample. My population is tri-hybrid, with European, African and Amerindian ancestry. My question is:

Since it's a tri-hybrid admixed population, do I have to include all three columns (EUR, AFR, AMR) as covariates in the logistic regression model in order to control for possible confounding from population stratification?

I'm asking this because I have seen people using only two (for example, EUR and AMR, or AFR and AMR)...



Are you interested in doing single-marker analyses (GWAS)? Feed the data to GEMMA, which will create a kinship matrix. Then proceed with the kinship matrix, your phenotype and your covariates. The kinship matrix takes care of population stratification. GEMMA works well for admixed populations.

modified 2.3 years ago • written 2.3 years ago by Bioinformatics_NewComer
2.3 years ago, Kevin Blighe wrote:

Ciao Guilherme,

I am not so sure about the choice of STRUCTURE. Its reliability has been disputed; see: The computer program STRUCTURE does not reliably identify the main genetic clusters within species: simulations and implications for human population structure.

If your population has mixed ethnicity, then population structure, i.e., ethnicity, will most likely be a confounder that you should adjust for in your logistic regression model. You can check the extent of the confounding visually (and statistically, via various metrics) through PCA; see: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)

The standard way to adjust for population structure is to include each sample's values on PC1 and PC2 (the per-sample eigenvector values) as covariates, or more PCs if you feel it necessary. PCA can easily be performed within PLINK or with other programs, such as EIGENSOFT or GCTA (the actual PCA implementation in PLINK is the same as GCTA's).
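As a rough sketch of that workflow, the Python snippet below only builds the two PLINK 1.9 command lines (PCA first, then logistic regression with the top PCs as covariates) and prints them; it does not run PLINK. The fileset prefix `mydata` and the output prefixes are hypothetical placeholders, and the choice of 10 PCs computed and 2 used as covariates is an assumption taken from the "PC1 and PC2" suggestion above, not a recommendation for every dataset.

```python
import shlex


def pca_command(bfile, out_prefix, n_pcs=10):
    """Build a PLINK 1.9 command that writes <out_prefix>.eigenvec/.eigenval."""
    return [
        "plink",
        "--bfile", bfile,                # prefix of the .bed/.bim/.fam fileset
        "--pca", str(n_pcs), "header",   # 'header' adds FID IID PC1 ... column names
        "--out", out_prefix,
    ]


def logistic_command(bfile, eigenvec, out_prefix, n_covar_pcs=2):
    """Build a PLINK 1.9 logistic-regression command adjusted for the top PCs."""
    pc_names = ",".join(f"PC{i}" for i in range(1, n_covar_pcs + 1))
    return [
        "plink",
        "--bfile", bfile,
        "--logistic", "hide-covar",      # report the SNP term only, not each covariate
        "--covar", eigenvec,             # the .eigenvec file doubles as the covariate file
        "--covar-name", pc_names,        # select covariates by header name
        "--out", out_prefix,
    ]


if __name__ == "__main__":
    # 'mydata' and the output prefixes are placeholder names for illustration.
    pca = pca_command("mydata", "mydata_pca")
    assoc = logistic_command("mydata", "mydata_pca.eigenvec", "mydata_assoc")
    print(shlex.join(pca))
    print(shlex.join(assoc))
```

The printed lines can be pasted into a shell once the placeholder names are replaced. Using more PCs only means raising `n_covar_pcs`, as long as it does not exceed the number requested from `--pca`.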

There is a previous Biostars thread relating to a similar topic: GWAS: when is it appropriate to add covariates?
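On the original question of two versus three ancestry columns: STRUCTURE ancestry proportions sum to 1 for each individual, so with an intercept in the model the third column is a linear combination of the other terms and adds no information. The small sketch below illustrates this with made-up ancestry proportions (the values and the helper `matrix_rank` are hypothetical, written only for this illustration): the design matrix with all three columns has the same rank as the one with two.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Made-up (EUR, AFR, AMR) proportions; each row sums to exactly 1.
ancestry = [
    ("0.60", "0.30", "0.10"),
    ("0.20", "0.50", "0.30"),
    ("0.45", "0.15", "0.40"),
    ("0.10", "0.10", "0.80"),
    ("0.33", "0.33", "0.34"),
]

# Intercept plus all three ancestry columns (4 columns).
full_design = [[1, eur, afr, amr] for eur, afr, amr in ancestry]
# Intercept plus two ancestry columns (3 columns).
reduced_design = [[1, eur, afr] for eur, afr, _ in ancestry]

print("columns:", len(full_design[0]), "rank:", matrix_rank(full_design))
print("columns:", len(reduced_design[0]), "rank:", matrix_rank(reduced_design))
```

Because the four-column matrix is rank deficient, one ancestry column has to be dropped for the coefficients to be estimable, which is why analyses typically include only two of the three. Which column is omitted changes the interpretation of the ancestry coefficients but not the adjustment itself.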



modified 18 months ago • written 2.3 years ago by Kevin Blighe

Thank you so much for the explanation, Kevin! I'm going to take a look at PCA.

written 2.3 years ago by Guilherme

You're welcome, man

written 2.3 years ago by Kevin Blighe