Plink MDS-plot: remove population outliers twice?


7.6 years ago

aldoc ▴ 10

Hi,

[The two MDS-based steps described below were aimed at (1) removing population outliers, to keep only European-ancestry subjects, and (2) obtaining population covariates for only those European-ancestry subjects, for later use. In short, the question is: should population outliers also be removed based on the results of step 2? More details below.]

I merged several PLINK datasets with the 1000 Genomes reference in order to detect population stratification. After removing non-Europeans, the MDS plot looked like this:

MDS-plot_1Kgenomes

Then, using only the individuals from my own sample (labeled "HNPs" in the plot), I took a relevant subset of the whole SNP dataset (~300K SNPs) and ran MDS again. The resulting plot looked like this:

Plot_of_HNPs_subsetSNPs

The plan is to use the MDS components as covariates at a later stage. However, in the second plot there are many outliers within the set of European-ancestry subjects. Should those outlier observations be excluded as well?
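To make "outlier" concrete, here is a minimal Python sketch of one possible way to flag samples from the MDS output. It assumes the coordinates are in PLINK's whitespace-delimited `.mds` format (header `FID IID SOL C1 C2 ...`). The function name `flag_mds_outliers`, the choice of components `C1` and `C2`, and the 6 SD cutoff are illustrative assumptions, not a recommended rule, and the sketch does not settle whether such samples should be excluded.

```python
import statistics


def flag_mds_outliers(mds_lines, dims=("C1", "C2"), n_sd=6.0):
    """Return sorted (FID, IID) pairs lying more than n_sd sample standard
    deviations from the mean on any of the requested MDS components.

    mds_lines: iterable of text lines in PLINK .mds layout (header first).
    dims and n_sd are illustrative defaults, not values from the post.
    """
    rows = [line.split() for line in mds_lines if line.strip()]
    header, records = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}

    flagged = set()
    for dim in dims:
        values = [float(r[col[dim]]) for r in records]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            # No spread on this component, so nothing can be an outlier.
            continue
        for rec, value in zip(records, values):
            if abs(value - mean) > n_sd * sd:
                flagged.add((rec[col["FID"]], rec[col["IID"]]))
    return sorted(flagged)
```

If exclusion turned out to be appropriate, the returned pairs could be written one per line as `FID IID` and passed to PLINK's `--remove`. The MDS would then need to be recomputed on the remaining samples before the components are used as covariates.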

Best,

PLINK MDS GWAS SNP • 2.8k views
