I am using PLINK to generate PCA/MDS components to adjust for population stratification during association analysis.
I have found a reference that computes pairwise IBS on pruned SNPs, then passes the resulting .genome file, together with all (unpruned) SNPs, to PLINK for MDS (approach 1, commands below):
$ plink --bfile xxx --extract xxx.prune.in --genome --out xxx
$ plink --bfile xxx --read-genome xxx.genome --cluster --mds-plot 10 --out xxx_mds
It is my understanding that the more conventional way to go about it would be to:
(approach 2) A. Remove high-LD regions; B. Prune SNPs; C. Run PCA/MDS on the pruned SNPs only.
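For concreteness, approach 2 as I understand it might look roughly like the sketch below. It assumes PLINK 1.9 syntax; the range file high_ld_regions.txt, the pruning thresholds (50 5 0.2), and the output prefixes are placeholders of mine, not values taken from the reference. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of approach 2 (assumed PLINK 1.9 flags; file names are hypothetical).
# By default the commands are only printed; set DRY_RUN=0 to execute them.
BFILE="${BFILE:-xxx}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

approach2() {
    # A. Drop SNPs inside the high-LD regions listed in a range file
    #    (high_ld_regions.txt is a placeholder: chr, start, end, label per line).
    run plink --bfile "$BFILE" --exclude range high_ld_regions.txt \
        --make-bed --out "${BFILE}_noLD"
    # B. LD-prune what remains (window 50 SNPs, step 5, r^2 threshold 0.2;
    #    illustrative thresholds only).
    run plink --bfile "${BFILE}_noLD" --indep-pairwise 50 5 0.2 \
        --out "${BFILE}_noLD"
    # C. Pairwise IBS and MDS, both restricted to the pruned SNPs.
    run plink --bfile "${BFILE}_noLD" --extract "${BFILE}_noLD.prune.in" \
        --genome --out "${BFILE}_pruned"
    run plink --bfile "${BFILE}_noLD" --extract "${BFILE}_noLD.prune.in" \
        --read-genome "${BFILE}_pruned.genome" \
        --cluster --mds-plot 10 --out "${BFILE}_pruned_mds"
}

approach2
```

The difference from approach 1 is that here the final MDS command is also restricted to the pruned SNP set via --extract.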
I have tried both, and approach 1 does seem to improve the signal in my data. However, I am agnostic about which approach to use, as long as the choice is justified.
I am wondering when approach 1 is applicable (my reference does not discuss this), or whether there is a reason it would be advantageous over the "conventional" approach 2.
Many thanks in advance for your reply.