I have HapMap data plus another data set (nine populations in total), and I will apply PCA to this data set. I merged the data sets with PLINK's --merge-list, so I now have mergeddata.bim, mergeddata.bed and mergeddata.fam files.
How can I list the SNPs that overlap across the nine files in R?
And what should I do after I identify the overlapping SNPs, and how?
Note: I am really new to this area, and I am using Linux.
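Here is a minimal sketch of the overlap step, written in Python rather than R. It assumes that each of the nine data sets still has its own .bim file and that a SNP carries the same ID (for example its rs number) in every file. The script name overlap.py and the pop*.bim file names are hypothetical. The script reads the variant ID from the second .bim column, keeps only the IDs found in every file, and writes them one per line to snps.txt.

```python
import sys
from pathlib import Path


def bim_ids(lines):
    """Collect variant IDs (2nd column) from the lines of a PLINK .bim file."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank lines
            ids.add(fields[1])
    return ids


def shared_ids(id_sets):
    """Return the IDs present in every set, sorted for reproducible output."""
    id_sets = list(id_sets)
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


def main(bim_paths, out_path="snps.txt"):
    id_sets = []
    for path in bim_paths:
        with open(path) as handle:
            id_sets.append(bim_ids(handle))
    shared = shared_ids(id_sets)
    # One ID per line: a plain list that PLINK's --extract can read.
    Path(out_path).write_text("".join(f"{snp}\n" for snp in shared))
    print(f"{len(shared)} SNPs are shared by all {len(bim_paths)} files")


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python overlap.py pop1.bim pop2.bim ... pop9.bim
    main(sys.argv[1:])
```

You can then pass the list to PLINK 1.9 to keep only the shared SNPs before LD pruning and PCA, for example with plink --bfile mergeddata --extract snps.txt --make-bed --out mergeddata_shared. The output prefix mergeddata_shared is hypothetical.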
You should add this as a comment, not as a separate answer to your question. As I said in my answer, your merged file mergeddata.bim contains the intersection of all the SNPs, so there shouldn't be any duplicated SNPs in that merged data set. Check the PLINK website and read up on the merge command you used to see exactly what it does.
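To check the no-duplicates point directly, a small Python sketch can count repeated entries in mergeddata.bim. It reports IDs that appear more than once. It also reports (chromosome, position) pairs listed under more than one ID, which can indicate that the same SNP was named differently in two of the source data sets. The script name check_bim.py is hypothetical.

```python
import sys
from collections import Counter


def duplicated_ids(lines):
    """Variant IDs (2nd .bim column) that occur more than once, with their counts."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank lines
            counts[fields[1]] += 1
    return {snp: n for snp, n in counts.items() if n > 1}


def duplicated_positions(lines):
    """(chromosome, bp position) pairs (1st and 4th .bim columns) seen more than once."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            counts[(fields[0], fields[3])] += 1
    return {pos: n for pos, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_bim.py mergeddata.bim
    with open(sys.argv[1]) as handle:
        lines = handle.readlines()
    print("duplicated IDs:", duplicated_ids(lines) or "none")
    print("duplicated positions:", duplicated_positions(lines) or "none")
```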
Thank you very much. I did it as you said, and I now have a snps.txt file. Next, I should do LD pruning: I will try to prune SNPs so that the remaining ones have low pairwise r2. I guess I need some parameters for that. How should I choose them?
I'm glad it worked. How to prune SNPs depends on what you want to do with the pruned data. Is it for a PCA, for instance? In that case, this parameter setting is common:
--indep-pairwise 50 10 0.2
However, this question is not related to your original post, so you should either post a new question or try to find the answer by googling it :)
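In --indep-pairwise 50 10 0.2, the three numbers are the window size in SNPs, the number of SNPs the window shifts at each step, and the r² threshold. The Python sketch below illustrates that idea under simplifying assumptions. Genotypes are complete 0/1/2 dosages with no missing values, r² is the squared Pearson correlation of the dosages, and when a pair exceeds the threshold the later SNP is dropped. PLINK's actual rule for choosing which SNP of a correlated pair to remove is more involved. This sketch is for understanding the parameters, not a replacement for running PLINK.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    return sxy * sxy / (sxx * syy)


def prune_pairwise(genotypes, window=50, step=10, threshold=0.2):
    """Simplified sliding-window pairwise LD pruning.

    genotypes: one list of 0/1/2 dosages per SNP, in chromosome order.
    Returns the indices of the kept SNPs (analogous to a .prune.in list).
    """
    if window < 2 or step < 1:
        raise ValueError("window must be >= 2 and step >= 1")
    n_snps = len(genotypes)
    removed = [False] * n_snps
    start = 0
    while start < n_snps:
        end = min(start + window, n_snps)
        # Compare every remaining pair inside the current window.
        for i in range(start, end):
            if removed[i]:
                continue
            for j in range(i + 1, end):
                if not removed[j] and r_squared(genotypes[i], genotypes[j]) > threshold:
                    removed[j] = True  # simplification: drop the later SNP
        if end == n_snps:
            break
        start += step  # shift the window by `step` SNPs
    return [k for k in range(n_snps) if not removed[k]]


if __name__ == "__main__":
    toy = [
        [0, 1, 2, 0, 1, 2],  # SNP 0
        [0, 1, 2, 0, 1, 2],  # SNP 1: identical to SNP 0 (r² = 1)
        [2, 1, 0, 0, 1, 2],  # SNP 2: uncorrelated with SNP 0
        [0, 0, 0, 1, 1, 1],  # SNP 3: uncorrelated with the others
    ]
    print(prune_pairwise(toy))  # → [0, 2, 3]
```

The window size matters because SNPs are only compared when they fall in the same window. With window=2 and step=1, three identical SNPs give [0, 2]: SNPs 0 and 2 never share a window, so both are kept.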
Actually, LD pruning is complete; I used --indep-pairwise 50 5 0.2. Thank you very much for your suggestions.
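Once pruning is done, the usual next step toward the PCA is to keep only the pruned-in SNPs and run PCA on them. Below is a minimal Python sketch that assembles the PLINK 1.9 commands for these steps. It assumes the input fileset prefix is mergeddata (already restricted to the overlapping SNPs) and that the binary is called plink and is on PATH. The output prefixes and the choice of 10 principal components are hypothetical. By default the script only prints the commands; passing --run executes them.

```python
import shutil
import subprocess
import sys


def plink_commands(bfile="mergeddata", window=50, step=5, r2=0.2, n_pcs=10):
    """Return PLINK 1.9 argument lists for LD pruning -> extraction -> PCA."""
    ld_prefix = f"{bfile}_ld"  # hypothetical output prefixes
    pruned = f"{bfile}_pruned"
    return [
        # 1. LD pruning: writes <ld_prefix>.prune.in and <ld_prefix>.prune.out
        ["plink", "--bfile", bfile,
         "--indep-pairwise", str(window), str(step), str(r2),
         "--out", ld_prefix],
        # 2. Keep only the pruned-in SNPs in a new binary fileset
        ["plink", "--bfile", bfile,
         "--extract", f"{ld_prefix}.prune.in",
         "--make-bed", "--out", pruned],
        # 3. PCA on the pruned fileset: writes <pruned>_pca.eigenvec / .eigenval
        ["plink", "--bfile", pruned, "--pca", str(n_pcs),
         "--out", f"{pruned}_pca"],
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv
    if run and shutil.which("plink") is None:
        sys.exit("plink was not found on PATH")
    for cmd in plink_commands():
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Pruning and PCA are kept as separate PLINK calls so that the .prune.in list can be inspected, or reused on another fileset, before running the PCA.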
I asked a follow-up question on this topic in a separate post. Maybe you can help out there :)
Cool, accept the answer then :P