Is 1000 genomes data good enough to use for PCA?
7.9 years ago
das2000sidd ▴ 30

Hi, I am trying to combine SNP data from the 1000 Genomes Project with the common SNPs from my exome sequencing project to perform principal component analysis (PCA). I have generated a combined PLINK binary file containing my data and the 1000 Genomes SNP data, and I am using the R package SNPRelate to run the PCA. Unfortunately, regardless of which LD threshold I use to generate a pruned SNP set, my samples do not cluster with any of the 1000 Genomes population groups. In fact, they always cluster around (0,0) in the PCA plot. Does anyone know why this might be happening, or have suggestions for how this should be done? Sincere thanks in advance for any suggestions.

sequencing next-gen • 3.0k views

Yes. What allele frequency cutoff are you using? See the PCA plots in the 1000 Genomes publications.
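To make the allele frequency question concrete, here is a minimal sketch of what a minor allele frequency (MAF) cutoff does before PCA. It is plain Python on toy data, not the SNPRelate workflow from the question. The function names, the SNP IDs, the 0/1/2 genotype coding, and the 0.05 cutoff are all assumptions chosen for illustration.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF for one SNP.

    Genotypes are assumed to be coded as alternate-allele counts
    (0, 1 or 2), with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0  # no calls at all: treat as uninformative
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(snp_genotypes, cutoff=0.05):
    """Return the IDs of SNPs whose MAF is at least `cutoff`."""
    return [
        snp_id
        for snp_id, genos in snp_genotypes.items()
        if minor_allele_frequency(genos) >= cutoff
    ]


# Toy data: three hypothetical SNPs genotyped in five samples.
toy = {
    "rs_common": [0, 1, 2, 1, 0],          # alt allele freq 4/10
    "rs_monomorphic": [0, 0, 0, 0, 0],     # no variation, MAF 0
    "rs_missing": [1, None, 1, 0, None],   # alt allele freq 2/6 over called samples
}

kept = filter_by_maf(toy, cutoff=0.05)
print(kept)
```

With a merged exome plus 1000 Genomes dataset, it may be worth checking how many SNPs survive a filter like this, and whether the surviving SNPs are actually called in both sample sets.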
