Population stratification with PCA
4.0 years ago
Yong ▴ 10

Hi all! I have a genotype dataset in PLINK format, and I want to correct for population structure with PCA in an association analysis. I have split my dataset into training and testing datasets. I want to run the PCA on the training dataset only and then use the training dataset as a reference panel to calculate the PCs for the testing dataset (I don't want to combine them before the PCA). Does anyone know how to do that? Thank you!

Plink PCA stratification Population • 1.2k views
4.0 years ago

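Yes. You can project the test samples onto the training PCs by multiplying the test genotypes (coded as alt-allele counts) by the eigenvectors (SNP loadings) from the training-set-only PCA, after centering them (and scaling them, if you scaled) with the values computed on the training set. The test data must use the same SNPs, in the same order and counting the same allele, as the training data. That's very fast to do. I use it to derive per-person placements in a PCA plot of 1000 Genomes individuals, like this for inspiration, and it takes just a split second.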

I used the prcomp function in R for that calculation. You could follow the calculation in the code linked above if you want (I think the original PCA call is here), but the principle is the same for any PCA package.
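
To make the principle concrete, here is a minimal NumPy sketch of the projection. It is not the code linked above, just an illustration under some assumptions: genotypes are alt-allele counts (0/1/2) with no missing values, both matrices have the same SNPs in the same column order, and the data are centered but not scaled. The function names `fit_pca` and `project` and the simulated genotypes are made up for this example.

```python
import numpy as np

def fit_pca(train_genotypes, n_components=2):
    """Fit PCA on the training set only (rows = individuals, columns = SNPs)."""
    X = np.asarray(train_genotypes, dtype=float)
    center = X.mean(axis=0)  # per-SNP mean, computed on training data only
    # SVD of the centered matrix; the rows of vt are the eigenvectors (SNP loadings)
    _, _, vt = np.linalg.svd(X - center, full_matrices=False)
    loadings = vt[:n_components].T  # shape: (n_snps, n_components)
    return center, loadings

def project(genotypes, center, loadings):
    """Place samples in the training PC space: center with training means, then multiply."""
    X = np.asarray(genotypes, dtype=float)
    return (X - center) @ loadings

# Simulated data purely for illustration (assumption: 50 SNPs, 40 training and 10 test samples)
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=50)
train = rng.binomial(2, freqs, size=(40, 50))
test = rng.binomial(2, freqs, size=(10, 50))

center, loadings = fit_pca(train, n_components=2)
train_pcs = project(train, center, loadings)  # PCs of the training samples
test_pcs = project(test, center, loadings)    # test samples projected onto the training PCs
```

In R, calling `predict()` on a `prcomp` object with the test genotypes as `newdata` does the same centering and multiplication. The resulting test PCs can then be used as covariates in the same way as the training PCs.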

Good luck!
