PCA on VCF
5.8 years ago
Picasa ▴ 610

Is it possible to produce this kind of PCA:

https://rstudio-pubs-static.s3.amazonaws.com/89838_c06c544a19f94599aa856576e7c08e2b.html

without EIGENSOFT? (For some reason I can't install it on my computer.)

pca vcf • 4.4k views

How about using PLINK to generate a genotype matrix from the VCF files and then running PCA on it?
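To make the "matrix from a VCF" idea concrete, here is a minimal sketch in plain Python rather than PLINK. It assumes an uncompressed, tab-separated VCF with a `GT` entry in the FORMAT column; the function name `vcf_to_dosage_matrix` is hypothetical, and each entry is simply the number of non-reference alleles in the call.

```python
def vcf_to_dosage_matrix(handle):
    """Read VCF text and return (samples, variant_ids, matrix).

    matrix has one row per sample and one column per variant. Each
    entry is the count of non-reference alleles in the GT call
    (0, 1 or 2 for diploid calls), or None if the call is missing.
    """
    samples, variant_ids, columns = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        gt_index = fields[8].split(":").index("GT")
        column = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                column.append(None)  # missing genotype
            else:
                column.append(sum(1 for a in alleles if a != "0"))
        variant_ids.append(f"{fields[0]}:{fields[1]}")
        columns.append(column)
    # Transpose so that rows are samples and columns are variants.
    matrix = [list(row) for row in zip(*columns)]
    return samples, variant_ids, matrix
```

It can be called as `vcf_to_dosage_matrix(open("input.vcf"))`, where `input.vcf` is a placeholder file name. For real data sets a dedicated tool will be much faster and will handle compressed files and multiallelic sites properly.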


PLINK has a lot of tools. Which one are you referring to? Is it pseq proj v-matrix ...?


I'm not sure, actually. But I once saw a numerical SNP matrix generated with PLINK. With such a matrix, a PCA would be easy.
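As a rough illustration of how a PCA follows from a numerical SNP matrix, here is a minimal sketch using NumPy. It assumes a samples-by-SNPs matrix of 0/1/2 allele counts with `NaN` for missing calls; the function name `genotype_pca` is hypothetical. It only mean-centers each SNP, so it is not the EIGENSOFT normalization and the numbers will not match smartpca output exactly.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of a samples x SNPs matrix of allele counts (0/1/2).

    Returns (pcs, explained): the sample coordinates on the first
    n_components axes and the fraction of variance each axis explains.
    """
    g = np.asarray(genotypes, dtype=float)
    # Replace missing calls (NaN) with the mean allele count of that SNP.
    col_mean = np.nanmean(g, axis=0)
    g = np.where(np.isnan(g), col_mean, g)
    # Drop SNPs with no variation; they carry no information for PCA.
    g = g[:, g.std(axis=0) > 0]
    # Center each SNP so that the SVD yields principal components.
    g = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return pcs, explained[:n_components]
```

The first two columns of `pcs` are what would go on the x and y axes of a plot like the one linked in the question.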


Does it perform LD pruning?


Not sure. You may need to check that yourself, because I haven't tried it. But I would recommend going with @Philipp's and @Michs' answers.

5.8 years ago

GAPIT can do this for you, too, but it needs a different input format: http://www.maizegenetics.net/#!gapit/cmkv To convert VCF to HapMap format, have a look at the thread "Convert Plink Ped Format Into Hapmap Format?"

You can also use FlashPCA, especially because its documentation shows how to do LD pruning of SNPs. You can then use the output file pcs.txt in the R script from your link.


Thanks for your link. Just one thing: why do we have to perform LD pruning?


SNPs in LD are not independent observations: a block of correlated SNPs is effectively counted several times, which spuriously inflates distances in the PCA.
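To show what pruning does in practice, here is a minimal greedy sketch in plain Python. It is a simplification and not the algorithm PLINK or FlashPCA use: it takes the squared Pearson correlation of allele counts as a proxy for LD, and the names `r_squared` and `ld_prune`, the threshold of 0.2 and the window of 50 kept SNPs are all assumptions chosen for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two SNPs (allele counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP is treated as uncorrelated
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def ld_prune(snp_columns, r2_threshold=0.2, window=50):
    """Return indices of SNPs to keep.

    snp_columns holds one list of allele counts per SNP, in genomic
    order. A SNP is kept only if its r^2 with each of the last
    `window` kept SNPs is below r2_threshold.
    """
    kept = []
    for i, column in enumerate(snp_columns):
        recent = kept[-window:]
        if all(r_squared(column, snp_columns[j]) < r2_threshold for j in recent):
            kept.append(i)
    return kept
```

Two identical SNPs have an r² of 1, so only the first survives; the PCA is then run on the kept columns only.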

5.8 years ago
Mitch Bekritsky ★ 1.3k

Illumina has a C++ package that does partial PCA on a population VCF directly: https://github.com/Illumina/akt

(In the interest of full disclosure: I work at Illumina, but I do not work on this tool.)