Question: Principal Component Analysis using SNP data ste
1
2.2 years ago by
SOHAIL230
Beijing Institute of Genomics, CAS.
SOHAIL230 wrote:

Hi everybody,

I have whole-genome SNP calls for two populations, and I want to perform principal component analysis on my variant calls together with the variant data of population samples from the 1000 Genomes Project, using EIGENSOFT or some other good software.

Can anyone please describe, step by step, how to do that? In particular, I would like some detail on how to combine the variant information from the 1000 Genomes samples with my own samples, and on the format conversion steps that follow.

Thank you very much!

pca analysis ngs • 5.5k views
modified 2.2 years ago by brentp22k • written 2.2 years ago by SOHAIL230
4
2.2 years ago by
Belgium
WouterDeCoster35k wrote:

I never got to play with it, but I have kept it saved in my to-do list (probably forever): http://alimanfoo.github.io/2015/09/28/fast-pca.html
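
For readers who want to see what a genotype PCA involves before picking a tool, here is a minimal sketch in plain NumPy. It is not the code from the linked post and not any library's API: the function names `patterson_scale` and `genotype_pca` are made up for illustration, the input is assumed to be an (n_variants, n_samples) matrix of alternate allele counts (0, 1 or 2), and the data below are simulated. Variants are centered and scaled by sqrt(p(1-p)) as in Patterson et al. (2006), then an SVD gives the sample coordinates.

```python
import numpy as np


def patterson_scale(gn):
    """Center and scale an (n_variants, n_samples) matrix of alt-allele counts."""
    gn = np.asarray(gn, dtype=float)
    p = gn.mean(axis=1, keepdims=True) / 2.0   # alt allele frequency per variant
    keep = ((p > 0) & (p < 1)).ravel()         # monomorphic variants carry no signal
    gn, p = gn[keep], p[keep]
    return (gn - 2.0 * p) / np.sqrt(p * (1.0 - p))


def genotype_pca(gn, n_components=2):
    """Return sample coordinates and the fraction of variance per component."""
    x = patterson_scale(gn).T                  # samples in rows, variants in columns
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return coords, explained[:n_components]


# Simulated toy data (an assumption, not real calls): two populations whose
# allele frequencies differ by random amounts at 500 variants.
rng = np.random.default_rng(0)
n_variants, n_per_pop = 500, 20
freq_a = rng.uniform(0.1, 0.9, n_variants)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_variants), 0.05, 0.95)
gn_a = rng.binomial(2, freq_a[:, None], (n_variants, n_per_pop))
gn_b = rng.binomial(2, freq_b[:, None], (n_variants, n_per_pop))
gn = np.hstack([gn_a, gn_b])                   # columns 0-19: pop A, 20-39: pop B

coords, explained = genotype_pca(gn, n_components=2)
print(coords.shape)
```

With real data you would usually also thin variants for linkage disequilibrium and filter on minor allele frequency before the PCA; that step is left out here to keep the sketch short.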

4
2.2 years ago by
Ahill1.3k
United States
Ahill1.3k wrote:

If you are an R user, the SNPRelate package provides PCA and routines that would allow you to bring in datasets like 1000G from VCF or PLINK format files: http://corearray.sourceforge.net/tutorials/SNPRelate/#principal-component-analysis-pca http://dx.doi.org/10.1093/bioinformatics/bts606

modified 9 months ago • written 2.2 years ago by Ahill1.3k

Hi Ahill! I cannot access the OpenAthens account. :(

written 2.2 years ago by SOHAIL230
3
2.2 years ago by
brentp22k
Salt Lake City, UT
brentp22k wrote:

If your data is in VCF format, you can do this with peddy: https://github.com/brentp/peddy

The command would look like:

python -m peddy -p 4 --plot --prefix my.out $vcf $ped

This will run a PCA with your samples projected onto those from the 1000 Genomes Project, and predict ancestry from that projection. In addition to an interactive HTML file, you'll get (among others) a PCA plot that looks like:

[image: PCA plot]

The points from your cohort are the big open circles, and the small ones in the background are the 1000 Genomes samples.
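
To make the idea of "projecting onto" a reference panel concrete, here is a small NumPy sketch. It is not peddy's implementation; `fit_reference_pca` and `project` are hypothetical names and the data are simulated. It assumes both matrices hold alternate allele counts (0, 1 or 2) for exactly the same variant sites, in the same order and with the same REF/ALT alleles, which is the part that takes the real work when combining your own calls with 1000 Genomes. The axes are learned from the reference samples only, and the cohort is then placed on those axes using the reference allele frequencies.

```python
import numpy as np


def fit_reference_pca(ref_gn, n_components=2):
    """Learn PCA axes from a reference (n_variants, n_samples) genotype matrix."""
    ref_gn = np.asarray(ref_gn, dtype=float)
    p = ref_gn.mean(axis=1) / 2.0                  # reference alt allele frequency
    keep = (p > 0) & (p < 1)                       # drop variants fixed in the reference
    mean = 2.0 * p[keep]
    scale = np.sqrt(p[keep] * (1.0 - p[keep]))
    x = ((ref_gn[keep] - mean[:, None]) / scale[:, None]).T
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    loadings = vt[:n_components].T                 # (n_kept_variants, n_components)
    return {"keep": keep, "mean": mean, "scale": scale, "loadings": loadings}


def project(model, gn):
    """Place samples on the reference axes using the reference centering and scaling."""
    gn = np.asarray(gn, dtype=float)[model["keep"]]
    x = ((gn - model["mean"][:, None]) / model["scale"][:, None]).T
    return x @ model["loadings"]


# Simulated toy data (an assumption): a reference panel with two populations,
# and a cohort of 10 samples drawn from the second one.
rng = np.random.default_rng(1)
n_variants = 500
freq_a = rng.uniform(0.1, 0.9, n_variants)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_variants), 0.05, 0.95)
ref_a = rng.binomial(2, freq_a[:, None], (n_variants, 30))
ref_b = rng.binomial(2, freq_b[:, None], (n_variants, 30))
cohort = rng.binomial(2, freq_b[:, None], (n_variants, 10))

model = fit_reference_pca(np.hstack([ref_a, ref_b]), n_components=2)
centroid_a = project(model, ref_a).mean(axis=0)
centroid_b = project(model, ref_b).mean(axis=0)
cohort_coords = project(model, cohort)

# Distance of the cohort's average position to each reference centroid.
dist_a = np.linalg.norm(cohort_coords.mean(axis=0) - centroid_a)
dist_b = np.linalg.norm(cohort_coords.mean(axis=0) - centroid_b)
print(dist_a, dist_b)
```

Fitting on the reference alone keeps the axes from being driven by relatedness or batch effects within your own cohort, at the cost of projected samples being pulled somewhat toward the origin when there are many more variants than reference samples.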
