Question: Genetic PCA from poolseq genotype file
20 months ago by
AP90 wrote:


I have a sync file generated with PoPoolation2 that looks like this:

Contig    Position  Ref    Pool1           Pool2           Pool3           Pool4
SCAFOLD1    11722   A   330:0:0:0:0:0   315:0:0:0:0:0   334:0:0:0:0:0   111:0:0:0:0:0
SCAFOLD1    11723   T   0:330:0:0:0:0   0:316:0:0:0:0   0:334:0:0:0:0   0:111:0:0:0:0
SCAFOLD1    11725   T   0:327:0:0:0:0   0:314:0:0:0:0   0:329:0:0:0:0   0:111:0:0:0:0
SCAFOLD1    11726   A   330:0:0:0:0:0   314:0:0:0:0:0   332:0:0:0:0:0   111:0:0:0:0:0

Each cell contains the allele counts for that pool in the order A:T:C:G:N:del (e.g. 330:0:0:0:0:0 means 330 reads supporting A and none for anything else).
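
To make the cell layout concrete, here is a minimal sketch of how one sync row could be split into per-pool counts. The function names are made up for illustration, and it assumes whitespace-separated columns and that any header line (like the one shown above) is skipped before parsing.

```python
# Column order of a PoPoolation2 sync cell: A, T, C, G, N, deletion.
SYNC_FIELDS = ("A", "T", "C", "G", "N", "del")

def parse_sync_cell(cell):
    """Turn '330:0:0:0:0:0' into {'A': 330, 'T': 0, ...}."""
    counts = [int(x) for x in cell.split(":")]
    if len(counts) != len(SYNC_FIELDS):
        raise ValueError(f"expected 6 counts, got {len(counts)}: {cell!r}")
    return dict(zip(SYNC_FIELDS, counts))

def parse_sync_line(line):
    """Split one sync row into (contig, position, ref, [per-pool count dicts])."""
    contig, pos, ref, *cells = line.split()
    return contig, int(pos), ref.upper(), [parse_sync_cell(c) for c in cells]

# Second data row from the example above.
row = "SCAFOLD1    11723   T   0:330:0:0:0:0   0:316:0:0:0:0   0:334:0:0:0:0   0:111:0:0:0:0"
contig, pos, ref, pools = parse_sync_line(row)
print(contig, pos, ref, [p[ref] for p in pools])
# SCAFOLD1 11723 T [330, 316, 334, 111]
```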

I would like to run a genetic PCA on this dataset, just as one would on a 012 file extracted with VCFtools. I guess one could convert the sync file to a single value per cell by summing the counts of the non-reference alleles and work from that.
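
The conversion described here could be sketched roughly as follows. This is only an illustration: the helper names are hypothetical, N and deletion counts are ignored by assumption, and the second input row is invented so that the output is not all zeros.

```python
def non_ref_count(cell, ref):
    """Sum the A/T/C/G counts other than the reference base in one sync cell."""
    a, t, c, g, _n, _del = (int(x) for x in cell.split(":"))
    by_base = {"A": a, "T": t, "C": c, "G": g}
    return sum(n for base, n in by_base.items() if base != ref.upper())

def sync_to_nonref_matrix(lines):
    """One row per site, one column per pool, each value a non-reference count."""
    rows = []
    for line in lines:
        _contig, _pos, ref, *cells = line.split()
        rows.append([non_ref_count(cell, ref) for cell in cells])
    return rows

example = [
    "SCAFOLD1 11722 A 330:0:0:0:0:0 315:0:0:0:0:0",
    "SCAFOLD1 20000 A 300:30:0:0:0:0 250:60:0:5:0:0",  # made-up polymorphic site
]
print(sync_to_nonref_matrix(example))
# [[0, 0], [30, 65]]
```

Raw counts scale with sequencing depth, which differs between pools, so dividing each value by the pool's total depth at that site is probably a safer input for a PCA than the counts themselves.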

Does anybody have experience with that? Any opinion/comment would be very helpful.


modified 11 weeks ago by ndiaz0 • written 20 months ago by AP90

Hi, did you find out how to perform the PCA? I also obtained a sync file using PoPoolation2 and a VCF using GATK, and I have been trying to run a PCA on either file, but with no success yet. Thank you,


written 4 months ago by ndiaz0

I managed it by following your method. Thanks a million!

written 11 weeks ago by ndiaz0
4 months ago by
AP90 wrote:

Hi Natalia,

Yes, I did manage to run a PCA using the sync file. I first calculated the frequency of the minor (or major) allele at every SNP, then ran a PCA in R using prcomp. Instead of the frequency, you can also just use the total count of the minor or major allele. The same approach works on a 012 file.
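
For readers who want to see the steps spelled out, here is a rough Python sketch of the same idea (the original analysis was done in R with prcomp, so this is an analogue, not the code actually used). It assumes NumPy is available, takes the minor allele to be the second most common base summed over all pools, skips sites that are monomorphic or have a pool with zero depth, and centres but does not scale the columns. The function names and the input rows are invented for illustration.

```python
import numpy as np

def allele_freq_matrix(lines):
    """Return a pools x SNPs matrix of minor-allele frequencies."""
    columns = []
    for line in lines:
        _contig, _pos, _ref, *cells = line.split()
        # counts has shape (n_pools, 4): A, T, C, G; N and deletions are dropped
        counts = np.array([[int(x) for x in c.split(":")[:4]] for c in cells])
        totals = counts.sum(axis=0)          # per-base totals across pools
        minor = np.argsort(totals)[::-1][1]  # index of the second most common base
        depth = counts.sum(axis=1)           # per-pool depth at this site
        if totals[minor] == 0 or (depth == 0).any():
            continue  # monomorphic site, or a pool without coverage
        columns.append(counts[:, minor] / depth)
    return np.column_stack(columns)

def pca(freqs):
    """PCA via SVD of the column-centred matrix (pools are the observations)."""
    centred = freqs - freqs.mean(axis=0)
    u, s, _vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                        # coordinates of each pool on each PC
    explained = s**2 / np.sum(s**2)       # proportion of variance per PC
    return scores, explained

# Made-up data: three pools, two polymorphic sites and one monomorphic site.
example = [
    "chr1 10 A 90:10:0:0:0:0 60:40:0:0:0:0 30:70:0:0:0:0",
    "chr1 20 C 0:0:80:20:0:0 0:0:60:40:0:0 0:0:40:60:0:0",
    "chr1 30 G 0:0:0:100:0:0 0:0:0:100:0:0 0:0:0:100:0:0",
]
freqs = allele_freq_matrix(example)
scores, explained = pca(freqs)
print(freqs.shape)  # (3, 2): the monomorphic site is dropped
```

With real data, the first two columns of `scores` are what one would plot, one point per pool.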

Hope that helps! AP

written 4 months ago by AP90