I have a sync file generated with PoPoolation2 that looks like this:
Contig Position Ref Pool1 Pool2 Pool3 Pool4
SCAFOLD1 11722 A 330:0:0:0:0:0 315:0:0:0:0:0 334:0:0:0:0:0 111:0:0:0:0:0
SCAFOLD1 11723 T 0:330:0:0:0:0 0:316:0:0:0:0 0:334:0:0:0:0 0:111:0:0:0:0
SCAFOLD1 11725 T 0:327:0:0:0:0 0:314:0:0:0:0 0:329:0:0:0:0 0:111:0:0:0:0
SCAFOLD1 11726 A 330:0:0:0:0:0 314:0:0:0:0:0 332:0:0:0:0:0 111:0:0:0:0:0
Each cell contains the allele counts for one pool at that position, in the order A:T:C:G:N:del (e.g. 330:0:0:0:0:0 means 330 reads supporting A and none for anything else).
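To make the cell layout concrete, here is a minimal Python sketch that splits one cell into named counts. It assumes the six colon-separated fields are in the order A:T:C:G:N:del, as I understand the sync format; the names `SYNC_FIELDS` and `parse_sync_cell` are my own and not part of PoPoolation2.

```python
# Field order assumed for a sync cell: A:T:C:G:N:del
SYNC_FIELDS = ("A", "T", "C", "G", "N", "del")


def parse_sync_cell(cell):
    """Split one sync cell such as '330:0:0:0:0:0' into a count per field."""
    counts = [int(x) for x in cell.split(":")]
    if len(counts) != len(SYNC_FIELDS):
        raise ValueError(
            f"expected {len(SYNC_FIELDS)} counts, got {len(counts)}: {cell!r}"
        )
    return dict(zip(SYNC_FIELDS, counts))


# Pool2 at position 11723 from the example above
print(parse_sync_cell("0:316:0:0:0:0"))
# {'A': 0, 'T': 316, 'C': 0, 'G': 0, 'N': 0, 'del': 0}
```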
I would like to run a genetic PCA on this dataset, just as one would on a 012 file produced by VCFtools. I guess one could convert the sync file to a single value per cell by summing the counts of all non-reference alleles, and work from that.
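Here is a rough sketch of the conversion I have in mind, under a few assumptions of my own. Since the pools differ in depth (e.g. 330 vs 111 reads), it divides the non-reference count by the A/T/C/G depth to get a frequency rather than keeping the raw sum; N and deletion counts are ignored. The function names are hypothetical, the `Contig` header check only matches my example above, and the second example line is made up to show a polymorphic site.

```python
BASES = ("A", "T", "C", "G")  # first four sync fields; N and del are ignored here


def nonref_frequency(ref, cell):
    """Non-reference allele frequency for one pool at one site.

    Returns None when the reference base is not A/T/C/G or the pool
    has no A/T/C/G reads at this site.
    """
    ref = ref.upper()
    if ref not in BASES:
        return None
    counts = dict(zip(BASES, (int(x) for x in cell.split(":")[:4])))
    depth = sum(counts.values())
    if depth == 0:
        return None
    return (depth - counts[ref]) / depth


def sync_to_matrix(lines):
    """Turn sync lines into site labels and a sites x pools frequency matrix."""
    sites, matrix = [], []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row used in my example
        if len(fields) < 4 or fields[0] == "Contig":
            continue
        contig, pos, ref, cells = fields[0], fields[1], fields[2], fields[3:]
        sites.append(f"{contig}:{pos}")
        matrix.append([nonref_frequency(ref, c) for c in cells])
    return sites, matrix


example = [
    "Contig Position Ref Pool1 Pool2 Pool3 Pool4",
    "SCAFOLD1 11722 A 330:0:0:0:0:0 315:0:0:0:0:0 334:0:0:0:0:0 111:0:0:0:0:0",
    # Hypothetical polymorphic site, not from my real data
    "SCAFOLD1 99999 T 10:30:0:0:0:0 0:40:0:0:0:0 20:20:0:0:0:0 0:0:0:0:0:0",
]
sites, matrix = sync_to_matrix(example)
for site, row in zip(sites, matrix):
    print(site, row)
```

The resulting matrix (pools as samples, sites as variables, after dropping or imputing the missing values) is what I would then feed into a PCA routine, but I am not sure whether that is statistically sound for pooled data.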
Does anybody have experience with this? Any opinions or comments would be very helpful.