Question: Calculating FST from Transcriptomic Sequence Data stored in a VCF
mek362 wrote (2.6 years ago):

Hi all,

I am working with de novo-assembled transcriptomes and would like to estimate a suite of neutrality statistics and F-statistics for each locus (i.e., each contig) using sequence variation, as opposed to individual SNPs. So far, I have used the R package PopGenome to estimate single-population and pairwise FST, as well as some neutrality statistics. PopGenome reports results by contig, but after looking at the FST code in the PopGenome GitHub repo, the value appears to be SNP-based FST pooled across each contig rather than FST estimated from the contig sequences themselves (though I am not completely sure about this). With PopGenome's haplotype FST method, every population in a region is assigned the same FST estimate.
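For contrast with per-SNP pooling, one common sequence-based statistic is Hudson, Slatkin and Maddison's (1992) F_ST = 1 − H_w/H_b, where H_w is the mean number of pairwise differences between sequences from the same population and H_b the mean between sequences from different populations. The sketch below assumes you already have phased, aligned haplotype sequences for a single contig (a transcriptome VCF does not give you these directly); the function names are illustrative only and are not taken from PopGenome or any other package.

```python
from itertools import combinations, product
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_within(seqs: Sequence[str]) -> float:
    """Mean differences over all pairs of sequences from one population."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences per population")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1: Sequence[str], pop2: Sequence[str]) -> float:
    """H_b: mean differences over all pairs with one sequence from each population."""
    pairs = list(product(pop1, pop2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1: Sequence[str], pop2: Sequence[str]) -> float:
    """Hudson, Slatkin & Maddison (1992): F_ST = 1 - H_w / H_b for one contig."""
    # H_w is taken here as the unweighted average of the two within-population means.
    h_w = (mean_within(pop1) + mean_within(pop2)) / 2
    h_b = mean_between(pop1, pop2)
    if h_b == 0:
        return float("nan")  # no differences between populations: F_ST undefined
    return 1 - h_w / h_b


if __name__ == "__main__":
    print(hudson_fst(["AAAA", "AAAT"], ["CCAA", "CCAT"]))
```

With pop1 = ["AAAA", "AAAT"] and pop2 = ["CCAA", "CCAT"], H_w = 1 and H_b = 2.5, so the sketch gives 1 − 1/2.5 = 0.6. Because it works on whole haplotypes, it yields one value per contig per population pair rather than a value per SNP.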

Does anybody know of a package that calculates FST from sequence data, preferably with a distance-based hierarchical approach like AMOVA?
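For reference, a two-level AMOVA (Excoffier, Smouse and Quattro 1992) partitions squared inter-haplotype distances into among- and within-population components and reports Φ_ST = σ²_a / (σ²_a + σ²_w). The sketch below assumes a precomputed matrix of squared distances (for sequences, the number of pairwise differences is a common choice) and one population label per haplotype. It omits permutation-based significance testing and extra hierarchical levels such as groups of populations; `amova_phi_st` is a hypothetical name, not a function from an existing package.

```python
import numpy as np


def amova_phi_st(dist, groups) -> float:
    """Two-level AMOVA Phi_ST from a symmetric matrix of squared distances."""
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    p = len(labels)
    if p < 2:
        raise ValueError("need at least two populations")

    # SSD = (sum of delta^2 over unordered pairs) / N, overall and per population.
    ssd_total = dist[np.triu_indices(n, k=1)].sum() / n
    ssd_within = 0.0
    sizes = []
    for lab in labels:
        idx = np.flatnonzero(groups == lab)
        sub = dist[np.ix_(idx, idx)]
        ssd_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        sizes.append(len(idx))
    ssd_among = ssd_total - ssd_within

    # Mean squared deviations and the sample-size coefficient n_c.
    msd_among = ssd_among / (p - 1)
    msd_within = ssd_within / (n - p)
    sizes = np.array(sizes, dtype=float)
    n_c = (n - (sizes ** 2).sum() / n) / (p - 1)

    # Variance components and Phi_ST.
    sigma2_a = (msd_among - msd_within) / n_c
    sigma2_w = msd_within
    return float(sigma2_a / (sigma2_a + sigma2_w))


if __name__ == "__main__":
    # Pairwise differences among haplotypes AAAA, AAAT (pop A) and CCAA, CCAT (pop B).
    d = [[0, 1, 2, 3],
         [1, 0, 3, 2],
         [2, 3, 0, 1],
         [3, 2, 1, 0]]
    print(amova_phi_st(d, ["A", "A", "B", "B"]))
```

For this toy matrix, SSD_among = 2 and SSD_within = 1, giving σ²_a = 0.75, σ²_w = 0.5 and Φ_ST = 0.6. In practice, significance is usually assessed by permuting haplotypes among populations and recomputing Φ_ST.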

I have considered hierfstat, but I haven't been able to find a file converter that can parse the 2 GB (uncompressed) VCF my data are stored in. I'd be interested in hearing about other possibilities before taking the time to write my own.
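If writing a converter does turn out to be necessary, a single streaming pass avoids holding the VCF text in memory: only compact genotype codes are kept. The sketch below assumes diploid GT calls and a user-supplied mapping from sample name to population. It writes the population in the first column and one column per site, with genotypes coded as two digits (VCF allele index + 1, so 0/1 becomes 12 and 1/1 becomes 22), which as far as I understand is the layout hierfstat's data-frame input expects; please check the coding against the hierfstat documentation before relying on it. All function names are hypothetical.

```python
import gzip
import io
import sys
from array import array
from typing import Dict, List, TextIO, Tuple

MISSING = 0  # sentinel stored in the byte arrays; written out as NA


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF as text, for line-by-line streaming."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def code_genotype(gt: str) -> int:
    """Diploid GT such as '0/1', '1|1' or './.' -> two-digit code (11, 12, 22, ...).

    Each digit is the VCF allele index + 1; missing or non-diploid calls -> MISSING.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return MISSING
    a, b = sorted(int(x) + 1 for x in alleles)
    if b > 9:
        raise ValueError("two-digit coding supports at most 9 alleles per site")
    return 10 * a + b


def vcf_to_table(handle: TextIO, pop_of: Dict[str, str]) -> Tuple[List[str], List[Tuple[str, array]]]:
    """Read the VCF once, keeping one compact byte array of codes per sample."""
    samples: List[str] = []
    codes: Dict[str, array] = {}
    loci: List[str] = []
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            codes = {s: array("B") for s in samples}
            continue
        gt_index = fields[8].split(":").index("GT")
        loci.append(f"{fields[0]}_{fields[1]}")  # locus name = CHROM_POS
        for sample, entry in zip(samples, fields[9:]):
            parts = entry.split(":")
            gt = parts[gt_index] if gt_index < len(parts) else "."
            codes[sample].append(code_genotype(gt))
    # Rows follow the sample order of the #CHROM header line.
    return loci, [(pop_of[s], codes[s]) for s in samples]


def write_table(out: TextIO, loci: List[str], rows: List[Tuple[str, array]]) -> None:
    """Tab-separated output: population in column 1, then one column per site."""
    out.write("\t".join(["pop"] + loci) + "\n")
    for pop, row in rows:
        cells = ["NA" if c == MISSING else str(c) for c in row]
        out.write("\t".join([pop] + cells) + "\n")


if __name__ == "__main__":
    demo = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
        "contig1\t10\t.\tA\tT\t.\t.\t.\tGT:DP\t0/1:5\t1|1:7\n"
        "contig1\t20\t.\tG\tC\t.\t.\t.\tGT\t./.\t0/0\n"
    )
    loci, rows = vcf_to_table(demo, {"S1": "1", "S2": "2"})
    write_table(sys.stdout, loci, rows)
```

For a real file you would pass `open_vcf(...)` instead of the in-memory demo. Because the output is individuals × sites, the whole coded matrix is held in memory before writing; at one byte per genotype, that is typically much smaller than the text VCF.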

Please let me know if I've left out any useful info.




@mek362, I've implemented Weir and Cockerham's (1984) FST in vcflib. It works directly from a VCF, and vcflib includes several downstream tools for smoothing and permutation tests. The program is called wcFst.
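For reference, Weir and Cockerham's (1984) θ for one biallelic site can be computed from per-population sample sizes (diploid individuals), allele frequencies and observed heterozygosities via the variance components a, b and c, with θ = a / (a + b + c). Summing a and a + b + c over all SNPs in a contig before dividing gives the usual multilocus (ratio-of-sums) estimate, which is one way to get a pooled per-contig value like the one described in the question. This is an independent sketch, not the wcFst source; vcflib's tool may differ in details such as how genotypes are read from the VCF and how missing data are handled.

```python
from typing import Iterable, Sequence, Tuple

# One site: (n, p, h), each a sequence with one entry per population.
Site = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def wc84_components(n: Sequence[float], p: Sequence[float],
                    h: Sequence[float]) -> Tuple[float, float, float]:
    """Weir & Cockerham (1984) variance components a, b, c for one biallelic site.

    n: sampled diploid individuals per population
    p: frequency of one allele per population
    h: observed heterozygote frequency per population
    """
    r = len(n)
    if r < 2:
        raise ValueError("need at least two populations")
    n_total = sum(n)
    n_bar = n_total / r
    if n_bar <= 1:
        raise ValueError("mean sample size must exceed 1")
    n_c = (n_total - sum(ni * ni for ni in n) / n_total) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / n_total
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / n_total
    pq = p_bar * (1 - p_bar)

    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def theta_site(n: Sequence[float], p: Sequence[float], h: Sequence[float]) -> float:
    """Single-site theta = a / (a + b + c)."""
    a, b, c = wc84_components(n, p, h)
    return a / (a + b + c)


def theta_multilocus(sites: Iterable[Site]) -> float:
    """Ratio of sums across sites, e.g. all SNPs in one contig."""
    num = den = 0.0
    for n, p, h in sites:
        a, b, c = wc84_components(n, p, h)
        num += a
        den += a + b + c
    return num / den


if __name__ == "__main__":
    fixed = ([10, 10], [1.0, 0.0], [0.0, 0.0])  # populations fixed for different alleles
    same = ([10, 10], [0.5, 0.5], [0.5, 0.5])   # identical populations in HWE
    print(theta_site(*fixed), theta_site(*same), theta_multilocus([fixed, same]))
```

Two populations of 10 individuals fixed for different alleles give θ = 1, while two identical populations at p = 0.5 in Hardy-Weinberg proportions give about −0.056, since this estimator can dip below zero when populations do not differ.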

Written 2.6 years ago (later modified) by Zev.Kronenberg

