Question: Calculating FST from Transcriptomic Sequence Data stored in a VCF
2.6 years ago by
mek36210 wrote:

Hi all,

I am working with de novo-assembled transcriptomes and would like to estimate a suite of neutrality and F-statistics for each locus (i.e. contig) using sequence variation (as opposed to SNPs). So far, I have used the R package PopGenome to estimate single-population and pairwise FST, as well as some neutrality stats. PopGenome reports these per contig, but after looking at the FST code in the PopGenome GitHub repo, the value appears to be SNP FST pooled across each contig rather than FST estimated from the contig sequences (though I am not completely sure about this). With the haplotype FST method, each population for a region is assigned an identical FST estimate.

Does anybody know of a package that would allow FST calculation using sequence data? Preferably one that uses a distance-based hierarchical approach like AMOVA.

I have considered HierFstat, but I have not been able to find a file converter capable of parsing the 2 GB (uncompressed) VCF that my data are stored in. I'm interested in hearing about other possibilities before taking the time to write my own.
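
For anyone who does end up writing their own converter, a 2 GB VCF does not have to be loaded into memory: it can be read one record at a time. The following is only a minimal sketch, assuming diploid calls with a GT field. The function names (iter_vcf_genotypes, sites_per_contig) and the demo data are made up for illustration and are not part of any package mentioned here.

```python
import io
from collections import defaultdict

def iter_vcf_genotypes(handle):
    """Yield (contig, pos, {sample: (allele1, allele2)}) one record at a time.

    Assumes diploid GT calls; missing or non-diploid calls are skipped.
    """
    samples = []
    for line in handle:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # blank line or record without genotype columns
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        calls = {}
        for name, entry in zip(samples, fields[9:]):
            parts = entry.split(":")
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if len(alleles) != 2 or "." in alleles:
                continue  # missing or non-diploid genotype
            calls[name] = (int(alleles[0]), int(alleles[1]))
        yield fields[0], int(fields[1]), calls

def sites_per_contig(handle):
    """Count variant records per contig while streaming through the file."""
    counts = defaultdict(int)
    for contig, _pos, _calls in iter_vcf_genotypes(handle):
        counts[contig] += 1
    return dict(counts)

# Tiny in-memory VCF (hypothetical data) standing in for the real file.
HEADER = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "ind1", "ind2"]
DEMO_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(HEADER),
    "\t".join(["contig1", "10", ".", "A", "G", ".", "PASS", ".", "GT:DP", "0/1:12", "1|1:9"]),
    "\t".join(["contig1", "25", ".", "C", "T", ".", "PASS", ".", "GT", "./.", "0/0"]),
    "\t".join(["contig2", "7", ".", "G", "A", ".", "PASS", ".", "GT", "0/0", "0/1"]),
]) + "\n"

if __name__ == "__main__":
    for record in iter_vcf_genotypes(io.StringIO(DEMO_VCF)):
        print(record)
```

With a real file, the io.StringIO handle would be replaced by an ordinary open file handle (or gzip.open(path, "rt") for a compressed VCF), so memory use stays roughly constant regardless of file size.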

mek362, I've implemented Weir and Cockerham's 1984 Fst in VCFLIB. It works directly from a VCF. There are several downstream tools in VCFLIB for smoothing and running permutations. The program is called wcFst.
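
To make the estimator concrete, here is a minimal sketch of the Weir and Cockerham (1984) variance components for a single biallelic site, plus a ratio-of-sums estimate across several sites (e.g. all SNPs on a contig). This is not the wcFst source code. The function names and the input layout (one list of diploid genotype tuples per population) are assumptions made for illustration.

```python
def wc84_components(pops, allele=1):
    """Return Weir & Cockerham (1984) components (a, b, c) for one site.

    pops: list of populations, each a list of diploid genotypes (a1, a2).
    allele: the allele whose frequency is tracked (biallelic site assumed).
    """
    r = len(pops)
    sizes = [len(p) for p in pops]
    if r < 2 or min(sizes) == 0:
        raise ValueError("need at least two non-empty populations")
    n_total = sum(sizes)
    n_bar = n_total / r
    if n_bar <= 1:
        raise ValueError("average sample size must exceed 1")
    n_c = (n_total - sum(n * n for n in sizes) / n_total) / (r - 1)

    # Per-population allele frequency and observed heterozygote proportion.
    freqs = [sum((g[0] == allele) + (g[1] == allele) for g in p) / (2 * len(p))
             for p in pops]
    hets = [sum((g[0] == allele) != (g[1] == allele) for g in p) / len(p)
            for p in pops]

    p_bar = sum(n * p for n, p in zip(sizes, freqs)) / n_total
    s2 = sum(n * (p - p_bar) ** 2 for n, p in zip(sizes, freqs)) / ((r - 1) * n_bar)
    h_bar = sum(n * h for n, h in zip(sizes, hets)) / n_total

    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    a = (n_bar / n_c) * (s2 - (core - h_bar / 4) / (n_bar - 1))  # among populations
    b = n_bar / (n_bar - 1) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)  # among individuals
    c = h_bar / 2  # within individuals
    return a, b, c

def wc84_theta(sites, allele=1):
    """Ratio-of-sums Fst (theta) over sites; None if there is no variation."""
    num = 0.0
    den = 0.0
    for pops in sites:
        a, b, c = wc84_components(pops, allele)
        num += a
        den += a + b + c
    return num / den if den != 0 else None

if __name__ == "__main__":
    # Hypothetical data: a fixed difference between two populations of five.
    fixed = [[(0, 0)] * 5, [(1, 1)] * 5]
    print(wc84_theta([fixed]))
```

Summing the components across sites before taking the ratio gives a pooled SNP-based estimate per contig, which is different from a distance-based, AMOVA-style estimate computed on haplotype sequences.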

