I’m trying to make an Fst table from my ddRAD data that is currently in VCF format. I’ve done this once before but I’m sure there are better, less convoluted ways to show Fst values. Is there a package (in R) or program that is recommended to make a table showing Fst. I used hierfstat in R once but it seemed overly complicated. I have multiple populations, around 8 populations! Thank you.
For ddRAD data in VCF format with multiple populations (e.g., your 8 pops), the dartR package in R is a great choice—it's specifically designed for reduced-representation sequencing like ddRAD/DArTseq, handles VCF import nicely, and makes pairwise Fst calculations straightforward without the hassle you saw in hierfstat. It outputs a tidy table of pairwise Fst values (plus bootstraps for CIs if needed) and integrates well with filtering steps if you want to QC your SNPs first.
Quick Setup
Install dartR (and dependencies) if you haven't:
install.packages("dartR")
# Or from GitHub for the latest: devtools::install_github("owensgl/dartR.base")
Workflow
Import your VCF into a genlight object (dartR's format). You'll need a population map file (simple tab-delimited text: first column sample IDs matching your VCF, second column pop names—e.g., "Pop1", "Pop2", etc. for your 8 groups).
Example popmap.txt:
Sample1 Pop1
Sample2 Pop1
Sample3 Pop2
...
Code:
library(dartR)
# Read VCF (assumes gzipped; adjust path as needed)
gl <- gl.read.vcf(vcf.file = "your_data.vcf.gz",
pop.file = "popmap.txt", # your pop assignments
n.cores = 4) # parallelize if big dataset
# Optional: Quick summary to check import
gl.report(gl)
(Optional but recommended) Filter SNPs/individuals for quality—e.g., call rate >80%, minor allele freq >5%. This avoids biases in Fst.
gl_filt <- gl.filter.callrate(gl, threshold = 0.8)
gl_filt <- gl.filter.maf(gl_filt, threshold = 0.05)
gl_filt <- gl.filter.monomorphs(gl_filt) # drop monomorphic loci
Calculate pairwise Fst. This uses Weir & Cockerham's estimator and can include bootstraps (e.g., 100 reps for 95% CIs).
# Pairwise Fst table (no bootstrap for speed)
fst_table <- gl.fst.pop(gl_filt, nboot = 0)
print(fst_table) # Outputs a matrix-like table of pairwise Fst
# With bootstraps for CIs (slower but better)
fst_boot <- gl.fst.pop(gl_filt, nboot = 100)
print(fst_boot) # Includes mean Fst, lower/upper CIs per pair
The output is a clean data frame/matrix you can export as CSV (write.csv(fst_table, "fst_pairs.csv")) or plot as a heatmap (e.g., via gl.fst.plot(fst_boot) in dartR, or pheatmap).
If your VCF is huge, this runs efficiently with parallelization. For more options (e.g., overall Fst across all pops), check gl.fst.overall(). Docs and vignettes are solid: ?gl.fst.pop or https://cran.r-project.org/package=dartR.