Question

Fst and SNPs

0

Entering edit mode

5 days ago

md2022 • 0

Hello,

I’m trying to make an Fst table from my ddRAD data that is currently in VCF format. I’ve done this once before but I’m sure there are better, less convoluted ways to show Fst values. Is there a package (in R) or program that is recommended to make a table showing Fst. I used hierfstat in R once but it seemed overly complicated. I have multiple populations, around 8 populations! Thank you.

Fst R snps ddrad • 236 views

ADD COMMENT • link updated 1 day ago by Kevin Blighe 89k • written 5 days ago by md2022 • 0

score 0 · Answer 1 · 2025-11-06

Hello,

For ddRAD data in VCF format with multiple populations (e.g., your 8 pops), the dartR package in R is a great choice—it's specifically designed for reduced-representation sequencing like ddRAD/DArTseq, handles VCF import nicely, and makes pairwise Fst calculations straightforward without the hassle you saw in hierfstat. It outputs a tidy table of pairwise Fst values (plus bootstraps for CIs if needed) and integrates well with filtering steps if you want to QC your SNPs first.

Quick Setup

Install dartR (and dependencies) if you haven't:

install.packages("dartR")
# Or from GitHub for the latest: devtools::install_github("owensgl/dartR.base")

Workflow

Import your VCF into a genlight object (dartR's format). You'll need a population map file (simple tab-delimited text: first column sample IDs matching your VCF, second column pop names—e.g., "Pop1", "Pop2", etc. for your 8 groups).

Example popmap.txt:

Sample1  Pop1
Sample2  Pop1
Sample3  Pop2
...

Code:

library(dartR)

# Read VCF (assumes gzipped; adjust path as needed)
gl <- gl.read.vcf(vcf.file = "your_data.vcf.gz", 
                  pop.file = "popmap.txt",  # your pop assignments
                  n.cores = 4)  # parallelize if big dataset

# Optional: Quick summary to check import
gl.report(gl)

(Optional but recommended) Filter SNPs/individuals for quality—e.g., call rate >80%, minor allele freq >5%. This avoids biases in Fst.

gl_filt <- gl.filter.callrate(gl, threshold = 0.8)
gl_filt <- gl.filter.maf(gl_filt, threshold = 0.05)
gl_filt <- gl.filter.monomorphs(gl_filt)  # drop monomorphic loci

Calculate pairwise Fst. This uses Weir & Cockerham's estimator and can include bootstraps (e.g., 100 reps for 95% CIs).

# Pairwise Fst table (no bootstrap for speed)
fst_table <- gl.fst.pop(gl_filt, nboot = 0)
print(fst_table)  # Outputs a matrix-like table of pairwise Fst

# With bootstraps for CIs (slower but better)
fst_boot <- gl.fst.pop(gl_filt, nboot = 100)
print(fst_boot)  # Includes mean Fst, lower/upper CIs per pair

The output is a clean data frame/matrix you can export as CSV (write.csv(fst_table, "fst_pairs.csv")) or plot as a heatmap (e.g., via gl.fst.plot(fst_boot) in dartR, or pheatmap).

If your VCF is huge, this runs efficiently with parallelization. For more options (e.g., overall Fst across all pops), check gl.fst.overall(). Docs and vignettes are solid: ?gl.fst.pop or https://cran.r-project.org/package=dartR.

Kevin