Fst and SNPs
1
0
Entering edit mode
5 days ago
md2022 • 0

Hello,

I’m trying to make an Fst table from my ddRAD data that is currently in VCF format. I’ve done this once before but I’m sure there are better, less convoluted ways to show Fst values. Is there a package (in R) or program that is recommended to make a table showing Fst. I used hierfstat in R once but it seemed overly complicated. I have multiple populations, around 8 populations! Thank you.

Fst R snps ddrad • 236 views
ADD COMMENT
0
Entering edit mode
1 day ago

Hello,

For ddRAD data in VCF format with multiple populations (e.g., your 8 pops), the dartR package in R is a great choice—it's specifically designed for reduced-representation sequencing like ddRAD/DArTseq, handles VCF import nicely, and makes pairwise Fst calculations straightforward without the hassle you saw in hierfstat. It outputs a tidy table of pairwise Fst values (plus bootstraps for CIs if needed) and integrates well with filtering steps if you want to QC your SNPs first.

Quick Setup

Install dartR (and dependencies) if you haven't:

install.packages("dartR")
# Or from GitHub for the latest: devtools::install_github("owensgl/dartR.base")

Workflow

  1. Import your VCF into a genlight object (dartR's format). You'll need a population map file (simple tab-delimited text: first column sample IDs matching your VCF, second column pop names—e.g., "Pop1", "Pop2", etc. for your 8 groups).

    Example popmap.txt:

    Sample1  Pop1
    Sample2  Pop1
    Sample3  Pop2
    ...
    

    Code:

    library(dartR)
    
    # Read VCF (assumes gzipped; adjust path as needed)
    gl <- gl.read.vcf(vcf.file = "your_data.vcf.gz", 
                      pop.file = "popmap.txt",  # your pop assignments
                      n.cores = 4)  # parallelize if big dataset
    
    # Optional: Quick summary to check import
    gl.report(gl)
    
  2. (Optional but recommended) Filter SNPs/individuals for quality—e.g., call rate >80%, minor allele freq >5%. This avoids biases in Fst.

    gl_filt <- gl.filter.callrate(gl, threshold = 0.8)
    gl_filt <- gl.filter.maf(gl_filt, threshold = 0.05)
    gl_filt <- gl.filter.monomorphs(gl_filt)  # drop monomorphic loci
    
  3. Calculate pairwise Fst. This uses Weir & Cockerham's estimator and can include bootstraps (e.g., 100 reps for 95% CIs).

    # Pairwise Fst table (no bootstrap for speed)
    fst_table <- gl.fst.pop(gl_filt, nboot = 0)
    print(fst_table)  # Outputs a matrix-like table of pairwise Fst
    
    # With bootstraps for CIs (slower but better)
    fst_boot <- gl.fst.pop(gl_filt, nboot = 100)
    print(fst_boot)  # Includes mean Fst, lower/upper CIs per pair
    

The output is a clean data frame/matrix you can export as CSV (write.csv(fst_table, "fst_pairs.csv")) or plot as a heatmap (e.g., via gl.fst.plot(fst_boot) in dartR, or pheatmap).

If your VCF is huge, this runs efficiently with parallelization. For more options (e.g., overall Fst across all pops), check gl.fst.overall(). Docs and vignettes are solid: ?gl.fst.pop or https://cran.r-project.org/package=dartR.

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 3381 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6