Population information for Fst in StAMPP
1
0
Entering edit mode
6.0 years ago
aesculus ▴ 20

Hi,

I want to calculate P values for pairwise Fst estimates among populations for my RADseq SNP dataset. I have been trying to use the R package StAMPP. The accepted input formats are .stampp, .vcf, .csv and an R genlight object. See: https://cran.r-project.org/web/packages/StAMPP/StAMPP.pdf

The problem I have is that .vcf and genlight objects do not seem to contain a population column for individuals, but this is required by StaMPP to estimate allele frequencies (and then Fst estimates).

Is there a way to include population information in a vcf file or genlight object?

Alternatively, is there an easy way to produce a .stampp file? The format is hard to reproduce manually with a dataset of tens of thousands of SNPs.

Packages

   {r}
    #LOAD LIBRARIES
    library(pinfsc50)
    library(vcfR)
    library(adegenet)
    library(StAMPP)

Method 1

##1.1 Read fstat .dat into R and convert to genind object
genind<-read.fstat("f002-p3-r90-m10.dat", quiet=FALSE)

No genind2genlight command?

Method 2

##2.1 Read vcf file into into R
vcf<-read.vcfR("f002-p3-r60-m10.vcf", verbose = FALSE)
##2.2 Convert vcf into genlight object
x <- vcfR2genlight(vcf)
##2.3 Convert genlight to allele frequencies for StAMPP
data.freq<-stamppConvert(x, type = "genlight")

Error in pop.num[i] <- which(pop.names[i] == pops) : replacement has length zero

Need a population column in the vcf file or genlight object!

StAMPP vcf adegenet SNP Fst • 3.7k views
ADD COMMENT
1
Entering edit mode
6.0 years ago
aesculus ▴ 20

Luke Pembleton very kindly provided this successful solution, should anyone else need it:

## as per your post 
##2.1 Read vcf file into into R 
vcf<-read.vcfR("g002-p3-r60-m10.vcf", verbose = FALSE) 
##2.2 Convert vcf into genlight object 
x <- vcfR2genlight(vcf) 

### convert genlight object to a matrix in stampp format 
x2 <- as.matrix(x) #convert genlight object to matrix 
sample <- row.names(x2) #sample names 

#pop.names <- pop(x) #extract ploidy info from genlight object (however this is not available when imported via a vcf file) 
#pop.names <- #provide a vector here of the same length of the number of samples, with a corresponding population name/id
#pop.names <- c(")
#e.g. c("popA", "popA", "popB", "popB", "popB", "popC", "popC") 

ploidy <- ploidy(x) #extract ploidy info from genlight object 
x2 = x2 * (1/ploidy) #convert allele counts to frequency 
x2[is.na(x2)] = NaN 
format <- vector(length = length(sample))
#format id for the genotype data
format[1:length(format)] = "freq"  

x.stampp <- as.data.frame(cbind(sample, pop.names, ploidy, format, x2)) #convert to basic r data.frame suitable to stamppConvert 

geno <- stamppConvert(x.stampp, 'r') 

#you should now be able to run all the stampp commands with 'geno' as the input object 
# e.g. fst <- stamppFst(geno)
ADD COMMENT

Login before adding your answer.

Traffic: 1680 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6