Hello,
I am a bit confused about how to estimate Watterson's Theta from ANGSD output. I have 99 individuals comprising 9 populations (11 in each population). First, I used the following code for generating unfolded site frequency spectrum.
for i in 1958 1963 2747 2931 2932 3111 4107 4117 4330
do
angsd -bam "$i".txt -P 16 -dosaf 1 -GL 2 -anc Reference.fasta -minMapQ 30 -minQ 20 -minInd 8 -out $i
done
Then, overall site frequency spectrum was estimates using:
for i in 1958 1963 2747 2931 2932 3111 4107 4117 4330
do
echo $i
realSFS $i.saf.idx -P 8 > $i.sfs
done
Then, I estimated the Theta value per site and based on sliding windows with the below codes:
for i in 1958 1963 2747 2931 2932 3111 4107 4117 4330
do
echo $i
angsd -P 16 -bam $i.txt -ref Reference.fasta -anc Reference.fasta -out $i -doThetas 1 -doSaf 1 -GL 2 -minMapQ 30 -minQ 20 -minInd 8 -pest $i.sfs
done
Sliding window analysis:
for i in 1958 1963 2747 2931 2932 3111 4107 4117 4330
do
echo $POP
thetaStat do_stat $i.thetas.idx &> /dev/null
# perform a sliding-window analysis
thetaStat do_stat $i.thetas.idx -win 10000 -step 5000 -outnames $i.thetas &> /dev/null
done
Now, my confusion is that from the pestPG
output file, I have obtained the Watterson's Theta value by dividing the tW
column by the number of sites (nSites
column) and then plotted as a distribution in R. It looks like the following.
Is the last method for estimating Watterson's Theta Correct?
Please let me know.