How to estimate Watterson's Theta per population from ANGSD output?
0
2
Entering edit mode
22 months ago

Hello,

I am a bit confused about how to estimate Watterson's Theta from ANGSD output. I have 99 individuals comprising 9 populations (11 in each population). First, I used the following code for generating unfolded site frequency spectrum.

for i in 1958 1963 2747 2931 2932 3111 4107 4117 4330
do
angsd -bam "$i".txt -P 16 -dosaf 1 -GL 2 -anc Reference.fasta -minMapQ 30 -minQ 20 -minInd 8 -out $i
done

Then, overall site frequency spectrum was estimates using:

for i in 1958 1963 2747 2931 2932 3111 4107 4117 4330
do
echo $i
realSFS $i.saf.idx -P 8 > $i.sfs
done

Then, I estimated the Theta value per site and based on sliding windows with the below codes:

for i in 1958 1963 2747 2931 2932 3111 4107 4117 4330
do
echo $i
angsd -P 16 -bam $i.txt -ref Reference.fasta -anc Reference.fasta -out $i -doThetas 1 -doSaf 1 -GL 2 -minMapQ 30 -minQ 20 -minInd 8 -pest $i.sfs
done

Sliding window analysis:

for i in 1958 1963 2747 2931 2932 3111 4107 4117 4330
do
    echo $POP
    thetaStat do_stat $i.thetas.idx &> /dev/null
    # perform a sliding-window analysis
    thetaStat do_stat $i.thetas.idx -win 10000 -step 5000 -outnames $i.thetas &> /dev/null
done

Now, my confusion is that from the pestPG output file, I have obtained the Watterson's Theta value by dividing the tW column by the number of sites (nSites column) and then plotted as a distribution in R. It looks like the following.

Is the last method for estimating Watterson's Theta Correct?

Please let me know. Distribution of theta value

SNP theta ANGSD nucleotide diversity genetic • 693 views
ADD COMMENT

Login before adding your answer.

Traffic: 2711 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6