BBtools bug in reporting the number of substitutions in the console output, it seems to report insanely high rates of heterozygosity
2
1
Entering edit mode
5 months ago
Axzd ▴ 70

Hello,

I know Brian is sometimes around, but here is my command:

while read p; do callvariants.sh in=${p}.recal.bam ploidy=2 vcf=${p}.20score.vcf useidentity=f overwrite=true ref=ref.fsa
-Xmx50g ; done <ID java -ea -Xmx50g -Xms50g -cp /home/alessandro/software/bbmap/current/ var2.CallVariants in=ancestor.recal.bam ploidy=2 vcf=ancestor.20score.vcf useidentity=f overwrite=true ref=ref.fsa -Xmx50g Executing var2.CallVariants [in=ancestor.recal.bam, ploidy=2, vcf=ancestor.20score.vcf, useiden
tity=f, overwrite=true,
ref=Adineta_vaga.fsa, -Xmx50g]

And the console output is:

Loading reference.
Time:   0.596 seconds.
Processing input files.
Found sambamba.
Time:   230.202 seconds.
Memory: max=53687m, total=53687m, free=13311m, used=40376m
Processing variants.
Time:   12.761 seconds.

Counting nearby variants.
Time:   1.972 seconds.

Sorting variants.
Time:   1.304 seconds.
Writing VCF file.
Time:   1.064 seconds.
1838700 of 160623721 variants passed primary filters (1.1447%).

Type            Count   Rate    AD  Depth   AF  Score   Qual
Substitutions:  1669537 90.8%   158.8   341.6   0.468   43.6    23.4
Deletions:      75362   4.1%    149.5   337.3   0.446   38.4    24.2
Insertions:     93801   5.1%    151.1   340.2   0.447   40.9    24.2
Variation Rate: 1/54
Homozygous:     8887    0.5%

It gives the impression that 90% of the bases are heterozygous. Maybe I misread it, but I think this could be clarified.

It is BBmap version 39.06.

Thanks.

PS: I use biostars a lot for questions related to bbtools. Is it a correct way to use biostars? Please, admins, let me know if my questions are more annoying than helpful. Thanks a lot

SNP bbtools • 714 views
ADD COMMENT
2
Entering edit mode
5 months ago

The rates are for the variants that were called (not total bases). If you add up the percentages you get 100%; if you add up the counts it equals the total number of variants that passed primary filters (1838700). Note that this number is much smaller than the size of the rotifer genome (>200 Mb).

ADD COMMENT
1
Entering edit mode
5 months ago

You can use a tool like vt peek to summarize variant stats once you have got to a vcf etc.

https://genome.sph.umich.edu/wiki/Vt

ADD COMMENT

Login before adding your answer.

Traffic: 1655 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6