Does bcftools call ignore duplicate markings?
6 weeks ago
Egelbets ▴ 10

I am using Picard MarkDuplicates to mark duplicates in my position-sorted BAM. I then call variants with bcftools mpileup and bcftools call, like so:

bcftools mpileup -f reference.fa positionsort.bam | bcftools call -mv -Ob -o calls.bcf


To my knowledge, bcftools mpileup ignores reads marked as duplicates by default (here and here). Initially I assumed the same would hold for bcftools call. However, I was skeptical and ran a test: I called variants once on a BAM with duplicates marked by Picard and once on a BAM without duplicate marking. To my surprise, both runs gave exactly the same number of variants. So, does this mean that bcftools call doesn't ignore duplicate markings? Or have I made a mistake?
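For context, a minimal sketch of why a marked duplicate would be dropped at the mpileup step. It assumes the default exclusion list documented for bcftools mpileup (UNMAP, SECONDARY, QCFAIL, DUP) and uses the FLAG bit values from the SAM specification. The helper name `mpileup_default_skips` is hypothetical and only illustrates the bitmask test; it is not part of bcftools.

```shell
#!/bin/sh
# SAM FLAG bits (values from the SAM specification).
UNMAP=4
SECONDARY=256
QCFAIL=512
DUP=1024   # the bit set by Picard MarkDuplicates / samtools markdup

# Assumption: bcftools mpileup skips a read if ANY of these bits is set.
DEFAULT_SKIP_MASK=$(( UNMAP | SECONDARY | QCFAIL | DUP ))

# Hypothetical helper: print "skipped" or "kept" for a given FLAG value.
mpileup_default_skips() {
    if [ $(( $1 & DEFAULT_SKIP_MASK )) -ne 0 ]; then
        echo "skipped"
    else
        echo "kept"
    fi
}

mpileup_default_skips 99     # paired, proper pair, mate reverse, first in pair
mpileup_default_skips 1123   # the same read with the duplicate bit (1024) added
```

Under that assumption, marking a duplicate is enough to exclude the read from the pileup, so removing it from the BAM beforehand should make no difference to the calls.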

Edit: I did in fact make a mistake; see the comments below.


The 'duplicate' information is not present in the VCF after 'mpileup'.

5 weeks ago
Egelbets ▴ 10

So I did find a bug in my pipeline: variant calling was run only on the duplicate-marked files and never on the files without duplicate marking, which explains why I got exactly the same number of variants in both cases. I fixed that mistake, ran some more tests, and also tested samtools markdup. This resulted in the following:

I tested marking the duplicates, and removing them, before variant calling. The number of SNPs called with bcftools is shown in the table above. As you can see, the number of variants called after marking duplicates and after removing duplicates is essentially identical, except for Picard SAMPLE5, where removedup resulted in 1 variant fewer than markdup (in red in the table). So, I think it's safe to say that the bcftools mpileup | bcftools call pipeline ignores reads marked as duplicates. Since the duplicate flag lives in the BAM and not in the mpileup output, the filtering happens in mpileup, and bcftools call never sees those reads.
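For anyone who wants to repeat this kind of check on a single duplicate-marked BAM, here is a rough sketch. The file names `reference.fa` and `positionsort.markdup.bam` and the two helper functions are placeholders of mine, not part of bcftools. I am also assuming a recent bcftools where the exclusion list is set with `--ns` (`--skip-any-set`); older releases spell it `--ff` (`--excl-flags`), so check the usage output of `bcftools mpileup` for your version.

```shell
#!/bin/sh
# Count variant records (non-header lines) in uncompressed VCF text on stdin.
count_records() {
    grep -vc '^#'
}

# Hypothetical wrapper: call variants from one BAM and print the record count.
# $1 = reference FASTA, $2 = BAM, any remaining arguments go to mpileup.
call_and_count() {
    ref=$1
    bam=$2
    shift 2
    bcftools mpileup "$@" -f "$ref" "$bam" | bcftools call -mv -Ov | count_records
}

# Only run the comparison if bcftools and the (placeholder) input exist.
if command -v bcftools >/dev/null 2>&1 && [ -f positionsort.markdup.bam ]; then
    # Default filtering: reads flagged as duplicates are assumed to be skipped.
    echo "default filter:  $(call_and_count reference.fa positionsort.markdup.bam)"
    # Same BAM, but with DUP dropped from the exclusion list (assumed option name).
    echo "duplicates kept: $(call_and_count reference.fa positionsort.markdup.bam --ns UNMAP,SECONDARY,QCFAIL)"
fi
```

If the two counts differ, the duplicate flags are being honored at the mpileup step; if they are identical, either the BAM has no marked duplicates or the duplicates don't change any calls.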