I have a pipeline for a maximum depth sequencing project. Briefly, this means I can ignore PCR errors because I check for consensus of UMI-tagged sequences. Therefore, once I have a consensus bam, I'd like to be able to call all variants within it, without trying to correct for the probability of errors.
I am having trouble doing this with bcftools mpileup. It finds SNPs quite well, but does not seem to find deletions. I know that the consensus has some deletions, because I can see mapped sequences with deletions in the CIGAR string, like the following example:
testsample:32848 99 mygenome 12814 60 4M1D131M = 12814 152 CAATCGGAGCTTTACCTCTTCTTCACTACAAAGCATGGGGTTTCAAAACGAACATCGCTATCTCAATTCGCTAATATCCGCAACAATGGTCTAATTGCTCTGAGTCTTCGTGAAGATGATGAACTGATGGGTGTA NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NM:i:1 MD:Z:4^G131 MC:Z:4M1D147M AS:i:131 XS:i:0
I am pretty sure I'm using the correct bctools mpileup arguments to call indels with high depth and no probabilistic base calling, but I may be doing something wrong. I would have expected to see the deletion shown above in my output vcf but I do not. Here is my bcftools mpileup call:
bcftools mpileup \ -f path/to/my/genome \ -d 1000000 \ -a AD,DP \ -o pileup.vcf \ --no-BAQ \ -L 1000000 \ consensus.bam
Thank you for your help!
Thanks for your reply. The problem is the output from
mpileupdoesn't contain the deletions, so neither will