Question: Is samtools BAQ redundant after GATK's IndelRealigner?
2.6 years ago
James Reeve

I'm planning to generate an mpileup file for SNP calling. When I look for examples online, I notice that many of them turn off samtools' base alignment quality (BAQ) computation by specifying -B.

Example:

samtools mpileup -B -f ref.fa data.bam > out.mpileup


After doing a bit of research, it seems BAQ is designed to adjust base quality scores to account for misaligned bases near indels (original paper). However, I have already used GATK 3's IndelRealigner to realign reads around indels.
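
One way to see what BAQ actually changes on an already realigned BAM is to generate pileups with BAQ on and off and compare the base-quality column (column 6 for a single sample). A minimal sketch, assuming a POSIX shell with samtools 1.x on the PATH; the function name `pileup_baq_modes` and all file names are hypothetical placeholders:

```shell
# Hypothetical helper: write three pileups of the same BAM that differ only in BAQ handling.
# Arguments (placeholders): reference FASTA, input BAM, output prefix.
pileup_baq_modes() {
    ref="$1"; bam="$2"; prefix="$3"
    # Default: BAQ is computed and used to cap base qualities.
    samtools mpileup -f "$ref" "$bam" > "${prefix}.baq.mpileup"
    # -B / --no-BAQ: base qualities are taken from the BAM unchanged.
    samtools mpileup -B -f "$ref" "$bam" > "${prefix}.nobaq.mpileup"
    # -E / --redo-BAQ: recompute BAQ on the fly, ignoring any existing BQ tags.
    samtools mpileup -E -f "$ref" "$bam" > "${prefix}.redobaq.mpileup"
}
```

For example, `pileup_baq_modes ref.fa realigned.bam sample` followed by a diff of column 6 between `sample.baq.mpileup` and `sample.nobaq.mpileup` shows which positions still have qualities lowered by BAQ after realignment.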

Are samtools BAQ and GATK's IndelRealigner analogous commands for dealing with indels? Is it safe to turn off BAQ after IndelRealignment?


The variant calling workflow recommended on the SAMtools website combines GATK's IndelRealigner with bcftools mpileup (with BAQ enabled). I don't know whether this pipeline is up to date, though. I'm curious whether mixing GATK and SAMtools is still the best option if you want to perform the actual variant calling with SAMtools.
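
A rough sketch of that kind of mixed pipeline, assuming GATK 3.x is available as `GenomeAnalysisTK.jar` in the working directory, samtools/bcftools are on the PATH, the reference has `.fai` and `.dict` files, and the input BAM is coordinate-sorted and indexed. The function name and file names are hypothetical, and this is not a copy of the workflow on the SAMtools website:

```shell
# Hypothetical helper: GATK 3 indel realignment, then bcftools calling without -B.
# Arguments (placeholders): reference FASTA, sorted+indexed BAM, output prefix.
realign_then_call() {
    ref="$1"; bam="$2"; prefix="$3"
    # Step 1: find intervals that may need realignment around indels.
    java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
        -R "$ref" -I "$bam" -o "${prefix}.intervals"
    # Step 2: realign reads locally within those intervals.
    java -jar GenomeAnalysisTK.jar -T IndelRealigner \
        -R "$ref" -I "$bam" -targetIntervals "${prefix}.intervals" \
        -o "${prefix}.realigned.bam"
    # Step 3: pileup and call; no -B here, so BAQ is not disabled.
    bcftools mpileup -f "$ref" "${prefix}.realigned.bam" \
        | bcftools call -mv -Oz -o "${prefix}.vcf.gz"
}
```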

As for turning off BAQ (-B option), I have read that it is recommended if you want to perform somatic variant calling using VarScan2 (see for example the Genomic Data Commons user's guide). However, I am not aware of people turning BAQ off for other types of analyses.
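
For reference, a VarScan2 somatic run with BAQ disabled looks roughly like the sketch below. It assumes samtools on the PATH and a VarScan2 jar named `VarScan.jar` (a placeholder); the function name is hypothetical, and filtering options recommended by the GDC guide or the VarScan2 documentation are left out:

```shell
# Hypothetical helper: joint normal/tumor pileup with BAQ disabled, then VarScan2 somatic.
# Arguments (placeholders): reference FASTA, normal BAM, tumor BAM, output basename.
varscan_somatic() {
    ref="$1"; normal="$2"; tumor="$3"; prefix="$4"
    # -B disables BAQ; the normal BAM is listed before the tumor BAM.
    samtools mpileup -B -f "$ref" "$normal" "$tumor" > "${prefix}.mpileup"
    # --mpileup 1 tells VarScan2 that one pileup holds both normal and tumor.
    java -jar VarScan.jar somatic "${prefix}.mpileup" "$prefix" --mpileup 1
}
```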


Do you have any idea why VarScan2 recommends turning off BAQ with mpileup -B? I looked at your source and couldn't find an explanation.


I believe this thread is one of the first in which they concluded that it was best to use mpileup -B for VarScan2. Apparently, not using -B makes VarScan miss true variants, but be aware that if you use mpileup -B you may get more false positives. I guess this explanation might be applicable to other variant calling pipelines that rely on samtools, such as yours, but I'm not 100% sure. Also note that the thread does not mention whether the user performed indel realignment with GATK before using samtools.
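
Since the trade-off is more sensitivity versus more false positives, one practical check on your own data is to call variants twice from the same realigned BAM and look at the sites that differ. A minimal sketch, assuming bcftools 1.x on the PATH; the function name and file names are hypothetical:

```shell
# Hypothetical helper: call variants with BAQ on and off, then split shared and private sites.
# Arguments (placeholders): reference FASTA, BAM, output prefix.
compare_baq_callsets() {
    ref="$1"; bam="$2"; prefix="$3"
    bcftools mpileup -f "$ref" "$bam" \
        | bcftools call -mv -Oz -o "${prefix}.baq.vcf.gz"
    bcftools mpileup -B -f "$ref" "$bam" \
        | bcftools call -mv -Oz -o "${prefix}.nobaq.vcf.gz"
    # isec needs indexed inputs.
    bcftools index "${prefix}.baq.vcf.gz"
    bcftools index "${prefix}.nobaq.vcf.gz"
    # Writes 0000.vcf (only with BAQ), 0001.vcf (only without BAQ),
    # and 0002.vcf / 0003.vcf (shared sites) into the output directory.
    bcftools isec -p "${prefix}_isec" "${prefix}.baq.vcf.gz" "${prefix}.nobaq.vcf.gz"
}
```

Sites in `0001.vcf` are the extra calls gained by disabling BAQ; inspecting whether they sit next to indels gives some idea of whether they are real or alignment artifacts.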

This manuscript from VarScan2's creator also recommends using mpileup -B as "best practice", but it doesn't give an explanation for this recommendation.


You would likely need the developer(s) of both SAMtools and GATK here to answer this adequately. Generally, though, I would avoid mixing and matching these pipelines where possible. Why are you using a mixture of GATK and SAMtools? Nor have I heard of people disabling BAQ.


I'm mixing GATK's indel realignment and SAMtools because Box 1 of Schlötterer et al. 2014 advises using IndelRealigner in a pool-seq pipeline. The authors also disable BAQ with the mpileup -B option in the tutorial for their variant caller / analysis toolkit, PoPoolation2.
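
The pool-seq steps after realignment can be sketched roughly as follows, assuming samtools on the PATH and PoPoolation2's `mpileup2sync.jar` in the working directory. The function name and file names are hypothetical, and the `--fastq-type` and `--min-qual` values are illustrative placeholders to check against the PoPoolation2 tutorial for your data:

```shell
# Hypothetical helper: multi-pool pileup with BAQ disabled, converted to PoPoolation2 sync format.
# Arguments (placeholders): reference FASTA, output prefix, then one realigned BAM per pool.
popoolation2_sync() {
    ref="$1"; prefix="$2"; shift 2
    # -B disables BAQ; "$@" expands to the remaining BAM files, one per pool.
    samtools mpileup -B -f "$ref" "$@" > "${prefix}.mpileup"
    # Convert the pileup into the sync file used by PoPoolation2's downstream scripts.
    java -jar mpileup2sync.jar --input "${prefix}.mpileup" \
        --output "${prefix}.sync" --fastq-type sanger --min-qual 20
}
```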