5.8 years ago
James Reeve
I'm planning to generate an mpileup file for SNP calling. When I look for examples online, I notice that many turn off samtools' base alignment quality (BAQ) setting by specifying -B
Example:
samtools mpileup -B -f ref.fa data.bam > out.mpileup
After doing a bit of research, it seems BAQ is designed to adjust quality scores to account for indels (original paper). However, I have already used GATK 3's IndelRealigner to account for indels.
Are samtools BAQ and GATK's IndelRealigner analogous commands for dealing with indels? Is it safe to turn off BAQ after IndelRealignment?
The variant calling workflow recommended on the SAMtools webpage mixes GATK's IndelRealigner and bcftools mpileup (with BAQ). I don't know if this pipeline is up to date, though. I'm curious about whether mixing GATK and SAMtools is still the best option if you want to perform the actual variant calling using SAMtools.
As for turning off BAQ (-B option), I have read that it is recommended if you want to perform somatic variant calling using VarScan2 (see for example the Genomic Data Commons user's guide). However, I am not aware of people turning BAQ off for other types of analyses.
Do you have any idea why VarScan2 recommends turning off BAQ with mpileup -B? I looked at your source and couldn't find an explanation.
I believe this thread is one of the first in which they concluded that it was best to use mpileup -B for VarScan2. Apparently, not using -B makes VarScan miss true variants, but be aware that if you use mpileup -B you may get more false positives. I guess this explanation might be applicable to other variant calling pipelines that rely on samtools, such as yours, but I'm not 100% sure. Also note that the thread does not mention whether the user performed indel realignment with GATK before using samtools.
This manuscript from VarScan2's creator also recommends using mpileup -B as "best practice", but it doesn't give an explanation for this recommendation.
You would likely need the developer(s) of both SAMtools and GATK to be here to adequately answer this. Just generally, though, I would not mix and match these pipelines, where possible. Why are you using a mixture of GATK and SAMtools? I have not heard of people disabling BAQ either...
I'm mixing GATK's indel realignment and SAMtools because Schlötterer et al. 2014 (Box 1) advise using IndelRealigner for a pool-seq pipeline. They also disable BAQ using the mpileup -B option in the tutorial for their variant caller / analysis toolkit, PoPoolation2.
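For anyone who wants to check the effect of BAQ on their own data rather than rely on the recommendations above, a minimal sketch of the two commands to compare might look like this. The file names (ref.fa, data.realigned.bam, baq_on.mpileup, baq_off.mpileup) and the helper mpileup_cmd are hypothetical, and the script only prints the commands instead of running them, so it works as a dry run without samtools installed.

```shell
#!/bin/sh
# Hypothetical inputs: replace with your own reference and realigned BAM.
REF=ref.fa
BAM=data.realigned.bam

# Print (do not run) a samtools mpileup command.
# First argument: "off" adds -B to disable BAQ; anything else leaves BAQ enabled.
# Second argument: name of the output pileup file.
mpileup_cmd() {
    baq=$1
    out=$2
    if [ "$baq" = "off" ]; then
        echo "samtools mpileup -B -f $REF $BAM > $out"
    else
        echo "samtools mpileup -f $REF $BAM > $out"
    fi
}

# One pileup with BAQ (the default) and one without, for side-by-side comparison.
mpileup_cmd on  baq_on.mpileup
mpileup_cmd off baq_off.mpileup
```

Running the two printed commands on the same realigned BAM and comparing the base quality columns of the two pileups should show where BAQ lowers qualities, which is presumably where a downstream caller would behave differently.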