Hello,
I'm doing consensus calling using bcftools. The command I'm using is:
bcftools mpileup --max-depth 0 --ff DUP -f [fa file] [deduped bam file] | bcftools call -mv -Ov -o [variants.vcf]
I looked at the intermediate mpileup output created using bcftools mpileup --max-depth 0 --ff DUP -f [fa file] [deduped bam file] > [mpileup.txt]
and there's a variant found at position 1703:
gb-cds|KM069571.1|Influenza_virus|HA|A|H3|A/Singapore/C2012.669/2012| 1703 . T C,A,<*> 0 . DP=40;I16=13,21,1,1,1043,39031,72,2592,1220,45432,86,3700,802,19468,30,650;
QS=0.927565,0.0362173,0.0362173,0;
VDB=0.02;
SGB=-0.453602;
RPB=0.132353;
MQB=0.220588;
MQSB=0.993571;
BQB=0.558824;
MQ0F=0 PL 0,69,255,69,255,255,102,255,255,255
But in the resulting .VCF file I don't see this variant showing up anywhere. Is it because the C and A variants are too infrequent or low quality? If so how do I interpret all the fields in the mpileup entry?
Thanks!
So I checked the region in IGV and the allelic depths are: Total count: 1263 A: 5 reads (0%, 1+, 4-) C: 19 reads (2%, 4+, 15-) T: 1239 reads (98%, 148+, 1091-)
Is 2% considered too low a depth for bcftools to consider it as a variant? I tried to look for this cutoff depth threshold used by bcftools and didn't have any luck finding this information.
Yes, that is very low - BCFtools is best at calling germline variants, i.e., at frequencies of 50% or 100%. You can still call them - take a look at the varFilter executable that can be used with BCFtools and that I use here: https://github.com/kevinblighe/ClinicalGradeDNAseq/blob/master/AnalysisMasterVersion1.sh#L354-L375
If these are cancer samples, then you should obviously use a somatic variant caller.