Question: Why Doesn'T Samtools/Bcftools Output Homozygous Mutations?
2
gravatar for Leszek
6.5 years ago by
Leszek4.0k
IIMCB, Poland
Leszek4.0k wrote:

I have noticed samtools/bcftools don't report homozygous mutations. Is it possible to make it report such cases?
ie.

chrX    273982    G    125    aAAAAAaaaaaAAaAAaaaaaaaAAAaAAaaaaAAaAaaAAaaAAAAaaAAAAAAAAaaAAaaaaAAAaaAAAAAAAAAaAAAaAAAAAAAAAAAAAAAAAAAAaaAAAAAAaAAAAaAAAaAaa    FFFG=JFFFFFIJFJF<HHHHHHJDIHJJ2JJIJI:CJ?IGIHIJJIJJIJIJJJJJGJGIJJJJJJJJDJJJJJJICBJJJEJ2JJIJBJJJHJHFHHFHHGHIJHDDFFFEDDDADCCD?C8?

The command I use is:

samtools mpileup -C 50 -ugf $ref.fa $f.bam -r chrX:253982-293982 | bcftools view -bvcg - > $f.raw.bcf
bcftools view $f.raw.bcf | vcfutils.pl varFilter -D 600 > $f.flt.vcf

Samtools version: 0.1.18-dev (r982:313)

EDIT

Only setting -p 1 report homozygous mt I have been looking for (among 3251 false positives):

samtools mpileup -Bugf $ref.fa $f.bam -r chrX:253982-293982 | bcftools view -p 1.0 -vcgN - | grep "3982"
chrX    273982    .    G    A    0.00652    .    DP=125;VDB=0.0615;AF1=0.5;CI95=1.5,0;DP4=0,0,82,43;MQ=44;FQ=-30    GT:PL:GQ    0/1:255,255,255:3

Is there any other, more efficient way of getting such cases?

EDIT2
Interestingly GATK 2.0 UnifiedGenotyper predicts homozygous SNPs nicely:

java -jar ~/src/GenomeAnalysisTK-2.1-13-g1706365/GenomeAnalysisTK.jar -T UnifiedGenotyper -R $ref.fa -I $s.rg.bam -o $s.rg.vcf -ploidy 1
chrX    273982    .    G    A    5393    .    AC=1;AF=1.00;AN=1;DP=125;Dels=0.00;FS=0.000;HaplotypeScore=1.9663;MLEAC=1;MLEAF=1.00;MQ=44.00;MQ0=0;QD=43.14;SB=-1.820e+03    GT:AD:DP:GQ:MLPSAC:MLPSAF:PL    1:0,125:125:99:1:1.00:5393,0

In this context behaviour of samtools is difficult to explain. What is your opinion?

EDIT3
Still no idea, and get more samples with false negative homozygous SNPs ie.

CNBG_3826    1049    .    T    A    0.00652    .    DP=284;VDB=0.0343;AF1=0.5;CI95=1.5,0;DP4=0,0,101,158;MQ=42;FQ=-30    GT:PL:GQ    0/1:255,255,255:3
next-gen calling samtools snp • 3.1k views
ADD COMMENTlink modified 6.4 years ago by swbarnes25.2k • written 6.5 years ago by Leszek4.0k

I am interested in those false positivies, what are they? How do they originate? Thanks.

ADD REPLYlink written 6.5 years ago by Biomonika (Noolean)3.0k
2
gravatar for matted
6.5 years ago by
matted7.0k
Boston, United States
matted7.0k wrote:

I worry about the various implicit and explicit filtering steps in cases like this.

Does it show up before varFilter? If so, then you can figure out why it's failing a default check.

Try mpileup -B to disable BAQ, which might be filtering it out.

I'm actually not sure what effect mpileup -C 50 will have, so it might be worth trying this without it.

It looks like your quality scores are in Sanger format, but in the past I've had problems when I didn't convert them and forgot to use the -6 flag to mpileup. So that's something to keep in mind for other situations.

The previous suggestion of raising bcftools view -p is good to try too.

Finally, and this is more of a guess, does the behavior change based on the size of the region you're calling? That is, do you still miss the SNP in a genome- or chromosome-wide scan, versus this single base example you've shown? I'm not sure how the AFS estimation will work given only one position, so you may want to try this on a larger region (if you haven't already).

ADD COMMENTlink written 6.5 years ago by matted7.0k

it's not showing before varFilter.
not an issue of BAQ, -C.
fastq is sanger like (PHRED+33).
as a matter of fact I call snps in 40kb region (-r chrX:253982-293982), but showing only 1bp for simplicity.

ADD REPLYlink written 6.5 years ago by Leszek4.0k
1

So just to clarify, you actually tried running with -B? Can you try running a whole chromosome at a time and checking the results?

This is very strange, but samtools is so widely used (and your call above is so simple) and this is exactly what it's supposed to do. So without much data I still assume it's something strange in the way you're calling it, or some evil feature of your reads or pipeline that we don't understand yet.

Have you tried another version of samtools, maybe the latest on github, to compare?

Have you made sure that the reference matches the one used to create the BAMs (just checking, since in strange cases like this it's sometimes the obvious things that we'd all overlook at first)?

ADD REPLYlink written 6.5 years ago by matted7.0k
1
gravatar for Ashutosh Pandey
6.5 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

I am not sure if it helps but you can try using bcftools view -p 0.99 -vcgN and see if you get any homozygous mutations.

ADD COMMENTlink written 6.5 years ago by Ashutosh Pandey11k

tried, but setting -p to 0.99. changing -p 1 solves the issue, but report thousands false positives (look at edit)

ADD REPLYlink modified 6.5 years ago • written 6.5 years ago by Leszek4.0k
0
gravatar for swbarnes2
6.5 years ago by
swbarnes25.2k
United States
swbarnes25.2k wrote:

Do try mpileup -B. I've seen the BAQ calculations eat sanger-verified SNPs.

mpileup -C 50 is suggested when using .bams generated with bwa, so that bit is fine.

ADD COMMENTlink written 6.5 years ago by swbarnes25.2k

it's not an issue of BAQ. I have got high coverage (~120x)

ADD REPLYlink written 6.5 years ago by Leszek4.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 838 users visited in the last hour