Vcf - Showing Data For Non-Variant Positions As Well As Snps
2
3
Entering edit mode
10.1 years ago
stoker.neil ▴ 70

I've been using Samtools/BCFtools to identify SNPs in bacterial genomes mapped to a reference. For some uses, that's enough, but for others I want to know the evidence that a variant does NOT exist - and the lack of SNP is not (for example) due to poor data in that region.

I've searched for ways of generating data for all positions, and found (a) that no-one seems to discuss this, and (b) that removing all of the three '-cgv' options when running BCFtools works - and furthermore removing '-e' option gives a slightly different output. But these still aren't ideal - they don't for example call the bases properly, or give a quality score that they are non-variant.

So what I see is eg:

Normal output: (shows a SNP)

strain1234    1834177    .    A    C    180    .    DP=51;VDB=0.0280;AF1=1;AC1=2;DP4=0,0,29,19;MQ=58;FQ=-168    GT:PL:DP:GQ    1/1:213,141,0:48:99

cvg parameters removed: (shows a non-SNP, and the above SNP)

strain1234    1377185    .    C    X    0    .    DP=65;AF1=0    PL:DP    0,196,255:65
strain1234    1834177    .    A    C,G,X    0    .    DP=51;VDB=0.0280;AF1=1    PL:DP    213,141,0,216,128,213,213,141,216,213:48

cvge parameters removed: (shows a non-SNP, and the above SNP)

strain1234    1377185    .    C    X    0    .    DP=65;I16=33,32,0,0,1505,36349,0,0,3659,216179,0,0,1451,34591,0,0    PL:DP    0,196,255:65
strain1234    1834177    .    A    C,G,X    0    .    DP=51;I16=0,0,29,19,0,0,864,17124,0,0,2774,162396,0,0,971,22103;VDB=0.0280    PL:DP    213,141,0,216,128,213,213,141,216,213:48

I wondered if anyone had ideas on this?

• 4.2k views
ADD COMMENT
6
Entering edit mode
10.1 years ago
lh3 33k

Just use "-c":

seq1    547 .   G   .   144 .   DP=40;AF1=0;AC1=0;DP4=16,22,0,0;MQ=60;FQ=-141   PL  0
seq1    548 .   C   A   133 .   DP=39;VDB=0.0921;AF1=0.5;AC1=1;DP4=11,8,4,13;MQ=60;FQ=136;PV4=0.049,0.085,1,1   PL  163,0,192
seq1    549 .   T   .   144 .   DP=41;AF1=0;AC1=0;DP4=18,20,0,0;MQ=60;FQ=-141   PL  0
ADD COMMENT
0
Entering edit mode

That's a nice compromise, simply outputting depth and placement information for the reference allele.

The problem I always had when confronting this was that, when including indels, there are a huge number of possible alleles per position.

ADD REPLY
2
Entering edit mode
10.1 years ago
stoker.neil ▴ 70

Thanks - looking at your answer made me double check, and I found that the post-BCFtools filter I was using vcf_filter.pl) was removing much more than I realised. In fact my original thought - just removing '-v' from BCFtools is all I need to do.

ADD COMMENT
0
Entering edit mode

If your question was answered, please 'accept' the correct answer (so the answerer gets credit), rather than leaving your comment as an answer.

ADD REPLY

Login before adding your answer.

Traffic: 2394 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6