Incorrect Reference Base In 'Samtools Mpileup' Output
2
1
Entering edit mode
11.9 years ago

Hello, I'm trying to call variants using samtools and bcftools and (for unknown reasons) I get incorrect base in the REF column of the resulting VCF file; example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  ../algn/algn_400K.sorted.bam
...    226300  .       T       C       119     .       DP=465;VDB=0.0418;AF1=0.5;AC1=1;DP4=135,80,131,105;MQ=20;FQ=105;PV4=0.13,0.27,1,1       GT:PL:GQ    0/1:149,0,132:99

But when I check the above position in the reference, the actual base seems to be G:

In [18]: from Bio import SeqIO
In [19]: [ref] = SeqIO.parse(open("MG1655-K12.fasta"), "fasta")
In [20]: ref[226300] 
Out[20]: 'G'

Here's a summary of what I did to obtain the VCF:

$ samtools mpileup -uf MG1655-K12.fasta algn_400K.sorted.bam > 400K.raw.bcf
$ bcftools view -bvcg 400K.raw.bcf > 400K.vcf

Any hints on why this might happen?

samtools bcftools vcf • 4.4k views
ADD COMMENT
5
Entering edit mode
11.9 years ago

The bases in the VCF are numbered starting with 1 as the first base, NOT '0' and I suppose that SeqIO uses '0' for the first base, NOT 1.

ADD COMMENT
1
Entering edit mode

Thanks, that's exactly the case!

In [23]: ref[226299]
Out[23]: 'T'
ADD REPLY
0
Entering edit mode
11.9 years ago
Stephwen ▴ 160

Perhaps we could further complete the information present in this table : http://alternateallele.blogspot.de/2012/03/genome-coordinate-cheat-sheet.html and save it on some wiki somewhere (seqanswers?), as this kind of question seems to arise periodically.

ADD COMMENT
2
Entering edit mode

Perhaps a better solution would be to unify the tools and make the table smaller? ;) Gregor

ADD REPLY

Login before adding your answer.

Traffic: 2558 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6