Samtools mpileup quality scores all zero
2
2
Entering edit mode
6.4 years ago
fwuffy ▴ 100

Hi everyone.

I'm calling some variants using samtools (and bcftools) from a bwa aligned and sorted BAM.

For some reason, the samtools mpileup is reporting all zero quality scores, but I know the base and read quality scores in the BAM are good (viewed in IGV)

samtools mpileup -uvB -t DP -f ref.fa -r chrX:48,902,600-48,902,700 mapped_sorted.bam

gives me

CHROM    POS    ID    REF    ALT    QUAL
chrX    48902600    .    C    <X>    0

...
one for every base in the specified range

here's a variant position:
chrX    48902688    .    A    C,<X>    0

Base phred quality scores as reported in IGV range from 20-40 and read mapping quality is all 60.

Anyone run into this before?

Thanks!

next-gen SNP alignment • 5.9k views
1
Entering edit mode

what is the meaning of options -t DP and -v in mpileup command?

1
Entering edit mode

-v, --VCF               generate genotype likelihoods in VCF format

-t, --output-tags LIST  optional tags to output: DP,DPR,DV,DP4,INFO/DPR,SP []

Neither affect the reported quality score

1
Entering edit mode

what version samtools are you using? (and have you tried updating to the latest version?)

1
Entering edit mode

Actually it may be 1.1 only, will have to verify this ... thanks.

*edit:   It's 1.1.?. I'm going to try the develop branch

samtools --version

samtools 1.1
Using htslib 1.1
Copyright (C) 2014 Genome Research Ltd.

This is mac os x, and compiling from source is throwing some errors. I think this is a pre-compiled version for os x

Not sure if that includes exact minor version or not

1
Entering edit mode

Hello fwuffy!

This is typically not recommended as it runs the risk of annoying people in both communities.

1
Entering edit mode

Ok.

1
Entering edit mode

You may want to make a small BAM file of just this region and then send an email to the samtools list. This looks a bit like a bug, given the MAPQ and Phred scores you have.

3
Entering edit mode
6.4 years ago

I also spent last week or so trying to work through the project on calling variants and identifying causative gene (plants have been selected through phenotype. haven't finished the projects yet). I don't understand exactly what happens behind the scene, but samtools mpileup doesn't call the variants ! you need to use bcftool call to call variants. Here is some what helpful link http://samtools.sourceforge.net/mpileup.shtml Except be wary because it is out of date, bcftools have changes since that ti,me and you need to use bcftools call instead of view !! Here is what I did to get my VCF files with quality scores

for i in Bams/*.bam; do (samtools mpileup -g -f reference.fa $i | bcftools call -c -v - > calledVCF/$(basename \$i .bam).vcf)& done

0
Entering edit mode

Yes, I'm piping the output into bcftools as well... Can't do any quality filtering without the quality scores though.

Edit: Thanks for that info about bcftools, I also noticed bcftools view was giving strange output (outputting a call every 11 rows) and bcftools call is now filtering out all those bogus positions just fine. Obviously something I'm missing or confused about regarding the QUAL column all being zeros.

mpileup  reports DP=21 as the number of high quality bases supporting the above variant, so it knows about the base quality, but still reports all zeros in the QUAL column, which seems wrong.

mpileup output for the above variant position, and the preceding position which has no variant.

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    mapped_sorted.bam

....

chrX    48902687    .    G    <X>    0    .    DP=18;I16=13,3,0,0,575,22405,0,0,960,57600,0,0,306,6852,0,0;QS=1,0;MQSB=1;MQ0F=0    PL:DP    0,48,255:16

chrX    48902688    .    A    C,<X>    0    .    DP=21;I16=0,0,16,2,0,0,653,25421,0,0,1080,64800,0,0,287,6367;QS=0,1,0;VDB=0.434613;SGB=-0.691153;MQSB=1;MQ0F=0    PL:DP    255,54,0,255,54,255:18

Also I edited my post to reflect that I'm using bcftools as well. Thanks.

0
Entering edit mode

I'm accepting this answer because even though I have not figured out why mpileup was reporting zeros (turned out it was not just on chrX), bcftools call was able to find the quality scores somewhere and puts them in the QUAL column.

0
Entering edit mode

That's exactly what I found. I also could not understand how bcftools call did it. I was going to mention my samtools version is

samtools 1.2
Using htslib 1.2
Copyright (C) 2015 Genome Research Lt

And I do get DP tag in my VCF without specifying it like you did. Maybe try to simplify your mpile up options. I feel -t and -uv might be redundant information (not that it should effect the final result).

by the way my next step was to find intersect between several VCF files.  I used bcftools isec. I know this isn't part of this discussion but it be very helpful to get a sense what others are using. (like I said this is my very first project on variants calling)

0
Entering edit mode
6.4 years ago
Ying W ★ 4.1k

Could you check your base phred quality? are they all 33? This number corresponds to a low quality read: http://en.wikipedia.org/wiki/FASTQ_format#Encoding

0
Entering edit mode

They range from 20-40 at the above SNP locus. The data is from 1000 genomes.

Even if they are low, I suspect mpileup should not be reporting all zeros.

0
Entering edit mode

could you double check that your reference is correct? (using g1k reference and not the ucsc reference). The 1000g reference has chromosome '1', '2', '3', while the ucsc ones have chromosome 'chr1', 'chr2', 'chr3'. In igv, there is a special 1kg, b37 reference. You might also want to try posting over at the samtools mailing list: https://lists.sourceforge.net/lists/listinfo/samtools-help

0
Entering edit mode

I am using the UCSC hg38 reference for mpileup.  Same one I used for mapping and sorting with BWA. Will check into that chr thing further.Thanks.

0
Entering edit mode

if you re-mapped it then that shouldn't be an issue, i would post on the mailing list with a small bam file that can reproduce this issue.

0
Entering edit mode

I emailed the list.

I didn't re-map the reference, I just used the hg38 reference to map and sort my sample FastQs. Also not sure why but this only seems to be happening on X chromosome.  other chrs are reporting normal QUAL in the output.  I'm running the pipe now on the entire sample, and will scan over to see if Y is also affected. It could have something to do with ploidy of sex chromosomes... just a guess at this point.

Thanks

2
Entering edit mode

It might be helpful to note if this only occurs in the pseudoautosomal region on X, which is shared with Y, and leads to ambiguously-mapped reads.

1
Entering edit mode

Sorry that was just confusion on my part. mpileup is actually giving zeros everywhere. Still haven't resolved that question, but bcftools was able to put quality scores in the QUAL column. Not sure where its getting those numbers from, but i'll take it.