Question: Samtools mpileup quality scores all zero
2
gravatar for fwuffy
4.4 years ago by
fwuffy90
Michigan, USA
fwuffy90 wrote:

Hi everyone.

I'm calling some variants using samtools (and bcftools) from a bwa aligned and sorted BAM.

For some reason, the samtools mpileup is reporting all zero quality scores, but I know the base and read quality scores in the BAM are good (viewed in IGV)

samtools mpileup -uvB -t DP -f ref.fa -r chrX:48,902,600-48,902,700 mapped_sorted.bam


gives me


CHROM    POS    ID    REF    ALT    QUAL
chrX    48902600    .    C    <X>    0

...
one for every base in the specified range

here's a variant position:
chrX    48902688    .    A    C,<X>    0

Base phred quality scores as reported in IGV range from 20-40 and read mapping quality is all 60.

Anyone run into this before?

Thanks!

snp alignment next-gen • 4.6k views
ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by fwuffy90
1

what is the meaning of options -t DP and -v in mpileup command?

 

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Nicola Casiraghi450
1

-v, --VCF               generate genotype likelihoods in VCF format

-t, --output-tags LIST  optional tags to output: DP,DPR,DV,DP4,INFO/DPR,SP []

Neither affect the reported quality score

 

ADD REPLYlink written 4.4 years ago by fwuffy90
1

what version samtools are you using? (and have you tried updating to the latest version?)

ADD REPLYlink written 4.4 years ago by Ying W3.9k
1

 

1.131-25 which is the latest. Just downloaded.

Actually it may be 1.1 only, will have to verify this ... thanks.

*edit:   It's 1.1.?. I'm going to try the develop branch

 

samtools --version

samtools 1.1
Using htslib 1.1
Copyright (C) 2014 Genome Research Ltd.

 

This is mac os x, and compiling from source is throwing some errors. I think this is a pre-compiled version for os x

 

Not sure if that includes exact minor version or not

 

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by fwuffy90
1

Hello fwuffy!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=59422

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 4.4 years ago by Pierre Lindenbaum124k
1

Ok.
 

ADD REPLYlink written 4.4 years ago by fwuffy90
1

You may want to make a small BAM file of just this region and then send an email to the samtools list. This looks a bit like a bug, given the MAPQ and Phred scores you have.

ADD REPLYlink written 4.4 years ago by Devon Ryan92k
3
gravatar for Kirill
4.4 years ago by
Kirill270
Australia
Kirill270 wrote:

I also spent last week or so trying to work through the project on calling variants and identifying causative gene (plants have been selected through phenotype. haven't finished the projects yet). I don't understand exactly what happens behind the scene, but samtools mpileup doesn't call the variants ! you need to use bcftool call to call variants. Here is some what helpful link http://samtools.sourceforge.net/mpileup.shtml Except be wary because it is out of date, bcftools have changes since that ti,me and you need to use bcftools call instead of view !! Here is what I did to get my VCF files with quality scores    

for i in Bams/*.bam; do (samtools mpileup -g -f reference.fa $i | bcftools call -c -v - > calledVCF/$(basename $i .bam).vcf)& done                       

 

ADD COMMENTlink modified 4.4 years ago • written 4.4 years ago by Kirill270

Yes, I'm piping the output into bcftools as well... Can't do any quality filtering without the quality scores though.

Edit: Thanks for that info about bcftools, I also noticed bcftools view was giving strange output (outputting a call every 11 rows) and bcftools call is now filtering out all those bogus positions just fine. Obviously something I'm missing or confused about regarding the QUAL column all being zeros.

mpileup  reports DP=21 as the number of high quality bases supporting the above variant, so it knows about the base quality, but still reports all zeros in the QUAL column, which seems wrong.

mpileup output for the above variant position, and the preceding position which has no variant.

 

 

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    mapped_sorted.bam

....

chrX    48902687    .    G    <X>    0    .    DP=18;I16=13,3,0,0,575,22405,0,0,960,57600,0,0,306,6852,0,0;QS=1,0;MQSB=1;MQ0F=0    PL:DP    0,48,255:16


chrX    48902688    .    A    C,<X>    0    .    DP=21;I16=0,0,16,2,0,0,653,25421,0,0,1080,64800,0,0,287,6367;QS=0,1,0;VDB=0.434613;SGB=-0.691153;MQSB=1;MQ0F=0    PL:DP    255,54,0,255,54,255:18

 

Also I edited my post to reflect that I'm using bcftools as well. Thanks.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by fwuffy90

I'm accepting this answer because even though I have not figured out why mpileup was reporting zeros (turned out it was not just on chrX), bcftools call was able to find the quality scores somewhere and puts them in the QUAL column.

Thanks everyone for your help.

ADD REPLYlink written 4.4 years ago by fwuffy90

That's exactly what I found. I also could not understand how bcftools call did it. I was going to mention my samtools version is 

samtools 1.2
Using htslib 1.2
Copyright (C) 2015 Genome Research Lt

And I do get DP tag in my VCF without specifying it like you did. Maybe try to simplify your mpile up options. I feel -t and -uv might be redundant information (not that it should effect the final result).

by the way my next step was to find intersect between several VCF files.  I used bcftools isec. I know this isn't part of this discussion but it be very helpful to get a sense what others are using. (like I said this is my very first project on variants calling)  

ADD REPLYlink written 4.4 years ago by Kirill270
0
gravatar for Ying W
4.4 years ago by
Ying W3.9k
South San Francisco, CA
Ying W3.9k wrote:

Could you check your base phred quality? are they all 33? This number corresponds to a low quality read: http://en.wikipedia.org/wiki/FASTQ_format#Encoding

ADD COMMENTlink written 4.4 years ago by Ying W3.9k

They range from 20-40 at the above SNP locus. The data is from 1000 genomes.

Even if they are low, I suspect mpileup should not be reporting all zeros.

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by fwuffy90

could you double check that your reference is correct? (using g1k reference and not the ucsc reference). The 1000g reference has chromosome '1', '2', '3', while the ucsc ones have chromosome 'chr1', 'chr2', 'chr3'. In igv, there is a special 1kg, b37 reference. You might also want to try posting over at the samtools mailing list: https://lists.sourceforge.net/lists/listinfo/samtools-help

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by Ying W3.9k

I am using the UCSC hg38 reference for mpileup.  Same one I used for mapping and sorting with BWA. Will check into that chr thing further.Thanks.

ADD REPLYlink written 4.4 years ago by fwuffy90

if you re-mapped it then that shouldn't be an issue, i would post on the mailing list with a small bam file that can reproduce this issue.

ADD REPLYlink written 4.4 years ago by Ying W3.9k

I emailed the list.

I didn't re-map the reference, I just used the hg38 reference to map and sort my sample FastQs. Also not sure why but this only seems to be happening on X chromosome.  other chrs are reporting normal QUAL in the output.  I'm running the pipe now on the entire sample, and will scan over to see if Y is also affected. It could have something to do with ploidy of sex chromosomes... just a guess at this point.

Thanks

 

ADD REPLYlink modified 4.4 years ago • written 4.4 years ago by fwuffy90
2

It might be helpful to note if this only occurs in the pseudoautosomal region on X, which is shared with Y, and leads to ambiguously-mapped reads.

ADD REPLYlink written 4.4 years ago by Brian Bushnell16k
1

Sorry that was just confusion on my part. mpileup is actually giving zeros everywhere. Still haven't resolved that question, but bcftools was able to put quality scores in the QUAL column. Not sure where its getting those numbers from, but i'll take it.

ADD REPLYlink written 4.4 years ago by fwuffy90
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1256 users visited in the last hour