Question: Discrepancy In Samtools Mpileup/Depth And Bedtools Genomecoveragebed Counts
7
gravatar for Surya Saha
6.7 years ago by
Surya Saha260
NY
Surya Saha260 wrote:

I am getting different counts for the number of bases on reference covered by aligned reads using samtools depth/mpileup and BEDTools genomeCoverageBed commands. I am using samtools-0.1.19 and bedtools-2.17.0

samtools mpileup -ABQ0 -d10000000 -f ref.fas qry.bam > qry.mpileup
samtools depth -q0 -Q0 qry.bam > qry.depth

genomeCoverageBed -ibam qry.bam -g ref.genome -dz > qry.dz
wc -l qry.[dm]*
  1026779 qry.depth
  1027173 qry.dz
  1026779 qry.mpileup

Any ideas? Thanks

bedtools samtools • 13k views
ADD COMMENTlink modified 6.1 years ago by plafdesigns30 • written 6.7 years ago by Surya Saha260

If there are any spliced (CIGAR ~ /N/) alignments in your BAM file, bedtools will count the bases in the spiced region by default. To prevent this, use the -split option.

ADD REPLYlink written 6.7 years ago by Aaronquinlan11k

Thanks for the prompt response! Unfortunately, using -split did not change the counts.

ADD REPLYlink written 6.7 years ago by Surya Saha260

I think you should follow Heng's advice and look at discrepant positions with IGV or the like. I did this myself and found positions where bedtools reports coverage that is higher than samtools. When I look in IGV, the alignments support the bedtools counts, but that is an N=1 experiment. There may be cases where the opposite is true. I will try to look more closely at this soon.

ADD REPLYlink modified 6.7 years ago • written 6.7 years ago by Aaronquinlan11k

Glad to hear that you have seen something similar. See my reply to Heng's comment below. I have seen this in about 80% of my mapping runs this week using tmap, bwa mem, bowtie2, bwasw and bwa aln. Interestingly, the counts were same for only bwa aln produced SAM files.

ADD REPLYlink written 6.7 years ago by Surya Saha260

You can extract sites unique to bedtools and check the alignment in a BAM viewer.

ADD REPLYlink written 6.7 years ago by lh331k

I did check the counts in Tablet/Bamview and it looks like samtools depth is ignoring some sites and under-counting others. Do I need to tweak the parameters? Thanks

ADD REPLYlink written 6.7 years ago by Surya Saha260
3
gravatar for plafdesigns
6.1 years ago by
plafdesigns30
plafdesigns30 wrote:

This happened to me too, but after playing around with both I realised that it's because samtools mpileup counted in properly mapped only, while bedtools used all the mappings.

Solution: filter the .bam file first using samtools view -b -f 2 to get only mappings with properly mapped reads, then pass that output to bedtools. You will get exactly the same results for both of the methods.

ADD COMMENTlink written 6.1 years ago by plafdesigns30
1

The discrepancy in my case seems to be due to secondary alignments. samtools view -b -F 0x100 whatever.bam | bedtools coverage -abam stdin ... agrees with samtools depth. I want to count those secondary alignements though, so samtools view | bedtools coverage seems to be the most proper way to go.

ETA: samtools view -u ... | bedtools coverage ... appears to work fine too, despite what the docs implies

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by travcollier160
1
gravatar for nlomioni
6.7 years ago by
nlomioni30
United Kingdom
nlomioni30 wrote:

A guess: I know that samtools outputs only positions with depth > 0. Might it be that genomeCoverageBed instead outputs a line for every position in the reference? If your reference is 1,027,173 bases long then this is likely.

ADD COMMENTlink written 6.7 years ago by nlomioni30

Thanks but -dz only reports >0 counts. See http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html

ADD REPLYlink modified 6.7 years ago • written 6.7 years ago by Surya Saha260
1
gravatar for Matt Shirley
6.7 years ago by
Matt Shirley9.2k
Cambridge, MA
Matt Shirley9.2k wrote:

There might be a couple of things going on here. The first is that I think you are using an outdated version of BEDTools - I can't find genomeCoverageBed anywhere in the current git repository, and I think it has been superseded by coverage. In any case, you should take a look at the -split option in the documentation for coverage as I suspect that your reads have insertions and deletions, and that BEDTools is not correctly splitting the CIGAR string of these reads to produce coverage in the way that samtools does. This could result in a different number of bases in your reads "covering" the bases in the alignment, and a different number of lines in your output.

Edit: Looks like lh3 has the answer.

ADD COMMENTlink modified 6.7 years ago • written 6.7 years ago by Matt Shirley9.2k

Thanks for the reply but genomeCoverageBed is there as an alias for genomecov. And this is the latest version 2.17.0. See http://bedtools.readthedocs.org/en/latest/content/tools/genomecov.html

ADD REPLYlink written 6.7 years ago by Surya Saha260

That's right, the genomeCoverageBed command is just a shell script that is built when you install bedtools. It merely calls the new tool "bedtools genomecov".

ADD REPLYlink written 6.7 years ago by Aaronquinlan11k
1
gravatar for Aaronquinlan
6.7 years ago by
Aaronquinlan11k
United States
Aaronquinlan11k wrote:

Based on my comments above, I looked more closely into this. I ran the exact commands you used on a test file I have. It is never the case that the samtools depth is reported to be higher than bedtools. When discrepant, bedtools is always higher.

Here is an example. In this case, bedtools reports a coverage of 5, whereas samtools reports coverage of 1. Every other case I check out has the same pattern. I actually doubt there is a bug in the samtools logic for counting depth. I suspect that instead, the parameters you are using aren't behaving exactly the way you expect, and as such, some alignments are just being filtered out before being counted.

ADD COMMENTlink modified 6.7 years ago • written 6.7 years ago by Aaronquinlan11k
1

Samtools skips secondary alignments and aberrant read pairs. It is possible to change the default in the code, but not possible at command line. Mpileup has -A, but it turns out to be useless (a bug). If my guess is correct, 4 of reads in your example do not have bit 2 flagged.

ADD REPLYlink written 6.7 years ago by lh331k
2

Yep, in that example 4 reads are duplicates. bedtools just counts whatever you have in your BAM file so long as it is mapped. Filtering duplicates, discordants, etc. is expected to be done ahead of time with samtools, Picard, or bamtools. I think this solves the discrepancy.

ADD REPLYlink modified 6.7 years ago • written 6.7 years ago by Aaronquinlan11k

I have found exactly the same after exploring a few cases by eye. The coverage count was less for mpileup/depth even when the base count was equal to genomeCoverageBed. I will go with the genomeCoverageBed counts. Thank you for looking into it.

ADD REPLYlink written 6.7 years ago by Surya Saha260
0
gravatar for volavii
6.6 years ago by
volavii0
volavii0 wrote:

hi:)

i have a question regarding the depth calculated by samtools, when there is a deletion in the read.

scaffold79    978    N    5    g$gggg    HGHGH
scaffold79    979    N    4    aaaa    HHGH
scaffold79    980    N    4    g-1ng-1ng-1ng-1n    HH?H
scaffold79    981    N    4    ****    HHEH
scaffold79    982    N    4    tttt    HHEH
scaffold79    983    N    4    tttt    HHGH
scaffold79    984    N    4    tttt    HHGD

at position 980 it is indicated, that here is a deletion in 4 reads at the following position. So these 4 reads do not really cover the next position 981. Should i check for these positions and correct the depth to gain the real coverage, or is this the accepted way to calculate the coverage?

Many thanks in advance!:)

ADD COMMENTlink written 6.6 years ago by volavii0

You should start a new question for future queries like this, but the answer is "it depends". If you're looking for base-by-base depth, then yes, the coverage at that position would be 1. If you're calculating genomic coverage for sequencing metrics, you wouldn't penalize that position for having a deletion that was adequately covered by reads.

ADD REPLYlink written 6.6 years ago by Chris Miller21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1061 users visited in the last hour