Question: samtools bedcov vs. bedtools coverage
10
gravatar for Dataman
2.7 years ago by
Dataman260
Finland
Dataman260 wrote:

Hi, currently, I am trying to calculate the coverage for my genes of interest (i.e. number of reads falling in or overlapping by the gene of interest) from DNA-seq data in a BAM file. Genes of interest are annotated in a BED file.

There are several answers on Biostar suggesting to use
bedtools coverage e.g., bedtools coverage -abam sample.bam -b exons.bed -counts. However, while checking the Samtools' manual, I stumbled upon samtools bedcov with the following description:

read depth per BED region. Reports read depth per genomic region, as specified in the supplied BED file.

which can be used like this: samtools bedcov gene.bed sample.bam.

I have tested these two and I am getting different results! For instance, for OR4F5gene I get the following results (using the same BAM file for both):

chr1    69091   70008   "OR4F5" 61 #from bedtools coverage

and

chr1    69091   70008   "OR4F5" 4714  #from  samtools bedcov

where 61 is very different than 4714!

Another observation is that the result from Samtools is instantaneous for the above test while Bedtools takes a lot of time to produce the result. The speed itself has led me to think of using samtools bedcov. However, I was wondering whether there is catch that I am overlooking! I tried to find more information about samtools bedcov but I was not able to find anything more than that one-liner description.

I would be happy to hear about your feedback on this.

[EDIT]: This is how it looks like in IGV. It seems that Bedtools results is closer to reality. visualization in the IGV

coverage dna-seq • 13k views
ADD COMMENTlink modified 5 months ago by RamRS20k • written 2.7 years ago by Dataman260
4

That's peculiar. My gut feeling says the samtools bedcov is reporting the total number of reads within a region. However testing samtools view BAM <region> | wc -l gives me different answers. I'm not familiar with bedtools coverage but the value of 67 seems close to high coverage libraries.

Another option using samtools that I know works. samtools depth will give you per base pair depth of coverage. You can take the median (or mean) coverage within a locus. If you're parsing regions that are large (>10kb) you can obtain the read counts samtools view BAM <region> | wc -l and then determine the coverage by this equation (READ COUNTS / BASES SPANED) * AVERAGE READ LENGTH

Using either samtools view or samtools depth you can also limit the reads to ones with mapping qualities greater than 10 (or whatever value you like). You can also limit the read length in samtools depth

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by QVINTVS_FABIVS_MAXIMVS2.2k

Thanks for your reply! bedtools coverage does not provide option for filtering based on mapping quality. The value of 61 is achieved when one counts all mapped reads. If one uses your first approach and also filters reads with mapping score below 10, then the result would be 6. I have now noticed that bedtools multicov can also be used for the same purpose with support for filtering based on mapping quality. Also it only counts non duplicate reads.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Dataman260
1

How does the data looks in IGV in that region ? Can you post the exact commands ?

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by geek_y9.1k

Let me check! I'll get back to you asap!

ADD REPLYlink written 2.7 years ago by Dataman260

Now, I have edited the post with an image from the IGV.

ADD REPLYlink written 2.7 years ago by Dataman260
6
gravatar for Devon Ryan
2.7 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

The description that goes with samtools bedcov is wrong (I'll create an issue on github). It returns the sum of per-base coverage in each region. I'm not sure how that would be useful, but that's what it is.

Edit: Here's the issue to get the description fixed.

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Devon Ryan88k
2

Thank you! I had a hunch if I did samtools depth and added up the coverages it would equal the output.

ADD REPLYlink written 2.7 years ago by QVINTVS_FABIVS_MAXIMVS2.2k

Yup, your hunch nailed it. I should note that samtools bedcov (like samtools depth) skips alignments marked as duplicates, QC fail, etc. I'm pretty sure bedtools doesn't do that.

ADD REPLYlink written 2.7 years ago by Devon Ryan88k
1

Thank you for your response. It seems that bedtools multicov (see here) can take care of duplicates and QC-failed reads and also filtering based on mapping quality.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Dataman260
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 892 users visited in the last hour