Question: Calculate Exon Coverage Using Samtools
gravatar for Cdiez
8.5 years ago by
Cdiez150 wrote:

Hi! I would be very happy if someone could give any advice about SAMtools. I have used "pileup" to calculate the coverage of each position of my reference genome by mRNAseq reads. Now I have a list of exons in bed format and I would like to count the number of times each position of the exon is covered. Does SAMtools have any tool able to do it?

samtools coverage pileup • 18k views
ADD COMMENTlink written 8.5 years ago by Cdiez150
gravatar for Aaronquinlan
8.5 years ago by
United States
Aaronquinlan11k wrote:

The samtools "view" option now has a -F argument that reports reads aligned to entries in a BED file. For eaxmple:

samtools view -F genes.bed aln.bam

However, to my knowledge, it will not report the alignment coverage across the BED entries. BEDTools has a utility called "coverageBed" that will do this. For example:

coverageBed -abam aln.bam -b exons.bed > exons.bed.coverage

You can combine samtools and bedtools if you want to only count certain types of reads. For example, only counting "proper pairs" with a mapping quality >=30:

samtools view -u -q 30 -f 0x2 aln.bam | \
   coverageBed -abam stdin -b exons.bed \
   > exons.bed.proper.q30.coverage

Also, this has been asked in many different ways before. See these threads for other options:

Lastly, as you are using RNAseq, you likely will have "spliced" alignments and on;y want to count coverage at exons. In this case, use the "-split" option in coverageBed.

coverageBed -split -abam aln.bam -b exons.bed > exons.bed.split.coverage
ADD COMMENTlink modified 6.9 years ago by seidel6.9k • written 8.5 years ago by Aaronquinlan11k

Thanks for your answer, BEDtools is a very useful package! However, I can not use the tool coverageBED because it is not counting twice the positions that are covered by two reads and I need these information in order to have an idea of the level of expression of each exon. It's a pity, I thought my problem was solved using BEDtools.

ADD REPLYlink written 8.5 years ago by Cdiez150

Are you sure? It certainly should be. Email with an example, and I'd be glad to help debug your problem.

ADD REPLYlink modified 4 months ago by RamRS25k • written 8.5 years ago by Aaronquinlan11k

Thanks! According to the BEDtools manual the coverage is calculated as the number of positions of the reference sequences (in my case exons) covered by the elements included in a second file (in my case mRNAseq alignment.bam). But if a position of the exon is covered by two reads this is not taken into account (see figure in pg. 60 of BEDtools manual) and this issue is really important if we want to know the level of expression calculating the total coverage/bp of each exon. I have sent an example to the email address you indicated me. Thanks a lot for your help!

ADD REPLYlink modified 4 months ago by RamRS25k • written 8.5 years ago by Cdiez150
gravatar for Swbarnes2
8.5 years ago by
Swbarnes21.5k wrote:

Stand alone? I don't think so. You can pull the data out of the pileup, and process it with a script.

BEDTools has something closer to what you want. Give it the bam file, a bed file, and coverageBed will tell you for each element in the bed file how many letters have how much coverage. You aligned to genome, and have a bed of exons, or you aligned to spliced transripts, and have exons of that? Things are going to be strange at junctions, of course.

ADD COMMENTlink written 8.5 years ago by Swbarnes21.5k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2214 users visited in the last hour