Calculate Exon Coverage Using Samtools
2
9
Entering edit mode
11.3 years ago
Cdiez ▴ 150

Hi! I would be very happy if someone could give any advice about SAMtools. I have used "pileup" to calculate the coverage of each position of my reference genome by mRNAseq reads. Now I have a list of exons in bed format and I would like to count the number of times each position of the exon is covered. Does SAMtools have any tool able to do it?

pileup coverage samtools • 21k views
27
Entering edit mode
11.3 years ago

The samtools "view" option now has a -F argument that reports reads aligned to entries in a BED file. For eaxmple:

samtools view -F genes.bed aln.bam


However, to my knowledge, it will not report the alignment coverage across the BED entries. BEDTools has a utility called "coverageBed" that will do this. For example:

coverageBed -abam aln.bam -b exons.bed > exons.bed.coverage


You can combine samtools and bedtools if you want to only count certain types of reads. For example, only counting "proper pairs" with a mapping quality >=30:

samtools view -u -q 30 -f 0x2 aln.bam | \
coverageBed -abam stdin -b exons.bed \
> exons.bed.proper.q30.coverage


Also, this has been asked in many different ways before. See these threads for other options:

Lastly, as you are using RNAseq, you likely will have "spliced" alignments and on;y want to count coverage at exons. In this case, use the "-split" option in coverageBed.

coverageBed -split -abam aln.bam -b exons.bed > exons.bed.split.coverage

1
Entering edit mode

Thanks for your answer, BEDtools is a very useful package! However, I can not use the tool coverageBED because it is not counting twice the positions that are covered by two reads and I need these information in order to have an idea of the level of expression of each exon. It's a pity, I thought my problem was solved using BEDtools.

1
Entering edit mode

Are you sure? It certainly should be. Email bedtools-discuss@googlegroups.com with an example, and I'd be glad to help debug your problem.

1
Entering edit mode

Thanks! According to the BEDtools manual the coverage is calculated as the number of positions of the reference sequences (in my case exons) covered by the elements included in a second file (in my case mRNAseq alignment.bam). But if a position of the exon is covered by two reads this is not taken into account (see figure in pg. 60 of BEDtools manual) and this issue is really important if we want to know the level of expression calculating the total coverage/bp of each exon. I have sent an example to the email address you indicated me. Thanks a lot for your help!

1
Entering edit mode
11.3 years ago
Swbarnes2 ★ 1.5k

Stand alone? I don't think so. You can pull the data out of the pileup, and process it with a script.

BEDTools has something closer to what you want. Give it the bam file, a bed file, and coverageBed will tell you for each element in the bed file how many letters have how much coverage. You aligned to genome, and have a bed of exons, or you aligned to spliced transripts, and have exons of that? Things are going to be strange at junctions, of course.