Hi! I would be very happy if someone could give any advice about SAMtools. I have used "pileup" to calculate the coverage of each position of my reference genome by mRNAseq reads. Now I have a list of exons in bed format and I would like to count the number of times each position of the exon is covered. Does SAMtools have any tool able to do it?
The samtools "view" option now has a -F argument that reports reads aligned to entries in a BED file. For eaxmple:
samtools view -F genes.bed aln.bam
However, to my knowledge, it will not report the alignment coverage across the BED entries. BEDTools has a utility called "coverageBed" that will do this. For example:
coverageBed -abam aln.bam -b exons.bed > exons.bed.coverage
samtools view -u -q 30 -f 0x2 aln.bam | \ coverageBed -abam stdin -b exons.bed \ > exons.bed.proper.q30.coverage
Also, this has been asked in many different ways before. See these threads for other options:
- How Many Reads In A Bam File Are Aligned To An Arbitrary Region
- Intersecting A Set Of Bam Reads With A Set Of Coordinates
- Ngs And Coverage Of Exons
Lastly, as you are using RNAseq, you likely will have "spliced" alignments and on;y want to count coverage at exons. In this case, use the "-split" option in coverageBed.
coverageBed -split -abam aln.bam -b exons.bed > exons.bed.split.coverage
Stand alone? I don't think so. You can pull the data out of the pileup, and process it with a script.
BEDTools has something closer to what you want. Give it the bam file, a bed file, and coverageBed will tell you for each element in the bed file how many letters have how much coverage. You aligned to genome, and have a bed of exons, or you aligned to spliced transripts, and have exons of that? Things are going to be strange at junctions, of course.