Let's say I have a bam file(alignment) and gff(annotations of exons).
You can use the BEDOPS
bedmap application and its
--indicator operand or the
--count operand to report any or count all reads which overlap exons.
First, use the BEDOPS conversion utilities to render your inputs into sorted BED files:
$ bam2bed < reads.bam > reads.bed
$ gff2bed < exons.gff > exons.bed
Then perform the indicator function operation:
$ bedmap --echo --indicator exons.bed reads.bed \
| awk -F '|' '($2 == 1)' - \
exons-with-overlapping-reads.bed is a sorted BED file that contains exons that have one or more overlapping reads. The
awk statement just filters for results that meet this condition.
If you want the actual count of reads over an exon, use the
--count operator and skip the
$ bedmap --echo --count exons.bed reads.bed \
exons-with-number-of-overlapping-reads.bed is a sorted BED file that contains exons and the number of reads which overlap each exon (by one or more bases).
--indicator operand is the same as
--count, where the indicator result is
1, or "true" if the mapped-element count is greater than zero, and
0, or "false" otherwise.)
If you're interested in subsets of the exons, use
bedops --range or
grep or other approaches (
awk, Perl, etc.) to filter or adjust the coordinates in your
exons.bed file, as needed. Then just run the
bedmap steps, as previously described.
If you'd like to learn more about this toolkit, I have posted a brief summary of BEDOPS Bedops: The Fastest, Most Scalable, And Easily-Parallelizable Genome Analysis Toolkit!.