6.2 years ago by
Seattle, WA USA
You could do this with the BEDOPS toolkit.
First, convert the datasets into BED files, e.g.:
$ bam2bed < reads.bam > reads.bed
$ gff2bed < annotations.gff > annotations.gff.bed
$ gtf2bed < annotations.gtf > annotations.gtf.bed
BEDOPS tools are written to work with standard input and output streams, so you can use piping to filter annotations for classes of elements, for instance:
$ gff2bed < annotations.gff | grep -w "gene" > genes.gff.bed
$ gtf2bed < annotations.gtf | grep -w "exon" > exons.gtf.bed
Then do set or map operations on BED files, e.g. to get all reads that overlap annotations, use
$ bedops --element-of -1 reads.bed annotations.gff.bed > justReadsThatOverlapAnnotations.bed
And to get both reads and their associated, overlapping annotations, use
bedmap --echo --echo-map:
$ bedmap --echo --echo-map reads.bed annotations.gtf.bed > readsWithAssociatedAnnotations.bed
Note here that set operations return members of the reference set (or subsets thereof), while map operations return elements from both reference and map sets. The documentation goes into more detail about both operations and arguments for each application.