6.2 years ago by
If you want to get annotations for every exon/intron/UTR in a reference genome, you can use the UCSC Table Browser.
Here's how to get it done:
- Pick your reference genome under clade/genome/assembly
- Make sure the group is "Genes and Gene Predictions"
- Choose your preferred track (I like to rely on RefSeq and CCDS)
- Choose the table that gives gene information (e.g. for RefSeq, the table you want is refGene)
- Select your region or the entire genome to get coordinates for
- Select BED format as your output format
- Name your output file
- Click "get output"
On the next page, you can restrict the output to coordinates for all exons, coding exons, introns, 5' UTRs, or 3' UTRs (plus flanking sequence if you want). You can download these coordinates however you'd like (I prefer having one file for each genomic feature type), then overlap your mapped sequences with the genomic features using bedtools intersect.
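If it helps to see what that overlap step is doing, here is a minimal pure-Python sketch of the idea (not bedtools itself). On the command line this would be something like `bedtools intersect -a reads.bed -b exons.bed -u`, where `reads.bed` and `exons.bed` are hypothetical file names. The function names and the toy coordinates below are made up for illustration, and the brute-force scan is only sensible for small inputs.

```python
def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines (0-based, half-open)."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield chrom, int(start), int(end)


def reads_overlapping(reads, features):
    """Return the reads that overlap at least one feature by 1 bp or more."""
    by_chrom = {}
    for chrom, start, end in features:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, start, end in reads:
        # Half-open intervals overlap when each one starts before the other ends.
        if any(start < f_end and f_start < end
               for f_start, f_end in by_chrom.get(chrom, ())):
            hits.append((chrom, start, end))
    return hits


# Toy data standing in for a Table Browser exon file and a mapped-read file.
exon_lines = ["chr1\t100\t200", "chr1\t300\t400"]
read_lines = ["chr1\t150\t180", "chr1\t200\t250", "chr2\t100\t150"]

exons = list(parse_bed(exon_lines))
reads = list(parse_bed(read_lines))
exonic_reads = reads_overlapping(reads, exons)
print(exonic_reads)
```

Note that the read at chr1:200-250 is not counted as exonic: BED end coordinates are exclusive, so it only touches the exon ending at 200 without overlapping it.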
To find intergenic regions, you can create a merged BED file of all exons, introns and UTR sequences and look for mapped sequences that overlap NONE of those features using bedtools intersect with the -v option.
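As a rough sketch of the intergenic logic, here is the same idea in plain Python. With bedtools you would typically concatenate and sort the feature files, collapse them with `bedtools merge -i`, and then run `bedtools intersect -a reads.bed -b merged.bed -v` (file names are hypothetical). The function names and example intervals are assumptions for illustration only.

```python
def merge_features(features):
    """Collapse overlapping or book-ended intervals, per chromosome."""
    merged = {}
    for chrom, start, end in sorted(features):
        spans = merged.setdefault(chrom, [])
        if spans and start <= spans[-1][1]:
            # Overlaps or touches the previous span, so extend it.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return merged


def intergenic_reads(reads, merged):
    """Return reads that overlap none of the merged features (like -v)."""
    return [
        (chrom, start, end)
        for chrom, start, end in reads
        if not any(start < m_end and m_start < end
                   for m_start, m_end in merged.get(chrom, ()))
    ]


# Toy exon/intron/UTR intervals and mapped reads (0-based, half-open).
features = [("chr1", 100, 200), ("chr1", 150, 300),
            ("chr1", 300, 350), ("chr1", 500, 600)]
reads = [("chr1", 120, 160), ("chr1", 400, 450), ("chr2", 10, 50)]

merged = merge_features(features)
outside = intergenic_reads(reads, merged)
print(merged)
print(outside)
```

Reads on a chromosome with no annotated features at all (chr2 here) come back as intergenic, which matches how the -v option treats entries with no overlap.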
If you're curious about other ways to use bedtools to analyze your mapped sequences, I've found this site to have the best documentation.