Question: Where To Download Genome Annotation Including Exon, Intron, Utr, Intergenic Information
liran0921 wrote:

Hi Everyone,

This might be a simple question, but it has been bothering me. I have some small RNAs that have been mapped to the genome, and I want to find out where they fall in the genome (exon, intron, UTR, intergenic). So I would like to use a genome annotation that contains this information. I tried Ensembl and the UCSC Genome Browser, but failed to get what I want. Can anybody give me some instructions? Many thanks!

modified 5.4 years ago by Mitch Bekritsky1.1k • written 5.4 years ago by liran0921110

What kind of small RNA are you talking about? UCSC and Ensembl have annotation information for a lot of non-coding RNAs.

written 5.4 years ago by Ashutosh Pandey11k

I got some novel miRNAs which don't have any annotation information. So I want to overlap them with the gene annotation in the genome.

written 5.4 years ago by liran0921110

1) You can try downloading the .gff3 file from miRBase (http://www.mirbase.org/ftp.shtml) for your species of interest. It has coordinates for most of the known miRNAs.

2) You can download the gene annotation information using the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start)

Excuse me if this is not the answer to your question. I may not have fully understood what you want.

written 5.4 years ago by Ashutosh Pandey11k

Thanks. I tried the UCSC Table Browser, but the output GTF file contains only exon coordinates. How can I extract the coordinates of intronic, UTR and intergenic regions?

written 5.4 years ago by liran0921110
Mitch Bekritsky wrote:

If you want to get annotations for every exon/intron/UTR in a reference genome, you can use the UCSC Table Browser.

Here's how to get it done:

  1. Pick your reference genome under clade/genome/assembly
  2. Make sure the group is "Genes and Gene Predictions"
  3. Choose your preferred track (I like to rely on RefSeq and CCDS)
  4. Choose the table that gives gene information (e.g. for RefSeq, the table you want is refGene)
  5. Select your region or the entire genome to get coordinates for
  6. Select BED format as your output format
  7. Name your output file
  8. Click "get output"

On the next page, you will get the option to get coordinates only for all exons, coding exons, introns, 5' UTRs, or 3' UTRs (plus flanking sequence if you want). You can download these coordinates however you'd like (I prefer having one file for each genomic feature type), then overlap your mapped sequences to the genomic features using bedtools' intersect.
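To make the overlap step concrete, here is a minimal pure-Python sketch of the idea behind intersecting mapped sequences with per-feature BED files. It is not bedtools and not a replacement for it; the function names (`parse_bed`, `annotate`) and the toy coordinates are made up for illustration. It assumes BED-style coordinates (0-based start, end not included) and one set of intervals per feature type.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED-formatted lines into (chrom, start, end) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), int(fields[2])))
    return records


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals overlap when each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def annotate(reads, features_by_type):
    """For each read, list the feature types (sorted) that it overlaps."""
    # Group features by chromosome so each read is only compared
    # against features on its own chromosome.
    index = defaultdict(list)
    for ftype, feats in features_by_type.items():
        for chrom, start, end in feats:
            index[chrom].append((start, end, ftype))

    result = []
    for chrom, start, end in reads:
        hits = {
            ftype
            for f_start, f_end, ftype in index[chrom]
            if overlaps(start, end, f_start, f_end)
        }
        result.append(((chrom, start, end), sorted(hits)))
    return result


# Toy data (hypothetical coordinates), one interval list per feature type.
features = {
    "exon": parse_bed(["chr1\t100\t200"]),
    "intron": parse_bed(["chr1\t200\t500"]),
}
reads = parse_bed(["chr1\t120\t142", "chr1\t190\t210", "chr2\t10\t32"])

for read, types in annotate(reads, features):
    print(read, types or ["no overlap"])
```

This linear scan is fine for a quick check on a small set of novel miRNAs; for genome-scale files, bedtools (or an interval-tree library) is the more practical choice.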

To find intergenic regions, you can create a merged BED file of all exons, introns and UTR sequences and look for mapped sequences that overlap NONE of those features using bedtools intersect with the -v option.
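The logic of the merge-then-exclude approach can be sketched in plain Python as follows. This is only an illustration of the idea, not how bedtools is implemented; the function names and coordinates are hypothetical, intervals are assumed to be BED-style (0-based start, end not included), and "intergenic" here simply means "overlaps none of the supplied features".

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(features):
    """Merge overlapping or adjacent (chrom, start, end) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in features:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def is_intergenic(read, merged):
    """True if the read overlaps none of the merged feature intervals."""
    chrom, start, end = read
    intervals = merged.get(chrom, [])
    # Index of the last merged interval that starts before the read ends.
    i = bisect_right(intervals, (end - 1, float("inf"))) - 1
    # No such interval, or it ends at or before the read starts: no overlap.
    return i < 0 or intervals[i][1] <= start


# Toy data (hypothetical coordinates): exons, introns and UTRs pooled together.
all_features = [
    ("chr1", 90, 100),   # UTR
    ("chr1", 100, 200),  # exon
    ("chr1", 200, 500),  # intron
]
merged = merge_intervals(all_features)
reads = [("chr1", 120, 142), ("chr1", 600, 622), ("chr2", 10, 32)]

intergenic = [r for r in reads if is_intergenic(r, merged)]
print(intergenic)
```

Merging first leaves the intervals on each chromosome sorted and disjoint, which is what allows a single binary search per read instead of a comparison against every feature.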

If you're curious about other ways to use bedtools to analyze your mapped sequences, I've found this site to have the best documentation.

modified 5.4 years ago • written 5.4 years ago by Mitch Bekritsky1.1k

Thanks for your answer. It's very helpful!

written 5.4 years ago by liran0921110

My pleasure! I'm glad I was able to help.

written 5.4 years ago by Mitch Bekritsky1.1k

Thanks for the answer, but after clicking "get output" I cannot see the option for getting intronic and/or UTR coordinates, only for exons. Can you share a screenshot, please?

modified 4.6 years ago • written 4.6 years ago by Nathaniel70

Hi Richard,

Sorry for the late reply--holidays and all.  Here's a link to the screenshot of the page I get after I follow steps 1-8 above (sorry it's not embedded here...I've tried embedding images hosted on Dropbox and Google Drive in tiff, jpeg, and png formats, to no avail...).  As you can see, the intron option is right there.  My guess is you may not have selected the correct output format on the table browser page.  Can you screenshot it?  Or link to a screen shot?

 

written 4.6 years ago by Mitch Bekritsky1.1k

Can I do this with Ensembl? I used Ensembl GRCm38 to align my RNA-seq data, and now I want to summarize counts (htseq) over both introns and exons. The Ensembl GTF that I got seems to have transcript and gene features. Does gene = coordinates of exons + introns? Or is it just exons?

written 2.4 years ago by achamess50