Question: 5' UTR and 3' UTR of the Nile Tilapia Genome (Oreochromis niloticus)
wanziyi89 (Singapore, Temasek Life Sciences Laboratory) wrote, 5.7 years ago:

Dear all,

I am looking for 5' UTR and 3' UTR annotations for the tilapia genome. I went to the UCSC Table Browser to try to download BED files of these UTRs. I tried track: RefSeq Genes, table: refGene and downloaded the BED file. Very few UTRs were annotated (about 5 per chromosome), so I believe the genome sequencing team did not annotate the UTRs.

Then I tried track: RefSeq Genes, table: xenoRefGene and downloaded the BED files. This time the Table Browser gave me a full list of genes annotated on the reference genome, but every UTR extended exactly 200 bp from the start codon: all 5' UTRs were 200 bp long.

  • I'm a little confused now: can this dataset be used? If no 5' UTRs are annotated, can we arbitrarily define the 200 bp upstream of the start codon as the 5' UTR, and likewise the 200 bp downstream of the stop codon as the 3' UTR?
  • If it can't be used, how should I go about finding out whether my sequencing hits of interest fall in UTR regions?
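Once a trustworthy UTR annotation is available as a BED file, checking hits against it is an interval-overlap test. For genome-scale data `bedtools intersect` is the usual tool; the minimal Python sketch below only illustrates the logic, assuming hits are given as (chromosome, 0-based position) pairs. The function names and the `LG1` records are hypothetical.

```python
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end, name), ...]}.

    BED coordinates are 0-based and half-open: a record covers [start, end).
    """
    intervals = defaultdict(list)
    for line in lines:
        # Skip blank lines, comments and UCSC header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else "."
        intervals[fields[0]].append((int(fields[1]), int(fields[2]), name))
    return intervals


def utr_hits(hits, intervals):
    """Map each (chrom, 0-based position) hit to the names of intervals containing it."""
    return {
        (chrom, pos): [name for start, end, name in intervals.get(chrom, [])
                       if start <= pos < end]
        for chrom, pos in hits
    }


if __name__ == "__main__":
    # Hypothetical UTR records and hit positions, for illustration only
    bed = ["LG1\t100\t300\tgeneA_5utr", "LG1\t900\t1200\tgeneA_3utr"]
    print(utr_hits([("LG1", 150), ("LG1", 500)], load_bed(bed)))
    # {('LG1', 150): ['geneA_5utr'], ('LG1', 500): []}
```

The linear scan per hit is fine for a handful of hits; for many thousands, a sorted or indexed approach (as bedtools uses) scales better.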

cyril-cros wrote, 5.7 years ago:

If you want, you can look at some RNA-seq data in IGV and see whether the annotation is reasonable. From what I have seen in the mouse genome, most UTRs are annotated, but some isoforms and some genes with tissue-specific expression may be missing or just plain wrong (olfactory receptors, for example). There are always some genes and isoforms with longer 3' UTRs that are expressed specifically in the brain.

For the annotation, see the work of the HAVANA team at the Wellcome Trust Sanger Institute. They do a lot of manual annotation, because automated tools often fail. Such tools include Cufflinks (which is not so good with 3' UTRs) and this one, whose associated article explains how the UTR annotation is done. They can take a reference annotation and complete it, or work even without a genome (Trinity). They are called ab initio transcript assembly methods.
Detecting a gene is fairly easy (ORF, stop codon, homology data) and that part of the annotation is generally good; the 5' and 3' ends are less clear-cut. At the ends of a transcript you see a drop in the number of reads. UTRs can also contain introns, so paired-end data is much better.
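The "drop in the number of reads" can be turned into a crude rule for locating a 3' end. The sketch below is a hypothetical heuristic for illustration only, not what Cufflinks or any other assembler does; the 10% threshold, the 10 bp window and the function name are arbitrary assumptions. It also assumes the UTR is one contiguous genomic run, so it ignores spliced UTRs, which is exactly where paired-end reads help.

```python
from statistics import median


def estimate_3prime_extent(cds_cov, downstream_cov, frac=0.1, window=10):
    """Estimate how many bases a transcript extends past its stop codon.

    Hypothetical heuristic: walk downstream of the stop codon and stop at the
    first window whose mean depth falls below `frac` x the median CDS depth.

    cds_cov: per-base read depth over the coding region (the reference level).
    downstream_cov: per-base depth starting at the first base after the stop
        codon, in transcript orientation (reverse it for minus-strand genes).
    """
    threshold = frac * median(cds_cov)
    for i in range(len(downstream_cov) - window + 1):
        if sum(downstream_cov[i:i + window]) / window < threshold:
            return i  # start of the first low-coverage window
    # No clear drop within the region provided
    return len(downstream_cov)
```

With a sharp drop (for example depth 50 for 30 bases, then 0), the function returns 30; in real data, noisy coverage and internal priming make the threshold choice much less clear.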

To sum up: take some relevant paired-end, strand-specific RNA-seq data with good sequencing depth (or merge biological replicates) and check the annotation against it.

Devon Ryan (Freiburg, Germany) wrote, 5.7 years ago:

Try the Ensembl annotation. It has more than 27,000 UTRs annotated, and they are of variable length (the longest is ~6 kb).
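Numbers like these can be checked directly from the Ensembl GTF. A minimal sketch, assuming UTR records use the `five_prime_utr`/`three_prime_utr` feature types or, depending on the release, a generic `UTR` type: it sums the UTR segments of each transcript (one UTR can span several exons) and reports, per feature type, how many transcripts have one and the longest total length. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed UTR feature types; naming differs between GTF releases
UTR_TYPES = ("five_prime_utr", "three_prime_utr", "UTR")


def utr_length_summary(lines):
    """Return {feature_type: (n_transcripts, longest_total_length)}."""
    totals = defaultdict(int)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] not in UTR_TYPES:
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        tx = m.group(1) if m else "."
        # GTF coordinates are 1-based and inclusive on both ends
        totals[(tx, f[2])] += int(f[4]) - int(f[3]) + 1
    summary = {}
    for (tx, kind), length in totals.items():
        n, longest = summary.get(kind, (0, 0))
        summary[kind] = (n + 1, max(longest, length))
    return summary
```

An empty result for a given GTF would mean the file has no UTR records under any of the assumed type names.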

Do you mean going to BioMart to download the UTR annotations?

Reply written 5.7 years ago by wanziyi89

Or just download the GTF and filter it as needed. Either way would work.

Reply written 5.7 years ago by Devon Ryan
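Filtering the GTF down to a UTR BED file takes only a few lines. A minimal sketch, assuming Ensembl-style feature types (`five_prime_utr`/`three_prime_utr`, or a generic `UTR` in some releases) and a `transcript_id` attribute; the names `UTR_LABELS` and `gtf_utrs_to_bed` are hypothetical. The one detail that matters is the coordinate shift: GTF is 1-based inclusive, BED is 0-based half-open.

```python
import re

# Hypothetical short labels for the assumed UTR feature types
UTR_LABELS = {"five_prime_utr": "5UTR", "three_prime_utr": "3UTR", "UTR": "UTR"}


def gtf_utrs_to_bed(lines):
    """Yield BED6 lines (chrom, start, end, name, score, strand) for GTF UTR records."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] not in UTR_LABELS:
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        tx = m.group(1) if m else "."
        name = f"{tx}|{UTR_LABELS[f[2]]}"
        # Only the start shifts: GTF start 101 becomes BED start 100, end unchanged
        yield "\t".join([f[0], str(int(f[3]) - 1), f[4], name, "0", f[6]])
```

Ensembl ships GTFs gzipped; `gzip.open(path, "rt")` yields text lines that can be passed straight to the function, and the output can then be used with `bedtools intersect`.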

I have checked, and the tilapia UTRs are not annotated in the Ensembl database. I wonder how bioinformaticians annotate UTRs. Is it possible to use, say, the zebrafish transcriptome to annotate the Nile tilapia genome, including its UTRs?

Reply written 5.7 years ago by wanziyi89

They're in the GTF file that I downloaded, so just check there. You can find some details on Ensembl's annotation process for this species here.

Reply written 5.7 years ago by Devon Ryan