Question: 5' UTR and 3' UTR of the Nile Tilapia Genome (Oreochromis niloticus)
5.5 years ago by
Singapore, Temasek Life Sciences Laboratory
wanziyi8960 wrote:

Dear all,

I am looking for annotations of the 5' and 3' UTRs in the tilapia genome. I went to the UCSC Table Browser to download BED files of these UTRs. I first tried track: RefSeq Genes, table: refGene, and downloaded the BED file. The number of annotated UTRs was very small (about 5 per chromosome), so I believe the genome sequencing team did not annotate the UTRs.

I then tried track: RefSeq Genes, table: xenoRefGene, and downloaded the BED files. This time the Table Browser gave me a full list of genes annotated on the reference genome, but the UTRs all spanned 200 bp from the start codon: every 5' UTR was exactly 200 bp long.
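A quick way to confirm that kind of observation is to tabulate interval lengths directly from the BED file. The following is only a minimal sketch: the function name `bed_length_counts` is made up for illustration, and it assumes a tab-separated BED file with 0-based, half-open coordinates in the second and third columns.

```python
from collections import Counter


def bed_length_counts(lines):
    """Count how many BED intervals have each length (end - start)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip track/browser headers and anything without numeric coordinates.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        counts[int(fields[2]) - int(fields[1])] += 1
    return counts
```

If a single length (such as 200) accounts for essentially every interval, the "UTRs" are most likely fixed-size flanks rather than annotated features.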

  • I'm a little confused now: can this dataset be used? If no 5' UTRs are annotated, can we arbitrarily define the 200 bp upstream of the start codon as the 5' UTR, and do the equivalent for the 3' UTR?
  • If it can't be used, how should I go about checking whether my sequencing hits of interest fall in UTR regions?
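For the second question, once a BED file of UTR intervals is available from any source, checking whether a hit falls inside a UTR is an interval lookup. Here is a rough sketch under stated assumptions: the function names `load_bed_intervals` and `in_utr` are hypothetical, the BED file is tab-separated with 0-based, half-open coordinates, and each hit is reduced to a single 0-based position on a chromosome.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed_intervals(lines):
    """Parse BED lines into {chrom: sorted list of merged (start, end)}."""
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or touching intervals are merged into one.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_utr(merged, chrom, pos):
    """Return True if 0-based position `pos` lies in an interval on `chrom`."""
    intervals = merged.get(chrom, [])
    # Index of the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]
```

Merging first keeps the intervals non-overlapping, so a single binary search per hit is enough. For large files, tools such as bedtools do the same job without custom code.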






5.5 years ago by
cyril-cros900 wrote:

If you want, you can load some RNA-seq data in IGV and see whether the annotation is reasonable. From what I have seen in the mouse genome, most UTRs are annotated, but some isoforms and some genes with tissue-specific expression may be missing or just plain wrong (olfactory receptors, for example). There are always some genes and isoforms with longer 3' UTRs expressed specifically in the brain.

For the annotation, see the work of the Havana team at the Wellcome Trust. They do a lot of manual annotation, because automated tools often fail. Such tools include Cufflinks (which is not so good with 3' UTRs) or this one (the associated article explains how the UTR annotation is done). They can work with a reference annotation and complete it, or even without a genome (Trinity). They are called ab initio transcript assembly methods.
Detecting a gene is fairly easy (ORF, stop codon, homology data) and that part of the annotation is generally good; the 5' and 3' ends are less clear, since all you see is a drop in the number of reads. UTRs can also contain introns, so paired-end data is much better.

Once again, take some relevant paired-end, strand-specific RNA-seq data with good sequencing depth (or merge biological replicates) and check the annotation.

5.5 years ago by
Freiburg, Germany
Devon Ryan97k wrote:

Try the Ensembl annotation. That one has >27000 UTRs annotated and they're of variable length (the longest is ~6kb).

Do you mean by going to the BioMart section to download the UTR annotations? 

Or just download the GTF and filter it as needed. Either way would work.
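Filtering the GTF can be done in a few lines. This is a hedged sketch rather than an official recipe: the feature names `five_prime_utr`, `three_prime_utr`, and `UTR` are assumptions about what the third column contains in a given release, so check the file first (for example with `cut -f3 | sort | uniq -c`). The helper names `open_gtf` and `iter_utr_records` are made up for illustration.

```python
import gzip

# Assumed UTR feature names; adjust after inspecting column 3 of your GTF.
UTR_FEATURES = {"five_prime_utr", "three_prime_utr", "UTR"}


def open_gtf(path):
    """Open a plain or gzip-compressed GTF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def iter_utr_records(lines, features=UTR_FEATURES):
    """Yield (chrom, start0, end, feature, strand) for UTR lines of a GTF.

    GTF coordinates are 1-based and inclusive, so the start is shifted by
    one to give BED-style 0-based, half-open intervals.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in features:
            continue
        start, end = int(fields[3]), int(fields[4])
        yield fields[0], start - 1, end, fields[2], fields[6]
```

With a downloaded file (the path here is a placeholder), `with open_gtf("tilapia.gtf.gz") as fh: utrs = list(iter_utr_records(fh))` collects the UTR intervals, which can then be written out as BED or summarised by length.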

I have checked. The tilapia UTRs are not annotated in the Ensembl database. How do bioinformaticians annotate UTRs? Would it be possible to use, say, the zebrafish transcriptome to annotate the Nile tilapia genome, including its UTRs?



They're in the GTF file that I downloaded, so just check there. You can find some details on Ensembl's annotation process for this species here.

