Question

5' UTR and 3' UTR of Nile Tilapia Genome (Oreochromis niloticus)

9.0 years ago

wanziyi89 ▴ 60

Dear all,

I am looking for annotations of the 5' and 3' UTRs of the tilapia genome. I went to the UCSC Table Browser to download BED files of these UTRs. I tried track: RefSeq Genes, table: refGene and downloaded the BED file. The number of annotated UTRs was very small (about 5 per chromosome). Hence I believe the genome sequencing team did not annotate the UTRs.

I then tried track: RefSeq Genes, table: xenoRefGene and downloaded the BED files. This time, the Table Browser gave me a full list of genes annotated on the reference genome, but the UTRs were all 200 bp from the start codon: every 5' UTR was 200 bp in length.

I'm a little confused now. Can this dataset be used? If no 5' UTRs were annotated, can we arbitrarily define the 200 bp upstream of the start codon as the 5' UTR, and likewise for the 3' UTR?
If this can't be used, how should I find out whether my sequencing hits of interest fall in UTR regions?

regards,
kenta

utr annotations xenorefseq ucsc table browser • 3.3k views

updated 2.9 years ago by Ram 43k • written 9.0 years ago by wanziyi89 ▴ 60

Ram · Answer 1 · 2015-05-06

If you want, you can load some RNA-seq data in IGV and see whether the annotation is reasonable. From what I have seen in the mouse genome, most UTRs are annotated, but some isoforms and some genes with tissue-specific expression may be missing or just plain wrong (olfactory receptors, for example). There are always some genes and isoforms with longer 3' ends that are specifically expressed in the brain.

For the annotation, see the work of the Havana team at the Wellcome Trust. They do a lot of manual annotation, because automated tools often fail. Such tools could be Cufflinks (which is not so good with 3' UTRs) or this one - the associated article explains how the UTR annotation is done. They can work with a reference annotation and complete it, or even without a genome (Trinity). They are called ab initio transcript assembly methods.

You can detect a gene rather easily (ORF, stop codon, homology data), and that part of the annotation is always good. The 5' and 3' extremities are less clear: you see a drop in the number of reads. UTRs can also have introns, so paired-end data is much better.

Once again, take some relevant paired-end, strand-specific RNA-seq data with good sequencing depth (or merge biological replicates) and check the annotation...

Ram · Answer 2 · 2015-05-05

9.0 years ago

Devon Ryan 104k

Try the Ensembl annotation. That one has >27000 UTRs annotated and they're of variable length (the longest is ~6kb).

updated 2.9 years ago by Ram 43k • written 9.0 years ago by Devon Ryan 104k

Do you mean by going to the BioMart section to download the UTR annotations?

updated 2.9 years ago by Ram 43k • written 9.0 years ago by wanziyi89 ▴ 60

Or just download the GTF and filter it as needed. Either way would work.
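Filtering the GTF could look roughly like the sketch below. It is only an illustration, not the exact procedure used in this thread: the function name `gtf_utrs_to_bed` and the toy input lines are made up, and the feature names in `UTR_FEATURES` are an assumption (check column 3 of your own file to see which labels it actually uses). The sketch keeps UTR rows and converts the 1-based, inclusive GTF coordinates to 0-based, half-open BED coordinates.

```python
import io

# Assumption: UTR rows are labelled with one of these names in column 3.
# Inspect your own GTF to confirm which labels it uses.
UTR_FEATURES = {"five_prime_utr", "three_prime_utr", "UTR"}


def gtf_utrs_to_bed(lines):
    """Yield BED-style tuples (chrom, start0, end, feature, strand) for UTR rows."""
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF row
        chrom, _source, feature, start, end, _score, strand = fields[:7]
        if feature in UTR_FEATURES:
            # GTF is 1-based inclusive; BED is 0-based half-open.
            yield (chrom, int(start) - 1, int(end), feature, strand)


# Toy input (hypothetical records, not real tilapia annotation).
example = io.StringIO(
    "#!hypothetical header line\n"
    "LG1\tensembl\texon\t100\t400\t.\t+\t.\tgene_id \"g1\";\n"
    "LG1\tensembl\tfive_prime_utr\t100\t150\t.\t+\t.\tgene_id \"g1\";\n"
    "LG1\tensembl\tthree_prime_utr\t350\t400\t.\t+\t.\tgene_id \"g1\";\n"
)

utr_records = list(gtf_utrs_to_bed(example))
for rec in utr_records:
    print("\t".join(map(str, rec)))
```

For a real file, pass an open file handle (for example from `gzip.open(path, "rt")` for a compressed GTF) instead of the `io.StringIO` toy input.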

updated 2.9 years ago by Ram 43k • written 9.0 years ago by Devon Ryan 104k

I have checked. The tilapia UTRs are not annotated in the Ensembl database. I wonder how bioinformaticians annotate UTRs? Is it possible to use, say, the zebrafish transcriptome to annotate the Nile tilapia genome, including its UTRs?

updated 2.9 years ago by Ram 43k • written 9.0 years ago by wanziyi89 ▴ 60

They're in the GTF file that I downloaded, so just check there. You can find some details on Ensembl's annotation process for this species here.
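One way to "check there" is to count the UTR records, look at their lengths, and then test whether a hit position falls inside any of them. The sketch below assumes the UTR rows have already been pulled out of the GTF as `(chrom, start, end, feature)` tuples in 1-based, inclusive GTF coordinates. The helper names and the toy records are hypothetical, and the linear scan in `hit_in_utr` is only meant for small inputs.

```python
from collections import Counter


def summarize_utrs(records):
    """Return (count per feature type, longest UTR length in bp)."""
    counts = Counter()
    longest = 0
    for _chrom, start, end, feature in records:
        counts[feature] += 1
        # Coordinates are 1-based inclusive, so length is end - start + 1.
        longest = max(longest, end - start + 1)
    return counts, longest


def hit_in_utr(records, chrom, pos):
    """Return the feature types of all UTRs that contain a 1-based position."""
    return [
        feature
        for c, start, end, feature in records
        if c == chrom and start <= pos <= end
    ]


# Toy records (hypothetical, not real tilapia annotation).
records = [
    ("LG1", 100, 150, "five_prime_utr"),
    ("LG1", 350, 400, "three_prime_utr"),
    ("LG2", 10, 19, "three_prime_utr"),
]

counts, longest = summarize_utrs(records)
print(dict(counts), longest)
print(hit_in_utr(records, "LG1", 120))
print(hit_in_utr(records, "LG1", 200))
```

For many hits against a full annotation, an interval-based tool such as bedtools intersect would be a more practical choice than this linear scan.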

updated 2.9 years ago by Ram 43k • written 9.0 years ago by Devon Ryan 104k