I am looking for annotations of the 5' UTRs and 3' UTRs of the tilapia genome. I went to the UCSC Table Browser to download BED files of these UTRs. I first tried track: RefSeq Genes, table: refGene and downloaded the BED file. The number of annotated UTRs was very small (about 5 per chromosome), so I believe the genome sequencing team did not annotate the UTRs.
I then tried track: RefSeq Genes, table: xenoRefGene and downloaded the BED files. This time the Table Browser gave me a full list of genes annotated on the reference genome, but the UTRs all spanned 200 bp from the start codon: every 5' UTR was exactly 200 bp in length.
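A quick way to confirm that kind of observation is to tally interval lengths in the downloaded BED file. The following is only a minimal sketch, assuming the UTR file is tab-separated BED with 0-based, half-open coordinates and one interval per row; the function name `utr_length_counts` and the example rows (chromosome names, coordinates, gene names) are made up for illustration.

```python
from collections import Counter


def utr_length_counts(bed_lines):
    """Count how often each interval length occurs in BED-formatted lines."""
    counts = Counter()
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        # BED intervals are half-open, so the length is simply end - start.
        counts[end - start] += 1
    return counts


# Hypothetical rows, not real tilapia annotations.
example_bed = [
    "LG1\t1000\t1200\tgeneA_utr5\t0\t+",
    "LG1\t5000\t5200\tgeneB_utr5\t0\t-",
    "LG2\t300\t385\tgeneC_utr5\t0\t+",
]

counts = utr_length_counts(example_bed)
print(counts.most_common())  # [(200, 2), (85, 1)]
```

If nearly every row falls into a single length bin such as 200, the intervals are more likely a fixed-size placeholder than measured UTR boundaries.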
- I'm a little confused now: can this dataset be used? If no 5' UTRs were annotated, can we arbitrarily define the 200 bp upstream of the start codon as the 5' UTR, and likewise for the 3' UTR?
- If this can't be used, how should I go about finding out whether my sequencing hits of interest fall in UTR regions?
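To make the second question concrete, the check in question is an overlap test between hit positions and UTR intervals. Below is a minimal sketch of that test, assuming a trustworthy set of UTR intervals is available as tab-separated BED (0-based, half-open) and that each hit is reduced to a single 0-based position. The helper names `load_intervals` and `hit_in_utr`, and all coordinates, are hypothetical.

```python
def load_intervals(bed_lines):
    """Group BED intervals by chromosome as lists of (start, end) tuples."""
    intervals = {}
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return intervals


def hit_in_utr(intervals, chrom, pos):
    """Return True if the 0-based position lies inside any interval on chrom."""
    return any(start <= pos < end for start, end in intervals.get(chrom, []))


# Hypothetical UTR intervals and hit positions.
utr_bed = [
    "LG1\t1000\t1200\tgeneA_utr5\t0\t+",
    "LG1\t8000\t8450\tgeneA_utr3\t0\t+",
]
utrs = load_intervals(utr_bed)

print(hit_in_utr(utrs, "LG1", 1100))  # True: inside the first interval
print(hit_in_utr(utrs, "LG1", 1200))  # False: the end coordinate is exclusive
print(hit_in_utr(utrs, "LG2", 1100))  # False: no intervals on this chromosome
```

The linear scan is fine for a handful of hits; the result is only as reliable as the UTR intervals fed into it, which is the point of the first question.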