Question: Finding UTR regions in RNA sequencing data from a non-model organism
2.7 years ago by
seta (1.2k) wrote:

Hi all friends,

I'm working on an RNA-seq project on a non-model plant. The library was generated from the mRNA fraction (poly(A)-enriched) and sequenced paired-end (PE), strand-specific. I have done de novo transcriptome assembly and annotation. Now I would like to know whether there is any way to determine the 3' and 5' UTRs of the genes. I'm not concerned about the 3' UTR, since the library enrichment was done with oligo-dT primers, but I don't know the right procedure for determining these regions. Could you please advise me on this?

Thanks in advance

Hi! I have to run a similar task. Did you find a solution?


2.7 years ago by
EVR540 wrote:

Hi Seta,

Use TransDecoder. It outputs a GFF3 file that gives, for each transcript with a predicted ORF, the locations of its UTRs and its CDS. It's very effective and reliable.
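As a sketch of pulling the UTR coordinates out of that GFF3 file, the Python below collects the `five_prime_UTR` and `three_prime_UTR` features for each transcript. It assumes those are the feature type names used in your TransDecoder output and that column 1 holds the transcript ID; check a few lines of your own file first. `utr_regions` is a hypothetical helper, not part of TransDecoder.

```python
from collections import defaultdict

# Assumed feature type names for UTRs in TransDecoder's GFF3; verify in your file.
UTR_TYPES = {"five_prime_UTR", "three_prime_UTR"}


def utr_regions(gff3_lines):
    """Return {transcript_id: [(type, start, end, strand), ...]} for UTR features.

    Coordinates are kept as in GFF3: 1-based and inclusive, relative to the
    sequence named in column 1 (assumed here to be the transcript ID).
    """
    regions = defaultdict(list)
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a 9-column feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, _attrs = cols
        if ftype in UTR_TYPES:
            regions[seqid].append((ftype, int(start), int(end), strand))
    return dict(regions)
```

Because it accepts any iterable of lines, you can pass it an open file handle directly, e.g. `utr_regions(open("Trinity.fasta.transdecoder.gff3"))` (the file name is only an example). Transcripts without a predicted ORF will simply not appear in the result.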

2.7 years ago by
Sheffield, UK
i.sudbery (4.8k) wrote:

You want to use an ORF finder to find the longest open reading frame in each transcript; EMBOSS has tools for this (e.g. getorf). One thing to be careful of: if you have missed the 5' end of a transcript, you might not find a start codon in it, so not every transcript will give you a complete ORF. The 3' UTR is then most probably the sequence after the longest ORF, and the 5' UTR the sequence before it.
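A minimal Python sketch of this approach: scan the three forward-strand frames for the longest ATG-to-stop ORF and treat what lies before and after it as the 5' and 3' UTRs. It assumes the transcripts are already in sense orientation (reasonable for an assembly from a strand-specific library, but worth checking) and uses the standard stop codons. This is not the EMBOSS algorithm, and `longest_orf` and `split_utrs` are hypothetical names. Because it requires an ATG, a transcript whose 5' end is missing may return `None` or a shorter internal ORF, which is exactly the caveat above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based and half-open, and include the stop codon.
    Returns None if no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - orf_start) > (best[1] - best[0]):
                    best = (orf_start, i + 3)
                orf_start = None  # look for the next ATG after this stop
    return best


def split_utrs(seq):
    """Split a transcript into (5' UTR, CDS, 3' UTR) around its longest ORF."""
    orf = longest_orf(seq)
    if orf is None:
        return None  # no complete ORF, so UTRs cannot be inferred this way
    start, end = orf
    return seq[:start], seq[start:end], seq[end:]
```

For example, `split_utrs("GGGATGAAATAACC")` splits the toy transcript into `"GGG"`, `"ATGAAATAA"` and `"CC"`. In practice you would also set a minimum ORF length so that short random ORFs are not mistaken for real coding regions.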

