Question: How can I get the sequences of the 5' and 3' UTRs in a bacterial genome?
joannerandy80 wrote, 5 weeks ago:

Hi there. Is there any software tool or database for identifying the complete 5' and 3' UTRs of bacterial genes? I am aware that eukaryotic genes are clearly annotated with these details in Ensembl and GenBank, but unfortunately I have not been able to find this information for bacterial genes. Your help would be very much appreciated. Many thanks in advance.

written 5 weeks ago by joannerandy80

I am not a bacterial expert at all, but conceptually you can define these UTRs manually. In eukaryotes, the 5' UTR is the sequence from the beginning of exon 1 to the base immediately upstream of the start codon. Likewise, the 3' UTR runs from the base immediately downstream of the stop codon to the end of the last exon. In prokaryotes, you could take the gene/operon annotation you have and define the 5' UTR as the entire range from the beginning of the gene to the start codon, and the 3' UTR as the range from just after the stop codon to the end of the gene.
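As a rough illustration of the definition above, here is a minimal Python sketch that derives the two UTR intervals from a transcript (or gene/operon) span and a CDS span. The function name `utr_intervals` and the coordinate convention (0-based, half-open, forward-strand coordinates) are assumptions made for this example, and it only works if you already have transcript boundaries from somewhere.

```python
def utr_intervals(tx_start, tx_end, cds_start, cds_end, strand):
    """Return (five_prime_utr, three_prime_utr) as (start, end) tuples.

    Coordinates are assumed to be 0-based, half-open, and given on the
    forward strand. A UTR of length zero is returned as None.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not (tx_start <= cds_start < cds_end <= tx_end):
        raise ValueError("the CDS must lie inside the transcript")

    # Region to the left and to the right of the CDS in forward coordinates.
    left = (tx_start, cds_start) if cds_start > tx_start else None
    right = (cds_end, tx_end) if tx_end > cds_end else None

    # On the minus strand the 5' end sits at the higher coordinate,
    # so the left/right regions swap roles.
    return (left, right) if strand == "+" else (right, left)


five, three = utr_intervals(100, 1000, 150, 900, "+")
print(five, three)
```

For a minus-strand gene with the same coordinates, the two intervals are simply swapped.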

modified 5 weeks ago • written 5 weeks ago by ATpoint

Thanks. I believe that by "range from the beginning of the gene..." you actually mean "range from the beginning of the transcript...". But my problem is precisely how to identify the beginning and the end of the transcript. As explained below, my aim is to find accessible open regions in the mRNA secondary structures of a few bacterial genes (e.g. Alr, dxr) in order to identify binding targets for antisense oligonucleotides (ASOs). A few programs are available to predict secondary structure, but we need to enter exactly the entire mRNA (CDS plus 5' and 3' UTRs); otherwise the secondary structure prediction won't be correct and we will end up designing ASOs against the wrong, inaccessible regions.

I did exactly this a few years ago, but for eukaryotic genes (huntingtin, DMD, etc.). The advantage of eukaryotic genes is that they are properly annotated in the genome databases, including the UTRs, but for prokaryotes no such annotation is available.

written 5 weeks ago by joannerandy80

jrj.healey (United Kingdom) wrote, 5 weeks ago:

If you don't have transcriptional data, there's no way to define this specifically (and even then it's pretty woolly).

It's easy enough to just take the preceding n bases upstream of the start codon of a gene, though. Most regulatory elements will be within 500-1000 bp upstream of most genes.

What exactly are you looking for, and how specific are you trying to be?
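A minimal sketch of taking the n bases upstream of a start codon, in plain Python. The helper names (`reverse_complement`, `upstream_region`) and the 0-based, half-open forward-strand coordinates are assumptions for illustration. The sketch treats the sequence as a linear contig, so it does not wrap around the origin of a circular chromosome, and it only handles unambiguous A/C/G/T bases.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, cds_start, cds_end, strand, n=500):
    """Return up to n bases upstream of the start codon, 5'->3' on the gene's strand.

    cds_start/cds_end are 0-based, half-open, forward-strand coordinates.
    Fewer than n bases are returned if the contig ends first.
    """
    if strand == "+":
        # Upstream is to the left of the CDS on the forward strand.
        return genome[max(0, cds_start - n):cds_start]
    if strand == "-":
        # For a minus-strand gene the start codon is at cds_end,
        # so upstream lies to the right and must be reverse-complemented.
        return reverse_complement(genome[cds_end:cds_end + n])
    raise ValueError("strand must be '+' or '-'")


genome = "AAACCCATGGGGTAATTT"  # toy sequence, CDS at [6, 15)
print(upstream_region(genome, 6, 15, "+", n=3))
```

In practice the genome string and CDS coordinates would come from your FASTA and annotation files; how you parse those is left open here.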

modified 5 weeks ago • written 5 weeks ago by jrj.healey

Thanks Healey. I am trying to find accessible open regions in the mRNA secondary structures of a few bacterial genes (e.g. Alr, dxr) in order to identify binding targets for antisense oligonucleotides (ASOs). A few programs are available to predict secondary structure, but we need to enter exactly the entire mRNA (CDS plus 5' and 3' UTRs); otherwise the secondary structure prediction won't be correct and we will end up designing ASOs against the wrong, inaccessible regions.

I did exactly this a few years ago, but for eukaryotic genes (huntingtin, DMD, etc.). The advantage of eukaryotic genes is that they are properly annotated in the genome databases, including the UTRs, but for prokaryotes no such annotation is available.

written 5 weeks ago by joannerandy80

Yep, it's a tough challenge.

Something you could try is, for each gene, to take the gene sequence plus all the intergenic space on either side of it, and call that the CDS/5'/3' 'super mRNA'.

You'd end up with some overlap/redundancy for sure, but you could filter the sequences after the fact, once you've done secondary structure prediction on the 'super mRNA'.
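One way the 'super mRNA' idea might be sketched in plain Python: each gene is extended to the end of its left neighbour and the start of its right neighbour. The function names, the `(name, start, end, strand)` tuple layout, and the 0-based half-open coordinates are all assumptions for this example. It treats the contig as linear and does nothing clever about operons or nested genes.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def super_mrnas(genome, genes):
    """Map gene name -> gene plus flanking intergenic sequence, in mRNA orientation.

    genes: iterable of (name, start, end, strand) with 0-based, half-open,
    forward-strand coordinates.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    result = {}
    for i, (name, start, end, strand) in enumerate(ordered):
        # Flanks run to the neighbouring genes, or to the contig ends.
        left = ordered[i - 1][2] if i > 0 else 0
        right = ordered[i + 1][1] if i + 1 < len(ordered) else len(genome)
        # If a neighbour overlaps this gene, there is no intergenic space
        # on that side, so never trim into the gene itself.
        left = min(left, start)
        right = max(right, end)
        seq = genome[left:right]
        result[name] = seq if strand == "+" else reverse_complement(seq)
    return result


genome = "AAAA" + "CCCCCC" + "GG" + "TTTTTT" + "AAA"  # toy contig
genes = [("g1", 4, 10, "+"), ("g2", 12, 18, "-")]
for name, seq in super_mrnas(genome, genes).items():
    print(name, seq)
```

Note that adjacent genes share the intergenic space between them, which is the overlap/redundancy mentioned above.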

It's a bit of a brute force approach, but I can't see what other options you really have.

You might try looking at the piggy paper from Harry Thorpe (https://github.com/harry-thorpe/piggy), which does some analysis of intergenic spaces, though not secondary structure specifically, AFAIK. There could be something in that approach that works for you (it's a quick way to extract all the intergenic space, anyway).

modified 5 weeks ago • written 5 weeks ago by jrj.healey

Thanks a lot. Let me give it a try.

written 4 weeks ago by joannerandy80