Question: 5'UTR in bacterial genomes
6.3 years ago, bioslayer (New Zealand) wrote:

The gene coordinates table for E. coli looks like the following:


orientation  start  end   gene name
+            189    255   thrL
             336    2799  thrA
             2800   3733  thrB
             3733   5020  thrC
             5233   5530  yaaX
             5682   6459  yaaA
             6528   7959  yaaJ
             8237   9191  talB
             9305   9893


These coordinates point to either the start codon or the termination codon of each gene, depending on orientation. Given this, is it fair to assume that the region between the end of one gene and the start of the next can be treated as intergenic? I am particularly interested in identifying the 5' UTRs of these genes from the coordinates. Extracting the sequences computationally is not the problem. What I can't work out is an effective way to tell 3' UTRs from 5' UTRs in the region separating two tandem genes, since intergenic regions vary in size and some genes are grouped together in operons.

If you have any ideas that could help me brainstorm this problem, I would be grateful.

6.3 years ago, Asaf wrote:

The UTRs of E. coli are indeed not in this table, or in any other simple annotation table. They have to be determined experimentally. For the 5' UTRs you can download data from RegulonDB, for instance: there are 3 files with transcription start site locations ("Transcription start sites experimentally determined in the laboratory of Dr. Morett"). Beware that a single gene may have more than one value. The 3' UTRs can be obtained from RNA-seq experiments as well, but I didn't find a simple table that describes them. You can try to build the transcripts yourself from published RNA-seq experiments. Rockhopper works for this purpose: it's really friendly and gives you a simple table with transcription start and termination, and translation start and termination, for each gene.
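To make the arithmetic concrete, here is a minimal sketch of how a 5' UTR length could be derived once you have a TSS and the gene coordinates. It assumes 1-based inclusive coordinates with start < end, as in the table in the question. The function name, the TSS positions, and the minus-strand example are hypothetical and only illustrate the calculation; they are not values from RegulonDB.

```python
def five_prime_utr_length(tss, strand, start, end):
    """Length in nt of the 5' UTR for one TSS.

    Coordinates are 1-based and inclusive, with start < end.
    On '+' the start codon begins at `start`; on '-' it begins at `end`.
    """
    if strand == "+":
        length = start - tss
    elif strand == "-":
        length = tss - end
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    if length < 0:
        raise ValueError("TSS lies downstream of the start codon")
    return length


# Hypothetical TSS positions for a '+' strand gene at 189..255.
hypothetical_tss = [150, 170]
lengths = [five_prime_utr_length(t, "+", 189, 255) for t in hypothetical_tss]
print(lengths)
```

Because a gene can have several TSSs, the same gene yields one UTR length per TSS rather than a single value.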

You should be aware that there are alternative transcription start sites and termination sites, so the UTRs can differ between mRNA molecules of the same gene.

Another issue is the direction of the genes. Two genes can share the same terminator (the poly-U part) if they are convergent (---><---), or the same promoter if they are divergent (<----->). If they are in the same orientation, they might reside on the same transcription unit.
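The orientation cases above can be sketched in a few lines. This is only an illustration: the `Gene` type, the `classify_pairs` function, and the toy genes are made up for the example, and a "tandem" label says only that two neighbours point the same way, not that they are actually co-transcribed.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    strand: str  # '+' or '-'
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive, start < end


def classify_pairs(genes):
    """Label each pair of neighbouring genes and report the gap between them."""
    ordered = sorted(genes, key=lambda g: g.start)
    result = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end - 1  # negative if the genes overlap
        if left.strand == right.strand:
            kind = "tandem"        # ---> --->  or  <--- <---
        elif left.strand == "+":
            kind = "convergent"    # ---> <---
        else:
            kind = "divergent"     # <--- --->
        result.append((left.name, right.name, kind, gap))
    return result


# Toy genes, not real E. coli annotations.
toy_genes = [
    Gene("geneA", "+", 100, 400),
    Gene("geneB", "+", 450, 900),
    Gene("geneC", "-", 1000, 1500),
    Gene("geneD", "+", 1800, 2400),
]
pairs = classify_pairs(toy_genes)
for pair in pairs:
    print(pair)
```

Only the divergent and tandem gaps can contain a 5' UTR of the right-hand gene; a convergent gap holds 3' ends only.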

I know it's a mess.

Good luck.


It sure is a mess. An interesting type of mess. Thanks for clarifying that convergent genes could share the same terminator and divergent genes the same promoter. That was insightful.

In the UTR table, I noticed that some genes, like CsrA, have several 5' UTRs. All of these UTRs have the same start position but different end positions, and therefore different lengths. From an experimental perspective, what do you think has happened here? Does it mean that the 5' UTRs could not be resolved for such genes, or that such genes simply tend to have more than one UTR?

Reply written 6.3 years ago by bioslayer

I think it's real; these are alternative TSSs. Looking at the gene in an RNA-seq data set I'm working on right now, there appear to be several different start sites. Pay attention to the coordinates that match two of the TSSs in the first table on RegulonDB.

[Image: CsrA example]

Reply written 6.3 years ago by Asaf