5' UTR in bacterial genomes
7.3 years ago
bioslayer ▴ 50

The gene coordinates table for E. coli looks like this:

orientation     start     end     gene name
+               189       255     thrL
+               336       2799    thrA
+               2800      3733    thrB
+               3733      5020    thrC
+               5233      5530    yaaX
-               5682      6459    yaaA
-               6528      7959    yaaJ
+               8237      9191    talB
+               9305      9893    mog


These coordinates point to either the start codon or the termination codon of each gene, depending on orientation. Given this, is it fair to assume that the region between the end of one gene and the start of the next can be treated as intergenic? I am particularly interested in identifying the 5' UTRs of these genes from the coordinates. Extracting the genes computationally is not the problem. What I can't work out is an effective way to tell 3' UTRs apart from 5' UTRs in the region separating two tandem genes, since intergenic regions vary in size and some genes are grouped together in operons.

If you have any ideas that could help me brainstorm this problem, I would be grateful.
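For reference, the gaps between adjacent genes in the table above can be computed as follows. This is only a minimal sketch: it assumes the coordinates are 1-based and inclusive (the table does not say), and the function name `intergenic_gaps` is made up for illustration. It gives the raw spacing only and says nothing about which part of a gap is 5' UTR or 3' UTR.

```python
# Gene table copied from the post: (orientation, start, end, gene name).
GENES = [
    ("+", 189, 255, "thrL"),
    ("+", 336, 2799, "thrA"),
    ("+", 2800, 3733, "thrB"),
    ("+", 3733, 5020, "thrC"),
    ("+", 5233, 5530, "yaaX"),
    ("-", 5682, 6459, "yaaA"),
    ("-", 6528, 7959, "yaaJ"),
    ("+", 8237, 9191, "talB"),
    ("+", 9305, 9893, "mog"),
]


def intergenic_gaps(genes):
    """Return (left_gene, right_gene, gap_length) for each adjacent pair.

    Assumption: coordinates are 1-based and inclusive, so the number of
    bases strictly between two genes is right.start - left.end - 1.
    A gap of 0 means the genes abut; a negative gap means they overlap.
    """
    ordered = sorted(genes, key=lambda g: g[1])  # sort by start coordinate
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right[1] - left[2] - 1
        gaps.append((left[3], right[3], gap))
    return gaps


gaps = intergenic_gaps(GENES)
for left_name, right_name, gap in gaps:
    print(f"{left_name} -> {right_name}: {gap} bp")
```

Under that coordinate assumption, thrA/thrB come out as abutting and thrB/thrC as overlapping by one base, which is consistent with them sitting together in an operon.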

prokaryotes intergenic-regions 5-prime-UTR • 4.2k views
7.3 years ago
Asaf 8.6k

The information about E. coli UTRs is indeed not in this table, nor in any other simple annotation table. UTRs have to be determined experimentally. You can download the 5' UTRs from RegulonDB, for instance: there are 3 files with transcription start site locations ("Transcription start sites experimentally determined in the laboratory of Dr. Morett"). Beware that there might be several different values for a given gene. The 3' UTRs can also be obtained from RNA-seq experiments, but I didn't find a simple table that describes them. You can try to build the transcripts yourself from published RNA-seq experiments. Rockhopper works for this purpose: it's really friendly and gives you a simple table with the transcription start and termination and the translation start and termination for each gene.

You should be aware that there are alternative transcription start sites and termination sites, so the UTRs can differ between mRNA molecules.
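To make this concrete, here is a rough sketch of how one 5' UTR length per TSS could be derived once you have TSS positions for a gene. The function name and the TSS positions in the usage lines are invented for illustration; they are not taken from RegulonDB. It also assumes that, for a gene on the minus strand, the start codon sits at the `end` coordinate of the table.

```python
def five_prime_utr_lengths(orientation, start, end, tss_positions):
    """Return a list of (tss, utr_length) for one gene.

    Assumptions (not from any real annotation file):
    - for "+" genes the start codon is at `start`, and the TSS lies
      upstream at a smaller coordinate;
    - for "-" genes the start codon is at `end`, and the TSS lies
      upstream at a larger coordinate.
    TSSs that fall downstream of the start codon are skipped.
    """
    lengths = []
    for tss in tss_positions:
        if orientation == "+":
            length = start - tss
        else:
            length = tss - end
        if length >= 0:  # 0 would be a leaderless transcript
            lengths.append((tss, length))
    return lengths


# Hypothetical TSS positions, for illustration only.
talB_utrs = five_prime_utr_lengths("+", 8237, 9191, [8190, 8215])
yaaA_utrs = five_prime_utr_lengths("-", 5682, 6459, [6500])
print(talB_utrs)
print(yaaA_utrs)
```

With several TSSs for the same gene you simply get several UTR lengths, which is what the alternative start sites amount to in practice.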

Another issue is the direction of the genes. Two genes can share the same terminator (the poly-U part) if they are convergent (---><---), or the same promoter if they are divergent (<----->). If they are in the same orientation, they might reside on the same transcription unit.
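The three cases can be told apart from the orientation column alone. A small sketch, using a few rows from the table in the question (the function name `classify_pairs` is just for illustration):

```python
# Subset of the table from the question: (orientation, start, end, gene name).
GENES = [
    ("+", 5233, 5530, "yaaX"),
    ("-", 5682, 6459, "yaaA"),
    ("-", 6528, 7959, "yaaJ"),
    ("+", 8237, 9191, "talB"),
]


def classify_pairs(genes):
    """Label each pair of adjacent genes by relative orientation.

    "+-" -> convergent (--->  <---): the gap holds two 3' ends
    "-+" -> divergent  (<---  --->): the gap holds two 5' ends
    "++" or "--" -> tandem: possibly the same transcription unit
    """
    ordered = sorted(genes, key=lambda g: g[1])  # sort by start coordinate
    labels = []
    for left, right in zip(ordered, ordered[1:]):
        strands = left[0] + right[0]
        if strands == "+-":
            kind = "convergent"
        elif strands == "-+":
            kind = "divergent"
        else:
            kind = "tandem"
        labels.append((left[3], right[3], kind))
    return labels


pairs = classify_pairs(GENES)
for left_name, right_name, kind in pairs:
    print(f"{left_name} / {right_name}: {kind}")
```

Only the tandem case is ambiguous from coordinates alone, since the gap may contain a 3' UTR, a 5' UTR, both, or neither if the genes are co-transcribed.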

I know it's a mess.

Good luck.


It sure is a mess. An interesting type of mess. Thanks for clarifying that convergent and divergent genes could be sharing the same terminator or the same promoter. That was insightful.

In the UTR table, I noticed that some genes, like CsrA, have several 5' UTRs. All of these UTRs have the same start position but different end positions, and therefore different lengths. From an experimental perspective, what do you think has happened here? Does it mean the 5' UTRs could not be resolved for such genes, or that such genes simply tend to have more than one UTR?


I think it's real; these are alternative TSSs. Looking at the gene in an RNA-seq experiment I'm working on right now, you can see that there appear to be different start sites. Pay attention to the coordinates that match two of the TSSs in the first table on RegulonDB.