The gene coordinates table for E. coli looks like the following:
orientation  start  end   gene name
+            189    255   thrL
+            336    2799  thrA
+            2800   3733  thrB
+            3733   5020  thrC
+            5233   5530  yaaX
-            5682   6459  yaaA
-            6528   7959  yaaJ
+            8237   9191  talB
+            9305   9893  mog
So, depending on orientation, these coordinates always point either to the start codon of the gene or to its termination codon. Given this information, is it fair to assume that the region between the end of one gene and the start of the following gene can be treated as an intergenic region? I am particularly interested in identifying the 5' UTRs of these genes based on the coordinates. Extracting the genes computationally is not the problem; rather, I can't think of an effective way to tell 3' UTRs apart from 5' UTRs in the region separating two tandem genes, since intergenic regions vary in size and some genes are grouped together in operons.
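To make the question concrete, here is a minimal Python sketch of the gap computation, using the rows of the table above. It assumes the coordinates are 1-based and inclusive, and that "start" is always the leftmost position regardless of strand (both are my assumptions; the names `Gene`, `classify`, and `intergenic_gaps` are made up for illustration). It only labels each gap by the orientation of its two flanking genes: divergent gaps (-/+) are flanked by two 5' ends, convergent gaps (+/-) by two 3' ends, and tandem gaps (+/+ or -/-) are exactly the ambiguous case I am asking about.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    strand: str  # "+" or "-"
    left: int    # leftmost coordinate (assumed 1-based, inclusive)
    right: int   # rightmost coordinate (assumed 1-based, inclusive)
    name: str


# Rows copied from the coordinates table above.
GENES = [
    Gene("+", 189, 255, "thrL"),
    Gene("+", 336, 2799, "thrA"),
    Gene("+", 2800, 3733, "thrB"),
    Gene("+", 3733, 5020, "thrC"),
    Gene("+", 5233, 5530, "yaaX"),
    Gene("-", 5682, 6459, "yaaA"),
    Gene("-", 6528, 7959, "yaaJ"),
    Gene("+", 8237, 9191, "talB"),
    Gene("+", 9305, 9893, "mog"),
]


def classify(upstream: Gene, downstream: Gene) -> str:
    """Label a gap by the strands of the genes on its left and right."""
    pair = upstream.strand + downstream.strand
    return {
        "++": "tandem",      # 3' end of left gene, 5' end of right gene
        "--": "tandem",      # 5' end of left gene, 3' end of right gene
        "+-": "convergent",  # both 3' ends face the gap
        "-+": "divergent",   # both 5' ends face the gap
    }[pair]


def intergenic_gaps(genes):
    """Return (left gene, right gene, gap length, label) for each neighbour pair.

    A gap length of 0 means the genes are directly adjacent; a negative
    value means they overlap by that many bases.
    """
    ordered = sorted(genes, key=lambda g: g.left)
    gaps = []
    for up, down in zip(ordered, ordered[1:]):
        length = down.left - up.right - 1
        gaps.append((up.name, down.name, length, classify(up, down)))
    return gaps


if __name__ == "__main__":
    for left, right, length, label in intergenic_gaps(GENES):
        print(f"{left}\t{right}\t{length}\t{label}")
```

Under these assumptions, thrA/thrB come out with a gap of 0 and thrB/thrC with a gap of -1 (a one-base overlap), which is the kind of pair I would guess belongs to the same operon, but the coordinates alone do not say where a transcript starts or ends inside the larger tandem gaps.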
If you have any ideas that could help me brainstorm this problem, I would be grateful.