I am trying to analyze the loci of a gene in various bacterial genomes. Sometimes the gene appears at the beginning/end of the genome in the EMBL/GenBank file. I am wondering whether the starting base of the genome has any connection with the origin of replication (ori). I would also like to know whether the choice of positive and negative strand is arbitrary. If so, I will put my gene of interest on the positive strand in all genomes for easier comparison.
The start is just the origin of sequencing, which is not necessarily the origin of replication. You can see this quite nicely in Ensembl Bacteria, which uses EMBL-Bank genome and annotation files directly, without any processing to find the origin of replication.
Here's a genome where the origin of sequencing is the origin of replication. The origin of sequencing is shown at the top, as a red arrow. The obvious GC and gene skew on either side of the chromosome clearly demonstrate the positions of the origin and terminus of replication.
Here's a genome where it's not. The origin of sequencing is, again, at the top, but the GC skew seems to be ~60° away from vertical, suggesting that the origin and terminus are at the 4 o'clock and 10 o'clock positions (no idea which is which).
Here's a genome where there's no way of knowing. There is no GC or gene skew, so the origin of replication could be anywhere.
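If you want to check the skew yourself rather than eyeball a plot, here is a minimal sketch in plain Python. It computes the GC skew, (G - C) / (G + C), in non-overlapping windows and sums it along the sequence. The function names and the window size are my own choices, not taken from any tool. It also assumes the common pattern where the cumulative skew reaches its minimum near the origin and its maximum near the terminus. That pattern holds for many bacterial chromosomes but not all, so treat the result as a rough guess, and expect nothing useful for a genome with no skew like the last one above.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return (window end positions, cumulative GC skew) for non-overlapping windows.

    The skew of one window is (G - C) / (G + C); windows without G or C add 0.
    """
    seq = seq.upper()
    positions, cumulative = [], []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        # Record the end of the window, i.e. the point up to which we have summed.
        positions.append(min(start + window, len(seq)))
        cumulative.append(total)
    return positions, cumulative


def estimate_ori_ter(seq, window=1000):
    """Guess (origin, terminus) coordinates from the cumulative GC skew.

    ASSUMPTION: minimum of the cumulative skew ~ origin, maximum ~ terminus.
    This is a heuristic, not a guarantee.
    """
    positions, cumulative = cumulative_gc_skew(seq, window)
    ori = positions[cumulative.index(min(cumulative))]
    ter = positions[cumulative.index(max(cumulative))]
    return ori, ter


if __name__ == "__main__":
    # Toy sequence: C-rich, then G-rich, then C-rich again.
    toy = "ACCT" * 250 + "AGGT" * 500 + "ACCT" * 250
    print(estimate_ori_ter(toy, window=100))
```

For a real genome you would pass in the sequence string from your EMBL/GenBank record (for example `str(record.seq)` after `SeqIO.read(path, "genbank")` in Biopython) and use a much larger window, on the order of thousands of bases.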