ChrisM10 wrote:

Hi, I'm using OMA to find orthologues between bacterial species and I'm having trouble retrieving gene identifiers that can be matched to those in the corresponding GenBank file, whether it be RefSeq IDs, gene names, etc. For example: Bacillus subtilis 168 vs. Bacillus thuringiensis Al Hakam yields:

OMA data line:

BACSU00001 BACAH00001 1:1 769903 > OMA specific naming system?

OMA source data IDs:

dnaA dnaA 1:1 769903 > gene names are not available for many of these genes in the Bt genbank file so this isn't a usable format

OMA "Refseq IDs":

NP_387882.1 BACAH00001 1:1 769903 > not actual refseq IDs

OMA "Ensembl Gene IDs":

939978 BACAH00001 1:1 769903 > again, not actual IDs

Actual corresponding locus tags (The data I'm trying to collect):

BSU00010 BALH_0001

Without matching gene IDs, I can't really use the lovely data you guys have compiled. I figured others might be in the same boat. Also, in case other people need data in the same format as I do, has been helpful. Cheers!

Using the "Source Data ACs" option, I get this for BACSU/BACAH. Both columns are NCBI accession numbers.

CAB11777.1  ABK83423.1  1:1 769903
CAB11778.1  ABK83424.1  1:1 798523
CAB11779.1  ABK83425.1  1:1 759802
CAB11780.1  ABK83426.1  1:1 768267
CAB11782.1  ABK83427.1  1:1 796765
CAB11783.1  ABK83428.1  1:1 796766
CAB11784.1  ABK83429.1  1:1 59659
CAB11785.1  ABK83430.1  1:1 797548
CAB11786.1  ABK83431.1  1:m 
CAB11786.1  ABK85545.1  1:m 
CAB11787.1  ABK83432.1  1:1 798155

So instead of the accession numbers you want the Locus ID?
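
If locus tags are what you need, one way to get from the accession numbers above to locus tags is to read them out of the GenBank file yourself, since each CDS feature normally carries both a /protein_id and a /locus_tag qualifier. Below is a minimal sketch using only the Python standard library. The function names (protein_to_locus, translate_pairs) are made up for illustration, and the GenBank fragment is hand-written rather than copied from a real record, so treat the coordinates and the second entry as placeholders.

```python
import re

# Hand-written fragment imitating GenBank CDS features (illustrative only).
# A real file would be read from disk instead.
SAMPLE_GENBANK = '''\
     gene            410..1750
                     /gene="dnaA"
                     /locus_tag="BSU00010"
     CDS             410..1750
                     /gene="dnaA"
                     /locus_tag="BSU00010"
                     /protein_id="CAB11777.1"
     CDS             1939..3075
                     /locus_tag="BSU00020"
                     /protein_id="CAB11778.1"
'''

# Feature keys start after exactly five spaces; qualifier lines are indented further.
FEATURE_START = re.compile(r"^ {5}(\S+)\s+\S")
QUALIFIER = re.compile(r'^\s+/(locus_tag|protein_id)="([^"]+)"')


def _store(feature, mapping):
    """Record a finished CDS feature if it has both qualifiers."""
    if feature and "protein_id" in feature and "locus_tag" in feature:
        mapping[feature["protein_id"]] = feature["locus_tag"]


def protein_to_locus(genbank_text):
    """Return a dict mapping protein_id -> locus_tag for every CDS feature."""
    mapping = {}
    current = None  # qualifiers of the CDS being read, or None outside a CDS
    for line in genbank_text.splitlines():
        start = FEATURE_START.match(line)
        if start:
            _store(current, mapping)
            current = {} if start.group(1) == "CDS" else None
            continue
        if current is not None:
            qualifier = QUALIFIER.match(line)
            if qualifier:
                current[qualifier.group(1)] = qualifier.group(2)
    _store(current, mapping)
    return mapping


def translate_pairs(pair_lines, left_map, right_map):
    """Swap the first two columns of each pair line for locus tags where known."""
    translated = []
    for line in pair_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        left = left_map.get(fields[0], fields[0])
        right = right_map.get(fields[1], fields[1])
        translated.append([left, right] + fields[2:])
    return translated


bacsu_map = protein_to_locus(SAMPLE_GENBANK)
print(translate_pairs(["CAB11777.1  ABK83423.1  1:1 769903"], bacsu_map, {}))
```

Accessions that are missing from a map are left unchanged, so untranslated entries stay visible in the output instead of silently disappearing. You would build the second map the same way from the BACAH GenBank file.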

adrian.altenhoff620 wrote:

In OMA we integrate data from many different sources, so it is quite difficult to maintain a coherent, comprehensive set of cross-references. Because of this we assign our internal OMA IDs, which consist of the UniProt species code plus a number, and we fall back to those whenever we cannot identify one of the canonical forms.

In the Download section of OMA, we also provide many cross-references for all the sequences in OMA. Lastly, there is also a way to get all the cross-references for a genome pair at once. It is not really a public API, but it works as of now: , i.e. using the two genome IDs and ALLTYPES as the p3 query parameter.
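
Once you have the cross-references, rewriting the pairwise output is a simple lookup. Here is a rough sketch, assuming the cross-references have been reduced to a tab-separated table with three columns (OMA ID, cross-reference type, value). That layout, the "LocusTag" type label and the function names are assumptions for illustration only; the actual download may be organised differently and would need adapting.

```python
import csv
import io

# Hypothetical three-column cross-reference table: OMA ID, type, value.
XREF_TABLE = (
    "BACSU00001\tLocusTag\tBSU00010\n"
    "BACSU00001\tGeneName\tdnaA\n"
    "BACAH00001\tLocusTag\tBALH_0001\n"
)


def load_xrefs(handle, wanted_type):
    """Map OMA ID -> first cross-reference value of the requested type."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3 or row[0].startswith("#"):
            continue  # skip comments and malformed rows
        oma_id, xref_type, value = row
        if xref_type == wanted_type:
            mapping.setdefault(oma_id, value)
    return mapping


def translate_line(line, mapping):
    """Replace the two ID columns of a pairwise line; keep the rest as is."""
    fields = line.split()
    fields[:2] = [mapping.get(field, field) for field in fields[:2]]
    return "\t".join(fields)


locus_tags = load_xrefs(io.StringIO(XREF_TABLE), "LocusTag")
print(translate_line("BACSU00001 BACAH00001 1:1 769903", locus_tags))
```

IDs without a cross-reference of the requested type fall through unchanged, which mirrors the fall-back to OMA IDs described above.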

