Question

Canonical human sequence missing in OMA database?

5.6 years ago

asente ▴ 30

Hello,

I would like to fetch 1:1 orthologs for GBRA2_HUMAN, the canonical GABAA receptor alpha-2 subunit sequence as defined by UniProt (UniProt ID: P47869). However, it seems that this sequence is not available in the OMA database. Could you please let me know whether there is an alternative way to obtain the orthologs for this gene, and what might be the reason for the missing entry?

I tried using the OmaDB R package:

getHOG(id ='GBRA2_HUMAN', members = TRUE)

THE OMA REST API request failed:https://omabrowser.org/api/hog/GBRA2_HUMAN/members/ Here's the original error message: Not Found (HTTP 404).

If I call mapSequence(sequence) with the canonical sequence from UniProt, I get three matched targets, none of which is the canonical human GABAA alpha-2 subunit:

[1] "A0A2R8ZZ28" "H2QPE5" "G3QVC6"

On the other hand, other human GABAA subunits seem to be present in the database (e.g. GBRA1_HUMAN, GBRA3_HUMAN).

The GBRA2_HUMAN sequence I used for searches is the following:

sequence <- 'MKTKLNIYNMQFLLFVFLVWDPARLVLANIQEDEAKNNITIFTRILDRLLDGYDNRLRPG
LGDSITEVFTNIYVTSFGPVSDTDMEYTIDVFFRQKWKDERLKFKGPMNILRLNNLMASK
IWTPDTFFHNGKKSVAHNMTMPNKLLRIQDDGTLLYTMRLTVQAECPMHLEDFPMDAHSC
PLKFGSYAYTTSEVTYIWTYNASDSVQVAPDGSRLNQYDLLGQSIGKETIKSSTGEYTVM
TAHFHLKRKIGYFVIQTYLPCIMTVILSQVSFWLNRESVPARTVFGVTTVLTMTTLSISA
RNSLPKVAYATAMDWFIAVCYAFVFSALIEFATVNYFTKRGWAWDGKSVVNDKKKEKASV
MIQNNAYAVAVANYAPNLSKDPVLSTISKSATTPEPNKKPENKPAEAKKTFNSVSKIDRM
SRIVFPVLFGTFNLVYWATYLNREPVLGVSP'
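One thing to watch for with a multi-line literal like this: R keeps the line breaks inside the string, so the value contains newline characters in addition to the residues. Whether the OMA endpoint tolerates them has not been checked here, so the following is only a minimal base-R sketch for stripping whitespace before a search. The names `raw_sequence` and `clean_sequence` are hypothetical, and only the first two lines of the sequence are used to keep the example short.

```r
# First two lines of the sequence above, written as a multi-line literal.
# The line break inside the quotes becomes a "\n" in the string.
raw_sequence <- 'MKTKLNIYNMQFLLFVFLVWDPARLVLANIQEDEAKNNITIFTRILDRLLDGYDNRLRPG
LGDSITEVFTNIYVTSFGPVSDTDMEYTIDVFFRQKWKDERLKFKGPMNILRLNNLMASK'

# Remove every whitespace character (newlines, spaces, tabs).
clean_sequence <- gsub("\\s+", "", raw_sequence)

# The cleaned string should now hold residues only.
cat("characters before:", nchar(raw_sequence), "\n")
cat("characters after: ", nchar(clean_sequence), "\n")
```

The cleaned value can then be passed to mapSequence() in place of the original string.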

Many thanks.

software error OMA R oma • 1.2k views

updated 5.6 years ago by Christophe Dessimoz ▴ 740 • written 5.6 years ago by asente ▴ 30

score 5 · Accepted Answer · 2019-12-03

The reason for this is that OMA is quite stringent when mapping IDs. For human, OMA uses the genome provided by Ensembl. For the gene you are interested in (human gene "GABRA2"), the protein sequence OMA considers is a different isoform from the one UniProt treats as canonical. The one we use in OMA (https://omabrowser.org/oma/info/ENSP00000427603) is 511 AA long and maps to a different UniProt entry (https://uniprot.org/uniprot/E9PBQ7).

So how can you find GBRA2_HUMAN in OMA?

1) One way would be to use the approximate search functionality, e.g. using the REST API function sequence list (setting the search parameter to "approximate").

https://omabrowser.org/api/docs#sequence-list

For instance, using the library OmaDB in R: mapSequence(sequence, search = "approximate")

2) Another solution would be to map via the gene name GABRA2. You can use the REST API function xref list:

https://omabrowser.org/api/docs#xref

For instance, using the library OmaDB in R: searchProtein('GABRA2')

This returns two human isoforms in OMA. The first of these, ENSP00000427603, is the one that was used to infer orthologs.
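The two options above could be combined into one helper, sketched below under a few assumptions. The function name `find_gabra2_in_oma` is hypothetical, the OmaDB package must be installed, and the calls need network access to omabrowser.org. It also assumes that getHOG() accepts the Ensembl protein ID as its `id` argument, which is not confirmed in this thread. The HOG lookup simply repeats the call from the question with the ID OMA uses.

```r
# Sketch only: wraps the OmaDB calls mentioned in this thread.
# Nothing is contacted until the function is actually called.
find_gabra2_in_oma <- function(sequence) {
  # Option 1: approximate sequence search, as suggested above.
  approx_hits <- OmaDB::mapSequence(sequence, search = "approximate")

  # Option 2: cross-reference search by gene name.
  xref_hits <- OmaDB::searchProtein("GABRA2")

  # Assumption: the Ensembl protein ID used by OMA is a valid `id` here.
  hog <- OmaDB::getHOG(id = "ENSP00000427603", members = TRUE)

  list(approx_hits = approx_hits, xref_hits = xref_hits, hog = hog)
}
```

Calling `find_gabra2_in_oma(sequence)` with the sequence from the question would return the three results in one list, so the approximate hits and the gene-name hits can be compared side by side.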