Retrieve unique human membrane protein-coding genes in UniProt
2.3 years ago
PRog ▴ 10

Hi everyone,

I'm trying to retrieve all human membrane protein-coding genes in UniProt. I ran this query:

locations:(location:"Membrane [SL-0162]") AND organism:"Homo sapiens (Human) [9606]"

But when I look at the list of associated genes ("Gene names (primary)") with a simple unique function in Python, I get as many genes as proteins (i.e. 37,557).

That does not make sense, since the human genome contains approximately 23,000 genes and membrane protein-coding genes are estimated to represent about 20% of them.

Can anyone see what is going on here?

Uniprot membrane proteins human genome • 590 views
2.3 years ago
GenoMax 129k

human membrane protein coding

Do you mean genes that are coding for proteins that are targeted at the membrane and may be secreted?

My answer here is still valid: "A: human membrane protein gene symbols". These are HUGO-approved human membrane genes.

If you use the following query at UniProt

reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640 AND annotation:(type:transmem) and annotation:(type:intramem)

you are down to 186 non-redundant proteins as of today.

And using just membrane as location

locations:(location:membrane) AND reviewed:yes AND organism:"Homo sapiens (Human) [9606]" AND proteome:up000005640

there are 7602 proteins.


Thank you for these hints! I'll take a deeper look at the HUGO database. Meanwhile, after switching to a "distinct" function, I found many "nan" entries in my list that were not being treated as NaN, so each one was counted as a separate gene. With that fixed, and with your answer, I should obtain a proper list.
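For anyone hitting the same thing, here is a minimal sketch of how missing gene names can inflate a unique count. The sample column and the helper names (`is_missing`, `unique_gene_names`) are made up for illustration; the original poster's actual code and data were not shown, so this is only one plausible reading of the problem.

```python
import math

def is_missing(value):
    """Return True for None, float NaN, empty strings and the literal text 'nan'."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip().lower() in {"", "nan"}

def unique_gene_names(values):
    """Collect distinct gene names, ignoring every kind of missing value."""
    return {str(v).strip() for v in values if not is_missing(v)}

# Hypothetical "Gene names (primary)" column: two real symbols plus missing values.
column = ["GENE_A", float("nan"), "GENE_A", float("nan"), "nan", "GENE_B", float("nan")]

# NaN never compares equal to itself, so each separate NaN object
# survives deduplication in a plain set.
naive_count = len(set(column))
clean_names = unique_gene_names(column)

print(naive_count, len(clean_names))
```

Here the naive count is 6 (two symbols, the string "nan", and three separate NaN objects), while the cleaned set holds only the 2 real gene symbols.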

2.3 years ago

I suggest you start by looking at the results of your query on the website and make sure they make sense in your context (not just looking at numbers) before launching the query programmatically.

The filters give you a number of hints that may be useful, e.g.
- restrict to reviewed entries
- restrict to entries from the human proteome
etc.
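As a rough sketch, the two filters above can be expressed as extra clauses joined onto the original query with AND. The clause strings are the ones quoted elsewhere in this thread; the `build_query` helper is hypothetical and only concatenates text, it does not contact UniProt.

```python
def build_query(*clauses):
    """Join non-empty query clauses with the AND operator."""
    return " AND ".join(clause for clause in clauses if clause)

# Clauses taken from the queries posted in this thread.
membrane = 'locations:(location:"Membrane [SL-0162]")'
human = 'organism:"Homo sapiens (Human) [9606]"'
reviewed = "reviewed:yes"                # restrict to reviewed entries
proteome = "proteome:up000005640"        # restrict to the human proteome

original_query = build_query(membrane, human)
restricted_query = build_query(membrane, human, reviewed, proteome)

print(original_query)
print(restricted_query)
```

Pasting the restricted string into the website's search box first is an easy way to check that the hit count looks reasonable before scripting anything.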

You may also add a column for the primary gene name, remove all columns that are not relevant for you in this context, and then download in tab-separated format:


tab-separated (preview first 10):[SL-0162]%22)%20AND%20organism:%22Homo%20sapiens%20(Human)%20[9606]%22&format=tab&limit=10&columns=id,reviewed,genes(PREFERRED)&sort=score
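The percent-encoded form in the link above can be reproduced with Python's standard library. This is only a sketch of the encoding step: the host and path of the link are cut off in this thread, so no base URL is assumed here and no request is sent. The parameter names (`format`, `limit`, `columns`, `sort`) are copied from the visible part of the link.

```python
from urllib.parse import quote

# The original poster's query, exactly as written in the question.
query = 'locations:(location:"Membrane [SL-0162]") AND organism:"Homo sapiens (Human) [9606]"'

# Keep the characters that appear unescaped in the link; spaces become %20
# and double quotes become %22.
encoded_query = quote(query, safe=":()[]")

# Query-string portion only; prepend the real service address yourself.
query_string = (
    f"query={encoded_query}"
    "&format=tab&limit=10"
    "&columns=id,reviewed,genes(PREFERRED)"
    "&sort=score"
)

print(query_string)
```

Removing `limit=10` would ask for the full result set rather than a preview, assuming the service treats the parameter the way the link text suggests.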

Please don't hesitate to contact the UniProt helpdesk if you have any additional questions.

