Question: Issues in mapping UniProt IDs to Entrez IDs
mtyler.jason wrote, 6.2 years ago:

Hi all,

I am currently trying to use the site's protein ID to Entrez ID mapping service. I have around 2714 human proteins. However, when I map them to Entrez IDs, I get matches for only 790 proteins, which is far fewer than I had anticipated. Are there better services for mapping UniProt IDs to Entrez IDs?

gene protein entrez • 3.8k views
Hamish wrote, 6.2 years ago:

By "uniprot ids" do you mean UniProtKB entry names (e.g. BRCA1_HUMAN) or accessions (e.g. P38398)?

Since UniProtKB entry names are subject to change, you may need to map any entry names you are using to accessions before attempting to perform further mappings.
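As a first check, it can help to see which kind of identifier each entry in the list actually is. The sketch below is only an illustration: the accession pattern follows the regular expression given in the UniProt help pages, while the entry-name pattern (and its length limits) and the function names are assumptions made for this example.

```python
import re

# Accession pattern as documented in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Entry names look like MNEMONIC_SPECIES (e.g. BRCA1_HUMAN).
# The length limits used here are an assumption for this sketch.
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify_uniprot_id(identifier):
    """Return 'accession', 'entry_name' or 'unknown' for one identifier."""
    token = identifier.strip().upper()
    if ACCESSION_RE.fullmatch(token):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(token):
        return "entry_name"
    return "unknown"


def split_by_kind(identifiers):
    """Group identifiers by kind, keeping the original strings."""
    groups = {"accession": [], "entry_name": [], "unknown": []}
    for identifier in identifiers:
        groups[classify_uniprot_id(identifier)].append(identifier)
    return groups
```

Anything that lands in the `entry_name` or `unknown` group is worth converting or checking by hand before running the Entrez mapping.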

Also, do you mean Entrez Gene identifiers or NCBI GI numbers?

If the former, then:

If you want GI numbers, you can do much the same thing, but instead of querying Entrez Gene you query the 'Protein' database (which includes all the sequence data in UniProtKB). This returns the NCBI version of the UniProtKB entry, including the GI number assigned to the current protein sequence.
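One way to script such a query is through the NCBI E-utilities `esearch` endpoint. The following is a rough sketch, not a tested pipeline: the helper names are made up for this example, and using the bare accession as the search term is an assumption, so the returned records should be checked before being trusted. The `db` parameter can be switched between `protein` and `gene`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="protein"):
    """Build an esearch URL that asks for a JSON response."""
    query = urlencode({"db": db, "term": term, "retmode": "json"})
    return f"{EUTILS_ESEARCH}?{query}"


def parse_id_list(payload):
    """Extract the list of UIDs from an esearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


def search_ids(term, db="protein"):
    """Run the search over the network and return the matching UIDs.

    Assumption: searching with the bare accession as the term finds
    the intended record; verify the hits before relying on them.
    """
    with urlopen(build_esearch_url(term, db)) as response:
        return parse_id_list(response.read().decode("utf-8"))
```

For example, `search_ids("P38398", db="protein")` would query the Protein database, and `db="gene"` would query Entrez Gene instead. For a list of a few thousand accessions, the requests should be spaced out to stay within NCBI's usage limits.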

You could also try mapping your UniProtKB accessions (or entry names) through UniParc to RefSeq. Since this mapping is based on sequence identity rather than on the UniProtKB cross-reference annotations (which do not exist for some organisms), it will give you RefSeq entries that have the same sequence. You can then look up these RefSeq identifiers in Entrez Gene to get the best coverage possible for a set of sequence entries. It may be possible to go further by including older sequence versions in the mapping. However, sequences are typically updated to correct sequencing errors, so matches to older versions should be treated with caution.
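To see how much the sequence-based route adds, the two mapping results can be merged and the coverage measured. This is a minimal sketch assuming each mapping has already been loaded as a dictionary from accession to a set of Entrez Gene IDs; the function names and the data layout are hypothetical.

```python
def merge_mappings(primary, fallback):
    """Combine two accession -> gene-ID mappings.

    The primary mapping (e.g. cross-reference based) wins; the fallback
    (e.g. sequence-identity based) only fills accessions it left unmapped.
    """
    merged = {acc: set(ids) for acc, ids in primary.items() if ids}
    for acc, ids in fallback.items():
        if acc not in merged and ids:
            merged[acc] = set(ids)
    return merged


def coverage_report(query_ids, mapping):
    """Summarise how many of the (deduplicated) query IDs were mapped."""
    unique = list(dict.fromkeys(query_ids))  # drop duplicates, keep order
    mapped = [acc for acc in unique if mapping.get(acc)]
    unmapped = [acc for acc in unique if not mapping.get(acc)]
    fraction = len(mapped) / len(unique) if unique else 0.0
    return {
        "mapped": len(mapped),
        "total": len(unique),
        "fraction": fraction,
        "unmapped": unmapped,
    }
```

With the numbers from the question, 790 of 2714 mapped corresponds to a fraction of about 0.29. The `unmapped` list shows which accessions to inspect first, for instance for outdated entry names or non-human entries.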

miquelduranfrigola wrote, 6.2 years ago:

It's quite surprising that you get such low coverage. You are using Swiss-Prot UniProt accessions (ACs), right?

Anyway, if you are an R user, the Bioconductor package biomaRt is very useful.
