How do I retrieve gene names from UniProt IDs?
Asked by a.rex, 5.0 years ago

I have a very long list of genes and their corresponding UniProt IDs from a BLAST search against the UniProt database.

Is there a tool I can download to convert these IDs to gene names?

Thank you

---

UniProt documents programmatic (i.e. terminal) access to its Retrieve/ID mapping tool on the page "Programmatic access - Mapping database identifiers", which includes example scripts in several languages for exactly this task. If you are unfamiliar with the UniProt ID converter itself, there is also a UniProtID Tutorial.

You can also upload a file of thousands of IDs through the web interface and convert them there; I believe the limit is a file of about 40,000 IDs. If your BLAST output contains more IDs than that, you can use the split command in a terminal to break the list into files of 40,000 IDs (or whatever size you want), then write a script based on UniProt's examples that submits each file to the tool programmatically.
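The splitting step can also be done in R, roughly what `split -l 40000` does on the command line. This is a minimal sketch: `chunk_ids` and `write_batches` are hypothetical helper names, the default batch size simply reuses the ~40,000 estimate above rather than a documented limit, and dropping empty strings and duplicates is an added assumption.

```r
# Split IDs into batches of at most `size` IDs.
chunk_ids <- function(ids, size = 40000) {
  stopifnot(size >= 1)
  # Drop empty strings and duplicates (a BLAST table can list the same
  # subject more than once)
  ids <- unique(ids[nzchar(ids)])
  # ceiling(seq_along(ids) / size) labels the first `size` IDs 1, the next 2, ...
  split(ids, ceiling(seq_along(ids) / size))
}

# Write each batch to its own one-ID-per-line text file and return the paths.
write_batches <- function(batches, dir = tempdir(), prefix = "uniprot_ids_") {
  paths <- file.path(dir, sprintf("%s%03d.txt", prefix, seq_along(batches)))
  for (i in seq_along(batches)) writeLines(batches[[i]], paths[i])
  paths
}
```

For example, `ids <- readLines("blast_hit_ids.txt")` (a hypothetical one-ID-per-line file) followed by `write_batches(chunk_ids(ids), dir = ".")` produces files that can then be submitted one at a time.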

---

Using R:

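uniprot_mapping <- function(ids) {
  # Build one query that ORs all IDs together
  uri <- 'http://www.uniprot.org/uniprot/?query='
  idStr <- paste(ids, collapse = "+or+")
  # Ask for tab-separated output
  format <- '&format=tab'
  fullUri <- paste0(uri, idStr, format)
  # read.delim can read the tab-separated result straight from the URL
  dat <- read.delim(fullUri)
  dat
}

## Usage
ids <- c("A0A2T3D680", "A0A0F0E143", "A0A0F0E266")
uniprot_mapping(ids)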

You can find your data under the column "Gene.names".
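UniProt replaced its legacy API with a REST service at rest.uniprot.org in 2022, so the www.uniprot.org/uniprot/?query= URL in uniprot_mapping above may no longer return results. Below is a minimal sketch of the same idea against the newer service. It assumes the /uniprotkb/stream endpoint with the query, fields and format parameters. The helper names are hypothetical, and the batch size of 100 IDs per request is an arbitrary choice to keep URLs short, not a documented limit. The exact header of the gene-names column may differ from the old "Gene.names", so check names() on the result.

```r
# Build a /uniprotkb/stream URL that returns the requested fields as TSV.
uniprot_query_url <- function(ids, fields = c("accession", "gene_names")) {
  # "accession:" restricts each term to the accession field; terms are OR-ed
  query <- paste0("accession:", ids, collapse = " OR ")
  paste0("https://rest.uniprot.org/uniprotkb/stream",
         "?query=", utils::URLencode(query, reserved = TRUE),
         "&fields=", paste(fields, collapse = ","),
         "&format=tsv")
}

# Query in batches and stack the per-batch tables into one data frame.
uniprot_gene_names <- function(ids, batch_size = 100) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- lapply(batches, function(b) {
    # check.names = FALSE keeps the column headers exactly as UniProt sends them
    read.delim(uniprot_query_url(b), check.names = FALSE, quote = "")
  })
  do.call(rbind, results)
}
```

Usage would look like `res <- uniprot_gene_names(c("A0A2T3D680", "A0A0F0E143", "A0A0F0E266"))` followed by `head(res)`.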

---

For some reason, when I search for a UniProt accession using this function, it returns data for twice as many proteins. Is there something special about the A0A2 or A0A0 prefix on the example accessions that makes it work?

Edit: Figured it out. Paste "accession:" before each accession ID to make this work. I suspect that a bare accession (e.g. P60710) is searched against both entry IDs and accessions.
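Even with the accession: prefix, it can help to check the returned table against the IDs you actually asked for. The hypothetical helper below keeps only rows whose accession column matches a requested ID and reports requested IDs that came back with no row. It assumes the accession column is headed "Entry" (change col to match your output). A secondary (merged) accession is usually reported under its primary accession, so it would be flagged as missing here rather than matched.

```r
# Keep rows whose accession is one of the requested IDs and report misses.
keep_requested <- function(dat, ids, col = "Entry") {
  # col = "Entry" is an assumption about the accession column's header
  hits <- dat[dat[[col]] %in% ids, , drop = FALSE]
  missing <- setdiff(ids, hits[[col]])
  if (length(missing) > 0) {
    message(length(missing), " requested ID(s) had no matching row: ",
            paste(missing, collapse = ", "))
  }
  hits
}
```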

---

Thank you, but I am already aware of this. I wanted to know whether there is a way to input a very large list (>1,000 loci), ideally something terminal-based.

---

The code I posted here could solve your problem as well, once you add the input and output logic you need.

---

You can download UniProt's ID mapping files and parse them however you want.
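A rough R sketch of that approach. As far as I know, UniProt's idmapping.dat files (in the idmapping directory of the UniProt FTP site) have three tab-separated columns with no header: UniProtKB accession, ID type, and mapped ID, with gene names listed under the ID type "Gene_Name". The helpers below are hypothetical and built on that assumption. The full file is very large, so a species-specific file, if one exists for your organism, is a more practical download. read.delim can read a gzip-compressed file directly.

```r
# Read an idmapping.dat-style file: accession, ID type, mapped ID (no header).
read_idmapping <- function(file) {
  read.delim(file, header = FALSE, quote = "", comment.char = "",
             col.names = c("accession", "id_type", "id"),
             colClasses = "character")
}

# Return one gene name per requested accession (NA when none is listed).
lookup_gene_names <- function(mapping, ids) {
  genes <- mapping[mapping$id_type == "Gene_Name", c("accession", "id")]
  # match() keeps the order of `ids` and takes the first Gene_Name row per accession
  data.frame(accession = ids,
             gene_name = genes$id[match(ids, genes$accession)],
             stringsAsFactors = FALSE)
}
```

For example, `lookup_gene_names(read_idmapping("HUMAN_9606_idmapping.dat.gz"), ids)`, where the file name is illustrative and should be replaced by whichever mapping file you downloaded.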

---

This conversion tool (bioDBnet db2db) will, in theory, accept an unlimited list of IDs, although I think the web interface is limited to about 3,000. It also has an API that you should be able to access from a terminal, and I believe that route supports unlimited lists.

https://biodbnet-abcc.ncifcrf.gov/db/db2db.php
