Question: How do I map UniProt IDs to gene names?
a.rex wrote (2.3 years ago):

I have a very long list of genes and their corresponding UniProt IDs from a BLAST search against the UniProt database.

Is there a tool I can download to convert these IDs to gene names?

Thank you

UniProt's documentation page "Programmatic access - Mapping database identifiers" explains how to use the Retrieve/ID mapping tool programmatically (i.e. from the terminal), with example scripts in several languages written for exactly this task. If you are unfamiliar with the UniProt ID converter itself, see the UniProt ID Tutorial.

You can also upload a file with thousands of IDs through the web interface and convert them in one go; I believe the limit is about 40,000 IDs per file. If your BLAST output contains more IDs than that, you can use the split command in the terminal to break the list into files of 40,000 IDs (or any size you like), and then adapt UniProt's example scripts to run the mapping programmatically on each file.
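The same batching can be done in R instead of with split. This is a minimal sketch, assuming the IDs are already in a character vector; chunk_ids() and the file names in the usage comment are hypothetical, and the 40,000 default only mirrors the limit mentioned above.

```r
# Split a vector of IDs into batches of at most `batch_size` IDs each
# (an in-R alternative to running `split` on the ID file in the terminal).
# The 40,000 default is an assumption taken from the web-upload limit above.
chunk_ids <- function(ids, batch_size = 40000) {
  # Batch number for each ID: 1 for the first batch_size IDs, 2 for the next, ...
  batch <- ceiling(seq_along(ids) / batch_size)
  unname(split(ids, batch))
}

## Usage (hypothetical file names): write each batch to its own file
# batches <- chunk_ids(readLines("blast_ids.txt"))
# for (i in seq_along(batches)) {
#   writeLines(batches[[i]], sprintf("ids_part_%03d.txt", i))
# }
```

Each resulting file can then be uploaded or submitted programmatically on its own.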

written 2.3 years ago by ladypurrsia

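Using R:

    uniprot_mapping <- function(ids) {
      # Prefix each ID with "accession:" so only the accession field is searched,
      # then join the terms with UniProt's (uppercase) OR operator
      query <- paste0("accession:", ids, collapse = " OR ")
      # UniProt REST "stream" endpoint: returns all matches as TSV in one response
      fullUri <- paste0("https://rest.uniprot.org/uniprotkb/stream?query=",
                        URLencode(query, reserved = TRUE),
                        "&fields=accession,id,gene_names&format=tsv")
      read.delim(fullUri)
    }

    ## Usage
    ids <- c("A0A2T3D680", "A0A0F0E143", "A0A0F0E266")
    uniprot_mapping(ids)

You can find the gene names in the column "Gene.Names". Because all IDs are packed into a single URL, call the function on smaller batches of IDs rather than on a very long list at once.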
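UniProt's gene-names column usually holds several space-separated names for one entry (the primary name first, followed by synonyms). If you only want one name per protein, here is a minimal sketch that keeps the first token, assuming that ordering; primary_gene_name() is a hypothetical helper, not part of any UniProt tool.

```r
# Return the first whitespace-separated name from each gene-names entry,
# assuming the primary gene name is listed first. Empty or missing
# entries (proteins without a gene name) become NA.
primary_gene_name <- function(gene_names) {
  gene_names <- as.character(gene_names)
  first <- vapply(strsplit(trimws(gene_names), "[[:space:]]+"),
                  function(x) if (length(x) == 0) NA_character_ else x[1],
                  character(1))
  # Normalise blanks and NAs explicitly
  first[is.na(gene_names) | trimws(gene_names) == ""] <- NA_character_
  first
}
```

Apply it to the gene-names column of the table returned by uniprot_mapping() to get one name per row.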

written 9 weeks ago by josev.die

For some reason, when I search for a UniProt accession using this function, it returns data for twice as many proteins. Is there something special about the A0A2 or A0A0 prefix on the example accessions that makes them work?

Edit: Figured it out. Put "accession:" in front of each accession ID (e.g. accession:P60710) to make this work. I suspect that a bare accession (e.g. P60710) searches both entry IDs and accessions.
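As an extra safety net, you can also filter the returned table so that it contains only the accessions you asked for, and list any that came back empty. A minimal sketch, assuming the table has an "Entry" column holding the accession (as in UniProt's tab-separated output); keep_requested() and missing_ids() are hypothetical helpers.

```r
# Keep only rows whose accession ("Entry") exactly matches a requested ID,
# dropping any extra hits returned by a broader search.
keep_requested <- function(dat, ids) {
  dat[dat$Entry %in% ids, , drop = FALSE]
}

# Requested IDs that did not come back at all (e.g. obsolete or mistyped accessions)
missing_ids <- function(dat, ids) {
  setdiff(ids, dat$Entry)
}
```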

written 7 weeks ago by alexandercmonovich

UniProt ID converter.

written 2.3 years ago by genomax

Thank you, but I am already aware of this. I wanted to know if there is a way to input a very large list (>1,000 loci), ideally something terminal-based.

written 2.3 years ago by a.rex

The code I posted here could also solve your problem once you add the input and output logic you need.
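As one example of the input side (this is not the linked code), here is a minimal R sketch that turns tabular BLAST output into a plain list of accessions, one per line. It assumes the database was built from UniProt FASTA files, so subject IDs look like "sp|P60710|ACTB_MOUSE" (database|accession|entry name), and that the output was written with -outfmt 6, where the subject ID is column 2. Both helper names are hypothetical.

```r
# Extract the accession from UniProt-style subject IDs ("sp|ACC|NAME" or
# "tr|ACC|NAME"); IDs without that pattern are returned unchanged.
accession_from_sseqid <- function(sseqid) {
  sub("^(sp|tr)\\|([^|]+)\\|.*$", "\\2", sseqid)
}

# Read tabular BLAST output (assumed -outfmt 6, subject ID in column 2),
# keep each accession once, and write them one per line to out_file.
blast_to_id_file <- function(blast_file, out_file) {
  hits <- read.delim(blast_file, header = FALSE, stringsAsFactors = FALSE)
  ids <- unique(accession_from_sseqid(hits[[2]]))
  writeLines(ids, out_file)
  invisible(ids)
}
```

The resulting file can be uploaded to the ID converter or read back with readLines() for a programmatic lookup.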

written 2.3 years ago by mobiusklein

You can download UniProt's ID mapping files and parse them however you want.
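A minimal sketch of the download route, assuming the three-column, tab-separated layout of UniProt's idmapping.dat file (accession, ID type, ID), where gene names appear under the ID type "Gene_Name". gene_names_from_idmapping() is a hypothetical helper. The full file is very large, so in practice you may want to pre-filter it in the terminal (for example with grep) before reading it into R.

```r
# Look up gene names for the given accessions in a local ID mapping file.
# Assumed layout per line: <UniProtKB accession> <TAB> <ID type> <TAB> <ID>
gene_names_from_idmapping <- function(mapping_file, ids) {
  map <- read.delim(mapping_file, header = FALSE, quote = "",
                    col.names = c("accession", "id_type", "id"),
                    colClasses = "character")
  # Keep only gene-name rows for the requested accessions
  genes <- map[map$id_type == "Gene_Name" & map$accession %in% ids,
               c("accession", "id")]
  names(genes)[2] <- "gene_name"
  rownames(genes) <- NULL
  genes
}
```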

written 2.3 years ago by genomax

This conversion tool (bioDBnet db2db) will, in theory, let you submit an unlimited number of IDs, although I think the web UI limits you to about 3,000. They also have an API that you should be able to call from a terminal, and I believe that route supports unlimited lists.

https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

written 2.3 years ago by andrew
Powered by Biostar version 2.3.0