Question: Gene Id Conversion Tool
33
5.7 years ago by
Renee330
Renee330 wrote:

Hey,

I was using DAVID (http://david.abcc.ncifcrf.gov/conversion.jsp) for gene ID conversion, e.g. between Agilent IDs, GenBank accession IDs and Entrez Gene IDs, but I found that the DAVID database is not up to date. Does anyone know of a better, more recently updated conversion tool for this job? Thanks!

ADD COMMENTlink modified 11 days ago by s.birch0 • written 5.7 years ago by Renee330

How frequently do you need things updated? DAVID has had yearly releases so far, and their latest release is this month (March 2010). See the release announcement here: http://david.abcc.ncifcrf.gov/forum/cgi-bin/ikonboard.cgi?act=ST;f=10;t=25 It suggests that the underlying mapping framework will be updated along with the 6.7 beta, so the conversion tool should include more recent information.

ADD REPLYlink modified 2.7 years ago by Istvan Albert ♦♦ 55k • written 5.3 years ago by Daniel Swan11k
25
4.2 years ago by
Casey Bergman16k
Manchester, UK
Casey Bergman16k wrote:

The bioDBnet and Hyperlink Management System (HMS) systems convert multiple ID sets to each other.

HMS is limited to three species (human, mouse, ciona) and has fewer data sources (Agilent: no; GenBank and Entrez: yes).

The bioDBnet system appears to be species-neutral, and its network of linked databases includes Agilent, GenBank and Entrez, so it should fit your requirements. (Figure: the bioDBnet network of linked databases.)

ADD COMMENTlink written 4.2 years ago by Casey Bergman16k
17
5.3 years ago by
Bergen, Norway
Michael Dondrup32k wrote:

BioMart has already been mentioned. It can do much more than ID conversion, but it is very useful for that purpose: it is regularly updated, and you can select different genome builds and all kinds of genomic features. It seems to me that you wish to retrieve GeneIDs linked to Affymetrix IDs. To select these attributes in BioMart, go to the Martview page and start a new BioMart query.

Select attributes on the attribute page: the Ensembl GeneIDs and Transcript IDs are selected by default. Ensembl GeneID and Affy IDs are under the "External" tab; select your chip there. To limit the results to genes that are on the chip, use the Filters->Gene menu. You can limit the genes to those present on various platforms or to your favourite set.

There is a URL button in BioMart that lets you retrieve a URL for your query and pass it on to others. Try this example: BioMart URL. That should be a good starting point.
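
A query like this can also be built in a script. As a rough sketch (not the query behind the URL above), the Python below assembles a BioMart XML query and the matching martservice URL. The host, the dataset name `hsapiens_gene_ensembl`, and the attribute and filter names `ensembl_gene_id` and `affy_hg_u133_plus_2` are assumptions; the names actually available depend on the release, so check them on the Martview attribute and filter pages. The two gene IDs are just examples.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed endpoint; other BioMart servers may use a different host.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart XML query string asking for tab-separated output."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">',
        f'  <Dataset name={quoteattr(dataset)} interface="default">',
    ]
    # Each filter takes a comma-separated list of values.
    for name, values in (filters or {}).items():
        joined = ",".join(values)
        lines.append(f"    <Filter name={quoteattr(name)} value={quoteattr(joined)}/>")
    # Each attribute becomes one output column.
    for name in attributes:
        lines.append(f"    <Attribute name={quoteattr(name)}/>")
    lines += ["  </Dataset>", "</Query>"]
    return "\n".join(lines)


def biomart_url(query_xml, base=MARTSERVICE):
    """URL-encode the XML into the 'query' parameter of the service URL."""
    return base + "?" + urlencode({"query": query_xml})


query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",                      # assumed dataset name
    attributes=["ensembl_gene_id", "affy_hg_u133_plus_2"],  # assumed attribute names
    filters={"ensembl_gene_id": ["ENSG00000142676", "ENSG00000103811"]},
)
url = biomart_url(query)
print(url)
```

The printed URL can be pasted into a browser or fetched with any HTTP client; the response should be one tab-separated row per gene and probe pair, provided the names above match what the server offers.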

If you are interested in KEGG identifiers (pathways, genes), EC numbers, etc., the KEGG Identifier page could be handy, because the KEGG IDs are not in BioMart as far as I know.

ADD COMMENTlink written 5.3 years ago by Michael Dondrup32k
8
8 months ago by
adam.maikai200
United States
adam.maikai200 wrote:

MyGene.info is a web service that provides up-to-date annotations in several fields and is great for gene ID conversion. All species from NCBI and Ensembl are supported, and annotations are updated weekly. Both the Python and R/Bioconductor clients are easy to use.

http://bioconductor.org/packages/release/bioc/html/mygene.html

https://pypi.python.org/pypi/mygene

MyGene.info may not be able to solve your problem with Agilent IDs, but several other ID types from GenBank, UniProt, Ensembl and RefSeq are all available. Also, from either client, you can query several thousand genes at once.

Here is some example syntax for ID conversion from the Python module:

>>> import mygene
>>> mg = mygene.MyGeneInfo()
>>> mg.metadata['available_fields']  # returns available query terms
[u'accession', u'alias', u'biocarta', u'chr', u'end', u'ensemblgene', u'ensemblprotein', u'ensembltranscript', u'entrezgene', u'exons', u'flybase', u'generif', u'go', u'hgnc', u'homologene', u'hprd', u'humancyc', u'interpro', u'ipi', u'kegg', u'mgi', u'mim', u'mirbase', u'mousecyc', u'name', u'netpath', u'pdb', u'pfam', u'pharmgkb', u'pid', u'pir', u'prosite', u'ratmap', u'reactome', u'reagent', u'refseq', u'reporter', u'retired', u'rgd', u'smpdb', u'start', u'strand', u'summary', u'symbol', u'tair', u'taxid', u'type_of_gene', u'unigene', u'uniprot', u'wikipathways', u'wormbase', u'xenbase', u'yeastcyc', u'zfin']

>>> xli = ['DDX26B', 'CCDC83', 'MAST3', 'RPL11', 'ZDHHC20', 'LUC7L3', 'SNORD49A', 'CTSH', 'ACOT8']
>>> mg.querymany(xli, scopes="symbol", fields=["uniprot", "ensembl.gene", "reporter"], species="human", as_dataframe=True)

A DataFrame is returned:

Finished.
             _id                        ensembl.gene  \
query
DDX26B    203522                     ENSG00000165359
CCDC83    220047                     ENSG00000150676
MAST3      23031                     ENSG00000099308
RPL11       6135                     ENSG00000142676
ZDHHC20   253832                     ENSG00000180776
LUC7L3     51747                     ENSG00000108848
SNORD49A   26800  [ENSG00000277370, ENSG00000175061]
CTSH        1512                     ENSG00000103811
ACOT8      10005                     ENSG00000101473

                                                   reporter  \
query
DDX26B    {u'HG-U95B': u'53886_at', u'GNF1H': u'gnf1h144...
CCDC83    {u'GNF1H': [u'gnf1h06565_at', u'gnf1h09743_at'...
MAST3     {u'HG-U133_Plus_2': u'213045_at', u'HG-U95Av2'...
RPL11     {u'GNF1H': u'200010_at', u'HG-U133_Plus_2': u'...
ZDHHC20   {u'HG-U133_Plus_2': [u'225365_at', u'243786_at']}
LUC7L3    {u'HG-U95B': [u'55032_at', u'57642_at'], u'HG-...
SNORD49A  {u'HG-U133_Plus_2': [u'225065_x_at', u'239754_...
CTSH      {u'HG-U133_Plus_2': u'202295_s_at', u'HG-U95Av...
ACOT8     {u'HG-U95B': u'47789_at', u'HG-U133_Plus_2': [...

                                                    uniprot
query
DDX26B                           {u'Swiss-Prot': u'Q5JSJ4'}
CCDC83     {u'Swiss-Prot': u'Q8IWF9', u'TrEMBL': u'H0YDV3'}
MAST3      {u'Swiss-Prot': u'O60307', u'TrEMBL': u'V9GYV0'}
RPL11     {u'Swiss-Prot': u'P62913', u'TrEMBL': [u'Q5VVC...
ZDHHC20    {u'Swiss-Prot': u'Q5W0Z9', u'TrEMBL': u'B4DRN8'}
LUC7L3    {u'Swiss-Prot': u'O95232', u'TrEMBL': [u'A8K3C...
SNORD49A                                                NaN
CTSH      {u'Swiss-Prot': u'P09668', u'TrEMBL': [u'E9PKT...
ACOT8     {u'Swiss-Prot': u'O14734', u'TrEMBL': [u'E9PIS...
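
If you would rather work with plain Python objects than a DataFrame, `querymany` can be called without `as_dataframe=True`, in which case it returns a list of dictionaries, one per hit. The sketch below flattens such a list into a mapping from each query symbol to its Ensembl gene IDs. The helper name `ensembl_ids_by_query` is made up for illustration, and the `hits` list is hand-written in the shape I would expect for the query above (a nested `ensembl` entry that is either one dictionary or a list of them, and `notfound` for unmatched queries). Treat that shape as an assumption and check it against a real response.

```python
def ensembl_ids_by_query(hits):
    """Map each query term to the list of Ensembl gene IDs found for it."""
    mapping = {}
    for hit in hits:
        ids = mapping.setdefault(hit["query"], [])
        if hit.get("notfound"):
            continue  # keep the unmatched query with an empty list
        ensembl = hit.get("ensembl", [])
        if isinstance(ensembl, dict):  # a single match is assumed to be one dict
            ensembl = [ensembl]
        for entry in ensembl:
            gene = entry.get("gene")
            if gene and gene not in ids:
                ids.append(gene)
    return mapping


# Hand-written sample in the assumed response shape (IDs taken from the table above).
hits = [
    {"query": "RPL11", "_id": "6135", "ensembl": {"gene": "ENSG00000142676"}},
    {"query": "SNORD49A", "_id": "26800",
     "ensembl": [{"gene": "ENSG00000277370"}, {"gene": "ENSG00000175061"}]},
    {"query": "NOT_A_GENE", "notfound": True},
]

for symbol, genes in ensembl_ids_by_query(hits).items():
    print(symbol, ",".join(genes) or "-")
```

With the client from the example above, the input would come from something like `mg.querymany(xli, scopes="symbol", fields=["ensembl.gene"], species="human")`.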

And now for the Bioconductor package:

library(mygene)
xli  <-  c('DDX26B','CCDC83',  'MAST3', 'RPL11', 'ZDHHC20',  'LUC7L3',  'SNORD49A',  'CTSH', 'ACOT8')
queryMany(xli, scopes="symbol", fields=c("uniprot", "ensembl.gene", "reporter"), species="human")

This returns a DataFrame:

Finished
DataFrame with 9 rows and 5 columns
                     ensembl.gene         _id uniprot.Swiss-Prot uniprot.TrEMBL       query
                  <CharacterList> <character>        <character>         <List> <character>
1                 ENSG00000165359      203522             Q5JSJ4       ########      DDX26B
2                 ENSG00000150676      220047             Q8IWF9       ########      CCDC83
3                 ENSG00000099308       23031             O60307       ########       MAST3
4                 ENSG00000142676        6135             P62913       ########       RPL11
5                 ENSG00000180776      253832             Q5W0Z9       ########     ZDHHC20
6                 ENSG00000108848       51747             O95232       ########      LUC7L3
7 ENSG00000277370,ENSG00000175061       26800                 NA       ########    SNORD49A
8                 ENSG00000103811        1512             P09668       ########        CTSH
9                 ENSG00000101473       10005             O14734       ########       ACOT8

 

ADD COMMENTlink modified 8 months ago • written 8 months ago by adam.maikai200
2

That's a pretty neat service. You should post this as a separate tool announcement.

ADD REPLYlink written 8 months ago by Istvan Albert ♦♦ 55k

There is already a request for including Agilent reporter IDs in MyGene.info:

https://bitbucket.org/sulab/mygene.info/issue/1/support-for-agilent-platform-reporters

Please leave a comment there if you need any specific platforms to be included.

ADD REPLYlink written 8 months ago by Newgene40
7
5.3 years ago by
Perry280
philadelphia
Perry280 wrote:

BridgeDB provides a nice API and REST interface, so you can put ID mapping queries in your scripts.

ADD COMMENTlink written 5.3 years ago by Perry280

BridgeDb is really a software framework that you can use in your own code, either directly (currently only in Java) or by calling it as a web service. It can use different and even multiple stacked mappings. By default these come from Ensembl (for gene products) and HMDB (for metabolites). Ongoing projects extend the available mappings with ChemSpider and SNP info. There is a short introduction available at Nature Precedings: http://precedings.nature.com/documents/5023/version/1 and a paper in BMC Bioinformatics: http://dx.doi.org/10.1186/1471-2105-11-5
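
To give an idea of what calling the web service from a script might look like, here is a minimal Python sketch. The endpoint, the path layout (`/{organism}/xrefs/{system code}/{identifier}`), the system code `L` for Entrez Gene and the tab-separated response format are all assumptions on my part, so check them against the web service documentation. The sample response is hand-written for illustration and was not returned by the server.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed public endpoint and path layout: /{organism}/xrefs/{system code}/{identifier}
BRIDGEDB = "https://webservice.bridgedb.org"


def xrefs_url(organism, system_code, identifier, base=BRIDGEDB):
    """Build the cross-reference URL for one identifier."""
    parts = [quote(str(p), safe="") for p in (organism, "xrefs", system_code, identifier)]
    return base + "/" + "/".join(parts)


def parse_xrefs(text):
    """Parse 'identifier<TAB>data source' lines into a dict of lists."""
    by_source = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        identifier, _, source = line.partition("\t")
        by_source.setdefault(source.strip(), []).append(identifier.strip())
    return by_source


def fetch_xrefs(organism, system_code, identifier):
    """Network call; not exercised by the sample below."""
    with urlopen(xrefs_url(organism, system_code, identifier), timeout=30) as response:
        return parse_xrefs(response.read().decode("utf-8"))


# Hand-written sample response for Entrez Gene 6135 (RPL11), for illustration only.
sample = "ENSG00000142676\tEnsembl\nRPL11\tHGNC\n"
print(xrefs_url("Human", "L", "6135"))
print(parse_xrefs(sample))
```

Keeping the parsing separate from the HTTP call makes it easy to test the script offline and to point it at a locally installed service by changing `base`.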

ADD REPLYlink written 4.4 years ago by Chris Evelo9.4k
6
5.4 years ago by
London, UK
Giovanni M Dall'Olio21k wrote:

You can also do it with the following services:

  • uniprot - Click on 'Id Mapping' from the home page.
  • biomart - choose a database and a version, then put the IDs you want to convert under Filters->Id List limit (select the proper input ID type in the menu), and select the output IDs under 'Attributes'. BioMart is a general tool that lets you extract a lot of different information from databases (sequences, ontologies, transcripts, homologues), but it may be a bit too complex if all you need is gene ID conversion.
  • galaxy - I can't help too much about this here but I am sure it has a function for doing that - and many other things.
ADD COMMENTlink written 5.4 years ago by Giovanni M Dall'Olio21k
5
5.3 years ago by
Madelaine Gogol4.2k
Kansas City
Madelaine Gogol4.2k wrote:

If you have just a few, I just saw someone use the R package BioIDMapper and it seemed kind of neat. But it's slow.

ADD COMMENTlink written 5.3 years ago by Madelaine Gogol4.2k

Unfortunately, this link is now broken :/ ...

ADD REPLYlink written 15 months ago by Samuel Lampa1000
1

There is a more recent version at: http://sourceforge.net/projects/bioidmapper/

ADD REPLYlink written 15 months ago by Daniel Swan11k
4
5.6 years ago by
Mohammed Islaih40 wrote:

The following link has a list of ID conversion tools:

http://hum-molgen.org/NewsGen/08-2009/000020.html

ADD COMMENTlink modified 2.7 years ago by Istvan Albert ♦♦ 55k • written 5.6 years ago by Mohammed Islaih40
3
5.3 years ago by
Daniel Swan11k
The Genome Analysis Centre, Norwich, UK
Daniel Swan11k wrote:

http://idconverter.bioinfo.cnio.es/ is another possible solution, although you might find that it is not as up to date as you would like either.

ADD COMMENTlink written 5.3 years ago by Daniel Swan11k

I would like to ask here: this tool also converts HGNC IDs to Ensembl gene IDs (ENSG...), but I do not get a corresponding Ensembl ID for every HGNC ID I have. Is there any way I can retrieve Ensembl IDs for as many of my HGNC gene IDs as possible?

ADD REPLYlink written 9 months ago by vchris_ngs360
1
2.1 years ago by
Samuel Lampa1000
Stockholm
Samuel Lampa1000 wrote:

Have a look at the (BETA stage) Ensembl REST API

For example, for converting from Ensembl Gene ids to Gene symbols, you could use a query like this one:

http://beta.rest.ensembl.org/xrefs/id/ENSG00000059804?content-type=application/json

... and then programmatically (some Python parsing should be rather straightforward) extract the "display_id" for the items that have "dbname" = "HGNC" or "EntrezGene".

For example, the following PHP code does the trick for me:

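<?php
// Example ID from the query above; replace with your own.
$ensemblID = "ENSG00000059804";

// Fetch the cross-references as JSON. file_get_contents is an assumption; any HTTP client works.
$ensemblResultJson = file_get_contents("http://beta.rest.ensembl.org/xrefs/id/$ensemblID?content-type=application/json");
$ensemblResult = json_decode($ensemblResultJson, true);

// Print out each found Gene symbol on a separate row:
echo "\n";
foreach ($ensemblResult as $mapping) {
    if (in_array($mapping['dbname'], array("EntrezGene", "HGNC"))) {
        echo "Found Gene symbol: " . $mapping['display_id'] . "\n";
    }
}
echo "\n";
?>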
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by Samuel Lampa1000
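
For those who prefer Python, the same filtering is only a few lines. This is a sketch, not a tested client: it assumes the response is a JSON list of objects with `dbname` and `display_id` keys, as described in the answer above. The function names are hypothetical, and the sample payload is hand-written with placeholder values; it was not returned by the server.

```python
import json
from urllib.request import urlopen

WANTED = ("EntrezGene", "HGNC")


def gene_symbols(xrefs, dbnames=WANTED):
    """Return the display_id of every cross-reference from the wanted databases."""
    return [x["display_id"] for x in xrefs if x.get("dbname") in dbnames]


def fetch_xrefs(ensembl_id, host="http://beta.rest.ensembl.org"):
    """Network call; the host is a parameter because the beta address may change."""
    url = f"{host}/xrefs/id/{ensembl_id}?content-type=application/json"
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Hand-written payload in the assumed shape, with placeholder values.
sample = json.loads("""[
  {"dbname": "HGNC", "display_id": "GENE_A"},
  {"dbname": "EntrezGene", "display_id": "GENE_A"},
  {"dbname": "OtherDB", "display_id": "X123"}
]""")

for symbol in gene_symbols(sample):
    print("Found Gene symbol:", symbol)
```

Like the PHP version, this prints one line per matching cross-reference, so the same symbol can appear twice when both HGNC and EntrezGene report it; wrap the result in `set()` if you only want unique symbols.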
0
5.7 years ago by
Istvan Albert ♦♦ 55k
University Park, USA
Istvan Albert ♦♦ 55k wrote:

I don't know of a direct solution myself, but this is a topic that may be of interest for the biological data analysis class that I am teaching.

If you specify the organism/genomic builds that you are interested in, we may be able to generate a full translation list as an in-class example or a homework. I was planning on covering an Affymetrix ID to GenBank example anyhow.

ADD COMMENTlink written 5.7 years ago by Istvan Albert ♦♦ 55k

Thanks! That's great! But I'm not a student there... Can I access that anyway? I am using the Human whole genome Agilent array. Thank you so much.

ADD REPLYlink written 5.7 years ago by Renee330

missed this comment, sorry about it!

ADD REPLYlink written 5.4 years ago by Istvan Albert ♦♦ 55k
0
18 months ago by
aheinzel80
Austria
aheinzel80 wrote:

Not sure what your background is, however, we recently started to develop an id mapper / converter for experimentalists who prefer organizing their data in Excel. Therefore, the client directly integrates into MS Excel.

Currently, we provide the possibility to map from various IDs to Ensembl and back. The mapping data were extracted from Ensembl 73 (released on 4.9.2013). If you need mappings for any additional ID types available from the Ensembl database, we will be happy to add them (please just tell us via our feedback form).

ADD COMMENTlink written 18 months ago by aheinzel80

@aheinzel

Is it possible to use this tool to generate Ensembl gene IDs (ENSG...) from HGNC gene IDs for human? Also, does it work on Mac, or is it just for Windows?

ADD REPLYlink modified 9 months ago • written 9 months ago by vchris_ngs360

I don't really understand the need for this. Many ID mappers offer web services, and if needed these can be installed locally. That is definitely true for our own BridgeDb. Is there any reason you could not just call these services from Excel? (And yes, that would allow mapping from Ensembl gene IDs to HGNC, or from probeset IDs.)

ADD REPLYlink written 9 months ago by Chris Evelo9.4k
0
11 days ago by
s.birch0
United Kingdom
s.birch0 wrote:

If you have just a few, I just saw someone use the R package BioIDMapper and it seemed kind of a good thing to use. But it is slow.

ADD COMMENTlink written 11 days ago by s.birch0