MyGene.info is a web service that provides up-to-date gene annotations across many fields and is well suited to gene ID conversion. All species from NCBI and Ensembl are supported, and annotations are updated weekly so the latest information is available. Both the Python and R/Bioconductor clients are easy to use.
MyGene.info may not solve your problem with Agilent IDs, but many other ID types, including GenBank, UniProt, Ensembl and RefSeq, are supported. From either client you can also query several thousand genes at once.
Here is some example syntax for ID conversion with the Python module:
>>> import mygene
>>> mg = mygene.MyGeneInfo()
>>> mg.metadata['available_fields'] ## returns available query terms, pay special attention to "ensemblgene", "entrezgene", "symbol" and "uniprot"
[u'accession', u'alias', u'biocarta', u'chr', u'end', u'ensemblgene', u'ensemblprotein', u'ensembltranscript', u'entrezgene', u'exons', u'flybase', u'generif', u'go', u'hgnc', u'homologene', u'hprd', u'humancyc', u'interpro', u'ipi', u'kegg', u'mgi', u'mim', u'mirbase', u'mousecyc', u'name', u'netpath', u'pdb', u'pfam', u'pharmgkb', u'pid', u'pir', u'prosite', u'ratmap', u'reactome', u'reagent', u'refseq', u'reporter', u'retired', u'rgd', u'smpdb', u'start', u'strand', u'summary', u'symbol', u'tair', u'taxid', u'type_of_gene', u'unigene', u'uniprot', u'wikipathways', u'wormbase', u'xenbase', u'yeastcyc', u'zfin']
>>> xli = ['DDX26B','CCDC83', 'MAST3', 'RPL11', 'ZDHHC20', 'LUC7L3', 'SNORD49A', 'CTSH', 'ACOT8']
>>> mg.querymany(xli, scopes="symbol", fields=["uniprot", "ensembl.gene", "reporter"], species="human", as_dataframe=True)
A pandas DataFrame is returned.
And here is the same query with the Bioconductor package:
library(mygene)
xli <- c('DDX26B','CCDC83', 'MAST3', 'RPL11', 'ZDHHC20', 'LUC7L3', 'SNORD49A', 'CTSH', 'ACOT8')
queryMany(xli, scopes="symbol", fields=c("uniprot", "ensembl.gene", "reporter"), species="human")
This returns a DataFrame:
Finished
DataFrame with 9 rows and 5 columns
                     ensembl.gene         _id uniprot.Swiss-Prot uniprot.TrEMBL       query
                  <CharacterList> <character>        <character>         <List> <character>
1                 ENSG00000165359      203522             Q5JSJ4       ########      DDX26B
2                 ENSG00000150676      220047             Q8IWF9       ########      CCDC83
3                 ENSG00000099308       23031             O60307       ########       MAST3
4                 ENSG00000142676        6135             P62913       ########       RPL11
5                 ENSG00000180776      253832             Q5W0Z9       ########     ZDHHC20
6                 ENSG00000108848       51747             O95232       ########      LUC7L3
7 ENSG00000277370,ENSG00000175061       26800                 NA       ########    SNORD49A
8                 ENSG00000103811        1512             P09668       ########        CTSH
9                 ENSG00000101473       10005             O14734       ########       ACOT8
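The SNORD49A row shows that one symbol can map to more than one Ensembl gene, so it is worth normalizing the results before using them downstream. Below is a minimal Python sketch, assuming the default (non-DataFrame) output of querymany: a list of dicts in which "ensembl" is a single dict for one match, a list of dicts for several matches, and a query that was not found carries "notfound": True. The helper names ensembl_genes and symbol_to_ensembl are hypothetical and not part of the mygene package; the demo uses hand-written hits built from IDs in the table above rather than a live call.

```python
from typing import Any, Dict, List


def ensembl_genes(hit: Dict[str, Any]) -> List[str]:
    """Return the Ensembl gene IDs of one querymany() hit as a list.

    Assumed hit shape (fields="ensembl.gene"): "ensembl" is a dict such as
    {"gene": "ENSG..."} for one match, a list of such dicts for several
    matches, and missing when there is no Ensembl annotation.
    """
    ens = hit.get("ensembl")
    if ens is None:
        return []
    if isinstance(ens, dict):
        ens = [ens]  # treat the single-match case like a one-element list
    genes: List[str] = []
    for entry in ens:
        gene = entry.get("gene")
        if isinstance(gene, list):
            genes.extend(gene)
        elif gene is not None:
            genes.append(gene)
    return genes


def symbol_to_ensembl(hits: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each query symbol to all of its Ensembl gene IDs.

    Queries flagged "notfound" map to an empty list; a symbol with several
    hits collects the IDs from all of them.
    """
    mapping: Dict[str, List[str]] = {}
    for hit in hits:
        ids = mapping.setdefault(hit["query"], [])
        if not hit.get("notfound"):
            ids.extend(ensembl_genes(hit))
    return mapping


if __name__ == "__main__":
    # Hand-written hits using IDs from the table above; a live call would be
    # mg.querymany(xli, scopes="symbol", fields="ensembl.gene", species="human")
    hits = [
        {"query": "RPL11", "_id": "6135", "ensembl": {"gene": "ENSG00000142676"}},
        {"query": "SNORD49A", "_id": "26800",
         "ensembl": [{"gene": "ENSG00000277370"}, {"gene": "ENSG00000175061"}]},
    ]
    print(symbol_to_ensembl(hits))
    # {'RPL11': ['ENSG00000142676'], 'SNORD49A': ['ENSG00000277370', 'ENSG00000175061']}
```

Returning a list for every symbol, even when there is a single match, keeps the one-to-many case explicit instead of silently keeping only the first Ensembl ID.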
How frequently do you need the mappings updated? DAVID has had yearly releases so far, and the latest one came out this month (March 2010). See the release announcement here: http://david.abcc.ncifcrf.gov/forum/cgi-bin/ikonboard.cgi?act=ST;f=10;t=25 This suggests that the underlying mapping framework will be updated along with the 6.7 beta, so the conversion tool should include more recent information.
Hi,
I am facing the same problem. I performed differential gene expression analysis using the protocol "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown".
After running Ballgown, I have a gene list file.
The gene IDs in this file look like this:
Next I want to perform gene ontology analysis with AgriGO, but these gene IDs are not recognized by any database.
I used bioDBnet to convert them to Ensembl gene IDs, but it returned no results.
I would say that the numeric suffixes are Entrez IDs (Gene IDs).
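If so, the numbers can be stripped off and looked up in MyGene.info by Entrez Gene ID. Here is a minimal Python sketch, assuming each ID ends in its Entrez Gene ID; the "gene-LOC4327487" form in the comment is hypothetical because the actual IDs are not shown in the question, and entrez_suffix, suffix_map and lookup_entrez are illustrative names. The live query uses querymany with scopes="entrezgene"; the species="all" argument is an assumption meant to avoid restricting the search to the default species. Any ID that merely ends in a number also matches the pattern, so check a few conversions by hand before trusting them.

```python
import re
from typing import Dict, Iterable, List, Optional

# Trailing run of digits, e.g. "4327487" in a hypothetical "gene-LOC4327487".
TRAILING_DIGITS = re.compile(r"(\d+)$")


def entrez_suffix(gene_id: str) -> Optional[str]:
    """Return the trailing digits of gene_id, or None if it has none."""
    match = TRAILING_DIGITS.search(gene_id.strip())
    return match.group(1) if match else None


def suffix_map(gene_ids: Iterable[str]) -> Dict[str, str]:
    """Map each original ID to its numeric suffix; IDs without one are dropped."""
    out: Dict[str, str] = {}
    for gid in gene_ids:
        suffix = entrez_suffix(gid)
        if suffix is not None:
            out[gid] = suffix
    return out


def lookup_entrez(gene_ids: Iterable[str]) -> List[dict]:
    """Query MyGene.info with the suffixes as Entrez Gene IDs (needs network)."""
    import mygene  # imported lazily so the helpers above work without it

    entrez_ids = sorted(set(suffix_map(gene_ids).values()))
    mg = mygene.MyGeneInfo()
    return mg.querymany(entrez_ids, scopes="entrezgene",
                        fields="symbol,ensembl.gene,taxid", species="all")
```

Keeping suffix_map separate from the network call makes it easy to inspect which IDs were dropped before anything is sent to MyGene.info.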