Question

How do you get gene symbols from microarray accession numbers?

0

4.0 years ago

dikisakye • 0

I have a list of microarray gene accession numbers and would like to obtain the gene symbols. Any recommendations on how to go about it?

gene • 1.4k views

updated 4.0 years ago by GenoMax 142k • written 4.0 years ago by dikisakye • 0

1

Don't SHOUT please.

4.0 years ago by Jean-Karim Heriche 27k

0

ok, sorry about that.

4.0 years ago by dikisakye • 0

0

Just as a side note: getting the symbols should be one of the later steps in the analysis. For anything else they are redundant.

4.0 years ago by Michael 54k

0

When referring to any kind of IDs, please provide examples.

4.0 years ago by GenoMax 142k

0

I am in the preliminary steps of conducting a meta-analysis of microarray data, and this is my first time analysing any data. I need the gene IDs to feed into one of the databases for pathway and enrichment analysis. One dataset has the following GenBank accession numbers: NM_010378, NM_010382, NM_008873, NM_016701, NM_178057, NM_028072, NM_020581, NM_010441, NM_207105, NM_031254, NM_007631, XM_001005899, NM_029422, NM_147217, XM_993267, NM_175406, XM_985034, etc.

4.0 years ago by dikisakye • 0
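
Lists pasted from a spreadsheet or a post often carry stray commas, spaces, or brackets, while most lookup tools expect one accession per line. A minimal shell sketch of that clean-up step follows. The helper name `clean_ids` is hypothetical, and the pattern is a simplified assumption about what an accession looks like (letters, an optional underscore, digits, an optional version suffix), so it may pick up other tokens of the same shape.

```shell
# clean_ids: hypothetical helper. Reads free-form text on stdin and prints
# one accession-like token per line, keeping the first occurrence of each.
clean_ids() {
  grep -oE '[A-Za-z]+_?[0-9]+(\.[0-9]+)?' |   # pull out accession-shaped tokens
    awk '!seen[$0]++'                         # drop duplicates, keep input order
}

# Example: prints each accession on its own line; redirect with "> id" to save.
printf '%s\n' 'NM_010378, NM_010382, ,NM_031254,NM_007631' | clean_ids
```

Words without digits (such as a trailing "etc") are skipped because the pattern requires at least one digit.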

score 2 · Answer 1 · 2020-05-14

2

4.0 years ago

Jean-Karim Heriche 27k

If the species of interest is in Ensembl, try Ensembl's BioMart. See how to use it here. ID conversion is one of the most common bioinformatics tasks, so you should consider learning to do it programmatically if you're going to be doing this more than a couple of times.

4.0 years ago by Jean-Karim Heriche 27k

0

I haven't done any analyses yet; this is my first task. I notice, though, that my data has GenBank accession numbers, while Ensembl doesn't seem to have these. Might you know how to perform such tasks in R, and which packages I need for it? Any other links to helpful materials would also be welcome. Thanks!

4.0 years ago by dikisakye • 0

0

For R, the Bioconductor org.XXX.eg.db packages contain objects mapping between different types of identifiers for organism XXX. Make a note of which version you're using, as these packages are updated regularly. Use it like this (example for human):

library("org.Hs.eg.db")
# keytype must match the kind of ID in keys; keytypes(org.Hs.eg.db) lists the options (e.g. "REFSEQ", "ACCNUM")
gene.symbols <- mapIds(org.Hs.eg.db, keys = list.of.IDs, keytype = "ENTREZID", column = "SYMBOL")

Alternatively, use the biomaRt package. Read the vignette to see how to use it.

4.0 years ago by Jean-Karim Heriche 27k

0

Thanks! Will give feedback on progress

4.0 years ago by dikisakye • 0

0

Hi Jean-Karim, I used the biomaRt package for mapping. I had two datasets, one for Homo sapiens and the other for mouse. However, I was left with a large number of unmapped IDs, so I thought I should use the AnnotationHub Bioconductor package to map some of those. I used the call you provided, but I get an error. How do I call the function correctly? This was my input (acc.not.mapped is a vector of length 10886 containing the accession numbers that were not mapped):

head(acc.not.mapped)
[1] "AY766452"     "XR_109632"    "AK130765"     "NM_020914"    "NM_001077493" "AY358259" 
acc.not.mapped  %>% as.data.frame
gene.symbols <- mapIds(org.Hs.eg.db, keys = acc.not.mapped, keytype = "ENTREZID", column="SYMBOL")
"Error in .testForValidKeys(x, keys, keytype, fks) : None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments."

updated 3.9 years ago by GenoMax 142k • written 3.9 years ago by dikisakye • 0
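
The error above is consistent with the keys being accession numbers rather than Entrez Gene IDs: `ENTREZID` keys are numeric Gene IDs, whereas the vector holds RefSeq (`NM_`, `XR_`) and GenBank (`AY`, `AK`) accessions. `org.Hs.eg.db` also offers `REFSEQ` and `ACCNUM` key types, which `keytypes(org.Hs.eg.db)` will list. Before choosing a key type, it can help to sort a mixed list by ID shape. A minimal shell sketch, where `classify_id` is a hypothetical helper and the patterns are simplified assumptions, not a full accession grammar:

```shell
# classify_id: hypothetical helper. Prints a guess at the key type for one
# identifier, judged only by its shape (simplified patterns, see lead-in).
classify_id() {
  id=${1%%.*}                                              # drop a version suffix such as ".2"
  if printf '%s\n' "$id" | grep -Eq '^[0-9]+$'; then
    echo ENTREZID                                          # digits only: looks like a Gene ID
  elif printf '%s\n' "$id" | grep -Eq '^[A-Z]{2}_[0-9]+$'; then
    echo REFSEQ                                            # e.g. NM_020914, XR_109632
  elif printf '%s\n' "$id" | grep -Eq '^[A-Z]{1,2}[0-9]{5,6}$'; then
    echo ACCNUM                                            # e.g. AY766452, AK130765
  else
    echo UNKNOWN
  fi
}

# Example: label a few of the IDs from the post above.
for id in AY766452 XR_109632 NM_020914; do
  printf '%s\t%s\n' "$id" "$(classify_id "$id")"
done
```

Each group can then be passed to `mapIds` with the matching `keytype`.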

0

Thanks for the modification, GenoMax. How do I rectify the issue?

3.9 years ago by dikisakye • 0

0

$ esearch -db nuccore -query AK130765 | elink -target gene | efetch -format docsum | xtract -pattern DocumentSummary -element Name
LOC105378085
$ esearch -db nuccore -query AY766452 | elink -target gene | efetch -format docsum | xtract -pattern DocumentSummary -element Name
CCL4L2
$ esearch -db nuccore -query NM_001077493 | elink -target gene | efetch -format docsum | xtract -pattern DocumentSummary -element Name
QueryKey value not found in summary input

3.9 years ago by GenoMax 142k

0

Thanks for the help, GenoMax.

3.9 years ago by dikisakye • 0

score 1 · Answer 2 · 2020-05-14

1

4.0 years ago

GenoMax 142k

You can use Entrez Direct (EDirect):

$ more id
NM_010378
NM_010382
NM_008873
NM_016701
NM_178057
NM_028072
NM_020581
NM_010441
NM_207105

$ for i in $(cat id); do printf ${i}"\t"; esearch -db nuccore -query ${i} | elink -target gene | efetch -format docsum | xtract -pattern DocumentSummary -element Name; done
NM_010378       H2-Aa
NM_010382       H2-Eb1
NM_008873       Plau
NM_016701       Nes
NM_178057       QueryKey value not found in summary input
NM_028072       Sulf2
NM_020581       Angptl4
NM_010441       Hmga2
NM_207105       H2-Ab1

4.0 years ago by GenoMax 142k
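
In the output above, NM_178057 came back with a message instead of a symbol. Once the loop's output is saved to a file, such rows can be pulled out for a second attempt with another tool. A minimal sketch, assuming the output is tab-separated (accession, then result). The helper name `unmapped_ids` is hypothetical, and treating any result that contains a space as a failure is an assumption based on the message shown above:

```shell
# unmapped_ids: hypothetical helper. Reads "accession<TAB>result" lines on
# stdin and prints the accessions whose result is empty, "NA", or contains a
# space (i.e. looks like a message rather than a gene symbol).
unmapped_ids() {
  awk -F '\t' '$2 == "" || $2 == "NA" || $2 ~ / / { print $1 }'
}

# Example with two rows taken from the output above; only NM_178057 is reported.
printf 'NM_016701\tNes\nNM_178057\tQueryKey value not found in summary input\n' | unmapped_ids
```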

0

Thanks! I would like to try this out too, to gain experience. I am a beginner at programming and only understand basic loops. For this particular one, could you please explain what this part of the command means: "\t"; esearch -db nuccore -query ${i} | elink

4.0 years ago by dikisakye • 0

1

Each entry read from the file called id is passed to EDirect for searching, specifically to the esearch program that is part of that package. Since EDirect does not keep track of the original search terms, I print each one with printf, followed by a tab ("\t"), so you can tell which accession each gene name belongs to.

If you want an easy option, you can use the batch search on this MGI (Mouse Genome Informatics) page. Paste your IDs in or upload a file with them and hit search.

4.0 years ago by GenoMax 142k
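
To make the pieces of that loop easier to follow, here is a slightly more defensive sketch of the same idea, with each step commented. It assumes EDirect is installed and on the PATH. The function names `lookup_symbol` and `map_ids` are hypothetical. Redirecting `esearch` input from `/dev/null` is a precaution so the pipeline cannot consume the loop's own input, and writing `NA` for any result that is empty or contains a space is an assumption about what a failed lookup looks like.

```shell
# lookup_symbol: hypothetical wrapper around the EDirect pipeline used above.
# Needs EDirect and network access; prints the gene name(s) for one accession.
lookup_symbol() {
  esearch -db nuccore -query "$1" < /dev/null |        # find the nucleotide record
    elink -target gene |                               # follow its link to the Gene database
    efetch -format docsum |                            # fetch the gene's document summary
    xtract -pattern DocumentSummary -element Name      # keep only the Name field
}

# map_ids: reads one accession per line on stdin and prints
# "accession<TAB>symbol", one line per accession.
map_ids() {
  while IFS= read -r acc; do
    [ -n "$acc" ] || continue                          # skip blank lines
    sym=$(lookup_symbol "$acc" 2>/dev/null | paste -sd ';' -)   # join multiple hits with ";"
    case "$sym" in
      ''|*' '*) sym=NA ;;                              # empty, or a message rather than a symbol
    esac
    printf '%s\t%s\n' "$acc" "$sym"                    # "\t" is the tab between ID and symbol
  done
}

# Usage (performs live NCBI queries):
#   map_ids < id > id_to_symbol.tsv
```

Keeping the lookup in its own function means the loop logic can be tried offline by temporarily redefining `lookup_symbol` to print a fixed value.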