Question

How can I convert a list of GenBank accession numbers to gene symbols?


7.6 years ago

524129693 ▴ 20

I have a list of GenBank accession numbers, e.g. AJ001495, AF339794, AK127588, BC039327, BC035392, NR_033244.1, BC038766, CR608805, S81294. How can I convert them to gene symbols? I found that BioMart has a "refseq" dataset, but not a "genbank accession" one. I have seen the answers in "Getting Gene Names From Genbank Ids", but I cannot use Python. Can this be done in R?

R gene • 15k views

ADD COMMENT • link updated 7.6 years ago by noorpratap.singh ▴ 330 • written 7.6 years ago by 524129693 ▴ 20


The first answer in the thread you linked suggests BioMart (a web-based tool). There is an R version of it as well, the biomaRt package. Tutorials for BioMart are here if you are not familiar with it.

ADD REPLY • link 7.6 years ago by GenoMax 141k


I cannot find a "genbank accession" database in BioMart, so I cannot convert a list of GenBank accession numbers to gene symbols using the biomaRt package.

ADD REPLY • link 7.6 years ago by 524129693 ▴ 20


Most of those appear to be cDNA clones from IMAGE and other sources. You should be able to get the gene symbols using this file from NCBI.

ADD REPLY • link 7.6 years ago by GenoMax 141k


Thanks for your answer. How do I use it (gene2accession)? I cannot open it.

ADD REPLY • link 7.6 years ago by 524129693 ▴ 20


You need to download and gunzip the file (it is compressed). If you are on OS X/Unix, that is simple. On Windows you will need a program such as 7-Zip.
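
For anyone who wants to stay in R, the file can also be read without unpacking it by hand, since R reads gzip-compressed text files directly. The following is a minimal sketch, not a tested pipeline: the helper name lookup_symbols and the file path are hypothetical, and the column names RNA_nucleotide_accession.version and Symbol are assumptions that should be checked against the header of the file you download. The full file is large, so it may need to be filtered first (for example, to human rows) before it fits in memory.

```r
# Hypothetical helper: map nucleotide accessions to gene symbols using a
# table laid out like NCBI's gene2accession (tab-delimited, with a header).
# Column names are assumptions; check them against the header of your file.
lookup_symbols <- function(path, accessions,
                           acc_col = "RNA_nucleotide_accession.version",
                           sym_col = "Symbol") {
  # read.delim() decompresses gzip files transparently when given a path.
  # comment.char = "" keeps the header line, which starts with "#".
  tbl <- read.delim(path, header = TRUE, comment.char = "", quote = "",
                    check.names = FALSE, stringsAsFactors = FALSE)

  # Drop version suffixes (".1", ".2", ...) on both sides before matching,
  # because the query list mixes versioned and unversioned accessions.
  strip_version <- function(x) sub("\\.[0-9]+$", "", x)
  idx <- match(strip_version(accessions), strip_version(tbl[[acc_col]]))

  # Accessions that are not found get NA as their symbol.
  data.frame(accession = accessions, symbol = tbl[[sym_col]][idx],
             stringsAsFactors = FALSE)
}

# Usage (the path is an assumption):
# lookup_symbols("gene2accession.gz", c("AJ001495", "NR_033244.1"))
```

match() returns only the first hit per accession, which is enough for a quick lookup; if one accession can map to several rows, merge() would keep all of them.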

ADD REPLY • link 7.6 years ago by GenoMax 141k


7.6 years ago

noorpratap.singh ▴ 330

http://asia.ensembl.org/biomart/martview/7402494965f44f0fa7a91e68012acfdd

The link above may help.

ADD COMMENT • link 7.6 years ago by noorpratap.singh ▴ 330


Thanks for your answer.

ADD REPLY • link 7.6 years ago by 524129693 ▴ 20

score 0 · Accepted Answer · 2016-09-23


7.6 years ago

wiggs38 • 0

This should be achievable with biomaRt in R.

library(biomaRt)
ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")

## With no filter set, this retrieves all records. To query only your own IDs,
## pass filters = <a filter name> and values = <a vector of your GenBank accessions>.
dat = getBM(attributes = c("protein_id", "embl", "hgnc_symbol"), mart = ensembl)

You can now match the GenBank accessions in your data against the accession IDs in dat.

dat[match(yourdata$genBank, dat$protein_id),]

I haven't tested this (and I'm not 100% sure your IDs will match), but if you want to search what biomaRt offers in the future, use the listAttributes() function. I tend to store the result in a data frame so I can grep() for terms of interest.

x = listAttributes(ensembl)
x[grep("Genbank", x$description),]
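
The same search idea applies to filters, which getBM() needs in order to restrict a query to a list of IDs. A minimal sketch, assuming a hypothetical helper search_mart_table that searches both the name and description columns of the data frame returned by listAttributes() or listFilters(). The search terms are only suggestions, and there is no guarantee that a suitable filter exists for a given set of accessions.

```r
# Hypothetical helper: case-insensitive search over the "name" and
# "description" columns of a listAttributes()/listFilters() data frame.
search_mart_table <- function(tbl, pattern) {
  hit <- grepl(pattern, tbl$name, ignore.case = TRUE) |
         grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

# Usage with biomaRt (needs network access; not run here):
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# search_mart_table(listFilters(ensembl), "embl|genbank|nucleotide")
# search_mart_table(listAttributes(ensembl), "embl|genbank|nucleotide")
```

Searching case-insensitively across both columns avoids missing an entry because of its capitalisation or because the term appears only in the short name.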

ADD COMMENT • link 7.6 years ago by wiggs38 • 0


Thanks for your answer. I tried it, but the result is as follows:

values=c("AJ001495","AF339794")
dat=getBM(attributes=c("protein_id","embl","hgnc_symbol"),filters="protein_id",values=values, mart=ensembl)
dat
protein_id  embl        hgnc_symbol
<0 rows> (or 0-length row.names)

How can I deal with this?

ADD REPLY • link 7.6 years ago by 524129693 ▴ 20


Have you tried with your complete list of IDs? Does it still return an empty data frame?

ADD REPLY • link 7.6 years ago by wiggs38 • 0


Yes, I ran it on all my data, but the result is still empty. I have a question: my data are lncRNA GenBank IDs. Can I use "protein_id" as the filter? I look forward to your reply!

ADD REPLY • link 7.6 years ago by 524129693 ▴ 20


my data are lncRNA GenBank IDs. Can I use "protein_id" as the filter?

Think about that statement for a second and you will have your answer.

ADD REPLY • link 7.6 years ago by GenoMax 141k


But I do not know how to choose the "filters" for my data.

ADD REPLY • link 7.6 years ago by 524129693 ▴ 20


I think your issue is, as you suggested earlier, identifying the equivalent IDs in biomaRt; there is a real possibility that they aren't in there. In that case you will need to use some other method. Have you looked into the file provided by genomax2?

ADD REPLY • link 7.6 years ago by wiggs38 • 0