Question: Are "gene symbol" and "HGNC symbol" the same name for a gene?
10 months ago by
Sib20
Sib20 wrote:

I have GB-ACC numbers for differentially expressed genes from GEO2R, but I need gene symbols to enter into the Enrichr database for further analysis. I used BioMart to convert RefSeq mRNA IDs to HGNC symbols, but I am not sure whether a RefSeq mRNA ID is the same thing as a GB-ACC. Also, is an HGNC symbol the same as a gene symbol? (BioMart does not have "Gene symbol" or "GB-ACC" options.)

rna-seq gene • 344 views
modified 10 months ago by dsull1.6k • written 10 months ago by Sib20
10 months ago by
dsull1.6k
UCLA
dsull1.6k wrote:

Yes, the HGNC symbol is the gene symbol (for humans). It's called HGNC because the symbols are assigned by the HUGO Gene Nomenclature Committee, and these symbols are the standard names for human genes.

written 10 months ago by dsull1.6k

Thank you. And what about GB-ACC? Is it the same as a RefSeq mRNA ID?

modified 10 months ago • written 10 months ago by Sib20

That is a great question. GenBank accession (GB-ACC) is not the same as RefSeq.

The RefSeq mRNA ID might start with something like NM_ (such as NM_004985).

The GenBank accession numbers follow a different format (as described here: https://www.ncbi.nlm.nih.gov/Sequin/acc.html ). For example, AF493917 would be a GenBank accession ID (note that the GB-ACC doesn't contain an underscore).
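The underscore rule above can be sketched as a small classifier. This is only a rough illustration: the function name `classify_accession` and both regular expressions are my own simplifications, not an official NCBI validator, and they will not cover every accession format listed on the page linked above.

```python
import re

# Simplified patterns (assumptions, not the full NCBI rules):
# RefSeq: two letters, an underscore, digits, optional ".version" (e.g. NM_004985.5)
REFSEQ_RE = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")
# GenBank nucleotide: one or two letters, then digits, no underscore (e.g. AF493917)
GENBANK_RE = re.compile(r"^[A-Z]{1,2}\d{5,8}(\.\d+)?$")


def classify_accession(accession):
    """Return 'refseq', 'genbank', or 'unknown' for an accession string."""
    acc = accession.strip().upper()
    if REFSEQ_RE.match(acc):
        return "refseq"
    if GENBANK_RE.match(acc):
        return "genbank"
    return "unknown"


if __name__ == "__main__":
    for acc in ["NM_004985", "AF493917"]:
        print(acc, classify_accession(acc))
```

Splitting a mixed ID column this way makes it easier to send each group to the right conversion tool.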

A lot of publications confuse the two, but GenBank and RefSeq are separate databases: GenBank contains sequences submitted by individual labs, whereas RefSeq records are curated and maintained by NCBI.

I prefer RefSeq because GenBank is an archive of submitted sequences, so it contains a lot of redundant data and you have to do a fair amount of filtering to get what you want. (In fact, RefSeq is largely built by NCBI curating GenBank data.) See the RefSeq paper for more information: https://www.ncbi.nlm.nih.gov/pubmed/15608248

modified 10 months ago • written 10 months ago by dsull1.6k

Thanks a lot for your answer. As you said, I think GEO2R has confused the two. The image below is the GEO2R results page. As you can see, the GB-ACC column mixes different formats, such as NM_201591, BX100997, BC043554, and NR_038236. I'd be grateful if you could show me a way to obtain gene symbols for these genes.

written 10 months ago by Sib20

Unfortunately, I can't think of an easy way to do it. Personally, I'd use BioMart to convert all the RefSeq IDs first. Then, for the remaining IDs that can't be converted (i.e. the GenBank accession numbers), I'd use the following file from NCBI, which maps accession numbers to genes and their symbols: ftp://ftp.ncbi.nih.gov/gene/DATA/gene2accession.gz
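A minimal sketch of the second step, assuming gene2accession.gz has already been downloaded locally. The function name `symbols_from_gene2accession` is hypothetical. The sketch also assumes the first line of the file is a tab-separated header starting with `#tax_id`, and that it contains columns named `RNA_nucleotide_accession.version` and `Symbol`; check these against your copy of the file before relying on it.

```python
import gzip


def symbols_from_gene2accession(path, accessions, tax_id="9606",
                                acc_col="RNA_nucleotide_accession.version",
                                symbol_col="Symbol"):
    """Return {accession: symbol} for the requested accessions.

    Column names are assumptions about the file layout and are looked up
    in the header line rather than hard-coded as positions.
    """
    # Compare without the ".version" suffix, since GEO2R IDs are unversioned.
    wanted = {a.strip().split(".")[0] for a in accessions}
    found = {}
    with gzip.open(path, "rt") as handle:
        header = handle.readline().rstrip("\n").lstrip("#").split("\t")
        tax_idx = header.index("tax_id")
        acc_idx = header.index(acc_col)
        sym_idx = header.index(symbol_col)
        # Stream line by line; the file is too large to load comfortably.
        for line in handle:
            row = line.rstrip("\n").split("\t")
            if row[tax_idx] != tax_id:  # 9606 is the human taxonomy ID
                continue
            acc = row[acc_idx].split(".")[0]
            if acc in wanted and row[sym_idx] != "-":
                found.setdefault(acc, row[sym_idx])  # keep the first match
    return found
```

Calling it with the IDs that BioMart could not convert, for example `symbols_from_gene2accession("gene2accession.gz", ["BX100997", "BC043554"])`, would return whatever symbols the file lists for them. Accessions that appear only in other columns of the file (such as genomic accessions) are not matched by this sketch.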

written 10 months ago by dsull1.6k

Thank you.

written 10 months ago by Sib20