Get Gene Symbol Synonyms/Aliases With Biomart
9.5 years ago
Caddymob ▴ 990

I am trying to get the HGNC gene name for a handful of genes where I only have a synonym name using biomaRt in R for some downstream processing.

Two examples are:

  1. WHRN is DFNB31 in HGNC
  2. SANS is USH1G in HGNC

From what I can tell from listAttributes(), there aren't any synonym or alias attributes to retrieve, but maybe I have missed something?

Alternatively, does anyone have a method to do this within R without downloading a ton of flat files, e.g. as in Biostar: Finding Gene Symbol Synonyms?

Thanks!

biomart gene • 14k views

How many genes are in your list, and for how many organisms are you collecting the symbol-alias pairings?


In my current list I have 8, but I am trying to make my lookup as robust as possible. Basically, I am trying to map human genes to other mammalian genomes based on a list of genes provided by a biologist.

9.5 years ago
Duff ▴ 660

Hi caddymob
In R you can run SQL directly against the annotation databases to do this. Using your gene aliases as examples:

# load the annotation database (DBI provides dbGetQuery)
library(org.Hs.eg.db)
library(DBI)
# set up your query genes
queryGeneNames <- c('WHRN', 'SANS')

# use SQL to join the alias table to the gene_info table (which contains the symbols)
# first open the database connection
dbCon <- org.Hs.eg_dbconn()
# write your SQL query
sqlQuery <- 'SELECT * FROM alias, gene_info WHERE alias._id == gene_info._id;'
# execute the query on the database
aliasSymbol <- dbGetQuery(dbCon, sqlQuery)
# subset to get your results: column 2 is the alias, column 5 the symbol
# (%in% is used because == would recycle the query vector row by row)
result <- aliasSymbol[aliasSymbol[, 2] %in% queryGeneNames, 5]
result
[1] "DFNB31" "USH1G"

See the AnnotationDbi docs for more information.
Best
d
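
The same alias-to-symbol lookup can probably also be done without raw SQL. The sketch below assumes a current org.Hs.eg.db release, which exposes "ALIAS" and "SYMBOL" key types through the AnnotationDbi select() interface. The variable name aliasToSymbol is only illustrative. An alias can map to more than one gene, and the symbols returned depend on the installed annotation release (newer releases may report WHRN itself as the approved symbol).

```r
# A sketch, assuming org.Hs.eg.db is installed from Bioconductor.
library(org.Hs.eg.db)   # also loads AnnotationDbi

queryGeneNames <- c("WHRN", "SANS")

# Map aliases to official symbols; one alias can return several rows.
aliasToSymbol <- AnnotationDbi::select(
  org.Hs.eg.db,
  keys    = queryGeneNames,
  keytype = "ALIAS",
  columns = c("SYMBOL", "ENTREZID")
)
aliasToSymbol
```

Check the result for aliases that return more than one row before passing the symbols on to downstream steps.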


I like this approach! I actually modified my script to first check my total list of genes for synonyms using AnnotationDbi, then proceed with the rest of my stuff in biomaRt using the approved gene name. Thanks duff!

9.5 years ago
Caddymob ▴ 990

Actually, after thinking the flat file might be the only solution (in response to Larry), I was poking around genenames.org and found a way to do this. When you use the CGI download page, you get a link. Rather than saving the result as a text file, I feed the link straight into R:

hgnc <- read.delim(url("http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=on&col=gd_app_sym&col=gd_aliases&status=Approved&status=Entry+Withdrawn&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&submit=submit&.cgifields=&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag"))

hgnc[grep("WHRN",hgnc$Synonyms),]

     Approved.Symbol           Synonyms
6950          DFNB31 CIP98, WHRN, USH2D

Ta da!
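
One caveat with grep() here is that it matches substrings, so a query like "WHRN" would also hit any longer synonym that contains it. Below is a minimal sketch of an exact-match lookup. It assumes the table has the Approved.Symbol and Synonyms columns shown above, with synonyms separated by commas. The toy data frame stands in for the real download. Its second row and the helper name lookupApproved are made up for illustration.

```r
# Toy table shaped like the HGNC download; the second row is invented
# purely to show the substring problem.
hgnc <- data.frame(
  Approved.Symbol = c("DFNB31", "FAKE1"),
  Synonyms        = c("CIP98, WHRN, USH2D", "WHRNL, XYZ"),
  stringsAsFactors = FALSE
)

# Split each comma-separated synonym string into a trimmed character vector.
synonymList <- lapply(strsplit(hgnc$Synonyms, ","), trimws)

# Hypothetical helper: approved symbols whose synonym list contains
# `alias` as an exact entry.
lookupApproved <- function(alias) {
  hit <- vapply(synonymList, function(s) alias %in% s, logical(1))
  hgnc$Approved.Symbol[hit]
}

lookupApproved("WHRN")   # "DFNB31"; grep("WHRN", hgnc$Synonyms) would match both rows
```

With the real table, the same two steps (split once, then look up by exact membership) should apply unchanged as long as the column names match.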


Nice, +1. I offered my solution so that the 8 are done and you can ship the results. With 80 or 800 genes it would certainly be a different situation.

9.5 years ago

With just 8 and in terms of getting the result out the door, I would query manually at NCBI under the HomoloGene site. You'll have to perform each search individually, but aliases are accepted most of the time. If you're unsure of aliases being an acceptable query, then search EntrezGene with the human alias and limit the search to human genes.

WHRN as query at HomoloGene gives:

DFNB31, H.sapiens
DFNB31, P.troglodytes
DFNB31, C.lupus
Whrn, M.musculus
Dfnb31, R.norvegicus
LOC100334777, D.rerio

It is interesting that the mouse gene symbol remains as the human alias equivalent.


Yea - there are multiple ways of getting these manually. UCSC gene search will get them too, as will wikigenes and genenames.org. But again, I'd really like to do this programmatically rather than manually...

I could just download a flat file from hgnc (http://www.genenames.org/cgi-bin/hgnc_downloads.cgi) and use this as a lookup table. Just seems clunky.

4.7 years ago
Shicheng Guo ★ 8.7k

Here is a mapping from each gene symbol to all of its alias names. Download it:

https://yunpan.cn/ckkc5J3JBv2ua passwd: 7010


Shicheng Guo, is it possible to get your table again? Thanks!


I strongly encourage you to use a solution from this thread, or maybe https://www.biostars.org/p/126277/, rather than random files people upload (without code) to file-sharing sites. I am not saying this one is wrong or badly done; it is just not reproducible without code, and therefore has limited value imho.
