Get Gene Symbol Synonyms/Aliases With Biomart
4
3
9.5 years ago

I am trying to get the HGNC gene name for a handful of genes where I only have a synonym name using biomaRt in R for some downstream processing.

Two examples are:

1. WHRN is DFNB31 in HGNC
2. SANS is USH1G in HGNC

From what I can tell from listAttributes(), there aren't any synonym or alias attributes to retrieve, but maybe I have missed something?

Alternatively, does anyone have a method to do this within R without downloading a ton of flat files, e.g. as in Biostar: Finding Gene Symbol Synonyms?

Thanks!

biomart gene • 14k views
0

How many genes are in your list, and for how many organisms are you collecting the symbol-alias pairings?

0

In my current list, I have 8. I am trying to make my lookup as robust as possible. Basically, I am trying to map human genes to other mammalian genomes based on a list of genes provided by a biologist.

4
9.5 years ago
Duff ▴ 660

In R you can use SQL directly on the annotation databases to do this. Using your gene aliases as examples:

# load the annotation database (DBI provides dbGetQuery)
library(org.Hs.eg.db)
library(DBI)
# set up your query genes
queryGeneNames <- c('WHRN', 'SANS')

# use sql to join the alias table to the gene_info table (contains the symbols)
# first open the database connection
dbCon <- org.Hs.eg_dbconn()
sqlQuery <- 'SELECT * FROM alias, gene_info WHERE alias._id == gene_info._id;'
# execute the query on the database
aliasSymbol <- dbGetQuery(dbCon, sqlQuery)
# subset to get your results: column 2 holds the alias, column 5 the symbol
# (%in% tests membership; == would recycle queryGeneNames element-wise and miss rows)
result <- aliasSymbol[aliasSymbol[,2] %in% queryGeneNames, 5]
result
[1] "DFNB31" "USH1G"


Best
d
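When the query list grows, it can help to keep the results aligned with the query order and to see which aliases found nothing or matched more than one gene. Below is a minimal base R sketch of that idea. The `alias_table` data frame is a toy stand-in for the table returned by the SQL query (its rows are taken from the aliases mentioned in this thread), and `lookup_symbols` is a hypothetical helper, not part of any package.

```r
# Toy stand-in for the alias/symbol table returned by the SQL query
# (illustrative rows only; a real table has many thousands of rows).
alias_table <- data.frame(
  alias_symbol = c("CIP98", "WHRN", "USH2D", "SANS"),
  symbol       = c("DFNB31", "DFNB31", "DFNB31", "USH1G"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up each query alias, keeping the query order.
# A list is returned so that an alias shared by several genes is not hidden.
lookup_symbols <- function(queries, tbl) {
  hits <- lapply(queries, function(q) unique(tbl$symbol[tbl$alias_symbol == q]))
  names(hits) <- queries
  # Aliases with no match become NA instead of silently disappearing
  lapply(hits, function(h) if (length(h) == 0) NA_character_ else h)
}

res <- lookup_symbols(c("SANS", "WHRN", "NOTAGENE"), alias_table)
str(res)
```

Returning a list rather than a plain vector is a design choice: an alias can legitimately point at several approved symbols, and those cases probably deserve a manual look before anything is passed on to biomaRt.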

1

I like this approach! I actually modified my script to first check my full list of genes for synonyms using AnnotationDbi, then proceed with the rest of my work in biomaRt using the approved gene names. Thanks Duff!
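For reference, the same alias check can probably be done without writing SQL, since org.Hs.eg.db exposes "ALIAS" as a key type for `AnnotationDbi::mapIds()`. The sketch below assumes org.Hs.eg.db is installed. `alias_to_symbol` is a hypothetical wrapper name, and the approved symbols it reports depend on the annotation release you have installed, so no output is shown.

```r
# Hypothetical wrapper around AnnotationDbi::mapIds(); "ALIAS" and "SYMBOL"
# are a key type and a column provided by org.Hs.eg.db.
alias_to_symbol <- function(aliases) {
  AnnotationDbi::mapIds(
    org.Hs.eg.db::org.Hs.eg.db,
    keys      = aliases,
    column    = "SYMBOL",
    keytype   = "ALIAS",
    multiVals = "list"   # keep every symbol when an alias is ambiguous
  )
}

# Only run the lookup when the annotation package is installed
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  print(alias_to_symbol(c("WHRN", "SANS")))
}
```

The resulting approved symbols can then be used as the filter values for the biomaRt queries.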

3
9.5 years ago

Actually - after thinking the flat file might be the only solution in response to Larry, I was poking around genenames.org and found a way to do this. When you use the CGI download page, you get a link. Rather than saving the output as a text file, I feed the link right into R:

hgnc <- read.delim(url("http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=on&col=gd_app_sym&col=gd_aliases&status=Approved&status=Entry+Withdrawn&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&submit=submit&.cgifields=&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag"))

hgnc[grep("WHRN",hgnc$Synonyms),]

Approved.Symbol           Synonyms
6950          DFNB31 CIP98, WHRN, USH2D


Ta da!
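One caveat with `grep()` on the Synonyms column is that it matches substrings, so a short alias can also hit longer, unrelated synonyms. A possible refinement, sketched below in base R, splits the comma-separated field and tests for a whole-entry match. `hgnc_toy` is a toy stand-in for the downloaded table (column names as in the output above), and `find_by_synonym` is a hypothetical helper.

```r
# Toy stand-in for the downloaded HGNC table (illustrative rows only)
hgnc_toy <- data.frame(
  Approved.Symbol = c("DFNB31", "USH1G"),
  Synonyms        = c("CIP98, WHRN, USH2D", "SANS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows whose Synonyms field contains `alias` as a whole entry
find_by_synonym <- function(alias, tbl) {
  # as.character() guards against the column having been read in as a factor
  entries <- strsplit(as.character(tbl$Synonyms), ",\\s*")
  keep <- vapply(entries, function(e) alias %in% trimws(e), logical(1))
  tbl[keep, , drop = FALSE]
}

hit  <- find_by_synonym("WHRN", hgnc_toy)  # whole-entry match
miss <- find_by_synonym("WHR", hgnc_toy)   # substring only, so no row is returned
```

With the real download, the same helper could be called as `find_by_synonym("WHRN", hgnc)`, assuming the column names match those shown above.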

2

Nice, +1. I offered my solution so that the 8 are done and you can ship the results. With 80 or 800 genes, it would certainly be a different situation.

1
9.5 years ago

With just 8 and in terms of getting the result out the door, I would query manually at NCBI under the HomoloGene site. You'll have to perform each search individually, but aliases are accepted most of the time. If you're unsure of aliases being an acceptable query, then search EntrezGene with the human alias and limit the search to human genes.

WHRN as query at HomoloGene gives:

DFNB31, H.sapiens
DFNB31, P.troglodytes
DFNB31, C.lupus
Whrn, M.musculus
Dfnb31, R.norvegicus
LOC100334777, D.rerio

It is interesting that the mouse gene symbol remains as the human alias equivalent.

1

Yea - there are multiple ways of getting these manually. UCSC gene search will get them too, as will wikigenes and genenames.org. But again, I'd really like to do this programmatically rather than manually...

1
4.7 years ago
Shicheng Guo ★ 8.7k

Here is a mapping from each gene symbol to all of its alias names. Download it:

https://yunpan.cn/ckkc5J3JBv2ua passwd: 7010

0

Shicheng Guo, is it possible to get your table again? Thanks!

1

I strongly encourage you to use a solution from this thread, or maybe https://www.biostars.org/p/126277/, rather than random files that people upload (without code) to dropboxes. I am not saying this one is wrong or poorly done; it is just not reproducible without code and therefore has limited value, imho.