Question

Get gene names from rs SNP ids

7

Entering edit mode

10.9 years ago

Endre Bakken Stovner ▴ 970

I've got a list of rs SNPs I'd like to enter into the David functional annotation tool. Since it does not support rsids I need to get the refseq gene id (or something similar) of the genes these SNPs overlap with first.

Is there a simple way of getting a gene ID for a SNP? Solution in BioPython or R is fine.

Since I have 22k snps I need an automatic way. And if a text file that maps these values exist, a link to it would be enough.

Ps. preferably looking for a solution that does not use the position of the SNPs; I would be able to solve the problem this way myself.

SNP rs gene • 30k views

ADD COMMENT • link updated 3.3 years ago by Ram 45k • written 10.9 years ago by Endre Bakken Stovner ▴ 970

2

Entering edit mode

I find this tool SNP-Nexus very effective. It runs with the rs IDs of the SNPs as well with batch queries. Please find the link to the page below

http://snp-nexus.org/about.html

I hope this tool will suffice your need a well

ADD REPLY • link 10.9 years ago by ivivek_ngs ★ 5.2k

5

Entering edit mode

10.9 years ago

Emily 24k

Depending on what you want to know, you might also put them into the Ensembl VEP. You can use a list of rsIDs as input, and as output get all the genes they hit and what effects they have, including SIFT and Polyphen scores where relevant. You can use it as an online tool or run it as a Perl script (no perling on your part).

ADD COMMENT • link updated 3.5 years ago by Ram 45k • written 10.9 years ago by Emily 24k

0

Entering edit mode

Teensy perl code example appreciated.

ADD REPLY • link updated 3.5 years ago by Ram 45k • written 10.9 years ago by Endre Bakken Stovner ▴ 970

0

Entering edit mode

Just go to the page I linked to and download. The whole script is written for you and there's full documentation on how to run it. You don't have to write anything.

ADD REPLY • link updated 3.5 years ago by Ram 45k • written 10.9 years ago by Emily 24k

2

Entering edit mode

3.8 years ago

krassowski.michal ▴ 180

As an alternative to Biomart you can use Enterz API. An example query to fetch the rs6311 would use the following URL:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=rs6311&retmode=xml

when retrieving many results at a time you may wish to use a script with proper identification to follow Entrez usage policy. I am using my own little Python wrapper for that, easy_entrez:

from easy_entrez import EntrezAPI

entrez_api = EntrezAPI('Project name', 'your@mail.com')
rs6311 = entrez_api.fetch(['rs6311'], max_results=1, database='snp').data
namespaces = {'ns0': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'}
genes = [
    name.text
    for name in rs6311.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
]
print(genes)

which gives ['HTR2A'].

ADD COMMENT • link 3.8 years ago by krassowski.michal ▴ 180

1

Entering edit mode

10.8 years ago

Rm 8.3k

Gene to rs id

library(biomaRt)

## It might take long time to process if many genes (>50)  in the list.

## hgnc_gene_symbols.txt is the file that has the list of gene symbols one per line.
genes <- read.table("~/hgnc_gene_symbols.txt")

ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
dbsnp = useMart("snp", dataset = "hsapiens_snp")

getHGNC2ENSG = getBM(attributes=c('chromosome_name', 'start_position',
                                  'end_position', 'strand', 'ensembl_gene_id',
                                  'hgnc_symbol', 'refseq_mrna'),
                     filters ='hgnc_symbol', values = genes, mart = ensembl)

write.table(getHGNC2ENSG, file="~/hgnc_gene_symbols.txt.ensg.coord.tsv",
            sep="\t", col.names=T, row.names=T, append = F, quote=FALSE)

getRSid4ENSG <- getBM(c('refsnp_id', 'allele', 'snp', 'chr_name', 'chrom_start',
                        'chrom_strand', 'associated_gene', 'ensembl_gene_stable_id', 
                        'synonym_name', 'consequence_type_tv'), 
                      filters = 'ensembl_gene',  values = genes, mart = dbsnp)

write.table(getRSid4ENSG, file="~/hgnc_gene_symbols.txt.ensg.RSid.coord.tsv", 
            sep="\t", col.names=T, row.names=T, append = F, quote=FALSE)

ADD COMMENT • link updated 7.2 years ago by zx8754 12k • written 10.8 years ago by Rm 8.3k

Ram · Accepted Answer · 2014-09-01

11

Entering edit mode

10.9 years ago

Neilfws 49k

It looks like DAVID can convert from Ensembl Gene IDs, so you can get to those from rs IDs using R/biomaRt, like this:

library(biomaRt)
mart.snp <- useMart("snp", "hsapiens_snp")

getENSG <- function(rs = "rs3043732", mart = mart.snp) {
  results <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
                   filters    = "snp_filter", values = rs, mart = mart)
  return(results)
}

# default parameters
getENSG()
  refsnp_id ensembl_gene_stable_id
1 rs3043732        ENSG00000175445

# or supply rs ID
getENSG(rs = "rs224550")
  refsnp_id ensembl_gene_stable_id
1  rs224550        ENSG00000262304
2  rs224550        ENSG00000196689

ADD COMMENT • link updated 3.5 years ago by Ram 45k • written 10.9 years ago by Neilfws 49k

0

Entering edit mode

Is there any way to get NCBI Gene ID and/or HGNC id for each SNP id?

ADD REPLY • link updated 3.5 years ago by Ram 45k • written 10.9 years ago by always_learning ★ 1.2k

0

Entering edit mode

listAttributes(mart.snp) will show what can be returned; listFilters(mart.snp) will show what can be queried.

ADD REPLY • link 10.9 years ago by Neilfws 49k

0

Entering edit mode

Hi, using biomaRt is there a way to set GRCh37 instead of default GRCh38 version? thank you in advance.

ADD REPLY • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by Nicola Casiraghi ▴ 500

0

Entering edit mode

You can specify an Ensembl archive using the host = parameter of useMart. For example:

mart.hs <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "grch37.ensembl.org")

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.6 years ago by Neilfws 49k