Question: Get gene names from rs SNP ids
5
4.6 years ago by
Norway
Endre Bakken Stovner880 wrote:

I've got a list of rs SNPs that I'd like to enter into the DAVID functional annotation tool. Since DAVID does not support rs IDs, I first need to get the RefSeq gene ID (or something similar) of the genes these SNPs overlap.

Is there a simple way of getting a gene ID for a SNP? Solution in BioPython or R is fine.

Since I have 22k SNPs, I need an automated way. If a text file that maps these values exists, a link to it would be enough.

PS: I would prefer a solution that does not rely on the positions of the SNPs; I could solve the problem that way myself.

snp rs gene • 14k views
ADD COMMENTlink modified 4.6 years ago by Emily_Ensembl18k • written 4.6 years ago by Endre Bakken Stovner880
2

I find the SNP-Nexus tool very effective. It accepts rs IDs as input and also supports batch queries. The link to the page is below:

http://snp-nexus.org/about.html

I hope this tool meets your needs as well.

ADD REPLYlink written 4.6 years ago by ivivek_ngs4.8k
10
4.6 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

It looks like DAVID can convert from Ensembl Gene IDs, so you can get to those from rs IDs using R/biomaRt, like this:

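library(biomaRt)
# Connect to the Ensembl variation (SNP) mart, human dataset.
# "ENSEMBL_MART_SNP" is the full mart name; the short alias "snp" is not
# recognised by every biomaRt/Ensembl release.
mart.snp <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

# Map one or more rs IDs to the Ensembl gene IDs they overlap.
getENSG <- function(rs = "rs3043732", mart = mart.snp) {
  getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
        filters    = "snp_filter", values = rs, mart = mart)
}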

# default parameters
getENSG()
  refsnp_id ensembl_gene_stable_id
1 rs3043732        ENSG00000175445

# or supply rs ID
getENSG(rs = "rs224550")
  refsnp_id ensembl_gene_stable_id
1  rs224550        ENSG00000262304
2  rs224550        ENSG00000196689
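
For a list of about 22k rs IDs, one very large request may time out, so it can help to query in batches. Here is a minimal sketch, assuming the same attributes and filter as above. The helper names chunk_ids and rs_to_ensg, the chunk size of 500, and the file names in the usage comments are my own illustrative choices, not part of biomaRt.

```r
# Split a vector of IDs into chunks of at most `size` elements.
# (Hypothetical helper; the chunk size is an assumption, tune as needed.)
chunk_ids <- function(ids, size = 500) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Query each chunk separately and stack the results into one data frame.
# `query` defaults to biomaRt::getBM; it is a parameter only so that the
# function can be exercised without network access.
rs_to_ensg <- function(ids, mart, size = 500, query = biomaRt::getBM) {
  chunks <- chunk_ids(unique(ids), size)
  results <- lapply(unname(chunks), function(chunk) {
    query(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
          filters    = "snp_filter", values = chunk, mart = mart)
  })
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Usage (needs network access and a SNP mart such as mart.snp):
# rs_ids  <- readLines("rs_ids.txt")            # hypothetical input file
# mapping <- rs_to_ensg(rs_ids, mart.snp)
# write.table(mapping, "rs_to_ensg.tsv", sep = "\t",
#             quote = FALSE, row.names = FALSE)
```

SNPs that overlap no gene may come back with an empty gene ID or not at all, so the output can have fewer (or, for SNPs in several genes, more) rows than the input.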

ADD COMMENTlink written 4.6 years ago by Neilfws48k

Is there any way to get the NCBI Gene ID and/or HGNC ID for each SNP ID?

ADD REPLYlink written 4.6 years ago by always_learning960

listAttributes(mart.snp) will show what can be returned; listFilters(mart.snp) will show what can be queried.
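
As a rough sketch of how that could be used here: search the attribute list for gene-related fields, then do a two-step lookup, since NCBI Gene and HGNC identifiers are served by the gene mart rather than the SNP mart. The helper names find_fields and annotate_snps are hypothetical, and the gene-mart attribute names in the usage comments ("hgnc_symbol", "entrezgene_id") are assumptions that should be checked with listAttributes for your Ensembl release.

```r
# Keep the rows of a listAttributes()/listFilters() result whose name or
# description matches `pattern` (hypothetical convenience helper).
find_fields <- function(fields, pattern) {
  hit <- grepl(pattern, fields$name, ignore.case = TRUE) |
         grepl(pattern, fields$description, ignore.case = TRUE)
  fields[hit, c("name", "description")]
}

# Join a SNP -> Ensembl gene table with a gene annotation table.
# SNPs whose gene has no annotation are kept, with NA in the new columns.
annotate_snps <- function(snp2gene, gene_info) {
  merge(snp2gene, gene_info,
        by.x = "ensembl_gene_stable_id", by.y = "ensembl_gene_id",
        all.x = TRUE)
}

# Usage (needs network access; attribute names are assumptions to verify):
# mart.snp  <- biomaRt::useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
# mart.gene <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
#                               dataset = "hsapiens_gene_ensembl")
# find_fields(biomaRt::listAttributes(mart.gene), "hgnc|ncbi|entrez")
# snp2gene  <- biomaRt::getBM(c("refsnp_id", "ensembl_gene_stable_id"),
#                             filters = "snp_filter",
#                             values = c("rs3043732", "rs224550"),
#                             mart = mart.snp)
# gene_info <- biomaRt::getBM(c("ensembl_gene_id", "hgnc_symbol", "entrezgene_id"),
#                             filters = "ensembl_gene_id",
#                             values = unique(snp2gene$ensembl_gene_stable_id),
#                             mart = mart.gene)
# annotate_snps(snp2gene, gene_info)
```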

ADD REPLYlink written 4.6 years ago by Neilfws48k

Hi, is there a way in biomaRt to use GRCh37 instead of the default GRCh38? Thank you in advance.

ADD REPLYlink written 4.4 years ago by Nicola Casiraghi450

You can specify an Ensembl archive using the host parameter of useMart. For example:

mart.hs <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl", host = "grch37.ensembl.org")
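
The same host argument should work for the SNP mart used earlier in this thread. A minimal sketch, assuming the mart name "ENSEMBL_MART_SNP" and that your biomaRt version expects the host with an https:// prefix (older versions took the bare host name). The helper names ensembl_host and snp_mart are made up for illustration.

```r
# Pick the Ensembl host for a genome build (hypothetical helper).
ensembl_host <- function(build = c("GRCh38", "GRCh37")) {
  build <- match.arg(build)
  if (build == "GRCh37") "https://grch37.ensembl.org" else "https://www.ensembl.org"
}

# Connect to the human SNP mart on the chosen build (needs network access).
snp_mart <- function(build = "GRCh38") {
  biomaRt::useMart(biomart = "ENSEMBL_MART_SNP",
                   dataset = "hsapiens_snp",
                   host    = ensembl_host(build))
}

# Usage:
# mart.snp37 <- snp_mart("GRCh37")
# biomaRt::getBM(c("refsnp_id", "chr_name", "chrom_start"),
#                filters = "snp_filter", values = "rs224550", mart = mart.snp37)
```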
ADD REPLYlink written 4.3 years ago by Neilfws48k
4
4.6 years ago by
Emily_Ensembl18k
EMBL-EBI
Emily_Ensembl18k wrote:

Depending on what you want to know, you might also put them into the Ensembl VEP. You can use a list of rs IDs as input, and as output get all the genes they hit and what effects they have, including SIFT and PolyPhen scores where relevant. You can use it as an online tool or run it as a Perl script (no perling on your part).

ADD COMMENTlink written 4.6 years ago by Emily_Ensembl18k

A teensy Perl code example would be appreciated.

ADD REPLYlink written 4.6 years ago by Endre Bakken Stovner880

Just go to the page I linked to and download. The whole script is written for you and there's full documentation on how to run it. You don't have to write anything.

ADD REPLYlink written 4.6 years ago by Emily_Ensembl18k
1
4.6 years ago by
Rm7.8k
Danville, PA
Rm7.8k wrote:

Gene to rs ID (the reverse lookup):

library(biomaRt)

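## This may take a long time if the list contains many genes (>50).

## hgnc_gene_symbols.txt lists the gene symbols, one per line.
## Take the first column as a character vector rather than passing a data frame.
genes <- read.table("~/hgnc_gene_symbols.txt", stringsAsFactors = FALSE)$V1

ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
## "ENSEMBL_MART_SNP" is the full mart name; the short alias "snp" is not
## recognised by every biomaRt/Ensembl release.
dbsnp = useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")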

getHGNC2ENSG = getBM(attributes=c('chromosome_name', 'start_position',
                                  'end_position', 'strand', 'ensembl_gene_id',
                                  'hgnc_symbol', 'refseq_mrna'),
                     filters ='hgnc_symbol', values = genes, mart = ensembl)

write.table(getHGNC2ENSG, file="~/hgnc_gene_symbols.txt.ensg.coord.tsv",
            sep="\t", col.names=T, row.names=T, append = F, quote=FALSE)

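## The 'ensembl_gene' filter expects Ensembl gene IDs, not HGNC symbols,
## so pass the IDs retrieved in the previous step.
getRSid4ENSG <- getBM(c('refsnp_id', 'allele', 'snp', 'chr_name', 'chrom_start',
                        'chrom_strand', 'associated_gene', 'ensembl_gene_stable_id',
                        'synonym_name', 'consequence_type_tv'),
                      filters = 'ensembl_gene',
                      values = unique(getHGNC2ENSG$ensembl_gene_id), mart = dbsnp)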

write.table(getRSid4ENSG, file="~/hgnc_gene_symbols.txt.ensg.RSid.coord.tsv", 
            sep="\t", col.names=T, row.names=T, append = F, quote=FALSE)
ADD COMMENTlink modified 12 months ago by zx87547.1k • written 4.6 years ago by Rm7.8k