Question: using biomaRt in R to annotate SNPs to Genes within 1000kb
gravatar for Sheila
3.7 years ago by
United States
Sheila330 wrote:


In short, I have a list of SNPs that I would like to map to the closest gene within 1000kb. I am using the biomaRt package in R/Bioconductor. I am successfully able to map my SNPs to Genes but only with biomaRt's default bp flanking region (I beleive that is 100kb). Here are my commands to get the Genes.

Also, I am using h19 build.

#Mart used to map SNPs to Ensembl Gene IDs
grch37.snp = useMart(biomart="ENSEMBL_MART_SNP", host="", path="/biomart/martservice",dataset="hsapiens_snp")

#Mart used to map Ensembl Gene IDs to Gene name
grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

snpList <- studyResults$SNP   #List of 5000 SNPs

1. Mapping SNPs to Ensembl Gene IDs

table1 <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"), 
                      filters = "snp_filter", 
                      values = snpList, 
                      mart = grch37.snp)

2. Mapping Ensembl Gene IDs to Gene names

 table2 <- getBM(attributes = c("ensembl_gene_id", "external_gene_name","external_gene_source","variation_name","start_position","end_position","description"),
                 filters = "ensembl_gene_id", 
                 values =  table1$ensembl_gene_stable_id, 
                 mart = grch37)

3. Merge both tables

 results <- merge(table1,table2, by.x = "ensembl_gene_stable_id", by.y="ensembl_gene_id", all.x=T)

So here's my question: How do I specify the region I want? Right now, biomaRt is mapping the SNP to the gene within 100kb, but I want to map the SNP to the gene within 1000kb.

If I can't do this in biomaRt, I welcome alternative solutions!


annotation bioconductor biomart R • 5.4k views
ADD COMMENTlink modified 3.7 years ago by Emily_Ensembl21k • written 3.7 years ago by Sheila330
gravatar for VHahaut
3.7 years ago by
VHahaut1.1k wrote:

Maybe not the best but one idea would be:

  1. Download the list of ENSEMBL gene ID you need (from biomart webiste).
  2. Get the coordinates of your SNP using biomaRt for example.
  3. Transform your SNP/annotations into a GRange object ( and
  4. Cross your annotation dataframe with your coordinates using one of GenomicRanges functions.

    findOverlaps(,, maxgap=100000)

After some text manipulation you should easily get what you need here.

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by VHahaut1.1k

Thanks! I actually have the BP of each SNP! I'll try and start from there :)

ADD REPLYlink written 3.7 years ago by Sheila330

Hi, I am working on this problem also. When I followed the step 4 using findOverlaps function in GenomicRanges, a mistake popped out.

`> findOverlaps(,, maxgap = 1000) Hits object with 102300 hits and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 4231 22438 [2] 4474 23304 [3] 13285 23304 [4] 16393 23304 [5] 19266 23545 ... ... ... [102296] 123777 9501 [102297] 123778 9501 [102298] 123779 6078 [102299] 123780 6078 [102300] 123781 6078

queryLength: 123781 / subjectLength: 64616 Warning message: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': CHR_HG2216_PATCH, CHR_HG23_PATCH, CHR_HSCHR1_4_CTG32_1, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR12_1_CTG2, CHR_HSCHR13_1_CTG2, CHR_HSCHR13_1_CTG4, CHR_HSCHR13_1_CTG6, CHR_HSCHR13_1_CTG7, CHR_HSCHR16_3_CTG3_1, CHR_HSCHR18_1_CTG2, CHR_HSCHR18_4_CTG1_1, CHR_HSCHR2_5_CTG7_2, CHR_HSCHR3_3_CTG2_1, CHR_HSCHR4_1_CTG8_1, CHR_HSCHR9_1_CTG7 - in 'y': CHR_HG1_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1531_PATCH, CHR_HG1535_PATCH, CHR_HG1708_PATCH, CHR_HG1815_PATCH, CHR_HG2002_PATCH, CHR_HG2022_PATCH, CHR_HG2046_PATCH, CHR_HG2047_PATCH, CHR_HG2057_PATCH, CHR_HG2060_PATCH, CHR_HG2062_PATCH, CHR_HG2067_PATCH, CHR_HG2088_PATCH, CHR_HG2114_PATCH, CHR_HG2133_PATCH, CHR_HG2236_PATCH, CHR_HG2247_PATCH, CHR_HG2263_PATCH, CHR_HG2266_PATCH, CHR_HG2285_HG106_HG2252_PATCH, CHR_HG2291_PATCH, CHR_HG2412_PATCH, CHR_HG2419_PATCH, CHR_HG2442_PATCH, CHR_HG30_PATCH, CHR_HG708_PATCH, CHR_HG76_PATCH, CHR_HG926_ [... truncated]`

How should I fix this? Thank you in advance!

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Huichen0320

As explained in the warning message, it seems that your query and your subject have some chromosome names which are not in common. This is a warning not an error so if you expect different chromosome names there is nothing to fix.

ADD REPLYlink written 2.7 years ago by VHahaut1.1k

Got it. Thanks a lot!

ADD REPLYlink written 2.7 years ago by Huichen0320
gravatar for Emily_Ensembl
3.7 years ago by
Emily_Ensembl21k wrote:

Alternatively, you could annotate the SNPs using the VEP, and alter the upstream/downstream distance to 1 Mb. With the online tool, this is just a tick option, whereas with the script you would need a plugin.

ADD COMMENTlink written 3.7 years ago by Emily_Ensembl21k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1982 users visited in the last hour