Question: using biomaRt in R to annotate SNPs to Genes within 1000kb
gravatar for Sheila
20 months ago by
United States
Sheila280 wrote:


In short, I have a list of SNPs that I would like to map to the closest gene within 1000kb. I am using the biomaRt package in R/Bioconductor. I am successfully able to map my SNPs to Genes but only with biomaRt's default bp flanking region (I beleive that is 100kb). Here are my commands to get the Genes.

Also, I am using h19 build.

#Mart used to map SNPs to Ensembl Gene IDs
grch37.snp = useMart(biomart="ENSEMBL_MART_SNP", host="", path="/biomart/martservice",dataset="hsapiens_snp")

#Mart used to map Ensembl Gene IDs to Gene name
grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

snpList <- studyResults$SNP   #List of 5000 SNPs

1. Mapping SNPs to Ensembl Gene IDs

table1 <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"), 
                      filters = "snp_filter", 
                      values = snpList, 
                      mart = grch37.snp)

2. Mapping Ensembl Gene IDs to Gene names

 table2 <- getBM(attributes = c("ensembl_gene_id", "external_gene_name","external_gene_source","variation_name","start_position","end_position","description"),
                 filters = "ensembl_gene_id", 
                 values =  table1$ensembl_gene_stable_id, 
                 mart = grch37)

3. Merge both tables

 results <- merge(table1,table2, by.x = "ensembl_gene_stable_id", by.y="ensembl_gene_id", all.x=T)

So here's my question: How do I specify the region I want? Right now, biomaRt is mapping the SNP to the gene within 100kb, but I want to map the SNP to the gene within 1000kb.

If I can't do this in biomaRt, I welcome alternative solutions!


annotation bioconductor biomart R • 2.1k views
ADD COMMENTlink modified 20 months ago by Emily_Ensembl16k • written 20 months ago by Sheila280
gravatar for VHahaut
20 months ago by
VHahaut1.1k wrote:

Maybe not the best but one idea would be:

  1. Download the list of ENSEMBL gene ID you need (from biomart webiste).
  2. Get the coordinates of your SNP using biomaRt for example.
  3. Transform your SNP/annotations into a GRange object ( and
  4. Cross your annotation dataframe with your coordinates using one of GenomicRanges functions.

    findOverlaps(,, maxgap=100000)

After some text manipulation you should easily get what you need here.

ADD COMMENTlink modified 20 months ago • written 20 months ago by VHahaut1.1k

Thanks! I actually have the BP of each SNP! I'll try and start from there :)

ADD REPLYlink written 20 months ago by Sheila280

Hi, I am working on this problem also. When I followed the step 4 using findOverlaps function in GenomicRanges, a mistake popped out.

`> findOverlaps(,, maxgap = 1000) Hits object with 102300 hits and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 4231 22438 [2] 4474 23304 [3] 13285 23304 [4] 16393 23304 [5] 19266 23545 ... ... ... [102296] 123777 9501 [102297] 123778 9501 [102298] 123779 6078 [102299] 123780 6078 [102300] 123781 6078

queryLength: 123781 / subjectLength: 64616 Warning message: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': CHR_HG2216_PATCH, CHR_HG23_PATCH, CHR_HSCHR1_4_CTG32_1, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR12_1_CTG2, CHR_HSCHR13_1_CTG2, CHR_HSCHR13_1_CTG4, CHR_HSCHR13_1_CTG6, CHR_HSCHR13_1_CTG7, CHR_HSCHR16_3_CTG3_1, CHR_HSCHR18_1_CTG2, CHR_HSCHR18_4_CTG1_1, CHR_HSCHR2_5_CTG7_2, CHR_HSCHR3_3_CTG2_1, CHR_HSCHR4_1_CTG8_1, CHR_HSCHR9_1_CTG7 - in 'y': CHR_HG1_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1531_PATCH, CHR_HG1535_PATCH, CHR_HG1708_PATCH, CHR_HG1815_PATCH, CHR_HG2002_PATCH, CHR_HG2022_PATCH, CHR_HG2046_PATCH, CHR_HG2047_PATCH, CHR_HG2057_PATCH, CHR_HG2060_PATCH, CHR_HG2062_PATCH, CHR_HG2067_PATCH, CHR_HG2088_PATCH, CHR_HG2114_PATCH, CHR_HG2133_PATCH, CHR_HG2236_PATCH, CHR_HG2247_PATCH, CHR_HG2263_PATCH, CHR_HG2266_PATCH, CHR_HG2285_HG106_HG2252_PATCH, CHR_HG2291_PATCH, CHR_HG2412_PATCH, CHR_HG2419_PATCH, CHR_HG2442_PATCH, CHR_HG30_PATCH, CHR_HG708_PATCH, CHR_HG76_PATCH, CHR_HG926_ [... truncated]`

How should I fix this? Thank you in advance!

ADD REPLYlink modified 8 months ago • written 8 months ago by Huichen0310

As explained in the warning message, it seems that your query and your subject have some chromosome names which are not in common. This is a warning not an error so if you expect different chromosome names there is nothing to fix.

ADD REPLYlink written 8 months ago by VHahaut1.1k

Got it. Thanks a lot!

ADD REPLYlink written 8 months ago by Huichen0310
gravatar for Emily_Ensembl
20 months ago by
Emily_Ensembl16k wrote:

Alternatively, you could annotate the SNPs using the VEP, and alter the upstream/downstream distance to 1 Mb. With the online tool, this is just a tick option, whereas with the script you would need a plugin.

ADD COMMENTlink written 20 months ago by Emily_Ensembl16k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1803 users visited in the last hour