Question: using biomaRt in R to annotate SNPs to Genes within 1000kb
gravatar for Sheila
23 months ago by
United States
Sheila290 wrote:


In short, I have a list of SNPs that I would like to map to the closest gene within 1000kb. I am using the biomaRt package in R/Bioconductor. I am successfully able to map my SNPs to Genes but only with biomaRt's default bp flanking region (I beleive that is 100kb). Here are my commands to get the Genes.

Also, I am using h19 build.

#Mart used to map SNPs to Ensembl Gene IDs
grch37.snp = useMart(biomart="ENSEMBL_MART_SNP", host="", path="/biomart/martservice",dataset="hsapiens_snp")

#Mart used to map Ensembl Gene IDs to Gene name
grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

snpList <- studyResults$SNP   #List of 5000 SNPs

1. Mapping SNPs to Ensembl Gene IDs

table1 <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"), 
                      filters = "snp_filter", 
                      values = snpList, 
                      mart = grch37.snp)

2. Mapping Ensembl Gene IDs to Gene names

 table2 <- getBM(attributes = c("ensembl_gene_id", "external_gene_name","external_gene_source","variation_name","start_position","end_position","description"),
                 filters = "ensembl_gene_id", 
                 values =  table1$ensembl_gene_stable_id, 
                 mart = grch37)

3. Merge both tables

 results <- merge(table1,table2, by.x = "ensembl_gene_stable_id", by.y="ensembl_gene_id", all.x=T)

So here's my question: How do I specify the region I want? Right now, biomaRt is mapping the SNP to the gene within 100kb, but I want to map the SNP to the gene within 1000kb.

If I can't do this in biomaRt, I welcome alternative solutions!


annotation bioconductor biomart R • 2.4k views
ADD COMMENTlink modified 23 months ago by Emily_Ensembl17k • written 23 months ago by Sheila290
gravatar for VHahaut
23 months ago by
VHahaut1.1k wrote:

Maybe not the best but one idea would be:

  1. Download the list of ENSEMBL gene ID you need (from biomart webiste).
  2. Get the coordinates of your SNP using biomaRt for example.
  3. Transform your SNP/annotations into a GRange object ( and
  4. Cross your annotation dataframe with your coordinates using one of GenomicRanges functions.

    findOverlaps(,, maxgap=100000)

After some text manipulation you should easily get what you need here.

ADD COMMENTlink modified 23 months ago • written 23 months ago by VHahaut1.1k

Thanks! I actually have the BP of each SNP! I'll try and start from there :)

ADD REPLYlink written 23 months ago by Sheila290

Hi, I am working on this problem also. When I followed the step 4 using findOverlaps function in GenomicRanges, a mistake popped out.

`> findOverlaps(,, maxgap = 1000) Hits object with 102300 hits and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 4231 22438 [2] 4474 23304 [3] 13285 23304 [4] 16393 23304 [5] 19266 23545 ... ... ... [102296] 123777 9501 [102297] 123778 9501 [102298] 123779 6078 [102299] 123780 6078 [102300] 123781 6078

queryLength: 123781 / subjectLength: 64616 Warning message: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': CHR_HG2216_PATCH, CHR_HG23_PATCH, CHR_HSCHR1_4_CTG32_1, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR12_1_CTG2, CHR_HSCHR13_1_CTG2, CHR_HSCHR13_1_CTG4, CHR_HSCHR13_1_CTG6, CHR_HSCHR13_1_CTG7, CHR_HSCHR16_3_CTG3_1, CHR_HSCHR18_1_CTG2, CHR_HSCHR18_4_CTG1_1, CHR_HSCHR2_5_CTG7_2, CHR_HSCHR3_3_CTG2_1, CHR_HSCHR4_1_CTG8_1, CHR_HSCHR9_1_CTG7 - in 'y': CHR_HG1_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1531_PATCH, CHR_HG1535_PATCH, CHR_HG1708_PATCH, CHR_HG1815_PATCH, CHR_HG2002_PATCH, CHR_HG2022_PATCH, CHR_HG2046_PATCH, CHR_HG2047_PATCH, CHR_HG2057_PATCH, CHR_HG2060_PATCH, CHR_HG2062_PATCH, CHR_HG2067_PATCH, CHR_HG2088_PATCH, CHR_HG2114_PATCH, CHR_HG2133_PATCH, CHR_HG2236_PATCH, CHR_HG2247_PATCH, CHR_HG2263_PATCH, CHR_HG2266_PATCH, CHR_HG2285_HG106_HG2252_PATCH, CHR_HG2291_PATCH, CHR_HG2412_PATCH, CHR_HG2419_PATCH, CHR_HG2442_PATCH, CHR_HG30_PATCH, CHR_HG708_PATCH, CHR_HG76_PATCH, CHR_HG926_ [... truncated]`

How should I fix this? Thank you in advance!

ADD REPLYlink modified 11 months ago • written 11 months ago by Huichen0310

As explained in the warning message, it seems that your query and your subject have some chromosome names which are not in common. This is a warning not an error so if you expect different chromosome names there is nothing to fix.

ADD REPLYlink written 11 months ago by VHahaut1.1k

Got it. Thanks a lot!

ADD REPLYlink written 11 months ago by Huichen0310
gravatar for Emily_Ensembl
23 months ago by
Emily_Ensembl17k wrote:

Alternatively, you could annotate the SNPs using the VEP, and alter the upstream/downstream distance to 1 Mb. With the online tool, this is just a tick option, whereas with the script you would need a plugin.

ADD COMMENTlink written 23 months ago by Emily_Ensembl17k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1005 users visited in the last hour