Question: using biomaRt in R to annotate SNPs to Genes within 1000kb
1
gravatar for Sheila
17 months ago by
Sheila240
United States
Sheila240 wrote:

Hi!

In short, I have a list of SNPs that I would like to map to the closest gene within 1000kb. I am using the biomaRt package in R/Bioconductor. I am successfully able to map my SNPs to Genes but only with biomaRt's default bp flanking region (I beleive that is 100kb). Here are my commands to get the Genes.

Also, I am using h19 build.

#Mart used to map SNPs to Ensembl Gene IDs
grch37.snp = useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice",dataset="hsapiens_snp")

#Mart used to map Ensembl Gene IDs to Gene name
grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")


snpList <- studyResults$SNP   #List of 5000 SNPs

1. Mapping SNPs to Ensembl Gene IDs

table1 <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"), 
                      filters = "snp_filter", 
                      values = snpList, 
                      mart = grch37.snp)

2. Mapping Ensembl Gene IDs to Gene names

 table2 <- getBM(attributes = c("ensembl_gene_id", "external_gene_name","external_gene_source","variation_name","start_position","end_position","description"),
                 filters = "ensembl_gene_id", 
                 values =  table1$ensembl_gene_stable_id, 
                 mart = grch37)

3. Merge both tables

 results <- merge(table1,table2, by.x = "ensembl_gene_stable_id", by.y="ensembl_gene_id", all.x=T)

So here's my question: How do I specify the region I want? Right now, biomaRt is mapping the SNP to the gene within 100kb, but I want to map the SNP to the gene within 1000kb.

If I can't do this in biomaRt, I welcome alternative solutions!

Thanks!

annotation bioconductor biomart R • 1.7k views
ADD COMMENTlink modified 17 months ago by Emily_Ensembl16k • written 17 months ago by Sheila240
1
gravatar for VHahaut
17 months ago by
VHahaut1.1k
Belgium
VHahaut1.1k wrote:

Maybe not the best but one idea would be:

  1. Download the list of ENSEMBL gene ID you need (from biomart webiste).
  2. Get the coordinates of your SNP using biomaRt for example.
  3. Transform your SNP/annotations into a GRange object (snp.gr and annot.gr).
  4. Cross your annotation dataframe with your coordinates using one of GenomicRanges functions.

    findOverlaps( snp.gr, annot.gr, maxgap=100000)

After some text manipulation you should easily get what you need here.

ADD COMMENTlink modified 17 months ago • written 17 months ago by VHahaut1.1k

Thanks! I actually have the BP of each SNP! I'll try and start from there :)

ADD REPLYlink written 17 months ago by Sheila240

Hi, I am working on this problem also. When I followed the step 4 using findOverlaps function in GenomicRanges, a mistake popped out.

`> findOverlaps( snp.gr, annot.gr, maxgap = 1000) Hits object with 102300 hits and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 4231 22438 [2] 4474 23304 [3] 13285 23304 [4] 16393 23304 [5] 19266 23545 ... ... ... [102296] 123777 9501 [102297] 123778 9501 [102298] 123779 6078 [102299] 123780 6078 [102300] 123781 6078


queryLength: 123781 / subjectLength: 64616 Warning message: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': CHR_HG2216_PATCH, CHR_HG23_PATCH, CHR_HSCHR1_4_CTG32_1, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR12_1_CTG2, CHR_HSCHR13_1_CTG2, CHR_HSCHR13_1_CTG4, CHR_HSCHR13_1_CTG6, CHR_HSCHR13_1_CTG7, CHR_HSCHR16_3_CTG3_1, CHR_HSCHR18_1_CTG2, CHR_HSCHR18_4_CTG1_1, CHR_HSCHR2_5_CTG7_2, CHR_HSCHR3_3_CTG2_1, CHR_HSCHR4_1_CTG8_1, CHR_HSCHR9_1_CTG7 - in 'y': CHR_HG1_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1531_PATCH, CHR_HG1535_PATCH, CHR_HG1708_PATCH, CHR_HG1815_PATCH, CHR_HG2002_PATCH, CHR_HG2022_PATCH, CHR_HG2046_PATCH, CHR_HG2047_PATCH, CHR_HG2057_PATCH, CHR_HG2060_PATCH, CHR_HG2062_PATCH, CHR_HG2067_PATCH, CHR_HG2088_PATCH, CHR_HG2114_PATCH, CHR_HG2133_PATCH, CHR_HG2236_PATCH, CHR_HG2247_PATCH, CHR_HG2263_PATCH, CHR_HG2266_PATCH, CHR_HG2285_HG106_HG2252_PATCH, CHR_HG2291_PATCH, CHR_HG2412_PATCH, CHR_HG2419_PATCH, CHR_HG2442_PATCH, CHR_HG30_PATCH, CHR_HG708_PATCH, CHR_HG76_PATCH, CHR_HG926_ [... truncated]`

How should I fix this? Thank you in advance!

ADD REPLYlink modified 5 months ago • written 5 months ago by Huichen0310

As explained in the warning message, it seems that your query snp.gr) and your subject annot.gr) have some chromosome names which are not in common. This is a warning not an error so if you expect different chromosome names there is nothing to fix.

ADD REPLYlink written 5 months ago by VHahaut1.1k

Got it. Thanks a lot!

ADD REPLYlink written 5 months ago by Huichen0310
1
gravatar for Emily_Ensembl
17 months ago by
Emily_Ensembl16k
EMBL-EBI
Emily_Ensembl16k wrote:

Alternatively, you could annotate the SNPs using the VEP, and alter the upstream/downstream distance to 1 Mb. With the online tool, this is just a tick option, whereas with the script you would need a plugin.

ADD COMMENTlink written 17 months ago by Emily_Ensembl16k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2147 users visited in the last hour