Question: using biomaRt in R to annotate SNPs to Genes within 1000kb
1
gravatar for Sheila
13 months ago by
Sheila220
United States
Sheila220 wrote:

Hi!

In short, I have a list of SNPs that I would like to map to the closest gene within 1000kb. I am using the biomaRt package in R/Bioconductor. I am successfully able to map my SNPs to Genes but only with biomaRt's default bp flanking region (I beleive that is 100kb). Here are my commands to get the Genes.

Also, I am using h19 build.

#Mart used to map SNPs to Ensembl Gene IDs
grch37.snp = useMart(biomart="ENSEMBL_MART_SNP", host="grch37.ensembl.org", path="/biomart/martservice",dataset="hsapiens_snp")

#Mart used to map Ensembl Gene IDs to Gene name
grch37 = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")


snpList <- studyResults$SNP   #List of 5000 SNPs

1. Mapping SNPs to Ensembl Gene IDs

table1 <- getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"), 
                      filters = "snp_filter", 
                      values = snpList, 
                      mart = grch37.snp)

2. Mapping Ensembl Gene IDs to Gene names

 table2 <- getBM(attributes = c("ensembl_gene_id", "external_gene_name","external_gene_source","variation_name","start_position","end_position","description"),
                 filters = "ensembl_gene_id", 
                 values =  table1$ensembl_gene_stable_id, 
                 mart = grch37)

3. Merge both tables

 results <- merge(table1,table2, by.x = "ensembl_gene_stable_id", by.y="ensembl_gene_id", all.x=T)

So here's my question: How do I specify the region I want? Right now, biomaRt is mapping the SNP to the gene within 100kb, but I want to map the SNP to the gene within 1000kb.

If I can't do this in biomaRt, I welcome alternative solutions!

Thanks!

ADD COMMENTlink modified 12 months ago by Emily_Ensembl14k • written 13 months ago by Sheila220
1
gravatar for VHahaut
13 months ago by
VHahaut1.1k
Belgium
VHahaut1.1k wrote:

Maybe not the best but one idea would be:

  1. Download the list of ENSEMBL gene ID you need (from biomart webiste).
  2. Get the coordinates of your SNP using biomaRt for example.
  3. Transform your SNP/annotations into a GRange object (snp.gr and annot.gr).
  4. Cross your annotation dataframe with your coordinates using one of GenomicRanges functions.

    findOverlaps( snp.gr, annot.gr, maxgap=100000)

After some text manipulation you should easily get what you need here.

ADD COMMENTlink modified 13 months ago • written 13 months ago by VHahaut1.1k

Thanks! I actually have the BP of each SNP! I'll try and start from there :)

ADD REPLYlink written 13 months ago by Sheila220

Hi, I am working on this problem also. When I followed the step 4 using findOverlaps function in GenomicRanges, a mistake popped out.

`> findOverlaps( snp.gr, annot.gr, maxgap = 1000) Hits object with 102300 hits and 0 metadata columns: queryHits subjectHits <integer> <integer> [1] 4231 22438 [2] 4474 23304 [3] 13285 23304 [4] 16393 23304 [5] 19266 23545 ... ... ... [102296] 123777 9501 [102297] 123778 9501 [102298] 123779 6078 [102299] 123780 6078 [102300] 123781 6078


queryLength: 123781 / subjectLength: 64616 Warning message: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other: - in 'x': CHR_HG2216_PATCH, CHR_HG23_PATCH, CHR_HSCHR1_4_CTG32_1, CHR_HSCHR11_1_CTG1_1, CHR_HSCHR11_1_CTG2, CHR_HSCHR11_1_CTG3, CHR_HSCHR12_1_CTG2, CHR_HSCHR13_1_CTG2, CHR_HSCHR13_1_CTG4, CHR_HSCHR13_1_CTG6, CHR_HSCHR13_1_CTG7, CHR_HSCHR16_3_CTG3_1, CHR_HSCHR18_1_CTG2, CHR_HSCHR18_4_CTG1_1, CHR_HSCHR2_5_CTG7_2, CHR_HSCHR3_3_CTG2_1, CHR_HSCHR4_1_CTG8_1, CHR_HSCHR9_1_CTG7 - in 'y': CHR_HG1_PATCH, CHR_HG1342_HG2282_PATCH, CHR_HG1531_PATCH, CHR_HG1535_PATCH, CHR_HG1708_PATCH, CHR_HG1815_PATCH, CHR_HG2002_PATCH, CHR_HG2022_PATCH, CHR_HG2046_PATCH, CHR_HG2047_PATCH, CHR_HG2057_PATCH, CHR_HG2060_PATCH, CHR_HG2062_PATCH, CHR_HG2067_PATCH, CHR_HG2088_PATCH, CHR_HG2114_PATCH, CHR_HG2133_PATCH, CHR_HG2236_PATCH, CHR_HG2247_PATCH, CHR_HG2263_PATCH, CHR_HG2266_PATCH, CHR_HG2285_HG106_HG2252_PATCH, CHR_HG2291_PATCH, CHR_HG2412_PATCH, CHR_HG2419_PATCH, CHR_HG2442_PATCH, CHR_HG30_PATCH, CHR_HG708_PATCH, CHR_HG76_PATCH, CHR_HG926_ [... truncated]`

How should I fix this? Thank you in advance!

ADD REPLYlink modified 28 days ago • written 28 days ago by xiaohuichen030

As explained in the warning message, it seems that your query snp.gr) and your subject annot.gr) have some chromosome names which are not in common. This is a warning not an error so if you expect different chromosome names there is nothing to fix.

ADD REPLYlink written 27 days ago by VHahaut1.1k

Got it. Thanks a lot!

ADD REPLYlink written 27 days ago by xiaohuichen030
1
gravatar for Emily_Ensembl
12 months ago by
Emily_Ensembl14k
EMBL-EBI
Emily_Ensembl14k wrote:

Alternatively, you could annotate the SNPs using the VEP, and alter the upstream/downstream distance to 1 Mb. With the online tool, this is just a tick option, whereas with the script you would need a plugin.

ADD COMMENTlink written 12 months ago by Emily_Ensembl14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 863 users visited in the last hour