LINK SNPS TO GENES USING BIOMART, KEEPING YOUR INITIAL POSITIONS IN THE RESULTING FILE
7 months ago
SGMS ▴ 70

Dear all,

I have a question about mapping SNPs to genes using BiomaRt.

I have successfully mapped my chr/start/end SNP positions to Ensembl gene IDs and gene symbols, and I have done this several times. My question is: can I keep my original SNP positions in the resulting file, alongside the chr/start/end gene positions and gene names? I struggle every time I need to go back to the original SNP positions to link them to the genes I extracted from BiomaRt. As a picture is worth a thousand words, please see below:

my file (examples of SNP positions):

chr start       end
1   194972207   194972207
6   41187262    41187262
7   43120222    43120222
7   43120878    43120878

BiomaRt's gene results (examples of gene results):

chromosome_name start_position  end_position    ensembl_gene_id hgnc_symbol
1               119853316       119853748       ENSG00000227205 PFN1P9
1               119886304       119886927       ENSG00000226446 NOTCH2P1
1               119893533       119896515       ENSG00000134249 ADAM30

So I do get the gene list I requested based on the positions in my file, but the resulting file does not contain those SNP positions. Having them there would be really useful.

Any help would be greatly appreciated.

Thank you in advance

7 months ago

Don't use BioMart; use the Ensembl Variant Effect Predictor (VEP) instead.

7 months ago
Papyrus ▴ 920

What about using GenomicRanges to find the overlaps between the ranges in the two files? For example, take these two tables, in which two SNPs fall within the ranges of PFN1P9 and ADAM30 (in your example files there were no matches):

> SNP
  chr     start       end
1   1 119853317 119853317
2   6  41187262  41187262
3   1 119893536 119893536
4   7  43120878  43120878
> biomart
  chromosome_name start_position end_position ensembl_gene_id hgnc_symbol
1               1      119853316    119853748 ENSG00000227205      PFN1P9
2               1      119893533    119896515 ENSG00000134249      ADAM30
3               1      119886304    119886927 ENSG00000226446    NOTCH2P1

If they are data frames, convert them to GRanges:

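library(GenomicRanges)
# chr/start/end are recognised automatically
SNP.gr <- makeGRangesFromDataFrame(SNP, keep.extra.columns = TRUE)
# start_position/end_position have to be named explicitly
biomart.gr <- makeGRangesFromDataFrame(biomart,
                                       start.field = "start_position",
                                       end.field = "end_position",
                                       keep.extra.columns = TRUE)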

Then, use the findOverlaps function to match coordinates between GRanges, like so:

hits <- findOverlaps(query = SNP.gr, subject = biomart.gr)

The hits object gives you the matching positions between the 2 GRanges. With that info, you can do anything. Such as (a bit sloppy):

# Add the overlapping SNP's coordinates as metadata columns (NA where no SNP overlaps).
# Note: if several SNPs hit the same gene, only the last one is kept.
mcols(biomart.gr)$chrSNP <- NA
mcols(biomart.gr)$chrSNP[subjectHits(hits)] <- as.character(seqnames(SNP.gr))[queryHits(hits)]
mcols(biomart.gr)$startSNP <- NA
mcols(biomart.gr)$startSNP[subjectHits(hits)] <- start(SNP.gr)[queryHits(hits)]
mcols(biomart.gr)$endSNP <- NA
mcols(biomart.gr)$endSNP[subjectHits(hits)] <- end(SNP.gr)[queryHits(hits)]

And there's the object:

GRanges object with 3 ranges and 5 metadata columns:
      seqnames              ranges strand | ensembl_gene_id hgnc_symbol      chrSNP  startSNP    endSNP
         <Rle>           <IRanges>  <Rle> |        <factor>    <factor> <character> <integer> <integer>
  [1]        1 119853316-119853748      * | ENSG00000227205      PFN1P9           1 119853317 119853317
  [2]        1 119893533-119896515      * | ENSG00000134249      ADAM30           1 119893536 119893536
  [3]        1 119886304-119886927      * | ENSG00000226446    NOTCH2P1        <NA>      <NA>      <NA>
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
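
If what you want is a flat table with one row per SNP/gene pair (so that a gene hit by several SNPs appears once per SNP, and the original SNP coordinates sit next to the gene columns), the hits can also be used to index the two data frames directly. This is only a minimal sketch: the toy inputs are copied from the example tables above, and the names `snp_gene` and `unmatched` are made up for illustration.

```r
library(GenomicRanges)

# Toy inputs copied from the example tables above
SNP <- data.frame(
  chr   = c("1", "6", "1", "7"),
  start = c(119853317L, 41187262L, 119893536L, 43120878L),
  end   = c(119853317L, 41187262L, 119893536L, 43120878L)
)
biomart <- data.frame(
  chromosome_name = c("1", "1", "1"),
  start_position  = c(119853316L, 119893533L, 119886304L),
  end_position    = c(119853748L, 119896515L, 119886927L),
  ensembl_gene_id = c("ENSG00000227205", "ENSG00000134249", "ENSG00000226446"),
  hgnc_symbol     = c("PFN1P9", "ADAM30", "NOTCH2P1")
)

SNP.gr <- makeGRangesFromDataFrame(SNP)
biomart.gr <- makeGRangesFromDataFrame(biomart,
                                       start.field = "start_position",
                                       end.field = "end_position")
hits <- findOverlaps(query = SNP.gr, subject = biomart.gr)

# One row per overlapping SNP/gene pair: SNP columns first, gene columns after
snp_gene <- cbind(SNP[queryHits(hits), ], biomart[subjectHits(hits), ])
rownames(snp_gene) <- NULL

# SNPs that do not fall inside any gene range
unmatched <- SNP[!seq_len(nrow(SNP)) %in% queryHits(hits), ]
```

Here `snp_gene` keeps the chr/start/end columns of the SNP next to the gene coordinates, Ensembl ID and symbol, and can be written out with write.table as usual.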

Thank you for this. I have used findOverlaps before for another purpose, but VEP is actually a much easier solution once your data are in the right format. Thanks again for your help :)
