Question: Overlap SNPs with regions in R
pt.taklifi wrote (10 weeks ago):

Hello everyone, I have a list of SNPs in R and a list of regions.

p1.snp <- structure(list(chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr1"), position = c(2135809L, 11130112L, 11253473L, 11258963L, 
15847782L, 16611163L), REF = c("C", "G", "G", "C", "C", "A"), 
    ALT = c("A", "T", "T", "A", "G", "G"), NA. = c(18L, 46L, 
    21L, 32L, 17L, 9L), NA..1 = c(0L, 0L, 0L, 0L, 0L, 0L), NA..2 = c("0%", 
    "0%", "0%", "0%", "0%", "0%"), NA..3 = c("C", "G", "G", "C", 
    "C", "A"), NA..4 = c(11L, 31L, 8L, 43L, 9L, 0L), NA..5 = c(4L, 
    9L, 4L, 12L, 6L, 14L), NA..6 = c("26.67%", "22.5%", "33.33%", 
    "21.82%", "40%", "100%"), NA..7 = c("M", "K", "K", "M", "S", 
    "G"), NA..8 = c("Somatic", "Somatic", "Somatic", "Somatic", 
    "Somatic", "Somatic"), NA..9 = c(1L, 1L, 1L, 1L, 1L, 1L), 
    NA..10 = c(0.03335777, 0.0005946179, 0.01209677, 0.002473575, 
    0.005523112, 1.223706e-06), NA..11 = c(0L, 6L, 8L, 43L, 9L, 
    0L), NA..12 = c(11L, 25L, 0L, 0L, 0L, 0L), NA..13 = c(0L, 
    0L, 4L, 12L, 6L, 14L), NA..14 = c(4L, 9L, 0L, 0L, 0L, 0L), 
    NA..15 = c(0L, 19L, 21L, 24L, 17L, 0L), NA..16 = c(18L, 27L, 
    0L, 8L, 0L, 9L), NA..17 = c(0L, 0L, 0L, 0L, 0L, 0L), NA..18 = c(0L, 
    0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-6L))

and

regions <- structure(list(chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr1"), chromStart = c(975451L, 1014228L, 1290080L, 1291099L, 
1291742L, 1327977L), chromEnd = c(975952L, 1014729L, 1290581L, 
1291600L, 1292243L, 1328478L), name = c("BRCA_39", "BRCA_55", 
"BRCA_123", "BRCA_124", "BRCA_125", "BRCA_143"), NA. = c(1.878426, 
4.074697, 2.443588, 3.180199, 8.26783, 1.082465), NA..1 = c("3'UTR", 
"3'UTR", "3'UTR", "3'UTR", "3'UTR", "3'UTR"), NA..2 = c(0.6187625, 
0.6287425, 0.6786427, 0.7025948, 0.6407186, 0.6766467), NA..3 = c(0.3812375, 
0.3712575, 0.3213573, 0.2974052, 0.3592814, 0.3233533)), row.names = c(NA, 
-6L), class = "data.frame")

I know that I could use command-line tools such as bedtools intersect for this purpose; however, I'm looking for a function in R that does the same. I want the output in matrix format, reporting whether each variant falls in any of the regions. I would appreciate your help and suggestions.

rpolicastro wrote (10 weeks ago):

You first need to convert both data frames to GRanges objects.

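library("plyranges")

# Each SNP is a 1-bp range, so start and end are both the SNP position.
p1.snp <- as_granges(p1.snp, seqnames=chrom, start=position, end=position)
# GRanges coordinates are 1-based and closed. If the regions come from a
# 0-based BED file, add 1 to chromStart before converting.
regions <- as_granges(regions, seqnames=chrom, start=chromStart, end=chromEnd)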

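You can then find the overlaps. For this I will use the handy function join_overlap_left from plyranges. It returns a GRanges object of the SNPs, with the columns of any overlapping region added to each row. SNPs that overlap no region are kept, with NA in the region columns, and a SNP that overlaps several regions appears once per region.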

overlaps <- join_overlap_left(p1.snp, regions)

There are no overlaps in your example data.
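
To see what a hit looks like, here is a minimal sketch on made-up data. The toy_snps and toy_regions objects and their values are assumptions for illustration only. The first SNP is placed inside the BRCA_39 interval from the question and the second is not.

```r
library("plyranges")

# Hypothetical toy data: one SNP inside a region, one outside.
toy_snps <- data.frame(
  chrom    = c("chr1", "chr1"),
  position = c(975500L, 2135809L),
  REF      = c("A", "C")
)
toy_regions <- data.frame(
  chrom      = "chr1",
  chromStart = 975451L,
  chromEnd   = 975952L,
  name       = "BRCA_39"
)

toy_snps <- as_granges(toy_snps, seqnames = chrom,
                       start = position, end = position)
toy_regions <- as_granges(toy_regions, seqnames = chrom,
                          start = chromStart, end = chromEnd)

# Left join: every SNP is kept; region columns are NA where nothing overlaps.
toy_overlaps <- join_overlap_left(toy_snps, toy_regions)

# Flag SNPs that fell in at least one region.
toy_overlaps$in_region <- !is.na(toy_overlaps$name)
toy_overlaps
```

The SNP at 975500 should pick up the name BRCA_39, while the one at 2135809 should have NA in the name column.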

You can then turn this back into a data.frame if needed.

overlaps <- as.data.frame(overlaps)
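
Since the question asks for matrix output, one possible way to get a SNP-by-region TRUE/FALSE matrix is sketched below in base R, working on plain data frames rather than GRanges. The toy_snps and toy_regions data are made up for illustration, and the code assumes 1-based coordinates with both ends of each region included.

```r
# Hypothetical toy data frames with the same column names as in the question.
toy_snps <- data.frame(
  chrom    = c("chr1", "chr1", "chr2"),
  position = c(975500L, 2135809L, 975500L)
)
toy_regions <- data.frame(
  chrom      = c("chr1", "chr1"),
  chromStart = c(975451L, 1014228L),
  chromEnd   = c(975952L, 1014729L),
  name       = c("BRCA_39", "BRCA_55")
)

# TRUE where the SNP is on the same chromosome as the region and
# chromStart <= position <= chromEnd (closed interval, an assumption).
overlap_matrix <-
  outer(toy_snps$chrom,    toy_regions$chrom,      "==") &
  outer(toy_snps$position, toy_regions$chromStart, ">=") &
  outer(toy_snps$position, toy_regions$chromEnd,   "<=")

# Rows are SNPs (chrom:position), columns are region names.
dimnames(overlap_matrix) <- list(
  paste(toy_snps$chrom, toy_snps$position, sep = ":"),
  toy_regions$name
)

# One flag per SNP: does it fall in any region?
in_any_region <- rowSums(overlap_matrix) > 0

overlap_matrix
in_any_region
```

This compares every SNP with every region, so it only suits modest input sizes. For large data, an interval-based search on the GRanges objects would likely scale better.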