Question

Overlap SNPs with regions in R

4.6 years ago

pt.taklifi ▴ 70

Hello everyone, I have a list of SNPs in R and a list of regions.

p1.snp <- structure(list(chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr1"), position = c(2135809L, 11130112L, 11253473L, 11258963L, 
15847782L, 16611163L), REF = c("C", "G", "G", "C", "C", "A"), 
    ALT = c("A", "T", "T", "A", "G", "G"), NA. = c(18L, 46L, 
    21L, 32L, 17L, 9L), NA..1 = c(0L, 0L, 0L, 0L, 0L, 0L), NA..2 = c("0%", 
    "0%", "0%", "0%", "0%", "0%"), NA..3 = c("C", "G", "G", "C", 
    "C", "A"), NA..4 = c(11L, 31L, 8L, 43L, 9L, 0L), NA..5 = c(4L, 
    9L, 4L, 12L, 6L, 14L), NA..6 = c("26.67%", "22.5%", "33.33%", 
    "21.82%", "40%", "100%"), NA..7 = c("M", "K", "K", "M", "S", 
    "G"), NA..8 = c("Somatic", "Somatic", "Somatic", "Somatic", 
    "Somatic", "Somatic"), NA..9 = c(1L, 1L, 1L, 1L, 1L, 1L), 
    NA..10 = c(0.03335777, 0.0005946179, 0.01209677, 0.002473575, 
    0.005523112, 1.223706e-06), NA..11 = c(0L, 6L, 8L, 43L, 9L, 
    0L), NA..12 = c(11L, 25L, 0L, 0L, 0L, 0L), NA..13 = c(0L, 
    0L, 4L, 12L, 6L, 14L), NA..14 = c(4L, 9L, 0L, 0L, 0L, 0L), 
    NA..15 = c(0L, 19L, 21L, 24L, 17L, 0L), NA..16 = c(18L, 27L, 
    0L, 8L, 0L, 9L), NA..17 = c(0L, 0L, 0L, 0L, 0L, 0L), NA..18 = c(0L, 
    0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-6L))

and

regions <- structure(list(chrom = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr1"), chromStart = c(975451L, 1014228L, 1290080L, 1291099L, 
1291742L, 1327977L), chromEnd = c(975952L, 1014729L, 1290581L, 
1291600L, 1292243L, 1328478L), name = c("BRCA_39", "BRCA_55", 
"BRCA_123", "BRCA_124", "BRCA_125", "BRCA_143"), NA. = c(1.878426, 
4.074697, 2.443588, 3.180199, 8.26783, 1.082465), NA..1 = c("3'UTR", 
"3'UTR", "3'UTR", "3'UTR", "3'UTR", "3'UTR"), NA..2 = c(0.6187625, 
0.6287425, 0.6786427, 0.7025948, 0.6407186, 0.6766467), NA..3 = c(0.3812375, 
0.3712575, 0.3213573, 0.2974052, 0.3592814, 0.3233533)), row.names = c(NA, 
-6L), class = "data.frame")

I know that I could use command-line tools such as bedtools intersect for this purpose; however, I'm looking for a function in R that does the same. I want the output in a matrix format reporting whether each variant falls within any of the ranges in regions. I would appreciate your help and suggestions.

snp overlap • 993 views

updated 4.6 years ago by rpolicastro 13k • written 4.6 years ago by pt.taklifi ▴ 70

score 3 · Accepted Answer · 2020-12-12

You first need to convert both data frames to GRanges objects.

library("plyranges")

p1.snp <- as_granges(p1.snp, seqnames=chrom, start=position, end=position)
regions <- as_granges(regions, seqnames=chrom, start=chromStart, end=chromEnd)

You can then find the overlaps. For this I will use the handy function join_overlap_left from plyranges. It returns a GRanges object of the SNPs, with the columns of any overlapping region added to each row. SNPs without an overlap are kept, with NA in the region columns.

overlaps <- join_overlap_left(p1.snp, regions)

There are no overlaps in your example data.
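Since the example data has no overlaps, a small toy case may show the output shape more clearly. The sketch below is only an illustration: toy_snps and toy_regions are made-up coordinates, not the data from the question. It uses overlapsAny and findOverlaps from GenomicRanges (the package plyranges builds on) to get one TRUE/FALSE per SNP and a SNP-by-region logical matrix.

```r
library(GenomicRanges)

# Toy data (hypothetical): only the second SNP lies inside a region.
toy_snps <- GRanges("chr1", IRanges(start = c(100L, 250L, 900L), width = 1L))
toy_regions <- GRanges("chr1", IRanges(start = c(200L, 500L), end = c(300L, 600L)))

# One logical value per SNP: does it fall in any region?
in_any_region <- overlapsAny(toy_snps, toy_regions)

# SNP-by-region logical matrix, filled in from the (SNP, region) hit pairs.
hits <- findOverlaps(toy_snps, toy_regions)
overlap_matrix <- matrix(FALSE, nrow = length(toy_snps), ncol = length(toy_regions))
overlap_matrix[cbind(queryHits(hits), subjectHits(hits))] <- TRUE
```

With real data the same calls should work on the GRanges objects created above. Note that the full matrix has one cell per SNP/region pair, so it can get large.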

You can then turn this back into a data.frame if needed.

overlaps <- as.data.frame(overlaps)
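If you would rather stay with plain data frames, a base R version of the matrix is also possible. This is a minimal sketch under some assumptions: the toy data frames snps and regs are invented for illustration (they only reuse the column names from the question), both start and end are treated as inclusive 1-based coordinates, and the pairwise comparison with outer() only scales to moderately sized inputs.

```r
# Toy data frames (hypothetical values, same column names as in the question).
snps <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                   position = c(150L, 975500L, 150L))
regs <- data.frame(chrom = c("chr1", "chr1"),
                   chromStart = c(100L, 975451L),
                   chromEnd = c(200L, 975952L),
                   name = c("toy_A", "toy_B"))

# Compare every SNP against every region (rows = SNPs, columns = regions).
same_chrom  <- outer(snps$chrom, regs$chrom, "==")
after_start <- outer(snps$position, regs$chromStart, ">=")
before_end  <- outer(snps$position, regs$chromEnd, "<=")
overlap_matrix <- same_chrom & after_start & before_end
dimnames(overlap_matrix) <- list(paste(snps$chrom, snps$position, sep = ":"),
                                 regs$name)

# Collapse to a single flag per SNP: inside at least one region?
snps$in_region <- unname(rowSums(overlap_matrix) > 0)
```

For genome-scale inputs the GRanges approach above is likely the better choice, since it does not materialise every SNP/region pair.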