If you are fine with just joining any overlapping ranges despite having an exact match to the start or stop, data.table would be the best solution. If not go to the tidyverse solution at the end.
I'll assume you're starting from two data.frames we will call DT1 and DT2 respectively.
DT1 <- structure(list(Start = c(1L, 8L, 21L), Stop = c(6L, 15L, 27L),
Gene_Name = c("A", "B", "C")), class = "data.frame", row.names = c(NA,
-3L))
DT2 <- structure(list(Start = c(1L, 8L, 21L), Stop = c(6L, 14L, 25L),
Gene_Name = c("Gene1", "Gene2", "Gene3")), class = "data.frame", row.names = c(NA,
-3L))
Using foverlaps
from data.table to join based on any overlaps.
library("data.table")
setDT(DT1, key=c("Start", "Stop"))
setDT(DT2, key=c("Start", "Stop"))
overlaps <- foverlaps(DT2, DT1)
> overlaps
Start Stop Gene_Name i.Start i.Stop i.Gene_Name
1: 1 6 A 1 6 Gene1
2: 8 15 B 8 14 Gene2
3: 21 27 C 21 25 Gene3
If you really need to join by either a matching start or stop the easy/ugly way would be using tidyverse. Someone might come around with a quicker and more clever solution. If they do listen to them.
library("tidyverse")
overlaps <- DT1 %>%
rowwise %>%
mutate(matches=list(DT2[DT2$Start == Start | DT2$Stop == Stop, ])) %>%
unnest(matches, names_repair="unique")
> overlaps
# A tibble: 3 x 6
Start...1 Stop...2 Gene_Name...3 Start...4 Stop...5 Gene_Name...6
<int> <int> <chr> <int> <int> <chr>
1 1 6 A 1 6 Gene1
2 8 15 B 8 14 Gene2
3 21 27 C 21 25 Gene3
Is there always a matching start or stop, or can you have overlaps of ranges without having the same start and/or stop?
There should always be a matching start or stop