Which R function can intersect genomic coordinates from one file with those from another and check whether they fall within a specific range?
3 months ago
bioinfo89 ▴ 40

Hi All,

I am new to R and would appreciate suggestions for the following script I am working on.

The script checks whether the genomic coordinates in BEDPE file1 (which also contains other information) fall within a specific range of the genomic coordinates in another BEDPE file, file2. I want to merge the two data frames into one wherever the file1 coordinates are within 150 bp upstream or downstream of the file2 coordinates I am comparing against. Is there an R function that can do this?

The output should be the genomic coordinates from the first file that are within range of those in the second file.

So far I have tried merge() with by = "colname", but that only returns rows where the coordinates match exactly; it does not allow matching within a range.

I forgot to mention, I am using BEDPE files.

Any help would be appreciated. Thanks!

coordinates R Genomic • 727 views

Can you tell us why an R-based solution specifically is required? Otherwise, you can use bedtools pairtopair.


I am running the analysis in an R script, so it is more convenient to find an equivalent R function than to run bedtools separately and then read its output file back into the R script for further processing. I have also used bedtoolsr, which can be called from inside R. Just sharing :)! Thanks!

3 months ago
seidel 8.3k

For R the GenomicRanges package is made for this (as mentioned by @jared.andrews07).

library(GenomicRanges)
library(rtracklayer)

f1 <- import("file1.bed")
f2 <- import("file2.bed")

# subset f1 features that overlap with f2
result <- f1[countOverlaps(f1, f2, maxgap=150) > 0]

# save as bed
export(result, "f1_f2_overlap.bed")

# or as a dataframe
df <- as.data.frame(result)
write.table(df, file="f1_f2_overlap.txt", sep="\t", col.names=NA)


If file1 and file2 are not BED files, you can import them as data frames and then create GRanges objects from them with makeGRangesFromDataFrame(). GenomicRanges has many different overlap functions (for example findOverlaps(), countOverlaps() and subsetByOverlaps()), and you can choose whether or not to take strand into account. See the help pages and the vignette.
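
As a rough sketch of the data-frame route for BEDPE input: the toy tables and the column names (chrom1, start1, end1, name) below are assumptions for illustration, since BEDPE files have no header, and only the first end of each pair is compared. The second end (chrom2, start2, end2) would need the same treatment if both ends must match, as bedtools pairtopair does.

```r
library(GenomicRanges)

# Hypothetical stand-ins for the two BEDPE files (first end of each pair only).
# With real files you would read them with read.table() and name the columns yourself.
file1_df <- data.frame(
  chrom1 = c("chr1", "chr1", "chr2"),
  start1 = c(1000, 5000, 200),
  end1   = c(1100, 5100, 300),
  name   = c("a", "b", "c")
)
file2_df <- data.frame(
  chrom1 = c("chr1", "chr2"),
  start1 = c(1200, 9000),
  end1   = c(1300, 9100)
)

# Convert a data frame to GRanges; BEDPE starts are 0-based, GRanges are 1-based.
to_granges <- function(df) {
  makeGRangesFromDataFrame(
    df,
    seqnames.field = "chrom1",
    start.field = "start1",
    end.field = "end1",
    keep.extra.columns = TRUE,
    starts.in.df.are.0based = TRUE
  )
}

gr1 <- to_granges(file1_df)
gr2 <- to_granges(file2_df)

# Pairs of (file1 row, file2 row) separated by at most 150 bp, ignoring strand.
hits <- findOverlaps(gr1, gr2, maxgap = 150, ignore.strand = TRUE)

# file1 rows that have at least one file2 partner within range.
kept <- file1_df[unique(queryHits(hits)), ]

# One merged row per matching pair; file2 columns get an "f2_" prefix.
f2_part <- file2_df[subjectHits(hits), ]
names(f2_part) <- paste0("f2_", names(f2_part))
merged <- cbind(file1_df[queryHits(hits), ], f2_part)
```

Using findOverlaps() rather than countOverlaps() keeps the row indices of both tables, which is what makes the merged data frame possible; if only the file1 subset is needed, the countOverlaps() one-liner above is enough.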

3 months ago

This seems like a job for bedtools, for which there are multiple R wrappers as well, such as bedr. The GenomicRanges package could also handle this.