Subset genomic intervals by other intervals
1
0
Entering edit mode
5 months ago

Hello,

I have a BED-like data frame in R and want to subset it by another BED-like file, e.g.:

Bed file 1:

Chr  Start  End
1    1      10

Bed file 2:

Chr  Start  End
1    5      13

Final bed file:

Chr  Start  End
1    5      10

So the final bed file should contain only the regions that are present in both files.

Any suggestions on how to do this in R?

Best,

Andreas

5 months ago
seidel 8.3k

Use the GenomicRanges package from Bioconductor. Interval operations like this are exactly what it is designed for:

library(GenomicRanges)

a <- data.frame(Chr=1, Start=1, End=10)
b <- data.frame(Chr=1, Start=5, End=13)

# convert to GRanges objects
a <- makeGRangesFromDataFrame(a)
b <- makeGRangesFromDataFrame(b)

# get the common overlapping bits
theIntersection <- intersect(a,b)

# export to a BED file so you can check the result in a genome browser
library(rtracklayer)
export(theIntersection, "intersected_features.bed")

FYI, the result:

> intersect(a,b)
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        1      5-10      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
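
If installing Bioconductor is not an option, the same idea can be sketched in base R. This is only a rough illustration, not part of GenomicRanges: the function name intersect_intervals is made up here, and it assumes columns named Chr, Start and End with closed (inclusive) coordinates, as in the question. Unlike intersect() on GRanges, it returns one row per overlapping pair and does not merge results that overlap each other.

```r
# Pairwise intersection of two BED-like data frames in base R.
# Hypothetical helper; assumes columns Chr, Start, End and inclusive coordinates.
intersect_intervals <- function(x, y) {
  # Pair every interval in x with every interval in y on the same chromosome
  pairs <- merge(x, y, by = "Chr", suffixes = c(".x", ".y"))

  # The overlap of two intervals runs from the larger start to the smaller end
  start <- pmax(pairs$Start.x, pairs$Start.y)
  end   <- pmin(pairs$End.x, pairs$End.y)

  # Keep only the pairs that actually overlap
  keep <- start <= end
  out <- data.frame(Chr = pairs$Chr[keep], Start = start[keep], End = end[keep])

  # Sort for readability and reset the row names
  out <- out[order(out$Chr, out$Start, out$End), ]
  rownames(out) <- NULL
  out
}

bed1 <- data.frame(Chr = 1, Start = 1, End = 10)
bed2 <- data.frame(Chr = 1, Start = 5, End = 13)

result <- intersect_intervals(bed1, bed2)
print(result)
```

Because merge() builds every same-chromosome pair before filtering, this is probably only practical for small inputs. For genome-scale data the GenomicRanges approach above is the better fit.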