Question: Common genomic intervals in R
0
viniciushs8850 wrote:

I would like to infer shared genomic interval between different samples.

My input:

``````sample    chr start end
NE001      1   100  200
NE001      2   100  200
NE002      1   50   150
NE002      2   50   150
NE003      2   250  300``````

My expected output:

``````chr start end  freq
1    100  150   2
2    100  150   2``````

Where the "freq" is the how many samples have contribuited to infer the shared region. In the above example freq = 2 (NE001 and NE002).

Cheers!

overlap R genome • 2.6k views
written 6.3 years ago by viniciushs8850

What have you tried so far?

2

Actually, I know how to measure the shared numbers between two regions.

input:

``````start1  end1    start2  end2    bp_overlapped
20    30       25      35          6
25    35       20      30          6
100    190     126     226          65
126    226     100     190          65``````

Function:

`f <- function(x) length( intersect(seq(x,x,1), seq(x,x,1)) )`

`a <- apply(X,1,f)`

And I known how to overlap regions and a "cbind" with overlapped ones too. But when all regions are in a same dataframe, and have a collumn tagging different animals... sincerely I don´t know how can I start the logic...

Cool, we try to ensure that posters have given it a go before providing help. Hopefully my answer (below) will work well for you.

2
Devon Ryan96k wrote:

This is a bit easier to do if you use the `GenomicRanges` package from bioconductor:

``````library(GenomicRanges)
gr <- GRanges(seqname=c(1,2,1,2,2), range=IRanges(start=c(100,100,50,50,250),
end=c(200,200,150,150,300))) #N.B., You wouldn't do this manually
bins <- disjoin(gr)
mcols(bins)\$count <- countOverlaps(bins, gr)
``````

The `disjoin()` command splits the ranges into no-overlapping segments.

Edit: I left off a `)` in an earlier version. Mea culpa!

1
Alex Reynolds30k wrote:

If you can use system() within R, or if you don't need to use R, here's a way to do it with BEDOPS (i.e., you'd wrap these calls in `system()` or use the command-line):

``````\$ sort-bed inputs.unsorted.bed > inputs.bed
\$ bedops --intersect inputs.bed | bedmap --echo --count - inputs.bed > answer.bed
``````