Question: Common genomic intervals in R
0
gravatar for viniciushs88
4.9 years ago by
viniciushs8850
Germany
viniciushs8850 wrote:

I would like to infer shared genomic interval between different samples.

My input:

sample    chr start end
NE001      1   100  200
NE001      2   100  200
NE002      1   50   150
NE002      2   50   150
NE003      2   250  300

My expected output:

chr start end  freq
1    100  150   2
2    100  150   2

Where the "freq" is the how many samples have contribuited to infer the shared region. In the above example freq = 2 (NE001 and NE002).

Cheers!

overlap R genome • 1.8k views
ADD COMMENTlink written 4.9 years ago by viniciushs8850

What have you tried so far?

ADD REPLYlink written 4.9 years ago by Devon Ryan88k
2

Actually, I know how to measure the shared numbers between two regions.

input:

start1  end1    start2  end2    bp_overlapped   
   20    30       25      35          6        
   25    35       20      30          6
  100    190     126     226          65
  126    226     100     190          65

Function:

f <- function(x) length( intersect(seq(x[1],x[2],1), seq(x[3],x[4],1)) )

a <- apply(X,1,f)

And I known how to overlap regions and a "cbind" with overlapped ones too. But when all regions are in a same dataframe, and have a collumn tagging different animals... sincerely I don´t know how can I start the logic...

ADD REPLYlink modified 4.9 years ago • written 4.9 years ago by viniciushs8850

Cool, we try to ensure that posters have given it a go before providing help. Hopefully my answer (below) will work well for you.

ADD REPLYlink written 4.9 years ago by Devon Ryan88k
2
gravatar for Devon Ryan
4.9 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

This is a bit easier to do if you use the GenomicRanges package from bioconductor:

library(GenomicRanges)
gr <- GRanges(seqname=c(1,2,1,2,2), range=IRanges(start=c(100,100,50,50,250),
    end=c(200,200,150,150,300))) #N.B., You wouldn't do this manually
bins <- disjoin(gr)
mcols(bins)$count <- countOverlaps(bins, gr)

 

The disjoin() command splits the ranges into no-overlapping segments.

Edit: I left off a ")" in an earlier version. Mea culpa!

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by Devon Ryan88k
1
gravatar for Alex Reynolds
4.9 years ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

If you can use system() within R, or if you don't need to use R, here's a way to do it with BEDOPS (i.e., you'd wrap these calls in system() or use the command-line):

$ sort-bed inputs.unsorted.bed > inputs.bed
$ bedops --intersect inputs.bed | bedmap --echo --count - inputs.bed > answer.bed
ADD COMMENTlink written 4.9 years ago by Alex Reynolds27k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1258 users visited in the last hour