Find Overlapping Numbers Between Integer Intervals (R)
10.4 years ago
viniciushs88 ▴ 50

I would like to count the integers shared by two integer intervals. My input looks like this:

start1  end1  start2  end2
20      30    25      35
25      35    20      30
100     190   126     226
126     226   100     190


In the first and second lines, the overlap between the first interval (the first two columns) and the second interval (the last two columns) is 6 numbers (25, 26, 27, 28, 29 and 30).
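One way to see where these counts come from (a sketch, assuming both endpoints are inclusive and each start is no larger than its end): for intervals [s1, e1] and [s2, e2], the number of shared integers has a closed form, shown here with the rows of the example plugged in.

```latex
\text{bp\_overlapped} = \max\bigl(0,\ \min(e_1, e_2) - \max(s_1, s_2) + 1\bigr)

\text{rows 1, 2: } \min(30, 35) - \max(20, 25) + 1 = 30 - 25 + 1 = 6

\text{rows 3, 4: } \min(190, 226) - \max(100, 126) + 1 = 190 - 126 + 1 = 65
```

The outer max(0, ...) makes disjoint intervals count as 0 instead of a negative number.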

My expected output is:

start1  end1  start2  end2  bp_overlapped
20      30    25      35    6
25      35    20      30    6
100     190   126     226   65
126     226   100     190   65


The data are stored as a matrix in R.
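If the matrix is large, a vectorised alternative (a sketch, not taken from the thread) avoids building integer sequences row by row. It assumes inclusive endpoints and start <= end in every interval, and uses base R `pmin()`/`pmax()` to compute max(0, min(end1, end2) - max(start1, start2) + 1) for all rows at once. The column names are added here for readability.

```r
# Vectorised overlap count for inclusive integer intervals (sketch).
m <- matrix(c(20, 25, 100, 126,
              30, 35, 190, 226,
              25, 20, 126, 100,
              35, 30, 226, 190),
            ncol = 4,
            dimnames = list(NULL, c("start1", "end1", "start2", "end2")))

# pmin()/pmax() work element-wise across rows, so no apply() loop is needed;
# pmax(0, ...) turns negative values (disjoint intervals) into 0.
bp_overlapped <- pmax(0, pmin(m[, "end1"], m[, "end2"]) -
                         pmax(m[, "start1"], m[, "start2"]) + 1)

result <- cbind(m, bp_overlapped)
print(result)
```

For the example input this yields 6, 6, 65 and 65 in the `bp_overlapped` column, matching the expected output.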

Thank you


Please indicate relevance of question to a specific bioinformatics research problem.

10.4 years ago

This has only the most tenuous connection to bioinformatics if I make a number of assumptions about why you're trying to do this. You should really post this on an R forum. Having said that:

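# the example input, filled column by column: start1, end1, start2, end2
m <- matrix(c(20,25,100,126,30,35,190,226,25,20,126,100,35,30,226,190), ncol=4)
# for each row, count the integers common to start1:end1 and start2:end2
overlap <- apply(m, 1, function(x) length(intersect(x[1]:x[2], x[3]:x[4])))
cbind(m, overlap)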
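As a quick sanity check (a sketch, not part of the original answer), the `intersect()` approach can be compared against the closed-form count max(0, min(end1, end2) - max(start1, start2) + 1) on randomly generated intervals, many of which do not overlap. The seed, the number of rows and the interval sizes are arbitrary choices.

```r
# Compare the intersect() approach with the closed-form formula (sketch).
set.seed(1)
n <- 200
s1 <- sample(1:100, n, replace = TRUE); e1 <- s1 + sample(0:30, n, replace = TRUE)
s2 <- sample(1:100, n, replace = TRUE); e2 <- s2 + sample(0:30, n, replace = TRUE)

# Enumerate both integer sequences and intersect them, as in the answer above
by_intersect <- mapply(function(a1, b1, a2, b2) length(intersect(a1:b1, a2:b2)),
                       s1, e1, s2, e2)

# Closed form: no sequences are built, so the cost per row is constant
by_formula <- pmax(0, pmin(e1, e2) - pmax(s1, s2) + 1)

stopifnot(all(by_intersect == by_formula))
cat("Rows checked:", n, "; disjoint pairs:", sum(by_formula == 0), "\n")
```

The `intersect()` version is easier to read, but its cost grows with interval length, which matters for long genomic ranges.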

10.4 years ago
zx8754 11k

This should work:

# dummy data
df <- read.table(text = "start1  end1  start2  end2
20     30      25      35
25     35      20      30
100    190     126     226
126    226     100     190", header = TRUE)

# Count overlap: for each row, intersect the two integer sequences
df$bp_overlapped <- sapply(1:nrow(df), function(x) {
  length(intersect(df[x, 1]:df[x, 2], df[x, 3]:df[x, 4]))
})

9.4 years ago

You can use the findOverlaps command (IRanges package) in R. The script is as follows:

data2 <- read.table("C:/file_name.txt", sep = "\t", fill = TRUE)
data2 <- data2[data2[, 1] == "Chromosome_name", ]
end <- 0
start <- data2[, 2]
for (i in 1:length(data2[, 1])) {
  # last non-missing column of the row holds the end coordinate
  x <- length(data2[i, ]) - sum(is.na(data2[i, ]))
  end[i] <- data2[i, x]
}
chr <- data2[, 1]
genes <- data.frame(chr, start, end)
library(IRanges)
query <- IRanges(start, end)
# header = TRUE so that the columns can be accessed as result$start1 / result$end1
result <- read.table("C:/GC/chromosome_name.txt/result.txt", header = TRUE)
subject <- IRanges(result$start1, result$end1)
# IntervalTree() is no longer available in recent IRanges releases;
# findOverlaps() accepts IRanges objects directly
findOverlaps(query, subject, select = "all")
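To illustrate what `findOverlaps(query, subject, select = "all")` reports, here is a base-R sketch (not the IRanges implementation) that lists every (query, subject) index pair whose inclusive ranges share at least one integer. `find_overlap_pairs` is a hypothetical helper; it builds a full query-by-subject comparison matrix, so it is only suitable for small inputs, whereas IRanges is designed to scale to large ones.

```r
# Hypothetical helper: all overlapping (query, subject) pairs, inclusive ends.
find_overlap_pairs <- function(q_start, q_end, s_start, s_end) {
  # Two inclusive ranges overlap when each starts no later than the other ends
  hits <- outer(q_start, s_end, "<=") & outer(q_end, s_start, ">=")
  idx <- which(hits, arr.ind = TRUE)
  pairs <- data.frame(queryHits = idx[, "row"], subjectHits = idx[, "col"])
  # Sort by query index, then subject index, for readability
  pairs[order(pairs$queryHits, pairs$subjectHits), , drop = FALSE]
}

# Toy example reusing intervals from the question, plus one disjoint subject
q <- find_overlap_pairs(q_start = c(20, 100), q_end = c(30, 190),
                        s_start = c(25, 126, 300), s_end = c(35, 226, 400))
print(q)
```

Here query 1 (20-30) overlaps only subject 1 (25-35) and query 2 (100-190) overlaps only subject 2 (126-226); subject 3 (300-400) matches nothing.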