Hi!
I have two datasets. Before contains 5 columns (chromosome, start, end, line number, score):
```
chrI 1 10 1 0
chrI 11 20 2 0
chrI 21 30 3 0
chrI 31 40 4 0
chrI 41 50 5 0
chrI 51 60 6 0
chrI 61 70 7 0
chrI 71 80 8 0
chrI 81 90 9 0
chrI 91 100 10 0
```
Peaks contains 4 columns (chromosome, start, end, value):
```
"chrI" 880 1091 383
"chrI" 1350 1601 302
"chrI" 1680 1921 241
"chrI" 2220 2561 322
"chrI" 2750 2761 18
"chrI" 3100 3481 420
"chrI" 3660 4211 793
"chrI" 4480 4491 20
"chrI" 4710 4871 195
"chrI" 5010 5261 238
```
For each line of Peaks, I would like to extract the corresponding lines in Before (e.g. all the lines between 880 and 1091 for the first peak), find the highest score value, and write it to a new file.
To this end, I've written this function:
```r
summit <- function(x, y, output) {
  y <- Before
  chrom <- x[1]
  start <- x[2]
  end <- x[3]
  startLine <- y[which((y$V1 == chrom) & (y$V2 == start)), ]
  endLine <- y[which((y$V1 == chrom) & (y$V3 == end)), ]
  Subset <- y[which((y$V2 >= startLine$V2) & (y$V3 <= endLine$V2))]
  maximum <- Subset[which(Subset$V4 == max(Subset$V4))]
  output <- print(maximum)
}
apply(Peaks, 1, summit, output = 'peaks_list.bed')
```
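Reading the function above, a few things stand out. `apply()` turns the data frame into a character matrix, so `x[2]` and `x[3]` are strings rather than numbers. The peak boundaries (880, 1091, ...) never coincide with a bin start or end in Before if it continues in 10 bp bins as in the excerpt, so `startLine` and `endLine` come back empty. `y[which(...)]` is missing the comma that selects rows, the end test uses `endLine$V2` instead of `endLine$V3`, the score is in `V5` (not `V4`, the line number), and `output <- print(maximum)` only prints; it never writes to `peaks_list.bed`. Below is a minimal base-R sketch that keeps every bin overlapping the peak instead of looking for exact coordinates, and writes the highest-scoring bin per peak with `write.table()`. The name `summit_table` is hypothetical, and it assumes both tables were read with `read.table()` so their columns are named `V1`, `V2`, ...

```r
# Hypothetical corrected version of summit(): for each peak, keep the bins
# of `before` that overlap [start, end] on the same chromosome and write
# the highest-scoring bin (score in column V5) to `output`.
summit_table <- function(before, peaks, output) {
  # Compare chromosomes as character so factor columns also work.
  bchrom <- as.character(before$V1)
  pchrom <- as.character(peaks$V1)
  best <- lapply(seq_len(nrow(peaks)), function(i) {
    # A bin overlaps the peak if it starts no later than the peak ends
    # and ends no earlier than the peak starts.
    hit <- bchrom == pchrom[i] &
      before$V2 <= peaks$V3[i] &
      before$V3 >= peaks$V2[i]
    bins <- before[hit, , drop = FALSE]
    if (nrow(bins) == 0) return(NULL)         # no overlapping bin
    bins[which.max(bins$V5), , drop = FALSE]  # first bin with the top score
  })
  result <- do.call(rbind, best)               # NULL entries are dropped
  write.table(result, file = output, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(result)
}
```

With the objects from the question this would be called as `summit_table(Before, Peaks, "peaks_list.bed")`. If several bins tie for the highest score, `which.max()` keeps only the first one, whereas the original `which(... == max(...))` would have kept all of them.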
I don't get an error message, but it runs all night without giving me any results, so I guess something is wrong with my code, but I don't know what. Does anyone have an idea?
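Any approach that compares every row of Before for each peak gets slow if Before covers a whole genome at 10 bp resolution; that may be part of why the original ran all night, although the post doesn't say how large the files are. If Before is sorted by start within each chromosome and its bins do not overlap (true for the excerpt, assumed for the rest), base R's `findInterval()` can locate the first and last overlapping bin by binary search. A sketch, with `peak_max_scores` as a hypothetical name, that returns one maximum score per peak (`NA` when no bin overlaps):

```r
# Sketch assuming `before` is sorted by start (V2) within each chromosome
# and its bins do not overlap; findInterval() stops with an error if the
# starts are not sorted. Returns the max score (V5) per peak, NA if none.
peak_max_scores <- function(before, peaks) {
  bchrom <- as.character(before$V1)
  pchrom <- as.character(peaks$V1)
  result <- rep(NA_real_, nrow(peaks))
  for (chr in unique(pchrom)) {
    b <- before[bchrom == chr, , drop = FALSE]
    p <- which(pchrom == chr)
    if (nrow(b) == 0) next                 # chromosome absent from before
    # Index of the last bin starting at or before each position (0 = none).
    first <- findInterval(peaks$V2[p], b$V2)
    last  <- findInterval(peaks$V3[p], b$V2)
    # If that bin ends before the peak starts (a gap, or the peak starts
    # before the first bin), the first overlapping bin is the next one.
    first <- first + (first == 0 | b$V3[pmax(first, 1)] < peaks$V2[p])
    result[p] <- mapply(function(f, l) {
      if (f > l) NA_real_ else max(b$V5[f:l])
    }, first, last)
  }
  result
}
```

The result can then be attached to the peaks and written out, e.g. `write.table(cbind(Peaks, max_score = peak_max_scores(Before, Peaks)), "peaks_list.bed", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)`.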
Thank you, Maude
You may want to look at the GenomicRanges package, which simplifies this kind of job considerably when working with genomic intervals.
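A minimal GenomicRanges sketch of the same task, assuming the column layout from the question and that both tables use the same 1-based, closed coordinates (files in real BED format are 0-based and half-open, so their starts would need +1). `findOverlaps()` pairs each peak with the bins it overlaps, and `tapply()` takes the maximum score per peak; `peak_max_gr` is a hypothetical name.

```r
library(GenomicRanges)

# Sketch: maximum Before score (V5) over the bins overlapping each peak;
# peaks with no overlapping bin get NA.
peak_max_gr <- function(before, peaks) {
  bins_gr  <- GRanges(seqnames = before$V1,
                      ranges = IRanges(start = before$V2, end = before$V3))
  peaks_gr <- GRanges(seqnames = peaks$V1,
                      ranges = IRanges(start = peaks$V2, end = peaks$V3))
  # Each hit links one peak (query) to one overlapping bin (subject).
  hits <- findOverlaps(peaks_gr, bins_gr)
  scores <- before$V5[subjectHits(hits)]
  # Group scores by peak index, keeping peaks that have no hits.
  groups <- factor(queryHits(hits), levels = seq_along(peaks_gr))
  as.numeric(tapply(scores, groups, max))
}
```

Because the overlap search is done once for all peaks, this avoids scanning Before separately for every peak.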