Question: Find the highest value in an interval defined in an other file
0
gravatar for maude
6 months ago by
maude0
Switzerland
maude0 wrote:

Hi!

I have these two dataset: Before that contains 5 columns (chromsome, start, end, line number, score)

chrI           1          10    1      0
chrI          11          20    2      0
chrI          21          30    3      0
chrI          31          40    4      0
chrI          41          50    5      0
chrI          51          60    6      0
chrI          61          70    7      0
chrI          71          80    8      0
chrI          81          90    9      0
chrI          91         100    10     0

Peaks that contains 4 columns (chromsome, start, end, value)

"chrI"  880     1091    383
"chrI"  1350    1601    302
"chrI"  1680    1921    241
"chrI"  2220    2561    322
"chrI"  2750    2761    18
"chrI"  3100    3481    420
"chrI"  3660    4211    793
"chrI"  4480    4491    20
"chrI"  4710    4871    195
"chrI"  5010    5261    238

For each lines of Peaks I would like to extract the corresponding lines (e.g all the lines between 880 and 1091 for the first line) in Before, find the highest score value and write it on a new file.

To this end, I've written this function:

summit <- function(x,y,output){
    y<- Before
    chrom <- x[1]
    start <-x[2]
    end <-x[3]
    startLine <- y[which((y$V1 == chrom) & (y$V2==start)),]
    endLine <- y[which((y$V1 == chrom) & (y$V3==end)),]
    Subset <- y[which((y$V2 >= startLine$V2) & (y$V3 <= endLine$V2))]
    maximum <- Subset[which(Subset$V4 == max(Subset$V4))]
    output <- print(maximum)
}

apply(Peaks,1,summit,output = 'peaks_list.bed')

I don't have an error message but It runs during the entire night without giving me results so I guess something is wrong with my code but I don't know what. Does anyone have any idea?

Thank you, Maude

peaks R • 207 views
ADD COMMENTlink modified 5 months ago by Rashedul Islam160 • written 6 months ago by maude0
1

You may like to check GenomicRanges package, which simplifies the job considerably while working with genomic intervals.

ADD REPLYlink modified 6 months ago • written 6 months ago by Santosh Anand3.0k
0
gravatar for Alex Reynolds
5 months ago by
Alex Reynolds21k
Seattle, WA USA
Alex Reynolds21k wrote:

If you're not tied to R, or you can use system, you could use BEDOPS bedmap --max-element or bedmap --max, depending on whether you want the maximum-scoring element or just the score, respectively:

$ bedmap --echo --max-element --delim '\t' Before.bed Peaks.bed > answer.bed

You could use write.table to export a BED file from R. Use BEDOPS sort-bed to prepare the BED files before using them with BEDOPS tools. And make sure the chromosome names line up by removing any quotation marks from the names.

ADD COMMENTlink modified 5 months ago • written 5 months ago by Alex Reynolds21k
0
gravatar for Rashedul Islam
5 months ago by
Canada
Rashedul Islam160 wrote:

For this purpose you can use bedtools groupby and -o max option to get the maximum value.

ADD COMMENTlink written 5 months ago by Rashedul Islam160
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 861 users visited in the last hour