Question: Find the highest value in an interval defined in an other file
0
gravatar for maude
2.6 years ago by
maude0
Switzerland
maude0 wrote:

Hi!

I have these two dataset: Before that contains 5 columns (chromsome, start, end, line number, score)

chrI           1          10    1      0
chrI          11          20    2      0
chrI          21          30    3      0
chrI          31          40    4      0
chrI          41          50    5      0
chrI          51          60    6      0
chrI          61          70    7      0
chrI          71          80    8      0
chrI          81          90    9      0
chrI          91         100    10     0

Peaks that contains 4 columns (chromsome, start, end, value)

"chrI"  880     1091    383
"chrI"  1350    1601    302
"chrI"  1680    1921    241
"chrI"  2220    2561    322
"chrI"  2750    2761    18
"chrI"  3100    3481    420
"chrI"  3660    4211    793
"chrI"  4480    4491    20
"chrI"  4710    4871    195
"chrI"  5010    5261    238

For each lines of Peaks I would like to extract the corresponding lines (e.g all the lines between 880 and 1091 for the first line) in Before, find the highest score value and write it on a new file.

To this end, I've written this function:

summit <- function(x,y,output){
    y<- Before
    chrom <- x[1]
    start <-x[2]
    end <-x[3]
    startLine <- y[which((y$V1 == chrom) & (y$V2==start)),]
    endLine <- y[which((y$V1 == chrom) & (y$V3==end)),]
    Subset <- y[which((y$V2 >= startLine$V2) & (y$V3 <= endLine$V2))]
    maximum <- Subset[which(Subset$V4 == max(Subset$V4))]
    output <- print(maximum)
}

apply(Peaks,1,summit,output = 'peaks_list.bed')

I don't have an error message but It runs during the entire night without giving me results so I guess something is wrong with my code but I don't know what. Does anyone have any idea?

Thank you, Maude

peaks R • 852 views
ADD COMMENTlink modified 2.5 years ago by Rashedul Islam340 • written 2.6 years ago by maude0
1

You may like to check GenomicRanges package, which simplifies the job considerably while working with genomic intervals.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Santosh Anand5.0k
0
gravatar for Alex Reynolds
2.5 years ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

If you're not tied to R, or you can use system, you could use BEDOPS bedmap --max-element or bedmap --max, depending on whether you want the maximum-scoring element or just the score, respectively:

$ bedmap --echo --max-element --delim '\t' Before.bed Peaks.bed > answer.bed

You could use write.table to export a BED file from R. Use BEDOPS sort-bed to prepare the BED files before using them with BEDOPS tools. And make sure the chromosome names line up by removing any quotation marks from the names.

ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by Alex Reynolds29k
0
gravatar for Rashedul Islam
2.5 years ago by
Canada
Rashedul Islam340 wrote:

For this purpose you can use bedtools groupby and -o max option to get the maximum value.

ADD COMMENTlink written 2.5 years ago by Rashedul Islam340
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1791 users visited in the last hour