Question: Find the highest value in an interval defined in an other file
0
gravatar for maude
12 months ago by
maude0
Switzerland
maude0 wrote:

Hi!

I have these two dataset: Before that contains 5 columns (chromsome, start, end, line number, score)

chrI           1          10    1      0
chrI          11          20    2      0
chrI          21          30    3      0
chrI          31          40    4      0
chrI          41          50    5      0
chrI          51          60    6      0
chrI          61          70    7      0
chrI          71          80    8      0
chrI          81          90    9      0
chrI          91         100    10     0

Peaks that contains 4 columns (chromsome, start, end, value)

"chrI"  880     1091    383
"chrI"  1350    1601    302
"chrI"  1680    1921    241
"chrI"  2220    2561    322
"chrI"  2750    2761    18
"chrI"  3100    3481    420
"chrI"  3660    4211    793
"chrI"  4480    4491    20
"chrI"  4710    4871    195
"chrI"  5010    5261    238

For each lines of Peaks I would like to extract the corresponding lines (e.g all the lines between 880 and 1091 for the first line) in Before, find the highest score value and write it on a new file.

To this end, I've written this function:

summit <- function(x,y,output){
    y<- Before
    chrom <- x[1]
    start <-x[2]
    end <-x[3]
    startLine <- y[which((y$V1 == chrom) & (y$V2==start)),]
    endLine <- y[which((y$V1 == chrom) & (y$V3==end)),]
    Subset <- y[which((y$V2 >= startLine$V2) & (y$V3 <= endLine$V2))]
    maximum <- Subset[which(Subset$V4 == max(Subset$V4))]
    output <- print(maximum)
}

apply(Peaks,1,summit,output = 'peaks_list.bed')

I don't have an error message but It runs during the entire night without giving me results so I guess something is wrong with my code but I don't know what. Does anyone have any idea?

Thank you, Maude

peaks R • 425 views
ADD COMMENTlink modified 11 months ago by Rashedul Islam220 • written 12 months ago by maude0
1

You may like to check GenomicRanges package, which simplifies the job considerably while working with genomic intervals.

ADD REPLYlink modified 12 months ago • written 12 months ago by Santosh Anand3.4k
0
gravatar for Alex Reynolds
11 months ago by
Alex Reynolds23k
Seattle, WA USA
Alex Reynolds23k wrote:

If you're not tied to R, or you can use system, you could use BEDOPS bedmap --max-element or bedmap --max, depending on whether you want the maximum-scoring element or just the score, respectively:

$ bedmap --echo --max-element --delim '\t' Before.bed Peaks.bed > answer.bed

You could use write.table to export a BED file from R. Use BEDOPS sort-bed to prepare the BED files before using them with BEDOPS tools. And make sure the chromosome names line up by removing any quotation marks from the names.

ADD COMMENTlink modified 11 months ago • written 11 months ago by Alex Reynolds23k
0
gravatar for Rashedul Islam
11 months ago by
Canada
Rashedul Islam220 wrote:

For this purpose you can use bedtools groupby and -o max option to get the maximum value.

ADD COMMENTlink written 11 months ago by Rashedul Islam220
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1587 users visited in the last hour