Sum up value within a interval
2
0
Entering edit mode
2.8 years ago
CWL • 0

Hi all,

I want to do a simple mathematics: I have to two files, a bed-format file that indicates the range of serval blocks within a chromosome, like:

chrom   chromStart  chromEnd
Chr1    715416  3431775
Chr1    3700258 5952874
Chr1    6081076 7205282
Chr1    7234954 8428511
Chr1    8997448 9100392
Chr1    9249530 9839305
Chr1    10677947    10733706
Chr1    10803957    1100682
Chr1    11061862    11205652
Chr1    15473705    16122685

And the other file is a Dxy a serial estimation based on sliding windows approach, for example:

CHR     Start     End     Dxy
Chr1    705001  710000      0
Chr1    710001  715000      0
Chr1    715001  720000      22.125
Chr1    720001  725000      19.625
Chr1    725001  730000      2.625
Chr1    730001  735000      14.625
Chr1    735001  740000      10.375
Chr1    740001  745000      21.75
Chr1    745001  750000      7.75

I wish to sum up the $4 value in the second file base on the block range in the first file.

As I have already filtered out the SNPs that were not located in the blocks from the pervious step, it is OK to just sum up the windows that overlaps with each blocks.

My first guess is to use awk, but I have trouble to start, please kindly give me some suggestion.

Best,

CWL

genome • 1.5k views
ADD COMMENT
1
Entering edit mode

Did you explore the bedtools option OR GenomicRanges?

ADD REPLY
2
Entering edit mode
2.8 years ago

You could use bedmap --sum to do this:

$ tail -n+2 scores.txt | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, ".", $4 }' | bedmap --echo --sum <(tail -n+2 regions.txt) - > answer.bed

The file answer.bed will contain the regions from regions.txt and the sum of values from scores.txt which overlap each region.

Note: If a region has no scores associated with it, then the returned value will be NAN to indicate an unmapped value. Adding the --skip-unmapped option to the bedmap command will remove these regions from the answer.bed output. Another option is to pipe the result to sed to replace NAN with another value that is acceptable, such as -1 or 0.

ADD COMMENT
0
Entering edit mode

Thanks for the detail, and it works!

ADD REPLY
2
Entering edit mode
2.8 years ago

That's exactly what bedtools map was designed for.

ADD COMMENT
0
Entering edit mode

Thanks for pointing out, it is more simple to use bedtools map!

ADD REPLY

Login before adding your answer.

Traffic: 1679 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6