I have a question about how to split a given range into intervals of at most 100 in length. I would like to do this in AWK, shell script, Perl, or Python. As a small example, I have a tab-delimited file that contains:
- Chromosome Number (1 to 23)
- Start range
- End range
- ID of the range
- Number indicating how many times the range has to be split
This is how it looks:
chr1    37850    38536    MACS_peak_1    7
chr1    820769    821857    MACS_peak_4    11
chr1    5018483    5019041    MACS_peak_9    6
After taking each range and splitting it into intervals of 100, the output should look like this. The last two columns matter less than obtaining the split ranges; I include them in this example to show the split number and the length of the split, respectively:
chr1    37850    37950    1    100
chr1    37951    38051    2    100
chr1    38052    38152    3    100
chr1    38153    38253    4    100
chr1    38254    38354    5    100
chr1    38355    38455    6    100
chr1    38456    38536    7    80
chr1    820769    820869    1    100
chr1    820870    820970    2    100
chr1    820971    821071    3    100
chr1    821072    821172    4    100
chr1    821173    821273    5    100
chr1    821274    821374    6    100
chr1    821375    821475    7    100
chr1    821476    821576    8    100
chr1    821577    821677    9    100
chr1    821678    821778    10    100
chr1    821779    821857    11    78
chr1    5018483    5018583    1    100
chr1    5018584    5018684    2    100
chr1    5018685    5018785    3    100
chr1    5018786    5018886    4    100
chr1    5018887    5018987    5    100
chr1    5018988    5019041    6    53
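For what it's worth, the splitting rule implied by the expected output can be sketched in Python. This is only a minimal sketch under a few assumptions read off the example: each piece runs from its start to start + 100 (capped at the range end), the next piece begins one position after the previous end, and the length column is end minus start. The function names `split_range` and `split_lines` are made up for illustration, and the input is assumed to be whitespace- or tab-separated with chromosome, start, and end in the first three columns.

```python
def split_range(start, end, step=100):
    """Yield (sub_start, sub_end, index, length) pieces covering start..end.

    Assumed convention (taken from the example output): each piece ends at
    sub_start + step, capped at `end`, and the next piece starts at
    sub_end + 1. Length is reported as sub_end - sub_start.
    """
    index = 1
    while start <= end:
        sub_end = min(start + step, end)
        yield start, sub_end, index, sub_end - start
        start = sub_end + 1
        index += 1


def split_lines(lines, step=100):
    """Turn input records into tab-separated output rows, one per piece."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        for sub_start, sub_end, index, length in split_range(start, end, step):
            yield "\t".join(map(str, (chrom, sub_start, sub_end, index, length)))


# Small demonstration on the three records from the question.
sample = """chr1\t37850\t38536\tMACS_peak_1\t7
chr1\t820769\t821857\tMACS_peak_4\t11
chr1\t5018483\t5019041\tMACS_peak_9\t6"""

for row in split_lines(sample.splitlines()):
    print(row)
```

To run it on a real file, an open file handle can be passed to `split_lines` in place of `sample.splitlines()`. Note that the number of pieces is derived from the range itself, so the fifth input column is not needed.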
What are you trying to do? Summarize/bin data on genomic positions in windows/intervals? If so, I suggest you use bedtools makewindows and then bedtools map to map your data onto the genomic intervals.
It seems that we have a whitespace bug that we'll be fixing shortly.
I found that tab-delimited text does not appear to be recognized, so I had to put the example in a 4-space-delimited format.