I have a list of genomic intervals of interest and I would like to calculate the fraction of each chromosome arm that they make up.
For example,
chr   start         stop
1     50,000,000    100,000,000
1     120,000,000   150,000,000
I want to do this using chromosome arm size information (e.g. chr1p spans 0-123,400,000 and chr1q spans 123,400,000-248,956,422). For this, I first need to split the intervals at the chromosome arm boundaries, like:
chr   start         stop          arm
1     50,000,000    100,000,000   p
1     120,000,000   123,400,000   p
1     123,400,000   150,000,000   q
Then I will merge the intervals that fall on the same chromosome and arm, and calculate the fraction of that chromosome arm they make up. Do you have any suggestions on how to split the intervals? Is there an easy function or tool for this, or do I need to write a script? Thanks.
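For illustration, here is a minimal Python sketch of the split, merge and fraction steps described above. It is only one possible way to do it. The names ARMS, split_by_arm and arm_fractions are made up for this example, the arm table holds only the chr1 boundaries quoted in the question (other chromosomes would have to be added from your own arm size file), and coordinates are treated as half-open start/stop pairs.

```python
# Arm boundaries keyed by (chromosome, arm). Only chr1 is filled in, using the
# values quoted in the question; this table is an assumption of the sketch.
ARMS = {
    ("1", "p"): (0, 123_400_000),
    ("1", "q"): (123_400_000, 248_956_422),
}


def split_by_arm(intervals, arms=ARMS):
    """Clip each (chrom, start, stop) interval to every arm it overlaps."""
    pieces = []
    for chrom, start, stop in intervals:
        for (arm_chrom, arm), (arm_start, arm_stop) in arms.items():
            if arm_chrom != chrom:
                continue
            lo, hi = max(start, arm_start), min(stop, arm_stop)
            if lo < hi:  # keep only non-empty overlaps
                pieces.append((chrom, lo, hi, arm))
    return pieces


def arm_fractions(pieces, arms=ARMS):
    """Merge overlapping pieces per arm and return covered length / arm length."""
    by_arm = {}
    for chrom, start, stop, arm in pieces:
        by_arm.setdefault((chrom, arm), []).append((start, stop))

    fractions = {}
    for key, spans in by_arm.items():
        spans.sort()
        covered = 0
        cur_start, cur_stop = spans[0]
        for start, stop in spans[1:]:
            if start <= cur_stop:  # overlapping or touching: extend current run
                cur_stop = max(cur_stop, stop)
            else:  # gap: close the current run and start a new one
                covered += cur_stop - cur_start
                cur_start, cur_stop = start, stop
        covered += cur_stop - cur_start
        arm_start, arm_stop = arms[key]
        fractions[key] = covered / (arm_stop - arm_start)
    return fractions


intervals = [("1", 50_000_000, 100_000_000), ("1", 120_000_000, 150_000_000)]
pieces = split_by_arm(intervals)
fractions = arm_fractions(pieces)

for chrom, start, stop, arm in pieces:
    print(chrom, start, stop, arm)
for (chrom, arm), frac in sorted(fractions.items()):
    print(f"chr{chrom}{arm}: {frac:.4f}")
```

With the two example intervals, split_by_arm gives the same three rows as the table above, and the fractions are the merged covered length on each arm divided by the length of that arm.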
Thank you, it works just as I wanted! Just a note: maybe add rm interval.*[pq] at the end of the bash script to clean up the temporary interval files.