Subset overlapping regions to find the minimal number of non-overlapping regions
7 weeks ago
Geoffrey • 0

I have a large set of regions, many of which overlap. Where they overlap, I want to subset them so that I am left with the minimal number of non-overlapping regions.

I am not sure of the best way to go about this. I thought I could run bedtools merge and then select the largest region from each merged group. I could then run bedtools intersect -v against that subset to find the regions that do not overlap it, and add the two sets together. Since this could bring in regions that themselves overlap, I would need to iterate until I got what I wanted. Also, selecting the largest region might not give the best path.

Is there a more elegant way to do this?

Region diagram

Example data:

chr1    15627   2015626 TRF_197022
chr1    15628   43383   TRF_197021
chr1    43845   44514   TRF_197027
chr1    44503   355335  TRF_197029
chr1    355339  356932  TRF_197079
chr1    356933  450858  TRF_197081
chr1    450888  455989  TRF_197096
chr1    450888  455989  TRF_197095
chr1    458068  458111  TRF_197101
chr1    458068  458111  TRF_197100
chr1    458253  458301  TRF_197102
chr1    458798  458823  TRF_197103
chr1    458920  458947  TRF_197104
chr1    459055  459093  TRF_197105
chr1    468536  519257  TRF_197110
chr1    519259  2507432 TRF_197117
chr1    598171  599043  TRF_197129
chr1    639285  2639284 TRF_197226
chr1    1686777 1686809 TRF_197185
chr1    1687030 1687057 TRF_197186
chr1    1687706 1687770 TRF_197188
chr1    1687717 1687770 TRF_197187
chr1    1687828 1687861 TRF_197190
chr1    1734806 1734853 TRF_197193
chr1    1736459 1736506 TRF_197195
chr1    2507429 3300067 TRF_197221
chr1    2675012 2676396 TRF_197228
chr1    2676387 3320976 TRF_197230
overlap bedtools intersect • 271 views

When they are overlapping I am trying to subset them such that I have the minimal number of non overlapping regions.

It's not clear how you decide whether or not to merge a given set of regions.


I do not want to merge them. I want to filter them so that I retain a subset that covers the most area, with any one position covered by at most one region.
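
Stated this way, the goal looks like the classic weighted interval scheduling problem, with each region weighted by its length. The following is only a sketch under that assumption, not a tested pipeline: the function name max_coverage_subset is made up for illustration, coordinates are assumed to be BED-style half-open (so a region ending at 100 does not overlap one starting at 100), and the objective is maximum covered bases rather than a minimal region count.

```python
from bisect import bisect_right
from collections import defaultdict

def max_coverage_subset(regions):
    """Pick non-overlapping regions that maximise total covered bases.

    regions: iterable of (chrom, start, end, name), half-open coordinates.
    Returns the chosen regions, ordered by end position within each chromosome.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in regions:
        by_chrom[chrom].append((start, end, name))

    chosen = []
    for chrom, ivs in by_chrom.items():
        ivs.sort(key=lambda iv: (iv[1], iv[0]))  # sort by end, then start
        ends = [iv[1] for iv in ivs]
        n = len(ivs)

        # best[i] = maximum coverage achievable using only the first i regions
        best = [0] * (n + 1)
        for i, (start, end, _) in enumerate(ivs, 1):
            # p = how many earlier regions end at or before this start
            p = bisect_right(ends, start, 0, i - 1)
            best[i] = max(best[i - 1], (end - start) + best[p])

        # Walk back through the table to recover which regions were taken
        picked = []
        i = n
        while i > 0:
            start, end, name = ivs[i - 1]
            p = bisect_right(ends, start, 0, i - 1)
            if (end - start) + best[p] > best[i - 1]:
                picked.append((chrom, start, end, name))
                i = p
            else:
                i -= 1
        chosen.extend(reversed(picked))
    return chosen
```

Unlike the iterate-and-pick-the-largest idea, this considers every combination implicitly, so a long region is dropped whenever a chain of shorter ones covers more bases in total. The sort dominates the cost, so it runs in O(n log n) per chromosome.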

I mentioned bedtools merge because, as well as creating a new large region that encompasses all the overlapping regions, it can also return a list of the original regions subsumed into each merged region. This serves as a mechanism for identifying groups of overlapping original regions.
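
For illustration, the same grouping can be sketched in plain Python. This is an assumption-laden stand-in rather than a reimplementation of bedtools: cluster_overlaps is a hypothetical helper name, and it only joins regions that strictly overlap in half-open coordinates, whereas bedtools merge by default also joins book-ended regions.

```python
def cluster_overlaps(regions):
    """Group region names into clusters of transitively overlapping regions.

    regions: iterable of (chrom, start, end, name), half-open coordinates.
    Returns a list of clusters, each a list of region names.
    """
    clusters = []
    cur_chrom, cur_end = None, None
    # Sorting by (chrom, start, end, name) lets a single sweep find clusters
    for chrom, start, end, name in sorted(regions):
        if chrom == cur_chrom and start < cur_end:
            # Overlaps the running cluster: extend it
            clusters[-1].append(name)
            cur_end = max(cur_end, end)
        else:
            # Gap or new chromosome: start a new cluster
            clusters.append([name])
            cur_chrom, cur_end = chrom, end
    return clusters
```

Clusters containing a single name need no filtering, so only the multi-region clusters have to be resolved further.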
