Hi,
I've got a list of genomic regions in Python, say:
list=["chr1:100-300", "chr1:150-350", "chr1:200-400", "chr1:500-700", "chr1:600-800", "chr1:900-1000"]
Some of them overlap each other. I want to create a new list containing subsets of these regions, such that no two regions within a subset overlap. The result should be something like:
new_list=[["chr1:100-300", "chr1:500-700", "chr1:900-1000"], ["chr1:150-350", "chr1:600-800"], ["chr1:200-400"]]
I wrote my own function, but it takes 20 minutes to produce the result when the original list contains 1000 regions.
Is there any quick way to do that? Please help.
It seems like you could have at least two different answers:
new_list_1=[["chr1:100-300", "chr1:500-700", "chr1:900-1000"], ["chr1:150-350", "chr1:600-800"], ["chr1:200-400"]]
or new_list_2=[["chr1:100-300", "chr1:600-800", "chr1:900-1000"], ["chr1:150-350", "chr1:500-700"], ["chr1:200-400"]]
Which result would be correct?
Either one will be great.
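If any valid grouping is acceptable, one possible approach is greedy interval partitioning: sort the regions by start position, keep a min-heap of the last end position in each subset, and put each region into the subset that freed up earliest, opening a new subset only when none is free. This is just a sketch, not a tested pipeline. The function names `parse_region` and `partition_regions` are made up for illustration, and I am assuming the coordinates are inclusive, so regions that merely touch (e.g. 100-300 and 300-500) count as overlapping.

```python
import heapq

def parse_region(region):
    """Split 'chr1:100-300' into ('chr1', 100, 300); stray spaces are ignored."""
    chrom, _, span = region.replace(" ", "").rpartition(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)

def partition_regions(regions):
    """Group regions into subsets with no overlaps inside any subset.

    Assumption: coordinates are inclusive, so a region starting exactly
    where another ends is treated as overlapping it.
    """
    # Sorting by (chrom, start, end) processes each chromosome in one run.
    parsed = sorted(parse_region(r) for r in regions)
    subsets = []
    heap = []  # entries are (chrom, last_end, subset_index)
    for chrom, start, end in parsed:
        # The heap minimum is the subset that becomes free first. It is
        # reusable if it ends on an earlier chromosome or before this start.
        if heap and heap[0][:2] < (chrom, start):
            _, _, idx = heapq.heappop(heap)
        else:
            idx = len(subsets)
            subsets.append([])
        subsets[idx].append(f"{chrom}:{start}-{end}")
        heapq.heappush(heap, (chrom, end, idx))
    return subsets

regions = ["chr1:100-300", "chr1:150-350", "chr1:200-400",
           "chr1:500-700", "chr1:600-800", "chr1:900-1000"]
result = partition_regions(regions)
print(result)
# [['chr1:100-300', 'chr1:500-700'], ['chr1:150-350', 'chr1:600-800'],
#  ['chr1:200-400', 'chr1:900-1000']]
```

The grouping differs from the two shown above, but it also uses three subsets and none of them contains overlapping regions. The sort and the heap operations make this O(n log n), so 1000 regions should take well under a second.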
Could this be done with bedtools?