I have BED files for two types of coordinates, A and B. I want to collapse them into intervals in which an entry of A is separated from the neighbouring entry of B by no more than 20 bases, and vice versa. What is not acceptable is a merge that directly joins 2 or more neighbouring entries from just A, or 2 or more neighbouring entries from just B.
For example, let's say:
cat A.bed
Chr1 10 20 A1
Chr1 50 60 A2
Chr1 75 100 A3

cat B.bed
Chr1 25 40 B1
Chr1 115 160 B2

cat A.bed B.bed > AB.bed
sortBed -i AB.bed > AB_sort.bed
The output from mergeBed with -d 20 gives what I do NOT want!
mergeBed -d 20 -i AB_sort.bed -c 4 -o collapse -delim ","
Chr1 10 160 A1,B1,A2,A3,B2
The picture looks different when I inspect the output of bedtools closest by eye (the last column is the distance to the closest B entry):
bedtools closest -d -a A.bed -b B.bed
Chr1 10 20 A1 Chr1 25 40 B1 6
Chr1 50 60 A2 Chr1 25 40 B1 11
Chr1 75 100 A3 Chr1 115 160 B2 16
What combination of bedtools sub-commands should I use to get the type of result that I seek, shown below?
AB_dist20_merge.bed
Chr1 10 60 A1,B1,A2
Chr1 75 160 A3,B2
Thanks to Alex for pointing out an error: I changed A3,B1 to A3,B2 in the expected output above.
In my manually parsed output example above, I do NOT collapse the coordinates for features A2 and A3 together (the rule mentioned in the intro), whereas mergeBed ignores which file an entry came from. Any ideas on how to do this with bedtools and/or bedOps?
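To pin down the rule I am describing, here is a minimal Python sketch. It is not a bedtools solution, only a reference for the behaviour I am after. It assumes that the gap is measured as the next start minus the current cluster end, that a cluster is extended only when the incoming entry comes from the other file than the entry just before it, and that entries with no partner are still reported on their own. The function name alternating_merge and the "A"/"B" source tags are made up for illustration.

```python
def alternating_merge(a_records, b_records, max_gap=20):
    """Merge neighbouring intervals only when they alternate between A and B.

    Each record is a (chrom, start, end, name) tuple. The gap between two
    neighbours is taken as next start minus current cluster end (an assumption).
    """
    # Tag every record with its source file, then sort by chrom, start, end.
    tagged = [(c, s, e, n, "A") for c, s, e, n in a_records]
    tagged += [(c, s, e, n, "B") for c, s, e, n in b_records]
    tagged.sort()

    clusters = []
    for chrom, start, end, name, src in tagged:
        if clusters:
            last = clusters[-1]
            same_chrom = last["chrom"] == chrom
            other_source = last["src"] != src          # A next to B, or B next to A
            close_enough = start - last["end"] <= max_gap
            if same_chrom and other_source and close_enough:
                # Extend the current cluster and remember the newest source.
                last["end"] = max(last["end"], end)
                last["names"].append(name)
                last["src"] = src
                continue
        # Otherwise start a new cluster (also done for A-A or B-B neighbours).
        clusters.append(
            {"chrom": chrom, "start": start, "end": end, "names": [name], "src": src}
        )

    return [(c["chrom"], c["start"], c["end"], ",".join(c["names"])) for c in clusters]


# The example from this post.
A = [("Chr1", 10, 20, "A1"), ("Chr1", 50, 60, "A2"), ("Chr1", 75, 100, "A3")]
B = [("Chr1", 25, 40, "B1"), ("Chr1", 115, 160, "B2")]

result = alternating_merge(A, B)
for chrom, start, end, names in result:
    print(chrom, start, end, names)
```

On the example data this prints the two intervals I listed as the desired output, because the A2 to A3 step is refused even though the two entries are only 15 bases apart.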
Note to self: images (with and without annotation) to help visualize this example have been added below.
Raw Image for example
Annotated image for example