It's an interesting quirk. Let's step back and draw some cartoons to see if that might help. I think the following case may help suggest a solution, or the start to one.

It seems like a "worst-case" scenario for you seems to be when merging elements `A`

and `B`

that overlap by your threshold, and `B`

and `C`

overlap by your threshold, but `A`

and `C`

are adjoining (and disjoint) or otherwise do not overlap by your threshold criterion.

Let's sketch this out. Here, `chrZ`

is some weird abstract chromosome, and we have elements `A`

, `B`

, and `C`

that span this chromosome:

```
|-----|-----|-----|-----| chrZ
[-- A --)
[-- B --)
[-- C --)
```

A simple merge on a 0.5 or 50% overlap threshold will correctly give:

```
|-----|-----|-----|-----| chrZ
[-- AB --)
[-- ABC --)
[-- BC --)
```

In this worst-case scenario, the middle element from the result set (`ABC`

) is calculated because `B`

overlaps both `A`

and `C`

by 50%, and so a merge gives `ABC`

.

It sounds like what you really want is something that looks like this:

```
|-----|-----|-----|-----| chrZ
[-- AB --)
[-- BC --)
```

In other words, you want to remove any merge results from the set {`AB`

, `ABC`

, `BC`

} which contain a nested element from the set of {`A`

, `B`

, `C`

}. (Or, the larger set of merges for more than three elements, where there are full or partial pairwise 50% overlaps, more generally.)

A nested element is one for which its start and stop coordinates fall within the bounds of its parent interval.

A merge element result that contains a nested element is one generated from a use case where one of the elements contributing to that merge result does not meet your threshold over all pairwise comparisons. I think this property can be exploited.

In the above worst-case scenario, merge result `ABC`

contains the nested element `B`

, where both the start and stop coordinates of `B`

fall within the start and stop coordinates of `ABC`

.

You could use `awk`

to strip out such results from a `bedmap`

call, by using `awk`

to test the start and end coordinates of each input coordinate from `in.bed`

against the merged range result that comes out of `--echo-map-range`

:

```
$ bedmap --fraction-map 0.5 --echo-map-range --echo --delim '\t' in.bed | awk -vFS="\t" -vOFS="\t" '{ if (($2 == $5) || ($3 == $6)) { print $1, $2, $3 } }' | sort -u | sort-bed - > out.bed
```

Note that this also pipes results from `bedmap`

and `awk`

to `sort -u`

, in order take care of duplicate `--echo-map-range`

results (generated from other, more relaxed overlap situations) and there is a final pipe to `sort-bed`

, in order to ensure generating a sorted BED file that is ready for further set operations.

I think this works, but I may be wrong by missing a use case. Please feel free to give it a go and post back sample data, if you believe it doesn't meet your needs.

Thanks for the suggestions, but because I wrote the post rapidly, I didn't explain myself correctly as I gave incorrect details. I have used bedtools intersect with -f and -F 0.5 and -e, so if only one region is covered enough. The thing is, [A B], [B C] and [C D] accomplish this requisite, but not A with C.

In short, I don't wish to merge all regions just because B and C also pass the filter. As I stated before, I thought about doing merge sequentially (A with B, then [AB] doesn't overlap 0.5 with C. Now I check B with C so I merge then and finally [BC] with D so I would have [AB] and [BCD]). I will update the initial post.

0