EDIT: In another answer at the bottom of this page, I suggest using closest-features. I would use that approach instead of this one; it runs fast and allows more relaxed assumptions about the input.
Assuming your input intervals are entirely disjoint (none overlap each other), you can use BEDOPS tools to solve this easily.
First, sort your intervals with BEDOPS sort-bed:
$ sort-bed < intervals.unsorted.bed > intervals.bed
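Since everything below relies on the disjoint assumption, it may be worth checking it on the sorted file first. Here is a rough awk sketch of such a check; it is not a BEDOPS tool, `check_disjoint` is just a name I made up, and it assumes tab-delimited input already sorted by chromosome and start:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BEDOPS): reads sorted, tab-delimited BED
# on stdin and exits non-zero if any interval overlaps an earlier one.
check_disjoint() {
    awk 'BEGIN { FS = "\t" }
    # Overlap: same chromosome, and start lies before the furthest stop seen so far.
    $1 == chrom && $2 < stop { print "overlap at line " NR ": " $0; bad = 1 }
    {
        # Reset the running stop on a new chromosome, otherwise extend it.
        if ($1 != chrom || $3 > stop) stop = $3
        chrom = $1
    }
    END { exit bad }'
}

# Abutting intervals (stop == next start) count as disjoint in BED coordinates.
printf 'chr1\t10\t20\nchr1\t20\t30\n' | check_disjoint && echo "disjoint"
```

On a real file you would run something like `check_disjoint < intervals.bed` and only continue if it exits cleanly.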
As a first pass, use bedmap directly with a symmetric one-base padding (via --range):
$ bedmap --echo --echo-map --range 1 --delim '\t' intervals.bed > answer.bed
Take a look at answer.bed. Because the padding is symmetric, a result line will contain more than one mapped interval whenever another interval sits within one base of the reference interval, either before or after it.
Per your question's guidelines, we only want mapped intervals that follow afterwards. One extra step is needed to filter out the lines in answer.bed where an interval maps to itself and to another interval that comes before it:
$ bedmap --echo --echo-map --range 1 --delim '\t' intervals.bed \
    | awk '$3 < $8' -
The addition of the awk test filters out results where the stop position of a reference (unpadded) interval is greater than or equal to the start position of any mapped (padded) interval.
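For intuition, here is a rough awk-only sketch of the same "keep only the interval that follows" idea. It does not use BEDOPS and is not a replacement for the bedmap pipeline. It assumes sorted, disjoint, three-column, tab-delimited BED input, and `pair_with_follower` and its `pad` argument are names I made up for illustration (`pad` plays the role of the one-base `--range` padding):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BEDOPS): for sorted, disjoint BED3 input on
# stdin, print each interval next to the interval that immediately follows it.
pair_with_follower() {
    awk -v pad="${1:-1}" 'BEGIN { FS = OFS = "\t" }
    {
        # A follower is on the same chromosome and starts inside the padded
        # window of the previous interval: start < previous stop + pad.
        if (NR > 1 && $1 == chrom && $2 < stop + pad)
            print chrom, start, stop, $1, $2, $3
        # Remember the current interval for comparison with the next line.
        chrom = $1; start = $2; stop = $3
    }'
}

# The first two intervals abut; the third is too far away to count.
printf 'chr1\t10\t20\nchr1\t20\t30\nchr1\t50\t60\n' | pair_with_follower 1
# prints (tab-separated): chr1 10 20 chr1 20 30
```

Because the input is assumed disjoint and sorted, only the very next line can ever be a follower, so a single pass that compares each line with the previous one is enough.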
Depending on what "follow" means for reverse-stranded elements, there is an easy way to deal with that case. Feel free to follow up with what that means to you.
Also, if your input intervals are not guaranteed to be disjoint, follow up with that qualification and I'll see if I can modify my answer to help in that case.