Are there known good data structures/algorithms for finding the nearest region, like bedtools closest?
Off the top of my head:
If I have a data structure for interval lookup (like an intervaltree), then for each range (a_start, a_end) in A I can do:

    hits = []
    i = 0
    slack = 1000
    while not hits:  # never terminates if B is empty
        # widen the query window by slack * i on both sides
        hits = interval_tree.find(a_start - slack * i, a_end + slack * i)
        i += 1
    find_nearest_in_hits(a_start, a_end, hits)

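A minimal runnable sketch of the expanding-window loop, in Python to match the pseudocode. `find_overlapping` is a hypothetical brute-force stand-in for `interval_tree.find` (a real interval tree answers the same query in logarithmic time plus the number of hits), ranges are assumed to be closed integer intervals `(start, end)`, and `gap` reports 0 for overlapping ranges, which may not match bedtools' exact distance convention for half-open BED coordinates. Because a range from B overlaps the widened window exactly when its gap to the query is at most `slack * i`, the first non-empty hit set is guaranteed to contain a nearest range, so searching only inside it is safe.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end), closed coordinates assumed


def gap(a: Interval, b: Interval) -> int:
    """Distance between two closed intervals; 0 if they overlap."""
    return max(0, b[0] - a[1], a[0] - b[1])


def find_overlapping(index: List[Interval], lo: int, hi: int) -> List[Interval]:
    """Hypothetical stand-in for interval_tree.find(lo, hi): brute-force scan."""
    return [b for b in index if b[0] <= hi and b[1] >= lo]


def find_nearest_in_hits(a: Interval, hits: List[Interval]) -> Interval:
    # Any hit with the minimal gap is a nearest range; ties go to the first one.
    return min(hits, key=lambda b: gap(a, b))


def closest(a: Interval, index: List[Interval], slack: int = 1000) -> Optional[Interval]:
    if not index:
        return None  # without this guard the loop below would never end
    a_start, a_end = a
    hits: List[Interval] = []
    i = 0
    while not hits:
        # Window widened by slack * i: hits are exactly the ranges with gap <= slack * i.
        hits = find_overlapping(index, a_start - slack * i, a_end + slack * i)
        i += 1
    return find_nearest_in_hits(a, hits)
```

For example, with `B = [(100, 200), (5000, 5100), (9000, 9050)]`, `closest((2600, 2700), B)` issues four queries (windows widened by 0, 1000, 2000 and 3000) and returns `(5000, 5100)`, whose gap of 2300 beats the gap of 2400 to `(100, 200)`.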
But even in C this might be slow, depending on how the data looks: if the nearest range in B is far away, the loop needs many iterations, each of them a fresh tree query.
I can use a large slack, but that would just require me to do more work in
find_nearest_in_hits since it would get a larger result set most of the time.
(Here the intervaltree contains the ranges in B, while I use A to query it).
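To make the slack trade-off concrete, the hypothetical helper below runs the same expanding-window search (again with a brute-force scan standing in for the interval tree, and closed `(start, end)` ranges assumed) and reports how many queries were issued and how many hits `find_nearest_in_hits` would then have to scan. The data is synthetic and chosen only to illustrate the effect.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), closed coordinates assumed


def search_cost(a: Interval, b_ranges: List[Interval], slack: int) -> Tuple[int, int]:
    """Return (queries issued, size of the final hit set) for the
    expanding-window search. Assumes b_ranges is non-empty, otherwise
    the loop never terminates."""
    a_start, a_end = a
    i = 0
    while True:
        lo, hi = a_start - slack * i, a_end + slack * i
        # Brute-force stand-in for interval_tree.find(lo, hi).
        hits = [b for b in b_ranges if b[0] <= hi and b[1] >= lo]
        if hits:
            return i + 1, len(hits)
        i += 1


# Synthetic B: 500 ranges of length 50, one every 200 bp; query far past the last one.
B = [(x, x + 50) for x in range(0, 100_000, 200)]
a = (200_000, 200_100)
for slack in (1_000, 100_000):
    queries, n_hits = search_cost(a, B, slack)
    print(f"slack={slack}: {queries} queries, {n_hits} hits")
```

Here a slack of 1000 takes 102 queries but leaves only 5 hits to scan, while a slack of 100000 takes 3 queries and hands all 500 ranges in B to the final scan.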