Question: Calculate distances between items in different GRanges
1
3.2 years ago by
igor9.8k
United States
igor9.8k wrote:

I have two GRanges objects. I would like to calculate distances between specific items in one and specific items in the other. For example, I have genes and peaks and I want to get the distances between them. Is there a good way to do that?

There is `GenomicRanges::distance`, but that expects a single range. I tried using that and it works fine for individual pairs of ranges. However, iterating through all the combinations takes a really long time. Using `apply` or multi-threaded `foreach` is still slow (more than a day for a million pairs). This can't be the proper way.

I am familiar with `GenomicRanges::distanceToNearest` and that works when you are comparing two GRanges objects, but it only returns the nearest hit.

So is there an efficient way to determine distances between items in two GRanges?

granges bioconductor R • 2.6k views
modified 20 months ago by zx87549.1k • written 3.2 years ago by igor9.8k

Interesting question, though I don't know if I know the answer. I understand you don't want the distance from all genes to all peaks but only for a subset of them (?). Could you add a minimal example of what you have tried, to give a better idea of what you want?

I want distances between specific peaks and genes. For example, distance between each peak and all nearby genes (genes within a certain region). I have specific peak-gene pairs I am interested in.

I ended up solving this by taking my data frame of peak-gene pairs and adding the peak positions (subsetting the peaks GRanges by the peaks column) and then the gene positions (subsetting the genes GRanges by the genes column). Then I could use some if-else statements to calculate the distance in the right orientation. All of that is vectorized, so it's essentially instant. However, it feels like a poor hack. I would think GenomicRanges has something like that built in.
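For anyone who wants to try the same thing, here is a minimal sketch of that kind of vectorized calculation in base R. The column names and coordinates are made up for illustration and are not from the original data. It assumes each row is a pair on the same chromosome, and it counts the positions separating the two ranges (0 if they overlap or are adjacent), which is my understanding of how `GenomicRanges::distance` defines distance.

```r
# Hypothetical pair table: one row per peak-gene pair, with coordinates
# already looked up from the two GRanges objects (same chromosome per row).
pairs <- data.frame(
  peak_start = c(100, 500, 900),
  peak_end   = c(200, 600, 1000),
  gene_start = c(300, 550, 100),
  gene_end   = c(400, 700, 400)
)

# Number of positions separating two intervals; 0 if they overlap or touch.
# pmax() works element-wise, so no loop over pairs is needed.
pair_gap <- function(start1, end1, start2, end2) {
  pmax(start2 - end1 - 1, start1 - end2 - 1, 0)
}

pairs$gap <- pair_gap(pairs$peak_start, pairs$peak_end,
                      pairs$gene_start, pairs$gene_end)

# Orientation: positive if the gene lies to the right of the peak,
# negative if it lies to the left (an arbitrary convention for this sketch).
pairs$signed_gap <- ifelse(pairs$gene_start > pairs$peak_end,
                           pairs$gap, -pairs$gap)

print(pairs)
```

With these toy coordinates the unsigned gaps come out as 99, 0 and 499. Strand is ignored here; a strand-aware version would need one more condition in the `ifelse`.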

2
20 months ago by
HectorH20
HectorH20 wrote:

Just a remark: The question seems to be "How to calculate all distances between different GRanges".

Indeed, `GenomicRanges::distance` expects a single range. However, using the argument `select="all"`, it will output all distances between ONE range from the 1st GRanges and ALL ranges from the 2nd GRanges object.
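A possible alternative, based on my reading of the documentation: `distance(x, y)` appears to work element-wise on two GRanges of the same length, returning the distance between `x[i]` and `y[i]`, so pairs can be handled in one call by subsetting both objects with matching indices. The sketch below uses toy ranges and an arbitrary 1 kb window, both made up for illustration. It builds the candidate pairs with `findOverlaps(..., maxgap = )` and then computes all distances at once.

```r
library(GenomicRanges)

# Toy data (hypothetical coordinates, unstranded).
peaks <- GRanges("chr1", IRanges(start = c(100, 5000), end = c(200, 5100)))
genes <- GRanges("chr1", IRanges(start = c(300, 1000, 9000),
                                 end   = c(400, 2000, 9500)))

# Candidate pairs: every gene within 1 kb of a peak (window size is arbitrary).
hits <- findOverlaps(peaks, genes, maxgap = 1000, ignore.strand = TRUE)

# Element-wise distance between the i-th peak and the i-th gene of each pair.
d <- distance(peaks[queryHits(hits)], genes[subjectHits(hits)],
              ignore.strand = TRUE)

# One row per peak-gene pair.
result <- data.frame(peak = queryHits(hits),
                     gene = subjectHits(hits),
                     distance = d)
print(result)
```

If the peak-gene pairs are already known, the `findOverlaps` step can be skipped and the two GRanges subset directly by the pair indices before calling `distance`. As far as I know, pairs on different chromosomes give `NA`.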

modified 20 months ago by zx87549.1k • written 20 months ago by HectorH20