Question: Calculate distances between items in different GRanges
igor (United States) wrote 3.2 years ago:

I have two GRanges objects. I would like to calculate distances between specific items in one and specific items in the other. For example, I have genes and peaks and I want to get the distances between them. Is there a good way to do that?

There is GenomicRanges::distance, but that expects a single range. I tried using that and it works fine for individual pairs of ranges. However, iterating through all the combinations takes a really long time. Using apply or multi-threaded foreach is still slow (more than a day for a million pairs). This can't be the proper way.

I am familiar with GenomicRanges::distanceToNearest and that works when you are comparing two GRanges objects, but it only returns the nearest hit.

So is there an efficient way to determine distances between items in two GRanges?

granges bioconductor R • 2.6k views

Interesting question; I don't know if I know the answer. I understand you don't want the distance from all genes to all peaks but only for a subset of them (?). Could you add a minimal example of what you have tried, to give a better idea of what you want?

written 3.2 years ago by ddiez

I want distances between specific peaks and genes. For example, distance between each peak and all nearby genes (genes within a certain region). I have specific peak-gene pairs I am interested in.

I ended up solving this by taking my data frame of peak-gene pairs and adding the peak positions (subsetting the peaks GR by the peaks column) and then the gene positions (subsetting the genes GR by the genes column). Then I could use some if-else statements to calculate the distance in the right orientation. All of that is vectorized, so it's essentially instant. However, it feels like a poor hack. I would think GenomicRanges has something like that built in.
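
A minimal sketch of the vectorized approach described above, in base R. The `peaks`, `genes`, and `pairs` data frames and all of their column names are hypothetical, and `pmax`/`pmin` stand in for the explicit if-else statements. The distance here is assumed to be the number of bases in the gap between the two intervals (0 if they overlap or touch) and NA when the chromosomes differ; it does not track orientation.

```r
# Hypothetical coordinates; in practice these could be extracted from the
# GRanges objects (for example with as.data.frame()).
peaks <- data.frame(
  id    = c("p1", "p2", "p3"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100, 5000, 700),
  end   = c(200, 5100, 800),
  stringsAsFactors = FALSE
)
genes <- data.frame(
  id    = c("g1", "g2", "g3"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(300, 4000, 750),
  end   = c(400, 4500, 900),
  stringsAsFactors = FALSE
)

# Peak-gene pairs of interest (hypothetical).
pairs <- data.frame(
  peak = c("p1", "p2", "p3", "p3"),
  gene = c("g1", "g2", "g3", "g1"),
  stringsAsFactors = FALSE
)

# Look up the coordinates for every pair at once.
p <- peaks[match(pairs$peak, peaks$id), ]
g <- genes[match(pairs$gene, genes$id), ]

# Gap between the two intervals; negative values mean overlap, clamped to 0.
gap <- pmax(pmax(p$start, g$start) - pmin(p$end, g$end) - 1, 0)

# The distance is left undefined (NA) across chromosomes.
pairs$distance <- ifelse(p$chr == g$chr, gap, NA)

print(pairs)
```

Because every step operates on whole columns, the cost grows with the number of pairs rather than with the number of peak-gene combinations.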

written 3.2 years ago by igor
HectorH wrote 20 months ago:

Just a remark: The question seems to be "How to calculate all distances between different GRanges".

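Indeed, GenomicRanges::distance compares ranges in parallel, x[i] against y[i]. However, if x is a single range and y contains many, the shorter argument is recycled, so it will output all distances between ONE range from the 1st GRanges and ALL ranges from the 2nd GRanges object. Note that select="all" is an argument of nearest and distanceToNearest, not of distance.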
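
For specific peak-gene pairs, one possible sketch is below. As far as I can tell from the GenomicRanges documentation, distance() compares its two arguments element by element, so subsetting both objects by the pair indices gives one distance per pair in a single call. The toy `peaks` and `genes` objects and the 1000 bp window are assumptions made for illustration; the Bioconductor GenomicRanges package must be installed.

```r
library(GenomicRanges)

# Hypothetical toy data.
peaks <- GRanges("chr1", IRanges(start = c(100, 5000), end = c(200, 5100)))
genes <- GRanges("chr1", IRanges(start = c(300, 4000, 9000),
                                 end   = c(400, 4500, 9500)))

# All peak-gene pairs separated by at most 1000 bp (window size is an assumption).
hits <- findOverlaps(peaks, genes, maxgap = 1000, ignore.strand = TRUE)

# Subset both objects by the pair indices so they line up one-to-one,
# then compute one distance per pair in a single vectorized call.
pairs <- data.frame(
  peak     = queryHits(hits),
  gene     = subjectHits(hits),
  distance = distance(peaks[queryHits(hits)],
                      genes[subjectHits(hits)],
                      ignore.strand = TRUE)
)

print(pairs)
```

If the pairs already exist as two index (or name) columns in a data frame, the findOverlaps step can be skipped and those columns used directly to subset the two GRanges objects.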
