Question: Get upstream and downstream intergenic regions in R
1
gravatar for greentailedmouse
2.6 years ago by
greentailedmouse50 wrote:

Hi all,

I would like to get the upstream and downstream intergenic distance between two genes in R (taking into account that some genes may overlap). For example - in the following case I would expect to get a the upstream distance of 5 for gene A, a distance 0 or negative for downstream of A, and a distance of 10 for downstream of B/upstream of C.

-----|____gene A____|-------------------------|____gene_C___|---     + strand
---------------|______gene_B_______|----------------------------     - strand

I tried to use precede() and follow() in GRanges but these two functions completely exclude overlapping genes from the analysis. This sounds like a task everyone is doing and it should be part of a package but I can't find anything appropriate.

Thank you for any hints!

intergenic granges R • 1.1k views
ADD COMMENTlink modified 2.6 years ago by ddiez1.7k • written 2.6 years ago by greentailedmouse50

I think your problem might be similar to that of finding the distance between motifs asked in this question. My answer there could be a starting point.

ADD REPLYlink written 2.6 years ago by ddiez1.7k

OK on second thought I see you want to obtain a negative distance for overlapping genes (to tell the amount of overlapping I guess). I am not sure how can you get this... (might think about it later).

ADD REPLYlink written 2.6 years ago by ddiez1.7k
2
gravatar for ddiez
2.6 years ago by
ddiez1.7k
Japan
ddiez1.7k wrote:

Here is some attempt. I create an example ranges (IRanges for simplicity). You can used this modified version of plotRanges() function (from the IRangesOverview vignette in the IRanges package) to visualize the ranges and how they overlap.

The problem using follow() and precede() is that they ignore overlapping ranges. Therefore here instead compute the sequence-wide comparison of distances. Maybe this needs to be modified to be more efficient and only do this in the closest sequence. However, with overlapping sequences it is not so clear to me what closest means.

library(IRanges)
foo <- IRanges(start = c(1, 7, 10, 5, 13), end = c(4, 12, 15, 8, 14))

plotRanges <- function(x,
                       xlim = x,
                       main = deparse(substitute(x)),
                       col = "black",
                       sep = 0.5,
                       ...)

{
  height <- 1
  if (is(xlim, "Ranges"))
    xlim <-
      c(min(start(xlim)), max(end(xlim)))
  bins <-
    disjointBins(IRanges(start(x), end(x) + 1))
  plot.new()
  plot.window(xlim, c(0, max(bins) * (height + sep)))
  ybottom <-
    bins * (sep + height) - height
  rect(start(x), ybottom, end(x), ybottom + height, col = col, ...)
  text(c(start(x) + width(x)/2) - .5, y = ybottom + height /2)
  title(main)
  r <- range(x)
  axis(1, at = seq(start(r), end(r)))
}

plotRanges(foo, col = rainbow(length(foo)))
foo
IRanges object with 5 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         4         4
  [2]         7        12         6
  [3]        10        15         6
  [4]         5         8         4
  [5]        13        14         2

I compute the sequence-wise comparison as this:

# comparison matrix.
combmatrix <- t(combn(length(foo), 2))
combmatrix
colnames(combmatrix) <- c("i", "j")
      i j
 [1,] 1 2
 [2,] 1 3
 [3,] 1 4
 [4,] 1 5
 [5,] 2 3
 [6,] 2 4
 [7,] 2 5
 [8,] 3 4
 [9,] 3 5

Then iterate over the comparison matrix and compute the distances:

hoo <- apply(combmatrix, 1, function(x) {
  s1 <- foo[x[1], ]
  s2 <- foo[x[2], ]
  if (start(s1) < start(s2))
    start(s2) - end(s1)
  else
    start(s1) - end(s2)
})

cbind(combmatrix, distance = hoo)
      i j distance
 [1,] 1 2        3
 [2,] 1 3        6
 [3,] 1 4        1
 [4,] 1 5        9
 [5,] 2 3       -2
 [6,] 2 4       -1
 [7,] 2 5        1
 [8,] 3 4        2
 [9,] 3 5       -2
[10,] 4 5        5

Note that for the comparison between sequences 3 and 5, it is not clear to me what the right answer would be in this case. I guess this is the reason distances to overlapping ranges are zero in IRanges (and GRanges).

ADD COMMENTlink written 2.6 years ago by ddiez1.7k

I'll mark your answer as accepted since I've done something similar in the end; but it also answers my original question - namely, there is no such function or parameter in Granges. Thanks!

ADD REPLYlink written 2.6 years ago by greentailedmouse50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 903 users visited in the last hour