If you use `strsplit()`, you could test this directly. For instance:

> v <- '1:3099740-8804450'
> unlist(strsplit(v, '[:-]'))
[1] "1" "3099740" "8804450"

Then you can make a function out of it and print out the result of the desired tests via `apply()`:

> t <- read.table("some.table", header=TRUE)
> interval <- function(str) unlist(strsplit(str, '[:-]'))
> apply(t, 1, function(x) (x[2] == interval(x[4])[1]) && (as.numeric(x[3]) > as.numeric(interval(x[4])[2])) && (as.numeric(x[3]) < as.numeric(interval(x[4])[3])))
[1] FALSE FALSE FALSE FALSE TRUE TRUE

We leave the chromosome name as a character vector to deal with cases like `X` and `Y`, and convert the position and interval position elements to numerics via `as.numeric()` to apply numerical relation operators.
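As a rough alternative, the same three comparisons can be sketched without `apply()` by splitting every interval once and comparing whole columns. One reason to consider this: `apply()` coerces a data frame to a character matrix first, and numeric columns can be padded with spaces in that step, which may make a string comparison on the chromosome column fail when values differ in width (for example `1` and `10`). The names `tbl`, `parts`, and `inside` are hypothetical, and the data frame simply re-enters the six example rows from this answer.

```r
# Example rows re-entered by hand; `tbl` is a hypothetical name chosen
# so that base R's t() is not masked.
tbl <- data.frame(
  GeneID   = c("x", "y", "z", "a", "b", "c"),
  Chr      = c("1", "1", "1", "2", "2", "2"),
  Position = c(18697251, 19546617, 3332236, 3993537, 9983384, 11479025),
  Interval = c("1:3099740-8804450", "1:3422930-8804450",
               "2:2751757-4502486", "2:6187995-8804450",
               "2:4864334-18271005", "2:8222941-18271005"),
  stringsAsFactors = FALSE
)

# Split each "chr:start-end" string and stack the pieces into a
# 3-column character matrix: chromosome, start, end.
parts <- do.call(rbind, strsplit(tbl$Interval, "[:-]"))

# Chromosome is compared as a string; start and end are compared as numbers.
# The bounds are strict (> and <), matching the apply() version.
inside <- as.character(tbl$Chr) == parts[, 1] &
  tbl$Position > as.numeric(parts[, 2]) &
  tbl$Position < as.numeric(parts[, 3])

print(inside)
```

For the six example rows this gives the same `FALSE FALSE FALSE FALSE TRUE TRUE` pattern as the `apply()` call, and `tbl[inside, ]` or `tbl[!inside, ]` can then be used for filtering.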

The `apply()` call can also be turned into a function:

> interval_test <- function(t) apply(t, 1, function(x) (x[2] == interval(x[4])[1]) && (as.numeric(x[3]) > as.numeric(interval(x[4])[2])) && (as.numeric(x[3]) < as.numeric(interval(x[4])[3])))

This function can be used to filter your table for rows that fall within the interval:

> t[interval_test(t),]
  GeneID Chr Position           Interval
5      b   2  9983384 2:4864334-18271005
6      c   2 11479025 2:8222941-18271005

Or it can be used to filter for rows that do not fall within the interval, by using the `!` operator:

> t[!interval_test(t),]
  GeneID Chr Position          Interval
1      x   1 18697251 1:3099740-8804450
2      y   1 19546617 1:3422930-8804450
3      z   1  3332236 2:2751757-4502486
4      a   2  3993537 2:6187995-8804450
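To try both filters without a file on disk, the steps can be reproduced with `read.table(text = ...)`. This is only a sketch: the inline text re-enters the six rows shown above, `tbl`, `rows_inside`, and `rows_outside` are hypothetical names, and the `colClasses` argument is an added assumption that keeps `Chr` as character so that names like `X` and `Y` would also work.

```r
# Inline copy of the example table; "some.table" itself is not available here.
tbl <- read.table(text = "GeneID Chr Position Interval
x 1 18697251 1:3099740-8804450
y 1 19546617 1:3422930-8804450
z 1 3332236 2:2751757-4502486
a 2 3993537 2:6187995-8804450
b 2 9983384 2:4864334-18271005
c 2 11479025 2:8222941-18271005",
  header = TRUE,
  colClasses = c("character", "character", "numeric", "character"))

# Same helpers as in the answer: split "chr:start-end" into three strings,
# then test chromosome equality and strict containment row by row.
interval <- function(str) unlist(strsplit(str, "[:-]"))
interval_test <- function(t) apply(t, 1, function(x)
  (x[2] == interval(x[4])[1]) &&
  (as.numeric(x[3]) > as.numeric(interval(x[4])[2])) &&
  (as.numeric(x[3]) < as.numeric(interval(x[4])[3])))

rows_inside  <- tbl[interval_test(tbl), ]   # position falls within the interval
rows_outside <- tbl[!interval_test(tbl), ]  # everything else

print(rows_inside)
print(rows_outside)
```

Here `rows_inside` holds genes `b` and `c`, and `rows_outside` holds the remaining four rows, in line with the two outputs above.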