New Column based on another data frame
17 months ago
j.lunger18 ▴ 10

Hi, I am trying to solve the following problem for a data frame of variants, so that I can say which domain each variant falls in.

> ranges
  start end domain_name
1     1   3   beginning
2     4   6     middle1
3     7   8     middle2
4     9  11         end

> positions
   ID position
1   a        0
2   b        1
3   c        2
4   d        3
5   e        4
6   f        5
7   g        6
8   h        7
9   i        8
10  j        9
11  k       10
12  l       11
13  m       12
14  n       13


I want to add a column to "positions" that tells me which domain each position falls in (there could be more than one domain for a single variant). Thanks!

r domains genome • 364 views
17 months ago
Brice Sarver ★ 3.6k

Something like this will work. It assumes non-overlapping domains and needs no special R packages. The values are cast to numeric to avoid any character conflicts. Note that ranges must be a global variable.

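# Return the domain_name of the first range that contains `position`.
# Relies on the global data frame `ranges` (columns: start, end, domain_name).
locate_domain <- function(position) {
  for (i in seq_len(nrow(ranges))) {
    r <- as.numeric(ranges[i, 1]):as.numeric(ranges[i, 2])
    if (position %in% r) {
      return(as.character(ranges[i, 3]))
    }
  }
  # No range matched: return NA so that sapply() yields a plain character vector
  NA_character_
}

positions <- cbind(positions, domain = sapply(as.numeric(positions$position), locate_domain))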


This searches for each position in a range of positions computed on the fly from the ranges data.frame, returns the matching domain, and then cbinds the result to the positions data.frame. Alternatively, you could pre-compute the ranges once, store them in a list named by domain, and return the name of the matching element.
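
A rough sketch of the pre-computed list alternative mentioned above, in case it helps. The data frames are recreated from the question; `domain_ranges` and `locate_domains` are names made up for this illustration, and joining multiple hits with ";" is just one assumed way to handle a variant that falls in more than one domain.

```r
# Example data recreated from the question.
ranges <- data.frame(
  start = c(1, 4, 7, 9),
  end = c(3, 6, 8, 11),
  domain_name = c("beginning", "middle1", "middle2", "end"),
  stringsAsFactors = FALSE
)
positions <- data.frame(
  ID = letters[1:14],
  position = 0:13,
  stringsAsFactors = FALSE
)

# Pre-compute every range once and name each list element by its domain.
domain_ranges <- setNames(
  Map(seq, ranges$start, ranges$end),
  ranges$domain_name
)

# Hypothetical helper: return all domains containing `position`,
# collapsed with ";", or NA when no domain matches.
locate_domains <- function(position, lookup = domain_ranges) {
  hits <- names(lookup)[vapply(lookup, function(r) position %in% r, logical(1))]
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ";")
}

positions$domain <- vapply(positions$position, locate_domains, character(1))
```

Because the lookup list is passed as an argument with a default, nothing has to be global here, and overlapping domains simply produce a combined label instead of only the first match. For large variant tables this loop will still be slow, so a dedicated interval-overlap tool is probably the better route.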