Question: Match values within a range from uneven data frames
odayel wrote (2.4 years ago):

Hello! Thank you for your help in advance.

I have two data frames, DataBase and Hits, and the Results data frame I would like to get from them:

DataBase=
X    Y     OTDB
0    0     OTDB001
2    14    OTDB002
0    0.5   OTDB003

Hits=
X    Y     Signal
0    0     100
2.1  14.3  20
7    15    90

Results=
X    Y     Signal  OTDB
0    0     100     OTDB001
0    0     100     OTDB003
2.1  14.3  20      OTDB002
7    15    90      NA

For every X, Y pair in Hits, I want to search DataBase for entries within a threshold of X +/- 0.1 and Y +/- 0.5. If one or more entries match within the threshold, I want to add their OTDB numbers to a new column in a Results data frame, or "NA" if nothing matches. It is likely that multiple DataBase entries will match a single X, Y pair from Hits.

For perfect matches of X,Y between Database and Hits I used

Results=merge(DataBase, Hits, by=c("X", "Y"), all.x = TRUE, all.y = TRUE)

However I'm having trouble setting the tolerance in searching. Thank you again for any advice!
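
The threshold described above can be written as a pair of absolute-difference checks. This is only a minimal sketch: the helper `within_tol()` and the slack `eps` are hypothetical names, not something from this thread. The slack is there because a value sitting exactly on the boundary (for example 2 vs 2.1 with a tolerance of 0.1) can fail a plain `<=` comparison due to floating-point rounding.

```r
# Hypothetical helper (not from the thread): TRUE where `value` lies within
# `target` +/- `tol`. `eps` is an assumed slack for floating-point rounding.
within_tol <- function(value, target, tol, eps = 1e-9) {
  abs(value - target) <= tol + eps
}

# Thresholds from the question: X +/- 0.1 and Y +/- 0.5.
tol.x <- 0.1
tol.y <- 0.5

# Hit (2.1, 14.3) against database entry (2, 14).
hit.matches <- within_tol(2, 2.1, tol.x) & within_tol(14, 14.3, tol.y)

# Hit (7, 15) against the same database entry.
hit.misses <- within_tol(2, 7, tol.x) & within_tol(14, 15, tol.y)
```

Because the helper is vectorized, `value` can also be a whole column such as `DataBase$X`, which yields one TRUE/FALSE per database row.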

ddiez wrote (2.4 years ago):

This is a way to do it (probably not the best):

# datasets.
db <- data.frame(
  x = c(0, 2, 0),
  y = c(0, 14, .5),
  otdb = c("OTDB001", "OTDB002", "OTDB003")
)
db
  x    y    otdb
1 0  0.0 OTDB001
2 2 14.0 OTDB002
3 0  0.5 OTDB003

hits <- data.frame(
  x = c(0, 2.1, 7),
  y = c(0, 14.3, 15),
  signal = c(100, 20, 90)
)
hits
    x    y signal
1 0.0  0.0    100
2 2.1 14.3     20
3 7.0 15.0     90

tol.x <- 0.1 # set tolerance on x (X +/- 0.1 in the question).
tol.y <- 0.5 # set tolerance on y (Y +/- 0.5 in the question).
# iterate over hits:
res <- lapply(seq_len(nrow(hits)), function(i) {
  h <- hits[i, ]
  sel.x <- db$x <= h$x + tol.x & db$x >= h$x - tol.x
  sel.y <- db$y <= h$y + tol.y & db$y >= h$y - tol.y
  sel <- sel.x & sel.y
  if (any(sel)) {
    data.frame(x = db$x[sel], y = db$y[sel], signal = h$signal, otdb = db$otdb[sel])
  } else {
    data.frame(x = h$x, y = h$y, signal = h$signal, otdb = NA)
  }
})
res <- do.call(rbind, res)
res
  x    y signal    otdb
1 0  0.0    100 OTDB001
2 0  0.5    100 OTDB003
3 2 14.0     20 OTDB002
4 7 15.0     90    <NA>
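
If the loop gets slow on larger tables, a vectorized variant may be worth trying. The sketch below is not ddiez's method: it builds a hits-by-database logical matrix with `outer()` and, unlike the loop above, keeps each hit's own x and y in the output (as in the Results table in the question). The tolerances 0.1 and 0.5 come from the question; `eps` is an assumed slack for floating-point rounding, and the names `ok`, `idx` and `no.match` are illustrative.

```r
# Example data copied from the thread.
db <- data.frame(
  x = c(0, 2, 0),
  y = c(0, 14, 0.5),
  otdb = c("OTDB001", "OTDB002", "OTDB003"),
  stringsAsFactors = FALSE
)
hits <- data.frame(
  x = c(0, 2.1, 7),
  y = c(0, 14.3, 15),
  signal = c(100, 20, 90)
)

tol.x <- 0.1 # X +/- 0.1, as in the question.
tol.y <- 0.5 # Y +/- 0.5, as in the question.
eps <- 1e-9  # assumed slack so boundary cases survive rounding.

# ok[i, j] is TRUE when db row j lies within tolerance of hit i.
ok <- abs(outer(hits$x, db$x, "-")) <= tol.x + eps &
  abs(outer(hits$y, db$y, "-")) <= tol.y + eps

# one output row per (hit, db) match, keeping the hit's own coordinates.
idx <- which(ok, arr.ind = TRUE)
matched <- data.frame(
  hits[idx[, "row"], ],
  otdb = db$otdb[idx[, "col"]],
  stringsAsFactors = FALSE
)

# hits without any match get otdb = NA.
no.match <- which(rowSums(ok) == 0)
unmatched <- data.frame(
  hits[no.match, ],
  otdb = rep(NA_character_, length(no.match)),
  stringsAsFactors = FALSE
)

# combine and restore the original order of the hits.
results <- rbind(matched, unmatched)
results <- results[order(c(idx[, "row"], no.match)), ]
rownames(results) <- NULL
results
```

Note that `outer()` allocates a matrix with nrow(hits) * nrow(db) cells, so for very large tables the loop, or processing the hits in chunks, may be the safer choice.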

odayel replied (2.4 years ago):

Thank you! This works great! I really appreciate your help!! Thank you again!