Question

Matching Column Elements

0

Entering edit mode

10.5 years ago

robjohn7000 ▴ 110

Hi,

I want to compare the elements in column V2 of two tables (below): I'm interested in rows where the elements of columns are close to each other by 3 - 6. The two tables may have different numbers of rows.

For example, row 1 in both tables 1 and 2 differ by 3, I want to add the two elements together and output the mean along with other elements wherever the criterion is satisfied (difference of between 3 - 6) occurs as in the example output.

Table1:

V1    V2    V3    V4    
TX14566 200    27    10
TX14566 187    27    10
TX14566 723    27    10
TX14566 17    27    10

Table2:

V1    V2    V3    V4    
TX14566 203    27    10
TX14566 200    27    10
TX14566 729    27    10
TX14566 39    27    10

Example output (only rows 1 and 3 satisfying the criterion are outputted):

V1    V2    V3    V4    
TX14566 202    27    10
TX14566 726    27    10

I have tried using merge function, thus:

xx <- merge(Tab1,Tab2,by=intersect(Tab1$V2,Tab1$V2+3:4)

which did not work. I'm unsure about the correct way to write the code in R and would be grateful for help. Thanks

r bioconductor • 2.1k views

ADD COMMENT • link updated 7.4 years ago by Biostar 20 • written 10.5 years ago by robjohn7000 ▴ 110

1

Entering edit mode

Please make it more clear. If two tables can have different number of rows than what should be used as an primary identifier to know which two exact rows should be compared. Here all the rows have the same primary identifier i.e. TX14566. Also, what if the values are not different in the first column of a row but some other column. Do you still want to take the mean or ignore them. What if values differ by more than 6 units in one column and less than 3 in other columns. You will have to come up with clear rules. Take some time and create a good example.

ADD REPLY • link 10.5 years ago by Ashutosh Pandey 12k

score 0 · Answer 1 · 2013-11-19

0

Entering edit mode

10.5 years ago

Devon Ryan 104k

The following will show you which rows are separated by the distance range you want. It's completely unclear how you might want dataframes with differing dimensions handled.

#given dataframes foo1 and foo2
foo1[which(abs(foo1$V2-foo2$V2) %in% c(3:6)),]

ADD COMMENT • link 10.5 years ago by Devon Ryan 104k

0

Entering edit mode

Thanks guys! Apologies for my lack of clarity. ashutoshmits - you're right that the same primary identifier ( TX14566) makes the whole thing more difficult than it should be. Only rows satisfying the distance rule are retained, while others are to be eliminated. Following on from dpryan79 code then, which allows me to know the rows satisfying the rule, I will like to find the means of the two values from the two data frames and get a final output as in the example output above (200+203=202; 723+729=726).

Although got errors after modifying the code:

foo1[which(abs(foo1$V2-foo2$V2) %in% c(3:6)),]
file1[which(abs(file1$V2-file2$V2) %in% c(3:6)),]

errors:

<0 rows> (or 0-length row.names)
 Warning message:
 In file1 - file2 :
 longer object length is not a multiple of shorter object length

ADD REPLY • link 10.5 years ago by robjohn7000 ▴ 110