Question: Matching Column Elements
gravatar for robjohn7000
7.0 years ago by
United Kingdom
robjohn7000100 wrote:


I want to compare the elements in column V2 of two tables (below): I'm interested in rows where the elements of columns are close to each other by 3 - 6. The two tables may have different numbers of rows.

For example, row 1 in both tables 1 and 2 differ by 3, I want to add the two elements together and output the mean along with other elements wherever the criterion is satisfied (difference of between 3 - 6) occurs as in the example output.


V1    V2    V3    V4    
TX14566 200    27    10
TX14566 187    27    10
TX14566 723    27    10
TX14566 17    27    10


V1    V2    V3    V4    
TX14566 203    27    10
TX14566 200    27    10
TX14566 729    27    10
TX14566 39    27    10

Example output (only rows 1 and 3 satisfying the criterion are outputted):

V1    V2    V3    V4    
TX14566 202    27    10
TX14566 726    27    10

I have tried using merge function, thus:

xx <- merge(Tab1,Tab2,by=intersect(Tab1$V2,Tab1$V2+3:4)

which did not work. I'm unsure about the correct way to write the code in R and would be grateful for help. Thanks

R bioconductor • 1.6k views
ADD COMMENTlink modified 4.0 years ago by Biostar ♦♦ 20 • written 7.0 years ago by robjohn7000100

Please make it more clear. If two tables can have different number of rows than what should be used as an primary identifier to know which two exact rows should be compared. Here all the rows have the same primary identifier i.e. TX14566. Also, what if the values are not different in the first column of a row but some other column. Do you still want to take the mean or ignore them. What if values differ by more than 6 units in one column and less than 3 in other columns. You will have to come up with clear rules. Take some time and create a good example.

ADD REPLYlink modified 7.0 years ago • written 7.0 years ago by Ashutosh Pandey12k
gravatar for Devon Ryan
7.0 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

The following will show you which rows are separated by the distance range you want. It's completely unclear how you might want dataframes with differing dimensions handled.

#given dataframes foo1 and foo2
foo1[which(abs(foo1$V2-foo2$V2) %in% c(3:6)),]
ADD COMMENTlink modified 7.0 years ago • written 7.0 years ago by Devon Ryan97k

Thanks guys! Apologies for my lack of clarity. ashutoshmits - you're right that the same primary identifier ( TX14566) makes the whole thing more difficult than it should be. Only rows satisfying the distance rule are retained, while others are to be eliminated. Following on from dpryan79 code then, which allows me to know the rows satisfying the rule, I will like to find the means of the two values from the two data frames and get a final output as in the example output above (200+203=202; 723+729=726).

Although got errors after modifying the code:

foo1[which(abs(foo1$V2-foo2$V2) %in% c(3:6)),]
file1[which(abs(file1$V2-file2$V2) %in% c(3:6)),]


<0 rows> (or 0-length row.names)
 Warning message:
 In file1 - file2 :
 longer object length is not a multiple of shorter object length
ADD REPLYlink modified 7.0 years ago • written 7.0 years ago by robjohn7000100
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2161 users visited in the last hour