Question: Matching Column Elements
0
7.0 years ago by
robjohn7000100
United Kingdom
robjohn7000100 wrote:

Hi,

I want to compare the elements in column V2 of two tables (below): I'm interested in rows where the elements of columns are close to each other by 3 - 6. The two tables may have different numbers of rows.

For example, row 1 in both tables 1 and 2 differ by 3, I want to add the two elements together and output the mean along with other elements wherever the criterion is satisfied (difference of between 3 - 6) occurs as in the example output.

Table1:

``````V1    V2    V3    V4
TX14566 200    27    10
TX14566 187    27    10
TX14566 723    27    10
TX14566 17    27    10
``````

Table2:

``````V1    V2    V3    V4
TX14566 203    27    10
TX14566 200    27    10
TX14566 729    27    10
TX14566 39    27    10
``````

Example output (only rows 1 and 3 satisfying the criterion are outputted):

``````V1    V2    V3    V4
TX14566 202    27    10
TX14566 726    27    10
``````

I have tried using merge function, thus:

``````xx <- merge(Tab1,Tab2,by=intersect(Tab1\$V2,Tab1\$V2+3:4)
``````

which did not work. I'm unsure about the correct way to write the code in R and would be grateful for help. Thanks

R bioconductor • 1.6k views
modified 4.0 years ago by Biostar ♦♦ 20 • written 7.0 years ago by robjohn7000100
1

Please make it more clear. If two tables can have different number of rows than what should be used as an primary identifier to know which two exact rows should be compared. Here all the rows have the same primary identifier i.e. TX14566. Also, what if the values are not different in the first column of a row but some other column. Do you still want to take the mean or ignore them. What if values differ by more than 6 units in one column and less than 3 in other columns. You will have to come up with clear rules. Take some time and create a good example.

0
7.0 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

The following will show you which rows are separated by the distance range you want. It's completely unclear how you might want dataframes with differing dimensions handled.

``````#given dataframes foo1 and foo2
foo1[which(abs(foo1\$V2-foo2\$V2) %in% c(3:6)),]
``````

Thanks guys! Apologies for my lack of clarity. ashutoshmits - you're right that the same primary identifier ( TX14566) makes the whole thing more difficult than it should be. Only rows satisfying the distance rule are retained, while others are to be eliminated. Following on from dpryan79 code then, which allows me to know the rows satisfying the rule, I will like to find the means of the two values from the two data frames and get a final output as in the example output above (200+203=202; 723+729=726).

Although got errors after modifying the code:

``````foo1[which(abs(foo1\$V2-foo2\$V2) %in% c(3:6)),]
file1[which(abs(file1\$V2-file2\$V2) %in% c(3:6)),]
``````

errors:

``````<0 rows> (or 0-length row.names)
Warning message:
In file1 - file2 :
longer object length is not a multiple of shorter object length
``````