Question: Comparing 2 Columns at once
2.8 years ago by
mail2steff60
Potsdam, Germany
mail2steff60 wrote:

I am new to R programming. I have a data frame with 120 columns and 518 rows. I need to compare the columns pairwise, two at a time (column 1 with column 2, column 3 with column 4, and so on). If the two values in a pair of successive columns are the same, 0 should be added to a new data frame; if they differ, 1 should be added.

>data
V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C

The output should look like

>new_data_frame
V12 V34 V56
0   0   0
1   0   1
1   1   1

Can anyone help me with this? Thank you in advance

seq R • 638 views
modified 2.8 years ago by shoujun.gu370 • written 2.8 years ago by mail2steff60

You're skipping a couple of columns in your output example. Did you try any code in R? If so, show it along with any errors. If not, try something and come back with it.

written 2.8 years ago by st.ph.n2.5k

I tried the combn function in R:

compare = t(combn(ncol(file8),2,FUN=function(x)file8[,x[1]]==file8[,x[2]]))

But I got the following output:

V1  V2  V3  V4  V5  V6
1   1   1   1   1   1
0   0   0   0   0   0
0   0   0   0   0   0
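
A note on why combn is not a natural fit here: combn(n, 2) enumerates every unordered pair of columns, not just the neighbouring ones. A rough Python sketch of the difference, using itertools.combinations as a stand-in for combn (the column names are the ones from the example in the question):

```python
from itertools import combinations

columns = ["V1", "V2", "V3", "V4", "V5", "V6"]

# Every unordered pair of columns, which is what combn(6, 2) enumerates
all_pairs = list(combinations(columns, 2))

# Only the non-overlapping neighbouring pairs the question asks for
wanted_pairs = list(zip(columns[0::2], columns[1::2]))

print(len(all_pairs))   # 15
print(wanted_pairs)     # [('V1', 'V2'), ('V3', 'V4'), ('V5', 'V6')]
```

So for six columns there are 15 pairs in total, while only 3 of them are wanted.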

modified 2.8 years ago • written 2.8 years ago by mail2steff60
2.8 years ago by
zx87549.7k
London
zx87549.7k wrote:

Taking advantage of recycling in R, we can do as below:

# data
df1 <- read.table(text = "V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C", header = TRUE, stringsAsFactors = FALSE)

# Compare odd columns with even columns: the logical index c(TRUE, FALSE)
# is recycled across all columns. Multiplying by 1 converts TRUE/FALSE to 1/0.
(df1[, c(TRUE, FALSE)] != df1[, c(FALSE, TRUE)]) * 1
#      V1 V3 V5
# [1,]  0  0  0
# [2,]  1  0  1
# [3,]  1  1  1
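
For readers less familiar with recycling: the short logical index c(TRUE, FALSE) is repeated until it covers all six columns, so it keeps columns 1, 3 and 5, while c(FALSE, TRUE) keeps columns 2, 4 and 6. A rough Python illustration of that behaviour on the same sample data (Python lists do not recycle on their own, so itertools.cycle stands in for it; `select` is a hypothetical helper written for this sketch, not an R or pandas function):

```python
from itertools import cycle

columns = ["V1", "V2", "V3", "V4", "V5", "V6"]
rows = [
    ["A", "A", "C", "C", "G", "G"],
    ["A", "G", "T", "T", "C", "G"],
    ["G", "C", "T", "A", "A", "C"],
]

def select(values, mask):
    """Keep the values whose position is True in the repeated mask."""
    return [v for v, keep in zip(values, cycle(mask)) if keep]

odd_names = select(columns, [True, False])    # ['V1', 'V3', 'V5']
even_names = select(columns, [False, True])   # ['V2', 'V4', 'V6']

# 0 if the paired values match, 1 if they differ
result = [
    [int(a != b) for a, b in zip(select(row, [True, False]),
                                 select(row, [False, True]))]
    for row in rows
]
print(result)  # [[0, 0, 0], [1, 0, 1], [1, 1, 1]]
```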
written 2.8 years ago by zx87549.7k

Thank you so much. It worked perfectly.

written 2.8 years ago by mail2steff60
2.8 years ago by
shoujun.gu370
Rockville/MD
shoujun.gu370 wrote:

Here is Python code for the same task; replace the placeholder file names in the first two lines with your real ones:

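input_file = 'your_input_file'
output_file = 'your_output_file'

import pandas as pd

# Assumes a comma-separated file whose first column holds row labels;
# drop index_col=0 if the file has no label column.
df = pd.read_csv(input_file, index_col=0)
col = df.columns

# Compare non-overlapping pairs (1st vs 2nd, 3rd vs 4th, ...):
# 0 if the two values are the same, 1 if they differ.
result = pd.DataFrame(index=df.index)
for i in range(0, len(col) - 1, 2):
    # e.g. 'V1' + '2' -> 'V12' (assumes columns are named V1, V2, ...)
    new_name = col[i] + str(i + 2)
    result[new_name] = (df[col[i]] != df[col[i + 1]]).astype(int)

# Write to the path stored in output_file, not to a file literally named 'output_file'
result.to_csv(output_file)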
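
A vectorized variant can also be sketched and checked on the sample data from the question. This is only a sketch under a few assumptions: the input is whitespace-separated with no row-label column, the number of columns is even, and the columns follow the V1, V2, ... naming (used here only to build names such as V12).

```python
import io
import pandas as pd

# Sample data from the question, assumed whitespace-separated with no row labels
text = """V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

left = df.iloc[:, 0::2]    # 1st, 3rd, 5th, ... columns
right = df.iloc[:, 1::2]   # 2nd, 4th, 6th, ... columns

# Compare by position (to_numpy drops the differing column labels):
# 1 where the pair differs, 0 where it matches
diff = (left.to_numpy() != right.to_numpy()).astype(int)

# Build names like "V12" from "V1" and "V2" (assumes the "V<number>" naming)
names = [a + b.lstrip("V") for a, b in zip(left.columns, right.columns)]
result = pd.DataFrame(diff, columns=names, index=df.index)
print(result)
```

Slicing with iloc avoids the explicit loop, which may be convenient with 120 columns; for a real file, the StringIO object would be replaced by the file path.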
written 2.8 years ago by shoujun.gu370

Thank you for the reply. I'll try this as well.

written 2.8 years ago by mail2steff60