Question: Comparing 2 Columns at once
14 months ago by
mail2steff50
Potsdam, Germany
mail2steff50 wrote:

I am new to R programming. I have a data frame with 120 columns and 518 rows. I need to compare the columns pairwise, two at a time. If the two values in successive columns are the same, 0 should be written to a new data frame; if they differ, 1.

>data
V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C

The output should look like

>new_data_frame
V12 V34 V56
0   0   0
1   0   1
1   1   1

Can anyone help me with this? Thank you in advance

seq R • 392 views
modified 14 months ago by shoujun.gu370 • written 14 months ago by mail2steff50

You're skipping a couple of columns in your output example. Did you try any code in R? If so, show it along with any errors. If not, try something and come back with it.

written 14 months ago by st.ph.n2.4k

I tried the combn function in R:

compare = t(combn(ncol(file8), 2, FUN = function(x) file8[, x[1]] == file8[, x[2]]))

But I got the following output:

V1  V2  V3  V4  V5  V6
1   1   1   1   1   1
0   0   0   0   0   0
0   0   0   0   0   0
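
One likely reason this attempt does not produce three result columns: in R, combn(n, 2) enumerates every unordered pair of column positions, not only the neighbouring pairs (1,2), (3,4), (5,6). The short Python sketch below (Python is used in another answer in this thread) only illustrates the difference in the number of pairs for the six-column example; the names n_cols, all_pairs and wanted_pairs are made up for illustration.

```python
from itertools import combinations

n_cols = 6  # V1..V6 in the example data (assumed size, for illustration only)

# Every unordered pair of column positions: the set that combn(n, 2) walks through in R.
all_pairs = list(combinations(range(1, n_cols + 1), 2))

# The non-overlapping neighbouring pairs the question asks for.
wanted_pairs = [(i, i + 1) for i in range(1, n_cols, 2)]

print(len(all_pairs))  # 15
print(wanted_pairs)    # [(1, 2), (3, 4), (5, 6)]
```

With 120 columns the gap is much larger: 7140 possible pairs against the 60 that are wanted.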

modified 14 months ago • written 14 months ago by mail2steff50
14 months ago by
zx87547.1k
London
zx87547.1k wrote:

Taking advantage of recycling in R, we can do as below:

# data
df1 <- read.table(text = "V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C", header = TRUE, stringsAsFactors = FALSE)

# compare odd columns with even using recycling, then convert to number 0,1.
(!df1[, c(TRUE, FALSE)] == df1[, c(FALSE, TRUE)]) * 1
#      V1 V3 V5
# [1,]  0  0  0
# [2,]  1  0  1
# [3,]  1  1  1
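
For comparison, the same odd-versus-even idea can be sketched in Python with pandas. In the R code above, the logical index c(TRUE, FALSE) is recycled across all columns and so selects columns 1, 3, 5, ...; the step-2 slices below play that role. This is only an illustrative sketch: it assumes an even number of columns, and the names odd, even and diff, as well as the "V1_V2"-style labels, are made up here.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "V1": ["A", "A", "G"],
    "V2": ["A", "G", "C"],
    "V3": ["C", "T", "T"],
    "V4": ["C", "T", "A"],
    "V5": ["G", "C", "A"],
    "V6": ["G", "G", "C"],
})

odd = df.iloc[:, 0::2]   # columns 1, 3, 5, ...
even = df.iloc[:, 1::2]  # columns 2, 4, 6, ...

# Compare by position (to_numpy() sidesteps pandas label alignment),
# then convert the booleans to integers: 0 = same, 1 = different.
diff = pd.DataFrame(
    (odd.to_numpy() != even.to_numpy()).astype(int),
    columns=[a + "_" + b for a, b in zip(odd.columns, even.columns)],
)

print(diff)
```

For the example data this gives the rows 0 0 0, 1 0 1 and 1 1 1, matching the R output.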

Thank you so much. It worked perfectly.

written 14 months ago by mail2steff50
14 months ago by
shoujun.gu370
Rockville/MD
shoujun.gu370 wrote:

Here is the Python code; replace the placeholder file names in the first two lines with your real ones:

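input_file = 'your_input_file'
output_file = 'your_output_file'

import pandas as pd

# index_col=0 treats the first CSV column as row labels, not as data
df = pd.read_csv(input_file, index_col=0)
col = df.columns
col_t = col[:-1]

# name each result after its column pair, e.g. V1 and V2 -> V12
new_col = [col_t[i] + str(i + 2) for i in range(len(col_t))]

# collect results in a separate frame so that a new name such as V12
# cannot overwrite an input column with the same name
result = pd.DataFrame(index=df.index)

# compare each column with its right-hand neighbour: 0 = same, 1 = different
for i in range(len(col_t)):
    result[new_col[i]] = (df[col[i]] != df[col[i + 1]]).astype(int)

# write to the path held in output_file, not to a file literally named 'output_file'
result.to_csv(output_file)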

Thank you for the reply. I'll try this as well.

written 14 months ago by mail2steff50