Question

How to alternate and merge columns from different data frames?

7.4 years ago

Spacebio ▴ 200

Hello,

I have two data frames that look like the examples below. df1 holds the names of a group of pathways, while df2 holds the category of each pathway, in the same order as the pathways appear in df1.

> df1:

Path_1                                           Path_2
Amphoterin signaling                             Antigen presentation
Antigen presentation                             Death Domain receptors & caspases in apoptosis
Regulation of angiogenesis                       Apoptosis stimulation by external signals
Blood vessel morphogenesis                       Regulation of angiogenesis
Cartilage development                            Blood vessel morphogenesis
Apoptosis stimulation by external signals        Cartilage development
Death Domain receptors & caspases in apoptosis   Amphoterin signaling


> df2:

Type_1                     Type_2
Inflammation               Immune response
Immune response            Signal transduction
Development                Apoptosis and survival
Development                Development
Development                Development
Apoptosis and survival     Development
Signal transduction        Inflammation

I'd like to obtain a single data frame in which each column joins the pathway name with its category, like this:

> df_all:

df_all_1
Amphoterin signaling_Inflammation
Antigen presentation_Immune response
Regulation of angiogenesis_Development
Blood vessel morphogenesis_Development
Cartilage development_Development
Apoptosis stimulation by external signals_Apoptosis and survival
Death Domain receptors & caspases in apoptosis_Signal transduction

df_all_2
Antigen presentation_Immune response
Death Domain receptors & caspases in apoptosis_Signal transduction
Apoptosis stimulation by external signals_Apoptosis and survival
Regulation of angiogenesis_Development
Blood vessel morphogenesis_Development
Cartilage development_Development
Amphoterin signaling_Inflammation

I tried with this code:

df_all <- merge(data.frame(df1, row.names = NULL), data.frame(df2, row.names = NULL), by = 0, all = T)[-1]

but this just merges all the columns together without alternating them. Any suggestions? Preferably in base R.

R dataframe • 4.7k views

ADD COMMENT • link 7.4 years ago by Spacebio ▴ 200

The output is stored in a third data frame (df3), in which each column is the concatenation of the matching columns of df1 and df2. This is a blind, position-based concatenation: it assumes that the rows of column 1 in df1 line up exactly with the rows of column 1 in df2, and likewise for every other column. The resulting data frame (df3) has the same number of rows and columns as df1 and df2.

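setwd("~/Desktop/")
# Read both tab-separated files; keep text columns as character vectors
df1 = read.csv("df1.txt", sep = "\t", strip.white = TRUE, stringsAsFactors = FALSE)
df2 = read.csv("df2.txt", sep = "\t", strip.white = TRUE, stringsAsFactors = FALSE)

# Pre-allocate the result with the same dimensions as df1
df3 = data.frame(matrix(NA, ncol = ncol(df1), nrow = nrow(df1)))

# Paste column i of df1 to column i of df2, row by row
for (i in seq_len(ncol(df1))) {
    df3[, i] = paste(df1[, i], df2[, i], sep = "_")
}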

or

df3 = data.frame(sapply(seq_len(ncol(df1)), function(x) paste(df1[, x], df2[, x], sep = "_")), stringsAsFactors = FALSE)

output:

"X1" "X2"
"Amphoterin signaling_Inflammation" "Antigen presentation_Immune response"
"Antigen presentation_Immune response" "Death Domain receptors & caspases in apoptosis_Signal transduction"
"Regulation of angiogenesis_Development" "Apoptosis stimulation by external signals_Apoptosis and survival"
"Blood vessel morphogenesis_Development" "Regulation of angiogenesis_Development"
"Cartilage development_Development" "Blood vessel morphogenesis_Development"
"Apoptosis stimulation by external signals_Apoptosis and survival" "Cartilage development_Development"
"Death Domain receptors & caspases in apoptosis_Signal transduction" "Amphoterin signaling_Inflammation"
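The same position-based pasting can also be sketched without a loop. The example below is only an illustration: it builds small stand-ins for df1 and df2 from the first three rows of the question's data (rather than reading df1.txt and df2.txt), checks that the two data frames have the same shape, and then uses base R's Map() to paste the columns pairwise. The name `pasted` is introduced here for illustration only.

```r
# Small stand-ins for df1 and df2 (first three rows of the example data)
df1 <- data.frame(
  Path_1 = c("Amphoterin signaling", "Antigen presentation",
             "Regulation of angiogenesis"),
  Path_2 = c("Antigen presentation",
             "Death Domain receptors & caspases in apoptosis",
             "Apoptosis stimulation by external signals"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Type_1 = c("Inflammation", "Immune response", "Development"),
  Type_2 = c("Immune response", "Signal transduction",
             "Apoptosis and survival"),
  stringsAsFactors = FALSE
)

# The pasting is purely positional, so both data frames must have the same shape
stopifnot(identical(dim(df1), dim(df2)))

# Map() walks the columns of df1 and df2 in parallel and pastes each pair
pasted <- Map(function(a, b) paste(a, b, sep = "_"), df1, df2)

# Turn the list of character vectors back into a data frame and rename the columns
df3 <- as.data.frame(pasted, stringsAsFactors = FALSE)
names(df3) <- paste0("df_all_", seq_along(df3))
```

Because paste() is vectorized over rows, only the columns need to be iterated; the stopifnot() call makes the "rows and columns match" assumption explicit instead of silent.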

ADD REPLY • link 7.4 years ago by cpad0112 21k

The loop works really fast, thank you so much!!

ADD REPLY • link 7.4 years ago by Spacebio ▴ 200

To name the columns df_all_1, df_all_2, and so on, use the following code:

for (i in seq_len(ncol(df1))) {
    # Paste the i-th columns together, then name the new column df_all_<i>
    df3[, i] = paste(df1[, i], df2[, i], sep = "_")
    colnames(df3)[i] = paste0("df_all_", i)
}

> df3
                                                            df_all_1
1                                  Amphoterin signaling_Inflammation
2                               Antigen presentation_Immune response
3                             Regulation of angiogenesis_Development
4                             Blood vessel morphogenesis_Development
5                                  Cartilage development_Development
6   Apoptosis stimulation by external signals_Apoptosis and survival
7 Death Domain receptors & caspases in apoptosis_Signal transduction
                                                            df_all_2
1                               Antigen presentation_Immune response
2 Death Domain receptors & caspases in apoptosis_Signal transduction
3   Apoptosis stimulation by external signals_Apoptosis and survival
4                             Regulation of angiogenesis_Development
5                             Blood vessel morphogenesis_Development
6                                  Cartilage development_Development
7                                  Amphoterin signaling_Inflammation
>
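If df3 has already been filled, the renaming does not have to happen inside the loop. As a minimal sketch, assuming a data frame that still carries the default X1, X2 names (the two-row df3 below is a hypothetical stand-in, not the real data), all columns can be renamed in one step:

```r
# Hypothetical stand-in for an already-filled df3 with default column names
df3 <- data.frame(
  X1 = c("Amphoterin signaling_Inflammation",
         "Antigen presentation_Immune response"),
  X2 = c("Antigen presentation_Immune response",
         "Death Domain receptors & caspases in apoptosis_Signal transduction"),
  stringsAsFactors = FALSE
)

# seq_along() on a data frame counts its columns, giving df_all_1, df_all_2, ...
names(df3) <- paste0("df_all_", seq_along(df3))
```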

ADD REPLY • link 7.4 years ago by cpad0112 21k

df_all = data.frame(
    df_all1 = paste(df1$Path_1, df2$Type_1, sep = "_"),
    df_all2 = paste(df1$Path_2, df2$Type_2, sep = "_")
)

ADD REPLY • link 7.4 years ago by Chirag Parsania ★ 2.0k