Extract the values by matching two rows of one dataframe with the two columns of another dataframe
1
2
Entering edit mode
6.6 years ago
MAPK ★ 1.9k

Hi everyone,

I have a programming question and I want this to be done in R. I have two data frames. The df1 has first three columns as header line and the file is in xlsx format. The second data frame has first line as a header. I want to search the genes from the first line of df1 along with their corresponding mutation to match the genes and mutation in df2 and extract the corresponding values. I want the corresponding values to be inserted in the sample line(I have indicated those columns with *). Also I am not concerned anything with the columns indicated by X. I am a beginner in programming and would really appreciate if you guys could help me get this done. Thank you.

 

df1:

       
       
Gene MPK MPKK MPKKK
Mutation D888H T66S D44G
Sample * * *
1 X X X
2 X X X
3 X X X

 

df2:

 

Gene Values Mutation
MPK 211 D888H
MPKK 443 T66S
MPK 44 D334G
MPKK 222 S434G
MPKKK 223 D44G

 

 

I want this Output:

Gene MPK MPKK MPKKK
Mutation D888H T66S D44G
Sample 211 443 223
1 X X X
2 X X X
3 X X X

 

 

 

 

match dataframe • 43k views
ADD COMMENT
1
Entering edit mode
6.6 years ago
Phil S. ▴ 670

Hi,

 

this is quick and dirty code. There is, for sure, a way more elegant way of doing this in R. Since I am in a hurry this should get you started for the table structure you provided...

> df1 <- read.table("df1.txt")
> df2 <- read.table("df2.txt")
> df1
        V1    V2   V3    V4
1     Gene   MPK MPKK MPKKK
2 Mutation D888H T66S  D44G
3   Sample     *    *     *
4        1     X    X     X
5        2     X    X     X
6        3     X    X     X
> df2
     V1     V2       V3
1  Gene Values Mutation
2   MPK    211    D888H
3  MPKK    443     T66S
4   MPK     44    D334G
5  MPKK    222    S434G
6 MPKKK    223     D44G
for(i in 2:ncol(df1)){
+ tmp <- df2[which(df2[,1]==df1[1,i]),]
+ df1[3,i] <- tmp[which(tmp[,3]==df1[2,i]),2]
+ }
> df1

 

> df1
        V1    V2   V3    V4
1     Gene   MPK MPKK MPKKK
2 Mutation D888H T66S  D44G
3   Sample   211  443   223
4        1     X    X     X
5        2     X    X     X
6        3     X    X     X

 

Feel free to modify etc...

 

HTH,

 

Phil

ADD COMMENT
2
Entering edit mode

A 2 line (that could be 1) version:

idx <- match(paste(colnames(df1),df1[1,]), paste(df2$Gene,df2$Mutation))
df1[2,2:4] <- df2$Values[idx[-1]]

For really large matches, inner_join/left_join/etc. from dplyr are more efficient. Note that I used the first line as a header, though one could easily modify things were that not the case.

ADD REPLY
0
Entering edit mode

Hi every one , could you please help me on this error , after running above for loop I have below error:

Error in Ops.factor(df2[, 1], df1[1, i]) : level sets of factors are different

any help would appreciated !

Thanks

ADD REPLY

Login before adding your answer.

Traffic: 1886 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6