Question

R programming question: genotyping data table manipulation

10.2 years ago

MAPK ★ 2.1k

Hi guys,

I'm having trouble figuring out how to do this in R. I have two data frames with genotyping data, df1 and df2. The tables are really big, with hundreds of samples, so I have included a small subset of the data below. I have also included the code I tried for building keys so that the columns of the two tables can be matched with each other.

df1:

Gene      MAPK1    MAPK1    MAPK1    MAPK2    MAPK2    MAPK2
TYPE      At       GT       AD       At       GT       AD
Sample    R23C     R23C     R23C     T34Y     T34Y     T34Y
1         A                          G
2         A                          G
3         A                          G

rownames(df1)[c(1,2,3)] <- c("Gene","Type","Sample")
key.df1 <- paste(df1["Gene",], df1["Sample",], sep=":")

Now, we have df2:

df2:

Genes Names         MAPK1    MAPK1    MAPK2    MAPK3    MAPK4    MAPK4
Protein Mutation    R23C     R33Y     T34Y     R45C     T44S     S33D
1.GT                0/0      0/0      0/0      0/0      0/0      0/0
1.AD                34,2     23,4     33,33    33,2     44,44    34,0
2.GT                0/1      0/1      0/1      0/0      0/1      0/1
2.AD                22,3     33,2     44,22    34,22    34,3     91,91
3.GT                1/1      1/1      1/1      1/1      1/1      1/1
3.AD                33,2     3,2      112,0    22,3     34,0     33,2

key.df2 <- paste(df2["Gene Names",], df2["Protein Mutation",], sep=":")

Using these two keys (key.df1 and key.df2), I would like to match the columns of the two tables and, where the keys match, paste the corresponding values into their respective columns. There are 100 samples (1:100) and all 100 samples have GT and AD values. Could you please help me fill in the table below? I would really appreciate it. Thank you.

Result:

Gene      MAPK1    MAPK1    MAPK1    MAPK2    MAPK2    MAPK2
TYPE      At       GT       AD       At       GT       AD
Sample    R23C     R23C     R23C     T34Y     T34Y     T34Y
1         A        0/0      34,2     G        0/0      33,33
2         A        0/1      22,3     G        0/1      44,22
3         A        1/1      33,2     G        1/1      112,0

R • 2.4k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by MAPK ★ 2.1k

I found the tables difficult to read. Do your tables follow this format?

df1:

Gene      MAPK1    MAPK1    MAPK1    MAPK2    MAPK2    MAPK2
TYPE      At       GT       AD       At       GT       AD
Sample    R23C     R23C     R23C     T34Y     T34Y     T34Y
1         A                          G
2         A                          G
3         A                          G

df2:

Genes Names         MAPK1    MAPK1    MAPK2    MAPK3    MAPK4    MAPK4
Protein Mutation    R23C     R33Y     T34Y     R45C     T44S     S33D
1.GT                0/0      0/0      0/0      0/0      0/0      0/0
1.AD                34,2     23,4     33,33    33,2     44,44    34,0

And you want to merge the two tables on the gene name and protein mutation?

ADD REPLY • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by Sam ★ 4.8k

Sorry, Yes you are right. Thank you for replying to my question.

ADD REPLY • link 10.2 years ago by MAPK ★ 2.1k

Sorry, you are right. Thanks

ADD REPLY • link 10.2 years ago by MAPK ★ 2.1k

Ram · Answer 1 · 2015-04-27

10.2 years ago

Michael 55k

I think that merge will do the job, almost. For this to work, rows need to correspond to genes and columns to samples, as is common for many bioinformatics text formats. You could try the following:

t(merge(t(x), t(y), by=1))

This should work in principle; see also ?merge. If you want to remove or rename columns, this can easily be done by subsetting.
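
If merge alone does not get you all the way, a key-based lookup with match() (a different technique from merge) may be easier to follow. The following is only a minimal sketch on the toy data from the question. It assumes both tables are held as character matrices whose row labels are the rownames ("Gene", "Type", "Sample" in df1; "Gene Names", "Protein Mutation", "1.GT", "1.AD", ... in df2); the helper names col.in.df2, samples, result and src.rows are made up for illustration.

```r
# Toy copies of the question's tables as character matrices (assumed layout).
df1 <- rbind(
  Gene   = c("MAPK1", "MAPK1", "MAPK1", "MAPK2", "MAPK2", "MAPK2"),
  Type   = c("At", "GT", "AD", "At", "GT", "AD"),
  Sample = c("R23C", "R23C", "R23C", "T34Y", "T34Y", "T34Y"),
  "1"    = c("A", "", "", "G", "", ""),
  "2"    = c("A", "", "", "G", "", ""),
  "3"    = c("A", "", "", "G", "", "")
)
df2 <- rbind(
  "Gene Names"       = c("MAPK1", "MAPK1", "MAPK2", "MAPK3", "MAPK4", "MAPK4"),
  "Protein Mutation" = c("R23C", "R33Y", "T34Y", "R45C", "T44S", "S33D"),
  "1.GT" = rep("0/0", 6),
  "1.AD" = c("34,2", "23,4", "33,33", "33,2", "44,44", "34,0"),
  "2.GT" = c("0/1", "0/1", "0/1", "0/0", "0/1", "0/1"),
  "2.AD" = c("22,3", "33,2", "44,22", "34,22", "34,3", "91,91"),
  "3.GT" = rep("1/1", 6),
  "3.AD" = c("33,2", "3,2", "112,0", "22,3", "34,0", "33,2")
)

# Build "gene:mutation" keys for the columns of each table.
key.df1 <- paste(df1["Gene", ], df1["Sample", ], sep = ":")
key.df2 <- paste(df2["Gene Names", ], df2["Protein Mutation", ], sep = ":")

# For every df1 column, the index of the matching df2 column (NA if none).
col.in.df2 <- match(key.df1, key.df2)

# Sample IDs are all df1 rows except the three header rows.
samples <- setdiff(rownames(df1), c("Gene", "Type", "Sample"))

result <- df1
# Fill only the GT/AD columns that have a matching key in df2.
for (j in which(df1["Type", ] %in% c("GT", "AD") & !is.na(col.in.df2))) {
  # Rows of df2 holding this value type, e.g. "1.GT", "2.GT", "3.GT".
  src.rows <- paste(samples, df1["Type", j], sep = ".")
  result[samples, j] <- df2[src.rows, col.in.df2[j]]
}

print(result)
```

The sketch stops with a subscript error if df2 lacks a sample.type row (for example "57.AD"), so on real data you would probably want to check that all expected row names are present first. If your tables are data frames rather than matrices, converting them with as.matrix() before building the keys avoids surprises with factor columns.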

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 10.2 years ago by Michael 55k