R programming question: genotyping data table manipulation
9.0 years ago
MAPK ★ 2.1k

Hi guys,

I'm having trouble figuring out how to do this in R. I have two data frames with genotyping data, df1 and df2. The tables are really big, with hundreds of samples, so I have included a small subset below. I have also included the code I used to build a key for each table so that their columns can be matched with each other.

df1:

Gene      MAPK1    MAPK1    MAPK1    MAPK2    MAPK2    MAPK2
TYPE      At       GT       AD       At       GT       AD
Sample    R23C     R23C     R23C     T34Y     T34Y     T34Y
1         A                          G
2         A                          G
3         A                          G
rownames(df1)[c(1,2,3)] <- c("Gene","Type","Sample")
key.df1 <- paste(unlist(df1["Gene",]), unlist(df1["Sample",]), sep=":")

Now, we have df2:

df2:

Genes Names         MAPK1    MAPK1    MAPK2    MAPK3    MAPK4    MAPK4
Protein Mutation    R23C     R33Y     T34Y     R45C     T44S     S33D
1.GT                0/0      0/0      0/0      0/0      0/0      0/0
1.AD                34,2     23,4     33,33    33,2     44,44    34,0
2.GT                0/1      0/1      0/1      0/0      0/1      0/1
2.AD                22,3     33,2     44,22    34,22    34,3     91,91
3.GT                1/1      1/1      1/1      1/1      1/1      1/1
3.AD                33,2     3,2      112,0    22,3     34,0     33,2
key.df2 <- paste(unlist(df2["Gene Names",]), unlist(df2["Protein Mutation",]), sep=":")

I would like to match these two keys (key.df1 and key.df2) against each other and, where they match, paste the corresponding values from df2 into the respective columns of df1. There are 100 samples (1:100), and every sample has GT and AD values. Could you please help me fill in the table below? I would really appreciate it. Thank you.

Result:

Gene      MAPK1    MAPK1    MAPK1    MAPK2    MAPK2    MAPK2
TYPE      At       GT       AD       At       GT       AD
Sample    R23C     R23C     R23C     T34Y     T34Y     T34Y
1         A        0/0      34,2     G        0/0      33,33
2         A        0/1      22,3     G        0/1      44,22
3         A        1/1      33,2     G        1/1      112,0

I found the tables difficult to read. Do they follow this format?

df1:

Gene      MAPK1    MAPK1    MAPK1    MAPK2    MAPK2    MAPK2
TYPE      At       GT       AD       At       GT       AD
Sample    R23C     R23C     R23C     T34Y     T34Y     T34Y
1         A                          G
2         A                          G
3         A                          G

df2:

Genes Names         MAPK1    MAPK1    MAPK2    MAPK3    MAPK4    MAPK4
Protein Mutation    R23C     R33Y     T34Y     R45C     T44S     S33D
1.GT                0/0      0/0      0/0      0/0      0/0      0/0
1.AD                34,2     23,4     33,33    33,2     44,44    34,0

And you want to merge the two tables on gene name and protein mutation?


Sorry, Yes you are right. Thank you for replying to my question.



9.0 years ago
Michael 54k

I think merge will do the job, almost. For this to work, rows need to correspond to genes and columns to samples, as is common for many bioinformatics text formats. You could try the following:

t(merge(t(x), t(y), by=1))

This should work in principle, see also ?merge. If you want to remove or rename columns, this can be easily done using subsetting.
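If merging on the transposed tables gets awkward, a match()-based fill is another option. The sketch below is only an illustration on the toy data from the question. It assumes both tables are held as character matrices whose row names are the labels in the first column, and that the df2 rows are named exactly "Gene Names" and "Protein Mutation", as in the question's key code. The names src, samples and result are made up for this example.

```r
# Toy copies of the tables from the question, as character matrices.
# Assumption: row names are the labels shown in the first column.
df1 <- rbind(
  Gene   = c("MAPK1", "MAPK1", "MAPK1", "MAPK2", "MAPK2", "MAPK2"),
  Type   = c("At", "GT", "AD", "At", "GT", "AD"),
  Sample = c("R23C", "R23C", "R23C", "T34Y", "T34Y", "T34Y"),
  "1"    = c("A", "", "", "G", "", ""),
  "2"    = c("A", "", "", "G", "", ""),
  "3"    = c("A", "", "", "G", "", "")
)
df2 <- rbind(
  "Gene Names"       = c("MAPK1", "MAPK1", "MAPK2", "MAPK3", "MAPK4", "MAPK4"),
  "Protein Mutation" = c("R23C", "R33Y", "T34Y", "R45C", "T44S", "S33D"),
  "1.GT" = c("0/0", "0/0", "0/0", "0/0", "0/0", "0/0"),
  "1.AD" = c("34,2", "23,4", "33,33", "33,2", "44,44", "34,0"),
  "2.GT" = c("0/1", "0/1", "0/1", "0/0", "0/1", "0/1"),
  "2.AD" = c("22,3", "33,2", "44,22", "34,22", "34,3", "91,91"),
  "3.GT" = c("1/1", "1/1", "1/1", "1/1", "1/1", "1/1"),
  "3.AD" = c("33,2", "3,2", "112,0", "22,3", "34,0", "33,2")
)

# One "gene:mutation" key per column of each table.
key.df1 <- paste(df1["Gene", ], df1["Sample", ], sep = ":")
key.df2 <- paste(df2["Gene Names", ], df2["Protein Mutation", ], sep = ":")

# For every df1 column, the index of the df2 column with the same key
# (NA when the variant is absent from df2).
src <- match(key.df1, key.df2)

# Sample IDs are all df1 rows other than the three header rows.
samples <- setdiff(rownames(df1), c("Gene", "Type", "Sample"))

result <- df1
for (j in seq_len(ncol(df1))) {
  type <- df1["Type", j]
  # Only GT and AD columns are filled; "At" columns keep their values.
  if (type %in% c("GT", "AD") && !is.na(src[j])) {
    # df2 rows are named "<sample>.<type>", e.g. "1.GT".
    result[samples, j] <- df2[paste(samples, type, sep = "."), src[j]]
  }
}

print(result, quote = FALSE)
```

With hundreds of samples the same loop applies unchanged, since it iterates over columns and fills all sample rows of a column at once. If your tables are data frames rather than matrices, converting with as.matrix() first should make the indexing above behave the same way.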
