Question: R programming question: genotyping data table manipulation
MAPK (United States) wrote, 2.4 years ago:

Hi guys,

I am having trouble figuring out how to do this in R. I have two data frames with genotyping data, df1 and df2. The real tables are large, with hundreds of samples, so I have included only a small subset below. I have also included the code I tried for building a key from each table so that their columns can be matched against each other.

df1:

Gene    MAPK1  MAPK1  MAPK1  MAPK2  MAPK2  MAPK2
TYPE    At     GT     AD     At     GT     AD
Sample  R23C   R23C   R23C   T34Y   T34Y   T34Y
1       A                    G
2       A                    G
3       A                    G

rownames(df1)[c(1,2,3)] <- c("Gene","Type","Sample")

key.df1 <- paste(df1["Gene",], df1["Sample",], sep=":")

Now, we have df2:

df2:

Gene Names        MAPK1  MAPK1  MAPK2  MAPK3  MAPK4  MAPK4
Protein Mutation  R23C   R33Y   T34Y   R45C   T44S   S33D
1.GT              0/0    0/0    0/0    0/0    0/0    0/0
1.AD              34,2   23,4   33,33  33,2   44,44  34,0
2.GT              0/1    0/1    0/1    0/0    0/1    0/1
2.AD              22,3   33,2   44,22  34,22  34,3   91,91
3.GT              1/1    1/1    1/1    1/1    1/1    1/1
3.AD              33,2   3,2    112,0  22,3   34,0   33,2

key.df2 <- paste(df2["Gene Names",], df2["Protein Mutation",], sep=":")

I would like to match these two keys (key.df1 and key.df2) against each other and, where they match, paste the corresponding df2 values into the respective df1 columns. There are 100 samples (1:100), and every sample has GT and AD values. Could you please help me fill in the table below? I would really appreciate it. Thank you.

Result:

Gene    MAPK1  MAPK1  MAPK1  MAPK2  MAPK2  MAPK2
TYPE    At     GT     AD     At     GT     AD
Sample  R23C   R23C   R23C   T34Y   T34Y   T34Y
1       A      0/0    34,2   G      0/0    33,33
2       A      0/1    22,3   G      0/1    44,22
3       A      1/1    33,2   G      1/1    112,0


I found the tables difficult to read. Do they follow this format?

df1

Gene    MAPK1  MAPK1  MAPK1  MAPK2  MAPK2  MAPK2
TYPE    At     GT     AD     At     GT     AD
Sample  R23C   R23C   R23C   T34Y   T34Y   T34Y
1       A                    G
2       A                    G
3       A                    G

df2

Gene Names        MAPK1  MAPK1  MAPK1  MAPK2  MAPK2  MAPK2
Protein Mutation  R23C   R33Y   T34Y   R45C   T44S   S33D
1.GT              0/0    0/0    0/0    0/0    0/0    0/0
1.AD              34,2   23,4   33,33  33,2   44,44  34,0

And you want to merge the two tables by gene name and protein mutation?

written 2.4 years ago by Sam

Sorry, yes, you are right. Thank you for replying to my question.

written 2.4 years ago by MAPK

Sorry, you are right. Thanks

written 2.4 years ago by MAPK
Michael Dondrup (Bergen, Norway) wrote, 2.4 years ago:

I think that merge will do the job, almost. For this to work, rows need to correspond to genes and columns to samples, as is common for many bioinformatics text formats. Since your tables are laid out the other way round, you could transpose them first and try the following:

t(merge(t(x), t(y), by = 1))

This should work in principle; see also ?merge. If you want to remove or rename columns, this can easily be done using subsetting.
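
To join on gene and protein mutation together, one option is to look up each df1 column in df2 through the two keys from the question, or to merge the transposed tables on both columns. The following is only a minimal sketch on the toy data, not a tested solution for the full tables. It assumes both tables are held as character matrices (for example via as.matrix()) with the row names used in the question, and that every sample row in df1 has matching "<sample>.GT" and "<sample>.AD" rows in df2. The names hit, samples, result and merged are made up for illustration.

```r
# Toy copies of the question's tables as character matrices (features in rows).
df1 <- rbind(
  Gene   = c("MAPK1", "MAPK1", "MAPK1", "MAPK2", "MAPK2", "MAPK2"),
  Type   = c("At", "GT", "AD", "At", "GT", "AD"),
  Sample = c("R23C", "R23C", "R23C", "T34Y", "T34Y", "T34Y"),
  "1"    = c("A", "", "", "G", "", ""),
  "2"    = c("A", "", "", "G", "", ""),
  "3"    = c("A", "", "", "G", "", "")
)
df2 <- rbind(
  "Gene Names"       = c("MAPK1", "MAPK1", "MAPK2", "MAPK3", "MAPK4", "MAPK4"),
  "Protein Mutation" = c("R23C", "R33Y", "T34Y", "R45C", "T44S", "S33D"),
  "1.GT" = c("0/0", "0/0", "0/0", "0/0", "0/0", "0/0"),
  "1.AD" = c("34,2", "23,4", "33,33", "33,2", "44,44", "34,0"),
  "2.GT" = c("0/1", "0/1", "0/1", "0/0", "0/1", "0/1"),
  "2.AD" = c("22,3", "33,2", "44,22", "34,22", "34,3", "91,91"),
  "3.GT" = c("1/1", "1/1", "1/1", "1/1", "1/1", "1/1"),
  "3.AD" = c("33,2", "3,2", "112,0", "22,3", "34,0", "33,2")
)

# Keys as in the question: "gene:mutation" for every column of each table.
key.df1 <- paste(df1["Gene", ], df1["Sample", ], sep = ":")
key.df2 <- paste(df2["Gene Names", ], df2["Protein Mutation", ], sep = ":")

# For each df1 column, the matching df2 column (NA when the key is absent).
hit <- match(key.df1, key.df2)

# Sample rows are everything except the three header rows.
samples <- setdiff(rownames(df1), c("Gene", "Type", "Sample"))

# Fill only GT/AD columns whose key was found; "At" columns stay as they are.
result <- df1
for (j in which(!is.na(hit) & df1["Type", ] %in% c("GT", "AD"))) {
  rows <- paste(samples, df1["Type", j], sep = ".")  # e.g. "1.GT", "2.GT", "3.GT"
  result[samples, j] <- df2[rows, hit[j]]
}
print(result)

# Same join with merge() on the transposed tables, using both key columns.
# This gives a long table (one row per df1 column) rather than the wide result.
merged <- merge(as.data.frame(t(df1)), as.data.frame(t(df2)),
                by.x = c("Gene", "Sample"),
                by.y = c("Gene Names", "Protein Mutation"))
```

On the toy data, hit points df1's MAPK1:R23C columns at the first df2 column and its MAPK2:T34Y columns at the third. Keys that are missing from df2 give NA in hit and are simply skipped, so those columns keep their original (empty) values.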
