Question: (Closed) Id match and Matrix multiplication
akang90 wrote:

I have two files. The row IDs in File1 match the column IDs in File2 (starting from column 7), except that the File2 IDs carry two extra characters at the end. File1 has 50 rows and File2 has 56 columns. Where the IDs match, I want to multiply the entire File2 column by the value in column 3 of File1. In the final output I want to print only column 2 and columns 7 onwards of File2. Any awk or R suggestions?

File1
P1  A   -0.468018   -3.49806    
P2  A   0.0903727   0.675471    
P3  C   0.441187    3.29752 
P4  C   0.240075    1.79437 


File2
ID1 ID2 ID3 ID4 ID5 ID6 P1_A P2_A P3_C........
0 A01 0 0 0 0 0 2 1 
0 A04 0 0 0 0 1 1 0
0 E05 0 0 0 0 0 1 2 
0 G06 0 0 0 0 2 0 2 

Output
ID2  P1 P2 P3........
A01  0*-0.468018 2*0.0903727 ....
A04  1*-0.468018 1*0.0903727...
E05  0*-0.468018 1*0.0903727....
G06  2*-0.468018 0*0.0903727...
Tags: awk, sed, R
written 4.3 years ago by akang90 • modified 4.3 years ago by Shicheng Guo

Hello akang!

We believe that this post does not fit the main topic of this site.

This is a question about R. Please ask it on Stack Overflow.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree, please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

written 4.3 years ago by RamRS

@Ram: File manipulation is an integral part of any bioinformatics analysis, so I think posts about file manipulation should not be closed in this group. After all, it is all about helping each other out.

written 4.2 years ago by akang90

I am sorry, but pure file manipulation questions, even if the file contains biological data, do not qualify as bioinformatics questions unless the operation involved needs bioinformatics.

But yes, we make these calls on a case-by-case basis and if we come across an edge case, we ensure multiple mods agree before such a post is closed.

Your question is not an edge case though. The operation you're trying to do has nothing to do with bioinformatics, and the data does not look distinctly biological either.

written 4.2 years ago by RamRS
Shicheng Guo wrote:

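This would be awkward in awk, but it is straightforward in R. Assuming file1 and file2 are data frames (file2 read with header = TRUE so that its column names are the IDs):

# Scale each sample column (7 onwards) by column 3 of the matching file1 row
for (j in 7:ncol(file2)) {
  id <- sub(".{2}$", "", colnames(file2)[j])  # drop the last 2 characters, e.g. "P1_A" -> "P1"
  file2[, j] <- file2[, j] * file1[match(id, file1[, 1]), 3]
}
out <- file2[, c(2, 7:ncol(file2))]  # keep ID2 plus the scaled columns
out

Now out holds the result you want. A column whose ID has no match in file1 comes back as NA.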
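
For anyone who prefers a command-line route, the same operation can be sketched in awk. This is only a minimal sketch under some assumptions that are not stated in the question: both files are whitespace-delimited, File1 is non-empty, File2 has exactly one header line, and the function name scale_columns and the %.10g output precision are illustrative choices rather than anything from the thread.

```shell
# scale_columns FILE1 FILE2
# Hypothetical helper: multiplies each File2 column from column 7 onwards by
# column 3 of the File1 row whose ID equals the column name minus its last
# two characters, then prints column 2 followed by the scaled columns.
scale_columns() {
  awk '
    # First file: remember the multiplier (column 3) for every row ID.
    NR == FNR { if (NF) w[$1] = $3; next }

    # Header of the second file: trim the two-character suffix from each ID.
    FNR == 1 {
      printf "%s", $2
      for (j = 7; j <= NF; j++) {
        id[j] = substr($j, 1, length($j) - 2)
        printf " %s", id[j]
      }
      print ""
      next
    }

    # Data rows: scale matched columns, print NA where no ID matches.
    NF {
      printf "%s", $2
      for (j = 7; j <= NF; j++) {
        if (id[j] in w) printf " %.10g", $j * w[id[j]] + 0   # "+ 0" avoids printing -0
        else            printf " NA"
      }
      print ""
    }
  ' "$1" "$2"
}
```

It could be called as scale_columns File1 File2 > output.txt. Column names are matched by lookup rather than by position, so the order of rows in File1 does not need to follow the order of columns in File2.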

written 4.3 years ago by Shicheng Guo • modified 4.3 years ago

@Shicheng Guo I get results like this:

ID1 ID2 ID3 ID4 ID5 ID6 P1_A P2_A P3_C........
NA <NA> NA NA NA NA  0 2 1 
NA <NA> NA NA NA NA 1 1 0 
NA <NA> NA NA NA NA 0 1 2 
NA <NA> NA NA NA NA 2 0 2
written 4.3 years ago by akang90 • modified 4.3 years ago