Problem matching two data frames in R
6.4 years ago
Mo ▴ 920

Hello, 

I want to check whether each column of df2 has a match in df1.

For each column, the result should show which elements match rows of df1.

Please see my Stack Overflow question for everything I have tried so far, without success. If this can be done in Python, I would also appreciate any comments.

 

r python • 1.6k views

Add a small example of your two datasets, and then an example of what the result should look like. We might be able to help you then. Off the top of my head, I'm thinking of R's merge or match functions.


@Carlos Guzman A small example won't help because of the structure of the data. I have tried all of those; look at my question on Stack Overflow. Any solution will work for a small example, but it will not work when I use the real data.


I edited your question, using the syntax tag and reading the file directly from GitHub.


I still can't understand the question. Could you make an example of the expected output?


I can see that the first file is taken from a complexes database (Reactome), but I still can't understand the format of the second file. Can you explain what it contains?


I found a solution. Thanks.

6.4 years ago

I haven't grasped the fine details of your data or your aim, but it seems to me that R is not ideal for this job (although it is certainly possible). In Python, I would do the following:

  • Parse the "second" file to build a reference set of unique IDs, i.e. split each row to extract the IDs and add them to the reference set.

  • Iterate through the first file. For each row, extract the IDs and test whether any of them are in the reference set constructed above. If so, print out the row.

It's not clear to me what delimits the IDs (tab, colon, semicolon?), but with Python's str.split() it should be easy to get them for each row.
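The two steps above could be sketched roughly as follows. This is only an illustration under assumptions: the real delimiters are unknown, so the sketch splits on tabs, colons, semicolons, commas and whitespace with re.split() rather than a single str.split() call, and the function names (extract_ids, build_reference_set, matching_rows) and the sample rows are hypothetical, not taken from the actual files.

```python
import re

# Assumption: IDs are separated by tabs, colons, semicolons, commas or spaces.
# Adjust this pattern once the real delimiter is known.
ID_DELIMITERS = re.compile(r"[\t:;,\s]+")


def extract_ids(line):
    """Split one row into its non-empty tokens (candidate IDs)."""
    return [tok for tok in ID_DELIMITERS.split(line.strip()) if tok]


def build_reference_set(lines):
    """Step 1: collect every unique ID found in the 'second' file."""
    reference_ids = set()
    for line in lines:
        reference_ids.update(extract_ids(line))
    return reference_ids


def matching_rows(lines, reference_ids):
    """Step 2: yield (row, matched IDs) for rows of the first file with a hit."""
    for line in lines:
        hits = [tok for tok in extract_ids(line) if tok in reference_ids]
        if hits:
            yield line.rstrip("\n"), hits


# Made-up rows standing in for the two files (hypothetical data).
second_file = ["P12345;Q67890", "O11111"]
first_file = [
    "complexA\tP12345:P99999",
    "complexB\tP00000",
    "complexC\tO11111;Q67890",
]

reference = build_reference_set(second_file)
for row, hits in matching_rows(first_file, reference):
    print(row, hits)
```

With real files, the lists could be replaced by open file handles, since both functions only iterate over lines. Using a set for the reference IDs keeps each membership test cheap even when the files are large.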


@dariober I found a solution. Thanks.


So what is the solution? :D


@medhat The solution is to use Python and do exactly what dariober said.

