Question

Using Python, how to compare two columns in two different CSV files, and then print the similar lines and the different lines

7.2 years ago

hamzaallal07 • 0

I have two files, each containing two columns. I need to compare the labels in the first column of file1.csv with those in the first column of file2.csv. If a label appears in both files, I print the label followed by the second-column value from each file. For example, given

in file1.csv: C(2)—C(1) 1.5183
in file2.csv: C(2)—C(1) 1.5052

the line in output.csv should be: C(2)—C(1) 1.5183 1.5052

If a label appears in only one of the files, I print that line from whichever file (file1 or file2) contains it.
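The matching rule above is essentially a join on the first column. Here is a minimal plain-Python sketch (standard library only). It assumes each line holds a label and a value separated by whitespace (tabs or spaces) and that labels never contain spaces. It also relies on Python 3.7+ dicts keeping insertion order. The function names `parse_pairs` and `similar_lines` are hypothetical.

```python
import io
from typing import Dict, List, TextIO, Tuple


def parse_pairs(handle: TextIO) -> Dict[str, str]:
    """Read 'label value' lines into a dict, keeping file order.

    Lines without exactly two whitespace-separated fields
    (e.g. blank lines) are skipped. Values stay strings so they
    are printed exactly as they appear in the file.
    """
    pairs: Dict[str, str] = {}
    for line in handle:
        fields = line.split()
        if len(fields) == 2:
            label, value = fields
            pairs[label] = value
    return pairs


def similar_lines(file1: Dict[str, str],
                  file2: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Labels present in both files, in file1 order, with both values."""
    return [(label, value, file2[label])
            for label, value in file1.items() if label in file2]


# Small in-memory stand-ins for file1.csv and file2.csv
file1 = parse_pairs(io.StringIO("C(2)—C(1) 1.5183\nO(4)—C(3) 1.4104\n"))
file2 = parse_pairs(io.StringIO("C(2)—C(1) 1.5052\nS(4)—C(3) 1.7976\n"))
print(similar_lines(file1, file2))
# [('C(2)—C(1)', '1.5183', '1.5052')]
```

With real files, `open("file1.csv", encoding="utf-8")` can replace the `io.StringIO` stand-ins. UTF-8 is an assumption, chosen because the labels contain the non-ASCII "—" character.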

Here are my two .csv files:

file1.csv

C(2)—C(1) 1.5183
C(3)—C(2) 1.49
C(3)—C(1) 1.4991
O(4)—C(3) 1.4104
H(10)—O(4) 0.964
C(2)—C(1)—C(3) 59.19
C(3)—C(1)—H(5) 118.4

file2.csv

C(2)—C(1) 1.5052
C(3)—C(2) 1.505
C(3)—C(1) 1.5037
S(4)—C(3) 1.7976
H(10)—S(4) 1.3445
C(2)—C(1)—H(6) 117.68
C(2)—C(1)—C(3) 60.3
C(3)—C(1)—H(5) 116.99

And here is my desired output:

similar_lines
C(2)-C(1)           1.5183    1.5052
C(3)-C(2)           1.49      1.505
C(3)-C(1)           1.4991    1.5037
C(2)-C(1)-C(3)      59.19     60.3
C(3)-C(1)-H(5)      118.4     116.99

different_lines
O(4)—C(3)           1.4104      –
H(10)—O(4)          0.964       –
S(4)—C(3)            –       1.7976
H(10)-S(4)           –       1.3445
C(2)-C(1)-H(6)       –       117.68

For the similar lines, I found a very good script on this website, "Compare two columns in several different files with Perl or Python", which solves that part.

However, I have no idea how to print the different lines.
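One way to get the different lines is to keep every label that is missing from the other file. This is a plain-Python sketch, not the method from the linked script. It is equivalent to a set difference on the labels, but it preserves file order. It assumes the two files have already been read into dicts mapping label to value. `different_lines` and the `–` placeholder are illustrative choices.

```python
from typing import Dict, List, Tuple

MISSING = "–"  # placeholder for a value absent from one file


def different_lines(file1: Dict[str, str],
                    file2: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Rows whose label appears in only one file.

    file1-only rows come first, then file2-only rows, each in file order.
    """
    only1 = [(label, value, MISSING)
             for label, value in file1.items() if label not in file2]
    only2 = [(label, MISSING, value)
             for label, value in file2.items() if label not in file1]
    return only1 + only2


file1 = {"C(2)—C(1)": "1.5183", "O(4)—C(3)": "1.4104", "H(10)—O(4)": "0.964"}
file2 = {"C(2)—C(1)": "1.5052", "S(4)—C(3)": "1.7976", "C(2)—C(1)—H(6)": "117.68"}
for row in different_lines(file1, file2):
    print("\t".join(row))
```

The shared label C(2)—C(1) is excluded. The remaining rows follow the order of the desired output above: file1-only entries first, then file2-only entries.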

Can you please give us some context on how this is related to bioinformatics?

Reply by Ram, 7.2 years ago

Hi cpad0112, sorry for the delay.

I am a researcher in computational (theoretical) chemistry, i.e. chemoinformatics. I use the ORCA program (https://orcaforum.cec.mpg.de) to calculate several parameters related to molecular structures. I prepare an input file describing a molecular structure, run the calculation, and get the results in a large output file. After each calculation I collect my results.

The script above makes my work easier: it lets me compare the bond lengths and angles from two results for similar molecules.

Reply by hamzaallal07, 7.2 years ago

No problem. Please tag your posts appropriately and give the forum the context of the problem. Most members here are not only programmers; they are also knowledgeable in several subjects (mostly bioinformatics). They appreciate knowing the context, and it helps them come up with a better solution to the posted issue. Sometimes what we think is the right solution may not be appropriate for the actual problem. Good luck with your research and keep posting here :).

Reply by cpad0112, 7.2 years ago

Hello hamzaallal07!

We believe that this post does not fit the main topic of this site.

Please see: C: Using Python, How to compare two columns in two different csv files, and then pr

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

Reply by Ram, 7.2 years ago

score 2 · Answer 1 · 2018-05-04

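import pandas as pd

# Read both tab-separated files (no header row) and name the columns:
# a = bond/angle label, b = value
test1 = pd.read_csv("test1.txt", sep="\t", header=None, names=["a", "b"])
test2 = pd.read_csv("test2.txt", sep="\t", header=None, names=["a", "b"])

# Similar lines: labels present in both files (inner join on column a)
print(pd.merge(test1, test2, on="a", how="inner"))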

    a   b_x     b_y
0   C(2)—C(1)   1.5183  1.5052
1   C(3)—C(2)   1.4900  1.5050
2   C(3)—C(1)   1.4991  1.5037
3   C(2)—C(1)—C(3)  59.1900     60.3000
4   C(3)—C(1)—H(5)  118.4000    116.9900

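# Different lines: an outer join with indicator=True adds a _merge column
# ("left_only", "right_only" or "both"); keep the rows not found in both files
outer_common = pd.merge(test1, test2, on="a", how="outer", indicator=True)
outer_common_nc = outer_common[outer_common["_merge"] != "both"]
print(outer_common_nc.iloc[:, 0:3])  # first three columns, i.e. drop _merge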
a   b_x     b_y
3   O(4)—C(3)   1.4104  NaN
4   H(10)—O(4)  0.9640  NaN
7   S(4)—C(3)   NaN     1.7976
8   H(10)—S(4)  NaN     1.3445
9   C(2)—C(1)—H(6)  NaN     117.6800

Works with Python 3.6 and pandas 0.22.
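The answer prints the two tables to the screen, while the question asks for an output.csv containing both sections. Below is a minimal sketch using the standard `csv` module. It assumes the three-column rows are already available as tuples. With pandas, they could come from `itertuples(index=False)` on the merged DataFrames, after `fillna` replaces `NaN` with a placeholder. `write_report`, the tab delimiter and the `–` placeholder are illustrative choices, not part of the answer.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO


def write_report(out: TextIO,
                 similar: Iterable[Sequence[object]],
                 different: Iterable[Sequence[object]]) -> None:
    """Write both sections as tab-separated rows, each under a title line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["similar_lines"])
    writer.writerows(similar)
    writer.writerow([])  # empty row: blank line between the sections
    writer.writerow(["different_lines"])
    writer.writerows(different)


# StringIO stands in for open("output.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
write_report(buffer,
             similar=[("C(2)—C(1)", "1.5183", "1.5052")],
             different=[("O(4)—C(3)", "1.4104", "–"),
                        ("S(4)—C(3)", "–", "1.7976")])
print(buffer.getvalue())
```

When writing to a real file, opening it with `newline=""` follows the recommendation in the `csv` module documentation for writer objects.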