Question: (Closed) How to make a Python script that compares two files
naiannegri (Brazil) wrote, 4.0 years ago:

Hey guys,

I'm new to Python, so I'm really struggling to write a script.

What I need is to compare two files. One file contains all the proteins in a database; the other contains only some of those proteins, the ones that belong to my organism. So I need to know which proteins of this database are present in my organism. For that, I want to build an output like a matrix, with a 0 or 1 for every protein in the database, depending on whether or not it is in my organism.

Does anybody have an idea of how I could do that?

I'm thinking of something like this:

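# Read the organism's proteins once, so they can be checked against every database line
with open('file2.txt', 'r') as f2:
    organism = [line.strip() for line in f2]

with open('file1.txt', 'r') as f1, open('output.txt', 'w') as FO:
    for line1 in f1:
        protein = line1.strip()
        # 1 if this database line also appears in the organism file, otherwise 0
        found = 0
        for line2 in organism:
            if protein == line2:
                found = 1
                break
        FO.write("%s\t%d\n" % (protein, found))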

But there's a problem: that script only counts a match when the lines are exactly equal, and that won't work, because I only need one word (the protein name) to be the same.
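A minimal sketch of matching on one word instead of the whole line, assuming (hypothetically) that the protein name is the first whitespace-separated field of each line. The helper names `protein_name` and `presence_flags`, and the extra columns in the sample organism lines, are made up for illustration:

```python
def protein_name(line):
    """Hypothetical helper: assume the protein name is the first whitespace-separated field."""
    fields = line.split()
    return fields[0] if fields else ""


def presence_flags(database_lines, organism_lines):
    """Return (name, 0 or 1) for each database line, comparing names only."""
    # A set gives a fast membership test for each database name
    organism = {protein_name(line) for line in organism_lines}
    organism.discard("")  # blank lines produce an empty name; ignore it
    flags = []
    for line in database_lines:
        name = protein_name(line)
        if name:  # skip blank database lines
            flags.append((name, 1 if name in organism else 0))
    return flags


# Hypothetical input: the organism lines carry extra columns after the name
database = ["1-cysPrx_C\n", "120_Rick_ant\n", "2-Hacid_dh\n"]
organism = ["1-cysPrx_C\tcolumn2\tcolumn3\n", "2-Hacid_dh\tcolumn2\tcolumn3\n"]
for name, flag in presence_flags(database, organism):
    print(f"{name}\t{flag}")
```

Because the set is built once, each database line costs a single lookup instead of a full pass over the organism file.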


Could anybody please help me?

Thanks in advance.


Tags: pfam, script
Question last modified 4.0 years ago by Pierre Lindenbaum

Why don't you use the Linux diff command?

Reply written 4.0 years ago by Irsan

That command only compares line by line, and it won't produce output with 0s and 1s, which is what I need.

Reply written 4.0 years ago by naiannegri

Can you post a few lines of the database and your file? 

Reply written 4.0 years ago by Damian Kao

1-cysPrx_C
120_Rick_ant
14-03-2003
2-Hacid_dh
2-Hacid_dh_C
2-oxoacid_dh
2-ph_phosp
2CSK_N

comparing with

1-cysPrx_C
14-3-3
2-Hacid_dh
2-Hacid_dh_C
2-oxoacid_dh
2H-phosphodiest
2OG-FeII_Oxy
2OG-FeII_Oxy_3
2OG-FeII_Oxy_4

Just an example.

Reply written 4.0 years ago by naiannegri
$ cat sorted.a
A
B
C
D
$ cat sorted.b
A
D
$ join  sorted.a sorted.b | sed 's/^/1 /' && join  -v 1 sorted.a sorted.b | sed 's/^/0 /'
1 A
1 D
0 B
0 C
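The `join` / `join -v 1` pipeline above relies on both files being sorted. The same idea can be sketched in Python as a single merge-style pass over two sorted lists (the function name `merge_flags` is hypothetical). Unlike the shell version, it emits the rows in database order instead of grouping the 1s before the 0s:

```python
def merge_flags(sorted_db, sorted_org):
    """Walk two sorted lists once, join-style, and flag each database entry with 1 or 0."""
    result = []
    j = 0
    for name in sorted_db:
        # Advance the organism pointer past entries smaller than the current name
        while j < len(sorted_org) and sorted_org[j] < name:
            j += 1
        found = j < len(sorted_org) and sorted_org[j] == name
        result.append(f"{int(found)} {name}")
    return result


for row in merge_flags(["A", "B", "C", "D"], ["A", "D"]):
    print(row)
```

As with `join`, both inputs must be sorted in the same order the code compares them. Python compares strings by code point, which for plain ASCII names matches the byte order produced by `LC_ALL=C sort`.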

Reply written 4.0 years ago by Pierre Lindenbaum

Why don't you use the Linux comm command?

Reply written 4.0 years ago by Pierre Lindenbaum

That command only compares line by line, and it won't produce output with 0s and 1s, which is what I need.

Reply written 4.0 years ago by naiannegri

Can you use "in" instead of "==" (http://www.tutorialspoint.com/python/membership_operators_example.htm)?

Otherwise, you can use a regular expression.
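One caveat with `in` on strings: it tests for a substring, so a shorter name can match inside a longer one, as with 2-Hacid_dh and 2-Hacid_dh_C from the example lists. A minimal sketch of the difference, with a regular expression that only accepts the name as a whole whitespace-delimited token (`contains_token` is a hypothetical helper):

```python
import re

line = "2-Hacid_dh_C"

# Substring test: True, even though 2-Hacid_dh and 2-Hacid_dh_C are different names
print("2-Hacid_dh" in line)  # True


def contains_token(name, line):
    """Hypothetical helper: True only if name occurs as a whole whitespace-delimited token."""
    # re.escape escapes any characters in the name that would otherwise be read as regex syntax
    pattern = r"(?<!\S)" + re.escape(name) + r"(?!\S)"
    return re.search(pattern, line) is not None


print(contains_token("2-Hacid_dh", line))    # False
print(contains_token("2-Hacid_dh_C", line))  # True

# "in" applied to a set of names (not to a string) is already an exact match
names = {"2-Hacid_dh_C", "2-oxoacid_dh"}
print("2-Hacid_dh" in names)  # False
```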

Can you show a few lines of each file and how you want the comparison done?

Reply written 4.0 years ago by Janake

Hello naiannegri!

We believe that this post does not fit the main topic of this site.

Not related to bioinformatics: just basic Python/Linux, nothing bio-related inside.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

Reply written 4.0 years ago by Pierre Lindenbaum

Actually, I'm trying to write this script because I need a matrix-like file that tells me which proteins are present in a proteome compared with the whole Pfam database.
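A minimal sketch of writing that matrix-like file, assuming one family name per line in the Pfam list and in each proteome file (all file names here are hypothetical). It writes one row per database family and one 0/1 column per proteome file, so a single proteome gives a one-column matrix:

```python
import csv
from pathlib import Path


def read_names(path):
    """Return the non-blank lines of a file, stripped, in file order."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def write_presence_matrix(database_path, proteome_paths, out_path):
    """Write a tab-separated 0/1 matrix: rows are database families, columns are proteomes."""
    families = read_names(database_path)
    # One set per proteome, so every membership test is a hash lookup
    proteomes = [set(read_names(path)) for path in proteome_paths]
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        # Header row: "family" followed by each proteome's file name without extension
        writer.writerow(["family"] + [Path(path).stem for path in proteome_paths])
        for family in families:
            writer.writerow([family] + [int(family in names) for names in proteomes])


# Example call (hypothetical file names):
# write_presence_matrix("pfam_families.txt", ["my_organism.txt"], "matrix.tsv")
```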

Reply written 4.0 years ago by naiannegri

So you need to find the intersection of two sets... You don't need to write a new script for that...
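Applied to the two example lists posted earlier in the thread (assuming the first list is the database and the second is the organism), the set intersection and the 0/1 vector it implies can be sketched like this:

```python
database = ["1-cysPrx_C", "120_Rick_ant", "14-03-2003", "2-Hacid_dh",
            "2-Hacid_dh_C", "2-oxoacid_dh", "2-ph_phosp", "2CSK_N"]
organism = ["1-cysPrx_C", "14-3-3", "2-Hacid_dh", "2-Hacid_dh_C",
            "2-oxoacid_dh", "2H-phosphodiest", "2OG-FeII_Oxy",
            "2OG-FeII_Oxy_3", "2OG-FeII_Oxy_4"]

# Names present in both lists
shared = set(database) & set(organism)
print(sorted(shared))  # ['1-cysPrx_C', '2-Hacid_dh', '2-Hacid_dh_C', '2-oxoacid_dh']
print(len(shared), "of", len(database), "database entries are in the organism")  # 4 of 8 ...

# Turn the intersection back into one 0/1 value per database entry, in database order
flags = [1 if name in shared else 0 for name in database]
print(flags)  # [1, 0, 0, 1, 1, 1, 0, 0]
```

An exact comparison treats 14-03-2003 and 14-3-3 as different entries, so it is worth checking that both files spell each name the same way.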
Reply written 4.0 years ago by Irsan

Hi Pierre,

I think you should make a decision whether to close the question or to answer it, but not both....

Reply written 4.0 years ago by Michael Dondrup

You're right: I moved my answer to a comment.

Reply written 4.0 years ago by Pierre Lindenbaum
The thread is closed. No new answers may be added.