Question: How to compare sets using Python (dealing with PDB files)
Asked 5.4 years ago by Jason Lin (United States)

Hi all,

Sorry to bother you all again. I have a text file that contains PDB IDs and the corresponding missing coordinates from each PDB file, such as:

1FZ2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZH 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

I also have another text file that contains PDB IDs and the SEG signal (the signal that indicates low-complexity regions in the protein sequence), such as:

1FZ2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZ4 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZ5 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZ8 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZ9 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZH 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354

The numbers in each file are coordinates. I want to compare the two files and generate a file that contains each PDB ID and the coordinates that overlap between the SEG signal and the missing coordinates.

In this case I want to generate a file like:

1FZ2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ4 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ5 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ8 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ9 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZH 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
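
For a single pair of lines, the overlap described above amounts to a set intersection. The following is only a minimal sketch: the helper name `overlap` is hypothetical, and it assumes that every coordinate is a plain integer so the result can be sorted numerically.

```python
def overlap(missing_line, seg_line):
    """Return the coordinates present in both lines, sorted numerically."""
    missing = missing_line.split()
    seg = seg_line.split()
    # The first field is the PDB ID; the remaining fields are coordinates.
    common = set(missing[1:]) & set(seg[1:])
    # Assumption: coordinates are plain integers, so int() is a safe sort key.
    return sorted(common, key=int)


missing_line = "1FZ2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16"
seg_line = "1FZ2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340"
print(" ".join(["1FZ2"] + overlap(missing_line, seg_line)))
```

Sorting is needed because a set has no defined order; without it the coordinates could be written in any sequence.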

Here is my Python code so far:

    total = []

    fin = open('file1.txt')      # I want to make the missing coordinates file a set called 'a'
    for lines in fin:
        l = lines.split()
        a = set(l[2:])
        print a

    with open('file2.txt') as seg_num:     #  I want to make the SEG signal another set called 'b'
        for seg_signal in seg_num:
            signal = seg_signal.split()
            b = set(signal[1:])
            print("lol" * 10)
            print b
            c = a & b                       # and pick the intersection between a and b called c
            space = ' '
            newlines = '\n'

            total.append([signal[0], space, str(c), newlines])

    with open('file3.txt', 'w') as f:
        for t in total:
            f.write(" ".join(t))

    f.close()

But for some reason it does not give the desired answer, and I don't know how to fix it.

Answer, 5.4 years ago by dariober (WCIP, Glasgow, UK):

This is how I would do it. The IN_PDB file is read into memory as a dictionary keyed by the first column, which is assumed to be a unique identifier. The common coordinates are found with the list comprehension [x for x in pdb[k] if x in coords]:

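    #!/usr/bin/env python

    IN_PDB = 'pdb.txt'      # PDB IDs with missing coordinates
    IN_SEG = 'seg.txt'      # PDB IDs with SEG signal coordinates
    OUT_PDB = 'outpdb.txt'  # PDB IDs with the coordinates common to both

    # Read the missing coordinates into a dictionary keyed by PDB ID.
    pdb = {}
    with open(IN_PDB) as inpdb:
        for line in inpdb:
            line = line.strip().split()
            if line:  # skip blank lines
                pdb[line[0]] = line[1:]

    # For each SEG line, write the missing coordinates that are also in the SEG signal.
    with open(IN_SEG) as inseg, open(OUT_PDB, 'w') as outsig:
        for line in inseg:
            line = line.strip().split()
            if not line:  # skip blank lines
                continue
            k = line[0]
            coords = line[1:]
            if k in pdb:
                common = [x for x in pdb[k] if x in coords]
                outsig.write(k + '\t' + '\t'.join(common) + '\n')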
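
A set-based variant of the same idea, closer to the `a & b` approach in the question, might look like the sketch below. The function names `read_coords` and `write_overlaps` are hypothetical, and the code assumes that every coordinate is a plain integer. Compared with the code in the question, it keeps one set per PDB ID instead of overwriting `a` on every line, slices from index 1 so the first coordinate is not dropped, and writes the sorted coordinates rather than `str(c)`.

```python
def read_coords(path):
    """Map each PDB ID to the set of coordinates found on its line."""
    coords = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                coords[fields[0]] = set(fields[1:])
    return coords


def write_overlaps(missing_path, seg_path, out_path):
    """Write, per PDB ID, the coordinates shared by both input files."""
    missing = read_coords(missing_path)
    with open(seg_path) as seg, open(out_path, "w") as out:
        for line in seg:
            fields = line.split()
            if not fields or fields[0] not in missing:
                continue
            common = missing[fields[0]] & set(fields[1:])
            # Assumption: coordinates are plain integers, so sort numerically.
            ordered = sorted(common, key=int)
            out.write(" ".join([fields[0]] + ordered) + "\n")
```

Calling `write_overlaps('pdb.txt', 'seg.txt', 'outpdb.txt')` would then produce one space-separated line per PDB ID that appears in both files.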