How to align disorder regions and sequences on a larger scale (python)
9.8 years ago
Jason Lin • 0

This is a follow-up to my previous question, "How to align and compare two elements (sequence) in a list using python", which has been solved thanks to @mdml. Here is the code that I'm using (code credit to mdml):

# Parse the file which was already split into split_list
lines = open("seq.txt")
for list in lines:
    split_list = list.split()
    header = "".join(split_list[0:2])
    seq = split_list[2]
    disorder = split_list[4]

    # Create the new disorder string
    new_disorder = ["Disorder: Posi R"]
    for i, x in enumerate(disorder):
        if x == "X":
            # Appends of the form: "Position AminoAcid"
            new_disorder.append("{} {}".format(i, seq[i]))

    new_disorder = " ".join(new_disorder)

    # Output the modified file
    open("seq2.txt", "w").write( "\n".join([header, seq, new_disorder]))

This code works perfectly with my example, which is:

103L Sequence: MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL Disorder: ----------------------------------XXXXXX-----------------------------------------------------------------------------------------------------------------------------XX

However, when I run this code on multiple protein sequences, it still works, but only the last protein sequence and its disordered region show up in the new file. What should I do to fix it?

protein-sequence python • 2.1k views
9.8 years ago
kavin.pl ▴ 70

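The problem is that "seq2.txt" is opened for writing on every pass through the loop. Opening a file in "w" mode truncates it, so each line overwrites the output of the previous one and only the last sequence survives. Open the output file once, before the loop, and indent the rest of the code so that it runs inside that block.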
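To see the overwrite behaviour in isolation, here is a minimal sketch. The file name demo.txt and the three records are made up for illustration; they are not from your data. Re-opening with mode "w" inside the loop leaves only the last record, while opening once before the loop keeps all of them.

```python
import os
import tempfile

records = ["first", "second", "third"]  # made-up stand-ins for protein records

# Work in a temporary directory so nothing real is overwritten.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

# Re-opening in "w" mode inside the loop truncates the file on every pass.
for rec in records:
    with open(path, "w") as out:
        out.write(rec + "\n")
with open(path) as f:
    reopened_each_time = f.read()

# Opening once, before the loop, keeps everything that was written.
with open(path, "w") as out:
    for rec in records:
        out.write(rec + "\n")
with open(path) as f:
    opened_once = f.read()

print(repr(reopened_each_time))  # only the last record
print(repr(opened_once))         # all three records
```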

Try:

# Read seq.txt line by line and write every record to seq2.txt
with open("seq.txt", "r") as lines:
    with open("seq2.txt", "w") as output:
        for line in lines:
            split_list = line.split()
            header = "".join(split_list[0:2])
            seq = split_list[2]
            disorder = split_list[4]
            # Create the new disorder string
            new_disorder = ["Disorder:\nPosi\tR"]
            for i, x in enumerate(disorder):
                if x == "X":
                    # Appends of the form: "Position<TAB>AminoAcid" (position is 0-based)
                    new_disorder.append("{}\t{}".format(i, seq[i]))

            new_disorder = " ".join(new_disorder)

            # Output the modified file
            output.write("\n".join([header, seq, new_disorder])+"\n\n")

It's good practice to close files within the script. Opening them with with open(...) as ...: at the start does this for you: the file is closed automatically when the block ends, so you don't have to remember to close it yourself.
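If it helps, the per-line logic can also be pulled out into a small function so it can be checked without touching any files. This is only a sketch: the name format_record and the one_based switch are my own additions, not part of the code above, and it assumes every line has exactly the five whitespace-separated fields of your example. It also puts one residue per line and a space in the header, which are layout choices of mine. Note that enumerate counts from 0, so the positions reported by the code above are 0-based; pass one_based=True if you want numbering that starts at 1.

```python
def format_record(line, one_based=False):
    """Return header, sequence and a table of disordered residues for one line.

    Hypothetical helper: assumes the line looks like
    '<id> Sequence: <seq> Disorder: <mask>' with 'X' marking disorder.
    """
    fields = line.split()
    header = " ".join(fields[0:2])
    seq, disorder = fields[2], fields[4]
    offset = 1 if one_based else 0
    rows = ["Disorder:", "Posi\tR"]
    for i, flag in enumerate(disorder):
        if flag == "X":
            rows.append("{}\t{}".format(i + offset, seq[i]))
    return "\n".join([header, seq] + rows)


# Made-up toy record, not real data.
example = "1ABC Sequence: MNIFE Disorder: -XX--"
print(format_record(example))
print(format_record(example, one_based=True))
```

Inside the loop you would then call output.write(format_record(line) + "\n\n") for each line.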

Hope that helps.
