Bioinformatics Q: How to align and compare two elements (sequence) in a list using python
9.8 years ago
Jason Lin • 0

Here is my question:

I've got a file which looks like this:

103L Sequence: MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL Disorder: ----------------------------------XXXXXX-----------------------------------------------------------------------------------------------------------------------------XX

Each record contains a name (here 103L), the protein sequence after the "Sequence:" label, and the disorder string after the "Disorder:" label. In the disorder string, "-" means that the position is ordered and "X" means that it is disordered. For example, the final "XX" means that the last two residues of the protein sequence, "NL", are disordered. After I use the split method, the record looks like this:

['>103L', 'Sequence:', 'MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL', 'Disorder:', '----------------------------------XXXXXX-----------------------------------------------------------------------------------------------------------------------------XX']

I want to use Python to find the disordered residues and their positions. The final file should look something like this: the name, then "Sequence:" followed by the actual sequence, then "Disorder:" followed by the position (Posi) and residue name (R) of each disordered residue. Taking 103L as an example:

103L Sequence: MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL

Disorder: Posi R
               34    K
               35    S
               36    P
               37    S
               38    L
               39    N
               65    N
               66    L

I am new to Python and really hope someone can help me. Thank you so much!

python protein-sequence alignment • 2.9k views
9.8 years ago
Asaf 10k
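# 'data' holds one line of the input file as a string
print("Disorder: Posi R")
sp_line = data.split()
for i, x in enumerate(sp_line[4]):
    if x == 'X':
        # i is 0-based, so i + 1 gives the 1-based residue position
        print('%d\t%s' % (i + 1, sp_line[2][i]))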
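To apply the same loop to every record in a file, a minimal sketch could look like the one below. The names `report_disorder` and `report_file` are hypothetical, and the sketch assumes each record sits on a single line with exactly five whitespace-separated fields, as in the question.

```python
def report_disorder(line):
    """Build the report for one record line; positions are 1-based."""
    sp_line = line.split()
    if len(sp_line) != 5:
        raise ValueError("expected 5 fields, got %d" % len(sp_line))
    name, sequence, disorder = sp_line[0], sp_line[2], sp_line[4]
    rows = ["%s Sequence: %s" % (name, sequence), "Disorder: Posi R"]
    for i, x in enumerate(disorder):
        if x == "X":
            rows.append("%d\t%s" % (i + 1, sequence[i]))
    return "\n".join(rows)


def report_file(path):
    """Yield one report per non-empty line of the input file."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield report_disorder(line)


# Shortened record used only for illustration.
print(report_disorder("103L Sequence: MNIFEMLRID Disorder: ---XX----X"))
```

Raising an error on malformed lines is one possible choice; skipping them silently would also work, at the cost of hiding problems in the input.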
9.8 years ago

For each input line, this function will give you a dictionary that you can use later to format your output as desired:

def getDisorderedPositions(sequence, matches):
    """For positions in 'matches' where char is not '-', return the corresponding char
    in 'sequence' as a dictionary keyed by 0-based position.
    Return None if the two strings differ in length.
    """
    if len(sequence) != len(matches):
        return None
    disorder = {}
    for i in range(len(sequence)):
        if matches[i] != '-':
            disorder[i] = sequence[i]
    return disorder

Example:

region= ['>103L', 'Sequence', 'MNIFEMLRID', 'Disorder', '---XX----X']
sequence= region[2]
matches= region[4]

dis= getDisorderedPositions(sequence, matches)
for k in sorted(dis.keys()):
    print(k, dis[k])

Output (Python 3; positions are 0-based):

3 F
4 E
9 D
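Because the dictionary keys are 0-based string indices, they need shifting by one if the output should use 1-based residue numbering. A possible formatting helper is sketched below; `format_positions` and its `one_based` parameter are hypothetical names, not part of the answer's function.

```python
def format_positions(disorder, one_based=True):
    """Turn a {index: residue} dictionary into 'Posi R' table rows."""
    offset = 1 if one_based else 0
    rows = ["Disorder: Posi R"]
    for index in sorted(disorder):
        rows.append("%d\t%s" % (index + offset, disorder[index]))
    return "\n".join(rows)


# Dictionary with 0-based keys, as produced for the short example sequence.
dis = {3: "F", 4: "E", 9: "D"}
print(format_positions(dis))
```

Passing `one_based=False` keeps the raw 0-based indices.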
9.8 years ago

I like Asaf's answer, but I prefer iterating directly over the data with less indexing.
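One way to express that idea, as a sketch rather than this answer's original code, is to pair each residue with its disorder flag using `zip` and number the pairs with `enumerate`. The function name `disordered_residues` and the `flag` parameter are hypothetical.

```python
def disordered_residues(sequence, disorder, flag="X"):
    """Return (position, residue) pairs, 1-based, for flagged positions."""
    return [
        (pos, residue)
        for pos, (residue, mark) in enumerate(zip(sequence, disorder), start=1)
        if mark == flag
    ]


# Shortened record used only for illustration.
fields = "103L Sequence: MNIFEMLRID Disorder: ---XX----X".split()
print("Disorder: Posi R")
for pos, residue in disordered_residues(fields[2], fields[4]):
    print("%d\t%s" % (pos, residue))
```

Note that `zip` stops at the shorter of its inputs, so a sequence and disorder string of different lengths would be truncated silently; checking the lengths first may be worthwhile.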
