Global Pairwise Alignment For Long Sequence Throws Error In Python!!
2
1
Entering edit mode
11.2 years ago

Hello there,

I am performing pairwise global alignment using Emboss Needleman-Wunsch algorithm via python script. The script runs pretty well with shorter sequences but it throws an error when I perform with a pair of proteins (the longest protein Titin). I am trying to perform pairwise global alignment of ensembl protein ENSP00000343764 and SwissProt protein Q8WZ42. The length of these two sequences are not same, so I am interested to see the alignment. I am using python to perform this alignment. The code I used is:

from Bio.Emboss.Applications import NeedleCommandline
from Bio import AlignIO
needle_cline=NeedleCommandline(asequence="Q8WZ42.fa",bsequence="ENSP00000343764.fa",outfile="ENSP00000343764.needle",gapopen=10,gapextend=0.5)
stdout,stderr=needle_cline()

This generates an error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/pymodules/python2.7/Bio/Application/__init__.py", line 437, in __call__
stdout_str, stderr_str)
Bio.Application.ApplicationError: Command 'needle -outfile=ENSP00000343764.needle -asequence=Q8WZ42.fa -bsequence=ENSP00000343764.fa -gapopen=10 -gapextend=0.5' returned non-zero exit status 1, 'Needleman-Wunsch global alignment of two sequences'

If I use only a small fragment (say 5000 amino acid) of any one of the sequences, the script works. It generates an alignment file. I am not sure, if the error is because of the length of the proteins. Can anyone explain the possible reason for this error and how to fix it? I might use fragments of the sequences to see the alignment but thats not a good idea when my script is running for large number of proteins. Do you have any idea how I can do it?

Thanks in advance!

python • 5.4k views
ADD COMMENT
4
Entering edit mode
11.2 years ago
brentp 24k

When you do sequence alignment with an N-length sequence and an M-length sequence, it's probably creating at least 2 N*M arrays which can be a lot of memory.

Try running that needle command from the command-line and watch the memory usage. (Or just watch usage from the python script).

If memory is the problem, you may try using http://pypi.python.org/pypi/nwalign/ as it makes some attempt to use as little memory as possible.

ADD COMMENT
1
Entering edit mode

Thanks @brentp This module seems to work faster. But, it did not solve my problem. Both the strings are of length approximately 35000. So I got message: MemoryError. Probably, I should make smaller fragments of one sequence and then form alignment with the other sequence.

ADD REPLY
1
Entering edit mode

you can either split them or go to a machine with more memory. you sure you want to do global sequence alignment on 35kb regions?

ADD REPLY
0
Entering edit mode

Well, I am doing that for large number of sequences. And, I want to make the process automatic.

ADD REPLY
0
Entering edit mode
7.9 years ago
Markus ▴ 320

You might consider using EMBOSS Stretcher, which uses a modified Needelman-Wunsch algorithm that works in linear space (instead of quadratic). Biopython also provides a command-line interface for Stretcher under Bio.Emboss.Applications.

ADD COMMENT

Login before adding your answer.

Traffic: 2694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6