Question: Global Pairwise Alignment For Long Sequence Throws Error In Python!!
1
abhishekniroula7 (Sweden) wrote 6.8 years ago:

Hello there,

I am running pairwise global alignments with the EMBOSS Needleman-Wunsch program (needle) from a Python script. The script works well with shorter sequences, but it throws an error for one pair of very long proteins (Titin, the longest protein). I am trying to align the Ensembl protein ENSP00000343764 against the SwissProt protein Q8WZ42. The two sequences differ in length, so I am interested in seeing the alignment. The code I used is:

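from Bio.Emboss.Applications import NeedleCommandline
from Bio import AlignIO  # imported for reading the result later; unused in this snippet

# Build the EMBOSS needle command line for the two FASTA files.
needle_cline = NeedleCommandline(
    asequence="Q8WZ42.fa",
    bsequence="ENSP00000343764.fa",
    outfile="ENSP00000343764.needle",
    gapopen=10,
    gapextend=0.5,
)
# Run needle; this call raises ApplicationError on a non-zero exit status.
stdout, stderr = needle_cline()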

This generates an error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/pymodules/python2.7/Bio/Application/__init__.py", line 437, in __call__
stdout_str, stderr_str)
Bio.Application.ApplicationError: Command 'needle -outfile=ENSP00000343764.needle -asequence=Q8WZ42.fa -bsequence=ENSP00000343764.fa -gapopen=10 -gapextend=0.5' returned non-zero exit status 1, 'Needleman-Wunsch global alignment of two sequences'

If I use only a small fragment (say 5,000 amino acids) of either sequence, the script works and generates an alignment file. I am not sure whether the error is caused by the length of the proteins. Can anyone explain the likely reason for this error and how to fix it? I could align fragments of the sequences instead, but that's not a good idea when my script has to run on a large number of proteins. Do you have any idea how I can do it?

Thanks in advance!

python • modified 3.5 years ago by Markus • written 6.8 years ago by abhishekniroula7
4
brentp (Salt Lake City, UT) wrote 6.8 years ago:

When you align an N-length sequence against an M-length sequence, the program is probably creating at least two N*M arrays, which can take a lot of memory.
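To get a feel for the numbers, here is a back-of-the-envelope sketch. The 4 bytes per cell and the two matrices are assumptions for illustration only; the actual layout inside needle may differ, and `dp_matrix_bytes` is a hypothetical helper, not part of any library.

```python
def dp_matrix_bytes(n, m, bytes_per_cell=4, n_matrices=2):
    """Rough memory for full (n+1) x (m+1) dynamic-programming matrices.

    bytes_per_cell and n_matrices are assumed values, not taken from EMBOSS.
    """
    return (n + 1) * (m + 1) * bytes_per_cell * n_matrices


if __name__ == "__main__":
    # Sequence lengths mentioned in this thread: a 5,000-residue fragment
    # against a ~35,000-residue protein, and two ~35,000-residue proteins.
    for n, m in [(5000, 35000), (35000, 35000)]:
        gib = dp_matrix_bytes(n, m) / 2**30
        print(f"{n} x {m}: about {gib:.1f} GiB")
```

Under those assumptions the full-length pair needs several times more memory than the fragment case, which would fit with the fragment run succeeding and the full run failing.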

Try running that needle command from the command line and watch the memory usage. (Or just watch the usage from the Python script.)
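One way to watch the usage from Python is to run the command with `subprocess` and ask the OS for the peak memory of the child process afterwards. This is only a sketch: it relies on the standard-library `resource` module, which is Unix-only, and `run_and_report_peak_memory` is a hypothetical helper name. The stand-in command at the bottom is there so the example runs without EMBOSS installed.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run cmd and return (exit_code, peak_rss, stderr_text).

    peak_rss is ru_maxrss for waited-for child processes: kilobytes on
    Linux, bytes on macOS. It is the maximum over all children so far.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True)
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak_rss, completed.stderr


if __name__ == "__main__":
    # Stand-in command; swap in the needle argument list from the traceback,
    # e.g. ["needle", "-asequence=Q8WZ42.fa", ...].
    cmd = [sys.executable, "-c", "x = bytearray(10**7)"]
    code, peak, err = run_and_report_peak_memory(cmd)
    print("exit status:", code)
    print("peak RSS (platform units):", peak)
    print("stderr:", err.strip())
```

If the exit status is non-zero and the peak is close to the machine's RAM, memory is a likely suspect. Printing stderr also helps, because the traceback above only shows the program banner.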

If memory is the problem, you may try using http://pypi.python.org/pypi/nwalign/ as it makes some attempt to use as little memory as possible.

1

Thanks, @brentp. This module seems to work faster, but it did not solve my problem. Both strings are approximately 35,000 characters long, so I got a MemoryError. I should probably split one sequence into smaller fragments and then align each fragment against the other sequence.

Reply written 6.8 years ago by abhishekniroula7
1

You can either split them or go to a machine with more memory. Are you sure you want to do global sequence alignment on 35 kb regions?
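If you go the splitting route, a minimal sketch of cutting one sequence into overlapping windows could look like the following. The function name, the window size, and the overlap are all hypothetical choices for illustration. Keep in mind that aligning fragments independently is not guaranteed to reproduce the optimal global alignment of the full sequences.

```python
def overlapping_fragments(seq, size=5000, overlap=500):
    """Yield (start_offset, fragment) pairs covering seq.

    Consecutive fragments share `overlap` residues so that a match
    spanning a cut point is not lost entirely.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not seq:
        return
    step = size - overlap
    start = 0
    while True:
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break  # the last fragment reached the end of the sequence
        start += step


if __name__ == "__main__":
    for offset, frag in overlapping_fragments("ABCDEFGHIJ", size=4, overlap=1):
        print(offset, frag)
```

The offsets let you map fragment alignments back to positions in the full sequence.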

Reply written 6.8 years ago by brentp

Well, I am doing that for a large number of sequences, and I want to automate the process.

Reply written 6.8 years ago by abhishekniroula7
0
Markus wrote 3.5 years ago:

You might consider using EMBOSS Stretcher, which uses a modified Needleman-Wunsch algorithm that runs in linear space instead of quadratic space. Biopython also provides a command-line wrapper for Stretcher in Bio.Emboss.Applications.
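As a rough sketch, the stretcher binary can also be called directly with `subprocess`, which avoids depending on a particular Biopython version. The helper names are hypothetical, and the gap penalties of 12 and 2 are assumptions (as far as I know stretcher expects integer penalties, so the 0.5 used with needle above would need changing). Check `stretcher -help` on your installation before relying on them.

```python
import shutil
import subprocess


def build_stretcher_command(a_fasta, b_fasta, out_path, gapopen=12, gapextend=2):
    """Return the argument list for an EMBOSS stretcher run.

    gapopen/gapextend defaults here are assumed values, not verified
    against any specific EMBOSS release.
    """
    return [
        "stretcher",
        "-asequence", a_fasta,
        "-bsequence", b_fasta,
        "-outfile", out_path,
        "-gapopen", str(gapopen),
        "-gapextend", str(gapextend),
        "-auto",  # suppress interactive prompts
    ]


def run_stretcher(a_fasta, b_fasta, out_path, **penalties):
    """Run stretcher and return out_path, raising if the run fails."""
    cmd = build_stretcher_command(a_fasta, b_fasta, out_path, **penalties)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("stretcher not found on PATH; is EMBOSS installed?")
    completed = subprocess.run(cmd, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the program's own error text instead of just the banner.
        raise RuntimeError(completed.stderr)
    return out_path


if __name__ == "__main__":
    print(build_stretcher_command("Q8WZ42.fa", "ENSP00000343764.fa",
                                  "ENSP00000343764.stretcher"))
```

Because the command is built separately from the run, the same function is easy to loop over a large number of protein pairs.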
