Needle Crashing on Specific Sequences
1
1
Entering edit mode
5.9 years ago

Given two fasta files, attempting alignment:

==> seq1.fasta <==
>seq1
DVARNTGAYTS
==> seq2.fasta <==
>seq2
EVACNTGAYTS

Both needle and stretcher crash:

$/usr/local/bin/needle seq1.fasta seq2.fasta needle.out
Needleman-Wunsch global alignment of two sequences
Error: Sequence is not nucleic

Error: Unable to read sequence 'seq2.fasta'
Died: needle terminated: Bad value for '-bsequence' and no prompt

Error happens even when specifying

/usr/local/bin/needle seq1.fasta seq2.fasta needle.out -datafile EBLOSUM62 -auto Y -awidth3 999999999

Can narrow it down to this confusing example:

THIS WORKS

==> seq1.fasta <==
>seq1
SVAS
==> seq2.fasta <==
>seq2
SATA

THIS CRASHES NEEDLE

==> seq1.fasta <==
>seq1
SVAS
==> seq2.fasta <==
>seq2
SLTA

I can't find any reason why.

needle emboss alignment pairwise fasta • 2.4k views
ADD COMMENT
2
Entering edit mode

I got the same error (EMBOSS 6.3.1). What's interesting is that needle works fine when you change the order of your input sequences (providing seq2.fasta as the first one and then seq1.fasta).

ADD REPLY
0
Entering edit mode

... I thought this was a weird error, the order mattering makes it twice as much.

ADD REPLY
3
Entering edit mode
5.9 years ago
5heikki 10k

S,V, T and A are all valid IUPAC letters for nucleotides whereas L is not. If needle guesses nuc/prot based on seq1 (nevermind what substitution matrix you specify), then it's easy to explain a.zielezinski's observation since if you reverse the order (seq2 first) it can only be prot, whereas with seq1 it could be either: DVARNTGAYTS.. all these letters are valid IUPAC nucleotide code. However, the E in EVACNTGAYTS is not valid nucleotide code. You even get the error - "Error: Sequence is not nucleic"

tldr: Unless specified, needle probably guesses nuc/prot based on IUPAC letters and if there's not a single prot specific letter in seq1 then the guess is nuc.

Basically the program goes: Hmm, I sure wish user would have told me if these were nuc or prot seqs. Oh well, let's guess based on the first seq. First letter could be both, second letter could be both, .., all could be both. Hmm, I guess it's then likely that this is nuc seq. I sure wish the dev would have included A+T+G+C vs the rest frequency comparison. Oh well. Let's read seq 2. Oh what the hell, this is not nuc. Let's throw an error..

The fix? Use:

/usr/local/bin/needle seq1.fasta seq2.fasta needle.out -datafile EBLOSUM62 -auto Y -awidth3 999999999 -sprotein1 Y -sprotein2 Y
ADD COMMENT
1
Entering edit mode

I always use "-sprotein" flag to indicate I am aligning protein sequences when using needle. This should indicate both are proteins and save a few keystrokes. e.g. "needle asequence=seq1.fasta bsequence=seq2.fasta gapopen=3 gapextend=1 outfile=needle.out -sprotein"

ADD REPLY
0
Entering edit mode

I just use muscle always..

ADD REPLY

Login before adding your answer.

Traffic: 1434 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6