Perl program: The sequence does not appear to be FASTA format (lacks a descriptor line '>')
9.3 years ago
dago ★ 2.8k

I get the following error when running a Perl program:

Use of uninitialized value $Bio::DB::NCBIHelper::HOSTBASE in concatenation (.) or string at /usr/share/perl5/Bio/DB/Query/GenBank.pm line 103.
Use of uninitialized value $Bio::DB::NCBIHelper::HOSTBASE in concatenation (.) or string at /usr/share/perl5/Bio/DB/Query/GenBank.pm line 104.
outDir: Test1/

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The sequence does not appear to be FASTA format (lacks a descriptor line '>')
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:486
STACK: Bio::SeqIO::fasta::next_seq /usr/share/perl5/Bio/SeqIO/fasta.pm:136
STACK: Guidance::name2codeFastaFrom1 /usr/local/lib/guidance.v1.5/www/Guidance/Guidance.pm:1220
STACK: /usr/local/lib/guidance.v1.5/www/Guidance/guidance.pl:445

However, I am quite sure that all my sequences are in FASTA format. Here is an example:

cat 2312_Ad2_02358.faa -A
>Ad2_02358 Chaperone protein ClpB$
MDFEKYTERARGFIQSAQTYALGQGHQQFTPAHILKVLLDDSEGMSAGLIERAGGRAQDVRLQIETDLAALPKVSGGNGQLYLSPEIARLFEQAEKIAEKAGDSYVTVERLLLALALDKGSQAGKALAQGGVTPSGLNEAINGLRKGRTADSASAENQYDALKKFAQDLTQAARDGKLDPVIGRDEEIRRAIQVLSRRTKNNPVLIGEPGVGKTAIAEGL

What am I missing here?

EDIT

Here is the file with the seqs

perl sequence software-error fasta • 5.4k views

Difficult to say what you are missing without seeing the complete file - the file itself, not a copy/paste here.

However, clearly you are missing something :) You may be "quite sure" but the fasta parser is equally sure that at least one sequence is invalid - and in my experience, the parser is generally correct. Convincing yourself that you know better than the error message is a common mistake and it will not lead to solutions.


Agree with you. I added a link to the file containing the seqs, maybe I am missing something there.


If your file comes from a Windows machine, you might use dos2unix on your file to strip any extraneous Windows carriage return characters, which can interfere with parsing on UNIX platforms.
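If dos2unix is not installed, the same check can be sketched with standard tools. This is only an illustration: the helper names count_cr_lines and strip_cr are made up for this example, and the demo file is a throwaway created on the spot. Note that tr removes every carriage return in the file, not only those at line ends.

```shell
#!/bin/sh
# Hypothetical helper: count the lines that end in a carriage return (CR).
count_cr_lines() {
    cr=$(printf '\r')            # a literal CR character
    grep -c "${cr}\$" "$1"
}

# Hypothetical helper: write a CR-free copy of a file to stdout.
strip_cr() {
    tr -d '\r' < "$1"
}

# Demo on a throwaway file with Windows (CRLF) line endings.
demo=$(mktemp)
printf '>seq1 example\r\nMDFEKYTE\r\n' > "$demo"

echo "lines ending in CR before: $(count_cr_lines "$demo")"
strip_cr "$demo" > "$demo.unix"
echo "lines ending in CR after:  $(count_cr_lines "$demo.unix")"
```

A count of zero on the original file means carriage returns are not the cause, which matches the output of cat -A shown in the question (lines end in a bare $).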


If all your headers have numbers you can check for missing > in the header by executing:

perl -lne 'if(/\d+/){$t++;print "$t\t$_" unless />/}' inputFile

or a cmd pipe equivalent, but without the actual code and an example it's hard to say.
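For headers that do not contain numbers, a rough alternative is to test whether the first non-blank line of the file starts with '>'. This is a minimal sketch, not a reimplementation of the BioPerl parser; check_fasta_start is a hypothetical helper name and both demo files are made up.

```shell
#!/bin/sh
# Hypothetical helper: print "ok" if the first non-blank line starts with '>',
# otherwise print the offending line; print "empty file" if nothing is found.
check_fasta_start() {
    awk '
        /^[[:space:]]*$/ { next }        # skip blank lines
        {
            seen = 1
            if ($0 ~ /^>/) print "ok"
            else print "line " NR " does not start with >: " $0
            exit                         # only the first non-blank line matters
        }
        END { if (!seen) print "empty file" }
    ' "$1"
}

# Demo on two throwaway files: one well-formed, one with sequence before a header.
good=$(mktemp)
printf '>Ad2_02358 Chaperone protein ClpB\nMDFEKYTE\n' > "$good"
bad=$(mktemp)
printf '\nMDFEKYTE\n>late header\n' > "$bad"

check_fasta_start "$good"
check_fasta_start "$bad"
```

Running it over every input (for f in *.faa; do check_fasta_start "$f"; done) would show quickly whether any file is the odd one out.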


@Alex Reynolds thanks, but all my files come from Unix. @mxs The file reported above contains only 6 sequences and I manually checked them. There is always a > at the start of each sequence.


Have you tried removing (replacing with underscore) blanks from the header? Otherwise I see no obvious "mistake".


Thanks! I tried, but I get the same problem. What I noticed is that the program creates a folder for its results, and it crashes whenever that folder name conflicts with an existing one (e.g. the same outDir name is used twice).


Could you maybe explain this dirname conflict a bit please?


Sure. I use guidance.pl and it asks me for an outDir name. If the dir name is the same as an existing one I get the error; if not, it runs correctly.


Guidance looks like a really complicated script+package. I'll run a local check on next_seq with your file. If it works, there's something wrong with either how guidance passes parameters or how you're using the tool. In the meantime, could you also update the question with the exact command you're running please? Thank you!

EDIT: I ran a simple Bio::Seq script on it and it works fine. We're probably looking at an error in usage or an untested anomaly in the guidance package.


This is the code that finally worked:

for i in *.faa
do
  guidance.pl \
    --seqFile "$i" \
    --msaProgram MUSCLE \
    --seqType aa \
    --outDir "TEST/$i" \
    --muscle /usr/bin/muscle \
    --proc_num 20 \
    --datasets "$i"
done

However, if I run the following, it processes the first sequence and then gives me the error:

for i in *.faa
do
  guidance.pl \
    --seqFile "$i" \
    --msaProgram MUSCLE \
    --seqType aa \
    --outDir "test1_$i" \
    --muscle /usr/bin/muscle \
    --proc_num 20 \
    --datasets "$i"
done

There's nothing in $1, $i is the loop variable.


Sorry there was a typo.

Also,

guidance.pl \
  --seqFile 2746_Ad2_02800.faa \
  --msaProgram MUSCLE \
  --seqType aa \
  --outDir Gui \
  --muscle /usr/bin/muscle \
  --proc_num 20

It works, but if I try to run it again with the outDir Gui already there it gives me the error above.
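Until the cause is understood, one workaround is to never reuse an output directory. The sketch below only picks an unused name and prints the command it would run; fresh_outdir is a hypothetical helper, and the demo happens in a scratch directory so nothing real is touched. The guidance.pl options are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical helper: print a directory name that does not exist yet,
# appending _1, _2, ... to the requested base name when needed.
fresh_outdir() {
    base=$1
    candidate=$base
    n=1
    while [ -e "$candidate" ]; do
        candidate="${base}_${n}"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}

# Demo in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir Gui                          # simulate the output left by a first run
out=$(fresh_outdir Gui)
echo "would run: guidance.pl --seqFile 2746_Ad2_02800.faa --msaProgram MUSCLE --seqType aa --outDir $out --muscle /usr/bin/muscle --proc_num 20"
```

Removing the stale directory before each run would work as well, at the cost of losing the earlier results.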


That's strange, especially since guidance appears to tolerate existing folders, judging from the brief glance I gave to the code. What is the output of:

set | grep "noclobber"
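For context on why this is worth checking: with noclobber switched on, the shell refuses to overwrite an existing file through a plain > redirection. Whether that has anything to do with the Guidance error is only a guess. The snippet below is a small demonstration on a throwaway file, using the POSIX spelling set -C.

```shell
#!/bin/sh
# Demo on a throwaway file that already exists and holds some content.
target=$(mktemp)
echo "old" > "$target"

set -C                                   # POSIX spelling of: set -o noclobber
if (echo "new" > "$target") 2>/dev/null; then
    result="overwritten"
else
    result="refused"
fi
set +C                                   # switch noclobber off again

echo "plain > redirection was $result; file still contains: $(cat "$target")"
```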

I agree that guidance is probably the issue. I also tried a simple BioPerl script and got no errors with your file.

#!/usr/bin/perl -w

use strict;
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file => "2312_Ad2_02358.faa", -format => "fasta");

while(my $seq = $seqio->next_seq) {
    print $seq->display_id, "\n";
}

Someone once told me it's better to use "use warnings;" instead of "perl -w".

Ref: How to copy all fasta-seqs from fasta-files with the seq-lengths between minlen and maxlen


Some use "use warnings FATAL => 'all';" to make the script die on warnings. Seems like a good defensive approach.


The script shouts KMN if it gets a papercut :)

9.3 years ago
jairly • 0

Hi,

Maybe it is coming across other non-standard characters in the >fasta_header... recently I found the pipe | character in fasta files and it was causing me problems.
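A quick way to look for such characters is to list the header lines that contain anything outside a conservative set. This is a rough sketch: odd_headers is a hypothetical helper, the "safe" set (letters, digits, space, underscore, dot, hyphen) is my own assumption rather than a rule of the FASTA format, and the demo headers are invented.

```shell
#!/bin/sh
# Hypothetical helper: print header lines whose text (after the '>') contains
# a character outside letters, digits, space, underscore, dot, and hyphen.
odd_headers() {
    awk '/^>/ {
        h = substr($0, 2)                      # header text without the ">"
        if (h ~ /[^A-Za-z0-9_. -]/) print NR ": " $0
    }' "$1"
}

# Demo on a throwaway file: the first header contains pipes, the second does not.
demo=$(mktemp)
printf '>gi|0000|made_up_example\nMDFEKYTE\n>Ad2_02358 Chaperone protein ClpB\nMDFEKYTE\n' > "$demo"

odd_headers "$demo"
```

An empty result means every header sticks to the conservative set.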

Do you have an example of the perl script working on another fasta file?


Hi, thanks for the suggestion. I do not have | in my header. I do not know why, but as I wrote in a few comments above, there was a problem with the directory.
