Question: blastall warning about an invalid query sequence or filtering options. What is the problem?
5.2 years ago by
john70
European Union
john70 wrote:

Hello people

I'm trying to run BLAST (blast-2.2.26) on a FASTA file, but I get this error message:

[blastall] WARNING: MaulwurfLeber_H21F7XH01DFTZ0_rank=0133853_x=1293_: Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options

I call blast like this:

blastall -p blastp -i MaulwurfLeber_prots.fasta -d Pool_new_unclustered -o Contigs_prots_vs_New_unclustered.tab -a 8 -m 8 -e 0.001

The reads that cause the error are the following:

>MaulwurfLeber_H21F7XH01DFTZ0_rank=0133853_x=1293_0_y=1550_0_length=353_-gene_1
VEIGEVVVFGEVETVVGEAVEVEAGEVVEVEVGEVEVGEVVVGEVVVV
>MaulwurfLeber_H21F7XH01DFTZ0_rank=0133853_x=1293_0_y=1550_0_length=353_-gene_2
VRWWSVRWWSFEEVKVVVGEVEVVVGEAVEVEISEVEVGEWSR

The fasta file was created with hmmsearch called like this:

hmmsearch --tblout Contigs_prots_vs_PFAMa.tab --cpu 8 -o Contigs_prots_vs_PFAMa.out --noali Pfam-A.hmm MaulwurfLeber_prots.fasta

My calls are based on the scripts from VirSorter:

https://github.com/simroux/VirSorter

I'm running the script on a cluster.


Even though this should not be happening, I wonder whether blast dislikes the = and - characters in the fasta header. Can you try replacing them with an "_" and see if that helps?

You are not using the latest blast, so upgrade if possible.
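
The header replacement suggested above could be sketched roughly as follows. This is only an illustration in Python, not something from VirSorter or BLAST; the function names `sanitize_header` and `sanitize_fasta` are hypothetical, and the sketch assumes that only lines starting with ">" are headers.

```python
import re


def sanitize_header(line: str) -> str:
    """Replace '=' and '-' with '_' in a FASTA header line.

    Sequence lines (anything not starting with '>') are returned unchanged.
    """
    if line.startswith(">"):
        return ">" + re.sub(r"[=-]", "_", line[1:])
    return line


def sanitize_fasta(text: str) -> str:
    """Apply sanitize_header to every line of a FASTA-formatted string."""
    return "\n".join(sanitize_header(line) for line in text.splitlines())


# One of the records from the question, used here as test input.
example = (
    ">MaulwurfLeber_H21F7XH01DFTZ0_rank=0133853_x=1293_0_y=1550_0_length=353_-gene_1\n"
    "VEIGEVVVFGEVETVVGEAVEVEAGEVVEVEVGEVEVGEVVVGEVVVV"
)
cleaned = sanitize_fasta(example)
print(cleaned)
```

If the headers are renamed this way, any other file in the pipeline that refers to the same sequence IDs would need the same renaming for the IDs to keep matching.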

modified 5.2 years ago • written 5.2 years ago by GenoMax95k
5.2 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche24k wrote:

This message is a warning, so it shouldn't be fatal to the execution. For some combinations of blast parameters and sequences, the statistics can't be calculated. In your case, my guess is that it's because your peptide sequences are highly repetitive.


Thank you for your quick answer.

If it is not a fatal error, then something else must be wrong with the call or my db, because the output file is empty.

modified 13 months ago by _r_am32k • written 5.2 years ago by john70

It can also happen when there are unrecognized/unacceptable characters in the sequences. Check the database to make sure it only has valid amino-acid characters. However, remember that an empty result file without any fatal error message could also mean that there are no results to be had. Given that your sequences are highly repetitive, this is likely if you have some filtering turned on, which I seem to remember blastall does by default. I think the option to turn this off is -F F.
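
The check for unexpected characters mentioned above might look something like this minimal Python sketch. It is not part of BLAST; the function name `find_invalid_residues` is hypothetical, and the allowed alphabet (the 20 standard residues plus B, J, O, U, X, Z and "*") is an assumption that may need adjusting to whatever your blast version accepts.

```python
# Assumed alphabet: 20 standard amino acids plus the ambiguity/rare
# codes B, J, O, U, X, Z and the stop symbol '*'. Adjust as needed.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*")


def find_invalid_residues(fasta_text):
    """Return (header, line_number, bad_characters) for every sequence
    line that contains characters outside ALLOWED."""
    problems = []
    header = None
    for number, line in enumerate(fasta_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            continue
        bad = sorted(set(line.upper()) - ALLOWED)
        if bad:
            problems.append((header, number, "".join(bad)))
    return problems


# Small made-up sample: the second record contains '1' and '?'.
sample = (
    ">gene_1\n"
    "VEIGEVVVFGEVETVVGEAVEVEAGEVVEVEVGEVEVGEVVVGEVVVV\n"
    ">gene_bad\n"
    "VRWWS1VRW?SFEE"
)
report = find_invalid_residues(sample)
for header, number, bad in report:
    print(f"{header} (line {number}): unexpected characters {bad!r}")
```

The same function can be run on the query file and on the FASTA file the database was built from; an empty report only means no characters outside the assumed alphabet were found.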

modified 13 months ago by _r_am32k • written 5.2 years ago by Jean-Karim Heriche24k