BLAST+ outfmt no stitle
7.3 years ago

I have seen the posts about BLAST outfmt, and I suspect my mistake is something really small that I am overlooking. I am blasting a large number of sequences and want just one hit for each, but I also want to be able to filter the output for the word "virus".

This is my command line:

blastn -remote -db nr -query ~/path/filename.fa -outfmt "6 field1 field2..." num_alignments 1 -out ~/path/outfilename


I have tried various fields from the list of outfmt options, and I get output for everything EXCEPT stitle or salltitles or any of the other descriptive fields like scientific or common name.

If I use -outfmt "6 qseid sseqid stitle pident", for instance, I get output with only 3 columns, not 4. If I use only stitle in that line, I get the standard (std) output columns. It is as if the word stitle does not exist, yet it does not trigger any error messages.

What am I missing here? Can I not use those fields when I run with -remote?

Susanne


How about -outfmt '6 qseid sseqid stitle pident'?


Single or double quotes are interchangeable here.


Ok. There are at least two things wrong with your command.

1. You're attempting to do a nucleotide search (blastn) against a protein database (nr).
2. num_alignments is irrelevant to outfmt 6.

I'm surprised it doesn't just return some error. Anyway, perhaps your problem is somehow related to this.


Susanne,

Did you ever get this to work? I'm experiencing the same problem using the following code...

-outfmt "6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore"


It's as though the subject title (stitle) cannot be retrieved. I'm using blast/2.4.0. Any advice welcome.

7.3 years ago
pld 5.0k

It might be that the subjects you're landing on don't have titles. Since you're using tabular formatting, do you have a \t without a value where stitle should be?
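One way to check this is to count the fields on each line and look for empty ones. Below is a minimal Python sketch, not something BLAST provides: `check_columns` is a hypothetical helper, and it assumes you pass it the lines of your tabular output plus the field names in the order you gave them to -outfmt.

```python
def check_columns(lines, fields):
    """Report lines whose columns do not match the requested fields.

    lines  -- iterable of tab-separated BLAST output lines
    fields -- field names in the order given to -outfmt (assumed by the caller)
    Returns a list of (line_number, description) tuples.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        # Skip blank lines and comment lines (outfmt 7 writes "#" headers).
        if not row or row.startswith("#"):
            continue
        values = row.split("\t")
        if len(values) != len(fields):
            # The column is missing entirely, not just empty.
            problems.append(
                (number, "%d columns, expected %d" % (len(values), len(fields)))
            )
            continue
        # Same number of columns: look for fields that are present but empty.
        empty = [name for name, value in zip(fields, values) if value == ""]
        if empty:
            problems.append((number, "empty: " + ", ".join(empty)))
    return problems
```

A line reported as "empty: stitle" means the tab is there but the subject has no title. A line reported with too few columns means the field was never written at all, which points at the command rather than the database.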

Either way, it would be better to use some other means of identifying viral hits. Searching for the word "virus" might land you false positives, because there are proteins with the word "virus" in their names (e.g. http://www.ncbi.nlm.nih.gov/protein/CAA56342.1).

It would be better to use the viral genomes database or filter by taxonomy.
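As a rough illustration of filtering by taxonomy, here is a minimal Python sketch. It assumes your output includes a staxids column (which can hold several IDs separated by ";") and that you have built your own set of viral taxonomy IDs from the NCBI Taxonomy database. `split_by_taxid` and `viral_taxids` are hypothetical names, not part of BLAST.

```python
def split_by_taxid(lines, fields, viral_taxids):
    """Split tabular BLAST hits into viral and non-viral rows.

    lines        -- iterable of tab-separated BLAST output lines
    fields       -- field names in the order given to -outfmt; must include "staxids"
    viral_taxids -- set of taxonomy IDs (as strings) that you consider viral;
                    building this set is left to the caller (assumption)
    Returns (viral_rows, other_rows), each a list of lists of column values.
    """
    taxid_col = fields.index("staxids")
    viral_rows, other_rows = [], []
    for line in lines:
        row = line.rstrip("\r\n")
        if not row or row.startswith("#"):
            continue
        values = row.split("\t")
        if len(values) <= taxid_col:
            # Line is too short to contain the staxids column; skip it.
            continue
        # staxids may list several IDs separated by ";".
        taxids = set(values[taxid_col].split(";"))
        if taxids & viral_taxids:
            viral_rows.append(values)
        else:
            other_rows.append(values)
    return viral_rows, other_rows
```

With this approach a hit whose title merely contains "virus" ends up in the non-viral list unless its taxonomy ID is in your set, which avoids the false positives described above.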

I would also advise against remotely blasting large numbers of sequences: it can take a very long time, and the jobs can fail.