BLAST+ outfmt no stitle
8.7 years ago

I have seen the posts about BLAST outfmt, and I suspect my mistake is something really small that I am overlooking. I am blasting a large number of sequences and want just one hit for each, but I also want to be able to filter the output for the word "virus".

This is my command line:

blastn -remote -db nr -query ~/path/filename.fa -outfmt "6 field1 field2..." -num_alignments 1 -out ~/path/outfilename

I have tried various fields from the list of outfmt options, and I get output for everything EXCEPT stitle or salltitles or any of the other descriptive fields like scientific or common name.

If I use -outfmt "6 qseid sseqid stitle pident", for instance, I get output with only 3 columns, not 4. If I use only stitle in that line, I get the std output columns. It is as if the word stitle does not exist, yet it does not trigger any error messages.

What am I missing here? Can I not use those fields when I do remote?

Susanne


How about -outfmt '6 qseid sseqid stitle pident'?


Single or double quotes are interchangeable here.


Ok. There are at least two things wrong with your command.

  1. You're attempting to do a nucleotide search (blastn) against a protein database (nr).
  2. num_alignments is irrelevant to outfmt 6.

I'm surprised it doesn't just return some error. Anyway, perhaps your problem is somehow related to this.
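
A minimal sketch of how the command might look with both points addressed. It assumes the nucleotide database nt is what was intended, and that -max_target_seqs (which the BLAST+ help lists as not applicable for outfmt <= 4, so it is the hit limit aimed at formats such as 6) is an acceptable stand-in for num_alignments. The helper name build_blastn_command and the file names are hypothetical, and nothing in this thread confirms that these changes alone bring back the stitle column. The sketch only assembles and prints the command; it does not run BLAST.

```python
import shlex


def build_blastn_command(query, out, fields, db="nt", max_hits=1):
    """Assemble a remote blastn command line as a list of arguments."""
    return [
        "blastn",
        "-remote",
        "-db", db,                            # nt is nucleotide; nr is protein
        "-query", query,
        "-outfmt", "6 " + " ".join(fields),   # tabular output, custom columns
        "-max_target_seqs", str(max_hits),    # assumption: replaces num_alignments
        "-out", out,
    ]


# Hypothetical file names, for illustration only.
cmd = build_blastn_command(
    "query.fa", "hits.tsv", ["qseqid", "sseqid", "stitle", "pident"]
)
print(shlex.join(cmd))
```

Building the argument list in one place makes it easy to see exactly which field names are being sent, which helps rule out a misspelled specifier.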


Susanne,

Did you ever get this to work? I'm experiencing the same problem using the following code...

-outfmt "6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore"

It's as though the subject title (stitle) cannot be retrieved. I'm using blast/2.4.0. Any advice welcome.

8.7 years ago
pld 5.1k

It might be that the subjects you're landing on don't have titles. Since you're using tabular formatting, do you have a \t without a value where stitle should be?
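
One way to answer that question is to count the tab-separated fields on each line. The sketch below is only an illustration: the function name check_stitle and the sample rows are made up, and it assumes you pass the same field list you gave to -outfmt. A row with the right number of columns but an empty stitle points to a missing title, while a row that is one column short suggests the field was dropped entirely.

```python
def check_stitle(lines, fields):
    """Return (wrong_width, empty_title) as lists of 1-based row numbers."""
    title_idx = fields.index("stitle")
    wrong_width, empty_title = [], []
    for n, line in enumerate(lines, start=1):
        row = line.rstrip("\n").split("\t")
        if len(row) != len(fields):
            wrong_width.append(n)      # a column is missing (or extra)
        elif row[title_idx] == "":
            empty_title.append(n)      # column present, but no title text
    return wrong_width, empty_title


fields = ["qseqid", "sseqid", "stitle", "pident"]
# Made-up rows standing in for real BLAST output.
sample = [
    "q1\tsubj1\tExample virus segment\t98.5\n",
    "q2\tsubj2\t\t97.0\n",    # empty stitle between two tabs
    "q3\tsubj3\t96.2\n",      # stitle column absent altogether
]
wrong_width, empty_title = check_stitle(sample, fields)
print("rows with unexpected column count:", wrong_width)
print("rows with empty stitle:", empty_title)
```

On a real file you could pass an open file handle as lines, since it is iterated line by line in the same way.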

Either way, it would be better to use some other means of identifying viral hits. Searching for the word "virus" might land you false positives, because there are proteins with the word "virus" in their names (e.g. http://www.ncbi.nlm.nih.gov/protein/CAA56342.1).

It would be better to use the viral genomes database or filter by taxonomy.
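
As a rough sketch of the taxonomy route: if the staxids column is included in -outfmt and is actually populated in your output (an assumption, not something confirmed in this thread), hits can be kept or dropped by taxonomy ID rather than by title text. The function name viral_hits is hypothetical, the taxonomy IDs and rows below are placeholders rather than real records, and the set of viral IDs is assumed to come from a separate source such as the NCBI Taxonomy database.

```python
def viral_hits(lines, fields, viral_taxids):
    """Yield rows (as dicts) whose staxids share an ID with viral_taxids."""
    for line in lines:
        row = dict(zip(fields, line.rstrip("\n").split("\t")))
        # staxids can hold several IDs separated by ';'
        taxids = set(row.get("staxids", "").split(";"))
        if taxids & viral_taxids:
            yield row


fields = ["qseqid", "sseqid", "staxids", "stitle"]
# Placeholder rows and IDs, for illustration only.
sample = [
    "q1\tsubjA\t111\tplaceholder phage genome\n",
    "q2\tsubjB\t222\tplaceholder virus receptor protein\n",
]
viral = {"111"}  # hypothetical set of viral taxonomy IDs

kept = list(viral_hits(sample, fields, viral))
print([row["qseqid"] for row in kept])
```

In this toy input the second title contains the word "virus" but its ID is not in the viral set, so a keyword search would keep it while the taxonomy filter does not. The first title never mentions "virus" yet is kept.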

I would also advise against remotely blasting large numbers of sequences: it can take a very long time, and the jobs can fail.
