Question: BLAST+ outfmt no stitle
susannehoward90 wrote:

I have seen the posts about BLAST outfmt, and I suspect my mistake is something really small that I am overlooking. I am blasting a large number of sequences and want just one hit for each, but I want to be able to filter the output for the word "virus".

This is my command line:

blastn -remote -db nr -query ~/path/filename.fa -outfmt "6 field1 field2..." num_alignments 1 -out ~/path/outfilename

I have tried various fields from the list of outfmt options, and I get output for everything EXCEPT stitle or salltitles or any of the other descriptive fields like scientific or common name.

If I use -outfmt "6 qseid sseqid stitle pident", for instance, I get output with only 3 columns, not 4. If I use only stitle in that line, I get the std output columns. It is as if the word stitle does not exist, yet it does not trigger any error messages.

What am I missing here? Can I not use those fields when I run with -remote?

Susanne

Tags: blast
written 5.4 years ago by susannehoward • modified 5.4 years ago by pld

How about -outfmt '6 qseid sseqid stitle pident'?

written 5.4 years ago by 5heikki

Single or double quotes are interchangeable here.

written 5.4 years ago by susannehoward

OK. There are at least two things wrong with your command. 1. You're attempting to do a nucleotide search (blastn) against a protein database (nr). 2. num_alignments is irrelevant to outfmt 6.

I'm surprised it doesn't just return some error. Anyway, perhaps your problem is somehow related to this.

written 5.4 years ago by 5heikki

Susanne,

Did you ever get this to work? I'm experiencing the same problem using the following code...

-outfmt "6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore"

It's as though the subject title (stitle) cannot be retrieved. I'm using blast/2.4.0. Any advice welcome.

written 4.3 years ago by nickleshill
pld4.9k wrote:

It might be that the subjects you're landing on don't have titles. Since you're using tabular formatting, do you have a \t without a value where stitle should be?
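A quick way to check this is to count the tab-separated fields on each output line. The following is only a rough sketch: the function name `check_columns` and the sample rows are made up for illustration, and it assumes the output was requested with a known list of field names that includes stitle.

```python
def check_columns(tabular_text, fields):
    """Return (short_rows, empty_titles) as lists of 1-based line numbers.

    short_rows: lines whose column count differs from len(fields),
                i.e. a column is missing entirely.
    empty_titles: lines with the right column count but an empty stitle,
                  i.e. two tabs with nothing between them.
    """
    title_idx = fields.index("stitle")
    short_rows, empty_titles = [], []
    for lineno, line in enumerate(tabular_text.splitlines(), start=1):
        cols = line.split("\t")
        if len(cols) != len(fields):
            short_rows.append(lineno)
        elif cols[title_idx].strip() == "":
            empty_titles.append(lineno)
    return short_rows, empty_titles


# Hypothetical sample rows, not real BLAST output.
sample = (
    "q1\ts1\tSome virus title\t99.0\n"
    "q2\ts2\t\t98.0\n"
    "q3\ts3\t97.0\n"
)
short_rows, empty_titles = check_columns(
    sample, ["qseqid", "sseqid", "stitle", "pident"]
)
print("lines with a missing column:", short_rows)
print("lines with an empty stitle:", empty_titles)
```

If every line lands in the first list, the column is being dropped altogether rather than returned empty, which would point away from the "subjects have no titles" explanation.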

Either way, it would be better to use some other means for identifying viral hits. Searching for the word "virus" might land you false positives, because there are proteins with the word "virus" in their names (e.g. http://www.ncbi.nlm.nih.gov/protein/CAA56342.1).

It would be better to use the viral genomes database or filter by taxonomy.
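As a sketch of the taxonomy route: if the tabular output includes the staxids field (this assumes taxonomy information is available for the database you search), you could keep only the rows whose taxid is in a set you trust. The function name `filter_by_taxid` and the sample rows are hypothetical, and you would have to build the set of viral taxids yourself, since staxids gives the subject's own taxid and not its lineage.

```python
def filter_by_taxid(lines, fields, wanted_taxids):
    """Keep tabular rows whose staxids column shares a taxid with wanted_taxids.

    staxids may hold several taxids separated by ';', so the column
    is split before comparing.
    """
    idx = fields.index("staxids")
    kept = []
    for line in lines:
        row = line.rstrip("\n")
        cols = row.split("\t")
        if len(cols) <= idx:
            continue  # malformed or truncated row, skip it
        if set(cols[idx].split(";")) & wanted_taxids:
            kept.append(row)
    return kept


# Hypothetical rows; 11320 is Influenza A virus, 9606 is Homo sapiens.
rows = [
    "q1\ts1\t11320\t99.0",
    "q2\ts2\t9606\t98.0",
    "q3\ts3\t9606;11320\t97.0",
]
viral = filter_by_taxid(rows, ["qseqid", "sseqid", "staxids", "pident"], {"11320"})
print(viral)
```

This avoids matching on free-text titles, at the cost of needing a taxid list that covers the viruses you care about.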

I would also advise against remotely blasting large numbers of sequences: it can take a very long time and the jobs can fail.

written 5.4 years ago by pld