Hi!
Here is my problem: I have a list of sequences (reads in fasta format) from an NGS experiment, and I want to identify some specific sequences in this list and count how many times each of them occurs. The sequences I am looking for are grouped into a database (another fasta file).
I run a local standalone BLAST as follows:
makeblastdb -in database.fasta -out database -dbtype nucl
blastn -db database -query ngs_reads.fasta -out results.out
The problem is that the output file lists every input sequence (several million in my case), and only a handful of them actually score hits in the database. As a result, the file is very impractical to read through to find the results I am looking for.
I tried using the -outfmt option, but to no avail:
-outfmt "7 qacc sacc evalue length nident"
The database is also pretty big, so swapping db and query doesn't really help (already tried).
My question is: is there a way to display only the query sequences that have at least one hit in the blastn output?
Any help would be most welcome! :)
See this thread for some discussion about this: http://seqanswers.com/forums/showthread.php?t=14498
Thanks!
I tried outfmt 7 because of the formatting and the options; I didn't realize the "comment lines" were my problem. Indeed, simply switching to outfmt 6 works perfectly.
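For anyone landing here later, the outfmt 6 approach might look roughly like the sketch below. It reuses the database and file names from the question; results.tsv and the two function names are made up for illustration. Since outfmt 6 writes no comment lines, reads without a hit simply produce no rows, and the counting step collapses multiple alignments of the same read against the same database sequence so that each read is counted once.

```shell
# Tabular output without comment lines: reads with no hit produce no rows.
# File names come from the question; results.tsv is a hypothetical name.
run_blast() {
    blastn -db database -query ngs_reads.fasta \
        -outfmt "6 qacc sacc evalue length nident" -out results.tsv
}

# Count distinct reads hitting each database sequence, most frequent first.
# Columns follow the -outfmt string above: 1 = qacc, 2 = sacc.
count_hits_per_subject() {
    cut -f1,2 "$1" | sort -u | cut -f2 | sort | uniq -c | sort -k1,1nr
}
```

Typical use would be `run_blast && count_hits_per_subject results.tsv`, which prints one line per database sequence with the number of reads that hit it.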
Have you tried specifying a value for the -evalue parameter?
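A rough sketch of what that could look like, assuming the same file names as in the question. The 1e-10 threshold is only an arbitrary example, not a recommendation, and results.tsv and the function names are hypothetical. The first function asks blastn to discard weak hits itself; the second filters a tabular file that already exists, assuming evalue is the third column as in the -outfmt string shown.

```shell
# Option A: let BLAST drop weak hits itself (threshold is an arbitrary example).
run_blast_strict() {
    blastn -db database -query ngs_reads.fasta -evalue 1e-10 \
        -outfmt "6 qacc sacc evalue length nident" -out results.tsv
}

# Option B: filter an existing tabular file after the fact.
# $1 = tabular file, $2 = maximum evalue to keep; column 3 is evalue.
filter_by_evalue() {
    awk -F'\t' -v max="$2" '($3 + 0) <= (max + 0)' "$1"
}
```

For example, `filter_by_evalue results.tsv 1e-10 > strict.tsv` would keep only the rows at or below that threshold without rerunning the search.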