Here is my problem: I have a list of sequences (reads in FASTA format) from an NGS experiment, and I want to identify some specific sequences in this list and count how many times each one occurs. The sequences I am looking for are grouped into a database (another FASTA file).
I use a local standalone BLAST in the following way:
makeblastdb -in database.fasta -out database -dbtype nucl
blastn -db database -query ngs_reads.fasta -out results.out
The problem is that the output file lists all the input sequences (several million in my case), and only a handful of them actually score hits in the database. As a result, I have a very impractical file that I must read through to find the results I am looking for.
I tried using the "-outfmt" option, but to no avail:
-outfmt "7 qacc sacc evalue length nident"
Also, the database is pretty big, so swapping the database and the query doesn't really help (I already tried).
My question is: is there a way to display only the query sequences that have at least one hit in the blastn output?
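To make the goal concrete, here is a rough sketch of the kind of post-filtering I have in mind. It assumes the tab-separated output produced by the -outfmt string above (columns qacc, sacc, evalue, length, nident), where every line starting with "#" is a comment and every other line is one hit. The function names queries_with_hits and count_hits_per_subject are made up for illustration and are not part of BLAST.

```shell
# Hypothetical helper: list each query that has at least one hit.
# Assumes tab-separated BLAST output where "#" lines are comments
# (including the "# 0 hits found" blocks) and column 1 is the query (qacc).
queries_with_hits() {
    awk -F '\t' 'NF && !/^#/ && !seen[$1]++ { print $1 }' "$1"
}

# Hypothetical helper: count the hits per database sequence.
# Assumes column 2 is the subject accession (sacc).
count_hits_per_subject() {
    awk -F '\t' 'NF && !/^#/ { n[$2]++ } END { for (s in n) print s "\t" n[s] }' "$1" | sort
}
```

With these defined, `queries_with_hits results.out` would print only the reads that matched something, and `count_hits_per_subject results.out` would print one line per database sequence with its hit count. This only filters the file after the fact; blastn still writes the comment block for every query when "-outfmt 7" is used, whereas "-outfmt 6" is the same table without comment lines.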
Any help would be most welcome ! :)