I ran blastx (latest version of ncbi-blast+) on my transcriptome assembly against the reference proteome with the following command:
blastx -query file1 -db pr_database -out file1_1.out -evalue 1e-3 -max_target_seqs 20 -outfmt 6
Now I have two questions. Given my command, is every reported BLAST hit a best hit, or should I also look at other parameters, such as percent identity and alignment length? I would appreciate hearing which criteria you use to select the best hit.
Also, could you please show me how to extract the desired hits by e-value, identity, or alignment length (if that is necessary)?
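For pulling out hits by e-value, identity and alignment length, here is a minimal Python sketch. It assumes the standard 12-column -outfmt 6 layout with no extra fields, and it ranks hits by bit score, which is only one possible definition of "best". The function name best_hits, the example rows and the thresholds (50% identity, 50 aligned residues) are made up for illustration.

```python
import csv
import io

# Standard 12 columns of -outfmt 6, in their default order.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-3, min_pident=0.0, min_length=0):
    """Return {query: row}, keeping the highest-bitscore hit that passes the filters."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields[:12]))
        # Apply the (hypothetical) thresholds before ranking.
        if (float(row["evalue"]) > max_evalue
                or float(row["pident"]) < min_pident
                or int(row["length"]) < min_length):
            continue
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Invented example rows, not real BLAST output.
example = (
    "contig1\tprotA\t85.0\t120\t18\t0\t1\t360\t1\t120\t1e-50\t200\n"
    "contig1\tprotB\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-20\t90\n"
    "contig2\tprotC\t30.0\t40\t28\t0\t1\t120\t1\t40\t0.0005\t35\n"
)
hits = best_hits(io.StringIO(example), max_evalue=1e-3, min_pident=50.0, min_length=50)
for query, row in hits.items():
    print(query, row["sseqid"], row["evalue"], row["bitscore"])
```

On a real run you would pass an open file instead of the example string, e.g. `with open("file1_1.out") as fh: hits = best_hits(fh)`. Keeping one hit per query is a simplification: a contig that covers more than one protein will lose all but its top-scoring hit.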
Just giving this a try and it doesn't output the correct columns for my last BLAST output -- is this for standard BLAST output? What flags did you set in your BLAST for this to work?
Standard outfmt 6 plus some extra fields after that. I'm guessing it's not really working because you used blastx (instead of the IMO far superior protein prediction + blastp approach), meaning that all your query names from the same contigs (or whatever) are identical. Tabular blastx output is pretty much unsuitable for automated sorting, unless the input is short reads and you're not really expecting more than 1 protein per query sequence.
+1, You're right -- I did use blastx. I'll check on the next BLAST run. Thanks for all your help.
It works for me.
Just for clarification: since I used -max_target_seqs 20 in my blast command, I expected all hits to be best hits, but with the suggested command I got about 27000 best hits out of 32000 hits. How can this difference be explained? Sorry for the question; I'm new to this field, and it may seem like a stupid question from your professional point of view. Thanks!
-max_target_seqs 20 for each query contig/read.
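To see why the number of hits and the number of best hits differ, it can help to count rows per query. A rough Python sketch, assuming tab-separated -outfmt 6 output where the first column is the query ID; the function name hits_per_query and the example rows are invented for illustration:

```python
import io
from collections import Counter


def hits_per_query(handle):
    """Count how many tabular BLAST rows each query ID has."""
    counts = Counter()
    for line in handle:
        if line.strip() and not line.startswith("#"):
            # First tab-separated field is the query ID in -outfmt 6.
            counts[line.split("\t", 1)[0]] += 1
    return counts


# Invented example: contig1 has three hits, contig2 has one.
example = (
    "contig1\tprotA\t85.0\n"
    "contig1\tprotB\t60.0\n"
    "contig1\tprotC\t55.0\n"
    "contig2\tprotD\t90.0\n"
)
counts = hits_per_query(io.StringIO(example))
total_rows = sum(counts.values())   # all reported hits
n_queries = len(counts)             # one best hit per query at most
print(total_rows, n_queries)
```

With -max_target_seqs 20 each query can contribute up to 20 rows, while a best-hit filter keeps one row per query, so the two counts are not expected to match.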
Thanks, dear friend.