Question: standalone BLAST parameters
19 months ago by
cvu130
India
cvu130 wrote:

Hi All,

I've predicted genes in a genome. Now I want to identify the proteins, so I've blasted all predicted proteins against the UniProt database.

blast parameters

blastp -query proteins.fasta -db Uniprotdb -max_target_seqs 1 -max_hsps 1 -out output.blastp -outfmt 6 -evalue 0.001

What should my blastp parameters be to get only significant matches?

Thank you in advance!!

blast gene genome • 610 views
modified 19 months ago by Macspider • written 19 months ago by cvu

What is a significant match for you? If you expect to always find nearly perfect matches in UniProt, then restrictive parameters would work, but if you also expect imperfect matches, you need parameters that accommodate them. It may be easier to let blast report more hits and then filter them with your favourite scripting language.
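As a rough illustration of the "report more hits, then filter" approach, here is a minimal Python sketch. It assumes the default 12-column tabular output (-outfmt 6) and a run without -max_target_seqs 1, so that several hits per query are available. The function name best_hits and the cutoff values are made up for the example and are not recommended thresholds.

```python
import csv
import io

# Column order of BLAST's default tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5, min_pident=30.0):
    """Return {query id: row} with the highest-bitscore hit passing the cutoffs.

    The default cutoffs are illustrative assumptions, not recommendations.
    """
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        if float(row["evalue"]) > max_evalue:
            continue
        if float(row["pident"]) < min_pident:
            continue
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Tiny made-up example standing in for output.blastp.
sample = (
    "p1\tsp|A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n"
    "p1\tsp|B\t45.0\t180\t99\t2\t5\t184\t10\t190\t1e-20\t120\n"
    "p2\tsp|C\t25.0\t50\t37\t1\t1\t50\t3\t52\t0.5\t30\n"
)
hits = best_hits(io.StringIO(sample))
for query, row in hits.items():
    print(query, row["sseqid"], row["pident"], row["evalue"])
```

With a real run you would pass open("output.blastp") instead of the in-memory sample. Filtering after the fact lets you change your definition of "significant" without rerunning blast.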

written 19 months ago by Jean-Karim Heriche

Thanks for the reply. I want perfect matches, but which parameters should I set to get a good match?

modified 19 months ago • written 19 months ago by cvu

If you're only looking for identical matches, blast is the wrong tool for the job. Just use grep, the string-matching function of a scripting language, or an implementation of a global alignment algorithm (e.g. needle in the EMBOSS suite). If you insist on blast, filter the output on alignment length and percent identity, i.e. only keep alignments (HSPs) that are full length relative to the query and have 100% identity.
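If you do go the blast route, the filter could look something like the sketch below. It assumes blast was rerun with -outfmt "6 std qlen", which appends the query length as a 13th column (the default -outfmt 6 does not report it). The function name is hypothetical. Instead of comparing the rounded percent identity, it checks that the HSP has no mismatches and no gap openings and spans the query from position 1 to qlen.

```python
import io


def is_full_length_identical(fields):
    """True if an HSP covers the whole query with no mismatches or gaps.

    Expects the 13 columns produced by -outfmt "6 std qlen" (an assumption
    of this sketch): the 12 standard columns followed by the query length.
    """
    mismatch = int(fields[4])
    gapopen = int(fields[5])
    qstart, qend = int(fields[6]), int(fields[7])
    qlen = int(fields[12])
    return mismatch == 0 and gapopen == 0 and qstart == 1 and qend == qlen


# Made-up rows: p1 is a perfect full-length match, p2 is identical but partial,
# p3 is full length but has one mismatch.
sample = (
    "p1\tsp|A\t100.000\t150\t0\t0\t1\t150\t1\t150\t1e-90\t310\t150\n"
    "p2\tsp|B\t100.000\t80\t0\t0\t11\t90\t5\t84\t1e-40\t160\t150\n"
    "p3\tsp|C\t99.333\t150\t1\t0\t1\t150\t1\t150\t1e-88\t305\t150\n"
)
kept = []
for line in io.StringIO(sample):
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 13 and is_full_length_identical(fields):
        kept.append((fields[0], fields[1]))
print(kept)
```

Note this only checks coverage of the query. If the UniProt entry is longer than your predicted protein it would still pass, so add slen to the output format and compare it too if you need the subject to be covered end to end.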

written 19 months ago by Jean-Karim Heriche
19 months ago by
Macspider2.8k
Vienna - BOKU
Macspider2.8k wrote:

What should my blastp parameters be to get only significant matches?

This question belongs in the same group as "what is the cure for cancer".

Functional annotation is a pain in the a**, you have to deal with it. There are no "optimal" parameters, and many results in the databases are either wrong or "unknown", "uncharacterized", "undefined".

From your command I can see that you are limiting the hits to 1 and the high-scoring pairs (HSPs) to 1. Why? I assume you are working on a non-model organism (like I do), so you did a gene prediction because there was none. However, you could allow more than one HSP, because a hit often consists of several HSPs per sequence. There is a script (this one: find-best-hit.py) which finds the best combination of HSPs in a blast run from an XML output file. Give it a try!

written 19 months ago by Macspider

Thanks for the reply. I want only the single best match for each protein, hence I set -max_hsps 1.

written 19 months ago by cvu

"HSP corresponds to the matching region between the query sequence and the database hit sequence." from High Scoring Pairs (HSP) in BLAST output

There can be many HSPs per match, and limiting your blast run to one per match may reduce / underestimate / misestimate the overall sequence identity.
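To make the point concrete, here is a small Python sketch that merges the query intervals of all HSPs belonging to the same query/subject pair and reports the fraction of the query they cover together. It is not the optimal-set algorithm from the post linked below, just a plain interval union, and it ignores whether the HSPs are in a consistent order on the subject. The function name and the input tuple layout (qseqid, sseqid, qstart, qend, qlen) are assumptions for the example; you could get those values from a run with -outfmt "6 std qlen" and no -max_hsps limit.

```python
from collections import defaultdict


def query_coverage(hsps):
    """Return {(qseqid, sseqid): fraction of the query covered by all HSPs}.

    hsps is an iterable of (qseqid, sseqid, qstart, qend, qlen) tuples with
    1-based inclusive coordinates (a hypothetical layout for this sketch).
    """
    spans_by_pair = defaultdict(list)
    qlens = {}
    for qseqid, sseqid, qstart, qend, qlen in hsps:
        spans_by_pair[(qseqid, sseqid)].append((min(qstart, qend), max(qstart, qend)))
        qlens[qseqid] = qlen

    coverage = {}
    for (qseqid, sseqid), spans in spans_by_pair.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent: extend the current merged interval.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        coverage[(qseqid, sseqid)] = covered / qlens[qseqid]
    return coverage


# Made-up example: three HSPs between query p1 (300 aa) and subject A.
all_hsps = [("p1", "A", 1, 100, 300), ("p1", "A", 90, 200, 300), ("p1", "A", 250, 300, 300)]
first_only = all_hsps[:1]
print(query_coverage(all_hsps)[("p1", "A")])
print(query_coverage(first_only)[("p1", "A")])
```

In this toy case the first HSP alone covers a third of the query, while all three together cover most of it, which is the kind of underestimate you risk with -max_hsps 1.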

I think you should make the effort to read the literature about it before doing your analysis blindly.

EDIT: this is the literature you need to read! http://jeff.wintersinger.org/posts/2014/07/designing-an-algorithm-to-compute-the-optimal-set-of-blast-hits/

modified 19 months ago • written 19 months ago by Macspider