Question: short protein BLAST
biolab wrote:

Dear All

I am performing BLAST. The database is a collection of short protein sequences (~40 aa in length, short but not short very much). The query file contains short protein sequences as well (~40 aa). I found that a BLASTp E-value cutoff of 1e-5 works badly. What are your suggestions for doing this BLAST? Thanks a lot!

Tag: blast • written 4.8 years ago by biolab • modified 4.8 years ago by Juke-34

"Performing BLAST" is a very bad descriptor. What is your goal in doing this? Are you looking for similar sequences? Proteins with a specific domain? Proteins that have not been observed before?

And what do you mean by "short, but not short very much"?

written 4.8 years ago by David Westergaard
Juke-34 (Sweden) wrote:

What do you mean by "it works badly"? Are too many sequences retrieved? Are the retrieved proteins not really similar? Something else?

Your E-value depends on the length of your query (m) and the size of your database (n): for a given score, it scales with the product m × n.

See: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
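
To make the dependence on m and n concrete, here is a minimal sketch of the standard relation between bit score and E-value, E = m × n × 2^(-S'), where S' is the bit score. It ignores the effective-length (edge-effect) corrections that BLAST applies, so it will not reproduce BLAST's reported numbers exactly. The function names and the database size (10,000 sequences of 40 aa) are assumptions chosen for illustration only.

```python
import math


def evalue_from_bitscore(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with m = query length and n = total
    database length in residues (no effective-length correction).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bitscore_for_evalue(evalue, query_len, db_len):
    """Bit score needed to reach a given E-value in the same search space."""
    return math.log2(query_len * db_len / evalue)


if __name__ == "__main__":
    m = 40               # query length (aa), as in the question
    n = 10_000 * 40      # hypothetical database: 10,000 sequences of 40 aa
    needed = bitscore_for_evalue(1e-5, m, n)
    print(f"Bit score needed for E <= 1e-5: {needed:.1f}")
    # Doubling the database size doubles the E-value of the same hit.
    print(evalue_from_bitscore(needed, m, 2 * n))
```

With these assumed sizes, a hit needs roughly 40.5 bits to pass a 1e-5 cutoff, which gives an idea of how demanding that threshold is for alignments of about 40 residues.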

So you can try cheating on your database size (you can set it with a parameter), or use the bit score instead of the E-value. You can also decrease your cutoff.
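
As a rough sketch of the bit-score option: assuming NCBI BLAST+ and its tabular output (`-outfmt 6`, whose default columns end with `evalue` and `bitscore`), hits can be filtered on the bit score after the search. In BLAST+ the database-size override mentioned above is the `-dbsize` option. The function name, the threshold of 40 bits, and the sample rows below are made up for illustration and are not real results.

```python
import csv

# Hypothetical rows in default -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
SAMPLE = [
    "q1\ts1\t95.0\t40\t2\t0\t1\t40\t1\t40\t2e-20\t80.1",
    "q1\ts2\t55.0\t38\t17\t0\t2\t39\t1\t38\t3e-04\t35.4",
    "q2\ts3\t70.0\t40\t12\t0\t1\t40\t1\t40\t1e-10\t52.8",
]


def filter_hits_by_bitscore(lines, min_bitscore):
    """Return (query, subject, evalue, bitscore) for hits at or above min_bitscore."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        bitscore = float(row[11])
        if bitscore >= min_bitscore:
            hits.append((row[0], row[1], float(row[10]), bitscore))
    return hits


if __name__ == "__main__":
    for hit in filter_hits_by_bitscore(SAMPLE, min_bitscore=40.0):
        print(hit)
```

Unlike the E-value, the bit score of a hit does not change with the database size, so a threshold chosen this way stays comparable across databases.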

I also invite you to consult this document: http://homepages.ulb.ac.be/~dgonze/TEACHING/stat_scores.pdf

written 4.8 years ago by Juke-34

Thank you very much for your suggestions.

written 4.8 years ago by biolab