short protein BLAST
9.8 years ago
biolab ★ 1.4k

Dear All

I am performing BLAST. The database is a collection of short protein sequences (~40 aa in length, short but not short very much). The query file also contains short protein sequences (~40 aa). I found that a BLASTp E-value cutoff of 1e-5 works badly. What are your suggestions for running this BLAST? Thanks a lot!

blast • 2.7k views

"Performing BLAST" is a very poor description. What is your goal in doing this? Are you looking for similar sequences? Proteins with a specific domain? Proteins that have not been observed before?

And what do you mean by "short, but not short very much"?

9.8 years ago
Juke34 8.5k

What do you mean by "it works badly"? Are too many sequences retrieved? Are the retrieved proteins not really similar? Something else?

Your E-value depends on the length of your query (m) and the size of your database (n).

See here: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
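To make that dependence concrete, here is a minimal sketch based on the relation E = m * n * 2^(-S') from the tutorial linked above, where S' is the bit score. The function names and the example sizes are only illustrative assumptions, and the sketch ignores the effective-length (edge effect) correction that BLAST applies, which can matter for short sequences, so the numbers will not match BLAST's reported E-values exactly.

```python
import math


def evalue_from_bitscore(bit_score, query_len, db_len):
    """Approximate E-value: E = m * n * 2^(-S').

    query_len (m) and db_len (n) are raw lengths in residues; BLAST itself
    uses effective lengths, so treat the result as a rough estimate only.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bitscore_for_evalue(evalue, query_len, db_len):
    """Bit score needed to reach a given E-value: S' = log2(m * n / E)."""
    return math.log2(query_len * db_len / evalue)


if __name__ == "__main__":
    # Hypothetical sizes: a 40 aa query against 10,000 sequences of 40 aa.
    m = 40
    n = 40 * 10_000
    print(evalue_from_bitscore(30.0, m, n))
    print(bitscore_for_evalue(1e-5, m, n))
```

The same bit score gives a larger E-value as m or n grows, which is why a fixed E-value cutoff behaves differently from one database to another.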

So you can try cheating on your database size (you can set it with a parameter, e.g. -dbsize in BLAST+), or use the bit score instead of the E-value. You can also decrease your cutoff.
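As a rough sketch of the bit-score option: if you run BLAST+ with tabular output (-outfmt 6) and the default columns, the bit score is the 12th field, so you can filter hits on it directly. The function name, the threshold, and the sample IDs below are hypothetical, and the column position assumes you have not customized the output format.

```python
import csv

BITSCORE_COLUMN = 11  # 12th field of default -outfmt 6 output (0-based index)


def filter_by_bitscore(lines, min_bits):
    """Return (query_id, subject_id, bit_score) for hits with bit score >= min_bits.

    `lines` is any iterable of tab-separated lines in default -outfmt 6 layout.
    Blank lines and comment lines starting with '#' are skipped.
    """
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        bits = float(row[BITSCORE_COLUMN])
        if bits >= min_bits:
            hits.append((row[0], row[1], bits))
    return hits


if __name__ == "__main__":
    # Made-up example lines, only to show the expected layout.
    sample = [
        "q1\ts1\t90.0\t40\t4\t0\t1\t40\t1\t40\t1e-10\t75.5",
        "q1\ts2\t50.0\t30\t15\t0\t1\t30\t5\t34\t0.5\t22.3",
    ]
    for hit in filter_by_bitscore(sample, 40.0):
        print(hit)
```

Unlike the E-value, the bit score does not change with database size, so a threshold chosen this way stays comparable across searches.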

I also invite you to consult this document: http://homepages.ulb.ac.be/~dgonze/TEACHING/stat_scores.pdf


Thank you very much for your suggestions.
