I have protein sequences from several species (chicken, human, mouse, ...). I want to use BLASTP to find homologs of a short peptide, "KPWLRVALCPG", in these species. To search all species at once, I combined their protein sequences into a single file called "merge.fa" and built a BLAST database from it.
I want to keep only hits with a bit-score > 20, so I used the formula E = query_sequence_length * total_database_length / 2^bit_score to calculate, for each database, the e-value that corresponds to a bit-score of 20, and passed that value to -evalue.
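The calculation described above can be sketched in Python as follows. This is only a minimal illustration of the formula as written: the function names are hypothetical, and the database length is a made-up number chosen so the arithmetic is easy to check, not the real size of chicken.fa or merge.fa. It also uses raw lengths, whereas BLAST reports E-values from an effective search space, so the result should be read as an approximation.

```python
import math


def expected_evalue(query_len: int, db_len: int, bit_score: float) -> float:
    """E = m * n / 2^S, using raw (unadjusted) query and database lengths."""
    return query_len * db_len / 2 ** bit_score


def bit_score_for_evalue(query_len: int, db_len: int, evalue: float) -> float:
    """Invert the same formula: S = log2(m * n / E)."""
    return math.log2(query_len * db_len / evalue)


QUERY = "KPWLRVALCPG"
m = len(QUERY)  # query length in residues

# Hypothetical total database length in residues (an assumption for illustration).
hypothetical_db_len = 2 ** 22

cutoff = expected_evalue(m, hypothetical_db_len, bit_score=20)
print(f"query length = {m}, e-value for bit-score 20 = {cutoff}")

# Round trip: the bit-score implied by that e-value for the same lengths.
print(bit_score_for_evalue(m, hypothetical_db_len, cutoff))
```

Because the e-value scales linearly with database length, the same bit-score maps to a much larger e-value in a merged database than in a single-species one.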
I first searched against merge.fa, using the following command.
blastp -db merge.fa -query test.fa -out test.merge.0 -task blastp-short -outfmt 0 -evalue 5283 -word_size 2 -matrix PAM30
I then ran a second search against chicken.fa alone, using the following command.
blastp -db chicken.fa -query test.fa -out test.chicken.0 -task blastp-short -outfmt 0 -evalue 44 -word_size 2 -matrix PAM30
However, I found two problems.
- Firstly, the lowest bit-score in the output was "18", not the expected "20".
- Secondly, when I compared the results for "chicken.fa" with those for "merge.fa", many of the hits found in the "chicken.fa" search were missing from the "merge.fa" results!
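One way to make both checks reproducible is to filter on bit-score directly and then diff the two hit lists. The sketch below assumes the searches are rerun with tabular output (-outfmt 6) instead of -outfmt 0, whose default columns put the subject ID second and the bit-score twelfth. The function names and the sample rows are hypothetical and exist only to show the logic; they are not real results.

```python
import csv
import io


def hits_above(tabular_text: str, min_bits: float) -> dict:
    """Return {subject_id: best bit-score} for hits with bit-score > min_bits.

    Expects default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid, bits = row[1], float(row[11])
        if bits > min_bits and bits > best.get(sseqid, float("-inf")):
            best[sseqid] = bits
    return best


def missing_from(reference: dict, other: dict) -> list:
    """Subject IDs present in `reference` but absent from `other`."""
    return sorted(set(reference) - set(other))


# Made-up example rows (not real BLAST output).
chicken_text = (
    "q1\tchk_A\t100\t11\t0\t0\t1\t11\t5\t15\t0.5\t25.0\n"
    "q1\tchk_B\t90\t10\t1\t0\t1\t10\t3\t12\t30\t21.0\n"
    "q1\tchk_C\t80\t10\t2\t0\t1\t10\t7\t16\t200\t18.0\n"
)
merge_text = (
    "q1\tchk_A\t100\t11\t0\t0\t1\t11\t5\t15\t60\t25.0\n"
    "q1\thum_X\t90\t10\t1\t0\t1\t10\t4\t13\t400\t22.0\n"
)

chicken_hits = hits_above(chicken_text, 20)
merge_hits = hits_above(merge_text, 20)
print(missing_from(chicken_hits, merge_hits))
```

With real data, the two strings would be replaced by the contents of the tabular output files from each search.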
I am very confused. Can anyone give me some advice? Thanks.