Hello all
I am trying to build a blastp command that is sensitive enough to find similarity between short query sequences (5-12 amino acids long) and the proteins in a tailored database of over 8k protein sequences.
I have already come up with a working command:
blastp -task blastp-short -word_size 2 -num_alignments 50 -max_hsps 1 -evalue 60 -db tailored_db -query short_seqs.fasta -out outfile.blastp
In general it works nicely, but there are a couple of query sequences that I know are similar to regions of two different proteins, and the command only reports hits to one of them. I have already tried tweaking the -threshold and -window_size options, with no luck.
As a test, I tried removing the protein that gives hits from the original fasta file of the database, hoping that the other one would then show some hits, but the command did not yield any hits.
I am trying to tweak the algorithm in order to get all the proteins with similarity, and not just one, and the resources I have found do not help.
Does anyone have any idea which other flags I can tweak to make the algorithm more sensitive to other sequences?
Any help would be much appreciated.
What is the size of proteins in the database that you are searching against?
5-12 AA seem like extremely short queries. Perhaps you should look at pattern matching algorithms like fuzzpro: https://embossgui.sourceforge.net/demo/manual/fuzzpro.html
The proteins range from a couple of hundred amino acids to around a thousand.
They are very short query sequences indeed, but that is the reality of the problem at hand.
Thank you for suggesting this tool. I will check it out, it looks useful. I was looking for a way of tweaking the BLASTP algorithm, but hey, if this EMBOSS tool does the trick, that would be amazing.
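For anyone who lands here with the same problem: the pattern matching idea suggested above can be illustrated with a few lines of Python. This is only a minimal sketch, not fuzzpro itself and not a replacement for BLAST scoring. It assumes ungapped matches, counts plain mismatches instead of using a substitution matrix, and the names parse_fasta, find_matches and scan_database, as well as the two demo proteins, are made up for the illustration. Because every window of every protein is checked, all proteins containing a similar stretch are reported, not just the best one.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the identifier.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def find_matches(query: str, subject: str,
                 max_mismatches: int) -> Iterator[Tuple[int, str, int]]:
    """Yield (0-based start, matched window, mismatches) for every
    ungapped window of the subject within the mismatch limit."""
    q = query.upper()
    for start in range(len(subject) - len(q) + 1):
        window = subject[start:start + len(q)]
        mismatches = sum(1 for a, b in zip(q, window) if a != b)
        if mismatches <= max_mismatches:
            yield start, window, mismatches


def scan_database(query: str, proteins: Dict[str, str],
                  max_mismatches: int = 1) -> List[Tuple[str, int, str, int]]:
    """Return (protein id, 1-based start, window, mismatches) for all hits."""
    hits = []
    for name, seq in proteins.items():
        for start, window, mm in find_matches(query, seq, max_mismatches):
            hits.append((name, start + 1, window, mm))
    return hits


# Hypothetical two-protein database, for demonstration only.
demo_fasta = """>protA
MKTAYIAKQRQISFVKSHFSRQ
>protB
MSTAYLAKQRGGG
"""

proteins = parse_fasta(demo_fasta)
hits = scan_database("TAYIAKQR", proteins, max_mismatches=1)
for hit in hits:
    print(hit)
```

With the demo data the query is found exactly in protA and with one mismatch in protB. For a real database of around 8k proteins this brute-force scan should still be fast enough, and raising max_mismatches trades specificity for sensitivity in a way that is easier to reason about than E-values for such short peptides.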