Question: How To Use Blast To Find Exact Matches Of Short Sequences?
Free Man (Earth - China) wrote, 5.8 years ago:

Hi, I am using tblastn (BLAST 2.2.25+) for exact peptide mapping (no gaps).
I want to map a few peptides (about 6 to 50 AAs in length) to a genome.
However, when I tested a known peptide of 6 AAs, tblastn failed to map it.
I have read the BLAST documentation but could not find a solution. What did I miss?
Thank you!
PS. I have also tried PGM (ProteogenomicMapping). It maps the known peptide tested above correctly, but it is so slow on my computer that large-scale mapping is impractical.

SRKR (Visakhapatnam) answered, 5.8 years ago:

Command-line BLAST has a -perc_identity option. Set it to 100 and BLAST will report only hits with 100% identity. The command would look like this:

blastn -db dbname -query input_file -out output_file -perc_identity 100

You can try this, and I believe it will work. You can also use -word_size to get hits with even shorter peptides, as in your case. Its value must be at least 2 for tblastn:

-word_size 3

You can list all available options by typing -h (brief) or -help (detailed) after the BLAST program name:

tblastn -help

hope this helps...


Hi, I got this error: "Error: (CArgException::eInvalidArg) Unknown argument: "perc_identity"".
I did not find anything like 'perc_identity' in the help for tblastn. It seems to be available only for blastn. Which version are you using?

Reply by Free Man, 5.8 years ago

Yes, I am sorry, I only just noticed that -perc_identity is not available with tblastn. The best option seems to be -ungapped, which prevents gaps but can still return hits with mismatches.

Reply by SRKR, 5.8 years ago
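
Because an ungapped search can still report mismatched hits, one way to keep only exact matches is to post-filter the tabular output. The sketch below is not from the thread; it assumes the search was run with -outfmt "6 qseqid sseqid pident length qlen sstart send sframe", and the function name exact_hits is made up for illustration.

```python
import csv
import io

# Column order must match the -outfmt string assumed in the text above.
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen", "sstart", "send", "sframe"]


def exact_hits(tabular_text):
    """Return rows that are 100% identical over the full query length."""
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS, delimiter="\t")
    return [
        row
        for row in reader
        # Full-length alignment plus 100% identity means no mismatch at all.
        if float(row["pident"]) == 100.0 and int(row["length"]) == int(row["qlen"])
    ]


if __name__ == "__main__":
    # Made-up rows, only to show the filter: one exact, one mismatched, one partial.
    sample = (
        "pep1\tchr1\t100.000\t6\t6\t101\t118\t2\n"
        "pep1\tchr2\t83.333\t6\t6\t40\t57\t1\n"
        "pep2\tchr1\t100.000\t5\t8\t300\t314\t-3\n"
    )
    for hit in exact_hits(sample):
        print(hit["qseqid"], hit["sseqid"], hit["sstart"], hit["send"], hit["sframe"])
```

The length check matters: a hit can be 100% identical over only part of the peptide, which is not an exact match of the whole peptide.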

What is your genome size? If it isn't too big, a script can give you the positions. You just have to translate the genome in all six frames and search for your amino acid sequences in the translations. That will give you the positions throughout the genome.

Reply by SRKR, 5.8 years ago
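
A minimal sketch of the six-frame approach described above, using only the Python standard library. It assumes the standard genetic code and a genome that fits in memory as one string. The function names and the frame labels ("+1" to "+3", "-1" to "-3") are this sketch's own conventions, not anything defined in the thread.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    # Codons with ambiguous bases (e.g. N) are translated as 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_peptide(genome, peptide):
    """Return (frame, start, end) for every exact match of peptide.

    Coordinates are 0-based, end-exclusive, and always on the forward strand.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            protein = translate(seq[offset:])
            pos = protein.find(peptide)
            while pos != -1:
                start = offset + 3 * pos
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Map reverse-complement coordinates back to the forward strand.
                    start, end = n - end, n - start
                hits.append((strand + str(offset + 1), start, end))
                pos = protein.find(peptide, pos + 1)  # allow overlapping matches
    return hits


if __name__ == "__main__":
    toy_genome = "CCATGAAATTTGGGTAACC"  # made-up sequence encoding M-K-F-G
    print(find_peptide(toy_genome, "MKFG"))
```

For many peptides or a large genome, translating each frame once and reusing the translations for every peptide avoids repeating the most expensive step.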

Thanks for your suggestion! After many tedious attempts with various parameters, I found the solution for my project:
Key parameters: -comp_based_stats 0 -ungapped -matrix PAM30 -seg no

Reply by Free Man, 5.8 years ago
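
For scripting, the key parameters above can be assembled into a full tblastn invocation. This is a sketch only: the file and database names are placeholders, the helper name build_tblastn_args is hypothetical, and the only search options set are the ones reported in the thread plus the usual -query, -db and -out.

```python
import shlex


def build_tblastn_args(query_fasta, database, output_path, extra_args=()):
    """Build an argument list for tblastn using the thread's key parameters."""
    args = [
        "tblastn",
        "-query", query_fasta,
        "-db", database,
        "-out", output_path,
        "-comp_based_stats", "0",  # no composition-based statistics
        "-ungapped",               # ungapped alignments only
        "-matrix", "PAM30",        # scoring matrix reported in the thread
        "-seg", "no",              # no SEG low-complexity filtering of the query
    ]
    args.extend(extra_args)        # caller-supplied options, passed through unchanged
    return args


if __name__ == "__main__":
    # Placeholder names; replace with real paths and a real BLAST database.
    cmd = build_tblastn_args("peptides.fasta", "genome_db", "hits.txt")
    print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where tblastn is installed; building it as a list avoids shell-quoting problems with file names.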