Blastp And Biopython
13.9 years ago

I have noticed that when I BLAST against my own database, some of the sequences in the alignments get truncated. This is using my local blast algorithm.

For example, I know that the sequences FYNTSTPQ and FYNTSTRR would align nicely like this:

FYNTSTPQ
FYNTST
FYNTSTRR

but the blastp output truncates them, so you only see the matching amino acids in the middle:

FYNTST
FYNTST
FYNTST

Does anyone know how I can get around this truncation? It is critical that I know which amino acids diverge from the template.

Cheers, Jordan


Can you provide some more details? (1) By "local blast algorithm", do you mean a local installation of BLAST, not something that you have written yourself? (2) It's unclear where Biopython is used: are you calling BLAST from the Biopython wrapper? (3) Are you also using Biopython to parse the BLAST output?


Yes, sorry:

I am indeed using the Biopython wrapper, with a command line that looks like this:

from Bio.Blast.Applications import NcbiblastpCommandline

cline = NcbiblastpCommandline(cmd="blastp", query="germ.fasta", db="temporary_database", matrix="PAM30", evalue="20", word_size="2", out="blastout")
call_blast(cline)

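To parse the output file I use:

from Bio.Blast import NCBIStandalone

blast_parser = NCBIStandalone.BlastParser()
# parse() expects an open file handle rather than a filename
with open("blastout") as handle:
    blast_record = blast_parser.parse(handle)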

Thanks for the details. I've not used Biopython for this, but I'm sure others will have some ideas. In the meantime, I'd suggest first running standalone BLAST directly to see if it gives the same alignment. If it does, you may need to experiment with the parameters, such as the gap open/extend penalties; see http://www.plexdb.org/modules/documentation/NCBIblastall.htm. If you're aligning short sequences (I'm guessing you might be, since you use PAM30), BLAST may not be the best tool.

13.9 years ago

Following up on Neil's comments, it sounds like you are looking for a global alignment algorithm. BLAST does local alignments with plenty of heuristics, so it reports only the best-scoring matching segment and drops the diverging ends. It is a good tool for searching, but you'll likely want a dedicated aligner to assess an alignment at the level you are looking to do. If you have two sequences, Needle or Stretcher from EMBOSS would be useful.

If you are doing multiple alignments, check out MAFFT or PROBCONS.

You can use Biopython and Python to run these tools and parse their output. Biopython has command-line wrappers for Needle, MAFFT and PROBCONS in Bio.Emboss.Applications and Bio.Align.Applications; this is a good place to get started.
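
To illustrate the local versus global difference on the two peptides from the question, here is a minimal pure-Python sketch of both alignment styles (Needleman-Wunsch for global, Smith-Waterman for local). It is not what BLAST or EMBOSS Needle actually run: the function name `align` and the toy scores (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration, not PAM30 or real gap open/extend penalties.

```python
def align(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Toy pairwise aligner: global (Needleman-Wunsch) or local (Smith-Waterman).

    Scores are illustrative assumptions, not a real substitution matrix.
    Returns the two aligned strings, with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Global alignment penalises leading gaps; local alignment starts at 0.
    if not local:
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best, end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + sub,
                       score[i - 1][j] + gap,
                       score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            if cell > best:
                best, end = cell, (i, j)

    # Global: trace back from the bottom-right corner to the origin.
    # Local: trace back from the best-scoring cell until the score hits 0.
    i, j = end if local else (len(a), len(b))
    top, bottom = [], []
    while (i > 0 and j > 0 and score[i][j] > 0) if local else (i > 0 or j > 0):
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


print(align("FYNTSTPQ", "FYNTSTRR"))              # ('FYNTSTPQ', 'FYNTSTRR')
print(align("FYNTSTPQ", "FYNTSTRR", local=True))  # ('FYNTST', 'FYNTST')
```

The local result stops after FYNTST because extending into the mismatched PQ/RR tail would only lower the score, which is the same reason the BLAST output looks truncated. The global result keeps the full length of both sequences, so the diverging residues stay visible.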


Right - I wasn't sure in the original question if the example sequences were full-length. If so, you should certainly be using the tools Brad describes, not BLAST.
