Blast Two Sequences
2
8
Entering edit mode
9.8 years ago
flitrfli ▴ 80

Hello all! I have a list of pairs of proteins and I want to compare speed and accuracy of "BLAST Two Sequences" to a Smith-Waterman program for alignment. I know there is a "Blast Two Sequences" option on NCBI website, but I would like to run it from a python script. Perhaps Biopython has this capability? If I cannot use Blast Two Sequences, I will compare different versions of Smith-Waterman, but this would not be nearly as exciting :) OR, if anyone has another idea for a great senior year project in Bioinformatics involving comparing pairs of proteins, please don't hesitate to let me know！ Thank you in advance.

biopython blast alignment • 16k views
0
Entering edit mode

when I am working on the above code I got this error

Traceback (most recent call last):
File "F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\my blast.py", line 18, in <module>
output = NcbiblastpCommandline(query="F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\seq1.fasta", subject="F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\seq2.fasta", outfmt=5)()[0]
File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\site-packages\Bio\Application\__init__.py", line 502, in __call__
shell=use_shell)
File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified


9
Entering edit mode
9.8 years ago
Ryan Thompson ★ 3.5k

Biopython includes the Bio.Blast.Applications module which provides wrappers for the NCBI blast tools. The Biopython Tutorial provides some examples on how to use them, and the BLAST chapter of the Biopython manual has more information. If you don't specify an output file, the alignment output will come back to you as a string in the standard output of the program. You should specify outfmt=5 as one of the arguments so that your command returns XML output, which Biopython can parse (see below) You can use Python's StringIO module to convert the string to a filehandle, which you can parse using Bio.Blast.NCBIXML (alternatively, you can just specify out="blast_results.xml" in the command to write the output to a file and then use open("blast_results.xml", "r") to get a filehandle for reading the output). Again, see the BLAST chapter of the Biopython manual for examples of this.

#!/usr/bin/env python

from Bio.Blast.Applications import NcbiblastpCommandline
from StringIO import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

# Create two sequence files
id="seq1")
id="seq2")
SeqIO.write(seq1, "seq1.fasta", "fasta")
SeqIO.write(seq2, "seq2.fasta", "fasta")

# Run BLAST and parse the output as XML
output = NcbiblastpCommandline(query="seq1.fasta", subject="seq2.fasta", outfmt=5)()[0]

# Print some information on the result
for alignment in blast_result_record.alignments:
for hsp in alignment.hsps:
print '****Alignment****'
print 'sequence:', alignment.title
print 'length:', alignment.length
print 'e value:', hsp.expect
print hsp.query
print hsp.match
print hsp.sbjct


For doing a full Smith-Waterman alignment, you can use Bio.Emboss.Applications.WaterCommandline to run the alignment, and use Bio.AlignIO to read the result. Or, as mentioned in another answer, you can use the use_sw_tback=True option for blastp.

EDIT: I forgot to mention that the code here uses the NCBI Blast+ suite, which has separate commands for blastn, blastp, and son on, and not the old all-in-one blastall program.

0
Entering edit mode

Here is the code, where fileA and fileB are names of fasta files:

#!/usr/bin/env python
...
output = NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]


Here is my error:

  File "./dddd.py", line 37, in <module>
output = NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]
File "/usr/lib/pymodules/python2.7/Bio/Application/__init__.py", line 537, in __call__
stdout_str, stderr_str)
Bio.Application.ApplicationError: Command 'blastp -query 1bdo -outfmt 5 -subject 1ghj' returned non-zero exit status 127, '/bin/sh: blastp: not found'

1
Entering edit mode

Quick question. I understand this all very completely except for what the index [0] means at the end of your function call:

NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]


Here is an instance when I've used the function:

NcbiblastnCommandline(query="queryseq", subject=seq, outfmt=5, out=str(name) + ".xml")()

0
Entering edit mode

Sidenote: How do you get it to be nicely formatted like yours?

0
Entering edit mode

it's a bit hard in comments, when you edit a comment you have the "101010" button. Or you add 4 spaces before the line.

(test)

0
Entering edit mode

Have you installed BLAST+?

0
Entering edit mode

That error message is indicating that the blastp command is not found. You need to ensure that it is installed and that it is in your $PATH. ADD REPLY 0 Entering edit mode I know I'm pretty late to the game here, but I'm new to the field and am not sure how to add BLAST+ to my$PATH? I'm using python 2.7.5 on a mac.

0
Entering edit mode

After you download and unzip the BLAST+ file you then need to add the bin folder to your path using the following command (assuming you unzipped the files into your home directory):

nano ~/.profile

export PATH =/Users/YourName/blast-2.2.22/bin:\${PATH}

Close your terminal and reopen. You should now be able to use your BLAST+ files using the terminal. To test this type:

which blastall

You should get the location of the blastall executable files. These steps are outlined in greater detail here.

0
Entering edit mode

Thank you so much for the quick reply. The link with detailed instructions solved everything for me.

4
Entering edit mode
9.8 years ago

When you put your two sequences into two separate fasta files, you can use the -query and -subject parameters to blast these files against each other (BLAST+). Also, be advised that BLAST has the option to do a full Smith-Waterman alignment with the -use_sw_tback parameter. (Here's BLAST's manual.)

However, you say you want to measure speed. Speed for comparing two sequences against each other is normally not important, but rather the speed of comparing many sequences against each other. For two sequences you probably mainly measure the time to initialize the program than the actual alignment. Lastly, there are many optimized SW implementations, be sure to search for them.

2
Entering edit mode

Michael: There is an even simpler solution to get all pairwise alignments. You can put all your sequences into one FASTA file and then use the same file for -query and -subject.