Question

Blast Two Sequences

9

Entering edit mode

12.0 years ago

flitrfli ▴ 90

Hello all! I have a list of pairs of proteins and I want to compare speed and accuracy of "BLAST Two Sequences" to a Smith-Waterman program for alignment. I know there is a "Blast Two Sequences" option on NCBI website, but I would like to run it from a python script. Perhaps Biopython has this capability? If I cannot use Blast Two Sequences, I will compare different versions of Smith-Waterman, but this would not be nearly as exciting :) OR, if anyone has another idea for a great senior year project in Bioinformatics involving comparing pairs of proteins, please don't hesitate to let me know！ Thank you in advance.

biopython blast alignment • 22k views

ADD COMMENT • link updated 2.3 years ago by Ram 43k • written 12.0 years ago by flitrfli ▴ 90

0

Entering edit mode

when I am working on the above code I got this error

Traceback (most recent call last):
  File "F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\my blast.py", line 18, in <module>
    output = NcbiblastpCommandline(query="F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\seq1.fasta", subject="F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\seq2.fasta", outfmt=5)()[0]
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\site-packages\Bio\Application\__init__.py", line 502, in __call__
    shell=use_shell)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Can anyone please help me?

ADD REPLY • link updated 6.2 years ago by GenoMax 141k • written 6.2 years ago by santhoshhegde278 • 0

Michael Kuhn · Answer 1 · 2012-04-15

10

Entering edit mode

12.0 years ago

Ryan Thompson ★ 3.6k

Biopython includes the Bio.Blast.Applications module which provides wrappers for the NCBI blast tools. The Biopython Tutorial provides some examples on how to use them, and the BLAST chapter of the Biopython manual has more information. If you don't specify an output file, the alignment output will come back to you as a string in the standard output of the program. You should specify outfmt=5 as one of the arguments so that your command returns XML output, which Biopython can parse (see below) You can use Python's StringIO module to convert the string to a filehandle, which you can parse using Bio.Blast.NCBIXML (alternatively, you can just specify out="blast_results.xml" in the command to write the output to a file and then use open("blast_results.xml", "r") to get a filehandle for reading the output). Again, see the BLAST chapter of the Biopython manual for examples of this.

#!/usr/bin/env python

from Bio.Blast.Applications import NcbiblastpCommandline
from StringIO import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

# Create two sequence files
seq1 = SeqRecord(Seq("FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF"),
                   id="seq1")
seq2 = SeqRecord(Seq("FQTWEEFSRAEKLYLADPMKVRVVLRYRHVDGNLCIKVTDDLICLVYRTDQAQDVKKIEKF"),
                   id="seq2")
SeqIO.write(seq1, "seq1.fasta", "fasta")
SeqIO.write(seq2, "seq2.fasta", "fasta")

# Run BLAST and parse the output as XML
output = NcbiblastpCommandline(query="seq1.fasta", subject="seq2.fasta", outfmt=5)()[0]
blast_result_record = NCBIXML.read(StringIO(output))

# Print some information on the result
for alignment in blast_result_record.alignments:
    for hsp in alignment.hsps:
        print '****Alignment****'
        print 'sequence:', alignment.title
        print 'length:', alignment.length
        print 'e value:', hsp.expect
        print hsp.query
        print hsp.match
        print hsp.sbjct

For doing a full Smith-Waterman alignment, you can use Bio.Emboss.Applications.WaterCommandline to run the alignment, and use Bio.AlignIO to read the result. Or, as mentioned in another answer, you can use the use_sw_tback=True option for blastp.

EDIT: I forgot to mention that the code here uses the NCBI Blast+ suite, which has separate commands for blastn, blastp, and son on, and not the old all-in-one blastall program.

ADD COMMENT • link 12.0 years ago by Ryan Thompson ★ 3.6k

0

Entering edit mode

Here is the code, where fileA and fileB are names of fasta files:

#!/usr/bin/env python
...
output = NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]

Here is my error:

  File "./dddd.py", line 37, in <module>
    output = NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]
  File "/usr/lib/pymodules/python2.7/Bio/Application/__init__.py", line 537, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Command 'blastp -query 1bdo -outfmt 5 -subject 1ghj' returned non-zero exit status 127, '/bin/sh: blastp: not found'

ADD REPLY • link updated 12.0 years ago by Michael Kuhn 5.0k • written 12.0 years ago by flitrfli ▴ 90

1

Entering edit mode

Quick question. I understand this all very completely except for what the index [0] means at the end of your function call:

NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]

Here is an instance when I've used the function:

NcbiblastnCommandline(query="queryseq", subject=seq, outfmt=5, out=str(name) + ".xml")()

ADD REPLY • link updated 2.3 years ago by Ram 43k • written 9.7 years ago by Clint Valentine ▴ 10

0

Entering edit mode

Sidenote: How do you get it to be nicely formatted like yours?

ADD REPLY • link 12.0 years ago by flitrfli ▴ 90

0

Entering edit mode

it's a bit hard in comments, when you edit a comment you have the "101010" button. Or you add 4 spaces before the line.

(test)

ADD REPLY • link 12.0 years ago by Michael Kuhn 5.0k

0

Entering edit mode

Have you installed BLAST+?

ADD REPLY • link 12.0 years ago by Michael Kuhn 5.0k

0

Entering edit mode

That error message is indicating that the blastp command is not found. You need to ensure that it is installed and that it is in your $PATH.

ADD REPLY • link 12.0 years ago by Ryan Thompson ★ 3.6k

0

Entering edit mode

I know I'm pretty late to the game here, but I'm new to the field and am not sure how to add BLAST+ to my $PATH? I'm using python 2.7.5 on a mac.

ADD REPLY • link 8.8 years ago by kevin • 0

0

Entering edit mode

After you download and unzip the BLAST+ file you then need to add the bin folder to your path using the following command (assuming you unzipped the files into your home directory):

nano ~/.profile

Then add this line:

export PATH =/Users/YourName/blast-2.2.22/bin:${PATH}

Close your terminal and reopen. You should now be able to use your BLAST+ files using the terminal. To test this type:

which blastall

You should get the location of the blastall executable files. These steps are outlined in greater detail here.

ADD REPLY • link 8.8 years ago by Clint Valentine ▴ 10

0

Entering edit mode

Thank you so much for the quick reply. The link with detailed instructions solved everything for me.

ADD REPLY • link 8.8 years ago by kevin • 0

score 5 · Answer 2 · 2012-04-15

When you put your two sequences into two separate fasta files, you can use the -query and -subject parameters to blast these files against each other (BLAST+). Also, be advised that BLAST has the option to do a full Smith-Waterman alignment with the -use_sw_tback parameter. (Here's BLAST's manual.)

However, you say you want to measure speed. Speed for comparing two sequences against each other is normally not important, but rather the speed of comparing many sequences against each other. For two sequences you probably mainly measure the time to initialize the program than the actual alignment. Lastly, there are many optimized SW implementations, be sure to search for them.