Question: Blast Two Sequences
8
gravatar for flitrfli
8.8 years ago by
flitrfli80
flitrfli80 wrote:

Hello all! I have a list of pairs of proteins and I want to compare speed and accuracy of "BLAST Two Sequences" to a Smith-Waterman program for alignment. I know there is a "Blast Two Sequences" option on NCBI website, but I would like to run it from a python script. Perhaps Biopython has this capability? If I cannot use Blast Two Sequences, I will compare different versions of Smith-Waterman, but this would not be nearly as exciting :) OR, if anyone has another idea for a great senior year project in Bioinformatics involving comparing pairs of proteins, please don't hesitate to let me know! Thank you in advance.

alignment biopython blast • 14k views
ADD COMMENTlink modified 2.9 years ago by santhoshhegde2780 • written 8.8 years ago by flitrfli80

when I am working on the above code I got this error

Traceback (most recent call last):
  File "F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\my blast.py", line 18, in <module>
    output = NcbiblastpCommandline(query="F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\seq1.fasta", subject="F:\python-implement-fast-BLAST-Basic-Local-Alignment-Search-Tool-master\seq2.fasta", outfmt=5)()[0]
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\site-packages\Bio\Application\__init__.py", line 502, in __call__
    shell=use_shell)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Can anyone please help me?

ADD REPLYlink modified 2.9 years ago by GenoMax94k • written 2.9 years ago by santhoshhegde2780
9
gravatar for Ryan Thompson
8.8 years ago by
Ryan Thompson3.4k
TSRI, La Jolla, CA
Ryan Thompson3.4k wrote:

Biopython includes the Bio.Blast.Applications module which provides wrappers for the NCBI blast tools. The Biopython Tutorial provides some examples on how to use them, and the BLAST chapter of the Biopython manual has more information. If you don't specify an output file, the alignment output will come back to you as a string in the standard output of the program. You should specify outfmt=5 as one of the arguments so that your command returns XML output, which Biopython can parse (see below) You can use Python's StringIO module to convert the string to a filehandle, which you can parse using Bio.Blast.NCBIXML (alternatively, you can just specify out="blast_results.xml" in the command to write the output to a file and then use open("blast_results.xml", "r") to get a filehandle for reading the output). Again, see the BLAST chapter of the Biopython manual for examples of this.

#!/usr/bin/env python

from Bio.Blast.Applications import NcbiblastpCommandline
from StringIO import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

# Create two sequence files
seq1 = SeqRecord(Seq("FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF"),
                   id="seq1")
seq2 = SeqRecord(Seq("FQTWEEFSRAEKLYLADPMKVRVVLRYRHVDGNLCIKVTDDLICLVYRTDQAQDVKKIEKF"),
                   id="seq2")
SeqIO.write(seq1, "seq1.fasta", "fasta")
SeqIO.write(seq2, "seq2.fasta", "fasta")

# Run BLAST and parse the output as XML
output = NcbiblastpCommandline(query="seq1.fasta", subject="seq2.fasta", outfmt=5)()[0]
blast_result_record = NCBIXML.read(StringIO(output))

# Print some information on the result
for alignment in blast_result_record.alignments:
    for hsp in alignment.hsps:
        print '****Alignment****'
        print 'sequence:', alignment.title
        print 'length:', alignment.length
        print 'e value:', hsp.expect
        print hsp.query
        print hsp.match
        print hsp.sbjct

For doing a full Smith-Waterman alignment, you can use Bio.Emboss.Applications.WaterCommandline to run the alignment, and use Bio.AlignIO to read the result. Or, as mentioned in another answer, you can use the use_sw_tback=True option for blastp.

EDIT: I forgot to mention that the code here uses the NCBI Blast+ suite, which has separate commands for blastn, blastp, and son on, and not the old all-in-one blastall program.

ADD COMMENTlink modified 8.7 years ago • written 8.8 years ago by Ryan Thompson3.4k

Here is the code, where fileA and fileB are names of fasta files:

#!/usr/bin/env python
...
output = NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]

Here is my error:

  File "./dddd.py", line 37, in <module>
    output = NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]
  File "/usr/lib/pymodules/python2.7/Bio/Application/__init__.py", line 537, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Command 'blastp -query 1bdo -outfmt 5 -subject 1ghj' returned non-zero exit status 127, '/bin/sh: blastp: not found'
ADD REPLYlink modified 8.7 years ago by Michael Kuhn5.0k • written 8.7 years ago by flitrfli80
1

Quick question. I understand this all very completely except for what the index [0] means at the end of your function call:

NcbiblastpCommandline(query=fileA, subject=fileB, outfmt=5)()[0]

Here is an instance when I've used the function:

NcbiblastnCommandline(query="queryseq", subject=seq, outfmt=5, out=str(name) + ".xml")()

ADD REPLYlink modified 6.5 years ago • written 6.5 years ago by Clint Valentine10

Sidenote: How do you get it to be nicely formatted like yours?

ADD REPLYlink written 8.7 years ago by flitrfli80

it's a bit hard in comments, when you edit a comment you have the "101010" button. Or you add 4 spaces before the line.

(test)
ADD REPLYlink written 8.7 years ago by Michael Kuhn5.0k

Have you installed BLAST+?

ADD REPLYlink written 8.7 years ago by Michael Kuhn5.0k

That error message is indicating that the blastp command is not found. You need to ensure that it is installed and that it is in your $PATH.

ADD REPLYlink written 8.7 years ago by Ryan Thompson3.4k

I know I'm pretty late to the game here, but I'm new to the field and am not sure how to add BLAST+ to my $PATH? I'm using python 2.7.5 on a mac.

ADD REPLYlink written 5.5 years ago by kevin0

After you download and unzip the BLAST+ file you then need to add the bin folder to your path using the following command (assuming you unzipped the files into your home directory):

nano ~/.profile

Then add this line:

export PATH =/Users/YourName/blast-2.2.22/bin:${PATH}

Close your terminal and reopen. You should now be able to use your BLAST+ files using the terminal. To test this type:

which blastall

You should get the location of the blastall executable files. These steps are outlined in greater detail here.

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by Clint Valentine10

Thank you so much for the quick reply. The link with detailed instructions solved everything for me.

ADD REPLYlink written 5.5 years ago by kevin0
3
gravatar for Michael Kuhn
8.8 years ago by
Michael Kuhn5.0k
EMBL Heidelberg
Michael Kuhn5.0k wrote:

When you put your two sequences into two separate fasta files, you can use the -query and -subject parameters to blast these files against each other (BLAST+). Also, be advised that BLAST has the option to do a full Smith-Waterman alignment with the -use_sw_tback parameter. (Here's BLAST's manual.)

However, you say you want to measure speed. Speed for comparing two sequences against each other is normally not important, but rather the speed of comparing many sequences against each other. For two sequences you probably mainly measure the time to initialize the program than the actual alignment. Lastly, there are many optimized SW implementations, be sure to search for them.

ADD COMMENTlink written 8.8 years ago by Michael Kuhn5.0k
1

Michael: There is an even simpler solution to get all pairwise alignments. You can put all your sequences into one FASTA file and then use the same file for -query and -subject.

ADD REPLYlink written 8.8 years ago by Michael Schubert7.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2421 users visited in the last hour