Compare Two Protein Sequences Using Blast On Biopython Without Making A File For Each Sequence
10.1 years ago
Discotacos • 0

Hi guys,

I have been trying to do this, but I've got stuck.

I know how to BLAST two sequences against each other using Biopython and standalone BLAST, but I write each sequence to a file in order to do it.

I think creating the files increases the run time a lot, so I would like to avoid creating them.

I have read about piping the sequences in, for example directly with the bl2seq command or from Biopython, but I don't know: does a pipe create temporary files?

Thanks in advance for your suggestions.

biopython blast • 4.2k views
10.1 years ago

Instead of creating separate files for the two sequences, you can hand them to blastp through bash process substitution, which presents the output of each echo command as a file-like path:

blastp -query <(echo -e ">seq1\nMKSTGGGGTGATGATAG") -subject <(echo -e ">seq2\nMKSTGGGGTGATGATAG")

Outputs:

BLASTP 2.2.27+


Query= Name
Length=17  
Subject= Name
Length=17

 Score = 29.6 bits (65),  Expect = 8e-09, Method: Compositional matrix adjust.
 Identities = 17/17 (100%), Positives = 17/17 (100%), Gaps = 0/17 (0%)

Query  1   MKSTGGGGTGATGATAG  17
           MKSTGGGGTGATGATAG
Sbjct  1   MKSTGGGGTGATGATAG  17

Lambda      K        H        a         alpha
   0.299    0.124    0.352    0.792     4.96 

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6 

Effective search space used: 289
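
For the Python side of the question, the same trick can be driven from a script. This is only a minimal sketch, assuming bash and BLAST+ (blastp) are on the PATH; `fasta`, `build_blastp_command` and `run_blastp` are hypothetical helper names, not Biopython functions. On Linux, process substitution is typically backed by a pipe under /dev/fd, so no regular file should end up on disk.

```python
import shlex
import shutil
import subprocess


def fasta(name, seq):
    """Format one sequence as a FASTA record (hypothetical helper)."""
    return f">{name}\n{seq}\n"


def build_blastp_command(query, subject, outfmt="6"):
    """Build a bash command that feeds both records via process substitution.

    printf '%s' writes each record verbatim; shlex.quote protects it
    from the shell, including the embedded newlines.
    """
    q = shlex.quote(query)
    s = shlex.quote(subject)
    return (
        f"blastp -query <(printf '%s' {q}) "
        f"-subject <(printf '%s' {s}) -outfmt {shlex.quote(outfmt)}"
    )


def run_blastp(query, subject):
    """Run the pairwise comparison and return blastp's stdout as text."""
    if shutil.which("bash") is None or shutil.which("blastp") is None:
        raise RuntimeError("bash and blastp must both be installed")
    cmd = build_blastp_command(query, subject)
    # Process substitution is a bash feature, so bash is called explicitly.
    result = subprocess.run(
        ["bash", "-c", cmd], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `run_blastp(fasta("seq1", "MKSTGGGGTGATGATAG"), fasta("seq2", "MKSTGGGGTGATGATAG"))` should return the hit in tabular form (`-outfmt 6`), which is easier to parse than the default report.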
10.1 years ago

I would say that the high resource cost comes from starting BLAST, initializing the databases, then reading the files and processing them. Creating the file itself is probably not the bottleneck; it is most likely the quickest step of them all, so optimizing it probably won't do much good.

But don't take my word for it: write a small script that creates the file and measure how long it takes.

python -m timeit -n 100 'f= open("test.fa", "wt"); f.write(">1\nAAAAAAAAAAAAAAAA\n"); f.close()'
100 loops, best of 3: 714 usec per loop

So each file creation takes less than a millisecond, which is not likely to be a factor.
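
The same measurement can be made from inside a script. A minimal sketch using only the standard library; `time_fasta_write` is a hypothetical name, and the file goes into a temporary directory so nothing is left behind. The number you get will depend on your disk and operating system.

```python
import os
import tempfile
import timeit


def time_fasta_write(n=100):
    """Return the mean time in seconds to write one small FASTA file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.fa")

        def write_once():
            # Open, write one record, and close, as in the one-liner above.
            with open(path, "wt") as handle:
                handle.write(">1\nAAAAAAAAAAAAAAAA\n")

        total = timeit.timeit(write_once, number=n)
    return total / n


if __name__ == "__main__":
    print(f"{time_fasta_write() * 1e6:.0f} usec per file")
```

Timing one full blastp call the same way would show how the two costs compare on your machine.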

What you should look into is collecting your sequences into one file and running blast that way, rather than invoking blast repeatedly to do pairwise alignments.
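
A rough sketch of that batching idea, assuming BLAST+ is installed and that an all-against-all comparison of one multi-FASTA file suits your case. `to_multi_fasta`, `parse_tabular` and `all_vs_all` are hypothetical helper names; the column list is the default field order of `-outfmt 6`.

```python
import subprocess

# Default field order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def to_multi_fasta(seqs):
    """Turn a {name: sequence} dict into one multi-FASTA string."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


def parse_tabular(text):
    """Parse -outfmt 6 text into a list of dicts (values kept as strings)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hits.append(dict(zip(COLUMNS, line.split("\t"))))
    return hits


def all_vs_all(fasta_path):
    """Run blastp once over the whole file instead of once per pair."""
    cmd = ["blastp", "-query", fasta_path, "-subject", fasta_path,
           "-outfmt", "6"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_tabular(out.stdout)
```

Write the output of `to_multi_fasta` to a single file, call `all_vs_all` on it, and then filter the hits by `qseqid` and `sseqid` to pick out the pairs you care about. That way BLAST's startup cost is paid once rather than once per pair.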
