Question: Using Biopython and BLAST+ to automate de novo viral contig sorting
gravatar for skbrimer
5.8 years ago by
United States
skbrimer640 wrote:

Hello all,

I need some assistance in programing a script to run a basic nt BLAST on contigs and then save the requested files in a output file (csv) for easy sorting. I have an understanding of python but do not practice enough to be very good at it and since I spend a large portion of my time doing this simple task I am trying to speed it up.

I have a test file of 3 contigs in basic fasta fomat with the ">" as the contig id followed by the sequence I would like to have my script work by reading the file into a dict and then passing the key and value pairs to the blast+ wrapper (found it expects each file to be one search) and then save to the output file and append the next search result until done with the file.

This is my script so far...

import sys
from Bio import SeqIO
from Bio.Blast.Applications import NcbitblastnCommandline

#varable input files
file_of_seq = sys.argv[1]

#make a dictinary for the sequences to be ran using BLAST
record_dict = SeqIO.to_dict(SeqIO.parse(file_of_seq,"fasta")

The issue I'm running into is when I run this script in command line it doesn't return the len of the dictinary it just returns an sytax error

~$ python test_query.fasta
  File "", line 10

I do not know why it doesn't return 3 to the console since that is how many key,val pairs it should be making.  So I 'm not really sure I'm using Biopython correctly in making the dictinary object.

Any advise would be appreciated.



blast biopython next-gen assembly • 4.9k views
ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by skbrimer640

Perhaps I misunderstood your description, but just give your multiple sequence file to BLAST+ as the query argument. That should work, there is no reason to make individual FASTA files with a single query sequence each.

ADD REPLYlink written 5.8 years ago by Peter5.8k

I have tried that however my non-test file contains 153 contigs that I would like to blast and it seems to only return the first 20ish contigs the couple of times I have tried. The console looks like its running just fine but the outfile stops growing in size no matter how long I wait. The console never returns, the file stops getting big and there is no error message.  So I took this to be a batch/job size issue and was wanting to use Biopython to feed it in one by one so I could blast them. Am I wrong in this line of thinking? Is there an option I'm missing in the command line tool that would let blast so many or make it do them one by one?

ADD REPLYlink written 5.8 years ago by skbrimer640

Hi Peter,

It appears I may have been just to impatient with the command line tool. I have been letting it run for a longer amount of time and it just finished and gave me the result I was looking for. Thank you for your patience and advice.


ADD REPLYlink written 5.8 years ago by skbrimer640

I'm glad you've solved this :)

ADD REPLYlink written 5.8 years ago by Peter5.8k
gravatar for dasdevashishdas
5.8 years ago by
dasdevashishdas120 wrote:
Closing parenthesis is missing. Line 9: "fasta"))
ADD COMMENTlink written 5.8 years ago by dasdevashishdas120

Thanks dasdevashishdas this is big help. 

ADD REPLYlink modified 5.8 years ago • written 5.8 years ago by skbrimer640
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1680 users visited in the last hour