Using Biopython and BLAST+ to automate de novo viral contig sorting
1
0
Entering edit mode
9.4 years ago
skbrimer ▴ 740

Hello all,

I need some assistance in programing a script to run a basic nt BLAST on contigs and then save the requested files in a output file (csv) for easy sorting. I have an understanding of python but do not practice enough to be very good at it and since I spend a large portion of my time doing this simple task I am trying to speed it up.

I have a test file of 3 contigs in basic fasta fomat with the ">" as the contig id followed by the sequence I would like to have my script work by reading the file into a dict and then passing the key and value pairs to the blast+ wrapper (found it expects each file to be one search) and then save to the output file and append the next search result until done with the file.

This is my script so far...

import sys
from Bio import SeqIO
from Bio.Blast.Applications import NcbitblastnCommandline

#varable input files
file_of_seq = sys.argv[1]

#make a dictinary for the sequences to be ran using BLAST
record_dict = SeqIO.to_dict(SeqIO.parse(file_of_seq,"fasta")
len(record_dict)

The issue I'm running into is when I run this script in command line it doesn't return the len of the dictionary it just returns an syntax error

~$ python blast.py test_query.fasta
  File "blast.py", line 10
    len(record_dict)
      ^

I do not know why it doesn't return 3 to the console since that is how many key,val pairs it should be making. So I 'm not really sure I'm using Biopython correctly in making the dictionary object.

Any advise would be appreciated.

Thanks,
Sean

Assembly Biopython blast next-gen • 6.1k views
ADD COMMENT
0
Entering edit mode

Perhaps I misunderstood your description, but just give your multiple sequence file to BLAST+ as the query argument. That should work, there is no reason to make individual FASTA files with a single query sequence each.

ADD REPLY
0
Entering edit mode

I have tried that however my non-test file contains 153 contigs that I would like to blast and it seems to only return the first 20ish contigs the couple of times I have tried. The console looks like its running just fine but the outfile stops growing in size no matter how long I wait. The console never returns, the file stops getting big and there is no error message. So I took this to be a batch/job size issue and was wanting to use Biopython to feed it in one by one so I could blast them. Am I wrong in this line of thinking? Is there an option I'm missing in the command line tool that would let blast so many or make it do them one by one?

ADD REPLY
0
Entering edit mode

Hi Peter,

It appears I may have been just to impatient with the command line tool. I have been letting it run for a longer amount of time and it just finished and gave me the result I was looking for. Thank you for your patience and advice.

Sean

ADD REPLY
0
Entering edit mode

I'm glad you've solved this :)

ADD REPLY
3
Entering edit mode
9.4 years ago

Closing parenthesis is missing.

Line 9: "fasta"))

ADD COMMENT
0
Entering edit mode

Thanks dasdevashishdas this is big help.

ADD REPLY

Login before adding your answer.

Traffic: 2514 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6