Running blast shell script within python pipeline
2
0
Entering edit mode
3.2 years ago
Wilber0x ▴ 50

I have a list of plastid sequences in fasta files, which I need to turn into blast databases as part of a longer pipeline within my python script. I think using a shell subprocess within the script will be the easiest way to do this but I am not getting it to work.

Here is my code:

for i in range(len(fileNameList)):
blastdb_cmd = 'makeblastdb -in ' + fileNameList[i] + (' -out ../Databases/') + genbankIDs[i] + ' -parse_seqids -dbtype nucl -title temp_blastdb'
DB_process = subprocess.Popen(blastdb_cmd,
                              shell=True,
                              stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
DB_process.wait()

fileNameList is what I have called the list of plastid sequence file names, and genbankIDs is the list of genbank IDs that I will be using in the databases.

When I isolate a blastdb_cmd line and copy and paste it into the shell, it works just fine to produce a database. See example of a blastdb_cmd below:

makeblastdb -in NC_026291.fasta -out ../Databases/NC_026291 -parse_seqids -dbtype nucl

However, the python script is not making any databases. I don't get any error messages

sequence blast biopython • 1.2k views
ADD COMMENT
2
Entering edit mode
3.2 years ago
Mensur Dlakic ★ 27k

Have you tried:

import os

for i in range(len(fileNameList)):
    blastdb_cmd = 'makeblastdb -in ' + fileNameList[i] + (' -out ../Databases/') + genbankIDs[i] + ' -parse_seqids -dbtype nucl -title temp_blastdb'
    os.system(blastdb_cmd)

using a shell subprocess within the script will be the easiest way to do this but I am not getting it to work

This can be done in a shell script directly and the statement is not necessarily true, though it may be most convenient given your programming habits.

ADD COMMENT
0
Entering edit mode

This is great thank you for your help! Is the os.system() the key here?

ADD REPLY
1
Entering edit mode

Not sure that I understand the question, but os.system() simply passes a pre-formed command string to the shell. This post may help to explain the differences, and it is easy to google for more info if you need it.

ADD REPLY
2
Entering edit mode
3.2 years ago
Joe 21k

subprocess is the 'correct' way to do this. While the os approach does work, it is not the recommended practise any more.

I suspect your command is simply incorrect. I can spot a few irregularities, but I haven't tested this:

For one thing, you don't need to do any of the item[i] business:

for filename, genbankID in zip(fileNameList, genbankIDs):
    blastdb_cmd = 'makeblastdb -in {0} -out ../Databases/{1} -parse_seqids -dbtype nucl -title temp_blastdb'.format(filename, genbankID)
    DB_process = subprocess.Popen(blastdb_cmd,
                                  shell=True,
                                  stdin=subprocess.PIPE,
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE)
    DB_process.wait()

I think your problem is the errant ( ) (possibly), as well as your IDs/filepaths maybe not being formed in to the string as you think they are. It may be worth printing the content of filename and genbankID each time to ensure there are no filepath components, white space, or other errant characters in the input that may be breaking your command.

I've done exactly this before, so this may help:

https://github.com/jrjhealey/bioinfo-tools/blob/master/Genbank_slicer.py#L76-L101

ADD COMMENT

Login before adding your answer.

Traffic: 1678 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6