Question: Trouble Calling Emboss Program From Python
3
gravatar for charles.bridges
6.9 years ago by
charles.bridges70 wrote:

I am having trouble calling an EMBOSS program (which runs via command line) called sixpack through python. I run Python via Windows 7, Python version 3.23, Biopython version 1.59, EMBOSS version 6.4.0.4. Sixpack is used to translate a DNA sequence in all six reading frames and creates two files as output; a sequence file identifying ORFs, and a file containing the protein sequences. There are three required arguments which I can successfully call from command line: (-sequence [input file], -outseq [output sequence file], -outfile [protein sequence file]). I have been using the subprocess module in place of os.system as I have read that it is more powerful and versatile. The following is my python code, which runs without error but does not produce the desired output files.

from Bio import SeqIO
import re
import os
import subprocess

infile = input('Full path to EXISTING .fasta file would you like to open: ')
outdir = input('NEW Directory to write outfiles to: ')
os.mkdir(outdir)
for record in SeqIO.parse(infile, "fasta"):

    print("Translating (6-Frame): " + record.id)

    ident=re.sub("\|", "-", record.id)

    print (infile)
    print ("Old record ID: " + record.id)
    print ("New record ID: " + ident)

    subprocess.call (['C:\memboss\sixpack.exe', '-sequence ' + infile, '-outseq ' + outdir + ident + '.sixpack', '-outfile ' + outdir + ident + '.format'])

    print ("Translation of: " + infile + "\nWritten to: " + outdir + ident)
python biopython • 2.3k views
ADD COMMENTlink written 6.9 years ago by charles.bridges70
2

If your file had 100 records, it appears you'd call six-pack 100 times, but each time giving it all 100 sequences as input. i.e. I think you'd get 100 sets of output, all the same. You should only need to call six-pack ONCE with all the records (although you could call it 100 times, each with one sequence if you really wanted to). See also http://emboss.open-bio.org/wiki/Appdoc:Sixpack

ADD REPLYlink written 6.9 years ago by Peter5.8k

Unfortunately, sixpack will only read a SINGLE sequence at a time. I used SeqIO.parse to 'iterate' through the records and call sixpack for each record in the file. That way each record creates the two output files.

EDIT: New problem: this script does iterate over all of the records, but just creates the same output for each record and renames it by the record's GI number. I'll have to find a way to format each record in the input file appropriately and use that as input for the record.

ADD REPLYlink modified 6.9 years ago • written 6.9 years ago by charles.bridges70
3
gravatar for Joachim
6.9 years ago by
Joachim2.8k
San Francisco, California
Joachim2.8k wrote:

Hi!

You have to use subprocess.call like this:

subprocess.call (['C:\memboss\sixpack.exe', '-sequence', infile, '-outseq', outdir + ident + '.sixpack', '-outfile', outdir + ident + '.format'])

For example: subprocess.call(['ls','-l /']) gives an error message, but subprocess.call(['ls','-l', '/']) works fine.

Hope that helps,

Joachim

ADD COMMENTlink written 6.9 years ago by Joachim2.8k
3

In addition to making a argument list of individual strings, try using 'subprocess.check_call' instead of call. This will raise an error when the subprocess has issues instead of failing silently.

ADD REPLYlink written 6.9 years ago by Brad Chapman9.4k

AHA! Finally! works like a charm. Thanks Joachim!

ADD REPLYlink written 6.9 years ago by charles.bridges70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1476 users visited in the last hour