Question: Error when running clustal omega with python/biopython
2
gravatar for Nate
3.0 years ago by
Nate20
Nate20 wrote:

Hi all,

I've been trying to run a sequence alignment using clustal omega from within a python script. However, I keep getting the following error:

FATAL: Sequence file contains no data

However, if I run the raw command on the command line, everything runs fine. Here is the relevant code in my script:

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
import Bio.SeqIO as SeqIO
from Bio.Align.Applications import ClustalOmegaCommandline

ref_seq = "ERVVIGSKPFNEQYILANMIAILLEENGYKA"
target_seq = "ERVVIGSKPFNEQYILANMINGYKA"

in_file = "unaligned.fasta"
out_file = "aligned.fasta"

# Write my sequences to a fasta file
handle = open(in_file, 'w')
records = (SeqRecord(Seq(seq, IUPAC.protein), id=str(index), name="Test", description="Test") for index,seq in enumerate([ref_seq, target_seq]) )
SeqIO.write(records, handle, "fasta")

clustalomega_cline = ClustalOmegaCommandline(infile=in_file, outfile=out_file, verbose=True, auto=True, force=True)
clustalomega_cline()
#sp.call(str(clustalomega_cline), shell=True) # This will also fail with the same error

Edit:

It seems that this is due to either the way biopython writes the FASTA file, or how clustalo processes it.

If I comment out the lines where I write to a file and run the script again, is still fails, even if I run the raw command using subprocess. BUT, if I also make sure that there are no spaces in the FASTA sequence headers, then the commands at the bottom run fine.

I don't know why this isn't a problem when just run from the command line with the headers unmodified. This doesn't make any sense.

ADD COMMENTlink modified 3.0 years ago by gearoid200 • written 3.0 years ago by Nate20
4
gravatar for gearoid
3.0 years ago by
gearoid200
gearoid200 wrote:

Closing the input file after writing to it, but before calling Clustal Omega, makes this work for me.

Just add

handle.close()

after the call to SeqIO.write(). I think the way it is now, maybe the sequences aren't being written to that file until the script terminates (after the call to Clustal Omega).

Edit: a more elegant version might be:

records = (SeqRecord(Seq(seq, IUPAC.protein), id=str(index), name="Test", description="Test") for index,seq in enumerate([ref_seq, target_seq]) )

with open(in_file, 'w') as handle:
    SeqIO.write(records, handle, "fasta")

where the 'with' statement takes care of closing the file handle for you (even if an exception is thrown in the inner block).

ADD COMMENTlink modified 3.0 years ago • written 3.0 years ago by gearoid200

I had a "workaround" that inadvertently did this. But now it's obvious. Thanks.

ADD REPLYlink written 3.0 years ago by Nate20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 686 users visited in the last hour