Error when running clustal omega with python/biopython
1
2
Entering edit mode
8.0 years ago
Nate ▴ 20

Hi all,

I've been trying to run a sequence alignment using clustal omega from within a python script. However, I keep getting the following error:

FATAL: Sequence file contains no data

However, if I run the raw command on the command line, everything runs fine. Here is the relevant code in my script:

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
import Bio.SeqIO as SeqIO
from Bio.Align.Applications import ClustalOmegaCommandline

ref_seq = "ERVVIGSKPFNEQYILANMIAILLEENGYKA"
target_seq = "ERVVIGSKPFNEQYILANMINGYKA"

in_file = "unaligned.fasta"
out_file = "aligned.fasta"

# Write my sequences to a fasta file
handle = open(in_file, 'w')
records = (SeqRecord(Seq(seq, IUPAC.protein), id=str(index), name="Test", description="Test") for index,seq in enumerate([ref_seq, target_seq]) )
SeqIO.write(records, handle, "fasta")

clustalomega_cline = ClustalOmegaCommandline(infile=in_file, outfile=out_file, verbose=True, auto=True, force=True)
clustalomega_cline()
#sp.call(str(clustalomega_cline), shell=True) # This will also fail with the same error

Edit:

It seems that this is due to either the way biopython writes the FASTA file, or how clustalo processes it.

If I comment out the lines where I write to a file and run the script again, is still fails, even if I run the raw command using subprocess. BUT, if I also make sure that there are no spaces in the FASTA sequence headers, then the commands at the bottom run fine.

I don't know why this isn't a problem when just run from the command line with the headers unmodified. This doesn't make any sense.

biopython python clustalo clustalomega • 4.6k views
ADD COMMENT
4
Entering edit mode
8.0 years ago
gearoid ▴ 200

Closing the input file after writing to it, but before calling Clustal Omega, makes this work for me.

Just add

handle.close()

after the call to SeqIO.write(). I think the way it is now, maybe the sequences aren't being written to that file until the script terminates (after the call to Clustal Omega).

Edit: a more elegant version might be:

records = (SeqRecord(Seq(seq, IUPAC.protein), id=str(index), name="Test", description="Test") for index,seq in enumerate([ref_seq, target_seq]) )

with open(in_file, 'w') as handle:
    SeqIO.write(records, handle, "fasta")

where the 'with' statement takes care of closing the file handle for you (even if an exception is thrown in the inner block).

ADD COMMENT
0
Entering edit mode

I had a "workaround" that inadvertently did this. But now it's obvious. Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 2179 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6