I've been trying to run a sequence alignment using clustal omega from within a python script. However, I keep getting the following error:
FATAL: Sequence file contains no data
However, if I run the raw command on the command line, everything runs fine. Here is the relevant code in my script:
from Bio.Seq import Seq from Bio.SeqRecord import SeqRecord from Bio.Alphabet import IUPAC import Bio.SeqIO as SeqIO from Bio.Align.Applications import ClustalOmegaCommandline ref_seq = "ERVVIGSKPFNEQYILANMIAILLEENGYKA" target_seq = "ERVVIGSKPFNEQYILANMINGYKA" in_file = "unaligned.fasta" out_file = "aligned.fasta" # Write my sequences to a fasta file handle = open(in_file, 'w') records = (SeqRecord(Seq(seq, IUPAC.protein), id=str(index), name="Test", description="Test") for index,seq in enumerate([ref_seq, target_seq]) ) SeqIO.write(records, handle, "fasta") clustalomega_cline = ClustalOmegaCommandline(infile=in_file, outfile=out_file, verbose=True, auto=True, force=True) clustalomega_cline() #sp.call(str(clustalomega_cline), shell=True) # This will also fail with the same error
It seems that this is due to either the way biopython writes the FASTA file, or how clustalo processes it.
If I comment out the lines where I write to a file and run the script again, is still fails, even if I run the raw command using subprocess. BUT, if I also make sure that there are no spaces in the FASTA sequence headers, then the commands at the bottom run fine.
I don't know why this isn't a problem when just run from the command line with the headers unmodified. This doesn't make any sense.