I just started playing with the pyfasta Python module, which is described as providing fast, memory-efficient, pythonic (and command-line) access to fasta sequence files.
It looks very promising for my work, but I am currently unable to use it for a very simple application: I want to process the sequences of a fasta file one after another, in the same order in which they appear in the file. I have tried the following:
    import pyfasta

    f = pyfasta.Fasta("coreg.fa")
    ncar = 60  # number of characters per output line
    with open("output.file", "w") as out_file:
        for header in f.keys():
            name = str(header)
            seq = str(f[header])
            out_file.write(name + "\n")
            # write the sequence in chunks of ncar characters
            while len(seq) > 0:
                out_file.write(seq[:ncar] + "\n")
                seq = seq[ncar:]
This simple example only reads the sequences from my input fasta file and writes them back to an output fasta file. However, the original order is lost: if I understood correctly, the headers are stored in a dictionary-based class, which does not keep the order of the file but allows faster lookups. This is not the behavior I am trying to achieve. So, my question is:
Is there a way to maintain the order from the original file?
(I do not want to iterate over sorted(f.keys()), since this does not give back the original order either.)
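
For reference, here is a minimal sketch of the workaround I could fall back on if pyfasta offers nothing built in: read the header lines from the fasta file myself, in file order, and use them as lookup keys. The function name headers_in_file_order is my own (hypothetical), and I am assuming that the keys of f are the header lines without the leading ">", which I have not verified for every case.

```python
def headers_in_file_order(fasta_path):
    """Return the header lines of a fasta file, in file order.

    The leading '>' and surrounding whitespace are removed.
    Assumption: these strings match the keys used by pyfasta.
    """
    headers = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers


# Intended use (assumes the keys of f match the returned headers):
#
#     f = pyfasta.Fasta("coreg.fa")
#     for header in headers_in_file_order("coreg.fa"):
#         seq = str(f[header])
```

This scans the input file a second time, so I would still prefer a way to get the original order from pyfasta directly.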