Question: Iterating Through Fasta Sequences With Pyfasta Python Module
9.5 years ago by
Quebec, Canada
Eric Normandeau10k wrote:

I just started playing with the pyfasta Python module, which is described as a fast, memory-efficient, pythonic (and command-line) access to fasta sequence files.

It looks very promising for my work, but I am currently unable to use it for a very simple application: I want to process the sequences from a fasta file one after the other, in the same order as they appear in the file. I have tried the following:

import pyfasta

f = pyfasta.Fasta("coreg.fa")
ncar = 60

with open("output.file", "w") as out_file:
    for header in f.keys():
        name = str(header)
        seq = str(f[header])
        out_file.write(name + "\n")
        while len(seq) > 0:
            out_file.write(seq[:ncar] + "\n")
            seq = seq[ncar:]

This simple example only reads the sequences from my input fasta file and writes them back to an output fasta file. However, the initial order is lost, since the headers are stored in a dictionary-based class (if I understood correctly), which does not keep the order but permits faster lookups. This is not exactly the behavior I am trying to achieve. So, my question is:

Is there a way to maintain the order from the original file?

(I do not want to iterate on sorted(f.keys()) since this does not give back the original order either.)

Many thanks!

python fasta • 4.5k views
modified 13 months ago by RamRS (24k) • written 9.5 years ago by Eric Normandeau (10k)

What exactly do you need to do with the sequences? I have a module that creates a class for FASTA files and should be able to maintain order with some small modifications.

written 9.5 years ago by Paulo Nuin (3.7k)

I have 20 different uses for these sequences, depending on the project... However, they all have in common that I want to process the sequences one at a time. The nice feature of pyfasta for me is that I don't have to put all the sequences in an object that has to reside in memory. For multi-gigabyte fasta files, that will matter. Would you post your alternative @nuin? I am interested to see it. Cheers!

written 9.5 years ago by Eric Normandeau (10k)

Mine just puts all the sequences in memory. Send me an email and we can talk (nuin AT genedrift DOT org).

written 9.5 years ago by Paulo Nuin (3.7k)
9.5 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

that's a use-case i never thought of, but, as it happens, you can do that with the index. mostly, you don't need to know about the index, but since it saves the file position of each sequence, you can use that to order your chromosomes:

import pyfasta

f = pyfasta.Fasta('some.fasta')
# sort the headers by the start position stored for each one in the index
sorted_keys = [x[0] for x in sorted(f.index.items(), key=lambda a: a[1][0])]

and that doesn't re-parse the file. and if you're doing that frequently, you could use a subclass:

class Fasta(pyfasta.Fasta):
    def sorted_keys(self):
        return [x[0] for x in sorted(self.index.items(), key=lambda a: a[1][0])]
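
To see the sorting step on its own, here is a minimal sketch that does not need pyfasta installed. The `index` dict below is a hypothetical stand-in for `f.index`, assuming each value is a `(start, stop)` pair of file offsets as the answer above suggests; the helper name `keys_in_file_order` is made up for illustration.

```python
# Hypothetical stand-in for f.index: header -> (start, stop) file offsets.
# The offsets are invented for this example.
index = {
    "chr3": (120, 180),
    "chr1": (6, 50),
    "chr2": (57, 113),
}


def keys_in_file_order(index):
    """Return the headers sorted by the start offset of their sequence."""
    return sorted(index, key=lambda name: index[name][0])


ordered = keys_in_file_order(index)
print(ordered)  # ['chr1', 'chr2', 'chr3']
```

Because only the small index is sorted, the cost depends on the number of records, not on the size of the fasta file.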
modified 13 months ago by RamRS (24k) • written 9.5 years ago by brentp (23k)

Thanks Brent! That was nested pretty far! Do you think it would be easy to add a feature making this possible out-of-the-box, without too much time overhead compared to a fasta iterator on the file? Cheers
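
For comparison, the kind of plain fasta iterator mentioned here could look like the sketch below. It is not part of pyfasta; `iter_fasta` is a hypothetical helper that reads a file handle line by line and yields one record at a time in file order, so only the current sequence is held in memory.

```python
import io


def iter_fasta(handle):
    """Yield (header, sequence) pairs in file order, one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the last record once the file is exhausted.
    if header is not None:
        yield header, "".join(chunks)


# Small in-memory example instead of a real file.
demo = io.StringIO(">seq2\nACGT\nAC\n>seq1\nGGTT\n")
records = list(iter_fasta(demo))
print(records)  # [('seq2', 'ACGTAC'), ('seq1', 'GGTT')]
```

Unlike the index-based approach, this gives no random access by header; it only supports a single sequential pass.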

written 9.5 years ago by Eric Normandeau (10k)

well, it's really just in f.index, so that's not nested too far. rather than add more features (which i have to document, test, maintain, and justify), i'd prefer to add it to the wiki or something.

written 9.5 years ago by brentp (23k)
8.6 years ago by
krst10
krst10 wrote:

If you have Python 2.7+, you could modify pyfasta to store the keys in an OrderedDict so that they stay in order.
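
As a rough illustration of the idea (not pyfasta's actual code), the sketch below builds a header index with `collections.OrderedDict` while scanning a fasta file, so iterating over the index gives the headers in file order. The function name `build_ordered_index` and the choice to store the byte offset of each header line are assumptions made for this example.

```python
import os
import tempfile
from collections import OrderedDict


def build_ordered_index(path):
    """Map each header to the byte offset of its '>' line, in file order."""
    index = OrderedDict()
    with open(path, "rb") as handle:
        offset = 0
        for line in handle:
            if line.startswith(b">"):
                index[line[1:].strip().decode()] = offset
            offset += len(line)
    return index


# Write a tiny example file, index it, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False, newline="\n") as tmp:
    tmp.write(">zeta\nACGT\n>alpha\nGGCC\n")
index = build_ordered_index(tmp.name)
os.remove(tmp.name)

print(list(index))  # ['zeta', 'alpha']
```

On Python 3.7 and later, a plain dict also preserves insertion order, so the same sketch works there without OrderedDict.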

written 8.6 years ago by krst (10)