Question: splitting multifasta-file in python
10 months ago by
twesigomwedavid0 wrote:

Hello,

How can I split a multi-fasta file into individual sequence files in python?


If this is an assignment, you should always show the code you have written so far (if you need specific help).

Otherwise, similar questions and solutions can be found on this forum. Try an external Google search.

written 10 months ago by genomax

Always show an attempt in your post to show you tried something first.

Some hints:

  • You can use Biopython, but it can be slow if you have a huge file.

  • Or read the file line by line in a for loop: for each ">" at the beginning of a line, create a new file and write that header line and the sequence lines that follow it (up to the next ">") into it. You can even use the header of each sequence as the output file name.
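The line-by-line hint could be sketched roughly as follows, using only the standard library. This is a minimal sketch, not a tested production tool: the function name `split_fasta`, the counter prefix on file names, and the character clean-up rule are all assumptions made for illustration. It also handles sequences wrapped over several lines, which is why it keeps writing until the next ">" rather than copying only one line after the header.

```python
import os
import re


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file in out_dir.

    Hypothetical helper for illustration; returns the paths written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new header closes the previous record's file.
                    if out is not None:
                        out.close()
                    # Use the first word of the header, made filesystem-safe.
                    words = line[1:].split()
                    seq_id = words[0] if words else "record"
                    safe = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
                    # A counter prefix keeps duplicate IDs from overwriting each other.
                    name = "%d_%s.fasta" % (len(written) + 1, safe)
                    path = os.path.join(out_dir, name)
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    # Header and sequence lines go to the current record's file;
                    # anything before the first header is skipped.
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling `split_fasta("test_fasta.fasta", "splitoutput")` would then write one file per record into `splitoutput` (both names are placeholders). Only one output file is open at a time, so memory use stays flat even for a large input.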

I think you can even do that in one Unix command

written 10 months ago by Bastien Hervé
10 months ago by
jrj.healey13k
United Kingdom
jrj.healey13k wrote:

As the others have said, see other results on this forum, for example: Split the multiple sequences file into a separate files

10 months ago by
Siya Diya0
Thrissur
Siya Diya0 wrote:

Try this code

#!/usr/bin/env python
import argparse
import os

from Bio import SeqIO


def split(fastafile="test_fasta.fasta", outfastadir="splitoutput"):
    """Write each sequence of a multi-FASTA file to its own numbered file."""
    # Create the output directory portably instead of shelling out to mkdir.
    os.makedirs(outfastadir, exist_ok=True)
    file_count = 0
    with open(fastafile) as fh:
        for seq_rec in SeqIO.parse(fh, "fasta"):
            file_count += 1
            # Output files are named 1.fasta, 2.fasta, ... in input order.
            out_path = os.path.join(outfastadir, "%d.fasta" % file_count)
            with open(out_path, "w") as fho:
                SeqIO.write(seq_rec, fho, "fasta")
    if file_count == 0:
        raise ValueError("No valid sequence in fasta file")
    return "Done"


if __name__ == '__main__':
    # argparse in Python 3 does not accept version= in the constructor,
    # so the version is exposed through a --version option instead.
    parser = argparse.ArgumentParser(
        description="Extract multiple sequence fasta file and write each sequence in separate file")
    parser.add_argument('-v', '--version', action='version', version='%(prog)s 1.0')
    parser.add_argument('-f', '--fastafile', default="test_fasta.fasta",
                        help="Fasta File for parsing")
    parser.add_argument('-d', '--outfastadir', default="splitoutput",
                        help="Fasta File output directory")
    args = parser.parse_args()
    split(fastafile=args.fastafile, outfastadir=args.outfastadir)