Question

Split a multi-sequence file into separate files

2

6.6 years ago

skjobs1234 ▴ 40

I have a file containing multiple gene sequences (each record starts with a > header line). I want to split it into separate files, one per sequence, each named after its gene ID, using a Perl or Python script.

sequence • 9.3k views

ADD COMMENT • link updated 3.0 years ago by chronotope ▴ 10 • written 6.6 years ago by skjobs1234 ▴ 40

1

Can you show an example?

ADD REPLY • link 6.6 years ago by st.ph.n ★ 2.7k

0

Search this forum for any regular question. These were answered many times. (You'll get used to this if you are a new user)

Here is one of many solutions on this forum: How To Split A Multiple Fasta

BTW, it is a question, not a tutorial.

ADD REPLY • link 6.6 years ago by venu 7.1k

0

Have you searched the forum for similar questions?
Why Perl or Python specifically? Why not awk or other unix tools?

ADD REPLY • link 6.6 years ago by Ram 43k

score 6 · Accepted Answer · 2017-09-18

6

6.6 years ago

Joe 21k

As others have mentioned, this has been answered many times on the forum. But I can't help myself when it comes to making bash do this sort of thing. Note that the sequences have to be linearised first, i.e. each record must be exactly one header line followed by one sequence line.

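#!/bin/bash
# Expects linearised input: each record is one header line plus one sequence line.

i=1
# The "|| [[ -n $line ]]" part keeps a final line that lacks a trailing newline.
while IFS= read -r line || [[ -n $line ]]; do
  [[ -z $line ]] && continue          # skip blank lines
  if [[ ${line:0:1} == ">" ]]; then
    echo "$line" >> "seq${i}.fasta"   # header: start of record i
  else
    echo "$line" >> "seq${i}.fasta"   # sequence: record i is complete
    ((i++))
  fi
done < "$1"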

Usage:

$ bash splitfasta.sh multifasta.fasta

Disclaimer:

You should always use a proper parser though (like biopython) as it'll catch many of the special cases. My code just has the bonus of not requiring anything to be installed to run.
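For input that has not been linearised and a machine without Biopython, the same idea can be sketched with the Python standard library alone. This is only an illustration, not a replacement for a real parser: the function name split_fasta and the seq<N>.fasta naming are assumptions that mirror the bash loop above, and the sketch does no validation of the sequence content.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to <out_dir>/seq<N>.fasta.

    Unlike the bash loop, a record's sequence may span several lines.
    Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A header starts a new record: close the previous file.
                    if out is not None:
                        out.close()
                    path = out_dir / f"seq{len(written) + 1}.fasta"
                    out = open(path, "w")
                    written.append(path)
                if out is not None and line.strip():
                    # Copy header and sequence lines; skip blank lines.
                    out.write(line.rstrip("\n") + "\n")
    finally:
        if out is not None:
            out.close()
    return written
```

Calling split_fasta("multifasta.fasta", "out") would write out/seq1.fasta, out/seq2.fasta and so on, one file per header line found.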

ADD COMMENT • link 5.6 years ago by Joe 21k

0

A quick awk for linearizing the sequences:

awk '$0~/^>/{if(NR>1){print sequence;sequence=""}print $0}$0!~/^>/{sequence=sequence""$0}END{print sequence}' "$1"
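For readers who find the one-liner hard to follow, the linearising step can be sketched in Python as a small generator. The name linearize_fasta is hypothetical, and unlike the awk command this sketch drops blank lines and ignores any text that appears before the first header.

```python
def linearize_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    chunks = []
    seen_header = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seen_header:
                # Flush the sequence collected for the previous record.
                yield "".join(chunks)
            chunks = []
            seen_header = True
            yield line
        elif seen_header:
            chunks.append(line)
    if seen_header:
        # Flush the last record's sequence.
        yield "".join(chunks)
```

It accepts any iterable of lines, so an open file handle can be passed directly and the result written out line by line.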

ADD REPLY • link 5.3 years ago by dipoppleton • 0

Ram · Accepted Answer · 2017-09-18

5

6.6 years ago

Renesh ★ 2.2k

This is a Python script for splitting a FASTA file into individual files, one per sequence.

import argparse

from Bio import SeqIO  # requires Biopython

parser = argparse.ArgumentParser(
    description="Split a FASTA file into individual files, one per gene sequence")
parser.add_argument('-f', dest='fasta_file', required=True, help='Input FASTA file')
args = parser.parse_args()

# Open in plain text mode: the old "rU" mode is deprecated (removed in Python 3.11).
with open(args.fasta_file) as fasta:
    for rec in SeqIO.parse(fasta, "fasta"):
        # One output file per record, named after the record ID.
        with open(rec.id, "w") as out:
            out.write(">" + rec.id + "\n" + str(rec.seq) + "\n")

To run the code above, save it as code.py and run:

python code.py -f fasta_file

Note: You need Biopython installed to run this code (it provides the SeqIO module).
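One caveat with naming output files directly after the record ID: IDs such as sp|P12345|ABC_HUMAN contain characters that are awkward or invalid in file names on some systems. A minimal sketch of a clean-up step follows. The helper name safe_filename and the .fasta suffix are assumptions for illustration, not part of the script above, and two different IDs could still map to the same name.

```python
import re


def safe_filename(record_id, suffix=".fasta"):
    """Turn a FASTA record ID into a conservative file name.

    Anything other than letters, digits, '.', '_' or '-' becomes '_'.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", record_id).strip("._")
    # Fall back to a fixed name if nothing usable is left.
    return (cleaned or "unnamed") + suffix
```

In the script, the output file could then be opened with open(safe_filename(rec.id), "w") instead of using the raw ID.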

ADD COMMENT • link 6.6 years ago by Renesh ★ 2.2k

0

You should add a note: this script requires the Biopython module to be installed.

ADD REPLY • link 6.6 years ago by shoujun.gu ▴ 380

0

Hello,

I tried to use this code, but Pyzo kept giving me an error message. Any resolution? Thanks!

Running script: "C:\Users\14805\Desktop\python test\New folder\Splitthefastafile.py"
C:\Users\14805\Desktop\python test\New folder\Splitthefastafile.py:9: DeprecationWarning: 'U' mode is deprecated
  f_open = open('result.fasta_file', "rU")
Traceback (most recent call last):
  File "C:\Users\14805\Desktop\python test\New folder\Splitthefastafile.py", line 9, in <module>
    f_open = open('result.fasta_file', "rU")
FileNotFoundError: [Errno 2] No such file or directory: 'result.fasta_file'

ADD REPLY • link updated 3.4 years ago by Ram 43k • written 3.4 years ago by oocute72327 • 0

0

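You haven't told it where the target FASTA file is, or you've gotten the path/filename wrong, so it cannot find it. Your traceback shows open('result.fasta_file', "rU"): with those quotes, Python looks for a file literally named result.fasta_file instead of using the path passed with -f.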

Also, don't use spaces in file/folder names.
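A small check before opening the file can make this kind of failure easier to diagnose. The sketch below is only a suggestion: the function name resolve_input is hypothetical, and it simply reports the path that was tried along with the current working directory.

```python
from pathlib import Path


def resolve_input(path_arg):
    """Return the input path as a Path, or raise a clear error if it is missing."""
    path = Path(path_arg).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"Input FASTA not found: {path} (current directory: {Path.cwd()})"
        )
    return path
```

It would be called on the parsed argument itself, for example resolve_input(result.fasta_file) with no quotes around the name. Python can open paths that contain spaces, but such paths must be quoted on the command line, which is one reason to avoid them.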

ADD REPLY • link 3.4 years ago by Joe 21k