Question

multiple fasta to single fasta

0

2.9 years ago

setschmann ▴ 10

I have a huge reference genome with a lot of contigs; it looks something like this:

>aalba5_s00000010
TTGTCTGCTTCACAGTACAGCTAGAAAATTATGAATTCATTTCCCCACATCAAGCAACCCCTGCTTATTC
>aalba5_s00000011
ACTTGGAATGGGATCTTGTTGGGGGGCCAACAGAACCATAAGGGCAATGGCTGCAATCTTTGATAAGATC
>aalba5_s00000012
TGTAGCAAACAGCTACGGAAAAATTTTAAAAATTTTCGAAATTTAAATCTGGGGTTCCCTTTCCTGTGTA
GATGTATTCCCTTTTTAAAGGTTTTCCTAGGACTTGCAGTCATTAATGAGACGTCTTCTCATGATATCCT
AATTTTTGGAAGATGCCTCCTACATCAGGAATCTTTGCTGCCACTTGTCTCTTTCATCAGCCAGATGTCT

How can I split this so that I get one file per contig, each named after the contig (for example aalba5_s00000010.fa) and containing its sequence?

Thanks for the help.

fasta reference visualisation genome • 1.0k views

updated 2.9 years ago by Renesh ★ 2.2k • written 2.9 years ago by setschmann ▴ 10

0

You can try the Python code below on your file:

from bioinfokit.analys import Fasta
Fasta.split_fasta(file='seq.fasta', n=3)

Replace n with the number of sequences in your file. Read more here: https://www.reneshbedre.com/blog/filereaders.html#split-fasta-file-into-multiple-fasta-files
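If the output files need to be named after the contig IDs, as asked in the question, a minimal sketch using only the Python standard library might look like the following. The function name split_fasta_by_id and the output directory are hypothetical choices for illustration, not part of bioinfokit. It assumes every header line has an ID right after '>', that IDs are unique, and that IDs contain no characters that are invalid in file names (such as '/').

```python
from pathlib import Path


def split_fasta_by_id(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to <out_dir>/<record ID>.fa.

    Hypothetical helper for illustration. Assumes unique IDs that are
    safe to use as file names; a repeated ID would overwrite its file.
    Returns the list of paths written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                text = line.rstrip()  # drop newline and trailing spaces
                if not text:
                    continue  # skip blank lines
                if text.startswith(">"):
                    # New record: close the previous file, open the next.
                    if out is not None:
                        out.close()
                    # The ID is the header text up to the first whitespace.
                    record_id = text[1:].split()[0]
                    path = out_dir / f"{record_id}.fa"
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    # Header and sequence lines are copied as-is.
                    out.write(text + "\n")
    finally:
        if out is not None:
            out.close()
    return written
```

Calling split_fasta_by_id("seq.fasta", "contigs") on the example from the question would then be expected to create contigs/aalba5_s00000010.fa, contigs/aalba5_s00000011.fa and contigs/aalba5_s00000012.fa. Only one output file is open at a time, so the number of contigs should not run into open-file limits.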

2.9 years ago by Renesh ★ 2.2k

score 1 · Answer 1 · 2021-09-13

1

2.9 years ago

ATpoint 84k

How to split fasta by '>' into a file each containing one sequence, and have the name of that file be the ID?

How To Split A Multiple Fasta

score 1 · Answer 2 · 2021-09-13

1

2.9 years ago

Pierre Lindenbaum 163k

How to make fasta manipulation more efficient
