Question

Is there any solution to convert a multi-FASTA file to CSV?

3.3 years ago

USER • 0

I am trying to convert a multi-FASTA file (a FASTA file with many sequences) to a CSV table, but I got the following error.

Is there any solution to convert a FASTA file to CSV?

fasta biopython perl • 3.0k views

ADD COMMENT • link updated 9 months ago by Ram 43k • written 3.3 years ago by USER • 0

What on earth is a multi-folder file? Given that your header is in a custom format, you're going to have to parse it using custom code.

ADD REPLY • link 3.3 years ago by Ram 43k

Multi-FASTA: a FASTA file with many sequences. Sorry, I'm going to edit the post. Is there any solution?

ADD REPLY • link 3.3 years ago by USER • 0

You're halfway there. re.id + re.description should give you the whole header, make sure of that. Once you have it, you can split it by | and then split each element of that by : into key-value pairs. Then, these key-value pairs along with re.seq will give you the columns for each re. Write these attributes separated by , and you'll have your CSV.
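A rough sketch of that header-splitting step, in case it helps. It assumes the header looks like the key:value|key:value|... format shown elsewhere in this thread; the function name parse_header is just made up for illustration, and it takes the header as a plain string so it doesn't depend on Biopython.

```python
def parse_header(description):
    """Split a 'key:value|key:value|...' header string into a dict.

    Assumes '|' separates the elements and the first ':' in each
    element separates the key from the value.
    """
    fields = {}
    # Drop a leading '>' if the raw header line was passed in.
    for element in description.lstrip(">").split("|"):
        # partition splits on the first ':' only, so values that
        # themselves contain ':' are kept intact.
        key, _, value = element.partition(":")
        fields[key.strip()] = value.strip()
    return fields


header = "gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human"
fields = parse_header(header)
print(fields["Organism"])
```

The dict keys become your column names and the values become the cells for that record.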

ADD REPLY • link 3.3 years ago by Ram 43k

Thanks!!!

ADD REPLY • link 3.3 years ago by USER • 0

score 2 · Accepted Answer · 2021-01-27

This should get you going:

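from Bio import SeqIO
# Loop over every record in the multi-FASTA file
for re in SeqIO.parse('CHIKV1-X-gb_AB455493.fasta', 'fasta'):
    # Turn the '|' separators in the header into tabs, then append the sequence
    print('>{}\t{}'.format(str(re.description).replace('|', '\t'), re.seq))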

Prints out:

>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....
>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....
>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....

To get CSV, all that remains is to use a comma instead of a tab as the separator.
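If any header value could itself contain a comma, a plain string replace would break the columns, so it may be safer to let Python's csv module handle the quoting. Here is a minimal sketch under a few assumptions: headers follow the key:value|key:value format shown above, the function name records_to_csv and the "sequence" column name are invented for this example, and the input is a list of (description, sequence) pairs so the snippet runs without Biopython. The sequence in the example is only the truncated prefix from the output above.

```python
import csv
import io


def records_to_csv(records, handle):
    """Write (description, sequence) pairs to handle as CSV.

    Each description is assumed to be 'key:value|key:value|...'.
    """
    rows = []
    columns = []
    for description, sequence in records:
        row = {}
        for element in description.split("|"):
            key, _, value = element.partition(":")
            row[key.strip()] = value.strip()
        row["sequence"] = str(sequence)
        # Collect column names in first-seen order.
        for key in row:
            if key not in columns:
                columns.append(key)
        rows.append(row)
    # restval fills in an empty cell when a record lacks a key.
    writer = csv.DictWriter(handle, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)


example = [
    ("gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human",
     "GTTGTTGATCTGTGTGAATCA"),
]
out = io.StringIO()
records_to_csv(example, out)
print(out.getvalue())
```

With Biopython you could feed it ((rec.description, rec.seq) for rec in SeqIO.parse(path, 'fasta')) and pass an open file (opened with newline='') instead of the StringIO.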