Is there any solution to convert a multi-FASTA file to CSV?
3.3 years ago
USER • 0

I am trying to convert a multi-FASTA file (a FASTA file containing many sequences) into a CSV table, but I ran into an error.

Is there any solution to convert a FASTA file to CSV?

fasta biopython perl • 3.0k views

What on earth is a multi-folder file? Given that your header is in a custom format, you're going to have to parse it using custom code.


Multi-FASTA: a FASTA file with many sequences. Sorry, I'm going to edit the post. Is there any solution?


You're halfway there. re.description should give you the whole header (with Biopython's FASTA parser it normally already starts with re.id); make sure of that. Once you have it, you can split it by | and then split each element by : into key-value pairs. These key-value pairs, along with re.seq, give you the columns for each record. Write these attributes separated by , and you'll have your CSV.
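
The splitting step described above might look roughly like this. It is only a sketch: the function name parse_header is made up for illustration, and the header layout (fields separated by |, keys and values separated by :) is assumed from the example header shown elsewhere in this thread.

```python
def parse_header(description):
    """Split a 'key:value|key:value' FASTA header into a dict."""
    fields = {}
    # Drop a leading '>' if the caller passed the raw header line.
    for element in description.lstrip('>').split('|'):
        # partition() splits on the first ':' only, so values may contain ':'.
        key, _, value = element.partition(':')
        fields[key.strip()] = value.strip()
    return fields


# Example header in the format assumed above.
header = 'gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human'
print(parse_header(header))
```

With Biopython you would pass re.description for each record. If your headers have a different layout, the separators need adjusting.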


Thanks!!!

3.3 years ago
Mensur Dlakic ★ 27k

This should get you going:

from Bio import SeqIO

# Print each record on one line: header fields (split on '|') and the sequence, tab-separated.
for re in SeqIO.parse('CHIKV1-X-gb_AB455493.fasta', 'fasta'):
    print('>{}\t{}'.format(str(re.description).replace('|', '\t'), re.seq))

Prints out:

>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....
>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....
>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....

The rest is simply a matter of using a comma instead of a tab if you want CSV rather than tab-separated output.
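
If any header value could itself contain a comma, a plain replace would break the columns, so one option is to let Python's csv module handle the quoting. The following is a sketch under assumptions, not a tested pipeline: the names records_to_csv and fasta_to_csv, the Sequence column name, and the short demo sequence are all made up for illustration, and the header is assumed to use | and : as separators.

```python
import csv
import io


def records_to_csv(records, handle):
    """Write (description, sequence) pairs as CSV rows to an open text handle."""
    rows = []
    columns = []
    for description, sequence in records:
        row = {}
        for element in description.split('|'):
            key, _, value = element.partition(':')
            row[key.strip()] = value.strip()
        # 'Sequence' is a hypothetical column name; it would clash with a header key of the same name.
        row['Sequence'] = str(sequence)
        rows.append(row)
        # Collect column names in first-seen order across all records.
        for key in row:
            if key not in columns:
                columns.append(key)
    writer = csv.DictWriter(handle, fieldnames=columns, restval='', lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)


def fasta_to_csv(fasta_path, csv_path):
    """Convert a multi-FASTA file to CSV (requires Biopython)."""
    from Bio import SeqIO  # imported here so records_to_csv works without Biopython
    with open(csv_path, 'w', newline='') as out:
        pairs = ((rec.description, rec.seq) for rec in SeqIO.parse(fasta_path, 'fasta'))
        records_to_csv(pairs, out)


# Small in-memory demo with a placeholder record (the sequence is a made-up short string).
demo = [('gb:KX262887|Organism:Zika virus|Host:Human', 'GTTGTTGATC')]
buffer = io.StringIO()
records_to_csv(demo, buffer)
print(buffer.getvalue())
```

Records that lack a key present in other records get an empty cell for that column, which keeps the table rectangular.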


Thanks!!!! :)
