Question: Is there a way to convert a multi-FASTA file to CSV?
projetoic0 wrote:

I am trying to convert a multi-FASTA file (a FASTA file containing many sequences) into a CSV table, but my attempts below do not give the output I need.

Is there a way to convert a FASTA file to CSV?

from Bio import SeqIO

for re in SeqIO.parse('CHIKV1-X-gb_AB455493.fasta', 'fasta'):
    print('>{}\t{}'.format(re.description, re.id))

and

import fastatocsv

fastatocsv.converter.convert("CHIKV1-X-gb_AB455493.fasta","zikanovo.csv")

So far, the solutions I have found either return only two columns (the ID and the sequence) or return everything together.

Expected output:

column1           column2..
gb:AB455493  Organism:Chikungunya virus   Strain Name:SL11131   Segment:nul    Host:Human  AATGG
gb:AB455493  Organism:Chikungunya virus   Strain Name:SL11131   Segment:nul    Host:Human  AATGG
gb:AB455493  Organism:Chikungunya virus   Strain Name:SL11131   Segment:nul    Host:Human  AATGG

My sequences:

>gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human
GTTGTTGATCTGTGTGAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAG
GTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAAAAGAAATCCGGAGGATTCC

>gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human
GTTGTTGATCTGTGTGAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAG
GTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAAAAGAAATCCGGAGGATTCC

>gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human
GTTGTTGATCTGTGTGAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAG
GTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAAAAGAAATCCGGAGGATTCC
convert biopython csv fasta perl • 119 views
modified 4 weeks ago by Mensur Dlakic9.0k • written 4 weeks ago by projetoic0

What on earth is a multi-folder file? Given that your header is in a custom format, you're going to have to parse it using custom code.

modified 4 weeks ago • written 4 weeks ago by Ram32k

Multi-FASTA: a FASTA file with many sequences. I'm going to edit the question, sorry. Is there any solution?

modified 4 weeks ago • written 4 weeks ago by projetoic0

You're halfway there. re.id + re.description should give you the whole header; make sure of that. Once you have it, you can split it by | and then split each element by : into key-value pairs. These key-value pairs, along with re.seq, give you the columns for each re. Write these attributes separated by , and you'll have your CSV.

written 4 weeks ago by Ram32k
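
The split-by-| then split-by-: idea from the comment above could be sketched roughly like this. This is only an illustration, not a tested pipeline: parse_header and rows_to_csv are made-up helper names, and the column named "sequence" is an assumption. The sketch takes plain (header, sequence) pairs; with Biopython those would come from (re.description, str(re.seq)). Note that for FASTA input Biopython puts the whole header line in re.description, while re.id holds only the part before the first whitespace, so re.description alone should be enough here.

```python
import csv
import io


def parse_header(header):
    """Split a 'key:value|key:value' FASTA header into a dict (hypothetical helper)."""
    fields = {}
    for item in header.lstrip(">").split("|"):
        # partition() splits on the first ':' only, so values may contain ':'
        key, _, value = item.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def rows_to_csv(pairs):
    """Turn an iterable of (header, sequence) tuples into CSV text."""
    rows = []
    for header, sequence in pairs:
        row = parse_header(header)
        row["sequence"] = sequence  # assumed column name for the sequence
        rows.append(row)

    # Collect column names in first-seen order, so records whose headers
    # carry different keys still end up in one table.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


header = "gb:KX262887|Organism:Zika virus|Strain Name:103451|Segment:null|Subtype:Asian|Host:Human"
csv_text = rows_to_csv([(header, "GTTGTTGATCTG")])
print(csv_text)
```

Using a dict per record means the keys (gb, Organism, Strain Name, ...) become the column names and only the values land in the cells; records missing a key just get an empty cell.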

Thanks!!!

written 29 days ago by projetoic0
Mensur Dlakic9.0k wrote:

This should get you going:

from Bio import SeqIO
for re in SeqIO.parse('CHIKV1-X-gb_AB455493.fasta', 'fasta'):
    print('>{}\t{}'.format(str(re.description).replace('|', '\t'), re.seq))

Prints out:

>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....
>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....
>gb:KX262887    Organism:Zika virus     Strain Name:103451      Segment:null    Subtype:Asian   Host:Human    GTTGTTGATCTGTGTGAATCA ....

The rest is simply a matter of possibly using comma instead of tab.

modified 4 weeks ago • written 4 weeks ago by Mensur Dlakic9.0k
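
For the "comma instead of tab" step in the answer above, one possible sketch is to hand the fields to Python's csv module and write them to a file instead of printing, so that any field that happens to contain a comma gets quoted. The function names write_records and fasta_to_csv are made up for illustration; the file names in the usage note are the ones from the question.

```python
import csv


def write_records(records, handle):
    """Write one CSV row per record: header fields split on '|', then the sequence."""
    writer = csv.writer(handle, lineterminator="\n")
    for record in records:
        # Each 'Key:value' chunk of the header becomes its own column
        fields = str(record.description).split("|")
        writer.writerow(fields + [str(record.seq)])


def fasta_to_csv(fasta_path, csv_path):
    """Read a multi-FASTA file with Biopython and write it out as CSV."""
    # Imported here so write_records() stays usable without Biopython installed
    from Bio import SeqIO

    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="") as handle:
        write_records(SeqIO.parse(fasta_path, "fasta"), handle)
```

Something like fasta_to_csv("CHIKV1-X-gb_AB455493.fasta", "zikanovo.csv") would then produce the table; unlike the print version, this keeps the "Key:" prefixes inside each cell and does not add a leading ">".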

Thanks!!!! :)

written 29 days ago by projetoic0