From .csv file to fasta
3.2 years ago
Sbrillo ▴ 10

Hi,

I want to convert this csv file to FASTA format, so that each name is on one line and its matching sequence is on the next line.

This is how the csv file looks:

kmer_100410623.0    AGTTTCTAGGCTTACAGGATGAAGA
kmer_100410669.0    AGTTTCTAGGCTTTAAGGATGAAGA
kmer_100423637.0    AGTTTCTGGGCTTTCAGGATGAAGA
kmer_100425211.0    AGTTTCTGTGCATTTGGAATGAAGA
kmer_100427622.0    AGTTTCTTAGCTTTCAGGATGAAGA
kmer_100432807.0    AGTTTCTTGGCTTTCATAATGAAGA
kmer_100433939.0    AGTTTCTTTACATTTGGAATGAAGA

The output should look like this:

>kmer_100410623.0
AGTTTCTAGGCTTACAGGATGAAGA
>kmer_100410669.0
AGTTTCTAGGCTTTAAGGATGAAGA
>kmer_100423637.0
AGTTTCTGGGCTTTCAGGATGAAGA

and so on...

Any suggestions on how to do this using Python, bash, or R?

Thanks

fasta python genome Assembly kmer • 3.9k views
3.2 years ago
GenoMax 141k

BTW: your example looks tab- or space-separated rather than comma-separated. The commands below assume a comma-separated file:

$ more example
kmer_100410623.0,AGTTTCTAGGCTTACAGGATGAAGA
kmer_100410669.0,AGTTTCTAGGCTTTAAGGATGAAGA
kmer_100423637.0,AGTTTCTGGGCTTTCAGGATGAAGA
kmer_100425211.0,AGTTTCTGTGCATTTGGAATGAAGA
kmer_100427622.0,AGTTTCTTAGCTTTCAGGATGAAGA
kmer_100432807.0,AGTTTCTTGGCTTTCATAATGAAGA
kmer_100433939.0,AGTTTCTTTACATTTGGAATGAAGA

$ cat example | tr "," "\n" | sed 's/^kmer/>kmer/'
>kmer_100410623.0
AGTTTCTAGGCTTACAGGATGAAGA
>kmer_100410669.0
AGTTTCTAGGCTTTAAGGATGAAGA
>kmer_100423637.0
AGTTTCTGGGCTTTCAGGATGAAGA
>kmer_100425211.0
AGTTTCTGTGCATTTGGAATGAAGA
>kmer_100427622.0
AGTTTCTTAGCTTTCAGGATGAAGA
>kmer_100432807.0
AGTTTCTTGGCTTTCATAATGAAGA
>kmer_100433939.0
AGTTTCTTTACATTTGGAATGAAGA
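
Since the question also asked about Python, here is a minimal sketch of the same conversion. It assumes every row holds exactly two fields (name, then sequence) separated by a comma, a tab, or spaces, so it covers both the layout in the question and the comma-separated file above. The function name `rows_to_fasta` is made up for illustration. Unlike the `sed` step, it does not rely on names starting with `kmer`.

```python
import re


def rows_to_fasta(lines):
    """Turn 'name<sep>sequence' rows into FASTA text.

    <sep> may be a comma, a tab, or one or more spaces (an assumption
    based on the two example layouts in this thread).
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split once, on the first comma or run of whitespace.
        fields = re.split(r"[,\s]+", line, maxsplit=1)
        if len(fields) != 2:
            raise ValueError(f"expected two fields, got: {line!r}")
        name, seq = fields
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n" if records else ""


# Small demo using one whitespace-separated and one comma-separated row.
example = [
    "kmer_100410623.0    AGTTTCTAGGCTTACAGGATGAAGA",
    "kmer_100410669.0,AGTTTCTAGGCTTTAAGGATGAAGA",
]
print(rows_to_fasta(example), end="")
```

To convert a real file, pass an open file handle instead of the list, for example `rows_to_fasta(open("example"))`, and write the returned string to the output file.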