Split multiple FASTA sequences into codons
1
0
Entering edit mode
6.6 years ago

Hi,

I have a list of gene sequences:

 >2619859165 

GCGTATCCGCGGTACGGCTGGCTTGGCGCGAAGGAACGTTCCCGGGACAC
CCTGTCCGGCCTCACCCTGATGGGTGTCCGTCTCTACGACCCCAACCTCG
GTCGCTTCCTCCAGACCGATCCGGTCCCCGGCGGGTCGGACAACGCCTAC

>2619859164

ATGGAGGATCTACTCTTCTCTCTATTTGGCGTGCTGATGATTAGTGCTGG
GCTGATCTCGCTTCTGATTCCCGAGAGAGTATCTCGTTGGAATGACACGG
TAGGGCCGAGGTGGATCCGCGACTTCAGTGTGCGTGGCGAGTTCAAGGCA


>2619859163 

ATGGCCGGGTGTGAGGAACAGGTGGGGATTGGCGGGTCGGGTGACGGCGT
GCCGGTCGGGTCGGTTGTCCGGTGGGGGTTGGCGACGTTCGGCACGGGGC

and I want to split each sequence into its codons, like this:

>2619859165
GCG
TAT
CCG
...
>2619859164
ATG
GAG
GAT
...

How can I do this? Please suggest an approach.

Thanks in advance.

sequence • 1.5k views
6.6 years ago

Linearize the FASTA (join each record's sequence lines into a single line), then use sed to insert a newline after every codon:

cat input.fasta |\
awk '/^>/ {printf("%s%s\n",(N==0?"":"\n"),$0);N++;next;} {printf("%s",$0);}END{printf("\n");}' |\
sed -e $'/^[^>]/s/\([A-Z][A-Z][A-Z]\)/\\1\\\n/g'
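
The same idea can also be sketched in awk alone, which avoids the bash-specific `$'...'` quoting. This is only a minimal sketch, not a tested replacement for the pipeline above: the function name `fasta_to_codons` is hypothetical, and it assumes that every line not starting with `>` is sequence data. Whitespace inside sequence lines and at the end of headers is stripped, and a trailing partial codon (one or two bases) is printed as-is.

```shell
# Hypothetical helper: reads FASTA on stdin, prints each header
# followed by its sequence split into one codon per line.
fasta_to_codons() {
  awk '
    # Emit the buffered sequence three bases at a time, then reset it.
    function flush_seq(   i) {
      for (i = 1; i <= length(seq); i += 3) print substr(seq, i, 3)
      seq = ""
    }
    # Header line: flush the previous record, trim trailing whitespace, print.
    /^>/ { flush_seq(); sub(/[ \t\r]+$/, ""); print; next }
    # Sequence (or blank) line: drop whitespace and append to the buffer.
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    # End of input: flush the last record.
    END { flush_seq() }
  '
}
```

Usage would look like `fasta_to_codons < input.fasta > codons.fasta`. Buffering the whole record before splitting keeps codons that straddle a line break intact, which is the same reason the pipeline above linearizes first.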