2.5 years ago
Nelo
Hello everyone
I want to trim (shorten) the headers of a multi-sequence FASTA file, from this:
>PH01000278G0580 AAPIP1;1 PH_genemodel_v1 PH01000278..503019..506969 . + . ID=PH01000278G0580;Name=cytochrome P450, putative, expressed
MVLLVAIGVVVGVLVVSSLVLRWNEVRYSRKQGLPPGTMGWPLFGETTEFLKHGP
>PH01003036G0080 AANIP2;1 PH_genemodel_v1 PH01003036..45987..47350 . + . ID=PH01003036G0080;Name=chlorophyll A-B binding protein, putative, expressed
MAMASSSGLRSCSAVGVPSLLAPSSRSGRSGLPFCAYATTSGRVTMSAEWFPGQ
to this:
>PH01000278G0580 AAPIP1;1
MVLLVAIGVVVGVLVVSSLVLRWNEVRYSRKQGLPPGTMGWPLFGETTEFLKHGP
>PH01003036G0080 AANIP2;1
MAMASSSGLRSCSAVGVPSLLAPSSRSGRSGLPFCAYATTSGRVTMSAEWFPGQ
I found some commands, such as:
awk 'BEGIN{RS=">";}NR>1{ split($1,a," "); print ">"a[0]"\n"$2; }' in.fasta > out.fasta
awk -F 'locus_tag=|]' 'NR %2 == 1 {print ">"$2 }; NR % 2 == 0 {print}'
But neither works for my file, even after I tried modifying them several times.
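For what it is worth, two things in these commands look like likely culprits. In awk, split() fills the array starting at index 1, so a[0] is empty, and with RS=">" the fields $1 and $2 are the first two whitespace-separated words of the whole record (the ID and the gene name), not the header and the sequence. The second command splits on locus_tag=, which never appears in these headers. A minimal sketch of one possible fix, assuming the header fields are separated by spaces or tabs and only the first two should be kept (the function name trim_fasta_headers is made up for illustration):

```shell
# Hypothetical helper: keep only the first two whitespace-separated fields
# of every FASTA header line and pass sequence lines through unchanged.
# Reads the files given as arguments, or standard input if there are none.
trim_fasta_headers() {
  awk '
    /^>/ {
      # Header line: $1 is ">ID", $2 is the gene name (if present)
      print (NF > 1 ? $1 " " $2 : $1)
      next
    }
    { print }   # Sequence line: print as-is
  ' "$@"
}
```

It could then be called as trim_fasta_headers in.fasta > out.fasta. Because it works line by line rather than assuming every record is exactly two lines, it should also cope with sequences wrapped over several lines.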