How to shorten the headers of multiple FASTA sequences
2.5 years ago
Nelo ▴ 20

Hello everyone

I want to trim (shorten) the headers of a multi-FASTA file, from this:

>PH01000278G0580 AAPIP1;1 PH_genemodel_v1 PH01000278..503019..506969 . + . ID=PH01000278G0580;Name=cytochrome P450, putative, expressed
MVLLVAIGVVVGVLVVSSLVLRWNEVRYSRKQGLPPGTMGWPLFGETTEFLKHGP
>PH01003036G0080 AANIP2;1 PH_genemodel_v1 PH01003036..45987..47350 . + . ID=PH01003036G0080;Name=chlorophyll A-B binding protein, putative, expressed
MAMASSSGLRSCSAVGVPSLLAPSSRSGRSGLPFCAYATTSGRVTMSAEWFPGQ

to this:

>PH01000278G0580 AAPIP1;1
MVLLVAIGVVVGVLVVSSLVLRWNEVRYSRKQGLPPGTMGWPLFGETTEFLKHGP
>PH01003036G0080 AANIP2;1
MAMASSSGLRSCSAVGVPSLLAPSSRSGRSGLPFCAYATTSGRVTMSAEWFPGQ

I found some commands like these:

awk 'BEGIN{RS=">";}NR>1{ split($1,a," "); print ">"a[0]"\n"$2; }' in.fasta > out.fasta
awk -F 'locus_tag=|]' 'NR %2 == 1 {print ">"$2 }; NR % 2 == 0 {print}'

but neither of them works, even after tweaking them several times.
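A likely reason the first command fails: awk's split() stores the pieces in a[1]..a[n], so a[0] is always empty, and with the default field separator $2 is the second word of the header, not the sequence. The second command splits on locus_tag=, which does not occur in these headers, and it also assumes every sequence fits on one line. Below is a minimal sketch of a corrected record-based version. It assumes every header has at least two space-separated words and contains no '>' character; the function name shorten_fasta_records is only illustrative.

```shell
#!/bin/sh
# Sketch: corrected version of the record-based awk attempt.
# RS=">" makes each FASTA entry one record (header line + sequence lines).
# With the default field separator, $1 and $2 are the first two header words.
# Everything from the first newline onward is the sequence, copied verbatim,
# so multi-line sequences are preserved.
shorten_fasta_records() {
  awk 'BEGIN { RS = ">"; ORS = "" }
       NR > 1 {                          # record 1 is the empty text before the first ">"
         nl = index($0, "\n")            # position where the header line ends
         print ">" $1 " " $2 substr($0, nl)
       }' "$1"
}
```

Usage: shorten_fasta_records in.fasta > out.fasta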

trimmed header fasta • 589 views
Answer, 2.5 years ago:
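cut -d ' ' -f 1,2 in.fasta > out.fasta

This keeps the first two space-separated fields of every header line. The sequence lines contain no spaces, so cut prints them unchanged (cut passes lines without the delimiter through whole unless -s is given).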
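The cut command above relies on non-header lines containing no spaces; any such line with three or more space-separated words would also be truncated to its first two fields. A minimal sketch that rewrites only lines starting with '>' and passes everything else through unchanged is shown here. The function name shorten_fasta_headers is hypothetical, and the sketch assumes each header has at least two whitespace-separated words.

```shell
#!/bin/sh
# Sketch: shorten only header lines (those starting with ">") to their
# first two whitespace-separated words; all other lines are printed as-is.
# Unlike cut -d ' ' -f 1,2, this never touches a non-header line.
shorten_fasta_headers() {
  awk '/^>/ { print $1, $2; next }   # header: keep ">ID" and the second word
       { print }                     # sequence lines: unchanged
      ' "$1"
}
```

Usage: shorten_fasta_headers in.fasta > out.fasta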