I want to edit the headers in my FASTA file by adding pipes, but I am unable to do so. The headers look like this:
>XP_002436309.2 NAC domain-containing protein 69 isoform X1 [Sorghum bicolor]
MPSTSISSASAAGKGGSKAMQPPPQLPAALPVGFRFRPTDEELVRHYLKPKIAGHAHADLLLIPDVDLSACEPWELPAKA
>XP_002436310.1 plastocyanin, chloroplastic [Sorghum bicolor]
MASLSSATITAPSAFAAPAARAVARRSSFTVRASLGKAAGTAAVAVAASALLAGGAMAQEVLLGANGGVLVFEPSEFTVK
to
>sp|XP_002436309.2| NAC domain-containing protein 69 isoform X1 [Sorghum bicolor]
MPSTSISSASAAGKGGSKAMQPPPQLPAALPVGFRFRPTDEELVRHYLKPKIAGHAHADLLLIPDVDLSACEPWELPAKA
>sp|XP_002436310.1| plastocyanin, chloroplastic [Sorghum bicolor]
MASLSSATITAPSAFAAPAARAVARRSSFTVRASLGKAAGTAAVAVAASALLAGGAMAQEVLLGANGGVLVFEPSEFTVK
I am able to add sp| using Notepad++, but I cannot add the second pipe after the accession number (e.g. KX035646.1).
Thank you for the help!
We need a bit more information, really.
Is it just one FASTA header? Do you need sp in front of all of them? Is the accession number always the same?
Yes, this is just one header; the whole file has more than 150,000 sequences. All headers should have "sp", then a pipe, then the accession, and then another pipe. The accession number is different for every sequence.
See if this does it:
sed 's/^>/>sp|/g' your_file > new_file
Edit: Looks like you need another | after the accession. You should search Biostars for leads; this is one of the most frequently asked questions here.
sed -e 's/^>/>sp|/g' -e 's/ Name/| Name/g' your_file > new_file
This didn't work...
I got the same output as input...
It did not work because in the example above you had put "Name:". If the names are not consistent, then this example should be added to the original post.
I tried to search, but somehow my keywords did not match anything... I am sorry for that... I have just modified the original post...
And this is again not generating the second | in the output.
Use this: A: modify header of sequencs in fasta file
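For anyone landing here later: a minimal sketch that does not depend on any particular word (such as "Name") appearing in the description. It assumes the accession is always the first whitespace-delimited token after ">", as in the two headers posted above. The function name add_sp_pipes is made up for illustration, and -E is the extended-regex flag accepted by both GNU and BSD sed.

```shell
# Hypothetical helper (name is illustrative, not from the thread).
# Assumption: the accession is the first whitespace-delimited token after ">".
# Header lines become ">sp|ACCESSION|rest of header"; sequence lines do not
# start with ">" and are therefore passed through unchanged.
add_sp_pipes() {
  sed -E 's/^>([^[:space:]]+)/>sp|\1|/' "$1"
}

# Usage (file names are placeholders):
#   add_sp_pipes your_file > new_file
```

Capturing the accession with ([^[:space:]]+) and reusing it as \1 is what lets the second pipe land right after the accession, whatever the description text says. Check a few headers with head new_file before running downstream tools on all 150,000 sequences.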