Question: Editing header by adding pipe in fasta file
muhammad.arslan (0) wrote, 7 months ago:

I want to edit the headers in my fasta file by adding pipes, but I have been unable to do so. The headers look like this:

>XP_002436309.2 NAC domain-containing protein 69 isoform X1 [Sorghum bicolor]
MPSTSISSASAAGKGGSKAMQPPPQLPAALPVGFRFRPTDEELVRHYLKPKIAGHAHADLLLIPDVDLSACEPWELPAKA

>XP_002436310.1 plastocyanin, chloroplastic [Sorghum bicolor]
MASLSSATITAPSAFAAPAARAVARRSSFTVRASLGKAAGTAAVAVAASALLAGGAMAQEVLLGANGGVLVFEPSEFTVK

to

>sp|XP_002436309.2| NAC domain-containing protein 69 isoform X1 [Sorghum bicolor]
MPSTSISSASAAGKGGSKAMQPPPQLPAALPVGFRFRPTDEELVRHYLKPKIAGHAHADLLLIPDVDLSACEPWELPAKA

>sp|XP_002436310.1| plastocyanin, chloroplastic [Sorghum bicolor]
MASLSSATITAPSAFAAPAARAVARRSSFTVRASLGKAAGTAAVAVAASALLAGGAMAQEVLLGANGGVLVFEPSEFTVK

I am able to add sp| using Notepad++, but I cannot add the pipe after the accession number (KX035646.1).

Thank you for the help!

editing header grep fasta • 343 views
modified 7 months ago by Hugo (130) • written 7 months ago by muhammad.arslan (0)

We need a bit more information really.

Is it just one fasta header? Do you need sp in front of all of them? Is the accession number always the same?

written 7 months ago by jrj.healey (6.8k)

Yes, this is just one header; the whole file has more than 150,000 sequences. Every header should have "sp", then a pipe, then the accession, then another pipe. The accession number is different for every sequence.

modified 7 months ago • written 7 months ago by muhammad.arslan (0)

See if this does it:

sed 's/^>/\>sp|/g' your_file > new_file

Edit: Looks like you need another | after the accession. You should search biostars for leads. This is one of the most frequently asked questions here.

sed -e 's/^>/\>sp|/g' -e 's/\ Name/\|\ Name/g' your_file > new_file

modified 7 months ago • written 7 months ago by genomax (56k)

This didn't work. I got the same output as the input:

>XP_002436309.2 NAC domain-containing protein 69 isoform X1 [Sorghum bicolor]
MPSTSISSASAAGKGGSKAMQPPPQLPAALPVGFRFRPTDEELVRHYLKPKIAGHAHADLLLIPDVDLSACEPWELPAKALIRSGDPEWFFFAPLDRKYPGGHRSNRSTAAGYWKATGKDRLIRSRRAGTLIGVKKTLVFHRGRAPRGHRTAWIMHEYRT
>XP_002436310.1 plastocyanin, chloroplastic [Sorghum bicolor]
MASLSSATITAPSAFAAPAARAVARRSSFTVRASLGKAAGTAAVAVAASALLAGGAMAQEVLLGANGGVLVFEPSEFTVKAGDTITFKNNAGYPHNVVFDEDEVPSGVDATKISQEEYLNAPGETYSVTLTVPGTYGFYCEPHQGAGMVGKVTVN
modified 7 months ago • written 7 months ago by muhammad.arslan (0)

It did not work because the example you originally posted had Name: in the header, and the second sed expression matches on that word. If the names are not consistent, then this example should be added to the original post.

written 7 months ago by genomax (56k)
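For headers shaped like the ones in the question, one way to avoid depending on a fixed word such as Name is to anchor on the first whitespace-delimited token after ">". The following is a minimal sketch, not necessarily what the linked answer does. It assumes the accession is everything between ">" and the first space or tab; the file names sample.fasta and sample.renamed.fasta are hypothetical, and the sequences are shortened.

```shell
# Build a two-record sample shaped like the headers in the question
# (hypothetical file name, shortened sequences).
cat > sample.fasta <<'EOF'
>XP_002436309.2 NAC domain-containing protein 69 isoform X1 [Sorghum bicolor]
MPSTSISSASAAGKGG
>XP_002436310.1 plastocyanin, chloroplastic [Sorghum bicolor]
MASLSSATITAPSAFA
EOF

# On header lines only, capture the first whitespace-delimited token after ">"
# (assumed to be the accession) and rewrite it as ">sp|ACCESSION|".
# Sequence lines do not start with ">" and pass through unchanged.
sed -E 's/^>([^[:space:]]+)/>sp|\1|/' sample.fasta > sample.renamed.fasta

# Sanity check: count the headers that now have the expected shape.
grep -c '^>sp|[^|[:space:]]*|' sample.renamed.fasta
```

Comparing the count from grep with the output of grep -c '^>' on the original file is a quick way to confirm that every header was rewritten. Note that running the sed command a second time on its own output would add a second sp| prefix, so it should be applied to the original file only.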

I tried searching, but somehow my keywords did not match it. Sorry about that. I have just updated the original post.

And this still does not generate the second | in the output:

>sp|XP_002436309.2 NAC domain-containing protein 69 isoform X1 [Sorghum bicolor]
MPSTSISSASAAGKGGSKAMQPPPQLPAALPVGFRFRPTDEELVRHYLKPKIAGHAHADLLLIPDVDLSACEPWELPAKA
modified 7 months ago • written 7 months ago by muhammad.arslan (0)

Use this: A: modify header of sequencs in fasta file

written 7 months ago by genomax (56k)
genomax (56k), United States, wrote 7 months ago:

@Pierre's answer works for this: A: modify header of sequencs in fasta file

modified 7 months ago • written 7 months ago by genomax (56k)

I would suggest not closing the post for such reasons. The best course of action is to post the link to the duplicate as an answer and leave it that way. There is little to be gained from closing the post.

written 7 months ago by Istvan Albert ♦♦ 77k
Hugo (130), Universidade de Vigo, Ourense (Spain), wrote 7 months ago:

Dear Muhammad, I would suggest trying the "Rename header" option of SEDA (http://www.sing-group.org/seda/). Section 3.8.4 "Add prefix/suffix" of the manual explains how to easily achieve what you want: first add a word (prefix "sp|") before the header id, and then add a word (suffix "|") after the header id. Do not hesitate to contact me if you need some help.

Regards,

Hugo.

modified 7 months ago • written 7 months ago by Hugo (130)
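The same two steps (prefix before the header id, then suffix after it) can also be approximated on the command line. This is only a sketch with awk, not SEDA itself. It assumes the header id is the first token after ">" and is followed by a space or tab; the file names sample2.fasta and sample2.renamed.fasta are hypothetical.

```shell
# Hypothetical sample input with shortened sequences.
cat > sample2.fasta <<'EOF'
>XP_002436309.2 NAC domain-containing protein 69 isoform X1 [Sorghum bicolor]
MPSTSISSASAAGKGG
>XP_002436310.1 plastocyanin, chloroplastic [Sorghum bicolor]
MASLSSATITAPSAFA
EOF

# For header lines only:
#   step 1: add the prefix "sp|" right after ">"
#   step 2: append the suffix "|" to the first token ("&" is the matched text)
# Every line, changed or not, is then printed.
awk '/^>/ { sub(/^>/, ">sp|"); sub(/^>[^ \t]+/, "&|") } { print }' \
    sample2.fasta > sample2.renamed.fasta
```

Using sub() on the line, rather than assigning to $1, leaves the spacing in the rest of the header untouched.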