Question

Help editing FASTA headers

9.4 years ago

mario.t.murakami • 0

Dear All,

I would like to edit the headers of a multi-FASTA file, but I am not very familiar with scripting.

Basically, I want to remove any text in parentheses (including the parentheses themselves) and remove the square brackets while keeping the text inside them. For instance, the first header below should become the second:

>gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]
>gi|745831934|gb|AJD39620.1| protein A Homo sapiens

Thanks in advance

sequence alignment • 3.7k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.4 years ago by mario.t.murakami • 0

Ram · Answer 1 · 2016-01-23

2

9.4 years ago

venu 7.1k

perl -pe 's/\(.*\)//' file.faa | sed 's/\[//' | sed 's/\]//'

or (Updated)

perl -pe 's/\(.*\)//' file.faa | perl -pe 's/\[//' | perl -pe 's/\]//'
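
The same cleanup can also be sketched as a single sed call. This is only a sketch and not part of the answer above: it touches header lines only (those starting with ">"), removes the space in front of each parenthesised note so that no double space is left behind, and uses [^)]* instead of .* so that two separate notes on one header would not swallow the text between them. The file names example.faa and cleaned.faa are hypothetical, and the sequence line is made up.

```shell
# Hypothetical two-line test file; the sequence line is made up.
printf '%s\n' \
  '>gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]' \
  'MKTAYIAKQR' > example.faa

# On header lines only: drop "(...)" plus any spaces in front of it,
# then drop every "[" and "]" while keeping the text between them.
sed -E -e '/^>/s/ *\([^)]*\)//g' -e '/^>/s/[][]//g' example.faa > cleaned.faa

cat cleaned.faa
```

For the example header from the question, this prints the header in the requested form and leaves the sequence line untouched.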

ADD COMMENT • link updated 6.1 years ago by Ram 45k • written 9.4 years ago by venu 7.1k

Dear Venu,

Thanks for your help. When I ran the command you suggested, I got the following error:

sed: -e expression #1, char 1: unknown command: ` ' '

Could you also tell me how to write the result to an output file instead of only printing it to the screen? Thanks again.

ADD REPLY • link updated 6.1 years ago by Ram 45k • written 9.4 years ago by mario.t.murakami • 0

You can redirect the output to a new file like this:

$ perl -pe 's/\(.*\)//' file.faa | sed 's/\[//' | sed 's/\]//' > new_file_name
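
To check that the redirect really produced edited headers, one option is to compare the two files and count headers that still contain the unwanted symbols. The sketch below assumes a made-up two-line file.faa and, to keep it to one tool, expresses all three substitutions in sed; new_file_name is the output name from the command above. The output must be a different file from the input, because the shell empties the redirect target before the command reads anything.

```shell
# Hypothetical input; replace with your own file.faa.
printf '%s\n' \
  '>gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]' \
  'MKTAYIAKQR' > file.faa

# The same three substitutions as in the thread, written to a separate file.
sed -e 's/(.*)//' -e 's/\[//' -e 's/\]//' file.faa > new_file_name

# cmp -s is silent and exits 0 only when the two files are identical.
if cmp -s file.faa new_file_name; then
  echo "unchanged"
else
  echo "changed"
fi

# Count header lines that still contain (, ), [ or ].
# "|| true" keeps the script going when grep finds nothing.
leftover=$(grep '^>' new_file_name | grep -c '[][()]' || true)
echo "headers with leftover symbols: $leftover"
```

If the first check reports "unchanged", the substitutions never ran on the input, which points at the command line rather than at the redirect.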

ADD REPLY • link updated 6.1 years ago by Ram 45k • written 9.4 years ago by GenoMax 152k

Yes, I did that, but it just writes out the original file unchanged. PS: I tested it without the sed commands.

ADD REPLY • link updated 6.1 years ago by Ram 45k • written 9.4 years ago by mario.t.murakami • 0

Are you saying that you are still getting the sed error you mentioned above? Are you using the command exactly as provided by @venu (with single quote characters)?

ADD REPLY • link updated 6.1 years ago by Ram 45k • written 9.4 years ago by GenoMax 152k

I mean that the changes the command should make do not appear in the output file.

By the way, in the command updated by @venu, the second and third parts are not working. It now prints only this:

original: >gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]
printed: >gi|745831934|gb|AJD39620.1| protein A [Homo sapiens]

Could this be related to the fact that I am using Strawberry Perl on Windows?

thanks

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.4 years ago by mario.t.murakami • 0

There is evidently a difference between Perl on Unix and the Perl you are using on Windows, which does not appear to behave the way we expect it to on Unix.

Is the file too large to make these changes with a text editor on Windows?
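
A likely explanation for the Windows behaviour, offered here as an assumption rather than something confirmed in this thread: cmd.exe does not treat single quotes as quoting characters, so the quotes reach perl and sed as part of the expression. That would be consistent with the sed error reported above and with perl printing the file unchanged. One way to sidestep command-line quoting altogether is to keep the substitutions in a script file and pass it with sed -f. In the sketch below, clean_headers.sed is a hypothetical file name and the input is made up; the here-document is only there to make the example self-contained, and the script file could equally be typed in a text editor.

```shell
# Hypothetical input file; the sequence line is made up.
printf '%s\n' \
  '>gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]' \
  'MKTAYIAKQR' > file.faa

# Write the sed script to a file (hypothetical name: clean_headers.sed).
cat > clean_headers.sed <<'EOF'
# Edit header lines only (lines starting with ">").
/^>/ {
  s/ *([^)]*)//g
  s/[][]//g
}
EOF

# The expression lives in the file, so nothing needs quoting here.
sed -f clean_headers.sed file.faa > new_file_name
cat new_file_name
```

Because the expressions are read from the file, the same sed -f command line should work unchanged in any shell where sed is available.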

ADD REPLY • link updated 6.1 years ago by Ram 45k • written 9.4 years ago by GenoMax 152k

Yes, there are 400 sequences. I will run the script on Linux, where it will probably work fine.

thanks

ADD REPLY • link 9.4 years ago by mario.t.murakami • 0

It works perfectly well for me. I don't know why you are getting an error. I am updating the answer. Redirect the output to a new file, as @genomax said.

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.4 years ago by venu 7.1k

Thanks, both. On Linux, it worked fine.

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.4 years ago by mario.t.murakami • 0