7.4 years ago
mario.t.murakami • 0
I would like to edit the header of a multifasta file, but I am not so familiar with scripting.
Basically, I want to remove the information within parentheses (including the parentheses themselves) and remove the bracket symbols while keeping the information inside them. For instance:
Input: >gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]
Desired output: >gi|745831934|gb|AJD39620.1| protein A Homo sapiens
Thanks in advance
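The command originally suggested in reply is not preserved in this thread. As a minimal sketch of one way to do what is asked (not necessarily the command @venu posted), the following assumes a Unix shell with a sed that supports `-E` (GNU or BSD). The file name `input.fasta` and the sequence line are made-up placeholders.

```shell
# Hypothetical sample input; the sequence line is a placeholder.
cat > input.fasta <<'EOF'
>gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]
MKTAYIAKQR
EOF

# Only on header lines (those starting with '>'):
#   1. delete every "(...)" group together with the spaces before it
#   2. delete the '[' and ']' characters, keeping the text between them
# Sequence lines are printed unchanged. The result goes to the screen.
sed -E '/^>/ { s/ *\([^)]*\)//g; s/[][]//g; }' input.fasta
```

The single quotes matter: they stop the shell from interpreting the parentheses and brackets before sed sees them.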
Thanks for your help. When I ran your suggested command line, I got the following error:
Please tell me how to write to an output file instead of only printing to the screen. Thanks again.
You can redirect the output to a new file like this:
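The exact command from this reply is not preserved here. As a hedged illustration of shell output redirection, assuming a sed-based header cleanup and the hypothetical file names `input.fasta` and `output.fasta`:

```shell
# Hypothetical sample input so the example runs on its own.
printf '>seq1 protein A (plasmid) [Homo sapiens]\nACGT\n' > input.fasta

# '>' sends the command's standard output to output.fasta
# (creating or overwriting it) instead of printing it to the terminal.
sed -E '/^>/ { s/ *\([^)]*\)//g; s/[][]//g; }' input.fasta > output.fasta
```

Use a different name for the output file than for the input file: redirecting to the file being read truncates it before the command has processed it.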
Yes, I did that, but it just writes the original file unchanged. PS: I tested it without the sed commands.
Are you saying that you are still getting the sed error you mentioned above? Are you using the command exactly as provided by @venu (with single quote characters)?
I mean that the changes that should be made by the command are not written to the output file.
By the way, the second and third parts of the command line updated by @venu are not working. It now only prints this:
Perhaps it is related to the fact that I am using Strawberry Perl on Windows?
Perl on Unix and the Perl you are using on Windows are evidently not the same, and yours does not appear to work the way we expect it to on Unix.
Is the file too large to do this with an editor on Windows?
Yes, there are 400 sequences. I will run the script on Linux, where it will probably work fine.
It works perfectly fine for me. I don't know why you are getting an error. I am updating the answer. Direct the output to a new file, as @genomax said.
Thanks, both. On Linux, it worked fine.