5.9 years ago

Dear All,

I would like to edit the headers of a multi-FASTA file, but I am not very familiar with scripting.

Basically, I want to remove any information within parentheses (including the parentheses themselves) and remove the square bracket symbols while maintaining the information within them. For instance, this header:

>gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]

should become:

>gi|745831934|gb|AJD39620.1| protein A Homo sapiens

5.9 years ago
venu 7.0k
perl -pe 's/\(.*\)//' file.faa | sed 's/\[//' | sed 's/\]//'


or (Updated)

perl -pe 's/\(.*\)//' file.faa | perl -pe 's/\[//' | perl -pe 's/\]//'
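
The same cleanup can also be sketched as a single sed call restricted to header lines (those starting with `>`), so sequence lines are never touched. This is an illustration, not the command from the answer: the function name `clean_headers` is hypothetical, it assumes a sed that supports `-E` (GNU or BSD), and unlike the greedy `\(.*\)` above it removes each parenthesized group separately, together with the spaces in front of it.

```shell
# Hypothetical helper: clean FASTA headers in one sed pass.
# Assumes a sed that supports -E (extended regular expressions).
clean_headers() {
  # /^>/     : act only on header lines
  # first s  : drop each "(...)" group and any spaces before it
  # second s : drop literal "[" and "]" but keep the text between them
  sed -E '/^>/ { s/ *\([^)]*\)//g; s/[][]//g; }' "$1"
}
```

It could be used as `clean_headers file.faa > new_file.faa`.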

Dear Venu,

Thanks for your help. When I ran your suggested command, I got the following error:

sed: -e expression #1, char 1: unknown command:  ' '


Please tell me how to write the result to an output file instead of only printing it on the screen. Thanks again.

You can redirect the output to a new file like this:

$ perl -pe 's/\(.*\)//' file.faa | sed 's/\[//' | sed 's/\]//' > new_file_name
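
Because the result lands in a file instead of on the screen, a quick check can confirm that the headers were actually changed. A minimal sketch, assuming a POSIX shell with grep; the function name `count_unclean_headers` is hypothetical.

```shell
# Hypothetical check: count header lines that still contain
# parentheses or square brackets.
count_unclean_headers() {
  # grep -c prints the number of matching lines (0 if none);
  # "|| true" keeps the function from failing when the count is 0
  grep -c '^>.*[][()]' "$1" || true
}
```

Running `count_unclean_headers new_file_name` should print 0 if every header was cleaned.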

Yes, I did that, but it just writes the original file unchanged. PS: I tested it without the sed commands.

Are you saying that you are still getting the sed error you mentioned above? Are you using the command exactly as provided by @venu (with single quote characters)?

I mean that the changes the command should make are not written to the output file.

By the way, the second and third parts of the command line updated by @venu are not working. It now prints only this:

original: >gi|745831934|gb|AJD39620.1| protein A (plasmid) [Homo sapiens]
printed: >gi|745831934|gb|AJD39620.1| protein A [Homo sapiens]

Could this be related to the fact that I am using Strawberry Perl on Windows?

Thanks

There is Perl on Unix, and then there is the Perl you are using on Windows, which does not appear to behave the way we expect it to on Unix.

Is the file too large to do this with an editor on Windows?

Yes, there are 400 sequences. I will run the script on Linux, where it will probably work fine.

Thanks

It works perfectly fine for me. I don't know why you are getting an error. I am updating the answer. Redirect the output to a new file as @genomax said.

Thanks, both. On Linux, it worked fine.