Editing FASTA headers
3.5 years ago

Hello Everyone

Can anyone guide me in editing the headers of a FASTA file? My headers look like this:

>NP_006556.1 transcriptional repressor CTCF isoform 1 [Homo sapiens]

And I want the output to be:

>NP_006556.1 [Homo sapiens]

Thank you so much

gene • 936 views

use sed

I used the sed command but it did not work for me.

I used the following command:

sed -E 's/>(.+)::(.+)/>\2_\1/' als.fasta > out.fasta

That pattern doesn't match your example header above in the way you need it to. Use [^ ]+ to capture the first part and \[.+\] to capture the second.
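
As a rough sketch of that suggestion (assuming a sed that supports -E, as GNU and BSD sed do, and that every header ends with the bracketed organism name), the two capture groups could be combined like this. The function name trim_headers is hypothetical, and the sequence line in the demo is a made-up placeholder:

```shell
# Hypothetical helper: keep the accession and the trailing [organism] of each header.
# \1 = ">" plus everything up to the first space
# \2 = the bracketed part at the end of the line
trim_headers() {
  sed -E 's/^(>[^ ]+) .*(\[.+\])$/\1 \2/' "$@"
}

# Demo on the header from the question; ACDEFGHIK is a placeholder sequence line.
printf '%s\n' \
  '>NP_006556.1 transcriptional repressor CTCF isoform 1 [Homo sapiens]' \
  'ACDEFGHIK' | trim_headers
```

On a real file this would be called as trim_headers als.fasta > out.fasta. Lines that do not start with ">" are left unchanged because the pattern is anchored on that character.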

Could you please help with the command? I tried the pattern you mentioned, but it did not work.

"it did not work"

Please be more specific. Show us an example of an input line, the exact command you ran and the result it yields, as well as how this result differs from your expected result.

Here is the command I ran:

sed 's|\[^ ]+::\[.+\]' als.fasta > out.fasta

and it gives this error:

"s|\[^ ]+::\[.+\]": unterminated substitute pattern.

Try searching for "sed unterminated substitute pattern".

I used the following command and it worked for me.

sed 's/ .*\[/ \[/'
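
For anyone reading later, here is a hedged sketch of why this works: " .*" starts at the first space and, being greedy, runs up to the last "[" on the line, so the whole description is replaced by " [". The variant below adds a /^>/ address so that only header lines are touched. The function name strip_description is hypothetical, and the sequence line is a made-up placeholder:

```shell
# Hypothetical helper: same substitution as above, restricted to header lines.
# " .*\[" matches from the first space through the last "[" on the line.
strip_description() {
  sed '/^>/ s/ .*\[/ [/' "$@"
}

# Demo on the header from the question; ACDEFGHIK is a placeholder sequence line.
printf '%s\n' \
  '>NP_006556.1 transcriptional repressor CTCF isoform 1 [Homo sapiens]' \
  'ACDEFGHIK' | strip_description
```

With a file, that would be strip_description als.fasta > out.fasta. Note that a header containing more than one bracketed group keeps only the last one with this pattern.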

I now need a grep command to extract the organism name in brackets.

"I need a grep command to extract the organism name in brackets"

If you are sure that there are only two brackets per line and the species name is always between them, you can use this:

awk -F '[' '/^>/ {print $2}' als.fasta | awk -F ']' '{print $1}' > out.fasta
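
Since the question asked for grep specifically, here is a possible sketch. It assumes a grep that supports the -o option (GNU and BSD grep both do) and a single input file. The function name organism_names is hypothetical, and the sequence line is a made-up placeholder:

```shell
# Hypothetical helper: print the bracketed organism name from each header.
#   grep '^>'           keeps header lines only
#   grep -o '\[[^]]*\]' prints just the [ ... ] part of each header
#   tr -d '[]'          strips the surrounding brackets
organism_names() {
  grep '^>' "$@" | grep -o '\[[^]]*\]' | tr -d '[]'
}

# Demo on the header from the question; ACDEFGHIK is a placeholder sequence line.
printf '%s\n' \
  '>NP_006556.1 transcriptional repressor CTCF isoform 1 [Homo sapiens]' \
  'ACDEFGHIK' | organism_names
```

On a file this would be organism_names als.fasta > out.fasta. If a header contains several bracketed groups, grep -o prints each one on its own line, so the output may have more lines than there are headers.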