Extract specific information from headers of fasta file
6.4 years ago
Hi,

I know this is stupid, and I posted a similar question before, but I need a small modification to the code to get the right information.

This is the format of the header.

>AAA23421(AI041) fim41, [Escherichia coli]

I need to extract only the "AAA23421(AI041)" part from the header. The length of this part varies between sequences in this fasta file.

I tried to modify and use this code:

grep -Po -e ">.*?\)" fileName.fa | sed 's/^>//g' >file1.txt

but it didn't work.

Can anyone help with this?

Thanks

perl -lne 'if(/>(.*?)\((.*?)\) /){print "$1($2)"}' fileName.fa

(.*?) - matches anything of any length (non-greedy, so it stops as soon as the rest of the pattern can match)
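
If Perl-style regexes are not available (grep -P is a GNU extension), the same extraction can probably be done with plain sed. This is only a sketch: sample.fa and ids.txt are hypothetical file names, and the second record and the sequence lines are invented for illustration. It assumes every header has the ID(...) part right after the ">".

```shell
# Hypothetical sample file; only the first header comes from the question,
# the second record and the sequence lines are made up.
printf '%s\n' \
  '>AAA23421(AI041) fim41, [Escherichia coli]' \
  'ATGAAACGT' \
  '>BBB1(X2) geneB, [Escherichia coli]' \
  'ATGCCC' > sample.fa

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, i.e. header lines containing a ")".
# [^)]* matches everything up to the first ")", so no non-greedy
# operator is needed.
sed -n 's/^>\([^)]*)\).*/\1/p' sample.fa > ids.txt
cat ids.txt
```

Because the pattern is anchored on ">" and stops at the first closing parenthesis, sequence lines are skipped and the length of the ID does not matter.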

Thank you so much! This code works, too.

Actually, the code I modified does work on the server! I ran it in the wrong place before.

Sorry for any confusion.


