Extract specific information from headers of fasta file
6.4 years ago
Crystal ▴ 50

Hi,

I know this is stupid; I posted a similar question before, but I need a small modification to the code to get the right information.

This is the format of the header:

>AAA23421(AI041) fim41, [Escherichia coli]

I need to extract only the "AAA23421(AI041)" part from the header. The length of this part differs between sequences in this fasta file.

I tried to modify and use this code:

grep -Po -e ">.*?\)" fileName.fa | sed 's/^>//g' >file1.txt

but it didn't work.

Can anyone help with this?

Thanks

Crystal

sequence • 2.0k views
perl -lne 'if(/>(.*?)\((.*?)\) /){print "$1($2)"}' fileName.fa 

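(.*?) - a non-greedy capture group: it matches any characters, but as few as possible. Here the first group stops at the first "(" and the second stops at the first ")" that is followed by a space.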

Thank you so much! This code works, too.

Crystal

Actually, the code I modified does work on the server! I ran it in the wrong place before.

Sorry for any confusion.

Crystal

6.4 years ago

I just replied with a link to the extracted results and an explanation of the sed command that I posted in the other thread. The command you just tried works on the dataset you posted to Dropbox in the other thread.
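
The sed command from the other thread is not reproduced here. As a stand-in, a sed-only version might look like the sketch below, assuming the wanted ID always runs from just after ">" up to and including the first ")". The function name extract_ids_sed and the sequence line are hypothetical.

```shell
# Hypothetical helper (name is an assumption): sed-only extraction of header IDs.
extract_ids_sed() {
  # -n suppresses default printing; the p flag prints only lines where the
  # substitution matched, i.e. headers that contain a ")".
  # \([^)]*)\) captures everything after ">" up to and including the first ")".
  sed -n 's/^>\([^)]*)\).*/\1/p' "$@"
}

# Sample record: the header from the question plus a made-up sequence line.
printf '%s\n' '>AAA23421(AI041) fim41, [Escherichia coli]' 'ATGAAA' | extract_ids_sed
```

To write the IDs to a file, as in the question, the call would be something like extract_ids_sed fileName.fa > file1.txt.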
