Question: Extract specific information from the headers of a FASTA file
Crystal30 (United States) wrote, 4.6 years ago:

Hi,

I know this is a basic question, and I posted a similar one before, but I need a small modification to the code to get the right information.

This is the format of the header:

>AAA23421(AI041) fim41, [Escherichia coli]

I need to extract only the "AAA23421(AI041)" part from the header. The length of this part differs between sequences in this FASTA file.

I tried to modify and use this code:

grep -Po -e ">.*?\)" fileName.fa | sed 's/^>//g' >file1.txt

but it didn't work.

Can anyone help with this?

Thanks

Crystal

perl -lne 'if(/>(.*?)\((.*?)\) /){print "$1($2)"}' fileName.fa 

(.*?) - a non-greedy match: any characters, as few as possible, up to whatever comes next in the pattern
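For tools that lack the lazy `.*?` quantifier, a negated character class gives roughly the same "stop at the first match" behaviour. Here is a minimal sketch with POSIX sed, assuming every header of interest contains a `)` after the ID. The function name `first_paren_id` is hypothetical and not from this thread.

```shell
# Hypothetical helper (not from the thread): print each header's ID,
# from just after ">" up to and including the first ")".
first_paren_id() {
    # [^)]* cannot cross a ")", so the match ends at the first one,
    # which is what the lazy .*? achieves in the Perl/PCRE versions.
    # -n plus the p flag prints only lines where the substitution matched,
    # so sequence lines and headers without ")" are skipped.
    sed -n 's/^>\([^)]*)\).*/\1/p' "$@"
}

# Example: read a FASTA record from standard input.
printf '>AAA23421(AI041) fim41, [Escherichia coli]\nATGC\n' | first_paren_id
```

Unlike the Perl one-liner, this sketch does not require a space after the closing parenthesis, so the two may differ on headers where the ID runs straight into other text.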

written 4.6 years ago by mxs530

Thank you so much! This code works, too.

Crystal

written 4.6 years ago by Crystal30

Actually, the code I modified DOES work on the server! I ran it in the wrong place before.

Sorry for any confusion.

Crystal

written 4.6 years ago by Crystal30
Devon Ryan92k (Freiburg, Germany) wrote, 4.6 years ago:

I just replied with a link to the extracted results and an explanation of the sed command that I posted in the other thread. The command you just tried works on the dataset you posted to Dropbox in the other thread.
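The sed command from the other thread is not reproduced here. As a quick way to confirm that an extraction command really "works" on a given file, one option is to compare the number of header lines with the number of IDs extracted. This is only a sketch: the function name `check_ids` is hypothetical, and the sed pattern is an assumed stand-in that takes everything up to the first `)`.

```shell
# Hypothetical helper (not from the thread): report whether every FASTA
# header yielded an ID. Returns 0 when the counts match, 1 otherwise.
check_ids() {
    fasta=$1
    # Count header lines; "|| true" keeps a zero count from aborting under set -e.
    headers=$(grep -c '^>' "$fasta" || true)
    # Count headers from which an ID ending at the first ")" could be extracted.
    ids=$(sed -n 's/^>\([^)]*)\).*/\1/p' "$fasta" | wc -l | tr -d ' ')
    echo "headers=$headers ids=$ids"
    [ "$headers" -eq "$ids" ]
}
```

Running `check_ids fileName.fa` prints both counts. A mismatch suggests that some headers do not follow the expected format and were silently skipped.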
