Question: Naming of FastA sequences
Phil S. (Stuttgart, Germany) wrote 4.9 years ago:

Hi there,

I have an ordinary-looking FASTA file like this:

>Translation: 2..112 (direct), 37 amino acids
ADTAQEFISTAVFGTSMSAHHILGLKPVPRVWLFAI*

>Translation: 1482..1790 (direct), 103 amino acids
MKKYTEQAKLSVVEDYCSGSAGHREVAHRHGVNANVIRKWLPIYRDKPVAPLPAFVPLQP
MPKRQADEAVVIALSLGDKSITVKWPISDPDGCARFIRSLSQ*
 
>Translation: 1787..2122 (direct), 112 amino acids
MIRIDAIWLATEPMDMRAGTETALVRVVAVFGAAKPHCAYLFANRRANRMKVLVHDGVGI
WLAARRLNQGKFHWPGTHRGLEVGLDAEQLQALVLGLPWQRVGANGAITMI*

What I want to do is rename the sequences with a running number that is always five digits long. The three sequences above should then be named like this:

>orf00001 2..112 (direct), 37 amino acids
ADTAQEFISTAVFGTSMSAHHILGLKPVPRVWLFAI*


>orf00002 1482..1790 (direct), 103 amino acids
MKKYTEQAKLSVVEDYCSGSAGHREVAHRHGVNANVIRKWLPIYRDKPVAPLPAFVPLQP
MPKRQADEAVVIALSLGDKSITVKWPISDPDGCARFIRSLSQ*
 
>orf00003 1787..2122 (direct), 112 amino acids
MIRIDAIWLATEPMDMRAGTETALVRVVAVFGAAKPHCAYLFANRRANRMKVLVHDGVGI
WLAARRLNQGKFHWPGTHRGLEVGLDAEQLQALVLGLPWQRVGANGAITMI*

So the five digits are fixed, and I just need to count upwards each time I see a '>'. Unfortunately, I don't know how to pad the number to a fixed length of five digits.

Thanks for your help (once again ;) )

Best,

Phil

Tags: bash, python, fasta
Alex Reynolds (Seattle, WA, USA) wrote 4.9 years ago:

Something like the following should get you close:

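$ awk '
    # Header lines: bump the counter so the first record becomes orf00001.
    /^>/ {
        idx++
        # Strip the leading ">" and the "Translation:" label, keep the rest.
        sub(/^>(Translation:)? */, "")
        printf(">orf%05d %s\n", idx, $0)
        next
    }
    # Sequence and blank lines pass through unchanged.
    { print }
' mySeqs.fa > myRelabeledSeqs.fa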
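
Since the question is also tagged python, here is a rough Python equivalent of the same idea. It is only a sketch and not part of the original answer: the function name relabel_fasta and its prefix and width parameters are made up for illustration, and it assumes every header starts with '>' and optionally carries a "Translation:" label, as in the example file. The zero-padding comes from the format spec (0{width}d), which plays the same role as %05d in awk.

```python
import re
from typing import Iterable, Iterator


def relabel_fasta(lines: Iterable[str], prefix: str = "orf", width: int = 5) -> Iterator[str]:
    """Yield FASTA lines with each header renamed to a zero-padded counter.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            # Drop the leading ">" and an optional "Translation:" label,
            # keeping the rest of the description.
            rest = re.sub(r"^>(Translation:)?\s*", "", line)
            # 0{width}d pads the counter with zeros, e.g. 1 -> 00001.
            yield f">{prefix}{count:0{width}d} {rest}".rstrip()
        else:
            # Sequence and blank lines pass through unchanged.
            yield line
```

To use it on a file, you could loop over relabel_fasta(open("mySeqs.fa")) and print each yielded line, redirecting the output to a new file as in the awk version.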

Thank you so much for the fast and correct answer. The only thing I had to adjust was a line break...

Reply written 4.9 years ago by Phil S.