Question: Renaming headers of a contig file with awk
3.6 years ago by
waqasnayab180
Pakistan
waqasnayab180 wrote:

Hi,

I have a contig file:

>NODE_1_length_248_cov_3.157258
AAGGACTTGAGGGGCCTAACCTACCCTCAAGCATGCTCCCCGAAAGATTCCATCCATCCT
AGTCTTTTGAGGACAAATCCTACTGTGTAGACGAGTCATAGGGCAGACATTCGCGACGAA
TGGATCCGCCGGCCTCATCAGATAATTGAGACCGTCAACTGCCAGGTGCTCAAGAGGTTC
CTGGTTAAGTCTCCCTAGGCGTGGGAACTCTTTATGCATCGTTAACGTCCATCGGCTGAG
TGCCCACAGCGTTACTCAAGGCAGATTATACTGGGgag
>NODE_2_length_89_cov_4.494382
GTCGATAGATCTATGTGTTTAGACATGTAGATCAGTGGTCGTTGTGATGAGCGTAGCGCT
TGCGGAACGTGCACGAGTATACTATCACCGCCGGATTTTAATGCAGAGAGGTTCCCGAg
>NODE_3_length_79_cov_3.227848

and so on ........


I need to change the header in the following way:

 

>Contig1.1
AAGGACTTGAGGGGCCTAACCTACCCTCAAGCATGCTCCCCGAAAGATTCCATCCATCCT
AGTCTTTTGAGGACAAATCCTACTGTGTAGACGAGTCATAGGGCAGACATTCGCGACGAA
TGGATCCGCCGGCCTCATCAGATAATTGAGACCGTCAACTGCCAGGTGCTCAAGAGGTTC
CTGGTTAAGTCTCCCTAGGCGTGGGAACTCTTTATGCATCGTTAACGTCCATCGGCTGAG
TGCCCACAGCGTTACTCAAGGCAGATTATACTGGGgag
>Contig1.2
GTCGATAGATCTATGTGTTTAGACATGTAGATCAGTGGTCGTTGTGATGAGCGTAGCGCT
TGCGGAACGTGCACGAGTATACTATCACCGCCGGATTTTAATGCAGAGAGGTTCCCGAg
>Contig1.3

and so on........

I tried this awk command:

cat contig_1.fa | awk '{print (NR%4 == 1) ? ">Contig1." ++i : $0}' > contig_1_rename.fa

the output is:

contig_1_rename.fa

>Contig1.1
AAGGACTTGAGGGGCCTAACCTACCCTCAAGCATGCTCCCCGAAAGATTCCATCCATCCT
AGTCTTTTGAGGACAAATCCTACTGTGTAGACGAGTCATAGGGCAGACATTCGCGACGAA
TGGATCCGCCGGCCTCATCAGATAATTGAGACCGTCAACTGCCAGGTGCTCAAGAGGTTC
>Contig1.2
TGCCCACAGCGTTACTCAAGGCAGATTATACTGGGgag
>NODE_2_length_89_cov_4.494382
GTCGATAGATCTATGTGTTTAGACATGTAGATCAGTGGTCGTTGTGATGAGCGTAGCGCT
>Contig1.3

It seems to be writing a new header on every fourth line, overwriting whatever is there, instead of replacing only the header lines. How can I do a pattern search and replace in awk rather than selecting lines by number (NR)?

Thanks,

Waqas.

 

 

awk next-gen assembly • 916 views
modified 3.6 years ago by seta1.2k • written 3.6 years ago by waqasnayab180
3.6 years ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

Another approach is to modify the header line and leave the sequence lines untouched:

$ awk '
    # Running index for the renamed contigs
    BEGIN { contigIdx = 1 }
    {
        if ($0 ~ /^>/) {
            # Header line: replace it with the new name
            print ">Contig1." contigIdx
            contigIdx++
        } else {
            # Sequence line: print it unchanged
            print $0
        }
    }' sequences.fa > sequences_renamed.fa

The pattern /^>/ matches lines which start with the character >.
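To see why matching on the pattern behaves differently from counting lines, here is a small sketch that runs both approaches on the same input. The file names (demo.fa, by_line_number.fa, by_pattern.fa) and the two short records are made up for illustration; the first record deliberately has two sequence lines so that headers do not fall on every fourth line.

```shell
# Hypothetical sample: record 1 has two sequence lines, record 2 has one
printf '>NODE_1_length_8_cov_3.1\nAAGG\nACTT\n>NODE_2_length_4_cov_4.4\nGTCG\n' > demo.fa

# Line-number approach from the question: rewrites lines 1, 5, 9, ...
# regardless of what those lines contain
awk '{ print ((NR % 4 == 1) ? ">Contig1." (++i) : $0) }' demo.fa > by_line_number.fa

# Pattern approach: rewrites only the lines that begin with ">"
awk '/^>/ { print ">Contig1." (++i); next } { print }' demo.fa > by_pattern.fa

# Show the two results one after the other
cat by_line_number.fa
cat by_pattern.fa
```

In the line-number result the second NODE header survives and a sequence line is overwritten, whereas the pattern result renames exactly the two headers.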


Thanks, it's great; it worked fine.

 

written 3.6 years ago by waqasnayab180
3.6 years ago by
seta1.2k
Sweden
seta1.2k wrote:

Also, you can use the following awk command: 

awk '/^>/{print ">Contig1." (++i); next}{print}' < file.fasta > output.fasta
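After renaming, it may be worth checking that only the headers changed. The following is a rough sketch of such a check, not part of the original answers; the file names (check_in.fa, check_out.fa and the .seq files) and the sample records are assumptions made for the example.

```shell
# Hypothetical sample input standing in for the real contig file
printf '>NODE_1_length_8_cov_3.1\nAAGGACTT\n>NODE_2_length_4_cov_4.4\nGTCG\n' > check_in.fa

# Rename headers by pattern, as in the answers in this thread
awk '/^>/ { print ">Contig1." (++i); next } { print }' check_in.fa > check_out.fa

# 1. The number of records should be the same before and after
in_count=$(grep -c '^>' check_in.fa)
out_count=$(grep -c '^>' check_out.fa)
echo "headers: $in_count in, $out_count out"

# 2. The sequence lines should be byte-for-byte identical
grep -v '^>' check_in.fa > check_in.seq
grep -v '^>' check_out.fa > check_out.seq
cmp -s check_in.seq check_out.seq && echo "sequences unchanged"
```

If the header counts differ or cmp reports a difference, the renaming step touched something other than the header lines.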
