How to renumber and rename the contigs in an assembly.fasta file?
2.2 years ago
Ap1438 ▴ 50
>1 length=424197 depth=1.05x           
ATGCATGCAGTAGCAGATGCAGAGAGACAGATAG             
AATAGACAGTAGACGATAGACAGTAGAGATAGAGA              
ACGATGATGACCCAGTAGATGACAGTAGACAGATG            
>3 length=322465 depth=0.97x               
GTACATGGTAGCAGATGCAGAGAGACAGATAGAA      
AATAGACAGTAGACTATAGACAGTAGAGATAGAGA             
ATATAHATAHATAHAGATTHATHATACAGTATAGAT         
>4 length=313463 depth=0.87x             
CGTAGATCGTAGCAGATGCAGAGAGACAGATAGA          
AATAGACAGTAGACGATAGACAGTAGAGATAGAGA          
ATAGCAGTAGCAGTAGCAGATGACAGATGGAGAG         
>6 length=285776 depth=0.79x               
ATGCATGCAGTAGCAGATGCAGAGAGACAGATAG           
AATAGACAGTAGACGATAGACAGTAGAGATAGAGA         
TGATGACGATGACGATGGGTAGAACACCAGATGG        
>7 length=281883 depth=0.90x                     
ATGCATGCAGTAGCAGATGCAGAGAGACAGATAG             
AATAGACAGTAGACGATAGACAGTAGAGATAGAGA              
GTACAGTAGACAGTAGACAGAGGGGGAGATAGGA        

I have a FASTA file of contigs that were numbered serially, but I had to remove some contigs, which left gaps in the numbering. As shown above, the IDs went from 1, 2, 3, 4, 5, ... to 1, 3, 4, 6, 7, and so on.

So now I want to renumber the contigs, i.e.

1 - 1
3 - 2
4 - 3
6 - 4
7 - 5 and so on up to 580 (my total number of contigs)

and rename them as

1 - Contig_0001
2 - Contig_0002
3 - Contig_0003 and so on up to Contig_0580

I have been trying to do this with awk's printf function and an incrementing counter, but I have not succeeded so far.

Can anyone help me solve this issue? Thanks for your valuable time.

2.2 years ago

Try seqkit replace:

seqkit replace -p '.+' -r 'Contig_{nr}' --nr-width 4  contigs.fasta -o renamed.fasta

Example:

$ echo -ne ">1\nactg\n>3\nACTG\n"
>1
actg
>3
ACTG

$ echo -ne ">1\nactg\n>3\nACTG\n" \
    | seqkit replace -p '.+' -r 'Contig_{nr}' --nr-width 4
>Contig_0001
actg
>Contig_0002
ACTG
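
Because renumbering discards the original IDs, it can be useful to record which old contig became which new one before renaming. A minimal sketch with plain awk, assuming the old ID is the first word of each header; the function name `contig_id_map` is only illustrative and is not part of seqkit:

```shell
# Hypothetical helper: print "old_id<TAB>new_id" for every FASTA header.
contig_id_map() {
  awk '/^>/ {
    old = substr($1, 2)                      # first header word, minus ">"
    printf("%s\tContig_%04d\n", old, ++n)    # n counts the headers seen so far
  }' "$@"
}

# Demo on inline data (awk reads stdin when no file name is given)
printf '>1 length=5\nACGTA\n>3 length=4\nGGCC\n' | contig_id_map
```

On a real file this could be run as `contig_id_map contigs.fasta > id_map.tsv` (the output file name is an assumption), keeping the table next to the renamed FASTA.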

Thank you for your valuable time and suggestion. This command works fine.


Can you please explain the command?
What if I have a file that is already in the proper order, and I want to replace something in the middle of each header?

>S_griseus__Contig_0001__peg_1__799__35__negative        
atgcatgc        
>S_griseus__Contig_0001__peg_2__1655__3444__posetive    
gtcgtacg    

I want to change peg_1 to peg_0001 in these headers. What command would help me do that?

seqkit replace -p '^(.+__peg_)\d+(__.+)$' -r '${1}{nr}${2}' --nr-width 4

You need to learn more about regular expressions.
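
Note that `{nr}` is the running record number, so the command above reproduces the existing peg numbers only if they run 1, 2, 3, ... in file order. If the numbers already in the headers should be kept and merely zero-padded, something like the following awk sketch may work. It assumes each header contains a `__peg_<digits>__` field and pads only the first one; `pad_peg_numbers` is a made-up name:

```shell
# Hypothetical helper: zero-pad the number in "__peg_<n>__" to 4 digits.
pad_peg_numbers() {
  awk '
    /^>/ && match($0, /__peg_[0-9]+__/) {
      # RSTART and RLENGTH delimit the matched "__peg_<n>__" substring;
      # "__peg_" is 6 characters long and the trailing "__" is 2.
      num = substr($0, RSTART + 6, RLENGTH - 8)
      $0 = substr($0, 1, RSTART - 1) sprintf("__peg_%04d__", num) substr($0, RSTART + RLENGTH)
    }
    { print }   # sequence lines and non-matching headers pass through unchanged
  ' "$@"
}

# Demo on inline data
printf '>S_griseus__Contig_0001__peg_1__799__35__negative\natgcatgc\n' | pad_peg_numbers
```

Unlike the `{nr}` approach, this keeps whatever number each header already carries, which matters if peg numbering restarts or has gaps.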

2.2 years ago

Try this:

$ awk '/>/ {count++; printf("%s%d%s%04d\n",">",count,"-contig_",count)} !/>/{print}' test.fa

>1-contig_0001
ATGCATGCAGTAGCAGATGCAGAGAGACAGATAG             
AATAGACAGTAGACGATAGACAGTAGAGATAGAGA              
ACGATGATGACCCAGTAGATGACAGTAGACAGATG            
>2-contig_0002
GTACATGGTAGCAGATGCAGAGAGACAGATAGAA      
AATAGACAGTAGACTATAGACAGTAGAGATAGAGA             
ATATAHATAHATAHAGATTHATHATACAGTATAGAT         
>3-contig_0003
CGTAGATCGTAGCAGATGCAGAGAGACAGATAGA          
AATAGACAGTAGACGATAGACAGTAGAGATAGAGA          
ATAGCAGTAGCAGTAGCAGATGACAGATGGAGAG         
>4-contig_0004
ATGCATGCAGTAGCAGATGCAGAGAGACAGATAG           
AATAGACAGTAGACGATAGACAGTAGAGATAGAGA         
TGATGACGATGACGATGGGTAGAACACCAGATGG        
>5-contig_0005
ATGCATGCAGTAGCAGATGCAGAGAGACAGATAG             
AATAGACAGTAGACGATAGACAGTAGAGATAGAGA              
GTACAGTAGACAGTAGACAGAGGGGGAGATAGGA
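
If the headers should read exactly Contig_0001, Contig_0002, ... as in the question, without the leading counter, a small variation might look like this. It is only a sketch: `rename_contigs` is an illustrative name, and it assumes everything after `>` in the old header (ID, length, depth) can be dropped:

```shell
# Hypothetical helper: replace every FASTA header with >Contig_NNNN.
rename_contigs() {
  awk '
    /^>/ { printf(">Contig_%04d\n", ++n); next }   # header: emit the new serial ID
    { print }                                      # sequence line: pass through
  ' "$@"
}

# Demo on inline data (awk reads stdin when no file name is given)
printf '>1 length=5\nACGTA\n>3 length=4\nGGCC\n' | rename_contigs
```

For a file, `rename_contigs test.fa > renamed.fa` would write the result to a new file (the output name is an assumption) and leave the original untouched.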

This command works fine. Thank you for your valuable time and suggestion.
