Question: Rename multiple FASTA headers for multiple FASTA files
Asked 2.6 years ago by bioinformaticssrm2011:


I have multiple FASTA files and I want to change their headers. For one file I am using:

awk '/^>/{print ">C1_" ++i; next}{print}' C1_pandaseq.fasta > C1_pandaseq_new.fasta
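To make the logic of that awk command explicit: it keeps a counter i, replaces every line that starts with > by >C1_ plus the incremented counter, and prints all other lines unchanged (so multi-line sequences pass through untouched). A minimal Python sketch of the same idea, working on a list of lines; the function name rename_headers is just for illustration:

```python
def rename_headers(lines, prefix):
    """Mimic awk '/^>/{print ">" P "_" ++i; next}{print}'.

    Header lines become >PREFIX_1, >PREFIX_2, ...; every other line
    (sequence data, including multi-line records) is passed through unchanged.
    """
    out = []
    i = 0
    for line in lines:
        if line.startswith(">"):
            i += 1  # like awk's ++i: increment first, so numbering starts at 1
            out.append(">{}_{}".format(prefix, i))
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    example = [">read_a some description", "ACGT", ">read_b", "GGCC", "TTAA"]
    for line in rename_headers(example, "C1"):
        print(line)
```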

Input FASTA:


Output FASTA:


Similarly, I have multiple FASTA files, which look like:


So I need to rename the headers in every FASTA file, e.g.:

for FASTA file C2_pandaseq.fasta:

for FASTA file C4_pandaseq.fasta:

and so on...

For each FASTA file, I need to rename the headers based only on the file name. I think I need a for loop for this, but I don't know how to write it.

Any help is appreciated. Thanks!
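The loop being asked for could be sketched in Python as follows, assuming all files sit in one directory and follow the <prefix>_pandaseq.fasta naming. The prefix is taken by stripping the _pandaseq.fasta suffix, and each renamed copy is written to <prefix>_pandaseq_new.fasta, like the awk command above. The function rename_all and its directory argument are hypothetical names:

```python
import glob
import os

SUFFIX = "_pandaseq.fasta"


def rename_all(directory="."):
    """Rename headers in every *_pandaseq.fasta file in `directory`.

    C1_pandaseq.fasta -> C1_pandaseq_new.fasta with headers >C1_1, >C1_2, ...
    Returns the list of output paths written.
    """
    written = []
    for path in sorted(glob.glob(os.path.join(directory, "*" + SUFFIX))):
        name = os.path.basename(path)
        prefix = name[: -len(SUFFIX)]  # strip the suffix: "C1_pandaseq.fasta" -> "C1"
        out_path = os.path.join(directory, prefix + "_pandaseq_new.fasta")
        count = 0  # the record counter restarts for each file
        with open(path) as src, open(out_path, "w") as dst:
            for line in src:
                if line.startswith(">"):
                    count += 1
                    dst.write(">{}_{}\n".format(prefix, count))
                else:
                    dst.write(line)  # sequence lines are copied unchanged
        written.append(out_path)
    return written
```

Calling rename_all(".") from the directory that holds the files would then process C1, C2, C4, and so on in one go. The output names end in _new.fasta, so they do not match the input pattern on a second run.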

Modified 2.6 years ago by shenwei356; written 2.6 years ago by bioinformaticssrm2011.

Hello bioinformaticssrm2011!

It appears that your post has been cross-posted to another site:

This is typically not recommended as it runs the risk of annoying people in both communities.

Written 2.6 years ago by genomax.

I understand, but I was not able to adapt the script posted there for my work, though I appreciate their help.

Written 2.6 years ago by bioinformaticssrm2011.
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 2.6 years ago:
 ls *_pandaseq.fasta | cut -d "_" -f 1 | while read PREFIX; do awk -v P=${PREFIX} '/^>/{print ">" P "_" ++i; next}{print}' ${PREFIX}_pandaseq.fasta > ${PREFIX}_pandaseq_new.fasta ; done
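In this pipeline, cut -d "_" -f 1 keeps only the text before the first underscore, so C1_pandaseq.fasta yields C1. That works for names like C1 and C2, but a sample name that itself contains an underscore would be cut short. A small Python comparison of the two ways to get the prefix (the name S_01_pandaseq.fasta is a made-up example):

```python
SUFFIX = "_pandaseq.fasta"


def prefix_like_cut(name):
    """Equivalent of `cut -d "_" -f 1`: text before the first underscore.

    Like cut without -s, a name with no underscore is returned unchanged.
    """
    return name.split("_")[0]


def prefix_by_suffix(name):
    """Strip the known trailing suffix instead of splitting on '_'."""
    if name.endswith(SUFFIX):
        return name[: -len(SUFFIX)]
    return name


for name in ["C1_pandaseq.fasta", "S_01_pandaseq.fasta"]:  # second name is hypothetical
    print(name, prefix_like_cut(name), prefix_by_suffix(name))
```

For S_01_pandaseq.fasta the cut-style prefix is S, so the loop would then try to read S_pandaseq.fasta, a file that most likely does not exist; stripping the full suffix avoids that.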
Written 2.6 years ago by Pierre Lindenbaum.

Thanks, Pierre. I learned something new here, since I don't typically use awk. I first tried to adapt your answer on this post with a similar approach using cut, but couldn't figure out how to pass the variable into the awk statement for the headers. Now I know to use -v.

Written 2.6 years ago by

Thank you, Pierre. It works.

Written 2.6 years ago by bioinformaticssrm2011.
Answer by shenwei356, 2.6 years ago:

Combining seqkit and rush:

ls *_pandaseq.fasta \
    | rush 'cat {} | seqkit replace -p ".+" -r "{^_pandaseq.fasta}_{nr}" > {.}.fa'


  • seqkit replace renames the FASTA headers: -p ".+" matches the whole header and -r gives its replacement. {nr} is the record number, i.e. 1, 2, 3, ...
  • rush is a GNU parallel-like tool.
    • {} is the input, e.g. C1_pandaseq.fasta.
    • {^_pandaseq.fasta} removes the suffix _pandaseq.fasta, e.g. C1_pandaseq.fasta becomes C1.
    • {.} removes the last file extension, e.g. C1_pandaseq.fasta becomes C1_pandaseq.
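To see how these replacement strings expand, here is a rough Python emulation that builds the same command lines shown in the dry run below. It only handles the {}, {.} and {^suffix} forms listed above, assumes bare file names without directories, and leaves any other braces, such as seqkit's {nr}, for seqkit to interpret; it is an illustration, not how rush itself is implemented:

```python
import re

TEMPLATE = 'cat {} | seqkit replace -p ".+" -r "{^_pandaseq.fasta}_{nr}" > {.}.fa'


def expand(template, item):
    """Expand a small subset of rush-style placeholders for one input `item`."""

    def repl(match):
        token = match.group(1)
        if token == "":
            return item  # {} -> the input itself
        if token == ".":
            # {.} -> drop the last extension (bare file names only)
            root, dot, _ext = item.rpartition(".")
            return root if dot else item
        if token.startswith("^"):
            suffix = token[1:]  # {^suffix} -> strip that suffix if present
            if suffix and item.endswith(suffix):
                return item[: -len(suffix)]
            return item
        return match.group(0)  # unknown placeholder (e.g. {nr}) is kept as-is

    return re.sub(r"\{([^{}]*)\}", repl, template)


for f in ["C1_pandaseq.fasta", "C4_pandaseq.fasta"]:
    print(expand(TEMPLATE, f))
```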

A dry-run example:

$ ls *_pandaseq.fasta
C1_pandaseq.fasta  C4_pandaseq.fasta

$ ls *_pandaseq.fasta \
      | rush 'cat {} | seqkit replace -p ".+" -r "{^_pandaseq.fasta}_{nr}" > {.}.fa' --dry-run
cat C4_pandaseq.fasta | seqkit replace -p ".+" -r "C4_{nr}" > C4_pandaseq.fa
cat C1_pandaseq.fasta | seqkit replace -p ".+" -r "C1_{nr}" > C1_pandaseq.fa
Modified 2.6 years ago; written 2.6 years ago by shenwei356.
Answer by a user from Philadelphia, PA, 2.6 years ago:

Looks like you almost have it. See this post for using awk.

If you want to use Python (2.7):

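#!/usr/bin/env python
# Python 2.7. Assumes single-line FASTA (exactly one sequence line per header).
import sys

inp = sys.argv[1]                                  # e.g. C1_pandaseq.fasta
prefix = inp.split('_')[0]                         # e.g. C1
outname = inp.split('.fasta')[0] + '_new.fasta'    # e.g. C1_pandaseq_new.fasta

with open(inp, 'r') as f, open(outname, 'w') as outfile:
    numb = 0
    for line in f:
        if line.startswith('>'):
            numb += 1
            # new header (>C1_1, >C1_2, ...), then the sequence line that follows it
            print >> outfile, '>' + prefix + '_' + str(numb)
            print >> outfile, next(f).strip()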

To run it, save it under whatever name you want, list your files in a text file with ls -1 *_pandaseq.fasta > files.txt, and run cat files.txt | xargs -n 1 python

This assumes all your FASTA files are single-line (one sequence line per record). If you have multi-line FASTA files, you can linearize them first with an awk statement from Pierre.
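Linearizing means joining each record's sequence lines into one line, so every header is followed by exactly one sequence line, which is what the next(f) call in the script relies on. Pierre's awk is one way to do it; as an alternative, a minimal Python sketch (the function name linearize is hypothetical) could look like:

```python
def linearize(lines):
    """Join multi-line FASTA records so each header is followed by one sequence line.

    `lines` is any iterable of text lines (with or without trailing newlines);
    returns a list of lines without newlines. Blank lines are skipped, and a
    header with no sequence gets an empty sequence line to keep the pairing.
    """
    out = []
    seq_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if out:  # flush the sequence of the previous record
                out.append("".join(seq_parts))
            out.append(line)
            seq_parts = []
        else:
            seq_parts.append(line.strip())
    if out:
        out.append("".join(seq_parts))  # flush the last record
    return out
```

Writing the returned lines to a new file, one per line, gives a single-line FASTA that the script above can then process.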

Modified 2.6 years ago; written 2.6 years ago by

No matter how much I love Python, for simple jobs like this it's best to use the available GNU/command-line tools. It's rather pointless to write a Python script every time you need to get something done :p

Oh, and avoid the print >> outfile syntax; it is old Python 2 syntax that shouldn't be used anymore. Instead, use outfile.write("yourtexthere").

Written 2.6 years ago by WouterDeCoster.

@WouterDeCoster I first approached this with awk; see my comment on Pierre's answer. Regarding syntax, I noted that I'm using Python 2.7, and I still need to get used to 3+.

Written 2.6 years ago by
Powered by Biostar version 2.3.0