Adding numbers after duplicate headers in fasta files
17 months ago

If I have a .fasta file consisting of gene sequences from certain species, how do I add numbers after duplicate headers, as in this example?

Before:

>Homo Sapiens
ABCDEFG

>Mus Musculus
EDFGHIK

>Homo Sapiens
XYGFS

After:

>Homo Sapiens_1
ABCDEFG

>Mus Musculus
EDFGHIK

>Homo Sapiens_2
XYGFS
17 months ago

Here's a seqkit answer too.

seqkit rename -n file.fasta

17 months ago
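# Linearize each record to "header<TAB>sequence", sort by header so that
# duplicates are adjacent, number each header within its group, then split
# the records back into header and sequence lines.
# Note: the output is sorted by header, and every header gets a suffix.
awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < in.fa |\
sort -t $'\t' -k1,1 |\
awk -F '\t' '{N++;if($1!=P) N=1;printf("%s_%d\t%s\n",$1,N,$2);P=$1;}' |\
tr "\t" "\n"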

That still appends _1 to the headers of species that occur only once.
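
One way to leave unique headers untouched might be to read the file twice with awk: the first pass counts how often each header occurs, and the second pass appends a running number only to headers that occur more than once. This is a minimal sketch, not a tested pipeline for real data. It assumes duplicates are exact matches of the whole header line (case and whitespace included), that the input is a regular file rather than a pipe (it is read twice), and it reuses the in.fa name from the answer above. The sample records and the out.fa name are placeholders for illustration.

```shell
# Hypothetical sample input mirroring the question.
cat > in.fa <<'EOF'
>Homo Sapiens
ABCDEFG
>Mus Musculus
EDFGHIK
>Homo Sapiens
XYGFS
EOF

# Pass 1 (NR == FNR): count how many times each header line occurs.
# Pass 2: append _1, _2, ... only to headers that occur more than once;
#         every other line is printed unchanged, in the original order.
awk '
    NR == FNR { if (/^>/) count[$0]++; next }
    /^>/ && count[$0] > 1 { print $0 "_" (++seen[$0]); next }
    { print }
' in.fa in.fa > out.fa
```

Unlike the sort-based pipeline, this keeps the records in their original order, at the cost of holding one counter per distinct header in memory.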
